Statistics
Shaheena Bashir
FALL, 2019
Outline
Background
Introduction
Estimation
Example
Analysis
Checking the Regression Model Assumptions
Multicollinearity
Covariance
Properties
Correlation
Pearson’s Correlation
Spearman’s Rank Correlation
Background
Example
Blood pressure tends to increase with age, body mass, and stress.
To investigate the relationship of blood pressure to these variables,
a sample of men in a large corporation was selected. For each
subject, age (years), body mass (kg), and a stress index
(ranging from 0 to 100) were recorded along with blood
pressure.
BP Data
Age   BP (mm Hg)   Body Mass (kg)   Stress Index
50    120          55               69
20    141          47               83
20    124          33               77
30    126          65               75
30    117          47               71
50    129          58               73
60    123          46               67
50    125          68               71
40    132          70               77
55    123          42               69
40    132          33               74
40    155          55               86
20    147          48               84
Scatter Plot
[Figure: scatterplot matrix of age, bp, Stress, and BM]
Regression Models
Background Cont’d
• If X is linearly related to Y, then the simple linear regression line
  explains some of the variability in Y.
• In most cases, a lot of variability about the line still remains.
• Some of the unexplained variability may be explained by
  including other predictors in the model.
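The point about added predictors can be checked on the BP data. The sketch below is only an illustration: it assumes NumPy is available, the helper name `r_squared` is made up for this example, and the choice of age and stress as predictors is an assumption rather than the model used in the lecture.

```python
import numpy as np

# BP data from the table above: age (years), blood pressure, stress index.
age = np.array([50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20], dtype=float)
bp = np.array([120, 141, 124, 126, 117, 129, 123, 125, 132, 123, 132, 155, 147], dtype=float)
stress = np.array([69, 83, 77, 75, 71, 73, 67, 71, 77, 69, 74, 86, 84], dtype=float)


def r_squared(y, *predictors):
    """R^2 of a least squares fit of y on the given predictors plus an intercept.

    Hypothetical helper written for this illustration.
    """
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sse = residuals @ residuals              # unexplained variability
    sst = np.sum((y - y.mean()) ** 2)        # total variability in y
    return 1.0 - sse / sst


r2_simple = r_squared(bp, age)               # one predictor
r2_multiple = r_squared(bp, age, stress)     # second predictor added

print(f"R^2 with age only:       {r2_simple:.4f}")
print(f"R^2 with age and stress: {r2_multiple:.4f}")
```

Because the second model contains the first, its R^2 cannot be smaller; how much it grows depends on the data.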
Introduction
Estimation
• Least squares minimizes $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$.
• Taking partial derivatives with respect to each coefficient and
  setting them equal to 0 gives the $p + 1$ normal equations, which
  must be solved simultaneously to obtain the estimators.
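As a sketch of that step, write $x_{i0} = 1$ for the intercept term (a notational convenience assumed here, not taken from the slides). Differentiating with respect to each $\beta_j$ and setting the result to zero gives one equation per coefficient:

```latex
\frac{\partial}{\partial \beta_j}
  \sum_{i=1}^{n} \Bigl( y_i - \sum_{k=0}^{p} \beta_k x_{ik} \Bigr)^2
  = -2 \sum_{i=1}^{n} x_{ij} \Bigl( y_i - \sum_{k=0}^{p} \beta_k x_{ik} \Bigr) = 0,
  \qquad j = 0, 1, \ldots, p.
```

These $p + 1$ equations are linear in the coefficients, so they can be solved together as a linear system.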
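• Define the response vector as $Y = (y_1, \ldots, y_n)^t$.
• Define the design matrix $X$ to be the $n \times (p + 1)$ matrix
  $$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$$
• The error vector is similarly defined as $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^t$ and the
  vector of coefficients as $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^t$.
• The linear regression model can be written as
  $$Y = X\beta + \epsilon$$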
$$\hat{\beta} = (X^t X)^{-1} (X^t Y)$$
$$SSE = e^t e = (Y - X\beta)^t (Y - X\beta)$$
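The matrix formula can be tried on the BP data. This is a minimal sketch assuming NumPy is available; using all three predictors (age, body mass, stress) is an assumption for illustration and may not be the model fitted in the lecture.

```python
import numpy as np

# BP data from the table earlier in the slides.
age = np.array([50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20], dtype=float)
bm = np.array([55, 47, 33, 65, 47, 58, 46, 68, 70, 42, 33, 55, 48], dtype=float)
stress = np.array([69, 83, 77, 75, 71, 73, 67, 71, 77, 69, 74, 86, 84], dtype=float)
bp = np.array([120, 141, 124, 126, 117, 129, 123, 125, 132, 123, 132, 155, 147], dtype=float)

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones(len(bp)), age, bm, stress])   # n x (p + 1)
Y = bp

# Solve the normal equations (X^t X) beta = X^t Y.
# Solving the system is numerically safer than forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Residuals and error sum of squares evaluated at the estimate.
e = Y - X @ beta_hat
sse = e @ e

print("beta_hat:", beta_hat)
print("SSE:", sse)
```

At the solution the residual vector is orthogonal to every column of X, which is exactly what the normal equations state.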
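For simple linear regression ($p = 1$), the design matrix is
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$
and the model is $Y = X\beta + \epsilon$.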
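$$\sigma^2 (X^t X)^{-1} = \frac{\sigma^2}{n S_{xx}} \begin{pmatrix} \sum_i x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix} = \begin{pmatrix} \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) & -\frac{\bar{x}}{S_{xx}} \sigma^2 \\ -\frac{\bar{x}}{S_{xx}} \sigma^2 & \frac{\sigma^2}{S_{xx}} \end{pmatrix}$$
where $S_{xx} = \sum_i (x_i - \bar{x})^2$.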
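The closed form can be checked numerically. The sketch below assumes NumPy is available and uses the age column of the BP data as the single predictor, a choice made only for illustration; the factor sigma^2 is left out because it multiplies both sides.

```python
import numpy as np

# Single predictor: the age column of the BP data (illustrative choice).
x = np.array([50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20], dtype=float)
n = len(x)
x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

# Numerical inverse of X^t X for the simple linear regression design matrix.
X = np.column_stack([np.ones(n), x])
xtx_inv = np.linalg.inv(X.T @ X)

# Closed form of (X^t X)^{-1}, entry by entry.
closed_form = np.array([
    [1.0 / n + x_bar ** 2 / s_xx, -x_bar / s_xx],
    [-x_bar / s_xx,               1.0 / s_xx],
])

print(np.allclose(xtx_inv, closed_form))  # True
```

The diagonal entries, multiplied by sigma^2, are the variances of the intercept and slope estimators; the off-diagonal entry is their covariance.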
Example
Analysis
Adjusted $R^2$ = 0.9583
A side note: in multiple regression settings, $R^2$ never decreases
as more variables are included in the model, even when they add
little. That is why the adjusted $R^2$ is the preferred measure: it
adjusts for the number of variables considered.
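For reference, the usual definition is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p predictors. A small sketch follows; the function name and the input values are hypothetical and are not the figures behind the 0.9583 reported above.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations.

    Hypothetical helper for illustration; requires n > p + 1.
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Same (made-up) R^2, but a different number of predictors.
# n = 13 matches the size of the BP data set.
print(adjusted_r_squared(0.90, n=13, p=2))
print(adjusted_r_squared(0.90, n=13, p=3))
```

With R^2 held fixed, the adjusted value drops as p grows, so an extra predictor only improves it if it raises R^2 enough to offset the penalty.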
Checking the Regression Model Assumptions
3D Scatter Plot
[Figure: 3D scatterplot of bp against age and Stress]
[Figure: residuals plotted against fitted values]
[Figure: normal Q-Q plot of the residuals, sample quantiles against theoretical quantiles]
Multicollinearity
Covariance
Properties
Covariance: Properties
$$\mathrm{Cov} = \begin{pmatrix} 1.3 & 13.98 \\ 13.98 & 184.82 \end{pmatrix}$$
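A few standard properties can be read off this matrix directly. The sketch below assumes it is the covariance matrix of two variables (the slides do not say which) and uses made-up variable names.

```python
import math

# Covariance matrix shown above; which two variables it describes is not stated.
cov = [[1.3, 13.98],
       [13.98, 184.82]]

var_x, cov_xy = cov[0]
cov_yx, var_y = cov[1]

# Symmetry: Cov(X, Y) equals Cov(Y, X).
is_symmetric = cov_xy == cov_yx

# The diagonal holds the variances, which cannot be negative.
variances_ok = var_x >= 0 and var_y >= 0

# |Cov(X, Y)| <= sd(X) * sd(Y), equivalently the determinant is non-negative.
determinant = var_x * var_y - cov_xy * cov_yx

# Dividing the covariance by both standard deviations gives the correlation.
correlation = cov_xy / math.sqrt(var_x * var_y)

print(is_symmetric, variances_ok, determinant >= 0)
print(round(correlation, 3))  # 0.902
```

The covariance of 13.98 is hard to interpret on its own because it depends on the units of both variables; rescaling it to a correlation removes that dependence.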
https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Correlation
Pearson’s Correlation
Correlation
[Figure: scatterplots illustrating correlation; surviving axis labels: waiting, Postwt, disp]
Correlation: Example
Person              1   2   3   4   5   6   7   8
Arm span (inches)  68  63  65  69  68  69  62  61
Height (inches)    69  62  65  70  67  67  63  62
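Pearson's correlation for these eight people can be computed from the definition r = S_xy / sqrt(S_xx S_yy). This is a plain-Python sketch; the function name `pearson_r` is chosen for this example.

```python
import math

# Arm span and height (inches) for the eight people in the table above.
arm_span = [68, 63, 65, 69, 68, 69, 62, 61]
height = [69, 62, 65, 70, 67, 67, 63, 62]


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(arm_span, height)
print(round(r, 3))  # 0.932
```

Here S_xy = 66.875, S_xx = 75.875, and S_yy = 67.875, so r is about 0.93: a strong positive linear association between arm span and height.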
Spearman’s Rank Correlation
[Figure: two scatterplots, waiting against eruptions and Volume against Pitch]
Introduction
Example
Obs   Volume   Pitch
 1    1760     529
 2    2040     566
 3    2440     473
 4    2550     461
 5    2730     465
 6    2740     532
 7    3010     484
 8    3080     527
 9    3370     488
10    3740     485
11    4910     478
12    5090     434
13    5090     468
14    5380     449
15    5850     425
16    6730     389
17    6990     421
18    7960     416
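Spearman's rank correlation for this table can be sketched as Pearson's correlation applied to the ranks, with tied values given their average rank (Volume 5090 appears twice). The helper names below are chosen for this example, and the slides may work the example by a different route.

```python
import math

# Volume and Pitch for the 18 observations in the table above.
volume = [1760, 2040, 2440, 2550, 2730, 2740, 3010, 3080, 3370,
          3740, 4910, 5090, 5090, 5380, 5850, 6730, 6990, 7960]
pitch = [529, 566, 473, 461, 465, 532, 484, 527, 488,
         485, 478, 434, 468, 449, 425, 389, 421, 416]


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


rho = pearson_r(average_ranks(volume), average_ranks(pitch))
print(round(rho, 3))  # -0.763
```

The negative value says that larger volumes tend to go with lower pitch. Because only ranks enter the calculation, the measure captures monotone association and is not affected by the exact spacing of the values.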
Example