
Statistics

Multiple Linear Regression

Shaheena Bashir

FALL, 2019
Outline

Background
Introduction
Estimation
Example
  Analysis
  Checking the Regression Model Assumptions
Multicollinearity
Covariance
  Properties
Correlation
  Pearson's Correlation
  Spearman's Rank Correlation
Background

Example

Blood pressure tends to increase with age, body mass, and stress.
To investigate the relationship of blood pressure to these variables,
a sample of men in a large corporation was selected. For each
subject, age (years), body mass (kg), and a stress index (ranging
from 0 to 100) were recorded along with blood pressure.

Background

BP Data
Age  BP (mm Hg)  Body Mass (kg)  Stress Index
50 120 55 69
20 141 47 83
20 124 33 77
30 126 65 75
30 117 47 71
50 129 58 73
60 123 46 67
50 125 68 71
40 132 70 77
55 123 42 69
40 132 33 74
40 155 55 86
20 147 48 84
Background

Scatter Plot
[Figure: pairwise scatter plot matrix of age, bp, Stress, and BM for the BP data.]
Background

Regression Models

- BP̂ = 125.94 + 0.10 BM, with R² = 0.011
- BP̂ = 143 − 0.33 age, with R² = 0.147
- BP̂ = 6.4 + 1.65 Stress, with R² = 0.82
- BP̂ = −60.3 + 2.32 Stress + 0.422 age, with adjusted R² = 0.95

Background

Background

- An insurance company is interested in how last year's claims can
  predict a person's time in the hospital this year.
- They want to use the enormous amount of information contained in
  claims to predict a single number. Simple linear regression is not
  equipped to handle more than one predictor.
- How can one generalize SLR to incorporate many independent
  variables (regressors) for the purpose of prediction?
- What are the consequences of adding many regressors (independent
  variables)?

Background

Background Cont’d
- If X is linearly related to Y, then the simple linear regression line
  explains some of the variability in Y.
- In most cases, a lot of variability about the line still remains.
- Some of this unexplained variability may be explained by including
  other predictors in the model.

Introduction

The Multiple Linear Regression Model

- The general multiple linear model extends simple linear regression
  (SLR) by adding terms linearly into the model, i.e.,

      Y = β₀ + β₁X₁ + · · · + βₚXₚ + ε

- The error terms ε are assumed to have mean 0 for every value of x.
- We want to estimate p + 2 parameters (p + 1 regression coefficients
  and 1 residual variance σ²) and make inference about the
  coefficients in the model.
- The interpretation of the parameters β₁, β₂, . . . , βₚ is different
  than in the simple model: multiple regression 'adjusts' a coefficient
  for the linear impact of the other variables.

Estimation

Least Squares/Maximum Likelihood Estimators

- Least squares minimizes Σᵢ (yᵢ − β₀ − β₁xᵢ₁ − · · · − βₚxᵢₚ)²
- On taking partial derivatives and setting them equal to 0, we get
  the p + 1 normal equations, which must be solved simultaneously to
  get the estimators.

Estimation

Regression Model: Matrix Notation


- Let Y be the response vector, Y = (y₁, . . . , yₙ)ᵗ.
- Define the design matrix X to be the n × (p + 1) matrix

      X = [ 1  x₁₁  · · ·  x₁ₚ
            ⋮    ⋮           ⋮
            1  xₙ₁  · · ·  xₙₚ ]

- The error vector is similarly defined as ε = (ε₁, . . . , εₙ)ᵗ and the
  vector of coefficients as β = (β₀, β₁, . . . , βₚ)ᵗ.
- The linear regression model can then be written as

      Y = Xβ + ε
Estimation

Estimation of Parameters of Regression Model: β

- For any given β the residuals can be written as eᵢ = yᵢ − xᵢβ,
  where xᵢ is the iᵗʰ row of X, i = 1, . . . , n.
- Then the sum of squared residuals is

      S(β) = Σᵢ (yᵢ − xᵢβ)² = (Y − Xβ)ᵗ(Y − Xβ)

- On taking derivatives and setting them equal to 0, we get the least
  squares estimates

      β̂ = (XᵗX)⁻¹(XᵗY)

A side note: the estimation method is least squares; the matrix
notation is only a compact way of writing it.
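
As a quick illustration (not part of the original slides), here is a minimal numpy sketch of this estimator; the function name and the random test data are illustrative only:

import numpy as np

def ols_beta(X_raw, y):
    """Least-squares coefficients for Y = X beta + eps; an intercept column is added here."""
    X = np.column_stack([np.ones(len(y)), X_raw])   # design matrix with a leading column of 1s
    # Solve the normal equations (X'X) beta = X'Y directly rather than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 3))                    # n = 50 observations, p = 3 regressors
y = 2.0 + X_raw @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)
print(ols_beta(X_raw, y))                           # close to (2.0, 1.0, -0.5, 0.3)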
Estimation

Estimation of Parameters of Regression Model: σ²

- The fitted values from the regression are Ŷ = Xβ̂.
- The residuals can be written as e = Y − Ŷ = Y − Xβ̂.
- Then the sum of squared residuals is

      SSE = eᵗe = (Y − Xβ̂)ᵗ(Y − Xβ̂)

- The estimator of the residual variance is

      σ̂² = SSE / (n − (p + 1))
Estimation

Simple Linear Regression: Matrix Notation


Simple regression is a special case of multiple regression with
p = 1 and can be formulated in the same matrix framework.
- Let Y be the observed response vector.
- The design matrix X is

      X = [ 1  x₁
            ⋮   ⋮
            1  xₙ ]

- The error vector is similarly defined as ε = (ε₁, . . . , εₙ)ᵗ and the
  vector of coefficients as β = (β₀, β₁)ᵗ.
- The linear regression model can be written as

      Y = Xβ + ε
Estimation

Estimation of Parameters of Regression Model: β

      XᵗX = [ n     nx̄
              nx̄   Σᵢ xᵢ² ]

The inverse of this matrix is

      (XᵗX)⁻¹ = (1 / (n Sxx)) [ Σᵢ xᵢ²   −nx̄
                                −nx̄        n ]

where Sxx = Σᵢ xᵢ² − nx̄² = Σᵢ (xᵢ − x̄)²

      XᵗY = [ Σᵢ yᵢ
              Σᵢ xᵢyᵢ ]

A little algebra shows that these estimates agree with those given
for the simple regression model; the algebra is written out below.
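
A sketch of that algebra in LaTeX (the shorthand S_xy = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) is notation introduced here, not on the slide):

\begin{aligned}
\hat\beta = (X^tX)^{-1}X^tY
  &= \frac{1}{nS_{xx}}
     \begin{pmatrix} \sum_i x_i^2 & -n\bar x \\ -n\bar x & n \end{pmatrix}
     \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}, \\
\hat\beta_1 &= \frac{n\sum_i x_i y_i - n\bar x\sum_i y_i}{nS_{xx}}
            = \frac{\sum_i x_i y_i - n\bar x\bar y}{S_{xx}}
            = \frac{S_{xy}}{S_{xx}}, \\
\hat\beta_0 &= \frac{\sum_i x_i^2 \sum_i y_i - n\bar x \sum_i x_i y_i}{nS_{xx}}
            = \frac{(S_{xx}+n\bar x^2)\bar y - \bar x(S_{xy}+n\bar x\bar y)}{S_{xx}}
            = \bar y - \hat\beta_1 \bar x,
\end{aligned}

which are exactly the slope and intercept estimates from simple linear regression.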
Estimation

Estimation of Parameters of Regression Model: σ²

      σ²(XᵗX)⁻¹ = (σ² / (n Sxx)) [ Σᵢ xᵢ²   −nx̄
                                   −nx̄        n ]

                = [ σ²(1/n + x̄²/Sxx)   −x̄σ²/Sxx
                    −x̄σ²/Sxx            σ²/Sxx  ]

This is the variance-covariance matrix of β̂ = (β̂₀, β̂₁)ᵗ: the diagonal
entries are Var(β̂₀) and Var(β̂₁) for simple linear regression.
Example
Analysis

Multiple Linear Regression Model: BP Data


BP̂ = −61.33 + 0.45 Age + 2.37 Stress − 0.087 BM

- These coefficients are interpreted as the MARGINAL change in blood
  pressure when each variable changes by 1 unit AND ALL OTHER
  VARIABLES REMAIN FIXED.

               Estimate  Std. Error  t value  Pr(>|t|)
  (Intercept)    -61.33       13.22    -4.64      0.00
  Age              0.46        0.07     6.53      0.00
  Stress           2.38        0.15    15.37      0.00
  BM              -0.09        0.06    -1.51      0.17

Adjusted R² = 0.9583

A side note: in multiple regression settings, R² will always increase
as more variables are included in the model. That is why the adjusted
R² is the preferred measure, as it adjusts for the number of variables
considered.
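
For illustration, a sketch of how such a table could be produced in Python with statsmodels, assuming the BP data from the earlier slide has been typed in; the column names (bp, age, stress, bm) are illustrative choices, not from the slides:

import pandas as pd
import statsmodels.formula.api as smf

# BP data as shown in the earlier table (Age, BP, Body Mass, Stress Index).
bp_data = pd.DataFrame({
    "age":    [50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20],
    "bp":     [120, 141, 124, 126, 117, 129, 123, 125, 132, 123, 132, 155, 147],
    "bm":     [55, 47, 33, 65, 47, 58, 46, 68, 70, 42, 33, 55, 48],
    "stress": [69, 83, 77, 75, 71, 73, 67, 71, 77, 69, 74, 86, 84],
})

fit = smf.ols("bp ~ age + stress + bm", data=bp_data).fit()
print(fit.summary())        # estimates, standard errors, t values, and p-values
print(fit.rsquared_adj)     # adjusted R^2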
Example
Checking the Regression Model Assumptions

3D Scatter plot
[Figure: 3D scatterplot of bp against Stress and age for the BP data.]
Example
Checking the Regression Model Assumptions

Residuals vs Fitted Values

[Figure: residuals plotted against the fitted values.]
Example
Checking the Regression Model Assumptions

Normal Probability Plot of Residuals


[Figure: normal Q-Q plot of the residuals (sample quantiles against theoretical quantiles).]
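The two diagnostic plots above can be reproduced from any fit's fitted values and residuals; a minimal matplotlib/scipy sketch (the function name is illustrative):

import matplotlib.pyplot as plt
from scipy import stats

def diagnostic_plots(fitted, resid):
    """Residuals vs fitted values, and a normal Q-Q plot of the residuals."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.scatter(fitted, resid)
    ax1.axhline(0, linestyle="--")
    ax1.set_xlabel("Fitted")
    ax1.set_ylabel("Residuals")
    stats.probplot(resid, dist="norm", plot=ax2)   # sample vs theoretical quantiles
    plt.tight_layout()
    plt.show()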
Multicollinearity

Many correlated variables in the Regression Model?

- Does higher collinearity cause bias?
- How does collinearity affect the variance estimates? (One common
  diagnostic, the variance inflation factor, is sketched below.)
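
A sketch of the variance inflation factor (VIF) diagnostic with statsmodels, assuming X_raw is a numpy array with one column per regressor; the slides do not discuss VIFs further, so this is only an illustration:

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vifs(X_raw):
    """VIF for each regressor; values above roughly 5-10 are a common warning sign of strong collinearity."""
    X = np.column_stack([np.ones(len(X_raw)), X_raw])   # include the intercept column
    return [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]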
Covariance

Covariance

The covariance between two variables X and Y, denoted σ_XY, is:

      Cov(X, Y) = E[(X − E(X)) (Y − E(Y))]
                = E(XY) − E(X) E(Y)

If X and Y are independent, then

      Cov(X, Y) = E(XY) − E(X) E(Y)
                = E(X) E(Y) − E(X) E(Y)
                = 0

However, the converse is not true: for example, if X is symmetric
about 0 and Y = X², then Cov(X, Y) = 0 even though Y depends on X.

Covariance
Properties

Covariance: Properties

Cov(X, Y) = Cov(Y, X)                      (symmetry)
Cov(X, X) = Var(X) ≥ 0                     (positive semi-definiteness)
Cov(aX, Y) = a Cov(X, Y)
Cov(Σᵢ Xᵢ, Σⱼ Yⱼ) = Σᵢ Σⱼ Cov(Xᵢ, Yⱼ)

An example covariance matrix:

      Cov = [  1.30   13.98
              13.98  184.82 ]
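
A small numpy sketch producing a sample covariance matrix with the same 2 × 2 layout (the simulated data are illustrative only):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 10 * x + rng.normal(size=200)   # y is strongly and positively related to x
print(np.cov(x, y))                 # symmetric 2 x 2 matrix: variances on the diagonal,
                                    # Cov(x, y) = Cov(y, x) off the diagonal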

Covariance
Properties

Covariance: Shape of the Data

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Correlation
Pearson’s Correlation

Correlation

[Figure: example scatterplots of waiting vs eruptions, Postwt vs Prewt, and disp vs mpg.]
Correlation
Pearson’s Correlation

The Sample Correlation

- The sample estimate of ρ based on the bivariate random sample
  (y₁, x₁), (y₂, x₂), . . . , (yₙ, xₙ) is:

      r = Σ (yᵢ − ȳ)(xᵢ − x̄) / √( Σ (xᵢ − x̄)² · Σ (yᵢ − ȳ)² )

- This sample quantity satisfies −1 ≤ r ≤ 1.
- It is important to note that r only measures the strength of the
  linear relationship between the two observed variables.
- For a plausible linear relationship between X and Y, r gives a very
  good indication of the strength and direction of that relationship.
Correlation
Pearson’s Correlation

Correlation: Example

The armspan & height of 8 people are given:

Person 1 2 3 4 5 6 7 8
Armspan(inches) 68 63 65 69 68 69 62 61
Height(inches) 69 62 65 70 67 67 63 62

Calculate the strength of the linear relationship between armspan and
height.
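
A quick numeric check with numpy (illustrative, using the data above):

import numpy as np

armspan = np.array([68, 63, 65, 69, 68, 69, 62, 61])
height = np.array([69, 62, 65, 70, 67, 67, 63, 62])
r = np.corrcoef(armspan, height)[0, 1]   # off-diagonal entry of the 2 x 2 correlation matrix
print(r)                                 # Pearson's sample correlation r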

Correlation
Pearson’s Correlation

Interpretation of Correlation Coefficient

- Exactly -1 indicates a perfect downhill (negative) linear relationship.
- Close to -1 indicates a strong downhill (negative) linear relationship.
- Close to 0 means no linear relationship exists.
- Close to +1 indicates a strong uphill (positive) linear relationship.
- Exactly +1 indicates a perfect uphill (positive) linear relationship.

Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Background

[Figure: scatterplots of waiting vs eruptions and Volume vs Pitch.]
Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Background

- Pearson's r measures only linear association and can be strongly
  affected by unusual observations; this motivates a rank-based
  alternative.
- A rank-based measure is also useful when the x and y variables are
  ordinal (their values fall into categories, but the possible values
  can be placed in an order and given a numerical value that has some
  meaning, e.g., grades on a scale of A = 4, B = 3, C = 2, D = 1, and
  E = 0).

Correlation
Spearman’s Rank Correlation

Introduction

- Spearman's rank correlation coefficient, denoted by r_S, is based on
  separate ranks of the x's and y's. Specifically, let rᵢ be the rank
  of xᵢ among x₁, . . . , xₙ and let sᵢ be the rank of yᵢ among
  y₁, . . . , yₙ. The Spearman rank correlation r_S is then simply the
  correlation coefficient computed on the ranks (rᵢ, sᵢ) instead of the
  original observations (xᵢ, yᵢ) (see the short sketch below).
- It is the nonparametric (no assumption of normality) counterpart to
  Pearson's correlation.
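
A minimal sketch of that definition (ranks first, then Pearson's r on the ranks); scipy's rankdata assigns average ranks to ties, and the function name here is illustrative:

import numpy as np
from scipy.stats import rankdata

def spearman_rs(x, y):
    """Spearman's r_S: Pearson's correlation computed on the rank pairs (r_i, s_i)."""
    r_i = rankdata(x)                    # rank of x_i among x_1, ..., x_n
    s_i = rankdata(y)                    # rank of y_i among y_1, ..., y_n
    return np.corrcoef(r_i, s_i)[0, 1]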

Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Properties

I Spearman’s rank applies to ordinal/quantitative data only


I It does not require the variables to be numerical
I Spearmans rank correlation is the same as Pearsons correlation
except that its calculated based on the ranks of the x variable
and the ranks of the y variable rather than their actual values
I Spearman’s rank correlation rs follows −1 < rs < 1.

Correlation
Spearman’s Rank Correlation

Example
Volume Pitch
1 1760 529
2 2040 566
3 2440 473
4 2550 461
5 2730 465
6 2740 532
7 3010 484
8 3080 527
9 3370 488
10 3740 485
11 4910 478
12 5090 434
13 5090 468
14 5380 449
15 5850 425
16 6730 389
17 6990 421
18 7960 416

- Rank the volumes from 1 = lowest to n = highest (where n is the
  number of pairs of data in the data set).
- Compute the ranks for the pitch variable similarly.
- Calculate the correlation coefficient based on the ranks of pitch
  and volume.
Correlation
Spearman’s Rank Correlation

Example

     Pitch  rank(Pitch)  Volume  rank(Volume)


1 529 16.00 1760 1.00
2 566 18.00 2040 2.00
3 473 10.00 2440 3.00
4 461 7.00 2550 4.00
5 465 8.00 2730 5.00
6 532 17.00 2740 6.00
7 484 12.00 3010 7.00
8 527 15.00 3080 8.00
9 488 14.00 3370 9.00
10 485 13.00 3740 10.00
11 478 11.00 4910 11.00
12 434 5.00 5090 12.50
13 468 9.00 5090 12.50
14 449 6.00 5380 14.00
15 425 4.00 5850 15.00
16 389 1.00 6730 16.00
17 421 3.00 6990 17.00
18 416 2.00 7960 18.00
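
As an illustrative check, scipy computes the same quantity directly and, like the table above, uses average ranks for the tied volumes (the two 5090 values):

from scipy import stats

volume = [1760, 2040, 2440, 2550, 2730, 2740, 3010, 3080, 3370,
          3740, 4910, 5090, 5090, 5380, 5850, 6730, 6990, 7960]
pitch = [529, 566, 473, 461, 465, 532, 484, 527, 488,
         485, 478, 434, 468, 449, 425, 389, 421, 416]
rho, pval = stats.spearmanr(volume, pitch)   # Spearman's rank correlation and its p-value
print(rho)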
