
Statistics

Multiple Linear Regression

Shaheena Bashir

FALL, 2019
Outline

Background
Introduction
Estimation
Example
  Analysis
  Checking the Regression Model Assumptions
Multicollinearity
Covariance
  Properties
Correlation
  Pearson's Correlation
  Spearman's Rank Correlation
Background

Example

Blood pressure tends to increase with age, body mass, and stress.
To investigate the relationship of blood pressure to these variables,
a sample of men in a large corporation was selected. For each
subject, age (years), body mass (kg), and a stress index (ranging
from 0 to 100) were recorded along with blood pressure.

Background

BP Data
Age  BP (mm Hg)  Body Mass (kg)  Stress Index
50 120 55 69
20 141 47 83
20 124 33 77
30 126 65 75
30 117 47 71
50 129 58 73
60 123 46 67
50 125 68 71
40 132 70 77
55 123 42 69
40 132 33 74
40 155 55 86
20 147 48 84
Background

Scatter Plot
[Figure: pairwise scatter plot matrix of age, bp, Stress, and BM for the BP data.]
Background

Regression Models

- BP̂ = 125.94 + 0.10 BM, with R² = 0.011
- BP̂ = 143 − 0.33 age, with R² = 0.147
- BP̂ = 6.4 + 1.65 Stress, with R² = 0.82
- BP̂ = −60.3 + 2.32 Stress + 0.422 age, with adjusted R² = 0.95

Background

Background

- An insurance company is interested in how last year's claims can
  predict a person's time in the hospital this year.
- They want to use the enormous amount of information contained in
  claims to predict a single number. Simple linear regression is not
  equipped to handle more than one predictor.
- How can one generalize SLR to incorporate many independent
  variables (regressors) for the purpose of prediction?
- What are the consequences of adding many regressors (independent
  variables)?

Background

Background Cont’d
- If X is linearly related to Y, then the simple linear regression line
  explains some of the variability in Y.
- In most cases, a lot of variability about the line still remains.
- Some of this unexplained variability may be explained by including
  other predictors in the model.

Introduction

The Multiple Linear Regression Model

- The general multiple linear model extends simple linear regression
  (SLR) by adding terms linearly into the model, i.e.,

      Y = β₀ + β₁X₁ + · · · + βₚXₚ + ε

- The error terms ε are assumed to have mean 0 for every value of x.
- We want to estimate p + 2 parameters (p + 1 regression coefficients
  and 1 residual variance σ²) and make inference about the
  coefficients in the model.
- The interpretation of the parameters β₁, β₂, . . . , βₚ is different
  than in the simple model: multiple regression 'adjusts' a coefficient
  for the linear impact of the other variables.

Estimation

Least Squares/Maximum Likelihood Estimators

- Least squares minimizes Σᵢ (yᵢ − β₀ − β₁xᵢ₁ − · · · − βₚxᵢₚ)²
- On taking partial derivatives and setting them equal to 0, we get
  the p + 1 normal equations, which must be solved simultaneously to
  get the estimators.

Estimation

Regression Model: Matrix Notation


- Let Y be the response vector, Y = (y₁, . . . , yₙ)ᵗ.
- Define the design matrix X to be the n × (p + 1) matrix

      X = [ 1  x₁₁  · · ·  x₁ₚ
            ⋮    ⋮           ⋮
            1  xₙ₁  · · ·  xₙₚ ]

- The error vector is similarly defined as ε = (ε₁, . . . , εₙ)ᵗ and the
  vector of coefficients as β = (β₀, β₁, . . . , βₚ)ᵗ.
- The linear regression model can then be written as

      Y = Xβ + ε
Estimation

Estimation of Parameters of Regression Model: β

- For any given β the residuals can be written as eᵢ = yᵢ − xᵢβ,
  where xᵢ is the iᵗʰ row of X, i = 1, . . . , n.
- Then the sum of squared residuals is

      S(β) = Σᵢ (yᵢ − xᵢβ)² = (Y − Xβ)ᵗ(Y − Xβ)

- On taking derivatives and setting them equal to 0, we get the least
  squares estimates

      β̂ = (XᵗX)⁻¹(XᵗY)

A side note: the estimation method is least squares; the matrix
notation is only a compact way of writing it.
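
As a quick illustration (not part of the original slides), here is a minimal numpy sketch of this estimator; the function name and the random test data are illustrative only:

import numpy as np

def ols_beta(X_raw, y):
    """Least-squares coefficients for Y = X beta + eps; an intercept column is added here."""
    X = np.column_stack([np.ones(len(y)), X_raw])   # design matrix with a leading column of 1s
    # Solve the normal equations (X'X) beta = X'Y directly rather than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 3))                    # n = 50 observations, p = 3 regressors
y = 2.0 + X_raw @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)
print(ols_beta(X_raw, y))                           # close to (2.0, 1.0, -0.5, 0.3)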
Estimation

Estimation of Parameters of Regression Model: σ²

- The fitted values from the regression are Ŷ = Xβ̂.
- The residuals can be written as e = Y − Ŷ = Y − Xβ̂.
- Then the sum of squared residuals is

      SSE = eᵗe = (Y − Xβ̂)ᵗ(Y − Xβ̂)

- The estimator of the residual variance is

      σ̂² = SSE / (n − (p + 1))
Estimation

Simple Linear Regression: Matrix Notation


Simple regression is a special case of multiple regression with
p = 1 and can be formulated in the same matrix framework.
- Let Y be the observed response vector.
- The design matrix X is

      X = [ 1  x₁
            ⋮   ⋮
            1  xₙ ]

- The error vector is similarly defined as ε = (ε₁, . . . , εₙ)ᵗ and the
  vector of coefficients as β = (β₀, β₁)ᵗ.
- The linear regression model can be written as

      Y = Xβ + ε
Estimation

Estimation of Parameters of Regression Model: β

      XᵗX = [ n     nx̄
              nx̄   Σᵢ xᵢ² ]

The inverse of this matrix is

      (XᵗX)⁻¹ = (1 / (n Sxx)) [ Σᵢ xᵢ²   −nx̄
                                −nx̄        n ]

where Sxx = Σᵢ xᵢ² − nx̄² = Σᵢ (xᵢ − x̄)²

      XᵗY = [ Σᵢ yᵢ
              Σᵢ xᵢyᵢ ]

A little algebra shows that these estimates agree with those given
for the simple regression model; the algebra is written out below.
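
A sketch of that algebra in LaTeX (the shorthand S_xy = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) is notation introduced here, not on the slide):

\begin{aligned}
\hat\beta = (X^tX)^{-1}X^tY
  &= \frac{1}{nS_{xx}}
     \begin{pmatrix} \sum_i x_i^2 & -n\bar x \\ -n\bar x & n \end{pmatrix}
     \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}, \\
\hat\beta_1 &= \frac{n\sum_i x_i y_i - n\bar x\sum_i y_i}{nS_{xx}}
            = \frac{\sum_i x_i y_i - n\bar x\bar y}{S_{xx}}
            = \frac{S_{xy}}{S_{xx}}, \\
\hat\beta_0 &= \frac{\sum_i x_i^2 \sum_i y_i - n\bar x \sum_i x_i y_i}{nS_{xx}}
            = \frac{(S_{xx}+n\bar x^2)\bar y - \bar x(S_{xy}+n\bar x\bar y)}{S_{xx}}
            = \bar y - \hat\beta_1 \bar x,
\end{aligned}

which are exactly the slope and intercept estimates from simple linear regression.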
Estimation

Estimation of Parameters of Regression Model: σ²

      σ²(XᵗX)⁻¹ = (σ² / (n Sxx)) [ Σᵢ xᵢ²   −nx̄
                                   −nx̄        n ]

                = [ σ²(1/n + x̄²/Sxx)   −x̄σ²/Sxx
                    −x̄σ²/Sxx            σ²/Sxx  ]

This is the variance-covariance matrix of β̂ = (β̂₀, β̂₁)ᵗ: the diagonal
entries are Var(β̂₀) and Var(β̂₁) for simple linear regression.
Example
Analysis

Multiple Linear Regression Model: BP Data


BP̂ = −61.33 + 0.45 Age + 2.37 Stress − 0.087 BM

- These coefficients are interpreted as the MARGINAL change in blood
  pressure when each variable changes by 1 unit AND ALL OTHER
  VARIABLES REMAIN FIXED.

               Estimate  Std. Error  t value  Pr(>|t|)
  (Intercept)    -61.33       13.22    -4.64      0.00
  Age              0.46        0.07     6.53      0.00
  Stress           2.38        0.15    15.37      0.00
  BM              -0.09        0.06    -1.51      0.17

Adjusted R² = 0.9583

A side note: in multiple regression settings, R² will always increase
as more variables are included in the model. That is why the adjusted
R² is the preferred measure, as it adjusts for the number of variables
considered.
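
For illustration, a sketch of how such a table could be produced in Python with statsmodels, assuming the BP data from the earlier slide has been typed in; the column names (bp, age, stress, bm) are illustrative choices, not from the slides:

import pandas as pd
import statsmodels.formula.api as smf

# BP data as shown in the earlier table (Age, BP, Body Mass, Stress Index).
bp_data = pd.DataFrame({
    "age":    [50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20],
    "bp":     [120, 141, 124, 126, 117, 129, 123, 125, 132, 123, 132, 155, 147],
    "bm":     [55, 47, 33, 65, 47, 58, 46, 68, 70, 42, 33, 55, 48],
    "stress": [69, 83, 77, 75, 71, 73, 67, 71, 77, 69, 74, 86, 84],
})

fit = smf.ols("bp ~ age + stress + bm", data=bp_data).fit()
print(fit.summary())        # estimates, standard errors, t values, and p-values
print(fit.rsquared_adj)     # adjusted R^2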
Example
Checking the Regression Model Assumptions

3D Scatter plot
[Figure: 3D scatterplot of bp against Stress and age for the BP data.]
Example
Checking the Regression Model Assumptions

Residuals vs Fitted Values

[Figure: residuals plotted against the fitted values.]
Example
Checking the Regression Model Assumptions

Normal Probability Plot of Residuals


[Figure: normal Q-Q plot of the residuals (sample quantiles against theoretical quantiles).]
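The two diagnostic plots above can be reproduced from any fit's fitted values and residuals; a minimal matplotlib/scipy sketch (the function name is illustrative):

import matplotlib.pyplot as plt
from scipy import stats

def diagnostic_plots(fitted, resid):
    """Residuals vs fitted values, and a normal Q-Q plot of the residuals."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.scatter(fitted, resid)
    ax1.axhline(0, linestyle="--")
    ax1.set_xlabel("Fitted")
    ax1.set_ylabel("Residuals")
    stats.probplot(resid, dist="norm", plot=ax2)   # sample vs theoretical quantiles
    plt.tight_layout()
    plt.show()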
Multicollinearity

Many correlated variables in the Regression Model?

- Does higher collinearity cause bias?
- How does collinearity affect the variance estimates? (One common
  diagnostic, the variance inflation factor, is sketched below.)
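
A sketch of the variance inflation factor (VIF) diagnostic with statsmodels, assuming X_raw is a numpy array with one column per regressor; the slides do not discuss VIFs further, so this is only an illustration:

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vifs(X_raw):
    """VIF for each regressor; values above roughly 5-10 are a common warning sign of strong collinearity."""
    X = np.column_stack([np.ones(len(X_raw)), X_raw])   # include the intercept column
    return [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]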
Covariance

Covariance

The covariance between two variables X and Y, denoted σ_XY, is:

      Cov(X, Y) = E[(X − E(X)) (Y − E(Y))]
                = E(XY) − E(X) E(Y)

If X and Y are independent, then

      Cov(X, Y) = E(XY) − E(X) E(Y)
                = E(X) E(Y) − E(X) E(Y)
                = 0

However, the converse is not true: for example, if X is symmetric
about 0 and Y = X², then Cov(X, Y) = 0 even though Y depends on X.

Covariance
Properties

Covariance: Properties

Cov(X, Y) = Cov(Y, X)                      (symmetry)
Cov(X, X) = Var(X) ≥ 0                     (positive semi-definiteness)
Cov(aX, Y) = a Cov(X, Y)
Cov(Σᵢ Xᵢ, Σⱼ Yⱼ) = Σᵢ Σⱼ Cov(Xᵢ, Yⱼ)

An example covariance matrix:

      Cov = [  1.30   13.98
              13.98  184.82 ]
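
A small numpy sketch producing a sample covariance matrix with the same 2 × 2 layout (the simulated data are illustrative only):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 10 * x + rng.normal(size=200)   # y is strongly and positively related to x
print(np.cov(x, y))                 # symmetric 2 x 2 matrix: variances on the diagonal,
                                    # Cov(x, y) = Cov(y, x) off the diagonal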

Covariance
Properties

Covariance: Shape of the Data

https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/
Correlation
Pearson’s Correlation

Correlation

[Figure: example scatterplots of waiting vs eruptions, Postwt vs Prewt, and disp vs mpg.]
Correlation
Pearson’s Correlation

The Sample Correlation

- The sample estimate of ρ based on the bivariate random sample
  (y₁, x₁), (y₂, x₂), . . . , (yₙ, xₙ) is:

      r = Σ (yᵢ − ȳ)(xᵢ − x̄) / √( Σ (xᵢ − x̄)² · Σ (yᵢ − ȳ)² )

- This sample quantity satisfies −1 ≤ r ≤ 1.
- It is important to note that r only measures the strength of the
  linear relationship between the two observed variables.
- For a plausible linear relationship between X and Y, r gives a very
  good indication of the strength and direction of that relationship.
Correlation
Pearson’s Correlation

Correlation: Example

The armspan & height of 8 people are given:

Person 1 2 3 4 5 6 7 8
Armspan(inches) 68 63 65 69 68 69 62 61
Height(inches) 69 62 65 70 67 67 63 62

Calculate the strength of the linear relationship between armspan and
height.
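
A quick numeric check with numpy (illustrative, using the data above):

import numpy as np

armspan = np.array([68, 63, 65, 69, 68, 69, 62, 61])
height = np.array([69, 62, 65, 70, 67, 67, 63, 62])
r = np.corrcoef(armspan, height)[0, 1]   # off-diagonal entry of the 2 x 2 correlation matrix
print(r)                                 # Pearson's sample correlation r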

Correlation
Pearson’s Correlation

Interpretation of Correlation Coefficient

- Exactly -1 indicates a perfect downhill (negative) linear relationship.
- Close to -1 indicates a strong downhill (negative) linear relationship.
- Close to 0 means no linear relationship exists.
- Close to +1 indicates a strong uphill (positive) linear relationship.
- Exactly +1 indicates a perfect uphill (positive) linear relationship.

Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Background

[Figure: scatterplots of waiting vs eruptions and Volume vs Pitch.]
Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Background

- Pearson's r measures only linear association and can be strongly
  affected by unusual observations; this motivates a rank-based
  alternative.
- A rank-based measure is also useful when the x and y variables are
  ordinal (their values fall into categories, but the possible values
  can be placed in an order and given a numerical value that has some
  meaning, e.g., grades on a scale of A = 4, B = 3, C = 2, D = 1, and
  E = 0).

Correlation
Spearman’s Rank Correlation

Introduction

- Spearman's rank correlation coefficient, denoted by r_S, is based on
  separate ranks of the x's and y's. Specifically, let rᵢ be the rank
  of xᵢ among x₁, . . . , xₙ and let sᵢ be the rank of yᵢ among
  y₁, . . . , yₙ. The Spearman rank correlation r_S is then simply the
  correlation coefficient computed on the ranks (rᵢ, sᵢ) instead of the
  original observations (xᵢ, yᵢ) (see the short sketch below).
- It is the nonparametric (no assumption of normality) counterpart to
  Pearson's correlation.
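
A minimal sketch of that definition (ranks first, then Pearson's r on the ranks); scipy's rankdata assigns average ranks to ties, and the function name here is illustrative:

import numpy as np
from scipy.stats import rankdata

def spearman_rs(x, y):
    """Spearman's r_S: Pearson's correlation computed on the rank pairs (r_i, s_i)."""
    r_i = rankdata(x)                    # rank of x_i among x_1, ..., x_n
    s_i = rankdata(y)                    # rank of y_i among y_1, ..., y_n
    return np.corrcoef(r_i, s_i)[0, 1]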

Correlation
Spearman’s Rank Correlation

Spearman’s Rank Correlation: Properties

I Spearman’s rank applies to ordinal/quantitative data only


I It does not require the variables to be numerical
I Spearmans rank correlation is the same as Pearsons correlation
except that its calculated based on the ranks of the x variable
and the ranks of the y variable rather than their actual values
I Spearman’s rank correlation rs follows −1 < rs < 1.

Correlation
Spearman’s Rank Correlation

Example
Volume Pitch
1 1760 529
2 2040 566
3 2440 473
4 2550 461
5 2730 465
6 2740 532
7 3010 484
8 3080 527
9 3370 488
10 3740 485
11 4910 478
12 5090 434
13 5090 468
14 5380 449
15 5850 425
16 6730 389
17 6990 421
18 7960 416

- Rank the volumes from 1 = lowest to n = highest (where n is the
  number of pairs of data in the data set).
- Compute the ranks for the pitch variable similarly.
- Calculate the correlation coefficient based on the ranks of pitch
  and volume.
Correlation
Spearman’s Rank Correlation

Example

     Pitch  rank(Pitch)  Volume  rank(Volume)


1 529 16.00 1760 1.00
2 566 18.00 2040 2.00
3 473 10.00 2440 3.00
4 461 7.00 2550 4.00
5 465 8.00 2730 5.00
6 532 17.00 2740 6.00
7 484 12.00 3010 7.00
8 527 15.00 3080 8.00
9 488 14.00 3370 9.00
10 485 13.00 3740 10.00
11 478 11.00 4910 11.00
12 434 5.00 5090 12.50
13 468 9.00 5090 12.50
14 449 6.00 5380 14.00
15 425 4.00 5850 15.00
16 389 1.00 6730 16.00
17 421 3.00 6990 17.00
18 416 2.00 7960 18.00
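
As an illustrative check, scipy computes the same quantity directly and, like the table above, uses average ranks for the tied volumes (the two 5090 values):

from scipy import stats

volume = [1760, 2040, 2440, 2550, 2730, 2740, 3010, 3080, 3370,
          3740, 4910, 5090, 5090, 5380, 5850, 6730, 6990, 7960]
pitch = [529, 566, 473, 461, 465, 532, 484, 527, 488,
         485, 478, 434, 468, 449, 425, 389, 421, 416]
rho, pval = stats.spearmanr(volume, pitch)   # Spearman's rank correlation and its p-value
print(rho)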
