How to do a linear regression?

Tom Broekel
Diagnostics: Multicollinearity

1 © TBroekel
Diagnostics of linear regression: Multicollinearity

Strong correlation between explanatory variables may cause multicollinearity

Perfect multicollinearity: at least one explanatory variable is a linear combination of the other explanatory variables

Isolating one variable's influence is difficult, as this variable's variance is largely explained by the others (see the relation between simple and multiple regression)
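
The perfect case can be reproduced with a small simulation. The sketch below is only an illustration: the data are simulated and all variable names and coefficients are assumptions, not part of the lecture material. It builds x3 as an exact linear combination of x1 and x2 and fits the regression with base R's lm().

```r
# Simulated data; all names and coefficients here are illustrative assumptions.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 + x2              # exact linear combination of x1 and x2
y  <- 1 + x1 + x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# lm() cannot separate x3 from x1 and x2, so its coefficient is reported as NA
print(coef(fit))
```

With perfect multicollinearity, lm() drops the redundant variable instead of estimating it; with strong but imperfect correlation it still returns estimates, only less precise ones.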


Symptoms of multicollinearity

Unstable estimates: the variance of the coefficients is too large (inflated)

Impact of an individual explanatory variable is difficult to identify (coefficients are usually insignificant)

Relatively large R² with few significant explanatory variables

Strongly varying results when the model is changed slightly (e.g. a different selection of explanatory variables)

Strong correlation between explanatory variables (Pearson correlation above 0.7 in absolute value)


Tests of multicollinearity

Correlation analysis of explanatory variables. However, multicollinearity might be a multivariate problem that pairwise correlations do not reveal

Exact test: variance inflation factor (VIF)
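
The correlation analysis could look roughly like the following sketch. It uses the mtcars data that ships with base R; treating disp, hp, wt and qsec as the explanatory variables is an assumption made purely for illustration, and the 0.7 cut-off is the rule of thumb from the symptoms slide.

```r
# mtcars ships with base R; choosing these four columns as explanatory
# variables is an assumption made only for illustration.
X <- mtcars[, c("disp", "hp", "wt", "qsec")]

# Pairwise Pearson correlations among the explanatory variables
r <- cor(X)
print(round(r, 2))

# List each pair once (upper triangle) whose absolute correlation exceeds 0.7
high <- which(abs(r) > 0.7 & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r    = r[high]
)
print(flagged)
```

A clean correlation matrix does not rule out multicollinearity, because one variable can be close to a combination of several others without being strongly correlated with any single one of them.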


Variance inflation factor

Estimation of auxiliary regressions: regressions of one explanatory variable on all other explanatory variables

Original regression with three explanatory variables

Y~a+b1*X1+b2*X2+b3*X3+e

Auxiliary regressions

X1~a+b2*X2+b3*X3

X2~a+b1*X1+b3*X3

X3~a+b1*X1+b2*X2



Goodness of fit (R²) of the auxiliary regressions indicates the strength of the linear relationships among the explanatory variables

The transformed R² of the auxiliary regression with explanatory variable i as dependent variable is that variable's variance inflation factor in the original regression:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
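
The auxiliary-regression logic can be written out by hand in base R. This is a minimal sketch, not the lecture's own code: the mtcars data and the four chosen variables are assumptions for illustration. Each variable is regressed on the remaining ones and the resulting R² is transformed with 1 / (1 - R²).

```r
# mtcars and the chosen variables are illustrative assumptions.
vars <- c("disp", "hp", "wt", "qsec")

# For each explanatory variable, regress it on all the others and
# transform the auxiliary R-squared into a VIF.
vif_manual <- sapply(vars, function(v) {
  others <- setdiff(vars, v)
  aux    <- lm(reformulate(others, response = v), data = mtcars)
  r2     <- summary(aux)$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

Because R² lies between 0 and 1, every VIF is at least 1; a value of 1 means the variable is not linearly related to the other explanatory variables at all.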


Large VIF (>5) indicates potential multicollinearity

Very large VIF (>10) indicates that multicollinearity is almost certain
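
To see what these thresholds mean in terms of the auxiliary regressions, the definition of the VIF as 1 / (1 - R²) can be solved for R². The short calculation below is a worked illustration of that arithmetic:

```latex
R_i^2 = 1 - \frac{1}{\mathrm{VIF}_i}, \qquad
\mathrm{VIF}_i = 5 \;\Rightarrow\; R_i^2 = 0.8, \qquad
\mathrm{VIF}_i = 10 \;\Rightarrow\; R_i^2 = 0.9
```

A VIF of 5 therefore corresponds to 80% of a variable's variance being explained by the other explanatory variables, and a VIF of 10 to 90%.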


How to deal with multicollinearity

Theory! Are the explanatory variables too similar (spurious correlation)?

Eliminate problematic variables

Identify problematic variable combinations and stick to multicollinearity-free models with relatively larger R²

Reduce the dimensions of the explanatory variables (factor analysis, stepwise regression)

Demean the variables (X - mean(X))

Change the model from levels to growth rates


Stepwise regression

Automatic procedure for finding the empirically best combination of explanatory variables without multicollinearity issues

Backward selection

Initial model with all explanatory variables

Usually based on the F-test, R², or the Akaike information criterion (AIC)

Stepwise elimination of variables whose removal does not significantly harm the model

Is the model significantly worse when a variable is excluded?
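
One way to try backward selection is base R's step() function, which uses the AIC as its criterion rather than an F-test. The following is a sketch under assumptions: the mtcars data and this particular full model are chosen only for illustration.

```r
# mtcars and this particular full model are illustrative assumptions.
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Backward elimination: step() drops one term at a time for as long as
# doing so lowers the AIC (trace = 0 suppresses the step-by-step log).
reduced <- step(full, direction = "backward", trace = 0)

print(formula(reduced))
print(c(full = extractAIC(full)[2], reduced = extractAIC(reduced)[2]))
```

Note that step() optimises the AIC only. It does not check VIF values, so the selected model should still be tested for multicollinearity afterwards.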


Forward selection

Like backward, but model starts with one explanatory variable and adds
variables that significantly improve the model
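
Forward selection can be sketched with the same base R function, step(). As an assumption for illustration, the example uses mtcars and the same four candidate variables; unlike the description above, it starts from an intercept-only model, which is the usual starting point for step(), and adds variables while the AIC keeps falling.

```r
# mtcars and the candidate variables are illustrative assumptions.
null  <- lm(mpg ~ 1, data = mtcars)      # intercept-only starting model
upper <- ~ disp + hp + wt + qsec         # largest model that may be reached

# Forward selection: step() adds one term at a time for as long as
# doing so lowers the AIC.
forward <- step(null,
                scope     = list(lower = ~ 1, upper = upper),
                direction = "forward",
                trace     = 0)

print(formula(forward))
```

Forward and backward selection need not end at the same model, which is one reason to treat their results with care.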


Stepwise regression with problems

Testing whether the model is significantly improved or worsened is not straightforward

Theoretically relevant variables may be excluded

Information on variables' insignificance is as interesting as their significance!


Testing for multicollinearity in R

Use the function ols_vif_tol() in the package olsrr

The function can be applied directly to the regression results object

VIF values far below 5 ➡ no concerns about multicollinearity
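
A minimal usage sketch, assuming the olsrr package is installed. The mtcars data and the model specification are illustrative assumptions, not the regression used in the lecture.

```r
# Assumes the olsrr package is installed; mtcars and the model are illustrative.
library(olsrr)

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Returns one row per explanatory variable with its tolerance and VIF
vif_table <- ols_vif_tol(model)
print(vif_table)
```

The tolerance reported next to each VIF is its reciprocal, i.e. 1 - R² of the corresponding auxiliary regression, so small tolerances and large VIFs carry the same information.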



Good habit: even if the test for multicollinearity yields VIF values smaller than 5, report the same model for different combinations of explanatory variables
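
Such a comparison could be set up as in the sketch below. The mtcars data and the three specifications are assumptions chosen only for illustration. It fits the same dependent variable with different sets of explanatory variables and collects the estimate and standard error of one coefficient (wt) across them.

```r
# mtcars and these three specifications are illustrative assumptions.
specs <- list(
  full    = mpg ~ disp + hp + wt,
  no_disp = mpg ~ hp + wt,
  no_hp   = mpg ~ disp + wt
)

# Fit every specification on the same data
fits <- lapply(specs, lm, data = mtcars)

# Coefficient of wt and its standard error in each specification
wt_rows <- t(sapply(fits, function(f) {
  summary(f)$coefficients["wt", c("Estimate", "Std. Error")]
}))
print(round(wt_rows, 3))
```

If the estimate or its standard error changes strongly between specifications, that is a hint that the variable's effect is hard to separate from the others.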
