
Lecture 9

Multicollinearity
Overview

• Introduction: Basic Concepts of Econometrics
• Single-Equation Regression Models: Two-Variable Regression Models, Multiple Regression Models, Dummy Variable Regression Models
• Relaxing the Assumptions: Multicollinearity, Heteroscedasticity, Autocorrelation
• Modelling: Model Specification, Diagnostic Testing
Contents
• Multicollinearity
– The nature of the problem
– Consequences
– Detection
– Remedial measures
CLRM and Least Squares Principles
• To estimate and make inferences about the true βi
– We need assumptions about the X's and the error term
• CLRM assumptions pertaining to the PRF:
– Linear in the parameters
– Fixed X's, or X's independent of the error term
– Zero mean value of the disturbance
– Homoscedasticity
– No autocorrelation between disturbances
Properties of LS Estimators
• OLS estimators are BLUE, as long as the CLRM assumptions are satisfied
• OLS estimators are random variables
– Their values change from sample to sample
• To make inferences about the true parameter values:
– Hypothesis testing: how close is the estimated value to the true value?
• To relate the estimators to their true values, we need to know their probability distributions
Assumptions of CLRM
1. The regression model is linear in the parameters
2. The X's are fixed, or the X values are independent of the error term (i.e., no covariance between them)
3. The mean value of the disturbance ui is zero
4. The variance of ui is constant (homoscedastic)
5. There is no autocorrelation between the ui
6. n > k: the number of observations exceeds the number of parameters
7. There must be sufficient variation in the values of the X variables
8. There is no exact multicollinearity between the X variables
9. The model is correctly specified
10. The stochastic (disturbance) term ui is normally distributed
Relaxation of the Assumptions
• Lecture 9: assumption 8
• Lecture 10: assumption 4
• Lecture 11: assumption 5
• Lecture 12: assumption 9
• For each relaxed assumption, we will:
o Identify the nature of the problem
o Examine its consequences
o Suggest methods of detecting it
o Consider remedial measures
Multicollinearity: The Nature
• The existence of a linear relationship among some or all explanatory variables
– Non-linear relationships (e.g., between X and X2) do not violate the assumption
– It concerns relationships among the X's, not causation between X and Y
• Sources:
– Data collection
• Small sample size, insufficient variability in the X's
– Model specification
• Common time trends or phenomena
– E.g., stock prices and GDP; unemployment rate, poverty and average income
– Over-determined model
• Too many explanatory variables
• Since the X's are supposed to be fixed, this is a sample phenomenon
• It is a question of the degree of multicollinearity, not of its presence
– Multicollinearity is almost always present
Multicollinearity: The Nature
• Exact linear relationship between X's (perfect collinearity)
– Example: X1 = X3 + 4
• Near collinearity (highly correlated X's, but not a perfect relationship)
– Example: X1 = X3 + 4 + ν, where ν is a random error term
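A minimal Python sketch of this distinction, using illustrative synthetic data (not the lecture's dataset): under the exact relationship the design matrix loses rank and OLS cannot be computed, while under near collinearity the matrix is full rank but badly conditioned.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x3 = rng.normal(10.0, 2.0, n)

    # Perfect collinearity: X1 is an exact linear function of X3
    X_perfect = np.column_stack([np.ones(n), x3 + 4, x3])
    # Near collinearity: the same relationship plus a small random error
    X_near = np.column_stack([np.ones(n), x3 + 4 + rng.normal(0.0, 0.1, n), x3])

    print(np.linalg.matrix_rank(X_perfect))  # 2 < 3 columns: X'X is singular
    print(np.linalg.matrix_rank(X_near))     # 3: estimable, but ill-conditioned
    print(np.linalg.cond(X_near))            # very large condition number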
Multicollinearity Cases
(a) No multicollinearity
• The regression would appear identical to separate bivariate regressions
(b) Perfect multicollinearity
• The regression coefficients of the X variables are indeterminate and their standard errors are infinite
• A model cannot be estimated under such circumstances
(c) High degree of multicollinearity
• The variances and covariances of the βi's are inflated
– Lower t ratios
• The β's remain unbiased, and BLUE
• The inflated (too large) variances make the t statistics too small
Multicollinearity: Consequences
• Estimates are still unbiased
– Unbiasedness is a repeated-sampling property
• The average of the sample estimates converges to the true population value as the number of samples increases
• Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult
• Standard errors of the estimates increase
• t ratios of coefficients tend to be statistically insignificant
– Coefficients may have the wrong signs
– Confidence intervals tend to be much wider, so the probability of accepting a false hypothesis (a Type II error) increases
• The overall measure of goodness of fit can still be high
– High R2
• Estimates vary widely if the model specification is changed
– OLS estimators are sensitive; see the sketch and the example below
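A small Monte Carlo sketch of the variance inflation (hypothetical setup, not from the lecture): many samples are drawn from y = 1 + 2·x1 + 3·x2 + u, and the spread of the OLS estimate of β1 is compared when x1 and x2 are nearly uncorrelated versus highly correlated. The estimator stays centred on 2 in both cases (unbiased), but its sampling variance explodes under high correlation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 30, 2000

    def beta1_estimates(rho):
        est = []
        for _ in range(reps):
            x1 = rng.normal(size=n)
            # x2 correlated with x1 at population correlation rho
            x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
            y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
            X = np.column_stack([np.ones(n), x1, x2])
            beta = np.linalg.lstsq(X, y, rcond=None)[0]
            est.append(beta[1])
        return np.array(est)

    print(beta1_estimates(0.10).std())  # small spread around the true value 2
    print(beta1_estimates(0.99).std())  # much larger spread: inflated variance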
Example
The two regressions below use the same specification; the second simply adds one more year of data (1962). The coefficients on X1 and X5 change sign, illustrating how sensitive OLS estimates are when the regressors are highly collinear.

Regression 1
Dependent Variable: Y
Method: Least Squares
Sample: 1947 1961
Included observations: 15

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X1         -2.051082     8.709740     -0.235493     0.8197
X2         -0.027334     0.033175     -0.823945     0.4338
X3         -1.952293     0.476701     -4.095429     0.0035
X4         -0.958239     0.216227     -4.431634     0.0022
X5          0.051340     0.233968      0.219430     0.8318
X6          1585.156     482.6832      3.284049     0.0111
C           67271.28     23237.42      2.894954     0.0200

R-squared 0.995512     Mean dependent var 64968.07
Adjusted R-squared 0.992146     S.D. dependent var 3335.820
S.E. of regression 295.6219     Akaike info criterion 14.52076
Sum squared resid 699138.2     Schwarz criterion 14.85119
Log likelihood -101.9057     F-statistic 295.7710
Durbin-Watson stat 2.492491     Prob(F-statistic) 0.000000

Regression 2
Dependent Variable: Y
Method: Least Squares
Sample: 1947 1962
Included observations: 16

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X1          1.506187     8.491493      0.177376     0.8631
X2         -0.035819     0.033491     -1.069516     0.3127
X3         -2.020230     0.488400     -4.136427     0.0025
X4         -1.033227     0.214274     -4.821985     0.0009
X5         -0.051104     0.226073     -0.226051     0.8262
X6          1829.151     455.4785      4.015890     0.0030
C           77270.12     22506.71      3.433204     0.0075

R-squared 0.995479     Mean dependent var 65317.00
Adjusted R-squared 0.992465     S.D. dependent var 3511.968
S.E. of regression 304.8541     Akaike info criterion 14.57718
Sum squared resid 836424.1     Schwarz criterion 14.91519
Log likelihood -109.6174     F-statistic 330.2853
Durbin-Watson stat 2.559488     Prob(F-statistic) 0.000000
Multicollinearity: Detection
• Rules of thumb:
• High R2 but few significant t ratios
– The F test is significant but the individual t tests are not
• The individual effects of the variables cannot be separated because they are highly correlated
• High pair-wise correlations among the X's
– Highly interdependent X's
– A sufficient but not a necessary condition
• Auxiliary regressions (see the sketch below)
– Klein's rule of thumb: multicollinearity is troublesome if the R2 obtained from an auxiliary regression (of one X on the others) exceeds the overall R2 (of Y on all the X's)
• Variance Inflation Factor (VIF)
– VIF > 10 suggests serious multicollinearity

VIFj = 1 / (1 − Rj2), where Rj2 is the R2 from the regression of Xj on the remaining (k − 2) regressors
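A sketch of the auxiliary-regression check with statsmodels. The variable names X1, X2, X3 and the synthetic data are assumptions for illustration, not the lecture's dataset: each regressor is regressed on the others, and its auxiliary R2 is compared with the overall R2 (Klein's rule).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 16
    X1 = rng.normal(100, 10, n)
    X3 = X1 * 10 + rng.normal(0, 5, n)   # X3 nearly collinear with X1
    X2 = 0.2 * X1 + rng.normal(0, 3, n)
    Y = 5 + 0.3 * X1 + 0.1 * X2 + 0.05 * X3 + rng.normal(0, 2, n)

    X = np.column_stack([X1, X2, X3])
    overall_r2 = sm.OLS(Y, sm.add_constant(X)).fit().rsquared

    for j, name in enumerate(["X1", "X2", "X3"]):
        others = np.delete(X, j, axis=1)
        aux_r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        flag = "troublesome (Klein)" if aux_r2 > overall_r2 else "ok"
        print(name, round(aux_r2, 4), flag)
    print("overall R2:", round(overall_r2, 4))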
High R2 but few significant t ratios
Dependent Variable: Y
Method: Least Squares
Date: 03/03/20 Time: 18:06
Sample: 1990 2005
Included observations: 16

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X3         -0.050293     0.163172     -0.308223     0.7632
X1          0.294188     0.166522      1.766663     0.1027
X2          0.161245     0.404313      0.398811     0.6970
C          -892.9734     139.0328     -6.422751     0.0000

R-squared 0.966204     Mean dependent var 1329.144
Adjusted R-squared 0.957755     S.D. dependent var 341.9404
S.E. of regression 70.28067     Akaike info criterion 11.55519
Sum squared resid 59272.46     Schwarz criterion 11.74834
Log likelihood -88.44151     Hannan-Quinn criter. 11.56508
F-statistic 114.3584     Durbin-Watson stat 0.655048
Prob(F-statistic) 0.000000

R2 is high and the F statistic is highly significant, yet none of the slope coefficients is individually significant at the 5% level.
High Pair-Wise Correlation among X's

        X1        X2        X3
X1  1.000000  0.759122  0.994863
X2  0.759122  1.000000  0.722003
X3  0.994863  0.722003  1.000000

X1 and X3 are almost perfectly correlated (r = 0.9949).
Detection of Multicollinearity
Variance Inflation Factor (VIF): VIFj = 1 / (1 − Rj2)
• No multicollinearity produces VIFs of 1.0
• If a VIF is greater than 10, multicollinearity is probably severe
– VIF > 10 means that more than 90% of the variance of Xj is explained by the other X's
• In small samples, a VIF of about 5 may already indicate problems
Tolerance: TOLj = 1 − Rj2 = 1 / VIFj
• If the tolerance equals 1, the variables are unrelated
• If TOLj = 0, they are perfectly correlated

A sketch of both measures follows below.
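A minimal sketch computing VIF and tolerance with statsmodels' built-in helper. The design matrix (with a constant column) and the variable names are assumptions for illustration, not the lecture's data.

    import numpy as np
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    n = 16
    X1 = rng.normal(100, 10, n)
    X2 = rng.normal(50, 5, n)
    X3 = X1 * 10 + rng.normal(0, 5, n)   # nearly collinear with X1
    X = np.column_stack([np.ones(n), X1, X2, X3])

    # VIF for column j regresses it on all the other columns (incl. constant)
    for j, name in enumerate(["X1", "X2", "X3"], start=1):
        vif = variance_inflation_factor(X, j)
        print(f"{name}: VIF = {vif:.1f}, TOL = {1 / vif:.4f}")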
Example
Auxiliary regression of X3 on the other regressors:

Dependent Variable: X3
Method: Least Squares
Date: 03/03/20 Time: 18:09
Sample: 1990 2005
Included observations: 16

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X1          1.012353     0.035767     28.30410      0.0000
X2         -1.249070     0.593524     -2.104499     0.0554
C          -124.2414     233.7947     -0.531412     0.6041

R-squared 0.992356     Mean dependent var 8592.342
Adjusted R-squared 0.991180     S.D. dependent var 1271.984
S.E. of regression 119.4592     Akaike info criterion 12.57119
Sum squared resid 185516.6     Schwarz criterion 12.71605
Log likelihood -97.56951     Hannan-Quinn criter. 12.57861
F-statistic 843.8254     Durbin-Watson stat 1.975230
Prob(F-statistic) 0.000000

VIF = 1 / (1 − 0.992356) ≈ 130.8, far above the threshold of 10
TOL = 1 − 0.992356 ≈ 0.0076
Auxiliary regression of X2 on the other regressors:

Dependent Variable: X2
Method: Least Squares
Date: 03/03/20 Time: 18:11
Sample: 1990 2005
Included observations: 16

Variable   Coefficient   Std. Error   t-Statistic   Prob.
X3         -0.203442     0.096670     -2.104499     0.0554
X1          0.240076     0.092817      2.586551     0.0226
C          -156.3197     84.94956     -1.840147     0.0887

R-squared 0.683942     Mean dependent var 231.2494
Adjusted R-squared 0.635318     S.D. dependent var 79.83420
S.E. of regression 48.21102     Akaike info criterion 10.75641
Sum squared resid 30215.93     Schwarz criterion 10.90127
Log likelihood -83.05130     Hannan-Quinn criter. 10.76383
F-statistic 14.06585     Durbin-Watson stat 1.942877
Prob(F-statistic) 0.000560
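For comparison, the same calculation here gives VIF = 1 / (1 − 0.683942) ≈ 3.16 and TOL ≈ 0.316, well below the usual threshold: X2 is far less collinear with the other regressors than X3 is.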

Multicollinearity: Remedies
1. Do nothing
– It is essentially a data-deficiency problem
2. Use a priori information
– From previous studies or from the relevant theory underlying the field of study
– E.g., replace collinear variables by a known linear combination of them
3. Conduct panel data analysis
4. Eliminate an independent variable
– May cause specification bias
5. Transform some of the variables (see the sketch below)
– First differences for time series data
– Ratio of two related variables
6. Redesign the model
7. Increase the sample size, n
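A sketch of remedy 5, first-differencing, on hypothetical data (the series and their trends are assumptions, not the lecture's dataset). Two series that share a common time trend are nearly collinear in levels, but their first differences are essentially uncorrelated.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(4)
    t = np.arange(50)
    # Two series dominated by the same time trend, with independent noise
    df = pd.DataFrame({
        "X1": 2.0 * t + rng.normal(0.0, 1.0, 50),
        "X3": 5.0 * t + rng.normal(0.0, 1.0, 50),
    })

    d = df.diff().dropna()        # first differences
    print(df.corr().iloc[0, 1])   # ~0.999 in levels: near-perfect collinearity
    print(d.corr().iloc[0, 1])    # ~0 after differencing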
Multicollinearity Problem
• Multicollinearity is not necessarily a serious problem
– If the objective is prediction only
– A high R2 still yields good predictions
• Multicollinearity is a serious problem
– If the objective is estimation of the individual coefficients
