
Multicollinearity

Professor Dr. Md. Ayub Ali

Courtesy: Internet Authors


Multicollinearity
 Multicollinearity - violation of the assumption
that no independent variable is a perfect linear
function of one or more of the other independent
variables. β1 is the impact X1 has on Y holding
all other factors constant. If X1 is related to X2,
then β1 will also capture the impact of changes
in X2. In other words, interpretation of the
parameters becomes difficult.
Perfect vs. Imperfect
With perfect multicollinearity one cannot estimate the
parameters at all: the OLS normal equations have no unique solution.
Dominant variable - a variable that is so highly correlated
with the dependent variable that it completely masks the
effects of all other independent variables in the equation.
Example: Wins = f(PTS, DPTS)
Any other explanatory variable will tend to be insignificant.
Imperfect Multicollinearity - a statistical relationship exists
between two or more independent variables that
significantly affects the estimation of the model.
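The "cannot estimate" point above can be seen directly: with a perfectly collinear column, X′X is singular, so the normal equations (X′X)b = X′y have no unique solution. A minimal numpy sketch with simulated (hypothetical) data:

```python
# Perfect multicollinearity makes X'X singular, so OLS cannot
# recover unique parameter estimates.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2 * x1                            # x2 is an exact linear function of x1
X = np.column_stack([np.ones(50), x1, x2])

print(np.linalg.matrix_rank(X))        # 2, one less than the 3 columns
eigs = np.linalg.eigvalsh(X.T @ X)
print(eigs.min())                      # (numerically) zero eigenvalue
```

With imperfect multicollinearity the smallest eigenvalue is merely close to zero, which is why estimation remains possible but unstable.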
Consequences of Multicollinearity 
• Estimates will remain unbiased:
• Estimates will still be centered around the true values.
• The variances of the estimates will increase.
• In essence we are asking the model to tell us something we know
very little about (i.e. what is the impact of changing X1 on Y holding
everything else constant).
• Because we are not holding all else constant, the error associated
with our estimate increases.
• Hence it is possible for us to observe coefficients of the opposite
sign than expected due to multicollinearity.
• The computed t-statistics will fall.
• Variances and standard errors increase. WHY?
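The inflated-variance consequence can be checked by simulation. The sketch below (hypothetical data, names of my own choosing) fits the same model twice, once with an unrelated second regressor and once with a highly correlated one, and compares the standard error of the X1 coefficient:

```python
# Simulate y = 1 + 2*x1 + 2*x2 + u and compute the OLS standard
# error of the x1 coefficient under low and high collinearity.
import numpy as np

def ols_se_b1(x1, x2, rng):
    n = len(x1)
    y = 1 + 2 * x1 + 2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)            # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)       # OLS covariance matrix
    return np.sqrt(cov[1, 1])               # standard error of the x1 slope

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                     # unrelated to x1
x2_coll = 0.95 * x1 + 0.1 * rng.normal(size=n)    # r(x1, x2) near 0.99

se_low = ols_se_b1(x1, x2_indep, rng)
se_high = ols_se_b1(x1, x2_coll, rng)
print(se_low, se_high)    # the second standard error is several times larger
```

Both estimates stay unbiased; only the precision of the collinear fit deteriorates, which is exactly why t-statistics fall.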
More Consequences
• Estimates will become very sensitive to changes in
specification.
• Overall fit of the equation will be generally unaffected.
• If the multicollinearity occurs in the population as well
as the sample, then the predictive power of the model is
unaffected.
• Note: It is possible that multicollinearity is a result of
the sample, so the above may not always be true.
• The more severe the multicollinearity, the worse its
consequences.
Detection of Multicollinearity
 • High R2 with all low t-scores
• If this is the case, you have multicollinearity.
• If this is not the case, you may or may not have multicollinearity.
• If all the t-scores are significant and in the expected direction, then
we can conclude that multicollinearity is not likely to be a problem.
• High Simple Correlation Coefficients
• A high r between two variables (.80 is the rule of thumb) indicates
the potential for multicollinearity. In a model with more than two
independent variables, though, this test will not tell us if a
relationship exists between a collection of independent variables.
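The simple-correlation check above is a one-liner in practice. A sketch with hypothetical data:

```python
# Check the r > .80 rule of thumb for a pair of regressors.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=100)   # built to be strongly related
r = np.corrcoef(x1, x2)[0, 1]
print(round(r, 2))    # well above the .80 rule of thumb
```

As the slide notes, a pairwise r says nothing about a linear relationship among three or more regressors jointly, which is what the VIF addresses.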
Variance Inflation Factor
 • Run an OLS regression with Xi as a function of all
the other explanatory variables in the equation.
• Calculate the VIF: VIF(βi) = 1 / (1 − Ri²), where Ri² is the
R² of that auxiliary regression.
• Analyze the degree of multicollinearity by evaluating
the size of VIF.
• There is no table of critical VIF values. The rule of
thumb is if VIF > 5 then multicollinearity is an issue.
Other authors suggest a rule of thumb of VIF > 10.
• What is the R² necessary to reach the rules of thumb?
(Since VIF = 1/(1 − Ri²), VIF > 5 corresponds to Ri² > 0.8
and VIF > 10 to Ri² > 0.9.)
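The two-step recipe above translates directly into code. A minimal numpy sketch with simulated (hypothetical) data, regressing each regressor on the others and applying VIF = 1/(1 − Ri²):

```python
# Compute the VIF for each column of X via auxiliary regressions.
import numpy as np

def vif(X):
    """VIF for each column of X (X should not include the intercept)."""
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ b
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(7)
x1 = rng.normal(size=300)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=300)   # strongly related to x1
x3 = rng.normal(size=300)                    # unrelated to both
v = vif(np.column_stack([x1, x2, x3]))
print(v)    # large VIFs for x1 and x2, near 1 for x3
```

Here the collinear pair exceeds both rules of thumb while the unrelated regressor sits near the minimum value of 1.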
Eigenvalue Method
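The eigenvalue method inspects the spectrum of X′X: near-collinearity shows up as a very small eigenvalue, and the condition number √(λmax/λmin) of the column-scaled X′X summarizes this (a rule of thumb often cited is that values above 30 indicate a problem). A sketch with hypothetical data:

```python
# Condition-number diagnostic from the eigenvalues of the scaled X'X.
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = 0.99 * x1 + 0.03 * rng.normal(size=100)   # nearly collinear with x1
Xs = np.column_stack([x1, x2])
Xs = Xs / np.linalg.norm(Xs, axis=0)           # scale columns to unit length

eig = np.linalg.eigvalsh(Xs.T @ Xs)
cond = np.sqrt(eig.max() / eig.min())
print(round(cond, 1))    # far above the rule-of-thumb threshold of 30
```

Unlike pairwise correlations, the smallest eigenvalue detects a linear relationship among any collection of the regressors.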
Remedies
• Do nothing. If you are only interested in prediction,
multicollinearity is not an issue.
• t-stats may be deflated but still significant, in which case
multicollinearity is not a serious problem.
• The cure is often worse than the disease.
• Drop one or more of the multicollinear variables.
• This solution can introduce specification bias. WHY?
• If a redundant variable added in an effort to avoid
specification bias is what introduced the multicollinearity,
then it is appropriate to drop that variable.
More Remedies
• Transform the multicollinear variables.
• Form a linear combination of the multicollinear variables.
• Transform the equation into first differences or logs.
• Increase the sample size.
• The issue of micronumerosity.
• Micronumerosity is the problem of a sample size (n) that barely
exceeds the number of parameters (k). Its symptoms are similar to
those of multicollinearity (lack of variation in the independent
variables).
• A solution to each problem is to increase the sample size. This
will solve the problem of micronumerosity but not necessarily
the problem of multicollinearity.
More Remedies
• Ridge Regression method: β̂ = (X′X + kI)⁻¹X′Y
• Principal Component method
• LASSO Method
• Instrumental Variable (IV) method
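The ridge closed form above is a one-liner: adding kI to X′X makes the matrix well-conditioned even when the regressors are nearly collinear, at the cost of some shrinkage bias. A minimal sketch with simulated (hypothetical) data:

```python
# Ridge estimator b = (X'X + kI)^{-1} X'y; k = 0 reduces to OLS.
import numpy as np

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)    # nearly collinear pair
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)       # true coefficients (1, 1)

b_ols = ridge(X, y, k=0.0)             # unstable: huge offsetting slopes
b_ridge = ridge(X, y, k=1.0)           # stabilized, close to (1, 1)
print(b_ols, b_ridge)
```

Note that the sum of the two coefficients is well identified either way; it is the split between the near-duplicate columns that OLS cannot pin down and ridge regularizes.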
