A. Perfect Multicollinearity
Perfect multicollinearity exists in the model

$$Y_i = \beta_0 + \beta_1 X_{2i} + \dots + \beta_{K-1} X_{K-1,i} + \varepsilon_i = \beta_0 X_{1i} + \beta_1 X_{2i} + \dots + \beta_{K-1} X_{K-1,i} + \varepsilon_i$$

if:

$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \lambda_3 X_{3i} + \dots + \lambda_{K-1} X_{K-1,i} = 0$$

where the $\lambda$'s are a set of parameters (not all equal to zero) and $X_{1i} \equiv 1$.
This must be true for all observations.
Consider the OLS slope estimator, written in deviation form (lower-case letters denote deviations from sample means):

$$\hat{\beta}_1 = \frac{\sum x_{2i} y_i}{\sum x_{2i}^2}$$
Suppose $X_{2i} = \lambda X_{1i} = \lambda$ for all $i$. As a result $x_{2i} = 0$ for all $i$, and hence the denominator is zero. Thus, the estimated slope coefficient is undefined. This result applies to MLR generally.
In the three-variable model, the standard error is

$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}}$$

where $r_{23}$ is the sample correlation between $X_{2i}$ and $X_{3i}$.
But perfect multicollinearity implies $r_{23} = 1$ or $r_{23} = -1$ ($r_{23}^2 = 1$ in either case), and the denominator is zero. Thus, standard errors are also undefined.
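A quick numerical check of this breakdown (a minimal sketch using numpy; the data and $\lambda = 2$ are made-up values): when $X_{2i} = \lambda X_{1i}$, the cross-product matrix $X'X$ is singular, so the normal equations have no unique solution.

```python
import numpy as np

n, lam = 50, 2.0
x1 = np.ones(n)                 # X_1i = 1 (the intercept column)
x2 = lam * x1                   # X_2i = lambda * X_1i: perfect multicollinearity
X = np.column_stack([x1, x2])

print(np.linalg.matrix_rank(X))   # 1, not 2: the columns are linearly dependent
print(np.linalg.det(X.T @ X))     # 0 (up to rounding): (X'X)^{-1} does not exist

# lstsq still returns *a* solution, but flags the rank deficiency:
y = np.random.default_rng(0).normal(size=n)
rank = np.linalg.lstsq(X, y, rcond=None)[2]
print(rank)                       # 1: coefficients are not separately identified
```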
B. Imperfect Multicollinearity
$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \lambda_3 X_{3i} + \dots + \lambda_{K-1} X_{K-1,i} + v_i = 0$$
where $v_i$ is a stochastic variable with mean zero and a small variance.
Coefficients can be estimated, and the OLS estimators are still unbiased and minimum variance (i.e., BLUE). Imperfect multicollinearity does not violate the classical assumptions.
But standard errors 'blow up': they increase with the degree of multicollinearity, reducing the 'precision' of our coefficient estimates.
For example, recall:

$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}}$$

[Figure: the standard error rises sharply as $r_{23}^2$ approaches 1.]
This problem is closely related to the problem of a 'small sample size': in both cases, standard errors 'blow up'. With a small sample, the $\sum x_{2i}^2$ term in the denominator is small because there is little variation in the explanatory variable.
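To see how quickly the standard error formula above deteriorates, here is a small sketch (numpy; $\hat{\sigma}^2 = 1$ and $\sum x_{2i}^2 = 100$ are arbitrary illustrative values):

```python
import numpy as np

sigma2 = 1.0       # assumed error variance (illustrative)
ssq_x2 = 100.0     # assumed sum of squared deviations of X_2 (illustrative)

for r23 in [0.0, 0.5, 0.9, 0.99, 0.999]:
    se = np.sqrt(sigma2 / (ssq_x2 * (1.0 - r23**2)))
    print(f"r23 = {r23:5.3f}   se(beta1_hat) = {se:.4f}")

# se climbs from 0.10 at r23 = 0 toward infinity as r23 -> 1
```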
A common 'rule of thumb' for detecting multicollinearity: we can't reject the null hypotheses that the coefficients are individually equal to zero (t tests), but we can reject the null hypothesis that they are simultaneously equal to zero (F test), even though the $R^2$ is high.
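This pattern is easy to reproduce in a simulation (a sketch, assuming numpy; the data-generating process is invented, and the exact numbers vary with the seed): two nearly collinear regressors give small individual t-ratios alongside a large $R^2$ and F statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)          # nearly collinear with x2
y = 1.0 + x2 + x3 + rng.normal(size=n)       # both slopes are truly 1

X = np.column_stack([np.ones(n), x2, x3])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
K = X.shape[1]
s2 = resid @ resid / (n - K)                 # estimate of the error variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
R2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
F = (R2 / (K - 1)) / ((1 - R2) / (n - K))

print("t-ratios:", np.round(beta / se, 2))     # slope t's typically well below 2
print("R2:", round(R2, 3), "F:", round(F, 1))  # yet R2 and F are large
```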
This is not an 'exact test', however. What do we mean by 'few' significant t tests, and a 'high' $R^2$? It is too imprecise, and it also depends on other factors, like sample size.
Another diagnostic is to examine the pairwise correlations between the explanatory variables. It is often said that a high pairwise correlation is "... a sufficient, but not a necessary condition for multicollinearity." In other words, if you've got a high pairwise correlation, you've got problems. However, the absence of a high pairwise correlation isn't conclusive evidence of an absence of multicollinearity, since a near-linear relationship can hold among three or more regressors even when no single pair is highly correlated.
A more formal approach is an auxiliary regression of one explanatory variable on all of the others, e.g.

$$X_{2i} = \gamma_1 + \gamma_3 X_{3i} + \gamma_4 X_{4i} + \dots + \gamma_{K-1} X_{K-1,i} + u_i$$

We test the null hypothesis that the slope coefficients in this auxiliary regression are simultaneously equal to zero:

$$H_0: \gamma_3 = \gamma_4 = \dots = \gamma_{K-1} = 0$$

with the following F test.
$$F = \frac{R_2^2 / (K-1)}{(1 - R_2^2)/(n-K)}$$
where $R_2^2$ is the coefficient of determination from the auxiliary regression with $X_{2i}$ as the dependent variable, and $K$ is the number of coefficients in the original regression. This is related to the high Variance Inflation Factors discussed in the textbook, where $VIF = \frac{1}{1 - R_2^2}$.
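A sketch of this auxiliary-regression diagnostic (numpy; the simulated regressors are invented for illustration): regress $X_2$ on the other explanatory variables, then form the F statistic and VIF from $R_2^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x3 = rng.normal(size=n)
x2 = 0.9 * x3 + 0.1 * rng.normal(size=n)   # X_2 is nearly determined by X_3

# Auxiliary regression: X_2 on a constant and the other regressor(s)
Z = np.column_stack([np.ones(n), x3])
g = np.linalg.lstsq(Z, x2, rcond=None)[0]
resid = x2 - Z @ g
R2_2 = 1 - (resid @ resid) / np.sum((x2 - x2.mean()) ** 2)

K = 3                                      # coefficients in the original model (constant, X_2, X_3)
F = (R2_2 / (K - 1)) / ((1 - R2_2) / (n - K))
VIF = 1.0 / (1.0 - R2_2)
print(f"R2_2 = {R2_2:.3f}   F = {F:.1f}   VIF = {VIF:.1f}")
```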
Once we're convinced that multicollinearity is present, what can we do about it? The diagnosis of the ailment isn't clear cut, and neither is the treatment: the appropriateness of the following remedial measures varies from one situation to another.
EXAMPLE: Estimating the labour supply of married women from 1950-1999:

$$HRS_t = \beta_0 + \beta_1 W^W_t + \beta_2 W^M_t + \varepsilon_t, \qquad R^2 = .847$$

where $W^W_t$ and $W^M_t$ are the mean wage rates of women and men.
Multicollinearity is a problem here. The first tipoff is that the t-ratios are less than 1.5 and 1, respectively (insignificant at the 10% level), yet $R^2$ is 0.847. It is easy to confirm multicollinearity in this case: the correlation between the two mean wage rates is 0.99 over our sample period! Standard errors blow up, and we can't separate the two wage effects on the labour supply of married women.
Possible Solutions?
1. A Priori Information.
For example, suppose we know a priori that $\beta_2 = -.5\beta_1$. (We expect that $\beta_1 > 0$ and $\beta_2 < 0$.) Substituting this restriction into the model:
$$HRS_t = \beta_0 + \beta_1 W^W_t - .5\beta_1 W^M_t + \varepsilon_t = \beta_0 + \beta_1 (W^W_t - .5 W^M_t) + \varepsilon_t = \beta_0 + \beta_1 W^*_t + \varepsilon_t$$
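A sketch of imposing the restriction (numpy; the wage series and coefficient values are fabricated so that $\beta_2 = -.5\beta_1$ holds in the data-generating process): build the composite regressor $W^* = W^W - .5W^M$ and estimate the single coefficient $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 50
w_w = 10 + 0.1 * rng.normal(size=T).cumsum()   # women's wage (invented series)
w_m = w_w + 0.1 * rng.normal(size=T)           # men's wage, nearly collinear
hrs = 100 + 2.0 * w_w - 1.0 * w_m + rng.normal(size=T)  # beta2 = -.5 * beta1

# Impose beta2 = -.5 * beta1 via the composite regressor W*
w_star = w_w - 0.5 * w_m
X = np.column_stack([np.ones(T), w_star])
beta = np.linalg.lstsq(X, hrs, rcond=None)[0]
print("beta1_hat =", round(beta[1], 2))        # roughly the true value of 2
```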
2. Dropping a Variable.
We estimate:
$$HRS_t = \alpha_0 + \alpha_1 W^W_t + v_t$$
Recall that the estimate of $\alpha_1$ is likely to be a biased estimate of $\beta_1$:

$$E(\hat{\alpha}_1) = \beta_1 + \beta_2 b_{12}$$

where $b_{12}$ is the slope from the regression of the omitted variable on the included regressor. In fact, the bias is increased by the multicollinearity: the higher the correlation between the two wage rates, the larger $b_{12}$ is.
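The omitted-variable bias formula is easy to verify by simulation (a sketch assuming numpy; the coefficients and correlation structure are invented): with $b_{12} = 0.8$, the short-regression estimate should centre on $\beta_1 + \beta_2 b_{12}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
b1, b2 = 2.0, -1.0
est = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # omitted variable; b12 = 0.8
    y = b1 * x1 + b2 * x2 + rng.normal(size=n)
    est.append((x1 @ y) / (x1 @ x1))           # short regression: y on x1 only

print("mean alpha1_hat:", round(float(np.mean(est)), 3))   # approx 1.2
print("beta1 + beta2 * b12 =", b1 + b2 * 0.8)              # 1.2 exactly
```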
3. First Differences.
One of the simplest things to do with time-series regressions is to run 'first differences'.
The same linear relationship holds for the previous period as well:
$$HRS_{t-1} = \beta_0 + \beta_1 W^W_{t-1} + \beta_2 W^M_{t-1} + \varepsilon_{t-1}$$

Subtracting this from the original equation gives:

$$\Delta HRS_t = \beta_1 \Delta W^W_t + \beta_2 \Delta W^M_t + \Delta\varepsilon_t$$
The advantage is that ‘changes’ in wage rates may not be as highly correlated as
their ‘levels’.
The disadvantage is that the differenced error terms are serially correlated:

$$\Delta\varepsilon_t = \varepsilon_t - \varepsilon_{t-1}, \qquad \Delta\varepsilon_{t-1} = \varepsilon_{t-1} - \varepsilon_{t-2}$$

Both contain $\varepsilon_{t-1}$, so consecutive differenced errors are correlated even when the original errors are not.
Again, the cure may be worse than the disease: differencing induces serial correlation in the errors, violating one of the classical assumptions. More on serial correlation later.
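A one-line check of why differencing causes trouble (a sketch assuming numpy): even if the original errors are serially uncorrelated, consecutive differenced errors share the term $\varepsilon_{t-1}$ and have correlation $-\tfrac{1}{2}$.

```python
import numpy as np

eps = np.random.default_rng(5).normal(size=100_000)   # serially uncorrelated errors
d_eps = np.diff(eps)                                  # differenced errors

# Correlation between consecutive differenced errors:
print(np.corrcoef(d_eps[1:], d_eps[:-1])[0, 1])       # approx -0.5
```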
4. New Data.
Since imperfect multicollinearity is a feature of the particular sample, a larger sample, or a new sample with more independent variation in the explanatory variables, can shrink the standard errors.
Finally, multicollinearity is often given too much emphasis in the list of common problems with regression analysis. If it's imperfect multicollinearity, which is almost always going to be the case, then it doesn't violate the classical assumptions. It is much more of a problem if the goal is to test the significance of individual coefficients; it is less of a problem for forecasting and prediction.