
Econometrics I

Nicolás Corona Juárez, Ph.D.


28.10.2020
Multicollinearity

Consider the model:

𝐶𝐸𝑖 = 𝛽1 + 𝛽2 𝐼𝑛𝑖 + 𝛽3 𝑇𝑎𝑖 + 𝑢𝑖

where 𝐶𝐸𝑖 is electricity consumption, 𝐼𝑛𝑖 is income, and 𝑇𝑎𝑖 is dwelling size.

Is there something strange in this model?

Multicollinearity

• Consider this general model:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + 𝑢𝑖

• We had assumed that the control variables are not correlated with each other.

Multicollinearity

• In a model with 𝑘 variables, there is an exact linear relationship among them if the following condition is satisfied:

𝜆1 𝑋1 + 𝜆2 𝑋2 + ⋯ + 𝜆𝑘 𝑋𝑘 = 0

• where 𝜆1 , 𝜆2 , … , 𝜆𝑘 are constants such that not all of them are simultaneously equal to zero.
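For instance (an illustrative case, using the same numbers as the example a few slides below): if $X_3 = 5X_2$, the condition is satisfied by choosing $\lambda_2 = 5$, $\lambda_3 = -1$, and all other $\lambda$'s equal to zero:

$\lambda_2 X_2 + \lambda_3 X_3 = 5X_2 - X_3 = 0$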

Multicollinearity

• If we solve for 𝑋2𝑖 we have:


$X_{2i} = -\dfrac{\lambda_1}{\lambda_2} X_{1i} - \dfrac{\lambda_3}{\lambda_2} X_{3i} - \dots - \dfrac{\lambda_k}{\lambda_2} X_{ki}$

• 𝑋2𝑖 is thus an exact linear combination of the other variables.

Multicollinearity

• If we have an econometric model in which there are 𝑘 variables:

𝜆1 𝑋1 + 𝜆2 𝑋2 + ⋯ + 𝜆𝑘 𝑋𝑘 + 𝑣𝑖 = 0

• where 𝑣𝑖 is a stochastic error term.

• This is called less-than-perfect (imperfect) multicollinearity.

Multicollinearity

• If we solve for 𝑋2𝑖 :


$X_{2i} = -\dfrac{\lambda_1}{\lambda_2} X_{1i} - \dfrac{\lambda_3}{\lambda_2} X_{3i} - \dots - \dfrac{\lambda_k}{\lambda_2} X_{ki} - \dfrac{1}{\lambda_2} v_i$

• 𝑋2𝑖 is not an exact linear combination of the other 𝑋 variables, since the stochastic error term 𝑣𝑖 is also present.

Multicollinearity

• Example:

𝑿𝟐     𝑿𝟑     𝑿𝟑∗
10     50     52
15     75     75
18     90     97
24    120    129
30    150    152

1. There is perfect collinearity between 𝑋2 and 𝑋3: 𝑋3 = 5𝑋2.
2. The variable 𝑋3∗ is built from 𝑋3 by adding the numbers 2, 0, 7, 9, 2.
3. There is no perfect multicollinearity between 𝑋2 and 𝑋3∗.
4. However, 𝑋2 and 𝑋3∗ are closely correlated: their correlation coefficient is 0.9959.
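These correlations are easy to verify in Stata; a minimal sketch (variable names are illustrative):

* enter the example data and check the correlations
clear
input x2 x3 x3star
10  50  52
15  75  75
18  90  97
24 120 129
30 150 152
end
correlate x2 x3 x3star
* corr(x2, x3) = 1.0000; corr(x2, x3star) = 0.9959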
Multicollinearity

• Multicollinearity only refers to linear relationships.

• This concept does not apply to non-linear relationships among the control variables, as in the polynomial model below.

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝛽2 𝑋𝑖2 + 𝛽3 𝑋𝑖3 + 𝑢𝑖
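A minimal Stata sketch (simulated data; names and coefficients are illustrative): the polynomial terms are functionally related, but since none is an exact linear combination of the others, no regressor is dropped and all coefficients are estimated.

* polynomial regressors: related, but not linearly
clear
set obs 100
set seed 1
gen x  = runiform()*10
gen x2 = x^2
gen x3 = x^3
gen y  = 2 + 0.5*x - 0.2*x2 + 0.01*x3 + rnormal()
regress y x x2 x3    // all three slopes are estimated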

Multicollinearity

Why do we assume that there is no multicollinearity in the classical regression model?

• If there is perfect multicollinearity, the coefficients of the control variables are undetermined and their standard errors are undefined.

• If there is less than perfect multicollinearity, the coefficients of the control variables, though determined, show very large standard errors and cannot be estimated with precision.

Multicollinearity

Suppose we have perfect multicollinearity in the following model (variables in deviation form):

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

Without multicollinearity, the OLS estimators are:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

$\hat{\beta}_3 = \dfrac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

Estimation under perfect Multicollinearity

Assume $x_{3i} = \lambda x_{2i}$, where $\lambda$ is a constant different from zero. Substituting into $\hat{\beta}_2$, we have:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\lambda^2 \sum x_{2i}^2) - (\lambda \sum y_i x_{2i})(\lambda \sum x_{2i}^2)}{(\sum x_{2i}^2)(\lambda^2 \sum x_{2i}^2) - (\lambda \sum x_{2i}^2)^2} = \dfrac{0}{0}$

an indeterminate expression: $\hat{\beta}_2$ is undetermined.
For home: Show that $\hat{\beta}_3$ is undetermined as well.
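In practice, regression software cannot compute these estimates either. A minimal Stata sketch (simulated data; names are illustrative): when x3 is an exact multiple of x2, Stata omits one of the two regressors.

* perfect collinearity: x3 is an exact linear function of x2
clear
set obs 50
set seed 2
gen x2 = rnormal()
gen x3 = 5*x2                  // exact linear dependence
gen y  = 1 + 0.5*x2 + rnormal()
regress y x2 x3                // note: x3 omitted because of collinearity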

Estimation under high but imperfect Multicollinearity

• Suppose we want to estimate the same model:

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

• Instead of exact multicollinearity, suppose we now have:

𝑥3𝑖 = 𝜆𝑥2𝑖 + 𝑣𝑖

where 𝜆 ≠ 0 and 𝑣𝑖 is a stochastic error term.

• Note that $\sum x_{2i} v_i = 0$.

Estimation under high but imperfect Multicollinearity

• Estimating the model, we get:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\lambda^2 \sum x_{2i}^2 + \sum v_i^2) - (\lambda \sum y_i x_{2i} + \sum y_i v_i)(\lambda \sum x_{2i}^2)}{(\sum x_{2i}^2)(\lambda^2 \sum x_{2i}^2 + \sum v_i^2) - (\lambda \sum x_{2i}^2)^2}$

• Remember that $\sum x_{2i} v_i = 0$.

• In contrast to the case of perfect multicollinearity, this model can be estimated.
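A minimal Stata sketch (simulated data; names are illustrative): with x3 nearly, but not exactly, collinear with x2, the regression runs, although the standard errors are large.

* high but imperfect collinearity
clear
set obs 50
set seed 3
gen x2 = rnormal()
gen x3 = 5*x2 + 0.05*rnormal()   // x3 = lambda*x2 + v, with small v
gen y  = 1 + 0.5*x2 - 0.3*x3 + rnormal()
regress y x2 x3                  // estimable, but with inflated standard errors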

Practical consequences of multicollinearity

• In cases of near or high multicollinearity, the consequences are as follows:

1. Even though the OLS estimators $\hat{\beta}_k$ are BLUE, they show large variances and covariances: lack of precision in the estimation!
2. Due to 1), the confidence intervals are wider. This means it is easier not to reject $H_0: \beta_k = 0$.
3. Due to 1), the $t$ statistics of one or more coefficients tend to be non-significant.
4. Despite the $t$ statistics of one or more coefficients being non-significant, $R^2$ can be very high.
5. The OLS estimates $\hat{\beta}_k$ and their standard errors are very sensitive to small changes in the data.

1. OLS estimators with large variances and covariances

• We have the model (variables in deviation form):

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

Its variances and covariance are:

$var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{2,3}^2)}$

$var(\hat{\beta}_3) = \dfrac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{2,3}^2)}$

$cov(\hat{\beta}_2, \hat{\beta}_3) = \dfrac{-r_{2,3}\,\sigma^2}{(1 - r_{2,3}^2)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$

where $r_{2,3}$ is the correlation coefficient between $X_2$ and $X_3$. If $r_{2,3}$ rises, so do the variances and the covariance of the estimators.

1. OLS estimators with large variances and covariances

• The speed at which the variances and covariances increase can be tracked with the Variance Inflation Factor (VIF):

$VIF = \dfrac{1}{1 - r_{2,3}^2}$

What happens to VIF as $r_{2,3}^2$ approaches 1?

VIF rises: the degree of collinearity increases, and with it the variance of the estimator.
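Continuing the simulated data from the previous sketch, the VIF can be obtained in Stata directly or by hand (a sketch; names are illustrative):

regress y x2 x3
estat vif                          // Stata's variance inflation factors
correlate x2 x3
display "VIF by hand = " 1/(1 - r(rho)^2)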

1. OLS estimators with large variances and covariances

• The variances of the estimators can also be written as:

$var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}\,VIF$

$var(\hat{\beta}_3) = \dfrac{\sigma^2}{\sum x_{3i}^2}\,VIF$

• The variances are directly proportional to VIF.

1. OLS estimators with large variances and covariances

• For the model with $k$ variables:

$var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_k^2} \cdot \dfrac{1}{1 - R_k^2}$

where $R_k^2$ is the $R^2$ of the auxiliary regression of $X_k$ on the remaining control variables.

• This is the same as:

$var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_k^2}\,VIF_k$

For home: With $\sigma^2$ and $VIF_k$ constant, what happens to $var(\hat{\beta}_k)$ if the variation of a control variable is higher?

2. Wider confidence intervals
• The confidence interval is given by:

$\hat{\beta}_2 - t_{\alpha/2}\,se(\hat{\beta}_2) \le \beta_2 \le \hat{\beta}_2 + t_{\alpha/2}\,se(\hat{\beta}_2)$

If the degree of collinearity increases, the confidence interval becomes wider.


Value of $r_{2,3}$ → 95% confidence interval for $\beta_2$ (the VIF appears under the square root):

0.00:   $\hat{\beta}_2 \pm 1.96\sqrt{var(\hat{\beta}_2)}$
0.50:   $\hat{\beta}_2 \pm 1.96\sqrt{1.33}\sqrt{var(\hat{\beta}_2)}$
0.95:   $\hat{\beta}_2 \pm 1.96\sqrt{10.26}\sqrt{var(\hat{\beta}_2)}$
0.999:  $\hat{\beta}_2 \pm 1.96\sqrt{500}\sqrt{var(\hat{\beta}_2)}$

Here $var(\hat{\beta}_2)$ denotes the variance when $r_{2,3} = 0$. As the interval widens, the probability of failing to reject a false hypothesis (a type II error) increases.
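The VIF values in the table follow directly from the formula; a quick Stata check:

foreach r in 0 0.50 0.95 0.999 {
    display "r = `r'   VIF = " %8.2f 1/(1 - `r'^2)
}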
3. Non-significant 𝒕 values

• With high multicollinearity, the standard errors increase dramatically, which shrinks the $t$ statistic:

$t = \dfrac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}$

• We then fail to reject $H_0: \beta_2 = 0$ too easily.

4. 𝑹𝟐 is high but few 𝒕 values are significant

• Consider the model:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖 + 𝑢𝑖

• Non-significant $t$ values combined with a high $R^2$ and a significant $F$ statistic are a sign of multicollinearity.

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

Consider a dataset with regressors $X_2$, $X_3$, and $X_c$ and estimate two models:

Model 1

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

Model 2

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{ci} + u_i$

Observe $X_3$ and $X_c$: they differ only by small changes in the data.

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How can we identify multicollinearity?

Model 1: $r_{2,3} = 0.5523$
Model 2: $r_{2,x_c} = 0.8285$

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How to do this in Stata? The slide shows the Stata output; the key numbers are:

Model 1: $r_{2,3} = 0.5523$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.00868$
Model 2: $r_{2,x_c} = 0.8285$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.0282$
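A sketch of the Stata commands that would produce this output (variable names are assumptions, since the dataset itself is only shown on the slide):

* Model 1
regress y x2 x3
correlate x2 x3        // r = 0.5523
estat vce              // variance-covariance matrix of the estimators
* Model 2
regress y x2 xc
correlate x2 xc        // r = 0.8285
estat vce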


5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How can we identify multicollinearity?

Model 1: $r_{2,3} = 0.5523$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.00868$
Model 2: $r_{2,x_c} = 0.8285$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.0282$

Observe the standard errors in both models and the individual significance of the coefficients: a small change in the data ($X_3$ versus $X_c$) changes them noticeably.

How to detect multicollinearity?

1. A high $R^2$ but few significant $t$ values.
2. High correlations across pairs of control variables.
3. Partial correlations.
4. Auxiliary regressions (see the sketch after this list).
5. Eigenvalues and the condition index.
6. VIF (see the sketch after this list).
7. Scatter plots.
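A minimal Stata sketch of diagnostics 4 and 6 (assuming a fitted model with regressors x2 and x3; names are illustrative):

* auxiliary regression: regress one control variable on the others
regress x2 x3
display "auxiliary R2 = " e(r2)    // a high R2 here signals collinearity
* VIF after estimating the main model
regress y x2 x3
estat vif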

REFERENCE

Gujarati, Damodar N. & D. C. Porter. 2010. Basic Econometrics. 5th ed. McGraw-Hill. Chapter 10: Multicollinearity.
