
Econometrics I

Nicolás Corona Juárez, Ph.D.


28.10.2020
Multicollinearity

Consider the model:

𝐶𝐸𝑖 = 𝛽1 + 𝛽2 𝐼𝑛𝑖 + 𝛽3 𝑇𝑎𝑖 + 𝑢𝑖

where 𝐶𝐸𝑖 is electricity consumption, 𝐼𝑛𝑖 is income, and 𝑇𝑎𝑖 is dwelling size.

Is there something strange in this model?

Multicollinearity

• Consider this general model:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + 𝑢𝑖

• We had assumed that the control variables are not correlated with each other.

Multicollinearity

• In a model with 𝑘 variables, there is an exact linear relationship among them if the following condition is satisfied:

𝜆1 𝑋1 + 𝜆2 𝑋2 + ⋯ + 𝜆𝑘 𝑋𝑘 = 0

• where 𝜆1 , 𝜆2 , … , 𝜆𝑘 are constants such that not all of them are simultaneously equal to zero.
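For instance (an illustrative case, using the same numbers as the example a few slides below): if $X_3 = 5X_2$, the condition is satisfied by choosing $\lambda_2 = 5$, $\lambda_3 = -1$, and all other $\lambda$'s equal to zero:

$\lambda_2 X_2 + \lambda_3 X_3 = 5X_2 - X_3 = 0$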

Multicollinearity

• If we solve for 𝑋2𝑖 we have:


$X_{2i} = -\dfrac{\lambda_1}{\lambda_2} X_{1i} - \dfrac{\lambda_3}{\lambda_2} X_{3i} - \dots - \dfrac{\lambda_k}{\lambda_2} X_{ki}$

• 𝑋2𝑖 is thus an exact linear combination of the other variables.

Multicollinearity

• If we have an econometric model in which there are 𝑘 variables:

𝜆1 𝑋1 + 𝜆2 𝑋2 + ⋯ + 𝜆𝑘 𝑋𝑘 + 𝑣𝑖 = 0

• where 𝑣𝑖 is a stochastic error term.

• This is called less-than-perfect (imperfect) multicollinearity.

Multicollinearity

• If we solve for 𝑋2𝑖 :


$X_{2i} = -\dfrac{\lambda_1}{\lambda_2} X_{1i} - \dfrac{\lambda_3}{\lambda_2} X_{3i} - \dots - \dfrac{\lambda_k}{\lambda_2} X_{ki} - \dfrac{1}{\lambda_2} v_i$

• 𝑋2𝑖 is not an exact linear combination of the other 𝑋 variables, since the stochastic error term 𝑣𝑖 is also present.

Multicollinearity

• Example:

𝑿𝟐     𝑿𝟑     𝑿𝟑∗
10     50     52
15     75     75
18     90     97
24    120    129
30    150    152

1. There is perfect collinearity between 𝑋2 and 𝑋3: 𝑋3 = 5𝑋2.
2. The variable 𝑋3∗ is built from 𝑋3 by adding the numbers 2, 0, 7, 9, 2.
3. There is no perfect multicollinearity between 𝑋2 and 𝑋3∗.
4. However, 𝑋2 and 𝑋3∗ are closely correlated: their correlation coefficient is 0.9959.
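These correlations are easy to verify in Stata; a minimal sketch (variable names are illustrative):

* enter the example data and check the correlations
clear
input x2 x3 x3star
10  50  52
15  75  75
18  90  97
24 120 129
30 150 152
end
correlate x2 x3 x3star
* corr(x2, x3) = 1.0000; corr(x2, x3star) = 0.9959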
Multicollinearity

• Multicollinearity only refers to linear relationships.

• This concept does not apply to non-linear relationships among the control variables, as in the polynomial model below.

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝛽2 𝑋𝑖2 + 𝛽3 𝑋𝑖3 + 𝑢𝑖
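A minimal Stata sketch (simulated data; names and coefficients are illustrative): the polynomial terms are functionally related, but since none is an exact linear combination of the others, no regressor is dropped and all coefficients are estimated.

* polynomial regressors: related, but not linearly
clear
set obs 100
set seed 1
gen x  = runiform()*10
gen x2 = x^2
gen x3 = x^3
gen y  = 2 + 0.5*x - 0.2*x2 + 0.01*x3 + rnormal()
regress y x x2 x3    // all three slopes are estimated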

Multicollinearity

Why do we assume that there is no multicollinearity in the classical regression model?

• If there is perfect multicollinearity, the coefficients of the control variables are undetermined and their standard errors are undefined.

• If there is less than perfect multicollinearity, the coefficients of the control variables, though determined, show very large standard errors and cannot be estimated with precision.

Multicollinearity

Suppose we have perfect multicollinearity in the following model (variables in deviation form):

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

Without multicollinearity, the OLS estimators are:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

$\hat{\beta}_3 = \dfrac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

Estimation under perfect Multicollinearity

Assume $x_{3i} = \lambda x_{2i}$, where $\lambda$ is a constant different from zero. Substituting into $\hat{\beta}_2$, we have:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\lambda^2 \sum x_{2i}^2) - (\lambda \sum y_i x_{2i})(\lambda \sum x_{2i}^2)}{(\sum x_{2i}^2)(\lambda^2 \sum x_{2i}^2) - (\lambda \sum x_{2i}^2)^2} = \dfrac{0}{0}$

an indeterminate expression: $\hat{\beta}_2$ is undetermined.
For home: Show that $\hat{\beta}_3$ is undetermined as well.
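In practice, regression software cannot compute these estimates either. A minimal Stata sketch (simulated data; names are illustrative): when x3 is an exact multiple of x2, Stata omits one of the two regressors.

* perfect collinearity: x3 is an exact linear function of x2
clear
set obs 50
set seed 2
gen x2 = rnormal()
gen x3 = 5*x2                  // exact linear dependence
gen y  = 1 + 0.5*x2 + rnormal()
regress y x2 x3                // note: x3 omitted because of collinearity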

Estimation under high but imperfect Multicollinearity

• Suppose we want to estimate the same model:

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

• Instead of exact multicollinearity, suppose we now have:

𝑥3𝑖 = 𝜆𝑥2𝑖 + 𝑣𝑖

where 𝜆 ≠ 0 and 𝑣𝑖 is a stochastic error term.

• Note that $\sum x_{2i} v_i = 0$.

Estimation under high but imperfect Multicollinearity

• Estimating the model, we get:

$\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\lambda^2 \sum x_{2i}^2 + \sum v_i^2) - (\lambda \sum y_i x_{2i} + \sum y_i v_i)(\lambda \sum x_{2i}^2)}{(\sum x_{2i}^2)(\lambda^2 \sum x_{2i}^2 + \sum v_i^2) - (\lambda \sum x_{2i}^2)^2}$

• Remember that $\sum x_{2i} v_i = 0$.

• In contrast to the case of perfect multicollinearity, this model can be estimated.
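A minimal Stata sketch (simulated data; names are illustrative): with x3 nearly, but not exactly, collinear with x2, the regression runs, although the standard errors are large.

* high but imperfect collinearity
clear
set obs 50
set seed 3
gen x2 = rnormal()
gen x3 = 5*x2 + 0.05*rnormal()   // x3 = lambda*x2 + v, with small v
gen y  = 1 + 0.5*x2 - 0.3*x3 + rnormal()
regress y x2 x3                  // estimable, but with inflated standard errors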

Practical consequences of multicollinearity

• In cases of near or high multicollinearity, the consequences are as follows:

1. Even though the OLS estimators $\hat{\beta}_k$ are BLUE, they show large variances and covariances: lack of precision in the estimation!
2. Due to 1), the confidence intervals are wider. This means it is easier not to reject $H_0: \beta_k = 0$.
3. Due to 1), the $t$ statistics of one or more coefficients tend to be non-significant.
4. Despite the $t$ statistics of one or more coefficients being non-significant, $R^2$ can be very high.
5. The OLS estimates $\hat{\beta}_k$ and their standard errors are very sensitive to small changes in the data.

1. OLS estimators with large variances and covariances

• We have the model (variables in deviation form):

$y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$

Its variances and covariance are:

$var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{2,3}^2)}$

$var(\hat{\beta}_3) = \dfrac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{2,3}^2)}$

$cov(\hat{\beta}_2, \hat{\beta}_3) = \dfrac{-r_{2,3}\,\sigma^2}{(1 - r_{2,3}^2)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$

where $r_{2,3}$ is the correlation coefficient between $X_2$ and $X_3$. If $r_{2,3}$ rises, so do the variances and the covariance of the estimators.

1. OLS estimators with large variances and covariances

• The speed at which the variances and covariances increase can be tracked with the Variance Inflation Factor (VIF):

$VIF = \dfrac{1}{1 - r_{2,3}^2}$

What happens to VIF as $r_{2,3}^2$ approaches 1?

VIF rises: the degree of collinearity increases, and with it the variance of the estimator.
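Continuing the simulated data from the previous sketch, the VIF can be obtained in Stata directly or by hand (a sketch; names are illustrative):

regress y x2 x3
estat vif                          // Stata's variance inflation factors
correlate x2 x3
display "VIF by hand = " 1/(1 - r(rho)^2)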

1. OLS estimators with large variances and covariances

• The variances of the estimators can also be written as:

$var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}\,VIF$

$var(\hat{\beta}_3) = \dfrac{\sigma^2}{\sum x_{3i}^2}\,VIF$

• The variances are directly proportional to VIF.

1. OLS estimators with large variances and covariances

• For the model with $k$ variables:

$var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_k^2} \cdot \dfrac{1}{1 - R_k^2}$

where $R_k^2$ is the $R^2$ of the auxiliary regression of $X_k$ on the remaining control variables.

• This is the same as:

$var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_k^2}\,VIF_k$

For home: With $\sigma^2$ and $VIF_k$ constant, what happens to $var(\hat{\beta}_k)$ if the variation of a control variable is higher?

2. Wider confidence intervals
• The confidence interval is given by:

$\hat{\beta}_2 - t_{\alpha/2}\,se(\hat{\beta}_2) \le \beta_2 \le \hat{\beta}_2 + t_{\alpha/2}\,se(\hat{\beta}_2)$

If the degree of collinearity increases, the confidence interval becomes wider.


Value of $r_{2,3}$ → 95% confidence interval for $\beta_2$ (the VIF appears under the square root):

0.00:   $\hat{\beta}_2 \pm 1.96\sqrt{var(\hat{\beta}_2)}$
0.50:   $\hat{\beta}_2 \pm 1.96\sqrt{1.33}\sqrt{var(\hat{\beta}_2)}$
0.95:   $\hat{\beta}_2 \pm 1.96\sqrt{10.26}\sqrt{var(\hat{\beta}_2)}$
0.999:  $\hat{\beta}_2 \pm 1.96\sqrt{500}\sqrt{var(\hat{\beta}_2)}$

Here $var(\hat{\beta}_2)$ denotes the variance when $r_{2,3} = 0$. As the interval widens, the probability of failing to reject a false hypothesis (a type II error) increases.
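The VIF values in the table follow directly from the formula; a quick Stata check:

foreach r in 0 0.50 0.95 0.999 {
    display "r = `r'   VIF = " %8.2f 1/(1 - `r'^2)
}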
3. Non-significant 𝒕 values

• With high multicollinearity, the standard errors increase dramatically, which shrinks the $t$ statistic:

$t = \dfrac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)}$

• We then fail to reject $H_0: \beta_2 = 0$ too easily.

4. 𝑹𝟐 is high but few 𝒕 values are significant

• Consider the model:

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋2𝑖 + 𝛽3 𝑋3𝑖 + ⋯ + 𝛽𝑘 𝑋𝑘𝑖 + 𝑢𝑖

• Non-significant $t$ values combined with a high $R^2$ and a significant $F$ statistic are a sign of multicollinearity.

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

Consider a dataset with regressors $X_2$, $X_3$, and $X_c$ and estimate two models:

Model 1

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

Model 2

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{ci} + u_i$

Observe $X_3$ and $X_c$: they differ only by small changes in the data.

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How can we identify multicollinearity?

Model 1: $r_{2,3} = 0.5523$
Model 2: $r_{2,x_c} = 0.8285$

5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How to do this in Stata? The slide shows the Stata output; the key numbers are:

Model 1: $r_{2,3} = 0.5523$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.00868$
Model 2: $r_{2,x_c} = 0.8285$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.0282$
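A sketch of the Stata commands that would produce this output (variable names are assumptions, since the dataset itself is only shown on the slide):

* Model 1
regress y x2 x3
correlate x2 x3        // r = 0.5523
estat vce              // variance-covariance matrix of the estimators
* Model 2
regress y x2 xc
correlate x2 xc        // r = 0.8285
estat vce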


5. Sensitivity of the OLS estimators and their standard errors to small changes in the data

How can we identify multicollinearity?

Model 1: $r_{2,3} = 0.5523$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.00868$
Model 2: $r_{2,x_c} = 0.8285$, $cov(\hat{\beta}_2, \hat{\beta}_3) = -0.0282$

Observe the standard errors in both models and the individual significance of the coefficients: a small change in the data ($X_3$ versus $X_c$) changes them noticeably.

How to detect multicollinearity?

1. A high $R^2$ but few significant $t$ values.
2. High correlations across pairs of control variables.
3. Partial correlations.
4. Auxiliary regressions (see the sketch after this list).
5. Eigenvalues and the condition index.
6. VIF (see the sketch after this list).
7. Scatter plots.
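A minimal Stata sketch of diagnostics 4 and 6 (assuming a fitted model with regressors x2 and x3; names are illustrative):

* auxiliary regression: regress one control variable on the others
regress x2 x3
display "auxiliary R2 = " e(r2)    // a high R2 here signals collinearity
* VIF after estimating the main model
regress y x2 x3
estat vif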

REFERENCE

Gujarati, Damodar N. & D. C. Porter. 2010. Basic Econometrics. 5th ed. McGraw-Hill. Chapter 10: Multicollinearity.
