
Multicollinearity

• Assumption of OLS: the X variables are independent; that is, they are not correlated with each other

• Meaning of multicollinearity: the presence of a perfect (or less than perfect) linear relationship among some or all of the X variables
Multicollinearity

Degrees of multicollinearity:
1. No
2. Low
3. Moderate
4. High
5. Very High

• In practice one rarely encounters perfect multicollinearity

• But cases of near or very high multicollinearity arise in many applications

• Thus, multicollinearity is a question of degree

• Note that a nonlinear relationship between variables does not imply multicollinearity
What if there is multicollinearity?

• The separate influence of the Xs on Y cannot be assessed

• Example: Consumption = f (Income, Wealth)

• If Income and Wealth have a linear relationship (i.e. if they move exactly together), there is no way to assess their separate influence on consumption
Intuitive Reasoning

• We know that in multiple regression, B2 measures the change in the mean value of Y per unit change in X2, holding the value of X3 constant

• In these circumstances, if X2 and X3 are perfectly collinear, there is no way X3 can be kept constant

• As X2 changes, so does X3
Sources of multicollinearity

(a) Model constraints or specification:

• Electricity consumption = f (Income, House size)

• Both Xs are important for the model

• But because families with high incomes generally have larger homes than otherwise, the two regressors tend to move together
(b) Overdetermined model:

• When the model has more Xs than the number of observations

• Example: information collected on a large number of variables from a small number of patients
(c) Data generation:

• Most economic data are not obtained in controlled laboratory experiments

• Example: GDP, stock prices, profits

• If data could be obtained experimentally, we would not allow collinearity to come up

• Thus, multicollinearity is a sample phenomenon arising out of the non-experimental data generated in the social sciences
Example

X1    X2    X3
10    50    52
15    75    75
18    90    97
24   120   129
30   150   152

• X2 = 5 X1, so there is perfect multicollinearity (r12 = 1)

• r23 = 0.995 (highly correlated, but not perfect!)

• With 2 X variables, use simple correlation; with more than 2, use partial correlation (here r12 denotes simple correlation); a short check of these figures follows below
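A minimal check of the correlations quoted above, assuming numpy is available:

```python
import numpy as np

X1 = np.array([10, 15, 18, 24, 30], dtype=float)
X2 = np.array([50, 75, 90, 120, 150], dtype=float)
X3 = np.array([52, 75, 97, 129, 152], dtype=float)

r12 = np.corrcoef(X1, X2)[0, 1]   # exactly 1, since X2 = 5 * X1
r23 = np.corrcoef(X2, X3)[0, 1]   # close to 1 but not perfect

print(round(r12, 3), round(r23, 3))
```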
Is the multicollinearity problem most damaging?

• No, provided it is "imperfect" in nature

• So long as collinearity is not perfect, OLS estimators still retain the BLUE property

• This is because imperfect multicollinearity per se does not violate the other assumptions of the OLS method

• In any case, perfect collinearity is an extreme case.
Then, why all the fuss?
Because multicollinearity has some practical consequences

(a) Theory vs. practice:

• Theoretically, inclusion of two Xs might be warranted; but it may create practical problems

• Example: Consumption = f (Income, Wealth)

• Although income and wealth are logical candidates to explain consumption, they might be correlated – wealthier people tend to have higher incomes

• The ideal solution here would be to have sample observations of both wealthy individuals with low income and high-income individuals with low wealth
(b) High standard errors (SE):

• In the presence of multicollinearity, the SEs and variances of the OLS estimators become large

• If the SE of an estimator increases, it becomes more difficult to estimate the true value of the parameter
(c) Wider confidence intervals:

• Because of the large SEs

• Hence the probability of accepting a false hypothesis (Type II error), i.e. H0: B2 = 0, increases
(d) Insignificant t ratios:

• For the purpose of hypothesis testing, we use the following t ratio:

t = (b2 - B2) / se(b2)

• Here, if the SE is large due to multicollinearity, the t value will be smaller

• As such, one will increasingly accept H0: B2 = 0
(e) High R2 and F ratio but few significant t ratios:

• One or more bs are individually statistically insignificant on the basis of the t test

• Yet R2 may be very high

• And the F test can reject H0: b2 = b3 = ….. = bk = 0

• This is indeed a sign of the presence of multicollinearity
(f) Sensitivity of results:

• In the presence of multicollinearity, the OLS estimators and their SEs become very sensitive to small changes in the data

(g) Wrong signs:

• Regression coefficients may have wrong/unexpected signs
Detection of multicollinearity – some thumb rules

(1) High R2 and F ratio but few significant t ratios:

• One or more bs are individually statistically insignificant on the basis of the t test

• Yet R2 may be very high

• And the F test can reject H0: b2 = b3 = ….. = bk = 0

• This is the most commonly used detection technique (a simulated illustration follows below)

• It is used as preliminary evidence, which can be confirmed with other techniques
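A small simulation of this symptom, assuming numpy and statsmodels are installed; the data are artificial and the exact numbers will vary:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)      # x3 is nearly a copy of x2
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

print(res.rsquared)      # typically high
print(res.f_pvalue)      # F test rejects H0: b2 = b3 = 0
print(res.pvalues[1:])   # yet individual t tests may look insignificant
```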
(2) Zero-order (or pair-wise, or simple) correlation coefficient (r):

• r measures the degree of linear association between two variables, say X and Y

• r can be computed in two ways:

r = ±√r²    (or)    r = Σ xi yi / √(Σ xi² · Σ yi²)

(where xi and yi are deviations from their respective means)
• If r between two Xs is high, say in excess of 0.8, then multicollinearity can be a serious problem

• However, if there are more than two explanatory variables in the model, the partial correlation coefficient will provide a more accurate assessment of the presence or absence of multicollinearity
(3) High partial correlation:

• In general, r is not likely to reflect the true degree of association between Y and X in the presence of another variable, say X1

• What we need is a correlation coefficient that is independent of the influence, if any, of X1 on X and Y

• Such a correlation coefficient is known as the partial correlation coefficient

• Partial correlation represents the correlation between two variables holding another variable constant

• Conceptually it is similar to a partial regression coefficient

• Example: r12.3 represents the correlation between variables 1 (say Y) and 2 (say X), holding a third variable (X1) constant
• Consider 3 Xs (X2, X3 & X4) in a regression model

• Under zero-order correlation, X2 and X3 might appear highly correlated

• But under partial correlation (where we hold the influence of X4 constant), X2 and X3 might not be highly correlated

• Thus, in the context of several Xs, reliance on zero-order correlations to check multicollinearity can be misleading.

r12.3 = (r12 - r13 r23) / √[(1 - r13²)(1 - r23²)]

• Partial correlations given above are called first-order correlation coefficients

• By order we mean the number of secondary subscripts

• Thus, r12.34 would be a correlation coefficient of order two, r12.345 of order three, and so on (a small code sketch of the first-order formula follows below)
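A minimal sketch of the first-order formula in code, assuming numpy; partial_corr is an illustrative helper name, not a library function:

```python
import numpy as np

def partial_corr(v1, v2, v3):
    """Correlation between v1 and v2 holding v3 constant (first order)."""
    r12 = np.corrcoef(v1, v2)[0, 1]
    r13 = np.corrcoef(v1, v3)[0, 1]
    r23 = np.corrcoef(v2, v3)[0, 1]
    return (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))
```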
(4) Auxiliary regressions:

Step 1: Regress each X on the remaining X variables (these are called auxiliary regressions) and get the corresponding R2, which we designate as R2i

Step 2: Construct the following F statistic:
Fi = [R2xi.x2x3….xk / (k - 2)] / [(1 - R2xi.x2x3….xk) / (n - k + 1)]

where n = sample size
k = no. of Xs including the intercept term
R2xi.x2x3….xk = R2 value from a single auxiliary regression
k - 2 = numerator d.f.
n - k + 1 = denominator d.f.

Step 3: For any single auxiliary regression, if the computed F exceeds the critical F at the chosen α and the given numerator and denominator d.f., that particular Xi is taken to be collinear with the other Xs; otherwise, the reverse conclusion applies.

Problem with this rule: computational burden (a sketch of the computation follows below)
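To make the procedure concrete, here is a minimal sketch of Steps 1 and 2, assuming numpy and statsmodels are available; auxiliary_F is an illustrative helper name:

```python
import numpy as np
import statsmodels.api as sm

def auxiliary_F(X):
    """X: (n, m) array of regressors without the intercept column (m >= 2).
    Returns one (R2_i, F_i) pair per regressor, following the formula above."""
    n, m = X.shape
    k = m + 1                                   # no. of Xs including the intercept
    results = []
    for i in range(m):
        xi = X[:, i]                            # regressand of the auxiliary regression
        others = sm.add_constant(np.delete(X, i, axis=1))
        r2 = sm.OLS(xi, others).fit().rsquared
        F = (r2 / (k - 2)) / ((1 - r2) / (n - k + 1))
        results.append((r2, F))
    return results
```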
(5) Klein's rule of thumb:

• Lawrence R. Klein proposed this rule

• Step 1: Obtain R2 from each auxiliary regression

• Step 2: Obtain R2 from the regression of Y on all the Xs (the overall R2)

• Step 3: If an R2 from Step 1 exceeds the R2 from Step 2, then multicollinearity might be present (see the sketch below)
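A similarly small sketch of Klein's rule under the same assumptions (numpy/statsmodels); klein_flags is an illustrative name:

```python
import numpy as np
import statsmodels.api as sm

def klein_flags(y, X):
    """Flag each regressor whose auxiliary R2 exceeds the overall R2 of y on all Xs."""
    overall_r2 = sm.OLS(y, sm.add_constant(X)).fit().rsquared
    flags = []
    for i in range(X.shape[1]):
        others = sm.add_constant(np.delete(X, i, axis=1))
        aux_r2 = sm.OLS(X[:, i], others).fit().rsquared
        flags.append(aux_r2 > overall_r2)       # True suggests possible multicollinearity
    return flags
```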
(7) Variance inflating factor (VIF):

• The speed with which variances increase can be seen with the VIF

• The VIF shows how much the variance of an estimator is inflated by the presence of multicollinearity

• The VIF is defined as follows:

VIF = 1 / (1 - r2x2x3)
• Here r2x2x3 is the r-squared value obtained from the auxiliary regression of the x2 variable on x3

• Note that as r2x2x3 approaches 1, VIF approaches infinity

• In other words, as the extent of collinearity increases, the variance of the estimator increases

• If there is no collinearity between x2 and x3, VIF will be 1
• Thus, the larger the VIF of a single X variable, the more "troublesome" or collinear that variable is with the other Xs

• Rule of thumb: if the VIF of an X exceeds 10, which will happen if r2j or R2j exceeds 0.90, that variable is said to be highly collinear

• R2j is the R2 from the regression of a single X variable on the remaining Xs in the model [auxiliary regression] (see the sketch below)
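A short sketch, assuming statsmodels is installed: its variance_inflation_factor helper computes 1 / (1 - R2j) from the auxiliary regression, and the tolerance factor discussed next is just its reciprocal; vif_and_tol is an illustrative wrapper name.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_and_tol(X):
    """X: (n, m) regressor matrix without intercept.
    Returns (VIF_j, TOL_j) per column; VIF_j > 10 flags high collinearity."""
    Xc = sm.add_constant(np.asarray(X, dtype=float))
    out = []
    for j in range(1, Xc.shape[1]):             # column 0 is the intercept
        vif = variance_inflation_factor(Xc, j)  # = 1 / (1 - R2_j)
        out.append((vif, 1.0 / vif))            # TOL_j = 1 / VIF_j
    return out
```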
(8) Tolerance factor:

• The inverse of the VIF is called the tolerance factor (TOL). That is,

TOLj = 1 / VIFj  (or)  (1 - R2j)

• When R2j = 1 (i.e. perfect collinearity), TOL = 0, and when R2j = 0 (i.e. no collinearity), TOL = 1
• Therefore, the closer the tolerance is to zero, the greater the degree of collinearity of that variable with the other regressors

• The closer the tolerance is to 1, the greater the evidence that xj is not collinear with the other regressors
• To conclude:

We can't tell which of these methods will work in a given case

No single diagnostic gives us a complete handle on the collinearity problem

Since multicollinearity is a sample-specific problem, in some situations it might be easy to diagnose

But in others, one or more of the various methods will have to be used

In short, there is no easy solution to the problem
Remedial measures

• What can be done if multicollinearity is serious?

• Unfortunately, there is no surefire remedy; there are only a few rules of thumb

• This is because multicollinearity is largely a data-deficiency problem over which we don't have full control

• If the problem is with the data, there is not much that can be done
Rules of thumb to eliminate/reduce multicollinearity

(1) A priori information:

• When constructing the regression model, one can avoid Xs that are likely to have a linear relationship

• How? From previous empirical work, theory, intuitive reasoning, etc.
(2) Dropping the collinear variables:

• The simplest solution is to drop one or more of the collinear variables

• But this remedy can be worse than the disease, because it will lead to specification bias

• We construct regression models based on some theoretical foundation

• Hence, dropping variables for the sake of multicollinearity will produce biased results

• The advice then is: do not drop a variable from a theoretically valid model just because of a multicollinearity problem
(3) Transformation of variables/data:

• Applicable in the case of time-series data

• In such data, multicollinearity emerges because, over time, variables tend to move in the same direction

• One way to minimize this dependence is to proceed as follows:

• Consider this relation: Yt = b1 + b2 X2t + b3 X3t + et

• If it holds at time t, it must also hold at time t-1

• Hence Yt - Yt-1 = b2 (X2t - X2,t-1) + b3 (X3t - X3,t-1) + (et - et-1)

• Now the relationship is in first-difference form

• If we run this regression, it reduces the severity of the multicollinearity problem

• This is because there is no a priori reason to believe that the differences of the variables will also be highly correlated (a simulated illustration follows below)
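A small simulated illustration (assuming numpy) of why first differencing helps: two trending series are nearly perfectly correlated in levels but much less so in first differences. The series here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100)
x2 = 5 * t + rng.normal(scale=20, size=100)   # both series trend upward
x3 = 3 * t + rng.normal(scale=20, size=100)

r_levels = np.corrcoef(x2, x3)[0, 1]
r_diffs = np.corrcoef(np.diff(x2), np.diff(x3))[0, 1]   # one observation is lost

print(round(r_levels, 2), round(r_diffs, 2))  # levels near 1, differences near 0
```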
• But this approach has the following problems:

(a) There will be a loss of one observation due to the differencing procedure, and hence one d.f. will be lost

(b) The procedure may not be appropriate with cross-sectional data, where there is no logical ordering of observations
(4) Ratio transformation of variables/data:

• This is another form of data transformation

• Consider the following model:

Yt = b1 + b2 X2t + b3 X3t + et

where Y is consumption expenditure, X2 is GDP and X3 is total population
• Since GDP and population grow over time, they are likely to be correlated

• Using the ratio transformation method we can express the above model on a per capita basis. That is,

Yt / X3t = b1 (1 / X3t) + b2 (X2t / X3t) + b3 + (ut / X3t)

• Such a transformation may reduce collinearity in the original variables (a small sketch follows below)

• The population variable is now missing from the model as a separate regressor (does this lead to specification bias?)
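A minimal sketch of constructing the per capita variables above, with made-up consumption, GDP and population figures purely for illustration:

```python
import numpy as np

cons = np.array([ 80.,  87.,  98., 110., 126.])   # Y: consumption expenditure
gdp  = np.array([100., 110., 125., 140., 160.])   # X2: GDP
pop  = np.array([ 50.,  51.,  52.,  53.,  54.])   # X3: total population

# Variables for the transformed model Y/X3 = b1*(1/X3) + b2*(X2/X3) + b3 + u/X3
y_pc  = cons / pop        # dependent variable: per capita consumption
x_inv = 1.0 / pop         # regressor attached to b1
x2_pc = gdp / pop         # regressor attached to b2; b3 becomes the intercept
```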
(5) Additional or new data:

• Sometimes, simply increasing the sample size may reduce the collinearity problem, provided it helps to impart more variation in the data

(6) Specify the model correctly:

• Sometimes, the model chosen for empirical analysis is not carefully thought out

• Some important variables may be omitted, or the functional form of the model may be incorrectly chosen

• Avoiding this will solve a major part of the problem
