Topic:

Multicollinearity
Outline

 What is multicollinearity
 Types of multicollinearity
 Causes of multicollinearity
 Consequences of multicollinearity
 Detection methods of multicollinearity
 Remedies for multicollinearity
 Example
What is Multicollinearity?

 In statistics, multicollinearity is a phenomenon in which one
predictor variable in a multiple regression model can be linearly
predicted from the others with a substantial degree of accuracy.
 Example: if someone wants to predict a person's age using
weight and height, the two predictors (weight and height) are
themselves highly correlated.

Types of Multicollinearity

There are two types of multicollinearity


 Perfect multicollinearity
 High or non-perfect multicollinearity
Perfect Multicollinearity
 Perfect multicollinearity refers to a situation in which two or more
explanatory variables in a multiple regression model are perfectly
correlated.
 Consider the model:

Y = β0 + β1X1 + β2X2 + u

If X2 = λX1 for some constant λ, the model becomes

Y = β0 + (β1 + λβ2)X1 + u

Thus only the combined coefficient (β1 + λβ2) would be estimable. We
cannot get the estimates of β1 and β2 separately. In this case there is
perfect multicollinearity, because X1 and X2 are perfectly correlated.
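The rank loss described above can be seen directly. The following is a minimal NumPy sketch (not part of the original slides); the data are made up, with λ = 2 so that X2 = 2·X1 exactly:

```python
import numpy as np

# Hypothetical data for the model above: X2 = 2*X1 exactly (lambda = 2),
# so the design matrix has 3 columns but only rank 2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = 2.0 * x1                            # perfectly correlated with x1
X = np.column_stack([np.ones(20), x1, x2])

# With rank 2, X'X is singular and the OLS normal equations have no
# unique solution: beta1 and beta2 cannot be estimated separately.
print(np.linalg.matrix_rank(X))
```

Any software that tries to invert X′X here will fail or drop one of the collinear columns.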

High or Non-Perfect Multicollinearity

 Some degree of multicollinearity almost always exists in practice.
High (non-perfect) multicollinearity occurs when there are high, but
not perfect, correlations among predictor variables, leading to
unreliable and unstable estimates of the regression coefficients.
How Does Multicollinearity Occur?

 It is caused by improper use of dummy variables (the dummy-variable trap)
 It is caused by inclusion of a variable that is computed
from other variables in the data set
 Multicollinearity can also result from including several
variables that measure the same kind of thing
 More generally, multicollinearity occurs when the predictor
variables are highly correlated with each other

Consequences of Multicollinearity
 Even in the presence of (non-perfect) multicollinearity, the OLS
estimators remain BLUE (best linear unbiased estimators) and consistent
 Standard errors of the estimates tend to be large
 Large standard errors inflate the probability of committing a
Type II error (failing to reject a false null hypothesis)
 Estimates of standard errors and parameters tend to be
sensitive to changes in the data and in the specification of the
model

Detection Methods of Multicollinearity

 Correlation Matrix
 Tolerance Measure
 Variance Inflation Factor (VIF)
 Condition Index (CI)

What is a Correlation Matrix?

A correlation matrix is a table showing the correlation
coefficients between pairs of variables.
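A minimal NumPy sketch (not from the slides) of building such a matrix; the height/weight/age data are made up, with weight constructed to track height so the two predictors are highly correlated, as in the age-prediction example earlier:

```python
import numpy as np

# Hypothetical data: weight is built from height plus noise, so the
# two predictors are highly correlated; age is independent here.
rng = np.random.default_rng(1)
height = rng.normal(170, 10, size=100)
weight = 0.9 * height + rng.normal(0, 3, size=100)
age = rng.normal(30, 5, size=100)

# corrcoef returns the correlation matrix: entry [i, j] is the
# correlation between variable i and variable j (diagonal is 1).
R = np.corrcoef(np.vstack([height, weight, age]))
print(np.round(R, 2))
```

A large off-diagonal entry (here between height and weight) is the first warning sign of multicollinearity.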
What is the Tolerance Measure?

 The tolerance of a predictor is the percentage of the variance in
that predictor that is not accounted for by the other predictors:

Tolerance = 1 − R²

where R² is obtained by regressing that predictor on all the other
predictors.
 Tolerance values of 0.10 or less are most commonly cited as
problematic, although 0.20 has also been suggested.
What is the Variance Inflation Factor (VIF)?

 In statistics, the variance inflation factor (VIF) quantifies the
severity of multicollinearity in an ordinary least squares
regression analysis. It provides an index that measures
how much the variance of an estimated regression coefficient
is increased because of collinearity. It is the reciprocal of the
tolerance:

VIF = 1 / (1 − R²)

 A VIF of 5 or 10 and above indicates a multicollinearity problem
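The tolerance and VIF definitions can be sketched in a few lines of NumPy (this is an illustration, not from the slides; the `vif` helper and the made-up data are my own):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the remaining columns
    (plus an intercept) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)          # tolerance is just 1 - r2

# Hypothetical predictors: x2 nearly duplicates x1, x3 is independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(0, 0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])  # x1, x2 large; x3 near 1
```

By the 5-or-10 rule of thumb above, x1 and x2 would be flagged while x3 would not.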


Condition Index (CI)
 The condition index measures the sensitivity of the regression
estimates to small changes in the data. It is defined as the square
root of the ratio of the largest to the smallest eigenvalue of the
X′X matrix of the explanatory variables.

 If the CI lies between 10 and 30, there is moderate multicollinearity

 If the CI exceeds 30, there is severe multicollinearity
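A minimal NumPy sketch of the eigenvalue computation (my own illustration, with made-up data; columns are scaled to unit length first, a common convention for the condition index):

```python
import numpy as np

def condition_index(X):
    # Scale each column to unit length, then take the square root of
    # the ratio of the largest to the smallest eigenvalue of X'X.
    Xs = X / np.linalg.norm(X, axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)
    return np.sqrt(eig.max() / eig.min())

rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x3 = rng.normal(size=50)
low = np.column_stack([x1, x3])                          # unrelated columns
high = np.column_stack([x1, x1 + rng.normal(0, 0.01, size=50)])

print(round(condition_index(low), 1))    # small: little collinearity
print(round(condition_index(high), 1))   # above 30: severe collinearity
```

The nearly-duplicated column drives the smallest eigenvalue toward zero, which is exactly what pushes the CI past the 30 threshold.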
Remedies for Multicollinearity

The remedies for multicollinearity range from modification of the
regression variates to the use of specialized estimation procedures:
 Use weighted least squares
 Use ridge regression
 Omit one or more of the highly correlated independent variables
 Keep the model with the highly correlated independent variables
when the goal is prediction rather than interpretation of individual
coefficients
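The ridge-regression remedy can be sketched briefly (my own NumPy illustration with made-up data, not the lecture's example): adding a small constant k to the diagonal of X′X makes the near-singular matrix well conditioned, at the cost of some bias.

```python
import numpy as np

def ridge(X, y, k):
    # Ridge estimator: (X'X + k*I)^{-1} X'y. With k > 0 the matrix is
    # invertible even when the columns of X are nearly collinear.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical data: x2 nearly duplicates x1, and y depends on both.
rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(0, 0.01, size=50)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.5, size=50)

print(ridge(X, y, k=0.0))   # k = 0 is OLS: unstable, coefficients can explode
print(ridge(X, y, k=1.0))   # k = 1: both coefficients settle near 1
```

With k = 0 the two coefficients trade off wildly against each other; with a modest k the estimate stabilizes around the shared effect.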
Example

 Klein and Goldberger attempted to fit the following consumption
function to U.S. economy data for 1936–1949:

Y = β0 + β1X1 + β2X2 + β3X3 + u

where Y = consumption, X1 = wage income, X2 = nonwage,
nonfarm income, and X3 = farm income. X1, X2, and X3 are
expected to be highly collinear.
 Detect the multicollinearity and reduce it.
 Raw data are given for consumption, wage income, nonwage
income, and farm income.
Data
[SPSS data screenshot not preserved in the text extraction]

Analyze > Regression > Linear
Statistics…
Output
[SPSS regression output screenshots not preserved]

Remedy by Omitting a Variable
Analyze > Regression > Linear
Statistics…
Output

Remedy Through Weighted Least Squares
By using weighted average
Output
