
Multicollinearity

Abhijeet Kumar Kumar Anshuman Manish Kumar Umashankar Singh

Agenda
Definition of multicollinearity
Is multicollinearity really a problem?
Nature of multicollinearity
Practical consequences
Detection of multicollinearity
Remedial measures

Definition
Meaning of multicollinearity: the presence of a perfect (or less than perfect) linear relationship among some or all of the X variables in a regression model.
Assumption of OLS: the X variables are independent, that is, they are not correlated with each other.

Example
As a numerical example, consider the following hypothetical data:

X2     X3     X*3
10     50     52
15     75     75
18     90     97
24     120    129
30     150    152

Explanation
It is apparent that X3i = 5X2i; therefore, there is perfect collinearity between X2 and X3, since the coefficient of correlation r23 is unity. The variable X*3 was created from X3 by simply adding to it the following numbers, taken from a table of random numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between X2 and X*3 (X*3i = 5X2i + vi). However, the two variables are still highly correlated: calculation shows that the coefficient of correlation between them is 0.9959.
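The two correlations on this slide can be checked numerically. Below is a minimal sketch in Python (NumPy assumed available); the names x2, x3, and x3_star simply mirror the slide's notation.

```python
import numpy as np

# Hypothetical data from the slide
x2 = np.array([10, 15, 18, 24, 30], dtype=float)
x3 = 5 * x2                                            # X3 = 5*X2: perfect collinearity
x3_star = x3 + np.array([2, 0, 7, 9, 2], dtype=float)  # X*3 = X3 + random numbers

r_23 = np.corrcoef(x2, x3)[0, 1]            # exactly 1
r_23_star = np.corrcoef(x2, x3_star)[0, 1]  # about 0.9959

print(round(r_23, 4))
print(round(r_23_star, 4))
```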

Diagram

[Figure: degrees of collinearity — 1. None  2. Low  3. Moderate  4. High  5. Very High]

Nature of Multicollinearity Problem


In practice one rarely encounters perfect multicollinearity, but cases of near or very high multicollinearity arise in many applications. Thus, multicollinearity is a question of degree. Note that a nonlinear relationship between variables does not imply multicollinearity. Example: population and GDP are closely related, i.e., highly correlated.

Nature of Multicollinearity Problem


Many financial and economic variables, especially time series, are closely related to each other. In a multiple regression model, a regression coefficient measures the partial effect of an individual X variable on Y when all other X variables in the model are held fixed. However, when two explanatory variables move closely together, we cannot assume that one is fixed while the other changes: when one changes, the other changes too, because they are closely related. In such a case it is difficult to isolate the partial effect of a single X variable. This is the problem of multicollinearity.

What if there is multicollinearity?


The separate influence of the Xs on Y cannot be assessed. Example: Consumption = f (Income, Wealth)

If income and wealth have a linear relationship (i.e., if they move exactly together), there is no way to assess their separate influence on consumption.

Sources of multicollinearity
Model constraints or specification: Electricity consumption = f (Income, House size). Both Xs are important for the model, but families with high incomes generally have larger homes than others.
Overdetermined model: when the model has more Xs than the number of observations. Example: information collected on a large number of variables from a small number of patients.

Data generation
Most economic data are not obtained in controlled laboratory experiments. Examples: GDP, stock prices, profits. If data could be obtained experimentally, we would not allow collinearity to arise. Thus, multicollinearity is a sample phenomenon arising out of the non-experimental data generated in the social sciences.

Example

X1     X2     X3
10     50     52
15     75     75
18     90     97
24     120    129
30     150    152

X2 = 5X1: there is perfect multicollinearity (r12 = 1).

r23 = 0.995 (highly correlated, but not perfect!). With two X variables, use the simple correlation; with more than two, use partial correlations. Here r12 denotes the simple correlation.

Detection
Klein's rule of thumb: instead of formally testing all auxiliary R2 values, one may adopt Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the R2 obtained from an auxiliary regression is greater than the overall R2, that is, the R2 obtained from the regression of Y on all the regressors.

Of course, like all other rules of thumb, this one should be used judiciously.
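Klein's rule can be illustrated on simulated data: regress each X on the remaining Xs (an auxiliary regression) and compare that R2 with the overall R2. The sketch below uses NumPy with made-up data; the helper r_squared is our own illustration, not part of any standard API.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=n)               # x3 nearly collinear with x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + 5.0 * rng.normal(size=n)

def r_squared(dep, regressors):
    """R^2 from an OLS regression of dep on the regressors (plus intercept)."""
    X = np.column_stack([np.ones(len(dep))] + regressors)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    tss = (dep - dep.mean()) @ (dep - dep.mean())
    return 1.0 - (resid @ resid) / tss

overall_r2 = r_squared(y, [x2, x3])   # Y on all regressors
aux_r2 = r_squared(x3, [x2])          # auxiliary regression: x3 on x2

# Klein's rule flags trouble when the auxiliary R^2 exceeds the overall R^2
print(aux_r2 > overall_r2)
```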

Using collinearity statistics to diagnose multicollinearity


Tolerance: a measure of the proportion of variation in a predictor variable that cannot be explained by the other predictors. A very small tolerance means the predictor is redundant. Tolerance values of less than 0.1 merit further investigation (and may mean multicollinearity is present).

Using collinearity statistics contd..


Variance Inflation Factor (VIF): an indicator of the effect that the other independent variables have on the standard error of a regression coefficient. VIF is given by 1/Tolerance. Large VIF values indicate a high degree of collinearity or multicollinearity. A VIF value of more than 10 is generally considered high.
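The tolerance/VIF relationship on these two slides can be sketched directly: for each regressor, run the auxiliary regression on the other regressors, take tolerance = 1 − R2, and VIF = 1/tolerance. The simulated data and the helper function below are illustrative assumptions, not a standard API.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + 0.2 * rng.normal(size=n)   # highly collinear with x2
x4 = rng.normal(size=n)              # unrelated regressor

def tolerance_and_vif(target, others):
    """Tolerance = 1 - R^2 of the auxiliary regression; VIF = 1/Tolerance."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    tss = (target - target.mean()) @ (target - target.mean())
    r2 = 1.0 - (resid @ resid) / tss
    tol = 1.0 - r2
    return tol, 1.0 / tol

tol2, vif2 = tolerance_and_vif(x2, [x3, x4])   # small tolerance, large VIF
tol4, vif4 = tolerance_and_vif(x4, [x2, x3])   # tolerance near 1, VIF near 1

print(vif2 > 10)   # collinear regressor crosses the usual VIF > 10 threshold
print(vif4 < 2)
```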

Practical consequences
Theory vs. practice: theoretically, the inclusion of two Xs might be warranted, but it may create practical problems. Example: Consumption = f (Income, Wealth). Although income and wealth are logical candidates to explain consumption, they may be correlated: wealthier people tend to have higher incomes. The ideal solution here would be to have sample observations on both wealthy individuals with low incomes and high-income individuals with little wealth.

Practical consequences contd..


High Standard Error (SE):
In the presence of multicollinearity, the SEs and variances of the OLS estimators become large. If the SE of an estimator increases, it becomes more difficult to estimate the true value of the parameter.

Sensitivity of results:
In the presence of multicollinearity, the OLS estimates and their SEs become very sensitive to small changes in the data.
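Both consequences can be made visible in a small simulation: with a nearly collinear regressor, the OLS coefficient on x2 bounces around far more across repeated samples than it does with an unrelated regressor. The setup below is a hypothetical sketch using NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x2 = rng.normal(size=n)
x3_coll = x2 + 0.05 * rng.normal(size=n)   # near-perfect collinearity with x2
x3_orth = rng.normal(size=n)               # unrelated regressor

def slope_spread(x3, trials=500):
    """Empirical std. dev. of the estimated coefficient on x2 over repeated y samples."""
    X = np.column_stack([np.ones(n), x2, x3])
    slopes = []
    for _ in range(trials):
        y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes.append(beta[1])
    return float(np.std(slopes))

sd_coll = slope_spread(x3_coll)   # large: collinearity inflates the SE of b2
sd_orth = slope_spread(x3_orth)   # much smaller

print(sd_coll > sd_orth)
```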

Remedial measures
What can be done if multicollinearity is serious? We have two choices: do nothing, or
follow some rules of thumb.

Do Nothing.
Why? Because multicollinearity is essentially a data deficiency problem (micronumerosity), and sometimes we have no choice over the data available for empirical analysis.

Remedial measures contd..


Increasing the sample size: this solution works only when
multicollinearity is due to measurement error, or
multicollinearity exists only in the sample and not in the population.

Dropping one of the collinear variables.
Substituting lagged variables for other explanatory variables in distributed lag models.
Applying methods that incorporate extraneous quantitative information.
Introducing additional equations into the model.
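Dropping a collinear variable is the simplest remedy, but it comes at a price: the coefficient on the retained variable absorbs the effect of the dropped one (specification bias). A hypothetical sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)                   # collinear with x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)   # true partial effects: 2 and 3

def ols(dep, cols):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(dep))] + cols)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return beta

b_full = ols(y, [x2, x3])   # both regressors: imprecise individual estimates
b_drop = ols(y, [x2])       # x3 dropped: slope absorbs x3's effect (about 2 + 3 = 5)

print(float(b_drop[1]))
```

After dropping x3, the coefficient on x2 can no longer be read as a partial effect; it now mixes the influence of both variables.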

Remedial measures contd..


Multicollinearity is a feature of the sample, not of the population. Therefore, we do not test for multicollinearity; rather, we measure its degree in a particular sample. We do not have one unique method of detecting it or measuring its strength. What we have are rules of thumb, some informal and some formal.

Conclusion
We can't tell which of these methods will work in a given case. No single diagnostic gives us a complete handle on the collinearity problem. Since it is a sample-specific problem, in some situations it might be easy to diagnose, but in others one or more of the various methods will have to be used. In short, there is no easy solution to the problem.

Thank You
