RELAXING THE ASSUMPTIONS OF CLRM
Dr. Obid A. Khakimov, Senior Lecturer,
Westminster International University in Tashkent
ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL
- The regression model is linear in parameters.
- The values of the independent variables are fixed in repeated sampling.
- The conditional mean of the residuals is equal to zero.
- For given X's, there is no autocorrelation in the residuals.
- The independent variables, the X's, and the residuals of the regression are independent.
- The number of observations must be greater than the number of parameters.
- There must be sufficient variability in the values of the variables.
- The regression model must be correctly specified.
- There is no perfect linear relationship among the independent variables.
- The residuals of the regression are normally distributed: $e_i \sim N(0, \sigma^2)$.
MULTICOLLINEARITY:
Agenda:
- The nature of multicollinearity.
- Practical consequences.
- Detection.
- Remedial measures to alleviate the problem.
REASONS:
- The data collection process.
- Constraints on the model or on the population being sampled.
- Model specification.
- An over-determined model (more explanatory variables than observations).
PERFECT VS. LESS THAN PERFECT MULTICOLLINEARITY
Perfect multicollinearity is the case when two or more independent variables form an exact linear relationship:

$$\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \dots + \lambda_k X_k = 0,$$

where the constants $\lambda_1, \lambda_2, \dots, \lambda_k$ are not all zero, so that (for example) $X_2$ can be written exactly as a linear combination of the other regressors.
Less than perfect multicollinearity is the case when two or more independent variables are strongly, but not exactly, linearly related:

$$\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k + v_i = 0,$$

where $v_i$ is a stochastic error term.
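To make the distinction concrete, here is a minimal NumPy sketch (the data and the 0.1 error scale are invented for illustration): an exactly scaled copy of a regressor is perfectly collinear with it, while adding a small error term $v_i$ makes the relationship less than perfect.

```python
# A minimal sketch, assuming invented data: perfect vs. near-perfect collinearity.
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=100)

x3_perfect = 2.0 * x2                                  # exact linear relation: r = 1
x3_near = 2.0 * x2 + rng.normal(scale=0.1, size=100)   # small error v_i: r < 1

print(np.corrcoef(x2, x3_perfect)[0, 1])  # exactly 1.0
print(np.corrcoef(x2, x3_near)[0, 1])     # close to, but below, 1.0
```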
MULTIPLE REGRESSION MODEL
For the two-regressor model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, the OLS estimators (in deviation form) are

$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$

$$\hat{\beta}_3 = \frac{\left(\sum y_i x_{3i}\right)\left(\sum x_{2i}^2\right) - \left(\sum y_i x_{2i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$

Suppose $X_{2i} = \lambda X_{3i}$ (perfect collinearity). Substituting gives

$$\hat{\beta}_2 = \frac{\lambda\left(\sum y_i x_{3i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\lambda\left(\sum x_{3i}^2\right)}{\lambda^2\left(\sum x_{3i}^2\right)^2 - \lambda^2\left(\sum x_{3i}^2\right)^2} = \frac{0}{0},$$

an indeterminate expression: under perfect multicollinearity the individual coefficients cannot be estimated.
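The same indeterminacy can be seen numerically. A hedged sketch (simulated data, coefficients chosen arbitrarily): when $X_{3i} = 2X_{2i}$, the cross-product matrix $X'X$ is rank deficient, so the normal equations have no unique solution.

```python
# Hedged numerical sketch (simulated data): with X3 = 2*X2 the matrix X'X
# is rank deficient, so OLS cannot be computed.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x2 = rng.normal(size=n)
x3 = 2.0 * x2                          # perfect collinearity
X = np.column_stack([np.ones(n), x2, x3])
y = 1.0 + 0.5 * x2 + rng.normal(size=n)

print(np.linalg.matrix_rank(X.T @ X))  # 2, not 3: rank deficient
try:
    np.linalg.solve(X.T @ X, X.T @ y)  # the normal equations
except np.linalg.LinAlgError:
    print("singular matrix: coefficients are indeterminate")
```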
OLS ESTIMATION
For $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, the variances and covariance of the OLS estimators are

$$\operatorname{var}(\hat{\beta}_1) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}_2^2 \sum x_{3i}^2 + \bar{X}_3^2 \sum x_{2i}^2 - 2 \bar{X}_2 \bar{X}_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left(\sum x_{2i} x_{3i}\right)^2} \right]$$

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{2,3}^2\right)}, \qquad \operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - r_{2,3}^2\right)}$$

$$\operatorname{cov}(\hat{\beta}_2, \hat{\beta}_3) = \frac{-r_{2,3}\,\sigma^2}{\left(1 - r_{2,3}^2\right)\sqrt{\sum x_{2i}^2}\sqrt{\sum x_{3i}^2}}$$

where $r_{2,3}$ is the correlation coefficient between $X_2$ and $X_3$.
As the degree of collinearity $r_{2,3}$ approaches one, the variances and covariances of the coefficients approach infinity. Thus, the presence of high collinearity makes precise estimation of the individual coefficients difficult.
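As a hypothetical calculation: with $r_{2,3} = 0.9$, the inflation factor is $1/(1 - 0.81) = 1/0.19 \approx 5.3$; with $r_{2,3} = 0.99$ it is $1/0.0199 \approx 50$, so $\operatorname{var}(\hat{\beta}_2)$ is roughly 50 times what it would be with uncorrelated regressors.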
PRACTICAL CONSEQUENCES
- The OLS estimators remain BLUE, but the large variances and covariances make precise estimation difficult.
- Large variances produce wide confidence intervals, so hypothesis tests tend to fail to reject a false null.
- t-statistics tend to be statistically insignificant.
- Although the t-statistics are low, the R² may be very high.
- The estimators and their variances are very sensitive to small changes in the data.
VIF: VARIANCE INFLATION FACTOR
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{2,3}^2\right)} = \frac{\sigma^2}{\sum x_{2i}^2} \cdot VIF, \qquad VIF = \frac{1}{1 - r_{2,3}^2}$$
[Figure: VIF (0 to 120) plotted against the correlation between the regressors (0 to 1); VIF stays near 1 for low correlation and rises steeply toward infinity as the correlation approaches 1.]
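The figure's message can be reproduced in a few lines (correlation values chosen for illustration): tabulating $VIF = 1/(1 - r^2)$ shows the explosive growth as $r \to 1$.

```python
# Illustrative values only: VIF = 1/(1 - r^2) for rising correlation r.
for r in [0.0, 0.5, 0.8, 0.9, 0.95, 0.99]:
    print(f"r = {r:4.2f}  ->  VIF = {1.0 / (1.0 - r**2):7.2f}")
# r = 0.99 already gives a VIF of about 50; as r -> 1, VIF -> infinity.
```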
IMPLICATION FOR K VARIABLE MODELS
For the k-variable model

$$Y_i = \hat{\beta}_0 X_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \dots + \hat{\beta}_k X_k + \hat{u}_i$$

(with $X_0 = 1$ for the intercept), the variance of the j-th slope estimator is

$$\operatorname{var}(\hat{\beta}_j) = \frac{\sigma^2}{\sum x_j^2 \left(1 - R_j^2\right)} = \frac{\sigma^2}{\sum x_j^2} \cdot VIF_j, \qquad VIF_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the $R^2$ from the auxiliary regression of $X_j$ on the remaining regressors.
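A minimal sketch of $VIF_j$ computed exactly as in the formula above, via the auxiliary-regression $R_j^2$ (the function name and the use of plain NumPy least squares are my own choices; statsmodels ships an equivalent variance_inflation_factor helper):

```python
# A sketch, not a library routine: VIF_j from the auxiliary-regression R_j^2.
import numpy as np

def vif(X):
    """X: (n, k) array of regressors, without the intercept column."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]                        # regress X_j on the other columns
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2_j = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2_j))     # VIF_j = 1 / (1 - R_j^2)
    return out
```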
CONFIDENCE INTERVALS AND T-STATISTICS
The 95% confidence interval for $\beta_k$ widens with the VIF,

$$\hat{\beta}_k \pm 1.96\, se(\hat{\beta}_k), \qquad se(\hat{\beta}_k) = \sqrt{\frac{\sigma^2}{\sum x_k^2} \cdot VIF_k},$$

and the t-statistic shrinks accordingly:

$$t = \frac{\hat{\beta}_k - 0}{se(\hat{\beta}_k)}$$
Because of the low t-statistics, we cannot reject the null hypothesis that an individual coefficient is zero.
$$H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$$

$$H_a: \text{Not all slope coefficients are simultaneously zero}$$
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{\left(1 - R^2\right)/(n-k)}$$

Due to the high $R^2$, the F-value will be very high, so rejecting $H_0$ is easy: the regressors are jointly significant even though each individual t-statistic is insignificant.
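A hedged simulation of this symptom (all numbers invented; statsmodels is assumed to be available): with two nearly collinear regressors, the overall fit is strong and the F-statistic is large, yet the individual t-statistics are small.

```python
# Hedged simulation (invented data): high R^2 and F, low individual t-statistics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)   # r(x2, x3) very close to 1
y = 1.0 + x2 + x3 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
print(res.rsquared, res.fvalue)  # strong joint fit
print(res.tvalues[1:])           # individual slopes: small t-statistics
```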
DETECTION
Multicollinearity is a question of degree. It is a feature of the sample, not of the population.
How to detect it:
- High R² but low t-statistics.
- High correlation coefficients among the independent variables.
- Auxiliary regressions.
- High VIF.
- Eigenvalues and the condition index.
AUXILIARY REGRESSION
$H_0$: the variable $X_i$ is not collinear with the other regressors.
$$X_i = \hat{a}_0 X_0 + \hat{a}_1 X_1 + \hat{a}_2 X_2 + \hat{a}_3 X_3 + \dots + \hat{a}_k X_k \quad (X_i \text{ omitted from the right-hand side}),$$

and denote the resulting coefficient of determination $R_j^2$.
Run a regression in which one X is the dependent variable and the other X's are the independent variables, and obtain its R².
$$F = \frac{R^2_{x_i \cdot x_2 x_3 \dots x_k}/(k-2)}{\left(1 - R^2_{x_i \cdot x_2 x_3 \dots x_k}\right)/(n-k+1)}$$
Numerator df = k − 2; denominator df = n − k + 1, where k is the number of explanatory variables including the intercept and n is the sample size.
If the F-statistic exceeds the critical F-value, the variable $X_i$ is collinear with the other regressors.
Rule of thumb: if the R² of an auxiliary regression is higher than the overall R² of the main regression, multicollinearity may be troublesome.
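A sketch of the test above (the helper name is mine; k counts the explanatory variables including the intercept, matching the definition above):

```python
# Sketch of the auxiliary-regression F-test (helper name is my own).
import numpy as np

def aux_f_stat(X, i):
    """X: (n, k) regressor matrix whose first column is the intercept; test column i."""
    n, k = X.shape
    y = X[:, i]
    Z = np.delete(X, i, axis=1)            # the other regressors, intercept included
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))
# Compare against the F critical value with (k-2, n-k+1) degrees of freedom.
```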
WHAT TO DO?
- Do nothing.
- Combine cross-section and time-series data.
- Transform the variables (differencing, ratio transformations), as sketched below.
- Obtain additional data or new observations.
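As an illustration of the differencing remedy (simulated trending series, parameters invented): two series that are highly correlated in levels because they share a trend are far less correlated after first-differencing.

```python
# Invented trending series: first-differencing removes the shared trend.
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(100.0)
x2 = t + rng.normal(scale=3, size=100)         # trends upward
x3 = 0.5 * t + rng.normal(scale=3, size=100)   # shares the trend

print(np.corrcoef(x2, x3)[0, 1])                    # high, driven by the trend
print(np.corrcoef(np.diff(x2), np.diff(x3))[0, 1])  # near zero after differencing
```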
READING
Gujarati, D. (2003), Basic Econometrics, Ch. 10.