
Addis Ababa University

College of Business and Economics


Department of Economics

Econometrics I

Chapter Four
Violations of Classical Assumptions



Key assumptions
▪ Recall the assumptions of the classical linear regression model:

➢ E(εt) = 0

➢ Var(εt) = σ² < ∞

➢ Cov(εt, εt−s) = 0 for s ≠ 0

➢ The X matrix is non-stochastic, or fixed in repeated samples

➢ The matrix X is of full rank

➢ εt ~ N(0, σ²)



Key issues to be covered
▪ If these assumptions are violated, we ask:
▪ What does the violation of each assumption mean?
▪ What are its causes (why is it violated)?
▪ How do we test for the violation?
▪ What are the consequences?
• In general we could encounter any combination of 3 problems:
• the coefficient estimates are wrong
• the associated standard errors are wrong
• the distribution that we assumed for the test statistics will be inappropriate
▪ What are the remedial measures?



4.1: Violation of the zero mean assumption

▪ Violation of this assumption affects the intercept term only, with no effect on the slope coefficients.

▪ This implies that a constant non-zero mean of the disturbances affects only the intercept estimate, not the slopes.

▪ Thus, if the other assumptions are satisfied, the slope coefficients are still BLUE.



4.1: multicollinearity problem
▪What is multicollinearity?
▪ Key assumption: X has full column rank, so that the inverse of the matrix (X'X) exists.
▪ This, in turn, implies that the explanatory variables are linearly independent.
▪ That is, there is no linear relationship among the explanatory variables.
▪ The idea of independence of the regressors (Xi) is that we can change the value of one independent variable without changing the values of the other regressors.
▪ However, when independent variables are correlated, changes in one variable are associated with shifts in another variable.
4.1: multicollinearity problem
▪ For the linear model: Y = Xβ + ε
▪ The column vectors X1, X2, …, Xk are linearly dependent if there exists a set of constants C1, C2, …, Ck, not all zero, such that

C1·X1 + C2·X2 + … + Ck·Xk = 0

▪We have k-explanatory variables, but some of them are


linear combinations of the others, so they don’t add any
information.
▪ If this holds exactly for the X1, X2, …, Xk (or a subset of them), then rank(X'X) < k, the determinant of (X'X) equals zero, and at least one of its eigenvalues is zero.
▪ Consequently, (𝑿′ 𝑿)−𝟏 does not exist.
4.1: multicollinearity problem
▪This is the case of perfect multicollinearity.
▪ In such a case, it is impossible to uniquely determine the OLS estimators and their variances, because both require (X'X)⁻¹:

β̂ = (X'X)⁻¹ X'Y

var(β̂) = σ² (X'X)⁻¹

▪ However, if there is no exact linear relationship among the regressors (less-than-perfect multicollinearity), then (X'X) is invertible and (X'X)⁻¹ exists, but its elements can be very large.
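As a small numerical illustration (a hedged sketch, not taken from the slides; the simulated data and the 0.999 correlation level are illustrative), the snippet below shows how the diagonal of (X'X)⁻¹, and hence the coefficient variances, blow up as two regressors become nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)

def inv_diag(corr):
    """Diagonal of (X'X)^-1 for X = [const, x1, x2], with corr(x1, x2) = corr."""
    x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    return np.diag(np.linalg.inv(X.T @ X))

for corr in (0.0, 0.9, 0.999):
    print(corr, inv_diag(corr))
# The entries for x1 and x2 grow dramatically as corr -> 1, so
# var(beta_hat) = sigma^2 (X'X)^-1 becomes very large.
```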

▪ What are the effects of large variances of the OLS estimators?


4.1: multicollinearity problem
Consequences of multicollinearity
▪ When multicollinearity is less than perfect, we can still estimate the regression coefficients and their variances.

▪ However, we would get:

✓ Large variances and very high standard errors of the OLS estimators
✓ Unreliable (imprecise) estimates
✓ Unexpected signs of parameter estimates
✓ Unexpected magnitudes of parameter estimates
✓ Small t-ratios, which may lead to wrong conclusions about individual significance
✓ The overall F-statistic and R² can look good even while individual coefficients appear insignificant
4.1: multicollinearity problem
Detecting of multicollinearity
▪ Inspection of the correlation matrix: inspecting the off-diagonal elements r_ij (i ≠ j) of the correlation matrix of the regressors gives an idea about the presence of multicollinearity. If Xi and Xj are nearly linearly dependent, then |r_ij| will be close to 1.

▪ But there is no clear cut-off point.

▪ Variance inflation factors (VIF): we know that the diagonal elements of (X'X)⁻¹ determine the variances of the OLS estimators.
4.1: multicollinearity problem
Detecting of multicollinearity
▪ If R_j² denotes the coefficient of determination obtained when X_j is regressed on the remaining (k − 1) explanatory variables, then the variance inflation factor is:

VIF_j = 1 / (1 − R_j²)

▪ Then, if X_j is nearly independent of the remaining explanatory variables, R_j² is close to zero and hence VIF_j is close to 1.

▪ But if X_j is nearly linearly dependent on a subset of the remaining explanatory variables, then R_j² is close to one and hence VIF_j becomes very large.
4.1: multicollinearity problem
Detecting of multicollinearity
▪ The combined effect of dependencies among the explanatory variables on the variance of a term is measured by the VIF of that term in the model.

▪ One or more large VIFs indicate the presence of multicollinearity in the data.

▪ In practice, usually,

VIF_j > 10

indicates the presence of multicollinearity, and hence the associated regression coefficients are poorly estimated due to multicollinearity.
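A minimal sketch of how VIFs can be computed in practice, assuming statsmodels is available; the simulated data (with x3 almost a linear combination of x1 and x2) are purely illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: x3 is nearly a linear combination of x1 and x2
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - x2 + 0.05 * rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

X = sm.add_constant(df)                      # regressor matrix with intercept
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)   # the VIFs for x1, x2, x3 lie far above the rule-of-thumb of 10
```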
4.1: multicollinearity problem
Remedies for multicollinearity
▪ The use of principal components is an attempt to extract from the X matrix a small number of variables that, in some sense, account for most or all of the variation in X (see the sketch below).

▪ Collect more data, to increase the variability in X.

▪ Drop some of the collinear variables; but this may lead to a specification problem due to omitted variables, and hence "the cure may be worse than the disease".
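A hedged sketch of the principal-components idea, assuming scikit-learn and statsmodels are available; the simulated data and the choice of two components are illustrative, not a recommendation:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)     # highly collinear with x1
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
pcs = PCA(n_components=2).fit_transform(X)    # keep the two leading components
res = sm.OLS(y, sm.add_constant(pcs)).fit()   # regress y on the components
print(res.params)
```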
4.2: Heteroscedasticity
Recall that:
▪ One of the key assumptions of the CLRM is that the error term is homoscedastic.

▪ That is, Var(εi) = σ² < ∞ for all observations i.
4.2: Heteroscedasticity
▪What is heteroscedasticity?
▪ Heteroscedasticity refers to a situation where the variance of the error term is not constant across observations.
▪ Graphically, the spread of the residuals changes systematically (for example, it fans out) as the fitted values or an explanatory variable increase.
4.2: Heteroscedasticity
▪ That is, a heteroscedastic error term implies that the variances of the error terms are not constant across observations.
▪ Symbolically, Var(εi) = σi², where σi² differs across observations i.
4.2: Heteroscedasticity
▪Sources of heteroscedasticity

1. Following the error-learning models: as people learn, their errors of behaviour become smaller over time, so σi² is expected to decrease.

2. Improvement in data collection: as data-collection techniques improve, σi² is likely to decrease.

3. As incomes grow, people have more discretionary income and thus more choice about how to spend it. Hence, σi² is likely to increase with income.
4.2: Heteroscedasticity
▪Effects of heteroscedasticity

▪ When all the assumptions of the classical linear regression model are satisfied, the OLS estimators are BLUE.
▪ What happens to these properties when the error terms are non-spherical (heteroscedastic)?

▪ The unbiasedness of the OLS estimators is not affected by violation of the homoscedasticity assumption (if all other assumptions are satisfied).

▪ If E(ε) = 0, then E(β̂) = β, indicating that the OLS estimator is unbiased even if the disturbance term does not have a constant variance.
4.2: Heteroscedasticity
▪Effects of heteroscedasticity

▪ However, the OLS estimator is no longer efficient: its variance is not the smallest possible, and the usual formula for it is wrong.

▪ In short,

• Since the OLS estimators are inefficient and the usual OLS estimator of σ² is biased and inconsistent, statistical inference based on the usual formulas is invalid.
• The usual OLS t-statistics do not have Student's t-distribution, even in large samples.
• The F-statistic no longer has the F (Fisher) distribution.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
▪Visual inspection
▪ Plotting the OLS residuals against the dependent variable or one of the explanatory variables, we may see systematic patterns.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
▪Formal test
The Goldfeld-Quandt (GQ) test is carried out as follows.
1. Split the total sample of length T into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the two residual variances are calculated.
2. The null hypothesis is that the variances of the disturbances are equal, H0: σ1² = σ2².
3. The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator: GQ = s1² / s2².
4. The test statistic is distributed as F(T1 − k, T2 − k) under the null of homoscedasticity.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
5. A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
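A manual sketch of the GQ steps under an assumed form of heteroscedasticity (the simulated data, the even split after ordering by x, and the variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Illustrative data in which the error variance grows with x
rng = np.random.default_rng(3)
T = 100
x = np.sort(rng.uniform(1, 10, size=T))
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)   # heteroscedastic errors

X = sm.add_constant(x)
k = X.shape[1]

# 1. Split the (ordered) sample into two halves and estimate each sub-sample
T1 = T // 2
s1 = sm.OLS(y[:T1], X[:T1]).fit().ssr / (T1 - k)
s2 = sm.OLS(y[T1:], X[T1:]).fit().ssr / ((T - T1) - k)

# 2.-3. Ratio of residual variances, larger variance in the numerator
GQ = max(s1, s2) / min(s1, s2)

# 4. Compare with F(T1 - k, T2 - k) under the null of homoscedasticity
p_value = 1 - stats.f.cdf(GQ, T1 - k, (T - T1) - k)
print(GQ, p_value)
```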
White general Heteroscedasticity
White’s general test for heteroscedasticity is one of the best
approaches because it makes few assumptions about the form of
the heteroscedasticity.
The test is carried out as follows:
1. Assume that the regression we carried out is
yt = β1 + β2 x2t + β3 x3t + ut
and we want to test Var(ut) = σ². We estimate the model, obtaining the residuals, ût.
2. Then run the auxiliary regression
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
White general Heteroscedasticity
ût² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + vt

3. Obtain R² from the auxiliary regression and multiply it by the number of observations, n. It can be shown that

n·R² ~ χ²(m)

where m is the number of regressors in the auxiliary regression, excluding the constant term.

4. If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
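statsmodels implements this auxiliary-regression test as het_white; a minimal sketch on simulated data (the variable names and the assumed variance form are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
n = 200
x2 = rng.uniform(1, 5, size=n)
x3 = rng.normal(size=n)
u = rng.normal(scale=x2)                       # error variance depends on x2
y = 1 + 0.5 * x2 - 0.3 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
resid = sm.OLS(y, X).fit().resid

# het_white regresses the squared residuals on the regressors, their squares
# and cross-products, and returns the n*R^2 statistic with its p-value
lm_stat, lm_pval, f_stat, f_pval = het_white(resid, X)
print(lm_stat, lm_pval)
```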
4.2: Heteroscedasticity
Addressing heteroscedasticity
• If the form (i.e. the cause) of the heteroscedasticity is known,
then we can use an estimation method which takes this into
account (called generalised least squares, GLS).
• A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable zt by

var(ut) = σ² zt²

• To remove the heteroscedasticity, divide the regression equation by zt:

yt/zt = β1 (1/zt) + β2 (x2t/zt) + β3 (x3t/zt) + vt

where vt = ut/zt is an error term.

• Now var(vt) = var(ut/zt) = var(ut)/zt² = σ² zt² / zt² = σ², for known zt.
4.2: Heteroscedasticity
Addressing Heteroscedasticity
So the disturbances from the new regression equation will be
homoscedastic.
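A minimal sketch of this transformation, assuming var(ut) = σ²zt² with zt observed; weighted least squares with weights 1/zt² is numerically the same as dividing every variable by zt (the simulated data and names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x2 = rng.uniform(1, 5, size=T)
x3 = rng.normal(size=T)
z = x2                                        # suppose var(u_t) = sigma^2 * z_t^2
u = rng.normal(scale=0.5 * z)
y = 1 + 0.8 * x2 - 0.4 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
ols_res = sm.OLS(y, X).fit()
gls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # GLS under the assumed form
print(ols_res.bse)    # OLS standard errors (usual formula invalid here)
print(gls_res.bse)    # GLS/WLS standard errors
```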

Other solutions include:

1. Transforming the variables into logs.
2. Using White's heteroscedasticity-consistent standard error estimates.

The effect of using White's correction is that, in general, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.
This makes us more "conservative" in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.
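A sketch of requesting White's heteroscedasticity-consistent covariance in statsmodels (cov_type="HC0" is White's original estimator; the simulated data are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1 + 0.5 * x + rng.normal(scale=x)          # heteroscedastic errors
X = sm.add_constant(x)

ols_res = sm.OLS(y, X).fit()                   # usual OLS standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC0")  # White-corrected standard errors
print(ols_res.bse)
print(robust_res.bse)   # typically larger for the slope coefficient
```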
4.3: Autocorrelation
• We assumed of the CLRM's errors that Cov(εi, εj) = 0 for i ≠ j.

• This is essentially the same as saying there is no pattern in the errors.

• Obviously we never have the actual ε's, so we use their sample counterpart, the residuals (the ei's).

• If there are patterns in the residuals from a model, we say that they are autocorrelated.

• Some stereotypical patterns we may find in the residuals are given on the next 3 slides.
Positive Autocorrelation

(Residual plots: et against et−1, and et against time.)

Positive autocorrelation is indicated by a cyclical residual plot over time.
Negative Autocorrelation

(Residual plots: et against et−1, and et against time.)

Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.
No Pattern in Residuals – No Autocorrelation

(Residual plots: et against et−1, and et against time.)

No pattern in the residuals at all: this is what we would like to see.


Reasons for autocorrelation
 Inertia: inertia or sluggishness in economic time series is a common reason for autocorrelation. For example, GNP, production, price indices, employment and unemployment exhibit business cycles.
 Omitted variable problem: omitted variables are captured by the error term, and since such variables are typically correlated over time, the errors become correlated.
 Model specification (incorrect functional form): suppose an explanatory variable should enter in squared form but we used the linear form; the omitted squared term is absorbed into the error, and hence the errors are correlated.
 Non-stationarity: economic variables increase or decrease over time with trends.
Detecting Autocorrelation:
The Durbin-Watson Test

 The Durbin-Watson (DW) test is a test for first-order autocorrelation, i.e. it assumes that the relationship between an error and the previous one is given by

ut = ρ ut−1 + vt        (1)

where vt ~ N(0, σv²).

 The DW test statistic actually tests

H0: ρ = 0 against H1: ρ ≠ 0

 The test statistic is calculated by

DW = Σ_{t=2}^{T} (ût − ût−1)² / Σ_{t=2}^{T} ût²
The Durbin-Watson Test:
Critical Values

 We can also write

DW ≈ 2(1 − ρ̂)        (2)

where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals. Since ρ̂ is a correlation coefficient, −1 ≤ ρ̂ ≤ 1.

 Rearranging (2) gives 0 ≤ DW ≤ 4.

 If ρ̂ = 0, DW = 2. So, roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.

 Unfortunately, DW has two critical values, an upper critical value (dU) and a lower critical value (dL), and there is also an intermediate region where the test is inconclusive (we can neither reject nor fail to reject H0).
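A minimal sketch of computing DW from OLS residuals with statsmodels; the AR(1) errors with ρ = 0.7 are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulate a regression whose errors follow u_t = 0.7 u_{t-1} + v_t
rng = np.random.default_rng(7)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # well below 2: positive autocorrelation
print(2 * (1 - 0.7))              # the approximation DW = 2(1 - rho)
```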
The Durbin-Watson Test: Interpreting the Results

Conditions which Must be Fulfilled for DW to be a Valid Test


1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of dependent variable
Another Test for Autocorrelation:
The Breusch-Godfrey Test

 It is a more general test, for rth-order autocorrelation:

ut = ρ1 ut−1 + ρ2 ut−2 + ρ3 ut−3 + ... + ρr ut−r + vt,   vt ~ N(0, σv²)

 The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0
 The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, ût.
2. Regress ût on all of the regressors from stage 1 (the x's) plus ût−1, ût−2, ..., ût−r. Obtain R² from this regression.
3. It can be shown that (T − r)R² ~ χ²(r).
 If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation and conclude that there is an autocorrelation problem.
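statsmodels provides this test as acorr_breusch_godfrey; a hedged sketch with simulated AR(1) errors and r = 2 lags (both choices illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(8)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()       # AR(1) errors
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pval)    # small p-value: reject "no autocorrelation"
```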
Consequences of Ignoring Autocorrelation
if it is Present

 The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large samples.

 Thus, because the standard error estimates are inappropriate, there is the possibility that we could make the wrong inferences (wrong t-tests and F-tests).

 R² is likely to be inflated relative to its "correct" value for positively correlated residuals.
“Remedies” for Autocorrelation

 If the form of the autocorrelation is known, we could use a GLS procedure, i.e. an approach that allows for autocorrelated residuals.

 But such procedures that "correct" for autocorrelation require assumptions about the form of the autocorrelation.

 If these assumptions are invalid, the cure would be more dangerous than the disease! - see Hendry and Mizon (1978).

 However, it is unlikely to be the case that the form of the autocorrelation is known, and a more "modern" view is that residual autocorrelation presents an opportunity to modify the regression.
Models in First Difference Form

 Another way to sometimes deal with the problem of autocorrelation is to switch to a model in first differences.

 Denote the first difference of yt, i.e. yt − yt−1, as Δyt; similarly for the x-variables, Δx2t = x2t − x2t−1, etc.

 The model would now be

Δyt = β1 + β2 Δx2t + ... + βk Δxkt + ut

 Sometimes the change in y is purported to depend on previous values of y or xt as well as changes in x:

Δyt = β1 + β2 Δx2t + β3 x2t−1 + β4 yt−1 + ut
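A short sketch of estimating a first-difference model with pandas and statsmodels; the two simulated trending series are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 120
df = pd.DataFrame({
    "y":  np.cumsum(rng.normal(size=T)),   # persistent (trending) series
    "x2": np.cumsum(rng.normal(size=T)),
})

d = df.diff().dropna()                     # delta y_t = y_t - y_{t-1}, etc.
res = sm.OLS(d["y"], sm.add_constant(d["x2"])).fit()
print(res.params)
```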
4.4. Specification Errors
 Consider a model that we assume to be correct:

Yi = β1 + β2 X2i + β3 X3i + β4 X4i + U1i

 The researcher, however, might specify the model in a different way for some reason:

1. Yi = β1 + β2 X2i + β3 X3i + U2i
→ omitting a relevant variable, where U2i = U1i + β4 X4i

2. Yi = β1 + β2 X2i + β3 X3i + β4 X4i + β5 X5i + U3i
→ inclusion of an irrelevant or unnecessary variable, where U3i = U1i − β5 X5i



3. ln Yi = β1 + β2 X2i + β3 X3i + β4 X4i + U4i
→ wrong functional form

4. Yi* = β1 + β2 X2i* + β3 X3i* + β4 X4i* + Ui*
where Yi* = Yi + εi, X2i* = X2i + ω2i, X3i* = X3i + ω3i, X4i* = X4i + ω4i, and εi, ω2i, ω3i and ω4i are errors of measurement.
→ errors-of-measurement bias

 But what are the consequences of such specification errors?



Omission of Relevant Variables
(Under fitting a Model)
 Suppose the true model is:
Yi = β1 + β2 X2i + β3 X3i + ui
 But the researcher specifies:
Yi = α1 + α2 X2i + vi
 The consequences of omitting X3 are as follows:
1. If the left-out variable X3 is correlated with the included variable X2, both α̂1 and α̂2 are biased as well as inconsistent:
Proof
Omission of Relevant Variables (Under fitting
a Model)
 Proof:
 True model (in deviation form, with x, y and q having zero means):
yi = β xi + γ qi + εi
 Specified model:
yi = b xi + vi
 OLS estimator for the coefficient of the specified model:
b̂ = Σ xi yi / Σ xi²

 Assuming zero means for x, y and q and substituting the true model for yi, the coefficient becomes:
b̂ = β + γ (Σ xi qi / Σ xi²) + (Σ xi εi / Σ xi²)
Omission of Relevant Variables (Under fitting
a Model)
 The sum of the xi is zero (as we assumed zero means), so the intercept terms drop out. Thus,
b̂ = β + γ (Σ xi qi / Σ xi²) + (Σ xi εi / Σ xi²)
 Taking expectations of both sides, and since the error term has zero mean (and is independent of x), we have:
E(b̂) = β + γ (Σ xi qi / Σ xi²)
 Thus, unless x and q are uncorrelated, the OLS estimator is biased; it is also inconsistent, as this bias never dies out as the sample size goes to infinity.
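A small Monte Carlo sketch of this result (all numbers illustrative): when the omitted q is correlated with x, the OLS slope on x is centred on β + γ·(Σxq/Σx²) rather than on β:

```python
import numpy as np

rng = np.random.default_rng(10)
beta, gamma = 1.0, 2.0
n, reps = 200, 2000
estimates = []
for _ in range(reps):
    x = rng.normal(size=n)
    q = 0.8 * x + rng.normal(scale=0.6, size=n)   # q correlated with x
    y = beta * x + gamma * q + rng.normal(size=n)
    estimates.append((x @ y) / (x @ x))           # OLS slope with q omitted

print(np.mean(estimates))      # about beta + gamma * 0.8 = 2.6, not beta = 1.0
```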
2. Even if X2 and X3 are uncorrelated, α̂1 is still biased, although α̂2 is now unbiased.

3. The disturbance variance σ² is incorrectly estimated:

E(σ̂v²) = σ² + β3² (Σ x3i² − b32 Σ x2i x3i) / (n − 2)

where b32 is the slope from regressing X3 on X2.

4. The conventionally measured variance of α̂2 is a biased estimator of the variance of the true estimator β̂2.

5. In consequence, the usual confidence-interval and hypothesis-testing procedures are likely to give misleading conclusions.
Inclusion of Irrelevant Variables
(Over fitting a Model)
 Suppose the true model is:
Yi = β1 + β2 X2i + ui

 But a researcher fits the following model:
Yi = α1 + α2 X2i + α3 X3i + vi
 The consequences of this specification
error:
1. The estimators are all unbiased and
consistent. How?
2. The error variance σ2 is correctly
estimated
3. The usual confidence interval and
hypothesis-testing procedures remain valid.
4. However, the estimated α's will generally be inefficient. How?

 In conclusion, it is better (less harmful) to include an irrelevant variable than to omit a relevant one.



Tests for Omitted Variables and
Incorrect Functional Form
 In practice we are never sure that the
model adopted for empirical testing is the
true model.
 Here the idea is to check whether the
chosen model is adequate or not.
 Consider the total cost of production
function:
Yi = β1 + β2 Xi + β3 Xi² + β4 Xi³ + u1i

where Y = total cost and X = output.



 But one researcher fits the following quadratic function:
Yi = α1 + α2 Xi + α3 Xi² + u2i

 and another researcher fits the following linear function:
Yi = λ1 + λ2 Xi + u3i

 Ramsey has proposed a general test of specification error called RESET (regression specification error test).
 The steps involved in RESET are as follows:
1. Obtain the fitted values Ŷi from the chosen model.
2. Introduce Ŷi in some form (for example Ŷi² and Ŷi³) as additional regressor(s) and re-estimate the model.
3. Check whether the increase in R² between the two models is statistically significant, using an F test.
4. If the computed value is significant at the chosen level of significance, one can accept the hypothesis that the original model is misspecified.
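A manual sketch of these steps on simulated cost data (the cubic data-generating process and the use of Ŷ² and Ŷ³ as the added regressors are illustrative); statsmodels' compare_f_test carries out the F test on the difference in fit:

```python
import numpy as np
import statsmodels.api as sm

# Simulated "total cost vs output" data with a cubic shape
rng = np.random.default_rng(11)
x = np.linspace(1, 10, 60)
y = 300 + 40 * x - 8 * x**2 + 1.0 * x**3 + rng.normal(scale=20, size=x.size)

# The (possibly misspecified) linear fit
restricted = sm.OLS(y, sm.add_constant(x)).fit()

# Steps 1-2: obtain Y_hat and add Y_hat^2 and Y_hat^3 as extra regressors
yhat = restricted.fittedvalues
X_aug = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
unrestricted = sm.OLS(y, X_aug).fit()

# Steps 3-4: F test on the improvement; a significant F suggests misspecification
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value)
```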
Illustration (standard errors in parentheses):

Yi = 166.467 + 19.933 Xi + û3i,  R² = 0.8409
     (19.021)  (3.066)

Yi = 2140.7223 + 476.6557 Xi − 0.09187 Ŷi² + 0.000119 Ŷi³ + ûi,  R² = 0.9983
     (132.00044)  (33.3951)    (0.0062)      (0.0000074)
 Lagrange Multiplier (LM) Test for Adding Variables
 Steps
1. Estimate the restricted regression and obtain the residuals, ûi.
2. Regress the residuals ûi on all the regressors of the unrestricted model (including those in the restricted regression):

ûi = α1 + α2 Xi + α3 Xi² + α4 Xi³ + vi

where v is an error term with the usual properties.

3. For large sample sizes,

n·R² ~asy χ²(number of restrictions)



4. If the chi-square value obtained exceeds the
critical chi-square value at the chosen level of
significance, we reject the restricted regression.
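A manual sketch of the LM steps for the same kind of example (simulated data; scipy supplies the χ² critical value):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(12)
x = np.linspace(1, 10, 60)
y = 300 + 40 * x - 8 * x**2 + 1.0 * x**3 + rng.normal(scale=20, size=x.size)

# 1. Restricted (linear) regression and its residuals
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# 2. Regress the residuals on all regressors of the unrestricted model
X_full = sm.add_constant(np.column_stack([x, x**2, x**3]))
aux = sm.OLS(resid, X_full).fit()

# 3.-4. n*R^2 is asymptotically chi-square with df = number of restrictions (2)
lm = len(y) * aux.rsquared
crit = stats.chi2.ppf(0.95, df=2)
print(lm, crit, lm > crit)        # lm > crit -> reject the restricted model
```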
 Illustration (standard errors in parentheses):

Ŷi = 166.467 + 19.933 Xi
     (19.021)  (3.066)

ûi = −24.7 + 43.5443 Xi − 12.9615 Xi² + 0.9396 Xi³ + ε̂i,  R² = 0.9896
     (6.375)  (4.779)     (0.986)       (0.059)

 Although our sample size of 10 is by no means large, it is used just to illustrate the LM test.
 Thus, nR² = 10 × 0.9896 ≈ 9.9, which is greater than χ²(2) = 5.99 at the 5% level of significance.
→ Reject the restricted regression.
End of Chapter Four

End of the course!
