
Addis Ababa University

College of Business and Economics


Department of Economics

Econometrics I

Chapter Four
Violations of Classical Assumptions



Key assumptions
▪ Recall the assumptions of the classical linear regression model:

➢ E(εt) = 0

➢ Var(εt) = σ² < ∞

➢ Cov(εt, εt−s) = 0 for s ≠ 0

➢ The X matrix is non-stochastic, or fixed in repeated samples

➢ The matrix X is of full rank

➢ εt ~ N(0, σ²)



Key issues to be covered
▪ If these assumptions are violated, we ask:
▪ What does the violation of each assumption mean?
▪ What are its causes (why is it violated)?
▪ How do we test for the violation?
▪ What are the consequences?
• In general we could encounter any combination of 3 problems:
• the coefficient estimates are wrong
• the associated standard errors are wrong
• the distribution that we assumed for the test statistics will be inappropriate
▪ What are the remedial measures?



4.1: Violation of the zero mean assumption

▪ Violation of this assumption affects the intercept term only, with no effect on the slope coefficients.

▪ This implies that a constant non-zero mean of the disturbances affects only the intercept estimate, not the slopes.

▪ Thus, if the other assumptions are satisfied, the slope coefficients are still BLUE.



4.1: multicollinearity problem
▪What is multicollinearity?
▪ Key assumption: X has full column rank, so that the inverse of the matrix (X'X) exists.
▪ This, in turn, implies that the explanatory variables are linearly independent.
▪ That is, there is no linear relationship among the explanatory variables.
▪ The idea of independence of the regressors (Xi) is that we can change the value of one independent variable without changing the values of the other regressors.
▪ However, when independent variables are correlated, changes in one variable are associated with shifts in another variable.
4.1: multicollinearity problem
▪ For the linear model: Y = Xβ + ε
▪ The column vectors X1, X2, …, Xk are linearly dependent if there exists a set of constants C1, C2, …, Ck, not all zero, such that

C1·X1 + C2·X2 + … + Ck·Xk = 0

▪We have k-explanatory variables, but some of them are


linear combinations of the others, so they don’t add any
information.
▪ If this holds exactly for the X1, X2, …, Xk (or a subset of them), then rank(X'X) < k, the determinant of (X'X) equals zero, and at least one of its eigenvalues is zero.
▪ Consequently, (𝑿′ 𝑿)−𝟏 does not exist.
4.1: multicollinearity problem
▪This is the case of perfect multicollinearity.
▪ In such a case, it is impossible to uniquely determine the OLS estimators and their variances, because both require (X'X)⁻¹:

β̂ = (X'X)⁻¹ X'Y

var(β̂) = σ² (X'X)⁻¹

▪ However, if there is no exact linear relationship among the regressors (less-than-perfect multicollinearity), then (X'X) is invertible and (X'X)⁻¹ exists, but its elements can be very large.
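As a small numerical illustration (a hedged sketch, not taken from the slides; the simulated data and the 0.999 correlation level are illustrative), the snippet below shows how the diagonal of (X'X)⁻¹, and hence the coefficient variances, blow up as two regressors become nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)

def inv_diag(corr):
    """Diagonal of (X'X)^-1 for X = [const, x1, x2], with corr(x1, x2) = corr."""
    x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    return np.diag(np.linalg.inv(X.T @ X))

for corr in (0.0, 0.9, 0.999):
    print(corr, inv_diag(corr))
# The entries for x1 and x2 grow dramatically as corr -> 1, so
# var(beta_hat) = sigma^2 (X'X)^-1 becomes very large.
```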

▪ What are the effects of large variances of the OLS estimators?


4.1: multicollinearity problem
Consequences of multicollinearity
▪ When multicollinearity is less than perfect, we can still estimate the regression coefficients and their variances.

▪ However, we would get:

✓ Large variances and very high standard errors of the OLS estimators
✓ Unreliable (imprecise) estimates
✓ Unexpected signs of parameter estimates
✓ Unexpected magnitudes of parameter estimates
✓ Small t-ratios, which may lead to wrong conclusions about individual significance
✓ The overall F-statistic and R² can look good even while individual coefficients appear insignificant
4.1: multicollinearity problem
Detecting of multicollinearity
▪ Inspection of the correlation matrix: inspecting the off-diagonal elements r_ij (i ≠ j) of the correlation matrix of the regressors gives an idea about the presence of multicollinearity. If Xi and Xj are nearly linearly dependent, then |r_ij| will be close to 1.

▪ But there is no clear cut-off point.

▪ Variance inflation factors (VIF): we know that the diagonal elements of (X'X)⁻¹ determine the variances of the OLS estimators.
4.1: multicollinearity problem
Detecting of multicollinearity
▪ If R_j² denotes the coefficient of determination obtained when X_j is regressed on the remaining (k − 1) explanatory variables, then the variance inflation factor is:

VIF_j = 1 / (1 − R_j²)

▪ Then, if X_j is nearly independent of the remaining explanatory variables, R_j² is close to zero and hence VIF_j is close to 1.

▪ But if X_j is nearly linearly dependent on a subset of the remaining explanatory variables, then R_j² is close to one and hence VIF_j becomes very large.
4.1: multicollinearity problem
Detecting of multicollinearity
▪ The combined effect of dependencies among the explanatory variables on the variance of a term is measured by the VIF of that term in the model.

▪ One or more large VIFs indicate the presence of multicollinearity in the data.

▪ In practice, usually,

VIF_j > 10

indicates the presence of multicollinearity, and hence the associated regression coefficients are poorly estimated due to multicollinearity.
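A minimal sketch of how VIFs can be computed in practice, assuming statsmodels is available; the simulated data (with x3 almost a linear combination of x1 and x2) are purely illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: x3 is nearly a linear combination of x1 and x2
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - x2 + 0.05 * rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

X = sm.add_constant(df)                      # regressor matrix with intercept
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs)   # the VIFs for x1, x2, x3 lie far above the rule-of-thumb of 10
```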
4.1: multicollinearity problem
Remedies for multicollinearity
▪ The use of principal components is an attempt to extract from the X matrix a small number of variables that, in some sense, account for most or all of the variation in X (see the sketch below).

▪ Collect more data, to increase the variability in X.

▪ Drop some of the collinear variables; but this may lead to a specification problem due to omitted variables, and hence "the cure may be worse than the disease".
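A hedged sketch of the principal-components idea, assuming scikit-learn and statsmodels are available; the simulated data and the choice of two components are illustrative, not a recommendation:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)     # highly collinear with x1
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + 0.3 * x3 + rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
pcs = PCA(n_components=2).fit_transform(X)    # keep the two leading components
res = sm.OLS(y, sm.add_constant(pcs)).fit()   # regress y on the components
print(res.params)
```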
4.2: Heteroscedasticity
Recall that:
▪ One of the key assumptions of the CLRM is that the error term is homoscedastic.

▪ That is, Var(εi) = σ² < ∞ for all observations i.
4.2: Heteroscedasticity
▪What is heteroscedasticity?
▪ Heteroscedasticity refers to a situation where the variance of the error term is not constant across observations.
▪ Graphically, the spread of the residuals changes systematically (for example, it fans out) as the fitted values or an explanatory variable increase.
4.2: Heteroscedasticity
▪ That is, a heteroscedastic error term implies that the variances of the error terms are not constant across observations.
▪ Symbolically, Var(εi) = σi², where σi² differs across observations i.
4.2: Heteroscedasticity
▪Sources of heteroscedasticity

1. Following the error-learning models: as people learn, their errors of behaviour become smaller over time, so σi² is expected to decrease.

2. Improvement in data collection: as data-collection techniques improve, σi² is likely to decrease.

3. As incomes grow, people have more discretionary income and thus more choice about how to spend it. Hence, σi² is likely to increase with income.
4.2: Heteroscedasticity
▪Effects of heteroscedasticity

▪ When all the assumptions of the classical linear regression model are satisfied, the OLS estimators are BLUE.
▪ What happens to these properties when the error terms are non-spherical (heteroscedastic)?

▪ The unbiasedness of the OLS estimators is not affected by violation of the homoscedasticity assumption (if all other assumptions are satisfied).

▪ If E(ε) = 0, then E(β̂) = β, indicating that the OLS estimator is unbiased even if the disturbance term does not have a constant variance.
4.2: Heteroscedasticity
▪Effects of heteroscedasticity

▪ However, the OLS estimator is no longer efficient: its variance is not the smallest possible, and the usual formula for it is wrong.

▪ In short,

• Since the OLS estimators are inefficient and the usual OLS estimator of σ² is biased and inconsistent, statistical inference based on the usual formulas is invalid.
• The usual OLS t-statistics do not have Student's t-distribution, even in large samples.
• The F-statistic no longer has the F (Fisher) distribution.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
▪Visual inspection
▪ Plotting the OLS residuals against the dependent variable or one of the explanatory variables, we may see systematic patterns.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
▪Formal test
The Goldfeld-Quandt (GQ) test is carried out as follows.
1. Split the total sample of length T into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the two residual variances are calculated.
2. The null hypothesis is that the variances of the disturbances are equal, H0: σ1² = σ2².
3. The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator: GQ = s1² / s2².
4. The test statistic is distributed as F(T1 − k, T2 − k) under the null of homoscedasticity.
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
5. A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
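A manual sketch of the GQ steps under an assumed form of heteroscedasticity (the simulated data, the even split after ordering by x, and the variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Illustrative data in which the error variance grows with x
rng = np.random.default_rng(3)
T = 100
x = np.sort(rng.uniform(1, 10, size=T))
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)   # heteroscedastic errors

X = sm.add_constant(x)
k = X.shape[1]

# 1. Split the (ordered) sample into two halves and estimate each sub-sample
T1 = T // 2
s1 = sm.OLS(y[:T1], X[:T1]).fit().ssr / (T1 - k)
s2 = sm.OLS(y[T1:], X[T1:]).fit().ssr / ((T - T1) - k)

# 2.-3. Ratio of residual variances, larger variance in the numerator
GQ = max(s1, s2) / min(s1, s2)

# 4. Compare with F(T1 - k, T2 - k) under the null of homoscedasticity
p_value = 1 - stats.f.cdf(GQ, T1 - k, (T - T1) - k)
print(GQ, p_value)
```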
White general Heteroscedasticity
White’s general test for heteroscedasticity is one of the best
approaches because it makes few assumptions about the form of
the heteroscedasticity.
The test is carried out as follows:
1. Assume that the regression we carried out is
yt = β1 + β2 x2t + β3 x3t + ut
and we want to test Var(ut) = σ². We estimate the model, obtaining the residuals, ût.
2. Then run the auxiliary regression
4.2: Heteroscedasticity
Tests (or detecting) Heteroscedasticity
White general Heteroscedasticity
ût² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + vt

3. Obtain R² from the auxiliary regression and multiply it by the number of observations, n. It can be shown that

n·R² ~ χ²(m)

where m is the number of regressors in the auxiliary regression, excluding the constant term.

4. If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic.
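statsmodels implements this auxiliary-regression test as het_white; a minimal sketch on simulated data (the variable names and the assumed variance form are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
n = 200
x2 = rng.uniform(1, 5, size=n)
x3 = rng.normal(size=n)
u = rng.normal(scale=x2)                       # error variance depends on x2
y = 1 + 0.5 * x2 - 0.3 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
resid = sm.OLS(y, X).fit().resid

# het_white regresses the squared residuals on the regressors, their squares
# and cross-products, and returns the n*R^2 statistic with its p-value
lm_stat, lm_pval, f_stat, f_pval = het_white(resid, X)
print(lm_stat, lm_pval)
```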
4.2: Heteroscedasticity
Addressing heteroscedasticity
• If the form (i.e. the cause) of the heteroscedasticity is known,
then we can use an estimation method which takes this into
account (called generalised least squares, GLS).
• A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable zt by

var(ut) = σ² zt²

• To remove the heteroscedasticity, divide the regression equation by zt:

yt/zt = β1 (1/zt) + β2 (x2t/zt) + β3 (x3t/zt) + vt

where vt = ut/zt is an error term.

• Now var(vt) = var(ut/zt) = var(ut)/zt² = σ² zt² / zt² = σ², for known zt.
4.2: Heteroscedasticity
Addressing Heteroscedasticity
So the disturbances from the new regression equation will be
homoscedastic.
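A minimal sketch of this transformation, assuming var(ut) = σ²zt² with zt observed; weighted least squares with weights 1/zt² is numerically the same as dividing every variable by zt (the simulated data and names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T = 200
x2 = rng.uniform(1, 5, size=T)
x3 = rng.normal(size=T)
z = x2                                        # suppose var(u_t) = sigma^2 * z_t^2
u = rng.normal(scale=0.5 * z)
y = 1 + 0.8 * x2 - 0.4 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
ols_res = sm.OLS(y, X).fit()
gls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # GLS under the assumed form
print(ols_res.bse)    # OLS standard errors (usual formula invalid here)
print(gls_res.bse)    # GLS/WLS standard errors
```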

Other solutions include:

1. Transforming the variables into logs.
2. Using White's heteroscedasticity-consistent standard error estimates.

The effect of using White's correction is that, in general, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors.
This makes us more "conservative" in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.
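A sketch of requesting White's heteroscedasticity-consistent covariance in statsmodels (cov_type="HC0" is White's original estimator; the simulated data are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1 + 0.5 * x + rng.normal(scale=x)          # heteroscedastic errors
X = sm.add_constant(x)

ols_res = sm.OLS(y, X).fit()                   # usual OLS standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC0")  # White-corrected standard errors
print(ols_res.bse)
print(robust_res.bse)   # typically larger for the slope coefficient
```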
4.3: Autocorrelation
• We assumed of the CLRM's errors that Cov(εi, εj) = 0 for i ≠ j.

• This is essentially the same as saying there is no pattern in the errors.

• Obviously we never have the actual ε's, so we use their sample counterpart, the residuals (the ei's).

• If there are patterns in the residuals from a model, we say that they are autocorrelated.

• Some stereotypical patterns we may find in the residuals are given on the next 3 slides.
Positive Autocorrelation

(Residual plots: et against et−1, and et against time.)

Positive autocorrelation is indicated by a cyclical residual plot over time.
Negative Autocorrelation

(Residual plots: et against et−1, and et against time.)

Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.
No Pattern in Residuals – No Autocorrelation

(Residual plots: et against et−1, and et against time.)

No pattern in the residuals at all: this is what we would like to see.


Reasons for autocorrelation
 Inertia: inertia or sluggishness in economic time series is a common reason for autocorrelation. For example, GNP, production, price indices, employment and unemployment exhibit business cycles.
 Omitted variable problem: omitted variables are captured by the error term, and since such variables are typically correlated over time, the errors become correlated.
 Model specification (incorrect functional form): suppose an explanatory variable should enter in squared form but we used the linear form; the omitted squared term is absorbed into the error, and hence the errors are correlated.
 Non-stationarity: economic variables increase or decrease over time with trends.
Detecting Autocorrelation:
The Durbin-Watson Test

 The Durbin-Watson (DW) test is a test for first-order autocorrelation, i.e. it assumes that the relationship between an error and the previous one is given by

ut = ρ ut−1 + vt        (1)

where vt ~ N(0, σv²).

 The DW test statistic actually tests

H0: ρ = 0 against H1: ρ ≠ 0

 The test statistic is calculated by

DW = Σ_{t=2}^{T} (ût − ût−1)² / Σ_{t=2}^{T} ût²
The Durbin-Watson Test:
Critical Values

 We can also write

DW ≈ 2(1 − ρ̂)        (2)

where ρ̂ is the estimated first-order autocorrelation coefficient of the residuals. Since ρ̂ is a correlation coefficient, −1 ≤ ρ̂ ≤ 1.

 Rearranging (2) gives 0 ≤ DW ≤ 4.

 If ρ̂ = 0, DW = 2. So, roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.

 Unfortunately, DW has two critical values, an upper critical value (dU) and a lower critical value (dL), and there is also an intermediate region where the test is inconclusive (we can neither reject nor fail to reject H0).
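A minimal sketch of computing DW from OLS residuals with statsmodels; the AR(1) errors with ρ = 0.7 are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulate a regression whose errors follow u_t = 0.7 u_{t-1} + v_t
rng = np.random.default_rng(7)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # well below 2: positive autocorrelation
print(2 * (1 - 0.7))              # the approximation DW = 2(1 - rho)
```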
The Durbin-Watson Test: Interpreting the Results

Conditions which Must be Fulfilled for DW to be a Valid Test


1. Constant term in regression
2. Regressors are non-stochastic
3. No lags of dependent variable
Another Test for Autocorrelation:
The Breusch-Godfrey Test

 It is a more general test, for rth-order autocorrelation:

ut = ρ1 ut−1 + ρ2 ut−2 + ρ3 ut−3 + ... + ρr ut−r + vt,   vt ~ N(0, σv²)

 The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0
 The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, ût.
2. Regress ût on all of the regressors from stage 1 (the x's) plus ût−1, ût−2, ..., ût−r. Obtain R² from this regression.
3. It can be shown that (T − r)R² ~ χ²(r).
 If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation and conclude that there is an autocorrelation problem.
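statsmodels provides this test as acorr_breusch_godfrey; a hedged sketch with simulated AR(1) errors and r = 2 lags (both choices illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(8)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()       # AR(1) errors
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_stat, lm_pval)    # small p-value: reject "no autocorrelation"
```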
Consequences of Ignoring Autocorrelation
if it is Present

 The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large samples.

 Thus, because the standard error estimates are inappropriate, there is the possibility that we could make the wrong inferences (wrong t-tests and F-tests).

 R² is likely to be inflated relative to its "correct" value for positively correlated residuals.
“Remedies” for Autocorrelation

 If the form of the autocorrelation is known, we could use a GLS procedure, i.e. an approach that allows for autocorrelated residuals.

 But such procedures that "correct" for autocorrelation require assumptions about the form of the autocorrelation.

 If these assumptions are invalid, the cure would be more dangerous than the disease! - see Hendry and Mizon (1978).

 However, it is unlikely to be the case that the form of the autocorrelation is known, and a more "modern" view is that residual autocorrelation presents an opportunity to modify the regression.
Models in First Difference Form

 Another way to sometimes deal with the problem of autocorrelation is to switch to a model in first differences.

 Denote the first difference of yt, i.e. yt − yt−1, as Δyt; similarly for the x-variables, Δx2t = x2t − x2t−1, etc.

 The model would now be

Δyt = β1 + β2 Δx2t + ... + βk Δxkt + ut

 Sometimes the change in y is purported to depend on previous values of y or xt as well as changes in x:

Δyt = β1 + β2 Δx2t + β3 x2t−1 + β4 yt−1 + ut
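A short sketch of estimating a first-difference model with pandas and statsmodels; the two simulated trending series are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 120
df = pd.DataFrame({
    "y":  np.cumsum(rng.normal(size=T)),   # persistent (trending) series
    "x2": np.cumsum(rng.normal(size=T)),
})

d = df.diff().dropna()                     # delta y_t = y_t - y_{t-1}, etc.
res = sm.OLS(d["y"], sm.add_constant(d["x2"])).fit()
print(res.params)
```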
4.4. Specification Errors
 Consider a model that we assume to be correct:

Yi = β1 + β2 X2i + β3 X3i + β4 X4i + U1i

 The researcher, however, might specify the model in a different way for some reason:

1. Yi = β1 + β2 X2i + β3 X3i + U2i
→ omitting a relevant variable, where U2i = U1i + β4 X4i

2. Yi = β1 + β2 X2i + β3 X3i + β4 X4i + β5 X5i + U3i
→ inclusion of an irrelevant or unnecessary variable, where U3i = U1i − β5 X5i



3. ln Yi = β1 + β2 X2i + β3 X3i + β4 X4i + U4i
→ wrong functional form

4. Yi* = β1 + β2 X2i* + β3 X3i* + β4 X4i* + Ui*
where Yi* = Yi + εi, X2i* = X2i + ω2i, X3i* = X3i + ω3i, X4i* = X4i + ω4i, and εi, ω2i, ω3i and ω4i are errors of measurement.
→ errors-of-measurement bias

 But what are the consequences of such specification errors?



Omission of Relevant Variables
(Under fitting a Model)
 Suppose the true model is:
Yi = β1 + β2 X2i + β3 X3i + ui
 But the researcher specifies:
Yi = α1 + α2 X2i + vi
 The consequences of omitting X3 are as follows:
1. If the left-out variable X3 is correlated with the included variable X2, both α̂1 and α̂2 are biased as well as inconsistent:
Proof
Omission of Relevant Variables (Under fitting
a Model)
 Proof:
 True model (in deviation form, with x, y and q having zero means):
yi = β xi + γ qi + εi
 Specified model:
yi = b xi + vi
 OLS estimator for the coefficient of the specified model:
b̂ = Σ xi yi / Σ xi²

 Assuming zero means for x, y and q and substituting the true model for yi, the coefficient becomes:
b̂ = β + γ (Σ xi qi / Σ xi²) + (Σ xi εi / Σ xi²)
Omission of Relevant Variables (Under fitting
a Model)
 The sum of the xi is zero (as we assumed zero means), so the intercept terms drop out. Thus,
b̂ = β + γ (Σ xi qi / Σ xi²) + (Σ xi εi / Σ xi²)
 Taking expectations of both sides, and since the error term has zero mean (and is independent of x), we have:
E(b̂) = β + γ (Σ xi qi / Σ xi²)
 Thus, unless x and q are uncorrelated, the OLS estimator is biased; it is also inconsistent, as this bias never dies out as the sample size goes to infinity.
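A small Monte Carlo sketch of this result (all numbers illustrative): when the omitted q is correlated with x, the OLS slope on x is centred on β + γ·(Σxq/Σx²) rather than on β:

```python
import numpy as np

rng = np.random.default_rng(10)
beta, gamma = 1.0, 2.0
n, reps = 200, 2000
estimates = []
for _ in range(reps):
    x = rng.normal(size=n)
    q = 0.8 * x + rng.normal(scale=0.6, size=n)   # q correlated with x
    y = beta * x + gamma * q + rng.normal(size=n)
    estimates.append((x @ y) / (x @ x))           # OLS slope with q omitted

print(np.mean(estimates))      # about beta + gamma * 0.8 = 2.6, not beta = 1.0
```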
2. Even if X2 and X3 are uncorrelated, α̂1 is still biased, although α̂2 is now unbiased.

3. The disturbance variance σ² is incorrectly estimated:

E(σ̂v²) = σ² + β3² (Σ x3i² − b32 Σ x2i x3i) / (n − 2)

where b32 is the slope from regressing X3 on X2.

4. The conventionally measured variance of α̂2 is a biased estimator of the variance of the true estimator β̂2.

5. In consequence, the usual confidence-interval and hypothesis-testing procedures are likely to give misleading conclusions.
Inclusion of Irrelevant Variables
(Over fitting a Model)
 Suppose the true model is:
Yi = β1 + β2 X2i + ui

 But a researcher fits the following model:
Yi = α1 + α2 X2i + α3 X3i + vi
 The consequences of this specification
error:
1. The estimators are all unbiased and
consistent. How?
2. The error variance σ2 is correctly
estimated
3. The usual confidence interval and
hypothesis-testing procedures remain valid.
4. However, the estimated α's will generally be inefficient. How?

 In conclusion, it is better (less harmful) to include an irrelevant variable than to omit a relevant one.



Tests for Omitted Variables and
Incorrect Functional Form
 In practice we are never sure that the
model adopted for empirical testing is the
true model.
 Here the idea is to check whether the
chosen model is adequate or not.
 Consider the total cost of production
function:
Yi = β1 + β2 Xi + β3 Xi² + β4 Xi³ + u1i

where Y = total cost and X = output.



 But one researcher fits the following quadratic function:
Yi = α1 + α2 Xi + α3 Xi² + u2i

 and another researcher fits the following linear function:
Yi = λ1 + λ2 Xi + u3i

 Ramsey has proposed a general test of specification error called RESET (regression specification error test).
 The steps involved in RESET are as follows:
1. Obtain the fitted values Ŷi from the chosen model.
2. Introduce Ŷi in some form (for example Ŷi² and Ŷi³) as additional regressor(s) and re-estimate the model.
3. Check whether the increase in R² between the two models is statistically significant, using an F test.
4. If the computed value is significant at the chosen level of significance, one can accept the hypothesis that the original model is misspecified.
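A manual sketch of these steps on simulated cost data (the cubic data-generating process and the use of Ŷ² and Ŷ³ as the added regressors are illustrative); statsmodels' compare_f_test carries out the F test on the difference in fit:

```python
import numpy as np
import statsmodels.api as sm

# Simulated "total cost vs output" data with a cubic shape
rng = np.random.default_rng(11)
x = np.linspace(1, 10, 60)
y = 300 + 40 * x - 8 * x**2 + 1.0 * x**3 + rng.normal(scale=20, size=x.size)

# The (possibly misspecified) linear fit
restricted = sm.OLS(y, sm.add_constant(x)).fit()

# Steps 1-2: obtain Y_hat and add Y_hat^2 and Y_hat^3 as extra regressors
yhat = restricted.fittedvalues
X_aug = sm.add_constant(np.column_stack([x, yhat**2, yhat**3]))
unrestricted = sm.OLS(y, X_aug).fit()

# Steps 3-4: F test on the improvement; a significant F suggests misspecification
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value)
```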
Illustration (standard errors in parentheses):

Yi = 166.467 + 19.933 Xi + û3i,  R² = 0.8409
     (19.021)  (3.066)

Yi = 2140.7223 + 476.6557 Xi − 0.09187 Ŷi² + 0.000119 Ŷi³ + ûi,  R² = 0.9983
     (132.00044)  (33.3951)    (0.0062)      (0.0000074)
 Lagrange Multiplier (LM) Test for Adding Variables
 Steps
1. Estimate the restricted regression and obtain the residuals, ûi.
2. Regress the residuals ûi on all the regressors of the unrestricted model (including those in the restricted regression):

ûi = α1 + α2 Xi + α3 Xi² + α4 Xi³ + vi

where v is an error term with the usual properties.

3. For large sample sizes,

n·R² ~asy χ²(number of restrictions)



4. If the chi-square value obtained exceeds the
critical chi-square value at the chosen level of
significance, we reject the restricted regression.
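A manual sketch of the LM steps for the same kind of example (simulated data; scipy supplies the χ² critical value):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(12)
x = np.linspace(1, 10, 60)
y = 300 + 40 * x - 8 * x**2 + 1.0 * x**3 + rng.normal(scale=20, size=x.size)

# 1. Restricted (linear) regression and its residuals
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# 2. Regress the residuals on all regressors of the unrestricted model
X_full = sm.add_constant(np.column_stack([x, x**2, x**3]))
aux = sm.OLS(resid, X_full).fit()

# 3.-4. n*R^2 is asymptotically chi-square with df = number of restrictions (2)
lm = len(y) * aux.rsquared
crit = stats.chi2.ppf(0.95, df=2)
print(lm, crit, lm > crit)        # lm > crit -> reject the restricted model
```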
 Illustration (standard errors in parentheses):

Ŷi = 166.467 + 19.933 Xi
     (19.021)  (3.066)

ûi = −24.7 + 43.5443 Xi − 12.9615 Xi² + 0.9396 Xi³ + ε̂i,  R² = 0.9896
     (6.375)  (4.779)     (0.986)       (0.059)

 Although our sample size of 10 is by no means large, it is used just to illustrate the LM test.
 Thus, nR² = 10 × 0.9896 ≈ 9.9, which is greater than χ²(2) = 5.99 at the 5% level of significance.
→ Reject the restricted regression.
End of Chapter Four

End of the course!
