
Violations of Assumptions of OLS
1
Readings: Asteriou, pp. 93-196 (2nd edition); pp. 83-179 (1st edition) covers all this stuff.

2
The Linear Regression Model
The Assumptions
1. Linearity: The dependent variable is a linear function of the
independent variables

2. X_t has some variation, i.e. Var(X) is not 0.

3. X_t is non-stochastic and fixed in repeated samples

4. E(u_t) = 0

5. Homoskedasticity: Var(u_t) = σ² = constant for all t

6. Cov(u_t, u_s) = 0, serial independence

7. u_t ~ N(0, σ²)

8. No multicollinearity (i.e. no explanatory variable can be a linear
combination of others in the model)



3
Violation of the Assumptions of the Classical Linear
Regression Model (CLRM) (i.e. OLS)
We will focus on the assumptions regarding the
disturbance terms:
1. E(u_t) = 0
2. Var(u_t) = σ² (Homoskedasticity)
3. Cov(u_i, u_j) = 0 (Absence of autocorrelation)
4. u_t ~ N(0, σ²) (Normally distributed errors)

4
Investigating Violations of the
Assumptions of the CLRM
We will now study these assumptions further, and in particular look at:
- How we test for violations
- Causes
- Consequences
  In general we could encounter any combination of 3 problems:
  - the coefficient estimates are wrong
  - the associated standard errors are wrong
  - the distribution that we assumed for the test statistics will be inappropriate
- Solutions
  - the assumptions are no longer violated
  - we work around the problem so that we use alternative techniques which are still valid
5
Statistical Distributions for Diagnostic Tests
Often, an F- and a χ²-version of the test are available.

The F-test version involves estimating a restricted and an unrestricted
version of a test regression and comparing the RSS (i.e. Wald tests).

The χ²-version is sometimes called an LM test, and only has one
degrees-of-freedom parameter: the number of restrictions being tested, m.

For small samples, the F-version is preferable.
6
Assumption 1:
E(u_t) = 0
7
Assumption 1: E(u_t) = 0
Assumption that the mean of the disturbances is zero.

For all diagnostic tests, we cannot observe the disturbances and so
perform the tests on the residuals instead.
Recall:
Disturbance -> difference between observation and true line
Residual -> difference between observation and estimated line

The mean of the residuals will always be zero provided that there is a
constant term in the regression, so in most cases we don't need to worry
about this!!

8
Assumption 2:
Var(u_t) = σ²

True => Homoskedasticity
False => Heteroskedasticity
9
Assumption 2: Var(u_t) = σ²

We have so far assumed that the variance of the errors is constant, σ² - this is
known as homoscedasticity. If the errors do not have a constant variance, we
say that they are heteroscedastic, e.g. say we estimate a regression and calculate
the residuals. The plot of the residuals against one of the independent variables,
x_2t, is shown below:

[Figure: residuals û_t (vertical axis, + to -) plotted against x_2t]
10
Detection of Heteroscedasticity

Graphical methods
Formal tests:
One of the best is White's general test for heteroskedasticity.
The test is carried out as follows:
1. Assume that the regression we carried out is as follows

   y_t = β₁ + β₂x_2t + β₃x_3t + u_t

   and we want to test Var(u_t) = σ². We estimate the model, obtaining
   the residuals, û_t.

2. Then run the auxiliary regression

   û_t² = α₁ + α₂x_2t + α₃x_3t + α₄x_2t² + α₅x_3t² + α₆x_2t·x_3t + v_t
11
Performing White's Test for Heteroscedasticity

3. Obtain R² from the auxiliary regression and multiply it by the number of
   observations, T. It can be shown that

   T·R² ~ χ²(m)

   where m is the number of regressors in the auxiliary regression excluding the
   constant term. The idea is that if R² in the auxiliary regression is high, then
   the equation has explanatory power.

4. If the χ² test statistic from step 3 is greater than the corresponding value
   from the statistical table then reject the null hypothesis that the disturbances
   are homoscedastic.
There is a test for heteroscedasticity given with OLS regression output.

I.e. the basic idea is to see if the x's are correlated with the size of the errors
(they shouldn't be if the errors are homoskedastic). A sketch of this test follows below.
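As an illustration (not part of the original slides), here is a minimal sketch of White's test in Python using statsmodels. The data are simulated and the variable names (y, x2, x3) simply mirror the regression above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
x2, x3 = rng.normal(size=200), rng.normal(size=200)
u = rng.normal(scale=1 + np.abs(x2))          # error variance depends on x2 => heteroscedastic
y = 1 + 0.5 * x2 - 0.3 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# het_white regresses the squared residuals on the regressors, their squares
# and cross-products, and returns the LM statistic T*R^2 ~ chi2(m).
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"White LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
```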
12
Consequences of Using OLS in the Presence of Heteroscedasticity

OLS estimation still gives unbiased coefficient estimates (in linear models),
but they are no longer BLUE. [Not efficient, because heteroscedasticity increases the variance
of the distribution of the β̂'s.]

This implies that if we still use OLS in the presence of heteroscedasticity, our
standard errors could be inappropriate and hence any inferences we make
could be misleading.

If the standard error is too high: our t-statistic will be too low and we will fail to reject
hypotheses we should reject.
If the standard error is too low: our t-statistic will be too high and we will reject
hypotheses we should fail to reject.

Whether the standard errors calculated using the usual formulae are too big or
too small will depend upon the form of the heteroscedasticity.
13
How Do We Deal with Heteroscedasticity?

If the form (i.e. the cause) of the heteroskedasticity is known, then we can use
an estimation method which takes this into account (called generalised least
squares, GLS).

Suppose that the original regression is: y_t = β₁ + β₂x_2t + β₃x_3t + u_t

A simple illustration of GLS is as follows: suppose that the error variance is
related to another variable z_t by

   Var(u_t) = σ²·z_t²

To remove the heteroscedasticity, divide the regression equation by z_t:

   y_t/z_t = β₁(1/z_t) + β₂(x_2t/z_t) + β₃(x_3t/z_t) + v_t

where v_t = u_t/z_t is just the rescaled error term.

Now, for known z_t,

   Var(v_t) = Var(u_t/z_t) = Var(u_t)/z_t² = σ²·z_t²/z_t² = σ²

So the disturbances from the new regression equation will be homoscedastic.
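A hedged sketch of the same idea in Python: under the assumed error form Var(u_t) = σ²z_t², dividing through by z_t is equivalent to weighted least squares with weights 1/z_t² (the statsmodels WLS convention). The data and variable names below are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
z = np.exp(rng.normal(size=n))               # known, positive variable driving the variance
x2, x3 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(scale=z)                      # Var(u_t) = z_t^2  (sigma = 1)
y = 2 + 1.0 * x2 - 0.5 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
ols_res = sm.OLS(y, X).fit()
gls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()   # the divide-by-z_t GLS as weighted LS

print(ols_res.bse)   # usual OLS standard errors
print(gls_res.bse)   # GLS standard errors (more efficient under this error form)
```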
14
Other Approaches to Dealing with Heteroscedasticity

Other solutions include:
1. Transforming the variables into logs or reducing by some other measure
   of size.
2. Using White's heteroscedasticity-consistent standard error estimates.
   The effect of using White's correction is that in general the standard errors
   for the slope coefficients are increased relative to the usual OLS standard
   errors.
   This makes us more conservative in hypothesis testing, so that we would
   need more evidence against the null hypothesis before we would reject it.
   A sketch of this correction follows below.
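For illustration only, a minimal statsmodels sketch of White's correction, here requested through the "HC1" covariance option; the simulated data and names are not from the lecture.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 1 + 0.8 * x + rng.normal(scale=1 + np.abs(x))   # heteroscedastic errors
X = sm.add_constant(x)

usual = sm.OLS(y, X).fit()                  # conventional OLS standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White heteroscedasticity-consistent SEs
print(usual.bse)
print(robust.bse)
```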

15
Assumption 3:
Cov(u_i, u_j) = 0

(Absence of autocorrelation)
16
Aside: Covariance
What is covariance?
The covariance between two real-valued random
variables X and Y, with expected values E(X) = μ_X and E(Y) = μ_Y,
is defined as

   Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]

If two variables tend to vary together (that is, when one of
them is above its expected value, then the other variable
tends to be above its expected value too), then the
covariance between the two variables will be positive.
On the other hand, if one of them tends to be above its
expected value when the other variable is below its
expected value, then the covariance between the two
variables will be negative.

17
Autocorrelation
Autocorrelation is when the error terms of different observations
are not independent of each other, i.e. they are correlated with
each other.
Consider the time series regression

   Y_t = β₁X_1t + β₂X_2t + β₃X_3t + ... + β_kX_kt + u_t

but in this case there is some relation between the error terms
across observations. [Imagine a shock like the current economic
collapse has effects in more than one period!]

   E(u_t) = 0
   Var(u_t) = σ²
   But: Cov(u_t, u_s) ≠ 0

Thus the error covariances are not zero.
This means that one of the assumptions that makes OLS BLUE
does not hold.
Autocorrelation is most likely to occur in a time series framework.
In cross-section data there may be spatial correlation - this is a form of
autocorrelation too!
18
Likely causes of Autocorrelation

1. Omitting a variable that ought to be included.
2. Misspecification of the functional form. This
is most obvious where a straight line is
fitted where a curve should have been
used. This would clearly show up in plots of
residuals.
3. Errors of measurement in the dependent
variable. If the errors are not random then
the error term will pick up any systematic
mistakes.
19
The Problem with Autocorrelation
OLS estimators will be inefficient, so no longer BLUE, but they
are still unbiased, so we get the right answer for the betas on
average!!
Inefficient here basically means the estimate is more variable
than it needs to be => we are less confident in our estimate than we
could be.
The estimated variances of the regression coefficients
will be biased and inconsistent, therefore hypothesis
testing will no longer be valid.
t-statistics will tend to be higher.
R² will tend to be overestimated. [i.e. we think we explain
more than we actually do!]

20
Detecting Autocorrelation
By observation
  Plot the residuals against time
    If the error is random there should be no pattern!!
  Plot û_t against û_t-1
    If there is no correlation there should not be a relationship!
    (see the sketch below)
The Durbin-Watson Test
  Only good for first-order serial correlation
  It can give inconclusive results
  It is not applicable when a lagged dependent variable is
  included in the regression
Breusch and Godfrey (1978) developed an
alternative test
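A minimal Python sketch of the graphical checks and the Durbin-Watson statistic, using simulated data purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1 + 0.5 * x + rng.normal(size=200)
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

plt.plot(resid)                       # residuals against time: look for patterns
plt.figure()
plt.scatter(resid[:-1], resid[1:])    # u_hat_{t-1} (x-axis) against u_hat_t: look for a relationship
plt.show()

print(durbin_watson(resid))           # close to 2 => no first-order serial correlation
```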

21
[Figure: scatter plot of RESID against PREVRESID (the lagged residual)]
Breusch-Godfrey Test
This is an example of an LM (Lagrange Multiplier) type test,
where only the restricted form of the model is estimated. We
then test for a relaxation of these restrictions by applying a
formula.
Consider the model

   Y_t = β₁X_1t + β₂X_2t + β₃X_3t + ... + β_kX_kt + u_t

where u_t = ρ₁u_t-1 + ρ₂u_t-2 + ρ₃u_t-3 + ... + ρ_p·u_t-p + ε_t

Combining these two equations gives

   Y_t = β₁X_1t + β₂X_2t + β₃X_3t + ... + β_kX_kt + ρ₁u_t-1
         + ρ₂u_t-2 + ρ₃u_t-3 + ... + ρ_p·u_t-p + ε_t

Test the following H₀ and Hₐ:

   H₀: ρ₁ = ρ₂ = ρ₃ = ... = ρ_p = 0

   Hₐ: at least one of the ρ's is not zero, thus there is serial
   correlation

22
Breusch-Godfrey Test
This two-stage test begins by considering the
model

   Y_t = β₁X_1t + β₂X_2t + β₃X_3t + ... + β_kX_kt + u_t   (1)

1) Estimate model (1) and save the residuals.

2) Run the following model, with the number of lags used
determined by the order of serial correlation you are
willing to test:

   û_t = α₀ + α₁X_1t + α₂X_2t + ... + α_kX_kt + ρ₁û_t-1 + ρ₂û_t-2 + ... + ρ_p·û_t-p + v_t

   (auxiliary regression)
23
Breusch-Godfrey Test
The test statistic may be written as an
LM statistic = (n - p)·R²,
where R² relates to this auxiliary regression and p is the number of lags tested.

The statistic is distributed asymptotically as chi-square
(χ²) with p degrees of freedom.

If the LM statistic is bigger than the χ² critical value then we
reject the null hypothesis of no serial correlation and
conclude that serial correlation exists.
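A sketch of the Breusch-Godfrey test in Python (statsmodels); the AR(1) errors and the single regressor are simulated only to illustrate the mechanics.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                 # AR(1) errors: u_t = 0.6*u_{t-1} + e_t
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"BG LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")   # small p-value => serial correlation
```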


24
H₀: No autocorrelation
[Breusch-Godfrey test output not shown]
Here we have autocorrelation (Prob. F < 0.05).
Also note the previous residual is significant in the regression.
25
Solutions to Autocorrelation

1. Find the cause
2. Increase the number of observations
3. Re-specify the model correctly
4. EViews provides a number of procedures, e.g. Cochrane-Orcutt - a
   last resort
5. Most important: It is easy to confuse misspecified dynamics
   with serial correlation in the errors. In fact it is best to always
   start from a general dynamic model and test the restrictions
   before applying the tests for serial correlation.
   [i.e. If this period's Y depends on last period's Y, we should include a
   lagged term for Y in our model rather than thinking it's a problem with the
   errors for the original model!]
26
Assumption 4:
u_t ~ N(0, σ²)

Normally distributed error term
27

Testing the Normality Assumption

Recall we needed to assume normally distributed
disturbances for many hypothesis tests to be valid.
Testing for Departures from Normality
  The Bera-Jarque normality test
  Skewness and kurtosis are the (standardised) third and
  fourth moments of a distribution. The first and second are
  the mean and variance respectively.
  A normal distribution is not skewed and is defined to have a
  coefficient of kurtosis of 3.
  The kurtosis of the normal distribution is 3, so its excess
  kurtosis (b₂ - 3) is zero.

28
Normal versus Skewed Distributions

[Figure: density f(x) of a normal distribution (left) and of a skewed distribution (right)]
29
Leptokurtic versus Normal Distribution

[Figure: a leptokurtic density compared with a normal density]
30
Testing for Normality

Bera and Jarque formalise this by testing the residuals for normality, by
testing whether the coefficient of skewness and the coefficient of excess
kurtosis are jointly zero.
It can be proved that the coefficients of skewness and kurtosis can be
expressed respectively as:

   b₁ = E[u³] / (σ²)^(3/2)   and   b₂ = E[u⁴] / (σ²)²

The Bera-Jarque test statistic is given by

   W = T·[ b₁²/6 + (b₂ - 3)²/24 ] ~ χ²(2)

We estimate b₁ and b₂ using the residuals from the OLS regression, û.
This test of normality can be easily performed using EViews.
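It can also be run in Python; a minimal sketch using the Jarque-Bera statistic in statsmodels on OLS residuals, with simulated fat-tailed errors so the test rejects.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(5)
x = rng.normal(size=250)
y = 1 + 0.5 * x + rng.standard_t(df=3, size=250)   # fat-tailed (non-normal) errors
res = sm.OLS(y, sm.add_constant(x)).fit()

w_stat, p_value, skew, kurtosis = jarque_bera(res.resid)
print(f"W = {w_stat:.2f}, p-value = {p_value:.4f}")   # small p-value => reject normality
```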
31

What do we do if we find evidence of Non-Normality?

It is not obvious what we should do!

Could use a method which does not assume normality, but difficult and
what are its properties?

It is often the case that one or two very extreme residuals cause us to
reject the normality assumption.
A solution is to use dummy variables.
e.g. say we estimate a monthly model of asset returns from 1980-
1990, and we plot the residuals, and find a particularly large outlier for
October 1987:

32
What do we do if we find evidence
of Non-Normality? (contd)











Create a new variable:
   D87M10_t = 1 during October 1987 and zero otherwise.
This effectively knocks out that observation. But we need a theoretical
reason for adding dummy variables.

[Figure: plot of residuals û_t against time, with a particularly large outlier in October 1987]
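A small pandas/statsmodels sketch of building such an impulse dummy; the monthly data and the column names ('market', 'returns') are simulated and purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
dates = pd.period_range("1980-01", "1990-12", freq="M")
df = pd.DataFrame({"market": rng.normal(size=len(dates))}, index=dates)
df["returns"] = 0.5 * df["market"] + rng.normal(size=len(dates))
df.loc[pd.Period("1987-10", freq="M"), "returns"] -= 10      # plant one large outlier

# D87M10 = 1 in October 1987 and 0 otherwise: effectively knocks out that observation
df["D87M10"] = (df.index == pd.Period("1987-10", freq="M")).astype(int)
res = sm.OLS(df["returns"], sm.add_constant(df[["market", "D87M10"]])).fit()
print(res.params)
```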
33
A test for functional form
34

Adopting the Wrong Functional Form

We have previously assumed that the appropriate functional
form is linear.
This may not always be true.
We can formally test this using Ramsey's RESET test, which
is a general test for mis-specification of functional form.
1. Essentially the method works by adding higher order terms of the
   fitted values (e.g. ŷ_t², ŷ_t³, etc.) into an auxiliary regression.
2. Regress the residuals û_t on powers of the fitted values:

   û_t = β₀ + β₁ŷ_t² + β₂ŷ_t³ + ... + β_(p-1)·ŷ_t^p + v_t

3. Obtain R² from this regression. The test statistic is given by T·R² and is
   distributed as a χ²(p - 1).
4. So if the value of the test statistic is greater than a χ²(p - 1) critical value
   then reject the null hypothesis that the functional form was correct.
   (A sketch of these steps follows below.)
35

But what do we do if this is the case?

The RESET test gives us no guide as to what a better specification
might be.

One possible cause of rejection of the test is if the true model is

   y_t = β₁ + β₂x_2t + β₃x_2t² + β₄x_2t³ + u_t

In this case the remedy is obvious.

Another possibility is to transform the data into logarithms. This will
linearise many previously multiplicative models into additive ones:

   y_t = A·x_t^β·e^(u_t)   <=>   ln y_t = α + β·ln x_t + u_t
36
Up to here is covered in the
first 9 chapters of Asteriou
and Hall.
37
Intro to Endogeneity
[won't have time to cover this
in class - not on exam]

38
Learning outcomes
1. To understand what endogeneity is & why it
is a problem.

2. To have an intuitive understanding of how
we might fix it.

But: We won't cover this in any detail despite
the fact that it is an extremely important
topic. Unfortunately we have limited time!



39
Sources of Endogeneity
3 causes of endogeneity:
1. Omitted Variables
2. Measurement Error
3. Simultaneity

I will only focus on endogeneity caused by
omitted variables.
40
The assumption:
When we estimate a model such as:

   Y = α + β₁X₁ + β₂X₂ + ε

using OLS, we assume that cov(X_i, ε) = 0,
i.e. our included variables (X_i) are not related to any
unobserved variables (ε) that influence the dependent
variable (Y).

The example we will work with today:

   Earnings = α + β₁·YrsEd + β₂·(Other variables) + ε

41
Assuming cov(X, ε) = 0:
We can visualize the case where the assumption holds as
follows:

[Diagram: X → Y, with ε also affecting Y, but no link between X and ε]

Notice that there is no link between X and ε.
If X affects Y, then when X changes so should Y.
The effect of X on Y is:

   β = Cov(X, Y) / Var(X)

42
Example:
Comparing the earnings of people with different education
levels will give us an estimate of the effect of education on
earnings (controlling for gender etc.):

[Diagram: Education, Experience, Ethnicity, Region etc. all affecting Earnings]

43
Example 15.4, Wooldridge's book:
An extra year of education increases wages by
roughly 7.4%.
(Coefficient × 100 is the % effect because Y is logged.)
44
What if our assumption is
incorrect, i.e. cov(X, ε) ≠ 0?
=> Endogeneity
45
Endogeneity
In our example we looked at the effect of
education on earnings.
However, many unobserved variables affect both
income and education, e.g. the individual's ability:
  High-ability people may be more likely to go to
  college
  High-ability people generally earn more
Since we do not observe ability, we will think
all of the higher earnings are due to education,
i.e. we estimate the effect, β, incorrectly!
46
Endogeneity:
Now we have a problem: some variables in ε are related to X
as well as to Y => Cov(X, ε) ≠ 0
A change in ε will change X and Y, but we do not observe ε, so
we think that X is causing the change in Y directly!!!
Thus we get the wrong estimate for β:

   plim β̂ = β + corr(X, ε)·(σ_ε / σ_X)

[Diagram: X → Y, with ε affecting Y and now also linked to X]

47
Endogeneity:
In our example: the unobserved variable (ability) in ε is related to
Education (X) as well as to Earnings (Y), so again

   plim β̂ = β + corr(X, ε)·(σ_ε / σ_X)

[Diagram: Education → Earnings, with unobserved ability affecting both]

48
Solutions to endogeneity:
Instrumental Variables (IV)
49
Solution 1:
If we could include the omitted variable (or a
proxy) into our regression, this will solve the
problem since we would be controlling for
ability.
E.g. we might try to measure ability with an
aptitude test and include that in our model.

Problem: often good proxies are not
available.

50
Solution 2: Instrumental variable
Suppose we can find a variable Z that is related to X
but not to ε (or Y).

[Diagram: Z → X → Y, with ε affecting Y but unrelated to Z]

Since Z is not related to ε, if we focus on changes in
X that are due to Z, we overcome the problem (since
these changes in X are not related to ε).

51
Solution 2: Instrumental variable
In our example, we need a variable related to years of
education but not related to ability.
Proximity to college may affect the likelihood of
going to college, but not your ability!

[Diagram: Proximity to college → Education → Earnings, with ability (ε) affecting Earnings only]

52
Two stage least squares
First stage:
Regress X on Z (and any other observed variables):

   X = α* + β₁*·Z + β₂*·(Other variables) + ε*

Now predict X => X̂

Key point:
Since Z is not related to ε, the predicted X̂ can't be
related to ε, so it is exogenous.

53
Problem solved:

Second stage: Regress Y on X̂. This β̂_IV will be
unbiased.
Note however that the standard error needs to be
corrected, since we reduced the variability of X in stage 1.
A manual sketch of the two stages follows below.
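A hedged, manual two-stage least squares sketch in Python; the data are simulated (the true return to education is set to 0.10 here) and the variable names are illustrative, not the Wooldridge data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 5000
ability = rng.normal(size=n)                       # unobserved
proximity = rng.binomial(1, 0.5, size=n)           # instrument Z
educ = 12 + 2 * proximity + ability + rng.normal(size=n)
log_wage = 1.0 + 0.10 * educ + 0.5 * ability + rng.normal(size=n)

# Stage 1: regress the endogenous regressor (educ) on the instrument
stage1 = sm.OLS(educ, sm.add_constant(proximity)).fit()
educ_hat = stage1.fittedvalues

# Stage 2: use predicted education in place of actual education
stage2 = sm.OLS(log_wage, sm.add_constant(educ_hat)).fit()
print(sm.OLS(log_wage, sm.add_constant(educ)).fit().params)  # OLS: biased upward by ability
print(stage2.params)                                         # ~0.10, but the SEs need correcting
```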
54

First stage: regress Education on proximity
to college
55
Second stage: Use predicted education in
original regression instead of education

56
Original OLS (biased)
57
Comparison of Results:
Coefficients:
OLS: 0.074009 => 7.4% increase in wages per
year
IV: 0.1322888 => 13.2% increase in wages per
year
This difference may be important to an individual
deciding whether to go to college!
Note: standard errors are larger in IV than OLS because
we lose some explanatory power when we exclude part
of X (the part not correlated with Z).
58
Some issues with IV - won't
have time to cover today!
59
Weak Instruments
Suppose Z is not strongly correlated with X, e.g. my
education level may be related to my cousin's,
but probably not very strongly!
In the first-stage regression we will not get a very good
prediction for X => so our second-stage estimate won't
be good.

[Diagram: Z only weakly related to X; X affects Y]

60
Invalid Instruments
Here the instrument is not exogenous, since it is also related to ε
=> We would still get a biased estimate
(in fact it could be even worse than the original estimate).

[Diagram: Z related to X, but also related to ε, which affects Y]

61
Two Stage Least Squares estimator
(2SLS):
Recall the standard OLS estimator is:

   β̂_OLS = Cov(X, Y) / Var(X)

The 2SLS estimator is:

   β̂_2SLS = Cov(Z, Y) / Cov(Z, X)
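For illustration, a short numpy sketch computing both estimators directly from sample moments on simulated data with an unobserved "ability" term; names and numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
ability = rng.normal(size=n)                        # unobserved component of the error
z = rng.normal(size=n)                              # instrument: related to x, not to ability
x = 1.0 * z + ability + rng.normal(size=n)
y = 2.0 + 0.5 * x + ability + rng.normal(size=n)    # true beta = 0.5

beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # Cov(X,Y)/Var(X): biased upward here
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # Cov(Z,Y)/Cov(Z,X): close to 0.5
print(beta_ols, beta_iv)
```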
62
Exercise:
Suppose that I am interested in the effect of
hours of study (X) on exam results (Y).
1. Are there any unobserved variables that
might influence both X and Y?
2. Can you think of a proxy variable for them?
3. Suppose we have data on the following
variables, would any of these be valid
instruments for hours of study? Why, why
not?
63
Sample questions on Blackboard -
try to do the ones with the *


64
