
Autocorrelation in Regression Analysis
Tests for Autocorrelation
Examples
Durbin-Watson Tests
Modeling Autoregressive Relationships

What causes autocorrelation?

Misspecification
Data manipulation (before receipt or after receipt)
Event inertia
Spatial ordering

Checking for Autocorrelation


Test: Durbin-Watson statistic:

d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}, \quad \text{for } n \text{ and } K+1 \text{ d.f.}

 Positive         Zone of        No                Zone of        Negative
 autocorrelation  indecision     autocorrelation   indecision     autocorrelation
|________________|______________|_________________|______________|________________|
0             d-lower         d-upper      2      4-d-upper    4-d-lower          4

d near 0 or 4: autocorrelation is clearly evident.
d in a zone of indecision: ambiguous; we cannot rule out autocorrelation.
d near 2: autocorrelation is not evident.

Consider the following regression:


      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   52.63
       Model |  .354067287     2  .177033643           Prob > F      =  0.0000
    Residual |  1.09315071   325  .003363541           R-squared     =  0.2447
-------------+------------------------------           Adj R-squared =  0.2400
       Total |    1.447218   327  .004425743           Root MSE      =    .058

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |    .060075    .006827     8.80   0.000     .0466443    .0735056
    quantity |  -2.27e-06   2.91e-07    -7.79   0.000    -2.84e-06   -1.69e-06
       _cons |   .2783773   .0077177    36.07   0.000     .2631944    .2935602
------------------------------------------------------------------------------

Because this is time-series data, we should consider the possibility of
autocorrelation. To run the Durbin-Watson test, first we have to declare the
data as time series with the tsset command. Next we use the dwstat command.
Durbin-Watson d-statistic( 3, 328) = .2109072
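
A minimal sketch of these steps in Stata, assuming a time index variable named t (t is an assumption; the model variables follow the output above):

* declare the data as time series (t is an assumed time index variable)
tsset t

* re-fit the model, then request the Durbin-Watson d-statistic
quietly regress price ice quantity
dwstat    // synonym for estat dwatson in newer versions of Stata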

Find the D-upper and D-lower


Check a Durbin-Watson table for the values of d-upper and d-lower:
http://hadm.sph.sc.edu/courses/J716/Dw.html

For n = 20 and k = 2, α = .05, the values are:


Lower = 1.643
Upper = 1.704
Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
     lags(p) |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
        1    |        1292.509              1                   0.0000
---------------------------------------------------------------------------
                         H0: no serial correlation
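
A sketch of how this test is produced, assuming the model above has just been fit (durbina is the user-written command the next slide names; newer versions of Stata use estat durbinalt):

* Durbin's alternative test for serial correlation
quietly regress price ice quantity
durbina    // estat durbinalt in newer versions of Stata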

Alternatives to the d-statistic


The d-statistic is not valid in models with a lagged dependent variable.
In the case of a lagged LHS variable you must use the Durbin alternative
test (the command is durbina in Stata).

Also, the d-statistic is only for first-order autocorrelation. In other
instances you may use the Durbin alternative test.
Why would you suspect other than first-order autocorrelation?

The Runs Test


An alternative to the D-W test is a formalized examination of the signs of
the residuals. We would expect the signs of the residuals to be random in
the absence of autocorrelation.
The first step is to estimate the model and predict the residuals.

Runs continued
Next, order the signs of the residuals against time (or spatial ordering in
the case of cross-sectional data) and see if there are excessive runs of
positives or negatives. Alternatively, you can graph the residuals and look
for the same trends.

Runs test continued

The final step is to use the expected mean and deviation in a standard
t-test.
Stata does this automatically with the runtest command!
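
A minimal sketch of these steps in Stata; the residual variable name e is an assumption:

* predict the residuals, then test whether their signs occur in random order
quietly regress price ice quantity
predict e, residuals
runtest e    // counts runs above/below the threshold and reports a z statistic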

Visual diagnosis of autocorrelation (in a single series)

A correlogram is a good tool to identify if a series is autocorrelated.

[Figure: correlogram of price, lags 1-40, with Bartlett's formula for MA(q)
95% confidence bands]
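
A sketch of the Stata command that draws such a correlogram:

* autocorrelations of price with Bartlett 95% confidence bands
ac price, lags(40)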

Dealing with autocorrelation


D-W is not appropriate for auto-regressive (AR) models, where:

Y_t = b_0 + b_1 Y_{t-1} + b_2 X_2 + \ldots

In this case, we use the Durbin alternative test.
For AR models, we need to explicitly estimate the correlation between Y_t
and Y_{t-1} as a model parameter.
Techniques:
AR(1) models (closest to regression; first order only)
ARIMA (any order)

Dealing with Autocorrelation


There are several approaches to resolving problems of autocorrelation:
Lagged dependent variables
Differencing the dependent variable
GLS
ARIMA

Lagged dependent variables


The most common solution.
Simply create a new variable that equals Y at t-1, and use it as a RHS
variable.
To do this in Stata, simply use the generate command with the new variable
equal to L.variable:
gen lagy = L.y
gen laglagy = L2.y

This correction should be based on a theoretical belief about the
specification.
May cause more problems than it solves.
Also costs a degree of freedom (one lost observation).
There are several advanced techniques for dealing with this as well; a
sketch with the variables from the running example appears below.
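
A minimal sketch of this correction; lagprice is a hypothetical name:

* generate the lag of the dependent variable and add it to the RHS
gen lagprice = L.price
regress price lagprice ice quantity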

Differencing
Differencing is simply the act of subtracting the previous observation's
value from the current observation.
To do this in Stata, again use the generate command, with a capital D
(instead of the L for lags):

This process is effective; however, it is an EXPENSIVE correction.
This technique throws away long-term trends.
Assumes that rho = 1 exactly.

D1.x = x_t - x_{t-1}
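
A sketch of the differencing step in Stata; d_price is a hypothetical name:

* first-difference the dependent variable
gen d_price = D.price    // D.price equals price - L.price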

GLS and ARIMA


GLS approaches use maximum likelihood to estimate rho and correct the model.
These are good corrections, and can be replicated in OLS.

ARIMA is an acronym for Autoregressive Integrated Moving Average.
This process is a univariate filter used to cleanse variables of a variety
of pathologies before analysis.

Corrections based on Rho


There are several ways to estimate rho, the simplest being calculating it
from the residuals:

\hat{\rho} = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}

We then estimate the regression by transforming the regressors so that:

Y_t^* = Y_t - \hat{\rho} Y_{t-1} \quad \text{and} \quad X_t^* = X_t - \hat{\rho} X_{t-1}

This gives the regression:

Y_t^* = b_0 (1 - \hat{\rho}) + b_1 X_t^* + u_t
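
A minimal sketch of one residual-based way to calculate rho, assuming the OLS model above; e is a hypothetical residual variable:

* regress the residuals on their own lag; the slope coefficient estimates rho
quietly regress price ice quantity
predict e, residuals
regress e L.e, noconstant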

High tech solutions


Stata also offers the option of estimating the model with the AR structure
(with multiple ways of estimating rho). There is also what is known as a
Prais-Winsten regression, which generates values for the lost observation.
For the truly adventurous, there is also the option of doing a full ARIMA
model.
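
A sketch of the command that produces the iterated Prais-Winsten estimates shown on the next slide:

* iterated Prais-Winsten AR(1) regression (the default for prais)
prais price ice quantity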

Prais-Winsten regression

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     328
-------------+------------------------------           F(  2,   325) =   15.39
       Model |  .012722308     2  .006361154           Prob > F      =  0.0000
    Residual |  .134323736   325  .000413304           R-squared     =  0.0865
-------------+------------------------------           Adj R-squared =  0.0809
       Total |  .147046044   327  .000449682           Root MSE      =  .02033

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ice |   .0098603   .0059994     1.64   0.101    -.0019422    .0216629
    quantity |  -1.11e-07   1.70e-07    -0.66   0.512    -4.45e-07    2.22e-07
       _cons |   .2517135   .0195727    12.86   0.000     .2132082    .2902188
-------------+----------------------------------------------------------------
         rho |   .9436986
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    0.210907
Durbin-Watson statistic (transformed) 1.977062

ARIMA
The ARIMA model allows us to test the hypothesis of autocorrelation and
remove it from the data.
This is an iterative process akin to the purging we did when creating the
ystar variable.
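
A sketch of the command behind the model on the next slide, an AR(1) model for price with no covariates:

* AR(1) model for price
arima price, ar(1)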

The model
ARIMA regression

Sample: 1 to 328                                Number of obs   =        328
                                                Wald chi2(1)    =    3804.80
Log likelihood = 811.6018                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
       _cons |   .2558135   .0207937    12.30   0.000     .2150587    .2965683
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9567067     .01551    61.68   0.000     .9263076    .9871058
-------------+----------------------------------------------------------------
      /sigma |   .0203009    .000342    59.35   0.000     .0196305    .0209713
------------------------------------------------------------------------------

The ar L1. coefficient is the estimate of rho; it is a significant lag.

The residuals of the ARIMA model

[Figure: correlogram of the ARIMA residuals, lags 1-40, with Bartlett's
formula for MA(q) 95% confidence bands]

There are a few significant lags a ways back. Generally we should expect
some, but this mess is probably an indicator of a seasonal trend (well
beyond the scope of this lecture)!
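
A sketch of how this residual check can be produced after the arima fit; e_hat is a hypothetical name:

* predict the ARIMA residuals and draw their correlogram
predict e_hat, residuals
ac e_hat, lags(40)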

ARIMA with a covariate


ARIMA regression

Sample: 1 to 328                                Number of obs   =        328
                                                Wald chi2(3)    =    3569.57
Log likelihood = 812.9607                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price        |
         ice |   .0095013   .0064945     1.46   0.143    -.0032276    .0222303
    quantity |  -1.04e-07   1.22e-07    -0.85   0.393    -3.43e-07    1.35e-07
       _cons |   .2531552   .0220777    11.47   0.000     .2098838    .2964267
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .9542692     .01628    58.62   0.000     .9223611    .9861773
-------------+----------------------------------------------------------------
      /sigma |   .0202185   .0003471    58.25   0.000     .0195382    .0208988
------------------------------------------------------------------------------
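
A sketch of the command that would produce output like the above:

* AR(1) model for price with covariates
arima price ice quantity, ar(1)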

Final thoughts
Each correction has a best application.
If we wanted to evaluate a mean shift (a dummy-variable-only model),
calculating rho would not be a good choice; then we would want to use the
lagged dependent variable.
Also, where we want to test the effect of inertia, it is probably better to
use the lag.

Final Thoughts Continued


In small-N settings, calculating rho tends to be more accurate.
ARIMA is one of the best options; however, it is very complicated!
When dealing with time, the number of time periods and the spacing of the
observations are VERY IMPORTANT!
When using estimates of rho, a good rule of thumb is to have at least 25-30
time points, and more if the observations are too closely spaced for the
process you are observing!

Next Time:
Review for Exam
Plenary Session

Exam Posting
Available after class Wednesday
