
Notes on Non-stationarity

Jennifer L. Castle
Hilary 2008
This note considers non-stationary univariate time-series. For readings see Banerjee,
Dolado, Galbraith and Hendry (1993) Ch. 3 and 4, and Patterson, K. (2000), Ch. 6. For
implementation of the unit root test using OxMetrics see Hendry and Doornik (2001), Ch. 4.
1 Pure random walk
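We have analysed the AR(1) process in a lot of detail:
$$ y_t = \mu + \phi y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{WN}\left(0, \sigma^2\right) \tag{1} $$
A special case of interest occurs when we set $\mu = 0$ and $\phi = 1$. This is called a pure random
walk:
$$ y_t = y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{WN}\left(0, \sigma^2\right) \tag{2} $$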
Likely values for $\phi$ are $0 < \phi \le 1$ for most economic time series.
If $\phi > 1$ the process is explosive and it will grow without limit.
If $\phi < 0$ successive values of $y_t$ tend to oscillate in sign.
If $\phi = 0$, $y_t$ is a white noise process.
If $0 < \phi < 1$, the process is a stationary AR(1) process.
If $\phi = 1$ the process is a random walk. If $\mu = 0$ it is a pure random walk and if $\mu \neq 0$
the process is a random walk with drift.
Why is a random walk interesting?
A random walk implies that the best guess of $y_{t+1}$ given the information at time $t$ is $y_t$.
There is no predictive structure in the AR process or the error process. Hence, the random
walk is often the baseline model for financial and foreign exchange markets. It implies that
it is not possible to exploit the past history of $y_t$ or $\epsilon_t$ to make profits by speculating on
future realisations.
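A random walk has persistent shocks. To see this, solve recursively:
$$
\begin{aligned}
y_t &= y_{t-1} + \epsilon_t \\
&= y_{t-2} + \epsilon_{t-1} + \epsilon_t \\
&= y_{t-3} + \epsilon_{t-2} + \epsilon_{t-1} + \epsilon_t \\
&\;\;\vdots \\
&= y_0 + \sum_{i=1}^{t} \epsilon_i
\end{aligned}
\tag{3}
$$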
Hence, the random disturbances, or shocks, cumulate from the start of the process, and the
impact of a shock does not die away. The random walk is said to have an infinite memory:
it remembers the shocks forever.
A characteristic of the pure random walk is that it can wander anywhere (either positive
or negative) and it is not pulled back to a mean.
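The recursion and its closed form can be checked numerically. The following is a minimal sketch, assuming standard normal shocks, an arbitrary start value $y_0 = 5$ and a hypothetical helper `random_walk`; none of these choices come from the notes. It also adds one unit to a single shock to illustrate that the effect never dies away.

```python
import numpy as np

def random_walk(eps, y0=0.0):
    """Build y_0, ..., y_T from shocks eps_1, ..., eps_T using recursion (2)."""
    y = np.empty(len(eps) + 1)
    y[0] = y0
    for t in range(1, len(eps) + 1):
        y[t] = y[t - 1] + eps[t - 1]
    return y

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
eps = rng.normal(size=200)              # assumed: standard normal shocks
y = random_walk(eps, y0=5.0)

# Closed form (3): y_t = y_0 + sum of all shocks up to time t.
y_closed = 5.0 + np.concatenate(([0.0], np.cumsum(eps)))

# Add one unit to the shock at t = 50 and compare the two paths.
eps_shocked = eps.copy()
eps_shocked[49] += 1.0
gap = random_walk(eps_shocked, y0=5.0) - y

print(np.allclose(y, y_closed))         # recursion and closed form agree
print(gap[49], gap[50], gap[-1])        # no gap before the shock, a unit gap after
```

The gap between the two paths stays at one unit for every period after the shock, which is the infinite memory property in action.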
1.1 Calculating the first and second moments of a RW
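To calculate the mean take expectations of (3):
$$
\begin{aligned}
E[y_t] &= E[y_0] + E\left[\sum_{i=1}^{t} \epsilon_i\right] \\
&= y_0 + \sum_{i=1}^{t} E[\epsilon_i] \\
&= y_0
\end{aligned}
\tag{4}
$$
where $y_0$ is fixed. Hence, the mean is constant and finite.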
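To calculate the variance:
$$
\begin{aligned}
V[y_t] = \gamma_t(0) &= E\left[(y_t - E[y_t])^2\right] \\
&= E\left[\left(\sum_{i=1}^{t} \epsilon_i\right)^2\right] \\
&= \sum_{i=1}^{t} E\left[\epsilon_i^2\right] \qquad \{\text{as } E[\epsilon_j \epsilon_k] = 0,\ j \neq k\} \\
&= \sum_{i=1}^{t} \sigma^2 \\
&= t\sigma^2
\end{aligned}
\tag{5}
$$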
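To calculate the autocovariances:
$$
\begin{aligned}
\gamma_t(1) &= E\left[(y_t - E[y_t])(y_{t-1} - E[y_{t-1}])\right] \\
&= E\left[\left(\sum_{i=1}^{t} \epsilon_i\right)\left(\sum_{i=1}^{t-1} \epsilon_i\right)\right] \\
&= \sum_{i=1}^{t-1} E\left[\epsilon_i^2\right] \qquad \{\text{as } E[\epsilon_j \epsilon_k] = 0,\ j \neq k\} \\
&= \sum_{i=1}^{t-1} \sigma^2 \\
&= (t-1)\sigma^2
\end{aligned}
\tag{6}
$$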
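and in general:
$$
\begin{aligned}
\gamma_t(h) &= E\left[(y_t - E[y_t])(y_{t-h} - E[y_{t-h}])\right] \\
&= E\left[\left(\sum_{i=1}^{t} \epsilon_i\right)\left(\sum_{i=1}^{t-h} \epsilon_i\right)\right] \\
&= \sum_{i=1}^{t-h} \sigma^2 \\
&= (t-h)\sigma^2
\end{aligned}
\tag{7}
$$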
Hence, the second moments depend on time, t.
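The autocorrelation function is calculated as:
$$
\begin{aligned}
\rho_t(h) &= \frac{\gamma_t(h)}{\sqrt{\gamma_t(0)\,\gamma_{t-h}(0)}} \\
&= \frac{(t-h)\sigma^2}{\sqrt{(t\sigma^2)\,((t-h)\sigma^2)}} \\
&= \frac{(t-h)}{\sqrt{t(t-h)}} \\
&= \sqrt{\frac{(t-h)^2}{t(t-h)}} \\
&= \sqrt{1 - \frac{h}{t}}
\end{aligned}
\tag{8}
$$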
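The moment formulas can be checked by simulation. This is a rough sketch, assuming normal shocks with $\sigma = 1$, $y_0 = 0$, and illustrative values $t = 40$, $h = 10$ and $M = 20{,}000$ replications (all assumptions, not values from the notes). The Monte Carlo estimates should sit close to $t\sigma^2$, $(t-h)\sigma^2$ and $\sqrt{1 - h/t}$.

```python
import numpy as np

rng = np.random.default_rng(1)                 # seed chosen only for reproducibility
M, T, sigma = 20000, 50, 1.0                   # assumed replication count and sample size
eps = rng.normal(0.0, sigma, size=(M, T))
y = np.cumsum(eps, axis=1)                     # column t-1 holds y_t, with y_0 = 0

t, h = 40, 10
y_t = y[:, t - 1]
y_th = y[:, t - h - 1]

var_t = y_t.var()                              # compare with t * sigma^2
cov_th = np.mean(y_t * y_th)                   # E[y_t] = 0, so no centring is needed
acf_th = cov_th / np.sqrt(var_t * y_th.var())  # compare with sqrt(1 - h/t)

print(var_t, t * sigma**2)
print(cov_th, (t - h) * sigma**2)
print(acf_th, np.sqrt(1 - h / t))
```

Changing `t` while holding `h` fixed shows the estimates moving with time, which is exactly the failure of stationarity.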
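Recall the conditions for stationarity: $\{y_t\}$ is weakly or covariance stationary if first
and second moments of the process exist and are time-invariant.
$$
\begin{aligned}
E[y_t] &= \mu < \infty \qquad \forall\, t \in T \\
E\left[(y_t - \mu)(y_{t-h} - \mu)\right] &= \gamma(h) < \infty \qquad \forall\, t, h
\end{aligned}
\tag{9}
$$
Stationarity implies $\gamma_t(h) = \gamma_t(-h) = \gamma(h)$.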
The stronger definition of strict stationarity states that $\{y_t\}$ is strictly stationary if for
any values of $h_1, h_2, \ldots, h_n$ the joint distribution of $(y_t, y_{t+h_1}, \ldots, y_{t+h_n})$ depends only on
the intervals $h_1, h_2, \ldots, h_n$ and not on $t$:
$$ f(y_t, y_{t+h_1}, \ldots, y_{t+h_n}) = f(y_\tau, y_{\tau+h_1}, \ldots, y_{\tau+h_n}) \qquad \forall\, t, \tau \tag{10} $$
Strict stationarity implies that all moments are time invariant.
It is clear from the second moments that a random walk is non-stationary, as the variance
and autocovariances increase with $t$.
It is often difficult to distinguish between a random walk and a near random walk, i.e.
an AR(1) process with $\phi$ close to but less than 1 (say 0.95). This is because the process
has a lot of persistence and so over short samples can look similar to a random walk (as it
can deviate from its mean for long periods). However, as the sample size increases you will
observe the process returning to its mean more frequently and hence there is more power
to distinguish between the two series as $T$ increases.
1.2 Order of integration
An integrated process is one that can be made stationary by differencing. A discrete process
integrated of order $d$ is one that can be made stationary by differencing $d$ times, i.e. $\Delta^d y_t$ is
stationary where the differencing operator $\Delta^d$ is defined by $(1 - L)^d$.
Aside: If d is not an integer but a fractional value, the process is said to be fractionally
integrated of order d. This results in a process that has long memory, but we shall not
cover such processes in this course.
A random walk is integrated of order one, denoted I(1), as the first difference $\Delta y_t$ is a
stationary process:
$$
\begin{aligned}
y_t &= y_{t-1} + \epsilon_t \sim I(1) \\
\Delta y_t &= \epsilon_t \sim I(0)
\end{aligned}
\tag{11}
$$
The first difference is a white noise process. Integrating a white noise process results in a
random walk.
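Integration and differencing are inverse operations, which a short numerical sketch makes concrete. The example assumes standard normal white noise and $y_0 = 0$; `np.cumsum` plays the role of integration and `np.diff` the role of $(1 - L)$, with `n=d` giving $(1 - L)^d$.

```python
import numpy as np

rng = np.random.default_rng(2)                # seed chosen only for reproducibility
eps = rng.normal(size=500)                    # assumed: standard normal white noise

# Integrate the white noise once: y_t = y_0 + eps_1 + ... + eps_t, with y_0 = 0.
y = np.concatenate(([0.0], np.cumsum(eps)))   # I(1)

# First difference (1 - L) y_t recovers the shocks.
dy = np.diff(y)                               # I(0)

# Integrating a second time gives an I(2) series; differencing twice undoes it.
z = np.concatenate(([0.0], np.cumsum(y)))     # I(2)
d2z = np.diff(z, n=2)                         # (1 - L)^2 z_t

print(np.allclose(dy, eps), np.allclose(d2z, eps))
```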
2 Random walk with drift
The pure random walk can wander in any direction and so a priori we cannot predict which
way the process will go. If we add an intercept to the process we observe a systematic trend.
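To see this solve the model recursively:
$$
\begin{aligned}
y_t &= \mu + y_{t-1} + \epsilon_t \\
&= \mu + (\mu + y_{t-2} + \epsilon_{t-1}) + \epsilon_t \\
&= 2\mu + (\mu + y_{t-3} + \epsilon_{t-2}) + \epsilon_{t-1} + \epsilon_t \\
&\;\;\vdots \\
&= \mu t + y_0 + \sum_{i=1}^{t} \epsilon_i
\end{aligned}
\tag{12}
$$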
There is a deterministic trend component in the random walk, which induces a drift.
If the intercept is positive, the process will exhibit a positive trend, and if the intercept is
negative, there will be a downward trend.
2.1 Calculating the first and second moments of a RW with drift
To calculate the mean, take expectations of (12):
$$
\begin{aligned}
E[y_t] &= E[\mu t] + E[y_0] + E\left[\sum_{i=1}^{t} \epsilon_i\right] \\
&= \mu t + y_0
\end{aligned}
\tag{13}
$$
The variance and autocovariance will be as before:
$$ V[y_t] = \gamma_t(0) = E\left[(y_t - E[y_t])^2\right] = t\sigma^2 \tag{14} $$
$$ \gamma_t(h) = E\left[(y_t - E[y_t])(y_{t-h} - E[y_{t-h}])\right] = (t-h)\sigma^2 \tag{15} $$
The first difference of a random walk with drift is a stationary process but it will have a
non-zero constant mean determined by $\mu$:
$$
\begin{aligned}
y_t &= \mu + y_{t-1} + \epsilon_t \sim I(1) \\
\Delta y_t &= \mu + \epsilon_t \sim I(0)
\end{aligned}
\tag{16}
$$
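A quick simulation illustrates (13) to (16). The sketch below assumes $\mu = 0.2$, $y_0 = 1$, $\sigma = 1$ and $T = 100$, all chosen purely for illustration. Across replications the level at time $T$ should average about $\mu T + y_0$ with variance about $T\sigma^2$, while the first differences average about $\mu$.

```python
import numpy as np

rng = np.random.default_rng(3)            # seed chosen only for reproducibility
M, T = 5000, 100                          # assumed replication count and sample size
mu, y0 = 0.2, 1.0                         # assumed drift and start value

eps = rng.normal(size=(M, T))
y = y0 + np.cumsum(mu + eps, axis=1)      # y_t = mu*t + y_0 + sum of shocks, as in (12)

mean_T = y[:, -1].mean()                  # compare with mu*T + y_0 = 21
var_T = y[:, -1].var()                    # compare with T*sigma^2 = 100
dy_mean = np.diff(y, axis=1).mean()       # compare with mu = 0.2

print(mean_T, var_T, dy_mean)
```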
2.2 Unit roots
Recall the AR(1) model:
$$
\begin{aligned}
y_t &= \mu + \phi y_{t-1} + \epsilon_t \\
(1 - \phi L)\, y_t &= \mu + \epsilon_t
\end{aligned}
\tag{17}
$$
The characteristic equation is given by:
$$ (1 - \phi z) = 0 \tag{18} $$
which is solved when $z = \frac{1}{\phi}$. This is called the root of the equation. If $\phi = 1$ then $z = 1$
and the process is said to have a unit root. In this case, the first difference is stationary, as
$(1 - L)\, y_t = y_t - y_{t-1} = \mu + \epsilon_t$ which is stationary. Therefore, an I(1) series has 1 unit
root and an I(0) series has no unit roots.
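For higher order autoregressive processes it is possible to have more than one unit root.
E.g.:
$$
\begin{aligned}
y_t &= \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t \\
\left(1 - \phi_1 L - \phi_2 L^2\right) y_t &= \mu + \epsilon_t
\end{aligned}
\tag{19}
$$
Forming the characteristic equation:
$$
\begin{aligned}
\left(1 - \phi_1 z - \phi_2 z^2\right) &= 0 \\
(1 - \lambda_1 z)(1 - \lambda_2 z) &= 0
\end{aligned}
\tag{20}
$$
where $\lambda_1 + \lambda_2 = \phi_1$ and $-\lambda_1 \lambda_2 = \phi_2$.
The roots are given by $z_1 = \frac{1}{\lambda_1}$ and $z_2 = \frac{1}{\lambda_2}$. If these are both 1, there are 2 unit roots.
Hence, the process needs to be differenced twice to be stationary and is denoted I(2).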
Aside: see the problem set exercise for a case with two unit roots.
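The roots of the characteristic equation can also be found numerically. The sketch below uses `numpy.roots`; the helper names `ar_roots` and `count_unit_roots`, the tolerance and the example coefficients are all illustrative assumptions rather than part of the course material.

```python
import numpy as np

def ar_roots(phi):
    """Roots z of 1 - phi_1 z - ... - phi_p z^p = 0."""
    phi = np.asarray(phi, dtype=float)
    # numpy.roots expects coefficients ordered from the highest power down.
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    return np.roots(coeffs)

def count_unit_roots(phi, tol=1e-4):
    """Number of roots numerically equal to 1 (tol is an assumed tolerance)."""
    return int(np.sum(np.abs(ar_roots(phi) - 1.0) < tol))

# phi = (1.5, -0.5): 1 - 1.5z + 0.5z^2 = (1 - z)(1 - 0.5z), roots 1 and 2, so I(1).
print(ar_roots([1.5, -0.5]), count_unit_roots([1.5, -0.5]))

# phi = (2, -1): 1 - 2z + z^2 = (1 - z)^2, two unit roots, so I(2).
print(count_unit_roots([2.0, -1.0]))

# phi = (0.5,): root z = 2 lies outside the unit circle, a stationary AR(1).
print(count_unit_roots([0.5]))
```

A repeated root is only recovered up to numerical error, which is why the comparison with 1 uses a tolerance rather than exact equality.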
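For the general AR(p) model we can write the model as:
$$
\begin{aligned}
y_t &= \mu + \sum_{j=1}^{p} \phi_j y_{t-j} + \epsilon_t \\
\phi(L)\, y_t &= \mu + \epsilon_t
\end{aligned}
\tag{21}
$$
where $\phi(L) = 1 - \sum_{j=1}^{p} \phi_j L^j$.
If a unit root exists, $\phi(L)$ can be factored into:
$$ \phi(L) = (1 - L)\, \phi^*(L) \tag{22} $$
where $\phi^*(L) = 1 - \sum_{j=1}^{p-1} \phi^*_j L^j$ such that (21) can be written as:
$$
\begin{aligned}
(1 - L)\, \phi^*(L)\, y_t &= \mu + \epsilon_t \\
\phi^*(L)\, \Delta y_t &= \mu + \epsilon_t \\
\Delta y_t &= \mu + \sum_{j=1}^{p-1} \phi^*_j L^j \Delta y_t + \epsilon_t
\end{aligned}
\tag{23}
$$
which is an AR(p-1) model in first differences.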
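Likewise, if there are 2 unit roots, the same factorization can be applied and the model
would be an AR(p-2) process in second differences:
$$
\begin{aligned}
(1 - L)^2\, \phi^{**}(L)\, y_t &= \mu + \epsilon_t \\
\Delta^2 y_t &= \mu + \sum_{j=1}^{p-2} \phi^{**}_j L^j \Delta^2 y_t + \epsilon_t
\end{aligned}
\tag{24}
$$
where $\phi^{**}(L) = 1 - \sum_{j=1}^{p-2} \phi^{**}_j L^j$.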
3 Difference stationary and trend stationary series
Many economic time series exhibit a positive trend. One characterization of such series is
that they are random walks with drifts. Hence, they are difference-stationary as taking
the first difference results in a stationary process. In this case, the shocks have a persistent
effect. An alternative explanation is that the data are trend stationary. A trend stationary
process has a deterministic trend, which accounts for the sustained increase in the series
over time. In this case the shocks have a transitory effect as they do not continue through
time but enter at only 1 point in time.
$$ y_t = \mu + \beta t + u_t \qquad \text{Trend-stationary process} \tag{25} $$
De-trending (removing the deterministic trend) results in a stationary process:
$$ y_t - \beta t = \mu + u_t \tag{26} $$
Aside: We say that a trend stationary process is stationary despite its mean being a
function of time because the stochastic properties of the trend stationary process are
entirely determined by $u_t$, and $u_t$ is clearly stationary.
The nesting model for both the trend stationary and difference stationary models is:
$$ y_t = \mu + \phi y_{t-1} + \beta t + \epsilon_t, \qquad \epsilon_t \sim \mathrm{WN}\left(0, \sigma^2\right) \tag{27} $$
The models nested include:
$\mu \neq 0$; $|\phi| < 1$; $\beta \neq 0$: Deterministic trend with stationary AR(1) component
$\mu \neq 0$; $\phi = 1$; $\beta \neq 0$: Random Walk with drift and quadratic trend
$\mu \neq 0$; $\phi = 1$; $\beta = 0$: Random Walk with drift
$\mu \neq 0$; $\phi = 0$; $\beta \neq 0$: Deterministic trend
$\mu = 0$; $\phi = 1$; $\beta = 0$: Pure Random Walk
It is important to establish whether data are trend stationary or difference stationary for
2 reasons. First, the persistence of shocks is very different between the 2 cases. Secondly, if
the data are difference stationary caution needs to be applied when analyzing non-stationary
data. Without appropriate transformations, spurious regressions could result. We shall look
at this phenomenon next.
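3.1 Note on the quadratic trend in a unit root with deterministic trend model
If:
$$ y_t = \mu + y_{t-1} + \beta t + \epsilon_t \sim I(1) \tag{28} $$
then we know that the first difference will be I(0):
$$ \Delta y_t = \mu + \beta t + \epsilon_t \sim I(0) \tag{29} $$
i.e., a trend stationary process.
As $\Delta y_t \sim I(0)$ this implies the cumulation of the series (i.e. the integral) would be I(1):
$$ \sum_{i=1}^{t} [\Delta y_i] \sim I(1) \tag{30} $$
Therefore (taking $y_0 = 0$):
$$
\begin{aligned}
y_t &= \sum_{i=1}^{t} (\mu + \beta i + \epsilon_i) \\
&= \sum_{i=1}^{t} \mu + \sum_{i=1}^{t} (\beta i) + \sum_{i=1}^{t} \epsilon_i \\
&= \mu t + \beta \frac{t(t+1)}{2} + \sum_{i=1}^{t} \epsilon_i
\end{aligned}
\tag{31}
$$
where $\frac{t(t+1)}{2}$ results from the summation of an arithmetic progression. Hence, the process
has a quadratic trend in the level:
$$ y_t = \left(\mu + \frac{\beta}{2}\right) t + \frac{\beta}{2} t^2 + \sum_{i=1}^{t} \epsilon_i \tag{32} $$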
This is intuitive as a random walk with drift has a trend and a trend stationary process has
a trend. Hence, a model with both components will have a quadratic trend. This is unlikely
to occur in practice, except perhaps locally.
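The algebra behind the quadratic trend is easy to confirm numerically. This small sketch cumulates the deterministic part of the first difference, $\mu + \beta i$, and compares it with the closed form $(\mu + \beta/2)t + (\beta/2)t^2$; the values $\mu = 0.3$, $\beta = 0.05$ and the start value of zero are assumptions made for illustration.

```python
import numpy as np

mu, beta, T = 0.3, 0.05, 200             # assumed illustrative values
t = np.arange(1, T + 1)

# Cumulate the deterministic part of the first difference, mu + beta*i, from a zero start.
level = np.cumsum(mu + beta * t)

# Closed form: (mu + beta/2) t + (beta/2) t^2.
closed = (mu + beta / 2) * t + (beta / 2) * t**2

print(np.allclose(level, closed))
```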
4 Spurious regression
We are interested in investigating the relationship between dynamics and interdependence.
In other words, what impact does non-stationarity have when looking at the relationships
between variables? Non-stationary processes can lead to nonsense regressions, where
apparent correlations are found which are in fact false. The problem of nonsense regression
was first analysed by Yule in 1926.
Let us assume that we have 2 unrelated I(1) series, $y_t$ and $x_t$, where:
$$ y_t = y_{t-1} + u_t \tag{33} $$
$$ x_t = x_{t-1} + v_t \tag{34} $$
$$ E(u_t v_s) = 0 \ \ \forall\, t, s; \qquad E(u_t u_{t-k}) = E(v_t v_{t-k}) = 0 \ \ \forall\, k \neq 0 \tag{35} $$
$$ y_0 = x_0 = 0 $$
i.e. $x_t$ and $y_t$ are uncorrelated random walks.
We next specify the economic hypothesis. As an example let $y$ be the cumulative number
of murders in the UK and $x$ be the population of the UK. We are interested to see if
there is a relationship between the two variables:
$$ y_t = \beta_0 + \beta_1 x_t + \epsilon_t \tag{36} $$
where $\frac{\partial y_t}{\partial x_t} = \beta_1$. Under conventional assumptions we would calculate the t-statistic for
$\hat{\beta}_1$, denoted $t_{\beta_1}$, and test the null hypothesis $H_0: \beta_1 = 0$. Under the null hypothesis, the
conventional probability of the t-test of $H_0$ being significant is:
$$ P\left(\left|t_{\beta_1 = 0}\right| \ge 2.0 \mid H_0\right) = 0.05 \tag{37} $$
This states that under the null hypothesis that $y$ and $x$ are unrelated we would expect to
observe t-values greater than 2 in absolute value with a 5% probability. As the processes
are unrelated we should find $\hat{\beta}_1 = \widehat{\frac{\partial y_t}{\partial x_t}} \xrightarrow{p} 0$.
However, when the data are non-stationary there is a balance problem when the null
hypothesis is true. As $y_t \sim I(1)$ and we assume $\epsilon_t \sim I(0)$, equation (36) can be well
defined for non-zero $\beta_1$ as $x_t \sim I(1)$. If $\beta_1 = 0$, the error term $\epsilon_t$ must be I(1) and there is
a violation of the assumptions.
The asymptotic theory for the unit root case is very complicated. An alternative way of
examining the issues is through Monte Carlo simulation, where we simulate the process many
times and plot the distributions that result.
4.1 Monte Carlo
Generate $y_t$ and $x_t$ from (33) and (34) for $T = 100$ observations. The errors are drawn
using a random number generator for a standard normal distribution. The two series are
independently generated consistent with (35). Then estimate equation (36) and record the
estimated coefficient, standard error, t-statistic and $R^2$. This process is repeated $M = 10{,}000$
times, taking draws from the error distributions. This will give us 10,000 estimates
of:
the estimated coefficients $\hat{\beta}_{1,i}$, $i = 1, \ldots, M$;
the estimated coefficient standard errors $\hat{\sigma}_{\hat{\beta}_{1,i}}$, $i = 1, \ldots, M$;
the estimated t-statistics $t_{\hat{\beta}_{1,i}}$, $i = 1, \ldots, M$;
the frequency of rejection of $H_0: \beta_1 = 0$;
the sample correlation between $y_t$ and $x_t$.
Let us first look at the estimate of $\beta_1$. The mean estimate of $\beta_1$ is given by:
$$ \bar{\hat{\beta}}_1 = \frac{1}{M} \sum_{i=1}^{M} \hat{\beta}_{1,i} = 0.008 \tag{38} $$
with a standard error of 0.006. Hence, we cannot reject $H_0: E[\hat{\beta}_1] = 0$.
The frequency distribution of $\hat{\beta}_1$ is recorded in figure 1. The shape of the distribution is
reasonably normal but the sampling standard deviation denoted MCSD is large (MCSD=0.62),
i.e. the distribution is spread out, revealing that some of the estimates are large in absolute
value.
Note the difference between MCSE and MCSD. The MCSD is what Monte Carlo
reveals the correct value of the sampling standard deviation to be. The estimated standard
error (ESE, plotted in figure 4) is what the economist would report on average in any one
regression. The MCSE is the standard error of the Monte Carlo mean: for $M = 10{,}000$
replications, $\mathrm{MCSE} = \mathrm{MCSD}/\sqrt{M} = \mathrm{MCSD}/100$. This determines the accuracy of the
Monte Carlo estimate of $E[\hat{\beta}_1]$, i.e. the variability due to sampling.


The uncertainty is evident when looking at the distribution of the standard errors of the
coefficient estimates. Figure 2 plots the distribution of $\hat{\sigma}_{\hat{\beta}_{1,i}} = \left(\sum_{t=1}^{T} x_{t,i}^2\right)^{-1/2} \hat{\sigma}_{\epsilon,i}$. There
is a long right hand tail suggesting that there are some very large estimates of the standard
error. This impacts on the t-test statistics, recorded in figure 3. This figure also records the
t-distribution for the same test on stationary data in the top panel for comparison. The shape
is as we would expect for a t-distribution, but the draws are very spread out. The statistic
calculated in each replication doesn't have the same variance as under the stationary case.
Instead, very large values of the t-statistic are likely. For:
$$ P\left(\{\text{Reject } H_0: \beta_1 = 0\} \mid H_0\right) = 0.05 \tag{39} $$
instead of the critical values being $\pm 2$, the values are $\pm 14.8$:
$$ P\left(\left|t_{\beta_1 = 0}\right| \ge 14.8 \mid H_0\right) \simeq 0.05 \tag{40} $$
This is called a nonsense regression. It is easy to observe a very large t-statistic suggesting
that the 2 series are highly correlated, when in fact they are unrelated.
The nonsense regression phenomenon does not disappear as $T$ increases. Figure 4
records the estimated coefficients and standard errors recursively for $T = 20, 21, \ldots, 100$.
The coefficient estimates are not biased but the standard deviation of the estimates is very
large and does not decrease with $T$. The rejection frequency (which should be 5% under
the null) increases with the sample size to 76%.
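The experiment can be sketched in a few lines. The version below is a simplified reconstruction rather than the code used to produce the figures: it assumes $M = 2{,}000$ replications (fewer than the 10,000 in the text, for speed), a fixed seed, and a hypothetical helper `spurious_rejection_rate` that runs the OLS regression (36) with an intercept in vectorised form. A white noise comparison is included so the two rejection frequencies can be set side by side.

```python
import numpy as np

def spurious_rejection_rate(M=2000, T=100, integrate=True, seed=0):
    """Share of replications with |t| > 2 for H0: beta_1 = 0, y and x independent."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(M, T))
    v = rng.normal(size=(M, T))
    # integrate=True cumulates the shocks into random walks as in (33)-(34);
    # integrate=False leaves both series as white noise for comparison.
    y = np.cumsum(u, axis=1) if integrate else u
    x = np.cumsum(v, axis=1) if integrate else v

    # OLS of y on a constant and x, one regression per row.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = np.sum(xc**2, axis=1)
    b1 = np.sum(xc * yc, axis=1) / sxx
    resid = yc - b1[:, None] * xc
    s2 = np.sum(resid**2, axis=1) / (T - 2)      # residual variance
    t_stat = b1 / np.sqrt(s2 / sxx)
    return np.mean(np.abs(t_stat) > 2.0)

rate_rw = spurious_rejection_rate(integrate=True)
rate_wn = spurious_rejection_rate(integrate=False)
print(f"random walks: {rate_rw:.2f}, white noise: {rate_wn:.2f}")
```

With stationary data the rejection frequency should be close to the nominal 5%, whereas for the random walks it should be far higher, in line with the over-rejection described above.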
Figure 1: Frequency distribution of $\hat{\beta}_1$
Figure 2: Frequency distribution of $\hat{\sigma}_{\hat{\beta}_1}$ (random walks)
For non-stationary data, $\hat{\beta}_1$ does not converge to 0, but instead converges to a random
variable.
$$ \hat{\beta}_1 = \left[T^{-2} \sum_{t=1}^{T} (x_t - \bar{x})^2\right]^{-1} T^{-2} \sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y}) \tag{41} $$
The numerator and denominator converge weakly to functionals of Brownian motion. See
Hamilton (1994) for further reading.
The reason for the over-rejection of the null is that when the data are non-stationary,
two series can be highly correlated merely because they wander in the same direction.
Figure 3: Frequency distribution of the t-test of $H_0: \beta_1 = 0$ (top panel: white noise; bottom panel: random walks)
Figure 4: Recursives for nonsense regression simulation (top panel: $\hat{\beta}_1$ with $\pm 2$ESE and $\pm 2$MCSD bands; bottom panel: rejection frequency)
Figure 5 records the $R^2$s for a pair of I(0) and I(1) processes. When both variables are I(1)
the distribution is like a semi-ellipse, with excess frequency at both ends of the distribution.
Hence, values of $R^2$ well away from 0 are more likely, resulting in over-rejection of the
null.
As $x_t \sim I(1)$, the sample variance is a function of $T$. Hence, the regression is imbalanced:
under the null $\epsilon_t \sim I(1)$ and the sample variance will also be a function of $T$.
$\hat{\beta}_0$ and $\hat{\beta}_1$ minimise $\hat{\sigma}_\epsilon^2$, but observe that $\hat{\sigma}_{\hat{\beta}_1} = \hat{\sigma}_\epsilon \left[\sum_{t=1}^{T} (x_t - \bar{x})^2\right]^{-\frac{1}{2}}$ which is a function
of $T$.
Figure 5: Distribution of $R^2$ for unrelated I(0) and I(1) series
Therefore the t-test, $t_{\hat{\beta}_1} = \frac{\hat{\beta}_1}{\hat{\sigma}_{\hat{\beta}_1}}$, diverges as $T \to \infty$.
The nonsense regression phenomenon has serious implications for modelling economic
time-series. We need methods to test for non-stationary data, and then we need to think
about how to model non-stationary data in order to avoid this nonsense regression problem.
5 Testing for unit roots: Dickey-Fuller tests
We will first consider the simplest model and then move on to more complex models.
Consider the AR(1) with no intercept:
$$ y_t = \phi y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathrm{WN}\left(0, \sigma^2\right) \tag{42} $$
We can rewrite this model as:
$$ \Delta y_t = \pi y_{t-1} + \epsilon_t \tag{43} $$
where $\pi = \phi - 1$.
If the process has a unit root, $\phi = 1$ and $\pi = 0$. Hence, the test for a unit root would be
a t-test of $H_0: \pi = 0$.
5.1 Simulating the Dickey-Fuller distribution
The t-test of $H_0: \pi = 0$ doesn't have a standard distribution. Instead the critical values are
simulated.
1. Generate the data of a given sample size $T$ according to a specified DGP. For the pure
random walk generate:
$$ \Delta y_t = \epsilon_t, \qquad \epsilon_t \sim \mathrm{NID}(0, 1) \tag{44} $$
with $y_0 = 0$.
The values of $\epsilon_t$ for $t = 1, \ldots, T$ are taken from a random number generator with a
distribution as specified (here a standard normal).
Repeat the process $M$ times to generate $M$ samples of size $T$.
2. Estimate a regression model for each sample generated in step 1.
The choice of regression model is important: it should match the DGP. So in this
example the model estimated would not include a constant:
$$ \Delta y_t = \pi y_{t-1} + \epsilon_t \tag{45} $$
For each replication $\hat{\pi}$ and $t_{\hat{\pi}}$ are recorded. Although $\pi = 0$, there will be a
distribution of $\hat{\pi}$ due to random sampling. The t-statistic is plotted and we can calculate the
critical values from this distribution when the DGP is $\Delta y_t = \epsilon_t$ and the maintained
regression is $\Delta y_t = \pi y_{t-1} + \epsilon_t$. The usual choice of significance level is used (e.g.
5% or 1% are most common).
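The two steps above can be sketched as follows. This is an illustrative reconstruction, not the code behind the tabulated critical values: it assumes $T = 100$, $M = 10{,}000$, a fixed seed and a hypothetical helper `simulate_df_tstats`, and it estimates the no-constant regression (45) by OLS in vectorised form.

```python
import numpy as np

def simulate_df_tstats(M=10000, T=100, seed=0):
    """t-statistics for H0: pi = 0 in Delta y_t = pi*y_{t-1} + eps_t under a pure random walk."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=(M, T))                        # step 1: eps_t ~ NID(0, 1)
    y = np.cumsum(eps, axis=1)                           # y_1, ..., y_T with y_0 = 0
    y_lag = np.hstack((np.zeros((M, 1)), y[:, :-1]))     # y_0, ..., y_{T-1}
    dy = eps                                             # under the DGP, Delta y_t = eps_t

    # Step 2: OLS without a constant, one regression per replication.
    syy = np.sum(y_lag**2, axis=1)
    pi_hat = np.sum(y_lag * dy, axis=1) / syy
    resid = dy - pi_hat[:, None] * y_lag
    s2 = np.sum(resid**2, axis=1) / (T - 1)              # residual variance
    return pi_hat / np.sqrt(s2 / syy)

t_stats = simulate_df_tstats()
crit = np.quantile(t_stats, [0.01, 0.05, 0.10])          # lower-tail critical values
print(crit)
```

The simulated 5% quantile should come out well below the standard normal value of -1.64, which is why conventional critical values cannot be used for this test.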
Note that the critical values will depend on the choice of null and alternative hypothesis.
The null hypothesis is of a unit root, but the alternative could be a 1 or 2-sided test. If
$H_a: \pi \neq 0$, we would also be testing for an explosive process as $\pi > 0$ implies $\phi > 1$.
This is not chosen in general as an explosive process is unstable and is unlikely to occur
in economic data. Hence, we can maximize power by computing a 1-sided test where
$H_a: \pi < 0$. This tests the null of a unit root against a stationary process. The critical values
will be negative and when sample values are more negative than the critical values we have
a rejection of the null hypothesis in favour of a stationary process.
The null and maintained regressions that we have looked at exclude an intercept. If we
wish to extend the model to test for a random walk with drift we would have the maintained
regression:
$$ \Delta y_t = \mu + \pi y_{t-1} + \epsilon_t \tag{46} $$
where the null hypothesis is $H_0: \pi = 0$, i.e. a unit root, and the alternative hypothesis is
$H_a: \pi < 0$.
Under the alternative, the process is stationary and the process will have a non-zero
mean given by $E[y_t] = \frac{\mu}{1 - \phi}$. This test will have a different distribution to the test for a pure
random walk. This is because the test statistics depend on deterministic terms. Again, test
statistics are obtained by simulation, but depend on the value of $\mu$ in the DGP. We think
of $\mu$ as a nuisance parameter for inference on $\pi$ as the critical values for $\pi$ depend on the
generally unknown value of $\mu$.
An alternative to the t-test is to compute a joint F-test of significance of both $\mu$ and $\pi$.
In this case the joint null hypothesis is $H_0: \mu = 0;\ \pi = 0$. Under the null the series is a
pure random walk without drift. Under the alternative there are 3 possibilities:
$H_a: \mu \neq 0;\ \pi = 0 \Rightarrow$ Random walk with drift
$H_a: \mu \neq 0;\ \pi \neq 0 \Rightarrow$ Stationary AR process with non-zero mean
$H_a: \mu = 0;\ \pi \neq 0 \Rightarrow$ Stationary AR process with zero mean
We would not anticipate a random walk with drift, as if there was a noticeable drift in the
series we would include a trend in the process (see next section). The F-test is calculated as
standard but different distributions apply and these are simulated. A drawback of the joint
test is that it is 2-sided and therefore does not maximise power in the likely direction of
departure from the null hypothesis.
We can generalize the test further to include a trend, as we are concerned with testing
between a trend stationary and difference stationary process. In the models we have looked
at there is no mechanism for generating the trend under the alternative of stationarity. Hence,
the maintained regression would be:
$$ \Delta y_t = \mu + \pi y_{t-1} + \beta t + \epsilon_t \tag{47} $$
where the null hypothesis is $H_0: \pi = 0$, i.e. a unit root, and the alternative hypothesis is
$H_a: \pi < 0$.
Again, a different distribution is used to obtain the critical values and is simulated by
computer packages.
If a joint F-test of significance was computed the null hypothesis would be $H_0: \pi = 0;\ \beta = 0$,
i.e. a random walk with drift and no deterministic trend. Under the alternative
there are 3 possibilities:
$H_a: \pi = 0;\ \beta \neq 0 \Rightarrow$ Random walk with drift and deterministic trend (i.e. a
quadratic trend).
$H_a: \pi \neq 0;\ \beta \neq 0 \Rightarrow$ Stationary AR process with deterministic trend.
$H_a: \pi \neq 0;\ \beta = 0 \Rightarrow$ Stationary AR process with no deterministic trend.
A quadratic trend is unlikely, and would be observable in the data, ruling out the first
hypothesis. The most likely alternative is a trend stationary process (as a trend was observed
in the data). If the third hypothesis were true, no trend would be observed and you would
apply the DF test to a model with just an intercept and not a trend.
Tests often have low power when the null and the alternative are close (i.e. a stationary
AR process with a very high persistence parameter as opposed to a unit root). Various
alternative tests have been proposed in the literature to try and improve on the power, although
none are uniformly more powerful. This suggests that caution should be used against too
rigid an application of the unit root test. In practice there is considerable uncertainty in
determining the exact nature of the DGP. This is further complicated when there are structural
breaks in the data. Hence, further evidence is often sought before being firm on the null and
alternative hypothesis at the margin.
The distributions of the t-statistic vary depending on whether there is no intercept, an
intercept, or an intercept and trend in the model.
Summary
Three cases to consider:

Test         Model                                                              Hypothesis
$DF$         $\Delta y_t = \pi y_{t-1} + \epsilon_t$                            $H_0: \pi = 0$
$DF_\mu$     $\Delta y_t = \mu + \pi y_{t-1} + \epsilon_t$                      $H_0: \pi = 0$ or $H_0: \mu = 0;\ \pi = 0$
$DF_\tau$    $\Delta y_t = \mu + \pi y_{t-1} + \beta t + \epsilon_t$            $H_0: \pi = 0$ or $H_0: \pi = 0;\ \beta = 0$

where the critical values depend on the specification of the null and alternative hypotheses.
There are different distributions of test statistic depending on the deterministic terms. A
one-sided test is usually used to maximise power:
$$ H_0: \pi = 0 \quad \text{versus} \quad H_a: \pi < 0. \tag{48} $$
For examples of the critical values:

Distribution                        2.5%    5%      10%     50%    90%     95%     97.5%
N(0, 1)                             -1.96   -1.64   -1.28   0      1.28    1.64    1.96
$DF_\mu$ (intercept)                -3.12   -2.86   -2.57          -0.44   -0.07   0.23
$DF_\tau$ (intercept and trend)     -3.66   -3.41   -3.12          -1.25   -0.94   -0.66

Figure 6 records the distribution of the DF tests with no intercept and intercept against
a standard normal. The distributions are more negative. Also observe that for a given
significance level, the critical values are ordered $DF_\tau < DF_\mu < DF$ (intercept and trend,
intercept only, no deterministic terms).
Figure 6: Frequency distribution of DF tests with N(0,1) (top panel: no deterministic terms; bottom panel: intercept)
6 Augmented Dickey-Fuller test
So far we have considered an AR(1) process. However, this is a simple model that may not
characterize actual economic data. A simple generalization is the AR(p) model:
$$ y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t \tag{49} $$
If an AR(1) model were fitted, say:
$$ y_t = \mu + \phi_1 y_{t-1} + v_t \tag{50} $$
then:
$$ v_t = \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \epsilon_t \tag{51} $$
and the autocorrelations of $v_t$ and $v_{t-k}$ will be non-zero. The residuals will be
autocorrelated and we fail the assumptions of the classical regression model.
The strategy for selecting the lag of an AR process is to start from a long lag and test
downwards. This is called a general-to-specific strategy. You would specify a long lag
and test the significance of the longest lag and compute the diagnostic tests. If the longest
lag is insignificant, and the model passes the diagnostic tests you would delete the lag and
re-estimate the model with 1 fewer lag. Again you would test for the significance of the
longest lag and check the diagnostic tests. You would repeat the procedure until either the
longest lag was significant or deleting the lag led to a failure of a diagnostic test.
To compute the Augmented Dickey-Fuller test, consider an AR(2) process:
$$ y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t \tag{52} $$
We can rewrite this as:
$$
\begin{aligned}
y_t - y_{t-1} &= \mu + \phi_1 y_{t-1} - y_{t-1} + [\phi_2 y_{t-1} - \phi_2 y_{t-1}] + \phi_2 y_{t-2} + \epsilon_t \\
\Delta y_t &= \mu + (\phi_1 + \phi_2 - 1)\, y_{t-1} - \phi_2 \Delta y_{t-1} + \epsilon_t \\
\Delta y_t &= \mu + \pi y_{t-1} + \delta \Delta y_{t-1} + \epsilon_t
\end{aligned}
\tag{53}
$$
where:
$$
\begin{aligned}
\pi &= \phi_1 + \phi_2 - 1 \\
\delta &= -\phi_2
\end{aligned}
\tag{54}
$$
The test statistic is as before a t-test on $\pi$, where $H_0: \pi = 0$ is a unit root and $H_a: \pi < 0$ is
a stationary process. This is called an Augmented Dickey-Fuller test as the test is augmented
by $\Delta y_{t-1}$ to mop up any residual autocorrelation and pre-whiten the residuals.
The distributions of the test statistics are as before. They depend on whether an intercept
and trend is included in the model but they don't depend on the augmentation of the lagged
differences.
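We can generalize the model to an AR(3) process:
$$ y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \epsilon_t \tag{55} $$
We can rewrite this as:
$$
\begin{aligned}
y_t - y_{t-1} &= \mu + \phi_1 y_{t-1} - y_{t-1} + [\phi_2 y_{t-1} - \phi_2 y_{t-1}] + [\phi_3 y_{t-1} - \phi_3 y_{t-1}] \\
&\quad + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \epsilon_t \\
\Delta y_t &= \mu + (\phi_1 + \phi_2 + \phi_3 - 1)\, y_{t-1} - \phi_2 y_{t-1} - \phi_3 y_{t-1} \\
&\quad + \phi_2 y_{t-2} + \phi_3 y_{t-3} + [\phi_3 y_{t-2} - \phi_3 y_{t-2}] + \epsilon_t \\
&= \mu + (\phi_1 + \phi_2 + \phi_3 - 1)\, y_{t-1} - (\phi_2 + \phi_3)\, \Delta y_{t-1} - \phi_3 \Delta y_{t-2} + \epsilon_t \\
\Delta y_t &= \mu + \pi y_{t-1} + \delta_1 \Delta y_{t-1} + \delta_2 \Delta y_{t-2} + \epsilon_t
\end{aligned}
\tag{56}
$$
where:
$$
\begin{aligned}
\pi &= \phi_1 + \phi_2 + \phi_3 - 1 \\
\delta_1 &= -(\phi_2 + \phi_3) \\
\delta_2 &= -\phi_3
\end{aligned}
\tag{57}
$$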
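The general AR(p) model with intercept and trend is:
$$
\begin{aligned}
y_t &= \mu + \sum_{j=1}^{p} \phi_j y_{t-j} + \beta t + \epsilon_t \\
\Delta y_t &= \mu + \pi y_{t-1} + \sum_{j=1}^{p-1} \delta_j \Delta y_{t-j} + \beta t + \epsilon_t
\end{aligned}
\tag{58}
$$
where
$$
\begin{aligned}
\pi &= \phi_1 + \phi_2 + \cdots + \phi_p - 1, \\
\delta_1 &= -(\phi_2 + \cdots + \phi_p) \\
&\;\;\vdots \\
\delta_{p-1} &= -\phi_p
\end{aligned}
\tag{59}
$$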
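The mapping from the AR(p) coefficients to the ADF parameters is mechanical and can be written as a small function. The sketch below uses a hypothetical helper `ar_to_adf` and illustrative AR(3) coefficients, then checks on an arbitrary sequence that the levels form and the ADF form of the right hand side agree term by term (the identity is purely algebraic, so any numbers will do).

```python
import numpy as np

def ar_to_adf(phi):
    """Map AR(p) coefficients phi_1..phi_p to (pi, [delta_1..delta_{p-1}])."""
    phi = np.asarray(phi, dtype=float)
    pi = phi.sum() - 1.0
    # delta_j = -(phi_{j+1} + ... + phi_p): reversed cumulative sums, dropping the full sum.
    delta = -np.cumsum(phi[::-1])[::-1][1:]
    return pi, delta

phi = np.array([0.5, 0.3, 0.1])          # assumed illustrative AR(3) coefficients
mu = 0.7                                 # assumed intercept
pi, delta = ar_to_adf(phi)
print(pi, delta)                         # pi = sum(phi) - 1, delta = (-(phi_2 + phi_3), -phi_3)

# Check the identity on an arbitrary sequence.
rng = np.random.default_rng(5)
y = rng.normal(size=50)
t = np.arange(3, 50)
lhs = mu + phi[0] * y[t - 1] + phi[1] * y[t - 2] + phi[2] * y[t - 3] - y[t - 1]
rhs = (mu + pi * y[t - 1]
       + delta[0] * (y[t - 1] - y[t - 2])
       + delta[1] * (y[t - 2] - y[t - 3]))
print(np.allclose(lhs, rhs))
```

A sum of the $\phi_j$ equal to one gives $\pi = 0$, which is the unit root null in the ADF regression.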
6.1 A framework for unit root testing
1. Plot the data. Does it look stationary or non-stationary? Is there a trend evident? Is
the mean non-zero?
2. Plot the ACF and PACF. How persistent is the data? How many lags look like being
required to ensure the residual is white noise?
3. Start with a long lag and estimate an AR(p) model. Check the diagnostic tests.
Reduce the model by eliminating the longest lag if it is statistically insignificant, whilst
ensuring the residuals are not autocorrelated. Stop when the longest lag is significant,
or deletion of the longest lag results in an autocorrelated error. This will determine
the lag order for your AR(p) model.
4. Transform the model to the maintained Dickey-Fuller regression, i.e. the model in
differences with the lagged level.
5. Decide whether to include an intercept and trend based on your graphical analysis.
If the data exhibit a trend, you would test a random walk with drift against a trend
stationary process. Hence, include an intercept and trend in the model. If there is no
trend in the data but the mean is non-zero, include an intercept.
6. Compute the ADF statistic and compare to the Dickey-Fuller distribution. Recall that
the null hypothesis is non-stationarity against the alternative of stationarity.
7. If you cannot reject the null hypothesis, you would conclude that the data are likely to
be non-stationary. The data could be integrated of order 1 or higher. To test whether
the data are integrated of a higher order, apply the same test to the differences of the
process, i.e.
$$
\begin{aligned}
\Delta y_t &= \mu + \sum_{j=1}^{p} \phi_j \Delta y_{t-j} + \beta t + \epsilon_t \\
\Delta^2 y_t &= \mu + \pi \Delta y_{t-1} + \sum_{j=1}^{p-1} \delta_j \Delta^2 y_{t-j} + \beta t + \epsilon_t
\end{aligned}
\tag{60}
$$
The null hypothesis, $H_0: \pi = 0$, is for $\Delta y_t$ to have a unit root, against the alternative
$H_a: \pi < 0$, such that $\Delta y_t$ is stationary. Hence, under the null, $y_t$ is I(2) and
under the alternative $y_t$ is I(1).
8. This procedure can be applied recursively. If you find the process to be non-stationary,
test for I(1) versus I(2), etc.
6.2 Structural Breaks
One problem with testing for unit roots is that a structural break can lead to incorrect
inference regarding unit roots. In this case an I(0) process with break is difficult to distinguish
from an I(1) process.
Consider an artificial process in which we generate data given by:
$$
y_t =
\begin{cases}
0.5\, y_{t-1} + \epsilon_t & \text{for } t = 1, \ldots, 50 \\
4 + 0.5\, y_{t-1} + \epsilon_t & \text{for } t = 51, \ldots, 100
\end{cases}
\tag{61}
$$
where $\epsilon_t \sim \mathrm{NID}(0, 1)$ for $t = 1, \ldots, T$.
The process is a stationary AR process but with a mean shift at time 50.
If we fit an AR(1) model to the full sample we find:
$$ \hat{y}_t = \underset{(0.03)}{0.96}\, y_{t-1} + \underset{(0.18)}{0.26} $$
There is a coefficient of 0.96 on the lagged dependent variable. This is a near unit root, and
a statistical test would not reject the null hypothesis of a unit root. This is clearly incorrect.
The reason for such a high coefficient is that the process needs a mechanism for getting up
to the new mean and the only way to do this is to put a near unit coefficient on the lagged
dependent variable. Figure 7 shows the properties of the estimated model. It is clearly
misspecified and incorrect inference would result.
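The effect of the break on the estimated AR(1) coefficient can be reproduced approximately. The sketch below generates data from (61) under the assumption $y_0 = 0$ (the start value is not stated in the notes), fits an AR(1) with intercept by OLS using `numpy.linalg.lstsq`, and averages the estimated slope over 200 seeds. The helper names `simulate_break` and `fit_ar1` are hypothetical, and individual draws will differ from the 0.96 reported above.

```python
import numpy as np

def simulate_break(T=100, break_at=50, seed=0):
    """Generate y_0, ..., y_T from (61); y_0 = 0 is an assumed start value."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=T)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        intercept = 0.0 if t <= break_at else 4.0     # mean shift after t = 50
        y[t] = intercept + 0.5 * y[t - 1] + eps[t - 1]
    return y

def fit_ar1(y):
    """OLS of y_t on a constant and y_{t-1}; returns (intercept, slope)."""
    X = np.column_stack((np.ones(len(y) - 1), y[:-1]))
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]

# Full-sample fit (ignoring the break) versus a fit on the pre-break regime only.
full = np.array([fit_ar1(simulate_break(seed=s))[1] for s in range(200)])
pre = np.array([fit_ar1(simulate_break(seed=s)[:51])[1] for s in range(200)])
print(full.mean(), pre.mean())
```

The full-sample slope should sit close to one, while the pre-break slope should be near the true value of 0.5, which is the sense in which an unmodelled break mimics a unit root.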
Figure 7: Properties of estimated AR(1) model (panels: actual and fitted values over time; actual against fitted; scaled residuals; residual ACF and PACF)
7 Conclusion
Many economic time series have unit roots. A general rule of thumb is that often nominal
variables are I(2), real variables are I(1), and growth rates of real variables are I(0). We can
test for unit roots and next we shall consider how to model them.
8 References
Banerjee, A., Dolado, J. J., Galbraith, J. W., and Hendry, D. F. (1993). Cointegration, Error
Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University
Press.
Hendry, D. F., and Doornik, J. A. (2001). Empirical Econometric Modelling using PcGive:
Volume I, 3rd edn. London: Timberlake Consultants Press.
Patterson, K. (2000). An Introduction to Applied Econometrics. A Time Series Approach.
UK: Palgrave Macmillan.