1 Non-Stationary Time Series

Consider the AR(1) model
$$y_t = \frac{1}{r}\,y_{t-1} + \epsilon_t \qquad (1.2)$$
whose characteristic polynomial $(1 - \frac{1}{r}z)$ has root $z = r$. The series behaves differently according to whether $r > 1$, $r = 1$ or $r < 1$. Equation (1.2) has solution:
$$y_t = \epsilon_t + \frac{1}{r}\,\epsilon_{t-1} + \dots + \frac{1}{r^{t-1}}\,\epsilon_1 + \frac{1}{r^t}\,y_0 \qquad (1.3)$$
where $y_0$ is the value of $y_t$ at $t = 0$.
It is clear that when $r > 1$ the influence of the initial term $\frac{1}{r^t}\,y_0$ and the impulses $\frac{1}{r^i}\,\epsilon_{t-i}$ die out as they move further into the past.
For $r > 1$, therefore, we see that the present is more important than the past. For these values of $r$ the series is stationary and its behaviour will consist of oscillation around the mean value 0.
When $r = 1$ past shocks and the initial value have the same weight, the past being as important as the present.
And for $r < 1$ the weights on past terms increase with $t$: the past is more important than the present. Here the series rapidly diverges towards $+\infty$ or $-\infty$. This behaviour is termed explosive and is of course counter-intuitive in almost all situations. For that reason we safely assume that in time series models of real-life data all roots are either on or outside the unit circle.
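The three regimes are easy to see in simulation. The following minimal Python sketch (assuming only numpy; the sample size and the values of $r$ are arbitrary illustrative choices) iterates the recursion (1.2) for a stationary, a unit-root and an explosive case:

```python
# Simulate y_t = (1/r) * y_{t-1} + eps_t for the three regimes discussed above.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(r, n=100, y0=1.0):
    """Simulate y_t = (1/r) y_{t-1} + eps_t with IID N(0,1) shocks."""
    eps = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = y0
    for t in range(1, n):
        y[t] = y[t - 1] / r + eps[t]
    return y

for r in (2.0, 1.0, 0.9):   # stationary, unit root, explosive
    y = simulate_ar1(r)
    print(f"r = {r}: final value {y[-1]:.1f}, max |y| = {np.abs(y).max():.1f}")
```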
We have seen in the last chapter that differencing certain non-stationary time series can produce stationary series. Consider the Trend-Stationary model
$$y_t = \alpha + \beta t + \epsilon_t. \qquad (1.4)$$
If we remove the linear term $\alpha + \beta t$ from this series then the series $z_t = y_t - \alpha - \beta t$ with which we are left is clearly a stationary series:
$$z_t = \epsilon_t. \qquad (1.5)$$
Of course differencing this series would also leave a stationary series. This can be seen by writing
$$y_t = \alpha + \beta t + \epsilon_t$$
$$y_{t-1} = \alpha + \beta(t-1) + \epsilon_{t-1} \qquad (1.6)$$
so that subtracting gives $\Delta y_t = \beta + \epsilon_t - \epsilon_{t-1}$, which is stationary.
Nomenclature
Drift: Constant terms included in time series models, such as $\alpha$ in (1.4), are referred to as Drift terms.
Consider now the random walk with drift
$$y_t = \alpha + y_{t-1} + e_t. \qquad (1.8)$$
Repeated substitution gives
$$y_t = y_0 + \alpha t + \sum_{j=1}^{t} e_j. \qquad (1.9)$$
If we remove a linear trend $y_0 + \alpha t$ from this series we are still left with a non-stationary series $\sum_{j=1}^{t} e_j$.
Models such as (1.8) which require differencing to achieve stationarity (and cannot be made stationary by just removing a linear trend) are called Difference-Stationary series, whereas models which are stationary upon removal of a linear trend, e.g. (1.4), are called Trend-Stationary.
A polynomial trend of degree $d-1$ is removed entirely by differencing $d$ times, because
$$\Delta^d(\beta_0 + \beta_1 t + \dots + \beta_{d-1} t^{d-1}) = 0,$$
so that for such a model
$$\Delta^d y_t = \theta(L)\epsilon_t. \qquad (1.11)$$
Both the Trend-Stationary and Difference-Stationary models allow for the inclusion of a polynomial trend, but in the Difference-Stationary case the deviations from the polynomial trend still require differencing to achieve stationarity.
Choosing not to difference a series when in fact differencing is required can lead to serious consequences such as spurious regression (c.f. Section 3), which is one consequence of non-stationarity. As was seen from equation (1.9), removing a linear trend does not solve the non-stationarity problem if the series is actually difference stationary.
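The distinction matters in practice. A small sketch (again assuming numpy; the parameter values are illustrative) shows that OLS-detrending the random walk with drift (1.8) leaves residuals whose variability is unstable, while differencing produces a well-behaved series:

```python
# Contrast detrending and differencing for y_t = alpha + y_{t-1} + e_t:
# detrending leaves the accumulated shocks, differencing yields alpha + e_t.
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 500, 0.5
e = rng.standard_normal(n)
y = np.cumsum(alpha + e)                    # difference-stationary series

t = np.arange(1, n + 1)
trend = np.polyval(np.polyfit(t, y, 1), t)  # fitted linear trend
detrended = y - trend                       # still contains the sum of e_j
differenced = np.diff(y)                    # alpha + e_t, stationary

print("detrended half-sample variances:",
      detrended[: n // 2].var(), detrended[n // 2:].var())
print("differenced half-sample variances:",
      differenced[: n // 2].var(), differenced[n // 2:].var())
```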
In the next section we will examine how to determine whether a series is 'Difference-Stationary'.
2 Unit Root Tests
We have seen the importance of the presence of unit roots in a time series. In
practice how should one decide whether a series contains a unit root or not?
One could examine plots of the time series, looking for wandering behaviour that would indicate non-stationarity. Alternatively one could look at the sample auto-correlation function (ACF) of the original series and of the differenced series; if the auto-correlations do not die out quickly then this would indicate non-stationarity.
There are, however, problems with relying on graphical methods; the human eye can deceive. Formal tests for unit roots have been developed and we will now look at two of these tests in detail: the Dickey-Fuller and Augmented Dickey-Fuller tests. Formulating a set of hypotheses to test is our first consideration.
The way this test has been formulated indicates that we are choosing a null
hypothesis of a unit-root with stationary alternatives. So we accept a unit-root
unless there is significant evidence that the process is stationary. We could have
decided to have stationarity as the null.
The reason we choose the hypotheses to have a unit-root null is because of the relative importance of the two errors in this testing procedure. If we decide the series is stationary when in fact it contains a unit root then any forecast intervals we derive will be too narrow and we will be over-confident of our forecasts. If however we conclude the series possesses a unit root when in fact it is stationary then we would difference a stationary series. The consequences of that are not so serious: we would produce over-conservative forecast intervals.
In this Section we examine the Dickey-Fuller (DF) approach to testing for a unit
root.
The simplest example of the procedure is in the AR(1) model with no drift or time trend term:
$$y_t = \rho y_{t-1} + e_t. \qquad (2.3)$$
We assume here that the $e_t$ terms are IID white noise and we are interested in testing the hypotheses:
$$H_0: \rho = 1 \quad \text{vs} \quad H_A: \rho < 1.$$
Subtracting $y_{t-1}$ from both sides of (2.3) gives the reparameterisation
$$\Delta y_t = \gamma y_{t-1} + e_t, \qquad (2.4)$$
where $\gamma = \rho - 1$, so that the hypotheses become
$$H_0: \gamma = 0 \quad \text{vs} \quad H_A: \gamma < 0.$$
Considering (2.4), we see that we can test this hypothesis by regressing $\Delta y_t$ on $y_{t-1}$ and computing the standard least squares t-statistic for testing that the coefficient $\gamma$ equals 0.
This test statistic, which we will call $\tau$, is produced automatically in the computer output obtained from most statistical packages by running a regression for equation (2.4).
There is one important thing to note, however. If the true process is (2.3) with $\rho = 1$ then, because of non-stationarity, this "t"-test statistic does not, in fact, follow the standard t-distribution. The asymptotic theory of this model has been developed using Brownian motion techniques. Dickey and Fuller have used Monte-Carlo simulation to compute a set of critical values for this test and for other variations on this model. We present some of these critical values in Table 2.4.
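As an illustration, the $\tau$ statistic can be computed directly from the regression (2.4). The sketch below (assuming numpy and statsmodels; the simulated sample is illustrative) does this for a pure random walk; the $-1.95$ quoted in the final line is the usual large-sample 5% Dickey-Fuller critical value for $\tau$:

```python
# Regress Delta y_t on y_{t-1} with no constant, as in (2.4), and read off
# the ordinary least squares t-statistic; compare with the DF table, not t.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(250))   # a pure random walk, rho = 1

dy, ylag = np.diff(y), y[:-1]
fit = sm.OLS(dy, ylag).fit()              # no drift, no trend
tau = fit.tvalues[0]
print(f"tau = {tau:.2f}; reject the unit root at 5% if tau < -1.95 (DF table)")
```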
The next version of the test allows a drift term in the model:
$$y_t = \rho y_{t-1} + \alpha + e_t \qquad (2.5)$$
or the reparameterisation
$$\Delta y_t = \gamma y_{t-1} + \alpha + e_t. \qquad (2.6)$$
The hypotheses are again
$$H_0: \rho = 1 \quad \text{vs} \quad H_A: \rho < 1$$
or equivalently
$$H_0: \gamma = 0 \quad \text{vs} \quad H_A: \gamma < 0.$$
The test statistic in this test, $\tau_\mu$, is again the standard least squares t-statistic obtained by running a regression for equation (2.6).
If the true data generating process has $\alpha = 0$, so that the real process is actually (2.3), and if $\rho = 1$, then this test statistic, $\tau_\mu$, follows a nonstandard distribution. Critical values for this distribution, which is different from the one for the $\tau$ statistic, have also been produced by Dickey and Fuller (c.f. Table 2.4). If however the true process contains a unit root but a non-zero drift term ((2.5) with $\rho = 1$), then $\tau_\mu$ follows the standard normal distribution.
Finally, the model may include both a drift and a deterministic time trend:
$$y_t = \rho y_{t-1} + \alpha + \beta t + e_t \qquad (2.7)$$
with reparameterisation
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + e_t. \qquad (2.8)$$
Here the Dickey-Fuller test statistic, $\tau_\tau$, is again the standard least squares t-statistic obtained by running a regression for equation (2.8).
Again, if the true data process is actually (2.3) with $\rho = 1$ then this test statistic, $\tau_\tau$, also follows a nonstandard distribution. Critical values have again been produced by Dickey and Fuller (c.f. Table 2.4). If the true process contains a unit root but a non-zero drift term ((2.5) with $\rho = 1$), then $\tau_\tau$ follows a nonstandard distribution. Lastly, if the true process contains a unit root, a non-zero drift term and a non-zero trend term ((2.7) with $\rho = 1$), then $\tau_\tau$ follows a standard normal distribution.
As well as using these three $\tau$-statistics to test the hypothesis of a unit root, it is also possible to test some joint hypotheses for the presence of an intercept, a time trend and a unit root. These joint tests use test statistics which are calculated as standard F-statistics comparing restricted and unrestricted residual sums of squares. However, again due to non-stationarity, the distributions are non-standard. Dickey and Fuller present tables of critical values (c.f. Table 2.5 on page 15) for the three $\Phi$-statistics, which are defined in Table 2.2.
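For concreteness, the following sketch (assuming numpy and statsmodels) computes one of these joint statistics, $\Phi_3$, as an ordinary F ratio between the restricted and unrestricted regressions; the resulting value must be compared with the Dickey-Fuller tables, not the F distribution:

```python
# Phi_3 compares the unrestricted regression (2.8), Delta y_t = alpha +
# beta*t + gamma*y_{t-1} + e_t, with the restriction beta = gamma = 0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(250))

dy, ylag = np.diff(y), y[:-1]
t = np.arange(1, len(y))
X_u = sm.add_constant(np.column_stack([t, ylag]))  # alpha, beta*t, gamma*y
X_r = np.ones((len(dy), 1))                        # alpha only

rss_u = sm.OLS(dy, X_u).fit().ssr
rss_r = sm.OLS(dy, X_r).fit().ssr
q, k = 2, 3                                        # restrictions, parameters
phi3 = ((rss_r - rss_u) / q) / (rss_u / (len(dy) - k))
print(f"Phi_3 = {phi3:.2f} (compare with the Dickey-Fuller Table 2.5 values)")
```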
We have now seen various tests of the unit-root hypotheses. Which test statistic we should use depends not only on the estimating equation we will use but also on what the true data generating process is. Of course we do not know in advance which is the correct data generating process, so we need a systematic testing procedure.
Step   Estimating equation                                       Statistic     Hypotheses                                Critical values
1      $\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + e_t$    $\tau_\tau$   $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4
3      $\Delta y_t = \alpha + \gamma y_{t-1} + e_t$              $\tau_\mu$    $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4
5      $\Delta y_t = \gamma y_{t-1} + e_t$                       $\tau$        $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4

Table 2.3: Perron Sequential Procedure for the Dickey-Fuller Unit Root Test (the joint $\Phi$ tests of steps 2 and 4 are discussed in the text).
If, however, we do not reject the null in step 2 we conclude that there is no
evidence of a trend in the model and so we go to step 3 where we use an estimating
equation that does not include a trend.
In step 3 we test for a unit root with a drift term in the model. If we fail to
reject the null in this step we proceed to step 4 where we test jointly for the
presence of a unit root and a drift term. If we fail to reject this null we conclude
that the true process does not contain a drift term and we move to step 5.
If, however, we do reject the null in step 4 then this can only be because there is a drift term present in the true model, which would imply that the statistic $\tau_\mu$ should follow the standard normal distribution. So we move to step 4a.
Having gone through all these steps, if we cannot reject the null of a unit root
we conclude that a unit root is present in the model.
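A rough sketch of the $\tau$ steps of this procedure is given below (assuming a recent statsmodels; the joint $\Phi$ steps 2 and 4 are omitted for brevity, so this is only an approximation to the full procedure). With maxlag=0 the adfuller function reproduces the simple DF regressions: 'ct' gives $\tau_\tau$, 'c' gives $\tau_\mu$ and 'n' gives $\tau$:

```python
# Walk down the tau ladder, stopping as soon as stationarity is indicated.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def sequential_df(y, alpha=0.05):
    for regression, name in (("ct", "tau_tau"), ("c", "tau_mu"), ("n", "tau")):
        stat, pvalue, *_ = adfuller(y, maxlag=0, regression=regression,
                                    autolag=None)
        print(f"{name}: statistic {stat:.2f}, p-value {pvalue:.3f}")
        if pvalue < alpha:                 # stationarity at this step
            return f"reject the unit root using {name}"
    return "unit root not rejected at any step"

rng = np.random.default_rng(4)
print(sequential_df(np.cumsum(rng.standard_normal(200))))
```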
It should be noted that this test procedure is influenced by the fact that including additional deterministic terms in the estimating model, beyond what is present in the true process, increases the chance of a type II error (accepting the null of a unit root when in fact the true process is stationary). That is, the power of the test against stationary alternatives decreases.
This can be seen by looking at the DF critical values: $\tau_\tau < \tau_\mu < \tau$. Suppose that the true process is given by $\Delta y_t = e_t$. For the lower-tailed test $H_0: \gamma = 0$ vs $H_A: \gamma < 0$, the ordering of the DF critical values means that it will be harder to reject the null of a unit root when estimation uses a model with drift and a trend than when it uses only a drift, and harder with a drift than when it uses neither.
Table 2.5: Critical values for the $\Phi_1$, $\Phi_2$ and $\Phi_3$ statistics by sample size, with standard-distribution critical values shown for comparison.
In practice we cannot always use the Dickey-Fuller tests which were described in the previous Section, because the assumptions required are too strong. Recall that in the basic Dickey-Fuller tests we were dealing with AR(1) processes with errors $e_t$ that were IID white noise. In reality there are complications which arise.
The first such complication is: what should we do if the $e_t$ are not IID white noise? In that instance the Dickey-Fuller critical values may not be valid. The second problem arises when the process follows a more general model than an AR(1) model: AR(k) models or mixed models with MA terms. In the next two sections we will examine both of these problems.
Suppose, for example, that the errors $e_t$ in the estimating equation
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + e_t \qquad (2.10)$$
actually follow the recursion $e_t = \theta_1 e_{t-1} + \dots + \theta_k e_{t-k} + \epsilon_t$, so that
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + \theta_1 e_{t-1} + \dots + \theta_k e_{t-k} + \epsilon_t. \qquad (2.11)$$
We now make use of the fact that in the true model $\Delta y_t = e_t$ (c.f. (2.9)) to rewrite Equation (2.11) as the AR(k) (c.f. Section I) process:
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + \theta_1 \Delta y_{t-1} + \dots + \theta_k \Delta y_{t-k} + \epsilon_t.$$
What about models with MA terms? More generally, how should we test the hypotheses (2.1) vs (2.2) for the general unit root model?
A possible approach is suggested by the fact that a general ARMA process can be approximated by an AR model of sufficiently high order to ensure white noise residuals. The usefulness of this approach was confirmed by Said and Dickey. They showed that an asymptotically valid unit root test for mixed models with AR and MA components is obtained if the data are analysed as if the process were an autoregressive model, where the order of the AR model is related to $n$, the sample size.
So both of the problems with the Dickey-Fuller test are solved if we can test for unit roots in AR(k) processes. Dickey and Fuller have developed such a test; it is called the Augmented Dickey-Fuller (ADF) test.
Consider the AR(k) process
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \dots + \rho_k y_{t-k} + e_t,$$
which can be re-written as
$$\Delta y_t = \rho^* y_{t-1} + \sum_{i=1}^{k-1} \rho_i^* \,\Delta y_{t-i} + e_t.$$
This version of the process is often called an Error Correction Mechanism (ECM) and Section 5.1 contains a detailed discussion of such models. We saw earlier that this process contains a unit root if $\rho^* = 0$ and is stationary if $\rho^* < 0$. Dickey and Fuller showed that in large samples the t-statistic $\hat\rho^*/\mathrm{se}(\hat\rho^*)$ follows the same distribution as the $\tau$-statistic in the Dickey-Fuller test.
Dickey and Fuller have also shown that in large samples the ADF versions of not just $\tau$ but of all the statistics $\tau_\mu$, $\tau_\tau$, $\Phi_1$, $\Phi_2$, $\Phi_3$ follow the same distributions as in the Dickey-Fuller case (c.f. Table 2.1).
As mentioned before, an ARMA model with unknown orders for the AR and MA components can be approximated by an AR(k) process, so long as $k$ is sufficiently large to ensure white noise residuals. The order $k$ will increase as the sample size increases; Schwert suggests using
$$k = \mathrm{int}\left[\, 12 \left( \frac{T}{100} \right)^{1/4} \right], \qquad (2.14)$$
where $T$ is the sample size.
Choosing the correct lag length is important. Including too few lags will mean that the errors $e_t$ will still be serially correlated, and this will increase the probability of a type I error. Including too many lags may reduce the power of the test, as the model will include too many unnecessary additional parameters. However it is better to include too many lags than too few: if we include too many, the regression can set the unnecessary ones to zero, at the cost of perhaps losing some efficiency.
Alternatively one could use (2.14) initially, and then fit ARIMA models, with the order of the AR part equal to $k - 1$, the order of integration $d$ equal to 1, and no MA terms, to the original data. That is, fit
$$\Delta y_t = \phi_1 \Delta y_{t-1} + \dots + \phi_{k-1} \Delta y_{t-k+1} + e_t \qquad (2.15)$$
to the data.
We then try to fit (2.15) with one less lag and use Lagrange Multiplier tests to check for white noise residuals. The Ljung-Box-Pierce statistic is appropriate here: it looks at the residuals as a group, testing for white noise. We compare a model with $k$ lags with one with $k - 1$ lags to see if the chosen $k$ is correct. We continue reducing the number of lags in the ARIMA model and stop when the Ljung-Box-Pierce statistic rejects white-noise residuals.
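A sketch of this recipe (assuming numpy and a recent statsmodels; the 0.05 level and the 10 Ljung-Box lags are arbitrary illustrative choices) is:

```python
# Start from the Schwert bound (2.14), fit ARIMA(k-1, 1, 0) models with
# decreasing k, and stop when Ljung-Box rejects white-noise residuals.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
y = np.cumsum(rng.standard_normal(200))

T = len(y)
k = int(12 * (T / 100) ** 0.25)            # Schwert's rule (2.14)
chosen = 1
while k >= 1:
    resid = ARIMA(y, order=(k - 1, 1, 0)).fit().resid
    pval = acorr_ljungbox(resid, lags=[10])["lb_pvalue"].iloc[0]
    if pval < 0.05:          # rejection: the previous k was the last adequate one
        chosen = k + 1
        break
    chosen = k
    k -= 1
print("chosen lag length:", chosen)
```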
By analogy with equation (2.8) in the Dickey-Fuller test procedure, we see that the ADF procedure appropriately begins by estimating the following ECM:
$$\Delta y_t = \alpha + \beta t + \rho^* y_{t-1} + \sum_{i=1}^{k-1} \rho_i^* \,\Delta y_{t-i} + e_t. \qquad (2.16)$$
Having decided on an appropriate order for the autoregression, the rest of the Augmented Dickey-Fuller test procedure follows the same steps as in the basic Dickey-Fuller case. Refer to Table 2.3 on page 13 for details.
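In software the whole ADF regression (2.16) is available directly; for example, with statsmodels (the maxlag value below is an illustrative fixed choice rather than one selected by the procedure above):

```python
# regression='ct' estimates the ECM (2.16) with drift and trend; autolag=None
# with maxlag=k fixes the number of lagged differences at the chosen k.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(0.1 + rng.standard_normal(300))   # random walk with drift

res = adfuller(y, maxlag=4, regression="ct", autolag=None)
stat, crit = res[0], res[4]                     # statistic and critical values
print(f"ADF statistic {stat:.2f}; 5% critical value {crit['5%']:.2f}")
```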
3 Spurious Regressions
When analysing several time series and trying to establish relationships between them, it is important to be aware of the possibility of spurious regression. It is possible that two independent time series can appear to be related when in fact all that is happening is that there are correlated time trends. In Trend-Stationary series one should include a deterministic time trend in the regression in order to remove the trend effect. This will leave residuals which are stationary and allow valid statistical inferences using t or F tests.
But suppose we are dealing with Difference-Stationary series; in this case including a time trend in the model is not sufficient. Using standard regression techniques with non-stationary data will lead to spurious regressions, giving invalid inferences using t or F tests. An example will illustrate this. Consider the following two independent time series:
$$x_t = \phi x_{t-1} + u_t \qquad (3.4)$$
$$y_t = \phi y_{t-1} + v_t \qquad (3.5)$$
where $u_t$ and $v_t$ are independent white noise series. The two series $x_t$ and $y_t$ are unrelated, and estimation of the model
$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t \qquad (3.6)$$
should give the conclusion $\beta_1 = 0$. In reaching that conclusion we use the fact that $\hat\beta_1/\mathrm{se}(\hat\beta_1)$ should be distributed as a Student-t distribution with $N - 2$ degrees of freedom, where $N$ is the number of pairs of observations $(x_t, y_t)$. However non-stationarity in the models (3.4), (3.5) can lead to a non-stationary $\epsilon_t$, and the fact that both series are changing with $t$ will show up in the modelling as a correlation between the two series and as a non-zero estimate for $\beta_1$. So estimation of model (3.6) will imply a causal relationship between the series when in fact none is present.
none is present. To illustrate this spurious regression problem we have simulated
the series xt and yt when = 0.1 to give two pairs of stationary series and then
3. SPURIOUS REGRESSIONS 23
again with = 1 giving two pairs of non stationary series. Each of our simulated
series contained 30 observations.
Figure 3.1 contains time series plots of a simulated $x_t$ and $y_t$ when $\phi = 0.1$. It is clear from these plots that both of the simulated series are stationary.
Examining Figure 3.2 we can see that the series $x_t$ and $y_t$ do not display any correlation, as expected.
We now consider the series simulated with $\phi = 1$. Time series plots of $x_t$ and $y_t$ with $\phi = 1$ are shown in Figure 3.3 and clearly indicate that the series are non-stationary.
To further examine the nature of the spurious regressions we estimated (3.6) for the stationary pair of series and separately for the non-stationary pair, computing $\hat\beta_1/\mathrm{se}(\hat\beta_1)$ in each case. We repeated these simulations 10000 times; the results are summarised in Table 3.1.
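A sketch of this Monte-Carlo experiment (assuming numpy and statsmodels; the hard-coded 2.048 is the two-sided 5% Student-t critical value with 28 degrees of freedom) is given below. The rejection rate for the non-stationary pairs comes out far above the nominal 5%:

```python
# Regress one independent random walk (phi = 1) on another, n = 30, and
# count how often |t| for beta_1 exceeds the Student-t 5% critical value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, rejections = 30, 10_000, 0
tcrit = 2.048                                 # t(28), 97.5% point

for _ in range(reps):
    x = np.cumsum(rng.standard_normal(n))     # phi = 1: non-stationary
    y = np.cumsum(rng.standard_normal(n))     # independent of x
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    if abs(fit.tvalues[1]) > tcrit:
        rejections += 1

print(f"spurious 'significant' slopes: {100 * rejections / reps:.1f}%")
```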
4 Multivariate Time Series

Multivariate time series data are often modelled using Vector Autoregressive Moving Average (VARMA) models. These are a more general class of time series model and can be used to describe relationships among a number of time series variables (rather than focusing on the relationship between a single dependent variable and several independent variables, as we have discussed up to now).
For a stationary $M$-dimensional vector time series $Z_t = (Z_{1t}, Z_{2t}, \dots, Z_{Mt})'$, the mean vector $\mu$ and covariance matrix function are given by
$$E(Z_{rt}) = \mu_r \qquad (4.2)$$
and
$$\Gamma(k) = \mathrm{Cov}(Z_{t-k}, Z_t), \qquad (4.4)$$
where $\Gamma(k)$ is referred to as the covariance matrix function for $Z_t$: $\Gamma_{rr}(k)$ is the autocovariance function for $Z_{rt}$, and $\Gamma_{rs}(k)$ denotes the covariance function between $Z_{rt}$ and $Z_{st}$. Finally, $\Gamma(0)$ is the variance-covariance matrix at a given time.
The correlation matrix function for $Z_t$ is calculated using the matrix $D$, where $D$ is the diagonal matrix of the $M$ variances:
$$\rho(k) = D^{-1/2}\,\Gamma(k)\,D^{-1/2} = [\rho_{rs}(k)] \qquad (4.7)$$
for $r, s = 1, 2, \dots, M$. The $r$th diagonal element of the correlation matrix, $\rho_{rr}(k)$, represents the autocorrelation function for the $r$th series in $Z_t$, i.e. the ACF for $Z_{rt}$. The off-diagonal terms of the correlation matrix $\rho(k)$ are the cross-correlation functions between the corresponding series, e.g. $\rho_{rs}(k)$ is the cross-correlation function between $Z_{rt}$ and $Z_{st}$. Each element can also be calculated using the following formula:
$$\rho_{rs}(k) = \frac{\Gamma_{rs}(k)}{\sqrt{\Gamma_{rr}(0)\,\Gamma_{ss}(0)}}. \qquad (4.8)$$
Assumptions
It is important to note that the covariance and correlation matrix functions for a vector time series are non-negative definite, in the sense that
$$\sum_{r=1}^{T}\sum_{s=1}^{T} \alpha_r'\,\Gamma(t_r - t_s)\,\alpha_s \geq 0$$
and
$$\sum_{r=1}^{T}\sum_{s=1}^{T} \alpha_r'\,\rho(t_r - t_s)\,\alpha_s \geq 0 \qquad (4.9)$$
for any set of time points $t_1, t_2, \dots, t_T$ and any set of real vectors $\alpha_1, \alpha_2, \dots, \alpha_T$.
It should also be noted that, in general, $\Gamma_{rs}(k) \neq \Gamma_{rs}(-k)$ and $\rho_{rs}(k) \neq \rho_{rs}(-k)$. Instead,
$$\Gamma(k) = \Gamma'(-k)$$
and
$$\rho(k) = \rho'(-k), \qquad (4.11)$$
since
$$\Gamma_{rs}(k) = E[(Z_{r(t-k)} - \mu_r)(Z_{st} - \mu_s)] = E[(Z_{s(t+k)} - \mu_s)(Z_{rt} - \mu_r)] = \Gamma_{sr}(-k). \qquad (4.12)$$
The stationary vector time series $Z_t$ is called a linear process (or purely nondeterministic process) if it can be written as a linear combination of white noise random vectors:
$$Z_t = \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \dots = \mu + \sum_{u=0}^{\infty} \Psi_u a_{t-u}, \qquad (4.13)$$
with $\Psi_0 = I$, where the $a_t$ are $M$-dimensional white noise random vectors with mean zero and covariance matrix given by:
$$E(a_t a_{t+k}') = \begin{cases} \Sigma & \text{if } k = 0 \\ 0 & \text{if } k \neq 0. \end{cases} \qquad (4.14)$$
The process is invertible if it can be written in autoregressive form
$$\Pi(B)\dot{Z}_t = a_t, \qquad (4.16)$$
where
$$\Pi(B) = I - \sum_{u=1}^{\infty} \Pi_u B^u \qquad (4.17)$$
with absolutely summable coefficients,
$$\sum_{u=0}^{\infty} |\Pi_{rs,u}| < \infty, \qquad (4.18)$$
which requires
$$|\Pi(B)| \neq 0 \quad \text{for } |B| \leq 1. \qquad (4.20)$$
The most general class of models of the type discussed above is the vector autoregressive moving average models (or Vector ARMA(p, q) models). These models allow both moving average terms and autoregressive terms:
$$\Phi_p(B)\dot{Z}_t = \Theta_q(B)\,a_t$$
where:
$$\dot{Z}_t = Z_t - \mu,$$
$$\Phi_p(B) = \Phi_0 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p,$$
$$\Theta_q(B) = \Theta_0 - \Theta_1 B - \Theta_2 B^2 - \dots - \Theta_q B^q.$$
$\Phi_p(B)$ and $\Theta_q(B)$ are the autoregressive and moving average matrix polynomials (of order $p$ and $q$) respectively. $\Phi_0$ and $\Theta_0$ are non-singular $M$-dimensional square matrices. Without loss of generality, it can be assumed that $\Phi_0 = \Theta_0 = I_M$ (as long as the covariance matrix $\Sigma$ of $a_t$ is positive definite).
Clearly, when $p = 0$, this model becomes a vector moving average process of order $q$ (a vector MA(q) process):
$$\dot{Z}_t = \Theta_q(B)\,a_t.$$
The model is invertible if
$$|\Theta_q(B)| \neq 0 \quad \text{for } |B| \leq 1, \qquad (4.24)$$
i.e. the zeros of the determinantal polynomial $|\Theta_q(B)|$ are outside the unit circle. In such a case, the model can be re-written in the form:
$$\Pi(B)\dot{Z}_t = a_t \qquad (4.25)$$
where
$$\Pi(B) = [\Theta_q(B)]^{-1}\,\Phi_p(B) = I - \sum_{u=1}^{\infty} \Pi_u B^u. \qquad (4.26)$$
Model Identification
The identification process for a Vector ARMA(p, q) model is similar to that for a univariate time series. The first step is to compute the sample correlation matrix function:
$$\hat\rho(k) = [\hat\rho_{rs}(k)]. \qquad (4.30)$$
The $\hat\rho_{rs}(k)$ are calculated using the following formula (Equation (4.31)) and represent the sample cross-correlations between $Z_r$ and $Z_s$:
$$\hat\rho_{rs}(k) = \frac{\sum_{t=1}^{n-k} (Z_{rt} - \bar{Z}_r)(Z_{s(t+k)} - \bar{Z}_s)}{\left[\, \sum_{t=1}^{n} (Z_{rt} - \bar{Z}_r)^2 \;\sum_{t=1}^{n} (Z_{st} - \bar{Z}_s)^2 \right]^{1/2}} \qquad (4.31)$$
where $\bar{Z}_r$ and $\bar{Z}_s$ are the sample means of $Z_r$ and $Z_s$ respectively. It has been shown (Hannan REF) that the sample correlation function estimator $\hat\rho(k)$ is consistent and asymptotically Normally distributed, assuming that the vector process is stationary.
The sample correlation matrix function is used to identify the order of the (finite-order) moving average component of the ARMA model. This is due to the property that, for a vector MA(q) process, the correlation matrices beyond lag $q$ are zero.
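Equation (4.31) translates directly into code. A minimal sketch (assuming numpy; the white-noise example is illustrative) is:

```python
# Z is an n x M array holding the vector series; return the M x M sample
# cross-correlation matrix [rho_hat_rs(k)] of equation (4.31).
import numpy as np

def sample_cross_correlation(Z, k):
    n, M = Z.shape
    Zc = Z - Z.mean(axis=0)                 # subtract sample means
    ss = (Zc ** 2).sum(axis=0)              # sum of squares per series
    denom = np.sqrt(np.outer(ss, ss))
    num = Zc[: n - k].T @ Zc[k:]            # sum_t (Z_rt - mean)(Z_s,t+k - mean)
    return num / denom

rng = np.random.default_rng(8)
Z = rng.standard_normal((200, 2))           # white noise: rho(k) ~ 0 for k > 0
print(np.round(sample_cross_correlation(Z, 1), 3))
```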
The order of the autoregressive component is identified using a partial autoregression function, by analogy with the univariate PACF. In its definition, $\hat{Z}_t$ and $\hat{Z}_{t+k}$ are the linear estimators of $Z_t$ and $Z_{t+k}$ calculated by minimum mean squared error linear regression on $Z_{t+1}, Z_{t+2}, \dots, Z_{t+k-1}$. This function, $\phi_{kk}$, is zero for $|k| > p$, where $p$ is the number of autoregressive terms required by the underlying model.
5 Cointegration
But since the differenced series are stationary we may decide that these differenced series are related by a regression model:
$$\Delta C_t = \beta_0\,\Delta I_t + \epsilon_t. \qquad (5.1)$$
In this model increasing $I$ by one unit per period will increase $C$ by $\beta_0$ units. Statistically this model is sound, but in economics it may not be so reasonable.
In particular, one might think that the relationship between the increase in Consumption given an increase in Income should also depend on the current level of Income. One reason for this might be that if one earns a lot then any increase in income need not be saved for necessities but could instead be spent freely, whereas if one is not in a high income bracket then extra income may not be so liberally consumed.
Economists are also generally interested in systems reaching equilibrium, and the model (5.1) does not include an equilibrium solution. In equilibrium we would have $C_t = C_{t-1} = \dots$ and $I_t = I_{t-1} = \dots$.
One way to try and fix these problems is to include a term which is the deviation between the actual value of $C$ in the previous period $t-1$ and the equilibrium value of $C$,
$$C_t^{equil} = \gamma I_t. \qquad (5.2)$$
This type of model is called an Error Correction Model (ECM), as it has the ability to correct disequilibria:
$$\Delta C_t = \beta_0\,\Delta I_t - (1 - \theta_1)(C_{t-1} - \gamma I_{t-1}) + \epsilon_t. \qquad (5.4)$$
If $C_t$ increases more slowly than expected then we will find $(C_{t-1} - \gamma I_{t-1}) < 0$, but $-(1 - \theta_1) < 0$ also. So the net effect is to add a positive term to the equilibrium value $\beta_0\,\Delta I_t$, thus boosting $\Delta C_t$ and forcing $C_t$ back towards equilibrium.
If $C_t$ increases faster than expected then we are instead adding a negative term, which again forces $C_t$ back towards its equilibrium value.
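A small simulation sketch (assuming numpy; $\beta_0 = 0.5$, $\gamma = 1$ and $\theta_1 = 0.7$ are arbitrary illustrative values) shows the correction mechanism at work: however far $C$ starts from equilibrium, it is pulled back towards $\gamma I_t$:

```python
# Simulate the ECM (5.4): the disequilibrium C_t - gamma*I_t shrinks over time.
import numpy as np

rng = np.random.default_rng(9)
n, beta0, gamma, theta1 = 200, 0.5, 1.0, 0.7
I = np.cumsum(0.2 + 0.1 * rng.standard_normal(n))   # trending income
C = np.empty(n)
C[0] = gamma * I[0] + 5.0                            # start far from equilibrium

for t in range(1, n):
    dC = beta0 * (I[t] - I[t - 1]) \
         - (1 - theta1) * (C[t - 1] - gamma * I[t - 1])
    C[t] = C[t - 1] + dC + 0.05 * rng.standard_normal()

print("initial disequilibrium:", C[0] - gamma * I[0])
print("final disequilibrium:  ", C[-1] - gamma * I[-1])
```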
The original model (5.1) did have problems as far as economics was concerned; however, (5.1) was a relationship between stationary variables and so was statistically sound.
The new model (5.4) may make more economic sense; however, it now has a statistical problem: it only makes sense if the new variable $C_{t-1} - \gamma I_{t-1}$ is stationary. But this variable is a linear combination of the two non-stationary I(1) variables $C$ and $I$ at time $t-1$, and such a combination will also, in general, be non-stationary I(1).
We note that we can also generalize the ECM model (5.4), including more lag lengths, to the following relationship linking the variables $C_t$ and $I_t$:
$$A(L)\,C_t = B(L)\,I_t + \epsilon_t$$
where
$$A(L) = 1 - \alpha_1 L - \alpha_2 L^2 - \dots - \alpha_k L^k,$$
$$B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + \dots + \beta_q L^q.$$
Collecting the variables into a vector $y_t$, such dynamic models lead to a VAR, $y_t = \Phi_1 y_{t-1} + \dots + \Phi_k y_{t-k} + \epsilon_t$. As in the one dimensional case (c.f. (??) and (??)) this VAR model can be re-formulated as a Vector Error Correction Model (VECM):
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \dots + \Gamma_{k-1}\Delta y_{t-k+1} + \epsilon_t \qquad (5.7)$$
where
$$\Gamma_i = -(\Phi_k + \Phi_{k-1} + \dots + \Phi_{i+1}), \quad i = 1, 2, \dots, k-1$$
and
$$\Pi = \Phi_1 + \dots + \Phi_k - I.$$
An equivalent formulation puts the levels term at lag $k$:
$$\Delta y_t = \Pi y_{t-k} + \Gamma_1' \Delta y_{t-1} + \dots + \Gamma_{k-1}' \Delta y_{t-k+1} + \epsilon_t \qquad (5.8)$$
where
$$\Gamma_i' = -(I - \Phi_1 - \Phi_2 - \dots - \Phi_i), \quad i = 1, 2, \dots, k-1$$
and, as before,
$$\Pi = \Phi_1 + \dots + \Phi_k - I.$$
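The mapping from VAR coefficients to the VECM matrices is a purely mechanical computation. A small numerical sketch (assuming numpy; the $\Phi$ matrices are arbitrary illustrative values) for $k = 2$ is given below; the rank of $\Pi$ is the quantity examined in the cointegration analysis that follows:

```python
# Compute Pi = Phi_1 + Phi_2 - I and Gamma_1 = -Phi_2 for a VAR(2).
import numpy as np

Phi1 = np.array([[0.6, 0.2], [0.1, 0.7]])
Phi2 = np.array([[0.3, -0.1], [0.2, 0.1]])

Pi = Phi1 + Phi2 - np.eye(2)
Gamma1 = -Phi2                      # Gamma_1 = -(Phi_2) when k = 2
print("Pi =\n", Pi)
print("rank of Pi:", np.linalg.matrix_rank(Pi))
```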
In this section we introduced the ECM, but we have seen that this model appears not to make sense statistically, as it involves both I(1) and I(0) variables together in the same regression. The solution to this problem was presented by Engle and Granger when they introduced the concept of cointegration, which we will examine in the next section of this chapter.
5.3 Cointegration
Consider two series $y_{1t}$ and $y_{2t}$ which are both integrated of order $d$, i.e. $\sim I(d)$. In general any linear combination of these series will also be integrated of order $d$. In particular, if a regression is performed of $y_{1t}$ on $y_{2t}$, then the residuals from this regression will be I(d), i.e. the regression will suffer from spurious correlation. Engle and Granger noticed that in some situations it might be possible to perform a regression containing non-stationary variables and still avoid spurious regression. They introduced the concept of cointegration:
The components of $y_t$ are said to be co-integrated of order $(d, b)$, written $y_t \sim CI(d, b)$, if there exists a vector $\gamma$ such that
$$Z_t = \gamma' y_t \sim I(d - b), \quad d \geq b > 0.$$
If $(\gamma_1, \gamma_2, \dots, \gamma_h)$ span the co-integrating space then they form a basis for the co-integrating space.
Having seen the formal definition of cointegration, let us consider what it means in practice. Cointegration means that although there may be many apparently independent changes in the individual elements of $y_t$, there are actually some long-run equilibrium relations tying the individual components together. These relations are represented by the linear combinations $\gamma' y_t$. So cointegration provides a model for the idea in economics of a long-run equilibrium to which the system will converge over time.
We see now that if two variables are fully co-integrated, $CI(d, d)$, it is possible to perform a meaningful regression between them: the regression would pick up the stationary linear combination and the residuals would no longer be non-stationary, thus eliminating the problem of spurious regression. In practice we mainly deal with I(1) variables and seek to find co-integrating linear combinations which will be stationary.
When the concept of cointegration was introduced by Engle and Granger, they suggested a procedure to test for cointegration among two variables. If two variables $y_t$ and $x_t$ are co-integrated then there is a stationary linear combination of the variables. This means that the model:
$$y_t = \beta x_t + e_t \qquad (5.9)$$
describes a stationary relationship, does not suffer from spurious regression and can be consistently estimated by ordinary least squares.
Now, if the two variables $y_t$ and $x_t$ are not co-integrated then there will not be a stationary linear combination of the two variables, and hence equation (5.9) would once again suffer from spurious regression, as the residuals will be non-stationary. Engle and Granger make use of this fact to construct a test for cointegration. They suggest using an ADF test on the residuals $e_t$ of the regression (5.9) to see if they satisfy the null of being I(1) or the alternative of stationarity, I(0).
So, as described in Section 2.3, we should estimate:
$$\Delta e_t = \rho^* e_{t-1} + \sum_{i=1}^{k-1} \rho_i^*\,\Delta e_{t-i} + \mu + \beta t + \omega_t, \qquad \omega_t \sim \text{IID white noise.} \qquad (5.10)$$
The trend and drift terms can be added in the regression (5.9) or in (5.10), but not in both.
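This two-step recipe is implemented in statsmodels as coint(), which runs the OLS regression (5.9) and then the residual-based ADF-type test with appropriate (non-Dickey-Fuller) critical values. A minimal sketch (assuming numpy and statsmodels; the simulated series are illustrative) is:

```python
# Engle-Granger test: two series sharing a common I(1) trend are cointegrated.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(10)
n = 300
x = np.cumsum(rng.standard_normal(n))       # common I(1) trend
y = 2.0 * x + rng.standard_normal(n)        # cointegrated with x

stat, pvalue, crit = coint(y, x, trend="c")
print(f"EG statistic {stat:.2f}, p-value {pvalue:.4f}, 5% cv {crit[1]:.2f}")
```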
This Engle-Granger procedure for testing for cointegration suffers from many problems. In addition, this approach, which uses a single equation in the model, is only really suitable if there is just one cointegrating relationship. In general, the multivariate Vector Auto Regression (VAR) approach of Johansen is to be preferred.