1 Non-Stationary Time Series

Consider the AR(1) model
$$y_t = \frac{1}{r}\,y_{t-1} + \epsilon_t \qquad (1.2)$$
whose characteristic polynomial $(1 - \frac{1}{r}z)$ has root $z = r$. The series behaves differently according to whether $r > 1$, $r = 1$ or $r < 1$. Equation (1.2) has solution:
$$y_t = \epsilon_t + \frac{1}{r}\,\epsilon_{t-1} + \dots + \frac{1}{r^{t-1}}\,\epsilon_1 + \frac{1}{r^t}\,y_0 \qquad (1.3)$$
where $y_0$ is the value of $y_t$ at $t = 0$.
It is clear that when $r > 1$ the influence of the initial term $\frac{1}{r^t}\,y_0$ and the impulses $\frac{1}{r^i}\,\epsilon_{t-i}$ die out as they move further into the past.
For $r > 1$, therefore, we see that the present is more important than the past. For these values of $r$ the series is stationary and its behaviour will consist of oscillation around the mean value 0.
When $r = 1$ past shocks and the initial value have the same weight, the past being as important as the present.
And for $r < 1$ the weights on past terms increase with $t$: the past is more important than the present. Here the series rapidly diverges towards $+\infty$ or $-\infty$. This behaviour is termed explosive and is of course counter-intuitive in almost all situations. For that reason we safely assume that in time series models of real-life data all roots are either on or outside the unit circle.
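The three regimes are easy to see in simulation. The following minimal Python sketch (assuming only numpy; the sample size and the values of $r$ are arbitrary illustrative choices) iterates the recursion (1.2) for a stationary, a unit-root and an explosive case:

```python
# Simulate y_t = (1/r) * y_{t-1} + eps_t for the three regimes discussed above.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(r, n=100, y0=1.0):
    """Simulate y_t = (1/r) y_{t-1} + eps_t with IID N(0,1) shocks."""
    eps = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = y0
    for t in range(1, n):
        y[t] = y[t - 1] / r + eps[t]
    return y

for r in (2.0, 1.0, 0.9):   # stationary, unit root, explosive
    y = simulate_ar1(r)
    print(f"r = {r}: final value {y[-1]:.1f}, max |y| = {np.abs(y).max():.1f}")
```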
We have seen in the last chapter that differencing certain non-stationary time series can produce stationary series. Consider the Trend-Stationary model
$$y_t = \alpha + \beta t + \epsilon_t. \qquad (1.4)$$
If we remove the linear term $\alpha + \beta t$ from this series then the series $z_t = y_t - \alpha - \beta t$ with which we are left is clearly a stationary series:
$$z_t = \epsilon_t. \qquad (1.5)$$
Of course differencing this series would also leave a stationary series. This can be seen by writing
$$y_t = \alpha + \beta t + \epsilon_t$$
$$y_{t-1} = \alpha + \beta(t-1) + \epsilon_{t-1} \qquad (1.6)$$
so that subtracting gives $\Delta y_t = \beta + \epsilon_t - \epsilon_{t-1}$, which is stationary.
Nomenclature
Drift: Constant terms included in time series models, such as $\alpha$ in (1.4), are referred to as Drift terms.
Consider now the random walk with drift
$$y_t = \alpha + y_{t-1} + e_t. \qquad (1.8)$$
Repeated substitution gives
$$y_t = y_0 + \alpha t + \sum_{j=1}^{t} e_j. \qquad (1.9)$$
If we remove a linear trend $y_0 + \alpha t$ from this series we are still left with a non-stationary series $\sum_{j=1}^{t} e_j$.
Models such as (1.8) which require differencing to achieve stationarity (and cannot be made stationary by just removing a linear trend) are called Difference-Stationary series, whereas models which are stationary upon removal of a linear trend, e.g. (1.4), are called Trend-Stationary.
A polynomial trend of degree $d-1$ is removed entirely by differencing $d$ times, because
$$\Delta^d(\beta_0 + \beta_1 t + \dots + \beta_{d-1} t^{d-1}) = 0,$$
so that for such a model
$$\Delta^d y_t = \theta(L)\epsilon_t. \qquad (1.11)$$
Both the Trend-Stationary and Difference-Stationary models allow for the inclusion of a polynomial trend, but in the Difference-Stationary case the deviations from the polynomial trend still require differencing to achieve stationarity.
Choosing not to difference a series when in fact differencing is required can lead to serious consequences such as spurious regression (c.f. Section 3), which is one consequence of non-stationarity. As was seen from equation (1.9), removing a linear trend does not solve the non-stationarity problem if the series is actually difference stationary.
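The distinction matters in practice. A small sketch (again assuming numpy; the parameter values are illustrative) shows that OLS-detrending the random walk with drift (1.8) leaves residuals whose variability is unstable, while differencing produces a well-behaved series:

```python
# Contrast detrending and differencing for y_t = alpha + y_{t-1} + e_t:
# detrending leaves the accumulated shocks, differencing yields alpha + e_t.
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 500, 0.5
e = rng.standard_normal(n)
y = np.cumsum(alpha + e)                    # difference-stationary series

t = np.arange(1, n + 1)
trend = np.polyval(np.polyfit(t, y, 1), t)  # fitted linear trend
detrended = y - trend                       # still contains the sum of e_j
differenced = np.diff(y)                    # alpha + e_t, stationary

print("detrended half-sample variances:",
      detrended[: n // 2].var(), detrended[n // 2:].var())
print("differenced half-sample variances:",
      differenced[: n // 2].var(), differenced[n // 2:].var())
```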
In the next section we will examine how to determine whether a series is 'Difference-Stationary'.
2 Unit Root Tests
We have seen the importance of the presence of unit roots in a time series. In
practice how should one decide whether a series contains a unit root or not?
One could examine plots of the time series, looking for wandering behaviour that would indicate non-stationarity. Alternatively one could look at the sample auto-correlation function (ACF) of the original series and of the differenced series; if the auto-correlations do not die out quickly then this would indicate non-stationarity.
There are, however, problems with relying on graphical methods; the human eye can deceive. Formal tests for unit roots have been developed and we will now look at two of these tests in detail: the Dickey-Fuller and Augmented Dickey-Fuller tests. Formulating a set of hypotheses to test is our first consideration.
The way this test has been formulated indicates that we are choosing a null
hypothesis of a unit-root with stationary alternatives. So we accept a unit-root
unless there is significant evidence that the process is stationary. We could have
decided to have stationarity as the null.
The reason we choose the hypotheses to have a unit-root null is because of the relative importance of the two errors in this testing procedure. If we decide the series is stationary when in fact it contains a unit root then any forecast intervals we derive will be too narrow and we will be over-confident of our forecasts. If however we conclude the series possesses a unit root when in fact it is stationary then we would difference a stationary series. The consequences of that are not so serious: we would produce over-conservative forecast intervals.
In this Section we examine the Dickey-Fuller (DF) approach to testing for a unit
root.
The simplest example of the procedure is in the AR(1) model with no drift or time trend term:
$$y_t = \rho y_{t-1} + e_t. \qquad (2.3)$$
We assume here that the $e_t$ terms are IID white noise and we are interested in testing the hypotheses:
$$H_0: \rho = 1 \quad \text{vs} \quad H_A: \rho < 1.$$
Subtracting $y_{t-1}$ from both sides of (2.3) gives the reparameterisation
$$\Delta y_t = \gamma y_{t-1} + e_t, \qquad (2.4)$$
where $\gamma = \rho - 1$, so that the hypotheses become
$$H_0: \gamma = 0 \quad \text{vs} \quad H_A: \gamma < 0.$$
Considering (2.4), we see that we can test this hypothesis by regressing $\Delta y_t$ on $y_{t-1}$ and computing the standard least squares t-statistic for testing that the coefficient $\gamma$ equals 0.
This test statistic, which we will call $\tau$, is produced automatically in the computer output obtained from most statistical packages by running a regression for equation (2.4).
There is one important thing to note, however. If the true process is (2.3) with $\rho = 1$ then, because of non-stationarity, this "t"-test statistic does not, in fact, follow the standard t-distribution. The asymptotic theory of this model has been developed using Brownian motion techniques. Dickey and Fuller have used Monte-Carlo simulation to compute a set of critical values for this test and for other variations on this model. We present some of these critical values in Table 2.4.
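As an illustration, the $\tau$ statistic can be computed directly from the regression (2.4). The sketch below (assuming numpy and statsmodels; the simulated sample is illustrative) does this for a pure random walk; the $-1.95$ quoted in the final line is the usual large-sample 5% Dickey-Fuller critical value for $\tau$:

```python
# Regress Delta y_t on y_{t-1} with no constant, as in (2.4), and read off
# the ordinary least squares t-statistic; compare with the DF table, not t.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(250))   # a pure random walk, rho = 1

dy, ylag = np.diff(y), y[:-1]
fit = sm.OLS(dy, ylag).fit()              # no drift, no trend
tau = fit.tvalues[0]
print(f"tau = {tau:.2f}; reject the unit root at 5% if tau < -1.95 (DF table)")
```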
The next version of the test allows a drift term in the model:
$$y_t = \rho y_{t-1} + \alpha + e_t \qquad (2.5)$$
or the reparameterisation
$$\Delta y_t = \gamma y_{t-1} + \alpha + e_t. \qquad (2.6)$$
The hypotheses are again
$$H_0: \rho = 1 \quad \text{vs} \quad H_A: \rho < 1$$
or equivalently
$$H_0: \gamma = 0 \quad \text{vs} \quad H_A: \gamma < 0.$$
The test statistic in this test, $\tau_\mu$, is again the standard least squares t-statistic obtained by running a regression for equation (2.6).
If the true data generating process has $\alpha = 0$, so that the real process is actually (2.3), and if $\rho = 1$, then this test statistic, $\tau_\mu$, follows a nonstandard distribution. Critical values for this distribution, which is different from the one for the $\tau$ statistic, have also been produced by Dickey and Fuller (c.f. Table 2.4). If however the true process contains a unit root but a non-zero drift term ((2.5) with $\rho = 1$), then $\tau_\mu$ follows the standard normal distribution.
Finally, the model may include both a drift and a deterministic time trend:
$$y_t = \rho y_{t-1} + \alpha + \beta t + e_t \qquad (2.7)$$
with reparameterisation
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + e_t. \qquad (2.8)$$
Here the Dickey-Fuller test statistic, $\tau_\tau$, is again the standard least squares t-statistic obtained by running a regression for equation (2.8).
Again, if the true data process is actually (2.3) with $\rho = 1$ then this test statistic, $\tau_\tau$, also follows a nonstandard distribution. Critical values have again been produced by Dickey and Fuller (c.f. Table 2.4). If the true process contains a unit root but a non-zero drift term ((2.5) with $\rho = 1$), then $\tau_\tau$ follows a nonstandard distribution. Lastly, if the true process contains a unit root, a non-zero drift term and a non-zero trend term ((2.7) with $\rho = 1$), then $\tau_\tau$ follows a standard normal distribution.
As well as using these three $\tau$-statistics to test the hypothesis of a unit root, it is also possible to test some joint hypotheses for the presence of an intercept, a time trend and a unit root. These joint tests use test statistics which are calculated as standard F-statistics comparing restricted and unrestricted residual sums of squares. However, again due to non-stationarity, the distributions are non-standard. Dickey and Fuller present tables of critical values (c.f. Table 2.5 on page 15) for the three $\Phi$-statistics, which are defined in Table 2.2.
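For concreteness, the following sketch (assuming numpy and statsmodels) computes one of these joint statistics, $\Phi_3$, as an ordinary F ratio between the restricted and unrestricted regressions; the resulting value must be compared with the Dickey-Fuller tables, not the F distribution:

```python
# Phi_3 compares the unrestricted regression (2.8), Delta y_t = alpha +
# beta*t + gamma*y_{t-1} + e_t, with the restriction beta = gamma = 0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal(250))

dy, ylag = np.diff(y), y[:-1]
t = np.arange(1, len(y))
X_u = sm.add_constant(np.column_stack([t, ylag]))  # alpha, beta*t, gamma*y
X_r = np.ones((len(dy), 1))                        # alpha only

rss_u = sm.OLS(dy, X_u).fit().ssr
rss_r = sm.OLS(dy, X_r).fit().ssr
q, k = 2, 3                                        # restrictions, parameters
phi3 = ((rss_r - rss_u) / q) / (rss_u / (len(dy) - k))
print(f"Phi_3 = {phi3:.2f} (compare with the Dickey-Fuller Table 2.5 values)")
```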
We have now seen various tests of the unit-root hypotheses. Which test statistic we should use depends not only on the estimating equation we will use but also on what the true data generating process is. Of course we do not know in advance which is the correct data generating process, so we need a systematic testing procedure.
Step   Estimating equation                                       Statistic     Hypotheses                                Critical values
1      $\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + e_t$    $\tau_\tau$   $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4
3      $\Delta y_t = \alpha + \gamma y_{t-1} + e_t$              $\tau_\mu$    $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4
5      $\Delta y_t = \gamma y_{t-1} + e_t$                       $\tau$        $H_0: \gamma = 0$ vs $H_A: \gamma < 0$    Table 2.4

Table 2.3: Perron Sequential Procedure for the Dickey-Fuller Unit Root Test (the joint $\Phi$ tests of steps 2 and 4 are discussed in the text).
If, however, we do not reject the null in step 2 we conclude that there is no
evidence of a trend in the model and so we go to step 3 where we use an estimating
equation that does not include a trend.
In step 3 we test for a unit root with a drift term in the model. If we fail to
reject the null in this step we proceed to step 4 where we test jointly for the
presence of a unit root and a drift term. If we fail to reject this null we conclude
that the true process does not contain a drift term and we move to step 5.
If, however, we do reject the null in step 4 then this can only be because there is a drift term present in the true model, which would imply that the statistic $\tau_\mu$ should follow the standard normal distribution. So we move to step 4a.
Having gone through all these steps, if we cannot reject the null of a unit root
we conclude that a unit root is present in the model.
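A rough sketch of the $\tau$ steps of this procedure is given below (assuming a recent statsmodels; the joint $\Phi$ steps 2 and 4 are omitted for brevity, so this is only an approximation to the full procedure). With maxlag=0 the adfuller function reproduces the simple DF regressions: 'ct' gives $\tau_\tau$, 'c' gives $\tau_\mu$ and 'n' gives $\tau$:

```python
# Walk down the tau ladder, stopping as soon as stationarity is indicated.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def sequential_df(y, alpha=0.05):
    for regression, name in (("ct", "tau_tau"), ("c", "tau_mu"), ("n", "tau")):
        stat, pvalue, *_ = adfuller(y, maxlag=0, regression=regression,
                                    autolag=None)
        print(f"{name}: statistic {stat:.2f}, p-value {pvalue:.3f}")
        if pvalue < alpha:                 # stationarity at this step
            return f"reject the unit root using {name}"
    return "unit root not rejected at any step"

rng = np.random.default_rng(4)
print(sequential_df(np.cumsum(rng.standard_normal(200))))
```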
It should be noted that this test procedure is influenced by the fact that including additional deterministic terms in the estimating model, beyond what is present in the true process, increases the chance of a type II error (accepting the null of a unit root when in fact the true process is stationary). That is, the power of the test against stationary alternatives decreases.
This can be seen by looking at the DF critical values: $\tau_\tau < \tau_\mu < \tau$. Suppose that the true process is given by $\Delta y_t = e_t$. For the lower-tailed test $H_0: \gamma = 0$ vs $H_A: \gamma < 0$, the ordering of the DF critical values means that it will be harder to reject the null of a unit root when estimation uses a model with drift and a trend than when it uses only a drift, and harder with a drift than when it uses neither.
Table 2.5: Critical values for the $\Phi_1$, $\Phi_2$ and $\Phi_3$ statistics by sample size, with standard-distribution critical values shown for comparison.
In practice we cannot always use the Dickey-Fuller tests which were described in the previous Section, because the assumptions required are too strong. Recall that in the basic Dickey-Fuller tests we were dealing with AR(1) processes with errors $e_t$ that were IID white noise. In reality there are complications which arise.
The first such complication is: what should we do if the $e_t$ are not IID white noise? In that instance the Dickey-Fuller critical values may not be valid. The second problem arises when the process follows a more general model than an AR(1) model: AR(k) models or mixed models with MA terms. In the next two sections we will examine both of these problems.
Suppose, for example, that the errors $e_t$ in the estimating equation
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + e_t \qquad (2.10)$$
actually follow the recursion $e_t = \theta_1 e_{t-1} + \dots + \theta_k e_{t-k} + \epsilon_t$, so that
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + \theta_1 e_{t-1} + \dots + \theta_k e_{t-k} + \epsilon_t. \qquad (2.11)$$
We now make use of the fact that in the true model $\Delta y_t = e_t$ (c.f. (2.9)) to rewrite Equation (2.11) as the AR(k) (c.f. Section I) process:
$$\Delta y_t = \gamma y_{t-1} + \alpha + \beta t + \theta_1 \Delta y_{t-1} + \dots + \theta_k \Delta y_{t-k} + \epsilon_t.$$
What about models with MA terms? More generally, how should we test the hypotheses (2.1) vs (2.2) for the general unit root model?
A possible approach is suggested by the fact that a general ARMA process can be approximated by an AR model of sufficiently high order to ensure white noise residuals. The usefulness of this approach was confirmed by Said and Dickey. They showed that an asymptotically valid unit root test for mixed models with AR and MA components is obtained if the data are analysed as if the process were an autoregressive model, where the order of the AR model is related to $n$, the sample size.
So both of the problems with the Dickey-Fuller test are solved if we can test for unit roots in AR(k) processes. Dickey and Fuller have developed such a test; it is called the Augmented Dickey-Fuller (ADF) test.
Consider the AR(k) process
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \dots + \rho_k y_{t-k} + e_t,$$
which can be re-written as
$$\Delta y_t = \rho^* y_{t-1} + \sum_{i=1}^{k-1} \rho_i^* \,\Delta y_{t-i} + e_t.$$
This version of the process is often called an Error Correction Mechanism (ECM) and Section 5.1 contains a detailed discussion of such models. We saw earlier that this process contains a unit root if $\rho^* = 0$ and is stationary if $\rho^* < 0$. Dickey and Fuller showed that in large samples the t-statistic $\hat\rho^*/\mathrm{se}(\hat\rho^*)$ follows the same distribution as the $\tau$-statistic in the Dickey-Fuller test.
Dickey and Fuller have also shown that in large samples the ADF versions of not just $\tau$ but of all the statistics $\tau_\mu$, $\tau_\tau$, $\Phi_1$, $\Phi_2$, $\Phi_3$ follow the same distributions as in the Dickey-Fuller case (c.f. Table 2.1).
As mentioned before, an ARMA model with unknown orders for the AR and MA components can be approximated by an AR(k) process, so long as $k$ is sufficiently large to ensure white noise residuals. The order $k$ will increase as the sample size increases; Schwert suggests using
$$k = \mathrm{int}\left[\, 12 \left( \frac{T}{100} \right)^{1/4} \right], \qquad (2.14)$$
where $T$ is the sample size.
Choosing the correct lag length is important. Including too few lags will mean that the errors $e_t$ will still be serially correlated, and this will increase the probability of a type I error. Including too many lags may reduce the power of the test, as the model will include too many unnecessary additional parameters. However it is better to include too many lags than too few: if we include too many, the regression can set the unnecessary ones to zero, at the cost of perhaps losing some efficiency.
Alternatively one could use (2.14) initially, and then fit ARIMA models, with the order of the AR part equal to $k - 1$, the order of integration $d$ equal to 1, and no MA terms, to the original data. That is, fit
$$\Delta y_t = \phi_1 \Delta y_{t-1} + \dots + \phi_{k-1} \Delta y_{t-k+1} + e_t \qquad (2.15)$$
to the data.
We then try to fit (2.15) with one less lag and use Lagrange Multiplier tests to check for white noise residuals. The Ljung-Box-Pierce statistic is appropriate here: it looks at the residuals as a group, testing for white noise. We compare a model with $k$ lags with one with $k - 1$ lags to see if the chosen $k$ is correct. We continue reducing the number of lags in the ARIMA model and stop when the Ljung-Box-Pierce statistic rejects white-noise residuals.
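A sketch of this recipe (assuming numpy and a recent statsmodels; the 0.05 level and the 10 Ljung-Box lags are arbitrary illustrative choices) is:

```python
# Start from the Schwert bound (2.14), fit ARIMA(k-1, 1, 0) models with
# decreasing k, and stop when Ljung-Box rejects white-noise residuals.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(5)
y = np.cumsum(rng.standard_normal(200))

T = len(y)
k = int(12 * (T / 100) ** 0.25)            # Schwert's rule (2.14)
chosen = 1
while k >= 1:
    resid = ARIMA(y, order=(k - 1, 1, 0)).fit().resid
    pval = acorr_ljungbox(resid, lags=[10])["lb_pvalue"].iloc[0]
    if pval < 0.05:          # rejection: the previous k was the last adequate one
        chosen = k + 1
        break
    chosen = k
    k -= 1
print("chosen lag length:", chosen)
```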
By analogy with equation (2.8) in the Dickey-Fuller test procedure, we see that the ADF procedure appropriately begins by estimating the following ECM:
$$\Delta y_t = \alpha + \beta t + \rho^* y_{t-1} + \sum_{i=1}^{k-1} \rho_i^* \,\Delta y_{t-i} + e_t. \qquad (2.16)$$
Having decided on an appropriate order for the autoregression, the rest of the Augmented Dickey-Fuller test procedure follows the same steps as in the basic Dickey-Fuller case. Refer to Table 2.3 on page 13 for details.
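In software the whole ADF regression (2.16) is available directly; for example, with statsmodels (the maxlag value below is an illustrative fixed choice rather than one selected by the procedure above):

```python
# regression='ct' estimates the ECM (2.16) with drift and trend; autolag=None
# with maxlag=k fixes the number of lagged differences at the chosen k.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(0.1 + rng.standard_normal(300))   # random walk with drift

res = adfuller(y, maxlag=4, regression="ct", autolag=None)
stat, crit = res[0], res[4]                     # statistic and critical values
print(f"ADF statistic {stat:.2f}; 5% critical value {crit['5%']:.2f}")
```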
3 Spurious Regressions
When analysing several time series and trying to establish relationships between them, it is important to be aware of the possibility of spurious regression. It is possible that two independent time series can appear to be related when in fact all that is happening is that there are correlated time trends. In Trend-Stationary series one should include a deterministic time trend in the regression in order to remove the trend effect. This will leave residuals which are stationary and allow valid statistical inferences using t or F tests.
But suppose we are dealing with Difference-Stationary series; in this case including a time trend in the model is not sufficient. Using standard regression techniques with non-stationary data will lead to spurious regressions, giving invalid inferences using t or F tests. An example will illustrate this. Consider the following two independent time series:
$$x_t = \phi x_{t-1} + u_t \qquad (3.4)$$
$$y_t = \phi y_{t-1} + v_t \qquad (3.5)$$
where $u_t$ and $v_t$ are independent white noise series. The two series $x_t$ and $y_t$ are unrelated, and estimation of the model
$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t \qquad (3.6)$$
should give the conclusion $\beta_1 = 0$. In reaching that conclusion we use the fact that $\hat\beta_1/\mathrm{se}(\hat\beta_1)$ should be distributed as a Student-t distribution with $N - 2$ degrees of freedom, where $N$ is the number of pairs of observations $(x_t, y_t)$. However non-stationarity in the models (3.4), (3.5) can lead to a non-stationary $\epsilon_t$, and the fact that both series are changing with $t$ will show up in the modelling as a correlation between the two series and as a non-zero estimate for $\beta_1$. So estimation of model (3.6) will imply a causal relationship between the series when in fact none is present.
none is present. To illustrate this spurious regression problem we have simulated
the series xt and yt when = 0.1 to give two pairs of stationary series and then
3. SPURIOUS REGRESSIONS 23
again with = 1 giving two pairs of non stationary series. Each of our simulated
series contained 30 observations.
Figure 3.1 contains time series plots of a simulated $x_t$ and $y_t$ when $\phi = 0.1$. It is clear from these plots that both of the simulated series are stationary.
Examining Figure 3.2 we can see that the series $x_t$ and $y_t$ do not display any correlation, as expected.
We now consider the series simulated with $\phi = 1$. Time series plots of $x_t$ and $y_t$ with $\phi = 1$ are shown in Figure 3.3 and clearly indicate that the series are non-stationary.
To further examine the nature of the spurious regressions we estimated (3.6) for the stationary pair of series and separately for the non-stationary pair, computing $\hat\beta_1/\mathrm{se}(\hat\beta_1)$ in each case. We repeated these simulations 10000 times; the results are summarised in Table 3.1.
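A sketch of this Monte-Carlo experiment (assuming numpy and statsmodels; the hard-coded 2.048 is the two-sided 5% Student-t critical value with 28 degrees of freedom) is given below. The rejection rate for the non-stationary pairs comes out far above the nominal 5%:

```python
# Regress one independent random walk (phi = 1) on another, n = 30, and
# count how often |t| for beta_1 exceeds the Student-t 5% critical value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, rejections = 30, 10_000, 0
tcrit = 2.048                                 # t(28), 97.5% point

for _ in range(reps):
    x = np.cumsum(rng.standard_normal(n))     # phi = 1: non-stationary
    y = np.cumsum(rng.standard_normal(n))     # independent of x
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    if abs(fit.tvalues[1]) > tcrit:
        rejections += 1

print(f"spurious 'significant' slopes: {100 * rejections / reps:.1f}%")
```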
4 Multivariate Time Series

Multivariate time series data are often modelled using Vector Autoregressive Moving Average (VARMA) models. These are a more general class of time series model and can be used to describe relationships among a number of time series variables (rather than focusing on the relationship between a single dependent variable and several independent variables, as we have discussed up to now).
For a stationary $M$-dimensional vector time series $Z_t = (Z_{1t}, Z_{2t}, \dots, Z_{Mt})'$, the mean vector $\mu$ and covariance matrix function are given by
$$E(Z_{rt}) = \mu_r \qquad (4.2)$$
and
$$\Gamma(k) = \mathrm{Cov}(Z_{t-k}, Z_t), \qquad (4.4)$$
where $\Gamma(k)$ is referred to as the covariance matrix function for $Z_t$: $\Gamma_{rr}(k)$ is the autocovariance function for $Z_{rt}$, and $\Gamma_{rs}(k)$ denotes the covariance function between $Z_{rt}$ and $Z_{st}$. Finally, $\Gamma(0)$ is the variance-covariance matrix at a given time.
The correlation matrix function for $Z_t$ is calculated using the matrix $D$, where $D$ is the diagonal matrix of the $M$ variances:
$$\rho(k) = D^{-1/2}\,\Gamma(k)\,D^{-1/2} = [\rho_{rs}(k)] \qquad (4.7)$$
for $r, s = 1, 2, \dots, M$. The $r$th diagonal element of the correlation matrix, $\rho_{rr}(k)$, represents the autocorrelation function for the $r$th series in $Z_t$, i.e. the ACF for $Z_{rt}$. The off-diagonal terms of the correlation matrix $\rho(k)$ are the cross-correlation functions between the corresponding series, e.g. $\rho_{rs}(k)$ is the cross-correlation function between $Z_{rt}$ and $Z_{st}$. Each element can also be calculated using the following formula:
$$\rho_{rs}(k) = \frac{\Gamma_{rs}(k)}{\sqrt{\Gamma_{rr}(0)\,\Gamma_{ss}(0)}}. \qquad (4.8)$$
Assumptions
It is important to note that the covariance and correlation matrix functions for a vector time series are non-negative definite, in the sense that
$$\sum_{r=1}^{T}\sum_{s=1}^{T} \alpha_r'\,\Gamma(t_r - t_s)\,\alpha_s \geq 0$$
and
$$\sum_{r=1}^{T}\sum_{s=1}^{T} \alpha_r'\,\rho(t_r - t_s)\,\alpha_s \geq 0 \qquad (4.9)$$
for any set of time points $t_1, t_2, \dots, t_T$ and any set of real vectors $\alpha_1, \alpha_2, \dots, \alpha_T$.
It should also be noted that, in general, $\Gamma_{rs}(k) \neq \Gamma_{rs}(-k)$ and $\rho_{rs}(k) \neq \rho_{rs}(-k)$. Instead,
$$\Gamma(k) = \Gamma'(-k)$$
and
$$\rho(k) = \rho'(-k), \qquad (4.11)$$
since
$$\Gamma_{rs}(k) = E[(Z_{r(t-k)} - \mu_r)(Z_{st} - \mu_s)] = E[(Z_{s(t+k)} - \mu_s)(Z_{rt} - \mu_r)] = \Gamma_{sr}(-k). \qquad (4.12)$$
The stationary vector time series $Z_t$ is called a linear process (or purely nondeterministic process) if it can be written as a linear combination of white noise random vectors:
$$Z_t = \mu + a_t + \Psi_1 a_{t-1} + \Psi_2 a_{t-2} + \dots = \mu + \sum_{u=0}^{\infty} \Psi_u a_{t-u}, \qquad (4.13)$$
with $\Psi_0 = I$, where the $a_t$ are $M$-dimensional white noise random vectors with mean zero and covariance matrix given by:
$$E(a_t a_{t+k}') = \begin{cases} \Sigma & \text{if } k = 0 \\ 0 & \text{if } k \neq 0. \end{cases} \qquad (4.14)$$
The process is invertible if it can be written in autoregressive form
$$\Pi(B)\dot{Z}_t = a_t, \qquad (4.16)$$
where
$$\Pi(B) = I - \sum_{u=1}^{\infty} \Pi_u B^u \qquad (4.17)$$
with absolutely summable coefficients,
$$\sum_{u=0}^{\infty} |\Pi_{rs,u}| < \infty, \qquad (4.18)$$
which requires
$$|\Pi(B)| \neq 0 \quad \text{for } |B| \leq 1. \qquad (4.20)$$
The most general class of models of the type discussed above is the vector autoregressive moving average models (or Vector ARMA(p, q) models). These models allow both moving average terms and autoregressive terms:
$$\Phi_p(B)\dot{Z}_t = \Theta_q(B)\,a_t$$
where:
$$\dot{Z}_t = Z_t - \mu,$$
$$\Phi_p(B) = \Phi_0 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p,$$
$$\Theta_q(B) = \Theta_0 - \Theta_1 B - \Theta_2 B^2 - \dots - \Theta_q B^q.$$
$\Phi_p(B)$ and $\Theta_q(B)$ are the autoregressive and moving average matrix polynomials (of order $p$ and $q$) respectively. $\Phi_0$ and $\Theta_0$ are non-singular $M$-dimensional square matrices. Without loss of generality, it can be assumed that $\Phi_0 = \Theta_0 = I_M$ (as long as the covariance matrix $\Sigma$ of $a_t$ is positive definite).
Clearly, when $p = 0$, this model becomes a vector moving average process of order $q$ (a vector MA(q) process):
$$\dot{Z}_t = \Theta_q(B)\,a_t.$$
The model is invertible if
$$|\Theta_q(B)| \neq 0 \quad \text{for } |B| \leq 1, \qquad (4.24)$$
i.e. the zeros of the determinantal polynomial $|\Theta_q(B)|$ are outside the unit circle. In such a case, the model can be re-written in the form:
$$\Pi(B)\dot{Z}_t = a_t \qquad (4.25)$$
where
$$\Pi(B) = [\Theta_q(B)]^{-1}\,\Phi_p(B) = I - \sum_{u=1}^{\infty} \Pi_u B^u. \qquad (4.26)$$
Model Identification
The identification process for a Vector ARMA(p, q) model is similar to that for a univariate time series. The first step is to compute the sample correlation matrix function:
$$\hat\rho(k) = [\hat\rho_{rs}(k)]. \qquad (4.30)$$
The $\hat\rho_{rs}(k)$ are calculated using the following formula (Equation (4.31)) and represent the sample cross-correlations between $Z_r$ and $Z_s$:
$$\hat\rho_{rs}(k) = \frac{\sum_{t=1}^{n-k} (Z_{rt} - \bar{Z}_r)(Z_{s(t+k)} - \bar{Z}_s)}{\left[\, \sum_{t=1}^{n} (Z_{rt} - \bar{Z}_r)^2 \;\sum_{t=1}^{n} (Z_{st} - \bar{Z}_s)^2 \right]^{1/2}} \qquad (4.31)$$
where $\bar{Z}_r$ and $\bar{Z}_s$ are the sample means of $Z_r$ and $Z_s$ respectively. It has been shown (Hannan REF) that the sample correlation function estimator $\hat\rho(k)$ is consistent and asymptotically Normally distributed, assuming that the vector process is stationary.
The sample correlation matrix function is used to identify the order of the (finite-order) moving average component of the ARMA model. This is due to the property that, for a vector MA(q) process, the correlation matrices beyond lag $q$ are zero.
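Equation (4.31) translates directly into code. A minimal sketch (assuming numpy; the white-noise example is illustrative) is:

```python
# Z is an n x M array holding the vector series; return the M x M sample
# cross-correlation matrix [rho_hat_rs(k)] of equation (4.31).
import numpy as np

def sample_cross_correlation(Z, k):
    n, M = Z.shape
    Zc = Z - Z.mean(axis=0)                 # subtract sample means
    ss = (Zc ** 2).sum(axis=0)              # sum of squares per series
    denom = np.sqrt(np.outer(ss, ss))
    num = Zc[: n - k].T @ Zc[k:]            # sum_t (Z_rt - mean)(Z_s,t+k - mean)
    return num / denom

rng = np.random.default_rng(8)
Z = rng.standard_normal((200, 2))           # white noise: rho(k) ~ 0 for k > 0
print(np.round(sample_cross_correlation(Z, 1), 3))
```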
The order of the autoregressive component is identified using a partial autoregression function, by analogy with the univariate PACF. In its definition, $\hat{Z}_t$ and $\hat{Z}_{t+k}$ are the linear estimators of $Z_t$ and $Z_{t+k}$ calculated by minimum mean squared error linear regression on $Z_{t+1}, Z_{t+2}, \dots, Z_{t+k-1}$. This function, $\phi_{kk}$, is zero for $|k| > p$, where $p$ is the number of autoregressive terms required by the underlying model.
5 Cointegration
But since the differenced series are stationary we may decide that these differenced series are related by a regression model:
$$\Delta C_t = \beta_0\,\Delta I_t + \epsilon_t. \qquad (5.1)$$
In this model increasing $I$ by one unit per period will increase $C$ by $\beta_0$ units. Statistically this model is sound, but in economics it may not be so reasonable.
In particular, one might think that the relationship between the increase in Consumption given an increase in Income should also depend on the current level of Income. One reason for this might be that if one earns a lot then any increase in income need not be saved for necessities but could instead be spent freely, whereas if one is not in a high income bracket then extra income may not be so liberally consumed.
Economists are also generally interested in systems reaching equilibrium, and the model (5.1) does not include an equilibrium solution. In equilibrium we would have $C_t = C_{t-1} = \dots$ and $I_t = I_{t-1} = \dots$.
One way to try and fix these problems is to include a term which is the deviation between the actual value of $C$ in the previous period $t-1$ and the equilibrium value of $C$,
$$C_t^{equil} = \gamma I_t. \qquad (5.2)$$
This type of model is called an Error Correction Model (ECM), as it has the ability to correct disequilibria:
$$\Delta C_t = \beta_0\,\Delta I_t - (1 - \theta_1)(C_{t-1} - \gamma I_{t-1}) + \epsilon_t. \qquad (5.4)$$
If $C_t$ increases more slowly than expected then we will find $(C_{t-1} - \gamma I_{t-1}) < 0$, but $-(1 - \theta_1) < 0$ also. So the net effect is to add a positive term to the equilibrium value $\beta_0\,\Delta I_t$, thus boosting $\Delta C_t$ and forcing $C_t$ back towards equilibrium.
If $C_t$ increases faster than expected then we are instead adding a negative term, which again forces $C_t$ back towards its equilibrium value.
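A small simulation sketch (assuming numpy; $\beta_0 = 0.5$, $\gamma = 1$ and $\theta_1 = 0.7$ are arbitrary illustrative values) shows the correction mechanism at work: however far $C$ starts from equilibrium, it is pulled back towards $\gamma I_t$:

```python
# Simulate the ECM (5.4): the disequilibrium C_t - gamma*I_t shrinks over time.
import numpy as np

rng = np.random.default_rng(9)
n, beta0, gamma, theta1 = 200, 0.5, 1.0, 0.7
I = np.cumsum(0.2 + 0.1 * rng.standard_normal(n))   # trending income
C = np.empty(n)
C[0] = gamma * I[0] + 5.0                            # start far from equilibrium

for t in range(1, n):
    dC = beta0 * (I[t] - I[t - 1]) \
         - (1 - theta1) * (C[t - 1] - gamma * I[t - 1])
    C[t] = C[t - 1] + dC + 0.05 * rng.standard_normal()

print("initial disequilibrium:", C[0] - gamma * I[0])
print("final disequilibrium:  ", C[-1] - gamma * I[-1])
```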
The original model (5.1) did have problems as far as economics was concerned; however, (5.1) was a relationship between stationary variables and so was statistically sound.
The new model (5.4) may make more economic sense; however, it now has a statistical problem: it only makes sense if the new variable $C_{t-1} - \gamma I_{t-1}$ is stationary. But this variable is a linear combination of the two non-stationary I(1) variables $C$ and $I$ at time $t-1$, and such a combination will also, in general, be non-stationary I(1).
We note that we can also generalize the ECM model (5.4), including more lag lengths, to the following relationship linking the variables $C_t$ and $I_t$:
$$A(L)\,C_t = B(L)\,I_t + \epsilon_t$$
where
$$A(L) = 1 - \alpha_1 L - \alpha_2 L^2 - \dots - \alpha_k L^k,$$
$$B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + \dots + \beta_q L^q.$$
Collecting the variables into a vector $y_t$, such dynamic models lead to a VAR, $y_t = \Phi_1 y_{t-1} + \dots + \Phi_k y_{t-k} + \epsilon_t$. As in the one dimensional case (c.f. (??) and (??)) this VAR model can be re-formulated as a Vector Error Correction Model (VECM):
$$\Delta y_t = \Pi y_{t-1} + \Gamma_1 \Delta y_{t-1} + \dots + \Gamma_{k-1}\Delta y_{t-k+1} + \epsilon_t \qquad (5.7)$$
where
$$\Gamma_i = -(\Phi_k + \Phi_{k-1} + \dots + \Phi_{i+1}), \quad i = 1, 2, \dots, k-1$$
and
$$\Pi = \Phi_1 + \dots + \Phi_k - I.$$
An equivalent formulation puts the levels term at lag $k$:
$$\Delta y_t = \Pi y_{t-k} + \Gamma_1' \Delta y_{t-1} + \dots + \Gamma_{k-1}' \Delta y_{t-k+1} + \epsilon_t \qquad (5.8)$$
where
$$\Gamma_i' = -(I - \Phi_1 - \Phi_2 - \dots - \Phi_i), \quad i = 1, 2, \dots, k-1$$
and, as before,
$$\Pi = \Phi_1 + \dots + \Phi_k - I.$$
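The mapping from VAR coefficients to the VECM matrices is a purely mechanical computation. A small numerical sketch (assuming numpy; the $\Phi$ matrices are arbitrary illustrative values) for $k = 2$ is given below; the rank of $\Pi$ is the quantity examined in the cointegration analysis that follows:

```python
# Compute Pi = Phi_1 + Phi_2 - I and Gamma_1 = -Phi_2 for a VAR(2).
import numpy as np

Phi1 = np.array([[0.6, 0.2], [0.1, 0.7]])
Phi2 = np.array([[0.3, -0.1], [0.2, 0.1]])

Pi = Phi1 + Phi2 - np.eye(2)
Gamma1 = -Phi2                      # Gamma_1 = -(Phi_2) when k = 2
print("Pi =\n", Pi)
print("rank of Pi:", np.linalg.matrix_rank(Pi))
```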
In this section we introduced the ECM, but we have seen that this model appears not to make sense statistically, as it involves both I(1) and I(0) variables together in the same regression. The solution to this problem was presented by Engle and Granger when they introduced the concept of cointegration, which we will examine in the next section of this chapter.
5.3 Cointegration
Consider two series $y_{1t}$ and $y_{2t}$ which are both integrated of order $d$, i.e. $\sim I(d)$. In general any linear combination of these series will also be integrated of order $d$. In particular, if a regression is performed of $y_{1t}$ on $y_{2t}$, then the residuals from this regression will be I(d), i.e. the regression will suffer from spurious correlation. Engle and Granger noticed that in some situations it might be possible to perform a regression containing non-stationary variables and still avoid spurious regression. They introduced the concept of cointegration:
The components of $y_t$ are said to be co-integrated of order $(d, b)$, written $y_t \sim CI(d, b)$, if there exists a vector $\gamma$ such that
$$Z_t = \gamma' y_t \sim I(d - b), \quad d \geq b > 0.$$
If $(\gamma_1, \gamma_2, \dots, \gamma_h)$ span the co-integrating space then they form a basis for the co-integrating space.
Having seen the formal definition of cointegration, let us consider what it means in practice. Cointegration means that although there may be many apparently independent changes in the individual elements of $y_t$, there are actually some long-run equilibrium relations tying the individual components together. These relations are represented by the linear combinations $\gamma' y_t$. So cointegration provides a model for the idea in economics of a long-run equilibrium to which the system will converge over time.
We see now that if two variables are fully co-integrated, $CI(d, d)$, it is possible to perform a meaningful regression between them: the regression would pick up the stationary linear combination and the residuals would no longer be non-stationary, thus eliminating the problem of spurious regression. In practice we mainly deal with I(1) variables and seek to find co-integrating linear combinations which will be stationary.
When the concept of cointegration was introduced by Engle and Granger, they suggested a procedure to test for cointegration among two variables. If two variables $y_t$ and $x_t$ are co-integrated then there is a stationary linear combination of the variables. This means that the model:
$$y_t = \beta x_t + e_t \qquad (5.9)$$
describes a stationary relationship, does not suffer from spurious regression and can be consistently estimated by ordinary least squares.
Now, if the two variables $y_t$ and $x_t$ are not co-integrated then there will not be a stationary linear combination of the two variables, and hence equation (5.9) would once again suffer from spurious regression, as the residuals will be non-stationary. Engle and Granger make use of this fact to construct a test for cointegration. They suggest using an ADF test on the residuals $e_t$ of the regression (5.9) to see if they satisfy the null of being I(1) or the alternative of stationarity, I(0).
So, as described in Section 2.3, we should estimate:
$$\Delta e_t = \rho^* e_{t-1} + \sum_{i=1}^{k-1} \rho_i^*\,\Delta e_{t-i} + \mu + \beta t + \omega_t, \qquad \omega_t \sim \text{IID white noise.} \qquad (5.10)$$
The trend and drift terms can be added in the regression (5.9) or in (5.10), but not in both.
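This two-step recipe is implemented in statsmodels as coint(), which runs the OLS regression (5.9) and then the residual-based ADF-type test with appropriate (non-Dickey-Fuller) critical values. A minimal sketch (assuming numpy and statsmodels; the simulated series are illustrative) is:

```python
# Engle-Granger test: two series sharing a common I(1) trend are cointegrated.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(10)
n = 300
x = np.cumsum(rng.standard_normal(n))       # common I(1) trend
y = 2.0 * x + rng.standard_normal(n)        # cointegrated with x

stat, pvalue, crit = coint(y, x, trend="c")
print(f"EG statistic {stat:.2f}, p-value {pvalue:.4f}, 5% cv {crit[1]:.2f}")
```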
This Engle-Granger procedure for testing for cointegration suffers from many problems. In addition, this approach, which uses a single equation in the model, is only really suitable if there is just one cointegrating relationship. In general, the multivariate Vector Auto Regression (VAR) approach of Johansen is to be preferred.