
Cointegration and Error Correction Model

Bidyut Kumar Ghosh


Department of Commerce
Manipal Academy of Higher Education
email: bidyut.ghosh@manipal.edu

Tuesday, July 13, 2021, 10:42:03 AM IST

Contents

Introduction
Cointegration: a more mathematical approach
  Definition 1
  Definition 2
Cointegration and the error-correction mechanism (ECM): A General Approach
  The Problem
  Cointegration (again)
  The error-correction model (ECM)
  Advantages of the ECM
Cointegration and the error-correction mechanism: A More Mathematical Approach
  A simple model for only one lagged term of 𝑋 and 𝑌
  Proof of ECM as reparametrization of the ARDL
  A more general model for large numbers of lagged terms
Testing for cointegration
  Cointegration in single equations: the Engle-Granger approach
  Drawbacks of the EG approach
Numerical Example: Is the regression of consumption expenditure on disposable income spurious?
  Visualisation of the data
  Performing ADF test
  When may a spurious regression not be spurious?
  The EG and AEG tests
  Unit root tests and cointegration tests
  Cointegration and error correction mechanism (ECM)
Cointegration in multiple equations and the Johansen approach
Advantages of the multiple-equation approach
The Johansen approach
  Case 1
  Case 2
  Case 3
The steps of the Johansen approach in practice
  Step 1: testing the order of integration of the variables
  Step 2: Setting the appropriate lag length of the model
  Step 3: Choosing the appropriate model regarding the deterministic components in the multivariate system
  Step 4: Determining the rank of Π or the number of cointegrating vectors
  Step 5: Testing for weak exogeneity
  Step 6: Testing for linear restrictions in the cointegrating vectors
Numerical Example:
  Importing the data

Introduction
The basic idea of cointegration follows from our explanation of spurious regression where we have seen that
if the two variables are non-stationary we can represent the error as a combination of two cumulated error
processes. These cumulated error processes are often called stochastic trends and normally we would expect
them to combine to produce another non-stationary process.
However, in the special case that 𝑋 and 𝑌 are in fact related we would expect them to move together so
the two stochastic trends would be very similar. When we put them together it should be possible to find a
combination of them that eliminates the nonstationarity. In this special case we say that the variables are
cointegrated. In theory, this should only happen when there is truly a relationship linking the two variables,
so cointegration becomes a very powerful way of detecting the presence of economic structures.
Cointegration then becomes an overriding requirement for any economic model using non-stationary time
series data. If the variables do not cointegrate we have problems of spurious regression and econometric
work becomes almost meaningless. On the other hand, if the stochastic trends do cancel then we have
cointegration and, as we shall see later, everything works even more effectively than we previously might
have thought.
The key point here is that, if there really is a genuine long-run relationship between 𝑌𝑡 and 𝑋𝑡 , then despite
the variables rising over time (because they are trended), there will be a common trend that links them
together. For an equilibrium or long-run relationship to exist, what we require, then, is a linear combination
of 𝑌𝑡 and 𝑋𝑡 that is a stationary variable (an 𝐼(0) variable). A linear combination of 𝑌𝑡 and 𝑋𝑡 can be taken
directly from estimating the following regression:

𝑌𝑡 = 𝛽1 + 𝛽2 𝑋𝑡 + 𝑢𝑡 (1)

and taking the residuals:

𝑢̂𝑡 = 𝑌𝑡 − 𝛽1̂ − 𝛽2̂ 𝑋𝑡 (2)

If 𝑢̂𝑡 ∼ 𝐼(0) then the variables 𝑌𝑡 and 𝑋𝑡 are said to be cointegrated.
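To fix ideas, this two-step logic can be sketched in R with the urca package (used throughout this note); here y and x are placeholders for any two 𝐼(1) series, so this is an illustrative sketch rather than a complete analysis:
library(urca)
# Estimate the candidate long-run relationship (1) by OLS
longrun <- lm(y ~ x)
# Take the residuals u-hat of Equation (2) and test them for a unit root;
# if u-hat ~ I(0), y and x are cointegrated
uhat <- resid(longrun)
summary(ur.df(uhat, type = "none", lags = 1))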

Cointegration: a more mathematical approach


To put it differently, consider a set of two variables {𝑌 , 𝑋} that are integrated of order 1 (that is {𝑌 , 𝑋} ∼
𝐼(1)) and suppose that there is a vector {𝜃1 , 𝜃2 } that gives a linear combination of {𝑌 , 𝑋} which is stationary,
denoted by:

𝜃1 𝑌𝑡 + 𝜃2 𝑋𝑡 = 𝑢𝑡 ∼ 𝐼(0) (3)

then the variable set {𝑌 , 𝑋} is called the cointegration set, and the coefficients vector {𝜃1 , 𝜃2 } is called the
cointegration vector. What we are interested in is the long-run relationship, which for 𝑌𝑡 is:

𝑌𝑡∗ = 𝛽𝑋𝑡 (4)

To see how this comes from the cointegration method, we can normalize Equation (3) for 𝑌𝑡 to give:

𝑌𝑡 = −(𝜃2/𝜃1)𝑋𝑡 + 𝑒𝑡 (5)

where now 𝑌𝑡∗ = −(𝜃2/𝜃1)𝑋𝑡, which can be interpreted as the long-run or equilibrium value of 𝑌𝑡 (conditional
on the values of 𝑋𝑡 ). We shall return to this point when discussing the error-correction mechanism.
For bivariate economic 𝐼(1) time series processes, cointegration often manifests itself by more or less par-
allel plots of the series involved. As noted earlier, we are interested in detecting long-run or equilibrium
relationships and this is mainly what the concept of cointegration allows.
Working in the context of a bi-variate system with at most one cointegrating vector, Engle and Granger
(1987) give the formal definition of cointegration between two variables as follows:

Definition 1
Time series 𝑌𝑡 and 𝑋𝑡 are said to be cointegrated of order 𝑑, 𝑏 where 𝑑 ≥ 𝑏 ≥ 0, written as 𝑌𝑡 , 𝑋𝑡 ∼ 𝐶𝐼(𝑑, 𝑏),
if (a) both series are integrated of order 𝑑, and (b) there exists a linear combination of these variables, say
𝛽1 𝑌𝑡 + 𝛽2 𝑋𝑡 which is integrated of order 𝑑 − 𝑏. The vector {𝛽1 , 𝛽2 } is called the cointegrating vector.
A simple generalization of the above definition can be made for the case of 𝑛 variables, as follows:

Definition 2
If 𝑍𝑡 denotes an 𝑛 × 1 vector of series 𝑍1𝑡 , 𝑍2𝑡 , 𝑍3𝑡 , ⋯ , 𝑍𝑛𝑡 and (a) each 𝑍𝑖𝑡 is 𝐼(𝑑); and (b) there exists an
𝑛 × 1 vector 𝛽 such that 𝑍𝑡′ 𝛽 ∼ 𝐼(𝑑 − 𝑏), then 𝑍𝑡 ∼ 𝐶𝐼(𝑑, 𝑏).
For empirical econometrics, the most interesting case is where the series transformed with the use of the
cointegrating vector become stationary; that is, when 𝑑 = 𝑏, and the cointegrating coefficients can be
identified as parameters in the long-run relationship between the variables.
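A small simulation makes Definition 1 concrete. In the following R sketch (entirely made-up data), 𝑌 and 𝑋 are both 𝐼(1) because they share one stochastic trend, and the vector {1, −2} is a cointegrating vector because it eliminates that trend:
set.seed(123)
trend <- cumsum(rnorm(200))   # a common stochastic trend (random walk)
x <- trend + rnorm(200)       # x ~ I(1)
y <- 2 * trend + rnorm(200)   # y ~ I(1)
z <- y - 2 * x                # the combination {1, -2}: the trend cancels, z ~ I(0)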

Cointegration and the error-correction mechanism (ECM): A General Approach
The Problem
As mentioned earlier, when there are non-stationary variables in a regression model we may get results that
are spurious.
So, if 𝑌𝑡 and 𝑋𝑡 are both 𝐼(1) and we regress:

𝑌𝑡 = 𝛽1 + 𝛽2 𝑋𝑡 + 𝑢𝑡 (6)

we will not generally get satisfactory estimates of 𝛽1̂ and 𝛽2̂ .


One way of resolving this is to difference the data to ensure stationarity of our variables. After doing this,

Δ𝑌𝑡 ∼ 𝐼(0)

and

Δ𝑋𝑡 ∼ 𝐼(0)

and the regression model will be:

Δ𝑌𝑡 = 𝑎1 + 𝑎2 Δ𝑋𝑡 + Δ𝑢𝑡 (7)

In this case, the regression model may give us correct estimates of the 𝑎1̂ and 𝑎2̂ parameters and the spurious
equation problem has been resolved.
However, what we have from Equation (7) is only the short-run relationship between the two variables.
Remember that, in the long-run relationship:

𝑌𝑡∗ = 𝛽1 + 𝛽2 𝑋𝑡 (8)

so Δ𝑌𝑡 is bound to give us no information about the long-run behaviour of our model. Knowing that
economists are interested mainly in long-run relationships, this constitutes a big problem, and the concept
of cointegration and the ECM are very useful to resolve this.

Cointegration (again)
We noted earlier that 𝑌𝑡 and 𝑋𝑡 are both 𝐼(1). In the special case that there is a linear combination of 𝑌𝑡
and 𝑋𝑡 that is 𝐼(0), then 𝑌𝑡 and 𝑋𝑡 are cointegrated. Thus, if this is the case, the regression of Equation
(6) is no longer spurious, and it also provides us with the linear combination:

𝑢̂𝑡 = 𝑌𝑡 − 𝛽1̂ − 𝛽2̂ 𝑋𝑡 (9)

which connects 𝑌𝑡 and 𝑋𝑡 in the long-run.

The error-correction model (ECM)


If, then, 𝑌𝑡 and 𝑋𝑡 are cointegrated, by definition 𝑢̂𝑡 ∼ 𝐼(0). Thus we can express the relationship between
𝑌𝑡 and 𝑋𝑡 with an ECM specification as:

Δ𝑌𝑡 = 𝑎0 + 𝑏1 Δ𝑋𝑡 − 𝜋𝑢̂𝑡−1 + 𝑒𝑡 (10)

which will now have the advantage of including both long-run and short-run information. In this model,
𝑏1 is the impact multiplier (the short-run effect) that measures the immediate impact a change in 𝑋𝑡 will
have on a change in 𝑌𝑡 . On the other hand, 𝜋 is the feedback effect, or the adjustment effect, and shows
how much of the disequilibrium is being corrected – that is, the extent to which any disequilibrium in the
previous period affects any adjustment in 𝑌𝑡. Of course 𝑢̂𝑡−1 = 𝑌𝑡−1 − 𝛽1̂ − 𝛽2̂𝑋𝑡−1 and, therefore, from this
equation 𝛽2 is also the long-run response (note that it is estimated by the levels regression, Equation (6)).
Equation (10) now emphasizes the basic approach of the cointegration and error-correction models. The
spurious regression problem arises because we are using non-stationary data, but in Equation (10) everything
is stationary, the change in 𝑋 and 𝑌 is stationary because they are assumed to be 𝐼(1) variables, and the
residual from the levels regression (9) is also stationary, by the assumption of cointegration. So Equation (10)
fully conforms to our set of assumptions about the classic linear regression model and OLS should perform
well.
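A minimal sketch of estimating Equation (10) in R, assuming y and x are cointegrated 𝐼(1) series and uhat holds the residuals from the levels regression (9):
uhat <- resid(lm(y ~ x))      # residuals of the levels regression (9)
n <- length(y)
# ECM (10): the change in y on the change in x and the lagged disequilibrium
ecm <- lm(diff(y) ~ diff(x) + uhat[1:(n - 1)])
summary(ecm)                  # the coefficient on the lagged residual estimates -pi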

Advantages of the ECM
The ECM is important and popular for many reasons:
1. First, it is a convenient model for measuring the correction from disequilibrium of the previous period,
and it has a clear economic interpretation.
2. Second, if we have cointegration, ECMs are formulated in terms of first differences, which typically
eliminate trends from the variables involved, and they resolve the problem of spurious regressions.
3. A third, very important, advantage of ECMs is the ease with which they can fit into the general to
specific approach to econometric modelling, which is in fact a search for the most parsimonious ECM
model that best fits the given data sets.
4. Finally, the fourth and most important feature of the ECM comes from the fact that the disequilibrium
error term is a stationary variable (by definition of cointegration). Because of this, the ECM has
important implications: the fact that the two variables are cointegrated implies that there is some
adjustment process preventing the errors in the long-run relationship from becoming larger and larger.

Cointegration and the error-correction mechanism: A More Mathematical Approach
A simple model for only one lagged term of 𝑋 and 𝑌
The concepts of cointegration and the error-correction mechanism (ECM) are very closely related. To
understand the ECM it is better to think of it first as a convenient reparametrization of the general linear
autoregressive distributed lag (ARDL) model. Consider the very simple dynamic ARDL model describing
the behavior of 𝑌 in terms of 𝑋, as follows:

𝑌𝑡 = 𝑎0 + 𝑎1 𝑌𝑡−1 + 𝛾0 𝑋𝑡 + 𝛾1 𝑋𝑡−1 + 𝑢𝑡 (11)

where the residuals 𝑢𝑡 ∼ 𝑖𝑖𝑑(0, 𝜎²).


In this model the parameter 𝛾0 denotes the short-run reaction of 𝑌𝑡 after a change in 𝑋𝑡 . The long-run effect
is given when the model is in equilibrium, where:

𝑌𝑡∗ = 𝛽0 + 𝛽1 𝑋𝑡∗ (12)

and for simplicity assume that

𝑋𝑡∗ = 𝑋𝑡 = 𝑋𝑡−1 = ⋯ = 𝑋𝑡−𝑝 (13)

Thus, it is given by:

𝑌𝑡∗ = 𝑎0 + 𝑎1𝑌𝑡∗ + 𝛾0𝑋𝑡∗ + 𝛾1𝑋𝑡∗ + 𝑢𝑡
𝑌𝑡∗(1 − 𝑎1) = 𝑎0 + (𝛾0 + 𝛾1)𝑋𝑡∗ + 𝑢𝑡
𝑌𝑡∗ = 𝑎0/(1 − 𝑎1) + ((𝛾0 + 𝛾1)/(1 − 𝑎1))𝑋𝑡∗ + 𝑢𝑡
𝑌𝑡∗ = 𝛽0 + 𝛽1𝑋𝑡∗ + 𝑢𝑡 (14)

So the long-run elasticity between 𝑌 and 𝑋 is captured by 𝛽1 = (𝛾0 + 𝛾1)/(1 − 𝑎1). Here, we need to make
the assumption that |𝑎1| < 1 so that the short-run model in Equation (11) converges to a long-run solution.
We can then derive the ECM, which is a reparametrization of the original Equation (11) model:

Δ𝑌𝑡 = 𝛾0Δ𝑋𝑡 − (1 − 𝑎1)[𝑌𝑡−1 − 𝛽0 − 𝛽1𝑋𝑡−1] + 𝑢𝑡 (15)

Δ𝑌𝑡 = 𝛾0 Δ𝑋𝑡 − 𝜋 [𝑌𝑡−1 − 𝛽0 − 𝛽1 𝑋𝑡−1 ] + 𝑢𝑡 (16)

What is of importance here is that when the two variables 𝑌 and 𝑋 are cointegrated, the ECM incorporates
not only short-run but also long-run effects. This is because the long-run equilibrium 𝑌𝑡−1 − 𝛽0 − 𝛽1 𝑋𝑡−1
is included in the model together with the short-run dynamics captured by the differenced term. Another
important advantage is that all the terms in the ECM model are stationary, and standard OLS is therefore
valid. This is because if 𝑌 and 𝑋 are 𝐼(1), then Δ𝑌 and Δ𝑋 are 𝐼(0), and by definition if 𝑌 and 𝑋 are
cointegrated then their linear combination (𝑌𝑡−1 − 𝛽0 − 𝛽1 𝑋𝑡−1 ) ∼ 𝐼(0).
A final, very important, point is that the coefficient 𝜋 = (1 − 𝑎1 ) provides us with information about the
speed of adjustment in cases of disequilibrium. To understand this better, consider the long-run condition.
When equilibrium holds, then (𝑌𝑡−1 − 𝛽0 − 𝛽1 𝑋𝑡−1 ) = 0. However, during periods of disequilibrium, this
term will no longer be zero and measures the distance the system is away from equilibrium.
For example, suppose that because of a series of negative shocks in the economy (captured by the error term
𝑢𝑡 ) 𝑌𝑡 increases less rapidly than is consistent with Equation (14). This causes (𝑌𝑡−1 − 𝛽0 − 𝛽1 𝑋𝑡−1 ) to be
negative, because 𝑌𝑡−1 has moved below its long-run steady-state growth path. However, since 𝜋 = (1 − 𝑎1 )
is positive (and because of the minus sign in front of 𝜋) the overall effect is to boost Δ𝑌𝑡 back towards
its long-run path as determined by 𝑋𝑡 in Equation (14). The speed of this adjustment to equilibrium is
dependent on the magnitude of (1 − 𝑎1 ).
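To illustrate, the long-run coefficient and the adjustment speed can be recovered from OLS estimates of the ARDL(1,1) in Equation (11); the sketch below uses generic series y and x:
n <- length(y)
# ARDL(1,1) of Equation (11): y on its own lag, current x and lagged x
ardl <- lm(y[2:n] ~ y[1:(n - 1)] + x[2:n] + x[1:(n - 1)])
b <- coef(ardl)                        # (a0, a1, gamma0, gamma1)
beta1 <- (b[3] + b[4]) / (1 - b[2])    # long-run effect (gamma0 + gamma1)/(1 - a1)
pi_hat <- 1 - b[2]                     # speed of adjustment in Equation (16)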

Proof of ECM as reparametrization of the ARDL


To show that this is the same as the original model, substitute the long-run solutions for 𝛽0 = 𝑎0 /(1 − 𝑎1 )
and 𝛽1 = (𝛾0 + 𝛾1 )/(1 − 𝑎1 ) to give:

Δ𝑌𝑡 = 𝛾0Δ𝑋𝑡 − (1 − 𝑎1)[𝑌𝑡−1 − 𝑎0/(1 − 𝑎1) − ((𝛾0 + 𝛾1)/(1 − 𝑎1))𝑋𝑡−1] + 𝑢𝑡 (17)

Δ𝑌𝑡 = 𝛾0Δ𝑋𝑡 − (1 − 𝑎1)𝑌𝑡−1 + 𝑎0 + (𝛾0 + 𝛾1)𝑋𝑡−1 + 𝑢𝑡 (18)

𝑌𝑡 − 𝑌𝑡−1 = 𝛾0𝑋𝑡 − 𝛾0𝑋𝑡−1 − 𝑌𝑡−1 + 𝑎1𝑌𝑡−1 + 𝑎0 + 𝛾0𝑋𝑡−1 + 𝛾1𝑋𝑡−1 + 𝑢𝑡 (19)

and by rearranging and cancelling out terms that are added and subtracted at the same time we get:

𝑌𝑡 = 𝑎0 + 𝑎1 𝑌𝑡−1 + 𝛾0 𝑋𝑡 + 𝛾1 𝑋𝑡−1 + 𝑢𝑡 (20)

which is the same as for the original model.

A more general model for large numbers of lagged terms


Consider the following two-variable 𝑌𝑡 and 𝑋𝑡 ARDL:

𝑌𝑡 = 𝜇 + ∑ᵢ₌₁ⁿ 𝑎𝑖𝑌𝑡−𝑖 + ∑ᵢ₌₀ᵐ 𝛾𝑖𝑋𝑡−𝑖 + 𝑢𝑡 (21)

𝑌𝑡 = 𝜇 + 𝑎1 𝑌𝑡−1 + ⋯ ⋯ + 𝑎𝑛 𝑌𝑡−𝑛 + 𝛾0 𝑋𝑡 + 𝛾1 𝑋𝑡−1 + ⋯ ⋯ + 𝛾𝑚 𝑋𝑡−𝑚 + 𝑢𝑡 (22)

We want to obtain a long-run solution of the model, which would be defined as the point where 𝑌𝑡 and 𝑋𝑡
settle down to constant steady-state levels 𝑌 ∗ and 𝑋 ∗ , or more simply when:

𝑌 ∗ = 𝛽 0 + 𝛽1 𝑋 ∗ (23)

and again assume 𝑋 ∗ is constant

𝑋 ∗ = 𝑋𝑡 = 𝑋𝑡−1 = ⋯ = 𝑋𝑡−𝑚

So, putting this condition into Equation (21), we get the long-run solution, as:

𝑌∗ = 𝜇/(1 − ∑𝑎𝑖) + (∑𝛾𝑖/(1 − ∑𝑎𝑖))𝑋∗

𝑌∗ = 𝜇/(1 − 𝑎1 − 𝑎2 − ⋯ − 𝑎𝑛) + ((𝛾0 + 𝛾1 + ⋯ + 𝛾𝑚)/(1 − 𝑎1 − 𝑎2 − ⋯ − 𝑎𝑛))𝑋∗ (24)

which can be written as:

𝑌 ∗ = 𝐵 0 + 𝐵1 𝑋 ∗ (25)

which means we can define 𝑌 ∗ conditional on a constant value of 𝑋 at time 𝑡 as:

𝑌 ∗ = 𝐵0 + 𝐵1 𝑋𝑡 (26)

Here there is an obvious link to the discussion of cointegration in the previous section. Defining 𝑒𝑡 as the
equilibrium error as in Equation (4), we get:

𝑒𝑡 ≡ 𝑌𝑡 − 𝑌∗ = 𝑌𝑡 − 𝐵0 − 𝐵1𝑋𝑡 (27)

Therefore, what we need is to be able to estimate the parameters 𝐵0 and 𝐵1. Clearly, they can be derived
by estimating Equation (21) by OLS and then calculating 𝐵0 = 𝜇/(1 − ∑𝑎𝑖) and 𝐵1 = ∑𝛾𝑖/(1 − ∑𝑎𝑖).
However, the results obtained by this method are not transparent, and calculating the standard errors will
be very difficult. The ECM specification cuts through all these difficulties. Take the following
model, which (although it looks quite different) is a reparametrization of Equation (21):

Δ𝑌𝑡 = 𝜇 + ∑ᵢ₌₁ⁿ⁻¹ 𝑎𝑖Δ𝑌𝑡−𝑖 + ∑ᵢ₌₀ᵐ⁻¹ 𝛾𝑖Δ𝑋𝑡−𝑖 + 𝜃1𝑌𝑡−1 + 𝜃2𝑋𝑡−1 + 𝑢𝑡 (28)

Note that for 𝑛 = 1 the second term on the right-hand side of Equation (28) disappears. From this equation
we can see, with a bit of mathematics, that:

𝜃2 = ∑ᵢ₌₀ᵐ 𝛾𝑖 (29)

which is the numerator of the long-run parameter, 𝐵1 , and that:

𝜃1 = −(1 − ∑ᵢ₌₁ⁿ 𝑎𝑖) (30)

So the long-run parameter 𝐵0 is given by 𝐵0 = −𝜇/𝜃1 and the long-run parameter 𝐵1 = −𝜃2/𝜃1.

Therefore, the level terms of 𝑌𝑡 and 𝑋𝑡 in the ECM tell us exclusively about the long-run parameters. Given
this, the most informative way to write the ECM is as follows:

Δ𝑌𝑡 = ∑ᵢ₌₁ⁿ⁻¹ 𝑎𝑖Δ𝑌𝑡−𝑖 + ∑ᵢ₌₀ᵐ⁻¹ 𝛾𝑖Δ𝑋𝑡−𝑖 + 𝜃1(𝑌𝑡−1 + (𝜇/𝜃1) + (𝜃2/𝜃1)𝑋𝑡−1) + 𝑢𝑡 (31)

Δ𝑌𝑡 = ∑ᵢ₌₁ⁿ⁻¹ 𝑎𝑖Δ𝑌𝑡−𝑖 + ∑ᵢ₌₀ᵐ⁻¹ 𝛾𝑖Δ𝑋𝑡−𝑖 − 𝜋(𝑌𝑡−1 − 𝐵0̂ − 𝐵1̂𝑋𝑡−1) + 𝑢𝑡 (32)

where 𝜋 = −𝜃1 ≠ 0 and the constant 𝜇 is absorbed into 𝐵0 = −𝜇/𝜃1. Furthermore, knowing that 𝑌𝑡−1 − 𝐵0̂ − 𝐵1̂𝑋𝑡−1 = 𝑒𝑡̂, our equilibrium error, we can rewrite Equation (32) as:

Δ𝑌𝑡 = ∑ᵢ₌₁ⁿ⁻¹ 𝑎𝑖Δ𝑌𝑡−𝑖 + ∑ᵢ₌₀ᵐ⁻¹ 𝛾𝑖Δ𝑋𝑡−𝑖 − 𝜋𝑒𝑡−1̂ + 𝜖𝑡 (33)

What is of major importance here is the interpretation of 𝜋.


𝜋 is the error-correction coefficient and is also called the adjustment coefficient. In fact, 𝜋 tells us how much
of the adjustment to equilibrium takes place in each period, or how much of the equilibrium error is corrected.
Consider the following cases:
(a) If 𝜋 = 1 then 100% of the adjustment takes place within a given period, or the adjustment is instanta-
neous and full.
(b) If 𝜋 = 0.5 then 50% of the adjustment takes place in each period.
(c) If 𝜋 = 0 then there is no adjustment, and to claim that 𝑌𝑡∗ is the long-run part of 𝑌𝑡 no longer makes
sense.
We need to connect this with the concept of cointegration. Because of cointegration, 𝑒𝑡̂ ∼ 𝐼(0) and, therefore,
also 𝑒𝑡−1̂ ∼ 𝐼(0). Thus, in Equation (33), which is the ECM representation, we have a regression that contains
only 𝐼(0) variables and allows us to use both long-run information and short-run disequilibrium dynamics,
which is the most important feature of the ECM.
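As an illustration, 𝐵0 and 𝐵1 can be read off an OLS estimate of Equation (28). A sketch for the simplest case 𝑛 = 𝑚 = 1 (so the differenced lag sums drop out; y and x stand for generic 𝐼(1) series):
n <- length(y)
# Equation (28) with n = m = 1: dy on dx and the lagged levels of y and x
eq28 <- lm(diff(y) ~ diff(x) + y[1:(n - 1)] + x[1:(n - 1)])
b <- coef(eq28)               # (mu, gamma0, theta1, theta2)
B1 <- -b[4] / b[3]            # long-run slope B1 = -theta2/theta1
B0 <- -b[1] / b[3]            # long-run intercept B0 = -mu/theta1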

Testing for cointegration


Cointegration in single equations: the Engle-Granger approach
Granger (1981) introduced a remarkable link between non-stationary processes and the concept of long-
run equilibrium; this link is the concept of cointegration defined above. Engle and Granger (1987) further
formalized this concept by introducing a very simple test for the existence of cointegrating (that is long-run
equilibrium) relationships.
To understand this approach (which is often called the EG approach) consider the following two series, 𝑋𝑡
and 𝑌𝑡 , and the following cases:
(a) If 𝑌𝑡 ∼ 𝐼(0) and 𝑋𝑡 ∼ 𝐼(1), then every linear combination of those two series

𝜃1 𝑌𝑡 + 𝜃2 𝑋𝑡 (34)

will result in a series that will always be 𝐼(1) or non-stationary. This will happen because the behaviour of
the non-stationary 𝐼(1) series will dominate the behaviour of the 𝐼(0) one.
(b) If we have that both 𝑋𝑡 and 𝑌𝑡 are 𝐼(1), then in general any linear combination of the two series, say

𝜃1 𝑌𝑡 + 𝜃2 𝑋𝑡 (35)

will also be 𝐼(1). However, though this is the more likely case, there are exceptions to this rule, and we
might find in rare cases that there is a unique combination of the series, as in Equation (35) above, that is
𝐼(0). If this is the case, we say that 𝑋𝑡 and 𝑌𝑡 are cointegrated of order (1, 1).
Now the problem is how to estimate the parameters of the long-run equilibrium relationship and check
whether or not we have cointegration. Engle and Granger proposed a straightforward method involving four
steps.
Step 1: test the variables for their order of integration
By definition, cointegration necessitates that the variables be integrated of the same order. Thus the first
step is to test each variable to determine its order of integration. The DF and ADF tests can be applied
in order to infer the number of unit roots (if any) in each of the variables. We can differentiate three cases
which will either lead us to the next step or will suggest stopping:
(a) if both variables are stationary (𝐼(0)), it is not necessary to proceed, since standard time series methods
apply to stationary variables (in other words, we can apply classical regression analysis);
(b) if the variables are integrated of different order, it is possible to conclude that they are not cointegrated;
and
(c) if both variables are integrated of the same order we proceed with step 2.
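In R, step 1 can be carried out with ur.df from the urca package (used in the numerical example below); a sketch for a generic series y, with illustrative lag settings:
library(urca)
# Unit root test on the level: failure to reject suggests y is at least I(1)
summary(ur.df(y, type = "trend", lags = 4, selectlags = "AIC"))
# Unit root test on the first difference: rejection here suggests y ~ I(1), not I(2)
summary(ur.df(diff(y), type = "drift", lags = 4, selectlags = "AIC"))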
Step 2: estimate the long-run (possible cointegrating) relationship
If the results of step 1 indicate that both 𝑋𝑡 and 𝑌𝑡 are integrated of the same order (usually in economics,
𝐼(1)), the next step is to estimate the long-run equilibrium relationship of the form:

𝑌𝑡 = 𝛽1 + 𝛽2 𝑋𝑡 + 𝑒𝑡 (36)

and obtain the residuals of this equation.


If there is no cointegration, the results obtained will be spurious. However, if the variables are cointegrated,
then OLS regression yields ‘super-consistent’ estimators for the cointegrating parameter 𝛽2̂ .
Step 3: check for (cointegration) the order of integration of the residuals
To determine if the variables are in fact cointegrated, denote the estimated residual sequence from this
equation by 𝑒𝑡̂ . Thus, 𝑒𝑡̂ is the series of the estimated residuals of the long-run relationship. If these
deviations from long-run equilibrium are found to be stationary, then 𝑋𝑡 and 𝑌𝑡 are cointegrated.
We perform a DF test on the residual series to determine their order of integration. The form of this DF
test is:

Δ𝑒𝑡̂ = 𝑎1𝑒𝑡−1̂ + ∑ᵢ₌₁ⁿ 𝛿𝑖Δ𝑒𝑡−𝑖̂ + 𝑣𝑡 (37)

Note that because 𝑒𝑡̂ is a residual we do not include a constant or a time trend. The critical values differ
from the standard ADF values, being more negative (typically around −3.5). Critical values are provided in
Table 1 below. Obviously, if we find that 𝑒𝑡̂ ∼ 𝐼(0), we can reject the null that the variables 𝑋𝑡 and 𝑌𝑡 are
not cointegrated; the same procedure applies if the single equation contains more than one explanatory variable.
Step 4: estimate the ECM
If the variables are cointegrated, the residuals from the equilibrium regression can be used to estimate the
ECM and to analyze the long-run and short-run effects of the variables as well as to see the adjustment

coefficient, which is the coefficient of the lagged residual terms of the long-run relationship identified in step
2. At the end, the adequacy of the model must always be checked by performing diagnostic tests.

Table 1: Critical values for the null of no cointegration

          1%     5%     10%
No lags  -4.07  -3.37  -3.03
Lags     -3.73  -3.17  -2.91

Important note. It is of major importance to note that the critical values for the cointegration test (the
ADF test on the residuals) are not the same as the standard critical values of the ADF test used for testing
stationarity. In fact, in order to have more robust conclusions regarding the evidence of cointegration, the
critical values are more negative than the standard ADF ones. Engle and Granger (1987), in their seminal
paper, performed their own Monte Carlo simulations to construct critical values for the cointegration tests.
These values are shown in Table 1. There are two sets of critical values: the first is for no lagged dependent
variable terms in the augmentation term (that is for the simple DF test); and the second is for including
lagged dependent variables (that is for the ADF test). A more comprehensive set of critical values may be
found in MacKinnon (1991), which is now the primary source.

Drawbacks of the EG approach


One of the best features of the EG approach is that it is very easy both to understand and to implement.
However, there are important shortcomings in the Engle–Granger methodology:
1. One very important issue is related to the order of the variables. When estimating the long-run
relationship, one has to place one variable on the left-hand side and use the others as regressors. The
test does not say anything about which of the variables can be used as a regressor and why. Consider,
for example, the case of just two variables, 𝑋𝑡 and 𝑌𝑡 . One can either regress 𝑌𝑡 on 𝑋𝑡 (that is,
𝑌𝑡 = 𝛼 + 𝛽𝑋𝑡 + 𝑢1𝑡 ) or choose to reverse the order and regress 𝑋𝑡 on 𝑌𝑡 (that is, 𝑋𝑡 = 𝛼 + 𝛽𝑌𝑡 + 𝑢2𝑡 ).
It can be shown, with asymptotic theory, that as the sample goes to infinity, the test for cointegration
on the residuals of those two regressions is equivalent (that is there is no difference in testing for
unit roots in 𝑢1𝑡 and 𝑢2𝑡 ). However, in practice in economics, there are rarely very big samples and
it is, therefore, possible to find that one regression exhibits cointegration while the other does not.
This is obviously a very undesirable feature of the EG approach, and the problem becomes far more
complicated when there are more than two variables to test.
2. A second problem is that when there are more than two variables there may be more than one cointe-
grating relationship, and the Engle–Granger procedure using residuals from a single relationship cannot
treat this possibility. So a most important point is that it does not give us the number of cointegrating
vectors.
3. A third problem is that it relies on a two-step estimator. The first step is to generate the residual
series and the second is to estimate a regression for this series to see whether the series is stationary
or not. Hence, any error introduced in the first step is carried into the second.
All these problems are resolved with the use of the Johansen approach that will be discussed later.

Numerical Example: Is the regression of consumption expenditure on disposable income spurious?
The excel file (Table 14.1.xls) gives quarterly data on personal consumption expenditure (PCE) and personal
disposable (i.e. after-tax) income (PDI) for the USA for the period 1970–2008, for a total of 156 observations.
All the data are in billions of 2000 dollars.

Visualisation of the data
Let us first plot the data, as shown below.
library(readxl)
Table14.1<-read_excel(path = "Table14_1.xls",col_names = TRUE,skip = 0)
attach(Table14.1)
head(Table14.1)

## # A tibble: 6 x 5
## pce pdi lnpce lnpdi time
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2434. 2729. 7.80 7.91 1
## 2 2446. 2777. 7.80 7.93 2
## 3 2467. 2815. 7.81 7.94 3
## 4 2460. 2804. 7.81 7.94 4
## 5 2507. 2864. 7.83 7.96 5
## 6 2530. 2905. 7.84 7.97 6
plot(time,lnpce,type = "l", col="blue", lty=1, lwd=2, ylab="LPCE & LPDI")
axis(1, seq(0,180,5))
lines(time,lnpdi, type = "l",col="red",lty=2,lwd=2)
legend("topleft", legend=c("LPCE", "LPDI"),col=c("blue", "red"), lty=1:2, cex=0.8)
[Figure: time plot of LPCE and LPDI (both in logs) against time; the two series trend upward together, from roughly 7.8 to 9.0.]

As we have done frequently, we have plotted the data on a logarithmic scale so that changes in the variables
represent relative changes, or percentage changes after multiplication by 100. This figure shows that both
LPDI and LPCE are trending series, which suggests that these series are not stationary. They seem to be
𝐼(1), that is, they have stochastic trends. This can be confirmed by unit root analysis, as done below:

Performing ADF test
require(urca)

## Loading required package: urca


summary(ur.df(lnpdi,type = "trend",lags = 13, selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.026641 -0.005161 0.000336 0.005564 0.032542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8496291 0.3415057 2.488 0.0140 *
## z.lag.1 -0.1057582 0.0429484 -2.462 0.0150 *
## tt 0.0007793 0.0003176 2.454 0.0154 *
## z.diff.lag -0.1708277 0.0841074 -2.031 0.0442 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009331 on 138 degrees of freedom
## Multiple R-squared: 0.09066, Adjusted R-squared: 0.07089
## F-statistic: 4.586 on 3 and 138 DF, p-value: 0.0043
##
##
## Value of test-statistic is: -2.4624 28.2231 3.0346
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2 6.22 4.75 4.07
## phi3 8.43 6.49 5.47
summary(ur.df(lnpce,type = "trend",lags = 13, selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:

## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0274122 -0.0025974 0.0001455 0.0032497 0.0140332
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6232660 0.1903262 3.275 0.00134 **
## z.lag.1 -0.0796253 0.0244561 -3.256 0.00143 **
## tt 0.0006422 0.0001992 3.224 0.00158 **
## z.diff.lag1 0.2337245 0.0806047 2.900 0.00436 **
## z.diff.lag2 0.1532792 0.0835448 1.835 0.06874 .
## z.diff.lag3 0.2738821 0.0827603 3.309 0.00120 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00596 on 136 degrees of freedom
## Multiple R-squared: 0.2076, Adjusted R-squared: 0.1785
## F-statistic: 7.128 on 5 and 136 DF, p-value: 5.963e-06
##
##
## Value of test-statistic is: -3.2558 6.9729 5.3863
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.99 -3.43 -3.13
## phi2 6.22 4.75 4.07
## phi3 8.43 6.49 5.47
We have used the augmented Dickey–Fuller (ADF) test by including one lagged term of the first difference of
LPDI. The coefficient of primary interest is the coefficient of lagged LPDI, which is -0.1057582; on the basis
of the usual 𝑡 test it is significant at the 0.015 level, but on the basis of the tau statistic it is not
significant even at the 10% level, indicating that the LPDI time series is non-stationary. Likewise, the LPCE
series is non-stationary on the basis of the ADF test, although the usual 𝑡 test declares otherwise. It seems
that both LPCE and LPDI series have a unit root, or stochastic trend. Therefore, if we regress LPCE on
LPDI, we might get a spurious regression. Before we consider this possibility, let us present the results of
this regression.
summary(lm(lnpce~lnpdi))

##
## Call:
## lm(formula = lnpce ~ lnpdi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.045865 -0.010973 0.005101 0.010368 0.041209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.842510 0.033717 -24.99 <2e-16 ***
## lnpdi 1.086822 0.003949 275.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##
## Residual standard error: 0.01657 on 154 degrees of freedom
## Multiple R-squared: 0.998, Adjusted R-squared: 0.998
## F-statistic: 7.576e+04 on 1 and 154 DF, p-value: < 2.2e-16
library(car)

## Loading required package: carData


durbinWatsonTest(lm(lnpce~lnpdi))

## lag Autocorrelation D-W Statistic p-value


## 1 0.794206 0.3671868 0
## Alternative hypothesis: rho != 0
Before we interpret the results, notice that 𝑅² > 𝑑(= 0.3672). This raises the possibility that this regression
might be spurious, which might be due to regressing one stochastic trend series on another stochastic trend
series. Of course, if we interpret the Durbin–Watson statistic on its own, it would suggest that the error term in this
regression suffers from first-order autocorrelation. Taken at face value, the results suggest that the elasticity
of personal consumption expenditure with respect to PDI is about 1.09, greater than one: a one percent increase
in PDI leads to more than a one percent increase in personal consumption expenditure. This elasticity seems
high. Because of the possibility of spurious regression, we should be wary of these results.
Since both the time series are trending, let us see what happens if we add a trend variable to the model.
Before we do that it may be worth noting that the trend variable is a catch-all for all other variables that
might affect both the regressand and regressor(s). One such variable is population, because as population
increases the aggregate consumption expenditure and aggregate disposable income also increase. If we had
quarterly data on population, we could have added that variable as an additional regressor instead of the
trend variable. Better yet, we could have expressed consumption expenditure and personal disposable income
on a per capita basis. So keep in mind that the trend variable may be a surrogate for other variables. With
this caveat, let us see what happens if we add the trend variable to our model.
summary(lm(lnpce~lnpdi+time))

##
## Call:
## lm(formula = lnpce ~ lnpdi + time)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.043229 -0.008002 0.000884 0.010870 0.031287
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6729600 0.4873401 3.433 0.000769 ***
## lnpdi 0.7702424 0.0613163 12.562 < 2e-16 ***
## time 0.0023658 0.0004574 5.172 7.15e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01533 on 153 degrees of freedom
## Multiple R-squared: 0.9983, Adjusted R-squared: 0.9983
## F-statistic: 4.423e+04 on 2 and 153 DF, p-value: < 2.2e-16
durbinWatsonTest(lm(lnpce~lnpdi+time))

## lag Autocorrelation D-W Statistic p-value


## 1 0.8501832 0.2616917 0

## Alternative hypothesis: rho != 0
Compared to the previous results, there are changes. The elasticity of LPCE with respect to LPDI is now
much less than unity, although it is still statistically significant on the basis of the usual 𝑡 test, and the trend
variable is also statistically significant. Therefore, allowing for linear trend, the relationship between the
two variables is strongly positive. But notice again the low Durbin–Watson value, which suggests that the
results are plagued by autocorrelation. Or maybe this regression too is spurious.

When may a spurious regression not be spurious?


Underlying the regression with time trend, the population regression model is:

𝑙𝑃𝐶𝐸𝑡 = 𝛽1 + 𝛽2𝑙𝑃𝐷𝐼𝑡 + 𝛽3𝑡 + 𝑢𝑡 (Ex 1)


Rewrite this model as:

𝑢𝑡 = 𝑙𝑃𝐶𝐸𝑡 − 𝛽1 − 𝛽2𝑙𝑃𝐷𝐼𝑡 − 𝛽3𝑡 (Ex 2)

After estimating the first equation, suppose we subject the estimated 𝑢𝑡 to unit root analysis and find that
it is stationary, that is, it is 𝐼(0). This is an intriguing situation, for although the log of PCE and log of PDI
are individually 𝐼(1), that is, that they have stochastic trends, their (linear) combination is 𝐼(0). This linear
combination, so to speak, cancels out the stochastic trends in the two series. In that case the regression of
lPCE on lPDI is not spurious. If this happens, we say that the variables lPCE and lPDI are cointegrated.
This can be seen clearly from visual inspection of the two variables, for even though the two series are stochastically
trending, they do not drift apart substantially. It is as if two drunkards are meandering aimlessly, but they
keep pace with each other. Economically speaking, two variables will be cointegrated if they have a long-run,
or equilibrium, relationship between them. In the present context economic theory tells us that there is
a strong relationship between consumption expenditure and personal disposable income. Remember that
PCE is about 70% of PDI. The point of all this discussion is that not all time series regressions are spurious.
Of course, we need to test this formally. As Granger notes, “A test for cointegration can be thought of
as a pre-test to avoid ‘spurious regression’ situations”. In the language of cointegration theory, regression
like the first one is known as a cointegrating regression and the slope parameters 𝛽2 and 𝛽3 are known as
cointegrating parameters.

The EG and AEG tests


To use the DF or ADF test, we estimate a regression like (Ex 1), obtain residuals from this regression, and
use these tests. However, since the residuals 𝑒𝑡 are only estimates of the unobserved 𝑢𝑡, the DF and ADF critical significance values
need to be adjusted, as suggested by Engle and Granger. Let us apply these tests to the PCE-PDI regression
(Ex 1). The results of this regression are already shown earlier. Let us first run the EG test with no intercept
and no trend term, which gives the results below.
# DF test on residuals
uhat1<-resid(lm(lnpce~lnpdi+time))
summary(ur.df(uhat1,type ="none" ,lags = 0))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:

16
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0208863 -0.0045538 0.0000729 0.0035560 0.0252790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.13599 0.04009 -3.393 0.00088 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007542 on 154 degrees of freedom
## Multiple R-squared: 0.06954, Adjusted R-squared: 0.0635
## F-statistic: 11.51 on 1 and 154 DF, p-value: 0.0008801
##
##
## Value of test-statistic is: -3.3926
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
uhat1<-resid(lm(lnpce~lnpdi+time))
summary(ur.df(uhat1,type ="none" ,lags = 13,selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0189857 -0.0041765 -0.0003458 0.0032564 0.0205569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.11794 0.04420 -2.668 0.00853 **
## z.diff.lag1 -0.22465 0.08710 -2.579 0.01094 *
## z.diff.lag2 0.14580 0.08464 1.723 0.08719 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007321 on 139 degrees of freedom
## Multiple R-squared: 0.1567, Adjusted R-squared: 0.1385
## F-statistic: 8.613 on 3 and 139 DF, p-value: 2.774e-05
##
##
## Value of test-statistic is: -2.6685

##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
This output clearly shows that the residuals from regression (Ex 1) are stationary, for the computed tau
value of the lagged residual term is more negative than any of the critical values in the table. The results do not
change materially if we add several lagged terms.
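To make the comparison with the Engle–Granger critical values of Table 1 explicit, a quick check by hand (using the no-lags row for the first, no-lag test):
tau <- -3.3926                # tau statistic from the no-lag test on the residuals
tau < c(-4.07, -3.37, -3.03)  # rejection at the 5% and 10% levels, but not at 1%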

Unit root tests and cointegration tests


Notice the difference between the unit root and cointegration tests. Tests for unit roots are performed on
single time series, whereas cointegration deals with the relationship among a group of variables, each having
a unit root. In practice, it is better to test each series for unit roots, for it is quite possible that some of
the series in a group may have more than one unit root, in which case they will have to be differenced more
than once to make them stationary.
If two time series 𝑌 and 𝑋 are integrated of different orders then the error term in the regression of 𝑌 on
𝑋 is not stationary, and this regression equation is said to be unbalanced. On the other hand, if the two
variables are integrated of the same order, then the regression equation is said to be balanced.

Cointegration and error correction mechanism (ECM)


After allowing for deterministic trend, we have shown that log PCE and log PDI series are cointegrated,
that is, they have a long-term, or equilibrium, relationship. But how is this equilibrium achieved, given that in the
short run there may be disequilibrium? We can treat the error term in Eq. (Ex 2) as the “equilibrating”
error term that corrects deviations of LPCE from its equilibrium value given by the cointegrating regression
(Ex 1). Dennis Sargan called this the error correction mechanism (ECM), a term that was later popularized
by Engle and Granger. An important theorem, known as the Granger Representation Theorem, states that if
two variables 𝑌 and 𝑋 are cointegrated, the relationship between the two can be expressed as an ECM. Now
consider the following model:

Δ𝑙𝑃𝐶𝐸𝑡 = 𝛼1 + 𝛼2Δ𝑙𝑃𝐷𝐼𝑡 + 𝛼3𝑢𝑡−1 + 𝜖𝑡 (Ex 3)

where Δ, as usual, is the first-difference operator, 𝑢𝑡−1 is the lagged value of the error correction term from
Eq. (Ex 2), and 𝜖𝑡 is a white noise error term. We know that Eq. (Ex 1) gives the long-run relationship
between 𝑙𝑃 𝐶𝐸 and 𝑙𝑃 𝐷𝐼. On the other hand, Eq. (Ex 3) gives the short-run relationship between the two.
Just as 𝛽2 in Eq. (Ex 1) gives the long-run impact of 𝑙𝑃 𝐷𝐼 on 𝑙𝑃 𝐶𝐸, 𝛼2 in Eq. (Ex 3) gives the immediate,
or short-run, impact of Δ𝑙𝑃 𝐷𝐼 on Δ𝑙𝑃 𝐶𝐸.
Model (Ex 3), called the error correction model (ECM), postulates that changes in 𝑙𝑃𝐶𝐸 depend on changes
in 𝑙𝑃𝐷𝐼 and the lagged equilibrium error term, 𝑢𝑡−1. If this error term is zero, there will not be any
disequilibrium between the two variables and in that case the long-run relationship will be given by the
cointegrating relationship (Ex 1) (no error term here). But if the equilibrium error term is nonzero, the
relationship between 𝑙𝑃 𝐶𝐸 and 𝑙𝑃 𝐷𝐼 will be out of equilibrium.
To see this, let Δ𝑙𝑃𝐷𝐼 = 0 (no change in 𝑙𝑃𝐷𝐼) and suppose 𝑢𝑡−1 is positive. This means 𝑙𝑃𝐶𝐸𝑡−1
is too high to be in equilibrium – that is, 𝑙𝑃𝐶𝐸𝑡−1 is above its equilibrium value (𝛽1 + 𝛽2𝑙𝑃𝐷𝐼𝑡−1). Since
𝛼3 in Eq. (Ex 3) is expected to be negative, the term 𝛼3𝑢𝑡−1 is negative and, therefore, Δ𝑙𝑃𝐶𝐸𝑡 will be
negative to restore the equilibrium. That is, if 𝑙𝑃𝐶𝐸𝑡 is above its equilibrium value, it will start falling in
the next period to correct the equilibrium error; hence the name ECM.
By the same token, if 𝑙𝑃𝐶𝐸𝑡 is below its equilibrium value (i.e. if 𝑢𝑡−1 is negative), 𝛼3𝑢𝑡−1 will be positive,
which will cause Δ𝑙𝑃𝐶𝐸𝑡 to be positive, leading 𝑙𝑃𝐶𝐸 to rise in period 𝑡.
Thus the absolute value of 𝛼3 will decide how quickly the equilibrium is reached. Note that in practice we
estimate 𝑢𝑡−1 by its sample counterpart 𝑒𝑡−1. It is interesting to note that Eq. (Ex 3) incorporates both the

short-run and long-run dynamics. Also note that in Eq. (Ex 3) all the variables are 𝐼(0), or stationary. So
Eq. (Ex 3) can be estimated by OLS.
To see all this theory in practice, we return to our illustrative example. The empirical counterpart of Eq.
(Ex 3) is shown below:
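# ECM (Ex 3): regress the change in lnpce on the change in lnpdi and the lagged
# residual from the cointegrating regression; uhat1[1:155] supplies u(t-1)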
fitecm<-lm(diff(lnpce, differences = 1)~diff(lnpdi, differences = 1)+uhat1[1:155])
summary(fitecm)

##
## Call:
## lm(formula = diff(lnpce, differences = 1) ~ diff(lnpdi, differences = 1) +
## uhat1[1:155])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0225994 -0.0028893 0.0002059 0.0029321 0.0155813
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0055296 0.0006261 8.831 2.34e-15 ***
## diff(lnpdi, differences = 1) 0.3064059 0.0515887 5.939 1.87e-08 ***
## uhat1[1:155] -0.0652469 0.0335044 -1.947 0.0533 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.006129 on 152 degrees of freedom
## Multiple R-squared: 0.1898, Adjusted R-squared: 0.1791
## F-statistic: 17.8 on 2 and 152 DF, p-value: 1.131e-07
durbinWatsonTest(fitecm)

## lag Autocorrelation D-W Statistic p-value


## 1 0.1180621 1.707366 0.056
## Alternative hypothesis: rho != 0
First, note that all coefficients in this table are individually statistically significant at the 6% or lower level.
The coefficient of about 0.31 shows that a 1% increase in PDI (that is, in 𝑙𝑛(𝑃𝐷𝐼𝑡/𝑃𝐷𝐼𝑡−1), multiplied by 100)
will lead on average to a 0.31% increase in PCE.
This is the short-run consumption–income elasticity. The long-run value is given by the cointegrating regression
Eq. (Ex 1), which is about 0.77. The coefficient of the error-correction term of about -0.06 suggests
that only about 6% of the discrepancy between the long-run and short-run values of PCE is corrected within a quarter,
suggesting a slow rate of adjustment to equilibrium. One reason the rate of adjustment seems low is that
our model is rather simple. If we had the necessary data on interest rates, consumer wealth, and so on,
we might have seen a different result.
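A rough way to translate this into a time scale is the implied half-life of a deviation from equilibrium (a back-of-the-envelope calculation based on the estimated coefficient of about 0.065):
# With about 6.5% of the gap closed each quarter, a deviation decays as (1 - 0.065)^h
log(0.5) / log(1 - 0.065)     # roughly 10 quarters for half of a disequilibrium to be corrected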

Cointegration in multiple equations and the Johansen approach


It was mentioned earlier that if there are more than two variables in the model, there is a possibility of
having more than one cointegrating vector. This means that the variables in the model might form several
equilibrium relationships governing the joint evolution of all the variables. In general, for 𝑛 number of
variables there can be only up to 𝑛 − 1 cointegrating vectors. Therefore, when 𝑛 = 2, which is the simplest
case, if cointegration exists then the cointegrating vector is unique.
Having 𝑛 > 2 and assuming that only one cointegrating relationship exists where there are actually more
than one is a serious problem that cannot be resolved by the EG single-equation approach. Therefore, an

alternative to the EG approach is needed, and this is the Johansen approach for multiple equations.
To present this approach, it is useful to extend the single-equation error-correction model to a multivariate
one. Let us assume that we have three variables, 𝑌𝑡 , 𝑋𝑡 and 𝑊𝑡 which can all be endogenous; that is we
have it that (using matrix notation for Zt = [𝑌𝑡 𝑋𝑡 𝑊𝑡 ])

Zt = A1 Zt−1 + A2 Zt−2 + ⋯ + Ak Zt−k + ut (38)

This is comparable to the single-equation dynamic model for two variables 𝑌𝑡 and 𝑋𝑡 . Thus, it can be
reformulated in a vector error-correction model (VECM) as follows:

ΔZ𝑡 = Γ1ΔZ𝑡−1 + Γ2ΔZ𝑡−2 + ⋯ + Γ𝑘−1ΔZ𝑡−𝑘+1 + ΠZ𝑡−1 + u𝑡 (39)

where Γ𝑖 = −(A𝑖+1 + A𝑖+2 + ⋯ + A𝑘) for 𝑖 = 1, 2, ⋯, 𝑘 − 1, and Π = −(I − A1 − A2 − ⋯ − A𝑘).


Here we need to examine carefully the 3 × 3 Π matrix. (The Π matrix is 3 × 3 because we assume three
variables in Z𝑡 = [𝑌𝑡, 𝑋𝑡, 𝑊𝑡]′.) The Π matrix contains information regarding the long-run relationships. We
can decompose Π = 𝛼𝛽′, where 𝛼 includes the speed of adjustment to equilibrium coefficients while 𝛽′ is
the long-run matrix of coefficients.

Therefore, the 𝛽′Z𝑡−1 term is equivalent to the error-correction term (𝑌𝑡−1 − 𝛽0 − 𝛽1𝑋𝑡−1) in the single-
equation case, except that now 𝛽′Z𝑡−1 contains up to (𝑛 − 1) cointegrating vectors in a multivariate framework.
For simplicity, we assume that 𝑘 = 2, so that we have only two lagged terms, and the model is then the
following:

⎡Δ𝑌𝑡⎤      ⎡Δ𝑌𝑡−1⎤     ⎡𝑌𝑡−1⎤
⎢Δ𝑋𝑡⎥ = Γ1 ⎢Δ𝑋𝑡−1⎥ + Π ⎢𝑋𝑡−1⎥ + 𝑒𝑡 (40)
⎣Δ𝑊𝑡⎦      ⎣Δ𝑊𝑡−1⎦     ⎣𝑊𝑡−1⎦

This can also be written as:

[Δ𝑌𝑡, Δ𝑋𝑡, Δ𝑊𝑡]′ = Γ1[Δ𝑌𝑡−1, Δ𝑋𝑡−1, Δ𝑊𝑡−1]′ + 𝛼𝛽′[𝑌𝑡−1, 𝑋𝑡−1, 𝑊𝑡−1]′ + 𝑒𝑡 (41)

where

    ⎡𝑎11 𝑎12⎤
𝛼 = ⎢𝑎21 𝑎22⎥ ,   𝛽′ = ⎡𝛽11 𝛽21 𝛽31⎤
    ⎣𝑎31 𝑎32⎦          ⎣𝛽12 𝛽22 𝛽32⎦

Let us now analyse only the error-correction part of the first equation (that is for Δ𝑌𝑡 on the left-hand side),
which gives:

Π1𝑍𝑡−1 = [𝑎11𝛽11 + 𝑎12𝛽12   𝑎11𝛽21 + 𝑎12𝛽22   𝑎11𝛽31 + 𝑎12𝛽32] [𝑌𝑡−1, 𝑋𝑡−1, 𝑊𝑡−1]′ (42)

where Π1 is the first row of Π matrix.


Equation (42) can be rewritten as:

Π1𝑍𝑡−1 = 𝑎11(𝛽11𝑌𝑡−1 + 𝛽21𝑋𝑡−1 + 𝛽31𝑊𝑡−1) + 𝑎12(𝛽12𝑌𝑡−1 + 𝛽22𝑋𝑡−1 + 𝛽32𝑊𝑡−1) (43)

which shows clearly the two cointegrating vectors with their respective speed of adjustment terms 𝑎11 and
𝑎12 .

Advantages of the multiple-equation approach
So, from the multiple-equation approach we can obtain estimates for both cointegrating vectors from Equation
(43), whereas with the single-equation approach we have only a linear combination of the two long-run relationships.
Also, even if there is only one cointegrating relationship (for example the first only) rather than two,
with the multiple-equation approach we can calculate all three differing speed of adjustment coefficients
(𝑎11, 𝑎21, 𝑎31).
Only when 𝑎21 = 𝑎31 = 0, and only one cointegrating relationship exists, can we then say that the multiple-
equation method is the same (reduces to the same) as the single-equation approach, and therefore, there is
no loss from not modelling the determinants of Δ𝑋𝑡 and Δ𝑊𝑡 . Here, it is good to mention too that when
𝑎21 = 𝑎31 = 0, this is equivalent to 𝑋𝑡 and 𝑊𝑡 being weakly exogenous.
So, summarizing, only when all right-hand variables in a single equation are weakly exogenous does the
single-equation approach provide the same result as a multivariate equation approach.

The Johansen approach


Let us now go back and examine the behaviour of the Π matrix under different circumstances. Given that
Z𝑡 is a vector of non-stationary 𝐼(1) variables, the terms ΔZ𝑡−𝑖 are 𝐼(0), and ΠZ𝑡−1 must also be 𝐼(0) in order
to have 𝑢𝑡 ∼ 𝐼(0) and therefore a well-behaved system.
In general, there are three cases for ΠZt−1 to be 𝐼(0):

Case 1
When all the variables in Zt are stationary. Of course, this case is totally uninteresting since it implies that
there is no problem of spurious regression and the simple VAR in levels model can be used to model this
case.

Case 2
When there is no cointegration at all and therefore the Π matrix is an 𝑛 × 𝑛 matrix of zeros because there
are no linear relationships among the variables in Zt . In this case the appropriate strategy is to use a VAR
model in first differences with no long-run elements as a result of the non-existence of long-run relationships.

Case 3
When there exist up to (𝑛 − 1) cointegrating relationships of the form 𝛽 ′ Zt−1 ∼ 𝐼(0). In this particular
case, 𝑟 ≤ (𝑛 − 1) cointegrating vectors exist in 𝛽. This simply means that 𝑟 columns of 𝛽 form 𝑟 linearly
independent combinations of the variables in Zt , each of which is stationary. Of course, there will also be
(𝑛 − 𝑟) common stochastic trends underlying 𝑍𝑡 .
Recall that Π = 𝛼𝛽 ′ and so in case 3 above, while the matrix will always be dimensioned 𝑛 × 𝑛, the 𝛼 and 𝛽
matrices will be dimensioned 𝑛 × 𝑟. This, therefore, imposes a rank of 𝑟 on the matrix, which also imposes
only 𝑟 linearly independent rows in this matrix. So underlying the full size matrix is a restricted set of only

𝑟 cointegrating vectors given by 𝛽 Zt−1 . Reduced rank regression, of this type, has been available in the
statistics literature for many years, but it was introduced into modern econometrics and linked with the
analysis of non-stationary data by Johansen (1988).
Going back to the three different cases considered above regarding the rank of the matrix Π we have:
• Case 1: When Π has a full rank (that is there are 𝑟 = 𝑛 linearly independent columns) then the
variables in Zt are 𝐼(0).
• Case 2: When the rank of Π is zero (that is there are no linearly independent columns) then there are
no cointegrating relationships.

• Case 3: When Π has a reduced rank (that is, there are 𝑟 ≤ (𝑛 − 1) linearly independent columns), then
there are 𝑟 ≤ (𝑛 − 1) cointegrating relationships.
Johansen (1988) developed a methodology that tests for the rank of Π and provides estimates of 𝛼 and 𝛽
through a procedure known as reduced rank regression, but the actual procedure is quite complicated.
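In R, the Johansen procedure is implemented by ca.jo in the urca package. A hedged sketch, where Z is a data matrix of 𝐼(1) series and the options shown (lag order, deterministic term) are illustrative choices rather than recommendations:
library(urca)
# Johansen test (trace statistic), 2 lags, constant restricted to the cointegration space
cjo <- ca.jo(Z, type = "trace", ecdet = "const", K = 2)
summary(cjo)            # trace statistics and critical values for r = 0, r <= 1, ...
cajorls(cjo, r = 1)     # normalized beta and the alpha loadings for a chosen rank r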

The steps of the Johansen approach in practice


Step 1: testing the order of integration of the variables
As with the EG approach, the first step in the Johansen approach is to test for the order of integration of
the variables under examination. It was noted earlier that most economic time series are non-stationary, and
therefore, integrated. Indeed, the issue here is to have non-stationary variables in order to detect among
them stationary cointegrating relationship(s) and avoid the problem of spurious regressions. It is clear that
the most desirable case is when all the variables are integrated of the same order, and then to proceed with
the cointegration test. However, it is important to stress that this is not always the case, and that even
in cases where a mix of 𝐼(0), 𝐼(1) and 𝐼(2) variables are present in the model, cointegrating relationships
might well exist. The inclusion of these variables, though, can substantially affect the results, and more
care should be taken in such cases.
Consider, for example, the inclusion of an 𝐼(0) variable. In a multivariate framework, for every 𝐼(0) variable
included in the model the number of cointegrating relationships will increase correspondingly. We stated
earlier that the Johansen approach amounts to testing for the rank of Π (that is finding the number of linearly
independent columns in Π), and since each 𝐼(0) variable is stationary by itself, it forms a cointegrating
relationship by itself and therefore forms a linearly independent vector in Π.
Matters become more complicated when we include 𝐼(2) variables. Consider, for example, a model with the
inclusion of two 𝐼(1) and two 𝐼(2) variables. There is a possibility that the two 𝐼(2) variables cointegrate
down to an 𝐼(1) relationship, and then this relationship may further cointegrate with one of the two 𝐼(1)
variables to form another cointegrating vector. In general, situations with variables in differing orders of
integration are quite complicated, though the positive thing is that it is quite common in macroeconomics
to have 𝐼(1) variables. Those who are interested in further details regarding the inclusion of 𝐼(2) variables
can refer to Johansen’s (1995b) paper, which develops an approach to treat 𝐼(2) models.

Step 2: Setting the appropriate lag length of the model


The issue of finding the appropriate (optimal) lag length is very important because we want to have Gaussian
error terms (that is, normally distributed error terms that do not suffer from autocorrelation,
heteroskedasticity and so on). Setting the value of the lag length is affected by the omission of variables
that might affect only the short-run behaviour of the model. This is because omitted variables instantly
become part of the error term. Therefore, very careful inspection of the data and the functional relationship
is necessary before proceeding with estimation, to decide whether to include additional variables. It is quite
common to use dummy variables to take into account short-run ‘shocks’ to the system, such as political
events that had important effects on macroeconomic conditions. The most common procedure in choosing
the optimal lag length is to estimate a VAR model including all our variables in levels (non-differenced data).
This VAR model should be estimated for a large number of lags, then reducing down by re-estimating the
model for one lag less until zero lags are reached (that is we estimate the model for 12 lags, then 11, then
10 and so on until we reach 0 lags).
In each of these models we inspect the values of the AIC and the SBC criteria, as well as the diagnostics
concerning autocorrelation, heteroskedasticity, possible ARCH effects and normality of the residuals. In
general the model that minimizes AIC and SBC is selected as the one with the optimal lag length. This
model should also pass all the diagnostic checks.

Step 3: Choosing the appropriate model regarding the deterministic components
in the multivariate system
Another important aspect in the formulation of the dynamic model is whether an intercept and/or a trend
should enter either the short-run or the long-run model, or both models. The general case of the VECM,
including all the various options that can possibly arise, is given by the following equation:

Δ𝑍𝑡 = Γ1Δ𝑍𝑡−1 + ⋯ + Γ𝑘−1Δ𝑍𝑡−𝑘+1 + 𝛼(𝛽′𝑍𝑡−1 + 𝜇1 + 𝛿1𝑡) + 𝜇2 + 𝛿2𝑡 + 𝑢𝑡 (44)

And for this equation we can see the possible cases. We can have a constant (with coefficient 𝜇1 ) and/or
a trend (with coefficient 𝛿1 ) in the long-run model (the cointegrating equation (CE)), and a constant (with
coefficient 𝜇2 ) and/or a trend (with coefficient 𝛿2 ) in the short-run model (the VAR model).
In general, five distinct models can be considered. While the first and the fifth models are not that realistic,
all of them are presented for reasons of complementarity.
• Model 1: No intercept or trend in CE or VAR (𝛿1 = 𝛿2 = 𝜇1 = 𝜇2 = 0)
In this case there are no deterministic components in the data or in the cointegrating relations. However,
this is quite unlikely to occur in practice, especially as the intercept is generally needed to account for
adjustments in the units of measurement of the variables in (𝑍𝑡−1 1 𝑡).
• Model 2: Intercept (no trend) in CE, no intercept or trend in VAR (𝛿1 = 𝛿2 = 𝜇2 = 0)
This is the case where there are no linear trends in the data, and therefore the first differenced series have a
zero mean. In this case, the intercept is restricted to the long-run model (that is the cointegrating equation)
to account for the unit of measurement of the variables in (𝑍𝑡−1 1 𝑡).
• Model 3: Intercept in CE and VAR, no trends in CE and VAR (𝛿1 = 𝛿2 = 0)
In this case there are no linear trends in the levels of the data, but both specifications are allowed to drift
around an intercept. In this case, it is assumed that the intercept in the CE is cancelled out by the intercept
in the VAR, leaving just one intercept in the short-run model.
• Model 4: Intercept in CE and VAR, linear trend in CE, no trend in VAR (𝛿2 = 0)
In this model a trend is included in the CE as a trend-stationary variable, to take into account exogenous
growth (that is technical progress). We also allow for intercepts in both specifications while there is no trend
in the short-run relationship.
• Model 5: Intercept and quadratic trend in CE, intercept and linear trend in VAR
This model allows for linear trends in the short-run model and thus quadratic trends in the CE. Therefore,
in this final model, everything is unrestricted. However, this model is very difficult to interpret from an
economics point of view, especially since the variables are entered as logs, because a model like this would
imply an implausible ever-increasing or ever-decreasing rate of change.
The problem, then, is which of the five models is appropriate when testing for cointegration. It was noted
earlier that models 1 and 5 are unlikely to occur in practice and are also implausible in terms of economic
theory, so the problem reduces to choosing one of the three remaining models (models 2, 3 and 4). Johansen
(1992) suggests that the joint hypothesis of both the rank order and the deterministic components needs to
be tested, applying the so-called Pantula principle. The Pantula principle involves estimating all three models
and presenting the results from the most restrictive hypothesis (that is, 𝑟 = number of cointegrating
relations = 0 and model 2) to the least restrictive hypothesis (that is, 𝑟 = number of variables entering the
VAR − 1 = 𝑛 − 1 and model 4). The model-selection procedure then comprises moving from the most
restrictive model, at each stage comparing the trace test statistic to its critical value, and stopping the first
time the null hypothesis is not rejected. A sketch of this procedure in R is given below.
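The Pantula principle can be sketched with ca.jo() from the urca package (used in the numerical example below). In ca.jo(), ecdet = "const" restricts the intercept to the cointegrating relation (model 2), ecdet = "none" leaves deterministic components out of it (roughly model 3) and ecdet = "trend" restricts a trend to it (model 4); the data matrix coint is the one constructed later in this document:

library(urca)

# Estimate the three candidate models, from most to least restrictive
model2 <- ca.jo(coint, type = "trace", ecdet = "const", K = 2, spec = "transitory")
model3 <- ca.jo(coint, type = "trace", ecdet = "none",  K = 2, spec = "transitory")
model4 <- ca.jo(coint, type = "trace", ecdet = "trend", K = 2, spec = "transitory")

# Compare each trace statistic with its critical value and stop at the
# first (r, model) combination for which the null is not rejected
summary(model2); summary(model3); summary(model4)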

Step 4: Determining the rank of Π or the number of cointegrating vectors
According to Johansen (1988) and Johansen and Juselius (1990), there are two methods (and corresponding
test statistics) for determining the number of cointegrating relations, and both involve estimation of the
matrix Π. This is an 𝑛 × 𝑛 matrix (where 𝑛 is the number of variables in the system) with rank 𝑟. The procedures are based on propositions about its eigenvalues.
(a) One method tests the null hypothesis that 𝑅𝑎𝑛𝑘(Π) = 𝑟 against the alternative that the rank is 𝑟 + 1.
So the null in this case is that there are at most 𝑟 cointegrating vectors, with the alternative
suggesting that there are (𝑟 + 1) vectors. The test statistics are
based on the characteristic roots (also called eigenvalues) obtained from the estimation procedure.
The test consists of ordering the largest eigenvalues in descending order and considering whether
they are significantly different from zero. To understand the test procedure, suppose we obtained
𝑛 characteristic roots denoted by 𝜆1 > 𝜆2 > 𝜆3 > ⋯ > 𝜆𝑛 . If the variables under examination are
not cointegrated, the rank of Π is zero and all the characteristic roots will equal zero. Therefore,
(1 − 𝜆̂ᵢ) will be equal to 1 and, since 𝑙𝑛(1) = 0, each of the expressions 𝑙𝑛(1 − 𝜆̂ᵢ) will be equal to zero
when there is no cointegration. On the other hand, if the rank of Π is equal to 1, then 0 < 𝜆̂₁ < 1, so that the first
expression 𝑙𝑛(1 − 𝜆̂₁) < 0 while all the rest will be equal to zero. To test how many of the characteristic
roots are significantly different from zero, this test uses the following statistic:

$$\lambda_{max}(r, r+1) = -T \, \ln(1 - \hat{\lambda}_{r+1}) \qquad (8)$$

As noted above, the test statistic is based on the maximum eigenvalue and is thus called the maximal
eigenvalue statistic (denoted by 𝜆max).
(b) The second method is based on a likelihood ratio test for the trace of the matrix (and because of that
it is called the trace statistic). The trace statistic considers whether the trace is increased by adding
more eigenvalues beyond the rth. The null hypothesis in this case is that the number of cointegrating
vectors is less than or equal to 𝑟. From the previous analysis it should be clear that when all 𝜆̂ 𝑖 = 0,
then the trace statistic is also equal to zero. On the other hand, the closer the characteristic roots are
to unity, the more negative is the 𝑙𝑛(1 − 𝜆̂ 𝑖 ) term, and therefore, the larger the trace statistic. This
statistic is calculated by:
$$\lambda_{trace}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i) \qquad (9)$$

The usual procedure is to test sequentially, starting from 𝑟 = 0 and increasing 𝑟 by one each time the
test statistic exceeds its critical value, stopping at the first value of 𝑟 for which the null hypothesis
is not rejected. Critical values for both statistics are provided by Johansen and Juselius (1990) (these
critical values are reported directly by statistical software after conducting a test for cointegration
using the Johansen approach). A sketch of both tests in R is given below.
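Both statistics are available through the type argument of ca.jo() in the urca package; a brief sketch, again assuming the data matrix coint constructed in the numerical example below:

library(urca)

# Maximal eigenvalue statistic: tests Rank(Pi) = r against Rank(Pi) = r + 1
summary(ca.jo(coint, type = "eigen", ecdet = "trend", K = 2, spec = "transitory"))

# Trace statistic: tests Rank(Pi) <= r against a larger rank
summary(ca.jo(coint, type = "trace", ecdet = "trend", K = 2, spec = "transitory"))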

Step 5: Testing for weak exogeneity


After determining the number of cointegrating vectors we proceed with tests of weak exogeneity. Remember
that the Π matrix contains information about the long-run relationships, and that Π = 𝛼𝛽 ′ , where 𝛼
represents the speed of adjustment coefficients and 𝛽 is the matrix of the long-run coefficients. From this
it should be clear that when there are 𝑟 ≤ 𝑛 − 1 cointegrating vectors in 𝛽, this automatically means that
at least (𝑛 − 𝑟) columns of 𝛼 are equal to zero. Thus, once the number of cointegrating vectors has been
determined, we should proceed with testing which of the variables are weakly exogenous. A very useful
feature of the Johansen approach for cointegration is that it allows us to test for restricted forms of the
cointegrating vectors. Consider the case given by Equation (3), and from this the following equation:

$$\begin{bmatrix} \Delta Y_t \\ \Delta X_t \\ \Delta W_t \end{bmatrix} = \Gamma_1 \begin{bmatrix} \Delta Y_{t-1} \\ \Delta X_{t-1} \\ \Delta W_{t-1} \end{bmatrix} + \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \\ W_{t-1} \end{bmatrix} + e_t \qquad (45)$$

In this equation it can be seen that testing for weak exogeneity with respect to the long-run parameters is

equivalent to testing which of the rows of 𝛼 are equal to zero. A variable 𝑍 is weakly exogenous if it is
only a function of lagged variables, and the parameters of the equation generating 𝑍 are independent of the
parameters generating the other variables in the system. If we think of the variable 𝑌 in Equation (10), it is
clearly a function of only lagged variables but in the general form above the parameters of the cointegrating
vectors (𝛽) are clearly common to all equations and so the parameters generating 𝑌 cannot be independent
of those generating 𝑋 and 𝑊 as they are the same parameters. However, if the first row of the 𝛼 matrix
were all zeros then the 𝛽𝑠 would drop out of the 𝑌 equation and it would be weakly exogenous. So a joint
test that a particular row of 𝛼 is zero is a test of the weak exogeneity of the corresponding variable. If
a variable is found to be weakly exogenous it can be dropped as an endogenous part of the system. This
means that the whole equation for that variable can also be dropped, though it will continue to feature on
the right-hand side of the other equations.
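In R, this joint test on a row of 𝛼 can be carried out with alrtest() from the urca package, which tests the restriction 𝛼 = 𝐴𝜓. A minimal sketch for a three-variable system with 𝑟 = 1, where jo_fit stands for a hypothetical fitted ca.jo object and the first row of 𝛼 is set to zero:

# A keeps rows 2 and 3 of alpha unrestricted; the zero first row imposes
# weak exogeneity of the first variable in the system
A <- matrix(c(0, 0,
              1, 0,
              0, 1), nrow = 3, byrow = TRUE)

# Likelihood ratio test of alpha = A %*% psi
summary(alrtest(z = jo_fit, A = A, r = 1))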

Step 6: Testing for linear restrictions in the cointegrating vectors


An important feature of the Johansen approach is that it allows us to obtain estimates of the coefficients
of the matrices 𝛼 and 𝛽, and then test for possible linear restrictions regarding those matrices. Especially
for matrix 𝛽, the matrix that contains the long-run parameters, this is very important because it allows us
to test specific hypotheses regarding various theoretical predictions from an economic theory point of view.
So, for example, if we examine a money–demand relationship, we might be interested in testing restrictions
regarding the long-run proportionality between money and prices, or the relative size of income and interest-
rate elasticities of demand for money and so on. For more details regarding testing linear restrictions in the
Johansen framework, see Enders (1995) and Harris (1997).
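In R such restrictions can be tested with blrtest() from the urca package, which imposes 𝛽 = 𝐻𝜑. A minimal sketch for a three-variable system with 𝑟 = 1, where jo_fit is again a hypothetical fitted ca.jo object and the restriction (for illustration only) forces the first two long-run coefficients to be equal and opposite:

# The first column of H ties the coefficients of variables 1 and 2
# together with opposite signs; variable 3 is left unrestricted
H <- matrix(c( 1, 0,
              -1, 0,
               0, 1), nrow = 3, byrow = TRUE)

# Likelihood ratio test of beta = H %*% phi
summary(blrtest(z = jo_fit, H = H, r = 1))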

Numerical Example: The Johansen approach in R
Importing the data
library(readxl)

# Import the quarterly data set (money_ita.xls in the working directory)
money_ita <- read_excel(path = "money_ita.xls", col_names = TRUE)
str(money_ita)

## tibble [92 x 5] (S3: tbl_df/tbl/data.frame)
## $ obs: chr [1:92] "1975Q1" "1975Q2" "1975Q3" "1975Q4" ...
## $ CPI: num [1:92] 18 18.5 18.9 19.4 20.2 21.5 22.1 23.5 24.3 25.3 ...
## $ GDP: num [1:92] 131 136 141 146 155 168 182 193 203 210 ...
## $ M2 : num [1:92] 96 101 108 114 120 ...
## $ R : num [1:92] 8 7 6 6 12 12 12 15 15 13 ...
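Following Step 1 of the Johansen procedure, we first test the order of integration of each series with ADF tests, in levels and then in first differences, including a trend term and selecting the lag length by the AIC.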
library(urca)

summary(ur.df(log(money_ita$GDP),lags = 12,type = "trend", selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##

## Residuals:
## Min 1Q Median 3Q Max
## -0.014366 -0.005235 -0.001217 0.004200 0.025370
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.291e-01 3.918e-02 3.295 0.0015 **
## z.lag.1 -1.698e-02 6.754e-03 -2.515 0.0141 *
## tt 7.776e-05 1.707e-04 0.456 0.6500
## z.diff.lag 3.405e-01 1.057e-01 3.220 0.0019 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007754 on 75 degrees of freedom
## Multiple R-squared: 0.7562, Adjusted R-squared: 0.7464
## F-statistic: 77.53 on 3 and 75 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -2.5146 11.6235 13.9825
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
summary(ur.df(log(money_ita$M2),lags = 12,type = "trend",selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.025894 -0.008987 0.001470 0.007708 0.033671
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.511e-01 6.155e-02 2.455 0.0164 *
## z.lag.1 -2.141e-02 1.152e-02 -1.858 0.0671 .
## tt -1.439e-05 2.269e-04 -0.063 0.9496
## z.diff.lag 1.437e-01 1.111e-01 1.293 0.2000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01311 on 75 degrees of freedom
## Multiple R-squared: 0.4673, Adjusted R-squared: 0.446
## F-statistic: 21.93 on 3 and 75 DF, p-value: 2.679e-10

##
##
## Value of test-statistic is: -1.8579 14.0471 12.8675
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
summary(ur.df(money_ita$R,lags = 12,type = "trend",selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3228 -0.5012 -0.0013 0.3400 3.7010
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.356400 0.791916 2.976 0.00396 **
## z.lag.1 -0.115888 0.041833 -2.770 0.00710 **
## tt -0.017926 0.006190 -2.896 0.00499 **
## z.diff.lag1 -0.007524 0.110264 -0.068 0.94578
## z.diff.lag2 0.139923 0.110167 1.270 0.20808
## z.diff.lag3 0.239659 0.109661 2.185 0.03206 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9019 on 73 degrees of freedom
## Multiple R-squared: 0.1657, Adjusted R-squared: 0.1086
## F-statistic: 2.9 on 5 and 73 DF, p-value: 0.01918
##
##
## Value of test-statistic is: -2.7703 3.3149 4.8376
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
summary(ur.df(diff(log(money_ita$GDP),1),lags = 12,type = "trend",selectlags = "AIC"))

##
## ###############################################

## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.015234 -0.005119 -0.001201 0.004626 0.026595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.292e-02 7.369e-03 4.468 2.79e-05 ***
## z.lag.1 -5.926e-01 1.243e-01 -4.770 9.03e-06 ***
## tt -3.334e-04 8.061e-05 -4.136 9.24e-05 ***
## z.diff.lag 3.519e-02 1.157e-01 0.304 0.762
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008065 on 74 degrees of freedom
## Multiple R-squared: 0.2888, Adjusted R-squared: 0.2599
## F-statistic: 10.01 on 3 and 74 DF, p-value: 1.286e-05
##
##
## Value of test-statistic is: -4.7696 7.7073 11.3749
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
summary(ur.df(diff(log(money_ita$M2),1),lags = 12,type = "trend",selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.028640 -0.007972 -0.000224 0.009323 0.033706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0326516 0.0078569 4.156 8.61e-05 ***

## z.lag.1 -0.7436077 0.1458247 -5.099 2.54e-06 ***
## tt -0.0003539 0.0001006 -3.516 0.000752 ***
## z.diff.lag -0.1123496 0.1146482 -0.980 0.330303
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01336 on 74 degrees of freedom
## Multiple R-squared: 0.4277, Adjusted R-squared: 0.4045
## F-statistic: 18.44 on 3 and 74 DF, p-value: 4.96e-09
##
##
## Value of test-statistic is: -5.0993 8.8782 13.0874
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
summary(ur.df(diff(money_ita$R,1),lags = 12,type = "trend",selectlags = "AIC"))

##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6933 -0.3457 0.0113 0.2238 4.3455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.270968 0.275544 0.983 0.328662
## z.lag.1 -0.763221 0.195772 -3.899 0.000213 ***
## tt -0.006468 0.004996 -1.295 0.199529
## z.diff.lag1 -0.278424 0.164271 -1.695 0.094355 .
## z.diff.lag2 -0.180306 0.115221 -1.565 0.121939
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.948 on 73 degrees of freedom
## Multiple R-squared: 0.5332, Adjusted R-squared: 0.5076
## F-statistic: 20.85 on 4 and 73 DF, p-value: 1.715e-11
##
##
## Value of test-statistic is: -3.8985 5.0767 7.6085
##
## Critical values for test statistics:
## 1pct 5pct 10pct

## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
So at levels the variables are non-stationary, but their first differences are stationary: all three series are
𝐼(1) processes.
• Now create a new data frame containing the interest rate and the logs of GDP and M2, and perform the Johansen test in R
money_ita$lgdp <- log(money_ita$GDP)
money_ita$lm2  <- log(money_ita$M2)

# Columns 5 to 7 of the augmented data frame are R, lgdp and lm2
coint <- money_ita[, 5:7]
library(vars)

## Loading required package: MASS
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: lmtest
VARselect(coint,type = "const")$selection

## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      2      2      1      2
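The AIC, HQ and FPE criteria all select a lag length of 2 (SC selects 1), so we set K = 2 in the Johansen test, restrict a linear trend to the cointegrating relation and use the trace statistic: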
jotest<-ca.jo(coint,type = "trace",ecdet = "trend",K=2,spec = "transitory")
summary(jotest)

##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: trace statistic , with linear trend in cointegration
##
## Eigenvalues (lambda):
## [1] 3.619839e-01 1.712078e-01 1.075033e-01 4.223984e-17
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 2 | 10.24 10.49 12.25 16.26
## r <= 1 | 27.14 22.76 25.32 30.45
## r = 0 | 67.58 39.06 42.44 48.45
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)

##
## R.l1 lgdp.l1 lm2.l1 trend.l1
## R.l1 1.000000 1.0000000 1.0000000 1.000000
## lgdp.l1 -76.792725 -57.0261553 78.4958532 -64.502460
## lm2.l1 94.539417 44.6826350 -88.4201555 24.963193
## trend.l1 1.245786 0.6680202 -0.1901579 1.423052
##
## Weights W:
## (This is the loading matrix)
##
## R.l1 lgdp.l1 lm2.l1 trend.l1
## R.d -0.0044105639 -0.2080534224 -0.059065862 5.964115e-13
## lgdp.d -0.0001674396 0.0012239896 -0.000443017 -9.658481e-15
## lm2.d -0.0005870527 -0.0004233637 0.000282320 1.293889e-14
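The trace test rejects 𝑟 = 0 at the 1% level (67.58 > 48.45) and 𝑟 ≤ 1 at the 5% level (27.14 > 25.32), but cannot reject 𝑟 ≤ 2 even at the 10% level (10.24 < 10.49). At the 5% significance level we therefore conclude that there are two cointegrating vectors among the interest rate, log GDP and log M2. The normalized eigenvectors give the corresponding long-run relations, and the loading matrix W contains the speed-of-adjustment coefficients of each equation.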
Further reading:
• http://www.econ.uiuc.edu/~econ508/R/e-ta8_R.html
• https://www.econometrics-with-r.org/16-3-cointegration.html
• https://search.r-project.org/CRAN/refmans/egcm/html/egcm.html
• https://www.zeileis.org/teaching/AER/Ch-TimeSeries.pdf
