

THE COINTEGRATED VAR METHODOLOGY

JAMES READE

Abstract. Outline of how to go about a cointegration analysis, as provided by


the Summer School in Econometrics at the University of Copenhagen, Summer
2005.

Contents
1. Introduction
2. A brief motivation for the cointegrated VAR model
3. The vector autoregressive model
3.1. The system
3.2. Conditional factorisation and weak exogeneity
3.3. A first order autoregressive model
3.4. A second order autoregressive model
3.5. Bi-variate second order vector autoregressive model with deterministic terms
3.6. The unrestricted vector autoregressive model
3.7. Estimating the unrestricted VAR in PcGive
4. Diagnostic Testing on the unrestricted VAR model
4.1. The Assumptions of the VAR model
4.2. The test output from PcGive
4.3. Other information for diagnosing problems
4.4. Solving Diagnosed Problems
5. The Cointegrated Vector Autoregressive Model
5.1. The Model
5.2. Constant and Trend
5.3. Dummy variables
5.4. Estimation and rank determination
5.5. Additional Information on Rank Determination
6. Limiting Distributions of the Trace test
7. Imposing restrictions and Identification
7.1. β restrictions
7.2. α restrictions
7.3. H-form versus R-form and inputting restrictions in PcGive
7.4. Using restrictions to understand the system

Thanks be first and foremost to God, without whom none of this would be possible. In addition,
I would like to thank Katerina Juselius, Soren Johansen, Heino Bohn Nielsen and Anders Rahbek
who presented the Econometrics Summer School in Copenhagen in August 2005, and Nick Fawcett
for kindly looking over the notes and pointing out numerous (I’m sure) appalling errors in earlier
drafts of these notes. Thanks also to David Hendry, Jennie Castle and Bent Nielsen amongst
others for discussions in Oxford while forming my ‘knowledge’ of the cointegrating VAR model.

7.5. Identification of β
7.6. Identification of α and the short-run structure
8. Extensions of the CVAR model
9. Data
9.1. The Economic Problem of Interest
9.2. Use of past theoretical and empirical work to derive a list of relevant variables for inclusion
9.3. Other considerations for variable selection
9.4. Preparing data in PcGive
Appendix A. The Frisch-Waugh Theorem
References

1. Introduction
These notes accompany two seminars given at Cambridge Econometrics on De-
cember 8th 2005, and attempt to espouse the cointegrated vector autoregressive
(CVAR) methodology (Johansen 1995) as presented during the Econometrics Sum-
mer School at the University of Copenhagen in August 2005. Firstly, a brief
motivation for using the CVAR over other forms of modelling for systems of time-
series variables will be given in Section 2, before the theory behind the CVAR will
be given, helping discussion on the motivation for using such models. This will
be done by first considering the unrestricted vector autoregressive (VAR) model in
Section 3 and the various diagnostic tests that can be performed to ensure that the
vital assumptions underlying the statistical inference hold (Section 4), before the
CVAR is covered in Section 5. After this, data selection issues will be looked
at in Section 9, and an empirical example will be considered. At each stage, the
implementation of the procedure in PcGive will be described, along with accompanying
Ox jobs for various tasks.1 Please don’t be put off by the daunting
number of pages in these notes; there are many pictures, and bits of output from
GiveWin and Ox pasted in, taking up the space!

2. A brief motivation for the cointegrated VAR model


Modelling economic data in order to garner some understanding of the economy
is a difficult task, not least because it is very hard to understand exactly which
variables might be important in determining a parameter of interest, such as the
marginal propensity to save, or the investment multiplier. Furthermore, in any
regression model a vast number of assumptions are made, which if violated will
adversely affect the inference, giving confusing and potentially erroneous conclusions.
A particularly important assumption is that of exogeneity, which places particular
variables on the right hand side of a regression equation. By running a CVAR on
a system of variables this problem can to some extent be circumvented: no such
conditional factorisation is made before starting, and variables can later be tested
for exogeneity and, where the tests support it, restricted to be exogenous.

1These Ox jobs will be made available at the seminars; however if anyone would like copies of
these jobs, email me.

Perhaps a more pertinent problem in time-series econometrics is that of non-stationarity:
if a variable is non-stationary, inference is adversely affected. A solution often used
in the literature is to difference data until it is stationary; however,
this loses much information on the levels of the data processes. Further, differencing
might not result in a stationary series. The cointegration framework allows a way
around this problem by modelling non-stationary data through linear combinations
of the levels of non-stationary variables that are stationary and called cointegrating
relations. Furthermore, structural breaks appear to be very frequent occurrences
in time-series data; even if one knows a break has taken place, it is not usually
very clear from the data where exactly that break has taken place. Nevertheless,
dummy variables of all types can be incorporated into the analysis to potentially
allow explanations for oddities that otherwise have puzzled theorists. Further,
while work is at an early stage, a technique called ‘dummy saturation’ in single
equation time-series shows promise to help in this area (Hendry & Santos 2005).
The CVAR approach extends the single-equation cointegration methodology into
dynamic systems of time-series variables, enabling the practitioner to explicitly
model the short and long run dependencies between variables in the system without
making a priori assumptions on exogeneity. Furthermore, theoretical economic
models, in particular models of the dynamic stochastic general equilibrium class,
posit systems of variables, some of which are endogenous, some exogenous, and
steady state relations. An example is the model of Pétursson & Sløk (2001), which
considers simply the labour market.2 The system of variables they use includes:
employment (nt ), output (yt ), wages (wt ) and prices (pt ); and one of the steady
state conditions is given as:
(2.1) nt = ϕ0 + ϕ1 yt − ϕ2 (w − p)t .
Later it will be seen that steady state relationships accord very well to the CVAR
model and the cointegrating relations that are modelled therein. Further, the as-
sumptions of endogeneity and exogeneity made for economic models can be explic-
itly tested, and the theoretical relationships such as (2.1) can be tested empirically.

3. The vector autoregressive model


3.1. The system. The vector autoregressive (VAR) model can be written as:
(3.1) Xt = µ0 + µ1 t + Π1 Xt−1 + · · · + Πk Xt−k + ΦDt + εt .
Xt is a p-dimensional data vector at time t:
(3.2) Xt = (X1,t , X2,t , . . . , Xp,t )′ .
The model in (3.1) is specified to have initial values X0 , X−1 , . . . , X−k+1 ; the
error terms are assumed to be independently, identically Normally distributed,
ε1 , . . . , εt ∼ iid N(0, Ω); there are deterministic terms D1 , . . . , Dt ; and the parameters
of the model are p × p matrices (Π1 , . . . , Πk ), along with potentially the p × 1 vectors of

2This model is discussed in Section 9.2, and the UK labour market is the subject of the empirical
applications in these notes.

means, µ0 , and trend coefficients, µ1 . The assumption of Normality of the errors


εt gives the distribution of Xt :
(3.3) Xt |Xt−1 , . . . , Xt−k ∼ N (µ0 + µ1 t + Π1 Xt−1 + · · · + Πk Xt−k + ΦDt , Σ)
= N (B′Wt , (Wt′ Wt )^{−1} Ω).
Here Σ is some function of Ω and the regressors Wt = (1, t, Xt−1 , . . . , Xt−k , Dt ),
while the parameters are grouped into B = (µ0 , µ1 , Π1 , . . . , Πk , Φ). The construc-
tion of the dataset (3.2) will be covered in Section 9, and the deterministic terms
Dt along with intercept and trend terms will be discussed in Sections 5.2 and 5.3.

3.2. Conditional factorisation and weak exogeneity. It is worth pointing out


that in single equation modelling of any particular variable in Xt , one makes the
assumption that the right hand side variables (lags of all the variables in Xt ) are
weakly exogenous for the parameters of interest, θ, namely the regression coeffi-
cients. One might think of this as splitting the data Xt into two subsets, Yt and
Zt (see Hendry 1995):
(3.4) Xt0 = (Yt0 : Zt0 ).
It is assumed Xt has a probability distribution with parameters Θ: f (Xt , Θ). Re-
gression of Yt on Zt , where for example:
(3.5) Yt = X1,t
(3.6) Zt = (X2,t , . . . , Xp,t ),
involves the following factorisation, which uses Bayes’ theorem:
(3.7) f (Xt , Θ) = f (Yt |Zt , Θ1 )f (Zt , Θ2 ).
The idea is that the variables Zt are weakly exogenous, i.e. they are not
affected by Yt , so given interest in the conditional mean of Yt on Zt , there is no
need to model Zt . Zt is said to be weakly exogenous with respect to the parameters
of interest, θ, if:
i: θ = g(Θ1 ) alone, where g is some function.
ii: Θ1 and Θ2 are variation free.
This means that the parameters of interest must be a function of the parameter
space of the conditional model alone, i.e. the model of Yt given Zt : there can’t be
anything to be learnt about the parameters of interest that isn’t contained in the
conditional density; and secondly the parameters of the conditional model cannot
move with the parameters of the marginal model, which means there can’t be any
feedback from Yt to Zt .

3.3. A first order autoregressive model. Equation (3.1) might look a touch
fearsome at first, and as such one might consider a number of examples that grad-
ually build up to the VAR(k) process and the analysis thereof. Firstly the simple
first order autoregressive (AR(1)) process is considered:
(3.8) xt = ρxt−1 + µ + εt .
Just as one solves a differential equation, this model has a solution, a representation
in terms of all the factors contributing to its determination at any particular point,

which can be found by recursively substituting (3.8) to get:
(3.9) x_t = ρ^t x_0 + Σ_{i=1}^{t} ρ^i (µ + ε_i ).

This is the moving average representation of the process. A number of possible


cases exist for this model, which are determined by what value the autoregressive
parameter ρ takes. Firstly stationarity is considered; a process z1 , . . . , zt is called
(strongly) stationary if its distribution is the same at all points in time; specifically:
(3.10) (z_1 , . . . , z_t ) =_D (z_{1+s} , . . . , z_{t+s} ) ∀s.
The process zt is called weakly, or covariance, stationary if:
(3.11) E(z_t ) = µ ∀t
(3.12) Cov(z_t , z_{t−s} ) = γ(s) ∀t.
It is usually the latter concept that is emphasised because when Normality of the
residuals εt is assumed, weak stationarity implies strong stationarity.3 If |ρ| < 1,
then the model can be characterised as follows:
E(x_t |x_0 ) = ρ^t x_0 + Σ_{i=1}^{t} ρ^i µ −→ µ/(1 − ρ)
Var(x_t |x_0 ) = E[(Σ_{i=1}^{t} ρ^i ε_i )^2 ] = Σ_{i=1}^{t} ρ^{2i} σ^2 −→ σ^2/(1 − ρ^2 )
Cov(x_t , x_{t−k} |x_0 ) = ρ^k Σ_{i=1}^{t−k} ρ^{2i} σ^2 −→ ρ^k σ^2/(1 − ρ^2 ).
Hence the process is stationary asymptotically since there is no time dependence
in the limit, but in small samples it is not stationary as the initial values x0 affect
the moments of the process. If ρ = 1, then (3.9) becomes:
(3.13) x_t = x_0 + Σ_{i=1}^{t} (µ + ε_i ),
and so:
E(x_t |x_0 ) = x_0 + tµ −→ ∞
Var(x_t |x_0 ) = Var(Σ_{i=1}^{t} ε_i ) = tσ^2 −→ ∞.

Here the mean and variance are functions of t and hence are not stationary. This
is the unit root case, and corresponds to a large number of economic data series.
The explosive case, where |ρ| > 1, is not considered here, but the moments also tend
to infinity. The motivation for the moving average formulation of the autoregressive
model can be seen here; it is very simple to characterise the data process under
consideration. This principle is the same for the more complicated models that will
be introduced later. Also helpful for understanding the derivation of the moving

3This is because the first two moments of the Normal distribution completely characterise it.

average representation in more complicated models, it can be shown that if |ρ| < 1
then (3.8) implies, where L is the lag operator defined by L^k x_t = x_{t−k} (and setting
µ = 0 for simplicity):
(3.14) x_t − ρx_{t−1} = ε_t
(3.15) (1 − ρL)x_t = ε_t
(3.16) x_t = (1 − ρL)^{−1} ε_t = ε_t + ρε_{t−1} + ρ^2 ε_{t−2} + . . . ,
which is the infinite moving average (MA(∞)) representation; the last step, in
(3.16), follows from the formula for the sum to infinity of a geometric progression.
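The stationary and unit root cases above can be illustrated by simulation. The following sketch (plain Python with NumPy; the values ρ = 0.6, µ = 1 and σ = 1 are illustrative, not taken from these notes) checks that a long realisation of a stationary AR(1) has sample mean near µ/(1 − ρ) = 2.5 and sample variance near σ²/(1 − ρ²) = 1.5625:

```python
import numpy as np

# Simulate x_t = rho*x_{t-1} + mu + eps_t with |rho| < 1 (stationary case)
rng = np.random.default_rng(0)
rho, mu, sigma, T = 0.6, 1.0, 1.0, 100_000

eps = rng.normal(0.0, sigma, T)
x = np.empty(T)
x[0] = mu / (1 - rho)              # start at the asymptotic mean
for t in range(1, T):
    x[t] = rho * x[t - 1] + mu + eps[t]

print(x.mean())                     # close to mu/(1-rho) = 2.5
print(x.var())                      # close to sigma^2/(1-rho^2) = 1.5625
```

Setting ρ = 1 instead makes both moments grow with t, as in (3.13).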

3.3.1. Impulse response analysis. In this setting, using the moving average repre-
sentation, the impulse response function can be discussed. The question is asked:
if the economy is shocked or impulsed now, where will it be in h periods? This
question is formally written:
(3.17) x_{t+h} = ρ^h x_t + Σ_{i=1}^{h} ρ^{h−i} (µ + ε_{t+i} ).
The expectation of this is:
(3.18) E(x_{t+h} |x_t ) = ρ^h x_t + Σ_{i=1}^{h} ρ^{h−i} µ.

The impulse response is defined as:
(3.19) IR(h) = ∂E(x_{t+h} |x_t )/∂x_t = ρ^h ,
and so if the process is stationary, |ρ| < 1, then IR(h) −→ 0: the impulse
dies away eventually. If the process has a unit root, ρ = 1, then IR(h) = 1 ∀h,
i.e. the shock cumulates and never dies away. If the process is explosive, |ρ| > 1,
then the impulse response tends to infinity. These three possible cases are shown
in Figure 1, where the simulated process is given in each panel of the graph.
Impulse response analysis is often used in empirical studies; however, it should
be noted that the impulse in impulse response analysis such as this is to the residual
of the statistical model, which need not correspond to any economic idea of a shock.
Economic models prescribe a whole plethora of shocks, be they monetary policy
shocks, technology shocks or whatever, but the statistical model errors are what is
left unexplained in the model and so are influenced by lag length misspecification,
omitted variables and other potential ailments of econometric models. Restrictions
might be placed upon the model to help identification of structure so that it accords
to a particular economic theory: however, impulse responses are dependent on the
particular restrictions imposed on the model and hence are far from an exact science.
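The three cases of (3.19) can be computed directly, since for the AR(1) the impulse response at horizon h is simply ρ^h. A minimal sketch (NumPy; the ρ values are the illustrative ones used in Figure 1):

```python
import numpy as np

# AR(1) impulse response: IR(h) = rho^h for a unit shock at time t
def impulse_response(rho: float, horizon: int) -> np.ndarray:
    return rho ** np.arange(horizon + 1)

for rho in (0.6, 1.0, 1.03):        # stationary, unit root, explosive
    ir = impulse_response(rho, 20)
    print(rho, ir[-1])              # dies out, stays at 1, or grows
```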

3.4. A second order autoregressive model. Considering an autoregressive pro-


cess of 2 lags (AR(2)) allows intuition into how the moving average process works
in more complicated systems, and how unit roots and stationarity affect the repre-
sentation. The AR(2) model is written as:
(3.20) xt = π1 xt−1 + π2 xt−2 + εt .
THE COINTEGRATED VAR MODEL 7

[Figure 1. Simulated impulse responses for three possible scenarios: a stationary
process (xt = 0.6xt−1 + εt ), a random walk process (xt = xt−1 + εt ), and an
explosive process (xt = 1.03xt−1 + εt ).]

This can be rearranged as:
(3.21) (1 − π_1 L − π_2 L^2 ) x_t = ε_t .
Taking (3.21) the characteristic (lag) polynomial is defined, subbing z’s for L’s, as:
(3.22) Π(z) = 1 − π_1 z − π_2 z^2 = (1 − ρ_1 z)(1 − ρ_2 z),
(3.23) 1/Π(z) = 1/[(1 − ρ_1 z)(1 − ρ_2 z)] = (Σ_{i=0}^{∞} ρ_1^i z^i )(Σ_{i=0}^{∞} ρ_2^i z^i ) = Σ_{n=0}^{∞} c_n z^n ,

where the step to summations in (3.23) follows from the definition of the sum to
infinity of a geometric progression, and cn → 0 as n → ∞ as it is a function of
|ρ1 | < 1 and |ρ2 | < 1.4 Hence as long as |ρ1 | < 1 and |ρ2 | < 1, then as described
above in (3.16), an infinite order moving average representation exists (reinstating
the mean term µ):
(3.24) x_t = Σ_{n=0}^{∞} c_n (µ + ε_{t−n} ),

and so expectation, again found reversing the steps using the sum to infinity formula
in (3.23):5
(3.25) E(x_t |x_0 ) = Σ_{n=0}^{∞} c_n µ = µ/[(1 − ρ_1 )(1 − ρ_2 )] = µ/(1 − π_1 − π_2 ).

4i.e. the coefficients on a particular error dampen to zero in the limit; before then they
could be any size, but will be decreasing with time.
5Which can be found more simply by assuming a stationary process in (3.20), taking expectations
and rearranging.

So again the process can be characterised using the moving average process, al-
though arriving at the MA(∞) representation is more complicated.
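The factorisation Π(z) = (1 − ρ1 z)(1 − ρ2 z) can be carried out numerically. The sketch below (NumPy; the AR(2) xt = 1.2xt−1 − 0.35xt−2 + εt is an illustrative example, not a model from these notes) finds the roots of Π(z) and inverts them, recovering ρ1 = 0.7 and ρ2 = 0.5, both inside the unit circle, so this process is stationary:

```python
import numpy as np

# AR(2): x_t = pi1*x_{t-1} + pi2*x_{t-2} + eps_t
# Characteristic polynomial Pi(z) = 1 - pi1*z - pi2*z^2 = (1 - rho1*z)(1 - rho2*z)
pi1, pi2 = 1.2, -0.35

# np.roots wants coefficients from the highest power of z downwards
z_roots = np.roots([-pi2, -pi1, 1.0])   # roots of Pi(z), i.e. the values 1/rho_i
rho = 1.0 / z_roots

print(np.sort(np.abs(rho)))             # stationary if all |rho_i| < 1
```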

3.4.1. Impulse response analysis. As before, the impulse response is found from
the moving average representation. However, it is more complicated here because
of the second lag. The form of the impulse response can be known; it will be a
function of cn since the value of the process at t + h can be written as:
(3.26) x_{t+h} = Σ_{n=0}^{∞} c_n (µ + ε_{t+h−n} )
(3.27) = c_0 (µ + ε_{t+h} ) + c_1 (µ + ε_{t+h−1} ) + . . .
+ c_{h−1} (µ + ε_{t+1} ) + c_h (µ + ε_t ) + c_{h+1} (µ + ε_{t−1} ) + . . .
(3.28) = · · · + c_h (x_t − π_1 x_{t−1} − π_2 x_{t−2} ) + . . . .
The process is impulsed at time t and the value of the process at time t + h is
of interest; hence for any given h, the relevant question from (3.27) is the size of
ch , since every other residual before and after t is set to zero in this analysis.
Hence the impulse response is:
(3.29) IR(h) = ∂E(x_{t+h} |x_0 , . . . , x_t )/∂x_t = c_h −→ 0.

3.5. Bi-variate second order vector autoregressive model with determin-
istic terms. The next thing to consider is a 2-dimensional VAR(2) model with
deterministic terms Dt :
(3.30) Xt = Π1 Xt−1 + Π2 Xt−2 + ΦDt + εt ,
or, written out in components (with the deterministic terms suppressed):
(3.31)
[ X1,t ]   [ π1,11 π1,12 ] [ X1,t−1 ]   [ π2,11 π2,12 ] [ X1,t−2 ]   [ ε1,t ]
[ X2,t ] = [ π1,21 π1,22 ] [ X2,t−1 ] + [ π2,21 π2,22 ] [ X2,t−2 ] + [ ε2,t ].
The characteristic polynomial can be defined as, where z is a scalar and πk,ij is the
ij th element of the Πk coefficient matrix:
(3.32)
Π(z) = I2 − Π1 z − Π2 z^2 = [ 1 − π1,11 z − π2,11 z^2    −π1,12 z − π2,12 z^2 ]
                            [ −π1,21 z − π2,21 z^2    1 − π1,22 z − π2,22 z^2 ].
As in the univariate system, Π(z) can be used to characterise the properties of the
model, this time two processes. The equivalent to solving (3.32) for the roots is to
solve det(Π(z)) = 0, which will give four roots and can be written:
(3.33) det(Π(z)) = (1 − ρ1 z)(1 − ρ2 z)(1 − ρ3 z)(1 − ρ4 z) = 0,
where ρi are functions of Π1 and Π2 . Linear algebra provides:
(3.34) Π(z)^{−1} = adj(Π(z))/det(Π(z)),
where adj(Π(z)) is the adjoint matrix of Π(z), and each element of this is at most
order 2 since the matrix is only 2 × 2. Thus what happens to det(Π(z)) matters
for determining whether or not Π(z)^{−1} converges. The form of det(Π(z)) in (3.33)
means Π(z)^{−1} takes the form:
Π(z)^{−1} = P(z)/[(1 − ρ1 z)(1 − ρ2 z)(1 − ρ3 z)(1 − ρ4 z)]
= P(z) (Σ_{i=0}^{∞} ρ_1^i z^i )(Σ_{j=0}^{∞} ρ_2^j z^j )(Σ_{k=0}^{∞} ρ_3^k z^k )(Σ_{m=0}^{∞} ρ_4^m z^m )
= Σ_{n=0}^{∞} P_n^∗ z^n ,
where P(z) is a second order function of z which in the last line is incorporated into
P_n^∗ . Then P_n^∗ converges exponentially to zero if |ρi | < 1 for all i, and Xt has an
MA(∞) representation, with Π(z)^{−1} = Σ_{i=0}^{∞} P_i^∗ z^i :
X_t = Σ_{i=0}^{∞} P_i^∗ (ΦD_{t−i} + ε_{t−i} ) = Π(L)^{−1} (ΦD_t + ε_t ),
with E(X_t ) = Σ_{i=0}^{∞} P_i^∗ ΦD_{t−i} and Var(X_t ) = Σ_{i=0}^{∞} P_i^∗ ΩP_i^{∗′} . Hence Xt is not
stationary as Dt depends on t, but Xt − E(Xt ) is stationary.
3.5.1. The companion form of a vector autoregressive model. Completing the set of
building block models, there exists a useful way of expressing the VAR(k) process
that is often used; using the VAR(2) model in (3.30), it can be transformed into
companion matrix form:
(3.35) [ X_t ; X_{t−1} ] = [ Π1 Π2 ; I2 0 ] [ X_{t−1} ; X_{t−2} ] + [ ΦD_t + ε_t ; 0 ]
(3.36) = ΞX̄_t + v_t ,
where Ξ, X̄t and vt are suitably defined, and semicolons denote vertical stacking. Ξ is
the companion matrix, and it can be seen that the potentially k-lagged system has
been reduced to a VAR(1) representation, which is again useful for characterising
the model via the MA representation. The roots of the companion matrix correspond
to the roots of the system above, and are found by solving the eigenvalue problem:
(3.37) det(Ξ − ρI_4 ) = 0,
which could be equivalently written as:
(3.38) Ξ [ v_1 ; v_2 ] = ρ [ v_1 ; v_2 ],
and using the equations that (3.38) produces:
Π1 v_1 + Π2 v_2 = ρv_1
v_1 = ρv_2
(3.39) ⇒ Π1 v_1 + Π2 ρ^{−1} v_1 = ρv_1 ,
and since det(A − ρI) = 0 ⇐⇒ Av = ρv for some v ≠ 0, (3.39) implies
det(ρI_2 − Π1 − Π2 ρ^{−1} ) = 0, or:
(3.40) ρ^{−1} Π1 v_1 + ρ^{−2} Π2 v_1 = v_1 ⇐⇒ det(I_2 − Π1 ρ^{−1} − Π2 ρ^{−2} ) = 0.

So if the roots of the characteristic polynomial, the values z = ρ^{−1} solving (3.40),
are outside the unit circle, then the eigenvalues of the companion matrix, the values
ρ, are inside the unit circle and the system is stationary. This may seem slightly
confusing, but one polynomial is being solved in z and the other in ρ = z^{−1} , so
‘inside’ and ‘outside’ the unit circle swap between the two formulations. The
intuition, however, is as in the simple AR(1) or AR(2) model; the conditions for
stationarity enable the MA(∞) representation and so allow characterisation of the
more complicated VAR model.
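The companion-matrix check can be sketched in a few lines (NumPy; the coefficient matrices Π1 and Π2 below are illustrative, not estimates from these notes). The eigenvalues of Ξ are computed and compared with the unit circle; for these particular matrices the largest modulus is 0.8, so the implied VAR(2) is stationary:

```python
import numpy as np

# Companion matrix Xi = [[Pi1, Pi2], [I2, 0]] for a bivariate VAR(2)
Pi1 = np.array([[0.5, 0.1],
                [0.0, 0.3]])
Pi2 = np.array([[0.2, 0.0],
                [0.1, 0.2]])

Xi = np.block([[Pi1, Pi2],
               [np.eye(2), np.zeros((2, 2))]])

eig = np.linalg.eigvals(Xi)
print(np.abs(eig))                  # all moduli inside the unit circle here
```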

3.6. The unrestricted vector autoregressive model. Thus by a long tour,


estimation of the unrestricted VAR model for a general system is reached. The
model with two lags is, writing (3.30) again:
Xt = Π1 Xt−1 + Π2 Xt−2 + ΦDt + εt
Now define:
(3.41) B = [ Π1′ ; Π2′ ; Φ′ ],   W_t = [ X_{t−1} ; X_{t−2} ; D_t ]
(with semicolons denoting vertical stacking), giving:
(3.42) X_t = B′W_t + ε_t .

3.6.1. Maximum likelihood estimation of the unrestricted VAR. Maximum likeli-


hood estimation of a set of parameters θ involves firstly defining the likelihood
function for θ. This is the joint density of the variables under consideration,
Xt given the parameters θ. The autoregressive structure of the model means a
sequential factorisation must take place:
(3.43) f (X_T , X_{T−1} , . . . , X_1 , X_0 ) = Π_{t=k}^{T} f (X_t |X_{t−1} , . . . , X_{t−k} ).
Thus the likelihood can be written as:
(3.44) L(θ; X_t ) = Π_{t=k}^{T} f (X_t |X_{t−1} , . . . , X_{t−k} ; θ).

Then the maximum likelihood estimator of θ given the data Xt is defined as:
(3.45) θ̂ = arg max_θ L(θ; X_t ),

i.e. the value of θ that, given the assumed distributional form and the data Xt ,
maximises the likelihood function. The likelihood function can be seen as a measure
of plausibility: given the distributional form assumed and the data observed, how
plausible is a particular parameter value? Often logarithms are used to make the
likelihood function more tractable; because the logarithm is a monotonic transfor-
mation, this does not change the maximiser. Hence (3.45) might be written as:
(3.46) θ̂ = arg max_θ log L(θ; X_t ).

In the VAR case, where the residuals are assumed to be Normally distributed with
mean 0 and variance Ω as in (3.1), it can be shown that maximum likelihood
estimation is equivalent to OLS on each equation, and hence:
(3.47) min_B Σ_{t=0}^{T} (X_t − B′W_t )′(X_t − B′W_t )
(3.48) 0 = Σ_{t=0}^{T} (X_t − B′W_t )W_t′
(3.49) B̂′ = (Σ_{t=0}^{T} X_t W_t′ )(Σ_{t=0}^{T} W_t W_t′ )^{−1} = M_{XW} M_{WW}^{−1} ,
where the product moment matrices are generically defined as M_{XW} = Σ_{t=0}^{T} X_t W_t′ .
Furthermore:
(3.50) ε̂_t = X_t − B̂′W_t ,
(3.51) Ω̂ = T^{−1} Σ_{t=0}^{T} ε̂_t ε̂_t′ = M_{XX} − M_{XW} M_{WW}^{−1} M_{WX}
(with the product moment matrices here scaled by T^{−1} ), leading to:
(3.52) L_max = L(B̂, Ω̂) = (2π)^{−T p/2} |Ω̂|^{−T/2} exp( −(1/2) tr[ Ω̂^{−1} Σ_{t=1}^{T} ε̂_t ε̂_t′ ] ),
(3.53) L_max^{−2/T} = (2πe)^p |Ω̂|.
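The estimator (3.49) and residual covariance (3.51) are easily checked by simulation. The sketch below (NumPy; a bivariate VAR(1) with an illustrative coefficient matrix, not one of the models in these notes) forms the product moment matrices directly and recovers the true coefficients:

```python
import numpy as np

# OLS on X_t = B'W_t + e_t via product moment matrices, as in (3.49) and (3.51)
rng = np.random.default_rng(1)
T = 5_000
B_true = np.array([[0.5, 0.1],      # stationary VAR(1) coefficient matrix
                   [0.2, 0.3]])

X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = B_true @ X[t - 1] + rng.normal(size=2)

W, Y = X[:-1], X[1:]                # regressors (one lag) and regressands
M_YW = Y.T @ W                      # sum_t X_t W_t'
M_WW = W.T @ W                      # sum_t W_t W_t'
B_hat = M_YW @ np.linalg.inv(M_WW)  # OLS, equation by equation
resid = Y - W @ B_hat.T
Omega_hat = resid.T @ resid / len(Y)

print(B_hat.round(2))               # close to B_true
print(Omega_hat.round(2))           # close to the identity (true error covariance)
```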

3.6.2. Hypothesis Testing with the Maximum Likelihood Framework. One benefit
of using maximum likelihood analysis is the ease with which hypotheses can be
tested. A convenient test is the likelihood ratio test. One can impose a particular
hypothesis upon the model, calculate the likelihood value for that restricted model,
and compare it to the unrestricted maximum likelihood estimator. In other words,
one might test the hypothesis:

(3.54) H0 : θ = θ 0 ,

by calculating the following test statistic:
(3.55) LR = −2 ( log L(θ_0 ; X_t ) − log L(θ̂; X_t ) ) ∼ χ^2_{dim(θ)} .

The test assesses how plausible the restrictions are. If the restrictions move the
likelihood far from its maximum, i.e. far from the values of θ that are most plausible
and best supported by the data, this suggests the restrictions should be rejected.
Restrictions on the model can be formed, and testing them is conceptually quite
straightforward. Restrictions on the estimates B̂′ are formed by constructing
matrices R or H, both of which will be discussed in greater detail in Section 7.
Considering the H form for now, the restrictions are imposed by setting B = Hψ′,
meaning that the following model is estimated:
(3.56) X_t = B′Z_t + ε_t = ψH′Z_t + ε_t .

Estimating the restricted model provides a restricted set of estimators, denoted not
by hats but by checks:
(3.57) ψ̌ = M_{XZ} H (H′M_{ZZ} H)^{−1}
(3.58) Ω̌ = M_{XX} − M_{XZ} H (H′M_{ZZ} H)^{−1} H′M_{ZX} .

From this a likelihood ratio test of the restrictions can be written:
(3.59) LR = T log( |Ω̌|/|Ω̂| ).
Hence the test statistic is very simple. Thankfully, in addition to this, any time-
series econometrics package will calculate the likelihood ratio test of restrictions
imposed.
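A scalar illustration of the likelihood ratio test (NumPy; the regression design is invented for the example) uses the standard form LR = T ln(|Ω̌|/|Ω̂|), which in the single-equation case reduces to T ln(σ̌²/σ̂²). The restriction imposed, a zero coefficient, is true in the simulated data, so the statistic should be small relative to χ²(1):

```python
import numpy as np

# Likelihood ratio test of a zero restriction:
# LR = T * ln(var_restricted / var_unrestricted)
rng = np.random.default_rng(2)
T = 500
W = rng.normal(size=(T, 2))                         # two regressors
y = W @ np.array([1.0, 0.0]) + rng.normal(size=T)   # second coefficient truly zero

def resid_var(y, W):
    """Maximum likelihood residual variance from an OLS fit."""
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    e = y - W @ b
    return (e @ e) / len(y)

LR = T * np.log(resid_var(y, W[:, :1]) / resid_var(y, W))
print(LR)       # compare with chi-squared(1); the 5% critical value is 3.84
```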

3.7. Estimating the unrestricted VAR in PcGive. Once the stage of estimat-
ing the unrestricted VAR has been reached, one ought to have considered which
variables to include in the system, and what is the object of interest (Section 9).
For the examples in these notes, the object of interest is the UK labour market.
At the current stage of research on this, the dataset is as follows, where each
variable, its name in PcGive, and its description are given:
(3.60)
X1t = (y − py )t ympy GDP, expenditure approach, constant prices
X2t = ∆pyt dlypt first difference of GDP Deflator
X3t = (w − pc )t wmpc Real weekly earnings, All activities
X4t = (py − pc )t pwedgey GDP deflator minus CPI
X5t = ert empr Employment rate
X6t = (r − py )t rmpy UK labour force

All variables are in logarithms. Also by this stage, the deterministic terms to
be included should have been decided upon (Section 5.2); in this model, a trend
is restricted to the cointegrating space because a number of the data series are
trending, and also to possibly accommodate a crude output gap measure. This
entails including a trend and a constant at this stage, and ensuring the constant is
unrestricted. Furthermore, if one has non-seasonally adjusted data, then seasonal
dummies should be included.
There are two methods in PcGive to estimate a system. One can use the PcGive
module, or one can write batch code.

3.7.1. PcGive module. To do this, one should follow:


Model --> Formulate... --> [select variables from database] OK -->
[check Unrestricted VAR box] OK --> OK
At the [select variables from database] stage, one must click
on Constant, Trend, and CSeasonal in the Special box on the right hand side, and
ensure that Constant and CSeasonal have U next to them in the Model window
(so they are unrestricted), and that Trend has nothing next to it (so the trend is
restricted). This will result in a
large volume of output being printed in the Results window in GiveWin, which is
given in Section 3.7.3.

3.7.2. Batch code. Writing batch code is quite similar to writing an Ox job. Having
a batch file saved for each particular project being worked on seems a useful idea,
as one can easily change settings without having to go through various different
windows, and, as in Ox, one can write comments on lines to help understand what
is going on. To make a batch code file, one can either first run the unrestricted
VAR as in Section 3.7.1, then open the batch editor in GiveWin6 and click on
“Save as. . . ”, or one can open a new text window and copy and paste the following
into the file:
module("PcGive");
package("PcGive");
usedata("BigDatabase.xls");
system
{
Y = empr, r, wmpc, ympy, pwedgey, dlpyt;
Z = empr_1, empr_2, r_1, r_2, wmpc_1, wmpc_2, ympy_1, ympy_2,
pwedgey_1, pwedgey_2, dlpyt_1, dlpyt_2, Trend;
U = Constant, CSeasonal, CSeasonal_1, CSeasonal_2;
}
estimate("OLS", 1963, 4, 2005, 1);
Then save the file with a “.fl” extension, and run the file by either pressing ctrl + R
or clicking on the button on the toolbar with the little man running with a piece of
paper in his hand. The output should look something like what is in Section 3.7.3.
3.7.3. PcGive output.
SYS( 1) Estimating the system by OLS (using BigDatabase.xls)
The estimation sample is: 1963 (4) to 2005 (1)

URF equation for: lnt


Coefficient Std.Error t-value t-prob
lnt_1 1.54602 0.1156 13.4 0.000
..
.
CSeasonal_2 U -0.000381653 0.0008049 -0.474 0.636

sigma = 0.00306384 RSS = 0.001286038165

URF equation for: llt


Coefficient Std.Error t-value t-prob
lnt_1 0.117226 0.07329 1.60 0.112
..
.
CSeasonal_2 U -0.000313035 0.0005103 -0.613 0.541

sigma = 0.00194229 RSS = 0.0005168287107

URF equation for: wmpc


6This can be done by clicking on the batch button on the GiveWin toolbar (the one with a white
sheet and three red arrows pointing away from it), or pressing “ctrl” and “B” , or going to the
“Tools” menu then “Batch Editor. . . ”.

Coefficient Std.Error t-value t-prob


lnt_1 -0.130782 0.4401 -0.297 0.767
..
.
CSeasonal_2 U -0.000437043 0.003064 -0.143 0.887

sigma = 0.0116637 RSS = 0.0186378236

URF equation for: ympy


Coefficient Std.Error t-value t-prob
lnt_1 1.22135 0.2651 4.61 0.000
..
.
CSeasonal_2 U -0.000545281 0.001846 -0.295 0.768

sigma = 0.00702474 RSS = 0.006760526004

URF equation for: pwedgey


Coefficient Std.Error t-value t-prob
lnt_1 -0.257225 0.2353 -1.09 0.276
..
.
CSeasonal_2 U -0.00314115 0.001638 -1.92 0.057

sigma = 0.00623592 RSS = 0.005327477777

URF equation for: dlpyt


Coefficient Std.Error t-value t-prob
lnt_1 -0.168546 0.2384 -0.707 0.481
..
.
CSeasonal_2 U -0.00382051 0.001660 -2.30 0.023

sigma = 0.00631743 RSS = 0.005467662415

log-likelihood 4063.90453 -T/2log|Omega| 5477.16731


|Omega| 2.19240717e-029 log|Y’Y/T| -44.367709
R^2(LR) 1 R^2(LM) 0.855059
no. of observations 166 no. of parameters 174

4. Diagnostic Testing on the unrestricted VAR model


The unrestricted VAR, as outlined in the last Section, provides useful informa-
tion on the data being considered. As a way of thinking about statistical analysis
of data, Johansen (2004) emphasises that the econometric question to be asked
is “which statistical model describes the data?” Statistical models rely on as-
sumptions, and their properties have been proved under those assumptions.
Hence if a particular statistical model is used, then one must test
that these assumptions hold true for the model. The CVAR approach relies on the

maximum likelihood statistical framework, which makes a larger number of assump-


tions than simply running OLS would do, but there are many benefits from using
maximum likelihood, particularly in terms of estimating the CVAR and testing re-
strictions. Thus these assumptions need to be checked on the unrestricted VAR
before proceeding to restricting rank and other parameter values. In this Section,
firstly the assumptions the VAR model rests upon are outlined, then the formal tests
and other methods for diagnosing problems in the model will be discussed, before
methods to attempt to solve these problems will be outlined.
4.1. The Assumptions of the VAR model. The VAR model assumes:
(1) Linear conditional mean explained by past observations and deterministic
terms (see equation (3.3)).
• Check for un-modelled systematic variation (i.e. in residuals and not
in model - so test for heteroskedasticity and autocorrelated errors);
• Choice of lag length;
• Choice of information set (which variables are in Xt ;
• possible outliers (see Section ??);
• non-linearity (odd shaped residual distributions and standardised plots);
• non-constant parameters.
(2) Constant conditional variance.
Check for ARCH effects, and for regime shifts in the variance.
(3) Independent Normal errors, mean zero, variance Ω.
Check for no autocorrelation, distributional form of errors.
Thus one should check these things in the unrestricted VAR before proceeding
to the cointegration analysis, since the fundamental choice made in cointegration
analysis is that of the rank, and testing for this is affected potentially adversely by
failures of particular diagnostic tests. Some test failures have been shown to be
more important than others (see Nielsen & Rahbek 2000): the failure of the tests
of autocorrelation and of Normality are important, ARCH less so. It has also been
shown that near-Normality preserves the nice properties of OLS and hence MLE, so that
marginal failures of tests are not disastrous.

The correct choice of lag-length, of deterministic terms, of information set, of constant and trend all impact upon the outcome of these tests, and the final choice
of each will depend on the outcome of the diagnostic checks described in this Sec-
tion; for this reason it is hard to specify a particular ordering of how to go about
diagnostic checking.
4.2. The test output from PcGive. PcGive reports a whole range of diagnostic
tests for each equation in an unrestricted VAR once it has been run (see Sec-
tions 3.7.2–3.7.1); either by clicking on the second from the right button (with a
box and two yellow lines (lightning bolts) entering from the top), or by going to:
Test --> Test Summary
which should give an output looking like, for a system like in (3.30):
x1 : Portmanteau(12): 13.3363
x2 : Portmanteau(12): 9.79324
x1 : AR 1-5 test: F(5,132) = 1.4890 [0.1976]
x2 : AR 1-5 test: F(5,132) = 1.4880 [0.1979]
x1 : Normality test: Chi^2(2) = 2.8495 [0.2406]
x2 : Normality test: Chi^2(2) = 9.2261 [0.0099]**
x1 : ARCH 1-4 test: F(4,129) = 1.3185 [0.2665]
x2 : ARCH 1-4 test: F(4,129) = 2.9587 [0.0223]*
x1 : hetero test: F(27,109)= 0.92128 [0.5812]
x2 : hetero test: F(27,109)= 2.1060 [0.0038]**
x1 : hetero-X test: F(118,18)= 0.57996 [0.9559]
x2 : hetero-X test: F(118,18)= 0.42545 [0.9967]

Vector Portmanteau(12): 549.823


Vector AR 1-5 test: F(180,610)= 1.6514 [0.0000]**
Vector Normality test: Chi^2(12)= 51.138 [0.0000]**
Vector hetero test: F(567,1569)= 1.0953 [0.0912]
Vector hetero-X test: F(2478,126)= +.Inf [0.0000]**
The tests reported can be described as follows, and come from Hendry & Doornik
(2001, Chapter 11):
Portmanteau statistic: This is a goodness-of-fit statistic often called the
Ljung-Box statistic, and is only valid in single equations with no exogenous
variables. It is:
(4.1)    LB(s) = T^2 \sum_{j=1}^{s} \frac{r_j^2}{T - j},

where s is the length of the correlogram and r_j is the j-th coefficient of residual autocorrelation. The Ljung-Box test statistic rejects pretty much everything,
and hence it is generally encouraged to ignore this.
AR test: This is the Lagrange-multiplier test for autocorrelation and is an F-
test of the significance of the αi coefficients in the regression of the residuals
on the original variables and lagged residuals:
(4.2)    ε_t = \sum_{j=1}^{p} π_j x_{t−j} + \sum_{i=r}^{s} α_i ε_{t−i} + v_t ,    1 ≤ r ≤ s.

Normality test: This test is given by the following, where b1 and b2 are estimates of skewness and kurtosis:

(4.3)    N = \frac{T b_1^2}{6} + \frac{T (b_2 − 3)^2}{24} ∼ χ^2(2).
ARCH test: This test statistic is T·R^2, where R^2 is conventionally defined, from the regression of ε̂_t^2 on a constant and ε̂_{t−1}^2, …, ε̂_{t−s}^2.
Heteroskedasticity test: This is the White test of heteroskedasticity, and
is an F-test of overall significance on the auxiliary regression of the squared
residuals from the original equation, ε̂_t^2, on the original regressors x_{i,t−k} and all their squares x_{i,t−k}^2. The test is F(s, T − s − 1 − k) where k is
the number of lags, and s is the number of regressors (not including the
constant) in the auxiliary regression.
Heteroskedasticity-x test: This is just as the heteroskedasticity test, but
instead the squared residuals are regressed on the original regressors and
all the cross products of the regressors. Often there are not enough observations to carry out this test, since it requires regressing on k(k + 1)/2 regressors.
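As a rough illustration of what two of these statistics compute, the following sketch implements (4.1) and (4.3) directly in Python with numpy. The residual series here is simulated white noise, and PcGive's own routines may differ in small-sample corrections; this is only the logic of the formulas.

```python
import numpy as np

def ljung_box(res, s):
    """Portmanteau statistic of (4.1): LB(s) = T^2 sum_j r_j^2 / (T - j)."""
    res = np.asarray(res, float) - np.mean(res)
    T = len(res)
    denom = np.sum(res ** 2)
    r = np.array([np.sum(res[j:] * res[:-j]) / denom for j in range(1, s + 1)])
    return T ** 2 * np.sum(r ** 2 / (T - np.arange(1, s + 1)))

def normality_stat(res):
    """Skewness/kurtosis statistic of (4.3), asymptotically chi^2(2)."""
    res = np.asarray(res, float)
    T = len(res)
    z = (res - res.mean()) / res.std()
    b1 = np.mean(z ** 3)            # skewness estimate
    b2 = np.mean(z ** 4)            # kurtosis estimate
    return T * b1 ** 2 / 6 + T * (b2 - 3) ** 2 / 24

rng = np.random.default_rng(0)
eps = rng.standard_normal(166)      # stand-in for one equation's residuals
print(ljung_box(eps, 12), normality_stat(eps))
```

For well-behaved residuals both statistics should be unremarkable relative to their χ² reference distributions, as in the PcGive output above.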
It is worth noting that in testing there will always be spurious rejections; for ex-
ample at the 5% level of significance, 1 in 20 tests will reject spuriously, and hence
a very good model will have a small number of test failures. However, many
more failures than prescribed by the test size suggests problems that ought to be
investigated.
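The arithmetic here can be made concrete; assuming, as a simplification, that the individual tests were independent:

```python
# probability of at least one spurious rejection among m independent 5% tests
m, a = 12, 0.05
p_any = 1 - (1 - a) ** m
print(round(p_any, 2))   # → 0.46
```

So with a dozen reported tests, roughly an even chance of at least one rejection is expected even from a well-specified model.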
4.3. Other information for diagnosing problems. In assessing any particular
model, in addition to test results, other information can be useful for understanding
why a particular test might have failed. These methods will now be considered.
4.3.1. Graphical analysis of residuals. If Normality tests have strongly failed, look-
ing at a plot of density of the residuals for each equation of a VAR can be instructive,
as certain, non-Normal, patterns observed indicate particular problems:
Bi-modality: Suggests a break has been missed in the system, since observations cluster around two or more means;
Skewed distributions: Imply a transformation of the data might be re-
quired, such as taking logs (Section 9.3.2);
Fat tails: Suggest there are many outliers which ought to be taken into ac-
count in the model via indicator variables (see Section 4.4.2).
To get plots of the densities of residuals, go to:
Test --> Graphic Analysis... --> Residual density and histogram (kernel estimate)
Standardised residual plots (residuals minus their mean, divided by their standard error) are also useful in providing an idea about whether outliers are causing
test failures. Plots for each equation can be found in PcGive:
Test --> Graphic Analysis... --> Residuals (scaled)
Because the standardised residuals should then follow a standard Normal distri-
bution, 95% of the mass should lie between ±2 (as this is 2 standard deviations
either side of the mean). A “large” value is somewhat subjective; Juselius &
Hendry (2000) suggest standardised residuals larger than 3.3 in absolute value,
while Nielsen (2004) describes ‘usual practice’ to be absolute standardised resid-
uals of size greater than 3.9. Large residuals suggest outliers, and methods for
dealing with these are discussed in Section 4.4.2.
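A minimal sketch of this flagging rule follows; the residual series is simulated, with one planted outlier, and the 3.3 threshold is the Juselius & Hendry suggestion quoted above.

```python
import numpy as np

def flag_outliers(res, threshold=3.3):
    """Indices whose standardised residual exceeds the threshold in absolute
    value (3.3 per Juselius & Hendry; Nielsen's 'usual practice' is 3.9)."""
    res = np.asarray(res, float)
    z = (res - res.mean()) / res.std()
    return np.flatnonzero(np.abs(z) > threshold)

rng = np.random.default_rng(1)
res = rng.standard_normal(100)
res[49] = 6.0                       # plant one large outlier
print(flag_outliers(res))           # should include index 49
```

In practice this is applied to each equation's residuals, as the Outliers.ox job described in Section 4.4.2 does.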
4.3.2. Recursive Analysis. The world is non-stationary, and hence events happen
that lead to long-term changes in parameter values such as changing monetary
policy regimes or changes of government. One of the assumptions underlying
statistical inference is the constancy of parameters, and hence this should also be
tested. The effects of missing such non-constancies on inference can be indicated
by considering the very simple model:
(4.4) yt = µ + βyt−1 + θdt + εt .
Here, dt = 1{t>ta } , an indicator variable taking the value 1 when t > ta , and zero
otherwise. An OLS regression:
(4.5) yt = µ + βyt−1 + εt ,
would result in an estimate of µ that is some weighted average of its true value
when t ≤ ta , µ, and its true value when t > ta , µ + θ. Thus working out where this
structural break took place, ta , is of paramount importance, in order to include an
appropriate shift dummy. Recursive testing allows a graphical inspection of the
stability of a model and potential identification of where structural breaks occur.
The idea can be expressed using the simple model in (4.4) and (4.5). The OLS
estimate of β is estimated as t increases from a particular reference point. To
convey this, β̂ might be written as, where ( yi−1 | 1) signifies the residuals from the
regression of yi−1 on the constant (see Section A):
(4.6)    \hat{β}^{(t)} = \left( \sum_{i=0}^{t} (y_{i−1} \mid 1)^2 \right)^{−1} \sum_{i=0}^{t} (y_i \mid 1)(y_{i−1} \mid 1) = S_{y y_1}^{(t)} / S_{y_1 y_1}^{(t)} ,

where β̂ (T ) would be the full sample estimate. One then monitors how much the
estimate of β alters. The reference point for variations, so β doesn’t vary “too”
much over the sample, is to take β0 = β̂ (T ) and compare β̂ (t) to this. There are
various tests, but one simple and effective one is the sup or max test:

\max_t \left| \hat{β}^{(t)} − β_0 \right| ,

a test which converges to a complicated distribution called a Brownian bridge, a bridge by virtue of the fact that it has a fixed start and end point.7 In PcGive,
a number of recursive plots can be made. A data series has been generated as
follows:

(4.7) yt = βyt−1 + δ1{t≥50} + εt ,

with β = 0.8 and δ = 4, hence a four standard deviation break. The series is
plotted in Figure 2 and the recursive plots in Figure 3 are all the plots possible for
the estimator on the lagged dependent variable. When estimating in PcGive, one
must remember to check the box for recursives. The panels in Figure 3 are:
(1) the actual β estimate, with error bands (that assume no structural break);
(2) the t-value;
(3) residual sum of squares;
(4) the residuals y_t − β̂y_{t−1};
(5) standardised innovations v_t = (y_t − β̂y_{t−1})/ω_t^{1/2}, where ω_t = 1 + y_{t−1}′(Y_{t−2}′Y_{t−2})^{−1}y_{t−1};
(6) 1-step Chow test statistics:

\frac{(RSS_t − RSS_{t−1})(t − k − 1)}{RSS_{t−1}} = \frac{v_t^2 / ω_t}{\hat{σ}_{t−1}^2} ,

which under the null of constant parameters is F(1, t − k − 1) distributed.


(7) Break-point Chow tests;
(8) forecast Chow tests.
Hence a range of statistics are available.
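The recursive estimator of (4.6) applied to the DGP of (4.7) is easy to sketch; this is an illustrative Python version, not PcGive's implementation, with the same β = 0.8 and a four standard deviation break at T = 50.

```python
import numpy as np

# simulate (4.7): y_t = 0.8 y_{t-1} + 4 * 1{t >= 50} + eps_t
rng = np.random.default_rng(2)
T = 100
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t-1] + 4.0 * (t >= 50) + rng.standard_normal()

x, z = y[:-1], y[1:]                 # regress y_t on y_{t-1}, no constant
beta_rec = np.array([(x[:t] @ z[:t]) / (x[:t] @ x[:t]) for t in range(10, T)])
beta_full = beta_rec[-1]             # full-sample estimate, the reference point
print(np.max(np.abs(beta_rec - beta_full)))   # sup-type deviation from beta_0
```

Plotting beta_rec against t reproduces the jump visible in the first panel of Figure 3.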

4.4. Solving Diagnosed Problems.

7 The start point, y_0, is fixed by assumption, while the end point is fixed since at t = T, \hat{β}^{(t)} − β_0 = 0.
Figure 2. Plot of series with 4 standard deviation break after T = 50.

Figure 3. PcGive Recursive Plots for break of 4 standard deviations after T = 50.

4.4.1. Specifying Lag length and Information Set. Lag length is an important issue,
because choosing wrongly has strong implications for subsequent modelling choices.
If too few lags are chosen, then systematic variation will show up in the residuals
and hence the autocorrelation test may fail, but the penalty if too many lags are
chosen is drastically fewer degrees of freedom, as in a p-dimensional VAR, adding
another lag adds p × p variables. To complicate matters further, adding another
variable to the model (increasing the information set) is often a better strategy than
adding another lag. This could be for two reasons: firstly, the systematic variation in the residuals showing up as autocorrelation need not be because a lag is missing, but because an important variable is missing; and secondly, because adding a variable adds k × (2p + 1) parameters while adding another lag adds p², so for k = 2, once p ≥ 5 adding another variable increases the parameter count by less than adding a lag.
A lag length of 2 is generally encouraged. This is for a number of reasons.
Firstly in macroeconomic data it is hard to imagine agents using information much
further back than 2 quarters in making decisions.8 Secondly, having 2 lags allows
I(2) analysis of the system, which is useful given many series may be I(2) or near-
I(2) in smaller samples. Thirdly, as mentioned, adding extra lags is costly in
degrees of freedom: it is at k = 2 lags that it becomes less costly to include an
additional variable rather than another lag.
A strategy might thus be to begin with 2 lags, and if the autocorrelation test
has failed for a number of the equations in the system, to firstly increase the lag
length to see if this has any effect. Then one might consider additional important
variables that haven’t been modelled. If one cannot eradicate autocorrelation, or at least bring the test statistics down to values close to the critical levels, then one should proceed with great caution, as autocorrelation is a serious ailment for the CVAR model (Nielsen & Rahbek 2000). It is possible to test for the significance of additional lags through an LR test; in PcGive, this is done by estimating first the model with more lags, then the reduced model, then using
Model --> Progress...
in the PcGive module. One might do this if the autocorrelation statistics are
not particularly bad to select a lag length, although bearing in mind the strong
arguments for a 2-lag system given above.
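The logic of such an LR comparison can be sketched as follows. This is not PcGive's Progress output, just the statistic T(log|Ω_restricted| − log|Ω_unrestricted|) computed on hypothetical simulated data and compared against χ²(p²); both models are fitted on the same effective sample so the likelihoods are comparable.

```python
import numpy as np

def resid_cov(Y, k, kmax):
    """OLS-fit a VAR(k) with constant on the common sample t = kmax..T-1,
    and return the residual covariance matrix Omega."""
    T, p = Y.shape
    X = np.hstack([Y[kmax - j : T - j] for j in range(1, k + 1)]
                  + [np.ones((T - kmax, 1))])
    Z = Y[kmax:]
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)
    E = Z - X @ B
    return E.T @ E / len(Z)

rng = np.random.default_rng(3)
Y = np.cumsum(rng.standard_normal((200, 2)), axis=0)   # two simulated random walks
O2, O3 = resid_cov(Y, 2, 3), resid_cov(Y, 3, 3)        # VAR(2) vs VAR(3)
T_eff, p = 200 - 3, 2
LR = T_eff * (np.log(np.linalg.det(O2)) - np.log(np.linalg.det(O3)))
print(LR)   # compare against a chi^2(p*p) = chi^2(4) critical value
```

A small LR value supports dropping the third lag, in line with the preference for a 2-lag system.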

4.4.2. Using dummy variables for outliers. As mentioned in Section 4.3.1, the ex-
istence of many outliers in the information set will lead to fat tails in the residual
distribution, and a structural break will lead to bi-modality, and hence the failure of
the Normality test. The use of dummy variables can alleviate both these problems.
Considering first outliers, Nielsen (2004) identifies two types of outliers that exist in
dynamic systems of data; additive and innovational outliers. Thinking in terms of
earlier impulse responses (see Section 3.3.1), innovational outliers are large values
which then have a subsequent effect on the dynamics of the system; so for example
the system is dislodged, the impulse, and it either takes time to settle back to its
mean level (panel 1, the stationary series), or it settles at its new level (panel 2, the
random walk case). Additive outliers, on the other hand, have no effect
at all on the dynamics of the system, as shown in panel 3. Innovational outliers
are generally economic events, such as the UK’s exit from the ERM in 1992, which
may show up in residuals of the change in the exchange rate as a large residual
next observation, and then in subsequent observations, the exchange rate adjusts
to this large value before settling back down. Additive outliers are generally things
outside the system, such as typing mistakes in the compilation of the data. Having
identified outliers by an appropriate strategy, one must then consider the nature of
each outlier.
In PcGive and Ox. An Ox program exists (Outliers.ox) that takes the residuals
saved from each equation from PcGive and reports the date and series for each
standardised residual that is greater than a user-specified size, and reports the
standardised residuals that come before and after that observation; one should
proceed as follows: having estimated the unrestricted VAR (Section 3.7.2) then in
the PcGive module follow:
8Data will either be seasonally adjusted or seasonal dummies will be added hence any annual
element to behaviour is in theory removed.
Figure 4. Innovational (first two panels) and additive (third panel) outliers. Panel 1: innovation outlier in a stationary process, x_t = 0.6x_{t−1} + ε_t; panel 2: innovation outlier in a random walk, x_t = x_{t−1} + ε_t; panel 3: additive outlier.

Test --> Store Residuals etc. in Database... --> [check Residuals box] OK --> OK to
each name --> [in GiveWin go to database window] File --> Save as...
--> [give name] Save --> [in Database Selection window select ONLY
residuals] OK
This should save a file containing the residuals from the unrestricted VAR.
Open the Ox job Outliers.ox and ensure the path for the residuals file is correct,
and the start and end periods and data frequency are appropriately declared as
detailed in the program documentation. Run the program, and the output should
look something like:
--------------- Ox at 11:10:52 on 02-Dec-2005 ---------------

Ox version 3.40 (Windows) (C) J.A. Doornik, 1994-2004


Potential outlier in variable Series 1 in observation 1965:1,
or observation (not Ox language) 6
Data around here:
0.84576
1.2999
-5.0352
2.0741
0.30261
...
Potential outlier in variable Series 6 in observation 1979:3,
or observation (not Ox language) 64
Data around here:
1.7738
-0.068102
5.0730
-0.98247
1.1192
Thus a list of potential outliers is given. Following this, firstly one should try to
find a plausible explanation for this (poll tax riots, oil crisis, election of Margaret
Thatcher or Ronald Reagan. . . ). If an explanation can be found, this suggests that
the outlier is innovational, but if not then the dummy may well be additive. The
strategy followed in each case is very different:
Innovational Outlier: The next question is which type of dummy should
be included. Three types of dummy are outlined in Section 5.3, of which
two are relevant here; transitory impulse dummies, which look like
(0, . . . , 0, 1, −1, 0, . . . , 0),
and permanent impulse dummies:
(0, . . . , 0, 1, 0, . . . , 0).
The first type addresses the situation where something extraordinary happens
which is subsequently corrected for by agents; this might be a spending
splurge one period due to an unexpected tax cut before the tax is reimposed,
while the second type refers to the type of incident where a shock has a
long term effect and isn’t corrected for, such as a currency being devalued
in a fixed exchange rate system. Hence now considering the surrounding
standardised residuals, if the one before or after is large and of the opposite
sign, this suggests a transitory impulse dummy should be included for this
observation, whereas if all surrounding standardised residuals are small,
then a permanent impulse dummy is appropriate.
Additive Outlier: The suggested strategy here (Nielsen 2004) is to “clean”
the data before analysis; this is because dummy variables have dynamic
effects on subsequent observations as will be seen in Section 5.3, and so
a dummy variable is not the appropriate method to correct for this. Of
course, one should be very sure this is indeed a mistake in the data, and
should contact the data provider regarding the series.
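The dummy shapes referred to here, and again in Section 5.3, are simple to construct; a sketch with a hypothetical sample length and break date:

```python
import numpy as np

T, tb = 100, 50   # sample length and dummy date (hypothetical values)
permanent = np.zeros(T); permanent[tb] = 1.0           # (0,...,0,1,0,...,0)
transitory = np.zeros(T); transitory[tb] = 1.0
transitory[tb + 1] = -1.0                              # (0,...,0,1,-1,0,...,0)
shift = (np.arange(T) >= tb).astype(float)             # 0 before tb, 1 from tb on
print(permanent.sum(), transitory.sum(), shift.sum())  # → 1.0 0.0 50.0
```

In the actual analysis these would be created with the dummy() function in Ox, as in the code snippets below.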
An economic rationale is quite simple for the second outlier above; it falls in the
second oil crisis around the Iranian revolution in 1979; further, the residuals on ei-
ther side are negligibly sized, and hence as a result one should consider a permanent
impulse dummy. The code would be for this:
dum793p = dummy(1979,3,1979,3); //dlpyt
It is suggested the name for any dummy should include the observation it is for
(793), and the type of dummy it is (p for permanent impulse, t for transitory
impulse dummy); also, commented out, for future reference, one should note which
series the dummy is for (series 6 corresponds to dlpyt). Explaining the second
dummy is not so obvious; some events of note did happen in this year in the UK; a
quick search on http://news.bbc.co.uk/onthisday/hi/years/ gives some ideas:
• Winston Churchill died;
• Rhodesia split from the UK;
• the UK’s first oil rig collapsed;
• Beeching recommends closure of a quarter of UK’s railway lines.
Of these, one imagines the latter might help explain; the Beeching report came in
February, and proposed job cuts of 70,000. However, one imagines these closures
didn’t happen immediately. Also on the railway lines though at this time, steam
engines were being phased out, leading to less need for railway workers. However,
due to the rather unsatisfactory nature of these explanations, I have sent an email
to the data provider, the OECD, and await a response. In the meantime, as with
the 1979:3 observation, a permanent impulse dummy is added, using the
following code:9
dum651p = dummy(1965,1,1965,1); //lnt, lut
This process should be continued for all the potential outliers flagged up by the
Ox job. Having done this, the unrestricted VAR should be re-estimated, and the
same procedure followed; this is because omitting these outliers will have reduced
the standard deviation of the residuals, and hence standardising may expose more
possible outliers. Also it helps to check whether the right dummy variables have
been created and added.
Accounting for these dummies ought to reduce the size of Normality test statistics
to more reasonable sizes. It might nevertheless be the case that the test still fails
once all large outliers have been accounted for, because including a dummy variable
sets the residual at that observation to zero, hence there will be an increased mass
at zero in the distribution, causing the test to again fail. Nevertheless it is likely
the distribution with dummies in the model will be more symmetric than the model
without dummies, and less harmful for inference using the statistical model.
4.4.3. Using dummy variables for structural breaks. It might also be the case the
Normality test continues to fail because a structural break has not been picked
up; further graphical inspection of the densities of residuals will help here, along
with consideration of recursive plots as discussed in Section 4.3.2. Considering
recursive plots should help detect where a mean shift takes place. The response
to uncovering a structural break will be to add a mean shift dummy to the model
(a dummy variable taking zero before the break-point and one afterwards - see
Section 5.3); this may seem slightly odd given that it is likely many parameters
other than the constant have shifted. However, Hendry (1995) emphasises that
it is unmodelled shifts in the constant that are the main source of forecast failure
in economic models. Furthermore, in cointegrating relations the constant is the
equilibrium level that variables correct to; it is very important to have this level correctly specified, since otherwise coefficients in some parts of the model, and forecasts made based on the model, will be correcting to the wrong equilibrium level.
Hendry & Santos (2005) show, in single-equation time series models, both theo-
retically and via simulation, that firstly including a small number of impulse dum-
mies by mistake does not bias coefficient estimates, and that secondly structural
breaks such as the one in (4.4) can be picked up with a high probability by a tech-
nique called ‘dummy saturation’. This entails entering a dummy variable for each
9The Ox job output flagged up 1965:1 twice, for both series 1 and series 2, but only the first and
last potential outliers are reported for brevity.

Figure 5. Generated random walk series

observation in stages (by splitting the sample and adding a block of dummies at a
time, be it in halves, or thirds and so on), retaining the significant dummy variables,
and then regressing on a ‘union’ model of all the significant dummies from the vari-
ous splits. My current research aims to extend this work into the VAR framework,
which is naturally much more complicated than the single equation situation.
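A rough single-equation sketch of the split-sample idea follows. This illustrates the principle only, not Hendry & Santos's exact algorithm; the data, outlier, and t-value threshold are all hypothetical.

```python
import numpy as np

def significant_dummies(y, X, idx, crit=2.0):
    """Add an impulse dummy for every observation in idx as one block,
    and return the indices whose dummy has |t-value| > crit."""
    T, k = len(y), X.shape[1]
    D = np.zeros((T, len(idx)))
    D[idx, np.arange(len(idx))] = 1.0
    W = np.hstack([X, D])
    b, *_ = np.linalg.lstsq(W, y, rcond=None)
    e = y - W @ b
    s2 = (e @ e) / (T - W.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.pinv(W.T @ W)))
    t = np.abs(b[k:] / se[k:])
    return [i for i, tv in zip(idx, t) if tv > crit]

rng = np.random.default_rng(4)
T = 101
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t-1] + 8.0 * (t == 61) + rng.standard_normal()  # one big outlier

yy, X = y[1:], np.column_stack([np.ones(T - 1), y[:-1]])  # AR(1) with constant
halves = (np.arange(0, 50), np.arange(50, 100))           # saturate half at a time
keep = sorted(set(significant_dummies(yy, X, halves[0]) +
                  significant_dummies(yy, X, halves[1])))
print(keep)   # should include index 60, i.e. the planted outlier at t = 61
```

Retained dummies from the two halves would then be combined in a final 'union' regression; a few spurious retentions alongside the genuine outlier are expected, matching the simulation evidence cited above.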
Nevertheless, shift dummies should be used with caution. One could model
a random walk as a stationary process with sufficiently many level shifts, as a
random walk can produce movements which look like structural breaks, as Figure 5,
which plots four generated random walk series, shows. One should have a strong
justification for a mean-shift dummy, such as a known exogenous event like German
reunification, or a change between exchange rate regimes.

5. The Cointegrated Vector Autoregressive Model


Having estimated the unrestricted VAR, and checked through diagnostic testing that the assumptions upon which the maximum likelihood, and hence cointegrated VAR, framework is based do hold, the cointegrated (restricted) VAR model is introduced. Firstly the model will be outlined (Section 5.1), after
which the inclusion of deterministic terms (constant and trend in Section 5.2, then
dummies in Section 5.3) will be considered, and finally estimation along with rank
determination will be described in Section 5.4.
5.1. The Model. Many transformations of the unrestricted VAR in (3.1) exist;
particularly useful transformations result in equilibrium correction representations,
which glean information from the data. The error correction form of the VAR is:
(5.1) ∆Xt = ΠXt−1 + Γ1 ∆Xt−1 + · · · + Γk−1 ∆Xt−k+1 + ΦDt + εt ,
where Π = \left( \sum_{i=1}^{k} Π_i \right) − I_p and Γ_j = −\sum_{i=j+1}^{k} Π_i. As a simple example, if a VAR(1)
model was estimated:
(5.2) Xt = Π1 Xt−1 + εt ,
then the CVAR transformation is:
(5.3) ∆Xt = ΠXt−1 + εt ,
with Π = Π_1 − I_p. A VAR(2) model such as in (3.30) would provide a CVAR:
(5.4) ∆Xt = ΠXt−1 + Γ1 ∆Xt−1 + εt ,
where Π = Π_1 + Π_2 − I_p and Γ_1 = −Π_2.
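For the VAR(2) case the transformation is mechanical; the following sketch uses hypothetical coefficient matrices, constructed so that Π has reduced rank.

```python
import numpy as np

# hypothetical VAR(2) coefficients, built so that Pi = alpha * beta' has rank 1
alpha = np.array([[-0.25], [0.25]])
beta = np.array([[1.0], [-0.7]])
Pi2 = 0.1 * np.eye(2)
Pi1 = alpha @ beta.T + np.eye(2) - Pi2   # back out Pi1 from the target Pi

Pi = Pi1 + Pi2 - np.eye(2)               # Pi = Pi1 + Pi2 - I_p
Gamma1 = -Pi2                            # Gamma_1 = -Pi2
print(np.linalg.matrix_rank(Pi, tol=1e-10))   # → 1: one cointegrating relation
```

The reduced rank of Π is exactly the feature exploited in the rank determination of Section 5.4.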
This transformation has a number of advantages over the simple unrestricted
VAR:
(1) By combining levels and differences, the multicollinearity often present in
macroeconomic data is reduced, as differences are much more orthogonal
than levels are;
(2) It gives a more intuitive explanation of the data, as effects can be cate-
gorised into long run and short run effects.
(3) All the long run information is confined to the Π matrix, hence focus can
be placed on that.
(4) The Γk matrices capture the short run dynamics of the data.
(5) As the data is most likely non-stationary, so Xt ∼ I(1), another advantage
of this representation is that inference is improved by the fact that ∆Xt ∼
I(0).
Following on from the final advantage, there remains a problem, because Xt is still
in the equation, and Xt ∼ I(1) yet ∆Xt , εt ∼ I(0) hence the equation is unbalanced.
Just as in the univariate case, when there is a unit root, the idea is to transform the
model in order to put this term to zero. In the multivariate case this corresponds
in effect to setting rows (or columns) of Π to zero, meaning it has a reduced rank of
r < p, where p is the number of variables in Xt . Now any reduced rank matrix can
be factorised into two p × r matrices α and β, such that Π = αβ 0 . Because β 0 is
r × p one sees it matrix-multiplies the Xt−1 vector to provide a linear combination
of the variables of the system. This linear combination β 0 Xt−1 must be stationary
in order for the equation to be balanced, and hence this factorisation provides r
stationary linear combinations of variables, known as cointegrating vectors. Giving
an example, if r = 1 and p = 2, then β is 2 × 1 and so
αβ′X_{t−1} = \begin{pmatrix} α_1 \\ α_2 \end{pmatrix} \begin{pmatrix} β_1 & β_2 \end{pmatrix} X_{t−1} = \begin{pmatrix} α_1 \\ α_2 \end{pmatrix} (β_1 X_{1,t−1} + β_2 X_{2,t−1}) .
Thus the linear combination that is β 0 Xt will remain intact, and will be multiplied
by two constants, α1 and α2 . With this factorisation, (5.1) becomes:
(5.5) ∆Xt = αβ 0 Xt−1 + Γ1 ∆Xt−1 + · · · + Γk−1 ∆Xt−k+1 + ΦDt + εt .
When r > 1, both the β matrix and the α matrix are generally spoken of in terms of their columns, and this will happen from now on in these notes. This is because each column of β constitutes a cointegrating vector, giving a stationary linear combination of the variables in Xt, while a column of α describes the reaction of each variable in Xt to a particular cointegrating vector.
However, this highlights the disadvantage with estimating the cointegrated VAR
model: as with making any restriction, there is the probability the restriction is
incorrectly imposed. To minimise the possibility of this, one must ensure the
diagnostic checks described in Section 4 are satisfied.
As in the simpler examples discussed in Sections 3.3–3.5, it is useful to derive the
solution to the cointegrated VAR model. Considering a VAR(1) model transformed
into CVAR form:
(5.6) ∆Xt = αβ 0 Xt−1 + εt .
Next the orthogonal complement of any given p × r matrix α, denoted α⊥ and of
dimension p × (p − r) is defined to be such that:
• α⊥′α = 0;
• (α, α⊥) is of full rank.
The orthogonal complement is fundamentally useful in VAR analysis; it pays to
understand how it is formed, and Section 7 on testing restrictions will enter more
into its form. Firstly note that the CVAR can be written as:
(5.7)    β′∆X_t = β′αβ′X_{t−1} + β′ε_t
(5.8)    β′X_t = (I_r + β′α) β′X_{t−1} + β′ε_t
(5.9)    β′X_t = \sum_{i=0}^{∞} (I_r + β′α)^i β′ε_{t−i} ,

where (5.9) exists provided the eigenvalues of (I_r + β′α) lie within the unit circle.
This is a stationary representation of the CVAR, because β 0 Xt are the cointegrat-
ing and hence stationary relations in the system Xt . Continuing, and using the
orthogonal complement of α in (5.6) gives:
(5.10)    α⊥′∆X_t = α⊥′ε_t
(5.11)    α⊥′X_t = α⊥′X_{t−1} + α⊥′ε_t
(5.12)    α⊥′X_t = \sum_{i=0}^{t} α⊥′ε_i + α⊥′X_0 .

Thus along with a stationary expression in (5.9) a random walk expression (5.12)
can be derived from the CVAR. These two expressions can be brought together
using the following identity:
(5.13)    I_p = β⊥(α⊥′β⊥)^{−1}α⊥′ + α(β′α)^{−1}β′ .
Multiplying through by Xt shows how this can be done:
(5.14)    X_t = β⊥(α⊥′β⊥)^{−1}α⊥′X_t + α(β′α)^{−1}β′X_t
(5.15)        = β⊥(α⊥′β⊥)^{−1} \left( \sum_{i=0}^{t} α⊥′ε_i + α⊥′X_0 \right) + α(β′α)^{−1} \left( \sum_{i=0}^{t−1} (I_r + β′α)^i β′ε_{t−i} + (I_r + β′α)^t β′X_0 \right)
(5.16)        = C \sum_{i=0}^{t} ε_i + \sum_{i=0}^{∞} C_i^* ε_{t−i} + A,
Figure 6. Cross plot of two simulated cointegrated series showing movement around the attractor set.

where C = β⊥(α⊥′β⊥)^{−1}α⊥′, C_i^* = α(β′α)^{−1}(I_r + β′α)^i β′, and A collects the
remaining terms, the initial values. This (5.16) is the Granger representation and
is the moving average representation for the cointegrated VAR. It shows that the
system of variables under consideration can be broken down into a random walk component (C \sum_{i=0}^{t} ε_i), a stationary component (\sum_{i=0}^{∞} C_i^* ε_{t−i}), and initial values,
A. Thus shocks to the system can have both a permanent effect (random walk
component), and/or a transitory effect (the stationary component). The random
walk parts, α⊥′ \sum_{i=0}^{t} ε_i, are the stochastic or common trends in the data, while the stationary parts, β′X_t, are the cointegrating relations.
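The identity (5.13), and the fact that C annihilates α (since α⊥′α = 0), can be checked numerically. Here α and β are taken from the simulated example of Figure 6, with the orthogonal complements chosen by hand for the 2 × 1 case.

```python
import numpy as np

alpha = np.array([[-0.25], [0.25]])
beta = np.array([[1.0], [-0.7]])
alpha_perp = np.array([[1.0], [1.0]])   # alpha_perp' alpha = 0
beta_perp = np.array([[0.7], [1.0]])    # beta_perp' beta = 0

P1 = beta_perp @ np.linalg.inv(alpha_perp.T @ beta_perp) @ alpha_perp.T
P2 = alpha @ np.linalg.inv(beta.T @ alpha) @ beta.T
print(np.allclose(P1 + P2, np.eye(2)))   # identity (5.13) → True
C = P1                                   # the long-run impact matrix C
print(C @ alpha)                         # C annihilates alpha → zeros
```

That Cα = 0 is why shocks entering only through the cointegrating relations have no permanent effect in (5.17).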

The Granger representation enables impulse response analysis, as the following can be written from (5.16):

(5.17)    \frac{∂E(X_{t+h} \mid X_t)}{∂X_t} = \frac{∂E(X_{t+h} \mid X_t)}{∂ε_t} = C + C_h^* \longrightarrow C,

since Ci∗ converges to zero as it is the coefficient on the stationary component.


Hence there is a permanent and a transitory effect of an impulse to the system,
corresponding to the random walk and stationary parts of the process.

The kind of movements in data, and patterns, that resemble cointegrated series
can be seen by looking at Figure 6, which plots two simulated data series against
each other.10 The cointegrating relation in this simulated example, β′X_t, dictates movement back to equilibrium if the process has been knocked out of equilibrium by an error in a particular period. Hence this is correction to an error and forms the transitory shock part of the system. The equilibrium is defined as the set of points satisfying β′X_{t−1} = 0, the attractor set sp(β⊥). The common trends, α⊥′ \sum_{i=0}^{t} ε_i, dictate movement up and down the attractor set, to which it can be seen there is no correction; the system simply moves around the equilibrium at that part of the system of variables.
The set of points in Figure 6 can be joined together in time order of occurrence,
and in doing this movement up and down (permanent shocks) and in the region
surrounding the attractor set (transitory shocks) would be observed. An Ox job,
called SimulateSmallCVAR.ox, exists which enables the user to plot the points in
Figure 6 in order of occurrence, gaining some idea of how cointegrated systems
behave.
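A hedged Python counterpart to SimulateSmallCVAR.ox, using the DGP given in footnote 10; plotting X[:,0] against X[:,1] gives a cloud like Figure 6. The autoregressive coefficient of the disequilibrium error is computed at the end to show that the cointegrating relation mean-reverts while X itself wanders.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
alpha = np.array([-0.25, 0.25])
beta = np.array([1.0, -0.7])          # cointegrating relation is beta'X + 1
X = np.zeros((T, 2))
for t in range(1, T):
    z = beta @ X[t-1] + 1.0           # disequilibrium error beta'X_{t-1} + 1
    X[t] = X[t-1] + alpha * z + rng.standard_normal(2)

z = X @ beta + 1.0                    # the cointegrating relation over the sample
phi = (z[1:] @ z[:-1]) / (z[:-1] @ z[:-1])
print(phi)   # well below 1: the relation mean-reverts, the levels do not
```

The implied persistence is 1 + β′α = 0.575 here, so the estimated phi should sit well inside the unit circle.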

This kind of analysis is not dissimilar to the analysis of many theoretical macroeconomic models, which are systems of differential equations for which phase diagrams are plotted and transition paths to equilibrium discussed. Certainly this is a very simplified example, but if there are, say, two cointegrating relations and three variables, then this could be imagined as two attractor sets in the diagram above, perhaps demand and supply systems, or an IS-LM system. This gives an alternative way to think about correlations observed between variables in systems of two or more dimensions, as it allows a dynamic aspect to the formation of these relationships. It is the size of the α coefficients that describes the speed of adjustment back to equilibrium, so systems with higher α coefficients might be thought to have less spread around the attractor set. However, this ignores the interaction between the α and β coefficients. Each α coefficient describes the path back to equilibrium for each dimension of the system. The stronger the α coefficients, the more they dominate, and hence the more the path back to equilibrium follows the directions they specify; if the α coefficients are small, the movement back will be weaker, and the common trends pushing the variables up and down the attractor set will have more influence, giving a more varied pattern. In general, the smaller is α, the less is the movement towards the attractor set and the less obvious it is that there is even an attractor set. This leads to the classic statistical inference problem: establishing the existence or not of a particular phenomenon in the data. Smaller α values mean slower mean reversion and require more observations, longer datasets, to unearth.
Furthermore, with its description of short-run and long-run effects, the cointegrated VAR provides a bridge to economic theory, which often posits effects over such time horizons for policies or actions of agents. This helps motivate use of the technique.
5.2. Constant and Trend. The important principle to learn is that the level and trend of an economic process do not necessarily translate into what is estimated from an economic model, especially given the potent threat of mis-specification
^{10}The series were generated according to the model:
(ΔX_{1t}, ΔX_{2t})′ = (−0.25, 0.25)′ (1, −0.7, 1) (X_{1t−1}, X_{2t−1}, 1)′ + (ε_{1t}, ε_{2t})′,
i.e. the model has a constant in the cointegrating relations, and has p = 2 and r = 1.
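The footnote's process can be simulated directly. A minimal Python sketch (seed and sample length arbitrary): the levels wander with the common trend, while the cointegrating relation mean-reverts around zero.

```python
import numpy as np

# Simulating the footnote DGP: p = 2, r = 1, alpha = (-0.25, 0.25)',
# beta' = (1, -0.7, 1) applied to (X1, X2, 1)'.
rng = np.random.default_rng(0)
T = 2000
X = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    b = X[t - 1, 0] - 0.7 * X[t - 1, 1] + 1.0   # cointegrating relation beta'X
    X[t, 0] = X[t - 1, 0] - 0.25 * b + rng.standard_normal()
    X[t, 1] = X[t - 1, 1] + 0.25 * b + rng.standard_normal()

relation = X[:, 0] - 0.7 * X[:, 1] + 1.0
# var(X1) grows with the sample (common stochastic trend), while
# var(relation) stays bounded (transitory errors are corrected).
```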

and non-stationarity. A univariate example is given; the data generating process is:
(5.18) x_t = µ_0 + γt + u_t
(5.19) u_t = ρu_{t−1} + ε_t,
where E(u_t) = 0 and Var(u_t) = σ²/(1 − ρ²) if |ρ| < 1, while E(u_t) = u_0 and Var(u_t) = tσ² if ρ = 1. The two equations can be combined to get a model in autoregressive form:
(5.20) x_t = ρx_{t−1} + (1 − ρ)µ_0 − γρ(t − 1) + γt + ε_t
(5.21) x_t = b_0 + b_1 x_{t−1} + b_2 t + ε_t,
but the trend coefficient in this model (5.21) is quite different from the trend coefficient in the process (5.18). Furthermore it depends on ρ, and if ρ = b_1 = 1 then the model has a unit root, and becomes:
(5.22) x_t = x_{t−1} + γ + ε_t
(5.23) x_t = Σ_{i=0}^{t} ε_i + γt + x_0.

So E(x_t) = γt + x_0 and E(Δx_t) = γ, hence γ translates as the trend of the model.
The more general form of this in the CVAR can be illustrated with a simple model with one lag and a constant. The same principle applies if a trend is involved. The model is as before (5.5) but simplified:
(5.24) ∆Xt = αβ 0 Xt−1 + µ0 + εt ,
and the Granger representation is:
(5.25) X_t = C Σ_{i=0}^{t} (ε_i + µ_0) + Σ_{i=0}^{∞} C_i* (ε_{t−i} + µ_0) + A
(5.26) ΔX_t = C (ε_t + µ_0) + Σ_{i=0}^{∞} C_i* Δε_{t−i},
where the second line comes from differencing the Granger representation. Then:
(5.27) E(ΔX_t) = Cµ_0 = γ_0.
Considering next the stationary component, the CVAR (5.24) is transformed into:
(5.28) X_t = (I + αβ′) X_{t−1} + µ_0 + ε_t
(5.29) β′X_t = (I + β′α) β′X_{t−1} + β′µ_0 + β′ε_t
(5.30) β′X_t = Σ_{i=0}^{∞} (I + β′α)^{i} β′(ε_{t−i} + µ_0),
where the last line follows from the fact that the second line is stationary. Then taking expectations:
(5.31) E(β′X_t) = Σ_{i=0}^{∞} (I + β′α)^{i} β′µ_0
(5.32)          = −(β′α)^{−1} β′µ_0 = −β_0,

where the last line is found using the formula for a geometric progression. Then considering the formula for the skew projection given in (5.13) applied to µ_0:
(5.33) µ_0 = α (β′α)^{−1} β′µ_0 + β_⊥ (α_⊥′β_⊥)^{−1} α_⊥′µ_0 = αβ_0 + γ_0,
where γ_0 and β_0 are defined in (5.27) and (5.32). Equation (5.33) can then be used to write (5.24) as:
(5.34) ΔX_t − γ_0 = α (β′X_{t−1} + β_0) + ε_t.
This is the equilibrium correction model written in deviations from equilibrium. From (5.27), γ_0 is the equilibrium growth rate, while from (5.32), −β_0 is the equilibrium mean of the cointegrating relations. This shows that the deterministic terms in the CVAR are not as simple as in the individual economic processes. This is a simple example of the general result that from the CVAR in (5.5):
(5.35) ΔX_t = αβ′X_{t−1} + Σ_{i=1}^{k−1} Γ_i ΔX_{t−i} + ΦD_t + ε_t
(5.36) =⇒ E(ΔX_t) = αE(β′X_{t−1}) + Σ_{i=1}^{k−1} Γ_i E(ΔX_{t−i}) + ΦD_t
(5.37) =⇒ ΔX_t − E(ΔX_t) = α(β′X_{t−1} − E(β′X_{t−1})) + Σ_{i=1}^{k−1} Γ_i (ΔX_{t−i} − E(ΔX_{t−i})) + ε_t,
which gives the same form, and E(β′X_{t−1}) and E(ΔX_t) have the intuitive interpretations as disequilibrium mean and growth rate respectively. There then follow, based on (5.33), five possible cases for the constant and trend in the CVAR:
(5.38) ΔX_t = αβ′X_{t−1} + Σ_{i=1}^{k−1} Γ_i ΔX_{t−i} + µ_0 + µ_1 t + ΦD_t + ε_t

(1) No deterministic terms: µ_0 = µ_1 = 0, hence E(β′X_t) = E(ΔX_t) = 0. No growth in the data and the equilibrium term has zero mean. This case means the model is not invariant to measurement units, so it is not usually used. It might be written:
(5.39) ΔX_t = αβ′X_{t−1} + ε_t.
This method of writing the CVAR follows on from Nielsen & Rahbek (2000), who considered how to minimise the impact of nuisance parameters on the trace test.
(2) Restricted constant: α_⊥′µ_0 = 0 and µ_1 = 0. Hence the constant produces no trend in the data levels, since Cµ_0 = 0 when α_⊥′µ_0 = 0, leaving just the constant in the cointegrating relations. Written as:
(5.40) ΔX_t = α (β′X_{t−1} + β_0) + ε_t = α (β′ : β_0) (X_{t−1}′, 1)′ + ε_t.

(3) Unrestricted constant: only µ_1 = 0. The constant can appear both in the equation and in the cointegrating relation:
(5.41) ΔX_t = αβ′X_{t−1} + µ_0 + ε_t.
This case allows for trends in the data series, as can be seen by considering the Granger representation under the assumption αβ′ = 0:^{11}
(5.42) X_t = Σ_{i=0}^{t} ε_i + µ_0 t + x_0.
(4) Restricted trend and unrestricted constant: α_⊥′µ_1 = 0, i.e. the trend is restricted to the cointegrating relations. This gives a constant in the data (i.e. growth) and a trend in the cointegrating relations:
(5.43) ΔX_t = α (β′ : β_1) (X_{t−1}′, t)′ + θ + ε_t,
where θ is the unrestricted constant.

(5) Unrestricted constant and trend: µ_0 and µ_1 unrestricted. Written:
(5.44) ΔX_t = αβ′X_{t−1} + µ_0 + µ_1 t + ε_t.
Using the Granger representation, again assuming αβ′ = 0:
(5.45) X_t = Σ_{i=0}^{t} ε_i + µ_0 t + µ_1 t(t+1)/2 + x_0.

Case 1 does not give invariance to units of measurement, which is important for macroeconomic data, and hence is often discouraged. Case 5 implies a quadratic trend from (5.45), which is hard to motivate economically. As such, cases 2, 3 and 4 are usually of more interest. Case 2 would be used if there were no clearly trending data series, and theoretical objections to, or no theoretical basis for, a trend. Case 3 allows one to model trends that appear in the data but that one would not expect to appear in the cointegrating relations. Case 4 ought to be used when there are clear reasons to expect a trend, or the data series are trending. Further, Doornik, Hendry & Nielsen (1998) show that this deterministic set-up is helpful, as omitting these deterministic terms can lead to substantial mis-specification bias, whereas including them even if the true data generating process has neither constant nor trend gives good size and power properties for the trace test of cointegrating rank (see Section 7).

5.3. Dummy variables. Dummy variables have already been discussed in Sec-
tion 4.4.2 as a method for dealing with extreme observations and structural breaks.
The CVAR with dummy variables included is written:
(5.46) ΔX_t = αβ′X_{t−1} + ΦD_t + ε_t.
If d dummy variables are specified, D_t is a d × 1 vector of the dummies, while Φ is a p × d matrix of coefficients; all else is as before. It is important to realise the impact dummy variables have on the CVAR, through the levels and differences,
11So r = 0, no cointegrating relations; this is assumed to make the exposition clearer.



and the permanent and transitory effects. A first idea of the effects can be gleaned from the Granger representation from (5.16), slightly rewritten:
(5.47) X_t = C Σ_{i=1}^{t} (ε_i + ΦD_i) + C(L) (ε_t + ΦD_t) + A,
where A contains the initial values and:
(5.48) C = β_⊥ (α_⊥′β_⊥)^{−1} α_⊥′,
(5.49) C(L) = Σ_{i=0}^{∞} α (β′α)^{−1} (I + β′α)^{i} β′ L^{i}.
The summations in (5.47) show that deterministic terms cumulate in the CVAR.
There are two summations in (5.47), the first one for the random walk component,
the effects of which are permanent as with the impulse response analysis considered
in Section 3.3.1, while the second one is for the stationary component, where the
effects of a dummy will fade. Three types of dummy variable might be identified:
Transitory impulse: d = (0, …, 0, 1, −1, 0, …, 0), Δd = (0, …, 0, 1, −2, 1, 0, …, 0), Σ_{i=0}^{t} d_i = 0.
Permanent impulse: d = (0, …, 0, 1, 0, …, 0), Δd = (0, …, 0, 1, −1, 0, …, 0), Σ_{i=0}^{t} d_i = 1.
Mean-shift dummy: d = (0, …, 0, 1, 1, …, 1), Δd = (0, …, 0, 1, 0, …, 0), Σ_{i=0}^{t} d_i = Σ_{i=0}^{t} 1_{{i>T_b}} = t − T_b.
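These three patterns and their cumulated effects can be generated directly; the break date and sample length below are arbitrary:

```python
import numpy as np

# The three dummy types with T = 20 and break at t = 10.
T, Tb = 20, 10
transitory = np.zeros(T); transitory[Tb], transitory[Tb + 1] = 1, -1
permanent = np.zeros(T); permanent[Tb] = 1
mean_shift = np.zeros(T); mean_shift[Tb:] = 1

# Cumulated effects, as in the random-walk summation of (5.47):
# transitory sums to 0 (no long-run effect), permanent to 1 (level shift),
# and the mean shift grows linearly after Tb (a broken trend).
```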
The reason the differenced dummy variables are included is that in the CVAR they
will appear in differences. In each process the dummy will appear in levels. The
summing of the deterministic terms can be considered also; a transitory impulse will
have no effect on the process in the long run since in both summations, the effect
of the shift is immediately corrected. On the other hand, a permanent impulse
in one variable has a long run effect on the data through the permanent effect (C
in (5.47)), and a transitory but fading effect through the stationary component.
Due to the accumulation effect, a mean shift dummy actually translates into a time
trend after the break in the data. However, for all the complicated analysis that
could be carried out, the simple advice is to match the pattern observed in the
residuals of the model for a dummy variable. Hence the mean shift is unlikely to
be added, at least in unrestricted form. If a clear structural break has taken place
however, such as a change in exchange rate system, then a shift dummy might be
included but restricted to the cointegration space:
(5.50) ΔX_t = α (β′ : β̃_0) (X_{t−1}′, D_{s,t})′ + θΔD_{s,t} + ε_t.
From the definition of C(L) in (5.49) it can be seen that the cumulated effects of a shift dummy in the cointegration space fade away and are not permanent.
5.3.1. Unified structure. Dummy variables enter all equations in the CVAR, so Φ
has p rows, in order that the system remains in a form that can be estimated by
MLE. In Section 7.6 this unified structure is relaxed to allow short-run identification
of the system.

5.3.2. Seasonal dummy variables. This analysis of dummy variables in the CVAR gives the rationale for using centred seasonal dummies rather than ordinary seasonal dummies in cointegration analysis.^{12} An ordinary seasonal dummy for, say, quarterly data would look like:
D_s = (1, 0, 0, 0, 1, 0, 0, 0, 1, …),
and from above, along with the two other lagged values of D_s, this would accumulate. Centred seasonal dummies are constructed so that there is no cumulation: the first might be as above, but the next two are:
(0, −0.5, 0, 0, 0, −0.5, 0, 0, 0, −0.5, 0, …),
and:
(0, 0, −0.5, 0, 0, 0, −0.5, 0, 0, 0, −0.5, 0, 0, …),
hence there is no cumulation when summed.
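A quick illustration of the cumulation argument. Note the exact centring convention varies across packages; the s_q − 1/4 form used below is one common choice, not necessarily PcGive's:

```python
import numpy as np

# Ordinary versus centred quarterly seasonal dummies over T = 40 quarters.
T = 40
q1 = np.array([1.0, 0, 0, 0] * (T // 4))   # ordinary Q1 dummy
q1_centred = q1 - 0.25                     # one common centring convention

# The ordinary dummy cumulates like a trend (sums to T/4 = 10);
# the centred dummy's cumulative sum returns to zero every year.
```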
5.3.3. Exogenous variables. Exogenous variables, while not deterministic, can also be included within the system in a similar way to deterministic terms. Sometimes it might be required to include a variable that is not actually modelled, and is hence conditioned on. One example might be the oil price, although any variable being included in this way ought to have been previously tested for weak exogeneity (see Section 7). Alternatively, if one has a large number of variables, it might be of interest to reduce the size of X_t by conditioning on a number of the variables, dependent again on the outcome of weak exogeneity testing. The vector of exogenous variables is denoted Z_t and enters like the other deterministic terms, in its levels and differences where appropriate, so:
(5.51) ΔX_t = α (β′ : β_0) (X_{t−1}′, Z_{t−1}′)′ + θΔZ_t + ε_t.
5.4. Estimation and rank determination. Having now considered the form of the cointegrated VAR model, in this Section estimation in the CVAR model is considered, along with rank determination. The cointegrated VAR is estimated for rank r:
(5.52) H_r: ΔX_t = αβ′X_{t−1} + Σ_{i=1}^{k−1} Γ_i ΔX_{t−i} + ΦD_t + ε_t.
If the data in levels are I(1), X_t ∼ I(1), then the p × p matrix Π must be of reduced rank r < p, since it contains the coefficients on the levels X_{t−1}, while the other parts of the equation, ΔX_t and ε_t, are I(0). Something needs to be done to balance the equation in (5.5), otherwise inference will be spurious (Granger & Newbold 1974). Three cases for the rank r are possible:
r = p: The data are I(0) in levels, as otherwise the model would be imbalanced. Thus estimate the VAR in levels, X_t.
0 < r < p: The system is of reduced rank, and linear combinations of the variables can be found that are stationary.
r = 0: No cointegration. The VAR in differences (ΔX_t) should be run; there are no long-run relationships.
^{12}This is relevant when using data that are not seasonally adjusted. In PcGive, in the “Data Selection” window, ensure CSeasonal is selected, and not Seasonal, when adding seasonal dummies.

Thus whether Π is of reduced rank is important for estimation. If Π is of reduced rank then the model cannot be estimated using standard procedures, as there is non-linearity in the Π = αβ′ factorisation of the cointegrated VAR, because the two matrices of coefficients, α and β, are multiplied by each other.
The cointegration methodology is to try to find ways to combine data series that render them stationary. To take a simple example, say x_{1t} and x_{2t} are defined as:
(5.53) x_{1t} = a Σ_{i=0}^{t} ε_{3i} + ε_{1t}
(5.54) x_{2t} = b Σ_{i=0}^{t} ε_{3i} + ε_{2t}.
Thus both processes are I(1) variables as they have a random walk, an integrated error, in their determination. The idea then is to combine these two variables in a system so that the resulting variable is stationary. An example would be:
(5.55) b x_{1t} − a x_{2t} = b ε_{1t} − a ε_{2t},
which is I(0), since the integrated error, the I(1) part, has been cancelled by the linear combination, and the cointegrating vector is β = (b, −a)′. Furthermore,
linking into the discussion on common trends, both variables in the two-variable system are driven by the same common stochastic trend, Σ_{i=0}^{t} ε_{3i}. Cointegration thus restricts the part of the CVAR that is in levels, the ΠX_{t−1} part. This doesn't reduce the explanatory power of the model by reducing information - it increases it, since now stable combinations of variables are included in the model which describe steady state relationships that theory often suggests, and furthermore accurate inference on these can be carried out.
Thus Π is factorised into Π = αβ′, where α and β are p × r matrices, where r is the rank of the matrix Π, and p is the number of variables in the system. This could be written equivalently as:
(5.56) Π = αβ′ = (α_1  α_2  …  α_r) (β_1, β_2, …, β_r)′ = α_1β_1′ + α_2β_2′ + ⋯ + α_rβ_r′,
where each α_i or β_i is p × 1, and so when Π multiplies X_t it can be seen that each:
β_i′X_t = β_{1i}X_{1t} + β_{2i}X_{2t} + ⋯ + β_{pi}X_{pt}
describes a combination of variables that is stationary.
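The cancellation in (5.55) is easy to verify by simulation; the values a = 0.5 and b = 2.0 below are hypothetical:

```python
import numpy as np

# Simulating (5.53)-(5.54): two series sharing one common stochastic trend.
rng = np.random.default_rng(3)
T, a, b = 2000, 0.5, 2.0
trend = np.cumsum(rng.standard_normal(T))   # the common trend, sum of eps_3
x1 = a * trend + rng.standard_normal(T)
x2 = b * trend + rng.standard_normal(T)

combo = b * x1 - a * x2   # = b*eps_1 - a*eps_2, which is I(0)
# The variance of either level grows with the sample, while the variance of
# the cointegrating combination stays bounded.
```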
It is more likely however that r < p and so Xt ∼ I(1), and it is of vital importance
to accurately determine r, the cointegration rank. This is the same as the univariate
case where it is vital to distinguish between stationarity and non-stationarity for
asymptotic distributions and hence inference. Any restriction from a rank of p
produces a special case of the unrestricted model:
(5.57) H(r = 0) ⊂ H(r = 1) ⊂ · · · ⊂ H(r = p),
where H(r = 0) is simply the VAR model in differences, while H(r = p) is the
case where the data is stationary, the model of full rank. The testing procedure
is to test in this order also, so to test LR( H(r = 0)| H(r = p)) first, then proceed
if the test is rejected to r = 1, and to r = 2 if this is rejected, and so on until
a test is not rejected. Then this hypothesis is accepted. The eigenvalues of the

reduced rank regression, derived below, are fundamental here. Each eigenvalue λ_i can be read as Corr²(ΔX_t, v_i′X_{t−1}), where v_i is the corresponding eigenvector. Thus the correlation between an I(0) quantity, ΔX_t, and v_i′X_{t−1}, which may or may not be I(0), is reported. The rationale here is that an I(0) process cannot be correlated with an I(1) process, at least not if the sample size is large enough. This is because, taking an I(0) process, y_t = µ_0 + u_t,^{13} and an I(1) process, z_t = Σ_{i=0}^{t} ε_i,^{14} the correlation of y_t and z_t is:
(5.58) Corr(y_t, z_t) = Cov(y_t, z_t) / (√Var(y_t) √Var(z_t)) = µ_0x_0 / (√(σ_u²) √(tσ_ε²)),
hence as the sample size increases the correlation decreases. On the other hand, if
both variables are I(0), which suggests that vi or βi has done the trick and rendered
the levels combination stationary, then the eigenvalue λ_i will be significant. Thus the LR test for rank determination is in effect a test of the statistical significance of the eigenvalues of the system, hence how the likelihood is often written: L_max^{−2/T} ∝ Π_{i=1}^{r} (1 − λ_i). The test begins with r = 0: this implies there are no cointegrating vectors and hence all the eigenvalues are insignificant. Thus if this null hypothesis is rejected, it must be that there is at least one significant eigenvalue; thus the model is restricted to have one cointegrating vector, r = 1, and this is tested; if this is rejected, again it must be that there is a significant eigenvalue among the ones being restricted to zero (i.e. all the rest). So again r is increased, and the test is run on the (r+1)th up to the pth eigenvalues. Once the test is accepted, this suggests the remaining eigenvalues are all zero, hence there are no more cointegrating vectors.
The cointegrated VAR is estimated by the reduced rank regression of ΔX_t on X_{t−1}, corrected for lagged differences and deterministic terms. Using the Frisch-Waugh Theorem (Section A), the residuals from the regressions of ΔX_t and X_{t−1} on the lagged differences ΔX_{t−1}, …, ΔX_{t−k+1} and D_t can be written:
(5.59) R_{0,t} = (ΔX_t | ΔX_{t−1}, …, ΔX_{t−k+1}, D_t)
(5.60) R_{1,t} = (X_{t−1} | ΔX_{t−1}, …, ΔX_{t−k+1}, D_t).
(5.59) and (5.60) can be used in (5.52) to produce the concentrated regression model:
(5.61) R_{0,t} = αβ′R_{1,t} + ε_t.
This gives a likelihood of:
(5.62) L = |Ω|^{−T/2} exp( −(1/2) Σ_{t=1}^{T} (R_{0,t} − αβ′R_{1,t})′ Ω^{−1} (R_{0,t} − αβ′R_{1,t}) ).
This is estimated by fixing β and estimating α and Ω by the OLS regression of R_{0,t} on R_{1,t} in (5.61). Defining the product moment matrices S_{ij} = T^{−1} Σ_{t=1}^{T} R_{i,t}R_{j,t}′,

^{13}So E(y_t) = µ_0 and Var(y_t) = σ_u².
^{14}So E(z_t) = x_0 and Var(z_t) = tσ_ε², and E(y_tz_t) = µ_0x_0 as E(u_tε_t) = 0.

this gives:
(5.63) α̂(β) = S_{01}β (β′S_{11}β)^{−1}
(5.64) Ω̂(β) = S_{00} − S_{01}β (β′S_{11}β)^{−1} β′S_{10}
(5.65) L_max^{−2/T}(β) = |Ω̂(β)| = |S_{00} − S_{01}β (β′S_{11}β)^{−1} β′S_{10}|
(5.66)                = |S_{00}| |β′(S_{11} − S_{10}S_{00}^{−1}S_{01})β| / |β′S_{11}β|,
where the last line is simply a rearrangement of the line above, factoring out the S_{00} term. The likelihood is then maximised by minimising |Ω̂(β)|,^{15} which is achieved by solving the eigenvalue problem |λS_{11} − S_{10}S_{00}^{−1}S_{01}| = 0. This gives eigenvalues λ̂_1 > ⋯ > λ̂_p and eigenvectors V̂ = (v̂_1, …, v̂_p) such that:
(5.67) λ̂_i S_{11}v̂_i = S_{10}S_{00}^{−1}S_{01}v̂_i.
Then the eigenvectors for the r largest eigenvalues are taken and normalised so that β̂′S_{11}β̂ = I (the β′S_{11}β term in (5.66)), giving:
(5.68) L_max^{−2/T}(H_r) = |Ω̂(β̂)| = |S_{00}| Π_{i=1}^{r} (1 − λ̂_i).
The eigenvalues can be interpreted as squared correlations between the levels of the data and the differences, and hence, as described above around equation (5.58), as picking out the combinations of the levels that are most stationary: the differences are stationary, and in the limit there cannot be correlation between random walks and stationary series.
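The estimation steps above can be sketched compactly. The snippet below forms the product moment matrices and solves the eigenvalue problem (5.67) for the simplest case of one lag and no deterministic terms, so that R_{0,t} = ΔX_t and R_{1,t} = X_{t−1} directly; the DGP (p = 2, r = 1, β = (1, −0.7)′, α = (−0.25, 0.25)′) is hypothetical:

```python
import numpy as np

# Reduced rank regression eigenvalue problem, k = 1, no deterministics.
rng = np.random.default_rng(42)
T, p = 1000, 2
alpha = np.array([[-0.25], [0.25]])
beta = np.array([[1.0], [-0.7]])

X = np.zeros((T + 1, p))
for t in range(1, T + 1):
    X[t] = X[t - 1] + (alpha @ beta.T @ X[t - 1]) + rng.standard_normal(p)

R0 = np.diff(X, axis=0)   # dX_t
R1 = X[:-1]               # X_{t-1}

# Product moment matrices S_ij = T^{-1} sum_t R_it R_jt'
S00 = R0.T @ R0 / T
S01 = R0.T @ R1 / T
S10 = S01.T
S11 = R1.T @ R1 / T

# Solve lambda * S11 * v = S10 * S00^{-1} * S01 * v, as in (5.67)
M = np.linalg.solve(S11, S10 @ np.linalg.solve(S00, S01))
eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]
print(eigvals)  # one sizeable eigenvalue (the cointegrating relation), one near zero
```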

5.4.1. Likelihood ratio (trace) test of cointegration rank. Estimation has thus far
proceeded for a general rank, r. This r must be chosen however, and it is done so
using the likelihood ratio test. Since the unrestricted form of (5.68) is that where r = p, it follows, using the laws of logarithms, that the likelihood ratio test statistic (Section 7) will be:
(5.69) −2 ln LR(H_r | H_p) = −T Σ_{i=r+1}^{p} ln(1 − λ̂_i).
Thus the test is that the remaining eigenvalues, beyond those taken under the null to be significant, are equal to zero, and hence that the data support the null hypothesised value of r.
Testing proceeds from rank r = 0 up to r = p, as opposed to the opposite direction, because the size and power properties of testing in this direction are preferable (Johansen 1995). Thankfully this procedure is automated in most econometrics programs, and carrying it out in PcGive is outlined in Section 5.4.2. It is suggested that one first carries out the trace test before estimating the cointegrated VAR in PcGive (Sections 5.4.3 and 5.4.4).

15Noting the likelihood is raised to a negative power, −T /2.



5.4.2. Trace testing in PcGive. Having estimated the unrestricted VAR as in Sec-
tion 3.7.1 or 3.7.2, and carried out the various diagnostic checks, and arrived at a
model that is congruent and satisfies the assumptions of the maximum likelihood
framework, then one can proceed to trace testing to determine the cointegrating
rank, r, for the system. To do so, using the PcGive module having estimated the
unrestricted VAR in its satisfactory form, follow:
Test --> Dynamic Analysis and Cointegration Tests... --> [check I(1)
cointegration analysis box] OK
This will provide an output something like:
I(1) cointegration analysis, 1963 (4) to 2005 (1)
eigenvalue loglik for rank
3950.074 0
0.43799 3997.902 1
0.29420 4026.821 2
0.19185 4044.500 3
0.10640 4053.837 4
0.072642 4060.097 5
0.044840 4063.905 6

H0:rank<= Trace test [ Prob]


0 227.66 [0.000] **
1 132.01 [0.000] **
2 74.167 [0.004] **
3 38.808 [0.121]
4 20.134 [0.223]
5 7.6155 [0.293]
Below this will be a description of the unrestricted and restricted variables (see Section ??), and the β, α and Π matrices for r = p. Thankfully, as can be seen, the decision is reasonably clear in this situation; a rank of 3 should be chosen, as the null hypothesis of r ≤ 2 is rejected with a 0.4% p-value, while the hypothesis of r ≤ 3 cannot be rejected, its p-value of 12.1% being deemed too high for rejection.
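The reported trace statistics can be reconstructed from either the eigenvalues or the log-likelihoods in this output, using (5.69) with T = 166 and p = 6:

```python
import numpy as np

# Eigenvalues and log-likelihoods as reported by PcGive above.
T = 166
eigs = np.array([0.43799, 0.29420, 0.19185, 0.10640, 0.072642, 0.044840])
loglik = np.array([3950.074, 3997.902, 4026.821, 4044.500,
                   4053.837, 4060.097, 4063.905])   # for ranks 0..6

# Trace statistic (5.69): -T * sum_{i=r+1}^{p} ln(1 - lambda_i)
trace = np.array([-T * np.log(1 - eigs[r:]).sum() for r in range(6)])

# Equivalently, twice the log-likelihood difference between H_p and H_r.
trace_ll = 2 * (loglik[-1] - loglik[:-1])
# Both reproduce the reported 227.66, 132.01, 74.17, 38.81, 20.13, 7.62.
```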
5.4.3. Estimating the cointegrated VAR using PcGive module. Having determined
the cointegration rank, the cointegrated VAR can be estimated. In the PcGive
module, one should follow:
Model --> Formulate --> [keep variables as in last UVAR
specification] OK --> [check Cointegrated VAR box] OK --> [select
appropriate Cointegrating rank based on trace test outcome] OK -->
OK
The resulting output should look something like the abbreviated output in Sec-
tion 5.4.5.
5.4.4. Estimating the cointegrated VAR in PcGive using batch code. Using the code established for estimating the unrestricted VAR in Section 3.7.2, change the line that says:
estimate("OLS", 1963, 4, 2005, 1);
to:
rank(6); estimate("COINT", 1963, 4, 2005, 1);

Running the file will give output something like that in Section 5.4.5.

5.4.5. PcGive CVAR Output.


SYS( 4) Cointegrated VAR (using BigDatabase.xls)
The estimation sample is: 1963 (4) to 2005 (1)

Cointegrated VAR (2) in:
[0] = lnt
...
Unrestricted variables:
[0] = Constant
...
Restricted variables:
[0] = Trend
[1] = dum19842s
Number of lags used in the analysis: 2
...
beta
...
alpha
...
Standard errors of alpha
...
long-run matrix, rank 6
...
Standard errors of long-run matrix
...
log-likelihood     4063.90453   -T/2log|Omega|    5477.16731
no. of observations       166   no. of parameters        174
rank of long-run matrix     6   no. long-run restrictions  0
beta is not identified
No restrictions imposed

5.5. Additional Information on Rank Determination. The trace test procedure outlined above is very useful for determining the cointegration rank of a system. The rank test is described as the single most important formal testing procedure in cointegration analysis; this is because if the wrong rank is decided upon then too many or too few unit roots are imposed on the model, and the data are wrongly characterised. It is important to have a congruent model and the right deterministic specification, because the distribution of the test statistic, and hence the critical values, may otherwise be distorted, as discussed in Section 4. Often the outcome is clear - there are r large eigenvalues and p − r negligible eigenvalues. However, this is not always the case: there may be a number of eigenvalues that are neither particularly small nor overly large, evenly spread over a range between, say, 0.1 and 0.4, making rank determination less clear. In this situation, where there is ambiguity over which eigenvalues to take as significant and which not, extra information needs to be used; however, as the trace test is the only formal procedure used, it should be accorded most weight. Possible sources of additional information are considered in the next few Sections.

5.5.1. Coefficients in the α matrix. One might consider the coefficients in the α matrix, and their significance. If there is ambiguity over whether or not an (r+1)th cointegrating vector is stationary, then significant entries in the (r+1)th column of α (t-values above about 2.6, as the t-ratios follow a non-standard distribution) suggest that there is useful information on the dynamics of the system in this vector.
In PcGive, the α matrix and its standard errors can be found by running a CVAR with r = p.^{16} This is done as in Section 5.4.3 or 5.4.4, though setting the cointegrating rank to p in the relevant places.

5.5.2. Plots of cointegrating vectors. Another course of action is to consider various


plots. One might look at possible cointegrating vectors; by running a CVAR of
full rank, one can get plots of all the possible cointegrating vectors. If a particular
cointegrating vector doesn’t look particularly stationary, this is more information
on its suitability for inclusion in the system.
In PcGive this is done by following, after the CVAR has been estimated:
Test --> Graphic Analysis --> [check Cointegrating relations box and
choose Use (Y:Z) OR Use (Y_1:Z)...] OK
(Y:Z) is the cointegrating vectors from the standard CVAR, while (Y_1:Z) is the
cointegrating vectors from the concentrated model (as in Section A). This will
result in the cointegrating vectors being printed, as in Figure 7.

5.5.3. Plots of characteristic roots. Another check of correct rank, which may also yield information on whether or not there is a possible I(2) problem, is a plot of the roots of the companion matrix under various imposed ranks. If, for the rank r that is imposed, there still appears to be a root near the unit circle, then this suggests that root should also be imposed to unity, and hence that the potential additional cointegrating vector is actually a stochastic trend. However, if regardless of what rank is imposed there is always an additional root near to unity, this suggests there may be an I(2) problem - differencing alone will not get rid of this root, so setting more and more roots to unity will not solve the problem.
In PcGive this is done by following, after the CVAR has been estimated:
Test --> Dynamic Analysis --> [check Plot roots of companion matrix
box] OK
This will result in the roots of the characteristic polynomial, or of the companion
matrix (the two are equivalent) being printed, as in Figure 8.

16The output from the trace test does not give standard errors for α.

Figure 7. Cointegrating vectors from UK labour market model.

Figure 8. Plot of roots of companion matrix for a CVAR with p = 6.



5.5.4. Economic interpretability. One might give consideration to the α_{r+1} and β_{r+1} vectors that would result if an ambiguous cointegrating vector were added. If the cointegrating combination of levels, β_{r+1}, is nonsensical, then it might be worth omitting. It is the case that combinations of vectors can be formed into one (see the section on identification, Section ??), and hence adding another column might lead to a cointegrating vector being dispersed over two columns, leaving each looking a bit odd and nonsensical on its own. If a particular variable is expected to be error correcting, such as the price of output in a supply equation, then it would be hoped there would be a negative α coefficient corresponding to price in at least one column of α; hence if the price coefficient in α_{r+1} has the appropriate sign, it might be worth keeping the extra vector. However, it should be remembered that the purpose of econometric analysis is to understand what the data reveal for a particular economic problem, and as such the formal testing procedures should be accorded appropriately more weight.

5.5.5. Remove dummy variables and run trace test again. Because it is known that
the inclusion of dummy variables alters the distribution of the trace test, but that
it is hard to simulate critical values for every eventuality, the critical values for the
trace test are based on the model with no dummies. Hence it might provide infor-
mation on the rank if the trace test is calculated again without dummy variables
in the model.
In PcGive this involves omitting the dummy variables (apart from the seasonal
dummies) in either the model selection window using the module, or by commenting
out using batch code (this is where the benefit of using batch code enters), running
the unrestricted VAR again and running the trace test again. For the UK labour
market model, this results in:
I(1) cointegration analysis, 1963 (4) to 2005 (1)
eigenvalue loglik for rank
3740.232 0
0.27499 3766.923 1
0.24606 3790.366 2
0.14621 3803.486 3
0.097172 3811.970 4
0.056917 3816.834 5
0.016040 3818.176 6

H0:rank<= Trace test [ Prob]


0 155.89 [0.000] **
1 102.51 [0.003] **
2 55.622 [0.204]
3 29.381 [0.545]
4 12.412 [0.782]
5 2.6843 [0.898]
This reveals some of the problems in this analysis; the inclusion of the dummy variables does indeed have an effect: without them, only 2 cointegrating vectors are detected, whereas with them, 3 are found (Section 5.4.2). This suggests that including dummies has made series that otherwise appeared non-stationary to look

stationary; this is like the phenomenon of modelling a random walk using a station-
ary process with a number of suitable dummy variables. Hence this suggests that
one should consider the other information presented here before making a choice
on the cointegrating rank.

5.5.6. Simulation of small sample test distribution. A final thing that could be done
is to simulate a VAR process with the coefficient values estimated, to look at the
empirical rejections of various null hypotheses if particular ranks are the true rank
- so if it is not clear whether the rank is 2 or 3, then one might simulate with a true
rank of 3 and consider the probability of falsely accepting the hypothesis of r = 2.
Ideally if the true rank is 2, then one would hope for a 95% rejection frequency
of the null hypothesis of r = 0 and r = 1 (power), and a 5% rejection frequency
of r = 2 (size). With such a simulation one can then alter coefficient values and
sample size to see what would be needed in order to correctly determine the rank
of the system.17
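A minimal version of such a simulation, reusing the reduced rank machinery of Section 5.4 for the simplest case of one lag and no deterministic terms (all parameter values hypothetical, true rank r = 1). Rather than comparing against tabulated critical values, the sketch simply contrasts the trace statistics under a false null (r = 0) and a true null (r ≤ 1):

```python
import numpy as np

# Monte Carlo for a bivariate CVAR with true rank 1. The r = 0 trace
# statistic should be systematically large (its null is false), while the
# r <= 1 statistic should be small (its null is true).
rng = np.random.default_rng(7)
reps, T = 200, 200
alpha = np.array([[-0.25], [0.25]])
beta = np.array([[1.0], [-0.7]])
trace0, trace1 = [], []
for _ in range(reps):
    X = np.zeros((T + 1, 2))
    for t in range(1, T + 1):
        X[t] = X[t - 1] + (alpha @ beta.T @ X[t - 1]) + rng.standard_normal(2)
    R0, R1 = np.diff(X, axis=0), X[:-1]
    S00, S01, S11 = R0.T @ R0 / T, R0.T @ R1 / T, R1.T @ R1 / T
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    trace0.append(-T * np.log(1 - lam).sum())       # test of r = 0
    trace1.append(-T * np.log(1 - lam[1:]).sum())   # test of r <= 1

print(np.median(trace0), np.median(trace1))
```

Altering α, β or T in this sketch then shows how adjustment speed and sample length drive the ability to determine the rank correctly.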

6. Limiting Distributions of the Trace test


The likelihood ratio test for cointegrating rank, also known as the trace test, has been shown asymptotically to follow a Dickey-Fuller-type distribution; written here for the simplest case of a single common trend, p − r = 1:
(6.1) LR(H_r | H_p) = T S_{01}² / (S_{00}S_{11}) → (∫_0^1 B dB)² / ∫_0^1 B² du.

PcGive simulates critical values for the trace test. The existence of this asymp-
totic distribution relies on a number of assumptions which must be tested. These
assumptions are:
(1) rank(Π) = r: the correct rank has been chosen. If not, results will be wrong,
since unit roots may be left in the data.
(2) p − r unit roots (so no explosive roots).
(3) A Functional Central Limit Theorem holds for the errors, which requires no
autocorrelation in the errors.
(4) T = ∞ should be roughly approximated; however, T = 100 ≠ ∞ and
small-sample problems may exist.
(5) Constant parameters.
This is the reason for the emphasis on testing and checking these assumptions (see
Section 4); they underlie the trace test and the determination of the cointegrating
rank, and hence the classification of the I(0) and I(1) components of the model.
The distributions remain the same if lags are included in the model, because
estimation and rank determination is carried out on the concentrated model of
(5.61), and hence however many lags are included, they are concentrated out.18
The inclusion of deterministic terms, nuisance parameters, does change the limit-
ing distribution. If a restricted constant is included (see case 2 on page 30), the

17I intend to produce an Ox job to do this and if anyone is interested I can email it to them once
I have written it.
18Although as discussed in Section 4.4.1, I(2) analysis requires at least 2 lags, while short-run
identification is affected also by the number of lags included.
asymptotic distribution becomes:

(6.2)
\[
LR(H_r|H_p) \longrightarrow \mathrm{tr}\left\{\ldots, \int \begin{pmatrix} B_{(p-r)} \\ 1 \end{pmatrix} du, \ldots \right\},
\]
and so simulated critical values will differ depending on whether or not a constant
is included in the levels. A trend restricted to the cointegration space (see case 4
on page 31) has a similar effect: the asymptotic distribution is slightly altered to:

(6.3)
\[
LR(H_r|H_p) \longrightarrow \mathrm{tr}\left\{\ldots, \int \begin{pmatrix} B_{(p-r)} \\ u \end{pmatrix} du, \ldots \right\}.
\]
Exogenous variables, discussed in Section 5.3.3, can also be included within the
system and the asymptotic distribution is altered to:
(6.4)
\[
LR(H_r|H_p) \longrightarrow \mathrm{tr}\left\{\ldots, \int \begin{pmatrix} B_{(p-r)} \\ W_{\dim(z)} \end{pmatrix} du, \ldots \right\},
\]
where W_{dim(z)} is an additional Brownian motion. The critical values reported
in PcGive are not necessarily correct here, so there is potentially a need for
additional simulation of critical values if it is clear that the exogenous data is not
I(1).
Inclusion of dummy variables (Section 4.4.2) leads to a different asymptotic
distribution:
   
(6.5)
\[
LR(H_r|H_p) \longrightarrow \mathrm{tr}\left\{\ldots, \int \begin{pmatrix} B_{(p-r)} \\ 1 \\ ? \end{pmatrix} du, \ldots \right\}.
\]
Theory has established that a limit exists with shift dummies, and details of the
asymptotic distributions when dummy variables are included are given in Johansen,
Mosconi & Nielsen (2000). However, critical values are not simulated in PcGive
when dummy variables are added. A way of checking the difference the inclusion
of dummy variables has made is to omit all dummies and determine the rank; if
there is no effect on the outcome of the rank test then one should proceed.

7. Imposing restrictions and Identification


Having determined the rank of the system, the dimensions of the α and β ma-
trices have also been determined. Of interest now are the dynamics of the system,
how the α and β matrices interact, and which hypotheses the data will support.
This involves restricting the matrices and then testing the restrictions; the
method for testing was outlined in Section 3.6.2 but is explained in more detail in
this Section.
Firstly though, a normalisation of the β and α matrices must be imposed, as
in the simple form they often contain very large values that are hard to interpret.
This is because for the matrix Π an infinite number of α and β matrices can be
found that satisfy Π = αβ 0 , just as any number of real numbers a and b can be
found such that ab = 1. This normalisation, and any particular normalisation, can
be found simply by combining and rotating the rows and columns of Π in the αβ 0
factorisation, by noting that, for any non-singular matrix Q:
(7.1) αβ 0 = αQQ−1 β 0 .
To normalise, PcGive sets the leading diagonal to unity, i.e., where a ∗ signifies
a freely varying parameter:19
\[
\beta = \begin{pmatrix} 1 & * \\ * & 1 \\ * & * \\ * & * \\ * & * \end{pmatrix}.
\]
This normalisation makes the counting of estimated parameters somewhat more
difficult, since by normalisation at least r parameters are not estimated but fixed.
In fact by linear algebra the β matrix can be row reduced until it has an identity
matrix in the top segment, so:

(7.2)
\[
\beta = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ * & * \\ * & * \\ * & * \end{pmatrix}.
\]

Hence there are fewer parameters to estimate than 2 × p × r; in fact there are
pr + pr − r² = pr + (p − r)r, since r² is the size of the identity matrix that can always
be found. This is not a restriction, though: the likelihood will always remain the
same before and after it is imposed. However, rendering the system in a more
interpretable form, which this normalisation should do, will enable a clearer
understanding of the dynamics within it. Restricting the β and α matrices will
be discussed in Sections 7.1 and 7.2, implementation in PcGive will be considered
in Section 7.3, and then in Section 7.4 the types of restrictions that can aid
understanding of the data will be described.
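The non-uniqueness in (7.1) and the normalisation in (7.2) are easy to illustrate numerically. In the sketch below the random α and β are purely illustrative, and both are taken as p × r for the pure algebra:

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 5, 2
alpha = rng.standard_normal((p, r))
beta = rng.standard_normal((p, r))
Pi = alpha @ beta.T

# (7.1): any non-singular r x r matrix Q gives another valid factorisation.
Q = rng.standard_normal((r, r))
assert np.allclose((alpha @ Q) @ (beta @ np.linalg.inv(Q).T).T, Pi)

# Normalise so the top r x r block of beta is the identity, as in (7.2)
# (valid provided that block is non-singular).
c = np.linalg.inv(beta[:r, :])
beta_n = beta @ c
alpha_n = alpha @ np.linalg.inv(c).T
assert np.allclose(alpha_n @ beta_n.T, Pi)   # Pi, and hence the likelihood, unchanged
print(np.round(beta_n[:r, :], 12))
```

The top block of beta_n is the identity while Π, and so the value of the likelihood, is untouched, which is exactly why the normalisation is not a testable restriction.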
7.1. β restrictions. As an example, a system with p = 3 and r = 2 is taken. There
are always likely to be deterministic terms in a particular model; when testing, one
must consider the role these terms play. The model is:
(7.3)
\[
\begin{pmatrix} \Delta X_{1,t} \\ \Delta X_{2,t} \\ \Delta X_{3,t} \end{pmatrix}
= \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{pmatrix}
\begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} & \beta_{41} & \beta_{51} \\ \beta_{12} & \beta_{22} & \beta_{32} & \beta_{42} & \beta_{52} \end{pmatrix}
\begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \\ X_{3,t-1} \\ 1 \\ D_{s,t} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix}.
\]
Many questions can be asked about the system which can be answered by placing
restrictions on the β matrix and testing them, such as:
(1) Do X1,t and X2,t cointegrate?
(2) Is X3,t stationary?
(3) Is the spread X1,t − X2,t stationary?
(4) If there exist cointegrating relations, do all have spreads in them?
(5) Can a variable, say X3,t , be excluded?
19Although this can be altered by imposing restrictions, discussed in Section 7.1.
THE COINTEGRATED VAR MODEL 45

(6) Do a number of regression coefficients sum to zero, e.g. β1 + β2 + β3 = 0?


Restrictions are imposed on the system by constructing restriction matrices.
Using the orthogonal complement defined on page 26, there are two possible forms
of imposing restrictions. The first is perhaps the more traditional way to impose
restrictions and might be called the R-form. In a very simple uni-variate regression
y = δx + u, this would be Rδ = 0, where to test the restriction that δ equals zero,
R = 1 would be required. For a single cointegrating vector, such as the first in (7.3):

(7.4)   β1′ = ( β11 β21 β31 β41 β51 ),

placing a restriction involves writing R′β1 = 0.
So R must be a (remembering the transposes) p × 1 matrix. Restricting one
coefficient, say β31 to be zero would require:
(7.5)   R′ = ( 0 0 1 0 0 ).
Imposing more than one restriction means R must have more than one column.
So testing the significance of another variable, say X1,t , means R becomes:
(7.6)
\[
R' = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}.
\]
Hence R must be p × (p − m) where there are p − m restrictions. Extending to the
case where there is more than one cointegrating vector, such as the two in (7.3),
then an R matrix must be constructed for each vector, so for βi an Ri matrix must
be constructed. This can be written:
 0 
R1 β1
 R20 β2 
(7.7) R0 β =  
 . . .  = 0,
Rr0 βr
Applying the restriction in (7.5) to all rows in β would give restriction 5. In the
r = 2 case here, this is:
(7.8)
\[
R'\beta = \begin{pmatrix} R_1'\beta_1 \\ R_2'\beta_2 \end{pmatrix}
= \begin{pmatrix}
\begin{pmatrix} 0 & 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \beta_{11} \\ \beta_{21} \\ \beta_{31} \\ \beta_{41} \\ \beta_{51} \end{pmatrix} \\
\begin{pmatrix} 0 & 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} \beta_{12} \\ \beta_{22} \\ \beta_{32} \\ \beta_{42} \\ \beta_{52} \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix} \beta_{31} \\ \beta_{32} \end{pmatrix} = 0.
\]
The other form for restrictions is the H-form, where instead of writing down
the restricted coefficients, the model is written in terms of the coefficients that are
unrestricted hence estimated; it is written:
(7.9)   β = Hϕ = ( H1ϕ1  H2ϕ2  …  Hrϕr ),

where H = R⊥, and hence is p × m, while ϕ is m × r; m might be described as
the number of freely varying parameters. The far right-hand side of (7.9) shows
that different restrictions can be applied to each βi vector in β, just as in the longer
expression in (7.7). If the same restriction is applied to each vector in β
then one can just use the simpler first expression. It is certainly more
difficult to conceptualise the H-form at first than the R-form; hence a number of
examples will be given. Firstly, the equivalent of R′ = (1, 0, 0), restriction 5 in a
three-variable system without deterministic terms, is given by allowing the four
unrestricted parameters to vary freely, hence thinking of β in terms of its
unrestricted parameters, so:
   
\[
\beta = \begin{pmatrix} 0 & 0 \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \varphi_{11} & \varphi_{12} \\ \varphi_{21} & \varphi_{22} \end{pmatrix},
\]

where the last bit is written in terms of the ϕ matrix that is used to implement
these restrictions. So overall it might be written:

(7.10)   β = Hϕ = [H1ϕ1, H2ϕ2]

(7.11)
\[
= \left[ \begin{pmatrix} ? & ? \\ ? & ? \\ ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{11} \\ \varphi_{21} \end{pmatrix},\;
\begin{pmatrix} ? & ? \\ ? & ? \\ ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{12} \\ \varphi_{22} \end{pmatrix} \right]
= \begin{pmatrix} 0 & 0 \\ \varphi_{11} & \varphi_{12} \\ \varphi_{21} & \varphi_{22} \end{pmatrix},
\]

and consideration of this for long enough should provide intuition that H takes the
form:
 
\[
H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

The first row must be zeros in order that the first row of the resultant matrix be
zeros. Then the identity is required to map what is in the ϕ matrix to the bottom
square part of the resultant Hϕ product. The dimensions of H and ϕ can be found
by considering the resulting matrix that is required, the far RHS of (7.11). Here,
in each β vector there is one restriction, hence there must be m rows of ϕ, and as
p = 3, m = p − 1 = 2. Then H must have m columns, else the matrix
multiplication could not take place.
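Finding H mechanically rather than by inspection is also possible, since H = R⊥; here is a sketch using the SVD (the basis returned spans the right space, though any rotation of it would serve equally well):

```python
import numpy as np

def orth_complement(R):
    """Return H = R_perp, whose columns span the null space of R'."""
    k = R.shape[1]                    # number of restrictions
    _, _, Vt = np.linalg.svd(R.T)
    return Vt[k:].T                   # p x (p - k)

# The example above: p = 3, one restriction, R' = (1 0 0).
R = np.array([[1.0, 0.0, 0.0]]).T
H = orth_complement(R)                # 3 x 2
assert np.allclose(R.T @ H, 0.0)

# Any beta = H*phi then satisfies the restriction R'beta = 0 automatically.
phi = np.array([[1.0, 3.0],
                [2.0, 4.0]])
beta = H @ phi
print(np.round(R.T @ beta, 12))       # zero row: the first coefficient is excluded
```

The same function applies unchanged to the larger five-coefficient restrictions later in this Section.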
Moving on to other kinds of restrictions, restriction 2 asks whether a variable
is stationary. This translates into asking whether the variable on its own is a
cointegrating vector, since in the reduced rank model all variables must be I(0). In
R form, asking whether X2,t is stationary is equivalent to asking whether or not
the other variables in a β vector are equal to zero, hence:


   
(7.12)
\[
R'\beta_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \beta_{11} \\ \beta_{21} \\ \beta_{31} \\ \beta_{41} \\ \beta_{51} \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]

A choice must be made here, however, between testing for stationarity per se and
stationarity around deterministic terms, such as a trend or a mean shift. In (7.12)
stationarity per se is being tested, while if one were to test for stationarity around
the constant and mean-shift dummy in this model, the restrictions would be:

(7.13)
\[
R'\beta_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \beta_{11} \\ \beta_{21} \\ \beta_{31} \\ \beta_{41} \\ \beta_{51} \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]

Consider next the H-form: restrictions are placed on one column in β, and the
other columns are left to vary freely. To find the exact form for H, one could
either work out the orthogonal complement of R, or one might construct it as
before, by working backward from the free parameters. On the restricted vector, if
testing is for stationarity around deterministic terms, only the variable being tested
for stationarity and the deterministic terms can vary.20 The restrictions on each β
vector determine the size of each Hi matrix. On the first vector, the coefficients
on X2,t, 1 and Ds,t are allowed to vary freely; the rest are fixed. Hence there are
two restrictions, leaving three free parameters, so three rows for the ϕ1 vector
along with three columns in the H1 matrix:
(7.14)
\[
\tilde\beta = \begin{pmatrix} H_1\varphi_1 & H_2\varphi_2 \end{pmatrix}
= \left[ \begin{pmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{11} \\ \varphi_{21} \\ \varphi_{31} \end{pmatrix},\;
\begin{pmatrix} ? & ? & ? & ? & ? \\ ? & ? & ? & ? & ? \\ ? & ? & ? & ? & ? \\ ? & ? & ? & ? & ? \\ ? & ? & ? & ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{12} \\ \varphi_{22} \\ \varphi_{32} \\ \varphi_{42} \\ \varphi_{52} \end{pmatrix} \right]
\]
(7.15)
\[
= \begin{pmatrix} 0 & \varphi_{12} \\ \varphi_{11} & \varphi_{22} \\ 0 & \varphi_{32} \\ \varphi_{21} & \varphi_{42} \\ \varphi_{31} & \varphi_{52} \end{pmatrix},
\]

20This is usually tested for, since the terms restricted to the cointegration space are restricted in
order to ensure stationarity of the system. It is left for the reader to think about the H-form if
testing was for stationarity per se. Restriction 5 is an exclusion restriction, and might be used
to test mean shift dummies or trends included when one is unsure over their usefulness.
with H matrices allowing the deterministic terms to vary freely:
\[
H_1 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
H_2 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.
\]

Restrictions on linear combinations of terms can also be considered. One common
combination to test is whether particular spreads are stationary, which conforms
to restrictions 3 and 4 on page 44, or whether coefficients sum to unity. Firstly,
asking whether or not the spread X1,t − X2,t is stationary is equivalent to asking
whether this forms a cointegrating vector in itself.21 Thus the coefficients on X1,t and
X2,t are restricted to be equal and of differing sign, and the coefficient on X3,t is
restricted to be zero. For now the other β vector is ignored:

   
(7.16)
\[
\beta_1 = H_1\varphi_1 = \begin{pmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{11} \\ \varphi_{21} \\ \varphi_{31} \end{pmatrix} = \begin{pmatrix} \varphi_{11} \\ -\varphi_{11} \\ 0 \\ \varphi_{21} \\ \varphi_{31} \end{pmatrix}.
\]

The H matrix is:
\[
H_1 = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Thus equality restrictions between variables lead to more than one non-zero entry
in a particular column of the H matrix in question. Considering the R-form, this
will have dimension p × (p − m); from (7.16) m = 3, hence R has dimension
5 × 2 and is:

(7.17)
\[
R' = H_\perp' = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}.
\]

If the question is whether spreads exist in all cointegrating vectors (restriction 4),
this places restrictions on the other β vector here too, and the spread could be between
21The inclusion of deterministic terms or not is case dependent, and depends on the spread or
cointegrating vector being tested for. Initially here, deterministic terms are left unrestricted.
X2,t and X3,t , which would be formulated as:


(7.18)
\[
\tilde\beta = \begin{pmatrix} H_1\varphi_1 & H_2\varphi_2 \end{pmatrix}
= \left[ \begin{pmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \varphi_{11} \\ \varphi_{21} \\ \varphi_{31} \end{pmatrix},\;
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \varphi_{12} \\ \varphi_{22} \\ \varphi_{32} \end{pmatrix} \right]
\]
(7.19)
\[
= \begin{pmatrix} \varphi_{11} & 0 \\ -\varphi_{11} & \varphi_{12} \\ 0 & -\varphi_{12} \\ \varphi_{21} & \varphi_{22} \\ \varphi_{31} & \varphi_{32} \end{pmatrix}.
\]
As a final illustrative example, consider restricting the coefficients on all three
variables to sum to zero; restriction 6. In R-form, the R vector would be:

(7.20)   R′ = ( 1 1 1 0 0 ).

This would look like:

(7.21)
\[
\beta_1 = H_1\varphi_1 = \begin{pmatrix} ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \end{pmatrix} \begin{pmatrix} \varphi_{11} \\ \varphi_{21} \\ \varphi_{31} \\ \varphi_{41} \end{pmatrix} = \begin{pmatrix} \varphi_{11} + \varphi_{21} \\ -\varphi_{11} \\ -\varphi_{21} \\ \varphi_{31} \\ \varphi_{41} \end{pmatrix},
\]
and the H matrix is:
\[
H_1 = \begin{pmatrix} 1 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
Estimation and testing of restrictions is quite simple; instead of regressing:
\[
\Delta X_t = \alpha\beta'X_{t-1} + \varepsilon_t,
\]
the following regression is run:

(7.22)   ΔXt = αϕ′H′Xt−1 + εt,

and the reduced rank regression of ΔXt on Xt−1 is replaced by that of ΔXt on
H′Xt−1; everything follows as in Section 5, with the substitution β′ = ϕ′H′
imposed. Furthermore these restrictions are easily tested, since the likelihood from
the restricted model can be calculated and compared to the unrestricted likelihood
via the likelihood ratio test discussed in Section 3.6.2, and because the rank of the
regression has been determined, the test has a standard distribution: it is χ² dis-
tributed, with degrees of freedom given by the difference between the number of
parameters estimated in the restricted and unrestricted regressions, (p − m)r.
Hence many restrictions are possible on the β matrix, and the restriction types 1–
6 give suggestions for possible cointegrating relationships that could be tested in
Section 7.4.6.
7.2. α restrictions. Restrictions on the short-run structure of the model, the α
matrix, are imposed in analogous fashion to restrictions on the β matrix. Again
there are the R-form and the H-form for the restrictions, though certain types of
restriction have different interpretations here. Considering again (7.3), here
without the deterministic terms:
       
\[
\begin{pmatrix} \Delta X_{1,t} \\ \Delta X_{2,t} \\ \Delta X_{3,t} \end{pmatrix}
= \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{pmatrix}
\begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix}
\begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \\ X_{3,t-1} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix}
\]
(7.23)
\[
= \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \\ \alpha_{31} & \alpha_{32} \end{pmatrix}
\begin{pmatrix} \beta_1' X_{t-1} \\ \beta_2' X_{t-1} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{pmatrix}.
\]
Restricting a row in α to be zero means that that particular variable does not
respond to the cointegrating relations. There might be theoretical predictions that
certain variables should not adjust to the equilibrium relations, and this can be
tested. Alternatively, when the model has been reduced to I(0) form at a later
stage, one can simply restrict all insignificant variables to zero, which is often a
simpler way to impose restrictions on the α matrix. Testing for weak exogeneity
and for unit vectors in α are useful tests to carry out at this stage, however, and are
discussed in Sections 7.4.1 and 7.4.2.
Restrictions on α are tested by running:

(7.24)   ΔXt = Hϕβ′Xt−1 + εt,

where the substitution α = Hϕ is imposed; one proceeds to estimate this system
as in Section 5 and to construct a likelihood ratio test as in Section 3.6.2.
7.3. H-form versus R-form and inputting restrictions in PcGive. As seen,
the H-form of imposing restrictions is more useful for evaluating the likelihood, as
it gives free parameters that need estimating, while the R-form is more appropriate
for the Wald test formulation and test of restrictions.
The method for inputting restrictions in PcGive is the same regardless of whether
batch code or the PcGive module interface is being used. PcGive attaches numbers
to the elements of the α and β matrices, and restrictions are expressed in code
written in terms of these numbers. Thus, for example, if p = 5 (three variables
plus the constant and shift dummy) and r = 2, as in the examples in this Section:

(7.25)
\[
\alpha = \begin{pmatrix} \&0 & \&1 \\ \&2 & \&3 \\ \&4 & \&5 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \&6 & \&7 & \&8 & \&9 & \&10 \\ \&11 & \&12 & \&13 & \&14 & \&15 \end{pmatrix}.
\]
Hence to test the restriction that X1t can be excluded from the cointegrating rela-
tions, restriction 5, the code is:
&6=0; &11=0;
If using the PcGive module this would be written in the window provided by going
to:
Model --> Formulate... --> [select variables] OK --> Cointegrated
VAR, OK --> General Restrictions, OK
The correct numbers for the given dimensions of the system are provided via this
method. On the other hand, if writing batch code, then firstly an Ox job exists
to print out the correct numbers, as in (7.25), for reference, and secondly one would
enter the following lines below the system part and above the estimate line:
constraints
{
&6=0; &11=0;
}
7.4. Using restrictions to understand the system. A number of tests of re-
strictions can be carried out on the α and β matrices, with various implications
for the system. Testing is only on one vector in each matrix, since at this stage
neither is identified and hence it is irrelevant which vector is being tested; the
question is rather whether the data can support the restriction being hypothesised.
7.4.1. Weak exogeneity. Weak exogeneity, which was discussed in Section 3, can
be tested easily in this framework. It corresponds to a zero row in the α
matrix. This means that one variable in Xt does not correct to the cointegrating
vector; in other words, the determination of its level is exogenous to the system:
it is determined outside. As an example, X2t is weakly exogenous in a bi-variate
system with one cointegrating vector (p = 2, r = 1):

(7.26)
\[
\begin{pmatrix} \Delta X_{1t} \\ \Delta X_{2t} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix} \beta' X_{t-1} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}.
\]
The equation for X2t is simply ΔX2t = ε2t, and so X2t in this simple case is a
random walk, hence not determined inside the model. X2t is a common trend,
a driving force in itself, and consideration of α⊥ shows this: a zero row in
α corresponds to a unit vector in α⊥, which simply picks up that particular vari-
able with a unity and places a zero on the other variables, here just X1t. The
Granger representation in (5.16) shows that the common trends, the random walk
component, are α⊥′ Σ_{i=0}^t ε_i, and there are p − r (= 1 here) of these; because
α⊥ must have the form α⊥ = (0, α⊥,2)′, then α⊥′ Σ_{i=0}^t ε_i = α⊥,2 Σ_{i=0}^t ε_{2,i}, and so X2t can
be isolated as the variable driving the system, with X1t simply correcting to it,
within the movements prescribed by the cointegrating vector. If testing suggests a
variable is weakly exogenous, then that variable can be transferred from Xt to Zt
and treated as a weakly exogenous variable in the system. This can be written
in two equivalent forms: first by just including the contemporaneous differences of
Zt, and as many lagged differences of Zt as the model has for Xt. The alternative
formulation is to partition Xt = ( Yt | Zt ), i.e. to condition on Zt. Then the model is
written:

ΔYt = αβ′Xt−1 + Γ1ΔXt−1 + γΔZt−1 + γ0ΔZt + εt.
One should note that the test for weak exogeneity on any particular variable is a
test only on that variable; even if more than one row looks to be zero, this does not
mean they are necessarily jointly zero. However, testing more than one vector
is not too difficult.
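The common-trend role of a weakly exogenous variable is easily seen by simulating (7.26); a small sketch with illustrative values α = (−0.5, 0)′ and β = (1, −1)′:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = np.array([-0.5, 0.0])    # zero second row: X2 is weakly exogenous
beta = np.array([1.0, -1.0])
T = 5000
X = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    X[t] = X[t - 1] + alpha * (beta @ X[t - 1]) + rng.standard_normal(2)

# X2 never responds to the disequilibrium, so it is a pure random walk,
# while X1 corrects towards it: beta'X_t is a stationary AR(1) with
# coefficient 1 + beta'alpha = 0.5.
u = X @ beta
print(round(float(np.std(u)), 1))    # bounded, whereas the level of X2 wanders
```

Plotting X against u in such a simulation makes the distinction between the common trend and the disequilibrium error very clear.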
7.4.2. Testing for a unit vector in α. The α matrix is commonly described in terms
of its column vectors α1, …, αr, since these give the responses of the variables to
each of the r cointegrating vectors. The test for a unit vector in α thus investigates
whether or not a particular cointegrating vector could be described as having just
one variable correcting to it. Furthermore, a unit vector in a column of α corresponds
to a zero in the corresponding row of α⊥, and given the common trends are
defined as α⊥′ Σ_{i=0}^t ε_i, a unit vector implies shocks to that variable have no
long-term effect. It is a complementary test to the weak exogeneity test, since
a weakly exogenous variable implies a corresponding zero row in α and hence a
unit vector in α⊥, and so identification of that variable as a common trend.
7.4.3. Testing for variable exclusion. Whether or not a variable can be excluded
from the long-run relations is testable; one can test that a column of β is equal to
zero. Thus the same restriction is applied to all vectors in β, and is for R a unit
vector with the unity entry in the appropriate place; so for X1t :
R′ = ( 1 0 0 0 0 ).
For the H form, it is an identity matrix ‘interrupted’ with a row of zeros for the
relevant variable; so for X3t :
 
(7.27)
\[
H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
This test can be of use with deterministic terms to determine whether or not they
are appropriate. This is particularly so for mean-shift dummies, but less so for a
trend; here one should only exclude if p-values are very high, because omitting the
trend may lead to the model using another variable as a proxy to the trend, hence
biasing other coefficients.
7.4.4. Testing for variable stationarity. Variable stationarity is also testable; this
equates to testing for a unit row vector in β. This is a restriction tested on any
particular vector in β, and would restrict all coefficients other than the coefficient
on the variable of interest, say X2t , to be zero. The R-form matrix would be:
 
(7.28)
\[
R' = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.
\]
The H-form matrix would be:
(7.29)   H′ = ( 0 1 0 0 0 ).
7.4.5. Testing in PcGive. In PcGive, carrying out the tests described in Sections
7.4.1–7.4.4 involves imposing restrictions on the α and β matrices, which was in-
troduced in Section 7.3. However, it is not simple to put together the code to test
for weak exogeneity of so many possible variables, and over all the possible rank
restrictions. It is informative to consider the test described in the previous few
Sections for different ranks, because, for example if between rank r and r + 1 a
variable becomes endogenous, this suggests that that variable corrects to a partic-
ular cointegrating vector. However, there is no way of automating this procedure
as far as I am aware in PcGive.22 A manual method can be employed, and to this
effect an Ox job exists that prints the batch code for a given number of variables
and deterministic terms called CreateBatchCode.ox. In the job, simply alter the
iP and iDets variables as appropriate, and the code for all ranks for the four types
22This is a big advantage of the CATS for RATS software, which automates the tests in Sec-
tions 7.4.1–7.4.4 for all possible rank restrictions, and tables the output.
of test will be printed as output. Copy and paste this into the constraints part
of the batch file, and proceed through the various restrictions testing one at a time;
a procedure might be:
(1) comment in one line;
(2) run the batch file --> output of restricted VAR;
(3) scroll to bottom of output (ctrl+page dn);
(4) copy the likelihood ratio test statistic and p-value (in square brackets) and
paste it on line of batch file for this restriction;
(5) comment back out line;
(6) go back to (1) until all restrictions have been done;
Having done this, one can analyse the test statistic outcomes. During the process
the constraints part of the batch file (which with the restriction code will become
very long) should look something like:
constraints
{
//Exclusion (beta) restrictions.

//rank = 1
//Variable 0
//&5=0; 83.257 [0.0000]**
//Variable 1
//&6=0; 8.7046 [0.0032]**
//Variable 2
//&7=0; 0.23305 [0.6293]
//Variable 3
//&8=0; 1.3792 [0.2402]
//Variable 4
//&9=0; 1.4983 [0.2209]
//Variable 5
//&10=0; 1.7782 [0.1824]
//Variable 6
&11=0;

//rank = 2
//Variable 0
//&10=0; &17=0;
//Variable 1
//&11=0; &18=0;
It might help to have these tabulated, and an Ox job exists which takes the batch
file once all the tests have been carried out and creates LaTeX code for tabulating
the results. This job is imaginatively called ReadTestResIntoTex.ox.23 Also

23In its current form it relies on the line spacing remaining as it is when printed out from the
CreateBatchCode.ox job, and also requires that the test statistics and p-values are pasted at the
end of the line (to make sure they’re put in the right place, use ctrl+end then copy there).
within this approach, multiple restrictions can be tested, to see if they hold to-
gether; for example, if two variables appear individually to be weakly exogenous,
by commenting in the two lines that relate to these variables, one can test the joint
weak exogeneity of the two variables.
7.4.6. Testing candidate cointegrating vectors. Theory ought to have provided a
number of candidate cointegrating vectors, which ought to be tested at this point
to see if in fact they are I(0) given the data, and if not, what is needed to make
them I(0). Possible combinations of these vectors will be considered in Section 7.5
when identification of the system is discussed.24 While it is informative to consider
the restrictions of Sections 7.4.1–7.4.4 for all possible rank conditions, there is little
to be gained from varying rank when testing candidate cointegrating vectors.
There may not necessarily be a one-to-one relation between theoretical cointegrating
vectors and empirical ones: either theory relations are a combination of partial
empirical ones, or vice versa. Identification may be a problem, but also theory
relations may not exist on their own, only in combination with other theoretical
relations.
When testing candidate cointegrating vectors in PcGive, a similar strategy to
that in Section 7.4.5 might be used. For each possible cointegrating vector, writing
a line in the batch code, with a comment describing the relation may prove useful;
especially at the later stage of identification (Section 7.5), since combinations of
candidate restrictions can be commented in as appropriate. Thus an example for
the constraints section of the batch code might be:25
//&19=1; &18=-&20-1; &21=0; &22=0;
//2.6422 [0.1041] price wedge and productivity relation
//&18=1; &19=-1; &20=0; &21=0; &22=0;
//10.688 [0.0048]** pure phillips curve
//&18=1; &19=-1; &20=0; &22=0;
//5.1466 [0.0233]* allowing output
//&18=1; &19=0; &20=-&22; &23=0; //labour demand 0.018685 [0.8913]
In terms of accepting restrictions, it is desirable to have p-values above 20%,
which give good support to the restriction being imposed. The suggestion is that
p-values between 10% and 50% give slight support, while those above 50% give
strong support. The closer the test statistic is to conventional critical values, the
further into the tails of the distribution the hypothesised restrictions lie, and hence
the less plausible these restrictions are.
7.5. Identification of β. The β matrix is said to be identified when one can tell
βi′Xt from βj′Xt, i ≠ j. In (7.23) the cointegrating relations could be written:

(7.30)   β1′Xt = β11X1,t + β21X2,t + β31X3,t
(7.31)   β2′Xt = β12X1,t + β22X2,t + β32X3,t,
and without additional information, it would be impossible to tell the two vectors
apart; while there will be numbers for the β coefficients, linear combinations of the
vectors in β and α could be taken to leave the β vector looking completely different.
A way of making each vector, (7.30) and (7.31) distinct is required, so that if linear
24This Section can be ignored if one wishes to identify the system using strategy 2 in Section 7.5.
25The example is taken from the UK labour market job. Each restriction is split over two lines
to avoid breaking the margins on this page!
combinations are taken, the restrictions placed on each vector are destroyed hence
the only matrix that can be placed between α and β is the identity matrix. This
will involve placing zeroes or equality restrictions in each matrix. Formally, there
are rank and order conditions to establish identification, and there are three classes
of identification:
• Rank and order conditions:
Rank condition: If rank(Ri′β) ≥ r − 1, i = 1, …, r, then cointegrating
relation i is identified, where
\[
R_i'\beta = R_i'H\varphi = R_i'\begin{pmatrix} H_1\varphi_1 & \ldots & H_r\varphi_r \end{pmatrix}.
\]
• Classes of identification:
Just-identified: Just enough restrictions that the likelihood is not
altered; the rank condition holds with equality, and only the r − 1
restrictions that are found by linear combination of α and β are used.
This is formal, or non-economic, identification; economically, one might
choose which variables to set to zero to satisfy economic relations such
as demand or supply.
Over-identified: The likelihood is altered, and the rank condition holds
with inequality. Hence an LR test can be carried out to test these
restrictions.
Under-identified: The restrictions in place do not identify the system.
A few examples might help. Taking the matrix:
\[
\begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix},
\]
then the first thing to do might be to impose a zero in each row, if a variable is not
thought to belong to that relation:
\[
\begin{pmatrix} 0 & \beta_{21} & \beta_{31} \\ \beta_{12} & 0 & \beta_{32} \end{pmatrix}.
\]
This identifies the system, since adding the first row to the second destroys the zero
restriction in place in the second column, while adding the second row to the first
destroys the zero restriction on β11. Formally the rank condition can be checked:
rank(Ri′β) = rank(Ri′Hϕ) ≥ r − 1. It is pointless to consider R1 and H1,
or Ri and Hi more generally, since Ri = (Hi)⊥ and hence Ri′Hi = 0 by construction.
So for identification of the first cointegrating relation, consider rank(R1′H2), where:
\[
R_1' = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix},
\]
and rank(R1′H2) ≥ 1 needs to be satisfied. So:
\[
R_1'H_2 = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix},
\]
which has rank 1, hence satisfying the condition. A similar argument shows
the second relation is identified.
However, if either of β11 or β22 were equal to zero, then the system would not be
identified; the following system is not identified:
\[
\begin{pmatrix} 0 & \beta_{21} & \beta_{31} \\ 0 & 0 & \beta_{32} \end{pmatrix},
\]
because adding the second vector to the first does not destroy any restrictions in
place in that vector. This can be shown formally by considering rank(R1′H2) ≥ r − 1,
since this is the condition that identifies the first cointegrating vector, the one to
be checked for identification. Now
\[
R_1' = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\]
so rank(R1′H2) = 0 and the vector is not identified. Equality restrictions can
also be used:
µ ¶
0 β21 β31
,
β12 −β32 β32
would identify both vectors, since adding the first to the second, provided β21 6= β31 ,
would destroy the equality restriction there, while adding the second to the first
vector would destroy the zero restriction on β11 .26
7.5.1. Identification Strategies. There are two strategies one might follow, which
are:27
(1) Just-identify, then restrict insignificant parameters. This strategy
involves imposing just-identifying restrictions, i.e. a zero in each β vector,
and then, having done this, imposing further restrictions if need be, and if
possible. Thus one might exclude a variable that is clearly expected to
affect labour supply only, and one that would surely only affect demand;
having done this, and given that standard errors can now be reported on
the identified system, insignificant variables could then be omitted from
each vector for parsimony, provided that in doing so the system does not
become unidentified.
(2) Impose known cointegrating vectors. Prior testing of stationary relations
based on theory might have provided a number of potential cointegrating
vectors. One might then select a number of these and see if they hold
together, since thus far the test has been whether each holds in isolation.
26The rank conditions also support this: R_1' = (1, 0, 0), R_2' = (0, 1, 1),

          [ 0  0 ]        [ 1  0  ]
    H_1 = [ 1  0 ], H_2 = [ 0  1  ]
          [ 0  1 ]        [ 0  −1 ],

hence R_1'H_2 = (1, 0) and R_2'H_1 = (1, 1), so both have rank 1 as required.
27There is a third strategy, which requires having CATS: an option in the CATS for RATS
package called CATSmining. This procedure finds all possible stationary cointegrating relations,
then finds all combinations of these and reports all possible models. This undoubtedly is
data-mining, and has no economic meaning to it. However, it is certainly useful for getting to
grips with the kinds of models that might exist, and the kinds of relationships that might
cointegrate together.
7.5.2. Long-run identification in PcGive. Identifying by strategy (2) involves
imposing combinations of the candidate cointegrating vectors tested for in Section 7.4.6,
hence simply involves commenting in combinations of the vectors to see if they hold
simultaneously. There is the caveat, though, that one will need to change the numbers
on some of the vector restrictions, since one cannot restrict the same vector to
be two different cointegrating vectors. Thus, for example, in the UK labour market
example, p = 6, r = 2, and there is a trend and a mean-shift dummy restricted to the
cointegration space, hence in PcGive speak:
(7.32)       [ &0   &1  ]
             [ &2   &3  ]
        α =  [ &4   &5  ]
             [ &6   &7  ]
             [ &8   &9  ]
             [ &10  &11 ],

(7.33)  β =  [ &12  &13  &14  &15  &16  &17  &18  &19 ]
             [ &20  &21  &22  &23  &24  &25  &26  &27 ].
Thus if one wished to impose as the two cointegrating vectors the price wedge
and productivity relation, and the Phillips curve allowing for output (the two
restrictions listed in Section 7.4.6), then one would have to change them from:
//&19=1; &18=-&20-1; &21=0; &22=0;
//2.6422 [0.1041] price wedge and productivity relation
//&18=1; &19=-1; &20=0; &22=0;
//5.1466 [0.0233]* Phillips curve allowing output
//&18=1; &19=0; &20=-&22; &23=0; //labour demand 0.018685 [0.8913]
to:
//&19=1; &18=-&20-1; &21=0; &22=0;
//2.6422 [0.1041] price wedge and productivity relation
//&26=1; &27=-1; &28=0; &30=0;
//5.1466 [0.0233]* Phillips curve allowing output
//&34=1; &35=0; &36=-&38; &39=0; //labour demand 0.018685 [0.8913]
It should be noted that while these restrictions identify β, PcGive will report that
beta is not identified unless a normalisation is carried out on the β matrix;
this is achieved here with the unity restrictions on &19, &26 and &34. It should
also be noted that these restrictions are heavily rejected, with 13.297 [0.0040]**:
a test statistic of 13.297 and a p-value of 0.4%, meaning that if the restrictions
were valid, a statistic this large would occur only 0.4% of the time. The restrictions
are therefore rejected with a high degree of confidence, and hence these are not
the identifying restrictions settled upon
in the UK labour market analysis. Nevertheless, they show how one might go
about identifying the long-run structure of the CVAR model. In the UK labour
market example, the two cointegrating vectors given below were settled upon in the
initial analysis. They correspond to an employment equation, and a Phillips curve
type relation:
(7.34)  β_1'X_t = −0.284(n − l)_t − 0.867(y − p_y)_t + 0.378(p_y − p_c)_t
                  + ∆p_{y,t} + 0.005t − 0.018D_{s,t},

(7.35)  β_2'X_t = −0.124 l_t − 0.334(w − p_c)_t − (y − p_y)_t + 0.004t − 0.047D_{s,t}.
7.6. Identification of α and the short-run structure. This is not so straightforward
to do economically, because macroeconomic data, which is temporally aggregated,
will often miss the reaction of agents to particular impulses, especially if
the data are later revised. Thus this part of the analysis is done empirically, as
opposed to economically, through the omission of insignificant variables in the system.
The transformation to an I(0) model has already taken place with the imposition of
reduced rank in the Π matrix, and because the estimates β̂ are super-consistent,
which means they converge at rate T rather than √T (the rate at which estimates of α
and Ω converge), β'X_{t−1} can be replaced with the fitted cointegrating relations,
denoted CI_{it} = β̂_i'X_t, in the system:

(7.36)  [ ∆X_{1t} ]   [ α_1 ] [ CI_{1,t−1} ]
        [ ∆X_{2t} ] = [ α_2 ] [ CI_{2,t−1} ] + ε_t.
All the variables in (7.36) are I(0), and hence inference is now standard. Furthermore,
because (7.36) is simply a simultaneous equations model, each equation can
be treated in isolation and variables omitted. In the cointegrated VAR, a unified
structure was imposed on all equations, and hence there will be many variables
that can be omitted from each equation; this will enable identification.
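This two-step logic can be sketched in a few lines (with simulated data and hypothetical variable names, not the UK dataset): fix β̂, construct the fitted relations, then run OLS equation by equation, since everything in (7.36) is I(0):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 400

# Bivariate cointegrated system: x2 is a random walk, x1 tracks it,
# so beta = (1, -1)' is a cointegrating vector.
x2 = np.cumsum(rng.normal(size=T))
x1 = x2 + rng.normal(scale=0.5, size=T)
X = np.column_stack([x1, x2])

beta_hat = np.array([1.0, -1.0])   # treated as known: super-consistent
CI = X @ beta_hat                  # fitted cointegrating relation

dX = np.diff(X, axis=0)
Z = np.column_stack([CI[:-1], np.ones(T - 1)])  # CI_{t-1} plus a constant

# Equation-by-equation OLS: the first coefficient in each fit is alpha_i.
alpha = [np.linalg.lstsq(Z, dX[:, i], rcond=None)[0][0] for i in range(2)]
print(alpha)  # alpha[0] is clearly negative: x1 error-corrects; alpha[1] is near zero
```

Because β̂ converges faster than α̂, treating CI as data introduces no first-order distortion to the standard errors on α.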

7.6.1. Identifying short-run structure in PcGive. Having identified the long run
structure and run the model, go to:
Test --> Further Output...
--> [check Batch code to map CVAR to I(0) model box]
--> OK
Some batch code will appear in the Results window in GiveWin at this point. It
should look something like:
// Batch code to map CVAR to model with identities in I(0) space
algebra
{
Dlnt = diff(lnt, 1);
Dllt = diff(llt, 1);
Dwmpc = diff(wmpc, 1);
Dympy = diff(ympy, 1);
Dpwedgey = diff(pwedgey, 1);
Ddlpyt = diff(dlpyt, 1);
CIa = +0.0135068 * lnt -0.311093 * wmpc +0.0893986 * ympy...
CIb = +0.0421891 * lnt -0.0421891 * llt -0.285783 * wmpc...
}
system
{
Y = Dlnt, Dllt, Dwmpc, Dympy, Dpwedgey, Ddlpyt;
I = CIa, CIb;
Z = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, Ddlpyt_1,
CIa_1, CIb_1, Constant;
U = , CSeasonal, CSeasonal_1, CSeasonal_2, dum651p, dum731p...
}
model
{
Dlnt = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
Dllt = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
Dwmpc = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
Dympy = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
Dpwedgey = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
Ddlpyt = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, ...
CIa = Dlnt, Dwmpc, Dympy, Dpwedgey, Ddlpyt, Constant, ...
CIb = Dlnt, Dllt, Dwmpc, Dpwedgey, Ddlpyt, Constant, ...
}
estimate("FIML", 1963, 4, 2005, 1, 0, 0);
In the system part, delete the I = ... line, along with the lines in the model
section that relate to each cointegrating vector (CIa, CIb,...). Next, in order to
be able to reduce the model down from its current form, all dummy variables (apart
from seasonal dummies if you have them) need to be moved into the Z = ... line
in the system section, and they should also be added to each line in the model
section. Random commas in the wrong places should also be checked for, such as
in the U = line above. The final code should look something like:
algebra
{
Dlnt = diff(lnt, 1);
Dllt = diff(llt, 1);
Dwmpc = diff(wmpc, 1);
Dympy = diff(ympy, 1);
Dpwedgey = diff(pwedgey, 1);
Ddlpyt = diff(dlpyt, 1);
CIa = +0.0135068 * lnt -0.311093 * wmpc +0.0893986 * ympy...
CIb = +0.0421891 * lnt -0.0421891 * llt -0.285783 * wmpc...
}
system
{
Y = Dlnt, Dllt, Dwmpc, Dympy, Dpwedgey, Ddlpyt;
Z = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, Ddlpyt_1,
CIa_1, CIb_1, Constant, dum651p, dum731p, dum732p,
dum741p, dum792p, dum793p, S1, S1_1, S1_2,
dum19842p, dum19842p_1;
U = CSeasonal, CSeasonal_1, CSeasonal_2;
}
model
{
Dlnt = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1,
Ddlpyt_1, CIa_1, CIb_1, Constant, dum651p, dum731p,
dum732p, dum741p, dum792p, dum793p, S1, S1_1, S1_2,
dum19842p, dum19842p_1 ;
...
Ddlpyt = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1,
Ddlpyt_1, CIa_1, CIb_1, Constant, dum651p, dum731p,
dum732p, dum741p, dum792p, dum793p, S1, S1_1, S1_2,
dum19842p, dum19842p_1 ;
}
estimate("FIML", 1963, 4, 2005, 1, 0, 0);
Finally highlight the code from the algebra line onwards, down to the
estimate("FIML",... line, and either press ctrl+B or right click and select
run --> Run as batch, and the model should run. Having run this code, a
simultaneous equations output will result. One should then proceed by restricting
all insignificant variables to be equal to zero; this is done by going back to the
PcGive module, and clicking on:28
Model --> Formulate... --> OK --> OK
which will bring up a Model Formulation window from which each equation can
be considered. Deleting a variable from here will delete it from that particular
equation. Having deleted a number of variables, the model can be run, and one
should continue doing this until either all insignificant variables have been deleted,
or the test of the restrictions, which is reported at the bottom of the output, on a
line looking a bit like
LR test of over-identifying restrictions: Chi^2(9) = ...
has been rejected.
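For reference, the statistic on that line can be computed by hand: with k over-identifying restrictions, LR = T(ln|Ω̂_r| − ln|Ω̂_u|), distributed χ²(k) under the null. A minimal sketch with artificial residuals follows (the 1.1 scaling is a stand-in for a worse-fitting restricted model, not real PcGive output):

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 200, 3

U_unres = rng.normal(size=(T, p))   # residuals from the unrestricted model
U_res = 1.1 * U_unres               # stand-in residuals from a restricted model

omega_u = U_unres.T @ U_unres / T   # unrestricted residual covariance
omega_r = U_res.T @ U_res / T       # restricted residual covariance

# LR test of the over-identifying restrictions; compare with chi^2(k) critical values.
lr = T * (np.log(np.linalg.det(omega_r)) - np.log(np.linalg.det(omega_u)))
print(lr)  # here exactly T * p * ln(1.1^2), about 114.4
```

In practice PcGive reports this statistic and its degrees of freedom directly; the point of the sketch is only that the test compares fit via the residual covariance determinants.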

7.6.2. Other possible strategies for short-run structure identification. More
'economic' identification procedures are possible. One is to impose a Cholesky
decomposition on the variance-covariance matrix Ω̂, and hence on the entire model;
this is problematic, as it is non-unique and implies a causal chain which may not be
economically justifiable. Another possible method is to isolate large off-diagonal
entries in the variance-covariance matrix, and then take a linear combination of the
equations relating to the two correlated variables in order to remove this correlation.
However, this is likely to lead to parameters that are hard to interpret,
especially since it is plausible that a number of the high correlations are simply
between different prices, and while relative prices are useful, it may still be more
useful to have the prices themselves in the model. The assumption that Ω̂ is diagonal
was never made in this context, so leaving such correlations in place is not a problem;
the only possible problem emerges when one seeks to interpret the moving average
representation in a structural manner.
Having reached this stage, the cointegrated VAR model is estimated, and can
be reported. Each equation can be individually reported and interpreted, since
each one is identified. For example, taking again the UK labour market model,
there is an equation for each variable in the model, and below two of the seven
are reported. Each equation is given below, where ecm1t−1 and ecm2t−1 refer
to the two cointegrating vectors in (7.34) and (7.35) respectively, and each will be
commented on in turn. The seasonal dummies, which were not omitted from any
of the equations, are not reported to aid readability.

28Alternatively, clicking the Model Settings button (the middle of the three buttons with
Tetris-like blue and red bricks on it) and clicking "ok" will lead to the same place.
∆n_t =  0.9379 ∆n_{t−1} − 0.6033 ∆l_{t−1} + 0.0006262
       (0.0643)          (0.0965)          (0.000251)
     − 0.01974 dum651p_t − 0.004324 dum741p_t + 0.01011 dum83p_t
       (0.00287)           (0.00223)            (0.00288)
This first equation, for employment, confirms that the variable is indeed weakly
exogenous; the restriction is accepted easily. Furthermore, it suggests little has
been done to properly explain this variable in this system, since it depends only on
the difference of employment and the labour force, suggesting demographic factors
are equally, if not more, important in explaining where employment is. Results would
possibly differ if the sample were split. In the 1960s and 1970s it is certainly likely
that employment was weakly exogenous, and as such a pushing factor, since strong
labour unions and restrictive regulation of the labour market meant firms could not
alter their employment levels to suit demand and market conditions; in the second
half of the sample employment was likely more responsive to the variables included
in the model. The near-unity coefficient on the lagged first difference of employment
also lends support to the suggestion that the variable is near I(2).

∆lt = 0.2523 ∆nt−1 + 0.03844 ecm1t−1 + 0.03364 ecm2t−1


(0.0336) (0.00707) (0.00809)

+ 0.1127 − 0.01894 dum651pt + 0.008355 dum83pt


(0.0616) (0.00184) (0.00184)

The labour force equation provides less suggestion of I(2)-ness, since the variable
does not include its own lag and hence displays no significant persistence. The
change in the labour force is related to changes in employment; the positive sign
suggests that while demographic factors such as female participation might affect
this variable, employment opportunities still draw in peripheral workers. The labour
force reacts positively to both cointegrating relations and hence is not error-correcting,
perhaps reflecting the fact that adjustment takes place in other variables, possibly
prices: despite more flexible labour markets, it is still unlikely firms can shed or
hire workers to the extent they would ideally like, making the labour force's
movement more sluggish and less likely to adjust quickly.

8. Extensions of the CVAR model
[Impulse response analysis.]
[Global VAR models, panel CVAR methods.]

9. Data
Great consideration must be given to the collection of data and initial analysis
of it. Ultimately it is the correct inclusion of important variables that determines
the quality of the econometric model that follows, as Section 4 on diagnostic testing
suggested; however, selection of variables is quite possibly the hardest part of the
whole analysis. This Section will discuss firstly identifying the economic problem,
then creating a list of potential variables for inclusion, then factors enabling this list
to be reduced to something more manageable in the cointegrated VAR framework,


before discussing factors related to the cointegrated model and economic data series
that should influence which variables to include, and potential transformations that
might be carried out.
9.1. The Economic Problem of Interest. The first question is which economic
problem is of interest. Because the cointegrated VAR model allows modelling of both
short-run and long-run effects, it is well suited to much economic theory, which
often explicitly refers to the short- and long-term implications of actions. Furthermore,
many macroeconomic models, in particular those in the neoclassical representative
agent framework, involve solving a system of differential equations for a
symmetric equilibrium, which corresponds closely to the idea of cointegrating relations.
The steady-state relationships that these models deliver can be thought
of as possible cointegrating vectors, as was discussed in Section 2 and equation
(2.1) from Pétursson & Sløk (2001). This is not the only paper which uses the
CVAR methodology to investigate the labour market; past theory and empirical
work will be discussed in Section 9.2.
This Section will also list other works using the cointegrated VAR framework,
giving a flavour of what has been done and can be done within it. A few examples
should give some idea of the scope available for learning about the economy. King,
Plosser, Stock & Watson (1991) suggest that the theoretical results of simple real
business cycle models of the Kydland & Prescott (1982) type have a "natural
interpretation in terms of cointegration". Real business cycle models in their simple
form were built on the basic neoclassical growth model of Solow (1970), which posited
that per capita consumption (c_t), investment (i_t) and output (y_t) all grow at the
same rate in steady state. This immediately implies cointegrating vectors, notably
the ratios of these variables, often called 'great ratios' in the literature:

(9.1)  β_1'X_t = c_t − i_t,
(9.2)  β_2'X_t = i_t − y_t,
(9.3)  β_3'X_t = c_t − y_t.
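Candidate relations like the great ratios can be screened for stationarity before any system modelling. The sketch below (simulated series, not real data, and a crude single-equation check rather than the trace test) computes a simple Dickey-Fuller t-statistic for a 'ratio' that shares output's stochastic trend:

```python
import numpy as np

def df_tstat(z):
    """t-statistic on rho in the Dickey-Fuller regression dz_t = rho * z_{t-1} + e_t."""
    dz, zl = np.diff(z), z[:-1]
    rho = (zl @ dz) / (zl @ zl)
    resid = dz - rho * zl
    se = np.sqrt((resid @ resid) / (len(dz) - 1) / (zl @ zl))
    return rho / se

rng = np.random.default_rng(42)
T = 500
y = np.cumsum(rng.normal(size=T))   # log output: a random walk, I(1)
c = y + 0.5 * rng.normal(size=T)    # log consumption: shares y's stochastic trend

print(df_tstat(c - y))  # strongly negative: the 'great ratio' c - y looks stationary
print(df_tstat(y))      # much less negative: the level itself is not stationary
```

A full analysis would of course test such relations jointly inside the cointegrated VAR, as the sections above describe.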
In another paper, Juselius & MacDonald (2004) consider a number of parity
relationships that economic theory provides relating to international currency
movements:
Purchasing power parity: Where p is the log domestic price level, p* the log
foreign price level and e the log spot exchange rate:

(9.4)  β_1'X_t = p_t − p*_t − e_t.

Uncovered interest parity: Where E_t^e(∆_m e_{t+m}) is an expectation formed
at time t about the change in the spot exchange rate e over the period
m = l, s (long and short term), and i_t^m is a bond yield with maturity
t + m:

(9.5)  β_2'X_t = E_t^e(∆_m e_{t+m})/m − (i_t^m − i_t^{m*}).

Term structure of interest rates: Theory suggests that the difference between
long and short interest rates should be stationary, so:

(9.6)  β_3'X_t = i_t^l − i_t^s.

Giese (2005) gives a recent investigation of the term structure using the CVAR,
and indeed the term structure was one of the first applications of cointegration
analysis (Engle & Granger 1987).
Fisher real interest rate parity: This is a decomposition of nominal interest
rates into real interest rate and expected inflation components, where r
is the real interest rate:

(9.7)  β_4'X_t = i_t^m − r_t^m − E_t^e(∆_m p_{t+m})/m.
Thus all four of these theoretical relationships suggest possible cointegrating
relationships that can be tested. The analysis of Juselius & MacDonald (2004) found
that for the US and Germany none of the above parities was stationary on its own,
but that combinations of the conditions were stationary; often the purchasing
power parity condition helped 'make stationary' other parity conditions.
9.2. Use of past theoretical and empirical work to derive a list of relevant
variables for inclusion. Having decided this, existing theories and past
empirical analyses of the economic entity of interest should be consulted, and a list
of potential variables for inclusion drawn from there. The whole premise of
cointegrated VAR analysis is to look at what the data support, and hence one should
allow established theories to motivate the choice of variables, but attention should
not be restricted to any one particular theory: once the variables are selected,
one should allow the data to determine which variables play which role, and hence
which theory or theories are supported.
There exists a huge range of macroeconomic models of the labour market. Pesaran
(1991) generalises the standard adjustment cost model of factor demand for
labour demand determination. The model specifies some desired level of employment,
l_t^*, which is a function of x_t, the set of variables determining this desired
level. The model has adjustment costs, which Pesaran specifies in terms of the
difference in employment and the second difference, or acceleration, in employment,
and an optimisation problem is posited, the solution of which takes a Vector
Autoregressive (VAR) form. Modern neoclassical representative agent models also
exist which attempt to bring microeconomic foundations into macroeconomic models.
Pétursson & Sløk (2001) present one such model, solving it and estimating a
cointegrated VAR in order to identify stationary relations motivated by
theory. These relations are, for employment, repeating equation (2.1), where the
variables are employment (n_t), output (y_t), wages (w_t) and consumer prices (p_{ct}):

    n_t = ϕ_0 + ϕ_1 y_t − ϕ_2 (w − p_c)_t.

The model also provides a steady state relationship for real wage determination,
where l_t is the labour force and p_{yt} is producer prices:29

(9.8)  (w − p_c)_t = ϕ_0 + ϕ_1 (y − p_y − n)_t + ϕ_2 (l_t − n_t).
Another labour market analysis is that given in Juselius (2006, Chapter 19), which
considers a larger dataset, producing a number of candidate cointegrating vectors.
Among these are Phillips curve type relations involving unemployment and infla-
tion, along with several relations involving the difference between consumer and
producer price inflation, the price wedge. This variable, written as p_{yt} − p_{ct}, can
29This equation is written in a slightly odd form in order to express the variables that were
eventually included in the final model in Reade (2005). (y − py − n)t is productivity, and (lt − nt )
is unemployment.
be seen to measure the bargaining power that the two sides hold in wage negotiations,
and also the degree of openness of an economy to foreign trade.
Another possible cointegrating relation inspired by economic theory might capture
Okun's law (Okun 1962), which states a negative relationship between real
output and unemployment with a coefficient of −0.4. On the other hand there
might be a number of pushing forces; in a time of strong labour unions one might
even suggest real wages or employment to be a pushing variable, with perhaps
price or producer inflation correcting to equilibrium. In times of lower union
power one might expect real wages to exist in a stationary relationship
with economic activity (proxying productivity) and unemployment.
There are numerous other labour market analyses using the cointegrated VAR
methodology, such as Carstensen & Hansen (2000) for Germany, Jacobson, Vredin &
Warne (1997) and Jacobson, Vredin & Warne (1998) for Scandinavian countries, and
Corsini & Guerrazzi (2004) for the Italian labour market.
Hendry (2001) considers economic modelling over the very long term, from 1875 to
1991. The employment rate, controlling for the participation of women, has been
reasonably constant over the very long term, which leads one to search for other
factors that have been similarly constant; one might be the output gap, another
might be real interest rates. Furthermore, it is to be expected that, despite the
efforts of unions, firms have more power to set both wages and employment levels,
hence the factors determining a firm's labour demand ought to be considered. One
such factor is the capital stock, which Pétursson & Sløk (2001) do not model on
the grounds that it is poorly measured. If the labour force of the UK were doubled
but the capital stock remained the same, the employment rate would fall sharply, as
would real wages; however, if the capital stock were to rise commensurately, one
imagines little or no effect upon the employment rate and real wages. Further,
capital plays a large role in how productive workers are, which then determines how
much labour firms demand. Such a long-term view of employment would cast doubt on
real wages as an explanatory factor; they have risen dramatically over the last 100
years, while the employment rate has not similarly risen.

A brief look across the wider literature might throw up other candidate variables.
A list of candidate variables for inclusion might be:
(1) employment;
(2) labour force;
(3) hours worked;
(4) wages;
(5) output;
(6) capital;
(7) productivity;
(8) real interest rates;
(9) vacancies;
(10) benefit levels;
(11) unemployment rate;
(12) population;
(13) female participation rate;
(14) consumer prices;
(15) energy prices/factor prices/producer prices;
(16) foreign prices.
9.3. Other considerations for variable selection. Many other considerations
ought to influence variable selection, and they are likely to lead to a reduction in
the number of variables modelled. This is desirable, because with large datasets
problems emerge in identifying the long-run structure. A suggestion, at least
initially when one is learning about the CVAR model, is to take at most six variables
for a system. Sections 9.3.1–9.3.4 give a number of factors worth considering in
the selection of variables.
9.3.1. How many observations? Perhaps unsurprisingly, the more observations the
better; the higher the frequency, the better also. For annual data one should
seek at least 40–50 years, though ideally one might move down to quarterly or
monthly frequencies. The caution here is that for much macroeconomic data,
higher-frequency observations only capture the same cyclical variation over a
particular cycle. Nevertheless, higher frequency gives many more observations, which
is very important when estimating systems of many variables, since a large number
of parameters is being estimated in a VAR. So series with very few observations
might not be considered, and effort instead spent finding longer or higher-frequency
series. In the list of candidate variables in Section 9.2, this consideration
ruled out vacancies, the female participation rate and population.
9.3.2. Logarithmic transformation. Most commonly, data are modelled in logarithms;
this is a special case of the Box-Cox transformation, a set of transformations
that are useful for a number of reasons.30 The logarithmic transformation:
• stabilises the variance (hence less heteroskedastic data);
• renders seasonal effects additive, making seasonal dummies more effective;
• brings the data closer to Normality by reducing skewness.
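The Box-Cox family referred to here (defined in footnote 30) is easy to write down directly; the sketch below shows that as λ → 0 the transformation collapses to the natural log used throughout:

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform T(X) = (X^lam - 1)/lam, with the log limit at lam = 0."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

x = np.array([0.5, 1.0, 2.0, 10.0])
# For lambda near zero the transform is numerically indistinguishable from ln(x).
print(np.max(np.abs(box_cox(x, 1e-6) - np.log(x))))  # on the order of 1e-6
```

λ = 1 leaves the data linear (up to a shift), so the family nests both the untransformed and logarithmic cases.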
9.3.3. Brevity of system. While the general-to-specific modelling methodology of
Hendry (1995) would suggest including all variables that might be relevant and
allowing the regression model to determine which variables actually matter, this is
really only possible in the single-equation framework. Because all variables included
in the VAR are modelled, there is a limit on the number of variables one can include,
especially if the dataset is quite small. With p variables modelled, each additional
lag in the model adds p × p parameters, while each deterministic term adds p
parameters, all of which need to be estimated. This is undoubtedly a drawback
of the VAR methodology, and only huge datasets are likely to alleviate the problem;
but even then, many variables are likely to lead to many cointegrating vectors, and
finding restrictions on all the cointegrating vectors that identify the system will
be difficult.
30The Box-Cox transformation for a data series X is

    T(X) = (X^λ − 1)/λ,

where λ is the transformation parameter. In the limit λ → 0 the transformation is
T(X) = ln(X).
There are a number of considerations which may help reduce the size of the
system. Firstly, it is likely that a number of variables in the list in Section 9.2
will be highly collinear; while some degree of correlation between variables is
necessary for any meaningful analysis, if correlation is very high, it is likely the
two variables are explaining the same thing. Also, a number of the variables in the
list can be found by transformations of other variables. For example, productivity
can be constructed from real output and employment, and hence instead of including
all three variables, just two are included. Employment and hours worked will also
to a large extent explain the same thing, so one or the other, not both, ought to
be included.

Further, a narrower focus might be more relevant. While it is undoubtedly the
case that foreign competition has an effect on the UK labour market, this effect is
likely to be felt through variables other than foreign prices, and in particular
the price wedge between producer and consumer prices is likely to reflect it to
some extent. The exposure of the UK labour market to international competition is
probably a separate study, and considerations such as this might help narrow the
focus. Additionally, the difficulty of finding long datasets on the UK capital stock
may lead to the alternative strategy of using a time trend instead; there is, however,
the possibility that the time trend captures many other effects, such as the output
gap and technology. Nevertheless, the capital stock is omitted at the moment on
grounds of data availability. For a similar reason, benefits are omitted, since data
series either measure exactly what would be required but only go back a small number
of years, or are too vague a measure to be of any use.

9.3.4. Transformation to I(1). It is argued most data are non-stationary, in particular
I(1). Some series, however, may be found to be I(2), integrated of second order.
This creates problems for the CVAR analysis: the CVAR is a balanced equation, i.e.
all parts of it are I(0), due to differencing and the cointegrating relations in the
reduced rank matrix Π. If an I(2) variable is then added in, this leaves the CVAR
unbalanced: the error term ε_t is I(0), but the rest of the equation is I(1). Variables
that tend to be I(2) are nominal variables such as wages and prices, because wage and
price inflation is generally I(1), hence the levels of these variables, often useful in
economic analysis, must be I(2). Any variable that appears to move smoothly over
a period of time is potentially I(2), and Figure 9 shows three potentially I(2) series.
The first is the UK consumer price index over 40 years from 1963:1 to 2005:1 and
is very smooth, which strongly suggests I(2)-ness. The second and third are labour
market variables, which are possibly I(2), although greater variation before 1985
reduces the possibility that they are I(2). I(2) analysis is very difficult, and
estimation of the I(2) model is not possible in PcGive; I(2) trends can, however, be
tested for in PcGive, while a new version of CATS currently under development enables
the I(2) model to be estimated. As such, transforming data into I(1) quantities is a
useful alternative strategy. There are a number of standard methods for doing this,
and a little understanding of random walks motivates them. Consider a single random
walk process, x_t:

(9.9)  x_t = x_{t−1} + ε_t,
[Figure 9. UK data series that display potential I(2) characteristics. Panels: UK CPI; Labour force; Employment.]
which is solved by recursive substitution:

(9.10)  x_t = x_0 + Σ_{i=1}^t ε_i.

This characterises the random walk: wherever the process is at point t, it is
a function of its initial value and every shock that has hit the process since then.
Changing the notation a touch, if the process is prices, p_t, and inflation is denoted
π_t, then the price process can be written as:

(9.11)  p_t = p_{t−1} + π_t.

Solving this gives:

(9.12)  p_t = p_0 + Σ_{i=1}^t π_i.

Inflation is I(1), a random walk, hence:

(9.13)  π_t = π_0 + Σ_{i=1}^t ε_i.
Substituting this into (9.12) gives:

        p_t = p_0 + Σ_{i=1}^t ( π_0 + Σ_{j=1}^i ε_j ),

(9.14)  p_t = p_0 + π_0 t + Σ_{i=1}^t Σ_{j=1}^i ε_j,

and hence it can be seen prices are a function of the initial price level, a linear time
trend, and a doubly summed error. This is the characteristic of I(2) processes. In
the cointegrated VAR, the moving average representation, or solution, of the vector
X_t of processes is called the Granger representation (Johansen 1995); in I(1)
models it is relatively simple, but in I(2) models it becomes much more complicated:

(9.15)  X_t = B_0 + B_1 t + C_2 Σ_{j=1}^t Σ_{i=1}^j ε_i + C_1 Σ_{i=1}^t ε_i + C_0(B)ε_t.

Equation (9.15) can be seen to follow similar properties to (9.14), with the B matrices functions of initial values, while the C matrices are complicated matrix products involving orthogonal complements of many things, but capture that the process is driven by an I(2) component, the doubly summed part, and an I(1) component, the singly summed part. Writing (9.15) in matrix form for a 2-variable system illustrates what needs to be done:
(9.16)
\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix}
= \begin{pmatrix} B_{0,11} \\ B_{0,21} \end{pmatrix}
+ \begin{pmatrix} B_{1,11} \\ B_{1,21} \end{pmatrix} t
+ \begin{pmatrix} C_{2,11} & C_{2,12} \\ C_{2,21} & C_{2,22} \end{pmatrix}
\begin{pmatrix} Σ_{j=1}^{t} Σ_{i=1}^{j} ε_{1,i} \\ Σ_{j=1}^{t} Σ_{i=1}^{j} ε_{2,i} \end{pmatrix}
+ \begin{pmatrix} C_{1,11} & C_{1,12} \\ C_{1,21} & C_{1,22} \end{pmatrix}
\begin{pmatrix} Σ_{i=1}^{t} ε_{1,i} \\ Σ_{i=1}^{t} ε_{2,i} \end{pmatrix}
+ C_0(B) ε_t.

Any transformation should be supported by the data, and the idea is to find combinations of variables such that C_{2,11} − C_{2,21} = 0 and C_{2,12} − C_{2,22} = 0, giving:

(9.17)
\begin{pmatrix} X_{1,t} − X_{2,t} \\ ΔX_{2,t} \end{pmatrix}
= \begin{pmatrix} B_{0,11} − B_{0,21} \\ B_{1,21} \end{pmatrix}
+ \begin{pmatrix} B_{1,11} − B_{1,21} \\ 0 \end{pmatrix} t
+ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} Σ_{j=1}^{t} Σ_{i=1}^{j} ε_{1,i} \\ Σ_{j=1}^{t} Σ_{i=1}^{j} ε_{2,i} \end{pmatrix}
+ \begin{pmatrix} C_{1,11} − C_{1,21} & C_{1,12} − C_{1,22} \\ C^{∗}_{1,21} & C^{∗}_{1,22} \end{pmatrix}
\begin{pmatrix} Σ_{i=1}^{t} ε_{1,i} \\ Σ_{i=1}^{t} ε_{2,i} \end{pmatrix}
+ C^{∗}_0(B) ε_t,
which probably needs a bit of motivation. Differencing a trend gives a constant, differencing a constant leaves nothing, and differencing a twice-integrated error process gives a singly integrated one; hence the ∗'s on the C parameters, because the differenced X_{2,t} variable will have a singly integrated error, although it will be in a different matrix to the doubly integrated one. This is the trick: to knock out the doubly integrated errors by linear combination. This is also the idea for cointegration more generally.
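The knocking-out trick can be illustrated on simulated data. In this hypothetical sketch (assuming numpy), two series load equally on a single I(2) trend, so their difference removes the doubly summed errors, while first-differencing reduces the double sum to a single sum:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
e_s = rng.normal(size=T)
s2 = np.cumsum(np.cumsum(e_s))      # common I(2) stochastic trend (doubly summed shocks)
u1 = np.cumsum(rng.normal(size=T))  # idiosyncratic I(1) component of X1
u2 = np.cumsum(rng.normal(size=T))  # idiosyncratic I(1) component of X2

# Both series load on the I(2) trend with equal C2 coefficients,
# so C2_11 - C2_21 = 0 and the combination X1 - X2 is free of it.
x1 = s2 + u1
x2 = s2 + u2

combo = x1 - x2
assert np.allclose(combo, u1 - u2)  # I(1) by construction

# Differencing knocks the double sum down to a single sum:
assert np.allclose(np.diff(s2), np.cumsum(e_s)[1:])
```

The linear combination and the differenced series are both at most I(1), which is exactly the property the nominal-to-real transformations below aim for.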

Usual transformations to achieve I(1)-ness are called nominal-to-real transformations, because they usually involve making nominal variables real. If nominal wages
(wt ) are included, or nominal output (yt ), both of which tend to be I(2), then a
real transformation is made (recalling data is in logs) using the relevant price in-
dex, to give (w − p)t or (y − p)t , and these subtractions have the property that they
have dealt with the doubly integrated error in (9.16) as in (9.17). In order not to
throw away information on prices, the differenced price variable is included, like the
∆X2,t variable in (9.17). There is no reason why the transformation should not be carried out the other way, with ∆yt rather than ∆pt appearing; but an inflation variable and the GDP level, as opposed to GDP growth and the price level, make for a more economically coherent model to interpret.31 However, while
it is possible to test these I(2) restrictions in CATS, the same cannot be done in
PcGive. In both, one can test for the number of I(2) and I(1) stochastic trends,
but only in CATS can the I(2) model and hence (9.16) be estimated and restricted
appropriately.
In the list of variables in Section 9.2, there are a number of nominal variables
which display strong I(2) characteristics, notably wages, output and the various
price levels. Also, employment and unemployment display near-I(2) characteristics.
To retain all necessary information, the general nominal-to-real transformation for
variables such as output and wages is to deflate by the appropriate price series,
and include the price series in differences. Given log transformations, described in
Section 9.3.2, one might transform wages (wt ) and consumer prices (pct ) to (w −pc )t
and ∆pct . If instead wages deflated by producer prices (pyt ) were required, pyt might
be used instead of pct . The added complication of considering these two prices means one needs a transformation that renders them I(1) also. Juselius (2006) suggests using the price wedge, (py − pc )t , and including either:
• (w − pc )t and ∆pyt , or:
• (w − py )t and ∆pct .
Either way, the price wedge enables real wages in terms of the other deflator to be recovered; the choice depends on which is deemed of more interest: producer price inflation or consumer price inflation.32 Turning to employment (nt ), it turns
out that dividing by the labour force (lt ) renders this series more I(1). This gives
the employment rate: ert = (n − l)t .
A trimmed list might be:
(1) employment rate, ert ;
(2) real (consumer) wages, (w − pc )t ;
(3) real output, (y − py )t ;
(4) real interest rates, (r − py )t ;
(5) producer and consumer price wedge, (py − pc )t ;
(6) producer price inflation, ∆pyt .
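For illustration only, the transformations in the trimmed list can be sketched in Python on simulated log-level series (a hypothetical stand-in for the PcGive algebra file shown below; all series and names here are invented, not the actual UK data):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100
# Simulated log-level series: hypothetical stand-ins for the UK data
w  = np.cumsum(rng.normal(0.01, 0.02, T))   # log nominal wages
y  = np.cumsum(rng.normal(0.01, 0.02, T))   # log nominal output
pc = np.cumsum(rng.normal(0.01, 0.01, T))   # log consumer prices
py = np.cumsum(rng.normal(0.01, 0.01, T))   # log producer prices
n  = np.cumsum(rng.normal(0.00, 0.005, T))  # log employment
l  = np.cumsum(rng.normal(0.00, 0.002, T))  # log labour force

wmpc   = w - pc       # real (consumer) wages, (w - pc)_t
ympy   = y - py       # real output, (y - py)_t
pwedge = py - pc      # producer/consumer price wedge, (py - pc)_t
dpy    = np.diff(py)  # producer price inflation, (Delta py)_t
ert    = n - l        # employment rate, (n - l)_t

assert dpy.shape == (T - 1,)
assert np.allclose(ert, n - l)
```

Because the data are in logs, each subtraction is a ratio of levels, and differencing a log price gives an inflation rate.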
9.4. Preparing data in PcGive. Having decided on a set of variables, one then
needs to prepare the dataset for use in PcGive. Having found a series for each
variable, it is then advisable to collect the series in columns in an Excel file. Having
given each series an appropriate name, save the file as an Excel 2.1 worksheet. This
31For a more complicated nominal to real transformation see Juselius & MacDonald (2004).
32Whichever choice is made for the differenced price series, that series should also deflate output; so if the first choice is taken, real output should be (y − py )t .

file will open in PcGive, and also back in Excel should more variables need to be added or other amendments be made.33 It is likely that many transformations of the data
will be needed before the unrestricted VAR can be run. These can be done in
PcGive using the ‘calculator’, having opened the dataset in GiveWin. The dataset
can be opened by following, in GiveWin:
File --> Open Data File...
The ‘calculator’ can be opened by clicking on the button on the toolbar that looks
like a calculator, or by pressing ctrl+C or by following:
Tools --> Calculator...
All sorts of transformations are possible using the calculator, which are listed in
the function window; clicking on the Help button below this gives information on
each transformation. However, it is likely that with any dataset, a large number
of transformations will be made. This will result in a large dataset of the original
variables along with the transformed ones. It is advisable, however, instead of transforming variables and saving the dataset in its new form, to create an algebra file of transformations to run on the original dataset each time a session is begun in GiveWin. Such a file could be created using the Algebra window in GiveWin (found by Tools-->Algebra Editor..., or ctrl+A, or pressing
the button on the toolbar in GiveWin with a blue “A”, a red lower case “l” and
something in yellow (a “g”?)), and clicking on Save As... after each new transfor-
mation has been written in. However, the transformation just written is often lost, because clicking on OK:Run discards unsaved code.
Another method is to have a text file open in the main GiveWin window which
is saved with a .alg extension, which can be run each time another transformed
variable is required, like running an Ox job. An algebra file looks quite similar to
an Ox job, except each variable need not be declared. The abbreviated algebra
file for the UK labour market analysis is:
//log transformations
lpyt=log(pyt); //log of producer prices
lpct=log(pct); //log of consumer prices
lpft=log(pft); //log of foreign prices
..
.
//nominal-to-real transformations
wmpc = lwt-lpct; //real wages (deflated by consumer prices)
wmpy = lwt-lpyt; //real wages (deflated by producer prices)
ympy = lyt-lpyt; //real output (deflated by producer prices)
..
.
//other transformations
prodn = ympy-lnt; //productivity measure (real output/employment)
reprat = wmpc-bmpc;//replacement ratio (wages/benefits)
..
.
As can be seen in this file, some transformations are no longer used, hence are
commented out as they would be in an Ox job. It is perhaps useful, particularly in
more complicated transformations, to write out what the transformation actually
33Excel 2.1 Worksheets will also open in CATS.

is; this way one can keep track of which variable is what, as opposed to two months
later finding a variable called pwedgec and wondering what it might be. When
dummy variables need to be created, they should be created in the algebra file also;
see Section 4.4.2.

Appendix A. The Frisch-Waugh Theorem


The Frisch-Waugh Theorem is a useful tool for estimation of a regression such
as the following:
(A.1)    X_t = A′Z_t + ΦD_t + ε_t.
The regression is run by regressing X_t on D_t and Z_t on D_t, giving (X_t | D_t) and (Z_t | D_t) (where the generic notation (Y_t | Z_t) is introduced to signify the regression of Y_t on Z_t), providing the residuals:
(A.2)    R_{xt} = X_t − M_{XD} M_{DD}^{−1} D_t,
(A.3)    R_{zt} = Z_t − M_{ZD} M_{DD}^{−1} D_t.
Substituting these into (A.1) gives the concentrated model R_{xt} = A′R_{zt} + ε_t, from which the estimator of A is found:

(A.4)    Â′ = M_{R_{xt} R_{zt}} M_{R_{zt} R_{zt}}^{−1}.
This theorem is useful when looking at the restricted (cointegrated) VAR, and is generally useful for thinking about the form of estimators in models more complicated than the simple AR(1) model.
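A quick numerical check of the theorem can be run on simulated data (a hypothetical sketch, assuming numpy): the coefficients on Z_t from the full regression of X_t on (Z_t, D_t) coincide with those from regressing the D-residuals of X_t on the D-residuals of Z_t.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
Z = rng.normal(size=(T, 2))                      # regressors of interest
D = np.column_stack([np.ones(T), np.arange(T)])  # deterministic terms (constant, trend)
eps = rng.normal(scale=0.1, size=T)
X = Z @ np.array([0.5, -1.0]) + D @ np.array([2.0, 0.01]) + eps

# Full regression: X on (Z, D); keep the coefficients on Z
A_full = np.linalg.lstsq(np.column_stack([Z, D]), X, rcond=None)[0][:2]

# Frisch-Waugh: partial D out of both X and Z, then regress residuals on residuals
P = D @ np.linalg.solve(D.T @ D, D.T)  # projection matrix onto the span of D
Rx = X - P @ X                         # residuals (X_t | D_t), as in (A.2)
Rz = Z - P @ Z                         # residuals (Z_t | D_t), as in (A.3)
A_fw = np.linalg.lstsq(Rz, Rx, rcond=None)[0]

assert np.allclose(A_full, A_fw)
```

The two estimates agree to machine precision, which is the content of the theorem: concentrating out the deterministic terms leaves the estimator of A unchanged.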

References
Carstensen, Kai & Gerd Hansen (2000), ‘Cointegration and common trends on the West German labour market’, Empirical Economics 25(3), 475–493.
Corsini, Lorenzo & Marco Guerrazzi (2004), Searching for long run equilibrium relationships in the Italian labour market: a cointegrated VAR approach. University of Pisa Discussion Paper 43.
Doornik, Jurgen A., David F. Hendry & Bent Nielsen (1998), ‘Inference in cointegrated models:
UK M1 revisited’, Journal of Economic Surveys 12, 533–572. Reprinted in: Michael MacAleer
and Les Oxley (1999) Practical issues in cointegration analysis; Blackwell, Oxford.
Engle, Robert F. & Clive W.J. Granger (1987), ‘Co-integration and error correction: representa-
tion, estimation and testing’, Econometrica 55, 251–276.
Giese, Julia (2005), Level, slope, curvature: characterising the yield curve’s derivatives in a coin-
tegrated VAR model. Unpublished paper, Nuffield College, University of Oxford.
Granger, C. W. J. & P. Newbold (1974), ‘Spurious regressions in econometrics’, Journal of Econo-
metrics 2, 111–120.
Hendry, David F. (2001), ‘Modelling UK inflation, 1875–1991’, Journal of Applied Econometrics 16(3), 255–275.
Hendry, David F. & Carlos Santos (2005), ‘Regression models with data-based indicator variables’,
Oxford Bulletin of Economics and Statistics 67(5).
Hendry, D.F. (1995), Dynamic Econometrics, Oxford University Press, Oxford.
Hendry, D.F. & J.A. Doornik (2001), Empirical Econometric Modelling using PcGive: Volume
II, 3 edn, Timberlake Consultants Press, London.
Jacobson, Tor, Anders Vredin & Anders Warne (1997), ‘Common trends and hysteresis in Scandinavian unemployment’, European Economic Review 41(9), 1781–1816.
Jacobson, Tor, Anders Vredin & Anders Warne (1998), ‘Are real wages and unemployment re-
lated?’, Economica 65(257), 69–96.
Johansen, Søren (1995), Likelihood-based inference in cointegrated vector autoregressive models,
Oxford University Press, Oxford.


Johansen, Søren (2004), What is the price of maximum likelihood. Unpublished paper, Department
of Applied Mathematics and Statistics, University of Copenhagen.
Johansen, Søren, Rocco Mosconi & Bent Nielsen (2000), ‘Cointegration analysis in the presence
of structural breaks in the deterministic trend’, Econometrics Journal 3(2), 216–249.
Juselius, Katarina & David F. Hendry (2000), Explaining cointegration analysis: Part II, Discussion papers, University of Copenhagen, Department of Economics (formerly Institute of Economics).
Juselius, Katarina & Ronald MacDonald (2004), Interest rate and price linkages between the USA and Japan: evidence from the post-Bretton Woods period, Technical Report 00-13, University of Copenhagen, Institute of Economics.
Juselius, Katarina (2006), The Cointegrated VAR Model: Methodology and Applications, Oxford
University Press. Forthcoming.
King, Robert G., Charles I. Plosser, James H. Stock & Mark W. Watson (1991), ‘Stochastic trends
and economic fluctuations’, American Economic Review 81(4), 819–40.
Kydland, F.E. & E.C. Prescott (1982), ‘Time to build and aggregate fluctuations’, Econometrica
50(6), 1345–1370.
Nielsen, Bent & Anders Rahbek (2000), ‘Similarity issues in cointegration analysis’, Oxford Bul-
letin of Economics and Statistics 62(1), 5–22.
Nielsen, Heino Bohn (2004), ‘Cointegration analysis in the presence of outliers’, Econometrics
Journal 7(1), 249–271.
Okun, Arthur M (1962), ‘Potential GNP: its measurement and significance’, American Statistical Association, Proceedings of the Business and Economics Statistics Section pp. 98–104.
Pesaran, M Hashem (1991), ‘Costly adjustment under rational expectations: A generalization’,
The Review of Economics and Statistics 73(2), 353–58.
Pétursson, Thórarinn G. & Torsten Sløk (2001), ‘Wage formation and employment in a cointegrated VAR model’, The Econometrics Journal 4(2), 191–209.
Reade, J. James (2005), A cointegrated VAR analysis of employment. Unpublished paper, Uni-
versity of Oxford.
Solow, Robert (1970), Growth Theory, Clarendon Press, Oxford.

University of Oxford
E-mail address: james.reade@stx.ox.ac.uk
