The Cointegrated VAR Methodology
JAMES READE
Contents
1. Introduction
2. A brief motivation for the cointegrated VAR model
3. The vector autoregressive model
3.1. The system
3.2. Conditional factorisation and weak exogeneity
3.3. A first order autoregressive model
3.4. A second order autoregressive model
3.5. Bi-variate second order vector autoregressive model with deterministic terms
3.6. The unrestricted vector autoregressive model
3.7. Estimating the unrestricted VAR in PcGive
4. Diagnostic Testing on the unrestricted VAR model
4.1. The Assumptions of the VAR model
4.2. The test output from PcGive
4.3. Other information for diagnosing problems
4.4. Solving Diagnosed Problems
5. The Cointegrated Vector Autoregressive Model
5.1. The Model
5.2. Constant and Trend
5.3. Dummy variables
5.4. Estimation and rank determination
5.5. Additional Information on Rank Determination
6. Limiting Distributions of the Trace test
7. Imposing restrictions and Identification
7.1. β restrictions
7.2. α restrictions
7.3. H-form versus R-form and inputting restrictions in PcGive
7.4. Using restrictions to understand the system
Thanks be first and foremost to God, without whom none of this would be possible. In addition,
I would like to thank Katerina Juselius, Soren Johansen, Heino Bohn Nielsen and Anders Rahbek
who presented the Econometrics Summer School in Copenhagen in August 2005, and Nick Fawcett
for kindly looking over the notes and pointing out numerous (I’m sure) appalling errors in earlier
drafts of these notes. Thanks also to David Hendry, Jennie Castle and Bent Nielsen amongst
others for discussions in Oxford while forming my ‘knowledge’ of the cointegrating VAR model.
7.5. Identification of β
7.6. Identification of α and the short-run structure
8. Extensions of the CVAR model
9. Data
9.1. The Economic Problem of Interest
9.2. Use of past theoretical and empirical work to derive a list of relevant variables for inclusion
9.3. Other considerations for variable selection
9.4. Preparing data in PcGive
Appendix A. The Frisch-Waugh Theorem
References
1. Introduction
These notes accompany two seminars given at Cambridge Econometrics on December 8th 2005, and attempt to espouse the cointegrated vector autoregressive (CVAR) methodology (Johansen 1995) as presented during the Econometrics Summer School at the University of Copenhagen in August 2005. Firstly, a brief motivation for using the CVAR over other forms of modelling for systems of time-series variables will be given in Section 2, before the theory behind the CVAR is set out, helping discussion on the motivation for using such models. This will be done by first considering the unrestricted vector autoregressive (VAR) model in Section 3 and the various diagnostic tests that can be performed to ensure that the vital assumptions underlying the statistical inference hold (Section 4), before the CVAR is covered in Section 5. After this, data selection issues are looked at in Section 9, and an empirical example is considered. At each stage, implementation of the procedure in PcGive is described, along with accompanying Ox jobs.1 Please don't be put off by the daunting number of pages in these notes; there are many pictures, and bits of output from GiveWin and Ox pasted in, taking up the space!
1These Ox jobs will be made available at the seminars; however if anyone would like copies of
these jobs, email me.
2This model is discussed in Section 9.2, and the UK labour market is the subject of the empirical applications in these notes.
3.3. A first order autoregressive model. Equation (3.1) might look a touch
fearsome at first, and as such one might consider a number of examples that grad-
ually build up to the VAR(k) process and the analysis thereof. Firstly the simple
first order autoregressive (AR(1)) process is considered:
(3.8) xt = ρxt−1 + µ + εt .
Just as one solves a differential equation, this model has a solution, a representation
in terms of all the factors contributing to its determination at any particular point,
THE COINTEGRATED VAR MODEL 5
and so:
E(x_t | x_0) = x_0 + tµ −→ ∞

Var(x_t | x_0) = \sum_{i=1}^{t} E(ε_i^2) = \sum_{i=1}^{t} σ^2 = tσ^2 −→ ∞.
Here the mean and variance are functions of t and hence are not stationary. This
is the unit root case, and corresponds to a large number of economic data series.
The explosive case, where ρ > 1, is not considered, but there too the moments tend to infinity. The motivation for the moving average formulation of the autoregressive model can be seen here: it makes it very simple to characterise the data process under consideration. This principle is the same for the more complicated models that will be introduced later. Also helpful for understanding the derivation of the moving
3This is because the first two moments of the Normal distribution completely characterise it.
average representation in more complicated models, it can be shown that if |ρ| < 1 then (3.8) implies, where L is the lag operator defined by L^k x_t = x_{t−k}:
(3.14) x_t − ρx_{t−1} = ε_t
(3.15) (1 − ρL) x_t = ε_t
(3.16) x_t = (1 − ρL)^{−1} ε_t = ε_t + ρε_{t−1} + ρ^2 ε_{t−2} + . . . ,
which is the infinite moving average (MA(∞)) representation; the last step, in (3.16), follows from the formula for the sum to infinity of a geometric progression.
3.3.1. Impulse response analysis. In this setting, using the moving average repre-
sentation, the impulse response function can be discussed. The question is asked:
if the economy is shocked or impulsed now, where will it be in h periods? This
question is formally written:
(3.17) x_{t+h} = ρ^h x_t + \sum_{i=0}^{h−1} ρ^i (µ + ε_{t+h−i}).
[Figure: three simulated AR(1) processes — a stationary process (x_t = 0.6x_{t−1} + ε_t), a random walk (x_t = x_{t−1} + ε_t), and an explosive process (x_t = 1.03x_{t−1} + ε_t).]
where the step to summations in (3.23) follows from the definition of the sum to infinity of a geometric progression, and c_n → 0 as n → ∞ as it is a function of |ρ_1| < 1 and |ρ_2| < 1.4 Hence as long as |ρ_1| < 1 and |ρ_2| < 1, then as described above in (3.16), an infinite order moving average representation exists:
(3.24) x_t = \sum_{n=0}^{∞} c_n (µ + ε_{t−n}),

and so the expectation, again found by reversing the steps using the sum to infinity formula in (3.23):5

(3.25) E(x_t | x_0) = \sum_{n=0}^{∞} c_n µ = µ / ((1 − ρ_1)(1 − ρ_2)) = µ / (1 − π_1 − π_2).
4i.e. the coefficients on a particular error dampen to zero in the limit; before then they could be any size, but will be decreasing with time.
5Which can be found more simply by assuming a stationary process in (3.20), taking expectations
and rearranging.
So again the process can be characterised using the moving average process, al-
though arriving at the MA(∞) representation is more complicated.
3.4.1. Impulse response analysis. As before, the impulse response is found from
the moving average representation. However, it is more complicated here because
of the second lag. The form of the impulse response can be known; it will be a
function of cn since the value of the process at t + h can be written as:
(3.26) x_{t+h} = \sum_{n=0}^{∞} c_n (µ + ε_{t+h−n})
(3.27) = c_0(µ + ε_{t+h}) + c_1(µ + ε_{t+h−1}) + . . . + c_{h−1}(µ + ε_{t+1}) + c_h(µ + ε_t) + c_{h+1}(µ + ε_{t−1}) + . . .
(3.28) = · · · + c_h (x_t − π_1 x_{t−1} − π_2 x_{t−2}) + . . .
(3.28) = · · · + ch (xt − π1 xt−1 − π2 xt−2 ) + . . .
The process is impulsed at time t and the value of the process at time t + h is
of interest, hence for any given h, from (3.27) is what size is ch , since every other
residual before and after t is set to zero in this analysis. Hence the impulse response
is:
∂
(3.29) IR(h) = E(xt+h |x0 , . . . , xt ) = ch −→ 0.
∂xt
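The c_h sequence is easy to compute from the recursion c_0 = 1, c_1 = π_1, c_n = π_1 c_{n−1} + π_2 c_{n−2}; a small sketch, where the coefficient values are illustrative rather than from the notes:

```python
def ma_coefficients(pi1, pi2, n_max):
    """Moving average coefficients c_0, ..., c_{n_max} of the AR(2) process
    x_t = pi1*x_{t-1} + pi2*x_{t-2} + eps_t; the impulse response IR(h) = c_h."""
    c = [1.0, pi1]
    for n in range(2, n_max + 1):
        c.append(pi1 * c[-1] + pi2 * c[-2])
    return c

# Stationary example: both characteristic roots lie inside the unit circle,
# so the impulse response dies out, c_h -> 0.
ir = ma_coefficients(0.5, 0.3, 50)
```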
where P(z) is a second order function of z which in (3.35) is incorporated into P_n^*; then P_n^* is exponentially convergent if |ρ_i| < 1, and an MA(∞) representation exists, where Π^{−1}(z) = \sum_{i=0}^{∞} P_i^* z^i:

X_t = \sum_{i=0}^{∞} P_i^* (ΦD_{t−i} + ε_{t−i}) = Π(L)^{−1} (ΦD_t + ε_t),

with E(X_t) = \sum_{i=0}^{∞} P_i^* ΦD_{t−i} and Var(X_t) = \sum_{i=0}^{∞} P_i^* Ω P_i^{*′}. Hence X_t is not stationary as D_t depends on t, but X_t − E(X_t) is stationary.
3.5.1. The companion form of a vector autoregressive model. Completing the set of building-block models, there exists a useful way of expressing the VAR(k) process that is often used; using the VAR(2) model in (3.30), it can be transformed into companion matrix form:
(3.35) (X_t; X_{t−1}) = [Π_1 Π_2; I_2 0] (X_{t−1}; X_{t−2}) + (ΦD_t + ε_t; 0)
(3.36) = Ξ X̄_t + v_t,
where Ξ, X̄_t and v_t are suitably defined. Ξ is the companion matrix, and it can be seen that the potentially k-lagged system has been reduced to a VAR(1) representation, which is again useful for characterising the model via the MA representation.
The roots of the companion matrix correspond to the roots of the system above,
and are found by solving the eigenvalue problem:
(3.37) det( [Π_1 Π_2; I_2 0] − ρ [I_2 0; 0 I_2] ) = 0,
which could be equivalently written as:
(3.38) [Π_1 Π_2; I_2 0] (v_1; v_2) = ρ [I_2 0; 0 I_2] (v_1; v_2),
and using the equations that (3.38) produces:

Π_1 v_1 + Π_2 v_2 = ρ v_1
v_1 = ρ v_2
(3.39) ⇒ Π_1 v_1 + Π_2 ρ^{−1} v_1 = ρ v_1,

and since det(A − ρI) = 0 ⟺ Av = ρIv, then (3.39) ⟹ det(ρI_2 − Π_1 − Π_2 ρ^{−1}) = 0, or:

(3.40) ρ^{−1} Π_1 v_1 + Π_2 ρ^{−2} v_1 = v_1 ⟺ det(I_2 − Π_1 ρ^{−1} − Π_2 ρ^{−2}) = 0.
10 JAMES READE
So if the roots of the characteristic polynomial, which by (3.40) is a polynomial in ρ^{−1}, lie outside the unit circle, then the roots of the companion matrix, the solutions ρ to (3.37), lie inside the unit circle and the system is stationary. This may seem slightly confusing given one usually looks for roots inside the unit circle; remembering that (3.40) is solved in ρ^{−1} rather than ρ should ease the confusion. The intuition is as in the simple AR(1) or AR(2) model: the conditions for stationarity enable the MA(∞) representation and so allow characterisation of the more complicated VAR model.
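The stationarity check via companion-form eigenvalues is mechanical; a sketch with made-up coefficient matrices (Π_1, Π_2 here are illustrative, not the notes' system):

```python
import numpy as np

# Illustrative VAR(2) coefficient matrices for a p = 2 system.
Pi1 = np.diag([0.5, 0.3])
Pi2 = np.diag([0.2, 0.1])
p = 2

# Companion matrix of (3.35): [[Pi1, Pi2], [I, 0]].
companion = np.block([[Pi1, Pi2],
                      [np.eye(p), np.zeros((p, p))]])

# Its eigenvalues are the roots rho of (3.37); all strictly inside
# the unit circle means the VAR is stationary.
roots = np.linalg.eigvals(companion)
is_stationary = bool(np.all(np.abs(roots) < 1))
```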
Then the maximum likelihood estimator of θ given the data Xt is defined as:
(3.45) θ̂ = max_θ L(θ; X_t),
i.e. the value of θ that, given the assumed distributional form and the data X_t, maximises the likelihood function. The likelihood function can be seen as a measure of plausibility: given the distributional form assumed and the data observed, how plausible is a particular parameter value? Often logarithms are used to make the likelihood function more tractable; because the logarithm is a monotonic transformation, maximising the log-likelihood is equivalent to maximising the likelihood. Hence (3.45) might be written as:
(3.46) θ̂ = max_θ log L(θ; X_t).
In the VAR case, where the residuals are assumed to be Normally distributed with
mean 0 and variance Ω as in (3.1), it can be shown that maximum likelihood
leading to:

(3.52) L_max = L(B̂, Ω̂) = (2π)^{−Tp/2} |Ω̂|^{−T/2} exp( −(1/2) tr[ Ω̂^{−1} \sum_{t=1}^{T} ε̂_t ε̂_t′ ] ) = (2π)^{−Tp/2} |Ω̂|^{−T/2} e^{−Tp/2},

(3.53) L_max^{−2/T} = (2πe)^p |Ω̂|.
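Formula (3.53) can be checked numerically against a direct evaluation of the Gaussian log-likelihood; a sketch with simulated stand-in residuals (the sample size and dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 200, 3
resid = rng.standard_normal((T, p))     # stand-in residuals eps_hat_t

# ML estimate of the error covariance and the concentrated log-likelihood,
# log Lmax = -(T/2) * (p*log(2*pi*e) + log|Omega_hat|), i.e. (3.53).
Omega_hat = resid.T @ resid / T
log_lmax = -0.5 * T * (p * np.log(2 * np.pi * np.e)
                       + np.log(np.linalg.det(Omega_hat)))
```

The equality holds because tr(Ω̂^{−1} Σ ε̂_t ε̂_t′) = Tp when Ω̂ is the ML estimate, so the exponential term in (3.52) collapses to e^{−Tp/2}.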
3.6.2. Hypothesis Testing with the Maximum Likelihood Framework. One benefit
of using maximum likelihood analysis is the ease with which hypotheses can be
tested. A convenient test is the likelihood ratio test. One can impose a particular
hypothesis upon the model, calculate the likelihood value for that restricted model,
and compare it to the unrestricted maximum likelihood estimator. In other words,
one might test the hypothesis:
(3.54) H0 : θ = θ_0.

The test assesses how plausible the restrictions are: if the restrictions move the likelihood far from the values of θ that maximise it, and which hence are most plausible and supported by the data, this suggests the restrictions should be rejected.
tions on the model can be formed and the testing of them is conceptually quite
straightforward. Restrictions on the estimates B̂′ are formed by constructing matrices R or H, both of which will be discussed in greater detail in Section 7. Considering the H form for now, the restrictions are imposed by forming ψ = HB′, meaning that the following model is estimated:

(3.56) X_t = HB′Z_t + ε_t = ψZ_t + ε_t.
12 JAMES READE
Estimating the restricted model provides a restricted set of estimators, denoted not
by hats but by checks:
(3.57) ψ̌ = M_{XZ} H (H′ M_{ZZ} H)^{−1}
(3.58) Ω̌ = M_{XX} − M_{XZ} H (H′ M_{ZZ} H)^{−1} H′ M_{ZX}
Hence the test statistic is very simple: using (3.53), −2 log LR = T (log|Ω̌| − log|Ω̂|), which is asymptotically χ² distributed with degrees of freedom equal to the number of restrictions imposed. Thankfully, in addition to this, any time-series econometrics package will calculate the likelihood ratio test of restrictions imposed.
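In practice the computation reduces to a couple of lines; a sketch with hypothetical log-likelihood values (the numbers are invented purely for illustration):

```python
# Log-likelihoods of an unrestricted and a restricted model
# (hypothetical values; in practice read from estimation output).
loglik_unrestricted = -512.3
loglik_restricted = -514.9

# LR statistic: -2 log(L_restricted / L_unrestricted), asymptotically
# chi-squared with one degree of freedom per restriction imposed.
lr_stat = 2 * (loglik_unrestricted - loglik_restricted)
chi2_crit_5pct = 3.84  # chi-squared(1) 5% critical value
reject_restriction = lr_stat > chi2_crit_5pct
```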
3.7. Estimating the unrestricted VAR in PcGive. Once the stage of estimating the unrestricted VAR has been reached, one ought to have considered which variables to include in the system, and what the object of interest is (Section 9). For the examples in these notes, the object of interest is the UK labour market. At the current stage of research on this, the dataset is as follows, where each variable, its name in PcGive, and its description are given:
(3.60)
X_{1t} = (y − p_y)_t     ympy      GDP, expenditure approach, constant prices
X_{2t} = ∆p_{y,t}        dlpyt     first difference of GDP Deflator
X_{3t} = (w − p_c)_t     wmpc      Real weekly earnings, All activities
X_{4t} = (p_y − p_c)_t   pwedgey   GDP deflator minus CPI
X_{5t} = er_t            empr      Employment rate
X_{6t} = (r − p_y)_t     rmpy      UK labour force
All variables are in logarithms. Also by this stage, the deterministic terms to
be included should have been decided upon (Section ??); in this model, a trend
is restricted to the cointegrating space because a number of the data series are
trending, and also to possibly accommodate a crude output gap measure. This
entails including a trend and a constant at this stage, and ensuring the constant is
unrestricted. Furthermore, if one has non-seasonally adjusted data, then seasonal
dummies should be included.
There are two methods in PcGive to estimate a system. One can use the PcGive
module, or one can write batch code.
3.7.2. Batch code. Writing batch code is quite similar to writing an Ox job. Having a batch file saved for each particular project being worked on seems a useful idea, as one can easily change settings without having to go through various different windows, and, as in Ox, one can write comments on lines to help understand what is going on. To make a batch code file, one can either first run the unrestricted VAR as in Section 3.7.1 and then open the batch editor in GiveWin6 and click on “Save as. . . ”, or one can open a new text window and copy and paste the following into the file:
module("PcGive");
package("PcGive");
usedata("BigDatabase.xls");
system
{
Y = empr, r, wmpc, ympy, pwedgey, dlpyt;
Z = empr_1, empr_2, r_1, r_2, wmpc_1, wmpc_2, ympy_1, ympy_2,
pwedgey_1, pwedgey_2, dlpyt_1, dlpyt_2, Trend;
U = Constant, CSeasonal, CSeasonal_1, CSeasonal_2;
}
estimate("OLS", 1963, 4, 2005, 1);
Then save the file with a “.fl” extension, and run the file by either pressing ctrl + R
or clicking on the button on the toolbar with the little man running with a piece of
paper in his hand. The output should look something like what is in Section 3.7.3.
3.7.3. PcGive output.
SYS( 1) Estimating the system by OLS (using BigDatabase.xls)
The estimation sample is: 1963 (4) to 2005 (1)
Normality test: This test is given by, where b1 and b2 are estimates of skew-
ness and kurtosis:
(4.3) N = T(√b_1)²/6 + T(b_2 − 3)²/24 ∼ χ²(2).
ARCH test: This test statistic is TR², where R² is conventionally defined, from the regression of ε̂_t² on a constant and ε̂_{t−1}² to ε̂_{t−s}².
Heteroskedasticity test: This is the White test of heteroskedasticity, and
is an F-test of overall significance on the auxiliary regression of the squared
residuals from the original equation εˆt 2 on the original regressors xi,t−k
and all their squares x2i,t−k . The test is F (s, T − s − 1 − k) where k is
the number of lags, and s is the number of regressors (not including the
constant) in the auxiliary regression.
Heteroskedasticity-x test: This is just as the heteroskedasticity test, but instead the squared residuals are regressed on the original regressors and all the cross products of the regressors. Often there are not enough observations to carry out this test, since it requires regressing on ½k(k + 1) regressors.
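The Normality statistic in (4.3) is straightforward to compute from a residual series; a minimal sketch, where the function name is an illustrative choice:

```python
import numpy as np

def normality_stat(resid):
    """Statistic (4.3): N = T*(sqrt(b1))^2/6 + T*(b2 - 3)^2/24 ~ chi2(2),
    where sqrt(b1) is the sample skewness and b2 the sample kurtosis."""
    e = resid - resid.mean()
    T = len(e)
    s2 = (e ** 2).mean()
    skew = (e ** 3).mean() / s2 ** 1.5   # sqrt(b1)
    kurt = (e ** 4).mean() / s2 ** 2     # b2
    return T * skew ** 2 / 6 + T * (kurt - 3) ** 2 / 24
```

Values above the χ²(2) 5% critical value of 5.99 indicate non-Normal residuals.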
It is worth noting that in testing there will always be spurious rejections; for ex-
ample at the 5% level of significance, 1 in 20 tests will reject spuriously, and hence
a very good model will have a small number of test failures. However, many
more failures than prescribed by the test size suggests problems that ought to be
investigated.
4.3. Other information for diagnosing problems. In assessing any particular
model, in addition to test results, other information can be useful for understanding
why a particular test might have failed. These methods will now be considered.
4.3.1. Graphical analysis of residuals. If Normality tests have strongly failed, look-
ing at a plot of density of the residuals for each equation of a VAR can be instructive,
as certain, non-Normal, patterns observed indicate particular problems:
Bi-modality: Suggests a break has been missed in the system, since things
are around two or more means;
Skewed distributions: Imply a transformation of the data might be re-
quired, such as taking logs (Section 9.3.2);
Fat tails: Suggest there are many outliers which ought to be taken into ac-
count in the model via indicator variables (see Section 4.4.2).
To get plots of the densities of residuals, go to:
Test --> Graphic Analysis... --> Residual density
and histogram (kernel estimate)
Standardised residual plots (residuals minus their mean, divided by their standard error) are also useful in providing an idea about whether outliers are causing test failures. Plots for each equation can be found in PcGive:
Test --> Graphic Analysis... --> Residuals (scaled)
Because the standardised residuals should then follow a standard Normal distri-
bution, 95% of the mass should lie between ±2 (as this is 2 standard deviations
either side of the mean). A “large” value is somewhat subjective; Juselius &
Hendry (2000) suggest standardised residuals larger than 3.3 in absolute value,
while Nielsen (2004) describes ‘usual practice’ to be absolute standardised resid-
uals of size greater than 3.9. Large residuals suggest outliers, and methods for
dealing with these are discussed in Section 4.4.2.
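Flagging large standardised residuals is easily automated; a sketch using the 3.3 cut-off mentioned above (the function name and default threshold are illustrative choices):

```python
import numpy as np

def flag_outliers(resid, threshold=3.3):
    """Return observation indices where the absolute standardised residual
    exceeds the threshold (3.3 per Juselius & Hendry 2000; Nielsen 2004
    describes usual practice as 3.9)."""
    z = (resid - resid.mean()) / resid.std(ddof=1)
    return np.flatnonzero(np.abs(z) > threshold)
```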
4.3.2. Recursive Analysis. The world is non-stationary, and hence events happen
that lead to long-term changes in parameter values such as changing monetary
policy regimes or changes of government. One of the assumptions underlying
statistical inference is the constancy of parameters, and hence this should also be
tested. The effects of missing such non-constancies on inference can be indicated
by considering the very simple model:
(4.4) yt = µ + βyt−1 + θdt + εt .
Here, dt = 1{t>ta } , an indicator variable taking the value 1 when t > ta , and zero
otherwise. An OLS regression:
(4.5) yt = µ + βyt−1 + εt ,
would result in an estimate of µ that is some weighted average of its true value
when t ≤ ta , µ, and its true value when t > ta , µ + θ. Thus working out where this
structural break took place, ta , is of paramount importance, in order to include an
where β̂ (T ) would be the full sample estimate. One then monitors how much the
estimate of β alters. The reference point for variations, so β doesn’t vary “too”
much over the sample, is to take β0 = β̂ (T ) and compare β̂ (t) to this. There are
various tests, but one simple and effective one is the sup or max test:
max_t | β̂^{(t)} − β_0 |,
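This recursive sup statistic can be sketched for the AR(1) regression above (the minimum estimation window and the function name are illustrative choices, not from the notes):

```python
import numpy as np

def sup_test(y, min_obs=10):
    """Recursive OLS of y_t on y_{t-1} over expanding samples,
    returning the sup statistic max_t |beta_hat(t) - beta_hat(T)|."""
    x, z = y[:-1], y[1:]           # regressor y_{t-1} and regressand y_t

    def beta_hat(n):               # OLS slope using the first n observations
        return (x[:n] @ z[:n]) / (x[:n] @ x[:n])

    full = beta_hat(len(x))        # full-sample estimate beta_hat(T)
    return max(abs(beta_hat(n) - full) for n in range(min_obs, len(x) + 1))
```

For a constant-parameter process the statistic is small; a structural break in β inflates it.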
with β = 0.8 and δ = 4, hence a four standard deviation break. The series is
plotted in Figure 2 and the recursive plots in Figure 3 are all the plots possible for
the estimator on the lagged dependent variable. When estimating in PcGive, one
must remember to check the box for recursives. The panels in Figure 3 are:
(1) the actual β estimate, with error bands (that assume no structural break);
(2) the t-value;
(3) residual sum of squares;
(4) the residuals yt − βyt−1 .
(5) standardised innovations v_t = (y_t − βy_{t−1}) / ω_t^{1/2}, where ω_t = 1 + y_{t−1}′ (Y_{t−1}′ Y_{t−1})^{−1} y_{t−1};
(6) 1-step Chow test statistics:
7The start point, y_0, is fixed by assumption, while the end point is fixed since at t = T, β̂^{(t)} − β_0 = 0.
[Figure 2: the simulated series Var1. Figure 3: recursive plots — the coefficient estimate Var1_1 with ±2SE bands, its t-value, the RSS, 1-step residuals, standardised innovations, and the 1-up, Ndn and Nup Chow test sequences with 1% critical values.]
4.4.1. Specifying Lag length and Information Set. Lag length is an important issue, because choosing wrongly has strong implications for subsequent modelling choices. If too few lags are chosen, then systematic variation will show up in the residuals and hence the autocorrelation tests may fail; the penalty if too many lags are chosen is drastically fewer degrees of freedom, as in a p-dimensional VAR adding another lag adds p × p parameters. To complicate matters further, adding another variable to the model (increasing the information set) is often a better strategy than adding another lag. This could be for two reasons: firstly, the systematic variation in the residuals showing up as autocorrelation need not be because a lag is missing, but because an important variable is missing; and secondly, adding a variable adds k × (2p − 1) parameters, so for k = 2, once p ≥ 4, adding another variable increases the number of parameters by less than adding a lag.
4.4.2. Using dummy variables for outliers. As mentioned in Section 4.3.1, the ex-
istence of many outliers in the information set will lead to fat tails in the residual
distribution, and a structural break will lead to bi-modality, and hence the failure of
the Normality test. The use of dummy variables can alleviate both these problems.
Considering first outliers, Nielsen (2004) identifies two types of outliers that exist in
dynamic systems of data; additive and innovational outliers. Thinking in terms of
earlier impulse responses (see Section 3.3.1), innovational outliers are large values
which then have a subsequent effect on the dynamics of the system; so for example
the system is dislodged, the impulse, and it either takes time to settle back to its
mean level (panel 1, the stationary series), or it settles at its new level (panel 2, the
random walk case). On the other hand however, additive outliers have no effect
at all on the dynamics of the system, as shown in panel 3. Innovational outliers
are generally economic events, such as the UK’s exit from the ERM in 1992, which
may show up in residuals of the change in the exchange rate as a large residual
next observation, and then in subsequent observations, the exchange rate adjusts
to this large value before settling back down. Additive outliers are generally things
outside the system, such as typing mistakes in the compilation of the data. Having
identified outliers by an appropriate strategy, one must then consider the nature of
each outlier.
In PcGive and Ox. An Ox program exists (Outliers.ox) that takes the residuals
saved from each equation from PcGive and reports the date and series for each
standardised residual that is greater than a user-specified size, and reports the
standardised residuals that come before and after that observation; one should
proceed as follows: having estimated the unrestricted VAR (Section 3.7.2) then in
the PcGive module follow:
8Data will either be seasonally adjusted or seasonal dummies will be added, hence any annual element to behaviour is in theory removed.
[Figure 4: an innovation outlier in a stationary process, an innovation outlier in a random walk process (x_t = x_{t−1} + ε_t), and an additive outlier. Figure 5: four simulated random walk series.]
observation in stages (by splitting the sample and adding a block of dummies at a
time, be it in halves, or thirds and so on), retaining the significant dummy variables,
and then regressing on a ‘union’ model of all the significant dummies from the vari-
ous splits. My current research aims to extend this work into the VAR framework,
which is naturally much more complicated than the single equation situation.
Nevertheless, shift dummies should be used with caution. One could model a random walk as a stationary process with sufficiently many level shifts, as a random walk can produce movements which look like structural breaks, as Figure 5, which plots four generated random walk series, shows. One should have a strong justification for a mean-shift dummy, such as a known exogenous event like German reunification, or a change between exchange rate regimes.
where Π = (\sum_{i=1}^{k} Π_i) − I_p and Γ_j = −\sum_{i=j+1}^{k} Π_i. As a simple example, if a VAR(1) model was estimated:

(5.2) X_t = Π_1 X_{t−1} + ε_t,

then the CVAR transformation is:

(5.3) ∆X_t = ΠX_{t−1} + ε_t,

with Π = Π_1 − I_p. A VAR(2) model such as in (3.30) would provide a CVAR:

(5.4) ∆X_t = ΠX_{t−1} + Γ_1 ∆X_{t−1} + ε_t,

where Π = Π_1 + Π_2 − I_p, and Γ_1 = −Π_2.
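The mapping from VAR levels coefficients to CVAR form is a one-liner per matrix; a sketch with invented Π_1, Π_2 (not the notes' estimates):

```python
import numpy as np

# Illustrative VAR(2) coefficient matrices for p = 2.
Pi1 = np.array([[0.8, 0.1],
                [0.2, 0.7]])
Pi2 = np.array([[0.1, -0.1],
                [0.0, 0.2]])

# CVAR form (5.4): Delta X_t = Pi X_{t-1} + Gamma1 Delta X_{t-1} + eps_t.
Pi = Pi1 + Pi2 - np.eye(2)
Gamma1 = -Pi2
```

Substituting back gives X_t = (I + Π + Γ_1)X_{t−1} − Γ_1 X_{t−2}, which recovers Π_1 and Π_2 exactly, so the transformation loses no information.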
This transformation has a number of advantages over the simple unrestricted
VAR:
(1) By combining levels and differences, the multicollinearity often present in
macroeconomic data is reduced, as differences are much more orthogonal
than levels are;
(2) It gives a more intuitive explanation of the data, as effects can be cate-
gorised into long run and short run effects.
(3) All the long run information is confined to the Π matrix, hence focus can
be placed on that.
(4) The Γk matrices capture the short run dynamics of the data.
(5) As the data is most likely non-stationary, so Xt ∼ I(1), another advantage
of this representation is that inference is improved by the fact that ∆Xt ∼
I(0).
Following on from the final advantage, there remains a problem: X_t is still in the equation, and X_t ∼ I(1) yet ∆X_t, ε_t ∼ I(0), hence the equation is unbalanced. Just as in the univariate case, when there is a unit root, the idea is to transform the model in order to put this term to zero. In the multivariate case this corresponds in effect to setting rows (or columns) of Π to zero, meaning it has a reduced rank of r < p, where p is the number of variables in X_t. Now any reduced rank matrix can
be factorised into two p × r matrices α and β, such that Π = αβ 0 . Because β 0 is
r × p one sees it matrix-multiplies the Xt−1 vector to provide a linear combination
of the variables of the system. This linear combination β 0 Xt−1 must be stationary
in order for the equation to be balanced, and hence this factorisation provides r
stationary linear combinations of variables, known as cointegrating vectors. Giving
an example, if r = 1 and p = 2, then β is 2 × 1 and so:

αβ′ X_{t−1} = (α_1; α_2)(β_1 β_2) X_{t−1} = (α_1; α_2)(β_1 X_{1,t−1} + β_2 X_{2,t−1}).
Thus the linear combination that is β 0 Xt will remain intact, and will be multiplied
by two constants, α1 and α2 . With this factorisation, (5.1) becomes:
(5.5) ∆Xt = αβ 0 Xt−1 + Γ1 ∆Xt−1 + · · · + Γk−1 ∆Xt−k+1 + ΦDt + εt .
When r > 1, the β matrix is generally spoken of in terms of its rows, and the α matrix in terms of its columns, and this convention is adopted from now on in these notes.
This is because each row of β constitutes a cointegrating vector, a stationary linear
combination of the variables in Xt , while a column in α describes the reaction of
each variable in Xt to a particular cointegrating vector.
However, this highlights the disadvantage of estimating the cointegrated VAR model: as with imposing any restriction, there is the possibility the restriction is incorrect. To minimise this possibility, one must ensure the diagnostic checks described in Section 4 are satisfied.
As in the simpler examples discussed in Sections 3.3–3.5, it is useful to derive the
solution to the cointegrated VAR model. Considering a VAR(1) model transformed
into CVAR form:
(5.6) ∆Xt = αβ 0 Xt−1 + εt .
Next, the orthogonal complement of any given p × r matrix α, denoted α_⊥ and of dimension p × (p − r), is defined to be such that:
• α_⊥′ α = 0;
• (α, α_⊥) is of full rank.
The orthogonal complement is fundamentally useful in VAR analysis; it pays to
understand how it is formed, and Section 7 on testing restrictions will enter more
into its form. Firstly note that the CVAR can be written as:
(5.7) β′∆X_t = β′αβ′X_{t−1} + β′ε_t
(5.8) β′X_t = (I_p − β′α) β′X_{t−1} + β′ε_t
(5.9) β′X_t = \sum_{i=0}^{∞} (I_p − β′α)^i β′ε_{t−i},
where (5.9) exists provided the eigenvalues of (Ip − αβ 0 ) lie within the unit circle.
This is a stationary representation of the CVAR, because β 0 Xt are the cointegrat-
ing and hence stationary relations in the system Xt . Continuing, and using the
orthogonal complement of α in (5.9) gives:
(5.10) α_⊥′ ∆X_t = α_⊥′ ε_t
(5.11) α_⊥′ X_t = α_⊥′ X_{t−1} + α_⊥′ ε_t
(5.12) α_⊥′ X_t = \sum_{i=0}^{t} α_⊥′ ε_i + α_⊥′ X_0.
Thus along with a stationary expression in (5.9) a random walk expression (5.12)
can be derived from the CVAR. These two expressions can be brought together
using the following identity:
(5.13) I_p = β_⊥ (α_⊥′ β_⊥)^{−1} α_⊥′ + α (β′α)^{−1} β′.
Multiplying through by X_t shows how this can be done:

(5.14) X_t = β_⊥ (α_⊥′ β_⊥)^{−1} α_⊥′ X_t + α (β′α)^{−1} β′X_t

(5.15) = β_⊥ (α_⊥′ β_⊥)^{−1} ( \sum_{i=0}^{t} α_⊥′ ε_i + α_⊥′ X_0 ) + α (β′α)^{−1} ( \sum_{i=0}^{t−1} (I_p − β′α)^i β′ε_{t−i} + (I_p − β′α)^t β′X_0 )

(5.16) = C \sum_{i=0}^{t} ε_i + \sum_{i=0}^{∞} C_i^* ε_{t−i} + A,
[Figure 6: cross-plot of the two simulated cointegrated series, X1 × X2.]
where C = β_⊥ (α_⊥′ β_⊥)^{−1} α_⊥′ and C_i^* = α (β′α)^{−1} (I_p − αβ′)^i β′, and A collects the remaining terms, the initial values. Equation (5.16) is the Granger representation, the moving average representation for the cointegrated VAR. It shows that the system of variables under consideration can be broken down into a random walk component (C \sum_{i=0}^{t} ε_i), a stationary component (\sum_{i=0}^{∞} C_i^* ε_{t−i}), and initial values, A. Thus shocks to the system can have both a permanent effect (the random walk component) and/or a transitory effect (the stationary component). The random walk parts, α_⊥′ \sum_{i=0}^{t} ε_i, are the stochastic or common trends in the data, while the stationary parts, β′X_t, are the cointegrating relations.
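The skew-projection identity (5.13) underpinning the Granger representation can be verified numerically; a sketch with illustrative α and β for p = 2, r = 1, where the orthogonal complements are computed via the SVD (one of several valid choices):

```python
import numpy as np

alpha = np.array([[-0.25], [0.25]])
beta = np.array([[1.0], [-0.7]])

def orth_complement(a):
    """A p x (p - r) basis of the orthogonal complement of a p x r matrix."""
    u, _, _ = np.linalg.svd(a)
    return u[:, a.shape[1]:]        # columns orthogonal to col(a)

alpha_perp = orth_complement(alpha)
beta_perp = orth_complement(beta)

# (5.13): I_p = beta_perp (alpha_perp' beta_perp)^{-1} alpha_perp'
#             + alpha (beta' alpha)^{-1} beta'
proj = (beta_perp @ np.linalg.inv(alpha_perp.T @ beta_perp) @ alpha_perp.T
        + alpha @ np.linalg.inv(beta.T @ alpha) @ beta.T)
```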
The kind of movements in data, and patterns, that resemble cointegrated series can be seen by looking at Figure 6, which plots two simulated data series against each other.10
This kind of analysis is not dissimilar to the analysis of many theoretical macroeconomic models, which are systems of differential equations for which phase diagrams are plotted and transition paths to equilibrium discussed. Certainly this is a very simplified example, but if there are, say, two cointegrating relations and three variables, then these could be imagined as two attractor sets in the diagram above, perhaps demand and supply systems, or an IS-LM system. This gives an alternative way to think about correlations observed between variables in two- or higher-dimensional systems, as it allows a dynamic aspect to the formation of these relationships. It is the size of the α coefficients that describes the speed of adjustment back to equilibrium, so systems with higher α coefficients might be thought to have less spread around the attractor set. However, this ignores the interaction between the α and β coefficients. Each α coefficient describes the path back to equilibrium for each dimension of the system. The stronger the α coefficients, the more they dominate, and hence the more the path back to equilibrium follows the directions they specify; if the α coefficients are small, then the movement back will be weaker, and the common trends pushing the variables up and down the attractor set will have more influence, giving a more varied pattern. In general, the smaller α is, the less the movement towards the attractor set, and the less obvious it is that there is even an attractor set. This leads to the classic statistical inference problem: the existence or not of a particular phenomenon in the data. Smaller α values mean slower mean reversion and require more observations, longer datasets, to unearth.
Furthermore, with it’s description of short-run and long-run effects, the cointe-
grated VAR provides a bridge to economic theory, which often posits effects over
such time horizons for policies or actions of agents. This helps motivate use of the
technique.
5.2. Constant and Trend. The important principle to learn is that the level
and trend of an economic process do not necessarily translate into what is estimated
from an economic model, especially given the potent threat of mis-specification
10The series were generated according to the model:
$$\begin{pmatrix}\Delta X_{1t}\\ \Delta X_{2t}\end{pmatrix} = \begin{pmatrix}-0.25\\ 0.25\end{pmatrix}\begin{pmatrix}1 & -0.7 & 1\end{pmatrix}\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\\ 1\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix},$$
i.e. the model has a constant in the cointegrating relations, and has p = 2 and r = 1.
where the second line comes from differencing the Granger representation. Then:
(5.27) E(∆Xt ) = Cµ0 = γ0 .
Considering next the stationary component, the CVAR (5.24) is transformed into:

(5.28) $X_t = (I + \alpha\beta')X_{t-1} + \mu_0 + \varepsilon_t$

(5.29) $\beta'X_t = (I + \beta'\alpha)\beta'X_{t-1} + \beta'(\mu_0 + \varepsilon_t)$

(5.30) $\beta'X_t = \sum_{i=0}^{\infty}(I + \beta'\alpha)^i\beta'(\varepsilon_{t-i} + \mu_0),$

where the last line follows from the fact that the second line is stationary. Then taking
expectations:

(5.31) $E(\beta'X_t) = \sum_{i=0}^{\infty}(I + \beta'\alpha)^i\beta'\mu_0$

(5.32) $= -(\beta'\alpha)^{-1}\beta'\mu_0 \equiv -\beta_0,$
where the last line is found using the formula for a geometric progression. Then
considering the formula for the skew projection given in (5.13) applied to $\mu_0$:

(5.33) $\mu_0 = \alpha(\beta'\alpha)^{-1}\beta'\mu_0 + \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp'\mu_0 = \alpha\beta_0 + \gamma_0,$

where $\beta_0$ and $\gamma_0$ are defined in (5.32) and (5.27) respectively. Equation (5.33) can then be used
to write (5.24) as:

(5.34) $\Delta X_t - \gamma_0 = \alpha(\beta'X_{t-1} + \beta_0) + \varepsilon_t.$

This is the equilibrium correction model written in deviations from equilibrium.
From (5.27), $\gamma_0$ is the equilibrium growth rate, while from (5.32) $-\beta_0 = E(\beta'X_t)$ is the equilibrium
mean of the cointegrating relations. This shows that the deterministic terms in the CVAR are not as simple
as in the individual economic processes. This is a simple example of the general
result that, from the CVAR in (5.5):
(5.35) $\Delta X_t = \alpha\beta'X_{t-1} + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} + \Phi D_t + \varepsilon_t$

(5.36) $\implies E(\Delta X_t) = \alpha E(\beta'X_{t-1}) + \sum_{i=1}^{k-1}\Gamma_i E(\Delta X_{t-i}) + \Phi D_t$

(5.37) $\implies \Delta X_t - E(\Delta X_t) = \alpha\beta'X_{t-1} - \alpha E(\beta'X_{t-1}) + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} - \sum_{i=1}^{k-1}\Gamma_i E(\Delta X_{t-i}) + \varepsilon_t,$
which gives the same form, and $E(\beta'X_{t-1})$ and $E(\Delta X_t)$ have the intuitive interpretations
as the disequilibrium mean and growth rates respectively. There then follow,
based on (5.33), five possible cases for the constant and trend in the CVAR:
(5.38) $\Delta X_t = \alpha\beta'X_{t-1} + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} + \mu_0 + \mu_1 t + \Phi D_t + \varepsilon_t$
(4) Restricted trend and unrestricted constant: $\alpha_\perp'\mu_1 = 0$, i.e. the trend is
restricted to the cointegrating relations. This gives a constant in the data (i.e. growth)
and a trend in the cointegrating relations:

(5.43) $\Delta X_t = \alpha\begin{pmatrix}\beta\\ \beta_1\end{pmatrix}'\begin{pmatrix}X_{t-1}\\ t\end{pmatrix} + \theta + \varepsilon_t,$
5.3. Dummy variables. Dummy variables have already been discussed in Sec-
tion 4.4.2 as a method for dealing with extreme observations and structural breaks.
The CVAR with dummy variables included is written:

(5.46) $\Delta X_t = \alpha\beta'X_{t-1} + \Phi D_t + \varepsilon_t.$
If d dummy variables are specified, Dt is a d × 1 vector of the dummies, while Φ
is a p × d matrix of coefficients; all else is as before. It is important to realise the
impact dummy variables have on the CVAR, through the levels and differences,
and the permanent and transitory effects. A first idea at the effects can be gleaned
from the Granger representation from (5.16), slightly rewritten:

(5.47) $X_t = C\sum_{i=1}^{t}(\varepsilon_i + \Phi D_i) + C(L)(\varepsilon_t + \Phi D_t) + A,$

where $A$ are the initial values and:

(5.48) $C = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp',$

(5.49) $C(L) = \sum_{i=0}^{\infty}\alpha(\beta'\alpha)^{-1}(I + \beta'\alpha)^i\beta'L^i.$
The summations in (5.47) show that deterministic terms cumulate in the CVAR.
There are two summations in (5.47), the first one for the random walk component,
the effects of which are permanent as with the impulse response analysis considered
in Section 3.3.1, while the second one is for the stationary component, where the
effects of a dummy will fade. Three types of dummy variable might be identified:

Transitory impulse: $d = (0,\ldots,0,1,-1,0,\ldots,0)$, $\Delta d = (0,\ldots,0,1,-2,1,0,\ldots,0)$, $\sum_{i=0}^{t}d_i = 0$.

Permanent impulse: $d = (0,\ldots,0,1,0,\ldots,0)$, $\Delta d = (0,\ldots,0,1,-1,0,\ldots,0)$, $\sum_{i=0}^{t}d_i = 1$.

Mean-shift dummy: $d = (0,\ldots,0,1,1,\ldots,1)$, $\Delta d = (0,\ldots,0,1,0,\ldots,0)$, $\sum_{i=0}^{t}d_i = \sum_{i=0}^{t}1_{\{i>T_b\}} = t - T_b$.
The reason the differenced dummy variables are included is that in the CVAR they
will appear in differences. In each process the dummy will appear in levels. The
summing of the deterministic terms can be considered also; a transitory impulse will
have no effect on the process in the long run since in both summations, the effect
of the shift is immediately corrected. On the other hand, a permanent impulse
in one variable has a long run effect on the data through the permanent effect (C
in (5.47)), and a transitory but fading effect through the stationary component.
Due to the accumulation effect, a mean shift dummy actually translates into a time
trend after the break in the data. However, for all the complicated analysis that
could be carried out, the simple advice is to match the pattern observed in the
residuals of the model for a dummy variable. Hence the mean shift is unlikely to
be added, at least in unrestricted form. If a clear structural break has taken place
however, such as a change in exchange rate system, then a shift dummy might be
included but restricted to the cointegration space:
(5.50) $\Delta X_t = \alpha\begin{pmatrix}\beta\\ \tilde\beta_0\end{pmatrix}'\begin{pmatrix}X_{t-1}\\ D_{s,t}\end{pmatrix} + \theta\Delta D_{s,t} + \varepsilon_t.$
From the definition of C(L) in (5.49) it can be seen that the cumulated effects of a
shift dummy in the cointegration space fade away and are not permanent.
5.3.1. Unified structure. Dummy variables enter all equations in the CVAR, so Φ
has p rows, in order that the system remains in a form that can be estimated by
MLE. In Section 7.6 this unified structure is relaxed to allow short-run identification
of the system.
5.3.2. Seasonal dummy variables. This analysis of dummy variables in the CVAR
gives the rationale for using centred seasonal dummies not normal seasonal dummies
in cointegration analysis.12 A normal seasonal dummy for, say, quarterly data,
would look like:
Ds = (1, 0, 0, 0, 1, 0, 0, 0, 1, . . . ),
and from above, along with the two other lagged versions of $D_s$, this would cumulate.
Centred seasonal dummies are constructed so that there is no cumulation, so that the first
is, say, as above, but the next two are:
(0, −0.5, 0, 0, 0, −0.5, 0, 0, 0, −0.5, 0, . . . ),
and:
(0, 0, −0.5, 0, 0, 0, −0.5, 0, 0, 0, −0.5, 0, 0, . . . ),
hence there is no cumulation when summed.
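The cumulation point can be illustrated numerically; here the centred dummy is built as the ordinary dummy minus 1/4, a common construction assumed for illustration rather than PcGive's exact coding:

```python
import numpy as np

# An ordinary quarterly dummy cumulates (acts like a broken trend), while a
# centred version sums to zero over each year and stays bounded when cumulated.
T = 80  # 20 years of quarterly data
ordinary = (np.arange(T) % 4 == 0).astype(float)   # 1,0,0,0,1,0,0,0,...
centred = ordinary - 0.25                          # 0.75,-0.25,-0.25,-0.25,...

print(ordinary.cumsum()[-1])   # 20.0 -- grows linearly with T
print(centred.cumsum()[-1])    # 0.0  -- no cumulation
```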
5.3.3. Exogenous Variables. Exogenous variables, while not deterministic, can also
be included within the system in a similar way to deterministic terms. Sometimes
it might be required to include a variable that isn't actually modelled, and is hence
conditioned on. One example might be the oil price, although any variable being
included in this way ought to have been previously tested for weak exogeneity
(see Section 7). Alternatively, if one has a large number of variables, it might be
of interest to reduce the size of $X_t$ by conditioning on a number of the variables,
dependent again on the outcome of weak exogeneity testing. The vector of exogenous
variables is denoted $Z_t$ and enters like other deterministic terms, in its
levels and differences where appropriate, so:

(5.51) $\Delta X_t = \alpha\begin{pmatrix}\beta\\ \beta_0\end{pmatrix}'\begin{pmatrix}X_{t-1}\\ Z_t\end{pmatrix} + \theta\Delta Z_t + \varepsilon_t,$
5.4. Estimation and rank determination. Having now considered the form of
the cointegrated VAR model, in this Section estimation in the CVAR model is
considered, along with rank determination. The cointegrated VAR is estimated
for rank $r < p$:

(5.52) $H_r : \Delta X_t = \alpha\beta'X_{t-1} + \sum_{i=1}^{k-1}\Gamma_i\Delta X_{t-i} + \Phi D_t + \varepsilon_t.$
If the data in levels are I(1), $X_t \sim I(1)$, then the $p \times p$ matrix $\Pi$ must be of reduced
rank, $r < p$, since it is the coefficient matrix on the levels, while the other parts of the
equation, $\Delta X_t$ and $\varepsilon_t$, are I(0). Something needs to be done to balance the equation in (5.5),
otherwise inference will be spurious (Granger & Newbold 1974). Three cases for
the rank r are possible:
r = p: The data are I(0) in levels as otherwise the model would be imbalanced.
Thus estimate VAR in levels, Xt .
r < p, r > 0: System is of reduced rank, and linear combinations of variables
can be found that are stationary.
r = 0: No cointegration. The VAR in differences (∆Xt ) should be run; there
are no long run relationships.
12This is relevant when using data that is not seasonally adjusted. In PcGive, then in the “Data
Selection”, ensure CSeasonal is selected, and not Seasonal when adding seasonal dummies.
Thus both processes are I(1) variables as they have a random walk, an integrated
error, in their determination. The idea then is to combine these two variables in a
system so that the resulting variable is stationary. An example would be:

(5.55) $bx_{1t} - ax_{2t} = b\varepsilon_{1t} - a\varepsilon_{2t},$

which is I(0), since the integrated error, the I(1) part, has been cancelled by the
linear combination, and the cointegrating vector is $\beta = (b, -a)$. Furthermore,
linking into the discussion on common trends, both variables in the two-variable
system are driven by the same common stochastic trend, $\sum_{i=0}^{t}\varepsilon_{3i}$. The reduced
rank restriction applies to the part of the CVAR that is in levels, the $\Pi X_{t-1}$ part.
This doesn't reduce the explanatory power of the model by reducing information:
it increases it, since now stable combinations of variables are included in the model
which describe steady state relationships that theory often suggests, and furthermore
accurate inference on these can be carried out.
Thus $\Pi$ is factorised into $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $p \times r$ matrices, $r$ is the
rank of the matrix $\Pi$, and $p$ is the number of variables in the system. This could
be written equivalently as:

(5.56) $\Pi = \alpha\beta' = \begin{pmatrix}\alpha_1 & \alpha_2 & \ldots & \alpha_r\end{pmatrix}\begin{pmatrix}\beta_1'\\ \beta_2'\\ \vdots\\ \beta_r'\end{pmatrix} = \alpha_1\beta_1' + \alpha_2\beta_2' + \cdots + \alpha_r\beta_r',$

where each $\alpha_i$ or $\beta_i$ is $p \times 1$, and so when multiplied by $X_t$ it can be seen that:

$\beta_i'X_t = \beta_{1i}X_{1t} + \beta_{2i}X_{2t} + \cdots + \beta_{pi}X_{pt}$

describes a combination of variables that is stationary.
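The factorisation can be checked numerically; the α and β values below are illustrative, not taken from any model in these notes:

```python
import numpy as np

# Pi = alpha * beta' with p = 3, r = 2: the (made-up) p x r matrices below give
# a 3 x 3 Pi of rank 2, equal to the sum of the rank-one products alpha_i beta_i'
# as in (5.56).
alpha = np.array([[-0.25, 0.10],
                  [ 0.25, 0.00],
                  [ 0.00, -0.30]])
beta = np.array([[ 1.0, 0.0],
                 [-0.7, 1.0],
                 [ 0.0, -1.0]])
Pi = alpha @ beta.T

print(np.linalg.matrix_rank(Pi))                      # 2
rank_one_sum = sum(np.outer(alpha[:, i], beta[:, i]) for i in range(2))
print(np.allclose(Pi, rank_one_sum))                  # True
```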
It is more likely however that r < p and so Xt ∼ I(1), and it is of vital importance
to accurately determine r, the cointegration rank. This is the same as the univariate
case where it is vital to distinguish between stationarity and non-stationarity for
asymptotic distributions and hence inference. Any restriction from a rank of p
produces a special case of the unrestricted model:
(5.57) H(r = 0) ⊂ H(r = 1) ⊂ · · · ⊂ H(r = p),
where H(r = 0) is simply the VAR model in differences, while H(r = p) is the
case where the data is stationary, the model of full rank. The testing procedure
is to test in this order also, so to test LR( H(r = 0)| H(r = p)) first, then proceed
if the test is rejected to r = 1, and to r = 2 if this is rejected, and so on until
a test is not rejected. Then this hypothesis is accepted. The eigenvalues of the
companion matrix (Section 3.5.1) are fundamental here. Each eigenvalue $\lambda_i$ can
be read as $\mathrm{Corr}^2(\Delta X_t, v_i'X_{t-1})$, where $v_i$ is the corresponding eigenvector. Thus
the correlation between an I(0) quantity, $\Delta X_t$, and $v_i'X_{t-1}$, which may or may not
be I(0), is reported. The rationale here is that an I(0) process cannot be correlated
with an I(1) process, at least not if the sample size is large enough. This is because,
taking an I(0) process, $y_t = \mu_0 + u_t$, and an I(1) process, $z_t = \sum_{i=0}^{t}\varepsilon_i$, the
correlation of $y_t$ and $z_t$ is:

(5.58) $\mathrm{Corr}(y_t, z_t) = \dfrac{\mathrm{Cov}(y_t, z_t)}{\sqrt{\mathrm{Var}(y_t)}\sqrt{\mathrm{Var}(z_t)}} = \dfrac{\mathrm{Cov}(y_t, z_t)}{\sqrt{\sigma_u^2}\sqrt{t\sigma_\varepsilon^2}},$

where the covariance in the numerator is bounded;
hence as the sample size increases the correlation decreases. On the other hand, if
both variables are I(0), which suggests that $v_i$ or $\beta_i$ has done the trick and rendered
the levels combination stationary, then the eigenvalue $\lambda_i$ will be significant. Thus
the LR test for rank determination is in effect a test of the statistical significance of
the eigenvalues of the system, hence how it is often written: $L_{\max}^{-2/T} \propto \prod_{i=1}^{r}(1-\hat\lambda_i)$. The
test begins with r = 0: this implies there are no cointegrating vectors and hence all
the eigenvalues are insignificant. Thus if this null hypothesis is rejected, it must be
that there is at least one significant eigenvalue; thus the model is restricted to have
one cointegrating vector, r = 1, and this is tested; if this is rejected, again it must
be that there is a significant eigenvalue among the ones being restricted to zero (i.e. all
the rest). So again r is increased, and the test is run on the (r+1)th up to the pth
eigenvalues. Once the test is accepted, this suggests the remaining eigenvalues are
all zero, hence there are no more cointegrating vectors.
The cointegrated VAR is estimated by the reduced rank regression of $\Delta X_t$ on
$X_{t-1}$, corrected for lagged differences and deterministic terms. Using the Frisch-Waugh
theorem (Section A), the residuals from the regressions of $\Delta X_t$ and $X_{t-1}$
on $\Delta X_{t-1}, \ldots, \Delta X_{t-k+1}$ and $D_t$ can be written:

(5.59) $R_{0,t} = \left(\Delta X_t \,\middle|\, \Delta X_{t-1}, \ldots, \Delta X_{t-k+1}, D_t\right)$

(5.60) $R_{1,t} = \left(X_{t-1} \,\middle|\, \Delta X_{t-1}, \ldots, \Delta X_{t-k+1}, D_t\right).$

(5.59) and (5.60) can be used in (5.52) to produce the concentrated regression
model:

(5.61) $R_{0,t} = \alpha\beta'R_{1,t} + \varepsilon_t.$
This gives a likelihood of:

(5.62) $L = |\Omega|^{-T/2}\exp\left(-\tfrac{1}{2}\sum_{t=1}^{T}(R_{0,t} - \alpha\beta'R_{1,t})'\Omega^{-1}(R_{0,t} - \alpha\beta'R_{1,t})\right).$
This is estimated by fixing $\beta$ and estimating $\alpha$ and $\Omega$ by the OLS regression of $R_{0,t}$
on $\beta'R_{1,t}$ in (5.61). Defining the product moment matrices $S_{ij} = T^{-1}\sum_{t=1}^{T}R_{i,t}R_{j,t}'$,
this gives:

(5.63) $\hat\alpha(\beta) = S_{01}\beta(\beta'S_{11}\beta)^{-1}$

(5.64) $\hat\Omega(\beta) = S_{00} - S_{01}\beta(\beta'S_{11}\beta)^{-1}\beta'S_{10}$

(5.65) $L_{\max}^{-2/T}(\beta) = \left|\hat\Omega(\beta)\right| = \left|S_{00} - S_{01}\beta(\beta'S_{11}\beta)^{-1}\beta'S_{10}\right|$

(5.66) $= |S_{00}|\,\dfrac{\left|\beta'\left(S_{11} - S_{10}S_{00}^{-1}S_{01}\right)\beta\right|}{|\beta'S_{11}\beta|},$
where the last line is a rearrangement of the line above, factoring out the
$S_{00}$ term. The likelihood is then maximised by minimising (5.66) over $\beta$, which is
done by solving the eigenvalue problem $\left|\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}\right| = 0.$
This gives eigenvalues $\hat\lambda_1 > \cdots > \hat\lambda_p$ and eigenvectors $\hat V = (\hat v_1, \ldots, \hat v_p)$ such
that:

(5.67) $\hat\lambda_i S_{11}\hat v_i = S_{10}S_{00}^{-1}S_{01}\hat v_i.$
Then the eigenvectors for the $r$ largest eigenvalues are taken and normalised on
$S_{11}$ by the $\beta'S_{11}\beta$ term in (5.66), giving:

(5.68) $L_{\max}^{-2/T}(H_r) = \left|\hat\Omega(\hat\beta)\right| = |S_{00}|\prod_{i=1}^{r}(1 - \hat\lambda_i).$
The eigenvalues can be interpreted as the squared correlations between combinations
of the levels of the data and the differences, and hence, as described above in
equation (5.58), they pick out the combinations of levels that are most stationary,
since the differences are stationary and in the limit there cannot be correlation
between random walks and stationary series.
5.4.1. Likelihood ratio (trace) test of cointegration rank. Estimation has thus far
proceeded for a general rank, r. This r must be chosen however, and it is done so
using the likelihood ratio test. Since the form of (5.68) that is unrestricted will be
where $r = p$, it follows using the laws of logarithms that the likelihood ratio test
statistic (Section 7) will be:

(5.69) $-2\ln LR(H_r \mid H_p) = -T\sum_{i=r+1}^{p}\ln(1 - \hat\lambda_i).$
Thus the test is that the remaining eigenvalues, beyond the ones taken under the null
to be significant, are equal to zero, and hence that the data support the null hypothesised
value of r.
Testing proceeds from rank r = 0 up to r = p, as opposed to the opposite
direction, because the size and power properties of testing in this direction
are preferable (Johansen 1995). Thankfully this procedure is automated in
most econometrics programs, and carrying it out in PcGive will be outlined in Sec-
tion 5.4.2. It is suggested that one first carries out the trace test before estimating
the cointegrated VAR in PcGive (Sections 5.4.3 and 5.4.4).
5.4.2. Trace testing in PcGive. Having estimated the unrestricted VAR as in Sec-
tion 3.7.1 or 3.7.2, and carried out the various diagnostic checks, and arrived at a
model that is congruent and satisfies the assumptions of the maximum likelihood
framework, then one can proceed to trace testing to determine the cointegrating
rank, r, for the system. To do so, using the PcGive module having estimated the
unrestricted VAR in its satisfactory form, follow:
Test --> Dynamic Analysis and Cointegration Tests... --> [check I(1)
cointegration analysis box] OK
This will provide an output something like:
I(1) cointegration analysis, 1963 (4) to 2005 (1)
eigenvalue loglik for rank
3950.074 0
0.43799 3997.902 1
0.29420 4026.821 2
0.19185 4044.500 3
0.10640 4053.837 4
0.072642 4060.097 5
0.044840 4063.905 6
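The two columns of this output are linked through (5.69): the trace statistic for each null rank can be recovered either from the eigenvalues or from the log-likelihoods. A check using the figures above (the sample 1963(4) to 2005(1) gives T = 166):

```python
import numpy as np

# Eigenvalues and log-likelihoods transcribed from the PcGive output above.
T = 166  # 1963(4) to 2005(1)
lams = np.array([0.43799, 0.29420, 0.19185, 0.10640, 0.072642, 0.044840])
loglik = np.array([3950.074, 3997.902, 4026.821, 4044.500,
                   4053.837, 4060.097, 4063.905])  # ranks 0..6

for r in range(6):
    trace_eig = -T * np.log(1 - lams[r:]).sum()        # from (5.69)
    trace_ll = 2 * (loglik[-1] - loglik[r])            # from the likelihoods
    print(r, round(trace_eig, 2), round(trace_ll, 2))  # agree to rounding error
```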
Running the file will give output something like that in Section 5.4.5.
negligible eigenvalues. However, this is not always the case: there may be a number
of eigenvalues that are neither particularly small nor overly large, evenly spread
over a range between, say, 0.1 and 0.4, making rank determination
less clear. In this situation, where there is ambiguity over which eigenvalues to
take as significant and which to not, extra information needs to be used; however,
as the trace test is the only formal procedure used, it should be accorded most
weight. Possible sources of additional information will be considered in the next
few Sections.
5.5.1. Coefficients in the α matrix. One might consider the coefficients in the α
matrix, and their significance. If there is ambiguity over whether an (r + 1)th
cointegrating vector is stationary, then significant entries in the (r + 1)th column of
α (t-values above about 2.6, as α follows a non-standard distribution) suggest that there is
useful information on the dynamics of the system in this vector.
In PcGive, the α matrix and its standard errors can be found by running a CVAR
with r = p.16 This is done as in Section 5.4.3 or 5.4.4, though setting the cointegrating
rank to p in the relevant places.
5.5.3. Plots of characteristic roots. Another check of correct rank, which may also
yield information on whether there is a possible I(2) problem, is a plot of the
roots of the companion matrix under various imposed ranks. If, for the rank r that is
imposed, there still appears to be a root near the unit circle, then this suggests
that root should also be restricted to unity, and hence the potential additional
cointegrating vector is actually a stochastic trend. However, if regardless of what
rank is imposed there is always an additional root near to unity, this suggests there
may be an I(2) problem: differencing alone will not get rid of this root, so restricting
more and more roots to unity will not solve the problem.
In PcGive this is done by following, after the CVAR has been estimated:
Test --> Dynamic Analysis --> [check Plot roots of companion matrix
box] OK
This will result in the roots of the characteristic polynomial, or of the companion
matrix (the two are equivalent) being printed, as in Figure 8.
16The output from the trace test does not give standard errors for α.
[Figures: one panel plot shows the cointegrating relations, vector1–vector6, over 1970–2000; Figure 8 plots the roots of the companion matrix against the unit circle.]
5.5.4. Economic interpretability. One might give consideration to the $\alpha_{r+1}$ and
$\beta_{r+1}$ vectors that would result if an ambiguous cointegrating vector were added. If
the cointegrating combination of levels, $\beta_{r+1}$, is nonsensical then it might be worth
omitting. Combinations of vectors can be formed into one (see the discussion of
identification in Section 7), and hence adding another column
might lead to a cointegrating vector being dispersed over two columns, leaving each
looking a bit odd and nonsensical on its own. If a particular variable is expected
to be error correcting, such as the price of output in a supply equation,
then it would be hoped there would be a negative α coefficient corresponding to
price in at least one column of α; hence if the price coefficient in $\alpha_{r+1}$ is of the
appropriate sign, the vector might be worth keeping. However, it should be remembered
that the purpose of econometric analysis is to understand what the data reveal for
a particular economic problem, and as such formal testing procedures should be accorded
appropriate weight.
5.5.5. Remove dummy variables and run trace test again. Because it is known that
the inclusion of dummy variables alters the distribution of the trace test, but it
is hard to simulate critical values for every eventuality, the critical values for the
trace test are based on the model with no dummies. Hence it might provide information
on the rank if the trace test is calculated again without dummy variables
in the model.
In PcGive this involves omitting the dummy variables (apart from the seasonal
dummies) in either the model selection window using the module, or by commenting
out using batch code (this is where the benefit of using batch code enters), running
the unrestricted VAR again and running the trace test again. For the UK labour
market model, this results in:
I(1) cointegration analysis, 1963 (4) to 2005 (1)
eigenvalue loglik for rank
3740.232 0
0.27499 3766.923 1
0.24606 3790.366 2
0.14621 3803.486 3
0.097172 3811.970 4
0.056917 3816.834 5
0.016040 3818.176 6
stationary; this is like the phenomenon of modelling a random walk using a station-
ary process with a number of suitable dummy variables. Hence this suggests that
one should consider the other information presented here before making a choice
on the cointegrating rank.
5.5.6. Simulation of small sample test distribution. A final thing that could be done
is to simulate a VAR process with the coefficient values estimated, to look at the
empirical rejections of various null hypotheses if particular ranks are the true rank
- so if it is not clear whether the rank is 2 or 3, then one might simulate with a true
rank of 3 and consider the probability of falsely accepting the hypothesis of r = 2.
Ideally if the true rank is 2, then one would hope for a 95% rejection frequency
of the null hypothesis of r = 0 and r = 1 (power), and a 5% rejection frequency
of r = 2 (size). With such a simulation one can then alter coefficient values and
sample size to see what would be needed in order to correctly determine the rank
of the system.17
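A rough sketch of such a simulation follows; the bivariate VAR(1) with true r = 1, the absence of deterministic terms, and the approximate asymptotic 95% critical values are all assumptions for illustration (exact critical values should be taken from published tables):

```python
import numpy as np
from scipy.linalg import eigh

# Monte Carlo sketch: simulate a CVAR with known rank and count how often the
# trace test rejects each null.  True r = 1, p = 2, no deterministic terms.
rng = np.random.default_rng(1)
T, reps = 200, 200
alpha, beta = np.array([-0.25, 0.25]), np.array([1.0, -0.7])
crit = {0: 12.3, 1: 4.13}   # approximate asymptotic 95% values (assumption)

def trace_stats(X):
    dX, L = np.diff(X, axis=0), X[:-1]
    n = dX.shape[0]
    S00, S01, S11 = dX.T @ dX / n, dX.T @ L / n, L.T @ L / n
    lams = np.sort(eigh(S01.T @ np.linalg.solve(S00, S01), S11,
                        eigvals_only=True))[::-1]
    return [-n * np.log(1 - lams[r:]).sum() for r in range(2)]

reject = np.zeros(2)
for _ in range(reps):
    X = np.zeros((T, 2))
    for t in range(1, T):
        X[t] = X[t - 1] + alpha * (beta @ X[t - 1]) + rng.standard_normal(2)
    stats = trace_stats(X)
    reject += [stats[r] > crit[r] for r in range(2)]

power, size = reject / reps
print(power)  # rejection frequency of r = 0: should be near 1 (power)
print(size)   # rejection frequency of r <= 1: should be near 0.05 (size)
```

Varying `T` and `alpha` in this sketch shows directly how slow adjustment or short samples make the true rank hard to detect.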
PcGive simulates critical values for the trace test. The existence of this asymptotic
distribution relies on a number of assumptions which must be tested. These
assumptions are:
(1) rank(Π) = r: the correct rank has been chosen. If not, results will be wrong,
since unit roots may be left in the data.
(2) p − r unit roots (so no explosive roots).
(3) A Functional Central Limit Theorem holds for the errors, which requires no
autocorrelation in the errors.
(4) T = ∞ should be roughly approximated. However, T = 100 ≠ ∞, and
small sample problems may exist.
(5) Constant parameters.
This is the reason for the emphasis on testing and checking these assumptions (see
Section 4); they underlie the trace test, the determination of the cointegrating rank,
and hence the classification of the I(0) and I(1) components of the model.
The distributions remain the same if lags are included in the model, because
estimation and rank determination are carried out on the concentrated model of
(5.61); hence however many lags are included, they are concentrated out.18
The inclusion of deterministic terms, nuisance parameters, does change the limiting
distribution. If a restricted constant is included (see case 2 on page 30), the
17I intend to produce an Ox job to do this and if anyone is interested I can email it to them once
I have written it.
18Although as discussed in Section 4.4.1, I(2) analysis requires at least 2 lags, while short-run
identification is affected also by the number of lags included.
To normalise, PcGive sets the leading diagonal to one, i.e., where a ∗ signifies a
freely varying parameter:19

$\beta = \begin{pmatrix}1 & *\\ * & 1\\ * & *\\ * & *\\ * & *\end{pmatrix}.$
This normalisation makes the counting of parameters estimated somewhat more
difficult, since by normalisation at least r parameters are not estimated but fixed.
In fact by linear algebra the β matrix can be row reduced until it has an identity
matrix in the top segment, so:

(7.2) $\beta = \begin{pmatrix}1 & 0\\ 0 & 1\\ * & *\\ * & *\\ * & *\end{pmatrix}.$
Hence there are fewer than $2 \times p \times r$ parameters to estimate; in fact there are
$pr + pr - r^2 = pr + (p - r)r$, since $r^2$ is the size of the identity matrix that can always
be found. This is not a restriction, though: the likelihood will always remain the
same before and after this normalisation is imposed. However, rendering the system in
a more interpretable form, which this normalisation should do, will enable clearer
understanding of the dynamics within it. Restricting the β and α matrices will
be discussed in Sections 7.1 and 7.2; implementation in PcGive will be considered in
Section 7.3, before in Section 7.4 types of restrictions that can aid understanding of
the data are described.
7.1. β restrictions. As an example, a system with p = 3 and r = 2 is taken. There
are always likely to be deterministic terms in a particular model; when testing, one
must consider the role these terms play. The model is:
(7.3) $\begin{pmatrix}\Delta X_{1,t}\\ \Delta X_{2,t}\\ \Delta X_{3,t}\end{pmatrix} = \begin{pmatrix}\alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22}\\ \alpha_{31} & \alpha_{32}\end{pmatrix}\begin{pmatrix}\beta_{11} & \beta_{21} & \beta_{31} & \beta_{41} & \beta_{51}\\ \beta_{12} & \beta_{22} & \beta_{32} & \beta_{42} & \beta_{52}\end{pmatrix}\begin{pmatrix}X_{1,t-1}\\ X_{2,t-1}\\ X_{3,t-1}\\ 1\\ D_{s,t}\end{pmatrix} + \begin{pmatrix}\varepsilon_{1,t}\\ \varepsilon_{2,t}\\ \varepsilon_{3,t}\end{pmatrix}.$
Many questions can be asked about the system which can be answered by placing
restrictions on the β matrix and testing them, such as:
(1) Do X1,t and X2,t cointegrate?
(2) Is X3,t stationary?
(3) Is the spread X1,t − X2,t stationary?
(4) If there exists cointegrating relations, do all have spreads in them?
(5) Can a variable, say X3,t , be excluded?
19Although this can be altered by imposing restrictions, discussed in Section 7.1.
The other form for restrictions is the H-form, where instead of writing down
the restricted coefficients, the model is written in terms of the coefficients that are
unrestricted and hence estimated; it is written:

(7.9) $\beta = H\varphi = \begin{pmatrix}H_1\varphi_1 & H_2\varphi_2 & \ldots & H_r\varphi_r\end{pmatrix},$

where the last part is written in terms of the $\varphi$ matrix that is used to implement
these restrictions. So overall it might be written:

(7.10) $\beta = H\varphi = [H_1\varphi_1, H_2\varphi_2]$

(7.11) $= \left[\begin{pmatrix}? & ?\\ ? & ?\\ ? & ?\end{pmatrix}\begin{pmatrix}\varphi_{11}\\ \varphi_{21}\end{pmatrix},\ \begin{pmatrix}? & ?\\ ? & ?\\ ? & ?\end{pmatrix}\begin{pmatrix}\varphi_{12}\\ \varphi_{22}\end{pmatrix}\right] = \begin{pmatrix}0 & 0\\ \varphi_{11} & \varphi_{12}\\ \varphi_{21} & \varphi_{22}\end{pmatrix},$
and consideration of this for long enough should provide the intuition that H takes the
form:

$H = \begin{pmatrix}0 & 0\\ 1 & 0\\ 0 & 1\end{pmatrix}.$

The first row must be zeros in order that the first row of the resulting β be zero. Then the
identity block is required to map what is in the ϕ matrix to the bottom square part of the
resultant Hϕ product. The dimensions of H and ϕ can be found by considering
the resulting matrix that is required, the far right-hand side of (7.11). Here, in each β vector
there is one restriction, hence there must be $p - m$ rows of ϕ; as $p = 3$ and $m = 1$, then
$p - m = 2$. Then there will be $p - m$ columns in H as a result, else the matrix
multiplication couldn't take place.
Moving on to other kinds of restrictions, restriction 2 asks whether a variable
is stationary. This translates into asking whether the variable on its own is a
cointegrating vector, since in the reduced rank model all cointegrating combinations must be I(0). In
R-form, asking whether $X_{2,t}$ is stationary is equivalent to asking whether or not
A choice must be made here, however, between testing for stationarity per se and
stationarity around deterministic terms, such as a trend or a mean shift. In (7.13)
stationarity per se is being tested, while if one were to test for stationarity around
the constant and mean shift dummy in this model, then the restrictions would be:

(7.13) $R\beta_1 = \begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\end{pmatrix}\begin{pmatrix}\beta_{11}\\ \beta_{21}\\ \beta_{31}\\ \beta_{41}\\ \beta_{51}\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}.$
Consider next the H-form; restrictions are placed on one column of β, and the
other columns are left to vary freely. To find the exact form for H, one could
either work out the orthogonal complement of R, or else construct it as
before, by working backward from the free parameters. On the restricted vector, if
testing is for stationarity around deterministic terms, only the variable being tested
for stationarity and the deterministic terms can vary.20 The restrictions on each β
vector determine the size of each $H_i$ matrix. On the first vector, the coefficients
on $X_{2,t}$, 1 and $D_{s,t}$ are allowed to vary freely; the rest are fixed. Hence there are
two restrictions, and so three free parameters in the $\varphi_1$ vector, along with three columns in the
$H_1$ matrix:
(7.14) $\tilde\beta = \begin{pmatrix}H_1\varphi_1 & H_2\varphi_2\end{pmatrix} = \left[\begin{pmatrix}? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\end{pmatrix}\begin{pmatrix}\varphi_{11}\\ \varphi_{21}\\ \varphi_{31}\end{pmatrix},\ \begin{pmatrix}? & ? & ? & ? & ?\\ ? & ? & ? & ? & ?\\ ? & ? & ? & ? & ?\\ ? & ? & ? & ? & ?\\ ? & ? & ? & ? & ?\end{pmatrix}\begin{pmatrix}\varphi_{12}\\ \varphi_{22}\\ \varphi_{32}\\ \varphi_{42}\\ \varphi_{52}\end{pmatrix}\right]$

(7.15) $= \begin{pmatrix}0 & \varphi_{12}\\ \varphi_{11} & \varphi_{22}\\ 0 & \varphi_{32}\\ \varphi_{21} & \varphi_{42}\\ \varphi_{31} & \varphi_{52}\end{pmatrix},$
20This is usually tested for, since the terms restricted to the cointegration space are restricted in
order to ensure stationarity of the system. It is left for the reader to think about the H-form if
testing was for stationarity per se. Restriction 5 is an exclusion restriction, and might be used
to test mean shift dummies or trends included when one is unsure over their usefulness.
$H_1 = \begin{pmatrix}0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}, \qquad H_2 = \begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\end{pmatrix}.$
(7.16) $\beta_1 = H_1\varphi_1 = \begin{pmatrix}? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\\ ? & ? & ?\end{pmatrix}\begin{pmatrix}\varphi_{11}\\ \varphi_{21}\\ \varphi_{31}\end{pmatrix} = \begin{pmatrix}\varphi_{11}\\ -\varphi_{11}\\ 0\\ \varphi_{21}\\ \varphi_{31}\end{pmatrix}.$
Thus equality restrictions between variables lead to more than one non-zero entry
in a particular column of the H matrix in question. Considering the R-form, this
will have dimension $(p - m) \times p$, where m is the number of free parameters; from
(7.16) $m = 3$, hence R has dimension $2 \times 5$ and is:

(7.17) $R = H_\perp' = \begin{pmatrix}1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\end{pmatrix}.$
If the question was whether spreads exist in all cointegrating vectors, restriction 4, this
would place restrictions on the other β vector here, and the spread could be between
21The inclusion of deterministic terms or not is case dependent, and depends on the spread or
cointegrating vector being tested for. Initially here deterministic terms are left unrestricted.
constraints
{
&6=0; &11=0;
}
7.4. Using restrictions to understand the system. A number of tests of restrictions
can be carried out on the α and β matrices which have various implications
for the system. Testing is only on one vector in each matrix since, at this stage,
neither is identified and hence it is irrelevant which vector is being tested on; the
question is rather whether the data can support the restriction being hypothesised.
7.4.1. Weak exogeneity. Weak exogeneity, which was discussed in Section 3, can
be tested easily in this framework. It corresponds to a zero row in the α
matrix. This means that one variable in $X_t$ does not correct to the cointegrating
vector; in other words, the determination of its level is exogenous to the system:
it is determined outside it. As an example, $X_{2t}$ is weakly exogenous in a bi-variate
system with one cointegrating vector ($p = 2$, $r = 1$):

(7.26) $\begin{pmatrix}\Delta X_{1t}\\ \Delta X_{2t}\end{pmatrix} = \begin{pmatrix}\alpha_1\\ 0\end{pmatrix}\beta'X_{t-1} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}.$
The equation for $X_{2t}$ is simply $\Delta X_{2t} = \varepsilon_{2t}$, and so $X_{2t}$ in this simple case is a
random walk, hence not determined inside the model. $X_{2t}$ is a common trend,
a driving force in itself, and consideration of $\alpha_\perp$ shows this: a zero row in
α corresponds to a unit vector in $\alpha_\perp$, which simply picks out that particular variable
with a unity and places a zero on the other variables, here just $X_{1t}$. The
Granger representation in (5.16) shows that the common trends, the random walk
component, are $\alpha_\perp'\sum_{i=1}^{t}\varepsilon_i$, and there are $p - r$ (= 1 here) of these; because
$\alpha_\perp$ must have the form $\alpha_\perp = (0, \alpha_{\perp,2})'$, then $\alpha_\perp'\sum_{i=1}^{t}\varepsilon_i = \alpha_{\perp,2}\sum_{i=1}^{t}\varepsilon_{2i}$, and so $X_{2t}$ can
be isolated as the variable driving the system, with $X_{1t}$ simply correcting to it,
within the movements prescribed bythe cointegrating vector. If testing suggests a
variable is weakly exogeneous, then that variable can be transferred from Xt to Zt
and treated as a weakly exogeneous variable in the system. This can be written
in two equivalent forms; first by just including the contemporaneous differences of
Zt , and as many lagged differences of Zt as the model has for Xt . The alternative
formation is to partition Xt = ( Yt | Zt ), i.e. to condition on Zt . Then the model is
written:
∆Yt = αβ 0 Xt−1 + Γ1 ∆Xt−1 + γ∆Zt−1 + γ0 ∆Zt + εt .
One should note that the test for weak exogeneity on any particular variable is a test only on that variable; just because more than one row looks to be zero, this does not mean they are necessarily jointly zero. However, testing more than one vector is not too difficult.
7.4.2. Testing for a unit vector in α. The α matrix is commonly described in terms of its column vectors α1, . . . , αr, since these give the response of each variable to the corresponding cointegrating vector. The test for a unit vector in α thus investigates whether or not a particular cointegrating vector could be described as having just one variable correcting to it. Furthermore, a unit vector in αr corresponds to a zero row in α⊥, and given the common trends are defined as $\alpha_\perp' \sum_{i=0}^{t} \varepsilon_i$, then a unit vector implies shocks to that variable have no
52 JAMES READE
The H-form matrix would be:
(7.29) $H = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \end{pmatrix}$.
7.4.5. Testing in PcGive. In PcGive, carrying out the tests described in Sections
7.4.1–7.4.4 involves imposing restrictions on the α and β matrices, which was in-
troduced in Section 7.3. However, it is not simple to put together the code to test
for weak exogeneity of so many possible variables, and over all the possible rank
restrictions. It is informative to consider the tests described in the previous few Sections for different ranks because, for example, if a variable becomes endogenous between rank r and r + 1, this suggests that that variable corrects to a particular cointegrating vector. However, there is no way of automating this procedure
as far as I am aware in PcGive.22 A manual method can be employed, and to this
effect an Ox job exists that prints the batch code for a given number of variables
and deterministic terms called CreateBatchCode.ox. In the job, simply alter the
iP and iDets variables as appropriate, and the code for all ranks for the four types
22This is a big advantage of the CATS for RATS software, which automates the tests in Sec-
tions 7.4.1–7.4.4 for all possible rank restrictions, and tables the output.
of test will be printed as output. Copy and paste this into the constraints part
of the batch file, and proceed through the various restrictions testing one at a time;
a procedure might be:
(1) comment in one line;
(2) run the batch file --> output of restricted VAR;
(3) scroll to bottom of output (ctrl+page dn);
(4) copy the likelihood ratio test statistic and p-value (in square brackets) and paste them on the line of the batch file for this restriction;
(5) comment the line back out;
(6) go back to (1) until all restrictions have been done.
Having done this, one can analyse the test statistic outcomes. During the process
the constraints part of the batch file (which with the restriction code will become
very long) should look something like:
constraints
{
//Exclusion (beta) restrictions.
//rank = 1
//Variable 0
//&5=0; 83.257 [0.0000]**
//Variable 1
//&6=0; 8.7046 [0.0032]**
//Variable 2
//&7=0; 0.23305 [0.6293]
//Variable 3
//&8=0; 1.3792 [0.2402]
//Variable 4
//&9=0; 1.4983 [0.2209]
//Variable 5
//&10=0; 1.7782 [0.1824]
//Variable 6
&11=0;
//rank = 2
//Variable 0
//&10=0; &17=0;
//Variable 1
//&11=0; &18=0;
It might help to have these tabulated, and an Ox job exists which takes the batch file, once all the tests have been carried out, and creates LaTeX code for tabulating the results. This job is imaginatively called ReadTestResIntoTex.ox.23 Also
23In its current form it relies on the line spacing remaining as it is when printed out from the
CreateBatchCode.ox job, and also requires that the test statistics and p-values are pasted at the
end of the line (to make sure they’re put in the right place, use ctrl+end then copy there).
within this approach, multiple restrictions can be tested, to see if they hold together; for example, if two variables appear individually to be weakly exogenous,
by commenting in the two lines that relate to these variables, one can test the joint
weak exogeneity of the two variables.
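As a rough illustration of what a job like CreateBatchCode.ox must do, the following Python sketch prints one commented restriction line per variable and rank. The &-parameter numbering scheme used here (column-major over β, restarting after each rank block) is a simplified assumption, not PcGive's actual scheme, so treat this as a sketch of the loop structure rather than a drop-in replacement:

```python
# Hypothetical generator for exclusion (beta) restriction batch lines,
# mimicking the role of the CreateBatchCode.ox job described in the text.
# The parameter numbering is an illustrative assumption.
def exclusion_restrictions(n_vars, max_rank, base=0):
    lines, offset = [], base
    for r in range(1, max_rank + 1):
        lines.append(f"//rank = {r}")
        for v in range(n_vars):
            # zero out variable v's coefficient in every beta column
            refs = "; ".join(f"&{offset + v + k * n_vars}=0" for k in range(r))
            lines.append(f"//Variable {v}")
            lines.append(f"//{refs};")
        offset += r * n_vars
    return lines

for line in exclusion_restrictions(n_vars=3, max_rank=2):
    print(line)
```

The output mirrors the shape of the constraints block shown above: one commented line per variable, with r parameter references per line once the rank exceeds one.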
7.4.6. Testing candidate cointegrating vectors. Theory ought to have provided a
number of candidate cointegrating vectors, which ought to be tested at this point
to see if in fact they are I(0) given the data, and if not, what is needed to make
them I(0). Possible combinations of these vectors will be considered in Section 7.5
when identification of the system is discussed.24 While it is informative to consider
the restrictions of Sections 7.4.1–7.4.4 for all possible rank conditions, there is little
to be gained from varying rank when testing candidate cointegrating vectors.
There may not necessarily be a one-to-one relation between theoretical cointegrating vectors and empirical ones; either theory relations are a combination of partial empirical ones, or vice versa. Identification may be a problem, but theory relations may also not exist on their own, only in combination with other theoretical relations.
When testing candidate cointegrating vectors in PcGive, a similar strategy to that in Section 7.4.5 might be used. For each possible cointegrating vector, writing a line in the batch code with a comment describing the relation may prove useful, especially at the later stage of identification (Section 7.5), since combinations of candidate restrictions can then be commented in as appropriate. Thus an example for the constraints section of the batch code might be:25
//&19=1; &18=-&20-1; &21=0; &22=0;
//2.6422 [0.1041] price wedge and productivity relation
//&18=1; &19=-1; &20=0; &21=0; &22=0;
//10.688 [0.0048]** pure phillips curve
//&18=1; &19=-1; &20=0; &22=0;
//5.1466 [0.0233]* allowing output
//&18=1; &19=0; &20=-&22; &23=0; //labour demand 0.018685 [0.8913]
In terms of accepting restrictions, it is nice to have p-values of above 20%, which give good support to the restriction being imposed. The suggestion is that p-values between 10% and 50% give slight support, while those above 50% give strong support. The closer the test statistic is to conventional critical values, the further into the tails of the distribution are the hypothesised restrictions, and hence the less plausible these restrictions are.
7.5. Identification of β. The β matrix is said to be identified when one can tell $\beta_i' X_t$ from $\beta_j' X_t$, i ≠ j. In (7.23) the cointegrating relations could be written:
(7.30) $\beta_1' X_t = \beta_{11} X_{1,t} + \beta_{21} X_{2,t} + \beta_{31} X_{3,t}$
(7.31) $\beta_2' X_t = \beta_{12} X_{1,t} + \beta_{22} X_{2,t} + \beta_{32} X_{3,t}$,
and without additional information it would be impossible to tell the two vectors apart; while there will be numbers for the β coefficients, linear combinations of the vectors in β and α could be taken to leave the β vectors looking completely different.
A way of making each vector, (7.30) and (7.31) distinct is required, so that if linear
24This Section can be ignored if one wishes to identify the system using strategy 2 in Section 7.5.
25The example is taken from the UK labour market job. Each restriction is split over two lines
to avoid breaking the margins on this page!
combinations are taken, the restrictions placed on each vector are destroyed; hence the only matrix that can be placed between α and β is the identity matrix. This will involve placing zero or equality restrictions in each matrix. Formally, there are rank and order conditions to establish identification, and there are three classes of identification:
• Rank and order conditions:
Rank condition: If rank($R_i'\beta$) ≥ r − 1, i = 1, . . . , r, then cointegrating relation i is identified, where
$R_i'\beta = R_i'H\varphi = R_i'\begin{pmatrix} H_1\varphi_1 & \dots & H_r\varphi_r \end{pmatrix}.$
• Classes of identification:
Just-identified: Just enough restrictions so that the likelihood is not altered; the rank condition holds with equality, and only the r − 1 restrictions that are found by linear combination of α and β are used. This is formally, or non-economically, identified. Economically, one might choose which variables to set to zero to satisfy economic relations such as demand or supply.
Over-identified: The likelihood is altered, and the rank condition holds with inequality. Hence an LR test can be carried out to test these restrictions.
Under-identified: The restrictions in place do not identify the system.
A few examples might help. Taking the matrix:
$\begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix},$
then the first thing to do might be to impose a zero in each row, if a variable is not thought to belong to that relation:
$\begin{pmatrix} 0 & \beta_{21} & \beta_{31} \\ \beta_{12} & 0 & \beta_{32} \end{pmatrix}.$
This identifies the system, since adding the first row to the second destroys the zero restriction in place in the second column, and adding the second row to the first destroys the zero restriction on $\beta_{11}$. Formally the rank condition can be checked.
This is rank($R_i'\beta$) = rank($R_i'H\varphi$) ≥ r − 1. It is pointless to consider $R_1$ and $H_1$, or $R_i$ and $H_i$ more generally, since $R_i = (H_i)_\perp$ and hence $R_i'H_i = 0$ by construction. So for identification of the first cointegrating relation, consider rank($R_1'H_2$):
$R_1' = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix},$
and
$H_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix},$
and rank($R_1'H_2$) ≥ 1 needs to be satisfied. So:
$R_1'H_2 = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \end{pmatrix},$
which has rank 1, hence satisfying the condition. A similar argument would show the second relation is identified.
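These rank computations are easy to verify numerically; a small illustrative Python check (numpy's matrix_rank standing in for the hand calculation, covering both the identified and the unidentified cases discussed here):

```python
import numpy as np

# Rank condition check for the identified example: beta_1 = (0, b21, b31)',
# beta_2 = (b12, 0, b32)'. Identification of vector 1 needs
# rank(R1' H2) >= r - 1 = 1.
R1t = np.array([[1, 0, 0]])               # R1': zero restriction on vector 1
H2 = np.array([[1, 0], [0, 0], [0, 1]])   # design matrix of vector 2
print(np.linalg.matrix_rank(R1t @ H2))    # -> 1: identified

# The unidentified example: beta_2 = (0, 0, b32)' gives H2 = (0, 0, 1)'.
H2_bad = np.array([[0], [0], [1]])
print(np.linalg.matrix_rank(R1t @ H2_bad))  # -> 0: not identified
```

The same check generalises to any r by testing rank($R_i'H_j$) for each pair of vectors.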
However, if either $\beta_{11}$ or $\beta_{22}$ were equal to zero, then the system wouldn't be identified; the following system is not identified:
$\begin{pmatrix} 0 & \beta_{21} & \beta_{31} \\ 0 & 0 & \beta_{32} \end{pmatrix},$
because adding the second vector to the first does not destroy any restrictions in place in that vector. This can be shown formally by considering rank($R_1'H_2$) ≥ r − 1, since this is the condition that identifies the first cointegrating vector, the one to be checked for identification. Now
$R_1' = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$
so rank($R_1'H_2$) = 0 and the vector is not identified. Equality restrictions can also be used:
$\begin{pmatrix} 0 & \beta_{21} & \beta_{31} \\ \beta_{12} & -\beta_{32} & \beta_{32} \end{pmatrix},$
which would identify both vectors, since adding the first to the second, provided $\beta_{21} \neq -\beta_{31}$, would destroy the equality restriction there, while adding the second to the first vector would destroy the zero restriction on $\beta_{11}$.26
7.5.1. Identification Strategies. There are two strategies one might follow, which
are:27
(1) Just-identify then restrict insignificant parameters. This strategy involves imposing just-identifying restrictions, i.e. a zero in each β vector, and then, having done this, imposing further restrictions if need be, and if possible. Thus one might exclude a variable that is clearly expected to affect labour supply only, and one that would surely only affect demand; having done this, then for parsimony, given that standard errors can now be reported on the identified system, insignificant variables could be omitted from each vector, provided that in doing so the system doesn't become unidentified.
(2) Impose known cointegrating vectors Prior testing of stationary re-
lations based on theory might have provided a number of potential coin-
tegrating vectors. Then one might select a number of these and see if
they exist together, since thus far the test has been whether they exist in
isolation.
26The rank conditions also support this: $R_1' = (1, 0, 0)$, $R_2' = (0, 1, 1)$,
$H_1 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix},$
hence $R_1'H_2 = (1, 0)$ and $R_2'H_1 = (1, 1)$, so both have rank 1 as required.
27There is a third strategy, which involves having CATS. It is an option on the CATS for RATS package called CATSmining. This procedure finds all possible stationary cointegrating relations and then finds all combinations of these and reports all possible models. This undoubtedly is data-mining, and has no economic meaning to it. However, it is certainly useful for getting to grips with the kind of models that might exist, and the kind of relationships that might cointegrate together.
7.6.1. Identifying short-run structure in PcGive. Having identified the long run
structure and run the model, go to:
Test --> Further Output...
--> [check Batch code to map CVAR to I(0) model box]
--> OK
Some batch code will appear in the Results window in GiveWin at this point. It
should look something like:
// Batch code to map CVAR to model with identities in I(0) space
algebra
{
Dlnt = diff(lnt, 1);
Dllt = diff(llt, 1);
Dwmpc = diff(wmpc, 1);
Dympy = diff(ympy, 1);
Dpwedgey = diff(pwedgey, 1);
Ddlpyt = diff(dlpyt, 1);
CIa = +0.0135068 * lnt -0.311093 * wmpc +0.0893986 * ympy...
CIb = +0.0421891 * lnt -0.0421891 * llt -0.285783 * wmpc...
}
system
{
Y = Dlnt, Dllt, Dwmpc, Dympy, Dpwedgey, Ddlpyt;
I = CIa, CIb;
Z = Dlnt_1, Dllt_1, Dwmpc_1, Dympy_1, Dpwedgey_1, Ddlpyt_1,
CIa_1, CIb_1, Constant;
U = , CSeasonal, CSeasonal_1, CSeasonal_2, dum651p, dum731p...
}
model
{
dum19842p, dum19842p_1 ;
}
estimate("FIML", 1963, 4, 2005, 1, 0, 0);
Finally highlight the code from the algebra line onwards, down to the
estimate("FIML",... line, and either press ctrl+B or right click and select
run --> Run as batch, and the model should run. Having run this code, a
simultaneous equations output will result. One should then proceed by restricting
all insignificant variables to be equal to zero; this is done by going back to the
PcGive module, and clicking on:28
Model --> Formulate... --> OK --> OK
which will bring up a Model Formulation window from which each equation can
be considered. Deleting a variable from here will delete it from that particular
equation. Having deleted a number of variables, the model can be run, and one
should continue doing this until either all insignificant variables have been deleted,
or the test of the restrictions, which is reported at the bottom of the output, on a
line looking a bit like
LR test of over-identifying restrictions: Chi^2(9) = ...
has been rejected.
7.6.2. Other possible strategies for short-run structure identification. More ‘economic’ identification procedures are possible; one such is to impose a Choleski decomposition on the variance-covariance matrix Ω̂, and hence on the entire model. This is problematic, as it is non-unique and implies a causal chain which may not be economically justifiable. Another possible method is to isolate large off-diagonal entries in the variance-covariance matrix, and then take a linear combination of the equations relating to the two correlated variables in order to remove this correlation. However, this is likely to lead to parameters that are hard to interpret, especially since it is plausible that a number of high correlations are simply between different prices, and while relative prices are useful, it may still be more useful to have the prices themselves in the model. The assumption that Ω̂ be diagonal was never made in this context, so it is not a problem; the only possible problem emerges when one seeks to interpret the moving average representation in a structural manner.
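The non-uniqueness of the Choleski option can be seen in a few lines; a Python sketch, with a made-up residual covariance matrix standing in for Ω̂:

```python
import numpy as np

# Cholesky factorisation Omega = P P' orthogonalises the residuals but
# imposes a causal ordering. Omega here is illustrative, not estimated.
Omega = np.array([[1.0, 0.4],
                  [0.4, 0.8]])
P = np.linalg.cholesky(Omega)
print(np.allclose(P @ P.T, Omega))  # True: a valid factorisation

# Reverse the variable ordering, factorise, and map back: the result
# differs from P, i.e. the implied causal chain is not unique.
perm = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
P_rev = np.linalg.cholesky(perm @ Omega @ perm.T)
print(np.allclose(perm @ P_rev @ perm.T, P))  # False: ordering matters
```

Each ordering gives a different lower-triangular factor, which is exactly the ordering-dependence objection raised above.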
Having reached this stage, the cointegrated VAR model is estimated, and can
be reported. Each equation can be individually reported and interpreted, since
each one is identified. For example, taking again the UK labour market model,
there is an equation for each variable in the model, and below two of the seven
are reported. Each equation is given below, where ecm1t−1 and ecm2t−1 refer
to the two cointegrating vectors in (7.34) and (7.35) respectively, and each will be
commented on in turn. The seasonal dummies, which were not omitted from any
of the equations, are not reported to aid readability.
28Alternatively, clicking the Model Settings button (the middle of the three buttons with tetris
like blue and red bricks on) and clicking “ok” will lead to the same place.
This first equation, for employment, confirms that the variable is indeed weakly exogenous; the restriction is accepted easily. Furthermore, it suggests little has been done to properly explain this variable in this system, since it depends only on the difference of employment and the labour force, suggesting demographic factors are equally, if not more, important in explaining where employment is. It is possible that results would be different if the sample were split: in the 1960s and 1970s employment was quite likely weakly exogenous, and as such a pushing factor, due to strong labour unions and restrictive regulation of the labour market meaning firms could not alter their employment levels to suit demand and market conditions, whereas in the second half of the sample employment was likely more responsive to the variables included in the model. The near-unity coefficient on the lagged first difference of employment also lends support to the suggestion that the variable is near I(2).
The labour force equation provides less suggestion of I(2)-ness, since the variable does not include its own lag and hence displays no significant persistence. The
change in the labour force is related to changes in employment, suggesting that
while demographic factors such as female participation might affect this, employ-
ment opportunities will still draw in peripheral workers, since the sign is positive.
Employment reacts positively to both cointegrating relations and hence is not cor-
recting, perhaps reflecting the fact that adjustment takes place in other variables,
possibly prices, since despite more flexible labour markets, it is still unlikely firms
can shed or hire workers to the extent they would ideally like, hence making its
movement more sluggish and less likely to be quickly adjusting.
9. Data
Great consideration must be given to the collection of data and initial analysis
of it. Ultimately it is the correct inclusion of important variables that determines
the quality of the econometric model that follows, as Section 4 on diagnostic testing
suggested; however, selection of variables is quite possibly the hardest part of the
whole analysis. This Section will discuss firstly identifying the economic problem,
then creating a list of potential variables for inclusion, then factors enabling this list
Term structure of interest rates: Theory suggests that the difference between long and short interest rates should be stationary, so:
(9.6) $\beta_3' X_t = i^l_t - i^s_t$.
Giese (2005) gives a recent investigation of term structure using the CVAR,
and indeed term structure was one of the first applications of cointegration
analysis (Engle & Granger 1987).
Fisher real interest rate parity: This is a decomposition of nominal interest rates into real interest rate and expected inflation components, where r is the real interest rate:
(9.7) $\beta_4' X_t = i^m_t - r^m_t - E^e_t(\Delta_m p_{t+m})/m$.
Thus all these four theoretical relationships suggest possible cointegrating relationships that can be tested. The analysis of Juselius & MacDonald (2004) found that for the US and Germany none of the above parities was stationary on its own, but that combinations of the conditions were stationary; often the purchasing power parity condition helped ‘make stationary’ other parity conditions.
9.2. Use of past theoretical and empirical work to derive a list of relevant variables for inclusion. Having decided this, existing theories and past empirical analyses of the economic entity of interest should initially be consulted, and a list of potential variables for inclusion drawn from there. The whole premise of cointegrated VAR analysis is to look at what the data support, and hence one should allow established theories to motivate the choice of variables, but attention should not be restricted to any one particular theory; once the variables are selected, one should allow the data to determine which variables play which role, and hence which theory or theories are supported.
There exists a huge range of macroeconomic models of the labour market. Pesaran (1991) generalises the standard adjustment cost model of factor demand for labour demand determination. The model specifies some desired level of employment, $l^*_t$, which is a function of $x_t$, the set of variables determining this desired level. The model has adjustment costs, which Pesaran specifies to be the difference in employment and the second difference, or acceleration, in employment, and an optimisation problem is posited, the solution of which takes a Vector Autoregressive (VAR) form. Modern neo-classical representative agent models also exist which attempt to bring microeconomic foundations into macroeconomic models.
Pétursson & Sløk (2001) present one such model, solving it and estimating a cointegrated VAR in order to identify stationary relations motivated by theory. These relations are, for employment, repeating equation (2.1), where the variables are employment ($n_t$), output ($y_t$), wages ($w_t$) and consumer prices ($p^c_t$):
$n_t = \varphi_0 + \varphi_1 y_t - \varphi_2 (w - p^c)_t$.
The model also provides a steady state relationship for real wage determination, where $l_t$ is the labour force and $p^y_t$ is producer prices:29
(9.8) $(w - p^c)_t = \varphi_0 + \varphi_1 (y - p^y - n)_t + \varphi_2 (l_t - n_t)$,
Another labour market analysis is that given in Juselius (2006, Chapter 19), which
considers a larger dataset, producing a number of candidate cointegrating vectors.
Among these are Phillips curve type relations involving unemployment and inflation, along with several relations involving the difference between consumer and producer price inflation, the price wedge. This variable, written as $p^y_t - p^c_t$, can
29This equation is written in a slightly odd form in order to express the variables that were eventually included in the final model in Reade (2005). $(y - p^y - n)_t$ is productivity, and $(l_t - n_t)$ is unemployment.
be seen to measure the bargaining power the two sides hold in wage negotiations, and also the degree of openness of an economy to foreign trade.
Another possible cointegrating relation inspired by economic theory might capture Okun's law (Okun 1962), which states the negative relationship between real
output and unemployment with a coefficient of −0.4. On the other hand there
might be a number of pushing forces; in a time of strong labour unions one might
perhaps even suggest real wages or employment to be a pushing variable, with per-
haps price or producer inflation correcting to equilibrium. In times of lower union
power one might expect real wages perhaps to exist in a stationary relationship
with economic activity (proxying productivity) and unemployment.
There are numerous other labour market analyses, such as Carstensen & Hansen
(2000) for Germany, Jacobson, Vredin & Warne (1997) and Jacobson, Vredin &
Warne (1998) for Scandinavian countries, and Corsini & Guerrazzi (2004), who consider the Italian labour market using the cointegrated VAR methodology.
Hendry (2001) considers economic modelling over the very long term, from 1875–
1991. The employment rate, controlling for the participation of women, has been
reasonably constant in the very long term, which leads one to search for other
factors that have been reasonably constant over the very long term; one might be
the output gap, another might be real interest rates. Furthermore, it is to be
expected that despite the efforts of unions, that firms have more power to set both
wages and employment levels, hence the factors determining a firm’s labour demand
ought to be considered. One such factor is the capital stock, which Pétursson &
Sløk (2001) do not model on the grounds that it is poorly measured. If the
labour force of the UK was doubled but the capital stock remained the same,
the employment rate would fall sharply, as would real wages. However, if the
capital stock was to rise commensurately, then one imagines little or no effect upon
the employment rate and real wages. Further, capital plays a large role in how
productive workers are, which then determines how much firms demand. Such a
long term view of employment would cast doubt on real wages as an explanatory
factor; they have risen dramatically over the last 100 years, while the employment
rate has not similarly risen.
A brief look across the wider literature might throw up other candidate variables.
A list of candidate variables for inclusion might be:
(1) employment;
(2) labour force;
(3) hours worked;
(4) wages;
(5) output;
(6) capital;
(7) productivity;
(8) real interest rates;
(9) vacancies;
(10) benefit levels;
(11) unemployment rate;
(12) population;
(13) female participation rate;
(14) consumer prices;
$T(X) = (X^\lambda - 1)/\lambda,$
where λ is the transformation parameter. When λ = 0 the transformation is:
$T(X) = \ln(X).$
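This transformation family (the Box–Cox form) is a one-liner in any language; a minimal Python sketch, with the λ → 0 limit checked numerically:

```python
import numpy as np

# Box-Cox transformation T(X) = (X^lambda - 1)/lambda, with the log as
# the limiting case lambda -> 0.
def box_cox(x, lam):
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

x = np.array([1.0, 2.0, 4.0])
print(box_cox(x, 1.0))  # -> [0. 1. 3.], just a linear shift of the data
# For small lambda the transformation approaches the log:
print(np.allclose(box_cox(x, 1e-8), np.log(x), atol=1e-6))  # True
```

Choosing λ = 0 (the log) is the usual case in this setting, since macroeconomic series are typically modelled in logs.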
There are a number of considerations which may help reduce the size of the system. Firstly, it is likely that a number of variables in the list in Section 9.2 will be highly collinear; while some degree of correlation between variables is necessary for any meaningful analysis, if correlation is very high, it is likely the two variables are explaining the same thing. Also, a number of the variables in the list can be found by transformations of other variables. For example, productivity can be found from real output and employment, and hence instead of including all three variables, just two are included. Employment and hours worked will also to a large extent explain the same thing, and so one or the other ought to be included.
(9.9) $x_t = x_{t-1} + \varepsilon_t$,
[Figure: UK CPI and a second series in levels (scale 2.4e7–2.8e7), 1965–2005.]
(9.10) $x_t = x_0 + \sum_{i=1}^{t} \varepsilon_i$.
This characterises the random walk: wherever the process is at point t, it is a function of its initial value and every shock that has hit the process since then. Changing the notation a touch, if the process is prices, $p_t$, and inflation is denoted $\pi_t$, then the price process can be written as:
(9.11) $p_t = p_{t-1} + \pi_t$.
Cumulating (9.11) gives:
(9.12) $p_t = p_0 + \sum_{i=1}^{t} \pi_i$.
If inflation is itself a random walk,
(9.13) $\pi_t = \pi_0 + \sum_{i=1}^{t} \varepsilon_i$,
then substituting (9.13) into (9.12) gives:
(9.14) $p_t = p_0 + \pi_0 t + \sum_{j=1}^{t} \sum_{i=1}^{j} \varepsilon_i$,
and hence it can be seen prices are a function of the initial price level, a linear time
trend, and a doubly summed error. This is the characteristic of I(2) processes. In
the cointegrated VAR, the moving average representation, or solution, of the vector Xt of processes is called the Granger representation (Johansen 1995); in I(1) models it is given in (5.16), but in I(2) models it becomes much more complicated:
(9.15) $X_t = B_0 + B_1 t + C_2 \sum_{j=1}^{t} \sum_{i=1}^{j} \varepsilon_i + C_1 \sum_{i=1}^{t} \varepsilon_i + C_0(B)\varepsilon_t.$
Equation (9.15) can be seen to follow similar properties to (9.14), with the B matrices functions of initial values, while the C matrices are complicated matrix multiplications involving orthogonal complements of many things, but capture that the process is driven by an I(2) component, the doubly summed part, and an I(1) component, the singly summed part. Writing (9.15) in matrix form for a 2-variable system illustrates what needs to be done:
(9.16) $\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} B_{0,11} \\ B_{0,21} \end{pmatrix} + \begin{pmatrix} B_{1,11} \\ B_{1,21} \end{pmatrix} t + \begin{pmatrix} C_{2,11} & C_{2,12} \\ C_{2,21} & C_{2,22} \end{pmatrix} \begin{pmatrix} \sum_{j=1}^{t}\sum_{i=1}^{j} \varepsilon_{1,i} \\ \sum_{j=1}^{t}\sum_{i=1}^{j} \varepsilon_{2,i} \end{pmatrix} + \begin{pmatrix} C_{1,11} & C_{1,12} \\ C_{1,21} & C_{1,22} \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{t} \varepsilon_{1,i} \\ \sum_{i=1}^{t} \varepsilon_{2,i} \end{pmatrix} + C_0(B)\varepsilon_t.$
Any transformation should be supported by the data, and the idea is to find combinations of variables such that $C_{2,11} - C_{2,21} = 0$ and $C_{2,12} - C_{2,22} = 0$, so that:
(9.17) $\begin{pmatrix} X_{1,t} - X_{2,t} \\ \Delta X_{2,t} \end{pmatrix} = \begin{pmatrix} B_{0,11} - B_{0,21} \\ B_{1,21} \end{pmatrix} + \begin{pmatrix} B_{1,11} - B_{1,21} \\ 0 \end{pmatrix} t + \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \sum_{j=1}^{t}\sum_{i=1}^{j} \varepsilon_{1,i} \\ \sum_{j=1}^{t}\sum_{i=1}^{j} \varepsilon_{2,i} \end{pmatrix} + \begin{pmatrix} C_{1,11} - C_{1,21} & C_{1,12} - C_{1,22} \\ C^*_{1,21} & C^*_{1,22} \end{pmatrix} \begin{pmatrix} \sum_{i=1}^{t} \varepsilon_{1,i} \\ \sum_{i=1}^{t} \varepsilon_{2,i} \end{pmatrix} + C^*_0(B)\varepsilon_t,$
which probably needs a bit of motivation. Differencing a trend gives a constant; differencing a constant leaves nothing; and differencing a twice-integrated error process gives a singly integrated one, hence the ∗'s on the C parameters, because the differenced X2,t variable will have a singly integrated error, although it will be in a different matrix to the doubly integrated one. This is the trick: to knock out the doubly integrated errors by linear combination. This is also the idea for cointegration more generally.
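This knocking-out logic can be illustrated by simulation; a Python sketch with made-up processes in the spirit of (9.11)–(9.17):

```python
import numpy as np

# Illustrative I(2) simulation: inflation is a random walk, so prices
# are doubly summed errors (I(2)); a second series loading on the same
# I(2) trend can be combined with the first to knock that trend out,
# and differencing does the same job. All processes are invented.
rng = np.random.default_rng(1)
T = 4000
eps = rng.normal(size=(T, 2))
pi = np.cumsum(eps[:, 0])        # inflation: singly summed errors, I(1)
p1 = np.cumsum(pi)               # prices: doubly summed errors, I(2)
p2 = p1 + np.cumsum(eps[:, 1])   # second series with the same I(2) trend

def srange(x):                   # sample range: a crude wandering measure
    return x.max() - x.min()

print(srange(np.diff(p1)) < srange(p1))              # True: Δp1 only I(1)
print(srange(np.diff(p1, 2)) < srange(np.diff(p1)))  # True: Δ²p1 is I(0)
print(srange(p2 - p1) < srange(p2))                  # True: trend cancels
```

Both routes to I(1) — differencing once, or taking the linear combination p2 − p1 — leave a series that wanders far less than the levels, which is exactly what the zero block in (9.17) expresses.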
file will open in PcGive, and also back in Excel should more variables need to be added,
or other amendments made.33 It is likely that many transformations of the data
will be needed before the unrestricted VAR can be run. These can be done in
PcGive using the ‘calculator’, having opened the dataset in GiveWin. The dataset
can be opened by following, in GiveWin:
File --> Open Data File...
The ‘calculator’ can be opened by clicking on the button on the toolbar that looks
like a calculator, or by pressing ctrl+C or by following:
Tools --> Calculator...
All sorts of transformations are possible using the calculator, which are listed in
the function window; clicking on the Help button below this gives information on
each transformation. However, it is likely that with any dataset, a large number
of transformations will be made. This will result in a large dataset of the original
variables along with the transformed ones. It is advisable, however, instead of initially transforming variables and saving the dataset in its new form, to create an algebra file of transformations to run on the original dataset each time a session is begun in GiveWin. Such a file could be created using the Algebra window in GiveWin (found by Tools-->Algebra Editor..., or ctrl+A, or pressing the button on the toolbar in GiveWin with a blue "A", a red lower case "l" and something in yellow (a "g"?)), and clicking on Save As... after each new transformation has been written in. However, the transformation that has just been written here is often lost, because clicking on OK:Run causes unsaved code to be lost.
Another method is to have a text file open in the main GiveWin window which
is saved with a .alg extension, which can be run each time another transformed
variable is required, like running an Ox job. An algebra file looks quite similar to
an Ox job, except each variable need not be declared. The abbreviated algebra
file for the UK labour market analysis is:
//log transformations
lpyt=log(pyt); //log of producer prices
lpct=log(pct); //log of consumer prices
lpft=log(pft); //log of foreign prices
...
//nominal-to-real transformations
wmpc = lwt-lpct; //real wages (deflated by consumer prices)
wmpy = lwt-lpyt; //real wages (deflated by producer prices)
ympy = lyt-lpyt; //real output (deflated by producer prices)
...
//other transformations
prodn = ympy-lnt; //productivity measure (real output/employment)
reprat = wmpc-bmpc;//replacement ratio (wages/benefits)
...
As can be seen in this file, some transformations are no longer used, hence are
commented out as they would be in an Ox job. It is perhaps useful, particularly in
more complicated transformations, to write out what the transformation actually
33Excel 2.1 Worksheets will also open in CATS.
is; this way one can keep track of which variable is what, as opposed to two months
later finding a variable called pwedgec and wondering what it might be. When
dummy variables need to be created, they should be created in the algebra file also;
see Section 4.4.2.
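For readers working outside GiveWin, the same transformations are one-liners in any language; a Python sketch mirroring a few lines of the algebra file above (variable names follow the text's conventions, and the raw input values are invented purely for illustration):

```python
import math

# Mirror of a few lines of the GiveWin algebra file, in plain Python.
# Raw levels are made-up numbers, not data from the text.
wt, pct, pyt, yt, nt = 100.0, 50.0, 48.0, 200.0, 20.0
lwt, lpct, lpyt, lyt, lnt = (math.log(v) for v in (wt, pct, pyt, yt, nt))

wmpc = lwt - lpct     # real wages (deflated by consumer prices)
ympy = lyt - lpyt     # real output (deflated by producer prices)
prodn = ympy - lnt    # productivity measure (real output/employment)
print(round(prodn, 4))  # -> -1.5686
```

As in the algebra file, keeping these definitions in one script, with comments, makes it easy to reconstruct later what a variable like prodn actually is.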
References
Carstensen, Kai & Gerd Hansen (2000), ‘Cointegration and common trends on the West German labour market’, Empirical Economics 25(3), 475–493.
Corsini, Lorenzo & Marco Guerrazzi (2004), Searching for long run equilibrium relationships in the Italian labour market: a cointegrated VAR approach. University of Pisa Discussion Paper 43.
Doornik, Jurgen A., David F. Hendry & Bent Nielsen (1998), ‘Inference in cointegrated models: UK M1 revisited’, Journal of Economic Surveys 12, 533–572. Reprinted in: Michael McAleer and Les Oxley (1999), Practical Issues in Cointegration Analysis; Blackwell, Oxford.
Engle, Robert F. & Clive W.J. Granger (1987), ‘Co-integration and error correction: representa-
tion, estimation and testing’, Econometrica 55, 251–276.
Giese, Julia (2005), Level, slope, curvature: characterising the yield curve’s derivatives in a coin-
tegrated VAR model. Unpublished paper, Nuffield College, University of Oxford.
Granger, C. W. J. & P. Newbold (1974), ‘Spurious regressions in econometrics’, Journal of Econo-
metrics 2, 111–120.
Hendry, David F. (2001), ‘Modelling UK inflation, 1875–1991’, Journal of Applied Econometrics 16(3), 255–275.
Hendry, David F. & Carlos Santos (2005), ‘Regression models with data-based indicator variables’,
Oxford Bulletin of Economics and Statistics 67(5).
Hendry, D.F. (1995), Dynamic Econometrics, Oxford University Press, Oxford.
Hendry, D.F. & J.A. Doornik (2001), Empirical Econometric Modelling using PcGive: Volume
II, 3 edn, Timberlake Consultants Press, London.
Jacobson, Tor, Anders Vredin & Anders Warne (1997), ‘Common trends and hysteresis in Scandinavian unemployment’, European Economic Review 41(9), 1781–1816.
Jacobson, Tor, Anders Vredin & Anders Warne (1998), ‘Are real wages and unemployment re-
lated?’, Economica 65(257), 69–96.
Johansen, Søren (1995), Likelihood-based inference in cointegrated vector autoregressive models,
Oxford University Press, Oxford.
Johansen, Søren (2004), What is the price of maximum likelihood. Unpublished paper, Department
of Applied Mathematics and Statistics, University of Copenhagen.
Johansen, Søren, Rocco Mosconi & Bent Nielsen (2000), ‘Cointegration analysis in the presence
of structural breaks in the deterministic trend’, Econometrics Journal 3(2), 216–249.
Juselius, Katarina & David F. Hendry (2000), Explaining cointegration analysis: Part II, Discussion Papers, University of Copenhagen, Department of Economics (formerly Institute of Economics).
Juselius, Katarina & Ronald MacDonald (2004), Interest rate and price linkages between the USA and Japan: Evidence from the post-Bretton Woods period, Technical Report 00-13, University of Copenhagen, Institute of Economics.
Juselius, Katarina (2006), The Cointegrated VAR Model: Methodology and Applications, Oxford University Press. Forthcoming.
King, Robert G., Charles I. Plosser, James H. Stock & Mark W. Watson (1991), ‘Stochastic trends
and economic fluctuations’, American Economic Review 81(4), 819–40.
Kydland, F.E. & E.C. Prescott (1982), ‘Time to build and aggregate fluctuations’, Econometrica
50(6), 1345–1370.
Nielsen, Bent & Anders Rahbek (2000), ‘Similarity issues in cointegration analysis’, Oxford Bul-
letin of Economics and Statistics 62(1), 5–22.
Nielsen, Heino Bohn (2004), ‘Cointegration analysis in the presence of outliers’, Econometrics
Journal 7(1), 249–271.
Okun, Arthur M. (1962), ‘Potential GNP: Its measurement and significance’, American Statistical Association, Proceedings of the Business and Economics Statistics Section pp. 98–104.
Pesaran, M. Hashem (1991), ‘Costly adjustment under rational expectations: A generalization’, The Review of Economics and Statistics 73(2), 353–358.
Pétursson, Thórarinn G. & Torsten Sløk (2001), ‘Wage formation and employment in a cointegrated VAR model’, The Econometrics Journal 4(2), 191–209.
Reade, J. James (2005), A cointegrated VAR analysis of employment. Unpublished paper, Uni-
versity of Oxford.
Solow, Robert (1970), Growth Theory, Clarendon Press, Oxford.
University of Oxford
E-mail address: james.reade@stx.ox.ac.uk