
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 0305–9049

doi: 10.1111/obes.12123

System-Equation ADL Test for Threshold Cointegration with an Application to the Term Structure of Interest Rates
Jing Li

Department of Economics, Miami University, Oxford, OH 45056, USA (e-mail: lij14@miamioh.edu)

Abstract
This paper proposes a new system-equation test for threshold cointegration based on a
threshold vector autoregressive distributed lag (ADL) model. The new test can be applied
when the cointegrating vector is unknown and when weak exogeneity fails. The asymptotic
null distribution of the new test is derived, critical values are tabulated and finite-sample
properties are examined. In particular, the new test is shown to have good size, so the
bootstrap is not required. The new test is illustrated using the long-term and short-term
interest rates. We show that the system-equation model can shed light on both asymmetric
adjustment speeds and asymmetric adjustment roles. The latter is unavailable in the single-
equation testing strategy.

I. Introduction
Some economic theories imply that variables are related only within a certain regime. For
instance, according to the modified law of one price, spatial price linkage exists only in the
regime where price differential exceeds transaction cost. Outside that regime, the arbitrage
becomes non-profitable, so the price linkage breaks. Empirical findings are provided in
Giovanini (1988), Taylor and Peel (2000), Sephton (2003) and Jang et al. (2007), among
others. This type of asymmetric adjustment towards equilibrium is the key idea of the
threshold cointegration introduced by Balke and Fomby (1997). In practice, the interest
is often on testing threshold cointegration, namely, the existence of long-run relationship
with regime-dependent error-correcting speeds.
This paper contributes to the literature by proposing a new system-equation test for threshold cointegration based on the autoregressive distributed lag (ADL) model. The new test improves on existing tests as follows. Enders and Siklos (2001) (ES) first proposed an Engle–Granger-type residual-based single-equation test for threshold cointegration.

JEL Classification numbers: C22, C12, C13


© 2016 The Department of Economics, University of Oxford and John Wiley & Sons Ltd.

The ES test can capture only asymmetric adjustment speeds. In contrast, the new ADL test
is based on a system-equation model, so that it can accommodate not only asymmetric
speeds, but also asymmetric roles. By asymmetric roles, we mean some variables have
error-correcting speeds significantly different from other variables. The asymmetric roles
are illustrated in the application of the new test.
Next, Seo (2006) develops a system-equation test using a generalized error correction model (ECM). The ECM test requires correctly specified cointegrating vectors. But in practice, the cointegrating vector can be misspecified. To make matters worse, many economic theories are not numerically specific, resulting in unknown cointegrating vectors. That means the application of the ECM test can be limited. The new ADL test is not as restrictive, and can
be applied even when the cointegrating vector is unknown. This improvement is attributed
to the fact that in the ADL model, the lagged regressand, rather than the error correction
term, is used as the regressor. Using the lagged regressand has another advantage. The ADL
test is shown to be insensitive to a poorly-estimated or misspecified cointegrating vector.
The ADL test conducts a more efficient grid search for the threshold value than the
ECM test. The ECM test uses the error correction term as the threshold variable, which
is non-stationary under the null hypothesis of no cointegration, and searches the unknown
threshold parameter over a range of fixed values. Because the probability of a non-stationary
series being less than a fixed value approaches zero asymptotically, the threshold parameter
is absent in the limiting distribution of the ECM test. This implies that the information
obtained in the grid search is asymptotically unexploited. In contrast, the ADL test uses the
difference of the error correction term, which is stationary under the null hypothesis, as the
threshold variable. Moreover, the ADL test treats a percentile of the empirical distribution
of the difference of error correction term as the threshold parameter. As a result, the grid
search of the ADL test is over a range of percentiles, rather than a range of fixed values. By
doing so, the searching result is fully utilized regardless of the sample size. In the end the
limiting distribution of the ADL test can be expressed as functionals of the two-parameter
Brownian motion; the threshold parameter is one of the two parameters.
Another improvement over the ECM test is that the ADL test allows for a non-zero deterministic component in the series. By adding a trend to the testing regression, the ADL test is able to distinguish difference stationarity from trend stationarity. This generalization
is important given that many economic series are trending. Following Sims, Stock and
Watson (1990), we develop the asymptotic theory based on the canonical form of the
trend-augmented model.
A single-equation ADL test is considered in Li and Lee (2010). The single-equation
test is easy to compute. However, the validity of that test hinges on the assumption of weak
exogeneity, i.e., only one variable is error-correcting; others are not. See Boswijk (1994)
and Zivot (2000) for more discussion about weak exogeneity. Weak exogeneity becomes
questionable when a system is of high dimension. By dispensing with the weak exogeneity assumption, the new test is less restrictive than the single-equation ADL test.
Finally, the new test extends the linear ADL cointegration test of Banerjee et al. (1986).
The linear test assumes regime-invariant error-correcting speeds. Early works such as
Pippenger and Goering (1993) have shown that the linear test is prone to losing power in the presence of regime switching. The power loss does not occur for the new test because it explicitly accounts for asymmetric adjustment speeds.


Testing for threshold cointegration is non-standard because the threshold parameter cannot be identified under the null hypothesis. Following Hansen (1996), we adopt the sup Wald statistic to address the so-called Davies' problem; cf. Davies (1977, 1987). The
limiting null distribution of the new test is derived as functionals of the two-parameter
Brownian motion.
We conduct Monte Carlo experiments to compare finite-sample performance. Remarkably, the simulation shows that the size distortion of the new test is much smaller than that of the ECM test. Thus the bootstrap is not required for the new test, which is good news
for practitioners. Following Horvath and Watson (1995) and Zivot (2000), we examine the
effect of misspecification of the cointegrating vector. The simulation results indicate that an incorrectly specified cointegrating vector can lead to deterioration in the power of the ECM test, but not of the ADL test. Furthermore, the power of our new test is insensitive to the
assumption of weak exogeneity. That is not the case for the single-equation tests of Enders
and Siklos (2001) or Li and Lee (2010).
We provide an application of the new test, in which the relationship of US short-term
and long-term interest rates is investigated. The focal point is the asymmetry in the process
of error correction. In particular, it is shown that the federal funds rate plays a larger adjusting role than the 10-year interest rate.
Throughout the paper, $1(\cdot)$ denotes the indicator function, $[x]$ the integer part of $x$, and $\Rightarrow$ weak convergence with respect to the uniform metric on $[0,1]^2$. All mathematical proofs
are in the appendix.

II. Threshold vector ADL model


Consider a two-regime threshold vector autoregressive distributed lag model (TVADLM)

$$\Phi(L)\Delta y_t = (c_{11} + c_{12}t + \Pi_1 y_{t-1})1_{1,t-1} + (c_{21} + c_{22}t + \Pi_2 y_{t-1})1_{2,t-1} + e_t, \qquad (1)$$

where $y_t = (y_{1,t},\ldots,y_{n,t})'$ is an $n \times 1$ data vector from a sample of size $T$, $\Delta y_t$ is the first difference, $e_t$ is an $n \times 1$ vector of error terms, and $\Pi_1$ and $\Pi_2$ are $n \times n$ matrices. Intercepts and trends are included in (1), so the model can be applied to trending series. The short-run dynamics are represented by $\Phi(L) \equiv I_n - \sum_{j=1}^{p}\Phi_j L^j$, a $p$-th order matrix polynomial in the lag operator $L$. In practice, the choice of $p$ can be guided by information criteria such as AIC and BIC.

Economic factors such as transaction costs can cause threshold behaviour or regime-switching, which is determined by the indicator functions $1_{1,t-1}$ and $1_{2,t-1}$. Let $\beta$ denote the $n \times 1$ cointegrating vector¹ and

$$x_{t-1} \equiv \beta' y_{t-1} \qquad (2)$$

denote the lagged error correction term. The literature usually lets the regime-switching be governed by $x_{t-1}$ since it measures the deviation from the equilibrium.
¹Here we assume $\beta$ has column rank of 1. The problem of testing $q$ against $q+1$ threshold cointegration relationships is beyond the scope of this paper and is left to future research. It is easy to see why allowing for multiple cointegrating relationships causes technical difficulty. For one thing, there would be multiple error correction terms if the column rank of $\beta$ were greater than 1. In that case, it would be unclear which error correction term, or which combination of them, to grid search over. In other words, there would be ambiguity in specifying the indicator functions. Note that the system-equation ECM test of Seo (2006) and the single-equation test of Enders and Siklos (2001) also assume $\beta$ has column rank of 1.


The next question is which level of xt−1 will trigger the regime-switching. The ECM
test of Seo (2006) searches the threshold value over a range of fixed values, despite the
fact that xt−1 is non-stationary under the null hypothesis of no cointegration. Because a
fixed value is asymptotically irrelevant for a non-stationary series, the threshold parameter
is absent in the limiting distribution of the ECM test.
This paper lets the threshold parameter $\lambda$ be a percentile of the empirical distribution of the difference of the error correction term. That is, the indicator functions are specified as

$$1_{1,t-1} = 1(v_{t-1} < \tilde v_{t-1}(\lambda)); \qquad 1_{2,t-1} = 1(v_{t-1} \ge \tilde v_{t-1}(\lambda)) \qquad (3)$$

where $v_{t-1} = \Delta x_{t-1}$, $\tilde v_{t-1}$ is $v_{t-1}$ sorted in ascending order, and $\tilde v_{t-1}(\lambda)$ denotes the $\lambda$-th quantile of $\tilde v_{t-1}$. Notice that the threshold variable $v_{t-1}$ is stationary under the null hypothesis of no cointegration.² Also notice that the searching range for $\lambda$ is the parameter space of percentiles, which is always restricted to lie between 0 and 1. Searching over percentiles has the benefit that the parameter $\lambda$ always matters, whether in finite samples or in the limit, and whether $x_{t-1}$ is stationary or not. More discussion of using the percentile as the threshold parameter can be found in Li and Lee (2010).
Basically, the indicators say that in regime one the difference of the error correction term is less than the $\lambda$-th quantile, whereas in regime two it is greater than or equal to that quantile. Alternative specifications are possible. For example, the threshold variable can be the $d$-th lag $\Delta x_{t-d}$, and the delay parameter $d$ can be estimated by grid search. A three-regime model can be specified using three indicators: $1_{1,t-1} = 1(v_{t-1} < \tilde v_{t-1}(\lambda_1))$, $1_{2,t-1} = 1(\tilde v_{t-1}(\lambda_1) \le v_{t-1} \le \tilde v_{t-1}(\lambda_2))$ and $1_{3,t-1} = 1(v_{t-1} > \tilde v_{t-1}(\lambda_2))$. Testing for the number of regimes remains an unsolved issue. This paper focuses on the two-regime model (1) for ease of exposition.
To develop the asymptotic theory, we assume

Assumption 1. (a) $e_t$ is a martingale difference (MDS) process with $Ee_te_t' = \Sigma$ and $E|e_t|^4 < \infty$. (b) All roots of $|\Phi(L)| = 0$ are outside the unit circle. (c) $\Delta x_t$ has a bounded density function.

Assumption 1(b) is typically imposed in the literature. Assumptions 1(a) and (b) ensure that the following weak convergence holds: as $T \to \infty$,

$$T^{-1/2}\sum_{t=1}^{[Tr]} 1_{1,t-1}e_t \Rightarrow P\,W(r,\lambda), \qquad (4)$$

where $P$ is the Cholesky factor of $\Sigma$ and $W(r,\lambda)$ denotes the multivariate version of the two-parameter Brownian motion (TPBM) introduced by Caner and Hansen (2001).³ The key point is that the TPBM depends on the threshold parameter $\lambda$, so $\lambda$ is relevant for the asymptotic theory.

The link between the ADL model (1) and the error correction model becomes apparent if we rewrite

$$\Pi_1 = \delta_1\beta', \qquad \Pi_2 = \delta_2\beta', \qquad (5)$$

²The asymptotic theory of Caner and Hansen (2001) requires the threshold variable to be stationary.
³When $\lambda = 1$, $W(r,1)$ becomes the standard multivariate one-parameter Brownian motion.


where vt−1 = xt−1 , and zt−1 = cdf(vt−1 ). Due to the property of empirical distribution
function, in the limit as T → ∞, zt−1 has a marginal uniform distribution U (0, 1), which
satisfies assumption 2 of Caner and Hansen (2001).
Suppose et is an i.i.d (n × 1) sequence with mean zero, finite fourth moments, and
Eet et =  = PP  where 
P is the Cholesky factor. Let ut = D(L)et denote a stationary
t linear

vector sequence with s=0 s|Dij | < ∞ for each i, j = 1,…, n. Define
t = j=1 uj ,  =
s
∞ 
(D0 + D1 +…)P = D(1)P, and 1 = i=1 E(1(zt−1 < )1(zt−1−i < )ut ut−i ). Let W (r, )
denote the n-dimensional two-parameter Brownian motion; W (r, 1) the n-dimensional one-
parameter Brownian motion in r. The asymptotic theory is based on the following lemma.
Lemma 1.

$$a: \quad T^{-1/2}\sum_{t=1}^{[Tr]} 1(z_{t-1} < \lambda)e_t \Rightarrow P\,W(r,\lambda)$$

$$b: \quad T^{-1}\sum_{t=1}^{T} \xi_{t-1}1(z_{t-1} < \lambda)e_t' \Rightarrow \Lambda\left(\int W(r,1)\,dW(r,\lambda)'\right)P'$$

$$c: \quad T^{-2}\sum_{t=1}^{T} \xi_{t-1}\xi_{t-1}'1(z_{t-1} < \lambda) \Rightarrow \lambda\,\Lambda\left(\int W(r,1)W(r,1)'\,dr\right)\Lambda'$$

Proof of Lemma 1a. Let $v_t$ be an i.i.d. $(n \times 1)$ vector with mean zero and unit variance $Ev_tv_t' = I$. Notice that $\{1_{1,t-1}v_{it}\} = \{1(z_{t-1} < \lambda)v_{it}\}$, $i = 1,\ldots,n$, is a strictly stationary and ergodic martingale difference process with variance $E(1(z_{t-1} < \lambda)v_{it})^2 = \lambda$. For any $(r,\lambda)$, the martingale difference central limit theorem implies that

$$T^{-1/2}\sum_{t=1}^{[Tr]} 1(z_{t-1} < \lambda)v_{it} \to_d N(0, r\lambda),$$

and the asymptotic covariance kernel is

$$E\left[\left(T^{-1/2}\sum_{t=1}^{[Tr_1]} 1(z_{t-1} < \lambda_1)v_{it}\right)\left(T^{-1/2}\sum_{t=1}^{[Tr_2]} 1(z_{t-1} < \lambda_2)v_{it}\right)\right] = (r_1 \wedge r_2)(\lambda_1 \wedge \lambda_2).$$

The proof of stochastic equicontinuity for each component of $T^{-1/2}\sum_{t=1}^{[Tr]} 1(z_{t-1} < \lambda)v_{it}$ is lengthy, and can be found in Caner and Hansen (2001). Because the sequences of marginal probability measures are tight, the univariate stochastic equicontinuity carries over to the vector; see lemma A.3 of Phillips and Durlauf (1986). The final result is deduced after applying the Cramér–Wold device and noticing that $e_t = Pv_t$.
Proof of Lemma 1b. We apply the Beveridge–Nelson decomposition to $u_t$ and obtain $u_t = D(1)e_t + \tilde e_{t-1} - \tilde e_t$, where $\tilde e_t = \tilde D(L)e_t$, $\tilde D(L) = \sum_{j=0}^{\infty}\tilde d_jL^j$ and $\tilde d_j = \sum_{s=j+1}^{\infty}d_s$. The stated result is obtained after applying Theorem 2.2 in Kurtz and Protter (1991) and Lemma 1a above.

Proof of Lemma 1c. This is a trivial extension of result 3 of theorem 3 of Caner and Hansen (2001) to the multivariate case.


yt−1 12,t−1 in one regression are different from other regressions. The univariate model (7)
cannot illustrate asymmetric adjustment roles. We will revisit this issue in the application.

III. System-equation ADL test for threshold cointegration


This paper proposes a new test for the null hypothesis of no cointegration:

$$H_0: \Pi_1 = \Pi_2 = 0 \qquad (8)$$

based on TVADLM (1), called the system-equation ADL test for threshold cointegration. Because of (5), testing (8) is equivalent to testing $\delta_1 = \delta_2 = 0$ in (6), so the ADL test and ECM test are closely related. Li and Lee (2010) show that testing (8) can be undertaken in a single-equation framework if each of $\delta_1$ and $\delta_2$ contains only one non-zero element (i.e., only one of $(y_{1,t},\ldots,y_{n,t})$ is error-correcting). This condition is called weak exogeneity, and it can be violated in practice. The system-equation ADL test is more general than the single-equation ADL test because it dispenses with weak exogeneity.⁶
When (8) is rejected, we conclude that the series are cointegrated, but not necessarily in both regimes. The null hypothesis (8) can also be rejected if $y_t$ is linearly cointegrated, i.e., $\delta_1 = \delta_2 \ne 0$. In this sense, the threshold cointegration test extends the linear cointegration test.⁷
Testing (8) is non-standard because under (8) the indicators disappear and $\lambda$ cannot be identified. This so-called Davies' problem was first raised by Davies (1977, 1987). In order to resolve it, we follow Hansen (1996) and calculate the sup-Wald statistic⁸ as follows. First, we estimate the cointegrating vector $\beta$ by OLS if it is unknown, and compute the error correction term (2). Then, for given $\lambda$, we define the indicators (3), fit (1) by OLS, and obtain the estimated coefficients $\hat\Pi = (\hat\Pi_1(\lambda), \hat\Pi_2(\lambda))$, the residuals $\hat e_t(\lambda)$, and the variance-covariance matrix $\hat\Sigma = T^{-1}\sum \hat e_t(\lambda)\hat e_t(\lambda)'$. We stack matrices by letting $Y_t = (y_{t-1}'1_{1,t-1}, y_{t-1}'1_{2,t-1})'$, $Y = [Y_1\,Y_2\ldots Y_T]'$, $e = [e_1\,e_2\ldots e_T]'$. Define the annihilation matrix $Q \equiv I - Z(Z'Z)^{-1}Z'$, where $Z$ stacks $(\Delta y_{t-1},\ldots,\Delta y_{t-p}, d_t)$. Similar to the Dickey–Fuller unit root test, the distribution of the ADL test depends on the deterministic term $d_t$. There are three cases:
(Case I, no intercept or trend): $c_{11} = c_{12} = c_{21} = c_{22} = 0$ in (1);
(Case II, intercept only): $c_{11} \ne 0$, $c_{21} \ne 0$, $c_{12} = c_{22} = 0$ in (1);
(Case III, intercept and trend): $c_{11} \ne 0$, $c_{21} \ne 0$, $c_{12} \ne 0$, $c_{22} \ne 0$ in (1).
Let $F(\lambda)$, $F^d(\lambda)$ and $F^t(\lambda)$ denote the Wald statistics for cases I, II and III, respectively. They have the same form, which is

$$F(\lambda), F^d(\lambda), F^t(\lambda) \equiv [\mathrm{vec}(\hat\Pi)]'[\widehat{\mathrm{var}}(\mathrm{vec}(\hat\Pi))]^{-1}[\mathrm{vec}(\hat\Pi)] \qquad (9)$$
⁶Another difference between the system-equation ADL test and the single-equation ADL test is that we develop a new asymptotic theory for the system-equation ADL test that involves the two-parameter Brownian motion.
⁷Using threshold cointegration tests (including the ADL test) may have a cost. If the system is linearly cointegrated (i.e., $\delta_1 = \delta_2 \ne 0$), then the linear cointegration test will have greater power than the threshold cointegration test. In other words, the power gain of the threshold cointegration test relative to the linear cointegration test depends on the degree of asymmetric adjustment.
⁸This paper focuses on the sup test due to its simplicity. See Andrews and Ploberger (1994) for alternatives.


$$= [\mathrm{vec}(\hat\Pi)]'[(Y'QY)^{-1} \otimes \hat\Sigma]^{-1}[\mathrm{vec}(\hat\Pi)] \qquad (10)$$

$$= \mathrm{vec}(e'QY)'[(Y'QY)^{-1} \otimes \hat\Sigma^{-1}]\mathrm{vec}(e'QY) \qquad (11)$$

$$= \mathrm{trace}[\hat\Sigma^{-1/2}(e'QY)(Y'QY)^{-1}(Y'Qe)\hat\Sigma^{-1/2}], \qquad (12)$$

where $\hat\Sigma^{-1/2}$ is the Cholesky factor of $\hat\Sigma^{-1}$, vec denotes the vec operator and $\otimes$ the Kronecker product operator. Note that the definition of the Wald statistics is (9), but in practice we calculate them using (10). The asymptotic theory is built upon (12). Finally, we search $\lambda$ over a range $\Lambda$, and the sup Wald tests are

$$\sup_{\lambda\in\Lambda}F(\lambda), \quad \sup_{\lambda\in\Lambda}F^d(\lambda), \quad \sup_{\lambda\in\Lambda}F^t(\lambda), \qquad (13)$$

where $\Lambda$ is a symmetric compact set such as $\Lambda = [0.15, 0.85]$.⁹
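The recipe above can be sketched in code. The fragment below is a simplified illustration with our own names, fixing $p = 1$ and Case I (no deterministic terms), and residualizing on $Z$ directly instead of forming $Q$ explicitly, which is equivalent by the Frisch–Waugh theorem:

```python
import numpy as np

def sup_wald_adl(y, grid=None, p=1):
    """Schematic sup-Wald ADL statistic, Case I, p = 1 (illustrative sketch)."""
    if grid is None:
        grid = np.linspace(0.15, 0.85, 15)     # percentile search range Lambda
    T, n = y.shape
    # 1. Estimate the cointegrating vector (1, -theta')' by OLS, form x_t
    theta, *_ = np.linalg.lstsq(y[:, 1:], y[:, 0], rcond=None)
    x = y[:, 0] - y[:, 1:] @ theta
    dy = np.diff(y, axis=0)
    dY = dy[1:]             # Delta y_t
    Ylag = y[1:-1]          # y_{t-1}
    Z = dy[:-1]             # Delta y_{t-1}: short-run dynamics to partial out
    vlag = np.diff(x)[:-1]  # threshold variable v_{t-1} = Delta x_{t-1}
    dYr = dY - Z @ np.linalg.lstsq(Z, dY, rcond=None)[0]   # Q applied implicitly
    stats = []
    for lam in grid:
        tau = np.quantile(vlag, lam)                       # lam-th quantile
        I1 = (vlag < tau).astype(float)
        Ymat = np.hstack([Ylag * I1[:, None], Ylag * (1 - I1)[:, None]])
        Ymr = Ymat - Z @ np.linalg.lstsq(Z, Ymat, rcond=None)[0]
        S = Ymr.T @ Ymr                                    # Y'QY
        B = np.linalg.solve(S, Ymr.T @ dYr)                # stacked (Pi1, Pi2) estimates
        E = dYr - Ymr @ B
        Sigma = E.T @ E / len(E)
        stats.append(np.trace(np.linalg.solve(Sigma, B.T @ S @ B)))  # Wald F(lam)
    return max(stats)

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal((200, 2)), axis=0)   # H0: independent random walks
print(sup_wald_adl(y))
```

Under the null the printed value should typically fall below the Case I critical values in Table 1; this sketch omits the deterministic cases and finite-sample refinements.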

IV. Asymptotic Theory

Let $W(r,\lambda)$ denote the $n$-dimensional two-parameter Brownian motion on $(r,\lambda) \in [0,1]^2$. For notational economy, we write $\int_0^1$ as $\int$. By holding $\lambda$ constant, the integration is over $r$ for stochastic integrals such as $\int W(r,1)\,dW(r,\lambda)'$. Following the literature, we make an auxiliary assumption for cases I and II.

Assumption 2. Under the null hypothesis (8), $y_t$ is generated by (1) with $c_{11} = c_{12} = c_{21} = c_{22} = 0$.

Assumption 2 states that the true process under the null is a vector random walk without drift.¹⁰ The limiting null distributions of the system-equation ADL tests (13) for cases I and II are as follows.

Theorem 1. (Case I) Under (8) and Assumptions 1, 2, as $T \to \infty$,

$$\sup_{\lambda\in\Lambda}F(\lambda) \Rightarrow \sup_{\lambda\in\Lambda}\mathrm{trace}[J_0'J_1^{-1}J_0], \qquad (14)$$

where

$$J_0 = \begin{pmatrix} \int W(r,1)\,dW(r,\lambda)' \\ \int W(r,1)\,dW(r,1)' - \int W(r,1)\,dW(r,\lambda)' \end{pmatrix}, \qquad J_1 = \begin{pmatrix} \lambda\int W(r,1)W(r,1)'\,dr & 0 \\ 0 & (1-\lambda)\int W(r,1)W(r,1)'\,dr \end{pmatrix}.$$

⁹Hansen (1996) points out that the Wald statistic is ill-behaved for extreme values of $\lambda$. Therefore $\Lambda$ cannot be too close to 0 or 1.
¹⁰To understand why the auxiliary assumption is needed, see page 518 of Hamilton (1994), the discussion of case 2 for the augmented Dickey–Fuller test.


TABLE 1
Asymptotic critical values
w/o constant w/ constant w/ trend
1% 5% 10% 1% 5% 10% 1% 5% 10%
 = [0.10, 0.90]
n=2 35.0 30.6 28.3 49.0 43.7 40.6 62.4 54.8 51.5
n=3 61.4 55.3 52.1 78.2 71.7 67.9 95.1 87.3 83.5
n=4 95.5 88.1 83.9 114.8 107.5 103.2 135.5 126.9 122.5
n = 5 138.0 128.8 123.8 160.2 151.4 146.8 184.5 175.0 169.7

 = [0.15, 0.85]
n=2 35.5 30.2 27.9 48.2 42.5 39.9 60.0 54.3 51.1
n=3 61.3 55.2 52.1 77.3 70.4 67.2 95.1 86.9 82.8
n=4 95.4 87.7 83.6 116.6 106.9 102.6 136.1 126.7 122.2
n = 5 137.2 127.9 123.4 161.3 151.0 145.5 184.1 174.2 169.0

 = [0.20, 0.80]
n=2 35.4 29.6 27.3 48.0 42.0 38.9 59.4 53.5 50.4
n=3 60.7 54.2 50.9 77.6 70.3 66.7 94.2 86.1 82.4
n=4 95.9 86.8 82.7 112.4 105.6 101.4 135.8 126.0 121.4
n = 5 135.8 125.9 121.6 158.9 149.4 144.7 185.4 173.4 168.2

Theorem 2. (Case II) Under (8) and Assumptions 1, 2, as $T \to \infty$,

$$\sup_{\lambda\in\Lambda}F^d(\lambda) \Rightarrow \sup_{\lambda\in\Lambda}\mathrm{trace}[J_0^{d\prime}(J_1^d)^{-1}J_0^d], \qquad (15)$$

where

$$J_0^d = J_0 - \begin{pmatrix} \int W(r,1)\,dr\,W(1,\lambda)' \\ \int W(r,1)\,dr\,[W(1,1) - W(1,\lambda)]' \end{pmatrix}, \qquad J_1^d = J_1 - \begin{pmatrix} \lambda\int W(r,1)\,dr\int W(r,1)'\,dr & 0 \\ 0 & (1-\lambda)\int W(r,1)\,dr\int W(r,1)'\,dr \end{pmatrix}.$$

Several remarks are in order. First, the distributions in (14) and (15) are in terms of the TPBM, which involves the threshold parameter $\lambda$. In contrast, theorem 2 of Seo (2006) shows that $\lambda$ is absent in the limiting distribution of the ECM test. Second, the new distributions depend on $n$, the dimensional parameter, and $\Lambda$, the searching range. Finally, the distributions are free of nuisance parameters such as the Cholesky factor $P$ and the serial correlation characterized by $\Phi(L)$. Therefore, we can tabulate in Table 1 the critical values of the limiting distributions for various $n$ and $\Lambda$. Those critical values are computed as empirical quantiles from 5,000 independent draws from simulations of the asymptotic formulae in Theorems 1 and 2.
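Critical values of this kind are obtained by simulating the limiting functionals directly. The sketch below (illustrative only; the grid sizes and draw counts are far smaller than those behind Table 1) approximates draws of the Case I limit in (14) by replacing $W(r,1)$ and $W(r,\lambda)$ with scaled partial sums:

```python
import numpy as np

def sim_sup_stat(n=2, Tgrid=200, lams=np.linspace(0.15, 0.85, 8), rng=None):
    """One draw from a discretized version of the limit in (14), Case I."""
    rng = rng or np.random.default_rng()
    e = rng.standard_normal((Tgrid, n)) / np.sqrt(Tgrid)  # Brownian increments
    u = rng.uniform(size=Tgrid)                           # splits increments by regime
    W1 = np.cumsum(e, axis=0)                             # W(r, 1) on the grid
    M = np.einsum('ti,tj->ij', W1[:-1], W1[:-1]) / Tgrid  # int W(r,1)W(r,1)' dr
    best = 0.0
    for lam in lams:
        dWl = e * (u < lam)[:, None]          # increments of W(r, lam)
        A = W1[:-1].T @ dWl[1:]               # int W(r,1) dW(r,lam)'
        B = W1[:-1].T @ e[1:] - A             # int W dW(r,1)' minus the above
        J0 = np.vstack([A, B])
        J1 = np.block([[lam * M, np.zeros((n, n))],
                       [np.zeros((n, n)), (1 - lam) * M]])
        best = max(best, np.trace(J0.T @ np.linalg.solve(J1, J0)))
    return best

rng = np.random.default_rng(2)
draws = [sim_sup_stat(rng=rng) for _ in range(200)]
print(np.quantile(draws, 0.95))   # a (very rough) analogue of a Table 1 entry
```

With a fine time grid, a dense $\lambda$ grid, and 5,000 draws, this procedure reproduces critical values of the order reported in Table 1.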
Table 1 shows that the critical value increases as the searching range  gets wider. This
is indicative of the divergent behaviour of the sup Wald test. The critical value also increases


as the dimensional parameter $n$ rises. Finally, the critical values of $\sup_{\lambda\in\Lambda}F(\lambda)$ (reported under the column w/o constant in Table 1) are less than those of $\sup_{\lambda\in\Lambda}F^d(\lambda)$ (reported under the column w/ constant). As expected, the intercept term shifts the limiting distribution.
For case III we make the auxiliary assumption that

Assumption 3. Under the null hypothesis (8), $y_t$ is generated by (1) with $c_{12} = c_{22} = 0$, $c_{11} = c_{21} = c_0$.

Assumption 3 states that under the null hypothesis $y_t$ is a vector random walk with drift component $c_0$. In this case we need to derive the canonical form introduced by Sims, Stock and Watson (1990) for (1). To see why, notice that under Assumption 3, $\Phi(L)\Delta y_t = c_0 + e_t$. So $y_t$ consists of a stochastic trend and a deterministic trend, and the latter is correlated with the trend term in the testing regression. Therefore, it is necessary to avoid collinearity by removing the deterministic trend from $y_{t-1}$. Towards that end, we consider the canonical form given by

$$\Phi(L)\Delta y_t = (c_{11}^* + c_{12}^*t + \Pi_1^*\eta_{t-1})1_{1,t-1} + (c_{21}^* + c_{22}^*t + \Pi_2^*\eta_{t-1})1_{2,t-1} + e_t, \qquad (16)$$

where $\eta_{t-1} = y_{t-1} - c_0(t-1)$, $\Pi_1^* = \Pi_1$, $\Pi_2^* = \Pi_2$, $c_{11}^* = c_{11} - \Pi_1c_0$, $c_{12}^* = c_{12} + \Pi_1c_0$, $c_{21}^* = c_{21} - \Pi_2c_0$, and $c_{22}^* = c_{22} + \Pi_2c_0$. Now the new regressor $\eta_{t-1}$ in (16) contains only the stochastic trend. The hypothetical model (16) is just an algebraic rearrangement of (1), so testing $\Pi_1 = \Pi_2 = 0$ in (1) is equivalent to testing $\Pi_1^* = \Pi_2^* = 0$ in (16). The computation of the ADL test, denoted by $\sup_{\lambda\in\Lambda}F^t(\lambda)$ in this case, is still based on (1). The limiting distribution of $\sup_{\lambda\in\Lambda}F^t(\lambda)$ is derived based on (16), and is given in the following theorem.
Theorem 3. (Case III) Under (8) and Assumptions 1, 3, as $T \to \infty$,

$$\sup_{\lambda\in\Lambda}F^t(\lambda) \Rightarrow \sup_{\lambda\in\Lambda}\mathrm{trace}[J_0^{t\prime}(J_1^t)^{-1}J_0^t], \qquad (17)$$

where

$$J_0^t = J_0 - C_{21}C_{22}^{-1}C_{23} - C_{31}C_{32}^{-1}C_{33}, \qquad J_1^t = J_1 - C_{21}C_{22}^{-1}C_{21}' - C_{31}C_{32}^{-1}C_{31}',$$

$$C_{21} = \begin{pmatrix} \lambda\int W(r,1)\,dr & \lambda\int rW(r,1)\,dr \\ 0 & 0 \end{pmatrix}, \qquad C_{22} = \begin{pmatrix} \lambda & \lambda\int r\,dr \\ \lambda\int r\,dr & \lambda\int r^2\,dr \end{pmatrix},$$

$$C_{23} = \begin{pmatrix} W(1,\lambda)' \\ \int r\,dW(r,\lambda)' \end{pmatrix}, \qquad C_{31} = \begin{pmatrix} 0 & 0 \\ (1-\lambda)\int W(r,1)\,dr & (1-\lambda)\int rW(r,1)\,dr \end{pmatrix},$$

$$C_{32} = \begin{pmatrix} 1-\lambda & (1-\lambda)\int r\,dr \\ (1-\lambda)\int r\,dr & (1-\lambda)\int r^2\,dr \end{pmatrix}, \qquad C_{33} = \begin{pmatrix} [W(1,1) - W(1,\lambda)]' \\ \int r\,dW(r,1)' - \int r\,dW(r,\lambda)' \end{pmatrix}.$$
Note that the distribution does not depend on $c_0$, the drift term. The critical values of the distribution are reported under the column w/ trend in Table 1. In practice, we can use the $\sup_{\lambda\in\Lambda}F^t(\lambda)$ test to distinguish difference-stationary threshold series from trend-stationary threshold series. The ECM test cannot do this since it excludes the trend term.

V. Discussion

Nuisance parameters

The asymptotic distributions developed in Caner and Hansen (2001) are in general non-pivotal. In contrast, the limiting distributions of our tests are free of nuisance parameters. This difference is mainly due to the fact that our parsimonious model (1) constrains the coefficients of the lagged $\Delta y_{t-j}$ to be constant across regimes, while they are regime-varying in Caner and Hansen (2001).
In the literature, Seo (2006) and Tsay (1997) impose restrictions similar to ours. We decided to follow those two papers for two reasons. First, in our view, the essence of threshold cointegration is allowing for regime-varying error-correcting speeds, not regime-varying short-run dynamics captured by $\Phi(L)$; see Balke and Fomby (1997). Second, the econometric benefit of imposing a regime-invariant $\Phi(L)$ outweighs the cost. Our model becomes inappropriate when $\Phi(L)$ is indeed regime-dependent. Nevertheless, the benefit is that we obtain a pivotal asymptotic distribution of the test statistic, so critical values can be tabulated and bootstrapping is not needed.
More explicitly, theorems 3 and 5 of Caner and Hansen (2001) show that the nuisance parameter is related to the product $1_{t-1}w_{t-1}$, where $1(\cdot)$ is the indicator and $w_{t-1}$ represents $(\Delta y_{t-1},\ldots,\Delta y_{t-p})$. Essentially, the nuisance parameter measures the serial correlation of $1_{t-1}w_{t-1}$ and $1_{t-1}w_{t-1}w_{t-1}'$. Because of the indicator, which is present even under the null hypothesis, cancellation is impossible and the nuisance parameter cannot be avoided.

Our model is different, in that the indicator completely vanishes under the null hypothesis and Assumptions 2 and 3. In our context, the nuisance parameter measures the serial correlation of $w_{t-1}$ and $w_{t-1}w_{t-1}'$. But without the indicator, the nuisance parameter cancels out when deriving the limiting distribution. This situation is similar to the well-known result that the distribution of the augmented Dickey–Fuller test is free of nuisance parameters; see section 17.7, pages 516–522, of Hamilton (1994).

Threshold effect under the null hypothesis

More generally, the asymptotic distribution of the test should depend on whether the threshold effect exists under the null hypothesis. So far we have considered only the first possibility, that there is no threshold effect under the null; see Assumptions 2 and 3 of our paper. Only in that case is the limiting distribution free of nuisance parameters.

The second possibility is that the threshold effect is present under the null, e.g., $c_{11} \ne c_{21}$. One difficulty is that the limiting distribution then becomes non-pivotal, for the same reason as in Caner and Hansen (2001).¹¹ Instead of providing an upper bound for the limiting distribution and relying on bootstrapping, which would take us too far astray, this paper suggests that a practitioner may first detrend the series, and then apply Theorem 1 to the detrended series.

¹¹The second paragraph on page 303 of Hansen and Seo (2002) also mentions that by removing the lagged $\Delta y_{t-j}$ and the deterministic term from their model, the limiting distribution of their test becomes free of nuisance parameters. That finding is in line with ours.
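The detrend-then-test suggestion amounts to an OLS regression of each series on a constant and a linear trend, keeping the residuals; a minimal sketch (our own illustrative names):

```python
import numpy as np

def detrend(y):
    """OLS-detrend each column of y on a constant and a linear trend."""
    T = len(y)
    X = np.column_stack([np.ones(T), np.arange(T)])  # constant and trend
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta                              # residuals, trend removed

rng = np.random.default_rng(3)
t = np.arange(300)
y = 0.5 * t[:, None] + np.cumsum(rng.standard_normal((300, 2)), axis=0)
yd = detrend(y)
print(np.abs(yd.mean(axis=0)))   # residual means are numerically zero
```

The Case I (no deterministic terms) test from Theorem 1 would then be applied to `yd` in place of the raw series.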

VI. Monte Carlo experiments

Size

This section compares the finite-sample performance of the proposed system-equation ADL test, the ECM test of Seo (2006) and the ES test of Enders and Siklos (2001). We first compare the rejection frequency under the null hypothesis (size) at the 5% nominal level. A bivariate data generating process (DGP) is considered:

$$\Delta y_{1,t} = c_1\Delta y_{1,t-1} + e_{1,t} \qquad (18)$$

$$\Delta y_{2,t} = c_2\Delta y_{2,t-1} + e_{2,t} \qquad (19)$$

$$x_{t-1} = y_{1,t-1} - \theta y_{2,t-1} \qquad (20)$$

$$\hat x_{t-1} = y_{1,t-1} - \hat\theta y_{2,t-1} \qquad (21)$$

$$\begin{pmatrix} e_{1,t} \\ e_{2,t} \end{pmatrix} \sim \mathrm{iidn}(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & d_1 \\ d_1 & 1 \end{pmatrix}, \qquad (22)$$

where $\theta = 0.1$ and $\hat\theta$ is the OLS estimate. The short-run dynamics are controlled by $c_1$ and $c_2$, and the correlation between the error terms is determined by $d_1$. Their default values are $c_1 = 0.1$, $c_2 = 0.5$ and $d_1 = 0.7$. When running the simulation, we let one parameter vary while keeping the other parameters at their default values. The null hypothesis (8) is imposed since neither $y_{1,t}$ nor $y_{2,t}$ is error-correcting. The number of iterations is 5,000, and the sample size is 100. The initial values of $y_{1,t}$ and $y_{2,t}$ are set as random numbers.
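For concreteness, one draw from the size DGP (18)-(22) can be generated as follows (a sketch with our own function and variable names):

```python
import numpy as np

def simulate_null_dgp(T=100, c1=0.1, c2=0.5, d1=0.7, rng=None):
    """One draw from the size DGP (18)-(22): no error correction, so H0 holds."""
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(np.array([[1.0, d1], [d1, 1.0]]))
    e = rng.standard_normal((T, 2)) @ L.T          # correlated iidn(0, Sigma) errors
    dy = np.zeros((T, 2))
    for t in range(1, T):
        dy[t, 0] = c1 * dy[t - 1, 0] + e[t, 0]     # (18)
        dy[t, 1] = c2 * dy[t - 1, 1] + e[t, 1]     # (19)
    y0 = rng.standard_normal(2)                    # random initial values
    return y0 + np.cumsum(dy, axis=0)              # integrate to the levels y_t

rng = np.random.default_rng(4)
y = simulate_null_dgp(rng=rng)
print(y.shape)
```

Applying any of the three tests to 5,000 such draws and recording the rejection rate at the tabulated 5% critical value reproduces the size experiment described in the text.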
The testing regression has no intercept or trend. The corresponding critical values at
the 5% level are 30.2 for the ADL test, 7.08 for the ES test (denoted by * in table 5 of
Enders and Siklos (2001)), and 10.968 for the ECM test (denoted by supW in table 1 of
Seo (2006)). The ADL and ES tests use the estimated error correction term (21), while the
ECM test uses the predetermined error correction term (20). Panels a, b, c of Figure 1 plot
the sizes against c1 , c2 and d1 , respectively.
We can summarize Figure 1 as follows. Among the three tests, the size of the ADL test
(marked by circle) is closest to the nominal level 0.05, and is largely invariant to c1 , c2 and
d1 . This finding confirms that the limiting distribution in Theorem 1 is indeed free of those
nuisance parameters. It also implies that the bootstrap is not needed for the ADL test. In
contrast, a size as large as 0.4 is found for the ECM test (marked by square). Because of
the severe size distortion, the ECM test has to be used with the bootstrap.

Power
Next we compare the rejection frequency under the alternative hypothesis (power). In light
of the size distortion, we focus on the size-adjusted power by using the 5% bootstrap critical
value obtained under the null hypothesis. The new DGP is:


Figure 1 Rejection frequency of various tests under the null hypothesis at the 5% level. [Figure omitted: Panels A, B and C plot the sizes of the ADL, ES and ECM tests against c1, c2 and d1, respectively; the vertical axis (size) runs from 0 to 0.5.]

$$\Delta y_{1,t} = -0.1x_{t-1}1_{1,t-1} + k_1x_{t-1}1_{2,t-1} + 0.1\Delta y_{1,t-1} + e_{1,t} \qquad (23)$$

$$\Delta y_{2,t} = \gamma x_{t-1} + 0.5\Delta y_{2,t-1} + e_{2,t} \qquad (24)$$

$$\tilde x_{t-1} = y_{1,t-1} - (\theta + f)y_{2,t-1}. \qquad (25)$$

Note that $y_{1,t}$ is error-correcting and the null hypothesis is violated. The parameter $k_1$ determines the error-correcting speed in the second regime. The system is cointegrated linearly when $k_1 = -0.1$. In this section, we let $\gamma = 0$, so $y_{2,t}$ is not error-correcting and weak exogeneity holds.

Because the ADL and ES tests can use either a predetermined cointegrating vector or an estimated cointegrating vector, and because power depends on which cointegrating vector is used, we need to be careful to compare the tests on an equal footing. So we denote the ADL and ES tests that use the predetermined error correction term (25) by ADLP and ESP, and the tests that use the estimated error correction term (21) by ADL and ES. By allowing $f \ne 0$ in (25), the predetermined error correction term can differ from the true error correction term (20). Thus, we can evaluate the effect of a misspecified cointegrating vector on power. Figure 2 plots the powers against $k_1$, with $f = 0$ in panel A, $f = 0.05$ in panel B, and $f = 0.1$ in panel C.


[Figure 2 about here: three panels (a: f = 0, b: f = 0.05, c: f = 0.1) plotting power (0 to 1) against k1 for the ADL, ES, ECM, ADLP and ESP tests.]

Figure 2 Rejection frequency of various tests under the alternative hypothesis at the 5% level

There are two main findings. First, the powers of all tests are minimized when k1 = −0.1,
and then start to rise as k1 moves towards −0.8. This result is expected because k1
measures both the degree of asymmetry and the strength of error correction. Second, the
powers of the ECM and ESP tests are sensitive to misspecification of the cointegrating
vector. For instance, the ECM test has the greatest power in panel a, but its power becomes
the second lowest in panel c. In contrast, the power of the ADLP test is almost invariant to
the misspecification.
We need to interpret the performance of the ECM test with caution. Due to the nature
of the Monte Carlo experiment, the cointegrating vector (1, −β) happens to be known;
that is why the ECM test can be applied here. The ECM test dominates the other tests
only under the favourable condition of no misspecification (f = 0). In practice, however,
the cointegrating vector can be unknown or misspecified.

Weak exogeneity
In this section we compare the powers of the proposed system-equation ADL test, the
single-equation ES test of Enders and Siklos (2001), and the single-equation ADL (BO)
test of Li and Lee (2010). In particular, we focus on the effect of weak exogeneity on the
power. Towards that end, we try λ = (0, −0.1, …, −0.7) in (24), and consider β = 0.1, f = 0,
k1 = −0.8. Weak exogeneity fails when λ ≠ 0.


[Figure 3 about here: rejection frequency (0 to 0.8) plotted against λ from −0.7 to 0 for the System−ADL, ES and Single−ADL tests.]

Figure 3 Power and weak exogeneity

All the tests use the estimated error correction term. Figure 3 plots the power at the 5%
significance level against λ. The proposed system-equation ADL test clearly outperforms
the other two single-equation tests when λ is less than −0.2. Thus, the system-equation
test is preferred when weak exogeneity fails.

VII. An application
This section investigates the relationship between the US 10-year treasury constant maturity
rate (10ybond, y1,t ) and the effective federal funds rate (Fedfunds, y2,t ). The monthly data
from January 1962 to August 2013 are downloaded from FRED Economic Data.12 The
sample size is 620.
Panel a of Figure 4 presents the time series plot. Both interest rates appear highly persis-
tent. The augmented Dickey–Fuller t statistics, with the lag selected by AIC, are −1.97 for
Fedfunds and −1.30 for 10ybond. Therefore we conclude that both series are non-stationary.
Panel a also shows that the two series tend to move together over time. This comovement is
predicted by the expectation hypothesis of the term structure. Our cointegration analysis
starts with estimating the long run relationship:

12
The website is http://research.stlouisfed.org/fred2/.


[Figure 4 about here. Panel A: time series plot of Fedfunds and 10ybond over the 620 observations. Panel B: the error correction term.]

Figure 4 Time series plot

y1,t = β0 + β1 y2,t + x_t,    (26)

where x_t is the error correction term, which measures the size of the disequilibrium. Since β0
and β1 are both unknown, this is an example where the economic theory does not quantify
the cointegrating vector. The OLS estimation13 of (26) is

x̂_t = y1,t − 2.81 − 0.68y2,t,

and panel b of Figure 4 plots x̂_t. The fact that x̂_t shows less persistence than y1,t and y2,t is
suggestive of cointegration. Indeed, the Engle–Granger linear cointegration test statistic is
−3.41, which rejects the null hypothesis of no cointegration at the 5% level.
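The two-step logic behind the Engle–Granger statistic (static OLS as in (26), then a Dickey–Fuller regression on the residual) can be sketched with plain least squares; the series below are synthetic, not the FRED data, and the lag length is illustrative:

```python
import numpy as np

def engle_granger_tstat(y1, y2, lags=1):
    """Two-step Engle-Granger check (schematic).

    Step 1: regress y1 on a constant and y2; the residual is the estimated
    error correction term.  Step 2: an augmented Dickey-Fuller regression on
    that residual; the t-statistic on its lagged level is compared with
    Engle-Granger critical values (about -3.37 at the 5% level for two
    variables with a constant).
    """
    T = len(y1)
    Z = np.column_stack([np.ones(T), y2])
    b = np.linalg.lstsq(Z, y1, rcond=None)[0]   # (beta0_hat, beta1_hat)
    x = y1 - Z @ b                              # error correction term
    dx = np.diff(x)
    X = [x[lags:-1]]                            # x_{t-1}
    for j in range(1, lags + 1):
        X.append(dx[lags - j:-j])               # lagged differences
    X = np.column_stack(X)
    y = dx[lags:]
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return coef[0] / se                         # t-stat on x_{t-1}

# synthetic cointegrated pair (illustration only, not the interest rate data)
rng = np.random.default_rng(0)
s2 = np.cumsum(rng.standard_normal(600))
s1 = 2.81 + 0.68 * s2 + rng.standard_normal(600)
print(round(engle_granger_tstat(s1, s2), 2))
```

Because the residual here is genuinely stationary, the statistic comes out far below the 5% critical value, mirroring the rejection reported for the interest rate pair.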
The Engle–Granger test assumes constant adjustment speeds. Intuition suggests, however,
that the adjustment speed may depend on whether the error correction term is big or small:
a big disequilibrium supposedly leads to fast adjustment. The question then is, which value
of the error correction term triggers the fast adjustment? To answer that, we
13
In theory it is possible for the cointegrating vector to be different from (1, −1) for the long-term and short-term
interest rates. For instance, if we assume the long-term interest rate y1,t is integrated of order one, I(1), and the
spread between the long-term and short-term rates, y1,t − y2,t, is I(1) as well (which is possible in the presence of
a non-stationary risk premium), then the normalized cointegrating vector is (1, θ/(1 − θ)), not (1, −1), if y1,t − θ · spread =
y1,t − θ(y1,t − y2,t) = (1 − θ)y1,t + θy2,t is integrated of order zero (stationary).


TABLE 2
Estimation results (x̂_t = y1,t − 2.81 − 0.68y2,t)

Equation (27), regressand Δy1,t
Regressors:    y1,t−1·1_{1,t−1}  y1,t−1·1_{2,t−1}  y2,t−1·1_{1,t−1}  y2,t−1·1_{2,t−1}  Δy1,t−1  Δy2,t−1  1_{1,t−1}  1_{2,t−1}
Coefficients:       0.06             −0.05             −0.03              0.03          −0.03     0.25     −0.15      0.18
t values:           0.77             −3.98**           −0.53              3.04**        −1.32     5.91**   −0.99      4.03**
Ljung–Box Q test: 1.13

Equation (28), regressand Δy2,t
Regressors:    y1,t−1·1_{1,t−1}  y1,t−1·1_{2,t−1}  y2,t−1·1_{1,t−1}  y2,t−1·1_{2,t−1}  Δy1,t−1  Δy2,t−1  1_{1,t−1}  1_{2,t−1}
Coefficients:       0.22             −0.03             −0.23              0.03           0.40     0.30      0.19      0.06
t values:           1.93*            −1.45             −3.43**            1.97**        10.75**   5.10**    0.88      0.90
Ljung–Box Q test: 0.52

Tests for threshold cointegration
ADL: sup_{θ∈[0.15,0.85]} F^d(θ) = 128.04**
Enders–Siklos test: 18.62**

Note: **significant at the 5% level, *significant at the 10% level.

grid search the threshold value by minimizing the determinant of Σ̂ = T^{−1} Σ_t ê_t ê_t′, where
ê_t = (ê1,t, ê2,t)′ denotes the OLS residual of the two-regime threshold vector ADL model:

φ1(L)Δy1,t = (c1,1 + α′11 y_{t−1})1_{1,t−1} + (c1,2 + α′12 y_{t−1})1_{2,t−1} + e1,t    (27)

φ2(L)Δy2,t = (c2,1 + α′21 y_{t−1})1_{1,t−1} + (c2,2 + α′22 y_{t−1})1_{2,t−1} + e2,t,    (28)

where the trend is excluded since neither interest rate is trending. We follow the principle of
parsimony and start with one lag (p = 1 in φ1(L) and φ2(L)). The model seems adequate,
as the Ljung–Box Q test finds no serial correlation in ê1,t or ê2,t.
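The grid search described above can be sketched in a few lines of numpy; the regressor layout mirrors (27)–(28) with p = 1, but the function and variable names are illustrative, not the paper's code:

```python
import numpy as np

def grid_search_threshold(y1, y2, quantiles=np.arange(0.15, 0.86, 0.05)):
    """For each candidate quantile of the error correction term, build the two
    regime indicators, estimate both equations of the threshold vector ADL by
    OLS, and keep the candidate minimizing det(Sigma_hat), the determinant of
    the residual covariance matrix.  One lag (p = 1), as in the application.
    """
    T = len(y1)
    # step 1: error correction term from the static OLS regression (26)
    Z = np.column_stack([np.ones(T), y2])
    x = y1 - Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]
    dy1, dy2 = np.diff(y1), np.diff(y2)
    xl = x[1:-1]                                  # x_{t-1}, rows t = 2..T-1
    Y = np.column_stack([dy1[1:], dy2[1:]])       # (dy1_t, dy2_t)
    best = (np.inf, None)
    for q in quantiles:
        tau = np.quantile(xl, q)
        lo = (xl <= tau).astype(float)            # 1_{1,t-1}
        hi = 1.0 - lo                             # 1_{2,t-1}
        # regime intercepts, regime-specific levels, lagged differences
        R = np.column_stack([lo, hi,
                             y1[1:-1] * lo, y2[1:-1] * lo,
                             y1[1:-1] * hi, y2[1:-1] * hi,
                             dy1[:-1], dy2[:-1]])
        resid = Y - R @ np.linalg.lstsq(R, Y, rcond=None)[0]
        det = np.linalg.det(resid.T @ resid / len(Y))
        if det < best[0]:
            best = (det, tau)
    return best[1]

# illustrative use on synthetic data (not the interest rate series)
rng = np.random.default_rng(1)
s2 = np.cumsum(rng.standard_normal(300))
s1 = 1.0 + 0.7 * s2 + rng.standard_normal(300)
print(grid_search_threshold(s1, s2))
```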
Table 2 reports the estimation results for (27) and (28). There are two main findings.
First, asymmetric adjustment speeds are confirmed by the fact that the absolute values of
the coefficients in the lower regime (with indicator 1_{1,t−1}) differ from those in the upper
regime (with indicator 1_{2,t−1}). For instance, in (28) the estimated coefficient of y1,t−1 is
0.22 and significant at the 10% level in the lower regime, but −0.03 and insignificant in the
upper regime. To explain this result, notice that the error correction term is less than −1.25
in the lower regime. In that case the yield curve is likely to be inverted, so the market reacts
quickly to this unusual event. For instance, in panel a of Figure 4, Fedfunds exceeds
10ybond and the yield curve becomes inverted after the 200th observation. During
that period (from September 1978 to May 1980) the adjustment speed of Fedfunds is much
faster than in other periods with normal yield curves.
The second main finding is that the absolute values of some coefficients in (28) are signif-
icantly greater than those in (27). Take the coefficient of the regressor y2,t−1·1_{1,t−1}: it is −0.23 and


significant in (28), but −0.03 and insignificant in (27). Recall that the regressands are Fed-
funds in (28) and 10ybond in (27). Hence the second finding indicates that the two interest
rates play asymmetric roles in the adjustment process. In particular, the equation for the
short-term rate has two big coefficients (0.22 and −0.23), so the short-term rate
plays a bigger role than the long-term rate. This finding is also intuitive: Fedfunds plays
the bigger role because it is a policy tool that is frequently manipulated by the Federal
Reserve.
Note that the two main findings are unavailable in the Engle–Granger linear cointe-
gration model. The single-equation models of Enders and Siklos (2001) and Li and Lee
(2010) can detect only the asymmetric adjustment speeds, but not the asymmetric roles.
This application shows the system-equation model is inherently more informative than the
single-equation model.
Finally, we test the null hypothesis of no threshold cointegration, i.e. H0: α11 = α12 =
α21 = α22 = 0 in (27) and (28). The ECM test of Seo (2006) cannot be used here because
β0 and β1 in (26) are not predetermined. The single-equation ADL test
of Li and Lee (2010) is also invalid since both y1,t and y2,t are error-correcting and weak
exogeneity fails. We let the searching range be Λ = [0.15, 0.85]. The ADL test with intercept,
sup_{θ∈Λ} F^d(θ), equals 128.04, greater than the 5% critical value 42.5. The Enders–Siklos test
is 18.62, rejecting the null as well.
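A schematic numpy version of the system Wald statistic maximized over the grid: it tests that the four regime-specific level coefficients are jointly zero, using the standard multivariate-OLS covariance vec(B̂) ∼ Σ ⊗ (R′R)^{−1}. The resulting number would have to be compared with the paper's tabulated critical values (e.g. 42.5), which this sketch does not reproduce:

```python
import numpy as np

def sup_wald(y1, y2, quantiles=np.arange(0.15, 0.86, 0.05)):
    """sup-Wald over the threshold grid for H0: all regime-specific level
    coefficients are zero (schematic sup F; layout as in the grid search)."""
    T = len(y1)
    Z = np.column_stack([np.ones(T), y2])
    x = y1 - Z @ np.linalg.lstsq(Z, y1, rcond=None)[0]
    dy1, dy2 = np.diff(y1), np.diff(y2)
    xl = x[1:-1]
    Y = np.column_stack([dy1[1:], dy2[1:]])
    sup = -np.inf
    for q in quantiles:
        lo = (xl <= np.quantile(xl, q)).astype(float)
        hi = 1.0 - lo
        R = np.column_stack([lo, hi,
                             y1[1:-1] * lo, y2[1:-1] * lo,
                             y1[1:-1] * hi, y2[1:-1] * hi,
                             dy1[:-1], dy2[:-1]])
        B = np.linalg.lstsq(R, Y, rcond=None)[0]      # 8 x 2 coefficient matrix
        E = Y - R @ B
        Sigma = E.T @ E / len(Y)                      # residual covariance
        idx = [2, 3, 4, 5]                            # the four level coefficients
        G = np.linalg.inv(R.T @ R)[np.ix_(idx, idx)]
        a = B[idx, :].T.reshape(-1)                   # vec (column-stacked) subvector
        V = np.kron(Sigma, G)                         # its estimated covariance
        W = a @ np.linalg.solve(V, a)
        sup = max(sup, W)
    return sup

# illustrative use on synthetic data (not the interest rate series)
rng = np.random.default_rng(2)
s2 = np.cumsum(rng.standard_normal(300))
s1 = 1.0 + 0.7 * s2 + rng.standard_normal(300)
print(round(sup_wald(s1, s2), 2))
```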
We can summarize the application as follows. First, the long-term and short-term in-
terest rates are cointegrated. Second, there is evidence that an inverted yield curve leads to
fast adjustment. Finally, the short-term rate plays a bigger role than the long-term rate.

VIII. Conclusion
This paper develops a new system-equation ADL test for threshold cointegration, which
resolves some problems found in the existing tests. In particular, the new test can be used
when the cointegrating vector is unknown and when weak exogeneity fails. The new test
can be applied to trending series, and the grid-searching information is efficiently utilized
irrespective of the sample size.
The limiting distribution of the new test can be tabulated since it involves no nuisance
parameters. Monte Carlo simulations show that the size distortion of the new test is neg-
ligible, so the bootstrap is not needed. The power of the new test is comparable to that of
the existing tests. We provide an application of the new test, in which a long run equilibrium
between the long-term and short-term interest rates is found and twofold asymmetries are
detected. Notably, statistical evidence is provided for the asymmetric adjustment roles;
such evidence is unavailable in the single-equation model.

Appendix A. Mathematical proof


Let cdf(·) denote the empirical distribution function, which is non-decreasing and is a
step function that jumps by 1/T at each of the T data points. Because cdf is monotonic,
we have

1_{1,t−1} = 1(v_{t−1} < ṽ_{t−1}(θ)) = 1(cdf(v_{t−1}) < θ) = 1(z_{t−1} < θ),

where v_{t−1} = x_{t−1} and z_{t−1} = cdf(v_{t−1}). Due to the property of the empirical distribution
function, in the limit as T → ∞, z_{t−1} has a marginal uniform distribution U(0, 1), which
satisfies assumption 2 of Caner and Hansen (2001).
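The uniformity of the rank transform z_{t−1} = cdf(v_{t−1}) is easy to verify numerically (an illustration only, not part of the proof): the indicator 1(z_{t−1} < θ) selects a fraction θ of the sample whatever the distribution of v:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_t(df=3, size=10000)          # any continuous distribution works
z = (np.argsort(np.argsort(v)) + 1) / len(v)  # z_t = cdf(v_t), i.e. rank / T
theta = 0.3
print(round(np.mean(z < theta), 3))           # -> 0.3
```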
Suppose e_t is an i.i.d. (n × 1) sequence with mean zero, finite fourth moments, and
E e_t e_t′ = Σ = PP′, where P is the Cholesky factor. Let u_t = D(L)e_t denote a stationary
linear vector sequence with Σ_{s=0}^∞ s|D_{s,ij}| < ∞ for each i, j = 1, …, n. Define ξ_t = Σ_{j=1}^t u_j,
Φ = (D0 + D1 + …)P = D(1)P, and Λ1 = Σ_{i=1}^∞ E(1(z_{t−1} < θ)1(z_{t−1−i} < θ)u_t u′_{t−i}). Let W(r, θ)
denote the n-dimensional two-parameter Brownian motion, and W(r, 1) the n-dimensional one-
parameter Brownian motion in r. The asymptotic theory is based on the following lemma.

Lemma 1.

a: T^{−1/2} Σ_{t=1}^{[Tr]} 1(z_{t−1} < θ)e_t ⇒ PW(r, θ)

b: T^{−1} Σ_{t=1}^{T} ξ_{t−1}1(z_{t−1} < θ)e_t′ ⇒ Φ[∫ W(r, 1)dW(r, θ)′]P′

c: T^{−2} Σ_{t=1}^{T} ξ_{t−1}ξ_{t−1}′1(z_{t−1} < θ) ⇒ θΦ[∫ W(r, 1)W(r, 1)′dr]Φ′

Proof of Lemma 1a. Let v_t be an i.i.d. (n × 1) vector with mean zero and unit variance,
E v_t v_t′ = I. Notice that {1_{1,t−1}v_{it}} = {1(z_{t−1} < θ)v_{it}}, i = 1, …, n, is a strictly stationary and
ergodic martingale difference process with variance E(1(z_{t−1} < θ)v_{it})² = θ. For any (r, θ),
the martingale difference central limit theorem implies that

T^{−1/2} Σ_{t=1}^{[Tr]} 1(z_{t−1} < θ)v_{it} →d N(0, rθ),

and the asymptotic covariance kernel is

E[(T^{−1/2} Σ_{t=1}^{[Tr1]} 1(z_{t−1} < θ1)v_{it})(T^{−1/2} Σ_{t=1}^{[Tr2]} 1(z_{t−1} < θ2)v_{it})] = (r1 ∧ r2)(θ1 ∧ θ2).

The proof of the stochastic equicontinuity of each component of T^{−1/2} Σ_{t=1}^{[Tr]} 1(z_{t−1} < θ)v_{it} is
lengthy, and can be found in Caner and Hansen (2001). Because the sequences of marginal
probability measures are tight, we deduce that the univariate stochastic equicontinuity
carries over to the vector; see lemma A.3 of Phillips and Durlauf (1986). The final result
is deduced after applying the Cramér–Wold device and noticing that e_t = Pv_t.
Proof of Lemma 1b. We apply the Beveridge–Nelson decomposition to u_t and obtain
u_t = D(1)e_t + ẽ_{t−1} − ẽ_t, where ẽ_t = D̃(L)e_t, D̃(L) = Σ_{j=0}^∞ d̃_j L^j and d̃_j = Σ_{s=j+1}^∞ d_s. The stated
result is obtained after applying Theorem 2.2 of Kurtz and Protter (1991) and Lemma 1a
above.
Proof of Lemma 1c. This is a trivial extension of result 3 of theorem 3 of Caner and
Hansen (2001) to the multivariate case.


Proof of Theorem 1. We prove the result for the TVADLM with one lag:

Δy_t = A1 y_{t−1}1_{1,t−1} + A2 y_{t−1}1_{2,t−1} + Ψ1 Δy_{t−1} + e_t.

The proof can be easily extended to a general TVADLM with p lags. Under the null hypoth-
esis A1 = A2 = 0, it follows that Φ(L)Δy_t = e_t, Φ(L) = I − Ψ1 L. Define u_t = D(L)e_t, D(L) =
Φ(L)^{−1}. Then y_{t−1} = ξ_{t−1} in Lemma 1. Let ỹ_{t−1} = (y_{t−1}′1_{1,t−1}, y_{t−1}′1_{2,t−1})′, w_{t−1} = (ỹ_{t−1}′, Δy_{t−1}′)′,
A = (A1, A2), and Θ = (A, Ψ1). Consider the scaling matrix Υ = diag(T·I, T^{1/2}·I).
We have

Υ(Θ̂ − Θ)′ = [Υ^{−1}(Σ_{t=1}^T w_{t−1}w_{t−1}′)Υ^{−1}]^{−1}[Υ^{−1} Σ_{t=1}^T w_{t−1}e_t′],

where

Υ^{−1}(Σ_{t=1}^T w_{t−1}w_{t−1}′)Υ^{−1} ⇒ B = [B11, B12; B12′, B22],

B11 = [θΦ(∫W(r,1)W(r,1)′dr)Φ′, 0; 0, (1−θ)Φ(∫W(r,1)W(r,1)′dr)Φ′],
B12 = o_p(1),
B22 = E(Δy_{t−1}Δy_{t−1}′),

and

Υ^{−1} Σ_{t=1}^T w_{t−1}e_t′ ⇒ C = [C1; C2],

C1 = [Φ(∫W(r,1)dW(r,θ)′)P′; Φ(∫W(r,1)dW(r,1)′ − ∫W(r,1)dW(r,θ)′)P′],
C2 = O_p(1).

In order to obtain B11, recall that 1_{1,t−1} = 1(z_{t−1} < θ), 1_{2,t−1} = 1 − 1(z_{t−1} < θ), and 1_{1,t−1}1_{2,t−1} =
0. The off-diagonal elements of B11 are zeros because of the orthogonality of 1_{1,t−1} and
1_{2,t−1}. The [1,1] term of B11 is due to Lemma 1c; the [2,2] term is due to the facts that
T^{−2} Σ_{t=1}^T y_{t−1}y_{t−1}′ ⇒ Φ[∫W(r,1)W(r,1)′dr]Φ′, cf. lemma 3.1(b) of Phillips and Durlauf
(1986), and

T^{−2} Σ_{t=1}^T y_{t−1}y_{t−1}′1_{2,t−1} = T^{−2} Σ_{t=1}^T y_{t−1}y_{t−1}′(1 − 1(z_{t−1} < θ)) ⇒ (1−θ)Φ[∫W(r,1)W(r,1)′dr]Φ′.

B12 = o_p(1) since T^{−3/2} Σ_{t=1}^T ỹ_{t−1}Δy_{t−1}′ = T^{−1/2}·T^{−1} Σ_{t=1}^T ỹ_{t−1}Δy_{t−1}′ = T^{−1/2}O_p(1) = o_p(1).
B22 is due to the Weak Law of Large Numbers.


We obtain C1 according to Lemma 1b, and because T^{−1} Σ_{t=1}^T y_{t−1}e_t′ ⇒
Φ[∫W(r,1)dW(r,1)′]P′, cf. proposition 18.1(f) of Hamilton (1994), and

T^{−1} Σ_{t=1}^T y_{t−1}1_{2,t−1}e_t′ = T^{−1} Σ_{t=1}^T y_{t−1}(1 − 1(z_{t−1} < θ))e_t′ ⇒ Φ[∫W(r,1)dW(r,1)′ − ∫W(r,1)dW(r,θ)′]P′.

C2 is due to the Central Limit Theorem.
Because Υ^{−1}(Σ_{t=1}^T w_{t−1}w_{t−1}′)Υ^{−1} is asymptotically block-diagonal, it follows that

Υ(Θ̂ − Θ)′ ⇒ [B11^{−1}C1; B22^{−1}C2].

In particular, B11^{−1}C1 indicates that

T(Â1 − A1)′ ⇒ θ^{−1}Φ′^{−1}[∫W(r,1)W(r,1)′dr]^{−1}[∫W(r,1)dW(r,θ)′]P′,

and

T(Â2 − A2)′ ⇒ (1−θ)^{−1}Φ′^{−1}[∫W(r,1)W(r,1)′dr]^{−1}[∫W(r,1)dW(r,1)′ − ∫W(r,1)dW(r,θ)′]P′.

This implies the super-consistency of Â, and the consistency of Σ̂. For given θ, let Y and e
be the matrices stacking (y_{t−1}′1_{1,t−1}, y_{t−1}′1_{2,t−1}) and e_t′, respectively. Define Q as the projection
onto the orthogonal space of the lagged term Δy_{t−1}. Write the OLS estimates for (A1, A2)
as Â = (Â1, Â2). Then the Wald statistic for H0: A1 = A2 = 0 is

F(θ) = [vec(Â′)]′[(Y′QY)^{−1} ⊗ Σ̂]^{−1}[vec(Â′)]
     = trace[Σ̂^{−1/2}(e′QY)(Y′QY)^{−1}(Y′Qe)Σ̂^{−1/2}].

We can show that

T^{−1}Y′Qe = T^{−1}Σ ỹ_{t−1}e_t′ − T^{−1/2}[T^{−1}Σ ỹ_{t−1}Δy_{t−1}′][T^{−1}Σ Δy_{t−1}Δy_{t−1}′]^{−1}[T^{−1/2}Σ Δy_{t−1}e_t′]
          = T^{−1}Σ ỹ_{t−1}e_t′ + o_p(1) ⇒ C1,

and

T^{−2}Y′QY = T^{−2}Σ ỹ_{t−1}ỹ_{t−1}′ − T^{−1}[T^{−1}Σ ỹ_{t−1}Δy_{t−1}′][T^{−1}Σ Δy_{t−1}Δy_{t−1}′]^{−1}[T^{−1}Σ Δy_{t−1}ỹ_{t−1}′]
           = T^{−2}Σ ỹ_{t−1}ỹ_{t−1}′ + o_p(1) ⇒ B11.

Combining these with Σ̂^{−1/2} = (PP′)^{−1/2} + o_p(1), we can show

F(θ) = trace[Σ̂^{−1/2}(e′QY)(Y′QY)^{−1}(Y′Qe)Σ̂^{−1/2}]
     = trace[Σ^{−1/2}C1′B11^{−1}C1Σ^{−1/2}] + o_p(1)
     ⇒ trace[J0′J1^{−1}J0],


where

J0 = [∫W(r,1)dW(r,θ)′; ∫W(r,1)dW(r,1)′ − ∫W(r,1)dW(r,θ)′],

J1 = [θ∫W(r,1)W(r,1)′dr, 0; 0, (1−θ)∫W(r,1)W(r,1)′dr].

The remaining steps for the convergence of sup F follow from the continuous mapping
theorem.
Proof of Theorem 2. Only a sketch of the proof is provided since it is similar to that
of Theorem 1. The testing regression is a TVADLM with one lag and regime-dependent
intercept terms:

Δy_t = (c11 + A1 y_{t−1})1_{1,t−1} + (c21 + A2 y_{t−1})1_{2,t−1} + Ψ1 Δy_{t−1} + e_t.

Under the null hypothesis A1 = A2 = 0 and Assumption 3 (c11 = c21 = 0), it follows that y_{t−1} =
ξ_{t−1} in Lemma 1. For given θ, let Y and e be the matrices stacking (y_{t−1}′1_{1,t−1}, y_{t−1}′1_{2,t−1}) and
e_t′, respectively. Define Q^d as the projection onto the orthogonal space of (1_{1,t−1}, 1_{2,t−1}, Δy_{t−1}′).
Because of the orthogonality between 1_{1,t−1} and 1_{2,t−1}, and EΔy_{t−1} = 0, Q^d = I − (Q1^d +
Q2^d + Q3^d), where Q1^d is the projection onto the space of 1_{1,t−1}, Q2^d is the projection onto the space
of 1_{2,t−1}, and Q3^d is the projection onto the space of Δy_{t−1}. It follows that

T^{−1}Y′Q^d e = T^{−1}Y′e − T^{−1}Y′Q1^d e − T^{−1}Y′Q2^d e − T^{−1}Y′Q3^d e,

T^{−1}Y′e ⇒ C1,

T^{−1}Y′Q1^d e ⇒ θ^{−1}[θΦ∫W(r,1)dr; 0]W(1,θ)′P′,

T^{−1}Y′Q2^d e ⇒ (1−θ)^{−1}[0; (1−θ)Φ∫W(r,1)dr][W(1,1) − W(1,θ)]′P′,

T^{−1}Y′Q3^d e = o_p(1).

To summarize, we have

T^{−1}Y′Q^d e = C1 − Φ[∫W(r,1)dr·W(1,θ)′; ∫W(r,1)dr·[W(1,1) − W(1,θ)]′]P′ + o_p(1).

Similarly, we can show that


T^{−2}Y′Q^d Y = B11 − Φ[θ∫W(r,1)dr∫W(r,1)′dr, 0; 0, (1−θ)∫W(r,1)dr∫W(r,1)′dr]Φ′ + o_p(1).

Finally, it follows that

F^d(θ) = trace[Σ̂^{−1/2}(e′Q^d Y)(Y′Q^d Y)^{−1}(Y′Q^d e)Σ̂^{−1/2}] ⇒ trace[J0^d′(J1^d)^{−1}J0^d],

where

J0^d = J0 − [∫W(r,1)dr·W(1,θ)′; ∫W(r,1)dr·[W(1,1) − W(1,θ)]′],

J1^d = J1 − [θ∫W(r,1)dr∫W(r,1)′dr, 0; 0, (1−θ)∫W(r,1)dr∫W(r,1)′dr].

Proof of Theorem 3. Consider the testing regression given by

Δy_t = (c11 + c12·t + A1 y_{t−1})1_{1,t−1} + (c21 + c22·t + A2 y_{t−1})1_{2,t−1} + e_t.

For ease of exposition, the above regression excludes the lagged term Δy_{t−1}, since it is
asymptotically irrelevant, as shown by the proofs of Theorems 1 and 2. Under the null
hypothesis (8) and Assumption 3, y_t = y_0 + c_0·t + ξ_t, where ξ_t = Σ_{j=1}^t u_j, u_t = Φ(L)^{−1}e_t.
Because y_t is collinear with the time trend, we consider the hypothetical regression in the
canonical form

Δy_t = (c11* + c12*·t + A1*ξ_{t−1})1_{1,t−1} + (c21* + c22*·t + A2*ξ_{t−1})1_{2,t−1} + e_t,

where ξ_{t−1} = y_{t−1} − c_0(t − 1), A1* = A1, A2* = A2, c11* = c11 − A1c_0, c21* = c21 − A2c_0, c12* = c12 +
A1c_0, c22* = c22 + A2c_0. For given θ, stack (ξ_{t−1}′1_{1,t−1}, ξ_{t−1}′1_{2,t−1}) as X, (1_{1,t−1}, t·1_{1,t−1}) as M1,
and (1_{2,t−1}, t·1_{2,t−1}) as M2. Because 1_{1,t−1}1_{2,t−1} = 0 for all t, the projection onto the orthogonal
space of (1_{1,t−1}, t·1_{1,t−1}, 1_{2,t−1}, t·1_{2,t−1}), denoted by Q^t, is

Q^t = I − M1(M1′M1)^{−1}M1′ − M2(M2′M2)^{−1}M2′.

We can show that

T^{−1}X′Q^t e = T^{−1}X′[I − M1(M1′M1)^{−1}M1′ − M2(M2′M2)^{−1}M2′]e,

T^{−1}X′e ⇒ C1,

T^{−1}X′M1(M1′M1)^{−1}M1′e ⇒ C21·C22^{−1}·C23,

T^{−1}X′M2(M2′M2)^{−1}M2′e ⇒ C31·C32^{−1}·C33,

where

C21 = [θΦ∫W(r,1)dr, θΦ∫rW(r,1)dr; 0, 0] ≡ ΦC21*,

C22 = [θ, θ∫r dr; θ∫r dr, θ∫r² dr],

C23 = [W(1,θ)′P′; (∫r dW(r,θ))′P′] ≡ C23*P′,

C31 = [0, 0; (1−θ)Φ∫W(r,1)dr, (1−θ)Φ∫rW(r,1)dr] ≡ ΦC31*,

C32 = [1−θ, (1−θ)∫r dr; (1−θ)∫r dr, (1−θ)∫r² dr],

C33 = [(W(1,1) − W(1,θ))′P′; (∫r dW(r,1) − ∫r dW(r,θ))′P′] ≡ C33*P′.

To summarize,

T^{−1}X′Q^t e ⇒ C1 − C21C22^{−1}C23 − C31C32^{−1}C33.

In a similar fashion we can show

T^{−2}X′Q^t X ⇒ B11 − C21C22^{−1}C21′ − C31C32^{−1}C31′.

Finally,

F^t(θ) = trace[Σ̂^{−1/2}(e′Q^t X)(X′Q^t X)^{−1}(X′Q^t e)Σ̂^{−1/2}] ⇒ trace[J0^t′(J1^t)^{−1}J0^t],

where

J0^t = J0 − C21*C22^{−1}C23* − C31*C32^{−1}C33*,

J1^t = J1 − C21*C22^{−1}C21*′ − C31*C32^{−1}C31*′.

Final Manuscript Received: September 2015

References
Andrews, D. and Ploberger, W. (1994). ‘Optimal tests when a nuisance parameter is present only under the
alternative’, Econometrica, Vol. 62, pp. 1383–1414.
Balke, N. and Fomby, T. (1997). ‘Threshold cointegration’, International Economic Review, Vol. 38, pp. 627–
645.
Banerjee, A., Dolado, J., Hendry, D. and Smith, G. (1986). ‘Exploring equilibrium relationships in econometrics
through static models: some Monte Carlo evidence’, Oxford Bulletin of Economics and Statistics, Vol. 48,
pp. 253–277.


Boswijk, H. P. (1994). ‘Testing for an unstable root in conditional and structural error correction models’,
Journal of Econometrics, Vol. 63, pp. 37–60.
Caner, M. and Hansen, B. E. (2001). ‘Threshold autoregression with a unit root’, Econometrica, Vol. 69, pp.
1555–1596.
Davies, R. B. (1977). ‘Hypothesis testing when a nuisance parameter is present only under the alternative’,
Biometrika, Vol. 64, pp. 247–254.
Davies, R. B. (1987). ‘Hypothesis testing when a nuisance parameter is present only under the alternative’,
Biometrika, Vol. 74, pp. 33–43.
Enders, W. and Siklos, P. L. (2001). ‘Cointegration and threshold adjustment’, Journal of Business & Economic
Statistics, Vol. 19, pp. 166–176.
Giovanini, A. (1988). ‘Exchange rates and traded goods prices’, Journal of International Economics, Vol. 24,
pp. 45–68.
Hamilton, J. D. (1994). Time Series Analysis, Princeton University Press, Princeton.
Hansen, B. E. (1996). ‘Inference when a nuisance parameter is not identified under the null hypothesis’,
Econometrica, Vol. 64, pp. 413–430.
Hansen, B. E. and Seo, B. (2002). ‘Testing for two-regime threshold cointegration in vector error correction
models’, Journal of Econometrics, Vol. 110, pp. 293–318.
Horvath, M. T. and Watson, M. W. (1995). ‘Testing for cointegration when some of the cointegrating vectors
are prespecified’, Econometric Theory, Vol. 11, pp. 984–1014.
Jang, B. G., Koo, H. K., Liu, H. and Loewenstein, M. (2007). ‘Liquidity premia and transaction costs’, Journal
of Finance, Vol. 62, pp. 2329–2366.
Kremers, J. J. M., Ericsson, N. R. and Dolado, J. J. (1992). ‘The power of cointegration tests’, Oxford Bulletin
of Economics and Statistics, Vol. 54, pp. 325–348.
Kurtz, T. and Protter, P. (1991). ‘Weak limit theorems for stochastic integrals and stochastic differential
equations’, Annals of Probability, Vol. 19, pp. 1035–1070.
Li, J. and Lee, J. (2010). ‘ADL tests for threshold cointegration’, Journal of Time Series Analysis, Vol. 31, pp.
241–254.
Phillips, P. C. B. and Durlauf, S. (1986). ‘Multiple time series regression with integrated processes’, Review
of Economic Studies, Vol. 53, pp. 473–496.
Phillips, P. C. B. and Ouliaris, S. (1990). ‘Asymptotic properties of residual based tests for cointegration’,
Econometrica, Vol. 58, pp. 165–193.
Pippenger, M. K. and Goering, G. E. (1993). ‘A note on the empirical power of unit root tests under threshold
processes’, Oxford Bulletin of Economics and Statistics, Vol. 55, pp. 473–481.
Seo, M. (2006). ‘Bootstrap testing for the null of no cointegration in a threshold vector error correction model’,
Journal of Econometrics, Vol. 134, pp. 129–150.
Sephton, P. S. (2003). ‘Spatial market arbitrage and threshold cointegration’, American Journal of Agricultural
Economics, Vol. 85, pp. 1041–1046.
Sims, C. A., Stock, J. H. and Watson, M. W. (1990). ‘Inference in linear time series models with some unit
roots’, Econometrica, Vol. 58, pp. 113–144.
Taylor, M. P. and Peel, D. A. (2000). ‘Nonlinear adjustment, long-run equilibrium and exchange rate funda-
mentals’, Journal of International Money and Finance, Vol. 19, pp. 33–53.
Tsay, R. S. (1997). Unit-Root Tests with Threshold Innovations, University of Chicago, Chicago, IL.
Zivot, E. (2000). ‘The power of single-equation tests for cointegration when the cointegrating vector is pre-
specified’, Econometric Theory, Vol. 16, pp. 407–439.

Supporting Information
Additional supporting information may be found in the online version of this article:

Data S1. Reply to Referee 1.


Data S2. Reply to Referee 2.
