
BAND SPECTRAL REGRESSION WITH TRENDING DATA

BY

DEAN CORBAE, SAM OULIARIS, AND PETER C. B. PHILLIPS

COWLES FOUNDATION PAPER NO. 1039

COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
YALE UNIVERSITY
Box 208281
New Haven, Connecticut 06520-8281

2002

http://cowles.econ.yale.edu/
Econometrica, Vol. 70, No. 3 (May, 2002), 1067–1109

BAND SPECTRAL REGRESSION WITH TRENDING DATA

By Dean Corbae, Sam Ouliaris, and Peter C. B. Phillips¹

Band spectral regression with both deterministic and stochastic trends is considered. It is shown that trend removal by regression in the time domain prior to band spectral regression can lead to biased and inconsistent estimates in models with frequency dependent coefficients. Both semiparametric and nonparametric regression formulations are considered, the latter including general systems of two-sided distributed lags such as those arising in lead and lag regressions. The bias problem arises through omitted variables and is avoided by careful specification of the regression equation. Trend removal in the frequency domain is shown to be a convenient option in practice. An asymptotic theory is developed and the two cases of stationary data and cointegrated nonstationary data are compared. In the latter case, a levels and differences regression formulation is shown to be useful in estimating the frequency response function at nonzero as well as zero frequencies.

Keywords: Band spectral regression, deterministic and stochastic trends, discrete Fourier transform, distributed lag, integrated process, leads and lags regression, nonstationary time series, two-sided spectral BN decomposition.

1. INTRODUCTION
Hannan’s (1963a, b) band-spectrum regression procedure is a useful
regression device that has been adopted in some applied econometric work,
notably Engle (1974), where there are latent variables that may be regarded as
frequency dependent (like permanent and transitory income) and where there is
reason to expect that the relationship between the variables may depend on fre-
quency. More recently, band spectral regression has been used to estimate coin-
tegrating relations, which describe low frequency or long run relations between
economic variables. In particular, Phillips (1991a) showed how frequency domain
regressions that concentrate on a band around the origin are capable of pro-
ducing asymptotically efficient estimates of cointegrating vectors. Interestingly,
in that case the spectral methods are used with nonstationary integrated time
series in precisely the same way as they were originally designed for stationary
series, on which there is now a large literature (see, inter alia, Hannan (1970 Ch.
VII), Hannan and Thomson (1971), Hannan and Robinson (1973), and Robinson
(1991)). Related methods were used in Choi and Phillips (1993) to construct fre-
quency domain tests for unit roots. In all of this work, the capacity of frequency
domain techniques to deal nonparametrically with short memory components in
¹ Our thanks go to a co-editor and three referees for helpful comments. An early draft was written in the summer of 1994. The first complete version of the paper was written over the period 1 January–1 July 1997 while Phillips was visiting the University of Auckland. Lemmas A–D are due to Phillips and are taken from other work to make this paper self contained. Phillips thanks the NSF for research support under Grant Nos. SBR 94-22922, SBR 97-30295, and SES 0092509.


the data is exploited. Robinson (1991), who allowed for band dependent model
formulations, showed how, in such semiparametric models, data based methods
can be used to determine the smoothing parameter that plays a central role in
the treatment of the nonparametric component. Other data based methods have
been used directly in the study of cointegrating relations and testing problems by
Xiao and Phillips (1998, 1999).
To the extent that relations between variables may be band dependent, it is
also reasonable to expect that causal relations between variables may vary accord-
ing to frequency. Extending work by Geweke (1982), both Hosoya (1991) and
Granger and Lin (1995) have studied causal links from this perspective and con-
structed measures of causality that are frequency dependent. Most recently, fre-
quency band methods have been suggested for the estimation of models with long
memory components (e.g., Marinucci (1998), Robinson and Marinucci (1998),
Kim and Phillips (1999)), where salient features of the variables are naturally
evident in frequency domain formulations.
This paper studies some properties of frequency band regression in the pres-
ence of both deterministic and stochastic trends, a common feature in economic
time series applications. In such cases, a seemingly minor issue relates to the
manner of deterministic trend removal. In particular, should the deterministic
trends be eliminated by regression in the time domain prior to the use of the
frequency domain regression or not? Since spectral regression procedures were
originally developed for stationary time series and since the removal of deter-
ministic trends by least squares regression is well known to be asymptotically
efficient under weak conditions (Grenander and Rosenblatt (1957, p. 244)), it
may seem natural to perform the trend removal in the time domain prior to the
use of spectral methods. Indeed, this is recommended in Hannan (1963a, p. 30;
1970, pp. 443–444), even though the development of spectral regression there
and in Hannan (1970, Ch. VII.4) allows for regressors like deterministic trends
that are susceptible to a generalized harmonic analysis, satisfying the so-called
Grenander conditions (Grenander and Rosenblatt (1957, p. 233), Hannan (1970,
p. 77)). Theorem 11 of Hannan (1970, p. 444) confirms that such time domain
detrending followed by spectral regression is asymptotically efficient in models
where all the variables are trend stationary and where the model coefficients do
not vary with frequency.
In time domain regression, the Frisch–Waugh (1933) theorem assures invari-
ance of the regression coefficients to prior trend removal or to the inclusion
of trends in the regression itself. Such invariance is often taken for granted in
empirical work. However, as will be shown here, invariance does not always
apply for band-spectrum regression when one switches from time domain to fre-
quency domain regressions. In particular, detrending by removing deterministic
components in the time domain and then applying band-spectrum regression is
not necessarily equivalent to detrending in the frequency domain and applying
band-spectrum regression. The reason is that a band dependent spectral model
is similar in form to a linear regression with a structural change in the coeffi-
cient vector. So, when there are deterministic trends in the model, the original

deterministic regressors also need to be augmented by regressors that are rele-


vant to the change period (here, the frequency band where the change occurs).
Thus, the seemingly innocuous matter of detrending can have some nontrivial
consequences in practice. In particular, detrending in the time domain yields esti-
mates that can be biased in finite samples, and, in the case of nonstationary data,
inconsistent. The appropriate procedure is to take account of the augmented
regression equation prior to detrending and this can be readily accomplished by
including the deterministic variables explicitly in the frequency domain regres-
sion. These issues are relevant whenever band spectral methods are applied,
including those cases that involve cointegrating regressions. In the latter case,
a levels and differences regression formulation is shown to be appropriate and
useful in estimating the frequency response function at nonzero as well as zero
frequencies.
The present paper provides an asymptotic analysis of frequency domain
regression with trending data for the two cases of stationary and cointegrated
nonstationary data. We consider both semiparametric and fully nonparametric
formulations of the regression model, in both cases allowing the model coeffi-
cients to vary with frequency. In dealing with the nonstationary case, the paper
introduces some new methods for obtaining a limit theory for discrete Fourier
transforms (dft’s) of integrated time series and makes extensive use of two sided
BN decompositions, which lead to levels and differences model formulations.
This asymptotic theory simplifies some earlier results given in Phillips (1991a) and shows that the dft's of an I(1) process are spatially (i.e., frequencywise) dependent across all the fundamental frequencies, even in the limit as the sample size n → ∞, due to leakage from the zero frequency. This leakage is strong enough to ensure that smoothed periodogram estimates of the spectrum away from the origin are inconsistent at all frequencies λ ≠ 0. The techniques and results given here should be useful in other regression contexts where these quantities arise. Some extensions of the methods to cases of data with long range dependence are provided in Phillips (1999).
Most of our notation is standard: [a] signifies the largest integer not exceeding a, > signifies positive definiteness when applied to matrices, a* is the complex conjugate transpose of the matrix a, ‖a‖ is a matrix norm of a, a⁻ is the Moore–Penrose inverse of a, P_a = a(a*a)⁻a* is the orthogonal projector onto the range of a, L²[0, 1] is the space of square integrable functions on [0, 1], 1[A] is the indicator of A, ∼ signifies 'asymptotically distributed as', and →_d and →_p are used to denote weak convergence of the associated probability measures and convergence in probability, respectively, as the sample size n → ∞; I(1) signifies an integrated process of order one, BM(Ω) denotes a vector Brownian motion with covariance matrix Ω, and we write integrals like ∫₀¹ B(r) dr as ∫₀¹ B, or simply ∫B if there is no ambiguity over limits; MN(0, G) signifies a mixed normal distribution with matrix mixing variate G, N_c(0, G) signifies a complex normal distribution with covariance matrix G, the discrete Fourier transform (dft) of a_t, t = 1, …, n, is written w_a(λ) = (1/√n) Σ_{t=1}^{n} a_t e^{iλt}, and λ_s = 2πs/n (s = 0, 1, …, n − 1) are
the fundamental frequencies.
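As a quick numerical companion to this convention (an illustration added here, not part of the original text), a minimal sketch of the dft w_a(λ_s) at the fundamental frequencies, assuming numpy and the paper's summation over t = 1, …, n:

```python
import numpy as np

def dft(a):
    """w_a(lambda_s) = n^{-1/2} sum_{t=1}^{n} a_t e^{i lambda_s t},
    evaluated at the fundamental frequencies lambda_s = 2*pi*s/n."""
    n = len(a)
    t = np.arange(1, n + 1)                      # the sum runs over t = 1, ..., n
    lam = 2 * np.pi * np.arange(n) / n
    return np.array([(a * np.exp(1j * ls * t)).sum() for ls in lam]) / np.sqrt(n)
```

With this scaling, a constant series loads entirely on λ_0 = 0 (with w_a(0) = √n for a_t = 1) and the transform preserves the sum of squares, which is the form used repeatedly below.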

2. MODELS AND ESTIMATION PRELIMINARIES


Let {x_t, t = 1, …, n} be an observed time series generated by

(1) x_t = Π₂′ z_t + x̃_t,

where z_t is a (p+1)-dimensional deterministic sequence and x̃_t is a zero mean k-dimensional time series. The series x_t therefore has both a deterministic component involving the sequence z_t and a stochastic (latent) component x̃_t. In developing our theory it will be convenient to allow for both stationary and nonstationary x̃_t. Accordingly, we introduce the following two alternative assumptions. The mechanism linking the observed variables x_t, unobserved disturbances ε_t, and a dependent variable y_t is made explicit later.

Assumption 1: w_t = (ε_t, x̃_t′)′ is a jointly stationary time series with Wold representation w_t = Σ_{j=0}^{∞} C_j η_{t−j}, where η_t = iid(0, Σ) with finite fourth moments and the coefficients C_j satisfy Σ_{j=0}^{∞} j^{1/2} ‖C_j‖ < ∞. Partitioned conformably with w_t, the spectral density matrix f_ww(λ) of w_t is

f_ww(λ) = [ f_εε(λ)  0 ;  0  f_xx(λ) ],

with f_εε(λ), f_xx(λ) > 0 ∀λ.

Assumption 2: x̃_t is an I(1) process satisfying Δx̃_t = v_t, initialized at t = 0 by any O_p(1) random variable. The shocks w_t = (ε_t, v_t′)′ satisfy Assumption 1 with spectral density f_ww(λ) partitioned as

f_ww(λ) = [ f_εε(λ)  0 ;  0  f_vv(λ) ],

with f_εε(λ), f_vv(λ) > 0 ∀λ.

Assumptions 1 and 2 suffice for partial sums of w_t to satisfy the functional law n^{−1/2} Σ_{t=1}^{[nr]} w_t →_d B(r) = BM(Ω), a vector Brownian motion of dimension k + 1 with covariance matrix Ω = 2π f_ww(0) (e.g., Phillips and Solo (1992, Theorem 3.4)). The vector process B and matrix Ω can be partitioned conformably with w as B = (B_ε, B_x′)′ and

Ω = [ ω_εε  0 ;  0  Ω_xx ],

where Ω_xx > 0, so that x̃_t is a full rank vector I(1) process. In all cases, x̃_t is taken to be independent of ε_t. This amounts to a strict exogeneity assumption when we introduce a generating mechanism for a dependent variable y_t. The assumption is standard for consistent estimation in the stationary case (as in Hannan (1963a, b)). In the nonstationary case, it is not required for the consistent estimation of (low frequency) cointegrating regression components. However, at bands away from the origin, some version of incoherency between the regressors and errors is required for consistency, and asserting incoherency enables us to examine the bias and inconsistency effects of detrending in such regressions, just as in the case of stationary x̃_t.
We make the following assumptions concerning the deterministic sequence z_t. We confine our treatment to time polynomials and set z_t = (1, t, …, t^p)′, the most common case in practice, although we anticipate that our conclusions about bias and inconsistency will apply more generally, for instance to trigonometric polynomials, after some appropriate extensions of our dft formulae in Lemma A and the limit results in Lemma C below. Let Δ_n = diag(1, n, …, n^p), and define d_t = Δ_n^{−1} z_t. Then

(2) d_[nr] = Δ_n^{−1} z_[nr] → u(r) = (1, r, …, r^p)′

uniformly in r ∈ [0, 1]. The limit functions u(r) are linearly independent in L²[0, 1] and n^{−1} Σ_{t=1}^{n} d_t d_t′ → ∫₀¹ u u′ > 0. Now let Z (respectively, D) be the n × (p+1) observation matrix of the nonstochastic regressors z_t (respectively, standardized regressors d_t), P_Z be the projector onto the range of Z, and Q_Z = I_n − P_Z be the residual projection matrix (respectively, P_D and I − P_D). Clearly, P_Z = P_D and Q_Z = Q_D.

The Nonparametric Model


The type of model we have in mind for the stochastic component of the data allows for the regression coefficient to vary across frequency bands. In the time domain, suppose a (latent) dependent variable ỹ_t is related to x̃_t and ε_t according to a (possibly two-sided) distributed lag

(3) ỹ_t = Σ_{j=−∞}^{∞} α_j′ x̃_{t−j} + ε_t = α(L)′ x̃_t + ε_t,

where the transfer function of the filter, b(λ) = α(e^{iλ}) = Σ_{j=−∞}^{∞} α_j e^{ijλ}, is assumed to converge for all λ ∈ [−π, π]. Under stationarity (Assumption 1), the cross spectrum f_yx(λ) between ỹ_t and x̃_t satisfies

(4) f_yx(λ) = b(λ)′ f_xx(λ),

whose complex conjugate transpose gives the complex regression coefficient

(5) α_λ = b(−λ) = f_xx(λ)^{−1} f_xy(λ).

As argued in Hannan (1963a, b), spectral methods provide a natural mechanism for focusing attention on a particular frequency, or band of frequencies, and for performing estimation of α_λ.

Next, suppose that the coefficients α_j in (3) are such that α(L) has a valid two-sided spectral BN decomposition (see part (b) of Lemma D in the Appendix), viz.

(6) α(L) = α(e^{iλ}) + α̃_λ(e^{−iλ}L)(e^{−iλ}L − 1),

where

α̃_λ(L) = Σ_{j=−∞}^{∞} α̃_{λj} L^j, with α̃_{λj} = Σ_{k=j+1}^{∞} α_k e^{ikλ} for j ≥ 0, and α̃_{λj} = −Σ_{k=−∞}^{j} α_k e^{ikλ} for j < 0,

and Σ_{j=−∞}^{∞} ‖α̃_{λj}‖² < ∞. Using (6) in (3) we have

(7) ỹ_t = α(L)′ x̃_t + ε_t
       = [b(λ) + α̃_λ(e^{−iλ}L)(e^{−iλ}L − 1)]′ x̃_t + ε_t
       = b(λ)′ x̃_t + (e^{−iλ}L − 1) ã_{λt} + ε_t,

in which ã_{λt} = α̃_λ(e^{−iλ}L)′ x̃_t. Then, setting λ = λ_s in (7) and taking dft's at frequency λ_s we obtain the following frequency domain form of (3):

(8) w_ỹ(λ_s) = b(λ_s)′ w_x̃(λ_s) + w_ε(λ_s) + (1/√n) e^{−iλ_s} (ã_{λ_s,0} − ã_{λ_s,n}).

For stationary x̃_t, ã_{λt} is also stationary and (8) can be written as

(9) w_ỹ(λ_s) = b(λ_s)′ w_x̃(λ_s) + w_ε(λ_s) + O_p(1/√n),

giving an approximate frequency domain version of (3).
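The O_p(n^{−1/2}) edge term in (8)–(9) is easy to see numerically. The sketch below (an added illustration with hypothetical filter coefficients, not from the paper) passes a stationary series through a short two-sided filter with ε_t set to zero, so that the discrepancy w_ỹ(λ_s) − b(λ_s) w_x̃(λ_s) is exactly the edge effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
x_ext = rng.standard_normal(n + 2)            # x_0, x_1, ..., x_{n+1}, edge values included
x = x_ext[1:n + 1]                            # observed x_1, ..., x_n
coef = {-1: 0.3, 0: 1.0, 1: -0.5}             # hypothetical short two-sided filter alpha_j
y = sum(a * x_ext[1 - j:n + 1 - j] for j, a in coef.items())  # y_t = sum_j alpha_j x_{t-j}

t = np.arange(1, n + 1)
lam = 2 * np.pi * np.arange(n) / n

def w(a):
    # dft at all fundamental frequencies
    return np.array([(a * np.exp(1j * ls * t)).sum() for ls in lam]) / np.sqrt(n)

b = sum(a * np.exp(1j * j * lam) for j, a in coef.items())    # transfer function b(lambda)
err = w(y) - b * w(x)                         # the edge term of (8), uniformly O(1/sqrt(n))
```

The error collapses to a handful of boundary terms of the filtered series, which is why it is uniformly of order n^{−1/2} over all fundamental frequencies.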
The case of integrated x̃_t (Assumption 2) can be handled in a similar way. Here it is useful to apply to (3) the two-sided BN decomposition at unity (part (a) of Lemma D), viz.

(10) α(L) = α(1) + α̃(L)(L − 1), where α̃(L) = Σ_{j=−∞}^{∞} α̃_j L^j, with α̃_j = Σ_{k=j+1}^{∞} α_k for j ≥ 0 and α̃_j = −Σ_{k=−∞}^{j} α_k for j < 0,

so that (3) can be interpreted as

(11) ỹ_t = α(1)′ x̃_t + ε_t − α̃(L)′ v_t = α(1)′ x̃_t + ε̃_t, say,

which is a cointegrating equation between ỹ_t and x̃_t with cointegrating vector (1, −α(1)′) and composite error ε̃_t = ε_t − α̃(L)′ v_t. Setting φ_t = α̃(L)′ v_t and taking the dft of (11), we obtain

(12) w_ỹ(λ_s) = α(1)′ w_x̃(λ_s) + w_ε(λ_s) − w_φ(λ_s)
(13)          = b(λ_s)′ w_x̃(λ_s) + w_ε(λ_s) − w_φ(λ_s) + (α(1) − b(λ_s))′ w_x̃(λ_s).

The final two terms of (13) are important, revealing that the approximation (9) fails in the nonstationary case and has an error of specification that arises from omitted variables.

Assuming it is valid, the spectral BN decomposition of α̃(L) at frequency λ_s is

α̃(L) = α̃(e^{iλ_s}) + ᾱ_{λ_s}(e^{−iλ_s}L)(e^{−iλ_s}L − 1),

where ᾱ_{λ_s} is constructed from the coefficients of α̃(L) as in Lemma D(b). Then

(14) w_φ(λ_s) = α̃(e^{iλ_s})′ w_v(λ_s) + (1/√n) e^{−iλ_s} (ṽ_{λ_s,0} − ṽ_{λ_s,n})
             = α̃(e^{iλ_s})′ w_v(λ_s) + O_p(1/√n),

where ṽ_{λ_s,t} = ᾱ_{λ_s}(e^{−iλ_s}L)′ v_t is stationary. It follows that (12) has the form

(15) w_ỹ(λ_s) = α(1)′ w_x̃(λ_s) − α̃(e^{iλ_s})′ w_v(λ_s) + w_ε(λ_s) + O_p(1/√n),

giving an approximate frequency domain version of the cointegrated model that applies for all frequencies. Since v_t = Δx̃_t, it is apparent from (15) that if data on ỹ_t and x̃_t were available, the equation could be estimated in the frequency domain by band spectral regression of w_ỹ(λ_s) on w_x̃(λ_s) and w_Δx̃(λ_s) for frequencies λ_s centered around some frequency λ. For λ_s centered around λ = 0, this procedure was suggested originally in Phillips (1991b) and Phillips and Loretan (1991), where it is called augmented frequency domain regression, although in that work deterministic trends were excluded. In the general case where λ_s is centered on λ ≠ 0, we can recover the spectral coefficient α_λ = b(−λ) from the coefficients α(1) and α̃(e^{iλ_s}) = (α̃(e^{−iλ_s}))* in (15) using the fact that

(16) b(−λ) = α(e^{−iλ}) = α(1) + α̃(e^{−iλ})(e^{−iλ} − 1).
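The BN decomposition at unity in (10), and hence the identity (16), can be checked numerically for a finite two-sided filter. A sketch (added illustration; the coefficients are hypothetical random draws):

```python
import numpy as np

rng = np.random.default_rng(2)
J = 4
js = np.arange(-J, J + 1)
alpha = dict(zip(js, rng.standard_normal(len(js))))  # hypothetical filter coefficients alpha_j

def alpha_of(z):
    # the filter evaluated at a complex point, alpha(z) = sum_j alpha_j z^j
    return sum(c * z ** j for j, c in alpha.items())

def atilde(j):
    # BN coefficients at unity from (10): forward sums for j >= 0, backward for j < 0
    if j >= 0:
        return sum(alpha.get(k, 0.0) for k in range(j + 1, J + 1))
    return -sum(alpha.get(k, 0.0) for k in range(-J, j + 1))

def atilde_of(z):
    return sum(atilde(j) * z ** j for j in range(-J, J + 1))

lam = 0.7
z = np.exp(-1j * lam)
lhs = alpha_of(z)                                    # b(-lam) = alpha(e^{-i lam})
rhs = alpha_of(1.0) + atilde_of(z) * (z - 1)         # right side of (16)
```

The telescoping structure of the α̃_j sums is exactly what makes α(z) − α(1) factor as α̃(z)(z − 1).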


Causal economic relationships are often formulated as in (3) with one-sided relations where α_j = 0 for j < 0, but two-sided relations are not excluded. They are well known to occur in situations where there is feedback in both directions between the variables ỹ and x̃. A notable recent example arises in the simplification of cointegrating systems to single equation formulations, where two-sided dynamic regressions like (11) arise and where they have been used in regression analysis to obtain efficient estimates of the cointegrating vector (Phillips and Loretan (1991), Saikkonen (1991), Stock and Watson (1993)).
The mechanism linking the observed dependent variable y_t to z_t and ỹ_t (and, hence, x̃_t and ε_t) is now given by

(17) y_t = z_t′ β₁ + ỹ_t,

where ỹ_t follows (3) in the stationary case with (x̃_t, ε_t) satisfying Assumption 1, and (11) in the cointegrated case with (x̃_t, ε_t) satisfying Assumption 2. The observed series y_t and x_t therefore have deterministic trends and stochastic components that are linked by a system of distributed lags with stationary errors. The natural statistical approach would appear to be deterministic trend removal by regression of (y_t, x_t) on z_t, followed by an analysis of the system of lagged dependencies between the residuals from these regressions. The latter can be conducted in the frequency domain, where the coefficient of interest is the transfer function of the filter b(λ). This coefficient will vary according to frequency (unless α_j = 0 for j ≠ 0) and can be estimated by nonparametric regression techniques. One of the aims of this paper is to examine the finite sample and asymptotic performance of this natural procedure against that of a procedure that seeks to perform the regression analysis in the frequency domain directly on the observed quantities (y_t, x_t, z_t) rather than on detrended data.

A Fixed Band Regression Model


A simple situation of interest occurs when the response function b(λ) is constant, let us say on two fixed frequency bands. This case captures the essential features of the problem that we intend to discuss, is easy to analyze, and will be considered at some length here and in the asymptotic theory. It has the advantage that b(λ) is real and parametric and can be estimated at parametric rates when the bands are of fixed positive length. This case is analogous to a regression model with a structural break, and that analogy helps to explain why time and frequency domain detrending have different effects. The nonparametric model can be interpreted as a limiting case in which one of the bands shrinks to a particular frequency as the sample size increases. To promote the analogy with a structural break model we will develop a conventional regression model formulation for the data.

To this end, we postulate the dual bands A = (−λ₀, λ₀) and A^c = [−π, −λ₀] ∪ [λ₀, π] for some given frequency λ₀ > 0 and define the frequency dependent coefficient vector

(18) b(λ) = α_A 1[λ ∈ A] + α_{A^c} 1[λ ∈ A^c],

where α_A is a k-vector of parameters pertinent to the band A, and α_{A^c} is the corresponding k-vector of parameters pertinent to the band A^c. This formulation of b(λ) enables us to separate low frequency (|λ| ≤ λ₀) from high frequency (|λ| > λ₀) responses in a regression context. The coefficients α_j in the Fourier representation, b(λ) = Σ_{j=−∞}^{∞} α_j e^{ijλ}, of (18) are

(19) α_j = (1/2π) ∫_{−π}^{π} b(λ) e^{−ijλ} dλ = { α_A λ₀/π + α_{A^c} (π − λ₀)/π, j = 0; (α_A − α_{A^c}) sin(jλ₀)/(πj), j ≠ 0 },

so that the filter in (3) is symmetric and two-sided. This is an interesting case where the coefficients α_j decay slowly and do not satisfy the sufficient condition Σ_{j=−∞}^{∞} |j|^{1/2} ‖α_j‖ < ∞ given in Lemma D for the validity of the BN decomposition of α(L) = Σ_{j=−∞}^{∞} α_j L^j. Nevertheless, the BN decomposition of α(L) is still valid in this case, as shown in Remark (i) following Lemma D in the Appendix.
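The coefficients (19) and their slow sin(jλ₀)/(πj) decay can be checked numerically. A sketch (scalar case, with hypothetical values α_A = 2, α_{A^c} = 0.5, λ₀ = π/4), confirming that the truncated Fourier series recovers the two-band response away from the band edges:

```python
import numpy as np

alpha_A, alpha_Ac, lam0 = 2.0, 0.5, np.pi / 4       # hypothetical band coefficients
J = 5000
j = np.arange(1, J + 1)
a0 = alpha_A * lam0 / np.pi + alpha_Ac * (np.pi - lam0) / np.pi   # alpha_0 in (19)
aj = (alpha_A - alpha_Ac) * np.sin(j * lam0) / (np.pi * j)        # alpha_j, j != 0

def b_hat(lam):
    """Truncated Fourier series sum_{|j| <= J} alpha_j e^{ij lam};
    the filter is symmetric (alpha_{-j} = alpha_j), so the sum is real."""
    return a0 + 2 * np.sum(aj * np.cos(j * lam))
```

Because the α_j decay only like 1/j, convergence is slow (and exhibits Gibbs behavior at ±λ₀), which is exactly why the summability condition of Lemma D fails here.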
While data for ỹ_t can be generated by (3) and (19) by truncating the filter, an alternate frequency domain approach that uses (18) is possible. This mechanism has the advantage of revealing the impact of the shift in (18) in terms of a standard regression model with a structural change. Like x_t, the dependent variable y_t is assumed to have both deterministic and stochastic components. Its stochastic component ỹ_t is generated from x̃_t and ε_t by means of a triangular array formulation which we explain as follows.
Let X̃ = (x̃₁, …, x̃_n)′ be the n × k matrix of observations of the exogenous regressors x̃_t, let U be the n × n unitary matrix with (j, k) element e^{2πijk/n}/√n, and let W = E₁ₙ U, where

E₁ₙ = [ 0  1 ;  I_{n−1}  0 ]

is an n × n permutation matrix that reorders the rows of U so that the last row becomes the first and succeeding rows are simply advanced by one position. Then, W*W = I_n, and the matrix WX̃ has kth row given by the dft w_x̃(λ_s)′ with
s = k − 1.
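A small numerical sketch (an added illustration, assuming numpy) of the matrices U, E₁ₙ, and W = E₁ₙU, confirming that W is unitary and that, in 0-based indexing, row s of Wx is the dft w_x(λ_s):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
idx = np.arange(1, n + 1)                        # 1-based indices, matching t = 1, ..., n
U = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
E1n = np.zeros((n, n))
E1n[0, n - 1] = 1.0                              # last row of U becomes the first row of W
E1n[1:, :n - 1] = np.eye(n - 1)                  # remaining rows advance by one position
W = E1n @ U

x = rng.standard_normal(n)                       # x_1, ..., x_n
t = np.arange(1, n + 1)

# a symmetric selector A near the origin and the induced time-domain operator W*AW
s = np.arange(n)
A = np.diag(((s == 0) | (s == 1) | (s == n - 1)).astype(float))
Abar = W.conj().T @ A @ W                        # Hermitian and idempotent projector
```

The projector structure of W*AW is what turns band selection in the frequency domain into a well-defined operator on the time-domain data.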
We introduce the following matrices that are used to provide a frequency dependent structure to the model for y_t. Define:
• A = n × n selector matrix that zeroes out frequencies in WX̃ that are not relevant to the primary band of interest, say A. Then, A^c W = (I − A)W extracts the residual frequencies over A^c. Note that A^c A = AA^c = 0.
• Ā = W*AW and Ā^c = W*A^c W = I − Ā.

Then, for each n, ỹ = (ỹ₁, …, ỹ_n)′ is generated in the frequency domain in dft form by the system

(20) AWỹ = AWX̃ α_A + AWε,
(21) A^c Wỹ = A^c WX̃ α_{A^c} + A^c Wε,

where ε = (ε₁, …, ε_n)′. Equations (20) and (21) generate observations in the frequency domain that correspond to the model (8) in the nonparametric case

with the final term of (8) omitted. In the stationary case, this omitted term is
1
Op n− 2
as seen in (9), so the difference between the generating mechanisms
is negligible asymptotically. In the nonstationary case, however, the omitted term
is nonnegligible, as is apparent from (13) and (15). In consequence, the two gen-
erating mechanisms differ in the nonstationary case and this leads to a difference
in the asymptotic theory between the two models that will figure in our discus-
sion in Section 5.
Adding (20) and (21) and multiplying by W ∗ gives

(22)  A + A c X:
ỹ = A X:  Ac + (

which is the time domain generating mechanism for ỹ. By construction, the matri-
ces A and A c have elements that depend on n, and it follows that ỹt # t =
1     n has a triangular array structure, although we do not emphasize this by
using an additional subscript and we will have no need of triangular array limit
theory in our asymptotic development.
In (22), (ε_t, x̃_t′)′ satisfies Assumption 1 in the stationary case and (ε_t, Δx̃_t′)′ satisfies Assumption 2 in the nonstationary case. Thus, (22) is a variable (band dependent) coefficient time series regression with exogenous regressors and stationary errors under Assumption 1, and a cointegrating regression with exogenous regressors and band dependent coefficients under Assumption 2. Note that model (22) is semiparametric: parametric in the regression coefficients α_A and α_{A^c} and nonparametric in the regression errors ε.
We now suppose that the observed series y_t has deterministic components like x_t in (1) and is related to the unobserved component ỹ_t by means of

(23) y = Zβ₁ + ỹ.

Using (22) and (23), it follows that the observed data satisfy the model

(24) y = Z(β₁ − Π₂ α_A) + X α_A + Ā^c ZΠ₂ (α_A − α_{A^c}) − Ā^c X (α_A − α_{A^c}) + ε,

or, equivalently,

(25) y = Z(β₁ − Π₂ α_{A^c}) + Ā ZΠ₂ (α_{A^c} − α_A) + X α_{A^c} − Ā X (α_{A^c} − α_A) + ε,

or

(26) y = Ā Z(β₁ − Π₂ α_A) + Ā^c Z(β₁ − Π₂ α_{A^c}) + Ā X α_A + Ā^c X α_{A^c} + ε.

These models extend (22) to cases where the observed data contain deterministic trends. Observe that the trend regressors z_t now appear in the model with band dependent coefficients, just like the exogenous regressors x_t.

The formulations (24)–(26) make it clear that detrending the data in the time domain is not a simple matter of applying the projection matrix Q_Z, as might be expected immediately from (1) and (23). In fact, correct trend removal is accomplished by the use of the operator Q_V = I − P_V, where V = [Z, Ā^c Z] or, equivalently, in view of (26), V = [Ā Z, Ā^c Z]. Methods that rely on prefiltering by means of Q_Z do not fully remove the trends, and this can have important consequences, like biased and inconsistent estimates of the regression coefficients α_A and α_{A^c}. Of course, when A = I_n (so that the coefficient is invariant across bands, α_A = α say), we have Ā = I_n and Ā^c is null, so that V = Z and the usual detrending by Q_Z is appropriate. In that case (26) reduces to

y = Z(β₁ − Π₂ α) + X α + ε.

Put another way, conventional detrending by Q_Z implicitly assumes that there is no variation in the coefficient across frequency bands. If there are such variations, then correct detrending needs to take into account the effects of such variations on the trends, as in (26). Otherwise, there will be misspecification bias from omitted variables in the deterministic detrending.

Fixed Band Regressions and Misspecification Bias


A simple illustration with the Hannan (1963a) inefficient band-spectrum regression estimator shows the effects of such misspecification. This estimator can be constructed for the band A and then has the form

(27) α̂_A = (X′ Q_Z Ā Q_Z X)^{−1} X′ Q_Z Ā Q_Z y,

with a corresponding formula for α̂_{A^c}, the estimator over the band A^c. In forming α̂_A and α̂_{A^c}, the data are filtered by a trend removal regression via the projection Q_Z before performing the band-spectrum regression. This procedure follows Hannan's (1963a, b) recommendation for dealing with deterministic trends and, as we have discussed in the introduction, is the conventional approach in this context. Using (24) and (27), we find

(28) α̂_A = α_A + (X′ Q_Z Ā Q_Z X)^{−1} X′ Q_Z Ā Q_Z [Ā^c ZΠ₂ (α_A − α_{A^c}) − Ā^c X (α_A − α_{A^c}) + ε]
          = α_A − (X′ Q_Z Ā Q_Z X)^{−1} X′ Q_Z Ā Q_Z [Ā^c X̃ (α_A − α_{A^c}) − ε]
          = α_A − (X̃′ Q_Z Ā Q_Z X̃)^{−1} X̃′ Q_Z Ā Q_Z [Ā^c X̃ (α_A − α_{A^c}) − ε],

and the corresponding formula for α̂_{A^c} is

(29) α̂_{A^c} = α_{A^c} − (X̃′ Q_Z Ā^c Q_Z X̃)^{−1} X̃′ Q_Z Ā^c Q_Z [Ā X̃ (α_{A^c} − α_A) − ε].

It follows that

(30) E(α̂_A | X̃) = α_A − (X̃′ Q_Z Ā Q_Z X̃)^{−1} X̃′ Q_Z Ā Q_Z Ā^c X̃ (α_A − α_{A^c})

and

E(α̂_{A^c} | X̃) = α_{A^c} − (X̃′ Q_Z Ā^c Q_Z X̃)^{−1} X̃′ Q_Z Ā^c Q_Z Ā X̃ (α_{A^c} − α_A).

Hence, band-spectrum regression yields biased estimates of the coefficients when α_A ≠ α_{A^c} and trend removal regression is conducted via the projection Q_Z.

To some extent, the problem with α̂_A is a finite sample one. In fact, for stationary x̃_t, the bias disappears as n → ∞. But, as we shall see in Section 5, when x̃_t is integrated the bias in α̂_{A^c} does not always disappear in the limit.
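The contrast between the two detrending schemes can be seen in a small numerical sketch (an added illustration; all parameter values are hypothetical, and ε is set to zero so that only the omitted-variable term remains). Detrending by Q_Z as in (27) leaves a bias when α_A ≠ α_{A^c}, while partialling out the trend inside the band, anticipating equation (44) of Section 4, recovers α_A exactly in this noiseless setting:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
t = np.arange(1, n + 1)
Z = np.column_stack([np.ones(n), t])                  # z_t = (1, t)'
beta1, Pi2 = np.array([2.0, 0.3]), np.array([1.0, 0.5])
x_tilde = rng.standard_normal(n)                      # stationary latent regressor (k = 1)
X = Z @ Pi2 + x_tilde                                 # (1): x_t = Pi2' z_t + x~_t

idx = np.arange(1, n + 1)                             # W = E_1n U as defined above
U = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
W = np.vstack([U[-1:], U[:-1]])
s = np.arange(n)
A = np.diag(((s <= 8) | (s >= n - 8)).astype(float))  # symmetric low-frequency band A
Abar = (W.conj().T @ A @ W).real                      # real because the band is symmetric
Abar_c = np.eye(n) - Abar

alpha_A, alpha_Ac = 1.0, 3.0
y_tilde = Abar @ x_tilde * alpha_A + Abar_c @ x_tilde * alpha_Ac   # (22) with eps = 0
y = Z @ beta1 + y_tilde                               # (23)

# Hannan estimator (27): time-domain detrending with Q_Z, then band regression
Qz = np.eye(n) - Z @ np.linalg.pinv(Z)
a_hat = (X @ Qz @ Abar @ Qz @ y) / (X @ Qz @ Abar @ Qz @ X)

# frequency domain detrending: regress AWy on AWX with AWZ partialled out
AWZ, AWX, AWy = A @ W @ Z, A @ W @ X, A @ W @ y
Q = np.eye(n) - AWZ @ np.linalg.pinv(AWZ)
a_hat_f = ((AWX.conj() @ Q @ AWy) / (AWX.conj() @ Q @ AWX)).real
```

Because A annihilates A^c, the band A equation is exact once AWZ is included, whereas Q_Z does not commute with Ā and leaves the Ā^c X̃(α_A − α_{A^c}) term of (28) in the regression.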

Narrow Band Regressions


In the nonparametric case we estimate α_λ in (5) at a particular frequency λ, rather than over a fixed discrete band. In that case the band spectral estimator has the same form as (27), but the selector matrix A = A_λ (respectively, Ā = Ā_λ) chooses frequencies in a shrinking band about λ. We write the estimator as

(31) α̂_λ = (X′ Q_Z Ā_λ Q_Z X)^{−1} X′ Q_Z Ā_λ Q_Z y.

From the form of (15) it is apparent that the narrow band regression leading to (31) omits the term involving the dft, w_v(λ_s), of the differences v_t = Δx̃_t. So this narrow band regression also involves omitted variable misspecification.

An alternate nonparametric procedure is to use an augmented narrow band regression in which the differences are included. To fix ideas, we detrend the data using Q_Z and then perform a band spectral regression of the form

(32) w_yz(λ_s) = ã₁′ w_xz(λ_s) + ã₂′ w_Δxz(λ_s) + residual,

for frequencies λ_s in a shrinking band around λ. In equation (32) the affix 'z' on the subscript (e.g., in w_xz(λ_s)) signifies that the data have been detrended before taking dft's. This approach is analogous to the augmented spectral regression approach of Phillips and Loretan (1991) for estimating a cointegrating vector. However, the narrow band regression (32) now applies for frequencies λ that are nonzero as well as zero, and prior detrending of the data has occurred in the time domain. Define X̄ = [X, ΔX] and ã_λ = (ã₁λ′, ã₂λ′)′. Then

(33) ã_λ = (X̄′ Q_Z Ā_λ Q_Z X̄)^{−1} X̄′ Q_Z Ā_λ Q_Z y.

In view of the relation (16), an estimate of α_λ = α(e^{−iλ}) may be recovered from ã_λ using the linear combination

(34) α̃_λ = { ã₁,−λ − (e^{−iλ} − 1) ã₂,−λ, λ ≠ 0;  ã₁₀, λ = 0 }.

The asymptotic theory for the narrow band estimates α̂_λ and α̃_λ is developed
in Section 5.

3. DFT RECURSION FORMULAE


Here we develop some useful formulae for dft's of the deterministic sequence z_t. The following lemma gives general recursion formulae for the quantity W_k(λ_s) = Σ_{t=1}^{n} t^k e^{iλ_s t}. The case λ_s = 0 is well known, of course, but the result for λ_s ≠ 0 appears to be new.

Lemma A: (a) For λ_s = 0,

W_k(λ_s) = Σ_{t=1}^{n} t^k = (1/(k+1)) [ n^{k+1} − C(k+1, 1) B₁ n^k + C(k+1, 2) B₂ n^{k−1} + C(k+1, 3) B₃ n^{k−2} + ⋯ + C(k+1, k) B_k n ],

where the B_j are the Bernoulli numbers and C(m, j) denotes the binomial coefficient.

(b) For λ_s ≠ 0, W_k(λ_s) satisfies the recursion

(35) W_k(λ_s) = (e^{iλ_s}/(e^{iλ_s} − 1)) n^k + (1/(e^{iλ_s} − 1)) Σ_{j=1}^{k} C(k, j) (−1)^j W_{k−j}(λ_s),

with initialization

(36) W₀(λ_s) = Σ_{t=1}^{n} e^{iλ_s t} = { n, s = 0;  0, s ≠ 0 }.

Using (35), the standardized quantities W̄_k(λ_s) = (1/√n) Σ_{t=1}^{n} (t/n)^k e^{iλ_s t} satisfy the recursion

(37) W̄_k(λ_s) = (1/√n) (e^{iλ_s}/(e^{iλ_s} − 1)) + (1/(e^{iλ_s} − 1)) Σ_{j=1}^{k} (1/n^j) C(k, j) (−1)^j W̄_{k−j}(λ_s),

and then the dft of d_t is simply

w_d(λ_s) = (1/√n) Σ_{t=1}^{n} d_t e^{iλ_s t} = (W̄₀(λ_s), …, W̄_p(λ_s))′.
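Lemma A can be checked numerically. A sketch (added illustration) comparing the recursion (35), with initialization (36), against direct summation, together with the closed form of part (a) for k = 2:

```python
import numpy as np
from math import comb

def W_direct(k, s, n):
    # W_k(lambda_s) = sum_{t=1}^{n} t^k e^{i lambda_s t} by direct summation
    t = np.arange(1, n + 1)
    return np.sum(t ** k * np.exp(1j * 2 * np.pi * s * t / n))

def W_rec(k, s, n):
    # recursion (35), valid for s != 0; initialization (36) gives W_0 = 0 when s != 0
    z = np.exp(1j * 2 * np.pi * s / n)
    if k == 0:
        return 0.0 + 0.0j
    out = z / (z - 1) * n ** k
    for j in range(1, k + 1):
        out += comb(k, j) * (-1) ** j * W_rec(k - j, s, n) / (z - 1)
    return out
```

For k = 1 the recursion collapses to W₁(λ_s) = n e^{iλ_s}/(e^{iλ_s} − 1), which is the formula behind (39) below.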

The main cases of interest are low order polynomials, where explicit expressions for the discrete Fourier transforms W̄_k(λ_s) are easily obtained from Lemma A. Thus, when k = 0, 1, 2 we have

(38) W̄₀(λ_s) = { √n, s = 0;  0, s ≠ 0 },

(39) W̄₁(λ_s) = { (n+1)/(2n^{1/2}), s = 0;  (1/n^{1/2}) e^{iλ_s}/(e^{iλ_s} − 1), s ≠ 0 },

(40) W̄₂(λ_s) = { (n+1)(2n+1)/(6n^{3/2}), s = 0;  (1/n^{1/2}) e^{iλ_s}/(e^{iλ_s} − 1) − (2/n^{3/2}) e^{iλ_s}/(e^{iλ_s} − 1)², s ≠ 0 }.

In case (38), it is apparent that eliminating the zero frequency will demean the data and leave the model unchanged for $\lambda_s\neq 0$. Then $A^{c}Z=0$, and so $AQ_ZA^{c}=AA^{c}-AP_ZA^{c}=0$ and $Q_Z=Q_V$. It follows that $\widetilde X'Q_ZAQ_ZA^{c}\widetilde X=0$ in (30), and therefore $\hat\beta_A$ is unbiased in this case of simple data demeaning.

On the other hand, when $z_t=(1,t)'$ and $d_t=(1,t/n)'$, it follows from (38) and (39) that
$$(41)\qquad w_d(\lambda_s)=\frac{1}{\sqrt n}\sum_{t=1}^{n}d_te^{i\lambda_s t}=\begin{cases}\left(\sqrt n,\ \dfrac{n+1}{2n^{1/2}}\right)', & \lambda_s=0,\\[1ex] \left(0,\ \dfrac{1}{n^{1/2}}\,\dfrac{e^{i\lambda_s}}{e^{i\lambda_s}-1}\right)', & \lambda_s\neq 0,\end{cases}$$
and the second component of $w_d(\lambda_s)$ has nonzero elements for $\lambda_s\neq 0$. Hence $A^{c}Z\neq 0$ and $Q_Z\neq Q_V$, so that $\hat\beta_A$ is biased in this case. The same applies to higher order trends.

Frequencies in the band $\mathcal B_{A^{c}}$ are bounded away from the origin. It therefore follows from Lemma A and (37) that $\bar W_k(\lambda_s)=O(n^{-1/2})$ uniformly for $\lambda_s\in\mathcal B_{A^{c}}$. Hence,
$$(42)\qquad w_d(\lambda_s)=O(n^{-1/2})\quad\text{for all }\lambda_s\in\mathcal B_{A^{c}},$$
and, thus, $A^{c}D=W^{*}\mathbf A^{c}WD$ has elements that are of $O(n^{-1/2})$.
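The two components of (41) can be seen directly in a small computation (a sketch with illustrative values of $n$ and $s$): the dft of the intercept vanishes exactly at $\lambda_s\neq 0$, while the dft of $t/n$ equals $n^{-1/2}e^{i\lambda_s}/(e^{i\lambda_s}-1)$, the nonzero trend leakage discussed above.

```python
# Sketch of (41): dft of d_t = (1, t/n)' at a nonzero fundamental frequency.
# The intercept component is exactly zero; the trend component is nonzero and O(n^{-1/2}).
import cmath
from math import pi, sqrt

n, s = 400, 7                      # illustrative sample size and frequency index
lam = 2 * pi * s / n
w0 = sum(cmath.exp(1j * lam * t) for t in range(1, n + 1)) / sqrt(n)
w1 = sum((t / n) * cmath.exp(1j * lam * t) for t in range(1, n + 1)) / sqrt(n)

e = cmath.exp(1j * lam)
assert abs(w0) < 1e-8                              # constant: no leakage at lam != 0
assert abs(w1 - e / (sqrt(n) * (e - 1))) < 1e-8    # trend: nonzero, O(n^{-1/2})
```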

4 frequency domain detrending

We consider first the fixed band regression model (22) and (23). The problem of the omitted variable bias evident in (28) can be dealt with either in the frequency domain or in the time domain. In the time domain, one simply detrends using the projection operator $Q_V=I-P_V$, where $V=(AZ,A^{c}Z)$, as is apparent from the form of (26). In the frequency domain, the alternative is to leave the detrending until the regression is performed in the frequency domain.

To do frequency domain detrending, we simply apply the discrete Fourier transform operator $W$ to (24) and then perform the band spectrum regression. The transformed model is
$$Wy=WZ(\gamma_1-\Pi_2\beta_A)+WA^{c}Z\Pi_2(\beta_A-\beta_{A^{c}})+WX\beta_A-WA^{c}X(\beta_A-\beta_{A^{c}})+W\varepsilon$$
$$\phantom{Wy}=WZ(\gamma_1-\Pi_2\beta_A)+\mathbf A^{c}WZ\Pi_2(\beta_A-\beta_{A^{c}})+WX\beta_A-\mathbf A^{c}WX(\beta_A-\beta_{A^{c}})+W\varepsilon.$$
The resulting band spectral estimator for the band $\mathcal B_A$ is equivalent to a regression on
$$(43)\qquad \mathbf AWy=\mathbf AWZ(\gamma_1-\Pi_2\beta_A)+\mathbf AWX\beta_A+\mathbf AW\varepsilon$$
since $\mathbf A\mathbf A^{c}=0$, and therefore this estimator has the form
$$(44)\qquad \hat\beta^{f}_A=\left(X'W^{*}\mathbf AQ_{\mathbf AWZ}\mathbf AWX\right)^{-1}X'W^{*}\mathbf AQ_{\mathbf AWZ}\mathbf AWy
=\beta_A+\left(X'W^{*}\mathbf AQ_{\mathbf AWZ}\mathbf AWX\right)^{-1}X'W^{*}\mathbf AQ_{\mathbf AWZ}\mathbf AW\varepsilon.$$

Clearly, $E(\hat\beta^{f}_A\mid X)=\beta_A$, and the estimator is unbiased. A similar result holds for the corresponding estimator $\hat\beta^{f}_{A^{c}}$ of $\beta_{A^{c}}$.

In this frequency domain approach to detrending, the so-called Frisch–Waugh (1933) theorem clearly holds; i.e., the regression coefficient $\hat\beta^{f}_A$ on the variable $\mathbf AWX$ in (43) is invariant to whether the regressor $\mathbf AWZ$ is included in the regression or whether all the data have been previously detrended in the frequency domain by regression on $\mathbf AWZ$.
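The Frisch–Waugh property carries over to complex least squares in the frequency domain, and can be sketched numerically (simulated data, numpy; the variable names and data generating process are illustrative, not the paper's code):

```python
# Sketch of frequency-domain detrending and the Frisch-Waugh property:
# the coefficient on AWx from the joint regression on (AWx, AWz) equals the
# coefficient from a regression on frequency-domain detrended data.
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(1, n + 1)
z = t / n                                  # deterministic trend
x = rng.standard_normal(n)
y = 0.5 * x + 2.0 * z + 0.1 * rng.standard_normal(n)

wy, wx, wz = (np.fft.fft(v) / np.sqrt(n) for v in (y, x, z))
band = (np.arange(n) >= 1) & (np.arange(n) <= n // 8)   # a band A excluding frequency 0

def cls(Y, X):
    # complex least squares of Y on the columns of X
    return np.linalg.lstsq(X, Y, rcond=None)[0]

b_joint = cls(wy[band], np.column_stack([wx[band], wz[band]]))[0]
# detrend in the frequency domain first, then regress the residuals
r_y = wy[band] - wz[band] * cls(wy[band], wz[band][:, None])[0]
r_x = wx[band] - wz[band] * cls(wx[band], wz[band][:, None])[0]
b_fw = cls(r_y, r_x[:, None])[0]

assert abs(b_joint - b_fw) < 1e-8          # Frisch-Waugh equality, exact up to rounding
assert abs(b_joint - 0.5) < 0.2            # close to the true coefficient
```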
Following (32), the natural narrow band approach is to use an augmented regression model that includes the dft of the trend in the regression, viz.
$$(45)\qquad w_y(\lambda_s)=\tilde c_1w_x(\lambda_s)+\tilde c_2w_{\Delta x}(\lambda_s)+\tilde c_3w_z(\lambda_s)+\text{residual},$$
for frequencies $\lambda_s$ centered on $\lambda$. This narrow band regression leads to the estimate
$$(46)\qquad \tilde\beta^{f}_\lambda=\begin{cases}\tilde c_1(-\lambda)-(e^{-i\lambda}-1)\tilde c_2(-\lambda), & \lambda\neq 0,\\ \tilde c_{10}, & \lambda=0,\end{cases}$$
similar to (34).
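A minimal sketch of the narrow band levels-and-differences regression (45) and the combination (46) follows (simulated data with a constant coefficient, so the recovered frequency response should equal $\beta$ at every frequency; numpy, with illustrative sample size, band, and parameter values):

```python
# Sketch of (45)-(46): regress w_y on (w_x, w_dx, w_z) over a band centred on lam,
# then recover the frequency response via the combination in (46).
import numpy as np

rng = np.random.default_rng(1)
n, beta = 4096, 0.7
v = rng.standard_normal(n)
x = np.cumsum(v)                          # I(1) regressor
y = beta * x + 0.1 * rng.standard_normal(n)
dx = np.diff(x, prepend=0.0)              # first differences of x
z = np.arange(1, n + 1) / n               # linear trend regressor

wy, wx, wdx, wz = (np.fft.fft(u) / np.sqrt(n) for u in (y, x, dx, z))
centre = n // 8                           # lam = pi/4
band = np.arange(centre - 32, centre + 32)
X = np.column_stack([wx[band], wdx[band], wz[band]])
c1, c2, c3 = np.linalg.lstsq(X, wy[band], rcond=None)[0]

lam = 2 * np.pi * centre / n
beta_f = c1 - (np.exp(-1j * lam) - 1) * c2    # the combination in (46)
assert abs(beta_f - beta) < 0.2
```

The levels and differences regressors are nearly collinear (the dft of $\Delta x$ is approximately $(1-e^{-i\lambda_s})$ times the dft of $x$ plus an edge term), which is exactly why the linear combination in (46), rather than $\tilde c_1$ alone, is the well determined quantity.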

5 asymptotic theory

We derive a limit distribution theory for the detrended band spectral regression estimates and consider what happens to the bias as $n\to\infty$. We start by introducing some notation and making the framework for the limit theory more precise.

We start with the parametric case where there are the two discrete bands $\mathcal B_A$ and $\mathcal B_{A^{c}}$. Let $n_a=\#\{\lambda_s\in\mathcal B_A\}$ and $n_c=\#\{\lambda_s\in\mathcal B_{A^{c}}\}$ be the number of fundamental frequencies in the bands $\mathcal B_A$ and $\mathcal B_{A^{c}}$. It is convenient to subdivide $(-\pi,\pi]$ into subbands $\mathcal B_j$ of equal width (say, $\pi/J$) that center on the frequencies $\lambda_j=\pi j/J$, $j=-J+1,\ldots,J-1$. Let $m=\#\{\lambda_s\in\mathcal B_j\}$ and suppose that $J_a$ of these bands lie in $\mathcal B_A$. It follows that $n$ and $n_a$ can be approximated by $n=2mJ$ and $n_a=2mJ_a$, respectively.

In the nonparametric case, we focus on a single frequency $\lambda$ and consider a shrinking band $\mathcal B_\lambda$ of width $\pi/J$ centered on $\lambda$. Again, we let $m=\#\{\lambda_s\in\mathcal B_\lambda\}$.

The following condition will be useful in the development of the asymptotics and will be taken to hold throughout the remainder of the paper. Additional requirements will be stated as needed.

Assumption 3: (a) $n_a/n\to E$ and $n_c/n\to 1-E$ for some fixed number $E\in(0,1)$ as $n\to\infty$.
(b) $m,J\to\infty$, and $J/\sqrt n\to 0$ as $n\to\infty$.

For the bias in $\hat\beta_A$ to vanish asymptotically, the deviation that depends on the term
$$\left(\widetilde X'Q_ZAQ_Z\widetilde X\right)^{-1}\widetilde X'Q_ZAQ_ZA^{c}\widetilde X\,(\beta_A-\beta_{A^{c}})$$
in (28) needs to disappear as $n\to\infty$. We will distinguish the two cases of stationary and nonstationary (integrated) $\tilde x_t$, corresponding to Assumptions 1 and 2, in the following discussion.

5.1 The Stationary Case

Here, the bias in $\hat\beta_A$ will disappear when
$$(47)\qquad \left(\frac{\widetilde X'Q_ZAQ_Z\widetilde X}{n}\right)^{-1}\frac{\widetilde X'Q_ZAQ_ZA^{c}\widetilde X}{n}\ \xrightarrow{p}\ 0.$$
A similar requirement, obtained by interchanging $A$ and $A^{c}$ in (47), holds for the bias in $\hat\beta_{A^{c}}$. We have the following result.

Theorem 1 (Semiparametric Case): If $\tilde x_t$ and $\varepsilon_t$ are zero mean, stationary, and ergodic time series satisfying Assumption 1, and $\tilde y_t$ is generated by (22), band spectral regression with detrending in the time domain or in the frequency domain is consistent for both $\beta_A$ and $\beta_{A^{c}}$. The common limit distribution of $\hat\beta_A$ and $\hat\beta^{f}_A$ is given by
$$\sqrt n\,(\hat\beta_A-\beta_A),\ \sqrt n\,(\hat\beta^{f}_A-\beta_A)\ \xrightarrow{d}\ N(0,V_a),$$
where
$$(48)\qquad V_a=\left(\int_{\mathcal B_A}f_{xx}(\lambda)\,d\lambda\right)^{-1}\left(2\pi\int_{\mathcal B_A}f_{xx}(\lambda)f_{\varepsilon\varepsilon}(\lambda)\,d\lambda\right)\left(\int_{\mathcal B_A}f_{xx}(\lambda)\,d\lambda\right)^{-1},$$
with an analogous result for $\hat\beta_{A^{c}}$ and $\hat\beta^{f}_{A^{c}}$.

In the stationary case, therefore, the bias in band spectral regression from time domain detrending disappears as $n\to\infty$ and there is no difference between the two bands $\mathcal B_A$ and $\mathcal B_{A^{c}}$ in terms of the limit theory. It is therefore irrelevant whether the main focus of interest is high or low frequency regression. The form of the asymptotic covariance matrix $V_a$ is a band spectral version of the familiar sandwich formula for the robust covariance matrix in least squares regression. $V_a$ can be estimated by replacing the spectra in the above formula with corresponding consistent estimates and averaging over the band $\mathcal B_A$. For the full band case where $\mathcal B_A=(-\pi,\pi]$, the matrix $V_a$ is the well known formula for the asymptotic covariance matrix of the least squares regression estimator in a time series regression (cf. Hannan (1970, p. 426)). A formula related to $V_a$ was given in Hannan (1963b, equation (16)) for band spectral estimates in the context of models with measurement error.

A similar result holds in the nonparametric case, but with a different convergence rate.

Theorem 1′ (Nonparametric Case): If $\tilde x_t$ and $\varepsilon_t$ are zero mean, stationary, and ergodic time series satisfying Assumption 1, and $\tilde y_t$ is generated by (3), nonparametric band spectral regression with detrending in the time domain or in the frequency domain is consistent for $\beta_\lambda=b(-\lambda)$. The common limit distribution of $\hat\beta_\lambda$ and $\hat\beta^{f}_\lambda$ is given by
$$\sqrt m\,(\hat\beta_\lambda-\beta_\lambda),\ \sqrt m\,(\hat\beta^{f}_\lambda-\beta_\lambda)\ \xrightarrow{d}\ N_c(0,V_\lambda),$$
where $V_\lambda=f_{\varepsilon\varepsilon}(\lambda)f_{xx}(\lambda)^{-1}$.

5.2 The Nonstationary Case

Here, the distinction between low frequency regression and regression at other frequencies becomes important. In the band regression model (22), there are only two bands, $\mathcal B_A$ and $\mathcal B_{A^{c}}$. Over $\mathcal B_A$, which includes the zero frequency, the estimator is known to be $n$-consistent when $\beta_A=\beta_{A^{c}}$ (see Phillips (1991a)) since in that case the regression equation is a conventional cointegrating relation. When $\beta_A\neq\beta_{A^{c}}$, the same result continues to hold over the band $\mathcal B_A$, as shown in Theorem 2 below. In this case, the bias in (28) disappears when
$$(49)\qquad \left(\frac{\widetilde X'Q_ZAQ_Z\widetilde X}{n^{2}}\right)^{-1}\frac{\widetilde X'Q_ZAQ_ZA^{c}\widetilde X}{n^{2}}\ \xrightarrow{p}\ 0.$$
Note that in (49) the moment matrices are standardized by $n^{2}$, because the data nonstationarity is manifest in bands like $\mathcal B_A$ that include the zero frequency. Over frequency bands like $\mathcal B_{A^{c}}$ that exclude the zero frequency, the rate of convergence of the moment matrices is slower and the bias in $\hat\beta_{A^{c}}$ will disappear when
$$(50)\qquad \left(\frac{\widetilde X'Q_ZA^{c}Q_Z\widetilde X}{n}\right)^{-1}\frac{\widetilde X'Q_ZA^{c}Q_ZA\widetilde X}{n}\ \xrightarrow{p}\ 0.$$

Our starting point in the $I(1)$ case is to provide some limit theory for discrete Fourier transforms of $I(1)$ and detrended $I(1)$ processes. The following two lemmas do this and provide a limit theory for periodogram averages of such processes. The remainder of the limit theory then follows in a fairly straightforward way from these lemmas. To make the derivations simpler we confine our attention here to the case of a linear trend.

Lemma B: Let $\tilde x_t$ be an $I(1)$ process satisfying Assumption 2. Then, the discrete Fourier transform of $\tilde x_t$ for $\lambda_s\neq 0$ is given by
$$(51)\qquad w_{\tilde x}(\lambda_s)=\frac{1}{1-e^{i\lambda_s}}\,w_v(\lambda_s)-\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}\,\frac{\tilde x_n-\tilde x_0}{n^{1/2}}.$$
Equation (51) shows that the discrete Fourier transforms of an $I(1)$ process are not asymptotically independent across fundamental frequencies. Indeed, they are frequency-wise dependent by virtue of the component $n^{-1/2}\tilde x_n$, which produces a common leakage into all frequencies $\lambda_s\neq 0$, even in the limit as $n\to\infty$. As the next lemma shows, this leakage is strong enough to ensure that smoothed periodogram estimates of the spectral density $f_{xx}(\lambda)=|1-e^{i\lambda}|^{-2}f_{vv}(\lambda)$ are inconsistent at frequencies $\lambda\in\mathcal B_{A^{c}}$. Lemma C(f) shows that the leakage is still manifest when the data are first detrended in the time domain. On the other hand, it is apparent from (41) that we can write (51) as
$$(52)\qquad w_{\tilde x}(\lambda_s)=\frac{1}{1-e^{i\lambda_s}}\,w_v(\lambda_s)+w_{t/n}(\lambda_s)\left(\tilde x_n-\tilde x_0\right),$$
from which it is clear that frequency domain detrending (i.e., using residuals from regressions of the frequency domain data on $w_{t/n}(\lambda_s)$ or $w_d(\lambda_s)$) will remove the second term of (51) and thereby eliminate the common leakage from the low frequency.
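Identity (51) is exact in finite samples, so it can be verified to machine precision (a stdlib Python sketch; the random walk and seed are illustrative):

```python
# Exact finite-sample check of (51):
# w_x(lam_s) = w_v(lam_s)/(1 - e^{i lam_s}) - (e^{i lam_s}/(1 - e^{i lam_s})) (x_n - x_0)/sqrt(n).
import cmath, random
from math import pi, sqrt

random.seed(0)
n = 128
v = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0]
for vt in v:
    x.append(x[-1] + vt)                      # x_t = x_{t-1} + v_t, with x_0 = 0

def dft(a, s):                                # w_a(lam_s) = n^{-1/2} sum_t a_t e^{i lam_s t}
    lam = 2 * pi * s / n
    return sum(a[t] * cmath.exp(1j * lam * t) for t in range(1, n + 1)) / sqrt(n)

for s in (1, 5, 60):
    lam = 2 * pi * s / n
    e = cmath.exp(1j * lam)
    wx = dft(x, s)
    wv = dft([0.0] + v, s)                    # align v_t with index t = 1..n
    rhs = wv / (1 - e) - e / (1 - e) * (x[n] - x[0]) / sqrt(n)
    assert abs(wx - rhs) < 1e-9
```

The second term on the right is the common leakage component discussed above: it is the same random quantity $(\tilde x_n-\tilde x_0)/\sqrt n$ at every nonzero frequency.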

Lemma C: Let $\tilde x_t$ be an $I(1)$ process satisfying Assumption 2 and $d_t=(1,t/n)'$. Define $\tilde x_{dt}=\tilde x_t-d_t'(D'D)^{-1}D'\widetilde X$ and let $w_{xd}(\lambda_s)$ be the discrete Fourier transform of $\tilde x_{dt}$. Then, as $n\to\infty$:

(a) $n^{-2}\sum_{\lambda_s\in\mathcal B_A}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_xB_x'$;

(b) $n^{-3/2}\sum_{\lambda_s\in\mathcal B_A}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_xu'$;

(c) $n^{-2}\sum_{\lambda_s\in\mathcal B_A}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_{xu}B_{xu}'$;

(d) $n^{-1/2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ -\dfrac{1}{2\pi}B_x(1)\displaystyle\int_{\mathcal B_{A^{c}}}\dfrac{e^{i\lambda}f_1(\lambda)^{*}}{1-e^{i\lambda}}\,d\lambda$;

(e) $n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}\ \xrightarrow{d}\ \displaystyle\int_{\mathcal B_{A^{c}}}\left[f_{xx}(\lambda)+\dfrac{1}{2\pi}\,\dfrac{1}{|1-e^{i\lambda}|^{2}}\,B_x(1)B_x(1)'\right]d\lambda$;

(f) $n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}\ \xrightarrow{d}\ \displaystyle\int_{\mathcal B_{A^{c}}}\left[f_{xx}(\lambda)+(2\pi)^{-1}g(\lambda,B_x)g(\lambda,B_x)^{*}\right]d\lambda$;

(g) $n^{-1/2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{xd}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ -\dfrac{1}{2\pi}\displaystyle\int_{\mathcal B_{A^{c}}}g(\lambda,B_x)f_1(\lambda)^{*}\,d\lambda$;

(h) $\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_d(\lambda_s)w_d(\lambda_s)^{*}\ \to\ \dfrac{1}{2\pi}\displaystyle\int_{\mathcal B_{A^{c}}}f_1(\lambda)f_1(\lambda)^{*}\,d\lambda$;

(i) $n^{-1}\sum_{\lambda_s\in\mathcal B_A}w_d(\lambda_s)w_d(\lambda_s)^{*}\ \to\ \int_0^1uu'$;

(j) $n^{-2}\sum_{\lambda_s\in\mathcal B_0}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_xB_x'$;

(k) $n^{-2}\sum_{\lambda_s\in\mathcal B_0}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_{xu}B_{xu}'$;

(l) $n^{-1}\sum_{\lambda_s\in\mathcal B_0}w_d(\lambda_s)w_d(\lambda_s)^{*}\ \to\ \int_0^1uu'$;

(m) $n^{-3/2}\sum_{\lambda_s\in\mathcal B_0}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_xu'$;

(n) $(n/m)\sum_{\lambda_s\in\mathcal B_\lambda}w_d(\lambda_s)w_d(\lambda_s)^{*}\ \to\ f_1(\lambda)f_1(\lambda)^{*}$, $\lambda\neq 0$;

(o) $\sqrt n\,m^{-1}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ \dfrac{e^{i\lambda}}{e^{i\lambda}-1}\,B_x(1)f_1(\lambda)^{*}$, $\lambda\neq 0$;

where $f_{xx}(\lambda)=|1-e^{i\lambda}|^{-2}f_{vv}(\lambda)$, $f_1(\lambda)=\left(0,\ e^{i\lambda}/(e^{i\lambda}-1)\right)'$ for $\lambda\neq 0$, and
$$B_{xu}(r)=B_x(r)-\left(\int_0^1B_xu'\right)\left(\int_0^1uu'\right)^{-1}u(r),$$
and
$$g(\lambda,B_x)=\frac{e^{i\lambda}}{1-e^{i\lambda}}\,B_x(1)+\left(\int_0^1B_xu'\right)\left(\int_0^1uu'\right)^{-1}f_1(\lambda).$$
Joint convergence applies in (a)–(o).

With this result in hand, the $I(1)$ band regression model can be analyzed. The limit theory for time domain detrended band spectral regression is as follows.

Theorem 2: Suppose $(\varepsilon_t,\tilde x_t)$ satisfies Assumption 2 and $\tilde y_t$ is generated by (22). Then
$$(53)\qquad n(\hat\beta_A-\beta_A)\ \xrightarrow{d}\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}\int_0^1B_{xu}\,dB_\varepsilon
= MN\!\left(0,\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}2\pi f_{\varepsilon\varepsilon}(0)\right),$$
and
$$(54)\qquad \hat\beta_{A^{c}}\ \xrightarrow{d}\ \beta_{A^{c}}-G^{-1}\Phi\left(\beta_{A^{c}}-\beta_A\right),$$
where
$$G=\int_{\mathcal B_{A^{c}}}\left[2\pi f_{xx}(\lambda)+g(\lambda,B_x)g(\lambda,B_x)^{*}\right]d\lambda,$$
and
$$(55)\qquad \Phi=\left(\int_{\mathcal B_{A^{c}}}g(\lambda,B_x)f_1(\lambda)^{*}\,d\lambda\right)\left(\int_0^1uu'\right)^{-1}\left(\int_0^1uB_x'\right).$$

So, when the regressors are $I(1)$, band spectral regression is inconsistent in frequency bands that exclude the origin when detrending is performed in the time domain prior to frequency domain regression. The bias is random and is linear in the differential, $\beta_{A^{c}}-\beta_A$, between the coefficients in the frequency bands. In the case of a linear trend $z_t=t$, the limit function $f_1(\lambda)=e^{i\lambda}/(e^{i\lambda}-1)$ and then the bias depends on
$$g(\lambda,B_x)=\frac{e^{i\lambda}}{1-e^{i\lambda}}\left[B_x(1)-\left(\int_0^1B_xr\right)\left(\int_0^1r^{2}\right)^{-1}\right].$$
When $f_{xx}(\lambda)=(1/2\pi)|1-e^{i\lambda}|^{-2}$ (i.e., when $v_t$ is iid$(0,1)$), a simple calculation reveals that the probability limit (54) of $\hat\beta_{A^{c}}$ simplifies to
$$(56)\qquad \beta_{A^{c}}+(\beta_A-\beta_{A^{c}})\,
\frac{\left(\int_0^1B_xr\right)\left(\int_0^1r^{2}\right)^{-1}\left[\left(\int_0^1B_xr\right)\left(\int_0^1r^{2}\right)^{-1}-B_x(1)\right]}
{1+\left[B_x(1)-\left(\int_0^1B_xr\right)\left(\int_0^1r^{2}\right)^{-1}\right]^{2}}.$$
So, in this case, the bias does not depend on the width of the band $\mathcal B_{A^{c}}$.
The following theorem gives the corresponding results for the nonparametric estimators $\hat\beta_\lambda$ and $\tilde\beta_\lambda$.

Theorem 2′: Suppose $(\varepsilon_t,\tilde x_t)$ satisfies Assumption 2 and $\tilde y_t$ is generated by (11), where $\beta(L)$ and $\tilde\beta(L)$ both have valid BN decompositions.
(a) For $\lambda=0$,
$$(57)\qquad n(\hat\beta_0-\beta_0)\ \xrightarrow{d}\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}\left[\int_0^1B_{xu}\,dB_\varepsilon-\left(\int_0^1B_{xu}\,dB_x'+\Delta_x\right)\tilde\beta(1)\right],$$
where $\beta_0=\beta(1)$ and $\Delta_x=\sum_{j=-\infty}^{\infty}E(v_0v_j')$.
For $\lambda\neq 0$,
$$(58)\qquad \hat\beta_\lambda\ \xrightarrow{d}\ \beta_\lambda+2\pi G_\lambda^{-1}f_{xx}(\lambda)\tilde\beta(e^{-i\lambda})\left(e^{-i\lambda}-1\right),$$
where $\beta_\lambda=\beta(e^{-i\lambda})=b(-\lambda)$,
$$G_\lambda=2\pi f_{xx}(\lambda)+g(\lambda,B_x)g(\lambda,B_x)^{*},$$
and $f_{xx}(\lambda)=|1-e^{i\lambda}|^{-2}f_{vv}(\lambda)$.
(b) For $\lambda=0$,
$$n(\tilde\beta_0-\beta_0)\ \xrightarrow{d}\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}\int_0^1B_{xu}\,dB_\varepsilon.$$
For $\lambda\neq 0$,
$$\sqrt m\,(\tilde\beta_\lambda-\beta_\lambda)\ \xrightarrow{d}\ N_c\!\left(0,\ \frac{f_{\varepsilon\varepsilon}(\lambda)}{f_{xx}(\lambda)}\right).$$

Remarks: (i) In Theorem 2′(a) it is apparent that the limit distribution of $\hat\beta_0$ has a second order bias term involving
$$\left(\int_0^1B_{xu}\,dB_x'+\Delta_x\right)\tilde\beta(1),$$
which is nonzero except when $\tilde\beta(1)=0$, a circumstance that arises when the filter $\beta(L)$ is symmetric (see Remark (ii) following Lemma D in the Appendix), as it is in the case (19). In that case, the limit (57) reduces to the same mixed normal distribution as (53). Part (a) shows also that the estimator $\hat\beta_\lambda$ is inconsistent at all frequencies $\lambda\neq 0$ except when $\tilde\beta(e^{-i\lambda})=0$. The bias in $\hat\beta_0$ and the inconsistency in $\hat\beta_\lambda$ are due to omitted variable misspecification in the regression.

(ii) Part (b) of Theorem 2′ shows that the estimator $\tilde\beta_\lambda$ based on the augmented regression (32) is consistent both for $\lambda=0$ and for $\lambda\neq 0$. The same mixed normal distribution as (53) applies when $\lambda=0$. When $\lambda\neq 0$ the limit distribution is the same as in the stationary case given in Theorem 1′.
The following results give the limit theory for the frequency domain detrended estimators in the fixed band and narrow band cases for nonstationary data.

Theorem 3: Under the conditions of Theorem 2,
$$(59)\qquad n(\hat\beta^{f}_A-\beta_A)\ \xrightarrow{d}\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}\int_0^1B_{xu}\,dB_\varepsilon,$$
and
$$(60)\qquad \sqrt n\,(\hat\beta^{f}_{A^{c}}-\beta_{A^{c}})\ \xrightarrow{d}\ N\!\left(0,\ \left(\int_{\mathcal B_{A^{c}}}f_{xx}(\lambda)\,d\lambda\right)^{-1}\left(2\pi\int_{\mathcal B_{A^{c}}}f_{xx}(\lambda)f_{\varepsilon\varepsilon}(\lambda)\,d\lambda\right)\left(\int_{\mathcal B_{A^{c}}}f_{xx}(\lambda)\,d\lambda\right)^{-1}\right).$$

In fixed band regression models there is usually some advantage to be gained by averaging over the band and using weighted regression, as shown in Hannan's (1963a, b) original work. Efficient regression is based on a weighted band spectral regression that uses a preliminary regression to obtain estimates of the equation errors and a corresponding estimate of the error spectrum, say $\hat f_{\varepsilon\varepsilon}(\lambda)$, that is uniformly consistent, so that $\sup_\lambda|\hat f_{\varepsilon\varepsilon}(\lambda)-f_{\varepsilon\varepsilon}(\lambda)|\to_p0$ (e.g., Hannan (1970, p. 488)). When such weighted regression is performed with frequency domain detrending, the resulting estimates have optimal properties for both the nonstationary component and the stationary component. For $\beta_A$, the limit distribution is the same as (59) and is optimal in the sense of Phillips (1991b). For $\beta_{A^{c}}$, the limit distribution of the efficient estimate is normal with variance matrix $2\pi\left(\int_{\mathcal B_{A^{c}}}f_{xx}(\lambda)f_{\varepsilon\varepsilon}(\lambda)^{-1}\,d\lambda\right)^{-1}$ and therefore attains the usual efficiency bound in time series regression (e.g., Hannan (1970, eqn. (3.4), p. 427)), adjusted here for band limited regression (Hannan (1963b, eqn. (16))). Details are omitted and are available in an earlier version of the present paper.

Theorem 3′: Under the conditions of Theorem 2′, at $\lambda=0$,
$$n(\tilde\beta^{f}_0-\beta_0)\ \xrightarrow{d}\ \left(\int_0^1B_{xu}B_{xu}'\right)^{-1}\int_0^1B_{xu}\,dB_\varepsilon,$$
and for $\lambda\neq 0$,
$$(61)\qquad \sqrt m\,(\tilde\beta^{f}_\lambda-\beta_\lambda)\ \xrightarrow{d}\ N_c\!\left(0,\ f_{\varepsilon\varepsilon}(\lambda)f_{xx}(\lambda)^{-1}\right).$$
So, $\hat\beta^{f}_A$ and $\tilde\beta^{f}_0$ are consistent and have the same mixed normal limit distribution as that of $\hat\beta_A$ in (53). The limit distribution makes asymptotic inference about $\beta_A$ and $\beta_0$ straightforward, using conventional regression Wald tests adjusted in the usual fashion so that a consistent estimate of the spectrum of $\varepsilon_t$ is used (based on regression residuals) in place of a variance estimate.

The frequency domain detrended estimators $\hat\beta^{f}_{A^{c}}$ and $\hat\beta^{f}_\lambda$ are also consistent, unlike the time detrended estimator $\hat\beta_\lambda$ $(\lambda\neq 0)$ in the nonstationary case. The limit distribution of $\hat\beta^{f}_{A^{c}}$ is the same as it is in the case of trend stationary regressors (Theorem 1) and has the same $\sqrt n$ rate. The nonparametric estimator $\tilde\beta^{f}_\lambda$ is $\sqrt m$-consistent and is asymptotically equivalent to the augmented regression estimator $\tilde\beta_\lambda$. As far as the model (11) is concerned, it is therefore apparent from Theorems 2′ and 3′ that if the correct augmented regression model is used in estimation, it does not matter asymptotically whether detrending is done in the time domain or the frequency domain.

6 conclusion
It is natural to eliminate deterministic trends in the time domain by simple least
squares regression because the Grenander–Rosenblatt (1957, p. 244) theorem
shows that such regression is asymptotically efficient when the time series are
trend stationary (although this conclusion does not hold when there are stochas-
tic as well as deterministic trends—see Phillips and Lee (1996)). In a similar way,
it seems natural to eliminate deterministic trends in band spectral regressions by
detrending in the time domain prior to the use of spectral methods because these
methods were originally intended for application to stationary time series. How-
ever, this paper shows that such time domain detrending will lead to biased coef-
ficient estimates in models where the coefficients are frequency dependent. In
models that have both deterministic and stochastic trends, time domain detrend-
ing can lead to inconsistent estimates of the coefficients at frequency bands away
from the origin. The inconsistency, which arises from omitted variable effects,
can be substantial and has been confirmed in simulations that are not reported
here (Corbae, Ouliaris, and Phillips (1997)).
The bias and inconsistency arise from omitted variable misspecification and
are managed by use of an appropriate augmented regression model. The situa-
tion is analogous to a structural break model, but here the coefficients change
across frequency rather than over time. An alternate approach that is suitable in
practice is to model the data and run regressions, including detrending regres-
sions, in the frequency domain. In effect, discrete Fourier transforms of all the
variables in the model, including the deterministic trends, are taken and band
regression is performed. When nonparametric estimation is being conducted, the
same principle is employed but one uses a shrinking band that is local to a par-
ticular frequency. In the nonstationary case, it turns out to be particularly impor-
tant to specify the model in terms of levels and differences as in (15) leading to
the fitted regressions (32) and (45). An estimate of the frequency response coef-
ficient at a particular frequency is then recovered from a linear combination of
the coefficients in the regression as in (34) and (46). This approach, which can
be regarded as a frequency domain version of leads and lags dynamic regression,
provides a convenient single equation method of estimating a long run relation-
ship in the presence of deterministic trends and short run dynamics.

Dept. of Economics, University of Texas at Austin, Austin, TX 78712, U.S.A.;


corbae@eco.utexas.edu; http://www.eco.utexas.edu/∼corbae
School of Business, National University of Singapore, Singapore 117591; and
Research Dept., IMF, Washington DC, U.S.A.; souliaris@nus.edu.sg
and
Cowles Foundation for Research in Economics, Yale University, Box 208281,
New Haven, CT 06520-8281, U.S.A.; and University of Auckland and University of
York; peter.phillips@yale.edu; http://Korora.econ.yale.edu
Manuscript received September, 1997; final revision received June, 2001.

APPENDIX

Lemma D (Two Sided BN Decompositions): If $C(L)=\sum_{j=-\infty}^{\infty}c_jL^{j}$ and $\sum_{j=-\infty}^{\infty}|j|^{1/2}|c_j|<\infty$, then:

(a) $C(L)=C(1)+\widetilde C(L)(L-1)$, where $\widetilde C(L)=\sum_{j=-\infty}^{\infty}\tilde c_jL^{j}$, with
$$\tilde c_j=\begin{cases}\displaystyle\sum_{k=j+1}^{\infty}c_k, & j\geq 0,\\[1ex] \displaystyle-\sum_{k=-\infty}^{j}c_k, & j<0.\end{cases}$$

(b) $C(L)=C(e^{iw})+\widetilde C_w(e^{-iw}L)(e^{-iw}L-1)$, where $\widetilde C_w(L)=\sum_{j=-\infty}^{\infty}\tilde c_{wj}L^{j}$, with
$$\tilde c_{wj}=\begin{cases}\displaystyle\sum_{k=j+1}^{\infty}c_ke^{ikw}, & j\geq 0,\\[1ex] \displaystyle-\sum_{k=-\infty}^{j}c_ke^{ikw}, & j<0.\end{cases}$$

Proof of Lemma D: The proof is along the same lines as the proof in Phillips and Solo (1992, Lemma 2.1) of the one-sided BN decomposition.
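Lemma D(a) can be checked numerically for a short two-sided filter (an illustrative filter, not one from the paper): evaluate both sides of the decomposition at several points on the unit circle.

```python
# Check of the two-sided BN decomposition: C(z) = C(1) + C~(z)(z - 1),
# with c~_j = sum_{k>j} c_k for j >= 0 and c~_j = -sum_{k<=j} c_k for j < 0.
import cmath

K = 4
c = {j: 1.0 / (1 + j * j) for j in range(-K, K + 1)}      # a short symmetric filter

def ctilde(j):
    if j >= 0:
        return sum(c.get(k, 0.0) for k in range(j + 1, K + 1))
    return -sum(c.get(k, 0.0) for k in range(-K, j + 1))

def C(z):
    return sum(cj * z**j for j, cj in c.items())

def Ct(z):
    return sum(ctilde(j) * z**j for j in range(-K, K + 1))

C1 = C(1.0)
for lam in (0.3, 1.0, 2.5):
    z = cmath.exp(-1j * lam)
    assert abs(C(z) - (C1 + Ct(z) * (z - 1))) < 1e-12
```

For this symmetric filter $\widetilde C(1)=0$, in line with Remark (ii) below, so the decomposition can be iterated once more as in (63).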

Remarks: (i) The condition
$$(62)\qquad \sum_{j=-\infty}^{\infty}|j|^{1/2}|c_j|<\infty$$
is sufficient for (a) and (b) and ensures that $\sum_{j=-\infty}^{\infty}\tilde c_j^{\,2}<\infty$, but it is not necessary for the latter. For instance, the series $\beta(L)=\sum_{j=-\infty}^{\infty}\beta_jL^{j}$, whose coefficients $\beta_j$ are given by (19), fails (62) but still has a valid BN decomposition with $\sum_{j=-\infty}^{\infty}\tilde\beta_j^{\,2}<\infty$. To see this, let $c_j=e^{ijx}/j$ and note that $\beta_j$ is then a constant times the imaginary part of $c_j$. Define $S_n=\sum_{j=1}^{n}e^{ijx}=\frac{e^{ix}}{e^{ix}-1}\left(e^{inx}-1\right)$. Partial summation gives
$$P_n=\sum_{j=1}^{n}\frac{e^{ijx}}{j}=\frac{S_n}{n}+\frac{e^{ix}}{e^{ix}-1}\sum_{j=2}^{n}\frac{e^{i(j-1)x}-1}{j(j-1)}.$$
It follows that
$$P_n-P_m=\frac{S_n}{n}-\frac{S_m}{m}+\frac{e^{ix}}{e^{ix}-1}\sum_{j=m+1}^{n}\frac{e^{i(j-1)x}-1}{j(j-1)},$$
and letting $n\to\infty$ we have
$$\tilde c_m=\sum_{j=m+1}^{\infty}\frac{e^{ijx}}{j}=-\frac{S_m}{m}+\frac{e^{ix}}{e^{ix}-1}\sum_{j=m+1}^{\infty}\frac{e^{i(j-1)x}-1}{j(j-1)}=O\!\left(\frac{1}{m}\right),$$
with a similar result for $\tilde c_{-m}$. We deduce that $\sum_{-\infty}^{\infty}|\tilde c_m|^{2}<\infty$. Hence, $\beta(L)=\sum_{j=-\infty}^{\infty}\beta_jL^{j}$ with $\beta_j$ given by (19) has a valid BN decomposition with $\sum_{j=-\infty}^{\infty}\tilde\beta_j^{\,2}<\infty$.

(ii) The case of a symmetric filter with $c_j=c_{-j}$ for $j\neq 0$ is an important specialization, which includes the series with coefficients $\beta_j$ given by (19). In this case, $-\tilde c_{-m}=\sum_{j=m}^{\infty}c_{-j}=\sum_{j=m}^{\infty}c_j=c_m+\tilde c_m$. Then, $\widetilde C(1)=\sum_{j=1}^{\infty}(\tilde c_j+\tilde c_{-j})+\tilde c_0=\sum_{j=1}^{\infty}(\tilde c_j-\tilde c_j-c_j)+\sum_{j=1}^{\infty}c_j=0$. If $\widetilde C(L)$ has a valid BN decomposition, we deduce that
$$(63)\qquad C(L)=C(1)+\widetilde{\widetilde C}(L)(1-L)^{2},\qquad \widetilde{\widetilde C}(L)=\sum_{j=-\infty}^{\infty}\tilde{\tilde c}_jL^{j},$$
where
$$\tilde{\tilde c}_j=\begin{cases}\displaystyle\sum_{k=j+1}^{\infty}\tilde c_k, & j\geq 0,\\[1ex] \displaystyle-\sum_{k=-\infty}^{j}\tilde c_k, & j<0.\end{cases}$$
A sufficient condition for the validity of (63) is $\sum_{j=-\infty}^{\infty}|j|^{1/2}|\tilde c_j|<\infty$ or, in terms of the original coefficients, $\sum_{j=-\infty}^{\infty}|j|^{3/2}|c_j|<\infty$.

Proof of Lemma A: Part (a) is well known (e.g., Gradshteyn and Ryzhik (1965, formula 0.121)). For part (b) we use partial summation to give
$$\sum_{t=1}^{n}\Delta\!\left(t^{k}e^{i\lambda_st}\right)=\sum_{t=1}^{n}\Delta(t^{k})e^{i\lambda_st}+\sum_{t=1}^{n}(t-1)^{k}\Delta e^{i\lambda_st}
=-\sum_{t=1}^{n}\sum_{j=1}^{k}\binom{k}{j}(-1)^{j}t^{k-j}e^{i\lambda_st}+\sum_{t=1}^{n}(t-1)^{k}e^{i\lambda_s(t-1)}\left(e^{i\lambda_s}-1\right),$$
or, since the left side telescopes to $n^{k}e^{i\lambda_sn}=n^{k}$,
$$n^{k}=-\sum_{t=1}^{n}\sum_{j=1}^{k}\binom{k}{j}(-1)^{j}t^{k-j}e^{i\lambda_st}+\sum_{t=1}^{n}(t-1)^{k}e^{i\lambda_s(t-1)}\left(e^{i\lambda_s}-1\right).$$
Using this formula in the identity
$$\sum_{t=1}^{n}t^{k}e^{i\lambda_st}\left(e^{i\lambda_s}-1\right)=n^{k}\left(e^{i\lambda_s}-1\right)+\sum_{t=1}^{n}(t-1)^{k}e^{i\lambda_s(t-1)}\left(e^{i\lambda_s}-1\right),$$
we get
$$\sum_{t=1}^{n}t^{k}e^{i\lambda_st}\left(e^{i\lambda_s}-1\right)=n^{k}\left(e^{i\lambda_s}-1\right)+n^{k}+\sum_{j=1}^{k}\binom{k}{j}(-1)^{j}\sum_{t=1}^{n}t^{k-j}e^{i\lambda_st}
=n^{k}e^{i\lambda_s}+\sum_{j=1}^{k}\binom{k}{j}(-1)^{j}\sum_{t=1}^{n}t^{k-j}e^{i\lambda_st},$$
giving the recursion
$$W_k(\lambda_s)=n^{k}\frac{e^{i\lambda_s}}{e^{i\lambda_s}-1}+\frac{1}{e^{i\lambda_s}-1}\sum_{j=1}^{k}\binom{k}{j}(-1)^{j}W_{k-j}(\lambda_s),$$
which holds for $s=1,\ldots,n-1$ (i.e., for $e^{i\lambda_s}\neq 1$). The initial condition
$$W_0(\lambda_s)=\sum_{t=1}^{n}e^{i\lambda_st}=\begin{cases}n, & \lambda_s=0,\\ 0, & \lambda_s\neq 0,\end{cases}$$
follows by elementary calculation. For higher order trends with $k=1,2,3$ we get
$$W_1(\lambda_s)=\begin{cases}\dfrac{n(n+1)}{2}, & \lambda_s=0,\\[1ex] \dfrac{ne^{i\lambda_s}}{e^{i\lambda_s}-1}, & \lambda_s\neq 0,\end{cases}$$
$$W_2(\lambda_s)=\begin{cases}\dfrac{n(n+1)(2n+1)}{6}, & \lambda_s=0,\\[1ex] \dfrac{ne^{i\lambda_s}}{e^{i\lambda_s}-1}\left(n-\dfrac{2}{e^{i\lambda_s}-1}\right), & \lambda_s\neq 0,\end{cases}$$
and
$$W_3(\lambda_s)=\begin{cases}\left(\dfrac{n(n+1)}{2}\right)^{2}, & \lambda_s=0,\\[1ex] \dfrac{ne^{i\lambda_s}}{e^{i\lambda_s}-1}\left(n^{2}-\dfrac{3(n-1)}{e^{i\lambda_s}-1}+\dfrac{6}{(e^{i\lambda_s}-1)^{2}}\right), & \lambda_s\neq 0,\end{cases}$$
which lead to the expressions for $\bar W_k(\lambda_s)$, $k=0,1,2$, that are given in Section 2 following Lemma A.

Proof of Theorem 1: This follows standard lines and is omitted (see Corbae, Ouliaris, and Phillips (1997)).

Proof of Theorem 1′: First, observe that
$$(64)\qquad \hat\beta_\lambda=\left(m^{-1}\widetilde X'Q_ZA_\lambda Q_Z\widetilde X\right)^{-1}\left(m^{-1}\widetilde X'Q_ZA_\lambda Q_Z\tilde y\right).$$
Next, from (8) and (9) we have
$$(65)\qquad w_{\tilde y}(\lambda_s)=b(\lambda_s)w_{\tilde x}(\lambda_s)+w_\varepsilon(\lambda_s)+\frac{1}{\sqrt n}e^{-i\lambda_s}\left(\tilde a_{\lambda_s0}-\tilde a_{\lambda_sn}\right)$$
$$(66)\qquad \phantom{w_{\tilde y}(\lambda_s)}=b(\lambda)w_{\tilde x}(\lambda_s)+w_\varepsilon(\lambda_s)+O_p\!\left(\frac{1}{\sqrt n}\right)$$
in the stationary case. We now find that
$$(67)\qquad m^{-1}\widetilde X'Q_ZA_\lambda Q_Z\widetilde X=m^{-1}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}+o_p(1)\ \xrightarrow{p}\ 2\pi f_{xx}(\lambda),$$
and using (66) we have
$$(68)\qquad m^{-1}\widetilde X'Q_ZA_\lambda Q_Z\tilde y=m^{-1}\widetilde X'A_\lambda\tilde y+o_p(1)
=m^{-1}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}b(-\lambda)+m^{-1}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_\varepsilon(\lambda_s)^{*}+o_p(1).$$
Then, from (64), (67), and (68) we have the expression
$$(69)\qquad \sqrt m\,(\hat\beta_\lambda-\beta_\lambda)=\left(m^{-1}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}\right)^{-1}\left(m^{-1/2}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_\varepsilon(\lambda_s)^{*}\right)+o_p(1).$$
The family $\{w_\varepsilon(\lambda_s):\lambda_s\in\mathcal B_\lambda\}$ are known to satisfy a central limit theorem (with limit $N_c(0,2\pi f_{\varepsilon\varepsilon}(\lambda))$) for dft's of stationary processes and are independently distributed² as $n\to\infty$. It follows from this result and (67) that
$$(70)\qquad m^{-1/2}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_\varepsilon(\lambda_s)^{*}\ \xrightarrow{d}\ N_c\!\left(0,\ (2\pi)^{2}f_{\varepsilon\varepsilon}(\lambda)f_{xx}(\lambda)\right),$$
which leads to the stated limit theorem for $\hat\beta_\lambda$. A similar argument gives the result for $\hat\beta^{f}_\lambda$.

Proof of Lemma B: Take dft's of the equation $\Delta\tilde x_t=v_t$, giving
$$w_{\tilde x}(\lambda_s)=n^{-1/2}\sum_{t=1}^{n}\tilde x_{t-1}e^{i\lambda_st}+w_v(\lambda_s)=e^{i\lambda_s}w_{\tilde x}(\lambda_s)-n^{-1/2}e^{i\lambda_s}\left(\tilde x_n-\tilde x_0\right)+w_v(\lambda_s).$$
Then, transposing and solving yields the stated formula
$$(71)\qquad w_{\tilde x}(\lambda_s)=\frac{1}{1-e^{i\lambda_s}}\,w_v(\lambda_s)-\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}\,\frac{\tilde x_n-\tilde x_0}{n^{1/2}}.$$

Proof of Lemma C: Part (a): This result may be proved as in Phillips (1991a). A new and substantially simpler proof uses part (e) and is as follows:
$$n^{-2}\sum_{\lambda_s\in\mathcal B_A}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}=n^{-2}\sum_{\lambda_s\in(-\pi,\pi]}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}-n^{-2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}
=n^{-2}\sum_{t=1}^{n}\tilde x_t\tilde x_t'+O_p(n^{-1})\ \xrightarrow{d}\ \int_0^1B_xB_x',$$
where the error magnitude in the second line follows directly from part (e) below.

Part (b): First note from (41) that $w_d(0)=\left(\sqrt n,\ (n+1)/(2n^{1/2})\right)'$, so that $n^{-1/2}w_d(0)\to\left(1,\tfrac12\right)'=f_0=\int_0^1u$, and
$$w_d(\lambda_s)=\left(0,\ \frac{1}{n^{1/2}}\,\frac{e^{i\lambda_s}}{e^{i\lambda_s}-1}\right)'=\frac{1}{\sqrt n}f_1(\lambda_s),\qquad \lambda_s\neq 0.$$
It follows that as $n\to\infty$,
$$(72)\qquad w_d(\lambda_s)=\left(0,\ \frac{n^{1/2}}{2\pi si}\right)'\left(1+o(1)\right),$$
a formula that holds both for $s$ fixed and for $s\to\infty$ with $s/n\to 0$. On the other hand, when $\lambda_s\to\lambda\neq 0$ as $n\to\infty$ we have
$$(73)\qquad w_d(\lambda_s)=\frac{1}{\sqrt n}\left(0,\ \frac{e^{i\lambda}}{e^{i\lambda}-1}\right)'+o\!\left(\frac{1}{\sqrt n}\right)=\frac{1}{\sqrt n}f_1(\lambda)+o\!\left(\frac{1}{\sqrt n}\right).$$
Write the summation over $\mathcal B_A$ as follows: $\sum_{\lambda_s\in\mathcal B_A}=\sum_{\lambda_s\in(-\pi,\pi]}-\sum_{\lambda_s\in\mathcal B_{A^{c}}}$. Then
$$n^{-3/2}\sum_{\lambda_s\in\mathcal B_A}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=n^{-3/2}\sum_{\lambda_s\in(-\pi,\pi]}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}-n^{-2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)f_1(\lambda_s)^{*}.$$
First,
$$n^{-2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)f_1(\lambda_s)^{*}=n^{-2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\left[\frac{1}{1-e^{i\lambda_s}}w_v(\lambda_s)-\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}\,\frac{\tilde x_n-\tilde x_0}{n^{1/2}}\right]f_1(\lambda_s)^{*}=O_p(n^{-1}).$$
Second, using the inverse dft,
$$n^{-3/2}\sum_{\lambda_s\in(-\pi,\pi]}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=n^{-3/2}\sum_{t=1}^{n}\tilde x_td_t'=n^{-1}\sum_{t=1}^{n}\frac{\tilde x_t}{\sqrt n}\,d_t'\ \xrightarrow{d}\ \int_0^1B_x(r)u(r)'\,dr.$$
It follows that
$$n^{-3/2}\sum_{\lambda_s\in\mathcal B_A}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_x(r)u(r)'\,dr.$$

²Hannan (1973, Theorem 3) showed that a finite collection of $w_\varepsilon(\lambda_s)$ satisfy a central limit theorem and are independent. Here the collection $\{\lambda_s\in\mathcal B_\lambda\}$ has $m$ members and is asymptotically infinite. Phillips (2000, Theorem 3.2) showed that an asymptotically infinite collection of $w_\varepsilon(\lambda_s)$ satisfy a central limit theorem and are asymptotically independent for frequencies $\lambda_s$ in the neighborhood of the origin provided the number of frequencies satisfies $m=o(n^{1/2-1/p})$, where $E|\varepsilon_t|^{p}<\infty$ for some $p>2$, i.e., provided the number of frequencies does not go to infinity too fast. This result can be extended to frequency bands away from the origin, although a proof was not given in that paper.

Part (c): First, observe that
$$n^{-1/2}\tilde x_{d,\lfloor nr\rfloor}\ \xrightarrow{d}\ B_x(r)-\left(\int_0^1B_xu'\right)\left(\int_0^1uu'\right)^{-1}u(r)=B_{xu}(r).$$
Then
$$\frac{1}{n^{2}}\sum_{\lambda_s\in\mathcal B_A}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}\ \xrightarrow{d}\ \int_0^1B_{xu}B_{xu}',$$
as in part (a).

Part (d): From Lemma B we have
$$w_{\tilde x}(\lambda_s)=\frac{1}{1-e^{i\lambda_s}}w_v(\lambda_s)-\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}\,\frac{\tilde x_n-\tilde x_0}{n^{1/2}}.$$
As in the proof of part (b), we have $n^{-1/2}\tilde x_n\xrightarrow{d}B_x(1)$, and, if $\lambda_s\to\lambda\neq 0$ as $n\to\infty$,
$$w_{\tilde x}(\lambda_s)\ \xrightarrow{d}\ \frac{1}{1-e^{i\lambda}}N_c\!\left(0,2\pi f_{vv}(\lambda)\right)-\frac{e^{i\lambda}}{1-e^{i\lambda}}N\!\left(0,2\pi f_{vv}(0)\right)=N_c\!\left(0,\ 2\pi\,\frac{f_{vv}(\lambda)+f_{vv}(0)}{|1-e^{i\lambda}|^{2}}\right).$$
It follows that
$$n^{-1/2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)f_1(\lambda_s)^{*}$$
$$=n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{1}{1-e^{i\lambda_s}}w_v(\lambda_s)f_1(\lambda_s)^{*}-n^{-1}\frac{\tilde x_n-\tilde x_0}{n^{1/2}}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}f_1(\lambda_s)^{*}=\text{term I}+\text{term II, say}.$$
Since the $w_v(\lambda_s)$ are asymptotically independent $N_c(0,2\pi f_{vv}(\lambda_s))$, term I $\xrightarrow{p}0$. For term II, since $\tilde x_0=O_p(1)$, we have
$$n^{-1}\frac{\tilde x_n-\tilde x_0}{n^{1/2}}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}f_1(\lambda_s)^{*}\sim\frac{\tilde x_n}{n^{1/2}}\,\frac{1}{2\pi}\,\frac{2\pi}{n}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}f_1(\lambda_s)^{*}
\ \xrightarrow{d}\ B_x(1)\,\frac{1}{2\pi}\int_{\mathcal B_{A^{c}}}\frac{e^{i\lambda}f_1(\lambda)^{*}}{1-e^{i\lambda}}\,d\lambda,$$
so that
$$(74)\qquad \text{term II}\ \xrightarrow{d}\ -\frac{1}{2\pi}B_x(1)\int_{\mathcal B_{A^{c}}}\frac{e^{i\lambda}f_1(\lambda)^{*}}{1-e^{i\lambda}}\,d\lambda,$$
giving part (d).


Part (e): From Lemma B we get
$$n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}\sim n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{1}{|1-e^{i\lambda_s}|^{2}}\left[w_v(\lambda_s)w_v(\lambda_s)^{*}+\frac{\tilde x_n}{n^{1/2}}\frac{\tilde x_n'}{n^{1/2}}\right]$$
$$=\frac{1}{2J}\sum_{J_a\leq|j|\leq J}\frac{1}{|1-e^{i\lambda_j}|^{2}}\left[\frac{1}{m}\sum_{\lambda_s\in\mathcal B_j}w_v(\lambda_s)w_v(\lambda_s)^{*}+\frac{\tilde x_n}{n^{1/2}}\frac{\tilde x_n'}{n^{1/2}}\right]+o_p(1)$$
$$=\frac{1}{2J}\sum_{J_a\leq|j|\leq J}\frac{1}{|1-e^{i\lambda_j}|^{2}}\,2\pi\hat f_{vv}(\lambda_j)+n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}\frac{1}{|1-e^{i\lambda_s}|^{2}}\frac{\tilde x_n}{n^{1/2}}\frac{\tilde x_n'}{n^{1/2}}+o_p(1)$$
$$\xrightarrow{d}\ \int_{\mathcal B_{A^{c}}}\frac{f_{vv}(\lambda)}{|1-e^{i\lambda}|^{2}}\,d\lambda+\frac{1}{2\pi}\int_{\mathcal B_{A^{c}}}\frac{d\lambda}{|1-e^{i\lambda}|^{2}}\,B_x(1)B_x(1)',$$
as required. In the penultimate line above, $2\pi\hat f_{vv}(\lambda_j)=(1/m)\sum_{\lambda_s\in\mathcal B_j}w_v(\lambda_s)w_v(\lambda_s)^{*}$ and $\hat f_{vv}(\lambda)-f_{vv}(\lambda)\xrightarrow{p}0$.
Part (f): Note that
$$(75)\qquad w_{xd}(\lambda_s)^{*}=w_{\tilde x}(\lambda_s)^{*}-w_d(\lambda_s)^{*}\left(n^{-1}D'D\right)^{-1}\left(n^{-1}D'\widetilde X\right)$$
$$=\frac{1}{1-e^{-i\lambda_s}}w_v(\lambda_s)^{*}-\frac{e^{-i\lambda_s}}{1-e^{-i\lambda_s}}\,\frac{(\tilde x_n-\tilde x_0)'}{n^{1/2}}-n^{1/2}w_d(\lambda_s)^{*}\left(n^{-1}D'D\right)^{-1}\left(n^{-3/2}D'\widetilde X\right)$$
$$\xrightarrow{d}\ \frac{1}{1-e^{-i\lambda}}N_c\!\left(0,2\pi f_{vv}(\lambda)\right)-\frac{e^{-i\lambda}}{1-e^{-i\lambda}}B_x(1)'-f_1(\lambda)^{*}\left(\int_0^1uu'\right)^{-1}\int_0^1uB_x'
=\frac{1}{1-e^{-i\lambda}}N_c\!\left(0,2\pi f_{vv}(\lambda)\right)-g(\lambda,B_x)^{*},\ \text{say},$$
when $\lambda_s\to\lambda\neq 0$ as $n\to\infty$. Here
$$g(\lambda,B_x)=\frac{e^{i\lambda}}{1-e^{i\lambda}}\,B_x(1)+\left(\int_0^1B_xu'\right)\left(\int_0^1uu'\right)^{-1}f_1(\lambda),$$
and the complex normal variate $N_c(0,2\pi f_{vv}(\lambda))$ in (75) is independent of the Brownian motion $B_x$. If $\lambda_j=\pi j/J$ is the midpoint of the band $\mathcal B_j$ and $\lambda_j\to\lambda$ as $n\to\infty$, then
$$2\pi\hat f_{xxd}(\lambda_j)=\frac{1}{m}\sum_{\lambda_s\in\mathcal B_j}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}\ \xrightarrow{d}\ \frac{2\pi f_{vv}(\lambda)}{|1-e^{i\lambda}|^{2}}+g(\lambda,B_x)g(\lambda,B_x)^{*}=2\pi f_{xx}(\lambda)+g(\lambda,B_x)g(\lambda,B_x)^{*}.$$
We deduce that
$$n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}=\frac{1}{2J}\sum_{J_a\leq|j|\leq J}\frac{1}{m}\sum_{\lambda_s\in\mathcal B_j}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}+o_p(1)
=\frac{1}{2J}\sum_{J_a\leq|j|\leq J}2\pi\hat f_{xxd}(\lambda_j)+o_p(1)$$
$$\xrightarrow{d}\ \int_{\mathcal B_{A^{c}}}\left[f_{xx}(\lambda)+(2\pi)^{-1}g(\lambda,B_x)g(\lambda,B_x)^{*}\right]d\lambda,$$
which gives part (f).


Part (g): Observe that
$$n^{-1/2}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{xd}(\lambda_s)w_d(\lambda_s)^{*}=n^{-1}\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_{xd}(\lambda_s)f_1(\lambda_s)^{*}+o_p(1)
\ \sim_d\ -\frac{1}{2\pi}\,\frac{2\pi}{n}\sum_{\lambda_s\in\mathcal B_{A^{c}}}g(\lambda_s,B_x)f_1(\lambda_s)^{*}
\ \xrightarrow{d}\ -\frac{1}{2\pi}\int_{\mathcal B_{A^{c}}}g(\lambda,B_x)f_1(\lambda)^{*}\,d\lambda,$$
as stated.

Part (h): From (73) we have
$$\sum_{\lambda_s\in\mathcal B_{A^{c}}}w_d(\lambda_s)w_d(\lambda_s)^{*}\sim\frac{1}{n}\sum_{\lambda_s\in\mathcal B_{A^{c}}}f_1(\lambda_s)f_1(\lambda_s)^{*}\ \to\ \frac{1}{2\pi}\int_{\mathcal B_{A^{c}}}f_1(\lambda)f_1(\lambda)^{*}\,d\lambda,$$
as required.

Part (i): Using part (h), we get
$$n^{-1}\sum_{\lambda_s\in\mathcal B_A}w_d(\lambda_s)w_d(\lambda_s)^{*}=n^{-1}\sum_{\lambda_s\in(-\pi,\pi]}w_d(\lambda_s)w_d(\lambda_s)^{*}+O(n^{-1})
=n^{-1}\sum_{t=1}^{n}d_td_t'+O(n^{-1})\ \to\ \int_0^1u(r)u(r)'\,dr,$$
giving the stated result.


Part (j): Using the representation of $w_{\tilde x}(\lambda_s)$ in Lemma B, the fact that $|s|\geq n/2J\to\infty$ when $\lambda_s\in(-\pi,\pi]-\mathcal B_0$, and proceeding as in part (a), we find that
$$n^{-2}\sum_{\lambda_s\in\mathcal B_0}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}=n^{-2}\left(\sum_{\lambda_s\in(-\pi,\pi]}-\sum_{\lambda_s\in(-\pi,\pi]-\mathcal B_0}\right)w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}$$
$$=n^{-2}\sum_{\lambda_s\in(-\pi,\pi]}w_{\tilde x}(\lambda_s)w_{\tilde x}(\lambda_s)^{*}+O_p\!\left(\sum_{n\geq|s|\geq n/2J}\frac{1}{s^{2}}\left[w_v(\lambda_s)w_v(\lambda_s)^{*}+\frac{\tilde x_n}{n^{1/2}}\frac{\tilde x_n'}{n^{1/2}}\right]\right)
=n^{-2}\sum_{t=1}^{n}\tilde x_t\tilde x_t'+o_p(1)\ \xrightarrow{d}\ \int_0^1B_xB_x'.$$

Part (k): In the same way as parts (j) and (c) we find that
$$n^{-2}\sum_{\lambda_s\in\mathcal B_0}w_{xd}(\lambda_s)w_{xd}(\lambda_s)^{*}=n^{-2}\sum_{t=1}^{n}\tilde x_{dt}\tilde x_{dt}'+o_p(1)\ \xrightarrow{d}\ \int_0^1B_{xu}B_{xu}'.$$

Part (l): From part (b) we have $n^{-1/2}w_d(0)\to f_0=\left(1,\tfrac12\right)'=\int_0^1u$ and, for $\lambda_s\neq 0$ with $s/n\to 0$, we have $w_d(\lambda_s)=\left(0,\ n^{1/2}/(2\pi si)\right)'(1+o(1))$ from (72). Thus, for $\lambda=0$, we find that
$$\frac{1}{n}\sum_{\lambda_s\in\mathcal B_0}w_d(\lambda_s)w_d(\lambda_s)^{*}=\frac{1}{n}w_d(0)w_d(0)'+\frac{1}{n}\sum_{\lambda_s\in\mathcal B_0-\{0\}}w_d(\lambda_s)w_d(\lambda_s)^{*}$$
$$=\frac{1}{n}w_d(0)w_d(0)'+\begin{pmatrix}0&0\\[0.5ex]0&\displaystyle 2\sum_{1\leq s\leq m/2}\frac{1}{(2\pi s)^{2}}\end{pmatrix}\left(1+o(1)\right)
\ \to\ f_0f_0'+\begin{pmatrix}0&0\\0&\tfrac{1}{12}\end{pmatrix}=\int_0^1uu'.$$

Part (m): As in part (b) we find that
$$n^{-3/2}\sum_{\lambda_s\in\mathcal B_0}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=n^{-3/2}\sum_{\lambda_s\in(-\pi,\pi]}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}-n^{-3/2}\sum_{\lambda_s\in(-\pi,\pi]-\mathcal B_0}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}$$
$$=n^{-3/2}\sum_{t=1}^{n}\tilde x_td_t'-o_p(1)\ \xrightarrow{d}\ \int_0^1B_x(r)u(r)'\,dr,$$
since
$$n^{-3/2}\sum_{\lambda_s\in(-\pi,\pi]-\mathcal B_0}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=O_p\!\left(\frac{1}{n}\sum_{|s|\geq n/2J}\frac{1}{|s|}\left|\frac{w_v(\lambda_s)-e^{i\lambda_s}(\tilde x_n-\tilde x_0)/\sqrt n}{1-e^{i\lambda_s}}\right|\right)=o_p(1).$$

Part (n): For $\lambda\neq 0$, using (73) we obtain
$$\frac{n}{m}\sum_{\lambda_s\in\mathcal B_\lambda}w_d(\lambda_s)w_d(\lambda_s)^{*}\sim\frac{1}{m}\sum_{\lambda_s\in\mathcal B_\lambda}f_1(\lambda_s)f_1(\lambda_s)^{*}\ \to\ f_1(\lambda)f_1(\lambda)^{*}.$$

Part (o): For $\lambda\neq 0$, using (73) we obtain
$$\frac{\sqrt n}{m}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)w_d(\lambda_s)^{*}=\frac{1}{m}\sum_{\lambda_s\in\mathcal B_\lambda}w_{\tilde x}(\lambda_s)f_1(\lambda_s)^{*}+o(1)$$
$$=\frac{1}{m}\sum_{\lambda_s\in\mathcal B_\lambda}\frac{w_v(\lambda_s)-e^{i\lambda_s}(\tilde x_n-\tilde x_0)/\sqrt n}{1-e^{i\lambda_s}}\,f_1(\lambda_s)^{*}+o_p(1)\ \xrightarrow{d}\ \frac{e^{i\lambda}}{e^{i\lambda}-1}\,B_x(1)f_1(\lambda)^{*}.$$

Finally, joint weak convergence in (a)–(o) applies because the component elements jointly converge and one may apply the continuous mapping theorem in a routine fashion. In particular, Assumptions 1 and 2 ensure that
$$\left(n^{-1/2}\sum_{t=1}^{\lfloor nr\rfloor}v_t,\ w_v(\lambda)\right)\ \xrightarrow{d}\ \left(B(r),\ \xi_\lambda\right),\quad\text{with }\xi_\lambda=N_c\!\left(0,2\pi f_{vv}(\lambda)\right),$$
and where $\xi_\lambda$ is independent of $B(r)$ for all $\lambda\neq 0$. The required quantities in (a)–(o) are functionals of these elements and smoothed periodogram ordinates (like $(1/m)\sum_{\lambda_s\in\mathcal B_j}w_v(\lambda_s)w_v(\lambda_s)^{*}$ for $\lambda_j\to\lambda$, as in the case of part (f)) that converge in probability to constants.
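The scalar limit used in part (l), $2\sum_{s\geq1}(2\pi s)^{-2}=1/12$, which reconciles $f_0f_0'$ with $\int_0^1uu'$, is easy to confirm numerically (a quick stdlib sketch):

```python
# Check of the part (l) tail sum: 2 * sum_{s>=1} 1/(2 pi s)^2 = 1/12,
# so that the (2,2) entry of f_0 f_0' plus 1/12 equals int_0^1 r^2 dr = 1/3.
from math import pi

tail = 2 * sum(1.0 / (2 * pi * s) ** 2 for s in range(1, 200000))
assert abs(tail - 1.0 / 12.0) < 1e-5
assert abs(0.25 + 1.0 / 12.0 - 1.0 / 3.0) < 1e-12   # (1/2)^2 + 1/12 = 1/3
```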

Proof of Theorem 2: Note that


QZ A QZ X
:̂A − :A = −X 
QZ A QZ A c X :
 −1 X  A − :Ac
− ( 

We consider the limiting behavior of n−2 X 


QZ A Q Z X
 and n−1 X

QZ A QZ A c X.
 Define x̃
= x̃

dt t
 Since x̃t is an I 1
process and satisfies an invariance principle when standardized
dt
D
D
−1 D
X
.
by n−1/2 , we have
 1
−1  1
d
(76) n−1/2 x̃d

nr → Bx r
− u r

uu
uBx
= Bxu r

 say
0 0

Write the discrete Fourier transform of x̃d t as wxd s


and then from Lemma C(c) we have

 
 
= 1 d 1
(77) 
QZ A Q Z X
n−2 X wxd s
wxd s
∗ →

Bxu Bxu 
n2 s ∈A 0

1

Since 0 Bxu Bxu 


QZ A Q Z X
> 0 (see Phillips and Hansen (1990)), n−2 X  has a positive definite limit
as n → .
Next, decompose n−1 X 
QZ A Q Z A c X
 as follows:


QZ A Q Z A c X
X  
QZ W ∗ AWPZ W ∗ Ac W X
X  X 
QD W ∗ AWPD W ∗ Ac W X

(78) =− −
n n n
X
PD W ∗ AWPD W ∗ Ac W X
 X 
W ∗ AWPD W ∗ Ac W X

= −
n n
= term A − term B

Take each of these terms in turn. Factor term A as follows and consider each factor separately. Write

 X

PD W ∗ AWPD W ∗ Ac W X
X 
PD W ∗ AWD D
D −1 D
W ∗ Ac W X

(79) = 
n n3/2 n n1/2
1098 d. corbae, s. ouliaris, and p. c. b. phillips

The first factor is

(80)  $\frac{\tilde X' P_D W^*AWD}{n^{3/2}} = \frac{\tilde X'D}{n^{3/2}}\Big(\frac{D'D}{n}\Big)^{-1}\frac{D'W^*AWD}{n} \to_d \Big(\int_0^1 B_x u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 uu' = \int_0^1 B_x u',$

in view of (76) and Lemma C(i). The third factor is the conjugate transpose of

(81)  $n^{-1/2}\tilde X'W^*A^cWD = n^{-1/2}\sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^* \to_d -B_x(1)\,\frac{1}{2\pi}\int_{B_A^c}\frac{e^{i\lambda} f_1(\lambda)'}{1-e^{i\lambda}}\,d\lambda,$

from Lemma C(d). The limit of term A now follows by combining (81) and (80) and using joint weak convergence:

(82)  $\frac{\tilde X' P_D W^*AWP_D W^*A^cW \tilde X}{n} = \frac{\tilde X' P_D W^*AWD}{n^{3/2}}\Big(\frac{D'D}{n}\Big)^{-1}\big(n^{-1/2}\tilde X'W^*A^cWD\big)^*$

$\to_d \Big(\int_0^1 B_x u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\Big(-\frac{1}{2\pi}\int_{B_A^c}\frac{e^{-i\lambda} f_1(\lambda)}{1-e^{-i\lambda}}\,d\lambda\Big)B_x(1)'.$

Next consider term B of (78). Using Lemma C, we obtain

(83)  $\frac{\tilde X' W^*AWP_D W^*A^cW \tilde X}{n} = \Big(n^{-3/2}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^*\Big)\Big(\frac{D'D}{n}\Big)^{-1}\Big(n^{-1/2}\sum_{\lambda_s \in B_A^c} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^*\Big)$

$\to_d \Big(\int_0^1 B_x u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\Big(-\frac{1}{2\pi}\int_{B_A^c}\frac{e^{-i\lambda} f_1(\lambda)}{1-e^{-i\lambda}}\,d\lambda\Big)B_x(1)'.$

Combining (82) and (83) in (78) we find³

(84)  $n^{-1}\tilde X' Q_Z(B_A) Q_Z(B_A^c) \tilde X = o_p(1).$

As for the limit distribution of $\hat\beta_A$, we have

(85)  $n(\hat\beta_A - \beta_A) = \Big(\frac{\tilde X' Q_Z(B_A) Q_Z \tilde X}{n^2}\Big)^{-1}\frac{\tilde X' Q_Z(B_A) Q_Z \varepsilon}{n} - \Big(\frac{\tilde X' Q_Z(B_A) Q_Z \tilde X}{n^2}\Big)^{-1}\frac{\tilde X' Q_Z(B_A) Q_Z(B_A^c) \tilde X}{n}(\beta_A - \beta_{A^c}).$

From Lemma C(c) and (77) above, we have

(86)  $n^{-2}\tilde X' Q_Z(B_A) Q_Z \tilde X \to_d \int_0^1 B_{xu}B_{xu}' = G,$

³ By changing the probability space (on which the random sequence $\tilde x_t$ is defined), we can ensure that both terms tend almost surely to the same random variable (see, for example, Theorem 4, page 47 of Shorack and Wellner (1986)). Then, in the original space the difference of the terms tends in distribution to zero, giving the stated result.
and from (84)

(87)  $\Big(\frac{\tilde X' Q_Z(B_A) Q_Z \tilde X}{n^2}\Big)^{-1}\frac{\tilde X' Q_Z(B_A) Q_Z(B_A^c) \tilde X}{n} \to_p 0.$

Next,

(88)  $\frac{\tilde X' Q_Z(B_A) Q_Z \varepsilon}{n} = \frac{1}{n}\sum_{\lambda_s \in B_A} w_{\tilde x^d}(\lambda_s) w_{\varepsilon^d}(\lambda_s)^* = \frac{1}{n}\sum_{\lambda_s \in B_A} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* - \frac{1}{n^{3/2}}\sum_{\lambda_s \in B_A} w_{\tilde x^d}(\lambda_s) w_d(\lambda_s)^*\Big(\frac{D'D}{n}\Big)^{-1}\frac{D'\varepsilon}{\sqrt n}.$

Then, using (76) and proceeding as in the proof of Lemma C(a) above, we find

(89)  $n^{-1}\sum_{\lambda_s \in B_A} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* = n^{-1}\sum_{\lambda_s \in (-\pi,\pi]} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* - n^{-1}\sum_{\lambda_s \in B_A^c} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^*$

$= n^{-1}\sum_{t=1}^n \tilde x_t^d \varepsilon_t - o_p(1) \to_d \int_0^1 B_{xu}\,dB_\varepsilon.$

Further, from Lemma C(b), (i) and Assumption 2, we obtain

(90)  $\frac{1}{n^{3/2}}\sum_{\lambda_s \in B_A} w_{\tilde x^d}(\lambda_s) w_d(\lambda_s)^*\Big(\frac{D'D}{n}\Big)^{-1}\frac{D'\varepsilon}{\sqrt n} \to_d \Big(\int_0^1 B_{xu}u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 u\,dB_\varepsilon = 0,$

since $\int_0^1 B_{xu}u' = 0$. Combining (88), (89), and (90), we deduce that

(91)  $\frac{\tilde X' Q_Z(B_A) Q_Z \varepsilon}{n} \to_d \int_0^1 B_{xu}\,dB_\varepsilon.$

The limit distribution (91) is a mixture normal distribution with mixing matrix variate $\int_0^1 B_{xu}B_{xu}'$. It now follows from (86) and (91) that

(92)  $\Big(\frac{\tilde X' Q_Z(B_A) Q_Z \tilde X}{n^2}\Big)^{-1}\frac{\tilde X' Q_Z(B_A) Q_Z \varepsilon}{n} \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\int_0^1 B_{xu}\,dB_\varepsilon \equiv MN\Big(0,\; 2\pi f_{\varepsilon\varepsilon}(0)\Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\Big).$

Using (85), (87), and (92), we deduce that

$n(\hat\beta_A - \beta_A) \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\int_0^1 B_{xu}\,dB_\varepsilon,$

which gives the stated result.


For the limit of $\hat\beta_{A^c}$, we need to examine the asymptotic behavior of the bias term in (29), which depends on the matrix quotient $[n^{-1}\tilde X' Q_Z(B_A^c) Q_Z \tilde X]^{-1}[n^{-1}\tilde X' Q_Z(B_A^c) Q_Z(B_A) \tilde X]$. Take each of these factors in turn. First,

$\frac{\tilde X' Q_Z(B_A^c) Q_Z \tilde X}{n} = \frac{\tilde X' Q_D(B_A^c) Q_D \tilde X}{n} = n^{-1}\sum_{\lambda_s \in B_A^c} w_{\tilde x^d}(\lambda_s) w_{\tilde x^d}(\lambda_s)^*,$

where $w_{\tilde x^d}(\lambda_s)^* = w_{\tilde x}(\lambda_s)^* - w_d(\lambda_s)^*(n^{-1}D'D)^{-1}(n^{-1}D'\tilde X)$. From Lemma C(f) we have

(93)  $n^{-1}\sum_{\lambda_s \in B_A^c} w_{\tilde x^d}(\lambda_s) w_{\tilde x^d}(\lambda_s)^* \to_d \int_{B_A^c}\big[f_{xx}(\lambda) + (2\pi)^{-1} g_\lambda(B_x) g_\lambda(B_x)^*\big]\,d\lambda,$

which is a positive definite limit. Next,

$n^{-1}\tilde X' Q_Z(B_A^c) Q_Z(B_A) \tilde X = -n^{-1}\tilde X' Q_Z(B_A^c) P_Z(B_A) \tilde X = -n^{-1}\tilde X' Q_D(B_A^c) P_D(B_A) \tilde X$

$= -\Big(n^{-1/2}\sum_{\lambda_s \in B_A^c} w_{\tilde x^d}(\lambda_s) w_d(\lambda_s)^*\Big)\big(n^{-1}D'D\big)^{-1}\Big(n^{-3/2}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^*\Big).$

From Lemma C(g) and (b), we obtain

(94)  $n^{-1}\tilde X' Q_Z(B_A^c) Q_Z(B_A) \tilde X \to_d -\Big(-\frac{1}{2\pi}\int_{B_A^c} g_\lambda(B_x) f_1(\lambda)^*\,d\lambda\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 uB_x'.$

It follows from (93) and (94) and joint weak convergence that the asymptotic bias term for $\hat\beta_{A^c}$ involves

$\big[n^{-1}\tilde X' Q_Z(B_A^c) Q_Z \tilde X\big]^{-1}\big[n^{-1}\tilde X' Q_Z(B_A^c) Q_Z(B_A) \tilde X\big]$

$\to_d \Big[\int_{B_A^c}\big(2\pi f_{xx}(\lambda) + g_\lambda(B_x) g_\lambda(B_x)^*\big)\,d\lambda\Big]^{-1}\Big(\int_{B_A^c} g_\lambda(B_x) f_1(\lambda)^*\,d\lambda\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 uB_x',$

establishing the stated result.
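The mechanics of the estimator analyzed above can be illustrated numerically. The following Python sketch is hypothetical (the model, sample size, and band are made-up choices, and the single coefficient is frequency-invariant, so no omitted-band bias arises): it detrends $y_t$ and $x_t$ on $z_t = (1, t)$ in the time domain, takes dft's, and runs least squares over a narrow band of ordinates near the origin.

```python
import numpy as np

# Hypothetical illustration (all model choices are made up): a band
# estimator computed by (i) time-domain detrending of y_t and x_t on
# z_t = (1, t), (ii) dft's of the detrended series, and (iii) least
# squares over ordinates lambda_s in a narrow band near the origin.
rng = np.random.default_rng(1)
n, beta = 5000, 2.0
x = np.cumsum(rng.normal(size=n)) + 0.05 * np.arange(n)  # I(1) plus trend
y = beta * x + rng.normal(size=n) + 1.0                  # cointegrated, with intercept

# time-domain detrending on z_t = (1, t)
Z = np.column_stack([np.ones(n), np.arange(1, n + 1)])
xd = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
yd = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

# dft's and least squares over a band of low-frequency ordinates
wx = np.fft.fft(xd) / np.sqrt(n)
wy = np.fft.fft(yd) / np.sqrt(n)
band = np.arange(1, 40)
beta_hat = np.sum(wx[band].conj() * wy[band]).real / np.sum(np.abs(wx[band]) ** 2)
print(round(beta_hat, 3))  # should be close to beta = 2.0
```

Because the regressor is $I(1)$, the estimate converges quickly, consistent with the $n$-rate mixed normal limit derived above.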

Proof of Theorem 2′: Part (a): From (64) and (11) we have

(95)  $\hat\beta_\lambda = \beta(1) + \big[\tilde X' Q_Z(B_\lambda) Q_Z \tilde X\big]^{-1}\tilde X' Q_Z(B_\lambda) Q_Z \tilde\varepsilon.$

Since $\tilde\varepsilon_t$ is a strictly stationary and ergodic sequence with mean zero and satisfies a central limit theorem, $(n^{-1}D'D)^{-1}(n^{-1}D'\tilde\varepsilon) = O_p(n^{-1/2})$, and so

(96)  $\tilde\varepsilon_t - z_t'(n^{-1}Z'Z)^{-1}(n^{-1}Z'\tilde\varepsilon) = \tilde\varepsilon_t - z_t'\Phi_n^{-1}(n^{-1}D'D)^{-1}(n^{-1}D'\tilde\varepsilon) \to_p \tilde\varepsilon_t.$

Then, since $\tilde\beta(L)$ has a valid BN decomposition,

(97)  $n^{-1}\tilde X' Q_Z(B_\lambda) Q_Z \tilde\varepsilon = n^{-1}\sum_{\lambda_s \in B_\lambda} w_{\tilde x^d}(\lambda_s) w_{\tilde\varepsilon}(\lambda_s)^* + o_p(1)$

$= n^{-1}\sum_{\lambda_s \in B_\lambda} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* - n^{-1}\sum_{\lambda_s \in B_\lambda} w_{\tilde x^d}(\lambda_s) w_v(\lambda_s)^*\,\tilde\beta(e^{-i\lambda_s}) + o_p(1).$

When $\lambda = 0$, we find as in (89) and Theorem 3.1 of Phillips (1991a) that

(98)  $n^{-1}\sum_{\lambda_s \in B_0} w_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* \to_d \int_0^1 B_{xu}\,dB_\varepsilon,$

and

(99)  $n^{-1}\sum_{\lambda_s \in B_0} w_{\tilde x^d}(\lambda_s) w_v(\lambda_s)^* \to_d \int_0^1 B_{xu}\,dB_x' + \Lambda_x,$

where $\Lambda_x = \sum_{j=0}^\infty E(v_0 v_j')$. We deduce that

(100)  $n^{-1}\tilde X' Q_Z(B_\lambda) Q_Z \tilde\varepsilon \to_d \int_0^1 B_{xu}\,dB_\varepsilon - \Big(\int_0^1 B_{xu}\,dB_x' + \Lambda_x\Big)\tilde\beta(1).$

Next, as in Lemma C(c) we find that

(101)  $n^{-2}\tilde X' Q_Z(B_\lambda) Q_Z \tilde X \to_d \int_0^1 B_{xu}B_{xu}'.$

Combining (100) and (101), we obtain

$n\big(\hat\beta_0 - \beta(1)\big) \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\Big[\int_0^1 B_{xu}\,dB_\varepsilon - \Big(\int_0^1 B_{xu}\,dB_x' + \Lambda_x\Big)\tilde\beta(1)\Big].$

Now consider the case where $\lambda \neq 0$. First we have

$m^{-1}\tilde X' Q_Z(B_\lambda) Q_Z \tilde X = m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_{x^d}(\lambda_s)^*,$

and in a similar fashion to (93) we find that

(102)  $m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_{x^d}(\lambda_s)^* \to_d 2\pi f_{xx}(\lambda) + g_\lambda(B_x) g_\lambda(B_x)^* = G_\lambda$, say.

Next, from (15) we have

(103)  $w_{\tilde y}(\lambda_s) = \beta(1)' w_{\tilde x}(\lambda_s) - \tilde\beta(e^{i\lambda_s})' w_v(\lambda_s) + w_\varepsilon(\lambda_s) + O_p\Big(\frac{1}{\sqrt n}\Big).$

Now

(104)  $w_{y^d}(\lambda_s)^* = w_{\tilde y}(\lambda_s)^* - w_d(\lambda_s)^*(n^{-1}D'D)^{-1}(n^{-1}D'\tilde Y),$

and, in view of the cointegrating relation (11), we have

(105)  $n^{-1/2}\tilde y_{[nr]} \to_d \beta(1)' B_x(r) = B_y(r),$

and it follows as in part (f) of Lemma C that

(106)  $w_{y^d}(\lambda_s)^* \sim w_{\tilde y}(\lambda_s)^* - f_1(\lambda_s)^*\Big(\int_0^1 uu'\Big)^{-1}\Big(\int_0^1 uB_x'\Big)\beta(1) + o_p(1).$

However, from the proof of Lemma C(f) we also have

(107)  $w_{x^d}(\lambda_s)^* = w_{\tilde x}(\lambda_s)^* - f_1(\lambda_s)^*\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 uB_x' + o_p(1).$

We deduce from (103), (106), and (107) that

(108)  $w_{y^d}(\lambda_s)^* \sim w_{\tilde x}(\lambda_s)^*\beta(1) - f_1(\lambda_s)^*\Big(\int_0^1 uu'\Big)^{-1}\Big(\int_0^1 uB_x'\Big)\beta(1) + w_\varepsilon(\lambda_s)^* - w_v(\lambda_s)^*\tilde\beta(e^{-i\lambda_s}) + o_p(1)$

$= w_{x^d}(\lambda_s)^*\beta(1) - w_v(\lambda_s)^*\tilde\beta(e^{-i\lambda_s}) + w_\varepsilon(\lambda_s)^* + o_p(1).$

Then

(109)  $m^{-1}\tilde X' Q_Z(B_\lambda) Q_Z \tilde y = m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_{y^d}(\lambda_s)^*$

$\sim m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_{x^d}(\lambda_s)^*\beta(1) - m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_v(\lambda_s)^*\tilde\beta(e^{-i\lambda}) + o_p(1).$

Using (51) we find that

(110)  $m^{-1}\sum_{\lambda_s \in B_\lambda} w_{x^d}(\lambda_s) w_v(\lambda_s)^* = m^{-1}\sum_{\lambda_s \in B_\lambda} w_x(\lambda_s) w_v(\lambda_s)^* + o_p(1)$

$= m^{-1}\sum_{\lambda_s \in B_\lambda}\frac{1}{1-e^{i\lambda_s}} w_v(\lambda_s) w_v(\lambda_s)^* + o_p(1) \to_p \frac{2\pi}{1-e^{i\lambda}} f_{vv}(\lambda).$

It follows from (95), (102), (109), and (110) that

$\hat\beta_\lambda \to_d \beta(1) - G_\lambda^{-1}\frac{2\pi}{1-e^{i\lambda}} f_{vv}(\lambda)\tilde\beta(e^{-i\lambda}) = \beta(1) + G_\lambda^{-1}\frac{2\pi}{|1-e^{i\lambda}|^2} f_{vv}(\lambda)\tilde\beta(e^{-i\lambda})(e^{-i\lambda}-1).$

The true coefficient is

$\beta_\lambda = b(-\lambda) = \beta(e^{-i\lambda}) = \beta(1) + \tilde\beta(e^{-i\lambda})(e^{-i\lambda}-1),$

so we have

$\hat\beta_\lambda \to_d \beta_\lambda + \Big[G_\lambda^{-1}\frac{2\pi}{|1-e^{i\lambda}|^2} f_{vv}(\lambda) - 1\Big]\tilde\beta(e^{-i\lambda})(e^{-i\lambda}-1),$

which gives the stated result since $f_{xx}(\lambda) = |1-e^{i\lambda}|^{-2} f_{vv}(\lambda)$.
Part (b): The estimator $\tilde\beta_\lambda$ is derived from the augmented spectral regression (32) and the relation (34). In (32), $y_t$ and $x_t$ are first detrended in the time domain, and then dft's are taken of the detrended data and of differences of the detrended data. Since $w_{\Delta x^d}(\lambda_s) \sim w_v(\lambda_s) + o_p(1)$, (108) can be written as

(111)  $w_{y^d}(\lambda_s)^* = w_{x^d}(\lambda_s)^*\beta(1) - w_{\Delta x^d}(\lambda_s)^*\tilde\beta(e^{-i\lambda_s}) + w_\varepsilon(\lambda_s)^* + o_p(1).$

First, take $\lambda = 0$. In this case, the regressors $w_{x^d}(\lambda_s)$ and $w_{\Delta x^d}(\lambda_s)$ in (111) are asymptotically orthogonal after appropriate scaling because

(112)  $m^{-1/2} n^{-1}\sum_{\lambda_s \in B_0} w_{\tilde x^d}(\lambda_s) w_v(\lambda_s)^* = O_p(m^{-1/2}),$

in view of (99). It then follows from (98), Lemma C(k), (112), and (111) that

$n\big(\tilde\beta_0 - \beta(1)\big) \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\int_0^1 B_{xu}\,dB_\varepsilon,$

as required.
Now take the case $\lambda \neq 0$. From part (f) of Lemma C and (52) we have

$w_{x^d}(\lambda_s) = \frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} - \frac{e^{i\lambda_s}}{1-e^{i\lambda_s}}\frac{\tilde x_n - \tilde x_0}{n^{1/2}} - n^{-3/2}\tilde X'D(n^{-1}D'D)^{-1} n^{1/2} w_d(\lambda_s)$

$= \frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} + w_{nt}(\lambda_s)(\tilde x_n - \tilde x_0) - n^{-3/2}\tilde X'D(n^{-1}D'D)^{-1} n^{1/2} w_d(\lambda_s)$

$= \frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} + A_n\, n^{1/2} w_d(\lambda_s).$

Here, $n^{1/2} w_d(\lambda_s) = O_p(1)$ from (42) and

$A_n = \Big[0,\ \frac{\tilde x_n - \tilde x_0}{\sqrt n}\Big] - n^{-3/2}\tilde X'D(n^{-1}D'D)^{-1} = O_p(1).$

Then (111) is

(113)  $w_{y^d}(\lambda_s)^* = \Big(\frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} + A_n\, n^{1/2} w_d(\lambda_s)\Big)^*\beta(1) - w_v(\lambda_s)^*\tilde\beta(e^{-i\lambda_s}) + w_\varepsilon(\lambda_s)^* + o_p(1)$

$= w_v(\lambda_s)^*\Big[\frac{\beta(1)}{1-e^{-i\lambda_s}} - \tilde\beta(e^{-i\lambda_s})\Big] + w_d(\lambda_s)^* n^{1/2} A_n'\beta(1) + w_\varepsilon(\lambda_s)^* + o_p(1).$

The augmented narrow band regression (32) around frequency $\lambda$ can therefore be written as

(114)  $w_{y^z}(\lambda_s) = \tilde a_1'\Big(\frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} + A_n\, n^{1/2} w_d(\lambda_s)\Big) + \tilde a_2' w_v(\lambda_s) + o_p(1) + \text{residual},$

which is asymptotically equivalent to the regression

(115)  $w_{y^z}(\lambda_s)^* = w_v(\lambda_s)^*\tilde b_1(-\lambda) + n^{1/2} w_d(\lambda_s)^*\tilde b_2(-\lambda) + \text{residual},$

with

$\tilde b_1(-\lambda) = \frac{\tilde a_1(-\lambda)}{1-e^{-i\lambda}} + \tilde a_2(-\lambda), \qquad \tilde b_2(-\lambda) = A_n'\tilde a_1(-\lambda).$

In view of (113), (115), and the asymptotic orthogonality of the regressors in (115) (i.e., $m^{-1}\sum_{\lambda_s \in B_\lambda} w_v(\lambda_s)\, n^{1/2} w_d(\lambda_s)^* \to_p 0$), we find that

$\tilde b_1(-\lambda) \to_p \frac{\beta(1)}{1-e^{-i\lambda}} - \tilde\beta(e^{-i\lambda}) = \frac{\beta_\lambda}{1-e^{-i\lambda}},$

and

$\sqrt m\Big(\tilde b_1(-\lambda) - \frac{\beta_\lambda}{1-e^{-i\lambda}}\Big) \to_d N_c\Big(0,\ \frac{2\pi f_{\varepsilon\varepsilon}(\lambda)}{2\pi f_{vv}(\lambda)}\Big).$

We deduce that

$\tilde\beta_\lambda = \tilde a_1(-\lambda) + (1-e^{-i\lambda})\tilde a_2(-\lambda) = \tilde b_1(-\lambda)(1-e^{-i\lambda}) \to_p \beta_\lambda,$

and

$\sqrt m\big(\tilde\beta_\lambda - \beta_\lambda\big) \to_d N_c\Big(0,\ \frac{f_{\varepsilon\varepsilon}(\lambda)}{f_{xx}(\lambda)}\Big),$

giving the stated result.
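The algebra behind the "true coefficient" display, $\beta_\lambda = \beta(1) + \tilde\beta(e^{-i\lambda})(e^{-i\lambda}-1)$, can be checked numerically. The sketch below uses made-up lag polynomial coefficients $b_j$ and the BN-type decomposition with $\tilde\beta_j = \sum_{k>j} b_k$.

```python
import numpy as np

# Sketch of the identity behind the "true coefficient" display,
#   beta(z) = beta(1) + (z - 1) * beta_tilde(z),
# with beta_tilde_j = sum_{k > j} b_k (a BN-type decomposition).
# The coefficients b_j below are made-up numbers.
b = np.array([0.5, -0.3, 0.2, 0.1])                      # beta(L) coefficients
bt = np.array([b[j + 1:].sum() for j in range(len(b) - 1)])

lam = 0.7                                                 # any nonzero frequency
z = np.exp(-1j * lam)
beta_z = np.polyval(b[::-1], z)                           # beta(e^{-i lam})
bt_z = np.polyval(bt[::-1], z)                            # beta_tilde(e^{-i lam})

assert np.isclose(beta_z, b.sum() + (z - 1) * bt_z)
print("beta(e^{-i lam}) = beta(1) + (e^{-i lam} - 1) * beta_tilde(e^{-i lam})")
```

At $\lambda = 0$ the correction term vanishes and $\beta_0 = \beta(1)$, matching the zero-frequency results above.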



Proof of Theorem 3: Note that

$n\big(\hat\beta^f_A - \beta_A\big) = \big[n^{-2}X'W^*A\,Q_{AWZ}\,AWX\big]^{-1}\,n^{-1}X'W^*A\,Q_{AWZ}\,AW\varepsilon = \big[n^{-2}\tilde X_A' Q_{Z_A}\tilde X_A\big]^{-1}\,n^{-1}\tilde X_A' Q_{Z_A}\varepsilon_A,$

where the subscript $A$ denotes the band filtered series (e.g., $\tilde X_A = W^*AW\tilde X$, $D_A = W^*AWD$). Next,

$\tilde X_A' Q_{Z_A} = \tilde X_A' - \tilde X_A' P_{Z_A} = \tilde X_A' - \tilde X_A' Z_A (Z_A' Z_A)^{-1} Z_A' = \tilde X_A' - \tilde X_A' D_A (D_A' D_A)^{-1} D_A',$

and so

$\tilde X_A' Q_{Z_A}\tilde X_A = \tilde X_A'\tilde X_A - \tilde X_A' D_A (D_A' D_A)^{-1} D_A'\tilde X_A,$

which is

$\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_{\tilde x}(\lambda_s)^* - \Big(\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^*\Big)\Big(\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^*.$

Now, as in Lemma C(a), (b), and (i), we have

$n^{-2}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_{\tilde x}(\lambda_s)^* \to_d \int_0^1 B_x B_x',$

$n^{-3/2}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^* \to_d \int_0^1 B_x u',$

$n^{-1}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_d(\lambda_s)^* \to \int_0^1 uu'.$

Thus,

(116)  $n^{-2}\tilde X_A' Q_{Z_A}\tilde X_A \to_d \int_0^1 B_x B_x' - \Big(\int_0^1 B_x u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 uB_x' = \int_0^1 B_{xu}B_{xu}'.$

Next, consider the limit behavior of

$n^{-1}\tilde X_A' Q_{Z_A}\varepsilon_A = n^{-1}\big[\tilde X_A'\varepsilon_A - \tilde X_A' D_A (D_A' D_A)^{-1} D_A'\varepsilon_A\big]$

$= n^{-1}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_\varepsilon(\lambda_s)^* - \Big(n^{-3/2}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^*\Big)\Big(n^{-1}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1} n^{-1/2}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_\varepsilon(\lambda_s)^*.$

As in the proof of (91),

$n^{-1}\sum_{\lambda_s \in B_A} w_{\tilde x}(\lambda_s) w_\varepsilon(\lambda_s)^* \to_d \int_0^1 B_x\,dB_\varepsilon,$

and

$n^{-1/2}\sum_{\lambda_s \in B_A} w_d(\lambda_s) w_\varepsilon(\lambda_s)^* \to_d \int_0^1 u\,dB_\varepsilon.$

Thus,

$n^{-1}\tilde X_A' Q_{Z_A}\varepsilon_A \to_d \int_0^1 B_x\,dB_\varepsilon - \Big(\int_0^1 B_x u'\Big)\Big(\int_0^1 uu'\Big)^{-1}\int_0^1 u\,dB_\varepsilon = \int_0^1 B_{xu}\,dB_\varepsilon.$

It follows that

$n\big(\hat\beta^f_A - \beta_A\big) \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\int_0^1 B_{xu}\,dB_\varepsilon \equiv MN\Big(0,\ 2\pi f_{\varepsilon\varepsilon}(0)\Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\Big),$

giving the stated result for the band $B_A$.
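The frequency domain detrending used in Theorem 3 can be illustrated with a simple numerical fact: because the dft is a unitary transformation, projecting the dft of $x_t$ on the dft of the trend variables over all frequencies reproduces the dft of the time-domain detrended series; band-limited detrending projects only on the ordinates inside the band. The sketch below (hypothetical code, with made-up data) verifies the full-band case.

```python
import numpy as np

# Sketch (hypothetical data): since the dft matrix is unitary, the
# frequency-domain projection of w_x on w_d over ALL ordinates equals
# the dft of the time-domain detrended series.  Band-limited detrending,
# as in Theorem 3, uses only the ordinates lambda_s inside the band.
rng = np.random.default_rng(2)
n = 512
x = np.cumsum(rng.normal(size=n))
Z = np.column_stack([np.ones(n), np.arange(1, n + 1)])

# time-domain detrending, then dft
coef = np.linalg.lstsq(Z, x, rcond=None)[0]
w_td = np.fft.fft(x - Z @ coef) / np.sqrt(n)

# frequency-domain detrending over all ordinates
wx = np.fft.fft(x) / np.sqrt(n)
wz = np.fft.fft(Z, axis=0) / np.sqrt(n)      # column-wise dft of the trends
G = wz.conj().T @ wz                          # sum_s w_d(s) w_d(s)*
g = wz.conj().T @ wx                          # sum_s w_d(s) w_x(s)*
w_fd = wx - wz @ np.linalg.solve(G, g)

assert np.allclose(w_td, w_fd)
print("full-band frequency-domain detrending matches time-domain detrending")
```

The two procedures differ only when the projection is restricted to a band, which is precisely the case analyzed in the remainder of the proof.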


For the band $B_A^c$, we have

(117)  $\sqrt n\big(\hat\beta^f_{A^c} - \beta_{A^c}\big) = \big[n^{-1}X'W^*A^c Q_{A^cWZ} A^cWX\big]^{-1}\,n^{-1/2}X'W^*A^c Q_{A^cWZ} A^cW\varepsilon = \big[n^{-1}\tilde X_{A^c}' Q_{Z_{A^c}}\tilde X_{A^c}\big]^{-1}\,n^{-1/2}\tilde X_{A^c}' Q_{Z_{A^c}}\varepsilon_{A^c}.$

As above,

$\tilde X_{A^c}' Q_{Z_{A^c}}\tilde X_{A^c} = \tilde X_{A^c}'\tilde X_{A^c} - \tilde X_{A^c}' D_{A^c}(D_{A^c}' D_{A^c})^{-1} D_{A^c}'\tilde X_{A^c}$

$= \sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_{\tilde x}(\lambda_s)^* - \Big(\sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^*\Big)\Big(\sum_{\lambda_s \in B_A^c} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_A^c} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^*.$

From the above expression and Lemma C(d), (e), and (h) we deduce that

(118)  $n^{-1}\tilde X_{A^c}' Q_{Z_{A^c}}\tilde X_{A^c}$

$\to_d \int_{B_A^c}\Big[f_{xx}(\lambda) + \frac{1}{2\pi}\frac{B_x(1)B_x(1)'}{|1-e^{i\lambda}|^2}\Big]\,d\lambda - B_x(1)\Big(\frac{1}{2\pi}\int_{B_A^c}\frac{e^{i\lambda} f_1(\lambda)^*}{1-e^{i\lambda}}\,d\lambda\Big)\Big(\frac{1}{2\pi}\int_{B_A^c} f_1(\lambda) f_1(\lambda)^*\,d\lambda\Big)^{-1}\Big(\frac{1}{2\pi}\int_{B_A^c}\frac{e^{-i\lambda} f_1(\lambda)}{1-e^{-i\lambda}}\,d\lambda\Big)B_x(1)'$

$= \int_{B_A^c} f_{xx}(\lambda)\,d\lambda + \frac{1}{2\pi}\int_{B_A^c}\frac{B_x(1)B_x(1)'}{|1-e^{i\lambda}|^2}\,d\lambda - \frac{1}{2\pi}B_x(1)\Big(\int_{B_A^c}\frac{e^{i\lambda} f_1(\lambda)^*}{1-e^{i\lambda}}\,d\lambda\Big)\Big(\int_{B_A^c} f_1(\lambda) f_1(\lambda)^*\,d\lambda\Big)^{-1}\Big(\int_{B_A^c}\frac{e^{-i\lambda} f_1(\lambda)}{1-e^{-i\lambda}}\,d\lambda\Big)B_x(1)'.$

Next, observe that

$\Big(\int_{B_A^c}\frac{e^{i\lambda} f_1(\lambda)^*}{1-e^{i\lambda}}\,d\lambda\Big)\Big(\int_{B_A^c} f_1(\lambda) f_1(\lambda)^*\,d\lambda\Big)^{-1} f_1(\lambda)$

is the $L_2(B_A^c)$ projection of the function $e^{i\lambda}(1-e^{i\lambda})^{-1}$ onto the space spanned by $f_1(\lambda)$. When the deterministic variable $z_t$ includes a linear time trend, we know from (39) that the vector $f_1(\lambda)$ includes the function $e^{i\lambda}(1-e^{i\lambda})^{-1}$ as one of its components. Hence, in this case we have

(119)  $\Big(\int_{B_A^c} e^{i\lambda}(1-e^{i\lambda})^{-1} f_1(\lambda)^*\,d\lambda\Big)\Big(\int_{B_A^c} f_1(\lambda) f_1(\lambda)^*\,d\lambda\Big)^{-1} f_1(\lambda) = e^{i\lambda}(1-e^{i\lambda})^{-1}$

for $\lambda \in B_A^c$. It follows that (118) is simply $\int_{B_A^c} f_{xx}(\lambda)\,d\lambda$.

Proceeding with the proof, the second factor of (117) decomposes as

$n^{-1/2}\tilde X_{A^c}' Q_{Z_{A^c}}\varepsilon_{A^c} = n^{-1/2}\sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_\varepsilon(\lambda_s)^* - n^{-1/2}\Big(\sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_d(\lambda_s)^*\Big)\Big(\sum_{\lambda_s \in B_A^c} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_A^c} w_d(\lambda_s) w_\varepsilon(\lambda_s)^*.$

Using (119) and the independence of $\tilde x_t$ and $\varepsilon_t$ we find

$n^{-1/2}\tilde X_{A^c}' Q_{Z_{A^c}}\varepsilon_{A^c} \sim_d n^{-1/2}\sum_{\lambda_s \in B_A^c} w_{\tilde x}(\lambda_s) w_\varepsilon(\lambda_s)^* - B_x(1)\Big(\frac{1}{2\pi}\int_{B_A^c}\frac{e^{i\lambda} f_1(\lambda)^*}{1-e^{i\lambda}}\,d\lambda\Big)\Big(\frac{1}{2\pi}\int_{B_A^c} f_1(\lambda) f_1(\lambda)^*\,d\lambda\Big)^{-1} n^{-1/2}\sum_{\lambda_s \in B_A^c} f_1(\lambda_s) w_\varepsilon(\lambda_s)^*$

$\sim n^{-1/2}\sum_{\lambda_s \in B_A^c}\Big[w_{\tilde x}(\lambda_s) - \frac{e^{i\lambda_s}}{1-e^{i\lambda_s}} B_x(1)\Big] w_\varepsilon(\lambda_s)^*,$

and, since $w_{\tilde x}(\lambda_s) = (1-e^{i\lambda_s})^{-1} w_v(\lambda_s) - (1-e^{i\lambda_s})^{-1} e^{i\lambda_s}(\tilde x_n - \tilde x_0)/n^{1/2}$ with $(\tilde x_n - \tilde x_0)/n^{1/2}$ and $B_x(1)$ coinciding asymptotically on the same probability space, the deterministic leakage terms cancel, leaving

$= n^{-1/2}\sum_{\lambda_s \in B_A^c}\frac{1}{1-e^{i\lambda_s}} w_v(\lambda_s) w_\varepsilon(\lambda_s)^* + o_p(1).$

Conditional on the $\tilde x$ process this is asymptotically

$\sim_d N\Big(0,\ \frac{2\pi}{n}\sum_{\lambda_s \in B_A^c}\frac{1}{|1-e^{i\lambda_s}|^2} w_v(\lambda_s) w_v(\lambda_s)^* f_{\varepsilon\varepsilon}(\lambda_s)\Big) \sim_d N\Big(0,\ 2\pi\int_{B_A^c} f_{xx}(\lambda) f_{\varepsilon\varepsilon}(\lambda)\,d\lambda\Big).$

We deduce that

$\sqrt n\big(\hat\beta^f_{A^c} - \beta_{A^c}\big) \to_d N\Big(0,\ \Big(\int_{B_A^c} f_{xx}(\lambda)\,d\lambda\Big)^{-1}\Big(2\pi\int_{B_A^c} f_{xx}(\lambda) f_{\varepsilon\varepsilon}(\lambda)\,d\lambda\Big)\Big(\int_{B_A^c} f_{xx}(\lambda)\,d\lambda\Big)^{-1}\Big),$

giving the stated result.
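The key fact behind (119) — that the dft of a linear trend is, at the fundamental frequencies, exactly proportional to $e^{i\lambda}(1-e^{i\lambda})^{-1}$ — can be verified directly: for $\lambda_s = 2\pi s/n$ with $s \neq 0$, $\sum_{t=1}^n t\,e^{it\lambda_s} = n e^{i\lambda_s}/(e^{i\lambda_s}-1)$. A short numerical check (illustrative values of $n$ and $s$):

```python
import numpy as np

# Verify the exact identity sum_{t=1}^n t e^{i t lambda_s}
#   = n e^{i lambda_s} / (e^{i lambda_s} - 1)  for lambda_s = 2 pi s / n, s != 0,
# which is why the limit vector f_1(lambda) contains a component
# proportional to e^{i lambda}(1 - e^{i lambda})^{-1} when z_t includes
# a linear time trend.
n = 240
t = np.arange(1, n + 1)
for s in (1, 7, n // 3):
    lam = 2 * np.pi * s / n
    lhs = np.sum(t * np.exp(1j * lam * t))
    rhs = n * np.exp(1j * lam) / (np.exp(1j * lam) - 1)
    assert np.isclose(lhs, rhs)
print("trend dft identity verified")
```

The identity follows from $(1-z)\sum_{t=1}^n t z^t = \sum_{t=1}^n z^t - n z^{n+1} = -nz$ when $z = e^{i\lambda_s}$ satisfies $z^n = 1$, $z \neq 1$.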

Proof of Theorem 3′: Part (a): First note that

$w_x(\lambda_s) = \Pi_2' w_z(\lambda_s) + w_{\tilde x}(\lambda_s) = \Pi_2'\Phi_n w_d(\lambda_s) + w_{\tilde x}(\lambda_s),$

with a similar expression for $w_y(\lambda_s)$. The regression (45) is equivalent to the following regression with frequency domain detrended data:

(120)  $w^f_{\tilde y^d}(\lambda_s) = \tilde c_1' w^f_{\tilde x^d}(\lambda_s) + \tilde c_2' w^f_{\Delta\tilde x^d}(\lambda_s) + \text{residual},$

where

$w^f_{x^d}(\lambda_s)^* = w_x(\lambda_s)^* - w_d(\lambda_s)^*\Big(\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_x(\lambda_s)^*$

$= w_{\tilde x}(\lambda_s)^* - w_d(\lambda_s)^*\Big(\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^* = w^f_{\tilde x^d}(\lambda_s)^*$, say,

with a similar expression for $w^f_{y^d}(\lambda_s) = w^f_{\tilde y^d}(\lambda_s)$, and

$w^f_{\Delta\tilde x^d}(\lambda_s)^* = w_{\Delta\tilde x}(\lambda_s)^* - w_d(\lambda_s)^*\Big(\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_{\Delta\tilde x}(\lambda_s)^* = w_v(\lambda_s)^* + o_p(1).$

For $\lambda = 0$, as in the proof of Theorem 2′, the regressors in (120) are asymptotically orthogonal because

$n^{-3/2}\sum_{\lambda_s \in B_0} w^f_{\tilde x^d}(\lambda_s) w_v(\lambda_s)^* = O_p(n^{-1/2}).$

Next, using Lemma C(j)–(m), we find as in (116) above that

$n^{-2}\sum_{\lambda_s \in B_0} w^f_{\tilde x^d}(\lambda_s) w^f_{\tilde x^d}(\lambda_s)^* \to_d \int_0^1 B_{xu}B_{xu}',$

and as in (89) we get

$n^{-1}\sum_{\lambda_s \in B_0} w^f_{\tilde x^d}(\lambda_s) w_\varepsilon(\lambda_s)^* \to_d \int_0^1 B_{xu}\,dB_\varepsilon.$

It follows using (15) and these two results that

$n\big(\tilde\beta^f_0 - \beta(1)\big) \to_d \Big(\int_0^1 B_{xu}B_{xu}'\Big)^{-1}\int_0^1 B_{xu}\,dB_\varepsilon.$

Next consider the $\lambda \neq 0$ case. First we simplify the regression equation (120):

(121)  $w^f_{\tilde y^d}(\lambda_s) = \tilde c_1' w^f_{\tilde x^d}(\lambda_s) + \tilde c_2' w^f_{\Delta\tilde x^d}(\lambda_s) + \text{residual} = \tilde c_1' w^f_{\tilde x^d}(\lambda_s) + \tilde c_2' w_v(\lambda_s) + o_p(1) + \text{residual}.$

Using (73) and Lemma C(n)–(o) we obtain

(122)  $w^f_{\tilde x^d}(\lambda_s)^* = w_{\tilde x}(\lambda_s)^* - w_d(\lambda_s)^*\Big(\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_d(\lambda_s)^*\Big)^{-1}\sum_{\lambda_s \in B_\lambda} w_d(\lambda_s) w_{\tilde x}(\lambda_s)^*$

$= \frac{w_v(\lambda_s)^*}{1-e^{-i\lambda_s}} - \frac{e^{-i\lambda_s}}{1-e^{-i\lambda_s}}\frac{(\tilde x_n - \tilde x_0)'}{n^{1/2}} + \frac{e^{-i\lambda}}{1-e^{-i\lambda}} B_x(1)' + o_p(1)$

$= \frac{w_v(\lambda_s)^*}{1-e^{-i\lambda_s}} + o_p(1),$

since the band-limited projection on $w_d$ removes the leakage term and $(\tilde x_n - \tilde x_0)'/n^{1/2} \to_d B_x(1)'$. So the regression (121) is asymptotically equivalent to

(123)  $w^f_{\tilde y^d}(\lambda_s)^* = \Big[\frac{w_v(\lambda_s)}{1-e^{i\lambda_s}} + o_p(1)\Big]^*\tilde c_1 + \big[w_v(\lambda_s) + o_p(1)\big]^*\tilde c_2 + \text{residual}.$

From (122) and (15) we deduce, as in Theorem 2′, that

$\tilde b^f_1(-\lambda) := \frac{\tilde c_1(-\lambda)}{1-e^{-i\lambda}} + \tilde c_2(-\lambda) \to_p \frac{\beta(1)}{1-e^{-i\lambda}} - \tilde\beta(e^{-i\lambda}) = \frac{\beta_\lambda}{1-e^{-i\lambda}},$

and

$\sqrt m\Big(\tilde b^f_1(-\lambda) - \frac{\beta_\lambda}{1-e^{-i\lambda}}\Big) \to_d N_c\Big(0,\ \frac{f_{\varepsilon\varepsilon}(\lambda)}{f_{vv}(\lambda)}\Big).$

It follows that

$\tilde\beta^f_\lambda = \tilde c_1(-\lambda) + (1-e^{-i\lambda})\tilde c_2(-\lambda) = \tilde b^f_1(-\lambda)(1-e^{-i\lambda}) \to_p \beta_\lambda,$

and

$\sqrt m\big(\tilde\beta^f_\lambda - \beta_\lambda\big) \to_d N_c\Big(0,\ \frac{f_{\varepsilon\varepsilon}(\lambda)}{f_{xx}(\lambda)}\Big),$

giving the stated result.
REFERENCES

Choi, I., and P. C. B. Phillips (1993): “Testing for a Unit Root by Frequency Domain Regression,”
Journal of Econometrics, 59, 263–286.
Corbae, D., S. Ouliaris, and P. C. B. Phillips (1997): “Band Spectral Regression with Trending
Data,” Cowles Foundation Discussion Paper, Yale University, 1163.
Engle, R. (1974): “Band Spectrum Regression,” International Economic Review, 15, 1–11.
Frisch, R., and F. V. Waugh (1933): “Partial Time Series Regression as Compared with Individual
Trends,” Econometrica, 1, 211–223.
Geweke, J. (1982): “Measurement of Linear Dependence and Feedback Between Multiple Time
Series,” Journal of the American Statistical Association, 77, 304–313.
Gradshteyn, I. S., and I. M. Ryzhik (1965): Tables of Integrals, Series and Products. New York:
Academic Press.
Granger, C. W. J., and J.-L. Lin (1995): “Causality in the Long Run,” Econometric Theory, 11,
530–536.
Grenander, U., and M. Rosenblatt (1957): Statistical Analysis of Stationary Time Series.
New York: Wiley.
Hannan, E. J. (1963a): “Regression for Time Series,” in Time Series Analysis, ed. by M. Rosenblatt.
New York: Wiley.
(1963b): “Regression for Time Series with Errors of Measurement,” Biometrika, 50, 293–302.
(1970): Multiple Time Series. New York: Wiley.
(1973): “Central Limit Theorems for Time Series Regression,” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 26, 157–170.
Hannan, E. J., and P. J. Thomson (1971): “Spectral Inference over Narrow Bands,” Journal of
Applied Probability, 8, 157–169.
Hannan, E. J., and P. M. Robinson (1973): “Lagged Regression with Unknown Lags,” Journal of
the Royal Statistical Society, Series B, 35, 252–267.
Hosoya, Y. (1991): “The Decomposition and Measurement of the Interdependence between Second-order Stationary Processes,” Probability Theory and Related Fields, 88, 429–444.
Kim, C., and P. C. B. Phillips (1999): “Fully Modified Estimation of Fractional Cointegration
Models,” Yale University, Mimeo.
Marinucci, D. (1998): “Band Spectrum Regression for Cointegrated Time Series with Long Mem-
ory Innovations,” LSE Econometrics Discussion Paper #353.
Phillips, P. C. B. (1991a): “Spectral Regression for Co-integrated Time Series,” in Nonparamet-
ric and Semiparametric Methods in Economics and Statistics, ed. by W. Barnett, J. Powell, and
G. Tauchen. Cambridge: Cambridge University Press.
(1991b): “Optimal Inference in Cointegrated Systems,” Econometrica, 59, 283–306.
(1999): “Discrete Fourier Transforms of Fractional Processes,” Cowles Foundation Discussion
Paper #1243, Yale University.
(2000): “Unit Root Log Periodogram Regression,” Cowles Foundation Discussion Paper
#1244, Yale University.
Phillips, P. C. B., and B. Hansen (1990): “Statistical Inference in Instrumental Variable Regres-
sion with I(1) processes,” Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., and C. C. Lee (1996): “Efficiency Gains from Quasi-differencing under Nonsta-
tionarity,” in Athens Conference on Applied Probability and Time Series: Essays in Memory of E. J.
Hannan, ed. by P. M. Robinson and M. Rosenblatt. New York: Springer-Verlag.
Phillips, P. C. B., and M. Loretan (1991): “Estimating Long-run Economic Equilibria,” Review
of Economic Studies, 59, 407–436.
Phillips, P. C. B., and V. Solo (1992): “Asymptotics for Linear Processes,” Annals of Statistics,
20, 971–1001.
Robinson, P. M. (1991): “Automatic Frequency Domain Inference,” Econometrica, 59, 1329–1363.
Robinson, P. M., and D. Marinucci (1998): “Semiparametric Frequency Domain Analysis of
Fractional Cointegration,” LSE Econometrics Discussion Paper #348.
Saikkonen, P. (1991): “Asymptotically Efficient Estimation of Cointegration Regressions,” Econo-
metric Theory, 7, 1–21.
Shorack, G. R., and J. A. Wellner (1986): Empirical Processes with Applications to Statistics.
New York: Wiley.
Stock, J. H., and M. W. Watson (1993): “A Simple Estimator of Cointegrating Vectors in Higher
Order Integrated Systems,” Econometrica, 61, 783–820.
Xiao, Zhijie, and Peter C. B. Phillips (1998): “Higher Order Approximations for Frequency
Domain Time Series Regression,” Journal of Econometrics, 86, 297–336.
(1999): “Higher Order Approximations for Wald Statistics in Time Series Regressions with
Integrated Processes,” Yale University, Mimeo.