
Massachusetts Institute of Technology Department of Economics

Time Series 14.384

Guido Kuersteiner

Lecture Note 3: Spectral Representation of Stationary Processes

We have seen in the previous lecture that the dependence properties of a stationary time series can be described by the autocovariance function $\gamma(h)$. It is shown in Appendix B that $\gamma(\cdot)$ can be expressed in terms of the spectral distribution function $F(\lambda)$:

$$\gamma_{xx}(h) = \int_{-\pi}^{\pi} e^{i\lambda h}\, dF(\lambda). \tag{3.1}$$

The spectral distribution measures the fraction of the total variance of the process, $\gamma(0)$, that can be attributed to a certain interval of frequencies. If, for example, we have monthly data, then a one-month cycle corresponds to the frequency $2\pi$, a two-month cycle to $\pi$, and a one-year cycle to $\pi/6$. To measure the fraction of the variance generated by cyclical components of more than one year cycle length we would consider $F(\pi/6) - F(0)$. Appendix A discusses a simple process that illustrates this interpretation. If the distribution function $F(\lambda)$ has a density $f(\lambda)$ then (3.1) can be written as

$$\gamma_{xx}(h) = \int_{-\pi}^{\pi} e^{i\lambda h} f(\lambda)\, d\lambda \tag{3.2}$$

where $\gamma_{xx}(h)$ is the Fourier transform of $f(\lambda)$. If moreover the spectral density $f(\lambda) \in L^2[-\pi,\pi]$, the space of square integrable functions, then the inverse Fourier transform exists and is given by

$$f(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h}. \tag{3.3}$$

This relationship must hold in $L^2[-\pi,\pi]$ since $L^2[-\pi,\pi]$ is a Hilbert space with basis $\{e^{i\lambda t},\ t \in \mathbb{Z}\}$. It thus follows from (3.2) that the autocovariance function $\gamma_{xx}(h) = \langle f(\lambda), e^{i\lambda h} \rangle$ is the regression coefficient of a projection of $f(\lambda)$ onto the basis vectors. If $f(\lambda) \in L^2[-\pi,\pi]$ then the sum $\sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h}$ converges in mean square and the limit is $f(\lambda)$ almost everywhere. In the context of ARMA models it is the case that $\sum_{h=-\infty}^{\infty} |\gamma_{xx}(h)| < \infty$. Then the series $\sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h}$ converges absolutely and uniformly and the limit is $f(\lambda)$ almost everywhere. In this case it is therefore common to define the spectral density directly as in (3.3).

3.1. Properties of Spectral Densities

To simplify the argument we assume that $\sum_{h=-\infty}^{\infty} |\gamma_{xx}(h)| < \infty$. In this case we can establish the properties of $f(\lambda)$ based on its Fourier approximation (3.3). The properties discussed in this section hold, however, for general stationary processes. If $X_t$ is a real valued, weakly stationary process, then $\gamma_{xx}(h) = \gamma_{xx}(-h)$. It then follows that $f(\lambda) = f(-\lambda)$. To show that $f(\lambda) \ge 0$ we introduce the following concept, which is of independent importance. The Cesaro mean of a sequence $\{a_j\}_{j=0}^{\infty}$ is defined to be the average of the first $n$ partial sums. Let $S_k = \sum_{j=0}^{k} a_j$ and define $\sigma_n = \frac{1}{n} \sum_{k=0}^{n-1} S_k$. Then

$$\sigma_n = \sum_{k=0}^{n-1} \left(1 - \frac{k}{n}\right) a_k.$$

We want to show that $\sigma_n \to S$ if $\sum_{j=0}^{\infty} a_j = S$. This follows from the Toeplitz Lemma.

Lemma 3.1 (Toeplitz). Let $\{S_n\}$ be a bounded sequence for which $S_n \to S$ as $n \to \infty$. Let $w_{ni} > 0$ be an array of weights for which $\sum_{i=1}^{n} w_{ni} = 1$ for all $n$ and $w_{ni} \to 0$ for all $i$ as $n \to \infty$. Then

$$\sum_{i=1}^{n} w_{ni} S_i \to S.$$

Proof. For any $\varepsilon > 0$ consider

$$\left| \sum_{i=1}^{n} w_{ni} S_i - S \right| = \left| \sum_{i=1}^{N} w_{ni}(S_i - S) + \sum_{i=N+1}^{n} w_{ni}(S_i - S) \right| \le \sum_{i=1}^{N} w_{ni} |S_i - S| + \sum_{i=N+1}^{n} w_{ni} |S_i - S|$$

and choose $N$ such that $|S_i - S| \le \varepsilon/2$ for all $i > N$, and then $n$ such that $w_{ni} \le \varepsilon/2 \left(N \max_i |S_i - S|\right)^{-1}$ for $i \le N$. Then

$$\sum_{i=1}^{N} w_{ni} |S_i - S| \le \max_i |S_i - S| \sum_{i=1}^{N} w_{ni} \le \varepsilon/2$$

and

$$\sum_{i=N+1}^{n} w_{ni} |S_i - S| \le \varepsilon/2,$$

where the last inequality follows from $\sum_{i=1}^{n} w_{ni} = 1$. The result now follows since $\varepsilon$ was arbitrary.

By setting $w_{ni} = n^{-1}$ it now follows immediately that $\sigma_n \to S$ if $S_n \to S$. Going back to the spectral density $f(\lambda)$, letting $f_n(\lambda)$ be the $n$th Cesaro mean of (3.3), we can now see that

$$f_n(\lambda) = \frac{1}{2\pi} \sum_{h=-n+1}^{n-1} \left(1 - \frac{|h|}{n}\right) \gamma_{xx}(h) e^{-i\lambda h} = \frac{1}{2\pi n} E\left| \sum_{t=1}^{n} (x_t - Ex_t) e^{-i\lambda t} \right|^2 \ge 0 \quad \text{for all } n.$$

Thus $\liminf_n f_n(\lambda) \ge 0$ such that $f(\lambda) \ge 0$. Finally we also note that $\int_{-\pi}^{\pi} f(\lambda)\, d\lambda = \gamma_{xx}(0) < \infty$. We summarize these results in the following theorem.

Theorem 3.2 (Spectral Density). If a real valued weakly stationary process has a spectral density $f(\lambda)$ then $f(\lambda)$ satisfies

1. $f(\lambda) = f(-\lambda)$
2. $f(\lambda) \ge 0$
3. $\int_{-\pi}^{\pi} f(\lambda)\, d\lambda = \gamma_{xx}(0) < \infty$.
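The Cesaro-mean argument lends itself to a quick numerical illustration (a sketch I am adding; the sequence and tolerance are arbitrary choices): with the Toeplitz weights $w_{ni} = 1/n$, the Cesaro means of the partial sums of $a_j = 2^{-j}$ approach the limit $S = 2$.

```python
def cesaro_means(a, n):
    """Return the n-th Cesaro mean: the average of the first n partial sums of a(0), a(1), ..."""
    partial, sums = 0.0, []
    for j in range(n):
        partial += a(j)
        sums.append(partial)
    return sum(sums) / n

a = lambda j: 0.5 ** j                 # sum_{j>=0} a_j = 2, partial sums S_k = 2 - 2^{-k}
sigma_200 = cesaro_means(a, 200)
assert abs(sigma_200 - 2.0) < 0.05     # the Cesaro mean approaches the limit S = 2
```

The Cesaro means converge more slowly than the partial sums themselves (the error is of order $1/n$ here), which is the price paid for the smoothing that makes $f_n(\lambda) \ge 0$.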

We now turn to the characterization of spectral densities for ARMA models.

3.2. Spectral Densities for ARMA Processes

In this section we will exploit the fact that spectral densities can be obtained by looking at their Fourier approximation (3.3). In some cases this allows us to find specific functional forms for the spectral density. One case is the class of ARMA processes. We first consider the spectral density for the linear process $Y_t = \sum_{j=-\infty}^{\infty} \psi_j X_{t-j}$ with $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$, where $X_t$ is a zero mean stationary process with spectral distribution function $F_x(\lambda)$. We know that $\gamma_{yy}(h) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k \gamma_{xx}(h - j + k)$ where $\gamma_{xx}(h) = \int_{-\pi}^{\pi} e^{i\lambda h}\, dF_x(\lambda)$, so that

$$\gamma_{yy}(h) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k \int_{-\pi}^{\pi} e^{i\lambda(h-j+k)}\, dF_x(\lambda) = \int_{-\pi}^{\pi} e^{i\lambda h} \left| \sum_{j=-\infty}^{\infty} \psi_j e^{-i\lambda j} \right|^2 dF_x(\lambda).$$

If $X_t$ has a spectral density $f_x(\lambda)$ then it follows that $f_y(\lambda) = \left| \sum_{j=-\infty}^{\infty} \psi_j e^{-i\lambda j} \right|^2 f_x(\lambda)$. This can also be seen by looking at the Fourier approximation of $f_y(\lambda)$ directly. We assume in addition that the autocovariance function of $X_t$, $\gamma_{xx}(h)$, is absolutely summable. Then the Fourier approximation of $f_y(\lambda)$ is

$$f_y(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_{yy}(h) e^{-i\lambda h} = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k e^{i\lambda(k-j)} \sum_{h=-\infty}^{\infty} \gamma_{xx}(h-j+k) e^{-i\lambda(h-j+k)} = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_j \psi_k e^{i\lambda(k-j)} f_x(\lambda) = \left| \sum_{j=-\infty}^{\infty} \psi_j e^{-i\lambda j} \right|^2 f_x(\lambda).$$

A more compact formulation of the spectral density of a filtered time series can be given by defining the infinite order lag polynomial of the filter as $\psi(L) = \sum_{j=-\infty}^{\infty} \psi_j L^j$. It now follows that $f_y(\lambda) = \left| \psi(e^{-i\lambda}) \right|^2 f_x(\lambda)$.

We turn now to the spectral density of the ARMA(p,q) model.

Theorem 3.3 (ARMA(p,q) Spectral Density). Let $\{X_t\}$ be an ARMA(p,q) process satisfying

$$\phi(L) X_t = \theta(L) \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \sigma^2) \tag{3.4}$$

where $\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$ and $\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$ have no common zeros and $\phi(L)$ has all roots outside the unit circle. Then $\{X_t\}$ has spectral density

$$f_x(\lambda) = \frac{\sigma^2}{2\pi} \frac{\left| \theta(e^{-i\lambda}) \right|^2}{\left| \phi(e^{-i\lambda}) \right|^2}.$$

Proof. First note that $X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ with $\sum_{j=0}^{\infty} |\psi_j| < \infty$. Since $f_\varepsilon(\lambda) = \frac{\sigma^2}{2\pi}$, it follows from the previous result that $f_x(\lambda)$ exists and is equal to $\frac{\sigma^2}{2\pi} \left| \psi(e^{-i\lambda}) \right|^2$. The filtered innovations $\theta(L)\varepsilon_t$ have spectral density $\frac{\sigma^2}{2\pi} \left| \theta(e^{-i\lambda}) \right|^2$ by the previous theorem. The same argument implies that $\phi(L) X_t$ has spectral density $\left| \phi(e^{-i\lambda}) \right|^2 f_x(\lambda)$. Since the RHS and LHS of (3.4) have the same covariances they also have the same spectral density, such that $\left| \phi(e^{-i\lambda}) \right|^2 f_x(\lambda) = \frac{\sigma^2}{2\pi} \left| \theta(e^{-i\lambda}) \right|^2$.
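Theorem 3.3 can be sanity-checked numerically in the AR(1) special case ($p = 1$, $q = 0$), where $f_x(\lambda) = \frac{\sigma^2}{2\pi} |1 - \phi e^{-i\lambda}|^{-2}$ and property 3 of Theorem 3.2 requires $\int_{-\pi}^{\pi} f_x(\lambda)\, d\lambda = \gamma_{xx}(0) = \sigma^2/(1 - \phi^2)$. The sketch below (illustrative; the function names and grid size are my own choices) verifies this with a Riemann sum:

```python
import cmath
import math

def ar1_spectral_density(lam: float, phi: float, sigma2: float) -> float:
    """f_x(lambda) = (sigma^2 / (2*pi)) / |1 - phi * e^{-i*lambda}|^2 for an AR(1) process."""
    return sigma2 / (2 * math.pi) / abs(1 - phi * cmath.exp(-1j * lam)) ** 2

phi, sigma2, n = 0.5, 1.0, 20000
grid = [-math.pi + 2 * math.pi * k / n for k in range(n)]
integral = sum(ar1_spectral_density(lam, phi, sigma2) for lam in grid) * (2 * math.pi / n)

gamma0 = sigma2 / (1 - phi ** 2)       # AR(1) variance: gamma_xx(0) = sigma^2 / (1 - phi^2)
assert abs(integral - gamma0) < 1e-6   # property 3 of Theorem 3.2
```

Because the integrand is smooth and periodic, the equally spaced Riemann sum converges very quickly, so a crude grid already matches $\gamma_{xx}(0)$ to high precision.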


3.3. Linear filters

Sometimes we do not wish to analyze an original series $y_t$ but only a filtered version of it. Leading examples are found in the business cycle literature, where the business cycle is often defined as a deviation from a trend. Here we consider some of the properties of linear filters. Taking first differences of a time series, for example, can be viewed as a linear filter. Let $x_t$ be the original series and $y_t = \Delta x_t$ the filtered series. Then

$$y_t = \sum_{j} \psi_j x_{t-j},$$

with $\psi_0 = 1$, $\psi_1 = -1$ and $\psi_j = 0$ otherwise. More generally we can therefore look at a filter $\psi(L) = \sum_{j=-\infty}^{\infty} \psi_j L^j$, where we require $\sum_j |\psi_j| < \infty$. We have already seen that the spectral density of the filtered series $y_t$ is related to the spectral density of the original series by

$$f_y(\lambda) = \left| \psi(e^{-i\lambda}) \right|^2 f_x(\lambda)$$

where $\psi(e^{-i\lambda}) = \sum_{j=-\infty}^{\infty} \psi_j e^{-i\lambda j}$ is called the frequency response of the filter. The term $\left| \psi(e^{-i\lambda}) \right|^2$ is called the power transfer function. Intuitively this function determines how the (power) spectrum is changed by applying the filter. The factor by which the amplitude of a cyclical component is changed is measured by the modulus of the frequency response function, $\left| \psi(e^{-i\lambda}) \right|$. This term is called the gain of the filter. Another way in which a filter alters the properties of a time series is by shifting it. This is called the phase shift. Letting $\psi(e^{-i\lambda}) = \sum_j \psi_j \left[ \cos(\lambda j) - i \sin(\lambda j) \right]$ and defining

$$\theta(\lambda) = \arctan \frac{-\sum_j \psi_j \sin(\lambda j)}{\sum_j \psi_j \cos(\lambda j)}$$

then leads to

$$\psi(e^{-i\lambda}) = \left| \sum_j \psi_j e^{-i\lambda j} \right| e^{i\theta(\lambda)}.$$

Note that $\theta(\lambda)$ is the phase shift. A symmetric filter for which $\psi_j = \psi_{-j}$ does not have a phase shift, i.e. $\theta(\lambda) = 0$, because $\sum_{j=-\infty}^{\infty} \psi_j \sin(\lambda j) = 0$ from $\sin(-x) = -\sin(x)$. Consider a very simple filter which lags $x_t$ by one period, i.e., $y_t = x_{t-1}$. Then $\psi(e^{-i\lambda}) = e^{-i\lambda}$, such that the gain is $\left| e^{-i\lambda} \right| = 1$ and the phase shift is

$$\theta(\lambda) = \arctan \frac{-\sin \lambda}{\cos \lambda} = -\lambda,$$

so the phase is a measure of shift in the time domain.
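The one-period lag filter can be verified directly (an illustrative sketch, not part of the original note): its frequency response $e^{-i\lambda}$ has modulus 1 and argument $-\lambda$.

```python
import cmath
import math

lam = 0.7                              # an arbitrary frequency in (0, pi)
freq_response = cmath.exp(-1j * lam)   # psi(e^{-i*lambda}) for y_t = x_{t-1}

assert math.isclose(abs(freq_response), 1.0)           # gain 1: amplitudes are unchanged
assert math.isclose(cmath.phase(freq_response), -lam)  # phase -lambda: a one-period delay
```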

We now look at more realistic filters. The $k$-period differencing filter $y_t = x_t - x_{t-k}$ has frequency response $\psi(e^{-i\lambda}) = 1 - e^{-i\lambda k}$. The gain is therefore

$$\left| \psi(e^{-i\lambda}) \right| = \left( 2(1 - \cos(\lambda k)) \right)^{1/2}$$

and the phase is

$$\theta(\lambda) = \arctan \frac{\sin(\lambda k)}{1 - \cos(\lambda k)}.$$

The two-sided moving average filter is given by

$$y_t = \frac{1}{2m+1} \sum_{j=-m}^{m} x_{t-j}.$$

Then

$$\psi(e^{-i\lambda}) = \frac{1}{2m+1} \sum_{j=-m}^{m} e^{-i\lambda j} = \frac{\sin((2m+1)\lambda/2)}{(2m+1)\sin(\lambda/2)}.$$
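Both closed forms can be checked numerically (my own sketch; the particular values of $\lambda$, $k$ and $m$ are arbitrary): the differencing gain should equal $\sqrt{2(1-\cos \lambda k)}$, and the moving-average response should match the $\sin((2m+1)\lambda/2) / ((2m+1)\sin(\lambda/2))$ expression.

```python
import cmath
import math

lam, k, m = 0.5, 4, 3

# k-period differencing filter: gain |1 - e^{-i*lambda*k}| = sqrt(2*(1 - cos(lambda*k)))
gain_diff = abs(1 - cmath.exp(-1j * lam * k))
assert math.isclose(gain_diff, math.sqrt(2 * (1 - math.cos(lam * k))))

# two-sided moving average: (1/(2m+1)) * sum_{j=-m}^{m} e^{-i*lambda*j}
response = sum(cmath.exp(-1j * lam * j) for j in range(-m, m + 1)) / (2 * m + 1)
closed_form = math.sin((2 * m + 1) * lam / 2) / ((2 * m + 1) * math.sin(lam / 2))
assert math.isclose(response.real, closed_form) and abs(response.imag) < 1e-12
```

Note that the moving-average response is purely real, as it must be for a symmetric filter with no phase shift.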

A filter which is popular in the business cycle literature is the HP filter. It can be motivated in the time domain as the following smoothing problem:

$$\min_{\{y_t^g\}_{t=0}^{T+1}} \sum_{t=1}^{T} \left[ (y_t - y_t^g)^2 + \lambda \left( (y_{t+1}^g - y_t^g) - (y_t^g - y_{t-1}^g) \right)^2 \right].$$

If $\lambda = 0$ then $y_t^g = y_t$ and no smoothing occurs. If $\lambda > 0$ then $y_t^g$ is chosen to follow $y_t$ as closely as possible, measured by the squared tracking error, but is penalized for too strong changes in the growth rate. If we ignore details at the beginning and at the end of the sample, then the first-order condition takes the form

$$0 = -2(y_t - y_t^g) + 2\lambda \left[ (y_t^g - y_{t-1}^g) - (y_{t-1}^g - y_{t-2}^g) \right] - 4\lambda \left[ (y_{t+1}^g - y_t^g) - (y_t^g - y_{t-1}^g) \right] + 2\lambda \left[ (y_{t+2}^g - y_{t+1}^g) - (y_{t+1}^g - y_t^g) \right]$$

such that solving for $y_t$

$$y_t = \lambda y_{t+2}^g - 4\lambda y_{t+1}^g + (6\lambda + 1) y_t^g - 4\lambda y_{t-1}^g + \lambda y_{t-2}^g$$

and using lag operator notation

$$y_t = \left[ \lambda (1 - L)^2 (1 - L^{-1})^2 + 1 \right] y_t^g$$

so that the smoothed version of $y_t$ is given by

$$y_t^g = \frac{1}{1 + \lambda (1 - L)^2 (1 - L^{-1})^2}\, y_t.$$

$y_t^g$ is called the trend component, and $y_t^c = y_t - y_t^g$ is the cyclical component. It follows at once that

$$y_t^c = \frac{\lambda (1 - L)^2 (1 - L^{-1})^2}{1 + \lambda (1 - L)^2 (1 - L^{-1})^2}\, y_t.$$

The frequency response of the cyclical filter is then given by (writing $\omega$ for the frequency to distinguish it from the smoothing parameter $\lambda$)

$$\psi(e^{-i\omega}) = \frac{4\lambda (1 - \cos \omega)^2}{1 + 4\lambda (1 - \cos \omega)^2}.$$

The gain can now be analyzed. For $\omega = 0$ we have $\psi(1) = 0$. Moreover, if $\lambda$ is large then $\psi(e^{-i\omega}) \approx 1$ for $\omega$ not close to zero. This shows that the filter has certain optimality properties: it doesn't affect the short-run frequencies and removes the long-run frequencies.
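The high-pass behavior of the HP cyclical filter can be seen numerically (a sketch I am adding; $\lambda = 1600$ is the conventional quarterly smoothing value, but nothing here depends on that choice):

```python
import math

def hp_cyclical_response(omega: float, lam: float = 1600.0) -> float:
    """HP cyclical filter response: 4*lam*(1 - cos w)^2 / (1 + 4*lam*(1 - cos w)^2)."""
    x = 4 * lam * (1 - math.cos(omega)) ** 2
    return x / (1 + x)

assert hp_cyclical_response(0.0) == 0.0            # the zero-frequency (long-run) component is removed
assert hp_cyclical_response(math.pi / 2) > 0.999   # short-run frequencies pass essentially unchanged
```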

A. Interpretation of the Spectral Measure

In Lecture 2, we have considered linear processes of the form

$$X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$$

where $\{\varepsilon_t\}_{t=-\infty}^{\infty}$ is a sequence of white noise random variables. We will show later on that every weakly stationary process can be represented in this form. Here, we are concerned with an alternative representation of weakly stationary processes, called the spectral representation. Suppose first that $X_t$ is given by a complex valued process

$$X_t = \sum_{j=1}^{n} A_j e^{it\lambda_j} \tag{A.1}$$

where $-\pi < \lambda_1 < \ldots < \lambda_n = \pi$ and the $A_j$ are uncorrelated complex valued random variables such that $E(A_j) = 0$ and $E(A_j \bar{A}_j) = \sigma_j^2$ for all $j = 1, \ldots, n$. If $A_j = \bar{A}_{n-j}$ and $\lambda_j = -\lambda_{n-j}$ for all $j = 1, \ldots, n$ then $X_t$ is real valued. To see this put $A_j = Y_j - i Z_j$ with $E(Y_j) = E(Z_j) = 0$, $E(Y_j^2) = E(Z_j^2) = \sigma_j^2/2$ and $E(Y_i Y_j) = E(Z_i Z_j) = 0$ for all $i \neq j$ and $i \neq n - j$. Moreover, $E(Y_i Z_j) = 0$ for all $i$ and for all $j$. Using De Moivre's formula $e^{it\lambda_j} = \cos(t\lambda_j) + i \sin(t\lambda_j)$ leads to

$$X_t = \sum_{j=1}^{n} Y_j \cos(\lambda_j t) + \sum_{j=1}^{n} Z_j \sin(\lambda_j t)$$

where the complex terms cancel because $Z_j = -Z_{n-j}$ and $\sin(\lambda_j t) = -\sin(\lambda_{n-j} t)$. An alternative way of writing (A.1) is

$$X_t = \sum_{j=1}^{h} r_j \cos(\lambda_j t - \phi_j)$$

where $r_j = \sqrt{Y_j^2 + Z_j^2}$ is the $j$th amplitude and $\phi_j = \arctan(Z_j / Y_j)$ is the $j$th phase. It can be seen from this representation that $X_t$ is the sum of $h$ different cosine waves with random amplitude $r_j$ and random phase shift $\phi_j$. The autocovariance function of $X_t$ is now given by $E(X_t) = 0$ and

$$E(X_t X_{t+h}) = \sum_{j=1}^{n} \sigma_j^2 \left[ \cos(\lambda_j t) \cos(\lambda_j (t+h)) + \sin(\lambda_j t) \sin(\lambda_j (t+h)) \right] = \sum_{j=1}^{n} \sigma_j^2 \cos(\lambda_j h)$$

such that $E(X_t^2) = \sum_{j=1}^{n} \sigma_j^2$. If we consider the generating mechanism (A.1) we can now interpret $\sigma_j^2$ as the contribution of the frequency $\lambda_j$ to the overall variance. Note that for large $\lambda_j$, $\cos(\lambda_j t)$ and $\sin(\lambda_j t)$ complete their periodic oscillation frequently. Without loss of generality, we can order the frequencies $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_n < \pi$. We introduce the step function $F(\lambda)$, called the spectral distribution function, by defining $dF(\lambda) = 0$ if $\lambda \neq \lambda_j$ and $dF(\lambda_j) = \sigma_j^2$. The distribution function is then defined as the Stieltjes integral of $dF(\lambda)$. More explicitly this is written as

$$F(\lambda) = \begin{cases} 0 & \lambda < \lambda_1 \\ \sum_{j=1}^{k} \sigma_j^2 & \lambda_k \le \lambda < \lambda_{k+1},\quad k = 1, \ldots, n-1 \\ \sum_{j=1}^{n} \sigma_j^2 & \lambda \ge \lambda_n \end{cases}$$

so that we can write in terms of the Stieltjes integral

$$E(X_t^2) = \int_{-\pi}^{\pi} dF(\lambda)$$

and

$$E(X_t X_{t+h}) = \sum_{j=1}^{n} \sigma_j^2 \left( \cos(\lambda_j h) + i \sin(\lambda_j h) \right) = \sum_{j=1}^{n} \sigma_j^2 e^{i\lambda_j h} = \int_{-\pi}^{\pi} e^{i\lambda h}\, dF(\lambda)$$

where we have used De Moivre's theorem: $e^{i\lambda h} = \cos(\lambda h) + i \sin(\lambda h)$. It can be shown that every zero mean stationary process has a representation which generalizes (A.1), namely

$$X_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ(\lambda) \tag{A.2}$$

where $Z(\lambda)$ is an orthogonal increment process such that $\langle Z(\lambda), Z(\lambda) \rangle < \infty$, $\langle Z(\lambda), 1 \rangle = 0$ and, for $\lambda_1 < \lambda_2 \le \lambda_3 < \lambda_4$ such that $(\lambda_1, \lambda_2] \cap (\lambda_3, \lambda_4] = \emptyset$, $\langle Z(\lambda_2) - Z(\lambda_1), Z(\lambda_4) - Z(\lambda_3) \rangle = 0$. The distribution function $F(\lambda)$ is then defined by $F(\lambda) = 0$ for $\lambda \le -\pi$, $F(\lambda) = F(\pi)$ for $\lambda \ge \pi$ and $F(\lambda) - F(\mu) = \| Z(\lambda) - Z(\mu) \|^2$ for $-\pi \le \mu \le \lambda \le \pi$.
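The amplitude-phase identity used in this appendix, $Y \cos(\lambda t) + Z \sin(\lambda t) = r \cos(\lambda t - \phi)$ with $r = \sqrt{Y^2 + Z^2}$ and $\phi = \arctan(Z/Y)$, can be checked pointwise (an illustrative sketch with arbitrary numerical values):

```python
import math

Y, Z, lam = 1.2, 0.5, 0.8
r = math.hypot(Y, Z)      # amplitude r = sqrt(Y^2 + Z^2)
phi = math.atan2(Z, Y)    # phase (atan2 also handles Y <= 0 correctly)

for t in range(10):
    lhs = Y * math.cos(lam * t) + Z * math.sin(lam * t)
    rhs = r * math.cos(lam * t - phi)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
```

The identity follows from $r \cos(\lambda t - \phi) = r \cos\phi \cos(\lambda t) + r \sin\phi \sin(\lambda t)$ together with $r \cos\phi = Y$ and $r \sin\phi = Z$.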

B. Proof of Existence of a Spectral Measure (strictly optional)

In this Appendix we prove that the covariance function of a zero mean stationary process can be represented by a spectral distribution function. Before the proof we need to introduce a few concepts.

Definition B.1. A sequence of probability measures $\{P_n\}$ converges weakly to the probability measure $P$ if

$$\int f(x)\, dP_n(x) \to \int f(x)\, dP(x)$$

for all bounded and continuous functions $f$.

Definition B.2. A family $\mathcal{P}$ of probability measures is relatively compact if every sequence of probability measures from $\mathcal{P}$ contains a subsequence which converges weakly to a probability measure.

Remark 1. The subsequence need not necessarily converge to a member of the original class. This is why the convergence is called relative.

Definition B.3. A family of probability measures $\mathcal{P}$ is tight if for every $\varepsilon > 0$ there is a compact set $K \subset \mathbb{R}$ such that

$$\sup_{P \in \mathcal{P}} P(\mathbb{R} \setminus K) < \varepsilon.$$

Theorem B.4 (Prokhorov). Let $\mathcal{P}$ be a family of probability measures defined on $\mathbb{R}$. Then $\mathcal{P}$ is relatively compact if and only if it is tight.

We now prove (3.1).

Theorem B.5 (Herglotz). Let $\gamma_{xx}(h)$ be the covariance function of a stationary random sequence $\{X_t\}_{t=-\infty}^{\infty}$ with zero mean. Then there is a finite measure $F$ on $[-\pi, \pi]$ such that

$$\gamma_{xx}(h) = \int_{-\pi}^{\pi} e^{i\lambda h}\, dF(\lambda).$$

Proof. For $N \ge 1$ and $\lambda \in [-\pi, \pi]$ put

$$f_N(\lambda) = \frac{1}{2\pi N} \sum_{k=1}^{N} \sum_{l=1}^{N} \gamma_{xx}(k - l) e^{-i\lambda k} e^{i\lambda l}.$$

It follows that $f_N(\lambda) \ge 0$ since $\gamma_{xx}(k - l)$ is non-negative definite. Note that $f_N(\lambda)$ does not necessarily converge as $N \to \infty$. We can write

$$f_N(\lambda) = \frac{1}{2\pi} \sum_{|m| < N} \left(1 - \frac{|m|}{N}\right) \gamma_{xx}(m) e^{-i\lambda m}$$

and let

$$F_N(\lambda_1, \lambda_2) = \int_{\lambda_1}^{\lambda_2} f_N(\lambda)\, d\lambda \quad \text{for } -\pi \le \lambda_1 \le \lambda_2 \le \pi.$$

The function $F_N(-\pi, \lambda_2)$ is nondecreasing, continuous from the right and has a limit from the left. Also $F_N(-\pi, -\pi) = 0$ and $F_N(-\pi, \pi) = \gamma_{xx}(0)$. Then

$$\int_{-\pi}^{\pi} e^{i\lambda h}\, dF_N(\lambda) = \int_{-\pi}^{\pi} e^{i\lambda h} f_N(\lambda)\, d\lambda = \begin{cases} \left(1 - \frac{|h|}{N}\right) \gamma_{xx}(h) & \text{for } |h| < N \\ 0 & \text{for } |h| \ge N. \end{cases}$$

The measures $F_N$ for $N \ge 1$ are supported on $[-\pi, \pi]$ and $F_N(-\pi, \pi) = \gamma_{xx}(0)$ for all $N \ge 1$. Therefore the family of measures $\{F_N\}$ is tight and, by Prokhorov's theorem, relatively compact. Therefore there exists a subsequence $\{N_k\} \subset \{N\}$ such that $F_{N_k} \xrightarrow{w} F$, where $\xrightarrow{w}$ denotes weak convergence. It then follows that

$$\int_{-\pi}^{\pi} e^{i\lambda h}\, dF(\lambda) = \lim_{k \to \infty} \int_{-\pi}^{\pi} e^{i\lambda h}\, dF_{N_k}(\lambda) = \gamma_{xx}(h).$$
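The Cesaro (Fejer-type) smoothing $f_N$ in this proof can be illustrated for an AR(1) autocovariance $\gamma_{xx}(m) = \phi^{|m|}\sigma^2/(1 - \phi^2)$ (a numerical sketch of my own; here $\sum |\gamma_{xx}(h)| < \infty$, so $f_N$ in fact also converges to the AR(1) spectral density):

```python
import math

phi, sigma2 = 0.7, 1.0

def gamma(m: int) -> float:
    """AR(1) autocovariance: gamma_xx(m) = sigma^2 * phi^|m| / (1 - phi^2)."""
    return sigma2 * phi ** abs(m) / (1 - phi ** 2)

def f_N(lam: float, N: int) -> float:
    """Cesaro-smoothed Fourier approximation of the spectral density, as in the proof."""
    total = sum((1 - abs(m) / N) * gamma(m) * math.cos(m * lam) for m in range(-N + 1, N))
    return total / (2 * math.pi)

# f_N is non-negative for every N and every frequency, as the proof requires.
for lam in [0.0, 0.5, 1.0, 2.0, 3.0]:
    assert f_N(lam, 50) >= 0.0

# With absolutely summable autocovariances, f_N also approaches the true spectral density.
lam = 1.0
true_f = sigma2 / (2 * math.pi * ((1 - phi * math.cos(lam)) ** 2 + (phi * math.sin(lam)) ** 2))
assert abs(f_N(lam, 500) - true_f) < 0.05
```

The triangular $(1 - |m|/N)$ weights introduce a bias of order $1/N$, which is why a fairly large $N$ is needed for the second check even though the autocovariances decay geometrically.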


Remark 2. The complications in this theorem come from the fact that we only assume stationarity of the process. If we impose in addition that $\sum |\gamma_{xx}(h)| < \infty$ then $f_N(\lambda)$ converges as $N \to \infty$. Since $|f_N(\lambda)| \le \frac{1}{2\pi} \sum |\gamma_{xx}(h)| < \infty$ it follows that $f \in L^1[-\pi, \pi]$ by the Dominated Convergence Theorem.

The next two results are concerned with the form of convergence of the Fourier approximation to the spectral density. To show that the Fourier approximation converges almost everywhere to the spectral density when $f \in L^2[-\pi, \pi]$ we first prove the following proposition.

Proposition B.6. If $f \in L^2[-\pi, \pi]$ and $\langle f, e^{i\lambda j} \rangle = 0$ for all $j \in \mathbb{Z}$, then $f = 0$ almost everywhere.

Proof. By a monotone class argument it is enough to show that $\int_a^b f(\lambda)\, d\lambda = 0$ for all sub-intervals of $[-\pi, \pi]$. Let $I(a, b)$ be the indicator function of the interval $(a, b)$. Clearly $I(a, b) \in L^2[-\pi, \pi]$. Since the finite linear combinations of $e^{i\lambda j}$ are dense in $L^2[-\pi, \pi]$, there is a $g \in \operatorname{sp}\{e^{i\lambda j_i};\ j_1, \ldots, j_n\}$ such that $\|g - I(a, b)\| < \varepsilon$ for any $\varepsilon > 0$. Then

$$\left| \int_a^b f(\lambda)\, d\lambda \right| = \left| \langle f, I(a, b) \rangle \right| \le \left| \langle f, g \rangle \right| + \|f\|\, \|g - I(a, b)\| \le 0 + \varepsilon \|f\|.$$

The result now follows since $\varepsilon > 0$ was arbitrary.

Theorem B.7. If the spectral density $f_x(\lambda) \in L^2[-\pi, \pi]$ then

$$f_x(\lambda) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_{xx}(j) e^{-i\lambda j} \quad \text{almost everywhere.}$$

Proof. The space $L^2[-\pi, \pi]$ is a Hilbert space with inner product $\langle f, g \rangle = \int_{-\pi}^{\pi} f \bar{g}\, d\lambda$ for all $f, g \in L^2[-\pi, \pi]$. The functions $\{e^{i\lambda k};\ k \in \mathbb{Z}\}$ form an orthogonal basis of $L^2[-\pi, \pi]$. This can be seen by checking that

$$\langle e^{i\lambda k}, e^{i\lambda j} \rangle = \int_{-\pi}^{\pi} e^{i\lambda(k-j)}\, d\lambda = \begin{cases} 2\pi & \text{if } k = j \\ 0 & \text{if } k \neq j \end{cases}$$

and using the Stone-Weierstrass Theorem to show that the set of finite linear combinations of $\{e^{i\lambda k};\ k \in \mathbb{Z}\}$ is dense in $L^2[-\pi, \pi]$.

By Bessel's inequality we have $\sum_{|j| \le n} \left| \langle f_x(\lambda), e^{i\lambda j} \rangle \right|^2 \le \|f_x\|^2$. Thus, for $m < n$,

$$\left\| \sum_{h=-n}^{n} \gamma_{xx}(h) e^{-i\lambda h} - \sum_{h=-m}^{m} \gamma_{xx}(h) e^{-i\lambda h} \right\|^2 = 2\pi \sum_{m < |j| \le n} \left| \langle f_x(\lambda), e^{i\lambda j} \rangle \right|^2 \to 0.$$

This shows that $\sum_{h=-n}^{n} \gamma_{xx}(h) e^{-i\lambda h}$ is a Cauchy sequence and therefore has a mean square limit that we will denote by $\sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h}$. It therefore remains to show that $\frac{1}{2\pi}$ times this mean square limit coincides with $f_x(\lambda)$. Note that $\langle f_x(\lambda), e^{i\lambda j} \rangle = \gamma_{xx}(j)$ for all $j$, so

$$\left\langle f_x(\lambda) - \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h},\ e^{i\lambda j} \right\rangle = 0 \quad \text{for all } j.$$

Using the previous proposition therefore shows that $f_x(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \gamma_{xx}(h) e^{-i\lambda h}$ almost everywhere.
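Theorem B.7 can be illustrated in the AR(1) case, where the autocovariances are absolutely summable and the symmetric partial sums converge quickly (an illustrative sketch, not part of the note):

```python
import math

phi, sigma2 = 0.6, 1.0

def gamma(h: int) -> float:
    """AR(1) autocovariance: gamma_xx(h) = sigma^2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(h) / (1 - phi ** 2)

def fourier_partial_sum(lam: float, n: int) -> float:
    """(1/(2*pi)) * sum_{|h|<=n} gamma(h) e^{-i*lam*h}, using cosines since gamma is even."""
    total = gamma(0) + 2 * sum(gamma(h) * math.cos(h * lam) for h in range(1, n + 1))
    return total / (2 * math.pi)

lam = 0.8
true_f = sigma2 / (2 * math.pi * ((1 - phi * math.cos(lam)) ** 2 + (phi * math.sin(lam)) ** 2))
assert abs(fourier_partial_sum(lam, 60) - true_f) < 1e-8   # geometric tail, so rapid convergence
```

Because $\gamma_{xx}(h)$ decays geometrically, the truncation error after $n$ terms is of order $\phi^n$, so even moderate $n$ matches the closed-form spectral density to near machine precision.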