
# Time Series Analysis and Its Applications

Edition 2
Instructors Manual

## © 2006, R.H. Shumway and D.S. Stoffer. Please Do Not Reproduce.



Chapter 1

1.1 The major differences are how quickly the signal dies out in the explosion versus the earthquake and
the larger amplitude of the signals in the explosion.
x = matrix(scan("/mydata/eq5exp6.dat"), ncol=2)
plot.ts(x[,1], col="blue", main="EQ-blue EXP-red", ylab="")
lines(x[,2], col="red")
1.2 Consider a signal plus noise model of the general form $x_t = s_t + w_t$, where $w_t$ is Gaussian white noise
with $\sigma_w^2 = 1$. Simulate and plot $n = 200$ observations from each of the following two models.
(a) Below is R code for this problem. Figure 1 shows contrived data simulated according to this
model. The modulating functions are also plotted.
w = rnorm(200,0,1)
t = 1:100
y = cos(2*pi*t/4)
e = 10*exp(-t/20)
s = c(rep(0,100), y*e )
x = s+w
par(mfrow=c(2,1))
ts.plot(s, main="signal")
ts.plot(x, main="signal+noise")
(b) This is similar to part (a). The plots according to the model in this part are also shown and we
note that the second modulating function has less decay and produces a longer signal.
(c) The first signal bears a striking resemblance to the two arrival phases in the explosion. The second
signal decays more slowly and looks more like the earthquake. The periodic behavior is emulated
by the cosine function which will make one cycle every four points. If we assume that the data
are sampled at 40 points per second, the data will make 10 cycles in a second. This is a bit high
for earthquakes and explosions, which will generally make about 1 cycle per second (see Figure
3.10).
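The code for part (b) is not listed above. A minimal sketch, assuming part (b) differs from part (a) only in the decay constant of the modulator (200 instead of 20, which is an assumption here and not taken from the listing above), could look like this; the seed is also only an assumption for reproducibility.

```r
# Sketch for part (b); the decay constant 200 and the seed are assumptions
set.seed(1)
w = rnorm(200, 0, 1)             # Gaussian white noise with variance 1
t = 1:100
y = cos(2*pi*t/4)                # same period-4 carrier as in part (a)
e = 10*exp(-t/200)               # assumed slower-decaying modulator
s = c(rep(0, 100), y*e)          # signal is zero for the first 100 points
x = s + w
par(mfrow = c(2, 1))
ts.plot(s, main = "signal (b)")
ts.plot(x, main = "signal+noise (b)")
```

With the slower decay the modulator is still above 6 at the end of the record, so the signal stays visible through the noise much longer than in part (a).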
1.3 Below is R code for parts (a)-(c). In all cases the moving average nearly annihilates (completely in the
2nd case) the signal. The signals in part (a) and (c) are similar.
w = rnorm(150,0,1) # 50 extra to avoid startup problems
x = filter(w, filter=c(0,-.9), method="recursive")
x = x[51:150]
x2 = 2*cos(2*pi*(1:100)/4)
x3 = x2 + rnorm(100,0,1)
v = filter(x, rep(1,4)/4) # moving average
v2 = filter(x2, rep(1,4)/4) # moving average
v3 = filter(x3, rep(1,4)/4) # moving average
par(mfrow=c(3,1))
plot.ts(x)
lines(v,lty="dashed")
plot.ts(x2)
lines(v2,lty="dashed")
plot.ts(x3)
lines(v3,lty="dashed")
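The claim that the four-point moving average annihilates the second signal completely can be checked numerically. The sketch below reuses `x2` from the listing above; `max_abs` is a name introduced here for illustration.

```r
# Check that a 4-point moving average removes a period-4 cosine
x2 = 2*cos(2*pi*(1:100)/4)
v2 = stats::filter(x2, rep(1, 4)/4)    # same moving average as above
max_abs = max(abs(v2), na.rm = TRUE)   # end points are NA and are skipped
print(max_abs)                         # zero up to floating-point error
```

Any four consecutive values of a period-4 cosine sum to zero, which is why the filtered series vanishes.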

[Figure 1: panels "Series (a)", "Modulator (a)", "Series (b)", "Modulator (b)"; time axis 1 to 200.]

[Figure 2: panels "Series (a)" and "Series (b)"; time axis 1 to 200.]

## Figure 2: Simulated series with autoregressive modulations for Problem 1.2.

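1.4 Simply expand the binomial product inside the expectation and use the fact that $\mu_t$ is a nonrandom
constant, i.e.,
$$\begin{aligned}
\gamma(s,t) &= E[(x_s-\mu_s)(x_t-\mu_t)] = E[x_s x_t - \mu_s x_t - x_s\mu_t + \mu_s\mu_t] \\
&= E(x_s x_t) - \mu_s E(x_t) - E(x_s)\mu_t + \mu_s\mu_t \\
&= E(x_s x_t) - \mu_s\mu_t - \mu_s\mu_t + \mu_s\mu_t \\
&= E(x_s x_t) - \mu_s\mu_t .
\end{aligned}$$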

1.5 For (a) and (b), $E x_t = s_t$. To get Figure 3, just plot the signal (s) in Problem 1.2. Note that the
autocovariance function is
$$\gamma(t,u) = E[(x_t - s_t)(x_u - s_u)] = E(w_t w_u),$$
which is one when $t = u$ and zero otherwise.

[Figure 3: panel "Mean Series (a)" and a second mean-function panel; time axis 1 to 200.]

## Figure 3: Mean functions for Problem 1.4.

1.6 (a) Since $E x_t = \beta_1 + \beta_2 t$, the mean is not constant, i.e., does not satisfy (1.17). Note that
$$\begin{aligned}
x_t - x_{t-1} &= \beta_1 + \beta_2 t + w_t - \beta_1 - \beta_2 (t-1) - w_{t-1} \\
&= \beta_2 + w_t - w_{t-1},
\end{aligned}$$
which is clearly stationary. Verify that the mean is $\beta_2$ and the autocovariance is $2\sigma_w^2$ for $s = t$,
$-\sigma_w^2$ for $|s-t| = 1$, and zero for $|s-t| > 1$.
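(b) First, write
$$\begin{aligned}
E(y_t) &= \frac{1}{2q+1}\sum_{j=-q}^{q}\left[\beta_1 + \beta_2 (t-j)\right] \\
&= \frac{1}{2q+1}\left[(2q+1)(\beta_1 + \beta_2 t) - \beta_2\sum_{j=-q}^{q} j\right] \\
&= \beta_1 + \beta_2 t
\end{aligned}$$
because the positive and negative terms in the last sum cancel out. To get the covariance, write
the noise part of the process as
$$y_t - E y_t = \frac{1}{2q+1}\sum_{j=-\infty}^{\infty} a_j w_{t-j},$$
where $a_j = 1$, $j = -q, \ldots, 0, \ldots, q$ and is zero otherwise. Then
$$\begin{aligned}
\gamma_y(h) &= E[(y_{t+h} - Ey_{t+h})(y_t - Ey_t)] \\
&= (2q+1)^{-2}\sum_{j,k} a_j a_k E(w_{t+h-j} w_{t-k}) \\
&= \frac{\sigma_w^2}{(2q+1)^2}\sum_{j,k} a_j a_k \delta_{h+k-j} \\
&= \frac{\sigma_w^2}{(2q+1)^2}\sum_{j=-\infty}^{\infty} a_{j+h} a_j ,
\end{aligned}$$
where $\delta_{h+k-j} = 1$ for $j = k + h$ and is zero otherwise. Writing out the terms in $\gamma_y(h)$, for
$h = 0, \pm 1, \pm 2, \ldots$, we obtain
$$\gamma_y(h) = \frac{\sigma_w^2\,(2q+1-|h|)}{(2q+1)^2}$$
for $h = 0, \pm 1, \pm 2, \ldots, \pm 2q$ and zero for $|h| > 2q$.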
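The last sum in 1.6(b) is easy to evaluate by brute force, which also shows where the autocovariance of the moving average becomes zero. The sketch below assumes $\sigma_w^2 = 1$ and $q = 2$; both values and the names `gam` and `closed` are chosen only for illustration.

```r
# Brute-force evaluation of sum_j a_{j+h} a_j / (2q+1)^2 with sigma_w^2 = 1 (assumed)
q = 2                                   # illustrative choice of q
n = 2*q + 1
a = rep(1, n)                           # a_j = 1 for j = -q, ..., q
gam = sapply(0:(2*q + 1), function(h) {
  if (h >= n) return(0)                 # no overlap left between the two windows
  sum(a[(1 + h):n] * a[1:(n - h)]) / n^2
})
closed = pmax(n - 0:(2*q + 1), 0) / n^2 # (2q + 1 - |h|)/(2q + 1)^2, truncated at zero
print(rbind(h = 0:(2*q + 1), gam, closed))
```

The overlap count is still positive for lags between q + 1 and 2q and only vanishes beyond lag 2q.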
1.7 By a computation analogous to that appearing in Example 1.17, we may obtain
$$\gamma(h) = \begin{cases}
6\sigma_w^2 & h = 0 \\
4\sigma_w^2 & h = \pm 1 \\
\sigma_w^2 & h = \pm 2 \\
0 & |h| > 2.
\end{cases}$$
The autocorrelation is obtained by dividing the autocovariances by $\gamma(0) = 6\sigma_w^2$.
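The values 6, 4, 1 are what one gets from a three-point filter with weights 1, 2, 1 applied to white noise. The sketch below assumes those weights (they are consistent with the autocovariances above but are not stated in this solution) and takes $\sigma_w^2 = 1$.

```r
# Autocovariance of a linear filter: gamma(h) = sigma_w^2 * sum_j psi_{j+h} psi_j
psi = c(1, 2, 1)                         # assumed filter weights
gam = sapply(0:3, function(h) {
  if (h < 3) sum(psi[(1 + h):3] * psi[1:(3 - h)]) else 0
})
rho = gam / gam[1]                       # autocorrelation: divide by gamma(0)
print(gam)
print(round(rho, 3))
```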
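1.8 (a) Simply substitute $\delta s + \sum_{k=1}^{s} w_k$ for $x_s$ to see that
$$\delta t + \sum_{k=1}^{t} w_k = \delta + \delta(t-1) + \sum_{k=1}^{t-1} w_k + w_t .$$
Alternately, the result can be shown by induction.

(b) Note first that
$$E x_t = E\left\{\delta t + \sum_{k=1}^{t} w_k\right\} = \delta t + \sum_{k=1}^{t} E w_k = \delta t .$$
Then, for $s \le t$,
$$\begin{aligned}
\gamma(s,t) = \mathrm{cov}(x_s, x_t) &= E\{(x_s - \delta s)(x_t - \delta t)\} \\
&= E\left\{\sum_{j=1}^{s} w_j \sum_{k=1}^{t} w_k\right\} \\
&= E\left[(w_1 + \cdots + w_s)(w_1 + \cdots + w_s + w_{s+1} + \cdots + w_t)\right] \\
&= \sum_{j=1}^{s} E(w_j^2) = s\,\sigma_w^2 .
\end{aligned}$$
(c) From (b),
$$\rho_x(t-1, t) = \frac{(t-1)\sigma_w^2}{\sqrt{(t-1)\sigma_w^2\; t\sigma_w^2}} = \sqrt{\frac{t-1}{t}},$$
which yields the result. The implication is that the series tends to change slowly.
(d) The series is nonstationary because both the mean function and the autocovariance function
depend on time, t.
(e) One possibility is to note that $\nabla x_t = x_t - x_{t-1} = \delta + w_t$, which is stationary.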
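1.9 Note that $E(U_1) = E(U_2) = 0$, implying $E x_t = 0$. Then,
$$\begin{aligned}
\gamma(h) &= E(x_{t+h} x_t) \\
&= E\Big\{\big(U_1 \sin[2\pi\omega_0 (t+h)] + U_2 \cos[2\pi\omega_0 (t+h)]\big)\big(U_1 \sin[2\pi\omega_0 t] + U_2 \cos[2\pi\omega_0 t]\big)\Big\} \\
&= \sigma_w^2\big(\sin[2\pi\omega_0 (t+h)] \sin[2\pi\omega_0 t] + \cos[2\pi\omega_0 (t+h)] \cos[2\pi\omega_0 t]\big) \\
&= \sigma_w^2 \cos[2\pi\omega_0 (t+h) - 2\pi\omega_0 t] \\
&= \sigma_w^2 \cos[2\pi\omega_0 h]
\end{aligned}$$
by the standard trigonometric identity, $\cos(A - B) = \sin A \sin B + \cos A \cos B$.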

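1.10 (a) Note first that
$$MSE(A) = E(x_{t+\ell}^2) - 2A\,E(x_{t+\ell} x_t) + A^2 E(x_t^2) = \gamma(0) - 2A\gamma(\ell) + A^2\gamma(0).$$
Setting the derivative with respect to $A$ to zero yields
$$-2\gamma(\ell) + 2A\gamma(0) = 0$$
and solving gives the required value, $A = \gamma(\ell)/\gamma(0) = \rho(\ell)$.

(b)
$$\begin{aligned}
MSE(A) &= \gamma(0)\left[1 - \frac{2\rho(\ell)\gamma(\ell)}{\gamma(0)} + \rho^2(\ell)\right] \\
&= \gamma(0)\left[1 - 2\rho^2(\ell) + \rho^2(\ell)\right] \\
&= \gamma(0)\left[1 - \rho^2(\ell)\right].
\end{aligned}$$
(c) If $x_{t+\ell} = A x_t$ with probability one, then
$$E(x_{t+\ell} - A x_t)^2 = \gamma(0)\left[1 - \rho^2(\ell)\right] = 0,$$
implying that $\rho(\ell) = \pm 1$. Since $A = \rho(\ell)$, the conclusion follows.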


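1.11 (a) Since $x_t = \sum_{j=-\infty}^{\infty} \psi_j w_{t-j}$,
$$\gamma(h) = E\left[\sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} \psi_j w_{t+h-j} w_{t-k} \psi_k\right]
= \sigma_w^2 \sum_{j,k} \psi_j \psi_k \delta_{h-j+k}
= \sigma_w^2 \sum_{k=-\infty}^{\infty} \psi_{k+h}\psi_k ,$$
where $\delta_t = 1$ for $t = 0$ and is zero otherwise.

(b) Consider the approximation
$$x_t^n = \sum_{j=-n}^{n} \psi_j w_{t-j} .$$
Then
$$x_t - x_t^n = \sum_{|j|>n} \psi_j w_{t-j},$$
so that
$$\begin{aligned}
E[(x_t - x_t^n)^2] &= \sum_{|j|>n}\sum_{|k|>n} \psi_j \psi_k E(w_{t-j} w_{t-k}) \\
&\le \sum_{|j|>n}\sum_{|k|>n} |\psi_j||\psi_k|\, E^{1/2}[(w_{t-j})^2]\, E^{1/2}[(w_{t-k})^2] \\
&= \sigma_w^2 \sum_{|j|>n}|\psi_j| \sum_{|k|>n}|\psi_k| \\
&= \sigma_w^2 \Big(\sum_{|j|>n}|\psi_j|\Big)^2 ,
\end{aligned}$$
which converges to zero as $n \to \infty$. Actually, in the white noise case, $\sum_j |\psi_j|^2 < \infty$ would be
enough, as can be seen by following through the same argument as above.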
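1.12
$$\gamma_{xy}(h) = E[(x_{t+h} - \mu_x)(y_t - \mu_y)] = E[(y_t - \mu_y)(x_{t+h} - \mu_x)] = \gamma_{yx}(-h)$$
1.13 (a)
$$\gamma_y(h) = \begin{cases}
\sigma_w^2(1 + \theta^2) + \sigma_u^2 & h = 0 \\
-\theta\sigma_w^2 & h = \pm 1 \\
0 & |h| > 1.
\end{cases}$$
(b)
$$\gamma_{xy}(h) = \begin{cases}
\sigma_w^2 & h = 0 \\
-\theta\sigma_w^2 & h = -1 \\
0 & \text{otherwise,}
\end{cases}
\qquad
\gamma_x(h) = \begin{cases}
\sigma_w^2 & h = 0 \\
0 & \text{otherwise,}
\end{cases}$$
$$\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}}$$
for $h = 0, -1$ and is zero otherwise.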

(c) The processes are jointly stationary because the autocovariance and cross-covariance functions
depend only on lag h.
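1.14 (a) For the mean, write
$$E(y_t) = E(\exp\{x_t\}) = \exp\left\{\mu_x + \tfrac{1}{2}\gamma(0)\right\},$$
using the given equation at $\lambda = 1$.

(b) For the autocovariance function, note that
$$\begin{aligned}
E(y_{t+h} y_t) &= E\left[\exp\{x_{t+h}\}\exp\{x_t\}\right] \\
&= E\left[\exp\{x_{t+h} + x_t\}\right] \\
&= \exp\{2\mu_x + \gamma(0) + \gamma(h)\},
\end{aligned}$$
since $x_t + x_{t+h}$ is the sum of two correlated normal random variables and will be normally
distributed with mean $2\mu_x$ and variance
$$\gamma(0) + \gamma(0) + 2\gamma(h) = 2\left[\gamma(0) + \gamma(h)\right].$$
For the autocovariance of $y_t$,
$$\begin{aligned}
\gamma_y(h) &= E(y_{t+h} y_t) - E(y_{t+h})E(y_t) \\
&= \exp\{2\mu_x + \gamma(0) + \gamma(h)\} - \left[\exp\left\{\mu_x + \tfrac{1}{2}\gamma(0)\right\}\right]^2 \\
&= \exp\{2\mu_x + \gamma(0)\}\left[\exp\{\gamma(h)\} - 1\right].
\end{aligned}$$
1.15 The process is stationary because
$$E(x_t) = E(w_t w_{t-1}) = E(w_t)E(w_{t-1}) = 0,$$
$$\gamma(0) = E(w_t w_{t-1} w_t w_{t-1}) = E(w_t^2)E(w_{t-1}^2) = \sigma_w^2\sigma_w^2 = \sigma_w^4,$$
$$\gamma(1) = E(w_{t+1} w_t w_t w_{t-1}) = E(w_{t+1})E(w_t^2)E(w_{t-1}) = 0 = \gamma(-1),$$
and similar computations establish that $\gamma(h) = 0$, $|h| \ge 1$. The series is white noise.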

1.16 (a)
$$\begin{aligned}
E(x_t) &= \int_0^1 \sin(2\pi u t)\,du \\
&= -\frac{1}{2\pi t}\cos(2\pi u t)\Big|_0^1 \\
&= -\frac{1}{2\pi t}\left[\cos(2\pi t) - 1\right] \\
&= 0,
\end{aligned}$$
for $t = 1, 2, \ldots$. For the autocovariance,
$$\gamma(h) = \int_0^1 \sin[2\pi u (t+h)]\sin[2\pi u t]\,du ,$$
and using the identity
$$\sin(\alpha)\sin(\beta) = \tfrac{1}{2}\left[\cos(\alpha - \beta) - \cos(\alpha + \beta)\right]$$
gives $\gamma(0) = 1/2$ and $\gamma(h) = 0$, $h \ne 0$.

(b) This part of the problem is harder and it might be a good idea to omit it in more elementary
presentations. Note that nonstationarity holds at the following points:
$$P\{x_1 \le 1/2,\, x_2 \le 1/2\} = \frac{1}{2} \ne P\{x_2 \le 1/2,\, x_3 \le 1/2\} = \frac{4}{9}$$
$$P\{x_1 \le 0,\, x_3 \le 0\} = \frac{1}{3} \ne P\{x_2 \le 0,\, x_4 \le 0\} = \frac{1}{4}$$
$$P\{x_1 > 0,\, x_2 > 0,\, x_3 > 0\} = \frac{1}{6} \ne P\{x_2 > 0,\, x_3 > 0,\, x_4 > 0\} = \frac{1}{8}$$
Figure 4 shows a plot of $x_1, x_2, x_3$ over the interval $0 \le u \le 1$; the probabilities are the Lebesgue
measure of the inverse images satisfying the joint probabilities. To compute the first pair of probabilities,
one only needs to define the intervals where both curves lie below .5.

[Figure 4: two panels showing $x_1$, $x_2$ and $x_2$, $x_3$ plotted against $u$ on the interval 0 to 1.]

## Figure 4: x1 , x2 , x3 for first nonstationarity points in Problem 1.16(b)

1.17 (a) Apart from the factor $i$, the exponent of the characteristic function is
$$\sum_{j=1}^{n}\lambda_j x_j = \sum_{j=1}^{n}\lambda_j (w_j - \theta w_{j-1})
= -\lambda_1\theta w_0 + \sum_{j=1}^{n-1}(\lambda_j - \theta\lambda_{j+1})w_j + \lambda_n w_n .$$
Because the $w_j$s are independent and identically distributed, the characteristic function can be
written as
$$\phi(\lambda_1, \ldots, \lambda_n) = \phi_w(-\lambda_1\theta)\prod_{j=1}^{n-1}\phi_w(\lambda_j - \theta\lambda_{j+1})\,\phi_w(\lambda_n).$$
(b) Because the joint distribution of the $w_j$ will not change simply by shifting $x_1, \ldots, x_n$ to $x_{1+h}, \ldots, x_{n+h}$,
the characteristic function remains the same.
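1.18 Letting $k = j + h$, holding $j$ fixed after substituting from (1.31) yields
$$\begin{aligned}
\sum_{h=-\infty}^{\infty}|\gamma(h)| &= \sigma_w^2\sum_{h=-\infty}^{\infty}\Big|\sum_{j=-\infty}^{\infty}\psi_{j+h}\psi_j\Big| \\
&\le \sigma_w^2\sum_{h=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}|\psi_{j+h}||\psi_j| \\
&= \sigma_w^2\sum_{k=-\infty}^{\infty}|\psi_k|\sum_{j=-\infty}^{\infty}|\psi_j| \\
&< \infty .
\end{aligned}$$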

1.19 Code for parts (a) and (b) is below. Students should have about 1 in 20 acf values outside the bounds,
but the values for part (b) will be larger in general than for part (a).
wa=rnorm(500,0,1)
wb=rnorm(50,0,1)
par(mfrow=c(2,1))
acf(wa,20)
acf(wb,20)
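The difference between the two sample sizes comes from the width of the white-noise bounds, roughly $\pm 2/\sqrt{n}$. The sketch below counts how many of the first 20 sample ACF values fall outside those approximate bounds; the seed and the names `sizes` and `bounds` are assumptions made for illustration, and the counts will change from run to run.

```r
# Approximate 95% white-noise bounds and the number of sample ACF values outside them
set.seed(1)                               # assumed seed, only for reproducibility
sizes  = c(500, 50)                       # sample sizes from parts (a) and (b)
bounds = 2/sqrt(sizes)                    # approximate limits, +/- 2/sqrt(n)
for (i in seq_along(sizes)) {
  w = rnorm(sizes[i], 0, 1)
  r = acf(w, 20, plot = FALSE)$acf[-1]    # sample ACF at lags 1..20 (lag 0 dropped)
  cat("n =", sizes[i], " bound =", round(bounds[i], 3),
      " values outside:", sum(abs(r) > bounds[i]), "\n")
}
```

With n = 50 the bounds are about three times wider than with n = 500, which is why the sample ACF values in part (b) are larger in general.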
1.20 This is similar to the previous problem. Generate 2 extra observations due to loss of the end points in
making the MA.
wa=rnorm(502,0,1)
wb=rnorm(52,0,1)
va=filter(wa, sides=2, rep(1,3)/3)
vb=filter(wb, sides=2, rep(1,3)/3)
par(mfrow=c(2,1))
acf(va,20, na.action = na.pass)
acf(vb,20, na.action = na.pass)

1.21 Generate the data as in Problem 1.2 and then type acf(x, 25). The sample ACF will exhibit
significant correlations at one cycle every four lags, which is the same frequency as the signal. (The
process is not stationary because the mean function is the signal, which depends on time t.)
1.22 The sample ACF should look sinusoidal, making one cycle every 50 lags.
x = 2*cos(2*pi*(1:500)/50 + .6*pi)+ rnorm(500,0,1)
acf(x,100)

1.23 $\gamma_y(h) = \mathrm{cov}(y_{t+h}, y_t) = \mathrm{cov}(x_{t+h} - .7x_{t+h-1},\, x_t - .7x_{t-1}) = 0$ if $|h| > 1$ because the $x_t$s are independent.
When $h = 0$, $\gamma_y(0) = \sigma_x^2(1 + .7^2)$, where $\sigma_x^2$ is the variance of $x_t$. When $h = 1$, $\gamma_y(1) = -.7\sigma_x^2$. Thus,
$\rho_y(1) = -.7/(1 + .7^2) = -.47$.
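The lag-one value can be confirmed with `ARMAacf`, treating $y_t$ as an MA(1) with coefficient $-.7$; this is only a numerical check of the arithmetic, and the name `rho` is introduced here for illustration.

```r
# Theoretical ACF of y_t = x_t - .7 x_{t-1} with x_t independent
rho = ARMAacf(ma = -0.7, lag.max = 2)   # lags 0, 1, 2
print(round(rho, 2))
```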
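1.24 (a) The variance is always non-negative, so that, for $x_t$ a zero-mean stationary series,
$$\mathrm{var}\Big(\sum_{s=1}^{n} a_s x_s\Big) = E\Big[\sum_{s=1}^{n} a_s x_s \sum_{t=1}^{n} a_t x_t\Big] = \sum_{s,t} a_s\gamma(s-t)a_t = a'\Gamma a \ge 0,$$
so that $\Gamma = \{\gamma(s-t),\ s,t = 1, \ldots, n\}$ is a non-negative definite matrix.

(b) Let $y_t = x_t - \bar{x}$ for $t = 1, \ldots, n$ and construct the $(2n-1) \times n$ matrix
$$D = \begin{pmatrix}
y_1 & 0 & 0 & \cdots & 0 \\
y_2 & y_1 & 0 & \cdots & 0 \\
y_3 & y_2 & y_1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_n & y_{n-1} & y_{n-2} & \cdots & y_1 \\
0 & y_n & y_{n-1} & \cdots & y_2 \\
0 & 0 & y_n & \cdots & y_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & y_n
\end{pmatrix}.$$
If $\hat\Gamma = \{\hat\gamma(s-t),\ s,t = 1, \ldots, n\}$, one can show by matrix multiplication that
$$\hat\Gamma = \frac{1}{n} D'D .$$
Then,
$$a'\hat\Gamma a = \frac{1}{n}a'D'Da = \frac{1}{n}c'c = \frac{1}{n}\sum_{i=1}^{2n-1} c_i^2 \ge 0$$
for $c = Da$.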
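1.25 (a)
$$E\bar{x}_t = \frac{1}{N}\sum_{j=1}^{N} E x_{jt} = \frac{1}{N}\sum_{j=1}^{N}\mu_t = \frac{N\mu_t}{N} = \mu_t$$
(b)
$$E[(\bar{x}_t - \mu_t)^2] = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{k=1}^{N} E(x_{jt} - \mu_t)(x_{kt} - \mu_t) = \frac{1}{N^2}\sum_{j=1}^{N}\gamma_e(t,t) = \frac{1}{N^2}N\gamma_e(t,t) = \frac{\gamma_e(t,t)}{N}$$
(c) As long as the separate series are observing the same signal, we may assume that the variance goes
down proportionally to the number of series, as in the iid case. If normality is reasonable, pointwise
$100(1 - \alpha)\%$ intervals can be computed as
$$\bar{x}_t \pm z_{\alpha/2}\sqrt{\gamma_e(t,t)}\big/\sqrt{N} .$$
1.26
$$V_x(h) = \frac{1}{2}E[(x_{s+h} - \mu) - (x_s - \mu)]^2 = \frac{1}{2}\left[\gamma(0) - \gamma(h) - \gamma(-h) + \gamma(0)\right] = \gamma(0) - \gamma(h).$$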

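1.27 The numerator and denominator of $\hat\rho(h)$ are
$$\begin{aligned}
\hat\gamma(h) &= \frac{1}{n}\sum_{t=1}^{n-h}\left[\beta_1(t - \bar{t}) + \beta_1 h\right]\left[\beta_1(t - \bar{t})\right] \\
&= \frac{\beta_1^2}{n}\left[\sum_{t=1}^{n-h}(t - \bar{t})^2 + h\sum_{t=1}^{n-h}(t - \bar{t})\right]
\end{aligned}$$
and
$$\hat\gamma(0) = \frac{\beta_1^2}{n}\sum_{t=1}^{n}(t - \bar{t})^2 .$$
Because $\sum_{t=1}^{n}(t - \bar{t}) = 0$, this gives
$$\hat\gamma(h) = \hat\gamma(0) + \frac{\beta_1^2}{n}\left[-\sum_{t=n-h+1}^{n}(t - \bar{t})^2 - h\sum_{t=n-h+1}^{n}(t - \bar{t})\right].$$
Hence, we can write
$$\hat\rho(h) = 1 + R$$
where
$$R = \frac{\beta_1^2}{n\hat\gamma(0)}\left[-\sum_{t=n-h+1}^{n}(t - \bar{t})^2 - h\sum_{t=n-h+1}^{n}(t - \bar{t})\right]$$
is a remainder term that needs to converge to zero. We can evaluate the terms in the remainder using
$$\sum_{t=1}^{m} t = \frac{m(m+1)}{2}
\quad\text{and}\quad
\sum_{t=1}^{m} t^2 = \frac{m(2m+1)(m+1)}{6} .$$
For the denominator,
$$\begin{aligned}
n\hat\gamma(0) &= \beta_1^2\left[\sum_{t=1}^{n} t^2 - n\bar{t}^2\right] \\
&= \beta_1^2\left[\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4}\right] \\
&= \beta_1^2\,\frac{n(n+1)(n-1)}{12},
\end{aligned}$$
whereas the numerator can be simplified by letting $s = t - n + h$ so that
$$R = \frac{\beta_1^2}{n\hat\gamma(0)}\left[-\sum_{s=1}^{h}(s + n - h - \bar{t})^2 - h\sum_{s=1}^{h}(s + n - h - \bar{t})\right].$$
The terms in the numerator of $R$ are $O(n^2)$, whereas the denominator is $O(n^3)$, so that the remainder
term converges to zero.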
1.28 (a)
$$P\{\sqrt{n}\,|\bar{x}| > \epsilon\} \le \frac{n E[\bar{x}^2]}{\epsilon^2}$$
Note that
$$n E[\bar{x}^2] \to \sum_{u=-\infty}^{\infty}\gamma(u) = 0,$$
where the last step employs the summability condition.

(b) An example of such a process is $x_t = \nabla w_t = w_t - w_{t-1}$, where $w_t$ is white noise. This situation
arises when a stationary process is over-differenced (i.e., $w_t$ is already stationary, so $\nabla w_t$ would
be considered over-differencing).
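1.29 Let $y_t = x_t - \mu_x$ and write the difference as
$$\begin{aligned}
n^{1/2}\left[\tilde\gamma(h) - \hat\gamma(h)\right] &= n^{-1/2}\sum_{t=1}^{n} y_{t+h}y_t - n^{-1/2}\sum_{t=1}^{n-h}(y_{t+h} - \bar{y})(y_t - \bar{y}) \\
&= n^{-1/2}\left[\sum_{t=n-h+1}^{n} y_{t+h}y_t + \bar{y}\sum_{t=1}^{n-h} y_t + \bar{y}\sum_{t=1}^{n-h} y_{t+h} - (n-h)\bar{y}^2\right].
\end{aligned}$$
For the first term,
$$\begin{aligned}
E\left[n^{-1/2}\Big|\sum_{t=n-h+1}^{n} y_{t+h}y_t\Big|\right] &\le n^{-1/2}\sum_{t=n-h+1}^{n} E|y_{t+h}y_t| \\
&\le n^{-1/2}\sum_{t=n-h+1}^{n} E^{1/2}[y_{t+h}^2]\,E^{1/2}[y_t^2] \\
&= n^{-1/2}\,h\,\gamma_x(0) \to 0,
\end{aligned}$$
as $n \to \infty$. Applying the Markov inequality in the hint then shows that the first term is $o_p(1)$. In
order to handle the other terms, which differ trivially from $n^{-1/2}\,n\bar{y}^2$, note that, from Theorem A.5,
$n^{1/2}\bar{y}$ converging in distribution to a standard normal implies that $n\bar{y}^2$ converges in distribution to
a chi-square random variable with 1 degree of freedom and hence $n\bar{y}^2 = O_p(1)$. Hence, $n^{-1/2}\,n\bar{y}^2 =
n^{-1/2}O_p(1) = o_p(1)$ and the result is proved.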
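1.30 To apply Theorem A.7, we need the ACF of $x_t$. Note that
$$\begin{aligned}
\gamma_x(h) &= \sum_{j,k}\phi^j\phi^k E[w_{t+h-j}w_{t-k}] \\
&= \sigma_w^2\sum_{k}\phi^{h+k}\phi^k \\
&= \sigma_w^2\phi^h\sum_{k=0}^{\infty}\phi^{2k} \\
&= \frac{\sigma_w^2\phi^h}{1 - \phi^2},
\end{aligned}$$
and we have $\rho_x(h) = \phi^h$ for the ACF. Now, from (A.55), we have
$$\begin{aligned}
w_{11} &= \sum_{u=1}^{\infty}\left[\rho_x(u+1) + \rho_x(u-1) - 2\rho_x(1)\rho_x(u)\right]^2 \\
&= \sum_{u=1}^{\infty}\left[\phi^{u+1} + \phi^{u-1} - 2\phi^{u+1}\right]^2 \\
&= \frac{(1 - \phi^2)^2}{\phi^2}\sum_{u=1}^{\infty}\phi^{2u} \\
&= 1 - \phi^2 .
\end{aligned}$$
Hence,
$$\hat\rho(1) \sim AN\left(\phi,\ \frac{1 - \phi^2}{n}\right).$$
In order to derive a $100(1 - \alpha)\%$ confidence interval, note that
$$\frac{n(\hat\rho(1) - \phi)^2}{1 - \phi^2} \le z_{\alpha/2}^2$$
with probability $1 - \alpha$. Looking at the roots of
$$A\phi^2 + B\phi + C = 0,$$
where
$$A = \left(1 + \frac{z_{\alpha/2}^2}{n}\right), \qquad B = -2\hat\rho(1), \qquad\text{and}\qquad C = \hat\rho^2(1) - \frac{z_{\alpha/2}^2}{n},$$
gives the interval
$$-\frac{B}{2A} \pm \frac{\sqrt{B^2 - 4AC}}{2A} .$$
Taking $\hat\rho(1) = .64$, $n = 100$, $z_{.025} = 1.96$ gives the approximate 95% confidence interval (.47, .77).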
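The interval (.47, .77) quoted in 1.30 follows from the quadratic $A\phi^2 + B\phi + C = 0$ with $A = 1 + z_{\alpha/2}^2/n$, $B = -2\hat\rho(1)$ and $C = \hat\rho^2(1) - z_{\alpha/2}^2/n$. The sketch below simply evaluates the two roots for the stated inputs; the variable names are chosen for illustration.

```r
# Confidence interval for phi from the roots of A*phi^2 + B*phi + C = 0
rho1 = 0.64; n = 100; z = 1.96        # values quoted in the solution
A = 1 + z^2/n
B = -2*rho1
C = rho1^2 - z^2/n
ci = (-B + c(-1, 1)*sqrt(B^2 - 4*A*C)) / (2*A)
print(round(ci, 2))                   # 0.47 0.77
```

The interval is not symmetric about .64 because the variance $(1 - \phi^2)/n$ itself depends on $\phi$.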

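1.31 (a) $E(x_t x_{t+h}) = 0$ and $E(x_t x_{t+h} x_s x_{s+k}) = 0$ unless all subscripts match. But $t \ne s$, and $h, k \ge 1$, so
all subscripts can't match and hence $\mathrm{cov}(x_t x_{t+h}, x_s x_{s+k}) = 0$.
(b) Define $y_t = x_t\sum_{j=1}^{h}\lambda_j x_{t+j}$, for $\lambda_1, \ldots, \lambda_h \in R$ arbitrary. Then $y_t$ is strictly stationary, $h$-dependent,
and $\mathrm{var}(y_t) = \sigma^4\sum_{j=1}^{h}\lambda_j^2$. Hence, with $\bar{y}_n = \sum_{t=1}^{n} y_t/n$,
$$\sqrt{n}\,\bar{y}_n \to_d N\Big(0, \sum_{u=-h}^{h}\gamma_y(u)\Big) = N\big(0, \gamma_y(0)\big) = N\Big(0, \sigma^4\sum_{j=1}^{h}\lambda_j^2\Big).$$
Thus
$$\sigma^{-2}n^{-1/2}\sum_{t=1}^{n}\sum_{j=1}^{h}\lambda_j x_t x_{t+j} \to_d N\Big(0, \sum_{j=1}^{h}\lambda_j^2\Big),$$
which, because the $\lambda_j$ are arbitrary, implies
$$\sigma^{-2}n^{-1/2}\sum_{t=1}^{n}(x_t x_{t+1}, \ldots, x_t x_{t+h})' \to_d (z_1, \ldots, z_h)' .$$
(c) This part follows from the proof of Problem 1.29, noting that $\mu_x = 0$.
(d) Using part (c), for large $n$,
$$\sqrt{n}\,\hat\rho(h) \sim \sqrt{n}\,\frac{\sum_{t=1}^{n} x_t x_{t+h}/n}{\sum_{t=1}^{n} x_t^2/n} .$$
Since the denominator $\to_p \sigma^2$, using Slutsky's Theorem,
$$\left\{ n^{1/2}\,\frac{\sum_{t=1}^{n} x_t x_{t+j}/n}{\sum_{t=1}^{n} x_t^2/n}\ ;\ j = 1, \ldots, h \right\}' \to_d (z_1, \ldots, z_h)' .$$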
n

Chapter 2
2.1 (a)-(c) The following code will produce all the necessary results. The model is overparameterized if an
intercept is included (the terms for each Q are intercepts); most packages will kick out Q4. In general,
$\alpha_i - \alpha_j$ is the average increase (decrease) from quarter i to quarter j. There is substantial correlation
left in the residuals, even at the yearly cycle.
jj=ts(scan("/mydata/jj.dat"), start=1960, frequency=4)
Q1=rep(c(1,0,0,0),21)
Q2=rep(c(0,1,0,0),21)
Q3=rep(c(0,0,1,0),21)
Q4=rep(c(0,0,0,1),21)
time=seq(1960,1980.75,by=.25)
reg=lm(log(jj)~0+time+Q1+Q2+Q3+Q4)
summary(reg)                       # regression output
plot.ts(log(jj))
lines(time, reg$fit,col="red")     # the returned fitted values are in reg$fit
plot.ts(reg$resid)                 # the returned residuals are in reg$resid
acf(reg$resid,20)
2.2 (a)-(b) The following code will produce the output. Note that $P_{t-4}$ is significant in the regression and
highly correlated (zero-order correlation is .52) with mortality.
mort=ts(scan("/mydata/cmort.dat"))
temp=ts(scan("/mydata/temp.dat"))
part=ts(scan("/mydata/part.dat"))
t=ts(1:length(mort))
x=ts.intersect(mort,t,temp,temp^2,part,lag(part,-4))
fit=lm(x[,1]~x[,2:6])
summary(fit)
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)           79.239918   1.224693  64.702  < 2e-16 ***
x[, 2:6]t             -0.026641   0.001935 -13.765  < 2e-16 ***
x[, 2:6]temp          -0.405808   0.035279 -11.503  < 2e-16 ***
x[, 2:6]temp^2         0.021547   0.002803   7.688 8.02e-14 ***
x[, 2:6]part           0.202882   0.022658   8.954  < 2e-16 ***
x[, 2:6]lag(part, -4)  0.103037   0.024846   4.147 3.96e-05 ***

Residual standard error: 6.287 on 498 degrees of freedom
Multiple R-Squared: 0.608,
F-statistic: 154.5 on 5 and 498 DF, p-value: < 2.2e-16
cor(x, use="complete")   # part (b) - correlation matrix
pairs(x)                 # part (b) - scatterplot

2.3 The following code will produce the output. The slope of the fitted line should be close to .1 (the true
slope), but neither the true nor the fitted line will be a very good indicator of the so-called trend.
w=rnorm(500,.1,1)
x=cumsum(w)
t=1:500
fit=lm(x~0+t)
plot.ts(x)
lines(.1*t, lty="dashed")
abline(fit)

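2.4 For the normal regression models we have $x_t \sim N(\beta_j' z_t, \sigma_j^2)$, for $j = 1, 2$. Then
$$\ln\frac{f_1(x; \beta_1, \sigma_1^2)}{f_2(x; \beta_2, \sigma_2^2)}
= -\frac{n}{2}\ln\sigma_1^2 + \frac{n}{2}\ln\sigma_2^2
- \frac{1}{2\sigma_1^2}\sum_{t=1}^{n}(x_t - \beta_1' z_t)^2
+ \frac{1}{2\sigma_2^2}\sum_{t=1}^{n}(x_t - \beta_2' z_t)^2 .$$
Taking expectations, the fourth term in the above becomes, by adding and subtracting $\beta_1' z_t$ inside the
parentheses,
$$E_1\Big[\sum_{t=1}^{n}(x_t - \beta_2' z_t)^2\Big] = n\sigma_1^2 + (\beta_1 - \beta_2)'Z'Z(\beta_1 - \beta_2)$$
and, dividing through by $n$ and collecting terms, we obtain the quoted result.
2.5 Using the quoted results and the independence of $\hat\beta$ and $\hat\sigma^2$, we have
$$\begin{aligned}
E_1[I(\beta_1, \sigma_1^2; \hat\beta, \hat\sigma^2)]
&= \frac{1}{2}\left(-\ln\sigma_1^2 + E_1\ln\hat\sigma^2 + E_1\left[\frac{n}{\chi^2_{n-k}}\right] + E_1\left[\frac{\chi^2_k}{\chi^2_{n-k}}\right] - 1\right) \\
&= \frac{1}{2}\left(-\ln\sigma_1^2 + E_1\ln\hat\sigma^2 + n\,E_1\left[\frac{1}{\chi^2_{n-k}}\right] + E_1\big[\chi^2_k\big]\,E_1\left[\frac{1}{\chi^2_{n-k}}\right] - 1\right) \\
&= \frac{1}{2}\left(-\ln\sigma_1^2 + E_1\ln\hat\sigma^2 + \frac{n}{n-k-2} + \frac{k}{n-k-2} - 1\right),
\end{aligned}$$
which simplifies to the desired result.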
2.6 (a) It is clear that $E x_t = \beta_0 + \beta_1 t$ and the mean depends on $t$. Note that the points will be randomly
distributed around a straight line.
(b) Note that $\nabla x_t = \beta_1 + w_t - w_{t-1}$ so that $E(\nabla x_t) = \beta_1$ and
$$\mathrm{cov}(\nabla x_{t+h}, \nabla x_t) = \begin{cases}
2\sigma_w^2 & h = 0 \\
-\sigma_w^2 & h = \pm 1 \\
0 & |h| > 1.
\end{cases}$$
(c) Here $\nabla x_t = \beta_1 + y_t - y_{t-1}$, so $E(\nabla x_t) = \beta_1 + \mu_y - \mu_y = \beta_1$. Also,
$$\mathrm{cov}(\nabla x_{t+h}, \nabla x_t) = \mathrm{cov}(y_{t+h} - y_{t+h-1},\, y_t - y_{t-1}) = 2\gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1),$$
which is independent of $t$.
2.7 This is similar to part (c) of the previous problem except that now we have $E(x_t - x_{t-1}) = \delta$, with
autocovariance function
$$\mathrm{cov}(w_{t+h} + y_{t+h} - y_{t+h-1},\, w_t + y_t - y_{t-1}) = \gamma_w(h) + 2\gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1).$$
2.8 (a) The variance in the second half of the varve series is obviously larger than that in the first half.
Dividing the data in half gives $\hat\gamma_x(0) = 133, 593$ for the first and second parts, respectively, and
the variance is about 4.5 times as large in the second half. The transformed series $y_t = \ln x_t$
has $\hat\gamma_y(0) = .27, .45$ for the two halves, respectively, and the variance of the second half is only
about 1.7 times as large. Histograms, computed for the two series in Figure 5, indicate that the
transformation improves the normal approximation.
(b) Autocorrelation functions for the three series, shown in Figure 6, show nonstationary behavior,
except in the case of
$$u_t = y_t - y_{t-1} = \ln\left(\frac{x_t}{x_{t-1}}\right),$$
which has an ACF below the significance levels, except for $\hat\rho_u(1) = -.3974$. Note that $u_t$ can be
written in the form
$$u_t = \ln\left(1 + \frac{x_t - x_{t-1}}{x_{t-1}}\right) \approx \frac{x_t - x_{t-1}}{x_{t-1}} = P_t ,$$
and the term $P_t$ shown is the proportional increase ($100P_t$ = percentage increase). Hence, it
appears that the percent increase in deposition in a year is a more stable quantity.

[Figure 5: histogram panels "Untransformed Varves" and "Logarithms".]

## Figure 5: Histograms for varve series xt and yt = ln xt .

[Figure 6: panels "Varve Series $x_t$", "$y_t = \ln x_t$", "$u_t = y_t - y_{t-1}$" under the title "Autocorrelation Functions"; horizontal axis lag, 0 to 100.]

## Figure 6: ACFs for varve series xt , yt = ln xt and ut = yt - yt-1 .
(c) The series appears stationary because the ACF in Figure 6 is essentially zero after lag one.
(d) Note that
$$\gamma_u(0) = E[(u_t - \mu_u)^2] = E[w_t^2] + \theta^2 E[w_{t-1}^2] = \sigma_w^2(1 + \theta^2)$$
and
$$\gamma_u(1) = E[(w_{t+1} - \theta w_t)(w_t - \theta w_{t-1})] = -\theta E[w_t^2] = -\theta\sigma_w^2 ,$$
with $\gamma_u(h) = 0$ for $|h| > 1$. The ACF is
$$\rho(1) = \frac{-\theta}{1 + \theta^2}$$
or
$$\rho(1)\theta^2 + \theta + \rho(1) = 0$$
and we may solve for
$$\theta = \frac{-1 \pm \sqrt{1 - 4\rho^2(1)}}{2\rho(1)}$$
using the quadratic formula. Hence, for $\hat\rho(1) = -.3974$,
$$\hat\theta = \frac{-1 \pm \sqrt{1 - 4(.3974)^2}}{2(-.3974)},$$
yielding the roots $\hat\theta = .4946, 2.0217$. We take the root $\hat\theta = .4946$ (this is the invertible root,
see Chapter 3). Then,
$$\hat\sigma_w^2 = \frac{\hat\gamma_u(0)}{1 + \hat\theta^2} = \frac{.3317}{1 + (.4946)^2} = .2665 .$$

[Figure 7: panels "Gas Prices", "Oil Prices", and "Differenced ln" for each series.]

Figure 7: Gas series, oil series and percent changes for each.
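The moment estimates in 2.8(d) can be reproduced directly from the two sample quantities quoted there, $\hat\rho_u(1) = -.3974$ and $\hat\gamma_u(0) = .3317$. The sketch below solves $\rho(1)\theta^2 + \theta + \rho(1) = 0$ for an MA(1) of the form $u_t = \mu_u + w_t - \theta w_{t-1}$; the variable names are chosen for illustration.

```r
# Method-of-moments estimates for the MA(1) fitted to the differenced log varves
rho1 = -0.3974                                           # sample ACF at lag one
theta = (-1 + c(1, -1)*sqrt(1 - 4*rho1^2)) / (2*rho1)    # both roots of the quadratic
sig2w = 0.3317 / (1 + theta[1]^2)                        # uses the invertible root
print(round(theta, 4))                                   # 0.4946 2.0217
print(round(sig2w, 4))                                   # 0.2665
```

The two roots are reciprocals of each other, so exactly one of them lies inside the unit interval and gives an invertible model.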
2.9 (a) Figure 7 shows the raw gas and oil prices in the top two panels and we note the parallel nonstationary
behavior. This is confirmed by the slowly decaying ACFs shown in Figure 8.
(b) The transformed series $y_t = \ln x_t - \ln x_{t-1}$ are shown in the bottom panels of Figure 7 and we
note that the trend disappears. The interpretation as the percentage change in price per period
is the same as in Problem 1.23 and the argument is the same here. There are still nonstationary
bursts in the price series and later these will be studied in more detail in Chapter 5 on stochastic
volatility. The ACFs in Figure 8 seem to be more consistent with relatively stationary behavior.

[Figure 8: ACF panels for the gas and oil series ("Gas ACF", "Oil ACF") and for the transformed series (including "Transformed Oil ACF"); lags 0 to 50.]

Figure 8: ACFs for oil and gas series and percent changes.

[Figure 9: panel "Cross Correlation: gas(t+h) vs oil(t)"; lags -30 to 30; labeled values .67, .44, .33.]

## Figure 9: Cross correlation function gas(t+h) vs oil(t).

[Figure 10: panel "Cross correlation (first 80 points)"; lags -30 to 30.]

Figure 10: Cross correlation function gas(t+h) vs oil(t) over first 80 points.

[Figure 11: scatterplot panels "gas(t) vs oil(t+1)", "gas(t) vs oil(t)", "gas(t) vs oil(t-1)", "gas(t) vs oil(t-4)".]

Figure 11: Scatterplot relating oil changes on abscissa to gas changes on ordinate at various lags.

(c) Figure 9 shows the cross correlation function (CCF) over the entire record and is virtually the
same as the CCF over the last 100 points, which is not shown. We see indications of instantaneous
responses of gas prices to oil price changes and also significant values at lags of +1 (oil leads gas)
and -1 (gas leads oil); the second of these might be considered as feedback. Figure 10 shows the
CCF over the first 80 points, when there were no really substantial bursts, and we note that longer
lags seem to be important. The scatter diagrams shown in Figure 11 for the main lagged relations
show an interesting nonlinear phenomenon. Even when the oil changes are around zero on the
horizontal axis, there are still fairly substantial variations in gas prices. Larger fluctuations in
oil price still produce linear changes in gas prices of about the same order. Hence, there may be
some indication of a threshold type of regression operating here, with changes of less than, say,
5% in oil prices associated with fairly large fluctuations in gasoline prices.
2.10 The R code for this problem is below.
soi=scan("/mydata/soi.dat")
# part (a)
t=1:length(soi)
fit=lm(soi~t)
summary(fit)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.2109341  0.0353571   5.966 4.93e-09 ***
t           -0.0005766  0.0001350  -4.272 2.36e-05 ***   # <-significant slope
# part (b)
soi.detr=fit$resid        # detrended data in soi.detr
plot.ts(soi.detr)
per=abs(fft(soi.detr))^2
plot.ts(per)
cbind((t-1)/453,per)      # lists frequency and periodogram
# El Nino peak is around .024 or approx 1 cycle/42 months
# (freq=0.024282561 local max per=5.536548e+02)
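To see how a frequency read off the `cbind((t-1)/453, per)` listing translates into a period, the sketch below applies the same periodogram computation to a synthetic cosine that completes exactly 11 cycles in 453 points. The synthetic series and the names `k`, `peak_freq` and `period` are assumptions made for illustration; no SOI data are used.

```r
# Periodogram peak of a synthetic series with 11 cycles in 453 points (illustrative only)
n = 453
t = 1:n
x = cos(2*pi*11*t/n)                      # synthetic stand-in for a detrended series
per = abs(fft(x))^2                       # raw periodogram ordinates
freq = (t - 1)/n                          # frequencies in cycles per point
k = which.max(per[2:floor(n/2)]) + 1      # skip frequency 0, search below 1/2
peak_freq = freq[k]
period = 1/peak_freq
print(c(peak_freq, period))
```

The peak sits at 11/453, about .0243, the same frequency as the local maximum reported above, and corresponds to a period of roughly 41 months.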
2.11 (a) Unlike SOI, the recruitment series autocorrelation continues to decrease with lag. Also, the point
clouds aren't normal ellipses.
rec=scan("/mydata/recruit.dat")
lag.plot(rec, lags=12, layout=c(3,4))
(b) Make sure soi and rec are time series objects. Note that lags 5,6,7,8 are highly significant,
which agrees with the scatterplot.
u = cbind(rec, soi, lag(soi,-1),lag(soi,-2),lag(soi,-3), lag(soi,-4),
lag(soi,-5),lag(soi,-6),lag(soi,-7), lag(soi,-8))
fit = lm(u[,1]~u[,2:10])
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             68.8030     0.9576  71.850  < 2e-16 ***
u[, 2:10]soi            -6.3196     2.9696  -2.128   0.0339 *
u[, 2:10]lag(soi, -1)   -3.2771     3.4022  -0.963   0.3360
u[, 2:10]lag(soi, -2)   -0.7152     3.4219  -0.209   0.8345
u[, 2:10]lag(soi, -3)    0.4559     3.4217   0.133   0.8941
u[, 2:10]lag(soi, -4)    1.9077     3.4089   0.560   0.5760
u[, 2:10]lag(soi, -5)  -19.5543     3.4265  -5.707 2.13e-08 ***
u[, 2:10]lag(soi, -6)  -16.9848     3.4317  -4.949 1.07e-06 ***
u[, 2:10]lag(soi, -7)  -14.7400     3.4289  -4.299 2.12e-05 ***
u[, 2:10]lag(soi, -8)  -23.1478     2.9982  -7.721 8.03e-14 ***
(c) There are many ways to go here; the code for lowess is below. Note a general positive trend. Using
a 5% span, you notice an approximate periodicity of about 11 cycles in 453 months, or about 1
cycle every 42 months, which corresponds to the approximate El Niño cycle (see the previous
problem).
plot(rec)
lines(lowess(rec),col=2)         # trend
lines(lowess(rec,f=.05),col=4)   # periodic
(d) The code for lowess is below; see Figure 12.
x=cbind(lag(soi,-6),rec)
x=x[7:453,]
plot(x[,1],x[,2])
lines(lowess(x[,1],x[,2]),col=4)

[Figure 12: panel "Lowess (10% smooth) of Recruitment from SOI at 6 mo lag"; vertical axis Recruits, horizontal axis SOI(-6).]

## Figure 12: Nonparametric prediction of recruitment from lagged SOI.

2.12 Two different lowess fits are given below.
gtemp=scan("/mydata/globtemp.dat")
plot.ts(gtemp)
lines(lowess(gtemp), col=2)
lines(lowess(gtemp, f=.25), col=4)

Chapter 3

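3.1 Note $\rho_x(1) = \dfrac{\theta}{1 + \theta^2}$. Thus $\dfrac{\partial\rho_x(1)}{\partial\theta} = \dfrac{1 - \theta^2}{(1 + \theta^2)^2} = 0$ when $\theta = \pm 1$. We conclude $\rho_x(1)$ has a maximum at
$\theta = 1$ wherein $\rho_x(1) = 1/2$ and a minimum at $\theta = -1$ wherein $\rho_x(1) = -1/2$.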
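The extreme values of the MA(1) lag-one autocorrelation $\theta/(1 + \theta^2)$ can also be located numerically. The sketch below uses `optimize` on each half-line; the search intervals and the names `rho1`, `mx` and `mn` are assumptions made for illustration.

```r
# Numerical search for the extrema of rho(1) = theta/(1 + theta^2)
rho1 = function(theta) theta / (1 + theta^2)
mx = optimize(rho1, c(0, 5), maximum = TRUE)   # maximum over theta > 0
mn = optimize(rho1, c(-5, 0))                  # minimum over theta < 0
print(c(mx$maximum, mx$objective))             # close to 1 and 0.5
print(c(mn$minimum, mn$objective))             # close to -1 and -0.5
```

So the lag-one autocorrelation of an MA(1) can never exceed 1/2 in absolute value.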

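3.2 (a) Write $x_t = \sum_{j=0}^{t-1}\phi^j w_{t-j}$. Then $E(x_t) = \sum_{j=0}^{t-1}\phi^j E(w_{t-j}) = 0$ and
$\mathrm{var}(x_t) = \sum_{j=0}^{t-1}\phi^{2j}\,\mathrm{var}(w_{t-j}) = \sigma_w^2\sum_{j=0}^{t-1}\phi^{2j}$.
The process is not stationary because the variance of $x_t$ depends on time $t$.
(b) $\mathrm{cov}(x_t, x_{t-h}) = \mathrm{cov}\Big(\phi^h x_{t-h} + \sum_{j=0}^{h-1}\phi^j w_{t-j},\ x_{t-h}\Big) = \phi^h\,\mathrm{var}(x_{t-h})$ for $h \ge 0$ and $t - h \ge 1$.
Thus
$$\mathrm{corr}(x_t, x_{t-h}) = \frac{\mathrm{cov}(x_t, x_{t-h})}{\sqrt{\mathrm{var}(x_t)\,\mathrm{var}(x_{t-h})}} = \phi^h\left[\frac{\mathrm{var}(x_{t-h})}{\mathrm{var}(x_t)}\right]^{1/2} .$$
(c) Let $t \to \infty$, then $\mathrm{var}(x_t) \to \sigma_w^2\sum_{j=0}^{\infty}\phi^{2j} = \sigma_w^2/(1 - \phi^2)$. Thus, $\mathrm{cov}(x_t, x_{t-h}) \to \phi^h\,\dfrac{\sigma_w^2}{1 - \phi^2}$ and
$\mathrm{corr}(x_t, x_{t-h}) \to \phi^h$.
(d) Generate more than n observations, for example, generate n + 50 observations and discard the
first 50.
(e) Use induction: $\mathrm{var}(x_2) = \mathrm{var}(\phi x_1 + w_2) = \phi^2\dfrac{\sigma_w^2}{1 - \phi^2} + \sigma_w^2 = \dfrac{\sigma_w^2}{1 - \phi^2} = \mathrm{var}(x_1)$. Suppose $\mathrm{var}(x_{t-1}) = \dfrac{\sigma_w^2}{1 - \phi^2}$,
then $\mathrm{var}(x_t) = \mathrm{var}(\phi x_{t-1} + w_t) = \dfrac{\sigma_w^2}{1 - \phi^2}$.
By part (b), $\mathrm{cov}(x_t, x_{t-h}) = \phi^h\,\mathrm{var}(x_{t-h}) = \phi^h\,\dfrac{\sigma_w^2}{1 - \phi^2}$,
and we conclude the process is stationary.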

3.3 (a) Write this as $(1 - .3B)(1 - .5B)x_t = (1 - .3B)w_t$ and reduce to $(1 - .5B)x_t = w_t$. Hence the
process is a causal and invertible AR(1): $x_t = .5x_{t-1} + w_t$.
(b) The AR polynomial is $1 - 1z + .5z^2$ which has complex roots $1 \pm i$ outside the unit circle (note
$|1 \pm i|^2 = 2$). The MA polynomial is $1 - z$ which has root unity. Thus the process is a causal but
not invertible ARMA(2, 1).
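The root calculations in 3.3 can be checked with `polyroot`, which takes polynomial coefficients in increasing order of power. For part (a) the sketch expands $(1 - .3z)(1 - .5z) = 1 - .8z + .15z^2$; the object names are chosen for illustration.

```r
# Roots of the AR and MA polynomials in parts (a) and (b)
ar_a = polyroot(c(1, -0.8, 0.15))   # (1 - .3z)(1 - .5z): roots 2 and 10/3
ma_a = polyroot(c(1, -0.3))         # 1 - .3z: root 10/3, shared with the AR side
ar_b = polyroot(c(1, -1, 0.5))      # 1 - z + .5z^2: roots 1 +/- i
ma_b = polyroot(c(1, -1))           # 1 - z: root 1, on the unit circle
print(Mod(ar_a)); print(Mod(ma_a))
print(ar_b);      print(Mod(ma_b))
```

The common root 10/3 in part (a) is the redundant factor that cancels, and the unit MA root in part (b) is what rules out invertibility.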
3.4 Let $\zeta_1$ and $\zeta_2$ be the roots of $\phi(z)$, that is, $\phi(z) = (1 - \zeta_1^{-1}z)(1 - \zeta_2^{-1}z)$. The causal condition is
$|\zeta_1| > 1$, $|\zeta_2| > 1$. Let $u_1 = \zeta_1^{-1}$ and $u_2 = \zeta_2^{-1}$ so that $\phi(z) = (1 - u_1 z)(1 - u_2 z)$ with causal condition
$|u_1| < 1$, $|u_2| < 1$. We need to show $|u_1| < 1$, $|u_2| < 1$ if and only if the three given inequalities hold. In terms
of $u_1$ and $u_2$, the inequalities are:
(i) $\phi_2 + \phi_1 - 1 = -(1 - u_1)(1 - u_2) < 0$ (note $\phi_1 = u_1 + u_2$ and $\phi_2 = -u_1 u_2$)
(ii) $\phi_2 - \phi_1 - 1 = -(1 + u_1)(1 + u_2) < 0$
(iii) $|\phi_2| = |u_1 u_2| < 1$
If $|u_1| < 1$, $|u_2| < 1$ and they are real, then (i) and (ii) hold because $(1 \pm u_j) > 0$ for $j = 1, 2$; (iii)
is obvious.
If $|u_1| < 1$, $|u_2| < 1$ and they are complex, $u_2 = \bar{u}_1$ and (i) $-|1 - u_1|^2 < 0$, (ii) $-|1 + u_1|^2 < 0$,
(iii) $|u_1|^2 < 1$.
If (i)-(iii) hold, then (iii), which is $|u_1 u_2| < 1$, implies at least one of $u_1$, $u_2$ must be less than 1
in absolute value (both if they are complex). Thus, (iii) is enough to imply $|u_1| < 1$, $|u_2| < 1$ in
the case of complex roots.
Now suppose the roots are real. Suppose wolog, $|u_1| < 1$. But, if $|u_1| < 1$, then $(1 \pm u_1) > 0$ so
for (i) and (ii) to hold, we must have $(1 \pm u_2) > 0$ or $|u_2| < 1$ as desired.

3.5 Refer to Example 3.8. The roots of $\phi(z) = 1 + .9z^2$ are $\pm i/\sqrt{.9}$. Because the roots are purely imaginary,
$\theta = \arg(i/\sqrt{.9}) = \pi/2$ and consequently, $\rho(h) = a\,(\sqrt{.9})^h\cos(\frac{\pi}{2}h + b)$, or $\rho(h)$ makes one cycle every 4
values of $h$. Because $\rho(0) = 1$ and $\rho(1) = \phi_1/(1 - \phi_2) = 0$, it follows that $a = 1$ and $b = 0$, in which
case $\rho(h) = (\sqrt{.9})^h\cos(\frac{\pi}{2}h)$. Thus $\rho(h) = \{1, 0, -.9, 0, .9^2, \ldots\}$ for $h = 0, 1, 2, 3, 4, \ldots$.

## Figure 1: ACF for Problem 3.5
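The first few ACF values for Problem 3.5 can be reproduced with `ARMAacf`, assuming the model is the AR(2) with $\phi_1 = 0$ and $\phi_2 = -.9$ implied by the roots $\pm i/\sqrt{.9}$; the names `rho` and `closed` are introduced here for illustration.

```r
# ACF of the AR(2) with phi1 = 0, phi2 = -.9 (assumed from the stated roots)
rho = ARMAacf(ar = c(0, -0.9), lag.max = 4)
closed = sqrt(0.9)^(0:4) * cos(pi*(0:4)/2)   # closed form with a = 1, b = 0
print(round(rho, 2))
```

The values alternate in sign at even lags and vanish at odd lags, which is the one-cycle-per-four-lags pattern described above.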

3.6 Refer to Examples 3.8 and 3.10. For (a)-(c) we have $\psi_0 = 1$ and $\psi_1 = \phi_1$. From (3.30)-(3.31) we have
distinct roots: $\psi_j = c_1 z_1^{-j} + c_2 z_2^{-j}$
equal roots: $\psi_j = z_0^{-j}(c_1 + c_2 j)$
For the ACF we have $\rho(0) = 1$ and $\rho(1) = \phi_1/(1 - \phi_2)$. From (3.30)-(3.31) we have
distinct roots: $\rho(h) = c_1 z_1^{-h} + c_2 z_2^{-h}$
equal roots: $\rho(h) = z_0^{-h}(c_1 + c_2 h)$
(a) $\phi(z) = 1 + 1.6z + .64z^2 = (1 + .8z)^2$. This is the equal roots case with $z_0^{-1} = -.8$. Thus $\psi_j = (-.8)^j(a + bj)$
and $\rho(h) = (-.8)^h(c + dh)$. To solve for $a$ and $b$ note for $j = 0$ we have $\psi_0 = 1 = a$ and for $j = 1$
we have $\psi_1 = \phi_1 = -1.6 = (-.8)(1 + b)$ or $b = 1$. Finally $\psi_j = (-.8)^j(1 + j)$ for
$j = 0, 1, 2, \ldots$. To solve for $c$ and $d$ note for $h = 0$ we have $\rho(0) = 1 = c$ and for $h = 1$ we
have $\rho(1) = -1.6/(1 + .64) = (-.8)(1 + d)$ or $d = .22$. Finally, $\rho(h) = (-.8)^h(1 + .22h)$ for
$h = 0, 1, 2, \ldots$.
(b) $\phi(z) = 1 - .4z - .45z^2 = (1 - .9z)(1 + .5z)$. This is the unequal roots case with $z_1^{-1} = .9$ and
$z_2^{-1} = -.5$. Thus $\psi_j = a(.9)^j + b(-.5)^j$ where $a$ and $b$ are found by solving $1 = a + b$ and
$.4 = a(.9) + b(-.5)$. For the ACF, $\rho(h) = c(.9)^h + d(-.5)^h$ where $c$ and $d$ are found by
solving $1 = c + d$ and $.4/(1 - .45) = c(.9) + d(-.5)$.
(c) $\phi(z) = 1 - 1.2z + .85z^2$. This is the complex roots case, with roots $.706 \pm .824i$. Referring to
Example 3.8, $\theta = \arg(.706 + .824i) = .862$ radians. Thus $\psi_j = a|.706 + .824i|^{-j}\cos(.862j + b) =
a\,1.08^{-j}\cos(.862j + b)$ where $a$ and $b$ satisfy $1 = a\cos(b)$ and $1.2 = a\,1.08^{-1}\cos(.862 + b)$. For
the ACF, $\rho(h) = c\,1.08^{-h}\cos(.862h + d)$ where $c$ and $d$ are found by solving $1 = c\cos(d)$ and
$1.2/(1 + .85) = c\,1.08^{-1}\cos(.862 + d)$.
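The constants in part (a) are easy to get wrong by hand, so a numerical check against R's own recursions is worthwhile. The sketch below assumes the AR(2) $x_t = -1.6x_{t-1} - .64x_{t-2} + w_t$ corresponding to $\phi(z) = 1 + 1.6z + .64z^2$, and compares `ARMAtoMA` and `ARMAacf` with closed forms of the equal-roots type $(-.8)^j(a + bj)$; the names `psi_closed`, `d` and `rho_closed` are introduced here for illustration.

```r
# Part (a): psi-weights and ACF from R versus the equal-roots closed forms
phi = c(-1.6, -0.64)                              # assumed AR coefficients
psi = c(1, ARMAtoMA(ar = phi, lag.max = 5))       # psi_0, ..., psi_5
rho = ARMAacf(ar = phi, lag.max = 5)              # rho(0), ..., rho(5)
j = 0:5
psi_closed = (-0.8)^j * (1 + j)                   # a = 1, b = 1
d = (-1.6/1.64)/(-0.8) - 1                        # from rho(1) = (-.8)(1 + d)
rho_closed = (-0.8)^j * (1 + d*j)
print(round(rbind(psi, psi_closed), 4))
print(round(rbind(rho, rho_closed), 4))
print(round(d, 2))
```

The same pattern (build the closed form from the two initial conditions, then compare with `ARMAtoMA` and `ARMAacf`) can be reused for parts (b) and (c).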
3.7 The ACF distinguishes the MA(1) case but not the ARMA(1,1) or AR(1) cases, which look similar to
each other (see Figure 2).

## Figure 2: ACFs for Problem 3.7

3.8
ar = arima.sim(list(order=c(1,0,0), ar=.6), n=100)
ma = arima.sim(list(order=c(0,0,1), ma=.9), n=100)
arma = arima.sim(list(order=c(1,0,1), ar=.6, ma=.9), n=100)
par(mfcol=c(1,2))
acf(ar)
pacf(ar)
par(mfcol=c(1,2))
acf(ma)
pacf(ma)
par(mfcol=c(1,2))
acf(arma)
pacf(arma)
3.9
> reg=ar.ols(mort, order=2, demean=F, intercept=T)
> reg
Coefficients:
     1       2
0.4308  0.4410
Intercept: 11.33 (2.403)
Order selected 2 sigma^2 estimated as 32.39
$pred
Time Series:
Start = 509
End = 512
Frequency = 1
[1] 87.60259 86.77514 87.35034 87.23323
$se
Time Series:
Start = 509
End = 512
Frequency = 1
[1] 5.691428 6.197170 6.686199 6.686199

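3.10 (a) The model can be written as $x_{n+1} = -\sum_{j=1}^{\infty} (-\theta)^j x_{n+1-j} + w_{n+1}$. From this we conclude that
$\tilde x_{n+1} = -\sum_{j=1}^{\infty} (-\theta)^j x_{n+1-j}$ and MSE $= E(x_{n+1} - \tilde x_{n+1})^2 = E w_{n+1}^2 = \sigma_w^2$.

(b) Truncating, $\tilde x^n_{n+1} = -\sum_{j=1}^{n} (-\theta)^j x_{n+1-j}$. Thus

$$\begin{aligned}
\text{MSE} = E(x_{n+1} - \tilde x^n_{n+1})^2
&= E\Big( -\sum_{j=n+1}^{\infty} (-\theta)^j x_{n+1-j} + w_{n+1} \Big)^2 \\
&= E\Big( -(-\theta)^{n+1} \sum_{j=n+1}^{\infty} (-\theta)^{j-(n+1)} x_{n+1-j} + w_{n+1} \Big)^2 \\
&= E\big( -(-\theta)^{n+1} w_0 + w_{n+1} \big)^2 \\
&= \sigma_w^2 \big( 1 + \theta^{2(n+1)} \big).
\end{aligned}$$

There can be a substantial difference between the two MSEs for small values of $n$, but for large
$n$ the difference is negligible.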
3.11 The proof is by contradiction. Assume there is a $\Gamma_n$ that is singular. Because $\gamma(0) > 0$, $\Gamma_1 = \{\gamma(0)\}$
is non-singular. Thus, there is an $r \ge 1$ such that $\Gamma_r$ is non-singular. Consider the ordered sequence
$\Gamma_1, \Gamma_2, \ldots$ and suppose $\Gamma_{r+1}$ is the first singular $\Gamma_n$ in the sequence. Then $x_{r+1}$ is a linear combination
of $x = (x_1, \ldots, x_r)'$, say, $x_{r+1} = b'x$ where $b = (b_1, \ldots, b_r)'$. Because of stationarity, it must also
be true that $x_{r+h+1} = b'x_h$, where $x_h = (x_{h+1}, \ldots, x_{r+h})'$ for all $h \ge 1$. This means that for any
$n \ge r + 1$, $x_n$ is a linear combination of $x_1, \ldots, x_r$, i.e., $x_n = b_n'x$ where $b_n = (b_{n1}, \ldots, b_{nr})'$. Thus,
$\gamma(0) = \mathrm{var}(x_n) = b_n' \Gamma_r b_n = b_n' Q \Lambda Q' b_n$ where $QQ'$ is the identity matrix and $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_r\}$ is
the diagonal matrix of the positive eigenvalues ($0 < \lambda_1 \le \cdots \le \lambda_r$) of $\Gamma_r$. From this result we conclude
$$\gamma(0) \ge \lambda_1 b_n' Q Q' b_n = \lambda_1 \sum_{j=1}^{r} b_{nj}^2 ;$$
this shows that for each $j$, $b_{nj}$ is bounded in $n$. In addition, $\gamma(0) = \mathrm{cov}(x_n, x_n) = \mathrm{cov}(x_n, b_n'x)$ from
which it follows that
$$0 < \gamma(0) \le \sum_{j=1}^{r} |b_{nj}|\, |\gamma(n - j)| .$$
From this inequality it is seen that because the $b_{nj}$ are bounded, it is not possible to have $\gamma(0) > 0$
and $\gamma(h) \to 0$ as $h \to \infty$.
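3.12 First take the prediction equations (3.56) with $n = h$ and divide both sides by $\gamma(0)$ to obtain $R_h \phi_h = \rho_h$.
Partition the equation as in the hint with $\phi_h = (\alpha_{h-1}', \phi_{hh})'$ [note $\rho(0) = 1$]:
$$\begin{pmatrix} R_{h-1} & \tilde\rho_{h-1} \\ \tilde\rho_{h-1}' & 1 \end{pmatrix}
\begin{pmatrix} \alpha_{h-1} \\ \phi_{hh} \end{pmatrix} =
\begin{pmatrix} \rho_{h-1} \\ \rho(h) \end{pmatrix},$$
and solve. We get
$$R_{h-1}\alpha_{h-1} + \tilde\rho_{h-1}\phi_{hh} = \rho_{h-1} \qquad (1)$$
$$\tilde\rho_{h-1}'\alpha_{h-1} + \phi_{hh} = \rho(h). \qquad (2)$$
From (1), $\alpha_{h-1} = R_{h-1}^{-1}(\rho_{h-1} - \tilde\rho_{h-1}\phi_{hh})$. Substituting this into (2) and solving for $\phi_{hh}$ gives
$$\phi_{hh} = \frac{\rho(h) - \tilde\rho_{h-1}' R_{h-1}^{-1} \rho_{h-1}}{1 - \tilde\rho_{h-1}' R_{h-1}^{-1} \tilde\rho_{h-1}}. \qquad (3)$$
Next, we must show that the PACF,
$$\frac{E(\epsilon_t \delta_{t-h})}{\sqrt{E(\epsilon_t^2)\,E(\delta_{t-h}^2)}},$$
can be written in the form of equation (3). To this end, let $x = (x_{t-1}, \ldots, x_{t-h+1})'$. The regression of $x_t$
on $x$ is $(\Gamma_{h-1}^{-1}\gamma_{h-1})'x$; see equation (2.59). The regression of $x_{t-h}$ on $x$ is $(\Gamma_{h-1}^{-1}\tilde\gamma_{h-1})'x$; see equation
(2.59). Thus
$$\epsilon_t = x_t - \gamma_{h-1}'\Gamma_{h-1}^{-1}x, \qquad \delta_{t-h} = x_{t-h} - \tilde\gamma_{h-1}'\Gamma_{h-1}^{-1}x .$$
From this we calculate (the calculations below are all similar to the verification of equation (2.60); also,
note for vectors $a$ and $b$, $a'b = b'a$)
$$E(\epsilon_t \delta_{t-h}) = \mathrm{cov}(\epsilon_t, \delta_{t-h}) = \gamma(h) - \tilde\gamma_{h-1}'\Gamma_{h-1}^{-1}\gamma_{h-1} .$$
Similar calculations show that
$$E(\delta_{t-h}^2) = \mathrm{var}(\delta_{t-h}) = \gamma(0) - \tilde\gamma_{h-1}'\Gamma_{h-1}^{-1}\tilde\gamma_{h-1} .$$
Also note that the error of the regression of $x_t$ on $x$ is the same as the error of the regression of $x_t$ on
$\tilde x$, where $\tilde x = (x_{t-h+1}, \ldots, x_{t-1})'$; that is, $\epsilon_t = x_t - (\Gamma_{h-1}^{-1}\tilde\gamma_{h-1})'\tilde x$. From this we conclude that
$$E(\epsilon_t^2) = \mathrm{var}(\epsilon_t) = \gamma(0) - \gamma_{h-1}'\Gamma_{h-1}^{-1}\gamma_{h-1} = \gamma(0) - \tilde\gamma_{h-1}'\Gamma_{h-1}^{-1}\tilde\gamma_{h-1} .$$
This proves the result upon factoring out $\gamma(0)$ in the numerator and denominator.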


3.13 (a) We want to find $g(x)$ to minimize $E[y - g(x)]^2$. Write this as $E[E\{(y - g(x))^2 \mid x\}]$. Minimize
the inner expectation: $\partial E\{(y - g(x))^2 \mid x\}/\partial g(x) = -2[E(y|x) - g(x)] = 0$ from which we conclude
$g(x) = E(y|x)$ is the required minimum.
(b) $g(x) = E(y|x) = E(x^2 + z|x) = x^2 + E(z) = x^2$. MSE $= E(y - g(x))^2 = E(y - x^2)^2 = E(z^2) =
\mathrm{var}(z) = 1$.
(c) Let $g(x) = a + bx$. Using the prediction equations, $g(x)$ satisfies
(i) $E[y - g(x)] = 0$ (ii) $E[(y - g(x))x] = 0$
or
(i) $E[y] = E[a + bx]$ (ii) $E(xy) = E[(a + bx)x]$
From (i) we have $a + bE(x) = E(y)$, but $E(x) = 0$ and $E(y) = 1$ so $a = 1$. From (ii) we have
$aE(x) + bE(x^2) = E(xy)$, or $b = E[x(x^2 + z)] = E(x^3) + E(xz) = 0 + 0$. Finally $g(x) = a + bx = 1$
and MSE $= E(y - 1)^2 = E(y^2) - 1 = E(x^4) + E(z^2) - 1 = 3 + 1 - 1 = 3$.
Conclusion: In this case, the best linear predictor has three times the error of the optimal predictor
(conditional expectation).
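A small simulation can make the comparison in parts (b) and (c) concrete. This is a sketch rather than part of the original solution; it assumes $x$ and $z$ are independent standard normals with $y = x^2 + z$, and the seed, sample size and variable names are arbitrary choices.

```r
set.seed(1)
n = 100000
x = rnorm(n)
z = rnorm(n)
y = x^2 + z
mse.cond = mean((y - x^2)^2)   # conditional expectation g(x) = x^2; near 1
mse.lin  = mean((y - 1)^2)     # best linear predictor g(x) = 1; near 3
fit = lm(y ~ x)                # fitted line should be close to a = 1, b = 0
c(mse.cond, mse.lin)
coef(fit)
```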
3.14 For an AR(1), equation (3.77) is exact; that is, $E(x_{t+m} - x^t_{t+m})^2 = \sigma_w^2 \sum_{j=0}^{m-1} \psi_j^2$. For an AR(1),
$\psi_j = \phi^j$ and thus $\sigma_w^2 \sum_{j=0}^{m-1} \phi^{2j} = \sigma_w^2 (1 - \phi^{2m})/(1 - \phi^2)$, the desired expression.

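3.15 From Example 3.6, $x_t = 1.4 \sum_{j=1}^{\infty} (-.5)^{j-1} x_{t-j} + w_t$, so the truncated one-step-ahead prediction using
(3.81) is $\tilde x^n_{n+1} = 1.4 \sum_{j=1}^{n} (-.5)^{j-1} x_{n+1-j}$.
From Equation (3.82)
$$\begin{aligned}
\tilde x^n_{n+1} &= .9x_n + .5\tilde w^n_n = .9x_n + .5(x_n - .9x_{n-1} - .5\tilde w^n_{n-1}) \\
&= 1.4x_n - .9(.5)x_{n-1} - .5^2 \tilde w^n_{n-1} = 1.4x_n - .9(.5)x_{n-1} - .5^2 (x_{n-1} - .9x_{n-2} - .5\tilde w^n_{n-2}) \\
&= 1.4x_n - 1.4(.5)x_{n-1} + .9(.5^2)x_{n-2} + .5^3 \tilde w^n_{n-2} \\
&= 1.4x_n - 1.4(.5)x_{n-1} + 1.4(.5^2)x_{n-2} - .9(.5^3)x_{n-3} - .5^4 \tilde w^n_{n-3} \\
&\ \ \vdots \\
&= 1.4 \sum_{j=1}^{n} (-.5)^{j-1} x_{n+1-j} .
\end{aligned}$$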
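The equivalence of the two truncated forecasts can also be checked numerically. The sketch below is illustrative only; it assumes the ARMA(1,1) model $x_t = .9x_{t-1} + .5w_{t-1} + w_t$ of Example 3.6, the truncation conditions $x_0 = 0$ and $\tilde w^n_0 = 0$, and arbitrary variable names.

```r
set.seed(2)
n = 50
x = as.numeric(arima.sim(list(ar = .9, ma = .5), n = n))
# Recursion from (3.82): w~_t = x_t - .9 x_{t-1} - .5 w~_{t-1}, started at zero
w = numeric(n)
xprev = 0
wprev = 0
for (t in 1:n) {
  w[t] = x[t] - .9*xprev - .5*wprev
  xprev = x[t]
  wprev = w[t]
}
f.recursive = .9*x[n] + .5*w[n]
# Closed form from (3.81): 1.4 * sum_{j=1}^{n} (-.5)^(j-1) x_{n+1-j}
jj = 1:n
f.closed = 1.4*sum((-.5)^(jj - 1) * x[n + 1 - jj])
c(f.recursive, f.closed)   # the two forecasts should agree
```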


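3.16 Using the result above (2.78)
$$\begin{aligned}
E(x_{n+m} - \tilde x_{n+m})(x_{n+m+k} - \tilde x_{n+m+k})
&= E\Big( \sum_{j=0}^{m-1} \psi_j w_{n+m-j} \Big)\Big( \sum_{\ell=0}^{m+k-1} \psi_\ell w_{n+m+k-\ell} \Big) \\
&= \sigma_w^2 \sum_{j=0}^{m-1} \psi_j \psi_{j+k} .
\end{aligned}$$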

3.17 (a)(b) Below reg1 is least squares and reg2 is Yule-Walker. The standard errors for each case are
also evaluated; the Yule-Walker run uses Proposition P3.9. The two methods produce similar results.
(a)
> reg1=ar.ols(mort, order=2)
> reg2=ar.yw(mort, order=2)
> reg1
Coefficients:
     1       2
0.4308  0.4410
Order selected 2  sigma^2 estimated as  32.39
> reg2
Coefficients:
     1       2
0.4328  0.4395
Order selected 2  sigma^2 estimated as  32.62

(b)
> reg1$asy.se.coef
$ar
[1] 0.03996103 0.03994833
> reg2$asy.se.coef
> sqrt(diag(reg2$asy.var.coef))
[1] 0.04005162 0.04005162
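3.18 (a) For an AR(1) we have $x^n_1 = x_1$, $x^n_0 = \phi x_1$, $x^n_{-1} = \phi x^n_0 = \phi^2 x_1$, and in general, $x^n_t = \phi^{1-t} x_1$ for
$t = 1, 0, -1, -2, \ldots$.
(b) $\widehat w_t(\phi) \stackrel{\rm def}{=} x^n_t - \phi x^n_{t-1} = \phi^{1-t} x_1 - \phi\,\phi^{2-t} x_1 = \phi^{1-t}(1 - \phi^2) x_1$.
(c) $\sum_{t=-\infty}^{1} \widehat w_t^2(\phi) = (1 - \phi^2)^2 x_1^2 \sum_{t=-\infty}^{1} \phi^{2(1-t)} = (1 - \phi^2)^2 x_1^2 \sum_{t=1}^{\infty} \phi^{2t-2} = (1 - \phi^2)^2 x_1^2 \frac{1}{1 - \phi^2} = (1 - \phi^2) x_1^2$.
(d) From (3.96), $S(\phi) = (1 - \phi^2) x_1^2 + \sum_{t=2}^{n} (x_t - \phi x_{t-1})^2 = \sum_{t=-\infty}^{1} \widehat w_t^2(\phi) + \sum_{t=2}^{n} (x_t - \phi x_{t-1})^2 =
\sum_{t=-\infty}^{n} \widehat w_t^2(\phi)$ using (c) and the fact that $\widehat w_t(\phi) = x_t - \phi x_{t-1}$ for $2 \le t \le n$.
(e) For $t = 2, \ldots, n$, $x^{t-1}_t = \phi x_{t-1}$ and $x_t - x^{t-1}_t = x_t - \phi x_{t-1}$. For $t = 1$, $x^0_1 = E(x_1) = 0$ so
$x_1 - x^0_1 = x_1$. Also, for $t = 2, \ldots, n$, $P^{t-1}_t = E(x_t - \phi x_{t-1})^2 = E(w_t^2) = \sigma_w^2$ so $r^{t-1}_t = 1$. For $t = 1$,
$P^0_1 = E(x_1^2) = \sigma_w^2/(1 - \phi^2)$ so $r^0_1 = 1/(1 - \phi^2)$, and we may write $S(\phi)$ in the desired form.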
3.19 The simulations can easily be done in R. Although the results will vary, the data should behave like
observations from a white noise process.
> x = arima.sim(list(order=c(1,0,1), ar=.9, ma=-.9), n=500)
> plot(x)
> acf(x)
> pacf(x)
> arima(x, order = c(1, 0, 1))


3.20 The following R program can be used.

phi=matrix(0,10,1)
theta=matrix(0,10,1)
sigma2=matrix(0,10,1)
for (i in 1:10){
# sd is an argument of arima.sim (passed on to rnorm), not part of the model list
x=arima.sim(n = 200, list(ar = .9, ma = .2), sd = sqrt(.25))
fit=arima(x, order=c(1,0,1))
phi[i]=fit$coef[1]
theta[i]=fit$coef[2]
sigma2[i]=fit$sigma2
}
3.21 Below is R code for this example using Yule-Walker. The asymptotic distribution is normal with mean
.99 and standard error $\sqrt{(1 - .99^2)/50} \approx .02$. The bootstrap distribution should be very different from
the asymptotic distribution. (If you use MLE, there might be problems because $\phi$ is very near the
boundary. You might alert students to this fact or let them find out on their own.)
x=arima.sim(list(order=c(1,0,0), ar=.99), n=50)
fit=ar.yw(x, order=1)
phi =fit$ar              # estimate of phi
nboot = 200              # number of bootstrap replicates
resids = fit$resid
resids = resids[2:50]    # the first resid is NA
x.star = x
phi.star = matrix(0, nboot, 1)
for (i in 1:nboot) {
resid.star = sample(resids, replace=TRUE)   # resample the residuals with replacement
for (t in 1:49){
x.star[t+1] = phi*x.star[t] + resid.star[t]
}
phi.star[i] = ar.yw(x.star, order=1)$ar
}
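3.22 Write $w_t(\phi) = x_t - \phi x_{t-1}$ for $t = 1, \ldots, n$ conditional on $x_0 = 0$. Then $z_t(\phi) = -\partial w_t(\phi)/\partial\phi = x_{t-1}$.
Let $\phi_{(0)}$ be an initial guess at $\phi$, then
$$\phi_{(1)} = \phi_{(0)} + \frac{\sum_{t=1}^{n} x_{t-1}(x_t - \phi_{(0)} x_{t-1})}{\sum_{t=1}^{n} x_{t-1}^2} = \frac{\sum_{t=1}^{n} x_{t-1} x_t}{\sum_{t=1}^{n} x_{t-1}^2},$$
and the estimate has converged in one step to the (conditional) MLE of $\phi$.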
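The one-step convergence is easy to see numerically. The following sketch is illustrative and not from the original solution; the simulated AR(1) series, the starting values, and the helper name `gn.step` are assumptions made for the example.

```r
set.seed(3)
x = as.numeric(arima.sim(list(ar = .6), n = 200))
xlag = c(0, x[-length(x)])          # x_{t-1}, with x_0 = 0
# One Gauss-Newton step from the initial guess phi0
gn.step = function(phi0) phi0 + sum(xlag*(x - phi0*xlag))/sum(xlag^2)
phi.cls = sum(xlag*x)/sum(xlag^2)   # conditional least squares estimate
c(gn.step(-.5), gn.step(0), gn.step(.9), phi.cls)   # all four values coincide
```

Because $w_t(\phi)$ is linear in $\phi$, the Gauss-Newton linearization is exact, which is why the starting value does not matter.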
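3.23 (a) Taking expectation through the model, $E(x_t) = \alpha + \phi E(x_{t-1}) + 0 + 0$, we have $\mu = \alpha + \phi\mu$ or
$\alpha = \mu(1 - \phi)$. Let $y_t = x_t - \mu$, then the model can be written as $y_t = \phi y_{t-1} + w_t + \theta w_{t-1}$. Because
$|\phi| < 1$ the process $y_t$ is causal (and hence stationary) and consequently $x_t$ is stationary. The
same technique used in Problem 1.17 can be used here to show that $y_t$, and hence $x_t$, is strictly
stationary.

(b) Because of causality, $y_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}$ where $\psi_0 = 1$ and $\psi_j = (\phi + \theta)\phi^{j-1}$ for $j = 1, 2, \ldots$
(see Examples 3.6 and 3.10) and hence $x_t = \mu + \sum_{j=0}^{\infty} \psi_j w_{t-j}$. Thus, by Theorem A.5, $\bar x \sim$
AN$(\mu, n^{-1}V)$ where $V = \sigma_w^2 \big( \sum_{j=0}^{\infty} \psi_j \big)^2 = \sigma_w^2 \big( 1 + (\phi + \theta) \sum_{j=1}^{\infty} \phi^{j-1} \big)^2$. Equivalently, $\bar x \sim$
AN$(\alpha/(1 - \phi), n^{-1}V)$.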

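3.24 (a) $E(x_t) = 0$ and
$$\gamma_x(h) = E(s_t + a s_{t-\delta})(s_{t+h} + a s_{t+h-\delta}) =
\begin{cases} (1 + a^2)\sigma_s^2, & h = 0 \\ a\sigma_s^2, & h = \pm\delta \\ 0, & \text{otherwise,} \end{cases}$$
so the process is stationary.
Also, $x_t - a x_{t-\delta} + a^2 x_{t-2\delta} - \cdots + (-1)^k a^k x_{t-k\delta} = s_t - (-1)^{k+1} a^{k+1} s_{t-(k+1)\delta}$, and letting $k \to \infty$
shows $s_t = \sum_{j=0}^{\infty} (-a)^j x_{t-\delta j}$ is the mean square convergent representation of $s_t$. Note: If $\delta$ is
known, the process is an invertible MA($\delta$) process with $\theta_1 = \cdots = \theta_{\delta-1} = 0$ and $\theta_\delta = a$.

(b) The Gauss-Newton procedure is similar to the MA(1) case in Example 3.30. Write $s_t(a) = x_t -
a s_{t-\delta}(a)$ for $t = 1, \ldots, n$. Then $z_t(a) = -\partial s_t(a)/\partial a = s_{t-\delta}(a) + a\,\partial s_{t-\delta}(a)/\partial a = s_{t-\delta}(a) - a z_{t-\delta}(a)$.
The iterative procedure is
$$a_{(j+1)} = a_{(j)} + \frac{\sum_{t=1}^{n} z_t(a_{(j)}) s_t(a_{(j)})}{\sum_{t=1}^{n} z_t^2(a_{(j)})}, \qquad j = 0, 1, 2, \ldots$$
where $z_t(\cdot) = 0$ and $s_t(\cdot) = 0$ for $t \le 0$.
(c) If $\delta$ is unknown, the ACF of $x_t$ can be used to find a preliminary estimate of $\delta$. Then, a Gauss-Newton
procedure can be used to minimize the error sum of squares, say $S_c(a, \delta)$, over a grid of
$\delta$ values near the preliminary estimate. The values $\widehat a$
and $\widehat\delta$ that minimize $S_c(a, \delta)$ are the required
estimates.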
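3.25 (a) By Property P3.9, $\widehat\phi \sim$ AN$[\phi, n^{-1}(1 - \phi^2)]$ so that $\widehat\phi = \phi + O_p(n^{-1/2})$.
(b) $x^n_{n+1} = \phi x_n$ whereas $\widehat x^n_{n+1} = \widehat\phi x_n$. Thus $x^n_{n+1} - \widehat x^n_{n+1} = (\phi - \widehat\phi)x_n$. Using Tchebycheff's inequality,
it is easy to show $x_n = O_p(1)$. Thus, by the properties of $O_p(\cdot)$,
$$x^n_{n+1} - \widehat x^n_{n+1} = (\phi - \widehat\phi)x_n = O_p(n^{-1/2})\,O_p(1) = O_p(n^{-1/2}).$$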
3.26 Write $\nabla^k x_t = (1 - B)^k x_t = \sum_{j=0}^{k} c_j x_{t-j}$ where $c_j$ is the coefficient of $B^j$ in the binomial expansion
of $(1 - B)^k$. Because $x_t$ is stationary, $E(\nabla^k x_t) = \mu_x \sum_{j=0}^{k} c_j$ independent of $t$, and (for $h \ge 0$)
$\mathrm{cov}(\nabla^k x_{t+h}, \nabla^k x_t) = \mathrm{cov}(\sum_{j=0}^{k} c_j x_{t+h-j}, \sum_{j=0}^{k} c_j x_{t-j}) = \sum_{j=0}^{h+k} d_j \gamma_x(j)$, that is, the covariance is a
time independent (linear) function of $\gamma_x(0), \ldots, \gamma_x(h + k)$. Thus $\nabla^k x_t$ is stationary for any $k$.

Write $y_t = m_t + x_t$ where $m_t$ is the given $q$-th order polynomial. Because $\nabla^k x_t$ is stationary for any
$k$, we concentrate on $m_t$. Note that $\nabla m_t = m_t - m_{t-1} = c_q[t^q - (t - 1)^q] + \sum_{j=0}^{q-1} c_j[t^j - (t - 1)^j]$;
from this it follows that the coefficient of $t^q$ is zero. Now assume the result is true for $\nabla^k m_t$ and show
it is true for $\nabla^{k+1} m_t$ [that is, for $k < q$, if $\nabla^k m_t$ is a polynomial of degree $q - k$ then $\nabla^{k+1} m_t$ is a
polynomial of degree $q - (k + 1)$]. The result holds by induction.
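The degree-lowering effect of differencing can be illustrated with a few lines of R. This is only a sketch; the cubic below, with leading coefficient $c_q = 4$, is an arbitrary example and not taken from the problem.

```r
t = 1:20
m = 2 - 3*t + .5*t^2 + 4*t^3        # cubic trend: q = 3, leading coefficient 4
d1 = diff(m)                        # quadratic in t
d3 = diff(m, differences = 3)       # constant, equal to 3! * 4 = 24
d4 = diff(m, differences = 4)       # identically zero
range(d3)
range(d4)
```

Differencing $q$ times leaves the constant $q!\,c_q$, so $\nabla^q y_t$ is that constant plus the stationary series $\nabla^q x_t$.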
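3.27 Write $y_t = x_t - x_{t-1}$, then the model is $y_t = w_t - \lambda w_{t-1}$, which is invertible. That is, $w_t =
\sum_{j=0}^{\infty} \lambda^j y_{t-j} = \sum_{j=0}^{\infty} \lambda^j (x_{t-j} - x_{t-1-j})$. Rearranging, $w_t = x_t - (1 - \lambda)x_{t-1} - \lambda(1 - \lambda)x_{t-2} - \cdots$, or
$x_t = \sum_{j=1}^{\infty} \lambda^{j-1}(1 - \lambda)x_{t-j} + w_t$.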
3.28 See Figure 3. The EWMAs are smoother than the data (note the EWMAs are within the extremes of
the data). The EWMAs are not extremely different for the different values of $\lambda$, the smoothest EWMA
being when $\lambda = .75$.
x = scan("/mydata/varve.dat")
x=log(x[1:100])
plot(x)
a=matrix(c(.25,.5,.75),3,1)
xs=x
for (i in 1:3){for (n in 1:99){
xs[n+1]=(1-a[i])*x[n] + a[i]*xs[n]}
lines(xs, lty=2, col=i+1, lwd=2)}


## Figure 3: EWMAs for Problem 3.28

3.29 Follow the steps of Examples 3.35 and 3.36, performing the diagnostics on gnpgr.ar. The results
should be similar to those in Example 3.36.

3.30 Notice the high volatility near the middle and the end of the series. No ARIMA model will be able to
capture this and we shouldn't expect to obtain a good fit. Given the nature of the data, we suggest
working with the returns; that is, if $x_t$ is the data, one should look at $y_t = \nabla \ln(x_t)$. The ACF and
PACF of $y_t$ suggest an AR(3); that is, the ACF is tailing off whereas the PACF cuts off after lag 3.
Fitting an ARIMA(3,1,0) to $\ln(x_t)$ yields a reasonable fit. The residuals appear to be uncorrelated,
but they are not normal (given the large number of outliers). Below is the R code for this problem.
x = scan("/mydata/gas.dat")
dlx=diff(log(x))
acf(dlx)
pacf(dlx)
fit=arima(log(x), order = c(3, 1, 0))
tsdiag(fit, gof.lag=20)
qqnorm(fit$resid)
shapiro.test(fit$resid)
3.31 An ARIMA(1,1,1) seems to fit the data. Below is R code for the problem:
gtemp=ts(x[,2], start=1880)
plot(gtemp)
par(mfrow=c(2,1))
acf(diff(gtemp), 30)
pacf(diff(gtemp), 30)
fit=arima(gtemp, order=c(1,1,1))
         ar1      ma1
      0.2545  -0.7742
s.e.  0.1141   0.0651
sigma^2 estimated as 0.01728: log likelihood = 75.39, aic = -144.77
tsdiag(fit, gof.lag=20)  # ok
predict(fit, n.ahead=15) # !!!! NOTE BELOW---

The forecasts from R do not follow the trend: arima() fits no constant (drift) term when d = 1, so the
predictions level off. In any case, the forecasts should look more like this:

                     95 Percent Limits
Period   Forecast    Lower      Upper
126      0.576718    0.319895   0.833541
127      0.574925    0.293668   0.856183
128      0.578924    0.287486   0.870361
129      0.584482    0.285658   0.883306
130      0.590461    0.285017   0.895906
131      0.596554    0.284780   0.908327
132      0.602676    0.284740   0.920613
133      0.608808    0.284835   0.932780
134      0.614941    0.285046   0.944835
135      0.621075    0.285363   0.956787
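Forecasts that keep trending, as in the table above, require a drift term. One common way to obtain one from `arima()` is to pass a time index through `xreg`. The sketch below is an illustration on simulated data, not the original analysis; it uses a pure random walk with drift (ARIMA(0,1,0)) so the effect is easy to read, and the seed and variable names are arbitrary. For the temperature series one would presumably pass `order=c(1,1,1)` together with a time regressor in the same way.

```r
set.seed(4)
n = 120
x = cumsum(.05 + rnorm(n, 0, .1))              # simulated random walk with drift .05
tt = 1:n
fit0 = arima(x, order = c(0,1,0))              # no constant: forecasts stay at x[n]
fit1 = arima(x, order = c(0,1,0), xreg = tt)   # time regressor acts as the drift
p0 = predict(fit0, n.ahead = 5)$pred
p1 = predict(fit1, n.ahead = 5, newxreg = (n+1):(n+5))$pred
cbind(no.drift = p0, drift = p1)
coef(fit1)                                     # estimated drift per time step
```

Without the regressor the forecasts are flat at the last observation; with it, each step ahead adds the estimated drift.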

3.32 There is trend so we consider the (first) differenced series, which looks stationary. Investigation of
the ACF and PACF of the differenced series suggests an ARMA(0,1) or ARMA(1,1) model. Fitting an
ARIMA(0,1,1) and ARIMA(1,1,1) to the original data indicates the ARIMA(0,1,1) model; the AR
parameter is not significant in the ARIMA(1,1,1) fit. The residuals appear to be (borderline) white,
but not normal.
x = scan("/mydata/so2.dat")
dx=diff(x)
acf(dx)
pacf(dx)
fit=arima(log(x), order = c(0,1,1))
fit
tsdiag(fit, gof.lag=20)
qqnorm(fit$resid)
shapiro.test(fit$resid)
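3.33 (a) The model is ARIMA$(0, 0, 2) \times (0, 0, 0)_s$ ($s$ can be anything) or ARIMA$(0, 0, 0) \times (0, 0, 1)_2$.

(b) The MA polynomial is $\theta(z) = 1 + \Theta z^2$ with roots $z = \pm i/\sqrt{\Theta}$ outside the unit circle (because
$|\Theta| < 1$). To find the invertible representation, note that $1/[1 - (-\Theta z^2)] = \sum_{j=0}^{\infty} (-\Theta z^2)^j$ from
which we conclude that $\pi_{2j} = (-\Theta)^j$ and $\pi_{2j+1} = 0$ for $j = 0, 1, 2, \ldots$. Consequently
$$w_t = \sum_{k=0}^{\infty} (-\Theta)^k x_{t-2k} .$$

(c) Write $x_{n+m} = -\sum_{k=1}^{\infty} (-\Theta)^k x_{n+m-2k} + w_{n+m}$ from which we deduce that
$$\tilde x_{n+m} = -\sum_{k=1}^{\infty} (-\Theta)^k \tilde x_{n+m-2k}$$
where $\tilde x_t = x_t$ for $t \le n$. For the prediction error, note that $\psi_0 = 1$, $\psi_2 = \Theta$ and $\psi_j = 0$ otherwise.
Using (3.78), $P^n_{n+m} = \sigma_w^2$ for $m = 1, 2$; when $m > 2$ we have $P^n_{n+m} = \sigma_w^2 (1 + \Theta^2)$.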
3.34 Use the code from Example 3.41 with ma=.5 instead of ma=-.5. See Figure 4.

## Figure 4: ACF and PACF for Problem 3.34

3.35 After plotting the unemployment data, say $x_t$, it is clear that one should fit an ARMA model to
$y_t = \nabla\nabla_{12} x_t$. The ACF and PACF of $y_t$ indicate a clear SMA(1) pattern (the seasonal lags in the
ACF cut off after lag 12, whereas the seasonal lags in the PACF tail off at lags 12, 24, 36, and so on).
Next, fit an SARIMA$(0, 1, 0) \times (0, 1, 1)_{12}$ to $x_t$ and look at the ACF and PACF of the residuals. The
within season part of the ACF tails off, and the PACF is either cutting off at lag 2 or is tailing off.
These facts suggest an AR(2) or an ARMA(1,1) for the within season part of the model. Hence, fit
(i) an SARIMA$(2, 1, 0) \times (0, 1, 1)_{12}$ or (ii) an SARIMA$(1, 1, 1) \times (0, 1, 1)_{12}$ to $x_t$. Both models have
the same number of parameters, so it should be clear that model (i) is better because the MSE is
smaller for model (i) and the residuals appear to be white (while there may still be some correlation left
in the residuals for model (ii)). Below is the R code for fitting model (i), along with diagnostics and
forecasting.
x = scan("/mydata/unemp.dat")
par(mfrow=c(2,1))
# (P)ACF of d1-d12 data
acf(diff(diff(x),12), 48)
pacf(diff(diff(x),12), 48)
fiti = arima(x, order=c(2,1,0), seasonal=list(order=c(0,1,1), period=12))
fiti                             # to view the results
tsdiag(fiti, gof.lag=48)         # diagnostics
x.pr = predict(fiti, n.ahead=12) # forecasts
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
month=337:372
plot(month, x[month], type="o", xlim=c(337,384), ylim=c(360,810))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=372.5,lty="dotted")
3.36 The monthly ($s = 12$) U.S. Live Birth Series can be found in birth.dat. After plotting the data, say
$x_t$, it is clear that one should fit an ARMA model to $y_t = \nabla\nabla_{12} x_t$. The ACF and PACF of $y_t$ indicate
a seasonal MA of order one, that is, fit an ARIMA$(0, 0, 0) \times (0, 0, 1)_{12}$ to $y_t$. Looking at the ACF and
PACF of the residuals of that fit suggests fitting a nonseasonal ARMA(1,1) component (both the ACF
and PACF appear to be tailing off). After that, the residuals appear to be white. Finally, we settle
on fitting an ARIMA$(1, 1, 1) \times (0, 1, 1)_{12}$ model to the original data, $x_t$. The code for this problem is
nearly the same as the previous problem.
x=scan("/mydata/birth.dat")
par(mfrow=c(2,1))
# (P)ACF of d1-d12 data
acf(diff(diff(x),12), 48)
pacf(diff(diff(x),12), 48)
### fit the model
fit = arima(x, order=c(1,1,1), seasonal=list(order=c(0,1,1), period=12))
fit                              # to view the results
tsdiag(fit, gof.lag=48)          # diagnostics
x.pr = predict(fit, n.ahead=12)  # forecasts
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
month=337:372
plot(month, x[month], type="o", xlim=c(337,384), ylim=c(240,340))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=372.5,lty="dotted")
3.37 Because of the increasing variability, the data, $jj_t$, should be logged prior to any further analysis.
A plot of the logged data, say $y_t = \ln jj_t$, shows trend, and one should notice the differences in the
behavior of the series at the beginning, middle, and end of the data (as if there are 3 different regimes).
Because of these inconsistencies (nonstationarities), it is difficult to discover an ARMA model and one
should expect students to come up with various models. In fact, assigning this problem may decrease
Next, apply a first difference and seasonal difference to the logged data: $x_t = \nabla\nabla_4 y_t$. The PACF of $x_t$
reveals a large correlation at the seasonal lag 4, so an SAR(1) seems appropriate. The ACF and PACF
of the residuals reveal an ARMA(1,1) correlation structure within the seasons. This seems to
be a reasonable fit. Hence, a reasonable model is an SARIMA$(1, 1, 0) \times (1, 1, 0)_4$ on the logged data.
Below is R code for this problem.
jj=scan("/mydata/jj.dat")
x=diff(diff(log(jj)),4)
par(mfrow=c(2,1))
acf(x, 24)
pacf(x, 24)
fit1 = arima(log(jj),order=c(0,1,0),seasonal=list(order=c(1,1,0), period=4))
par(mfrow=c(2,1))
acf(fit1$resid, 24)
pacf(fit1$resid, 24)
fit2 = arima(log(jj),order=c(1,1,0),seasonal=list(order=c(1,1,0), period=4))
par(mfrow=c(2,1))
acf(fit2$resid, 24)
pacf(fit2$resid, 24)
tsdiag(fit2, gof.lag=24)
### forecasts for the final model
x.pr = predict(fit2, n.ahead=4)   # 4 quarters ahead (quarters 85-88 in the plot)
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
quarter=1:88
plot(quarter, log(jj[quarter]), type="o", ylim=c(-1,4))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=84.5,lty="dotted")
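3.38 Clearly $\sum_{j=1}^{p} \phi_j x_{n+1-j} \in \overline{\rm sp}\{x_k ; k \le n\}$, so it suffices to show that $\widehat x_{n+1} = \sum_{j=1}^{p} \phi_j x_{n+1-j}$ satisfies the prediction
equations $E[(x_{n+1} - \widehat x_{n+1})x_k] = 0$ for $k \le n$. But, by the model assumption, $x_{n+1} - \widehat x_{n+1} = w_{n+1}$ and
$E(w_{n+1} x_k) = 0$ for all $k \le n$.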
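3.39 First note that $x_i - x^{i-1}_i$ and $x_j - x^{j-1}_j$ for $j > i = 1, \ldots, n$ are uncorrelated. This is because $x_i - x^{i-1}_i \in
\overline{\rm sp}\{x_k ; k = 1, \ldots, i\}$ but $x_j - x^{j-1}_j$ is orthogonal (uncorrelated) to $\overline{\rm sp}\{x_k ; k = 1, \ldots, i\}$ by definition of
$x^{j-1}_j$. Thus, by the projection theorem, for $t = 1, 2, \ldots$,
$$x^t_{t+1} = \sum_{k=1}^{t} \theta_{tk} (x_{t+1-k} - x^{t-k}_{t+1-k}) \qquad (1)$$
where the $\theta_{tk}$ are obtained by the prediction equations. Multiply both sides of (1) by $x_{j+1} - x^j_{j+1}$ for
$j = 0, \ldots, t - 1$ and take expectation to obtain
$$E\big[ x^t_{t+1} (x_{j+1} - x^j_{j+1}) \big] = \theta_{t,t-j} P^j_{j+1} .$$
Because of the orthogonality $E[(x_{t+1} - x^t_{t+1})(x_{j+1} - x^j_{j+1})] = 0$ when $j < t$, so the equation above can be
written as
$$E\big[ x_{t+1} (x_{j+1} - x^j_{j+1}) \big] = \theta_{t,t-j} P^j_{j+1} . \qquad (2)$$
Using (1) with $t$ replaced by $j$, (2) can be written as
$$\theta_{t,t-j} = E\Big[ x_{t+1} \Big( x_{j+1} - \sum_{k=1}^{j} \theta_{jk} (x_{j+1-k} - x^{j-k}_{j+1-k}) \Big) \Big] \big( P^j_{j+1} \big)^{-1} .$$
Thus
$$\theta_{t,t-j} = \Big( \gamma(t - j) - \sum_{k=1}^{j} \theta_{jk}\, E\big[ x_{t+1} (x_{j+1-k} - x^{j-k}_{j+1-k}) \big] \Big) \big( P^j_{j+1} \big)^{-1} . \qquad (3)$$
Using (2) with $j$ replaced by $j - k$ we can write $E[x_{t+1}(x_{j+1-k} - x^{j-k}_{j+1-k})] = \theta_{t,t-j+k} P^{j-k}_{j-k+1}$ so, after re-indexing the sum, (3) can be written in the form of
(2.71). To show (2.70), first note that $E(x_{t+1} x^t_{t+1}) = E[x^t_{t+1} E(x_{t+1} | x_1, \ldots, x_t)] = E[(x^t_{t+1})^2]$. Then,
for $t = 1, 2, \ldots$,
$$P^t_{t+1} = E(x_{t+1} - x^t_{t+1})^2 = \gamma(0) - E[(x^t_{t+1})^2] = \gamma(0) - \sum_{j=0}^{t-1} \theta^2_{t,t-j} P^j_{j+1} .$$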

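3.40 (a) Write the predictor as $x^n_{n+1} = \sum_{k=1}^{n} a_k x_k$. The prediction equations, divided by $\sigma_w^2$, are
$$\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 \\
0 & -1 & 2 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & -1 & 2
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ -1 \end{pmatrix} .$$
Solving recursively we get $a_2 = 2a_1$, $a_3 = 3a_1$, and in general, $a_k = k a_1$ for $k = 1, \ldots, n$. This
fact and the last equation gives $a_1 = -\frac{1}{n+1}$ and the result follows.
(b) MSE $= \gamma(0) - a_1\gamma(n) - a_2\gamma(n - 1) - \cdots - a_n\gamma(1) = \sigma_w^2 [2 - n/(n + 1)] = \frac{(n+2)}{(n+1)} \sigma_w^2$.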

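3.41 Let $x = (x_1, \ldots, x_n)'$, then $x \sim$ N$(0, \Gamma_n)$ with likelihood $L(x) = |\Gamma_n|^{-1/2} \exp\{-\frac{1}{2} x' \Gamma_n^{-1} x\}$ (ignoring a
constant). Note that $x^{t-1}_t = P_{\overline{\rm sp}\{x_1, \ldots, x_{t-1}\}} x_t = E(x_t | x_1, \ldots, x_{t-1})$ because of the normality assumption.
Hence, the innovations, $\epsilon_t = x_t - x^{t-1}_t$, are independent Normal random variables with variance
$P^{t-1}_t$ for $t = 1, \ldots, n$. Because $x^{t-1}_t$ is a linear combination of $x_1, \ldots, x_{t-1}$, the transformation of $x \to \epsilon$,
where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$, is lower triangular with ones along the diagonal; i.e. $x = C\epsilon$, where $C$ is lower
triangular. In fact, Problem 3.39 shows that (with $\theta_{ij}$ defined there)
$$C = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\theta_{11} & 1 & 0 & \cdots & 0 \\
\theta_{22} & \theta_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\theta_{n-1,n-1} & \theta_{n-1,n-2} & \cdots & \theta_{n-1,1} & 1
\end{pmatrix} .$$
Thus, $L(x) = L(C\epsilon)$. Noting that $C\epsilon \sim$ N$(0, CDC')$, where $D = \mathrm{diag}\{P^0_1, P^1_2, \ldots, P^{n-1}_n\}$ we have
$$L(x) = L(C\epsilon) = |CDC'|^{-1/2} \exp\{-\tfrac{1}{2} \epsilon' C' (CDC')^{-1} C \epsilon\} .$$
This establishes the result noting that $|CDC'| = |C|^2 |D|$ and $|C|^2 = 1$, $|D| = P^0_1 P^1_2 \cdots P^{n-1}_n$, and in
the exponential $\epsilon' C' (CDC')^{-1} C \epsilon = \epsilon' D^{-1} \epsilon = \sum_{t=1}^{n} (x_t - x^{t-1}_t)^2 / P^{t-1}_t$.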
3.42 These results are proven in Brockwell and Davis (1991, Proposition 2.3.2).
3.43 The proof of Property P2.2 is virtually identical to the proof of Property P2.1 given in Appendix B.


Chapter 4

4.1 (a)(b) The code is basically the same as the example and is given below. The difference is the
frequencies in the data (which are .06, .1, .4) are no longer fundamental frequencies (which are of the
form k/128). Consequently, the periodogram will have non-zero entries near .06, .1, .4 (unlike the
example where all other frequencies are zero).
t = 1:128
x1 = 2*cos(2*pi*t*6/100) + 3*sin(2*pi*t*6/100)
x2 = 4*cos(2*pi*t*10/100) + 5*sin(2*pi*t*10/100)
x3 = 6*cos(2*pi*t*40/100) + 7*sin(2*pi*t*40/100)
x = x1 + x2 + x3
par(mfrow=c(2,2))
plot.ts(x1, ylim=c(-16,16), main="freq=6/100, amp^2=13")
plot.ts(x2, ylim=c(-16,16), main="freq=10/100, amp^2=41")
plot.ts(x3, ylim=c(-16,16), main="freq=40/100, amp^2=85")
plot.ts(x, ylim=c(-16,16), main="sum")
P = abs(2*fft(x)/128)^2
f = 0:64/128
plot(f, P[1:65], type="o", xlab="frequency", ylab="periodogram")
(c) Use the same code as in the example, but with x = x1 + x2 + x3 + rnorm(128,0,5) (the noise must have
the same length, 128, as the signal). Now the periodogram will have large peaks at .06, .1, .4, but will
also be positive at most other fundamental frequencies.
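4.2 (a) Rewrite the transformation as
$$x = \tan^{-1}\Big( \frac{z_2}{z_1} \Big), \qquad y = z_1^2 + z_2^2 .$$
Note that
$$\frac{\partial}{\partial x} \tan^{-1} u = \frac{1}{1 + u^2} \frac{\partial u}{\partial x} .$$
Write the joint density of $x$ and $y$ as
$$g(x, y) = f(z_1, z_2) J,$$
where $J$ denotes the Jacobian, i.e., the (absolute) determinant of the $2 \times 2$ matrix $\{\partial(z_1, z_2)/\partial(x, y)\}$. It is easier to
compute
$$\frac{1}{J} = \left| \det \begin{pmatrix} \dfrac{\partial x}{\partial z_1} & \dfrac{\partial x}{\partial z_2} \\[2mm] \dfrac{\partial y}{\partial z_1} & \dfrac{\partial y}{\partial z_2} \end{pmatrix} \right|
= \left| \det \begin{pmatrix} \dfrac{-z_2}{z_1^2 + z_2^2} & \dfrac{z_1}{z_1^2 + z_2^2} \\[2mm] 2z_1 & 2z_2 \end{pmatrix} \right| = 2,$$
implying that $J = \frac{1}{2}$. For the joint density of $x$ and $y$, we obtain
$$g(x, y) = \frac{1}{2\pi} \exp\Big\{ -\frac{1}{2}(z_1^2 + z_2^2) \Big\} \frac{1}{2} = \frac{1}{2\pi} \cdot \frac{1}{2} \exp\Big\{ -\frac{y}{2} \Big\}$$
for $0 < x < 2\pi$ and $0 < y < \infty$. Integrating over $x$ and $y$ separately shows that the density factors
into a product of marginal densities as stated in the problem.
(b) Going the other way, we use
$$h(z_1, z_2) = g(x, y) \frac{1}{J} = \frac{1}{2\pi} \cdot \frac{1}{2} \exp\{-y/2\} \cdot 2 = \frac{1}{2\pi} \exp\Big\{ -\frac{1}{2}(z_1^2 + z_2^2) \Big\},$$
since $z_1^2 + z_2^2 = (\sqrt{y})^2 (\cos^2(x) + \sin^2(x)) = y$.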


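4.3 This is similar to Problem 1.9. Write the terms in the sum (4.4) as $x_{t,k}$ and note that $x_{t,k}$ and $x_{t,\ell}$ are
uncorrelated for $k \ne \ell$.
$$\begin{aligned}
\gamma_k(h) &= E(x_{t+h,k}\, x_{t,k}) \\
&= E\big\{ \big( U_{1k} \sin[2\pi\omega_k (t + h)] + U_{2k} \cos[2\pi\omega_k (t + h)] \big) \big( U_{1k} \sin[2\pi\omega_k t] + U_{2k} \cos[2\pi\omega_k t] \big) \big\} \\
&= \sigma_k^2 \big\{ \sin[2\pi\omega_k (t + h)] \sin[2\pi\omega_k t] + \cos[2\pi\omega_k (t + h)] \cos[2\pi\omega_k t] \big\} \\
&= \sigma_k^2 \cos[2\pi\omega_k (t + h) - 2\pi\omega_k t] = \sigma_k^2 \cos[2\pi\omega_k h]
\end{aligned}$$
and $\gamma(h) = \sum_{k=1}^{q} \gamma_k(h)$ give (4.5).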
4.4 (a) Ewt = Ext = 0 by linearity, w (0) = 1 and zero otherwise; x (0) = (1 + 12 ), x (1) = 1 , and
is zero otherwise. The series are stationary because they are zero mean and the autocovariance
does not depend on time but only on the shift.
(b) By (4.13),
fx () =

1


h=1

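4.5 (a) Write the equation as
$$x_t - \phi x_{t-1} = w_t$$
and note that the spectrum of the righthand side is $\sigma_w^2$. The ACF of the lefthand side is
$$\begin{aligned}
\gamma_w(h) &= E[(x_{t+h} - \phi x_{t+h-1})(x_t - \phi x_{t-1})] \\
&= (1 + \phi^2)\gamma_x(h) - \phi\gamma_x(h - 1) - \phi\gamma_x(h + 1) \\
&= \int_{-1/2}^{1/2} e^{2\pi i\omega h} \big[ 1 + \phi^2 - \phi e^{-2\pi i\omega} - \phi e^{2\pi i\omega} \big] f_x(\omega)\, d\omega \\
&= \int_{-1/2}^{1/2} e^{2\pi i\omega h} \big[ 1 + \phi^2 - 2\phi\cos(2\pi\omega) \big] f_x(\omega)\, d\omega,
\end{aligned}$$
which exhibits the form of the spectrum by the uniqueness of the Fourier transform. Equating
the spectra of the left and right sides of the defining equation leads to
$$[1 + \phi^2 - 2\phi\cos(2\pi\omega)] f_x(\omega) = \sigma_w^2$$
and the quoted result.

(b) From (4.13), write
$$\begin{aligned}
f_x(\omega) &= \frac{\sigma_w^2}{1 - \phi^2} \sum_{h=-\infty}^{-1} \phi^{-h} e^{-2\pi i\omega h} + \frac{\sigma_w^2}{1 - \phi^2} + \frac{\sigma_w^2}{1 - \phi^2} \sum_{h=1}^{\infty} \phi^{h} e^{-2\pi i\omega h} \\
&= \frac{\sigma_w^2}{1 - \phi^2} \Big[ \sum_{h=0}^{\infty} (\phi e^{-2\pi i\omega})^h + \sum_{h=1}^{\infty} (\phi e^{2\pi i\omega})^h \Big] \\
&= \frac{\sigma_w^2}{1 - \phi^2} \Big[ \frac{1}{1 - \phi e^{-2\pi i\omega}} + \frac{\phi e^{2\pi i\omega}}{1 - \phi e^{2\pi i\omega}} \Big] \\
&= \frac{\sigma_w^2}{1 - \phi^2} \cdot \frac{1 - \phi^2}{|1 - \phi e^{-2\pi i\omega}|^2} \\
&= \frac{\sigma_w^2}{1 + \phi^2 - 2\phi\cos(2\pi\omega)} .
\end{aligned}$$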
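The identity in part (b) can be verified numerically by truncating the Fourier sum of the autocovariances. The sketch below is illustrative; the values $\phi = .7$, $\sigma_w^2 = 1$ and the truncation point `H` are arbitrary assumptions.

```r
phi = .7
sig2 = 1
omega = seq(0, .5, by = .01)
f.closed = sig2/(1 + phi^2 - 2*phi*cos(2*pi*omega))
# Truncated Fourier sum of gamma(h) = sig2 * phi^|h| / (1 - phi^2) over |h| <= H
H = 200
h = -H:H
gam = sig2*phi^abs(h)/(1 - phi^2)
f.sum = sapply(omega, function(w) Re(sum(gam*exp(-2i*pi*w*h))))
max(abs(f.sum - f.closed))   # should be numerically zero
```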


4.6 (a) First, note that the autocovariance function is
$$\gamma_x(h) = (1 + A^2)\gamma_s(h) + A\gamma_s(h - D) + A\gamma_s(h + D) + \gamma_n(h) .$$
Using the spectral representation directly,
$$\gamma_x(h) = \int_{-1/2}^{1/2} \Big\{ \big[ 1 + A^2 + A e^{2\pi i\omega D} + A e^{-2\pi i\omega D} \big] f_s(\omega) + f_n(\omega) \Big\} e^{2\pi i\omega h}\, d\omega .$$
Substituting the exponential representation for $\cos(2\pi\omega D)$ and using the uniqueness gives the
required result.
(b) Note that the multiplier for the signal spectrum is periodic and will be zero for
$$\cos(2\pi\omega D) = -\frac{1 + A^2}{2A} .$$
Determining the multiple solutions for $\omega$ in the above equation will yield equally spaced values of
$\omega$, spaced $1/D$ apart, where the spectrum should be zero.
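4.7 The product series will have mean $E(x_t y_t) = E(x_t)E(y_t) = 0$ and autocovariance
$$\gamma_z(h) = E\, x_{t+h} y_{t+h} x_t y_t = E(x_{t+h} x_t) E(y_{t+h} y_t) = \gamma_x(h)\gamma_y(h) .$$
Now, by (4.12) and (4.13)
$$\begin{aligned}
f_z(\omega) &= \sum_{h=-\infty}^{\infty} \gamma_x(h)\gamma_y(h) \exp\{-2\pi i\omega h\}
= \int_{-1/2}^{1/2} \sum_{h=-\infty}^{\infty} \gamma_x(h) e^{-2\pi i\omega h} e^{2\pi i\lambda h} f_y(\lambda)\, d\lambda \\
&= \int_{-1/2}^{1/2} \sum_{h=-\infty}^{\infty} \gamma_x(h) e^{-2\pi i(\omega - \lambda)h} f_y(\lambda)\, d\lambda
= \int_{-1/2}^{1/2} f_x(\omega - \lambda) f_y(\lambda)\, d\lambda .
\end{aligned}$$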

4.8 Below is R code that will plot the periodogram on the actual scale and then on a log scale (this produces
a generic confidence interval; see Example 4.9 on how to get precise limits). The two major peaks are
marked; they are 3 cycles/480 points = 3 cycles/240 years or 80 years/cycle, and 22 cycles/480 points
= 22 cycles/240 years or about 11 years/cycle.
sun = scan("/mydata/sunspots.dat")
par(mfrow=c(2,1))
sun.per = spec.pgram(sun, taper=0,log="no")
sun.per = spec.pgram(sun, taper=0)
abline(v=3/480, lty="dashed") # 80 year cycle
abline(v=22/480, lty="dashed") # 11 year cycle
4.9 This is like the previous problem; the main component is 1 cycle/16 rows, although there's not enough
data to get significance.
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
par(mfrow=c(2,1))
temp.per = spec.pgram(temp, taper=0,log="no")
temp.per = spec.pgram(temp, taper=0)
abline(v=2/32, lty="dashed")
salt.per = spec.pgram(salt, taper=0,log="no")
salt.per = spec.pgram(salt, taper=0)
abline(v=2/32, lty="dashed")


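4.10 (a) Write the model in the notation of Chapter 2 as $x_t = \beta' z_t + w_t$, where $z_t = (\cos(2\pi\omega_k t), \sin(2\pi\omega_k t))'$
and $\beta = (\beta_1, \beta_2)'$. Then
$$\sum_{t=1}^{n} z_t z_t' = \begin{pmatrix} \sum_{t=1}^{n} \cos^2(2\pi\omega_k t) & \sum_{t=1}^{n} \cos(2\pi\omega_k t)\sin(2\pi\omega_k t) \\ \sum_{t=1}^{n} \cos(2\pi\omega_k t)\sin(2\pi\omega_k t) & \sum_{t=1}^{n} \sin^2(2\pi\omega_k t) \end{pmatrix}
= \begin{pmatrix} n/2 & 0 \\ 0 & n/2 \end{pmatrix}$$
from the orthogonality properties of the sines and cosines. For example,
$$\begin{aligned}
\sum_{t=1}^{n} \cos^2(2\pi\omega_k t) &= \frac{1}{4} \sum_{t=1}^{n} \big( e^{2\pi i\omega_k t} + e^{-2\pi i\omega_k t} \big)\big( e^{2\pi i\omega_k t} + e^{-2\pi i\omega_k t} \big) \\
&= \frac{1}{4} \sum_{t=1}^{n} \big( e^{4\pi i\omega_k t} + 1 + 1 + e^{-4\pi i\omega_k t} \big) = \frac{n}{2},
\end{aligned}$$
because, for example,
$$\sum_{t=1}^{n} e^{4\pi i\omega_k t} = e^{4\pi i k/n}\, \frac{1 - e^{4\pi i k}}{1 - e^{4\pi i k/n}} = 0 .$$
Substituting,
$$\begin{pmatrix} \widehat\beta_1 \\ \widehat\beta_2 \end{pmatrix} = \frac{2}{n} \begin{pmatrix} \sum_{t=1}^{n} x_t \cos(2\pi\omega_k t) \\ \sum_{t=1}^{n} x_t \sin(2\pi\omega_k t) \end{pmatrix} = 2n^{-1/2} \begin{pmatrix} d_c(\omega_k) \\ d_s(\omega_k) \end{pmatrix} .$$

(b) Now,
$$\begin{aligned}
SSE &= x'x - 2n^{-1/2} \big( d_c(\omega_k),\ d_s(\omega_k) \big) \begin{pmatrix} \sum_{t=1}^{n} x_t \cos(2\pi\omega_k t) \\ \sum_{t=1}^{n} x_t \sin(2\pi\omega_k t) \end{pmatrix} \\
&= x'x - 2\big[ d_c^2(\omega_k) + d_s^2(\omega_k) \big] = x'x - 2I_x(\omega_k) .
\end{aligned}$$
(c) The reduced model is given by $x_t = w_t$, so that $RSS_1 = \sum_{t=1}^{n} x_t^2 = x'x$ and $RSS$ is given in part
(b). For the $F$-test we have $q = 2$, $q_1 = 0$, so that
$$F_{2,n-2} = \frac{2I_x(\omega_k)}{x'x - 2I_x(\omega_k)} \cdot \frac{n - 2}{2}$$
is monotone in $I_x$.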
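The relations in parts (a) and (b) can be confirmed with a short regression. This is a sketch on simulated data, not part of the original solution; the series, the choices $n = 64$ and $k = 5$, and the variable names are assumptions made for illustration.

```r
set.seed(5)
n = 64
k = 5
t = 1:n
x = 2*cos(2*pi*t*k/n) + rnorm(n)
zc = cos(2*pi*t*k/n)
zs = sin(2*pi*t*k/n)
fit = lm(x ~ 0 + zc + zs)               # no intercept, as in the model of part (a)
I.k = Mod(fft(x)[k + 1])^2/n            # periodogram ordinate at omega_k = k/n
sse.reg = sum(resid(fit)^2)             # SSE from the regression
sse.per = sum(x^2) - 2*I.k              # x'x - 2 I_x(omega_k)
beta.dft = (2/n)*c(sum(x*zc), sum(x*zs))
c(sse.reg, sse.per)                     # should agree
rbind(coef(fit), beta.dft)              # rows should agree
```

The index `k + 1` is used because `fft()` returns the ordinate for frequency $0$ first; its time origin differs from $t = 1$ only by a phase factor, which does not affect the modulus.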
4.11 By applying the definition to $x_{t-s}$, we obtain
$$\begin{aligned}
\sum_{s=1}^{n} a_s x_{t-s} &= n^{-1/2} \sum_{k=0}^{n-1} d_x(\omega_k) \sum_{s=1}^{n} a_s e^{2\pi i\omega_k (t-s)} \\
&= \sum_{k=0}^{n-1} d_x(\omega_k)\, n^{-1/2} \sum_{s=1}^{n} a_s e^{-2\pi i\omega_k s} e^{2\pi i\omega_k t}
= \sum_{k=0}^{n-1} d_A(\omega_k) d_x(\omega_k) e^{2\pi i\omega_k t} .
\end{aligned}$$
4.12 Continuing from Problem 4.8:
sun = scan("/mydata/sunspots.dat")
par(mfrow=c(2,1))
sun.per = spectrum(sun, spans=c(7,7),log="no")
sun.per = spectrum(sun, spans=c(7,7))
abline(v=3/480, lty="dashed") # 80 year cycle
abline(v=22/480, lty="dashed") # 11 year cycle
4.13 Continuing from Problem 4.9:
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
par(mfrow=c(2,1))
temp.per = spec.pgram(temp, spans=5, log="no")
abline(v=2/32, lty="dashed")
salt.per = spectrum(salt, spans=5, log="no")
abline(v=2/32, lty="dashed")
4.14 R code and discussion below. Also, see Figure 1.

speech = scan("/mydata/speech.dat")
sp.per = spec.pgram(speech, taper=0) # plots the periodogram - which is periodic
x=sp.per$spec                 # x is the periodogram
x=log(x)                      # log periodogram
ts.plot(x)                    # another plot as a time series.
x.sp=spectrum(x,span=5)       # cepstral analysis, x is detrended by default in R
abline(v=.1035, lty="dashed")
cbind(x.sp$freq,x.sp$spec)    # this lists the quefrencies and cepstra
[52,] 0.101562500 32.7549412
[53,] 0.103515625 34.8354468 # peak is around here, so Delay is about .1035 seconds
[54,] 0.105468750 30.3669195
# which is about the same result as Example 1.24

## Figure 1: Figure for Problem 4.14

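4.15 For $y_t = h_t x_t$, the DFT is
$$d_y(\omega_k) = n^{-1/2} \sum_{t=1}^{n} h_t x_t e^{-2\pi i\omega_k t} .$$
Then, the expectation of the squared DFT is
$$\begin{aligned}
E|d_y(\omega_k)|^2 &= n^{-1} \sum_{s=1}^{n} \sum_{t=1}^{n} h_s h_t \gamma(s - t) e^{-2\pi i\omega_k (s-t)} \\
&= n^{-1} \int_{-1/2}^{1/2} \sum_{s=1}^{n} \sum_{t=1}^{n} h_s h_t e^{-2\pi i(\omega_k - \omega)(s-t)} f_x(\omega)\, d\omega \\
&= n^{-1} \int_{-1/2}^{1/2} \sum_{s=1}^{n} h_s e^{-2\pi i(\omega_k - \omega)s} \sum_{t=1}^{n} h_t e^{2\pi i(\omega_k - \omega)t} f_x(\omega)\, d\omega \\
&= \int_{-1/2}^{1/2} |H_n(\omega_k - \omega)|^2 f_x(\omega)\, d\omega .
\end{aligned}$$
It follows that
$$\begin{aligned}
E\Big[ \frac{1}{L} \sum_{\ell} |Y(\omega_k + \ell/n)|^2 \Big]
&= \int_{-1/2}^{1/2} \frac{1}{L} \sum_{\ell} |H_n(\omega_k + \ell/n - \omega)|^2 f_x(\omega)\, d\omega \\
&= \int_{-1/2}^{1/2} W_n(\omega_k - \omega) f_x(\omega)\, d\omega .
\end{aligned}$$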

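4.16 (a) Since the means are both zero and the ACFs and CCFs
$$\gamma_x(h) = \begin{cases} 2 & h = 0 \\ -1 & h = \pm 1 \\ 0 & |h| \ge 2 \end{cases}
\qquad
\gamma_y(h) = \begin{cases} 1/2 & h = 0 \\ 1/4 & h = \pm 1 \\ 0 & |h| \ge 2 \end{cases}
\qquad
\gamma_{xy}(h) = \begin{cases} 0 & h = 0 \\ -1/2 & h = 1 \\ 1/2 & h = -1 \\ 0 & |h| \ge 2 \end{cases}$$
do not depend on the time index, the series are jointly stationary.
(b)
$$f_x(\omega) = |1 - e^{-2\pi i\omega}|^2 = 2(1 - \cos(2\pi\omega))
\quad\text{and}\quad
f_y(\omega) = \frac{1}{4} |1 + e^{-2\pi i\omega}|^2 = \frac{1}{2}(1 + \cos(2\pi\omega)) .$$
As $\omega$ goes from $0 \to \frac{1}{2}$, $f_x(\omega)$ increases, whereas $f_y(\omega)$ decreases. This means $x_t$ has more high
frequency behavior and $y_t$ has more low frequency behavior.
(c)
$$P\{ a \le \bar f_y(.10) \le b \} = P\Big\{ \frac{2La}{f_y(.10)} \le \frac{2L \bar f_y(.10)}{f_y(.10)} \le \frac{2Lb}{f_y(.10)} \Big\}
= P\Big\{ \frac{2La}{f_y(.10)} \le \chi^2_{2L} \le \frac{2Lb}{f_y(.10)} \Big\} .$$
We can make the probability equal to .90 by setting
$$\frac{2La}{f_y(.10)} = \chi^2_{2L}(.95) \quad\text{and}\quad \frac{2Lb}{f_y(.10)} = \chi^2_{2L}(.05) .$$
Setting $L = 3$, $\chi^2_6(.95) = 1.635$, $\chi^2_6(.05) = 12.592$, $f_y(.10) = .9045$ and solving for $a$ and $b$ yields
$a = .25$, $b = 1.90$.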
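The numbers in part (c) can be reproduced directly in R. This is a small sketch; the variable names are arbitrary. The solution's $\chi^2_6(.95)$ and $\chi^2_6(.05)$ are upper-tail points, so they correspond to `qchisq(.05, 6)` and `qchisq(.95, 6)`, respectively.

```r
L = 3
fy = .5*(1 + cos(2*pi*.10))       # f_y(.10), about .9045
lo = qchisq(.05, 2*L)             # about 1.635
hi = qchisq(.95, 2*L)             # about 12.592
a = lo*fy/(2*L)
b = hi*fy/(2*L)
round(c(fy = fy, a = a, b = b), 4)
```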
4.17 The analysis is similar to that of Example 4.16. The squared coherency is very large at periods ranging
from 16-32 points, or 272-544 feet (1 point = 17 feet). R code below:
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
x = ts(cbind(temp,salt))
s = spec.pgram(x, kernel("daniell",2), taper=0)
s$df                        # = 10
L = s$df/2                  # = 5
f = qf(.999, 2, s$df-2)     # = 18.49365
c = f/(L-1+f)               # = 0.8222 (threshold F/(L-1+F) with L = 5)
plot(s, plot.type = "coh", ci.lty = 2)
abline(h = c)
cbind(s$freq, s$coh)
          [,1]        [,2]
[1,] 0.015625 0.598399213
[2,] 0.031250 0.859492914 # period 1/.03 = 32
[3,] 0.046875 0.891469033
[4,] 0.062500 0.911331648 # period 1/.06 = 16
[5,] 0.078125 0.749974642


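4.18 (a) $\gamma_{xy}(h) = \mathrm{cov}(x_{t+h}, y_t) = \mathrm{cov}(w_{t+h}, \phi w_{t-D} + v_t) = \phi\sigma^2\delta_h^{-D}$, where $\delta_h^{-D} = 1$ when $h = -D$ and zero otherwise. Thus, $f_{xy}(\omega) = \sum_h \gamma_{xy}(h)\exp(-2\pi i\omega h) = \phi\sigma^2\exp(2\pi i\omega D)$. Also, $f_x(\omega) = \sigma^2$ and $f_y(\omega) = \sigma^2(1+\phi^2)$ using Property P4.4 and the fact that $w_t$ and $v_t$ are independent. Finally, $\rho^2_{xy}(\omega) = |\phi\sigma^2\exp(2\pi i\omega D)|^2/[\sigma^2\,\sigma^2(1+\phi^2)] = \phi^2/(1+\phi^2)$, which is constant and does not depend on the value of $D$.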

(b) In this case, $\rho^2_{xy}(\omega) = .81/1.81 = .45$. The R code to simulate the data and estimate the coherence is given below. Note that using L = 1 gives a value of 1 no matter what the processes are, and increasing L (span) gives better estimates.
x=rnorm(1024,0,1)
y=.9*x+rnorm(1024,0,1)
u = ts(cbind(x,y))
s=spec.pgram(u, taper=0, plot=F)         # use this for span=0 or
s=spectrum(u, span=3, taper=0, plot=F)   # this for span=3 (span=41 and span=101)
plot(s, plot.type = "coh")               # -- these two lines can be used
abline(h = .81/1.81, lty="dashed")       # -- to obtain plots for each case
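The remark that L = 1 always gives a squared coherence of 1 can be seen directly from the raw periodogram. The sketch below assumes an arbitrary seed and an illustrative Daniell kernel with m = 4 (L = 9); neither choice comes from the original solution.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
x <- rnorm(1024)
y <- 0.9 * x + rnorm(1024)
u <- ts(cbind(x, y))
# L = 1: raw periodogram, no smoothing
s1 <- spec.pgram(u, taper = 0, plot = FALSE)
# L = 9: Daniell kernel with m = 4 (illustrative choice)
s9 <- spec.pgram(u, kernel("daniell", 4), taper = 0, plot = FALSE)
range(s1$coh)                  # identically 1 up to rounding error
mean(s9$coh)                   # compare with .81/1.81
```

With no smoothing, the estimated cross-spectrum is a product of two complex numbers at each frequency, so its squared modulus equals the product of the two periodograms exactly.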
4.19 (a) It follows from the solution to the previous problem that $\phi_{xy}(\omega) = 2\pi\omega D$. Hence the slope of the phase divided by $2\pi$ is the delay $D$.
(b) Bigger values of L give better estimates.
x=ts(rnorm(1025,0,1))
y=.9*lag(x,-1)+rnorm(1025,0,1)
u = ts(cbind(x,y))
u = u[2:1025,]                               # drop the NAs
s = spectrum(u, span=101, taper=0, plot=F)   # use span=3,41,101(displayed)
plot(s, plot.type = "phase")
abline(a=0,b=2*pi, lty="dashed")
# for L=1 use: s = spec.pgram(u, taper=0, plot=F)
4.20 (a) The R code for the cross-spectral analysis of the two series is below:
x=ts(scan("/mydata/prod.dat"))
y=ts(scan("/mydata/unemp.dat"))
pu=cbind(x,y)
par(mfrow=c(2,1))
pu.sp=spectrum(pu, span=7)
abline(v=c(1/12,2/12,3/12,4/12,5/12),lty="dashed")
plot(pu.sp, plot.type="coh")
See Figures 2-4. The log spectra, with L = 7, show substantial peaks at periods of 2.5 months, 3 months, 4 months and 6 months for the production spectrum and significant peaks at those periods plus a 12 month or one year periodicity in the unemployment spectrum. It is natural that the series tend to repeat yearly and quarterly, so that 12 month and 3 month periods would be expected. The 6 month period could be winter-summer fluctuations or possibly a harmonic of the yearly cycle. The 4 month period could be a three cycle per year variation due to something less than quarterly variation or possibly a harmonic of the yearly cycle (recall harmonics of 1/12 are of the form k/12, for k = 2, 3, 4, ...). The squared coherence is large at the seasonal frequencies, as well as at a low frequency of about 33 months, or three years, possibly due to a common low frequency business cycle. High coherence at a particular frequency indicates parallel movement between two series at the frequency, but not necessarily causality.
(b) The following code will plot the frequency response functions; see Figure 3.
w = seq(0,.5, length=1000)
par(mfrow=c(2,1))
FR12 = abs(1-exp(2i*12*pi*w))^2
plot(w, FR12, type="l", main="12th difference")
FR112 = abs(1-exp(2i*pi*w)-exp(2i*12*pi*w)+exp(2i*13*pi*w))^2
plot(w, FR112, type="l", main="1st diff and 12th diff")

[Figure: log spectra plotted against frequency (0 to 0.5), with peaks labeled at periods of 12, 6, 4, 3 and 2.5 months, and squared coherence with the F2,8(.01) threshold marked and peaks labeled at 33, 9, 6, 4, 3 and 2.5 months.]

Figure 3: Squared frequency response of various filters. [Panels: "12th difference" (FR12) and "1st diff and 12th diff" (FR112), plotted against w from 0 to 0.5.]

The frequency response resulting from the application of the standard difference, followed by a seasonal difference, shows that the low frequencies and seasonal frequencies should be attenuated, the low frequencies by the difference and the seasonal frequencies by the seasonal difference. The filtered series are plotted in Figure 4; the first difference obviously eliminates the trend but there are still regular seasonal peaks, gradually increasing over the length of the series. The final filtered series tends to eliminate the regular seasonal patterns and retain the intervening frequencies.
(c) As mentioned before, the filtered outputs are shown in Figure 4. Figure 5 shows the spectral analysis of the three series. The first shows the spectrum of the original production series with the low and seasonal frequency components. The second shows the spectrum of the differenced series and we see that the low frequency components have been attenuated while the seasonal components remain. The third shows the spectrum of the seasonally differenced differences, and the power at the seasonal components is essentially notched out. Economists would prefer a flatter response for the seasonally adjusted series and the design of seasonal adjustment filters that maintain a flatter response is a continuing saga. Shumway (1988, Section 4.4.3) shows an example.

Figure 4: Production and Filtered Series. [Panels: Production Index; First Difference; Seasonal Difference of First Difference; horizontal axis: Month.]

Figure 5: Log spectra of production, differenced production and seasonally differenced differenced production. [Panels: Production log spectrum; First difference attenuates low frequencies; Seasonal attenuates seasonal frequencies; axes: Power against Frequency.]
4.21 Write the filter in the general form (4.91), with $a_{-2} = a_2 = 1$, $a_{-1} = a_1 = 4$, $a_0 = 6$. Then
$$
A(\omega) = \sum_{t=-2}^{2} a_t e^{-2\pi i\omega t}
= (e^{4\pi i\omega} + e^{-4\pi i\omega}) + 4(e^{2\pi i\omega} + e^{-2\pi i\omega}) + 6
= 6 + 2\cos(4\pi\omega) + 8\cos(2\pi\omega).
$$
By (4.94), the spectrum of the output series is $f_y(\omega) = (6 + 2\cos(4\pi\omega) + 8\cos(2\pi\omega))^2 f_x(\omega)$. The spectrum of the output series will depend on the spectrum of the input series, but we can see how frequencies of the input series are modified by plotting the squared frequency response function $|A(\omega)|^2$. After plotting the frequency response function, it will be obvious that the high frequencies are attenuated and the lower frequencies are not. The filter is referred to as a low pass filter, because it keeps or passes the low frequencies.
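The squared frequency response described above can be plotted with a few lines of R. This is a minimal sketch; the grid size and the cross-check against the defining sum are illustrative additions, not part of the original solution.

```r
w <- seq(0, 0.5, length.out = 500)                    # frequency grid
A <- 6 + 2 * cos(4 * pi * w) + 8 * cos(2 * pi * w)    # closed form from the solution
plot(w, A^2, type = "l", xlab = "frequency", ylab = "|A|^2")
# Cross-check against the definition A(w) = sum_t a_t exp(-2 pi i w t)
a <- c(1, 4, 6, 4, 1)
k <- -2:2
A.def <- sapply(w, function(f) sum(a * exp(-2i * pi * f * k)))
max(abs(A - Re(A.def)))                               # should be near zero
```

The response is largest at frequency 0 and falls to zero at frequency 1/2, which is the low pass behavior described in the solution.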


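4.22
$$
y_t = \sum_{k=-\infty}^{\infty} a_k\cos[2\pi\omega(t-k)]
= \tfrac12\left[e^{2\pi i\omega t}A(\omega) + e^{-2\pi i\omega t}A^*(\omega)\right]
= \mathrm{Re}\left[A(\omega)e^{2\pi i\omega t}\right]
$$
$$
= \mathrm{Re}\left[(A_R(\omega) - iA_I(\omega))(\cos(2\pi\omega t) + i\sin(2\pi\omega t))\right]
= A_R(\omega)\cos(2\pi\omega t) + A_I(\omega)\sin(2\pi\omega t)
$$
$$
= |A(\omega)|\left[\cos(\phi(\omega))\cos(2\pi\omega t) - \sin(\phi(\omega))\sin(2\pi\omega t)\right]
= |A(\omega)|\cos(2\pi\omega t + \phi(\omega)),
$$
where $|A(\omega)| = \sqrt{A_R^2(\omega) + A_I^2(\omega)}$, $A_R(\omega) = |A(\omega)|\cos\phi(\omega)$ and $A_I(\omega) = -|A(\omega)|\sin\phi(\omega)$.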

4.23 (a) From Property P4.4, we have $f_y(\omega) = |A(\omega)|^2 f_x(\omega)$ and
$$
f_z(\omega) = |B(\omega)|^2 f_y(\omega) = |B(\omega)|^2|A(\omega)|^2 f_x(\omega).
$$
(b) The frequency response functions of the first difference filter $|A(\omega)|^2$, the seasonal difference filter $|B(\omega)|^2$ and the product of the two are shown in Figure 3. Note that the first difference tends to keep the high frequencies and attenuate the low frequencies and is an example of a high-pass filter. The seasonal difference tends to attenuate frequencies at the multiples 1/12, 2/12, ..., 6/12, which correspond to periods of 12, 6, 4, 3, 2.4 and 2 months respectively. Frequencies in between are retained and the filter is an example of a notch filter, since it attenuates or notches out the seasonal frequencies.
(c) The product of the two filters tends to reject low frequency trends (the high-pass part) and seasonal frequencies (the notch part). Retaining the other frequencies is of interest to economists who seek seasonal adjustment filters. Better ones can be designed by specifying a frequency response for the high-pass part that rises more sharply.
4.24 (a) Using Property P4.4, $f_y(\omega) = [1 + a^2 - 2a\cos(2\pi\omega)]^{-1} f_x(\omega)$.
(b) Figure 6 plots the frequency response functions of both filters and we note that they are both low-pass filters with different rates of decrease. Recursive filters change the phase information and sometimes this can be important. Running the filters twice, once forward and once backward, can fix this problem.

Figure 6: Squared frequency response of recursive filters. [Two panels plotted against frequency from 0 to 0.5, for a parameter value of .8.]

4.25 R code for fitting an AR spectrum using AIC is given below. The analysis results in fitting an AR(16) spectrum, which is similar to the nonparametric spectral estimate.
sun=scan("/mydata/sunspots.dat")
spec.ar(sun)
4.26 R code for fitting an AR spectrum using AIC is given below. The analysis results in fitting an AR(13) spectrum, which is similar to the nonparametric spectral estimate.
rec=scan("/mydata/recruit.dat")
spec.ar(rec)
4.27 We have $2L\bar f_x(1/8)/f_x(1/8) \sim \chi^2_{2L}$, where $f_x(1/8) = [1 + .5^2 - 2(.5)\cos(2\pi/8)]^{-1} = 1.842$ from Problem 4.24(a). For L = 3, we have 2(3)2.25/1.842 = 7.33, which does not exceed $\chi^2_6(.05) = 12.59$. For L = 11, we have 2(11)2.25/1.842 = 26.87, which does not exceed $\chi^2_{22}(.05) = 33.92$. Neither sample has evidence for rejecting the hypothesis that the spectrum is as claimed at the $\alpha = .05$ level.
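The arithmetic in this test is easy to reproduce. The following sketch assumes the observed spectral estimate is 2.25 for both values of L, as used in the solution; the variable names are illustrative.

```r
# Hypothesized spectrum at frequency 1/8 (from Problem 4.24(a) with a = .5)
f0   <- 1 / (1 + 0.5^2 - 2 * 0.5 * cos(2 * pi / 8))
fbar <- 2.25                                   # spectral estimate used in the solution
for (L in c(3, 11)) {
  stat <- 2 * L * fbar / f0                    # approximately chi-square, 2L df
  crit <- qchisq(0.95, df = 2 * L)             # upper 5% critical value
  cat("L =", L, " statistic =", round(stat, 2),
      " critical value =", round(crit, 2), "\n")
}
```

In both cases the statistic falls below the critical value, in line with the conclusion above.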
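4.28 The conditions imply that under $H_0$: $d(\omega_k + \ell/n) \sim CN\{0, f_n(\omega)\}$ and under $H_1$: $d(\omega_k + \ell/n) \sim CN\{0, f_s(\omega) + f_n(\omega)\}$. For simplicity in notation, denote $d_\ell = d(\omega_k + \ell/n)$ and $f_s = f_s(\omega)$, $f_n = f_n(\omega)$.
(a) The ratio of likelihoods under the two hypotheses would be
$$
\frac{p_1}{p_0} = \frac{\pi^{-L}(f_s+f_n)^{-L}\exp\{-\sum_\ell |d_\ell|^2/(f_s+f_n)\}}{\pi^{-L}(f_n)^{-L}\exp\{-\sum_\ell |d_\ell|^2/f_n\}}
$$
and the log likelihood involving the data $d_\ell$ is proportional to
$$
T = \ln\frac{p_1}{p_0} \propto \sum_\ell |d_\ell|^2\left[\frac{1}{f_n} - \frac{1}{f_s+f_n}\right].
$$
(b) Write
$$
T = \frac{f_s}{f_n(f_s+f_n)}\sum_\ell |d_\ell|^2
$$
and note that
$$
\frac{2\sum_\ell |d_\ell|^2}{f_n} \sim \chi^2_{2L}
$$
under $H_0$ and
$$
\frac{2\sum_\ell |d_\ell|^2}{f_s+f_n} \sim \chi^2_{2L}
$$
under $H_1$. Hence
$$
T \sim \frac12\,\frac{f_s}{f_s+f_n}\,\chi^2_{2L}
$$
under $H_0$ and
$$
T \sim \frac12\,\frac{f_s}{f_n}\,\chi^2_{2L}
$$
under $H_1$.
(c) Here, we note that
$$
P_F = P\{T > K \mid H_0\} = P\left\{\chi^2_{2L} > 2K\,\frac{f_s+f_n}{f_s}\right\} = P\left\{\chi^2_{2L} > 2K\,\frac{SNR+1}{SNR}\right\},
$$
and
$$
P_d = P\{T > K \mid H_1\} = P\left\{\chi^2_{2L} > 2K\,\frac{f_n}{f_s}\right\} = P\left\{\chi^2_{2L} > \frac{2K}{SNR}\right\},
$$
where $SNR = f_s/f_n$ denotes the signal-to-noise ratio. Note that, as $SNR \to \infty$, $P_F \to P\{\chi^2_{2L} > 2K\}$ and $P_d \to 1$, and the signal detection probability approaches unity for a fixed false alarm rate, as guaranteed by the Neyman-Pearson lemma.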
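The relation between the false alarm and detection probabilities in part (c) can be tabulated numerically. The sketch below uses hypothetical values for L, the false alarm rate and the signal-to-noise ratios; none of these numbers come from the problem.

```r
L   <- 5                       # hypothetical number of frequencies in the band
PF  <- 0.01                    # hypothetical fixed false alarm rate
snr <- c(0.5, 1, 2, 5, 10)     # hypothetical signal-to-noise ratios fs/fn
# Threshold K chosen so that P{chi2_2L > 2K(SNR+1)/SNR} = PF for each SNR
K  <- qchisq(PF, 2 * L, lower.tail = FALSE) * snr / (2 * (snr + 1))
# Detection probability P{chi2_2L > 2K/SNR}
Pd <- pchisq(2 * K / snr, 2 * L, lower.tail = FALSE)
round(cbind(snr, K, Pd), 4)
```

With the false alarm rate held fixed, the detection probability increases with the signal-to-noise ratio, consistent with the limiting statement above.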
4.29 The figures (shown at the end of the solutions) for the other earthquakes and explosions are consistent, for the most part, with Example 4.20. The NZ event is more like an explosion than an earthquake.

4.30 For brevity, we only show the energy distribution of the other earthquakes (EQ) and explosions (EX); see Table 4.2 for EQ1 and EX1 and Example 4.22 for the NZ event. Typically, earthquakes have most of the energy distributed between d2-d4; the explosions typically have most of the energy distributed between d2-d3 (as does the NZ event). The waveshrink estimates for EQ 2, 4, 6, 8 and EX 2, 4, 6, 8 are shown at the end of the solutions.

Energy(%) Distribution for Earthquakes
       EQ2    EQ3    EQ4    EQ5    EQ6    EQ7    EQ8
s6   0.000  0.000  0.012  0.009  0.001  0.000  0.000
d6   0.001  0.003  0.017  0.043  0.002  0.001  0.005
d5   0.036  0.121  0.071  0.377  0.184  0.019  0.118
d4   0.200  0.346  0.402  0.366  0.507  0.309  0.484
d3   0.433  0.399  0.334  0.160  0.230  0.524  0.287
d2   0.266  0.119  0.127  0.040  0.071  0.129  0.095
d1   0.064  0.012  0.038  0.003  0.006  0.019  0.010

Energy(%) Distribution for Explosions
       EX2    EX3    EX4    EX5    EX6    EX7    EX8
s6   0.001  0.000  0.000  0.001  0.002  0.001  0.005
d6   0.002  0.001  0.002  0.005  0.002  0.009  0.018
d5   0.012  0.028  0.005  0.005  0.007  0.026  0.130
d4   0.064  0.232  0.053  0.018  0.015  0.123  0.384
d3   0.456  0.478  0.444  0.210  0.559  0.366  0.318
d2   0.385  0.242  0.375  0.654  0.349  0.413  0.122
d1   0.079  0.019  0.121  0.108  0.066  0.062  0.024

4.31 The solution to this problem is given in the discussion of the previous two problems, 4.29 and 4.30.
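4.32 Note first that
$$
a_k^M = M^{-1}\sum_{j=0}^{M-1} A(\omega_j)e^{2\pi i\omega_j k}
= M^{-1}\sum_{j=0}^{M-1}\sum_{t=-\infty}^{\infty} a_t e^{-2\pi i\omega_j t}e^{2\pi i\omega_j k}
$$
$$
= \sum_{t=-\infty}^{\infty} a_t\, M^{-1}\sum_{j=0}^{M-1} e^{-2\pi i\omega_j(t-k)}
= \sum_{\ell=-\infty}^{\infty} a_{k+\ell M}
= a_k + \sum_{\ell\ne 0} a_{k+\ell M}.
$$
Thus
$$
|y_t - y_t^M| = \Big|\sum_{|k|\ge M/2} a_k x_{t-k} - \sum_{\ell\ne 0}\sum_{|k|<M/2} a_{k+\ell M}x_{t-k}\Big|
\le \sum_{|k|\ge M/2}|a_k x_{t-k}| + \sum_{\ell\ne 0}\sum_{|k|<M/2}|a_{k+\ell M}x_{t-k}|
$$
$$
\le \sum_{|k|\ge M/2}|a_k x_{t-k}| + \sum_{|k|> M/2}|a_k x_{t-k}|
\le 2\sum_{|k|\ge M/2}|a_k x_{t-k}|,
$$
where the last steps follow by writing the separate sums for $\ell = \pm 1, \pm 2, \ldots$ and simplifying. Then
$$
E[(y_t - y_t^M)^2] \le 4\sum_{|j|\ge M/2}\sum_{|k|\ge M/2}|a_j||a_k|\,E[|x_{t-j}||x_{t-k}|]
\le 4\sum_{|j|\ge M/2}\sum_{|k|\ge M/2}|a_j||a_k|\,E^{1/2}[|x_{t-j}|^2]\,E^{1/2}[|x_{t-k}|^2]
\le 4\gamma_x(0)\Big(\sum_{|k|\ge M/2}|a_k|\Big)^2,
$$
which goes to zero as M increases as long as the absolute summability condition holds.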


4.33 Multiply both sides of the equation by $x_{t+h}$ and use the Fourier representation of the spectra and cross spectra to show that $f_{yx}(\omega) = A(\omega)f_x(\omega)$. Also, $f_y(\omega) = |A(\omega)|^2 f_x(\omega)$. Then, by the definition of squared coherence,
$$
\rho^2_{yx}(\omega) = \frac{|f_{yx}(\omega)|^2}{f_x(\omega)f_y(\omega)} = \frac{|A(\omega)f_x(\omega)|^2}{f_x(\omega)|A(\omega)|^2 f_x(\omega)} = 1.
$$
4.34 (a) Figure 7.3 (in the text) shows what the ordinary coherence functions should look like. It is clear that the precipitation-inflow coherence is uniformly larger than the others, so that precipitation should be considered as the major contributor over the entire frequency range. Cloud cover also appears to be predictive.
(b) Figure 7 shows the impulse-response function, suggesting dependence of the inflow on an exponentially weighted combination of past precipitation, i.e.
$$
I_t = \sum_{j=0}^{\infty}\beta^j P_{t-j} = \sum_{j=0}^{\infty}\beta^j B^j P_t = \frac{1}{1-\beta B}P_t.
$$
One can approximate the coefficient $\beta$ from the plot or run a regression of the form
$$
I_t = \beta I_{t-1} + P_t
$$
which yields $\hat\beta = .54$ as the decay constant.
Figure 7: Impulse response relating transformed precipitation to transformed inflow. Shows exponentially decaying dependence on present and past precipitation. [Panel title: Impulse response relating Inflow to Precipitation; horizontal axis: lag.]
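4.35 For the model $y_t = \sum_{r=-\infty}^{\infty}\beta_r x_{t-r} + v_t$, first show that
$$
\gamma_y(h) = \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\beta_r\gamma_x(h-r+s)\beta_s + \gamma_v(h).
$$
Substituting the spectral representations for $\gamma_x(h)$ and $\gamma_v(h)$ and identifying the representations for $f_x(\omega)$, $f_v(\omega)$ as well as the Fourier series for $\beta_t$, leads to the first required result. Then, note that
$$
\gamma_{xy}(h) = \sum_{r=-\infty}^{\infty}\beta_r\gamma_x(h+r) + \gamma_{xv}(h)
$$
and substitute again. That is, since $x_t$ and $v_t$ are uncorrelated,
$$
\gamma_{xy}(h) = \int_{-1/2}^{1/2}\sum_{r=-\infty}^{\infty}\beta_r e^{2\pi i\omega r}f_x(\omega)e^{2\pi i\omega h}\,d\omega
= \int_{-1/2}^{1/2}B^*(\omega)f_x(\omega)e^{2\pi i\omega h}\,d\omega.
$$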


4.36 Rewrite the first equation as
$$
y_t - \phi_1 y_{t-1} = x_t - \phi_1 x_{t-1} + v_t - \phi_1 v_{t-1} = w_t + v_t - \phi_1 v_{t-1},
$$
and identify the righthand side as a first-order MA. The required spectrum is the spectrum of an ARMA(1, 1) process, which also has a first-order MA on the righthand side. Assuming that the process is Gaussian, we can equate the ACFs of the two processes and solve for $\sigma^2$ and $\theta_1$. Letting $u_t = w_t + v_t - \phi_1 v_{t-1}$, we obtain
$$
\gamma_u(0) = \sigma_w^2 + (1+\phi_1^2)\sigma_v^2
$$
and $\gamma_u(1) = -\phi_1\sigma_v^2$. Equating these results to the corresponding results for a first-order MA with coefficient $-\theta_1$ and innovation variance $\sigma^2$ leads to the equations
$$
\sigma^2(1+\theta_1^2) = \sigma_w^2 + (1+\phi_1^2)\sigma_v^2 \quad\text{and}\quad \theta_1\sigma^2 = \phi_1\sigma_v^2
$$
relating the two models. Solving the second and substituting back into the first equation leads to the required results.
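The two moment equations can be solved numerically as a check. The sketch below assumes the first-order MA is written with coefficient $-\theta_1$ (so that $\theta_1\sigma^2 = \phi_1\sigma_v^2$) and uses hypothetical parameter values; `cc` is an illustrative helper name.

```r
phi1  <- 0.8; sig2w <- 1; sig2v <- 1     # hypothetical parameter values
g0 <- sig2w + (1 + phi1^2) * sig2v       # gamma_u(0)
g1 <- -phi1 * sig2v                      # gamma_u(1)
# Eliminating sigma^2 gives theta1^2 - cc * theta1 + 1 = 0
cc     <- g0 / (phi1 * sig2v)
theta1 <- (cc - sqrt(cc^2 - 4)) / 2      # root with |theta1| < 1
sig2   <- phi1 * sig2v / theta1
c(theta1 = theta1, sig2 = sig2)
# Both moment equations should be reproduced
c(sig2 * (1 + theta1^2) - g0, -theta1 * sig2 - g1)
```

The other root of the quadratic is the reciprocal of the one chosen; taking the root inside the unit circle gives the invertible representation.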
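4.37 (a) Set up the orthogonality condition
$$
E\Big[\Big(x_t - \sum_{s=-\infty}^{\infty}a_s y_{t-s}\Big)y_{t-u}\Big] = 0,\quad u = 0, \pm 1, \ldots,
$$
which leads to the normal equations $\sum_{s=-\infty}^{\infty}a_s\gamma_y(u-s) = \gamma_{xy}(u)$. Noting that $\gamma_{xy}(u) = \gamma_x(u)$ and taking Fourier transforms leads to the equation
$$
A(\omega) = \frac{f_x(\omega)}{f_y(\omega)}
= \frac{\sigma_w^2}{|1-\phi_1e^{-2\pi i\omega}|^2}\cdot\frac{|1-\phi_1e^{-2\pi i\omega}|^2}{\sigma^2|1-\theta_1e^{-2\pi i\omega}|^2}
= \frac{\sigma_w^2}{\sigma^2}\,\frac{1}{|1-\theta_1e^{-2\pi i\omega}|^2},
$$
which we recognize as the spectrum of a first-order AR process, with variance $\sigma_w^2/\sigma^2$. Hence its Fourier transform will be the autocovariance of a first-order AR process, i.e.
$$
a_s = \frac{\sigma_w^2}{\sigma^2}\,\frac{\theta_1^{|s|}}{1-\theta_1^2}
$$
for the optimal filter.
(b) The mean squared error will be
$$
E[(x_t - \hat x_t)x_t] = E\Big[\Big(x_t - \sum_{s=-\infty}^{\infty}a_s y_{t-s}\Big)x_t\Big] = \gamma_x(0) - \sum_{s=-\infty}^{\infty}a_s\gamma_x(s)
$$
$$
= \frac{\sigma_w^2}{1-\phi_1^2} - \frac{\sigma_w^4}{\sigma^2}\,\frac{1}{1-\theta_1^2}\,\frac{1}{1-\phi_1^2}\sum_{s=-\infty}^{\infty}(\phi_1\theta_1)^{|s|}
= \frac{\sigma_w^2}{1-\phi_1^2}\left[1 - \frac{\sigma_w^2}{\sigma^2}\,\frac{1+\phi_1\theta_1}{(1-\theta_1^2)(1-\phi_1\theta_1)}\right]
= \frac{\sigma_v^2\sigma_w^2}{\sigma^2(1-\theta_1^2)}.
$$
(c) To get the optimal finite estimator, use the orthogonality principle to get the equation
$$
\begin{pmatrix}\gamma_y(0) & \gamma_y(1)\\ \gamma_y(1) & \gamma_y(0)\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix} = \begin{pmatrix}\gamma_x(1)\\ \gamma_x(2)\end{pmatrix}.
$$
We obtain $\gamma_y(0)$, $\gamma_y(1)$ from Example 3.11 in Chapter 3, which derives the autocovariance of an ARMA(1, 1) process, and $\gamma_x(1)$, $\gamma_x(2)$ from Problem 3.5(b). This leads to the equation
$$
\begin{pmatrix}.9583 & .8147\\ .8147 & .9584\end{pmatrix}\begin{pmatrix}a_1\\ a_2\end{pmatrix} = \begin{pmatrix}.8147\\ .7333\end{pmatrix}
$$
and the solution $a = (.7204, .1527)'$. The mean squared error can be computed from
$$
\mathrm{MSE} = E[(x_t - a_1y_{t-1} - a_2y_{t-2})x_t] = \gamma_x(0) - a_1\gamma_x(1) - a_2\gamma_x(2) = .2064.
$$
The optimal mean squared error from the equation in part (b) is .0364.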
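The two-term predictor in part (c) can be checked by solving the normal equations directly. In the sketch below the matrix and right-hand side are the values printed in the solution; the value of $\gamma_x(0)$ is not printed there, so it is recovered under the assumption that $x_t$ is AR(1), which gives $\gamma_x(0) = \gamma_x(1)^2/\gamma_x(2)$.

```r
# Normal equations for the two-term predictor, values as printed in the solution
Gy <- matrix(c(.9583, .8147,
               .8147, .9584), 2, 2, byrow = TRUE)
gx <- c(.8147, .7333)             # gamma_x(1), gamma_x(2)
a  <- solve(Gy, gx)               # filter coefficients a1, a2
# Assumption: AR(1) signal, so gamma_x(0) = gamma_x(1)^2 / gamma_x(2)
gx0 <- gx[1]^2 / gx[2]
mse <- gx0 - sum(a * gx)          # gamma_x(0) - a1*gamma_x(1) - a2*gamma_x(2)
round(c(a1 = a[1], a2 = a[2], MSE = mse), 4)
```

Small differences in the last digits relative to the printed solution are to be expected, since the inputs are rounded to four decimals.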


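4.38 (a) Write
$$
\gamma_y(h_1,h_2) = E\Big[\sum_{u_1,u_2}\sum_{v_1,v_2}a_{u_1,u_2}a_{v_1,v_2}\,x_{s_1+h_1-u_1,\,s_2+h_2-u_2}\,x_{s_1-v_1,\,s_2-v_2}\Big],
$$
so that
$$
\gamma_y(h_1,h_2) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\sum_{u_1,u_2}a_{u_1,u_2}e^{-2\pi i(u_1\omega_1+u_2\omega_2)}\sum_{v_1,v_2}a_{v_1,v_2}e^{2\pi i(v_1\omega_1+v_2\omega_2)}\,f_x(\omega_1,\omega_2)\,e^{2\pi i(\omega_1h_1+\omega_2h_2)}\,d\omega_1\,d\omega_2
$$
$$
= \int_{-1/2}^{1/2}\int_{-1/2}^{1/2}|A(\omega_1,\omega_2)|^2 f_x(\omega_1,\omega_2)\,e^{2\pi i(\omega_1h_1+\omega_2h_2)}\,d\omega_1\,d\omega_2.
$$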

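4.39 We can use the definition (C.13) for $S_n(\omega_k,\omega_\ell)$ in the white noise case [$\gamma(s-t) = \sigma_w^2$ for $s = t$ and 0 otherwise] to write
$$
S_n(\omega_k,\omega_\ell) = n^{-1}\sigma_w^2\sum_{t=1}^{n}e^{-2\pi i(\omega_k-\omega_\ell)t},
$$
where the normalized sum $n^{-1}\sum_{t=1}^{n}e^{-2\pi i(\omega_k-\omega_\ell)t}$ is 1 when $k-\ell = 0, \pm n, \pm 2n, \ldots$ and 0 otherwise. Then, for example,
$$
E[d_c(\omega_k)d_c(\omega_\ell)] = \frac14\left[S_n(-\omega_k,\omega_\ell) + S_n(\omega_k,\omega_\ell) + S_n(\omega_\ell,\omega_k) + S_n(\omega_k,-\omega_\ell)\right]
= \frac{\sigma_w^2}{4}[0+1+1+0] = \frac{\sigma_w^2}{2},
$$
for $k = \ell$ and is zero otherwise. The other terms are treated similarly.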
4.43 Write
w

= 2Re[a
ac ia
as ) (x
xc ixs )]


= 2(a
ac x c + a s x s )
 
xc
= 2 ( ac as )
.
xs

Hence,
cov z


 
1 C Q
ac
)
=
a
Q
C
2
s
= 2(a
ac + ia
as )(C iQ)(a
ac ia
as )
= a a
a,
4 ( ac

as

Chapter 4

48

Chapter 4

49

Chapter 4

50

Chapter 4

51

EQ2

EQ4

Data

Data

Signal

Signal

Resid

Resid

500

1000

1500

2000

500

EQ6

Data

Signal

Signal

Resid

Resid

500

1000

1500

2000

1500

2000

EQ8

Data

1000

1500

2000

500

1000

Chapter 4

52

## Waveshrink Figures For Problem 4.30 - Explosions

EX2

EX4

Data

Data

Signal

Signal

Resid

Resid

500

1000

1500

2000

500

EX6

Data

Signal

Signal

Resid

Resid

500

1000

1500

2000

1500

2000

EX8

Data

1000

1500

2000

500

1000

Chapter 5

53

Chapter 5

5.1 (a) A time plot is shown below. Note the apparent trend in the data.

Figure 1: Plot of data for Problem 5.1(a)

(b) Shown below. The ACF dampens slowly, and the lag-1 PACF indicates a possible random walk.

Figure 2: ACF and PACF for Problem 5.1(b)

(c) R code and partial output below (must load the fracdiff package):
library(fracdiff)
frac = scan("/mydata/fracdiff.dat")
ts.plot(frac)
acf(frac,100)
pacf(frac,100)
fracdiff(frac, nar=1)
$d
[1] 0.2646309 (se = 0.009653345)
$ar
[1] 0.8630677 (se = 0.016921244)
(d) The time plot in (a) and the ACF and PACF in (b) suggest nonstationary behavior typically found with a random walk; i.e. slow trend in data, ACF tails slowly, PACF lag-1 near one.
(e) A plot of the ACF and PACF suggests stationarity is achieved after differencing.
(f) Use ar(diff(frac)) to find an AR model that fits the data via AIC. The results are: Order selected = 1, phi1hat = 0.1695, sigmahat2 = 1.002.


5.2 R code for fitting fractional noise is below. The estimated value of d is about .49. R wasn't able to fit an
gtemp=x[,2]
fracdiff(gtemp)
We also fit an ARFIMA(1,1,0) in Splus; the fitted model is possibly fractional white noise. The estimates were $\hat\phi = .11$ with estimated standard error .0667 ($\hat\phi$ not significant at .05 level) and $\hat d = .48$ with estimated standard error .0013 with output shown below.
Splus Commands:
gtemp<-globtemp2[,2]
arima.fracdiff(gtemp, model = list(ar = NA), M=50)
$model$ar:
[1] -0.1696179 # borderline significance
$model$d:
[1] 0.4617656
$var.coef:
                 d            ar1
d    0.00001355149 -0.00001478296
ar1 -0.00001478296  0.00771081885
5.3 R code to fit an ARFIMA(1,1,1) is below:
library(fracdiff)
nyse=scan("/mydata/nyse.dat")
x=abs(nyse)
acf(x,200)
fracdiff(x,nar=1,nma=1)
$d             0.3100793
$ar           -0.06873574
$ma            0.1660708
$stderror.dpq  0.016432124 0.006991419 0.007021429
5.4 The time plot of the data indicates ARCH behavior. The ACF and PACF of the returns suggest an MA(1) or AR(1) behavior (with $\theta$ or $\phi$ approximately .2). Both models provide a good fit, but the AR(1) is easier to fit and that is what we use. We fit an AR(1) to the data and obtained -0.0006 for the constant and 0.2411 for the AR parameter estimate. A time plot of the residuals indicates ARCH behavior. The ACF/PACF of the residuals appear to support the fact that the residuals are white, but the ACF/PACF of the squared residuals shows some low order correlation. Next, we fit an AR(1)-ARCH(1) model to the data using the S-PLUS garch module. The results of the fit and standard errors are given below:
"0
"1

"1

"0
.8612 .0862 .0275 0.2273
(.0634) (.0486) (.0426) (.0452)
5.5 A plot of the returns $\nabla\ln(x_t)$, where $x_t$ is the oil price series, appears to have some ARCH behavior in that there is clustering of volatility. There are also some regions where there appear to be structural breaks in the data. The ACF and PACF of the returns suggest an AR(1) for the mean. Thus, we used the S+Garch module to fit a Garch(1,1) with an AR(1) mean, with commands and output given below.
> oil <- scan("oil.dat")
> oil1 <- diff(log(oil))
> oil1.mod <- garch(oil1~ar(1),~garch(1,1))
> summary(oil1.mod)
--------------------------------------------------------------
Estimated Coefficients:
--------------------------------------------------------------
              Value  Std.Error  t value    Pr(>|t|)
       C  0.0031795 0.00460304   0.6907  2.453e-001   (phi0)
   AR(1)  0.5050445 0.12005115   4.2069  2.069e-005   (phi1)
       A  0.0003676 0.00008318   4.4198  8.670e-006   (alpha0)
 ARCH(1)  0.2347824 0.09981626   2.3521  9.892e-003   (alpha1)
GARCH(1)  0.6056319 0.09228240   6.5628  2.919e-010   (beta1)
--------------------------------------------------------------
Normality Test:
--------------------------------------------------------------
 Jarque-Bera  P-value  Shapiro-Wilk  P-value
        2849        0        0.7874        0
Ljung-Box test for standardized residuals:
--------------------------------------------------------------
 Statistic  P-value  Chi^2-d.f.
     5.895   0.9213          12
Ljung-Box test for squared standardized residuals:
--------------------------------------------------------------
 Statistic  P-value  Chi^2-d.f.
     4.053   0.9825          12
Lagrange multiplier test:
--------------------------------------------------------------
  Lag 1   Lag 2    Lag 3    Lag 4    Lag 5   Lag 6      Lag 7    Lag 8   Lag 9
 -0.246   -0.31  -0.1775  0.04317  0.09501   1.216  -0.005681  -0.3309  -0.201
 Lag 10  Lag 11   Lag 12        C
  1.336 -0.2221  -0.2371   0.5095
   TR^2  P-value  F-stat  P-value
  4.205   0.9794  0.3922   0.9947
5.8 Following the advice of many authors [see for example Thanoon (1990), J. Time Series Anal., 75-87] we fit a long AR to the data with threshold $x_{t-3} > 36.6$. We found an AR(15) worked well; the residuals appeared to be white, but were somewhat heteroscedastic. The following models were fit:
$$
x_t = \alpha^{(1)} + \sum_{j=1}^{15}\phi_j^{(1)}x_{t-j} + w_t^{(1)},\quad x_{t-3}\le 36.6,
$$
$$
x_t = \alpha^{(2)} + \sum_{j=1}^{15}\phi_j^{(2)}x_{t-j} + w_t^{(2)},\quad x_{t-3} > 36.6.
$$
First set (<= 36.6) of estimated parameters and standard errors:
         para     SE
cnst    1.798  1.089
phi1    2.287  0.080
phi2   -2.094  0.197
phi3    1.312  0.243
phi4   -1.095  0.239
phi5    0.578  0.217
phi6   -0.255  0.189
phi7    0.417  0.186
phi8   -0.557  0.181
phi9    0.478  0.167
phi10  -0.493  0.155
phi11   0.473  0.156
phi12  -0.300  0.150
phi13   0.238  0.136
phi14  -0.154  0.113
phi15   0.062  0.053

Second set (> 36.6) of estimated parameters and standard errors:
         para     SE
cnst    4.034  1.461
phi1    1.545  0.059
phi2   -1.117  0.113
phi3    1.026  0.135
phi4   -1.029  0.150
phi5    0.959  0.166
phi6   -1.057  0.179
phi7    1.006  0.189
phi8   -0.858  0.196
phi9    0.811  0.202
phi10  -0.935  0.204
phi11   0.988  0.199
phi12  -0.836  0.191
phi13   0.705  0.180
phi14  -0.706  0.152
phi15   0.418  0.077

The ACF and PACF of the residuals, $x_t - \hat x_t^{t-1}$, are given below.

Figure 3: ACF and PACF of the residuals for Problem 5.8

5.9 The data can be found in sales.dat and lead.dat. R code is below. After fitting the regression, the ACF and PACF indicate an AR(1) for the residuals, which fits well.
sales=ts(scan("/mydata/sales.dat"))
lead=ts(scan("/mydata/lead.dat"))
u=ts.intersect(diff(sales), lag(diff(lead),-3))   # differenced sales and differenced lead at lag 3
ds=u[,1]
dl3=u[,2]
fit=lm(ds~dl3)                             # beta1hat = 3.33 is highly significant
acf(fit$resid)                             # these two suggest
pacf(fit$resid)                            # an AR(1) for the residuals
res.fit=arima(fit$resid, order=c(1,0,0))   # phi1hat = .58 is significant
tsdiag(res.fit)                            # diagnostics ok
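5.10 (a) First note, $E(y_0) = E(z_0'\beta + x_0) = z_0'\beta$. Also, $E(\hat y_0) = \lambda'E(y) = \lambda'E(Z\beta + x) = \lambda'Z\beta$. Thus, if $\lambda'Z = z_0'$ then $E(\hat y_0) = z_0'\beta$. Write $E(\hat y_0 - y_0)^2 = E(\lambda'y - y_0)^2 = \lambda'E(yy')\lambda - 2\lambda'E(y\,y_0) + E(y_0^2)$. Next, note $E(yy') = E(xx') = \Gamma$ and $E(y\,y_0) = E(x\,x_0) = \gamma_0$ (by definition). We want to minimize MSE subject to unbiasedness, which by (a) is $Z'\lambda = z_0$. Thus, we minimize
$$
Q = \lambda'\Gamma\lambda - 2\lambda'\gamma_0 + E(y_0^2) + 2\psi'(Z'\lambda - z_0)
$$
where $\psi$ is a vector of Lagrange multipliers. Now
$$
\partial Q/\partial\lambda = 2\Gamma\lambda - 2\gamma_0 + 2Z\psi = 0
$$
yields
$$
\Gamma\lambda + Z\psi = \gamma_0
$$
as was to be shown.
(b) From (a), $\Gamma\lambda + Z\psi = \gamma_0$, so that $\lambda = \Gamma^{-1}(\gamma_0 - Z\psi)$. Thus $z_0 = Z'\lambda = Z'\Gamma^{-1}(\gamma_0 - Z\psi)$. Solving, $\psi = (Z'\Gamma^{-1}Z)^{-1}[Z'\Gamma^{-1}\gamma_0 - z_0]$. Finally, $\lambda = \Gamma^{-1}\gamma_0 - \Gamma^{-1}Z(Z'\Gamma^{-1}Z)^{-1}[Z'\Gamma^{-1}\gamma_0 - z_0]$. Now $\hat y_0 = \lambda'y = \{\Gamma^{-1}\gamma_0 - \Gamma^{-1}Z(Z'\Gamma^{-1}Z)^{-1}[Z'\Gamma^{-1}\gamma_0 - z_0]\}'y$, or
$$
\hat y_0 = \gamma_0'\Gamma^{-1}y + z_0'\hat\beta_w - \gamma_0'\Gamma^{-1}Z\hat\beta_w,\qquad \hat\beta_w = (Z'\Gamma^{-1}Z)^{-1}Z'\Gamma^{-1}y,
$$
which simplifies to the desired expression.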
5.11 (a) We transform inflow and take the seasonal difference $y_t = \ln i_t - \ln i_{t-12}$, which is proportional to the percentage yearly increase in flow. Monthly precipitation has some zero values and we use the square root transformation to stabilize this variable. Fitting the two series separately leads to the two ARIMA models
$$
x_t = \nabla_{12}\sqrt{P_t} = (1 - .812_{(.029)}B^{12})w_t
$$
and
$$
y_t = (1 - .764_{(.033)}B^{12})z_t,
$$
with $\hat\sigma_w^2 = 32.503$ and $\hat\sigma_z^2 = .225$.
(b) Cross correlating the two transformed series $x_t$ and $(1 - .812B^{12})y_t$ leads to the figure shown below and we note that the inflow series seems to depend on exponentially decreasing lagged values of the precipitation.
5.12 (a) The ACF of the residuals from the ARIMA$(0,0,0)\times(0,1,1)_{12}$ model is well-behaved with all values (except lag 1) well below the significance levels.
(b) The CCF has been computed in Problem 5.11 and is shown in the figure below. The exponential decrease, beginning at lag zero, suggests
$$
\alpha(B) = \delta_0(1 + \omega_1B + \omega_1^2B^2 + \omega_1^3B^3 + \cdots)
$$
for fitting the exponential decrease.

Figure 4: CCF for Problem 5.12 [Panel title: CCF of transformed inflow with prewhitened precipitation; horizontal axis: lag, from -30 to 30.]

(c) So far, we have
$$
y_t = \frac{\delta_0}{1-\omega_1B}x_t + \eta_t
$$
which becomes
$$
(1-\omega_1B)y_t = \delta_0x_t + (1-\omega_1B)\eta_t,
$$
or
$$
y_t = \omega_1y_{t-1} + \delta_0x_t + n_t,
$$
where $n_t = (1-\omega_1B)\eta_t$. We can run the regression model above as ordinary least squares, even though the residuals are correlated, obtaining
$$
y_t = .526_{(.026)}y_{t-1} + .050_{(.002)}x_t + \hat n_t.
$$
(d) To model the noise, we take $\hat n_t$ from the above model and note that
$$
\hat n_t = (1 - .526B)\hat\eta_t
$$
can be solved for $\hat\eta_t$ by inverting the first order moving average transformation. These residuals, say $\hat\eta_t$, can be modeled by an ARIMA$(1,0,0)\times(0,0,1)_{12}$ model of the form
$$
(1 - .384B)\eta_t = (1 - .796B^{12})z_t,
$$
where $\hat\sigma_z^2 = .0630$. Hence, the final model is of the form
$$
y_t = \frac{.050}{1 - .526B}x_t + \eta_t,
$$
where $x_t = (1 - .812B^{12})w_t$ and the noise $\eta_t$ is as modeled above.
(e) A possible general procedure would be to forecast $x_t$ and $\eta_t$ separately and then combine the forecasts using the defining equation above (note that $x_t$ and $\eta_t$ are assumed to be independent). To forecast $x_t$, note that
$$
x_t = w_t - .812w_{t-12}
$$
and
$$
x_{t+m} = w_{t+m} - .812w_{t+m-12}
$$
and the forecast would be
$$
\tilde x_{t+m} = \begin{cases}-.812\,w_{t+m-12} & m\le 12\\ 0 & m > 12,\end{cases}
$$
where the residuals $w_t$ come from applying the model to the data $x_t$. The forecast variance for $\tilde x_{t+m}$ will be $\sigma_w^2$ for $m\le 12$ and $(1 + .812^2)\sigma_w^2$ for $m > 12$. To forecast $\eta_t$, note that
$$
\eta_{t+m} = .384\eta_{t+m-1} + z_{t+m} - .796z_{t+m-12},
$$
so that
$$
\tilde\eta_{t+m} = \begin{cases}.384\,\tilde\eta_{t+m-1} - .796\,z_{t+m-12} & m\le 12\\ .384\,\tilde\eta_{t+m-1} & m > 12.\end{cases}
$$
For the forecast variance, write
$$
\eta_t = \frac{1 - .796B^{12}}{1 - .384B}z_t = \psi(B)z_t.
$$
The contribution of this term to the forecast variance will then be
$$
E[(\eta_{t+m} - \tilde\eta_{t+m})^2] = \sigma_z^2\sum_{j=0}^{m-1}\psi_j^2.
$$
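The $\psi$-weights and the resulting noise contribution to the forecast variance can be computed with `ARMAtoMA` from the stats package. The sketch below plugs in the fitted values quoted in the solution (.384, .796, .0630, .050, .526); the 24-step horizon is an arbitrary illustrative choice.

```r
m.max <- 24                                      # illustrative forecast horizon
# psi-weights of (1 - .796 B^12) / (1 - .384 B); psi_0 = 1 is prepended
psi <- c(1, ARMAtoMA(ar = 0.384, ma = c(rep(0, 11), -0.796),
                     lag.max = m.max - 1))
sig2z <- 0.0630                                  # estimated innovation variance
noise.var <- sig2z * cumsum(psi^2)               # sigma_z^2 * sum_{j<m} psi_j^2, m = 1..m.max
round(noise.var[c(1, 2, 12, 13, 24)], 4)
# Impulse response of the transfer function .050 / (1 - .526 B)
alpha <- 0.050 * 0.526^(0:10)
round(alpha, 4)
```

The jump in the variance at m = 13 reflects the seasonal moving average term entering the $\psi$-weights at lag 12.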

Order      AIC     AICc      SIC
  1    -23.769  -23.651  -23.597
  2    -24.034  -23.798  -23.690
  3    -24.040  -23.686  -23.524
  4    -23.995  -23.523  -23.306

Squared Partial Canonical Correlation Vectors
  h=1    h=2    h=3    h=4
0.938  0.303  0.060  0.042
0.808  0.014  0.049  0.015
0.720  0.002  0.005  0.008

Initially, a VAR(2) seems appropriate. The estimates of $\Phi_1$ and $\Phi_2$ are

## 1.139 1.658 2.626

0.321
0.978
" 2 = 0.023 0.130
" 1 = 0.012
1.090
0.291

0.006
0.146
0.891
0.004 0.101
and

5.684
" w = 0.307
1000
0.066

0.307
0.119
0.037

2.621
0.229
0.052

0.066
0.037 .
0.062

Residual analysis shows that there is still some small amount of correlation in the residual corresponding to consumption. Fitting a third order model removes this small but significant correlation. The estimates of the VAR(3) model are:

## 1.059 1.473 2.522

0.219 0.623
2.322
" 2 = 0.028
" 1 = 0.008
1.074
0.292
0.011 0.126

0.009
0.129
0.913
0.007 0.080
0.257

0.010
1.582
0.116
5.382 0.285 0.060
" 3 = 0.012 0.143 0.102 1000
" w = 0.285
0.116
0.035 .

## 0.008 0.002 0.242

0.060
0.035
0.058


Chapter 6

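6.1 (a)
$$
\begin{pmatrix}x_t\\ x_{t-1}\end{pmatrix} = \begin{pmatrix}0 & -.9\\ 1 & 0\end{pmatrix}\begin{pmatrix}x_{t-1}\\ x_{t-2}\end{pmatrix} + \begin{pmatrix}w_t\\ 0\end{pmatrix}
\quad\text{and}\quad
y_t = [1\ \ 0]\begin{pmatrix}x_t\\ x_{t-1}\end{pmatrix} + v_t.
$$
(b) For $y_t$ to be stationary, $x_t$ must be stationary. Note that for $t = 0, 1, 2, \ldots$, we may write $x_{2t-1} = \sum_{j=0}^{t-1}(-.9)^jw_{2t-1-2j} + (-.9)^tx_{-1}$ and $x_{2t} = \sum_{j=0}^{t-1}(-.9)^jw_{2t-2j} + (-.9)^tx_0$. From this we see that $x_{2t-1}$ and $x_{2t}$ are independent. Repeating the steps of Problem 3.5, we conclude that setting $x_0 = w_0/\sqrt{1-.9^2}$ and $x_{-1} = w_{-1}/\sqrt{1-.9^2}$ will make $x_t$ stationary. In other words, set $\sigma_0^2 = \sigma_1^2 = \sigma_w^2/(1-.9^2)$.
(c) and (d): The plots are not shown here.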
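The stationarity condition in part (b) can be checked numerically from the state transition matrix in part (a). This is a minimal sketch, taking the state equation as $x_t = -.9x_{t-2} + w_t$ and assuming $\sigma_w^2 = 1$ for illustration.

```r
# State transition matrix from part (a), for x_t = -.9 x_{t-2} + w_t
Phi <- matrix(c(0, -0.9,
                1,  0), 2, 2, byrow = TRUE)
Mod(eigen(Phi)$values)             # both moduli equal sqrt(.9) < 1
sig2w  <- 1                        # assumed noise variance, for illustration
sig2.0 <- sig2w / (1 - 0.9^2)      # stationary initial variance from part (b)
sig2.0
```

Both eigenvalues lie inside the unit circle, so the state process has a stationary solution once the initial values are given the variance computed above.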
6.2 (i) s = t: Without loss of generality, let s < t, then cov(s , t ) = E[s E(t |y1 , ..., ys )] = 0.
. Thus t = yt ytt1 = (xt xt1
) + vt , and
(ii) s = t: Note that ytt1 = E(xt + vt |y1 , ..., yt1 ) = xt1
t
t
t1
t1
it follows that var(t ) = var[(xt xt ) + vt ] = Pt + v2 .
6.3 See the code to duplicate Example 6.6 on the web site. Except for the estimation part, this problem
is similar to that example.
6.4 (a) Write $x = (x_1, \ldots, x_p)'$, $y = (y_1, \ldots, y_q)'$, $b = (b_1, \ldots, b_p)'$ and $B = \{B_{ij}\}_{i=1,\ldots,p;\,j=1,\ldots,q}$. The projection equations are
$$
E[(x_i - b_i - B_{i1}y_1 - \cdots - B_{iq}y_q)\,1] = 0,\quad i = 1, \ldots, p; \qquad (1)
$$
$$
E[(x_i - b_i - B_{i1}y_1 - \cdots - B_{iq}y_q)\,y_j] = 0,\quad i = 1, \ldots, p,\ j = 1, \ldots, q. \qquad (2)
$$
In matrix notation, the $p$ equations in (1) are $E(x - b - By) = 0$ and the $p\times q$ equations in (2) are $E(xy' - by' - Byy') = 0$, as was to be shown. Solving (1) leads to the solution for $b$; that is, $b = E(x) - BE(y)$. Inserting this solution into (2) and then solving (2) leads to the solution for $B$; that is, $E(xy') - \mu_x\mu_y' = B[E(yy') - \mu_y\mu_y']$ or $B = \Sigma_{xy}\Sigma_{yy}^{-1}$.
(b) Let $\hat x = P_Mx = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ as given in (a). The MSE matrix is $E(x - P_Mx)(x - P_Mx)' = E[(x - P_Mx)x']$ because $x - P_Mx \perp M$ and $P_Mx \in M$. Thus,
$$
MSE = E[(x - \mu_x)x'] - \Sigma_{xy}\Sigma_{yy}^{-1}E[(y - \mu_y)x'] = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx},
$$
noting, for example, that $E[(x - \mu_x)(x - \mu_x)'] = E[(x - \mu_x)x']$.
(c) Consider writing the equation preceding (6.27) in terms of this question. That is,
$$
\begin{pmatrix}x = x_t\\ y = \epsilon_t\end{pmatrix}\Big|\ Y_{t-1} \sim N\left(\begin{pmatrix}\mu_x = x_t^{t-1}\\ \mu_y = 0\end{pmatrix}, \begin{pmatrix}\Sigma_{xx} = P_t^{t-1} & \Sigma_{xy} = P_t^{t-1}A_t'\\ \Sigma_{yx} = A_tP_t^{t-1} & \Sigma_{yy} = \Sigma_t\end{pmatrix}\right).
$$
Then using (a), $\hat x = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ corresponds to $x_t^t = x_t^{t-1} + P_t^{t-1}A_t'\Sigma_t^{-1}\epsilon_t$, which is precisely (6.21). Moreover, from (b), the MSE is $\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$, which corresponds to $P_t^{t-1} - P_t^{t-1}A_t'\Sigma_t^{-1}A_tP_t^{t-1}$, which is precisely $P_t^t$ defined in (6.22). Thus the normal theory and the projection theorem results coincide.
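The agreement between the projection form of the MSE and the Kalman filter update can be illustrated numerically. The matrices below are hypothetical values chosen only for the check, and the filter update is written in the common form $(I - K_tA_t)P_t^{t-1}$ with $K_t = P_t^{t-1}A_t'\Sigma_t^{-1}$, which is an assumption about how (6.22) and (6.23) are stated.

```r
P <- matrix(c(2, 0.5, 0.5, 1), 2, 2)   # hypothetical P_t^{t-1}
A <- matrix(c(1, 0), 1, 2)             # hypothetical observation matrix A_t
R <- matrix(0.3, 1, 1)                 # hypothetical observation noise variance
Sig <- A %*% P %*% t(A) + R            # innovation variance Sigma_t
# Projection form: Sigma_xx - Sigma_xy Sigma_yy^{-1} Sigma_yx
mse.proj <- P - (P %*% t(A)) %*% solve(Sig) %*% (A %*% P)
# Filter form: (I - K A) P with gain K = P A' Sigma^{-1}
K <- P %*% t(A) %*% solve(Sig)
mse.kf <- (diag(2) - K %*% A) %*% P
all.equal(mse.proj, mse.kf)
```

The two expressions are algebraically identical, so the comparison should report agreement up to floating point error.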
6.5 (a) Because $y_{k+1} - y_{k+1}^k \in L_{k+1}$, it suffices to show that $y_{k+1} - y_{k+1}^k \perp L_k$. But
$$
E[y_j(y_{k+1} - y_{k+1}^k)'] = E\{y_jE[(y_{k+1} - y_{k+1}^k)' \mid y_1, \ldots, y_k]\} = 0,\quad j = 1, 2, \ldots, k,
$$
as required.
(b) From the problem statement, we have
$$
H_{k+1} = E[x_k(y_{k+1} - y_{k+1}^k)']\left\{E[(y_{k+1} - y_{k+1}^k)(y_{k+1} - y_{k+1}^k)']\right\}^{-1}.
$$
Now $y_{k+1} - y_{k+1}^k = A_{k+1}(x_{k+1} - x_{k+1}^k) + v_{k+1}$. From this it follows immediately that
$$
E[(y_{k+1} - y_{k+1}^k)(y_{k+1} - y_{k+1}^k)'] = A_{k+1}P_{k+1}^kA_{k+1}' + R.
$$
To complete the result write (using $x_k \perp v_{k+1}$)
$$
E[x_k(y_{k+1} - y_{k+1}^k)'] = E[x_k(x_{k+1} - x_{k+1}^k)'A_{k+1}'] = E[x_k(x_k - x_k^k)'\Phi'A_{k+1}'] = P_k^k\Phi'A_{k+1}'.
$$
(c) Using (6.23),
$$
A_{k+1}'[A_{k+1}P_{k+1}^kA_{k+1}' + R]^{-1} = [P_{k+1}^k]^{-1}K_{k+1}
$$
and
$$
K_{k+1}(y_{k+1} - y_{k+1}^k) = x_{k+1}^{k+1} - x_{k+1}^k.
$$
From these two facts we find that
$$
H_{k+1}(y_{k+1} - y_{k+1}^k) = P_k^k\Phi'[P_{k+1}^k]^{-1}K_{k+1}(y_{k+1} - y_{k+1}^k) = J_k(x_{k+1}^{k+1} - x_{k+1}^k),
$$
and the result follows.
(d) and (e) The remainder of the problem follows in a similar manner.
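6.6 (a) First write $\nabla y_t = y_t - y_{t-1} = w_t + v_t - v_{t-1}$. Then, $E(\nabla y_t) = 0$, $\mathrm{var}(\nabla y_t) = \sigma_w^2 + 2\sigma_v^2$, and $\mathrm{cov}(\nabla y_t, \nabla y_{t-h}) = -\sigma_v^2$ for $|h| = 1$ and 0 for $|h| > 1$. We conclude that $\nabla y_t$ is an MA(1) with ACF given by $\rho(1) = -\sigma_v^2/(\sigma_w^2 + 2\sigma_v^2)$ and $\rho(h) = 0$ for $h = 2, 3, \ldots$. Note, $|\rho(1)| \le .5$ for all values of $\sigma_w^2 \ge 0$ and $\sigma_v^2 > 0$.
(b) The estimates should be about $\hat\sigma_w^2 = .012$ and $\hat\sigma_v^2 = .181$. With these estimates $\hat\rho(1) = -.483$, and hence $\hat\theta = -.77$. These values are close to the values found in Ch. 3.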

## 6.7 (a) Regression results:

Standard
Prob
Variable
Estimate
Error
t-value
>|t|
-------------------------------------------------------CONSTANT
-0.560893
0.053917 -10.402877
0.000
t
0.015766
0.004264
3.697381
0.000
t^2
-0.000151
0.000091
-1.661112
0.100
t^3
0.000001
0.000001
1.056275
0.293

Figure 1: Plot of data yt and regression predictor and y"t for Problem 6.7(a).

Chapter 6

62

Figure 2: Plot of data yt ( ), the smoother, xnt () and the predictor xt1
(- - -) for Problem 6.7(b).
t
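(b) The model can be written as

$$\begin{pmatrix} x_t \\ x_{t-1} \\ x_{t-2} \end{pmatrix} = \begin{pmatrix} 3 & -3 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_{t-1} \\ x_{t-2} \\ x_{t-3} \end{pmatrix} + \begin{pmatrix} w_t \\ 0 \\ 0 \end{pmatrix}, \qquad y_t = [1\ \ 0\ \ 0] \begin{pmatrix} x_t \\ x_{t-1} \\ x_{t-2} \end{pmatrix} + v_t.$$

Here $\Phi$ is completely known, $R = \sigma_v^2$ and

$$Q = \begin{pmatrix} \sigma_w^2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Estimation using ASTSA yielded $\hat\sigma_w^2 \approx .001$ and $\hat\sigma_v^2 \approx .000$. This model can also be estimated with R with the help of the code on the web site.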
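As a quick check on the state equation in part (b), the sketch below simulates the state with the transition matrix whose first row is (3, -3, 1) and confirms that third differences of $x_t$ return the noise $w_t$. The seed, sample size, starting state and noise level are arbitrary choices made for illustration only.

```r
# Transition matrix of the state equation in part (b).
Phi <- matrix(c(3, -3, 1,
                1,  0, 0,
                0,  1, 0), nrow = 3, byrow = TRUE)

set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 50
w <- rnorm(n, sd = 0.03)         # illustrative noise level (assumption)
state <- c(0, 0, 0)              # (x_t, x_{t-1}, x_{t-2}), arbitrary start
x <- numeric(n)
for (t in 1:n) {
  state <- as.vector(Phi %*% state) + c(w[t], 0, 0)
  x[t] <- state[1]
}

# Third differences of the simulated state recover the noise sequence.
max_gap <- max(abs(diff(x, differences = 3) - w[4:n]))
max_gap
```

The first row of the matrix is just the expansion of $(1-B)^3 x_t = w_t$, which is why the third difference is the natural check.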

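6.8 Using (6.64), the essential part of the complete log-likelihood (i.e. dropping any constants) is

$$n \ln \sigma_x^2 + \sum_{t=1}^{n} \frac{x_t^2}{r_t \sigma_x^2} + n \ln \sigma_v^2 + \sum_{t=1}^{n} \frac{(y_t - x_t)^2}{\sigma_v^2}.$$

Following (6.71) and (6.72), the updated estimates will be (using the notation of the problem, $\hat{\ }$ for updates and $\tilde{\ }$ for current values)

$$\hat\sigma_x^2 = n^{-1} \sum_{t=1}^{n} \frac{[\tilde x_t^n]^2 + \tilde P_t^n}{r_t} \qquad \text{and} \qquad \hat\sigma_v^2 = n^{-1} \sum_{t=1}^{n} [(y_t - \tilde x_t^n)^2 + \tilde P_t^n].$$

It remains to determine $\tilde x_t^n$ and $\tilde P_t^n$. These can be obtained from (B.9)-(B.10). Write $X_n = (x_1, \ldots, x_n)'$ and $Y_n = (y_1, \ldots, y_n)'$ and drop the $\tilde{\ }$ from the notation. Then

$$\begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N\left( 0,\ \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \right),$$

where $\Sigma_{xx} = \mathrm{diag}\{r_1 \sigma_x^2, \ldots, r_n \sigma_x^2\}$, $\Sigma_{xy} = \Sigma_{xx}$ [because $E(x_t y_t) = E(x_t^2 + x_t v_t) = E(x_t^2)$] and $\Sigma_{yy} = \Sigma_{xx} + \mathrm{diag}\{\sigma_v^2, \ldots, \sigma_v^2\}$. Using (B.9)-(B.10) it follows that

$$x_t^n = E(x_t \mid Y_n) = \frac{r_t \sigma_x^2}{r_t \sigma_x^2 + \sigma_v^2}\, y_t$$

and

$$P_t^n = \mathrm{var}(x_t \mid Y_n) = r_t \sigma_x^2 - \frac{r_t^2 \sigma_x^4}{r_t \sigma_x^2 + \sigma_v^2} = \frac{r_t \sigma_x^2 \sigma_v^2}{r_t \sigma_x^2 + \sigma_v^2}.$$

The stated results now follow.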
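The update formulas above translate directly into a short EM iteration. The following is a minimal sketch, not the authors' code: the function name `em_step`, the simulated data, the values of $r_t$ and the starting values are all hypothetical.

```r
# One EM update for the model y_t = x_t + v_t, var(x_t) = r_t * sig2x.
em_step <- function(y, r, sig2x, sig2v) {
  gain <- r * sig2x / (r * sig2x + sig2v)          # weight on y_t in E(x_t | Y_n)
  xs   <- gain * y                                 # smoother x_t^n
  Ps   <- r * sig2x * sig2v / (r * sig2x + sig2v)  # P_t^n
  list(sig2x = mean((xs^2 + Ps) / r),
       sig2v = mean((y - xs)^2 + Ps))
}

# Hypothetical data: the r_t, the sample size and the true variances are made up.
set.seed(2)
r <- rep(c(1, 2, 4), length.out = 300)
y <- rnorm(300, sd = sqrt(r * 1.5)) + rnorm(300, sd = 1)

est <- list(sig2x = 1, sig2v = 2)                  # starting values
for (i in 1:200) est <- em_step(y, r, est$sig2x, est$sig2v)
est
```

Because the $x_t$ are independent across $t$, the E-step needs no recursion; each smoothed value depends only on the matching $y_t$.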

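6.10 (a) Using Property P6.1, $P_t^{t-1} = \phi^2 [1 - K_{t-1}] P_{t-1}^{t-2} + Q = \phi^2 ([P_{t-1}^{t-2}]^{-1} + R^{-1})^{-1} + Q$. Note $R = \sigma_v^2$ and $Q = \sigma_w^2$.

(b) To ease the notation, we write $P_t^{t-1}$ as $P_t$. Part (a) is then $P_t = \phi^2 (P_{t-1}^{-1} + R^{-1})^{-1} + Q$. Using this relationship yields

$$P_t - P_{t-1} = \phi^2 \left[ (P_{t-1}^{-1} + R^{-1})^{-1} - (P_{t-2}^{-1} + R^{-1})^{-1} \right]. \qquad (1)$$

Because $(P^{-1} + R^{-1})^{-1}$ is increasing in $P$, from (1) we see that $P_t \ge [\le]\ P_{t-1}$ as $P_{t-1} \ge [\le]\ P_{t-2}$, implying that the sequence $\{P_t\}$ is monotonic. In addition, the sequence is bounded below by 0 and above by $\sigma_w^2/(1 - \phi^2)$ using the fact that

$$P_t = E(x_t - x_t^{t-1})^2 \le \mathrm{var}(x_t) \le \sigma_w^2/(1 - \phi^2).$$

From these facts we conclude that $P_t$ has a limit, say $P$, as $t \to \infty$, and from part (a), $P$ must satisfy

$$P = \phi^2 (P^{-1} + R^{-1})^{-1} + Q. \qquad (2)$$

We are given $R = Q = 1$; multiplying (2) through by $P + 1$ yields $P(P+1) = \phi^2 P + P + 1$, that is,

$$P^2 - \phi^2 P - 1 = 0.$$

(c) Using the notation in (b), $K_t = P_t/(P_t + 1)$ and it follows that $K_t \to K = P/(P + 1)$. Also, $0 < (1 - K) = 1/(P + 1) < 1$ because $P > 0$.

(d) In this problem, $y_{n+1}^n = x_{n+1}^n$, and in steady state

$$\begin{aligned} x_{n+1}^n &= \phi K y_n + \phi (1 - K) x_n^{n-1} \\ &= \phi K y_n + \phi^2 (1 - K) K y_{n-1} + \phi^2 (1 - K)^2 x_{n-1}^{n-2} \\ &\ \ \vdots \\ &= \sum_{j=1}^{\infty} \phi^j K (1 - K)^{j-1} y_{n+1-j}. \end{aligned}$$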
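The steady-state argument in parts (b) to (d) can be checked numerically. The sketch below iterates the recursion from part (a) with $R = Q = 1$, compares the limit with the positive root obtained by clearing the denominator in (2), and forms the predictor weights of part (d). The value of $\phi$ and the starting value are illustrative assumptions.

```r
phi <- 0.8                       # illustrative value (assumption), |phi| < 1
R <- 1; Q <- 1                   # as given in part (b)

# Iterate P_t = phi^2 (1/P_{t-1} + 1/R)^{-1} + Q from an arbitrary start.
P <- 5
for (t in 1:200) P <- phi^2 / (1 / P + 1 / R) + Q

# Multiplying (2) through by (P + 1) when R = Q = 1 gives
# P (P + 1) = phi^2 P + P + 1, whose positive root is:
P_root <- (phi^2 + sqrt(phi^4 + 4)) / 2

K <- P / (P + 1)                 # steady-state gain from part (c)

# Weights phi^j K (1 - K)^(j - 1) on y_{n+1-j} in the steady-state predictor.
j <- 1:10
weights <- phi^j * K * (1 - K)^(j - 1)
c(P = P, P_root = P_root, K = K)
```

The weights decay geometrically at rate $\phi(1-K)$, so only the most recent few observations matter in practice.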

6.13 Using Property P6.2 and for $m \neq 0, n$:

$$x_m^n = x_m^m + J_m (x_{m+1}^n - x_{m+1}^m). \qquad (1)$$

Because $x_m$ is not observed, $x_m^m = x_m^{m-1} = \phi x_{m-1}$. Moreover, $x_{m+1}^n = x_{m+1}$ and $x_{m+1}^m = \phi x_m^m = \phi^2 x_{m-1}$. Note, $J_m = \phi P_m^m / P_{m+1}^m$. Now from (6.22) with $A_m = 0$, $P_m^m = P_m^{m-1} = \sigma_w^2$ and $P_{m+1}^m = \phi^2 P_m^m + \sigma_w^2 = \sigma_w^2 (1 + \phi^2)$. Thus, $J_m = \phi/(1 + \phi^2)$. Inserting these values in (1) gives the desired result.

Also from Property P6.2 and for $m \neq 0, n$,

$$P_m^n = P_m^m + J_m^2 (P_{m+1}^n - P_{m+1}^m). \qquad (2)$$

Noting that $P_{m+1}^n = 0$, and using the values above yields the desired result.

6.14 The estimates are $\hat\phi = 0.786$ (.065) and $\hat\sigma_w^2 = 1.143$ (.135). The missing value estimates are:

| t | x_t | x_t^n |
|---|---|---|
| 1 | 1.01 | |
| 2 | **** | 1.00 |
| 3 | 1.05 | |
| 6 | 0.76 | |
| 7 | **** | 1.32 |
| 8 | 1.95 | |
| 13 | -2.81 | |
| 14 | **** | -1.57 |
| 15 | -0.42 | |
| 40 | -0.60 | |
| 41 | **** | -0.88 |
| 42 | -1.21 | |
| 43 | **** | 0.04 |
| 44 | 1.29 | |
| 45 | 0.12 | |
| 46 | **** | 0.02 |
| 47 | -0.07 | |
| 48 | -0.28 | |
| 49 | **** | -1.51 |
| 50 | -2.83 | |
| 53 | -3.12 | |
| 54 | **** | -2.36 |
| 55 | -1.73 | |
| 61 | -2.28 | |
| 62 | **** | -1.38 |
| 63 | -0.56 | |
| 66 | -1.64 | |
| 67 | **** | -1.89 |
| 68 | -2.25 | |
| 79 | 1.45 | |
| 80 | **** | 1.03 |
| 81 | 0.67 | |
| 85 | 1.14 | |
| 86 | **** | 1.53 |
| 87 | 2.00 | |
| 88 | -0.59 | |
| 89 | **** | -0.58 |
| 90 | -0.60 | |
| 93 | -1.78 | |
| 94 | **** | -1.11 |
| 95 | -0.51 | |
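The smoothed values for Problem 6.14 follow from the Problem 6.13 result $x_m^n = \frac{\phi}{1+\phi^2}(x_{m-1} + x_{m+1})$. As a small illustration, the sketch below applies that formula to the first three missing times using the reported $\hat\phi$; the helper name `interp` is made up for this example.

```r
phi_hat <- 0.786                          # estimate reported in Problem 6.14

# Smoothed value of a single missing x_m from its two observed neighbours.
interp <- function(x_prev, x_next, phi) phi / (1 + phi^2) * (x_prev + x_next)

# Neighbouring observations for t = 2, 7 and 14, read from the table.
x_prev <- c(1.01, 0.76, -2.81)
x_next <- c(1.05, 1.95, -0.42)
smoothed <- round(interp(x_prev, x_next, phi_hat), 2)
smoothed
```

The three results agree with the tabled smoothed values 1.00, 1.32 and -1.57.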

6.15 We fit a model similar to Example 6.10, that is, $y_t = T_t + S_t + v_t$, where $T_t = \phi T_{t-1} + w_{t1}$ and $S_t + S_{t-1} + \cdots + S_{t-11} = w_{t2}$. The state equation in this case is similar to Example 6.10 but with $13 \times 1$ state vector $x_t = (T_t, S_t, S_{t-1}, \ldots, S_{t-11})'$. The estimates and the corresponding standard errors are $\hat\phi = 1.003$ (.001), $\hat\sigma_{w1} = 2.152$ (.219), $\hat\sigma_{w2} = .000$ (.030), and $\hat\sigma_v = 1.178$ (.234). The trend and seasonal component estimates are shown in Figure 3 for the last 100 time points.

Figure 3: Plot of estimated trend and seasonal components (final 100 time points) for Problem 6.15.
6.16 (a) AR(1): $x_{t+1} = \phi x_t + \phi v_t$ and $y_t = x_t + v_t$.
(b) MA(1): $x_{t+1} = 0\, x_t + \theta v_t$ and $y_t = x_t + v_t$.
(c) IMA(1,1): $x_{t+1} = x_t + (1 + \theta) v_t$ and $y_t = x_t + v_t$.
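All three forms are special cases of the state recursion $x_{t+1} = \phi x_t + (\theta + \phi) v_t$ with $y_t = x_t + v_t$ (taking $\theta = 0$, $\phi = 0$ and $\phi = 1$ respectively); this unifying form is stated here as an assumption rather than quoted from the problem. The sketch below simulates it with made-up parameter values and checks that the observations obey the ARMA(1,1) difference equation.

```r
set.seed(3)                       # arbitrary seed
n <- 200
phi <- 0.6; theta <- 0.4          # illustrative ARMA(1,1) parameters (assumptions)
v <- rnorm(n)

# State recursion x_{t+1} = phi x_t + (theta + phi) v_t, observation y_t = x_t + v_t.
x <- numeric(n + 1)
for (t in 1:n) x[t + 1] <- phi * x[t] + (theta + phi) * v[t]
y <- x[1:n] + v

# The observations satisfy y_t - phi y_{t-1} = v_t + theta v_{t-1}.
lhs <- y[2:n] - phi * y[1:(n - 1)]
rhs <- v[2:n] + theta * v[1:(n - 1)]
max_gap <- max(abs(lhs - rhs))
max_gap
```

Setting `theta <- 0` reproduces case (a), `phi <- 0` case (b), and `phi <- 1` case (c).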
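6.17 The proof of Proposition P6.5 is similar to the proof of Proposition P6.1. The first step is in noting that in this setup,

$$\mathrm{cov}(x_{t+1}, \epsilon_t \mid Y_{t-1}) = \Phi P_t^{t-1} A_t' + S, \qquad \mathrm{cov}(\epsilon_t, \epsilon_t) \equiv \Sigma_t = A_t P_t^{t-1} A_t' + R,$$

and

$$E(x_{t+1} \mid Y_{t-1}) \equiv x_{t+1}^{t-1} = \Phi x_t^{t-1} + \Upsilon u_t.$$

Then we may write

$$\begin{pmatrix} x_{t+1} \\ \epsilon_t \end{pmatrix} \Big|\; Y_{t-1} \sim N\left( \begin{pmatrix} x_{t+1}^{t-1} \\ 0 \end{pmatrix},\ \begin{pmatrix} P_{t+1}^{t-1} & \Phi P_t^{t-1} A_t' + S \\ A_t P_t^{t-1} \Phi' + S' & \Sigma_t \end{pmatrix} \right),$$

and (6.98), (6.100) follow. To show (6.99), write

$$(x_{t+1} - x_{t+1}^t) = (\Phi - K_t A_t)(x_t - x_t^{t-1}) + [\,I \ \ -K_t\,] \begin{pmatrix} w_t \\ v_t \end{pmatrix}.$$

Then

$$P_{t+1}^t = E[(x_{t+1} - x_{t+1}^t)(x_{t+1} - x_{t+1}^t)'] = (\Phi - K_t A_t) P_t^{t-1} (\Phi - K_t A_t)' + [\,I \ \ -K_t\,] \begin{pmatrix} Q & S \\ S' & R \end{pmatrix} \begin{pmatrix} I \\ -K_t' \end{pmatrix},$$

and (6.99) follows.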

6.18 Follow the technique of Example 6.11, iterating backward $p$ steps in time (Example 6.11 is an example of an iteration backward once).
6.19 In contrast to Example 6.12, stochastic regression is appropriate in this case. The estimates using Newton-Raphson with estimated standard errors (asymptotic/bootstrap) are $\hat\phi = 0.896$ (0.067/0.274), $\hat\alpha = 0.970$ (0.475/0.538), $\hat b = 1.090$ (0.158/0.221), $\hat\sigma_w = 0.117$ (0.037/0.122), and $\hat\sigma_v = 1.191$ (0.108/0.171). Note that the asymptotic standard error estimates have the same problem as in Example 6.12. Compare Figures 4-6 here with Figures 6.9-6.11 in the text. Here, a 90% bootstrap confidence interval for $\phi$ is (.46, .92).

Figure 4: Bootstrap distribution, B = 200, of the estimator of $\phi$ for Problem 6.19.

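Figure 5: Bootstrap distribution, B = 200, of the estimator of $\sigma_w$ for Problem 6.19.

Figure 6: Joint bootstrap distribution, B = 200, of the estimators of $\phi$ and $\sigma_w$ for Problem 6.19.

6.20 The number of sunspots rises slowly and decreases rapidly indicating a switch in regime. A threshold model was fit in Problem 5.8. Here we used an AR(2) as the basis of the model for the state, but this could be extended to higher order models.

Consider state equations

$$x_{t1} = \alpha_0 + \alpha_1 x_{t-1,1} + \alpha_2 x_{t-2,1} + w_{t1}$$
$$x_{t2} = \beta_0 + \beta_1 x_{t-1,1} + \beta_2 x_{t-2,1} + w_{t2}$$

or in vector form

$$\begin{pmatrix} x_{t1} \\ x_{t-1,1} \\ x_{t2} \\ x_{t-1,2} \end{pmatrix} = \begin{pmatrix} \alpha_1 & \alpha_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ \beta_1 & \beta_2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_{t-1,1} \\ x_{t-2,1} \\ x_{t-1,2} \\ x_{t-2,2} \end{pmatrix} + \begin{pmatrix} \alpha_0 \\ 0 \\ \beta_0 \\ 0 \end{pmatrix} + \begin{pmatrix} w_{t1} \\ 0 \\ w_{t2} \\ 0 \end{pmatrix}$$

with

$$Q = \begin{pmatrix} \sigma_{w1}^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma_{w2}^2 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

and observation equation

$$y_t = \mu_y + A_t \begin{pmatrix} x_{t1} \\ x_{t-1,1} \\ x_{t2} \\ x_{t-1,2} \end{pmatrix} + v_t$$

where $A_t = [1\ 0\ 0\ 0]$ or $[1\ 0\ 1\ 0]$.

Thus there are two regimes and either

(a) $y_t = \mu_y + \alpha_0 + \alpha_1 x_{t-1,1} + \alpha_2 x_{t-2,1} + w_{t1} + v_t$

or

(b) $y_t = \mu_y + x_{t1} + x_{t2} + v_t$.

The final estimates are (we set $\hat\mu_y = \bar y$) shown in the table below. The estimated states, $\hat x_{t1}$ and $\hat x_{t1} + \hat x_{t2}$, are shown in Figure 7. In addition a dot at 40 indicates the model is selecting $\hat x_{t1} + \hat x_{t2}$; in general, the model selects $\hat x_{t1} + \hat x_{t2}$ during periods the data is increasing and the peaks.

Estimates for Problem 6.20

| Parameter | Estimate | SE |
|---|---|---|
| $\alpha_1$ | 1.700 | 0.026 |
| $\alpha_2$ | 0.793 | 0.027 |
| $\beta_1$ | 0.355 | 0.078 |
| $\beta_2$ | 0.272 | 0.090 |
| $\sigma_{w1}$ | 7.495 | 0.252 |
| $\sigma_{w2}$ | 0.001 | 1.118 |
| $\sigma_v$ | 0.000 | 0.371 |
| $\alpha_0$ | 0.218 | 0.334 |
| $\beta_0$ | 3.675 | 1.166 |

Figure 7: Sunspots analysis, Problem 6.20. $\hat x_{t1}$ (solid line), $\hat x_{t1} + \hat x_{t2}$ (- - -), and a dot at 40 indicates the model is selecting $\hat x_{t1} + \hat x_{t2}$.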
6.25 As suggested, the square roots of the data are analyzed. First we fit an AR residual model, and then a compound symmetry model. The AR components are not significant and it appears that the compound symmetry model provides the best fit. Level 1 refers to patients who are not depressed. The estimates are given below:

Estimation Results for the AR residual model (Problem 6.25)

| Parameter | Estimate | Estimated Standard Error |
|---|---|---|
| 1 | 1.315 | 0.300 |
| 2 | 0.515 | 0.381 |
| 1 | 0.480 | 0.294 |
| 2 | 0.201 | 0.424 |
| w | 0.595 | 0.102 |

Estimation Results for the Compound Symmetry model (Problem 6.25)

| Parameter | Estimate | Estimated Standard Error |
|---|---|---|
| 1 | 1.290 | 0.244 |
| 2 | 0.517 | 0.302 |
| 1 | 0.747 | 0.171 |
| 2 | 0.537 | 0.126 |
| w | 2.000 | 0.189 |

Chapter 7
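7.1 Note first that

$$(C - iQ)(v_c - i v_s) = \lambda (v_c - i v_s)$$

implies that

$$\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix} \begin{pmatrix} v_c \\ v_s \end{pmatrix} = \lambda \begin{pmatrix} v_c \\ v_s \end{pmatrix}$$

by equating the real and imaginary parts and stacking. Also,

$$\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix} \begin{pmatrix} v_s \\ -v_c \end{pmatrix} = \lambda \begin{pmatrix} v_s \\ -v_c \end{pmatrix}$$

is the same equation, showing that there are two eigenvectors for each eigenvalue. Hence,

$$|\Sigma| = \left| \frac{1}{2} \begin{pmatrix} C & -Q \\ Q & C \end{pmatrix} \right| = \left( \frac{1}{2} \right)^{2p} |\mathrm{diag}\{\lambda_1, \lambda_1, \ldots, \lambda_p, \lambda_p\}| = \left( \frac{1}{2} \right)^{2p} \prod_{j=1}^{p} \lambda_j^2.$$

But,

$$|f|^2 = \Big( \prod_{j=1}^{p} \lambda_j \Big)^2 = \prod_{j=1}^{p} \lambda_j^2,$$

so that

$$|\Sigma| = \left( \frac{1}{2} \right)^{2p} |f|^2,$$

which verifies the result. To verify the second result, let $Y = X - M$ or

$$Y_c - i Y_s = (X_c - M_c) - i (X_s - M_s)$$

and note that $Y^* f^{-1} Y$ is purely real because $f = f^*$ is Hermitian. Then, let $W = f^{-1} Y$ so that

$$Y^* f^{-1} Y = Y^* W = (Y_c' + i Y_s')(W_c - i W_s) = Y_c' W_c + Y_s' W_s.$$

Now $Y = f W$ implies that

$$Y_c - i Y_s = (C - iQ)(W_c - i W_s),$$

or

$$\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix} \begin{pmatrix} W_c \\ W_s \end{pmatrix} = \begin{pmatrix} Y_c \\ Y_s \end{pmatrix}.$$

Then, write the quadratic form

$$\frac{1}{2} \big( (X_c - M_c)',\ (X_s - M_s)' \big) \left[ \frac{1}{2} \begin{pmatrix} C & -Q \\ Q & C \end{pmatrix} \right]^{-1} \begin{pmatrix} X_c - M_c \\ X_s - M_s \end{pmatrix}$$

as

$$( Y_c',\ Y_s' ) \begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}^{-1} \begin{pmatrix} Y_c \\ Y_s \end{pmatrix} = ( Y_c',\ Y_s' ) \begin{pmatrix} W_c \\ W_s \end{pmatrix} = Y_c' W_c + Y_s' W_s = Y^* f^{-1} Y.$$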
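The determinant identity in Problem 7.1 is easy to confirm numerically. The sketch below builds an arbitrary Hermitian positive definite matrix to stand in for $f$ (the dimension and the random construction are assumptions made for the example), forms the real $2p \times 2p$ matrix, and compares the two sides.

```r
set.seed(4)                                   # arbitrary seed
p <- 3
Z <- matrix(complex(real = rnorm(p^2), imaginary = rnorm(p^2)), p, p)
f <- Z %*% Conj(t(Z))                         # Hermitian, positive definite

C <- Re(f)                                    # f = C - iQ
Q <- -Im(f)
Sigma <- 0.5 * rbind(cbind(C, -Q), cbind(Q, C))

# |f| as the product of the (real) eigenvalues of the Hermitian matrix f.
det_f <- prod(eigen(f, symmetric = TRUE, only.values = TRUE)$values)

c(det(Sigma), (1 / 2)^(2 * p) * det_f^2)
```

The eigenvalue route is used for $|f|$ because base R's `det` is defined for real matrices only.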

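7.2 Substitute $L\hat f$ from (5.6) into (5.5) to rewrite the negative of the log likelihood, with $f^{-1} = P P^*$ and $P$ chosen so that $P^* \hat f P = \Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_p\}$, as

$$\begin{aligned} -\ln L(X_1, \ldots, X_L; f) &\propto L \ln |f| + \mathrm{tr}\Big\{ f^{-1} \sum_{\ell=1}^{L} (X_\ell - M_\ell)(X_\ell - M_\ell)^* \Big\} \\ &= L \ln |f| + L\, \mathrm{tr}\{\hat f f^{-1}\} \\ &= -L \ln |\hat f f^{-1}| + L\, \mathrm{tr}\{\hat f f^{-1}\} + L \ln |\hat f| \\ &= -L \ln |\hat f P P^*| + L\, \mathrm{tr}\{\hat f P P^*\} + L \ln |\hat f| \\ &= -L \ln |P^* \hat f P| + L\, \mathrm{tr}\{P^* \hat f P\} + L \ln |\hat f| \\ &= -L \ln |\Lambda| + L\, \mathrm{tr}\{\Lambda\} + L \ln |\hat f| \\ &= -L \sum_{i=1}^{p} \ln \lambda_i + L \sum_{i=1}^{p} \lambda_i - Lp + Lp + L \ln |\hat f| \\ &= L \sum_{i=1}^{p} (\lambda_i - \ln \lambda_i - 1) + Lp + L \ln |\hat f| \\ &\ge Lp + L \ln |\hat f| \end{aligned}$$

with equality when $\Lambda = I$ or $P^* \hat f P = I$ so that

$$\hat f = (P^*)^{-1} P^{-1} = f.$$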
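7.3

$$\begin{aligned} \mathrm{MSE} &= \gamma_y(0) - \sum_{r=-\infty}^{\infty} \beta_r' \gamma_{xy}(-r) \\ &= \int_{-1/2}^{1/2} f_y(\nu)\, d\nu - \int_{-1/2}^{1/2} \Big( \sum_{r=-\infty}^{\infty} \beta_r' e^{-2\pi i \nu r} \Big) f_{xy}(\nu)\, d\nu \\ &= \int_{-1/2}^{1/2} [f_y(\nu) - B'(\nu) f_{xy}(\nu)]\, d\nu \\ &= \int_{-1/2}^{1/2} [f_y(\nu) - f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu)]\, d\nu \\ &= \int_{-1/2}^{1/2} f_{y \cdot x}(\nu)\, d\nu. \end{aligned}$$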

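7.4 Note first that to find the coherence, we must evaluate

$$\rho^2_{\hat y y}(\nu) = \frac{|f_{\hat y y}(\nu)|^2}{f_{\hat y}(\nu) f_y(\nu)}.$$

Note that the Fourier representation of the cross-covariance function, $\gamma_{\hat y y}(h)$, can be written

$$\begin{aligned} \gamma_{\hat y y}(h) &= E(\hat y_{t+h} y_t) = E\Big[ \sum_{r=-\infty}^{\infty} \beta_r' x_{t+h-r}\, y_t \Big] = \sum_{r=-\infty}^{\infty} \beta_r' \gamma_{xy}(h - r) \\ &= \int_{-1/2}^{1/2} \sum_{r=-\infty}^{\infty} \beta_r' e^{-2\pi i \nu r} f_{xy}(\nu) e^{2\pi i \nu h}\, d\nu \\ &= \int_{-1/2}^{1/2} B'(\nu) f_{xy}(\nu) e^{2\pi i \nu h}\, d\nu \\ &= \int_{-1/2}^{1/2} f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu) e^{2\pi i \nu h}\, d\nu. \end{aligned}$$

We would also need

$$\begin{aligned} \gamma_{\hat y}(h) &= E(\hat y_{t+h} \hat y_t) = E\Big[ \sum_{r=-\infty}^{\infty} \beta_r' x_{t+h-r} \sum_{s=-\infty}^{\infty} x_{t-s}' \beta_s \Big] = \sum_{r=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} \beta_r' \Gamma_x(h - r + s) \beta_s \\ &= \int_{-1/2}^{1/2} \Big( \sum_{r} \beta_r' e^{-2\pi i \nu r} \Big) f_x(\nu) \Big( \sum_{s} \beta_s e^{2\pi i \nu s} \Big) e^{2\pi i \nu h}\, d\nu \\ &= \int_{-1/2}^{1/2} B'(\nu) f_x(\nu) \overline{B(\nu)}\, e^{2\pi i \nu h}\, d\nu \\ &= \int_{-1/2}^{1/2} f_{xy}^*(\nu) f_x^{-1}(\nu) f_x(\nu) f_x^{-1}(\nu) f_{xy}(\nu) e^{2\pi i \nu h}\, d\nu \\ &= \int_{-1/2}^{1/2} f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu) e^{2\pi i \nu h}\, d\nu. \end{aligned}$$

Substituting,

$$\rho^2_{\hat y y}(\nu) = \frac{|f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu)|^2}{f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu)\, f_y(\nu)} = \frac{f_{xy}^*(\nu) f_x^{-1}(\nu) f_{xy}(\nu)}{f_y(\nu)},$$

which is just $\rho^2_{y \cdot x}(\nu)$ as given in (5.21).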

7.5 Writing the complex version of the regression model as

$$Y_c - i Y_s = (X_c - i X_s)(B_c - i B_s) + V_c - i V_s$$

shows that

$$Y_c = X_c B_c - X_s B_s + V_c$$

and

$$Y_s = X_s B_c + X_c B_s + V_s,$$

which is the matrix equation determining the real regression model. Furthermore, two complex matrices $F = F_c - i F_s$, $G = G_c - i G_s$ can be multiplied and the real and imaginary components of the product $H = H_c - i H_s = FG$ will appear as components of the real product

$$\begin{pmatrix} F_c & -F_s \\ F_s & F_c \end{pmatrix} \begin{pmatrix} G_c & -G_s \\ G_s & G_c \end{pmatrix} = \begin{pmatrix} H_c & -H_s \\ H_s & H_c \end{pmatrix}.$$

This result, along with the isomorphism involving the product of a matrix and a vector, justifies the last two equations at the end of the problem. Note also that the least squares solution will be

$$\hat B = (X^* X)^{-1} X^* Y = \Big( \sum_{k=1}^{L} \bar X_k X_k' \Big)^{-1} \sum_{k=1}^{L} \bar X_k Y_k.$$

It follows that

$$\hat B' = \Big( \sum_{k=1}^{L} Y_k X_k^* \Big) \Big( \sum_{k=1}^{L} X_k X_k^* \Big)^{-1} = \hat f_{xy}^* \hat f_x^{-1},$$

which is the sample version of (5.16). To verify the first part of the last equation, note that

$$Y^* Y = \sum_{k=1}^{L} |Y_k|^2 = L \hat f_y$$

and

$$\begin{aligned} Y^* X (X^* X)^{-1} X^* Y &= \Big( \sum_{k=1}^{L} \bar Y_k X_k' \Big) \Big( \sum_{k=1}^{L} \bar X_k X_k' \Big)^{-1} \Big( \sum_{k=1}^{L} \bar X_k Y_k \Big) \\ &= \Big( \sum_{k=1}^{L} Y_k X_k^* \Big) \Big( \sum_{k=1}^{L} X_k X_k^* \Big)^{-1} \Big( \sum_{k=1}^{L} X_k \bar Y_k \Big) \\ &= L \hat f_{xy}^* \hat f_x^{-1} \hat f_{xy}, \end{aligned}$$

and the assertions in the last equations are true.
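The block-matrix isomorphism used in Problem 7.5 can be verified directly. In the sketch below, the helper `blocks` (a name invented for this example) maps a complex matrix $M = M_c - i M_s$ to its real block form, and the product of two block forms is compared with the block form of the complex product. Matrix sizes and entries are arbitrary.

```r
set.seed(5)                                   # arbitrary seed
rc <- function(n, m) {
  matrix(complex(real = rnorm(n * m), imaginary = rnorm(n * m)), n, m)
}

# Real 2x2-block representation of a complex matrix M = Mc - i Ms.
blocks <- function(M) {
  Mc <- Re(M); Ms <- -Im(M)
  rbind(cbind(Mc, -Ms), cbind(Ms, Mc))
}

Fm <- rc(3, 2)   # hypothetical complex matrices, dimensions chosen arbitrarily
Gm <- rc(2, 4)
max_gap <- max(abs(blocks(Fm) %*% blocks(Gm) - blocks(Fm %*% Gm)))
max_gap
```

A gap at rounding-error level confirms that real regression on the stacked cosine and sine parts reproduces the complex computation.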

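7.6 Note first that, since

$$\hat\beta_t = \sum_{s=-\infty}^{\infty} h_s y_{t-s},$$

$$E \hat\psi_t = E \sum_{r=-\infty}^{\infty} a_r' \hat\beta_{t-r} = E \sum_{r,s} a_r' h_s y_{t-r-s} = \sum_{r,s,j} a_r' h_s z_{t-r-s-j} \beta_j = \int_{-1/2}^{1/2} A^*(\nu) H(\nu) Z(\nu) B(\nu) e^{2\pi i \nu t}\, d\nu.$$

Now,

$$\psi_t = \sum_{r=-\infty}^{\infty} a_r' \beta_{t-r} = \int_{-1/2}^{1/2} A^*(\nu) B(\nu) e^{2\pi i \nu t}\, d\nu$$

and the above two expressions are equal if and only if

$$H(\nu) Z(\nu) = I$$

for all $\nu$. To show that the variance is minimized, subject to the unbiased constraint, note that, for any unbiased estimator

$$\tilde\psi_t = \psi_t + \sum_{r,s} a_r' g_s v_{t-r-s},$$

we would have

$$E[(\tilde\psi_t - \psi_t)^2] = E[(\tilde\psi_t - \hat\psi_t)^2] + E[(\hat\psi_t - \psi_t)^2] + 2 E[(\tilde\psi_t - \hat\psi_t)(\hat\psi_t - \psi_t)].$$

The first two terms on the righthand side are nonnegative and the result is shown if the cross product term is zero. We have

$$\begin{aligned} E[(\tilde\psi_t - \hat\psi_t)(\hat\psi_t - \psi_t)] &= \sum_{r,s} \sum_{j,k} a_r' (g_s - h_s) E[v_{t-r-s} v_{t-j-k}'] h_k' a_j \\ &= \int_{-1/2}^{1/2} A^*(\nu) [G(\nu) - H(\nu)] H^*(\nu) A(\nu) f_v(\nu)\, d\nu \\ &= \int_{-1/2}^{1/2} A^*(\nu) [G(\nu) - H(\nu)] Z(\nu) [Z^*(\nu) Z(\nu)]^{-1} A(\nu) f_v(\nu)\, d\nu \\ &= \int_{-1/2}^{1/2} A^*(\nu) [G(\nu) Z(\nu) - H(\nu) Z(\nu)] [Z^*(\nu) Z(\nu)]^{-1} A(\nu) f_v(\nu)\, d\nu \\ &= 0 \end{aligned}$$

because $G(\nu) Z(\nu) = H(\nu) Z(\nu) = I$.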

7.7 In the model (5.39), make the identications t = (t , t ) and

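$$z_t = \begin{pmatrix} \delta_t & \delta_{t-\tau_1} \\ \delta_t & \delta_{t-\tau_2} \\ \vdots & \vdots \\ \delta_t & \delta_{t-\tau_N} \end{pmatrix}, \qquad Z(\nu) = \begin{pmatrix} 1 & e^{-2\pi i \nu \tau_1} \\ 1 & e^{-2\pi i \nu \tau_2} \\ \vdots & \vdots \\ 1 & e^{-2\pi i \nu \tau_N} \end{pmatrix}.$$

Then,

$$S_z(\nu) = \begin{pmatrix} N & \sum_{j=1}^{N} e^{-2\pi i \nu \tau_j} \\ \sum_{j=1}^{N} e^{2\pi i \nu \tau_j} & N \end{pmatrix} = N \begin{pmatrix} 1 & \phi(\nu) \\ \overline{\phi(\nu)} & 1 \end{pmatrix},$$

where $\phi(\nu) = N^{-1} \sum_{j=1}^{N} e^{-2\pi i \nu \tau_j}$. Now, it follows that

$$S_z^{-1}(\nu) = \frac{1}{N} \frac{1}{(1 - |\phi(\nu)|^2)} \begin{pmatrix} 1 & -\phi(\nu) \\ -\overline{\phi(\nu)} & 1 \end{pmatrix}$$

for $\nu \neq 0$. If we apply $Z^*(\nu)$ directly to the transform vector $Y(\nu) = (Y_1(\nu), \ldots, Y_N(\nu))'$, we obtain a $2 \times 1$ vector containing $N \bar Y_{\cdot}(\nu)$ and

$$N B_w(\nu) = \sum_{j=1}^{N} e^{2\pi i \nu \tau_j} Y_j(\nu),$$

which leads to the desired result, on multiplying by the $2 \times 2$ matrix $S_z^{-1}(\nu)$.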
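7.8 For computations, it is convenient to let $u = t - r$ in the model (5.66), so that

$$y_t = \sum_{u=-\infty}^{\infty} z_u \beta_{t-u} + v_t,$$

with estimator

$$\hat\beta_t = \sum_{r=-\infty}^{\infty} h_r y_{t-r}.$$

The orthogonality principle yields the optimal filter as a solution of

$$E[(\beta_t - \hat\beta_t) y_{t-s}'] = 0$$

or

$$E[\beta_t y_{t-s}'] = E[\hat\beta_t y_{t-s}'].$$

Since $\beta_t$ and $v_t$ are uncorrelated the lefthand side of the above becomes

$$E\Big[ \beta_t \sum_{u=-\infty}^{\infty} \beta_{t-s-u}' z_u' \Big] = \sum_{u=-\infty}^{\infty} \Gamma_\beta(s + u) z_u' = \int_{-1/2}^{1/2} f_\beta(\nu) Z^*(\nu) e^{2\pi i \nu s}\, d\nu.$$

The righthand side becomes

$$E[\hat\beta_t y_{t-s}'] = E\Big[ \sum_{r=-\infty}^{\infty} h_r y_{t-r} y_{t-s}' \Big] = \sum_{r=-\infty}^{\infty} h_r \Gamma_y(s - r) = \int_{-1/2}^{1/2} H(\nu) f_y(\nu) e^{2\pi i \nu s}\, d\nu.$$

Equating the Fourier transforms of the left and right sides gives $H f_y = f_\beta Z^*$, or

$$H = f_\beta Z^* (Z f_\beta Z^* + f_v I)^{-1},$$

where the frequency arguments are suppressed to save notation. To get the final form, note that (4.56) can be written as

$$A C^* (C A C^* + B)^{-1} = (A^{-1} + C^* B^{-1} C)^{-1} C^* B^{-1}$$

for the complex case, implying the form

$$H = \Big( \frac{1}{f_\beta} I + \frac{1}{f_v} Z^* Z \Big)^{-1} \frac{1}{f_v} Z^* = (S_z + \theta I)^{-1} Z^*, \qquad \theta = f_v / f_\beta,$$

for the optimal filters. To derive the mean square error, note that

$$\mathrm{MSE} = E[(\beta_t - \hat\beta_t) \beta_t'] = E[\beta_t \beta_t'] - E[\hat\beta_t \beta_t'].$$

The second term is

$$E[\hat\beta_t \beta_t'] = E\Big[ \sum_{u,s} h_s z_u \beta_{t-s-u} \beta_t' \Big] = \sum_{s,u} h_s z_u \Gamma_\beta(-s - u) = \int_{-1/2}^{1/2} H(\nu) Z(\nu) f_\beta(\nu)\, d\nu.$$

Combining the two terms gives

$$\mathrm{MSE} = \int_{-1/2}^{1/2} [f_\beta(\nu) - H(\nu) Z(\nu) f_\beta(\nu)]\, d\nu,$$

whose argument is

$$f_\beta - f_\beta Z^* (Z f_\beta Z^* + f_v I)^{-1} Z f_\beta.$$

Then, appeal to the complex version of (4.55), i.e.

$$A - A C^* (C A C^* + B)^{-1} C A = (A^{-1} + C^* B^{-1} C)^{-1}$$

to write the argument as

$$\Big( \frac{1}{f_\beta} I + \frac{1}{f_v} Z^* Z \Big)^{-1},$$

which reduces to (5.72).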

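7.9 Suppressing the frequency argument, we write

$$\begin{aligned} E[\mathrm{SSR}] &= E[Y^* Z S_z^{-1} Z^* Y] \\ &= \mathrm{tr}\{ Z S_z^{-1} Z^* E[Y Y^*] \} \\ &= \mathrm{tr}\{ Z S_z^{-1} Z^* (f_\beta Z Z^* + f_v I) \} \\ &= f_\beta\, \mathrm{tr}\{ Z S_z^{-1} Z^* Z Z^* \} + f_v\, \mathrm{tr}\{ Z S_z^{-1} Z^* \} \\ &= f_\beta\, \mathrm{tr}\{ Z Z^* \} + f_v\, \mathrm{tr}\{ S_z S_z^{-1} \} \\ &= f_\beta\, \mathrm{tr}\{ S_z \} + q f_v. \end{aligned}$$

When the spectrum is cumulated over L frequencies, the multiplier appears.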
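7.10 Again, suppressing the frequency subscripts the model $Y = ZB + V$ takes the vector form

$$\begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1N} \\ Y_{21} \\ \vdots \\ Y_{2N} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{pmatrix} \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} + \begin{pmatrix} V_{11} \\ \vdots \\ V_{1N} \\ V_{21} \\ \vdots \\ V_{2N} \end{pmatrix},$$

where $B_1$ is the DFT of $\mu_t$ and $B_2$ is the DFT of $\alpha_{1t}$, $\alpha_{2t} = -\alpha_{1t}$. The null hypothesis is that $B_2 = 0$. Now, in (5.52),

$$s_{zy} = \begin{pmatrix} N (\bar Y_{1\cdot} + \bar Y_{2\cdot}) \\ N (\bar Y_{1\cdot} - \bar Y_{2\cdot}) \end{pmatrix}$$

and

$$\begin{aligned} s^2_{y \cdot z} &= \sum_{i=1}^{2} \sum_{j=1}^{N} |Y_{ij}|^2 - \big( N (\bar Y_{1\cdot} + \bar Y_{2\cdot})^*,\ N (\bar Y_{1\cdot} - \bar Y_{2\cdot})^* \big) \begin{pmatrix} 2N & 0 \\ 0 & 2N \end{pmatrix}^{-1} \begin{pmatrix} N (\bar Y_{1\cdot} + \bar Y_{2\cdot}) \\ N (\bar Y_{1\cdot} - \bar Y_{2\cdot}) \end{pmatrix} \\ &= \sum_{j=1}^{N} |Y_{1j}|^2 + \sum_{j=1}^{N} |Y_{2j}|^2 - N |\bar Y_{1\cdot}|^2 - N |\bar Y_{2\cdot}|^2 \\ &= \sum_{i=1}^{2} \sum_{j=1}^{N} |Y_{ij} - \bar Y_{i\cdot}|^2, \end{aligned}$$

which is (5.85). Under the reduced model, $s_{1y} = N (\bar Y_{1\cdot} + \bar Y_{2\cdot})$ and $S_{11} = 2N$ so that

$$s^2_{y \cdot 1} = \sum_{j=1}^{N} |Y_{1j}|^2 + \sum_{j=1}^{N} |Y_{2j}|^2 - \frac{N}{2} |\bar Y_{1\cdot} + \bar Y_{2\cdot}|^2.$$

Then, substitute

$$\bar Y_{\cdot\cdot} = \frac{1}{2} (\bar Y_{1\cdot} + \bar Y_{2\cdot})$$

to obtain

$$s^2_{y \cdot 1} = \sum_{j=1}^{N} |Y_{1j}|^2 + \sum_{j=1}^{N} |Y_{2j}|^2 - 2N |\bar Y_{\cdot\cdot}|^2.$$

Then,

$$\begin{aligned} s^2_{y \cdot 1} - s^2_{y \cdot z} &= -2N |\bar Y_{\cdot\cdot}|^2 + N |\bar Y_{1\cdot}|^2 + N |\bar Y_{2\cdot}|^2 \\ &= N \sum_{i=1}^{2} |\bar Y_{i\cdot} - \bar Y_{\cdot\cdot}|^2 \\ &= \sum_{i=1}^{2} \sum_{j=1}^{N} |\bar Y_{i\cdot} - \bar Y_{\cdot\cdot}|^2, \end{aligned}$$

which is (5.84).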
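The power decomposition in Problem 7.10 holds at each frequency for any complex values $Y_{ij}$, so it can be checked with arbitrary numbers. The sketch below uses simulated complex values as stand-ins for the DFTs; the group size and the variable names are chosen for this example only.

```r
set.seed(6)                                       # arbitrary seed
N <- 8                                            # series per group (arbitrary)
Y <- matrix(complex(real = rnorm(2 * N), imaginary = rnorm(2 * N)), nrow = 2)

Ybar_i <- rowMeans(Y)                             # group means
Ybar   <- mean(Y)                                 # grand mean

err_power <- sum(Mod(Y - Ybar_i)^2)               # s^2_{y.z}, as in (5.85)
red_power <- sum(Mod(Y)^2) - 2 * N * Mod(Ybar)^2  # s^2_{y.1}
reg_power <- N * sum(Mod(Ybar_i - Ybar)^2)        # as in (5.84)

c(red_power - err_power, reg_power)
```

The two printed numbers agree up to rounding, mirroring the usual between-group and within-group split in a one-way analysis of variance.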
7.11 Use the model

$$y_{ijt} = \mu_{it} + v_{ijt},$$

for $i = 1, \ldots, I$, $j = 1, \ldots, N_i$ and write the frequency domain version in terms of the vector

$$Y = (Y_{11}, \ldots, Y_{1,N_1}, Y_{21}, \ldots, Y_{2N_2}, \ldots, Y_{I1}, \ldots, Y_{IN_I})'.$$

The matrix $Z$ has $N_1$ ones in elements 1 to $N_1$ of column 1, and zeros elsewhere, $N_2$ ones in elements $N_1 + 1, \ldots, N_1 + N_2$ of column 2, etc. It follows that

$$S_z = Z^* Z = \mathrm{diag}(N_1, N_2, \ldots, N_I)$$

and

$$Z^* Y = (N_1 \bar Y_{1\cdot}, N_2 \bar Y_{2\cdot}, \ldots, N_I \bar Y_{I\cdot})'$$

so that

$$\hat B = (\bar Y_{1\cdot}, \bar Y_{2\cdot}, \ldots, \bar Y_{I\cdot})'$$

and

$$A^* \hat B = \sum_{i=1}^{I} A_i^* \bar Y_{i\cdot}.$$

Finally,

$$Q(A) = (A_1^*, A_2^*, \ldots, A_I^*)\ \mathrm{diag}\Big( \frac{1}{N_1}, \frac{1}{N_2}, \ldots, \frac{1}{N_I} \Big)\ (A_1, A_2, \ldots, A_I)' = \sum_{i=1}^{I} \frac{|A_i|^2}{N_i}.$$

The error variance $s^2_{y \cdot z}$ comes from (5.85).

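7.12 Each of the spectra

$$\hat f_1 = \sum_{j=1}^{N_1} |Y_{1j} - \bar Y_{1\cdot}|^2 \qquad \text{and} \qquad \hat f_2 = \sum_{j=1}^{N_2} |Y_{2j} - \bar Y_{2\cdot}|^2$$

can be regarded as an error power component, computed from the model

$$y_{ijt} = \mu_{it} + v_{ijt}$$

for a fixed $i = 1, 2$. Hence, from Table 5.2, the error power components will have a chi-squared distribution, say,

$$\frac{2 (N_i - 1) \hat f_i}{f_i} \sim \chi^2_{2(N_i - 1)}$$

for $i = 1, 2$, and the two samples are assumed to be independent. It follows that the ratio

$$\frac{\hat f_1 f_2}{\hat f_2 f_1} \sim \frac{\chi^2_{[2(N_1 - 1)]} / 2(N_1 - 1)}{\chi^2_{[2(N_2 - 1)]} / 2(N_2 - 1)} \sim F_{[2(N_1 - 1),\, 2(N_2 - 1)]}.$$

It follows that

$$\frac{\hat f_1}{\hat f_2} \sim \frac{f_1}{f_2}\, F_{[2(N_1 - 1),\, 2(N_2 - 1)]}.$$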
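7.13 In the notation of (5.113), $\mu_1 = s$, $\mu_2 = 0$, $\Sigma_1 = \Sigma_2 = \sigma_w^2 I$ and

$$\begin{aligned} d_L(x) &= \frac{1}{\sigma_w^2} s' x - \frac{1}{2} \frac{s' s}{\sigma_w^2} + \ln \frac{\pi_1}{\pi_2} \\ &= \frac{1}{\sigma_w^2} \sum_{t=1}^{n} s_t x_t - \frac{1}{2} \frac{\sum_{t=1}^{n} s_t^2}{\sigma_w^2} + \ln \frac{\pi_1}{\pi_2} \\ &= \frac{1}{\sigma_w^2} \sum_{t=1}^{n} s_t x_t - \frac{1}{2} \Big( \frac{S}{N} \Big) + \ln \frac{\pi_1}{\pi_2}. \end{aligned}$$

When $\pi_1 = \pi_2$, the last term disappears and we may use (5.115) for the two error probabilities with

$$D^2 = \frac{s' s}{\sigma_w^2} = \Big( \frac{S}{N} \Big).$$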
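To make the linear discriminant in Problem 7.13 concrete, here is a minimal sketch with equal priors. The sinusoidal signal, the noise variance and the sample size are invented for illustration, and the final line uses the standard equal-covariance error rate $\Phi(-D/2)$, which is assumed (not verified here) to be what (5.115) gives.

```r
set.seed(7)                                  # arbitrary seed
n <- 64
sig2w <- 1                                   # noise variance (assumption)
s <- cos(2 * pi * (1:n) / 8)                 # hypothetical known signal
snr <- sum(s^2) / sig2w                      # S/N = s's / sigma_w^2

# Linear discriminant with equal priors: classify as "signal present" if d > 0.
d_L <- function(x) sum(s * x) / sig2w - snr / 2

x1 <- s + rnorm(n, sd = sqrt(sig2w))         # draw from the signal-plus-noise population
x2 <- rnorm(n, sd = sqrt(sig2w))             # draw from the noise-only population
c(d_L(x1), d_L(x2))

# Common error probability under the equal-covariance normal model, D^2 = S/N.
pnorm(-sqrt(snr) / 2)
```

The statistic is a matched filter: it correlates the data with the known signal and compares the result with half the signal-to-noise ratio.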
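7.14 In this case, $\mu_1 = \mu_2 = 0$, $\Sigma_1 = (\sigma_s^2 + \sigma_w^2) I$, $\Sigma_2 = \sigma_w^2 I$, so that (5.115) becomes

$$\begin{aligned} d_q(x) &= -\frac{n}{2} \ln \Big( \frac{\sigma_s^2 + \sigma_w^2}{\sigma_w^2} \Big) - \frac{1}{2} \Big( \frac{1}{\sigma_s^2 + \sigma_w^2} - \frac{1}{\sigma_w^2} \Big) x' x + \ln \frac{\pi_1}{\pi_2} \\ &= -\frac{n}{2} \ln \Big( \frac{\sigma_s^2 + \sigma_w^2}{\sigma_w^2} \Big) + \frac{1}{2} \frac{\sigma_s^2}{\sigma_w^2 (\sigma_s^2 + \sigma_w^2)} \sum_{t=1}^{n} x_t^2 + \ln \frac{\pi_1}{\pi_2}. \end{aligned}$$

In this case, define the signal-to-noise ratio as

$$\Big( \frac{S}{N} \Big) = \frac{\sigma_s^2}{\sigma_w^2}$$

so that, for the quadratic criterion with $\pi_1 = \pi_2$ we accept $\Pi_1$ or $\Pi_2$ according to whether the statistic

$$T(x) = \frac{1}{2} \frac{\sigma_s^2}{\sigma_w^2 (\sigma_s^2 + \sigma_w^2)} \sum_{t=1}^{n} x_t^2$$

exceeds or fails to exceed

$$K = \frac{n}{2} \ln \Big( \frac{\sigma_s^2 + \sigma_w^2}{\sigma_w^2} \Big) = \frac{n}{2} \ln \Big[ 1 + \Big( \frac{S}{N} \Big) \Big].$$

Now, under $\Pi_1$, $\sum_t x_t^2 \sim (\sigma_s^2 + \sigma_w^2) \chi^2_n$, whereas, under $\Pi_2$, $\sum_t x_t^2 \sim \sigma_w^2 \chi^2_n$, so that

$$T(x) \sim \frac{1}{2} \Big( \frac{S}{N} \Big) \chi^2_n$$

under $\Pi_1$ and

$$T(x) \sim \frac{1}{2} \Big( \frac{S}{N} \Big) \Big[ 1 + \Big( \frac{S}{N} \Big) \Big]^{-1} \chi^2_n$$

under $\Pi_2$.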
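Given the two scaled chi-squared distributions above, the error probabilities of the quadratic rule for Problem 7.14 follow from `pchisq`. The sketch below is illustrative only; the sample size and signal-to-noise ratio are assumed values.

```r
n <- 50; snr <- 0.5                         # illustrative values (assumptions)
K <- (n / 2) * log(1 + snr)                 # threshold for equal priors

# T(x) ~ (snr / 2) * chisq_n under the signal population,
# T(x) ~ (snr / 2) / (1 + snr) * chisq_n under the noise population.
p_miss  <- pchisq(2 * K / snr, df = n)      # T <= K although signal is present
p_false <- pchisq(2 * K * (1 + snr) / snr, df = n,
                  lower.tail = FALSE)       # T > K although only noise is present
c(miss = p_miss, false_alarm = p_false)
```

Unlike the linear case in Problem 7.13, the two error probabilities are generally not equal, because the statistic has a different scale under each hypothesis.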

Figure 2: Problem 7.15 (Awake-Heat) spectral density of first PC series.

7.15 Awake-Heat: Figures 1 and 2 are the figures corresponding to Figures 5.14 and 5.15 of the text (Example 5.14) except that here, Caudate is included (location number 5). Awake-Shock: The corresponding figures are Figures 3 and 4 below. The listing below is similar to Table 5.8 but for Awake-Heat and Awake-Shock. Note that $\chi^2_2(.95, .975, .99) = 5.99, 7.38, 9.21$.

| loc | AWAKE-HEAT \|e\| | AWAKE-HEAT chi^2 | AWAKE-SHOCK \|e\| | AWAKE-SHOCK chi^2 |
|---|---|---|---|---|
| 1 | .467 | 842.01 | .410 | 233.39 |
| 2 | .450 | 150.16 | .416 | 138.42 |
| 3 | .460 | 623.62 | .387 | 389.46 |
| 4 | .363 | 104.45 | .370 | 352.68 |
| 5 | .230 | 30.32 | .269 | 107.23 |
| 6 | .121 | 8.09 | 0.139 | 13.952 |
| 7 | .012 | 0.07 | 0.161 | 18.848 |
| 8 | .323 | 64.99 | 0.309 | 252.75 |
| 9 | .254 | 46.76 | 0.398 | 539.46 |

7.16 (a) See Figure 5. The P components have broad power at the midrange frequencies whereas the S
components have broad power at the lower frequencies.
(b) See Figure 7. There appears to be little or no coherence between the P and S components.
(c) - (d) See Figure 6. These figures support the overall conclusion of part (a).
(e) See Figure 8. The canonical variate series appear to be strongly coherent (in contrast to the
individual series).

Figure 4: Problem 7.15 (Awake-Shock) spectral density of first PC series.

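7.17 For $p = 3$ and $q = 1$, write (5.158) as

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} z + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix}.$$

This implies

$$\begin{pmatrix} 1 & .4 & .9 \\ .4 & 1 & .7 \\ .9 & .7 & 1 \end{pmatrix} = \begin{pmatrix} b_1^2 + \delta_1^2 & b_1 b_2 & b_1 b_3 \\ b_1 b_2 & b_2^2 + \delta_2^2 & b_2 b_3 \\ b_1 b_3 & b_2 b_3 & b_3^2 + \delta_3^2 \end{pmatrix}.$$

Now

$$b_1 b_2 = .4 \quad \text{and} \quad b_2 b_3 = .7 \quad \Rightarrow \quad b_1 = \frac{4}{7} b_3.$$

But

$$b_1 b_3 = .9 \quad \Rightarrow \quad \frac{4}{7} b_3^2 = .9.$$

This means that $b_3^2 = .9 \times \frac{7}{4} = 1.575$. But we also have $b_3^2 + \delta_3^2 = 1$ in which case $\delta_3^2 < 0$ which is not a valid variance.

7.19 Note that $f(\nu) = \sum_h \Gamma(h) \cos(2\pi h \nu) - i \sum_h \Gamma(h) \sin(2\pi h \nu) = f^{re}(\nu) - i f^{im}(\nu)$. Now

$$f^{im}(\nu)' = \sum_h \Gamma(h)' \sin(2\pi h \nu) = \sum_h \Gamma(-h) \sin(2\pi h \nu) = -\sum_h \Gamma(-h) \sin(2\pi (-h) \nu) = -\sum_h \Gamma(h) \sin(2\pi h \nu) = -f^{im}(\nu);$$

that is, the imaginary part is skew symmetric. Also note that $f^{re}(\nu)' = \sum_h \Gamma(h)' \cos(2\pi h \nu) = f^{re}(\nu)$; that is, the real part is symmetric.

Next, because $\beta' f^{im}(\nu) \beta$ is a scalar, $\beta' f^{im}(\nu) \beta = (\beta' f^{im}(\nu) \beta)' = \beta' f^{im}(\nu)' \beta = -\beta' f^{im}(\nu) \beta$. This result implies $\beta' f^{im}(\nu) \beta = 0$ for any real-valued vector $\beta$.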
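The last step of Problem 7.19 relies only on skew symmetry, which is easy to illustrate. In the sketch below an arbitrary real skew-symmetric matrix stands in for $f^{im}(\nu)$; the matrix, vector and dimension are made up for the example.

```r
set.seed(8)                   # arbitrary seed
p <- 4
M <- matrix(rnorm(p^2), p, p)
S <- M - t(M)                 # any real skew-symmetric matrix, standing in for f^im
b <- rnorm(p)                 # arbitrary real vector
qf <- drop(t(b) %*% S %*% b)  # quadratic form b' S b
qf                            # zero up to rounding error
```

This is why only the real part of the spectral matrix contributes to a quadratic form in a real vector.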

Figure 6: Problem 7.16 (c)-(d). Spectral density of first PC series.

7.20 See Figure 9. Note the significant peak at $\nu = 1/3$ as opposed to the fourth quarter of EBV (see Figure 7.23).
7.21 (a) The estimated spectral density of $r_t$ is shown in Figure 10 and supports the white noise hypothesis.
(b) The spectral envelope with respect to $G = \{x, |x|, x^2\}$ is shown in Figure 11 where substantial power near the zero frequency is noted. The optimal transformation is shown in Figure 12 as a solid line and the usual square transformation is shown as a dotted line; the two transformations are similar.
Figure 7: Problem 7.16 (b). Estimated coherencies.

Figure 8: Problem 7.16 (c) (d). Squared coherency between canonical variate series.

Figure 9: Problem 7.20. This is the equivalent of Figure 7.23 but for the herpesvirus saimiri.

Figure 10: Problem 7.21 (a). Estimated spectral density of NYSE returns.

Figure 11: Problem 7.21 (b). Spectral envelope of NYSE returns with respect to $G = \{x, |x|, x^2\}$.

Figure 12: Problem 7.21 (b). The optimal transformation (solid line) and the usual square transformation (- - -).