Edition 2
Instructor's Manual
Chapter 1
1.1 The major differences are how quickly the signal dies out in the explosion versus the earthquake and the larger amplitude of the signals in the explosion.
x = matrix(scan("/mydata/eq5exp6.dat"), ncol=2)
plot.ts(x[,1], col="blue", main="EQblue EXPred", ylab="")
lines(x[,2], col="red")
1.2 Consider a signal-plus-noise model of the general form xt = st + wt, where wt is Gaussian white noise with σw² = 1. Simulate and plot n = 200 observations from each of the following two models.
(a) Below is R code for this problem. Figure 1 shows contrived data simulated according to this
model. The modulating functions are also plotted.
w = rnorm(200,0,1)
t = 1:100
y = cos(2*pi*t/4)
e = 10*exp(-t/20)    # decaying modulator
s = c(rep(0,100), y*e )
x = s+w
par(mfrow=c(2,1))
ts.plot(s, main="signal")
ts.plot(x, main="signal+noise")
(b) This is similar to part (a). The plots according to the model in this part are also shown and we
note that the second modulating function has less decay and produces a longer signal.
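A minimal sketch of the part (b) simulation, assuming the part (b) model uses the slower-decaying modulator 10 exp(−t/200) (that modulator is an assumption; the problem statement is not reproduced here):

```r
# Part (b) sketch: same setup as part (a), but with a slower-decaying
# modulator exp(-t/200) (assumed), which produces a longer signal
w = rnorm(200, 0, 1)
t = 1:100
y = cos(2*pi*t/4)
e = 10*exp(-t/200)                  # less decay than exp(-t/20) in part (a)
s = c(rep(0, 100), y*e)             # signal is zero for the first 100 points
x = s + w
par(mfrow=c(2,1))
ts.plot(s, main="signal")
ts.plot(x, main="signal+noise")
```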
(c) The first signal bears a striking resemblance to the two arrival phases in the explosion. The second signal decays more slowly and looks more like the earthquake. The periodic behavior is emulated by the cosine function, which will make one cycle every four points. If we assume that the data are sampled at 40 points per second, the data will make 10 cycles in a second. This is a bit high for earthquakes and explosions, which will generally make about 1 cycle per second (see Figure 3.10).
1.3 Below is R code for parts (a)–(c). In all cases the moving average nearly annihilates the signal (completely in the second case). The signals in parts (a) and (c) are similar.
w = rnorm(150,0,1) # 50 extra to avoid startup problems
x = filter(w, filter=c(0,-.9), method="recursive")
x = x[51:150]
x2 = 2*cos(2*pi*(1:100)/4)
x3 = x2 + rnorm(100,0,1)
v = filter(x, rep(1,4)/4) # moving average
v2 = filter(x2, rep(1,4)/4) # moving average
v3 = filter(x3, rep(1,4)/4) # moving average
par(mfrow=c(3,1))
plot.ts(x)
lines(v,lty="dashed")
plot.ts(x2)
lines(v2,lty="dashed")
plot.ts(x3)
lines(v3,lty="dashed")
Figure 1: Series (a) and Series (b) simulated from the models in parts (a) and (b), with the corresponding modulating functions, Modulator (a) and Modulator (b).
1.4 γ(s, t) = E[(xs − μs)(xt − μt)] = E[xs xt − μs xt − xs μt + μs μt]
 = E(xs xt) − μs E(xt) − E(xs) μt + μs μt
 = E(xs xt) − μs μt − μs μt + μs μt
 = E(xs xt) − μs μt.
1.5 For (a) and (b), E(xt) = st. To get Figure 3, just plot the signal (s) in Problem 1.2. Note that the autocovariance function
γ(t, u) = E[(xt − st)(xu − su)] = E(wt wu),
which is one when t = u and zero otherwise.
1.6 (a) Since E(xt) = β1 + β2 t, the mean is not constant, i.e., does not satisfy (1.17). Note that
xt − x_{t−1} = β1 + β2 t + wt − β1 − β2(t − 1) − w_{t−1} = β2 + wt − w_{t−1},
Figure: Mean Series (a).
E(vt) = (1/(2q + 1)) Σ_{j=−q}^{q} [β1 + β2(t − j)]
 = (1/(2q + 1)) [ (2q + 1)(β1 + β2 t) − β2 Σ_{j=−q}^{q} j ]
 = β1 + β2 t
because the positive and negative terms in the last sum cancel out. To get the covariance write
the process as
yt = (1/(2q + 1)) Σ_{j=−∞}^{∞} aj w_{t−j},
where aj = 1 for j = −q, …, 0, …, q and aj is zero otherwise. To get the covariance, note that
γy(h) = E[(y_{t+h} − E y_{t+h})(yt − E yt)]
 = (2q + 1)^{−2} Σ_{j,k} aj ak E(w_{t+h−j} w_{t−k})
 = σw² (2q + 1)^{−2} Σ_{j=−∞}^{∞} a_{j+h} aj,
since E(w_{t+h−j} w_{t−k}) = σw² when j = k + h and zero otherwise. Writing out the terms in γy(h), for h = 0, 1, 2, …, we obtain
γy(h) = σw² (2q + 1 − h) / (2q + 1)²
for h = 0, 1, 2, …, 2q and zero for h > 2q.
1.7 By a computation analogous to that appearing in Example 1.17, we may obtain
γ(h) = 6σw²  h = 0,
       4σw²  |h| = 1,
       σw²   |h| = 2,
       0     |h| > 2.
The autocorrelation is obtained by dividing the autocovariances by γ(0) = 6σw².
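These values can be checked numerically, assuming the Problem 1.7 model is xt = w_{t−1} + 2wt + w_{t+1} (coefficients a = (1, 2, 1), which reproduce γ(0) = 6σw², γ(1) = 4σw², γ(2) = σw²):

```r
# Autocovariances of x_t = w_{t-1} + 2 w_t + w_{t+1} (assumed model),
# in units of sigma_w^2: gamma(h) = sum_j a_j a_{j+h} with a = (1, 2, 1)
a = c(1, 2, 1)
g = sapply(0:3, function(h) {
  n = length(a)
  if (h >= n) 0 else sum(a[1:(n-h)] * a[(1+h):n])
})
g   # 6 4 1 0
```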
1.8 (a) Simply substitute δs + Σ_{k=1}^{s} wk for xs to see that
δt + Σ_{k=1}^{t} wk = δ + δ(t − 1) + Σ_{k=1}^{t−1} wk + wt = δ + x_{t−1} + wt.
(b) E(xt) = E[ δt + Σ_{k=1}^{t} wk ] = δt + Σ_{k=1}^{t} E(wk) = δt. For s ≤ t,
γx(s, t) = E[ (w1 + ⋯ + ws)(w1 + ⋯ + ws + w_{s+1} + ⋯ + wt) ] = Σ_{j=1}^{s} E(wj²) = s σw².
(c) From (b), γx(t − 1, t) = (t − 1)σw², so
ρx(t − 1, t) = (t − 1)σw² / √{ (t − 1)σw² · t σw² } = √{(t − 1)/t},
which yields the result. The implication is that the series tends to change slowly.
(d) The series is nonstationary because both the mean function and the autocovariance function
depend on time, t.
(e) One possibility is to note that ∇xt = xt − x_{t−1} = δ + wt, which is stationary.
1.9 Note that E(U1) = E(U2) = 0, implying E(xt) = 0. Then,
γ(h) = E(x_{t+h} xt)
 = E{ (U1 sin[2πω0(t + h)] + U2 cos[2πω0(t + h)]) (U1 sin[2πω0 t] + U2 cos[2πω0 t]) }
 = σ² { sin[2πω0(t + h)] sin[2πω0 t] + cos[2πω0(t + h)] cos[2πω0 t] }
 = σ² cos[2πω0(t + h) − 2πω0 t]
 = σ² cos[2πω0 h].
1.10 (a) MSE(A) = E[(x_{t+ℓ} − A xt)²] = E(x²_{t+ℓ}) − 2A E(x_{t+ℓ} xt) + A² E(x²t) = γ(0) − 2Aγ(ℓ) + A²γ(0). Setting the derivative with respect to A to zero gives A = γ(ℓ)/γ(0) = ρ(ℓ).
(b) With A = ρ(ℓ),
MSE(A) = γ(0) − 2 γ(ℓ)²/γ(0) + ρ(ℓ)² γ(0) = γ(0)[1 − 2ρ(ℓ)² + ρ(ℓ)²] = γ(0)[1 − ρ(ℓ)²].
(c) If x_{t+ℓ} = A xt exactly, then E(x_{t+ℓ} − A xt)² = γ(0)[1 − ρ(ℓ)²] = 0, so ρ(ℓ) = ±1.
2
2
j wt+hj wtk k = w
j= k=
2
j k hj+k = w
j,k
k+h k ,
k=
n
xnt =
j wtj .
j=n
so that
E[(xt xnt )2 ]
j k E(wtj wtk )
j>n k>n
j>n k>n
2
w
j 
j>n
2
= w
k 
k>n
2
j  ,
j>n
which converges to zero as n . Actually, in the white noise case, j j 2 < would be
enough, as can be seen by following through the same argument as above.
1.12
γxy(h) = E[(x_{t+h} − μx)(yt − μy)] = E[(yt − μy)(x_{t+h} − μx)] = γyx(−h).
1.13 (a)
γy(h) = σw²(1 + θ²) + σu²  h = 0,
        −θσw²              |h| = 1,
        0                  |h| > 1.
(b)
γxy(h) = σw²    h = 0,
         −θσw²  h = −1,
         0      otherwise.
Also, γx(h) = σw² for h = 0 and zero otherwise.
(c)
ρxy(h) = γxy(h) / √(γx(0) γy(0)).
1.14 (a) Since xt is Gaussian,
E(yt) = E(exp{xt}) = exp{ μx + γx(0)/2 }.
(b)
γy(h) = E(y_{t+h} yt) − [E(yt)]² = exp{ 2μx + γx(0) + γx(h) } − [ exp{ μx + γx(0)/2 } ]²
 = exp{ 2μx + γx(0) } [ exp{γx(h)} − 1 ].
1.15 For h = 1,
γ(1) = E(x_{t+1} xt) = E(w_{t+1} wt wt w_{t−1}) = E(w_{t+1}) E(w²t) E(w_{t−1}) = 0,
and similar computations establish that γ(h) = 0 for h ≥ 1. The series is white noise.
1.16 E(xt) = ∫₀¹ sin(2πut) du = [ −cos(2πut)/(2πt) ]₀¹ = −[cos(2πt) − 1]/(2πt) = 0,
for t = 1, 2, ….
Similarly, using sin A sin B = ½[cos(A − B) − cos(A + B)],
γ(s, t) = E[sin(2πUs) sin(2πUt)] = ½ ∫₀¹ [ cos(2πu(s − t)) − cos(2πu(s + t)) ] du.
For the joint distributions, one finds, for example,
P{x1 ≤ 1/2, x3 ≤ 1/2} = 1/4, while P{x2 ≤ 1/2, x3 ≤ 1/2} = 2/9,
P{x1 ≤ 0, x3 ≤ 0} = 1/3, while P{x2 ≤ 0, x4 ≤ 0} = 1/4,
P{x1 > 0, x2 > 0, x3 > 0} = 1/6, while P{x2 > 0, x3 > 0, x4 > 0} = 1/8.
Figure 4 shows a plot of x1, x2, x3 over the interval 0 ≤ u ≤ 1; the probabilities are the Lebesgue measure of the inverse images satisfying the joint probabilities. One only needs to define the intervals where both curves lie below .5 to compute the probabilities.
1.17 (a) Write
Σ_{j=1}^{n} λj xj = Σ_{j=1}^{n} λj (wj − w_{j−1}) = −λ1 w0 + Σ_{j=1}^{n−1} (λj − λ_{j+1}) wj + λn wn.
Because the wj's are independent and identically distributed, the characteristic function can be written as
φ(λ1, …, λn) = φ(−λ1) [ Π_{j=1}^{n−1} φ(λj − λ_{j+1}) ] φ(λn).
(b) Because the joint distribution of the wj will not change simply by shifting x1, …, xn to x_{1+h}, …, x_{n+h}, the characteristic function remains the same.
1.18 Letting k = j + h and holding j fixed after substituting from (1.31) yields
Σ_{h=−∞}^{∞} |γ(h)| = σw² Σ_{h=−∞}^{∞} | Σ_{j=−∞}^{∞} ψ_{j+h} ψj |
 ≤ σw² Σ_{h=−∞}^{∞} Σ_{j=−∞}^{∞} |ψ_{j+h}| |ψj|
 = σw² ( Σ_{k=−∞}^{∞} |ψk| ) ( Σ_{j=−∞}^{∞} |ψj| ) < ∞.
1.19 Code for parts (a) and (b) is below. Students should have about 1 in 20 ACF values outside the bounds, but the values for part (b) will be larger in general than for part (a).
wa=rnorm(500,0,1)
wb=rnorm(50,0,1)
par(mfrow=c(2,1))
acf(wa,20)
acf(wb,20)
1.20 This is similar to the previous problem. Generate 2 extra observations due to loss of the end points in
making the MA.
wa=rnorm(502,0,1)
wb=rnorm(52,0,1)
va=filter(wa, sides=2, rep(1,3)/3)
vb=filter(wb, sides=2, rep(1,3)/3)
par(mfrow=c(2,1))
acf(va,20, na.action = na.pass)
acf(vb,20, na.action = na.pass)
1.21 Generate the data as in Problem 1.2 and then type acf(x, 25). The sample ACF will exhibit significant correlations at one cycle every four lags, which is the same frequency as the signal. (The process is not stationary because the mean function is the signal, which depends on time t.)
1.22 The sample ACF should look sinusoidal, making one cycle every 50 lags.
x = 2*cos(2*pi*(1:500)/50 + .6*pi)+ rnorm(500,0,1)
acf(x,100)
1.23 γy(h) = cov(y_{t+h}, yt) = cov(x_{t+h} − .7x_{t+h−1}, xt − .7x_{t−1}) = 0 if |h| > 1 because the xt's are independent. When h = 0, γy(0) = σx²(1 + .7²), where σx² is the variance of xt. When |h| = 1, γy(1) = −.7σx². Thus,
ρy(1) = −.7/(1 + .7²) ≈ −.47.
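The value of ρy(1) is easy to check numerically (a one-line sketch, no assumptions beyond the formula above):

```r
# rho_y(1) = -theta/(1 + theta^2) with theta = .7
rho1 = -.7/(1 + .7^2)
round(rho1, 2)   # -0.47
```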
1.24 (a) The variance is always nonnegative, so that, for xt a zero-mean stationary series and constants a1, …, an,
0 ≤ var( Σ_{s=1}^{n} as xs ) = E[ ( Σ_{s=1}^{n} as xs )( Σ_{t=1}^{n} at xt ) ] = Σ_{s,t} as γ(s − t) at = a′Γa.
(b) With yt denoting the data, define the (2n − 1) × n matrix
      y1    0     0    ...  0
      y2    y1    0    ...  0
      y3    y2    y1   ...  0
      ..    ..    ..        ..
D =   yn    yn−1  yn−2 ...  y1
      0     yn    yn−1 ...  y2
      ..    ..    ..        ..
      0     0     0    ...  yn
If Γ̂ = {γ̂(s − t)}, s, t = 1, …, n, one can show by matrix multiplication that
Γ̂ = n⁻¹ D′D.
Then,
a′Γ̂a = n⁻¹ a′D′Da = n⁻¹ c′c = n⁻¹ Σ_{i} ci² ≥ 0
for c = Da.
1.25 (a)
E(x̄t) = N⁻¹ Σ_{j=1}^{N} E(xjt) = N⁻¹ Σ_{j=1}^{N} μt = μt.
(b)
E[(x̄t − μt)²] = N⁻² Σ_{j=1}^{N} Σ_{k=1}^{N} E[(xjt − μt)(xkt − μt)] = N⁻² · N γe(t, t) = N⁻¹ γe(t, t).
(c) As long as the separate series are observing the same signal, we may assume that the variance goes down proportionally to the number of series, as in the iid case. If normality is reasonable, pointwise 100(1 − α)% intervals can be computed as
x̄t ± z_{α/2} √( γe(t, t)/N ).
1.26
Vx(h) = ½ E[(x_{s+h} − μ) − (xs − μ)]² = ½ [γ(0) − γ(h) − γ(h) + γ(0)] = γ(0) − γ(h).
1.27 With xt = β1 t, we have x̄ = β1 t̄ with t̄ = (n + 1)/2, so that x_{t+h} − x̄ = β1[(t − t̄) + h] and
γ̂(h) = n⁻¹ Σ_{t=1}^{n−h} β1² [(t − t̄) + h][(t − t̄)] = (β1²/n) [ Σ_{t=1}^{n−h} (t − t̄)² + h Σ_{t=1}^{n−h} (t − t̄) ]
and
γ̂(0) = (β1²/n) Σ_{t=1}^{n} (t − t̄)².
Because Σ_{t=1}^{n} (t − t̄) = 0, we may write
γ̂(h) = γ̂(0) − (β1²/n) [ Σ_{t=n−h+1}^{n} (t − t̄)² + h Σ_{t=n−h+1}^{n} (t − t̄) ],
so that ρ̂(h) = γ̂(h)/γ̂(0) = 1 − R, where
R = [ Σ_{s=1}^{h} (s + n − h − t̄)² + h Σ_{s=1}^{h} (s + n − h − t̄) ] / Σ_{t=1}^{n} (t − t̄)²
is a remainder term that needs to converge to zero. We can evaluate the terms in the remainder using
Σ_{t=1}^{m} t = m(m + 1)/2   and   Σ_{t=1}^{m} t² = m(2m + 1)(m + 1)/6,
which give, in particular, Σ_{t=1}^{n} (t − t̄)² = Σ_{t=1}^{n} t² − n t̄² = n(n + 1)(n − 1)/12.
The terms in the numerator of R are O(n²), whereas the denominator is O(n³), so the remainder term converges to zero and ρ̂(h) → 1 for each fixed h.
1.28 (a) By Tchebycheff's inequality,
P{ n^{1/2} |x̄| > ε } ≤ n E[x̄²]/ε².
Note that
n E[x̄²] = Σ_{|u|<n} (1 − |u|/n) γ(u) → Σ_{u=−∞}^{∞} γ(u) = 0,
so that n^{1/2} x̄ → 0 in probability.
(b) An example of such a process is xt = ∇wt = wt − w_{t−1}, where wt is white noise. This situation arises when a stationary process is overdifferenced (i.e., wt is already stationary, so differencing wt would be considered overdifferencing).
1.29 Let yt = xt − μx and write the difference as
n^{1/2}[γ̃(h) − γ̂(h)] = n^{−1/2} Σ_{t=1}^{n} y_{t+h} yt − n^{−1/2} Σ_{t=1}^{n−h} (y_{t+h} − ȳ)(yt − ȳ)
 = n^{−1/2} Σ_{t=n−h+1}^{n} y_{t+h} yt + ȳ n^{−1/2} Σ_{t=1}^{n−h} (y_{t+h} + yt) − (n − h) n^{−1/2} ȳ².
For the first term,
E | n^{−1/2} Σ_{t=n−h+1}^{n} y_{t+h} yt | ≤ n^{−1/2} Σ_{t=n−h+1}^{n} E^{1/2}[y²_{t+h}] E^{1/2}[y²t] = n^{−1/2} h γx(0) → 0
as n → ∞. Applying the Markov inequality in the hint then shows that the first term is op(1). In order to handle the other terms, which differ trivially from n^{1/2} ȳ², note that, from Theorem A.5, n^{1/2} ȳ converging in distribution to a normal implies that n ȳ² converges in distribution to a (scaled) chi-square random variable with 1 degree of freedom, and hence n ȳ² = Op(1). Hence, n^{1/2} ȳ² = n^{−1/2} · n ȳ² = n^{−1/2} Op(1) = op(1), and the result is proved.
1.30 To apply Theorem A.7, we need the ACF of xt. Note that
γx(h) = Σ_{j,k} ψj ψk E[w_{t+h−j} w_{t−k}] = σw² Σ_{k=0}^{∞} φ^{h+k} φ^k = σw² φ^h / (1 − φ²),
and we have ρx(h) = φ^h for the ACF. Now, from (A.55), we have
w11 = Σ_{u=1}^{∞} [ ρx(u + 1) + ρx(u − 1) − 2ρx(1) ρx(u) ]²
 = Σ_{u=1}^{∞} [ φ^{u+1} + φ^{u−1} − 2φ^{u+1} ]²
 = (1 − φ²)² Σ_{u=1}^{∞} φ^{2(u−1)} = 1 − φ².
Hence ρ̂(1) ~ AN( φ, (1 − φ²)/n ).
Setting
A = 1 + z²_{α/2}/n,   B = −2ρ̂(1),   C = ρ̂²(1) − z²_{α/2}/n
gives the interval
( [−B − √(B² − 4AC)]/(2A), [−B + √(B² − 4AC)]/(2A) );
z_{α/2} = 1.96 gives the approximate 95% confidence interval (.47, .77).
1.31 (a) E(xt x_{t+h}) = 0 and E(xt x_{t+h} xs x_{s+k}) = 0 unless all subscripts match. But t ≠ s, and h, k ≥ 1, so all subscripts can't match and hence cov(xt x_{t+h}, xs x_{s+k}) = 0.
(b) Define yt = Σ_{j=1}^{h} λj xt x_{t+j}, for λ1, …, λh ∈ R arbitrary. Then yt is strictly stationary and h-dependent, and var(yt) = σ⁴ Σ_{j=1}^{h} λj². Hence, with ȳn = Σ_{t=1}^{n} yt / n,
√n ȳn →d N(0, Vy),   Vy = Σ_{ℓ=−h}^{h} γy(ℓ) = γy(0) = σ⁴ Σ_{j=1}^{h} λj²,
using part (a) to see that γy(ℓ) = 0 for ℓ ≠ 0. Thus
n^{−1/2} Σ_{t=1}^{n} Σ_{j=1}^{h} λj xt x_{t+j} →d N( 0, σ⁴ Σ_{j=1}^{h} λj² ),
and by the Cramér–Wold device, n^{−1/2}( Σt xt x_{t+1}, …, Σt xt x_{t+h} )′ →d N(0, σ⁴ Ih).
(c) This part follows from the proof of Problem 1.29, noting that μx = 0.
(d) Using part (c), for large n,
√n ρ̂(h) ≈ n^{−1/2} Σ_{t=1}^{n} xt x_{t+h} / ( n⁻¹ Σ_{t=1}^{n} xt² ),
and since n⁻¹ Σt xt² →p σ², it follows that
( √n ρ̂(1), …, √n ρ̂(h) ) →d (z1, …, zh)′,
where z1, …, zh are iid standard normals.
Chapter 2
2.1 (a)–(c) The following code will produce all the necessary results. The model is overparameterized if an intercept is included (the terms for each Q are intercepts); most packages will kick out Q4. In general, βi − βj is the average increase (decrease) from quarter i to quarter j. There is substantial correlation left in the residuals, even at the yearly cycle.
jj=ts(scan("/mydata/jj.dat"), start=1960, frequency=4)
Q1=rep(c(1,0,0,0),21)
Q2=rep(c(0,1,0,0),21)
Q3=rep(c(0,0,1,0),21)
Q4=rep(c(0,0,0,1),21)
time=seq(1960,1980.75,by=.25)
reg=lm(log(jj)~0+time+Q1+Q2+Q3+Q4)
summary(reg)
# regression output
plot.ts(log(jj))
lines(time, reg$fit,col="red") # the returned fitted values are in reg$fit
plot.ts(reg$resid)
# the returned residuals are in reg$resid
acf(reg$resid,20)
2.2 (a)–(b) The following code will produce the output. Note that P_{t−4} is significant in the regression and highly correlated (zero-order correlation is .52) with mortality.
mort=ts(scan("/mydata/cmort.dat"))
temp=ts(scan("/mydata/temp.dat"))
part=ts(scan("/mydata/part.dat"))
t=ts(1:length(mort))
x=ts.intersect(mort,t,temp,temp^2,part,lag(part,4))
fit=lm(x[,1]~x[,2:6])
summary(fit)
                      Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)          79.239918     1.224693   64.702   < 2e-16  ***
x[, 2:6]t            -0.026641     0.001935  -13.765   < 2e-16  ***
x[, 2:6]temp         -0.405808     0.035279  -11.503   < 2e-16  ***
x[, 2:6]temp^2        0.021547     0.002803    7.688  8.02e-14  ***
x[, 2:6]part          0.202882     0.022658    8.954   < 2e-16  ***
x[, 2:6]lag(part, 4)  0.103037     0.024846    4.147  3.96e-05  ***
2.3 The following code will produce the output. The slope of the fitted line should be close to .1 (the true slope), but both the true and fitted lines will not be very good indicators of the so-called trend.
w=rnorm(500,.1,1)
x=cumsum(w)
t=1:500
fit=lm(x~0+t)
plot.ts(x)
lines(.1*t, lty="dashed")
abline(fit)
2.4 For the normal regression models we have xt ~ N(βj′zt, σj²), for j = 1, 2. Then
ln [ f1(x; β1, σ1²) / f2(x; β2, σ2²) ] = −(n/2) ln σ1² + (n/2) ln σ2² − (1/(2σ1²)) Σ_{t=1}^{n} (xt − β1′zt)² + (1/(2σ2²)) Σ_{t=1}^{n} (xt − β2′zt)².
Taking expectations, the fourth term in the above becomes, by adding and subtracting β1′zt inside the parentheses,
E1[ Σ_{t=1}^{n} (xt − β2′zt)² ] = nσ1² + (β1 − β2)′ Z′Z (β1 − β2),
and, dividing through by n and collecting terms, we obtain the quoted result.
2.5 Using the quoted results and the independence of β̂ and σ̂², we have
E1[ I(β1, σ1²; β̂, σ̂²) ] = (1/2) { E1[ ln(σ̂²/σ1²) ] + E1[ σ1²/σ̂² ] + E1[ (β̂ − β1)′Z′Z(β̂ − β1)/(nσ̂²) ] − 1 }.
Because nσ̂²/σ1² ~ χ²_{n−k}, independent of β̂, we have E1[σ1²/σ̂²] = n/(n − k − 2); similarly, E1[(β̂ − β1)′Z′Z(β̂ − β1)] = kσ1², so the third term is k/(n − k − 2). Hence
E1[ I(β1, σ1²; β̂, σ̂²) ] = (1/2) { E1[ ln(σ̂²/σ1²) ] + n/(n − k − 2) + k/(n − k − 2) − 1 },
which simplifies to the desired result.
2.6 (a) It is clear that E(xt) = β0 + β1 t and the mean depends on t. Note that the points will be randomly distributed around a straight line.
(b) Note that ∇xt = β1 + wt − w_{t−1}, so that E(∇xt) = β1 and
cov(∇x_{t+h}, ∇xt) = 2σw²  h = 0,
                    −σw²   |h| = 1,
                    0      |h| > 1.
(c) Here ∇xt = β1 + yt − y_{t−1}, so E(∇xt) = β1 + μy − μy = β1. Also,
cov(∇x_{t+h}, ∇xt) = cov(y_{t+h} − y_{t+h−1}, yt − y_{t−1}) = 2γy(h) − γy(h + 1) − γy(h − 1),
which is independent of t.
2.7 This is similar to part (c) of the previous problem except that now we have E(xt − x_{t−1}) = μ, with autocovariance function
cov(w_{t+h} + y_{t+h} − y_{t+h−1}, wt + yt − y_{t−1}) = γw(h) + 2γy(h) − γy(h + 1) − γy(h − 1).
2.8 (a) The variance in the second half of the varve series is obviously larger than that in the first half. Dividing the data in half gives γ̂x(0) = 133 and 593 for the first and second parts, respectively, and the variance is about 4.5 times as large in the second half. The transformed series yt = ln xt has γ̂y(0) = .27 and .45 for the two halves, respectively, and the variance of the second half is only about 1.7 times as large. Histograms, computed for the two series in Figure 5, indicate that the transformation improves the normal approximation.
(b) Autocorrelation functions for the three series, shown in Figure 6, show nonstationary behavior, except in the case of
ut = yt − y_{t−1} = ln( xt / x_{t−1} ),
Figure 5: Histograms of the untransformed varve series xt and the logarithms yt = ln xt.
Figure 6: ACFs of the varve series xt, yt = ln xt, and ut = yt − y_{t−1}.
where Pt = (xt − x_{t−1})/x_{t−1} is the proportional increase (100Pt = percentage increase) and ut = ln(1 + Pt). Hence, it appears that the percent increase in deposition in a year is a more stable quantity.
(c) The series appears stationary because the ACF in Figure 6 is essentially zero after lag one.
(d) Note that
γu(0) = E[(ut − μu)²] = E[wt²] + θ² E[w²_{t−1}] = σw²(1 + θ²)
Figure 7: Gas series, oil series and percent changes for each.
and
γu(1) = E[(w_{t+1} − θwt)(wt − θw_{t−1})] = −θ E[wt²] = −θσw²,
so that
ρu(1) = −θ / (1 + θ²),
or
ρu(1) θ² + θ + ρu(1) = 0,
and we may solve for
θ = [ −1 ± √(1 − 4ρu(1)²) ] / (2ρu(1)).
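The quadratic is easy to solve numerically; the invertible solution is the root with |θ| < 1. A sketch, using ρu(1) = −.4 as a purely hypothetical sample value:

```r
# Solve rho*theta^2 + theta + rho = 0 for theta, keeping the invertible
# root (|theta| < 1); rho = -.4 is a hypothetical sample value
rho = -.4
theta = (-1 + sqrt(1 - 4*rho^2)) / (2*rho)
theta                      # 0.5
-theta/(1 + theta^2)       # recovers rho = -0.4
```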
Figure 8: ACFs for oil and gas series and percent changes.
Figure 9: Cross correlation function, gas(t+h) vs oil(t), over the entire record (largest values .67, .44, .33).
Figure 10: Cross correlation function, gas(t+h) vs oil(t), over the first 80 points.
(c) Figure 9 shows the cross correlation function (CCF) over the entire record and is virtually the
same as the CCF over the last 100 points, which is not shown. We see indications of instantaneous
Figure 11: Scatterplots relating oil changes (abscissa) to gas changes (ordinate) at various lags: gas(t) vs oil(t+1), gas(t) vs oil(t), gas(t) vs oil(t−1), and gas(t) vs oil(t−4).
responses of gas prices to oil price changes and also significant values at lags of +1 (oil leads gas) and −1 (gas leads oil); the second of these might be considered as feedback. Figure 10 shows the CCF over the first 80 points, when there were no really substantial bursts, and we note that longer lags seem to be important. The scatter diagrams shown in Figure 11 for the main lagged relations show an interesting nonlinear phenomenon. Even when the oil changes are around zero on the horizontal axis, there are still fairly substantial variations in gas prices. Larger fluctuations in oil price still produce linear changes in gas prices of about the same order. Hence, there may be some indication of a threshold type of regression operating here, with changes of less than, say, 5% in oil prices associated with fairly large fluctuations in gasoline prices.
2.10 The R code for this problem is below.
soi=scan("/mydata/soi.dat")
# part (a)
t=1:length(soi)
fit=lm(soi~t)
summary(fit)
             Estimate  Std. Error t value Pr(>|t|)
(Intercept)  0.2109341  0.0353571   5.966 4.93e-09 ***
t           -0.0005766  0.0001350  -4.272 2.36e-05 ***  # <-- significant slope
soi.detr=fit$resid
# part (b), detrended data in fit$resid
plot.ts(soi.detr)
per=abs(fft(soi.detr))^2
plot.ts(per)
cbind((t1)/453,per)
# lists frequency and periodogram
# El Nino peak is around .024 or approx 1 cycle/42 months
# (freq=0.024282561 local max per=5.536548e+02)
2.11 (a) Unlike SOI, the recruitment series autocorrelation continues to decrease with lag. Also, the point
Chapter 3
3.1 Note ρx(1) = θ/(1 + θ²), so that ∂ρx(1)/∂θ = (1 − θ²)/(1 + θ²)² = 0 when θ = ±1. We conclude ρx(1) has a maximum at θ = 1 and a minimum at θ = −1.
3.2 (a)–(c) var(xt) = σw² Σ_{j=0}^{∞} φ^{2j} = σw²/(1 − φ²). Thus, cov(xt, x_{t−h}) = φ^h var(x_{t−h}) for h ≥ 1.
(d) Generate more than n observations, for example, generate n + 50 observations and discard the first 50.
(e) Use induction: var(x2) = var(φx1 + w2) = φ² σw²/(1 − φ²) + σw² = σw²/(1 − φ²), and the same computation applies at each step, so var(xt) = σw²/(1 − φ²) for all t. By part (b), cov(xt, x_{t−h}) = φ^h var(x_{t−h}) = φ^h σw²/(1 − φ²).
3.3 (a) Write this as (1 − .3B)(1 − .5B)xt = (1 − .3B)wt and reduce to (1 − .5B)xt = wt. Hence the process is a causal and invertible AR(1): xt = .5x_{t−1} + wt.
(b) The AR polynomial is 1 − z + .5z², which has complex roots 1 ± i outside the unit circle (note |1 ± i|² = 2). The MA polynomial is 1 − z, which has a unit root. Thus the process is a causal but not invertible ARMA(2, 1).
3.4 Let z1 and z2 be the roots of φ(z), that is, φ(z) = (1 − z1⁻¹z)(1 − z2⁻¹z). The causal condition is |z1| > 1, |z2| > 1. Let u1 = z1⁻¹ and u2 = z2⁻¹ so that φ(z) = (1 − u1 z)(1 − u2 z) with causal condition |u1| < 1, |u2| < 1. We must show |u1| < 1, |u2| < 1 if and only if the three given inequalities hold. In terms of u1 and u2, the inequalities are:
(i) φ2 + φ1 − 1 = −(1 − u1)(1 − u2) < 0   (note φ1 = u1 + u2 and φ2 = −u1 u2)
3.5 Refer to Example 3.8. The roots of φ(z) = 1 + .9z² are ±i/√.9. Because the roots are purely imaginary, θ = arg(i/√.9) = π/2 and consequently ρ(h) = a (√.9)^h cos(πh/2 + b); that is, ρ(h) makes one cycle every 4 values of h. Because ρ(0) = 1 and ρ(1) = φ1/(1 − φ2) = 0, it follows that a = 1 and b = 0, in which case ρ(h) = (√.9)^h cos(πh/2). Thus ρ(h) = {1, 0, −.9, 0, .9², …} for h = 0, 1, 2, 3, 4, ….
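The pattern can be confirmed with the theoretical ACF routine in base R (a quick check, no assumptions beyond the model xt = −.9x_{t−2} + wt):

```r
# Theoretical ACF of x_t = -.9 x_{t-2} + w_t; one cycle every 4 lags
rho = ARMAacf(ar = c(0, -.9), lag.max = 4)
rho   # lags 0-4: 1, 0, -0.9, 0, 0.81
```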
3.6 From (3.30)–(3.31) we have
distinct roots: ψj = c1 z1^{−j} + c2 z2^{−j};   equal roots: ψj = z0^{−j}(c1 + c2 j),
and similarly for the ACF,
distinct roots: ρ(h) = c1 z1^{−h} + c2 z2^{−h};   equal roots: ρ(h) = z0^{−h}(c1 + c2 h).
For the ACF we also have ρ(0) = 1 and ρ(1) = φ1/(1 − φ2).
(a) φ(z) = 1 + 1.6z + .64z² = (1 + .8z)². This is the equal-roots case with z0 = −1/.8. Thus ψj = (−.8)^j (a + bj) and ρ(h) = (−.8)^h (c + dh). To solve for a and b, note for j = 0 we have ψ0 = 1 = a, and for j = 1 we have ψ1 = φ1 = −1.6 = (−.8)¹(1 + b), or b = 1. Finally, ψj = (−.8)^j (1 + j) for j = 0, 1, 2, …. To solve for c and d, note for h = 0 we have ρ(0) = 1 = c, and for h = 1 we have ρ(1) = −1.6/(1 + .64) = (−.8)¹(1 + d), or d ≈ .22. Finally, ρ(h) = (−.8)^h (1 + .22h) for h = 0, 1, 2, ….
(b) φ(z) = 1 − .4z − .45z² = (1 − .9z)(1 + .5z). This is the distinct-roots case with inverse roots 0.9 and −0.5. Thus ψj = a(0.9)^j + b(−0.5)^j, where a and b are found by solving 1 = a + b and ψ1 = .4 = a(0.9) + b(−0.5). For the ACF, ρ(h) = c(0.9)^h + d(−0.5)^h, where c and d are found by solving 1 = c + d and ρ(1) = .4/(1 − .45) = c(0.9) + d(−0.5).
(c) φ(z) = 1 − 1.2z + .85z². This is the complex-roots case, with roots .706 ± .824i. Refer to Example 2.8: θ = arg(.706 + .824i) = .862 radians. Thus ψj = a |.706 + .824i|^{−j} cos(.862j + b) = a (1.08)^{−j} cos(.862j + b), where a and b satisfy 1 = a cos(b) and 1.2 = a(1.08)^{−1} cos(.862 + b). For the ACF, ρ(h) = c (1.08)^{−h} cos(.862h + d), where c and d are found by solving 1 = c cos(d) and 1.2/(1 + .85) = c(1.08)^{−1} cos(.862 + d).
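As a check on part (a), ARMAtoMA gives the ψ-weights of the AR(2) with φ1 = −1.6, φ2 = −.64, and they match the equal-roots solution ψj = (−.8)^j (1 + j):

```r
# Verify psi_j = (-.8)^j (1 + j) for phi(z) = (1 + .8z)^2,
# i.e., the AR(2) x_t = -1.6 x_{t-1} - .64 x_{t-2} + w_t
psi = ARMAtoMA(ar = c(-1.6, -.64), ma = 0, lag.max = 6)
j = 1:6
max(abs(psi - (-.8)^j * (1 + j)))   # essentially zero
```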
3.7 The ACF distinguishes the MA(1) case but not the ARMA(1,1) or AR(1) cases, which look similar to
each other (see Figure 2).
3.10 The truncated predictor is x̃ⁿ_{n+1} = −Σ_{j=1}^{n} (−θ)^j x_{n+1−j}. Thus
MSE = E(x_{n+1} − x̃ⁿ_{n+1})² = E[ x_{n+1} + Σ_{j=1}^{n} (−θ)^j x_{n+1−j} ]² = E[ w_{n+1} − (−θ)^{n+1} w0 ]² = σw² [ 1 + θ^{2(n+1)} ].
There can be a substantial difference between the two MSEs for small values of n, but for large n the difference is negligible.
3.11 The proof is by contradiction. Assume there is a Γn that is singular. Because γ(0) > 0, Γ1 = {γ(0)} is nonsingular. Thus, there is an r ≥ 1 such that Γr is nonsingular. Consider the ordered sequence Γ1, Γ2, … and suppose Γ_{r+1} is the first singular Γn in the sequence. Then x_{r+1} is a linear combination of x = (x1, …, xr)′, say, x_{r+1} = b′x where b = (b1, …, br)′. Because of stationarity, it must also be true that x_{r+h+1} = b′x_h, where x_h = (x_{1+h}, …, x_{r+h})′, for all h ≥ 1. This means that for any n ≥ r + 1, xn is a linear combination of x1, …, xr, i.e., xn = bn′x where bn = (b_{n1}, …, b_{nr})′. Thus, γ(0) = var(xn) = bn′ Γr bn = bn′ Q′ΛQ bn, where Q is orthogonal (QQ′ is the identity matrix) and Λ = diag{λ1, …, λr} is the diagonal matrix of the positive eigenvalues (0 < λ1 ≤ ⋯ ≤ λr) of Γr. From this result we conclude
γ(0) ≥ λ1 bn′ Q′Q bn = λ1 Σ_{j=1}^{r} b²_{nj};
this shows that for each j, b_{nj} is bounded in n. In addition, γ(0) = cov(xn, xn) = cov(xn, bn′x), from which it follows that
0 < γ(0) ≤ Σ_{j=1}^{r} |b_{nj}| |γ(n − j)|.
From this inequality it is seen that because the b_{nj} are bounded, it is not possible to have γ(0) > 0 and γ(h) → 0 as h → ∞.
3.12 First take the prediction equations (3.56) with n = h and divide both sides by γ(0) to obtain Rh φh = ρh. Partition the equation as in the hint, with φh = (φ_{h1}′, φ_{hh})′ [note ρ(0) = 1]:
[ R_{h−1}   ρ̃_{h−1} ] [ φ_{h1} ]   [ ρ_{h−1} ]
[ ρ̃_{h−1}′    1     ] [ φ_{hh} ] = [ ρ(h)   ],
and solve. We get
R_{h−1} φ_{h1} + ρ̃_{h−1} φ_{hh} = ρ_{h−1},   (1)
ρ̃_{h−1}′ φ_{h1} + φ_{hh} = ρ(h).            (2)
From (1),
φ_{h1} = R⁻¹_{h−1} ρ_{h−1} − R⁻¹_{h−1} ρ̃_{h−1} φ_{hh},
and substituting this into (2) gives
φ_{hh} = [ ρ(h) − ρ̃_{h−1}′ R⁻¹_{h−1} ρ_{h−1} ] / [ 1 − ρ̃_{h−1}′ R⁻¹_{h−1} ρ̃_{h−1} ].   (3)
It remains to show that
E(εt ε_{t−h}) / √( E(ε²t) E(ε²_{t−h}) )
can be written in the form of equation (3). To this end, let x = (x_{t−1}, …, x_{t−h+1})′. The regression of xt on x is (R⁻¹_{h−1} ρ_{h−1})′x; see equation (2.59). The regression of x_{t−h} on x is (R⁻¹_{h−1} ρ̃_{h−1})′x; see equation (2.85) and the comments that follow (2.85). Thus
εt = xt − ρ_{h−1}′ R⁻¹_{h−1} x   and   ε_{t−h} = x_{t−h} − ρ̃_{h−1}′ R⁻¹_{h−1} x.
From this we calculate (the calculations below are all similar to the verification of equation (2.60); also, note for vectors a and b, a′b = b′a)
E(εt ε_{t−h}) = cov(εt, ε_{t−h}) = γ(0) [ ρ(h) − ρ̃_{h−1}′ R⁻¹_{h−1} ρ_{h−1} ].
Similar calculations show that
E(ε²_{t−h}) = var(ε_{t−h}) = γ(0) [ 1 − ρ̃_{h−1}′ R⁻¹_{h−1} ρ̃_{h−1} ].
Also note that the error of the regression of xt on x is the same as the error of the regression of xt on x̃, where x̃ = (x_{t−h+1}, …, x_{t−1})′; that is, εt = xt − (R⁻¹_{h−1} ρ̃_{h−1})′x̃. From this we conclude that
E(ε²t) = var(εt) = γ(0) [ 1 − ρ_{h−1}′ R⁻¹_{h−1} ρ_{h−1} ] = γ(0) [ 1 − ρ̃_{h−1}′ R⁻¹_{h−1} ρ̃_{h−1} ].
This proves the result upon factoring out γ(0) in the numerator and denominator.
3.13 (a) We want to find g(x) to minimize E[y − g(x)]². Write this as E[ E{(y − g(x))² | x} ] and minimize the inner expectation: ∂E{(y − g(x))² | x}/∂g(x) = −2[E(y|x) − g(x)] = 0, from which we conclude g(x) = E(y|x) is the required minimum.
(b) g(x) = E(y|x) = E(x² + z | x) = x² + E(z) = x². MSE = E(y − g(x))² = E(y − x²)² = E(z²) = var(z) = 1.
(c) Let g(x) = a + bx. Using the prediction equations, g(x) satisfies
(i) E[y − g(x)] = 0   (ii) E[(y − g(x))x] = 0,
or
(i) E[y] = E[a + bx]   (ii) E(xy) = E[(a + bx)x].
From (i) we have a + bE(x) = E(y), but E(x) = 0 and E(y) = 1, so a = 1. From (ii) we have aE(x) + bE(x²) = E(xy), or b = E[x(x² + z)] = E(x³) + E(xz) = 0 + 0 = 0. Finally, g(x) = a + bx = 1 and MSE = E(y − 1)² = E(y²) − 1 = E(x⁴) + E(z²) − 1 = 3 + 1 − 1 = 3.
Conclusion: In this case, the best linear predictor has three times the error of the optimal predictor (conditional expectation).
3.14 For an AR(1), equation (3.77) is exact; that is, E(x_{t+m} − x^t_{t+m})² = σw² Σ_{j=0}^{m−1} ψj². For an AR(1), ψj = φ^j, and thus σw² Σ_{j=0}^{m−1} φ^{2j} = σw² (1 − φ^{2m})/(1 − φ²), the desired expression.
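The geometric-sum identity behind this expression is easy to verify numerically (a sketch with arbitrary illustrative values φ = .9, m = 5):

```r
# Check sum_{j=0}^{m-1} phi^(2j) = (1 - phi^(2m)) / (1 - phi^2)
phi = .9; m = 5          # illustrative values
lhs = sum(phi^(2 * (0:(m-1))))
rhs = (1 - phi^(2*m)) / (1 - phi^2)
all.equal(lhs, rhs)      # TRUE
```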
3.15 From Example 3.6, xt = 1.4 Σ_{j=1}^{∞} (−.5)^{j−1} x_{t−j} + wt, so the truncated one-step-ahead prediction using (3.81) is
x̃ⁿ_{n+1} = 1.4 Σ_{j=1}^{n} (−.5)^{j−1} x_{n+1−j}.
From Equation (3.82),
x̃ⁿ_{n+1} = .9xn + .5w̃ⁿn = .9xn + .5(xn − .9x_{n−1} − .5w̃ⁿ_{n−1})
 = 1.4xn − .9(.5)x_{n−1} − .5² w̃ⁿ_{n−1} = 1.4xn − .9(.5)x_{n−1} − .5²(x_{n−1} − .9x_{n−2} − .5w̃ⁿ_{n−2})
 = 1.4xn − 1.4(.5)x_{n−1} + .9(.5²)x_{n−2} + .5³ w̃ⁿ_{n−2}
 = 1.4xn − 1.4(.5)x_{n−1} + 1.4(.5²)x_{n−2} − .9(.5³)x_{n−3} − .5⁴ w̃ⁿ_{n−3}
 ⋮
 = 1.4 Σ_{j=1}^{n} (−.5)^{j−1} x_{n+1−j}.
3.16
E[(x_{n+m} − xⁿ_{n+m})(x_{n+m+k} − xⁿ_{n+m+k})] = E[ ( Σ_{j=0}^{m−1} ψj w_{n+m−j} )( Σ_{j=0}^{m+k−1} ψj w_{n+m+k−j} ) ] = σw² Σ_{j=0}^{m−1} ψj ψ_{j+k}.
3.17 (a)–(b) Below, reg1 is least squares and reg2 is Yule–Walker. The standard errors for each case are also evaluated; the Yule–Walker run uses Proposition P3.9. The two methods produce similar results.
(a)
> reg1=ar.ols(mort, order=2)
> reg2=ar.yw(mort, order=2)
> reg1
Coefficients:
1
2
0.4308 0.4410
Order selected 2 sigma^2 estimated as
> reg2
Coefficients:
1
2
0.4328 0.4395
Order selected 2 sigma^2 estimated as
32.39
32.62
(b)
> reg1$asy.se.coef
$ar
[1] 0.03996103 0.03994833
> reg2$asy.se.coef
> sqrt(diag(reg2$asy.var.coef))
[1] 0.04005162 0.04005162
3.18 (a) For an AR(1) we have xⁿ1 = x1, xⁿ0 = φx1, xⁿ_{−1} = φxⁿ0 = φ²x1, and in general, xⁿt = φ^{1−t} x1 for t = 1, 0, −1, −2, ….
(b) w̃t(φ) def= xⁿt − φxⁿ_{t−1} = φ^{1−t}x1 − φ·φ^{2−t}x1 = φ^{1−t}(1 − φ²)x1, for t ≤ 1.
(c) Σ_{t=−∞}^{1} w̃t²(φ) = (1 − φ²)² x1² Σ_{t=−∞}^{1} φ^{2(1−t)} = (1 − φ²)² x1² Σ_{s=0}^{∞} φ^{2s} = (1 − φ²) x1².
(d) From (3.96), S(φ) = (1 − φ²)x1² + Σ_{t=2}^{n} (xt − φx_{t−1})² = Σ_{t=−∞}^{1} w̃t²(φ) + Σ_{t=2}^{n} (xt − φx_{t−1})² = Σ_{t=−∞}^{n} w̃t²(φ), using (c) and the fact that w̃t(φ) = xt − φx_{t−1} for 2 ≤ t ≤ n.
(e) For t = 2, …, n, x^{t−1}_t = φx_{t−1} and xt − x^{t−1}_t = xt − φx_{t−1}. For t = 1, x⁰1 = E(x1) = 0, so x1 − x⁰1 = x1. Also, for t = 2, …, n, P^{t−1}_t = E(xt − φx_{t−1})² = E(wt²) = σw², so r^{t−1}_t = 1. For t = 1, P⁰1 = E(x1²) = σw²/(1 − φ²), so r⁰1 = 1/(1 − φ²), and we may write S(φ) in the desired form.
3.19 The simulations can easily be done in R. Although the results will vary, the data should behave like
observations from a white noise process.
x̄ ~ AN(μ, n⁻¹V), where V = σw² ( Σ_{j=0}^{∞} ψj )² = σw² [ 1 + (φ + θ) Σ_{j=1}^{∞} φ^{j−1} ]². Equivalently, x̄ ~ AN( μ/(1 − φ), n⁻¹V ).
3.24 (a) E(xt) = 0 and
γx(h) = E[(st + a s_{t−δ})(s_{t+h} + a s_{t+h−δ})] = (1 + a²)σs²  h = 0,
        aσs²   |h| = δ,
        0      otherwise,
so the process is stationary.
Also,
xt − a x_{t−δ} + a² x_{t−2δ} − ⋯ + (−1)^k a^k x_{t−kδ} = st + (−1)^k a^{k+1} s_{t−(k+1)δ},
and letting k → ∞ shows
st = Σ_{j=0}^{∞} (−a)^j x_{t−jδ}
is the mean square convergent representation of st. Note: If δ is known, the process is an invertible MA(δ) process with θ1 = ⋯ = θ_{δ−1} = 0 and θδ = a.
(b) The Gauss–Newton procedure is similar to the MA(1) case in Example 3.30. Write st(a) = xt − a s_{t−δ}(a) for t = 1, …, n. Then zt(a) = −∂st(a)/∂a = s_{t−δ}(a) + a ∂s_{t−δ}(a)/∂a = s_{t−δ}(a) − a z_{t−δ}(a). The iterative procedure is
a_{(j+1)} = a_{(j)} + Σ_{t=1}^{n} zt(a_{(j)}) st(a_{(j)}) / Σ_{t=1}^{n} zt²(a_{(j)}),   j = 0, 1, 2, …,
where zt(·) = 0 and st(·) = 0 for t ≤ 0.
(c) If δ is unknown, the ACF of xt can be used to find a preliminary estimate of δ. Then, a Gauss–Newton procedure can be used to minimize the error sum of squares, say Sc(a, δ), over a grid of δ values near the preliminary estimate. The values â and δ̂ that minimize Sc(a, δ) are the required estimates.
3.25 (a) By Property P3.9, φ̂ ~ AN[φ, n⁻¹(1 − φ²)], so that φ̂ − φ = Op(n^{−1/2}).
(b) xⁿ_{n+1} = φxn, whereas x̃ⁿ_{n+1} = φ̂xn. Thus xⁿ_{n+1} − x̃ⁿ_{n+1} = (φ − φ̂)xn. Using Tchebycheff's inequality, it is easy to show xn = Op(1). Thus, by the properties of Op(·),
xⁿ_{n+1} − x̃ⁿ_{n+1} = (φ − φ̂)xn = Op(n^{−1/2}) Op(1) = Op(n^{−1/2}).
3.26 Write ∇^k xt = (1 − B)^k xt = Σ_{j=0}^{k} (k choose j) (−1)^j x_{t−j}.
3.27 Write wt = Σ_{j=0}^{∞} λ^j (x_{t−j} − x_{t−1−j}). Rearranging, wt = xt − (1 − λ)x_{t−1} − λ(1 − λ)x_{t−2} − ⋯, or
xt = Σ_{j=1}^{∞} λ^{j−1}(1 − λ) x_{t−j} + wt.
3.28 See Figure 3. The EWMAs are smoother than the data (note the EWMAs are within the extremes of the data). The EWMAs are not extremely different for the different values of λ, the smoothest EWMA being the one with λ = .75.
x = scan("/mydata/varve.dat")
x=log(x[1:100])
plot(x)
a=matrix(c(.25,.5,.75),3,1)
xs=x
for (i in 1:3){for (n in 1:99){
xs[n+1]=(1a[i])*x[n] + a[i]*xs[n]}
lines(xs, lty=2, col=i+1, lwd=2)}
Figure 3: Logged varve series (first 100 values) with EWMA smoothers for λ = .25, .5, .75.
3.30 Notice the high volatility near the middle and the end of the series. No ARIMA model will be able to capture this, and we shouldn't expect to obtain a good fit. Given the nature of the data, we suggest working with the returns; that is, if xt is the data, one should look at yt = ∇ln(xt). The ACF and PACF of yt suggest an AR(3); that is, the ACF is tailing off whereas the PACF cuts off after lag 3. Fitting an ARIMA(3,1,0) to ln(xt) yields a reasonable fit. The residuals appear to be uncorrelated, but they are not normal (given the large number of outliers). Below is the R code for this problem.
x = scan("/mydata/gas.dat")
dlx=diff(log(x))
acf(dlx)
pacf(dlx)
fit=arima(log(x), order = c(3, 1, 0))
tsdiag(fit, gof.lag=20)
qqnorm(fit$resid)
shapiro.test(fit$resid)
3.31 An ARIMA(1,1,1) seems to t the data. Below is R code for the problem:
x = read.table("/mydata/globtemp2.dat")
gtemp=ts(x[,2], start=1880)
plot(gtemp)
par(mfrow=c(2,1))
acf(diff(gtemp), 30)
pacf(diff(gtemp), 30)
fit=arima(gtemp, order=c(1,1,1))
          ar1      ma1
       0.2545  -0.7742
s.e.   0.1141   0.0651
sigma^2 estimated as 0.01728:  log likelihood = 75.39,  aic = -144.77
tsdiag(fit, gof.lag=20)   # ok
predict(fit, n.ahead=15)  # !!!! NOTE BELOW
R doesn't do the forecasting correctly; I think it is ignoring the fact that d = 1. In any case, the forecasts should look more like this:
Period  Forecast   95 Percent Limits
                   Lower     Upper
126     0.576718   0.319895  0.833541
127     0.574925   0.293668  0.856183
128     0.578924   0.287486  0.870361
129     0.584482   0.285658  0.883306
130     0.590461   0.285017  0.895906
131     0.596554   0.284780  0.908327
132     0.602676   0.284740  0.920613
133     0.608808   0.284835  0.932780
134     0.614941   0.285046  0.944835
135     0.621075   0.285363  0.956787
3.32 There is trend, so we consider the (first) differenced series, which looks stationary. Investigation of the ACF and PACF of the differenced data suggests an ARMA(0,1) or ARMA(1,1) model. Fitting an ARIMA(0,1,1) and an ARIMA(1,1,1) indicates the ARIMA(0,1,1) model; the AR parameter is not significant in the ARIMA(1,1,1) fit. The residuals appear to be (borderline) white, but not normal.
x = scan("/mydata/so2.dat")
dx=diff(x)
acf(dx)
pacf(dx)
fit=arima(x, order = c(0,1,1))
fit
tsdiag(fit, gof.lag=20)
qqnorm(fit$resid)
shapiro.test(fit$resid)
3.33 (a) The model is ARIMA(0, 0, 2) × (0, 0, 0)_s (s can be anything) or ARIMA(0, 0, 0) × (0, 0, 1)_2.
Inverting the model gives w_t = Σ_{k≥0} (−θ)^k x_{t−2k}, so the truncated forecasts satisfy

x̃_{n+m} = −Σ_{k≥1} (−θ)^k x̃_{n+m−2k},

where x̃_t = x_t for t ≤ n. For the prediction error, note that ψ_0 = 1, ψ_2 = θ and ψ_j = 0 otherwise.
Using (3.78), P_{n+m}^n = σ_w² for m = 1, 2; when m > 2 we have P_{n+m}^n = σ_w²(1 + θ²).
See Figure 4.
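The stated prediction-error pattern (σ_w² for the first two leads, σ_w²(1 + θ²) after that) can be checked numerically from the ψ-weights; θ = .5 below is an illustrative value only:

```r
# psi-weights of an MA model with a single coefficient at lag 2 (illustrative theta)
theta <- .5; sw2 <- 1
psi <- c(1, ARMAtoMA(ma = c(0, theta), lag.max = 4))  # psi_0, psi_1, ..., psi_4
P <- sw2 * cumsum(psi^2)   # P^n_{n+m} for m = 1, ..., 5
P   # sw2, sw2, then sw2*(1 + theta^2) from m = 3 onward
```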
3.34 Use the code from Example 3.41 with ma=-.5 instead of ma=.5.
3.35 After plotting the unemployment data, say x_t, it is clear that one should fit an ARMA model to
y_t = ∇∇₁₂ x_t. The ACF and PACF of y_t indicate a clear SMA(1) pattern (the seasonal lags in the
ACF cut off after lag 12, whereas the seasonal lags in the PACF tail off at lags 12, 24, 36, and so on).
Next, fit an SARIMA(0, 1, 0) × (0, 1, 1)₁₂ to x_t and look at the ACF and PACF of the residuals. The
within-season part of the ACF tails off, and the PACF is either cutting off at lag 2 or is tailing off.
These facts suggest an AR(2) or an ARMA(1,1) for the within-season part of the model. Hence, fit
(i) an SARIMA(2, 1, 0) × (0, 1, 1)₁₂ or (ii) an SARIMA(1, 1, 1) × (0, 1, 1)₁₂ to x_t. Both models have
the same number of parameters, so it should be clear that model (i) is better, because the MSE is
smaller for model (i) and the residuals appear to be white (while there may still be some correlation left
in the residuals for model (ii)). Below is the R code for fitting model (i), along with diagnostics and
forecasting.
x = scan("/mydata/unemp.dat")
par(mfrow=c(2,1))
# (P)ACF of d1d12 data
acf(diff(diff(x),12), 48)
pacf(diff(diff(x),12), 48)
fiti = arima(x, order=c(2,1,0), seasonal=list(order=c(0,1,1), period=12))
fiti
# to view the results
tsdiag(fiti, gof.lag=48) # diagnostics
x.pr = predict(fiti, n.ahead=12) # forecasts
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
month=337:372
plot(month, x[month], type="o", xlim=c(337,384), ylim=c(360,810))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=372.5,lty="dotted")
3.36 The monthly (s = 12) U.S. Live Birth Series can be found in birth.dat. After plotting the data, say
x_t, it is clear that one should fit an ARMA model to y_t = ∇∇₁₂ x_t. The ACF and PACF of y_t indicate
a seasonal MA of order one; that is, fit an ARIMA(0, 0, 0) × (0, 0, 1)₁₂ to y_t. Looking at the ACF and
PACF of the residuals of that fit suggests fitting a nonseasonal ARMA(1,1) component (both the ACF
and PACF appear to be tailing off). After that, the residuals appear to be white. Finally, we settle
on fitting an ARIMA(1, 1, 1) × (0, 1, 1)₁₂ model to the original data, x_t. The code for this problem is
nearly the same as the previous problem.
x=scan("/mydata/birth.dat")
par(mfrow=c(2,1))
# (P)ACF of d1d12 data
acf(diff(diff(x),12), 48)
pacf(diff(diff(x),12), 48)
### fit model (i)
fit = arima(x, order=c(1,1,1), seasonal=list(order=c(0,1,1), period=12))
fit
# to view the results
tsdiag(fit, gof.lag=48) # diagnostics
x.pr = predict(fit, n.ahead=12) # forecasts
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
month=337:372
plot(month, x[month], type="o", xlim=c(337,384), ylim=c(240,340))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=372.5,lty="dotted")
3.37 Because of the increasing variability, the data, jj_t, should be logged prior to any further analysis.
A plot of the logged data, say y_t = ln jj_t, shows trend, and one should notice the differences in the
behavior of the series at the beginning, middle, and end of the data (as if there are 3 different regimes).
Because of these inconsistencies (nonstationarities), it is difficult to discover an ARMA model, and one
should expect students to come up with various models. In fact, assigning this problem may decrease
your student evaluations substantially.
Next, apply a first difference and a seasonal difference to the logged data: x_t = ∇∇₄ y_t. The PACF of x_t
reveals a large correlation at the seasonal lag 4, so an SAR(1) seems appropriate. The ACF and PACF
of the residuals reveal an ARMA(1,1) correlation structure within the seasons. This seems to
be a reasonable fit. Hence, a reasonable model is an SARIMA(1, 1, 0) × (1, 1, 0)₄ on the logged data.
Below is R code for this problem.
jj=scan("/mydata/jj.dat")
x=diff(diff(log(jj)),4)
par(mfrow=c(2,1))
acf(x, 24)
pacf(x, 24)
fit1 = arima(log(jj),order=c(0,1,0),seasonal=list(order=c(1,1,0), period=4))
par(mfrow=c(2,1))
acf(fit1$resid, 24)
pacf(fit1$resid, 24)
fit2 = arima(log(jj),order=c(1,1,0),seasonal=list(order=c(1,1,0), period=4))
par(mfrow=c(2,1))
acf(fit2$resid, 24)
pacf(fit2$resid, 24)
tsdiag(fit2, gof.lag=24)
### forecasts for the final model
x.pr = predict(fit2, n.ahead=4)
U = x.pr$pred + 2*x.pr$se
L = x.pr$pred - 2*x.pr$se
quarter=1:88
plot(quarter, log(jj[quarter]), type="o", ylim=c(1,4))
lines(x.pr$pred, col="red", type="o")
lines(U, col="blue", lty="dashed")
lines(L, col="blue", lty="dashed")
abline(v=84.5,lty="dotted")
3.38 Clearly Σ_{j=1}^p φ_j x_{n+1−j} ∈ sp{x_k ; k ≤ n}, so it suffices to show that x̂_{n+1} satisfies the prediction
equations E[(x_{n+1} − x̂_{n+1}) x_k] = 0 for k ≤ n. But, by the model assumption, x_{n+1} − x̂_{n+1} = w_{n+1}, and
E(w_{n+1} x_k) = 0 for all k ≤ n.
3.39 First note that x_i − x_i^{i−1} and x_j − x_j^{j−1}, for j > i = 1, ..., n, are uncorrelated. This is because x_i − x_i^{i−1}
∈ sp{x_k ; k = 1, ..., i}, but x_j − x_j^{j−1} is orthogonal (uncorrelated) to sp{x_k ; k = 1, ..., j − 1} ⊇ sp{x_k ; k = 1, ..., i}
by definition of x_j^{j−1}. Thus, by the projection theorem, for t = 1, 2, ...,

x_{t+1}^t = Σ_{k=1}^t θ_{tk} (x_{t+1−k} − x_{t+1−k}^{t−k}),   (1)

where the θ_{tk} are obtained from the prediction equations. Multiply both sides of (1) by x_{j+1} − x_{j+1}^j, for
j = 0, ..., t − 1, and take expectations to obtain

E[x_{t+1}^t (x_{j+1} − x_{j+1}^j)] = θ_{t,t−j} P_{j+1}^j.

Because of the orthogonality, E[(x_{t+1} − x_{t+1}^t)(x_{j+1} − x_{j+1}^j)] = 0 when j < t, so the equation above can be
written as

E[x_{t+1} (x_{j+1} − x_{j+1}^j)] = θ_{t,t−j} P_{j+1}^j.   (2)

Consequently,

θ_{t,t−j} = E[x_{t+1} (x_{j+1} − Σ_{k=1}^j θ_{jk} (x_{j+1−k} − x_{j+1−k}^{j−k}))] (P_{j+1}^j)^{−1},

so that

θ_{t,t−j} = [γ(t − j) − Σ_{k=1}^j θ_{jk} E(x_{t+1}(x_{j+1−k} − x_{j+1−k}^{j−k}))] (P_{j+1}^j)^{−1}.   (3)

Using (2) we can write E[x_{t+1}(x_{k+1} − x_{k+1}^k)] = θ_{t,t−k} P_{k+1}^k, so (3) can be written in the form of
(2.71). To show (2.70), first note that E(x_{t+1} x_{t+1}^t) = E[x_{t+1} E(x_{t+1} | x_1, ..., x_t)] = E[(x_{t+1}^t)²]. Then,
for t = 1, 2, ...,

P_{t+1}^t = E(x_{t+1} − x_{t+1}^t)² = γ(0) − E[(x_{t+1}^t)²] = γ(0) − Σ_{j=0}^{t−1} θ²_{t,t−j} P_{j+1}^j.
3.40 [The solution to this problem is a displayed banded-matrix calculation, involving the factors
(n + 1), (n + 2), and σ_w², that did not survive conversion.]
3.41 Write x = Cε, where ε = (x₁ − x₁⁰, x₂ − x₂¹, ..., x_n − x_n^{n−1})′ is the vector of innovations and

C = [ 1                                      ]
    [ θ₁₁            1                       ]
    [ θ₂₂            θ₂₁           1         ]
    [  ⋮                               ⋱     ]
    [ θ_{n−1,n−1}    θ_{n−1,n−2}    · · ·  1 ]

is lower triangular with ones on the diagonal. Thus, L(x) = L(Cε). Noting that ε ∼ N(0, D), where
D = diag{P₁⁰, P₂¹, ..., P_n^{n−1}}, we have

L(x) = L(Cε) = (2π)^{−n/2} |CDC′|^{−1/2} exp{−½ x′(CDC′)^{−1} x}.

This establishes the result, noting that |CDC′| = |C|²|D| with |C|² = 1 and |D| = P₁⁰ P₂¹ · · · P_n^{n−1}, and
that in the exponent, x′(CDC′)^{−1}x = ε′D^{−1}ε = Σ_{t=1}^n (x_t − x_t^{t−1})²/P_t^{t−1}.
3.42 These results are proven in Brockwell and Davis (1991, Proposition 2.3.2).
3.43 The proof of Property P2.2 is virtually identical to the proof of Property P2.1 given in Appendix B.
Chapter 4
4.1 (a)–(b) The code is basically the same as the example and is given below. The difference is that the
frequencies in the data (which are .06, .1, .4) are no longer fundamental frequencies (which are of the
form k/128). Consequently, the periodogram will have nonzero entries near .06, .1, .4 (unlike the
example, where all other frequencies are zero).
t = 1:128
x1 = 2*cos(2*pi*t*6/100) + 3*sin(2*pi*t*6/100)
x2 = 4*cos(2*pi*t*10/100) + 5*sin(2*pi*t*10/100)
x3 = 6*cos(2*pi*t*40/100) + 7*sin(2*pi*t*40/100)
x = x1 + x2 + x3
par(mfrow=c(2,2))
plot.ts(x1, ylim=c(16,16), main="freq=6/100, amp^2=13")
plot.ts(x2, ylim=c(16,16), main="freq=10/100, amp^2=41")
plot.ts(x3, ylim=c(16,16), main="freq=40/100, amp^2=85")
plot.ts(x, ylim=c(16,16), main="sum")
P = abs(2*fft(x)/128)^2
f = 0:64/128
plot(f, P[1:65], type="o", xlab="frequency", ylab="periodogram")
(c) Use the same code as in the example, but with x = x1 + x2 + x3 + rnorm(128,0,5). Now the
periodogram will have large peaks at .06, .1, .4, but will also be positive at most other fundamental
frequencies.
4.2 (a) Rewrite the transformation as

x = tan⁻¹(z₂/z₁),   y = z₁² + z₂².

Note that ∂ tan⁻¹(u)/∂x = [1/(1 + u²)] ∂u/∂x. Write the joint density of x and y as

g(x, y) = f(z₁, z₂)|J|,

where J denotes the Jacobian, i.e., the determinant of the 2 × 2 matrix {∂(z₁, z₂)/∂(x, y)}. It is easier to
compute

1/J = det{∂(x, y)/∂(z₁, z₂)} = det [ −z₂/(z₁² + z₂²)   z₁/(z₁² + z₂²) ; 2z₁   2z₂ ] = −2,

so that |J| = 1/2. Hence

g(x, y) = f(z₁, z₂)|J| = (1/2)(1/2π) exp{−½(z₁² + z₂²)} = (1/4π) exp{−y/2}.
4.3 This is similar to Problem 1.9. Write the terms in the sum (4.4) as x_{t,k} and note that x_{t,k} and x_{t,ℓ} are
uncorrelated for k ≠ ℓ. Then

γ_k(h) = E(x_{t+h,k} x_{t,k})
= E{(U_{1k} sin[2πω_k(t + h)] + U_{2k} cos[2πω_k(t + h)])(U_{1k} sin[2πω_k t] + U_{2k} cos[2πω_k t])}
= σ_k² {sin[2πω_k(t + h)] sin[2πω_k t] + cos[2πω_k(t + h)] cos[2πω_k t]}
= σ_k² cos[2πω_k(t + h) − 2πω_k t] = σ_k² cos[2πω_k h],

and γ(h) = Σ_{k=1}^q γ_k(h) gives (4.5).
4.4 (a) E(w_t) = E(x_t) = 0 by linearity; γ_w(0) = 1 and is zero otherwise; γ_x(0) = (1 + θ₁²), γ_x(±1) = θ₁, and
is zero otherwise. The series are stationary because they are zero mean and the autocovariance
does not depend on time but only on the shift.
(b) By (4.13),

γ_x(h) = ∫_{−1/2}^{1/2} f_x(ω) e^{2πiωh} dω,

which exhibits the form of the spectrum by the uniqueness of the Fourier transform. For the
autoregression, equating the spectra of the left and right sides of the defining equation leads to

[1 + φ² − 2φ cos(2πω)] f_x(ω) = σ_w²;

alternatively, summing directly with γ_x(h) = σ_w² φ^{|h|}/(1 − φ²),

f_x(ω) = [σ_w²/(1 − φ²)] [ Σ_{h=0}^∞ (φe^{−2πiω})^h + Σ_{h=1}^∞ (φe^{2πiω})^h ]
= [σ_w²/(1 − φ²)] [ 1/(1 − φe^{−2πiω}) + φe^{2πiω}/(1 − φe^{2πiω}) ]
= [σ_w²/(1 − φ²)] (1 − φ²)/|1 − φe^{−2πiω}|²
= σ_w² / [1 + φ² − 2φ cos(2πω)].
γ_x(h) = ∫_{−1/2}^{1/2} { [1 + A² + Ae^{−2πiωD} + Ae^{2πiωD}] f_s(ω) + f_n(ω) } e^{2πiωh} dω.

Substituting the exponential representation for cos(2πωD) and using the uniqueness gives the
required result.
(b) Note that the multiplier for the signal spectrum is periodic and will be zero for

cos(2πωD) = −(1 + A²)/(2A).

Determining the multiple solutions for ω in the above equation will yield equally spaced values of
ω, proportional to 1/D, where the spectrum should be zero.
4.7 The product series will have mean E(x_t y_t) = E(x_t)E(y_t) = 0 and autocovariance

γ_z(h) = E(x_{t+h} y_{t+h} x_t y_t) = E(x_{t+h} x_t) E(y_{t+h} y_t) = γ_x(h) γ_y(h).

Now, by (4.12) and (4.13),

f_z(ω) = Σ_{h=−∞}^∞ γ_x(h) γ_y(h) exp{−2πiωh}
= Σ_h γ_x(h) e^{−2πiωh} ∫_{−1/2}^{1/2} f_y(ν) e^{2πiνh} dν
= ∫_{−1/2}^{1/2} Σ_h γ_x(h) e^{−2πi(ω−ν)h} f_y(ν) dν
= ∫_{−1/2}^{1/2} f_x(ω − ν) f_y(ν) dν.
4.8 Below is R code that will plot the periodogram on the actual scale and then on a log scale (this produces
a generic confidence interval; see Example 4.9 on how to get precise limits). The two major peaks are
marked; they are 3 cycles/480 points = 3 cycles/240 years, or 80 years/cycle, and 22 cycles/480 points
= 22 cycles/240 years, or about 11 years/cycle.
sun = scan("/mydata/sunspots.dat")
par(mfrow=c(2,1))
sun.per = spec.pgram(sun, taper=0,log="no")
sun.per = spec.pgram(sun, taper=0)
abline(v=3/480, lty="dashed") # 80 year cycle
abline(v=22/480, lty="dashed") # 11 year cycle
4.9 This is like the previous problem; the main component is 1 cycle/16 rows, although there's not enough
data to get significance.
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
par(mfrow=c(2,1))
temp.per = spec.pgram(temp, taper=0,log="no")
temp.per = spec.pgram(temp, taper=0)
abline(v=2/32, lty="dashed")
salt.per = spec.pgram(salt, taper=0,log="no")
salt.per = spec.pgram(salt, taper=0)
abline(v=2/32, lty="dashed")
4.10 (a) Write the model in the notation of Chapter 2 as x_t = z_t′β + w_t, where z_t = (cos(2πω_k t), sin(2πω_k t))′
and β = (β₁, β₂)′. Then

Σ_{t=1}^n z_t z_t′ = [ Σ_t cos²(2πω_k t)   Σ_t cos(2πω_k t) sin(2πω_k t) ; Σ_t cos(2πω_k t) sin(2πω_k t)   Σ_t sin²(2πω_k t) ]
= [ n/2   0 ; 0   n/2 ]

from the orthogonality properties of the sines and cosines. For example,

Σ_{t=1}^n cos²(2πω_k t) = ¼ Σ_{t=1}^n (e^{2πiω_k t} + e^{−2πiω_k t})(e^{2πiω_k t} + e^{−2πiω_k t})
= ¼ Σ_{t=1}^n (e^{4πiω_k t} + 1 + 1 + e^{−4πiω_k t}) = n/2,

because, for example,

Σ_{t=1}^n e^{4πiω_k t} = e^{4πik/n} (1 − e^{4πik})/(1 − e^{4πik/n}) = 0.

Substituting,

β̂ = (2/n) ( Σ_{t=1}^n x_t cos(2πω_k t) ; Σ_{t=1}^n x_t sin(2πω_k t) ) = 2n^{−1/2} ( d_c(ω_k) ; d_s(ω_k) ).

(b) Now,

SSE = x′x − 2n^{−1/2} ( d_c(ω_k), d_s(ω_k) ) ( Σ_{t=1}^n x_t cos(2πω_k t) ; Σ_{t=1}^n x_t sin(2πω_k t) )
= x′x − 2[d_c²(ω_k) + d_s²(ω_k)] = x′x − 2I_x(ω_k).

(c) The reduced model is given by x_t = w_t, so that SSE₁ = x′x, from (b). For the F test we have q = 2,
q₁ = 0, so that

F_{2,n−2} = [(SSE₁ − SSE)/2] / [SSE/(n − 2)] = [(n − 2)/2] · 2I_x(ω_k)/(x′x − 2I_x(ω_k)),

which is monotone in I_x(ω_k).
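The orthogonality relations used in part (a) are easy to verify numerically at a fundamental frequency ω_k = k/n (k ≠ 0, n/2); the values of n and k below are arbitrary:

```r
# Orthogonality of sines and cosines at a fundamental frequency
n <- 128; k <- 6; t <- 1:n; wk <- k/n
c(sum(cos(2*pi*wk*t)^2),                 # n/2
  sum(sin(2*pi*wk*t)^2),                 # n/2
  sum(cos(2*pi*wk*t)*sin(2*pi*wk*t)))    # 0
```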
4.11 By applying the definition to Σ_s a_s x_{t−s}, we obtain

Σ_{s=1}^n a_s x_{t−s} = Σ_{s=1}^n a_s n^{−1/2} Σ_{k=0}^{n−1} d_x(ω_k) e^{2πiω_k(t−s)}
= Σ_{k=0}^{n−1} d_x(ω_k) [ n^{−1/2} Σ_{s=1}^n a_s e^{−2πiω_k s} ] e^{2πiω_k t}
= Σ_{k=0}^{n−1} d_A(ω_k) d_x(ω_k) e^{2πiω_k t}.
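This is the DFT convolution theorem; for a filter applied circularly it can be checked directly (the filter coefficients below are arbitrary):

```r
# Filtering in the frequency domain agrees with circular filtering in the time domain
set.seed(1); n <- 16
x <- rnorm(n); a <- c(.5, .25, .1)          # a_0, a_1, a_2 (arbitrary)
A <- fft(c(a, rep(0, n - length(a))))       # transfer function at the omega_k
y <- Re(fft(A * fft(x), inverse = TRUE))/n  # filter via the frequency domain
y2 <- sapply(1:n, function(t) sum(a * x[((t - 1 - (0:2)) %% n) + 1]))  # circular filter
max(abs(y - y2))  # agrees to rounding error
```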
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
par(mfrow=c(2,1))
temp.per = spec.pgram(temp, spans=5, log="no")
abline(v=2/32, lty="dashed")
salt.per = spectrum(salt, spans=5, log="no")
abline(v=2/32, lty="dashed")
4.14 R code and discussion below. Also, see Figure 1.
speech = scan("/mydata/speech.dat")
sp.per = spec.pgram(speech, taper=0) # plots the periodogram  which is periodic
x=sp.per$spec
# x is the periodogram
x=log(x)
# log periodogram
ts.plot(x)
# another plot as a time series.
x.sp=spectrum(x,span=5) # cepstral analysis, x is detrended by default in R
abline(v=.1035, lty="dashed")
cbind(x.sp$freq,x.sp$spec)
# this lists the quefrencies and cepstra
[52,] 0.101562500 32.7549412
[53,] 0.103515625 34.8354468 # peak is around here, so Delay is about .1035 seconds
[54,] 0.105468750 30.3669195
# which is about the same result as Example 1.24
[Figure 1: the log periodogram of the speech series plotted as a time series, and its smoothed
spectrum ("Series: x, Smoothed Periodogram", bandwidth = 0.00246), with the quefrency .1035 marked.]
4.15 Write the tapered transform as

Y(ω_k) = n^{−1/2} Σ_{t=1}^n h_t x_t e^{−2πiω_k t}.

Then

E|Y(ω_k)|² = n^{−1} Σ_{s=1}^n Σ_{t=1}^n h_s h_t γ(s − t) e^{−2πiω_k(s−t)}
= ∫_{−1/2}^{1/2} n^{−1} Σ_{s=1}^n Σ_{t=1}^n h_s h_t e^{−2πi(ω_k−ω)(s−t)} f_x(ω) dω
= ∫_{−1/2}^{1/2} |H_n(ω_k − ω)|² f_x(ω) dω,

where H_n(ω) = n^{−1/2} Σ_{t=1}^n h_t e^{−2πiωt}. It follows that

E[ L^{−1} Σ_ℓ |Y(ω_k + ℓ/n)|² ] = ∫_{−1/2}^{1/2} L^{−1} Σ_ℓ |H_n(ω_k + ℓ/n − ω)|² f_x(ω) dω
= ∫_{−1/2}^{1/2} W_n(ω_k − ω) f_x(ω) dω.
4.16 (a) Since the means are both zero and the ACFs and CCF,

γ_x(h) = 2 for h = 0, −1 for |h| = 1, and 0 for |h| ≥ 2;
γ_y(h) = 1/2 for h = 0, 1/4 for |h| = 1, and 0 for |h| ≥ 2;
γ_xy(h) = 0 for h = 0, −1/2 for h = 1, 1/2 for h = −1, and 0 for |h| ≥ 2,

do not depend on the time index, the series are jointly stationary.
(b)

f_x(ω) = |1 − e^{−2πiω}|² = 2(1 − cos(2πω))   and   f_y(ω) = ¼|1 + e^{−2πiω}|² = ½(1 + cos(2πω)).

As ω goes from 0 to ½, f_x(ω) increases, whereas f_y(ω) decreases. This means x_t has more high
frequency behavior and y_t has more low frequency behavior.
(c) Since 2L f̂_y(.10)/f_y(.10) ∼ χ²_{2L},

P{ a ≤ f̂_y(.10) ≤ b } = P{ 2La/f_y(.10) ≤ 2L f̂_y(.10)/f_y(.10) ≤ 2Lb/f_y(.10) },

so set

2La/f_y(.10) = χ²_{2L}(.95)   and   2Lb/f_y(.10) = χ²_{2L}(.05).

Setting L = 3, χ²₆(.95) = 1.635, χ²₆(.05) = 12.592, f_y(.10) = .9045 and solving for a and b yields
a = .25, b = 1.90.
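The arithmetic in part (c) can be reproduced with qchisq(); note that χ²₆(.95) and χ²₆(.05) are the lower .05 and .95 quantiles, respectively:

```r
# Interval limits for f_y(.10) with L = 3
L <- 3
fy <- .5 * (1 + cos(2*pi*.1))       # f_y(.10) = .9045
a <- qchisq(.05, 2*L) * fy/(2*L)    # about .25
b <- qchisq(.95, 2*L) * fy/(2*L)    # about 1.90
c(a, b)
```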
4.17 The analysis is similar to that of Example 4.16. The squared coherency is very large at periods ranging
from 16–32 points, or 272–544 feet (1 point = 17 feet). R code below:
x = scan("/mydata/salt.dat")
temp = x[1:64]
salt = x[65:128]
x = ts(cbind(temp,salt))
s = spec.pgram(x, kernel("daniell",2), taper=0)
s$df
# = 10
f = qf(.999, 2, s$df-2)
# = 18.49365
c = f/(18+f)
# = 0.5067635
plot(s, plot.type = "coh", ci.lty = 2)
abline(h = c)
cbind(s$freq, s$coh)
[,1]
[,2]
[1,] 0.015625 0.598399213
[2,] 0.031250 0.859492914 # period 1/.03 = 32
[3,] 0.046875 0.891469033
[4,] 0.062500 0.911331648 # period 1/.06 = 16
[5,] 0.078125 0.749974642
4.18 (a) γ_xy(h) = cov(x_{t+h}, y_t) = cov(w_{t+h}, φw_{t−D} + v_t) = φσ_w² δ_h^{−D}, where δ_h^{−D} = 1 when h = −D and zero
otherwise. Thus, f_xy(ω) = Σ_h γ_xy(h) exp(−2πiωh) = φσ_w² exp(2πiωD). Also, f_x(ω) = σ_w² and
f_y(ω) = σ_w²(1 + φ²), using Property P4.4 and the fact that w_t and v_t are independent. Finally,
ρ²_xy(ω) = |φσ_w² exp(2πiωD)|² / [σ_w² · σ_w²(1 + φ²)] = φ²/(1 + φ²), which is constant and does not
depend on the value of D.
(b) In this case, ρ²_xy(ω) = .81/1.81 ≈ .45. The R code to simulate the data and estimate the coherence
is given below. Note that using L = 1 gives a value of 1 no matter what the processes are, and
increasing L (span) gives better estimates.
x=rnorm(1024,0,1)
y=.9*x+rnorm(1024,0,1)
u = ts(cbind(x,y))
s=spec.pgram(u, taper=0, plot=F)
# use this for span=0 or
s=spectrum(u, span=3, taper=0, plot=F) # this for span=3 (span=41 and span=101)
plot(s, plot.type = "coh")
#  these two lines can be used
abline(h = .81/1.81, lty="dashed")
#  to obtain plots for each case
4.19 (a) It follows from the solution to the previous problem that φ_xy(ω) = 2πωD. Hence the slope of the
phase divided by 2π is the delay D.
(b) Bigger values of L give better estimates.
x=ts(rnorm(1025,0,1))
y=.9*lag(x,1)+rnorm(1025,0,1)
u = ts(cbind(x,y))
u = u[2:1025,]
# drop the NAs
s = spectrum(u, span=101, taper=0, plot=F) # use span=3,41,101(displayed)
plot(s, plot.type = "phase")
abline(a=0,b=2*pi, lty="dashed")
# for L=1 use: s = spec.pgram(u, taper=0, plot=F)
4.20 (a) The R code for the crossspectral analysis of the two series is below:
x=ts(scan("/mydata/prod.dat"))
y=ts(scan("/mydata/unemp.dat"))
pu=cbind(x,y)
par(mfrow=c(2,1))
pu.sp=spectrum(pu, span=7)
abline(v=c(1/12,2/12,3/12,4/12,5/12),lty="dashed")
plot(pu.sp, plot.type="coh")
See Figures 2–4. The log spectra with L = 7 show substantial peaks at periods of 2.5 months,
3 months, 4 months and 6 months for the production spectrum, and significant peaks at those
periods plus a 12 month, or one year, periodicity in the unemployment spectrum. It is natural that
the series tend to repeat yearly and quarterly, so that 12 month and 3 month periods would be
expected. The 6 month period could be winter–summer fluctuations or possibly a harmonic of the
yearly cycle. The 4 month period could be a three cycle per year variation due to something less
than quarterly variation, or possibly a harmonic of the yearly cycle (recall harmonics of 1/12 are
of the form k/12, for k = 2, 3, 4, ...). The squared coherence is large at the seasonal frequencies,
as well as at a low frequency of about 33 months, or three years, possibly due to a common low
frequency business cycle. High coherence at a particular frequency indicates parallel movement
between two series at the frequency, but not necessarily causality.
(b) The following code will plot the frequency response functions; see Figure 3
w = seq(0,.5, length=1000)
par(mfrow=c(2,1))
FR12 = abs(1-exp(2i*12*pi*w))^2
plot(w, FR12, type="l", main="12th difference")
FR112 = abs(1-exp(2i*pi*w)-exp(2i*12*pi*w)+exp(2i*13*pi*w))^2
plot(w, FR112, type="l", main="1st diff and 12th diff")
[Figures 2–4: log spectra of the production and unemployment series, with peaks marked at periods of
12, 6, 4, 3, and 2.5 months; the squared coherence, with its F₂,₈(.01) significance line and peaks near
33, 9, 6, 4, 3, and 2.5 months; and the squared frequency responses FR12 ("12th difference") and
FR112 ("1st diff and 12th diff").]
w
[Figure: time plots of the Production Index, its first difference, and the seasonal difference of the
first difference, followed by the corresponding power spectra.]
Figure 5: Log spectra of production, differenced production and seasonally differenced differenced production.
the seasonal components is essentially notched out. Economists would prefer a flatter response
for the seasonally adjusted series, and the design of seasonal adjustment filters that maintain a
flatter response is a continuing saga. Shumway (1988, Section 4.4.3) shows an example.
4.21 Write the filter in the general form (4.91), with a₂ = a₋₂ = 1, a₁ = a₋₁ = 4, and a₀ = 6. Then

A(ω) = Σ_{t=−2}^{2} a_t e^{−2πiωt} = 6 + 2 cos(4πω) + 8 cos(2πω).

By (4.94), the spectrum of the output series is f_y(ω) = [6 + 2 cos(4πω) + 8 cos(2πω)]² f_x(ω). The spectrum
of the output series will depend on the spectrum of the input series, but we can see how frequencies
of the input series are modified by plotting the squared frequency response function |A(ω)|². After
plotting the frequency response function, it will be obvious that the high frequencies are attenuated
and the lower frequencies are not. The filter is referred to as a low pass filter, because it keeps, or passes,
the low frequencies.
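A plot of A(ω) confirms the low-pass behavior (A(0) = 16 and A(1/2) = 0):

```r
# Frequency response of the filter with a2 = a_{-2} = 1, a1 = a_{-1} = 4, a0 = 6
A <- function(w) 6 + 2*cos(4*pi*w) + 8*cos(2*pi*w)
w <- seq(0, .5, length = 500)
plot(w, A(w), type = "l", xlab = "frequency", ylab = "frequency response")
c(A(0), A(.5))   # 16 and 0: low frequencies pass, high frequencies are removed
```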
4.22

y_t = Σ_{k=−∞}^{∞} a_k cos[2πω(t − k)] = ½[e^{2πiωt} A(ω) + e^{−2πiωt} A*(ω)] = Re{A(ω) e^{2πiωt}}
= Re{(A_R(ω) − iA_I(ω))(cos(2πωt) + i sin(2πωt))} = A_R(ω) cos(2πωt) + A_I(ω) sin(2πωt)
= |A(ω)| [cos(φ(ω)) cos(2πωt) − sin(φ(ω)) sin(2πωt)] = |A(ω)| cos(2πωt + φ(ω)),

where |A(ω)| = √(A_R²(ω) + A_I²(ω)).
4.26 R code for fitting an AR spectrum using AIC is given below. The analysis results in fitting an AR(13)
spectrum, which is similar to the nonparametric spectral estimate.
rec=scan("/mydata/recruit.dat")
spec.ar(rec)
4.27 We have 2L f̂_x(1/8)/f_x(1/8) ∼ χ²_{2L}, where f_x(1/8) = [1 + .5² − 2(.5) cos(2π/8)]⁻¹ = 1.842, from Problem
4.24(a). For L = 3, we have 2(3)2.25/1.842 = 7.33, which does not exceed χ²₆(.05) = 12.59. For L = 11,
we have 2(11)2.25/1.842 = 26.87, which does not exceed χ²₂₂(.05) = 33.92. Neither sample has evidence
for rejecting the hypothesis that the spectrum is as claimed at the α = .05 level.
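The computations are easily checked in R:

```r
# Chi-square check of the claimed spectrum at frequency 1/8
fx <- 1/(1 + .5^2 - 2*.5*cos(2*pi/8))   # about 1.842
stat <- function(L) 2*L*2.25/fx         # 2L fhat/f
c(stat(3),  qchisq(.95, 6))             # statistic below the critical value
c(stat(11), qchisq(.95, 22))            # likewise for L = 11
```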
4.28 The conditions imply that under H₀: d(ω_k + ℓ/n) ∼ CN{0, f_n(ω)}, and under H₁: d(ω_k + ℓ/n) ∼
CN{0, f_s(ω) + f_n(ω)}. For simplicity in notation, denote d_ℓ = d(ω_k + ℓ/n) and f_s = f_s(ω), f_n = f_n(ω).
(a) The ratio of likelihoods under the two hypotheses would be

p₁/p₀ = [π^{−L}(f_s + f_n)^{−L} exp{−Σ_ℓ |d_ℓ|²/(f_s + f_n)}] / [π^{−L} f_n^{−L} exp{−Σ_ℓ |d_ℓ|²/f_n}],

and the log likelihood involving the data is proportional to

T = Σ_ℓ |d_ℓ|² [ 1/f_n − 1/(f_s + f_n) ].

(b) Write

T = [f_s / (f_n(f_s + f_n))] Σ_ℓ |d_ℓ|².

Since 2 Σ_ℓ |d_ℓ|²/f_n ∼ χ²_{2L} under H₀ and 2 Σ_ℓ |d_ℓ|²/(f_s + f_n) ∼ χ²_{2L} under H₁, we have

T ∼ [f_s / (2(f_s + f_n))] χ²_{2L} under H₀   and   T ∼ [f_s / (2f_n)] χ²_{2L} under H₁.

(c) Here, we note that

P_F = P{T > K | H₀} = P{ χ²_{2L} > 2K(f_s + f_n)/f_s } = P{ χ²_{2L} > 2K(SNR + 1)/SNR },

and

P_d = P{T > K | H₁} = P{ χ²_{2L} > 2K f_n/f_s } = P{ χ²_{2L} > 2K/SNR },

where SNR = f_s/f_n denotes the signal-to-noise ratio. Note that, as SNR → ∞, P_F → P{χ²_{2L} > 2K} and
P_d → 1, so the signal detection probability approaches unity for a fixed false alarm rate, as
guaranteed by the Neyman–Pearson lemma.
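The behavior of P_F and P_d in part (c) can be sketched numerically; K = 5 and L = 3 below are arbitrary choices:

```r
# False alarm and detection probabilities as functions of the SNR
PF <- function(K, snr, L) pchisq(2*K*(snr + 1)/snr, 2*L, lower.tail = FALSE)
Pd <- function(K, snr, L) pchisq(2*K/snr, 2*L, lower.tail = FALSE)
Pd(5, 2, 3) > PF(5, 2, 3)   # detection probability exceeds the false alarm rate
Pd(5, 100, 3)               # approaches 1 as the SNR grows
```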
4.29 The figures (shown at the end of the solutions) for the other earthquakes and explosions are consistent,
for the most part, with Example 4.20. The NZ event is more like an explosion than an earthquake.
4.30 For brevity, we only show the energy distribution of the other earthquakes (EQ) and explosions (EX);
see Table 4.2 for EQ1 and EX1 and Example 4.22 for the NZ event. Typically, earthquakes have most
of the energy distributed between d2–d4; the explosions typically have most of the energy distributed
between d2–d3 (as does the NZ event). The waveshrink estimates for EQ 2, 4, 6, 8 and EX 2, 4, 6, 8
are shown at the end of the solutions.
Energy(%) Distribution for Earthquakes
       EQ2    EQ3    EQ4    EQ5    EQ6    EQ7    EQ8
s6   0.000  0.000  0.012  0.009  0.001  0.000  0.000
d6   0.001  0.003  0.017  0.043  0.002  0.001  0.005
d5   0.036  0.121  0.071  0.377  0.184  0.019  0.118
d4   0.200  0.346  0.402  0.366  0.507  0.309  0.484
d3   0.433  0.399  0.334  0.160  0.230  0.524  0.287
d2   0.266  0.119  0.127  0.040  0.071  0.129  0.095
d1   0.064  0.012  0.038  0.003  0.006  0.019  0.010

Energy(%) Distribution for Explosions
       EX5    EX6    EX7    EX8
s6   0.001  0.002  0.001  0.005
d6   0.005  0.002  0.009  0.018
d5   0.005  0.007  0.026  0.130
d4   0.018  0.015  0.123  0.384
d3   0.210  0.559  0.366  0.318
d2   0.654  0.349  0.413  0.122
d1   0.108  0.066  0.062  0.024
4.31 The solution to this problem is given in the discussion of the previous two problems, 4.29 and 4.30.
4.32 Note first that

a_k^M = M^{−1} Σ_{j=0}^{M−1} A(ω_j) e^{2πiω_j k} = M^{−1} Σ_{j=0}^{M−1} Σ_{t=−∞}^{∞} a_t e^{−2πiω_j t} e^{2πiω_j k}
= Σ_{t=−∞}^{∞} a_t M^{−1} Σ_{j=0}^{M−1} e^{2πiω_j(k−t)} = Σ_{ℓ=−∞}^{∞} a_{k+ℓM} = a_k + Σ_{ℓ≠0} a_{k+ℓM},

since M^{−1} Σ_{j=0}^{M−1} e^{2πij(k−t)/M} equals one when t − k is a multiple of M and zero otherwise. Thus

y_t − y_t^M = Σ_{|k|≥M/2} a_k x_{t−k} − Σ_{ℓ≠0} Σ_{|k|<M/2} a_{k+ℓM} x_{t−k},

where the last display follows by writing out the separate sums for ℓ = ±1, ±2, ... and simplifying. Then

E[(y_t − y_t^M)²] ≤ 4 Σ_{|j|≥M/2} Σ_{|k|≥M/2} |a_j| |a_k| |E(x_{t−j} x_{t−k})| ≤ 4γ_x(0) ( Σ_{|k|≥M/2} |a_k| )²,

which goes to zero as M increases, as long as the absolute summability condition holds.
4.33 Multiply both sides of the equation by x_{t+h} and use the Fourier representations of the spectra and cross-
spectra to show that f_yx(ω) = A(ω) f_x(ω). Also, f_y(ω) = |A(ω)|² f_x(ω). Then, by the definition of
squared coherence,

ρ²_yx(ω) = |f_yx(ω)|² / [f_x(ω) f_y(ω)] = |A(ω) f_x(ω)|² / [f_x(ω) |A(ω)|² f_x(ω)] = 1.
4.34 (a) Figure 7.3 (in the text) shows what the ordinary coherence functions should look like. It is clear
that the precipitation–inflow coherence is uniformly larger than the others, so that precipitation
should be considered as the major contributor over the entire frequency range. Cloud cover also
appears to be predictive.
(b) Figure 7 shows the impulse-response function, suggesting dependence of the inflow on an exponentially
weighted combination of past precipitation, i.e.,

I_t = Σ_{j=0}^∞ δ^j P_{t−j} = Σ_{j=0}^∞ δ^j B^j P_t = [1/(1 − δB)] P_t.

One can approximate the coefficient from the plot, or run a regression of the form

I_t = δ I_{t−1} + β P_t,

which yields δ̂ = .54 as the decay constant.
Figure 7: Impulse response relating transformed precipitation to transformed inflow. Shows exponentially
decaying dependence on present and past precipitation.
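The regression route to the decay constant can be sketched on simulated data (all settings below are illustrative stand-ins, not the lake data):

```r
# Recovering the decay constant delta from the regression I_t = delta I_{t-1} + beta P_t
set.seed(42)
n <- 500; delta <- .54; beta0 <- 1             # illustrative truth
P <- rexp(n); I <- numeric(n)
for (t in 2:n) I[t] <- delta*I[t-1] + beta0*P[t] + rnorm(1, sd = .1)
fit <- lm(I[2:n] ~ 0 + I[1:(n-1)] + P[2:n])
coef(fit)                                      # recovers delta near .54
```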
4.35 For the model y_t = Σ_{r=−∞}^∞ β_r x_{t−r} + v_t,

γ_y(h) = Σ_{r=−∞}^∞ Σ_{s=−∞}^∞ β_r β_s γ_x(h − r + s) + γ_v(h).

Substituting the spectral representations for γ_x(h) and γ_v(h), and identifying the representations for
f_x(ω), f_v(ω) as well as the Fourier series for β_t, leads to the first required result. Then, note that

γ_xy(h) = Σ_{r=−∞}^∞ β_r γ_x(h + r),

so that

γ_xy(h) = ∫_{−1/2}^{1/2} Σ_{r=−∞}^∞ β_r e^{2πiωr} f_x(ω) e^{2πiωh} dω = ∫_{−1/2}^{1/2} B*(ω) f_x(ω) e^{2πiωh} dω.
4.36 Write

y_t − φ₁ y_{t−1} = x_t − φ₁ x_{t−1} + v_t − φ₁ v_{t−1} = w_t + v_t − φ₁ v_{t−1},

and identify the right-hand side as a first-order MA. The required spectrum is the spectrum of an
ARMA(1, 1) process, which also has a first-order MA on the right-hand side. Assuming that the
process is Gaussian, we can equate the ACFs of the two processes and solve for σ² and θ₁. Letting
u_t = w_t + v_t − φ₁ v_{t−1}, we obtain

γ_u(0) = σ_w² + (1 + φ₁²) σ_v²   and   γ_u(1) = −φ₁ σ_v².

Equating these results to the corresponding results for a first-order MA leads to the equations

σ²(1 + θ₁²) = σ_w² + (1 + φ₁²) σ_v²   and   θ₁ σ² = −φ₁ σ_v²,

relating the two models. Solving the second and substituting back into the first equation leads to the
required results.
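Under the sign convention above, solving the two moment equations amounts to finding the invertible root of a quadratic; a numerical sketch with illustrative values φ₁ = .9 and σ_w² = σ_v² = 1:

```r
# Solve sigma^2 (1 + theta1^2) = sw2 + (1 + phi1^2) sv2 and theta1 sigma^2 = -phi1 sv2
phi1 <- .9; sw2 <- 1; sv2 <- 1                  # illustrative values
c0 <- sw2 + (1 + phi1^2)*sv2                    # = sigma^2 (1 + theta1^2)
r <- polyroot(c(phi1*sv2, c0, phi1*sv2))        # phi1*sv2*th^2 + c0*th + phi1*sv2 = 0
theta1 <- Re(r[Mod(r) < 1])                     # the invertible root
sig2 <- -phi1*sv2/theta1                        # from theta1 sigma^2 = -phi1 sv2
c(theta1, sig2)
```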
4.37 (a) Set up the orthogonality condition

E[(x_t − Σ_{s=−∞}^∞ a_s y_{t−s}) y_{t−u}] = 0,   u = 0, ±1, ...,

which leads to the normal equations Σ_s a_s γ_y(u − s) = γ_xy(u). Noting that γ_xy(u) = γ_x(u)
and taking Fourier transforms leads to the equation

A(ω) = f_x(ω)/f_y(ω) = [σ_w²/|1 − φ₁e^{−2πiω}|²] / [σ² |1 − θ₁e^{−2πiω}|²/|1 − φ₁e^{−2πiω}|²]
= (σ_w²/σ²) · 1/|1 − θ₁e^{−2πiω}|²,

which we recognize as the spectrum of a first-order AR process, with variance σ_w²/σ². Hence its
Fourier transform will be the autocovariance of a first-order AR process, i.e.,

a_s = (σ_w²/σ²) θ₁^{|s|}/(1 − θ₁²).

(b) The minimum mean squared error is

MSE = E[(x_t − Σ_s a_s y_{t−s}) x_t] = γ_x(0) − Σ_{s=−∞}^∞ a_s γ_x(s).

Substituting γ_x(s) = σ_w² φ₁^{|s|}/(1 − φ₁²) and a_s from part (a), and summing the resulting geometric
series (the cross terms involve the factor (1 + θ₁φ₁)/(1 − θ₁φ₁)), the expression reduces to

MSE = σ_v² σ_w² / [σ²(1 − θ₁²)].

(c) To get the optimal finite estimator, use the orthogonality principle to get the equation

[ γ_y(0)  γ_y(1) ; γ_y(1)  γ_y(0) ] (a₁ ; a₂) = (γ_x(1) ; γ_x(2)).

We obtain γ_y(0), γ_y(1) from Example 3.11 in Chapter 3, which derives the autocovariance of an
ARMA(1, 1) process, and γ_x(1), γ_x(2) from Problem 3.5(b). This leads to the equation

[ .9583  .8147 ; .8147  .9584 ] (a₁ ; a₂) = ( .8147 ; .7333 ),

and the solution a = (.7204, .1527)′. The mean squared error can be computed from

MSE = E[(x_t − a₁ y_{t−1} − a₂ y_{t−2}) x_t] = γ_x(0) − a₁ γ_x(1) − a₂ γ_x(2) = .2064.

The optimal mean squared error from the equation in part (b) is .0364.
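The 2 × 2 system in part (c) can be solved directly:

```r
# Optimal finite estimator: solve the normal equations
G <- matrix(c(.9583, .8147, .8147, .9584), 2, 2)
g <- c(.8147, .7333)
a <- solve(G, g)
a   # approximately (.7204, .1527)
```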
4.38 For the two-dimensional filter y_{s₁,s₂} = Σ_{u₁,u₂} a_{u₁,u₂} x_{s₁−u₁, s₂−u₂}, the autocovariance is

γ_y(h₁, h₂) = E[ Σ_{u₁,u₂} Σ_{v₁,v₂} a_{u₁,u₂} a_{v₁,v₂} x_{s₁+h₁−u₁, s₂+h₂−u₂} x_{s₁−v₁, s₂−v₂} ]
= ∫_{−1/2}^{1/2} ∫_{−1/2}^{1/2} |A(ω₁, ω₂)|² f_x(ω₁, ω₂) e^{2πi(ω₁h₁ + ω₂h₂)} dω₁ dω₂.
4.39 We can use the definition (C.13) for S_n(ω_k, ω_ℓ) in the white noise case [γ(s − t) = σ_w² for s = t and 0
otherwise] to write

S_n(ω_k, ω_ℓ) = n^{−1} σ_w² Σ_{t=1}^n e^{−2πi(ω_k − ω_ℓ)t},

which is σ_w² when k = ℓ and zero otherwise. Then, for example,

¼ [ S_n(ω_k, −ω_ℓ) + S_n(ω_k, ω_ℓ) + S_n(−ω_k, −ω_ℓ) + S_n(−ω_k, ω_ℓ) ] = (σ_w²/4)[0 + 1 + 1 + 0] = σ_w²/2,

for k = ℓ, and is zero otherwise. The other terms are treated similarly.
4.43 Write

w = 2 Re[(â_c − iâ_s)′(x_c − ix_s)] = 2(â_c′ x_c + â_s′ x_s) = 2 (â_c′  â_s′) (x_c ; x_s).

Hence,

cov(w) = 4 (â_c′  â_s′) · ½ [ C  −Q ; Q  C ] · (â_c ; â_s) = 2(â_c + iâ_s)′(C − iQ)(â_c − iâ_s) = 2 â* f â,

where f = C − iQ.
[Waveshrink fits (Data, Signal, Resid) for EQ2, EQ4, EQ6, and EQ8.]
[Waveshrink fits (Data, Signal, Resid) for EX2, EX4, EX6, and EX8.]
Chapter 5
5.1 (a) A time plot is shown below. Note the apparent trend in the data.
5.2 R code for fitting fractional noise is below. The estimated value of d is about .49. R wasn't able to fit an
additional AR or MA parameter.
x=read.table("/mydata/globtemp2.dat")
gtemp=x[,2]
fracdiff(gtemp)
We also fit an ARFIMA(1, d, 0) in S-PLUS; the fitted model is possibly fractional white noise. The
estimates were φ̂ = .11 with estimated standard error .0667 (φ̂ not significant at the .05 level) and d̂ = .48
with estimated standard error .0013, with output shown below.
Splus Commands:
gtemp <- globtemp2[,2]
arima.fracdiff(gtemp, model = list(ar = NA), M=50)
$model$ar:
[1] 0.1696179 # borderline significance
$model$d:
[1] 0.4617656
$var.coef:
d
ar1
d 0.00001355149 0.00001478296
ar1 0.00001478296 0.00771081885
5.3 R code to t an ARFIMA(1,1,1) is below:
nyse=scan("/mydata/nyse.dat")
x=abs(nyse)
acf(x,200)
fracdiff(x,nar=1,nma=1)
$d
0.3100793
$ar 0.06873574
$ma
0.1660708
$stderror.dpq
0.016432124 0.006991419 0.007021429
5.4 The time plot of the data indicates ARCH behavior. The ACF and PACF of the returns suggest an
MA(1) or AR(1) behavior (with θ or φ approximately .2). Both models provide a good fit, but the
AR(1) is easier to fit and that is what we use. We fit an AR(1) to the data and obtained 0.0006
for the constant and 0.2411 for the AR parameter estimate. A time plot of the residuals indicates
ARCH behavior. The ACF/PACF of the residuals appear to support the fact that the residuals are
white, but the ACF/PACF of the squared residuals shows some low order correlation. Next, we fit
an AR(1)-ARCH(1) model to the data using the S-PLUS garch module. The results of the fit and
standard errors are given below:
"0
"1
"1
"0
.8612 .0862 .0275 0.2273
(.0634) (.0486) (.0426) (.0452)
5.5 A plot of the returns ∇ln(x_t), where x_t is the oil price series, appears to have some ARCH behavior
in that there is clustering of volatility. There are also some regions where there appear to be
structural breaks in the data. The ACF and PACF of the returns suggest an AR(1) for the mean.
Thus, we used the S+GARCH module to fit a GARCH(1,1) with an AR(1) mean, with commands and
output given below.
Estimated Coefficients:
              Value  Std.Error  t value   Pr(>|t|)
       C  0.0031795 0.00460304   0.6907  2.453e-01   (phi0)
   AR(1)  0.5050445 0.12005115   4.2069  2.069e-05   (phi1)
       A  0.0003676 0.00008318   4.4198  8.670e-06   (alpha0)
 ARCH(1)  0.2347824 0.09981626   2.3521  9.892e-03   (alpha1)
GARCH(1)  0.6056319 0.09228240   6.5628  2.919e-10   (beta1)
Normality Test:
 Jarque-Bera  P-value  Shapiro-Wilk  P-value
        2849        0        0.7874        0

Ljung-Box test for standardized residuals:
 Statistic  P-value  Chi^2-d.f.
     5.895   0.9213          12

Ljung-Box test for squared standardized residuals:
 Statistic  P-value  Chi^2-d.f.
     4.053   0.9825          12

Lagrange multiplier test:
 Lag 1  Lag 2   Lag 3    Lag 4    Lag 5  Lag 6     Lag 7   Lag 8  Lag 9
 0.246   0.31  0.1775  0.04317  0.09501  1.216  0.005681  0.3309  0.201
 Lag 10  Lag 11  Lag 12       C
  1.336  0.2221  0.2371  0.5095
  TR^2  P-value  F-stat  P-value
 4.205   0.9794  0.3922   0.9947
5.8 Following the advice of many authors [see for example Thanoon (1990), J. Time Series Anal., 75–87]
we fit a long AR to the data, with threshold x_{t−3} > 36.6. We found an AR(15) worked well; the
residuals appeared to be white, but were somewhat heteroscedastic. The following models were fit:

x_t = α⁽¹⁾ + Σ_{j=1}^{15} φ_j⁽¹⁾ x_{t−j} + w_t⁽¹⁾,   x_{t−3} ≤ 36.6,

x_t = α⁽²⁾ + Σ_{j=1}^{15} φ_j⁽²⁾ x_{t−j} + w_t⁽²⁾,   x_{t−3} > 36.6.
[Only part of the table of estimated coefficients (through phi15) was recoverable:
0.473, 0.300, 0.238, 0.154, 0.062, 0.156, 0.150, 0.136, 0.113, 0.053.]
Differentiating

Q = E(y₀²) − 2β′γ₀ + β′Γβ + 2λ′(Z′β − z₀)

with respect to β and setting the result to zero gives

Γβ + Zλ = γ₀,

as was to be shown.
(b) From (a), β = Γ⁻¹(γ₀ − Zλ), so that z₀ = Z′β = Z′Γ⁻¹(γ₀ − Zλ). Solving,

λ = (Z′Γ⁻¹Z)⁻¹ [Z′Γ⁻¹γ₀ − z₀].

Finally,

β = Γ⁻¹γ₀ − Γ⁻¹Z(Z′Γ⁻¹Z)⁻¹ [Z′Γ⁻¹γ₀ − z₀].

Now ŷ₀ = β′y = γ₀′Γ⁻¹y − [Z′Γ⁻¹γ₀ − z₀]′(Z′Γ⁻¹Z)⁻¹ Z′Γ⁻¹y, which simplifies to the desired expression.
5.11 (a) We transform the inflow and take the seasonal difference y_t = ln i_t − ln i_{t−12}, which is proportional to
the percentage yearly increase in flow. Monthly precipitation has some zero values and we use
the square root transformation to stabilize this variable. Fitting the two series separately leads to
the two ARIMA models

x_t = ∇₁₂ √P_t = (1 − .812₍.₀₂₉₎ B¹²) w_t

and

y_t = (1 − .764₍.₀₃₃₎ B¹²) z_t,

with σ̂_w² = 32.503 and σ̂_z² = .225.
(b) Cross correlating the two transformed series x_t and (1 − .812B¹²) y_t leads to the figure shown below,
and we note that the inflow series seems to depend on exponentially decreasing lagged values of
the precipitation.
5.12 (a) The ACF of the residuals from the ARIMA(0,0,0) \times (0,1,1)_{12} model is well behaved, with all
values (except lag 1) well below the significance levels.
(b) The CCF has been computed in Problem 5.11 and is shown in the figure below. The exponential
decrease, beginning at lag zero, suggests

\nu(B) = \delta_0(1 + \omega_1 B + \omega_1^2 B^2 + \omega_1^3 B^3 + \cdots)

for fitting the exponential decrease.
[Figure: CCF of transformed inflow with prewhitened precipitation, lags -30 to 30.]
(c) This corresponds to the model

y_t = \frac{\delta_0}{1 - \omega_1 B}\,x_t + \eta_t,

which becomes

(1 - \omega_1 B)y_t = \delta_0 x_t + (1 - \omega_1 B)\eta_t,

or

y_t = \omega_1 y_{t-1} + \delta_0 x_t + n_t, \quad where \; n_t = (1 - \omega_1 B)\eta_t.

We can run the regression model above as ordinary least squares, even though the residuals are
correlated, obtaining

y_t = .526_{(.026)}y_{t-1} + .050_{(.002)}x_t + \hat n_t.
(d) To model the noise, we take \hat n_t from the above model and note that

\hat n_t = (1 - .526B)\hat\eta_t

can be solved for \hat\eta_t by inverting the first order moving average transformation. These residuals,
say \hat\eta_t, can be modeled by an ARIMA(1,0,0) \times (0,0,1)_{12} model of the form

(1 - .384B)\eta_t = (1 - .796B^{12})z_t,

where \hat\sigma_z^2 = .0630. Hence, the final model is of the form

y_t = \frac{.050}{1 - .526B}\,x_t + \eta_t,

where x_t = (1 - .812B^{12})w_t and the noise \eta_t is as modeled above.
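The rational-lag part of the final model can be evaluated with a recursive filter in R; the input x below is a simulated placeholder for the transformed precipitation series:

```r
# Apply the fitted transfer function .050/(1 - .526 B) to an input x,
# i.e., the recursion u_t = .526 u_{t-1} + .050 x_t.
set.seed(1)
x <- rnorm(200)                                   # placeholder input
transfer <- stats::filter(.050 * x, filter = .526, method = "recursive")
```

The recursive option of stats::filter implements exactly the difference equation above, starting from zero initial conditions.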
(e) A possible general procedure would be to forecast x_t and \eta_t separately and then combine the
forecasts using the defining equation above (note that x_t and \eta_t are assumed to be independent).
To forecast x_t, note that x_t = w_t - .812w_{t-12} and x_{t+m} = w_{t+m} - .812w_{t+m-12}, so that

x_{t+m}^t = -.812\,w_{t+m-12} \; for \; m \le 12, \qquad x_{t+m}^t = 0 \; for \; m > 12,

where the residuals w_t come from applying the model to the data x_t. The forecast variance for
x_{t+m}^t will be \sigma_w^2 for m \le 12 and (1 + .812^2)\sigma_w^2 for m > 12. To forecast \eta_t, note that

\eta_{t+m} = .384\eta_{t+m-1} + z_{t+m} - .796z_{t+m-12},

so that

\eta_{t+m}^t = .384\eta_{t+m-1}^t - .796z_{t+m-12} \; for \; m \le 12, \qquad \eta_{t+m}^t = .384\eta_{t+m-1}^t \; for \; m > 12.

The forecast variance follows from the \psi-weights of

\eta_t = \frac{1 - .796B^{12}}{1 - .384B}\,z_t = \psi(B)z_t,

namely \hat\sigma_z^2 \sum_{j=0}^{m-1}\psi_j^2.
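To make the last step concrete (this expansion is implied by, but not written out in, the solution above), the \psi-weights follow from long division:

\psi(B) = (1 - .796B^{12})\sum_{j=0}^{\infty}(.384B)^j,

so \psi_j = .384^j for j = 0, 1, \ldots, 11, and \psi_{12} = .384^{12} - .796. For example, the two-step-ahead forecast variance is \hat\sigma_z^2(1 + .384^2).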
j=0
AIC
23.769
24.034
24.040
23.995
AICc
23.651
23.798
23.686
23.523
SIC
23.597
23.690
23.524
23.306
0.006   0.146   0.891   0.004   0.101

and

\hat\Sigma_w \times 10^3:  5.684   0.307   0.066   0.307   0.119   0.037   2.621   0.229   0.052   0.066   0.037   0.062
Residual analysis shows that there is still some small amount of correlation in the residual corresponding
to consumption. Fitting a third order model removes this small but significant correlation. The
estimates of the VAR(3) model are:
0.009   0.129   0.913   0.007   0.080   0.257   0.010   1.582   0.116

\hat\Phi_3:  0.012   0.143   0.102,  and  \hat\Sigma_w \times 10^3:  5.382   0.285   0.060   0.285   0.116   0.035
Chapter 6
6.1 (a) The model can be written as

\begin{pmatrix} x_t \\ x_{t-1} \end{pmatrix} = \begin{pmatrix} 0 & .9 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x_{t-1} \\ x_{t-2} \end{pmatrix} + \begin{pmatrix} w_t \\ 0 \end{pmatrix}

and

y_t = [\,1 \;\; 0\,]\begin{pmatrix} x_t \\ x_{t-1} \end{pmatrix} + v_t.
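A quick simulation (not part of the original solution) confirms that the state equations reproduce the implied recursion x_t = .9x_{t-2} + w_t:

```r
# Iterate the state equation and verify x_t = .9 x_{t-2} + w_t exactly.
set.seed(1)
n <- 100; w <- rnorm(n)
Phi <- matrix(c(0, 1, .9, 0), 2, 2)   # column-wise: [0 .9; 1 0]
X <- matrix(0, 2, n)
for (t in 2:n) X[, t] <- Phi %*% X[, t - 1] + c(w[t], 0)
x <- X[1, ]
all.equal(x[3:n], .9 * x[1:(n - 2)] + w[3:n])   # TRUE
```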
(b) For y_t to be stationary, x_t must be stationary. Note that for t = 0, 1, 2, \ldots, we may write

x_{2t-1} = \sum_{j=0}^{t-1}(.9)^j w_{2t-1-2j} + (.9)^t x_{-1} \quad and \quad x_{2t} = \sum_{j=0}^{t-1}(.9)^j w_{2t-2j} + (.9)^t x_0.

From this we see that x_{2t-1} and x_{2t} are independent. Repeating the steps of Problem 3.5, we conclude that
setting x_0 = w_0/\sqrt{1 - .9^2} and x_{-1} = w_{-1}/\sqrt{1 - .9^2} will make x_t stationary. In other words, set
\sigma_0^2 = \sigma_{-1}^2 = \sigma_w^2/(1 - .9^2).
(c) and (d): The plots are not shown here.
6.2 (i) s \ne t: Without loss of generality, let s < t; then cov(\epsilon_s, \epsilon_t) = E[\epsilon_s E(\epsilon_t \mid y_1, \ldots, y_s)] = 0.
(ii) s = t: Note that y_t^{t-1} = E(x_t + v_t \mid y_1, \ldots, y_{t-1}) = x_t^{t-1}. Thus \epsilon_t = y_t - y_t^{t-1} = (x_t - x_t^{t-1}) + v_t, and
it follows that var(\epsilon_t) = var[(x_t - x_t^{t-1}) + v_t] = P_t^{t-1} + \sigma_v^2.
6.3 See the code to duplicate Example 6.6 on the web site. Except for the estimation part, this problem
is similar to that example.
6.4 (a) Write x = (x_1, \ldots, x_p)', y = (y_1, \ldots, y_q)', b = (b_1, \ldots, b_p)' and B = \{B_{ij}\}_{i=1,\ldots,p;\,j=1,\ldots,q}. The
projection equations are

E[(x_i - b_i - B_{i1}y_1 - \cdots - B_{iq}y_q) \cdot 1] = 0, \quad i = 1, \ldots, p,   (1)

E[(x_i - b_i - B_{i1}y_1 - \cdots - B_{iq}y_q)\,y_j] = 0, \quad i = 1, \ldots, p; \; j = 1, \ldots, q.   (2)
A_{k+1}'[A_{k+1}P_{k+1}^k A_{k+1}' + R]^{-1} = [P_{k+1}^k]^{-1}K_{k+1}.
Figure 1: Plot of the data y_t and the regression predictor \hat y_t for Problem 6.7(a).
Figure 2: Plot of the data y_t (solid line), the smoother x_t^n (points), and the predictor x_t^{t-1} (dashed line) for Problem 6.7(b).
(b) The model can be written as

\begin{pmatrix} x_t \\ x_{t-1} \\ x_{t-2} \end{pmatrix} = \begin{pmatrix} \phi_1 & \phi_2 & \phi_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_{t-1} \\ x_{t-2} \\ x_{t-3} \end{pmatrix} + \begin{pmatrix} w_t \\ 0 \\ 0 \end{pmatrix},

y_t = [\,1 \;\; 0 \;\; 0\,]\begin{pmatrix} x_t \\ x_{t-1} \\ x_{t-2} \end{pmatrix} + v_t,

with

Q = \begin{pmatrix} \sigma_w^2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

Estimation using ASTSA yielded \hat\sigma_w^2 \approx .001 and \hat\sigma_v^2 \approx .000. This model can also be estimated
with R with the help of the code on the web site.
6.8 Using (6.64), the essential part of the complete data log likelihood (i.e., dropping any constants) is

n\ln\sigma_x^2 + \sum_{t=1}^n \frac{x_t^2}{r_t\sigma_x^2} + n\ln\sigma_v^2 + \sum_{t=1}^n \frac{(y_t - x_t)^2}{\sigma_v^2}.

Following (6.71) and (6.72), the updated estimates will be (using the notation of the problem, hats for
updates and tildes for current values)

\hat\sigma_x^2 = n^{-1}\sum_{t=1}^n \frac{[\tilde x_t^n]^2 + \tilde P_t^n}{r_t}

and

\hat\sigma_v^2 = n^{-1}\sum_{t=1}^n \big[(y_t - \tilde x_t^n)^2 + \tilde P_t^n\big].

It remains to determine \tilde x_t^n and \tilde P_t^n. These can be obtained from (B.9)-(B.10). Write X_n = (x_1, \ldots, x_n)'
and Y_n = (y_1, \ldots, y_n)' and drop the tilde from the notation. Then

\begin{pmatrix} X_n \\ Y_n \end{pmatrix} \sim N\left(0, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}\right),
so that

x_t^n = \frac{r_t\sigma_x^2}{r_t\sigma_x^2 + \sigma_v^2}\,y_t \quad and \quad P_t^n = r_t\sigma_x^2 - \frac{r_t^2\sigma_x^4}{r_t\sigma_x^2 + \sigma_v^2} = \frac{r_t\sigma_x^2\sigma_v^2}{r_t\sigma_x^2 + \sigma_v^2}.
From (1) we see that P_t \ge P_{t-1} whenever P_{t-1} \ge P_{t-2}, implying that the sequence \{P_t\} is
monotonic. In addition, the sequence is bounded below by 0 and above by \sigma_w^2/(1 - \phi^2), using the
fact that

P_t = E(x_t - x_t^{t-1})^2 \le var(x_t) \le \sigma_w^2/(1 - \phi^2).

From these facts we conclude that P_t has a limit, say P, as t \to \infty, and from part (a), P must
satisfy

P = \phi^2(P^{-1} + R^{-1})^{-1} + Q.   (2)

We are given R = Q = 1; solving (2) yields

P^2 + (1 - \phi^2)P - 1 = 0.

(c) Using the notation in (b), K_t = P_t/(P_t + 1) and it follows that K_t \to K = P/(P + 1). Also,
0 < (1 - K) = 1/(P + 1) < 1 because P > 0.
(d) In this problem, y_{n+1}^n = x_{n+1}^n, and in steady state

x_{n+1}^n = Ky_n + \phi(1 - K)x_n^{n-1}
        = Ky_n + \phi(1 - K)Ky_{n-1} + \phi^2(1 - K)^2 x_{n-1}^{n-2}
        \;\;\vdots
        = \sum_{j=1}^{\infty}\phi^{j-1}K(1 - K)^{j-1}y_{n+1-j}.   (1)
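A small numerical check (not in the original solution; \phi = .8 is an assumed illustrative value) that the recursion converges to the positive root of P^2 + (1 - \phi^2)P - 1 = 0:

```r
# Iterate P <- phi^2 (P^{-1} + 1)^{-1} + 1 (R = Q = 1), i.e.,
# P <- phi^2 * P/(1 + P) + 1, and compare the limit with the
# positive root of P^2 + (1 - phi^2) P - 1 = 0.
phi <- 0.8
P <- 1                                   # any positive starting value
for (i in 1:200) P <- phi^2 * P / (1 + P) + 1
root <- (-(1 - phi^2) + sqrt((1 - phi^2)^2 + 4)) / 2
c(limit = P, root = root)                # the two values agree
```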
Because x_m is not observed, x_m^m = x_m^{m-1} = \phi x_{m-1}. Moreover, x_{m+1}^n = x_{m+1}^{m+1} = x_{m+1}.
Note, J_m = \phi P_m^m/P_{m+1}^m. Now, from (6.22) with A_m = 0, P_m^m = P_m^{m-1} = \sigma_w^2. In addition,

P_{m+1}^m = \phi^2 P_m^m + \sigma_w^2 = \sigma_w^2(1 + \phi^2).

Thus, J_m = \phi/(1 + \phi^2). Inserting these values in (1) gives the desired result.
 t     1     2     3     6     7     8    13    14    15
x_t   1.01  ****  1.05  0.76  ****  1.95  2.81  ****  0.42

 t    40    41    42    43    44    45    46    47    48    49    50
x_t   0.60  ****  1.21  ****  1.29  0.12  ****  0.07  0.28  ****  2.83

 t    53    54    55    61    62    63    66    67    68    79    80    81
x_t   3.12  ****  1.73  2.28  ****  0.56  1.64  ****  2.25  1.45  ****  0.67

 t    85    86    87    88    89    90    93    94    95
x_t   1.14  ****  2.00  0.59  ****  0.60  1.78  ****  0.51

x_t^n:  1.32  1.38  1.57  1.89  0.88  1.03  1.00  0.04  0.02  1.51  2.36  1.11  1.53  0.58
6.15 We fit a model similar to Example 6.10; that is, y_t = T_t + S_t + v_t, where T_t = \phi T_{t-1} + w_{t1} and
S_t + S_{t-1} + \cdots + S_{t-11} = w_{t2}. The state equation in this case is similar to Example 6.10, but with a
13 \times 1 state vector x_t = (T_t, S_t, S_{t-1}, \ldots, S_{t-11})'. The estimates and the corresponding standard errors
are \hat\phi = 1.003 (.001), \hat\sigma_{w1} = 2.152 (.219), \hat\sigma_{w2} = .000 (.030), and \hat\sigma_v = 1.178 (.234). The trend and
seasonal component estimates are shown in Figure 3 for the last 100 time points.
Figure 3: Plot of estimated trend and seasonal components (final 100 time points) for Problem 6.15.
6.16 (a) AR(1): x_{t+1} = \phi x_t + \phi v_t and y_t = x_t + v_t.
(b) MA(1): x_{t+1} = 0 \cdot x_t + \theta v_t and y_t = x_t + v_t.
(c) IMA(1,1): x_{t+1} = x_t + (1 + \theta)v_t and y_t = x_t + v_t.
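As a check of (c) (a verification not spelled out in the text), difference the observation equation:

\nabla y_{t+1} = (x_{t+1} - x_t) + (v_{t+1} - v_t) = (1 + \theta)v_t + v_{t+1} - v_t = v_{t+1} + \theta v_t,

so the observations indeed follow an IMA(1,1) model.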
6.17 The proof of Proposition P6.5 is similar to the proof of Proposition P6.1. The first step is in noting
that, in this setup,

cov(x_{t+1}, \epsilon_t \mid Y_{t-1}) = \Phi P_t^{t-1}A_t' + S,

\Sigma_t \equiv var(\epsilon_t) = A_t P_t^{t-1}A_t' + R,

and

E(x_{t+1} \mid Y_{t-1}) = x_{t+1}^{t-1} = \Phi x_t^{t-1} + \Upsilon u_{t+1}.

Then we may write

\begin{pmatrix} x_{t+1} \\ \epsilon_t \end{pmatrix} \Big| Y_{t-1} \sim N\left(\begin{pmatrix} x_{t+1}^{t-1} \\ 0 \end{pmatrix}, \begin{pmatrix} P_{t+1}^{t-1} & \Phi P_t^{t-1}A_t' + S \\ A_t P_t^{t-1}\Phi' + S' & \Sigma_t \end{pmatrix}\right),

so that x_{t+1}^t = x_{t+1}^{t-1} + K_t\epsilon_t, with gain K_t = [\Phi P_t^{t-1}A_t' + S]\Sigma_t^{-1}. The prediction error is

x_{t+1} - x_{t+1}^t = (\Phi - K_tA_t)(x_t - x_t^{t-1}) + [\,I \;\; -K_t\,]\begin{pmatrix} w_t \\ v_t \end{pmatrix}.

Then

P_{t+1}^t = E[(x_{t+1} - x_{t+1}^t)(x_{t+1} - x_{t+1}^t)'] = (\Phi - K_tA_t)P_t^{t-1}(\Phi - K_tA_t)' + [\,I \;\; -K_t\,]\begin{pmatrix} Q & S \\ S' & R \end{pmatrix}\begin{pmatrix} I \\ -K_t' \end{pmatrix}.
Figure 6: Joint bootstrap distribution, B = 200, of the estimators of \phi and \sigma_w for Problem 6.19.
or, in vector form,

\begin{pmatrix} x_{t1} \\ x_{t-1,1} \\ x_{t2} \\ x_{t-1,2} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & \phi_{21} & \phi_{22} \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_{t-1,1} \\ x_{t-2,1} \\ x_{t-1,2} \\ x_{t-2,2} \end{pmatrix} + \begin{pmatrix} w_{t1} \\ 0 \\ w_{t2} \\ 0 \end{pmatrix},

with

Q = \mathrm{diag}(\sigma_{w1}^2, 0, \sigma_{w2}^2, 0),

and observation equation

y_t = \mu_y + A_t\begin{pmatrix} x_{t1} \\ x_{t-1,1} \\ x_{t2} \\ x_{t-1,2} \end{pmatrix} + v_t,

where A_t = [\,1 \;\, 0 \;\, 0 \;\, 0\,] or [\,1 \;\, 0 \;\, 1 \;\, 0\,].
  Estimate   Estimated Standard Error
   1.315          0.300
   0.515          0.381
   0.480          0.294
   0.201          0.424
   0.595          0.102

  Estimate   Estimated Standard Error
   1.290          0.244
   0.517          0.302
   0.747          0.171
   0.537          0.126
   2.000          0.189
Chapter 7
7.1 Note first that

(C - iQ)(v^c - iv^s) = \lambda(v^c - iv^s)

implies that

\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}\begin{pmatrix} v^c \\ v^s \end{pmatrix} = \lambda\begin{pmatrix} v^c \\ v^s \end{pmatrix},

so each eigenvalue \lambda_1, \ldots, \lambda_p of C - iQ is an eigenvalue of the 2p \times 2p real matrix with multiplicity
two, and

\det\Big\{\tfrac{1}{2}\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}\Big\} = 2^{-2p}\prod_{j=1}^p \lambda_j^2.

But

|f| = \prod_{j=1}^p \lambda_j,

so that

\det\Big\{\tfrac{1}{2}\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}\Big\} = 2^{-2p}|f|^2.

Next, with W = f^{-1}Y,

Y^*f^{-1}Y = Y^*W = (Y^{c\prime} + iY^{s\prime})(W^c - iW^s) = Y^{c\prime}W^c + Y^{s\prime}W^s,

where

\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}\begin{pmatrix} W^c \\ W^s \end{pmatrix} = \begin{pmatrix} Y^c \\ Y^s \end{pmatrix}.

Hence, with Y = X - M, the quadratic form in the real density,

\tfrac{1}{2}\begin{pmatrix} X^c - M^c \\ X^s - M^s \end{pmatrix}'\Big\{\tfrac{1}{2}\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}\Big\}^{-1}\begin{pmatrix} X^c - M^c \\ X^s - M^s \end{pmatrix},

can be written as

\begin{pmatrix} Y^c \\ Y^s \end{pmatrix}'\begin{pmatrix} C & -Q \\ Q & C \end{pmatrix}^{-1}\begin{pmatrix} Y^c \\ Y^s \end{pmatrix} = \begin{pmatrix} Y^c \\ Y^s \end{pmatrix}'\begin{pmatrix} W^c \\ W^s \end{pmatrix} = Y^{c\prime}W^c + Y^{s\prime}W^s = Y^*f^{-1}Y.
7.2 Substitute L\hat f from (5.6) into (5.5) to rewrite the negative of the log likelihood as

-\ln L(X_1, \ldots, X_L; f) = L\ln|f| + \sum_{\ell=1}^L \mathrm{tr}\{f^{-1}(X_\ell - M)(X_\ell - M)^*\}
= L\ln|f| + L\,\mathrm{tr}\{\hat f f^{-1}\}
= -L\ln|\hat f f^{-1}| + L\,\mathrm{tr}\{\hat f f^{-1}\} + L\ln|\hat f|
= -L\ln|P^*\hat f P| + L\,\mathrm{tr}\{P^*\hat f P\} + L\ln|\hat f|
= -L\ln|\Lambda| + L\,\mathrm{tr}\{\Lambda\} + L\ln|\hat f|
= -L\sum_{i=1}^p \ln\lambda_i + L\sum_{i=1}^p \lambda_i - Lp + Lp + L\ln|\hat f|
= L\sum_{i=1}^p (\lambda_i - \ln\lambda_i - 1) + Lp + L\ln|\hat f|
\ge Lp + L\ln|\hat f|,

where P is chosen so that PP^* = f^{-1} and P^*\hat f P = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p), with equality when
\Lambda = I, or P^*\hat f P = I, so that

f = (P^*)^{-1}P^{-1} = \hat f.
7.3

MSE = \gamma_y(0) - \sum_{r=-\infty}^{\infty}\beta_r\gamma_{xy}(r)
= \int_{-1/2}^{1/2} f_y(\nu)\,d\nu - \int_{-1/2}^{1/2}\sum_{r=-\infty}^{\infty}\beta_r e^{2\pi i\nu r}f_{xy}(\nu)\,d\nu
= \int_{-1/2}^{1/2}\Big[f_y(\nu) - \frac{|f_{xy}(\nu)|^2}{f_x(\nu)}\Big]\,d\nu
= \int_{-1/2}^{1/2} f_y(\nu)\big[1 - \rho_{xy}^2(\nu)\big]\,d\nu.
Note that the covariance function \gamma_{\hat y y}(h) can be written

\gamma_{\hat y y}(h) = E(\hat y_{t+h}y_t) = E\Big[\sum_{r=-\infty}^{\infty}\beta_r x_{t+h-r}\,y_t\Big] = \sum_{r=-\infty}^{\infty}\beta_r\gamma_{xy}(h - r)
= \int_{-1/2}^{1/2}\sum_{r=-\infty}^{\infty}\beta_r e^{-2\pi i\nu r}f_{xy}(\nu)e^{2\pi i\nu h}\,d\nu
= \int_{-1/2}^{1/2} B^*(\nu)f_{xy}(\nu)e^{2\pi i\nu h}\,d\nu
= \int_{-1/2}^{1/2} f_{xy}^*(\nu)f_x^{-1}(\nu)f_{xy}(\nu)e^{2\pi i\nu h}\,d\nu.

Similarly,

\gamma_{\hat y}(h) = E(\hat y_{t+h}\hat y_t) = E\Big[\sum_{r=-\infty}^{\infty}\beta_r x_{t+h-r}\sum_{s=-\infty}^{\infty}\beta_s x_{t-s}\Big] = \sum_{r,s}\beta_r\gamma_x(h - r + s)\beta_s
= \int_{-1/2}^{1/2}\sum_{r}\beta_r e^{-2\pi i\nu r}\sum_{s}\beta_s e^{2\pi i\nu s}f_x(\nu)e^{2\pi i\nu h}\,d\nu
= \int_{-1/2}^{1/2} B^*(\nu)f_x(\nu)B(\nu)e^{2\pi i\nu h}\,d\nu
= \int_{-1/2}^{1/2} f_{xy}^*(\nu)f_x^{-1}(\nu)f_x(\nu)f_x^{-1}(\nu)f_{xy}(\nu)e^{2\pi i\nu h}\,d\nu
= \int_{-1/2}^{1/2} f_{xy}^*(\nu)f_x^{-1}(\nu)f_{xy}(\nu)e^{2\pi i\nu h}\,d\nu,

so that

f_{\hat y}(\nu) = f_{xy}^*(\nu)f_x^{-1}(\nu)f_{xy}(\nu).
It follows that

\hat B = \Big(\sum_{k=1}^L Y_kX_k^*\Big)\Big(\sum_{k=1}^L X_kX_k^*\Big)^{-1} = \hat f_{xy}^*\hat f_x^{-1},

which is the sample version of (5.16). To verify the first part of the last equation, note that

Y^*Y = \sum_{k=1}^L |Y_k|^2 = L\hat f_y

and

Y^*X(X^*X)^{-1}X^*Y = \Big(\sum_{k=1}^L Y_kX_k^*\Big)\Big(\sum_{k=1}^L X_kX_k^*\Big)^{-1}\Big(\sum_{k=1}^L X_kY_k^*\Big) = L\hat f_{xy}^*\hat f_x^{-1}\hat f_{xy}.
\hat\zeta_t = \sum_{s=-\infty}^{\infty}h_s y_{t-s},

and

E[\hat\zeta_t] = E\Big[\sum_{r,s,j}a_r h_s Z_{t-r-s-j}\beta_j\Big] = \int_{-1/2}^{1/2} A^*(\nu)H(\nu)Z(\nu)B(\nu)e^{2\pi i\nu t}\,d\nu.

Now,

\zeta_t = \sum_{r=-\infty}^{\infty}a_r\beta_{t-r} = \int_{-1/2}^{1/2} A^*(\nu)B(\nu)e^{2\pi i\nu t}\,d\nu.

For an estimator \tilde\zeta_t based on any other filter g_s, we would have

E[(\zeta_t - \tilde\zeta_t)^2] = E[(\zeta_t - \hat\zeta_t)^2] + E[(\hat\zeta_t - \tilde\zeta_t)^2] + 2E[(\zeta_t - \hat\zeta_t)(\hat\zeta_t - \tilde\zeta_t)].

The first two terms on the right-hand side are positive and the result is shown if the cross product term
is zero. We have

E[(\zeta_t - \hat\zeta_t)(\hat\zeta_t - \tilde\zeta_t)] = \sum_{r,s}\sum_{j,k}a_r(g_s - h_s)E[v_{t-r-s}v_{t-j-k}]h_k a_j
= \int_{-1/2}^{1/2} A^*(\nu)[G(\nu) - H(\nu)]H^*(\nu)\,d\nu
= \int_{-1/2}^{1/2} A^*(\nu)[G(\nu) - H(\nu)]Z(\nu)[Z^*(\nu)Z(\nu)]^{-1}\,d\nu
= \int_{-1/2}^{1/2} A^*(\nu)[G(\nu)Z(\nu) - H(\nu)Z(\nu)][Z^*(\nu)Z(\nu)]^{-1}\,d\nu = 0,

because both filters satisfy the same constraint G(\nu)Z(\nu) = H(\nu)Z(\nu).
In this problem, the signal is observed with delays \tau_1, \ldots, \tau_N, so that the frequency domain regression
matrix is

Z(\nu) = \begin{pmatrix} 1 & e^{2\pi i\nu\tau_1} \\ 1 & e^{2\pi i\nu\tau_2} \\ \vdots & \vdots \\ 1 & e^{2\pi i\nu\tau_N} \end{pmatrix}.

Then,

S_z(\nu) = Z^*(\nu)Z(\nu) = \begin{pmatrix} N & \sum_{j=1}^N e^{2\pi i\nu\tau_j} \\ \sum_{j=1}^N e^{-2\pi i\nu\tau_j} & N \end{pmatrix} = N\begin{pmatrix} 1 & \varphi(\nu) \\ \bar\varphi(\nu) & 1 \end{pmatrix},

where \varphi(\nu) = N^{-1}\sum_{j=1}^N e^{2\pi i\nu\tau_j}, and

S_z^{-1}(\nu) = \frac{N^{-1}}{1 - |\varphi(\nu)|^2}\begin{pmatrix} 1 & -\varphi(\nu) \\ -\bar\varphi(\nu) & 1 \end{pmatrix}

for \nu \ne 0. If we apply Z^*(\nu) directly to the transform vector Y(\nu) = (Y_1(\nu), \ldots, Y_N(\nu))', we obtain
a 2 \times 1 vector containing N\bar Y(\nu) and

NB_w(\nu) = \sum_{j=1}^N e^{-2\pi i\nu\tau_j}Y_j(\nu),

which leads to the desired result, on multiplying by the 2 \times 2 matrix S_z^{-1}(\nu).
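A quick verification (not in the original) that the stated inverse is correct, writing \varphi = \varphi(\nu):

S_z(\nu)S_z^{-1}(\nu) = N\begin{pmatrix} 1 & \varphi \\ \bar\varphi & 1 \end{pmatrix}\cdot\frac{N^{-1}}{1 - |\varphi|^2}\begin{pmatrix} 1 & -\varphi \\ -\bar\varphi & 1 \end{pmatrix} = \frac{1}{1 - |\varphi|^2}\begin{pmatrix} 1 - |\varphi|^2 & 0 \\ 0 & 1 - |\varphi|^2 \end{pmatrix} = I.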
7.8 For computations, it is convenient to let u = t - r in the model (5.66), so that

y_t = \sum_{u=-\infty}^{\infty}z_u'\beta_{t-u} + v_t,

and to consider estimators of the form

\hat\beta_t = \sum_{r=-\infty}^{\infty}h_r y_{t-r},

where the filters h_r satisfy

E[\hat\beta_t\,y_{t-s}'] = E[\beta_t\,y_{t-s}'].
Since \beta_t and v_t are uncorrelated, the left-hand side of the above becomes

E\Big[\beta_t\sum_{u=-\infty}^{\infty}\beta_{t-s-u}'z_u\Big] = \sum_{u=-\infty}^{\infty}\Gamma(s + u)z_u = \int_{-1/2}^{1/2} f_\beta(\nu)Z^*(\nu)e^{2\pi i\nu s}\,d\nu,

while the right-hand side is

E\Big[\sum_{r=-\infty}^{\infty}h_r y_{t-r}y_{t-s}'\Big] = \sum_{r=-\infty}^{\infty}h_r\Gamma_y(s - r) = \int_{-1/2}^{1/2} H(\nu)f_y(\nu)e^{2\pi i\nu s}\,d\nu.

Equating the Fourier transforms of the left and right sides gives Hf_y = f_\beta Z^*, or

H = f_\beta Z^*(Zf_\beta Z^* + f_vI)^{-1},

where the frequency arguments are suppressed to save notation. To get the final form, note that (4.56)
can be written as

AC^*(CAC^* + B)^{-1} = (A^{-1} + C^*B^{-1}C)^{-1}C^*B^{-1}

for the complex case, implying the form

H = \Big(f_\beta^{-1}I + f_v^{-1}Z^*Z\Big)^{-1}Z^*f_v^{-1} = \Big(S_z + \frac{f_v}{f_\beta}I\Big)^{-1}Z^*

for the optimal filters. To derive the mean square error, note that

MSE = E[(\beta_t - \hat\beta_t)\beta_t^*] = E[\beta_t\beta_t^*] - E[\hat\beta_t\beta_t^*].

The second term is

E[\hat\beta_t\beta_t^*] = E\Big[\sum_{s,u}h_s z_u'\beta_{t-s-u}\beta_t^*\Big] = \sum_{s,u}h_s z_u'\Gamma(s + u) = \int_{-1/2}^{1/2} H(\nu)Z(\nu)f_\beta(\nu)\,d\nu.

Hence

MSE = \int_{-1/2}^{1/2}\big[f_\beta(\nu) - H(\nu)Z(\nu)f_\beta(\nu)\big]\,d\nu,

with H(\nu)Z(\nu)f_\beta(\nu) = f_\beta Z^*(Zf_\beta Z^* + f_vI)^{-1}Zf_\beta.
\mathrm{tr}\{ZS_z^{-1}Z^*E[YY^*]\} = \mathrm{tr}\{ZS_z^{-1}Z^*(f_\beta ZZ^* + f_vI)\}
= f_\beta\,\mathrm{tr}\{ZS_z^{-1}Z^*ZZ^*\} + f_v\,\mathrm{tr}\{ZS_z^{-1}Z^*\}
= f_\beta\,\mathrm{tr}\{ZZ^*\} + f_v\,\mathrm{tr}\{S_z^{-1}S_z\}
= f_\beta\,\mathrm{tr}\{S_z\} + qf_v.

When the spectrum is cumulated over L frequencies, the multiplier L appears.
7.10 Again, suppressing the frequency subscripts, the model Y = ZB + V takes the vector form

\begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1N} \\ Y_{21} \\ \vdots \\ Y_{2N} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \\ 1 & -1 \end{pmatrix}\begin{pmatrix} B_1 \\ B_2 \end{pmatrix} + \begin{pmatrix} V_{11} \\ \vdots \\ V_{1N} \\ V_{21} \\ \vdots \\ V_{2N} \end{pmatrix},

where B_1 is the DFT of \mu_t and B_2 is the DFT of \alpha_{1t}, with \alpha_{2t} = -\alpha_{1t}. The null hypothesis is that
B_2 = 0. Now, in (5.52),

s_{zy} = \begin{pmatrix} N(\bar Y_1 + \bar Y_2) \\ N(\bar Y_1 - \bar Y_2) \end{pmatrix}

and

s^2_{y\cdot z} = \sum_{i=1}^2\sum_{j=1}^N |Y_{ij}|^2 - \big(N(\bar Y_1 + \bar Y_2),\; N(\bar Y_1 - \bar Y_2)\big)^*\begin{pmatrix} 2N & 0 \\ 0 & 2N \end{pmatrix}^{-1}\begin{pmatrix} N(\bar Y_1 + \bar Y_2) \\ N(\bar Y_1 - \bar Y_2) \end{pmatrix}
= \sum_{j=1}^N |Y_{1j}|^2 + \sum_{j=1}^N |Y_{2j}|^2 - \frac{N}{2}|\bar Y_1 + \bar Y_2|^2 - \frac{N}{2}|\bar Y_1 - \bar Y_2|^2
= \sum_{i=1}^2\sum_{j=1}^N |Y_{ij} - \bar Y_i|^2,

which is (5.85). Under the reduced model, s_{1y} = N(\bar Y_1 + \bar Y_2) and S_{11} = 2N, so that

s^2_{y\cdot 1} = \sum_{j=1}^N |Y_{1j}|^2 + \sum_{j=1}^N |Y_{2j}|^2 - \frac{N}{2}|\bar Y_1 + \bar Y_2|^2.
Then, substitute

\bar Y = \tfrac{1}{2}(\bar Y_1 + \bar Y_2)

to obtain

s^2_{y\cdot 1} = \sum_{j=1}^N |Y_{1j}|^2 + \sum_{j=1}^N |Y_{2j}|^2 - 2N|\bar Y|^2.

Then,

RSS = s^2_{y\cdot 1} - s^2_{y\cdot z} = -2N|\bar Y|^2 + N|\bar Y_1|^2 + N|\bar Y_2|^2 = N\sum_{i=1}^2 |\bar Y_i - \bar Y|^2,

which is (5.84).
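The last equality uses \bar Y_1 + \bar Y_2 = 2\bar Y; written out (a step the solution leaves implicit):

\sum_{i=1}^2 |\bar Y_i - \bar Y|^2 = \sum_{i=1}^2 |\bar Y_i|^2 - 2\,\mathrm{Re}\Big(\bar Y^*\sum_{i=1}^2 \bar Y_i\Big) + 2|\bar Y|^2 = \sum_{i=1}^2 |\bar Y_i|^2 - 4|\bar Y|^2 + 2|\bar Y|^2 = \sum_{i=1}^2 |\bar Y_i|^2 - 2|\bar Y|^2.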
7.11 Use the model

y_{ijt} = \beta_{it} + v_{ijt},

for i = 1, \ldots, I, j = 1, \ldots, N_i, and write the frequency domain version in terms of the vector

Y = (Y_{11}, \ldots, Y_{1N_1}, Y_{21}, \ldots, Y_{2N_2}, \ldots, Y_{I1}, \ldots, Y_{IN_I})'.

The matrix Z has N_1 ones in elements 1 to N_1 of column 1, and zeros elsewhere, N_2 ones in elements
N_1 + 1, \ldots, N_1 + N_2 of column 2, etc. It follows that

S_z = Z^*Z = \mathrm{diag}(N_1, N_2, \ldots, N_I)

and

Z^*Y = (N_1\bar Y_1, N_2\bar Y_2, \ldots, N_I\bar Y_I)',

so that

\hat B = (\bar Y_1, \bar Y_2, \ldots, \bar Y_I)'

and

A^*\hat B = \sum_{i=1}^I A_i^*\bar Y_i.

Finally,

Q(A) = (A_1, A_2, \ldots, A_I)^*\,\mathrm{diag}\Big(\frac{1}{N_1}, \frac{1}{N_2}, \ldots, \frac{1}{N_I}\Big)(A_1, A_2, \ldots, A_I) = \sum_{i=1}^I \frac{|A_i|^2}{N_i}.
Write

\hat f_1 = (N_1 - 1)^{-1}\sum_{j=1}^{N_1}|Y_{1j} - \bar Y_1|^2 \quad and \quad \hat f_2 = (N_2 - 1)^{-1}\sum_{j=1}^{N_2}|Y_{2j} - \bar Y_2|^2.

Hence, from Table 5.2, the error power components will have a chi-squared distribution, say

2(N_i - 1)\hat f_i/f_i \sim \chi^2_{2(N_i - 1)}

for i = 1, 2, and the two samples are assumed to be independent. It follows that the ratio

\frac{\hat f_1/f_1}{\hat f_2/f_2} = \frac{\chi^2_{[2(N_1-1)]}/2(N_1 - 1)}{\chi^2_{[2(N_2-1)]}/2(N_2 - 1)} \sim F_{[2(N_1-1),\,2(N_2-1)]},

so that

\frac{\hat f_1}{\hat f_2}\cdot\frac{f_2}{f_1} \sim F_{[2(N_1-1),\,2(N_2-1)]}.
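For example (the numbers of ordinates N_1 = N_2 = 8 are assumed for illustration, not taken from the text), the cutoffs for an equal-tail .95 test come from qf in R:

```r
# Cutoffs for the F ratio (f1.hat/f2.hat)(f2/f1) with N1 = N2 = 8,
# i.e., an F distribution with 2(N1 - 1) = 14 and 2(N2 - 1) = 14 df.
N1 <- 8; N2 <- 8
upper <- qf(.975, 2 * (N1 - 1), 2 * (N2 - 1))
lower <- qf(.025, 2 * (N1 - 1), 2 * (N2 - 1))
c(lower = lower, upper = upper)
```

With equal degrees of freedom the two cutoffs are reciprocals of each other.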
7.13 In the notation of (5.113), \mu_1 = s, \mu_2 = 0, \Sigma_1 = \Sigma_2 = \sigma_w^2 I, and

d_L(x) = \frac{1}{\sigma_w^2}\,s'x - \frac{1}{2\sigma_w^2}\,s's + \ln\frac{\pi_1}{\pi_2}
= \frac{1}{\sigma_w^2}\sum_{t=1}^n s_tx_t - \frac{1}{2\sigma_w^2}\sum_{t=1}^n s_t^2 + \ln\frac{\pi_1}{\pi_2}
= \frac{1}{\sigma_w^2}\sum_{t=1}^n s_tx_t - \frac{1}{2}\,\frac{S}{N} + \ln\frac{\pi_1}{\pi_2},

where S/N = \sum_{t=1}^n s_t^2/\sigma_w^2 denotes the signal-to-noise ratio. When \pi_1 = \pi_2, the last term disappears
and we may use (5.115) for the two error probabilities, with

D^2 = \frac{s's}{\sigma_w^2} = \frac{S}{N}.
7.14 In this case, \mu_1 = \mu_2 = 0, \Sigma_1 = (\sigma_s^2 + \sigma_w^2)I, \Sigma_2 = \sigma_w^2 I, so that (5.115) becomes

d_q(x) = \frac{1}{2}\Big[\frac{1}{\sigma_w^2} - \frac{1}{\sigma_s^2 + \sigma_w^2}\Big]x'x - \frac{n}{2}\ln\frac{\sigma_s^2 + \sigma_w^2}{\sigma_w^2} + \ln\frac{\pi_1}{\pi_2}
= \frac{1}{2}\,\frac{\sigma_s^2}{\sigma_w^2(\sigma_s^2 + \sigma_w^2)}\sum_{t=1}^n x_t^2 - \frac{n}{2}\ln\Big(1 + \frac{S}{N}\Big) + \ln\frac{\pi_1}{\pi_2},

where

\frac{S}{N} = \frac{\sigma_s^2}{\sigma_w^2},

so that, for the quadratic criterion with \pi_1 = \pi_2, we accept \Pi_1 or \Pi_2 according to whether the statistic

T(x) = \frac{1}{2}\,\frac{\sigma_s^2}{\sigma_w^2(\sigma_s^2 + \sigma_w^2)}\sum_{t=1}^n x_t^2

exceeds or falls below

K = \frac{n}{2}\ln\Big(1 + \frac{S}{N}\Big).

Now, under \Pi_1, \sum_t x_t^2 \sim (\sigma_s^2 + \sigma_w^2)\chi^2_n, whereas, under \Pi_2, \sum_t x_t^2 \sim \sigma_w^2\chi^2_n, so that

T(x) \sim \tfrac{1}{2}\,\frac{S}{N}\,\chi^2_n \quad under \; \Pi_1, \qquad T(x) \sim \tfrac{1}{2}\,\frac{S/N}{1 + S/N}\,\chi^2_n \quad under \; \Pi_2.
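The distributional results can be used to compute the two error probabilities numerically. In the sketch below the values of n and S/N are assumed for illustration, not taken from the text:

```r
# Error probabilities for the quadratic detector: T(x) is compared
# with K = (n/2) log(1 + S/N); under Pi_1, 2T/(S/N) ~ chi^2_n and
# under Pi_2, 2T(1 + S/N)/(S/N) ~ chi^2_n.
n <- 100; snr <- 1                               # assumed values
K <- (n / 2) * log(1 + snr)
miss  <- pchisq(2 * K / snr, df = n)                    # P(T < K | Pi_1)
false <- 1 - pchisq(2 * K * (1 + snr) / snr, df = n)    # P(T > K | Pi_2)
c(miss = miss, false.alarm = false)
```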
7.15 AwakeHeat: Figures 1 and 2 are the figures corresponding to Figures 5.14 and 5.15 of the text (Example
5.14), except that here, Caudate is included (location number 5). AwakeShock: the corresponding
figures are Figures 3 and 4 below. The listing below is similar to Table 5.8, but for AwakeHeat and
AwakeShock. Note that \chi^2_2(.95, .975, .99) = 5.99, 7.38, 9.21.
                AWAKEHEAT
 loc    e     chi^2      loc    e     chi^2
  1   .467   842.01       6   .121    8.09
  2   .450   150.16       7   .012    0.07
  3   .460   623.62       8   .323   64.99
  4   .363   104.45       9   .254   46.76
  5   .230    30.32

                AWAKESHOCK
 loc    e     chi^2      loc    e      chi^2
  1   .410   233.39       6   0.139   13.952
  2   .416   138.42       7   0.161   18.848
  3   .387   389.46       8   0.309  252.75
  4   .370   352.68       9   0.398  539.46
  5   .269   107.23
7.16 (a) See Figure 5. The P components have broad power at the midrange frequencies, whereas the S
components have broad power at the lower frequencies.
(b) See Figure 7. There appears to be little or no coherence between the P and S components.
(c) and (d) See Figure 6. These figures support the overall conclusion of part (a).
(e) See Figure 8. The canonical variate series appear to be strongly coherent (in contrast to the
individual series).
The model is

\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}z + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{pmatrix}.

This implies

\begin{pmatrix} 1 & .4 & .9 \\ .4 & 1 & .7 \\ .9 & .7 & 1 \end{pmatrix} = \begin{pmatrix} b_1^2 + \delta_1^2 & b_1b_2 & b_1b_3 \\ b_1b_2 & b_2^2 + \delta_2^2 & b_2b_3 \\ b_1b_3 & b_2b_3 & b_3^2 + \delta_3^2 \end{pmatrix}.

Now b_1b_2 = .4, b_1b_3 = .9, and b_2b_3 = .7, so that

b_1 = \frac{4}{7}\,b_3.

But then

\frac{4}{7}\,b_3^2 = .9.

This means that b_3^2 = .9 \cdot \frac{7}{4} = 1.575. But we also have b_3^2 + \delta_3^2 = 1, in which case \delta_3^2 < 0, which is not a
valid variance.
7.19 Note that f(\nu) = f^{re}(\nu) - if^{im}(\nu). Now

f^{im}(-\nu) = \sum_h \Gamma(h)\sin(-2\pi\nu h) = -\sum_h \Gamma(h)\sin(2\pi\nu h) = -f^{im}(\nu);

that is, the imaginary part is skew symmetric. Also note that

f^{re}(-\nu) = \sum_h \Gamma(h)\cos(-2\pi\nu h) = \sum_h \Gamma(h)\cos(2\pi\nu h) = f^{re}(\nu);

that is, the real part is symmetric.

Next, because c'f^{im}(\nu)c is a scalar, c'f^{im}(\nu)c = (c'f^{im}(\nu)c)' = c'f^{im}(\nu)'c = -c'f^{im}(\nu)c. This
result implies c'f^{im}(\nu)c = 0 for any real-valued vector c.
Figure 8: Problem 7.16 (c) and (d). Squared coherency between canonical variate series.
Figure 9: Problem 7.20. This is the equivalent of Figure 7.23 but for the herpesvirus saimiri.
Figure 10: Problem 7.21 (a). Estimated spectral density of NYSE returns.
Figure 11: Problem 7.21 (b). Spectral envelope of NYSE returns with respect to G = \{x, |x|, x^2\}.
Figure 12: Problem 7.21 (b). The optimal transformation (solid line) and the usual square transformation (dashed line).