
Solutions to Chapter 11

11.1. For minimum variance, we want to minimize the diagonal terms of
$$\epsilon^2 \triangleq E\left[(\hat{Y} - Y)(\hat{Y} - Y)^T\right],$$
where $\hat{Y} = AX$. Then
$$\epsilon^2 = E\left[(AX - Y)(AX - Y)^T\right] = A K_1 A^T + K_2 - A K_{12} - K_{21} A^T.$$
Now write $A = A_0 + \delta$, where $A_0 \triangleq K_{21} K_1^{-1}$ (so that $A_0 K_1 = K_{21}$ and $K_1 A_0^T = K_{12}$); we show that $\delta = 0$ gives the minimum variance:
$$\begin{aligned}
\epsilon^2 &= (A_0 + \delta) K_1 (A_0 + \delta)^T + K_2 - (A_0 + \delta) K_{12} - K_{21} (A_0 + \delta)^T \\
&= A_0 K_1 A_0^T + K_2 - A_0 K_{12} - K_{21} A_0^T + \delta K_1 \delta^T + A_0 K_1 \delta^T + \delta K_1 A_0^T - \delta K_{12} - K_{21} \delta^T \\
&= E\left[(A_0 X - Y)(A_0 X - Y)^T\right] + \delta K_1 \delta^T + A_0 K_1 \delta^T + \delta K_1 A_0^T - \delta K_{12} - K_{21} \delta^T \\
&= E\left[(A_0 X - Y)(A_0 X - Y)^T\right] + \delta K_1 \delta^T + K_{21} \delta^T + \delta K_{12} - \delta K_{12} - K_{21} \delta^T \\
&= E\left[(A_0 X - Y)(A_0 X - Y)^T\right] + \delta K_1 \delta^T.
\end{aligned}$$

Therefore,

$$\operatorname{tr}\epsilon^2 = \operatorname{tr} E\left[(Y - A_0 X)(Y - A_0 X)^T\right] + \operatorname{tr}\left(\delta K_1 \delta^T\right)
= \sum_{i=1}^{n} E\left[\Big(y_i - \sum_j a^{(0)}_{ij} x_j\Big)^2\right] + \operatorname{tr}\left(\delta C C^T \delta^T\right),$$
where we use the factorization $K_1 = C C^T$. Since $\operatorname{tr}\left(\delta C C^T \delta^T\right) = \operatorname{tr}\left[\delta C (\delta C)^T\right] \ge 0$ and the first term does not depend on $\delta$, clearly we must set $\delta = 0$ for $\operatorname{tr}\epsilon^2$ to be minimum.
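A small numerical sanity check (not part of the original solution) is sketched below in Matlab: it estimates $K_1$ and $K_{21}$ from simulated data, forms $A_0 = K_{21}K_1^{-1}$, and confirms that perturbing $A_0$ increases the trace of the error covariance. The dimensions, mixing matrix, and noise level are arbitrary assumptions.

% Sanity check that A0 = K21*inv(K1) minimizes tr E[(A*X - Y)(A*X - Y)'].
randn('state',0);
n = 5000;                               % number of sample vectors
X = randn(2,n);                         % zero-mean 2-vector input
Y = [1 2; -1 0.5]*X + 0.3*randn(2,n);   % output correlated with X (assumed model)
K1  = (X*X')/n;                         % sample K1  = E[X X']
K21 = (Y*X')/n;                         % sample K21 = E[Y X']
A0  = K21/K1;                           % LMMSE coefficient matrix K21*inv(K1)
E0  = A0*X - Y;                         % error using A0
Ad  = A0 + 0.1*randn(2,2);              % perturbed matrix A0 + delta
E1  = Ad*X - Y;                         % error using the perturbed matrix
[trace(E0*E0')/n, trace(E1*E1')/n]      % the first entry should be the smaller one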

11.2. Suppose $Y_1 = Y - \mu_2$ and $X_1 = X - \mu_1$. Then we know that $E[Y_1] = 0 = E[X_1]$, $K_1 = E[X_1 X_1^T]$, $K_2 = E[Y_1 Y_1^T]$, $K_{12} = E[X_1 Y_1^T]$, $K_{21} = E[Y_1 X_1^T]$. Hence, from Problem 11.1, the minimum variance estimator of $Y_1$ of the form $\hat{Y}_1 = AX_1$ is given by
$$\hat{Y}_1 = K_{21} K_1^{-1}(X - \mu_1).$$

But is Ŷ1 equal to Ŷ − µ2 ?


Write Y = Y1 + µ2 . Let β be such that Ŷ = Ŷ1 + β. Then

$$\begin{aligned}
\epsilon_T^2 &= E\left[(\hat{Y} - Y)(\hat{Y} - Y)^T\right] \\
&= E\left[\big[(\hat{Y}_1 - Y_1) + (\beta - \mu_2)\big]\big[(\hat{Y}_1 - Y_1) + (\beta - \mu_2)\big]^T\right] \\
&= E\left[(\hat{Y}_1 - Y_1)(\hat{Y}_1 - Y_1)^T\right] + (\beta - \mu_2)(\beta - \mu_2)^T \\
&\quad\text{(the cross terms vanish because } E[\hat{Y}_1 - Y_1] = E[K_{21}K_1^{-1}X_1] - E[Y_1] = 0 - 0 = 0\text{)} \\
&= \epsilon^2 + (\beta - \mu_2)(\beta - \mu_2)^T.
\end{aligned}$$
Then $\operatorname{tr}\epsilon_T^2 = \operatorname{tr}\epsilon^2 + \operatorname{tr}\left[(\beta - \mu_2)(\beta - \mu_2)^T\right]$, and since the second term is $\ge 0$, the total error is minimized by $\beta = \mu_2$. Hence
$$\hat{Y} = \hat{Y}_1 + \mu_2 = \mu_2 + K_{21} K_1^{-1}(X - \mu_1).$$

11.3. Given ǫ2 = E[(X − E[X|Y ])2 ] = E[X(X − E[X|Y ])] − E[E[X|Y ][X − E[X|Y ]]].
However, since the estimate is orthogonal to the error, we have

E[E[X|Y ][X − E[X|Y ]]] = 0. (1)

Therefore,
ǫ2 = E[X[X − E[X|Y ]]]. (2)
By the same reasoning, from Eq. 1: E[E[X|Y ][X − E[X|Y ]]] = 0 =⇒ E[XE[X|Y ]] =
E[[E[X|Y ]]2 ]. But from Eq. 2,

ǫ2 = E[X 2 ] − E[XE[X|Y ]] = E[X 2 ] − E[(E[X|Y ])2 ].

Now let us extend this to random N-vectors.

$$\epsilon^2 = E\left[[X - E[X|Y]][X - E[X|Y]]^T\right] \qquad (3)$$
$$= E[XX^T] - E\left[X E[X|Y]^T\right] - E\left[E[X|Y] X^T\right] + E\left[E[X|Y] E[X|Y]^T\right]. \qquad (4)$$

However, using orthogonality again,
$$E\left[E[X|Y]\,[X - E[X|Y]]^T\right] = 0$$
or
$$E\left[E[X|Y] X^T\right] = E\left[E[X|Y] E[X|Y]^T\right]. \qquad (5)$$

Using Eq. 4 and Eq. 5 together,
$$\epsilon^2 = E[XX^T] - E\left[X E[X|Y]^T\right] = E\left[X[X - E[X|Y]]^T\right]. \qquad (6)$$

Also:
$$E\left[X E[X|Y]^T\right] = E\left[E[X|Y] E^T[X|Y]\right]. \qquad (7)$$

Using Eq. 6 and 7 together, we have
$$\epsilon^2 = E[XX^T] - E\left[E[X|Y] E^T[X|Y]\right].$$
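As a quick numerical illustration (not in the original solution), the scalar identity $\epsilon^2 = E[X^2] - E[(E[X|Y])^2]$ can be checked by Monte Carlo for a zero-mean jointly Gaussian pair, for which $E[X|Y] = \rho(\sigma_X/\sigma_Y)Y$; the correlation value used below is an arbitrary assumption.

% Monte Carlo check of eps^2 = E[X^2] - E[(E[X|Y])^2] for jointly Gaussian X, Y.
randn('state',0);
n = 1e5; rho = 0.6;                      % assumed correlation coefficient
Y = randn(1,n);                          % unit-variance Y
X = rho*Y + sqrt(1 - rho^2)*randn(1,n);  % unit-variance X with E[XY] = rho
condE = rho*Y;                           % E[X|Y] = rho*Y for this Gaussian pair
lhs = mean((X - condE).^2);              % eps^2
rhs = mean(X.^2) - mean(condE.^2);       % E[X^2] - E[(E[X|Y])^2]
[lhs rhs]                                % both approximately 1 - rho^2 = 0.64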

11.4. $G[N] \triangleq E[X[n]\,|\,Y_N]$, $\;Y_N \triangleq [Y[n], \ldots, Y[n-N]]$.

From Problem 8.55, we know $G[N]$ is a Martingale sequence in the parameter $N$.
$$\begin{aligned}
E[X^2[n]] &= E\left[(X[n] - E[X[n]|Y_N] + E[X[n]|Y_N])^2\right] \\
&= E\left[(X[n] - E[X[n]|Y_N])^2\right] + E\left[E^2[X[n]|Y_N]\right] \quad\text{(by the orthogonality principle)},
\end{aligned}$$
so that $E\left[E^2[X[n]|Y_N]\right] \le E[X^2[n]]$. Since $E\big[E[X[n]|Y_N]\big] = E[X[n]]$,
$$\sigma_G^2[N] = E\left[E^2[X[n]|Y_N]\right] - E^2\big[E[X[n]|Y_N]\big] \le E[X^2[n]] - E^2[X[n]].$$
Therefore $\sigma_G^2[N] \le \sigma_X^2[n]$. Let $C \triangleq \sigma_X^2[n]$; then $\sigma_G^2[N] \le C$ for all $N$. By the Martingale convergence theorem 8.8-4, we can conclude that the limit $\lim_{N\to\infty} E[X[n]\,|\,Y_N]$ exists with probability 1.

11.5. The modified Theorem 11.1-3 is given as:
Modified theorem: The LMMSE estimate of the zero-mean sequence $X[n]$ based on the $(p+1)$ most recent terms of the zero-mean random sequence $Y[n]$ is
$$\hat{E}\{X[n]\,|\,Y[n], \ldots, Y[n-p]\} = \sum_{i=0}^{p} a_i^{(p)} Y[n-i],$$
where the $a_i^{(p)}$ satisfy the orthogonality condition
$$\left[X[n] - \sum_{i=0}^{p} a_i^{(p)} Y[n-i]\right] \perp Y[n-k], \quad 0 \le k \le p.$$
Further, the LMMSE is given as
$$\epsilon_{\min}^{2(p)} = E\left\{|X[n]|^2\right\} - \sum_{i=0}^{p} a_i^{(p)} E\left\{Y[n-i]X^*[n]\right\}.$$

(a) Eq. (11.1-25) changes to
$$E[X[n]Y^*[n-k]] = \sum_{i=0}^{p} a_i^{(p)} E\left\{Y[n-i]Y^*[n-k]\right\}, \quad 0 \le k \le p,$$
via the data modification of $\{Y[n], \ldots, Y[0]\}$ being replaced by $\{Y[n], \ldots, Y[n-p]\}$. The equation (11.2-4) changes via
$$\mathbf{a}^{(p)} \triangleq \left[a_0^{(p)}, \ldots, a_p^{(p)}\right]$$
and
$$\mathbf{Y} \triangleq [Y[n], Y[n-1], \ldots, Y[n-p]],$$
thereby reversing time from the previous $\mathbf{Y}$. The cross-covariance vector $\mathbf{K}_{XY}$ then becomes
$$\mathbf{K}_{XY} = E\left\{X[n]Y^*[n], \ldots, X[n]Y^*[n-p]\right\}$$
and the matrix-vector equation (9.1-26) remains essentially the same as
$$\mathbf{a}^{(p)T} = \mathbf{K}_{XY}\mathbf{K}_{YY}^{-1}.$$
The entries of $\mathbf{K}_{YY}$ (whose inverse appears above) are given as
$$\left(\mathbf{K}_{YY}\right)_{ij} = E\left[Y[n-i+1]\,Y^*[n-j+1]\right], \quad 1 \le i,j \le p+1.$$

(b) The modified Eq. (11.1-27) is the same as before with the new $\mathbf{K}_{XY}$ and $\mathbf{K}_{YY}$ inserted into the old version. Nothing else changes:
$$\epsilon_{\min}^{2(p)} = \sigma_X^2(n) - \mathbf{K}_{XY}\mathbf{K}_{YY}^{-1}\mathbf{K}_{XY}^T.$$
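A tiny Matlab illustration of these normal equations (an assumed example, not from the text): take $Y[n]$ with correlation $R_Y[m] = \rho^{|m|}$ and estimate $X[n] = Y[n+1]$, i.e., one-step prediction from the $(p+1)$ most recent samples. The solution should reduce to $\mathbf{a}^{(p)} = [\rho, 0, \ldots, 0]$ with MMSE $1-\rho^2$.

% One-step prediction with an AR(1)-type correlation R_Y[m] = rho^|m| (assumed example).
rho = 0.8; p = 3;
KYY = toeplitz(rho.^(0:p));     % (p+1)x(p+1) covariance matrix of [Y[n],...,Y[n-p]]
KXY = rho.^(1:p+1);             % E{X[n] Y[n-k]} = R_Y[k+1], k = 0..p (row vector)
a    = KXY/KYY;                 % a^(p) = K_XY * inv(K_YY); expect [rho 0 0 0]
mmse = 1 - a*KXY';              % eps_min^2 = sigma_X^2 - K_XY inv(K_YY) K_XY'; expect 1-rho^2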

11.6. $X[n]$ is defined as
$$X[n] \triangleq -\sum_{k=1}^{n} \binom{k+2}{2} X[n-k] + W[n]$$
for $n = 1, 2, \ldots$ with $X[0] = W[0]$; $W[n]$ is Gaussian noise with zero mean and unit variance.

(a) W [n] is the innovation sequence for X[n] because
• W [n] is a white (uncorrelated) sequence.
• It is defined as a causal invertible linear transformation on X[n].
(b) A simple substitution will show the result. From the definition of $X[n]$,
$$W[n] = X[n] + \sum_{k=1}^{n} \binom{k+2}{2} X[n-k] = \sum_{k=0}^{n} \binom{k+2}{2} X[n-k].$$
Then
$$\begin{aligned}
W[n] &- 3W[n-1] + 3W[n-2] - W[n-3] \\
&= \sum_{k=0}^{n}\binom{k+2}{2}X[n-k] - 3\sum_{k=0}^{n-1}\binom{k+2}{2}X[n-1-k] + 3\sum_{k=0}^{n-2}\binom{k+2}{2}X[n-2-k] - \sum_{k=0}^{n-3}\binom{k+2}{2}X[n-3-k] \\
&= X[n],
\end{aligned}$$
since the coefficient of $X[n-j]$ for $j \ge 1$ is the third difference $\binom{j+2}{2} - 3\binom{j+1}{2} + 3\binom{j}{2} - \binom{j-1}{2}$ of a quadratic sequence, which is zero.

(c) $X[n]$ is Gaussian since $W[n]$ is Gaussian.
$$\begin{aligned}
\hat{X}[12|10] &\triangleq \hat{E}\left[X[12]\,|\,X[0], X[1], \ldots, X[10]\right] \\
&= \hat{E}\left[X[12]\,|\,W[0], W[1], \ldots, W[10]\right] \\
&= \hat{E}\left[W[12] - 3W[11] + 3W[10] - W[9]\,|\,W[0], W[1], \ldots, W[10]\right] \\
&= \hat{E}\left[W[12]\,|\,W[0], \ldots, W[10]\right] - 3\hat{E}\left[W[11]\,|\,W[0], \ldots, W[10]\right] \\
&\qquad + 3\hat{E}\left[W[10]\,|\,W[0], \ldots, W[10]\right] - \hat{E}\left[W[9]\,|\,W[0], \ldots, W[10]\right] \\
&= 3W[10] - W[9]
\end{aligned}$$
(because $W[12] \perp \{W[0], W[1], \ldots, W[10]\}$ and $W[11] \perp \{W[0], W[1], \ldots, W[10]\}$).
For the M.S. prediction error,
$$\begin{aligned}
E\left[(X[12] - \hat{X}[12|10])^2\right] &= E\left[\big((W[12] - 3W[11] + 3W[10] - W[9]) - (3W[10] - W[9])\big)^2\right] \\
&= E\left[(W[12] - 3W[11])^2\right] \\
&= E[W^2[12]] + 9E[W^2[11]] - 6E[W[12]W[11]] \\
&= 1 + 9 - 0 = 10.
\end{aligned}$$
Here $E[W[12]W[11]] = 0$ because $W[12] \perp W[11]$ and $W[n]$ is zero mean.
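The innovations representation in (b) is easy to verify numerically; a short Matlab sketch (the horizon N = 20 is an arbitrary choice):

% Verify X[n] = W[n] - 3W[n-1] + 3W[n-2] - W[n-3] for the recursion in 11.6.
N = 20; randn('state',0);
W = randn(1,N+1);                      % W(n+1) holds W[n], n = 0..N
X = zeros(1,N+1); X(1) = W(1);         % X[0] = W[0]
for n = 1:N
    s = 0;
    for k = 1:n
        s = s + nchoosek(k+2,2)*X(n-k+1);   % binomial coefficient C(k+2,2)
    end
    X(n+1) = -s + W(n+1);
end
n = 3:N;
Xalt = W(n+1) - 3*W(n) + 3*W(n-1) - W(n-2);
max(abs(X(n+1) - Xalt))                % essentially zero (rounding error only)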
11.7. (a) The innovations sequence is clearly W [n], because X[n] = X[n − 1] + W [n].
(b)
$$\hat{X}[n|n] = \hat{X}[n-1|n-1] + G_n\left(Y[n] - \hat{X}[n-1|n-1]\right).$$

(c)
$$G_n = \epsilon^2[n]\left(\epsilon^2[n] + \sigma_v^2\right)^{-1}, \quad n \ge 1;$$
$$\epsilon^2[n] = \epsilon^2[n-1](1 - G_{n-1}) + 1, \quad n \ge 1;$$
$$\epsilon^2[0] = E[X^2[0]] = 0.$$
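A minimal Matlab sketch of these scalar recursions is given below. It assumes $\sigma_v^2 = 1$, takes $G_0 = \epsilon^2[0]/(\epsilon^2[0]+\sigma_v^2) = 0$ to start the error recursion, and simulates the random walk $X[n] = X[n-1] + W[n]$ with $X[0] = 0$; these choices are assumptions, not part of the problem statement.

% Scalar Kalman filter for X[n] = X[n-1] + W[n], Y[n] = X[n] + V[n].
N = 50; sv2 = 1; randn('state',0);
X = cumsum(randn(1,N+1)); X = X - X(1);    % random walk with X[0] = 0
Y = X + sqrt(sv2)*randn(1,N+1);            % noisy observations
Xhat = zeros(1,N+1); eps2 = zeros(1,N+1); G = zeros(1,N+1);
eps2(1) = 0; G(1) = 0;                     % eps^2[0] = 0, so G_0 = 0
for n = 1:N                                % Matlab index n+1 holds time n
    eps2(n+1) = eps2(n)*(1 - G(n)) + 1;    % eps^2[n] = eps^2[n-1](1 - G_{n-1}) + 1
    G(n+1) = eps2(n+1)/(eps2(n+1) + sv2);  % G_n = eps^2[n]/(eps^2[n] + sigma_v^2)
    Xhat(n+1) = Xhat(n) + G(n+1)*(Y(n+1) - Xhat(n));   % update from part (b)
end
plot(0:N, X, 0:N, Xhat, 0:N, Y, ':');
legend('X[n]', 'estimate', 'Y[n]');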
11.8. Given $2Y[n+2] + Y[n+1] + Y[n] = 2W[n]$, $W[n] \sim N(0,1)$, $Y[0] = 0$, $Y[1] = 1$, we have for a state-space representation
$$\begin{bmatrix} Y[n+2] \\ Y[n+1] \end{bmatrix} = \begin{bmatrix} -0.5 & -0.5 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} Y[n+1] \\ Y[n] \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} W[n].$$
If the state vector $X[n]$ is defined as
$$X[n] \triangleq \begin{bmatrix} Y[n+2] \\ Y[n+1] \end{bmatrix},$$
we have
$$X[n] = \begin{bmatrix} -0.5 & -0.5 \\ 1 & 0 \end{bmatrix} X[n-1] + \begin{bmatrix} 1 \\ 0 \end{bmatrix} W[n].$$

Also, $E[2Y[n+2] + Y[n+1] + Y[n]] = 2E[W[n]]$, and so we have
$$2\mu_Y[n+2] + \mu_Y[n+1] + \mu_Y[n] = 2\mu_W[n] = 0,$$
since $W[n]$ is zero mean. Obviously, $\mu_Y[0] = 0$ and $\mu_Y[1] = 1$. Then
$$2\mu_Y[2] = -\mu_Y[1] - \mu_Y[0] = -1 \;\text{ or }\; \mu_Y[2] = -0.5,$$
$$2\mu_Y[3] = -\mu_Y[2] - \mu_Y[1] = -0.5 \;\text{ or }\; \mu_Y[3] = -0.25, \;\text{ etc.}$$
We also have E [[2Y [n + 2] + Y [n + 1] + Y [n]]Y [n]] = 2E[W [n]Y [n]] = 0.
E [[2Y [n + 2] + Y [n + 1] + Y [n]]Y [n + 1]] = 2E[W [n]Y [n + 1]] = 0.
Therefore, we can calculate R[n1 , n2 ] for all n1 , n2 . Note for initial conditions, we have

(a) RY [0, 0] = 0 = RY [0, 1] = RY [1, 0], and RY [1, 1] = 1, RY [2, 1] = −0.5 etc.
(b) RY [n1 , n2 ] = RY [n2 , n1 ] since Y is a real sequence.
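The mean recursion above is easy to iterate numerically; a short Matlab sketch (the horizon N = 10 is an arbitrary choice):

% Iterate 2*mu_Y[n+2] = -mu_Y[n+1] - mu_Y[n] with mu_Y[0] = 0, mu_Y[1] = 1.
N = 10;
mu = zeros(1,N+1);                     % mu(n+1) stores mu_Y[n]
mu(1) = 0; mu(2) = 1;
for n = 0:N-2
    mu(n+3) = -(mu(n+2) + mu(n+1))/2;
end
mu(1:4)                                % returns [0 1 -0.5 -0.25], matching the values above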

11.9. (a) From the equation, we have X[n] = AX[n − 1] + BW [n]. We have

µX [n] , E[X[n]] = E [AX[n − 1] + BW [n]] = AµX [n − 1] + BµW [n].

From the equation, Y [n] = X[n] + V [n]. We also have

µY [n] , E[Y [n]] = E[X[n]] + E[V [n]] = µX [n] + µV [n].

(b) Since µX [n], µY [n] are deterministic variables, we have

X̂[n|n] , E[X[n]|Y [0], . . . , Y [n]]


= E[XC [n] + µX [n]|Y [0], . . . , Y [n]]
= E[XC [n]|Y [0], . . . , Y [n]] + µX [n]
= E[XC [n]|YC [0] + µY [0], . . . , YC [n] + µY [n]] + µX [n]
= E[XC [n]|YC [0], . . . , YC [n]] + µX [n]
= X̂C [n|n] + µX [n].

(c) Since $X_C[n] = AX_C[n-1] + BW_C[n]$ and $Y_C[n] = X_C[n] + V_C[n]$ are the same as in (11.2-6 and 7), we can use the same estimate equation to estimate $\hat{X}_C[n|n]$, i.e.,
$$\hat{X}_C[n|n] = A\hat{X}_C[n-1|n-1] + G_n\left[Y_C[n] - A\hat{X}_C[n-1|n-1]\right].$$
So,
$$\hat{X}_C[n|n] = A\hat{X}[n-1|n-1] + G_n\left[Y[n] - A\hat{X}[n-1|n-1]\right] - A\mu_X[n-1] - G_n\mu_Y[n] + G_n A\mu_X[n-1].$$
Therefore,
$$\hat{X}_C[n|n] = A\hat{X}[n-1|n-1] + G_n\left[Y[n] - A\hat{X}[n-1|n-1]\right] - (A - G_n A)\mu_X[n-1] - G_n\mu_Y[n].$$

(d) Since the gain and error covariance equations just depend on $\sigma_V^2[n]$, $\sigma_W^2[n]$ and the dynamical model's coefficients, the gain and error covariance equations do not change.

11.10. $Y[n] = C_n X[n] + V[n]$.
We arrive at the following equation simply by following the procedure developed in the book:
$$\hat{X}[n] = A_n\left[(I - G_{n-1}C_{n-1})\hat{X}[n-1] + G_{n-1}Y[n-1]\right],$$
with
$$G_n = E\left[X[n]\tilde{Y}^T[n]\right]\left[\sigma_{\tilde{Y}}^2[n]\right]^{-1}$$
and $\tilde{Y}[n] = Y[n] - C_n\hat{X}[n]$, where $\hat{X}[n] \triangleq \hat{X}[n|n-1]$. However,
$$\begin{aligned}
E[X[n]\tilde{Y}^T[n]] &= E\left[X[n](Y[n] - C_n\hat{X}[n])^T\right] \\
&= E\left[X[n](C_n X[n] + V[n] - C_n\hat{X}[n])^T\right] \\
&= E\left[X[n](X[n] - \hat{X}[n])^T\right] C_n^T.
\end{aligned}$$
By the orthogonality principle, $E\left[\hat{X}[n](X[n] - \hat{X}[n])^T\right] = 0$. So
$$E[X[n]\tilde{Y}^T[n]] = E\left[(X[n] - \hat{X}[n])(X[n] - \hat{X}[n])^T\right] C_n^T = \epsilon^2[n] C_n^T.$$
Also
$$\begin{aligned}
\sigma_{\tilde{Y}}^2[n] &= E[\tilde{Y}[n]\tilde{Y}^T[n]] \\
&= E\left[\left(C_n(X[n] - \hat{X}[n]) + V[n]\right)\left(C_n(X[n] - \hat{X}[n]) + V[n]\right)^T\right] \\
&= C_n\epsilon^2[n]C_n^T + \sigma_V^2[n].
\end{aligned}$$
So, $G_n = \epsilon^2[n]C_n^T\left(C_n\epsilon^2[n]C_n^T + \sigma_V^2[n]\right)^{-1}$. We know
$$\epsilon^2[n] = E\left[(X[n] - \hat{X}[n])(X[n] - \hat{X}[n])^T\right] = E\left[X[n]X^T[n]\right] - E\left[\hat{X}[n]\hat{X}^T[n]\right].$$
Now, $X[n] = A_n X[n-1] + B_n W[n]$ and $\hat{X}[n] = A_n\left[\hat{X}[n-1] + G_{n-1}\tilde{Y}[n-1]\right]$. Therefore,
$$E[X[n]X^T[n]] = A_n E\left[X[n-1]X^T[n-1]\right]A_n^T + B_n\sigma_W^2[n]B_n^T,$$
and also (the cross terms vanish because the innovation $\tilde{Y}[n-1]$ is orthogonal to $\hat{X}[n-1]$)
$$E[\hat{X}[n]\hat{X}^T[n]] = A_n E\left[\hat{X}[n-1]\hat{X}^T[n-1]\right]A_n^T + A_n G_{n-1}E\left[\tilde{Y}[n-1]\tilde{Y}^T[n-1]\right]G_{n-1}^T A_n^T.$$
Hence,
$$\epsilon^2[n] = A_n\left(\epsilon^2[n-1] - G_{n-1}\sigma_{\tilde{Y}}^2[n-1]G_{n-1}^T\right)A_n^T + B_n\sigma_W^2[n]B_n^T.$$
From the result above,
$$G_{n-1}\sigma_{\tilde{Y}}^2[n-1]G_{n-1}^T = E\left[X[n-1]\tilde{Y}^T[n-1]\right]G_{n-1}^T = \epsilon^2[n-1]C_{n-1}^T G_{n-1}^T.$$
Thus, finally,
$$\epsilon^2[n] = A_n\epsilon^2[n-1]\left(I - C_{n-1}^T G_{n-1}^T\right)A_n^T + B_n\sigma_W^2[n]B_n^T.$$
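A minimal Matlab sketch of these predictor-form recursions for a time-invariant example is given below; the matrices A, B, C, the noise variances, and the initial error covariance are all arbitrary assumptions made only for illustration.

% One-step predictor Xhat[n] = Xhat[n|n-1] using the recursions derived above.
A = [0.9 0.1; 0 0.8]; B = [1; 0.5]; C = [1 0];
sw2 = 1; sv2 = 0.25; N = 100; randn('state',0);
X = zeros(2,N+1); Y = zeros(1,N+1); Xp = zeros(2,N+1);
eps2 = B*sw2*B';                       % assumed initial error covariance eps^2[0]
G = eps2*C'/(C*eps2*C' + sv2);         % initial gain G_0
for n = 1:N
    X(:,n+1) = A*X(:,n) + B*sqrt(sw2)*randn;           % state equation
    Y(n+1)   = C*X(:,n+1) + sqrt(sv2)*randn;           % observation equation
    Xp(:,n+1) = A*((eye(2) - G*C)*Xp(:,n) + G*Y(n));   % predictor update
    eps2 = A*eps2*(eye(2) - C'*G')*A' + B*sw2*B';      % error covariance recursion
    G = eps2*C'/(C*eps2*C' + sv2);                     % gain G_n
end
plot(0:N, X(1,:), 0:N, Xp(1,:));
legend('first state component', 'its one-step prediction');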


11.11. (a) Since $\tilde{Y}[-N] \perp \tilde{Y}[-N+1] \perp \cdots \perp \tilde{Y}[N]$, using Theorem 11.1-4 property (b) we have
$$\hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \tilde{Y}[-N+1], \ldots, \tilde{Y}[N]\right] = \hat{E}\left[X[n]\,|\,\tilde{Y}[-N]\right] + \hat{E}\left[X[n]\,|\,\tilde{Y}[-N+1], \ldots, \tilde{Y}[N]\right].$$
By the same procedure,
$$\hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \tilde{Y}[-N+1], \ldots, \tilde{Y}[N]\right] = \sum_{k=-N}^{N}\hat{E}\left[X[n]\,|\,\tilde{Y}[k]\right].$$
Let $\hat{E}\left[X[n]\,|\,\tilde{Y}[k]\right] \triangleq g[k]\tilde{Y}[k]$. Then we have
$$\hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \tilde{Y}[-N+1], \ldots, \tilde{Y}[N]\right] = \sum_{k=-N}^{N} g[k]\tilde{Y}[k].$$

(b) Let $\hat{X}[N] = \hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \ldots, \tilde{Y}[N]\right]$. Because $E\left[\hat{X}[N]\,|\,\hat{X}[0], \hat{X}[1], \ldots, \hat{X}[N-1]\right] = \hat{X}[N-1]$ (since $E$ coincides with $\hat{E}$ in the Gaussian case), $\hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \ldots, \tilde{Y}[N]\right]$ is a Martingale sequence. Using the result of Problem 11.4, we conclude that
$$\lim_{N\to\infty}\hat{E}\left[X[n]\,|\,\tilde{Y}[-N], \ldots, \tilde{Y}[N]\right] = \lim_{N\to\infty}\sum_{k=-N}^{N} g[k]\tilde{Y}[k]$$
exists with probability 1.


11.12. $\displaystyle \hat{R}_N[m] = \frac{1}{N}\sum_{n=0}^{N-1} x[n+m]x^*[n]$.

(a)
$$E\left\{\hat{R}_N[m]\right\} = E\left\{\frac{1}{N}\sum_{n=0}^{N-1} x[n+m]x^*[n]\right\} = \frac{1}{N}\sum_{n=0}^{N-1} E\left\{x[n+m]x^*[n]\right\} = R_X[m].$$

(b) Show that $\lim_{N\to\infty} E\left\{|\hat{R}_N[m] - R_X[m]|^2\right\} = 0$.
$$\begin{aligned}
\lim_{N\to\infty} E\left\{|\hat{R}_N[m] - R_X[m]|^2\right\}
&= \lim_{N\to\infty} E\left\{(\hat{R}_N[m] - R_X[m])(\hat{R}_N^*[m] - R_X^*[m])\right\} \\
&= \lim_{N\to\infty} E\left\{\hat{R}_N[m]\hat{R}_N^*[m] - \hat{R}_N[m]R_X^*[m] - \hat{R}_N^*[m]R_X[m] + R_X[m]R_X^*[m]\right\} \\
&= \lim_{N\to\infty}\left( E\left\{\hat{R}_N[m]\hat{R}_N^*[m]\right\} - R_X[m]R_X^*[m]\right).
\end{aligned}$$
Now apply the 4th-order moment property for (complex) Gaussian random variables to get
$$\begin{aligned}
E\left\{\hat{R}_N[m]\hat{R}_N^*[m]\right\}
&= \frac{1}{N^2} E\left\{\sum_{n_1=0}^{N-1} X[n_1+m]X^*[n_1]\sum_{n_2=0}^{N-1} X^*[n_2+m]X[n_2]\right\} \\
&= \frac{1}{N^2}\sum_{n_1,n_2} E\left\{X[n_1+m]X^*[n_1]X^*[n_2+m]X[n_2]\right\} \\
&= \frac{1}{N^2}\sum_{n_1,n_2} E\left\{X[n_1+m]X^*[n_1]\right\}E\left\{X[n_2+m]X^*[n_2]\right\} \\
&\quad + \frac{1}{N^2}\sum_{n_1,n_2} E\left\{X[n_1+m]X^*[n_2+m]\right\}E\left\{X^*[n_1]X[n_2]\right\} \\
&\quad + \text{an extra term which is zero if } X[n] \text{ has the symmetry condition as in (10.6-4) and (10.6-5),}
\end{aligned}$$
which can be restated for a complex random sequence $X[n] = X_r[n] + jX_i[n]$ as $K_{X_rX_r}[m] = K_{X_iX_i}[m]$ and $K_{X_rX_i}[m] = -K_{X_iX_r}[m]$. (NOTE: $K[\cdot] = R[\cdot]$ because $X[n]$ is zero mean.)
In this symmetric case, which occurs for the bandpass random process of Section 10.6, we see that $E\{X[n_1+m]X[n_2]\}$ is zero as follows:
$$\begin{aligned}
E\left\{X[n_1+m]X[n_2]\right\}
&= E\left\{(X_r[n_1+m] + jX_i[n_1+m])(X_r[n_2] + jX_i[n_2])\right\} \\
&= E\left\{X_r[n_1+m]X_r[n_2] - X_i[n_1+m]X_i[n_2]\right\} + jE\left\{X_i[n_1+m]X_r[n_2] + X_r[n_1+m]X_i[n_2]\right\} \\
&= K_{X_rX_r}[n_1-n_2+m] - K_{X_iX_i}[n_1-n_2+m] + j\left(K_{X_iX_r}[n_1-n_2+m] + K_{X_rX_i}[n_1-n_2+m]\right) \\
&= 0 + j0 = 0.
\end{aligned}$$
(For more on complex Gaussian random processes and random sequences, see Discrete Random Signals and Statistical Signal Processing, C. W. Therrien, Prentice-Hall, 1992.)
So,
$$E\left\{|\hat{R}_N[m]|^2\right\} = R_X[m]R_X^*[m] + \frac{1}{N^2}\sum_{n_1,n_2=0}^{N-1} R_X[n_1-n_2]R_X^*[n_1-n_2]
= |R_X[m]|^2 + \sum_{n=-(N-1)}^{N-1}\frac{N-|n|}{N^2}|R_X[n]|^2.$$
Thus $\displaystyle \lim_{N\to\infty} E\left\{|\hat{R}_N[m] - R_X[m]|^2\right\} \le \lim_{N\to\infty}\frac{1}{N}\sum_{m=-\infty}^{\infty}|R_X[m]|^2 = 0$ for square-summable $R_X[m]$.
(NOTE: For a real-valued random sequence, we get two $O(1/N)$ terms.)


11.13. (a) The solution to this part is the same as the solution to Problem 8.32. The power spectral density is given by
$$S_{XX}(w) = 10\left(\frac{1-\rho_1^2}{1+\rho_1^2 - 2\rho_1\cos w}\right) + 5\left(\frac{1-\rho_2^2}{1+\rho_2^2 - 2\rho_2\cos w}\right),$$
where $\rho_1 = e^{-\lambda_1}$, $\rho_2 = e^{-\lambda_2}$.

(b) With $R_X[m] = 10e^{-\lambda_1|m|} + 5e^{-\lambda_2|m|}$,
$$E\left\{I_N(w)\right\} = \sum_{m=-(N-1)}^{N-1}\frac{N-|m|}{N}R_X[m]e^{-jwm}
= \sum_{m=-(N-1)}^{N-1}R_X[m]e^{-jwm} - \sum_{m=-(N-1)}^{N-1}\frac{|m|}{N}R_X[m]e^{-jwm}.$$
The first sum converges to $S_X(w)$ as $N\to\infty$. For the second sum,
$$\left|\sum_{m=-(N-1)}^{N-1}\frac{|m|}{N}R_X[m]e^{-jwm}\right| \le \frac{2}{N}\sum_{m=0}^{N-1} m\left(10e^{-\lambda_1 m} + 5e^{-\lambda_2 m}\right),$$
and since $\sum_{m=0}^{\infty} m e^{-\lambda m} < \infty$ for any $\lambda > 0$, we have $\frac{1}{N}\sum_{m=0}^{N-1} m e^{-\lambda m} \to 0$. Thus we can conclude for this case that
$$\lim_{N\to\infty} E\left\{I_N(w)\right\} = S_X(w).$$

11.14. From $r_0 a_1 = r_1$, i.e., $\sigma_X^2 a_1 = \sigma_X^2\rho$, we get $a_1 = \rho$, and
$$\sigma_e^2 = r_0 - \sum_{m=1}^{p} a_m r_m = \sigma_X^2 - \rho^2\sigma_X^2 = \sigma_X^2(1-\rho^2).$$
Then
$$S_X(w) = \frac{\sigma_e^2}{\left|1 - \sum_{m=1}^{p} a_m e^{-jwm}\right|^2}
= \frac{\sigma_X^2(1-\rho^2)}{|1-\rho e^{-jw}|^2}
= \frac{\sigma_X^2(1-\rho^2)}{1 - 2\rho\cos w + \rho^2}, \quad |w| \le \pi.$$
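A quick Matlab check of this closed form against the innovations-filter expression is sketched below; the values rho = 0.8 and sigma_X^2 = 1 are arbitrary assumptions.

% Compare the AR(1) spectrum formula with sigma_e^2/|1 - rho e^{-jw}|^2 via freqz.
rho = 0.8; sx2 = 1;
w = linspace(0, pi, 512);
S_formula = sx2*(1 - rho^2) ./ (1 - 2*rho*cos(w) + rho^2);
h = freqz(sqrt(sx2*(1 - rho^2)), [1 -rho], w);   % filter sqrt(sigma_e^2)/(1 - rho z^-1)
plot(w, S_formula, w, abs(h).^2, '--');
legend('closed form', 'from freqz');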

11.15. The Matlab code (below) uses an AR(3) model to generate the N-point random sample.
Figure 1 is an estimate of the correlation function, computed with N = 100. The bottom
axis should run from −100 to 100, as the zero-shift value of the R[m] estimate is in the middle
of the plot. Following the first plot are three AR(3) spectral estimates for N = 25, 100, and
512 data points (Fig. 2). Also shown on each plot is the true psd of our AR(3) model.

%This program generates an AR3 estimate of psd of AR3 model.


clear
Pi=3.1415927;
disp(’This .m file computes ar(3) parametric psd estimate.’);
N=input(’choose data length (<=512) = ’);
randn(’state’,0);
w=randn(N,1);
bt=[1.0 0.0 0.0 0.0];
at=[1.0 -1.700 1.530 -0.648];
disp(’The true a vector is a = [1.0 -1.7 1.53 -0.648].’);
x=filter(bt,at,w);
y=flipud(x);
disp(’Now calculating estimate of R[m].’);
z=(1./N)*conv(x,y);
figure(1)
plot(z)
title(’estimate of correlation function’);
pause(5);
R(1,1)=z(N);
R(2,2)=z(N);
R(3,3)=z(N);
R(1,2)=z(N+1);
R(2,3)=z(N+1);
R(1,3)=z(N+2);
R(2,1)=R(1,2);
R(3,2)=R(2,3);
R(3,1)=R(1,3);
R
pause(5);

[Plot: the estimated correlation function R[m] versus sample index m, titled "estimate of correlation function".]

Figure 1: Estimate of correlation function for N = 100.

r(1,1)=z(N+1);
r(2,1)=z(N+2);
r(3,1)=z(N+3);
r
pause(5);
a=inv(R)*r
disp(’May compare a vector values to ar(3) coefficients.’);
b1=[1.0 0.0 0.0 0.0];
a1=[1.0 -a(1) -a(2) -a(3)];
[h,w]=freqz(b1,a1,512,’whole’);
s=h.*conj(h);
figure(2)
plot(w,s);
title(’ar(3) power spectral density estimate’);
pause(3);
[ht,w]=freqz(bt,at,512,’whole’);
st=ht.*conj(ht);
figure(3)
plot(w,st);
title(’true ar(3) power spectral density’);
pause(3);
figure(4)
plot(w,s,w,st)
title(’ar(3) estimate and true ar(3) psd.’);
text(0,-5,’Please press any key to exit.’);

[Three plots, each titled "ar(3) estimate and true ar(3) psd." with legend entries "est." and "true", plotted versus ω: (a) N = 25, (b) N = 100, (c) N = 512.]

Figure 2: Comparison of the true spectrum and AR(3) spectral estimate for different values of N.

11.16. The Matlab function VITERBIPATH given below computes the most likely state sequence
for the observations.

% The following function computes the most likely state sequence
% accounting for the observations.
% N is the number of states and must be an integer.
% L is either the number of observations or the maximum value of
% the discrete time index
% O is the observation vector and in this program must correspond to
% the columns of B. For example if you observe
% {heads, heads, tails, heads, tails} the observation vector would be
% {1,1,2,1,2} where a 1 would correspond to a head and a 2 would
% correspond to a tail.
% A is the state transition probability matrix and is 2x2 in this case
% since there are only two states ’heads’ and ’tails’.
% B is the state-conditional output probability matrix; the first column
% of B is [P[head|state1],P[head|state2]]
% PINT is the initial state probability row vector

function [PSTAR, Q] = VITERBIPATH(N,L,O,A,B,PINT)

% Initialize
Q = zeros(1,L);
psi = zeros(N,L);
phi = zeros(N,L);
for i = 1:N
phi(i,1) = PINT(i)*B(i,O(1));
end

% Iterate forward
for j = 2:L,
for i = 1:N,
for id = 1:N,
phiid(id) = phi(id,j-1) * A(id,i) * B(i,O(j));
psiid(id) = phi(id,j-1) * A(id,i);
end
phi(i,j) = max(phiid);
[junk, psi(i,j)] = max(psiid);
end
end

% Iterate backwards to find path


[PSTAR, Q(L)] = max(phi(:,L));
for n = L-1:-1:1
Q(n) = psi(Q(n+1), n+1);
end

% Graphical output
figure(1);
clf % clears current window
% The next instruction creates a graphical rectangular space of x-dimension
% going from 0 to L+1 and y-dimension from 0 to N+1.

axis([0 L+1 0 N+1]);
hold on; % keeps the current graphics
for j = 1:L
plot(j*ones(1,N),1:N,’o’);
end
plot(1:L, Q,’-’); % Q is a vector whose components are the states
% at time l = 1, ..., L
% Hence a solid line segment is drawn from the previous node to the new node
% at time l.

set(gca, ’XTickLabel’, [1:L]); % puts the numbers 1,2,...,L at the ticks


set(gca, ’YTick’ , [1:N]); % puts tick marks on y-label
set(gca, ’YTickLabel’, [1:N]); % puts the numbers 1,2,...,N at the ticks

xlabel(’time index’);
ylabel(’state’ );
title(’most likely state path determined by Viterbi algorithm’);
% grid
hold off; % liberates you from restrictions of
% the graphical environment above

We run the matlab file to check the answer in Example 11.5-5.

>> VITERBIPATH(2, 3, [1,2,2], [0.6,0.4;0.3,0.7], [0.3,0.7;0.6,0.4], [0.7,0.3])


ans = 0.0370

is the probability of the observations along the most likely state path. Now that we are
confident that the program works, we can try it on the observation vector head, head, head = 1,1,1.
The output figure obtained is given in Fig. 3.

>> VITERBIPATH(2, 3, [1,1,1], [0.6,0.4;0.3,0.7], [0.3,0.7;0.6,0.4], [0.7,0.3])


ans = 0.0318

11.17. The most likely state path is given in Fig. 4.

>> VITERBIPATH(2, 5, [1,2,2,1,1], [0.6,0.4;0.3,0.7], [0.3,0.7;0.6,0.4], [0.7,0.3])


ans = 0.0037

11.18. We are given
$$Y_1 = \frac{2}{3}X_1 + \frac{1}{3}X_2, \qquad Y_2 = \frac{1}{3}X_1 + \frac{2}{3}X_2,$$
or
$$X_1 = 2Y_1 - Y_2, \qquad X_2 = -Y_1 + 2Y_2,$$
where $X_1$ is Poisson with parameter $\lambda_1$, $X_2$ is Poisson with parameter $\lambda_2$, and $P_{X_1X_2}[m,k] = P_{X_1}[m]P_{X_2}[k]$.

Figure 3: Most likely state path determined by the Viterbi algorithm (state versus time index).

Figure 4: Most likely state path determined by the Viterbi algorithm (state versus time index).
We know that the maximum likelihood estimators for $\lambda_1$, $\lambda_2$ are given by
$$\hat{\lambda}_1^{(ml)} = X_1 = 2Y_1 - Y_2, \qquad \hat{\lambda}_2^{(ml)} = X_2 = -Y_1 + 2Y_2.$$
Now let us see how the EM algorithm yields this result:

E-step: $\tilde{X}^{(k+1)} = E[\tilde{X}\,|\,\tilde{Y}; \tilde{\lambda}^{(k)}]$,
M-step: $\tilde{\lambda}^{(k+1)} = \arg\max_{\tilde{\theta}}\left\{-\sum_{i=1}^{n}\theta_i + \tilde{X}^{(k+1)}\tilde{\Gamma}\right\}$,

where $\tilde{\Gamma} \triangleq (\log\theta_1, \log\theta_2, \ldots, \log\theta_n)^T$. To compute $E[\tilde{X}\,|\,\tilde{Y}; \tilde{\lambda}^{(k)}]$, we need
$$P\left[X_1 = x_1, X_2 = x_2\,|\,Y_1 = y_1, Y_2 = y_2; \lambda_1^{(k)}, \lambda_2^{(k)}\right]
= P\left[X_1 = x_1\,|\,Y_1 = y_1; \lambda_1^{(k)}\right]P\left[X_2 = x_2\,|\,Y_2 = y_2; \lambda_2^{(k)}\right],$$
since $X_1$, $X_2$ are independent. However, in this case
$$P\left[X_1 = x_1\,|\,\tilde{Y} = \tilde{y}\right] = \begin{cases} 1, & x_1 = 2y_1 - y_2 \\ 0, & \text{else,} \end{cases}$$
and likewise
$$P\left[X_2 = x_2\,|\,\tilde{Y} = \tilde{y}\right] = \begin{cases} 1, & x_2 = -y_1 + 2y_2 \\ 0, & \text{else,} \end{cases}$$
independent of the iteration index $k$. Hence, for all $k$,
$$X_1^{(k)} = 2Y_1 - Y_2, \qquad X_2^{(k)} = -Y_1 + 2Y_2.$$
Thus the E-step becomes
$$\tilde{X}^{(k+1)} = E[\tilde{X}\,|\,\tilde{Y}; \tilde{\theta}^{(k)}] = (2Y_1 - Y_2,\; -Y_1 + 2Y_2)^T,$$
and the M-step becomes
$$\hat{\tilde{\lambda}}^{(k+1)} = \arg\max_{\tilde{\theta}}\left\{-\sum_{i=1}^{2}\theta_i + X_1^{(k+1)}\log\theta_1 + X_2^{(k+1)}\log\theta_2\right\} \triangleq \arg\max_{\tilde{\theta}}\Lambda(\tilde{\theta}).$$
Setting $\partial\Lambda/\partial\theta_1 = 0$ gives $1 = X_1^{(k+1)}/\theta_1$, i.e., $\theta_1 = \lambda_1^{(ml)} = 2Y_1 - Y_2$; setting $\partial\Lambda/\partial\theta_2 = 0$ gives $1 = X_2^{(k+1)}/\theta_2$, i.e., $\theta_2 = \lambda_2^{(ml)} = -Y_1 + 2Y_2$,
independent of $k$. Hence the EM algorithm converges after one iteration.
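A tiny numerical illustration of this one-step convergence is sketched below; the observed values Y1 = 7, Y2 = 5 are hypothetical.

% EM iteration for the Poisson pair; converges in a single pass.
Y1 = 7; Y2 = 5;                      % assumed observations
lambda = [1; 1];                     % arbitrary initial guess lambda^(0)
for k = 1:3
    Xhat = [2*Y1 - Y2; -Y1 + 2*Y2];  % E-step: X is determined by Y here
    lambda = Xhat;                   % M-step: theta_i maximized at theta_i = X_i
end
lambda'                              % [2*Y1-Y2, -Y1+2*Y2] = [9 3] after the first pass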

11.19. From the cited equation, we have
$$\sigma_U^2\left|1 - \sum_{k} a_k e^{-jwk}\right|^2 = \sigma_W^2\left(1 - \sum_{k\ne 0} c_k e^{-jwk}\right),$$
where $\sigma_U^2$ is the minimum M.S. interpolation error and $\sigma_W^2$ is the minimum M.S. prediction error, with the respective predictor coefficients $a_k$, $k = 1, \ldots, p$, and interpolator coefficients $c_k = c_{-k}$, $k = 1, \ldots, p$ ($c_0 = a_0 = 0$).
This is an equality between two (trigonometric) polynomials in the variable $e^{-jw}$. Setting the constant terms equal, we get $\sigma_U^2\left(1 + \sum_{k=1}^{p}|a_k|^2\right) = \sigma_W^2$, or
$$\sigma_U^2 = \frac{\sigma_W^2}{1 + \sum_{k=1}^{p}|a_k|^2},$$
where $\sigma_U^2 < \sigma_W^2$ if any $a_k \ne 0$, $k = 1, \ldots, p$.

11.20. Set $D^{(k)}[n] \triangleq X[n] - X^{(k+1)}[n]$. Then from the two given equations,
$$D^{(k)}[n] = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2}\,C * D^{(k-1)}[n] = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2}\sum_{l} c_l D^{(k-1)}[n-l].$$
Consider an interval $[-N, N]$ where the solution will be simulated for some large $N < \infty$. Then define the norm
$$\|D^{(k)}\| \triangleq \max_{|n|\le N}\left|D^{(k)}[n]\right|.$$
We have
$$\begin{aligned}
\|D^{(k)}\| &= \max_{|n|\le N}\left|\frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2}\sum_{l} c_l D^{(k-1)}[n-l]\right| \\
&\le \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2}\sum_{l}|c_l|\,\max_{|n|\le N}\left|D^{(k-1)}[n-l]\right| \\
&\le \rho\,\|D^{(k-1)}\|,
\end{aligned}$$
where $\rho \triangleq \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2}\sum_{l}|c_l| < 1$. Note that in the above equation, the maximum of $|n|$ should be less than or equal to $N - p$, where $p$ is the order of the Markov model, but $N - p \approx N$.
So, $\|D^{(k)}\| \le \rho\|D^{(k-1)}\|$, $k \ge 1$. As $k \to \infty$, clearly $\|D^{(k)}\| \to 0$ and so
$$X^{(k)}[n] \to X[n] \quad\text{for } -N < n < N.$$
NOTE: We cannot let $N = +\infty$ because then the norm $\|D\|$, defined as $\max|D[n]|$, might be infinite. Still, taking $N$ large compared to the Markov order $p$ should be sufficient. If we know the $Y[n]$ are finite valued, then we could let $N = +\infty$, because the equations are stable (BIBO stable).
