
12.3 Linear MMSE Estimation

We begin our discussion by assuming a scalar parameter $\theta$ is to be estimated based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$ or in vector form $x = [x[0]\; x[1]\; \ldots\; x[N-1]]^T$. The unknown parameter is modeled as the realization of a random variable. We do not assume any specific form for the joint PDF $p(x, \theta)$, but as we shall see shortly, only a knowledge of the first two moments. That $\theta$ may be estimated from $x$ is due to the assumed statistical dependence of $\theta$ on $x$ as summarized by the joint PDF $p(x, \theta)$, and in particular, for a linear estimator we rely on the correlation between $\theta$ and $x$. We now consider the class of all linear (actually affine) estimators of the form

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n] + a_N \qquad (12.1)$$

and choose the weighting coefficients $a_n$'s to minimize the Bayesian MSE

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] \qquad (12.2)$$

where the expectation is with respect to the PDF $p(x, \theta)$. The resultant estimator is termed the linear minimum mean square error (LMMSE) estimator. Note that we have included the $a_N$ coefficient to allow for nonzero means of $x$ and $\theta$. If the means are both zero, then this coefficient may be omitted, as will be shown later.

Before determining the LMMSE estimator we should keep in mind that the estimator will be suboptimal unless the MMSE estimator happens to be linear. Such would be the case, for example, if the Bayesian linear model applied (see Section 10.6). Otherwise, better estimators will exist, although they will be nonlinear (see the introductory example in Section 10.3). Since the LMMSE estimator relies on the correlation between random variables, a parameter uncorrelated with the data cannot be linearly estimated. Consequently, the proposed approach is not always feasible. This is illustrated by the following example. Consider a parameter $\theta$ to be estimated based on the single data sample $x[0]$, where $x[0] \sim \mathcal{N}(0, \sigma^2)$. If the parameter to be estimated is the power of the $x[0]$ realization or $\theta = x^2[0]$, then a perfect estimator will be

$$\hat\theta = x^2[0]$$

since the minimum Bayesian MSE will be zero. This estimator is clearly nonlinear. If, however, we attempt to use a LMMSE estimator or

$$\hat\theta = a_0 x[0] + a_1$$

then the optimal weighting coefficients $a_0$ and $a_1$ can be found by minimizing

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] = E\left[(\theta - a_0 x[0] - a_1)^2\right].$$


We differentiate this with respect to $a_0$ and $a_1$ and set the results equal to zero to produce

$$E[(\theta - a_0 x[0] - a_1)x[0]] = 0$$

$$E(\theta - a_0 x[0] - a_1) = 0$$

or

$$a_0 E(x^2[0]) + a_1 E(x[0]) = E(\theta x[0])$$

$$a_0 E(x[0]) + a_1 = E(\theta).$$

But $E(x[0]) = 0$ and $E(\theta x[0]) = E(x^3[0]) = 0$, so that

$$a_0 = 0$$

$$a_1 = E(\theta) = E(x^2[0]) = \sigma^2.$$

Therefore, the LMMSE estimator is $\hat\theta = \sigma^2$ and does not depend on the data. This is because $\theta$ and $x[0]$ are uncorrelated. The minimum MSE is

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] = E\left[(\theta - \sigma^2)^2\right] = E\left[(x^2[0] - \sigma^2)^2\right] = E(x^4[0]) - 2\sigma^2 E(x^2[0]) + \sigma^4 = 3\sigma^4 - 2\sigma^4 + \sigma^4 = 2\sigma^4$$

as opposed to a minimum MSE of zero for the nonlinear estimator $\hat\theta = x^2[0]$. Clearly, the LMMSE estimator is inappropriate for this problem. Problem 12.1 explores how to modify the LMMSE estimator to make it applicable.

We now derive the optimal weighting coefficients for use in (12.1). Substituting (12.1) into (12.2) and differentiating

$$\frac{\partial}{\partial a_N} E\left\{\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right]^2\right\} = -2E\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right].$$

Setting this equal to zero produces

$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]) \qquad (12.3)$$

which as asserted earlier is zero if the means are zero. Continuing, we need to minimize

$$\mathrm{Bmse}(\hat\theta) = E\left\{\left[\sum_{n=0}^{N-1} a_n\left(x[n] - E(x[n])\right) - \left(\theta - E(\theta)\right)\right]^2\right\}$$


over the remaining $a_n$'s, where $a_N$ has been replaced by (12.3). Letting $a = [a_0\; a_1\; \ldots\; a_{N-1}]^T$, we have

$$\mathrm{Bmse}(\hat\theta) = E\left\{\left[a^T(x - E(x)) - (\theta - E(\theta))\right]^2\right\}$$

$$= E\left[a^T(x - E(x))(x - E(x))^T a\right] - E\left[a^T(x - E(x))(\theta - E(\theta))\right] - E\left[(\theta - E(\theta))(x - E(x))^T a\right] + E\left[(\theta - E(\theta))^2\right]$$

$$= a^T C_{xx} a - a^T C_{x\theta} - C_{\theta x} a + C_{\theta\theta} \qquad (12.4)$$

where $C_{xx}$ is the $N \times N$ covariance matrix of $x$, $C_{\theta x}$ is the $1 \times N$ cross-covariance vector having the property that $C_{\theta x}^T = C_{x\theta}$, and $C_{\theta\theta}$ is the variance of $\theta$. Making use of (4.3) we can minimize (12.4) by taking the gradient to yield

$$\frac{\partial\,\mathrm{Bmse}(\hat\theta)}{\partial a} = 2C_{xx}a - 2C_{x\theta}$$

which when set to zero results in

$$a = C_{xx}^{-1} C_{x\theta}. \qquad (12.5)$$

Using (12.3) and (12.5) in (12.1) produces

$$\hat\theta = a^T x + a_N = C_{\theta x} C_{xx}^{-1} x + E(\theta) - C_{\theta x} C_{xx}^{-1} E(x)$$

or finally the LMMSE estimator is

$$\hat\theta = E(\theta) + C_{\theta x} C_{xx}^{-1}\left(x - E(x)\right). \qquad (12.6)$$

Note that it is identical in form to the MMSE estimator for jointly Gaussian $x$ and $\theta$, as can be verified from (10.24). This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied. If the means of $\theta$ and $x$ are zero, then

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x. \qquad (12.7)$$

The minimum Bayesian MSE is obtained by substituting (12.5) into (12.4) to yield

$$\mathrm{Bmse}(\hat\theta) = C_{\theta x} C_{xx}^{-1} C_{xx} C_{xx}^{-1} C_{x\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} + C_{\theta\theta}$$

or finally

$$\mathrm{Bmse}(\hat\theta) = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}. \qquad (12.8)$$
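Since (12.6) and (12.8) involve only means and covariances, they translate directly into a few lines of linear algebra. The following sketch is not from the text; it is a minimal NumPy illustration of (12.5), (12.6), and (12.8), with argument names chosen here for clarity.

```python
import numpy as np

def lmmse(mean_theta, mean_x, C_theta_x, C_xx, C_theta_theta, x):
    """Scalar LMMSE estimator (12.6) and its minimum Bayesian MSE (12.8).

    mean_theta    : E(theta), scalar
    mean_x        : E(x), shape (N,)
    C_theta_x     : cross-covariance C_{theta x}, shape (N,)
    C_xx          : covariance matrix of x, shape (N, N)
    C_theta_theta : variance of theta, scalar
    x             : observed data vector, shape (N,)
    """
    # Solve C_xx a = C_{x theta} rather than forming the inverse explicitly.
    a = np.linalg.solve(C_xx, C_theta_x)          # (12.5), since C_{x theta} = C_{theta x}^T
    theta_hat = mean_theta + a @ (x - mean_x)     # (12.6)
    bmse = C_theta_theta - C_theta_x @ a          # (12.8)
    return theta_hat, bmse
```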


Again, (12.8) is identical to the result obtained by substituting (10.25) into (11.12). An example follows.


Example 12.1 - DC Level in WGN with Uniform Prior PDF

Consider the introductory example in Chapter 10. The data model is

$$x[n] = A + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $A \sim \mathcal{U}[-A_0, A_0]$, $w[n]$ is WGN with variance $\sigma^2$, and $A$ and $w[n]$ are independent. We wish to estimate $A$. The MMSE estimator cannot be obtained in closed form due to the integration required (see (10.9)). Applying the LMMSE estimator, we first note that $E(A) = 0$, and hence $E(x[n]) = 0$. Since $E(x) = 0$, the covariances are

$$C_{xx} = E(xx^T) = E\left[(A\mathbf{1} + w)(A\mathbf{1} + w)^T\right] = E(A^2)\mathbf{1}\mathbf{1}^T + \sigma^2 I$$

$$C_{\theta x} = E(Ax^T) = E\left[A(A\mathbf{1} + w)^T\right] = E(A^2)\mathbf{1}^T$$

where $\mathbf{1}$ is an $N \times 1$ vector of all ones. Hence, from (12.7)

$$\hat A = C_{\theta x} C_{xx}^{-1} x = \sigma_A^2 \mathbf{1}^T\left(\sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 I\right)^{-1} x$$

where we have let $\sigma_A^2 = E(A^2)$. But the form of the estimator is identical to that encountered in Example 10.2 if we let $\mu_A = 0$, so that from (10.31)

$$\hat A = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\bar x. \qquad (12.9)$$

Since $\sigma_A^2 = E(A^2) = (2A_0)^2/12 = A_0^2/3$, the LMMSE estimator of $A$ is

$$\hat A = \frac{\frac{A_0^2}{3}}{\frac{A_0^2}{3} + \frac{\sigma^2}{N}}\,\bar x.$$

As opposed to the original MMSE estimator which required integration, we have obtained the LMMSE estimator in closed form. Also, note that we did not really need to know that $A$ was uniformly distributed but only its mean and variance, or that $w[n]$ was Gaussian but only that it is white and its variance. Likewise, independence of $A$ and $w$ was not required, only that they were uncorrelated. In general, all that is required to determine the LMMSE estimator are the first two moments of $p(x, \theta)$ or

$$E(\theta),\quad E(x),\quad C_{xx},\quad C_{\theta x},\quad C_{x\theta}.$$

However, we must realize that the LMMSE estimator of (12.9) will be suboptimal since it has been constrained to be linear. The optimal estimator for this problem is given by (10.9). ◇
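The closed form (12.9) and its minimum MSE are easy to check by simulation. The sketch below is illustrative only (the constants $A_0$, $\sigma$, and $N$ are arbitrary choices, not from the text); it draws $A$ from $\mathcal{U}[-A_0, A_0]$, forms the LMMSE estimate from the sample mean, and compares the empirical Bayesian MSE with $\sigma_A^2\sigma^2/(N\sigma_A^2 + \sigma^2)$, which is what (12.8) gives for this model.

```python
import numpy as np

rng = np.random.default_rng(0)
A0, sigma, N, trials = 1.0, 0.5, 10, 200_000
sigma_A2 = A0**2 / 3.0                                # variance of U[-A0, A0]

A = rng.uniform(-A0, A0, size=trials)                 # realizations of the DC level
x = A[:, None] + sigma * rng.standard_normal((trials, N))
xbar = x.mean(axis=1)

A_hat = sigma_A2 / (sigma_A2 + sigma**2 / N) * xbar   # LMMSE estimator (12.9)

empirical_bmse = np.mean((A - A_hat) ** 2)
theoretical_bmse = sigma_A2 * sigma**2 / (N * sigma_A2 + sigma**2)
print(empirical_bmse, theoretical_bmse)               # should nearly agree
```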


[Figure 12.1: Vector space interpretation of random variables]

12.4 Geometrical Interpretations

In Chapter 8 we discussed a geometrical interpretation of the LSE based on the concept of a vector space. The LMMSE estimator admits a similar interpretation, although now the "vectors" are random variables. (Actually, both vector spaces are special cases of the more general Hilbert space [Luenberger 1969].) This alternative formulation assumes that $\theta$ and $x$ are zero mean. If they are not, we can always define the zero mean random variables $\theta' = \theta - E(\theta)$ and $x' = x - E(x)$, and consider estimation of $\theta'$ by a linear function of $x'$ (see also Problem 12.5). Now we wish to find the $a_n$'s so that

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n]$$

minimizes

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right].$$

Let us now think of the random variables $\theta, x[0], x[1], \ldots, x[N-1]$ as elements in a vector space as shown symbolically in Figure 12.1. The reader may wish to verify that the properties of a vector space are satisfied, such as vector addition and multiplication by a scalar, etc. Since, as is usually the case, $\theta$ cannot be perfectly expressed as a linear combination of the $x[n]$'s (if it could, then our estimator would be perfect), we picture $\theta$ as only partially lying in the subspace spanned by the $x[n]$'s. We may define the "length" of each vector $x$ as $\|x\| = \sqrt{E(x^2)}$ or the square root of the variance. Longer length vectors are those with larger variances. The zero length vector is the random variable with zero variance or, therefore, the one that is identically zero (actually not random at all). Finally, to complete our description we require the notion of an inner product between two vectors. (Recall that if $x, y$ are Euclidean vectors in $R^3$, then the inner product is $(x, y) = x^T y = \|x\|\,\|y\|\cos\alpha$, where $\alpha$ is the angle between the vectors.) It can be shown that an appropriate definition, i.e., one that satisfies the properties of an inner product between the vectors $x$ and $y$, is (see Problem 12.4)

$$(x, y) = E(xy). \qquad (12.10)$$

With this definition we have that

$$(x, x) = E(x^2) = \|x\|^2 \qquad (12.11)$$

consistent with our earlier definition of the length of a vector. Also, we can now define two vectors to be orthogonal if

$$(x, y) = E(xy) = 0. \qquad (12.12)$$

Since the vectors are zero mean, this is equivalent to saying that two vectors are orthogonal if and only if they are uncorrelated. (In $R^3$ two Euclidean vectors are orthogonal if the angle between them is $\alpha = 90°$ so that $(x, y) = \|x\|\,\|y\|\cos\alpha = 0$.) Recalling our discussions from the previous section, this implies that if two vectors are orthogonal, we cannot use one to estimate the other.

[Figure 12.2: Orthogonal random variables - y cannot be linearly estimated based on x]

As shown in Figure 12.2, since there is no component of $y$ along $x$, we cannot use $x$ to estimate $y$. Attempting to do so means that we need to find an $a$ so that $\hat y = ax$ minimizes the Bayesian MSE, $\mathrm{Bmse}(\hat y) = E[(y - \hat y)^2]$. To find the optimal value of $a$

$$\frac{d\,\mathrm{Bmse}(\hat y)}{da} = \frac{d}{da}E\left[(y - ax)^2\right] = \frac{d}{da}\left[E(y^2) - 2aE(xy) + a^2E(x^2)\right] = -2E(xy) + 2aE(x^2) = 0$$

which yields

$$a = \frac{E(xy)}{E(x^2)} = 0.$$

The LMMSE estimator of $y$ is just $\hat y = 0$. Of course, this is a special case of (12.7) where $N = 1$, $\theta = y$, $x[0] = x$, and $C_{\theta x} = 0$.

With these ideas in mind we proceed to determine the LMMSE estimator using the vector space viewpoint. This approach is useful for conceptualization of the LMMSE estimation process and will be used later to derive the sequential LMMSE estimator. As before, we assume that

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n]$$

where $a_N = 0$ due to the zero mean assumption. We wish to estimate $\theta$ as a linear combination of $x[0], x[1], \ldots, x[N-1]$. The weighting coefficients $a_n$ should be chosen to minimize the MSE

$$E\left[(\theta - \hat\theta)^2\right] = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right] = \left\|\theta - \sum_{n=0}^{N-1} a_n x[n]\right\|^2.$$

[Figure 12.3: Orthogonality principle for LMMSE estimation]

But this means that minimization of the MSE is equivalent to a minimization of the squared length of the error vector $\epsilon = \theta - \hat\theta$. The error vector is shown in Figure 12.3b for several candidate estimates. Clearly, the length of the error vector is minimized when $\epsilon$ is orthogonal to the subspace spanned by $\{x[0], x[1], \ldots, x[N-1]\}$. Hence, we require

$$\epsilon \perp x[0], x[1], \ldots, x[N-1] \qquad (12.13)$$

or by using our definition of orthogonality

$$E\left[(\theta - \hat\theta)x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1. \qquad (12.14)$$

This is the important orthogonality principle or projection theorem. It says that in estimating the realization of a random variable by a linear combination of data samples, the optimal estimator is obtained when the error is orthogonal to each data sample. Using the orthogonality principle, the weighting coefficients are easily found as

$$E\left[\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1$$

or

$$\sum_{m=0}^{N-1} a_m E(x[m]x[n]) = E(\theta x[n]) \qquad n = 0, 1, \ldots, N-1.$$


In matrix form this is

$$\begin{bmatrix} E(x^2[0]) & E(x[0]x[1]) & \cdots & E(x[0]x[N-1]) \\ E(x[1]x[0]) & E(x^2[1]) & \cdots & E(x[1]x[N-1]) \\ \vdots & \vdots & \ddots & \vdots \\ E(x[N-1]x[0]) & E(x[N-1]x[1]) & \cdots & E(x^2[N-1]) \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} E(\theta x[0]) \\ E(\theta x[1]) \\ \vdots \\ E(\theta x[N-1]) \end{bmatrix} \qquad (12.15)$$

These are the normal equations. The matrix is recognized as $C_{xx}$ and the right-hand vector as $C_{x\theta}$. Therefore,

$$C_{xx}a = C_{x\theta} \qquad (12.16)$$

and

$$a = C_{xx}^{-1}C_{x\theta}. \qquad (12.17)$$

The LMMSE estimator of $\theta$ is

$$\hat\theta = a^T x = C_{x\theta}^T C_{xx}^{-1} x$$

or finally

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x \qquad (12.18)$$

in agreement with (12.7). The minimum Bayesian MSE is the squared length of the error vector or

$$\mathrm{Bmse}(\hat\theta) = \|\epsilon\|^2 = \left\|\theta - \sum_{n=0}^{N-1} a_n x[n]\right\|^2 = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right]$$

for the $a_n$'s given by (12.17). But

$$\mathrm{Bmse}(\hat\theta) = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)\right] = E(\theta^2) - \sum_{n=0}^{N-1} a_n E(x[n]\theta) - \sum_{m=0}^{N-1} a_m E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)x[m]\right].$$

The last term is zero due to the orthogonality principle, producing

$$\mathrm{Bmse}(\hat\theta) = C_{\theta\theta} - a^T C_{x\theta} = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}$$

in agreement with (12.8).

[Figure 12.4: Linear estimation by orthogonal vectors]
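The orthogonality principle (12.14) also provides a quick numerical sanity check: once $\hat\theta = C_{\theta x}C_{xx}^{-1}x$ has been formed, the sample correlation between the error and each data sample should be close to zero. The sketch below is an illustration with an arbitrarily chosen joint model (not from the text), using sample moments in place of the true ones.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 4, 500_000

# Zero mean theta and x, correlated through a common component (arbitrary model).
theta = rng.standard_normal(trials)
x = 0.8 * theta[:, None] + rng.standard_normal((trials, N))

C_xx = np.cov(x, rowvar=False)                        # sample N x N covariance
C_theta_x = np.array([np.cov(theta, x[:, n])[0, 1] for n in range(N)])

a = np.linalg.solve(C_xx, C_theta_x)                  # (12.17)
theta_hat = x @ a                                     # (12.18), zero mean case
err = theta - theta_hat

# E[(theta - theta_hat) x[n]] should be approximately zero for each n.
print(err @ x / trials)
```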

Many important results can be easily derived when we view the LMMSE estimator

in a vector space framework. In Section 12.6 we examine its application to sequential

estimation. As a simple example of its utility in conceptualizing the estimation problem,

we provide the following illustration.

Example 12.2 - Estimation by Orthogonal Vectors

Assume that $x[0]$ and $x[1]$ are zero mean and uncorrelated with each other. However, they are each correlated with $\theta$. Figure 12.4a illustrates this case. The LMMSE estimator of $\theta$ based on $x[0]$ and $x[1]$ is the sum of the projections of $\theta$ on $x[0]$ and $x[1]$ as shown in Figure 12.4b or

$$\hat\theta = \hat\theta_0 + \hat\theta_1 = \left(\theta, \frac{x[0]}{\|x[0]\|}\right)\frac{x[0]}{\|x[0]\|} + \left(\theta, \frac{x[1]}{\|x[1]\|}\right)\frac{x[1]}{\|x[1]\|}.$$

Each component is the length of the projection $(\theta, x[n]/\|x[n]\|)$ times the unit vector in the $x[n]$ direction. Equivalently, since $\|x[n]\| = \sqrt{\mathrm{var}(x[n])}$ is a constant, this may be written

$$\hat\theta = \frac{(\theta, x[0])}{(x[0], x[0])}x[0] + \frac{(\theta, x[1])}{(x[1], x[1])}x[1].$$

By the definition of the inner product (12.10) we have

$$\hat\theta = \frac{E(\theta x[0])}{E(x^2[0])}x[0] + \frac{E(\theta x[1])}{E(x^2[1])}x[1] = \left[E(\theta x[0]) \;\; E(\theta x[1])\right]\begin{bmatrix} E(x^2[0]) & 0 \\ 0 & E(x^2[1]) \end{bmatrix}^{-1}\begin{bmatrix} x[0] \\ x[1] \end{bmatrix} = C_{\theta x}C_{xx}^{-1}x.$$

Clearly, the ease with which this result was obtained is due to the orthogonality of $x[0], x[1]$ or, equivalently, the diagonal nature of $C_{xx}$. For nonorthogonal data samples the same approach can be used if we first orthogonalize the samples or replace the data by uncorrelated samples that span the same subspace. We will have more to say about this approach in Section 12.6. ◇
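The point of Example 12.2 can be seen numerically: when $C_{xx}$ is diagonal, summing the scalar projections reproduces $C_{\theta x}C_{xx}^{-1}x$ exactly. The sketch below is a small illustration with made-up second moments, not a worked example from the text.

```python
import numpy as np

# Second-order description with uncorrelated x[0], x[1] (diagonal C_xx).
C_xx = np.diag([2.0, 3.0])          # E(x^2[0]), E(x^2[1])
C_theta_x = np.array([1.0, -0.5])   # E(theta x[0]), E(theta x[1])
x = np.array([0.7, -1.2])           # one observed data vector

# Sum of the scalar projections onto the orthogonal "vectors" x[0] and x[1].
theta_hat_proj = sum(C_theta_x[n] / C_xx[n, n] * x[n] for n in range(2))

# Full LMMSE estimator C_theta_x C_xx^{-1} x.
theta_hat_full = C_theta_x @ np.linalg.solve(C_xx, x)

print(theta_hat_proj, theta_hat_full)   # identical when C_xx is diagonal
```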

12.5 The Vector LMMSE Estimator

The vector LMMSE estimator is a straightforward extension of the scalar one. Now we wish to find the linear estimator that minimizes the Bayesian MSE for each element. We assume that

$$\hat\theta_i = \sum_{n=0}^{N-1} a_{in} x[n] + a_{iN} \qquad (12.19)$$

for $i = 1, 2, \ldots, p$ and choose the weighting coefficients to minimize

$$\mathrm{Bmse}(\hat\theta_i) = E\left[(\theta_i - \hat\theta_i)^2\right] \qquad i = 1, 2, \ldots, p$$

where the expectation is with respect to $p(x, \theta_i)$. Since we are actually determining $p$ separate estimators, the scalar solution can be applied, and we obtain from (12.6)

$$\hat\theta_i = E(\theta_i) + C_{\theta_i x} C_{xx}^{-1}\left(x - E(x)\right) \qquad i = 1, 2, \ldots, p$$

and the minimum Bayesian MSE is from (12.8)

$$\mathrm{Bmse}(\hat\theta_i) = C_{\theta_i\theta_i} - C_{\theta_i x} C_{xx}^{-1} C_{x\theta_i} \qquad i = 1, 2, \ldots, p.$$

The scalar LMMSE estimators can be combined into a vector estimator as

$$\hat\theta = \begin{bmatrix} E(\theta_1) \\ E(\theta_2) \\ \vdots \\ E(\theta_p) \end{bmatrix} + \begin{bmatrix} C_{\theta_1 x} \\ C_{\theta_2 x} \\ \vdots \\ C_{\theta_p x} \end{bmatrix} C_{xx}^{-1}\left(x - E(x)\right)$$

or

$$\hat\theta = E(\theta) + C_{\theta x} C_{xx}^{-1}\left(x - E(x)\right) \qquad (12.20)$$


where now $C_{\theta x}$ is a $p \times N$ matrix. By a similar approach we find that the Bayesian MSE matrix is (see Problem 12.7)

$$M_{\hat\theta} = E\left[(\theta - \hat\theta)(\theta - \hat\theta)^T\right] = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} \qquad (12.21)$$

where $C_{\theta\theta}$ is the $p \times p$ covariance matrix. Consequently, the minimum Bayesian MSE is (see Problem 12.7)

$$\mathrm{Bmse}(\hat\theta_i) = \left[M_{\hat\theta}\right]_{ii}. \qquad (12.22)$$

Of course, these results are identical to those for the MMSE estimator in the Gaussian case, for which the estimator is linear. Note that to determine the LMMSE estimator we require only the first two moments of the PDF.

Two properties of the LMMSE estimator are particularly useful. The first one states that the LMMSE estimator commutes over linear (actually affine) transformations. This is to say that if

$$\alpha = A\theta + b$$

then the LMMSE estimator of $\alpha$ is

$$\hat\alpha = A\hat\theta + b \qquad (12.23)$$

with $\hat\theta$ given by (12.20). The second property states that the LMMSE estimator of a sum of unknown parameters is the sum of the individual estimators. Specifically, if we wish to estimate $\alpha = \theta_1 + \theta_2$, then

$$\hat\alpha = \hat\theta_1 + \hat\theta_2 \qquad (12.24)$$

where

$$\hat\theta_1 = E(\theta_1) + C_{\theta_1 x} C_{xx}^{-1}\left(x - E(x)\right)$$

$$\hat\theta_2 = E(\theta_2) + C_{\theta_2 x} C_{xx}^{-1}\left(x - E(x)\right).$$

The proof of these properties is left to the reader as an exercise (see Problem 12.8).

In analogy with the BLUE there is a corresponding Gauss-Markov theorem for the Bayesian case. It asserts that for data having the Bayesian linear model form, i.e., the Bayesian linear model without the Gaussian assumption, an optimal linear estimator exists. Optimality is measured by the Bayesian MSE. The theorem is just the application of our LMMSE estimator to the Bayesian linear model. More specifically, the data are assumed to be

$$x = H\theta + w$$

where $\theta$ is a random vector to be estimated and has mean $E(\theta)$ and covariance $C_{\theta\theta}$, $H$ is a known observation matrix, $w$ is a random vector with zero mean and covariance $C_w$, and $\theta$ and $w$ are uncorrelated. Then, the LMMSE estimator of $\theta$ is given by (12.20), where

$$E(x) = HE(\theta)$$

$$C_{xx} = HC_{\theta\theta}H^T + C_w$$

$$C_{\theta x} = C_{\theta\theta}H^T.$$

(See Section 10.6 for details.) This is summarized by the Bayesian Gauss-Markov theorem.

Theorem 12.1 (Bayesian Gauss-Markov Theorem) If the data are described by the Bayesian linear model form

$$x = H\theta + w \qquad (12.25)$$

where $x$ is an $N \times 1$ data vector, $H$ is a known $N \times p$ observation matrix, $\theta$ is a $p \times 1$ random vector of parameters whose realization is to be estimated and has mean $E(\theta)$ and covariance matrix $C_{\theta\theta}$, and $w$ is an $N \times 1$ random vector with zero mean and covariance matrix $C_w$ and is uncorrelated with $\theta$ (the joint PDF $p(w, \theta)$ is otherwise arbitrary), then the LMMSE estimator of $\theta$ is

$$\hat\theta = E(\theta) + C_{\theta\theta}H^T\left(HC_{\theta\theta}H^T + C_w\right)^{-1}\left(x - HE(\theta)\right) \qquad (12.26)$$

$$= E(\theta) + \left(C_{\theta\theta}^{-1} + H^T C_w^{-1} H\right)^{-1} H^T C_w^{-1}\left(x - HE(\theta)\right). \qquad (12.27)$$

The performance of the estimator is measured by the error $\epsilon = \theta - \hat\theta$ whose mean is zero and whose covariance matrix is

$$C_\epsilon = E_{x,\theta}(\epsilon\epsilon^T) = C_{\theta\theta} - C_{\theta\theta}H^T\left(HC_{\theta\theta}H^T + C_w\right)^{-1}HC_{\theta\theta} \qquad (12.28)$$

$$= \left(C_{\theta\theta}^{-1} + H^T C_w^{-1} H\right)^{-1}. \qquad (12.29)$$

The error covariance matrix is also the minimum MSE matrix $M_{\hat\theta}$ whose diagonal elements yield the minimum Bayesian MSE

$$\left[M_{\hat\theta}\right]_{ii} = \left[C_\epsilon\right]_{ii} = \mathrm{Bmse}(\hat\theta_i). \qquad (12.30)$$

These results are identical to those in Theorem 11.1 for the Bayesian linear model except that the error vector is not necessarily Gaussian. An example of the determination of this estimator and its minimum Bayesian MSE has already been given in Section 10.6. The Bayesian Gauss-Markov theorem states that within the class of linear estimators the one that minimizes the Bayesian MSE for each element of $\theta$ is given by (12.26) or (12.27). It will not be optimal unless the conditional expectation $E(\theta|x)$ happens to be linear. Such was the case for the jointly Gaussian PDF. Although suboptimal, the LMMSE estimator is in practice quite useful, being available in closed form and depending only on the means and covariances.
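For data in the Bayesian linear model form, (12.26) and (12.28) can be coded directly from $H$, $C_{\theta\theta}$, and $C_w$. The sketch below is a minimal implementation under those model assumptions; the function and variable names are choices made here, not notation from the text.

```python
import numpy as np

def bayesian_gauss_markov(x, H, mean_theta, C_theta, C_w):
    """LMMSE estimator (12.26) and error covariance (12.28) for x = H theta + w."""
    S = H @ C_theta @ H.T + C_w                          # C_xx for the linear model
    G = C_theta @ H.T @ np.linalg.inv(S)                 # C_theta_theta H^T S^{-1}
    theta_hat = mean_theta + G @ (x - H @ mean_theta)    # (12.26)
    C_eps = C_theta - G @ H @ C_theta                    # (12.28)
    return theta_hat, C_eps
```

The alternative forms (12.27) and (12.29) give the same result but require inverting a $p \times p$ rather than an $N \times N$ matrix, which is preferable when $p$ is much smaller than $N$.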


12.6 Sequential LMMSE Estimation

In Chapter 8 we discussed the sequential LS procedure as the process of updating in time the LSE estimator as new data become available. An analogous procedure can be used for LMMSE estimators. That this is possible follows from the vector space viewpoint. In Example 12.2 the LMMSE estimator was obtained by adding to the old estimate $\hat\theta_0$, that based on $x[0]$, the estimate $\hat\theta_1$, that based on the new datum $x[1]$. This was possible because $x[0]$ and $x[1]$ were orthogonal. When they are not, the algebra becomes more tedious, although the approach is similar. Before stating the general results we will illustrate the basic operations involved by considering the DC level in white noise, which has the Bayesian linear model form. The derivation will be purely algebraic. Then we will repeat the derivation but appeal to the vector space approach. The reason for doing so will be to "lay the groundwork" for the Kalman filter in Chapter 13, which will be derived using the same approach. We will assume a zero mean for the DC level $A$, so that the vector space approach is applicable.

To begin the algebraic derivation we have from Example 10.1 with $\mu_A = 0$ (see (10.11))

$$\hat A[N-1] = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\frac{1}{N}\sum_{n=0}^{N-1} x[n]$$

where $\hat A[N-1]$ denotes the LMMSE estimator based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is because the LMMSE estimator is identical in form to the MMSE estimator for the Gaussian case. Also, we note that from (10.14)

$$\mathrm{Bmse}(\hat A[N-1]) = \frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2}. \qquad (12.31)$$

To update our estimator as $x[N]$ becomes available

$$\hat A[N] = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N+1}}\,\frac{1}{N+1}\sum_{n=0}^{N} x[n]$$

$$= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(\sum_{n=0}^{N-1} x[n] + x[N]\right)$$

$$= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(\frac{N\sigma_A^2 + \sigma^2}{\sigma_A^2}\hat A[N-1] + x[N]\right)$$

$$= \frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2}\hat A[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}x[N]$$

$$= \hat A[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(x[N] - \hat A[N-1]\right). \qquad (12.32)$$


Similar to the sequential LS estimator we correct the old estimate $\hat A[N-1]$ by a scaled version of the prediction error $x[N] - \hat A[N-1]$. The scaling or gain factor is from (12.32) and (12.31)

$$K[N] = \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2} = \frac{\mathrm{Bmse}(\hat A[N-1])}{\mathrm{Bmse}(\hat A[N-1]) + \sigma^2} \qquad (12.33)$$

and decreases to zero as $N \to \infty$, reflecting the increased confidence in the old estimator. We can also update the minimum Bayesian MSE since from (12.31)

$$\mathrm{Bmse}(\hat A[N]) = \frac{\sigma_A^2\sigma^2}{(N+1)\sigma_A^2 + \sigma^2} = \frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2}\,\frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2} = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]).$$

Summarizing our results we have the following sequential LMMSE estimator.

Estimator Update:

$$\hat A[N] = \hat A[N-1] + K[N]\left(x[N] - \hat A[N-1]\right) \qquad (12.34)$$

where

$$K[N] = \frac{\mathrm{Bmse}(\hat A[N-1])}{\mathrm{Bmse}(\hat A[N-1]) + \sigma^2}. \qquad (12.35)$$

Minimum MSE Update:

$$\mathrm{Bmse}(\hat A[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]). \qquad (12.36)$$
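The recursion (12.34)-(12.36) amounts to a few lines per sample. The sketch below is an illustrative implementation for the zero mean DC level model of this section, initialized with the prior mean 0 and prior variance $\sigma_A^2$ (the initialization prescribed later for the general recursion).

```python
import numpy as np

def sequential_dc_level(x, sigma2, sigma_A2):
    """Sequential LMMSE estimation of a zero mean DC level A in white noise."""
    A_hat, bmse = 0.0, sigma_A2            # prior mean and prior variance
    for xn in x:
        K = bmse / (bmse + sigma2)         # gain (12.35)
        A_hat = A_hat + K * (xn - A_hat)   # estimator update (12.34)
        bmse = (1.0 - K) * bmse            # minimum MSE update (12.36)
    return A_hat, bmse
```

After processing $N$ samples the returned MSE equals $\sigma_A^2\sigma^2/(N\sigma_A^2 + \sigma^2)$, the batch result (12.31), as can be verified by iterating the update.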

We now use the vector space viewpoint to derive the same results. (The general sequential LMMSE estimator is derived in Appendix 12A using the vector space approach. It is a straightforward extension of the following derivation.) Assume that the LMMSE estimator $\hat A[1]$ is to be found, which is based on the data $\{x[0], x[1]\}$ as shown in Figure 12.5a. Because $x[0]$ and $x[1]$ are not orthogonal, we cannot simply add the estimate based on $x[0]$ to the estimate based on $x[1]$. If we did, we would be adding in an extra component along $x[0]$. However, we can form $\hat A[1]$ as the sum of $\hat A[0]$ and a component orthogonal to $\hat A[0]$ as shown in Figure 12.5b. That component is $\Delta\hat A[1]$. To find a vector in the direction of this component recall that the LMMSE estimator has the property that the error is orthogonal to the data. Hence, if we find the LMMSE estimator of $x[1]$ based on $x[0]$, call it $\hat x[1|0]$, the error $x[1] - \hat x[1|0]$ will be orthogonal to $x[0]$. This is shown in Figure 12.5c.

[Figure 12.5: Sequential estimation using the vector space approach]

Since $x[0]$ and $x[1] - \hat x[1|0]$ are orthogonal, we can project $A$ onto each vector separately and add the results, so that

$$\hat A[1] = \hat A[0] + \Delta\hat A[1].$$

To find the correction term $\Delta\hat A[1]$ first note that the LMMSE estimator of a random variable $y$ based on $x$, where both are zero mean, is from (12.7)

$$\hat y = \frac{E(xy)}{E(x^2)}x. \qquad (12.37)$$

Thus, we have that

$$\hat x[1|0] = \frac{E(x[0]x[1])}{E(x^2[0])}x[0] = \frac{E\left[(A + w[0])(A + w[1])\right]}{E\left[(A + w[0])^2\right]}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0] \qquad (12.38)$$


and the error vector $\tilde x[1] = x[1] - \hat x[1|0]$ represents the new information that $x[1]$ contributes to the estimation of $A$. As such, it is called the innovation. The projection of $A$ along this error vector is the desired correction

$$\Delta\hat A[1] = \left(A, \frac{\tilde x[1]}{\|\tilde x[1]\|}\right)\frac{\tilde x[1]}{\|\tilde x[1]\|} = \frac{E(A\tilde x[1])}{E(\tilde x^2[1])}\tilde x[1].$$

If we let $K[1] = E(A\tilde x[1])/E(\tilde x^2[1])$, we have for the LMMSE estimator

$$\hat A[1] = \hat A[0] + K[1]\left(x[1] - \hat x[1|0]\right).$$

To evaluate $\hat x[1|0]$ we note that $x[1] = A + w[1]$. Hence, from the additive property (12.24) $\hat x[1|0] = \hat A[0] + \hat w[1|0]$. Since $w[1]$ is uncorrelated with $w[0]$, we have $\hat w[1|0] = 0$. Finally, then $\hat x[1|0] = \hat A[0]$, and $\hat A[0]$ is found as

$$\hat A[0] = \frac{E(Ax[0])}{E(x^2[0])}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0].$$

Thus,

$$\hat A[1] = \hat A[0] + K[1]\left(x[1] - \hat A[0]\right).$$

It remains only to determine the gain for the correction. Since

$$\tilde x[1] = x[1] - \hat x[1|0] = x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]$$

the gain becomes

$$K[1] = \frac{E\left[A\left(x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)\right]}{E\left[\left(x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)^2\right]} = \frac{\sigma_A^2}{2\sigma_A^2 + \sigma^2}$$

which agrees with (12.33) for $N = 1$. We can continue this procedure to find $\hat A[2], \hat A[3], \ldots$. In general, we

1. find the LMMSE estimator of $A$ based on $x[0]$, yielding $\hat A[0]$

2. find the LMMSE estimator of $x[1]$ based on $x[0]$, yielding $\hat x[1|0]$

3. determine the innovation of the new datum $x[1]$, yielding $x[1] - \hat x[1|0]$

4. add to $\hat A[0]$ the LMMSE estimator of $A$ based on the innovation, yielding $\hat A[1]$

5. continue the process.

In essence we are generating a set of uncorrelated or orthogonal random variables, namely, the innovations $\{x[0],\; x[1] - \hat x[1|0],\; x[2] - \hat x[2|0,1],\; \ldots\}$. This procedure is termed a Gram-Schmidt orthogonalization (see Problem 12.10). To find the LMMSE estimator of $A$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ we simply add the individual estimators to yield

$$\hat A[N-1] = \sum_{n=0}^{N-1} K[n]\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right) \qquad (12.39)$$

where each gain factor is

$$K[n] = \frac{E\left[A\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right)\right]}{E\left[\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right)^2\right]} \qquad n = 0, 1, \ldots, N-1. \qquad (12.40)$$

This simple form is due to the uncorrelated property of the innovations. In sequential form (12.39) becomes

$$\hat A[N] = \hat A[N-1] + K[N]\left(x[N] - \hat x[N|0, 1, \ldots, N-1]\right). \qquad (12.41)$$

To complete the derivation we must find $\hat x[N|0, 1, \ldots, N-1]$ and the gain $K[N]$. We first use the additive property (12.24). Since $A$ and $w[N]$ are uncorrelated and zero mean, the LMMSE estimator of $x[N] = A + w[N]$ is

$$\hat x[N|0, 1, \ldots, N-1] = \hat A[N|0, 1, \ldots, N-1] + \hat w[N|0, 1, \ldots, N-1].$$

But $\hat A[N|0, 1, \ldots, N-1]$ is by definition the LMMSE estimator of the realization of $A$ at time $N$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ or $\hat A[N-1]$ (recalling that $A$ does not change over the observation interval). Also, $\hat w[N|0, 1, \ldots, N-1]$ is zero since $w[N]$ is uncorrelated with the previous data samples since $A$ and $w[n]$ are uncorrelated and $w[n]$ is white noise. Thus,

$$\hat x[N|0, 1, \ldots, N-1] = \hat A[N-1]. \qquad (12.42)$$

To find the gain we use (12.40) and (12.42):

$$K[N] = \frac{E\left[A\left(x[N] - \hat A[N-1]\right)\right]}{E\left[\left(x[N] - \hat A[N-1]\right)^2\right]}.$$

But

$$E\left[A\left(x[N] - \hat A[N-1]\right)\right] = E\left[\left(A - \hat A[N-1]\right)\left(x[N] - \hat A[N-1]\right)\right] \qquad (12.43)$$

since $x[N] - \hat A[N-1]$ is the innovation of $x[N]$ which is orthogonal to $\{x[0], x[1], \ldots, x[N-1]\}$ and hence to $\hat A[N-1]$ (a linear combination of these data samples). Also, $E[w[N](A - \hat A[N-1])] = 0$ as explained previously, so that

$$E\left[A\left(x[N] - \hat A[N-1]\right)\right] = E\left[\left(A - \hat A[N-1]\right)^2\right] = \mathrm{Bmse}(\hat A[N-1]). \qquad (12.44)$$

Also,

$$E\left[\left(x[N] - \hat A[N-1]\right)^2\right] = E\left[\left(w[N] + A - \hat A[N-1]\right)^2\right] = E(w^2[N]) + E\left[\left(A - \hat A[N-1]\right)^2\right] = \sigma^2 + \mathrm{Bmse}(\hat A[N-1]) \qquad (12.45)$$

so that finally

$$K[N] = \frac{\mathrm{Bmse}(\hat A[N-1])}{\sigma^2 + \mathrm{Bmse}(\hat A[N-1])}. \qquad (12.46)$$

The minimum MSE update can be obtained as

$$\mathrm{Bmse}(\hat A[N]) = E\left[\left(A - \hat A[N]\right)^2\right] = E\left[\left(A - \hat A[N-1] - K[N]\left(x[N] - \hat A[N-1]\right)\right)^2\right]$$

$$= E\left[\left(A - \hat A[N-1]\right)^2\right] - 2K[N]E\left[\left(A - \hat A[N-1]\right)\left(x[N] - \hat A[N-1]\right)\right] + K^2[N]E\left[\left(x[N] - \hat A[N-1]\right)^2\right]$$

$$= \mathrm{Bmse}(\hat A[N-1]) - 2K[N]\,\mathrm{Bmse}(\hat A[N-1]) + K^2[N]\left(\sigma^2 + \mathrm{Bmse}(\hat A[N-1])\right)$$

where we have used (12.43)-(12.45). Using (12.46), we have

$$\mathrm{Bmse}(\hat A[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]).$$

We have now derived the sequential LMMSE estimator equations based on the vector space viewpoint.

In Appendix 12A we generalize the preceding vector space derivation for the sequential vector LMMSE estimator. To do so we must assume the data has the Bayesian linear model form (see Theorem 12.1) and that $C_w$ is a diagonal matrix, so that the $w[n]$'s are uncorrelated with variance $E(w^2[n]) = \sigma_n^2$. The latter is a critical assumption and was used in the previous example. Fortuitously, the set of equations obtained are valid for the case of nonzero means as well (see Appendix 12A). Hence, the equations to follow are the sequential implementation of the vector LMMSE estimator of (12.26) or (12.27) when the noise covariance matrix $C_w$ is diagonal. To define the equations we let $\hat\theta[n]$ be the LMMSE estimator based on $\{x[0], x[1], \ldots, x[n]\}$, and $M[n]$ be the corresponding minimum MSE matrix (just the sequential version of (12.28) or (12.29)) or

$$M[n] = E\left[(\theta - \hat\theta[n])(\theta - \hat\theta[n])^T\right].$$

Also, we partition the $(n+1) \times p$ observation matrix as

$$H[n] = \begin{bmatrix} H[n-1] \\ h^T[n] \end{bmatrix}$$

where $H[n-1]$ has dimension $n \times p$ and $h^T[n]$ is $1 \times p$. Then, the sequential LMMSE estimator becomes (see Appendix 12A)

Estimator Update:

$$\hat\theta[n] = \hat\theta[n-1] + K[n]\left(x[n] - h^T[n]\hat\theta[n-1]\right) \qquad (12.47)$$

where

$$K[n] = \frac{M[n-1]h[n]}{\sigma_n^2 + h^T[n]M[n-1]h[n]}. \qquad (12.48)$$

Minimum MSE Matrix Update:

$$M[n] = \left(I - K[n]h^T[n]\right)M[n-1]. \qquad (12.49)$$

The gain factor $K[n]$ is a $p \times 1$ vector, and the minimum MSE matrix has dimension $p \times p$.

[Figure 12.6: Sequential linear minimum mean square error estimator]

The entire estimator is summarized in Figure 12.6, where the thick arrows indicate vector processing. To start the recursion we need to specify initial values for $\hat\theta[n-1]$ and $M[n-1]$, so that $K[n]$ can be determined from (12.48) and then $\hat\theta[n]$ from (12.47). To do so we specify $\hat\theta[-1]$ and $M[-1]$ so that the recursion can begin at $n = 0$.


Since no data have been observed at $n = -1$, then from (12.19) our estimator becomes the constant $\hat\theta_i = a_{iN}$. It is easily shown that the LMMSE estimator is just the mean of $\theta$ (see also (12.3)) or

$$\hat\theta[-1] = E(\theta).$$

As a result, the minimum MSE matrix is

$$M[-1] = E\left[(\theta - \hat\theta[-1])(\theta - \hat\theta[-1])^T\right] = C_{\theta\theta}.$$

Several observations are in order.

1. For no prior knowledge about $\theta$ we can let $C_{\theta\theta} \to \infty$. Then, we have the same form as the sequential LSE (see Section 8.7), although the approaches are fundamentally different.

2. No matrix inversions are required.

3. The gain factor $K[n]$ depends on our confidence in the new data sample as measured by $\sigma_n^2$ versus that in the previous data as summarized by $M[n-1]$.
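A direct translation of (12.47)-(12.49) is sketched below. It is an illustrative implementation (the function and argument names are not from the text); as noted in the observations above, the only division is by the scalar $\sigma_n^2 + h^T[n]M[n-1]h[n]$, so no matrix inversion is needed.

```python
import numpy as np

def sequential_lmmse(x, H, sigma2, theta_mean, C_theta):
    """Sequential vector LMMSE estimator, equations (12.47)-(12.49).

    x          : data x[0], ..., x[N-1], shape (N,)
    H          : observation matrix, shape (N, p); row n is h^T[n]
    sigma2     : noise variances sigma_n^2, shape (N,) (C_w assumed diagonal)
    theta_mean : prior mean E(theta), shape (p,)
    C_theta    : prior covariance C_theta_theta, shape (p, p)
    """
    theta_hat = np.array(theta_mean, dtype=float)           # theta_hat[-1] = E(theta)
    M = np.array(C_theta, dtype=float)                       # M[-1] = C_theta_theta
    I = np.eye(len(theta_hat))
    for n in range(len(x)):
        h = H[n]
        K = M @ h / (sigma2[n] + h @ M @ h)                  # gain (12.48), a p x 1 vector
        theta_hat = theta_hat + K * (x[n] - h @ theta_hat)   # estimator update (12.47)
        M = (I - np.outer(K, h)) @ M                         # MSE matrix update (12.49)
    return theta_hat, M
```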

An example follows.

Example 12.3 - Bayesian Fourier Analysis

We now use sequential LS to compute the LMMSE estimator in Example 11.1. The data model is

$$x[n] = a\cos 2\pi f_0 n + b\sin 2\pi f_0 n + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $f_0$ is any frequency (in Example 11.1 we assumed $f_0$ to be a multiple of $1/N$), excepting 0 and 1/2 for which $\sin 2\pi f_0 n$ is identically zero, and $w[n]$ is white noise with variance $\sigma^2$. It is desired to estimate $\theta = [a\; b]^T$ sequentially. We further assume that the Bayesian linear model form applies and that $\theta$ has mean $E(\theta)$ and covariance $\sigma_\theta^2 I$. The sequential estimator is initialized by

$$\hat\theta[-1] = E(\theta)$$

$$M[-1] = C_{\theta\theta} = \sigma_\theta^2 I.$$

Then, from (12.47)

$$\hat\theta[0] = \hat\theta[-1] + K[0]\left(x[0] - h^T[0]\hat\theta[-1]\right).$$

The $h^T[0]$ vector is the first row of the observation matrix or

$$h^T[0] = [1 \;\; 0].$$

Subsequent rows are

$$h^T[n] = [\cos 2\pi f_0 n \;\; \sin 2\pi f_0 n].$$

The $2 \times 1$ gain vector is, from (12.48),

$$K[0] = \frac{M[-1]h[0]}{\sigma^2 + h^T[0]M[-1]h[0]}$$

where $M[-1] = \sigma_\theta^2 I$. Once the gain vector has been found, $\hat\theta[0]$ can be computed. Finally, the MSE matrix update is, from (12.49),

$$M[0] = \left(I - K[0]h^T[0]\right)M[-1]$$

and the procedure continues in like manner for $n \geq 1$. ◇

12.7 Signal Processing Examples - Wiener Filtering

We now examine in detail some of the important applications of the LMMSE estimator. In doing so we will assume that the data $\{x[0], x[1], \ldots, x[N-1]\}$ is WSS with zero mean. As such, the $N \times N$ covariance matrix $C_{xx}$ takes the symmetric Toeplitz form

$$C_{xx} = \begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} = R_{xx} \qquad (12.50)$$

where $r_{xx}[k]$ is the ACF of the $x[n]$ process and $R_{xx}$ denotes the autocorrelation matrix. Furthermore, the parameter $\theta$ to be estimated is also assumed to be zero mean. The group of applications that we will describe are generically called Wiener filters.

There are three main problems that we will study. They are (see Figure 12.7)

1. Filtering, where $\theta = s[n]$ is to be estimated based on $x[m] = s[m] + w[m]$ for $m = 0, 1, \ldots, n$. The sequences $s[n]$ and $w[n]$ represent signal and noise processes. The problem is to filter the signal from the noise. Note that the signal sample is estimated based on the present and past data only, so that as $n$ increases, we view the estimation process as the application of a causal filter to the data.

2. Smoothing, where $\theta = s[n]$ is to be estimated for $n = 0, 1, \ldots, N-1$ based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$, where $x[n] = s[n] + w[n]$. In contrast to filtering we are allowed to use future data. For instance, to estimate $s[1]$ we can use the entire data set $\{x[0], x[1], \ldots, x[N-1]\}$, while in filtering we are constrained to use only $\{x[0], x[1]\}$. Clearly, in smoothing an estimate cannot be obtained until all the data has been collected.

3. Prediction, where $\theta = x[N-1+l]$ for $l$ a positive integer is to be estimated based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is referred to as the $l$-step prediction problem.

[Figure 12.7: Wiener filtering problem definitions - (a) filtering, (b) smoothing, (c) prediction, (d) interpolation]

A related problem that is explored in Problem 12.14 is to estimate $x[n]$ based on $\{x[0], \ldots, x[n-1], x[n+1], \ldots, x[N-1]\}$ and is termed the interpolation problem. To solve all three problems we use (12.20) with $E(\theta) = E(x) = 0$ or

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x \qquad (12.51)$$

and the minimum Bayesian MSE matrix given by (12.21)

$$M_{\hat\theta} = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}. \qquad (12.52)$$

Consider first the smoothing problem. We wish to estimate $\theta = s = [s[0]\; s[1]\; \ldots\; s[N-1]]^T$ based on $x = [x[0]\; x[1]\; \ldots\; x[N-1]]^T$. We will make the reasonable assumption that the signal and noise processes are uncorrelated. Hence,

$$r_{xx}[k] = r_{ss}[k] + r_{ww}[k].$$


Then, we have

$$C_{xx} = R_{xx} = R_{ss} + R_{ww}.$$

Also,

$$C_{\theta x} = E(sx^T) = E\left(s(s + w)^T\right) = R_{ss}.$$

Therefore, the Wiener estimator of the signal is, from (12.51),

$$\hat s = R_{ss}\left(R_{ss} + R_{ww}\right)^{-1} x. \qquad (12.53)$$

The $N \times N$ matrix

$$W = R_{ss}\left(R_{ss} + R_{ww}\right)^{-1} \qquad (12.54)$$

is referred to as the Wiener smoothing matrix. The corresponding minimum MSE matrix is, from (12.52),

$$M_{\hat s} = R_{ss} - R_{ss}\left(R_{ss} + R_{ww}\right)^{-1}R_{ss} = (I - W)R_{ss}. \qquad (12.55)$$

As an example, if $N = 1$ we would wish to estimate $s[0]$ based on $x[0] = s[0] + w[0]$. Then, the Wiener smoother $W$ is just a scalar $W$ given by

$$W = \frac{r_{ss}[0]}{r_{ss}[0] + r_{ww}[0]} = \frac{\eta}{\eta + 1} \qquad (12.56)$$

where $\eta = r_{ss}[0]/r_{ww}[0]$ is the SNR. For a high SNR so that $W \to 1$, we have $\hat s[0] \to x[0]$, while for a low SNR so that $W \to 0$, we have $\hat s[0] \to 0$. The corresponding minimum MSE is

$$M_{\hat s} = (1 - W)r_{ss}[0] = \left(1 - \frac{\eta}{\eta + 1}\right)r_{ss}[0] = \frac{r_{ss}[0]}{\eta + 1}$$

which for these two extremes is either 0 for a high SNR or $r_{ss}[0]$ for a low SNR. A numerical example of a Wiener smoother has been given in Section 11.7. See also Problem 12.15.
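The Wiener smoothing matrix (12.54) and its minimum MSE matrix (12.55) require only the two autocorrelation sequences. The sketch below is illustrative: it assumes an exponential signal ACF and white noise (choices made here, not taken from the text), builds the Toeplitz matrices, and applies $W$ to a synthetic data record.

```python
import numpy as np
from scipy.linalg import toeplitz

N, sigma2 = 50, 1.0
k = np.arange(N)
r_ss = 0.95 ** k                    # assumed signal ACF r_ss[k] (for illustration)
r_ww = sigma2 * (k == 0)            # white noise ACF r_ww[k] = sigma2 * delta[k]

R_ss, R_ww = toeplitz(r_ss), toeplitz(r_ww)

W = R_ss @ np.linalg.inv(R_ss + R_ww)            # Wiener smoothing matrix (12.54)
M_s = (np.eye(N) - W) @ R_ss                     # minimum MSE matrix (12.55)

rng = np.random.default_rng(2)
s = rng.multivariate_normal(np.zeros(N), R_ss)   # synthetic signal with ACF r_ss
x = s + np.sqrt(sigma2) * rng.standard_normal(N)
s_hat = W @ x                                    # smoothed estimate (12.53)
```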

We next consider the filtering problem in which we wish to estimate $\theta = s[n]$ based on $x = [x[0]\; x[1]\; \ldots\; x[n]]^T$. The problem is repeated for each value of $n$ until the entire signal $s[n]$ for $n = 0, 1, \ldots, N-1$ has been estimated. As before, $x[n] = s[n] + w[n]$, where $s[n]$ and $w[n]$ are signal and noise processes that are uncorrelated with each other. Thus,

$$C_{xx} = R_{ss} + R_{ww}$$

where $R_{ss}, R_{ww}$ are $(n+1) \times (n+1)$ autocorrelation matrices. Also,

$$C_{\theta x} = E\left(s[n]\left[x[0]\; x[1]\; \ldots\; x[n]\right]\right) = E\left(s[n]\left[s[0]\; s[1]\; \ldots\; s[n]\right]\right) = \left[r_{ss}[n]\; r_{ss}[n-1]\; \ldots\; r_{ss}[0]\right].$$

Letting the latter row vector be denoted as $r_{ss}'^T$, we have from (12.51)

$$\hat s[n] = r_{ss}'^T\left(R_{ss} + R_{ww}\right)^{-1} x. \qquad (12.57)$$


The $(n+1) \times 1$ vector of weights is seen to be $a = (R_{ss} + R_{ww})^{-1} r_{ss}'$, recalling that

$$\hat s[n] = a^T x$$

where $a = [a_0\; a_1\; \ldots\; a_n]^T$ as in our original formulation of the scalar LMMSE estimator of (12.1). We may interpret the process of forming the estimator as $n$ increases as a filtering operation if we define a time varying impulse response $h^{(n)}[k]$. Specifically, we let $h^{(n)}[k]$ be the response of a filter at time $n$ to an impulse applied $k$ samples before. To make the filtering correspondence we let

$$h^{(n)}[k] = a_{n-k} \qquad k = 0, 1, \ldots, n.$$

Note for future reference that the vector $h = [h^{(n)}[0]\; h^{(n)}[1]\; \ldots\; h^{(n)}[n]]^T$ is just the vector $a$ when flipped upside down. Then,

$$\hat s[n] = \sum_{k=0}^{n} a_k x[k] = \sum_{k=0}^{n} h^{(n)}[n-k]x[k]$$

or

$$\hat s[n] = \sum_{k=0}^{n} h^{(n)}[k]x[n-k] \qquad (12.58)$$

which is recognized as a time varying FIR filter. To explicitly find the impulse response $h$ we note that since

$$\left(R_{ss} + R_{ww}\right)a = r_{ss}'$$

it follows that

$$\left(R_{ss} + R_{ww}\right)h = r_{ss}$$

where $r_{ss} = [r_{ss}[0]\; r_{ss}[1]\; \ldots\; r_{ss}[n]]^T$. This result depends on the symmetric Toeplitz nature of $R_{ss} + R_{ww}$, and the fact that $h$ is just $a$ when flipped upside down. Written out, the set of linear equations becomes

$$\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[n] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[n-1] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[n] & r_{xx}[n-1] & \cdots & r_{xx}[0] \end{bmatrix}\begin{bmatrix} h^{(n)}[0] \\ h^{(n)}[1] \\ \vdots \\ h^{(n)}[n] \end{bmatrix} = \begin{bmatrix} r_{ss}[0] \\ r_{ss}[1] \\ \vdots \\ r_{ss}[n] \end{bmatrix} \qquad (12.59)$$

where $r_{xx}[k] = r_{ss}[k] + r_{ww}[k]$. These are the Wiener-Hopf filtering equations. It appears that they must be solved for each value of $n$. However, a computationally efficient algorithm for doing so is the Levinson recursion, which solves the equations recursively to avoid resolving them for each value of $n$ [Marple 1987].
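Because the matrix in (12.59) is symmetric Toeplitz, standard routines that exploit this structure can be used; scipy.linalg.solve_toeplitz, for instance, is based on a Levinson-type recursion. The sketch below is illustrative (the exponential signal ACF is an assumption carried over from the earlier sketch): it solves (12.59) for one value of $n$ and forms $\hat s[n]$ from (12.58).

```python
import numpy as np
from scipy.linalg import solve_toeplitz

n, sigma2 = 20, 1.0
k = np.arange(n + 1)
r_ss = 0.95 ** k                      # assumed signal ACF (illustrative)
r_xx = r_ss + sigma2 * (k == 0)       # r_xx[k] = r_ss[k] + r_ww[k]

# Wiener-Hopf filtering equations (12.59): Toeplitz(r_xx) h = r_ss.
h = solve_toeplitz(r_xx, r_ss)        # h^(n)[0], ..., h^(n)[n]

# Time varying FIR filter (12.58) applied to a data record x[0..n].
rng = np.random.default_rng(3)
x = rng.standard_normal(n + 1)        # placeholder data
s_hat_n = np.dot(h, x[::-1])          # sum_k h^(n)[k] x[n-k]
```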


For large enough $n$ it can be shown that the filter becomes time invariant so that only a single solution is necessary. In this case an analytical solution may be possible. The Wiener-Hopf filtering equations can be written as

$$\sum_{k=0}^{n} h^{(n)}[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots, n$$

where we have used the property $r_{xx}[-k] = r_{xx}[k]$. As $n \to \infty$, we have upon replacing the time varying impulse response $h^{(n)}[k]$ by its time invariant version $h[k]$

$$\sum_{k=0}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots. \qquad (12.60)$$

The same set of equations result if we attempt to estimate $s[n]$ based on the present and infinite past or based on $x[m]$ for $m \leq n$. This is termed the infinite Wiener filter. To see why this is so we let

$$\hat s[n] = \sum_{k=0}^{\infty} h[k]x[n-k]$$

and use the orthogonality principle (12.13). Then, we have

$$s[n] - \hat s[n] \perp \ldots, x[n-1], x[n]$$

or by using the definition of orthogonality

$$E\left[(s[n] - \hat s[n])x[n-l]\right] = 0 \qquad l = 0, 1, \ldots.$$

Hence,

$$E\left[\sum_{k=0}^{\infty} h[k]x[n-k]x[n-l]\right] = E(s[n]x[n-l])$$

and therefore, the equations to be solved for the infinite Wiener filter impulse response are

$$\sum_{k=0}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots.$$

A little thought will convince the reader that the solutions must be identical since the problem of estimating $s[n]$ based on $x[m]$ for $0 \leq m \leq n$ as $n \to \infty$ is really just that of using the present and infinite past to estimate the current sample. The time invariance of the filter makes the solution independent of which sample is to be estimated or independent of $n$. The solution of (12.60) utilizes spectral factorization and is explored in Problem 12.16. At first glance it might appear that (12.60) could be solved by using Fourier transform techniques since the left-hand side of the equation is a convolution of two sequences. This is not the case, however, since the equations hold only for $l \geq 0$. If, indeed, they were valid for $l < 0$ as well, then the Fourier transform approach would be viable. Such a set of equations arises in the smoothing problem in which $s[n]$ is


to be estimated based on $\{\ldots, x[-1], x[0], x[1], \ldots\}$ or $x[k]$ for all $k$. In this case the smoothing estimator takes the form

$$\hat s[n] = \sum_{k=-\infty}^{\infty} a_k x[k]$$

and by letting $h[k] = a_{n-k}$ we have the convolution sum

$$\hat s[n] = \sum_{k=-\infty}^{\infty} h[k]x[n-k]$$

where $h[k]$ is the impulse response of an infinite two-sided time invariant filter. The Wiener-Hopf equations become

$$\sum_{k=-\infty}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad -\infty < l < \infty \qquad (12.61)$$

(see Problem 12.17). The difference from the filtering case is that now the equations must be satisfied for all $l$, and there is no constraint that $h[k]$ must be causal. Hence, we can use Fourier transform techniques to solve for the impulse response since (12.61) becomes

$$h[n] * r_{xx}[n] = r_{ss}[n] \qquad (12.62)$$

where $*$ denotes convolution. Letting $H(f)$ be the frequency response of the infinite Wiener smoother, we have upon taking the Fourier transform of (12.62)

$$H(f) = \frac{P_{ss}(f)}{P_{xx}(f)} = \frac{P_{ss}(f)}{P_{ss}(f) + P_{ww}(f)}.$$

The frequency response is a real even function of frequency, so that the impulse response must also be real and even. This means that the filter is not causal. Of course, this is to be expected since we sought to estimate $s[n]$ using future as well as present and past data. Before leaving this topic it is worthwhile to point out that the Wiener smoother emphasizes portions of the frequency spectrum of the data where the SNR is high and attenuates those where it is low. This is evident if we define the "local SNR" as the SNR in a narrow band of frequencies centered about $f$ as

$$\eta(f) = \frac{P_{ss}(f)}{P_{ww}(f)}.$$

Then, the optimal filter frequency response becomes

$$H(f) = \frac{\eta(f)}{\eta(f) + 1}.$$

Clearly, the filter response satisfies $0 < H(f) < 1$, and thus the Wiener smoother response is $H(f) \approx 0$ when $\eta(f) \approx 0$ (low local SNR) and $H(f) \approx 1$ when $\eta(f) \to \infty$ (high local SNR). The reader may wish to compare these results to those of the Wiener filter.
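The infinite Wiener smoother is most easily applied in the frequency domain: form $H(f) = P_{ss}(f)/(P_{ss}(f) + P_{ww}(f))$ on a frequency grid and multiply. The sketch below is a rough illustration only; the PSDs are assumed (an AR(1)-like signal spectrum in white noise), and the circular convolution implied by the FFT is ignored.

```python
import numpy as np

N, sigma2 = 1024, 1.0
f = np.fft.fftfreq(N)                            # frequencies in cycles/sample

# Assumed PSDs: low-pass signal spectrum plus white noise (illustrative).
P_ss = 1.0 / np.abs(1.0 - 0.9 * np.exp(-2j * np.pi * f)) ** 2
P_ww = sigma2 * np.ones(N)

H = P_ss / (P_ss + P_ww)                         # Wiener smoother frequency response

rng = np.random.default_rng(4)
x = rng.standard_normal(N)                       # placeholder noisy data record
s_hat = np.fft.ifft(H * np.fft.fft(x)).real      # approximate smoothed output
```

Note that $H(f) = \eta(f)/(\eta(f) + 1)$ here, so frequencies where the assumed local SNR is low are attenuated, as discussed above.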
