
12.3 Linear MMSE Estimation

We begin our discussion by assuming a scalar parameter $\theta$ is to be estimated based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$ or in vector form $x = [x[0]\; x[1]\; \ldots\; x[N-1]]^T$. The unknown parameter is modeled as the realization of a random variable. We do not assume any specific form for the joint PDF $p(x, \theta)$, but as we shall see shortly, only a knowledge of the first two moments. That $\theta$ may be estimated from $x$ is due to the assumed statistical dependence of $\theta$ on $x$ as summarized by the joint PDF $p(x, \theta)$, and in particular, for a linear estimator we rely on the correlation between $\theta$ and $x$. We now consider the class of all linear (actually affine) estimators of the form

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n] + a_N \qquad (12.1)$$

and choose the weighting coefficients $a_n$'s to minimize the Bayesian MSE

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] \qquad (12.2)$$

where the expectation is with respect to the PDF $p(x, \theta)$. The resultant estimator is termed the linear minimum mean square error (LMMSE) estimator. Note that we have included the $a_N$ coefficient to allow for nonzero means of $x$ and $\theta$. If the means are both zero, then this coefficient may be omitted, as will be shown later.

Before determining the LMMSE estimator we should keep in mind that the estimator will be suboptimal unless the MMSE estimator happens to be linear. Such would be the case, for example, if the Bayesian linear model applied (see Section 10.6). Otherwise, better estimators will exist, although they will be nonlinear (see the introductory example in Section 10.3). Since the LMMSE estimator relies on the correlation between random variables, a parameter uncorrelated with the data cannot be linearly estimated. Consequently, the proposed approach is not always feasible. This is illustrated by the following example. Consider a parameter $\theta$ to be estimated based on the single data sample $x[0]$, where $x[0] \sim \mathcal{N}(0, \sigma^2)$. If the parameter to be estimated is the power of the $x[0]$ realization or $\theta = x^2[0]$, then a perfect estimator will be

$$\hat\theta = x^2[0]$$

since the minimum Bayesian MSE will be zero. This estimator is clearly nonlinear. If, however, we attempt to use a LMMSE estimator or

$$\hat\theta = a_0 x[0] + a_1$$

then the optimal weighting coefficients $a_0$ and $a_1$ can be found by minimizing

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] = E\left[(\theta - a_0 x[0] - a_1)^2\right].$$


We differentiate this with respect to $a_0$ and $a_1$ and set the results equal to zero to produce

$$E[(\theta - a_0 x[0] - a_1)x[0]] = 0$$

$$E(\theta - a_0 x[0] - a_1) = 0$$

or

$$a_0 E(x^2[0]) + a_1 E(x[0]) = E(\theta x[0])$$

$$a_0 E(x[0]) + a_1 = E(\theta).$$

But $E(x[0]) = 0$ and $E(\theta x[0]) = E(x^3[0]) = 0$, so that

$$a_0 = 0$$

$$a_1 = E(\theta) = E(x^2[0]) = \sigma^2.$$

Therefore, the LMMSE estimator is $\hat\theta = \sigma^2$ and does not depend on the data. This is because $\theta$ and $x[0]$ are uncorrelated. The minimum MSE is

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right] = E\left[(\theta - \sigma^2)^2\right] = E\left[(x^2[0] - \sigma^2)^2\right] = E(x^4[0]) - 2\sigma^2 E(x^2[0]) + \sigma^4 = 3\sigma^4 - 2\sigma^4 + \sigma^4 = 2\sigma^4$$

as opposed to a minimum MSE of zero for the nonlinear estimator $\hat\theta = x^2[0]$. Clearly, the LMMSE estimator is inappropriate for this problem. Problem 12.1 explores how to modify the LMMSE estimator to make it applicable.

We now derive the optimal weighting coefficients for use in (12.1). Substituting (12.1) into (12.2) and differentiating

$$\frac{\partial}{\partial a_N} E\left\{\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right]^2\right\} = -2E\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right].$$

Setting this equal to zero produces

$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]) \qquad (12.3)$$

which as asserted earlier is zero if the means are zero. Continuing, we need to minimize

$$\mathrm{Bmse}(\hat\theta) = E\left\{\left[\sum_{n=0}^{N-1} a_n\left(x[n] - E(x[n])\right) - \left(\theta - E(\theta)\right)\right]^2\right\}$$


over the remaining $a_n$'s, where $a_N$ has been replaced by (12.3). Letting $a = [a_0\; a_1\; \ldots\; a_{N-1}]^T$, we have

$$\mathrm{Bmse}(\hat\theta) = E\left\{\left[a^T(x - E(x)) - (\theta - E(\theta))\right]^2\right\}$$

$$= E\left[a^T(x - E(x))(x - E(x))^T a\right] - E\left[a^T(x - E(x))(\theta - E(\theta))\right] - E\left[(\theta - E(\theta))(x - E(x))^T a\right] + E\left[(\theta - E(\theta))^2\right]$$

$$= a^T C_{xx} a - a^T C_{x\theta} - C_{\theta x} a + C_{\theta\theta} \qquad (12.4)$$

where $C_{xx}$ is the $N \times N$ covariance matrix of $x$, $C_{\theta x}$ is the $1 \times N$ cross-covariance vector having the property that $C_{\theta x}^T = C_{x\theta}$, and $C_{\theta\theta}$ is the variance of $\theta$. Making use of (4.3) we can minimize (12.4) by taking the gradient to yield

$$\frac{\partial\,\mathrm{Bmse}(\hat\theta)}{\partial a} = 2C_{xx}a - 2C_{x\theta}$$

which when set to zero results in

$$a = C_{xx}^{-1} C_{x\theta}. \qquad (12.5)$$

Using (12.3) and (12.5) in (12.1) produces

$$\hat\theta = a^T x + a_N = C_{\theta x} C_{xx}^{-1} x + E(\theta) - C_{\theta x} C_{xx}^{-1} E(x)$$

or finally the LMMSE estimator is

$$\hat\theta = E(\theta) + C_{\theta x} C_{xx}^{-1}\left(x - E(x)\right). \qquad (12.6)$$

Note that it is identical in form to the MMSE estimator for jointly Gaussian $x$ and $\theta$, as can be verified from (10.24). This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied. If the means of $\theta$ and $x$ are zero, then

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x. \qquad (12.7)$$

The minimum Bayesian MSE is obtained by substituting (12.5) into (12.4) to yield

$$\mathrm{Bmse}(\hat\theta) = C_{\theta x} C_{xx}^{-1} C_{xx} C_{xx}^{-1} C_{x\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} + C_{\theta\theta}$$

or finally

$$\mathrm{Bmse}(\hat\theta) = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}. \qquad (12.8)$$
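Since (12.6) and (12.8) involve only means and covariances, they translate directly into a few lines of linear algebra. The following sketch is not from the text; it is a minimal NumPy illustration of (12.5), (12.6), and (12.8), with argument names chosen here for clarity.

```python
import numpy as np

def lmmse(mean_theta, mean_x, C_theta_x, C_xx, C_theta_theta, x):
    """Scalar LMMSE estimator (12.6) and its minimum Bayesian MSE (12.8).

    mean_theta    : E(theta), scalar
    mean_x        : E(x), shape (N,)
    C_theta_x     : cross-covariance C_{theta x}, shape (N,)
    C_xx          : covariance matrix of x, shape (N, N)
    C_theta_theta : variance of theta, scalar
    x             : observed data vector, shape (N,)
    """
    # Solve C_xx a = C_{x theta} rather than forming the inverse explicitly.
    a = np.linalg.solve(C_xx, C_theta_x)          # (12.5), since C_{x theta} = C_{theta x}^T
    theta_hat = mean_theta + a @ (x - mean_x)     # (12.6)
    bmse = C_theta_theta - C_theta_x @ a          # (12.8)
    return theta_hat, bmse
```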


Again, (12.8) is identical to the result obtained by substituting (10.25) into (11.12). An example follows.


Example 12.1 - DC Level in WGN with Uniform Prior PDF

Consider the introductory example in Chapter 10. The data model is

$$x[n] = A + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $A \sim \mathcal{U}[-A_0, A_0]$, $w[n]$ is WGN with variance $\sigma^2$, and $A$ and $w[n]$ are independent. We wish to estimate $A$. The MMSE estimator cannot be obtained in closed form due to the integration required (see (10.9)). Applying the LMMSE estimator, we first note that $E(A) = 0$, and hence $E(x[n]) = 0$. Since $E(x) = 0$, the covariances are

$$C_{xx} = E(xx^T) = E\left[(A\mathbf{1} + w)(A\mathbf{1} + w)^T\right] = E(A^2)\mathbf{1}\mathbf{1}^T + \sigma^2 I$$

$$C_{\theta x} = E(Ax^T) = E\left[A(A\mathbf{1} + w)^T\right] = E(A^2)\mathbf{1}^T$$

where $\mathbf{1}$ is an $N \times 1$ vector of all ones. Hence, from (12.7)

$$\hat A = C_{\theta x} C_{xx}^{-1} x = \sigma_A^2 \mathbf{1}^T\left(\sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 I\right)^{-1} x$$

where we have let $\sigma_A^2 = E(A^2)$. But the form of the estimator is identical to that encountered in Example 10.2 if we let $\mu_A = 0$, so that from (10.31)

$$\hat A = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\bar x. \qquad (12.9)$$

Since $\sigma_A^2 = E(A^2) = (2A_0)^2/12 = A_0^2/3$, the LMMSE estimator of $A$ is

$$\hat A = \frac{\frac{A_0^2}{3}}{\frac{A_0^2}{3} + \frac{\sigma^2}{N}}\,\bar x.$$

As opposed to the original MMSE estimator which required integration, we have obtained the LMMSE estimator in closed form. Also, note that we did not really need to know that $A$ was uniformly distributed but only its mean and variance, or that $w[n]$ was Gaussian but only that it is white and its variance. Likewise, independence of $A$ and $w$ was not required, only that they were uncorrelated. In general, all that is required to determine the LMMSE estimator are the first two moments of $p(x, \theta)$ or

$$E(\theta),\quad E(x),\quad C_{xx},\quad C_{\theta x},\quad C_{x\theta}.$$

However, we must realize that the LMMSE estimator of (12.9) will be suboptimal since it has been constrained to be linear. The optimal estimator for this problem is given by (10.9). ◇
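The closed form (12.9) and its minimum MSE are easy to check by simulation. The sketch below is illustrative only (the constants $A_0$, $\sigma$, and $N$ are arbitrary choices, not from the text); it draws $A$ from $\mathcal{U}[-A_0, A_0]$, forms the LMMSE estimate from the sample mean, and compares the empirical Bayesian MSE with $\sigma_A^2\sigma^2/(N\sigma_A^2 + \sigma^2)$, which is what (12.8) gives for this model.

```python
import numpy as np

rng = np.random.default_rng(0)
A0, sigma, N, trials = 1.0, 0.5, 10, 200_000
sigma_A2 = A0**2 / 3.0                                # variance of U[-A0, A0]

A = rng.uniform(-A0, A0, size=trials)                 # realizations of the DC level
x = A[:, None] + sigma * rng.standard_normal((trials, N))
xbar = x.mean(axis=1)

A_hat = sigma_A2 / (sigma_A2 + sigma**2 / N) * xbar   # LMMSE estimator (12.9)

empirical_bmse = np.mean((A - A_hat) ** 2)
theoretical_bmse = sigma_A2 * sigma**2 / (N * sigma_A2 + sigma**2)
print(empirical_bmse, theoretical_bmse)               # should nearly agree
```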


[Figure 12.1: Vector space interpretation of random variables]

12.4 Geometrical Interpretations

In Chapter 8 we discussed a geometrical interpretation of the LSE based on the concept of a vector space. The LMMSE estimator admits a similar interpretation, although now the "vectors" are random variables. (Actually, both vector spaces are special cases of the more general Hilbert space [Luenberger 1969].) This alternative formulation assumes that $\theta$ and $x$ are zero mean. If they are not, we can always define the zero mean random variables $\theta' = \theta - E(\theta)$ and $x' = x - E(x)$, and consider estimation of $\theta'$ by a linear function of $x'$ (see also Problem 12.5). Now we wish to find the $a_n$'s so that

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n]$$

minimizes

$$\mathrm{Bmse}(\hat\theta) = E\left[(\theta - \hat\theta)^2\right].$$

Let us now think of the random variables $\theta, x[0], x[1], \ldots, x[N-1]$ as elements in a vector space as shown symbolically in Figure 12.1. The reader may wish to verify that the properties of a vector space are satisfied, such as vector addition and multiplication by a scalar, etc. Since, as is usually the case, $\theta$ cannot be perfectly expressed as a linear combination of the $x[n]$'s (if it could, then our estimator would be perfect), we picture $\theta$ as only partially lying in the subspace spanned by the $x[n]$'s. We may define the "length" of each vector $x$ as $\|x\| = \sqrt{E(x^2)}$ or the square root of the variance. Longer length vectors are those with larger variances. The zero length vector is the random variable with zero variance or, therefore, the one that is identically zero (actually not random at all). Finally, to complete our description we require the notion of an inner product between two vectors. (Recall that if $x, y$ are Euclidean vectors in $R^3$, then the inner product is $(x, y) = x^T y = \|x\|\,\|y\|\cos\alpha$, where $\alpha$ is the angle between the vectors.) It can be shown that an appropriate definition, i.e., one that satisfies the properties of an inner product between the vectors $x$ and $y$, is (see Problem 12.4)

$$(x, y) = E(xy). \qquad (12.10)$$

With this definition we have that

$$(x, x) = E(x^2) = \|x\|^2 \qquad (12.11)$$

consistent with our earlier definition of the length of a vector. Also, we can now define two vectors to be orthogonal if

$$(x, y) = E(xy) = 0. \qquad (12.12)$$

Since the vectors are zero mean, this is equivalent to saying that two vectors are orthogonal if and only if they are uncorrelated. (In $R^3$ two Euclidean vectors are orthogonal if the angle between them is $\alpha = 90°$ so that $(x, y) = \|x\|\,\|y\|\cos\alpha = 0$.) Recalling our discussions from the previous section, this implies that if two vectors are orthogonal, we cannot use one to estimate the other.

[Figure 12.2: Orthogonal random variables - y cannot be linearly estimated based on x]

As shown in Figure 12.2, since there is no component of $y$ along $x$, we cannot use $x$ to estimate $y$. Attempting to do so means that we need to find an $a$ so that $\hat y = ax$ minimizes the Bayesian MSE, $\mathrm{Bmse}(\hat y) = E[(y - \hat y)^2]$. To find the optimal value of $a$

$$\frac{d\,\mathrm{Bmse}(\hat y)}{da} = \frac{d}{da}E\left[(y - ax)^2\right] = \frac{d}{da}\left[E(y^2) - 2aE(xy) + a^2E(x^2)\right] = -2E(xy) + 2aE(x^2) = 0$$

which yields

$$a = \frac{E(xy)}{E(x^2)} = 0.$$

The LMMSE estimator of $y$ is just $\hat y = 0$. Of course, this is a special case of (12.7) where $N = 1$, $\theta = y$, $x[0] = x$, and $C_{\theta x} = 0$.

With these ideas in mind we proceed to determine the LMMSE estimator using the vector space viewpoint. This approach is useful for conceptualization of the LMMSE estimation process and will be used later to derive the sequential LMMSE estimator. As before, we assume that

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n]$$

where $a_N = 0$ due to the zero mean assumption. We wish to estimate $\theta$ as a linear combination of $x[0], x[1], \ldots, x[N-1]$. The weighting coefficients $a_n$ should be chosen to minimize the MSE

$$E\left[(\theta - \hat\theta)^2\right] = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right] = \left\|\theta - \sum_{n=0}^{N-1} a_n x[n]\right\|^2.$$

[Figure 12.3: Orthogonality principle for LMMSE estimation]

But this means that minimization of the MSE is equivalent to a minimization of the squared length of the error vector $\epsilon = \theta - \hat\theta$. The error vector is shown in Figure 12.3b for several candidate estimates. Clearly, the length of the error vector is minimized when $\epsilon$ is orthogonal to the subspace spanned by $\{x[0], x[1], \ldots, x[N-1]\}$. Hence, we require

$$\epsilon \perp x[0], x[1], \ldots, x[N-1] \qquad (12.13)$$

or by using our definition of orthogonality

$$E\left[(\theta - \hat\theta)x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1. \qquad (12.14)$$

This is the important orthogonality principle or projection theorem. It says that in estimating the realization of a random variable by a linear combination of data samples, the optimal estimator is obtained when the error is orthogonal to each data sample. Using the orthogonality principle, the weighting coefficients are easily found as

$$E\left[\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)x[n]\right] = 0 \qquad n = 0, 1, \ldots, N-1$$

or

$$\sum_{m=0}^{N-1} a_m E(x[m]x[n]) = E(\theta x[n]) \qquad n = 0, 1, \ldots, N-1.$$


In matrix form this is

$$\begin{bmatrix} E(x^2[0]) & E(x[0]x[1]) & \cdots & E(x[0]x[N-1]) \\ E(x[1]x[0]) & E(x^2[1]) & \cdots & E(x[1]x[N-1]) \\ \vdots & \vdots & \ddots & \vdots \\ E(x[N-1]x[0]) & E(x[N-1]x[1]) & \cdots & E(x^2[N-1]) \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix} = \begin{bmatrix} E(\theta x[0]) \\ E(\theta x[1]) \\ \vdots \\ E(\theta x[N-1]) \end{bmatrix} \qquad (12.15)$$

These are the normal equations. The matrix is recognized as $C_{xx}$ and the right-hand vector as $C_{x\theta}$. Therefore,

$$C_{xx}a = C_{x\theta} \qquad (12.16)$$

and

$$a = C_{xx}^{-1}C_{x\theta}. \qquad (12.17)$$

The LMMSE estimator of $\theta$ is

$$\hat\theta = a^T x = C_{x\theta}^T C_{xx}^{-1} x$$

or finally

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x \qquad (12.18)$$

in agreement with (12.7). The minimum Bayesian MSE is the squared length of the error vector or

$$\mathrm{Bmse}(\hat\theta) = \|\epsilon\|^2 = \left\|\theta - \sum_{n=0}^{N-1} a_n x[n]\right\|^2 = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)^2\right]$$

for the $a_n$'s given by (12.17). But

$$\mathrm{Bmse}(\hat\theta) = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)\left(\theta - \sum_{m=0}^{N-1} a_m x[m]\right)\right] = E(\theta^2) - \sum_{n=0}^{N-1} a_n E(x[n]\theta) - \sum_{m=0}^{N-1} a_m E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n]\right)x[m]\right].$$

The last term is zero due to the orthogonality principle, producing

$$\mathrm{Bmse}(\hat\theta) = C_{\theta\theta} - a^T C_{x\theta} = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}$$

in agreement with (12.8).

[Figure 12.4: Linear estimation by orthogonal vectors]
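The orthogonality principle (12.14) also provides a quick numerical sanity check: once $\hat\theta = C_{\theta x}C_{xx}^{-1}x$ has been formed, the sample correlation between the error and each data sample should be close to zero. The sketch below is an illustration with an arbitrarily chosen joint model (not from the text), using sample moments in place of the true ones.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 4, 500_000

# Zero mean theta and x, correlated through a common component (arbitrary model).
theta = rng.standard_normal(trials)
x = 0.8 * theta[:, None] + rng.standard_normal((trials, N))

C_xx = np.cov(x, rowvar=False)                        # sample N x N covariance
C_theta_x = np.array([np.cov(theta, x[:, n])[0, 1] for n in range(N)])

a = np.linalg.solve(C_xx, C_theta_x)                  # (12.17)
theta_hat = x @ a                                     # (12.18), zero mean case
err = theta - theta_hat

# E[(theta - theta_hat) x[n]] should be approximately zero for each n.
print(err @ x / trials)
```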

Many important results can be easily derived when we view the LMMSE estimator

in a vector space framework. In Section 12.6 we examine its application to sequential

estimation. As a simple example of its utility in conceptualizing the estimation problem,

we provide the following illustration.

Example 12.2 - Estimation by Orthogonal Vectors

Assume that $x[0]$ and $x[1]$ are zero mean and uncorrelated with each other. However, they are each correlated with $\theta$. Figure 12.4a illustrates this case. The LMMSE estimator of $\theta$ based on $x[0]$ and $x[1]$ is the sum of the projections of $\theta$ on $x[0]$ and $x[1]$ as shown in Figure 12.4b or

$$\hat\theta = \hat\theta_0 + \hat\theta_1 = \left(\theta, \frac{x[0]}{\|x[0]\|}\right)\frac{x[0]}{\|x[0]\|} + \left(\theta, \frac{x[1]}{\|x[1]\|}\right)\frac{x[1]}{\|x[1]\|}.$$

Each component is the length of the projection $(\theta, x[n]/\|x[n]\|)$ times the unit vector in the $x[n]$ direction. Equivalently, since $\|x[n]\| = \sqrt{\mathrm{var}(x[n])}$ is a constant, this may be written

$$\hat\theta = \frac{(\theta, x[0])}{(x[0], x[0])}x[0] + \frac{(\theta, x[1])}{(x[1], x[1])}x[1].$$

By the definition of the inner product (12.10) we have

$$\hat\theta = \frac{E(\theta x[0])}{E(x^2[0])}x[0] + \frac{E(\theta x[1])}{E(x^2[1])}x[1] = \left[E(\theta x[0]) \;\; E(\theta x[1])\right]\begin{bmatrix} E(x^2[0]) & 0 \\ 0 & E(x^2[1]) \end{bmatrix}^{-1}\begin{bmatrix} x[0] \\ x[1] \end{bmatrix} = C_{\theta x}C_{xx}^{-1}x.$$

Clearly, the ease with which this result was obtained is due to the orthogonality of $x[0], x[1]$ or, equivalently, the diagonal nature of $C_{xx}$. For nonorthogonal data samples the same approach can be used if we first orthogonalize the samples or replace the data by uncorrelated samples that span the same subspace. We will have more to say about this approach in Section 12.6. ◇
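The point of Example 12.2 can be seen numerically: when $C_{xx}$ is diagonal, summing the scalar projections reproduces $C_{\theta x}C_{xx}^{-1}x$ exactly. The sketch below is a small illustration with made-up second moments, not a worked example from the text.

```python
import numpy as np

# Second-order description with uncorrelated x[0], x[1] (diagonal C_xx).
C_xx = np.diag([2.0, 3.0])          # E(x^2[0]), E(x^2[1])
C_theta_x = np.array([1.0, -0.5])   # E(theta x[0]), E(theta x[1])
x = np.array([0.7, -1.2])           # one observed data vector

# Sum of the scalar projections onto the orthogonal "vectors" x[0] and x[1].
theta_hat_proj = sum(C_theta_x[n] / C_xx[n, n] * x[n] for n in range(2))

# Full LMMSE estimator C_theta_x C_xx^{-1} x.
theta_hat_full = C_theta_x @ np.linalg.solve(C_xx, x)

print(theta_hat_proj, theta_hat_full)   # identical when C_xx is diagonal
```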

12.5 The Vector LMMSE Estimator

The vector LMMSE estimator is a straightforward extension of the scalar one. Now we wish to find the linear estimator that minimizes the Bayesian MSE for each element. We assume that

$$\hat\theta_i = \sum_{n=0}^{N-1} a_{in} x[n] + a_{iN} \qquad (12.19)$$

for $i = 1, 2, \ldots, p$ and choose the weighting coefficients to minimize

$$\mathrm{Bmse}(\hat\theta_i) = E\left[(\theta_i - \hat\theta_i)^2\right] \qquad i = 1, 2, \ldots, p$$

where the expectation is with respect to $p(x, \theta_i)$. Since we are actually determining $p$ separate estimators, the scalar solution can be applied, and we obtain from (12.6)

$$\hat\theta_i = E(\theta_i) + C_{\theta_i x} C_{xx}^{-1}\left(x - E(x)\right) \qquad i = 1, 2, \ldots, p$$

and the minimum Bayesian MSE is from (12.8)

$$\mathrm{Bmse}(\hat\theta_i) = C_{\theta_i\theta_i} - C_{\theta_i x} C_{xx}^{-1} C_{x\theta_i} \qquad i = 1, 2, \ldots, p.$$

The scalar LMMSE estimators can be combined into a vector estimator as

$$\hat\theta = \begin{bmatrix} E(\theta_1) \\ E(\theta_2) \\ \vdots \\ E(\theta_p) \end{bmatrix} + \begin{bmatrix} C_{\theta_1 x} \\ C_{\theta_2 x} \\ \vdots \\ C_{\theta_p x} \end{bmatrix} C_{xx}^{-1}\left(x - E(x)\right)$$

or

$$\hat\theta = E(\theta) + C_{\theta x} C_{xx}^{-1}\left(x - E(x)\right) \qquad (12.20)$$


where now $C_{\theta x}$ is a $p \times N$ matrix. By a similar approach we find that the Bayesian MSE matrix is (see Problem 12.7)

$$M_{\hat\theta} = E\left[(\theta - \hat\theta)(\theta - \hat\theta)^T\right] = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta} \qquad (12.21)$$

where $C_{\theta\theta}$ is the $p \times p$ covariance matrix. Consequently, the minimum Bayesian MSE is (see Problem 12.7)

$$\mathrm{Bmse}(\hat\theta_i) = \left[M_{\hat\theta}\right]_{ii}. \qquad (12.22)$$

Of course, these results are identical to those for the MMSE estimator in the Gaussian case, for which the estimator is linear. Note that to determine the LMMSE estimator we require only the first two moments of the PDF.

Two properties of the LMMSE estimator are particularly useful. The first one states that the LMMSE estimator commutes over linear (actually affine) transformations. This is to say that if

$$\alpha = A\theta + b$$

then the LMMSE estimator of $\alpha$ is

$$\hat\alpha = A\hat\theta + b \qquad (12.23)$$

with $\hat\theta$ given by (12.20). The second property states that the LMMSE estimator of a sum of unknown parameters is the sum of the individual estimators. Specifically, if we wish to estimate $\alpha = \theta_1 + \theta_2$, then

$$\hat\alpha = \hat\theta_1 + \hat\theta_2 \qquad (12.24)$$

where

$$\hat\theta_1 = E(\theta_1) + C_{\theta_1 x} C_{xx}^{-1}\left(x - E(x)\right)$$

$$\hat\theta_2 = E(\theta_2) + C_{\theta_2 x} C_{xx}^{-1}\left(x - E(x)\right).$$

The proof of these properties is left to the reader as an exercise (see Problem 12.8).

In analogy with the BLUE there is a corresponding Gauss-Markov theorem for the Bayesian case. It asserts that for data having the Bayesian linear model form, i.e., the Bayesian linear model without the Gaussian assumption, an optimal linear estimator exists. Optimality is measured by the Bayesian MSE. The theorem is just the application of our LMMSE estimator to the Bayesian linear model. More specifically, the data are assumed to be

$$x = H\theta + w$$

where $\theta$ is a random vector to be estimated and has mean $E(\theta)$ and covariance $C_{\theta\theta}$, $H$ is a known observation matrix, $w$ is a random vector with zero mean and covariance $C_w$, and $\theta$ and $w$ are uncorrelated. Then, the LMMSE estimator of $\theta$ is given by (12.20), where

$$E(x) = HE(\theta)$$

$$C_{xx} = HC_{\theta\theta}H^T + C_w$$

$$C_{\theta x} = C_{\theta\theta}H^T.$$

(See Section 10.6 for details.) This is summarized by the Bayesian Gauss-Markov theorem.

Theorem 12.1 (Bayesian Gauss-Markov Theorem) If the data are described by the Bayesian linear model form

$$x = H\theta + w \qquad (12.25)$$

where $x$ is an $N \times 1$ data vector, $H$ is a known $N \times p$ observation matrix, $\theta$ is a $p \times 1$ random vector of parameters whose realization is to be estimated and has mean $E(\theta)$ and covariance matrix $C_{\theta\theta}$, and $w$ is an $N \times 1$ random vector with zero mean and covariance matrix $C_w$ and is uncorrelated with $\theta$ (the joint PDF $p(w, \theta)$ is otherwise arbitrary), then the LMMSE estimator of $\theta$ is

$$\hat\theta = E(\theta) + C_{\theta\theta}H^T\left(HC_{\theta\theta}H^T + C_w\right)^{-1}\left(x - HE(\theta)\right) \qquad (12.26)$$

$$= E(\theta) + \left(C_{\theta\theta}^{-1} + H^T C_w^{-1} H\right)^{-1} H^T C_w^{-1}\left(x - HE(\theta)\right). \qquad (12.27)$$

The performance of the estimator is measured by the error $\epsilon = \theta - \hat\theta$ whose mean is zero and whose covariance matrix is

$$C_\epsilon = E_{x,\theta}(\epsilon\epsilon^T) = C_{\theta\theta} - C_{\theta\theta}H^T\left(HC_{\theta\theta}H^T + C_w\right)^{-1}HC_{\theta\theta} \qquad (12.28)$$

$$= \left(C_{\theta\theta}^{-1} + H^T C_w^{-1} H\right)^{-1}. \qquad (12.29)$$

The error covariance matrix is also the minimum MSE matrix $M_{\hat\theta}$ whose diagonal elements yield the minimum Bayesian MSE

$$\left[M_{\hat\theta}\right]_{ii} = \left[C_\epsilon\right]_{ii} = \mathrm{Bmse}(\hat\theta_i). \qquad (12.30)$$

These results are identical to those in Theorem 11.1 for the Bayesian linear model except that the error vector is not necessarily Gaussian. An example of the determination of this estimator and its minimum Bayesian MSE has already been given in Section 10.6. The Bayesian Gauss-Markov theorem states that within the class of linear estimators the one that minimizes the Bayesian MSE for each element of $\theta$ is given by (12.26) or (12.27). It will not be optimal unless the conditional expectation $E(\theta|x)$ happens to be linear. Such was the case for the jointly Gaussian PDF. Although suboptimal, the LMMSE estimator is in practice quite useful, being available in closed form and depending only on the means and covariances.
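For data in the Bayesian linear model form, (12.26) and (12.28) can be coded directly from $H$, $C_{\theta\theta}$, and $C_w$. The sketch below is a minimal implementation under those model assumptions; the function and variable names are choices made here, not notation from the text.

```python
import numpy as np

def bayesian_gauss_markov(x, H, mean_theta, C_theta, C_w):
    """LMMSE estimator (12.26) and error covariance (12.28) for x = H theta + w."""
    S = H @ C_theta @ H.T + C_w                          # C_xx for the linear model
    G = C_theta @ H.T @ np.linalg.inv(S)                 # C_theta_theta H^T S^{-1}
    theta_hat = mean_theta + G @ (x - H @ mean_theta)    # (12.26)
    C_eps = C_theta - G @ H @ C_theta                    # (12.28)
    return theta_hat, C_eps
```

The alternative forms (12.27) and (12.29) give the same result but require inverting a $p \times p$ rather than an $N \times N$ matrix, which is preferable when $p$ is much smaller than $N$.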


12.6 Sequential LMMSE Estimation

In Chapter 8 we discussed the sequential LS procedure as the process of updating in time the LSE estimator as new data become available. An analogous procedure can be used for LMMSE estimators. That this is possible follows from the vector space viewpoint. In Example 12.2 the LMMSE estimator was obtained by adding to the old estimate $\hat\theta_0$, that based on $x[0]$, the estimate $\hat\theta_1$, that based on the new datum $x[1]$. This was possible because $x[0]$ and $x[1]$ were orthogonal. When they are not, the algebra becomes more tedious, although the approach is similar. Before stating the general results we will illustrate the basic operations involved by considering the DC level in white noise, which has the Bayesian linear model form. The derivation will be purely algebraic. Then we will repeat the derivation but appeal to the vector space approach. The reason for doing so will be to "lay the groundwork" for the Kalman filter in Chapter 13, which will be derived using the same approach. We will assume a zero mean for the DC level $A$, so that the vector space approach is applicable.

To begin the algebraic derivation we have from Example 10.1 with $\mu_A = 0$ (see (10.11))

$$\hat A[N-1] = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N}}\,\frac{1}{N}\sum_{n=0}^{N-1} x[n]$$

where $\hat A[N-1]$ denotes the LMMSE estimator based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is because the LMMSE estimator is identical in form to the MMSE estimator for the Gaussian case. Also, we note that from (10.14)

$$\mathrm{Bmse}(\hat A[N-1]) = \frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2}. \qquad (12.31)$$

To update our estimator as $x[N]$ becomes available

$$\hat A[N] = \frac{\sigma_A^2}{\sigma_A^2 + \frac{\sigma^2}{N+1}}\,\frac{1}{N+1}\sum_{n=0}^{N} x[n]$$

$$= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(\sum_{n=0}^{N-1} x[n] + x[N]\right)$$

$$= \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(\frac{N\sigma_A^2 + \sigma^2}{\sigma_A^2}\hat A[N-1] + x[N]\right)$$

$$= \frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2}\hat A[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}x[N]$$

$$= \hat A[N-1] + \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2}\left(x[N] - \hat A[N-1]\right). \qquad (12.32)$$


Similar to the sequential LS estimator we correct the old estimate $\hat A[N-1]$ by a scaled version of the prediction error $x[N] - \hat A[N-1]$. The scaling or gain factor is from (12.32) and (12.31)

$$K[N] = \frac{\sigma_A^2}{(N+1)\sigma_A^2 + \sigma^2} = \frac{\mathrm{Bmse}(\hat A[N-1])}{\mathrm{Bmse}(\hat A[N-1]) + \sigma^2} \qquad (12.33)$$

and decreases to zero as $N \to \infty$, reflecting the increased confidence in the old estimator. We can also update the minimum Bayesian MSE since from (12.31)

$$\mathrm{Bmse}(\hat A[N]) = \frac{\sigma_A^2\sigma^2}{(N+1)\sigma_A^2 + \sigma^2} = \frac{N\sigma_A^2 + \sigma^2}{(N+1)\sigma_A^2 + \sigma^2}\,\frac{\sigma_A^2\sigma^2}{N\sigma_A^2 + \sigma^2} = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]).$$

Summarizing our results we have the following sequential LMMSE estimator.

Estimator Update:

$$\hat A[N] = \hat A[N-1] + K[N]\left(x[N] - \hat A[N-1]\right) \qquad (12.34)$$

where

$$K[N] = \frac{\mathrm{Bmse}(\hat A[N-1])}{\mathrm{Bmse}(\hat A[N-1]) + \sigma^2}. \qquad (12.35)$$

Minimum MSE Update:

$$\mathrm{Bmse}(\hat A[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]). \qquad (12.36)$$
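The recursion (12.34)-(12.36) amounts to a few lines per sample. The sketch below is an illustrative implementation for the zero mean DC level model of this section, initialized with the prior mean 0 and prior variance $\sigma_A^2$ (the initialization prescribed later for the general recursion).

```python
import numpy as np

def sequential_dc_level(x, sigma2, sigma_A2):
    """Sequential LMMSE estimation of a zero mean DC level A in white noise."""
    A_hat, bmse = 0.0, sigma_A2            # prior mean and prior variance
    for xn in x:
        K = bmse / (bmse + sigma2)         # gain (12.35)
        A_hat = A_hat + K * (xn - A_hat)   # estimator update (12.34)
        bmse = (1.0 - K) * bmse            # minimum MSE update (12.36)
    return A_hat, bmse
```

After processing $N$ samples the returned MSE equals $\sigma_A^2\sigma^2/(N\sigma_A^2 + \sigma^2)$, the batch result (12.31), as can be verified by iterating the update.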

We now use the vector space viewpoint to derive the same results. (The general sequential LMMSE estimator is derived in Appendix 12A using the vector space approach. It is a straightforward extension of the following derivation.) Assume that the LMMSE estimator $\hat A[1]$ is to be found, which is based on the data $\{x[0], x[1]\}$ as shown in Figure 12.5a. Because $x[0]$ and $x[1]$ are not orthogonal, we cannot simply add the estimate based on $x[0]$ to the estimate based on $x[1]$. If we did, we would be adding in an extra component along $x[0]$. However, we can form $\hat A[1]$ as the sum of $\hat A[0]$ and a component orthogonal to $\hat A[0]$ as shown in Figure 12.5b. That component is $\Delta\hat A[1]$. To find a vector in the direction of this component recall that the LMMSE estimator has the property that the error is orthogonal to the data. Hence, if we find the LMMSE estimator of $x[1]$ based on $x[0]$, call it $\hat x[1|0]$, the error $x[1] - \hat x[1|0]$ will be orthogonal to $x[0]$. This is shown in Figure 12.5c.

[Figure 12.5: Sequential estimation using the vector space approach]

Since $x[0]$ and $x[1] - \hat x[1|0]$ are orthogonal, we can project $A$ onto each vector separately and add the results, so that

$$\hat A[1] = \hat A[0] + \Delta\hat A[1].$$

To find the correction term $\Delta\hat A[1]$ first note that the LMMSE estimator of a random variable $y$ based on $x$, where both are zero mean, is from (12.7)

$$\hat y = \frac{E(xy)}{E(x^2)}x. \qquad (12.37)$$

Thus, we have that

$$\hat x[1|0] = \frac{E(x[0]x[1])}{E(x^2[0])}x[0] = \frac{E\left[(A + w[0])(A + w[1])\right]}{E\left[(A + w[0])^2\right]}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0] \qquad (12.38)$$


and the error vector $\tilde x[1] = x[1] - \hat x[1|0]$ represents the new information that $x[1]$ contributes to the estimation of $A$. As such, it is called the innovation. The projection of $A$ along this error vector is the desired correction

$$\Delta\hat A[1] = \left(A, \frac{\tilde x[1]}{\|\tilde x[1]\|}\right)\frac{\tilde x[1]}{\|\tilde x[1]\|} = \frac{E(A\tilde x[1])}{E(\tilde x^2[1])}\tilde x[1].$$

If we let $K[1] = E(A\tilde x[1])/E(\tilde x^2[1])$, we have for the LMMSE estimator

$$\hat A[1] = \hat A[0] + K[1]\left(x[1] - \hat x[1|0]\right).$$

To evaluate $\hat x[1|0]$ we note that $x[1] = A + w[1]$. Hence, from the additive property (12.24) $\hat x[1|0] = \hat A[0] + \hat w[1|0]$. Since $w[1]$ is uncorrelated with $w[0]$, we have $\hat w[1|0] = 0$. Finally, then $\hat x[1|0] = \hat A[0]$, and $\hat A[0]$ is found as

$$\hat A[0] = \frac{E(Ax[0])}{E(x^2[0])}x[0] = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0].$$

Thus,

$$\hat A[1] = \hat A[0] + K[1]\left(x[1] - \hat A[0]\right).$$

It remains only to determine the gain for the correction. Since

$$\tilde x[1] = x[1] - \hat x[1|0] = x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]$$

the gain becomes

$$K[1] = \frac{E\left[A\left(x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)\right]}{E\left[\left(x[1] - \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2}x[0]\right)^2\right]} = \frac{\sigma_A^2}{2\sigma_A^2 + \sigma^2}$$

which agrees with (12.33) for $N = 1$. We can continue this procedure to find $\hat A[2], \hat A[3], \ldots$. In general, we

1. find the LMMSE estimator of $A$ based on $x[0]$, yielding $\hat A[0]$

2. find the LMMSE estimator of $x[1]$ based on $x[0]$, yielding $\hat x[1|0]$

3. determine the innovation of the new datum $x[1]$, yielding $x[1] - \hat x[1|0]$

4. add to $\hat A[0]$ the LMMSE estimator of $A$ based on the innovation, yielding $\hat A[1]$

5. continue the process.

In essence we are generating a set of uncorrelated or orthogonal random variables, namely, the innovations $\{x[0],\; x[1] - \hat x[1|0],\; x[2] - \hat x[2|0,1],\; \ldots\}$. This procedure is termed a Gram-Schmidt orthogonalization (see Problem 12.10). To find the LMMSE estimator of $A$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ we simply add the individual estimators to yield

$$\hat A[N-1] = \sum_{n=0}^{N-1} K[n]\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right) \qquad (12.39)$$

where each gain factor is

$$K[n] = \frac{E\left[A\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right)\right]}{E\left[\left(x[n] - \hat x[n|0, 1, \ldots, n-1]\right)^2\right]} \qquad n = 0, 1, \ldots, N-1. \qquad (12.40)$$

This simple form is due to the uncorrelated property of the innovations. In sequential form (12.39) becomes

$$\hat A[N] = \hat A[N-1] + K[N]\left(x[N] - \hat x[N|0, 1, \ldots, N-1]\right). \qquad (12.41)$$

To complete the derivation we must find $\hat x[N|0, 1, \ldots, N-1]$ and the gain $K[N]$. We first use the additive property (12.24). Since $A$ and $w[N]$ are uncorrelated and zero mean, the LMMSE estimator of $x[N] = A + w[N]$ is

$$\hat x[N|0, 1, \ldots, N-1] = \hat A[N|0, 1, \ldots, N-1] + \hat w[N|0, 1, \ldots, N-1].$$

But $\hat A[N|0, 1, \ldots, N-1]$ is by definition the LMMSE estimator of the realization of $A$ at time $N$ based on $\{x[0], x[1], \ldots, x[N-1]\}$ or $\hat A[N-1]$ (recalling that $A$ does not change over the observation interval). Also, $\hat w[N|0, 1, \ldots, N-1]$ is zero since $w[N]$ is uncorrelated with the previous data samples since $A$ and $w[n]$ are uncorrelated and $w[n]$ is white noise. Thus,

$$\hat x[N|0, 1, \ldots, N-1] = \hat A[N-1]. \qquad (12.42)$$

To find the gain we use (12.40) and (12.42):

$$K[N] = \frac{E\left[A\left(x[N] - \hat A[N-1]\right)\right]}{E\left[\left(x[N] - \hat A[N-1]\right)^2\right]}.$$

But

$$E\left[A\left(x[N] - \hat A[N-1]\right)\right] = E\left[\left(A - \hat A[N-1]\right)\left(x[N] - \hat A[N-1]\right)\right] \qquad (12.43)$$

since $x[N] - \hat A[N-1]$ is the innovation of $x[N]$ which is orthogonal to $\{x[0], x[1], \ldots, x[N-1]\}$ and hence to $\hat A[N-1]$ (a linear combination of these data samples). Also, $E[w[N](A - \hat A[N-1])] = 0$ as explained previously, so that

$$E\left[A\left(x[N] - \hat A[N-1]\right)\right] = E\left[\left(A - \hat A[N-1]\right)^2\right] = \mathrm{Bmse}(\hat A[N-1]). \qquad (12.44)$$

Also,

$$E\left[\left(x[N] - \hat A[N-1]\right)^2\right] = E\left[\left(w[N] + A - \hat A[N-1]\right)^2\right] = E(w^2[N]) + E\left[\left(A - \hat A[N-1]\right)^2\right] = \sigma^2 + \mathrm{Bmse}(\hat A[N-1]) \qquad (12.45)$$

so that finally

$$K[N] = \frac{\mathrm{Bmse}(\hat A[N-1])}{\sigma^2 + \mathrm{Bmse}(\hat A[N-1])}. \qquad (12.46)$$

The minimum MSE update can be obtained as

$$\mathrm{Bmse}(\hat A[N]) = E\left[\left(A - \hat A[N]\right)^2\right] = E\left[\left(A - \hat A[N-1] - K[N]\left(x[N] - \hat A[N-1]\right)\right)^2\right]$$

$$= E\left[\left(A - \hat A[N-1]\right)^2\right] - 2K[N]E\left[\left(A - \hat A[N-1]\right)\left(x[N] - \hat A[N-1]\right)\right] + K^2[N]E\left[\left(x[N] - \hat A[N-1]\right)^2\right]$$

$$= \mathrm{Bmse}(\hat A[N-1]) - 2K[N]\,\mathrm{Bmse}(\hat A[N-1]) + K^2[N]\left(\sigma^2 + \mathrm{Bmse}(\hat A[N-1])\right)$$

where we have used (12.43)-(12.45). Using (12.46), we have

$$\mathrm{Bmse}(\hat A[N]) = (1 - K[N])\,\mathrm{Bmse}(\hat A[N-1]).$$

We have now derived the sequential LMMSE estimator equations based on the vector space viewpoint.

In Appendix 12A we generalize the preceding vector space derivation for the sequential vector LMMSE estimator. To do so we must assume the data has the Bayesian linear model form (see Theorem 12.1) and that $C_w$ is a diagonal matrix, so that the $w[n]$'s are uncorrelated with variance $E(w^2[n]) = \sigma_n^2$. The latter is a critical assumption and was used in the previous example. Fortuitously, the set of equations obtained are valid for the case of nonzero means as well (see Appendix 12A). Hence, the equations to follow are the sequential implementation of the vector LMMSE estimator of (12.26) or (12.27) when the noise covariance matrix $C_w$ is diagonal. To define the equations we let $\hat\theta[n]$ be the LMMSE estimator based on $\{x[0], x[1], \ldots, x[n]\}$, and $M[n]$ be the corresponding minimum MSE matrix (just the sequential version of (12.28) or (12.29)) or

$$M[n] = E\left[(\theta - \hat\theta[n])(\theta - \hat\theta[n])^T\right].$$

Also, we partition the $(n+1) \times p$ observation matrix as

$$H[n] = \begin{bmatrix} H[n-1] \\ h^T[n] \end{bmatrix}$$

where $H[n-1]$ has dimension $n \times p$ and $h^T[n]$ is $1 \times p$. Then, the sequential LMMSE estimator becomes (see Appendix 12A)

Estimator Update:

$$\hat\theta[n] = \hat\theta[n-1] + K[n]\left(x[n] - h^T[n]\hat\theta[n-1]\right) \qquad (12.47)$$

where

$$K[n] = \frac{M[n-1]h[n]}{\sigma_n^2 + h^T[n]M[n-1]h[n]}. \qquad (12.48)$$

Minimum MSE Matrix Update:

$$M[n] = \left(I - K[n]h^T[n]\right)M[n-1]. \qquad (12.49)$$

The gain factor $K[n]$ is a $p \times 1$ vector, and the minimum MSE matrix has dimension $p \times p$.

[Figure 12.6: Sequential linear minimum mean square error estimator]

The entire estimator is summarized in Figure 12.6, where the thick arrows indicate vector processing. To start the recursion we need to specify initial values for $\hat\theta[n-1]$ and $M[n-1]$, so that $K[n]$ can be determined from (12.48) and then $\hat\theta[n]$ from (12.47). To do so we specify $\hat\theta[-1]$ and $M[-1]$ so that the recursion can begin at $n = 0$.


Since no data have been observed at $n = -1$, then from (12.19) our estimator becomes the constant $\hat\theta_i = a_{iN}$. It is easily shown that the LMMSE estimator is just the mean of $\theta$ (see also (12.3)) or

$$\hat\theta[-1] = E(\theta).$$

As a result, the minimum MSE matrix is

$$M[-1] = E\left[(\theta - \hat\theta[-1])(\theta - \hat\theta[-1])^T\right] = C_{\theta\theta}.$$

Several observations are in order.

1. For no prior knowledge about $\theta$ we can let $C_{\theta\theta} \to \infty$. Then, we have the same form as the sequential LSE (see Section 8.7), although the approaches are fundamentally different.

2. No matrix inversions are required.

3. The gain factor $K[n]$ depends on our confidence in the new data sample as measured by $\sigma_n^2$ versus that in the previous data as summarized by $M[n-1]$.
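A direct translation of (12.47)-(12.49) is sketched below. It is an illustrative implementation (the function and argument names are not from the text); as noted in the observations above, the only division is by the scalar $\sigma_n^2 + h^T[n]M[n-1]h[n]$, so no matrix inversion is needed.

```python
import numpy as np

def sequential_lmmse(x, H, sigma2, theta_mean, C_theta):
    """Sequential vector LMMSE estimator, equations (12.47)-(12.49).

    x          : data x[0], ..., x[N-1], shape (N,)
    H          : observation matrix, shape (N, p); row n is h^T[n]
    sigma2     : noise variances sigma_n^2, shape (N,) (C_w assumed diagonal)
    theta_mean : prior mean E(theta), shape (p,)
    C_theta    : prior covariance C_theta_theta, shape (p, p)
    """
    theta_hat = np.array(theta_mean, dtype=float)           # theta_hat[-1] = E(theta)
    M = np.array(C_theta, dtype=float)                       # M[-1] = C_theta_theta
    I = np.eye(len(theta_hat))
    for n in range(len(x)):
        h = H[n]
        K = M @ h / (sigma2[n] + h @ M @ h)                  # gain (12.48), a p x 1 vector
        theta_hat = theta_hat + K * (x[n] - h @ theta_hat)   # estimator update (12.47)
        M = (I - np.outer(K, h)) @ M                         # MSE matrix update (12.49)
    return theta_hat, M
```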

An example follows.

Example 12.3 - Bayesian Fourier Analysis

We now use sequential LS to compute the LMMSE estimator in Example 11.1. The data model is

$$x[n] = a\cos 2\pi f_0 n + b\sin 2\pi f_0 n + w[n] \qquad n = 0, 1, \ldots, N-1$$

where $f_0$ is any frequency (in Example 11.1 we assumed $f_0$ to be a multiple of $1/N$), excepting 0 and 1/2 for which $\sin 2\pi f_0 n$ is identically zero, and $w[n]$ is white noise with variance $\sigma^2$. It is desired to estimate $\theta = [a\; b]^T$ sequentially. We further assume that the Bayesian linear model form applies and that $\theta$ has mean $E(\theta)$ and covariance $\sigma_\theta^2 I$. The sequential estimator is initialized by

$$\hat\theta[-1] = E(\theta)$$

$$M[-1] = C_{\theta\theta} = \sigma_\theta^2 I.$$

Then, from (12.47)

$$\hat\theta[0] = \hat\theta[-1] + K[0]\left(x[0] - h^T[0]\hat\theta[-1]\right).$$

The $h^T[0]$ vector is the first row of the observation matrix or

$$h^T[0] = [1 \;\; 0].$$

Subsequent rows are

$$h^T[n] = [\cos 2\pi f_0 n \;\; \sin 2\pi f_0 n].$$

The $2 \times 1$ gain vector is, from (12.48),

$$K[0] = \frac{M[-1]h[0]}{\sigma^2 + h^T[0]M[-1]h[0]}$$

where $M[-1] = \sigma_\theta^2 I$. Once the gain vector has been found, $\hat\theta[0]$ can be computed. Finally, the MSE matrix update is, from (12.49),

$$M[0] = \left(I - K[0]h^T[0]\right)M[-1]$$

and the procedure continues in like manner for $n \geq 1$. ◇

12.7 Signal Processing Examples - Wiener Filtering

We now examine in detail some of the important applications of the LMMSE estimator. In doing so we will assume that the data $\{x[0], x[1], \ldots, x[N-1]\}$ is WSS with zero mean. As such, the $N \times N$ covariance matrix $C_{xx}$ takes the symmetric Toeplitz form

$$C_{xx} = \begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} = R_{xx} \qquad (12.50)$$

where $r_{xx}[k]$ is the ACF of the $x[n]$ process and $R_{xx}$ denotes the autocorrelation matrix. Furthermore, the parameter $\theta$ to be estimated is also assumed to be zero mean. The group of applications that we will describe are generically called Wiener filters.

There are three main problems that we will study. They are (see Figure 12.7)

1. Filtering, where $\theta = s[n]$ is to be estimated based on $x[m] = s[m] + w[m]$ for $m = 0, 1, \ldots, n$. The sequences $s[n]$ and $w[n]$ represent signal and noise processes. The problem is to filter the signal from the noise. Note that the signal sample is estimated based on the present and past data only, so that as $n$ increases, we view the estimation process as the application of a causal filter to the data.

2. Smoothing, where $\theta = s[n]$ is to be estimated for $n = 0, 1, \ldots, N-1$ based on the data set $\{x[0], x[1], \ldots, x[N-1]\}$, where $x[n] = s[n] + w[n]$. In contrast to filtering we are allowed to use future data. For instance, to estimate $s[1]$ we can use the entire data set $\{x[0], x[1], \ldots, x[N-1]\}$, while in filtering we are constrained to use only $\{x[0], x[1]\}$. Clearly, in smoothing an estimate cannot be obtained until all the data has been collected.

3. Prediction, where $\theta = x[N-1+l]$ for $l$ a positive integer is to be estimated based on $\{x[0], x[1], \ldots, x[N-1]\}$. This is referred to as the $l$-step prediction problem.

[Figure 12.7: Wiener filtering problem definitions - (a) filtering, (b) smoothing, (c) prediction, (d) interpolation]

A related problem that is explored in Problem 12.14 is to estimate $x[n]$ based on $\{x[0], \ldots, x[n-1], x[n+1], \ldots, x[N-1]\}$ and is termed the interpolation problem. To solve all three problems we use (12.20) with $E(\theta) = E(x) = 0$ or

$$\hat\theta = C_{\theta x} C_{xx}^{-1} x \qquad (12.51)$$

and the minimum Bayesian MSE matrix given by (12.21)

$$M_{\hat\theta} = C_{\theta\theta} - C_{\theta x} C_{xx}^{-1} C_{x\theta}. \qquad (12.52)$$

Consider first the smoothing problem. We wish to estimate $\theta = s = [s[0]\; s[1]\; \ldots\; s[N-1]]^T$ based on $x = [x[0]\; x[1]\; \ldots\; x[N-1]]^T$. We will make the reasonable assumption that the signal and noise processes are uncorrelated. Hence,

$$r_{xx}[k] = r_{ss}[k] + r_{ww}[k].$$


Then, we have

$$C_{xx} = R_{xx} = R_{ss} + R_{ww}.$$

Also,

$$C_{\theta x} = E(sx^T) = E\left(s(s + w)^T\right) = R_{ss}.$$

Therefore, the Wiener estimator of the signal is, from (12.51),

$$\hat s = R_{ss}\left(R_{ss} + R_{ww}\right)^{-1} x. \qquad (12.53)$$

The $N \times N$ matrix

$$W = R_{ss}\left(R_{ss} + R_{ww}\right)^{-1} \qquad (12.54)$$

is referred to as the Wiener smoothing matrix. The corresponding minimum MSE matrix is, from (12.52),

$$M_{\hat s} = R_{ss} - R_{ss}\left(R_{ss} + R_{ww}\right)^{-1}R_{ss} = (I - W)R_{ss}. \qquad (12.55)$$

As an example, if $N = 1$ we would wish to estimate $s[0]$ based on $x[0] = s[0] + w[0]$. Then, the Wiener smoother $W$ is just a scalar $W$ given by

$$W = \frac{r_{ss}[0]}{r_{ss}[0] + r_{ww}[0]} = \frac{\eta}{\eta + 1} \qquad (12.56)$$

where $\eta = r_{ss}[0]/r_{ww}[0]$ is the SNR. For a high SNR so that $W \to 1$, we have $\hat s[0] \to x[0]$, while for a low SNR so that $W \to 0$, we have $\hat s[0] \to 0$. The corresponding minimum MSE is

$$M_{\hat s} = (1 - W)r_{ss}[0] = \left(1 - \frac{\eta}{\eta + 1}\right)r_{ss}[0] = \frac{r_{ss}[0]}{\eta + 1}$$

which for these two extremes is either 0 for a high SNR or $r_{ss}[0]$ for a low SNR. A numerical example of a Wiener smoother has been given in Section 11.7. See also Problem 12.15.
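The Wiener smoothing matrix (12.54) and its minimum MSE matrix (12.55) require only the two autocorrelation sequences. The sketch below is illustrative: it assumes an exponential signal ACF and white noise (choices made here, not taken from the text), builds the Toeplitz matrices, and applies $W$ to a synthetic data record.

```python
import numpy as np
from scipy.linalg import toeplitz

N, sigma2 = 50, 1.0
k = np.arange(N)
r_ss = 0.95 ** k                    # assumed signal ACF r_ss[k] (for illustration)
r_ww = sigma2 * (k == 0)            # white noise ACF r_ww[k] = sigma2 * delta[k]

R_ss, R_ww = toeplitz(r_ss), toeplitz(r_ww)

W = R_ss @ np.linalg.inv(R_ss + R_ww)            # Wiener smoothing matrix (12.54)
M_s = (np.eye(N) - W) @ R_ss                     # minimum MSE matrix (12.55)

rng = np.random.default_rng(2)
s = rng.multivariate_normal(np.zeros(N), R_ss)   # synthetic signal with ACF r_ss
x = s + np.sqrt(sigma2) * rng.standard_normal(N)
s_hat = W @ x                                    # smoothed estimate (12.53)
```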

We next consider the filtering problem in which we wish to estimate $\theta = s[n]$ based on $x = [x[0]\; x[1]\; \ldots\; x[n]]^T$. The problem is repeated for each value of $n$ until the entire signal $s[n]$ for $n = 0, 1, \ldots, N-1$ has been estimated. As before, $x[n] = s[n] + w[n]$, where $s[n]$ and $w[n]$ are signal and noise processes that are uncorrelated with each other. Thus,

$$C_{xx} = R_{ss} + R_{ww}$$

where $R_{ss}, R_{ww}$ are $(n+1) \times (n+1)$ autocorrelation matrices. Also,

$$C_{\theta x} = E\left(s[n]\left[x[0]\; x[1]\; \ldots\; x[n]\right]\right) = E\left(s[n]\left[s[0]\; s[1]\; \ldots\; s[n]\right]\right) = \left[r_{ss}[n]\; r_{ss}[n-1]\; \ldots\; r_{ss}[0]\right].$$

Letting the latter row vector be denoted as $r_{ss}'^T$, we have from (12.51)

$$\hat s[n] = r_{ss}'^T\left(R_{ss} + R_{ww}\right)^{-1} x. \qquad (12.57)$$


The $(n+1) \times 1$ vector of weights is seen to be $a = (R_{ss} + R_{ww})^{-1} r_{ss}'$, recalling that

$$\hat s[n] = a^T x$$

where $a = [a_0\; a_1\; \ldots\; a_n]^T$ as in our original formulation of the scalar LMMSE estimator of (12.1). We may interpret the process of forming the estimator as $n$ increases as a filtering operation if we define a time varying impulse response $h^{(n)}[k]$. Specifically, we let $h^{(n)}[k]$ be the response of a filter at time $n$ to an impulse applied $k$ samples before. To make the filtering correspondence we let

$$h^{(n)}[k] = a_{n-k} \qquad k = 0, 1, \ldots, n.$$

Note for future reference that the vector $h = [h^{(n)}[0]\; h^{(n)}[1]\; \ldots\; h^{(n)}[n]]^T$ is just the vector $a$ when flipped upside down. Then,

$$\hat s[n] = \sum_{k=0}^{n} a_k x[k] = \sum_{k=0}^{n} h^{(n)}[n-k]x[k]$$

or

$$\hat s[n] = \sum_{k=0}^{n} h^{(n)}[k]x[n-k] \qquad (12.58)$$

which is recognized as a time varying FIR filter. To explicitly find the impulse response $h$ we note that since

$$\left(R_{ss} + R_{ww}\right)a = r_{ss}'$$

it follows that

$$\left(R_{ss} + R_{ww}\right)h = r_{ss}$$

where $r_{ss} = [r_{ss}[0]\; r_{ss}[1]\; \ldots\; r_{ss}[n]]^T$. This result depends on the symmetric Toeplitz nature of $R_{ss} + R_{ww}$, and the fact that $h$ is just $a$ when flipped upside down. Written out, the set of linear equations becomes

$$\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[n] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[n-1] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[n] & r_{xx}[n-1] & \cdots & r_{xx}[0] \end{bmatrix}\begin{bmatrix} h^{(n)}[0] \\ h^{(n)}[1] \\ \vdots \\ h^{(n)}[n] \end{bmatrix} = \begin{bmatrix} r_{ss}[0] \\ r_{ss}[1] \\ \vdots \\ r_{ss}[n] \end{bmatrix} \qquad (12.59)$$

where $r_{xx}[k] = r_{ss}[k] + r_{ww}[k]$. These are the Wiener-Hopf filtering equations. It appears that they must be solved for each value of $n$. However, a computationally efficient algorithm for doing so is the Levinson recursion, which solves the equations recursively to avoid resolving them for each value of $n$ [Marple 1987].
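Because the matrix in (12.59) is symmetric Toeplitz, standard routines that exploit this structure can be used; scipy.linalg.solve_toeplitz, for instance, is based on a Levinson-type recursion. The sketch below is illustrative (the exponential signal ACF is an assumption carried over from the earlier sketch): it solves (12.59) for one value of $n$ and forms $\hat s[n]$ from (12.58).

```python
import numpy as np
from scipy.linalg import solve_toeplitz

n, sigma2 = 20, 1.0
k = np.arange(n + 1)
r_ss = 0.95 ** k                      # assumed signal ACF (illustrative)
r_xx = r_ss + sigma2 * (k == 0)       # r_xx[k] = r_ss[k] + r_ww[k]

# Wiener-Hopf filtering equations (12.59): Toeplitz(r_xx) h = r_ss.
h = solve_toeplitz(r_xx, r_ss)        # h^(n)[0], ..., h^(n)[n]

# Time varying FIR filter (12.58) applied to a data record x[0..n].
rng = np.random.default_rng(3)
x = rng.standard_normal(n + 1)        # placeholder data
s_hat_n = np.dot(h, x[::-1])          # sum_k h^(n)[k] x[n-k]
```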


For large enough $n$ it can be shown that the filter becomes time invariant so that only a single solution is necessary. In this case an analytical solution may be possible. The Wiener-Hopf filtering equations can be written as

$$\sum_{k=0}^{n} h^{(n)}[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots, n$$

where we have used the property $r_{xx}[-k] = r_{xx}[k]$. As $n \to \infty$, we have upon replacing the time varying impulse response $h^{(n)}[k]$ by its time invariant version $h[k]$

$$\sum_{k=0}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots. \qquad (12.60)$$

The same set of equations result if we attempt to estimate $s[n]$ based on the present and infinite past or based on $x[m]$ for $m \leq n$. This is termed the infinite Wiener filter. To see why this is so we let

$$\hat s[n] = \sum_{k=0}^{\infty} h[k]x[n-k]$$

and use the orthogonality principle (12.13). Then, we have

$$s[n] - \hat s[n] \perp \ldots, x[n-1], x[n]$$

or by using the definition of orthogonality

$$E\left[(s[n] - \hat s[n])x[n-l]\right] = 0 \qquad l = 0, 1, \ldots.$$

Hence,

$$E\left[\sum_{k=0}^{\infty} h[k]x[n-k]x[n-l]\right] = E(s[n]x[n-l])$$

and therefore, the equations to be solved for the infinite Wiener filter impulse response are

$$\sum_{k=0}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad l = 0, 1, \ldots.$$

A little thought will convince the reader that the solutions must be identical since the problem of estimating $s[n]$ based on $x[m]$ for $0 \leq m \leq n$ as $n \to \infty$ is really just that of using the present and infinite past to estimate the current sample. The time invariance of the filter makes the solution independent of which sample is to be estimated or independent of $n$. The solution of (12.60) utilizes spectral factorization and is explored in Problem 12.16. At first glance it might appear that (12.60) could be solved by using Fourier transform techniques since the left-hand side of the equation is a convolution of two sequences. This is not the case, however, since the equations hold only for $l \geq 0$. If, indeed, they were valid for $l < 0$ as well, then the Fourier transform approach would be viable. Such a set of equations arises in the smoothing problem in which $s[n]$ is


to be estimated based on $\{\ldots, x[-1], x[0], x[1], \ldots\}$ or $x[k]$ for all $k$. In this case the smoothing estimator takes the form

$$\hat s[n] = \sum_{k=-\infty}^{\infty} a_k x[k]$$

and by letting $h[k] = a_{n-k}$ we have the convolution sum

$$\hat s[n] = \sum_{k=-\infty}^{\infty} h[k]x[n-k]$$

where $h[k]$ is the impulse response of an infinite two-sided time invariant filter. The Wiener-Hopf equations become

$$\sum_{k=-\infty}^{\infty} h[k]r_{xx}[l - k] = r_{ss}[l] \qquad -\infty < l < \infty \qquad (12.61)$$

(see Problem 12.17). The difference from the filtering case is that now the equations must be satisfied for all $l$, and there is no constraint that $h[k]$ must be causal. Hence, we can use Fourier transform techniques to solve for the impulse response since (12.61) becomes

$$h[n] * r_{xx}[n] = r_{ss}[n] \qquad (12.62)$$

where $*$ denotes convolution. Letting $H(f)$ be the frequency response of the infinite Wiener smoother, we have upon taking the Fourier transform of (12.62)

$$H(f) = \frac{P_{ss}(f)}{P_{xx}(f)} = \frac{P_{ss}(f)}{P_{ss}(f) + P_{ww}(f)}.$$

The frequency response is a real even function of frequency, so that the impulse response must also be real and even. This means that the filter is not causal. Of course, this is to be expected since we sought to estimate $s[n]$ using future as well as present and past data. Before leaving this topic it is worthwhile to point out that the Wiener smoother emphasizes portions of the frequency spectrum of the data where the SNR is high and attenuates those where it is low. This is evident if we define the "local SNR" as the SNR in a narrow band of frequencies centered about $f$ as

$$\eta(f) = \frac{P_{ss}(f)}{P_{ww}(f)}.$$

Then, the optimal filter frequency response becomes

$$H(f) = \frac{\eta(f)}{\eta(f) + 1}.$$

Clearly, the filter response satisfies $0 < H(f) < 1$, and thus the Wiener smoother response is $H(f) \approx 0$ when $\eta(f) \approx 0$ (low local SNR) and $H(f) \approx 1$ when $\eta(f) \to \infty$ (high local SNR). The reader may wish to compare these results to those of the Wiener filter.
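The infinite Wiener smoother is most easily applied in the frequency domain: form $H(f) = P_{ss}(f)/(P_{ss}(f) + P_{ww}(f))$ on a frequency grid and multiply. The sketch below is a rough illustration only; the PSDs are assumed (an AR(1)-like signal spectrum in white noise), and the circular convolution implied by the FFT is ignored.

```python
import numpy as np

N, sigma2 = 1024, 1.0
f = np.fft.fftfreq(N)                            # frequencies in cycles/sample

# Assumed PSDs: low-pass signal spectrum plus white noise (illustrative).
P_ss = 1.0 / np.abs(1.0 - 0.9 * np.exp(-2j * np.pi * f)) ** 2
P_ww = sigma2 * np.ones(N)

H = P_ss / (P_ss + P_ww)                         # Wiener smoother frequency response

rng = np.random.default_rng(4)
x = rng.standard_normal(N)                       # placeholder noisy data record
s_hat = np.fft.ifft(H * np.fft.fft(x)).real      # approximate smoothed output
```

Note that $H(f) = \eta(f)/(\eta(f) + 1)$ here, so frequencies where the assumed local SNR is low are attenuated, as discussed above.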
