Faculty of EEE
SSPre2012
DHT, HCMUT
Dept. of Telecomm. Eng.
Chapter 4:
Mean Square Estimation
4. Minimum Mean Square Estimation (1)
Given some information that is related to an unknown quantity of interest, the problem is to obtain a good estimate for the unknown in terms of the observed data.
Suppose $X_1, X_2, \ldots, X_n$ represent a sequence of random variables about which one set of observations is available, and $Y$ represents an unknown random variable. The problem is to obtain a good estimate for $Y$ in terms of the observations $X_1, X_2, \ldots, X_n$. Let
$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = \varphi(X) \tag{4-1}$$
represent such an estimate for $Y$. Note that $\varphi(\cdot)$ can be a linear or a nonlinear function of the observation $X = (X_1, X_2, \ldots, X_n)$. Clearly,
$$\varepsilon = Y - \hat{Y} = Y - \varphi(X) \tag{4-2}$$
represents the error in the above estimate, and $|\varepsilon|^2$ the square of the error.
4. Minimum Mean Square Estimation (2)
Since $\varepsilon$ is a random variable, $E\{|\varepsilon|^2\}$ represents the mean square error. One strategy to obtain a good estimator would be to minimize the mean square error by varying over all possible forms of $\varphi(\cdot)$, and this procedure gives rise to the Minimization of the Mean Square Error (MMSE) criterion for estimation. Thus, under the MMSE criterion, the estimator $\varphi(\cdot)$ is chosen such that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum.
Next we show that the conditional mean of $Y$ given $X$ is the best estimator in the above sense.
Theorem 1: Under the MMSE criterion, the best estimator for the unknown $Y$ in terms of $X_1, X_2, \ldots, X_n$ is given by the conditional mean of $Y$ given $X$. Thus
$$\hat{Y} = \varphi(X) = E\{Y \mid X\}. \tag{4-3}$$
4. Minimum Mean Square Estimation (3)
Proof: Let $\hat{Y} = \varphi(X)$ represent an estimate of $Y$ in terms of $X = (X_1, X_2, \ldots, X_n)$. Then the error $\varepsilon = Y - \hat{Y}$, and the mean square error is given by
$$\sigma_\varepsilon^2 = E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}|^2\} = E\{|Y - \varphi(X)|^2\}. \tag{4-4}$$
Since
$$E_z[z] = E_X[\,E_z\{z \mid X\}\,], \tag{4-5}$$
we can rewrite (4-4) as
$$\sigma_\varepsilon^2 = E\{|Y - \varphi(X)|^2\} = E_X[\,E_Y\{|Y - \varphi(X)|^2 \mid X\}\,],$$
where the inner expectation is with respect to $Y$, and the outer one is with respect to $X$. Thus
$$\sigma_\varepsilon^2 = E_X[\,E\{|Y - \varphi(X)|^2 \mid X\}\,] = \int_{-\infty}^{+\infty} E\{|Y - \varphi(X)|^2 \mid X = x\}\, f_X(x)\, dx. \tag{4-6}$$
4. Minimum Mean Square Estimation (4)
To obtain the best estimator $\varphi$, we need to minimize $\sigma_\varepsilon^2$ in (4-6) with respect to $\varphi$. In (4-6), since
$$f_X(x) \ge 0 \quad \text{and} \quad E\{|Y - \varphi(X)|^2 \mid X\} \ge 0,$$
and the variable $\varphi$ appears only in the integrand term, minimization of the mean square error $\sigma_\varepsilon^2$ in (4-6) with respect to $\varphi$ is equivalent to minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ with respect to $\varphi$. Since $X$ is fixed at some value, $\varphi(X)$ is no longer random, and hence minimization of $E\{|Y - \varphi(X)|^2 \mid X\}$ is equivalent to
$$\frac{\partial}{\partial \varphi} E\{|Y - \varphi(X)|^2 \mid X\} = 0. \tag{4-7}$$
This gives
$$E\{(Y - \varphi(X)) \mid X\} = 0,$$
or
$$E\{Y \mid X\} - E\{\varphi(X) \mid X\} = 0. \tag{4-8}$$
4. Minimum Mean Square Estimation (5)
But
$$E\{\varphi(X) \mid X\} = \varphi(X), \tag{4-9}$$
since when $X = x$, $\varphi(X)$ is a fixed number $\varphi(x)$. Using (4-9) in (4-8) we get the desired estimator to be
$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{4-10}$$
Thus, the conditional mean of $Y$ given $X_1, X_2, \ldots, X_n$ represents the best estimator for $Y$ that minimizes the mean square error.
The minimum value of the mean square error is given by
$$\sigma_{\min}^2 = E[\,E\{|Y - E(Y \mid X)|^2 \mid X\}\,] = E\{\operatorname{var}(Y \mid X)\} \ge 0. \tag{4-11}$$
4. Minimum Mean Square Estimation (6)
As an example, suppose $Y = X^3$ is the unknown. Then the best MMSE estimator is given by
$$\hat{Y} = E\{Y \mid X\} = E\{X^3 \mid X\} = X^3. \tag{4-12}$$
Clearly, if $Y = X^3$, then $\hat{Y} = X^3$ indeed is the best estimator for $Y$ in terms of $X$. Thus the best estimator can be nonlinear.
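This can be checked numerically. The sketch below is a Monte Carlo illustration (the uniform distribution for $X$ is an assumed choice, not part of the slides) comparing the MMSE estimator $\hat{Y} = X^3$ of (4-12) with the best linear estimator of the form $aX$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, 1_000_000)   # assumed: X uniform on (-1, 1)
Y = X**3

# Best overall (nonlinear) estimator from (4-12): Y_hat = X^3, zero error.
mse_nonlinear = np.mean((Y - X**3)**2)

# Best linear estimator a*X: the orthogonality condition E{(Y - aX)X} = 0
# gives a = E{YX}/E{X^2}.
a = np.mean(Y * X) / np.mean(X * X)
mse_linear = np.mean((Y - a * X)**2)

print(mse_nonlinear, a, mse_linear)
```

For $X$ uniform on $(-1,1)$ the linear coefficient works out to $a = E\{X^4\}/E\{X^2\} = 3/5$ with residual error $4/175 \approx 0.023$, while the nonlinear estimator achieves zero error.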
4. Minimum Mean Square Estimation (7)
Example: Let
$$f_{X,Y}(x, y) = \begin{cases} kxy, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$
where $k > 0$ is a suitable normalization constant. To determine the best estimate for $Y$ in terms of $X$, we need the conditional density $f_{Y|X}(y \mid x)$. Thus
$$f_X(x) = \int_x^1 f_{X,Y}(x, y)\, dy = \int_x^1 kxy\, dy = \frac{kx(1 - x^2)}{2}, \quad 0 < x < 1,$$
and
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{kxy}{kx(1 - x^2)/2} = \frac{2y}{1 - x^2}, \quad 0 < x < y < 1. \tag{4-13}$$
4. Minimum Mean Square Estimation (8)
Hence the best MMSE estimator is given by
$$\hat{Y} = \varphi(X) = E\{Y \mid X\} = \int_x^1 y\, f_{Y|X}(y \mid x)\, dy = \int_x^1 \frac{2y^2}{1 - x^2}\, dy = \frac{2}{3}\,\frac{1 - x^3}{1 - x^2} = \frac{2}{3}\,\frac{x^2 + x + 1}{x + 1}. \tag{4-14}$$
Once again the best estimator is nonlinear. In general the best estimator $E\{Y \mid X\}$ is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.
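The result (4-14) can be verified by simulation. Below is a minimal sketch; the normalization constant works out to $k = 8$ for this density, while the rejection-sampling scheme and the bin width are implementation choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rejection-sample from f(x, y) = 8xy on the triangle 0 < x < y < 1:
# draw (x, y) uniformly on the unit square, accept with probability x*y,
# then keep only the pairs with x < y.
N = 2_000_000
x = rng.random(N)
y = rng.random(N)
keep = (rng.random(N) < x * y) & (x < y)
x, y = x[keep], y[keep]

def phi(x):
    """Best MMSE estimator from (4-14): E{Y | X = x}."""
    return (2.0 / 3.0) * (x**2 + x + 1.0) / (x + 1.0)

# Estimate E{Y | X = x0} empirically by averaging y over a narrow bin around x0.
for x0 in (0.2, 0.5, 0.8):
    in_bin = np.abs(x - x0) < 0.01
    print(x0, y[in_bin].mean(), phi(x0))
```

The binned conditional averages should agree with $\varphi(x_0)$ to within Monte Carlo noise.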
Best Linear Estimator (1)
In this case the estimator $\hat{Y}$ is a linear function of the observations $X_1, X_2, \ldots, X_n$. Thus
$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n, \tag{4-15}$$
where $a_1, a_2, \ldots, a_n$ are unknown quantities to be determined. The mean square error is given by ($\varepsilon = Y - \hat{Y}_l$)
$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}_l|^2\} = E\{|Y - \sum_i a_i X_i|^2\}, \tag{4-16}$$
and under the MMSE criterion $a_1, a_2, \ldots, a_n$ should be chosen so that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum possible value. Let $\sigma_n^2$ represent that minimum possible value. Then
$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\{|Y - \sum_{i=1}^{n} a_i X_i|^2\}. \tag{4-17}$$
To minimize (4-16), we can equate
$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = 0, \quad k = 1, 2, \ldots, n. \tag{4-18}$$
Best Linear Estimator (2)
This gives
$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = E\left\{\frac{\partial |\varepsilon|^2}{\partial a_k}\right\} = E\left\{2\varepsilon\, \frac{\partial \varepsilon^*}{\partial a_k}\right\} = 0. \tag{4-19}$$
But
$$\frac{\partial \varepsilon}{\partial a_k} = \frac{\partial\big(Y - \sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = \frac{\partial Y}{\partial a_k} - \frac{\partial\big(\sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = -X_k. \tag{4-20}$$
Substituting (4-20) into (4-19), we get
$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = -2E\{\varepsilon X_k^*\} = 0,$$
or the best linear estimator must satisfy
$$E\{\varepsilon X_k^*\} = 0, \quad k = 1, 2, \ldots, n. \tag{4-21}$$
Best Linear Estimator (3)
Notice that in (4-21), $\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ represents the estimation error and $X_k$, $k = 1 \to n$, represents the data. Thus from (4-21), the error $\varepsilon$ is orthogonal to the data $X_k$, $k = 1 \to n$, for the best linear estimator. This is the orthogonality principle.
In other words, in the linear estimator (4-15), the unknown constants $a_1, a_2, \ldots, a_n$ must be selected such that the error $\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ is orthogonal to every data sample $X_k$, $k = 1 \to n$, for the best linear estimator that minimizes the mean square error.
Interestingly, a general form of the orthogonality principle holds good in the case of nonlinear estimators also.
Nonlinear orthogonality rule: Let $h(X)$ represent any functional form of the data $X_1, X_2, \ldots, X_n$, and $E\{Y \mid X\}$ the best estimator for $Y$ given $X$. With $e = Y - E\{Y \mid X\}$, we shall show that
Best Linear Estimator (4)
$$E\{e\, h(X)\} = 0, \tag{4-22}$$
implying that
$$e = Y - E\{Y \mid X\} \;\perp\; h(X).$$
This follows since
$$\begin{aligned}
E\{e\, h(X)\} &= E\{(Y - E[Y \mid X])\, h(X)\} \\
&= E\{Y h(X)\} - E\{E[Y \mid X]\, h(X)\} \\
&= E\{Y h(X)\} - E\{E[\,Y h(X) \mid X\,]\} \\
&= E\{Y h(X)\} - E\{Y h(X)\} = 0.
\end{aligned}$$
Thus in the nonlinear version of the orthogonality rule the error is orthogonal to any functional form of the data.
Best Linear Estimator (5)
The orthogonality principle in (4-21) can be used to obtain the unknowns $a_1, a_2, \ldots, a_n$ in the linear case.
For example, suppose $n = 2$, and we need to estimate $Y$ in terms of $X_1$ and $X_2$ linearly. Thus
$$\hat{Y}_l = a_1 X_1 + a_2 X_2.$$
From (4-21), the orthogonality rule gives
$$E\{\varepsilon X_1^*\} = E\{(Y - a_1 X_1 - a_2 X_2) X_1^*\} = 0,$$
$$E\{\varepsilon X_2^*\} = E\{(Y - a_1 X_1 - a_2 X_2) X_2^*\} = 0.$$
Thus
$$a_1 E\{|X_1|^2\} + a_2 E\{X_2 X_1^*\} = E\{Y X_1^*\},$$
$$a_1 E\{X_1 X_2^*\} + a_2 E\{|X_2|^2\} = E\{Y X_2^*\},$$
or
$$\begin{bmatrix} E\{|X_1|^2\} & E\{X_2 X_1^*\} \\ E\{X_1 X_2^*\} & E\{|X_2|^2\} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} E\{Y X_1^*\} \\ E\{Y X_2^*\} \end{bmatrix}. \tag{4-23}$$
Solving (4-23) yields $a_1$ and $a_2$ in terms of the cross-correlations.
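The normal equations (4-23) are easy to exercise numerically. Below is a sketch for real-valued data (the synthetic model for $Y$, $X_1$, $X_2$ is an assumed example), with the expectations replaced by sample averages:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (real-valued) data: Y depends on X1, X2 plus noise that is not
# expressible in terms of the data, so the best linear estimator is imperfect.
N = 100_000
X1 = rng.standard_normal(N)
X2 = 0.5 * X1 + rng.standard_normal(N)        # correlated observations
Y = 2.0 * X1 - 1.0 * X2 + rng.standard_normal(N)

# Normal equations (4-23), with sample averages standing in for expectations.
R = np.array([[np.mean(X1 * X1), np.mean(X2 * X1)],
              [np.mean(X1 * X2), np.mean(X2 * X2)]])
r = np.array([np.mean(Y * X1), np.mean(Y * X2)])
a1, a2 = np.linalg.solve(R, r)

# Orthogonality check (4-21): the error is uncorrelated with each observation.
eps = Y - (a1 * X1 + a2 * X2)
print(a1, a2, np.mean(eps * X1), np.mean(eps * X2))
```

The recovered coefficients approach the model values $(2, -1)$, and the sample orthogonality conditions hold to machine precision because the same sample moments were used to build and solve the system.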
Best Linear Estimator (6)
The minimum value of the mean square error in (4-17) is given by
$$\sigma_n^2 = \min_{a_1, \ldots, a_n} E\{\varepsilon \varepsilon^*\} = \min_{a_1, \ldots, a_n} E\left\{\varepsilon \Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^{\!*}\right\} = \min_{a_1, \ldots, a_n} \left[ E\{\varepsilon Y^*\} - \sum_{i=1}^{n} a_i^* E\{\varepsilon X_i^*\} \right]. \tag{4-24}$$
But using (4-21), the second term in (4-24) is zero, since the error $\varepsilon$ is orthogonal to the data $X_i$ when $a_1, a_2, \ldots, a_n$ are chosen to be optimum. Thus the minimum value of the mean square error is given by
$$\sigma_n^2 = E\{\varepsilon Y^*\} = E\left\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big) Y^*\right\} = E\{|Y|^2\} - \sum_{i=1}^{n} a_i E\{X_i Y^*\}, \tag{4-25}$$
where $a_1, a_2, \ldots, a_n$ are the optimum values from (4-21).
Best Linear Estimator (7)
Since the linear estimate in (4-15) is only a special case of the general estimator $\varphi(X)$ in (4-1), the best linear estimator that satisfies (4-21) cannot be superior to the best nonlinear estimator $E\{Y \mid X\}$. Often the best linear estimator will be inferior to the best estimator in (4-3).
This raises the following question: are there situations in which the best estimator in (4-3) also turns out to be linear? In those situations it is enough to use (4-21) and obtain the best linear estimators, since they also represent the best global estimators. Such is the case if $Y$ and $X_1, X_2, \ldots, X_n$ are distributed as jointly Gaussian. We summarize this in the next theorem and prove the result.
Theorem 2: If $X_1, X_2, \ldots, X_n$ and $Y$ are jointly Gaussian zero-mean random variables, then the best estimate for $Y$ in terms of $X_1, X_2, \ldots, X_n$ is always linear.
Best Linear Estimator (8)
Proof: Let
$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = E\{Y \mid X\} \tag{4-26}$$
represent the best (possibly nonlinear) estimate of $Y$, and
$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i \tag{4-27}$$
the best linear estimate of $Y$. Then from (4-21),
$$\varepsilon = Y - \hat{Y}_l = Y - \sum_{i=1}^{n} a_i X_i \tag{4-28}$$
is orthogonal to the data $X_k$, $k = 1 \to n$. Thus
$$E\{\varepsilon X_k^*\} = 0, \quad k = 1 \to n. \tag{4-29}$$
Also, from (4-28),
$$E\{\varepsilon\} = E\{Y\} - \sum_{i=1}^{n} a_i E\{X_i\} = 0. \tag{4-30}$$
Using (4-29)-(4-30), we get
$$E\{\varepsilon X_k^*\} = E\{\varepsilon\}\, E\{X_k^*\} = 0, \quad k = 1 \to n. \tag{4-31}$$
Best Linear Estimator (9)
From (4-31), we obtain that $\varepsilon$ and $X_k$ are zero-mean uncorrelated random variables for $k = 1 \to n$. But $\varepsilon$ itself represents a Gaussian random variable, since from (4-28) it represents a linear combination of a set of jointly Gaussian random variables. Thus $\varepsilon$ and $X$ are jointly Gaussian and uncorrelated random variables. As a result, $\varepsilon$ and $X$ are independent random variables. Thus, from their independence,
$$E\{\varepsilon \mid X\} = E\{\varepsilon\}. \tag{4-32}$$
But from (4-30), $E\{\varepsilon\} = 0$, and hence from (4-32),
$$E\{\varepsilon \mid X\} = 0. \tag{4-33}$$
Substituting (4-28) into (4-33), we get
$$E\{\varepsilon \mid X\} = E\left\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, X\right\} = 0, \tag{4-34}$$
or
$$E\{Y \mid X\} = E\left\{\sum_{i=1}^{n} a_i X_i \,\Big|\, X\right\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}_l.$$
Best Linear Estimator (10)
From (4-26), $E\{Y \mid X\} = \varphi(X)$ represents the best possible estimator, and from (4-34), $\sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case. Next we turn our attention to prediction problems using linear estimators.
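Theorem 2 can be illustrated by simulation. The sketch below (assuming unit-variance jointly Gaussian $X$ and $Y$ with correlation $c$; all parameters are hypothetical) estimates the conditional mean $E\{Y \mid X = x_0\}$ by binning and compares it with the linear estimate $c\,x_0$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Zero-mean, unit-variance jointly Gaussian X, Y with correlation c: the
# conditional mean E{Y | X = x} should equal the linear estimate c*x.
c = 0.7
N = 2_000_000
X = rng.standard_normal(N)
Y = c * X + np.sqrt(1 - c**2) * rng.standard_normal(N)

for x0 in (-1.0, 0.0, 1.5):
    in_bin = np.abs(X - x0) < 0.05
    print(x0, Y[in_bin].mean(), c * x0)   # empirical conditional mean vs. linear
```

For non-Gaussian data (as in the earlier $Y = X^3$ example) the two curves differ; here they coincide up to Monte Carlo noise.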
Linear Prediction (1)
Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a $k$-step ahead prediction problem. Returning back to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then
$$\hat{X}_{n+1} = -\sum_{i=1}^{n} a_i X_i, \tag{4-35}$$
where the error
$$\varepsilon_n = X_{n+1} - \hat{X}_{n+1} = X_{n+1} + \sum_{i=1}^{n} a_i X_i = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} = \sum_{i=1}^{n+1} a_i X_i, \quad a_{n+1} = 1, \tag{4-36}$$
is orthogonal to the data, i.e.,
$$E\{\varepsilon_n X_k^*\} = 0, \quad k = 1 \to n. \tag{4-37}$$
Linear Prediction (2)
Using (4-36) in (4-37), we get
$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i E\{X_i X_k^*\} = 0, \quad k = 1 \to n. \tag{4-38}$$
Suppose $X_i$ represents the $i$-th sample of a wide-sense stationary stochastic process $X(t)$, so that
$$E\{X_i X_k^*\} = R(i - k) = r_{i-k} = r_{k-i}^*. \tag{4-39}$$
Thus (4-38) becomes
$$\sum_{i=1}^{n+1} a_i r_{i-k} = 0, \quad a_{n+1} = 1, \quad k = 1 \to n. \tag{4-40}$$
Expanding (4-40) for $k = 1, 2, \ldots, n$, we get the following set of linear equations:
$$\begin{aligned}
a_1 r_0 + a_2 r_1 + a_3 r_2 + \cdots + a_n r_{n-1} + r_n &= 0 \quad \leftarrow k = 1, \\
a_1 r_1^* + a_2 r_0 + a_3 r_1 + \cdots + a_n r_{n-2} + r_{n-1} &= 0 \quad \leftarrow k = 2, \\
&\;\;\vdots \\
a_1 r_{n-1}^* + a_2 r_{n-2}^* + a_3 r_{n-3}^* + \cdots + a_n r_0 + r_1 &= 0 \quad \leftarrow k = n.
\end{aligned} \tag{4-41}$$
Linear Prediction (3)
Similarly, using (4-25), the minimum mean square error is given by
$$\sigma_n^2 = E\{|\varepsilon_n|^2\} = E\{\varepsilon_n X_{n+1}^*\} = E\left\{\Big(\sum_{i=1}^{n+1} a_i X_i\Big) X_{n+1}^*\right\} = \sum_{i=1}^{n+1} a_i r_{i-(n+1)} = a_1 r_n^* + a_2 r_{n-1}^* + \cdots + a_n r_1^* + r_0. \tag{4-42}$$
The $n$ equations in (4-41) together with (4-42) can be represented as
$$\begin{bmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
r_2^* & r_1^* & r_0 & \cdots & r_{n-2} \\
\vdots & & & \ddots & \vdots \\
r_n^* & r_{n-1}^* & r_{n-2}^* & \cdots & r_0
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix}. \tag{4-43}$$
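For a concrete autocorrelation sequence, the system (4-43) can be solved directly. The sketch below assumes a unit-variance AR(1) autocorrelation $r_k = \rho^{|k|}$ (a hypothetical example); the resulting one-step predictor should reduce to $\hat{X}_{n+1} = \rho X_n$ with error power $1 - \rho^2$:

```python
import numpy as np

# Autocorrelation of a unit-variance AR(1) process (assumed example): r_k = rho^|k|.
rho, n = 0.8, 3
idx = np.arange(n + 1)
T = rho ** np.abs(idx[:, None] - idx[None, :])   # (n+1)x(n+1) Toeplitz matrix of (4-43)

# (4-43): [a_1, ..., a_n, 1]^T = sigma_n^2 * (last column of the inverse matrix).
last_col = np.linalg.solve(T, np.eye(n + 1)[:, -1])
sigma2 = 1.0 / last_col[-1]        # minimum mean square error sigma_n^2
a = sigma2 * last_col              # error coefficients of (4-36); a[-1] == 1

# One-step predictor (4-35): X_hat_{n+1} = -(a_1 X_1 + ... + a_n X_n).
print(-a[:-1], sigma2)             # for AR(1), only the most recent sample matters
```

In practice the Toeplitz structure is exploited by the Levinson recursion instead of a general linear solve, which reduces the cost from $O(n^3)$ to $O(n^2)$.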
Linear Prediction (4)
Let
$$T_n = \begin{bmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & \ddots & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_0
\end{bmatrix}. \tag{4-44}$$
Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (4-44), the unknowns in (4-43) can be represented as
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix}
= T_n^{-1} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix}
= \sigma_n^2 \,\big(\text{last column of } T_n^{-1}\big). \tag{4-45}$$
Linear Prediction (5)
Let
$$T_n^{-1} = \begin{bmatrix}
T_{11} & T_{12} & \cdots & T_{1,n+1} \\
T_{21} & T_{22} & \cdots & T_{2,n+1} \\
\vdots & & & \vdots \\
T_{n+1,1} & T_{n+1,2} & \cdots & T_{n+1,n+1}
\end{bmatrix}. \tag{4-46}$$
Then from (4-45),
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ 1 \end{bmatrix}
= \sigma_n^2 \begin{bmatrix} T_{1,n+1} \\ T_{2,n+1} \\ \vdots \\ T_{n+1,n+1} \end{bmatrix}. \tag{4-47}$$
Thus
$$\sigma_n^2 = \frac{1}{T_{n+1,n+1}} > 0, \tag{4-48}$$
Linear Prediction (6)
and
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix}
= \frac{1}{T_{n+1,n+1}} \begin{bmatrix} T_{1,n+1} \\ T_{2,n+1} \\ \vdots \\ T_{n+1,n+1} \end{bmatrix}. \tag{4-49}$$
Eq. (4-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (4-45). Using these, the best one-step ahead predictor in (4-35) takes the form
$$\hat{X}_{n+1} = -\frac{1}{T_{n+1,n+1}} \sum_{i=1}^{n} T_{i,n+1}\, X_i, \tag{4-50}$$
and from (4-48), the minimum mean square error is given by the $(n+1, n+1)$ entry of $T_n^{-1}$.
Linear Prediction (7)
From (4-36), since the one-step linear prediction error is
$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{4-51}$$
we can represent (4-51) formally as the filtering operation
$$X_{n+1} \;\to\; \big[\, 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n} \,\big] \;\to\; \varepsilon_n.$$
Thus, let
$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}; \tag{4-52}$$
then, from the above block diagram, we also have the representation
$$\varepsilon_n \;\to\; \Big[\, \frac{1}{A_n(z)} \,\Big] \;\to\; X_{n+1}.$$
Linear Prediction (8)
The filter
$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{4-53}$$
represents an AR($n$) filter, and this shows that linear prediction leads to an autoregressive (AR) model.
The polynomial $A_n(z)$ in (4-52)-(4-53) can be simplified using (4-43)-(4-44). To see this, we rewrite $A_n(z)$ as
$$A_n(z) = 1 + a_n z^{-1} + \cdots + a_1 z^{-n}
= [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,]
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{bmatrix}
= [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,]\; T_n^{-1}
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix}. \tag{4-54}$$
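The whitening interpretation of $A_n(z)$ can be checked by simulation. For an AR(1) example (assumed parameters, not from the slides), the error filter is $A_1(z) = 1 - \rho z^{-1}$, and passing $X$ through it should produce white noise of power $1 - \rho^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, N = 0.8, 200_000

# Generate a unit-variance AR(1) process X_k = rho*X_{k-1} + w_k (assumed example).
w = rng.standard_normal(N) * np.sqrt(1 - rho**2)
x = np.zeros(N)
for k in range(1, N):
    x[k] = rho * x[k - 1] + w[k]

# A_1(z) = 1 - rho z^{-1}: the output is the one-step prediction error, which
# should be white with power sigma_1^2 = 1 - rho^2.
e = x[1:] - rho * x[:-1]
err_power = np.mean(e**2)
lag1_corr = np.mean(e[1:] * e[:-1]) / err_power
print(err_power, lag1_corr)
```

The error power comes out near $0.36$ and the lag-1 correlation near zero, consistent with the prediction error being white noise driving the AR model $H(z) = 1/A_n(z)$.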
Linear Prediction (9)
To simplify (4-54), we can make use of the following matrix identity:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} I & -A^{-1}B \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} A & 0 \\ C & D - CA^{-1}B \end{bmatrix}. \tag{4-55}$$
Taking determinants, we get
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\, \big| D - CA^{-1}B \big|. \tag{4-56}$$
In particular, if $D \equiv 0$ (a scalar), we get
$$CA^{-1}B = \frac{-1}{|A|}\begin{vmatrix} A & B \\ C & 0 \end{vmatrix}. \tag{4-57}$$
Using (4-57) in (4-54), with
$$C = [\,z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1\,], \quad A = T_n, \quad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{bmatrix},$$
Linear Prediction (10)
we get
$$A_n(z) = C\, T_n^{-1} B = \frac{-1}{|T_n|}\begin{vmatrix} T_n & B \\ C & 0 \end{vmatrix}
= \frac{\sigma_n^2}{|T_n|}
\begin{vmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1
\end{vmatrix}. \tag{4-58}$$
Referring back to (4-43), using Cramer's rule to solve for $a_{n+1}\,(= 1)$, we get
$$a_{n+1} = \frac{\sigma_n^2\, |T_{n-1}|}{|T_n|} = 1,$$
Linear Prediction (11)
or
$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{4-59}$$
Thus the polynomial in (4-58) reduces to
$$A_n(z) = \frac{1}{|T_{n-1}|}
\begin{vmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1
\end{vmatrix}
= 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}. \tag{4-60}$$
Linear Prediction (12)
The polynomial $A_n(z)$ in (4-53) can be alternatively represented as in (4-60), and
$$H(z) = \frac{1}{A_n(z)} \sim \text{AR}(n)$$
in fact represents a stable AR filter of order $n$, whose input error signal $\varepsilon_n$ is white noise of constant spectral height equal to $|T_n| / |T_{n-1}|$ and whose output is $X_{n+1}$.
It can be shown that $A_n(z)$ has all its zeros strictly inside the unit circle ($|z| < 1$), provided $|T_n| > 0$, thus establishing the stability of $H(z)$.
Linear Prediction (13)
Linear prediction error: From (4-59), the mean square error using $n$ samples is given by
$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{4-61}$$
Suppose one more sample from the past is available to evaluate $X_{n+1}$ (i.e., $X_0, X_1, \ldots, X_{n-1}, X_n$ are available). Proceeding as above, the new coefficients and the mean square error $\sigma_{n+1}^2$ can be determined. From (4-59)-(4-61),
$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|}. \tag{4-62}$$
Using another matrix identity, it is easy to show that
$$|T_{n+1}| = \frac{|T_n|^2}{|T_{n-1}|}\,\big(1 - |s_{n+1}|^2\big). \tag{4-63}$$
Linear Prediction (14)
Since $|T_k| > 0$ for every $k$, we must have $(1 - |s_{n+1}|^2) > 0$, or $|s_{n+1}| < 1$, for every $n$. From (4-63), we have
$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|} = \frac{|T_n|}{|T_{n-1}|}\,\big(1 - |s_{n+1}|^2\big),$$
or
$$\sigma_{n+1}^2 = \sigma_n^2\,\big(1 - |s_{n+1}|^2\big) < \sigma_n^2, \tag{4-64}$$
since $(1 - |s_{n+1}|^2) < 1$. Thus the mean square error decreases as more and more samples are used from the past in the linear predictor. In general, from (4-64), the mean square errors for the one-step predictor form a monotonic nonincreasing sequence
$$\sigma_n^2 \ge \sigma_{n+1}^2 \ge \cdots \ge \sigma_k^2 \ge \cdots \to \sigma_\infty^2, \tag{4-65}$$
whose limiting value $\sigma_\infty^2 \ge 0$.
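The monotonic decrease in (4-65) can be observed by computing $\sigma_n^2 = |T_n|/|T_{n-1}|$ from (4-61) for successively larger autocorrelation matrices. The sketch below uses an MA(1) autocorrelation as a hypothetical example:

```python
import numpy as np

# Autocorrelation of an MA(1) process X_k = w_k + theta*w_{k-1} with unit-variance
# innovations (assumed example): r_0 = 1 + theta^2, r_1 = theta, r_k = 0 otherwise.
theta = 0.9

def r(k):
    k = abs(k)
    if k == 0:
        return 1 + theta**2
    if k == 1:
        return theta
    return 0.0

# Determinants of successively larger autocorrelation matrices; the ratio of
# consecutive determinants gives the one-step prediction error as in (4-61).
dets = [np.linalg.det(np.array([[r(i - j) for j in range(m)] for i in range(m)]))
        for m in range(1, 10)]
sigma2 = [dets[0]] + [dets[m] / dets[m - 1] for m in range(1, len(dets))]
print(sigma2)
```

For an invertible MA(1) model the sequence decreases monotonically toward the innovation variance (here 1) without reaching it, consistent with (4-65) and the strict positivity of $\sigma_\infty^2$ for a regular process.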
Linear Prediction (15)
Clearly, $\sigma_\infty^2 \ge 0$ corresponds to the irreducible error in linear prediction using the entire past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation
$$\sigma_\infty^2 = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \right\} \ge 0, \tag{4-66}$$
where $S_{XX}(\omega) \ge 0$ represents the power spectrum of $X(nT)$.
For any finite power process, we have
$$\int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty,$$
and since $S_{XX}(\omega) \ge 0$, $\ln S_{XX}(\omega) \le S_{XX}(\omega)$. Thus
$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \le \int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty. \tag{4-67}$$
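Formula (4-66) can be checked numerically for a process whose innovation variance is known. The sketch below (assuming the unit-variance AR(1) spectrum; the grid size is arbitrary) evaluates the integral on a uniform grid:

```python
import numpy as np

# Power spectrum of the unit-variance AR(1) process with r_k = rho^|k|
# (assumed example): S_XX(w) = (1 - rho^2) / |1 - rho*e^{-jw}|^2.
rho = 0.8
w = np.linspace(-np.pi, np.pi, 200_001)
S = (1 - rho**2) / np.abs(1 - rho * np.exp(-1j * w))**2

# Kolmogorov's formula (4-66): the grid mean of ln S approximates
# (1/2pi) * integral of ln S over (-pi, pi).
sigma_inf2 = np.exp(np.mean(np.log(S)))
print(sigma_inf2)   # approximately the AR(1) innovation variance 1 - rho^2 = 0.36
```

The computed $\sigma_\infty^2$ agrees with the irreducible one-step error $1 - \rho^2$ of the AR(1) process, as expected for a regular process with strictly positive spectrum.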
Linear Prediction (16)
Moreover, if the power spectrum is strictly positive at every frequency, i.e.,
$$S_{XX}(\omega) > 0 \quad \text{in } -\pi < \omega < \pi, \tag{4-68}$$
then from (4-66),
$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega > -\infty, \tag{4-69}$$
and hence
$$\sigma_\infty^2 = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \right\} > e^{-\infty} = 0. \tag{4-70}$$
That is, for processes that satisfy the strict positivity condition in (4-68) almost everywhere in the interval $(-\pi, \pi)$, the final minimum mean square error is strictly positive (see (4-70)). Such processes are not completely predictable even using their entire set of past samples; they are inherently stochastic, since the next output contains information that is not contained in the past samples. Such processes are known as regular stochastic processes, and their power spectrum is strictly positive.
Linear Prediction (17)
[Figure: power spectrum $S_{XX}(\omega)$ of a regular stochastic process — strictly positive over the whole interval $-\pi < \omega < \pi$.]
Conversely, if a process has the following power spectrum,
[Figure: power spectrum $S_{XX}(\omega)$ on $(-\pi, \pi)$ that vanishes over a band $\omega_1 < \omega < \omega_2$,]
such that $S_{XX}(\omega) = 0$ in $\omega_1 < \omega < \omega_2$, then from (4-70), $\sigma_\infty^2 = 0$.
Linear Prediction (18)
Such processes are completely predictable from their past data samples. In particular,
$$X(nT) = \sum_k a_k \cos(\omega_k nT + \phi_k) \tag{4-71}$$
is completely predictable from its past samples, since its spectrum $S_{XX}(\omega)$ consists of line spectra.
[Figure: line spectrum $S_{XX}(\omega)$ with impulses at the frequencies $\omega_1, \ldots, \omega_k$.]
$X(nT)$ in (4-71) is such a deterministic (completely predictable) stochastic process.