Estimation Theory
Course Objective
Extract information from noisy signals.
Parameter Estimation Problem: Given a set of measured data
{x[0], x[1], . . . , x[N − 1]}
which depends on an unknown parameter vector θ, determine an estimator
θ̂ = g(x[0], x[1], . . . , x[N − 1])
where g is some function.
Some Examples
Range estimation: We transmit a pulse that is reflected by the aircraft.
An echo is received after τ seconds. The range is estimated from the equation
R = cτ/2, where c is the speed of light.
System identification: The input of a plant is excited with u and the
output signal y is measured. If we have:
y[k] = G(q⁻¹, θ)u[k] + n[k]
where n[k] is the measurement noise, the model parameters θ are estimated
using u[k] and y[k] for k = 1, . . . , N.
DC level in noise: Consider a set of data {x[0], x[1], . . . , x[N − 1]} that
can be modeled as:
x[n] = A + w[n]
where w[n] is some zero-mean noise process. A can be estimated using the
measured data set.
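As a small illustration of this last example, here is a minimal Python sketch (the values of N, A and sigma are arbitrary choices made only for the illustration, not taken from the course):

import numpy as np

rng = np.random.default_rng(0)
N, A, sigma = 100, 1.0, 0.5                 # illustrative (assumed) values
x = A + sigma * rng.standard_normal(N)      # x[n] = A + w[n], w[n] zero-mean noise
A_hat = x.mean()                            # estimate A by the sample mean
print(A_hat)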
Outline
Classical estimation (θ deterministic)
Minimum Variance Unbiased Estimator (MVU)
Cramer-Rao Lower Bound (CRLB)
Best Linear Unbiased Estimator (BLUE)
Maximum Likelihood Estimator (MLE)
Least Squares Estimator (LSE)
Bayesian estimation (θ stochastic)
Minimum Mean Square Error Estimator (MMSE)
Maximum A Posteriori Estimator (MAP)
Linear MMSE Estimator
Kalman Filter
References
Main reference :
Other references :
Lessons in Estimation Theory for Signal Processing, Communications
and Control. By Jerry M. Mendel, Prentice-Hall, 1995.
Probability, Random Processes and Estimation Theory for Engineers.
By Henry Stark and John W. Woods, Prentice-Hall, 1986.
Random Variables
Random Variable: A rule X(ω) that assigns to every element ω of a sample
space Ω a real value is called a RV. So X is not really a variable that varies
randomly but a function whose domain is Ω and whose range is some
subset of the real line.
Example: Consider the experiment of flipping a coin twice. The sample
space (the possible outcomes) is:
Ω = {HH, HT, TH, TT}
We can define a random variable X such that
X(HH) = 1, X(HT) = 1.1, X(TH) = 1.6, X(TT) = 1.8
Random variable X assigns to each event (e.g. E = {HT, TH}) a
subset of the real line (in this case B = {1.1, 1.6}).
Probability Density Function (PDF)
The PDF of a random variable X is the derivative of its distribution function P(x):
p(x) = dP(x)/dx
Properties:
(i) p(x) ≥ 0
(ii) ∫_{−∞}^{∞} p(x) dx = 1
(iii) Pr{a < X ≤ b} = ∫_a^b p(ξ) dξ
Gaussian (Normal) PDF:
p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
The PDF has two parameters: μ, the mean, and σ², the variance.
We write X ∼ N(μ, σ²) when the random variable X has a normal
(Gaussian) distribution with mean μ and standard deviation σ.
Small σ means small variability (uncertainty) and large σ means large
variability.
Remark: The Gaussian distribution is important because, according to the
Central Limit Theorem, the sum of N independent RVs has a PDF that
converges to a Gaussian distribution as N goes to infinity.
Uniform on (a, b):
p(x) = 1/(b − a)   for a < x < b,   0 otherwise
Exponential (λ > 0):
p(x) = (1/λ) exp(−x/λ) u(x)
Rayleigh (σ > 0):
p(x) = (x/σ²) exp(−x²/(2σ²)) u(x)
Chi-square χ²_n:
p(x) = 1/(2^{n/2} Γ(n/2)) x^{n/2−1} exp(−x/2)   for x > 0,   0 for x < 0
where Γ(z) = ∫_0^∞ u^{z−1} e^{−u} du.
Marginal PDF:
p(x) = ∫_{−∞}^{∞} p(x, y) dy    and    p(y) = ∫_{−∞}^{∞} p(x, y) dx
Conditional PDF:
p(x|y) = p(x, y)/p(y)
If X and Y are independent, then p(x, y) = p(x)p(y), so that
p(x|y) = p(x)p(y)/p(y) = p(x)    and    p(y|x) = p(y)
Moments of a Gaussian RV X ∼ N(μ, σ²):
E(X)  = μ
E(X²) = μ² + σ²
E(X³) = μ³ + 3μσ²
E(X⁴) = μ⁴ + 6μ²σ² + 3σ⁴
E(X⁵) = μ⁵ + 10μ³σ² + 15μσ⁴
E(X⁶) = μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶
Central Moments
The r-th central moment of X is defined as:
E[(X − μ)^r] = ∫_{−∞}^{∞} (x − μ)^r p(x) dx
Covariance
For two RVs X and Y, the covariance is defined as
σ_xy = E[(X − μ_x)(Y − μ_y)] = ∫∫ (x − μ_x)(y − μ_y) p(x, y) dx dy
Random Vectors
Random Vector: a vector of random variables 1:
x = [x₁, x₂, . . . , xₙ]ᵀ
Expectation Vector: μx = E(x) = [E(x₁), E(x₂), . . . , E(xₙ)]ᵀ
Covariance Matrix: Cx = E[(x − μx)(x − μx)ᵀ]
Cx is an n × n symmetric matrix which is assumed to be positive
definite and hence invertible.
The elements of this matrix are: [Cx]ᵢⱼ = E{[xᵢ − E(xᵢ)][xⱼ − E(xⱼ)]}.
If the random variables are uncorrelated, then Cx is a diagonal matrix.
Multivariate Gaussian PDF:
p(x) = 1/√((2π)ⁿ det(Cx)) exp(−½ (x − μx)ᵀ Cx⁻¹ (x − μx))
1. In some books (including our main reference) there is no distinction between random
variable X and its specific value x. From now on we adopt the notation of our reference.
Random Processes
Discrete Random Process: x[n] is a sequence of random variables
defined for every integer n.
Mean value: defined as E(x[n]) = μx[n].
Autocorrelation Function (ACF): defined as
rxx[k, n] = E(x[n]x[n + k])
Wide Sense Stationary (WSS): x[n] is WSS if its mean and its
autocorrelation function (ACF) do not depend on n.
Autocovariance function: defined as
cxx[k] = E[(x[n] − μx)(x[n + k] − μx)] = rxx[k] − μx²
Cross-correlation Function (CCF): defined as
rxy[k] = E(x[n]y[n + k])
Cross-covariance function: defined as
cxy[k] = E[(x[n] − μx)(y[n + k] − μy)] = rxy[k] − μxμy
Power Spectral Density: The Fourier transforms of the ACF and CCF give
the Auto-PSD and Cross-PSD:
Pxx(f) = Σ_{k=−∞}^{∞} rxx[k] e^{−j2πfk}
Pxy(f) = Σ_{k=−∞}^{∞} rxy[k] e^{−j2πfk}
Discrete White Noise: a discrete random process with zero mean and
rxx[k] = σ²δ[k], where δ[k] is the Kronecker impulse function. The PSD of
white noise is Pxx(f) = σ², completely flat with frequency.
Example: Consider data with a linear trend observed in noise:
x[n] = A + Bn + w[n],    n = 0, 1, . . . , N − 1
Suppose that w[n] ∼ N(0, σ²) and is uncorrelated with all the other
samples. Letting θ = [A B]ᵀ and x = [x[0], x[1], . . . , x[N − 1]]ᵀ, the PDF is:
p(x; θ) = Π_{n=0}^{N−1} p(x[n]; θ) = 1/(2πσ²)^{N/2} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)²)
The quality of any estimator for this problem is tied to the assumptions
on the data model — in this example, the linear trend and the WGN PDF
assumption.
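A minimal sketch of fitting this linear-trend model by linear least squares (the numerical values are assumptions made only for the illustration):

import numpy as np

rng = np.random.default_rng(0)
N, A, B, sigma = 50, 1.0, 0.2, 0.3                 # assumed values
n = np.arange(N)
x = A + B * n + sigma * rng.standard_normal(N)     # x[n] = A + B n + w[n]

H = np.column_stack([np.ones(N), n])               # observation matrix for theta = [A, B]
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)  # minimizes ||x - H theta||^2
print(theta_hat)                                   # estimates of [A, B]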
Example: Consider again the DC level in noise,
x[n] = A + w[n],    n = 0, 1, . . . , N − 1
with two candidate estimators, the sample mean and the first sample:
Â₁ = (1/N) Σ_{n=0}^{N−1} x[n],    Â₂ = x[0]
Suppose that A = 1, Â₁ = 0.95 and Â₂ = 0.98. Which estimator is better?
Unbiased Estimators
An estimator that on the average yields the true value is unbiased.
Mathematically:
E(θ̂ − θ) = 0    for a < θ < b
Let's compute the expectation of the two estimators Â₁ and Â₂:
E(Â₁) = (1/N) Σ_{n=0}^{N−1} E(x[n]) = (1/N) Σ_{n=0}^{N−1} E(A + w[n]) = (1/N) Σ_{n=0}^{N−1} (A + 0) = A
E(Â₂) = E(x[0]) = E(A + w[0]) = A + 0 = A
Both estimators are unbiased. Which one is better?
Now, let's compute the variance of the two estimators:
var(Â₁) = var((1/N) Σ_{n=0}^{N−1} x[n]) = (1/N²) Σ_{n=0}^{N−1} var(x[n]) = (1/N²) · Nσ² = σ²/N
var(Â₂) = var(x[0]) = σ²
So Â₁ has the smaller variance and is the better estimator.
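A quick Monte-Carlo check of these two variances (a sketch; N, A, sigma and the number of runs M are assumed values):

import numpy as np

rng = np.random.default_rng(0)
N, A, sigma, M = 20, 1.0, 1.0, 100000          # assumed values; M Monte-Carlo runs
x = A + sigma * rng.standard_normal((M, N))    # M realizations of x[0..N-1]
A1 = x.mean(axis=1)                            # sample-mean estimator A1
A2 = x[:, 0]                                   # first-sample estimator A2
print(A1.mean(), A1.var())                     # approx. A and sigma^2 / N
print(A2.mean(), A2.var())                     # approx. A and sigma^2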
Unbiased Estimators
Remark: When several unbiased estimators of the same parameter from
independent sets of data are available, i.e., θ̂₁, θ̂₂, . . . , θ̂ₙ, a better estimator
can be obtained by averaging:
θ̂ = (1/n) Σ_{i=1}^{n} θ̂ᵢ
E(θ̂) = θ
var(θ̂) = (1/n²) Σ_{i=1}^{n} var(θ̂ᵢ) = (1/n²) · n · var(θ̂ᵢ) = var(θ̂ᵢ)/n
Cramer-Rao Lower Bound
The variance of any unbiased estimator satisfies
var(θ̂) ≥ 1 / (−E[∂² ln p(x; θ)/∂θ²])
An unbiased estimator that attains the CRLB can be found iff:
∂ ln p(x; θ)/∂θ = I(θ)(g(x) − θ)
for some functions g(x) and I(θ). The estimator is θ̂ = g(x) and the
minimum variance is 1/I(θ).
Example: For a single observation x[0] = A + w[0] with w[0] ∼ N(0, σ²):
ln p(x[0]; A) = −ln √(2πσ²) − (1/(2σ²))(x[0] − A)²
Then
∂ ln p(x[0]; A)/∂A = (1/σ²)(x[0] − A)
∂² ln p(x[0]; A)/∂A² = −1/σ²
According to the theorem:
var(Â) ≥ σ²,    I(A) = 1/σ²    and    Â = g(x[0]) = x[0]
Example: DC level in WGN, x[n] = A + w[n], n = 0, 1, . . . , N − 1, with w[n] ∼ N(0, σ²):
p(x; A) = 1/(2πσ²)^{N/2} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²)
Then
∂ ln p(x; A)/∂A = ∂/∂A [−ln((2πσ²)^{N/2}) − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]
               = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = (N/σ²)((1/N) Σ_{n=0}^{N−1} x[n] − A)
According to the theorem:
var(Â) ≥ σ²/N,    I(A) = N/σ²    and    Â = g(x) = (1/N) Σ_{n=0}^{N−1} x[n]
Transformation of Parameters
If it is desired to estimate α = g(θ), then the CRLB is:
var(α̂) ≥ (∂g/∂θ)² / (−E[∂² ln p(x; θ)/∂θ²])
Definition
Efficient estimator: An unbiased estimator that attains the CRLB is said
to be efficient.
Example: Knowing that x̄ = (1/N) Σ_{n=0}^{N−1} x[n] is an efficient estimator for A,
is x̄² an efficient estimator for α = A²?
Transformation of Parameters
Solution: Knowing that x̄ ∼ N(A, σ²/N), we have:
E(x̄²) = E²(x̄) + var(x̄) = A² + σ²/N ≠ A²
so x̄² is a biased estimator of A². Using the moments of a Gaussian RV,
E(x̄⁴) = A⁴ + 6A²(σ²/N) + 3(σ²/N)²
Therefore:
var(x̄²) = E(x̄⁴) − E²(x̄²) = A⁴ + 6A²σ²/N + 3σ⁴/N² − (A² + σ²/N)² = 4A²σ²/N + 2σ⁴/N²
Transformation of Parameters
Remarks:
The estimator Â² = x̄² is biased and not efficient.
As N → ∞ the bias goes to zero and the variance of the estimator
approaches the CRLB. This type of estimator is called
asymptotically efficient.
General Remarks:
If α = g(θ) = aθ + b is an affine function of θ, then α̂ = g(θ̂) is an
efficient estimator of α, with var(g(θ̂)) = a² var(θ̂).
Example: For the DC level in WGN with θ = [A, σ²]ᵀ (both unknown), the
Fisher information matrix is
I(θ) = −E [ ∂² ln p(x; θ)/∂A²        ∂² ln p(x; θ)/∂A∂σ²
            ∂² ln p(x; θ)/∂σ²∂A     ∂² ln p(x; θ)/∂(σ²)² ]
     = [ N/σ²     0
          0      N/(2σ⁴) ]
The matrix is diagonal (just for this example) and can be easily inverted to
yield:
var(Â) ≥ σ²/N,    var(σ̂²) ≥ 2σ⁴/N
Transformation of Parameters
If it is desired to estimate α = g(θ), and the CRLB for the covariance of θ̂
is I⁻¹(θ), then:
C_α̂ − (∂g(θ)/∂θ) (−E[∂² ln p(x; θ)/∂θ∂θᵀ])⁻¹ (∂g(θ)/∂θ)ᵀ ≥ 0
Example: Estimate the SNR α = A²/σ² for the DC level in WGN with
θ = [A, σ²]ᵀ. The Jacobian is
∂g(θ)/∂θ = [2A/σ²   −A²/σ⁴]
So the CRLB is:
var(α̂) ≥ [2A/σ²  −A²/σ⁴] [ σ²/N    0
                            0     2σ⁴/N ] [2A/σ²  −A²/σ⁴]ᵀ
       = 4A²/(Nσ²) + 2A⁴/(Nσ⁴) = (4α + 2α²)/N
Linear Models
Consider the general linear model
x = Hθ + w
where
x : N × 1 observation vector
H : N × p observation matrix (known, rank p)
θ : p × 1 vector of parameters to be estimated
w : N × 1 noise vector with PDF N(0, σ²I)
Compute the CRLB and the MVU estimator that achieves this bound.
Step 1: Compute ∂ ln p(x; θ)/∂θ.
Step 2: Compute I(θ) = −E[∂² ln p(x; θ)/∂θ∂θᵀ] and the covariance
matrix of θ̂: C_θ̂ = I⁻¹(θ).
Step 3: Find the MVU estimator g(x) by factoring
∂ ln p(x; θ)/∂θ = I(θ)[g(x) − θ]
Step 1:
∂ ln p(x; θ)/∂θ = (1/σ²)[Hᵀx − HᵀHθ]
Step 2:
∂² ln p(x; θ)/∂θ∂θᵀ = −(1/σ²) HᵀH    ⟹    I(θ) = (1/σ²) HᵀH
Therefore:
C_θ̂ = I⁻¹(θ) = σ²(HᵀH)⁻¹
Step 3:
θ̂ = g(x) = (HᵀH)⁻¹Hᵀx
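A direct implementation of these formulas, as a sketch (assuming H has full column rank and the noise variance sigma2 is known):

import numpy as np

def mvu_linear_model(x, H, sigma2):
    """MVU estimate and its covariance for x = H theta + w, w ~ N(0, sigma2 * I)."""
    HtH_inv = np.linalg.inv(H.T @ H)
    theta_hat = HtH_inv @ H.T @ x        # theta_hat = (H^T H)^{-1} H^T x
    C_theta = sigma2 * HtH_inv           # C = sigma^2 (H^T H)^{-1}, attains the CRLB
    return theta_hat, C_theta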
Example (curve fitting): Fit a polynomial of degree p to the data,
s[n] = θ₀ + θ₁n + θ₂n² + · · · + θₚnᵖ. The observation matrix is
H = [ 1      0         0       · · ·    0
      1      1         1       · · ·    1
      1      2         4       · · ·    2ᵖ
      ⋮      ⋮         ⋮                ⋮
      1    N − 1   (N − 1)²    · · ·  (N − 1)ᵖ ]    (N × (p + 1))
Example (Fourier analysis):
x[n] = Σ_{k=1}^{M} aₖ cos(2πkn/N) + Σ_{k=1}^{M} bₖ sin(2πkn/N) + w[n]
The columns of H are
h_{ak} = [cos(2πk·0/N), cos(2πk·1/N), . . . , cos(2πk(N − 1)/N)]ᵀ
h_{bk} = [sin(2πk·0/N), sin(2πk·1/N), . . . , sin(2πk(N − 1)/N)]ᵀ
and H = [h_{a1} · · · h_{aM}  h_{b1} · · · h_{bM}].
Since HᵀH = (N/2)I, we have:
θ̂ = (2/N)[(h_{a1})ᵀx, . . . , (h_{aM})ᵀx, (h_{b1})ᵀx, . . . , (h_{bM})ᵀx]ᵀ
which is the same as the standard solution:
âₖ = (2/N) Σ_{n=0}^{N−1} x[n] cos(2πkn/N),    b̂ₖ = (2/N) Σ_{n=0}^{N−1} x[n] sin(2πkn/N)
with covariance C_θ̂ = (2σ²/N) I.
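A sketch of the resulting estimator (the function name and arguments are chosen here only for illustration):

import numpy as np

def fourier_coefficients(x, M):
    """Estimate a_k and b_k, k = 1..M, from x[n], n = 0..N-1."""
    N = len(x)
    n = np.arange(N)
    a = np.array([2.0 / N * np.sum(x * np.cos(2 * np.pi * k * n / N))
                  for k in range(1, M + 1)])
    b = np.array([2.0 / N * np.sum(x * np.sin(2 * np.pi * k * n / N))
                  for k in range(1, M + 1)])
    return a, b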
Example (system identification): The output of an FIR system driven by a
known input u[n] is observed in noise:
x[n] = Σ_{k=0}^{p−1} h[k]u[n − k] + w[n],    n = 0, 1, . . . , N − 1
In matrix form x = Hθ + w with
θ = [h[0], h[1], . . . , h[p − 1]]ᵀ    (p × 1)
H = [ u[0]        0         · · ·      0
      u[1]       u[0]       · · ·      0
       ⋮           ⋮          ⋱         ⋮
      u[N − 1]  u[N − 2]    · · ·   u[N − p] ]    (N × p)
The MVU estimate is θ̂ = (HᵀH)⁻¹Hᵀx with C_θ̂ = σ²(HᵀH)⁻¹.
Example (linear model with a known signal component): Consider a DC
level observed together with a known exponential term rⁿ:
x[n] = A + rⁿ + w[n],    n = 0, 1, . . . , N − 1
or in matrix form
x = 1A + s + w,    s = [1, r, . . . , r^{N−1}]ᵀ (known)
The MVU estimator is obtained after subtracting the known signal:
Â = (1/N) Σ_{n=0}^{N−1} (x[n] − rⁿ)    with    var(Â) = σ²/N
Best Linear Unbiased Estimator (BLUE)
Restrict the estimator to be linear in the data, θ̂ = Ax, and require it to
be unbiased:
E(θ̂) = AE(x) = θ
Finding the BLUE (scalar case):
1. Restrict the estimate to be linear in the data:
θ̂ = Σ_{n=0}^{N−1} aₙx[n] = aᵀx,    where a = [a₀, a₁, . . . , a_{N−1}]ᵀ
2. Restrict the estimate to be unbiased:
E(θ̂) = Σ_{n=0}^{N−1} aₙE(x[n]) = θ
3. Minimize the variance var(θ̂) = aᵀCa.
For the model with E(x[n]) = θs[n] (noise with covariance C):
θ̂ = Σ_{n=0}^{N−1} aₙx[n] = aᵀx
Restrict the estimate to be unbiased: E(θ̂) = aᵀE(x) = aᵀsθ = θ,
so aᵀs = 1 where s = [s[0], s[1], . . . , s[N − 1]]ᵀ.
Minimize aᵀCa subject to aᵀs = 1, using the Lagrangian
J = aᵀCa + λ(aᵀs − 1)
The solution is
a = C⁻¹s/(sᵀC⁻¹s)    ⟹    θ̂ = sᵀC⁻¹x/(sᵀC⁻¹s)    and    var(θ̂) = 1/(sᵀC⁻¹s)
Example (DC level in uncorrelated noise of unequal variances):
x[n] = A + w[n], where w[n] has zero mean and variance σₙ², uncorrelated
from sample to sample. Then H = 1 = [1, 1, . . . , 1]ᵀ and
C = diag(σ₀², σ₁², . . . , σ²_{N−1}),    C⁻¹ = diag(1/σ₀², 1/σ₁², . . . , 1/σ²_{N−1})
The vector BLUE formulas
θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x,    C_θ̂ = (HᵀC⁻¹H)⁻¹
give
Â = (Σ_{n=0}^{N−1} x[n]/σₙ²) / (Σ_{n=0}^{N−1} 1/σₙ²)    with    var(Â) = (Σ_{n=0}^{N−1} 1/σₙ²)⁻¹
i.e., a weighted average that emphasizes the more reliable samples.
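In code, this weighted average is a one-liner; the following sketch assumes the per-sample noise variances are known:

import numpy as np

def blue_dc_level(x, noise_var):
    """BLUE of A from x[n] = A + w[n], w[n] uncorrelated with variance noise_var[n]."""
    w = 1.0 / np.asarray(noise_var)            # weights 1 / sigma_n^2
    A_hat = np.sum(w * x) / np.sum(w)          # weighted average of the samples
    var_A = 1.0 / np.sum(w)                    # minimum variance of the BLUE
    return A_hat, var_A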
Example: Consider the DC level in noise where the noise variance equals
the unknown level, i.e., w[n] ∼ N(0, A):
p(x; A) = 1/(2πA)^{N/2} exp(−(1/(2A)) Σ_{n=0}^{N−1} (x[n] − A)²)
Stochastic Convergence
Convergence in distribution: Let {p(x_N)} be a sequence of PDFs. If
there exists a PDF p(x) such that
lim_{N→∞} p(x_N) = p(x)
then x_N converges in distribution to x.
Example
Consider N independent RVs x₁, x₂, . . . , x_N with mean μ and finite variance
σ². Let x̄_N = (1/N) Σ_{n=1}^{N} xₙ. Then according to the Central Limit Theorem
(CLT), z_N = √N (x̄_N − μ)/σ converges in distribution to z ∼ N(0, 1).
Stochastic Convergence
Convergence in probability: A sequence of random variables {x_N}
converges in probability to the random variable x if for every ε > 0
lim_{N→∞} Pr{|x_N − x| > ε} = 0
Example
Consider N independent RVs x₁, x₂, . . . , x_N with mean μ and finite variance
σ². Let x̄_N = (1/N) Σ_{n=1}^{N} xₙ. Then according to the Law of Large Numbers
(LLN), x̄_N converges in probability to μ.
Consistency
Estimator θ̂_N is a consistent estimator of θ if for every ε > 0
plim(θ̂_N) = θ,  i.e.,  lim_{N→∞} Pr{|θ̂_N − θ| > ε} = 0
Remarks:
If θ̂_N is asymptotically unbiased and its limiting variance is zero, then
it converges to θ in mean-square.
If θ̂_N converges to θ in mean-square, then the estimator is consistent.
Asymptotic unbiasedness does not imply consistency and vice versa.
plim can be treated as an operator, e.g.:
plim(xy) = plim(x) plim(y),    plim(x/y) = plim(x)/plim(y)
Example (MLE of the DC level in WGN):
p(x; A) = 1/(2πσ²)^{N/2} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²)
Setting the derivative of the log-likelihood to zero:
∂ ln p(x; A)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = 0
Σ_{n=0}^{N−1} x[n] − NA = 0
which leads to Â = x̄ = (1/N) Σ_{n=0}^{N−1} x[n].
Invariance property: The MLE of α = g(θ) is α̂ = g(θ̂),
where θ̂ is the MLE of θ.
It can be proved using the property of the consistent estimators.
If α = g(θ) is a one-to-one function, maximizing over θ and over α are
equivalent; otherwise the MLE of α maximizes the modified likelihood
p̄(x; α) = max_{θ: α=g(θ)} p(x; θ)
Example
Consider the DC level in WGN and find the MLE of α = A². Since g(θ) is not a
one-to-one function,
α̂ = (arg max_{−∞<A<∞} p(x; A))² = Â² = x̄²
Example (MLE of A and σ²): Setting both partial derivatives of the
log-likelihood to zero,
∂ ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = 0
∂ ln p(x; θ)/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} (x[n] − A)² = 0
which gives
Â = x̄,    σ̂² = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²
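These closed-form MLEs are straightforward to compute; a minimal sketch:

import numpy as np

def mle_dc_level(x):
    """MLE of A and sigma^2 for x[n] = A + w[n], w[n] ~ N(0, sigma^2)."""
    A_hat = x.mean()                           # A_hat = sample mean
    sigma2_hat = np.mean((x - A_hat) ** 2)     # sigma2_hat = (1/N) sum (x[n] - xbar)^2
    return A_hat, sigma2_hat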
For the general linear model x = Hθ + w with w ∼ N(0, C), setting the
gradient of the log-likelihood to zero gives
HᵀC⁻¹(x − Hθ) = 0    ⟹    θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x
Newton-Raphson iteration for the MLE:
θ_{k+1} = θ_k − [∂² ln p(x; θ)/∂θ∂θᵀ]⁻¹ (∂ ln p(x; θ)/∂θ) |_{θ=θ_k}
Remarks:
The Hessian can be replaced by the negative of its expectation, the
Fisher information matrix I(θ).
This method suffers from convergence problems (local maxima).
Typically, for large data lengths, the log-likelihood function becomes
more nearly quadratic and the algorithm will produce the MLE.
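A generic sketch of this iteration (the gradient and Hessian of the log-likelihood are user-supplied functions here, since they depend on the particular model):

import numpy as np

def newton_raphson_mle(theta0, grad, hess, n_iter=50, tol=1e-8):
    """Newton-Raphson maximization of a log-likelihood ln p(x; theta).

    grad(theta): gradient vector of the log-likelihood.
    hess(theta): Hessian matrix of the log-likelihood (supplying minus the
                 Fisher information instead gives the scoring variant).
    """
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(n_iter):
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step                   # theta_{k+1} = theta_k - H^{-1} g
        if np.linalg.norm(step) < tol:
            break
    return theta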
Least Squares
The least squares estimator (LSE) minimizes the LS criterion
J(θ) = Σ_{n=0}^{N−1} (x[n] − s[n; θ])²
where s[n; θ] is the assumed (noiseless) signal model; no probabilistic
assumptions about the data are needed.
Example
Estimate the DC level of a signal. We observe x[n] = A + ε[n] for
n = 0, . . . , N − 1 and the LS criterion is:
J(A) = Σ_{n=0}^{N−1} (x[n] − A)²
∂J(A)/∂A = −2 Σ_{n=0}^{N−1} (x[n] − A) = 0    ⟹    Â = (1/N) Σ_{n=0}^{N−1} x[n]
Linear LS: For s = Hθ,
J(θ) = Σ_{n=0}^{N−1} (x[n] − s[n])² = (x − Hθ)ᵀ(x − Hθ) = xᵀx − 2xᵀHθ + θᵀHᵀHθ
where H is full rank. The gradient is
∂J(θ)/∂θ = −2Hᵀx + 2HᵀHθ = 0    ⟹    θ̂ = (HᵀH)⁻¹Hᵀx
For the linear model x = Hθ + w, the estimators studied so far all give the same estimate:

Estimator | Assumption                                        | Estimate
MVUE      | w ∼ N(0, σ²I)                                     | θ̂ = (HᵀH)⁻¹Hᵀx
MLE       | w ∼ N(0, σ²I)                                     | θ̂ = (HᵀH)⁻¹Hᵀx
BLUE      | w white (uncorrelated, equal variances), any PDF  | θ̂ = (HᵀH)⁻¹Hᵀx
LSE       | No probabilistic assumption                       | θ̂ = (HᵀH)⁻¹Hᵀx
Geometrical Interpretation
Recall the general signal model s = Hθ. If we denote the columns of H by
hᵢ, we have:
s = [h₁ h₂ · · · hₚ] [θ₁, θ₂, . . . , θₚ]ᵀ = Σ_{i=1}^{p} θᵢhᵢ
The LS error x − Hθ̂ is orthogonal to the columns of H:
Hᵀ(x − Hθ̂) = 0    ⟹    θ̂ = (HᵀH)⁻¹Hᵀx
If the columns of H are orthonormal, then
HᵀH = [ ⟨h₁, h₁⟩  · · ·  ⟨h₁, hₚ⟩
           ⋮        ⋱       ⋮
        ⟨hₚ, h₁⟩  · · ·  ⟨hₚ, hₚ⟩ ] = I
and the signal estimate is simply
ŝ = Hθ̂ = Σ_{i=1}^{p} θ̂ᵢhᵢ = Σ_{i=1}^{p} (hᵢᵀx)hᵢ
Example
Consider the LS estimate of the DC level based on data up to time N − 1:
Â[N − 1] = (1/N) Σ_{n=0}^{N−1} x[n]. We have:
Â[N] = (1/(N + 1)) Σ_{n=0}^{N} x[n]
     = (1/(N + 1)) (Σ_{n=0}^{N−1} x[n] + x[N])
     = (N/(N + 1)) Â[N − 1] + (1/(N + 1)) x[N]
     = Â[N − 1] + (1/(N + 1)) (x[N] − Â[N − 1])
new estimate = old estimate + gain × prediction error
For noise with time-varying variance σₙ², the weighted LS estimate is
Â[N − 1] = (Σ_{n=0}^{N−1} x[n]/σₙ²) / (Σ_{n=0}^{N−1} 1/σₙ²)
and the sequential update becomes
Â[N] = Â[N − 1] + (1/σ²_N)/(Σ_{n=0}^{N} 1/σₙ²) (x[N] − Â[N − 1])
     = Â[N − 1] + K[N](x[N] − Â[N − 1])
The gain factor K[N] can be reformulated as:
K[N] = var(Â[N − 1]) / (var(Â[N − 1]) + σ²_N)
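One update step of this recursion, written as a small helper (a sketch; the argument names are illustrative):

def sequential_ls_update(A_prev, var_prev, x_new, var_new_noise):
    """One step of the sequential LS estimate of a DC level.

    A_prev, var_prev     : previous estimate A[N-1] and its variance
    x_new, var_new_noise : new sample x[N] and the variance of its noise
    """
    K = var_prev / (var_prev + var_new_noise)      # gain K[N]
    A_new = A_prev + K * (x_new - A_prev)          # new = old + gain * prediction error
    var_new = (1.0 - K) * var_prev                 # updated variance of the estimate
    return A_new, var_new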
In the general (vector) case the sequential LS updates are
Gain:
K[n] = Σ[n − 1]h[n] / (σₙ² + hᵀ[n]Σ[n − 1]h[n])
Covariance Update:
Σ[n] = (I − K[n]hᵀ[n])Σ[n − 1]
The matrix inversion lemma is used to compute the updates.
Constrained LS (with a Lagrange multiplier λ):
θ̂_c = (HᵀH)⁻¹Hᵀx − (HᵀH)⁻¹Aᵀ(λ/2) = θ̂ − (HᵀH)⁻¹Aᵀ(λ/2)
Nonlinear LS (Newton-Raphson): iterate
θ_{k+1} = θ_k − [∂g(θ)/∂θ]⁻¹ g(θ) |_{θ=θ_k}
where g(θ) = (∂sᵀ(θ)/∂θ)[x − s(θ)] and
[∂g(θ)/∂θ]ᵢⱼ = Σ_{n=0}^{N−1} { (x[n] − s[n]) ∂²s[n]/∂θᵢ∂θⱼ − (∂s[n]/∂θⱼ)(∂s[n]/∂θᵢ) }
Around the solution, [x − s(θ)] is small, so the first term in the Jacobian
can be neglected. This makes the method equivalent to the Gauss-Newton
algorithm, which is numerically more robust.
Separable LS: the nonlinear parameters α are obtained from
α̂ = arg max_α xᵀH(α)[Hᵀ(α)H(α)]⁻¹Hᵀ(α)x
and the linear parameters then follow from the usual LS solution.
Example: Consider a sum of exponentials observed in noise,
x[n] = θ₁rⁿ + θ₂r²ⁿ + θ₃r³ⁿ + w[n],    n = 0, 1, . . . , N − 1
Let's take
H(r) = [ 1           1             1
         r           r²            r³
         ⋮           ⋮             ⋮
         r^{N−1}    r^{2(N−1)}    r^{3(N−1)} ]
Step 1: Maximize xᵀH(r)[Hᵀ(r)H(r)]⁻¹Hᵀ(r)x to obtain r̂.
Step 2: Compute θ̂ = [Hᵀ(r̂)H(r̂)]⁻¹Hᵀ(r̂)x.
Introduction
Classical Approach:
Assumes θ is unknown but deterministic.
Prior knowledge about θ cannot be used.
Variance of the estimate may depend on θ.
An MVUE may not exist.
In Monte-Carlo simulations, we do M runs for each fixed θ and then
compute the sample mean and variance for each θ (no averaging over θ).
Bayesian Approach:
Assumes θ is random with a known prior PDF p(θ).
We estimate a realization of θ based on the available data.
Variance of the estimate does not depend on θ.
A Bayesian estimate always exists.
In Monte-Carlo simulations, we do M runs for randomly chosen θ and
then compute the sample mean and variance over all θ values.
Bayesian MSE:
Bmse(θ̂) = E{[θ − θ̂(x)]²} = ∫∫ [θ − θ̂(x)]² p(x, θ) dx dθ
The Bayesian MSE is not a function of θ and can be minimized to find an
estimator. Note that the expectation E{·} is with respect to the joint PDF of x and θ.
Bmse(Â) = ∫∫ [A − Â]² p(x, A) dA dx = ∫ [∫ [A − Â]² p(A|x) dA] p(x) dx
where we used Bayes' theorem: p(x, A) = p(A|x)p(x).
Since p(x) ≥ 0 for all x, we need only minimize the integral in brackets.
We set its derivative with respect to Â equal to zero:
∂/∂Â ∫ [A − Â]² p(A|x) dA = −2 ∫ A p(A|x) dA + 2Â ∫ p(A|x) dA = 0
which results in
Â = ∫ A p(A|x) dA = E(A|x)
The posterior PDF is computed from Bayes' rule:
p(A|x) = p(x|A)p(A)/p(x) = p(x|A)p(A) / ∫ p(x|A)p(A) dA
p(x) is the marginal PDF defined as p(x) = ∫ p(x, A) dA.
The integral in the denominator acts as a normalization of
p(x|A)p(A) so that the integral of p(A|x) equals 1.
p(x|A) has exactly the same form as p(x; A), which is used in
classical estimation.
The MMSE estimator always exists and can be computed by:
Â = ∫ A p(x|A)p(A) dA / ∫ p(x|A)p(A) dA
Example: DC level in WGN with the uniform prior A ∼ U[−A₀, A₀]:
Â = [∫_{−A₀}^{A₀} A (1/(2πσ²)^{N/2}) exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
    / [∫_{−A₀}^{A₀} (1/(2πσ²)^{N/2}) exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
(the prior factor 1/(2A₀) cancels between numerator and denominator).
Before collecting the data, the mean of the prior PDF p(A) is the best
estimate, while after collecting the data the best estimate is the mean
of the posterior PDF p(A|x).
The choice of p(A) is crucial for the quality of the estimation.
Only a Gaussian prior PDF leads to a closed-form estimator.
Conclusion: For an accurate estimator choose a prior PDF that can be
physically justified. For a closed-form estimator choose a Gaussian prior
PDF.
Jointly Gaussian (bivariate) PDF:
p(x, y) = 1/(2π√det(C)) exp(−½ [x − E(x), y − E(y)] C⁻¹ [x − E(x), y − E(y)]ᵀ)
with covariance matrix:
C = [ var(x)      cov(x, y)
      cov(x, y)   var(y)   ]
The posterior variance is
var(A|x[0]) = σ²_{A|x} = σ_A² − cov²(A, x[0])/var(x[0]) = σ_A² (1 − σ_A²/(σ_A² + σ²))
For jointly Gaussian vectors x (k × 1) and y (l × 1):
p(x, y) = 1/((2π)^{(k+l)/2} √det(C)) exp(−½ [x − E(x); y − E(y)]ᵀ C⁻¹ [x − E(x); y − E(y)])
with covariance matrix:
C = [ Cxx  Cxy
      Cyx  Cyy ]
the conditional PDF p(y|x) is also Gaussian, with
E(y|x) = E(y) + Cyx Cxx⁻¹ (x − E(x))
Cy|x = Cyy − Cyx Cxx⁻¹ Cxy
For the Bayesian linear model x = Hθ + w, the cross-covariance is
Cθx = E{(θ − E(θ))[H(θ − E(θ)) + w]ᵀ} = E{(θ − E(θ))(θ − E(θ))ᵀ}Hᵀ = Cθ Hᵀ
Example: DC level in WGN with a Gaussian prior:
x[n] = A + w[n],  n = 0, . . . , N − 1,    A ∼ N(μ_A, σ_A²),    w[n] ∼ N(0, σ²)
Applying the jointly Gaussian conditional-mean formula (using the matrix
identities for 11ᵀ) gives
Â = E(A|x) = μ_A + σ_A²/(σ_A² + σ²/N) (x̄ − μ_A)
var(A|x) = σ_A²(σ²/N) / (σ_A² + σ²/N)
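A minimal sketch of this estimator (prior mean, prior variance and noise variance assumed known):

import numpy as np

def mmse_dc_gaussian_prior(x, mu_A, var_A, sigma2):
    """Posterior mean and variance of A for x[n] = A + w[n], A ~ N(mu_A, var_A)."""
    N = len(x)
    alpha = var_A / (var_A + sigma2 / N)                   # shrinkage factor
    A_hat = mu_A + alpha * (x.mean() - mu_A)               # E(A | x)
    post_var = var_A * (sigma2 / N) / (var_A + sigma2 / N) # var(A | x)
    return A_hat, post_var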
Nuisance Parameters
Definition (Nuisance Parameter)
Suppose that θ and α are unknown parameters but we are only interested
in estimating θ. In this case α is called a nuisance parameter.
In the classical approach we have to estimate both, but in the
Bayesian approach we can integrate it out.
Note that in the Bayesian approach we can find p(θ|x) from p(θ, α|x)
as a marginal PDF:
p(θ|x) = ∫ p(θ, α|x) dα
We can also express it as
p(θ|x) = p(x|θ)p(θ) / ∫ p(x|θ)p(θ) dθ
where
p(x|θ) = ∫ p(x|θ, α)p(α|θ) dα
General Bayesian estimators minimize the Bayes risk R(θ̂) = E{C(ε)} for the
error ε = θ − θ̂ and a cost function C(ε):
1. Quadratic: R(θ̂) = E{ε²} = E{(θ − θ̂)²}
2. Absolute: R(θ̂) = E{|ε|} = E{|θ − θ̂|}
3. Hit-or-Miss: R(θ̂) = E{C(ε)} where C(ε) = 0 if |ε| < δ, and 1 if |ε| ≥ δ
The Bayes risk
R(θ̂) = ∫∫ C(θ − θ̂) p(x, θ) dθ dx = ∫ g(θ̂) p(x) dx
where
g(θ̂) = ∫ C(θ − θ̂) p(θ|x) dθ
should be minimized.
Quadratic cost:
For this case R(θ̂) = E{(θ − θ̂)²} = Bmse(θ̂), which leads to the MMSE
estimator with
θ̂ = E(θ|x)
so θ̂ is the mean of the posterior PDF p(θ|x).
Absolute cost:
g(θ̂) = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|x) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|x) dθ
Setting the derivative with respect to θ̂ to zero gives
∫_{−∞}^{θ̂} p(θ|x) dθ = ∫_{θ̂}^{∞} p(θ|x) dθ = 1/2
so θ̂ is the median of the posterior PDF p(θ|x).
For the vector case, each component minimizes its own Bayesian MSE:
θ̂₁ = ∫ θ₁ p(θ₁|x) dθ₁ = ∫ θ₁ [∫···∫ p(θ|x) dθ₂ ··· dθ_p] dθ₁
In vector form we have:
θ̂ = [∫ θ₁ p(θ|x) dθ, ∫ θ₂ p(θ|x) dθ, . . . , ∫ θ_p p(θ|x) dθ]ᵀ = ∫ θ p(θ|x) dθ = E(θ|x)
Similarly
Bmse(θ̂ᵢ) = ∫ [C_{θ|x}]ᵢᵢ p(x) dx
The MAP estimator maximizes the posterior PDF:
θ̂ = arg max_θ p(θ|x)
where
p(θ|x) = p(x|θ)p(θ)/p(x)
An equivalent maximization is:
θ̂ = arg max_θ p(x|θ)p(θ)    or    θ̂ = arg max_θ [ln p(x|θ) + ln p(θ)]
Example: DC level in WGN with the uniform prior A ∼ U[−A₀, A₀].
The MMSE estimator is the ratio of integrals
Â = [∫_{−A₀}^{A₀} A exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
    / [∫_{−A₀}^{A₀} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
which has no closed form. The MAP estimator is given as:
Â = −A₀   if x̄ < −A₀
Â = x̄     if −A₀ ≤ x̄ ≤ A₀
Â = A₀    if x̄ > A₀
Remark: The main advantage of the MAP estimator is that for non jointly
Gaussian PDFs it may lead to explicit solutions or less computational
effort.
For the vector case, the element-wise MAP estimate of θ₁ maximizes the
marginal posterior p(θ₁|x) = ∫···∫ p(θ|x) dθ₂ . . . dθ_p.
Performance Description
In classical estimation, the mean and the variance of the estimate
(or its PDF) indicate the performance of the estimator.
In the Bayesian approach the PDF of the estimate is different for each
realization of θ. So a good estimator should perform well for every
possible value of θ.
The estimation error ε = θ − θ̂ is a function of two random variables
(θ and x).
The mean and the variance of the estimation error, with respect to both
random variables, indicate the performance of the estimator.
The mean value of the estimation error is zero, so the estimates are
unbiased (in the Bayesian sense):
E_{x,θ}(θ − θ̂) = E_x[E_{θ|x}(θ) − E_{θ|x}(θ̂|x)] = E_x(0) = 0
The variance of the estimation error is the Bmse:
E_{x,θ}[(θ − θ̂)²] = Bmse(θ̂)
Example (Bayesian deconvolution):
x[n] = Σ_{m=0}^{n_s−1} h[n − m]s[m] + w[n],    n = 0, 1, . . . , N − 1
where s[n] is a WSS Gaussian process with known ACF, and w[n] is WGN
with variance σ². In matrix form x = Hθ + w with θ = s = [s[0], . . . , s[n_s − 1]]ᵀ and
H = [ h[0]        0         · · ·     0
      h[1]       h[0]       · · ·     0
       ⋮           ⋮          ⋱        ⋮
      h[N − 1]  h[N − 2]    · · ·  h[N − n_s] ]
The MMSE estimate is
ŝ = C_s Hᵀ(HC_sHᵀ + σ²I)⁻¹ x
For a single observation (N = 1) the estimate reduces to
ŝ[0] = rss[0]/(rss[0] + σ²) x[0]
For a first-order AR signal model, the ACF is rss[k] = σ_u² a^{|k|}/(1 − a²).
Introduction
Problems with general Bayesian estimators: they may be
difficult to determine in closed form,
need intensive computations,
involve multidimensional integration (MMSE) or multidimensional
maximization (MAP),
and have closed forms only under the jointly Gaussian assumption.
What can we do if the joint PDF is not Gaussian or unknown?
Keep the MMSE criterion;
Restrict the estimator to be linear.
This leads to the Linear MMSE Estimator:
It is a suboptimal estimator that can be easily implemented.
It needs only the first and second moments of the joint PDF.
It is analogous to the BLUE in classical estimation.
In practice, this estimator is termed the Wiener filter.
The LMMSE estimator has the form
θ̂ = Σ_{n=0}^{N−1} aₙx[n] + a_N = aᵀx + a_N
Minimizing the Bmse with respect to a_N:
∂/∂a_N E[(θ − aᵀx − a_N)²] = −2E(θ − aᵀx − a_N)
Setting this equal to zero gives: a_N = E(θ) − aᵀE(x).
Minimizing the Bmse with respect to a:
∂Bmse(θ̂)/∂a = 2Cxx a − 2Cxθ = 0
which results in a = Cxx⁻¹Cxθ and leads to (note that Cθx = Cxθᵀ):
θ̂ = aᵀx + a_N = Cxθᵀ Cxx⁻¹ x + E(θ) − Cxθᵀ Cxx⁻¹ E(x)
  = E(θ) + Cθx Cxx⁻¹ (x − E(x))
with minimum Bmse
Bmse(θ̂) = Cθθ − Cθx Cxx⁻¹ Cxθ
Example: DC level with the uniform prior A ∼ U[−A₀, A₀]. Here E(A) = 0 and
var(A) = E(A²) = ∫_{−A₀}^{A₀} A²/(2A₀) dA = A³/(6A₀) |_{−A₀}^{A₀} = A₀²/3
so the LMMSE estimator is
Â = (A₀²/3)/(A₀²/3 + σ²/N) x̄
Summary for the DC level in WGN with the uniform prior A ∼ U[−A₀, A₀]:
MMSE: Â = [∫_{−A₀}^{A₀} A exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
          / [∫_{−A₀}^{A₀} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA]
MAP:  Â = −A₀ if x̄ < −A₀;   x̄ if −A₀ ≤ x̄ ≤ A₀;   A₀ if x̄ > A₀
LMMSE: Â = (A₀²/3)/(A₀²/3 + σ²/N) x̄
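A small numerical sketch evaluating the three estimators for this example (the MMSE integral is approximated on a grid over the prior support; the grid size is an assumption of the illustration):

import numpy as np

def compare_estimators(x, A0, sigma2):
    """MMSE, MAP and LMMSE estimates of A for the U[-A0, A0] prior."""
    N, xbar = len(x), x.mean()
    # MMSE: posterior mean, approximated numerically on the prior support
    A = np.linspace(-A0, A0, 2001)
    w = np.exp(-N * (A - xbar) ** 2 / (2 * sigma2))   # likelihood (up to a constant)
    A_mmse = np.sum(A * w) / np.sum(w)
    # MAP: the sample mean clipped to the prior support
    A_map = np.clip(xbar, -A0, A0)
    # LMMSE: the sample mean shrunk using var(A) = A0^2 / 3
    A_lmmse = (A0 ** 2 / 3) / (A0 ** 2 / 3 + sigma2 / N) * xbar
    return A_mmse, A_map, A_lmmse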
Geometrical Interpretation
Vector space of random variables: The set of scalar zero mean random
variables is a vector space.
The zero-length vector of the set is a RV with zero variance.
For any real scalar a, ax is another zero mean RV in the set.
For two RVs x and y, x + y = y + x is a RV in the set.
For two RVs x and y, the inner product is: ⟨x, y⟩ = E(xy).
The projection of y onto x is
ŷ = (⟨y, x⟩/‖x‖²) x = (E(yx)/E(x²)) x
Geometrical Interpretation
The LMMSE estimator can be determined using the vector space
viewpoint. We have:
θ̂ = Σ_{n=0}^{N−1} aₙx[n]
and the orthogonality principle requires the error to be orthogonal to the data:
E[(θ − θ̂)x[n]] = 0    for n = 0, 1, . . . , N − 1
Geometrical Interpretation
The LMMSE estimator can be determined by solving the following
equations:
E[(θ − Σ_{m=0}^{N−1} aₘx[m]) x[n]] = 0,    n = 0, 1, . . . , N − 1
or
Σ_{m=0}^{N−1} aₘ E(x[m]x[n]) = E(θx[n]),    n = 0, 1, . . . , N − 1
In matrix form:
[ E(x²[0])          E(x[0]x[1])       · · ·  E(x[0]x[N − 1])
  E(x[1]x[0])       E(x²[1])          · · ·  E(x[1]x[N − 1])
   ⋮                  ⋮                 ⋱      ⋮
  E(x[N − 1]x[0])   E(x[N − 1]x[1])   · · ·  E(x²[N − 1]) ] [a₀, a₁, . . . , a_{N−1}]ᵀ
= [E(θx[0]), E(θx[1]), . . . , E(θx[N − 1])]ᵀ
Therefore
Cxx a = Cθxᵀ    ⟹    a = Cxx⁻¹ Cθxᵀ
For a vector parameter θ (p × 1), minimize Bmse(θ̂ᵢ) = E(θᵢ − θ̂ᵢ)² for each i.
Therefore:
θ̂ᵢ = E(θᵢ) + C_{θᵢx} Cxx⁻¹ (x − E(x)),    i = 1, 2, . . . , p
          (1×N)  (N×N)  (N×1)
and similarly
Bmse(θ̂ᵢ) = [M_θ̂]ᵢᵢ
where
M_θ̂ = Cθθ − Cθx Cxx⁻¹ Cxθ
(p×p)  (p×p)  (p×N) (N×N) (N×p)
Example: sequential LMMSE for x[n] = A + w[n] with a zero-mean Gaussian
prior of variance σ_A²:
Â[N − 1] = σ_A²/(σ_A² + σ²/N) x̄
Estimator Update:
Â[N] = Â[N − 1] + K[N](x[N] − Â[N − 1])
where
K[N] = Bmse(Â[N − 1]) / (Bmse(Â[N − 1]) + σ²)
Bmse(Â[N]) = (1 − K[N]) Bmse(Â[N − 1])
Derivation via innovations: the prediction of x[1] from x[0] is
x̂[1|0] = (E(x[0]x[1])/E(x²[0])) x[0] = (σ_A²/(σ_A² + σ²)) x[0]
Add to Â[0] the LMMSE estimator of A based on the innovation x̃[1] = x[1] − x̂[1|0]:
Â[1] = Â[0] + (E(A x̃[1])/E(x̃²[1])) x̃[1]
     = Â[0] + K[1](x[1] − x̂[1|0])
(the second term is the projection of A on x̃[1]).
In general, with the innovations x̃[n] = x[n] − x̂[n|0, 1, . . . , n − 1]:
Â[N] = Σ_{n=0}^{N} K[n] x̃[n] = Â[N − 1] + K[N] x̃[N]
where
K[n] = E(A x̃[n]) / E(x̃²[n])
and
K[N] = Bmse(Â[N − 1]) / (σ² + Bmse(Â[N − 1]))
Bmse(Â[N]) = (1 − K[N]) Bmse(Â[N − 1])
For the vector case the sequential LMMSE updates are
Gain:
K[n] = M[n − 1]h[n] / (σₙ² + hᵀ[n]M[n − 1]h[n])
Minimum MSE Matrix Update:
M[n] = (I − K[n]hᵀ[n])M[n − 1]
Initialization:
θ̂[−1] = E(θ),    M[−1] = Cθ
Wiener Filtering
Consider the signal model: x[n] = s[n] + w[n],    n = 0, 1, . . . , N − 1
where the signal and noise are zero mean with known covariance matrices.
There are three main problems concerning Wiener filters:
Filtering: Estimate θ = s[n] (scalar) based on the data set
x = [x[0], x[1], . . . , x[n]]ᵀ. The signal sample is estimated
based on the present and past data only.
Smoothing: Estimate θ = s = [s[0], s[1], . . . , s[N − 1]]ᵀ (vector) based on
the data set x = [x[0], x[1], . . . , x[N − 1]]ᵀ. The signal sample
is estimated based on the present, past and future data.
Prediction: Estimate θ = x[N − 1 + ℓ] for a positive integer ℓ based on the
data set x = [x[0], x[1], . . . , x[N − 1]]ᵀ.
Remark: All these problems are solved using the LMMSE estimator
θ̂ = Cθx Cxx⁻¹ x    with minimum Bmse    M_θ̂ = Cθθ − Cθx Cxx⁻¹ Cxθ
Smoothing: with θ = s and Cxx = Rss + Rww,
ŝ = Csx Cxx⁻¹ x = Rss(Rss + Rww)⁻¹ x = Wx
Filtering: the estimate of s[n] from x[0], . . . , x[n] is
ŝ[n] = aᵀx = Σ_{k=0}^{n} h^{(n)}[n − k] x[k]
This represents a time-varying, causal FIR filter. The impulse response can
be computed from
(Rss + Rww) a = rss
Prediction: the ℓ-step prediction is
x̂[N − 1 + ℓ] = (r′xx)ᵀ Rxx⁻¹ x = aᵀx = Σ_{k=0}^{N−1} h[N − k] x[k]
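A sketch of the smoothing case, assuming the signal and noise covariance matrices Rss and Rww are known:

import numpy as np

def wiener_smoother(x, Rss, Rww):
    """Wiener smoothing s_hat = Rss (Rss + Rww)^{-1} x for zero-mean s and w."""
    W = Rss @ np.linalg.inv(Rss + Rww)        # N x N Wiener matrix
    return W @ x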
Kalman Filters
Introduction
The Kalman filter is a generalization of the Wiener filter.
In the Wiener filter we estimate s[n] based on the noisy observation
vector x. We assume that s[n] is a stationary random process with
known mean and covariance.
In the Kalman filter we assume that s[n] is a nonstationary random
process whose mean and covariance matrix vary according to a known
dynamical model.
The Kalman filter is a sequential MMSE estimator of s[n]. If the signal
and noise are jointly Gaussian, then the Kalman filter is the optimal
MMSE estimator; if not, it is the optimal LMMSE estimator.
The Kalman filter has many applications in control theory for state
estimation.
It can be generalized to vector signals and noise (in contrast to the
Wiener filter).
Scalar Gauss-Markov model: s[n] = a s[n − 1] + u[n],    n ≥ 0
Iterating from the initial condition s[−1]:
s[n] = a^{n+1} s[−1] + Σ_{k=0}^{n} aᵏ u[n − k]
E(s[n]) = a^{n+1} μ_s
and
C_s[m, n] = E{(s[m] − E(s[m]))(s[n] − E(s[n]))}
          = a^{m+n+2} σ_s² + σ_u² a^{m−n} Σ_{k=0}^{n} a^{2k}    for m ≥ n ≥ 0
For the vector Gauss-Markov model s[n] = As[n − 1] + Bu[n], n ≥ 0:
C_s[m, n] = A^{m+1} C_s (A^{n+1})ᵀ + Σ_{k=m−n}^{m} Aᵏ BQBᵀ (A^{n−m+k})ᵀ    for m ≥ n ≥ 0
Assumptions:
u[n] is zero mean Gaussian noise with independent samples and
E(u²[n]) = σ_u².
w[n] is zero mean Gaussian noise with independent samples and
E(w²[n]) = σₙ² (time-varying variance).
s[−1] ∼ N(μ_s, σ_s²) (for simplicity we suppose that μ_s = 0).
s[−1], u[n] and w[n] are independent.
Objective: Develop a sequential MMSE estimator of s[n] based
on the data X[n] = [x[0], x[1], . . . , x[n]]. This estimator is the mean of the
posterior PDF:
ŝ[n|n] = E(s[n] | x[0], x[1], . . . , x[n])
Prediction:
ŝ[n|n − 1] = E(as[n − 1] + u[n] | X[n − 1])
           = aE(s[n − 1] | X[n − 1]) + E(u[n] | X[n − 1])
           = aŝ[n − 1|n − 1]
Note also that x̂[n|n − 1] = ŝ[n|n − 1]. Finally we have:
ŝ[n|n] = aŝ[n − 1|n − 1] + K[n](x[n] − ŝ[n|n − 1])
Scalar Kalman filter:
Prediction:
ŝ[n|n − 1] = aŝ[n − 1|n − 1]
Minimum Prediction MSE:
M[n|n − 1] = a²M[n − 1|n − 1] + σ_u²
Kalman Gain:
K[n] = M[n|n − 1] / (σₙ² + M[n|n − 1])
Correction:
ŝ[n|n] = ŝ[n|n − 1] + K[n](x[n] − ŝ[n|n − 1])
Minimum MSE:
M[n|n] = (1 − K[n])M[n|n − 1]
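A direct implementation of these five equations as a sketch (the initialization values are assumptions of the illustration):

import numpy as np

def scalar_kalman_filter(x, a, var_u, var_w, s_init=0.0, M_init=1.0):
    """Scalar Kalman filter for s[n] = a s[n-1] + u[n], x[n] = s[n] + w[n]."""
    s_hat, M = s_init, M_init
    estimates = []
    for xn in x:
        s_pred = a * s_hat                    # prediction s[n|n-1]
        M_pred = a ** 2 * M + var_u           # minimum prediction MSE M[n|n-1]
        K = M_pred / (var_w + M_pred)         # Kalman gain K[n]
        s_hat = s_pred + K * (xn - s_pred)    # correction s[n|n]
        M = (1.0 - K) * M_pred                # minimum MSE M[n|n]
        estimates.append(s_hat)
    return np.array(estimates)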
Vector state, scalar observation x[n] = hᵀ[n]s[n] + w[n]:
Kalman Gain (p × 1):
K[n] = M[n|n − 1]h[n] / (σₙ² + hᵀ[n]M[n|n − 1]h[n])
Correction:
ŝ[n|n] = ŝ[n|n − 1] + K[n](x[n] − hᵀ[n]ŝ[n|n − 1])
Minimum MSE Matrix (p × p):
M[n|n] = (I − K[n]hᵀ[n])M[n|n − 1]
Vector observations x[n] = H[n]s[n] + w[n]:
Correction:
ŝ[n|n] = ŝ[n|n − 1] + K[n](x[n] − H[n]ŝ[n|n − 1])
Minimum MSE Matrix (p × p):
M[n|n] = (I − K[n]H[n])M[n|n − 1]
[Block diagram: the Kalman filter recursion, showing the prediction ŝ[n|n − 1] obtained from ŝ[n − 1|n − 1], followed by the correction step.]
The extended Kalman filter has exactly the same equations, where A
is replaced with the linearized (Jacobian) matrix A[n − 1].
A[n − 1] and H[n] must be computed at each sampling time.
The MSE matrices and the Kalman gain can no longer be computed off-line.