Estimation Theory Lecture Slides

Introduction

Course Objective

Extract information from noisy signals.

Parameter Estimation Problem: Given a set of measured data

    {x[0], x[1], ..., x[N−1]}

which depends on an unknown parameter vector θ, determine an estimator

    θ̂ = g(x[0], x[1], ..., x[N−1])

where g is some function.

Applications: system identification, state estimation in control, etc.

Some Examples

Range estimation: We transmit a pulse that is reflected by the aircraft. An echo is received after τ seconds. The range is estimated from the equation R = cτ/2, where c is the speed of light.

System identification: The input of a plant is excited with u and the output signal y is measured. If we have

    y[k] = G(q⁻¹, θ)u[k] + n[k]

where n[k] is the measurement noise, the model parameters θ are estimated using u[k] and y[k] for k = 1, ..., N.

DC level in noise: Consider a set of data {x[0], x[1], ..., x[N−1]} that can be modeled as

    x[n] = A + w[n]

where w[n] is some zero-mean noise process. A can be estimated using the measured data set.

Outline

Classical estimation (θ deterministic):
- Minimum Variance Unbiased Estimator (MVU)
- Cramer-Rao Lower Bound (CRLB)
- Best Linear Unbiased Estimator (BLUE)
- Maximum Likelihood Estimator (MLE)
- Least Squares Estimator (LSE)

Bayesian estimation (θ stochastic):
- Minimum Mean Square Error Estimator (MMSE)
- Maximum A Posteriori Estimator (MAP)
- Linear MMSE Estimator
- Kalman Filter

References

Main reference:
- Estimation Theory, by Steven M. Kay, Prentice-Hall, 1993 (available in Library de La Fontaine, RLC). We cover Chapters 1 to 14, skipping Chapter 5 and Chapter 9.

Other references:
- Lessons in Estimation Theory for Signal Processing, Communications and Control, by Jerry M. Mendel, Prentice-Hall, 1995.
- Probability, Random Processes and Estimation Theory for Engineers, by Henry Stark and John W. Woods, Prentice-Hall, 1986.

Probability and Random Variables

Random Variables

Random Variable: A rule X(ξ) that assigns to every element ξ of a sample space Ω a real value is called a RV. So X is not really a variable that varies randomly but a function whose domain is Ω and whose range is some subset of the real line.

Example: Consider the experiment of flipping a coin twice. The sample space (the possible outcomes) is:

    Ω = {HH, HT, TH, TT}

We can define a random variable X such that

    X(HH) = 1,  X(HT) = 1.1,  X(TH) = 1.6,  X(TT) = 1.8

The random variable X assigns to each event (e.g. E = {HT, TH}) a subset of the real line (in this case B = {1.1, 1.6}).

For any element ξ in Ω, the event {ξ | X(ξ) ≤ x} is an important event. The probability of this event

    Pr[{ξ | X(ξ) ≤ x}] = P_X(x)

is called the probability distribution function of X.

Example: For the random variable defined earlier, we have:

    P_X(1.5) = Pr[{ξ | X(ξ) ≤ 1.5}] = Pr[{HH, HT}] = 0.5

P_X(x) can be computed for all x ∈ R. It is clear that 0 ≤ P_X(x) ≤ 1.

Remark:
- For the same experiment (throwing a coin twice) we could define another random variable that would lead to a different P_X(x).
- In most engineering problems the sample space is a subset of the real line, so X(ξ) = ξ and P_X(x) is a continuous function of x.

The Probability Density Function, if it exists, is given by:

    p_X(x) = dP_X(x)/dx

When we deal with a single random variable the subscripts are removed: p(x) = dP(x)/dx.

Properties:
(i)   p(x) ≥ 0
(ii)  P(x) = ∫_{−∞}^{x} p(ξ) dξ
(iii) Pr[x₁ < X ≤ x₂] = ∫_{x₁}^{x₂} p(x) dx

A random variable is distributed according to a Gaussian or normal distribution if the PDF is given by:

    p(x) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²))

The PDF has two parameters: μ, the mean, and σ², the variance. We write X ∼ N(μ, σ²) when the random variable X has a normal (Gaussian) distribution with mean μ and standard deviation σ. Small σ means small variability (uncertainty) and large σ means large variability.

Remark: The Gaussian distribution is important because, according to the Central Limit Theorem, the sum of N independent RVs has a PDF that converges to a Gaussian distribution when N goes to infinity.

Some other common distributions:

Uniform (b > a):
    p(x) = 1/(b − a) for a < x < b, and 0 otherwise

Exponential (λ > 0):
    p(x) = (1/λ) exp(−x/λ) u(x)

Rayleigh (σ > 0):
    p(x) = (x/σ²) exp(−x²/(2σ²)) u(x)

Chi-square χ²_n:
    p(x) = 1/(2^{n/2} Γ(n/2)) · x^{n/2 − 1} exp(−x/2) for x > 0, and 0 for x < 0

where Γ(z) = ∫_0^∞ u^{z−1} e^{−u} du is the Gamma function.

Joint PDF: Consider two random variables X and Y, then:

    Pr[x₁ < X ≤ x₂ and y₁ < Y ≤ y₂] = ∫_{x₁}^{x₂} ∫_{y₁}^{y₂} p(x, y) dx dy

Marginal PDF:

    p(x) = ∫ p(x, y) dy   and   p(y) = ∫ p(x, y) dx

Conditional PDF: p(x|y) describes the distribution of X knowing the value of Y.

Bayes Formula: Consider two RVs defined on the same probability space; then we have:

    p(x, y) = p(x|y)p(y) = p(y|x)p(x)   or   p(x|y) = p(x, y)/p(y)

Two RVs X and Y are independent if and only if:

    p(x, y) = p(x)p(y)

and then

    p(x|y) = p(x, y)/p(y) = p(x)p(y)/p(y) = p(x)

Remark: For a joint Gaussian PDF the contours of constant density are ellipses centered at (μ_x, μ_y). For independent X and Y the major (or minor) axis is parallel to the x or y axis.

The expected value, if it exists, of a random variable X with PDF p(x) is defined by:

    E(X) = ∫ x p(x) dx

Properties:

    E{X + Y} = E{X} + E{Y},   E{aX} = aE{X}

The expected value of Y = g(X) can be computed by:

    E(Y) = ∫ g(x) p(x) dx

The conditional expectation of X, given that a specific value of Y has occurred, is:

    E(X|Y) = ∫ x p(x|y) dx

The r-th moment of X is defined as:

    E(X^r) = ∫ x^r p(x) dx

Moments of Gaussian RVs: A Gaussian RV with N(μ, σ²) has moments of all orders in closed form:

    E(X)   = μ
    E(X²)  = μ² + σ²
    E(X³)  = μ³ + 3μσ²
    E(X⁴)  = μ⁴ + 6μ²σ² + 3σ⁴
    E(X⁵)  = μ⁵ + 10μ³σ² + 15μσ⁴
    E(X⁶)  = μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶

Central Moments

The r-th central moment of X is defined as:

    E[(X − μ)^r] = ∫ (x − μ)^r p(x) dx

Central moments of Gaussian RVs:

    E[(X − μ)^r] = 0 if r is odd,   σ^r (r − 1)!! if r is even

where n!! denotes the double factorial, that is, the product of every odd number from n down to 1.

Some properties of Gaussian RVs:
- If X ∼ N(μ_x, σ_x²) then Z = (X − μ_x)/σ_x ∼ N(0, 1).
- If Z ∼ N(0, 1) then X = σ_x Z + μ_x ∼ N(μ_x, σ_x²).
- If X ∼ N(μ_x, σ_x²) then Z = aX + b ∼ N(aμ_x + b, a²σ_x²).
- If X ∼ N(μ_x, σ_x²) and Y ∼ N(μ_y, σ_y²) are two independent RVs, then aX + bY ∼ N(aμ_x + bμ_y, a²σ_x² + b²σ_y²).
- The sum of squares of n independent RVs with standard normal distribution N(0, 1) has a χ²_n distribution with n degrees of freedom. For large values of n, χ²_n converges to N(n, 2n).
- The square root of the sum of squares of two independent RVs with standard normal distribution has the Rayleigh distribution.

Covariance

For two RVs X and Y, the covariance is defined as

    σ_xy = E[(X − μ_x)(Y − μ_y)] = ∫∫ (x − μ_x)(y − μ_y) p(x, y) dx dy

Properties:

    var(X + Y) = σ_x² + σ_y² + 2σ_xy,   var(aX) = a²σ_x²

Important formula: The relation between the variance and the mean of X is given by

    σ² = E[(X − μ)²] = E(X²) − 2μE(X) + μ² = E(X²) − μ²

The variance is the mean of the square minus the square of the mean.

If σ_xy = 0, then X and Y are uncorrelated and

    E{XY} = E{X}E{Y}

X and Y are called orthogonal if E{XY} = 0.

If X and Y are independent then they are uncorrelated:

    p(x, y) = p(x)p(y)  ⟹  E{XY} = E{X}E{Y}

Uncorrelatedness does not imply independence. For example, if X is a normal RV with zero mean and Y = X², we have p(y|x) ≠ p(y) but

    σ_xy = E{XY} − E{X}E{Y} = E{X³} − 0 = 0

Correlation only captures the linear dependence between two RVs, so it is weaker than independence. For jointly Gaussian RVs, independence is equivalent to being uncorrelated.

Random Vectors

Random Vector: a vector of random variables¹:

    x = [x₁, x₂, ..., x_n]ᵀ

Expectation Vector: μ_x = E(x) = [E(x₁), E(x₂), ..., E(x_n)]ᵀ

Covariance Matrix: C_x = E[(x − μ_x)(x − μ_x)ᵀ]
- C_x is an n × n symmetric matrix which is assumed to be positive definite and so invertible.
- The elements of this matrix are [C_x]_ij = E{[x_i − E(x_i)][x_j − E(x_j)]}.
- If the random variables are uncorrelated then C_x is a diagonal matrix.

Multivariate Gaussian PDF:

    p(x) = 1/√((2π)ⁿ det(C_x)) · exp(−(1/2)(x − μ_x)ᵀ C_x⁻¹ (x − μ_x))

1. In some books (including our main reference) there is no distinction between the random variable X and its specific value x. From now on we adopt the notation of our reference.

Random Processes

Discrete Random Process: x[n] is a sequence of random variables defined for every integer n.

Mean value: defined as E(x[n]) = μ_x[n].

Autocorrelation Function (ACF): defined as

    r_xx[k, n] = E(x[n] x[n + k])

Wide Sense Stationary (WSS): x[n] is WSS if its mean and its autocorrelation function (ACF) do not depend on n.

Autocovariance function: defined as

    c_xx[k] = E[(x[n] − μ_x)(x[n + k] − μ_x)] = r_xx[k] − μ_x²

Cross-correlation Function (CCF): defined as

    r_xy[k] = E(x[n] y[n + k])

Cross-covariance function: defined as

    c_xy[k] = E[(x[n] − μ_x)(y[n + k] − μ_y)] = r_xy[k] − μ_x μ_y

Some properties of the ACF and CCF:

    r_xx[0] ≥ |r_xx[k]|

Power Spectral Density: The Fourier transform of the ACF and CCF gives the Auto-PSD and Cross-PSD:

    P_xx(f) = Σ_{k=−∞}^{∞} r_xx[k] exp(−j2πfk),   P_xy(f) = Σ_{k=−∞}^{∞} r_xy[k] exp(−j2πfk)

Discrete White Noise: a discrete random process with zero mean and r_xx[k] = σ²δ[k], where δ[k] is the Kronecker impulse function. The PSD of white noise is P_xx(f) = σ² and is completely flat with frequency.

Minimum Variance Unbiased Estimation

Parameter Estimation Problem: Given a set of measured data

    x = {x[0], x[1], ..., x[N−1]}

which depends on an unknown parameter vector θ, determine an estimator

    θ̂ = g(x[0], x[1], ..., x[N−1])

where g is some function. The data are described by their PDF, viewed as a function of θ: p(x; θ).

Example: Consider the problem of a DC level in white Gaussian noise with one observed data point x[0] = θ + w[0], where w[0] has the PDF N(0, σ²). Then the PDF of x[0] is:

    p(x[0]; θ) = 1/√(2πσ²) · exp(−(1/(2σ²))(x[0] − θ)²)

Example: Consider a data sequence that can be modeled with a linear trend in white Gaussian noise

    x[n] = A + Bn + w[n],   n = 0, 1, ..., N−1

Suppose that w[n] ∼ N(0, σ²) and is uncorrelated with all the other samples. Letting θ = [A B]ᵀ and x = [x[0], x[1], ..., x[N−1]]ᵀ, the PDF is:

    p(x; θ) = Π_{n=0}^{N−1} p(x[n]; θ) = 1/(2πσ²)^{N/2} · exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)²)

The quality of any estimator for this problem is related to the assumptions on the data model; in this example, the linear trend and the WGN PDF assumption.

Classical versus Bayesian estimation
- If we assume θ is deterministic we have a classical estimation problem. The following methods will be studied: MVU, MLE, BLUE, LSE.
- If we assume θ is a random variable with a known PDF, then we have a Bayesian estimation problem. In this case the data are described by the joint PDF

      p(x, θ) = p(x|θ)p(θ)

  where p(θ) summarizes our knowledge about θ before any data is observed and p(x|θ) summarizes our knowledge provided by the data x conditioned on knowing θ. The following methods will be studied: MMSE, MAP, Kalman Filter.

Consider the problem of estimating a DC level A in uncorrelated noise:

    x[n] = A + w[n],   n = 0, 1, ..., N−1

Two possible estimators are:

    Â₁ = (1/N) Σ_{n=0}^{N−1} x[n]   and   Â₂ = x[0]

Suppose that A = 1, Â₁ = 0.95 and Â₂ = 0.98. Which estimator is better? An estimator is itself a random variable, so its performance can only be described by its PDF or statistically (e.g. by Monte-Carlo simulation).
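The point about judging estimators statistically can be made concrete with a small Monte-Carlo experiment. The sketch below is an illustration rather than part of the original slides; it assumes Python with NumPy and made-up values A = 1, σ = 1, N = 50.

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, M = 1.0, 1.0, 50, 10000     # true DC level, noise std, record length, MC runs

x = A + sigma * rng.standard_normal((M, N))   # M independent realizations of the data record
A1 = x.mean(axis=1)                           # sample-mean estimator for each run
A2 = x[:, 0]                                  # "first sample" estimator for each run

# Both are unbiased, but the sample mean has a much smaller spread (variance ~ sigma^2/N vs sigma^2)
print("A1: mean =", A1.mean(), " var =", A1.var())
print("A2: mean =", A2.mean(), " var =", A2.var())
```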

Unbiased Estimators

An estimator that on the average yields the true value is unbiased. Mathematically:

    E(θ̂ − θ) = 0   for a < θ < b

Let us compute the expectation of the two estimators Â₁ and Â₂:

    E(Â₁) = (1/N) Σ_{n=0}^{N−1} E(x[n]) = (1/N) Σ_{n=0}^{N−1} E(A + w[n]) = (1/N) Σ_{n=0}^{N−1} (A + 0) = A

    E(Â₂) = E(x[0]) = E(A + w[0]) = A + 0 = A

Both estimators are unbiased. Which one is better? Now, let us compute the variance of the two estimators:

    var(Â₁) = var((1/N) Σ_{n=0}^{N−1} x[n]) = (1/N²) Σ_{n=0}^{N−1} var(x[n]) = (1/N²) · Nσ² = σ²/N

    var(Â₂) = var(x[0]) = σ² ≥ var(Â₁)

Remark: When several unbiased estimators of the same parameter from independent sets of data are available, i.e. θ̂₁, θ̂₂, ..., θ̂_n, a better estimator can be obtained by averaging:

    θ̂ = (1/n) Σ_{i=1}^{n} θ̂_i

    E(θ̂) = θ,   var(θ̂) = (1/n²) Σ_{i=1}^{n} var(θ̂_i) = (1/n²) · n var(θ̂_i) = var(θ̂_i)/n

This is not the case for biased estimators, no matter how many estimators are averaged.

The most logical criterion for estimation is the Mean Square Error (MSE):

    mse(θ̂) = E[(θ̂ − θ)²]

Unfortunately this criterion leads to unrealizable estimators (the estimator depends on θ):

    mse(θ̂) = E{[θ̂ − E(θ̂) + E(θ̂) − θ]²} = E{[θ̂ − E(θ̂) + b(θ)]²}

where b(θ) = E(θ̂) − θ is defined as the bias of the estimator. Therefore:

    mse(θ̂) = E{[θ̂ − E(θ̂)]²} + 2b(θ)E[θ̂ − E(θ̂)] + b²(θ) = var(θ̂) + b²(θ)

An alternative is to constrain the bias to be zero and to minimize the variance among the unbiased estimators: the Minimum Variance Unbiased (MVU) Estimator.

Existence of the MVU estimator: The MVU estimator does not always exist. There may be no unbiased estimator, or none of the unbiased estimators has a uniformly minimum variance.

Finding the MVU estimator: There is no known procedure which always leads to the MVU estimator. Three existing approaches are:
1. Determine the Cramer-Rao lower bound (CRLB) and check whether some estimator satisfies it.
2. Apply the Rao-Blackwell-Lehmann-Scheffe theorem.
3. Restrict the class of estimators to be linear (BLUE).

Cramer-Rao Lower Bound

For any unbiased estimator θ̂,

    var(θ̂) ≥ CRLB(θ)

Note that the CRLB is a function of θ.
- It tells us the best performance that can be achieved (useful in feasibility studies and for comparison with other estimators).
- It may lead us to compute the MVU estimator.

Theorem (scalar case)

Assume that the PDF p(x; θ) satisfies the regularity condition

    E[∂ ln p(x; θ)/∂θ] = 0   for all θ

Then the variance of any unbiased estimator θ̂ satisfies

    var(θ̂) ≥ 1 / (−E[∂² ln p(x; θ)/∂θ²])

An unbiased estimator that attains the CRLB can be found iff:

    ∂ ln p(x; θ)/∂θ = I(θ)(g(x) − θ)

for some functions g(x) and I(θ). The estimator is θ̂ = g(x) and the minimum variance is 1/I(θ).

Example: Consider x[0] = A + w[0] with w[0] ∼ N(0, σ²).

    p(x[0]; A) = 1/√(2πσ²) · exp(−(1/(2σ²))(x[0] − A)²)

    ln p(x[0]; A) = −ln√(2πσ²) − (1/(2σ²))(x[0] − A)²

Then

    ∂ ln p(x[0]; A)/∂A = (1/σ²)(x[0] − A),   ∂² ln p(x[0]; A)/∂A² = −1/σ²

According to the theorem:

    var(Â) ≥ σ²,   I(A) = 1/σ²,   and Â = g(x[0]) = x[0]

Example: Consider multiple observations of a DC level in WGN:

    x[n] = A + w[n],   n = 0, 1, ..., N−1

    p(x; A) = 1/(2πσ²)^{N/2} · exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²)

Then

    ∂ ln p(x; A)/∂A = ∂/∂A [−ln((2πσ²)^{N/2}) − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]
                    = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = (N/σ²) [(1/N) Σ_{n=0}^{N−1} x[n] − A]

According to the theorem:

    var(Â) ≥ σ²/N,   I(A) = N/σ²,   and Â = g(x) = (1/N) Σ_{n=0}^{N−1} x[n]
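As a numerical sanity check (an illustration, not part of the slides; Python/NumPy with arbitrary values A = 2, σ² = 1.5), one can verify by Monte Carlo that the variance of the sample mean is close to the CRLB σ²/N:

```python
import numpy as np

rng = np.random.default_rng(1)
A, var_w, N, M = 2.0, 1.5, 100, 20000

x = A + np.sqrt(var_w) * rng.standard_normal((M, N))   # M records of length N
A_hat = x.mean(axis=1)                                  # efficient estimator for each record

print("empirical variance:", A_hat.var())
print("CRLB sigma^2/N    :", var_w / N)
```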

Transformation of Parameters

If it is desired to estimate α = g(θ), then the CRLB is:

    var(α̂) ≥ (∂g/∂θ)² / (−E[∂² ln p(x; θ)/∂θ²])

Example: CRLB for the estimation of A² for the DC level in noise:

    var(Â²) ≥ (σ²/N)(2A)² = 4A²σ²/N

Definition (Efficient estimator): An unbiased estimator that attains the CRLB is said to be efficient.

Example: Knowing that x̄ = (1/N) Σ_{n=0}^{N−1} x[n] is an efficient estimator for A, is x̄² an efficient estimator for A²?

Solution: Knowing that x̄ ∼ N(A, σ²/N), we have:

    E(x̄²) = E²(x̄) + var(x̄) = A² + σ²/N ≠ A²

So the estimator Â² = x̄² is biased. Let us look at the variance of this estimator:

    var(x̄²) = E(x̄⁴) − E²(x̄²)

but we have from the moments of Gaussian RVs (slide 15):

    E(x̄⁴) = A⁴ + 6A²(σ²/N) + 3(σ²/N)²

Therefore:

    var(x̄²) = A⁴ + 6A²(σ²/N) + 3σ⁴/N² − (A² + σ²/N)² = 4A²σ²/N + 2σ⁴/N²

Remarks:
- The estimator Â² = x̄² is biased and not efficient.
- As N → ∞ the bias goes to zero and the variance of the estimator approaches the CRLB. This type of estimator is called asymptotically efficient.

General Remarks:
- If g(θ) = aθ + b is an affine function of θ, then ĝ(θ) = g(θ̂) is an unbiased estimator of g(θ); moreover, the CRLB becomes

      var(ĝ(θ)) ≥ (∂g/∂θ)² var(θ̂) = a² var(θ̂)

  but var(ĝ(θ)) = var(aθ̂ + b) = a² var(θ̂), so the CRLB is achieved.
- If g(θ) is a nonlinear function of θ and θ̂ is an efficient estimator, then g(θ̂) is an asymptotically efficient estimator.

Theorem (Vector Parameter)

Assume that the PDF p(x; θ) satisfies the regularity condition

    E[∂ ln p(x; θ)/∂θ] = 0   for all θ

Then the covariance matrix of any unbiased estimator θ̂ satisfies

    C_θ̂ − I⁻¹(θ) ⪰ 0

where ⪰ 0 means that the matrix is positive semidefinite. I(θ) is called the Fisher information matrix and is given by:

    [I(θ)]_ij = −E[∂² ln p(x; θ)/∂θ_i ∂θ_j]

An unbiased estimator that attains the CRLB can be found iff:

    ∂ ln p(x; θ)/∂θ = I(θ)(g(x) − θ)

Example: Consider a DC level in WGN with A and σ² unknown. Compute the CRLB for the estimation of θ = [A σ²]ᵀ.

    ln p(x; θ) = −(N/2) ln 2π − (N/2) ln σ² − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²

    I(θ) = −E [ ∂² ln p/∂A²       ∂² ln p/∂A∂σ²   ]  =  [ N/σ²    0       ]
              [ ∂² ln p/∂σ²∂A     ∂² ln p/∂(σ²)²  ]     [ 0       N/(2σ⁴) ]

The matrix is diagonal (just for this example) and can be easily inverted to yield:

    var(Â) ≥ σ²/N,   var(σ̂²) ≥ 2σ⁴/N

Transformation of Parameters

If it is desired to estimate α = g(θ), and the CRLB for the covariance of θ̂ is I⁻¹(θ), then:

    C_α̂ − (∂g/∂θ) I⁻¹(θ) (∂g/∂θ)ᵀ ⪰ 0

Example: Compute the CRLB for the estimation of the signal-to-noise ratio α = A²/σ².

We have θ = [A σ²]ᵀ and α = g(θ) = θ₁²/θ₂, so the Jacobian is:

    ∂g(θ)/∂θ = [∂g(θ)/∂A   ∂g(θ)/∂σ²] = [2A/σ²   −A²/σ⁴]

So the CRLB is:

    var(α̂) ≥ [2A/σ²  −A²/σ⁴] [ σ²/N    0     ] [ 2A/σ²  ]  =  4A²/(σ²N) + 2A⁴/(σ⁴N)  =  (4α + 2α²)/N
                              [ 0       2σ⁴/N ] [ −A²/σ⁴ ]

Linear Models

If the N point samples of observed data can be modeled as

    x = Hθ + w

where
    x : N × 1  observation vector
    H : N × p  observation matrix (known, rank p)
    θ : p × 1  vector of parameters to be estimated
    w : N × 1  noise vector with PDF N(0, σ²I)

compute the CRLB and the MVU estimator that achieves this bound.

Step 1: Compute ∂ ln p(x; θ)/∂θ.
Step 2: Compute I(θ) = −E[∂² ln p(x; θ)/∂θ∂θᵀ] and the covariance matrix of θ̂: C_θ̂ = I⁻¹(θ).
Step 3: Find the MVU estimator g(x) by factoring ∂ ln p(x; θ)/∂θ = I(θ)[g(x) − θ].

Steps 1 and 2:

    ∂ ln p(x; θ)/∂θ = −(1/(2σ²)) ∂/∂θ [xᵀx − 2xᵀHθ + θᵀHᵀHθ] = (1/σ²)[Hᵀx − HᵀHθ]

    ∂² ln p(x; θ)/∂θ∂θᵀ = −(1/σ²) HᵀH   ⟹   I(θ) = −E[∂² ln p(x; θ)/∂θ∂θᵀ] = (1/σ²) HᵀH

Step 3:

    ∂ ln p(x; θ)/∂θ = I(θ)[g(x) − θ] = (HᵀH/σ²)[(HᵀH)⁻¹Hᵀx − θ]

Therefore:

    C_θ̂ = I⁻¹(θ) = σ²(HᵀH)⁻¹,   θ̂ = g(x) = (HᵀH)⁻¹Hᵀx

Theorem (MVU estimator for the linear model)

For a linear model with WGN represented by x = Hθ + w, the MVU estimator is:

    θ̂ = (HᵀH)⁻¹Hᵀx

This estimator is efficient and attains the CRLB. That the estimator is unbiased can be seen easily:

    E(θ̂) = (HᵀH)⁻¹Hᵀ E(Hθ + w) = θ

The statistical performance of θ̂ is completely specified because θ̂ is a linear transformation of a Gaussian vector x and hence has a Gaussian distribution:

    θ̂ ∼ N(θ, σ²(HᵀH)⁻¹)

Example (curve fitting): Consider fitting the data x[n] by a p-th order polynomial function of n:

    x[n] = θ₀ + θ₁n + θ₂n² + ... + θ_p n^p + w[n]

We have N data samples, then:

    x = [x[0], x[1], ..., x[N−1]]ᵀ,   w = [w[0], w[1], ..., w[N−1]]ᵀ,   θ = [θ₀, θ₁, ..., θ_p]ᵀ

so x = Hθ + w, where H is the N × (p+1) matrix

    H = [ 1    0      0       ...  0
          1    1      1       ...  1
          1    2      4       ...  2^p
          ...
          1   N−1   (N−1)²    ...  (N−1)^p ]
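The sketch below is an illustration of this fit (not from the slides; Python/NumPy, with an arbitrary second-order polynomial and noise level): it builds the matrix H and applies the MVU/LS formula θ̂ = (HᵀH)⁻¹Hᵀx.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, sigma = 50, 2, 0.5                       # N samples, polynomial order p, noise std
theta_true = np.array([1.0, -0.2, 0.05])       # assumed coefficients theta_0, theta_1, theta_2

n = np.arange(N)
H = np.vander(n, p + 1, increasing=True)       # columns 1, n, n^2
x = H @ theta_true + sigma * rng.standard_normal(N)

theta_hat = np.linalg.solve(H.T @ H, H.T @ x)  # (H^T H)^-1 H^T x
C_theta = sigma**2 * np.linalg.inv(H.T @ H)    # covariance of the MVU estimator (the CRLB)
print(theta_hat)
print(np.sqrt(np.diag(C_theta)))               # standard deviation of each coefficient estimate
```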

Example (Fourier analysis): Consider the Fourier analysis of the data x[n]:

    x[n] = Σ_{k=1}^{M} a_k cos(2πkn/N) + Σ_{k=1}^{M} b_k sin(2πkn/N) + w[n]

With θ = [a₁, ..., a_M, b₁, ..., b_M]ᵀ the observation matrix is

    H = [h_{a1}, h_{a2}, ..., h_{aM}, h_{b1}, h_{b2}, ..., h_{bM}]

with

    [h_{ak}]_n = cos(2πkn/N),   [h_{bk}]_n = sin(2πkn/N),   n = 0, 1, ..., N−1

After simplification (noting that (HᵀH)⁻¹ = (2/N) I), we have:

    θ̂ = (2/N) [(h_{a1})ᵀx, ..., (h_{aM})ᵀx, (h_{b1})ᵀx, ..., (h_{bM})ᵀx]ᵀ

which is the same as the standard solution:

    â_k = (2/N) Σ_{n=0}^{N−1} x[n] cos(2πkn/N),   b̂_k = (2/N) Σ_{n=0}^{N−1} x[n] sin(2πkn/N)

The covariance matrix is:

    C_θ̂ = σ²(HᵀH)⁻¹ = (2σ²/N) I

(so the amplitude estimates are independent).

Example (system identification): Consider identification of a Finite Impulse Response (FIR) model h[k], for k = 0, 1, ..., p−1, with input u[n] and output x[n] provided for n = 0, 1, ..., N−1:

    x[n] = Σ_{k=0}^{p−1} h[k] u[n − k] + w[n],   n = 0, 1, ..., N−1

In matrix form x = Hθ + w with

    H = [ u[0]      0         ...  0
          u[1]      u[0]      ...  0
          ...
          u[N−1]    u[N−2]    ...  u[N−p] ]   (N × p),     θ = [h[0], h[1], ..., h[p−1]]ᵀ   (p × 1)

The MVU estimate is θ̂ = (HᵀH)⁻¹Hᵀx with C_θ̂ = σ²(HᵀH)⁻¹.
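A minimal sketch of this identification (illustrative only; Python/NumPy, with an invented input, impulse response and noise level) builds the convolution matrix H from u[n] and applies the same formula:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, sigma = 200, 4, 0.1
h_true = np.array([1.0, 0.5, -0.3, 0.1])   # assumed FIR coefficients
u = rng.standard_normal(N)                  # known input signal

# Convolution matrix: H[n, k] = u[n - k] (zero for n < k)
H = np.zeros((N, p))
for k in range(p):
    H[k:, k] = u[:N - k]

x = H @ h_true + sigma * rng.standard_normal(N)   # measured output
h_hat = np.linalg.solve(H.T @ H, H.T @ x)         # MVU / LS estimate of the impulse response
print(h_hat)
```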

Example (colored noise): Determine the MVU estimator for the linear model x = Hθ + w with w a colored Gaussian noise, w ∼ N(0, C).

Whitening approach: Since C is positive definite, its inverse can be factored as C⁻¹ = DᵀD, where D is an invertible matrix. This matrix acts as a whitening transformation for w:

    E[(Dw)(Dw)ᵀ] = E(Dw wᵀDᵀ) = DCDᵀ = I

Now if we transform the linear model x = Hθ + w to:

    x' = Dx = DHθ + Dw = H'θ + w'

where w' = Dw ∼ N(0, I) is white, we can compute the MVU estimator as:

    θ̂ = (H'ᵀH')⁻¹H'ᵀx' = (HᵀDᵀDH)⁻¹HᵀDᵀDx

so we have:

    θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x   with   C_θ̂ = (H'ᵀH')⁻¹ = (HᵀC⁻¹H)⁻¹

Example (known signal component): Consider a linear model x = Hθ + s + w, where s is a known signal. To determine the MVU estimator let x' = x − s, so that x' = Hθ + w is a standard linear model. The MVU estimator is:

    θ̂ = (HᵀH)⁻¹Hᵀ(x − s)   with   C_θ̂ = σ²(HᵀH)⁻¹

Example: x[n] = A + r^n + w[n], where r is known. Then we have:

    [ x[0]   ]   [ 1 ]     [ 1       ]   [ w[0]   ]
    [ x[1]   ] = [ 1 ] A + [ r       ] + [ w[1]   ]
    [ ...    ]   [...]     [ ...     ]   [ ...    ]
    [ x[N−1] ]   [ 1 ]     [ r^{N−1} ]   [ w[N−1] ]

    Â = (HᵀH)⁻¹Hᵀ(x − s) = (1/N) Σ_{n=0}^{N−1} (x[n] − r^n),   with var(Â) = σ²/N

Best Linear Unbiased Estimators

Problems with finding the MVU estimator:
- The MVU estimator does not always exist, or may be impossible to find.
- The PDF of the data may be unknown.

The BLUE is a suboptimal estimator that:
- restricts the estimate to be linear in the data: θ̂ = Ax;
- restricts the estimate to be unbiased: E(θ̂) = AE(x) = θ;
- minimizes the variance under these constraints;
- needs only the mean and the variance of the data (not the PDF). As a result, in general, the PDF of the estimates cannot be computed.

Remark: The unbiasedness restriction implies a linear model for the data. However, the BLUE may still be used if the data are transformed suitably or the model is linearized.

Finding the BLUE (scalar parameter):

1. Restrict the estimate to be linear in the data:

       θ̂ = Σ_{n=0}^{N−1} a_n x[n] = aᵀx,   where a = [a₀, a₁, ..., a_{N−1}]ᵀ

2. Restrict the estimate to be unbiased:

       E(θ̂) = Σ_{n=0}^{N−1} a_n E(x[n]) = θ

3. Minimize the variance:

       var(θ̂) = E{[θ̂ − E(θ̂)]²} = E{[aᵀx − aᵀE(x)]²} = E{aᵀ[x − E(x)][x − E(x)]ᵀa} = aᵀCa

Consider the problem of amplitude estimation of known signals in noise:

    x[n] = θ s[n] + w[n],   n = 0, 1, ..., N−1

1. Linear estimator: θ̂ = Σ_{n=0}^{N−1} a_n x[n] = aᵀx.

2. Restrict the estimate to be unbiased: E(θ̂) = aᵀE(x) = θ aᵀs = θ, so aᵀs = 1, where s = [s[0], s[1], ..., s[N−1]]ᵀ.

3. Minimize aᵀCa subject to aᵀs = 1, i.e. minimize (with a Lagrange multiplier λ)

       J = aᵀCa + λ(aᵀs − 1)

which gives

    θ̂ = sᵀC⁻¹x / (sᵀC⁻¹s)   and   var(θ̂) = 1 / (sᵀC⁻¹s)

Theorem (Gauss-Markov)

If the data are of the general linear model form

    x = Hθ + w

with w a noise vector with zero mean and covariance C (the PDF of w is otherwise arbitrary), then the BLUE of θ is:

    θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x

and the covariance matrix of θ̂ is

    C_θ̂ = (HᵀC⁻¹H)⁻¹

Remark: If the noise is Gaussian then the BLUE is the MVU estimator.

Example: Consider the problem of a DC level in noise, x[n] = A + w[n], where w[n] is of unspecified PDF with var(w[n]) = σ_n². We have θ = A and H = 1 = [1, 1, ..., 1]ᵀ. The covariance matrix and its inverse are:

    C = diag(σ₀², σ₁², ..., σ_{N−1}²),   C⁻¹ = diag(1/σ₀², 1/σ₁², ..., 1/σ_{N−1}²)

so the BLUE is

    Â = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x = [Σ_{n=0}^{N−1} x[n]/σ_n²] / [Σ_{n=0}^{N−1} 1/σ_n²]

with variance

    var(Â) = (HᵀC⁻¹H)⁻¹ = 1 / [Σ_{n=0}^{N−1} 1/σ_n²]
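A small numeric sketch of this weighted average (illustrative only; Python/NumPy, with an invented variance profile σ_n²) compares the BLUE with the plain sample mean:

```python
import numpy as np

rng = np.random.default_rng(4)
A, N, M = 1.0, 50, 5000
var_n = 0.2 + 2.0 * rng.random(N)          # assumed per-sample noise variances sigma_n^2

w = 1.0 / var_n                            # BLUE weights (up to normalization)
x = A + np.sqrt(var_n) * rng.standard_normal((M, N))

A_blue = (x * w).sum(axis=1) / w.sum()     # sum x[n]/sigma_n^2  /  sum 1/sigma_n^2
A_mean = x.mean(axis=1)                    # unweighted sample mean for comparison

print("var BLUE:", A_blue.var(), " (theory:", 1 / w.sum(), ")")
print("var mean:", A_mean.var())
```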

Maximum Likelihood Estimation

Problems: the MVU estimator often does not exist or cannot be found, and the BLUE is restricted to linear models.

The Maximum Likelihood Estimator (MLE):
- can always be applied if the PDF is known;
- is optimal for large data size;
- is computationally complex and requires numerical methods.

Basic Idea: Choose the parameter value that makes the observed data the most likely data to have been observed.

Likelihood Function: the PDF p(x; θ) when θ is regarded as a variable (not a parameter).

ML Estimate: the value of θ that maximizes the likelihood function.

Procedure: Find the log-likelihood function ln p(x; θ); differentiate w.r.t. θ, set to zero and solve for θ̂.

Example: Consider a DC level in WGN with unknown variance, x[n] = A + w[n]. Suppose that A > 0 and σ² = A. The PDF is:

    p(x; A) = 1/(2πA)^{N/2} · exp(−(1/(2A)) Σ_{n=0}^{N−1} (x[n] − A)²)

    ∂ ln p(x; A)/∂A = −N/(2A) + (1/A) Σ_{n=0}^{N−1} (x[n] − A) + (1/(2A²)) Σ_{n=0}^{N−1} (x[n] − A)²

The MLE can be found by setting the above equation to zero:

    Â = −1/2 + √((1/N) Σ_{n=0}^{N−1} x²[n] + 1/4)

Stochastic Convergence

Convergence in distribution: Let {p(x_N)} be a sequence of PDFs. If there exists a PDF p(x) such that

    lim_{N→∞} p(x_N) = p(x)

then the sequence is said to converge in distribution (denoted →ᵈ).

Example: Consider N independent RVs x₁, x₂, ..., x_N with mean μ and finite variance σ². Let x̄_N = (1/N) Σ_{n=1}^{N} x_n. Then according to the Central Limit Theorem (CLT), z_N = √N (x̄_N − μ)/σ converges in distribution to z ∼ N(0, 1).

Convergence in probability: A sequence of random variables {x_N} converges in probability to the random variable x if for every ε > 0

    lim_{N→∞} Pr{|x_N − x| > ε} = 0

A sequence of random variables {x_N} converges with probability 1 to the random variable x if and only if

    Pr{ lim_{N→∞} x_N = x } = 1

Example: Consider N independent RVs x₁, x₂, ..., x_N with mean μ and finite variance σ². Let x̄_N = (1/N) Σ_{n=1}^{N} x_n. Then according to the Law of Large Numbers (LLN), x̄_N converges in probability to μ.

Asymptotic Unbiasedness: The estimator θ̂_N is an asymptotically unbiased estimator of θ if:

    lim_{N→∞} E(θ̂_N) − θ = 0

Asymptotic distribution: a distribution that approximates the distribution of θ̂_N for n = 1, 2, ..., especially for large values of n (it is not the ultimate form of the distribution, which may be degenerate).

Asymptotic Variance: is not equal to lim_{N→∞} var(θ̂_N) (which is often zero); it is defined as

    asymptotic var(θ̂_N) = (1/N) lim_{N→∞} E{ N [θ̂_N − lim_{N→∞} E(θ̂_N)]² }

Mean-Square Convergence: The estimator θ̂_N converges to θ in a mean-squared sense if:

    lim_{N→∞} E[(θ̂_N − θ)²] = 0

Consistency

The estimator θ̂_N is a consistent estimator of θ if for every ε > 0

    lim_{N→∞} Pr{|θ̂_N − θ| > ε} = 0,   i.e.   plim(θ̂_N) = θ

Remarks:
- If θ̂_N is asymptotically unbiased and its limiting variance is zero, then it converges to θ in mean-square.
- If θ̂_N converges to θ in mean-square, then the estimator is consistent.
- Asymptotic unbiasedness does not imply consistency and vice versa.
- plim can be treated as an operator, e.g. plim(xy) = plim(x)·plim(y) and plim(x/y) = plim(x)/plim(y).
- A continuous function of a consistent estimator is itself a consistent estimator.

Properties of the MLE: The MLE may be biased and is not necessarily an efficient estimator. However:
- The MLE is a consistent estimator, meaning that lim_{N→∞} Pr[|θ̂ − θ| > ε] = 0.
- The MLE is asymptotically efficient (its variance approaches the CRLB as N → ∞).
- Under some regularity conditions, the MLE is asymptotically normally distributed,

      θ̂ ∼ᵃ N(θ, I⁻¹(θ))

  even if the PDF of x is not Gaussian.
- If an MVU estimator exists, then the ML procedure will find it.

Example: Consider a DC level in WGN with known variance σ².

Solution:

    p(x; A) = 1/(2πσ²)^{N/2} · exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²)

    ∂ ln p(x; A)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = 0   ⟹   Σ_{n=0}^{N−1} x[n] − NÂ = 0

which leads to Â = x̄ = (1/N) Σ_{n=0}^{N−1} x[n].

Theorem (Invariance Property of the MLE)

The MLE of the parameter α = g(θ), where the PDF p(x; θ) is parameterized by θ, is given by

    α̂ = g(θ̂)

where θ̂ is the MLE of θ. It can be proved using the properties of consistent estimators. If α = g(θ) is not a one-to-one function, then α̂ maximizes the modified likelihood function

    p_T(x; α) = max_{θ : α = g(θ)} p(x; θ)

Example: Consider a DC level in WGN and find the MLE of α = exp(A). Since g(θ) is a one-to-one function:

    α̂ = exp(Â) = exp(x̄)

Example: Consider a DC level in WGN and find the MLE of α = A². Since g(θ) is not a one-to-one function:

    α̂ = [arg max_{−∞<A<∞} p(x; A)]² = Â² = x̄²

Example: Consider a DC level in WGN with unknown variance. The vector parameter θ = [A σ²]ᵀ should be estimated. We have:

    ∂ ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A)

    ∂ ln p(x; θ)/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} (x[n] − A)²

Setting these to zero yields:

    Â = x̄,   σ̂² = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²

Consider the general Gaussian case where x ∼ N(μ(θ), C(θ)). The partial derivatives of the log PDF are:

    ∂ ln p(x; θ)/∂θ_k = −(1/2) tr[C⁻¹(θ) ∂C(θ)/∂θ_k] + (∂μ(θ)/∂θ_k)ᵀ C⁻¹(θ)(x − μ(θ))
                        − (1/2)(x − μ(θ))ᵀ [∂C⁻¹(θ)/∂θ_k] (x − μ(θ))

for k = 1, ..., p.
- By setting the above equations equal to zero, the MLE can be found.
- A particular case is when C is known (the first and third terms become zero).
- In addition, if μ(θ) is linear in θ, the general linear model is obtained.

Consider the general linear model x = Hθ + w, where w is a noise vector with PDF N(0, C):

    p(x; θ) = 1/√((2π)ᴺ det(C)) · exp(−(1/2)(x − Hθ)ᵀC⁻¹(x − Hθ))

Taking the derivative of ln p(x; θ) leads to:

    ∂ ln p(x; θ)/∂θ = ∂(Hθ)ᵀ/∂θ · C⁻¹(x − Hθ) = HᵀC⁻¹(x − Hθ)

Then setting HᵀC⁻¹(x − Hθ) = 0 gives

    θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x,   θ̂ ∼ N(θ, (HᵀC⁻¹H)⁻¹)

Newton-Raphson: A closed-form estimator cannot always be computed by maximizing the likelihood function. However, the maximum can be computed by numerical methods such as the iterative Newton-Raphson algorithm:

    θ_{k+1} = θ_k − [∂² ln p(x; θ)/∂θ∂θᵀ]⁻¹ ∂ ln p(x; θ)/∂θ |_{θ=θ_k}

Remarks:
- The Hessian can be replaced by the negative of its expectation, the Fisher information matrix I(θ).
- This method suffers from convergence problems (local maxima).
- Typically, for large data lengths, the log-likelihood function becomes more quadratic and the algorithm will produce the MLE.
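The iteration is easy to try on the earlier scalar example with A > 0 and σ² = A, where the closed-form MLE Â = −1/2 + √((1/N)Σx²[n] + 1/4) is available as a check. The sketch below is illustrative only (Python/NumPy, invented data; the derivative of the score is taken numerically):

```python
import numpy as np

rng = np.random.default_rng(5)
A_true, N = 3.0, 200
x = A_true + np.sqrt(A_true) * rng.standard_normal(N)    # model with sigma^2 = A

def score(A):            # d ln p / dA for the model A > 0, sigma^2 = A
    return -N / (2 * A) + np.sum((x - A) ** 2) / (2 * A**2) + np.sum(x - A) / A

def score_prime(A):      # numerical derivative of the score (stands in for the Hessian)
    eps = 1e-6
    return (score(A + eps) - score(A - eps)) / (2 * eps)

A = np.mean(x)           # starting point
for _ in range(20):      # Newton-Raphson iterations
    A = A - score(A) / score_prime(A)

A_closed = -0.5 + np.sqrt(np.mean(x**2) + 0.25)           # closed-form MLE for comparison
print(A, A_closed)
```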

Least Squares

In all the previous methods we assumed that the measured signal x[n] is the sum of a true signal s[n] and a measurement error w[n] with a known probabilistic model. In the least squares method

    x[n] = s[n, θ] + e[n]

where e[n] represents the modeling and measurement errors. The objective is to minimize the LS cost:

    J(θ) = Σ_{n=0}^{N−1} (x[n] − s[n, θ])²

Remarks:
- The LS approach requires only a deterministic signal model.
- It has a broader range of applications.
- No claim about optimality can be made.
- The statistical performance cannot be assessed.

Example: Estimate the DC level of a signal. We observe x[n] = A + e[n] for n = 0, ..., N−1 and the LS criterion is:

    J(A) = Σ_{n=0}^{N−1} (x[n] − A)²

    ∂J(A)/∂A = −2 Σ_{n=0}^{N−1} (x[n] − A) = 0   ⟹   Â = (1/N) Σ_{n=0}^{N−1} x[n]

Suppose that the observation model is linear, x = Hθ + e. Then

    J(θ) = (x − Hθ)ᵀ(x − Hθ) = xᵀx − 2xᵀHθ + θᵀHᵀHθ

where H is full rank. The gradient is

    ∂J(θ)/∂θ = −2Hᵀx + 2HᵀHθ = 0   ⟹   θ̂ = (HᵀH)⁻¹Hᵀx

    J_min = (x − Hθ̂)ᵀ(x − Hθ̂) = xᵀ[I − H(HᵀH)⁻¹Hᵀ]x = xᵀ(x − Hθ̂)

where I − H(HᵀH)⁻¹Hᵀ is an idempotent matrix.

Comparison for the linear model x = Hθ + w:

    Estimator | Assumption                    | Estimate
    LSE       | no probabilistic assumptions  | θ̂ = (HᵀH)⁻¹Hᵀx
    BLUE      | w zero mean and white         | θ̂ = (HᵀH)⁻¹Hᵀx
    MLE       | w white and Gaussian          | θ̂ = (HᵀH)⁻¹Hᵀx
    MVUE      | w white and Gaussian          | θ̂ = (HᵀH)⁻¹Hᵀx

The LS criterion can be modified by including a positive definite (symmetric) weighting matrix W:

    J(θ) = (x − Hθ)ᵀW(x − Hθ)

That leads to the following estimator:

    θ̂ = (HᵀWH)⁻¹HᵀWx

and minimum LS cost:

    J_min = xᵀ[W − WH(HᵀWH)⁻¹HᵀW]x

Remark: If we take W = C⁻¹, where C is the covariance of the noise, then the weighted least squares estimator is the BLUE. However, there is no true LS-based reason for this choice.

Geometrical Interpretation

Recall the general signal model s = Hθ. If we denote the columns of H by h_i we have:

    s = [h₁ h₂ ... h_p] [θ₁, θ₂, ..., θ_p]ᵀ = Σ_{i=1}^{p} θ_i h_i

The LS approach minimizes the length of the error vector between the data and the signal model, ε = x − s:

    J(θ) = (x − Hθ)ᵀ(x − Hθ) = ‖x − Hθ‖²

The data vector can lie anywhere in R^N, while signal vectors must lie in a p-dimensional subspace of R^N, termed S^p, which is spanned by the columns of H.

Intuitively, it is clear that the LS error is minimized when ŝ = Hθ̂ is the orthogonal projection of x onto S^p. So the LS error vector ε = x − ŝ is orthogonal to all columns of H:

    Hᵀε = 0   ⟹   Hᵀ(x − Hθ̂) = 0   ⟹   θ̂ = (HᵀH)⁻¹Hᵀx

    ŝ = Hθ̂ = H(HᵀH)⁻¹Hᵀx = Px

where P is the orthogonal projection matrix.
- Note that if z ∈ Range(H), then Pz = z. Recall that Range(H) is the subspace spanned by the columns of H.
- Now, since Px ∈ S^p, then P(Px) = Px. Therefore any projection matrix is idempotent, i.e. P² = P.
- It can be verified that P is symmetric and singular (with rank p).

Recall that

    HᵀH = [ <h₁, h₁>   <h₁, h₂>   ...  <h₁, h_p>
            <h₂, h₁>   <h₂, h₂>   ...  <h₂, h_p>
            ...
            <h_p, h₁>  <h_p, h₂>  ...  <h_p, h_p> ]

If the columns of H are orthonormal, HᵀH = I. In this case we have θ̂_i = h_iᵀx, thus

    ŝ = Hθ̂ = Σ_{i=1}^{p} θ̂_i h_i = Σ_{i=1}^{p} (h_iᵀx) h_i

If a new column (a new basis function) is added to the signal model, we can easily compute the new estimate.

Model order selection: Suppose that you have a set of data and the objective is to fit a polynomial to the data. What is the best polynomial order?

Remarks:
- It is clear that by increasing the order, J_min is monotonically non-increasing.
- By choosing p = N we can perfectly fit the data to the model. However, we fit the noise as well.
- We should choose the simplest model that adequately describes the data.
- We increase the order only if the cost reduction is significant.
- If we have an idea about the expected level of J_min, we increase p to approximately attain this level.
- There is an order-recursive LS algorithm to efficiently compute a (p+1)-th order model based on a p-th order one (see Section 8.6).

Sequential Least Squares

Suppose that θ̂[N−1], based on x[N−1] = {x[0], ..., x[N−1]}, is available and a new sample x[N] arrives. The goal is to express the new estimate θ̂[N] as a function of θ̂[N−1] and x[N].

Example: Consider the LS estimate of a DC level, Â[N−1] = (1/N) Σ_{n=0}^{N−1} x[n]. We have:

    Â[N] = (1/(N+1)) Σ_{n=0}^{N} x[n] = (1/(N+1)) [N · (1/N) Σ_{n=0}^{N−1} x[n] + x[N]]
         = (N/(N+1)) Â[N−1] + (1/(N+1)) x[N]
         = Â[N−1] + (1/(N+1)) (x[N] − Â[N−1])

    new estimate = old estimate + gain × prediction error

Example (DC level in uncorrelated noise): x[n] = A + w[n] and var(w[n]) = σ_n². The WLS estimate (or BLUE) is:

    Â[N−1] = [Σ_{n=0}^{N−1} x[n]/σ_n²] / [Σ_{n=0}^{N−1} 1/σ_n²]

    Â[N] = Â[N−1] + [(1/σ_N²) / (Σ_{n=0}^{N} 1/σ_n²)] (x[N] − Â[N−1]) = Â[N−1] + K[N](x[N] − Â[N−1])

The gain factor K[N] can be reformulated as:

    K[N] = var(Â[N−1]) / (var(Â[N−1]) + σ_N²)

Consider the general linear model x = Hθ + w, where w is an uncorrelated noise with the covariance matrix C. The BLUE (or WLS) is:

    θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x   and   C_θ̂ = (HᵀC⁻¹H)⁻¹

Let us define:

    C[n] = diag(σ₀², σ₁², ..., σ_n²)

    H[n] = [ H[n−1] ]     (H[n−1] is n × p, hᵀ[n] is 1 × p)
           [ hᵀ[n]  ]

    x[n] = [x[0], x[1], ..., x[n]]ᵀ

The goal is to compute θ̂[n], based on n + 1 data samples, as a function of θ̂[n−1] and the new data x[n]. The batch estimator is:

    θ̂[n−1] = Σ[n−1] Hᵀ[n−1] C⁻¹[n−1] x[n−1]   with   Σ[n−1] = (Hᵀ[n−1] C⁻¹[n−1] H[n−1])⁻¹

Estimator Update:

    θ̂[n] = θ̂[n−1] + K[n] (x[n] − hᵀ[n]θ̂[n−1])

where

    K[n] = Σ[n−1] h[n] / (σ_n² + hᵀ[n] Σ[n−1] h[n])

Covariance Update:

    Σ[n] = (I − K[n]hᵀ[n]) Σ[n−1]

The following lemma (Matrix Inversion Lemma) is used to compute the updates:

    (A + BCD)⁻¹ = A⁻¹ − A⁻¹B[C⁻¹ + DA⁻¹B]⁻¹DA⁻¹

with A = Σ⁻¹[n−1], B = h[n], C = 1/σ_n², D = hᵀ[n].

Initialization?
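A minimal sketch of these recursions (an illustration, not part of the slides; Python/NumPy, with an invented two-parameter line-fit model and made-up noise variances, and with the common initialization choice of a rough estimate plus a large initial covariance):

```python
import numpy as np

rng = np.random.default_rng(6)
N, theta_true = 100, np.array([1.0, 0.05])        # assumed intercept and slope
var_n = 0.1 + rng.random(N)                        # per-sample noise variances sigma_n^2

n = np.arange(N)
H = np.column_stack([np.ones(N), n])               # h[n]^T = [1, n]
x = H @ theta_true + np.sqrt(var_n) * rng.standard_normal(N)

theta = np.zeros(2)                                # initialization: crude estimate ...
Sigma = 1e6 * np.eye(2)                            # ... with a very large covariance

for i in range(N):
    h = H[i]
    K = Sigma @ h / (var_n[i] + h @ Sigma @ h)     # gain vector K[n]
    theta = theta + K * (x[i] - h @ theta)         # estimator update
    Sigma = (np.eye(2) - np.outer(K, h)) @ Sigma   # covariance update

# batch WLS / BLUE solution for comparison
theta_batch = np.linalg.solve(H.T @ (H / var_n[:, None]), H.T @ (x / var_n))
print(theta, theta_batch)
```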

Constrained Least Squares: Linear constraints of the form Aθ = b can be incorporated in the LS solution. The LS criterion (with a vector λ of Lagrange multipliers) becomes:

    J_c(θ) = (x − Hθ)ᵀ(x − Hθ) + λᵀ(Aθ − b)

    ∂J_c/∂θ = −2Hᵀx + 2HᵀHθ + Aᵀλ

Setting this to zero gives

    θ̂_c = (HᵀH)⁻¹Hᵀx − (1/2)(HᵀH)⁻¹Aᵀλ = θ̂ − (1/2)(HᵀH)⁻¹Aᵀλ

where θ̂ is the unconstrained LS estimate. Now Aθ̂_c = b can be solved to find λ:

    λ = 2[A(HᵀH)⁻¹Aᵀ]⁻¹(Aθ̂ − b)

Nonlinear Least Squares: Many applications have a nonlinear observation model, s(θ) ≠ Hθ. This leads to a nonlinear optimization problem that can be solved numerically.

Newton-Raphson Method: Find a zero of the gradient of the criterion by linearizing the gradient around the current estimate:

    θ_{k+1} = θ_k − [∂g(θ)/∂θ]⁻¹ g(θ) |_{θ=θ_k}

where g(θ) = ∂sᵀ(θ)/∂θ [x − s(θ)] and

    [∂g(θ)/∂θ]_{ij} = Σ_{n=0}^{N−1} [ ∂²s[n]/∂θ_i∂θ_j · (x[n] − s[n]) − ∂s[n]/∂θ_j · ∂s[n]/∂θ_i ]

Around the solution [x − s(θ)] is small, so the first term in the Jacobian can be neglected. This makes the method equivalent to the Gauss-Newton algorithm, which is numerically more robust.

Gauss-Newton Method: Linearize the signal model around the current estimate and solve the resulting linear problem:

    s(θ) ≈ s(θ₀) + ∂s(θ)/∂θ |_{θ=θ₀} (θ − θ₀) = s(θ₀) + H(θ₀)(θ − θ₀)

The solution to the linearized problem is:

    θ̂ = [Hᵀ(θ₀)H(θ₀)]⁻¹Hᵀ(θ₀)[x − s(θ₀) + H(θ₀)θ₀]
       = θ₀ + [Hᵀ(θ₀)H(θ₀)]⁻¹Hᵀ(θ₀)[x − s(θ₀)]

If we now iterate the solution, it becomes:

    θ_{k+1} = θ_k + [Hᵀ(θ_k)H(θ_k)]⁻¹Hᵀ(θ_k)[x − s(θ_k)]

Remark: Both the Newton-Raphson and the Gauss-Newton methods can have convergence problems.

Transformation of parameters: Transform into a linear problem by seeking an invertible function α = g(θ) such that:

    s(θ) = s(g⁻¹(α)) = Hα

So the nonlinear LSE is θ̂ = g⁻¹(α̂), where α̂ = (HᵀH)⁻¹Hᵀx.

Example: s[n] = A cos(2πf₀n + φ), n = 0, 1, ..., N−1, with A and φ unknown and f₀ known.

    A cos(2πf₀n + φ) = A cos φ · cos 2πf₀n − A sin φ · sin 2πf₀n

If we let α₁ = A cos φ and α₂ = −A sin φ, then the signal model becomes linear, s = Hα. The LSE is α̂ = (HᵀH)⁻¹Hᵀx and g⁻¹(α̂) gives

    Â = √(α̂₁² + α̂₂²)   and   φ̂ = arctan(−α̂₂/α̂₁)

Separability of parameters: If the signal model is linear in some of the parameters, try to write it as s = H(α)β. The LS error can be minimized w.r.t. β:

    β̂ = [Hᵀ(α)H(α)]⁻¹Hᵀ(α)x

Substituting β̂ back, the cost becomes a function of α only:

    J(α, β̂) = xᵀ{ I − H(α)[Hᵀ(α)H(α)]⁻¹Hᵀ(α) } x

Then

    α̂ = arg max_α  xᵀ H(α)[Hᵀ(α)H(α)]⁻¹Hᵀ(α) x

which can be found by a numerical method (e.g. a brute-force grid search) over α, whose dimension is usually much smaller than the dimension of θ.

Example (Damped Exponentials)

Consider the following signal model:

    s[n] = A₁ r^n + A₂ r^{2n} + A₃ r^{3n},   n = 0, 1, ..., N−1

Let us take β = [A₁, A₂, A₃]ᵀ, so we get s = H(r)β with:

    H(r) = [ 1         1            1
             r         r²           r³
             ...
             r^{N−1}   r^{2(N−1)}   r^{3(N−1)} ]

Step 1: Maximize xᵀ H(r)[Hᵀ(r)H(r)]⁻¹Hᵀ(r) x over r to obtain r̂.
Step 2: Compute β̂ = [Hᵀ(r̂)H(r̂)]⁻¹Hᵀ(r̂)x.
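A sketch of the two-step procedure (illustrative only; Python/NumPy, with invented values of r and the amplitudes, and with the maximization over r done by a simple grid search):

```python
import numpy as np

rng = np.random.default_rng(7)
N, r_true, A_true, sigma = 60, 0.9, np.array([1.0, -0.5, 0.3]), 0.05

def Hmat(r, N):
    n = np.arange(N)[:, None]
    return r ** (n * np.array([1, 2, 3]))           # columns r^n, r^2n, r^3n

x = Hmat(r_true, N) @ A_true + sigma * rng.standard_normal(N)

def cost(r):                                        # x^T H (H^T H)^-1 H^T x
    H = Hmat(r, N)
    return x @ H @ np.linalg.solve(H.T @ H, H.T @ x)

# Step 1: grid search over r
grid = np.linspace(0.5, 0.99, 500)
r_hat = grid[np.argmax([cost(r) for r in grid])]

# Step 2: linear LS for the amplitudes at r_hat
H = Hmat(r_hat, N)
A_hat = np.linalg.solve(H.T @ H, H.T @ x)
print(r_hat, A_hat)
```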

The Bayesian Philosophy

Introduction

Classical Approach:
- Assumes θ is unknown but deterministic.
- Prior knowledge on θ cannot be used.
- The variance of the estimate may depend on θ.
- The MVUE may not exist.
- In Monte-Carlo simulations, we do M runs for each fixed θ and then compute the sample mean and variance for each θ (no averaging over θ).

Bayesian Approach:
- Assumes θ is random with a known prior PDF p(θ).
- We estimate a realization of θ based on the available data.
- The variance of the estimate does not depend on θ.
- A Bayesian estimate always exists.
- In Monte-Carlo simulations, we do M runs for randomly chosen θ and then compute the sample mean and variance over all θ values.

Classical MSE: is a function of the unknown parameter θ and cannot be used for constructing estimators:

    mse(θ̂) = E{[θ − θ̂(x)]²} = ∫ [θ − θ̂(x)]² p(x; θ) dx

Note that the E{·} is w.r.t. the PDF of x.

Bayesian MSE: is not a function of θ and can be minimized to find an estimator:

    Bmse(θ̂) = E{[θ − θ̂(x)]²} = ∫∫ [θ − θ̂(x)]² p(x, θ) dx dθ

Note that the E{·} is w.r.t. the joint PDF of x and θ.

Consider the estimation of A that minimizes the Bayesian MSE, where A is a random variable with uniform prior PDF p(A) = U[−A₀, A₀], independent of w[n].

    Bmse(Â) = ∫∫ [A − Â]² p(x, A) dx dA = ∫ [ ∫ [A − Â]² p(A|x) dA ] p(x) dx

where we used Bayes' theorem: p(x, A) = p(A|x)p(x). Since p(x) ≥ 0 for all x, we need only minimize the integral in brackets. We set its derivative with respect to Â equal to zero:

    ∂/∂Â ∫ [A − Â]² p(A|x) dA = −2 ∫ A p(A|x) dA + 2Â ∫ p(A|x) dA = 0

which results in

    Â = ∫ A p(A|x) dA = E(A|x)

i.e. the mean of the posterior PDF p(A|x).

How to compute the Bayesian MMSE estimate:

    Â = E(A|x) = ∫ A p(A|x) dA

The posterior PDF can be computed using Bayes' rule:

    p(A|x) = p(x|A)p(A) / p(x) = p(x|A)p(A) / ∫ p(x|A)p(A) dA

- p(x) is the marginal PDF defined as p(x) = ∫ p(x, A) dA.
- The integral in the denominator acts as a normalization of p(x|A)p(A) such that the integral of p(A|x) equals 1.
- p(x|A) has exactly the same form as p(x; A), which is used in classical estimation.
- The MMSE estimator always exists and can be computed by:

      Â = ∫ A p(x|A)p(A) dA / ∫ p(x|A)p(A) dA

Example: For p(A) = U[−A₀, A₀] we have:

    Â = [ ∫_{−A₀}^{A₀} A · 1/(2πσ²)^{N/2} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ]
        / [ ∫_{−A₀}^{A₀} 1/(2πσ²)^{N/2} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ]

(the constant factor 1/(2A₀) from the prior cancels).

- Before collecting the data, the mean of the prior PDF p(A) is the best estimate, while after collecting the data the best estimate is the mean of the posterior PDF p(A|x).
- The choice of p(A) is crucial for the quality of the estimation.
- Only a Gaussian prior PDF leads to a closed-form estimator.

Conclusion: For an accurate estimator choose a prior PDF that can be physically justified. For a closed-form estimator choose a Gaussian prior PDF.

Theorem (Conditional PDF of Bivariate Gaussian)

If x and y are distributed according to a bivariate Gaussian PDF,

    p(x, y) = 1/(2π √det(C)) · exp{ −(1/2) [x − E(x), y − E(y)] C⁻¹ [x − E(x), y − E(y)]ᵀ }

with covariance matrix

    C = [ var(x)      cov(x, y)
          cov(x, y)   var(y)    ]

then the conditional PDF p(y|x) is also Gaussian with

    E(y|x) = E(y) + (cov(x, y)/var(x)) [x − E(x)]

    var(y|x) = var(y) − cov²(x, y)/var(x)

Example (DC level in WGN with Gaussian prior PDF)

Consider the Bayesian model x[0] = A + w[0] (just one observation) with a prior PDF A ∼ N(μ_A, σ_A²), independent of the noise. First we compute the covariance cov(A, x[0]):

    cov(A, x[0]) = E{(A − μ_A)(x[0] − E(x[0]))} = E{(A − μ_A)(A + w[0] − μ_A − 0)}
                 = E{(A − μ_A)² + (A − μ_A)w[0]} = var(A) + 0 = σ_A²

The Bayesian MMSE estimate is the mean of the posterior PDF, which is also Gaussian:

    Â = μ_{A|x} = μ_A + (cov(A, x[0])/var(x[0])) (x[0] − μ_A) = μ_A + (σ_A²/(σ_A² + σ²)) (x[0] − μ_A)

    var(A|x) = σ²_{A|x} = σ_A² − cov²(A, x[0])/var(x[0]) = σ_A² (1 − σ_A²/(σ_A² + σ²))

Theorem (Conditional PDF of Multivariate Gaussian)

If x (of dimension k × 1) and y (of dimension l × 1) are jointly Gaussian with PDF

    p(x, y) = 1/((2π)^{(k+l)/2} √det(C)) · exp{ −(1/2) [x − E(x); y − E(y)]ᵀ C⁻¹ [x − E(x); y − E(y)] }

with covariance matrix

    C = [ C_xx  C_xy
          C_yx  C_yy ]

then the conditional PDF p(y|x) is also Gaussian with

    E(y|x) = E(y) + C_yx C_xx⁻¹ [x − E(x)]

    C_{y|x} = C_yy − C_yx C_xx⁻¹ C_xy

Bayesian Linear Model: Let the data be modeled as

    x = Hθ + w

where θ is a p × 1 random vector with prior PDF N(μ_θ, C_θ) and w is a noise vector with PDF N(0, C_w).

Since θ and w are independent and Gaussian, they are jointly Gaussian, and the posterior PDF is also Gaussian. In order to find the MMSE estimator we should compute the covariance matrices:

    C_xx = E{(Hθ + w − Hμ_θ)(Hθ + w − Hμ_θ)ᵀ} = E{(H(θ − μ_θ) + w)(H(θ − μ_θ) + w)ᵀ}
         = H E{(θ − μ_θ)(θ − μ_θ)ᵀ} Hᵀ + E(wwᵀ) = HC_θHᵀ + C_w

    C_θx = E{(θ − μ_θ)[H(θ − μ_θ) + w]ᵀ} = E{(θ − μ_θ)(θ − μ_θ)ᵀ} Hᵀ = C_θHᵀ

Theorem (Posterior PDF for the Bayesian General Linear Model)

If the observed data can be modeled as

    x = Hθ + w

where θ is a p × 1 random vector with prior PDF N(μ_θ, C_θ) and w is a noise vector with PDF N(0, C_w), then the posterior PDF p(θ|x) is Gaussian with mean

    E(θ|x) = μ_θ + C_θHᵀ(HC_θHᵀ + C_w)⁻¹(x − Hμ_θ)

and covariance

    C_{θ|x} = C_θ − C_θHᵀ(HC_θHᵀ + C_w)⁻¹HC_θ

Remark: In contrast to the classical general linear model, H need not be full rank.

Example (DC Level in WGN with Gaussian Prior PDF)

    x[n] = A + w[n],   n = 0, ..., N−1,   A ∼ N(μ_A, σ_A²),   w[n] ∼ N(0, σ²)

    E(A|x) = μ_A + σ_A² 1ᵀ(1σ_A²1ᵀ + σ²I)⁻¹(x − 1μ_A)

    var(A|x) = σ_A² − σ_A² 1ᵀ(1σ_A²1ᵀ + σ²I)⁻¹1σ_A²

We use the Matrix Inversion Lemma,

    (I + (σ_A²/σ²) 11ᵀ)⁻¹ = I − 11ᵀ / (σ²/σ_A² + N)

to get:

    E(A|x) = μ_A + σ_A²/(σ_A² + σ²/N) · (x̄ − μ_A)   and   var(A|x) = (σ²/N)σ_A² / (σ_A² + σ²/N)
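The theorem translates directly into a few lines of linear algebra. The sketch below is an illustration only (Python/NumPy; the prior mean, prior variance and noise level are assumptions) and checks the matrix formula against the closed-form expression above for the DC-level model x = 1·A + w:

```python
import numpy as np

rng = np.random.default_rng(8)
N, mu_A, var_A, var_w = 20, 0.0, 4.0, 1.0             # prior N(0, 4), noise N(0, 1)

A = mu_A + np.sqrt(var_A) * rng.standard_normal()     # draw one realization of A
H = np.ones((N, 1))
x = H @ np.array([A]) + np.sqrt(var_w) * rng.standard_normal(N)

C_theta = np.array([[var_A]])
Cw = var_w * np.eye(N)
S = H @ C_theta @ H.T + Cw                             # H C_theta H^T + C_w

A_mmse = mu_A + (C_theta @ H.T @ np.linalg.solve(S, x - H @ np.array([mu_A])))[0]
C_post = C_theta - C_theta @ H.T @ np.linalg.solve(S, H @ C_theta)

# closed form from the slide: mu_A + sigma_A^2/(sigma_A^2 + sigma^2/N) * (xbar - mu_A)
A_closed = mu_A + var_A / (var_A + var_w / N) * (x.mean() - mu_A)
print(A_mmse, A_closed, C_post[0, 0])
```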

Nuisance Parameters

Definition (Nuisance Parameter): Suppose that θ and α are unknown parameters but we are only interested in estimating θ. In this case α is called a nuisance parameter.

- In the classical approach we have to estimate both, but in the Bayesian approach we can integrate it out.
- Note that in the Bayesian approach we can find p(θ|x) from p(θ, α|x) as a marginal PDF:

      p(θ|x) = ∫ p(θ, α|x) dα

- We can also express it as

      p(θ|x) = p(x|θ)p(θ) / ∫ p(x|θ)p(θ) dθ

  where

      p(x|θ) = ∫ p(x|θ, α) p(α|θ) dα

  and, if α is independent of θ, p(x|θ) = ∫ p(x|θ, α) p(α) dα.

General Bayesian Estimators

Risk Function

A general Bayesian estimator is obtained by minimizing the Bayes risk R(θ̂) = E{C(ε)}, where ε = θ − θ̂ is the estimation error and C(ε) is a cost function.

1. Quadratic:   C(ε) = ε²,   R(θ̂) = E{ε²} = E{(θ − θ̂)²}
2. Absolute:    C(ε) = |ε|,  R(θ̂) = E{|ε|} = E{|θ − θ̂|}
3. Hit-or-Miss: R(θ̂) = E{C(ε)},   where C(ε) = 0 for |ε| < δ and C(ε) = 1 for |ε| ≥ δ

The Bayes risk is

    R(θ̂) = E{C(ε)} = ∫∫ C(θ − θ̂) p(x, θ) dθ dx = ∫ g(θ̂) p(x) dx

where

    g(θ̂) = ∫ C(θ − θ̂) p(θ|x) dθ

should be minimized for each x.

Quadratic: For this case R(θ̂) = E{(θ − θ̂)²} = Bmse(θ̂), which leads to the MMSE estimator with

    θ̂ = E(θ|x)

so θ̂ is the mean of the posterior PDF p(θ|x).

Absolute: In this case we have:

    g(θ̂) = ∫ |θ − θ̂| p(θ|x) dθ = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|x) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|x) dθ

By setting the derivative of g(θ̂) to zero and using Leibniz's rule, we get:

    ∫_{−∞}^{θ̂} p(θ|x) dθ = ∫_{θ̂}^{∞} p(θ|x) dθ

so θ̂ is the median of the posterior PDF.

Hit-or-Miss: In this case we have

    g(θ̂) = ∫_{−∞}^{θ̂−δ} p(θ|x) dθ + ∫_{θ̂+δ}^{∞} p(θ|x) dθ = 1 − ∫_{θ̂−δ}^{θ̂+δ} p(θ|x) dθ

For arbitrarily small δ, this is minimized by placing θ̂ at the maximum of p(θ|x), i.e. the mode of the posterior PDF. This estimator is called the maximum a posteriori (MAP) estimator.

Remark: For a unimodal and symmetric posterior PDF (e.g. a Gaussian PDF), the mean, the mode and the median are the same.

Extension to the vector parameter case: In general, as in the scalar case, we can write:

    θ̂_i = E(θ_i|x) = ∫ θ_i p(θ_i|x) dθ_i,   i = 1, 2, ..., p

Then p(θ_i|x) can be computed as a marginal conditional PDF. For example:

    θ̂₁ = ∫ θ₁ [ ∫ ... ∫ p(θ|x) dθ₂ ... dθ_p ] dθ₁ = ∫ θ₁ p(θ₁|x) dθ₁

In vector form we have:

    θ̂ = [ ∫ θ₁ p(θ|x) dθ , ∫ θ₂ p(θ|x) dθ , ... , ∫ θ_p p(θ|x) dθ ]ᵀ = ∫ θ p(θ|x) dθ = E(θ|x)

Similarly

    Bmse(θ̂_i) = ∫ [C_{θ|x}]_{ii} p(x) dx

Properties of the MMSE estimator:
- For the Bayesian linear model, poor prior knowledge leads to the MVU estimator. Writing the estimator as

      θ̂ = E(θ|x) = μ_θ + (C_θ⁻¹ + HᵀC_w⁻¹H)⁻¹HᵀC_w⁻¹(x − Hμ_θ)

  and letting the prior knowledge vanish, C_θ⁻¹ → 0, we obtain

      θ̂ → (HᵀC_w⁻¹H)⁻¹HᵀC_w⁻¹x

- It commutes over affine transformations: suppose that α = Aθ + b, then the MMSE estimator for α is α̂ = Aθ̂ + b.
- It enjoys the additive property for independent data sets. Assume that θ, x₁, x₂ are jointly Gaussian with x₁ and x₂ independent:

      θ̂ = E(θ) + C_{θx₁}C_{x₁x₁}⁻¹[x₁ − E(x₁)] + C_{θx₂}C_{x₂x₂}⁻¹[x₂ − E(x₂)]

Maximum A Posteriori Estimation

In the MAP estimation approach we have:

    θ̂ = arg max_θ p(θ|x)   where   p(θ|x) = p(x|θ)p(θ)/p(x)

An equivalent maximization is:

    θ̂ = arg max_θ p(x|θ)p(θ)   or   θ̂ = arg max_θ [ln p(x|θ) + ln p(θ)]

If the prior p(θ) is nearly flat over the region where p(x|θ) is concentrated (for large data lengths), we can remove p(θ) to obtain the Bayesian Maximum Likelihood estimator:

    θ̂ = arg max_θ p(x|θ)

Example (DC Level in WGN with Uniform Prior PDF)

The MMSE estimator cannot be obtained in explicit form due to the need to evaluate the following integrals:

    Â = [ ∫_{−A₀}^{A₀} A exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ] / [ ∫_{−A₀}^{A₀} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ]

The MAP estimator, in contrast, is explicit:

    Â = −A₀  if x̄ < −A₀,    x̄  if −A₀ ≤ x̄ ≤ A₀,    A₀  if x̄ > A₀

Remark: The main advantage of the MAP estimator is that for non jointly Gaussian PDFs it may lead to explicit solutions or less computational effort.

Extension to the vector parameter case:

    θ̂_i = arg max_{θ_i} p(θ_i|x),   where p(θ_i|x) is the marginal posterior, e.g. p(θ₁|x) = ∫ ... ∫ p(θ|x) dθ₂ ... dθ_p

An alternative is the following vector MAP estimator that maximizes the joint conditional PDF:

    θ̂ = arg max_θ p(θ|x) = arg max_θ p(x|θ)p(θ)

- In general, as N → ∞, the MAP estimator converges to the Bayesian MLE.
- If the posterior PDF is Gaussian, the mode is identical to the mean, therefore the MAP estimator is identical to the MMSE estimator.
- The invariance property of ML theory does not hold for the MAP estimator.

Performance Description

- In classical estimation the mean and the variance of the estimate (or its PDF) indicate the performance of the estimator.
- In the Bayesian approach the PDF of the estimate is different for each realization of θ, so a good estimator should perform well for every possible value of θ.
- The estimation error ε = θ − θ̂ is a function of two random variables (θ and x).
- The mean and the variance of the estimation error, w.r.t. both random variables, indicate the performance of the estimator.
- The mean value of the estimation error is zero, so the estimates are unbiased (in the Bayesian sense):

      E_{x,θ}(θ − θ̂) = E_x[E_{θ|x}(θ) − E_{θ|x}(θ̂|x)] = E_x(0) = 0

- The variance of the estimation error is the Bmse:

      var(ε) = E_{x,θ}[(θ − θ̂)²] = Bmse(θ̂)

The vector of estimation errors is ε = θ − θ̂ and has zero mean. Its covariance matrix is:

    M_θ̂ = E_{x,θ}(εεᵀ) = E_{x,θ}{[θ − E(θ|x)][θ − E(θ|x)]ᵀ} = E_x{ E_{θ|x}[θ − E(θ|x)][θ − E(θ|x)]ᵀ } = E_x(C_{θ|x})

If x and θ are jointly Gaussian, we have:

    M_θ̂ = C_{θ|x} = C_θθ − C_θx C_xx⁻¹ C_xθ

since C_{θ|x} does not depend on x. For a Bayesian linear model we have:

    M_θ̂ = C_{θ|x} = C_θ − C_θHᵀ(HC_θHᵀ + C_w)⁻¹HC_θ

In this case ε is a linear transformation of x and θ and thus is Gaussian. Therefore:

    ε ∼ N(0, M_θ̂)

Deconvolution Problem: Estimate a signal transmitted through a channel with known impulse response:

    x[n] = Σ_{m=0}^{n_s−1} h[n − m] s[m] + w[n],   n = 0, 1, ..., N−1

where s[n] is a WSS Gaussian process with known ACF and w[n] is WGN with variance σ². In matrix form x = Hs + w with

    H = [ h[0]      0         ...  0
          h[1]      h[0]      ...  0
          ...
          h[N−1]    h[N−2]    ...  h[N−n_s] ]   (N × n_s),

    s = [s[0], s[1], ..., s[n_s−1]]ᵀ,   w = [w[0], w[1], ..., w[N−1]]ᵀ

The MMSE estimate is:

    ŝ = C_s Hᵀ(HC_s Hᵀ + σ²I)⁻¹x

Consider H = I (no filtering), so the Bayesian model becomes x = s + w.
- Classical approach: the MVU estimator is ŝ = (HᵀH)⁻¹Hᵀx = x.
- Bayesian approach: the MMSE estimator is ŝ = C_s(C_s + σ²I)⁻¹x.
- Scalar case: we estimate s[0] based on x[0]:

      ŝ[0] = r_ss[0]/(r_ss[0] + σ²) · x[0] = η/(η + 1) · x[0]

  where η = r_ss[0]/σ² is the signal-to-noise ratio.
- Signal as a realization of an autoregressive process: consider a first-order AR process, s[n] = −a s[n−1] + u[n], where u[n] is WGN with variance σ_u². The ACF of s is:

      r_ss[k] = σ_u²/(1 − a²) · (−a)^{|k|}
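A sketch of this Wiener-type smoother for the no-filtering case H = I (an illustration, not part of the slides; Python/NumPy, with invented AR(1) parameters a and σ_u²): the ACF above fills the covariance matrix C_s and the estimator is ŝ = C_s(C_s + σ²I)⁻¹x.

```python
import numpy as np

rng = np.random.default_rng(9)
N, a, var_u, var_w = 200, -0.95, 0.1, 1.0        # AR(1): s[n] = -a s[n-1] + u[n]

# generate the AR(1) signal (started from zero for simplicity) and the noisy observation x = s + w
s = np.zeros(N)
for n in range(1, N):
    s[n] = -a * s[n - 1] + np.sqrt(var_u) * rng.standard_normal()
x = s + np.sqrt(var_w) * rng.standard_normal(N)

# covariance matrix of s from the ACF  r_ss[k] = var_u/(1 - a^2) * (-a)^|k|
k = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
Cs = var_u / (1 - a**2) * (-a) ** k

s_hat = Cs @ np.linalg.solve(Cs + var_w * np.eye(N), x)   # MMSE (Wiener) estimate
print("raw MSE     :", np.mean((x - s) ** 2))
print("smoothed MSE:", np.mean((s_hat - s) ** 2))
```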

Linear Bayesian Estimators

Introduction

Problems with general Bayesian estimators:
- difficult to determine in closed form,
- need intensive computations,
- involve multidimensional integration (MMSE) or multidimensional maximization (MAP),
- can be determined in closed form only under the jointly Gaussian assumption.

What can we do if the joint PDF is not Gaussian or unknown? Keep the MMSE criterion and restrict the estimator to be linear. This leads to the Linear MMSE (LMMSE) Estimator:
- a suboptimal estimator that can be easily implemented;
- it needs only the first and second moments of the joint PDF;
- it is analogous to the BLUE in classical estimation;
- in practice, this estimator is termed the Wiener filter.

Goal: Estimate θ, given the data vector x. Assume that only the first two moments of the joint PDF of x and θ are available:

    [ E(θ) ]         [ C_θθ  C_θx ]
    [ E(x) ]   and   [ C_xθ  C_xx ]

LMMSE Estimator: Take the class of all affine estimators

    θ̂ = Σ_{n=0}^{N−1} a_n x[n] + a_N = aᵀx + a_N

and minimize the Bayesian MSE, Bmse(θ̂) = E[(θ − θ̂)²].

Computing a_N: Let us differentiate the Bmse with respect to a_N:

    ∂/∂a_N E[(θ − aᵀx − a_N)²] = −2E(θ − aᵀx − a_N)

Setting this equal to zero gives a_N = E(θ) − aᵀE(x).

Minimize the Bayesian MSE:

    Bmse(θ̂) = E[(θ − θ̂)²] = E[(θ − aᵀx − E(θ) + aᵀE(x))²] = E{[aᵀ(x − E(x)) − (θ − E(θ))]²}
            = E[aᵀ(x − E(x))(x − E(x))ᵀa] − E[aᵀ(x − E(x))(θ − E(θ))] − E[(θ − E(θ))(x − E(x))ᵀa] + E[(θ − E(θ))²]
            = aᵀC_xx a − aᵀC_xθ − C_θx a + C_θθ

This can be minimized by setting the gradient to zero:

    ∂Bmse(θ̂)/∂a = 2C_xx a − 2C_xθ = 0

which results in a = C_xx⁻¹C_xθ and leads to (note that C_xθ = C_θxᵀ):

    θ̂ = aᵀx + a_N = C_θx C_xx⁻¹x + E(θ) − C_θx C_xx⁻¹E(x) = E(θ) + C_θx C_xx⁻¹(x − E(x))

The minimum Bayesian MSE is:

    Bmse(θ̂) = C_θx C_xx⁻¹C_xx C_xx⁻¹C_xθ − C_θx C_xx⁻¹C_xθ − C_θx C_xx⁻¹C_xθ + C_θθ = C_θθ − C_θx C_xx⁻¹C_xθ

Remarks:
- The LMMSE estimator coincides with the MMSE estimator for a jointly Gaussian PDF.
- The LMMSE estimator relies on the correlation between random variables.
- If a parameter is uncorrelated with the data (but nonlinearly dependent on it), it cannot be estimated with an LMMSE estimator.

Example (DC Level in WGN with Uniform Prior PDF)

The data model is x[n] = A + w[n], n = 0, 1, ..., N−1, where A ∼ U[−A₀, A₀] is independent of w[n] (WGN with variance σ²). We have E(A) = 0 and E(x) = 0. The covariances are:

    C_xx = σ_A² 11ᵀ + σ²I,   C_Ax = σ_A² 1ᵀ

so

    Â = C_Ax C_xx⁻¹x = σ_A² 1ᵀ(σ_A² 11ᵀ + σ²I)⁻¹x = σ_A²/(σ_A² + σ²/N) · x̄

where

    σ_A² = E(A²) = ∫_{−A₀}^{A₀} A²/(2A₀) dA = A₀²/3

which gives

    Â = (A₀²/3)/(A₀²/3 + σ²/N) · x̄

Example (DC Level in WGN with Uniform Prior PDF)

Comparison of the different Bayesian estimators:

    MMSE:  Â = [ ∫_{−A₀}^{A₀} A exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ]
               / [ ∫_{−A₀}^{A₀} exp(−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²) dA ]

    MAP:   Â = −A₀ if x̄ < −A₀,   x̄ if −A₀ ≤ x̄ ≤ A₀,   A₀ if x̄ > A₀

    LMMSE: Â = (A₀²/3)/(A₀²/3 + σ²/N) · x̄
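The three estimators can be compared numerically. In the sketch below (an illustration, not part of the slides; Python/NumPy, with invented values of A₀, σ and N) the MMSE integrals are evaluated on a grid, the MAP estimator is the clipped sample mean, and the LMMSE estimator is the shrinkage formula above:

```python
import numpy as np

rng = np.random.default_rng(10)
A0, sigma, N = 1.0, 1.0, 10
A_true = rng.uniform(-A0, A0)
x = A_true + sigma * rng.standard_normal(N)
xbar = x.mean()

# MMSE: posterior mean over the uniform prior, by numerical integration on a grid
A_grid = np.linspace(-A0, A0, 2001)
log_lik = -0.5 / sigma**2 * ((x[:, None] - A_grid[None, :]) ** 2).sum(axis=0)
w = np.exp(log_lik - log_lik.max())          # unnormalized posterior on the grid
A_mmse = np.trapz(A_grid * w, A_grid) / np.trapz(w, A_grid)

A_map = np.clip(xbar, -A0, A0)               # MAP: clipped sample mean
A_lmmse = (A0**2 / 3) / (A0**2 / 3 + sigma**2 / N) * xbar   # LMMSE shrinkage

print(A_true, A_mmse, A_map, A_lmmse)
```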

Geometrical Interpretation

Vector space of random variables : The set of scalar zero mean random

variables is a vector space.

The zero length vector of the set is a RV with zero variance.

For any real scalar a, ax is another zero mean RV in the set.

For two RVs x and y , x + y = y + x is a RV in the set.

For two RVs x and y , the inner product is : < x, y >= E (xy )

Two RVs x and y are orthogonal if :

< a1 x1 + a2 x2 , y >= a1 < x1 , y > +a2 < x2 , y >

E [(a1 x1 + a2 x2 )y ] = a1 E (x1 y ) + a2 E (x2 y )

The projection of y on x is :

(Linear Bayesian Estimators)

< y, x >

E (yx)

x=

x

2

x

x2

Estimation Theory

Spring 2013

126 / 152

Geometrical Interpretation

The LMMSE estimator can be determined using the vector space viewpoint.
We have :

    θ̂ = Σ_{n=0}^{N-1} a_n x[n]

θ̂ belongs to the subspace spanned by x[0], x[1], . . . , x[N-1].

θ is not in this subspace.

A good estimator will minimize the MSE :

    E[(θ - θ̂)²] = ||ε||²

where ε = θ - θ̂ is the error vector.

Clearly the length of the error vector is minimized when ε is orthogonal
to the subspace spanned by x[0], x[1], . . . , x[N-1], or to each data
sample :

    E[(θ - θ̂) x[n]] = 0    for n = 0, 1, . . . , N-1

(Linear Bayesian Estimators) Estimation Theory Spring 2013 127 / 152

Geometrical Interpretation

The LMMSE estimator can be determined by solving the following equations :

    E[ ( θ - Σ_{m=0}^{N-1} a_m x[m] ) x[n] ] = 0        n = 0, 1, . . . , N-1

or

    Σ_{m=0}^{N-1} a_m E(x[m] x[n]) = E(θ x[n])          n = 0, 1, . . . , N-1

In matrix form :

    [ E(x²[0])        E(x[0]x[1])     . . .  E(x[0]x[N-1])  ] [ a_0     ]   [ E(θ x[0])   ]
    [ E(x[1]x[0])     E(x²[1])        . . .  E(x[1]x[N-1])  ] [ a_1     ] = [ E(θ x[1])   ]
    [     .               .                       .         ] [   .     ]   [     .       ]
    [ E(x[N-1]x[0])   E(x[N-1]x[1])   . . .  E(x²[N-1])     ] [ a_{N-1} ]   [ E(θ x[N-1]) ]

Therefore

    C_xx a = C_xθ (= C_θx^T)        a = C_xx^{-1} C_xθ        θ̂ = a^T x = C_θx C_xx^{-1} x

(Linear Bayesian Estimators) Estimation Theory Spring 2013 128 / 152

We want to estimate θ = [θ1, θ2, . . . , θp]^T with a linear estimator that
minimizes the Bayesian MSE for each element :

    θ̂_i = Σ_{n=0}^{N-1} a_{in} x[n] + a_{iN}        minimize  Bmse(θ̂_i) = E[(θ_i - θ̂_i)²]

Therefore :

    θ̂_i = E(θ_i) + C_{θ_i x} C_xx^{-1} ( x - E(x) )        i = 1, 2, . . . , p
                    (1×N)     (N×N)      (N×1)

and, stacking the p components,

    θ̂ = E(θ) + C_θx C_xx^{-1} ( x - E(x) )
                (p×N)  (N×N)     (N×1)

and similarly

    Bmse(θ̂_i) = [M_θ̂]_ii        where   M_θ̂ = C_θθ - C_θx C_xx^{-1} C_xθ
                                                (p×p)   (p×N)  (N×N)   (N×p)

Estimation Theory Spring 2013 129 / 152

Theorem (Bayesian Gauss-Markov Theorem)

If the data are described by the Bayesian linear model form

    x = Hθ + w

where θ is a p × 1 random vector with mean E(θ) and covariance C_θθ, and w
is a noise vector with zero mean and covariance C_w that is uncorrelated
with θ (the joint PDF p(θ, w) is otherwise arbitrary), then the LMMSE
estimator of θ is :

    θ̂ = E(θ) + C_θθ H^T ( H C_θθ H^T + C_w )^{-1} ( x - H E(θ) )

The performance of the estimator is measured by ε = θ - θ̂, whose mean is
zero and whose covariance matrix is

    C_ε = M_θ̂ = C_θθ - C_θθ H^T ( H C_θθ H^T + C_w )^{-1} H C_θθ
              = ( C_θθ^{-1} + H^T C_w^{-1} H )^{-1}

(Linear Bayesian Estimators) Estimation Theory Spring 2013 130 / 152
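A minimal numerical check of the theorem (added for illustration; the regressor matrix H, the prior covariance and the noise covariance below are made-up values) computes the LMMSE estimate and both expressions for the error covariance, and verifies that the two expressions agree.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 20, 2

# Bayesian linear model x = H theta + w (illustrative: DC level + slope)
H = np.column_stack([np.ones(N), np.arange(N)])
mu_theta = np.zeros(p)
C_theta = np.diag([1.0, 0.1])
C_w = 0.5 * np.eye(N)

theta = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta + rng.multivariate_normal(np.zeros(N), C_w)

# theta_hat = E(theta) + C_theta H^T (H C_theta H^T + C_w)^{-1} (x - H E(theta))
S = H @ C_theta @ H.T + C_w
gain = C_theta @ H.T @ np.linalg.inv(S)
theta_hat = mu_theta + gain @ (x - H @ mu_theta)

# Error covariance, in the two equivalent forms given by the theorem
C_eps1 = C_theta - gain @ H @ C_theta
C_eps2 = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H)

print("theta      :", theta)
print("theta_hat  :", theta_hat)
print("forms agree:", np.allclose(C_eps1, C_eps2))
```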

Objective : Given the estimate Â[N-1] based on x[0], . . . , x[N-1], update
the estimate when a new sample x[N] becomes available.

Assume that both A and w[n] have zero mean :

    x[n] = A + w[n],        Â[N-1] = [ σ_A² / (σ_A² + σ²/N) ] x̄

Estimator Update :

    Â[N] = Â[N-1] + K[N] ( x[N] - Â[N-1] )

where

    K[N] = Bmse(Â[N-1]) / ( Bmse(Â[N-1]) + σ² )

    Bmse(Â[N]) = (1 - K[N]) Bmse(Â[N-1])

(Linear Bayesian Estimators) Estimation Theory Spring 2013 131 / 152

Vector space view : If two observations are orthogonal, the LMMSE estimate
of θ is the sum of the projections of θ on each observation.

1  Find the LMMSE estimator of A based on x[0], yielding Â[0] :

       Â[0] = ( E(A x[0]) / E(x²[0]) ) x[0] = ( σ_A² / (σ_A² + σ²) ) x[0]

2  Predict x[1] from x[0] and form the innovation :

       x̂[1|0] = ( E(x[0]x[1]) / E(x²[0]) ) x[0] = ( σ_A² / (σ_A² + σ²) ) x[0]

       x̃[1] = x[1] - x̂[1|0]        (this error vector is orthogonal to x[0])

3  Add to Â[0] the LMMSE estimator of A based on the innovation
   (the projection of A on x̃[1]) :

       Â[1] = Â[0] + ( E(A x̃[1]) / E(x̃²[1]) ) x̃[1] = Â[0] + K[1] ( x[1] - x̂[1|0] )

(Linear Bayesian Estimators) Estimation Theory Spring 2013 132 / 152

Basic Idea : Generate a sequence of orthogonal RVs, namely, the innovations :

    x̃[0] = x[0],   x̃[1] = x[1] - x̂[1|0],   x̃[2] = x[2] - x̂[2|0,1],   . . . ,
    x̃[n] = x[n] - x̂[n|0,1,...,n-1]

Then

    Â[N] = Σ_{n=0}^{N} K[n] x̃[n] = Â[N-1] + K[N] x̃[N]

where

    K[n] = E(A x̃[n]) / E(x̃²[n])        and        x̃[N] = x[N] - Â[N-1]

so that

    K[N] = Bmse(Â[N-1]) / ( σ² + Bmse(Â[N-1]) )

    Bmse(Â[N]) = (1 - K[N]) Bmse(Â[N-1])

(Linear Bayesian Estimators) Estimation Theory Spring 2013 133 / 152
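The scalar recursion above is only a few lines of code. The sketch below (an added illustration; the prior variance var_A and the noise variance sigma2 are arbitrary) initializes with the prior mean and variance and applies the gain and Bmse updates sample by sample.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, var_A = 1.0, 2.0            # noise variance and prior variance of A (zero prior mean)
A = rng.normal(scale=np.sqrt(var_A))
x = A + rng.normal(scale=np.sqrt(sigma2), size=50)

A_hat, bmse = 0.0, var_A            # initialize with the prior mean and prior variance
for xn in x:
    K = bmse / (bmse + sigma2)              # gain K[N]
    A_hat = A_hat + K * (xn - A_hat)        # update with the innovation x[N] - A_hat[N-1]
    bmse = (1 - K) * bmse                   # Bmse[N] = (1 - K[N]) Bmse[N-1]

print(f"true A = {A:.3f}, sequential LMMSE = {A_hat:.3f}, final Bmse = {bmse:.4f}")
```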

Consider the general Bayesian linear model x = Hθ + w, where w is an
uncorrelated noise with the diagonal covariance matrix C_w. Let's define :

    C_w[n] = diag(σ_0², σ_1², . . . , σ_n²)

    H[n] = [ H[n-1] ]  (n × p)
           [ h^T[n] ]  (1 × p)

    x[n] = [ x[0], x[1], . . . , x[n] ]^T

Estimator Update :

    θ̂[n] = θ̂[n-1] + K[n] ( x[n] - h^T[n] θ̂[n-1] )

where

    K[n] = M[n-1] h[n] / ( σ_n² + h^T[n] M[n-1] h[n] )

Minimum MSE Matrix Update :

    M[n] = ( I - K[n] h^T[n] ) M[n-1]

(Linear Bayesian Estimators) Estimation Theory Spring 2013 134 / 152

Remarks :

For the initialization of the sequential LMMSE estimator, we can use the
prior information :

    θ̂[-1] = E(θ),        M[-1] = C_θθ

The sequential LMMSE estimator has the same form as the sequential LSE,
although the approaches are fundamentally different.

No matrix inversion is required.

The gain factor K[n] weights confidence in the new data (measured by σ_n²)
against all previous data (summarized by M[n-1]).

Estimation Theory Spring 2013 135 / 152
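A compact sketch of this sequential update (added for illustration; the two-parameter regressor h[n] = [1, n] and the per-sample noise variances are invented for the example) follows the three equations above, starting from the prior mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
p, N = 2, 30

# Sequential LMMSE for x[n] = h^T[n] theta + w[n] (illustrative model: level + slope)
theta = rng.multivariate_normal(np.zeros(p), np.diag([1.0, 0.1]))
sigma2_n = np.full(N, 0.4)                       # per-sample noise variances
h = np.column_stack([np.ones(N), np.arange(N)])  # h^T[n] is the n-th row

theta_hat = np.zeros(p)          # initialization with the prior mean E(theta)
M = np.diag([1.0, 0.1])          # and M[-1] = C_theta

for n in range(N):
    hn = h[n]
    xn = hn @ theta + rng.normal(scale=np.sqrt(sigma2_n[n]))
    K = M @ hn / (sigma2_n[n] + hn @ M @ hn)            # gain K[n] (p x 1)
    theta_hat = theta_hat + K * (xn - hn @ theta_hat)   # estimator update
    M = (np.eye(p) - np.outer(K, hn)) @ M               # MSE matrix update

print("theta     :", theta)
print("theta_hat :", theta_hat)
```

Note that, as remarked above, only scalar divisions appear; no matrix inversion is needed.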

Wiener Filtering

Consider the signal model : x[n] = s[n] + w[n],    n = 0, 1, . . . , N-1

where the signal and noise are zero mean with known covariance matrices.
There are three main problems concerning the Wiener filters :

Filtering :   Estimate θ = s[n] (scalar) based on the data set
              x = [x[0], x[1], . . . , x[n]]^T. The signal sample is
              estimated based on the present and past data only.

Smoothing :   Estimate θ = s = [s[0], s[1], . . . , s[N-1]]^T (vector) based
              on the data set x = [x[0], x[1], . . . , x[N-1]]^T. The signal
              sample is estimated based on the present, past and future data.

Prediction :  Estimate θ = x[N-1+ℓ] for a positive integer ℓ based on the
              data set x = [x[0], x[1], . . . , x[N-1]]^T.

Remark : All these problems are solved using the LMMSE estimator

    θ̂ = C_θx C_xx^{-1} x        with the minimum Bmse :   M_θ̂ = C_θθ - C_θx C_xx^{-1} C_xθ

(Linear Bayesian Estimators) Estimation Theory Spring 2013 136 / 152

Estimate θ = s = [s[0], s[1], . . . , s[N-1]]^T (vector) based on the data
set x = [x[0], x[1], . . . , x[N-1]]^T. Here

    C_xx = R_xx = R_ss + R_ww        and        C_θx = R_ss

Therefore :

    ŝ = C_sx C_xx^{-1} x = R_ss ( R_ss + R_ww )^{-1} x = Wx

The matrix W can be interpreted as an FIR filter. Let's define :

    W = [ a_0, a_1, . . . , a_{N-1} ]^T

where a_n^T is the (n+1)th row of W. Let's also define
h^(n) = [ h^(n)[0], h^(n)[1], . . . , h^(n)[N-1] ], which is just the vector
a_n flipped upside down. Then

    ŝ[n] = a_n^T x = Σ_{k=0}^{N-1} h^(n)[N-1-k] x[k]

(Linear Bayesian Estimators) Estimation Theory Spring 2013 137 / 152
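As an illustration of the smoothing equation ŝ = R_ss(R_ss + R_ww)^{-1} x (added in editing; the AR(1) signal model and its covariance r_ss[k] = σ_s² a^{|k|} are assumptions made only to obtain concrete covariance matrices), the sketch below builds the two covariance matrices and applies the smoother to one noisy realization.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200

# Zero-mean AR(1) signal in white noise (an illustrative choice of covariances)
a, var_u, var_w = 0.95, 0.1, 1.0
var_s = var_u / (1 - a ** 2)                 # stationary variance of s[n]
s = np.zeros(N)
s[0] = rng.normal(scale=np.sqrt(var_s))
for n in range(1, N):
    s[n] = a * s[n - 1] + rng.normal(scale=np.sqrt(var_u))
x = s + rng.normal(scale=np.sqrt(var_w), size=N)

# Rss[i, j] = var_s * a^{|i-j|} for the stationary AR(1) process, Rww = var_w * I
idx = np.arange(N)
Rss = var_s * a ** np.abs(idx[:, None] - idx[None, :])
Rww = var_w * np.eye(N)

# Wiener smoother: s_hat = Rss (Rss + Rww)^{-1} x
s_hat = Rss @ np.linalg.solve(Rss + Rww, x)

print("raw MSE     :", np.mean((x - s) ** 2))
print("smoothed MSE:", np.mean((s_hat - s) ** 2))
```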

Estimate θ = s[n] based on the data set x = [x[0], x[1], . . . , x[n]]^T.
We have again C_xx = R_xx = R_ss + R_ww (now (n+1) × (n+1)) and

    C_θx = E( s[n] [x[0], x[1], . . . , x[n]] )
         = [ r_ss[n], r_ss[n-1], . . . , r_ss[0] ] = (r'_ss)^T

Therefore a = R_xx^{-1} r'_ss. Defining h^(n) as the vector a flipped upside
down, we have :

    ŝ[n] = a^T x = Σ_{k=0}^{n} h^(n)[n-k] x[k]

This represents a time-varying, causal FIR filter. The impulse response can
be computed as :

    ( R_ss + R_ww ) a = r'_ss

(Linear Bayesian Estimators) Estimation Theory Spring 2013 138 / 152

Estimate θ = x[N-1+ℓ] based on the data set
x = [x[0], x[1], . . . , x[N-1]]^T. We have again C_xx = R_xx and

    C_θx = [ r_xx[N-1+ℓ], r_xx[N-2+ℓ], . . . , r_xx[ℓ] ] = (r'_xx)^T

Therefore :

    x̂[N-1+ℓ] = (r'_xx)^T R_xx^{-1} x = a^T x = Σ_{k=0}^{N-1} h[N-k] x[k]

where h is the vector a flipped upside down. The impulse response can be
computed as :

    R_xx h = r'_xx

For ℓ = 1, this is equivalent to one-step linear prediction of an
Auto-Regressive process.

(Linear Bayesian Estimators) Estimation Theory Spring 2013 139 / 152
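The sketch below (an added illustration; the AR(1) process, its autocorrelation r_xx[k] = σ_x² a^{|k|}, and the window length are assumptions chosen so that the answer can be checked) solves R_xx h = r'_xx for a one-step predictor and compares it with the AR(1) closed form a x[N-1].

```python
import numpy as np

rng = np.random.default_rng(6)
a, var_u, Nw = 0.9, 1.0, 8           # AR(1) coefficient, driving variance, window length
var_x = var_u / (1 - a ** 2)         # stationary variance of x[n]

# One realization of the (noiseless) AR(1) process
x = np.zeros(Nw + 1)
for n in range(1, Nw + 1):
    x[n] = a * x[n - 1] + rng.normal(scale=np.sqrt(var_u))

# One-step Wiener predictor (l = 1): solve Rxx h = r'xx, then x_hat[N] = coeff^T x
k = np.arange(Nw)
Rxx = var_x * a ** np.abs(k[:, None] - k[None, :])
r = var_x * a ** np.arange(Nw, 0, -1)        # [rxx[N], rxx[N-1], ..., rxx[1]]
coeff = np.linalg.solve(Rxx, r)              # predictor weights on x[0..N-1]

x_pred = coeff @ x[:Nw]
print("predicted:", x_pred, " AR(1) closed form a*x[N-1]:", a * x[Nw - 1], " actual:", x[Nw])
```

For this model the solution reduces to weighting only the most recent sample, which is exactly the one-step AR prediction mentioned above.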

Kalman Filters

(Kalman Filters) Estimation Theory Spring 2013 140 / 152

Introduction

The Kalman filter is a generalization of the Wiener filter.

In the Wiener filter we estimate s[n] based on the noisy observation
vector x[n]. We assume that s[n] is a stationary random process with
known mean and covariance matrix.

In the Kalman filter we assume that s[n] is a nonstationary random
process whose mean and covariance matrix vary according to a known
dynamical model.

The Kalman filter is a sequential MMSE estimator of s[n]. If the signal
and noise are jointly Gaussian, then the Kalman filter is the optimal
MMSE estimator; if not, it is the optimal LMMSE estimator.

The Kalman filter has many applications in control theory for state
estimation.

It can be generalized to vector signals and noise (in contrast to the
Wiener filter).

(Kalman Filters) Estimation Theory Spring 2013 141 / 152

Consider a first order dynamical model :

    s[n] = a s[n-1] + u[n]        n ≥ 0

where u[n] is WGN with variance σ_u² and the initial condition
s[-1] ~ N(μ_s, σ_s²) is independent of u[n] for all n ≥ 0. It can be shown
that :

    s[n] = a^{n+1} s[-1] + Σ_{k=0}^{n} a^k u[n-k],        E(s[n]) = a^{n+1} μ_s

and

    C_s[m, n] = E[ ( s[m] - E(s[m]) ) ( s[n] - E(s[n]) ) ]
              = a^{m+n+2} σ_s² + σ_u² a^{m-n} Σ_{k=0}^{n} a^{2k}        for m ≥ n ≥ 0

(Kalman Filters) Estimation Theory Spring 2013 142 / 152

Theorem (Vector Gauss-Markov Model)

The Gauss-Markov model for a p × 1 vector signal s[n] is :

    s[n] = A s[n-1] + B u[n]        n ≥ 0

where A (p × p) has all its eigenvalues less than 1 in magnitude and B is
p × r. The r × 1 vector u[n] ~ N(0, Q) and the initial condition is a
p × 1 random vector s[-1] ~ N(μ_s, C_s) that is independent of the u[n]'s.
Then, the signal process is Gaussian with mean E(s[n]) = A^{n+1} μ_s and
covariance :

    C_s[m, n] = A^{m+1} C_s (A^{n+1})^T + Σ_{k=m-n}^{m} A^k B Q B^T (A^{n-m+k})^T        for m ≥ n

With C[n] = C_s[n, n], the mean and covariance propagation equations are :

    E(s[n]) = A E(s[n-1]),        C[n] = A C[n-1] A^T + B Q B^T

(Kalman Filters) Estimation Theory Spring 2013 143 / 152
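The propagation equations are easy to iterate directly, as in the short sketch below (an added illustration; the particular A, B, Q and initial moments are arbitrary).

```python
import numpy as np

# Propagate the mean and covariance of the vector Gauss-Markov model
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
Q = np.array([[0.2]])
mu = np.array([1.0, -1.0])          # E(s[-1])
C = np.eye(2)                       # C_s, covariance of s[-1]

for n in range(5):
    mu = A @ mu                     # E(s[n]) = A E(s[n-1])
    C = A @ C @ A.T + B @ Q @ B.T   # C[n] = A C[n-1] A^T + B Q B^T
    print(n, mu, np.diag(C))
```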

Data model :

    State equation :          s[n] = a s[n-1] + u[n]
    Observation equation :    x[n] = s[n] + w[n]

Assumptions :

    u[n] is zero mean Gaussian noise with independent samples and
    E(u²[n]) = σ_u².

    w[n] is zero mean Gaussian noise with independent samples and
    E(w²[n]) = σ_n² (time-varying variance).

    s[-1] ~ N(μ_s, σ_s²) (for simplicity we suppose that μ_s = 0).

    s[-1], u[n] and w[n] are independent.

Objective : Develop a sequential MMSE estimator of s[n] based on the data
X[n] = [x[0], x[1], . . . , x[n]]. This estimator is the mean of the
posterior PDF :

    ŝ[n|n] = E( s[n] | x[0], x[1], . . . , x[n] )

(Kalman Filters) Estimation Theory Spring 2013 144 / 152

To develop the equations of the Kalman filter we need the following
properties :

Property 1 : For two uncorrelated jointly Gaussian data vectors, the MMSE
estimator (if it is zero mean) is given by :

    θ̂ = E(θ | x1, x2) = E(θ | x1) + E(θ | x2)

Property 2 : If θ = θ1 + θ2, then the MMSE estimator of θ is :

    θ̂ = E(θ | x) = E(θ1 + θ2 | x) = E(θ1 | x) + E(θ2 | x)

Basic Idea : Generate the innovation x̃[n] = x[n] - x̂[n|n-1], which is
uncorrelated with the previous samples X[n-1]. Then use x̃[n] instead of
x[n] for estimation (X[n] is equivalent to [X[n-1], x̃[n]]).

(Kalman Filters) Estimation Theory Spring 2013 145 / 152

From Property 1, we have :

    ŝ[n|n] = E( s[n] | X[n-1], x̃[n] ) = E( s[n] | X[n-1] ) + E( s[n] | x̃[n] )
           = ŝ[n|n-1] + E( s[n] | x̃[n] )

From Property 2, we have :

    ŝ[n|n-1] = E( a s[n-1] + u[n] | X[n-1] )
             = a E( s[n-1] | X[n-1] ) + E( u[n] | X[n-1] )
             = a ŝ[n-1|n-1]

Moreover,

    E( s[n] | x̃[n] ) = ( E(s[n] x̃[n]) / E(x̃²[n]) ) x̃[n] = K[n] ( x[n] - x̂[n|n-1] )

where x̂[n|n-1] = ŝ[n|n-1] + ŵ[n|n-1] = ŝ[n|n-1].

Finally we have :

    ŝ[n|n] = a ŝ[n-1|n-1] + K[n] ( x[n] - ŝ[n|n-1] )

(Kalman Filters) Estimation Theory Spring 2013 146 / 152

State Model :          s[n] = a s[n-1] + u[n]
Observation Model :    x[n] = s[n] + w[n]

Prediction :

    ŝ[n|n-1] = a ŝ[n-1|n-1]

Minimum Prediction MSE :

    M[n|n-1] = a² M[n-1|n-1] + σ_u²

Kalman Gain :

    K[n] = M[n|n-1] / ( σ_n² + M[n|n-1] )

Correction :

    ŝ[n|n] = ŝ[n|n-1] + K[n] ( x[n] - ŝ[n|n-1] )

Minimum MSE :

    M[n|n] = (1 - K[n]) M[n|n-1]

(Kalman Filters) Estimation Theory Spring 2013 147 / 152
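The five scalar equations above translate directly into code. The sketch below (an added illustration; the constants a, σ_u², σ_w² are arbitrary and the measurement noise is taken time-invariant) simulates the state, runs the filter, and compares the raw observation MSE with the filtered MSE.

```python
import numpy as np

rng = np.random.default_rng(7)
a, var_u, var_w, N = 0.98, 0.05, 1.0, 200     # illustrative model parameters

# Simulate state and observations: s[n] = a s[n-1] + u[n],  x[n] = s[n] + w[n]
s = np.zeros(N)
s[0] = rng.normal()
for n in range(1, N):
    s[n] = a * s[n - 1] + rng.normal(scale=np.sqrt(var_u))
x = s + rng.normal(scale=np.sqrt(var_w), size=N)

# Scalar Kalman filter
s_hat, M = 0.0, 1.0                 # prior mean and variance of s[-1]
est = np.zeros(N)
for n in range(N):
    s_pred = a * s_hat                          # prediction
    M_pred = a ** 2 * M + var_u                 # minimum prediction MSE
    K = M_pred / (var_w + M_pred)               # Kalman gain
    s_hat = s_pred + K * (x[n] - s_pred)        # correction
    M = (1 - K) * M_pred                        # minimum MSE
    est[n] = s_hat

print("observation MSE:", np.mean((x - s) ** 2), " Kalman MSE:", np.mean((est - s) ** 2))
```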

The Kalman filter is an extension of the sequential MMSE estimator, but
applied to time-varying parameters that are represented by a dynamical
model.

No matrix inversion is required.

The Kalman filter is a time-varying linear filter.

The Kalman filter computes (and uses) its own performance measure, the
Bayesian MSE M[n|n].

The prediction stage increases the error, while the correction stage
decreases it.

The Kalman filter generates an uncorrelated sequence from the
observations, so it can be viewed as a whitening filter.

The Kalman filter is optimal in the Gaussian case, and it is the optimal
LMMSE estimator if the Gaussian assumption is not valid.

In the KF, M[n|n], M[n|n-1] and K[n] do not depend on the measurements
and can be computed off line. At steady state (n → ∞) the Kalman filter
becomes an LTI filter (M[n|n], M[n|n-1] and K[n] become constant).

(Kalman Filters) Estimation Theory Spring 2013 148 / 152

State Model :          s[n] = A s[n-1] + B u[n]        with s[-1] ~ N(μ_s, C_s)
Observation Model :    x[n] = h^T[n] s[n] + w[n]

Prediction :

    ŝ[n|n-1] = A ŝ[n-1|n-1]

    M[n|n-1] = A M[n-1|n-1] A^T + B Q B^T

Kalman Gain (p × 1) :

    K[n] = M[n|n-1] h[n] / ( σ_n² + h^T[n] M[n|n-1] h[n] )

Correction :

    ŝ[n|n] = ŝ[n|n-1] + K[n] ( x[n] - h^T[n] ŝ[n|n-1] )

Minimum MSE Matrix (p × p) :

    M[n|n] = ( I - K[n] h^T[n] ) M[n|n-1]

(Kalman Filters) Estimation Theory Spring 2013 149 / 152

State Model :          s[n] = A s[n-1] + B u[n]
Observation Model :    x[n] = H[n] s[n] + w[n]        with w[n] ~ N(0, C[n])

Prediction :

    ŝ[n|n-1] = A ŝ[n-1|n-1]

    M[n|n-1] = A M[n-1|n-1] A^T + B Q B^T

Kalman Gain Matrix (p × M) : (a matrix inversion is needed !)

    K[n] = M[n|n-1] H^T[n] ( C[n] + H[n] M[n|n-1] H^T[n] )^{-1}

Correction :

    ŝ[n|n] = ŝ[n|n-1] + K[n] ( x[n] - H[n] ŝ[n|n-1] )

Minimum MSE Matrix (p × p) :

    M[n|n] = ( I - K[n] H[n] ) M[n|n-1]

(Kalman Filters) Estimation Theory Spring 2013 150 / 152
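A vector-state, vector-observation sketch of these equations (added for illustration; the constant-velocity model and the matrices A, B, Q, H and the measurement covariance are all invented for the example) is given below.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100

# Constant-velocity state [position, velocity] with position-only measurements
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
Q = np.array([[0.01]])
H = np.array([[1.0, 0.0]])
C_meas = np.array([[1.0]])

s = np.array([0.0, 0.1])        # true state
s_hat = np.zeros(2)             # prior mean of s[-1]
M = np.eye(2)                   # prior covariance of s[-1]
err = []
for n in range(N):
    s = A @ s + B @ rng.multivariate_normal(np.zeros(1), Q)
    x = H @ s + rng.multivariate_normal(np.zeros(1), C_meas)

    s_pred = A @ s_hat                           # prediction
    M_pred = A @ M @ A.T + B @ Q @ B.T
    S = C_meas + H @ M_pred @ H.T
    K = M_pred @ H.T @ np.linalg.inv(S)          # Kalman gain matrix (needs the inversion)
    s_hat = s_pred + K @ (x - H @ s_pred)        # correction
    M = (np.eye(2) - K @ H) @ M_pred             # minimum MSE matrix
    err.append(s - s_hat)

print("RMS state error:", np.sqrt(np.mean(np.square(err), axis=0)))
```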

The extended Kalman filter is a sub-optimal approach used when the state
and the observation equations are nonlinear.

Nonlinear data model : Consider the following nonlinear data model

    s[n] = f( s[n-1] ) + B u[n]
    x[n] = g( s[n] ) + w[n]

where f(·) and g(·) are nonlinear functions (vector-to-vector mappings).

Model linearization : We linearize the model around the estimated value of
s using a first order Taylor series :

    f( s[n-1] ) ≈ f( ŝ[n-1|n-1] ) + ∂f/∂s[n-1] |_{ŝ[n-1|n-1]} ( s[n-1] - ŝ[n-1|n-1] )

    g( s[n] ) ≈ g( ŝ[n|n-1] ) + ∂g/∂s[n] |_{ŝ[n|n-1]} ( s[n] - ŝ[n|n-1] )

We denote the Jacobians by :

    A[n-1] = ∂f/∂s[n-1] |_{ŝ[n-1|n-1]},        H[n] = ∂g/∂s[n] |_{ŝ[n|n-1]}

(Kalman Filters) Estimation Theory Spring 2013 151 / 152

Now we use the linearized models :

    s[n] = A[n-1] s[n-1] + B u[n] + ( f( ŝ[n-1|n-1] ) - A[n-1] ŝ[n-1|n-1] )

    x[n] = H[n] s[n] + w[n] + ( g( ŝ[n|n-1] ) - H[n] ŝ[n|n-1] )

There are two differences with the standard Kalman filter :

1  Both equations have known terms added to them. This does not change the
   derivation of the Kalman filter.

2  The extended Kalman filter has exactly the same equations, with A
   replaced by A[n-1].

A[n-1] and H[n] should be computed at each sampling time.

The MSE matrices and the Kalman gain can no longer be computed off line.

(Kalman Filters) Estimation Theory Spring 2013 152 / 152
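To close, here is a minimal scalar sketch of the extended Kalman filter recursion (added for illustration; the nonlinear functions f and g, the noise variances and the initial conditions are all invented). The nonlinear f and g are used in the prediction and in the innovation, while the Jacobians enter only the MSE and gain updates, which is why these quantities can no longer be precomputed off line.

```python
import numpy as np

rng = np.random.default_rng(9)
N, var_u, var_w = 200, 0.01, 0.1

# Illustrative nonlinear model (not from the slides):
#   s[n] = f(s[n-1]) + u[n]   with f(s) = s + 0.1*sin(s)
#   x[n] = g(s[n])   + w[n]   with g(s) = s**2 / 2
f  = lambda s: s + 0.1 * np.sin(s)
g  = lambda s: 0.5 * s ** 2
df = lambda s: 1 + 0.1 * np.cos(s)      # Jacobian A[n-1], evaluated at s_hat[n-1|n-1]
dg = lambda s: s                        # Jacobian H[n],  evaluated at s_hat[n|n-1]

s = 1.0                                 # true state
s_hat, M = 1.0, 1.0                     # initial estimate and its MSE
for n in range(N):
    s = f(s) + rng.normal(scale=np.sqrt(var_u))
    x = g(s) + rng.normal(scale=np.sqrt(var_w))

    # Prediction (the nonlinear f is used; the Jacobian only enters the MSE recursion)
    s_pred = f(s_hat)
    A = df(s_hat)
    M_pred = A * M * A + var_u

    # Correction (the innovation uses the nonlinear g)
    H = dg(s_pred)
    K = M_pred * H / (var_w + H * M_pred * H)
    s_hat = s_pred + K * (x - g(s_pred))
    M = (1 - K * H) * M_pred

print(f"true s = {s:.3f}, EKF estimate = {s_hat:.3f}, M = {M:.4f}")
```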
