
Maximum Likelihood Estimation

As discussed earlier, the pdf of x depends on θ, and we write it as a function that is parametrized by θ: p(x) = p(x; θ). This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain x. The maximum likelihood estimator (MLE) finds the θ that maximizes p(x; θ) over θ for a given x.

The MLE is generally easy to derive (the maximization can be done analytically or numerically).

As the number of observations grows, the MLE becomes unbiased and reaches the CRLB, so it is asymptotically unbiased and asymptotically efficient.

The MLE is, however, not necessarily equivalent to the MVU estimator for a finite number of observations; asymptotically, the MLE is Gaussian distributed.

If an unbiased efficient estimator exists, the MLE will produce it.
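
To make the maximization concrete, here is a minimal numerical sketch (not part of the original notes): it estimates the mean of Gaussian data by minimizing the negative log-likelihood. The model, the sample size, and the use of scipy.optimize are assumptions made purely for illustration.

```python
# Minimal sketch: numerical MLE by minimizing the negative log-likelihood.
# Assumed model (not from the notes): x[n] = theta + w[n], w[n] ~ N(0, 1).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true = 2.0
x = theta_true + rng.standard_normal(1000)   # simulated observations

def neg_log_likelihood(theta):
    # -ln p(x; theta) for i.i.d. N(theta, 1) samples, up to an additive constant
    return 0.5 * np.sum((x - theta) ** 2)

res = minimize_scalar(neg_log_likelihood)
print(res.x, x.mean())   # numerical MLE vs. analytical MLE (the sample mean)
```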

Maximum Likelihood Estimation
Example:

Consider estimating A for the following model (DC level in white noise)

$$x[n] = A + w[n], \qquad n = 0, \dots, N-1, \qquad w[n] \sim \mathcal{N}(0, A)$$

Let us take a look at the CRLB:


$$p(\mathbf{x}; A) = \frac{1}{(2\pi A)^{N/2}} \exp\left[-\frac{1}{2A}\sum_{n=0}^{N-1}(x[n]-A)^2\right]$$

$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{N}{2A} + \frac{1}{A}\sum_{n=0}^{N-1}(x[n]-A) + \frac{1}{2A^2}\sum_{n=0}^{N-1}(x[n]-A)^2$$

$$E\left[\frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2}\right] = \frac{N}{2A^2} - \frac{N}{A} - \frac{1}{A^3}NA = -\frac{N(A+\tfrac{1}{2})}{A^2}$$

$$\operatorname{var}(\hat{A}) \ge \frac{A^2}{N(A+\tfrac{1}{2})}$$
Can we use $\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = I(A)(\hat{A} - A)$ to find the MVU estimator?

No: the score cannot be factored into this form, so no efficient estimator of A exists. We could estimate A using the sample mean or the sample variance, but neither estimator is optimal.

Maximum Likelihood Estimation
Example:

The MLE is obtained by setting the derivative of ln p(x; A) w.r.t. A to zero


$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{N}{2A} + \frac{1}{A}\sum_{n=0}^{N-1}(x[n]-A) + \frac{1}{2A^2}\sum_{n=0}^{N-1}(x[n]-A)^2 = 0$$

We then obtain
$$\hat{A}^2 + \hat{A} - \frac{1}{N}\sum_{n=0}^{N-1}x^2[n] = 0$$

There are 2 solutions, but we pick the one that always leads to a positive $\hat{A}$ and it can be checked that this corresponds to a maximum of $p(\mathbf{x}; A)$:

$$\hat{A} = -\frac{1}{2} + \sqrt{\frac{1}{N}\sum_{n=0}^{N-1}x^2[n] + \frac{1}{4}}$$

It can be shown that $\hat{A}$ is asymptotically Gaussian with

$$E(\hat{A}) \to A \qquad \text{and} \qquad \operatorname{var}(\hat{A}) \to \frac{A^2}{N(A+\tfrac{1}{2})}$$
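
As a sanity check, the following sketch evaluates the closed-form MLE above in a small Monte Carlo simulation and compares its empirical mean and variance with the asymptotic values and the CRLB. The values A = 2, N = 1000, and the number of trials are arbitrary choices for illustration.

```python
# Minimal Monte Carlo sketch for the DC level in noise with variance A.
# Assumed values (for illustration only): A = 2.0, N = 1000, 5000 trials.
import numpy as np

rng = np.random.default_rng(0)
A, N, trials = 2.0, 1000, 5000

# Simulate x[n] = A + w[n], w[n] ~ N(0, A), and apply the closed-form MLE.
x = A + np.sqrt(A) * rng.standard_normal((trials, N))
A_hat = -0.5 + np.sqrt(np.mean(x**2, axis=1) + 0.25)

crlb = A**2 / (N * (A + 0.5))
print("empirical mean :", A_hat.mean())   # should approach A
print("empirical var  :", A_hat.var())    # should approach the CRLB
print("CRLB           :", crlb)
```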

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by


 
$$p(\mathbf{x}; \theta) = \frac{1}{(2\pi)^{N/2}\det(\mathbf{C})^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)\right]$$

It is clear that this function is maximized by solving

$$\hat{\theta} = \arg\min_{\theta}\left[(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)\right]$$

For a given value of x this yields a numerical value for θ̂. Note that since x is a stochastic variable that can take many values, θ̂ is a stochastic variable as well.

Maximum Likelihood Estimation (Linear Gaussian Model)

Problem: $\min_{\theta}\,(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)$

Solution: $\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}$

Proof:

Rewriting the cost function that we have to minimize, we get

$$J = (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta) = \mathbf{x}^T \mathbf{C}^{-1}\mathbf{x} - 2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}\,\theta + \mathbf{h}^T \mathbf{C}^{-1}\mathbf{h}\,\theta^2$$

Setting the gradient with respect to θ to zero, we get

$$\frac{\partial J}{\partial \theta} = -2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x} + 2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h}\,\theta = 0$$

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estima-
tor.
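
A minimal numerical sketch of this estimator is given below; the observation vector h, the noise covariance C, and the true θ are made-up values used only for illustration.

```python
# Minimal sketch: MLE for the linear Gaussian model x = h*theta + w, w ~ N(0, C).
# h, C, and theta_true are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
N, theta_true = 50, 3.0
h = np.ones(N)                               # observation vector (here a DC level)
C = np.diag(0.5 + rng.random(N))             # known noise covariance (diagonal for simplicity)

w = rng.multivariate_normal(np.zeros(N), C)  # Gaussian noise with covariance C
x = h * theta_true + w

Cinv = np.linalg.inv(C)
theta_hat = (h @ Cinv @ x) / (h @ Cinv @ h)  # (h^T C^-1 h)^-1 h^T C^-1 x
print(theta_hat)                             # close to theta_true
```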

MLE for Transformed Parameters

The MLE of the parameter α = g(θ), where the PDF p(x; θ) is parametrized by θ,
is given by
$$\hat{\alpha} = g(\hat{\theta})$$

where θ̂ is the MLE of θ, which is obtained by maximizing p(x; θ) over θ.

If g(·) is not a one-to-one function, then α̂ maximizes some modified likelihood function $\bar{p}_T(\mathbf{x}; \alpha)$, defined as

$$\bar{p}_T(\mathbf{x}; \alpha) = \max_{\{\theta:\,\alpha = g(\theta)\}} p(\mathbf{x}; \theta).$$
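
For instance (a standard illustration under assumptions not stated in these notes), consider $x[n] = A + w[n]$ with $w[n] \sim \mathcal{N}(0, \sigma^2)$ and $\sigma^2$ known, so that the MLE of $A$ is the sample mean $\bar{x}$. By the invariance property, the MLE of the power $\alpha = A^2$ is $\hat{\alpha} = \bar{x}^2$. Since $g(\theta) = \theta^2$ is not one-to-one, $\hat{\alpha}$ maximizes the modified likelihood $\bar{p}_T(\mathbf{x}; \alpha) = \max\{p(\mathbf{x}; \sqrt{\alpha}),\, p(\mathbf{x}; -\sqrt{\alpha})\}$.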

Least Squares Estimation

Under the general model


$$\mathbf{x} = \mathbf{h}(\theta, \mathbf{w})$$

where w is some noise vector, the least squares estimator (LSE) finds the θ for
which
$$\|\mathbf{x} - \mathbf{h}(\theta, \mathbf{0})\|^2$$

is minimal.

Properties:

No probabilistic assumptions required

The performance depends strongly on the noise
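
To make the definition concrete, here is a minimal sketch of a nonlinear LSE; the sinusoidal model, the true frequency, and the simple grid search are assumptions chosen only for illustration.

```python
# Minimal sketch: LSE for a nonlinear model via a simple grid search.
# Assumed model (illustrative only): x[n] = cos(2*pi*f0*n) + w[n], theta = f0.
import numpy as np

rng = np.random.default_rng(0)
N, f0_true = 100, 0.12
n = np.arange(N)
x = np.cos(2 * np.pi * f0_true * n) + 0.5 * rng.standard_normal(N)

# LSE: minimize ||x - h(theta, 0)||^2 over a grid of candidate frequencies.
freqs = np.linspace(0.01, 0.49, 2000)
costs = [np.sum((x - np.cos(2 * np.pi * f * n)) ** 2) for f in freqs]
f0_hat = freqs[np.argmin(costs)]
print(f0_hat)   # close to f0_true
```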

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves

$$\hat{\theta} = \arg\min_{\theta}\|\mathbf{x} - \mathbf{h}\theta\|^2$$

Problem: $\min_{\theta}\|\mathbf{x} - \mathbf{h}\theta\|^2$

Solution: $\hat{\theta} = (\mathbf{h}^T\mathbf{h})^{-1}\mathbf{h}^T\mathbf{x}$

Proof: As before, now with C = I.

Remark: For the linear model the LSE corresponds to the BLUE when the noise is
white, and to the MVU estimator when the noise is Gaussian and white.
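
Below is a minimal numerical sketch; the straight-line data are made up, and np.linalg.lstsq is used only as an independent way of evaluating the same closed-form solution.

```python
# Minimal sketch: linear LSE, theta_hat = (h^T h)^-1 h^T x.
# The straight-line model and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, theta_true = 200, 0.7
h = np.arange(N, dtype=float)                 # known observation vector
x = h * theta_true + rng.standard_normal(N)   # white noise, no Gaussian assumption needed

theta_hat = (h @ x) / (h @ h)                 # closed-form LSE
theta_lstsq = np.linalg.lstsq(h[:, None], x, rcond=None)[0][0]
print(theta_hat, theta_lstsq)                 # both evaluate the same solution
```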

Least Squares Estimation (Linear Model)
Orthogonality Condition

Let us compute $(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h}$:

$$(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h} = \mathbf{x}^T\mathbf{h} - \left[\mathbf{x}^T\mathbf{h}(\mathbf{h}^T\mathbf{h})^{-1}\right]\mathbf{h}^T\mathbf{h} = 0$$

For the linear model the LSE leads to the following orthogonality condition:

$$(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h} = 0 \;\Leftrightarrow\; (\mathbf{x} - \mathbf{h}\hat{\theta}) \perp \mathbf{h}$$
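
The orthogonality condition can also be checked numerically; the following self-contained sketch uses arbitrary illustrative data.

```python
# Minimal check of the orthogonality condition (x - h*theta_hat)^T h = 0.
# h and x are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(100)
x = 2.0 * h + rng.standard_normal(100)

theta_hat = (h @ x) / (h @ h)    # linear LSE
print((x - h * theta_hat) @ h)   # ~0 up to floating-point round-off
```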

[Figure: geometric interpretation of the orthogonality condition, showing that the LS residual x − hθ̂ is orthogonal to h.]
