
Maximum Likelihood Estimation

As discussed earlier, the pdf of x depends on θ, and we write it as a function that is parametrized by θ: p(x) = p(x; θ). This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain x. The maximum likelihood estimator (MLE) finds the θ that maximizes p(x; θ) over θ for a given x.

The MLE is generally easy to derive (the maximization can be done analytically or numerically).

As the number of observations grows, the MLE becomes unbiased and reaches the CRLB, so it is asymptotically unbiased and asymptotically efficient.

The MLE is, however, not necessarily equivalent to the MVU estimator for a finite number of observations; asymptotically, the MLE is Gaussian distributed.

If an unbiased efficient estimator exists, the MLE will produce it.
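
To make the maximization concrete, here is a minimal numerical sketch (not part of the original notes): it estimates the mean of Gaussian data by minimizing the negative log-likelihood. The model, the sample size, and the use of scipy.optimize are assumptions made purely for illustration.

```python
# Minimal sketch: numerical MLE by minimizing the negative log-likelihood.
# Assumed model (not from the notes): x[n] = theta + w[n], w[n] ~ N(0, 1).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true = 2.0
x = theta_true + rng.standard_normal(1000)   # simulated observations

def neg_log_likelihood(theta):
    # -ln p(x; theta) for i.i.d. N(theta, 1) samples, up to an additive constant
    return 0.5 * np.sum((x - theta) ** 2)

res = minimize_scalar(neg_log_likelihood)
print(res.x, x.mean())   # numerical MLE vs. analytical MLE (the sample mean)
```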

Maximum Likelihood Estimation
Example:

Consider estimating A for the following model (DC level in white noise)

$$x[n] = A + w[n], \qquad n = 0, \dots, N-1, \qquad w[n] \sim \mathcal{N}(0, A)$$

Let us take a look at the CRLB:


$$p(\mathbf{x}; A) = \frac{1}{(2\pi A)^{N/2}} \exp\left[-\frac{1}{2A}\sum_{n=0}^{N-1}(x[n]-A)^2\right]$$

$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{N}{2A} + \frac{1}{A}\sum_{n=0}^{N-1}(x[n]-A) + \frac{1}{2A^2}\sum_{n=0}^{N-1}(x[n]-A)^2$$

$$E\left[\frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2}\right] = \frac{N}{2A^2} - \frac{N}{A} - \frac{1}{A^3}NA = -\frac{N(A+\tfrac{1}{2})}{A^2}$$

$$\operatorname{var}(\hat{A}) \ge \frac{A^2}{N(A+\tfrac{1}{2})}$$
Can we use $\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = I(A)(\hat{A} - A)$ to find the MVU estimator?

No: the score cannot be factored into this form, so no efficient estimator of A exists. We could estimate A using the sample mean or the sample variance, but neither estimator is optimal.

Maximum Likelihood Estimation
Example:

The MLE is obtained by setting the derivative of ln p(x; A) w.r.t. A to zero


$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{N}{2A} + \frac{1}{A}\sum_{n=0}^{N-1}(x[n]-A) + \frac{1}{2A^2}\sum_{n=0}^{N-1}(x[n]-A)^2 = 0$$

We then obtain
$$\hat{A}^2 + \hat{A} - \frac{1}{N}\sum_{n=0}^{N-1}x^2[n] = 0$$

There are 2 solutions, but we pick the one that always leads to a positive $\hat{A}$ and it can be checked that this corresponds to a maximum of $p(\mathbf{x}; A)$:

$$\hat{A} = -\frac{1}{2} + \sqrt{\frac{1}{N}\sum_{n=0}^{N-1}x^2[n] + \frac{1}{4}}$$

It can be shown that $\hat{A}$ is asymptotically Gaussian with

$$E(\hat{A}) \to A \qquad \text{and} \qquad \operatorname{var}(\hat{A}) \to \frac{A^2}{N(A+\tfrac{1}{2})}$$
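
As a sanity check, the following sketch evaluates the closed-form MLE above in a small Monte Carlo simulation and compares its empirical mean and variance with the asymptotic values and the CRLB. The values A = 2, N = 1000, and the number of trials are arbitrary choices for illustration.

```python
# Minimal Monte Carlo sketch for the DC level in noise with variance A.
# Assumed values (for illustration only): A = 2.0, N = 1000, 5000 trials.
import numpy as np

rng = np.random.default_rng(0)
A, N, trials = 2.0, 1000, 5000

# Simulate x[n] = A + w[n], w[n] ~ N(0, A), and apply the closed-form MLE.
x = A + np.sqrt(A) * rng.standard_normal((trials, N))
A_hat = -0.5 + np.sqrt(np.mean(x**2, axis=1) + 0.25)

crlb = A**2 / (N * (A + 0.5))
print("empirical mean :", A_hat.mean())   # should approach A
print("empirical var  :", A_hat.var())    # should approach the CRLB
print("CRLB           :", crlb)
```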

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by


 
$$p(\mathbf{x}; \theta) = \frac{1}{(2\pi)^{N/2}\det(\mathbf{C})^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)\right]$$

It is clear that this function is maximized by solving

$$\hat{\theta} = \arg\min_{\theta}\left[(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)\right]$$

For a given value of x this yields a numerical value for θ̂. Note that since x is a stochastic variable that can take many values, θ̂ is a stochastic variable as well.

Maximum Likelihood Estimation (Linear Gaussian Model)

Problem: $\min_{\theta}\,(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta)$

Solution: $\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}$

Proof:

Rewriting the cost function that we have to minimize, we get

$$J = (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1}(\mathbf{x} - \mathbf{h}\theta) = \mathbf{x}^T \mathbf{C}^{-1}\mathbf{x} - 2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x}\,\theta + \mathbf{h}^T \mathbf{C}^{-1}\mathbf{h}\,\theta^2$$

Setting the gradient with respect to θ to zero, we get

$$\frac{\partial J}{\partial \theta} = -2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{x} + 2\mathbf{h}^T \mathbf{C}^{-1}\mathbf{h}\,\theta = 0$$

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estima-
tor.
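
A minimal numerical sketch of this estimator is given below; the observation vector h, the noise covariance C, and the true θ are made-up values used only for illustration.

```python
# Minimal sketch: MLE for the linear Gaussian model x = h*theta + w, w ~ N(0, C).
# h, C, and theta_true are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
N, theta_true = 50, 3.0
h = np.ones(N)                               # observation vector (here a DC level)
C = np.diag(0.5 + rng.random(N))             # known noise covariance (diagonal for simplicity)

w = rng.multivariate_normal(np.zeros(N), C)  # Gaussian noise with covariance C
x = h * theta_true + w

Cinv = np.linalg.inv(C)
theta_hat = (h @ Cinv @ x) / (h @ Cinv @ h)  # (h^T C^-1 h)^-1 h^T C^-1 x
print(theta_hat)                             # close to theta_true
```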

MLE for Transformed Parameters

The MLE of the parameter α = g(θ), where the PDF p(x; θ) is parametrized by θ,
is given by
$$\hat{\alpha} = g(\hat{\theta})$$

where θ̂ is the MLE of θ, which is obtained by maximizing p(x; θ) over θ.

If g(·) is not a one-to-one function, then α̂ maximizes some modified likelihood function $\bar{p}_T(\mathbf{x}; \alpha)$, defined as

$$\bar{p}_T(\mathbf{x}; \alpha) = \max_{\{\theta:\,\alpha = g(\theta)\}} p(\mathbf{x}; \theta).$$
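
For instance (a standard illustration under assumptions not stated in these notes), consider $x[n] = A + w[n]$ with $w[n] \sim \mathcal{N}(0, \sigma^2)$ and $\sigma^2$ known, so that the MLE of $A$ is the sample mean $\bar{x}$. By the invariance property, the MLE of the power $\alpha = A^2$ is $\hat{\alpha} = \bar{x}^2$. Since $g(\theta) = \theta^2$ is not one-to-one, $\hat{\alpha}$ maximizes the modified likelihood $\bar{p}_T(\mathbf{x}; \alpha) = \max\{p(\mathbf{x}; \sqrt{\alpha}),\, p(\mathbf{x}; -\sqrt{\alpha})\}$.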

Least Squares Estimation

Under the general model


$$\mathbf{x} = \mathbf{h}(\theta, \mathbf{w})$$

where w is some noise vector, the least squares estimator (LSE) finds the θ for
which
$$\|\mathbf{x} - \mathbf{h}(\theta, \mathbf{0})\|^2$$

is minimal.

Properties:

No probabilistic assumptions required

The performance depends strongly on the noise
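
To make the definition concrete, here is a minimal sketch of a nonlinear LSE; the sinusoidal model, the true frequency, and the simple grid search are assumptions chosen only for illustration.

```python
# Minimal sketch: LSE for a nonlinear model via a simple grid search.
# Assumed model (illustrative only): x[n] = cos(2*pi*f0*n) + w[n], theta = f0.
import numpy as np

rng = np.random.default_rng(0)
N, f0_true = 100, 0.12
n = np.arange(N)
x = np.cos(2 * np.pi * f0_true * n) + 0.5 * rng.standard_normal(N)

# LSE: minimize ||x - h(theta, 0)||^2 over a grid of candidate frequencies.
freqs = np.linspace(0.01, 0.49, 2000)
costs = [np.sum((x - np.cos(2 * np.pi * f * n)) ** 2) for f in freqs]
f0_hat = freqs[np.argmin(costs)]
print(f0_hat)   # close to f0_true
```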

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves

$$\hat{\theta} = \arg\min_{\theta}\|\mathbf{x} - \mathbf{h}\theta\|^2$$

Problem: $\min_{\theta}\|\mathbf{x} - \mathbf{h}\theta\|^2$

Solution: $\hat{\theta} = (\mathbf{h}^T\mathbf{h})^{-1}\mathbf{h}^T\mathbf{x}$

Proof: As before, now with C = I.

Remark: For the linear model the LSE corresponds to the BLUE when the noise is
white, and to the MVU estimator when the noise is Gaussian and white.
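
Below is a minimal numerical sketch; the straight-line data are made up, and np.linalg.lstsq is used only as an independent way of evaluating the same closed-form solution.

```python
# Minimal sketch: linear LSE, theta_hat = (h^T h)^-1 h^T x.
# The straight-line model and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, theta_true = 200, 0.7
h = np.arange(N, dtype=float)                 # known observation vector
x = h * theta_true + rng.standard_normal(N)   # white noise, no Gaussian assumption needed

theta_hat = (h @ x) / (h @ h)                 # closed-form LSE
theta_lstsq = np.linalg.lstsq(h[:, None], x, rcond=None)[0][0]
print(theta_hat, theta_lstsq)                 # both evaluate the same solution
```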

Least Squares Estimation (Linear Model)
Orthogonality Condition

Let us compute $(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h}$:

$$(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h} = \mathbf{x}^T\mathbf{h} - \left[\mathbf{x}^T\mathbf{h}(\mathbf{h}^T\mathbf{h})^{-1}\right]\mathbf{h}^T\mathbf{h} = 0$$

For the linear model the LSE leads to the following orthogonality condition:

$$(\mathbf{x} - \mathbf{h}\hat{\theta})^T\mathbf{h} = 0 \;\Leftrightarrow\; (\mathbf{x} - \mathbf{h}\hat{\theta}) \perp \mathbf{h}$$
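
The orthogonality condition can also be checked numerically; the following self-contained sketch uses arbitrary illustrative data.

```python
# Minimal check of the orthogonality condition (x - h*theta_hat)^T h = 0.
# h and x are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(100)
x = 2.0 * h + rng.standard_normal(100)

theta_hat = (h @ x) / (h @ h)    # linear LSE
print((x - h * theta_hat) @ h)   # ~0 up to floating-point round-off
```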

[Figure: geometric interpretation of the orthogonality condition, showing that the LS residual x − hθ̂ is orthogonal to h.]
