
Maximum Likelihood Estimate

Consider a model parametrized by a vector $\theta$, and let $X = (x_1, \ldots, x_N)$ be observed data samples from the model. The function $p(X \mid \theta)$, viewed as a function of the parameter vector $\theta$, is called the likelihood function. It shows how probable the observed data $X$ is for different values of $\theta$. Note that the likelihood is not a probability distribution over $\theta$ (its integral with respect to $\theta$ may not be equal to one).
For example, if we consider a set $X$ of samples drawn independently from a normal distribution with unknown parameters, then $\theta = (\mu, \sigma^2)$, and the likelihood function is

$$p(X \mid \theta) = \mathcal{N}(X \mid \mu, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(x_i \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2\right)$$

treated as a function of $\mu$ and $\sigma^2$.
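As a quick illustration, here is a minimal sketch (assuming NumPy; the sample values are made up) that evaluates this likelihood on a small dataset for a few candidate values of $\mu$:

```python
import numpy as np

def gaussian_likelihood(x, mu, sigma2):
    """Likelihood p(X | mu, sigma^2) of i.i.d. samples x, as in the formula above."""
    n = len(x)
    return (2 * np.pi * sigma2) ** (-n / 2) * np.exp(
        -np.sum((x - mu) ** 2) / (2 * sigma2)
    )

x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])  # hypothetical observed samples

# The same data is more or less probable under different parameter values:
for mu in (0.0, 1.0, 2.0):
    print(f"mu = {mu}: likelihood = {gaussian_likelihood(x, mu, sigma2=1.0):.3e}")
```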


The Maximum Likelihood Estimate (MLE) of the parameter is the value of $\theta$ that maximizes the likelihood. It is a very common way in statistics to estimate a model's unknown parameters after observing the data.
Continuing the example above, let us find the MLE for the parameter $\mu$. Since we assumed the samples are drawn independently from the model, the likelihood takes the form of a product of individual likelihood functions $p(x_i \mid \theta)$. When finding the MLE, it is often convenient to maximize $\log p(X \mid \theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)$ (which, in turn, takes the form of a sum of individual log-likelihood functions) instead of optimizing $p(X \mid \theta)$ directly:
$$\log p(X \mid \theta) = -\frac{N}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2$$
To maximize this expression with respect to $\mu$, we set its partial derivative to zero and obtain:
$$\frac{\partial}{\partial\mu}\log p(X \mid \theta) = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0$$

$$\mu_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
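This result is easy to check numerically. The sketch below (assuming NumPy and SciPy, with synthetic data and a known $\sigma^2$) maximizes the log-likelihood over $\mu$ and compares the result with the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic samples
sigma2 = 1.5 ** 2                              # treat sigma^2 as known here

def neg_log_likelihood(mu):
    """Negative of the log-likelihood derived above, as a function of mu."""
    return 0.5 * len(x) * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

result = minimize_scalar(neg_log_likelihood)
print(result.x, x.mean())  # the numerical maximizer matches the sample mean
```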
Let us also consider the multidimensional case of this problem: now each $x_i$ is a $d$-dimensional vector drawn from the multivariate normal distribution with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$:
 
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
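As a sanity check, this density can be implemented directly from the formula; the sketch below (assuming NumPy and SciPy, with made-up parameters) compares it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density from the formula above.
    Uses solve(Sigma, diff) instead of an explicit matrix inverse."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
x = np.array([0.5, 0.5])

print(np.isclose(mvn_pdf(x, mu, Sigma), multivariate_normal(mu, Sigma).pdf(x)))  # True
```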
Similarly, the log likelihood for this model takes the form:
$$\log p(X \mid \mu, \Sigma) = -\frac{Nd}{2}\log(2\pi) - \frac{N}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)$$

$$= -\frac{Nd}{2}\log(2\pi) - \frac{N}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^{N}\left(x_i^T \Sigma^{-1} x_i - \mu^T \Sigma^{-1} x_i - x_i^T \Sigma^{-1} \mu + \mu^T \Sigma^{-1} \mu\right)$$


Now we use the fact that $\Sigma$ is symmetric, so $\Sigma^{-1}$ is also symmetric, and that $\mu^T \Sigma^{-1} x_i$ is a scalar (hence equal to its own transpose):

$$\mu^T \Sigma^{-1} x_i = \left(\mu^T \Sigma^{-1} x_i\right)^T = x_i^T \left(\Sigma^{-1}\right)^T \mu = x_i^T \Sigma^{-1} \mu$$

which leads to:

$$\log p(X \mid \mu, \Sigma) = -\frac{Nd}{2}\log(2\pi) - \frac{N}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^{N}\left(x_i^T \Sigma^{-1} x_i - 2 x_i^T \Sigma^{-1} \mu + \mu^T \Sigma^{-1} \mu\right)$$
Now, to obtain the MLE for $\mu$, we compute the derivative of this expression with respect to the vector $\mu$ and set it to zero. We will use the following vector differentiation rules:
$$\frac{\partial}{\partial y}\left(a^T y\right) = a \quad \text{for } y, a \in \mathbb{R}^d$$

$$\frac{\partial}{\partial y}\left(y^T A y\right) = 2Ay \quad \text{for } y \in \mathbb{R}^d \text{ and symmetric } A \in \mathbb{R}^{d \times d}$$
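Both rules can be verified numerically, for instance with central finite differences (a sketch assuming NumPy; the values of a, A, and y are arbitrary and made up):

```python
import numpy as np

def numerical_grad(f, y, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at y."""
    grad = np.zeros_like(y)
    for i in range(len(y)):
        step = np.zeros_like(y)
        step[i] = eps
        grad[i] = (f(y + step) - f(y - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
d = 4
a = rng.normal(size=d)
M = rng.normal(size=(d, d))
A = M + M.T                      # symmetrize to satisfy the rule's assumption
y = rng.normal(size=d)

print(np.allclose(numerical_grad(lambda v: a @ v, y), a, atol=1e-4))              # d(a^T y)/dy = a
print(np.allclose(numerical_grad(lambda v: v @ A @ v, y), 2 * A @ y, atol=1e-4))  # d(y^T A y)/dy = 2Ay
```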
Applying them to the log likelihood expression, we get:
$$\frac{\partial}{\partial\mu}\log p(X \mid \mu, \Sigma) = -\frac{1}{2}\sum_{i=1}^{N}\left(-2\Sigma^{-1} x_i + 2\Sigma^{-1} \mu\right) = \sum_{i=1}^{N}\Sigma^{-1}(x_i - \mu) = 0$$

$$\mu_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
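As in the one-dimensional case, this closed-form answer can be confirmed numerically. The sketch below (assuming NumPy, with made-up parameters) computes the sample mean and checks that the gradient condition above holds at it:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 5000
true_mu = np.array([1.0, -2.0, 0.5])
true_Sigma = np.array([[2.0, 0.3, 0.0],
                       [0.3, 1.0, 0.2],
                       [0.0, 0.2, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=N)

mu_mle = X.mean(axis=0)  # (1/N) * sum_i x_i

# The condition sum_i Sigma^{-1} (x_i - mu) = 0 holds at the sample mean:
grad = np.linalg.inv(true_Sigma) @ (X - mu_mle).sum(axis=0)
print(mu_mle)                             # close to true_mu for large N
print(np.allclose(grad, 0.0, atol=1e-6))  # True
```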
