
Note: Asymptotic Properties of Maximum Likelihood Estimators

April 14, 2019

Theorem. Consider a scalar parameter θ whose true value is θ0, and a continuous
random variable X with probability density function (PDF) f(x|θ). Subject to mild
regularity conditions, the maximum likelihood estimator (MLE) θ̂n of θ derived from a
random sample of size n is:

1. Consistent: θ̂n → θ0 in probability.

2. Asymptotically normal: θ̂n → N(θ0, 1/(n I(θ0))) in distribution, where I(θ0) is the
   Fisher information in the random variable X at θ = θ0,

       I(θ0) = E_{X|θ0}{ [ ∂/∂θ log f(X|θ) |_{θ=θ0} ]² }.
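As a concrete illustration, take the exponential density f(x|θ) = θ e^{−θx} (x > 0).
The MLE from a sample of size n is θ̂n = 1/X̄n, and (as computed in the Fisher
information example below) I(θ) = 1/θ², so the theorem gives θ̂n ≈ N(θ0, θ0²/n) for
large n.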
Sketch of proof for the first statement.
Consider the random variable Z(θ) = log f(X|θ). Its expected value with respect to
the true distribution of X given θ = θ0 is

    E_{X|θ0}[Z(θ)] = ∫ log f(x|θ) f(x|θ0) dx.        (1)

We show in the following that E_{X|θ0}[Z(θ)] attains its maximum at θ = θ0. For all
θ ≠ θ0 (assuming identifiability, so that f(x|θ) and f(x|θ0) differ on a set of
positive probability),

    E_{X|θ0}[Z(θ)] − E_{X|θ0}[Z(θ0)]
      = ∫ log f(x|θ) f(x|θ0) dx − ∫ log f(x|θ0) f(x|θ0) dx
      = ∫ log[ f(x|θ) / f(x|θ0) ] f(x|θ0) dx        (2)
      < ∫ [ f(x|θ) / f(x|θ0) − 1 ] f(x|θ0) dx       (as log u < u − 1 for u ≠ 1)
      = 0.

So θ0 = arg max_θ E_{X|θ0}[Z(θ)].
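For example, if f(x|θ) is the N(θ, 1) density, then Z(θ) = −log(2π)/2 − (X − θ)²/2
and E_{X|θ0}[Z(θ)] = −log(2π)/2 − [1 + (θ − θ0)²]/2, which is indeed maximized
exactly at θ = θ0.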
Next, we take a look at the log-likelihood function

    L(θ) = log f_n(X1, . . . , Xn|θ) = Σ_{i=1}^n log f(Xi|θ) = Σ_{i=1}^n Zi(θ).        (3)

By the Law of Large Numbers, for each fixed θ,

    (1/n) Σ_{i=1}^n Zi(θ) → E_{X|θ0}[Z(θ)]  in probability.        (4)

Consequently, the maximizer of (1/n) Σ_{i=1}^n Zi(θ) (i.e., the MLE
θ̂n = arg max_θ (1/n) Σ_{i=1}^n Zi(θ)) must be close to the maximizer of E_{X|θ0}[Z(θ)]
(which is the true parameter value θ0), provided the convergence of
(1/n) Σ_{i=1}^n Zi(θ) to E_{X|θ0}[Z(θ)] is uniform in θ.
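As a quick numerical sketch of this consistency argument (the exponential model, the
true value θ0 = 2, and the sample sizes below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0  # illustrative true rate for f(x|theta) = theta * exp(-theta * x)

# For this model the MLE has a closed form: theta_hat = 1 / sample mean.
for n in [10, 100, 1_000, 10_000, 100_000]:
    x = rng.exponential(scale=1 / theta0, size=n)
    print(f"n = {n:>6}: theta_hat = {1 / x.mean():.4f}")

# The printed estimates settle near theta0 = 2 as n grows, illustrating
# that theta_hat_n -> theta0 in probability.
```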

Sketch of proof for the second statement.
Assume that we obtain the MLE θ̂n from the first-order condition

    ∂L(θ)/∂θ |_{θ=θ̂n} = 0,        (5)

where L(θ) = Σ_{i=1}^n log f(Xi|θ). Expanding the left-hand side around the point
θ = θ0, we get

    0 = ∂L(θ)/∂θ |_{θ=θ0} + (θ̂n − θ0) ∂²L(θ)/∂θ² |_{θ=θ0} + remainder.        (6)

Since θ̂n is consistent, as n → ∞ the value of θ̂n will be close to θ0 with high
probability. So with high probability we can ignore the remainder term and get

    θ̂n − θ0 = −[ ∂²L(θ)/∂θ² |_{θ=θ0} ]^{−1} ∂L(θ)/∂θ |_{θ=θ0}.        (7)
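A minimal numerical check of (7), again using the exponential model f(x|θ) = θ e^{−θx}
as an illustrative choice, for which L′(θ) = n/θ − Σ_{i=1}^n Xi and L″(θ) = −n/θ²:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n = 2.0, 5_000  # illustrative true value and sample size
x = rng.exponential(scale=1 / theta0, size=n)

score = n / theta0 - x.sum()  # dL/dtheta   evaluated at theta0
hessian = -n / theta0**2      # d2L/dtheta2 evaluated at theta0

# Equation (7): theta_hat - theta0 is approximately -hessian^{-1} * score.
one_step = theta0 - score / hessian
exact_mle = 1 / x.mean()
print(f"one-step approximation from (7): {one_step:.5f}")
print(f"exact MLE:                       {exact_mle:.5f}")
```

The two printed values should agree closely, since the ignored remainder is small
once θ̂n is near θ0.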

We now make a detour to present some important results about Fisher information,
which are useful for completing the proof of the main theorem.
Assume that for each value of x, the PDF f(x|θ) is a twice-differentiable function of
θ. Let λ(x|θ) = log f(x|θ), λ′(x|θ) = ∂/∂θ log f(x|θ), and λ″(x|θ) = ∂²/∂θ² log f(x|θ).
The Fisher information in the random variable X is defined as

    I(θ) = E_{X|θ}{ [λ′(X|θ)]² }.        (8)

Two alternative methods of calculating I(θ) are:

    I(θ) = −E_{X|θ}[λ″(X|θ)] = Var_{X|θ}[λ′(X|θ)].        (9)

In the following we show that the three expressions of I(θ) above are equivalent. By
the definition of a PDF, we have ∫ f(x|θ) dx = 1. By taking first and second
derivatives with respect to θ inside the integral sign (which the regularity
conditions permit), we obtain ∫ f′(x|θ) dx = 0 and ∫ f″(x|θ) dx = 0. Since
λ′(x|θ) = f′(x|θ)/f(x|θ), then

    E_{X|θ}[λ′(X|θ)] = ∫ λ′(x|θ) f(x|θ) dx = ∫ f′(x|θ) dx = 0.        (10)

So Var_{X|θ}[λ′(X|θ)] = E_{X|θ}{ [λ′(X|θ)]² } = I(θ).




Next, note that

    λ″(x|θ) = [ f(x|θ) f″(x|θ) − [f′(x|θ)]² ] / [f(x|θ)]²
            = f″(x|θ)/f(x|θ) − [λ′(x|θ)]².        (11)
It then follows that

    E_{X|θ}[λ″(X|θ)] = ∫ [ f″(x|θ)/f(x|θ) ] f(x|θ) dx − E_{X|θ}{ [λ′(X|θ)]² }
                     = ∫ f″(x|θ) dx − I(θ) = −I(θ).        (12)
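For example, for the exponential density f(x|θ) = θ e^{−θx}, we have
λ(x|θ) = log θ − θx, so λ′(x|θ) = 1/θ − x and λ″(x|θ) = −1/θ². Since E_{X|θ}[X] = 1/θ
and Var_{X|θ}[X] = 1/θ², all three expressions agree:
E_{X|θ}{[λ′(X|θ)]²} = E_{X|θ}[(X − 1/θ)²] = 1/θ², −E_{X|θ}[λ″(X|θ)] = 1/θ², and
Var_{X|θ}[λ′(X|θ)] = Var_{X|θ}[X] = 1/θ². Hence I(θ) = 1/θ² for this model.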

Back to the asymptotic properties of the MLE θ̂n. Recall that we have

    θ̂n − θ0 = −[ ∂²L(θ)/∂θ² |_{θ=θ0} ]^{−1} ∂L(θ)/∂θ |_{θ=θ0}.        (13)

We look at the two components on the right-hand side separately. First, by the Law of
Large Numbers,

    (1/n) ∂²L(θ)/∂θ² |_{θ=θ0} = (1/n) Σ_{i=1}^n ∂²/∂θ² log f(Xi|θ) |_{θ=θ0}        (14)
                              → E_{X|θ0}[λ″(X|θ0)] = −I(θ0)  in probability.
Second, by the Central Limit Theorem,

    (1/n) ∂L(θ)/∂θ |_{θ=θ0} = (1/n) Σ_{i=1}^n ∂/∂θ log f(Xi|θ) |_{θ=θ0}
                            = (1/n) Σ_{i=1}^n λ′(Xi|θ0)        (15)
                            ≈ N( E_{X|θ0}[λ′(X|θ0)], Var_{X|θ0}[λ′(X|θ0)]/n )
                            = N(0, I(θ0)/n),

where ≈ means "approximately distributed as" for large n (the approximating normal
distribution here still depends on n).

Putting everything together, we get

    θ̂n − θ0 ≈ −[−I(θ0)]^{−1} N(0, I(θ0)/n) = [I(θ0)]^{−1} N(0, I(θ0)/n)        (16)

and

    θ̂n ≈ N( θ0, 1/(n I(θ0)) ).        (17)
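A minimal simulation of (17), once more with the illustrative exponential model, for
which I(θ0) = 1/θ0² (see the example following (12)): the standardized errors
√(n I(θ0)) (θ̂n − θ0) should look like draws from N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 1_000, 5_000  # illustrative values

# Exponential model: I(theta0) = 1/theta0**2 and the MLE is 1 / sample mean.
x = rng.exponential(scale=1 / theta0, size=(reps, n))
mle = 1 / x.mean(axis=1)

# Standardize as in (17): sqrt(n * I(theta0)) * (theta_hat - theta0).
z = np.sqrt(n / theta0**2) * (mle - theta0)
print(f"mean of standardized errors: {z.mean():+.3f}  (expect about 0)")
print(f"std  of standardized errors: {z.std():.3f}   (expect about 1)")
```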
