
Maximum Likelihood & Method of Moments Estimation
Patrick Zheng
01/30/14

Introduction
Goal: Find a good POINT estimate of a population parameter.
Data: We begin with a random sample of size n taken from the population. We estimate the parameter based on this sample.
Distribution: The initial step is to identify the probability distribution of the sample, which is characterized by the parameter.
The form of the distribution is typically easy to identify.
The parameter is unknown.

Notations
Sample: $X_1, X_2, \ldots, X_n$
Distribution: iid $f(x; \theta)$
Parameter: $\theta$

Example
e.g., the distribution is normal ($f$ = Normal) with unknown parameters $\mu$ and $\sigma^2$ ($\theta = (\mu, \sigma^2)$).
e.g., the distribution is binomial ($f$ = Binomial) with unknown parameter $p$ ($\theta = p$).

It's important to have a good estimate!
The importance of point estimates lies in the fact that many statistical formulas are based on them, such as confidence intervals and hypothesis-testing formulas.
A good estimate should
1. Be unbiased
2. Have small variance
3. Be efficient
4. Be consistent

Unbiasedness
An estimator is unbiased if its expected value equals the parameter, i.e., $E(\hat{\theta}) = \theta$.
It does not systematically overestimate or underestimate the target parameter.
The sample mean ($\bar{X}$) and sample proportion ($\hat{p}$) are unbiased estimators of the population mean and proportion.
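A quick simulation can illustrate this; a minimal R sketch with illustrative values (mu = 5, sigma = 2, n = 10, 10,000 replications):

# Sampling distribution of the sample mean: its average is close to the true mean
set.seed(1)
xbar <- replicate(10000, mean(rnorm(10, mean = 5, sd = 2)))
mean(xbar)   # approximately 5, i.e., E(Xbar) = mu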

Small variance
We also prefer that the sampling distribution of the estimator have a small spread or variability, i.e., a small standard deviation.

Efficiency
An estimator is said to be efficient if its Mean Square Error (MSE) is minimum among all competitors.
$\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = \mathrm{Bias}^2(\hat{\theta}) + \mathrm{var}(\hat{\theta})$, where $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
Relative Efficiency$(\hat{\theta}_1, \hat{\theta}_2) = \dfrac{\mathrm{MSE}(\hat{\theta}_2)}{\mathrm{MSE}(\hat{\theta}_1)}$
If it is $> 1$, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.
If it is $< 1$, $\hat{\theta}_2$ is more efficient than $\hat{\theta}_1$.

Example: efficiency
Suppose $X_1, X_2, \ldots, X_n$ iid $\sim N(\mu, \sigma^2)$.
If $\hat{\mu}_1 = X_1$, then $\mathrm{Bias}^2(\hat{\mu}_1) = 0$ and $\mathrm{var}(\hat{\mu}_1) = \sigma^2$, so $\mathrm{MSE}(\hat{\mu}_1) = \sigma^2$.
If $\hat{\mu}_2 = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}$, then $\mathrm{Bias}^2(\hat{\mu}_2) = 0$ and $\mathrm{var}(\hat{\mu}_2) = \sigma^2/n$, so $\mathrm{MSE}(\hat{\mu}_2) = \sigma^2/n$.
Since $\mathrm{R.E.}(\hat{\mu}_1, \hat{\mu}_2) = \dfrac{\mathrm{MSE}(\hat{\mu}_2)}{\mathrm{MSE}(\hat{\mu}_1)} = \dfrac{\sigma^2/n}{\sigma^2} = \dfrac{1}{n} < 1$,
$\hat{\mu}_2$ is more efficient than $\hat{\mu}_1$.
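A simulation makes the same comparison numerically; a minimal R sketch with illustrative values (mu = 5, sigma = 2, n = 10, 10,000 replications):

# Compare the MSE of X1 with the MSE of the sample mean
set.seed(1)
mu <- 5; sigma <- 2; n <- 10
est1 <- replicate(10000, rnorm(n, mu, sigma)[1])     # mu_hat_1 = X1
est2 <- replicate(10000, mean(rnorm(n, mu, sigma)))  # mu_hat_2 = Xbar
mse1 <- mean((est1 - mu)^2)   # roughly sigma^2 = 4
mse2 <- mean((est2 - mu)^2)   # roughly sigma^2/n = 0.4
mse2 / mse1                   # relative efficiency, roughly 1/n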

Consistency
An estimator $\hat{\theta}$ is said to be consistent if, as the sample size goes to $+\infty$, $\hat{\theta}$ converges in probability to $\theta$:
$\forall \varepsilon > 0, \; \Pr(|\hat{\theta} - \theta| > \varepsilon) \to 0$ as $n \to +\infty$.
Chebyshev's rule:
$\forall \varepsilon > 0, \; \Pr(|\hat{\theta} - \theta| > \varepsilon) \le \dfrac{E(\hat{\theta} - \theta)^2}{\varepsilon^2} = \dfrac{\mathrm{MSE}(\hat{\theta})}{\varepsilon^2}$.
If one can prove that the MSE of $\hat{\theta}$ tends to 0 as $n$ goes to $+\infty$, then $\hat{\theta}$ is consistent.

Example: Consistency
Suppose $X_1, X_2, \ldots, X_n$ iid $\sim N(\mu, \sigma^2)$.
The estimator $\hat{\mu} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}$ is consistent, since
$\forall \varepsilon > 0, \; \Pr(|\hat{\mu} - \mu| > \varepsilon) \le \dfrac{E(\hat{\mu} - \mu)^2}{\varepsilon^2} = \dfrac{\mathrm{MSE}(\hat{\mu})}{\varepsilon^2} = \dfrac{\sigma^2/n}{\varepsilon^2} \to 0$ as $n \to +\infty$.
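This can also be seen by simulation; a minimal R sketch with illustrative values (mu = 5, sigma = 2, eps = 0.5):

# Pr(|Xbar - mu| > eps) shrinks as n grows
set.seed(1)
eps <- 0.5
for (n in c(10, 100, 1000)) {
  xbar <- replicate(10000, mean(rnorm(n, mean = 5, sd = 2)))
  cat("n =", n, ": Pr(|Xbar - mu| > eps) is about", mean(abs(xbar - 5) > eps), "\n")
}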

Point Estimation Methods


There are many methods available for
estimating the parameter(s) of interest.
Three of the most popular methods of
estimation are:
The method of moments (MM)
The method of maximum likelihood (ML)
Bayesian method


1. The Method of Moments

The Method of Moments
One of the oldest methods; a very simple procedure.
What is a moment? The k-th population moment is $\mu'_k = E(X^k)$; the corresponding k-th sample moment is $m'_k = \dfrac{1}{n}\sum_{i=1}^{n} X_i^k$.
The method is based on the assumption that sample moments should provide GOOD ESTIMATES of the corresponding population moments.

How it works?
Step 1: write the population moments as functions of the parameter(s): $\mu'_1 = E(X)$, $\mu'_2 = E(X^2)$, ...
Step 2: compute the corresponding sample moments: $m'_1 = \bar{X}$, $m'_2 = \dfrac{1}{n}\sum_{i=1}^{n} X_i^2$, ...
Step 3: set the population moments equal to the sample moments and solve for the parameter(s).

Example: normal distribution
$X_1, X_2, \ldots, X_n$ iid $\sim N(\mu, \sigma^2)$.
Step 1: $\mu'_1 = E(X) = \mu$; $\quad \mu'_2 = E(X^2) = \sigma^2 + \mu^2$.
Step 2: $m'_1 = \bar{X}$; $\quad m'_2 = \dfrac{1}{n}\sum_{i=1}^{n} X_i^2$.
Step 3: set $\mu'_1 = m'_1$ and $\mu'_2 = m'_2$, therefore
$\mu = \bar{X}, \qquad \sigma^2 + \mu^2 = \dfrac{1}{n}\sum_{i=1}^{n} X_i^2$.
Solving the two equations, we get
$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$.
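A minimal R sketch of these estimates (the data here are simulated for illustration; any numeric vector x works):

# Method-of-moments estimates for a normal sample
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
mu_hat     <- mean(x)                # matches the first sample moment
sigma2_hat <- mean(x^2) - mean(x)^2  # second sample moment minus squared first
c(mu_hat, sigma2_hat)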

Example: Bernoulli Distribution
X follows a Bernoulli distribution if
$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$
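The moment-matching step is a brief worked sketch of the same recipe (one moment condition for the single parameter p):
$\mu'_1 = E(X) = p$ and $m'_1 = \bar{X}$, so setting $\mu'_1 = m'_1$ gives $\hat{p} = \bar{X}$, the observed proportion of 1's in the sample.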

Example: Poisson distribution

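A brief worked sketch, following the same recipe: for $X \sim \mathrm{Poisson}(\lambda)$, $\mu'_1 = E(X) = \lambda$, so matching the first moment gives $\hat{\lambda} = \bar{X}$.
Note that $\mathrm{Var}(X) = \lambda$ as well, so matching the second central moment instead gives a different estimator, $\tilde{\lambda} = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$; these are the two estimators the note on the next slide asks about combining.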

Note
The MME may not be unique.
In general, the minimum number of moment conditions we need equals the number of parameters.
Question: Can these two estimators be combined in some optimal way?
Answer: The generalized method of moments (GMM).

Pros of Method of Moments
Easy to compute and always works:
The method often provides estimators when other methods fail to do so or when estimators are hard to obtain (as in the case of the gamma distribution).
The MME is consistent.
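For the gamma case mentioned above, a minimal R sketch of the moment-matching idea (matching $E(X) = \alpha\beta$ and $\mathrm{Var}(X) = \alpha\beta^2$ for shape $\alpha$ and scale $\beta$; the data are simulated for illustration):

# Method-of-moments estimates for a Gamma(shape = alpha, scale = beta) sample
set.seed(1)
x <- rgamma(200, shape = 3, scale = 2)
m1 <- mean(x)
v  <- mean(x^2) - m1^2       # second central sample moment
beta_hat  <- v / m1          # scale estimate
alpha_hat <- m1 / beta_hat   # shape estimate (= m1^2 / v)
c(alpha_hat, beta_hat)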

Cons of Method of Moments
They are usually not the best estimators available. By best, we mean most efficient, i.e., achieving minimum MSE.
Sometimes the estimate may be meaningless (see the next page for an example).

Sometimes, MME is meaningless
Suppose we observe 3, 5, 6, 18 from a $U(0, \theta)$ distribution.
Since $E(X) = \theta/2$, the MME of $\theta$ is $2\bar{X} = 2 \times \dfrac{3+5+6+18}{4} = 16$, which is not acceptable, because we have already observed a value of 18.

2. The Method of Maximum Likelihood

The Method of Maximum Likelihood
Proposed by the geneticist/statistician Sir Ronald A. Fisher in 1922.
Idea: We attempt to find the values of the parameters which would have most likely produced the data that we in fact observed.

What is likelihood?
E.g., the likelihood of $\theta = 1$ is the chance of observing $X_1, X_2, \ldots, X_n$ when $\theta = 1$.

How to compute Likelihood?

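As a sketch of the general recipe (for an iid sample, in the notation used earlier): the likelihood is the joint density (or joint probability, in the discrete case) of the observed data, viewed as a function of the parameter,
$L(\theta) = f(x_1; \theta) \, f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$.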

Example of computing likelihood (discrete case)
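An illustrative sketch (not necessarily the example from the original slide): suppose $X_1, \ldots, X_4$ are iid Bernoulli($p$) and we observe 1, 0, 1, 1. Then
$L(p) = P(X_1 = 1)\,P(X_2 = 0)\,P(X_3 = 1)\,P(X_4 = 1) = p\,(1-p)\,p\,p = p^3(1-p)$.
Evaluating at two candidate values, $L(0.5) = 0.0625$ and $L(0.75) \approx 0.105$, so $p = 0.75$ makes the observed data more likely than $p = 0.5$.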

Example of computing likelihood (continuous case)
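An illustrative sketch (again, not necessarily the slide's original example): for an iid $N(\mu, \sigma^2)$ sample, the likelihood is the product of normal densities,
$L(\mu, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(x_i - \mu)^2}{2\sigma^2}\right)$,
evaluated at the observed $x_1, \ldots, x_n$; for a continuous distribution the likelihood uses density values rather than probabilities.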

Definition of MLE
In general, the method of ML results in the problem of maximizing a function of one or several parameters.
One way to do the maximization is to take derivatives.
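In symbols (the standard formulation): the maximum likelihood estimator is the value of the parameter that maximizes the likelihood, or equivalently the log-likelihood,
$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \ln L(\theta)$.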

Procedure to find MLE

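A sketch of the usual recipe:
1. Write down the likelihood $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$.
2. Take the log: $\ell(\theta) = \ln L(\theta)$.
3. Differentiate and set $\dfrac{d\ell(\theta)}{d\theta} = 0$.
4. Solve for $\hat{\theta}$ (and check that it is a maximum, e.g., via the second derivative or the boundary).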

Example: Poisson Distribution

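A worked sketch of the standard setup: if $X_1, \ldots, X_n$ iid $\sim \mathrm{Poisson}(\lambda)$, then
$L(\lambda) = \prod_{i=1}^{n} \dfrac{e^{-\lambda}\lambda^{x_i}}{x_i!}$, so the log-likelihood is
$\ln L(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - \sum_{i=1}^{n}\ln(x_i!)$.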

Example contd

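Continuing the sketch: setting the derivative of the log-likelihood to zero,
$\dfrac{d \ln L(\lambda)}{d\lambda} = \dfrac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0$,
gives $\hat{\lambda} = \bar{X}$; this is a maximum, since the second derivative $-\sum_{i=1}^{n} x_i / \lambda^2$ is negative.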

Example: Uniform Distribution

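A worked sketch of the standard argument: if $X_1, \ldots, X_n$ iid $\sim U(0, \theta)$, each observation has density $1/\theta$ on $(0, \theta)$, so
$L(\theta) = \theta^{-n}$ for $\theta \ge \max_i x_i$, and $L(\theta) = 0$ otherwise.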

Example contd

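Continuing the sketch: here setting the derivative to zero does not work, because $L(\theta) = \theta^{-n}$ is strictly decreasing on $[\max_i x_i, \infty)$; the likelihood is therefore maximized at the boundary, giving $\hat{\theta}_{MLE} = \max_i X_i$.
For the sample 3, 5, 6, 18 used earlier, the MLE is 18, whereas the MME was 16.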

More than one parameter

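A sketch of the usual two-parameter case: for $X_1, \ldots, X_n$ iid $\sim N(\mu, \sigma^2)$, maximize $\ln L(\mu, \sigma^2)$ by setting both partial derivatives to zero,
$\dfrac{\partial \ln L}{\partial \mu} = 0, \qquad \dfrac{\partial \ln L}{\partial \sigma^2} = 0$,
which gives $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ (the same values as the method-of-moments estimates in this case).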

Pros of Method of ML
When the sample size n is large (n > 30), the MLE is approximately unbiased, consistent, approximately normally distributed, and efficient (under regularity conditions).
Efficient means it achieves smaller MSE than competing estimators, including method-of-moments estimators.
It is more useful in statistical inference.

Cons of Method of ML
The MLE can be highly biased for small samples.
Sometimes the MLE has no closed-form solution.
The MLE can be sensitive to starting values, which might not give a global optimum.
This is common when $\theta$ is high-dimensional.

How to maximize Likelihood
1. Take the derivative and solve analytically (as aforementioned).
2. Apply numerical maximization techniques, including Newton's method, quasi-Newton methods (Broyden 1970), direct search methods (Nelder and Mead 1965), etc.
These methods can be implemented with the R functions optimize() and optim().

Newton's Method
A method for finding successively better approximations to the roots (or zeroes) of a real-valued function.
Pick an $x_0$ close to the root of a continuous function $f(x)$.
Take the derivative of $f(x)$ to get $f'(x)$.
Plug into $x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$, where $f'(x_n) \neq 0$.
Repeat until $x_n$ converges.

Example
Solve $e^x - 1 = 0$.
Denote $f(x) = e^x - 1$; let the starting point be $x_0 = 0.1$.
$f'(x) = e^x$
$x_1 = x_0 - \dfrac{f(x_0)}{f'(x_0)} = 0.1 - \dfrac{e^{0.1} - 1}{e^{0.1}} = 0.0048374$
Repeat until $|x_{n+1} - x_n| < 0.00001$; the iteration converges to $x = 7.106 \times 10^{-11}$, i.e., essentially the root $x = 0$.
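A minimal R sketch of this iteration (the stopping rule on the slide may be implemented slightly differently, so the last reported digits can vary):

# Newton's method for f(x) = exp(x) - 1, starting from x0 = 0.1
f  <- function(x) exp(x) - 1
fp <- function(x) exp(x)
x_old <- 0.1
x_new <- x_old - f(x_old) / fp(x_old)   # first iterate: 0.0048374
while (abs(x_new - x_old) >= 1e-5) {
  x_old <- x_new
  x_new <- x_old - f(x_old) / fp(x_old)
}
x_new   # essentially 0, the root of exp(x) - 1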

Example: find MLE by Newton's Method
For the Poisson distribution, finding $\hat{\lambda}$ is equivalent to maximizing $\ln L(\lambda)$, i.e., finding the root of $\dfrac{d \ln L(\lambda)}{d\lambda}$.
To implement Newton's method here, define
$f(\lambda) = \dfrac{d \ln L(\lambda)}{d\lambda}$ and $f'(\lambda) = \dfrac{d^2 \ln L(\lambda)}{d\lambda^2}$.
Given $x_1, x_2, \ldots, x_n$ and a starting value $\lambda_0$, we can find $\hat{\lambda}$.

Example contd
Suppose we collected a sample from Poi($\lambda$):
18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10, 12, 4, 12, 4, 12, 10, 7, 14, 13, 7
Implement Newton's method in R, using $f(\lambda) = \dfrac{\sum_i x_i}{\lambda} - n$ and $f'(\lambda) = -\dfrac{\sum_i x_i}{\lambda^2}$ (see the sketch below).
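A minimal R sketch of this step (function names and the starting value are illustrative):

# Observed sample, assumed to come from a Poisson(lambda) distribution
x <- c(18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10, 12, 4, 12, 4, 12, 10, 7, 14, 13, 7)
f  <- function(lambda) sum(x) / lambda - length(x)   # d lnL / d lambda
fp <- function(lambda) -sum(x) / lambda^2            # d^2 lnL / d lambda^2
lambda <- 5                                          # illustrative starting value
repeat {
  lambda_new <- lambda - f(lambda) / fp(lambda)      # Newton update
  if (abs(lambda_new - lambda) < 1e-5) break
  lambda <- lambda_new
}
lambda_new   # converges to mean(x), the Poisson MLE
mean(x)      # closed-form MLE, for comparison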

Use R function optim()
Typo on this slide: the objective function should be -lnL(lambda), not lnL(lambda), since optim() performs minimization by default.
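A minimal R sketch of this approach for the same Poisson sample (the interval, starting value, and optimization method are illustrative choices):

# Negative log-likelihood: optim()/optimize() minimize by default
x <- c(18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10, 12, 4, 12, 4, 12, 10, 7, 14, 13, 7)
negloglik <- function(lambda) -sum(dpois(x, lambda, log = TRUE))

# One-dimensional problem: optimize() over an interval is simplest
optimize(negloglik, interval = c(0.01, 50))$minimum

# optim() also works; a bounded method keeps lambda > 0
optim(par = 5, fn = negloglik, method = "L-BFGS-B", lower = 1e-6)$par

mean(x)   # closed-form MLE, for comparison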

The End!
Thank you!
