
# Maximum Likelihood & Method of Moments Estimation

Patrick Zheng
01/30/14

## Introduction

Goal: find a good POINT estimate of a population parameter.
Data: we begin with a random sample of size n taken from the totality of a population.
We estimate the parameter based on this sample.
Distribution: the initial step is to identify the probability distribution of the sample, which is characterized by the parameter.
The distribution is usually easy to identify; the parameter is unknown.

## Notations

Sample: $X_1, X_2, \ldots, X_n$
Distribution: iid $f(x; \theta)$
Parameter: $\theta$

Example
e.g., the distribution is normal ($f$ = Normal) with unknown parameters $\mu$ and $\sigma^2$ ($\theta = (\mu, \sigma^2)$).
e.g., the distribution is binomial ($f$ = binomial) with unknown parameter $p$ ($\theta = p$).

## It's important to have a good estimate!

The importance of point estimates lies in the fact that many statistical formulas are based on them, such as confidence intervals and formulas for hypothesis testing.
A good estimate should:
1. Be unbiased
2. Have small variance
3. Be efficient
4. Be consistent

## Unbiasedness

An estimator is unbiased if its mean equals the parameter.
It does not systematically overestimate or underestimate the target parameter.
The sample mean $\bar{X}$ / sample proportion $\hat{p}$ is an unbiased estimator of the population mean / proportion.
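Unbiasedness of the sample mean can be checked by simulation. This is a minimal Python sketch (not part of the slides); the values mu = 5, sigma = 2, n = 30 are arbitrary choices for illustration.

```python
import random

random.seed(0)

mu, sigma, n, reps = 5.0, 2.0, 30, 20000

# Average the sample mean over many repeated samples;
# unbiasedness says this average should be close to mu.
total = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    total += sum(sample) / n

avg_of_means = total / reps
print(avg_of_means)  # close to mu = 5.0
```

Any single sample mean wanders around mu, but its long-run average settles on the parameter.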

## Small variance

We also prefer that the sampling distribution of the estimator has a small spread or variability, i.e. a small standard deviation.

## Efficiency

An estimator is said to be efficient if its Mean Square Error (MSE) is minimum among all competitors.

$$MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = Bias^2(\hat{\theta}) + Var(\hat{\theta}),$$

where $Bias(\hat{\theta}) = E(\hat{\theta}) - \theta$.

$$Relative\ Efficiency(\hat{\theta}_1, \hat{\theta}_2) = \frac{MSE(\hat{\theta}_2)}{MSE(\hat{\theta}_1)}$$

If $> 1$, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.
If $< 1$, $\hat{\theta}_2$ is more efficient than $\hat{\theta}_1$.

## Example: efficiency

Suppose $X_1, X_2, \ldots, X_n$ iid ~ $N(\mu, \sigma^2)$.
If $\hat{\mu}_1 = X_1$, then
$$MSE(\hat{\mu}_1) = Bias^2(\hat{\mu}_1) + Var(\hat{\mu}_1) = 0 + \sigma^2 = \sigma^2.$$
If $\hat{\mu}_2 = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}$, then
$$MSE(\hat{\mu}_2) = Bias^2(\hat{\mu}_2) + Var(\hat{\mu}_2) = 0 + \sigma^2/n = \sigma^2/n.$$
Since
$$R.E.(\hat{\mu}_1, \hat{\mu}_2) = \frac{MSE(\hat{\mu}_2)}{MSE(\hat{\mu}_1)} = \frac{\sigma^2/n}{\sigma^2} = \frac{1}{n} \le 1,$$
$\hat{\mu}_2$ is more efficient than $\hat{\mu}_1$.
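The MSE comparison above can be reproduced by Monte Carlo. A Python sketch (not from the slides; mu = 0, sigma = 1, n = 25 are arbitrary):

```python
import random

random.seed(1)

mu, sigma, n, reps = 0.0, 1.0, 25, 20000

# Monte Carlo MSE of two estimators of mu:
#   est1 = X_1 (first observation only)
#   est2 = sample mean of all n observations
se1 = se2 = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    se1 += (sample[0] - mu) ** 2
    se2 += (sum(sample) / n - mu) ** 2

mse1, mse2 = se1 / reps, se2 / reps
print(mse1, mse2)  # roughly sigma^2 = 1 and sigma^2/n = 0.04
```

The simulated MSEs track the theoretical values $\sigma^2$ and $\sigma^2/n$, so the sample mean is the more efficient estimator.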

## Consistency

An estimator $\hat{\theta}$ is said to be consistent if, as the sample size $n$ goes to $+\infty$, $\hat{\theta}$ converges in probability to $\theta$:
$$\forall \varepsilon > 0,\ \Pr(|\hat{\theta} - \theta| > \varepsilon) \to 0 \text{ as } n \to \infty.$$

Chebyshev's rule:
$$\forall \varepsilon > 0,\ \Pr(|\hat{\theta} - \theta| > \varepsilon) \le \frac{E(\hat{\theta} - \theta)^2}{\varepsilon^2} = \frac{MSE(\hat{\theta})}{\varepsilon^2}.$$

If one can prove that the MSE of $\hat{\theta}$ tends to 0 as $n$ goes to $+\infty$, then $\hat{\theta}$ is consistent.

## Example: Consistency

Suppose $X_1, X_2, \ldots, X_n$ iid ~ $N(\mu, \sigma^2)$.
The estimator $\hat{\mu} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}$ is consistent, since
$$\forall \varepsilon > 0,\ \Pr(|\bar{X} - \mu| > \varepsilon) \le \frac{E(\bar{X} - \mu)^2}{\varepsilon^2} = \frac{MSE(\bar{X})}{\varepsilon^2} = \frac{\sigma^2/n}{\varepsilon^2} \to 0 \text{ as } n \to \infty.$$
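Consistency is also easy to see by simulation. A Python sketch (not from the slides; mu = 0, sigma = 1, and eps = 0.2 are arbitrary choices):

```python
import random

random.seed(2)

mu, sigma, eps, reps = 0.0, 1.0, 0.2, 5000

# Estimate Pr(|Xbar - mu| > eps) for growing sample sizes;
# consistency says these probabilities should shrink toward 0.
probs = []
for n in (10, 100, 1000):
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            hits += 1
    probs.append(hits / reps)

print(probs)  # decreasing toward 0 as n grows
```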

## Point Estimation Methods

There are many methods available for estimating the parameter(s) of interest. Three of the most popular methods of estimation are:
- The method of moments (MM)
- The method of maximum likelihood (ML)
- The Bayesian method

## The Method of Moments

One of the oldest methods; a very simple procedure.
What is a moment? The $k$-th population moment is $\mu_k' = E(X^k)$; the $k$-th sample moment is $m_k' = \frac{1}{n}\sum_{i=1}^{n} X_i^k$.
The method is based on the assumption that sample moments should provide GOOD ESTIMATES of the corresponding population moments.

## How does it work?

1. Write the population moments in terms of the parameter(s): $\mu_1' = E(X)$, $\mu_2' = E(X^2), \ldots$
2. Compute the sample moments: $m_1' = \bar{X}$, $m_2' = \frac{1}{n}\sum_{i=1}^{n} X_i^2, \ldots$
3. Set the population moments equal to the sample moments and solve for the parameter(s).
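Computing sample moments is a one-liner. A tiny Python sketch (not from the slides; the data values are borrowed from the uniform example later in the deck):

```python
# First and second sample moments of a small data set
data = [3.0, 5.0, 6.0, 18.0]
n = len(data)

m1 = sum(data) / n                 # first sample moment (the sample mean)
m2 = sum(x * x for x in data) / n  # second sample moment

print(m1, m2)  # 8.0 98.5
```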

## Example: normal distribution

$X_1, X_2, \ldots, X_n$ iid ~ $N(\mu, \sigma^2)$.

Step 1: $\mu_1' = E(X) = \mu$; $\mu_2' = E(X^2) = \sigma^2 + \mu^2$.

Step 2: $m_1' = \bar{X}$; $m_2' = \frac{1}{n}\sum_{i=1}^{n} X_i^2$.

Step 3: Set $\mu_1' = m_1'$ and $\mu_2' = m_2'$; therefore,
$$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
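The three steps above can be sketched in Python (not part of the slides; mu = 4, sigma = 3 are arbitrary choices):

```python
import random

random.seed(3)

mu, sigma, n = 4.0, 3.0, 100000

data = [random.gauss(mu, sigma) for _ in range(n)]

# Method-of-moments estimates for a normal sample:
#   mu_hat     = first sample moment
#   sigma2_hat = second sample moment minus squared first moment
m1 = sum(data) / n
m2 = sum(x * x for x in data) / n
mu_hat = m1
sigma2_hat = m2 - m1 * m1

print(mu_hat, sigma2_hat)  # close to mu = 4 and sigma^2 = 9
```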

## Example: Bernoulli distribution

$$f(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

Since $\mu_1' = E(X) = p$, setting $\mu_1' = m_1'$ gives the MME $\hat{p} = \bar{X}$.

## Example: Poisson distribution

For $X_1, \ldots, X_n$ iid ~ Poisson($\lambda$), $E(X) = \lambda$ and $Var(X) = \lambda$, so both the first-moment condition and the second central-moment condition yield an estimator of $\lambda$: $\hat{\lambda}_1 = \bar{X}$ or $\hat{\lambda}_2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

## Note

The MME may not be unique. For the Poisson distribution, for example, both the sample mean and the sample variance estimate $\lambda$.
In general, the minimum number of moment conditions we need equals the number of parameters.
Question: can these two estimators be combined in some optimal way?
Answer: the generalized method of moments (GMM).

## Pros of Method of Moments

Easy to compute and always works:
- The method often provides estimators when other methods fail to do so or when estimators are hard to obtain (as in the case of the gamma distribution).

The MME is consistent.

## Cons of Method of Moments

MMEs are usually not the best estimators available. By best, we mean most efficient, i.e., achieving minimum MSE.
Sometimes the MME may be meaningless (see the next page for an example).

## Sometimes, MME is meaningless

Suppose we observe 3, 5, 6, 18 from a U(0, $\theta$) distribution.
Since $E(X) = \theta/2$, the MME of $\theta$ is
$$\hat{\theta} = 2\bar{X} = 2 \times \frac{3+5+6+18}{4} = 16,$$
which is not acceptable, because we have already observed a value of 18.
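The contradiction is easy to verify directly (a tiny Python check, not from the slides):

```python
# The uniform example: the moment estimate can contradict the data.
data = [3, 5, 6, 18]

theta_mm = 2 * sum(data) / len(data)  # MME of theta for U(0, theta)

print(theta_mm, max(data))  # 16.0 18 -> estimate below an observed value
```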

## Likelihood

## The Method of Maximum Likelihood

Proposed by the geneticist/statistician Sir Ronald A. Fisher in 1922.
Idea: we attempt to find the values of the parameters which would have most likely produced the data that we in fact observed.

## What is likelihood?

The likelihood is the joint probability (or density) of the observed data, viewed as a function of the parameter with the data held fixed: for an iid sample, $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$.

## Example of computing likelihood (discrete case)

For discrete data the likelihood is the product of the probabilities of the observed values.

## Example of computing likelihood (continuous case)

For continuous data the likelihood is the product of the density values at the observations.
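The discrete case can be sketched in Python for a small Bernoulli sample (an illustration of ours, not the slides' worked example):

```python
# Likelihood of Bernoulli data observed as 1, 0, 1, 1,
# as a function of the success probability p.
def likelihood(p, data):
    L = 1.0
    for x in data:
        L *= p if x == 1 else (1.0 - p)
    return L

data = [1, 0, 1, 1]

# The likelihood is larger for plausible values of p than for
# implausible ones, which is what maximum likelihood exploits.
print(likelihood(0.75, data), likelihood(0.1, data))
```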

## Definition of MLE

The maximum likelihood estimate (MLE) is the value of the parameter that maximizes the likelihood function $L(\theta)$.
In general, the method of ML results in the problem of maximizing a function of a single parameter or of several parameters.
One way to do the maximization is to take the derivative, set it to zero, and solve.
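As one standard illustration of the derivative step (our choice of model, using the Bernoulli distribution shown earlier), with $s = \sum_{i=1}^{n} x_i$ successes:

```latex
\begin{align*}
L(p) &= p^{s}(1-p)^{n-s} \\
\ln L(p) &= s \ln p + (n - s)\ln(1 - p) \\
\frac{d}{dp}\ln L(p) &= \frac{s}{p} - \frac{n - s}{1 - p} = 0
  \quad\Longrightarrow\quad \hat{p} = \frac{s}{n} = \bar{x}
\end{align*}
```

Maximizing the log-likelihood $\ln L$ instead of $L$ gives the same maximizer and is usually easier to differentiate.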


## More than one parameter

When $\theta$ contains several parameters, set each partial derivative of $\ln L$ to zero and solve the resulting system of equations.

## Pros of Method of ML

When the sample size n is large (n > 30), the MLE is unbiased, consistent, normally distributed, and efficient (under regularity conditions).
- Efficient means it produces a smaller MSE than other methods, including the method of moments.

The MLE is also more useful in statistical inference.

## Cons of Method of ML

The MLE can be highly biased for small samples.
Sometimes the MLE has no closed-form solution.
The MLE can be sensitive to starting values, which might not give a global optimum.
- This is common when $\theta$ is of high dimension.

## How to find the MLE?

1. Take the derivative of the (log-)likelihood, set it to zero, and solve (as aforementioned).
2. Apply maximization techniques including Newton's method, the quasi-Newton method (Broyden 1970), the direct search method (Nelder and Mead 1965), etc.
- These methods can be implemented with the R functions optimize() and optim().

## Newton's Method

A method for finding successively better approximations to the roots (or zeroes) of a real-valued function.
- Pick an $x_0$ close to the root of a continuous function $g(x)$.
- Take the derivative of $g(x)$ to get $g'(x)$.
- Plug into
$$x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}.$$
- Repeat until $x_n$ converges to a point $x$ where $g(x) = 0$.

## Example

Solve $e^x - 1 = 0$.
Denote $g(x) = e^x - 1$ and let the starting point be $x_0 = 0.1$. Then $g'(x) = e^x$, and
$$x_1 = x_0 - \frac{g(x_0)}{g'(x_0)} = 0.1 - \frac{e^{0.1} - 1}{e^{0.1}} = 0.0048374.$$
Repeat until $|x_{n+1} - x_n| < 0.00001$; the iterates converge rapidly to the root $x = 0$.
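The iteration above is short enough to write out in full. A Python sketch (the slides mention R; this translation is ours):

```python
import math

def newton(g, gprime, x0, tol=1e-5, max_iter=100):
    """Newton's method: iterate x <- x - g(x)/g'(x) until the step is small."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solve e^x - 1 = 0 starting from x0 = 0.1, as in the example above.
root = newton(lambda x: math.exp(x) - 1.0, lambda x: math.exp(x), 0.1)
print(root)  # essentially 0, the exact root
```

The same function works for any differentiable g, e.g. `newton(lambda x: x*x - 4, lambda x: 2*x, 3.0)` returns a value near 2.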

## Example: find MLE by Newton's Method

For the Poisson distribution, finding $\hat{\lambda}$ is equivalent to maximizing the log-likelihood
$$\ell(\lambda) = \ln L(\lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - \sum_{i=1}^{n}\ln(x_i!).$$
Define
$$g(\lambda) = \ell'(\lambda) = -n + \frac{\sum_i x_i}{\lambda}, \qquad g'(\lambda) = \ell''(\lambda) = -\frac{\sum_i x_i}{\lambda^2}.$$
Given $x_1, x_2, \ldots, x_n$ and a starting value $\lambda_0$, we can find $\hat{\lambda}$ by iterating $\lambda_{n+1} = \lambda_n - g(\lambda_n)/g'(\lambda_n)$.

## Example cont'd

Suppose we collected a sample from Poi($\lambda$):
18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10, 12, 4, 12, 4, 12, 10, 7, 14, 13, 7
Applying Newton's method to $g(\lambda) = \ell'(\lambda)$ gives $\hat{\lambda} = \bar{x} \approx 9.95$; for the Poisson, the MLE has the closed form $\hat{\lambda} = \bar{x}$.
Alternatively, one can minimize $-\ln L(\lambda)$ numerically, e.g. with R's optimize().
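The Newton iteration for this data set can be run directly. A Python sketch (the starting value 5.0 is an arbitrary choice of ours):

```python
data = [18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10,
        12, 4, 12, 4, 12, 10, 7, 14, 13, 7]
n, s = len(data), sum(data)

# Newton's method on the Poisson score function:
#   g(lam)  = -n + s/lam      (derivative of the log-likelihood)
#   g'(lam) = -s/lam**2
lam = 5.0  # arbitrary starting value
for _ in range(100):
    step = (-n + s / lam) / (-s / lam ** 2)
    lam -= step
    if abs(step) < 1e-8:
        break

print(lam, s / n)  # the Newton iterate matches the closed form xbar
```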

The End!
Thank you!
