
Lecture 5: Point Estimation of Parameters (2)

2. Maximum Likelihood Estimator

The maximum likelihood estimator is by far the most popular technique of parameter
estimation. If 𝑌1, 𝑌2, … 𝑌𝑛 is an iid sample from a distribution 𝑓(𝑦; 𝜃1, 𝜃2, … 𝜃𝑘), the
likelihood function is defined as the joint distribution of 𝑌1, 𝑌2, … 𝑌𝑛, which by virtue
of independence is simply the product of the individual marginal distributions:

$$L(\boldsymbol{\theta}\mid\boldsymbol{y}) = L(\theta_1, \theta_2, \dots, \theta_k \mid y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} f(y_i; \theta_1, \theta_2, \dots, \theta_k)$$

Note that the likelihood function is a function of the parameters 𝜃1, 𝜃2, … 𝜃𝑘; the sample
information is treated as fixed (known).
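For concreteness, here is a minimal Python sketch of this definition, assuming (purely for illustration) that 𝑓 is the normal density and using a small hypothetical sample: the data are held fixed and the likelihood is evaluated as the product of the marginal densities.

```python
import numpy as np
from scipy.stats import norm

y = np.array([1.2, 0.7, 2.1])  # hypothetical fixed sample y1, ..., yn

def likelihood(mu, sigma):
    # L(mu, sigma | y): product of the individual N(mu, sigma^2) densities,
    # viewed as a function of the parameters with the data held fixed
    return np.prod(norm.pdf(y, loc=mu, scale=sigma))

# the data stay the same; only the point in the parameter space varies
print(likelihood(1.0, 1.0))   # likelihood at (mu, sigma) = (1.0, 1.0)
print(likelihood(1.3, 0.7))   # likelihood at a different parameter point
```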

If the likelihood function is differentiable in 𝜃𝑖, the candidates for the MLE
are the values of 𝜃1, 𝜃2, … 𝜃𝑘 that solve:

$$\frac{\partial L(\boldsymbol{\theta}\mid\boldsymbol{y})}{\partial \theta_i} = 0, \quad i = 1, 2, \dots, k$$

The second-order condition $\left[\dfrac{\partial^2 L(\boldsymbol{\theta}\mid\boldsymbol{y})}{\partial \theta_i^2} < 0\right]$ should also be checked.

MLE Defined:
Suppose that the likelihood function depends on k parameters 𝜃1, 𝜃2 , … 𝜃𝑘 . Choose
as estimates those values of the parameters that maximize the
likelihood 𝐿(𝜃1, 𝜃2 , … 𝜃𝑘 |𝑦1 , 𝑦2 , … 𝑦𝑛 ).

Intuitively, the MLE is a reasonable choice for an estimator as it is the point in the
parameter space for which the observed sample is most likely to occur.

If the maximum occurs at a boundary point of the parameter space, the derivative may
not be zero there, and the boundary must be checked for a maximum separately. A point
at which the derivative is zero may be a local maximum, a global maximum, or even a
point of inflection, but the MLE corresponds to the global maximum. This caution
matters especially when the equation obtained by setting the derivative to zero can
only be solved numerically and needs a starting point for the iteration: if the
starting point is selected near a local maximum, the solution obtained may not be
optimal and hence not the MLE, as the sketch below illustrates.
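A minimal sketch of this pitfall, using a Cauchy location model with two hypothetical observations (a standard example: when the two points are more than two units apart, the likelihood in the location parameter is bimodal). A gradient-based optimizer converges to whichever local maximum is nearer its starting point, so the runs must be compared to identify the global maximum, i.e. the MLE.

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([0.0, 5.0])  # hypothetical data; separation > 2 makes the
                          # Cauchy(theta, 1) likelihood bimodal in theta

def negloglik(theta):
    # negative log-likelihood of Cauchy(theta, 1), additive constants dropped
    return np.sum(np.log(1.0 + (y - theta) ** 2))

# the same optimizer, started near each observation, converges to a
# different stationary point; the smaller -lnL marks the MLE
for start in (0.0, 5.0):
    res = minimize(negloglik, x0=[start], method="BFGS")
    print(f"start={start}: theta_hat={res.x[0]:.3f}, -lnL={res.fun:.3f}")
```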
Logic of MLE:

Suppose that we are confronted with a box that contains three balls; each ball is
either red or white, but the actual number of red and white balls in the box is not
known. Suppose we are allowed to randomly sample two of the balls without replacement.
If our random sample yields two red balls, what would be a good estimate of the total
number of red balls in the box?

Obviously, the number of red balls in the box must be two or three. Let's set up two
scenarios:
1. There are only two red balls in the box
2. All three balls in the box are red

Probability of the observed sample under scenario 1: P(both sampled balls are red,
given two red balls in the box out of three) =

$$\frac{\binom{2}{2}\binom{1}{0}}{\binom{3}{2}} = \frac{1}{3}$$

Probability of the observed sample under scenario 2: P(both sampled balls are red,
given three red balls in the box) = 1 [given that all three balls are red, the two
selected must surely be red].

It should seem reasonable to choose three as the estimate of the number of red balls
in the box because this estimate maximizes the probability of obtaining the observed
sample (in this case two selected balls being red).
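The same comparison can be written as a short Python sketch, computing the probability of the observed sample (both draws red) under each candidate number of red balls via the hypergeometric distribution:

```python
from scipy.stats import hypergeom

N_BALLS, N_DRAWN, RED_SEEN = 3, 2, 2

# P(both sampled balls are red | k red balls among the 3 in the box)
for k in (2, 3):
    prob = hypergeom.pmf(RED_SEEN, N_BALLS, k, N_DRAWN)
    print(f"k={k} red in box: P(observed sample) = {prob:.4f}")
# k=2 gives 1/3, k=3 gives 1; the likelihood is maximized at k=3
```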

This example illustrates a method for finding an estimator that can be applied to any
situation. The method of maximum likelihood selects as estimates the values of the
parameters that maximize the likelihood (the joint probability function or joint
density function) of the observed sample.

Example 1: Suppose that 𝑌1, 𝑌2, … 𝑌𝑛 constitute a random sample from a Poisson
distribution (with mean 𝜆) given by:

$$f(y;\lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \dots$$

Find the MLE of 𝜆.


Sol:
$$L(\lambda\mid\boldsymbol{y}) = \prod_{i=1}^{n} f(y_i;\lambda) = \left[\frac{e^{-\lambda}\lambda^{y_1}}{y_1!}\right]\left[\frac{e^{-\lambda}\lambda^{y_2}}{y_2!}\right]\cdots\left[\frac{e^{-\lambda}\lambda^{y_n}}{y_n!}\right] = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!}$$

The MLE of 𝜆 is obtained by maximizing the likelihood function with respect to 𝜆.
Since the maximum of f(x) and of ln f(x) occur at the same x, we actually maximize
the log-likelihood to simplify the differentiation.

$$\ln L(\lambda\mid\boldsymbol{y}) = -n\lambda + \left(\sum_{i=1}^{n} y_i\right)\ln\lambda - \sum_{i=1}^{n}\ln y_i!$$

$$\frac{\partial \ln L(\lambda\mid\boldsymbol{y})}{\partial\lambda} = -n + \frac{\sum_{i=1}^{n} y_i}{\lambda} = 0 \;\Rightarrow\; \hat{\lambda} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$$

[Check that $\frac{\partial^2 \ln L(\lambda\mid\boldsymbol{y})}{\partial\lambda^2} = -\frac{\sum_{i=1}^{n} y_i}{\lambda^2} < 0$, as the 𝑦𝑖 are non-negative (and not all zero)]

Hence the MLE of 𝜆 is 𝑦̅.
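As a numerical check (on simulated data, with an arbitrarily chosen true 𝜆), a generic optimizer applied to the negative log-likelihood recovers the closed-form answer 𝑦̅:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(0)
y = rng.poisson(lam=3.5, size=200)  # simulated sample, true lambda = 3.5

def negloglik(lam):
    # -ln L(lambda | y); gammaln(y + 1) = ln(y!)
    return np.sum(lam - y * np.log(lam) + gammaln(y + 1))

res = minimize_scalar(negloglik, bounds=(0.01, 20.0), method="bounded")
print("numerical MLE:", res.x)     # agrees with ...
print("sample mean  :", y.mean())  # ... the closed-form MLE y-bar
```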

Example 2: Let 𝑌1, 𝑌2, … 𝑌𝑛 be a random sample from a normal distribution with
mean µ and variance σ². Find the MLEs of µ and σ².

Sol:
$$L(\mu,\sigma^2\mid\boldsymbol{y}) = \prod_{i=1}^{n} f(y_i;\mu,\sigma^2) = \left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(y_1-\mu)^2}{2\sigma^2}\right\}\right]\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(y_2-\mu)^2}{2\sigma^2}\right\}\right]\cdots\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(y_n-\mu)^2}{2\sigma^2}\right\}\right]$$

$$= \frac{1}{\sigma^{n}(2\pi)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right\} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right\}$$

$$\ln L(\mu,\sigma^2\mid\boldsymbol{y}) = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2$$

$$\frac{\partial \ln L(\mu,\sigma^2\mid\boldsymbol{y})}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\mu)$$

$$\frac{\partial \ln L(\mu,\sigma^2\mid\boldsymbol{y})}{\partial\sigma^2} = -\frac{n}{2}\left(\frac{1}{\sigma^2}\right) + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i-\mu)^2$$
Setting these derivatives equal to zero and solving simultaneously, we obtain from
the first equation:

$$\frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}(y_i-\hat{\mu}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i - n\hat{\mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$$

Substituting 𝑦̅ for 𝜇̂ in the second equation and solving for 𝜎̂², we have

$$-\frac{n}{2}\left(\frac{1}{\hat{\sigma}^2}\right) + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}(y_i-\bar{y})^2 = 0 \;\Rightarrow\; \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n}$$
It can be shown that 𝑦̅ is unbiased for µ, but 𝜎̂² is not unbiased for σ²; it can easily
be adjusted to the unbiased estimator S². Also check the second-order conditions (the
Hessian of the log-likelihood is negative definite at the solution). Note, too, that
the MLE and MoM estimators are the same for these two parameters.
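A brief numerical illustration on simulated data (the true µ and σ² below are arbitrary choices): computing 𝜇̂, 𝜎̂², and the unbiased adjustment S² directly.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, scale=2.0, size=50)  # simulated N(10, 4) sample
n = len(y)

mu_hat = y.mean()                                # MLE of mu
sigma2_hat = np.sum((y - mu_hat) ** 2) / n       # MLE of sigma^2 (biased)
s2 = np.sum((y - mu_hat) ** 2) / (n - 1)         # unbiased estimator S^2

print(mu_hat, sigma2_hat, s2)
# equivalently: np.var(y) gives the MLE, np.var(y, ddof=1) gives S^2
```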

Example 3: Let 𝑌1, 𝑌2, … 𝑌𝑛 be a random sample from a Bernoulli(p) distribution,
given by:

$$f(y;p) = p^{y}(1-p)^{1-y}, \quad y = 0, 1$$
Find the MLE of p.

Sol:
$$L(p\mid\boldsymbol{y}) = \prod_{i=1}^{n} f(y_i\mid p) = \prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} = p^{\sum y_i}(1-p)^{n-\sum y_i} = p^{y}(1-p)^{n-y} \quad (\text{where } y = \textstyle\sum y_i)$$

$$\ln L(p\mid\boldsymbol{y}) = y\ln p + (n-y)\ln(1-p)$$

$$\frac{\partial \ln L(p\mid\boldsymbol{y})}{\partial p} = \frac{y}{p} - \frac{n-y}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{y}{n} = \frac{\sum_{i=1}^{n} y_i}{n}$$

[i.e., the sample proportion of 𝑦𝑖 = 1]
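On simulated data (true p chosen arbitrarily), the MLE is simply the sample proportion of ones:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.binomial(n=1, p=0.3, size=100)  # simulated Bernoulli(0.3) sample

p_hat = y.mean()  # MLE: sample proportion of y_i = 1
print(p_hat)
```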

Ex1: Let 𝑌1, 𝑌2, … 𝑌𝑛 be an iid sample from the following distribution with parameter 𝜃.
Find the MLE of 𝜃 and compare it with the MoM estimator that you found earlier.

$$f(y;\theta) = (\theta + 1)\,y^{\theta}, \quad 0 < y < 1, \;\; \theta > -1$$

Sol: $\hat{\theta} = \dfrac{-n}{\sum_{i=1}^{n} \ln y_i} - 1$; the MoM estimator was $\tilde{\theta} = \dfrac{2\bar{y} - 1}{1 - \bar{y}}$.
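A sketch comparing the two estimators on simulated data. The density (𝜃+1)𝑦^𝜃 on (0, 1) has CDF 𝐹(𝑦) = 𝑦^(𝜃+1), so the sample can be drawn by inverse transform, 𝑌 = 𝑈^(1/(𝜃+1)); the true 𝜃 below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = 2.0
u = rng.uniform(size=500)
y = u ** (1.0 / (theta_true + 1.0))  # inverse-CDF sampling: F(y) = y^(theta+1)

theta_mle = -len(y) / np.sum(np.log(y)) - 1.0    # MLE from the formula above
theta_mom = (2 * y.mean() - 1) / (1 - y.mean())  # MoM estimator
print(theta_mle, theta_mom)  # both should be near theta_true = 2.0
```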

Ex2: Let 𝑌1, 𝑌2, … 𝑌𝑛 constitute a random sample from the probability density
function given by:

$$f(y;\theta) = \frac{2}{\theta^2}(\theta - y), \quad 0 \le y \le \theta$$

It was noted that $\tilde{\theta} = 3\bar{y}$. Show that the MLE is the solution to

$$2n = \sum_{i=1}^{n}\frac{\hat{\theta}}{\hat{\theta} - y_i}$$
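Since this equation has no closed-form solution, here is a minimal numerical sketch (simulated data; the true 𝜃 is arbitrary). The left-hand sum blows up as 𝜃̂ approaches max 𝑦𝑖 from above and tends to 𝑛 < 2𝑛 as 𝜃̂ → ∞, so a root is bracketed in between and can be found by a bracketing root-finder:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
theta_true = 5.0
# inverse-CDF sampling: F(y) = (2*theta*y - y^2)/theta^2
# gives y = theta*(1 - sqrt(1 - u))
u = rng.uniform(size=200)
y = theta_true * (1.0 - np.sqrt(1.0 - u))
n = len(y)

def score_eq(theta):
    # sum of theta/(theta - y_i) minus 2n; the MLE is a root of this
    return np.sum(theta / (theta - y)) - 2 * n

# theta must exceed max(y_i): positive just above max(y), negative for large theta
lo = y.max() + 1e-9
theta_mle = brentq(score_eq, lo, 100 * y.max())
print(theta_mle, 3 * y.mean())  # MLE vs the MoM estimate 3*y-bar
```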
