
Lecture 6: Point Estimation of Parameters (3)

Maximum Likelihood Estimator (further examples)

Invariance Property of the MLE: If 𝜃̂ is the MLE of 𝜃, then for any function 𝑔(𝜃), the MLE of 𝑔(𝜃) is 𝑔(𝜃̂).

Ex1: Let Y1, Y2, …, Yn be a random sample from the Poisson distribution with parameter 𝜆 given by:

𝑓(𝑦; 𝜆) = 𝑒^(−𝜆) 𝜆^𝑦 / 𝑦!,   𝑦 = 0, 1, 2, …
a) Derive the expression for the MLE of 𝜆.
b) Find the ML estimate of 𝜆 from a random sample (from a Poisson distribution) of size 55, summarized in the following frequency distribution:

y             0    1    2    3    4    5
Frequency (f) 7   14   12   13    6    3
c) Find the MLE of P(Y = 2)
Sol: a) We found earlier that 𝜆̂ = 𝑦̅.
b) 𝜆̂ = 𝑦̅ = ∑𝑓𝑦 / ∑𝑓 = 116/55 = 2.11
c) By the invariance property, the MLE of 𝑔(𝜆) = 𝑃(𝑌 = 2) = 𝑒^(−𝜆) 𝜆² / 2! is:

𝑔(𝜆̂) = 𝑃̂(𝑌 = 2) = 𝑒^(−𝜆̂) 𝜆̂² / 2! = 𝑒^(−2.11) (2.11)² / 2! = 0.2698
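A quick check in R (a minimal sketch; y and f below simply encode the frequency table above):

y = 0:5
f = c(7, 14, 12, 13, 6, 3)
lambda_hat = sum(f*y)/sum(f)                 # = 116/55, about 2.11
p2_hat = exp(-lambda_hat)*lambda_hat^2/2     # MLE of P(Y = 2), about 0.2698
c(lambda_hat, p2_hat)                        # dpois(2, lambda_hat) gives the same p2_hat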
Ex2: What is the MLE of 𝜎 in Example 2 (Lec 5) (the MLE of 𝜎² for the normal distribution)?
Sol: The MLE of 𝜎² is 𝜎̂² = ∑ᵢ₌₁ⁿ(𝑦ᵢ − 𝑦̅)² / 𝑛.

Hence, by the invariance property, the MLE of 𝜎 = √(𝜎²) is √(𝜎̂²) = 𝜎̂ = √[∑ᵢ₌₁ⁿ(𝑦ᵢ − 𝑦̅)² / 𝑛]
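In R (a sketch with a hypothetical data vector y):

y = c(4.1, 5.3, 3.8, 4.9, 5.0)        # hypothetical sample
sigma2_hat = mean((y - mean(y))^2)    # MLE of sigma^2 (divides by n, unlike var())
sigma_hat = sqrt(sigma2_hat)          # MLE of sigma, by the invariance property
sigma_hat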
Cases where ∂𝒍𝒏𝑳(𝜽|𝒚)/∂𝜽 = 𝟎 does not yield an explicit solution:

Example4: Let 𝑌1, 𝑌2, …, 𝑌𝑛 be a random sample from a distribution given by:


𝑓(𝑦; 𝜃) = 𝜃𝑦^(−2),   0 < 𝜃 ≤ 𝑦 < ∞
(a) Find the MLE of 𝜃. (b) Find the method of moments estimator of 𝜃.
Sol: (a) 𝐿(𝜃|𝒚) = 𝐿(𝜃|𝑦1, 𝑦2, …, 𝑦𝑛) = ∏ᵢ₌₁ⁿ 𝑓(𝑦ᵢ; 𝜃)

𝐿(𝜃|𝒚)= 𝑓(𝑦1 ; 𝜃)𝑓(𝑦2 ; 𝜃) … 𝑓(𝑦𝑛 ; 𝜃)


= (𝜃𝑦₁^(−2))(𝜃𝑦₂^(−2)) … (𝜃𝑦ₙ^(−2)) = 𝜃^𝑛 (∏ᵢ₌₁ⁿ 𝑦ᵢ)^(−2)

ln𝐿(𝜃|𝒚) = 𝑛 ln 𝜃 − 2 ∑ᵢ₌₁ⁿ ln 𝑦ᵢ   (1)

∂ln𝐿(𝜃|𝒚)/∂𝜃 = 𝑛/𝜃

Here setting the derivative equal to zero does not yield a maximum. However, the log likelihood (1) is increasing in 𝜃 and approaches ∞ as 𝜃 increases (its derivative 𝑛/𝜃 approaches zero as 𝜃 approaches ∞). The maximum of the log likelihood is therefore obtained by making 𝜃 as large as possible, but the condition

0 < 𝜃 ≤ 𝑦(1) ≤ 𝑦(2) ≤ … ≤ 𝑦(𝑛) < ∞


must still be satisfied, i.e. 𝜃 must be positive and no larger than the smallest sample value. The smallest sample value therefore provides the MLE:
𝜃̂ = 𝑦(1) = Min(𝑦1, 𝑦2, …, 𝑦𝑛)

Here 𝑦(𝑘) is the kth smallest sample value, called the kth order statistic.
(b) 𝐸(𝑌) = ∫_𝜃^∞ 𝑦 ∙ 𝜃𝑦^(−2) 𝑑𝑦 = 𝜃 ∫_𝜃^∞ (1/𝑦) 𝑑𝑦 = 𝜃 [ln ∞ − ln 𝜃] = ∞
Hence the method of moments estimator of 𝜃 does not exist.
(Check that the given pdf is not a member of the exponential family.)
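A small simulation sketch of this boundary maximum (assumption: inverse-CDF sampling, since F(y) = 1 − 𝜃/y here implies Y = 𝜃/U for U ~ Uniform(0,1)):

set.seed(1)
theta = 2
y = theta/runif(100)   # sample of size 100 from f(y; theta) = theta*y^(-2)
min(y)                 # the MLE theta-hat = y_(1); close to, and never below, theta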

Ex1: Let 𝑌1, 𝑌2, …, 𝑌𝑛 be a random sample from a Uniform distribution given by:


𝑓(𝑦; 𝜃) = 1/𝜃,   0 ≤ 𝑦 ≤ 𝜃
a) Find the MLE of 𝜃 and b) the method of moments estimator of 𝜃. [Note: the mean of the general uniform U(a, b) distribution is (a + b)/2.]

Sol: (a) Consider the shape of the likelihood function for fixed n (shown below for n = 30). The log likelihood −𝑛 ln 𝜃 is decreasing in 𝜃, so it is maximized by making 𝜃 as small as possible. The condition 0 ≤ 𝑦(1) ≤ 𝑦(2) ≤ … ≤ 𝑦(𝑛) ≤ 𝜃 must also hold, which means 𝜃 must be greater than or equal to the maximum of 𝑦1, 𝑦2, …, 𝑦𝑛. Hence
𝜃̂ = 𝑦(𝑛) = Max(𝑦1, 𝑦2, …, 𝑦𝑛).
[Figure: the log likelihood −30 ln(𝜃) plotted against 𝜃; it decreases as 𝜃 increases.]

(b) Since 𝐸(𝑌) = (0 + 𝜃)/2 = 𝜃/2, equating 𝜃/2 = 𝑦̅ gives 𝜃̃ = 2𝑦̅ (check).
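A quick simulation comparing the two estimators (a sketch; 𝜃 = 10 and n = 50 are arbitrary choices):

set.seed(1)
theta = 10
y = runif(50, 0, theta)
c(mle = max(y), mom = 2*mean(y))   # both should be close to theta = 10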

Cases where analytical closed form of MLE does not exist:

Example5: (DeGroot: Examples 7.6.4 and 7.6.6, p. 428): Let 𝑌1, 𝑌2, …, 𝑌𝑛 be an iid sample from the Gamma(𝛼, 1) distribution given by:
𝑓(𝑦; 𝛼) = (1/Γ(𝛼)) 𝑦^(𝛼−1) 𝑒^(−𝑦),   𝑦 > 0, 𝛼 > 0

[Note: Γ(𝛼) = ∫₀^∞ 𝑦^(𝛼−1) 𝑒^(−𝑦) 𝑑𝑦]

Find the MLE of 𝛼. Given 𝑛 = 20, (∑ᵢ₌₁ⁿ ln 𝑦ᵢ)/𝑛 = 1.220, and 𝑦̅ = 3.679, find the corresponding ML estimate of 𝛼.

Sol: Show that the log likelihood is:

ln𝐿(𝛼|𝒚) = −𝑛 lnΓ(𝛼) + (𝛼 − 1) ∑ᵢ₌₁ⁿ ln 𝑦ᵢ − ∑ᵢ₌₁ⁿ 𝑦ᵢ
Taking the derivative with respect to 𝛼, equating it to zero, and dividing by −𝑛, we have:

∂lnΓ(𝛼)/∂𝛼 − (∑ᵢ₌₁ⁿ ln 𝑦ᵢ)/𝑛 = 0

Γ′(𝛼)/Γ(𝛼) − (∑ᵢ₌₁ⁿ ln 𝑦ᵢ)/𝑛 = 0   (1)

where Γ′(𝛼) = ∂Γ(𝛼)/∂𝛼.
Eq(1) has no analytical (closed form) solution for 𝛼. However, one can use
numerical methods to find an approximate solution.
The function ∂lnΓ(𝛼)/∂𝛼 is called the digamma function. In R, digamma() computes its value, and trigamma() computes the value of its derivative, ∂/∂𝛼 (∂lnΓ(𝛼)/∂𝛼).
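For instance (these are the numerical values used in the iterations below):

digamma(3.679)    # 1.160622
trigamma(3.679)   # 0.3120541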

Newton-Raphson Method: Let 𝑓(𝜃) be a real-valued function of a real variable and suppose that we wish to solve the equation 𝑓(𝜃) = 0. Let 𝜃0 be an initial guess at the solution. Newton's method replaces the initial guess with the updated guess as follows:

𝜃1 = 𝜃0 − 𝑓(𝜃0)/𝑓′(𝜃0)

In our case 𝜃 = 𝛼 and 𝑓(𝛼) = ∂lnΓ(𝛼)/∂𝛼 − (∑ᵢ₌₁ⁿ ln 𝑦ᵢ)/𝑛, so we must solve:

𝑓(𝛼) = ∂lnΓ(𝛼)/∂𝛼 − (∑ᵢ₌₁ⁿ ln 𝑦ᵢ)/𝑛 = 0

Now 𝑓′(𝛼) = ∂/∂𝛼 (∂lnΓ(𝛼)/∂𝛼).

Since 𝐸(𝑌) = 𝛼 (why?), let us take the initial guess 𝛼0 = 𝑦̅ = 3.679.


𝑓(3.679) = digamma(3.679) − 1.220 = 1.160622 − 1.220 = −0.059378

𝑓′(3.679) = trigamma(3.679) = 0.3120541

𝛼1 = 𝛼0 − 𝑓(𝛼0)/𝑓′(𝛼0) = 3.679 − (−0.059378)/0.3120541 = 3.869281 (the update at the first iteration)

𝑓(3.869281) = digamma(3.869281) − 1.220 = 1.218316 − 1.220 = −0.001684

𝑓′(3.869281) = trigamma(3.869281) = 0.2946835

𝛼2 = 𝛼1 − 𝑓(𝛼1)/𝑓′(𝛼1) = 3.869281 − (−0.001684)/0.2946835 = 3.8749956
𝑓(3.8749956) = digamma(3.8749956) − 1.220 = 1.219989 − 1.220 = −1.1×10^(−5)

𝑓′(3.8749956) = trigamma(3.8749956) = 0.2941915

𝛼3 = 𝛼2 − 𝑓(𝛼2)/𝑓′(𝛼2) = 3.8749956 − (−1.1×10^(−5))/0.2941915 = 3.875032991

Hence, to two decimal places, the ML estimate of 𝛼 is 𝛼̂ = 3.87.
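The iterations above can be automated with a short loop (a minimal sketch; the tolerance 1e-8 and the cap of 10 iterations are arbitrary choices):

mean_log_y = 1.220                       # given summary: sum(log(y))/n
f = function(a) digamma(a) - mean_log_y  # score equation (1)
fp = function(a) trigamma(a)             # its derivative
alpha = 3.679                            # initial guess: y-bar
for (i in 1:10) {
  step = f(alpha)/fp(alpha)
  alpha = alpha - step                   # Newton-Raphson update
  if (abs(step) < 1e-8) break            # stop when converged
}
alpha                                    # about 3.875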

Using R’s optimize function:


n = 20
# log likelihood: sum(log(y)) = n*1.220 and sum(y) = n*3.679 from the given summaries
f = function(alpha1) -n*log(gamma(alpha1)) + (alpha1-1)*n*1.220 - n*3.679
alpha = optimize(f, c(0,10), maximum=T)
alpha[1] # gives alpha = 3.874988

Note1: c(0,10) specifies the range over which the search for the optimal alpha is made. You can specify any reasonable range.
Note2: If ∑y and ∑log y are not given but the data on y are given, one can make the program more general:

# First read the y data from a csv file, or specify y directly with y = c(...)
n = length(y)
f = function(alpha1) -n*log(gamma(alpha1)) + (alpha1-1)*sum(log(y)) - sum(y)
alpha = optimize(f, c(0,10), maximum=T)
alpha[1]

Example6: (Estimating the parameters of a regression model using MLE): Suppose the conditional mean of 𝑌 given 𝑋 = 𝑥 (also called the regression of Y on X) is linear, i.e. for each observation i we have

𝐸(𝑌|𝑋 = 𝑥𝑖 ) = 𝛽0 + 𝛽1 𝑥𝑖
or
𝑌𝑖 = 𝐸(𝑌|𝑋 = 𝑥𝑖 ) + 𝑒𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑒𝑖

where we assume 𝑒𝑖|𝑋 ~ 𝑁(0, 𝜎²).

Suppose we have a random sample of size n, (𝑋𝑖, 𝑌𝑖), i = 1, 2, …, n. Find the MLE of the regression parameters 𝛽0 and 𝛽1.

𝑒𝑖 = 𝑌𝑖 − 𝛽0 − 𝛽1 𝑥𝑖

𝐿(𝛽0 , 𝛽1 |(𝒙, 𝒚)) = 𝑓(𝑒1 )𝑓(𝑒2 ) … 𝑓(𝑒𝑛 )


= [1/(𝜎√2𝜋) exp{−𝑒₁²/(2𝜎²)}] [1/(𝜎√2𝜋) exp{−𝑒₂²/(2𝜎²)}] … [1/(𝜎√2𝜋) exp{−𝑒ₙ²/(2𝜎²)}]

= [1/(𝜎^𝑛 (2𝜋)^(𝑛/2))] exp{−(1/2𝜎²) ∑ᵢ₌₁ⁿ 𝑒ᵢ²}

= (1/(2𝜋𝜎²))^(𝑛/2) exp{−(1/2𝜎²) ∑ᵢ₌₁ⁿ (𝑦ᵢ − 𝛽0 − 𝛽1𝑥ᵢ)²}
ln𝐿(𝛽0, 𝛽1|(𝒙, 𝒚)) = −(𝑛/2) ln𝜎² − (𝑛/2) ln2𝜋 − (1/2𝜎²) ∑ᵢ₌₁ⁿ (𝑦ᵢ − 𝛽0 − 𝛽1𝑥ᵢ)²

∂ln𝐿(𝛽0, 𝛽1|(𝒙, 𝒚))/∂𝛽0 = (1/𝜎²) ∑ᵢ₌₁ⁿ (𝑦ᵢ − 𝛽0 − 𝛽1𝑥ᵢ) = 0   (1)

∂ln𝐿(𝛽0, 𝛽1|(𝒙, 𝒚))/∂𝛽1 = (1/𝜎²) ∑ᵢ₌₁ⁿ (𝑦ᵢ − 𝛽0 − 𝛽1𝑥ᵢ)𝑥ᵢ = 0   (2)

These give the normal equations:

∑ᵢ₌₁ⁿ 𝑦ᵢ = 𝑛𝛽0 + 𝛽1 ∑ᵢ₌₁ⁿ 𝑥ᵢ   (1a)

∑ᵢ₌₁ⁿ 𝑦ᵢ𝑥ᵢ = 𝛽0 ∑ᵢ₌₁ⁿ 𝑥ᵢ + 𝛽1 ∑ᵢ₌₁ⁿ 𝑥ᵢ²   (2a)

Solving simultaneously (e.g. using Cramer’s rule):

𝛽̂1 = [𝑛 ∑ᵢ₌₁ⁿ 𝑦ᵢ𝑥ᵢ − (∑ᵢ₌₁ⁿ 𝑥ᵢ)(∑ᵢ₌₁ⁿ 𝑦ᵢ)] / [𝑛 ∑ᵢ₌₁ⁿ 𝑥ᵢ² − (∑ᵢ₌₁ⁿ 𝑥ᵢ)²] = ∑ᵢ₌₁ⁿ(𝑥ᵢ − 𝑥̅)(𝑦ᵢ − 𝑦̅) / ∑ᵢ₌₁ⁿ(𝑥ᵢ − 𝑥̅)²

𝛽̂0 = 𝑦̅ − 𝛽̂1𝑥̅

[Check that differentiating the log likelihood wrt 𝜎² yields the MLE of 𝜎² as 𝜎̂² = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑒̂ᵢ², where 𝑒̂ᵢ = 𝑦ᵢ − 𝛽̂0 − 𝛽̂1𝑥ᵢ.]
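A numerical sketch tying the formulas together (simulated data with hypothetical true values 𝛽0 = 2, 𝛽1 = 0.5, 𝜎 = 1; lm() is R's built-in least-squares fit, which yields the same 𝛽̂0 and 𝛽̂1):

set.seed(1)
n = 100
x = runif(n, 0, 10)
y = 2 + 0.5*x + rnorm(n, sd = 1)
b1 = sum((x - mean(x))*(y - mean(y)))/sum((x - mean(x))^2)  # beta1-hat
b0 = mean(y) - b1*mean(x)                                   # beta0-hat
e = y - b0 - b1*x                                           # residuals
sigma2_hat = sum(e^2)/n            # MLE of sigma^2 (divides by n)
c(b0, b1, sigma2_hat)
coef(lm(y ~ x))                    # same b0 and b1 from lm()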
𝑛
