Discussion (Week 4): MLE and method of moments

Contents
- Setup
- A model
- Method of moments estimator
- Maximum likelihood estimator
- Exercise
  - Method of moments estimator
  - Maximum likelihood estimator
Setup
During a long vacation to some foreign country, you’ve been taking the same bus every morning. Since you’re on vacation, the “morning” has been starting at pretty random times for you, but you’ve been told that the buses in this country arrive at constant intervals throughout the whole day.
During your stay so far, the times you’ve waited for a bus (in minutes) have been:
16, 2, 6, 7, 19
A model
Let’s call this interval θ. One way to model the historically observed wait times, X₁, X₂, ⋯, X₅, is to treat them as independent and identically distributed draws from a Uniform(0, θ) distribution. This does require, however, that you’re willing to make at least two assumptions:
1. your arrival time at bus stops in the mornings has been pretty random
2. the bus you take does indeed arrive every θ minutes.
While not everyone might be on board with these assumptions, let’s just agree to take them as given, for the purpose of illustration¹.
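To make the model a bit more concrete, here is a small simulation sketch: under these assumptions, each morning’s wait time behaves like an independent Uniform(0, θ) draw (theta_true below is a made-up value, purely for illustration):

```r
set.seed(1)
theta_true <- 20  # hypothetical true bus interval, in minutes

# Five simulated mornings of waiting, each uniform on (0, theta_true)
simulated_waits <- runif(5, min = 0, max = theta_true)
round(simulated_waits, 1)
```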
Method of moments estimator

Recall that the j-th moment of a random variable X is E[X^j]; under our model, for example, the theoretical first moment of each Xᵢ is E[Xᵢ] = θ/2. This is, however, a theoretical value, and not something we can compute from the observed data. With the data, we can estimate the j-th moment of Xᵢ via

$$\frac{1}{n} \sum_{i=1}^{n} X_i^j,$$

and, in particular, the estimated first moment is just the sample mean,

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
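For instance, with our observed data the first two sample moments could be computed in R as follows (wait_times is an illustrative name for the vector of observed times):

```r
wait_times <- c(16, 2, 6, 7, 19)

mean(wait_times)    # estimated first moment: (16 + 2 + 6 + 7 + 19) / 5 = 10
mean(wait_times^2)  # estimated second moment: 141.2
```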
Given both the theoretical and estimated moments, one way to estimate the parameter θ would be to set it such that our estimated moment is exactly the theoretical moment². In other words, we wish to set our estimator θ̃ such that

$$\frac{\tilde{\theta}}{2} = \bar{X}_n,$$

which gives

$$\tilde{\theta} = 2 \bar{X}_n.$$

Plugging in the observed wait times, our method of moments estimate is

$$\tilde{\theta} = 2 \left( \frac{16 + 2 + 6 + 7 + 19}{5} \right) = 20.$$
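As a quick sanity check, this calculation is easy to reproduce in R (again using the illustrative wait_times vector):

```r
wait_times <- c(16, 2, 6, 7, 19)

theta_mom <- 2 * mean(wait_times)  # method of moments estimate
theta_mom
#> [1] 20
```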
Maximum likelihood estimator

Under our Uniform(0, θ) model, each Xᵢ has density

$$f(x_i; \theta) = \begin{cases} 1/\theta & \text{if } x_i \le \theta \\ 0 & \text{otherwise,} \end{cases}$$

and, since the draws are independent, the joint density of the sample is

$$f(x_1, x_2, \cdots, x_n; \theta) = \begin{cases} (1/\theta)^n & \text{if } x_i \le \theta \ \forall i \\ 0 & \text{otherwise.} \end{cases}$$
Recall that in this joint density, the parameter θ is fixed, and f is a function of the observable data X₁, ⋯, Xₙ. But in reality, we’ve observed draws of each Xᵢ, and would like to know how likely our observed data would have been under different values of θ. We can represent this likelihood by thinking of the joint density of the data given θ as a function of θ, where the data Xᵢ are now fixed to the values we’ve observed:

$$L_n(\theta) = \begin{cases} (1/\theta)^n & \text{if } \theta \ge x_i \ \forall i \\ 0 & \text{otherwise.} \end{cases}$$
Then, one way to estimate θ is to find the value of θ that makes this likelihood as large as possible, a.k.a. the maximum likelihood estimator.
One common technique for finding the value of a parameter that maximizes the
likelihood function is to take the log of the likelihood, often referred to as the log-
likelihood. In this case, that would be:
$$\ell_n(\theta) = \log L_n(\theta) = \begin{cases} -n \log(\theta) & \text{if } \theta \ge x_i \ \forall i \\ -\infty & \text{otherwise.} \end{cases}$$
Two observations about this log-likelihood make it easy to maximize:
1. θ cannot be smaller than any of the observed values³; otherwise ℓ(θ) would be −∞!
2. −n log(θ) is a monotonically decreasing function (https://en.wikipedia.org/wiki/Monotonic_function) of θ, which means we want the smallest possible value of θ in order to maximize ℓ(θ).
Given these observations, the value θ̂ that maximizes ℓ(θ) is θ̂ = max(X₁, X₂, ⋯, Xₙ); in our case, θ̂ = max(16, 2, 6, 7, 19) = 19.
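In R, this amounts to a one-liner (with the same illustrative wait_times vector as before):

```r
wait_times <- c(16, 2, 6, 7, 19)

theta_mle <- max(wait_times)  # maximum likelihood estimate
theta_mle
#> [1] 19
```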
We can also use R to visualize the log-likelihood function ℓ(θ) (and its maximum) as a
function of θ, given our observed data. First, we implement ℓ(θ) as a function in R⁴:
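A minimal sketch of what such a function might look like (the name log_likelihood and its default argument are illustrative choices, not the exact code from class):

```r
wait_times <- c(16, 2, 6, 7, 19)

# Log-likelihood under the Uniform(0, theta) model:
# -n * log(theta) when theta is at least as large as every observation, and -Inf otherwise
log_likelihood <- function(theta, x = wait_times) {
  if (theta >= max(x)) {
    -length(x) * log(theta)
  } else {
    -Inf
  }
}
```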
Then, for the purpose of plotting, we create a data frame with a range of possible values for θ and the corresponding log-likelihood:
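For example, something along these lines would work (this sketch assumes the ggplot2 package and reuses wait_times and log_likelihood from the sketch above; theta_grid and ll_df are illustrative names):

```r
library(ggplot2)

# Candidate values of theta and their log-likelihoods; values of theta below
# max(wait_times) have log-likelihood -Inf and are dropped from the plot
theta_grid <- seq(1, 60, by = 0.1)
ll_df <- data.frame(
  theta   = theta_grid,
  log_lik = sapply(theta_grid, log_likelihood)
)

ggplot(ll_df, aes(x = theta, y = log_lik)) +
  geom_line() +
  geom_vline(xintercept = max(wait_times), linetype = "dashed") +
  labs(x = expression(theta), y = "log-likelihood")
```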
If the two estimators we find look familiar, that’s because they are the estimates we
present in assignment #2 (https://5harad.com/mse125/#hw2).
Exercise
Now that we’ve seen how to find both MLE and method of moments estimators, let’s practice with a different distribution. Suppose X₁, ⋯, Xₙ are independent and identically distributed draws from the distribution with density

$$f(x; \theta) = \theta x^{\theta - 1}, \qquad 0 < x < 1.$$

Let’s use the two methods we’ve covered above to find estimators of θ.
Method of moments estimator

Finding estimators for j unknown parameters would require j equations, and hence we would need to compute up to the j-th moment. For this exercise, since we have just one unknown parameter θ, we only need one equation involving the first moment.
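As a sketch of how this plays out for this density (treat it as a check on your own derivation), the theoretical first moment is

$$E[X] = \int_0^1 x \cdot \theta x^{\theta - 1} \, dx = \frac{\theta}{\theta + 1},$$

so setting it equal to the sample mean and solving for θ gives

$$\frac{\tilde{\theta}}{\tilde{\theta} + 1} = \bar{X}_n \quad \Longrightarrow \quad \tilde{\theta} = \frac{\bar{X}_n}{1 - \bar{X}_n}.$$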
Maximum likelihood estimator

As we have done in the above example, an MLE can be found in two steps:
1. write down the log-likelihood ℓₙ(θ) of the observed data as a function of θ, and
2. find the value of θ that maximizes it, for instance by setting the derivative of ℓₙ(θ) to zero⁵.
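Sketching those two steps for this density (again, as a check on your own work): the log-likelihood of an i.i.d. sample x₁, ⋯, xₙ is

$$\ell_n(\theta) = \sum_{i=1}^{n} \log\left( \theta x_i^{\theta - 1} \right) = n \log \theta + (\theta - 1) \sum_{i=1}^{n} \log x_i,$$

and setting its derivative with respect to θ to zero gives

$$\frac{n}{\theta} + \sum_{i=1}^{n} \log x_i = 0 \quad \Longrightarrow \quad \hat{\theta} = -\frac{n}{\sum_{i=1}^{n} \log x_i}.$$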
Footnotes

1. While it’s quite difficult to come up with real-life examples that fall exactly into some theoretical distribution, in practice, models based on such simplified assumptions provide surprisingly useful results.
2. Note that in this case, we only have one unknown, θ, so solving one equation is sufficient; hence we only need to look at the first moment. In cases where there is more than one unknown parameter, we would have to compare the theoretical and estimated values of higher moments as well. For example, if we wanted to know both the lower and upper bounds of the Uniform distribution (instead of just assuming the lower bound to be 0), we would need at least the first and second moments of Xᵢ.
3. In our example, if θ were actually smaller than the longest time we waited, this would imply that we had an unlucky day during which the bus was more delayed than usual. However, in theory (and under the assumptions of our model), we assume this doesn’t happen.
5. This is a bit sloppy. In reality, we would also have to check the second derivative of the log-likelihood to confirm that the value we found is indeed a maximum rather than a minimum.