Lecture 2: Point estimation
Oli Atlason
Outline
• Parametric Statistical Models
• Method of Moments and Maximum Likelihood
• Bias and MSE
• Asymptotic Properties
Parametric Statistical Models
Example
We observe $X_1, \dots, X_n$.

$X_i$: number of alpha particles emitted by a sample during the $i$th time interval of an experiment.

Natural model: $X_i \sim \mathrm{Poisson}(\lambda)$ with $X_1, \dots, X_n$ independent.

Poisson distribution
$$P_\lambda(X_i = k) = p^{\mathrm{Poisson}}_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k \in \mathbb{N}, \ \lambda > 0$$

Formal model
consists, for a given $n$, of the family
$$\left\{ p^{\mathrm{Poisson}}_\lambda(X_i) \right\}_{\lambda \in \mathbb{R}_+}$$
Here the parameter space is $\mathbb{R}_+$.
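As a concrete illustration, here is a minimal sketch of drawing a sample from this model; numpy is assumed, and the sample size and rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

lam_true = 3.87   # hypothetical emission rate (particles per interval)
n = 100           # number of observed time intervals

# X_i ~ Poisson(lam_true), independent: one count per time interval
x = rng.poisson(lam_true, size=n)
print(x[:10], x.mean())
```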
Parametric Statistical Models
General definition
A parametric statistical model for observations $X = (X_1, \dots, X_n)$ is a family $\{f_\theta(x)\}_{\theta \in \Theta}$ of probability distributions.

We want to know which $f_\theta$ is responsible.

Point estimator: a function $\tilde\theta(x)$ of the observations. This is a guess at what $\theta$ was used to generate the data.

Sampling distribution
$\tilde\theta(X)$ takes random values. It has a distribution, derived from that of $X$.
Method of Moments
... is a simple way of finding an estimator.

kth sample moment
$$\hat\mu_k = \frac{1}{n} \sum_{i=1}^n (X_i)^k$$

kth population moment
$$\mu_k(\theta) = E_\theta[X^k]$$

Now solve for $\theta$ in the system $(\mu_k(\theta) = \hat\mu_k)_{k=1,\dots,p}$.

Usually we need as many equations ($p$) as there are parameters.
Method of Moments: Examples
Poisson
For the Poisson model, $\theta = \lambda$. Also
$$E_\lambda[X] = \sum_{k=0}^{\infty} k\, P(X = k) = \sum_{k=0}^{\infty} k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda,$$
thus the method of moments estimator is
$$\tilde\lambda(X) = \hat\mu_1 = \frac{1}{n} \sum_{i=1}^n X_i.$$

Normal
There are two parameters, $\theta = (\mu, \sigma^2)$, so we need two equations. We know that $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Thus
$$\mu_1 = \mu, \qquad \mu_2 = \mu^2 + \sigma^2.$$
The method of moments estimators are $\tilde\mu = \hat\mu_1$ and
$$\tilde\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_i x_i^2 - \bar x^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2 + \frac{2}{n}\sum_i x_i \bar x - \bar x^2 - \bar x^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2.$$
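A minimal sketch of these estimators in code, assuming numpy; the simulated normal data and parameter values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=500)  # hypothetical N(mu, sigma^2) sample

# Sample moments
mu1_hat = np.mean(x)      # first sample moment
mu2_hat = np.mean(x**2)   # second sample moment

# Method of moments: solve mu = mu1_hat, mu^2 + sigma^2 = mu2_hat
mu_mom = mu1_hat
sigma2_mom = mu2_hat - mu1_hat**2   # equals np.mean((x - x.mean())**2)

print(mu_mom, sigma2_mom)
```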
Maximum Likelihood
The likelihood is the density of the data viewed as a function of $\theta$. When the data are i.i.d. (independent and identically distributed) with density $f_\theta$, then
$$L(\theta) = f_\theta(X_1, \dots, X_n) = \prod_{i=1}^n f_\theta(X_i).$$
The maximum likelihood estimate (MLE), $\hat\theta$, is the $\theta$ that maximizes the likelihood.

The idea is that $\hat\theta$ is the value of $\theta$ for which the observed sample is most likely.

Finding MLE
• Helpful to take logs
• Usually use calculus ($L'(\theta) = 0$)
• Remember to check values at boundaries and second derivatives
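When no closed form exists, the maximization can be done numerically. A minimal sketch, assuming scipy, minimizing the negative Poisson log-likelihood; here the closed form is known, which lets us check the answer.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln  # log(k!) = gammaln(k + 1)

rng = np.random.default_rng(2)
x = rng.poisson(3.87, size=100)  # illustrative data

def neg_log_lik(lam):
    # -l(lambda) = -(sum x_i log(lam) - n*lam - sum log x_i!)
    return -(np.sum(x) * np.log(lam) - len(x) * lam - np.sum(gammaln(x + 1)))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())  # numeric MLE should match the closed form, the sample mean
```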
MLE: Example
Poisson
$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$
Take logs (since $\operatorname{argmax}_x \log f(x) = \operatorname{argmax}_x f(x)$):
$$l(\lambda) = \log L(\lambda) = \sum_{i=1}^n \big( X_i \log\lambda - \lambda - \log X_i! \big)$$
$$l'(\lambda) = \frac{1}{\lambda} \sum_i X_i - n,$$
i.e. the MLE is $\hat\lambda = \frac{1}{n}\sum_i X_i$.

In this example, method of moments and maximum likelihood give the same answer.

Note: we don't really need all of $X = (X_1, \dots, X_n)$ here, only the sum. $T(X) = \sum_i X_i$ is called a sufficient statistic.
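A small numerical illustration of sufficiency, assuming numpy and scipy: two hypothetical datasets of the same size and same sum give Poisson log-likelihood curves that differ only by a constant free of $\lambda$ (the $\log X_i!$ terms), so they lead to the same MLE.

```python
import numpy as np
from scipy.special import gammaln

x1 = np.array([2, 3, 7])   # hypothetical counts, sum = 12
x2 = np.array([4, 4, 4])   # different counts, same sum = 12

def log_lik(x, lam):
    return np.sum(x) * np.log(lam) - len(x) * lam - np.sum(gammaln(x + 1))

grid = np.linspace(0.5, 10, 200)
l1 = np.array([log_lik(x1, g) for g in grid])
l2 = np.array([log_lik(x2, g) for g in grid])

# After removing the lambda-free constant, the curves coincide;
# both are maximized at lambda = sum(x)/n = 4.
print(np.allclose(l1 - l1.max(), l2 - l2.max()))
print(grid[l1.argmax()], grid[l2.argmax()])
```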
Bias and MSE
$\tilde\theta(X)$ is an estimator of $\theta$. Then
$$\mathrm{Bias}_{\tilde\theta} = E_\theta[\tilde\theta(X)] - \theta$$
$$\mathrm{MSE}_{\tilde\theta} = E_\theta\big[(\tilde\theta(X) - \theta)^2\big]$$

Note
• $\tilde\theta(X)$ is random (a function of the sample)
• Bias and MSE are nonrandom
• Bias and MSE are functions of $\theta$

When $E_\theta[\tilde\theta] = \theta$, the estimator is called unbiased.
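Bias and MSE can be estimated by simulation: draw many samples, compute the estimator on each, and average. A minimal sketch, assuming numpy, for the Poisson MLE $\hat\lambda = \bar X$, which is in fact unbiased with MSE equal to its variance $\lambda/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 3.87, 50, 20_000   # illustrative values

# One estimate per simulated sample: the sampling distribution of lambda-hat
est = rng.poisson(lam, size=(reps, n)).mean(axis=1)

bias = est.mean() - lam
mse = np.mean((est - lam) ** 2)
print(bias, mse, lam / n)  # bias ~ 0, mse ~ lambda/n
```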
Bias Example: Sample variance
For an i.i.d. sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, the method of moments and maximum likelihood estimators coincide and equal
$$\hat\sigma^2 = \frac{1}{n} \sum_i (X_i - \bar X)^2.$$
This estimator is biased:
$$E[\hat\sigma^2] = \frac{1}{n} E\Big[\sum_i \big(X_i^2 - 2 X_i \bar X + \bar X^2\big)\Big] = \frac{1}{n} E\Big[\sum_i X_i^2 - n \bar X^2\Big] = E[X_1^2] - E[\bar X^2]$$
$$= (\sigma^2 + \mu^2) - \Big(\frac{\sigma^2}{n} + \mu^2\Big) = \sigma^2\, \frac{n-1}{n},$$
which implies
$$\mathrm{Bias} = E[\hat\sigma^2] - \sigma^2 = -\frac{\sigma^2}{n}.$$
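A quick simulation check of the $(n-1)/n$ factor, assuming numpy; parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n, reps = 2.0, 1.5, 10, 100_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
sigma2_hat = np.mean((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(sigma2_hat.mean(), sigma2 * (n - 1) / n)  # should be close
```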
Bias Example: Sample variance
From last lecture, the sample variance is
$$S_n^2 = \frac{1}{n-1} \sum_i (X_i - \bar X)^2.$$
By the preceding derivation, $S_n^2$ is an unbiased estimate of $\sigma^2$.

Frequently, method of moments estimators and MLEs are biased and can be made slightly better by a small change.
Bias-variance tradeoff
We would like no bias and low variance. Often there is a choice:
• low bias, high variance
• some bias, some variance
• high bias, low variance

Illustration
(figure omitted)
Mean Squared Error
The MSE combines the bias and the variance of the estimator:
$$\mathrm{MSE}_{\tilde\theta} = E\big[(\tilde\theta - \theta)^2\big] = E\Big[\big(\tilde\theta - E[\tilde\theta] + E[\tilde\theta] - \theta\big)^2\Big]$$
$$= E\Big[\big(\tilde\theta - E[\tilde\theta]\big)^2\Big] + E\big[(E[\tilde\theta] - \theta)^2\big] + 2\, E\big[\tilde\theta - E[\tilde\theta]\big]\, \big(E[\tilde\theta] - \theta\big)$$
$$= \mathrm{Var}[\tilde\theta] + \mathrm{Bias}^2,$$
where the cross term vanishes because $E[\tilde\theta - E[\tilde\theta]] = 0$.

Bias and variance correspond to the informal notions of accuracy and precision, respectively.
MSE Example: Sample Variance
To compute the MSE of $S_n^2$, recall that under independence and normality
$$\frac{\sum_i (X_i - \bar X)^2}{\sigma^2} \sim \chi^2_{n-1},$$
which has mean $n-1$ and variance $2(n-1)$. Thus
$$\mathrm{MSE}_{S_n^2} = \mathrm{Bias}^2 + \mathrm{Var}[S_n^2] = \mathrm{Var}\Big[\frac{1}{n-1}\sum_i (X_i - \bar X)^2\Big] = \Big(\frac{\sigma^2}{n-1}\Big)^2 \times 2(n-1) = \frac{2\sigma^4}{n-1}.$$
For the MLE, however,
$$\mathrm{MSE}_{\hat\sigma^2} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\sigma^2] = \Big(-\frac{\sigma^2}{n}\Big)^2 + \mathrm{Var}\Big[\frac{1}{n}\sum_i (X_i - \bar X)^2\Big] = \frac{\sigma^4}{n^2} + \sigma^4\, \frac{2(n-1)}{n^2} = \sigma^4\, \frac{2n-1}{n^2} < \mathrm{MSE}_{S_n^2}.$$
MSE Example: Sample Variance
The MSE is perhaps not natural for scale parameters. In fact, the minimum-MSE estimator is
$$\frac{1}{n+1} \sum_i (X_i - \bar X)^2.$$
The estimators are asymptotically identical ($\lim_{n\to\infty} \frac{n+1}{n-1} = 1$).

Method of moments and MLE estimators are rarely unbiased. However, the MLE has nice asymptotic properties.
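A simulation sketch comparing the three divisors ($n-1$, $n$, $n+1$) against the theoretical MSEs above; numpy assumed, parameter values illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 1.5, 10, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)  # sum of squares

for div, label in [(n - 1, "S_n^2"), (n, "MLE"), (n + 1, "min MSE")]:
    est = ss / div
    print(label, np.mean((est - sigma2) ** 2))

# Theory: 2*sigma^4/(n-1) for S_n^2 and (2n-1)*sigma^4/n^2 for the MLE
print(2 * sigma2**2 / (n - 1), (2 * n - 1) * sigma2**2 / n**2)
```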
Example: Gamma distribution, one observation
Recall that $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$.
Important property: $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ for $\alpha > 1$.

Gamma distribution with parameters $\alpha, \beta > 0$:
$$f_{\alpha,\beta}(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0.$$

We have one observation $X \in \mathbb{R}_+$, and we know $\alpha$. Find the MLE for $\beta$; compute its bias and MSE.

Solution:
$$l(\beta) = \alpha \log\beta - \log\Gamma(\alpha) + (\alpha - 1)\log x - \beta x$$
$$l'(\beta) = \frac{\alpha}{\beta} - x = 0,$$
so our candidate is $\hat\beta = \frac{\alpha}{X}$.
Example: Gamma distribution, one observation (2)
Check:
$$l''(\beta) = -\frac{\alpha}{\beta^2} < 0,$$
and $\frac{\alpha}{x} > 0$, i.e. $\hat\beta$ lies in the parameter space. Moments:
$$E_\beta[\hat\beta] = \int_0^\infty \frac{\alpha}{x}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}\, dx = \alpha\beta\, \frac{\Gamma(\alpha-1)}{\Gamma(\alpha)} \int_0^\infty \frac{\beta^{\alpha-1}}{\Gamma(\alpha-1)}\, x^{(\alpha-1)-1} e^{-\beta x}\, dx = \beta\, \frac{\alpha}{\alpha-1},$$
where the last integral equals 1, being that of a $\mathrm{Gamma}(\alpha-1, \beta)$ density. In the same way,
$$E_\beta[\hat\beta^2] = \alpha^2 \beta^2\, \frac{\Gamma(\alpha-2)}{\Gamma(\alpha)} = \beta^2\, \frac{\alpha^2}{(\alpha-1)(\alpha-2)}.$$
Example: Gamma distribution, one observation (3)
We found $E_\beta[\hat\beta] = \beta\,\frac{\alpha}{\alpha-1}$ and $E_\beta[\hat\beta^2] = \beta^2\,\frac{\alpha^2}{(\alpha-1)(\alpha-2)}$, so
$$\mathrm{Bias} = E[\hat\beta] - \beta = \Big(\frac{\alpha}{\alpha-1} - 1\Big)\beta = \frac{\beta}{\alpha-1}$$
$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var}[\hat\beta] = \beta^2 \Big(\frac{1}{\alpha-1}\Big)^2 + E[\hat\beta^2] - E[\hat\beta]^2$$
$$= \beta^2 \left[ \frac{1}{(\alpha-1)^2} + \frac{\alpha^2}{(\alpha-1)(\alpha-2)} - \frac{\alpha^2}{(\alpha-1)^2} \right] = \beta^2\, \frac{\alpha+2}{(\alpha-1)(\alpha-2)}.$$
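A simulation check of these formulas, assuming numpy; note the parameterization: numpy's gamma sampler takes a scale, which is $1/\beta$ here. Parameter values are illustrative (with $\alpha > 2$ so the MSE formula applies).

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta, reps = 5.0, 2.0, 500_000

# One Gamma(alpha, beta) observation per replication; numpy uses scale = 1/beta
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=reps)
beta_hat = alpha / x

print(beta_hat.mean() - beta, beta / (alpha - 1))            # bias
print(np.mean((beta_hat - beta) ** 2),
      beta**2 * (alpha + 2) / ((alpha - 1) * (alpha - 2)))   # MSE
```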
Properties of MLE
Invariance
MLEs are invariant under transformations: the MLE of $g(\theta)$ is $g(\hat\theta)$.
Example: in the Poisson model, $\theta = \frac{1}{\lambda}$ measures the waiting time between observations. By invariance,
$$\hat\theta = \frac{1}{\hat\lambda} = \frac{n}{\sum_i X_i}.$$

Consistency
Let $\hat\theta_n$ be the MLE obtained from $X_1, \dots, X_n$. Then, under minimal technical conditions,
$$\hat\theta_n \xrightarrow{P} \theta.$$
Compare with the statements of:
• unbiasedness ($E[\hat\theta] = \theta$),
• strong consistency, $\hat\theta_n \to \theta$ a.s.

Method of moments estimators are often also consistent.
Properties of MLE (2)
Theorem (Cramér-Rao). Under regularity conditions, notably
$$\frac{\partial}{\partial\theta} E_\theta[\tilde\theta(X)] = \int \tilde\theta(x)\, \frac{\partial}{\partial\theta} f(x \mid \theta)\, dx,$$
we have
$$\mathrm{Var}_\theta[\tilde\theta(X)] \ge \frac{\big(1 + \mathrm{Bias}'(\theta)\big)^2}{n I(\theta)},$$
where
$$I(\theta) = E_\theta\Big[ \Big( \frac{\partial}{\partial\theta} \log f(X \mid \theta) \Big)^2 \Big] = -E_\theta\Big[ \frac{\partial^2}{\partial\theta^2} \log f(X \mid \theta) \Big]$$
is the Fisher information of $f_\theta$.

Theorem (Fisher). The MLE achieves this bound asymptotically, and
$$\sqrt{n}\,\big(\hat\theta - \theta\big) \xrightarrow{D} N\Big(0, \frac{1}{I(\theta)}\Big).$$
The MLE is asymptotically efficient, i.e. it attains the lowest possible variance.
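A simulation sketch of Fisher's theorem for the Poisson MLE, where $I(\lambda) = 1/\lambda$ (a standard fact), so $\sqrt{n}(\hat\lambda - \lambda)$ should have variance close to $\lambda$; numpy assumed, values illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 3.87, 200, 50_000

lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (lam_hat - lam)

# Fisher: z is approximately N(0, 1/I(lambda)) = N(0, lambda)
print(z.mean(), z.var(), lam)
```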
Example: Gamma, n observations
Let $X_1, \dots, X_n$ be i.i.d. $\mathrm{Gamma}(\alpha, \beta)$ with both $\alpha$ and $\beta$ unknown.
$$L(\alpha, \beta) = \prod_{i=1}^n f_{\alpha,\beta}(x_i) = \prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)}\, x_i^{\alpha-1} e^{-\beta x_i}$$
$$l(\alpha, \beta) = \sum_i \big( \alpha\log\beta - \log\Gamma(\alpha) + (\alpha-1)\log x_i - \beta x_i \big)$$
$$l_\alpha(\alpha, \beta) = n\log\beta + \sum_i \log x_i - n\, \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$$
$$l_\beta(\alpha, \beta) = n\, \frac{\alpha}{\beta} - \sum_i x_i$$
The last equation gives $\hat\beta = \frac{n\hat\alpha}{\sum_i x_i}$. We must solve $l_\alpha(\alpha, \beta) = 0$ numerically.

It is hard to compute the small-sample bias and MSE, so we use asymptotic methods.
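A minimal sketch of the numerical solve, assuming scipy, where $\Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function: substituting $\beta = \alpha/\bar x$ into $l_\alpha = 0$ leaves one equation in $\alpha$, solved here by bracketing a root.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(8)
x = rng.gamma(shape=3.0, scale=1.0 / 2.0, size=500)  # illustrative Gamma(3, 2) data

# Substituting beta = alpha / xbar into l_alpha = 0 gives one equation in alpha:
# log(alpha) - digamma(alpha) = log(xbar) - mean(log x)
rhs = np.log(x.mean()) - np.mean(np.log(x))

def score(a):
    return np.log(a) - digamma(a) - rhs

alpha_hat = brentq(score, 1e-6, 1e6)  # log(a) - digamma(a) decreases from +inf to 0
beta_hat = alpha_hat / x.mean()
print(alpha_hat, beta_hat)
```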
Example: Gamma, n observations (2)
Fisher information: $nI(\theta) = -E[l''(\theta)]$, computed entrywise:
$$l_{\alpha\alpha}(\alpha, \beta) = -n\, \Big(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\Big)' = -n\, \frac{\Gamma''(\alpha)\Gamma(\alpha) - \Gamma'(\alpha)\Gamma'(\alpha)}{(\Gamma(\alpha))^2}$$
$$l_{\alpha\beta}(\alpha, \beta) = \frac{n}{\beta}$$
$$l_{\beta\beta}(\alpha, \beta) = -n\, \frac{\alpha}{\beta^2}$$
We can use e.g. the approximation
$$\sqrt{n}\,(\hat\beta - \beta) \sim N\Big(0, \frac{\beta^2}{\alpha}\Big), \quad \text{i.e.} \quad \hat\beta \sim N\Big(\beta, \frac{\beta^2}{n\alpha}\Big).$$
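This approximation yields, for example, an approximate standard error and 95% interval for $\beta$ by plugging the estimates into the variance formula. A sketch continuing from the previous snippet; the numeric values stand in for the fitted $\hat\alpha$, $\hat\beta$ and are hypothetical.

```python
import numpy as np

n = 500                           # sample size used above
alpha_hat, beta_hat = 3.1, 2.05   # hypothetical stand-ins for the fitted values

se = beta_hat / np.sqrt(n * alpha_hat)  # from beta_hat ~ N(beta, beta^2/(n alpha))
print(beta_hat - 1.96 * se, beta_hat + 1.96 * se)
```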