
4 Comparison of Estimators

4.1 Optimality theory

(1) Mean Squared Error (MSE)

$$R(\theta, T) = E\bigl(T(X) - \theta\bigr)^2$$

(2) Mean Absolute Error

$$E\,|T(X) - \theta|$$

If T ∼ N(θ, τ²) for some variance τ², then

$$E\,|T(X) - \theta| = \sqrt{\frac{2}{\pi}}\,\sqrt{R(\theta, T)}.$$

Relationship between MSE and bias and variance of an estimator:

$$R(\theta, T) = \mathrm{var}(T(X)) + b^2(\theta, T),$$

where b(θ, T) = E(T(X)) − θ is the bias.

Ex. (X1, . . . , Xn) ∼ N(µ, σ²), with

$$\hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2.$$

For the mean,

$$R(\theta, \bar X) = \mathrm{var}(\bar X) = \frac{\sigma^2}{n}.$$

For the variance estimator, note that nσ̂²/σ² ∼ χ² with n − 1 degrees of freedom, so

$$b(\theta, \hat\sigma^2) = \frac{\sigma^2}{n}\,E\!\left(\frac{n\hat\sigma^2}{\sigma^2}\right) - \sigma^2 = \frac{\sigma^2}{n}\,(n-1) - \sigma^2 = -\frac{\sigma^2}{n}$$

$$R(\theta, \hat\sigma^2) = \frac{\sigma^4}{n^2}\,\mathrm{var}\!\left(\frac{n\hat\sigma^2}{\sigma^2}\right) + \frac{\sigma^4}{n^2} = \frac{\sigma^4}{n^2}\,\bigl[2(n-1) + 1\bigr] = \frac{\sigma^4(2n-1)}{n^2}. \qquad \square$$
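A quick Monte Carlo check of these formulas (a minimal sketch, not part of the original notes; the sample size n and the values of µ and σ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 20, 1.0, 2.0          # assumed example values
reps = 200_000

X = rng.normal(mu, sigma, size=(reps, n))
sigma2_hat = X.var(axis=1)           # (1/n) * sum (Xi - Xbar)^2

bias = sigma2_hat.mean() - sigma**2
mse = np.mean((sigma2_hat - sigma**2) ** 2)

print(bias, -sigma**2 / n)                        # bias ~ -sigma^2/n
print(mse, sigma2_hat.var() + bias**2)            # MSE = variance + bias^2
print(mse, sigma**4 * (2 * n - 1) / n**2)         # MSE ~ sigma^4 (2n-1)/n^2
```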

Ex. Consider aX̄, 0 < a < 1.

$$R(\theta, a\bar X) = a^2\,\frac{\sigma^2}{n} + (a-1)^2 \mu^2$$

Then aX̄ is better than X̄ for µ ≈ 0. 
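A small numerical illustration of this comparison (a sketch; the shrinkage factor a, σ, n, and the grid of µ values are arbitrary assumptions):

```python
import numpy as np

n, sigma, a = 25, 1.0, 0.9                     # assumed example values
mus = np.linspace(0.0, 1.0, 5)

risk_xbar = sigma**2 / n                       # R(theta, Xbar), free of mu
risk_axbar = a**2 * sigma**2 / n + (a - 1) ** 2 * mus**2

for mu, r in zip(mus, risk_axbar):
    print(f"mu={mu:.2f}  R(aXbar)={r:.4f}  R(Xbar)={risk_xbar:.4f}")
# aXbar wins near mu = 0 and loses once (a-1)^2 mu^2 exceeds (1-a^2) sigma^2/n
```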

Inadmissible: we say the estimator S is inadmissible if there exists another estimator T

such that

R(θ, T ) ≤ R(θ, S), ∀θ,

with strict inequality holding for some θ.

* In general, no single estimator improves on all others in all respects.

* UMVUE (uniformly minimum variance unbiased estimate): an unbiased T such that

R(θ, T) = var(T) ≤ var(S) = R(θ, S), ∀θ and all unbiased S.

• Difficulty :

– (1) unbiased estimates may not exist.

– (2) even when UMVUEs exist, they may be inadmissible.

– (3) unbiasedness is not invariant under functional transformation.

Other criteria for comparing estimators:

* Bayes risk:

$$\int R(\theta, T)\,\pi(\theta)\,d\theta$$

* Worst-case risk :

$$\max_\theta R(\theta, T)$$

the estimator with minimum worst-case risk is called minimax.

4.2 UMVUEs

Ex. Consider two estimators of the standard deviation σ when the data are ∼ N(µ, σ²):

$$\hat\sigma = \sqrt{\frac{1}{n}\sum_i (X_i - \bar X)^2}, \qquad \tilde\sigma = \frac{1}{n}\sum_i |X_i - \bar X|$$

• aσ̂ is UMVUE with

$$a = \left[\frac{\sqrt{2}\,\Gamma\!\left(\tfrac{n}{2}\right)}{\sqrt{n}\,\Gamma\!\left(\tfrac{n-1}{2}\right)}\right]^{-1}$$

• cσ̃ is unbiased (but NOT UMVUE) with

$$c = \left[\sqrt{\frac{2(n-1)}{\pi n}}\right]^{-1}. \qquad \square$$
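As a sanity check, a small simulation sketch (σ, n, and the number of replications are arbitrary assumptions) confirming that aσ̂ and cσ̃ are unbiased for σ with the constants above, and that aσ̂ has the smaller variance:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
n, sigma, reps = 10, 2.0, 200_000    # assumed example values

# constants a and c from the formulas above
a = 1.0 / (np.sqrt(2.0 / n) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2)))
c = 1.0 / np.sqrt(2 * (n - 1) / (np.pi * n))

X = rng.normal(0.0, sigma, size=(reps, n))
dev = X - X.mean(axis=1, keepdims=True)
sigma_hat = np.sqrt((dev**2).mean(axis=1))       # sqrt of (1/n) sum (Xi - Xbar)^2
sigma_tilde = np.abs(dev).mean(axis=1)           # (1/n) sum |Xi - Xbar|

print((a * sigma_hat).mean(), (c * sigma_tilde).mean(), sigma)  # both ~ sigma
print((a * sigma_hat).var(), (c * sigma_tilde).var())           # UMVUE has smaller variance
```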

Theorem. (Rao-Blackwell) Suppose T(X) is sufficient for θ and Eθ(|S(X)|) < ∞. Then the estimator

defined by

T ∗ (X) = E(S(X)|T (X))

has the nice property

$$E_\theta\bigl(T^*(X) - q(\theta)\bigr)^2 \le E_\theta\bigl(S(X) - q(\theta)\bigr)^2 \quad \forall\theta.$$

If varθ(S(X)) < ∞, the inequality is strict unless T*(X) = S(X) with probability one.

p.f. Since

$$b(\theta, T^*(X)) = b(\theta, S(X))$$

(because E[E(S(X)|T(X))] = E(S(X))), it suffices to show that

$$\mathrm{var}[E(S(X)|T(X))] \le \mathrm{var}[S(X)],$$

with equality iff E(S(X)|T(X)) = S(X); this follows from the decomposition var(S) = var(E(S|T)) + E(var(S|T)), a known result from Chapter 1. □

Complete: A statistic T is complete iff the only function g, defined on the range of T, satisfying

$$E_\theta[g(T)] = 0, \quad \forall\theta$$

is g ≡ 0.
Ex. (X1, . . . , Xn) ∼ Poisson(θ) ⇒ T = Σᵢ Xᵢ is sufficient for θ and T ∼ Poisson(nθ):

$$E_\theta(g(T)) = e^{-n\theta}\sum_{i=0}^{\infty}\frac{g(i)\,(n\theta)^i}{i!} = 0, \quad \forall\theta > 0 \;\Longleftrightarrow\; g(i) = 0, \;\forall i = 0, 1, 2, \ldots$$

So T is complete. □

Theorem. (Lehmann-Scheffé) If T (X) is a complete sufficient statistic and S(X) is an

unbiased estimate of q(θ), then T ∗ = E[S(X)|T (X)] is UMVUE of q(θ). If varθ (T∗ (X)) < ∞,

T ∗ is the unique UMVUE.

p.f. Since b(θ, T*) = b(θ, S), T* is also unbiased. By Rao-Blackwell,

$$\mathrm{var}(T^*) \le \mathrm{var}(S)$$

(strictly smaller unless S is equal to T ∗ ). We need to show that T ∗ does not depend on

which unbiased estimate S we start with. Suppose

$$T_1^* = g_1(T) \quad\text{and}\quad T_2^* = g_2(T)$$

are both unbiased and obtained by Rao-Blackwellization.

Then

E(g1 (T ) − g2 (T )) = 0.

By completeness g1 = g2 , so T ∗ = E(S|T ) cannot depend on S, and uniqueness follows from

the Rao-Blackwell theorem. 

How to find UMVUE: first find a complete sufficient statistic T , then either

(1) find h(T (X)) such that h(T ) is unbiased, or

(2) find an unbiased S, then E(S|T ) is UMVUE.

Ex. (hypergeometric) X = number of successes in a sample of size n drawn without replacement from a population of size N containing Nθ successes ⇒ X is sufficient and complete:

$$E_\theta(g(X)) = \sum_{k=0}^{n} \frac{\binom{N\theta}{k}\binom{N-N\theta}{n-k}}{\binom{N}{n}}\, g(k) = 0, \quad \forall\theta = 0, 1/N, \ldots, 1.$$

When θ = 0,

E0 (g(X)) = g(0) = 0 ⇒ g(0) = 0

When θ = 1/N,

$$E_\theta(g(X)) = \frac{\binom{1}{0}\binom{N-1}{n}}{\binom{N}{n}}\,g(0) + \frac{\binom{1}{1}\binom{N-1}{n-1}}{\binom{N}{n}}\,g(1) = 0 \;\Rightarrow\; g(1) = 0.$$

So, by induction,

$$g(0) = g(1) = \cdots = g(n) = 0,$$

hence g ≡ 0. Since Eθ(X/n) = θ, X/n is UMVUE. □

Theorem. Suppose {Pθ} belongs to a k-parameter exponential family, and suppose the range of C(θ) = (C1(θ), · · · , Ck(θ)) has a nonempty interior. Then

T (X) = (T1 (X), . . . , Tk (X))

is complete and sufficient.

Ex. (X1, . . . , Xn) ∼ N(µ, σ²):

$$s^2 = \frac{\sum_i (X_i - \bar X)^2}{n-1}$$

is unbiased for σ² and is a function of the complete sufficient statistics, so it is UMVUE for σ².

Ex. (X1 , . . . , Xn ) ∼ U(0, θ)

$$p(x, \theta) = \frac{1}{\theta^n} \quad\text{if } X_{(n)} < \theta,\; X_{(1)} > 0.$$

Recall that the sufficient statistic for θ is X(n), and the density of X(n) is

$$f_{X_{(n)}}(t) = \frac{n\,t^{n-1}}{\theta^n}, \quad 0 < t < \theta.$$

Note that, if

$$E\bigl(g(X_{(n)})\bigr) = \frac{n}{\theta^n}\int_0^\theta g(t)\,t^{n-1}\,dt = 0, \quad \forall\theta > 0,$$

then g(t) = 0.

So X(n) is also complete. Since

$$E(X_{(n)}) = \frac{n}{\theta^n}\int_0^\theta t^n\,dt = \frac{n\theta}{n+1},$$

we have that (n+1)/n · X(n) is UMVUE for θ. □
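A short simulation sketch of this example (θ, n, and the number of replications are arbitrary assumptions), comparing the MLE X(n) with the bias-corrected UMVUE:

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta, reps = 10, 3.0, 200_000    # assumed example values

X = rng.uniform(0.0, theta, size=(reps, n))
mle = X.max(axis=1)                  # X_(n): biased low
umvue = (n + 1) / n * mle            # bias-corrected version

for name, est in [("X_(n)", mle), ("(n+1)/n X_(n)", umvue)]:
    bias = est.mean() - theta
    mse = np.mean((est - theta) ** 2)
    print(f"{name:>15}: bias={bias:+.4f}  MSE={mse:.5f}")
```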

Ex. X = (X1, . . . , Xn) ∼ Exp(λ),

$$f(x) = \lambda e^{-\lambda x}, \quad x > 0.$$

⇒ T = Σᵢ Xᵢ is sufficient and complete for λ. We want to estimate the quantity

$$P_\lambda(X_1 \le x) = 1 - e^{-\lambda x}.$$

Note that I(X1 ≤ x) is an unbiased estimate of 1 − e^{−λx}, and

$$E[I(X_1 \le x)\mid T = t] = P\Bigl(X_1 \le x \,\Big|\, \sum_{i=1}^n X_i = t\Bigr) = P\Bigl(\frac{X_1}{\sum_{i=1}^n X_i} \le \frac{x}{t} \,\Big|\, \sum_{i=1}^n X_i = t\Bigr)$$

Since X1 ∼ Γ(1, λ) and Σᵢ Xᵢ ∼ Γ(n, λ), the ratio X1/Σᵢ Xᵢ ∼ Beta(1, n−1) and is independent of Σᵢ Xᵢ. Therefore

$$E\Bigl[I(X_1 \le x)\,\Big|\, \sum_{i=1}^n X_i = t\Bigr] = \int_0^{x/t} \beta_{(1,\,n-1)}(u)\,du = \int_0^{x/t} (n-1)(1-u)^{n-2}\,du = \begin{cases} 1 - \bigl(1 - \tfrac{x}{t}\bigr)^{n-1}, & \text{if } t > x, \\[4pt] 1, & \text{otherwise.} \end{cases}$$

Substituting t = Σᵢ Xᵢ gives the estimator

$$T^*(X) = \begin{cases} 1 - \Bigl(1 - \dfrac{x}{\sum_{i=1}^n X_i}\Bigr)^{n-1}, & \text{if } \sum_{i=1}^n X_i > x, \\[6pt] 1, & \text{otherwise,} \end{cases}$$

which is UMVUE for Pλ(X1 ≤ x).
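The Rao-Blackwell improvement in this example can be checked numerically (a sketch; λ, n, x, and the number of replications are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, x, reps = 0.5, 8, 1.0, 200_000   # assumed example values

X = rng.exponential(1.0 / lam, size=(reps, n))   # numpy uses scale = 1/lambda
T = X.sum(axis=1)

naive = (X[:, 0] <= x).astype(float)             # unbiased but crude: I(X1 <= x)
umvue = np.where(T > x, 1.0 - (1.0 - x / T) ** (n - 1), 1.0)

truth = 1.0 - np.exp(-lam * x)
print(truth, naive.mean(), umvue.mean())   # both estimators are unbiased
print(naive.var(), umvue.var())            # the Rao-Blackwellized version has smaller variance
```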

4.3 Information Theory

* Regularity Assumptions :

(1) The set A = {x : p(x, θ) > 0} (support) does not depend on θ.

(2) ∀x ∈ A and θ ∈ Θ,

$$\frac{\partial}{\partial\theta}\log p(x, \theta)$$

exists and is finite.

(3) Eθ(|T|) < ∞, and differentiation and integration can be interchanged:

$$\frac{\partial}{\partial\theta}\int T(x)\,p(x, \theta)\,dx = \int T(x)\,\frac{\partial}{\partial\theta} p(x, \theta)\,dx.$$

* Exponential families satisfy the regularity conditions above.

* Score Function (for a single observation):

$$S(X, \theta) = \frac{\partial}{\partial\theta}\log p(X, \theta)$$

* Properties of the score function (under regularity assumptions): (1) Eθ(S(X, θ)) = 0.

p.f.

$$E_\theta\Bigl(\frac{\partial}{\partial\theta}\log p(x, \theta)\Bigr) = \int \frac{\partial}{\partial\theta}\log p(x, \theta)\; p(x, \theta)\,dx = \int \frac{\partial}{\partial\theta} p(x, \theta)\,dx = \frac{\partial}{\partial\theta}\int p(x, \theta)\,dx = \frac{\partial}{\partial\theta}\,1 = 0. \qquad \square$$

(2) $\mathrm{var}_\theta(S(X, \theta)) = -E\bigl[\frac{\partial^2}{\partial\theta^2}\log p(x, \theta)\bigr]$.

p.f. Since Eθ(S(X, θ)) = 0, varθ(S(X, θ)) = E[S(X, θ)]² = E[∂/∂θ log p(X, θ)]². The result follows by noting that

$$\begin{aligned}
-E\Bigl[\frac{\partial^2}{\partial\theta^2}\log p(x, \theta)\Bigr]
&= -\int \frac{\partial^2}{\partial\theta^2}\log p(x, \theta)\; p(x, \theta)\,dx \\
&= -\int \frac{\partial}{\partial\theta}\Bigl[\frac{\partial}{\partial\theta}\log p(x, \theta)\; p(x, \theta)\Bigr]dx + \int \frac{\partial}{\partial\theta}\log p(x, \theta)\;\frac{\partial}{\partial\theta}p(x, \theta)\,dx \\
&= -\frac{\partial}{\partial\theta}\int \frac{\partial}{\partial\theta}\log p(x, \theta)\; p(x, \theta)\,dx + \int \frac{\partial}{\partial\theta}\log p(x, \theta)\;\frac{\partial p(x, \theta)/\partial\theta}{p(x, \theta)}\;p(x, \theta)\,dx \\
&= 0 + \int \Bigl(\frac{\partial}{\partial\theta}\log p(x, \theta)\Bigr)^2 p(x, \theta)\,dx \\
&= E\Bigl(\frac{\partial}{\partial\theta}\log p(x, \theta)\Bigr)^2. \qquad \square
\end{aligned}$$

(3) covθ (S(X, θ), T (X)) = ψ ′ (θ) with ψ(θ) = Eθ (T (X)).

p.f. Since Eθ(S(X, θ)) = 0, covθ(S(X, θ), T(X)) = E[S(X, θ)T(X)], and

$$E[S(X, \theta)\,T(X)] = \int \frac{\partial}{\partial\theta}\log p(x, \theta)\; T(x)\,p(x, \theta)\,dx = \int \frac{\partial}{\partial\theta} p(x, \theta)\; T(x)\,dx = \frac{\partial}{\partial\theta}\int p(x, \theta)\,T(x)\,dx = \frac{\partial}{\partial\theta}E_\theta(T(X)) = \psi'(\theta). \qquad \square$$

* Fisher Information (for a single observation):

$$I(\theta) = E\Bigl(\frac{\partial}{\partial\theta}\log p(X, \theta)\Bigr)^2 = -E\Bigl(\frac{\partial^2}{\partial\theta^2}\log p(X, \theta)\Bigr) \quad\text{(from Property (2))}$$

Note that the Fisher information number satisfies 0 ≤ I(θ) ≤ ∞.

Ex. (X1, . . . , Xn) ∼ Poisson(θ):

$$\frac{\partial}{\partial\theta}\log p(x, \theta) = \frac{\sum_{i=1}^n x_i}{\theta} - n$$

$$I(\theta) = E\Bigl(\frac{\sum_{i=1}^n x_i}{\theta} - n\Bigr)^2 = \mathrm{var}\Bigl(\frac{\sum_{i=1}^n X_i}{\theta}\Bigr) = \frac{1}{\theta^2}\,n\theta = \frac{n}{\theta}.$$
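This can be verified by simulating the score and checking properties (1) and (2) numerically (a sketch; θ, n, and the number of replications are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 3.0, 5, 200_000     # assumed example values

X = rng.poisson(theta, size=(reps, n))
score = X.sum(axis=1) / theta - n    # d/dtheta log p(X, theta) for the full sample

print(score.mean())                  # ~ 0, property (1)
print(score.var(), n / theta)        # ~ n/theta, the Fisher information
```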

Theorem. (Information Inequality) Let T(X) be any statistic such that varθ(T(X)) < ∞, ∀θ, and let Eθ(T(X)) = ψ(θ). Under the regularity conditions, if 0 < I(θ) < ∞ and ψ is differentiable, then

$$\mathrm{var}_\theta(T(X)) \ge \frac{[\psi'(\theta)]^2}{I(\theta)}.$$


p.f. Let S = ∂/∂θ log p(X, θ) (the score). Then var(S) = I(θ) and ψ′(θ) = cov(T, S). The result follows from the Cauchy-Schwarz inequality, cov(T, S)² ≤ var(T) var(S). □

Corollary. For unbiased T(X):

$$\mathrm{var}(T(X)) \ge \frac{1}{I(\theta)}.$$

1/I(θ) is called the Cramér-Rao lower bound.

Corollary. For unbiased T*, if

$$\mathrm{var}(T^*) = \frac{1}{I(\theta)},$$

then T* is UMVUE.

In general, we have n observations and I(θ) denotes the Fisher information of the full sample. If I1(θ) is the Fisher information for a single observation, then

$$I(\theta) = n\, I_1(\theta).$$

Ex. (X1 , . . . , Xn ) ∼ N(µ, σ 2 )

$$\mathrm{var}(\bar X) = \frac{\sigma^2}{n} = \frac{1}{I(\theta)} = \frac{1}{n\,I_1(\theta)},$$

where I1(θ) = 1/σ², so X̄ attains the Cramér-Rao bound.

Theorem. If {Pθ : θ ∈ Θ} satisfies the regularity conditions and there exists an unbiased estimate T* of ψ(θ) achieving the information bound for every θ, then {Pθ} is a one-parameter exponential family with density p(x, θ) = exp[c(θ)T*(x) + d(θ) + s(x)] I_A(x). Conversely, if {Pθ} is a one-parameter exponential family and c(θ) has a continuous, non-vanishing derivative on Θ, then T(X) achieves the information bound and is a UMVUE of Eθ(T(X)).

4.4 Large Sample Theory

* Consistency: Tn(X1, . . . , Xn) is, with high probability, close to θ when n is large.


Mathematically, Tn is consistent for θ (Tn → θ in probability) iff ∀ε > 0, as n → ∞,

$$P\bigl[\,|T_n(X_1, \ldots, X_n) - \theta| \ge \varepsilon\,\bigr] \to 0.$$

Ex.

$$\hat m_j = \frac{1}{n}\sum_{i=1}^n X_i^j$$

is consistent for E(X^j) (by the law of large numbers).

Theorem. The method of moments estimator is consistent:

$$T_n = g(\hat m_1, \ldots, \hat m_r) \xrightarrow{p} g(m_1(\theta), \ldots, m_r(\theta)) = q(\theta).$$

* UMVUEs are consistent; MLEs usually are.

* Asymptotic Normality:

Tn is approximately normally distributed with mean µn(θ) and variance σn²(θ) iff

$$P(T_n(X) \le t) \approx \Phi\Bigl(\frac{t - \mu_n(\theta)}{\sigma_n(\theta)}\Bigr),$$

i.e., ∀z,

$$\lim_{n\to\infty} P\Bigl(\frac{T_n(X_1, \ldots, X_n) - \mu_n(\theta)}{\sigma_n(\theta)} \le z\Bigr) = \Phi(z).$$

* √n-consistent:

$$\sqrt{n}\,\bigl(\mu_n(\theta) - q(\theta)\bigr) \to 0, \qquad n\,\sigma_n^2(\theta) \to \sigma^2(\theta) > 0.$$

Ex. (X1 , . . . , Xn ) ∼ Bin(θ)

$$\sqrt{n}\,\bigl(q(\bar X_n) - q(\theta)\bigr) \xrightarrow{d} Z \sim N\bigl(0,\; [q'(\theta)]^2\,\theta(1-\theta)\bigr).$$

So we can take

$$\mu_n = q(\theta), \qquad \sigma_n^2 = \frac{[q'(\theta)]^2\,\theta(1-\theta)}{n}.$$
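A delta-method sanity check by simulation (a sketch; the choice q(θ) = θ(1−θ), together with θ, n, and the number of replications, is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.3, 200, 100_000        # assumed example values
q = lambda t: t * (1 - t)                 # illustrative choice of q
q_prime = lambda t: 1 - 2 * t

X = rng.binomial(1, theta, size=(reps, n))
Tn = q(X.mean(axis=1))

z = np.sqrt(n) * (Tn - q(theta))
print(z.var(), q_prime(theta) ** 2 * theta * (1 - theta))   # simulated vs delta-method variance
```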

Theorem. (a) Suppose that P = (P1 , . . . , Pk ) are the population frequencies for k categories.

Let Tn = h(P̂1 , . . . , P̂k ) with P̂i (i = 1, . . . , k) the sample frequencies. Then


$$\sqrt{n}\,\bigl(T_n - h(P_1, \ldots, P_k)\bigr) \xrightarrow{d} N(0, \sigma_h^2),$$

$$\sigma_h^2 = \sum_{i=1}^{k} P_i\Bigl(\frac{\partial}{\partial P_i}h(P)\Bigr)^2 - \Bigl[\sum_{i=1}^{k} P_i\,\frac{\partial}{\partial P_i}h(P)\Bigr]^2$$

(b) Suppose that m = (m1, . . . , mr) are the population moments. Let Tn = g(m̂1, . . . , m̂r) with m̂i (i = 1, . . . , r) the sample moments. Then

$$\sqrt{n}\,\bigl(T_n - g(m_1, \ldots, m_r)\bigr) \xrightarrow{d} N(0, \sigma_g^2),$$

$$\sigma_g^2 = \sum_{i=2}^{2r} b_i\, m_i - \Bigl[\sum_{i=1}^{r} m_i\,\frac{\partial}{\partial m_i} g(m)\Bigr]^2 = \mathrm{var}\Bigl(\sum_{i=1}^{r}\frac{\partial}{\partial m_i} g(m)\, X^i\Bigr), \qquad b_i = \sum_{\substack{j+k=i \\ 1\le j,\,k\le r}} \frac{\partial}{\partial m_j} g(m)\,\frac{\partial}{\partial m_k} g(m).$$

Ex. (Hardy-Weinberg Equilibrium) N1 = #(AA), N2 = #(Aa), N3 = #(aa), θ = P(A), P1 = P(AA) = θ², P2 = P(Aa) = 2θ(1 − θ), P3 = P(aa) = (1 − θ)².


$$T_1(X) = \sqrt{\frac{N_1}{n}}, \qquad T_2(X) = 1 - \sqrt{\frac{N_3}{n}}, \qquad T_3(X) = \frac{N_1}{n} + \frac{N_2}{2n} \quad\text{(MLE)}$$

$$h_1(P_1, P_2, P_3) = \sqrt{P_1}, \qquad h_2(P_1, P_2, P_3) = 1 - \sqrt{P_3}, \qquad h_3(P_1, P_2, P_3) = P_1 + \tfrac{1}{2}P_2$$
Writing Xi for the indicator of genotype category i in a single observation (so var(Xi) = Pi(1 − Pi) and cov(Xi, Xj) = −Pi Pj), the theorem gives

$$\sigma_1^2 = \Bigl(\frac{1}{2\sqrt{P_1}}\Bigr)^2 \mathrm{var}(X_1) = \frac{1}{4P_1}\,P_1(1-P_1) = \frac{1-P_1}{4} = \frac{1-\theta^2}{4}$$

$$\sigma_2^2 = \Bigl(-\frac{1}{2\sqrt{P_3}}\Bigr)^2 \mathrm{var}(X_3) = \frac{1}{4P_3}\,P_3(1-P_3) = \frac{1-P_3}{4} = \frac{1-(1-\theta)^2}{4}$$

$$\begin{aligned}
\sigma_3^2 &= 1^2\,\mathrm{var}(X_1) + \Bigl(\tfrac{1}{2}\Bigr)^2 \mathrm{var}(X_2) + 2\cdot 1\cdot\tfrac{1}{2}\,\mathrm{cov}(X_1, X_2) \\
&= P_1(1-P_1) + \tfrac{1}{4}\,P_2(1-P_2) - P_1 P_2 \\
&= \theta^2(1-\theta^2) + \tfrac{1}{4}\cdot 2\theta(1-\theta)\bigl[1 - 2\theta(1-\theta)\bigr] - \theta^2\cdot 2\theta(1-\theta) \\
&= -\frac{\theta^2}{2} + \frac{\theta}{2} = \frac{\theta(1-\theta)}{2}.
\end{aligned}$$
Ex. σ̂² = m̂₂ − m̂₁², i.e. g(m1, m2) = m2 − m1², with

$$\frac{\partial}{\partial m_1} g(m_1, m_2) = -2m_1, \qquad \frac{\partial}{\partial m_2} g(m_1, m_2) = 1.$$

$$\therefore\; \sigma_g^2 = \mathrm{var}(-2m_1\cdot X + 1\cdot X^2) = \mathrm{var}\bigl((X - m_1)^2\bigr) = E(X - m_1)^4 - \sigma^4 = m_4 - 4m_1 m_3 + 8m_1^2 m_2 - 4m_1^4 - m_2^2. \qquad \square$$

* Asymptotic Relative Efficiency (ARE): suppose

$$T_n^{(1)}:\; n\,\sigma_{n,1}^2 = \sigma_1^2, \qquad T_n^{(2)}:\; n\,\sigma_{n,2}^2 = \sigma_2^2.$$

The asymptotic relative efficiency of T⁽¹⁾ to T⁽²⁾ is

$$e(\theta, T^{(1)}, T^{(2)}) = \frac{\sigma_2^2}{\sigma_1^2}.$$

Ex. (HWE)

$$\sigma_1^2 = \frac{1-\theta^2}{4}, \qquad \sigma_2^2 = \frac{1-(1-\theta)^2}{4}, \qquad e(\theta, T_1, T_2) = \frac{1-(1-\theta)^2}{1-\theta^2}.$$

T1 is better when θ > 1/2, and T1 ≈ T2 when θ = 1/2.
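A short numeric sketch of this efficiency comparison (the grid of θ values is an arbitrary choice):

```python
import numpy as np

thetas = np.linspace(0.1, 0.9, 9)
sigma1_sq = (1 - thetas**2) / 4            # asymptotic variance of T1
sigma2_sq = (1 - (1 - thetas) ** 2) / 4    # asymptotic variance of T2
are = sigma2_sq / sigma1_sq                # e > 1  <=>  T1 is more efficient

for t, e in zip(thetas, are):
    print(f"theta={t:.1f}  e(T1,T2)={e:.3f}")
# the ratio crosses 1 exactly at theta = 1/2
```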

Does the asymptotic variance σ²(θ) satisfy

$$\sigma^2(\theta) \ge \frac{\{\psi'(\theta)\}^2}{I_1(\theta)}\;?$$

This generally holds.

Ex. Sample variance when X ∼ Poisson(θ):

$$\sigma^2(\theta) = E(X - \theta)^4 - \theta^2 = \theta + 3\theta^2 - \theta^2 = \theta(1 + 2\theta)$$

$$q'(\theta) = \frac{\partial}{\partial\theta}\,\theta = 1$$

$$I_1(\theta) = E\Bigl[\frac{\partial}{\partial\theta}\log\frac{\theta^X e^{-\theta}}{X!}\Bigr]^2 = E\Bigl[\frac{X}{\theta} - 1\Bigr]^2 = \frac{\mathrm{var}(X)}{\theta^2} = \frac{1}{\theta}$$

$$\therefore\;\text{Information bound} = \theta \le \theta(1 + 2\theta).$$
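A quick simulation of this comparison (a sketch; θ, n, and the number of replications are arbitrary assumptions), checking that the scaled variance of the sample variance exceeds the bound while X̄ attains it:

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 2.0, 400, 50_000            # assumed example values

X = rng.poisson(theta, size=(reps, n))
s2 = X.var(axis=1)                           # sample variance with 1/n normalization

print(n * s2.var(), theta * (1 + 2 * theta))   # ~ theta(1 + 2*theta) > theta
print(n * X.mean(axis=1).var(), theta)         # Xbar attains the bound theta
```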

* Asymptotically efficient: σ²(θ) attains the information bound.

* MLEs are asymptotically efficient.

Ex. X ∼ Poisson(θ). The MLE for θ is X̄, and

$$\mathrm{var}(\bar X) = \frac{\theta}{n},$$

achieving the information bound. □

Ex. X ∼ Bin(θ):

$$I(\theta) = E\Bigl[\frac{\partial}{\partial\theta}\log\bigl(\theta^X(1-\theta)^{1-X}\bigr)\Bigr]^2 = E\Bigl[\frac{X}{\theta} - \frac{1-X}{1-\theta}\Bigr]^2 = E\Bigl[\frac{(X-\theta)^2}{\theta^2(1-\theta)^2}\Bigr] = \frac{1}{\theta(1-\theta)}$$

∴ q(X̄) is asymptotically efficient, since its asymptotic variance [q′(θ)]²θ(1 − θ) equals [q′(θ)]²/I(θ). □

*Proof of MLE’s asymptotic efficiency


Let ℓᵢ(θ) = ∂/∂θ log p(Xᵢ, θ), and let θ̂ be the MLE. Then

$$0 = \sum_i \ell_i(\hat\theta) \approx \sum_i \ell_i(\theta) + \Bigl[\sum_i \ell_i'(\theta^*)\Bigr](\hat\theta - \theta),$$

so

$$\sqrt{n}\,(\hat\theta - \theta) \approx \frac{-\frac{1}{\sqrt{n}}\sum_i \ell_i(\theta)}{\frac{1}{n}\sum_i \ell_i'(\theta^*)} \;\xrightarrow[\text{Slutsky}]{\text{CLT}}\; N\Bigl(0, \frac{1}{I_1(\theta)}\Bigr),$$

since (1/√n) Σᵢ ℓᵢ(θ) converges in distribution to N(0, I1(θ)) by the CLT, and (1/n) Σᵢ ℓᵢ′(θ*) converges in probability to −I1(θ). Hence the MLE is asymptotically normal and efficient.

(The MLE is not the only method guaranteeing asymptotic optimality: UMVUEs, one-step Newton approximations to the MLE, and Bayes estimates can also achieve it.)

Ex. q(θ) = θ(1 − θ)


$$\text{UMVUE} = \frac{n}{n-1}\,\bar X(1 - \bar X)$$

$$\sqrt{n}\,\bigl\{T_n - \theta(1-\theta)\bigr\} = \frac{n}{n-1}\,\sqrt{n}\,\Bigl\{\bar X_n(1-\bar X_n) - \frac{n-1}{n}\,\theta(1-\theta)\Bigr\}$$

has the same limit as

$$\sqrt{n}\,\bigl\{\bar X_n(1-\bar X_n) - \theta(1-\theta)\bigr\}.$$

∴ the UMVUE has the same asymptotic distribution as the MLE. □
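This agreement can be seen numerically (a sketch; θ, n, and the number of replications are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 0.3, 500, 100_000       # assumed example values
q = theta * (1 - theta)

X = rng.binomial(1, theta, size=(reps, n))
xbar = X.mean(axis=1)
mle = xbar * (1 - xbar)                  # plug-in (MLE) estimate of theta(1-theta)
umvue = n / (n - 1) * mle                # bias-corrected UMVUE

z_mle = np.sqrt(n) * (mle - q)
z_umvue = np.sqrt(n) * (umvue - q)
print(z_mle.var(), z_umvue.var(), (1 - 2 * theta) ** 2 * theta * (1 - theta))
# all three agree: the common N(0, [q'(theta)]^2 theta(1-theta)) limit
```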
