
ESTIMATORS

Sufficient Statistics

Let X1 , X2 , . . . , Xn be a random sample from a variable with probability density function


f (x; θ) and let Y1 , Y2 , . . . , Yn be n statistics defined by

Y1 = u1(X1, X2, . . . , Xn)
Y2 = u2(X1, X2, . . . , Xn)
Y3 = u3(X1, X2, . . . , Xn)
⋮
Yn = un(X1, X2, . . . , Xn)

where the transformation is 1–1.

The joint probability density function of Y1 , Y2 , . . . , Yn is

$$g(y_1,\dots,y_n;\theta)=f\big(w_1(y_1,\dots,y_n);\theta\big)\,f\big(w_2(y_1,\dots,y_n);\theta\big)\cdots f\big(w_n(y_1,\dots,y_n);\theta\big)\,|J|,$$

where xi = wi(y1, . . . , yn) denotes the inverse transformation and J is its Jacobian,

and the marginal probability density function of Y1 is


$$g_1(y_1;\theta)=\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} g(y_1,y_2,\dots,y_n;\theta)\,dy_2\cdots dy_n$$

and the conditional probability density function of Y2 , . . . , Yn given Y1 = y1 is

$$h(y_2,\dots,y_n\mid y_1;\theta)=\frac{g(y_1,y_2,\dots,y_n;\theta)}{g_1(y_1;\theta)}$$

provided g1(y1; θ) > 0. In general this conditional distribution depends on θ; when it does
not, we have the following definition of a sufficient statistic.

Sufficient Statistic

If for n fixed, X1 , X2 , . . . , Xn is a random sample from a variable with probability density


function f(x; θ), and the statistics Y1, Y2, . . . , Yn are defined by a transformation which
is 1–1, then the statistic Y1 = u1(X1, X2, . . . , Xn) is a sufficient statistic for θ if and only
if, for all other statistics Y2 = u2(X1, X2, . . . , Xn), . . . , Yn = un(X1, X2, . . . , Xn) for which
the Jacobian is non-zero, the conditional probability density function h(y2, y3, . . . , yn | y1)
of Y2, Y3, . . . , Yn given Y1 = y1 does not depend on θ for any value y1.


Fisher–Neyman Criterion

If X1 , . . . , Xn is a random sample from a random variable with probability density function


f(x; θ), θ ∈ Ω, and Y1 = u1(X1, . . . , Xn) is a statistic with probability density function
g1 (y1 ; θ), then Y1 is sufficient for θ iff

Πf (xi ; θ) = g1 (u1 (x1 , . . . , xn ); θ)H(x1 , . . . , xn )

where H(x1 , . . . , xn ) does not depend on θ.

Example One

If the random variable X has probability density function

$$f(x;\theta)=\theta^{x}(1-\theta)^{1-x},\qquad x=0,1,\quad 0<\theta<1$$


then Y1 = ΣXi has probability density function

$$g_1(y_1;\theta)=\frac{n!}{y_1!\,(n-y_1)!}\,\theta^{y_1}(1-\theta)^{n-y_1},\qquad y_1=0,1,\dots,n.$$

The joint probability density function of X1 , . . . , Xn is

$$\theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2}\cdots\theta^{x_n}(1-\theta)^{1-x_n}
=\theta^{x_1+x_2+\cdots+x_n}(1-\theta)^{\,n-(x_1+x_2+\cdots+x_n)}
=g_1(y_1;\theta)\times\frac{(x_1+x_2+\cdots+x_n)!\,\big(n-(x_1+x_2+\cdots+x_n)\big)!}{n!}$$
and Y1 is sufficient for θ.
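This sufficiency can also be seen empirically: given Y1 = ΣXi, every arrangement of the successes is equally likely, whatever the value of θ. The sketch below (a NumPy simulation; the sample size n = 3, the conditioning value ΣXi = 1 and the two θ values are arbitrary illustrative choices, not taken from the text) estimates the conditional distribution of (X1, X2, X3) given ΣXi = 1 for two different θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 3, 200_000

def conditional_given_sum(theta, total=1):
    """Estimate P((X1,...,Xn) = pattern | sum(X) = total) by simulation."""
    x = rng.binomial(1, theta, size=(reps, n))
    kept = x[x.sum(axis=1) == total]
    patterns, counts = np.unique(kept, axis=0, return_counts=True)
    return patterns, counts / counts.sum()

for theta in (0.2, 0.7):
    patterns, freqs = conditional_given_sum(theta)
    print(theta, [tuple(p) for p in patterns], np.round(freqs, 3))
# Both values of theta give frequencies close to (1/3, 1/3, 1/3):
# given the sufficient statistic, the conditional distribution is free of theta.
```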


Example Two

If the random variable X has probability density function

$$f(x;\theta)=e^{-(x-\theta)},\qquad \theta<x<\infty,\quad -\infty<\theta<\infty$$

then Y1 = X(1) has probability density function

$$g_1(y_1;\theta)=n\,e^{-n(y_1-\theta)},\qquad \theta<y_1<\infty.$$

The joint probability density function of X1 , . . . , Xn is


$$\prod_{1}^{n} e^{-(x_i-\theta)}=e^{-\sum x_i}\,e^{n\theta}
=g_1(x_{(1)};\theta)\times\frac{e^{-\sum x_i}}{n\,e^{-n x_{(1)}}}$$

and so Y1 = X(1) is sufficient for θ.
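As a quick numerical spot-check of this factorization (a sketch assuming NumPy; θ, n and the seed are arbitrary choices), the joint density evaluated at a simulated sample agrees with g1(x(1); θ) times a factor that does not involve θ:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.5, 6
x = theta + rng.exponential(size=n)              # draws from f(x; theta) = e^{-(x - theta)}, x > theta

joint = np.exp(-(x - theta)).prod()              # product of the f(x_i; theta)
x_min = x.min()                                  # y1 = x_(1)
g1 = n * np.exp(-n * (x_min - theta))            # density of the sample minimum
H = np.exp(-x.sum()) / (n * np.exp(-n * x_min))  # factor free of theta

print(joint, g1 * H)                             # the two values coincide (up to rounding)
```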


Factorization Criterion

If X1 , . . . , Xn is a random sample from a random variable with probability density function


f(x; θ), θ ∈ Ω, and Y1 = u1(X1, . . . , Xn) is a statistic with probability density function
g1(y1; θ), then Y1 is sufficient for θ iff there exist two non-negative functions k1 and k2
such that

$$\prod f(x_i;\theta)=k_1[u_1(x_1,\dots,x_n);\theta]\,k_2(x_1,\dots,x_n)$$

where, for every fixed value y1 = u1(x1, . . . , xn), k2 does not depend on θ.

Example One

If the random variable X has probability density function

$$f(x;\theta)=\theta x^{\theta-1},\qquad 0<x<1,\quad \theta>0$$

the joint probability density function of a random sample X1 , X2 . . . , Xn is

$$\theta^{n}(x_1x_2\cdots x_n)^{\theta-1}=\theta^{n}(x_1x_2\cdots x_n)^{\theta}\,\frac{1}{x_1x_2\cdots x_n}$$

and setting
$$k_1(u_1(x_1,\dots,x_n);\theta)=\theta^{n}(x_1x_2\cdots x_n)^{\theta},\qquad
k_2(x_1,\dots,x_n)=\frac{1}{x_1x_2\cdots x_n},$$
k2 does not depend on θ and so ΠXi is sufficient for θ.
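A numerical spot-check of this factorization (NumPy assumed; θ, n and the seed are arbitrary illustrative values) shows the joint density splitting into a θ-dependent function of Πxi and a θ-free factor:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 3.0, 5
x = rng.uniform(size=n) ** (1.0 / theta)   # inverse-CDF draws from f(x; theta) = theta * x**(theta - 1)

joint = (theta * x ** (theta - 1)).prod()
k1 = theta ** n * x.prod() ** theta        # depends on the data only through prod(x_i)
k2 = 1.0 / x.prod()                        # does not involve theta

print(joint, k1 * k2)                      # identical up to rounding
```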


Example Two

If the random variable X has probability density function

$$f(x;\theta,\sigma^2)=\frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{1}{2\sigma^2}(x-\theta)^2\Big],\qquad -\infty<x<\infty,\quad \sigma^2\ \text{known}$$

the joint probability density function of a random sample X1 , X2 . . . , Xn is


$$\prod f(x_i;\theta,\sigma^2)=\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big)^{n}\exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\theta)^2\Big]$$

and using the identity


$$\sum(x_i-\theta)^2=\sum\big[(x_i-\bar{x})+(\bar{x}-\theta)\big]^2=\sum(x_i-\bar{x})^2+n(\bar{x}-\theta)^2$$

this can be written as

$$\prod f(x_i;\theta,\sigma^2)=\exp\Big[-\frac{n}{2\sigma^2}(\bar{x}-\theta)^2\Big]\,
\exp\Big[-\frac{1}{2\sigma^2}\sum(x_i-\bar{x})^2\Big]\Big/\big(\sigma\sqrt{2\pi}\big)^{n}.$$

Setting
$$k_1(u_1(x_1,\dots,x_n);\theta)=\exp\Big[-\frac{n}{2\sigma^2}(\bar{x}-\theta)^2\Big],\qquad
k_2(x_1,\dots,x_n)=\frac{\exp\Big[-\frac{1}{2\sigma^2}\sum(x_i-\bar{x})^2\Big]}{\big(\sigma\sqrt{2\pi}\big)^{n}},$$
k2 does not depend on θ and so X̄ is sufficient for θ.
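Both the algebraic identity and the resulting factorization can be spot-checked numerically (a NumPy sketch; θ, σ, n and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 1.5, 2.0, 8
x = rng.normal(theta, sigma, size=n)
xbar = x.mean()

# the identity sum(x_i - theta)^2 = sum(x_i - xbar)^2 + n (xbar - theta)^2
lhs = ((x - theta) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + n * (xbar - theta) ** 2
print(lhs, rhs)

# joint density = k1(xbar; theta) * k2(x)
joint = np.prod(np.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi)))
k1 = np.exp(-n * (xbar - theta) ** 2 / (2 * sigma ** 2))
k2 = np.exp(-((x - xbar) ** 2).sum() / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi)) ** n
print(joint, k1 * k2)
```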

Note

Every single valued function Z = u(Y1 ), not involving θ, with a single valued inverse is
also sufficient for θ.


Completeness

Let {f(x; θ); θ ∈ Ω} be a family of discrete or continuous probability density functions and
let u(x) be a continuous function of x but not a function of θ. If E(u(X)) = 0 for every
θ ∈ Ω requires u(x) to be zero at each point x at which at least one member of the family of
probability density functions is positive, then the family of probability density functions
is called a complete family.

Example One

Consider the family of probability density functions given by

$$f(x;\theta)=\frac{1}{\theta},\qquad 0<x<\theta,\quad 0<\theta<\infty.$$

If

$$E(u(X))=\int_{-\infty}^{\infty}u(x)f(x;\theta)\,dx=\int_{0}^{\theta}u(x)\,\frac{1}{\theta}\,dx=0\qquad \theta>0\ \text{by assumption,}$$

then

$$\int_{0}^{\theta}u(x)\,dx=0\qquad \theta>0.$$

Differentiating with respect to θ gives u(θ) = 0 for θ > 0 and so u(x) = 0 for x > 0.


Example Two

Consider the family of probability density functions given by

$$f(x;\theta)=\theta^{x}(1-\theta)^{1-x},\qquad x=0,1,\quad 0<\theta<1.$$

Each member of the family is positive at only x = 0 and x = 1 so we need to show that
u(0) = u(1) = 0.

In this case

$$E(u(X))=\sum_{x}u(x)f(x;\theta)=\sum_{x=0}^{1}u(x)\,\theta^{x}(1-\theta)^{1-x}
=u(0)(1-\theta)+u(1)\theta=\theta\big(u(1)-u(0)\big)+u(0)$$
is a linear function of θ which, by assumption, is zero for every θ in (0, 1). If a linear
function is zero at more than one point, then both the slope and intercept are zero, so that

u(1) − u(0) = 0 and u(0) = 0

so that
u(0) = u(1) = 0.

Example Three

Consider the family of probability density functions given by

$$f(x;\theta)=\frac{1}{2\theta},\qquad -\theta<x<\theta,\quad 0<\theta<\infty$$

and let u(x) = x. Then


$$E(u(X))=\int_{-\theta}^{\theta}x\,\frac{1}{2\theta}\,dx=0$$
so that E(u(X)) = 0 for every θ but u(x) is not identically zero, so the family is not complete.
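A one-line simulation illustrates this failure of completeness (NumPy assumed; the θ values are arbitrary): the sample mean of u(X) = X stays near zero for every θ even though u is not the zero function.

```python
import numpy as np

rng = np.random.default_rng(4)
for theta in (0.5, 2.0, 10.0):
    x = rng.uniform(-theta, theta, size=1_000_000)  # draws from f(x; theta) = 1/(2*theta) on (-theta, theta)
    print(theta, x.mean())                          # close to 0 for every theta, yet u(x) = x is not identically 0
```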


Uniqueness

Let X1 , X2 , . . . , Xn be a random sample from a distribution with probability density


function f(x; θ), θ ∈ Ω, let Y1 = u1(X1, X2, . . . , Xn) be a sufficient statistic for θ and let
the family {g1(y1; θ); θ ∈ Ω} of probability density functions be complete. If there exists a
continuous function of Y1 which is unbiased for θ, then this function of Y1 is the unique
best statistic for θ.

Unique best statistic

If a continuous function ϕ(Y1 ) is unbiased for θ and some other function ψ(Y1 ) which is
not a function of θ is also unbiased for θ, then

E(ϕ(Y1) − ψ(Y1)) = 0,   θ ∈ Ω,

and if the family {g1(y1; θ); θ ∈ Ω} is complete then for every continuous unbiased statistic
ϕ(Y1 )
ϕ(Y1 ) = ψ(Y1 )
at all points of non–zero probability density.

So if
Y1 = u1 (X1 , X2 , . . . , Xn ) is sufficient for θ
and
Y2 (not a function of Y1 alone) is unbiased for θ
consider
E(Y2 |y1 ) = ϕ(y1 ).

Y1 is sufficient for θ, so the conditional probability density function of Y2 given Y1 = y1
does not depend on θ; hence ϕ(y1) is a function of y1 alone. That is, the statistic ϕ(Y1) is a
function of the sufficient statistic, is unbiased for θ and has smaller variance than Y2.
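The following sketch illustrates this conditioning argument for the N(θ, 1) case treated later: Y2 = X1 is unbiased for θ but wasteful, while ϕ(Y1) = E(X1 | X̄) = X̄ is unbiased, is a function of the sufficient statistic, and has much smaller variance. (NumPy assumed; θ, n and the replication count are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 3.0, 10, 100_000
x = rng.normal(theta, 1.0, size=(reps, n))

y2 = x[:, 0]             # X1: unbiased for theta but uses only one observation
phi = x.mean(axis=1)     # E(X1 | Xbar) = Xbar: unbiased and a function of the sufficient statistic

print(y2.mean(), phi.mean())   # both close to theta = 3
print(y2.var(), phi.var())     # roughly 1 versus roughly 1/n
```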

Parameters in the Exponential Class

If X1 , X2 , . . . , Xn is a random sample from a distribution with probability density function


f(x; θ), γ < θ < δ, which represents a regular case of an exponential class of probability
density functions, the statistic Y1 = ΣK(Xi) is a sufficient statistic for θ and the family
{g1 (y1 ; θ) : γ < θ < δ} of probability density functions is complete. In this case, if there
exists a continuous function of Y1 , say ϕ(Y1 ), such that E(ϕ(Y1 )) = θ, then the statistic
ϕ(Y1 ) is the unique best statistic for θ.

Example One

If X1 , X2 , . . . , Xn is a random sample from a normal distribution, N (θ, σ 2 ), the probability


density function of X is

$$f(x;\theta)=\exp\Big[\frac{\theta}{\sigma^2}\,x-\frac{x^2}{2\sigma^2}-\ln\sqrt{2\pi\sigma^2}-\frac{\theta^2}{2\sigma^2}\Big]$$

which is a regular case of the exponential class with

$$p(\theta)=\frac{\theta}{\sigma^2},\qquad K(x)=x,\qquad S(x)=-\frac{x^2}{2\sigma^2}-\ln\sqrt{2\pi\sigma^2},\qquad q(\theta)=-\frac{\theta^2}{2\sigma^2}$$
and so Y1 = ΣXi is a complete sufficient statistic for θ and, as E(Y1) = nθ,

$$\varphi(Y_1)=\frac{Y_1}{n}=\bar{X}$$
is unbiased for θ, is a function of the sufficient statistic Y1 and has minimum variance.
So X̄ is the unique best statistic for θ.
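To see what "unique best" means in practice, X̄ can be compared with another unbiased statistic for θ, such as the sample median (unbiased here by symmetry). A minimal simulation sketch, assuming NumPy and arbitrary values for θ, σ and n:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, sigma, n, reps = 2.0, 1.0, 15, 200_000
x = rng.normal(theta, sigma, size=(reps, n))

xbar = x.mean(axis=1)
med = np.median(x, axis=1)

print(xbar.mean(), med.mean())   # both are unbiased for theta
print(xbar.var(), med.var())     # Var(Xbar) = sigma^2 / n is the smaller of the two
```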

Example Two

The probability density function of a Poisson distribution is

$$f(x;\theta)=\exp\big[(\ln\theta)\,x-\ln(x!)-\theta\big]$$

and so Y1 = ΣXi is a complete sufficient statistic for θ and, as E(Y1) = nθ, the statistic

$$\varphi(Y_1)=\frac{Y_1}{n}=\bar{X}$$
is unbiased for θ and is the unique best statistic for θ.


Invariance property of maximum likelihood estimators

If θ̂ is the maximum likelihood estimator of θ, the maximum likelihood estimator of


τ = τ (θ) is τ (θ̂).

Example

If the random variable X has a Poisson distribution with probability density function
f(x; θ) = e^{−θ}θ^x/x!, the log of the likelihood is

$$\ln L(\theta)=-n\theta+\ln\theta\sum x_i-\ln\Big(\prod x_i!\Big)$$

and

$$\frac{\partial \ln L(\theta)}{\partial\theta}=-n+\frac{\sum x_i}{\theta}=0\quad\text{if }\theta=\bar{x}$$
so that the maximum likelihood estimator of θ is X̄.

To estimate τ(θ) = P(X = 0) = e^{−θ}, let θ = −ln τ and reparametrize the probability
density function so that

$$f^{*}(x;\tau)=\frac{\tau(-\ln\tau)^{x}}{x!}$$
and the likelihood is

$$L^{*}(\tau)=\frac{\tau^{n}(-\ln\tau)^{\sum x_i}}{\prod x_i!}$$
with

$$\ln L^{*}(\tau)=n\ln\tau+\Big(\sum x_i\Big)\ln(-\ln\tau)-\ln\Big(\prod x_i!\Big)$$

and

$$\frac{\partial\ln L^{*}(\tau)}{\partial\tau}=\frac{n}{\tau}+\frac{\sum x_i}{-\ln\tau}\cdot\frac{-1}{\tau}=0\quad\text{if }-\ln\tau=\bar{x}$$
and so the maximum likelihood estimator of τ is

$$\hat{\tau}=e^{-\bar{X}}.$$
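The invariance property can be checked numerically by maximizing both log-likelihoods over a grid (a NumPy sketch; the true rate, the sample size and the grids are arbitrary choices): the maximizers land at x̄ and at e^{−x̄} respectively, as the theory predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.poisson(2.3, size=50)
n, s, xbar = x.size, x.sum(), x.mean()

# log-likelihoods, dropping constants that do not involve the parameter
theta_grid = np.linspace(0.01, 10.0, 100_000)
loglik_theta = -n * theta_grid + s * np.log(theta_grid)

tau_grid = np.linspace(0.001, 0.999, 100_000)
loglik_tau = n * np.log(tau_grid) + s * np.log(-np.log(tau_grid))

theta_hat = theta_grid[np.argmax(loglik_theta)]
tau_hat = tau_grid[np.argmax(loglik_tau)]
print(theta_hat, xbar)            # MLE of theta is (approximately) xbar
print(tau_hat, np.exp(-xbar))     # MLE of tau is (approximately) e^{-xbar}
```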

Functions of a Parameter

Let X1 , X2 , . . . , Xn be a random sample from a normal distribution, N (θ, 1). Finding the
best statistic for P (X ≤ c) = Φ(c − θ) involves the following three steps.
(i) Find an unbiased statistic for Φ(c − θ).
(ii) Know that X̄ is sufficient for θ.
(iii) If E(unbiased statistic|X̄ = x̄) = ϕ(x̄) then note that ϕ(X̄) is the unique best
statistic for Φ(c − θ).

(i)

Let

$$u(x_1)=\begin{cases}1, & x_1\le c\\ 0, & x_1>c\end{cases}$$


$$E(u(X_1))=\int_{-\infty}^{\infty}u(x_1)\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x_1-\theta)^2}{2}}\,dx_1
=\int_{-\infty}^{c}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x_1-\theta)^2}{2}}\,dx_1
=\Phi(c-\theta)$$

(ii)

From previous results we know that X̄ is sufficient for θ.

(iii)

Variables X and Y have a bivariate normal distribution if their probability density


function is
$$f(x,y)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\;e^{-q/2}$$

where

$$q=\frac{1}{1-\rho^2}\bigg[\Big(\frac{x-\mu_1}{\sigma_1}\Big)^2-2\rho\Big(\frac{x-\mu_1}{\sigma_1}\Big)\Big(\frac{y-\mu_2}{\sigma_2}\Big)+\Big(\frac{y-\mu_2}{\sigma_2}\Big)^2\bigg]$$


The joint distribution of X1 and X̄ is bivariate normal with X1 having mean θ and
variance 1, X̄ having mean θ and variance 1/n, and X1 and X̄ having correlation coefficient
1/√n.

The conditional distribution of X1 given X̄ = x̄ is normal with mean x̄ and variance (n − 1)/n,
and so

$$\begin{aligned}
\varphi(\bar{x})=E(u(X_1)\mid \bar{X}=\bar{x})
&=\int_{-\infty}^{\infty}u(x_1)\,\sqrt{\frac{n}{n-1}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{n(x_1-\bar{x})^2}{2(n-1)}}\,dx_1\\
&=\int_{-\infty}^{c}\sqrt{\frac{n}{n-1}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{n(x_1-\bar{x})^2}{2(n-1)}}\,dx_1\\
&=\int_{-\infty}^{\frac{\sqrt{n}(c-\bar{x})}{\sqrt{n-1}}}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}\,dz
\qquad\text{letting }z=\frac{\sqrt{n}(x_1-\bar{x})}{\sqrt{n-1}}\\
&=\Phi\Big(\frac{\sqrt{n}(c-\bar{x})}{\sqrt{n-1}}\Big)
\end{aligned}$$
and so the unique, minimum variance unbiased statistic for Φ(c − θ) is
$$\Phi\Big(\frac{\sqrt{n}\,(c-\bar{X})}{\sqrt{n-1}}\Big).$$
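As a final check, a simulation (NumPy, with Φ computed from math.erf; θ, c, n and the replication count are arbitrary choices) confirms that both the indicator from step (i) and the conditioned statistic Φ(√n(c − X̄)/√(n − 1)) have mean Φ(c − θ), and that the latter has far smaller variance.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(8)
theta, c, n, reps = 1.0, 1.5, 10, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))
xbar = x.mean(axis=1)

naive = (x[:, 0] <= c).astype(float)                                 # unbiased indicator from step (i)
best = np.vectorize(Phi)(np.sqrt(n) * (c - xbar) / np.sqrt(n - 1))   # conditioned statistic from step (iii)

print(Phi(c - theta))              # the target Phi(c - theta)
print(naive.mean(), best.mean())   # both estimates are unbiased
print(naive.var(), best.var())     # the conditioned statistic has much smaller variance
```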

