
ESTIMATORS

Sufficient Statistics

Let X1 , X2 , . . . , Xn be a random sample from a variable with probability density function


f (x; θ) and let Y1 , Y2 , . . . , Yn be n statistics defined by

Y1 = u1(X1, X2, . . . , Xn)
Y2 = u2(X1, X2, . . . , Xn)
Y3 = u3(X1, X2, . . . , Xn)
⋮
Yn = un(X1, X2, . . . , Xn)

where the transformation is 1–1.

The joint probability density function of Y1 , Y2 , . . . , Yn is

$$g(y_1,\dots,y_n;\theta)=f\big(w_1(y_1,\dots,y_n);\theta\big)\,f\big(w_2(y_1,\dots,y_n);\theta\big)\cdots f\big(w_n(y_1,\dots,y_n);\theta\big)\,|J|,$$

where xi = wi(y1, . . . , yn) denotes the inverse transformation and J is its Jacobian,

and the marginal probability density function of Y1 is


$$g_1(y_1;\theta)=\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} g(y_1,y_2,\dots,y_n;\theta)\,dy_2\cdots dy_n$$

and the conditional probability density function of Y2 , . . . , Yn given Y1 = y1 is

$$h(y_2,\dots,y_n\mid y_1;\theta)=\frac{g(y_1,y_2,\dots,y_n;\theta)}{g_1(y_1;\theta)}$$

provided g1(y1; θ) > 0. In general this conditional distribution depends on θ; when it does
not, we have the following definition of a sufficient statistic.

Sufficient Statistic

If for n fixed, X1 , X2 , . . . , Xn is a random sample from a variable with probability density


function f(x; θ), and the statistics Y1, Y2, . . . , Yn are defined by a transformation which
is 1–1, then the statistic Y1 = u1(X1, X2, . . . , Xn) is a sufficient statistic for θ if and only
if, for all other statistics Y2 = u2(X1, X2, . . . , Xn), . . . , Yn = un(X1, X2, . . . , Xn) for which
the Jacobian is non-zero, the conditional probability density function h(y2, y3, . . . , yn | y1)
of Y2, Y3, . . . , Yn given Y1 = y1 does not depend on θ for any value y1.


Fisher–Neyman Criterion

If X1 , . . . , Xn is a random sample from a random variable with probability density function


f(x; θ), θ ∈ Ω, and Y1 = u1(X1, . . . , Xn) is a statistic with probability density function
g1 (y1 ; θ), then Y1 is sufficient for θ iff

Πf (xi ; θ) = g1 (u1 (x1 , . . . , xn ); θ)H(x1 , . . . , xn )

where H(x1 , . . . , xn ) does not depend on θ.

Example One

If the random variable X has probability density function

$$f(x;\theta)=\theta^{x}(1-\theta)^{1-x},\qquad x=0,1,\quad 0<\theta<1$$


then Y1 = ΣXi has probability density function

$$g_1(y_1;\theta)=\frac{n!}{y_1!\,(n-y_1)!}\,\theta^{y_1}(1-\theta)^{n-y_1},\qquad y_1=0,1,\dots,n.$$

The joint probability density function of X1 , . . . , Xn is

$$\theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2}\cdots\theta^{x_n}(1-\theta)^{1-x_n}
=\theta^{x_1+x_2+\cdots+x_n}(1-\theta)^{\,n-(x_1+x_2+\cdots+x_n)}
=g_1(y_1;\theta)\times\frac{(x_1+x_2+\cdots+x_n)!\,\big(n-(x_1+x_2+\cdots+x_n)\big)!}{n!}$$
and Y1 is sufficient for θ.
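This sufficiency can also be seen empirically: given Y1 = ΣXi, every arrangement of the successes is equally likely, whatever the value of θ. The sketch below (a NumPy simulation; the sample size n = 3, the conditioning value ΣXi = 1 and the two θ values are arbitrary illustrative choices, not taken from the text) estimates the conditional distribution of (X1, X2, X3) given ΣXi = 1 for two different θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 3, 200_000

def conditional_given_sum(theta, total=1):
    """Estimate P((X1,...,Xn) = pattern | sum(X) = total) by simulation."""
    x = rng.binomial(1, theta, size=(reps, n))
    kept = x[x.sum(axis=1) == total]
    patterns, counts = np.unique(kept, axis=0, return_counts=True)
    return patterns, counts / counts.sum()

for theta in (0.2, 0.7):
    patterns, freqs = conditional_given_sum(theta)
    print(theta, [tuple(p) for p in patterns], np.round(freqs, 3))
# Both values of theta give frequencies close to (1/3, 1/3, 1/3):
# given the sufficient statistic, the conditional distribution is free of theta.
```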


Example Two

If the random variable X has probability density function

$$f(x;\theta)=e^{-(x-\theta)},\qquad \theta<x<\infty,\quad -\infty<\theta<\infty$$

then Y1 = X(1) has probability density function

$$g_1(y_1;\theta)=n\,e^{-n(y_1-\theta)},\qquad \theta<y_1<\infty.$$

The joint probability density function of X1 , . . . , Xn is


$$\prod_{1}^{n} e^{-(x_i-\theta)}=e^{-\sum x_i}\,e^{n\theta}
=g_1(x_{(1)};\theta)\times\frac{e^{-\sum x_i}}{n\,e^{-n x_{(1)}}}$$

and so Y1 = X(1) is sufficient for θ.
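As a quick numerical spot-check of this factorization (a sketch assuming NumPy; θ, n and the seed are arbitrary choices), the joint density evaluated at a simulated sample agrees with g1(x(1); θ) times a factor that does not involve θ:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.5, 6
x = theta + rng.exponential(size=n)              # draws from f(x; theta) = e^{-(x - theta)}, x > theta

joint = np.exp(-(x - theta)).prod()              # product of the f(x_i; theta)
x_min = x.min()                                  # y1 = x_(1)
g1 = n * np.exp(-n * (x_min - theta))            # density of the sample minimum
H = np.exp(-x.sum()) / (n * np.exp(-n * x_min))  # factor free of theta

print(joint, g1 * H)                             # the two values coincide (up to rounding)
```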


Factorization Criterion

If X1 , . . . , Xn is a random sample from a random variable with probability density function


f(x; θ), θ ∈ Ω, and Y1 = u1(X1, . . . , Xn) is a statistic with probability density function
g1(y1; θ), then Y1 is sufficient for θ iff there exist two non-negative functions k1 and k2
such that

$$\prod f(x_i;\theta)=k_1[u_1(x_1,\dots,x_n);\theta]\,k_2(x_1,\dots,x_n)$$

where, for every fixed value y1 = u1(x1, . . . , xn), k2 does not depend on θ.

Example One

If the random variable X has probability density function

$$f(x;\theta)=\theta x^{\theta-1},\qquad 0<x<1,\quad \theta>0$$

the joint probability density function of a random sample X1 , X2 . . . , Xn is

$$\theta^{n}(x_1x_2\cdots x_n)^{\theta-1}=\theta^{n}(x_1x_2\cdots x_n)^{\theta}\,\frac{1}{x_1x_2\cdots x_n}$$

and setting
$$k_1(u_1(x_1,\dots,x_n);\theta)=\theta^{n}(x_1x_2\cdots x_n)^{\theta},\qquad
k_2(x_1,\dots,x_n)=\frac{1}{x_1x_2\cdots x_n},$$
k2 does not depend on θ and so ΠXi is sufficient for θ.
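A numerical spot-check of this factorization (NumPy assumed; θ, n and the seed are arbitrary illustrative values) shows the joint density splitting into a θ-dependent function of Πxi and a θ-free factor:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 3.0, 5
x = rng.uniform(size=n) ** (1.0 / theta)   # inverse-CDF draws from f(x; theta) = theta * x**(theta - 1)

joint = (theta * x ** (theta - 1)).prod()
k1 = theta ** n * x.prod() ** theta        # depends on the data only through prod(x_i)
k2 = 1.0 / x.prod()                        # does not involve theta

print(joint, k1 * k2)                      # identical up to rounding
```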


Example Two

If the random variable X has probability density function

$$f(x;\theta,\sigma^2)=\frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{1}{2\sigma^2}(x-\theta)^2\Big],\qquad -\infty<x<\infty,\quad \sigma^2\ \text{known}$$

the joint probability density function of a random sample X1 , X2 . . . , Xn is


$$\prod f(x_i;\theta,\sigma^2)=\Big(\frac{1}{\sigma\sqrt{2\pi}}\Big)^{n}\exp\Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\theta)^2\Big]$$

and using the identity


$$\sum(x_i-\theta)^2=\sum\big[(x_i-\bar{x})+(\bar{x}-\theta)\big]^2=\sum(x_i-\bar{x})^2+n(\bar{x}-\theta)^2$$

this can be written as

$$\prod f(x_i;\theta,\sigma^2)=\exp\Big[-\frac{n}{2\sigma^2}(\bar{x}-\theta)^2\Big]\,
\exp\Big[-\frac{1}{2\sigma^2}\sum(x_i-\bar{x})^2\Big]\Big/\big(\sigma\sqrt{2\pi}\big)^{n}.$$

Setting
$$k_1(u_1(x_1,\dots,x_n);\theta)=\exp\Big[-\frac{n}{2\sigma^2}(\bar{x}-\theta)^2\Big],\qquad
k_2(x_1,\dots,x_n)=\frac{\exp\Big[-\frac{1}{2\sigma^2}\sum(x_i-\bar{x})^2\Big]}{\big(\sigma\sqrt{2\pi}\big)^{n}},$$
k2 does not depend on θ and so X̄ is sufficient for θ.
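Both the algebraic identity and the resulting factorization can be spot-checked numerically (a NumPy sketch; θ, σ, n and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n = 1.5, 2.0, 8
x = rng.normal(theta, sigma, size=n)
xbar = x.mean()

# the identity sum(x_i - theta)^2 = sum(x_i - xbar)^2 + n (xbar - theta)^2
lhs = ((x - theta) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + n * (xbar - theta) ** 2
print(lhs, rhs)

# joint density = k1(xbar; theta) * k2(x)
joint = np.prod(np.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi)))
k1 = np.exp(-n * (xbar - theta) ** 2 / (2 * sigma ** 2))
k2 = np.exp(-((x - xbar) ** 2).sum() / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi)) ** n
print(joint, k1 * k2)
```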

Note

Every single valued function Z = u(Y1 ), not involving θ, with a single valued inverse is
also sufficient for θ.


Completeness

Let {f(x; θ); θ ∈ Ω} be a family of discrete or continuous probability density functions and
let u(x) be a continuous function of x but not a function of θ. If E(u(X)) = 0 for every
θ ∈ Ω requires u(x) to be zero at each point x at which at least one member of the family of
probability density functions is positive, then the family of probability density functions
is called a complete family.

Example One

Consider the family of probability density functions given by

$$f(x;\theta)=\frac{1}{\theta},\qquad 0<x<\theta,\quad 0<\theta<\infty.$$

If

$$E(u(X))=\int_{-\infty}^{\infty}u(x)f(x;\theta)\,dx=\int_{0}^{\theta}u(x)\,\frac{1}{\theta}\,dx=0\qquad \theta>0\ \text{by assumption,}$$

then

$$\int_{0}^{\theta}u(x)\,dx=0\qquad \theta>0.$$

Differentiating with respect to θ gives u(θ) = 0 for θ > 0 and so u(x) = 0 for x > 0.


Example Two

Consider the family of probability density functions given by

$$f(x;\theta)=\theta^{x}(1-\theta)^{1-x},\qquad x=0,1,\quad 0<\theta<1.$$

Each member of the family is positive at only x = 0 and x = 1 so we need to show that
u(0) = u(1) = 0.

In this case

$$E(u(X))=\sum_{x}u(x)f(x;\theta)=\sum_{x=0}^{1}u(x)\,\theta^{x}(1-\theta)^{1-x}
=u(0)(1-\theta)+u(1)\theta=\theta\big(u(1)-u(0)\big)+u(0)$$
is a linear function of θ which, by assumption, is zero for every θ in (0, 1). If a linear
function is zero at more than one point, then both the slope and intercept are zero, so that

u(1) − u(0) = 0 and u(0) = 0

so that
u(0) = u(1) = 0.

Example Three

Consider the family of probability density functions given by

$$f(x;\theta)=\frac{1}{2\theta},\qquad -\theta<x<\theta,\quad 0<\theta<\infty$$

and let u(x) = x. Then


$$E(u(X))=\int_{-\theta}^{\theta}x\,\frac{1}{2\theta}\,dx=0$$
so that E(u(X)) = 0 for every θ but u(x) is not identically zero, so the family is not complete.
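A one-line simulation illustrates this failure of completeness (NumPy assumed; the θ values are arbitrary): the sample mean of u(X) = X stays near zero for every θ even though u is not the zero function.

```python
import numpy as np

rng = np.random.default_rng(4)
for theta in (0.5, 2.0, 10.0):
    x = rng.uniform(-theta, theta, size=1_000_000)  # draws from f(x; theta) = 1/(2*theta) on (-theta, theta)
    print(theta, x.mean())                          # close to 0 for every theta, yet u(x) = x is not identically 0
```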


Uniqueness

Let X1 , X2 , . . . , Xn be a random sample from a distribution with probability density


function f(x; θ), θ ∈ Ω, let Y1 = u1(X1, X2, . . . , Xn) be a sufficient statistic for θ and let
the family {g1(y1; θ); θ ∈ Ω} of probability density functions be complete. If there exists a
continuous function of Y1 which is unbiased for θ, then this function of Y1 is the unique
best statistic for θ.

Unique best statistic

If a continuous function ϕ(Y1 ) is unbiased for θ and some other function ψ(Y1 ) which is
not a function of θ is also unbiased for θ, then

E(ϕ(Y1) − ψ(Y1)) = 0,   θ ∈ Ω,

and if the family {g1(y1; θ); θ ∈ Ω} is complete then for every continuous unbiased statistic
ϕ(Y1 )
ϕ(Y1 ) = ψ(Y1 )
at all points of non–zero probability density.

So if
Y1 = u1 (X1 , X2 , . . . , Xn ) is sufficient for θ
and
Y2 (not a function of Y1 alone) is unbiased for θ
consider
E(Y2 |y1 ) = ϕ(y1 ).

Y1 is sufficient for θ, so the conditional probability density function of Y2 given Y1 = y1
does not depend on θ; hence ϕ(y1) is a function of y1 alone. That is, the statistic ϕ(Y1) is a
function of the sufficient statistic, is unbiased for θ and has smaller variance than Y2.
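The following sketch illustrates this conditioning argument for the N(θ, 1) case treated later: Y2 = X1 is unbiased for θ but wasteful, while ϕ(Y1) = E(X1 | X̄) = X̄ is unbiased, is a function of the sufficient statistic, and has much smaller variance. (NumPy assumed; θ, n and the replication count are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 3.0, 10, 100_000
x = rng.normal(theta, 1.0, size=(reps, n))

y2 = x[:, 0]             # X1: unbiased for theta but uses only one observation
phi = x.mean(axis=1)     # E(X1 | Xbar) = Xbar: unbiased and a function of the sufficient statistic

print(y2.mean(), phi.mean())   # both close to theta = 3
print(y2.var(), phi.var())     # roughly 1 versus roughly 1/n
```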

Parameters in the Exponential Class

If X1 , X2 , . . . , Xn is a random sample from a distribution with probability density function


f(x; θ), γ < θ < δ, which represents a regular case of an exponential class of probability
density functions, the statistic Y1 = ΣK(Xi) is a sufficient statistic for θ and the family
{g1 (y1 ; θ) : γ < θ < δ} of probability density functions is complete. In this case, if there
exists a continuous function of Y1 , say ϕ(Y1 ), such that E(ϕ(Y1 )) = θ, then the statistic
ϕ(Y1 ) is the unique best statistic for θ.

Example One

If X1 , X2 , . . . , Xn is a random sample from a normal distribution, N (θ, σ 2 ), the probability


density function of X is

$$f(x;\theta)=\exp\Big[\frac{\theta}{\sigma^2}\,x-\frac{x^2}{2\sigma^2}-\ln\sqrt{2\pi\sigma^2}-\frac{\theta^2}{2\sigma^2}\Big]$$

which is a regular case of the exponential class with

$$p(\theta)=\frac{\theta}{\sigma^2},\qquad K(x)=x,\qquad S(x)=-\frac{x^2}{2\sigma^2}-\ln\sqrt{2\pi\sigma^2},\qquad q(\theta)=-\frac{\theta^2}{2\sigma^2}$$
and so Y1 = ΣXi is a complete sufficient statistic for θ and, as E(Y1) = nθ,

$$\varphi(Y_1)=\frac{Y_1}{n}=\bar{X}$$
is unbiased for θ, is a function of the sufficient statistic Y1 and has minimum variance.
So X̄ is the unique best statistic for θ.
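To see what "unique best" means in practice, X̄ can be compared with another unbiased statistic for θ, such as the sample median (unbiased here by symmetry). A minimal simulation sketch, assuming NumPy and arbitrary values for θ, σ and n:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, sigma, n, reps = 2.0, 1.0, 15, 200_000
x = rng.normal(theta, sigma, size=(reps, n))

xbar = x.mean(axis=1)
med = np.median(x, axis=1)

print(xbar.mean(), med.mean())   # both are unbiased for theta
print(xbar.var(), med.var())     # Var(Xbar) = sigma^2 / n is the smaller of the two
```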

Example Two

The probability density function of a Poisson distribution is

$$f(x;\theta)=\exp\big[(\ln\theta)\,x-\ln(x!)-\theta\big]$$

and so Y1 = ΣXi is a complete sufficient statistic for θ and, as E(Y1) = nθ, the statistic

$$\varphi(Y_1)=\frac{Y_1}{n}=\bar{X}$$
is unbiased for θ and is the unique best statistic for θ.


Invariance property of maximum likelihood estimators

If θ̂ is the maximum likelihood estimator of θ, the maximum likelihood estimator of


τ = τ (θ) is τ (θ̂).

Example

If the random variable X has a Poisson distribution with probability density function
f(x; θ) = e^{−θ}θ^x/x!, the log of the likelihood is

$$\ln L(\theta)=-n\theta+\ln\theta\sum x_i-\ln\Big(\prod x_i!\Big)$$

and

$$\frac{\partial \ln L(\theta)}{\partial\theta}=-n+\frac{\sum x_i}{\theta}=0\quad\text{if }\theta=\bar{x}$$
so that the maximum likelihood estimator of θ is X̄.

To estimate τ(θ) = P(X = 0) = e^{−θ}, let θ = −ln τ and reparametrize the probability
density function so that

$$f^{*}(x;\tau)=\frac{\tau(-\ln\tau)^{x}}{x!}$$
and the likelihood is

$$L^{*}(\tau)=\frac{\tau^{n}(-\ln\tau)^{\sum x_i}}{\prod x_i!}$$
with

$$\ln L^{*}(\tau)=n\ln\tau+\Big(\sum x_i\Big)\ln(-\ln\tau)-\ln\Big(\prod x_i!\Big)$$

and

$$\frac{\partial\ln L^{*}(\tau)}{\partial\tau}=\frac{n}{\tau}+\frac{\sum x_i}{-\ln\tau}\cdot\frac{-1}{\tau}=0\quad\text{if }-\ln\tau=\bar{x}$$
and so the maximum likelihood estimator of τ is

$$\hat{\tau}=e^{-\bar{X}}.$$
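The invariance property can be checked numerically by maximizing both log-likelihoods over a grid (a NumPy sketch; the true rate, the sample size and the grids are arbitrary choices): the maximizers land at x̄ and at e^{−x̄} respectively, as the theory predicts.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.poisson(2.3, size=50)
n, s, xbar = x.size, x.sum(), x.mean()

# log-likelihoods, dropping constants that do not involve the parameter
theta_grid = np.linspace(0.01, 10.0, 100_000)
loglik_theta = -n * theta_grid + s * np.log(theta_grid)

tau_grid = np.linspace(0.001, 0.999, 100_000)
loglik_tau = n * np.log(tau_grid) + s * np.log(-np.log(tau_grid))

theta_hat = theta_grid[np.argmax(loglik_theta)]
tau_hat = tau_grid[np.argmax(loglik_tau)]
print(theta_hat, xbar)            # MLE of theta is (approximately) xbar
print(tau_hat, np.exp(-xbar))     # MLE of tau is (approximately) e^{-xbar}
```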

Functions of a Parameter

Let X1 , X2 , . . . , Xn be a random sample from a normal distribution, N (θ, 1). Finding the
best statistic for P (X ≤ c) = Φ(c − θ) involves the following three steps.
(i) Find an unbiased statistic for Φ(c − θ).
(ii) Know that X̄ is sufficient for θ.
(iii) If E(unbiased statistic|X̄ = x̄) = ϕ(x̄) then note that ϕ(X̄) is the unique best
statistic for Φ(c − θ).

(i)

Let

$$u(x_1)=\begin{cases}1, & x_1\le c\\ 0, & x_1>c\end{cases}$$


$$E(u(X_1))=\int_{-\infty}^{\infty}u(x_1)\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x_1-\theta)^2}{2}}\,dx_1
=\int_{-\infty}^{c}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x_1-\theta)^2}{2}}\,dx_1
=\Phi(c-\theta)$$

(ii)

From previous results we know that X̄ is sufficient for θ.

(iii)

Variables X and Y have a bivariate normal distribution if their probability density


function is
$$f(x,y)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\;e^{-q/2}$$

where

$$q=\frac{1}{1-\rho^2}\bigg[\Big(\frac{x-\mu_1}{\sigma_1}\Big)^2-2\rho\Big(\frac{x-\mu_1}{\sigma_1}\Big)\Big(\frac{y-\mu_2}{\sigma_2}\Big)+\Big(\frac{y-\mu_2}{\sigma_2}\Big)^2\bigg]$$


The joint distribution of X1 and X̄ is bivariate normal with X1 having mean θ and
variance 1, X̄ having mean θ and variance 1/n, and X1 and X̄ having correlation coefficient
1/√n.

The conditional distribution of X1 given X̄ = x̄ is normal with mean x̄ and variance (n − 1)/n,
and so

$$\begin{aligned}
\varphi(\bar{x})=E(u(X_1)\mid \bar{X}=\bar{x})
&=\int_{-\infty}^{\infty}u(x_1)\,\sqrt{\frac{n}{n-1}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{n(x_1-\bar{x})^2}{2(n-1)}}\,dx_1\\
&=\int_{-\infty}^{c}\sqrt{\frac{n}{n-1}}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{n(x_1-\bar{x})^2}{2(n-1)}}\,dx_1\\
&=\int_{-\infty}^{\frac{\sqrt{n}(c-\bar{x})}{\sqrt{n-1}}}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}\,dz
\qquad\text{letting }z=\frac{\sqrt{n}(x_1-\bar{x})}{\sqrt{n-1}}\\
&=\Phi\Big(\frac{\sqrt{n}(c-\bar{x})}{\sqrt{n-1}}\Big)
\end{aligned}$$
and so the unique, minimum variance unbiased statistic for Φ(c − θ) is
$$\Phi\Big(\frac{\sqrt{n}\,(c-\bar{X})}{\sqrt{n-1}}\Big).$$
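As a final check, a simulation (NumPy, with Φ computed from math.erf; θ, c, n and the replication count are arbitrary choices) confirms that both the indicator from step (i) and the conditioned statistic Φ(√n(c − X̄)/√(n − 1)) have mean Φ(c − θ), and that the latter has far smaller variance.

```python
import math
import numpy as np

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(8)
theta, c, n, reps = 1.0, 1.5, 10, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))
xbar = x.mean(axis=1)

naive = (x[:, 0] <= c).astype(float)                                 # unbiased indicator from step (i)
best = np.vectorize(Phi)(np.sqrt(n) * (c - xbar) / np.sqrt(n - 1))   # conditioned statistic from step (iii)

print(Phi(c - theta))              # the target Phi(c - theta)
print(naive.mean(), best.mean())   # both estimates are unbiased
print(naive.var(), best.var())     # the conditioned statistic has much smaller variance
```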

