Maura Mezzetti
Outline
Sherlock Holmes
Silver Blaze
Sufficiency
Any sample statistic $Y = t(X)$ partitions the sample space into the subsets $\Omega_y$, where $\Omega_y = \{x : t(x) = y\}$.
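The partition induced by a statistic can be made concrete on a small sample space. The sketch below is only an illustration: the sample space (three binary observations) and the statistic (their sum) are assumptions chosen for the example, not taken from the slides.

```python
from itertools import product
from collections import defaultdict

def partition_by_statistic(sample_space, t):
    """Group the sample points x into blocks Omega_y = {x : t(x) = y}."""
    blocks = defaultdict(list)
    for x in sample_space:
        blocks[t(x)].append(x)
    return dict(blocks)

# Hypothetical illustration: three binary observations, t(x) = x1 + x2 + x3.
sample_space = list(product([0, 1], repeat=3))
omega = partition_by_statistic(sample_space, sum)

for y in sorted(omega):
    print(y, omega[y])
```

Every sample point lands in exactly one block, so the blocks are disjoint and together cover the sample space.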
Example
Let $X = (X_1, \dots, X_n)$ be a random sample from $N(\mu, \sigma^2)$, and suppose that $\sigma^2$ is known. Consider $T(X) = \bar{X}_n$, for which $\bar{X}_n \sim N(\mu, \sigma^2/n)$. We now compute the conditional distribution of $X$ given $T(X) = t$.
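\[
\begin{aligned}
f_{X \mid T(X)}(x \mid T(X) = T(x)) &= \frac{f_X(x)}{f_T(T(x))} \\
&= \frac{\exp\left(-\sum_{i=1}^n (x_i-\mu)^2/(2\sigma^2)\right) \big/ \left((2\pi)^{n/2}\sigma^n\right)}{\exp\left(-n(\bar{x}_n-\mu)^2/(2\sigma^2)\right) \big/ \left((2\pi)^{1/2}\sigma/n^{1/2}\right)} \\
&= \frac{\exp\left(-\sum_{i=1}^n (x_i-\bar{x}_n)^2/(2\sigma^2)\right)}{(2\pi)^{(n-1)/2}\,\sigma^{n-1}\,n^{1/2}}
\end{aligned}
\]
The last step uses $\sum_{i=1}^n (x_i-\mu)^2 = \sum_{i=1}^n (x_i-\bar{x}_n)^2 + n(\bar{x}_n-\mu)^2$. The result does not depend on $\mu$.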
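The claim that the conditional density of $X$ given the sample mean is free of $\mu$ can be checked numerically. The following is a minimal sketch: the data values, $\sigma$, and the candidate values of $\mu$ are arbitrary assumptions made for illustration.

```python
import math

def joint_density(x, mu, sigma):
    """Density of n independent N(mu, sigma^2) observations at the point x."""
    n = len(x)
    ss = sum((xi - mu) ** 2 for xi in x)
    return math.exp(-ss / (2 * sigma**2)) / ((2 * math.pi) ** (n / 2) * sigma**n)

def mean_density(xbar, n, mu, sigma):
    """Density of the sample mean, which is N(mu, sigma^2 / n)."""
    var = sigma**2 / n
    return math.exp(-(xbar - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def conditional_ratio(x, mu, sigma):
    """The ratio f_X(x) / f_T(T(x)) with T(x) the sample mean."""
    xbar = sum(x) / len(x)
    return joint_density(x, mu, sigma) / mean_density(xbar, len(x), mu, sigma)

def closed_form(x, sigma):
    """exp(-sum (x_i - xbar)^2 / (2 sigma^2)) / (n^(1/2) (2 pi)^((n-1)/2) sigma^(n-1))."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    return math.exp(-ss / (2 * sigma**2)) / (
        math.sqrt(n) * (2 * math.pi) ** ((n - 1) / 2) * sigma ** (n - 1)
    )

x = [0.3, 1.1, -0.4, 2.0]  # arbitrary illustrative data
sigma = 1.5                # assumed known
ratios = [conditional_ratio(x, mu, sigma) for mu in (-1.0, 0.0, 2.5)]
print(ratios, closed_form(x, sigma))
```

The printed ratios agree with each other and with the closed form up to rounding, whatever value of $\mu$ is used.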
Example
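\[
\begin{aligned}
P(X > 3 \mid T = 1) &= \frac{P(X > 3 \cap T = 1)}{P(T = 1)} \\
&= \frac{P(X > 3 \cap X > 2)}{P(X > 2)} \\
&= \frac{P(X > 3)}{P(X > 2)} \\
&= \frac{\exp(-3\lambda)}{\exp(-2\lambda)} \\
&= \exp(-\lambda)
\end{aligned}
\]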
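The steps above can be evaluated for a few values of $\lambda$. This sketch assumes, as the derivation implies, that $X$ is exponential with rate $\lambda$ (so $P(X > x) = \exp(-\lambda x)$) and that the event $\{T = 1\}$ coincides with $\{X > 2\}$. The function names are hypothetical.

```python
import math

def survival(x, lam):
    """P(X > x) for an exponential random variable with rate lam."""
    return math.exp(-lam * x)

def prob_gt3_given_gt2(lam):
    """P(X > 3 | X > 2); {X > 3} is contained in {X > 2}."""
    return survival(3, lam) / survival(2, lam)

rates = (0.5, 1.0, 2.0)  # arbitrary illustrative values of lambda
for lam in rates:
    print(lam, prob_gt3_given_gt2(lam), math.exp(-lam))
```

The conditional probability changes with $\lambda$, so conditioning on $T$ does not remove the dependence on the parameter.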
Sufficiency
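A statistic $T(X)$ is sufficient for $\theta$ if and only if the density (or probability mass function) factors as
\[
p(x \mid \theta) = g(T(x) \mid \theta)\, h(x).
\]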
Factorization Theorem
Brief proof (for more details see Casella and Berger, p. 276).
Consider only the discrete case:
\[
\begin{aligned}
P_\theta(X = x) &= \sum_t P_\theta(X = x \cap T = t) \\
&= P_\theta(X = x \cap T = T(x)) \\
&= P_\theta(X = x \mid T = T(x))\, P_\theta(T = T(x))
\end{aligned}
\]
On the other hand...
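\[
\begin{aligned}
P_\theta(X = x \mid T = T(x)) &= \frac{P_\theta(X = x)}{P_\theta(T = T(x))} \\
&= \frac{g(T(x) \mid \theta)\, h(x)}{\sum_{y:\, T(y) = T(x)} g(T(y) \mid \theta)\, h(y)} \\
&= \frac{h(x)}{\sum_{y:\, T(y) = T(x)} h(y)}
\end{aligned}
\]
The factor $g(T(y) \mid \theta)$ equals $g(T(x) \mid \theta)$ for every $y$ in the sum, so it cancels and the conditional probability does not depend on $\theta$.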
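The discrete argument can be illustrated by brute-force enumeration. The sketch below uses independent Bernoulli($\theta$) observations with $T(x) = \sum_i x_i$, which is an assumed example and not one given in the slides. Here $g(t \mid \theta) = \theta^t (1-\theta)^{n-t}$ and $h(x) = 1$.

```python
from itertools import product
from math import comb, isclose

def pmf(x, theta):
    """Joint pmf of independent Bernoulli(theta) observations."""
    s = sum(x)
    return theta**s * (1 - theta) ** (len(x) - s)

def conditional_pmf(x, theta):
    """P_theta(X = x | T = T(x)) with T(x) = sum(x), by direct enumeration."""
    t = sum(x)
    denom = sum(
        pmf(y, theta) for y in product([0, 1], repeat=len(x)) if sum(y) == t
    )
    return pmf(x, theta) / denom

x = (1, 0, 1, 1)  # arbitrary illustrative sample point
values = [conditional_pmf(x, theta) for theta in (0.2, 0.5, 0.9)]
print(values, 1 / comb(len(x), sum(x)))
```

Each conditional probability equals $1/\binom{n}{t}$ up to rounding, regardless of $\theta$.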
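\[
\begin{aligned}
f(x \mid \alpha, x_m) &= \prod_{i=1}^n \frac{\alpha\, x_m^{\alpha}}{x_i^{\alpha+1}} \\
&= \frac{\alpha^n x_m^{n\alpha}}{\prod_{i=1}^n x_i^{\alpha+1}} \\
&= g(t, \alpha)\, h(x).
\end{aligned}
\]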
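A numerical check of this factorization is sketched below. The slides do not say which $t$ is intended; the code assumes $t = \prod_i x_i$ with $h(x) = 1$ as one possible choice, treats $x_m$ as known, and assumes all observations are at least $x_m$. The data values are made up for the illustration.

```python
import math

def pareto_likelihood(x, alpha, x_m):
    """Product of alpha * x_m**alpha / x_i**(alpha + 1) over the sample."""
    return math.prod(alpha * x_m**alpha / xi ** (alpha + 1) for xi in x)

def factorized(x, alpha, x_m):
    """g(t, alpha) with t = prod(x); h(x) is taken to be 1."""
    n, t = len(x), math.prod(x)
    return alpha**n * x_m ** (n * alpha) / t ** (alpha + 1)

x_m = 1.0
x = [2.0, 3.0, 6.0]  # product 36
y = [3.0, 3.0, 4.0]  # different sample, same product
alphas = (0.5, 1.0, 3.0)

for a in alphas:
    print(a, pareto_likelihood(x, a, x_m), pareto_likelihood(y, a, x_m))
```

Two samples with the same product give the same likelihood for every $\alpha$, which is what the factorization predicts.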
Prove that $\sum_i \log x_i$ is a sufficient statistic for $\alpha$, where
\[
f(x_1, \dots, x_n) = \alpha^n (\alpha+1)^n \prod_i x_i^{\alpha+1} \prod_i (1 - x_i).
\]
the statistic
\[
T(X) = \left( \sum_{j=1}^n T_1(X_j), \dots, \sum_{j=1}^n T_k(X_j) \right)
\]
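A statistic of this form can be computed generically from the component functions $T_1, \dots, T_k$. The sketch below assumes the slide refers to the usual exponential-family setting (the surrounding text did not survive) and uses the normal family with $T_1(x) = x$ and $T_2(x) = x^2$ as a hypothetical example. It then checks that the log-likelihood computed from $T$ matches the one computed from the full sample.

```python
import math

def natural_statistic(sample, component_functions):
    """Return (sum_j T_1(x_j), ..., sum_j T_k(x_j))."""
    return tuple(sum(T(x) for x in sample) for T in component_functions)

# Hypothetical choice for N(mu, sigma^2): T_1(x) = x, T_2(x) = x^2.
T_normal = (lambda x: x, lambda x: x * x)

def loglik_from_data(sample, mu, sigma):
    """Normal log-likelihood summed observation by observation."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in sample
    )

def loglik_from_statistic(n, t, mu, sigma):
    """Same log-likelihood written through n and t = (sum x, sum x^2) only."""
    s1, s2 = t
    return (
        -0.5 * n * math.log(2 * math.pi * sigma**2)
        - (s2 - 2 * mu * s1 + n * mu**2) / (2 * sigma**2)
    )

sample = [1.2, 0.7, 2.5, 1.9]  # arbitrary illustrative data
t = natural_statistic(sample, T_normal)
print(t)
```

Because the likelihood can be rebuilt from `t` and the sample size alone, the individual observations carry no further information about $(\mu, \sigma)$ in this model.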
Example
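\[
\begin{aligned}
f(x_1, x_2, \dots, x_n; \theta) &= \prod_{i=1}^n \frac{1}{\theta} \exp\left(1 - \frac{x_i}{\theta}\right) I_{(\theta,\infty)}(x_i) \\
&= \frac{1}{\theta^n} \exp\left(\sum_{i=1}^n \left(1 - \frac{x_i}{\theta}\right)\right) \prod_{i=1}^n I_{(\theta,\infty)}(x_i) \\
&= \frac{1}{\theta^n} \exp\left(n - \frac{1}{\theta} \sum_{i=1}^n x_i\right) I_{(\theta,\infty)}(x_{(1)})
\end{aligned}
\]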
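The last line shows that the joint density depends on the data only through $\sum_i x_i$ and the minimum $x_{(1)}$. A minimal numerical check follows; the data and the values of $\theta$ are arbitrary assumptions, and the function names are hypothetical.

```python
import math

def joint_density(x, theta):
    """Product of (1/theta) * exp(1 - x_i/theta) * I(x_i > theta)."""
    value = 1.0
    for xi in x:
        indicator = 1.0 if xi > theta else 0.0
        value *= indicator * math.exp(1 - xi / theta) / theta
    return value

def density_from_summaries(n, total, smallest, theta):
    """Same density written through n, sum(x) and the minimum x_(1) only."""
    if smallest <= theta:
        return 0.0
    return math.exp(n - total / theta) / theta**n

x = [2.4, 3.1, 5.0, 2.9]         # arbitrary illustrative data
thetas = (1.0, 2.0, 2.5, 4.0)    # the last two exceed min(x)

for theta in thetas:
    print(theta, joint_density(x, theta),
          density_from_summaries(len(x), sum(x), min(x), theta))
```

For values of $\theta$ at or above the sample minimum both expressions are zero, which is the role of the indicator $I_{(\theta,\infty)}(x_{(1)})$.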
Example
So the sample sum, $\sum_i x_i$, is a sufficient and complete statistic.
Exercises:
Assume that the random variables $X_1, X_2, \dots, X_n$ form a random sample of size $n$ from the distribution specified, and show that the statistic $T$ specified is a sufficient statistic for the parameter:
Exercise 1: A normal distribution for which the mean $\mu$ is known and the variance $\sigma^2$ is unknown; $T = \sum_{i=1}^n (X_i - \mu)^2$.
Exercise 2: A gamma distribution with parameters $\alpha$ and $\beta$, where the value of $\beta$ is known and the value of $\alpha$ is unknown ($\alpha > 0$); $T = \prod_{i=1}^n X_i$.
Exercise 3: A uniform distribution on the interval $[a, b]$, where the value of $a$ is known and the value of $b$ is unknown ($b > a$); $T = \max(X_1, \dots, X_n)$.
Exercise 4: A uniform distribution on the interval $[a, b]$, where the value of $b$ is known and the value of $a$ is unknown ($b > a$); $T = \min(X_1, \dots, X_n)$.
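Before writing a proof, a claim of this kind can be sanity-checked numerically. The sketch below does so for Exercise 3 only. It is a check under assumed data, not a solution: it compares the likelihood of two made-up samples that share the same size and the same maximum.

```python
def uniform_likelihood(x, a, b):
    """Joint density of an i.i.d. Uniform[a, b] sample at the point x."""
    if any(xi < a or xi > b for xi in x):
        return 0.0
    return (b - a) ** (-len(x))

a = 0.0                # assumed known lower endpoint
x = [0.2, 1.7, 0.9]    # arbitrary illustrative data
y = [1.7, 1.1, 0.4]    # same size and same maximum as x
candidates = (1.0, 1.7, 2.0, 5.0)  # candidate values of b

for b in candidates:
    print(b, uniform_likelihood(x, a, b), uniform_likelihood(y, a, b))
```

The two likelihoods agree for every candidate $b$, including those below the maximum where both are zero. This is consistent with the maximum being sufficient, although agreement on a few values does not prove it.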
Mezzetti The Sufficiency Principle
The Sufficiency Principle
Outline
Examples Sufficient Statistics
Principle of Data Reduction
Minimal Sufficient Statistics
Exercises:
Exercise 5: Suppose that $X_1, X_2, \dots, X_n$ form a random sample from a gamma distribution with parameters $\alpha > 0$ and $\beta > 0$, and the value of $\beta$ is known. Show that the statistic $T = \sum_{i=1}^n \log X_i$ is a sufficient statistic for the parameter $\alpha$.
Exercise 6: The Pareto distribution has density function
$f(x \mid x_0; \theta) = \theta x_0^{\theta} x^{-\theta-1}, \quad x \ge x_0, \quad \theta > 1.$
Assume that $x_0 > 0$ is given and that $X_1, X_2, \dots, X_n$ is an i.i.d. sample. Find a sufficient statistic for $\theta$ by
(a) using the factorization theorem,
(b) using the property of the exponential family.
Are they the same? If not, why are both of them sufficient?
Example of in-sufficiency