
Exponential Families

The random vector Y = (Y_1, ..., Y_n)^T has a distribution from an exponential family if the density of Y is of the form

    f_Y(y|θ) = b(y) exp{η(θ)^T T(y) − a(θ)},

where

    η(θ) = (η_1(θ), ..., η_d(θ))^T,
    T(y) = (T_1(y), ..., T_d(y))^T.

◦ η = (η_1, ..., η_d)^T is the natural parameter of the exponential family.

◦ T = T(Y) is a sufficient statistic for θ (or for η).

Definition. A statistic T = T(Y_1, ..., Y_n) is said to be sufficient for θ if the conditional distribution of Y_1, ..., Y_n given T = t does not depend on θ for any value of t.

A sufficient statistic for θ contains all the information in the sample about θ. Thus, given the value of T, we cannot improve our knowledge about θ by a more detailed analysis of the data Y_1, ..., Y_n. In other words, an estimate based on T = t cannot be improved by using the full data Y_1, ..., Y_n.
Sufficiency and exponential families: Rice (1995), 280-284.
Example: Consider a sequence of independent Bernoulli trials,

    Y_i ~ Bin(1, θ)  iid.

The number of successes, S_n = Σ_{i=1}^n Y_i, is sufficient for the parameter θ. Additional information about the observed values Y_1, ..., Y_n, such as the order in which the successes occurred, does not convey any further information about θ.
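Sufficiency here can be checked directly: given S_n = s, every 0/1 sequence with s successes has the same conditional probability 1/C(n, s), whatever θ is. A short Python sketch (the function name `cond_prob` is ours, not from the text):

```python
from math import comb

def cond_prob(y, theta):
    """P(Y = y | S_n = sum(y)) for independent Bin(1, theta) trials."""
    n, s = len(y), sum(y)
    p_y = theta ** s * (1 - theta) ** (n - s)                # P(Y = y)
    p_s = comb(n, s) * theta ** s * (1 - theta) ** (n - s)   # P(S_n = s)
    return p_y / p_s                                         # = 1 / comb(n, s)

# the conditional probability is the same for every value of theta
for theta in (0.2, 0.5, 0.9):
    assert abs(cond_prob((1, 0, 1, 1, 0), theta) - 1 / comb(5, 3)) < 1e-12
```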

Exponential Families, Apr 13, 2004 -1-



Examples

◦ Binomial distribution: Y ~ Bin(n, θ)

    f_Y(y|θ) = (n choose y) θ^y (1 − θ)^{n−y}
             = (n choose y) exp{y log(θ/(1−θ)) + n log(1 − θ)}

  with

    η(θ) = log(θ/(1−θ)),
    T(y) = y.

◦ Normal distribution: Y ~ N(μ, σ²)

    f_Y(y|μ, σ²) = (2πσ²)^{−1/2} exp{−(y − μ)²/(2σ²)}
                 = exp{−y²/(2σ²) + (μ/σ²) y − μ²/(2σ²) − (1/2) log(2πσ²)}

  with

    η(μ, σ²) = (−1/(2σ²), μ/σ²)^T,
    T(y) = (y², y)^T.

  If Y = (Y_1, ..., Y_n)^T with Y_i ~ N(μ, σ²) iid, then

    T(y) = (Σ_{i=1}^n y_i², Σ_{i=1}^n y_i)^T.
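As a sanity check, the two expressions for the normal density above can be compared numerically; the following sketch evaluates both forms (function names are ours):

```python
from math import exp, log, pi

def normal_pdf(y, mu, sigma2):
    """Standard form of the N(mu, sigma2) density."""
    return (2 * pi * sigma2) ** -0.5 * exp(-(y - mu) ** 2 / (2 * sigma2))

def expfam_pdf(y, mu, sigma2):
    """Exponential-family form: exp{eta^T T(y) - a(theta)} with b(y) = 1."""
    eta = (-1 / (2 * sigma2), mu / sigma2)   # natural parameter
    T = (y ** 2, y)                          # sufficient statistic
    a = mu ** 2 / (2 * sigma2) + 0.5 * log(2 * pi * sigma2)
    return exp(eta[0] * T[0] + eta[1] * T[1] - a)

for y in (-1.5, 0.0, 2.3):
    assert abs(normal_pdf(y, 1.0, 4.0) - expfam_pdf(y, 1.0, 4.0)) < 1e-12
```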




◦ Gamma distribution: Y ~ Γ(α, λ)

    f_Y(y|α, λ) = (λ^α / Γ(α)) y^{α−1} exp(−λy)
                = y^{−1} exp{−λy + α log(y) + α log(λ) − log Γ(α)}

  with

    η(α, λ) = (−λ, α)^T,
    T(y) = (y, log(y))^T.

  If Y = (Y_1, ..., Y_n)^T with Y_i ~ Γ(α, λ) iid, then

    T(y) = (Σ_{i=1}^n y_i, Σ_{i=1}^n log(y_i))^T.

  This includes as special cases

  ◦ the exponential distribution (= Γ(1, λ)),

  ◦ the χ² distribution with n degrees of freedom (= Γ(n/2, 1/2)).

◦ Beta distribution: Y ~ B(α, β)

    f_Y(y|α, β) = (Γ(α + β) / (Γ(α) Γ(β))) y^{α−1} (1 − y)^{β−1}
                = [y(1 − y)]^{−1} exp{α log(y) + β log(1 − y)
                                      + log Γ(α + β) − log Γ(α) − log Γ(β)}

  with

    η(α, β) = (α, β)^T,
    T(y) = (log(y), log(1 − y))^T.



Maximum Likelihood for Exponential Families

Suppose that Y = (Y_1, ..., Y_n)^T has density

    f_Y(y|θ) = b(y) exp{η(θ)^T T(y) − a(θ)}.

Then the log-likelihood function is given by

    l_n(θ|Y) = η(θ)^T T(Y) − a(θ) + log b(Y).




Differentiating with respect to θ, we obtain the likelihood equations

    (∂η(θ)/∂θ)^T T(Y) = ∂a(θ)/∂θ.

The likelihood equations can be rewritten in the following form:

    (∂η(θ)/∂θ)^T E[T(Y)|θ] = (∂η(θ)/∂θ)^T T(Y).

If the matrix ∂η(θ)/∂θ is invertible, then this simplifies to

    E[T(Y)|θ] = T(Y).

Proof. To see this, note that

    (∂η(θ)/∂θ)^T E[T(Y)|θ] − ∂a(θ)/∂θ
      = ∫ ((∂η(θ)/∂θ)^T T(y) − ∂a(θ)/∂θ) exp{η(θ)^T T(y) − a(θ)} b(y) dy
      = ∂/∂θ ∫ exp{η(θ)^T T(y) − a(θ)} b(y) dy
      = 0,

since the integral in the last line equals 1 for all θ, so its derivative vanishes.




Examples

◦ Y_1, ..., Y_n ~ N(μ, σ²) iid

  Note that

    T(Y) = (Σ_{i=1}^n Y_i, Σ_{i=1}^n Y_i²)^T.

  Thus the ML estimator is given by the solution of

    Σ_{i=1}^n Y_i  = n μ,
    Σ_{i=1}^n Y_i² = n σ² + n μ².
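Solving these two equations gives the familiar estimates μ̂ = Ȳ and σ̂² = (1/n) Σ(Y_i − Ȳ)². A quick numerical illustration with made-up data:

```python
ys = [2.1, -0.4, 1.3, 0.8, 3.0, 1.6]           # toy data
n = len(ys)
t1, t2 = sum(ys), sum(y * y for y in ys)       # sufficient statistics
mu_hat = t1 / n                                # from sum(Y) = n*mu
sigma2_hat = t2 / n - mu_hat ** 2              # from sum(Y^2) = n*sigma2 + n*mu^2

# agrees with the usual (biased) variance estimator
assert abs(sigma2_hat - sum((y - mu_hat) ** 2 for y in ys) / n) < 1e-12
```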

◦ Y_1, ..., Y_n ~ Γ(α, λ) iid

  Note that

    T(Y) = (Σ_{i=1}^n Y_i, Σ_{i=1}^n log(Y_i))^T.

  Thus the ML estimator is given by the solution of

    Σ_{i=1}^n Y_i = n α/λ,
    Σ_{i=1}^n log(Y_i) = n E[log(Y_1)].

  It can be shown that

    E[log(Y_1)] = (∂Γ(α)/∂α) · (1/Γ(α)) − log(λ).
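In the Gamma case the likelihood equations have no closed-form solution, but substituting λ = α/Ȳ reduces them to a single equation in α that can be solved numerically. A stdlib-only sketch (the bisection bounds and the finite-difference approximation to the digamma function are our own choices):

```python
import random
from math import lgamma, log

def digamma(a, h=1e-6):
    # derivative of log Gamma, approximated by a central difference
    return (lgamma(a + h) - lgamma(a - h)) / (2 * h)

def gamma_mle(ys, lo=1e-3, hi=100.0):
    """Solve the Gamma likelihood equations by bisection in alpha."""
    ybar = sum(ys) / len(ys)
    logbar = sum(log(y) for y in ys) / len(ys)
    # with lambda = alpha / ybar, the remaining equation is
    #   digamma(alpha) - log(alpha) + log(ybar) = mean(log Y);
    # the left-hand side is increasing in alpha
    def f(a):
        return digamma(a) - log(a) + log(ybar) - logbar
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    alpha = 0.5 * (lo + hi)
    return alpha, alpha / ybar

random.seed(1)
sample = [random.gammavariate(3.0, 0.5) for _ in range(4000)]  # alpha = 3, lambda = 2
alpha_hat, lambda_hat = gamma_mle(sample)
```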



The EM Algorithm for Exponential Families

Suppose the complete data Y have a distribution from an exponential family,

    f_Y(y|θ) = b(y) exp{η(θ)^T T(y) − a(θ)}.

Then the EM algorithm has a particularly simple form.

EM algorithm for exponential families

◦ E-step: Estimate the sufficient statistic T = T(Y) by

    T^(k) = E[T(Y) | Y_obs, θ^(k)].

◦ M-step: Find θ^(k+1) by solving the likelihood equations for θ,

    (∂η(θ)/∂θ)^T E[T(Y)|θ] = (∂η(θ)/∂θ)^T T^(k),

  or, if the matrix ∂η(θ)/∂θ is invertible,

    E[T(Y)|θ] = T^(k).


Example: Univariate normal observations

Suppose that only the first m of n values are observed.

◦ E-step:

    T_1^(k) = E[T_1(Y) | Y_obs, μ̂^(k), σ̂^(k)²] = Σ_{i=1}^m Y_i + (n − m) μ̂^(k)

    T_2^(k) = E[T_2(Y) | Y_obs, μ̂^(k), σ̂^(k)²] = Σ_{i=1}^m Y_i² + (n − m)(σ̂^(k)² + μ̂^(k)²)

◦ M-step:

    T_1^(k) = E[T_1(Y)|μ, σ²] = n μ          ⇒  μ̂^(k+1) = (1/n) T_1^(k)

    T_2^(k) = E[T_2(Y)|μ, σ²] = n σ² + n μ²  ⇒  σ̂^(k+1)² = (1/n) T_2^(k) − (1/n²) (T_1^(k))²
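Assuming the missingness carries no information about the values, these two updates can be written as a small Python routine (the function name and starting values are ours); the iteration converges to the ML estimates based on the observed values:

```python
from math import fsum

def em_normal(y_obs, n, iters=300):
    """EM for N(mu, sigma2) when only the first m of n values are observed."""
    m = len(y_obs)
    mu, s2 = 0.0, 1.0                      # arbitrary starting values
    for _ in range(iters):
        # E-step: expected sufficient statistics given y_obs and current (mu, s2)
        t1 = fsum(y_obs) + (n - m) * mu
        t2 = fsum(y * y for y in y_obs) + (n - m) * (s2 + mu * mu)
        # M-step: solve E[T | mu, s2] = T^(k)
        mu = t1 / n
        s2 = t2 / n - mu * mu
    return mu, s2
```

At the fixed point the update for μ gives m μ = Σ y_obs, so the algorithm reproduces the mean and (biased) variance of the observed values.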




Example: t distribution

Suppose that Y_1, ..., Y_n are independently sampled from the density

    f_{Y_i}(y|μ) = (1/(√π Γ(1/2))) (1 + (y − μ)²)^{−1},

i.e. from a t distribution with one degree of freedom (a Cauchy distribution with location μ). Define the complete data as (Y, X), where X_i ~ χ²_1 iid such that

    Y_i | X_i ~ N(μ, X_i^{−1}).

Then the complete-data likelihood is

    L_n(μ|Y, X) = exp{−(1/2) Σ_{i=1}^n X_i (Y_i − μ)²} Π_{i=1}^n √X_i · f_{X_i}(X_i).

Thus

    η(μ) = (μ, −μ²/2)^T  and  T(Y, X) = (Σ_{i=1}^n X_i Y_i, Σ_{i=1}^n X_i)^T.

◦ E-step:

    T_1^(k) = Σ_{i=1}^n E[X_i | Y_i, μ̂^(k)] Y_i = Σ_{i=1}^n 2 Y_i / (1 + (Y_i − μ̂^(k))²)

    T_2^(k) = Σ_{i=1}^n E[X_i | Y_i, μ̂^(k)] = Σ_{i=1}^n 2 / (1 + (Y_i − μ̂^(k))²)

◦ M-step: Note that ∂η(μ)/∂μ = (1, −μ)^T. Thus the ML estimator solves the equation

    n E[X_1 Y_1 | μ] − n μ E[X_1] = T_1^(k) − μ T_2^(k).

  Since E[X_1] = 1 and E[X_1 Y_1 | μ] = μ E[X_1] = μ, the left-hand side vanishes, which yields

    μ̂^(k+1) = T_1^(k) / T_2^(k).
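The whole iteration thus amounts to a reweighted mean that downweights outlying observations. A compact sketch (the function name and starting value are our own choices):

```python
def em_cauchy_location(ys, iters=100):
    """EM for the location mu of t_1 (Cauchy) data via the N(mu, 1/X_i) representation."""
    mu = sorted(ys)[len(ys) // 2]          # start from the sample median
    for _ in range(iters):
        # E-step: w_i = E(X_i | Y_i, mu) = 2 / (1 + (Y_i - mu)^2)
        w = [2.0 / (1.0 + (y - mu) ** 2) for y in ys]
        # M-step: mu = T_1 / T_2, a weighted mean
        mu = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    return mu
```

For data symmetric about a point, the iteration stays at that center of symmetry, as one would expect of the ML estimator.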

