
STAT8310 Statistical Theory

2021

Topic 5.1
Generating functions



Generating functions
• The generating function of the sequence $\{a_0, a_1, a_2, \ldots\}$ is defined to be the function
$$g(z) = \sum_{j=0}^{\infty} a_j z^j.$$

• Note that this is a function of z.


• If the sequence is finite, then g(z) is a finite degree polynomial.
• If the sequence is infinite, then g(z) is not a polynomial, but it is
assumed that g(z) exists for some open interval containing 0.
• The number $a_j$ is the coefficient of $z^j$ in $g(z)$ (or in the expansion of $g(z)$).



• For example, if $a_j = 1$ for all $j$, then for all $z \in (-1, 1)$,
$$g(z) = \sum_{j=0}^{\infty} z^j = \frac{1}{1-z}.$$
• We can then get $a_j$ back by expanding $(1-z)^{-1}$ about $z = 0$ and finding the coefficient of $z^j$, which here is 1. This may seem trivial, but it is quite an important concept.
• If $g(z)$ exists for $z \in A$, where $A$ is open and contains 0, then, assuming we can differentiate term by term,
$$g^{(k)}(z) = \frac{d^k}{dz^k}\, g(z) = \sum_{j=k}^{\infty} j(j-1)\cdots(j-k+1)\, a_j z^{j-k} = k(k-1)\cdots 1\, a_k + \sum_{j=k+1}^{\infty} j(j-1)\cdots(j-k+1)\, a_j z^{j-k}.$$

• Thus
$$g^{(k)}(0) = k(k-1)\cdots 1\, a_k = k!\, a_k,$$
since all of the other terms are 0.



• We thus have
$$a_k = \frac{g^{(k)}(0)}{k!}.$$
• Taking the previous example, where $g(z) = \frac{1}{1-z}$, we can thus obtain the $a_k$ from the above:
$$a_k = \frac{1}{k!}\left.\frac{d^k}{dz^k}(1-z)^{-1}\right|_{z=0} = \frac{1}{k!}\left.(-1)^k(-1)(-2)\cdots(-k)\,(1-z)^{-k-1}\right|_{z=0} = 1.$$
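As a quick check of this idea, here is a minimal sympy sketch (the function and range of coefficients are illustrative) confirming that both routes — series expansion and $g^{(k)}(0)/k!$ — recover $a_k = 1$ for $g(z) = 1/(1-z)$:

```python
# Minimal sketch: recover the coefficients a_k of g(z) = 1/(1 - z)
# both by series expansion about 0 and via a_k = g^(k)(0)/k!.
import sympy as sp

z = sp.symbols('z')
g = 1 / (1 - z)

expansion = sp.series(g, z, 0, 6).removeO()             # 1 + z + z^2 + ... + z^5
print([expansion.coeff(z, j) for j in range(6)])        # [1, 1, 1, 1, 1, 1]
print([sp.diff(g, z, k).subs(z, 0) / sp.factorial(k)
       for k in range(6)])                              # [1, 1, 1, 1, 1, 1]
```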



Probability Generating Functions
• Let X be a discrete random variable.
• The probability generating function (pgf) $G_X(z)$ of X, which has probability function $f_X(x)$, is defined by
$$G_X(z) = E\left(z^X\right) = \sum_x f_X(x)\, z^x.$$
• The summation will usually start at 0 or 1, depending on the lowest value X takes on.



Example

Let X be uniformly distributed on $\{-a, -a+1, \ldots, b-1, b\}$, where $a, b > 0$. Then provided $z \neq 1$,
$$G_X(z) = \sum_{x=-a}^{b} \frac{1}{a+b+1}\, z^x = \frac{z^{-a} - z^{b+1}}{(a+b+1)(1-z)}.$$


• We shall assume from now on that X is non-negative.
• From above, we can find $f_X(x)$ from the formula
$$f_X(k) = \frac{1}{k!}\left.\frac{d^k}{dz^k}\, G_X(z)\right|_{z=0}.$$



Examples

1. $X \sim$ Bernoulli(p). Then for all z,
$$G_X(z) = (1-p)\, z^0 + p z^1 = 1 - p + pz.$$

2. $X \sim$ Bin(n, p). Then for all z,
$$G_X(z) = \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} z^x = \sum_{x=0}^{n} \binom{n}{x} (1-p)^{n-x} (pz)^x = (1 - p + pz)^n.$$



3. $X \sim$ Poisson(θ). Then for all z,
$$G_X(z) = \sum_{x=0}^{\infty} e^{-\theta}\frac{\theta^x}{x!}\, z^x = e^{-\theta}\sum_{x=0}^{\infty} \frac{(\theta z)^x}{x!} = e^{-\theta} e^{\theta z}\left\{e^{-\theta z}\sum_{x=0}^{\infty} \frac{(\theta z)^x}{x!}\right\} = e^{-\theta + \theta z},$$
since the bracketed factor is the sum of a Poisson(θz) probability function and so equals 1.



4. $X \sim$ Geometric(p). Then, as long as $|z| < \frac{1}{1-p}$,
$$G_X(z) = \sum_{x=1}^{\infty} (1-p)^{x-1} p\, z^x = \frac{pz}{1 - (1-p)z}.$$



5. $X \sim$ NegBin(r, p).

Recall from Topic 2: Newton's generalized binomial theorem
$$(a+b)^r = \sum_{i=0}^{\infty} \binom{r}{i} a^{r-i} b^i,$$
where
$$\binom{r}{i} = \frac{r(r-1)\cdots(r-i+1)}{i!} \quad \text{for } r \text{ any real number.}$$
Applied to $p^{-k} = \{1 - (1-p)\}^{-k}$, this gives
$$p^{-k} = \sum_{y=0}^{\infty} \binom{-k}{y} (-1)^y (1-p)^y.$$



Now, as long as $|z| < \frac{1}{1-p}$, we have
$$\begin{aligned}
G_X(z) &= \sum_{x=r}^{\infty} \binom{x-1}{r-1} (1-p)^{x-r} p^r z^x \\
&= \sum_{x=r}^{\infty} \frac{(x-1)\cdots 1}{(x-r)(x-r-1)\cdots 1\,(r-1)\cdots 1}\, (1-p)^{x-r} p^r z^x \\
&= \sum_{x=r}^{\infty} \frac{(x-1)\cdots r}{(x-r)(x-r-1)\cdots 1}\, (1-p)^{x-r} p^r z^x \qquad (\text{put } y = x - r) \\
&= p^r z^r \sum_{y=0}^{\infty} \frac{(y+r-1)\cdots r}{y(y-1)\cdots 1}\, \{(1-p)z\}^y \\
&= p^r z^r \sum_{y=0}^{\infty} \frac{(-r)(-r-1)\cdots(-r-y+1)}{y(y-1)\cdots 1}\, \{-(1-p)z\}^y \\
&= p^r z^r \{1 - (1-p)z\}^{-r} \\
&= \left\{\frac{pz}{1 - (1-p)z}\right\}^r.
\end{aligned}$$
• Note that the pgf in 5 is the rth power of the pgf in 4, and that
the pgf in 2 is the nth power of the pgf in 1. This is not by
accident!
• Exercise: Now find P (X = x) from the above pgfs.
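One way to check your answers to the exercise is a minimal sympy sketch like the one below (the value of p is illustrative): expand the pgf about 0 and read off $P(X = x)$ as the coefficient of $z^x$.

```python
# Minimal sketch: recover P(X = x) from the Geometric(p) pgf by expansion.
import sympy as sp

z = sp.symbols('z')
p = sp.Rational(1, 3)                         # illustrative value
G = p * z / (1 - (1 - p) * z)                 # pgf of Geometric(p), from 4 above

expansion = sp.series(G, z, 0, 6).removeO()
print([expansion.coeff(z, x) for x in range(6)])
# [0, 1/3, 2/9, 4/27, 8/81, 16/243], i.e. P(X = x) = (1 - p)^(x-1) p for x >= 1
```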



Some properties of pgfs

• The function $G(z)$ is differentiable for $|z| < 1$ and its derivative is
$$G'(z) = \sum_x x f(x)\, z^{x-1} < \infty.$$

• (Uniqueness) Let X and Y have pgfs $G_X(z)$ and $G_Y(z)$. If for some $G(z)$ we have
$$G_X(z) = G_Y(z) = G(z) \quad \text{for } |z| < 1,$$
then X and Y have the same probability mass function.



Calculating moments from the pgf

• First, note that $G_X(1) = 1$ (the sum of all the probabilities).
• Next, note that, as long as we can differentiate term by term, we have
$$G'_X(z) = \sum_x x f_X(x)\, z^{x-1}.$$
• Thus
$$G'_X(1) = \sum_x x f_X(x) = E(X).$$
• Similarly,
$$G''_X(z) = \sum_x x(x-1) f_X(x)\, z^{x-2}.$$



• Thus
$$G''_X(1) = \sum_x x(x-1) f_X(x) = E\{X(X-1)\} = E\left(X^2 - X\right) = E\left(X^2\right) - E(X).$$
• Hence
$$\operatorname{var} X = E\left(X^2\right) - \{E(X)\}^2 = G''_X(1) + E(X) - \{E(X)\}^2 = G''_X(1) + G'_X(1) - \{G'_X(1)\}^2.$$

• And so on. Since
$$G'''_X(1) = E\{X(X-1)(X-2)\} = E\left(X^3\right) - 3E\left(X^2\right) + 2E(X),$$
it follows that
$$E\left(X^3\right) = G'''_X(1) + 3\left\{G''_X(1) + G'_X(1)\right\} - 2G'_X(1)$$
and
$$\begin{aligned}
E(X - \mu_X)^3 &= G'''_X(1) + 3G''_X(1) + G'_X(1) - 3G'_X(1)\left\{G''_X(1) + G'_X(1)\right\} + 3\left\{G'_X(1)\right\}^3 - \left\{G'_X(1)\right\}^3 \\
&= G'''_X(1) + 3G''_X(1) + G'_X(1) - 3G'_X(1)\, G''_X(1) - 3\left\{G'_X(1)\right\}^2 + 2\left\{G'_X(1)\right\}^3.
\end{aligned}$$
• There is no point in deriving a general formula.
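As a quick mechanical check of these identities, here is a minimal sympy sketch using the Poisson(θ) pgf derived earlier:

```python
# Minimal sketch: E(X) and var X for Poisson(theta) from pgf derivatives.
import sympy as sp

z, theta = sp.symbols('z theta', positive=True)
G = sp.exp(-theta + theta * z)              # pgf of Poisson(theta)

G1 = sp.diff(G, z).subs(z, 1)               # G'_X(1)  = E(X)
G2 = sp.diff(G, z, 2).subs(z, 1)            # G''_X(1) = E{X(X-1)}
print(G1, sp.simplify(G2 + G1 - G1**2))     # theta, theta
```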



The pgf of a sum of independent discrete rvs
• Let $X_1, X_2, \ldots, X_n$ be independent discrete rvs with probability functions $f_{X_1}(x), f_{X_2}(x), \ldots, f_{X_n}(x)$ and pgfs $G_{X_1}(z), G_{X_2}(z), \ldots, G_{X_n}(z)$.

• We wish to calculate the probability function of
$$Y = X_1 + X_2 + \cdots + X_n.$$
• This may be difficult to do from first principles (using, e.g.
discrete convolution).
• We shall need to assume in what follows that if the rvs $U_1, \ldots, U_n$ are independent, then
$$E\left\{\prod_{i=1}^{n} h(U_i)\right\} = \prod_{i=1}^{n} E\{h(U_i)\}.$$

• Now, the pgf of Y is
$$\begin{aligned}
G_Y(z) &= E\left(z^Y\right) = E\left(z^{X_1 + X_2 + \cdots + X_n}\right) = E\left(z^{X_1} z^{X_2} \cdots z^{X_n}\right) \\
&= E\left(z^{X_1}\right) E\left(z^{X_2}\right) \cdots E\left(z^{X_n}\right) = G_{X_1}(z)\, G_{X_2}(z) \cdots G_{X_n}(z).
\end{aligned}$$

• We may recognise this as the pgf corresponding to some probability function.
• In that case, Y will have that probability function.



• Even if we don’t recognise the probability function, we can find it by expanding $G_Y(z)$ about 0 and identifying $f_Y(y)$ as the coefficient of $z^y$ in this expansion, or by using
$$f_Y(k) = \frac{1}{k!}\left.\frac{d^k}{dz^k}\, G_Y(z)\right|_{z=0}.$$

• An important case is where the $X_i$ are also identically distributed, i.e. they have the same probability function $f_X(x)$ and thus the same pgf $G_X(z)$, for then
$$G_Y(z) = \{G_X(z)\}^n.$$

Example 1: Let $X_1, X_2, \ldots, X_n$ be i.i.d. Bern(p). Then $G_Y(z) = \{G_X(z)\}^n = (1 - p + pz)^n$, which is the pgf of a Bin(n, p) rv. Thus $Y \sim$ Bin(n, p).



Example 2: $X_1, X_2, \ldots, X_n$ i.i.d. Geometric(p). Then
$$G_Y(z) = \{G_X(z)\}^n = \left\{\frac{pz}{1 - (1-p)z}\right\}^n,$$
which is the pgf of a NegBin(n, p) rv. Thus $Y \sim$ NegBin(n, p).

Example 3: X is discrete uniform on $\{1, 2, \ldots, m\}$. Then
$$G_X(z) = \sum_{x=1}^{m} \frac{1}{m}\, z^x = \begin{cases} \dfrac{z(1 - z^m)}{m(1-z)} & ; z \neq 1 \\[4pt] 1 & ; z = 1, \end{cases}$$
and
$$G_Y(z) = \begin{cases} \left\{\dfrac{z(1 - z^m)}{m(1-z)}\right\}^n & ; z \neq 1 \\[4pt] 1 & ; z = 1, \end{cases}$$
which isn't of much use!

Example 4: $X_1, X_2, \ldots, X_n$ i.i.d. Binomial(m, p). Then
$$G_Y(z) = \{G_X(z)\}^n = \{(1 - p + pz)^m\}^n = (1 - p + pz)^{mn},$$
which is the pgf of a Binomial(mn, p) rv. Thus $Y \sim$ Bin(mn, p).



Example 5: $X_1, X_2, \ldots, X_n$ i.i.d. Poisson(θ). Then
$$G_Y(z) = \{G_X(z)\}^n = \left(e^{-\theta + \theta z}\right)^n = e^{-n\theta + n\theta z},$$
which is the pgf of a Poisson(nθ) rv. Thus $Y \sim$ Poisson(nθ).
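These pgf arguments can be checked against a direct discrete convolution. The numpy sketch below (parameter values illustrative, pmfs truncated at K terms) convolves n Poisson(θ) pmfs and compares the result with the Poisson(nθ) pmf:

```python
# Minimal numeric sketch: convolving n Poisson(theta) pmfs reproduces
# Poisson(n*theta), as the pgf argument in Example 5 predicts.
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, K):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(K)])

theta, n, K = 1.5, 3, 40                    # illustrative values
f = poisson_pmf(theta, K)

g = f.copy()
for _ in range(n - 1):
    g = np.convolve(g, f)[:K]               # pmf of a sum of independent rvs

print(np.max(np.abs(g - poisson_pmf(n * theta, K))))   # ~0 (truncation error only)
```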

The independence of rvs in terms of pgfs


It is natural to ask the following question: if X and Y are non-negative, integer-valued rvs such that
$$G_{X+Y}(z) = G_X(z)\, G_Y(z)$$
is satisfied, does it follow that X and Y are independent? We show by an example that in general the answer is negative.



Example

Let ξ and η be independent rvs such that ξ takes the values 0, 1 and
2 with probability 1/3 each, and η takes the values 0 and 1 with
probabilities 1/3 and 2/3 respectively. Define X = ξ and
Y = (ξ + η)(mod 3).



Then Y takes the values 0, 1 and 2 with probability 1/3 each. Further, the sum X + Y takes the values 0, 1, 2, 3 and 4 with probabilities 1/9, 2/9, 3/9, 2/9 and 1/9 respectively. It can be checked that
$$G_{X+Y}(z) = G_X(z)\, G_Y(z)$$
is satisfied. However, the variables X and Y are not independent; indeed, they are functionally dependent.
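A brute-force enumeration makes the check concrete. The sketch below builds the joint law of (X, Y), verifies the pgf factorisation at a few sample points, and exhibits the dependence:

```python
# Minimal sketch: enumerate X = xi, Y = (xi + eta) mod 3; the pgf of X + Y
# factorises, yet X and Y are (functionally) dependent.
from fractions import Fraction
from itertools import product

p_xi = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
p_eta = {0: Fraction(1, 3), 1: Fraction(2, 3)}

joint, p_sum = {}, {}
for (xi, pa), (eta, pb) in product(p_xi.items(), p_eta.items()):
    x, y = xi, (xi + eta) % 3
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + pa * pb
    p_sum[x + y] = p_sum.get(x + y, Fraction(0)) + pa * pb

p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(3)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(3)}

def pgf(pmf, z):
    return sum(p * z**k for k, p in pmf.items())

for z in (Fraction(1, 2), Fraction(3, 4), 2):
    assert pgf(p_sum, z) == pgf(p_x, z) * pgf(p_y, z)  # factorisation holds

print(joint.get((0, 2), 0), "vs", p_x[0] * p_y[2])     # 0 vs 1/9: dependent
```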



Moment Generating Functions
• Let X be a rv. Then the moment generating function of X is defined by
$$M_X(t) = E\left(e^{tX}\right) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx & ; X \text{ continuous} \\[6pt] \displaystyle\sum_x e^{tx} f_X(x) & ; X \text{ discrete.} \end{cases}$$
• This is related to the Laplace transform of $f_X(x)$.


• In fact, many texts define the mgf to be $E\left(e^{-tX}\right)$, which is the Laplace transform of $f_X(x)$. We will use the above definition, as it coincides with the definition in the textbook and some other well-known statistics textbooks.



The moment generating function generates moments:
$$M_X(t) = E\left(e^{tX}\right) = E\left(1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots\right) = E\left\{\sum_{r=0}^{\infty} \frac{(tX)^r}{r!}\right\}.$$
Assuming that we can take expectations inside the summation sign (we can't always do this), we get
$$M_X(t) = 1 + tE(X) + \frac{t^2}{2!}\, E\left(X^2\right) + \cdots = \sum_{r=0}^{\infty} \frac{t^r}{r!}\, E(X^r) = \sum_{r=0}^{\infty} \mu'_r\, \frac{t^r}{r!}.$$
Therefore the moments $\mu'_k = E\left(X^k\right)$ can be obtained either by
1. finding the coefficient of $\frac{t^k}{k!}$ in the series expansion of $M_X(t)$ about $t = 0$, or



2. evaluating the kth derivative of $M_X(t)$ at $t = 0$: since
$$M_X^{(k)}(t) = \sum_{r=0}^{\infty} r(r-1)\cdots(r-k+1)\, \mu'_r\, \frac{t^{r-k}}{r!} = \sum_{r=k}^{\infty} r(r-1)\cdots(r-k+1)\, \mu'_r\, \frac{t^{r-k}}{r!},$$
it follows, since the only term not involving powers of t is the $r = k$ term, that
$$M_X^{(k)}(0) = \left.\frac{d^k}{dt^k}\, M_X(t)\right|_{t=0} = k!\,\mu'_k / k! = \mu'_k.$$
Note that $M_X(0) = E(1) = 1$.
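As a concrete illustration of both routes, here is a minimal sympy sketch using $M(t) = 1/(1-t)$ — the mgf of a standard exponential rv, a special case of the Gamma mgf derived later — for which $\mu'_k = k!$:

```python
# Minimal sketch: both routes to mu'_k from M(t) = 1/(1 - t), for which mu'_k = k!.
import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)

expansion = sp.series(M, t, 0, 6).removeO()
print([expansion.coeff(t, k) * sp.factorial(k) for k in range(6)])  # [1, 1, 2, 6, 24, 120]
print([sp.diff(M, t, k).subs(t, 0) for k in range(6)])              # [1, 1, 2, 6, 24, 120]
```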

For the mgf to exist, it is necessary that $M_X(t)$ exist in a neighbourhood of zero, i.e. that $M_X(t)$ be finite in some interval $(-b, b)$.

The mgf is related to the pgf, since for a discrete random variable X,
$$M_X(t) = G_X(e^t).$$

Trivially, if we wish to calculate central moments, then we use the mgf of the random variable $Y = X - \mu$.

Sometimes it is easier to calculate the mgf of Y directly.



In other cases, we use
$$E\left(e^{tY}\right) = E\left(e^{t(X-\mu)}\right) = e^{-\mu t}\, E\left(e^{tX}\right) = e^{-\mu t}\, M_X(t).$$



Moment generating functions of some known distributions
1. $X \sim U[0, 1]$
$$M_X(t) = E\left(e^{tX}\right) = \int_0^1 e^{tx} \cdot 1\, dx = \begin{cases} \left[\dfrac{e^{tx}}{t}\right]_0^1 & ; t \neq 0 \\[4pt] 1 & ; t = 0 \end{cases} = \begin{cases} \dfrac{1}{t}\left(e^t - 1\right) & ; t \neq 0 \\[4pt] 1 & ; t = 0. \end{cases}$$



• Compute $\mu'_k$ by differentiation:
$$M'_X(t) = \frac{t e^t - (e^t - 1) \times 1}{t^2}.$$
• Thus
$$M'_X(0) = \lim_{t \to 0} \frac{t e^t - (e^t - 1)}{t^2}.$$
• Use l'Hôpital's rule: if $\lim_{t \to 0} f(t) = 0$ and $\lim_{t \to 0} g(t) = 0$, then
$$\lim_{t \to 0} \frac{f(t)}{g(t)} = \lim_{t \to 0} \frac{f'(t)}{g'(t)}.$$



• Therefore, using l'Hôpital's rule, we get
$$M'_X(0) = \lim_{t \to 0} \frac{t e^t + e^t - e^t}{2t} = \lim_{t \to 0} \frac{e^t}{2} = \frac{1}{2},$$
and so
$$\mu'_1 = \mu = \frac{1}{2}.$$



• Compute $\mu'_k$ by series expansion:
$$M_X(t) = \frac{1}{t}\left(e^t - 1\right) = \frac{1}{t}\left(1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \frac{t^4}{4!} + \cdots - 1\right) = 1 + \frac{t}{2!} + \frac{t^2}{3!} + \frac{t^3}{4!} + \cdots = \sum_{r=0}^{\infty} \frac{t^r}{(r+1)!} = \sum_{r=0}^{\infty} \frac{t^r}{r!} \cdot \frac{1}{r+1}.$$
Thus
$$\mu'_k = \frac{1}{k+1} \quad \text{for } k = 0, 1, 2, 3, \ldots$$



• Hence
$$\mu = \mu'_1 = \frac{1}{1+1} = \frac{1}{2}, \qquad \mu'_2 = \frac{1}{2+1} = \frac{1}{3},$$
and
$$\sigma^2 = \mu'_2 - \mu^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12},$$
etc.
• Computation of the moments is much easier directly!
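That said, the series route above is easy to mechanise; a minimal sympy sketch:

```python
# Minimal sketch: moments of U[0, 1] from the series expansion of its mgf.
import sympy as sp

t = sp.symbols('t')
M = (sp.exp(t) - 1) / t                           # mgf of U[0, 1] for t != 0

expansion = sp.series(M, t, 0, 5).removeO()
moments = [expansion.coeff(t, k) * sp.factorial(k) for k in range(5)]
print(moments)                                    # [1, 1/2, 1/3, 1/4, 1/5]
print(moments[2] - moments[1] ** 2)               # variance = 1/12
```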



2. $X \sim N\left(\mu, \sigma^2\right)$

The mgf of X is
$$M_X(t) = E\left(e^{tX}\right) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x + \mu^2 - 2\sigma^2 t x\right)\right\} dx.$$

• We now illustrate the principle of integration by recognition: the exponent in the above is
$$-\frac{1}{2\sigma^2}\left(x^2 - 2x\mu + \mu^2 - 2\sigma^2 t x\right),$$
which is quadratic in x.
• The exponent of the pdf is of the form $c(x-a)^2 + d$.
• If we can get the exponent here in the same form, we can use the fact that the pdf of X integrates to 1 to find the mgf.
• The exponent is, completing the square with respect to x,
$$\begin{aligned}
&-\frac{1}{2\sigma^2}\left\{x^2 - 2x\left(\mu + \sigma^2 t\right) + \left(\mu + \sigma^2 t\right)^2 - \left(\mu + \sigma^2 t\right)^2 + \mu^2\right\} \\
&\quad= -\frac{1}{2\sigma^2}\left[\left\{x - \left(\mu + \sigma^2 t\right)\right\}^2 - \mu^2 - 2\mu\sigma^2 t - \sigma^4 t^2 + \mu^2\right] \\
&\quad= -\frac{1}{2\sigma^2}\left\{x - \left(\mu + \sigma^2 t\right)\right\}^2 + \mu t + \frac{1}{2}\sigma^2 t^2.
\end{aligned}$$
• Thus
$$M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\left\{x - \left(\mu + \sigma^2 t\right)\right\}^2\right] dx = e^{\mu t + \frac{1}{2}\sigma^2 t^2},$$
since the integral is that of the pdf of a $N\left(\mu + \sigma^2 t, \sigma^2\right)$ rv.
• Then
$$M'_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}\left(\mu + \sigma^2 t\right)$$
and
$$M''_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}\sigma^2 + \left(\mu + \sigma^2 t\right)^2 e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$
• Hence
$$E(X) = M'_X(0) = e^{0+0}(\mu + 0) = \mu$$
and
$$E\left(X^2\right) = M''_X(0) = e^{0+0}\sigma^2 + (\mu + 0)^2 e^{0+0} = \sigma^2 + \mu^2.$$
• Consequently
$$\operatorname{var} X = \mu_2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2.$$
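A minimal sympy sketch confirming these two derivative evaluations:

```python
# Minimal sketch: E(X) and var X for N(mu, sigma^2) from its mgf.
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)
M = sp.exp(mu * t + sp.Rational(1, 2) * sigma**2 * t**2)

m1 = sp.diff(M, t).subs(t, 0)                     # E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)                  # E(X^2)
print(m1, sp.simplify(m2 - m1**2))                # mu, sigma**2
```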

3. $X \sim$ Poisson(λ).
$$f_X(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!} & x = 0, 1, 2, \ldots \\[4pt] 0 & \text{otherwise} \end{cases}$$



• Thus
$$M_X(t) = E\left(e^{tX}\right) = \sum_x e^{tx} f_X(x) = \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{\left(\lambda e^t\right)^x}{x!} = e^{-\lambda} \cdot e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$



• The derivatives are given by
$$M'_X(t) = e^{\lambda(e^t - 1)}\, \lambda e^t,$$
$$M''_X(t) = \lambda\left\{e^{\lambda(e^t - 1)} e^t + e^t\, e^{\lambda(e^t - 1)}\, \lambda e^t\right\} = \lambda e^t e^{\lambda(e^t - 1)} + \lambda^2 e^{2t} e^{\lambda(e^t - 1)}.$$
• Hence
$$E(X) = M'_X(0) = e^{\lambda(1-1)}\, \lambda e^0 = \lambda$$
and
$$E\left(X^2\right) = M''_X(0) = \lambda + \lambda^2, \qquad \operatorname{var}(X) = \lambda + \lambda^2 - \lambda^2 = \lambda.$$

• The calculation using pgfs was clearly easier.



• Now try series expansion:
$$M_X(t) = 1 + \lambda\left(e^t - 1\right) + \frac{\lambda^2\left(e^t - 1\right)^2}{2!} + \frac{\lambda^3\left(e^t - 1\right)^3}{3!} + \cdots,$$
while
$$e^t - 1 = t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots.$$



• Thus
$$\begin{aligned}
M_X(t) &= 1 + \lambda\left(t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\right) + \frac{\lambda^2}{2!}\left\{t^2 + t^3 + \left(\frac{2}{3!} + \frac{1}{4}\right)t^4 + \cdots\right\} + \frac{\lambda^3}{3!}\left(t^3 + \frac{3}{2!}\, t^4 + \cdots\right) + \cdots \\
&= 1 + \lambda t + \left(\lambda + \lambda^2\right)\frac{t^2}{2!} + \left(\lambda + \frac{3!}{2!}\lambda^2 + \lambda^3\right)\frac{t^3}{3!} + \cdots
\end{aligned}$$



• So
$$E(X) = \lambda, \qquad E\left(X^2\right) = \lambda + \lambda^2, \qquad E\left(X^3\right) = \lambda^3 + 3\lambda^2 + \lambda.$$
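These series coefficients can be checked mechanically; a minimal sympy sketch:

```python
# Minimal sketch: first three moments of Poisson(lambda) from its mgf series.
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lamda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))

expansion = sp.expand(sp.series(M, t, 0, 4).removeO())
for k in range(1, 4):
    print(sp.expand(expansion.coeff(t, k) * sp.factorial(k)))
# lamda;  lamda**2 + lamda;  lamda**3 + 3*lamda**2 + lamda
```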


• In none of the previous examples was it easy to use the mgf to obtain moments (using either the derivative or the series expansion method).
• Also, the mgf of a distribution may not always exist. Even the existence of all moments of a distribution does not guarantee that the mgf exists.



Example

Suppose that X is a rv with density
$$f(x) = \begin{cases} \frac{1}{2}\exp\left(-\sqrt{x}\right) & \text{if } x \geq 0 \\ 0 & \text{if } x < 0. \end{cases}$$
Then $E(X^k) = \Gamma(2k+2)$, $k = 0, 1, \ldots$, and hence X has moments of any order. However, the mgf $M_X(t)$ does not exist. Indeed, the mgf can be written in the form
$$M_X(t) = \frac{1}{2}\int_0^{\infty} \exp\left(tx - \sqrt{x}\right) dx.$$
If $\varepsilon > 0$ is small enough, then for every t with $0 < t < \varepsilon$ we have $tx - \sqrt{x} \to \infty$ as $x \to \infty$. This implies that $\int_0^{\infty} \exp\left(tx - \sqrt{x}\right) dx = \infty$. Therefore $M_X(t)$ does not exist, in spite of the fact that all moments of X exist.
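A quick numeric sketch of why the integral diverges (the grid of x values is illustrative): for any fixed t > 0 the exponent tx − √x eventually grows without bound, so the integrand does too.

```python
# Minimal numeric sketch: for any fixed t > 0, t*x - sqrt(x) -> infinity,
# so the integrand exp(t*x - sqrt(x)) blows up and the mgf integral diverges.
import numpy as np

t = 0.01                                    # any fixed t > 0 will do
for x in [1e2, 1e4, 1e6, 1e8]:
    print(f"x = {x:.0e}: t*x - sqrt(x) = {t * x - np.sqrt(x):,.0f}")
# x = 1e+02: -9;  x = 1e+04: 0;  x = 1e+06: 9,000;  x = 1e+08: 990,000
```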



Basic Properties of mgfs
• $M_X(0) = 1$.
• $M_X(t)$ may not exist for any real $t \neq 0$.
• $M_{a+bX}(t) = e^{at} M_X(bt)$ for a, b constants.
Proof:
$$M_{a+bX}(t) = E\left\{e^{t(a+bX)}\right\} = E\left\{e^{at} e^{(bt)X}\right\} = e^{at} M_X(bt).$$



Uniqueness Theorem
• The mgf of a rv X uniquely determines the distribution function
of X at all points of continuity.
• The proof of this result comes from the uniqueness theorem for
Laplace transforms.
• The pdf or pf of a rv can thus in principle be obtained by
inverting the mgf.



Moment generating functions of sums of random variables
• Let X have mgf $M_X(t)$ and Y have mgf $M_Y(t)$, with X and Y independent.
• Then the mgf of $W = X + Y$ is
$$\begin{aligned}
M_W(t) &= E\left(e^{tW}\right) = E\left\{e^{t(X+Y)}\right\} = E\left(e^{tX+tY}\right) = E\left(e^{tX} e^{tY}\right) \\
&= E\left(e^{tX}\right) E\left(e^{tY}\right) \quad \text{(by independence of } X \text{ and } Y\text{)} \\
&= M_X(t)\, M_Y(t),
\end{aligned}$$



i.e. the mgf of the sum of two independent random variables is
the product of the mgfs.
• Suppose $X \sim N\left(\mu_X, \sigma_X^2\right)$ and $Y \sim N\left(\mu_Y, \sigma_Y^2\right)$, with X and Y independent.
• Then
$$M_W(t) = M_X(t)\, M_Y(t) = e^{\mu_X t + \frac{1}{2}\sigma_X^2 t^2}\, e^{\mu_Y t + \frac{1}{2}\sigma_Y^2 t^2} = e^{(\mu_X + \mu_Y)t + \frac{1}{2}\left(\sigma_X^2 + \sigma_Y^2\right)t^2}.$$
• By the uniqueness theorem, we have
$$W \sim N\left(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2\right),$$
since $M_W(t)$ is recognised as the mgf of a rv which is normally distributed with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + \sigma_Y^2$.



• We could have obtained this by using the convolution formula, but this idea is much more powerful and easily generalises:

Extension

• Let $X_1, X_2, \ldots, X_n$ be independent rvs, with mgfs $M_{X_i}(t)$, $i = 1, \ldots, n$.
• Let $Y = \sum_{i=1}^{n} X_i$.



• Then
$$\begin{aligned}
M_Y(t) &= E\left(e^{tY}\right) = E\left(e^{t\sum_{i=1}^{n} X_i}\right) = E\left(\prod_{i=1}^{n} e^{tX_i}\right) \\
&= \prod_{i=1}^{n} E\left(e^{tX_i}\right) \quad \text{(by independence of the } X_i\text{s)} \\
&= \prod_{i=1}^{n} M_{X_i}(t).
\end{aligned}$$
• Thus the mgf of the sum of independent random variables is the product of the mgfs.



We want to know if
$$M_{X_1+X_2}(t) = M_{X_1}(t)\, M_{X_2}(t)$$
implies the independence of $X_1$ and $X_2$.

Example

Let $(X_1, X_2)$ be a two-dimensional random vector defined by the table:

X1 \ X2      1       2       3
    1      2/18    1/18    3/18
    2      3/18    2/18    1/18
    3      1/18    3/18    2/18



We can easily find that $X_1$ and $X_2$ are identically distributed rvs taking each of the values 1, 2, 3 with probability 1/3. The sum $Y = X_1 + X_2$ is a rv taking values 2, 3, 4, 5, 6 with probabilities 1/9, 2/9, 3/9, 2/9, 1/9 respectively. For the mgfs we get
$$M_{X_1}(t) = M_{X_2}(t) = \frac{1}{3}\left(e^t + e^{2t} + e^{3t}\right),$$
$$M_Y(t) = \frac{1}{9}\left(e^{2t} + 2e^{3t} + 3e^{4t} + 2e^{5t} + e^{6t}\right).$$
Clearly
$$M_{X_1+X_2}(t) = M_{X_1}(t)\, M_{X_2}(t).$$
However, the rvs $X_1$ and $X_2$ are not independent, as can be seen easily from the table above:
$$P(X_1 = i, X_2 = j) \neq P(X_1 = i)\, P(X_2 = j) \quad \text{for all } i \neq j.$$



Special cases

• If the $X_i$ are identically distributed as well as being independent, then
$$M_{X_i}(t) = M_X(t), \text{ say, for all } i.$$
• Thus $Y = \sum_{i=1}^{n} X_i$ has mgf
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} M_X(t) = \{M_X(t)\}^n.$$



• Now suppose that $Y = \sum_{i=1}^{n} a_i X_i$, where the $X_i$s are independent and the $a_i$s are constants.
• Then
$$\begin{aligned}
M_Y(t) &= E\left(e^{t\sum_{i=1}^{n} a_i X_i}\right) = E\left(\prod_{i=1}^{n} e^{t a_i X_i}\right) \\
&= \prod_{i=1}^{n} E\left(e^{t a_i X_i}\right) \quad \text{(by independence of the } X_i\text{s)} \\
&= \prod_{i=1}^{n} M_{X_i}(a_i t) = \prod_{i=1}^{n} M_X(a_i t) \text{ if the } X_i\text{s are iid.}
\end{aligned}$$



– Special case $a_i = \frac{1}{n}$, $i = 1, \ldots, n$. Then
$$Y = \sum_{i=1}^{n} \frac{1}{n} X_i = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X},$$
and so
$$M_{\bar{X}}(t) = \prod_{i=1}^{n} M_X\left(\frac{t}{n}\right) = \left\{M_X\left(\frac{t}{n}\right)\right\}^n.$$



– Example: $X_i$ iid $\sim N\left(\mu_X, \sigma_X^2\right)$. Then
$$M_{\bar{X}}(t) = \left\{M_X\left(\frac{t}{n}\right)\right\}^n = \left[\exp\left\{\mu_X \frac{t}{n} + \frac{1}{2}\sigma_X^2\left(\frac{t}{n}\right)^2\right\}\right]^n = \exp\left\{\mu_X \frac{t}{n}\, n + \frac{1}{2}\sigma_X^2\left(\frac{t}{n}\right)^2 n\right\} = e^{\mu_X t + \frac{1}{2}\frac{\sigma_X^2}{n} t^2},$$
which is the mgf of a rv which is normally distributed with mean $\mu_X$ and variance $\frac{\sigma_X^2}{n}$.
– Hence, by applying the uniqueness theorem, it follows that if the $X_i$ are iid $N\left(\mu_X, \sigma_X^2\right)$, then
$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right).$$
– Previously, without the use of mgfs, we were able to find the mean and variance of $\bar{X}$, but not determine its exact distribution.
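A quick Monte Carlo sketch consistent with this result (all parameter values illustrative):

```python
# Minimal simulation sketch: the mean of n iid N(mu, sigma^2) draws behaves
# like N(mu, sigma^2/n), as the mgf argument shows.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 200_000     # illustrative values

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbar.mean(), xbar.var())                 # ~ mu = 2.0, sigma^2/n = 0.9
```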

Further examples
1. The mgf of a rv $X \sim$ Bin(n, p) is $\left(1 - p + pe^t\right)^n$ (by putting $z = e^t$ in its pgf).
• Suppose $X_1 \sim$ Bin($n_1$, p), independently of $X_2 \sim$ Bin($n_2$, p).



Then
$$M_{X_1+X_2}(t) = M_{X_1}(t)\, M_{X_2}(t) = \left(1 - p + pe^t\right)^{n_1}\left(1 - p + pe^t\right)^{n_2} = \left(1 - p + pe^t\right)^{n_1+n_2}.$$
• Thus $X_1 + X_2 \sim$ Bin($n_1 + n_2$, p) by the uniqueness theorem.

2. A rv $X \sim$ Poisson(λ) has mgf $M_X(t) = e^{\lambda(e^t - 1)}$ (by putting $z = e^t$ in its pgf).
• Suppose $X_i \sim$ Poisson($\lambda_i$), $i = 1, 2, \ldots, n$, are independent.
• Let $Y = \sum_{i=1}^{n} X_i$.



• Then
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \prod_{i=1}^{n} e^{\lambda_i(e^t - 1)} = e^{\sum_{i=1}^{n}\lambda_i (e^t - 1)} = e^{\lambda(e^t - 1)},$$
where $\lambda = \sum_{i=1}^{n} \lambda_i$.
• Thus $Y \sim$ Poisson(λ) by application of the uniqueness theorem.
Question: Is $\bar{X}$ Poisson?
Answer: No, since $\bar{X}$ can take non-integer values.



3. The mgf of a rv X which is distributed NegBin(k, p) is
$$\left\{\frac{pe^t}{1 - (1-p)e^t}\right\}^k.$$
• Suppose $X_1$ and $X_2$ are independent and negative binomially distributed with parameters $(k_1, p)$ and $(k_2, p)$.
• Then
$$M_{X_1+X_2}(t) = \left\{\frac{pe^t}{1 - (1-p)e^t}\right\}^{k_1}\left\{\frac{pe^t}{1 - (1-p)e^t}\right\}^{k_2} = \left\{\frac{pe^t}{1 - (1-p)e^t}\right\}^{k_1+k_2}.$$
• Application of the uniqueness theorem gives $X_1 + X_2 \sim$ NegBin($k_1 + k_2$, p).



4. Sum of i.i.d. Gammas.
• We'll put $\beta = 1/\gamma$ to make the algebra easier.
• If $X \sim G(\alpha, \beta)$, then
$$\begin{aligned}
M_X(t) &= \int_0^{\infty} e^{tx}\, \frac{\gamma^{\alpha} x^{\alpha-1} e^{-\gamma x}}{\Gamma(\alpha)}\, dx = \gamma^{\alpha}\int_0^{\infty} \frac{x^{\alpha-1} e^{-(\gamma-t)x}}{\Gamma(\alpha)}\, dx \\
&= \frac{\gamma^{\alpha}}{(\gamma-t)^{\alpha}} \int_0^{\infty} \frac{(\gamma-t)^{\alpha} x^{\alpha-1} e^{-(\gamma-t)x}}{\Gamma(\alpha)}\, dx \\
&= \begin{cases} \left(\dfrac{\gamma}{\gamma-t}\right)^{\alpha} & ; t < \gamma \\[4pt] \infty & ; t \geq \gamma \end{cases}
= \begin{cases} \left(\dfrac{1}{1-\beta t}\right)^{\alpha} & ; t < 1/\beta \\[4pt] \infty & ; t \geq 1/\beta. \end{cases}
\end{aligned}$$



• If $X_1, \ldots, X_n$ are i.i.d. $G(\alpha, \beta)$, then the mgf of $Y = \sum_{i=1}^{n} X_i$ is
$$M_Y(t) = \prod_{i=1}^{n} (1 - \beta t)^{-\alpha} = (1 - \beta t)^{-n\alpha}.$$
• Application of the uniqueness theorem gives $\sum_{i=1}^{n} X_i \sim G(n\alpha, \beta)$.
• Note: we needed to keep the βs the same for each $X_i$.
• We previously obtained this result using convolutions.



5. Sample mean of i.i.d. Gamma rvs: letting $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, we have from above
$$M_{\bar{X}}(t) = M_Y\left(\frac{t}{n}\right) = \left(1 - \frac{\beta t}{n}\right)^{-n\alpha}.$$
Application of the uniqueness theorem gives $\bar{X} \sim G\left(n\alpha, \frac{\beta}{n}\right)$.
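A Monte Carlo sketch consistent with results 4 and 5 (parameters illustrative; numpy's shape/scale parameterisation matches $(\alpha, \beta)$ as used here):

```python
# Minimal simulation sketch: sums and means of iid Gamma(alpha, beta) draws.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n, reps = 2.0, 0.5, 5, 200_000

x = rng.gamma(alpha, beta, size=(reps, n))
y = x.sum(axis=1)                       # should behave like G(n*alpha, beta)
print(y.mean(), y.var())                # ~ n*alpha*beta = 5, n*alpha*beta^2 = 2.5
xbar = x.mean(axis=1)                   # should behave like G(n*alpha, beta/n)
print(xbar.mean(), xbar.var())          # ~ alpha*beta = 1, alpha*beta^2/n = 0.1
```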



Cumulant Generating Function
• Definition: $K_X(t) = \log M_X(t)$ is called the cumulant generating function (cgf) of X.
• It generates “cumulants” in the same way that the mgf generates moments.
• Let
$$K_X(t) = \sum_{r=0}^{\infty} \frac{\kappa_r}{r!}\, t^r.$$
• Then $\kappa_r$ is called the rth cumulant of X.
• If the mgf exists for a distribution, then so does the cgf.



• Let $X_1$ and $X_2$ be independent rvs and let $Y = X_1 + X_2$. Then
$$K_Y(t) = \log M_Y(t) = \log\{M_{X_1}(t)\, M_{X_2}(t)\} = \log M_{X_1}(t) + \log M_{X_2}(t) = K_{X_1}(t) + K_{X_2}(t).$$
• The cgf of the sum of two independent rvs is thus the sum of the two cgfs.
• Let $Y = \sum_{i=1}^{n} X_i$, where the $X_i$s are i.i.d.
• Then
$$K_Y(t) = \log M_Y(t) = \log\{M_X(t)\}^n = n\log M_X(t),$$



and
$$K_{\bar{X}}(t) = \log M_{\bar{X}}(t) = \log\left\{M_X\left(\frac{t}{n}\right)\right\}^n = n\log M_X\left(\frac{t}{n}\right) = nK_X\left(\frac{t}{n}\right).$$
• These results will come in useful later.
• As with mgfs, cumulants can be obtained from cgfs in two different ways:
1. Differentiation
2. Taylor Series Expansion



• Note: there is no simple formula for $\kappa_r$ in general, and cumulants have no intrinsic meaning.
1. Differentiation
$$K_X(0) = \log M_X(0) = \log 1 = 0,$$
$$K'_X(t) = \frac{d}{dt}\{\log M_X(t)\} = \frac{M'_X(t)}{M_X(t)},$$
$$\therefore \kappa_1 = K'_X(0) = \frac{M'_X(0)}{M_X(0)} = \frac{\mu'_1}{1} = \mu'_1 = \mu_X,$$
i.e. the first cumulant is $E(X) = \mu_X$.
2. Now
$$K''_X(t) = \frac{d}{dt}\left\{\frac{M'_X(t)}{M_X(t)}\right\} = \frac{M''_X(t)\, M_X(t) - \{M'_X(t)\}^2}{\{M_X(t)\}^2}.$$



Thus
$$\kappa_2 = K''_X(0) = \frac{1 \times \mu'_2 - (\mu'_1)^2}{1^2} = \sigma_X^2,$$
i.e. the second cumulant is the variance $\sigma_X^2$. Similarly,
$$K^{(3)}_X(0) = \kappa_3 = \mu_3,$$
the third central moment. Also
$$K^{(4)}_X(0) = \kappa_4 = \mu_4 - 3\sigma^4$$
is not really meaningful, except that all of the cumulants of a normal random variable from the 3rd onwards are 0, since when $X \sim N\left(\mu, \sigma^2\right)$,
$$K_X(t) = \mu t + \frac{1}{2}\sigma^2 t^2,$$
and so $K^{(r)}_X(t) = 0$ when $r \geq 3$.
3. The Taylor series expansion of $K_X(t)$ is found by expanding $M_X(t)$:
$$K_X(t) = \log M_X(t) = \log\left(1 + \mu'_1 t + \mu'_2\frac{t^2}{2!} + \mu'_3\frac{t^3}{3!} + \cdots\right) = \log\{1 + h(t)\},$$
where $h(t) = \mu'_1 t + \mu'_2\frac{t^2}{2!} + \mu'_3\frac{t^3}{3!} + \cdots$. Now, if $|S| < 1$,
$$\log(1 + S) = S - \frac{1}{2}S^2 + \frac{1}{3}S^3 - \frac{1}{4}S^4 + \cdots.$$



Thus, as long as $|h(t)| < 1$,
$$K_X(t) = h(t) - \frac{1}{2}\{h(t)\}^2 + \frac{1}{3}\{h(t)\}^3 - \frac{1}{4}\{h(t)\}^4 + \cdots = \left(\mu'_1 t + \mu'_2\frac{t^2}{2!} + \mu'_3\frac{t^3}{3!} + \cdots\right) - \frac{1}{2}\left(\mu'_1 t + \mu'_2\frac{t^2}{2!} + \mu'_3\frac{t^3}{3!} + \cdots\right)^2 + \cdots.$$
Expanding and grouping like terms, we have
$$K_X(t) = \mu'_1 t + \left\{\mu'_2 - (\mu'_1)^2\right\}\frac{t^2}{2!} + \cdots.$$
Since $\kappa_r$ is the coefficient of $\frac{t^r}{r!}$ in $K_X(t)$, we have
$$\kappa_0 = 0, \qquad \kappa_1 = \mu'_1 = \mu, \qquad \kappa_2 = \mu'_2 - (\mu'_1)^2 = \sigma^2.$$
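A minimal sympy sketch of this expansion route, using the Poisson(λ) cgf $K_X(t) = \lambda(e^t - 1)$, whose cumulants are all equal to λ:

```python
# Minimal sketch: cumulants from the series expansion of a cgf.
# For Poisson(lambda), K(t) = lambda*(e^t - 1), so kappa_r = lambda for r >= 1.
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lamda', positive=True)
K = sp.log(sp.exp(lam * (sp.exp(t) - 1)))      # cgf = log of the mgf

expansion = sp.expand(sp.series(K, t, 0, 5).removeO())
print([expansion.coeff(t, r) * sp.factorial(r) for r in range(1, 5)])
# [lamda, lamda, lamda, lamda]
```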



“Central” Cumulant Generating Function
• As with the mgf, we can find the cgf of $Y = X - \mu$ from
$$K_{X-\mu}(t) = \log M_{X-\mu}(t) = \log\left\{e^{-\mu t} M_X(t)\right\} = -\mu t + K_X(t).$$
• Thus the only cumulants of Y and X which differ are the first cumulants.

