

Chapter 6

Moment Generating Functions


6.1  Definition and Properties

Our previous discussion of probability generating functions was in the context of discrete r.v.s.
Now we introduce a more general form of generating function which can be used (though not
exclusively so) for continuous r.v.s.
The moment generating function (MGF) of a random variable $X$ is defined as
$$
M_X(\theta) = E(e^{\theta X}) =
\begin{cases}
\displaystyle\sum_x e^{\theta x}\,P(X = x), & \text{if } X \text{ is discrete} \\[6pt]
\displaystyle\int_{-\infty}^{\infty} e^{\theta x} f_X(x)\,dx, & \text{if } X \text{ is continuous}
\end{cases}
\tag{6.1}
$$
for all real $\theta$ for which the sum or integral converges absolutely. In some cases the existence of $M_X(\theta)$ can be a problem for non-zero $\theta$: henceforth we assume that $M_X(\theta)$ exists in some neighbourhood of the origin, $|\theta| < \theta_0$. In this case the following can be proved:
(i) There is a unique distribution with MGF $M_X(\theta)$.
(ii) Moments about the origin may be found by power series expansion: thus we may write
$$
M_X(\theta) = E(e^{\theta X}) = E\left(\sum_{r=0}^{\infty} \frac{(\theta X)^r}{r!}\right) = \sum_{r=0}^{\infty} \frac{\theta^r}{r!}\,E(X^r) \qquad \left[\text{i.e. interchange of } E \text{ and } \textstyle\sum \text{ valid}\right]
$$
i.e.
$$
M_X(\theta) = \sum_{r=0}^{\infty} \mu'_r\,\frac{\theta^r}{r!}, \tag{6.2}
$$
where $\mu'_r = E(X^r)$. So, given a function which is known to be the MGF of a r.v. $X$, expansion of this function in a power series of $\theta$ gives $\mu'_r$, the $r$th moment about the origin, as the coefficient of $\theta^r/r!$.
(iii) Moments about the origin may also be found by differentiation: thus
$$
\frac{d^r}{d\theta^r}\{M_X(\theta)\} = \frac{d^r}{d\theta^r}\,E(e^{\theta X}) = E\left(\frac{d^r}{d\theta^r}\,e^{\theta X}\right) = E\left(X^r e^{\theta X}\right) \quad \text{(i.e. interchange of } E \text{ and differentiation valid)}.
$$
So
$$
\left[\frac{d^r}{d\theta^r}\{M_X(\theta)\}\right]_{\theta=0} = E(X^r) = \mu'_r. \tag{6.3}
$$
(Properties (ii) and (iii) are illustrated in the sketch following this list.)
(iv) If we require moments about the mean, $\mu_r = E[(X - \mu)^r]$, we consider $M_{X-\mu}(\theta)$, which can be obtained from $M_X(\theta)$ as follows:
$$
M_{X-\mu}(\theta) = E\left(e^{\theta(X-\mu)}\right) = e^{-\theta\mu}\,E(e^{\theta X}) = e^{-\theta\mu}\,M_X(\theta). \tag{6.4}
$$
Then $\mu_r$ can be obtained as the coefficient of $\theta^r/r!$ in the expansion
$$
M_{X-\mu}(\theta) = \sum_{r=0}^{\infty} \mu_r\,\frac{\theta^r}{r!} \tag{6.5}
$$
or by differentiation:
$$
\mu_r = \left[\frac{d^r}{d\theta^r}\{M_{X-\mu}(\theta)\}\right]_{\theta=0}. \tag{6.6}
$$

(v) More generally:
$$
M_{a+bX}(\theta) = E\left(e^{\theta(a+bX)}\right) = e^{a\theta}\,M_X(b\theta). \tag{6.7}
$$
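As a quick illustration of properties (ii) and (iii), here is a minimal sympy sketch (an added example, using the known MGF $M_X(\theta) = \lambda/(\lambda - \theta)$, $\theta < \lambda$, of the exponential distribution): both the series coefficients and the derivatives at $\theta = 0$ recover $E(X^r) = r!/\lambda^r$.

```python
import sympy as sp

theta = sp.symbols('theta')
lam = 2                              # rate of an Exponential(2) random variable
M = lam / (lam - theta)              # its MGF, valid for theta < lam

for r in range(1, 5):
    # property (iii): r-th derivative of the MGF, evaluated at theta = 0
    by_diff = sp.diff(M, theta, r).subs(theta, 0)
    # property (ii): coefficient of theta^r/r! in the power-series expansion
    by_series = sp.series(M, theta, 0, r + 1).removeO().coeff(theta, r) * sp.factorial(r)
    assert by_diff == by_series == sp.factorial(r) / lam**r
    print(r, by_diff)                # E(X^r) = r!/lam^r
```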

Example
Find the MGF of the $N(0, 1)$ distribution and hence of $N(\mu, \sigma^2)$. Find the moments about the mean of $N(\mu, \sigma^2)$.
Solution
If $Z \sim N(0, 1)$,
$$
\begin{aligned}
M_Z(\theta) = E(e^{\theta Z}) &= \int_{-\infty}^{\infty} e^{\theta z}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(z^2 - 2\theta z + \theta^2) + \tfrac{1}{2}\theta^2\right\} dz \\
&= \exp(\tfrac{1}{2}\theta^2) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(z - \theta)^2\right\} dz.
\end{aligned}
$$
But here $\frac{1}{\sqrt{2\pi}} \exp\{\cdots\}$ is the p.d.f. of $N(\theta, 1)$, so
$$
M_Z(\theta) = \exp(\tfrac{1}{2}\theta^2).
$$
If $X = \mu + \sigma Z$, then $X \sim N(\mu, \sigma^2)$, and
$$
M_X(\theta) = M_{\mu+\sigma Z}(\theta) = e^{\mu\theta}\,M_Z(\sigma\theta) \quad \text{by (6.7)} \quad = \exp(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2). \tag{6.8}
$$

Then
$$
M_{X-\mu}(\theta) = e^{-\mu\theta}\,M_X(\theta) = \exp(\tfrac{1}{2}\sigma^2\theta^2)
= \sum_{r=0}^{\infty} \frac{(\tfrac{1}{2}\sigma^2\theta^2)^r}{r!}
= \sum_{r=0}^{\infty} \frac{\sigma^{2r}\,\theta^{2r}}{2^r\,r!}
= \sum_{r=0}^{\infty} \frac{\sigma^{2r}\,(2r)!}{2^r\,r!} \cdot \frac{\theta^{2r}}{(2r)!}.
$$
Using property (iv) above, we obtain
$$
\mu_{2r+1} = 0, \quad r = 0, 1, 2, \ldots; \qquad
\mu_{2r} = \frac{\sigma^{2r}\,(2r)!}{2^r\,r!}, \quad r = 0, 1, 2, \ldots \tag{6.9}
$$
e.g. $\mu_2 = \sigma^2$; $\mu_4 = 3\sigma^4$.
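A quick check of (6.9) (an added sketch): expanding $\exp(\tfrac{1}{2}\sigma^2\theta^2)$ with sympy and reading off the coefficients of $\theta^r/r!$ reproduces the central moments of $N(\mu, \sigma^2)$.

```python
import sympy as sp

theta, sigma = sp.symbols('theta sigma', positive=True)
M_central = sp.exp(sp.Rational(1, 2) * sigma**2 * theta**2)   # M_{X-mu}(theta), from (6.8)

expansion = sp.series(M_central, theta, 0, 7).removeO()
for r in range(1, 7):
    mu_r = expansion.coeff(theta, r) * sp.factorial(r)        # coefficient of theta^r/r!
    print(r, sp.simplify(mu_r))   # odd moments vanish; mu_2 = sigma**2, mu_4 = 3*sigma**4
```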

6.2  Sum of independent variables

Theorem
Let $X, Y$ be independent r.v.s with MGFs $M_X(\theta)$, $M_Y(\theta)$ respectively. Then
$$
M_{X+Y}(\theta) = M_X(\theta) \cdot M_Y(\theta). \tag{6.10}
$$
Proof
$$
\begin{aligned}
M_{X+Y}(\theta) = E\left(e^{\theta(X+Y)}\right) &= E\left(e^{\theta X} \cdot e^{\theta Y}\right) \\
&= E(e^{\theta X}) \cdot E(e^{\theta Y}) \quad \text{[independence]} \\
&= M_X(\theta) \cdot M_Y(\theta).
\end{aligned}
$$
Corollary
If $X_1, X_2, \ldots, X_n$ are independent r.v.s,
$$
M_{X_1+X_2+\cdots+X_n}(\theta) = M_{X_1}(\theta) \cdot M_{X_2}(\theta) \cdots M_{X_n}(\theta). \tag{6.11}
$$
Note: If $X$ is a count r.v. with PGF $G_X(s)$ and MGF $M_X(\theta)$,
$$
M_X(\theta) = G_X(e^{\theta}); \qquad G_X(s) = M_X(\log s). \tag{6.12}
$$
Here the PGF is generally preferred, so we shall concentrate on the MGF applied to continuous r.v.s.
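As a numerical illustration of (6.12) (an added sketch, taking the Poisson PGF $G_X(s) = e^{\lambda(s-1)}$ from Chapter 3 as known): evaluating the PGF at $e^{\theta}$ gives the MGF, which can be checked against a Monte Carlo estimate of $E(e^{\theta X})$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, theta = 1.5, 0.3

x = rng.poisson(lam, size=200_000)
mc_estimate = np.exp(theta * x).mean()              # Monte Carlo estimate of E(e^{theta X})
mgf_from_pgf = np.exp(lam * (np.exp(theta) - 1))    # G_X(e^theta), with G_X(s) = exp(lam(s-1))

print(mc_estimate, mgf_from_pgf)                    # should agree to ~2 decimal places
```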
Example
Let $Z_1, \ldots, Z_n$ be independent $N(0, 1)$ r.v.s. Show that
$$
V = Z_1^2 + \cdots + Z_n^2 \sim \chi^2_n. \tag{6.13}
$$
Solution

Let $Z \sim N(0, 1)$. Then
$$
M_{Z^2}(\theta) = E\left(e^{\theta Z^2}\right) = \int_{-\infty}^{\infty} e^{\theta z^2}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(1 - 2\theta)z^2\right\} dz.
$$
Assuming $\theta < \tfrac{1}{2}$, substitute $y = \sqrt{1 - 2\theta}\,z$. Then
$$
M_{Z^2}(\theta) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}\,\frac{1}{\sqrt{1 - 2\theta}}\,dy = (1 - 2\theta)^{-\frac{1}{2}}, \qquad \theta < \tfrac{1}{2}. \tag{6.14}
$$

Hence
$$
M_V(\theta) = (1 - 2\theta)^{-\frac{1}{2}} \cdot (1 - 2\theta)^{-\frac{1}{2}} \cdots (1 - 2\theta)^{-\frac{1}{2}} = (1 - 2\theta)^{-n/2}, \qquad \theta < \tfrac{1}{2}.
$$
Now $\chi^2_n$ has the p.d.f.
$$
f_W(w) = \frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1}\,e^{-\frac{1}{2}w}, \qquad w \ge 0; \; n \text{ a positive integer.}
$$
Its MGF is
$$
\begin{aligned}
\int_0^{\infty} e^{\theta w}\,\frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1}\,e^{-\frac{1}{2}w}\,dw
&= \int_0^{\infty} \frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1} \exp\left\{-\tfrac{1}{2}w(1 - 2\theta)\right\} dw \\
&= \frac{(1 - 2\theta)^{-n/2}}{\Gamma(\tfrac{n}{2})} \int_0^{\infty} t^{\frac{n}{2}-1}\,e^{-t}\,dt \qquad (t = \tfrac{1}{2}w(1 - 2\theta),\; \theta < \tfrac{1}{2}) \\
&= (1 - 2\theta)^{-n/2}, \qquad \theta < \tfrac{1}{2} \\
&= M_V(\theta).
\end{aligned}
$$

So we deduce that $V \sim \chi^2_n$. Also, from $M_{Z^2}(\theta)$ we deduce that $Z^2 \sim \chi^2_1$.

If $V_1 \sim \chi^2_{n_1}$, $V_2 \sim \chi^2_{n_2}$ and $V_1, V_2$ are independent, then
$$
M_{V_1+V_2}(\theta) = M_{V_1}(\theta) \cdot M_{V_2}(\theta) = (1 - 2\theta)^{-\frac{n_1}{2}}\,(1 - 2\theta)^{-\frac{n_2}{2}} = (1 - 2\theta)^{-(n_1+n_2)/2} \qquad (\theta < \tfrac{1}{2}).
$$
So $V_1 + V_2 \sim \chi^2_{n_1+n_2}$. [This was also shown in Example 3, §5.8.2.]
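A quick simulation check of (6.13) (an added sketch): compare sums of $n$ squared standard normals with the $\chi^2_n$ c.d.f.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5
v = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)   # samples of Z_1^2 + ... + Z_n^2

# Kolmogorov-Smirnov test against the chi-square distribution with n degrees of freedom
print(stats.kstest(v, stats.chi2(df=n).cdf))   # small statistic: consistent with chi2_n
```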

6.3  Bivariate MGF

The bivariate MGF (or joint MGF) of the continuous r.v.s $(X, Y)$ with joint p.d.f. $f(x, y)$, $-\infty < x, y < \infty$, is defined as
$$
M_{X,Y}(\theta_1, \theta_2) = E\left(e^{\theta_1 X + \theta_2 Y}\right) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{\theta_1 x + \theta_2 y}\,f(x, y)\,dx\,dy, \tag{6.15}
$$
provided the integral converges absolutely (there is a similar definition for the discrete case). If $M_{X,Y}(\theta_1, \theta_2)$ exists near the origin, for $|\theta_1| < \theta_{10}$, $|\theta_2| < \theta_{20}$ say, then it can be shown that
$$
\left[\frac{\partial^{r+s} M_{X,Y}(\theta_1, \theta_2)}{\partial\theta_1^r\,\partial\theta_2^s}\right]_{\theta_1=\theta_2=0} = E(X^r Y^s). \tag{6.16}
$$

The bivariate MGF can also be used to find the MGF of $aX + bY$, since
$$
M_{aX+bY}(\theta) = E\left(e^{\theta(aX+bY)}\right) = E\left(e^{(a\theta)X + (b\theta)Y}\right) = M_{X,Y}(a\theta, b\theta). \tag{6.17}
$$
(6.17)


Bivariate Normal distribution

Example
Using MGFs:
(i) show that if $(U, V) \sim N(0, 0; 1, 1; \rho)$, then $\rho(U, V) = \rho$, and deduce $\rho(X, Y)$, where $(X, Y) \sim N(\mu_x, \mu_y; \sigma_x^2, \sigma_y^2; \rho)$;
(ii) for the variables $(X, Y)$ in (i), find the distribution of a linear combination $aX + bY$, and generalise the result obtained to the multivariate Normal case.
Solution
(i) We have
$$
\begin{aligned}
M_{U,V}(\theta_1, \theta_2) &= E(e^{\theta_1 U + \theta_2 V}) \\
&= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{\theta_1 u + \theta_2 v}\,\frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[u^2 - 2\rho uv + v^2\right]\right\} du\,dv \\
&= \cdots = \exp\left\{\tfrac{1}{2}(\theta_1^2 + 2\rho\theta_1\theta_2 + \theta_2^2)\right\}.
\end{aligned}
$$
Then
$$
\frac{\partial M_{U,V}(\theta_1, \theta_2)}{\partial\theta_1} = \exp\{\cdots\}\,(\theta_1 + \rho\theta_2),
$$
$$
\frac{\partial^2 M_{U,V}(\theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2} = \exp\{\cdots\}\,(\theta_1 + \rho\theta_2)(\theta_2 + \rho\theta_1) + \rho\exp\{\cdots\}.
$$
So
$$
E(UV) = \left[\frac{\partial^2 M_{U,V}(\theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2}\right]_{\theta_1=\theta_2=0} = \rho.
$$

Since $E(U) = E(V) = 0$ and $\mathrm{Var}(U) = \mathrm{Var}(V) = 1$, we have that the correlation coefficient of $U, V$ is
$$
\rho(U, V) = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}} = \frac{E(UV) - E(U)E(V)}{1} = \rho.
$$
Now let
$$
X = \mu_x + \sigma_x U, \qquad Y = \mu_y + \sigma_y V.
$$
Then, as we have seen in Example 1, §5.8.2,
$$
(U, V) \sim N(0, 0; 1, 1; \rho) \implies (X, Y) \sim N(\mu_x, \mu_y; \sigma_x^2, \sigma_y^2; \rho).
$$
It is readily shown that a correlation coefficient remains unchanged under a linear transformation of variables, so $\rho(X, Y) = \rho(U, V) = \rho$.
(ii) We have that
$$
\begin{aligned}
M_{X,Y}(\theta_1, \theta_2) &= E\left[e^{\theta_1(\mu_x + \sigma_x U) + \theta_2(\mu_y + \sigma_y V)}\right] \\
&= e^{(\theta_1\mu_x + \theta_2\mu_y)}\,M_{U,V}(\theta_1\sigma_x, \theta_2\sigma_y) \\
&= \exp\left\{(\theta_1\mu_x + \theta_2\mu_y) + \tfrac{1}{2}(\theta_1^2\sigma_x^2 + 2\rho\theta_1\theta_2\sigma_x\sigma_y + \theta_2^2\sigma_y^2)\right\}.
\end{aligned}
$$
So, for a linear combination of $X$ and $Y$,
$$
\begin{aligned}
M_{aX+bY}(\theta) = M_{X,Y}(a\theta, b\theta) &= \exp\left\{(a\mu_x + b\mu_y)\theta + \tfrac{1}{2}\left(a^2\sigma_x^2 + 2ab\,\mathrm{Cov}(X,Y) + b^2\sigma_y^2\right)\theta^2\right\} \\
&= \text{MGF of } N\left(a\mu_x + b\mu_y,\; a^2\sigma_x^2 + 2ab\,\mathrm{Cov}(X,Y) + b^2\sigma_y^2\right),
\end{aligned}
$$
i.e.
$$
aX + bY \sim N\left(aE(X) + bE(Y),\; a^2\mathrm{Var}(X) + 2ab\,\mathrm{Cov}(X,Y) + b^2\mathrm{Var}(Y)\right). \tag{6.18}
$$

More generally, let $(X_1, \ldots, X_n)$ be multivariate normally distributed. Then, by induction,
$$
\sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i E(X_i),\; \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i) + 2\sum_{i<j} a_i a_j\,\mathrm{Cov}(X_i, X_j)\right). \tag{6.19}
$$
(If the $X$s are also independent, the covariance terms vanish, but then there is a simpler derivation (see HW 8).)
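A numerical sanity check of (6.18) (an added sketch, with assumed parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_x, mu_y = 1.0, -2.0
rho, sx, sy = 0.6, 2.0, 3.0
cov = np.array([[sx**2, rho * sx * sy],
                [rho * sx * sy, sy**2]])

a, b = 0.5, -1.5
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
w = a * xy[:, 0] + b * xy[:, 1]                      # the linear combination aX + bY

mean_theory = a * mu_x + b * mu_y
var_theory = a**2 * sx**2 + 2 * a * b * rho * sx * sy + b**2 * sy**2
print(w.mean(), mean_theory)                         # both close to 3.5
print(w.var(), var_theory)                           # both close to 15.85
```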

6.4  Sequences of r.v.s

6.4.1  Continuity theorem

First we state (without proof) the following:

Theorem
Let $X_1, X_2, \ldots$ be a sequence of r.v.s (discrete or continuous) with c.d.f.s $F_{X_1}(x), F_{X_2}(x), \ldots$ and MGFs $M_{X_1}(\theta), M_{X_2}(\theta), \ldots$, and suppose that, as $n \to \infty$,
$$
M_{X_n}(\theta) \to M_X(\theta) \qquad \text{for all } \theta,
$$
where $M_X(\theta)$ is the MGF of some r.v. $X$ with c.d.f. $F_X(x)$. Then
$$
F_{X_n}(x) \to F_X(x) \qquad \text{as } n \to \infty
$$
at each $x$ where $F_X(x)$ is continuous.


Example
Using MGFs, discuss the limit of $\mathrm{Bin}(n, p)$ as $n \to \infty$, $p \to 0$ with $np = \lambda > 0$ fixed.

Solution
Let $X_n \sim \mathrm{Bin}(n, p)$, with PGF $G_{X_n}(s) = (ps + q)^n$. Then
$$
M_{X_n}(\theta) = G_{X_n}(e^{\theta}) = (pe^{\theta} + q)^n = \left\{1 + \frac{\lambda}{n}(e^{\theta} - 1)\right\}^n, \qquad \text{where } \lambda = np.
$$
Let $n \to \infty$, $p \to 0$ in such a way that $\lambda$ remains fixed. Then
$$
M_{X_n}(\theta) \to \exp\{\lambda(e^{\theta} - 1)\} \qquad \text{as } n \to \infty,
$$
since
$$
\left(1 + \frac{a}{n}\right)^n \to e^a \qquad \text{as } n \to \infty, \quad a \text{ constant,} \tag{6.20}
$$
i.e.
$$
M_{X_n}(\theta) \to \text{MGF of Poisson}(\lambda) \tag{6.21}
$$
(use (6.12), replacing $s$ by $e^{\theta}$ in the Poisson PGF (3.7)). So, invoking the above continuity theorem,
$$
\mathrm{Bin}(n, p) \to \mathrm{Poisson}(\lambda) \tag{6.22}
$$
as $n \to \infty$, $p \to 0$ with $np = \lambda > 0$ fixed. Hence in large samples, the binomial distribution can be approximated by the Poisson distribution. As a rule of thumb: the approximation is acceptable when $n$ is large, $p$ small, and $\lambda = np \le 5$.
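A quick numerical illustration of (6.22) (an added sketch): the $\mathrm{Bin}(n, \lambda/n)$ p.m.f. approaches the $\mathrm{Poisson}(\lambda)$ p.m.f. as $n$ grows.

```python
from scipy import stats

lam = 3.0
for n in (10, 100, 1000):
    binom = stats.binom(n, lam / n)
    poisson = stats.poisson(lam)
    # maximum absolute difference between the two p.m.f.s over k = 0, ..., 20
    gap = max(abs(binom.pmf(k) - poisson.pmf(k)) for k in range(21))
    print(n, gap)   # the gap shrinks roughly like 1/n
```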

6.4.2  Asymptotic normality

Let $\{X_n\}$ be a sequence of r.v.s (discrete or continuous). If two quantities $a$ and $b$ can be found such that
$$
\text{c.d.f. of } \frac{(X_n - a)}{b} \to \text{c.d.f. of } N(0, 1) \quad \text{as } n \to \infty, \tag{6.23}
$$
$X_n$ is said to be asymptotically normally distributed with mean $a$ and variance $b^2$, and we write
$$
\frac{X_n - a}{b} \overset{a}{\sim} N(0, 1) \qquad \text{or} \qquad X_n \overset{a}{\sim} N(a, b^2). \tag{6.24}
$$
Notes: (i) $a$ and $b$ need not be functions of $n$; but often $a$ and $b^2$ are the mean and variance of $X_n$ (and so are functions of $n$).
(ii) In large samples we use $N(a, b^2)$ as an approximation to the distribution of $X_n$, as the sketch below illustrates.
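For instance (an added sketch, using the standard fact that $\mathrm{Poisson}(n)$ is asymptotically $N(n, n)$):

```python
from scipy import stats

for n in (5, 50, 500):
    x = n + 2 * n**0.5              # a point two 'standard deviations' above the mean
    exact = stats.poisson(n).cdf(x)
    approx = stats.norm(loc=n, scale=n**0.5).cdf(x)
    print(n, exact, approx)         # the two c.d.f. values converge as n grows
```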

6.5  Central limit theorem

A restricted form of this celebrated theorem will now be stated and proved.
Theorem
Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed r.v.s, each with mean $\mu$ and variance $\sigma^2$. Let
$$
S_n = X_1 + X_2 + \cdots + X_n, \qquad Z_n = \frac{(S_n - n\mu)}{\sigma\sqrt{n}}.
$$
Then
$$
Z_n \overset{a}{\sim} N(0, 1), \quad \text{i.e.} \quad P(Z_n \le z) \to P(Z \le z) \text{ as } n \to \infty, \text{ where } Z \sim N(0, 1),
$$
and $S_n \overset{a}{\sim} N(n\mu, n\sigma^2)$.

Proof
Let $Y_i = X_i - \mu$ $(i = 1, 2, \ldots)$. Then $Y_1, Y_2, \ldots$ are i.i.d. r.v.s, and
$$
S_n - n\mu = X_1 + \cdots + X_n - n\mu = Y_1 + \cdots + Y_n.
$$
So
$$
M_{S_n - n\mu}(\theta) = M_{Y_1}(\theta) \cdot M_{Y_2}(\theta) \cdots M_{Y_n}(\theta) = \{M_Y(\theta)\}^n,
$$
and
$$
M_{Z_n}(\theta) = M_{\frac{S_n - n\mu}{\sigma\sqrt{n}}}(\theta)
= E\left[\exp\left\{\theta\left(\frac{S_n - n\mu}{\sigma\sqrt{n}}\right)\right\}\right]
= E\left[\exp\left\{(S_n - n\mu)\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right\}\right]
= M_{S_n - n\mu}\left(\frac{\theta}{\sigma\sqrt{n}}\right)
= \left\{M_Y\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right\}^n.
$$
Note that
$$
E(Y) = E(X - \mu) = 0; \qquad E(Y^2) = E\{(X - \mu)^2\} = \sigma^2.
$$
Then
$$
M_Y(\theta) = 1 + \frac{\theta}{1!}E(Y) + \frac{\theta^2}{2!}E(Y^2) + \frac{\theta^3}{3!}E(Y^3) + \cdots = 1 + \tfrac{1}{2}\sigma^2\theta^2 + o(\theta^2)
$$

(where $o(\theta^2)$ denotes a function $g(\theta)$ such that $g(\theta)/\theta^2 \to 0$ as $\theta \to 0$). So
$$
M_{Z_n}(\theta) = \left\{1 + \tfrac{1}{2}\sigma^2\left(\frac{\theta^2}{\sigma^2 n}\right) + o\left(\tfrac{1}{n}\right)\right\}^n = \left\{1 + \frac{\tfrac{1}{2}\theta^2}{n} + o\left(\tfrac{1}{n}\right)\right\}^n
$$
(where $o(\tfrac{1}{n})$ denotes a function $h(n)$ such that $h(n)/(1/n) \to 0$ as $n \to \infty$).
Using the standard result (6.20), we deduce that
$$
M_{Z_n}(\theta) \to \exp(\tfrac{1}{2}\theta^2) \qquad \text{as } n \to \infty,
$$
which is the MGF of $N(0, 1)$.
So
$$
\text{c.d.f. of } Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \to \text{c.d.f. of } N(0, 1) \qquad \text{as } n \to \infty,
$$
i.e.
$$
Z_n \overset{a}{\sim} N(0, 1) \qquad \text{or} \qquad S_n \overset{a}{\sim} N(n\mu, n\sigma^2). \tag{6.25} \qquad \square
$$

Corollary
Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
$$
\bar{X}_n \overset{a}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right). \tag{6.26}
$$
Proof
$\bar{X}_n = W_1 + \cdots + W_n$, where $W_i = \frac{1}{n}X_i$ and $W_1, \ldots, W_n$ are i.i.d. with mean $\frac{\mu}{n}$ and variance $\frac{\sigma^2}{n^2}$. So
$$
\bar{X}_n \overset{a}{\sim} N\left(n \cdot \frac{\mu}{n},\; n \cdot \frac{\sigma^2}{n^2}\right) = N\left(\mu, \frac{\sigma^2}{n}\right). \qquad \square
$$
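A simulation sketch of the corollary (an added example with Exp(1) summands, so $\mu = \sigma = 1$): the tail probability of the standardised mean approaches the standard normal value.

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (5, 50, 500):
    xbar = rng.exponential(1.0, size=(20_000, n)).mean(axis=1)   # sample means of size n
    z = (xbar - 1.0) * np.sqrt(n)          # standardise: mu = 1, sigma = 1
    # P(Z_n > 2) should approach P(N(0,1) > 2) = 0.0228 as n grows
    print(n, (z > 2).mean())
```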

(Note: The theorem can be generalised to
(a) independent r.v.s with different means & variances,
(b) dependent r.v.s,
but extra conditions on the distributions are required.)
Example 1
Using the central limit theorem, obtain an approximation to $\mathrm{Bin}(n, p)$ for large $n$.

Solution
Let $S_n \sim \mathrm{Bin}(n, p)$. Then
$$
S_n = X_1 + X_2 + \cdots + X_n,
$$
where
$$
X_i = \begin{cases} 1, & \text{if the } i\text{th trial yields a success} \\ 0, & \text{if the } i\text{th trial yields a failure.} \end{cases}
$$
Also, $X_1, X_2, \ldots, X_n$ are independent r.v.s with
$$
E(X_i) = p, \qquad \mathrm{Var}(X_i) = pq.
$$
So
$$
S_n \overset{a}{\sim} N(np, npq),
$$
i.e., for large $n$, the binomial c.d.f. is approximated by the c.d.f. of $N(np, npq)$.
[As a rule of thumb: the approximation is acceptable when $n$ is large and $p$ is not too far from $\tfrac{1}{2}$, such that $np > 5$.] $\square$


Example 2
As Example 1, but for the $\chi^2_n$ distribution.

Solution
Let $V_n \sim \chi^2_n$. Then we can write
$$
V_n = Z_1^2 + \cdots + Z_n^2,
$$
where $Z_1^2, \ldots, Z_n^2$ are independent r.v.s and
$$
Z_i \sim N(0, 1), \qquad Z_i^2 \sim \chi^2_1; \qquad E(Z_i^2) = 1, \quad \mathrm{Var}(Z_i^2) = 2.
$$
So
$$
V_n \overset{a}{\sim} N(n, 2n). \qquad \square
$$

Note: These are not necessarily the best approximations for large $n$. Thus
(i)
$$
F_{S_n}(s) = P(S_n \le s) \approx P\left(Z \le \frac{s + \tfrac{1}{2} - np}{\sqrt{npq}}\right), \qquad \text{where } Z \sim N(0, 1).
$$
The $\tfrac{1}{2}$ is a continuity correction, to take account of the fact that we are approximating a discrete distribution by a continuous one.
(ii)
$$
\sqrt{2V_n} \overset{\text{approx}}{\sim} N\left(\sqrt{2n - 1},\; 1\right).
$$
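The effect of the continuity correction in (i) can be seen numerically (an added sketch with assumed values $n = 40$, $p = 0.3$):

```python
import numpy as np
from scipy import stats

n, p, s = 40, 0.3, 10
mean, sd = n * p, np.sqrt(n * p * (1 - p))

exact = stats.binom(n, p).cdf(s)
plain = stats.norm.cdf((s - mean) / sd)            # normal approximation, no correction
corrected = stats.norm.cdf((s + 0.5 - mean) / sd)  # with the 1/2 continuity correction
print(exact, plain, corrected)   # the corrected value is noticeably closer to the exact one
```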

6.6  Characteristic function

The MGF does not exist unless all the moments of the distribution are finite, so many distributions (e.g. $t$, $F$) do not have MGFs, and another generating function is often used instead.
The characteristic function of a continuous r.v. $X$ is
$$
C_X(\theta) = E(e^{i\theta X}) = \int_{-\infty}^{\infty} e^{i\theta x}\,f(x)\,dx, \tag{6.27}
$$
where $\theta$ is real and $i = \sqrt{-1}$. $C_X(\theta)$ always exists, and has similar properties to $M_X(\theta)$. The CF uniquely determines the p.d.f.:
$$
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C_X(\theta)\,e^{-i\theta x}\,d\theta \tag{6.28}
$$
(cf. Fourier transform). The CF is particularly useful in studying limiting distributions. However, we do not consider the CF further in this module.
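As a closing sketch (an added example): the CF of $N(0, 1)$ is $e^{-\theta^2/2}$, a standard fact that can be confirmed by numerical integration of (6.27).

```python
import numpy as np
from scipy import integrate

def cf_standard_normal(theta):
    # real part of E(e^{i*theta*Z}); the imaginary part vanishes by symmetry
    integrand = lambda x: np.cos(theta * x) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return value

for theta in (0.0, 0.5, 1.0, 2.0):
    print(theta, cf_standard_normal(theta), np.exp(-theta**2 / 2))  # the two columns agree
```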
