110SOR201(2002)
Chapter 6
6.1  Moment generating functions

Our previous discussion of probability generating functions was in the context of discrete r.v.s. We now introduce a more general form of generating function which can be used (though not exclusively so) for continuous r.v.s.
The moment generating function (MGF) of a random variable X is defined as

    M_X(θ) = E(e^{θX}) = { Σ_x e^{θx} P(X = x)            if X is discrete
                         { ∫_{−∞}^{∞} e^{θx} f_X(x) dx    if X is continuous     (6.1)

for all real θ for which the sum or integral converges absolutely. In some cases the existence of M_X(θ) can be a problem for non-zero θ: henceforth we assume that M_X(θ) exists in some neighbourhood of the origin, |θ| < θ_0. In this case the following can be proved:
(i) There is a unique distribution with MGF M_X(θ).

(ii) Moments about the origin may be found by power series expansion: thus we may write

    M_X(θ) = E(e^{θX}) = E( Σ_{r=0}^∞ (θX)^r / r! )
           = Σ_{r=0}^∞ (θ^r / r!) E(X^r)    [interchange of E and Σ valid]

i.e.

    M_X(θ) = Σ_{r=0}^∞ μ'_r θ^r / r!,    (6.2)

where μ'_r = E(X^r). So, given a function which is known to be the MGF of a r.v. X, expansion of this function in a power series of θ gives μ'_r, the r-th moment about the origin, as the coefficient of θ^r / r!.
(iii) Moments about the origin may also be found by differentiation: thus

    (d^r/dθ^r){M_X(θ)} = (d^r/dθ^r) E(e^{θX})
                       = E( (d^r/dθ^r) e^{θX} )    (i.e. interchange of E and differentiation valid)
                       = E( X^r e^{θX} ).

So

    [ (d^r/dθ^r){M_X(θ)} ]_{θ=0} = E(X^r) = μ'_r.    (6.3)
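As a quick numerical illustration of (6.3) (added here, not part of the original notes), an MGF can be differentiated by finite differences at θ = 0 to recover moments. The sketch below, in plain Python, uses the Exponential(λ) distribution as a worked case: its MGF is M(θ) = λ/(λ − θ) for θ < λ, so M'(0) = 1/λ and M''(0) = 2/λ².

```python
lam = 2.0

# MGF of the Exponential(lam) distribution: M(theta) = lam / (lam - theta), theta < lam
def M(theta):
    return lam / (lam - theta)

h = 1e-4  # step for central finite differences

# First derivative at theta = 0 -> E(X) = 1/lam = 0.5
m1 = (M(h) - M(-h)) / (2 * h)

# Second derivative at theta = 0 -> E(X^2) = 2/lam^2 = 0.5
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2

print(m1, m2)
```

Central differences have O(h²) error, so both estimates agree with the exact moments to several decimal places.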
(iv) If we require moments about the mean, μ_r = E[(X − μ)^r], we consider M_{X−μ}(θ), which can be obtained from M_X(θ) as follows:

    M_{X−μ}(θ) = E(e^{θ(X−μ)}) = e^{−μθ} M_X(θ).    (6.4)

Then μ_r is found as the coefficient of θ^r/r! in the expansion

    M_{X−μ}(θ) = Σ_{r=0}^∞ μ_r θ^r / r!,    (6.5)

or by differentiation:

    μ_r = [ (d^r/dθ^r){M_{X−μ}(θ)} ]_{θ=0}.    (6.6)

More generally, for constants a and b,

    M_{a+bX}(θ) = E(e^{θ(a+bX)}) = e^{aθ} M_X(bθ).    (6.7)
Example
Find the MGF of the N(0, 1) distribution and hence of N(μ, σ²). Find the moments about the mean of N(μ, σ²).

Solution
If Z ∼ N(0, 1),

    M_Z(θ) = E(e^{θZ})
           = ∫_{−∞}^{∞} (1/√(2π)) e^{θz} e^{−z²/2} dz
           = ∫_{−∞}^{∞} (1/√(2π)) exp{−½(z² − 2θz + θ²) + ½θ²} dz
           = exp(½θ²) ∫_{−∞}^{∞} (1/√(2π)) exp{−½(z − θ)²} dz
           = exp(½θ²).

But here (1/√(2π)) exp{−½(z − θ)²} is the p.d.f. of N(θ, 1), so the integral equals 1.

If X = μ + σZ, then X ∼ N(μ, σ²), and

    M_X(θ) = M_{μ+σZ}(θ)
           = e^{μθ} M_Z(σθ)    by (6.7)
           = exp(μθ + ½σ²θ²).    (6.8)
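Formula (6.8) can be sanity-checked by simulation (an added sketch, not part of the notes; plain Python standard library only): estimate E(e^{θX}) from samples of N(μ, σ²) and compare with exp(μθ + ½σ²θ²).

```python
import math
import random

random.seed(42)

mu, sigma, theta = 1.0, 1.5, 0.5
n = 200_000

# Monte Carlo estimate of the MGF E(e^{theta X}) for X ~ N(mu, sigma^2)
est = sum(math.exp(theta * random.gauss(mu, sigma)) for _ in range(n)) / n

# Closed form from (6.8): exp(mu*theta + sigma^2 * theta^2 / 2)
exact = math.exp(mu * theta + 0.5 * sigma**2 * theta**2)

print(est, exact)
```

With 200 000 samples the Monte Carlo standard error is well below 0.01, so the two values agree to about two decimal places.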
Then

    M_{X−μ}(θ) = e^{−μθ} M_X(θ) = exp(½σ²θ²)
               = Σ_{r=0}^∞ (½σ²θ²)^r / r!  =  Σ_{r=0}^∞ (σ^{2r} / (2^r r!)) θ^{2r}
               = Σ_{r=0}^∞ [ σ^{2r} (2r)! / (2^r r!) ] · θ^{2r}/(2r)!.
So, comparing with (6.5),

    μ_{2r} = σ^{2r} (2r)! / (2^r r!),    μ_{2r+1} = 0,    (6.9)

and in particular μ_2 = σ², μ_4 = 3σ^4.

6.2  Sums of independent r.v.s
Theorem
Let X, Y be independent r.v.s with MGFs M_X(θ), M_Y(θ) respectively. Then

    M_{X+Y}(θ) = M_X(θ) . M_Y(θ).    (6.10)

Proof

    M_{X+Y}(θ) = E(e^{θ(X+Y)})
               = E(e^{θX} . e^{θY})
               = E(e^{θX}) . E(e^{θY})    [independence]
               = M_X(θ) . M_Y(θ).    ∎

Corollary
More generally, if X_1, ..., X_n are independent r.v.s, then

    M_{X_1+⋯+X_n}(θ) = M_{X_1}(θ) ⋯ M_{X_n}(θ).    (6.11)

Note that if X is a discrete r.v. taking values 0, 1, 2, ... with PGF G_X(s), then

    M_X(θ) = G_X(e^θ).    (6.12)

Here the PGF is generally preferred, so we shall concentrate on the MGF applied to continuous r.v.s.
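Relation (6.12) can be illustrated numerically for the Poisson(λ) distribution, whose PGF is G(s) = exp{λ(s − 1)} (an added check, not in the notes): substituting s = e^θ should reproduce the MGF obtained by direct summation.

```python
import math

lam, theta = 3.0, 0.4

# By (6.12): MGF = G(e^theta), with Poisson PGF G(s) = exp(lam*(s - 1))
mgf_via_pgf = math.exp(lam * (math.exp(theta) - 1.0))

# Direct evaluation of E(e^{theta X}) = sum_k e^{theta k} e^{-lam} lam^k / k!
# (terms beyond k = 60 are negligible for lam = 3)
mgf_direct = sum(
    math.exp(theta * k) * math.exp(-lam) * lam**k / math.factorial(k)
    for k in range(60)
)

print(mgf_via_pgf, mgf_direct)
```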
Example
Let Z_1, ..., Z_n be independent N(0, 1) r.v.s. Show that

    V = Z_1² + ⋯ + Z_n²  ∼  χ²_n.    (6.13)

Solution

    M_{Z²}(θ) = E(e^{θZ²})
              = ∫_{−∞}^{∞} (1/√(2π)) e^{θz²} e^{−z²/2} dz
              = ∫_{−∞}^{∞} (1/√(2π)) exp{−½(1 − 2θ)z²} dz.

Substitute y = z√(1 − 2θ), valid for θ < ½. Then

    M_{Z²}(θ) = ∫_{−∞}^{∞} (1/√(2π)) e^{−y²/2} (1 − 2θ)^{−1/2} dy
              = (1 − 2θ)^{−1/2},    θ < ½.    (6.14)

Hence, by (6.11),

    M_V(θ) = (1 − 2θ)^{−1/2} . (1 − 2θ)^{−1/2} ⋯ (1 − 2θ)^{−1/2}
           = (1 − 2θ)^{−n/2},    θ < ½.

Now χ²_n has the p.d.f.

    f(w) = [1 / (2^{n/2} Γ(n/2))] w^{n/2 − 1} e^{−w/2},    w ≥ 0;  n a positive integer.

Its MGF is

    ∫_0^∞ e^{θw} [1 / (2^{n/2} Γ(n/2))] w^{n/2 − 1} e^{−w/2} dw
        = ∫_0^∞ [1 / (2^{n/2} Γ(n/2))] w^{n/2 − 1} exp{−½w(1 − 2θ)} dw
        = [(1 − 2θ)^{−n/2} / Γ(n/2)] ∫_0^∞ t^{n/2 − 1} e^{−t} dt    (t = ½w(1 − 2θ),  θ < ½)
        = (1 − 2θ)^{−n/2},    θ < ½
        = M_V(θ).

Since there is a unique distribution with a given MGF, it follows that V ∼ χ²_n.    ∎
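A quick simulation of V = Z_1² + ⋯ + Z_n² (an added illustration, not in the notes): the sample mean and variance should be close to n and 2n, the known mean and variance of χ²_n.

```python
import random

random.seed(0)

n, trials = 5, 100_000

# Simulate V = Z_1^2 + ... + Z_n^2 with Z_i independent N(0, 1)
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((v - mean) ** 2 for v in samples) / trials

# chi^2_n has mean n and variance 2n
print(mean, var)
```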
6.3  Bivariate MGF

The bivariate MGF (or joint MGF) of the continuous r.v.s (X, Y) with joint p.d.f. f(x, y), −∞ < x, y < ∞, is defined as

    M_{X,Y}(θ_1, θ_2) = E(e^{θ_1 X + θ_2 Y})    (6.15)

provided the integral converges absolutely (there is a similar definition for the discrete case). If M_{X,Y}(θ_1, θ_2) exists near the origin, for |θ_1| < θ_{10}, |θ_2| < θ_{20} say, then it can be shown that

    [ ∂^{r+s} M_{X,Y}(θ_1, θ_2) / ∂θ_1^r ∂θ_2^s ]_{θ_1=θ_2=0} = E(X^r Y^s).    (6.16)

The bivariate MGF can also be used to find the MGF of aX + bY, since

    M_{aX+bY}(θ) = E(e^{θ(aX+bY)}) = M_{X,Y}(aθ, bθ).    (6.17)
Example
Using MGFs:
(i) show that if (U, V) ∼ N(0, 0; 1, 1; ρ), then ρ(U, V) = ρ, and deduce ρ(X, Y), where (X, Y) ∼ N(μ_x, μ_y; σ_x², σ_y²; ρ);
(ii) for the variables (X, Y) in (i), find the distribution of a linear combination aX + bY, and generalise the result obtained to the multivariate Normal case.

Solution
(i) We have

    M_{U,V}(θ_1, θ_2) = E(e^{θ_1 U + θ_2 V})
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{θ_1 u + θ_2 v} [1 / (2π√(1 − ρ²))] exp{ −[u² − 2ρuv + v²] / (2(1 − ρ²)) } du dv
        = ⋯ = exp{½(θ_1² + 2ρθ_1θ_2 + θ_2²)}.

Then

    ∂M_{U,V}(θ_1, θ_2)/∂θ_1 = exp{⋯}(θ_1 + ρθ_2)
    ∂²M_{U,V}(θ_1, θ_2)/∂θ_1∂θ_2 = exp{⋯}(θ_1 + ρθ_2)(θ_2 + ρθ_1) + ρ exp{⋯}.

So

    E(UV) = [ ∂²M_{U,V}(θ_1, θ_2)/∂θ_1∂θ_2 ]_{θ_1=θ_2=0} = ρ.

Since E(U) = E(V) = 0 and Var(U) = Var(V) = 1, we have that the correlation coefficient of U, V is

    ρ(U, V) = Cov(U, V)/√(Var(U).Var(V)) = (E(UV) − E(U)E(V))/1 = ρ.

Now let

    X = μ_x + σ_x U,    Y = μ_y + σ_y V.

Then

    M_{X,Y}(θ_1, θ_2) = E(e^{θ_1(μ_x + σ_x U) + θ_2(μ_y + σ_y V)})
        = e^{θ_1 μ_x + θ_2 μ_y} M_{U,V}(θ_1 σ_x, θ_2 σ_y)
        = exp{(θ_1 μ_x + θ_2 μ_y) + ½(θ_1² σ_x² + 2ρθ_1θ_2 σ_x σ_y + θ_2² σ_y²)},

the MGF of (X, Y) ∼ N(μ_x, μ_y; σ_x², σ_y²; ρ); since correlation is unchanged by the linear transformation, ρ(X, Y) = ρ, with Cov(X, Y) = ρσ_x σ_y.

(ii) So, for a linear combination of X and Y,

    M_{aX+bY}(θ) = M_{X,Y}(aθ, bθ)
        = exp{(aμ_x + bμ_y)θ + ½(a²σ_x² + 2ab Cov(X, Y) + b²σ_y²)θ²}
        = MGF of N(aμ_x + bμ_y, a²σ_x² + 2ab Cov(X, Y) + b²σ_y²),

i.e.

    aX + bY ∼ N( aE(X) + bE(Y),  a²Var(X) + 2ab Cov(X, Y) + b²Var(Y) ).    (6.18)
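Result (6.18) can be checked by simulating correlated normals (an added sketch, not in the notes): generating V = ρU + √(1 − ρ²)W with U, W independent N(0, 1) gives Corr(U, V) = ρ, after which the sample mean and variance of aX + bY can be compared with the formula.

```python
import math
import random

random.seed(1)

rho = 0.6
mx, my, sx, sy = 2.0, -1.0, 1.5, 0.5
a, b = 2.0, 3.0
trials = 200_000

vals = []
for _ in range(trials):
    u = random.gauss(0.0, 1.0)
    # v is standard normal with correlation rho to u
    v = rho * u + math.sqrt(1 - rho**2) * random.gauss(0.0, 1.0)
    x, y = mx + sx * u, my + sy * v
    vals.append(a * x + b * y)

mean = sum(vals) / trials
var = sum((w - mean) ** 2 for w in vals) / trials

# (6.18): mean = a*mx + b*my, variance = a^2 sx^2 + 2ab*rho*sx*sy + b^2 sy^2
exact_mean = a * mx + b * my                                    # = 1.0
exact_var = a**2 * sx**2 + 2 * a * b * rho * sx * sy + b**2 * sy**2  # = 16.65

print(mean, var)
```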
More generally, let (X_1, ..., X_n) be multivariate normally distributed. Then, by induction,

    Σ_{i=1}^n a_i X_i ∼ N( Σ_{i=1}^n a_i E(X_i),  Σ_{i=1}^n a_i² Var(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j) ).    (6.19)

(If the Xs are also independent, the covariance terms vanish, but then there is a simpler derivation (see HW 8).)
6.4  Sequences of r.v.s

6.4.1  Continuity theorem

Let {X_n} be a sequence of r.v.s with MGFs M_{X_n}(θ), and let X be a r.v. with MGF M_X(θ). If

    M_{X_n}(θ) → M_X(θ)    as n → ∞, for all θ,

then the c.d.f. of X_n converges to the c.d.f. of X as n → ∞ (at each point where the latter is continuous).

Example: Let X_n ∼ Bin(n, p). Then

    M_{X_n}(θ) = (q + pe^θ)^n = {1 + λ(e^θ − 1)/n}^n,    where λ = np.    (6.20)

Since (1 + a/n)^n → e^a as n → ∞, a constant, we have, for all θ,

    M_{X_n}(θ) → exp{λ(e^θ − 1)}    as n → ∞,

i.e.

    M_{X_n}(θ) → MGF of Poisson(λ)    (6.21)

(use (6.12), replacing s by e^θ in the Poisson PGF (3.7)). So, invoking the above continuity theorem,

    Bin(n, p) → Poisson(λ)    (6.22)

as n → ∞, p → 0 with np = λ > 0 fixed. Hence in large samples, the binomial distribution can be approximated by the Poisson distribution. As a rule of thumb: the approximation is acceptable when n is large, p small, and λ = np ≤ 5.
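The Poisson approximation (6.22) can be seen numerically (an added illustration, not in the notes): comparing the Bin(100, 0.03) p.m.f. with the Poisson(3) p.m.f. term by term shows a small maximum discrepancy.

```python
import math

n, p = 100, 0.03
lam = n * p  # = 3.0

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Largest pointwise difference between the two p.m.f.s
max_diff = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(n + 1))

print(max_diff)
```

Here np² = 0.09, and the observed maximum pointwise difference is an order of magnitude smaller still, consistent with the rule of thumb.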
6.4.2  Asymptotic normality
Let {X_n} be a sequence of r.v.s (discrete or continuous). If two quantities a and b > 0 can be found such that

    c.d.f. of (X_n − a)/b  →  c.d.f. of N(0, 1)    as n → ∞,    (6.23)

X_n is said to be asymptotically normally distributed with mean a and variance b², and we write

    (X_n − a)/b ∼a N(0, 1)    or    X_n ∼a N(a, b²).    (6.24)

Notes: (i) a and b need not be functions of n; but often a and b² are the mean and variance of X_n (and so are functions of n).
(ii) In large samples we use N(a, b²) as an approximation to the distribution of X_n.
6.5  The Central Limit Theorem

A restricted form of this celebrated theorem will now be stated and proved.
Theorem
Let X_1, X_2, ... be a sequence of independent identically distributed r.v.s, each with mean μ and variance σ². Let

    S_n = X_1 + X_2 + ⋯ + X_n,    Z_n = (S_n − nμ)/(σ√n).

Then

    Z_n ∼a N(0, 1)    or    S_n ∼a N(nμ, nσ²),

i.e. P(Z_n ≤ z) → P(Z ≤ z) as n → ∞, where Z ∼ N(0, 1).

Proof
Let Y_i = X_i − μ. Then

    S_n − nμ = X_1 + ⋯ + X_n − nμ = Y_1 + ⋯ + Y_n.

So

    M_{S_n − nμ}(θ) = M_{Y_1}(θ) . M_{Y_2}(θ) ⋯ M_{Y_n}(θ) = {M_Y(θ)}^n,

and

    M_{Z_n}(θ) = M_{S_n − nμ}( θ/(σ√n) ) = { M_Y( θ/(σ√n) ) }^n.

Note that

    E(Y) = E(X − μ) = 0;    E(Y²) = E{(X − μ)²} = σ².

Then

    M_Y(θ) = 1 + (θ/1!) E(Y) + (θ²/2!) E(Y²) + (θ³/3!) E(Y³) + ⋯
           = 1 + ½σ²θ² + g(θ),

where g(θ)/θ² → 0 as θ → 0. So

    M_{Z_n}(θ) = {1 + ½σ² (θ²/(σ²n)) + o(1/n)}^n = {1 + (½θ²)/n + o(1/n)}^n
              → e^{θ²/2}    as n → ∞

(since h(n) = n . o(1/n) → 0 as n → ∞), and e^{θ²/2} is the MGF of N(0, 1). Hence, by the continuity theorem,

    c.d.f. of (S_n − nμ)/(σ√n)  →  c.d.f. of N(0, 1)    as n → ∞,

i.e.

    Z_n ∼a N(0, 1)    or    S_n ∼a N(nμ, nσ²).    (6.25)    ∎
Corollary
Let X̄_n = (1/n) Σ_{i=1}^n X_i. Then X̄_n ∼a N(μ, σ²/n).    (6.26)

Proof
X̄_n = S_n/n, which by the Theorem is asymptotically normal with mean nμ . (1/n) = μ and variance nσ² . (1/n²) = σ²/n, i.e.

    X̄_n ∼a N(μ, σ²/n).    ∎
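The CLT can be checked by simulation (an added sketch, not part of the notes): with X_i ∼ Uniform(0, 1), so μ = ½ and σ² = 1/12, the proportion of simulated samples with Z_n ≤ 1 should be close to Φ(1) ≈ 0.8413.

```python
import math
import random

random.seed(7)

n, trials = 48, 20_000
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)  # s.d. of Uniform(0, 1)

# Estimate P(Z_n <= 1) for Z_n = (S_n - n*mu) / (sigma * sqrt(n))
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))
    if z <= 1.0:
        hits += 1

prop = hits / trials
print(prop)  # should be close to Phi(1) = 0.8413...
```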
Example 1
Use the CLT to approximate the binomial distribution for large n.

Solution
If S_n ∼ Bin(n, p), then S_n = X_1 + ⋯ + X_n, where the X_i are independent and

    X_i = 1 with probability p,  0 with probability q = 1 − p,

so that E(X_i) = p, Var(X_i) = pq. Hence, by the CLT,

    S_n ∼a N(np, npq),

i.e., for large n, the binomial c.d.f. is approximated by the c.d.f. of N(np, npq).    ∎

[As a rule of thumb: the approximation is acceptable when n is large and p not too far from ½, such that np > 5.]
Example 2
As Example 1, but for the χ²_n distribution.

Solution
Write V_n = Z_1² + ⋯ + Z_n², where the Z_i are independent and

    Z_i ∼ N(0, 1),    so    E(Z_i²) = 1,  Var(Z_i²) = 2.

So, by the CLT,

    V_n ∼a N(n, 2n).    ∎

Note: These are not necessarily the best approximations for large n. Thus:

(i)

    P(S_n ≤ s) ≈ P( Z ≤ (s + ½ − np)/√(npq) ) = Φ( (s + ½ − np)/√(npq) ),    where Z ∼ N(0, 1).

The ½ is a continuity correction, to take account of the fact that we are approximating a discrete distribution by a continuous one.
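The effect of the continuity correction can be seen directly (an added illustration, not from the notes): for S_n ∼ Bin(20, 0.4), compare the exact P(S_n ≤ 9) with Φ((9.5 − np)/√(npq)).

```python
import math

def phi(x):
    # Standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p, s = 20, 0.4, 9
q = 1 - p

# Exact binomial c.d.f. P(S_n <= s)
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(s + 1))

# Normal approximation with continuity correction
approx = phi((s + 0.5 - n * p) / math.sqrt(n * p * q))

print(exact, approx)
```

Even at n = 20 the corrected approximation agrees with the exact probability to about two decimal places; without the +½ the error is noticeably larger.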
(ii)

    √(2V_n)  is approximately  N(√(2n − 1), 1).

6.6
Characteristic function

The MGF does not exist unless all the moments of the distribution are finite, so many distributions (e.g. the t and F distributions) do not have MGFs, and another generating function is often used instead.

The characteristic function of a continuous r.v. X is

    C_X(θ) = E(e^{iθX}) = ∫_{−∞}^{∞} e^{iθx} f(x) dx,    (6.27)

where θ is real and i = √(−1). C_X(θ) always exists, and has similar properties to M_X(θ). The CF uniquely determines the p.d.f.:

    f(x) = (1/2π) ∫_{−∞}^{∞} C_X(θ) e^{−iθx} dθ    (6.28)

(cf. the Fourier transform). The CF is particularly useful in studying limiting distributions. However, we do not consider the CF further in this module.
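Definition (6.27) can be illustrated numerically (an added sketch, not part of the notes): for the N(0, 1) density the CF is e^{−θ²/2}, and since the density is even the imaginary part of (6.27) vanishes, leaving a cosine integral that a simple Riemann sum reproduces.

```python
import math

def cf_normal(theta, dx=1e-3, L=10.0):
    # Riemann sum for the real part of E(e^{i*theta*X}), X ~ N(0, 1):
    # integral of cos(theta*x) * (1/sqrt(2*pi)) * exp(-x^2/2) over [-L, L]
    steps = int(2 * L / dx) + 1
    total = 0.0
    for i in range(steps):
        x = -L + i * dx
        total += math.cos(theta * x) * math.exp(-0.5 * x * x) * dx
    return total / math.sqrt(2 * math.pi)

val = cf_normal(1.0)
exact = math.exp(-0.5)  # e^{-theta^2/2} at theta = 1

print(val, exact)
```

The integrand decays so fast that truncating at |x| = 10 and using a step of 10⁻³ already gives agreement to many decimal places.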