
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

October 19, 2017



LECTURES 22-23

Chapter 8: Moment Generating Functions and Characteristic Functions
In this chapter, we introduce the notions of the moment generating function
(in short, mgf) and the characteristic function of a random variable and study
their properties. Unlike moments, both the moment generating function and the
characteristic function identify distribution functions uniquely. In fact, one
way to understand whether a distribution is moment determinate or not is to use
either the moment generating function or the characteristic function. It is
interesting to note that the mgf is closely related to the Laplace transform,
and the characteristic function is its counterpart for the Fourier transform.

0.1 Moment generating function


In this subsection we study moment generating functions and their properties.
Definition 8.1 Given a random variable X on a probability space (Ω, F, P),
its moment generating function, denoted by M_X, is defined as

M_X(t) = E[e^{tX}], t ∈ I,

where I is an interval on which the expectation on the right-hand side exists.
In fact, for a non-negative random variable X, I always contains (−∞, 0]. If X
is a non-negative random variable such that EX doesn't exist, then M_X(t)
doesn't exist for t > 0 (exercise). An analogous comment holds for non-positive
random variables. Moment generating functions become useful if I contains an
interval around 0.

Example 0.1 Let X ∼ Bernoulli (p). Then

M_X(t) = (1 − p) + pe^t, t ∈ R.
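As a quick numerical sanity check (this sketch is not part of the original
notes; the parameter values and sample size are illustrative choices), one can
estimate E[e^{tX}] by Monte Carlo and compare it with the formula above:

import numpy as np

# Monte Carlo estimate of the Bernoulli(p) mgf versus the closed form (1 - p) + p e^t
rng = np.random.default_rng(0)
p, t = 0.3, 0.7
X = rng.binomial(1, p, size=200_000)                 # Bernoulli(p) samples
print(np.exp(t * X).mean(), (1 - p) + p * np.exp(t)) # the two values should be close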

Now we will state and indicate the proofs of various properties of moment
generating functions.

Theorem 0.1 Let X be a random variable with mgf M_X(t), t ∈ I, and let
Y = aX + b, a ≠ 0. Then the mgf of Y is given by

M_Y(t) = e^{bt} M_X(at), t ∈ J,

where J = {t ∈ R | at ∈ I} := a^{−1}I.


0.1. MOMENT GENERATING FUNCTION 3

Proof: For t ∈ a^{−1}I,

E[e^{tY}] = E[e^{bt} e^{atX}] = e^{bt} M_X(at).

Note that for t ∈ a^{−1}I, at ∈ I and hence M_X(at) exists.
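Theorem 0.1 can also be checked numerically; the following sketch (not from the
notes, with illustrative parameters) uses X ∼ Exponential(1), whose mgf is
M_X(t) = 1/(1 − t) for t < 1, a standard formula assumed here:

import numpy as np

# Verify M_Y(t) = e^{bt} M_X(at) for Y = aX + b, with X ~ Exponential(1)
rng = np.random.default_rng(1)
a, b, t = 2.0, -1.0, 0.3               # chosen so that a*t < 1, i.e. M_X(at) exists
X = rng.exponential(1.0, size=200_000)
lhs = np.exp(t * (a * X + b)).mean()   # Monte Carlo estimate of M_Y(t)
rhs = np.exp(b * t) / (1 - a * t)      # e^{bt} M_X(at)
print(lhs, rhs)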
The following is an illustration of the use of the above theorem.
Example 0.2 Let X ∼ N(µ, σ²). Then X = µ + σY, Y ∼ N(0, 1). Now

M_Y(t) = (1/√(2π)) ∫_{−∞}^{∞} e^{ty} e^{−y²/2} dy
       = e^{t²/2} (1/√(2π)) ∫_{−∞}^{∞} e^{−(y−t)²/2} dy
       = e^{t²/2}, t ∈ R,

i.e. the mgf of the standard normal is e^{t²/2}, t ∈ R. Now, using the above
theorem,

M_X(t) = e^{µt + σ²t²/2}, t ∈ R.

Theorem 0.2 Let X be a random variable such that M_X(t) exists in an interval
I which contains [−h, h] for some h > 0. Then X has moments of all orders, the
mgf M_X has derivatives of all orders on (−h, h), and the following holds:

EX^k = M_X^{(k)}(0), k = 0, 1, · · · .

Here M_X^{(k)}(t) denotes the kth derivative of M_X at t. The proof follows from

d^k E[e^{tX}]/dt^k = E[d^k e^{tX}/dt^k], t ∈ (−h, h).
So the proof is all about justifying differentiation under the 'integral' sign.
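Assuming the standard normal mgf e^{t²/2} from Example 0.2, the moment formula
EX^k = M_X^{(k)}(0) can be checked symbolically; a minimal sketch (not in the
original notes):

import sympy as sp

# Differentiate the N(0,1) mgf at 0: the first four moments should be 0, 1, 0, 3
t = sp.symbols('t')
M = sp.exp(t**2 / 2)
print([sp.diff(M, t, k).subs(t, 0) for k in range(1, 5)])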

The above theorem is bad news for mgfs. What I mean is the following: existence
of the mgf in a neighbourhood of 0 makes the random variable very nice. This
implies an unpleasant property of mgfs, namely that they won't exist around
zero unless the random variable is nice. We will see some examples to
illustrate this point.

Example 0.3 (Cauchy distribution) Let X be a Cauchy random variable, i.e. X is
a continuous random variable with pdf

f(x) = 1/(π(1 + x²)), x ∈ R.

We know that X doesn't have finite mean (exercise). Hence Theorem 0.2 points to
the non-existence of the mgf. In fact, we show that the mgf of X exists only at
t = 0. For t > 0, consider

E[e^{tX}] = (1/π) ∫_{−∞}^{∞} e^{tx}/(1 + x²) dx
          ≥ (1/π) ∫_0^{∞} e^{tx}/(1 + x²) dx
          ≥ (t/π) ∫_0^{∞} x/(1 + x²) dx
          = (t/2π) ∫_1^{∞} (1/y) dy.

Here we used e^{tx} ≥ tx in the third line and the substitution y = 1 + x² in
the fourth. The RHS integral diverges to ∞. Hence, by comparison of integrals,
it follows that E[e^{tX}] diverges to ∞, i.e. M_X(t) doesn't exist for t > 0.
Using a similar argument one can show that M_X(t) doesn't exist for t < 0
(exercise). So M_X(t) exists only at 0.
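The divergence can also be seen numerically: truncating the integral defining
E[e^{tX}] at larger and larger cutoffs gives values that grow without bound. A
sketch (the cutoffs and t are illustrative choices, not from the notes):

import numpy as np
from scipy.integrate import quad

# Truncated versions of E[e^{tX}] for Cauchy X and t = 0.1 blow up as T grows
t = 0.1
for T in [10, 50, 100, 200]:
    val, _ = quad(lambda x: np.exp(t * x) / (np.pi * (1 + x**2)), 0, T)
    print(T, val)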
Example 0.4 (Log-normal distribution) Let X = e^Y, Y ∼ N(µ, σ²). Then X is
said to be 'log-normally' distributed. Note that

EX^n = E[e^{nY}] = M_Y(n) = e^{µn + σ²n²/2}, n ≥ 1.

Now since X ≥ 0, M_X(t) clearly exists for t ≤ 0. Now, for t > 0 (taking µ = 0,
σ² = 1 for simplicity),

E[e^{tX}] = E[e^{te^Y}]
          = (1/√(2π)) ∫_{−∞}^{∞} e^{te^y} e^{−y²/2} dy
          ≥ (1/√(2π)) ∫_K^{∞} e dy,

for some K > 0 large enough. In the inequality we used the fact that there
exists K > 0 such that

te^y − y²/2 ≥ 1 for all y ≥ K.

Now since the last integral diverges to ∞, it follows that E[e^{tX}] diverges
to ∞. Hence M_X(t) doesn't exist for t > 0, though EX^n exists for all n ≥ 1.
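The contrast (all moments finite, yet no mgf for t > 0) can be illustrated
numerically; the sketch below (illustrative cutoffs, not part of the notes)
takes µ = 0, σ² = 1:

import numpy as np
from scipy.integrate import quad

pdf_Y = lambda y: np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
for n in [1, 2, 3]:                    # moments EX^n = e^{n^2/2}: all finite
    val, _ = quad(lambda y: np.exp(n * y) * pdf_Y(y), -np.inf, np.inf)
    print('EX^%d:' % n, val, 'exact:', np.exp(n**2 / 2))
t = 0.5
for cutoff in [2, 4, 6]:               # truncations of E[e^{t e^Y}] blow up
    val, _ = quad(lambda y: np.exp(t * np.exp(y)) * pdf_Y(y), -np.inf, cutoff)
    print('cutoff', cutoff, ':', val)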
Theorem 0.3 Let X_1, X_2, · · · , X_n be independent random variables whose
mgfs M_{X_i}(t) exist on a common interval I. Then the mgf of the sum
S_n = X_1 + · · · + X_n exists on I and is given by

M_{S_n}(t) = ∏_{k=1}^{n} M_{X_k}(t), t ∈ I.

Proof: We use the following results.


• If X and Y are independent, then E[XY ] = EXEY .

• If X and Y are independent, so are f ◦ X and g ◦ Y, where f, g are Borel
functions.
Now

E[e^{tS_n}] = E[e^{tX_1} · · · e^{tX_n}]
            = ∏_{i=1}^{n} E[e^{tX_i}]
            = ∏_{i=1}^{n} M_{X_i}(t), t ∈ I.

This completes the proof.
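A quick Monte Carlo check of Theorem 0.3 (an illustrative sketch, not from the
notes) compares the empirical mgf of a Binomial(n, p) variable, which has the
same law as a sum of n i.i.d. Bernoulli(p) trials, with the n-fold product of
Bernoulli mgfs:

import numpy as np

rng = np.random.default_rng(2)
n, p, t = 10, 0.4, 0.5
X = rng.binomial(n, p, size=200_000)   # same law as a sum of n i.i.d. Bernoulli(p)
print(np.exp(t * X).mean(), (1 - p + p * np.exp(t))**n)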

Example 0.5 Let X ∼ Binomial(n, p). Note that X = X_1 + · · · + X_n, where the
X_i's are i.i.d. Bernoulli(p). Hence

M_X(t) = (M_{X_1}(t))^n = (1 − p + pe^t)^n, t ∈ R.

Expanding,

M_X(t) = (1 − p)^n + \binom{n}{1}(1 − p)^{n−1} pe^t + \binom{n}{2}(1 − p)^{n−2} p² e^{2t} + · · · + p^n e^{nt}.

Differentiating repeatedly,

M_X^{(1)}(t) = \binom{n}{1}(1 − p)^{n−1} pe^t + 2\binom{n}{2}(1 − p)^{n−2} p² e^{2t} + · · · + n p^n e^{nt},

M_X^{(2)}(t) = \binom{n}{1}(1 − p)^{n−1} pe^t + 2²\binom{n}{2}(1 − p)^{n−2} p² e^{2t} + · · · + n² p^n e^{nt},

M_X^{(k)}(t) = \binom{n}{1}(1 − p)^{n−1} pe^t + 2^k \binom{n}{2}(1 − p)^{n−2} p² e^{2t} + · · · + n^k p^n e^{nt},
k = 3, 4, · · · .

Therefore

EX^k = M_X^{(k)}(0) = \binom{n}{1}(1 − p)^{n−1} p + 2^k \binom{n}{2}(1 − p)^{n−2} p² + · · · + n^k p^n, k = 1, 2, · · ·

gives all the moments.
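The first two moments extracted this way can be verified symbolically; a
minimal sketch (not in the notes) differentiates the mgf with sympy:

import sympy as sp

# EX = n p and EX^2 = n p (1 - p) + n^2 p^2 from derivatives of (1 - p + p e^t)^n at t = 0
t, p, n = sp.symbols('t p n')
M = (1 - p + p * sp.exp(t))**n
print(sp.simplify(sp.diff(M, t).subs(t, 0)))
print(sp.expand(sp.simplify(sp.diff(M, t, 2).subs(t, 0))))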

Theorem 0.4 Let X and Y be two random variables such that M_X(t) = M_Y(t) on
some interval I. Then X and Y have the same distribution function.

The proof is beyond the scope of this course. Anyway, we will see a similar
result for characteristic functions soon.

Theorem 0.5 (moment determinate distributions) Let X be a random variable such
that the moments µ_k = EX^k, k ≥ 1, exist and satisfy

lim_{k→∞} (1/(2k)) µ_{2k}^{1/(2k)} = 0.

Then, if Y is another random variable with EY^k = µ_k for all k ≥ 1, X and Y
have the same distribution.

The proof follows from the Riesz criterion (a sufficient condition) for moment
determinacy, given by

lim inf_{k→∞} (µ_{2k}/(2k)!)^{1/(2k)} < ∞,

and Stirling's approximation, given by

lim_{n→∞} n!/(√(2π) n^{n+1/2} e^{−n}) = 1.

Remark 0.1 Theorem 0.5 gives a partial converse to Theorem 0.2; i.e., the
statement "if µ_k = EX^k exists for all k, then M_X(t) is uniquely determined
by the µ_k's" is not true in general. But with the extra condition that the
µ_{2k} don't grow too rapidly (for example, if
lim_{k→∞} (1/(2k)) µ_{2k}^{1/(2k)} = 0), the mgf is uniquely determined. This
gives a partial answer to the question of when all the moments determine a
distribution uniquely, because mgfs determine distributions uniquely, a fact we
will not prove in this course.

Now we will see some more examples.

Example 0.6 Let X ∼ Binomial(n, p). Then

X = Σ_{k=0}^{n} k I_{{X=k}},

with

P{X = k} = \binom{n}{k} p^k (1 − p)^{n−k} := p_k, k = 0, 1, · · · , n.

Hence

M_X(t) = Σ_{k=0}^{n} e^{tk} p_k, t ∈ R.

Therefore

M_X^{(m)}(t) = Σ_{k=0}^{n} k^m e^{tk} p_k, t ∈ R.

So we get the moments as

µ_m = EX^m = Σ_{k=0}^{n} k^m p_k.

Now note that, since 0 ≤ k ≤ n,

0 ≤ µ_{2m} ≤ n^{2m}, m ≥ 1.

Therefore, since (µ_{2m})^{1/(2m)} ≤ n,

lim_{m→∞} (1/(2m)) (µ_{2m})^{1/(2m)} = 0,

i.e. Binomial(n, p) is moment determinate.
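The vanishing of (1/(2m)) µ_{2m}^{1/(2m)} can also be watched numerically; in
the sketch below (illustrative n and p, not part of the notes) the exact
moments are computed from the pmf:

import numpy as np
from math import comb

n, p = 10, 0.4
pk = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
ks = np.arange(n + 1, dtype=float)
for m in [1, 5, 10, 20]:
    mu2m = float(ks**(2 * m) @ pk)             # mu_{2m} = sum_k k^{2m} p_k
    print(m, mu2m**(1 / (2 * m)) / (2 * m))    # tends to 0 as m grows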

0.2 Characteristic functions


Definition 8.2 (Characteristic functions) The characteristic function of a
random variable X is defined as

Φ_X(t) = E e^{itX}, t ∈ R

(where E e^{itX} = E cos tX + i E sin tX).

A digression: Before going further, I will give some very brief working
knowledge of complex-valued functions defined on R.

• Given a function f = f_1 + if_2 : R → C, f_1, f_2 are the real and imaginary
parts of f.

• Given two functions f, g : R → C with f = f_1 + if_2, g = g_1 + ig_2, we
define

(f + g)(x) := f_1(x) + g_1(x) + i(f_2(x) + g_2(x)), x ∈ R,

(fg)(x) := f(x)g(x) = f_1(x)g_1(x) − f_2(x)g_2(x) + i(f_1(x)g_2(x) + f_2(x)g_1(x)), x ∈ R,

(1/f)(x) := 1/(f_1(x) + if_2(x)), x ∈ R.

• Given a function g : R → C, we say that g is continuous at x if both g_1 and
g_2 are continuous at x. This is equivalent to the following: for each ε > 0,
there exists δ > 0 such that

|x − y| < δ ⇒ |g(x) − g(y)| < ε.

• A function g : R → C is uniformly continuous if g_1 and g_2 are uniformly
continuous. This is equivalent to: for each ε > 0, there exists a δ > 0 such
that

for all x, y ∈ R, |x − y| < δ ⇒ |g(x) − g(y)| < ε.

• g : R → C is said to be differentiable at x if g_1 and g_2 are differentiable
at x, and in this case g′(x) = g_1′(x) + i g_2′(x).
• g : R → C is (Riemann) integrable on [a, b] if g_1 and g_2 are integrable on
[a, b], and in this case

∫_a^b g(x) dx = ∫_a^b g_1(x) dx + i ∫_a^b g_2(x) dx.

These are a special class of line integrals.
• Let G : R → C be a primitive of g : R → C, i.e. G′(x) = g(x), x ∈ R. Then
for any a < b,

∫_a^b g(x) dx = G(b) − G(a).

(This is called the fundamental theorem for line integrals.)
Example 0.7 Let X ∼ Bernoulli(p). Then

Φ_X(t) = (1 − p) + pe^{it}, t ∈ R.
Example 0.8 Let X ∼ Exponential(λ). Then

Φ_X(t) = E e^{itX} = λ ∫_0^∞ e^{itx} e^{−λx} dx
       = λ ∫_0^∞ e^{(it−λ)x} dx
       = λ [e^{(it−λ)x}/(it − λ)]_0^∞
       = λ/(λ − it), t ∈ R.

In the third equality we used the fundamental theorem for line integrals, and
in the fourth equality we used

lim_{x→∞} e^{(it−λ)x} = 0.
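The closed form Φ_X(t) = λ/(λ − it) can be checked by computing E cos tX and
E sin tX by quadrature; a sketch with illustrative parameters (not from the
notes):

import numpy as np
from scipy.integrate import quad

lam, t = 2.0, 1.5
re, _ = quad(lambda x: np.cos(t * x) * lam * np.exp(-lam * x), 0, np.inf)
im, _ = quad(lambda x: np.sin(t * x) * lam * np.exp(-lam * x), 0, np.inf)
print(complex(re, im), lam / (lam - 1j * t))   # the two values should agree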

Theorem 0.6 For any random variable X, its characteristic function Φ_X(·) is
uniformly continuous on R and satisfies
(i) Φ_X(0) = 1,
(ii) |Φ_X(t)| ≤ 1,
(iii) Φ_X(−t) = \overline{Φ_X(t)}, where for a complex number z, \overline{z}
denotes its conjugate.

Proof:
We prove (iii); (i) and (ii) are exercises.

Φ_X(−t) = E e^{−itX} = E cos tX − i E sin tX
        = \overline{E cos tX + i E sin tX}
        = \overline{Φ_X(t)}.
Now we show that Φ_X is uniformly continuous. Consider

|Φ_X(t + h) − Φ_X(t)| = |E(e^{i(t+h)X} − e^{itX})|
                      ≤ E|e^{ihX} − 1|
                      = E√(2(1 − cos(hX)))
                      = 2E|sin(hX/2)|.

Note that

lim_{h→0} sin(hX(ω)/2) = 0 and |sin(hX/2)| ≤ 1.

Hence, using the Dominated Convergence theorem, Φ_X(t + h) → Φ_X(t) uniformly
in t as h → 0 (the bound 2E|sin(hX/2)| does not depend on t). This implies
that Φ_X is uniformly continuous.

Theorem 0.7 If the random variable X has finite moments up to order n, then
Φ_X has continuous derivatives up to order n. Moreover,

i^k EX^k = Φ_X^{(k)}(0), k = 1, 2, . . . , n.

Proof.
Consider

(Φ_X(t + h) − Φ_X(t))/h = E[e^{itX} (e^{ihX} − 1)/h].

Since |e^{ihx} − 1| ≤ |hx|, we get

|e^{itX} (e^{ihX} − 1)/h| ≤ |X|

and E|X| < ∞. Hence, by the Dominated Convergence theorem,

lim_{h→0} E[e^{itX} (e^{ihX} − 1)/h] = E[iX e^{itX}].

Therefore

Φ_X′(t) = E[iX e^{itX}].

Putting t = 0, we get

Φ_X^{(1)}(0) = i EX.

For higher order derivatives, repeat the above arguments.
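Assuming the exponential characteristic function λ/(λ − it) from Example 0.8
and the known moments EX^k = k!/λ^k, the relation i^k EX^k = Φ_X^{(k)}(0) can
be verified symbolically; a minimal sketch (not part of the notes):

import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
Phi = lam / (lam - sp.I * t)
for k in range(1, 4):
    diff = sp.diff(Phi, t, k).subs(t, 0) - sp.I**k * sp.factorial(k) / lam**k
    print(k, sp.simplify(diff))        # each difference should print as 0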

Theorem 0.8 (Inversion theorem) Let X be a random variable with distribution
function F and characteristic function Φ_X(·). Then

F(b) − F(a) = (1/(2π)) ∫_{−∞}^{∞} ((e^{−ita} − e^{−itb})/(it)) Φ_X(t) dt,

whenever a, b are points of continuity of F.

Proof. Before proceeding to the sketch of the proof, a word about the integral
on the rhs: it is interpreted as an improper Riemann integral, and the
integrand need not in general be absolutely integrable. At this stage, students
need not worry about this.

Consider

(1/(2π)) ∫_{−∞}^{∞} ((e^{−ita} − e^{−itb})/(it)) Φ_X(t) dt
   = (1/(2π)) ∫_{−∞}^{∞} ((e^{−ita} − e^{−itb})/(it)) E e^{itX} dt
   = E[(1/(2π)) ∫_{−∞}^{∞} ((e^{−ita} − e^{−itb})/(it)) e^{itX} dt]        (0.1)
   = E[∫_{−∞}^{∞} (e^{it(X−a)} − e^{it(X−b)})/(2πit) dt].

The second equality follows from a change of the order of integration (this, in
fact, requires considering the integrals on finite intervals, say [−T, T]
(i.e. proper integrals), interchanging the order there, and then letting
T → ∞). Now

∫_{−∞}^{0} (e^{it(X−a)} − e^{it(X−b)})/(2πit) dt = −∫_0^∞ (e^{−it(X−a)} − e^{−it(X−b)})/(2πit) dt.        (0.2)

Hence, using 2i sin θ = e^{iθ} − e^{−iθ}, we have

∫_{−∞}^{∞} (e^{it(X−a)} − e^{it(X−b)})/(2πit) dt
   = (1/π) ∫_0^∞ (sin t(X − a))/t dt − (1/π) ∫_0^∞ (sin t(X − b))/t dt.        (0.3)

Using

∫_0^∞ (sin αx)/x dx = (π/2) sgn(α),

we get

(1/π) ∫_0^∞ (sin t(X − a))/t dt = 1/2 if X > a, 0 if X = a, −1/2 if X < a,        (0.4)

where

sgn(α) = −1 if α < 0, 0 if α = 0, 1 if α > 0.

The other integral is computed similarly. Combining (0.1), (0.3) and (0.4), we
complete the proof.
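The inversion formula can be illustrated numerically for X ∼ N(0, 1), where
Φ_X(t) = e^{−t²/2} (a standard fact assumed here). Since the real part of the
integrand is ((sin tb − sin ta)/t) Φ_X(t), an even function of t, the formula
reduces to (1/π) ∫_0^∞ ((sin tb − sin ta)/t) Φ_X(t) dt. A sketch (not from the
notes):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0
integrand = lambda t: (np.sin(t * b) - np.sin(t * a)) / t * np.exp(-t**2 / 2)
val, _ = quad(integrand, 0, np.inf)
print(val / np.pi, norm.cdf(b) - norm.cdf(a))   # both should be about 0.6827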

Remark 0.2 In this remark, I will give the computation of the integral
∫_0^∞ (sin αx)/x dx. It is enough to compute the Dirichlet integral
∫_0^∞ (sin x)/x dx, because for other values of α the integral follows easily
from the Dirichlet integral. For example,

∫_0^∞ (sin(−x))/x dx = −∫_0^∞ (sin x)/x dx.

Before even proceeding to compute this, I will show that ∫_0^∞ (sin x)/x dx is
not absolutely integrable but is integrable. First recall that

∫_0^∞ (sin x)/x dx := lim_{ε→0, T→∞} ∫_ε^T (sin x)/x dx.
Consider

∫_1^T (sin x)/x dx = ∫_1^T (1/x)(1 − cos x)′ dx
                   = [(1 − cos x)/x]_1^T + ∫_1^T (1 − cos x)/x² dx        (integration by parts).

Both terms on the right converge as T → ∞, since 0 ≤ 1 − cos x ≤ 2.
Now since

lim_{x↓0} (sin x)/x = 1,

the function f(x) = (sin x)/x for x > 0, f(0) = 1, is Riemann integrable on
[0, 1] and hence

lim_{ε→0} ∫_ε^1 (sin x)/x dx exists.

Combining the above arguments, it follows that

lim_{ε→0, T→∞} ∫_ε^T (sin x)/x dx

exists, i.e. the Dirichlet integral exists as an improper Riemann integral. Now
we will see that it is not absolutely integrable. Note that
∫_0^∞ |sin x|/|x| dx ≥ ∫_0^∞ (sin² x)/x dx
                     = ∫_0^∞ |1 − cos 2x|/(2x) dx
                     = ∫_0^∞ (1 − cos 2x)/(2x) dx
                     ≥ ∫_1^∞ (1 − cos 2x)/(2x) dx.

The last integral has two parts, of which the first is known to diverge to ∞
and the second converges by the same argument as for the Dirichlet integral.
Hence the integral on the lhs also diverges.

Now we compute the Dirichlet integral. To this end, we first need the value of
the following. Consider, for any given u > 0,

∫_0^∞ e^{ix} e^{−xu} dx = lim_{T→∞} ∫_0^T e^{(i−u)x} dx
                        = lim_{T→∞} [e^{(i−u)x}/(i − u)]_0^T
                        = −1/(i − u) = (u + i)/(1 + u²).
Hence, equating the real and imaginary parts, we get the values of the
following improper Riemann integrals:

∫_0^∞ cos x · e^{−xu} dx = u/(1 + u²),        ∫_0^∞ sin x · e^{−xu} dx = 1/(1 + u²).

Now consider

∫_0^∞ (sin x)/x dx = ∫_0^∞ sin x (∫_0^∞ e^{−xu} du) dx
                   = ∫_0^∞ (∫_0^∞ sin x · e^{−xu} dx) du
                   = ∫_0^∞ 1/(1 + u²) du = π/2.

Here we used 1/x = ∫_0^∞ e^{−xu} du and a change of the order of integration.
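A numerical check of the Dirichlet integral (the truncation points are
illustrative choices; this sketch is not part of the notes):

import numpy as np
from scipy.integrate import quad

# Truncated integrals of sin(x)/x approach pi/2 ~ 1.5708 as T grows
for T in [10, 100, 1000]:
    val, _ = quad(lambda x: np.sin(x) / x, 1e-12, T, limit=500)
    print(T, val, np.pi / 2)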

Theorem 0.9 (Uniqueness theorem) Let X_1, X_2 be two random variables such
that Φ_{X_1} ≡ Φ_{X_2}. Then X_1 and X_2 have the same distribution.

Proof:
Using Inversion theorem, we have

F1 (b) − F1 (a) = F2 (b) − F2 (a)

for all a, b ∈ R such that F1 , F2 are continuous at a and b.


Now, letting a → −∞, we have

F1 (b) = F2 (b)

for all b at which F1 and F2 are continuous.


Therefore
F1 ≡ F2 (Exercise)
