
A Collection of Exercises

in
Advanced Probability Theory
The Solutions Manual
of
All Even-Numbered Exercises
from
A First Look at Rigorous Probability Theory
(Second Edition, 2006)
Mohsen Soltanifar
University of Saskatchewan, Canada
mohsen.soltanifar@usask.ca
Longhai Li
University of Saskatchewan, Canada
longhai@math.usask.ca
Jeffrey S. Rosenthal
University of Toronto, Canada
jeff@math.toronto.edu
(July 17, 2010.)
Copyright © 2010 by World Scientific Publishing Co. Pte. Ltd.
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Web site: www.WorldScientific.com
A Collection of Exercises in Advanced Probability Theory
Copyright © 2010 by World Scientific Publishing Co. Pte. Ltd.
Preface
I am very pleased that, thanks to the hard work of Mohsen Soltanifar and Longhai Li, this solutions manual for my book¹ is now available. I hope readers will find these solutions helpful as you struggle with learning the foundations of measure-theoretic probability. Of course, you will learn best if you first attempt to solve the exercises on your own, and only consult this manual when you are really stuck (or to check your solution after you think you have it right).
For course instructors, I hope that these solutions will assist you in teaching students, by offering them some extra guidance and information.
My book has been widely used for self-study, in addition to its use as a course textbook, allowing a variety of students and professionals to learn the foundations of measure-theoretic probability theory on their own time. Many self-study students have written to me requesting solutions to help assess their progress, so I am pleased that this manual will fill that need as well.
Solutions manuals always present a dilemma: providing solutions can be very helpful to students and self-studiers, but they make it difficult for course instructors to assign exercises from the book for course credit. To balance these competing demands, we considered maintaining a confidential solutions manual, available to instructors and self-study students only, but decided that would be problematic and ultimately infeasible. Instead, we settled on the compromise of providing a publicly available solutions manual, but to even-numbered exercises only. In this way, it is hoped that readers can use the even-numbered exercise solutions to learn and assess their progress, while instructors can still assign odd-numbered exercises for course credit as desired.
Of course, this solutions manual may well contain errors, perhaps significant ones. If you find some, then please e-mail me and I will try to correct them promptly. (I also maintain an errata list for the book itself, on my web site, and will add book corrections there.)
Happy studying!
Jeffrey S. Rosenthal
Toronto, Canada, 2010
jeff@math.toronto.edu
http://probability.ca/jeff/
¹ J.S. Rosenthal, A First Look at Rigorous Probability Theory, 2nd ed. World Scientific Publishing, Singapore, 2006. 219 pages. ISBN: 981-270-371-5 / 981-270-371-3 (paperback).
Contents
1 The need for measure theory 1
2 Probability triples 3
3 Further probabilistic foundations 9
4 Expected values 15
5 Inequalities and convergence 21
6 Distributions of random variables 25
7 Stochastic processes and gambling games 29
8 Discrete Markov chains 33
9 More probability theorems 41
10 Weak convergence 45
11 Characteristic functions 51
12 Decompositions of probability laws 55
13 Conditional probability and expectation 59
14 Martingales 63
15 General stochastic processes 71
Chapter 1
The need for measure theory
Exercise 1.3.2. Suppose Ω = {1, 2, 3} and F is the collection of all subsets of Ω. Find (with proof) necessary and sufficient conditions on the real numbers x, y, and z such that there exists a countably additive probability measure P on F, with x = P{1, 2}, y = P{2, 3}, and z = P{1, 3}.

Solution. The necessary and sufficient conditions are: 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1, and x + y + z = 2.

To prove necessity, let P be a probability measure on Ω. Then, for
x = P{1, 2} = P{1} + P{2},
y = P{2, 3} = P{2} + P{3},
and
z = P{1, 3} = P{1} + P{3},
we have by definition that 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1, and furthermore we compute that
x + y + z = 2(P{1} + P{2} + P{3}) = 2P(Ω) = 2,
thus proving the necessity.

Conversely, assume that 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1, and x + y + z = 2. Then, define the desired countably additive probability measure P as follows:
P(∅) = 0,
P{1} = 1 − y,
P{2} = 1 − z,
P{3} = 1 − x,
P{1, 2} = x,
P{1, 3} = z,
P{2, 3} = y,
P{1, 2, 3} = 1.
It is easily checked directly that for any two disjoint sets A, B ⊆ Ω, we have
P(A ∪ B) = P(A) + P(B).
For example, if A = {1} and B = {2}, then since x + y + z = 2, P(A ∪ B) = P{1, 2} = x, while P(A) + P(B) = P{1} + P{2} = (1 − y) + (1 − z) = 2 − y − z = (x + y + z) − y − z = x = P(A ∪ B).
Hence, P is the desired probability measure, proving the sufficiency.
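As a quick numerical sanity check (not part of the text), the construction in the sufficiency proof can be verified in Python; the particular values of x, y, z below are illustrative choices satisfying the stated conditions:

```python
from fractions import Fraction as F

def build_P(x, y, z):
    """Probability measure from the proof above: P{1} = 1-y, P{2} = 1-z, P{3} = 1-x.
    Additivity then forces P(A) to be the sum of the singleton masses."""
    mass = {1: 1 - y, 2: 1 - z, 3: 1 - x}
    return lambda A: sum(mass[w] for w in A)

# Sample values (an illustrative assumption) with 0 <= x, y, z <= 1 and x + y + z = 2.
x, y, z = F(1, 2), F(3, 4), F(3, 4)
P = build_P(x, y, z)

assert P({1, 2, 3}) == 1
assert (P({1, 2}), P({2, 3}), P({1, 3})) == (x, y, z)
assert all(P({w}) >= 0 for w in (1, 2, 3))
```

Exact rational arithmetic (`fractions`) is used so the checks are not affected by floating-point rounding.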
Exercise 1.3.4. Suppose that Ω = ℕ, and P is defined for all A ⊆ Ω by P(A) = |A| if A is finite (where |A| is the number of elements in the subset A), and P(A) = ∞ if A is infinite. This P is of course not a probability measure (in fact it is counting measure); however, we can still ask the following. (By convention, ∞ + ∞ = ∞.)
(a) Is P finitely additive?
(b) Is P countably additive?
Solution. (a) Yes. Let A, B ⊆ Ω be disjoint. We consider two different cases.
Case 1: At least one of A or B is infinite. Then A ∪ B is infinite. Consequently, P(A ∪ B) and at least one of P(A) or P(B) will be infinite. Hence, P(A ∪ B) = ∞ and P(A) + P(B) = ∞, implying P(A ∪ B) = ∞ = P(A) + P(B).
Case 2: Both A and B are finite. Then P(A ∪ B) = |A ∪ B| = |A| + |B| = P(A) + P(B).
Accordingly, P is finitely additive.
(b) Yes. Let A_1, A_2, … be a sequence of disjoint subsets of Ω. We consider two different cases.
Case 1: At least one of the A_n's is infinite. Then ∪_{n=1}^∞ A_n is infinite. Consequently, P(∪_{n=1}^∞ A_n) and at least one of the P(A_n)'s will be infinite. Hence, P(∪_{n=1}^∞ A_n) = ∞ and ∑_{n=1}^∞ P(A_n) = ∞, implying
P(∪_{n=1}^∞ A_n) = ∞ = ∑_{n=1}^∞ P(A_n).
Case 2: All of the A_n's are finite. Then, depending on the finiteness of ∪_{n=1}^∞ A_n, we consider two subcases. First, let ∪_{n=1}^∞ A_n be infinite; then P(∪_{n=1}^∞ A_n) = ∞ = ∑_{n=1}^∞ |A_n| = ∑_{n=1}^∞ P(A_n). Second, let ∪_{n=1}^∞ A_n be finite; then
P(∪_{n=1}^∞ A_n) = |∪_{n=1}^∞ A_n| = ∑_{n=1}^∞ |A_n| = ∑_{n=1}^∞ P(A_n).
Accordingly, P is countably additive.
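The two cases of part (a) can be illustrated concretely (an informal sketch, not part of the text); here the string "infinite" stands in for an arbitrary infinite subset of ℕ, and `math.inf` encodes the value ∞ with its convention ∞ + ∞ = ∞:

```python
import math

def P(A):
    """Counting measure on Omega = N: |A| for a finite set A; the sentinel
    string "infinite" represents any infinite subset and is assigned math.inf."""
    return math.inf if A == "infinite" else len(A)

# Case 2 of part (a): both sets finite and disjoint, so P adds exactly.
A, B = {1, 2, 3}, {10, 20}
assert A.isdisjoint(B)
assert P(A | B) == P(A) + P(B) == 5

# Case 1: at least one set infinite; both sides are infinite (inf + n = inf in Python).
assert P("infinite") == P("infinite") + P(A) == math.inf
```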
Chapter 2
Probability triples
Exercise 2.7.2. Let Ω = {1, 2, 3, 4}, and let J = {{1}, {2}}. Describe explicitly the σ-algebra σ(J) generated by J.

Solution.
σ(J) = {∅, {1}, {2}, {1, 2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, Ω}.
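For a finite Ω, the generated σ-algebra can be computed mechanically by closing the generators under complement and union (for finite Ω this closure is automatically a σ-algebra). A minimal sketch, reproducing the answer above:

```python
def generated_sigma_algebra(omega, generators):
    """Close a collection of subsets of a finite set omega under complement and union."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            if omega - a not in sets:          # close under complement
                sets.add(omega - a); changed = True
        for a in list(sets):
            for b in list(sets):
                if a | b not in sets:          # close under (finite) union
                    sets.add(a | b); changed = True
    return sets

sigma_J = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
assert len(sigma_J) == 8
assert frozenset({3, 4}) in sigma_J and frozenset({1, 3, 4}) in sigma_J
```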
Exercise 2.7.4. Let F_1, F_2, … be a sequence of collections of subsets of Ω, such that F_n ⊆ F_{n+1} for each n.
(a) Suppose that each F_i is an algebra. Prove that ∪_{i=1}^∞ F_i is also an algebra.
(b) Suppose that each F_i is a σ-algebra. Show (by counterexample) that ∪_{i=1}^∞ F_i might not be a σ-algebra.

Solution. (a) First, since ∅, Ω ∈ F_1 and F_1 ⊆ ∪_{i=1}^∞ F_i, we have ∅, Ω ∈ ∪_{i=1}^∞ F_i. Second, let A ∈ ∪_{i=1}^∞ F_i; then A ∈ F_i for some i. On the other hand, A^c ∈ F_i and F_i ⊆ ∪_{i=1}^∞ F_i, implying A^c ∈ ∪_{i=1}^∞ F_i. Third, let A, B ∈ ∪_{i=1}^∞ F_i; then A ∈ F_i and B ∈ F_j for some i, j. However, A, B ∈ F_{max(i,j)}, yielding A ∪ B ∈ F_{max(i,j)}. On the other hand, F_{max(i,j)} ⊆ ∪_{i=1}^∞ F_i, implying A ∪ B ∈ ∪_{i=1}^∞ F_i.

(b) Put Ω_i = {1, 2, …, i}, and let F_i be the σ-algebra consisting of all subsets of Ω_i, for i ∈ ℕ. Suppose that ∪_{i=1}^∞ F_i is a σ-algebra. Since, for each i, {i} ∈ F_i and F_i ⊆ ∪_{i=1}^∞ F_i, we have {i} ∈ ∪_{i=1}^∞ F_i. Thus, by our primary assumption, ℕ = ∪_{i=1}^∞ {i} ∈ ∪_{i=1}^∞ F_i and, therefore, ℕ ∈ F_i for some i, which implies ℕ ⊆ Ω_i, a contradiction. Hence, ∪_{i=1}^∞ F_i is not a σ-algebra.
Exercise 2.7.6. Suppose that Ω = [0, 1] is the unit interval, and F is the set of all subsets A ⊆ Ω such that either A or A^c is finite, and P is defined by P(A) = 0 if A is finite, and P(A) = 1 if A^c is finite.
(a) Is F an algebra?
(b) Is F a σ-algebra?
(c) Is P finitely additive?
(d) Is P countably additive on F (as in the previous exercise)?

Solution. (a) Yes. First, since ∅ is finite and Ω^c = ∅ is finite, we have ∅, Ω ∈ F. Second, let A ∈ F; then either A or A^c is finite, implying either A^c or (A^c)^c = A is finite; hence, A^c ∈ F. Third, let A, B ∈ F. Then we have several cases:
(i) A finite (A^c infinite):
(i-i) B finite (B^c infinite): A ∪ B finite, (A ∪ B)^c = A^c ∩ B^c infinite
(i-ii) B^c finite (B infinite): A ∪ B infinite, (A ∪ B)^c = A^c ∩ B^c finite
(ii) A^c finite (A infinite):
(ii-i) B finite (B^c infinite): A ∪ B infinite, (A ∪ B)^c = A^c ∩ B^c finite
(ii-ii) B^c finite (B infinite): A ∪ B infinite, (A ∪ B)^c = A^c ∩ B^c finite.
Hence, A ∪ B ∈ F.

(b) No. For any n ∈ ℕ, {1/n} ∈ F. But ∪_{n=1}^∞ {1/n} = {1/n : n ∈ ℕ} ∉ F, since both it and its complement are infinite.

(c) Yes. Let A, B ∈ F be disjoint. Then we have several cases:
(i) A finite (A^c infinite):
(i-i) B finite (B^c infinite): P(A ∪ B) = 0 = 0 + 0 = P(A) + P(B)
(i-ii) B^c finite (B infinite): P(A ∪ B) = 1 = 0 + 1 = P(A) + P(B)
(ii) A^c finite (A infinite):
(ii-i) B finite (B^c infinite): P(A ∪ B) = 1 = 1 + 0 = P(A) + P(B)
(ii-ii) B^c finite (B infinite): this case cannot occur for disjoint A and B: since A^c ∪ B^c is finite, A^c ∪ B^c ≠ Ω, implying A ∩ B ≠ ∅.
Hence, in every possible case, P(A ∪ B) = P(A) + P(B).

(d) Yes. Let A_1, A_2, … ∈ F be disjoint such that ∪_{n=1}^∞ A_n ∈ F. Then there are two cases:
(i) ∪_{n=1}^∞ A_n finite: In this case, for each n ∈ ℕ, A_n is finite. Therefore, P(∪_{n=1}^∞ A_n) = 0 = ∑_{n=1}^∞ 0 = ∑_{n=1}^∞ P(A_n).
(ii) (∪_{n=1}^∞ A_n)^c finite (∪_{n=1}^∞ A_n infinite): In this case, there is some n_0 ∈ ℕ such that A_{n_0} is infinite. (In fact, if all the A_n's were finite, then ∪_{n=1}^∞ A_n would be countable, hence of Lebesgue measure zero, implying that its complement has Lebesgue measure one. On the other hand, its complement is finite, having Lebesgue measure zero, a contradiction.) Now, let n ≠ n_0; then A_n ∩ A_{n_0} = ∅ yields A_n ⊆ A_{n_0}^c. But A_{n_0}^c is finite, implying that A_n is finite. Therefore:
P(∪_{n=1}^∞ A_n) = 1 = 1 + 0 = P(A_{n_0}) + ∑_{n≠n_0} 0 = P(A_{n_0}) + ∑_{n≠n_0} P(A_n) = ∑_{n=1}^∞ P(A_n).
Accordingly, P is countably additive on F.
Exercise 2.7.8. For the example of Exercise 2.7.7, is P uncountably additive (cf. page 2)?

Solution. No; if it were uncountably additive, then:
1 = P([0, 1]) = P(∪_{x∈[0,1]} {x}) = ∑_{x∈[0,1]} P({x}) = ∑_{x∈[0,1]} 0 = 0,
a contradiction.
Exercise 2.7.10. Prove that the collection J of (2.5.10) is a semi-algebra.

Solution. First, by definition ∅, R ∈ J. Second, let A_1, A_2 ∈ J. If A_i = ∅ (resp. R), then A_1 ∩ A_2 = ∅ (resp. A_j) ∈ J. Assume A_1, A_2 ≠ ∅, R; then we have the following cases:
(i) A_1 = (−∞, x_1]:
(i-i) A_2 = (−∞, x_2]: A_1 ∩ A_2 = (−∞, min(x_1, x_2)] ∈ J
(i-ii) A_2 = (y_2, ∞): A_1 ∩ A_2 = (y_2, x_1] ∈ J
(i-iii) A_2 = (x_2, y_2]: A_1 ∩ A_2 = (x_2, min(x_1, y_2)] ∈ J
(ii) A_1 = (y_1, ∞):
(ii-i) A_2 = (−∞, x_2]: A_1 ∩ A_2 = (y_1, x_2] ∈ J
(ii-ii) A_2 = (y_2, ∞): A_1 ∩ A_2 = (max(y_1, y_2), ∞) ∈ J
(ii-iii) A_2 = (x_2, y_2]: A_1 ∩ A_2 = (max(x_2, y_1), y_2] ∈ J
(iii) A_1 = (x_1, y_1]:
(iii-i) A_2 = (−∞, x_2]: A_1 ∩ A_2 = (x_1, min(x_2, y_1)] ∈ J
(iii-ii) A_2 = (y_2, ∞): A_1 ∩ A_2 = (max(y_2, x_1), y_1] ∈ J
(iii-iii) A_2 = (x_2, y_2]: A_1 ∩ A_2 = (max(x_1, x_2), min(y_1, y_2)] ∈ J.
(Here an interval whose left endpoint is at least its right endpoint is understood to be ∅ ∈ J.) Accordingly, A_1 ∩ A_2 ∈ J. Now, the general case is easily proved by induction (Check!).
Third, let A ∈ J. If A = ∅ (resp. R), then A^c = R (resp. ∅) ∈ J. If A = (−∞, x], then A^c = (x, ∞) ∈ J. If A = (y, ∞), then A^c = (−∞, y] ∈ J. Finally, if A = (x, y], then A^c = (−∞, x] ∪ (y, ∞), where both disjoint components are in J.
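All nine intersection cases above collapse to a single rule once each member of J is encoded as a pair (lo, hi) meaning the interval (lo, hi], with lo = −∞ giving (−∞, hi], hi = ∞ giving (lo, ∞), and lo ≥ hi giving ∅. A small illustrative sketch (the encoding is my own, not from the text):

```python
import math

def intersect(A, B):
    """(a, b] ∩ (c, d] = (max(a, c), min(b, d)] -- one rule covering all nine cases."""
    return (max(A[0], B[0]), min(A[1], B[1]))

def is_empty(A):
    return A[0] >= A[1]

inf = math.inf
A1 = (-inf, 3)   # (-inf, 3]
A2 = (1, inf)    # (1, inf)
A3 = (0, 2)      # (0, 2]
assert intersect(A1, A2) == (1, 3)          # case (i-ii): (y2, x1]
assert intersect(A1, A3) == (0, 2)          # case (i-iii): (x2, min(x1, y2)]
assert intersect(A2, A3) == (1, 2)          # case (ii-iii): (max(x2, y1), y2]
assert is_empty(intersect((5, 7), (1, 2)))  # disjoint intervals give the empty set
```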
Exercise 2.7.12. Let K be the Cantor set as defined in Subsection 2.4. Let D_n = K ⊕ (1/n), where K ⊕ (1/n) is defined as in (1.2.4). Let B = ∪_{n=1}^∞ D_n.
(a) Draw a rough sketch of D_3.
(b) What is λ(D_3)?
(c) Draw a rough sketch of B.
(d) What is λ(B)?

Solution. (a)
[Figure 1: Constructing the sketch of the set D_3 = K ⊕ (1/3).]
(b) λ(D_3) = λ(K ⊕ (1/3)) = λ(K) = 0, by translation-invariance of Lebesgue measure.
(c)
[Figure 2: Constructing the sketch of the set B = ∪_{n=1}^∞ D_n.]
In Figure 2, line one illustrates a rough sketch of the set D_1, line two illustrates a rough sketch of ∪_{n=1}^2 D_n, line three illustrates a rough sketch of ∪_{n=1}^3 D_n, and so on.
(d) From λ(D_n) = λ(K ⊕ (1/n)) = λ(K) = 0 for all n ∈ ℕ, and λ(B) ≤ ∑_{n=1}^∞ λ(D_n), it follows that λ(B) = 0.
Exercise 2.7.14. Let Ω = {1, 2, 3, 4}, with F the collection of all subsets of Ω. Let P and Q be two probability measures on F, such that P{1} = P{2} = P{3} = P{4} = 1/4, and Q{2} = Q{4} = 1/2, extended to F by linearity. Finally, let J = {∅, Ω, {1, 2}, {2, 3}, {3, 4}, {1, 4}}.
(a) Prove that P(A) = Q(A) for all A ∈ J.
(b) Prove that there is A ∈ σ(J) with P(A) ≠ Q(A).
(c) Why does this not contradict Proposition 2.5.8?

Solution. (a)
P(∅) = 0 = Q(∅),
P(Ω) = 1 = Q(Ω),
P{a, b} = P{a} + P{b} = 1/4 + 1/4 = 1/2 = Q{a} + Q{b} = Q{a, b} for each pair {a, b} ∈ J (each such pair contains exactly one of the points 2 and 4).
(b) Take A = {1, 2, 3} = {1, 2} ∪ {2, 3} ∈ σ(J). Then:
P(A) = ∑_{i=1}^3 P({i}) = 3/4 ≠ 1/2 = ∑_{i=1}^3 Q({i}) = Q(A).
(c) Since {1, 2}, {2, 3} ∈ J and {1, 2} ∩ {2, 3} = {2} ∉ J, the collection J is not a semi-algebra. Thus, the hypothesis of Proposition 2.5.8 is not satisfied by J.
Exercise 2.7.16. (a) Where in the proof of Theorem 2.3.1 was assumption (2.3.3) used?
(b) How would the conclusion of Theorem 2.3.1 be modified if assumption (2.3.3) were dropped (but all other assumptions remained the same)?

Solution. (a) It was used in the proof of Lemma 2.3.5.
(b) In the assertion of Theorem 2.3.1, the equality P*(A) = P(A) would be replaced by P*(A) ≤ P(A) for all A ∈ J.
Exercise 2.7.18. Let Ω = {1, 2}, J = {∅, Ω, {1}}, P(∅) = 0, P(Ω) = 1, and P{1} = 1/3.
(a) Can Theorem 2.3.1, Corollary 2.5.1, or Corollary 2.5.4 be applied in this case? Why or why not?
(b) Can this P be extended to a valid probability measure? Explain.

Solution. (a) No, because J is not a semi-algebra (in fact {1} ∈ J, but {1}^c = {2} cannot be written as a union of disjoint elements of J).
(b) Yes. It is sufficient to put F = {∅, Ω, {1}, {2}}, with P*(A) = P(A) if A ∈ J and P*({2}) = 2/3.
Exercise 2.7.20. Let P and Q be two probability measures defined on the same sample space Ω and σ-algebra F.
(a) Suppose that P(A) = Q(A) for all A ∈ F with P(A) ≤ 1/2. Prove that P = Q, i.e. that P(A) = Q(A) for all A ∈ F.
(b) Give an example where P(A) = Q(A) for all A ∈ F with P(A) < 1/2, but such that P ≠ Q, i.e. that P(A) ≠ Q(A) for some A ∈ F.

Solution. (a) Let A ∈ F. If P(A) ≤ 1/2, then P(A) = Q(A) by assumption. If P(A) > 1/2, then P(A^c) < 1/2. Therefore, 1 − P(A) = P(A^c) = Q(A^c) = 1 − Q(A), implying P(A) = Q(A).
(b) Take Ω = {1, 2} and F = {∅, {1}, {2}, {1, 2}}. Define P and Q respectively as follows:
P(∅) = 0, P{1} = 1/2, P{2} = 1/2, and P(Ω) = 1;
Q(∅) = 0, Q{1} = 1/3, Q{2} = 2/3, and Q(Ω) = 1.
Then P(A) = Q(A) for all A with P(A) < 1/2 (only A = ∅ qualifies), yet P{1} ≠ Q{1}.
Exercise 2.7.22. Let (Ω_1, F_1, P_1) be Lebesgue measure on [0, 1]. Consider a second probability triple (Ω_2, F_2, P_2), defined as follows: Ω_2 = {1, 2}, F_2 consists of all subsets of Ω_2, and P_2 is defined by P_2{1} = 1/3, P_2{2} = 2/3, and additivity. Let (Ω, F, P) be the product measure of (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2).
(a) Express each of Ω, F, and P as explicitly as possible.
(b) Find a set A ∈ F such that P(A) = 3/4.

Solution. (a) Here Ω = Ω_1 × Ω_2 = [0, 1] × {1, 2}, and
F = {A × B : A ∈ F_1, B ∈ {∅, {1}, {2}, {1, 2}}}.
Then
P(A × B) = 0 if B = ∅, λ(A)/3 if B = {1}, 2λ(A)/3 if B = {2}, and λ(A) if B = {1, 2}.
(b) Take A = [0, 3/4] × {1, 2}; then P(A) = λ([0, 3/4]) = 3/4.
Chapter 3
Further probabilistic foundations
Exercise 3.6.2. Let (Ω, F, P) be Lebesgue measure on [0, 1]. Let A = (1/2, 3/4) and B = (0, 2/3). Are A and B independent events?

Solution. Yes. In this case, P(A) = 1/4, P(B) = 2/3, and P(A ∩ B) = P((1/2, 2/3)) = 1/6. Hence:
P(A ∩ B) = 1/6 = (1/4)(2/3) = P(A)P(B).
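The arithmetic behind this independence check can be reproduced exactly (a small illustration, not from the text), computing interval lengths under Lebesgue measure on [0, 1] with rational arithmetic:

```python
from fractions import Fraction as F

def length(lo, hi):
    """Lebesgue measure of the interval (lo, hi) intersected with [0, 1]."""
    return max(F(0), min(hi, F(1)) - max(lo, F(0)))

# A = (1/2, 3/4), B = (0, 2/3); their overlap is (1/2, 2/3).
pA = length(F(1, 2), F(3, 4))
pB = length(F(0), F(2, 3))
pAB = length(max(F(1, 2), F(0)), min(F(3, 4), F(2, 3)))

assert pA == F(1, 4) and pB == F(2, 3)
assert pAB == F(1, 6) == pA * pB   # P(A ∩ B) = P(A)P(B): independent
```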
Exercise 3.6.4. Suppose A_n ↗ A. Let f : Ω → R be any function. Prove that lim_{n→∞} inf_{w∈A_n} f(w) = inf_{w∈A} f(w).

Solution. Given ε > 0, by the definition of infimum there exists w_ε ∈ A such that f(w_ε) < inf_{w∈A} f(w) + ε. On the other hand, A = ∪_{n=1}^∞ A_n and A_n ↗ A; therefore, there exists N ∈ ℕ such that for any n ≥ N we have w_ε ∈ A_n, implying inf_{w∈A_n} f(w) ≤ f(w_ε). Combining the two recent results yields:
inf_{w∈A_n} f(w) < inf_{w∈A} f(w) + ε,  n = N, N + 1, …. (*)
Next, since A_n ⊆ A for every n, we have:
inf_{w∈A} f(w) ≤ inf_{w∈A_n} f(w). (**)
Finally, by (*) and (**):
|inf_{w∈A_n} f(w) − inf_{w∈A} f(w)| < ε,  n = N, N + 1, …,
proving the assertion.
Exercise 3.6.6. Let X, Y, and Z be three independent random variables, and set W = X + Y. Let B_{k,n} = {(n−1)2^{−k} ≤ X < n2^{−k}} and let C_{k,m} = {(m−1)2^{−k} ≤ Y < m2^{−k}}. Let
A_k = ∪_{n,m∈ℤ: (n+m)2^{−k} < x} (B_{k,n} ∩ C_{k,m}).
Fix x, z ∈ R, and let A = {X + Y < x} = {W < x} and D = {Z < z}.
(a) Prove that A_k ↗ A.
(b) Prove that A_k and D are independent.
(c) By continuity of probabilities, prove that A and D are independent.
(d) Use this to prove that W and Z are independent.

Solution. (a) We have:
A_k = ∪_{n,m∈ℤ: (n+m)2^{−k} < x} (B_{k,n} ∩ C_{k,m})
= ∪_{n,m∈ℤ: (n+m)2^{−k} < x} {(m+n−2)2^{−k} ≤ X + Y < (m+n)2^{−k}}
= ∪_{s∈ℤ: s·2^{−k} < x} {(s−2)2^{−k} ≤ X + Y < s·2^{−k}}   (collecting the pairs (n, m) with m + n = s)
= ∪_{s∈ℤ: s·2^{−k} < x} {X + Y < s·2^{−k}}.
On the other hand, the largest value of (m+n)2^{−k} with (m+n)2^{−k} < x increases to x as k → ∞ (consider the base-2 digit expansion of x). Hence, A_k ↗ A.
(b) For any k ∈ ℕ we have that:
P(A_k ∩ D) = P(∪_{n,m∈ℤ: (n+m)2^{−k} < x} (B_{k,n} ∩ C_{k,m} ∩ D))
= ∑_{n,m∈ℤ: (n+m)2^{−k} < x} P(B_{k,n} ∩ C_{k,m} ∩ D)
= ∑_{n,m∈ℤ: (n+m)2^{−k} < x} P(B_{k,n} ∩ C_{k,m}) P(D)
= P(A_k)P(D),
where the third equality uses the independence of X, Y, and Z.
(c) Using part (b) and continuity of probabilities:
P(A ∩ D) = lim_{k→∞} P(A_k ∩ D) = lim_{k→∞} (P(A_k)P(D)) = P(A)P(D).
(d) This is a consequence of part (c) and Proposition 3.2.4.
Exercise 3.6.8. Let λ be Lebesgue measure on [0, 1], and let 0 ≤ a ≤ b ≤ c ≤ d ≤ 1 be arbitrary real numbers with d ≥ b + c − a. Give an example of a sequence A_1, A_2, … of intervals in [0, 1] such that λ(liminf_n A_n) = a, liminf_n λ(A_n) = b, limsup_n λ(A_n) = c, and λ(limsup_n A_n) = d. For bonus points, solve the question when d < b + c − a, with each A_n a finite union of intervals.

Solution. Let e = (d + a) − (b + c) ≥ 0, and consider:
A_{3n} = (0, b + e),
A_{3n−1} = (e, b + e),
A_{3n−2} = (b − a + e, c + b − a + e),
for all n ∈ ℕ. Then:
λ(liminf_n A_n) = λ((b − a + e, b + e)) = a,
liminf_n λ(A_n) = λ((e, b + e)) = b,
limsup_n λ(A_n) = λ((b − a + e, c + b − a + e)) = c,
λ(limsup_n A_n) = λ((0, c + b − a + e)) = λ((0, d)) = d,
where the condition b + e ≤ c, i.e. d ≤ 2c − a, ensures that the largest of the three measures is c.
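The four limit quantities of this periodic construction can be computed directly for sample values of a, b, c, d (the particular numbers below are an illustrative assumption satisfying the stated constraints): for a period-3 sequence, liminf_n A_n is the intersection of the three intervals and limsup_n A_n is their union.

```python
from fractions import Fraction as F

# Sample values with 0 <= a <= b <= c <= d <= 1 and d >= b + c - a (an assumption for illustration).
a, b, c, d = F(1, 8), F(1, 4), F(1, 2), F(3, 4)
e = (d + a) - (b + c)
assert e >= 0

# One period of the construction, as open intervals (lo, hi).
A = [(F(0), b + e), (e, b + e), (b - a + e, c + b - a + e)]

measures = [hi - lo for lo, hi in A]
liminf_set = (max(lo for lo, hi in A), min(hi for lo, hi in A))  # intersection of the period
limsup_set = (min(lo for lo, hi in A), max(hi for lo, hi in A))  # union (the intervals overlap)

assert min(measures) == b and max(measures) == c       # liminf / limsup of the measures
assert liminf_set[1] - liminf_set[0] == a              # measure of liminf of the sets
assert limsup_set[1] - limsup_set[0] == d              # measure of limsup of the sets
```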
Exercise 3.6.10. Let A_1, A_2, … be a sequence of events, and let N ∈ ℕ. Suppose there are events B and C such that B ⊆ A_n ⊆ C for all n ≥ N, and such that P(B) = P(C). Prove that P(liminf_n A_n) = P(limsup_n A_n) = P(B) = P(C).

Solution. Since:
B ⊆ ∩_{n=N}^∞ A_n ⊆ ∪_{m=1}^∞ ∩_{n=m}^∞ A_n ⊆ ∩_{m=1}^∞ ∪_{n=m}^∞ A_n ⊆ ∪_{n=N}^∞ A_n ⊆ C,
we have P(B) ≤ P(liminf_n A_n) ≤ P(limsup_n A_n) ≤ P(C). Now, using the condition P(B) = P(C) yields the desired result.
Exercise 3.6.12. Let X be a random variable with P(X > 0) > 0. Prove that there is δ > 0 such that P(X ≥ δ) > 0. [Hint: Don't forget continuity of probabilities.]

Solution. Method (1):
Put A = {X > 0} and A_n = {X ≥ 1/n} for all n ∈ ℕ. Then A_n ↗ A, and using Proposition 3.3.1, lim_{n→∞} P(A_n) = P(A). But P(A) > 0; therefore, there is N ∈ ℕ such that for all n ≥ N we have P(A_n) > 0. In particular, P(A_N) > 0. Take δ = 1/N.
Method (2):
Put A = {X > 0} and A_n = {X ≥ 1/n} for all n ∈ ℕ. Then A = ∪_{n=1}^∞ A_n and P(A) ≤ ∑_{n=1}^∞ P(A_n). If P(A_n) = 0 for every n ∈ ℕ, then by the recent result P(A) = 0, a contradiction. Therefore, there is at least one N ∈ ℕ such that P(A_N) > 0. Take δ = 1/N.
Exercise 3.6.14. Let δ, ε > 0, and let X_1, X_2, … be a sequence of non-negative independent random variables such that P(X_i ≥ δ) ≥ ε for all i. Prove that with probability one, ∑_{i=1}^∞ X_i = ∞.

Solution. Since P(X_i ≥ δ) ≥ ε for all i, ∑_{i=1}^∞ P(X_i ≥ δ) = ∞, and by the Borel–Cantelli Lemma:
P(limsup_i {X_i ≥ δ}) = 1. (*)
On the other hand,
limsup_i {X_i ≥ δ} ⊆ {∑_{i=1}^∞ X_i = ∞}
(in fact, let w ∈ limsup_i {X_i ≥ δ}; then there exists a sequence {i_j}_{j=1}^∞ such that X_{i_j}(w) ≥ δ for all j ∈ ℕ, yielding ∑_{j=1}^∞ X_{i_j}(w) = ∞, and consequently ∑_{i=1}^∞ X_i(w) = ∞; this implies w ∈ {∑_{i=1}^∞ X_i = ∞}). Consequently:
P(limsup_i {X_i ≥ δ}) ≤ P(∑_{i=1}^∞ X_i = ∞). (**)
Now, by (*) and (**) it follows that P(∑_{i=1}^∞ X_i = ∞) = 1.
Exercise 3.6.16. Consider infinite, independent, fair coin tossing as in Subsection 2.6, and let H_n be the event that the n-th coin is heads. Determine the following probabilities.
(a) P(∩_{i=1}^9 H_{n+i} i.o.).
(b) P(∩_{i=1}^n H_{n+i} i.o.).
(c) P(∩_{i=1}^{[2 log_2 n]} H_{n+i} i.o.).
(d) Prove that P(∩_{i=1}^{[log_2 n]} H_{n+i} i.o.) must equal either 0 or 1.
(e) Determine P(∩_{i=1}^{[log_2 n]} H_{n+i} i.o.). [Hint: Find the right subsequence of indices.]

Solution. (a) First of all, put A_n = ∩_{i=1}^9 H_{n+i} (n ≥ 1); then:
∑_{n=1}^∞ P(A_n) = ∑_{n=1}^∞ (1/2)^9 = ∞.
But the events A_n (n ≥ 1) are not independent. Consider the independent subsequence B_n = A_{f(n)} (n ≥ 1), where the function f is given by f(n) = 10n (n ≥ 1). Besides,
∑_{n=1}^∞ P(B_n) = ∑_{n=1}^∞ P(∩_{i=1}^9 H_{10n+i}) = ∑_{n=1}^∞ (1/2)^9 = ∞.
Now, using the Borel–Cantelli Lemma, P(B_n i.o.) = 1, implying P(A_n i.o.) = 1.
(b) Put A_n = ∩_{i=1}^n H_{n+i} (n ≥ 1). Since
∑_{n=1}^∞ P(A_n) = ∑_{n=1}^∞ (1/2)^n = 1 < ∞,
the Borel–Cantelli Lemma implies P(A_n i.o.) = 0.
(c) Put A_n = ∩_{i=1}^{[2 log_2 n]} H_{n+i} (n ≥ 1). Since
∑_{n=1}^∞ P(A_n) = ∑_{n=1}^∞ (1/2)^{[2 log_2 n]} ≤ 2 ∑_{n=1}^∞ (1/n)^2 < ∞,
the Borel–Cantelli Lemma implies P(A_n i.o.) = 0.
(d), (e) Put A_n = ∩_{i=1}^{[log_2 n]} H_{n+i} (n ≥ 1); then:
∑_{n=1}^∞ P(A_n) = ∑_{n=1}^∞ (1/2)^{[log_2 n]} = ∞.
But the events A_n (n ≥ 1) are not independent. Consider the independent subsequence B_n = A_{f(n)} (n ≥ 1), where the function f is given by f(n) = [n log_2(n^2)] (n ≥ 1). In addition,
∑_{n=1}^∞ P(B_n) = ∑_{n=1}^∞ P(∩_{i=1}^{[log_2 f(n)]} H_{f(n)+i}) = ∑_{n=1}^∞ (1/2)^{[log_2 f(n)]} = ∞.
Using the Borel–Cantelli Lemma, P(B_n i.o.) = 1, implying P(A_n i.o.) = 1.
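The convergence/divergence dichotomy between parts (c) and (d) can be seen numerically (a quick illustration, not part of the text): with exponent [2 log_2 n] the partial sums stay bounded (terms are comparable to 1/n^2), while with exponent [log_2 n] they grow without bound (terms are comparable to 1/n).

```python
import math

def partial_sum(k, N):
    """Partial sum of (1/2)^floor(k * log2 n) for n = 2..N."""
    return sum(0.5 ** math.floor(k * math.log2(n)) for n in range(2, N + 1))

# k = 2, part (c): bounded partial sums (convergent series).
# k = 1, parts (d), (e): growing partial sums (divergent series).
assert partial_sum(2, 100_000) < 3
assert partial_sum(1, 100_000) > 10
```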
Exercise 3.6.18. Let A_1, A_2, … be any independent sequence of events, and let S_x = {lim_{n→∞} (1/n) ∑_{i=1}^n 1_{A_i} ≤ x}. Prove that for each x ∈ R we have P(S_x) = 0 or 1.

Solution. For a fixed m ≥ 1, we have:
S_x = {lim_{n→∞} (1/n) ∑_{i=m}^n 1_{A_i} ≤ x} = ∩_{s=1}^∞ ∪_{N=m}^∞ ∩_{n=N}^∞ {(1/n) ∑_{i=m}^n 1_{A_i} ≤ x + 1/s}
(the first equality holds since dropping the finitely many terms with i < m does not change the limit), and
{(1/n) ∑_{i=m}^n 1_{A_i} ≤ x + 1/s} ∈ σ(1_{A_m}, 1_{A_{m+1}}, …),
which imply that
S_x ∈ σ(1_{A_m}, 1_{A_{m+1}}, …),
yielding S_x ∈ ∩_{m=1}^∞ σ(1_{A_m}, 1_{A_{m+1}}, …). Consequently, by Theorem 3.5.1, P(S_x) = 0 or 1.
Chapter 4
Expected values
Exercise 4.5.2. Let X be a random variable with finite mean, and let a ∈ R be any real number. Prove that E(max(X, a)) ≥ max(E(X), a).

Solution. Since E(·) is order-preserving, from
max(X, a) ≥ X
it follows that E(max(X, a)) ≥ E(X). Similarly, from
max(X, a) ≥ a
it follows that E(max(X, a)) ≥ E(a) = a.
Combining the recent results yields
E(max(X, a)) ≥ max(E(X), a).
Exercise 4.5.4. Let (Ω, F, P) be the uniform distribution on Ω = {1, 2, 3}, as in Example 2.2.2. Find random variables X, Y, and Z on (Ω, F, P) such that P(X > Y)P(Y > Z)P(Z > X) > 0 and E(X) = E(Y) = E(Z).

Solution. Put:
X = 1_{{1}}, Y = 1_{{2}}, Z = 1_{{3}}.
Then,
E(X) = E(Y) = E(Z) = 1/3.
Besides, P(X > Y) = P(Y > Z) = P(Z > X) = 1/3, implying:
P(X > Y)P(Y > Z)P(Z > X) = (1/3)^3 > 0.
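Since Ω has only three points, this nontransitive example can be checked by full enumeration (an illustration, not part of the text):

```python
from fractions import Fraction as F

omega = [1, 2, 3]          # uniform: each point has probability 1/3
X = {1: 1, 2: 0, 3: 0}     # X = indicator of {1}
Y = {1: 0, 2: 1, 3: 0}     # Y = indicator of {2}
Z = {1: 0, 2: 0, 3: 1}     # Z = indicator of {3}

def E(V):
    return sum(F(1, 3) * V[w] for w in omega)

def prob(event):
    return sum(F(1, 3) for w in omega if event(w))

assert E(X) == E(Y) == E(Z) == F(1, 3)
p = prob(lambda w: X[w] > Y[w]) * prob(lambda w: Y[w] > Z[w]) * prob(lambda w: Z[w] > X[w])
assert p == F(1, 27) > 0
```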
Exercise 4.5.6. Let X be a random variable defined on Lebesgue measure on [0, 1], and suppose that X is a one-to-one function, i.e. that if w_1 ≠ w_2 then X(w_1) ≠ X(w_2). Prove that X is not a simple random variable.

Solution. Suppose X were a simple random variable. Since |X([0, 1])| < ℵ_0 < c = |[0, 1]| (where |·| refers to the cardinality of the considered sets), we conclude that there is at least one y ∈ X([0, 1]) such that for at least two distinct elements w_1, w_2 ∈ [0, 1] we have X(w_1) = y = X(w_2), contradicting the injectivity of X.
Exercise 4.5.8. Let f(x) = ax^2 + bx + c be a second-degree polynomial function (where a, b, c ∈ R are constants).
(a) Find necessary and sufficient conditions on a, b, and c such that the equation E(f(αX)) = α^2 E(f(X)) holds for all α ∈ R and all random variables X.
(b) Find necessary and sufficient conditions on a, b, and c such that the equation E(f(X − β)) = E(f(X)) holds for all β ∈ R and all random variables X.
(c) Do parts (a) and (b) account for the properties of the variance function? Why or why not?

Solution. (a) Suppose that for f(x) = ax^2 + bx + c we have E(f(αX)) = α^2 E(f(X)) for all α ∈ R and all random variables X. A straightforward computation shows that this condition is equivalent to:
∀X: (bα − bα^2) E(X) + (1 − α^2) c = 0.
Consider a random variable X with E(X) ≠ 0. Putting α = −1, we obtain b = 0. Moreover, putting α = 0, we obtain c = 0. Hence,
f(x) = ax^2.
Conversely, if f(x) = ax^2, then a simple calculation shows that E(f(αX)) = α^2 E(f(X)) for all α ∈ R and all random variables X (Check!).
(b) Suppose that for f(x) = ax^2 + bx + c we have E(f(X − β)) = E(f(X)) for all β ∈ R and all random variables X. A straightforward computation shows that this condition is equivalent to:
∀X: (−2aβ) E(X) + (aβ^2 − bβ) = 0.
Consider a random variable X with E(X) = 0. Then for any β ∈ R we have aβ^2 − bβ = 0, implying a = b = 0. Hence,
f(x) = c.
Conversely, if f(x) = c, then a simple calculation shows that E(f(X − β)) = E(f(X)) for all β ∈ R and all random variables X (Check!).
(c) No. Assume, to reach a contradiction, that Var(X) can be written in the form E(f(X)) for some f(x) = ax^2 + bx + c. Then:
∀X: E(X^2) − E^2(X) = a E(X^2) + b E(X) + c. (*)
Consider a random variable X with E(X) = E(X^2) = 0. Substituting it in (*) implies c = 0. Second, consider a random variable X with E(X) = 0 and E(X^2) ≠ 0. Substituting it in (*) implies a = 1. Now consider two random variables X_1 and X_2 with E(X_1) = 1 and E(X_2) = −1, respectively (e.g. the constants 1 and −1). Substituting them in (*) implies b = −1 and b = 1, respectively. Thus, 1 = −1, a contradiction.
Exercise 4.5.10. Let X_1, X_2, … be i.i.d. with mean μ and variance σ^2, and let N be an integer-valued random variable with mean m and variance v, with N independent of all the X_i. Let S = X_1 + ⋯ + X_N = ∑_{i=1}^∞ X_i 1_{N≥i}. Compute Var(S) in terms of μ, σ^2, m, and v.

Solution. Note that E(1_{N≥i} 1_{N≥j}) = P(N ≥ j) for i ≤ j, that ∑_{i=1}^∞ P(N ≥ i) = E(N), and that ∑_{j=1}^∞ (2j − 1) P(N ≥ j) = E(N^2). We compute the components of
Var(S) = E(S^2) − E(S)^2
as follows:
E(S^2) = E(∑_{i=1}^∞ X_i^2 1_{N≥i} + 2 ∑_{i<j} X_i 1_{N≥i} X_j 1_{N≥j})
= ∑_{i=1}^∞ E(X_i^2) P(N ≥ i) + 2 ∑_{i<j} E(X_i) E(X_j) P(N ≥ j)
= (σ^2 + μ^2) E(N) + 2μ^2 ∑_{j=1}^∞ (j − 1) P(N ≥ j)
= (σ^2 + μ^2) E(N) + μ^2 (E(N^2) − E(N)),
using the independence of N from the X_i's and the independence of X_i and X_j for i ≠ j. Also,
E(S)^2 = (∑_{i=1}^∞ E(X_i) P(N ≥ i))^2 = μ^2 E(N)^2 = μ^2 m^2.
Hence:
Var(S) = (σ^2 + μ^2) m + μ^2 (E(N^2) − m) − μ^2 m^2 = σ^2 m + μ^2 (E(N^2) − m^2) = σ^2 m + μ^2 v.
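The formula Var(S) = σ^2 m + μ^2 v can be verified by exhaustive enumeration on a small discrete model (the particular distributions below are an illustrative assumption, not from the text): X_i i.i.d. uniform on {1, 3}, and N uniform on {1, 2}, independent of the X_i.

```python
from fractions import Fraction as F
from itertools import product

xs, ns = [1, 3], [1, 2]                         # supports, each uniformly distributed
mu = F(sum(xs)) / len(xs)                       # mean of X_i
sig2 = sum((x - mu) ** 2 for x in xs) / len(xs) # variance of X_i
m = F(sum(ns)) / len(ns)                        # mean of N
v = sum((n - m) ** 2 for n in ns) / len(ns)     # variance of N

# All equally likely outcomes (x1, x2, n), with S = x1 + ... + xN.
outcomes = [sum(x[:n]) for x in product(xs, repeat=2) for n in ns]
ES = F(sum(outcomes)) / len(outcomes)
ES2 = F(sum(s * s for s in outcomes)) / len(outcomes)

assert ES == mu * m                              # E(S) = mu * m
assert ES2 - ES ** 2 == sig2 * m + mu ** 2 * v   # Var(S) = sigma^2 m + mu^2 v
```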
Exercise 4.5.12. Let X and Y be independent general nonnegative random variables, and let X_n = Ψ_n(X), where Ψ_n(x) = min(n, 2^{−n} ⌊2^n x⌋) as in Proposition 4.2.5.
(a) Give an example of a sequence of functions φ_n : [0, ∞) → [0, ∞), other than φ_n(x) = Ψ_n(x), such that for all x, 0 ≤ φ_n(x) ≤ x and φ_n(x) ↗ x as n → ∞.
(b) Suppose Y_n = φ_n(Y), with φ_n as in part (a). Must X_n and Y_n be independent?
(c) Suppose {Y_n} is an arbitrary collection of non-negative simple random variables such that Y_n ↗ Y. Must X_n and Y_n be independent?
(d) Under the assumptions of part (c), determine (with proof) which quantities in equation (4.2.7) are necessarily equal.

Solution. (a) Put:
φ_n(x) = ∑_{m=0}^∞ (f_n(x − m) + m) 1_{[m, m+1)}(x),
where
f_n(x) = ∑_{k=0}^{2^{n−1} − 1} (2^{n−1} (x − k 2^{−(n−1)})^2 + k 2^{−(n−1)}) 1_{[k 2^{−(n−1)}, (k+1) 2^{−(n−1)})}(x)
for all 0 ≤ x < 1 and n = 1, 2, …. Then φ_n has all the required properties (Check!).
(b) Yes. Since each φ_n is a Borel-measurable function, by Proposition 3.2.3, X_n = Ψ_n(X) and Y_n = φ_n(Y) are independent random variables.
(c) No. It is sufficient to consider:
Y_n = max(Ψ_n(Y) − (1/n^2) X_n, 0)
for all n ∈ ℕ.
(d) Since X_n ↗ X, Y_n ↗ Y, and X_n Y_n ↗ XY, using Theorem 4.2.2, lim_n E(X_n) = E(X), lim_n E(Y_n) = E(Y), and lim_n E(X_n Y_n) = E(XY). Hence:
lim_n E(X_n) E(Y_n) = E(X) E(Y)
and
lim_n E(X_n Y_n) = E(XY).

Exercise 4.5.14. Let Z_1, Z_2, … be general random variables with E(|Z_i|) < ∞, and let Z = Z_1 + Z_2 + ⋯.
(a) Suppose ∑_i E(Z_i^+) < ∞ and ∑_i E(Z_i^−) < ∞. Prove that E(Z) = ∑_i E(Z_i).
(b) Show that we still have E(Z) = ∑_i E(Z_i) if we have at least one of ∑_i E(Z_i^+) < ∞ or ∑_i E(Z_i^−) < ∞.
(c) Let the Z_i be independent, with P(Z_i = 1) = P(Z_i = −1) = 1/2 for each i. Does E(Z) = ∑_i E(Z_i) in this case? How does that relate to (4.2.8)?

Solution. (a)
E(Z) = E(∑_i Z_i)
= E(∑_i (Z_i^+ − Z_i^−))
= E(∑_i Z_i^+ − ∑_i Z_i^−)
= E(∑_i Z_i^+) − E(∑_i Z_i^−)
= ∑_i E(Z_i^+) − ∑_i E(Z_i^−)
= ∑_i (E(Z_i^+) − E(Z_i^−))
= ∑_i E(Z_i).
(b) We prove the assertion for the case ∑_i E(Z_i^+) = ∞ and ∑_i E(Z_i^−) < ∞ (the proof of the other case is analogous). Similarly to part (a), we have:
E(Z) = E(∑_i Z_i^+) − E(∑_i Z_i^−) = ∑_i E(Z_i^+) − ∑_i E(Z_i^−) = ∞ = ∑_i (E(Z_i^+) − E(Z_i^−)) = ∑_i E(Z_i).
(c) Since E(Z_i) = 0 for all i, ∑_i E(Z_i) = 0. On the other hand, E(Z) is undefined; hence E(Z) ≠ ∑_i E(Z_i). This example shows that if the Z_i's are not non-negative, then equation (4.2.8) may fail.
Chapter 5
Inequalities and convergence
Exercise 5.5.2. Give an example of a random variable X and α > 0 such that P(X ≥ α) > E(X)/α. [Hint: Obviously X cannot be non-negative.] Where does the proof of Markov's inequality break down in this case?

Solution. Part one: Let (Ω, F, P) be Lebesgue measure on [0, 1]. Define X : [0, 1] → R by
X(w) = (1_{[0, 1/2]} − 1_{(1/2, 1]})(w).
Then, by Theorem 4.4, E(X) = ∫_0^1 X(w) dw = 0. However, taking α = 1/2:
P(X ≥ 1/2) = 1/2 > 0 = E(X)/(1/2).
Part two: In the definition of Z we will not have Z ≤ X.
Exercise 5.5.4. Suppose X is a nonnegative random variable with E(X) = ∞. What does Markov's inequality say in this case?

Solution. In this case, it reduces to the trivial inequality P(X ≥ α) ≤ ∞.
Exercise 5.5.6. For general jointly defined random variables X and Y, prove that |Corr(X, Y)| ≤ 1. [Hint: Don't forget the Cauchy–Schwarz inequality.]

Solution. Method (1):
By the Cauchy–Schwarz inequality:
|Corr(X, Y)| = |Cov(X, Y)| / √(Var(X) Var(Y)) = |E((X − μ_X)(Y − μ_Y))| / √(E((X − μ_X)^2) E((Y − μ_Y)^2)) ≤ E(|(X − μ_X)(Y − μ_Y)|) / √(E((X − μ_X)^2) E((Y − μ_Y)^2)) ≤ 1.
Method (2):
Since:
0 ≤ Var(X/√Var(X) + Y/√Var(Y)) = 1 + 1 + 2 Cov(X, Y)/(√Var(X) √Var(Y)) = 2(1 + Corr(X, Y)),
we conclude:
−1 ≤ Corr(X, Y). (*)
On the other hand, from:
0 ≤ Var(X/√Var(X) − Y/√Var(Y)) = 1 + 1 − 2 Cov(X, Y)/(√Var(X) √Var(Y)) = 2(1 − Corr(X, Y)),
it follows:
Corr(X, Y) ≤ 1. (**)
Accordingly, by (*) and (**) the desired result follows.
Exercise 5.5.8. Let φ(x) = x^2.
(a) Prove that φ is a convex function.
(b) What does Jensen's inequality say for this choice of φ?
(c) Where in the text have we already seen the result of part (b)?

Solution. (a) If φ has a second derivative at each point of (a, b), then φ is convex on (a, b) if and only if φ''(x) ≥ 0 for each x ∈ (a, b). Since in this problem φ''(x) = 2 ≥ 0 everywhere, using this criterion it follows that φ is a convex function.
(b) E(X^2) ≥ E^2(X).
(c) We have seen it on page 44, as the first property of Var(X).
Exercise 5.5.10. Let X_1, X_2, … be a sequence of random variables, with E(X_n) = 8 and Var(X_n) = 1/√n for each n. Prove or disprove that X_n must converge to 8 in probability.

Solution. Given ε > 0, using Proposition 5.1.2:
P(|X_n − 8| ≥ ε) ≤ Var(X_n)/ε^2 = 1/(√n ε^2)
for all n ∈ ℕ. Letting n → ∞:
lim_{n→∞} P(|X_n − 8| ≥ ε) = 0.
Hence, X_n must converge to 8 in probability.
Exercise 5.5.12. Give (with proof) an example of two discrete random variables having the same mean and the same variance, but which are not identically distributed.

Solution. As the first example, let
Ω = {1, 2, 3, 4}, F the collection of all subsets of Ω, P(X = i) = p_i
and P(Y = i) = q_i, where
(p_1, p_2, p_3, p_4) = (8/96, 54/96, 12/96, 22/96)
and
(q_1, q_2, q_3, q_4) = (24/96, 6/96, 60/96, 6/96).
Then, E(X) = 240/96 = E(Y) and E(X^2) = 684/96 = E(Y^2), but E(X^3) = 2172/96 ≠ 2076/96 = E(Y^3).
As the second example, let
Ω_1 = {−1, 1}, F_1 the collection of all subsets of Ω_1, P(X = 1) = 1/2 = P(X = −1),
and
Ω_2 = {−2, 0, 2}, F_2 the collection of all subsets of Ω_2, P(Y = 2) = 1/8 = P(Y = −2), P(Y = 0) = 6/8.
Then, E(X) = 0 = E(Y) and E(X^2) = 1 = E(Y^2), but E(X^4) = 1 ≠ 4 = E(Y^4).
Exercise 5.5.14. Prove the converse of Lemma 5.2.1. That is, prove that if X_n converges to X almost surely, then for each ε > 0 we have P(|X_n − X| ≥ ε i.o.) = 0.

Solution. From
1 = P(lim_{n→∞} X_n = X) = P(∩_{ε>0} (liminf_n {|X_n − X| < ε})) = 1 − P(∪_{ε>0} (limsup_n {|X_n − X| ≥ ε})),
it follows:
P(∪_{ε>0} (limsup_n {|X_n − X| ≥ ε})) = 0.
On the other hand:
∀ε > 0: (limsup_n {|X_n − X| ≥ ε}) ⊆ ∪_{ε>0} (limsup_n {|X_n − X| ≥ ε}),
hence:
∀ε > 0: P(limsup_n {|X_n − X| ≥ ε}) = 0.
Chapter 6
Distributions of random variables
Exercise 6.3.2. Suppose P(Z = 0) = P(Z = 1) = 1/2, that Y ~ N(0, 1), and that Y and Z are independent. Set X = YZ. What is the law of X?

Solution. Using the definition of conditional probability given on page 84, for any Borel set B ⊆ R we have:

L(X)(B) = P(X ∈ B)
= P(X ∈ B | Z = 0)P(Z = 0) + P(X ∈ B | Z = 1)P(Z = 1)
= P(0 ∈ B)(1/2) + P(Y ∈ B)(1/2)
= ((δ_0 + μ_N)/2)(B).

Therefore, L(X) = (δ_0 + μ_N)/2.
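As a quick Monte Carlo sanity check of this mixture law (a simulation sketch; the tolerances are loose statistical bounds, and the seed is an arbitrary choice):

```python
import math, random

random.seed(0)
N = 200_000
# X = Y*Z with Y ~ N(0,1) and Z uniform on {0,1}, independent.
xs = [random.gauss(0.0, 1.0) * random.getrandbits(1) for _ in range(N)]

# Half the mass should be an atom at 0:
p0 = sum(x == 0.0 for x in xs) / N
assert abs(p0 - 0.5) < 0.01

# CDF check at x = 1: F_X(1) = 1/2 + (1/2)*Phi(1), where Phi is the standard normal CDF.
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
emp = sum(x <= 1.0 for x in xs) / N
assert abs(emp - (0.5 + 0.5 * Phi(1.0))) < 0.01
print(p0, emp)
```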
Exercise 6.3.4. Compute E(X), E(X²), and Var(X), where the law of X is given by:
(a) L(X) = (1/2)δ_1 + (1/2)λ, where λ is Lebesgue measure on [0, 1].
(b) L(X) = (1/3)δ_{−2} + (2/3)μ_N, where μ_N is the standard normal distribution N(0, 1).

Solution. Let L(X) = Σ_{i=1}^n β_i L(X_i) where Σ_{i=1}^n β_i = 1 and 0 ≤ β_i ≤ 1 for all 1 ≤ i ≤ n. Then, for any Borel-measurable function f: R → R, combining Theorems 6.1.1 and 6.2.1 yields:

E_P(f(X)) = Σ_{i=1}^n β_i ∫ f(t) L(X_i)(dt).

Using the above result with I(t) = t, it follows:

(a)
E_P(X) = (1/2)∫ I(t) δ_1(dt) + (1/2)∫ I(t) λ(dt) = (1/2)(1) + (1/2)(1/2) = 3/4.
E_P(X²) = (1/2)∫ I²(t) δ_1(dt) + (1/2)∫ I²(t) λ(dt) = (1/2)(1) + (1/2)(1/3) = 2/3.
Var(X) = E_P(X²) − (E_P(X))² = 2/3 − 9/16 = 5/48.

(b)
E_P(X) = (1/3)∫ I(t) δ_{−2}(dt) + (2/3)∫ I(t) μ_N(dt) = (1/3)(−2) + (2/3)(0) = −2/3.
E_P(X²) = (1/3)∫ I²(t) δ_{−2}(dt) + (2/3)∫ I²(t) μ_N(dt) = (1/3)(4) + (2/3)(1) = 2.
Var(X) = E_P(X²) − (E_P(X))² = 2 − 4/9 = 14/9.
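The arithmetic in both parts can be double-checked with exact fractions. A Python sketch, assuming in part (b) that the point mass sits at −2 (the second moment and variance are the same for an atom at +2, so Var(X) = 14/9 either way):

```python
from fractions import Fraction as F

# (a) L(X) = (1/2)d_1 + (1/2)lambda, Uniform[0,1] has k-th moment 1/(k+1).
EX  = F(1, 2) * 1 + F(1, 2) * F(1, 2)
EX2 = F(1, 2) * 1 + F(1, 2) * F(1, 3)
assert (EX, EX2, EX2 - EX**2) == (F(3, 4), F(2, 3), F(5, 48))

# (b) L(X) = (1/3)d_{-2} + (2/3)N(0,1); the standard normal has E = 0, E^2 = 1.
EX  = F(1, 3) * (-2) + F(2, 3) * 0
EX2 = F(1, 3) * 4 + F(2, 3) * 1
assert (EX, EX2, EX2 - EX**2) == (F(-2, 3), 2, F(14, 9))
print("verified")
```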
Exercise 6.3.6. Let X and Y be random variables on some probability triple (Ω, F, P). Suppose E(X⁴) < ∞, and that P(m ≤ X ≤ z) = P(m ≤ Y ≤ z) for all integers m and all z ∈ R. Prove or disprove that we necessarily have E(X⁴) = E(Y⁴).

Solution. Yes. First, from 0 ≤ P(X < m) ≤ F_X(m) (m ∈ Z) and lim_{m→−∞} F_X(m) = 0, it follows that:

lim_{m→−∞} P(X < m) = 0.

Next, using this result we have:

F_X(z) = F_X(z) − lim_{m→−∞} P(X < m)
= lim_{m→−∞} (P(X ≤ z) − P(X < m))
= lim_{m→−∞} P(m ≤ X ≤ z)
= lim_{m→−∞} P(m ≤ Y ≤ z)
= F_Y(z),

for all z ∈ R. Therefore, by Proposition 6.0.2, L(X) = L(Y), and by Corollary 6.1.3 the desired result follows.
Exercise 6.3.8. Consider the statement: f(x) = (f(x))² for all x ∈ R.
(a) Prove that the statement is true for all indicator functions f = 1_B.
(b) Prove that the statement is not true for the identity function f(x) = x.
(c) Why does this fact not contradict the method of proof of Theorem 6.1.1?

Solution. (a) 1_B² = 1_{B∩B} = 1_B, for all Borel measurable sets B ⊆ R.
(b) f(4) = 4 ≠ 16 = (f(4))².
(c) The main reason is that the functional equation f(x) = (f(x))² is not stable when a satisfying function f is replaced by a linear combination Σ_{i=1}^n a_i f_i. Thus, contrary to the method of proof of Theorem 6.1.1, we cannot pass the stability of the given functional equation from indicator functions to simple functions.
Chapter 7
Stochastic processes and gambling games
Exercise 7.4.2. For the stochastic process {X_n} given by (7.0.2), compute (for n, k > 0):
(a) P(X_n = k).
(b) P(X_n > 0).

Solution. (a) P(X_n = k) = P(Σ_{i=1}^n r_i = (n+k)/2) = C(n, (n+k)/2) 2^{−n} if n + k ∈ {2, 4, ..., 2n}, and 0 otherwise.

(b) P(X_n > 0) = Σ_{k=1}^n P(X_n = k) = Σ_{1 ≤ k ≤ n, n+k even} C(n, (n+k)/2) 2^{−n} = Σ_{u=⌊n/2⌋+1}^n C(n, u) 2^{−n}.
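For a small n, the closed form can be checked against brute-force enumeration of all sign sequences. A Python sketch (n = 6 is an arbitrary test value):

```python
from math import comb
from itertools import product

n = 6
# Exact distribution of X_n = r_1 + ... + r_n by enumerating the 2^n sequences of +/-1 steps.
counts = {}
for signs in product((-1, 1), repeat=n):
    s = sum(signs)
    counts[s] = counts.get(s, 0) + 1

def p(n, k):
    """P(X_n = k) from the closed form: C(n, (n+k)/2) 2^{-n} when n+k is even and |k| <= n."""
    if (n + k) % 2 or abs(k) > n:
        return 0.0
    return comb(n, (n + k) // 2) / 2**n

for k in range(-n - 1, n + 2):
    assert abs(p(n, k) - counts.get(k, 0) / 2**n) < 1e-12

# (b): P(X_n > 0) = sum_{u = floor(n/2)+1}^{n} C(n, u) 2^{-n}
pb = sum(comb(n, u) for u in range(n // 2 + 1, n + 1)) / 2**n
assert abs(pb - sum(p(n, k) for k in range(1, n + 1))) < 1e-12
print("verified")
```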
Exercise 7.4.4. For the gambler's ruin model of Subsection 7.2, with c = 10,000 and p = 0.49, find the smallest integer a such that s_{c,p}(a) ≥ 1/2. Interpret your result in plain English.

Solution. Substituting p = 0.49, q = 0.51, and c = 10,000 in equation (7.2.2), and requiring s_{c,p}(a) ≥ 1/2, it follows that:

(1 − (0.51/0.49)^a) / (1 − (0.51/0.49)^{10,000}) ≥ 1/2,

or

a ≥ ln((1/2)((0.51/0.49)^{10,000} − 1) + 1) / ln(51/49) ≈ 9982.67,

and the smallest positive integer value for a is 9983.

This result means that if we start with $9983 (i.e. a = 9983) and our aim is to win $10,000 before going broke (i.e. c = 10,000), then with a winning probability of 49% in each game (i.e. p = 0.49), our probability of achieving that goal is at least 50%.
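The threshold computation fits comfortably in double-precision arithmetic ((51/49)^10,000 ≈ 5×10^173 is still below the float overflow limit). A Python check:

```python
import math

p, q, c = 0.49, 0.51, 10_000
r = q / p  # = 51/49 > 1

# Threshold from s_{c,p}(a) = (1 - r^a)/(1 - r^c) >= 1/2, rearranged as in the solution.
a_star = math.log(0.5 * (r**c - 1) + 1) / math.log(r)
a_min = math.ceil(a_star)

assert 9982 < a_star < 9983
assert a_min == 9983

# Direct check of the ruin formula at a_min and a_min - 1:
s = lambda a: (1 - r**a) / (1 - r**c)
assert s(a_min) >= 0.5 > s(a_min - 1)
print(a_star, a_min)
```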
Exercise 7.4.6. Let {W_n} be i.i.d. with P(W_n = 1) = P(W_n = 0) = 1/4 and P(W_n = −1) = 1/2, and let a be a positive integer. Let X_n = a + W_1 + W_2 + ... + W_n, and let τ_0 = inf{n ≥ 0 : X_n = 0}. Compute P(τ_0 < ∞).

Solution. Let 0 ≤ a ≤ c and τ_c = inf{n ≥ 0 : X_n = c}. Consider s_c(a) = P(τ_c < τ_0). Then:

s_c(a) = P(τ_c < τ_0 | W_1 = 0)P(W_1 = 0) + P(τ_c < τ_0 | W_1 = 1)P(W_1 = 1) + P(τ_c < τ_0 | W_1 = −1)P(W_1 = −1)
= (1/4)s_c(a) + (1/4)s_c(a + 1) + (1/2)s_c(a − 1),

where 1 ≤ a ≤ c − 1, s_c(0) = 0, and s_c(c) = 1. Hence, s_c(a + 1) − s_c(a) = 2(s_c(a) − s_c(a − 1)) for all 1 ≤ a ≤ c − 1. Solving this equation yields:

s_c(a) = (2^a − 1)/(2^c − 1), 0 ≤ a ≤ c.

On the other hand, {τ_0 < τ_c} ↑ {τ_0 < ∞} as c → ∞; equivalently, {τ_c ≤ τ_0} ↓ {τ_0 = ∞}. Therefore (here P(τ_c = τ_0 < ∞) = 0, as in Exercise 7.4.8):

P(τ_0 < ∞) = 1 − P(τ_0 = ∞) = 1 − lim_{c→∞} P(τ_c ≤ τ_0) = 1 − lim_{c→∞} s_c(a) = 1. ∎
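The claimed solution of the difference equation can be verified exactly, and s_c(a) → 0 as c → ∞ can be observed numerically. A Python sketch (c = 20 and a = 5 are arbitrary test values):

```python
from fractions import Fraction as F

def s(a, c):
    """Candidate solution s_c(a) = (2^a - 1)/(2^c - 1)."""
    return (F(2)**a - 1) / (F(2)**c - 1)

c = 20
# Boundary conditions and the one-step equation
# s(a) = (1/4)s(a) + (1/4)s(a+1) + (1/2)s(a-1), 1 <= a <= c-1.
assert s(0, c) == 0 and s(c, c) == 1
for a in range(1, c):
    assert s(a, c) == F(1, 4)*s(a, c) + F(1, 4)*s(a + 1, c) + F(1, 2)*s(a - 1, c)

# As c grows, s_c(a) decreases to 0, so P(tau_0 < infinity) = 1 - lim_c s_c(a) = 1.
a = 5
assert all(s(a, c) > s(a, c + 1) for c in range(6, 30))
assert float(s(a, 200)) < 1e-55
print("verified")
```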
Exercise 7.4.8. In gambler's ruin, recall that {τ_c < τ_0} is the event that the player eventually wins, and {τ_0 < τ_c} is the event that the player eventually loses.
(a) Give a similar plain-English description of the complement of the union of these two events, i.e. ({τ_c < τ_0} ∪ {τ_0 < τ_c})^c.
(b) Give three different proofs that the event described in part (a) has probability 0: one using Exercise 7.4.7; a second using Exercise 7.4.5; and a third recalling how the probabilities s_{c,p}(a) were computed in the text, and seeing to what extent the computation would have differed if we had instead replaced s_{c,p}(a) by S_{c,p}(a) = P(τ_c ≤ τ_0).
(c) Prove that, if c ≥ 4, then the event described in part (a) contains uncountably many outcomes (i.e. that uncountably many different sequences Z_1, Z_2, ... correspond to this event, even though it has probability zero).

Solution. (a) The event ({τ_c < τ_0} ∪ {τ_0 < τ_c})^c = {τ_0 = τ_c} is the event that the player neither wins nor loses: the two hitting times can only coincide if both are infinite, i.e. the fortune never reaches 0 or c.

(b) Method (1):

P(τ_0 = τ_c) = 1 − P({τ_c < τ_0} ∪ {τ_0 < τ_c}) = 1 − (r_{c,p}(a) + s_{c,p}(a)) = 0.

Method (2): apply Exercise 7.4.5 directly.

Method (3): Let S_{c,p}(a) = P(τ_c ≤ τ_0). Then, for 1 ≤ a ≤ c − 1 we have that:

S_{c,p}(a) = P(τ_c ≤ τ_0 | Z_1 = −1)P(Z_1 = −1) + P(τ_c ≤ τ_0 | Z_1 = 1)P(Z_1 = 1) = qS_{c,p}(a − 1) + pS_{c,p}(a + 1),

where S_{c,p}(0) = 0 and S_{c,p}(c) = 1. Solving the above equation, it follows that S_{c,p}(a) = s_{c,p}(a) for all 0 ≤ a ≤ c. Thus,

P(τ_c = τ_0) = S_{c,p}(a) − s_{c,p}(a) = 0

for all 0 ≤ a ≤ c.

(c) We prove the assertion for the case a = 1 and c ≥ 4 (the case a ≥ 2 and c ≥ 4 is a straightforward generalization). Consider all sequences {Z_n}_{n=1}^∞ of the form:

Z_1 = 1, Z_2 = ±1, Z_3 = −Z_2, Z_4 = ±1, Z_5 = −Z_4, ..., Z_{2n} = ±1, Z_{2n+1} = −Z_{2n}, ....

Since each Z_{2n}, n = 1, 2, ... can be selected in 2 ways, there are uncountably many sequences of this type. (In fact there is an onto function f from the set of all such sequences to the closed unit interval, defined by

f({Z_n}_{n=1}^∞) = Σ_{n=1}^∞ ((sgn(Z_{2n}) + 1)/2) 2^{−n}.)

In addition, a simple calculation shows that:

X_1 = 2, X_2 = 1 or 3, X_3 = 2, X_4 = 1 or 3, X_5 = 2, ..., X_{2n} = 1 or 3, X_{2n+1} = 2, ....

so each such sequence keeps the fortune in {1, 2, 3} forever, whence τ_0 = τ_c = ∞. ∎
Exercise 7.4.10. Consider the gambling policies model, with p = 1/3, a = 6, and c = 8.
(a) Compute the probability s_{c,p}(a) that the player will win (i.e. hit c before hitting 0) if they bet $1 each time (i.e. if W_n ≡ 1).
(b) Compute the probability that the player will win if they bet $2 each time (i.e. if W_n ≡ 2).
(c) Compute the probability that the player will win if they employ the strategy of bold play (i.e., if W_n = min(X_{n−1}, c − X_{n−1})).

Solution. (a) For W_n ≡ 1, p = 1/3, a = 6, c = 8, q = 2/3 and q/p = 2, it follows:

s_{c,p}(a) = (2^a − 1)/(2^c − 1) = 63/255 ≈ 0.2471.

(b) For W_n ≡ 2, p = 1/3, and hence effectively a = 3, c = 4, q = 2/3 and q/p = 2, it follows:

s_{c,p}(a) = (2^a − 1)/(2^c − 1) = 7/15 ≈ 0.4667.

(c) For X_n = 6 + W_1 Z_1 + W_2 Z_2 + ... + W_n Z_n, with W_n = min(X_{n−1}, c − X_{n−1}), Z_n = ±1 and c = 8, it follows:

W_1 = min(X_0, 8 − X_0) = min(6, 8 − 6) = 2,
W_2 = min(X_1, 8 − X_1) = min(6 + 2Z_1, 2 − 2Z_1) = 0 if Z_1 = 1 (since then X_1 = 8); 4 if Z_1 = −1 (since then X_1 = 4).

If Z_1 = −1 then X_2 = 4 + 4Z_2, which equals 8 if Z_2 = 1 and 0 if Z_2 = −1; in every case the game has ended by time 2 (X has hit 8 or 0), so W_n = 0 for all n ≥ 3. When Z_1 = −1 and Z_2 = −1, the event {τ_0 < τ_c} occurs. Hence:

P(τ_c < τ_0) = P(Z_1 = 1) + P(Z_1 = −1, Z_2 = 1)
= P(Z_1 = 1) + P(Z_1 = −1)P(Z_2 = 1)
= 1/3 + (2/3)(1/3) = 5/9 ≈ 0.5556. ∎
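All three probabilities can be reproduced with exact fractions. A Python sketch:

```python
from fractions import Fraction as F

def s(a, c, q_over_p):
    """Gambler's-ruin win probability (1 - r^a)/(1 - r^c) written as (r^a - 1)/(r^c - 1)."""
    r = F(q_over_p)
    return (r**a - 1) / (r**c - 1)

# (a) $1 bets: a = 6, c = 8, q/p = 2
assert s(6, 8, 2) == F(63, 255)
# (b) $2 bets: rescale to a = 3, c = 4
assert s(3, 4, 2) == F(7, 15)
# (c) bold play: win iff Z1 = 1, or Z1 = -1 followed by Z2 = 1
p = F(1, 3)
bold = p + (1 - p) * p
assert bold == F(5, 9)
# Bolder play does strictly better in this subfair game:
assert s(6, 8, 2) < s(3, 4, 2) < bold
print(float(s(6, 8, 2)), float(s(3, 4, 2)), float(bold))
```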
Chapter 8
Discrete Markov chains
Exercise 8.5.2. For any ε > 0, give an example of an irreducible Markov chain on a countably infinite state space, such that |p_ij − p_ik| ≤ ε for all states i, j, and k.

Solution. Given ε > 0. If ε ≥ 1, then put S = N, ν_i = 2^{−i} (i ∈ N), and p_ij = 2^{−j} (i, j ∈ N), giving the transition matrix P = (p_ij) in which every row equals

(1/2, 1/2², 1/2³, ...).

If 0 < ε < 1, then put S = N and ν_i = 2^{−i} (i ∈ N). Define n_0 = max{n ∈ N : nε < 1} and put p_ij = ε if j = 1, 2, ..., n_0, and p_ij = (1 − n_0 ε)2^{−(j−n_0)} if j = n_0 + 1, n_0 + 2, ... (for every i ∈ N), giving the transition matrix P = (p_ij) in which every row equals

(ε, ..., ε, (1 − n_0 ε)2^{−1}, (1 − n_0 ε)2^{−2}, (1 − n_0 ε)2^{−3}, ...),

with the first n_0 entries equal to ε. (Each row sums to n_0 ε + (1 − n_0 ε) = 1, and by the maximality of n_0 we have (n_0 + 1)ε ≥ 1, so 0 < 1 − n_0 ε ≤ ε and every entry lies in (0, ε].)

In both cases, |p_ij − p_ik| ≤ ε for all states i, j, and k. In addition, since p_ij > 0 for all i, j ∈ N, it follows that the corresponding Markov chain in Theorem 8.1.1 is irreducible.

Note: A minor change in the above solution shows that, in fact, there are uncountably many Markov chains having this property (check!).
Exercise 8.5.4. Given Markov chain transition probabilities (p_ij)_{i,j∈S} on a state space S, call a subset C ⊆ S closed if Σ_{j∈C} p_ij = 1 for each i ∈ C. Prove that a Markov chain is irreducible if and only if it has no closed subsets (aside from the empty set and S itself).

Solution. First, let the given Markov chain have a proper nonempty closed subset C. Since Σ_{j∈S} p_ij = 1 for each i ∈ S and Σ_{j∈C} p_ij = 1 for each i ∈ C, we conclude that:

Σ_{j∈S−C} p_ij = Σ_{j∈S} p_ij − Σ_{j∈C} p_ij = 0 (i ∈ C).

Consequently, p_ij = 0 for all i ∈ C and j ∈ S − C. Next, consider the Chapman–Kolmogorov equation:

p^{(n)}_ij = Σ_{r∈S} p^{(k)}_ir p^{(n−k)}_rj (1 ≤ k ≤ n, n ∈ N).

In particular, for k = n − 1, j ∈ S − C, and i ∈ C, applying the above result gives:

p^{(n)}_ij = Σ_{r∈S−C} p^{(n−1)}_ir p_rj (n ∈ N).

Thus, this equation inductively yields:

p^{(n)}_ij = 0 (i ∈ C, j ∈ S − C, n ∈ N),

where C ≠ ∅ and S − C ≠ ∅. Therefore, the given Markov chain is reducible.

Second, assume the given Markov chain is reducible. Hence, there are i_0 ≠ j_0 in S such that p^{(n)}_{i_0 j_0} = 0 for all n ∈ N. On the other hand, C ⊆ S is closed if and only if: for all i ∈ C and j ∈ S, if p^{(n_0)}_ij > 0 for some n_0 ∈ N, then j ∈ C.

(Proof. Let C ⊆ S be closed. If i ∈ C and j ∈ S − C with p^{(n_0)}_ij > 0 for some n_0 ∈ N, then Σ_{k∈C} p^{(n_0)}_ik < 1, a contradiction. Conversely, if the condition is satisfied and C is not closed, then Σ_{j∈C} p_ij < 1 for some i ∈ C; hence p_ij > 0 for some i ∈ C and j ∈ S − C, a contradiction.)

Now, take C = {i_0} ∪ {j ∈ S : p^{(n)}_{i_0 j} > 0 for some n ∈ N}. By the above characterization C is closed, it is nonempty, and j_0 ∉ C, so C is a proper nonempty closed subset.
Exercise 8.5.6. Consider the Markov chain with state space S = {1, 2, 3} and transition probabilities p_12 = p_23 = p_31 = 1. Let π_1 = π_2 = π_3 = 1/3.
(a) Determine whether or not the chain is irreducible.
(b) Determine whether or not the chain is aperiodic.
(c) Determine whether or not the chain is reversible with respect to (π_i).
(d) Determine whether or not (π_i) is a stationary distribution.
(e) Determine whether or not lim_{n→∞} p^{(n)}_11 = π_1.

Solution. (a) Yes. A simple calculation shows that for any nonempty proper subset C ⊂ S, there is i ∈ C such that Σ_{j∈C} p_ij < 1 (check!). Thus, using Exercise 8.5.4, the given Markov chain is irreducible.

(b) No. Since the given Markov chain is irreducible, by Corollary 8.3.7 all of its states have the same period; hence period(1) = period(2) = period(3). Since p_12 = p_23 = p_31 = 1, the chain moves deterministically around the cycle 1 → 2 → 3 → 1, so

p^{(1)}_11 = 0, p^{(2)}_11 = 0, p^{(3)}_11 = 1,

and in general p^{(n)}_11 = 1 if 3 | n, and 0 otherwise. Consequently, period(1) = 3 ≠ 1.

(c) No. Suppose that π_i p_ij = π_j p_ji for all i, j ∈ S. Since π_i = π_j = 1/3, it would follow that p_ij = p_ji for all i, j ∈ S. On the other hand, p_12 = 1 ≠ 0 = p_21, showing that this chain is not reversible with respect to (π_i)_{i=1}^3.

(d) Yes. Since each column of (p_ij) also sums to 1 (the matrix is a permutation matrix) and π_i = 1/3 for all i, it follows that for any j ∈ S:

Σ_{i∈S} π_i p_ij = (1/3) Σ_{i∈S} p_ij = 1/3 = π_j.

(e) No. Since p^{(n)}_11 = 1 if 3 | n and 0 otherwise, the given limit does not exist.
Exercise 8.5.8. Prove the identity f_ij = p_ij + Σ_{k≠j} p_ik f_kj.

Solution.

f_ij = f^{(1)}_ij + Σ_{n=2}^∞ f^{(n)}_ij
= p_ij + Σ_{n=2}^∞ Σ_{k≠j} P_i(X_1 = k, X_2 ≠ j, ..., X_{n−1} ≠ j, X_n = j)
= p_ij + Σ_{k≠j} Σ_{n=2}^∞ P_i(X_1 = k, X_2 ≠ j, ..., X_{n−1} ≠ j, X_n = j)
= p_ij + Σ_{k≠j} Σ_{n=2}^∞ P_i(X_1 = k) P_k(X_1 ≠ j, ..., X_{n−2} ≠ j, X_{n−1} = j)   (by the Markov property)
= p_ij + Σ_{k≠j} p_ik Σ_{m=1}^∞ P_k(X_1 ≠ j, ..., X_{m−1} ≠ j, X_m = j)
= p_ij + Σ_{k≠j} p_ik f_kj.
Exercise 8.5.10. Consider a Markov chain (not necessarily irreducible) on a finite state space.
(a) Prove that at least one state must be recurrent.
(b) Give an example where exactly one state is recurrent (and all the rest are transient).
(c) Show by example that if the state space is countably infinite then part (a) is no longer true.

Solution. (a) Since S is finite, every sample path visits some state infinitely often, so there is at least one i_0 ∈ S with P(X_n = i_0 i.o.) > 0, and consequently P_{i_0}(X_n = i_0 i.o.) > 0. Now, by the zero–one law (Theorem 3.5.1, as applied in the text), P_{i_0}(X_n = i_0 i.o.) = 1. Therefore, by Theorem 8.2.1, i_0 is recurrent.

(b) Let S = {1, 2} and

(p_ij) = [ 1 0 ]
         [ 1 0 ].

Using the Chapman–Kolmogorov equation:

p^{(n)}_i2 = Σ_{r=1}^2 p^{(n−1)}_ir p_r2 = 0 (i = 1, 2; n = 1, 2, 3, ...).

In particular, p^{(n)}_22 = 0 (n ∈ N), and hence Σ_{n=1}^∞ p^{(n)}_22 = 0 < ∞. Thus, by Theorem 8.2.1, state 2 is transient. Finally, by part (a), the only remaining state, 1, is recurrent.

(c) Any state i ∈ Z in the simple asymmetric random walk is not recurrent (see page 87).
Exercise 8.5.12. Let P = (p_ij) be the matrix of transition probabilities for a Markov chain on a finite state space.
(a) Prove that P always has 1 as an eigenvalue.
(b) Suppose that v is a row eigenvector for P corresponding to the eigenvalue 1, so that vP = v. Does v necessarily correspond to a stationary distribution? Why or why not?

Solution. (a) Let |S| = n and P = (p_ij). Since |S| < ∞, without loss of generality we can assume S = {1, 2, ..., n}. Consider the column vector [π^{(0)}]^t = (1, 1, ..., 1)^t. Then:

P[π^{(0)}]^t = (Σ_{j=1}^n p_1j, Σ_{j=1}^n p_2j, ..., Σ_{j=1}^n p_nj)^t = (1, 1, ..., 1)^t = [π^{(0)}]^t.

So λ = 1 is an eigenvalue of P.

(b) Generally, no. As a counterexample, consider S = {1, 2, ..., n}, P = I_{n×n}, and v = (−1/n, −1/n, ..., −1/n), which satisfies vP = v but, having negative entries, trivially does not correspond to any stationary distribution.
Exercise 8.5.14. Give an example of a Markov chain on a finite state space, such that three of the states each have a different period.

Solution. Consider the Markov chain corresponding, via Theorem 8.1.1, to the state space S = {1, 2, ..., 6} and the transition matrix:

(p_ij) = [ 1/2 1/2  0   0   0   0  ]
         [  0   0   1   0   0   0  ]
         [  0   1   0   0   0   0  ]
         [  0   0   0   0   1   0  ]
         [  0   0   0   0   0   1  ]
         [ 1/2  0   0  1/2  0   0  ].

Then, since p_11 > 0; p_23 p_32 > 0; and p_45 p_56 p_64 > 0, it follows that period(1) = 1, period(2) = period(3) = 2, and period(4) = period(5) = period(6) = 3, respectively.
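The three periods can be confirmed by computing which powers of the transition matrix return to each state, taking the gcd of the return times. A Python sketch using boolean matrix powers (the cutoff nmax = 24 is an arbitrary horizon, more than enough for this 6-state chain):

```python
from math import gcd

# Boolean adjacency matrix: entry 1 where p_ij > 0 (states 0-indexed).
P = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
]

def period(P, i, nmax=24):
    """gcd of all n <= nmax with p^(n)_{ii} > 0, for 0-indexed state i."""
    m = len(P)
    A = P  # A represents P^n at the start of iteration n
    g = 0
    for n in range(1, nmax + 1):
        if A[i][i]:
            g = gcd(g, n)
        # boolean matrix product: A <- A * P
        A = [[int(any(A[r][k] and P[k][c] for k in range(m)))
              for c in range(m)] for r in range(m)]
    return g

assert [period(P, i) for i in range(6)] == [1, 2, 2, 3, 3, 3]
print("periods:", [period(P, i) for i in range(6)])
```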
Exercise 8.5.16. Consider the Markov chain with state space S = {1, 2, 3} and transition probabilities given by:

(p_ij) = [  0  2/3 1/3 ]
         [ 1/4  0  3/4 ]
         [ 4/5 1/5  0  ].

(a) Find an explicit formula for P_1(τ_1 = n) for each n ∈ N, where τ_1 = inf{n ≥ 1 : X_n = 1}.
(b) Compute the mean return time m_1 = E_1(τ_1).
(c) Prove that this Markov chain has a unique stationary distribution, to be called (π_i).
(d) Compute the stationary probability π_1.

Solution. (a) Let a_n = P_1(τ_1 = n). Computing a_n for n = 1, 2, ..., 7 yields:

a_1 = 0
a_2 = p_12 p_21 + p_13 p_31
a_3 = p_12 p_23 p_31 + p_13 p_32 p_21
a_4 = p_12 p_23 p_32 p_21 + p_13 p_32 p_23 p_31
a_5 = p_13 p_32 p_23 p_32 p_21 + p_12 p_23 p_32 p_23 p_31
a_6 = p_13 p_32 p_23 p_32 p_23 p_31 + p_12 p_23 p_32 p_23 p_32 p_21
a_7 = p_13 p_32 p_23 p_32 p_23 p_32 p_21 + p_12 p_23 p_32 p_23 p_32 p_23 p_31.

In general, it follows by induction (check!) that:

a_{2m} = (p_23 p_32)^{m−1}(p_12 p_21 + p_13 p_31) (m ∈ N),
a_{2m+1} = (p_23 p_32)^{m−1}(p_12 p_23 p_31 + p_13 p_32 p_21) (m ∈ N).

(b) Using part (a) we conclude:

m_1 = E_1(τ_1) = Σ_{n=1}^∞ n P_1(τ_1 = n)
= Σ_{m=1}^∞ 2m P_1(τ_1 = 2m) + Σ_{m=1}^∞ (2m + 1) P_1(τ_1 = 2m + 1)
= 2(p_12 p_21 + p_13 p_31) Σ_{m=1}^∞ m (p_23 p_32)^{m−1} + (p_12 p_23 p_31 + p_13 p_32 p_21) Σ_{m=1}^∞ (2m + 1)(p_23 p_32)^{m−1}
= 2(p_12 p_21 + p_13 p_31)/(1 − p_23 p_32)² + (p_12 p_23 p_31 + p_13 p_32 p_21)(2/(1 − p_23 p_32)² + 1/(1 − p_23 p_32)).

(c) First, we show that the given Markov chain is irreducible. Using the Chapman–Kolmogorov equation:

p^{(n)}_ij = Σ_{r=1}^3 p_ir p^{(n−1)}_rj (i, j = 1, 2, 3; n = 1, 2, ...),

it follows that for any i, j = 1, 2, 3 there is n_0 ∈ N such that p^{(n_0)}_ij > 0. In addition, a computation similar to part (a) shows that m_i = E_i(τ_i) < ∞ for i = 2, 3. Hence, by Theorem 8.4.1, this Markov chain has a unique stationary distribution (π_i)_{i=1}^3 given by π_i = 1/m_i for i = 1, 2, 3.

(d)

π_1 = 1/m_1 = (1 − p_23 p_32)² / (2(p_12 p_21 + p_13 p_31) + (p_12 p_23 p_31 + p_13 p_32 p_21)(3 − p_23 p_32))
= (17/20)² / (13/15 + (5/12)(57/20)) = 51/145 ≈ 0.3517. ∎
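The closed form for π_1 can be cross-checked against the stationary equations πP = π with exact fractions. A Python sketch (the values π_2 = 44/145 and π_3 = 50/145 come from solving the same linear system by hand):

```python
from fractions import Fraction as F

P = [[F(0),    F(2, 3), F(1, 3)],
     [F(1, 4), F(0),    F(3, 4)],
     [F(4, 5), F(1, 5), F(0)]]

# Closed form from part (d):
r = P[1][2] * P[2][1]                                  # p23*p32
A = P[0][1] * P[1][0] + P[0][2] * P[2][0]              # p12*p21 + p13*p31
B = P[0][1] * P[1][2] * P[2][0] + P[0][2] * P[2][1] * P[1][0]
pi1 = (1 - r)**2 / (2*A + B*(3 - r))
assert pi1 == F(51, 145)

# Cross-check: the full stationary vector satisfies pi P = pi and sums to 1.
pi = [F(51, 145), F(44, 145), F(50, 145)]
assert sum(pi) == 1
for j in range(3):
    assert sum(pi[i] * P[i][j] for i in range(3)) == pi[j]
print("pi_1 =", pi1)
```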
Exercise 8.5.18. Prove that if f_ij > 0 and f_ji = 0, then i is transient.

Solution. Consider the equation:

f_ik = Σ_{m=1}^n f^{(m)}_ik + Σ_{j≠k} P_i(X_1 ≠ k, ..., X_{n−1} ≠ k, X_n = j) f_jk.

Let k = i. Then f_ij > 0 if and only if P_i(X_1 ≠ i, ..., X_{n−1} ≠ i, X_n = j) > 0 for some n ∈ N. Consequently, if f_ij > 0 and f_ji = 0, then f_ii < 1, showing that the state i is transient.
Exercise 8.5.20. (a) Give an example of a Markov chain on a finite state space which has multiple (i.e. two or more) stationary distributions.
(b) Give an example of a reducible Markov chain on a finite state space, which nevertheless has a unique stationary distribution.
(c) Suppose that a Markov chain on a finite state space is decomposable, meaning that the state space can be partitioned as S = S_1 ∪ S_2, with S_i nonempty, such that f_ij = f_ji = 0 whenever i ∈ S_1 and j ∈ S_2. Prove that the chain has multiple stationary distributions.
(d) Prove that for a Markov chain as in part (b), some states are transient.

Solution. (a) Consider a Markov chain with state space S = {1, ..., n} and transition matrix (p_ij) = I_{n×n}, where n ≥ 3. Then any distribution (π_i)_{i=1}^n with Σ_{i=1}^n π_i = 1 and π_i ≥ 0 is a stationary distribution for it.

(b) Take S = {1, 2} and:

(p_ij) = [ 0 1 ]
         [ 0 1 ].

Applying the Chapman–Kolmogorov equation p^{(n)}_11 = Σ_{r=1}^2 p^{(n−1)}_1r p_r1 (n ∈ N), it follows that p^{(n)}_11 = 0 (n ∈ N), yielding that this chain is reducible. Let (π_i)_{i=1}^2 with π_1 + π_2 = 1 and 0 ≤ π_1, π_2 ≤ 1 be a stationary distribution. Then π_1 = π_1 p_11 + π_2 p_21 = 0 and π_2 = 1 − π_1 = 1. Thus, this chain has only one stationary distribution.

(c) Let S = S_1 ∪ S_2 and f_ij = 0 = f_ji for any i ∈ S_1, j ∈ S_2. Hence f_ij > 0 only when i, j ∈ S_1 or i, j ∈ S_2. Therefore, the restriction of the given Markov chain to the state space S_r (r = 1, 2) is irreducible. Hence, by Proposition 8.4.10, all the states of S_r (r = 1, 2) are positive recurrent. Now, by Theorem 8.4.9, there are unique stationary distributions for S_r (r = 1, 2), say (π_i)_{i=1}^{|S_1|} and (π_i)_{i=|S_1|+1}^{|S_1|+|S_2|}, respectively. Finally, pick any 0 < α < 1 and consider the stationary distribution (π^α_i)_{i=1}^{|S|} defined by:

π^α_i = α π_i 1_{{1,...,|S_1|}}(i) + (1 − α) π_i 1_{{|S_1|+1,...,|S|}}(i).

Since different values of α give different distributions, the chain has multiple stationary distributions.

(d) Consider the solution of part (b). In that example, since p_21 = 0 and p^{(n)}_21 = Σ_{r=1}^2 p^{(n−1)}_2r p_r1 = 0 (n ≥ 2), it follows that f_21 = 0. On the other hand, p_12 > 0 and hence f_12 > 0. Now, by Exercise 8.5.18, the state i = 1 is transient.
Chapter 9
More probability theorems
Exercise 9.5.2. Give an example of a sequence of random variables which is unbounded but still uniformly integrable. For bonus points, make the sequence also be undominated, i.e. violate the hypothesis of the Dominated Convergence Theorem.

Solution. Let Ω = N, and P({ω}) = 2^{−ω} (ω ∈ Ω). For n ∈ N, define X_n: Ω → R by

X_n(ω) = n · 1_{{n}}(ω).

Since lim_{n→∞} X_n(n) = ∞, there is no K > 0 such that |X_n| ≤ K for all n ∈ N. On the other hand, for any α > 0 there exists N_α = ⌈α⌉ such that:

|X_n| 1_{|X_n| ≥ α} = X_n for n = N_α, N_α + 1, ..., while |X_n| 1_{|X_n| ≥ α} = 0 for n < N_α.

Thus, we have that:

sup_n E(|X_n| 1_{|X_n| ≥ α}) = sup_{n ≥ N_α} E(X_n) = sup_{n ≥ N_α} (n 2^{−n}) = N_α 2^{−N_α},

implying:

lim_{α→∞} sup_n E(|X_n| 1_{|X_n| ≥ α}) = lim_{α→∞} N_α 2^{−N_α} = 0. ∎
Exercise 9.5.4. Suppose that lim_{n→∞} X_n(ω) = 0 for all ω ∈ Ω, but lim_{n→∞} E(X_n) ≠ 0. Prove that E(sup_n |X_n|) = ∞.

Solution. Assume E(sup_n |X_n|) < ∞. Take Y = sup_n |X_n|; then |X_n| ≤ Y with E(Y) < ∞. Now, according to Theorem 9.1.2 (the Dominated Convergence Theorem),

lim_{n→∞} E(X_n) = E(lim_{n→∞} X_n) = 0,

a contradiction.
Exercise 9.5.6. Prove that Theorem 9.1.6 implies Theorem 9.1.2.

Solution. Let the assumptions of Theorem 9.1.2 hold. Since |X_n| ≤ Y, it follows that |X_n| 1_{|X_n| ≥ α} ≤ Y 1_{Y ≥ α}, and consequently, by first taking expectations of both sides of the inequality and then taking the supremum over n, we have:

sup_n E(|X_n| 1_{|X_n| ≥ α}) ≤ E(Y 1_{Y ≥ α}) (α > 0).

On the other hand, lim_{α→∞} E(Y 1_{Y ≥ α}) = 0. Thus, by the above inequality it follows that

lim_{α→∞} sup_n E(|X_n| 1_{|X_n| ≥ α}) = 0.

Finally, by Theorem 9.1.6, lim_{n→∞} E(X_n) = E(X).
Exercise 9.5.8. Let Ω = {1, 2}, with P({1}) = P({2}) = 1/2, and let F_t(1) = t² and F_t(2) = t⁴, for 0 < t < 1.
(a) What does Proposition 9.2.1 conclude in this case?
(b) In light of the above, what rule from calculus is implied by Proposition 9.2.1?

Solution. (a) Since

E(F_t) = F_t(1)P({1}) + F_t(2)P({2}) = (t² + t⁴)/2 ≤ 1 < ∞

for all 0 < t < 1, and F'_t(1) = 2t and F'_t(2) = 4t³ exist for all 0 < t < 1, F'_t is a random variable. Besides, since |F'_t| ≤ 4 for all 0 < t < 1 and E(4) = 4 < ∞, according to Proposition 9.2.1, φ(t) = E(F_t) is differentiable with finite derivative φ'(t) = E(F'_t) for all 0 < t < 1. Thus, E(F'_t) = t + 2t³ for all 0 < t < 1.

(b) The rule of differentiating under the integral sign:

(d/dt) ∫ F_t(ω) P(dω) = ∫ (∂/∂t) F_t(ω) P(dω).
Exercise 9.5.10. Let X_1, X_2, ... be i.i.d., each having the standard normal distribution N(0, 1). Use Theorem 9.3.4 to obtain an exponentially decreasing upper bound on P((1/n)(X_1 + ... + X_n) ≥ 0.1).

Solution. In this case, for the assumptions of Theorem 9.3.4: M_{X_i}(s) = exp(s²/2) < ∞ for all −a < s < b with a, b > 0; m = 0; and ε = 0.1. Thus:

P((1/n)(X_1 + ... + X_n) ≥ 0.1) ≤ ρ^n for all n ∈ N,

where ρ = inf_{0 < s < b} exp(s²/2 − 0.1s). Put g(s) = exp(s²/2 − 0.1s) for 0 < s < ∞. A simple calculation shows that g attains its infimum at s = 0.1. Accordingly, ρ = g(0.1) = exp(−0.005) ≈ 0.995012.
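The minimization of the Chernoff exponent can be confirmed numerically (a Python sketch; the grid resolution 10^{-4} is an arbitrary choice):

```python
import math

def g(s):
    """Chernoff objective E[e^{sX}] e^{-0.1 s} = exp(s^2/2 - 0.1 s) for X ~ N(0,1)."""
    return math.exp(s**2 / 2 - 0.1 * s)

# Minimize over a grid: the minimizer should be s = 0.1, giving rho = exp(-0.005).
grid = [i / 10000 for i in range(1, 5000)]
s_star = min(grid, key=g)
assert abs(s_star - 0.1) < 1e-3
rho = g(s_star)
assert abs(rho - math.exp(-0.005)) < 1e-9

# The resulting bound rho^n decays exponentially in n:
assert rho**1000 < math.exp(-0.005 * 1000) * 1.001
print(s_star, rho)
```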
Exercise 9.5.12. Let α > 2, and let M(t) = exp(−|t|^α) for t ∈ R. Prove that M(t) is not a characteristic function of any probability distribution.

Solution. Calculating the first two derivatives, we have:

M'(t) = −α t|t|^{α−2} exp(−|t|^α), −∞ < t < ∞,
M''(t) = −α|t|^{α−2}(α − 1 − α|t|^α) exp(−|t|^α), −∞ < t < ∞,

yielding M'(0) = 0 and M''(0) = 0 (here α − 2 > 0). Consider the following proposition, which is a corollary of Proposition 11.0.1 on page 125:

Proposition: Let X be a random variable. If for some n ∈ N the characteristic function M_X(t) has a finite derivative of order 2n at t = 0, then:

M_X^{(k)}(0) = i^k E(X^k), 0 ≤ k ≤ 2n.

Now, assume M = M_X for some random variable X. Then, applying the above proposition for the case n = 1, it follows that E(X²) = E(X) = 0, and therefore Var(X) = 0. Next, by Proposition 5.1.2,

P(|X| ≥ ε) ≤ Var(X)/ε² = 0 for all ε > 0,

implying P(X ≠ 0) = 0, or equivalently X ≡ 0. Accordingly, M_X(t) = 1 for all −∞ < t < ∞, a clear contradiction.
Exercise 9.5.14. Let λ be Lebesgue measure on [0, 1], and let f(x, y) = 8xy(x² − y²)(x² + y²)^{−3} for (x, y) ≠ (0, 0), with f(0, 0) = 0.
(a) Compute ∫₀¹ (∫₀¹ f(x, y) λ(dy)) λ(dx).
(b) Compute ∫₀¹ (∫₀¹ f(x, y) λ(dx)) λ(dy).
(c) Why does the result not contradict Fubini's Theorem?

Solution. (a) Let u = x² + y², v = x. Then du = 2y dy, dv = dx, x² − y² = 2v² − u, and v² ≤ u ≤ v² + 1. Using this change of variables, it follows:

∫₀¹ (∫₀¹ f(x, y) dy) dx = ∫₀¹ (∫₀¹ 8xy(x² − y²)(x² + y²)^{−3} dy) dx
= ∫₀¹ (∫_{v²}^{v²+1} 4v(2v² − u)u^{−3} du) dv
= ∫₀¹ 4v [−v² u^{−2} + u^{−1}]_{u=v²}^{u=v²+1} dv
= ∫₀¹ 4v/(v² + 1)² dv
= [−2/(v² + 1)]₀¹ = 1.

(b) By the same substitution applied to the inner integral over x (or by integration by parts), the inner integral equals −4y/(y² + 1)², so:

∫₀¹ (∫₀¹ f(x, y) dx) dy = ∫₀¹ (−2/(y² + 1)²) 2y dy = [2/(y² + 1)]₀¹ = −1.

(Alternatively, part (b) follows from part (a) and the antisymmetry f(x, y) = −f(y, x).)

(c) In this case, the hypothesis of Fubini's Theorem is violated: f is not integrable over [0, 1]². In fact, by considering suitable Riemann sums for the double integrals, it follows that

∫∫ f⁺ d(λ × λ) = ∫∫ f⁻ d(λ × λ) = ∞.
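The inner x-integral and the value −1 can be checked numerically away from the singular corner. A Python sketch using composite Simpson quadrature (the sample points y ∈ {0.3, 0.5, 0.8} and n = 2000 are arbitrary choices):

```python
def f(x, y):
    return 8*x*y*(x*x - y*y) / (x*x + y*y)**3

def simpson(h, a, b, n=2000):
    """Composite Simpson's rule for the integral of h over [a, b] (n even)."""
    s = h(a) + h(b)
    for i in range(1, n):
        s += h(a + i*(b - a)/n) * (4 if i % 2 else 2)
    return s * (b - a) / (3*n)

# Inner x-integral at fixed y equals -4y/(1+y^2)^2, the closed form used in part (b):
for y in (0.3, 0.5, 0.8):
    inner = simpson(lambda x: f(x, y), 0.0, 1.0)
    assert abs(inner - (-4*y/(1 + y*y)**2)) < 1e-7

# Integrating that closed form over y in [0,1] (antiderivative 2/(1+y^2)) gives -1;
# by the antisymmetry f(x,y) = -f(y,x), the other iterated integral is +1.
assert abs((2/(1 + 1) - 2/(1 + 0)) - (-1)) < 1e-12
print("verified")
```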
Exercise 9.5.16. Let X ~ N(a, v) and Y ~ N(b, w) be independent. Let Z = X + Y. Use the convolution formula to prove that Z ~ N(a + b, v + w).

Solution. Let a = b = 0. We claim that if X ~ N(0, v) and Y ~ N(0, w), then Z ~ N(0, v + w). To prove this, using the convolution formula, it follows:

h_Z(z) = ∫_{−∞}^{∞} f_X(z − y) g_Y(y) dy
= (1/(2π√(vw))) ∫_{−∞}^{∞} exp(−(1/2)((z − y)²/v + y²/w)) dy.

Completing the square in y,

(z − y)²/v + y²/w = (1/v + 1/w)(y − zw/(v + w))² + z²/(v + w),

so that

h_Z(z) = (1/(2π√(vw))) exp(−z²/(2(v + w))) ∫_{−∞}^{∞} exp(−(1/2)(1/v + 1/w)(y − zw/(v + w))²) dy
= (1/(2π√(vw))) exp(−z²/(2(v + w))) √(2π/(1/v + 1/w))
= (1/√(2π(v + w))) exp(−z²/(2(v + w))),

since √(2π/(1/v + 1/w)) = √(2πvw/(v + w)). This holds for all −∞ < z < ∞, so Z ~ N(0, v + w).

Now, if X ~ N(a, v) and Y ~ N(b, w), then X − a ~ N(0, v) and Y − b ~ N(0, w). Hence, applying the above result, we have Z − (a + b) ~ N(0, v + w), or equivalently, Z ~ N(a + b, v + w). ∎
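The convolution identity can be checked numerically by comparing the convolution integral with the N(0, v + w) density. A Python sketch (v = 2, w = 3, the truncation L = 14, and the grid size are arbitrary choices; the trapezoid rule is extremely accurate here because the integrand decays rapidly):

```python
import math

def npdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x*x / (2*var)) / math.sqrt(2*math.pi*var)

def conv_density(z, v, w, n=6000, L=14.0):
    """Numerical convolution integral of f_X(z-y) g_Y(y) dy, trapezoid rule on [-L, L]."""
    h = 2*L / n
    total = 0.0
    for i in range(n + 1):
        y = -L + i*h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * npdf(z - y, v) * npdf(y, w)
    return total * h

v, w = 2.0, 3.0
for z in (-1.5, 0.0, 0.7, 2.0):
    assert abs(conv_density(z, v, w) - npdf(z, v + w)) < 1e-8
print("verified")
```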
Chapter 10
Weak convergence
Exercise 10.3.2. Let X, Y_1, Y_2, ... be independent random variables, with P(Y_n = 1) = 1/n and P(Y_n = 0) = 1 − 1/n. Let Z_n = X + Y_n. Prove that L(Z_n) ⇒ L(X), i.e. that the law of Z_n converges weakly to the law of X.

Solution. Given ε > 0. Then |Z_n − X| = Y_n ∈ {0, 1}, so P(|Z_n − X| ≥ ε) = 0 if ε > 1, and = 1/n if 0 < ε ≤ 1; in either case

0 ≤ P(|Z_n − X| ≥ ε) ≤ 1/n for all n ∈ N.

Accordingly, lim_{n→∞} P(|Z_n − X| ≥ ε) = 0. Hence, Z_n → X in probability, and consequently, by Proposition 10.2.1, L(Z_n) ⇒ L(X).
Exercise 10.3.4. Prove that weak limits, if they exist, are unique. That is, if ν, μ, μ_1, μ_2, ... are probability measures, and μ_n ⇒ μ and also μ_n ⇒ ν, then μ = ν.

Solution. Put F_n(x) = μ_n((−∞, x]), F(x) = μ((−∞, x]), and G(x) = ν((−∞, x]) for all x ∈ R and n ∈ N. Then, by Theorem 10.1.1 and Exercise 6.3.7, it follows that lim_{n→∞} F_n(x) = F(x) except on D_F, and lim_{n→∞} F_n(x) = G(x) except on D_G (the sets of discontinuity points of F and G, respectively). Since F and G are increasing, the sets D_F and D_G are countable, and so is D_F ∪ D_G. Thus, F = G except at most on D_F ∪ D_G. On the other hand, F and G are right continuous everywhere and, in particular, on D_F ∪ D_G, implying F = G on D_F ∪ D_G. (In fact, let x ∈ D_F ∪ D_G and let {x_n}_{n=1}^∞ be a sequence in (D_F ∪ D_G)^c such that x_n ↓ x. Then, by the previous result it follows that:

F(x) = lim_{n→∞} F(x_n) = lim_{n→∞} G(x_n) = G(x).)

Accordingly, F = G on (−∞, ∞). Finally, by Proposition 6.0.2, μ = ν.
Exercise 10.3.6. Let a_1, a_2, ... be any sequence of non-negative real numbers with Σ_i a_i = 1. Define the discrete measure μ by μ(·) = Σ_{i∈N} a_i δ_i(·), where δ_i(·) is a point mass at the positive integer i. Construct a sequence {μ_n} of probability measures, each having a density with respect to Lebesgue measure, such that μ_n ⇒ μ.

Solution. Define:

μ_n(A) = Σ_{i∈N} n a_i λ(A ∩ (i − 1/n, i]),

where λ denotes Lebesgue measure; thus μ_n has density g_n = Σ_{i∈N} n a_i 1_{(i−1/n, i]} with respect to λ, and

μ_n((−∞, ∞)) = Σ_{i∈N} a_i = 1

for all n ∈ N. Next, let x ∈ R be a non-integer (so that μ({x}) = 0). For any n ∈ N with n ≥ 1/(1 − (x − ⌊x⌋)), the interval (⌊x⌋, x] carries no μ_n-mass, so

|μ_n((−∞, x]) − μ((−∞, x])| = 0.

Accordingly, lim_{n→∞} μ_n((−∞, x]) = μ((−∞, x]) at every continuity point x of the distribution function of μ. Finally, by Theorem 10.1.1, μ_n ⇒ μ.
Exercise 10.3.8. Prove the following are equivalent:
(1) μ_n ⇒ μ.
(2) ∫ f dμ_n → ∫ f dμ, for all non-negative bounded continuous f: R → R.
(3) ∫ f dμ_n → ∫ f dμ, for all non-negative continuous f: R → R with compact support, i.e., such that there are finite a and b with f(x) = 0 for all x < a and all x > b.
(4) ∫ f dμ_n → ∫ f dμ, for all continuous f: R → R with compact support.
(5) ∫ f dμ_n → ∫ f dμ, for all non-negative continuous f: R → R which vanish at infinity, i.e., lim_{x→−∞} f(x) = lim_{x→∞} f(x) = 0.
(6) ∫ f dμ_n → ∫ f dμ, for all continuous f: R → R which vanish at infinity.

Solution. We prove the equivalence of the assertions according to the following chain of implications:

(1) ⇒ (2), (4), (5), (6); (4) ⇒ (3); (5) ⇒ (3); (6) ⇒ (3); (3) ⇒ (2); (2) ⇒ (1).

(1) ⇒ (2): Any non-negative bounded continuous function is in particular bounded continuous. Hence, by the definition of weak convergence, (1) implies (2).

(1) ⇒ (4): Any continuous function f having compact support is bounded continuous (since a continuous function attains its supremum on any compact set). Hence, by the definition of weak convergence, (1) implies (4).

(1) ⇒ (5): Any non-negative continuous function which vanishes at ±∞ is bounded continuous (since there is a compact set [−M, M] such that |f| ≤ 1 outside of it, implying |f| ≤ max(1, sup_{[−M,M]} |f(x)|) < ∞). Hence, by the definition of weak convergence, (1) implies (5).

(1) ⇒ (6): Any continuous function which vanishes at ±∞ is bounded continuous, by the same argument. Hence, by the definition of weak convergence, (1) implies (6).

(4) ⇒ (3): Any function considered in (3) is a special case of those in (4). Thus, (4) implies (3). Similarly, (5) implies (3), and (6) implies (3).

(2) ⇒ (1): Let f be a bounded continuous function. Then f⁺ = max(f, 0) and f⁻ = max(−f, 0) are non-negative bounded continuous functions. Hence, by (2) it follows:

lim_{n→∞} ∫ f dμ_n = lim_{n→∞} ∫ f⁺ dμ_n − lim_{n→∞} ∫ f⁻ dμ_n = ∫ f⁺ dμ − ∫ f⁻ dμ = ∫ f dμ.

(3) ⇒ (2): Let f be a non-negative bounded continuous function. Put f_M = f 1_{[−M,M]} for M > 0. We claim that:

lim_{n→∞} ∫ f_M dμ_n = ∫ f_M dμ for any M > 0. (*)

To prove the assertion, let ε > 0. Then there are non-negative, compactly supported, continuous functions g_M and h_M with h_M ≤ f_M ≤ g_M such that:

∫ f_M dμ ≤ ∫ g_M dμ ≤ ∫ f_M dμ + ε,
∫ f_M dμ ≥ ∫ h_M dμ ≥ ∫ f_M dμ − ε,

implying:

∫ f_M dμ − ε ≤ ∫ h_M dμ = liminf_n ∫ h_M dμ_n ≤ liminf_n ∫ f_M dμ_n
≤ limsup_n ∫ f_M dμ_n ≤ limsup_n ∫ g_M dμ_n = ∫ g_M dμ ≤ ∫ f_M dμ + ε.

Now, since ε > 0 can be chosen arbitrarily small, the desired result (*) follows. Next, suppose (2) does not hold. Then there exist ε_0 > 0 and an increasing sequence {n_k}_{k=1}^∞ such that |∫ f dμ_{n_k} − ∫ f dμ| > ε_0 for all k ∈ N. On the other hand, using (*) there are large enough M_0 > 0 and k_0 ∈ N such that:

|∫ f dμ_{n_{k_0}} − ∫ f_{M_0} dμ_{n_{k_0}}| < ε_0/4,
|∫ f_{M_0} dμ_{n_{k_0}} − ∫ f_{M_0} dμ| < ε_0/4,
|∫ f_{M_0} dμ − ∫ f dμ| < ε_0/4.

Eventually, an application of the triangle inequality shows that:

ε_0 < |∫ f dμ_{n_{k_0}} − ∫ f dμ|
≤ |∫ f dμ_{n_{k_0}} − ∫ f_{M_0} dμ_{n_{k_0}}| + |∫ f_{M_0} dμ_{n_{k_0}} − ∫ f_{M_0} dμ| + |∫ f_{M_0} dμ − ∫ f dμ|
< 3ε_0/4,

which is a contradiction.
Exercise 10.3.10. Let f: [0, 1] → (0, ∞) be a continuous function such that ∫₀¹ f dλ = 1 (where λ is Lebesgue measure on [0, 1]). Define probability measures μ and μ_n by μ(A) = ∫₀¹ f 1_A dλ and

μ_n(A) = Σ_{i=1}^n f(i/n) 1_A(i/n) / Σ_{i=1}^n f(i/n).

(a) Prove that μ_n ⇒ μ.
(b) Explicitly construct random variables Y and Y_n so that L(Y) = μ, L(Y_n) = μ_n, and Y_n → Y with probability 1.

Solution. (a) Let x ∈ R; since μ has a density, μ({x}) = 0. Then, recognizing Riemann sums of the (Riemann integrable) functions f·1_{(−∞,x]} and f:

lim_{n→∞} μ_n((−∞, x]) = lim_{n→∞} [Σ_{i=1}^n f(i/n) 1_{(−∞,x]}(i/n)] / [Σ_{i=1}^n f(i/n)]
= lim_{n→∞} [Σ_{i=1}^n f(i/n) 1_{(−∞,x]}(i/n)(1/n)] / [Σ_{i=1}^n f(i/n)(1/n)]
= [∫₀¹ f 1_{(−∞,x]} dλ] / [∫₀¹ f dλ]
= ∫₀¹ f 1_{(−∞,x]} dλ
= μ((−∞, x]).

Consequently, by Theorem 10.1.1, μ_n ⇒ μ.

(b) Let (Ω, F, P) be [0, 1] with Lebesgue measure. Put

F_n(x) = μ_n((−∞, x]) = (Σ_{i=1}^{⌊nx⌋} f(i/n) / Σ_{i=1}^n f(i/n)) 1_{[0,1]}(x) + 1_{(1,∞)}(x)

and

F(x) = μ((−∞, x]) = (∫₀ˣ f(t) dt) 1_{[0,1]}(x) + 1_{(1,∞)}(x).

Then, consider:

Y_n(ω) = inf{x : F_n(x) ≥ ω} = inf{x ∈ [0, 1] : Σ_{i=1}^{⌊nx⌋} f(i/n) ≥ ω Σ_{i=1}^n f(i/n)},

and

Y(ω) = inf{x : F(x) ≥ ω} = inf{x ∈ [0, 1] : ∫₀ˣ f(t) dt ≥ ω},

where 0 ≤ ω ≤ 1. Now, by the proof of Theorem 10.1.1, it follows that Y_n → Y with probability 1.
Chapter 11
Characteristic functions
Exercise 11.5.2. Let $\mu_n = \delta_{n \bmod 3}$ be a point mass at $n \bmod 3$. (Thus, $\mu_1 = \delta_1$, $\mu_2 = \delta_2$, $\mu_3 = \delta_0$, $\mu_4 = \delta_1$, $\mu_5 = \delta_2$, $\mu_6 = \delta_0$, etc.)
(a) Is $\{\mu_n\}$ tight?
(b) Does there exist a Borel probability measure $\mu$, such that $\mu_n \Rightarrow \mu$? (If so, then specify $\mu$.)
(c) Does there exist a subsequence $\{\mu_{n_k}\}$, and a Borel probability measure $\mu$, such that $\mu_{n_k} \Rightarrow \mu$? (If so, then specify $\{\mu_{n_k}\}$ and $\mu$.)
(d) Relate parts (b) and (c) to theorems from this section.
Solution. (a) Yes. Take $[a,b] = [0,3]$; then, for all $\epsilon > 0$ and for all $n \in \mathbb{N}$:
\[ \mu_n([0,3]) = \delta_{n \bmod 3}([0,3]) = \mathbf{1}_{[0,3]}(n \bmod 3) = 1 > 1 - \epsilon. \]
(b) No. Assume there exists such a distribution $\mu$. Hence, by Theorem 10.1.1, for any $x \in \mathbb{R}$ with $\mu(\{x\}) = 0$, it follows that $\lim_n \mu_n((-\infty,x]) = \mu((-\infty,x])$. On the other hand, pick $x \in (0,1)$ such that $\mu(\{x\}) = 0$. Then, $\mu_n((-\infty,x]) = 1$ if $3 \mid n$, and $= 0$ otherwise, for all $n \in \mathbb{N}$. Therefore, $\lim_n \mu_n((-\infty,x])$ does not exist, a contradiction.
(c) Yes. Put $n_k = 3k+1$ for $k = 0, 1, 2, \ldots$. Then, $\mu_{n_k} = \mu_1 = \delta_1$ for $k = 0, 1, 2, \ldots$, implying $\mu_{n_k} \Rightarrow \delta_1$.
(d) The sequence $\{\mu_n\}_{n=1}^{\infty}$ is a tight sequence of probability measures, satisfying the assumptions of Theorem 11.1.10, and according to that theorem, there exists a subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ (here, $n_k = 3k+1$ for $k \in \mathbb{N}$) such that for some probability measure $\mu$ (here, $\mu = \delta_1$), $\mu_{n_k} \Rightarrow \mu$.
Exercise 11.5.4. Let $\mu_{2n} = \delta_0$, and let $\mu_{2n+1} = \delta_n$, for $n = 1, 2, \ldots$.
(a) Does there exist a Borel probability measure $\mu$ such that $\mu_n \Rightarrow \mu$?
(b) Suppose for some subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ and some Borel probability measure $\mu$, we have $\mu_{n_k} \Rightarrow \mu$. What must $\mu$ be?
(c) Relate parts (a) and (b) to Corollary 11.1.11. Why is there no contradiction?
Solution. (a) No. Assume there exists such a distribution $\mu$. Hence, by Theorem 10.1.1, for any $x \in \mathbb{R}$ with $\mu(\{x\}) = 0$, it follows that $\lim_n \mu_n((-\infty,x]) = \mu((-\infty,x])$. On the other hand, pick $x \in (0,1)$ such that $\mu(\{x\}) = 0$. Then, $\mu_n((-\infty,x]) = 1$ if $2 \mid n$, and $= 0$ otherwise, for all $n \in \mathbb{N}$. Therefore, $\lim_n \mu_n((-\infty,x])$ does not exist, a contradiction.
(b) Since any subsequence $\{\mu_{n_k}\}_{k=1}^{\infty}$ including infinitely many terms of $\{\mu_{2k+1}\}_{k=1}^{\infty}$ is not weakly convergent, we must consider $\{\mu_{n_k}\}_{k=1}^{\infty}$ as a subsequence of $\{\mu_{2k}\}_{k=1}^{\infty}$. In this case, $\mu_{n_k} = \mu = \delta_0$ for all $k \in \mathbb{N}$, implying $\mu_{n_k} \Rightarrow \mu$.
(c) Since $\{\mu_n\}_{n=1}^{\infty}$ is not tight (take $\epsilon = \frac{1}{2}$; then, for any $[a,b] \subseteq \mathbb{R}$ and any $n \in \mathbb{N}$, if $n \geq \lfloor b \rfloor + 1$ then $\mu_{2n+1}([a,b]) = 0 < 1 - \frac{1}{2}$), the hypothesis of Corollary 11.1.11 is violated. Consequently, there is no contradiction to the assertion of that Corollary.
Exercise 11.5.6. Suppose $\mu_n \Rightarrow \mu$. Prove or disprove that $\{\mu_n\}$ must be tight.

Solution. First Method:
Given $\epsilon > 0$. Since $\mu(\mathbb{R}) = 1$, there is a closed interval $[a_0, b_0]$ such that:
\[ 1 - \frac{\epsilon}{2} \leq \mu([a_0, b_0]). \quad (\star) \]
Next, by Theorem 10.1.1(2), there is a positive integer $N$ such that:
\[ \big| \mu_n([a_0, b_0]) - \mu([a_0, b_0]) \big| \leq \frac{\epsilon}{2} \qquad n = N+1, \ldots, \]
implying:
\[ \mu([a_0, b_0]) - \frac{\epsilon}{2} \leq \mu_n([a_0, b_0]) \qquad n = N+1, \ldots. \quad (\star\star) \]
Combining $(\star)$ and $(\star\star)$ yields:
\[ 1 - \epsilon \leq \mu_n([a_0, b_0]) \qquad n = N+1, \ldots. \quad (\star\star\star) \]
Next, for each $1 \leq n \leq N$, there is a closed interval $[a_n, b_n]$ such that:
\[ 1 - \epsilon \leq \mu_n([a_n, b_n]) \qquad n = 1, \ldots, N. \quad (\star\star\star\star) \]
Define:
\[ a = \min_{0 \leq n \leq N}(a_n) \qquad b = \max_{0 \leq n \leq N}(b_n). \]
Then, by $[a_n, b_n] \subseteq [a, b]$ for all $0 \leq n \leq N$, the inequality $(\star\star\star)$ and the inequality $(\star\star\star\star)$, we have that:
\[ 1 - \epsilon \leq \mu_n([a, b]) \qquad n = 1, 2, \ldots. \]
Second Method:
Let $\phi, \phi_1, \phi_2, \ldots$ be the characteristic functions corresponding to the probability measures $\mu, \mu_1, \mu_2, \ldots$. Then, by Theorem 11.1.14 (the continuity theorem), $\mu_n \Rightarrow \mu$ implies $\lim_n \phi_n(t) = \phi(t)$ for all $t \in \mathbb{R}$. On the other hand, $\phi$ is continuous on $\mathbb{R}$. Thus, according to Lemma 11.1.13, $\{\mu_n\}$ is tight.
Exercise 11.5.8. Use characteristic functions to provide an alternative solution of Exercise 10.3.2.

Solution. Since $X$ and $Y_n$ are independent, it follows that:
\[ \phi_{Z_n}(t) = \phi_X(t)\phi_{Y_n}(t) = \phi_X(t)\Big(\exp(it)\cdot\frac{1}{n} + \Big(1 - \frac{1}{n}\Big)\Big) \quad \text{for all } n \in \mathbb{N}. \]
Therefore, $\lim_n \phi_{Z_n}(t) = \phi_X(t)$ for all $t \in \mathbb{R}$. Hence, by Theorem 11.1.14 (the continuity theorem), $\mathcal{L}(Z_n) \Rightarrow \mathcal{L}(X)$.
Exercise 11.5.10. Use characteristic functions to provide an alternative solution of Exercise 10.3.4.

Solution. Let $\mu_n \Rightarrow \mu$ and $\mu_n \Rightarrow \nu$. Then, by Theorem 11.1.14, $\lim_n \phi_{\mu_n}(t) = \phi_{\mu}(t)$ and $\lim_n \phi_{\mu_n}(t) = \phi_{\nu}(t)$ for all $t \in \mathbb{R}$. Hence, $\phi_{\mu}(t) = \phi_{\nu}(t)$ for all $t \in \mathbb{R}$. Eventually, by Corollary 11.1.7, $\mu = \nu$.
Exercise 11.5.12. Suppose that for $n \in \mathbb{N}$, we have $\mathbf{P}(X_n = 5) = \frac{1}{n}$ and $\mathbf{P}(X_n = 6) = 1 - \frac{1}{n}$.
(a) Compute the characteristic function $\phi_{X_n}(t)$, for all $n \in \mathbb{N}$ and all $t \in \mathbb{R}$.
(b) Compute $\lim_n \phi_{X_n}(t)$.
(c) Specify a distribution $\mu$ such that $\lim_n \phi_{X_n}(t) = \int \exp(itx)\,\mu(dx)$ for all $t \in \mathbb{R}$.
(d) Determine (with explanation) whether or not $\mathcal{L}(X_n) \Rightarrow \mu$.

Solution. (a) $\phi_{X_n}(t) = \mathbf{E}(\exp(itX_n)) = \frac{\exp(5it)}{n} + \frac{(n-1)\exp(6it)}{n}$, $t \in \mathbb{R}$.
(b) $\lim_n \phi_{X_n}(t) = \exp(6it)$, $t \in \mathbb{R}$.
(c) $\mu = \delta_6$.
(d) As a result of Theorem 11.1.14 (the continuity theorem), $\mathcal{L}(X_n) \Rightarrow \delta_6$.
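Parts (a) and (b) are easy to confirm numerically; a small illustrative sketch (the function name `phi` is ours, not the text's):

```python
import cmath

def phi(n, t):
    # characteristic function of X_n: P(X_n = 5) = 1/n, P(X_n = 6) = 1 - 1/n
    return cmath.exp(5j * t) / n + (n - 1) * cmath.exp(6j * t) / n

# |phi_n(t) - e^{6it}| = |e^{5it} - e^{6it}| / n <= 2/n, so the limit is e^{6it}
for t in (0.0, 1.0, -2.5):
    assert abs(phi(10**6, t) - cmath.exp(6j * t)) < 1e-5
```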
Exercise 11.5.14. Let $X_1, X_2, \ldots$ be i.i.d. with mean 4 and variance 9. Find values $C(n,x)$, for $n \in \mathbb{N}$ and $x \in \mathbb{R}$, such that as $n \to \infty$, $\mathbf{P}(X_1 + X_2 + \cdots + X_n \leq C(n,x)) \to \Phi(x)$.

Solution. As in Corollary 11.2.3:
\[ \mathbf{P}\Big(\frac{S_n - nm}{\sqrt{nv}} \leq x\Big) \to \Phi(x) \quad \text{as } n \to \infty. \]
Thus:
\[ \Phi(x) = \lim_{n \to \infty} \mathbf{P}\Big(\frac{S_n - 4n}{\sqrt{9n}} \leq \frac{C(n,x) - 4n}{\sqrt{9n}}\Big) = \Phi\Big(\frac{C(n,x) - 4n}{\sqrt{9n}}\Big), \]
and injectivity of $\Phi$ implies:
\[ x = \frac{C(n,x) - 4n}{\sqrt{9n}}, \]
or:
\[ C(n,x) = \sqrt{9n}\,x + 4n. \]
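As an illustration (not part of the solution), a Monte Carlo check of $C(n,x) = \sqrt{9n}\,x + 4n$, assuming the particular i.i.d. choice $X_i = 4 + 3Z_i$ with $Z_i = \pm 1$ equally likely, which indeed has mean 4 and variance 9:

```python
import random, math

random.seed(0)

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, trials, x = 400, 5000, 1.0
C = math.sqrt(9 * n) * x + 4 * n            # C(n, x) from the solution
hits = sum(
    sum(4 + 3 * random.choice((-1, 1)) for _ in range(n)) <= C
    for _ in range(trials)
)
assert abs(hits / trials - Phi(x)) < 0.04
```

The empirical frequency sits near $\Phi(1) \approx 0.8413$, up to Monte Carlo noise and a lattice (continuity-correction) effect of order $n^{-1/2}$.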
Exercise 11.5.16. Let $X$ be a random variable whose distribution $\mathcal{L}(X)$ is infinitely divisible. Let $a > 0$ and $b \in \mathbb{R}$, and set $Y = aX + b$; prove that $\mathcal{L}(Y)$ is infinitely divisible.

Solution. Given $n \in \mathbb{N}$. Since the distribution $\mathcal{L}(X)$ is infinitely divisible, there is a distribution $\mu_n$ such that if $X_k \sim \mu_n$, $1 \leq k \leq n$, are independent random variables, then $\sum_{k=1}^{n} X_k \sim \mathcal{L}(X)$; or, equivalently, there is a distribution $\mu_n$ such that if $X_k$, $1 \leq k \leq n$, with $F_{X_k}(x) = \mu_n((-\infty,x])$ $(x \in \mathbb{R})$ are independent, then $F_{\sum_{k=1}^{n} X_k}(x) = F_X(x)$ $(x \in \mathbb{R})$. The latter assertion yields that there is a distribution $\nu_n$, defined by $\nu_n((-\infty,x]) = \mu_n\big((-\infty, \frac{x - b/n}{a}]\big)$ $(x \in \mathbb{R})$, such that if $X_k$, $1 \leq k \leq n$, with $F_{aX_k + b/n}(x) = \nu_n((-\infty,x])$ $(x \in \mathbb{R})$ are independent, then $F_{\sum_{k=1}^{n}(aX_k + b/n)}(x) = F_{aX+b}(x)$ $(x \in \mathbb{R})$. (Note that each summand must receive the shift $b/n$, so that the $n$ shifts add up to $b$.) Accordingly, the distribution $\mathcal{L}(Y)$ is infinitely divisible, as well.
Exercise 11.5.18. Let $X, X_1, X_2, \ldots$ be random variables which are uniformly bounded, i.e. there is $M \in \mathbb{R}$ with $|X| \leq M$ and $|X_n| \leq M$ for all $n$. Prove that $\mathcal{L}(X_n) \Rightarrow \mathcal{L}(X)$ if and only if $\mathbf{E}(X_n^k) \to \mathbf{E}(X^k)$ for all $k \in \mathbb{N}$.

Solution. Let $\mathcal{L}(X_n) \Rightarrow \mathcal{L}(X)$. Then, by Theorem 10.1.1, $\lim_n \int f \, d(\mathcal{L}(X_n)) = \int f \, d(\mathcal{L}(X))$ for any bounded Borel measurable function $f$ with $\mathcal{L}(X)(D_f) = 0$. Put $f_k(x) = x^k$ $(k \in \mathbb{N})$. Then, since $|f_k| \leq M^k < \infty$ on $[-M, M]$ $(k \in \mathbb{N})$, it follows that:
\[ \lim_n \mathbf{E}(X_n^k) = \lim_n \int x^k \, d(\mathcal{L}(X_n)) = \int x^k \, d(\mathcal{L}(X)) = \mathbf{E}(X^k), \]
for all $k \in \mathbb{N}$.
Conversely, let $\mathbf{E}(X_n^k) \to \mathbf{E}(X^k)$ for all $k \in \mathbb{N}$. Since $|X| \leq M$,
\[ |M_X(s)| = |\mathbf{E}(e^{sX})| \leq \max(e^{M}, e^{-M}) < \infty \]
for all $|s| < 1$. Therefore, by Theorem 11.4.3, $\mathcal{L}(X)$ is determined by its moments. Now, the desired assertion follows from Theorem 11.4.1.
Chapter 12
Decompositions of probability laws
Exercise 12.3.2. Let $X$ and $Y$ be discrete random variables (not necessarily independent), and let $Z = X + Y$. Prove that $\mathcal{L}(Z)$ is discrete.

Solution. Let $F$ be an uncountable set of positive numbers. Then, $\sum_{x \in F} x = +\infty$. Thus, the summation in the definition of the Borel probability measure of a discrete random variable is, in fact, taken over a countable set for which the probability of each of its elements is nonzero. Hence, in the equalities
\[ \sum_{x \in \mathbb{R}} \mathcal{L}(X)(\{x\}) = \mathcal{L}(X)(\mathbb{R}) \quad \text{and} \quad \sum_{y \in \mathbb{R}} \mathcal{L}(Y)(\{y\}) = \mathcal{L}(Y)(\mathbb{R}), \]
the sets $A_X = \{x \in \mathbb{R} : \mathcal{L}(X)(\{x\}) > 0\}$ and $B_Y = \{y \in \mathbb{R} : \mathcal{L}(Y)(\{y\}) > 0\}$ are countable. On the other hand, for any $z \in \mathbb{R}$ with $\mathcal{L}(X+Y)(\{z\}) \geq \mathbf{P}(X = x, Y = z - x) > 0$ for some $x \in \mathbb{R}$, the corresponding ordered pair $(x, z-x)$ is an element of the countable set $A_X \times B_Y$. Hence, the set $C_Z = \{z \in \mathbb{R} : \mathcal{L}(X+Y)(\{z\}) > 0\}$ is countable. Thus:
\begin{align*} \sum_{z \in \mathbb{R}} \mathcal{L}(X+Y)(\{z\}) &= \sum_{z \in C_Z} \mathcal{L}(X+Y)(\{z\}) \\ &= \mathcal{L}(X+Y)(C_Z) + 0 \\ &= \mathcal{L}(X+Y)(C_Z) + \mathcal{L}(X+Y)(C_Z^c) \\ &= \mathcal{L}(X+Y)(\mathbb{R}). \end{align*}
Exercise 12.3.4. Let $X$ and $Y$ be random variables, with $\mathcal{L}(Y)$ absolutely continuous, and let $Z = X + Y$.
(a) Assume $X$ and $Y$ are independent. Prove that $\mathcal{L}(Z)$ is absolutely continuous, regardless of the nature of $\mathcal{L}(X)$.
(b) Show that if $X$ and $Y$ are not independent, then $\mathcal{L}(Z)$ may fail to be absolutely continuous.
Solution. (a) Let $A \subseteq \mathbb{R}$ be a Borel measurable set with $\lambda(A) = 0$. Hence, $\lambda(A - x) = 0$ for all $x \in \mathbb{R}$. Then, by Corollary 12.1.2 (Radon-Nikodym Theorem), $\mathcal{L}(Y)(A - x) = 0$ for all $x \in \mathbb{R}$. Now, by Theorem 9.4.5:
\[ \mathcal{L}(Z)(A) = \int \mathcal{L}(Y)(A - x)\,\mathcal{L}(X)(dx) = 0. \]
Eventually, by another application of Corollary 12.1.2 (Radon-Nikodym Theorem), the desired result follows.
(b) First example:
Put $X = Z$ and $Y = -Z$, where $Z \sim N(0,1)$. Then, $X + Y = 0$, which is clearly discrete.
Second example:
Similar to Exercise 6.3.5, let $X \sim N(0,1)$, $Y = XU$ where $X$ and $U$ are independent and $\mathbf{P}(U = 1) = \mathbf{P}(U = -1) = \frac{1}{2}$. Then, $X$ and $Y$ are dependent. Put $Z = X + Y$. Then, for the Borel measurable set $A = \{0\}$, $\lambda(A) = 0$. But, $\mathcal{L}(Z)(A) = \frac{1}{2} \neq 0$ (Check!).
Exercise 12.3.6. Let $A, B, Z_1, Z_2, \ldots$ be i.i.d., each equal to $+1$ with probability $2/3$, or equal to 0 with probability $1/3$. Let $Y = \sum_{i=1}^{\infty} Z_i 2^{-i}$ as at the beginning of this section (so $\mathcal{L}(Y)$ is singular continuous), and let $W \sim N(0,1)$. Finally, let $X = A(BY + (1-B)W)$, and set $\mu = \mathcal{L}(X)$. Find a discrete measure $\mu_{\mathrm{disc}}$, an absolutely continuous measure $\mu_{\mathrm{ac}}$, and a singular continuous measure $\mu_{\mathrm{s}}$, such that $\mu = \mu_{\mathrm{disc}} + \mu_{\mathrm{ac}} + \mu_{\mathrm{s}}$.

Solution. Using conditional probability, for any Borel measurable set $U \subseteq \mathbb{R}$, it follows that:
\begin{align*} \mathcal{L}(X)(U) = \mathbf{P}(X \in U) &= \mathbf{P}(X \in U \,|\, A = 0)\mathbf{P}(A = 0) + \mathbf{P}(X \in U \,|\, A = 1)\mathbf{P}(A = 1) \\ &= \frac{1}{3}\mathbf{P}(0 \in U) + \frac{2}{3}\mathbf{P}(BY + (1-B)W \in U) \\ &= \frac{1}{3}\delta_0(U) + \frac{2}{3}\big(\mathbf{P}(BY + (1-B)W \in U \,|\, B = 0)\mathbf{P}(B = 0) \\ &\qquad + \mathbf{P}(BY + (1-B)W \in U \,|\, B = 1)\mathbf{P}(B = 1)\big) \\ &= \frac{1}{3}\delta_0(U) + \frac{2}{3}\Big(\frac{1}{3}\mathbf{P}(W \in U) + \frac{2}{3}\mathbf{P}(Y \in U)\Big) \\ &= \frac{1}{3}\delta_0(U) + \frac{2}{9}\mathcal{L}(W)(U) + \frac{4}{9}\mathcal{L}(Y)(U). \end{align*}
Accordingly, $\mathcal{L}(X) = \frac{1}{3}\delta_0 + \frac{2}{9}\mathcal{L}(W) + \frac{4}{9}\mathcal{L}(Y)$, so we may take $\mu_{\mathrm{disc}} = \frac{1}{3}\delta_0$, $\mu_{\mathrm{ac}} = \frac{2}{9}\mathcal{L}(W)$, and $\mu_{\mathrm{s}} = \frac{4}{9}\mathcal{L}(Y)$.
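A quick simulation can sanity-check the decomposition: the only atom of $\mathcal{L}(X)$ is at 0, with mass $\mu_{\mathrm{disc}}(\{0\}) = 1/3$. A rough sketch (illustration only; the series for $Y$ is truncated at 40 terms):

```python
import random

random.seed(1)

def bern():
    # +1 with probability 2/3, else 0
    return 1 if random.random() < 2 / 3 else 0

def sample_X():
    A, B = bern(), bern()
    Y = sum(bern() * 2.0 ** (-i) for i in range(1, 40))   # truncated series for Y
    W = random.gauss(0.0, 1.0)
    return A * (B * Y + (1 - B) * W)

trials = 30000
zeros = sum(1 for _ in range(trials) if sample_X() == 0.0)
assert abs(zeros / trials - 1 / 3) < 0.02
```

The events $\{A = 1, Y = 0\}$ and $\{A = 1, W = 0\}$ contribute probability (essentially) zero, so the observed frequency of exact zeros matches the atom's mass $1/3$.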
Exercise 12.3.8. Let $\mu$ and $\nu$ be probability measures with $\nu \ll \mu$ and $\mu \ll \nu$. (This is sometimes written $\mu \approx \nu$.) Prove that $\frac{d\nu}{d\mu} > 0$ with $\mu$-probability 1, and in fact $\frac{d\mu}{d\nu} = 1 \big/ \frac{d\nu}{d\mu}$.

Solution. By Exercise 12.3.7,
\[ 1 = \frac{d\mu}{d\mu} = \frac{d\mu}{d\nu}\frac{d\nu}{d\mu} \]
with $\mu$-probability 1. We claim that:
\[ \frac{d\mu}{d\nu} = 1 \Big/ \frac{d\nu}{d\mu} \quad \text{with } \mu\text{-probability 1}. \quad (\star) \]
To prove the above equality we notice that:
\[ \nu\Big(\Big\{\omega : \frac{d\nu}{d\mu}(\omega) = 0\Big\}\Big) = \int_{\{\omega : \frac{d\nu}{d\mu}(\omega) = 0\}} \frac{d\nu}{d\mu}\,d\mu = 0; \]
consequently, since $\mu \ll \nu$, $\frac{d\nu}{d\mu} > 0$ with $\mu$-probability 1. Hence, $(\star)$ holds with $\mu$-probability 1. Of course, on the set $\{\omega : \frac{d\nu}{d\mu}(\omega) = 0\}$ the right hand side of $(\star)$ can be defined as zero. Eventually, the final assertion follows by the symmetric relationship of $\mu$ and $\nu$.
Exercise 12.3.10. Let $\mu$ and $\nu$ be absolutely continuous probability measures on $\mathbb{R}$ (with the Borel $\sigma$-algebra), with $\mu \neq \nu$. Write down an explicit Hahn decomposition $\mathbb{R} = A^+ \cup A^-$ for $\phi = \mu - \nu$.

Solution. First, since $\mu$ and $\nu$ are absolutely continuous, they are countably additive (in fact,
\[ \mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \int \mathbf{1}_{\bigcup_{n=1}^{\infty} A_n}\,\frac{d\mu}{d\lambda}\,d\lambda = \int \sum_{n=1}^{\infty} \mathbf{1}_{A_n}\,\frac{d\mu}{d\lambda}\,d\lambda = \sum_{n=1}^{\infty} \int \mathbf{1}_{A_n}\,\frac{d\mu}{d\lambda}\,d\lambda = \sum_{n=1}^{\infty} \mu(A_n), \]
where the union is considered a disjoint one). Next, by linearity it follows that $\phi = \mu - \nu$ is countably additive, as well. Finally, take
\[ A^+ = \Big\{x \in \mathbb{R} : \frac{d\mu}{d\lambda}(x) \geq \frac{d\nu}{d\lambda}(x)\Big\}, \qquad A^- = (A^+)^c; \]
then, by Remark (3), page 145, $\phi(E \cap A^+) \geq 0$ and $\phi(E \cap A^-) \leq 0$ for every Borel set $E$, so this is a Hahn decomposition for $\phi$.
Chapter 13
Conditional probability and expectation
Exercise 13.4.2. Let $\mathcal{G}$ be a sub-$\sigma$-algebra, and let $A$ be any event. Define the random variable $X$ to be the indicator function $\mathbf{1}_A$. Prove that $\mathbf{E}(X|\mathcal{G}) = \mathbf{P}(A|\mathcal{G})$ with probability 1.

Solution. By definition, both $\mathbf{E}(\mathbf{1}_A|\mathcal{G})$ and $\mathbf{P}(A|\mathcal{G})$ are $\mathcal{G}$-measurable random variables. Next, by equations (13.2.1) and (13.2.2):
\[ \mathbf{E}(\mathbf{P}(A|\mathcal{G})\mathbf{1}_G) = \mathbf{P}(A \cap G) = \mathbf{E}(\mathbf{1}_{A \cap G}) = \mathbf{E}(\mathbf{1}_A\mathbf{1}_G) = \mathbf{E}(\mathbf{E}(\mathbf{1}_A|\mathcal{G})\mathbf{1}_G), \]
for any $G \in \mathcal{G}$. Consequently, $\mathbf{E}(\mathbf{1}_A|\mathcal{G}) = \mathbf{P}(A|\mathcal{G})$ with probability 1.
Exercise 13.4.4. Let $X$ and $Y$ be random variables with joint distribution given by $\mathcal{L}(X,Y) = d\mathbf{P} = f(x,y)\lambda_2(dx,dy)$, where $\lambda_2$ is two-dimensional Lebesgue measure, and $f : \mathbb{R}^2 \to \mathbb{R}$ is a non-negative Borel measurable function with $\int_{\mathbb{R}^2} f \, d\lambda_2 = 1$. Show that we can take $\mathbf{P}(Y \in B \,|\, X) = \int_B g_X(y)\lambda(dy)$ and $\mathbf{E}(Y|X) = \int_{\mathbb{R}} y\,g_X(y)\lambda(dy)$, where the function $g_x : \mathbb{R} \to \mathbb{R}$ is defined by $g_x(y) = \frac{f(x,y)}{\int_{\mathbb{R}} f(x,t)\lambda(dt)}$ whenever $\int_{\mathbb{R}} f(x,t)\lambda(dt)$ is positive and finite, otherwise (say) $g_x(y) = 0$.

Solution. First, let $\Phi_B^Y(x) = \int_B g_x(y)\lambda(dy)$. Then, $\Phi_B^Y(X)$ is a $\sigma(X)$-measurable random variable and, for $A = \{Y \in B\}$ in Definition 13.1.4, an application of Fubini's Theorem in Real Analysis yields:
\begin{align*} \mathbf{E}(\Phi_B^Y(X)\mathbf{1}_{X \in S}) &= \mathbf{E}\Big(\int_B g_X(y)\lambda(dy)\,\mathbf{1}_{X \in S}\Big) \\ &= \int_S \int_B g_x(y)f_X(x)\,\lambda(dy)\lambda(dx) \\ &= \iint_{S \times B} f(x,y)\,\lambda_2(dx,dy) \\ &= \mathbf{P}(\{Y \in B\} \cap \{X \in S\}), \end{align*}
for all Borel $S \subseteq \mathbb{R}$, proving that $\mathbf{P}(Y \in B|X) = \int_B g_X(y)\lambda(dy)$ with probability 1.
Second, by considering $\mathbf{E}(\mathbf{1}_B|X) = \mathbf{P}(Y \in B|X)$, the second equation follows from the first equation for the special case $Y = \mathbf{1}_B$. Now, by the usual linearity and monotone convergence arguments, the second equality follows for general $Y$.
Exercise 13.4.6. Let $\mathcal{G}$ be a sub-$\sigma$-algebra, and let $X$ and $Y$ be two independent random variables. Prove by example that $\mathbf{E}(X|\mathcal{G})$ and $\mathbf{E}(Y|\mathcal{G})$ need not be independent.

Solution. Let $X, Y \sim N(0,1)$ be independent, and consider the sub-$\sigma$-algebra $\mathcal{G} = \sigma(X+Y)$. (Note that the trivial $\sigma$-algebra would not do: constant random variables are independent of everything.) Since $X$ and $Y$ are i.i.d., by symmetry
\[ \mathbf{E}(X|\mathcal{G}) = \mathbf{E}(Y|\mathcal{G}) = \frac{X+Y}{2} \]
with probability 1. Thus $\mathbf{E}(X|\mathcal{G})$ and $\mathbf{E}(Y|\mathcal{G})$ are equal to the same non-degenerate random variable, showing that they are not independent.
Exercise 13.4.8. Suppose $Y$ is $\mathcal{G}$-measurable. Prove that $\mathrm{Var}(Y|\mathcal{G}) = 0$.

Solution. Since $Y$ and $\mathbf{E}(Y|\mathcal{G})$ are $\mathcal{G}$-measurable, it follows that $Y - \mathbf{E}(Y|\mathcal{G})$ is $\mathcal{G}$-measurable, too. Now, an application of Proposition 13.2.6 yields:
\begin{align*} \mathrm{Var}(Y|\mathcal{G}) &= \mathbf{E}((Y - \mathbf{E}(Y|\mathcal{G}))^2|\mathcal{G}) \\ &= (Y - \mathbf{E}(Y|\mathcal{G}))\,\mathbf{E}(Y - \mathbf{E}(Y|\mathcal{G})|\mathcal{G}) \\ &= (Y - \mathbf{E}(Y|\mathcal{G}))(\mathbf{E}(Y|\mathcal{G}) - \mathbf{E}(Y|\mathcal{G})) \\ &= 0. \end{align*}
Exercise 13.4.10. Give an example of jointly defined random variables which are not independent, but such that $\mathbf{E}(Y|X) = \mathbf{E}(Y)$ w.p. 1.

Solution. As in the second example of Exercise 12.3.4(b), let $X \sim N(0,1)$ and $Y = XU$, where $U$ is independent of $X$ with $\mathbf{P}(U = 1) = \mathbf{P}(U = -1) = \frac{1}{2}$. Then $X$ and $Y$ are not independent (indeed $|Y| = |X|$), while $\mathbf{E}(Y) = \mathbf{E}(X)\mathbf{E}(U) = 0$ and, for any Borel $S \subseteq \mathbb{R}$:
\[ \mathbf{E}(Y\mathbf{1}_{X \in S}) = \mathbf{E}(XU\mathbf{1}_{X \in S}) = \mathbf{E}(U)\mathbf{E}(X\mathbf{1}_{X \in S}) = 0 = \mathbf{E}(\mathbf{E}(Y)\mathbf{1}_{X \in S}). \]
Accordingly, by Definition 13.1.4, $\mathbf{E}(Y|X) = 0 = \mathbf{E}(Y)$ w.p. 1.
Exercise 13.4.12. Let $\{Z_n\}$ be independent, each with finite mean. Let $X_0 = a$, and $X_n = a + Z_1 + \cdots + Z_n$ for $n \geq 1$. Prove that
\[ \mathbf{E}(X_{n+1}|X_0, X_1, \ldots, X_n) = X_n + \mathbf{E}(Z_{n+1}). \]

Solution. Let $\mathcal{G} = \sigma(X_0, X_1, \ldots, X_n)$. Then, $X_n$ and $\mathbf{E}(Z_{n+1})$ are $\mathcal{G}$-measurable and so is $X_n + \mathbf{E}(Z_{n+1})$. Since $Z_1, Z_2, \ldots, Z_n, Z_{n+1}$ are independent, by Exercise 3.6.6 and induction it follows that $Z_{n+1}$ and $\mathbf{1}_G$ are independent for all $G \in \mathcal{G}$. Consequently:
\begin{align*} \mathbf{E}((X_n + \mathbf{E}(Z_{n+1}))\mathbf{1}_G) &= \mathbf{E}(X_n\mathbf{1}_G) + \mathbf{E}(\mathbf{E}(Z_{n+1})\mathbf{1}_G) \\ &= \mathbf{E}(X_n\mathbf{1}_G) + \mathbf{E}(Z_{n+1}\mathbf{1}_G) \\ &= \mathbf{E}((X_n + Z_{n+1})\mathbf{1}_G) \\ &= \mathbf{E}(X_{n+1}\mathbf{1}_G), \end{align*}
for all $G \in \mathcal{G}$. Thus, $\mathbf{E}(X_{n+1}|X_0, X_1, \ldots, X_n) = X_n + \mathbf{E}(Z_{n+1})$ w.p. 1.
Chapter 14
Martingales
Exercise 14.4.2. Let $\{X_n\}$ be a submartingale, and let $a \in \mathbb{R}$. Let $Y_n = \max(X_n, a)$. Prove that $\{Y_n\}$ is also a submartingale.

Solution. First,
\[ \mathbf{E}(|Y_n|) \leq \mathbf{E}(\max(|X_n|, |a|)) \leq \mathbf{E}(|X_n|) + |a| < \infty, \]
for all $n \in \mathbb{N}$. Second, using the conditional version of Exercise 4.5.2, it follows that:
\begin{align*} \mathbf{E}(Y_{n+1}|\sigma(Y_0, \ldots, Y_n)) &= \mathbf{E}(\max(X_{n+1}, a)|\sigma(X_0, \ldots, X_n)) \\ &\geq \max(\mathbf{E}(X_{n+1}|\sigma(X_0, \ldots, X_n)), a) \\ &\geq \max(X_n, a) \\ &= Y_n, \end{align*}
w.p. 1, for all $n \in \mathbb{N}$.
Exercise 14.4.4. The conditional Jensen's inequality states that if $\phi$ is a convex function, then $\mathbf{E}(\phi(X)|\mathcal{G}) \geq \phi(\mathbf{E}(X|\mathcal{G}))$.
(a) Assuming this, prove that if $\{X_n\}$ is a submartingale, then so is $\{\phi(X_n)\}$ whenever $\phi$ is non-decreasing and convex with $\mathbf{E}(|\phi(X_n)|) < \infty$, for all $n$.
(b) Show that the conclusions of the two previous exercises follow from part (a).

Solution. (a) First, by assumption, $\mathbf{E}(|\phi(X_n)|) < \infty$ for all $n$. Second, using the conditional Jensen's inequality it follows that:
\begin{align*} \mathbf{E}(\phi(X_{n+1})|\sigma(\phi(X_0), \ldots, \phi(X_n))) &= \mathbf{E}(\phi(X_{n+1})|\sigma(X_0, \ldots, X_n)) \\ &\geq \phi(\mathbf{E}(X_{n+1}|\sigma(X_0, \ldots, X_n))) \\ &\geq \phi(X_n), \end{align*}
w.p. 1, for all $n \in \mathbb{N}$.
(b) It is sufficient to consider the non-decreasing and convex functions $\phi_1(x) = \max(x, a)$ and $\phi_2(x) = x^2$ in Exercise 14.4.2 and Exercise 14.4.3, respectively.
Exercise 14.4.6. Let $\{X_n\}$ be a stochastic process, let $\tau$ and $\rho$ be two non-negative-integer valued random variables, and let $m \in \mathbb{N}$.
(a) Prove that $\tau$ is a stopping time for $\{X_n\}$ if and only if $\{\tau \leq n\} \in \sigma(X_0, \ldots, X_n)$ for all $n \geq 0$.
(b) Prove that if $\tau$ is a stopping time, then so is $\min(\tau, m)$.
(c) Prove that if $\tau$ and $\rho$ are stopping times for $\{X_n\}$, then so is $\min(\tau, \rho)$.

Solution. (a) Let $\tau$ satisfy $\{\tau \leq n\} \in \sigma(X_0, \ldots, X_n)$ for all $n \geq 0$. Then,
\[ \{\tau = n\} = \{\tau \leq n\} \cap \{\tau \leq n-1\}^c \in \sigma(X_0, \ldots, X_n), \]
for all $n \geq 0$. Thus, $\tau$ is a stopping time.
Conversely, let $\tau$ be a stopping time and let $n \geq 0$ be given. Then,
\[ \{\tau = m\} \in \sigma(X_0, \ldots, X_m) \subseteq \sigma(X_0, \ldots, X_n) \]
for all $0 \leq m \leq n$, and, consequently,
\[ \{\tau \leq n\} = \bigcup_{m=0}^{n} \{\tau = m\} \in \sigma(X_0, \ldots, X_n). \]
(b) First, $\min(\tau, m)$ is a non-negative integer-valued random variable. Second, let $n \geq 0$ be given. Then,
\[ \{\min(\tau, m) \leq n\} = \{\tau \leq n\} \cup \{m \leq n\}, \]
and depending on whether $m \leq n$ or $m > n$ we have that $\{m \leq n\} = \Omega$ or $\{m \leq n\} = \emptyset$, respectively. Therefore, $\{\min(\tau, m) \leq n\} \in \sigma(X_0, \ldots, X_n)$. Now, the assertion follows by part (a).
(c) First, since $\tau$ and $\rho$ are non-negative-integer valued random variables, $\min(\tau, \rho)$ has the same property. Second, let $n \geq 0$ be given. Then, by part (a), $\{\tau \leq n\}, \{\rho \leq n\} \in \sigma(X_0, \ldots, X_n)$ and, consequently,
\[ \{\min(\tau, \rho) \leq n\} = \{\tau \leq n\} \cup \{\rho \leq n\} \in \sigma(X_0, \ldots, X_n). \]
Finally, another application of part (a) proves the desired result.
Exercise 14.4.8. Let $\{X_n\}$ be simple symmetric random walk, with $X_0 = 0$. Let $\tau = \inf\{n \geq 5 : X_{n+1} = X_n + 1\}$ be the first time after 4 which is just before the chain increases. Let $\rho = \tau + 1$.
(a) Is $\tau$ a stopping time? Is $\rho$ a stopping time?
(b) Use Theorem 14.1.5 to compute $\mathbf{E}(X_\rho)$.
(c) Use the result of part (b) to compute $\mathbf{E}(X_\tau)$. Why does this not contradict Theorem 14.1.5?

Solution. (a) No, $\tau$ is not a stopping time, because
\[ \{\tau = n\} = \{X_6 \neq X_5 + 1, \ldots, X_n \neq X_{n-1} + 1, X_{n+1} = X_n + 1\} \notin \sigma(X_0, \ldots, X_n) \]
for all $n \geq 6$.
Yes, $\rho$ is a stopping time. First, $\rho = \tau + 1$ is a non-negative integer-valued random variable. Second, $\{\rho = n\} = \{\tau = n-1\} = \emptyset$ if $0 \leq n \leq 5$, and $\{\rho = n\} = \{X_6 \neq X_5 + 1, \ldots, X_{n-1} \neq X_{n-2} + 1, X_n = X_{n-1} + 1\}$ if $n \geq 6$, implying that $\{\rho = n\} \in \sigma(X_0, \ldots, X_n)$.
(b) $\mathbf{E}(X_\rho) = \mathbf{E}(X_0) = 0$.
(c) By the definition of $\rho$, we have $X_\rho = X_\tau + 1$ and, consequently, $\mathbf{E}(X_\tau) = \mathbf{E}(X_\rho) - 1 = -1$. This result does not contradict Theorem 14.1.5, due to the fact that $\tau$ is not a stopping time and, consequently, the assumption of that Theorem is violated.
Exercise 14.4.10. Let $0 < a < c$ be integers. Let $\{X_n\}$ be simple symmetric random walk, started at $X_0 = a$. Let $\tau = \inf\{n \geq 1 : X_n = 0 \text{ or } c\}$.
(a) Prove that $\{X_n\}$ is a martingale.
(b) Prove that $\mathbf{E}(X_\tau) = a$.
(c) Use this fact to derive an alternative proof of the gambler's ruin formula given in Section 7.2, for the case $p = 1/2$.

Solution. (a) First, let $Y_n = X_n - a$ $(n \in \mathbb{N})$. Then, by equation 14.0.2, it is a martingale. Hence,
\[ \mathbf{E}(|X_n|) \leq \mathbf{E}(|Y_n|) + |a| < \infty, \]
for all $n \geq 0$. Second, considering the definition of simple random walk on page 75, it follows that:
\begin{align*} \mathbf{E}(X_{n+1}|\sigma(X_0, \ldots, X_n)) &= \mathbf{E}(X_n + Z_{n+1}|\sigma(X_0, \ldots, X_n)) \\ &= X_n + \mathbf{E}(Z_{n+1}|\sigma(X_0, \ldots, X_n)) \\ &= X_n + \mathbf{E}(Z_{n+1}|\sigma(Z_1, \ldots, Z_n)) \\ &= X_n + \mathbf{E}(Z_{n+1}) \\ &= X_n, \end{align*}
for all $n \geq 0$.
(b) By $\tau = \min(\tau_0, \tau_c)$ and $\mathbf{P}(\tau_0 < \infty) = 1 = \mathbf{P}(\tau_c < \infty)$, it follows that $\mathbf{P}(\tau < \infty) = 1$. Next, $|X_n|\mathbf{1}_{n \leq \tau} \leq c\,\mathbf{1}_{n \leq \tau}$ for all $n \geq 0$. Consequently, by Corollary 14.1.7, it follows that $\mathbf{E}(X_\tau) = \mathbf{E}(X_0) = a$.
(c) By part (b),
\[ a = \mathbf{E}(X_\tau) = 0 \cdot \mathbf{P}(X_\tau = 0) + c \cdot \mathbf{P}(X_\tau = c) = c\,\mathbf{P}(\tau_c < \tau_0) = c\,S(a), \]
implying $S(a) = \frac{a}{c}$.
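The gambler's ruin formula $S(a) = a/c$ from part (c) is easy to confirm by simulation (illustration only):

```python
import random

random.seed(2)

def hits_c_first(a, c):
    # run the symmetric walk from a until it hits 0 or c
    x = a
    while 0 < x < c:
        x += random.choice((-1, 1))
    return x == c

a, c, trials = 3, 10, 20000
wins = sum(hits_c_first(a, c) for _ in range(trials))
assert abs(wins / trials - a / c) < 0.02
```

The empirical hitting frequency is close to $a/c = 0.3$, up to Monte Carlo error of order $1/\sqrt{\text{trials}}$.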
Exercise 14.4.12. Let $S_n$ and $\alpha$ be as in Example 14.1.13.
(a) Prove that $\mathbf{E}(\alpha) < \infty$.
(b) Prove that $S_\alpha = -\alpha + 10$.

Solution. (a) First, we claim that
\[ \mathbf{P}(\alpha > 3m) \leq \Big(\frac{7}{8}\Big)^m, \]
for all $m \in \mathbb{N}$. To prove it we use induction. Let $m = 1$; then
\[ \mathbf{P}(\alpha > 3) = 1 - \mathbf{P}(\alpha = 3) = 1 - \frac{1}{8} = \frac{7}{8}. \]
Assume the assertion holds for the positive integer $m > 1$. Then,
\begin{align*} \mathbf{P}(\alpha > 3(m+1)) &= \mathbf{P}(\alpha > 3m,\ (r_{3m+1}, r_{3m+2}, r_{3m+3}) \neq (1, 0, 1)) \\ &= \mathbf{P}(\alpha > 3m)\,\mathbf{P}((r_{3m+1}, r_{3m+2}, r_{3m+3}) \neq (1, 0, 1)) \\ &\leq \Big(\frac{7}{8}\Big)^m \cdot \frac{7}{8} = \Big(\frac{7}{8}\Big)^{m+1}, \end{align*}
proving the assertion for the positive integer $m + 1$. Second, using Proposition 4.2.9, we have that
\begin{align*} \mathbf{E}(\alpha) &= \sum_{k=0}^{\infty} \mathbf{P}(\alpha > k) \\ &= \sum_{m=1}^{\infty} \big(\mathbf{P}(\alpha > 3m-1) + \mathbf{P}(\alpha > 3m-2) + \mathbf{P}(\alpha > 3m-3)\big) \\ &\leq 3\sum_{m=1}^{\infty} \mathbf{P}(\alpha > 3m-3) \\ &= 3 + 3\sum_{m=1}^{\infty} \mathbf{P}(\alpha > 3m) \\ &\leq 3 + 3\sum_{m=1}^{\infty} \Big(\frac{7}{8}\Big)^m < \infty. \end{align*}
(b) We consider $\alpha - 2$ different players. Each of those numbered 1 to $\alpha - 3$ has bet and lost \$1 (the person has lost the \$1 bet; or has won the \$1 bet and then has lost the \$2 bet; or has won the \$1 bet and the next \$2 bet but has lost the \$4 bet; in each of these three different cases, the person has lost \$1 in total). The person numbered $\alpha - 2$ has won all of the \$1, \$2, and \$4 bets successively and has won \$7 in total. Accordingly,
\[ S_\alpha = (\alpha - 3)(-1) + 1 \cdot 7 = -\alpha + 10. \]
Exercise 14.4.14. Why does the proof of Theorem 14.1.1 fail if $M = \infty$?

Solution. Let $\{X_n\}_{n=0}^{\infty}$ be defined by $X_0 = 0$ and $X_n = \sum_{i=1}^{n} Z_i$, where the $Z_i$ are i.i.d. with $\mathbf{P}(Z_i = +1) = \mathbf{P}(Z_i = -1) = \frac{1}{2}$ for all $i \in \mathbb{N}$. Then, by Exercise 13.4.12, it is a martingale. Now, by Exercise 4.5.14(c),
\[ \mathbf{E}\Big(\sum_{i=1}^{\infty} Z_i\Big) \neq \sum_{i=1}^{\infty} \mathbf{E}(Z_i), \]
showing that the proof of Theorem 14.1.1 fails for the case $M = \infty$.
Exercise 14.4.16. Let $\{X_n\}$ be simple symmetric random walk, with $X_0 = 10$. Let $\tau = \min\{n \geq 1 : X_n = 0\}$, and let $Y_n = X_{\min(n,\tau)}$. Determine (with explanation) whether each of the following statements is true or false.
(a) $\mathbf{E}(X_{200}) = 10$.
(b) $\mathbf{E}(Y_{200}) = 10$.
(c) $\mathbf{E}(X_\tau) = 10$.
(d) $\mathbf{E}(Y_\tau) = 10$.
(e) There is a random variable $X$ such that $X_n \to X$ a.s.
(f) There is a random variable $Y$ such that $Y_n \to Y$ a.s.

Solution. (a) True. Since $\{X_n\}$ is a martingale, using equation 14.0.4 it follows that $\mathbf{E}(X_{200}) = \mathbf{E}(X_0) = 10$.
(b) True. Consider the stopping time $\rho = \min(\tau, 200)$. Then, $0 \leq \rho \leq 200$ and, therefore, by Corollary 14.1.3,
\[ \mathbf{E}(Y_{200}) = \mathbf{E}(X_\rho) = \mathbf{E}(X_0) = 10. \]
(c) False. Since $\mathbf{P}(X_\tau = 0) = 1$, it follows that $\mathbf{E}(X_\tau) = 0 \neq 10$.
(d) False. Since $Y_\tau(\omega) = X_{\min(\tau(\omega),\tau(\omega))}(\omega) = X_{\tau(\omega)}(\omega)$ for all $\omega \in \Omega$, it follows that $\mathbf{E}(Y_\tau) = \mathbf{E}(X_\tau) = 0 \neq 10$.
(e) False. Since $\limsup_n X_n = +\infty$ and $\liminf_n X_n = -\infty$, there is no finite random variable $X$ with $\lim_n X_n = X$ a.s.
(f) True. For the non-negative martingale $\{Y_n\}$ an application of Corollary 14.2.2 yields the existence of a finite random variable $Y$ with $\lim_n Y_n = Y$ a.s.
Exercise 14.4.18. Let $0 < p < 1$ with $p \neq 1/2$, and let $0 < a < c$ be integers. Let $\{X_n\}$ be simple random walk with parameter $p$, started at $X_0 = a$. Let $\tau = \inf\{n \geq 1 : X_n = 0 \text{ or } c\}$.
(a) Compute $\mathbf{E}(X_\tau)$ by direct computation.
(b) Use Wald's theorem part (a) to compute $\mathbf{E}(\tau)$ in terms of $\mathbf{E}(X_\tau)$.
(c) Prove that the game's expected duration satisfies
\[ \mathbf{E}(\tau) = \Big(a - c\big[((1-p)/p)^a - 1\big]\big/\big[((1-p)/p)^c - 1\big]\Big)\Big/(1 - 2p). \]
(d) Show that the limit of $\mathbf{E}(\tau)$ as $p \to 1/2$ is equal to $a(c-a)$.

Solution. (a)
\[ \mathbf{E}(X_\tau) = c \cdot \mathbf{P}(X_\tau = c) + 0 \cdot \mathbf{P}(X_\tau = 0) = c \cdot \frac{1 - (\frac{1-p}{p})^a}{1 - (\frac{1-p}{p})^c}. \]
(b) From $\mathbf{E}(X_\tau) = a + (p - q)\mathbf{E}(\tau)$ it follows that
\[ \mathbf{E}(\tau) = \frac{1}{q - p}\big(a - \mathbf{E}(X_\tau)\big). \]
(c) By parts (a) and (b), it follows that
\[ \mathbf{E}(\tau) = \frac{1}{1 - 2p}\Big(a - c \cdot \frac{1 - (\frac{1-p}{p})^a}{1 - (\frac{1-p}{p})^c}\Big). \]
(d) Write $r = \frac{1-p}{p}$ for brevity, so that $\frac{dr}{dp} = -\frac{1}{p^2}$. Applying l'Hôpital's rule twice, it follows that:
\begin{align*} \lim_{p \to 1/2} \mathbf{E}(\tau) &= \lim_{p \to 1/2} \frac{a(1 - r^c) - c(1 - r^a)}{(1 - 2p)(1 - r^c)} \\ &\overset{\text{Hop.}}{=} \lim_{p \to 1/2} \frac{ac\,r^{a-1}\big({-\frac{1}{p^2}}\big) - ac\,r^{c-1}\big({-\frac{1}{p^2}}\big)}{-2(1 - r^c) + (1 - 2p)\,c\,r^{c-1}\big(\frac{1}{p^2}\big)} \\ &= \lim_{p \to 1/2} ac \cdot \frac{r^{a-1} - r^{c-1}}{2p^2(1 - r^c) - (1 - 2p)\,c\,r^{c-1}} \\ &\overset{\text{Hop.}}{=} \lim_{p \to 1/2} ac \cdot \frac{(a-1)r^{a-2}\big({-\frac{1}{p^2}}\big) - (c-1)r^{c-2}\big({-\frac{1}{p^2}}\big)}{4p(1 - r^c) + 2p^2\,c\,r^{c-1}\big(\frac{1}{p^2}\big) + 2c\,r^{c-1} - (1 - 2p)c(c-1)r^{c-2}\big({-\frac{1}{p^2}}\big)} \\ &= ac \cdot \frac{(a-1)(-4) - (c-1)(-4)}{0 + 2c + 2c - 0} \\ &= a(c-a). \end{align*}
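The limit in part (d) can also be checked numerically by evaluating the closed form from part (c) near $p = 1/2$ (illustration only; `expected_duration` is our name for the part (c) formula):

```python
def expected_duration(p, a, c):
    # Part (c): E(tau) = (a - c*(r**a - 1)/(r**c - 1)) / (1 - 2p), r = (1-p)/p
    r = (1.0 - p) / p
    return (a - c * (r**a - 1.0) / (r**c - 1.0)) / (1.0 - 2.0 * p)

a, c = 3, 10
for p in (0.5001, 0.49999):
    # limit as p -> 1/2 is a*(c - a) = 21
    assert abs(expected_duration(p, a, c) - a * (c - a)) < 0.1
```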
Exercise 14.4.20. Let $\{X_n\}$ be a martingale with $|X_{n+1} - X_n| \leq 10$ for all $n$. Let $\tau = \inf\{n \geq 1 : |X_n| \geq 100\}$.
(a) Prove or disprove that this implies that $\mathbf{P}(\tau < \infty) = 1$.
(b) Prove or disprove that this implies there is a random variable $X$ with $X_n \to X$ a.s.
(c) Prove or disprove that this implies that
\[ \mathbf{P}(\tau < \infty, \text{ or there is a random variable } X \text{ with } X_n \to X) = 1. \]

Solution. (a) No. As a counterexample, let $X_0 = 0$ and $X_n = \sum_{i=1}^{n} Z_i$, where the $Z_i$ are independent with
\[ \mathbf{P}(Z_i = 2^{-i}) = \frac{1}{2} \quad \text{and} \quad \mathbf{P}(Z_i = -2^{-i}) = \frac{1}{2} \]
for all $i \in \mathbb{N}$. Since $\mathbf{E}(Z_i) = 0$ for all $i \in \mathbb{N}$, by Exercise 13.4.12, $\{X_n\}$ is a martingale. Clearly, $|X_{n+1} - X_n| < 2 < 10$ for all $n \in \mathbb{N}$. But,
\[ |X_n| = \Big|\sum_{i=1}^{n} Z_i\Big| \leq \sum_{i=1}^{n} |Z_i| < 2 \]
for all $n \in \mathbb{N}$, implying $\tau = \inf \emptyset = \infty$, and consequently, $\mathbf{P}(\tau < \infty) = 0 \neq 1$.
(b) No. As a counterexample, consider the simple symmetric random walk with $X_0 = 0$. In this case, $|X_{n+1} - X_n| \leq 1 \leq 10$ for all $n$; however, $\limsup_n X_n = +\infty$ and $\liminf_n X_n = -\infty$, showing that there is no finite random variable $X$ with $\lim_n X_n = X$ a.s.
(c) Yes. Define:
\[ A_\infty = \{\limsup_n X_n = +\infty \text{ and } \liminf_n X_n = -\infty\}, \]
and
\[ B = \{\text{there is a random variable } X \text{ with } X_n \to X\}. \]
We claim that $\mathbf{P}(A_\infty \cup B) = 1$. To do this, let $a \in \mathbb{N}$ and define $\rho = \inf\{n \geq 1 : X_n \geq a\}$. Then, $\{X_{\min(\rho,n)}\}$ is a martingale satisfying:
\[ \sup_n \mathbf{E}(X^+_{\min(\rho,n)}) \leq a + \mathbf{E}(\sup_n |X_{n+1} - X_n|) \leq a + 10 < \infty. \]
Thus, by Theorem 14.2.1, $X_{\min(\rho,n)}$ converges a.s. Hence, $X_n$ converges a.s. on $\{\rho = \infty\} = \{\sup_n X_n < a\}$. Let $a \to \infty$; then it follows that $X_n$ converges a.s. on $\{\sup_n X_n < \infty\}$. A symmetric argument applied to $\{-X_n\}$ shows that $X_n$ converges a.s. on $\{\inf_n X_n > -\infty\}$, proving our claim. Now, since $\{\tau = \infty\} \subseteq \{\sup_n |X_n| < \infty\} \subseteq A_\infty^c$, on $\{\tau = \infty\}$ the event $B$ must hold (up to a null set), and the desired result is proved.
Chapter 15
General stochastic processes
Exercise 15.1.6. Let $X_1, X_2, \ldots$ be independent, with $X_n \sim \mu_n$.
(a) Specify the finite-dimensional distributions $\mu_{t_1, t_2, \ldots, t_k}$ for distinct non-negative integers $t_1 < t_2 < \cdots < t_k$.
(b) Prove that these $\mu_{t_1, t_2, \ldots, t_k}$ satisfy (15.1.1) and (15.1.2).
(c) Prove that Theorem 7.1.1 follows from Theorem 15.1.3.

Solution. (a) Since the $\sigma$-algebra of $k$-dimensional Borel sets is generated by the class of all bounded rectangles $H_R = \prod_{n=1}^{k} I_n$ where $I_n = (a_n, b_n]$ $(1 \leq n \leq k)$, it is sufficient to specify the distribution $\mu_{t_1, t_2, \ldots, t_k}$ on the $H_R$'s. Now, we have that:
\[ \mu_{t_1, t_2, \ldots, t_k}\Big(\prod_{n=1}^{k} I_n\Big) = \mathbf{P}(X_{t_n} \in I_n : 1 \leq n \leq k) = \prod_{n=1}^{k} \mathbf{P}(X_{t_n} \in I_n) = \prod_{n=1}^{k} \mu_{t_n}(I_n). \quad (\star) \]
(b) To prove (15.1.1), several consecutive applications of Proposition 3.3.1 on both sides of the equality $(\star)$ in part (a) imply that:
\[ \mu_{t_1, t_2, \ldots, t_k}\Big(\prod_{n=1}^{k} H_n\Big) = \prod_{n=1}^{k} \mu_{t_n}(H_n) = \prod_{n=1}^{k} \mu_{t_{s(n)}}(H_{s(n)}) = \mu_{t_{s(1)}, t_{s(2)}, \ldots, t_{s(k)}}\Big(\prod_{n=1}^{k} H_{s(n)}\Big), \]
for all Borel $H_1, H_2, \ldots, H_k \subseteq \mathbb{R}$ and all permutations $(s(n))_{n=1}^{k}$ of $(n)_{n=1}^{k}$.
To prove (15.1.2), by a similar argument used for the proof of (15.1.1), it follows that:
\[ \mu_{t_1, t_2, \ldots, t_k}(H_1 \times H_2 \times \cdots \times H_{k-1} \times \mathbb{R}) = \prod_{n=1}^{k-1} \mu_{t_n}(H_n) \cdot \mu_{t_k}(\mathbb{R}) = \prod_{n=1}^{k-1} \mu_{t_n}(H_n) = \mu_{t_1, t_2, \ldots, t_{k-1}}(H_1 \times H_2 \times \cdots \times H_{k-1}), \]
for all Borel $H_1, H_2, \ldots, H_{k-1} \subseteq \mathbb{R}$.
(c) We consider the family of Borel probability measures $\{\mu_{t_1, t_2, \ldots, t_k} : k \in \mathbb{N},\ t_i \in \mathbb{N} \text{ distinct}\}$ with $\mu_{t_1, t_2, \ldots, t_k}$ defined by
\[ \mu_{t_1, t_2, \ldots, t_k}\Big(\prod_{n=1}^{k} H_n\Big) = \prod_{n=1}^{k} \mu_{t_n}(H_n), \]
for all Borel $H_1, H_2, \ldots, H_k \subseteq \mathbb{R}$. Then, by part (b), it satisfies the consistency conditions (C1) and (C2). Consequently, by Theorem 15.1.3, there is a probability space $(\mathbb{R}^{\mathbb{N}}, \mathcal{F}^{\mathbb{N}}, \mathbf{P})$ and random variables $\{X_n\}$ defined on that triple such that
\[ \mu_{t_1, t_2, \ldots, t_k}(H) = \mathbf{P}((X_{t_1}, X_{t_2}, \ldots, X_{t_k}) \in H), \quad (\star\star) \]
for all $k \in \mathbb{N}$, distinct $t_1, \ldots, t_k \in \mathbb{N}$ and Borel $H \subseteq \mathbb{R}^k$. Now, put
\[ H = \mathbb{R} \times \cdots \times \mathbb{R} \times H_n \times \mathbb{R} \times \cdots \times \mathbb{R}, \]
where $H_n \subseteq \mathbb{R}$ is Borel $(1 \leq n \leq k)$. Then, by $(\star\star)$:
\[ \mu_{t_n}(H_n) = \mathbf{P}(X_{t_n} \in H_n) = \mathcal{L}(X_{t_n})(H_n), \]
for all $n \in \mathbb{N}$.
Exercise 15.2.4. Consider a Markov chain which is $\phi$-irreducible with respect to some non-zero $\sigma$-finite measure $\phi$, and which is periodic with corresponding disjoint subsets $A_1, \ldots, A_d$. Let $B = \bigcup_i A_i$.
(a) Prove that $P^n(x, B^c) = 0$ for all $x \in B$.
(b) Prove that $\phi(B^c) = 0$.
(c) Prove that $\phi(A_i) > 0$ for some $i$.

Solution. (a) We prove the assertion by induction. Let $n = 1$ and $x \in B = \bigcup_{i=1}^{d} A_i$. Then, there exists a unique $1 \leq i \leq d$ such that $x \in A_i$, and consequently, $P(x, A_{i+1}) = 1$ if $1 \leq i \leq d-1$, or $P(x, A_1) = 1$ if $i = d$. Hence,
\[ 0 \leq P(x, B^c) \leq P(x, A_{i+1}^c) = 0 \quad \text{if } 1 \leq i \leq d-1, \]
or
\[ 0 \leq P(x, B^c) \leq P(x, A_1^c) = 0 \quad \text{if } i = d, \]
implying $P^1(x, B^c) = 0$. Next, assume the assertion holds for the positive integer $n > 1$. Then, by $P^n(z, B^c) = 0$ $(z \in B)$, it follows that:
\begin{align*} P^{n+1}(x, B^c) &= \int_{\mathcal{X}} P(x, dz)P^n(z, B^c) \\ &= \int_{B} P(x, dz)P^n(z, B^c) + \int_{B^c} P(x, dz)P^n(z, B^c) \\ &= 0, \end{align*}
proving the assertion for the positive integer $n + 1$.
(b) Since $\tau_{B^c} = \inf\{n \geq 0 : X_n \in B^c\} = \inf \emptyset = \infty$ whenever the chain starts in $B$, it follows that $\mathbf{P}_x(\tau_{B^c} < \infty) = 0$ for all $x \in B$, and, consequently, by $\phi$-irreducibility, $\phi(B^c) = 0$.
(c) Using part (b), we have that:
\[ \sum_{i=1}^{d} \phi(A_i) = \phi(B) = \phi(\mathcal{X}) - \phi(B^c) = \phi(\mathcal{X}) > 0, \]
implying that $\phi(A_i) > 0$ for some $1 \leq i \leq d$.
Exercise 15.2.6. (a) Prove that a Markov chain on a countable state space $\mathcal{X}$ is $\phi$-irreducible if and only if there is $j \in \mathcal{X}$ such that $\mathbf{P}_i(\tau_j < \infty) > 0$, for all $i \in \mathcal{X}$.
(b) Give an example of a Markov chain on a countable state space which is $\phi$-irreducible, but which is not irreducible in the sense of Subsection 8.2.

Solution. (a) Let the given Markov chain on the countable state space $\mathcal{X}$ be $\phi$-irreducible. Then, since $\phi$ is non-zero, for some $A = \{j\} \in \mathcal{F}$ with $\phi(A) > 0$ we have $\tau_A = \tau_j$, implying $\mathbf{P}_i(\tau_j < \infty) > 0$ for all $i \in \mathcal{X}$. Conversely, let there be $j \in \mathcal{X}$ such that $\mathbf{P}_i(\tau_j < \infty) > 0$ for all $i \in \mathcal{X}$. Then, we consider the $\sigma$-finite measure $\phi$ on $\mathcal{F}$ defined by:
\[ \phi(A) = \delta_j(A), \]
for all $A \in \mathcal{F}$. Now, let $A \in \mathcal{F}$ with $\phi(A) > 0$; then $j \in A$, implying that $\tau_A \leq \tau_j$ and, consequently,
\[ \mathbf{P}_i(\tau_A < \infty) \geq \mathbf{P}_i(\tau_j < \infty) > 0 \]
for all $i \in \mathcal{X}$.
(b) Consider the Markov chain given in the solution of Exercise 8.5.20(a), which is reducible in the sense of Subsection 8.2. However, for $j = 2$ we have that:
\[ \mathbf{P}_i(\tau_2 < \infty) = \mathbf{P}(\exists\, n \geq 0 : X_n = 2 \,|\, X_0 = i) \geq \sum_{i_1, \ldots, i_{n-1}} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-2} i_{n-1}} p_{i_{n-1} 2} > 0, \]
for $i = 1, 2$. Hence, by part (a), this Markov chain is $\phi$-irreducible.
Exercise 15.2.8. Consider a discrete Markov chain with state space $\mathcal{X} = \mathbb{R}$, and with transition probabilities such that $P(x, \cdot)$ is uniform on the interval $[x-1, x+1]$. Determine whether or not this chain is $\phi$-irreducible.

Solution. Yes, this chain is $\phi$-irreducible. Indeed, $P(x, \cdot)$ has positive density (with respect to Lebesgue measure) throughout $[x-1, x+1]$. Then, by the convolution formula, $P^2(x, \cdot)$ has positive density throughout $[x-2, x+2]$. By induction, $P^n(x, \cdot)$ has positive density throughout $[x-n, x+n]$. Now, if $A \subseteq \mathcal{X}$ has positive Lebesgue measure, then $A \cap [x-n, x+n]$ must have positive Lebesgue measure for some $n \in \mathbb{N}$. It then follows that $P^n(x, A) > 0$. Hence, the chain is $\phi$-irreducible where $\phi$ is Lebesgue measure.
Exercise 15.2.10. Consider the Markov chain with $\mathcal{X} = \mathbb{R}$, and with $P(x, \cdot) = N(x, 1)$ for each $x \in \mathcal{X}$.
(a) Prove that this chain is $\phi$-irreducible and aperiodic.
(b) Prove that this chain does not have a stationary distribution. Relate this to Theorem 15.2.3.

Solution. (a) Here $P(x, \cdot)$ has an everywhere-positive density (with respect to Lebesgue measure). It follows that if $A \subseteq \mathcal{X}$ has positive Lebesgue measure, then $P(x, A) > 0$ for all $x \in \mathcal{X}$. So, the chain is $\phi$-irreducible where $\phi$ is Lebesgue measure. As for aperiodicity, if $A_1, \ldots, A_d$ is a periodic decomposition for some $d \geq 2$, then if $x_1 \in A_1$ then $P(x_1, A_2) > 0$. This implies that $A_2$ has positive Lebesgue measure, which in turn implies that $P(x, A_2) > 0$ for all $x \in \mathcal{X}$, even for $x \in A_2$. This contradicts the assumption of periodic decomposition. So, no such periodic decomposition exists, i.e. the chain is aperiodic.
(b) This chain is equivalent to adding an independent $N(0,1)$ random variable at each iteration. So, if a stationary probability distribution $\pi(\cdot)$ existed, it would have to satisfy the property that if $X \sim \pi(\cdot)$ and $Y \sim N(0,1)$ are independent, then $X + Y \sim \pi(\cdot)$. It would follow by induction that if $Y_1, Y_2, \ldots$ are i.i.d. $N(0,1)$ (and independent of $X$), then $Z_n \equiv X + Y_1 + \cdots + Y_n \sim \pi(\cdot)$ for all $n$, which would imply that for any $a \in \mathbb{R}$, we have $\pi((a, \infty)) = \lim_n \mathbf{P}(Z_n > a) = 1/2$. This is impossible, since we must have $\lim_{a \to \infty} \pi((a, \infty)) = 0$. Hence, no such stationary probability distribution exists. Hence, Theorem 15.2.3 does not apply, and the distributions $P^n(x, \cdot)$ need not converge. (In fact, $\lim_n P^n(x, A) = 0$ for every bounded subset $A$.)
Exercise 15.2.12. Show that the finite-dimensional distributions implied by (15.2.1) satisfy the two consistency conditions of the Kolmogorov Existence Theorem. What does this allow us to conclude?

Solution. Since (15.2.1) specifies the probabilities for random variables specifically in the order $X_0, X_1, \ldots, X_n$, the probabilities for random variables in any other order would be found simply by un-permuting them and then applying the same formula (15.2.1). Hence, (C1) is immediate. Similarly, since (15.2.1) specifies the probabilities for the entire sequence $X_0, X_1, \ldots, X_n$, the probabilities for just a subset of these variables would be found by integrating over the missing variables, i.e. by setting $A_i = \mathbb{R}$ for each of them, thus automatically satisfying (C2). Hence, since (C1) and (C2) are satisfied,
it follows that there must exist some probability triple on which random variables can be defined which satisfy the probability specifications (15.2.1). Informally speaking, this says that Markov chains on general state spaces defined by (15.2.1) must actually exist.
Exercise 15.3.6. Let A = 1, 2, and let Q = (q
ij
) be the generator of a continuous time Markov
process on A, with
Q =
_
3 3
6 6
_
.
Compute the corresponding transition probabilities P
t
= (p
t
ij
) of the process, for any t > 0.
Solution. For the given generator matrix Q, an easy argument by induction shows that (Check!):

    Q^m = [ (−1)^m 3^{2m−1}         (−1)^{m−1} 3^{2m−1}   ]
          [ (−1)^{m−1} 2·3^{2m−1}   (−1)^m 2·3^{2m−1}     ],    m ≥ 1.

Now, by Exercise 15.3.5, it follows that:

    P_t = exp(tQ) = I + Σ_{m=1}^∞ (t^m/m!) Q^m
        = [ 1 + (1/3) Σ_{m=1}^∞ (−9t)^m/m!    −(1/3) Σ_{m=1}^∞ (−9t)^m/m!    ]
          [ −(2/3) Σ_{m=1}^∞ (−9t)^m/m!       1 + (2/3) Σ_{m=1}^∞ (−9t)^m/m! ]
        = [ 1 + (1/3)(e^{−9t} − 1)    −(1/3)(e^{−9t} − 1)    ]
          [ −(2/3)(e^{−9t} − 1)       1 + (2/3)(e^{−9t} − 1) ],    (t > 0)
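As a sanity check on this closed form, one can compare it numerically against a truncated power series for exp(tQ). This is an illustrative sketch of our own (the helper names `mat_mul`, `expm_series` and `closed_form` are not from the text):

```python
import math

# Hypothetical numerical check (not part of the solution): compare the closed-form
# P_t derived above with a truncated series for exp(tQ), for Q = [[-3, 3], [6, -6]].

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm_series(Q, t, terms=60):
    """Approximate exp(tQ) = I + sum_{m>=1} t^m Q^m / m! by truncation."""
    P = [[1.0, 0.0], [0.0, 1.0]]      # running sum, starts at I
    term = [[1.0, 0.0], [0.0, 1.0]]   # current t^m Q^m / m!
    for m in range(1, terms):
        term = mat_mul(term, Q)
        term = [[term[i][j] * t / m for j in range(2)] for i in range(2)]
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return P

def closed_form(t):
    u = math.exp(-9.0 * t) - 1.0
    return [[1.0 + u / 3.0, -u / 3.0], [-2.0 * u / 3.0, 1.0 + 2.0 * u / 3.0]]

Q = [[-3.0, 3.0], [6.0, -6.0]]
for t in (0.05, 0.3, 1.0):
    P_series, P_exact = expm_series(Q, t), closed_form(t)
    assert all(abs(P_series[i][j] - P_exact[i][j]) < 1e-9 for i in range(2) for j in range(2))
    assert all(abs(sum(row) - 1.0) < 1e-9 for row in P_exact)  # stochastic rows
```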
Exercise 15.3.8. For a Markov chain on a finite state space A with generator Q, prove that π = (π_i)_{i∈A}
is a stationary distribution if and only if πQ = 0, i.e. if and only if Σ_{i∈A} π_i q_ij = 0 for all j ∈ A.
Solution. First, let π = (π_i)_{i∈A} be a stationary distribution. Then, by equation (15.3.4) and finiteness of
A it follows that:

    Σ_{i∈A} π_i q_ij = Σ_{i∈A} π_i ( lim_{t→0+} (p^t_ij − δ_ij)/t )
                     = lim_{t→0+} ( Σ_{i∈A} π_i p^t_ij − Σ_{i∈A} π_i δ_ij )/t
                     = lim_{t→0+} (π_j − π_j)/t = 0,

for all j ∈ A, i.e. πQ = 0.
Conversely, assume πQ = 0. Then by Exercise 15.3.5 we have that:

    πP_t = π exp(tQ) = π ( I + Σ_{m=1}^∞ (t^m/m!) Q^m ) = π + Σ_{m=1}^∞ (t^m/m!) (πQ)Q^{m−1} = π,

for all t > 0. Hence, π is a stationary distribution.
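To see the equivalence concretely, one can check (our own illustration, not part of the text) that π = (2/3, 1/3) satisfies πQ = 0 for the two-state generator of Exercise 15.3.6, and that it is fixed by the closed-form P_t found there:

```python
import math

# Illustrative check (not from the book): pi = (2/3, 1/3) solves pi Q = 0 for the
# generator of Exercise 15.3.6, and pi P_t = pi for its closed-form P_t.

Q = [[-3.0, 3.0], [6.0, -6.0]]
pi = [2.0 / 3.0, 1.0 / 3.0]

# pi Q = 0, column by column
piQ = [sum(pi[i] * Q[i][j] for i in range(2)) for j in range(2)]
assert all(abs(v) < 1e-12 for v in piQ)

def P(t):
    u = math.exp(-9.0 * t) - 1.0
    return [[1.0 + u / 3.0, -u / 3.0], [-2.0 * u / 3.0, 1.0 + 2.0 * u / 3.0]]

for t in (0.1, 1.0, 5.0):
    piP = [sum(pi[i] * P(t)[i][j] for i in range(2)) for j in range(2)]
    assert all(abs(piP[j] - pi[j]) < 1e-12 for j in range(2))
```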
Exercise 15.3.10. (Poisson Process.) Let λ > 0, let {Z_n} be i.i.d. ∼ Exp(λ), and let T_n = Z_1 + Z_2 +
· · · + Z_n for n ≥ 1. Let {N_t}_{t≥0} be a continuous time Markov process on the state space A = {0, 1, 2, . . .},
with N_0 = 0, which does not move except at the times T_n. (Equivalently, N_t = #{n ∈ N : T_n ≤ t};
intuitively, N_t counts the number of events by time t.)
(a) Compute the generator Q for this process.
(b) Prove that P(N_t ≤ m) = e^{−λt}(λt)^m/m! + P(N_t ≤ m − 1) for m = 0, 1, 2, . . . .
(c) Conclude that P(N_t = j) = e^{−λt}(λt)^j/j!, i.e. that N_t ∼ Poisson(λt).
Solution. (a) Since the exponential distribution has the memoryless property, it follows that:

    P(N_{t+Δt} ≥ i + 1 | N_t = i) = P(T_{i+1} ≤ t + Δt | T_i ≤ t, T_{i+1} > t)
                                  = P(Z_{i+1} ≤ t + Δt − T_i | T_i ≤ t, Z_{i+1} > t − T_i)
                                  = P(Z_{i+1} ≤ Δt)
                                  = 1 − exp(−λΔt),

for all i ∈ A. Consequently,

    q_{i,i+1} = lim_{Δt→0+} P(N_{t+Δt} = i + 1 | N_t = i)/Δt = lim_{Δt→0+} (1 − exp(−λΔt))/Δt = λ,

and,

    q_{i,i} = lim_{Δt→0+} ( P(N_{t+Δt} = i | N_t = i) − 1 )/Δt = −λ,

for all i ∈ A. Next, by considering Σ_{j∈A} q_ij = 0 for all i ∈ A, it follows that Σ_{j≠i,i+1} q_ij = 0 for all
i ∈ A. On the other hand, q_ij ≥ 0 for all j ≠ i, i + 1 and, consequently, q_ij = 0 for all j ≠ i, i + 1 and
i ∈ A. To sum up, the generator matrix has the form:

    Q = [ −λ   λ    0    0   . . . ]
        [  0   −λ   λ    0   . . . ]
        [  0    0   −λ   λ   . . . ]
        [  .    .    .   .   . . . ].
(b) For the Poisson process N_t with rate parameter λ, and the m-th arrival time T_m (m ≥ 1), the
number of arrivals before some fixed time t is less than m + 1 if and only if the waiting time until the
(m + 1)-th arrival is more than t. Hence the event {N_t < m + 1} = {N_t ≤ m} occurs if and only if
the event {T_{m+1} > t} occurs. Thus, the probabilities of these events are the same:

    P(N_t ≤ m) = P(T_{m+1} > t).    (∗)

Next, since Z_k ∼ Exp(λ) = Gamma(1, λ) (1 ≤ k ≤ m), by Exercise 9.5.17 and induction it follows
that T_m ∼ Gamma(m, λ) (m ≥ 1). Therefore, by equality (∗) and integration by parts it follows that:

    P(N_t ≤ m) = P(T_{m+1} > t)
               = ∫_t^∞ λ(λx)^m e^{−λx}/m! dx
               = ∫_t^∞ (λx)^m d(−e^{−λx}/m!)
               = (λx)^m (−e^{−λx}/m!) |_t^∞ + ∫_t^∞ (e^{−λx}/m!) λm(λx)^{m−1} dx
               = e^{−λt}(λt)^m/m! + P(T_m > t)
               = e^{−λt}(λt)^m/m! + P(N_t ≤ m − 1),

for all m ≥ 1.
(c) By part (b), we have that:

    P(N_t = j) = P(N_t ≤ j) − P(N_t ≤ j − 1) = e^{−λt}(λt)^j/j!,

for all t > 0 and j = 0, 1, 2, . . . .
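The conclusion N_t ∼ Poisson(λt) can be illustrated by simulation (our own sketch, not part of the solution; `sample_N_t` and `empirical_pmf` are hypothetical helper names): build N_t from i.i.d. Exp(λ) interarrival times, exactly as in the construction of the exercise, and compare the empirical law with the Poisson pmf.

```python
import math, random

# Illustrative simulation (not from the book): build N_t from i.i.d. Exp(lambda)
# gaps and compare the empirical law of N_t with Poisson(lambda * t).

def sample_N_t(lam, t, rng):
    """Count arrivals T_1 < T_2 < ... up to time t, with Exp(lam) gaps."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

def empirical_pmf(lam, t, trials, rng):
    counts = {}
    for _ in range(trials):
        k = sample_N_t(lam, t, rng)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in counts.items()}

rng = random.Random(0)
lam, t = 1.5, 2.0
pmf = empirical_pmf(lam, t, 200_000, rng)
for j in range(6):
    exact = math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
    assert abs(pmf.get(j, 0.0) - exact) < 0.01
```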
Exercise 15.3.12. Let {N_t}_{t≥0} be a Poisson process with rate λ > 0, let 0 < s < t, and let U_1, U_2
be i.i.d. ∼ Uniform[0, t].
(a) Compute P(N_s = 0 | N_t = 2).
(b) Compute P(U_1 > s, U_2 > s).
(c) Compare the answers to parts (a) and (b). What does this comparison seem to imply?
Solution. (a)

    P(N_s = 0 | N_t = 2) = P(N_t = 2 | N_s = 0) P(N_s = 0) / P(N_t = 2)
                         = P(N_{t−s} = 2) P(N_s = 0) / P(N_t = 2)
                         = [ e^{−λ(t−s)} (λ(t−s))²/2! ] e^{−λs} / [ e^{−λt} (λt)²/2! ]
                         = ((t − s)/t)².

(b)

    P(U_1 > s, U_2 > s) = P(U_1 > s) P(U_2 > s) = ( ∫_s^t dx/t )( ∫_s^t dx/t ) = ((t − s)/t)².

(c) By comparing part (a) and part (b), we have that:

    P(N_s = 0 | N_t = 2) = P(U_1 > s, U_2 > s),    0 < s < t.

This seems to imply that, conditional on {N_t = 2}, the two arrival times behave like two i.i.d.
Uniform[0, t] random variables.
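A quick Monte Carlo sketch (our own addition, not part of the text; the helper name `simulate` is ours) confirms the identity numerically: conditional on exactly two arrivals by time t, the chance that both fall after s matches ((t − s)/t)².

```python
import random

# Hypothetical simulation (not from the book): conditional on N_t = 2, check that
# P(N_s = 0 | N_t = 2) matches ((t - s)/t)^2, the probability that two i.i.d.
# Uniform[0, t] points both exceed s.

def simulate(lam, s, t, trials, rng):
    cond_hits = cond_total = 0
    for _ in range(trials):
        arrivals, clock = [], rng.expovariate(lam)
        while clock <= t:                      # collect arrival times up to t
            arrivals.append(clock)
            clock += rng.expovariate(lam)
        if len(arrivals) == 2:                 # condition on N_t = 2
            cond_total += 1
            if all(a > s for a in arrivals):   # event {N_s = 0}
                cond_hits += 1
    return cond_hits / cond_total

rng = random.Random(1)
lam, s, t = 1.0, 1.0, 3.0
est = simulate(lam, s, t, 300_000, rng)
assert abs(est - ((t - s) / t) ** 2) < 0.01
```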
Exercise 15.4.4. Let {B_t}_{t≥0} be Brownian motion, and let X_t = 2t + 3B_t for t ≥ 0.
(a) Compute the distribution of X_t for t ≥ 0.
(b) Compute E(X_t²) for t ≥ 0.
(c) Compute E(X_s X_t) for 0 < s < t.
Solution. (a) If Y ∼ N(μ, σ²), then aY + b ∼ N(aμ + b, a²σ²), where a ≠ 0, b ∈ R. Hence, from
B_t ∼ N(0, t) it follows that X_t ∼ N(2t, 9t), t ≥ 0.
(b)

    E(X_t²) = Var(X_t) + E²(X_t) = 9t + (2t)² = 4t² + 9t.

(c)

    E(X_s X_t) = E((3B_s + 2s)(3B_t + 2t))
               = E(9 B_s B_t + 6t B_s + 6s B_t + 4st)
               = 9 E(B_s B_t) + 6t E(B_s) + 6s E(B_t) + 4st
               = 9s + 4st.    (0 ≤ s ≤ t)
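These moments are easy to check by Monte Carlo (an illustrative sketch of our own, not from the text; `sample_pair` is a hypothetical helper), sampling (X_s, X_t) through independent Gaussian increments of B:

```python
import random

# Illustrative Monte Carlo check (not from the book) of the moments derived above
# for X_t = 2t + 3 B_t: E(X_t^2) = 4t^2 + 9t and E(X_s X_t) = 9s + 4st for s < t.

def sample_pair(s, t, rng):
    """Sample (X_s, X_t) via independent Gaussian increments of B."""
    b_s = rng.gauss(0.0, s ** 0.5)
    b_t = b_s + rng.gauss(0.0, (t - s) ** 0.5)
    return 2 * s + 3 * b_s, 2 * t + 3 * b_t

rng = random.Random(42)
s, t, n = 1.0, 2.0, 400_000
sum_xt2 = sum_cross = 0.0
for _ in range(n):
    xs, xt = sample_pair(s, t, rng)
    sum_xt2 += xt * xt
    sum_cross += xs * xt

assert abs(sum_xt2 / n - (4 * t * t + 9 * t)) < 0.5   # E(X_t^2) = 34 at t = 2
assert abs(sum_cross / n - (9 * s + 4 * s * t)) < 0.5  # E(X_s X_t) = 17
```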
Exercise 15.4.6. (a) Let f : R → R be a Lipschitz function, i.e. a function for which there exists α ∈ R
such that |f(x) − f(y)| ≤ α|x − y| for all x, y ∈ R. Compute lim_{h→0+} (f(t + h) − f(t))²/h for any t ∈ R.
(b) Let B_t be Brownian motion. Compute lim_{h→0+} E((B_{t+h} − B_t)²/h) for any t > 0.
(c) What do parts (a) and (b) seem to imply about Brownian motion?
Solution. (a) By considering:

    0 ≤ |(f(t + h) − f(t))²/h| ≤ α² |t + h − t|²/h = α² h,

and lim_{h→0+} α² h = 0, it follows that lim_{h→0+} (f(t + h) − f(t))²/h = 0, for any t ∈ R.
(b) Since L(B_{t+h} − B_t) = N(0, t + h − t) = N(0, h) = L(B_h), we have that:

    lim_{h→0+} E((B_{t+h} − B_t)²/h) = lim_{h→0+} ( Var(B_h) + E²(B_h) )/h
                                     = lim_{h→0+} (h + 0²)/h
                                     = 1.

(c) Suppose {B_t, t ≥ 0} were a Lipschitz function of t. Then, by part (a), lim_{h→0+} (B_{t+h} − B_t)²/h = 0,
and consequently, by part (b),

    0 = lim_{h→0+} E((B_{t+h} − B_t)²/h) = 1,

a contradiction. Therefore, {B_t, t ≥ 0} is not a Lipschitz function, yielding that at least one of the four
Dini derivatives of {B_t, t ≥ 0}, defined by:

    D⁺B_t = limsup_{h→0+} (B_{t+h} − B_t)/h,
    D⁻B_t = limsup_{h→0+} (B_t − B_{t−h})/h,
    D₊B_t = liminf_{h→0+} (B_{t+h} − B_t)/h,
    D₋B_t = liminf_{h→0+} (B_t − B_{t−h})/h,

is unbounded on [0, ∞).
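Part (b) is also easy to see numerically (our own illustration, not part of the text): since B_{t+h} − B_t ∼ N(0, h) for every t, the Monte Carlo estimate of E((B_{t+h} − B_t)²)/h stays near 1 no matter how small h is, unlike the Lipschitz case where the ratio collapses to 0.

```python
import random

# Illustrative check (not from the book): estimate E((B_{t+h} - B_t)^2)/h by
# Monte Carlo; since the increment is N(0, h), the ratio should stay near 1
# for every h, however small.

rng = random.Random(7)
n = 200_000
for h in (1.0, 0.01, 1e-6):
    # increment ~ N(0, h), independent of t
    est = sum(rng.gauss(0.0, h ** 0.5) ** 2 for _ in range(n)) / (n * h)
    assert abs(est - 1.0) < 0.02
```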
Exercise 15.5.2. Let {X_t}_{t∈T} and {X′_t}_{t∈T} be stochastic processes with the countable time index T.
Suppose {X_t}_{t∈T} and {X′_t}_{t∈T} have identical finite-dimensional distributions. Prove or disprove that
{X_t}_{t∈T} and {X′_t}_{t∈T} must have the same full joint distribution.
Solution. We show that if the distributions of both stochastic processes {X_t}_{t∈T} and {X′_t}_{t∈T} satisfy
the infinite version of the equation (15.1.2), then the answer is affirmative. For this case, let T =
{t_n}_{n=1}^∞. Then, two applications of Proposition 3.3.1 yield:

    P( (X_{t_m})_{m=1}^∞ ∈ Π_{m=1}^∞ H_m )
      = P( lim_{n→∞} { (X_{t_m})_{m=1}^∞ ∈ H_1 × · · · × H_n × R × R × · · · } )
      = lim_{n→∞} P( (X_{t_m})_{m=1}^∞ ∈ H_1 × · · · × H_n × R × R × · · · )
      = lim_{n→∞} P( (X_{t_m})_{m=1}^n ∈ H_1 × · · · × H_n )
      = lim_{n→∞} P′( (X′_{t_m})_{m=1}^n ∈ H_1 × · · · × H_n )
      = lim_{n→∞} P′( (X′_{t_m})_{m=1}^∞ ∈ H_1 × · · · × H_n × R × R × · · · )
      = P′( lim_{n→∞} { (X′_{t_m})_{m=1}^∞ ∈ H_1 × · · · × H_n × R × R × · · · } )
      = P′( (X′_{t_m})_{m=1}^∞ ∈ Π_{m=1}^∞ H_m ),

for all Borel H_m ⊆ R (1 ≤ m < ∞).
Exercise 15.6.8. Let {B_t}_{t≥0} be standard Brownian motion, with B_0 = 0. Let X_t = ∫_0^t a ds +
∫_0^t b dB_s = at + bB_t be a diffusion with constant drift μ(x) = a > 0 and constant volatility σ(x) = b > 0.
Let Z_t = exp[−2aX_t/b²].
(a) Prove that {Z_t}_{t≥0} is a martingale, i.e. that E[Z_t | Z_u (0 ≤ u ≤ s)] = Z_s for 0 < s < t.
(b) Let A, B > 0 and let T_A = inf{t ≥ 0 : X_t = A} and T_B = inf{t ≥ 0 : X_t = −B} denote the first
hitting times of A and −B, respectively. Compute P(T_A < T_B).
Solution. (a) Let c = 2a/b², so Z_t = exp[−cX_t]. Then by the independent increments property,

    E[Z_t | Z_u (0 ≤ u ≤ s)] = E[Z_s (Z_t/Z_s) | Z_u (0 ≤ u ≤ s)]
      = E[Z_s exp(−c(X_t − X_s)) | Z_u (0 ≤ u ≤ s)] = Z_s E[exp(−c(X_t − X_s))].    (∗)

But X_t − X_s = a(t − s) + b(B_t − B_s) ∼ N(a(t − s), b²(t − s)), so X_t − X_s = a(t − s) + b√(t − s) U where
U ∼ N(0, 1). Hence,

    E[exp(−c(X_t − X_s))] = E[exp(−ac(t − s) − bc√(t − s) U)]
      = exp[−ac(t − s) + b²c²(t − s)/2] = exp[−a(2a/b²)(t − s) + b²(4a²/b⁴)(t − s)/2]
      = exp[−(2a²/b²)(t − s) + (2a²/b²)(t − s)] = exp[0] = 1.

Substituting this into (∗), we have E[Z_t | Z_u (0 ≤ u ≤ s)] = Z_s (1) = Z_s.
(b) Let p = P(T_A < T_B), and let τ = min(T_A, T_B). Then τ is a stopping time for the martingale. So,
by Corollary 14.1.7, we must have E(Z_τ) = E(Z_0) = exp(0) = 1, so that p e^{−cA} + (1 − p)e^{cB} = 1,
whence p(e^{−cA} − e^{cB}) = 1 − e^{cB}, so p = (1 − e^{cB})/(e^{−cA} − e^{cB}).
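The hitting probability can be checked by simulation (a rough sketch of our own, not part of the solution; `hit_A_first` is a hypothetical helper). Note the Euler discretization of barrier crossing introduces a small bias, so only loose agreement is expected:

```python
import math, random

# Hypothetical numerical check (not from the book): simulate X_t = a t + b B_t
# by Euler steps and estimate P(T_A < T_B), comparing with the formula
# p = (1 - e^{cB}) / (e^{-cA} - e^{cB}), c = 2a/b^2, derived above.

def hit_A_first(a, b, A, B, dt, rng):
    x = 0.0
    while -B < x < A:
        x += a * dt + b * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x >= A

rng = random.Random(3)
a, b, A, B, dt, trials = 1.0, 1.0, 1.0, 1.0, 1e-3, 4000
est = sum(hit_A_first(a, b, A, B, dt, rng) for _ in range(trials)) / trials

c = 2 * a / b ** 2
p = (1 - math.exp(c * B)) / (math.exp(-c * A) - math.exp(c * B))  # about 0.88 here
assert abs(est - p) < 0.03
```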
Exercise 15.6.10. Let (p^t_ij) be the transition probabilities for a Markov chain on a finite state space
A. Define the matrix Q = (q_ij) by (15.3.4). Let f : A → R be any function, and let i ∈ A. Prove that
(Qf)_i (i.e., Σ_{k∈A} q_ik f_k) corresponds to (Qf)(i) from (15.6.4).
Solution. Using (15.6.7) and finiteness of A, we have that:

    (Qf)(i) = lim_{t→0+} E_i( f(X_{t_0+t}) − f(X_{t_0}) )/t
            = lim_{t→0+} E( f(X_{t_0+t}) − f(X_{t_0}) | X_{t_0} = i )/t
            = lim_{t→0+} Σ_{k∈A} (p^t_ik f_k − δ_ik f_k)/t
            = lim_{t→0+} Σ_{k∈A} (1/t)(p^t_ik − δ_ik) f_k
            = Σ_{k∈A} lim_{t→0+} (1/t)(p^t_ik − δ_ik) f_k
            = Σ_{k∈A} q_ik f_k = (Qf)_i.
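The correspondence can be made concrete (our own illustration, not from the text) using the two-state chain of Exercise 15.3.6: the matrix product (Qf)_i should match the finite-difference quotient (Σ_k p^t_ik f_k − f_i)/t for small t, which approximates (Qf)(i) of (15.6.4).

```python
import math

# Illustrative check (not from the book): for the two-state chain of Exercise
# 15.3.6, compare (Qf)_i = sum_k q_ik f_k with the finite difference
# (sum_k p^t_ik f_k - f_i)/t for small t.

Q = [[-3.0, 3.0], [6.0, -6.0]]
f = [1.0, 4.0]   # an arbitrary function on A = {1, 2}

def P(t):
    u = math.exp(-9.0 * t) - 1.0
    return [[1.0 + u / 3.0, -u / 3.0], [-2.0 * u / 3.0, 1.0 + 2.0 * u / 3.0]]

t = 1e-7
for i in range(2):
    Qf_i = sum(Q[i][k] * f[k] for k in range(2))
    fd = (sum(P(t)[i][k] * f[k] for k in range(2)) - f[i]) / t
    assert abs(Qf_i - fd) < 1e-4
```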
Exercise 15.6.12. Suppose a diffusion {X_t}_{t≥0} has generator given by (Qf)(x) = −(x/2)f′(x) + (1/2)f″(x).
(Such a diffusion is called an Ornstein-Uhlenbeck process.)
(a) Write down a formula for dX_t.
(b) Show that {X_t}_{t≥0} is reversible with respect to the standard normal distribution, N(0, 1).
Solution. (a) By equation (15.6.6) it follows that:

    μ(x)f′(x) + (1/2)σ²(x)f″(x) = −(x/2)f′(x) + (1/2)f″(x),

and, consequently, μ(x) = −x/2 and σ(x) = 1. Now, by this result and equation (15.6.2) we conclude
that:

    dX_t = dB_t − (X_t/2) dt.

(b) The standard normal distribution has probability density

    g(x) = (1/√(2π)) exp(−x²/2),    −∞ < x < ∞,

and, therefore, μ(x) = −x/2 = (1/2) g′(x)/g(x). Besides, σ(x) = 1, and the desired result follows by Exercise
15.6.9.
Exercise 15.7.2. Consider the Ornstein-Uhlenbeck process {X_t}_{t≥0} of Exercise 15.6.12, with generator
(Qf)(x) = −(x/2)f′(x) + (1/2)f″(x).
(a) Let Y_t = X_t² for each t ≥ 0. Compute dY_t.
(b) Let Z_t = X_t³ for each t ≥ 0. Compute dZ_t.
Solution. (a),(b) For f_n(x) = x^n (n ≥ 1), σ(x) = 1, and μ(x) = −x/2, an application of the equation
(15.7.1) yields:

    d(f_n(X_t)) = f′_n(X_t)σ(X_t) dB_t + ( f′_n(X_t)μ(X_t) + (1/2)f″_n(X_t)σ²(X_t) ) dt
                = nX_t^{n−1} dB_t + ( −(n/2)X_t^n + (n(n−1)/2)X_t^{n−2} ) dt.

In particular, taking n = 2 and n = 3:

    dY_t = 2X_t dB_t + (1 − Y_t) dt,    dZ_t = 3X_t² dB_t + (3X_t − (3/2)Z_t) dt.
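A small consistency check of our own (not part of the text): under the stationary N(0, 1) law of Exercise 15.6.12, the dt-coefficient above should have mean zero for every n, since E[X^n] = (n − 1) E[X^{n−2}] for X ∼ N(0, 1). The helper `normal_moment` is a hypothetical name.

```python
# Illustrative check (not from the book): under X ~ N(0, 1), the drift term
# -(n/2) X^n + (n(n-1)/2) X^{n-2} from d(X_t^n) above has expectation zero,
# because E[X^n] = (n - 1) E[X^{n-2}].

def normal_moment(n):
    """E[X^n] for X ~ N(0, 1): 0 for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0.0
    m = 1.0
    while n > 0:
        m *= n - 1
        n -= 2
    return m

for n in range(2, 11):
    drift_mean = -(n / 2) * normal_moment(n) + (n * (n - 1) / 2) * normal_moment(n - 2)
    assert abs(drift_mean) < 1e-12
```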
Exercise 15.8.6. Show that (15.8.4) is indeed equal to (15.8.5).
Solution. Using the change-of-variable method and Φ(−x) = 1 − Φ(x) for all x ≥ 0, and writing
x_0 = (1/σ)( log(q/P_0) − (r − (1/2)σ²)T ) for the point at which the integrand becomes positive, we have
that:

    ∫_{−∞}^∞ e^{−rT} max(0, P_0 exp(σx + (r − (1/2)σ²)T) − q) (e^{−x²/2T}/√(2πT)) dx
      = ∫_{x_0}^∞ e^{−rT} ( P_0 exp(σx + (r − (1/2)σ²)T) − q ) (e^{−x²/2T}/√(2πT)) dx
      = P_0 ∫_{x_0}^∞ (1/√(2πT)) exp(−(x − σT)²/2T) dx − q e^{−rT} ∫_{x_0}^∞ (1/√(2πT)) exp(−x²/2T) dx
      = P_0 ∫_{(x_0−σT)/√T}^∞ (1/√(2π)) exp(−s²/2) ds − q e^{−rT} ∫_{x_0/√T}^∞ (1/√(2π)) exp(−s²/2) ds
      = P_0 Φ( (1/(σ√T))( log(P_0/q) + (r + (1/2)σ²)T ) ) − q e^{−rT} Φ( (1/(σ√T))( log(P_0/q) + (r − (1/2)σ²)T ) ),

which is (15.8.5). (The second equality completes the square: e^{−rT} e^{σx + (r − (1/2)σ²)T} e^{−x²/2T} =
exp(−(x − σT)²/2T).)
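The equality of (15.8.4) and (15.8.5) can also be verified numerically (our own sketch, not part of the solution; `price_integral` and `price_closed` are hypothetical names), by computing the integral with a simple midpoint Riemann sum:

```python
import math

# Hypothetical numerical check (not from the book): integrate (15.8.4) by a
# midpoint Riemann sum and compare with the closed form (15.8.5) derived above.

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def price_integral(P0, q, r, sigma, T, lo=-10.0, hi=10.0, n=200_000):
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h   # midpoint rule
        payoff = max(0.0, P0 * math.exp(sigma * x + (r - sigma ** 2 / 2) * T) - q)
        total += math.exp(-r * T) * payoff * math.exp(-x * x / (2 * T)) / math.sqrt(2 * math.pi * T)
    return total * h

def price_closed(P0, q, r, sigma, T):
    d1 = (math.log(P0 / q) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = (math.log(P0 / q) + (r - sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    return P0 * Phi(d1) - q * math.exp(-r * T) * Phi(d2)

P0, q, r, sigma, T = 100.0, 95.0, 0.05, 0.2, 1.0
assert abs(price_integral(P0, q, r, sigma, T) - price_closed(P0, q, r, sigma, T)) < 1e-3
```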
Exercise 15.8.8. Consider the price formula (15.8.5), with r, σ, P_0 and q fixed positive quantities.
(a) What happens to the price (15.8.5) as T → 0? Does this result make intuitive sense?
(b) What happens to the price (15.8.5) as T → ∞? Does this result make intuitive sense?
Solution. (a)

    lim_{T→0+} [ P_0 Φ( ( log(P_0/q) + (r + (1/2)σ²)T ) / (σ√T) ) − q e^{−rT} Φ( ( log(P_0/q) + (r − (1/2)σ²)T ) / (σ√T) ) ]

equals P_0 − q if P_0 > q, (1/2)(P_0 − q) if P_0 = q, and 0 if P_0 < q.
This result means that a fair price for the option to purchase the stock at a fixed time T ≈ 0, for a fixed
price q > 0 and the current price P_0 of the stock, is P_0 − q, (1/2)(P_0 − q), or 0 provided that P_0 > q, P_0 = q
or P_0 < q, respectively.
(b)

    lim_{T→∞} [ P_0 Φ( ( log(P_0/q) + (r + (1/2)σ²)T ) / (σ√T) ) − q e^{−rT} Φ( ( log(P_0/q) + (r − (1/2)σ²)T ) / (σ√T) ) ] = P_0.

This result means that a fair price for the option to purchase the stock at a fixed time T → ∞, for a fixed
price q > 0 and the current price P_0 of the stock, is P_0.