
Ryan McCorvie

Exercises from Probability with Martingales

EG 1 Two points are chosen at random on a line AB independently according to the
uniform distribution. If the line is cut at these two points, what’s the probability the
segments will form a triangle?

The sample space is the unit square [0, 1]2 with points corresponding to ( x1 , x2 ). The
probability measure is the same as area in this square. A necessary and sufficient condi-
tion to form a triangle is that the segments [0, x1 ], [ x1 , x2 ] and [ x2 , 1] satisfy the triangle
inequalities. Assuming x1 < x2 , these take the form

x1 < ( x2 − x1 ) + (1 − x2 ),    1 − x2 < x1 + ( x2 − x1 ),    x2 − x1 < (1 − x2 ) + x1    (1)

The first inequality says x1 < 1/2, the second that x2 > 1/2, and the last that x2 − x1 < 1/2.
These inequalities describe a triangle in the unit square with area 1/8. Considering also the
symmetrical case where x1 > x2 , the total probability is 2/8 = 1/4. 
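As a quick sanity check, here is a minimal Monte Carlo sketch (plain Python; the function name is my own), which should print a value near 1/4:

import random

def triangle_prob(trials=1_000_000, seed=0):
    """Estimate the probability that two uniform cuts of [0,1] yield a triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.random(), rng.random()
        a, b = min(x1, x2), max(x1, x2)
        # the three segment lengths are a, b - a, 1 - b; the triangle
        # inequalities are equivalent to every segment being shorter than 1/2
        if a < 0.5 and b - a < 0.5 and 1 - b < 0.5:
            hits += 1
    return hits / trials

print(triangle_prob())  # prints approximately 0.25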

EG 2 Planet X is a ball with center O. Three spaceships A, B and C land at random
on its surface, their positions being independent and each uniformly distributed on the
surface. Spaceships A and B can communicate directly by radio if ∠ AOB ≤ 90◦ . What’s
the probability all three can communicate? (For example, A can communicate to C via
B if necessary).

A spaceship X may communicate with any spaceship located in the hemisphere
centered at X. Denote this hemisphere by h( X ). Without loss of generality, rotate the sphere so that
A is the north pole, and B lies in a fixed plane through O and A. The three ships may
communicate if:
1. B is in the northern hemisphere and C is in the northern hemisphere (the ships can
communicate via A)

2. B is in the northern hemisphere and C is in the half-lune h( B) \ h( A) (the ships can
communicate via B)

3. B is in the southern hemisphere and C is in the half-lune h( A) ∩ h( B) (the ships can
communicate via C)
These cases are disjoint, so we can consider them separately and sum their contributions.
The area of a half lune is proportional to angle ∠XOY where X and Y are the centers of
the hemispheres. This is the same as the dihedral angle between the equatorial planes of
the hemispheres. Thus the probability of being in a half lune of angle θ is θ/2π.

So writing one integral for each case above, conditioning on the angle θ = ∠ AOB and
giving the probability for C to be in the configuration described by each case, gives the
equation

P = ∫_0^{π/2} (1/2) p(dθ) + ∫_0^{π/2} (θ/2π) p(dθ) + ∫_{π/2}^π ((π − θ)/2π) p(dθ)    (2)
Now, the probability density of θ = ∠ AOB is given by (1/2) sin θ dθ. Heuristically, this can be
seen by considering the infinitesimally thin strip on the surface of the sphere at constant
polar angle θ. This is like a cylinder (or frustum of a cone) of side edge length r dθ and radius
r sin θ, so the area is like 2πr² sin θ dθ. Normalizing by the total surface area 4πr² gives
the probability density.
Calculating each contribution,

∫_0^{π/2} (1/2) · (1/2) sin θ dθ = 1/4

∫_0^{π/2} (θ/2π) · (1/2) sin θ dθ = 1/(4π)    (3)

∫_{π/2}^π ((π − θ)/2π) · (1/2) sin θ dθ = 1/(4π)

Summing the contributions we find P = (2 + π)/(4π). 
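This answer is easy to check numerically. The sketch below (NumPy; names are my own) samples positions uniformly on the sphere as normalized Gaussian vectors and uses the observation that all three ships can communicate exactly when at least two of the three pairwise links are within 90°, so that the communication graph is connected:

import numpy as np

def comms_prob(trials=200_000, seed=0):
    """Estimate P(all three ships can communicate, directly or via a relay)."""
    rng = np.random.default_rng(seed)
    pts = rng.standard_normal((trials, 3, 3))
    pts /= np.linalg.norm(pts, axis=2, keepdims=True)   # uniform on the sphere
    a, b, c = pts[:, 0], pts[:, 1], pts[:, 2]
    ab = (a * b).sum(axis=1) >= 0   # angle AOB <= 90 degrees
    ac = (a * c).sum(axis=1) >= 0
    bc = (b * c).sum(axis=1) >= 0
    links = ab.astype(int) + ac.astype(int) + bc.astype(int)
    return (links >= 2).mean()      # connected iff at least two links are up

print(comms_prob(), (2 + np.pi) / (4 * np.pi))  # both approximately 0.409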

EG.3 Let G be the free group with two generators a and b. Start at time 0 with the unit
element 1, the empty word. At each second multiply the current word on the right by
one of the four elements a, a−1 , b, b−1 , choosing each with probability 14 (independently
of previous choices). The choices

a, a, b, a−1 , a, b−1 , a−1 , a, b (4)

at times 1 to 9 will produce the reduced word aab of length 3 at time 9. Prove that the
probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain
why it is intuitively clear that (almost surely)

(length of reduced word at time n)/n → 1/2.    (5)

Multiplying by random elements in { a, a−1 , b, b−1 } is the same as taking a random walk
on the Cayley graph of the free group with two generators (see figure 1). This is the rooted
4-regular tree. The depth of a vertex is the length of the unique path to the root, which is
the vertex corresponding to the identity 1 of the free group. For any vertex of depth d > 0,
there are three edges leading to a node of depth d + 1 and one edge leading to a node of
depth d − 1.
Let pd be the probability of returning to 1 starting from a node of depth d. It’s intu-
itively clear from the symmetry of the graph that the probability only depends on the
depth. There is a graph isomorphism mapping any node of depth d to another node of

Figure 1: Cayley graph on the free group of 2 generators

depth d, so there is a one-to-one correspondence between paths which lead back to 1, and
those paths have equal probability. The probability must satisfy the recurrence

pn = (3/4) pn+1 + (1/4) pn−1    (6)
This is because we can condition the return probability on the first node in the walk and
use the fact that the return probability depends only on the depth of the node. Fundamen-
tal solutions of recurrence equations have the form pn = λ^n and satisfy a characteristic
polynomial equation which, for this recurrence, is

3λ² − 4λ + 1 = 0  ⇒  λ = 1/3 or λ = 1    (7)
The only solution of the form pn = a · (1/3)^n + b which also satisfies the boundary conditions
p0 = 1 and pn → 0 as n → ∞ is pn = (1/3)^n . In particular, a random walk starting at time
1 from a node of depth 1 returns to the origin with probability 1/3. This is the same as the
event that the reduced word is ever 1 at a positive time, since every word has exactly one
letter at time 1.
Let Ln be the random variable which represents word length at time n. Note that

E[ Ln | Ln−1 ] = (3/4)( Ln−1 + 1) + (1/4)( Ln−1 − 1) = Ln−1 + 1/2    (8)
since Ln is the same thing as vertex depth. Taking expectations and using the tower law

E Ln = E Ln−1 + 1/2    (9)
Hence the expectation satisfies a simple recurrence. Since L0 = 0, this gives E Ln = n/2
(ignoring the boundary case Ln−1 = 0, where the increment is +1). More precisely, away
from the root the increments Ln − Ln−1 are i.i.d. with mean 1/2 and finite variance, and the
walk is transient, so it visits the root only finitely often. Therefore by the SLLN, Ln /n → 1/2
almost surely. 
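Both claims can be checked by simulation. The sketch below (plain Python; names are my own) tracks only the reduced word length, using the fact that away from the identity exactly one of the four generators shortens the word:

import random

def free_group_walk(steps=500, seed=None):
    """Track the reduced word length, and whether it returns to 0 after time 0."""
    rng = random.Random(seed)
    length, returned = 0, False
    for _ in range(steps):
        if length == 0:
            length = 1          # any generator lengthens the empty word
        elif rng.random() < 0.75:
            length += 1         # 3 of 4 generators extend the word
        else:
            length -= 1         # 1 of 4 cancels the last letter
            returned = returned or length == 0
    return length, returned

trials, steps = 20_000, 500
results = [free_group_walk(steps, seed=i) for i in range(trials)]
print(sum(r for _, r in results) / trials)          # approximately 1/3
print(sum(l for l, _ in results) / trials / steps)  # approximately 1/2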

EG.4 Suppose now that the elements a, a−1 , b, b−1 are chosen instead with respective
probabilities α, α, β, β, where α > 0, β > 0, α + β = 1/2. Prove that the conditional proba-
bility that the reduced word 1 ever occurs at a positive time, given that the element a is
chosen at time 1, is the unique root x = r (α) (say) in (0, 1) of the equation

3x³ + (3 − 4α⁻¹ ) x² + x + 1 = 0    (10)

As time goes on, (it is almost surely true that) more and more of the reduced word
becomes fixed, so that a final word is built up. If in the final word, the symbols a and
a−1 are both replaced by A and the symbols b and b−1 are both replaced by B, show that
the sequence of A’s and B’s obtained is a Markov chain on { A, B} with (for example)

p_AA = α(1 − x ) / ( α(1 − x ) + 2β(1 − y) )    (11)

where y = r ( β). What is the (almost sure) limiting proportion of occurrence of the
symbol a in the final word? (Note. This result was used by Professor Lyons of Edinburgh
to solve a long-standing problem in potential theory on Riemannian manifolds.)


Algebras, etc.

1.1 Let V ⊂ N, and say V has Cesàro density γ(V ) whenever

γ(V ) = lim_{n→∞} ♯(V ∩ {1, 2, . . . , n}) / n    (12)

exists. Let C be the set of all sets with Cesàro density. Give an example of V1 , V2 ∈ C
such that V1 ∩ V2 ∉ C

Take V1 to consist of 1 together with those n > 10 whose base-10 representation has an
equal first and last digit. Take V2 to be the natural numbers whose last digit in base 10 is 1.
Then 0 ≤ ♯(Vi ∩ {1, 2, . . . , n}) − ⌊n/10⌋ ≤ 1, since in every block {10q, 10q + 1, 10q +
2, . . . , 10q + 9} each of V1 and V2 has exactly one representative. Therefore
γ(V1 ) = γ(V2 ) = 1/10.
Note V1 ∩ V2 consists of 1 and all numbers of the form 1b2 b3 . . . bn−1 1 where each bi can be
any digit from 0 to 9. Thus between 10^k and 10^{k+1} there are 10^{k−1} members of V1 ∩ V2 .
Therefore if n = 10^k ,

♯(V1 ∩ V2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−2} = (10^{k−1} − 1)/9 + 1    (13)

and ♯(V1 ∩ V2 ∩ {1, . . . , n})/n → 1/90 along this subsequence. However if n = 2 · 10^k then

♯(V1 ∩ V2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−1} = (10^k − 1)/9 + 1    (14)

and ♯(V1 ∩ V2 ∩ {1, . . . , n})/n → 1/18 along this subsequence. Therefore the limit does not
exist, since the lim sup is greater than the lim inf. 
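A small numeric check (plain Python; my own helper) shows the two subsequential limits 1/90 ≈ 0.0111 and 1/18 ≈ 0.0556 emerging:

def density_ratio(n):
    """Fraction of {1,...,n} whose base-10 form starts and ends with the digit 1."""
    count = sum(1 for m in range(1, n + 1)
                if str(m)[0] == '1' and str(m)[-1] == '1')
    return count / n

for k in range(3, 7):
    print(10**k, density_ratio(10**k), 2 * 10**k, density_ratio(2 * 10**k))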

Independence

4.1 Let (Ω, F , Pr) be a probability triple. Let I1 , I2 and I3 be three π-systems on Ω
such that for k = 1, 2, 3
Ik ⊂ F and Ω ∈ Ik (15)
Prove that if
Pr( I1 ∩ I2 ∩ I3 ) = Pr( I1 ) Pr( I2 ) Pr( I3 ) (16)
whenever Ik ∈ Ik then the σ (Ik ) are independent. Why did we require Ω ∈ Ik ?

Fix I2 ∈ I2 and I3 ∈ I3 . Then consider the mappings, for I ∈ F ,

µ( I ) = Pr( I ∩ I2 ∩ I3 )    µ′( I ) = Pr( I ) Pr( I2 ) Pr( I3 )    (17)

Each of µ and µ′ is a countably additive measure, because the probability Pr is a count-
ably additive measure on F . By the hypothesis (16), the measures are equal on the
π-system I1 . By assumption Ω ∈ I1 as well, so µ(Ω) = µ′(Ω). If we dropped the
assumption that Ω ∈ I1 , then the condition µ(Ω) = µ′(Ω) would amount to
Pr( I2 ∩ I3 ) = Pr( I2 ) Pr( I3 ), which doesn’t hold generically. We conclude from theorem
1.6 that µ( I ) = µ′( I ) for I ∈ σ(I1 ). Since I2 and I3 were arbitrary, equation (16) holds for
all I1 ∈ σ (I1 ), I2 ∈ I2 , I3 ∈ I3 .
Now we can repeat the above argument with the π-systems I1′ = I2 , I2′ = I3 , I3′ =
σ (I1 ), and observe (16) is invariant to permutations of {1, 2, 3}, to conclude that (16)
holds for I1 ∈ σ (I1 ), I2 ∈ σ(I2 ), I3 ∈ I3 . Applying the argument one more time for
I1′ = I3 , I2′ = σ(I1 ), I3′ = σ(I2 ) gives the result we want, that (16) holds for Ik ∈ σ(Ik ) 

4.2 For s > 1 define ζ (s) = ∑_{n∈N} n^{−s} and consider the distribution on N given by
Pr( X = n) = ζ (s)^{−1} n^{−s} . Let Ek = { X is divisible by k }. Show E_p and E_q are inde-
pendent for primes p ≠ q. Show Euler’s formula 1/ζ (s) = ∏_p (1 − 1/p^s ). Show
Pr( X is square-free) = 1/ζ (2s). For X, Y i.i.d. with this law, let H = gcd( X, Y ). Show Pr( H = n) = n^{−2s} /ζ (2s)

The elements m ∈ Ek are in one-to-one correspondence with the elements n ∈ N by the
relationship m = nk. Therefore

Pr( Ek ) = ∑_{m∈Ek} Pr(m) = ∑_{n∈N} Pr(nk)    (18)

Since
Pr(nk ) = k^{−s} n^{−s} /ζ (s) = k^{−s} Pr(n)    (19)
we have
Pr( Ek ) = k^{−s} ∑_{n∈N} Pr(n) = k^{−s}    (20)

Furthermore, the conditional probability satisfies

Pr ( X = kn | X ∈ Ek ) = Pr( X = nk ) / Pr( Ek ) = ((nk)^{−s} /ζ (s)) / k^{−s} = n^{−s} /ζ (s)    (21)

Thus the conditional distribution of X is given by the ζ (s) distribution on the relative
divisor X/k.
Now, for m, n relatively prime, Em ∩ En = Emn by unique factorization. Therefore

Pr( Em ∩ En ) = Pr( Emn ) = (mn)^{−s} = m^{−s} n^{−s} = Pr( Em ) Pr( En )    (22)

which proves independence.


Consider the probability that X is not divisible by any of the first n primes p1 , p2 , . . . , pn .
This is the event E = E^c_{p1} ∩ · · · ∩ E^c_{pn} and

Pr( E) = ∏_{i=1}^n (1 − Pr( E_{pi} )) = ∏_{i=1}^n (1 − p_i^{−s})    (23)

As n → ∞ the set E consists of 1 and very large numbers (certainly every element of E
other than 1 exceeds pn+1 ), and Pr( X > n) ↓ 0 as n → ∞. Thus Pr( E) → Pr( X = 1) = 1/ζ (s). This
shows
ζ (s)^{−1} = ∏_p (1 − 1/p^s )    (24)

The set of integers not divisible by the square of any of the first n primes is

F = E^c_{p1²} ∩ · · · ∩ E^c_{pn²}    (25)

The sets E_{pi²} are independent, because the numbers pi² are relatively prime. As
n → ∞, the events F decrease to the set of square-free numbers, so by monotone
convergence of measures

Pr( F ) → ∏_p (1 − ( p²)^{−s} ) = ∏_p (1 − 1/p^{2s} ) = 1/ζ (2s)    (26)

Now n = gcd( X, Y ) iff X and Y are both divisible by n and the relative divisors X/n and
Y/n are relatively prime. By independence, the event that both X ∈ En and Y ∈ En
satisfies
Pr( X ∈ En , Y ∈ En ) = Pr( X ∈ En ) Pr(Y ∈ En ) = n^{−2s}    (27)
Let R = {( a, b) | a, b relatively prime}. We can
calculate Pr(( X, Y ) ∈ R) as follows. First consider the sets Fp = { X ∈ E_p } ∩ {Y ∈ E_p }. By
independence Pr( Fp ) = Pr( X ∈ E_p ) Pr(Y ∈ E_p ) = p^{−2s} . The Fp are independent of each
other since the E_p are. Then F^c_p is the event that p is not a divisor of at least one of X and
Y. Hence F = ∩_p F^c_p is the event that X and Y have no common prime divisor, i.e. that
they are relatively prime, so

Pr( X, Y relatively prime) = Pr( F ) = ∏_p (1 − p^{−2s} ) = 1/ζ (2s)    (28)

Pulling it together,

Pr(gcd( X, Y ) = n) = Pr( An ) Pr( Bn | An ) = n^{−2s} /ζ (2s)    (29)

where An = { X ∈ En } ∩ {Y ∈ En } and Bn = {( X/n, Y/n) ∈ R}.
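These facts are easy to test numerically, assuming SciPy, whose zipf distribution has exactly the pmf n^{−s}/ζ(s) used here (a sketch, not part of the exercise):

from math import gcd
import numpy as np
from scipy.stats import zipf
from scipy.special import zeta

s, n = 2.0, 200_000
rng = np.random.default_rng(0)
X = zipf(s).rvs(size=n, random_state=rng)   # Pr(X = k) = k**(-s) / zeta(s)
Y = zipf(s).rvs(size=n, random_state=rng)
g = np.array([gcd(x, y) for x, y in zip(X, Y)])
print((g == 1).mean(), 1 / zeta(2 * s))            # both approximately 0.924
print((g == 2).mean(), 2**(-2 * s) / zeta(2 * s))  # both approximately 0.058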


4.3 Let X1 , X2 , . . . be i.i.d. random variables from a continuous distribution. Let E1 = Ω


and for n ≥ 2 let En = { Xn = max( X1 , X2 , . . . , Xn )}, that is, a “record” occurs at
time n. Show that the events En are independent and that Pr( En ) = 1/n.

A continuous distribution has the property that singletons have measure zero: µ({ x }) = 0.
Thus, conditioning on the value of X1 , it follows that { X1 = X2 } has measure
zero also: by Fubini’s theorem, Pr( X1 = X2 ) = ∫_Ω Pr( X2 = x ) µ(dx ) = 0. Since there are
a finite number of pairs of variables, we may assume none of the Xi are coincident by
excluding a set of measure zero.
Allow a permutation σ ∈ Sn to act on the sample space by permuting the indices. That
is, σ : ( X1 , X2 , . . . , Xn ) ↦ ( Xσ1 , Xσ2 , . . . , Xσn ). Let ρ( x ) denote the permutation induced
by mapping each component of x to its ordinal ranking, from smallest to greatest. Clearly
σ also permutes the ranks, so that ρi (σX ) = ρσi ( X ), or in vector form, ρ(σX ) = σρ( X ).
Because the Xi are i.i.d., the permutation σ is measure preserving: Pr( E) = Pr(σE) for any
event E ⊂ Ω^n . Consider the Weyl chamber W = { x1 < x2 < · · · < xn }. The disjoint sets
σW partition Ω^n , excluding a set of measure zero of coincident points, since every point
x ∈ Ω^n is in ρ( x )^{−1} (W ).
We infer that Pr(W ) = 1/n!, since the disjoint copies of W have equal probability and
comprise the entire space. The event En is the union of all the sets σW where σn = n.
There are (n − 1)! such permutations, so Pr En = (n − 1)!/n! = 1/n.
For any σW ⊂ En we can apply any permutation σ′ ∈ Sm ⊂ Sn which acts only on
the first m letters and keeps the rest fixed. For any σ′ ∈ Sm , σ′ σW ⊂ En , since the nth
component is largest and unchanged by the action of σ′ . Let us group the sets σW ⊂ En
into the orbits of the action of Sm . From each orbit we may choose the representative with
ρ1 ( x ) < ρ2 ( x ) < · · · < ρm ( x ) for each x ∈ σW, which is formed by permuting the first
m elements into sorted order. Let U be the union of all orbit representatives. For σ′ ∈ Sm
the sets σ′ U are disjoint and partition En . There are m! such sets, they all have equal
probability, and their union has probability 1/n, so Pr(U ) = 1/(n · m!). Of these, exactly
(m − 1)! are also in Em ; these correspond to σ′ ∈ Sm which satisfy σ′ m = m. Therefore
Pr( Em ∩ En ) = 1/(mn) = Pr( Em ) Pr( En ) and the sets are independent.
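The independence of records is striking enough to be worth a simulation (NumPy; a sketch with my own names):

import numpy as np

rng = np.random.default_rng(0)
trials, n = 200_000, 6
X = rng.random((trials, n))                       # any continuous law works
records = X == np.maximum.accumulate(X, axis=1)   # E_k: X_k is a running max

m, k = 3, 6
print(records[:, m - 1].mean(), 1 / m)                              # about 1/3
print(records[:, k - 1].mean(), 1 / k)                              # about 1/6
print((records[:, m - 1] & records[:, k - 1]).mean(), 1 / (m * k))  # about 1/18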


Borel-Cantelli Lemmas

4.4 Suppose a coin with probability p of heads is tossed repeatedly. Let Ak be the event
that a sequence of k (or more) heads occurs among the tosses numbered 2^k , 2^k + 1, 2^k +
2, . . . , 2^{k+1} − 1. Prove that

Pr( Ak i.o.) = 1 if p ≥ 1/2,  and  Pr( Ak i.o.) = 0 if p < 1/2    (30)

Let Ek,i be the event that there is a string of k heads at times j for

2^k + i ≤ j < 2^k + i + k    (31)

Observe Pr( Ek,i ) = p^k . If mk = 2^k − k and i ≤ mk then Ek,i ⊂ Ak . Furthermore
Ak ⊂ ∪_{i=0}^{mk} Ek,i , since if there’s a string of k heads in this range, it has to start
somewhere. Thus by the union bound

Pr( Ak ) ≤ ∑_{i=0}^{mk} Pr( Ek,i ) = (mk + 1) · Pr( Ek,i ) ≤ (2p)^k    (32)

If p < 1/2 then ∑_k Pr( Ak ) < ∞. By Borel–Cantelli 1, we must have Pr( Ak i.o.) = 0.


Now consider Ek,0 , Ek,k , Ek,2k , . . . so long as ik ≤ mk . These sets are independent
and
Ak ⊃ ∪_{i=0}^{⌊mk /k⌋} Ek,ik    (33)
Taking the inclusion-exclusion inequality,

Pr( Ak ) ≥ ∑_i Pr( Ek,ik ) − ∑_{i<j} Pr( Ek,ik ∩ Ek,jk )    (34)

All of the terms in the first sum are p^k and there are ⌊mk /k⌋ ≥ 2^k /k − 2 terms. All the
terms in the second sum are p^{2k} , since the Ek,ik are independent, and there are
(⌊mk /k⌋ choose 2) ≤ 2^{2k} /(2k²) of them. Therefore when p = 1/2

Pr( Ak ) ≥ (2^k /k − 2) p^k − (2^{2k} /(2k²)) p^{2k} = 1/k − 2^{−(k−1)} − 1/(2k²)    (35)

Therefore ∑ Pr( Ak ) = ∞, since this lower bound is the harmonic series minus a con-
vergent series. Since the Ak are independent (they depend on disjoint blocks of tosses),
the second Borel–Cantelli lemma says that Pr( Ak i.o.) = 1.
As a function of p, the probability Pr( Ak i.o.) must be non-decreasing, since increasing
the probability of heads increases the likelihood of the event. However we just showed
this probability is 1 for p = 1/2. Therefore the probability is 1 for any p > 1/2. 
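For finite k the bounds can be compared against direct simulation (NumPy; a sketch with my own names — the event Ak itself is simulated, not the tail event):

import numpy as np

def prob_Ak(k, p, trials=100_000, seed=0):
    """Estimate Pr(A_k): a run of k heads among 2**k consecutive tosses."""
    rng = np.random.default_rng(seed)
    block = rng.random((trials, 2**k)) < p
    win = np.lib.stride_tricks.sliding_window_view(block, k, axis=1)
    return win.all(axis=2).any(axis=1).mean()

for k in [4, 6, 8]:
    lower = 1 / k - 2.0**-(k - 1) - 1 / (2 * k * k)
    print(k, lower, "<=", prob_Ak(k, 0.5), "<=", 1.0)  # (35) vs (32) at p = 1/2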

4.5 Prove that if G ∼ N (0, 1) then for x > 0

Pr( G > x ) ≤ (1/( x √(2π))) e^{−x²/2}    (36)

Let Xi ∼ N (0, 1) be i.i.d. Show that L = 1 almost surely, where

L = lim sup( Xn /√(2 log n))    (37)

Let Sn = ∑_i Xi . Recall Sn /√n ∼ N (0, 1). Prove that

|Sn | < 2 √(n log n), eventually, almost surely    (38)

Let’s do a calculation, noting that u/x ≥ 1 in the range of the integral:

∫_x^∞ (e^{−u²/2} /u^k ) du ≤ ∫_x^∞ (u/x^{k+1}) e^{−u²/2} du = e^{−x²/2} /x^{k+1}    (39)

Now integrating by parts with v = −e^{−u²/2} /u, whose derivative is e^{−u²/2} (1 + 1/u²), we get

√(2π) Pr( G > x ) = ∫_x^∞ e^{−u²/2} du = e^{−x²/2} /x − ∫_x^∞ (e^{−u²/2} /u²) du    (40)

Alternately dropping the last term or applying (39) with k = 2, we find

(1/√(2π)) (1/x − 1/x³) e^{−x²/2} ≤ Pr( G > x ) ≤ (1/√(2π)) (1/x ) e^{−x²/2}    (41)
If I ( x ) = x^{−1} e^{−x²/2} , then this shows that for any ε > 0, if x is large enough, then
((1 − ε)/√(2π)) I ( x ) ≤ Pr( G > x ) ≤ (1/√(2π)) I ( x ). Now

I (√(2α log n)) = e^{−α log n} /√(2α log n) = 1/(n^α √(2α log n))    (42)
p
Using the substitution u = √(log x) we can apply the integral test:

∫_a^∞ I (√(2α log x)) dx = ∫_a^∞ dx/( x^α √(2α log x)) = √(2/α) ∫_{√(log a)}^∞ e^{(1−α)u²} du    (43)

Clearly the integral converges if α > 1 and diverges if α ≤ 1, so the same is true of

∑_n Pr( Xn > √(2α log n)) = Θ( ∑_n I (√(2α log n)) )    (44)

The Xn are independent, so we may apply Borel–Cantelli and its converse to conclude,
for α > 1 and α = 1 respectively,

lim sup Xn /√(2α log n) ≤ 1 a.s.    lim sup Xn /√(2 log n) ≥ 1 a.s.    (45)

Let α ↓ 1 to conclude L = 1 almost surely.
Note that we only used independence in the lower bound; the upper bound uses plain-
jane Borel–Cantelli, which does not assume independence. Thus, although the Yn = Sn /√n are a
dependent sequence of N (0, 1) random variables, if we take α = 2 we may still conclude

Yn < 2 √(log n) eventually, a.s.    (46)

The same is true for −Yn , which has the same distribution, so eventually (taking the
greater eventuality) |Yn | < 2 √(log n). This is the same as |Sn | < 2 √(n log n) eventually al-
most surely. In particular, this implies the strong law of large numbers for Xn ∼ N (µ, σ²),
since |Sn /n − µ| < 2σ √(log n / n) → 0 eventually almost surely. 
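The sandwich (41) is easy to check against SciPy's Gaussian tail function (a numerical sketch):

import numpy as np
from scipy.stats import norm

for x in [1.0, 2.0, 4.0, 8.0]:
    I = np.exp(-x * x / 2) / x
    lo = (I - np.exp(-x * x / 2) / x**3) / np.sqrt(2 * np.pi)
    hi = I / np.sqrt(2 * np.pi)
    print(lo, "<=", norm.sf(x), "<=", hi)   # bounds on Pr(G > x)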

4.6 This is a converse to the SLLN. Let Xn be a sequence of i.i.d. RV with E[| Xn |] = ∞.
Prove that

∑_n Pr(| Xn | > kn) = ∞ for every k,  and  lim sup | Xn |/n = ∞ a.s.    (47)

For Sn = ∑_{i=1}^n Xi conclude

lim sup |Sn |/n = ∞ a.s.    (48)
Note that for Z ≥ 0,

⌊ Z ⌋ = ∑_{n∈N} I_{Z ≥ n}    (49)

since each indicator with n ≤ Z adds 1 to the sum and all the other indicators are 0,
and there are ⌊ Z ⌋ such indicators. Taking expectations of both sides and noting that
⌊ Z ⌋ ≤ Z ≤ ⌊ Z ⌋ + 1,

E[ Z ] − 1 ≤ ∑_{n∈N} Pr( Z ≥ n) ≤ E[ Z ]    (50)

For any k ∈ N, take Z = | X1 |/k; since E[ Z ] = ∞ we get that

∑_n Pr(| X1 |/k ≥ n) ≥ E[ Z ] − 1 = ∞    (51)

For i.i.d. Xn , since Pr(| Xn | > α) = Pr(| X1 | > α) for any α ≥ 0, this implies

∑_n Pr(| Xn | > kn) = ∞    (52)

By the converse of Borel-Cantelli we must have | Xn |/n > k infinitely often, and therefore
lim sup| Xn |/n ≥ k for every k ∈ N. We conclude lim sup| Xn |/n = ∞.
Suppose |Sn |/n ≤ k eventually for some k > 0. That is, there exists a k > 0 and N > 0
such that if n ≥ N then |Sn | ≤ kn. We know that almost surely there exists an n ≥ N + 1
such that | Xn | ≥ 2kn. But then
|Sn | ≥ | Xn | − |Sn−1 | ≥ 2kn − k(n − 1) > kn (53)
This is a contradiction. Hence, with probability 1, |Sn |/n > k infinitely often for any k > 0
and therefore
| Sn |
lim sup = ∞ a.s. (54)
n


4.7 What’s fair about a fair game? Let X1 , X2 , . . . be independent random variables
such that

Xn = n² − 1 with probability n^{−2} , and Xn = −1 with probability 1 − n^{−2}    (55)

Prove that E Xn = 0 for all n, but that if Sn = X1 + X2 + · · · + Xn then

Sn /n → −1 a.s.    (56)

An easy calculation shows

E Xn = (n² − 1) · Pr( Xn = n² − 1) − 1 · Pr( Xn = −1) = (1 − n^{−2}) + (n^{−2} − 1) = 0    (57, 58)

Now let An be the event Xn = −1. Since

∑_n Pr( A^c_n ) = ∑_n n^{−2} < ∞    (59)

by Borel–Cantelli, with probability 1, only finitely many of the A^c_n occur. Therefore, with
probability 1, Xn = −1 with a finite number of exceptions. On any such sample, let N be
the largest index such that X_N = N² − 1. Then for n > N

Sn /n = S_N /n + (−1) · (n − N )/n    (60)

As n → ∞, the first term tends to zero because S_N is fixed, and the second term tends
to −1. (Note the independence of the Xn was not needed in this argument.) 
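A sample path makes the conclusion vivid (NumPy; a sketch with my own names):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
k = np.arange(1, n + 1)
X = np.where(rng.random(n) < k**-2.0, k**2 - 1, -1).astype(float)
S = np.cumsum(X)
for m in [100, 1_000, 10_000, 100_000]:
    print(m, S[m - 1] / m)   # tends to -1; early windfalls are diluted away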

4.8 This exercise assumes that you are familiar with continuous-parameter Markov
chains with two states.
For each n ∈ N, let X^{(n)} = { X^{(n)} (t) : t ≥ 0} be a Markov chain with state-space the
two-point set {0, 1} with Q-matrix

Q^{(n)} = [ −an   an ; bn   −bn ],    an , bn > 0    (61)

and transition function P^{(n)} (t) = exp(tQ^{(n)} ). Show that, for every t,

p^{(n)}_{00} (t) ≥ bn /( an + bn ),    p^{(n)}_{01} (t) ≤ an /( an + bn )    (62)

The processes ( X (n) : n ∈ N) are independent and X (n) (0) = 0 for every n. Each X (n)
has right-continuous paths.
Suppose that ∑ an = ∞ and ∑ an /bn < ∞. Prove that if t is a fixed time then

Pr( X (n) (t) = 1 for infinitely many n) = 0 (63)


Use Weierstrass’s M-test to show that ∑_n log p^{(n)}_{00} (t) is uniformly convergent on [0, 1], and
deduce that
Pr( X (n) (t) = 0 for ALL n) → 1 as t ↓ 0 (64)
Prove that
Pr( X (n) (s) = 0, ∀s ≤ t, ∀n) = 0 for every t > 0 (65)
and discuss with your tutor why it is almost surely true that within every non-empty
time interval, infinitely many of the X^{(n)} chains jump.

First let’s calculate the matrix exponential. Writing

Q = [ −a   a ; b   −b ],  we have  Q² = [ a² + ab   −a² − ab ; −ab − b²   ab + b² ] = −( a + b) Q    (66)

Thus, by induction, Q^k = (− a − b)^{k−1} Q, and therefore

exp(tQ) = I + ∑_{k≥1} t^k Q^k /k! = I − ( Q/( a + b)) ∑_{k≥1} t^k (− a − b)^k /k! = I − ((e^{−(a+b)t} − 1)/( a + b)) Q
        = (1/( a + b)) [ b + a e^{−t(a+b)}   a − a e^{−t(a+b)} ; b − b e^{−t(a+b)}   a + b e^{−t(a+b)} ]    (67)

From this (62) is evident.
For fixed t > 0,

∑_n Pr( X^{(n)} (t) = 1) = ∑_n p^{(n)}_{01} (t) ≤ ∑_n an /( an + bn ) ≤ ∑_n an /bn < ∞    (68)
Thus by Borel–Cantelli, almost surely, at any fixed time t only finitely many of the X^{(n)} (t) equal 1.
Now consider the joint probability Π(t) that every X^{(n)} (t) = 0 at time t. This satisfies
the bound

− log Π(t) = − ∑_n log p^{(n)}_{00} (t) ≤ ∑_n log(( an + bn )/bn ) ≤ ∑_n an /bn < ∞    (69)
This gives a uniform bound for the convergence of − log Π(t), which implies the product
converges uniformly on any compact subset of R≥0 , such as [0, 1]. In particular this means
that

lim_{t↓0} Π(t) = Π(0) = ∏_n 1 = 1    (70)
TODO finish this 
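The closed form (67) and the bounds (62) can be verified against SciPy's matrix exponential (a sketch, assuming SciPy):

import numpy as np
from scipy.linalg import expm

a, b, t = 1.3, 0.7, 0.5
Q = np.array([[-a, a], [b, -b]])
P = expm(t * Q)                               # P(t) = exp(tQ)
decay = np.exp(-(a + b) * t)
closed = (np.array([[b, a], [b, a]]) +
          np.array([[a, -a], [-b, b]]) * decay) / (a + b)
print(np.allclose(P, closed))                              # True
print(P[0, 0] >= b / (a + b), P[0, 1] <= a / (a + b))      # True True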

Tail σ-algebras

4.9 Let Y0 , Y1 , Y2 , . . . be independent random variables with

Yn = +1 with probability 1/2, and Yn = −1 with probability 1/2    (71)

Define
Xn = Y0 Y1 · · · Yn    (72)
Prove the Xn are independent. Define

Y = σ(Y1 , Y2 , . . . ),    Tn = σ ( Xr : r > n )    (73)

Prove

L := ∩_n σ(Y , Tn ) ≠ σ( Y , ∩_n Tn ) =: R    (74)
n n

First note Pr( Xn = 1) = Pr( Xn = −1) = 1/2, so Xn has the same distribution as Yn . If we
let X0 = Y0 , this follows by induction since

Xn = Xn−1 with probability 1/2, and Xn = − Xn−1 with probability 1/2    (75)

and so, for example,

Pr( Xn = 1) = (1/2) Pr( Xn−1 = 1) + (1/2) Pr( Xn−1 = −1) = 2 · (1/4) = 1/2    (76)
For m < n and em , en ∈ {−1, 1}, consider Pr({ Xm = em } ∩ { Xn = en }). Clearly this is
the same as Pr({ Xm = em } ∩ { Xm · Xn = em en }). Given Xm and Xn we can multiply them
to generate Xm · Xn . Conversely, given Xm and Xm · Xn we can multiply them to recover
Xn . Note also that
Xm · Xn = (Y0 · Y1 · · · Ym )2 Ym+1 · · · Yn = Ym+1 · · · Yn (77)
Hence Xm and Xm · Xn are independent, since they are functions of independent random
variables. Thus
Pr({ Xm = em } ∩ { Xn = en }) = Pr({ Xm = em } ∩ { Xm · Xn = em en })
= Pr({ Xm = em }) Pr({ Xm · Xn = em en }) (78)
1 1
= · = Pr( Xm = em ) Pr( Xn = en )
2 2
This argument generalizes to any finite set of Xn1 , . . . , Xnk by considering the invertible
transformation

( Xn1 , Xn2 , . . . , Xnk ) ↦ ( Xn1 , Xn1 Xn2 , . . . , Xn1 Xn2 · · · Xnk )    (79)

and noting each of the variables on the right hand side is independent with the same
distribution as the Xn , so the joint probability is 2^{−k} for any fixed values enk . But this is the
same as ∏_k Pr( Xnk = enk ).
Furthermore, Xn is independent of Ym for m ≥ 1. If m > n this is obvious, because Xn is a
function of Y0 , . . . , Yn , which are independent of Ym . If m ≤ n, then consider the
transformation (Ym , Xn ) ↦ (Ym , Xn · Ym ). The second variable is the product Y0 · · · Ym−1 ·
Ym+1 · · · Yn , which is clearly independent of Ym .
Suppose we know the values of Yk for all k ≥ 1 and also some Xr . Then we can
immediately deduce the value of Y0 since
Y0 = Y0 · (Y1 · · · Yr )2 = Xr · Y1 · Y2 · · · Yr (80)
This shows that for any n, Y0 is measurable in the algebra σ(Y1 , . . . , Yn , Xn ) ⊂ σ(Y , Tn−1 ).
Therefore Y0 is measurable in L = ∩_n σ(Y , Tn ).

On the other hand, Y0 is independent of R. Let T = ∩_n Tn be the tail field. By the
above argument, Y0 is independent of σ (Y1 , . . . , Ym , Tn ) for any n > m + 1. Therefore Y0
is independent of Rm = σ (Y1 , . . . , Ym , T ), since it is a subset. But Rm ⊂ Rm+1 , and so Y0 is
independent of σ(∪_n Rn ) = R 

Dominated Convergence Theorem

5.1 Let S = [0, 1], Σ = B(S), µ be the Lebesgue measure. Define f n = nI(0,1/n) . Prove
that f n (s) → 0 for every s ∈ S but that µ( f n ) = 1 for every n. Draw a picture of
g = supn | f n | and show that g ∉ L¹ (S, Σ, µ)

If n > 1/s then f n (s) = 0, because the indicator I(0,1/n) doesn’t put any mass at points
as large as s. So f n (s) = 0 eventually for every s, and lim f n (s) = 0. On the other hand,
µ( f n ) = n µ((0, 1/n)) = n · (1/n) = 1. The function g = supn | f n | looks rather like the
function 1/x near 0, except it has stairsteps: it takes the value n on the interval
(1/(n + 1), 1/n). Now gn = max(| f 1 |, . . . , | f n |) is a simple function which satisfies

µ( gn ) ≥ ∑_{k=1}^n k (1/k − 1/(k + 1)) = ∑_{k=1}^n 1/(k + 1)    (81)

The sum diverges as n ↑ ∞, which means µ( g) also diverges, since it is greater than every µ( gn ).

5.2 Prove inclusion-exclusion by considering integrals of indicator functions

We will use the following property of indicators repeatedly: for events A and B,

I A IB = I A ∩ B (82)

This can be verified by considering the four cases x ∈ ( A ∪ B)c , x ∈ A \ B, x ∈ B \ A and


x ∈ A ∩ B. These four cases partition Ω, and the values of the expressions on both sides of
(82) are equal.
Let A1 , . . . , An be a collection of events. To simplify notation I’m going to write I ( A)
for indicators rather than I A , but the indicator remains a function of ω ∈ Ω. Now by de
Morgan’s law ( A1 ∪ A2 ∪ . . . An )c = A1c ∩ A2c ∩ · · · ∩ Acn .

1 − I ( A1 ∪ · · · ∪ An ) = I (( A1 ∪ · · · ∪ An )^c )
= I ( A1^c ∩ A2^c ∩ · · · ∩ An^c )
= I ( A1^c ) I ( A2^c ) . . . I ( An^c )
= (1 − I ( A1 ))(1 − I ( A2 )) . . . (1 − I ( An ))
= 1 − ∑_i I ( Ai ) + ∑_{i<j} I ( Ai ) I ( A j ) − ∑_{i<j<k} I ( Ai ) I ( A j ) I ( Ak )    (83)
  + · · · + (−1)^n I ( A1 ) I ( A2 ) . . . I ( An )
= 1 − ∑_i I ( Ai ) + ∑_{i<j} I ( Ai ∩ A j ) − ∑_{i<j<k} I ( Ai ∩ A j ∩ Ak )
  + · · · + (−1)^n I ( A1 ∩ A2 ∩ · · · ∩ An )

+ · · · + (−1)n−1 I ( A1 ∩ A2 ∩ · · · ∩ An )

Taking expectations of each side yields inclusion-exclusion:

Pr( A1 ∪ · · · ∪ An ) = ∑_i Pr( Ai ) − ∑_{i<j} Pr( Ai ∩ A j ) + ∑_{i<j<k} Pr( Ai ∩ A j ∩ Ak )
                       − · · · + (−1)^{n−1} Pr( A1 ∩ A2 ∩ · · · ∩ An )    (84)

Now we prove the inclusion-exclusion inequalities. The d-depth inequality on n terms
on the indicator functions is given by

I ( A1 ∪ · · · ∪ An ) ⋚ ∑_i I ( Ai ) − ∑_{i<j} I ( Ai ∩ A j ) + ∑_{i<j<k} I ( Ai ∩ A j ∩ Ak )
                      − · · · + (−1)^{d−1} ∑_{i1<i2<···<id} I ( Ai1 ∩ Ai2 ∩ · · · ∩ Aid )    (85)

where the inequality is ≤ if d is odd and ≥ if d is even. For each d we induct on n. Clearly
it’s true for d = 1 and n = 2, since by inclusion-exclusion

I ( A1 ∪ A2 ) = I ( A1 ) + I ( A2 ) − I ( A1 ∩ A2 ) ≤ I ( A1 ) + I ( A2 )    (86)

Then it’s true inductively for any n and d = 1, assuming it’s true for n − 1:

I ( A1 ∪ · · · ∪ An ) ≤ I ( A1 ∪ · · · ∪ An−1 ) + I ( An )
                     ≤ I ( A1 ) + · · · + I ( An−1 ) + I ( An )    (87)

For arbitrary d, the inequality is true for n ≤ d, since in this case it is just the inclusion-
exclusion equality and no terms are truncated. So inductively, if it is true for n − 1, we can
group together A1 ∪ A2 and treat it as one set:

I ( A1 ∪ · · · ∪ An ) ⋚ I ( A1 ∪ A2 ) + ∑_i I ( Ai )
  − ∑_i I (( A1 ∪ A2 ) ∩ Ai ) − ∑_{i<j} I ( Ai ∩ A j )
  + ∑_{i<j} I (( A1 ∪ A2 ) ∩ Ai ∩ A j ) + ∑_{i<j<k} I ( Ai ∩ A j ∩ Ak )    (88)
  + · · ·
  + (−1)^{d−1} ∑_{i1<···<id−1} I (( A1 ∪ A2 ) ∩ Ai1 ∩ Ai2 ∩ · · · ∩ Aid−1 )
  + (−1)^{d−1} ∑_{i1<···<id} I ( Ai1 ∩ Ai2 ∩ · · · ∩ Aid )

Here indices range from 3 to n since we’ve explicitly broken out 1 and 2. Using the dis-
tributive law, each indicator which includes A1 ∪ A2 can be written as the indicator of a
union of two sets ( A1 ∩ B) ∪ ( A2 ∩ B) where B = Ai1 ∩ · · · ∩ Aim . For all such indica-
tors of m < d terms, use inclusion-exclusion for n = 2 to expand I (( A1 ∪ A2 ) ∩ B) →
I ( A1 ∩ B) + I ( A2 ∩ B) − I ( A1 ∩ A2 ∩ B). After performing all these expansions, we see
we recover most of the terms inclusion-exclusion formula up to level d. This is because

the terms of level m expand to provide all the missing terms in level m which have exactly
one of A1 or A2 , and also to provide all the missing terms in level m + 1 including both
A1 and A2 . Terms with neither A1 or A2 were already present before the expansion.
Finally we extend the inequality by expanding the terms in level d using the inequality
I (( A1 ∪ A2 ) ∩ B) ≤ I ( A1 ∩ B) + I ( A2 ∩ B). This provides the missing terms at level d. This
also extends the inequality in the right direction, ≤ for odd d and ≥ for even d, owing to
the factor (−1)d−1 on these terms.
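Because both sides of (83) are functions on Ω, the identity can be checked mechanically on random indicator data; taking means against the empirical measure reproduces (84) exactly (NumPy; a sketch with my own names):

import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
omega, n = 1000, 4
A = rng.random((n, omega)) < 0.3           # indicator rows for A_1 .. A_n

lhs = A.any(axis=0).mean()                 # Pr(A_1 ∪ ... ∪ A_n)
rhs = sum((-1)**(r - 1) *
          sum(A[list(c)].all(axis=0).mean() for c in combinations(range(n), r))
          for r in range(1, n + 1))
print(np.isclose(lhs, rhs))                # True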


The Strong Law

7.1 Let f be a bounded continuous function on [0, ∞). The Laplace transform of f is the
function L on (0, ∞) defined by
L(λ) = ∫_0^∞ e^{−λx} f ( x ) dx    (89)

Let X1 , X2 , . . . be i.i.d. random variables with the exponential distribution of rate λ, so
Pr( X > x ) = e^{−λx} , E[ X ] = 1/λ, Var( X ) = 1/λ². If Sn = X1 + X2 + · · · + Xn , show that

E f ( Sn ) = ((−1)^{n−1} λ^n /(n − 1)!) L^{(n−1)} (λ)    (90)
Show that f may be recovered from L as follows: for y > 0

f (y) = lim_{n↑∞} (−1)^{n−1} ((n/y)^n /(n − 1)!) L^{(n−1)} (n/y)    (91)

We can write

E f ( Sn ) = ∫_{R^n_+} f ( x1 + x2 + · · · + xn ) λ^n e^{−λx1} e^{−λx2} . . . e^{−λxn} dx1 dx2 . . . dxn
          = λ^n ∫_0^∞ ∫_{u1}^∞ · · · ∫_{un−1}^∞ f (un ) e^{−λun} dun . . . du1    (92)

where we performed a change of variables u1 = x1 , u2 = x1 + x2 , . . . , un = x1 + x2 + · · · +
xn . (Note the factor λ^n coming from the joint density of the Xi .)
To simplify this, we’re going to successively apply Fubini’s theorem. To that end, con-
sider the calculation

∫_0^∞ u^k ( ∫_u^∞ g(v) dv ) du = ∫_0^∞ g(v) ( ∫_0^v u^k du ) dv = ∫_0^∞ (v^{k+1} /(k + 1)) g(v) dv    (93)

So let u = u1 , v = u2 , k = 0 and g(v) = ∫_v^∞ · · · ∫_{un−1}^∞ f (un ) e^{−λun} dun . . . du3 . We are left with

λ^n ∫_0^∞ ∫_{u2}^∞ · · · ∫_{un−1}^∞ u2 f (un ) e^{−λun} dun . . . du2    (94)

We may repeatedly interchange the order of integration, and then integrate out one vari-
able, until we are left with

E f ( Sn ) = ∫_0^∞ (λ^n u_n^{n−1} /(n − 1)!) f (un ) e^{−λun} dun    (95)
Now we need two facts from the theory of Laplace transforms:

(d/dλ) L[ f ] = (d/dλ) ∫_0^∞ f ( x ) e^{−λx} dx = − ∫_0^∞ x f ( x ) e^{−λx} dx = − L[ x f ]    (96, 97)

L[ f ′ ] = ∫_0^∞ f ′ ( x ) e^{−λx} dx = [ f ( x ) e^{−λx} ]_0^∞ + λ ∫_0^∞ f ( x ) e^{−λx} dx = − f (0) + λ L[ f ]    (98, 99)

Iterating (96) gives L^{(n−1)} [ f ] = (−1)^{n−1} L[ x^{n−1} f ]. Comparing this to (95), it is clear that

E f ( Sn ) = ((−1)^{n−1} λ^n /(n − 1)!) L^{(n−1)} [ f ](λ)    (100)

which is (90); the factor λ^n is exactly the normalization of the Gamma(n, λ) density of Sn .


Note E X⁴ = Γ(5) λ^{−4} < ∞. Therefore by the SLLN, Sn /n → 1/λ almost surely. By
continuity, f (Sn /n) → f (1/λ) almost surely. By the bounded convergence theorem (since
| f | ≤ B for some constant B), E f (Sn /n) → f (1/λ). We connect this to our previous
expression by first noting, for α > 0,

L[ f (αx )](λ) = ∫_0^∞ f (αx ) e^{−λx} dx = α^{−1} ∫_0^∞ e^{−(λ/α)u} f (u) du = α^{−1} L[ f ( x )](λ/α)    (101)

where the middle equality comes from the substitution u = αx. From this it is clear that

(d^{n−1}/dλ^{n−1}) L[ f (αx )](λ) = α^{−n} L^{(n−1)} [ f ](λ/α)    (102)

Applying (90) to the function f ( x/n) and then taking y = 1/λ and α = 1/n, we get the desired result

f (y) = lim_{n→∞} E f (Sn /n) = lim_{n→∞} (−1)^{n−1} ((n/y)^n /(n − 1)!) L^{(n−1)} (n/y)    (103)
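Post's inversion formula (91) can be watched converging on a test function. For f(x) = e^{−x} one has L(s) = 1/(1 + s), so L^{(n−1)}(s) = (−1)^{n−1}(n − 1)!/(1 + s)^n in closed form, which the sketch below plugs into (91) (mpmath is used only to tame the huge intermediate factors; the function name is my own):

from mpmath import mp, mpf, factorial, exp

mp.dps = 50

def post_inverse(y, n):
    s = mpf(n) / y
    deriv = (-1)**(n - 1) * factorial(n - 1) / (1 + s)**n   # L^{(n-1)}(n/y)
    return (-1)**(n - 1) * s**n / factorial(n - 1) * deriv

y = mpf("1.5")
for n in [5, 50, 500]:
    print(n, post_inverse(y, n), exp(-y))   # converges to exp(-1.5) = 0.2231...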


7.2 As usual, write Sn−1 = { x ∈ Rn : | x | = 1}. There is a unique probability mea-
sure νn−1 on (Sn−1 , B(Sn−1 )) such that νn−1 ( A) = νn−1 ( H A) for every orthogonal n × n
matrix H and every A in B(Sn−1 ). This is the same as the radial measure in polar coor-
dinates, or the Haar measure under the action of orthogonal matrices.

(a) Prove that if X is a vector in Rn , the components of which are independent N (0, 1)
variables, then for every orthogonal n × n matrix H the vector HX has the same
property. Deduce that X/∥ X ∥ has law νn−1

(b) Let Z1 , Z2 , · · · ∼ N (0, 1) be i.i.d. and let

Rn = ∥( Z1 , . . . , Zn )∥ = ( Z1² + Z2² + · · · + Zn² )^{1/2}    (104)

Show Rn /√n → 1 a.s.
(c) For each n, let (Y1^{(n)} , Y2^{(n)} , . . . , Yn^{(n)} ) be a point chosen on Sn−1 according to the dis-
tribution νn−1 . Then

lim_{n→∞} Pr(√n Y1^{(n)} ≤ x ) = Φ( x ) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy    (105)

lim_{n→∞} Pr(√n Y1^{(n)} ≤ x1 ; √n Y2^{(n)} ≤ x2 ) = Φ( x1 )Φ( x2 )    (106)

(a) Let Y = HX. Each component is a linear combination of Gaussian random vari-
ables, so Y is jointly Gaussian. Let’s compute the moments of the compo-
nents. Note E Y = E HX = H E X = 0. Therefore
Cov Y = E YY^⊤ = E( HX )( HX )^⊤ = E HXX^⊤ H^⊤ = H (E XX^⊤ ) H^⊤ = H I H^⊤ = I    (107)
We’ve used the identity HH^⊤ = I for orthogonal matrices. Thus each Yi ∼ N (0, 1). Fur-
thermore Cov(Yi , Yj ) = 0 if i ≠ j. Since these are jointly Gaussian random variables,
zero correlation is the same thing as independence. Now X/∥ X ∥ ∈ Sn−1 and the law
of X, by the above calculation, is invariant under orthogonal transformations. That
is, HX/∥ HX ∥ has the same distribution as X/∥ X ∥, which is true since HX has the
same distribution as X. We conclude that the law of X/∥ X ∥ is νn−1 by the uniqueness
of this measure.
(b) Note E Zk² = 1, since this is just the variance of Zk . Also E Zk⁸ < ∞, since a Gaussian
has finite moments of all orders. By the SLLN applied to the random variables Zk² ,

Rn² /n = ( ∑_{k=1}^n Zk² )/n → 1 a.s.    (108)

By continuity of square roots, it must also be the case that Rn /√n → 1 almost surely.

(c) Note Y^{(n)} has the same law as X/∥ X ∥ = X/Rn , so √n Y^{(n)} has the same law as
X/( Rn /√n). As n → ∞, Rn /√n → 1 almost surely, so √n Y1^{(n)} converges in distribution
to X1 . Thus Pr(√n Y1^{(n)} ≤ x ) → Pr( X1 ≤ x ) = Φ( x ). Similarly Pr(√n Y1^{(n)} ≤ x1 ;
√n Y2^{(n)} ≤ x2 ) → Pr( X1 ≤ x1 ; X2 ≤ x2 ) = Φ( x1 )Φ( x2 ).
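Part (c) is pleasant to see numerically: coordinates of a uniform point on a high-dimensional sphere are nearly Gaussian (NumPy/SciPy; a sketch with my own names):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
trials, n = 50_000, 100
Z = rng.standard_normal((trials, n))
Y = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # uniform points on S^{n-1}

x = 1.0
print((np.sqrt(n) * Y[:, 0] <= x).mean(), norm.cdf(x))       # about Φ(1)
both = (np.sqrt(n) * Y[:, 0] <= x) & (np.sqrt(n) * Y[:, 1] <= x)
print(both.mean(), norm.cdf(x)**2)                           # about Φ(1)²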



Conditional Expectation

9.1 Prove that if G is a sub-σ-algebra of F and if X ∈ L¹ (Ω, F , Pr) and if Y ∈
L¹ (Ω, G , Pr) and
E( X; G ) = E(Y; G ) (109)
for every G in a π-system which contains Ω and generates G , then (109) holds for every
G ∈ G.

We make a standard monotone class argument. We will show that the collection of sets
satisfying (109) is a d-system, so by Dynkin’s lemma the hypothesis that the π-system
(call it I ) has the property implies that σ(I ) = G has the property.
(a) By assumption Ω is in the collection, since Ω ∈ I and (109) holds on I .
(b) Suppose A and B both satisfy (109) and A ⊂ B. Then B is the disjoint union of B \ A
and A, so IB\A = IB − I A . Thus by the linearity of expectations

E( X; B \ A) = E( X; B) − E( X; A) = E(Y; B) − E(Y; A) = E(Y; B \ A)    (110, 111)

and the class of sets satisfying (109) is closed under differences of nested sets.
(c) Suppose An ↑ A, where each An satisfies (109). Then | XI An | is dominated by the
integrable | X | I A , and XI An → XI A pointwise. Thus by dominated convergence

E( X; An ) → E( X; A)    (112)

Mutatis mutandis, the same is true for Y, hence

E( X; A) = lim_{n→∞} E( X; An ) = lim_{n→∞} E(Y; An ) = E(Y; A)    (113)

This shows that A is in the class of sets which satisfies (109).


Thus, the class of sets which satisfies (109) is a d-system containing the π-system I , so by
Dynkin’s lemma it includes σ(I ) = G .
I guess another approach is to note that G ↦ E[ X; G ] is a (signed) measure, so we can
use something like theorem 1.6 on the uniqueness of measures defined on π-systems? 

9.2 Suppose that X, Y ∈ L1 (Ω, F , Pr) and that

E( X |Y ) = Y a.s. E(Y | X ) = X a.s. (114)

Prove that Pr( X = Y ) = 1

Recall the defining property of conditional expectation. First, E[Y | X ] is σ( X )-measurable,
and second, for G ∈ σ( X ),

E[E[Y | X ]; G ] = E[Y; G ] (115)

Let A ∈ σ ( X ). From (114) and (115)

E[ X; A] = E[ E[Y | X ]; A] = E[Y; A] (116)

Also, mutatis mutandis, E[Y; B] = E[ X; B] for any B ∈ σ(Y ). For c ∈ R, taking A = { X ≤


c} and B = {Y ≤ c} gives

E( X; X ≤ c) = E(Y; X ≤ c) ⇒
E( X − Y; X ≤ c, Y > c) + E( X − Y; X ≤ c, Y ≤ c) = 0
(117)
E( X; Y ≤ c) = E(Y; Y ≤ c) ⇒
E( X − Y; X > c, Y ≤ c) + E( X − Y; X ≤ c, Y ≤ c) = 0

Therefore
E( X − Y; X ≤ c, Y > c) = E( X − Y; X > c, Y ≤ c) (118)
since both equal E(Y − X; X ≤ c, Y ≤ c). Since the left hand side is non-positive, and the
right hand side is non-negative, both sides must be zero.
I think this clinches it. Note { X > Y } ⊂ ∪_{c∈Q} { X > c, Y ≤ c}. However

E( X − Y; X > Y ) ≤ ∑_{c∈Q} E( X − Y; X > c, Y ≤ c) = 0    (119)

However if Pr( X > Y ) > 0 then E( X − Y; X > Y ) > 0. Thus X ≤ Y almost surely. By
symmetry, Y ≤ X almost surely, and the intersection of these almost sure events is almost
sure. So X = Y almost surely. 

Martingales

10.1 Polya’s urn At time 0 an urn contains 1 black ball and 1 white ball. At each time
t = 1, 2, . . . a ball is chosen at random from the urn and is replaced, and another ball of
the same color is also added. Just after time n there are therefore n + 2 balls in the urn,
of which Bn + 1 is black, where Bn is the number of black balls chosen by time n.
Let Mn = ( Bn + 1)/(n + 2) be the proportion of black balls just after time n. Prove that
M is a martingale.
Prove Pr( Bn = k ) = 1/(n + 1) for 0 ≤ k ≤ n. What’s the distribution of Θ = lim Mn ? Prove
that for 0 < θ < 1,

Nnθ = ((n + 1)!/( Bn !(n − Bn )!)) θ^{Bn} (1 − θ )^{n−Bn}    (120)

defines a martingale N θ

We will consider M with respect to the filtration Fn = σ( B1 , . . . , Bn ). Since σ( Bn−1 ) ⊂ Fn−1 ,

E[ Mn | Fn−1 ] = Pr[ Bn = Bn−1 ] (( Bn−1 + 1)/(n + 2)) + Pr[ Bn = Bn−1 + 1] (( Bn−1 + 2)/(n + 2))
             = ((n − Bn−1 )/(n + 1)) · (( Bn−1 + 1)/(n + 2)) + (( Bn−1 + 1)/(n + 1)) · (( Bn−1 + 2)/(n + 2))    (121)
             = (n + 2)( Bn−1 + 1) / ((n + 2)(n + 1)) = Mn−1

which verifies Mn is a martingale.
Now let ti specify on which draw the ith black ball is selected, and let s j specify on which
draw the jth white ball is selected. That is, if n = 7 and k = 4 and the balls drawn are
bwwbbwb, then the ti are given by t1 = 1, t2 = 4, t3 = 5, t4 = 7 and the s j are given by
s1 = 2, s2 = 3, s3 = 6. The length of each sequence is the total number of balls chosen of
the respective color. Since every draw is either black or white, each index must appear
either in the black sequence or the white sequence; therefore ∪_i {ti } ∪ ∪_j {s j } =
{1, 2, . . . , n}. Let’s compute the probability of a specific sequence:

P = ∏_{i=1}^k ( i/(ti + 1) ) · ∏_{j=1}^{n−k} ( j/(s j + 1) ) = k!(n − k )!/(n + 1)! = (1/(n + 1)) (n choose k)^{−1}    (122)

We’ve learned a remarkable fact: the probability doesn’t depend on the order of the
draws, just the total number which are black. The sequence is exchangeable: any per-
mutation of the outcomes results in an event with the same probability. There are (n choose k)
sequences with k black balls and n − k white ones, so

Pr( Bn = k ) = (n choose k) (1/(n + 1)) (n choose k)^{−1} = 1/(n + 1)    (123)

Thus we’ve learned another remarkable fact: the distribution of Bn is uniform on {0, 1, . . . , n}.
Thus Θ = lim_{n→∞} Mn must have the uniform distribution U (0, 1).
Turning to Nnθ , let B be the event that a black ball is drawn on the nth draw, i.e. that
Bn = Bn−1 + 1, and let W = B^c be the event that a white ball is drawn. Then

E[ Nnθ | Fn−1 ] = Pr(W ) ((n + 1)!/( Bn−1 !(n − Bn−1 )!)) θ^{Bn−1} (1 − θ )^{n−Bn−1}
              + Pr( B) ((n + 1)!/(( Bn−1 + 1)!(n − Bn−1 − 1)!)) θ^{Bn−1 +1} (1 − θ )^{n−Bn−1 −1}
              = (n!/( Bn−1 !(n − 1 − Bn−1 )!)) θ^{Bn−1} (1 − θ )^{n−1−Bn−1} ×    (124)
                ( Pr(W ) (1 − θ )(n + 1)/(n − Bn−1 ) + Pr( B) θ (n + 1)/( Bn−1 + 1) )

The factor before the parentheses is Nn−1^θ . The expression in the parentheses is

((n − Bn−1 )/(n + 1)) ((1 − θ )(n + 1)/(n − Bn−1 )) + (( Bn−1 + 1)/(n + 1)) (θ (n + 1)/( Bn−1 + 1)) = 1 − θ + θ = 1    (125)



So the martingale property is proved
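Both remarkable facts are visible in simulation (plain Python; my own function names): the empirical law of Bn is flat, so Mn settles down to a uniform random limit Θ.

import random

def polya(n, seed):
    rng = random.Random(seed)
    black, total, drawn = 1, 2, 0
    for _ in range(n):
        if rng.random() < black / total:
            black += 1
            drawn += 1
        total += 1
    return drawn                  # B_n, the number of black balls drawn

n, trials = 50, 100_000
counts = [0] * (n + 1)
for i in range(trials):
    counts[polya(n, i)] += 1
print(min(counts) / trials, max(counts) / trials, 1 / (n + 1))  # all about 1/51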

10.2 Belman’s optimality principle Your winnings per unit stake on game n are en
where en are i.i.d. RV with distribution
en = +1 with probability p, and en = −1 with probability q    (126)

Your stake Cn on game n must lie between 0 and Zn−1 , where Zn−1 is your fortune at
time n − 1. Your object is to maximize the expected interest rate E log( ZN /Z0 ) where
N is a given integer representing the number of times you will play and Z0 , your initial
wealth, is a given constant. Let Fn = σ(e1 , . . . , en ) be your history up to time n. Show
that if C is any previsible strategy, then log Zn − nα is a supermartingale where α denotes
the entropy
α = p log p + q log q + log 2 (127)
so that E log( ZN /Z0 ) ≤ Nα, but that, for a certain strategy, log Zn − nα is a martingale.
What is the best strategy?

For a discrete distribution ( p1 , . . . , pn ) on {1, 2, . . . , n}, let’s maximize ∑_i pi log xi
subject to the constraint ∑_{i=1}^n xi = K. Using the method of Lagrange multipliers, we
maximize the expression

F = ∑_{i=1}^n pi log xi − λ ( ∑_{i=1}^n xi − K )    (128)

over the n + 1 variables x1 , x2 , . . . , xn , λ. (Maximizing with respect to λ is just the same
as respecting the constraint.) The first order conditions are

∂F/∂xi = 0  ⇒  pi /xi = λ    (129)

Summing over the equations, 1 = ∑_i pi = λ ∑_i xi = λK. Thus xi = Kpi is the extremal
solution. This is a maximum, as can be seen from the second order conditions
∂²F/∂xi² = − pi /xi² < 0. The maximum value is given by

∑_i pi log xi = ∑_i pi (log pi + log K ) = − H ( p) + log K    (130)

where H ( p) = − ∑_i pi log pi is the entropy of the distribution.
Calculating,

log Zn − log Zn−1 = log(( Zn−1 + Cn en )/Zn−1 ) = log(1 + f n en )    (131)

where f n = Cn /Zn−1 . For any choice 0 ≤ Cn ≤ Zn−1 we get 0 ≤ f n ≤ 1, and

E[log Zn − log Zn−1 | Fn−1 ] = p log(1 + f n ) + q log(1 − f n ) ≤ p log p + q log q + log 2    (132)

where the last inequality follows from our previous calculation with K = 2. Therefore
log Zn − nα is a supermartingale for any previsible Cn , and the optimal Cn is given by
Cn = (2p − 1) Zn−1 = ( p − q) Zn−1 , for which log Zn − nα is a martingale. The supermartingale
property implies that E[log ZN − Nα] ≤ E[log Z0 ] and thus E[log( ZN /Z0 )] ≤ Nα 
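The sketch below (NumPy; my own names) plays the Kelly fraction f = p − q and confirms the realized growth rate of log wealth matches the entropy bound α:

import numpy as np

p, q, N, trials = 0.6, 0.4, 1000, 2000
alpha = p * np.log(p) + q * np.log(q) + np.log(2)   # optimal growth rate
rng = np.random.default_rng(0)
e = np.where(rng.random((trials, N)) < p, 1.0, -1.0)

f = p - q                                    # the optimal (Kelly) stake fraction
growth = np.log(1 + f * e).sum(axis=1) / N   # log(Z_N/Z_0)/N per sample path
print(growth.mean(), alpha)                  # both about 0.0201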

10.3 Stopping times. Suppose S and T are stopping times relative to (Ω, F , {Fn }).
Prove that S ∧ T = min(S, T ) and S ∨ T = max(S, T ) and S + T are all stopping times.

First note that for any constant n, { T = n} ∈ Fn . This is { T ≤ n} \ { T ≤ n − 1} and


the former is in Fn and the latter is in Fn−1 ⊂ Fn . Therefore { T ≥ n} ∈ Fn since
{ T ≥ n} = { T = n} ∪ { T > n}. The latter is just { T ≤ n}c ∈ Fn .
Turning to S ∧ T, suppose S ∧ T = n. Without loss of generality, suppose S = n
and T ≥ n. We’ve already shown that each of these sets {S = n} and { T ≥ n} are
in Fn , therefore their intersection is as well. The function S ∨ T is even simpler, since if
S ∨ T = n, without loss of generality assume S = n and T ≤ n. These events are both in
Fn , so their intersection is as well.
For S + T, note that each of S and T are natural numbers. Therefore if S + T = n for
some n ∈ N, there are a finite number of possibilities that S = s and T = t and s + t = n
(namely s ∈ {0, 1, . . . , n} and t = n − s). So {S + T = n} = ∪_{k=0}^n ({S = k} ∩ { T = n − k}),
and each of the sets in this expression is clearly in Fn 

10.4 Let S and T be stopping times with S ≤ T. Define the process 1(S,T ] with parameter
set N via

1(S,T ] (n, ω ) = 1 if S(ω ) < n ≤ T (ω ), and 0 otherwise    (133)
Prove 1(S,T ] is previsible and deduce that if X is a supermartingale then

E( XT∧n ) ≤ E( XS∧n ),  ∀n    (134)
Note
{1(S,T ] (n, ω ) = 1} = {S(ω ) ≤ n − 1} ∩ { T (ω ) ≥ n} (135)
The first event {S ≤ n − 1} ∈ Fn−1 because S is a stopping time. Similarly the second
event { T ≥ n} = { T ≤ n − 1}c ∈ Fn−1 because T is a stopping time. Since 1(S,T ] only
takes the values {0, 1} this proves that ω ,→ 1(S,T ] (n, ω ) is Fn−1 measurable. Hence the
process 1(S,T ) is previsible.
Thus for any supermartingale Xn , the process Yn = 1(S,T ] • Xn is also a supermartin-
gale with value at time n given by
1(S,T ] • Xn = XT ∧n − XS∧n (136)
Note Y0 = 0 (reindexing time if necessary so that S, T ≥ 1). Taking expectations of both
sides and using the supermartingale property of Y gives the desired result:

E Yn = E XT∧n − E XS∧n ≤ E Y0 = 0    (137)


10.5 Suppose that T is a stopping time such that for some N ∈ N and some e > 0 we
have, for every n
Pr( T ≤ n + N | Fn ) > e, a.s. (138)
Show by induction, using Pr( T > kN ) = Pr( T > kN; T > (k − 1) N ), that for k = 1, 2, 3, . . .

Pr( T > kN ) ≤ (1 − e)^k    (139)

Show that E( T ) < ∞

Let Gn = { T > n}. Taking complements, our hypothesis implies

Pr( Gn+N | Fn ) < 1 − e, a.s.    (140)

Taking n = 0 and taking expectations, this implies Pr( GN ) < 1 − e. By induction, using the
relation above for n = N (k − 1), we get

Pr( GNk ) = Pr( GN(k−1)+N | GN(k−1) ) Pr( GN(k−1) ) ≤ (1 − e)(1 − e)^{k−1}    (141)

so the induction holds.


Let’s derive an alternate expression for the expectation:

E T = ∑_{n=1}^∞ n Pr( T = n) = ∑_{n=1}^∞ ∑_{k=1}^n Pr( T = n) = ∑_{k=1}^∞ ∑_{n=k}^∞ Pr( T = n) = ∑_{k=1}^∞ Pr( T ≥ k )    (142)

Also, if m < n then Gm ⊃ Gn , so we have the trivial inequality Pr( Gm ) ≥ Pr( Gn ). Using
this inequality for n = Nk + 1 through n = N (k + 1) − 1 compared with m = Nk, we get

E T = ∑_{n=1}^∞ Pr( T ≥ n) ≤ ∑_{k=0}^∞ N Pr( T ≥ Nk) ≤ N ∑_{k=0}^∞ (1 − e)^k = N/e    (143)

Clearly this is finite. Note that the inequalities used above are true almost surely. How-
ever, we used only a countable number of inequalities, and the countable intersection of
almost sure events is still almost sure. Thus the inequality above also holds almost surely.
The result may be summarized, “so long as there’s a chance something happens, no mat-
ter how small (so long as the chance is bounded below by a positive number), then it will
happen eventually” 

10.6 ABRACADABRA. At each of times 1, 2, 3, . . . a monkey types a capital letter at
random, so the sequence typed is an i.i.d. sequence of RV’s, each chosen uniformly
among the 26 possible capital letters. Just before each time t = 1, 2, . . . a new gambler
arrives. He bets $1 that

the nth letter will be A    (144)

If he loses he leaves. If he wins, he bets his fortune of $26 that

the (n + 1)st letter will be B    (145)

If he loses he leaves. If he wins, he bets his fortune of $26² that

the (n + 2)nd letter will be R    (146)

and so on through ABRACADABRA. Let T be the first time the monkey has produced
the consecutive sequence. Show why

E( T ) = 26^{11} + 26^4 + 26    (147)

Intuitively, the cumulative winnings of all the gamblers form a martingale, since they are
the sum of a bunch of martingales. The cumulative winnings at time T are − T for the
initial stakes placed at each time 1, . . . , T, plus the 26^{11} + 26^4 + 26 payout for the bet
placed on the first A of ABRACADABRA (winning for the whole word), the bet placed on
the second-to-last A (winning for the terminal fragment ABRA), and the bet placed on the
final A (winning for A). For this to be a martingale, the expected cumulative winnings
must be 0, so E T = 26^{11} + 26^4 + 26.
More formally, let Ln be the nth randomly selected letter. The Ln are i.i.d., with uni-
form distribution over { A, B, . . . , Z }. For a sequence of letters a1 a2 . . . a11 consider the
martingale

M^{(j)}_n = 1        if n ≤ j
        = 26^k     if n = j + k, k ≤ 11, and L_j = a1 , L_{j+1} = a2 , . . . , L_{j+k−1} = ak    (148)
        = 26^{11}  if n > j + 11 and L_j = a1 , L_{j+1} = a2 , . . . , L_{j+10} = a11
        = 0        otherwise

This is bounded, and hence is in L¹. M^{(j)}_n satisfies the martingale property since, except
for case 2, it is constant, and in that case

E[ M^{(j)}_{n+1} | Fn ] = (1/26) · 26 M^{(j)}_n + (25/26) · 0 = M^{(j)}_n    (149)

Now form the martingale N^{(j)} = I_{n≥j} • M^{(j)} , where I_{n≥j} is the indicator for n ≥ j, and
form the martingale

B = ∑_{j≥1} N^{(j)}    (150)
Now form the martingale N ( j) = In≥ j • M( j) where In≥ j is the indicator for n ≥ j. Now
form the martingale
B = ∑ N ( j) (150)
j ≥1

At any time n only finitely many of the terms are non-zero. Each term is in L1 , so B is
also in L1 . The martingale property holds since B is the sum of martingales. For n ∈ N

For n ∈ N, let S be the set of j ≤ n such that M^{(j)}_n is in case 2, matching some number of
letters, and let S′ be the set of j ≤ n such that M^{(j)}_n is in case 3, matching all the letters.
The value of B is given by

Bn = −n + ∑_{j∈S} 26^{n−j} + ∑_{j∈S′} 26^{11}    (151)

Let T be the stopping time for the first time some M^{(j)} enters case 3, that is, M^{(j)}
matches all 11 letters in ABRACADABRA. It must be that E| T | < ∞, because T satisfies
the property in 10.5 with N = 11 and e = 26^{−11} ; i.e., the word is completed if the next 11
letters are ABRACADABRA in order. When the stopping time happens, BT has determined
values for the second and third terms: S = { T − 4, T − 1}, since the terminal ABRA and A
match M^{(T−4)} and M^{(T−1)} respectively, and S′ has exactly one element, since this is the
first time ABRACADABRA appeared. Hence by Doob’s optional stopping theorem

0 = E B0 = E BT = − E T + 26^4 + 26 + 26^{11}    (152)

This is the same as (147) 
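Simulating the full 26-letter problem is impractical (E T ≈ 26^{11}), but the same argument predicts E T = 2³ + 2 = 10 for the word ABA over a two-letter alphabet, since the overlaps of ABA with itself are ABA and A. A sketch (plain Python; my own names):

import random

def wait_time(word, alphabet, rng):
    buf, t = "", 0
    while not buf.endswith(word):
        buf += rng.choice(alphabet)
        t += 1
    return t

rng = random.Random(0)
trials = 100_000
print(sum(wait_time("ABA", "AB", rng) for _ in range(trials)) / trials)  # about 10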

10.7 Gambler’s Ruin. Suppose that X1 , X2 , . . . are i.i.d. RV’s with

Pr[ X = +1] = p, Pr[ X = −1] = q, where 0 < p = 1 − q < 1 (153)

and p ≠ q. Suppose that a and b are integers with 0 < a < b. Define

S n = a + X1 + · · · + X n , T = inf{n : Sn = 0 or Sn = b} (154)

Let F0 = {∅, Ω} and Fn = σ ( X1 , . . . , Xn ). Explain why T satisfies the condition in 10.5.


Prove that

Mn = (q/p)^{Sn}  and  Nn = Sn − n( p − q)    (155)

define martingales M and N. Deduce the values of Pr(ST = 0) and E ST .

Let’s compute:

E[ Mn | Fn−1 ] = p (q/p)^{Sn−1 +1} + q (q/p)^{Sn−1 −1} = (q + p) (q/p)^{Sn−1} = Mn−1    (156)

E[ Nn | Fn−1 ] = p(Sn−1 + 1 − n( p − q)) + q(Sn−1 − 1 − n( p − q))
             = ( p + q)Sn−1 + p − q − n( p − q) = Nn−1    (157, 158)

Now the condition described in 10.5 holds with N = b − 1. For any n, suppose that
Xn+1 = Xn+2 = · · · = Xn+b = +1. Conditional on Sn ∈ (0, b), we would then have
Sn+b = Sn + b > b, so Sn+k = b for some k ∈ {1, 2, . . . , b − 1}; in particular, T ≤ n + b − 1.
Since Pr( Xn+1 = Xn+2 = · · · = Xn+b = +1) = p^b > 0, we get

Pr( T ≤ n + b − 1 | Fn ) ≥ p^b    (159)

and the condition is satisfied. We conclude that E T < ∞.
Let π = Pr(ST = 0) = 1 − Pr(ST = b). By Doob’s optional-stopping theorem

(q/p)^a = M0 = E MT = π · 1 + (1 − π ) · (q/p)^b    (160)

or

Pr(ST = 0) = π = ((q/p)^a − (q/p)^b )/(1 − (q/p)^b ),    Pr(ST = b) = 1 − π = (1 − (q/p)^a )/(1 − (q/p)^b )    (161)

Also

a = N0 = E NT = π · 0 + (1 − π ) · b − E T · ( p − q)    (162)

or

E T = ((1 − π ) b − a)/( p − q)    (163)
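The ruin probability (161) checks out against direct simulation (plain Python; my own names):

import random

def ruin_prob(a, b, p, trials=100_000, seed=0):
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        s = a
        while 0 < s < b:
            s += 1 if rng.random() < p else -1
        ruined += s == 0
    return ruined / trials

a, b, p, q = 3, 10, 0.55, 0.45
r = q / p
print(ruin_prob(a, b, p), (r**a - r**b) / (1 - r**b))   # both about 0.48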


10.8 A random number Θ is chosen uniformly in [0, 1] and a coin with probability Θ
of heads is minted. The coin is tossed repeatedly. Let Bn be the number of heads in n
tosses. Prove that Bn has exactly the same probabilistic structure as the ( Bn ) sequence in
10.1 on Polya’s urn. Prove that Nnθ is the regular conditional pdf of Θ given B1 , . . . , Bn
Consider the beta function B(m + 1, n + 1) = ∫_0^1 u^m (1 − u)^n du. Note B(m, n) = B(n, m)
by the change of variables v = 1 − u. Integrating by parts repeatedly,

B(m + 1, n + 1) = ∫_0^1 u^m (1 − u)^n du
              = [ u^{m+1} (1 − u)^n /(m + 1) ]_0^1 + (n/(m + 1)) ∫_0^1 u^{m+1} (1 − u)^{n−1} du
              = (n/(m + 1)) B(m + 2, n) = ( ∏_{k=1}^n (n + 1 − k )/(m + k ) ) B(m + n + 1, 1)    (164)
              = (n! m!/(m + n)!) · (1/(m + n + 1)) = n! m!/(n + m + 1)!

Now, conditional on Θ = θ, Bn ∼ Bin(n, θ ). Thus we can marginalize over θ to find

Pr( Bn = k ) = ∫_0^1 Pr( Bn = k | Θ = θ ) dθ = ∫_0^1 (n choose k) θ^k (1 − θ )^{n−k} dθ
            = (n choose k) B(k + 1, n − k + 1) = 1/(n + 1)    (165)

Compare this with (123). Now

Pr( Bn = k; Bn−1 = k − 1) = ∫_0^1 Pr( Bn = k; Bn−1 = k − 1 | Θ = θ ) dθ
                         = ∫_0^1 θ · (n−1 choose k−1) θ^{k−1} (1 − θ )^{n−k} dθ    (166)
                         = (n−1 choose k−1) B(k + 1, n − k + 1) = k/(n(n + 1))

We’ve used the fact that Pr( Bn = k | Bn−1 = k − 1, Θ = θ ) = θ, since conditional on Θ
the trials are independent with success probability θ. Therefore, using Bayes’ rule,

Pr( Bn = k | Bn−1 = k − 1) = Pr( Bn = k; Bn−1 = k − 1)/ Pr( Bn−1 = k − 1) = k/(n + 1)    (167)

This is exactly the transition rule for Polya’s urn.


Let’s compute the regular conditional pdf for Θ. Let B^{(n)} = ( B1 , B2 , . . . , Bn ) be the
vector of values of Bk for k = 1, . . . , n and let b = (b1 , . . . , bn ) be a realization. Now

Pr( B^{(n)} = b | Θ = θ ) = ∏_{k=1}^n θ^{I (bk = bk−1 +1)} (1 − θ )^{I (bk = bk−1 )}    (168)

That is, we pick up a factor of θ whenever bk increases (we flipped a head) and 1 − θ when-
ever bk stays the same (we flipped a tail). Clearly this simplifies to

Pr( B^{(n)} = b | Θ = θ ) = θ^{bn} (1 − θ )^{n−bn}    (169)

Marginalizing over Θ, we compute

Pr( B^{(n)} = b) = ∫_0^1 θ^{bn} (1 − θ )^{n−bn} dθ = bn !(n − bn )!/(n + 1)!    (170)
Now, by Bayes’ rule, the conditional density of Θ given B^{(n)} = b is

Pr(Θ ∈ dθ | B^{(n)} = b) = Pr( B^{(n)} = b | Θ = θ ) Pr(Θ ∈ dθ ) / Pr( B^{(n)} = b)
                        = ((n + 1)!/(bn !(n − bn )!)) θ^{bn} (1 − θ )^{n−bn} dθ    (171)
                        = Nnθ dθ

This proves the last assertion in the problem.

The jailhouse problem.
Γ( x )Γ(y) = ∫_0^∞ s^{x−1} e^{−s} ds ∫_0^∞ t^{y−1} e^{−t} dt
          = ∫_0^∞ ∫_0^∞ s^{x−1} t^{y−1} e^{−s−t} ds dt
          = ∫_0^∞ ∫_0^1 (uv)^{x−1} ((1 − u)v)^{y−1} e^{−v} v du dv    (172)
          = ∫_0^1 u^{x−1} (1 − u)^{y−1} du ∫_0^∞ v^{x+y−1} e^{−v} dv
          = B( x, y)Γ( x + y)

where we made the substitution u = s/(s + t), v = s + t, with Jacobian

∂(u, v)/∂(s, t) = det [ t/(s + t)²   −s/(s + t)² ; 1   1 ] = 1/(s + t) = 1/v    (173)

so that ds dt = v du dv. When x, y ∈ N we recover our previous formula (164), since Γ(n) = (n − 1)! for n ∈ N 

10.9 Show that if X is a non-negative supermartingale and T is a stopping time then

E ( X T ; T < ∞ ) ≤ E X0 (174)

Deduce that

Pr(sup Xn ≥ c) ≤ E X0 /c    (175)

Let Zn = Xn IT <∞ . Then as n → ∞, Zn∧T → XT IT <∞ since if T (ω ) = N then Zn = XT for


all n > N so Zn → XT and if T (ω ) = ∞ then Zn = 0 for all n. Since Xn ≥ 0, by Fatou’s
lemma
E[ XT ; T < ∞] = E lim ZT ∧n = E lim inf ZT ∧n ≤ lim inf E ZT ∧n (176)
so, because Xn ≥ 0 and because of the supermartingale property

E[ XT ; T < ∞] − E[ X0 ] ≤ lim inf E ZT ∧n − E X0


≤ lim inf E XT ∧n − E X0 (177)
n→∞
≤0

This gives us (174).


Now consider the stopping time T = inf{n : Xn ≥ c}. From the definition of T, if T < ∞
then XT ≥ c, so

c Pr(sup Xn ≥ c) = c Pr( T < ∞) ≤ E[ XT I{T<∞} ] ≤ E X0    (178)

This gives us (175). Note this is a maximal version of Markov’s inequality. 

10.10 The “Starship Enterprise” problem. The control system on the star-ship Enter-
prise has gone wonky. All that one can do is to set a distance to be travelled. The
spaceship will then move that distance in a randomly chosen direction, then stop. The
object is to get into the Solar System, a ball of radius ρ. Initially, the Enterprise is at a
distance R0 > ρ from the Sun.
Show that for all ways to set the jump lengths rn

Pr(the Enterprise returns to the solar system) ≤ ρ/R0    (179)

and find a strategy of jump lengths rn so that

Pr(the Enterprise returns to the solar system) ≥ ρ/R0 − e    (180)

Let Rn be the distance after n space-hops. Now f ( x ) = 1/| x | is a harmonic function satis-
fying ∇² f = ∑_i ∂² f /∂xi² = 0 when x ≠ 0. Let r = | x |, let S( x, r ) be the 2-sphere of radius r
centered at x, let S(r ) = S(0, r ), and let dσ be the surface area element. Similarly, let
B( x, r ) be the ball of radius r centered at x. Then

∂ f /∂xi = − xi /r³    ∂² f /∂xi² = −1/r³ + 3xi² /r⁵    ∇² f = −3/r³ + 3r²/r⁵ = 0    (181)

Any harmonic function satisfies the mean-value property. Let

A( x, r ) = (1/(4πr²)) ∫∫_{S(x,r)} f dσ = (1/(4π)) ∫∫_{S(1)} f ( x + rn) dσ1    (182)

In the above we applied a dilation to integrate over S(1), with surface element dσ = r² dσ1 .
Then, differentiating under the integral sign with respect to r,

(d/dr ) A( x, r ) = (1/(4π)) ∫∫_{S(1)} ∇ f ( x + rn) · n dσ1 = (1/(4πr²)) ∫∫_{S(x,r)} ∇ f · n dσ
                = (1/(4πr²)) ∫∫∫_{B(x,r)} ∇² f dV = 0    (183)

In the last equation we used Gauss’ law. By continuity A( x, r ) → f ( x ) as r → 0, so
A( x, r ) = f ( x ) for all r.

The probability law of the Enterprise’s position after a jump of size r is the uniform law
(surface measure normalized by 4πr²) on the sphere S( x, r ), so E[1/Rn+1 ] = A( xn , r ) for
f ( x ) = | x |^{−1} . Hence, if r < Rn , by the above calculation

E[ 1/Rn+1 ] = 1/Rn    (184)

since | x |^{−1} is harmonic inside of B( x, r ), and therefore 1/Rn is a martingale.

Before considering the case r > Rn we need to perform a calculation. For any e > 0
and f ( x ) = | x |^{−1} ,

∫∫_{S(e)} ∇ f · n dσ = − ∫∫_{S(e)} ( x · x )/r⁴ dσ = −4πe²/e² = −4π    (185)

since ∇| x |^{−1} = − x/r³ and n = x/r. Remarkably, the integral does not depend on e. Thus
for r > | x |, we can modify (183) to integrate over B( x, r ) \ B(0, e), and then take e → 0.
This gives the formula

(d/dr ) A( x, r ) = 0 if r < | x |, and −1/r² if r ≥ | x |    (186)
Thus, this more complete analysis shows that 1/Rn is a supermartingale for every strategy,
with

E[ 1/Rn+1 ] = 1/max(r, Rn ) ≤ 1/Rn    (187)

By the positive supermartingale maximal inequality (175),

Pr(min_n Rn ≤ ρ) = Pr( sup_n 1/Rn ≥ 1/ρ ) ≤ ρ/R0    (188)
The probability on the left is the probability that the enterprise ever gets into the solar
system, and it applies to any strategy rn of jump-lengths.
To get a lower bound, consider the process with constant jump sizes rn = e for any 0 < e < ρ, and also choose ρ′ > R0. Let τ = inf{n : Rn ≤ ρ} and τ′ = inf{n : Rn ≥ ρ′}, and let t = τ ∧ τ′. First we show Pr(t < ∞) = 1, that is, we hit one barrier or the other eventually. Note the x coordinate of the Enterprise's location follows a random walk, since the x projections of the jumps of length e are i.i.d. Let Xn be the x-coordinate at time n. By the symmetry of each spherical jump, Xn is a martingale, so E Xn = X0. Also, if Jn = Xn − X_{n−1}, note Var(Jn) = e²σ0² for some σ0, since, by dilation symmetry of the spherical jump, the variance scales as e².

Pr(Rn ≥ ρ′) ≥ Pr(|Xn| ≥ ρ′) ≥ Pr(|Xn − X0| > 2ρ′) (189)

However, by the central limit theorem,

Pr(|Xn − X0| > 2ρ′) → 2Φ(−2ρ′/(eσ0√n)) → 1 as n → ∞ (190)
We conclude

Pr(t = ∞) ≤ Pr(τ′ = ∞) = Pr(sup_n Rn < ρ′) ≤ inf_n Pr(Rn < ρ′) = 0 (191)

Now let's consider the stopped martingale Zn = 1/R_{t∧n}. This is a martingale since our choice of e < ρ implies that every jump made before time t has length e < Rn. We can apply Doob's optional stopping theorem since

1/(ρ′ + e) ≤ Zn ≤ 1/(ρ − e) (192)

and therefore Zn is a bounded, positive martingale. We can break the expectation up into two parts, when Rn hits the upper and lower barriers ρ′ and ρ:

1/R0 = E[Zt] = E[Zt ; t = τ] + E[Zt ; t = τ′] ≤ Pr(t = τ)/(ρ − e) + Pr(t = τ′)/ρ′ (193)

where we've estimated the expectations by the simple upper bounds implied by

ρ − e ≤ Rt ≤ ρ if t = τ,    ρ′ ≤ Rt ≤ ρ′ + e if t = τ′ (194)

This gives the lower bound

Pr(t = τ) ≥ (1/R0 − 1/ρ′) / (1/(ρ − e) − 1/ρ′) (195)

Now as ρ′ → ∞, the event {t = τ} is essentially the event that the Enterprise returns to the solar system: it says that Rn ≤ ρ occurs before Rn → ∞. So we can interpret this equation in the limit ρ′ → ∞ to say

Pr(Enterprise returns) ≥ (ρ − e)/R0 (196)


10.11 Star Trek 2, Captain’s Log


Mr Spock and Chief Engineer Scott have modified the control system so that the Enter-
prise is confined to move for ever in a fixed plane passing through the Sun. However,
the next ’hop-length’ is now automatically set to be the current distance to the Sun (’next’
and ’current’ being updated in the obvious way). Spock is muttering something about
logarithms and random walks, but I wonder whether it is (almost) certain that we will
get into the Solar System sometime.

Let f(x) = log|x| = ½ log(x1² + x2²).

∂f/∂x_i = x_i/r²    ∂²f/∂x_i² = 1/r² − 2x_i²/r⁴    ∇²f = 2/r² − 2r²/r⁴ = 0 (197)

so this function is harmonic for x ≠ 0. Thus log Rn is a martingale, so Xn = log Rn − log R_{n−1} has mean zero. Note the law of Rn is uniquely determined by the value of R_{n−1}. However, the law of Xn is independent of the value of R_{n−1}, because

Xn = log Rn − log R_{n−1} = log αRn − log αR_{n−1} for any α > 0 (198)

We can choose α = 1/R_{n−1}. Therefore we conclude the sequence X1, X2, . . . is i.i.d. Furthermore, as we'll show later, E Xn² < ∞, so the random variables have finite variance σ². Consider

Sn = X1 + X2 + · · · + Xn (199)

For any real number α > 0,

Pr(inf_n Sn = −∞) ≥ Pr(Sn ≤ −ασ√n i.o.) (200)

since if the event on the right hand side happens, the event on the left hand side certainly does: Sn takes on values which are arbitrarily negative if it is less than −ασ√n infinitely often.
Let Ek be a sequence of events. Then the event that the Ek occur infinitely often is given by

E_{i.o.} = ⋂_{n∈N} ⋃_{k≥n} Ek (201)

This implies that

Pr(E_{i.o.}) = lim_{n→∞} Pr(⋃_{k≥n} Ek) = lim sup_{n→∞} Pr(⋃_{k≥n} Ek) (202)

The limit exists because the terms are non-increasing, and it equals the lim sup. Since Pr(⋃_{k≥n} Ek) ≥ Pr(En), a lower bound is given by

Pr(E_{i.o.}) ≥ lim sup_{n→∞} Pr(En) (203)

By the central limit theorem,

lim_{n→∞} Pr(Sn/√n ≤ −ασ) = Φ(−α) > 0 (204)

and therefore Pr(inf_n Sn = −∞) > 0. This implies that Pr(inf_n Sn = −∞) = 1 by Kolmogorov's 0-1 law, since we will show this event is in the tail field. Since Sn = log Rn − log R0, it follows that inf_n Rn = 0 almost surely, so the Enterprise is (almost) certain to get into the Solar System.
There are two loose ends to tie up. First we will demonstrate that the event E = {inf Sn = −∞} is in the tail field. On E we may select times nk such that S_{nk} ≤ −k. Now replace X1, . . . , Xm by X1′, . . . , Xm′, and let

Sn′ = X1′ + X2′ + · · · + Xn′ if n ≤ m;  X1′ + · · · + Xm′ + X_{m+1} + · · · + Xn if n > m (205)

If D = ∑_{i=1}^m (Xi′ − Xi), then for n > m, Sn′ = Sn + D. From this it is clear that S′_{nk} ≤ −k + D → −∞, so inf Sn′ = −∞ as well. Therefore E is in σ(X_{m+1}, X_{m+2}, . . . ), since it is independent of the values of X1, . . . , Xm. Since this is true for any m, E is in the tail field.
Finally we show that σ² = Var(Xi) < ∞; it is enough to show E Xi² < ∞. Without loss of generality, assume R_{n−1} = 1, and we'll compute the pdf of R = Rn. Let Θ be the direction the Enterprise travels, with Θ = 0 corresponding to the direction toward the center of the solar system. By hypothesis Θ ∼ U(0, 2π). Now R is the base of an isosceles triangle with equal edges 1 and opposite angle Θ, so

R = 2 sin(Θ/2)  ⇒  p_R(r) = 2/(π√(4 − r²)) (206)

where p_R(r) is the pdf of R. Thus we want to compute (substituting r = e^u)

E(log R)² = ∫₀² 2(log r)²/(π√(4 − r²)) dr = ∫_{−∞}^{log 2} 2u²e^u/(π√(4 − e^{2u})) du (207)

The left tail, when r is close to zero, is equivalent to the transformed integral when u is close to −∞. There the integrand resembles a constant times u²e^u, so the integral resembles the tail of Γ(3) = ∫₀^∞ u²e^{−u} du, and is therefore finite. On the right tail, when r → 2, the integrand resembles a constant times 1/√(4 − r²), which has antiderivative arcsin(r/2), and therefore a finite integral on [2 − e, 2]. On the intermediate range [e, 2 − e], the integrand is continuous and hence bounded, so the integral is bounded there too. Thus E(log R)² < ∞ 

12.1 Branching process. A branching process Z = {Zn : n ≥ 0} is constructed in the usual way. Thus, a family {X_k^{(n)} : n, k ≥ 0} of i.i.d. Z+-valued random variables is supposed given. We define Z0 = 1 and recursively

Z_{n+1} = X_1^{(n+1)} + · · · + X_{Zn}^{(n+1)} (208)

Assume that if X denotes any of the X_k^{(n)} then

μ = E X < ∞ and 0 < σ² = Var(X) < ∞ (209)

Prove that Mn = Zn/μⁿ defines a martingale M relative to the filtration Fn = σ(Z0, Z1, . . . , Zn). Show that

E(Z²_{n+1} | Fn) = μ²Zn² + σ²Zn (210)

and deduce that M is bounded in L² iff μ > 1. Show that when μ > 1,

Var(M∞) = σ²/(μ(μ − 1)) (211)

Note that since E(X_k^{(n+1)} | Fn) = E X_k^{(n+1)} = μ and since Zn is Fn-measurable,

E(Z_{n+1} | Fn) = ∑_{k=1}^{Zn} E(X_k^{(n+1)} | Fn) = Zn μ (212)

From this it immediately follows that Mn is a martingale

E( Mn+1 | Fn ) = E( Zn+1 /µn+1 | Fn ) = Zn /µn = Mn (213)


Similarly, since Var(X_k^{(n+1)} | Fn) = σ²,

Var(Z_{n+1} | Fn) = ∑_{k=1}^{Zn} Var(X_k^{(n+1)} | Fn) = Zn σ² (214)

But by definition, Var(Z_{n+1} | Fn) = E(Z²_{n+1} | Fn) − (E(Z_{n+1} | Fn))², so

E(Z²_{n+1} | Fn) = (E(Z_{n+1} | Fn))² + Zn σ² = Zn²μ² + Zn σ² (215)

From this we get

E(M²_{n+1} | Fn) = μ²Zn²/μ^{2n+2} + Znσ²/μ^{2n+2} = Mn² + Mn σ²/μ^{n+2} (216)

Thus, using E Mn = M0 = 1,

b_{n+1} = E(M²_{n+1} − Mn²) = (σ²/μ^{n+2}) E Mn = σ²/μ^{n+2} (217)
Note bn is a geometric series and therefore ∑n bn < ∞ iff μ > 1. By the theorem in section 12.1, M is bounded in L² iff ∑n bn < ∞, and in that case it converges almost surely to some random variable M∞ with E M∞² = ∑n bn + M0². When μ > 1 we can sum the series:

Var(M∞) = E M∞² − (E M∞)² = (σ²/μ²)/(1 − μ^{−1}) = σ²/(μ(μ − 1)) (218)
since E M∞ = M0 
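As an aside, (211) is easy to test numerically. Below is a small simulation sketch under the assumption of Poisson(μ) offspring (so σ² = μ and the predicted limit variance is 1/(μ − 1)); the parameters are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, n_gen, n_runs = 2.0, 20, 20_000

Z = np.ones(n_runs, dtype=np.int64)
for _ in range(n_gen):
    # a sum of Z i.i.d. Poisson(mu) offspring counts is Poisson(mu * Z)
    Z = rng.poisson(mu * Z)
M = Z / mu ** n_gen
print(f"Var(M_n) ~ {np.var(M):.3f}, predicted sigma^2/(mu(mu-1)) = {mu / (mu * (mu - 1)):.3f}")
```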

12.2 Let E1, E2, . . . be independent events with Pr(En) = 1/n. Let Yk = I_{Ek}. Prove that ∑(Yk − 1/k)/log k converges a.s. and use Kronecker's Lemma to deduce that

Nn/log n → 1 a.s. (219)

where Nn = Y1 + · · · + Yn.
First note E Yk = Pr(Ek) = 1/k and

Var Yk ≤ E Yk² = E I²_{Ek} = E I_{Ek} = Pr(Ek) = 1/k (220)

Consider the random variable Zk = (Yk − k^{−1})/log k. Note that E Zk = 0 and

σk² = Var Zk = Var(Yk)/(log k)² ≤ 1/(k(log k)²) (221)
Now ∫ dx/(x(log x)²) = −1/log x, so by the integral test, ∑k σk² < ∞. Also the Zk are independent, so by theorem 12.2,

∑k Zk = ∑k (Yk − 1/k)/log k (222)
converges almost surely. By Kronecker’s lemma, this means

(∑_{k=1}^n (Yk − k^{−1}))/log n = (Nn − Hn)/log n → 0 a.s. (223)

where Nn = Y1 + · · · + Yn and Hn = 1 + 1/2 + · · · + 1/n is the harmonic series. Now Hn = log n + γ + o(1), where γ is the Euler-Mascheroni constant. In particular Hn/log n → 1.
Therefore
Nn
→1 a.s. (224)
log n
Note that the events of 4.3 (that there is a "record" at time k among X1, X2, . . . drawn i.i.d. from a continuous distribution) satisfy the hypotheses of this problem: in 4.3 we show the events are independent, and that they have probability 1/k. 
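A quick empirical aside on the record example (a sketch of mine, assuming Python/numpy): the number of records Nn among n i.i.d. uniforms should be close to log n, since E Nn = Hn ≈ log n + γ.

```python
import numpy as np

rng = np.random.default_rng(4)
n, runs = 100_000, 200
counts = []
for _ in range(runs):
    x = rng.random(n)
    counts.append(np.sum(x == np.maximum.accumulate(x)))  # record at k: new running max
print(f"mean N_n = {np.mean(counts):.2f}, log n + gamma = {np.log(n) + 0.5772:.2f}")
```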

12.3 Star Trek 3. Prove that if the strategy in 10.11 is (in the obvious sense) employed –
and for ever – in R3 rather than in R2 , then

∑ Rn^{−2} < ∞ a.s. (225)

where Rn is the distance from the Enterprise to the Sun at time n.

Let En be the event that the Enterprise returns to the solar system for the first time at time n. The area of a cap on a sphere of radius R with central angle θ is 2πR²(1 − cos θ), and therefore the probability of a uniform jump landing in that cap is ½(1 − cos θ). The area formula can be seen by integrating the surface area element R² sin θ dθ dφ, or by using Archimedes' observation that the projection of a sphere onto an enclosing cylinder preserves surface area. For the jump strategy described in this problem, the radius of the jump sphere for jump n is R_{n−1}, the distance from the Sun. The probability that we enter the solar system corresponds to the cap whose edge is at distance ρ from the Sun. Thus there is an isosceles triangle with equal sides R_{n−1} and base ρ, and the angle opposite the base is θ; by the law of cosines, ρ² = 2R²_{n−1} − 2R²_{n−1} cos θ. Let T be the stopping time at which the Enterprise returns to the solar system, T = inf{n : Rn ≤ ρ}. This implies

ξk = Pr(Ek | F_{k−1}) = ρ²/(4R²_{k−1}) if k ≤ T;  0 otherwise (226)

Using the notation of theorem 12.15, Zn = ∑_{k≤n} I_{Ek} ≤ 1 for all n. Let Y∞ = ∑_{k=1}^∞ ξk = ∑_{k=1}^T ρ²/(4R²_{k−1}). If Y∞ = ∞ then 12.15(b) implies that lim_{n→∞} Zn/Yn = 1, but this is a contradiction because Zn is bounded. Therefore Y∞ < ∞ almost surely.
TODO: to clinch the argument I want to take ρ → 0 and T → ∞, but the sum also gets bigger in this limit, since it includes the terms where Rn is really close to 0.


13.1 Prove that a class C of RVs is UI if and only if both of the following conditions hold

(i) C is bounded in L1 , so that A = sup{ E(| X |) : X ∈ C} < ∞

(ii) for every e > 0, ∃δ > 0 such that if F ∈ F , Pr( F ) < δ and X ∈ C then E(| X |; F ) < e

Assume the conditions are true. Let

B_K = sup_{X∈C} Pr(|X| > K) (227)

Now for any X ∈ C, Markov's inequality implies

Pr(|X| > K) ≤ E|X|/K ≤ A/K (228)
Taking the supremum over all X ∈ C , this shows BK ≤ A/K. Therefore BK → 0 as K → ∞.
For any e > 0, there is a δ satisfying condition (ii). Take K large enough that BK < δ. From
the definition of BK , all of the sets F = {| X | > K } satisfy Pr( F ) < δ, and hence condition
(ii) implies E(| X |; | X | > K ) < e for each X ∈ C . This verifies that C is UI.
Conversely assume C is UI. For any K > 0 and F ∈ F , note

IF ≤ IF ( I|X |>K + I|X |≤K ) ≤ I|X |>K + I|X |≤K IF (229)

Multiplying both sides by |X| and taking expectations gives

E(|X|; F) ≤ E(|X|; |X| > K) + E(|X|; F ∩ {|X| ≤ K}) ≤ E(|X|; |X| > K) + K Pr(F) (230)

Hence, given e > 0, choose K such that E(|X|; |X| > K) < e/2 for all X ∈ C. Then choose δ = e/(2K). Equation (230) shows that for this choice of δ, condition (ii) is satisfied. Furthermore, with F = Ω and K chosen so E(|X|; |X| > K) ≤ 1, (230) gives a uniform bound of 1 + K for E(|X|), so condition (i) is satisfied. 

13.2 Prove that if C and D are UI classes of RVs, and if we define

C + D = { X + Y : X ∈ C , Y ∈ D} (231)

then C + D is UI. Hint.

We'll verify the conditions in 13.1. First note that for Z0 ∈ C + D with Z0 = X0 + Y0,

E|Z0| ≤ E|X0| + E|Y0| ≤ sup_{X∈C} E|X| + sup_{Y∈D} E|Y| (232)

The bound on the right hand side is independent of Z0, so it is a bound for sup_{Z∈C+D} E|Z|, and condition (i) is satisfied.
For e > 0, choose δ such that for any X ∈ C and Y ∈ D and F ∈ F with Pr( F ) < δ,
then E(| X |; F ) < 2e and E(|Y |; F ) < 2e (Take δ to be the minimum of the δ needed for C or
D alone). Then for Z = X + Y
e e
E(| Z |; F ) ≤ E(| X |; F ) + E(|Y |; F ) < + =e (233)
2 2
Thus condition (ii) is satisfied for C + D 

13.3 Let C be a UI family of RVs. Say that Y ∈ D if for some X ∈ C and some sub-σ-
algebra G of F , we have Y = E( X | G), a.s. Prove that D is UI.

Let G ∈ G and Y = E( X | G). Then


E(|Y |; G ) = E(|E( X | G)| IG ) = E(|E( XIG | G)|)
(234)
≤ E(E(| XIG | | G)) = E(| X | IG ) = E(| X |; G )
The inequality in the middle is the conditional Jensen's inequality for the convex function x ↦ |x|. When G = Ω, this gives a bound E(|Y|) ≤ E(|X|), so A = sup_{X∈C} E(|X|) is a bound for sup_{Y∈D} E(|Y|), and condition (i) of 13.1 is satisfied. Given e > 0, choose δ > 0 to satisfy condition (ii) of 13.1 for C. Then for any G ∈ G ⊂ F with Pr(G) < δ we have, for any Y ∈ D,

E(|Y|; G) ≤ E(|X|; G) < e (235)
Thus D also satisfies condition (ii) of 13.1. Satisfying both conditions, it is UI. 

14.1 Hunt’s lemma. Suppose that ( Xn ) is a sequence of RV’s such that X = lim Xn
exists a.s. and that ( Xn ) is dominated by Y in ( L1 )+ .

| Xn (ω )| ≤ Y (ω ), ∀(n, ω ), and E (Y ) < ∞ (236)

Let {Fn } be any filtration. Prove that

E( Xn | F n ) → E( X | F ∞ ) a.s. (237)

Let Zm = supr≥m | Xr − X |. This tends to 0 because


lim Xn = X ⇒ lim| Xn − X | = 0 ⇒ lim sup| Xn − X | = 0 (238)
n n n

The last limit is the same as lim_m Zm = 0. Furthermore Zm → 0 in L¹ by dominated convergence: note |X| = lim_n |Xn| ≤ Y, so Zm = sup_{r≥m} |Xr − X| ≤ 2Y, and is dominated by an L¹ function. Therefore E Zm → 0.
Let’s turn to E( Xn | Fn ). Note for any m ≤ n,
|E( Xn | Fn ) − E( X | F∞ )| ≤ |E( X | Fn ) − E( X | F∞ )| + |E( Xn | Fn ) − E( X | Fn )|
≤ |E( X | Fn ) − E( X | F∞ )| + E(| Xn − X | | Fn )
≤ |E( X | Fn ) − E( X | F∞ )| + E( Zm | Fn ) (239)
≤ |E( X | Fn ) − E( X | F∞ )|
+ |E( Zm | Fn ) − E( Zm | F∞ )| + E( Zm | F∞ )
By Doob’s upward convergence theorem, E( X | Fn ) → E( X | F∞ ) and E( Zm | Fn ) →
E( Zm | F∞ ) in L1 and almost surely. By the tower property and positivity,
E|E( Zm | F∞ )| = E Zm → 0 (240)
so the last term converges to 0 in L1 . Furthermore by conditional dominated convergence,
E( Zm | F∞ ) → E(0 | F∞ ) = 0, so the last term converges to 0 almost surely. Therefore
E( Xn | Fn ) → E( X | F∞ ) in L1 and almost surely. 

14.2 Azuma-Hoeffding

(a) Show that if Y is a RV with values in [−c, c] and E Y = 0 then, for θ ∈ R,

E e^{θY} ≤ cosh θc ≤ exp(½θ²c²) (241)
(b) Prove that if M is a martingale null at 0 such that for some sequence (cn : n ∈ N) of positive constants

| Mn − Mn − 1 | ≤ c n , ∀n (242)

then, for x > 0,

Pr(sup_{k≤n} Mk ≥ x) ≤ exp(−½ x² / ∑_{k=1}^n c_k²) (243)

(a) Define α such that Y is the convex combination of c and −c:

α = (Y + c)/(2c) ⇒ Y = cα − c(1 − α) (244)

Since |Y| ≤ c we have α ∈ [0, 1]. By Jensen's inequality, since E α = ½,

E e^{θY} = E e^{θcα − θc(1−α)} ≤ e^{θc} E α + e^{−θc} E(1 − α) = ½(e^{θc} + e^{−θc}) = cosh θc (245)
Now compare the Taylor series of cosh x vs exp(½x²):

cosh x = ∑_k x^{2k}/(2k)!    exp(½x²) = ∑_k x^{2k}/(2^k k!) (246)

For x ≥ 0, we see exp(½x²) dominates cosh x term by term, since the product on the RHS below contains only the even factors:

(2k)! = 2k · (2k − 1) · · · 2 · 1 ≥ 2k · (2k − 2) · · · 2 = 2^k k! (247)

Thus we get the second inequality cosh θc ≤ e^{½θ²c²}.

(b) Now by the conditional Jensen inequality, e^{θMn} is a submartingale for each θ > 0. By Doob's submartingale inequality,

Pr(sup_{k≤n} e^{θMk} ≥ e^{θx}) ≤ e^{−θx} E e^{θMn} (248)

Using the tower property and the inequality from (a),

E e^{θMn} = E(E(e^{θ(Mn − M_{n−1})} e^{θM_{n−1}} | F_{n−1})) ≤ e^{½θ²c_n²} E e^{θM_{n−1}} (249)

Thus by induction E e^{θMn} ≤ e^{½θ²(c_n² + c_{n−1}² + · · · + c_1²)} E e^{θM0} = exp(½θ² ∑k c_k²), since M0 = 0. If we choose θ to minimize exp(−θx + ½θ² ∑k c_k²) we get θ = x/∑k c_k². Plugging this into (248), where we can rewrite the LHS because t ↦ e^{θt} is monotonically increasing,

Pr(sup_{k≤n} Mk ≥ x) ≤ exp(−½ x² / ∑k c_k²) (250)

16.1 Prove that

lim_{T↑∞} ∫₀^T x^{−1} sin x dx = π/2 (251)

Define

f(λ) = L[x^{−1} sin x] = ∫₀^∞ e^{−λx} (sin x)/x dx (252)
Then by (97), f′(λ) = −L[sin x]. We can compute this handily:

f′(λ) = −∫₀^∞ e^{−λx} sin x dx (253)
= [cos x e^{−λx}]₀^∞ + λ ∫₀^∞ e^{−λx} cos x dx (254)
= −1 + λ([sin x e^{−λx}]₀^∞ + λ ∫₀^∞ sin x e^{−λx} dx) (255)
= −1 − λ² f′(λ) (256)

so f′(λ) = −1/(1 + λ²).

From the dominated convergence theorem, f(∞) = 0. Thus we integrate from ∞ to find

f(λ) = ∫_∞^λ −ds/(1 + s²) = π/2 − arctan(λ) (257)
Set λ = 0 to find ∫₀^∞ (sin x)/x dx = π/2.
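A quick numerical aside confirming (251) by the midpoint rule (pure numerics, no claim to rigor; the grid sizes are arbitrary):

```python
import numpy as np

for T in [10.0, 100.0, 1000.0]:
    n = 2_000_000
    dx = T / n
    x = (np.arange(n) + 0.5) * dx  # midpoints avoid the removable singularity at 0
    val = np.sum(np.sin(x) / x) * dx
    print(f"T = {T:6.0f}: integral = {val:.6f}   (pi/2 = {np.pi / 2:.6f})")
```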
A second solution is to use complex analysis. Consider the contour γ in the complex plane along the real line from −T to −e, then a semicircle centered at 0 from −e to e, then along the real line from e to T, then a semicircle centered at 0 from T to −T. The integrand e^{iz}/z is holomorphic inside the region bounded by γ, so the integral around γ is 0 by Cauchy's theorem. The small arc subtends π radians clockwise around the simple pole at 0 (residue 1), so as e → 0 it contributes −iπ; hence the two straight segments together with the large arc satisfy

lim_{e→0} ∫ e^{iz}/z dz = iπ (258)

On the semicircle of radius T, writing z = Te^{iθ} we have dz = iTe^{iθ} dθ, so
|∫₀^π (exp(iTe^{iθ})/(Te^{iθ})) iTe^{iθ} dθ| ≤ ∫₀^π |e^{iT(cos θ + i sin θ)}| dθ = ∫₀^π e^{−T sin θ} dθ (259)

The integrand is dominated by 1, so by the dominated convergence theorem the integral
converges to 0 as T → ∞. Therefore as T → ∞ and e → 0 we have
lim_{T→∞, e→0} ∫ e^{iz}/z dz = ∫_{−∞}^∞ e^{ix}/x dx (260)

and we conclude this integral equals iπ. Making the substitution x → −x we conclude

∫_{−∞}^∞ e^{−ix}/x dx = −iπ (261)

Using the fact sin x = (e^{ix} − e^{−ix})/(2i), this implies

∫_{−∞}^∞ (sin x)/x dx = π ⇒ ∫₀^∞ (sin x)/x dx = π/2 (262)

since the integrand on the left is even. 

16.2 Prove that if Z has U [−1, 1] distribution, then

φZ (θ ) = sin θ/θ (263)

Show there do not exist i.i.d. RV X and Y such that

X − Y ∼ U [−1, 1] (264)

Computing,

φZ(θ) = E e^{iθZ} = ∫_{−1}^1 ½ e^{iθx} dx = [e^{iθx}/(2iθ)]_{x=−1}^1 = (sin θ)/θ (265)
For X and Y i.i.d. and Z = X − Y

φZ (θ ) = E eiθ (X −Y ) = E eiθX e−iθY = E eiθX E e−iθY = φX (θ )φX (−θ ) (266)

where we used the property that E f(X)g(Y) = E f(X) E g(Y) for independent X and Y. However, since φX(−θ) is the complex conjugate of φX(θ), in order for Z = X − Y to satisfy Z ∼ U[−1, 1] it must be that

|φX(θ)|² = (sin θ)/θ (267)
But this is impossible since the left hand side is strictly non-negative whereas the right
hand side is sometimes negative. 

16.3 Calculate φX(θ) where X has the Cauchy distribution, which has pdf 1/(π(1 + x²)). Show that the Cauchy distribution is stable, i.e., that (X1 + X2 + · · · + Xn)/n ∼ X for i.i.d. Cauchy Xi.

First assume θ > 0. Consider the contour γ which goes from −R to R along the real line, and then along a semicircle centered at 0 from R to −R. By Cauchy's theorem,

∫_γ e^{iθz}/(1 + z²) dz = 2πi · e^{−θ}/(2i) = πe^{−θ} (268)

since there is exactly one pole inside, at z = i (the denominator can be written (z − i)(z + i)), and the residue there is e^{−θ}/(2i). Along the semicircle we can use the elementary bound

|∫_{γ′} e^{iθz}/(1 + z²) dz| ≤ πR/(R² − 1) → 0 (269)

since for z = Re^{iψ} the magnitude |exp(iθRe^{iψ})| = e^{−Rθ sin ψ} ≤ 1 (this is where we use the hypothesis θ > 0 and that the path is a semicircle in the upper half plane). Also |1 + z²| ≥ |z|² − 1 = R² − 1. Thus the integrand is bounded by 1/(R² − 1) and the length of the path is πR. Thus in the limit only the segment along the real line makes a contribution to the path integral, and we conclude for θ > 0

φX(θ) = (1/π) ∫_{−∞}^∞ e^{iθx}/(1 + x²) dx = e^{−θ} if θ ≥ 0 (270)
For θ < 0 we can modify the above argument, using a semicircle in the lower half
plane from R to − R instead. Then in Cauchy’s theorem we pick up the residue at −i (in-
stead of i) going along a clockwise path (instead of counterclockwise). Thus the integral
equals πeθ . We get the same bounds on the integral which show the contribution of the
segment along the semicircle tends to 0. Thus
φX(θ) = (1/π) ∫_{−∞}^∞ e^{iθx}/(1 + x²) dx = e^{θ} if θ ≤ 0 (271)

Both cases may be summarized by the formula φX (θ ) = e−|θ | . From this it immediately
follows that X is stable since

φSn /n (θ ) = φX (θ/n)n = e−n|θ/n| = e−|θ | (272)

By the uniqueness of characteristic functions, Sn /n is distributed like a standard Cauchy


random variable.
All that remains is to justify φSn /n (θ ) = φX (θ/n)n . This follows from the following
calculation for i.i.d. Xk :

φSn /n (θ ) = E eiθ (X1 +···+Xn )/n = (E eiθX1 /n )(E eiθX2 /n ) · · · (E eiθXn /n ) = φX (θ/n)n (273)
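Stability is also easy to see empirically; an illustrative sketch of mine (not a proof) comparing quantiles of the mean of n standard Cauchy samples with a single standard Cauchy:

```python
import numpy as np

rng = np.random.default_rng(6)
n, runs = 50, 200_000
means = rng.standard_cauchy((runs, n)).mean(axis=1)
single = rng.standard_cauchy(runs)
for q in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(f"q = {q}: mean-of-{n} {np.quantile(means, q):+.3f}, "
          f"single Cauchy {np.quantile(single, q):+.3f}")
```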

16.4 Suppose that X ∼ N(0, 1). Show that φX(θ) = exp(−½θ²).

Note that

φX(θ) = E e^{iθX} = (1/√(2π)) ∫_{−∞}^∞ e^{iθx} e^{−½x²} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−½(x−iθ)² − ½θ²} dx = e^{−½θ²} I(iθ) (274)

where I(α) = (1/√(2π)) ∫_{−∞}^∞ e^{−½(x−α)²} dx. For a real parameter α, I(α) = I(0) = 1, since we can just perform a change of variables u = x − α, and the range of integration is unchanged.
In the complex case, consider a path integral in C along the rectangle with corners at
1 2
± R and ± R − iθ of the function f (z) = √12π e− 2 z . Since f (z) is entire, the path integral is
0. As R → ∞, the integral along the bottom side from − R − iθ to R − iθ tends to I (iθ ). The
integral along the top side from R to − R tends to − I (0) (since the limits of the integral are
reversed). Along the sides of the rectangle, z = ± R − ix for x ∈ [0, θ ], so the magnitude
of the integrand satisfies

1 − 1 (± R−ix)2
√ e 2 = √1 |e− 21 ( R2 − x2 +2Rxi) | ≤ √1 e− 12 ( R2 −θ 2 ) → 0 (275)
2π 2π 2π
The path along this side is constant length θ as R → ∞. Therefore the contribution to the
path integral from the left and right sides of the rectangle is negligible as R → ∞, by the
simple bound of the path length times the maximum integrand magnitude. We conclude
0 = ∫_γ f(z) dz → I(iθ) − I(0) (276)

Thus I(iθ) = I(0) = 1 and φX(θ) = e^{−½θ²}. 

16.5 Prove that if φ is the characteristic function of a RV X, then φ is non-negative definite, in that for c1, c2, . . . , cn ∈ C and θ1, θ2, . . . , θn ∈ R,

∑_{j,k} cj c̄k φ(θj − θk) ≥ 0 (277)

Consider the RV Z = ∑k ck e^{iθk X}. Now by the positivity of expectations,

E|Z|² ≥ 0 (278)

Since

|Z|² = Z Z̄ = (∑j cj e^{iθj X})(∑k c̄k e^{−iθk X}) = ∑_{j,k} cj c̄k e^{i(θj − θk)X} (279)
Taking expectations of both sides and using linearity yields (277) 

16.6

(a) Let (Ω, F, Pr) = ([0, 1], B[0, 1], Leb). What is the distribution of the random variable Z = 2ω − 1? Let ω = ∑ 2^{−n} Rn(ω) be the binary expansion of ω. Let

U(ω) = ∑_{odd n} 2^{−n} Qn(ω) where Qn(ω) = 2Rn(ω) − 1 (280)

Find a random variable V independent of U such that U and V are identically distributed and U + ½V is uniformly distributed on [−1, 1].

(b) Now suppose that (on some probability triple) X and Y are i.i.d. RV such that

X + ½Y is uniformly distributed on [−1, 1] (281)

Let φ be the characteristic function of X. Calculate φ(θ)/φ(θ/4). Show that the distribution of X must be the same as that of U in part (a), and deduce that there exists a set F ∈ B[−1, 1] such that Leb(F) = 0 and Pr(X ∈ F) = 1.

(a) The Lebesgue measure gives the uniform distribution on [0, 1]. Under the linear trans-
form Z = 2ω − 1 the distribution remains uniform, but over [−1, 1]. To see this note
for any z ∈ [−1, 1]
   
1 1 1
Pr( Z < z) = Pr ω < (z + 1) = Leb 0, (z + 1) = (z − (−1)) (282)
2 2 2
which is exactly the distribution function of U[−1, 1]. Let V be given by a similar expression to U:

V(ω) = ∑_{odd n} 2^{−n}(2R_{n+1}(ω) − 1) = ∑_{even n} 2^{−n+1}(2Rn(ω) − 1) (283)

the difference being that the nth term uses the (n+1)st Rademacher function R_{n+1} rather than Rn. It is a standard result that the Rademacher functions Rn(ω) are i.i.d. The variables U and V are functions of disjoint collections of the Rn, so they are independent. Because their expressions as functions of identically distributed Rademacher functions are the same, the variables have the same distribution. It is also clear that
U + ½V = ∑_{odd n} 2^{−n}(2Rn(ω) − 1) + ∑_{even n} 2^{−n}(2Rn(ω) − 1) = 2(∑_{n∈N} 2^{−n} Rn(ω)) − 1 = 2ω − 1 (284)

Thus U + ½V ∼ U[−1, 1].
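A numerical aside checking part (a) with truncated binary expansions (the 40-bit truncation is an arbitrary choice of mine; U uses the odd-position digits, V the even-position ones shifted as in (283)):

```python
import numpy as np

rng = np.random.default_rng(7)
n_bits, runs = 40, 500_000
Q = 2 * rng.integers(0, 2, size=(runs, n_bits)) - 1  # Q_1, Q_2, ... = +-1 fair coins
w_odd = 2.0 ** -np.arange(1, n_bits + 1, 2)          # 2^-1, 2^-3, 2^-5, ...
U = (Q[:, 0::2] * w_odd).sum(axis=1)                 # sum over odd n of 2^-n Q_n
V = (Q[:, 1::2] * w_odd).sum(axis=1)                 # sum over odd n of 2^-n Q_{n+1}
W = U + V / 2.0
for t in [-0.5, 0.0, 0.5]:
    print(f"P(U + V/2 <= {t:+.1f}): empirical {np.mean(W <= t):.4f}, "
          f"uniform CDF {(t + 1) / 2:.4f}")
```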

(b) From the assumption that Z = X + ½Y has Z ∼ U[−1, 1], we get

φX(θ) φX(θ/2) = ∫_{−1}^1 e^{iθu} du/2 = (sin θ)/θ (285)

From this we deduce φX(θ/2) φX(θ/4) = sin(θ/2)/(θ/2), and hence

φX(θ)/φX(θ/4) = (φX(θ)φX(θ/2))/(φX(θ/2)φX(θ/4)) = sin θ/(2 sin(θ/2)) = cos(θ/2) (286)

Hence, by induction,

φX(θ)/φX(4^{−k}θ) = (φX(θ)/φX(θ/4)) · (φX(θ/4)/φX(θ/16)) · · · (φX(4^{−k+1}θ)/φX(4^{−k}θ)) = cos(θ/2) cos(θ/8) · · · cos(2^{−2k+1}θ) (287)

Since φ is uniformly continuous and φ(0) = 1, we conclude by taking the limit k → ∞ that

φ(θ) = ∏_{odd k} cos(2^{−k}θ) (288)

On the other hand,

φ_{Qn}(θ) = E e^{iθQn} = ½e^{iθ} + ½e^{−iθ} = cos(θ) (289)
Therefore, since U = ∑_{odd k} 2^{−k} Qk and the Qk are independent, we have

φU(θ) = ∏_{odd k} cos(2^{−k}θ) = φX(θ) (290)

So it must be that U and X have the same distribution. TODO: prove the measure 0 thing. It is not hard to show that the values of X lie in a Cantor-like set with middle thirds excluded. However, this approach doesn't use the characteristic function at all... is there some clever thing with Fourier analysis?


18.1

(a) Suppose that λ > 0 and that (for n > λ) Fn is the DF associated with the Binomial distribution Bin(n, λ/n). Prove using CF's that Fn converges weakly to F, where F is the DF of the Poisson distribution with parameter λ.

(b) Suppose that X1 , X2 , . . . are i.i.d. RV’s each with the density function (1 − cos x )/πx2
on R. Prove that for x ∈ R
lim_{n→∞} Pr((X1 + X2 + · · · + Xn)/n ≤ x) = ½ + π^{−1} arctan x (291)

where arctan ∈ (− π2 , π2 )

Consider the Bernoulli random variable Z with parameter p

φZ (θ ) = E eiθZ = peiθ ·1 + qeiθ ·0 = p(eiθ − 1) + 1 (292)

(a) Since Bin(n, p) is the sum of n independent Bernoulli random variables, if Fn ∼ Bin(n, p) = Bin(n, λ/n) then

φ_{Fn}(θ) = (pe^{iθ} + q)^n = (1 + (λ/n)(e^{iθ} − 1))^n → exp(λ(e^{iθ} − 1)) (293)

On the other hand, if F has a Poisson distribution with parameter λ, its CF is given by

φF(θ) = E e^{iθF} = ∑_{k=0}^∞ e^{iθk} e^{−λ} λ^k/k! = exp(λe^{iθ} − λ) (294)

which is exactly the same as lim_{n→∞} φ_{Fn}(θ). Thus we conclude that Fn → F weakly.
(b) Let's compute

φX(θ) = E e^{iθX} = ∫_R e^{iθx} (1 − cos x)/(πx²) dx (295)

To this end, first let's compute, for k ∈ N and α > 0,

Ik(α) = ∫_R x^{−k} e^{iαx} dx (296)

(Aside: there's something subtle going on here because, like the integral in 16.1, this is not integrable in the L¹ sense.) As in problem 16.1, take a contour along the real axis from −R to −e, a semicircle in the upper half plane from −e to e, a line from e to R, and then a semicircle in the upper half plane back to −R. We use a variation of the residue theorem for an infinitesimal arc γ₀ of ∆θ radians about z₀: ∫_{γ₀} f(z)/(z − z₀)^k dz = (i∆θ/(k − 1)!) f^{(k−1)}(z₀). Applied here with f(z) = e^{iαz} and ∆θ = π, this gives

∫_γ e^{iαz}/z^k dz = πi (iα)^{k−1}/(k − 1)! (297)
Using the technique of 16.1, we get an estimate of the integral along the big semicircle:

|∫_{γ′′} e^{iαz}/z^k dz| ≤ ∫₀^π e^{−Rα sin θ}/R^{k−1} dθ → 0 as R → ∞ (298)

Since |x^{−k}| ≤ |x^{−1}| on the big semicircle, the estimate for the contribution of this segment holds. Therefore as R → ∞ and e → 0 we get

∫_γ e^{iαz}/z^k dz → Ik(α) (299)
When α < 0 we can modify the argument by taking semicircles in the lower half plane. This allows us to conclude the integral vanishes along the big semicircle, but now the sense of integration along the small semicircle is reversed, to clockwise from counterclockwise. We can summarize both cases as

Ik(α) = π(iα)^k / ((k − 1)! |α|) (300)

This gives I1(α) = πi sgn(α) and I2(α) = −π|α|.
Expressing cos x = ½(e^{ix} + e^{−ix}), our CF can be written

φX(θ) = (1/π)(I2(θ) − ½(I2(θ + 1) + I2(θ − 1))) = ½(|θ + 1| + |θ − 1|) − |θ| = 1 − |θ| if |θ| ≤ 1;  0 otherwise (301)

Thus φX is a kind of witch's hat centered at 0.


Like in 16.3, for Sn = X1 + · · · + Xn,

φ_{Sn/n}(θ) = φX(θ/n)^n = (1 − |θ|/n)^n if |θ|/n ≤ 1;  0 otherwise (302)

For any fixed θ, for large enough n, |θ|/n ≤ 1, so the first case applies eventually. Taking the limit as n → ∞, this implies

φ_{Sn/n}(θ) → e^{−|θ|} (303)

This is the CF of the Cauchy distribution, so Sn/n converges in distribution to the Cauchy distribution. We conclude

Pr(Sn/n ≤ x) → ∫_{−∞}^x dt/(π(1 + t²)) = ½ + π^{−1} arctan(x) (304)

18.2 Prove the weak law of large numbers in the following form. Suppose X1 , X2 , . . .
are i.i.d. RV each with the same distribution as X. Suppose X ∈ L1 and that E X = µ.
Prove by use of CF’s that the distribution of

Sn = n^{−1}(X1 + · · · + Xn) (305)

converges weakly to the unit mass at µ. Deduce that

Sn → µ in probability (306)

Of course, SLLN implies this weak law.

Let Z ∼ δµ be equal to µ almost surely. Then

φZ (θ ) = E eiθZ = eiθµ (307)

Now consider the Taylor expansion of φX(θ). Note that

φX(0) = 1    φX′(0) = E iXe^{iθX}|_{θ=0} = iμ (308)

Therefore

φX(θ) = 1 + iμθ + o(θ) (309)

So the characteristic function satisfies

φ_{Sn}(θ) = φX(θ/n)^n = (1 + iμθ/n + o(θ/n))^n → exp(iθμ) = φZ(θ) (310)
This implies that Sn → δµ in distribution. However, for a unit mass, convergence in
distribution is the same as convergence in probability since for any e > 0
Pr(Sn < µ − e) → Pr( Z < µ − e) = 0 Pr(Sn > µ + e) → Pr( Z > µ + e) = 0 (311)
and hence Pr(|Sn − µ| > e) → 0 for any e > 0. This proves the weak law of large
numbers. 

Weak Convergence for [0, 1]

18.3 Let X and Y be RV’s taking values in [0, 1]. Suppose that

E Xk = E Yk k = 0, 1, 2, . . . (312)

Prove that

(i) E p( X ) = E p(Y ) for every polynomial p

(ii) E f ( X ) = E f (Y ) for every continuous function f on [0, 1]

(iii) Pr( X ≤ α) = Pr(Y ≤ α) for every α ∈ [0, 1].

Item (i) follows from linearity of expectations. If p(x) = ∑_{k=0}^n ck x^k, then

E p(X) = ∑_{k=0}^n ck E X^k = ∑_{k=0}^n ck E Y^k = E p(Y) (313)

Item (ii) follows from the Weierstrass theorem. On the compact set [0, 1], every continuous function f can be approximated uniformly by a sequence of polynomials pn. Choose N large enough so that for all n > N, |f(x) − pn(x)| < e for all x. In this case

|E f(X) − E pn(X)| ≤ E|f(X) − pn(X)| ≤ e (314)

and the same bound holds with Y in place of X. Therefore

|E f(X) − E f(Y)| ≤ |E f(X) − E pn(X)| + |E pn(X) − E pn(Y)| + |E pn(Y) − E f(Y)| ≤ e + 0 + e = 2e (315)

Since e is arbitrary, this shows E f(X) = E f(Y).
For item (iii), approximate I_{[0,α]} by the continuous piecewise linear function

fn(x) = 1 if x ∈ [0, α);  1 − n(x − α) if x ∈ [α, α + 1/n);  0 if x ∈ [α + 1/n, 1] (316)

Using the dominated convergence theorem,

lim_{n→∞} E fn(X) = E I_{[0,α]}(X) = Pr(X ≤ α) (317)

The same holds for Y. However, E fn(X) = E fn(Y) for all n by part (ii). Therefore the limits are equal and Pr(X ≤ α) = Pr(Y ≤ α). Thus the moments uniquely determine the distribution on [0, 1]. 

18.4 Suppose that (Fn) is a sequence of DFs with

Fn(x) = 0 for x < 0, Fn(1) = 1, for every n (318)

Suppose that

mk = lim_n ∫_{[0,1]} x^k dFn exists for k = 0, 1, 2, . . . (319)

Use the Helly-Bray Lemma and 18.3 to show that Fn →w F, where F is characterized by ∫_{[0,1]} x^k dF = mk, ∀k.

Let Fn be associated with the measure μn. By Helly-Bray there is a subsequence ni such that F_{ni} →w F for some DF F; call the associated measure μ. By the definition of weak convergence, since x^k is a continuous function on [0, 1], and since ni is a subsequence of n,

mk = lim_n μn(x^k) = lim_{ni} μ_{ni}(x^k) = μ(x^k) = ∫_{[0,1]} x^k dF (320)

If another subsequence ni′ satisfies F_{ni′} →w F′ with associated measure μ′, then 18.3 implies that F′ = F, since the above argument shows μ′(x^k) = mk as well, so μ′(x^k) = μ(x^k) for all k.
This implies that Fn →w F. For suppose there is some continuous g : [0, 1] → R such that lim sup μn(g) ≠ μ(g) or lim inf μn(g) ≠ μ(g). Then choose a subsequence ni so that μ_{ni}(g) converges to a value other than μ(g). By Helly-Bray we can take a further subsequence converging to a distribution F′. This distribution must satisfy μ′(g) ≠ μ(g), but this contradicts what we've previously shown, that every convergent subsequence has the same limit. Therefore lim_n μn(g) exists for all g ∈ C[0, 1] and lim μn(g) = μ(g). 


18.5 Improving on 18.3, a moment inversion formula. Let F be a distribution with F(0−) = 0 and F(1) = 1. Let μ be the associated law, and define

mk = ∫_{[0,1]} x^k dF(x) (321)

Define

Ω = [0, 1] × [0, 1]^N,  F = B × B^N,  Pr = μ × Leb^N,
Θ(ω) = ω0,  Hk(ω) = I_{[0,ω0]}(ωk) (322)
This models the situation in which Θ is chosen with law µ, a coin with probability Θ is
minted and tossed at times 1, 2, . . . . See E10.8. The RV Hk is 1 if the kth toss produces
heads, 0 otherwise. Define
Sn = H1 + H2 + · · · + Hn (323)
By the strong law of large numbers and Fubini's theorem,

Sn /n → Θ, a.s. (324)

Define a map D on the space of real sequences (an : n ∈ Z+) by setting

Da = (an − a_{n+1} : n ∈ Z+) (325)

Prove that

Fn(x) = ∑_{i≤nx} (n choose i) (D^{n−i}m)_i → F(x) (326)

at every point of continuity of F.

By the SLLN, we know that, at every point of continuity of F,

F(x) = Pr(Θ ≤ x) = lim_{n→∞} Pr(Sn/n ≤ x) (327)

Conditional on Θ, Sn ∼ Bin(n, Θ), so marginalizing over Θ and using Fubini's theorem we find

Pr(Sn ≤ nx) = ∫_{[0,1]} ∑_{k≤nx} (n choose k) θ^k (1 − θ)^{n−k} μ(dθ) = ∑_{k≤nx} (n choose k) ∫_{[0,1]} θ^k (1 − θ)^{n−k} μ(dθ) (328)
But we can write the integral in terms of the mk. By induction I claim

(D^n m)_k = ∫_{[0,1]} x^k (1 − x)^n μ(dx) (329)

Clearly this is true for n = 0; for n ≥ 1,

(D^n m)_k = (D · (D^{n−1}m))_k = (D^{n−1}m)_k − (D^{n−1}m)_{k+1}
= ∫_{[0,1]} x^k (1 − x)^{n−1} μ(dx) − ∫_{[0,1]} x^{k+1} (1 − x)^{n−1} μ(dx)
= ∫_{[0,1]} x^k (1 − x)^{n−1} (1 − x) μ(dx) = ∫_{[0,1]} x^k (1 − x)^n μ(dx) (330)

Pulling together the parts we get (326) 
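The inversion formula is easy to implement. Below is a sketch for the uniform law, mk = 1/(k + 1), where F(x) = x; I use exact rational arithmetic (Fraction) because the iterated differences suffer catastrophic cancellation in floating point.

```python
from fractions import Fraction
from math import comb

def D(a):
    # the difference map (325): (Da)_n = a_n - a_{n+1}
    return [a[i] - a[i + 1] for i in range(len(a) - 1)]

n = 50
m = [Fraction(1, k + 1) for k in range(n + 1)]  # moments of Leb on [0,1]
table = [m]                                     # table[j] = D^j m
for _ in range(n):
    table.append(D(table[-1]))

def F_n(x):
    # the inversion formula (326)
    return float(sum(comb(n, i) * table[n - i][i] for i in range(n + 1) if i <= n * x))

for x in [0.25, 0.5, 0.75]:
    print(f"F_{n}({x}) = {F_n(x):.4f}   (exact F(x) = {x})")
```

For the uniform law each summand works out to 1/(n + 1), so Fn(x) = (⌊nx⌋ + 1)/(n + 1) → x, consistent with (326).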

18.6 Moment Problem. Prove that if (mk : k ∈ Z+) is a sequence of numbers in [0, 1], then there exists a RV X with values in [0, 1] such that E(X^k) = mk.


Weak Convergence for [0, ∞)

18.7 Using Laplace transforms instead of CF's. Suppose that F and G are DF's on R+ such that F(0−) = G(0−) and

∫_{[0,∞)} e^{−λx} dF(x) = ∫_{[0,∞)} e^{−λx} dG(x), ∀λ ≥ 0 (331)

Note that the integral on the LHS has a contribution F(0) from {0}. Prove that F = G.
Suppose that (Fn) is a sequence of distribution functions on R each with Fn(0−) = 0 and such that

L(λ) = lim_n ∫ e^{−λx} dFn(x) (332)

exists for λ ≥ 0 and that L is continuous at 0. Prove that Fn is tight and that

Fn →w F where ∫ e^{−λx} dF(x) = L(λ), ∀λ ≥ 0 (333)

Let U = e^{−X} and V = e^{−Y}. Since the range of X, Y is [0, ∞), the range of U, V is [0, 1]. From (331),

E U^k = E e^{−kX} = E e^{−kY} = E V^k (334)

Hence by 18.3 the distributions of U and V are equal. However, since x ↦ e^{−x} is monotonic and invertible, this means that the distributions of X and Y are equal. More concretely, the relationship Pr(U ≥ c) = Pr(X ≤ −log c) provides a dictionary to translate between the distributions of X and U, and similarly for Y and V.
Given e > 0 choose λ small enough so that L(λ) > 1 − e/4. This is possible because L
is continuous at 0. Thus, because L is the limit of the Laplace transforms of the Fn , for all
but finitely many n we must have
Z
(1 − e−λx ) dFn ( x ) < e/2 (335)
[0,∞)

Thus for any K, for all but finitely many n,

(1 − Fn(K))(1 − e^{−λK}) ≤ ∫_{[0,∞)} (1 − e^{−λx}) dFn(x) < e/2 (336)

Choose K large enough so that 1 − e−λK > 12 . Thus we get, for all but finitely many n
Fn (K ) > 1 − e (337)
which shows that (Fn) is tight.
The rest of the proof is basically the same as 18.4. Having proved tightness, Helly-Bray implies that some subsequence converges in distribution to some distribution, say F. Any distribution which is the limit of a subsequence of (Fn) has the Laplace transform L (since limits along a subsequence agree with the limit of the sequence). By the argument above, there is a unique distribution with a given Laplace transform. Thus the limit of a subsequence of (Fn), when it exists, is the same distribution F regardless of the subsequence. This implies that Fn →w F, since for any bounded continuous g we must have lim sup ∫ g dFn = lim inf ∫ g dFn = ∫ g dF; otherwise we could use Helly-Bray to find a subsequence which converged to a distribution other than F.


A13.1 Modes of convergence.

(a) Show that ( Xn → X, a.s.) ⇒ ( Xn → X in prob)

(b) Prove that ( Xn → X in prob) 6⇒ ( Xn → X, a.s.)

(c) Prove that if ∑ Pr(| Xn − X | > e) < ∞, ∀e > 0 then Xn → X, a.s.

(d) Suppose that Xn → X in probability. Prove that a subsequence ( Xnk ) of ( Xn ) con-


verges Xnk → X, a.s.

(e) Deduce from (a) and (d) that Xn → X in probability if and only if every subsequence
of ( Xn ) contains a further subsequence which converges a.s. to X

(a) Fix e > 0 and let E = {|Xn − X| > e infinitely often}. For every ω ∈ E it cannot be that Xn(ω) → X(ω), so if Xn → X a.s. then Pr(E) = 0. Since Pr(|Xn − X| > e) ≤ Pr(⋃_{k≥n}{|Xk − X| > e}) and the unions decrease to E, we get Pr(|Xn − X| > e) → Pr(E) = 0. As e was arbitrary, Xn → X in probability.
(b) Let Ω = [0, 1], taking Pr as the Lebesgue measure on the Borel sets. Let

X_{n,k}(x) = 1 if k/n ≤ x < (k + 1)/n;  0 otherwise (338)

Then take the X_{n,k} as a sequence, taking the indices in lexicographic order. That is, take the n's in order, and for a given n take the k's (0 ≤ k < n) in order. Now for n ≥ N and e ∈ (0, 1),

Pr(|X_{n,k} − 0| > e) = Pr(X_{n,k} = 1) = 1/n ≤ 1/N (339)

Thus X_{n,k} → 0 in probability. On the other hand, lim sup X_{n,k}(x) = 1 for any x ∈ [0, 1): for any n, let kn(x) = ⌊nx⌋; then X_{n,kn(x)}(x) = 1, so X_{n,k}(x) = 1 infinitely often along the sequence. Therefore it can't be that X_{n,k} → 0 almost surely (in fact, almost surely, the sequence doesn't converge).
A second solution is given by considering I_{En} for independent En where Pr(En) = pn. Note

Pr(|I_{Em} − I_{En}| > e) = Pr(En ∆ Em) = (1 − pn)pm + pn(1 − pm) (340)

TODO: finish second solution.

(c) By Borel-Cantelli, |Xn − X| ≤ e eventually, almost surely, because only finitely many of the events {|Xn − X| > e} hold. By hypothesis, this is true for each e = 1/k for k ∈ N, so take the intersection of these countably many almost sure sets Ek to get an almost sure set E = ⋂k Ek. For samples ω ∈ E, for each k ∈ N, there exists an Nk(ω) ∈ N such that if n > Nk(ω) then |Xn(ω) − X(ω)| ≤ 1/k. For arbitrary e > 0, choose 1/k < e and Ne(ω) = Nk(ω) to get a corresponding statement for e. This is the definition of almost sure convergence: for each e, almost surely there exists an N such that |Xn − X| < e for n > N.

(d) Start with m = 1. Since Xn → X in probability, we can choose a subsequence n_{1,k} such that Pr(|X_{n_{1,k}} − X| > 1) < 2^{−k} for k ∈ N. This subsequence evidently satisfies ∑k Pr(|X_{n_{1,k}} − X| > 1) ≤ 1 < ∞.
Now recursively choose a subsequence n_{m,k} of n_{m−1,k} which satisfies Pr(|X_{n_{m,k}} − X| > 1/m) < 2^{−k}, so that ∑k Pr(|X_{n_{m,k}} − X| > 1/m) ≤ 1 < ∞. For any m0 ∈ N, the diagonal subsequence nm = n_{m,m} lies entirely within the subsequence n_{m0,k}, except for at most finitely many terms where m < m0. Thus ∑m Pr(|X_{nm} − X| > 1/m0) < ∞ for all m0, and hence (taking 1/m0 < e for arbitrary e > 0) the diagonal subsequence satisfies the hypothesis of (c). Therefore X_{nm} → X almost surely.

(e) If Xn → X in probability, then for any e > 0, pn = Pr(|Xn − X| > e) → 0. Since the sequence pn has a limit, any subsequence p_{nk} converges to the same limit, 0. Thus any subsequence X_{nk} also converges in probability to X, and by (d) it has a further subsequence which converges almost surely to X.
Conversely, assume the property, and choose e > 0. We'll show pn = Pr(|Xn − X| > e) → 0. Choose δ > 0, n1 = 1, and recursively find a subsequence nk by choosing p_{nk} + δ > sup_{m>n_{k−1}} pm. By the property, X_{nk} has a further subsequence X_{nl} which converges almost surely to X. By (a), X_{nl} converges in probability, so p_{nl} → 0. Since nl is a subsequence of nk, it has the property that p_{nl} + δ > pm for any m > nl. Taking the limit l → ∞, this shows that lim sup pm ≤ δ. Since δ is arbitrary, this shows pm → 0. Since e is arbitrary, this shows Xn → X in probability.


A13.2 If ξ ∼ N(0, 1), show E e^{λξ} = exp(½λ²). Suppose ξ1, ξ2, . . . are i.i.d. RV's each with a N(0, 1) distribution. Let Sn = ∑_{i=1}^n ξi, let a, b ∈ R, and define

Xn = exp(aSn − bn) (341)

Show

(Xn → 0 a.s.) ⇔ (b > 0) (342)

but that for r ≥ 1,

(Xn → 0 in L^r) ⇔ (r < 2b/a²) (343)

Calculating,

E e^{λξ} = (1/√(2π)) ∫_R e^{λξ} e^{−½ξ²} dξ = e^{½λ²} (1/√(2π)) ∫_R e^{−½(ξ−λ)²} dξ = e^{½λ²} (1/√(2π)) ∫_R e^{−½ζ²} dζ = e^{½λ²} (344)

where we performed the substitution ζ = ξ − λ 
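(A one-line Monte Carlo check of (344), as an aside; note the sampling error grows quickly with λ since e^{λξ} is heavy-tailed.)

```python
import numpy as np

rng = np.random.default_rng(8)
xi = rng.standard_normal(2_000_000)
for lam in [0.5, 1.0, 1.5]:
    print(f"lam = {lam}: E exp(lam*xi) ~ {np.mean(np.exp(lam * xi)):.4f}, "
          f"exp(lam^2/2) = {np.exp(lam**2 / 2):.4f}")
```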

Suppose b > 0. By the SLLN, Sn/n → 0 almost surely as n → ∞. Thus for each element ω of the sample space there is an N(ω) such that aSn/n − b < −½b for all n > N; in this case Xn = exp(n(aSn/n − b)) < exp(−½bn) → 0. Now suppose b < 0. Similarly there is an N(ω) such that aSn/n − b > −½b > 0, and therefore Xn > exp(−½bn) → ∞. For b = 0 and a ≠ 0, the law of the iterated logarithm says that Sn > √(n log log n) infinitely often; since −Sn is also the sum of i.i.d. N(0, 1) RV's, Sn < −√(n log log n) infinitely often too. Hence Xn = e^{aSn} has lim sup equal to ∞ and lim inf equal to 0: the limit doesn't exist, as Xn oscillates, infinitely often reaching arbitrarily high and arbitrarily low positive values. (If a = 0 and b = 0 then Xn = 1 for all n, which also fails to converge to 0.)
On the other hand,

E Xn^r = E e^{r(aSn − bn)} = e^{−rbn} (E e^{raξ})^n = e^{(½(ra)² − br)n} (345)

If r < 2b/a² then the exponent is negative and ||Xn||_r → 0, but if r ≥ 2b/a² it's not true: if r > 2b/a² then ||Xn||_r → ∞, and if r = 2b/a² then ||Xn||_r = 1 for all n. 
