The sample space is the unit square [0, 1]^2 with points corresponding to (x_1, x_2). The probability measure is area in this square. A necessary and sufficient condition to form a triangle is that the segments [0, x_1], [x_1, x_2] and [x_2, 1] satisfy the triangle inequalities. Assuming x_1 < x_2, these take the form

x_1 < (x_2 - x_1) + (1 - x_2),   (1 - x_2) < x_1 + (x_2 - x_1),   (x_2 - x_1) < x_1 + (1 - x_2)   (1)

The first inequality says x_1 < 1/2, the second that x_2 > 1/2 and the last that x_2 - x_1 < 1/2. These inequalities describe a triangle in the unit square with area 1/8. Considering also the symmetrical case x_1 > x_2, the total probability is 2/8 = 1/4.
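The answer 1/4 is easy to sanity-check numerically. The following is a minimal illustrative Monte Carlo sketch (names and trial counts are my own choices, not from the original): break the unit stick at two independent uniform points and test the triangle inequalities, which reduce to each piece being shorter than 1/2.

```python
import random

# Monte Carlo sketch: break [0, 1] at uniform points x1, x2 and check
# whether the three pieces can form a triangle (each piece < 1/2).
# The analytical answer derived above is 1/4.
random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    x1, x2 = random.random(), random.random()
    a, b = min(x1, x2), max(x1, x2)
    p1, p2, p3 = a, b - a, 1 - b   # the three pieces
    if p1 < 0.5 and p2 < 0.5 and p3 < 0.5:
        hits += 1
print(hits / trials)  # close to 0.25
```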
A spaceship X may communicate with any spaceship in the hemisphere centered at X; denote this hemisphere by h(X). Without loss of generality, rotate the sphere so that A is the north pole, and B lies in a fixed plane through O and A. Conditional on the angle θ = ∠AOB, the three ships may communicate in exactly one of the following cases:

1. B is in the northern hemisphere (θ < π/2) and C is in the northern hemisphere (the ships can communicate via A); given such a B, this has probability 1/2.
2. B is in the northern hemisphere and C is in h(B) but not in the northern hemisphere (the ships communicate via B); this region is a lune of dihedral angle θ, with probability θ/(2π).
3. B is in the southern hemisphere (θ > π/2) and C is in h(A) ∩ h(B) (C relays between A and B); this lune has dihedral angle π − θ, with probability (π − θ)/(2π).
So writing one integral for each case above, conditioning on the angle θ = ∠AOB and giving the probability for C to be in the configuration described by the case, gives the equation

P = ∫_0^{π/2} (1/2) p(dθ) + ∫_0^{π/2} (θ/(2π)) p(dθ) + ∫_{π/2}^{π} ((π − θ)/(2π)) p(dθ)   (2)
Now, the probability density of θ = ∠AOB is given by (1/2) sin θ dθ. Heuristically, this can be seen by considering the infinitesimally thin strip on the surface of the sphere at constant latitude θ. This is like a cylinder (or frustum of a cone) of slant height r dθ and radius r sin θ, so its area is approximately 2πr^2 sin θ dθ. Normalizing by the total surface area 4πr^2 gives the probability density.
Calculating each contribution,

∫_0^{π/2} (1/2) · (1/2) sin θ dθ = 1/4

∫_0^{π/2} (θ/(2π)) · (1/2) sin θ dθ = 1/(4π)   (3)

∫_{π/2}^{π} ((π − θ)/(2π)) · (1/2) sin θ dθ = 1/(4π)

Summing the contributions we find P = (2 + π)/(4π).
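As a hedged numerical cross-check (my own illustrative sketch, not part of the original solution): on three vertices, a communication graph is connected iff at least two of the three pairs have a direct link, and a pair has a direct link iff the angle between them is at most π/2, i.e. their dot product is non-negative.

```python
import random, math

# Monte Carlo sketch of P = (2 + pi)/(4*pi) ~ 0.409: draw three uniform
# points on the sphere (normalized Gaussian vectors); all three ships can
# communicate iff at least 2 of the 3 pairs have non-negative dot product.
def rand_unit():
    while True:
        v = [random.gauss(0, 1) for _ in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        if n > 1e-12:
            return [x / n for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

random.seed(1)
trials = 200_000
hits = 0
for _ in range(trials):
    a, b, c = rand_unit(), rand_unit(), rand_unit()
    links = sum(d >= 0 for d in (dot(a, b), dot(a, c), dot(b, c)))
    hits += links >= 2
print(hits / trials, (2 + math.pi) / (4 * math.pi))  # both ~ 0.409
```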
EG.3 Let G be the free group with two generators a and b. Start at time 0 with the unit element 1, the empty word. At each second multiply the current word on the right by one of the four elements a, a^{-1}, b, b^{-1}, choosing each with probability 1/4 (independently of previous choices). The choices at times 1 to 9 will produce the reduced word aab of length 3 at time 9. Prove that the probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain why it is intuitively clear that (almost surely)

(length of reduced word at time n)/n → 1/2   (5)
Multiplying by random elements of {a, a^{-1}, b, b^{-1}} is the same as taking a random walk on the Cayley graph of the free group with two generators (see figure 1). This is the rooted 4-regular tree. The depth of a vertex is the length of the unique path to the root, the vertex corresponding to the identity element 1 of the free group. For any vertex of depth d > 0, there are three edges leading to a node of depth d + 1 and one edge leading to a node of depth d − 1.
Let p_d be the probability of returning to 1 starting from a node of depth d. It's intuitively clear from the symmetry of the graph that this probability depends only on the depth. There is a graph isomorphism mapping any node of depth d to another node of
Figure 1: Cayley graph on the free group of 2 generators
depth d, so there is a one-to-one correspondence between paths which lead back to 1, and those paths have equal probability. The probability must satisfy the recurrence

p_n = (3/4) p_{n+1} + (1/4) p_{n-1}   (6)
This is because we can condition the return probability on the first node in the walk and use the fact that the return probability depends only on the depth of the node. Fundamental solutions of recurrence equations have the form p_n = λ^n and satisfy a characteristic polynomial equation which, for this recurrence, is

3λ^2 − 4λ + 1 = 0  ⇒  λ = 1/3 or λ = 1   (7)
n
The only solution of the form pn = a · 13 + b which also satisfies the boundary conditions
n
p0 = 1 and pn → 0 as n → ∞ is pn = 13 . In particular, a random walk starting at time
1 from a node of depth 1 returns to the origin with probability 13 . This is the same as the
event that the reduced word is ever 1 at positive time, since every word sequence as one
letter at time 1.
Let L_n be the random variable which represents word length at time n. Since L_n is the same thing as vertex depth, for L_{n−1} > 0,

E[L_n | L_{n−1}] = (3/4)(L_{n−1} + 1) + (1/4)(L_{n−1} − 1) = L_{n−1} + 1/2   (8)

Taking expectations and using the tower law,

E L_n = E L_{n−1} + 1/2   (9)

Hence the expectation satisfies a simple recurrence, and since L_0 = 0, this gives E L_n ≈ n/2. More precisely, the increments L_n − L_{n−1} are i.i.d. with mean 1/2 and finite variance, except at the (almost surely finitely many) visits to the root, where the increment is +1. Therefore by the SLLN, L_n/n → 1/2 almost surely.
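Both claims simulate well. Here is a hedged illustrative sketch (parameters are my own; the walk is simulated directly on depths, with the cap on steps a truncation assumption that slightly undercounts very late returns):

```python
import random

# Depth of the walk on the rooted 4-regular tree: from depth d > 0 move to
# d+1 w.p. 3/4 and d-1 w.p. 1/4; from depth 0 always move to 1.
random.seed(2)

# Estimate the probability of ever returning to the root (claimed 1/3),
# starting from depth 1 (the position after the first step).
trials, cap = 20_000, 200
returns = 0
for _ in range(trials):
    d = 1
    for _ in range(cap):
        if d == 0:
            returns += 1
            break
        d += 1 if random.random() < 0.75 else -1
print(returns / trials)  # ~ 1/3

# Estimate the growth rate L_n / n (claimed 1/2).
n = 200_000
d = 0
for _ in range(n):
    d += 1 if d == 0 or random.random() < 0.75 else -1
print(d / n)  # ~ 1/2
```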
EG.4 Suppose now that the elements a, a^{-1}, b, b^{-1} are chosen instead with respective probabilities α, α, β, β, where α > 0, β > 0, α + β = 1/2. Prove that the conditional probability that the reduced word 1 ever occurs at a positive time, given that the element a is chosen at time 1, is the unique root x = r(α) (say) in (0, 1) of the equation
As time goes on, (it is almost surely true that) more and more of the reduced word
becomes fixed, so that a final word is built up. If in the final word, the symbols a and
a−1 are both replaced by A and the symbols b and b−1 are both replaced by B, show that
the sequence of A’s and B’s obtained is a Markov chain on { A, B} with (for example)
p_AA = α(1 − x) / (α(1 − x) + 2β(1 − y))   (11)
where y = r ( β). What is the (almost sure) limiting proportion of occurrence of the
symbol a in the final word? (Note. This result was used by Professor Lyons of Edinburgh
to solve a long-standing problem in potential theory on Riemannian manifolds.)
Algebras, etc.
Take V_1 to be the set consisting of 1 together with every n > 10 whose base-10 representation has equal first and last digits. Take V_2 to be the set of natural numbers whose base-10 representation has last digit 1. Then 0 ≤ #(V_i ∩ {1, 2, . . . , n}) − ⌊n/10⌋ ≤ 1, since in every block {q · 10, q · 10 + 1, q · 10 + 2, . . . , q · 10 + 9} each of V_1 and V_2 has exactly one representative. Therefore γ(V_1) = γ(V_2) = 1/10.
Note V_1 ∩ V_2 consists of 1 and all numbers of the form 1 b_2 b_3 . . . b_{n−1} 1, where each b_i can take any digit from 0 to 9. Thus between 10^k and 10^{k+1} there are 10^{k−1} members of V_1 ∩ V_2. Therefore if n = 10^k,

#(V_1 ∩ V_2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−2} = (10^{k−1} − 1)/9 + 1   (13)

and #(V_1 ∩ V_2 ∩ {1, . . . , n})/n < 1/89. However if n = 2 · 10^k then

#(V_1 ∩ V_2 ∩ {1, . . . , n}) = 1 + 1 + 10 + 100 + · · · + 10^{k−1} = (10^k − 1)/9 + 1   (14)
and #(V_1 ∩ V_2 ∩ {1, . . . , n})/n > 1/18. Therefore the density of V_1 ∩ V_2 does not exist, since the lim sup of the ratio is greater than its lim inf.
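The two counts can be verified exactly by brute force. A small illustrative script (helper name is my own), using the characterization that V_1 ∩ V_2 consists of numbers whose decimal representation starts and ends with the digit 1:

```python
# Exact counts for the density example: V1 ∩ V2 consists of 1 together
# with the numbers whose base-10 representation starts and ends with 1.
def count_v1v2(n):
    return sum(1 for m in range(1, n + 1)
               if str(m)[0] == "1" and str(m)[-1] == "1")

c1 = count_v1v2(10**4)       # n = 10^k with k = 4
c2 = count_v1v2(2 * 10**4)   # n = 2 * 10^k
print(c1, c1 / 10**4)        # 112, ratio below 1/89
print(c2, c2 / (2 * 10**4))  # 1112, ratio above 1/18
```

The counts 112 = (10^3 − 1)/9 + 1 and 1112 = (10^4 − 1)/9 + 1 match formulas (13) and (14).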
Independence
4.1 Let (Ω, F , Pr) be a probability triple. Let I1 , I2 and I3 be three π-systems on Ω
such that for k = 1, 2, 3
Ik ⊂ F and Ω ∈ Ik (15)
Prove that if
Pr( I1 ∩ I2 ∩ I3 ) = Pr( I1 ) Pr( I2 ) Pr( I3 ) (16)
whenever Ik ∈ Ik then the σ (Ik ) are independent. Why did we require Ω ∈ Ik ?
Fix I_2 ∈ I_2 and I_3 ∈ I_3, and define two measures on F by µ(F) = Pr(F ∩ I_2 ∩ I_3) and µ′(F) = Pr(F) Pr(I_2) Pr(I_3). Each of µ and µ′ is a countably additive measure, because Pr is a countably additive measure on F. By the hypothesis (16), the measures are equal on the π-system I_1. By assumption Ω ∈ I_1 as well, so µ(Ω) = µ′(Ω). If we dropped the assumption that Ω ∈ I_1, then the condition µ(Ω) = µ′(Ω) would amount to Pr(I_2 ∩ I_3) = Pr(I_2) Pr(I_3), which doesn't hold generically. We conclude from theorem 1.6 that µ(I) = µ′(I) for I ∈ σ(I_1). Since I_2 and I_3 were arbitrary, equation (16) holds for all I_1 ∈ σ(I_1), I_2 ∈ I_2, I_3 ∈ I_3.
Now we can repeat the above argument with the π-systems I_1′ = I_2, I_2′ = I_3, I_3′ = σ(I_1), observing that (16) is invariant to permutations of {1, 2, 3}, to conclude that (16) holds for I_1 ∈ σ(I_1), I_2 ∈ σ(I_2), I_3 ∈ I_3. Applying the argument one more time with I_1′ = I_3, I_2′ = σ(I_1), I_3′ = σ(I_2) gives the result we want: (16) holds for all I_k ∈ σ(I_k).
4.2 For s > 1 define ζ(s) = ∑_{n∈N} n^{−s} and consider the distribution on N given by Pr(X = n) = ζ(s)^{−1} n^{−s}. Let E_k = {X is divisible by k}. Show E_p and E_q are independent for primes p ≠ q. Show Euler's formula 1/ζ(s) = ∏_p (1 − 1/p^s). Show Pr(X is square-free) = 1/ζ(2s). Let H = gcd(X, Y) for X, Y i.i.d. with this distribution. Show Pr(H = n) = n^{−2s}/ζ(2s).
Since

Pr(X = nk) = k^{−s} n^{−s}/ζ(s) = k^{−s} Pr(X = n)   (19)

we have

Pr(E_k) = k^{−s} ∑_{n∈N} Pr(X = n) = k^{−s}   (20)
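The identity Pr(E_k) = k^{−s} and the independence of E_p and E_q can be checked numerically with truncated sums; the sketch below (my own, with s = 2 and a finite cutoff N standing in for the infinite sums) is an approximation, not a proof:

```python
import math

# Truncated-sum sanity check for the zeta distribution with s = 2:
# Pr(E_k) = k^{-s}, and Pr(E_p ∩ E_q) = Pr(E_p) Pr(E_q) for coprime p, q.
s, N = 2, 10**6
zeta = sum(n**-s for n in range(1, N + 1))

def prob_div(*ks):
    # probability that X is divisible by every k in ks
    m = math.lcm(*ks)
    return sum(n**-s for n in range(m, N + 1, m)) / zeta

print(prob_div(2), 2**-s)                          # both ~ 0.25
print(prob_div(2, 3), prob_div(2) * prob_div(3))   # both ~ 1/36
```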
Furthermore, the conditional probability satisfies

Pr(X = nk | E_k) = Pr(X = nk)/Pr(E_k) = ζ(s)^{−1} n^{−s}   (21)

Thus the conditional distribution of the cofactor X/k given E_k is again the ζ(s) distribution.
Now, for m, n relatively prime, E_m ∩ E_n = E_{mn} by unique factorization. Therefore

Pr(E_m ∩ E_n) = Pr(E_{mn}) = (mn)^{−s} = Pr(E_m) Pr(E_n)   (22)

so the events are independent; in particular the E_p for distinct primes are independent, and hence so are their complements. Let p_1 < p_2 < · · · be the primes and let E = ⋂_{i≤n} E_{p_i}^c, so that by independence

Pr(E) = ∏_{i≤n} (1 − p_i^{−s})   (23)

As n → ∞ the set E consists of 1 and numbers whose prime factors all exceed p_n; certainly every element of E other than 1 exceeds p_{n+1}, and the probability of such large values tends to 0. Thus Pr(E) → Pr(X = 1) = 1/ζ(s). This shows

ζ(s)^{−1} = ∏_p (1 − 1/p^s)   (24)
The set of integers which don't contain a square prime factor among p_1^2, . . . , p_n^2 is F = ⋂_{i≤n} E_{p_i^2}^c. The events E_{p^2} are independent, because the numbers p^2 are relatively prime. Thus as n → ∞, F decreases to the set of square-free numbers together with a set of negligible probability, so

Pr(X is square-free) = ∏_p (1 − p^{−2s}) = 1/ζ(2s)   (25)
Now n = gcd(X, Y) iff X and Y are both divisible by n and the cofactors X/n and Y/n are relatively prime. By independence, the event that both X ∈ E_n and Y ∈ E_n satisfies

Pr(X ∈ E_n, Y ∈ E_n) = Pr(X ∈ E_n) Pr(Y ∈ E_n) = n^{−2s}   (27)
Let R = {(a, b) : a, b relatively prime} (note (1, 1) ∈ R). We can calculate Pr((X, Y) ∈ R) as follows. First consider the events F_p = {X ∈ E_p} ∩ {Y ∈ E_p}. By independence Pr(F_p) = Pr(X ∈ E_p) Pr(Y ∈ E_p) = p^{−2s}. The F_p are independent of each other since the E_p are. Now F_p^c is the event that p fails to divide at least one of X and Y. Hence F = ⋂_p F_p^c is the event that X and Y have no common prime divisor, i.e. that (X, Y) ∈ R, and

Pr((X, Y) ∈ R) = ∏_p (1 − p^{−2s}) = 1/ζ(2s)   (28)
Pulling it together,

Pr(gcd(X, Y) = n) = Pr(A_n) Pr(B_n | A_n) = n^{−2s}/ζ(2s)   (29)

where A_n = {X ∈ E_n} ∩ {Y ∈ E_n} and B_n = {(X/n, Y/n) ∈ R}; conditional on A_n, the cofactors X/n and Y/n are again independent and ζ(s)-distributed, so Pr(B_n | A_n) = 1/ζ(2s).
A continuous distribution has the property that singletons have measure zero: µ(X = x) = 0. Thus, conditioning on the value of X_1, it follows that {X_1 = X_2} has measure zero also; by Fubini's theorem, Pr(X_1 = X_2) = ∫_Ω Pr(X_2 = x) µ(dx) = 0. Since there are finitely many pairs of variables, we may assume no two of the X_i coincide, by excluding a set of measure zero.
Allow a permutation σ ∈ S_n to act on the sample space by permuting the indices; that is, σ : (X_1, X_2, . . . , X_n) ↦ (X_{σ1}, X_{σ2}, . . . , X_{σn}). Clearly this also permutes the ranks, so that ρ_i(σX) = ρ_{σi}(X), or in vector form, ρ(σX) = σρ(X). Because the X_i are i.i.d., each permutation σ is measure preserving: Pr(E) = Pr(σE) for any event E ⊂ Ω^n. Consider the Weyl chamber W = {x_1 < x_2 < · · · < x_n}. Clearly the disjoint copies σW partition Ω^n, excluding the measure-zero set of coincident points, since every point x ∈ Ω^n lies in ρ(x)^{−1}(W), where ρ(x) is the permutation induced by mapping each component to its ordinal ranking from smallest to greatest.
We infer that Pr(W ) = 1/n!, since the disjoint copies of W have equal probability and
comprise the entire space. The event En is the union of all the sets σW where σn = n.
There are (n − 1)! such permutations, so Pr En = (n − 1)!/n! = 1/n.
For any σW ⊂ E_n we can apply any permutation σ′ ∈ S_m ⊂ S_n which acts only on the first m letters and keeps the rest fixed. For any σ′ ∈ S_m, σ′σW ⊂ E_n, since the nth component is largest and unchanged by the action of σ′. Let us group the sets σW ⊂ E_n into the orbits of the action of S_m. From each orbit we may choose the representative with ρ_1(x) < ρ_2(x) < · · · < ρ_m(x) for each x ∈ σW, which is formed by permuting the first m elements into sorted order. Let U be the union of all orbit representatives. For σ′ ∈ S_m the sets σ′U are disjoint and partition E_n. There are m! such sets, they all have equal probability, and their union has probability 1/n, so Pr(U) = 1/(n · m!). Of these, exactly (m − 1)! are also in E_m; these correspond to σ′ ∈ S_m which satisfy σ′m = m. Therefore Pr(E_m ∩ E_n) = 1/(mn) = Pr(E_m) Pr(E_n) and the events are independent.
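The independence Pr(E_m ∩ E_n) = 1/(mn) is easy to see empirically. A hedged illustrative sketch (my own choice of m = 2, n = 4 and trial count), where E_n is the event that the nth draw is the largest of the first n:

```python
import random

# Monte Carlo check: with i.i.d. uniform draws, E_n = "X_n is the largest
# of the first n draws".  For m = 2, n = 4, Pr(E_m ∩ E_n) = 1/(2*4) = 1/8.
random.seed(3)
m, n, trials = 2, 4, 200_000
both = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    e_m = xs[m - 1] == max(xs[:m])
    e_n = xs[n - 1] == max(xs)
    both += e_m and e_n
print(both / trials)  # ~ 0.125
```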
Borel-Cantelli Lemmas
4.4 Suppose a coin with probability p of heads is tossed repeatedly. Let A_k be the event that a sequence of k (or more) heads occurs among the tosses numbered 2^k, 2^k + 1, . . . , 2^{k+1} − 1. Prove that

Pr(A_k i.o.) = 1 if p ≥ 1/2, and 0 if p < 1/2   (30)
Let E_{k,i} be the event that there is a string of k heads at the times j with

2^k + i ≤ j ≤ 2^k + i + k − 1   (31)

Observe Pr(E_{k,i}) = p^k. If m_k = 2^k − k and i ≤ m_k then E_{k,i} ⊂ A_k. Furthermore A_k ⊂ ⋃_{i=0}^{m_k} E_{k,i}, since if there is a string of k heads in this range, it has to start somewhere. Thus by the union bound

Pr(A_k) ≤ ∑_{i=0}^{m_k} Pr(E_{k,i}) = (m_k + 1) p^k ≤ (2p)^k   (32)

When p < 1/2 the series ∑_k (2p)^k converges, so by the first Borel–Cantelli lemma Pr(A_k i.o.) = 0.
For the lower bound, restrict to the events E_{k,ik} for 0 ≤ i ≤ ⌊m_k/k⌋, whose time windows are disjoint, so these events are independent. By inclusion-exclusion (Bonferroni),

Pr(A_k) ≥ ∑_i Pr(E_{k,ik}) − ∑_{i<j} Pr(E_{k,ik} ∩ E_{k,jk})   (34)

All of the terms in the first sum are p^k, and there are at least ⌊m_k/k⌋ ≥ 2^k/k − 2 of them. All the terms in the second sum are p^{2k}, since the E_{k,ik} are independent, and there are at most (1/2)(2^k/k)^2 = 2^{2k}/(2k^2) of them. Therefore when p = 1/2,

Pr(A_k) ≥ (2^k/k − 2) p^k − (2^{2k}/(2k^2)) p^{2k} = (1/k − 2^{−(k−1)}) − 1/(2k^2)   (35)
Therefore ∑ Pr(A_k) = ∞, since the right-hand side of (35) is a harmonic term minus the terms of a convergent series. Since the A_k are independent (they concern disjoint blocks of tosses), the second Borel–Cantelli lemma says that Pr(A_k i.o.) = 1.

As a function of p, the probability Pr(A_k i.o.) must be non-decreasing, since increasing the probability of heads increases the likelihood of the event. We just showed this probability is 1 for p = 1/2; therefore it is 1 for any p > 1/2.
Pr(G > x) ≤ (1/(x√(2π))) e^{−x^2/2}   (36)
Let's do a calculation, noting that u/x ≥ 1 in the range of the integral:

∫_x^∞ (e^{−u^2/2}/u^k) du ≤ (1/x^{k+1}) ∫_x^∞ u e^{−u^2/2} du = e^{−x^2/2}/x^{k+1}   (39)
Now, integrating by parts (writing e^{−u^2/2} = (1/u) · u e^{−u^2/2}), we get

√(2π) Pr(G > x) = ∫_x^∞ e^{−u^2/2} du = e^{−x^2/2}/x − ∫_x^∞ (e^{−u^2/2}/u^2) du   (40)
Alternately dropping the last term or applying (39) with k = 2, we find

(1/√(2π)) (1/x − 1/x^3) e^{−x^2/2} ≤ Pr(G > x) ≤ (1/√(2π)) (1/x) e^{−x^2/2}   (41)
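The two-sided bound (41) can be verified against the exact Gaussian tail, which is expressible through the complementary error function as Pr(G > x) = erfc(x/√2)/2. A small illustrative check (sample points are my own choice):

```python
import math

# Check (41) numerically: lower and upper Mills-ratio bounds sandwich the
# exact tail Q(x) = erfc(x / sqrt(2)) / 2 of a standard normal.
for x in (1.0, 2.0, 3.0, 4.0):
    q = math.erfc(x / math.sqrt(2)) / 2
    hi = math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))
    lo = (1 / x - 1 / x**3) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    assert lo <= q <= hi
    print(x, lo, q, hi)
```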
If I(x) = x^{−1} e^{−x^2/2}, then this shows that for any ε > 0, if x is large enough,

((1 − ε)/√(2π)) I(x) ≤ P(G > x) ≤ (1/√(2π)) I(x)

Now

I(√(2α log n)) = e^{−α log n}/√(2α log n) = 1/(n^α √(2α log n))   (42)
Using the substitution u = √(log x) we can apply the integral test:

∫_a^∞ I(√(2α log x)) dx = ∫_a^∞ dx/(x^α √(2α log x)) = √(2/α) ∫_{√(log a)}^∞ e^{(1−α)u^2} du   (43)

Clearly the integral converges if α > 1 and diverges if α ≤ 1, so the same is true of

∑_n Pr(X_n > √(2α log n)),   which is comparable (up to constant factors) to ∑_n I(√(2α log n))   (44)
The X_n are independent, so we may apply Borel–Cantelli and its converse to conclude

lim sup X_n/√(2α log n) ≤ 1 a.s. for α > 1,   lim sup X_n/√(2α log n) ≥ 1 a.s. for α < 1   (45)

Letting α → 1 from each side, we conclude L = lim sup X_n/√(2 log n) = 1 almost surely.
Note that we only used independence in the lower bound; the upper bound uses the plain Borel–Cantelli lemma, which does not assume independence. Thus, with S_n = X_1 + · · · + X_n for i.i.d. standard normal X_i, the variables Y_n = S_n/√n form a (dependent) sequence of N(0, 1) random variables. If we take α = 2 then we conclude

Y_n < 2√(log n) eventually, a.s.   (46)

The same is true for −Y_n, which has the same distribution, so eventually (taking the later of the two eventualities) |Y_n| < 2√(log n). This is the same as |S_n| < 2√(n log n) eventually, almost surely. In particular this implies the strong law of large numbers for X_n ∼ N(µ, σ^2), since |S_n/n − µ| < 2σ√((log n)/n) → 0 eventually almost surely.
4.6 This is a converse to the SLLN. Let X_n be a sequence of i.i.d. RVs with E[|X_n|] = ∞. Prove that

∑_n Pr(|X_n| > kn) = ∞ for every k, and lim sup |X_n|/n = ∞ a.s.   (47)
Since E|X_1| = ∞, the tail-sum formula gives, for every k ∈ N,

∞ = E(|X_1|/k) ≤ ∑_{n≥0} Pr(|X_1|/k > n) = ∑_{n≥0} Pr(|X_1| > kn)

For i.i.d. X_n we have Pr(|X_n| > α) = Pr(|X_1| > α) for any α ≥ 0, so this implies

∑_n Pr(|X_n| > kn) = ∞   (52)

By the converse of Borel–Cantelli (the events are independent) we must have |X_n|/n > k infinitely often, and therefore lim sup |X_n|/n ≥ k for every k ∈ N. We conclude lim sup |X_n|/n = ∞.
Suppose |S_n|/n ≤ k eventually for some k > 0; that is, there exist k > 0 and N > 0 such that if n ≥ N then |S_n| ≤ kn. By the previous part, almost surely there exists an n ≥ N + 1 such that |X_n| ≥ 2kn. But then

|S_n| ≥ |X_n| − |S_{n−1}| ≥ 2kn − k(n − 1) > kn   (53)

This is a contradiction. Hence, with probability 1, |S_n|/n > k infinitely often for every k > 0, and therefore

lim sup |S_n|/n = ∞ a.s.   (54)
4.7 What's fair about a fair game? Let X_1, X_2, . . . be independent random variables such that

X_n = n^2 − 1 with probability n^{−2}, and −1 with probability 1 − n^{−2}   (55)

Prove that E X_n = 0 for all n, but that if S_n = X_1 + X_2 + · · · + X_n then

S_n/n → −1 a.s.   (56)
An easy calculation shows

E X_n = (n^2 − 1) n^{−2} + (−1)(1 − n^{−2}) = 0

Let A_n = {X_n = −1}; then ∑_n Pr(A_n^c) = ∑_n n^{−2} < ∞, so by Borel–Cantelli, with probability 1, only finitely many of the A_n^c occur. Therefore, with probability 1, X_n = −1 with a finite number of exceptions. On any such sample, let N be the largest index such that X_N ≠ −1. Then for n > N,

S_n/n = S_N/n + (−1)(n − N)/n   (60)

As n → ∞, the first term tends to zero because S_N is fixed, and the second term tends to −1. (Note that the independence of the X_n was not needed in this argument.)
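This is a nice one to simulate, with a caveat worth building into the experiment: E S_n/n = 0 exactly, so the sample *mean* across paths stays near 0, while the almost-sure limit is −1. The sketch below (my own parameters) therefore reports the median across paths, which tracks the typical behaviour:

```python
import random
from statistics import median

# Each X_k equals k^2 - 1 with probability k^{-2} and -1 otherwise.
# E X_k = 0, yet S_n/n -> -1 a.s.; the median over independent paths
# shows the typical path (the mean across paths would still be ~0).
random.seed(4)
n, paths = 20_000, 101
ratios = []
for _ in range(paths):
    s = 0
    for k in range(1, n + 1):
        s += (k * k - 1) if random.random() < k**-2 else -1
    ratios.append(s / n)
print(median(ratios))  # close to -1
```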
4.8 This exercise assumes that you are familiar with continuous-parameter Markov chains with two states.

For each n ∈ N, let X^(n) = {X^(n)(t) : t ≥ 0} be a Markov chain with state-space the two-point set {0, 1}, with Q-matrix

Q^(n) = [ −a_n   a_n ]
        [  b_n  −b_n ],   a_n, b_n > 0   (61)

and transition function P^(n)(t) = exp(tQ^(n)). Show that, for every t,

p_00^(n)(t) ≥ b_n/(a_n + b_n),   p_01^(n)(t) ≤ a_n/(a_n + b_n)   (62)

The processes (X^(n) : n ∈ N) are independent and X^(n)(0) = 0 for every n. Each X^(n) has right-continuous paths. Suppose that ∑ a_n = ∞ and ∑ a_n/b_n < ∞. Prove that if t is a fixed time then
First let's calculate the matrix exponential. Writing

Q = [ −a   a ]   we have   Q^2 = [ a^2 + ab   −a^2 − ab ] = −(a + b) Q   (66)
    [  b  −b ],                  [ −ab − b^2   ab + b^2 ]

Thus, by induction, Q^k = (−a − b)^{k−1} Q, and therefore

exp(tQ) = I + ∑_{k≥1} t^k Q^k/k! = I − (Q/(a + b)) ∑_{k≥1} t^k (−a − b)^k/k! = I − ((e^{−(a+b)t} − 1)/(a + b)) Q   (67)

        = (1/(a + b)) [ b + a e^{−t(a+b)}   a − a e^{−t(a+b)} ]
                      [ b − b e^{−t(a+b)}   a + b e^{−t(a+b)} ]

From this (62) is evident, since 0 ≤ e^{−t(a+b)} ≤ 1.
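The closed form can be cross-checked against a truncated power series; the following is a small self-contained sketch (helper names and the sample values of a, b, t are mine):

```python
import math

# Compare exp(tQ) computed by power series with the closed form
# I - (e^{-(a+b)t} - 1)/(a+b) * Q, for a 2x2 generator Q = [[-a, a], [b, -b]].
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B, c=1.0):
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]

a, b, t = 2.0, 3.0, 0.5
Q = [[-a, a], [b, -b]]
I = [[1.0, 0.0], [0.0, 1.0]]

P = [[1.0, 0.0], [0.0, 1.0]]      # running series sum
term = [[1.0, 0.0], [0.0, 1.0]]   # current term (tQ)^k / k!
for k in range(1, 40):
    term = mat_mul(term, Q)
    term = [[x * t / k for x in row] for row in term]
    P = mat_add(P, term)

c = -(math.exp(-(a + b) * t) - 1) / (a + b)
E = mat_add(I, Q, c)              # closed form I + c*Q
print(P)
print(E)
```

The entries also satisfy the bounds (62), e.g. E[0][0] ≥ b/(a + b).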
For fixed t > 0,

∑_n Pr(X^(n)(t) = 1) = ∑_n p_01^(n)(t) ≤ ∑_n a_n/(a_n + b_n) ≤ ∑_n a_n/b_n < ∞   (68)
Thus by Borel–Cantelli, almost surely, at any fixed time t only finitely many of the X^(n)(t) equal 1. Now consider the joint probability Π(t) that X^(n)(t) = 0 for every n. This satisfies the bound

−log Π(t) = −log ∏_n p_00^(n)(t) ≤ ∑_n log((a_n + b_n)/b_n) ≤ ∑_n a_n/b_n < ∞   (69)

This gives a uniform bound for the convergence of −log Π(t), which implies the product converges uniformly on any compact subset of R_{≥0}, such as [0, 1]. In particular this means that

lim_{t↓0} Π(t) = Π(0) = ∏_{n=1}^∞ 1 = 1   (70)
TODO finish this
Tail σ-algebras

Let Y_0, Y_1, Y_2, . . . be independent random variables with Pr(Y_n = +1) = Pr(Y_n = −1) = 1/2. Define

X_n = Y_0 Y_1 · · · Y_n   (72)

Prove the X_n are independent. Define Y := σ(Y_1, Y_2, . . .) and T_n := σ(X_r : r > n). Prove

L := ⋂_n σ(Y, T_n) ≠ σ(Y, ⋂_n T_n) =: R   (74)
First note Pr(X_n = 1) = Pr(X_n = −1) = 1/2, so X_n has the same distribution as Y_n. If we let X_0 = Y_0, this follows by induction, since

X_n = X_{n−1} with probability 1/2, and −X_{n−1} with probability 1/2   (75)

and so, for example,

Pr(X_n = 1) = (1/2) Pr(X_{n−1} = 1) + (1/2) Pr(X_{n−1} = −1) = 2 · (1/4) = 1/2   (76)
For m < n and e_m, e_n ∈ {−1, 1}, consider Pr({X_m = e_m} ∩ {X_n = e_n}). Clearly this is the same as Pr({X_m = e_m} ∩ {X_m · X_n = e_m e_n}): given X_m and X_n we can multiply them to generate X_m · X_n, and conversely, given X_m and X_m · X_n we can multiply them to recover X_n. Note also that

X_m · X_n = (Y_0 · Y_1 · · · Y_m)^2 Y_{m+1} · · · Y_n = Y_{m+1} · · · Y_n   (77)

Hence X_m and X_m · X_n are independent, since they are functions of disjoint sets of independent random variables. Thus

Pr({X_m = e_m} ∩ {X_n = e_n}) = Pr({X_m = e_m} ∩ {X_m · X_n = e_m e_n})
                              = Pr({X_m = e_m}) Pr({X_m · X_n = e_m e_n})   (78)
                              = (1/2) · (1/2) = Pr(X_m = e_m) Pr(X_n = e_n)
This argument generalizes to any finite collection X_{n_1}, . . . , X_{n_k} (with n_1 < · · · < n_k) by considering the invertible transformation

(X_{n_1}, X_{n_2}, X_{n_3}, . . . , X_{n_k}) ↦ (X_{n_1}, X_{n_1} X_{n_2}, X_{n_2} X_{n_3}, . . . , X_{n_{k−1}} X_{n_k})   (79)

Each variable on the right-hand side is a product of Y_i over disjoint index sets (X_{n_{j−1}} X_{n_j} = Y_{n_{j−1}+1} · · · Y_{n_j}), so these variables are independent, each with the same distribution as X_n. Hence the joint probability of any fixed values on the right-hand side is 2^{−k}, and since the transformation is invertible, Pr(X_{n_1} = e_{n_1}, . . . , X_{n_k} = e_{n_k}) = 2^{−k} = ∏_j Pr(X_{n_j} = e_{n_j}).
Furthermore, X_n is independent of each Y_m. If m > n this is obvious, because X_n = Y_0 · · · Y_n involves only variables independent of Y_m. If m ≤ n, consider the invertible transformation (Y_m, X_n) ↦ (Y_m, X_n · Y_m); the second variable is the product Y_0 · · · Y_{m−1} · Y_{m+1} · · · Y_n, which is clearly independent of Y_m.
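The pairwise independence is easy to confirm empirically. A hedged illustrative sketch (my own indices m = 2, n = 5 and trial count), checking that Pr(X_m = 1, X_n = 1) ≈ 1/4 = Pr(X_m = 1) Pr(X_n = 1):

```python
import random

# Empirical check: with Y_i i.i.d. uniform on {-1, +1} and
# X_n = Y_0 Y_1 ... Y_n, the pair (X_m, X_n) behaves independently.
random.seed(5)
trials, m, n = 200_000, 2, 5
joint = 0
for _ in range(trials):
    ys = [random.choice((-1, 1)) for _ in range(n + 1)]
    prod = 1
    xm = xn = 1
    for i, y in enumerate(ys):
        prod *= y
        if i == m:
            xm = prod
        if i == n:
            xn = prod
    joint += (xm == 1) and (xn == 1)
print(joint / trials)  # ~ 0.25
```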
Suppose we know the values of Y_k for all k ≥ 1 and also some X_r. Then we can immediately deduce the value of Y_0, since

Y_0 = Y_0 · (Y_1 · · · Y_r)^2 = X_r · Y_1 · Y_2 · · · Y_r   (80)

This shows that for any n, Y_0 is measurable in the algebra σ(Y_1, . . . , Y_n, X_n) ⊂ σ(Y, T_{n−1}). Therefore Y_0 is measurable in L = ⋂_n σ(Y, T_n).
Dominated Convergence Theorem
5.1 Let S = [0, 1], Σ = B(S), and let µ be Lebesgue measure. Define f_n = n I_{(0,1/n)}. Prove that f_n(s) → 0 for every s ∈ S but that µ(f_n) = 1 for every n. Draw a picture of g = sup_n |f_n| and show that g ∉ L^1(S, Σ, µ).

If n > 1/s then f_n(s) = 0, because the indicator I_{(0,1/n)} doesn't put any mass at points as large as s. So f_n(s) = 0 eventually for every s > 0 (and f_n(0) = 0 for all n), hence lim f_n(s) = 0. On the other hand, µ(f_n) = n µ((0, 1/n)) = n · (1/n) = 1. The function g = sup_n |f_n| looks roughly like the function 1/x near 0, except it has stairsteps at each 1/n: it takes the value n on the interval (1/(n + 1), 1/n].
Now g_n = max(|f_1|, . . . , |f_n|) is a simple function which satisfies

µ(g_n) ≥ ∑_{k=1}^n k (1/k − 1/(k + 1)) = ∑_{k=1}^n 1/(k + 1)   (81)

The sum diverges as n ↑ ∞, which means µ(g) also diverges, since it is greater than every µ(g_n).
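The computation in (81) can be done exactly with rational arithmetic; the following illustrative sketch (my own, using exact fractions) confirms that each µ(f_n) = 1 while the partial integrals of g grow like the harmonic series:

```python
from fractions import Fraction

# f_n = n * 1_{(0,1/n)}: mu(f_n) = n * (1/n) = 1.
# g = sup |f_n| equals k on (1/(k+1), 1/k], so its integral over
# (1/(n+1), 1] is the divergent partial sum of 1/(k+1).
n = 1000
mu_fn = Fraction(n) * Fraction(1, n)
partial = sum(Fraction(k) * (Fraction(1, k) - Fraction(1, k + 1))
              for k in range(1, n + 1))
print(mu_fn)           # 1
print(float(partial))  # ~ H_{n+1} - 1, unbounded as n grows
```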
We will use the following property of indicators repeatedly: for events A and B,

I_A I_B = I_{A∩B}   (82)

Then

1 − I(A_1 ∪ · · · ∪ A_n) = I((A_1 ∪ . . . ∪ A_n)^c)
                        = I(A_1^c ∩ A_2^c ∩ · · · ∩ A_n^c)
                        = I(A_1^c) I(A_2^c) · · · I(A_n^c)
                        = (1 − I(A_1))(1 − I(A_2)) · · · (1 − I(A_n))
                        = 1 − ∑_i I(A_i) + ∑_{i<j} I(A_i) I(A_j) − ∑_{i<j<k} I(A_i) I(A_j) I(A_k)   (83)
                          + · · · + (−1)^n I(A_1) I(A_2) · · · I(A_n)
                        = 1 − ∑_i I(A_i) + ∑_{i<j} I(A_i ∩ A_j) − ∑_{i<j<k} I(A_i ∩ A_j ∩ A_k)
                          + · · · + (−1)^n I(A_1 ∩ A_2 ∩ · · · ∩ A_n)
Taking expectations of each side yields inclusion-exclusion. Truncating the expansion at level d instead gives the Bonferroni inequalities

I(A_1 ∪ · · · ∪ A_n) ⪋ ∑_i I(A_i) − ∑_{i<j} I(A_i ∩ A_j) + ∑_{i<j<k} I(A_i ∩ A_j ∩ A_k)
                       − · · · + (−1)^{d−1} ∑_{i_1<i_2<···<i_d} I(A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_d})   (85)

where the inequality is ≤ if d is odd and ≥ if d is even. For each d we induct on n. Clearly it's true for d = 1 and n = 2, since by inclusion-exclusion

I(A_1 ∪ A_2) = I(A_1) + I(A_2) − I(A_1 ∩ A_2) ≤ I(A_1) + I(A_2)   (86)
Then it's true inductively for any n and d = 1, assuming it's true for n − 1:

I(A_1 ∪ · · · ∪ A_n) ≤ I(A_1 ∪ · · · ∪ A_{n−1}) + I(A_n)
                     ≤ I(A_1) + · · · + I(A_{n−1}) + I(A_n)   (87)
For arbitrary d, the inequality is true for n ≤ d, since in this case it is just the inclusion-exclusion equality; no terms are truncated. So, inductively, if it's true for n − 1, we can group A_1 ∪ A_2 together and treat it as one set:

I(A_1 ∪ · · · ∪ A_n) ⪋ I(A_1 ∪ A_2) + ∑_i I(A_i)
                       − ∑_i I((A_1 ∪ A_2) ∩ A_i) − ∑_{i<j} I(A_i ∩ A_j)
                       + ∑_{i<j} I((A_1 ∪ A_2) ∩ A_i ∩ A_j) + ∑_{i<j<k} I(A_i ∩ A_j ∩ A_k)   (88)
                       + · · ·
                       + (−1)^{d−1} ∑_{i_1<···<i_{d−1}} I((A_1 ∪ A_2) ∩ A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_{d−1}})
                       + (−1)^{d−1} ∑_{i_1<···<i_d} I(A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_d})
Here the indices range from 3 to n, since we've explicitly broken out A_1 and A_2. Using the distributive law, each indicator which includes A_1 ∪ A_2 can be written as the indicator of a union of two sets (A_1 ∩ B) ∪ (A_2 ∩ B), where B = A_{i_1} ∩ · · · ∩ A_{i_m}. For all such indicators of m < d terms, use inclusion-exclusion for two sets to expand I((A_1 ∪ A_2) ∩ B) → I(A_1 ∩ B) + I(A_2 ∩ B) − I(A_1 ∩ A_2 ∩ B). After performing all these expansions, we recover most of the terms of the inclusion-exclusion formula up to level d. This is because
the terms of level m expand to provide all the missing terms in level m which contain exactly one of A_1 or A_2, and also all the missing terms in level m + 1 containing both A_1 and A_2. Terms with neither A_1 nor A_2 were already present before the expansion. Finally we extend the inequality by expanding the terms at level d using I((A_1 ∪ A_2) ∩ B) ≤ I(A_1 ∩ B) + I(A_2 ∩ B). This provides the missing terms at level d, and it extends the inequality in the right direction, ≤ for odd d and ≥ for even d, owing to the factor (−1)^{d−1} on these terms.
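Since the Bonferroni inequalities hold pointwise for indicators, they hold for set cardinalities, which makes them easy to brute-force on random sets. An illustrative sketch (my own test data):

```python
import random
from itertools import combinations

# Brute-force check of the truncated inclusion-exclusion (Bonferroni)
# inequalities on random subsets of a small universe: the truncation at
# level d overshoots for odd d and undershoots for even d.
random.seed(6)
universe = range(12)
sets = [{x for x in universe if random.random() < 0.4} for _ in range(5)]
union_size = len(set().union(*sets))

for d in range(1, len(sets) + 1):
    trunc = sum((-1) ** (r - 1) *
                sum(len(set.intersection(*c)) for c in combinations(sets, r))
                for r in range(1, d + 1))
    if d % 2 == 1:
        assert union_size <= trunc
    else:
        assert union_size >= trunc
    print(d, trunc, union_size)
```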
7.1 Let f be a bounded continuous function on [0, ∞). The Laplace transform of f is the function L on (0, ∞) defined by

L(λ) = ∫_0^∞ e^{−λx} f(x) dx   (89)

Let X_1, X_2, . . . be i.i.d. exponential with rate λ and let S_n = X_1 + · · · + X_n. Show that

E f(S_n) = ((−1)^{n−1} λ^n/(n − 1)!) L^{(n−1)}(λ)   (90)
Show that f may be recovered from L as follows: for y > 0
We can write

E f(S_n) = ∫_{R_+^n} f(x_1 + x_2 + · · · + x_n) λ^n e^{−λx_1} e^{−λx_2} · · · e^{−λx_n} dx_1 dx_2 · · · dx_n
         = λ^n ∫_0^∞ ∫_{u_1}^∞ · · · ∫_{u_{n−1}}^∞ f(u_n) e^{−λu_n} du_n · · · du_1   (92)

using the change of variables u_i = x_1 + · · · + x_i.
16
We may repeatedly interchange the order of integration and integrate out one variable at a time until we are left with

E f(S_n) = ∫_0^∞ (λ^n u_n^{n−1}/(n − 1)!) f(u_n) e^{−λu_n} du_n   (95)
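Equation (95) says S_n has the Gamma(n, λ) density, which lines up numerically with the derivative formula (90). A hedged sketch (my own choice f(x) = e^{−x}, λ = 1, n = 2, for which L(λ) = 1/(λ+1), L′(λ) = −1/(λ+1)^2, and both sides equal 1/4):

```python
import math

# Left side of (95): integrate f(u) against the Gamma(n, lam) density by
# the trapezoid rule, with f(x) = exp(-x), lam = 1, n = 2.
lam, n = 1.0, 2
h, upper = 0.001, 40.0
xs = [i * h for i in range(int(upper / h) + 1)]
vals = [math.exp(-x) *
        lam**n * x**(n - 1) * math.exp(-lam * x) / math.factorial(n - 1)
        for x in xs]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Right side of (90): L(lam) = 1/(lam+1), L'(lam) = -1/(lam+1)^2, so
# (-1)^{n-1} lam^n / (n-1)! * L^{(n-1)}(lam) = lam^2/(lam+1)^2.
exact = lam**2 / (lam + 1) ** 2
print(integral, exact)  # both ~ 0.25
```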
Now we need two facts from the theory of Laplace transforms:

(d/dλ) L[f] = (d/dλ) ∫_0^∞ f(x) e^{−λx} dx = −∫_0^∞ x f(x) e^{−λx} dx = −L[x f]   (96)

L[f′] = ∫_0^∞ f′(x) e^{−λx} dx = [f(x) e^{−λx}]_0^∞ + λ ∫_0^∞ f(x) e^{−λx} dx = −f(0) + λ L[f]   (98)

Applying the first fact n − 1 times to (95) gives

E f(S_n) = ((−1)^{n−1} λ^n/(n − 1)!) L^{(n−1)}[f](λ)   (100)
Next, the substitution u = αx gives L[f(αx)](λ) = α^{−1} L[f](λ/α), and differentiating n − 1 times with respect to λ,

(d^{n−1}/dλ^{n−1}) L[f(αx)](λ) = α^{−n} L^{(n−1)}[f](λ/α)   (102)

Taking y = 1/λ and α = 1/n we get the desired result.
7.2 As usual, write S^{n−1} = {x ∈ R^n : |x| = 1}. There is a unique probability measure ν_{n−1} on (S^{n−1}, B(S^{n−1})) such that ν_{n−1}(A) = ν_{n−1}(HA) for every orthogonal n × n matrix H and every A in B(S^{n−1}). This is the same as the radial measure in polar coordinates, or the Haar measure under the action of orthogonal matrices.

(a) Prove that if X is a vector in R^n, the components of which are independent N(0, 1) variables, then for every orthogonal n × n matrix H the vector HX has the same property. Deduce that X/‖X‖ has law ν_{n−1}.

lim_{n→∞} Pr(√n Y_1^(n) ≤ x_1; √n Y_2^(n) ≤ x_2) = Φ(x_1) Φ(x_2)   (106)
(a) Let Y = HX. Each component is a linear combination of Gaussian random variables, so Y is jointly Gaussian. Let's compute the moments of the components. Note E Y = E HX = H E X = 0. Therefore

Cov Y = E YY^T = E (HX)(HX)^T = E HXX^T H^T = H (E XX^T) H^T = H I H^T = I   (107)

We've used the identity HH^T = I for orthogonal matrices. Thus each Y_i ∼ N(0, 1), and furthermore Cov(Y_i, Y_j) = 0 if i ≠ j. Since these are jointly Gaussian random variables, zero correlation is the same thing as independence. Now X/‖X‖ ∈ S^{n−1}, and the law of X, by the above calculation, is invariant under orthogonal transformations: HX/‖HX‖ has the same distribution as X/‖X‖, which is true since HX has the same distribution as X. We conclude that the law of X/‖X‖ is ν_{n−1} by the uniqueness of this measure.
(b) Note E Z_k^2 = 1, since this is just the variance of Z_k. Also E Z_k^8 < ∞, since a Gaussian has finite moments of all orders (this is what the fourth-moment version of the SLLN requires when applied to Z_k^2). By the SLLN applied to the random variables Z_k^2,

R_n^2/n = (∑_{k=1}^n Z_k^2)/n → 1 a.s.   (108)

By continuity of square roots, it must also be the case that R_n/√n → 1 almost surely.
(c) Note Y has the same law as X/‖X‖ = X/R_n, so √n Y has the same law as (√n/R_n) X, and √n/R_n → 1 almost surely as n → ∞. Thus Pr(√n Y_1 ≤ x) → Pr(X_1 ≤ x) = Φ(x). Similarly Pr(√n Y_1 ≤ x_1; √n Y_2 ≤ x_2) → Pr(X_1 ≤ x_1; X_2 ≤ x_2) = Φ(x_1) Φ(x_2).
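Both claims can be probed numerically. An illustrative sketch (my own parameters; n = 100 is a finite-n stand-in for the limit, so the second estimate is only approximately Φ(1)):

```python
import random, math

# With X standard normal in R^n: R_n^2/n -> 1, and sqrt(n) X_1 / R_n is
# approximately N(0, 1), so Pr(sqrt(n) Y_1 <= 1) ~ Phi(1) ~ 0.8413.
random.seed(7)
n, trials = 100, 20_000
r2_sum = 0.0
below = 0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]
    r2 = sum(v * v for v in x)
    r2_sum += r2 / n
    below += math.sqrt(n) * x[0] / math.sqrt(r2) <= 1.0
print(r2_sum / trials)  # ~ 1
print(below / trials)   # ~ 0.841
```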
Conditional Expectation
We make a standard monotone class argument. We will show that the collection of sets satisfying (109) is a d-system; by Dynkin's lemma, since the π-system G has the property, so does σ(G).
(a) By assumption Ω ∈ G
(b) Suppose A, B ∈ G. Then L = A ∩ B ∈ G because G is a π-system. Now B = (B \ A) ⊔ L is the union of these disjoint sets, so I_B = I_{B\A} + I_L. Thus by the linearity of expectation,

E(X; B \ A) = E X I_{B\A} = E X (I_B − I_L) = E(X; B) − E(X; L)   (110)
Since the same is true for Y we get
E( X; B \ A) = E( X; B) − E( X; L) = E(Y; B) − E(Y; L) = E(Y; B \ A) (111)
and the class of sets satisfying (109) is closed under taking differences.
(c) Suppose A_n ↑ A. Then X I_{A_n} → X I_A pointwise, and |X I_{A_n}| is dominated by |X|, which is integrable. Thus by dominated convergence

E(X; A_n) → E(X; A)   (112)
Mutatis mutandis, the same is true for Y hence
E( X; A) = lim E( X; An ) = lim E(Y; An ) = E(Y; A) (113)
n→∞ n→∞
Recall the defining property of conditional expectation: first, E[Y | X] is σ(X)-measurable, and second, for G ∈ σ(X), E(E[Y | X]; G) = E(Y; G). Using the hypotheses E[Y | X] = X and E[X | Y] = Y with the events {X ≤ c} ∈ σ(X) and {Y ≤ c} ∈ σ(Y),

E(X; X ≤ c) = E(Y; X ≤ c) ⇒ E(X − Y; X ≤ c, Y > c) + E(X − Y; X ≤ c, Y ≤ c) = 0
E(X; Y ≤ c) = E(Y; Y ≤ c) ⇒ E(X − Y; X > c, Y ≤ c) + E(X − Y; X ≤ c, Y ≤ c) = 0   (117)

Therefore

E(X − Y; X ≤ c, Y > c) = E(X − Y; X > c, Y ≤ c)   (118)

since both equal E(Y − X; X ≤ c, Y ≤ c). Since the left-hand side is non-positive, and the right-hand side is non-negative, both sides must be zero.
This clinches it. Note {X > Y} ⊂ ⋃_{c∈Q} {X > c, Y ≤ c}, and on each event of this countable union X − Y > 0 while E(X − Y; X > c, Y ≤ c) = 0, so each event {X > c, Y ≤ c} is null. Hence Pr(X > Y) = 0; that is, X ≤ Y almost surely. By symmetry, Y ≤ X almost surely, and the intersection of these almost sure events is almost sure. So X = Y almost surely.
Martingales
10.1 Polya's urn At time 0 an urn contains 1 black ball and 1 white ball. At each time t = 1, 2, . . . a ball is chosen at random from the urn and is replaced, and another ball of the same color is also added. Just after time n there are therefore n + 2 balls in the urn, of which B_n + 1 are black, where B_n is the number of black balls chosen by time n.

Let M_n = (B_n + 1)/(n + 2) be the proportion of black balls just after time n. Prove that M is a martingale. Prove Pr(B_n = k) = 1/(n + 1) for 0 ≤ k ≤ n. What is the distribution of Θ = lim M_n? Prove that for 0 < θ < 1,

N_n^θ = ((n + 1)!/(B_n! (n − B_n)!)) θ^{B_n} (1 − θ)^{n−B_n}   (120)

defines a martingale N^θ.
We will consider M with respect to the filtration F_n = σ(B_1, . . . , B_n). Given F_{n−1}, a black ball is drawn at time n (so B_n = B_{n−1} + 1) with probability (B_{n−1} + 1)/(n + 1), and otherwise B_n = B_{n−1}. Thus

E[M_n | F_{n−1}] = ((n − B_{n−1})/(n + 1)) · (B_{n−1} + 1)/(n + 2) + ((B_{n−1} + 1)/(n + 1)) · (B_{n−1} + 2)/(n + 2)
                 = (n + 2)(B_{n−1} + 1)/((n + 2)(n + 1)) = M_{n−1}   (121)

which verifies that M_n is a martingale.
Now let t_i specify on which draw the ith black ball is selected, and let s_j specify on which draw the jth white ball is selected. That is, if n = 7 and k = 4 and the balls drawn are bwwbbwb, then t_1 = 1, t_2 = 4, t_3 = 5, t_4 = 7 and s_1 = 2, s_2 = 3, s_3 = 6. The length of each sequence is the total number of balls chosen of the respective color, and since every draw is either black or white, it must appear either in the black sequence or the white sequence: ⋃_i {t_i} ∪ ⋃_j {s_j} = {1, 2, . . . , n}. Let's compute the probability of such a sequence of draws. At the draw of the ith black ball the urn contains i black balls out of t_i + 1 total, and similarly for white, so

∏_{i=1}^k i/(t_i + 1) · ∏_{j=1}^{n−k} j/(s_j + 1) = k! (n − k)!/(n + 1)!   (122)

since the denominators over all n draws run through 2, 3, . . . , n + 1.
We've learned a remarkable fact: the probability doesn't depend on the order of the draws, just the total number which are black. The sequence is exchangeable: any permutation of the outcomes results in an event with the same probability. There are C(n, k) sequences with k black balls and n − k white ones, so

Pr(B_n = k) = C(n, k) · k! (n − k)!/(n + 1)! = 1/(n + 1)   (123)

Thus we've learned another remarkable fact: the distribution of B_n is uniform on {0, 1, . . . , n}. Hence M_n = (B_n + 1)/(n + 2) is uniform on {1/(n + 2), . . . , (n + 1)/(n + 2)}, and Θ = lim_{n→∞} M_n must have the uniform distribution U(0, 1).
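The uniformity of B_n is striking enough to be worth a simulation. An illustrative sketch (my own parameters) that runs the urn for n = 10 draws and tabulates B_n:

```python
import random

# Polya urn simulation: B_n (number of black draws by time n) should be
# uniform on {0, 1, ..., n}, i.e. each value with probability 1/(n+1).
random.seed(8)
n, trials = 10, 100_000
counts = [0] * (n + 1)
for _ in range(trials):
    black, total = 1, 2   # urn starts with 1 black, 1 white
    b = 0
    for _ in range(n):
        if random.random() < black / total:
            black += 1
            b += 1
        total += 1
    counts[b] += 1
freqs = [c / trials for c in counts]
print(freqs)  # each entry ~ 1/11 ~ 0.0909
```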
Turning to N_n^θ, let B be the event that a black ball is drawn at time n, i.e. that B_n = B_{n−1} + 1, and let W = B^c be the event that a white ball is drawn. Then

E[N_n^θ | F_{n−1}] = Pr(W | F_{n−1}) ((n + 1)!/(B_{n−1}! (n − B_{n−1})!)) θ^{B_{n−1}} (1 − θ)^{n−B_{n−1}}
                   + Pr(B | F_{n−1}) ((n + 1)!/((B_{n−1} + 1)! (n − B_{n−1} − 1)!)) θ^{B_{n−1}+1} (1 − θ)^{n−B_{n−1}−1}   (124)
                   = (n!/(B_{n−1}! (n − 1 − B_{n−1})!)) θ^{B_{n−1}} (1 − θ)^{n−1−B_{n−1}}
                     × [ Pr(W | F_{n−1}) (1 − θ)(n + 1)/(n − B_{n−1}) + Pr(B | F_{n−1}) θ(n + 1)/(B_{n−1} + 1) ]

The factor before the brackets is N_{n−1}^θ. With Pr(W | F_{n−1}) = (n − B_{n−1})/(n + 1) and Pr(B | F_{n−1}) = (B_{n−1} + 1)/(n + 1), the expression in the brackets is

((n − B_{n−1})/(n + 1)) · (1 − θ)(n + 1)/(n − B_{n−1}) + ((B_{n−1} + 1)/(n + 1)) · θ(n + 1)/(B_{n−1} + 1) = (1 − θ) + θ = 1   (125)

so E[N_n^θ | F_{n−1}] = N_{n−1}^θ and N^θ is a martingale.
10.2 Bellman's optimality principle Your winnings per unit stake on game n are e_n, where the e_n are i.i.d. RVs with distribution

e_n = +1 with probability p, −1 with probability q   (126)

where q = 1 − p and 1/2 < p < 1. Your stake C_n on game n must lie between 0 and Z_{n−1}, where Z_{n−1} is your fortune at time n − 1. Your object is to maximize the expected interest rate E log(Z_N/Z_0), where N is a given integer representing the number of times you will play and Z_0, your initial wealth, is a given constant. Let F_n = σ(e_1, . . . , e_n) be your history up to time n. Show that if C is any previsible strategy, then log Z_n − nα is a supermartingale, where α denotes the entropy

α = p log p + q log q + log 2   (127)

so that E log(Z_N/Z_0) ≤ Nα; but that, for a certain strategy, log Z_n − nα is a martingale. What is the best strategy?
For a discrete distribution on {1, 2, . . . , n}, let's maximize E log X for a random variable
subject to the constraint ∑_{i=1}^n X(i) = K. Using the method of Lagrange multipliers we
maximize the expression

    F = ∑_{i=1}^n p_i log x_i − λ ( ∑_{i=1}^n x_i − K )    (128)
where the last inequality follows from our previous calculation. Therefore log Z_n − nα is a
supermartingale for any previsible C_n, and the optimal C_n is given by C_n = (2p − 1) Z_{n−1} =
(p − q) Z_{n−1}. The supermartingale property implies that E[log Z_N − Nα] ≤ E[log Z_0], and
thus E[log(Z_N / Z_0)] ≤ Nα.
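The optimality of the fraction p − q and the value α can be checked numerically: staking a fixed fraction f of wealth gives expected log-growth p log(1 + f) + q log(1 − f) per game, and at f = p − q this equals p log 2p + q log 2q = α, since 1 + f = 2p and 1 − f = 2q. A grid-search sketch (p and the grid resolution are arbitrary example choices):

```python
import math

p, q = 0.6, 0.4  # example values with p > 1/2

def growth(f):
    """Expected log-growth per game when staking the fraction f of wealth."""
    return p * math.log(1 + f) + q * math.log(1 - f)

# the entropy constant alpha from (127)
alpha = p * math.log(p) + q * math.log(q) + math.log(2)

# grid search over betting fractions in [0, 0.999)
best_f = max((k / 1000 for k in range(999)), key=growth)
```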
10.3 Stopping times. Suppose S and T are stopping times relative to (Ω, F , {Fn }).
Prove that S ∧ T = min(S, T), S ∨ T = max(S, T), and S + T are all stopping times.
10.4 Let S and T be stopping times with S ≤ T. Define the process 1(S,T ] with parameter
set N via (
1 if S(ω ) < n ≤ T (ω )
1(S,T ] (n, ω ) = (133)
0 otherwise
Prove 1(S,T ] is previsible and deduce that if X is a supermartingale then
E( X T ∧ n ) ≤ E( XS ∧ n ), ∀n (134)
Note
    {1_{(S,T]}(n, ω) = 1} = {S(ω) ≤ n − 1} ∩ {T(ω) ≥ n}    (135)

The first event {S ≤ n − 1} ∈ F_{n−1} because S is a stopping time. Similarly the second
event {T ≥ n} = {T ≤ n − 1}^c ∈ F_{n−1} because T is a stopping time. Since 1_{(S,T]} only
takes the values {0, 1}, this proves that ω ↦ 1_{(S,T]}(n, ω) is F_{n−1}-measurable. Hence the
process 1_{(S,T]} is previsible.
Thus for any supermartingale Xn , the process Yn = 1(S,T ] • Xn is also a supermartin-
gale with value at time n given by
1(S,T ] • Xn = XT ∧n − XS∧n (136)
Without loss of generality we can assume Y0 = 0, reindexing time if necessary so that
S, T ≥ 1. Taking expectations of both sides gives the desired result
    E Y_n = E X_{T∧n} − E X_{S∧n} ≤ E Y_0 = 0    (137)
10.5 Suppose that T is a stopping time such that for some N ∈ N and some e > 0 we
have, for every n
Pr( T ≤ n + N | Fn ) > e, a.s. (138)
By induction, using Pr(T > kN) = Pr(T > kN; T > (k − 1)N), we will show a geometric bound for
k = 1, 2, 3, . . . Write G_n for the event {T > n}. The hypothesis gives Pr(G_{n+N} | F_n) ≤ 1 − e a.s.
Taking n = 0, this implies Pr(G_N) ≤ 1 − e. By induction, using the relation
above for n = N(k − 1), we get

    Pr(G_{Nk}) = Pr(G_{N(k−1)+N} | G_{N(k−1)}) Pr(G_{N(k−1)}) ≤ (1 − e)(1 − e)^{k−1} = (1 − e)^k    (141)

Also if m < n then G_m ⊃ G_n, so we have the trivial inequality Pr(G_m) ≥ Pr(G_n). Using
this inequality for n = Nk + 1 up to n = N(k + 1) − 1 compared with m = Nk, we get

    E T = ∑_{n=1}^∞ Pr(T ≥ n) ≤ ∑_{k=0}^∞ N Pr(T ≥ Nk) ≤ N ∑_{k=0}^∞ (1 − e)^k = N/e    (143)
Clearly this is finite. Note that the inequalities used above are true almost surely. How-
ever, we used only a countable number of inequalities, and the countable intersection of
almost sure events is still almost sure. Thus the inequality above also holds almost surely.
The result may be summarized, “so long as there’s a chance something happens, no mat-
ter how small (so long as the chance is bounded below by a positive number), then it will
happen eventually”
10.6 ABRACADABRA. At each of times 1, 2, 3, . . . a monkey types a capital letter at
random, so the sequence typed is an i.i.d. sequence of RVs, each chosen uniformly
among the 26 possible capital letters. Just before each time t = 1, 2, . . . a new gambler
arrives. He bets $1 that

    the tth letter will be A    (144)

If he loses, he leaves. If he wins, he bets his fortune of $26 that the next letter will be B,
and so on through ABRACADABRA. Let T be the first time the monkey has produced
the consecutive sequence ABRACADABRA. Show why

    E T = 26^{11} + 26^4 + 26    (147)
Intuitively, the cumulative winnings of all the gamblers form a martingale, since they are
the sum of a bunch of martingales. The cumulative winnings are −T for the initial stakes placed
at each time 1, . . . , T, plus the 26^{11} + 26^4 + 26 payout for the winnings on the bets placed on
the first A (winning for the whole word ABRACADABRA), the bet placed on the second-
to-last A (winning for the terminal fragment ABRA), and the bet placed on the final A
(winning for A). For this to be a martingale, the expected cumulative winnings must be 0,
so E T = 26^{11} + 26^4 + 26.
More formally, let L_n be the nth randomly selected letter. The L_n are i.i.d., with uni-
form distribution over {A, B, . . . , Z}. For a sequence of letters a_1 a_2 . . . a_{11} consider the
martingale

    M_n^{(j)} = 1          if n ≤ j
              = 26^k       if n = j + k and L_j = a_1, L_{j+1} = a_2, . . . , L_{j+k−1} = a_k
              = 26^{11}    if n > j + 11 and L_j = a_1, L_{j+1} = a_2, . . . , L_{j+10} = a_{11}
              = 0          otherwise    (148)

This is bounded, and hence is in L^1. M_n^{(j)} satisfies the martingale property since, except
in case 2, it's constant, and in that case

    E[M_{n+1}^{(j)} | F_n] = (1/26) · 26 M_n^{(j)} + (25/26) · 0 = M_n^{(j)}    (149)
Now form the martingale N ( j) = In≥ j • M( j) where In≥ j is the indicator for n ≥ j. Now
form the martingale
B = ∑ N ( j) (150)
j ≥1
At any time n only finitely many of the terms are non-zero. Each term is in L1 , so B is
also in L1 . The martingale property holds since B is the sum of martingales. For n ∈ N
let S be the set of j ≤ n such that M_n^{(j)} is in case 2, matching some number of letters. Let
S′ be the set of j ≤ n such that M_n^{(j)} is in case 3, matching all the letters. The value of B is
given by

    B_n = −n + ∑_{j∈S} 26^{n−j} + ∑_{j∈S′} 26^{11}    (151)
Let T be the stopping time at which some M^{(j)} first enters case 3, that is, M^{(j)}
matches all 11 letters of ABRACADABRA. It must be that E|T| < ∞ because T satis-
fies the property in 10.5 with N = 11 and e = 26^{−11}: with probability at least e, the next
11 letters are ABRACADABRA in order. When the stopping time occurs, we can evaluate the
second and third terms of (151). In this case S = {T − 4, T − 1}, since the terminal
ABRA and A matched M^{(T−4)} and M^{(T−1)} respectively, and S′ has exactly one el-
ement, since this is the first time ABRACADABRA has appeared. Hence by Doob's optional
stopping theorem

    0 = B_0 = E B_T = −E T + 26^4 + 26 + 26^{11}    (152)

This is the same as (147).
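The same martingale argument applies to any pattern over any alphabet: the expected waiting time is the sum of |A|^k over the lengths k of every suffix of the pattern that is also a prefix. A small-alphabet sanity check (a sketch; the patterns, seed, and trial count are arbitrary): over the alphabet {A, B}, the pattern "AA" overlaps itself as "AA" and "A", so E T = 2² + 2 = 6, while "AB" has no self-overlap, so E T = 2² = 4.

```python
import random

random.seed(1)

def wait_time(pattern, alphabet):
    """Type i.i.d. uniform letters until `pattern` appears; return the time."""
    recent, t = "", 0
    while not recent.endswith(pattern):
        # only the last len(pattern) letters matter
        recent = (recent + random.choice(alphabet))[-len(pattern):]
        t += 1
    return t

trials = 200000
mean_aa = sum(wait_time("AA", "AB") for _ in range(trials)) / trials  # expect ~6
mean_ab = sum(wait_time("AB", "AB") for _ in range(trials)) / trials  # expect ~4
```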
and p ≠ q. Suppose that a and b are integers with 0 < a < b. Define

    S_n = a + X_1 + · · · + X_n,    T = inf{n : S_n = 0 or S_n = b}    (154)
Let's compute

    E[M_n | F_{n−1}] = p (q/p)^{S_{n−1}+1} + q (q/p)^{S_{n−1}−1} = (q + p) (q/p)^{S_{n−1}} = M_{n−1}    (156)

    E[N_n | F_{n−1}] = p (S_{n−1} + 1 − n(p − q)) + q (S_{n−1} − 1 − n(p − q))    (157)
                     = (p + q) S_{n−1} + p − q − n(p − q) = N_{n−1}    (158)
Now the condition described in 10.5 holds with N = b − 1. For any n, suppose that
X_{n+1} = X_{n+2} = · · · = X_{n+b} = +1. Conditional on S_n ∈ (0, b), we have S_{n+b} =
S_n + b > b, so S_{n+k} = b for some k ∈ {1, 2, . . . , b − 1}. In particular, T ≤ n + b − 1.
However Pr(X_{n+1} = X_{n+2} = · · · = X_{n+b} = +1) = p^b > 0. So
and the condition is satisfied. We conclude that E T < ∞.
Let π = Pr(S_T = 0) = 1 − Pr(S_T = b). By Doob's optional-stopping theorem,

    (q/p)^a = M_0 = E M_T = π · 1 + (1 − π) (q/p)^b    (160)

or

    π = ((q/p)^a − (q/p)^b) / (1 − (q/p)^b)    (161)
Also

    a = N_0 = E N_T = π · 0 + (1 − π) · b − E T (p − q)    (162)

or

    E T = ((1 − π) b − a) / (p − q)    (163)
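The exit probability and expected exit time from the optional-stopping identities can be spot-checked by simulation (a sketch; p, a, b, the seed, and the trial count are arbitrary example values):

```python
import random

random.seed(42)

p, a, b = 0.6, 3, 10
q, r = 1 - p, (1 - p) / p

# exact values implied by (160) and (162)
pi_exact = (r ** a - r ** b) / (1 - r ** b)        # Pr(hit 0 before b)
et_exact = ((1 - pi_exact) * b - a) / (p - q)      # E T

trials = 100000
hits0, total_t = 0, 0
for _ in range(trials):
    s, t = a, 0
    while 0 < s < b:
        s += 1 if random.random() < p else -1
        t += 1
    hits0 += (s == 0)
    total_t += t

pi_sim = hits0 / trials
et_sim = total_t / trials
```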
10.8 A random number Θ is chosen uniformly in [0, 1] and a coin with probability Θ
of heads is minted. The coin is tossed repeatedly. Let Bn be the number of heads in n
tosses. Prove that Bn has exactly the same probabilistic structure as the ( Bn ) sequence in
10.1 on Polya’s urn. Prove that Nnθ is the regular conditional pdf of Θ given B1 , . . . , Bn
R1
Consider the beta function B(m + 1, n + 1) = 0 um (1 − u)n du. Note B(m, n) = B(n, m)
by a change of variables v = 1 − u. Thus, without loss of generality, assume m ≥ n
    B(m+1, n+1) = ∫_0^1 u^m (1−u)^n du
                = [ u^{m+1} (1−u)^n / (m+1) ]_0^1 + (n/(m+1)) ∫_0^1 u^{m+1} (1−u)^{n−1} du
                = (n/(m+1)) B(m+2, n) = ( ∏_{k=1}^n (n+1−k)/(m+k) ) B(m+n+1, 1)    (164)
                = (n! m! / (m+n)!) · (1/(m+n+1)) = n! m! / (n+m+1)!
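The closed form B(m+1, n+1) = m! n!/(m+n+1)! is easy to verify numerically. A sketch using a simple midpoint rule (the step count and the test exponents are arbitrary choices):

```python
import math

def beta_midpoint(m, n, steps=200000):
    """Midpoint-rule approximation of B(m+1, n+1) = integral of u^m (1-u)^n on [0,1]."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** m * (1 - (k + 0.5) * h) ** n
                   for k in range(steps))

checks = {(m, n): (beta_midpoint(m, n),
                   math.factorial(m) * math.factorial(n) / math.factorial(m + n + 1))
          for m, n in [(3, 2), (5, 5), (7, 1)]}
```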
Compare this with (123). Now

    Pr(B_n = k; B_{n−1} = k−1) = ∫_0^1 Pr(B_n = k; B_{n−1} = k−1 | Θ = θ) dθ
                               = ∫_0^1 θ · \binom{n−1}{k−1} θ^{k−1} (1−θ)^{n−k} dθ    (166)
                               = \binom{n−1}{k−1} B(k+1, n−k+1) = k / (n(n+1))

so, since Pr(B_{n−1} = k−1) = 1/n,

    Pr(B_n = k | B_{n−1} = k−1) = Pr(B_n = k; B_{n−1} = k−1) / Pr(B_{n−1} = k−1) = k/(n+1)    (167)
That is, we pick up a factor of θ whenever b_k increases (we flipped a head) and a factor of
1 − θ whenever b_k stays the same (we flipped a tail). Clearly this can be simplified to
The jailhouse problem.

    Γ(x) Γ(y) = ∫_0^∞ s^{x−1} e^{−s} ds ∫_0^∞ t^{y−1} e^{−t} dt

When x, y ∈ N we recover our previous formula (164), since Γ(n) = (n − 1)! for n ∈ N.
    E(X_T ; T < ∞) ≤ E X_0    (174)

Deduce that

    Pr(sup_n X_n ≥ c) ≤ E X_0 / c    (175)
10.10 The “Starship Enterprise” problem. The control system on the star-ship Enter-
prise has gone wonky. All that one can do is to set a distance to be travelled. The
spaceship will then move that distance in a randomly chosen direction, then stop. The
object is to get into the Solar System, a ball of radius ρ. Initially, the Enterprise is at a
distance R0 > ρ from the Sun.
Show that, for all ways of setting the jump lengths r_n,

    Pr(the Enterprise returns to the solar system) ≤ ρ/R_0    (179)

and find a strategy of jump lengths r_n so that

    Pr(the Enterprise returns to the solar system) ≥ ρ/R_0 − e    (180)
Let R_n be the distance after n space-hops. Now f(x) = 1/|x| is a harmonic function satis-
fying ∇²f = ∑_i ∂²f/∂x_i² = 0 when x ≠ 0. Let r = |x|, and let S(x, r) be the 2-sphere of radius r
centered at x. Then let S(r) = S(0, r), and let dσ be the surface area element. Similarly let
B(x, r) be the ball of radius r centered at x. We have

    ∂f/∂x_i = −x_i/r³,    ∂²f/∂x_i² = −1/r³ + 3x_i²/r⁵,    ∇²f = −3/r³ + 3r²/r⁵ = 0    (181)
Define the spherical average

    A(x, r) = (1/(4πr²)) ∬_{S(x,r)} f dσ = (1/(4π)) ∬_{S(1)} f(x + rn) dσ₁    (182)

In the above we applied a dilation to integrate over S(1), the surface elements being related
by dσ = r² dσ₁. Then, differentiating under the integral sign with respect to r,

    (d/dr) A(x, r) = (1/(4π)) ∬_{S(1)} ∇f(x + rn) · n dσ₁ = (1/(4πr²)) ∬_{S(x,r)} ∇f · n dσ
                   = (1/(4πr²)) ∭_{B(x,r)} ∇²f dV = 0   (when 0 ∉ B(x, r))    (183)
Before considering the case r > R_n we need to perform a calculation. For any e > 0
and f(x) = |x|^{−1},

    ∬_{S(e)} ∇f · n dσ = −∬_{S(e)} (x · x)/r⁴ dσ = −4πe²/e² = −4π    (185)

since ∇|x|^{−1} = −x/r³ and n = x/r. Remarkably, the integral does not depend on e. Thus
for r > |x| we can modify (183) to integrate over B(x, r) \ B(0, e), and then take e → 0.
This gives the formula

    (d/dr) A(x, r) = 0        if r < |x|
                   = −1/r²    if r ≥ |x|    (186)
Thus, this more complete analysis shows that 1/R_n is a supermartingale everywhere, and

    E[1/R_{n+1} | F_n] = 1/max(r_n, R_n) ≤ 1/R_n    (187)

By the positive supermartingale maximal inequality (175),

    Pr(min_n R_n ≤ ρ) = Pr(max_n 1/R_n ≥ 1/ρ) ≤ ρ/R_0    (188)
The probability on the left is the probability that the enterprise ever gets into the solar
system, and it applies to any strategy rn of jump-lengths.
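The key identity behind (187), that averaging 1/|x + rn| over a uniformly chosen unit direction n gives 1/max(|x|, r) in three dimensions, can be checked by Monte Carlo (a sketch; the sample count, seed, and test points are arbitrary):

```python
import math
import random

random.seed(2)

def sphere_mean_inv(d, r, samples=200000):
    """Monte Carlo average of 1/|x + r*n| for x = (d, 0, 0) and n uniform
    on the unit sphere (generated by normalizing a 3D Gaussian)."""
    total = 0.0
    for _ in range(samples):
        gx, gy, gz = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
        norm = math.sqrt(gx * gx + gy * gy + gz * gz)
        px = d + r * gx / norm
        py = r * gy / norm
        pz = r * gz / norm
        total += 1.0 / math.sqrt(px * px + py * py + pz * pz)
    return total / samples

near = sphere_mean_inv(2.0, 1.0)  # jump radius < distance: expect ~1/2
far = sphere_mean_inv(1.0, 3.0)   # jump radius > distance: expect ~1/3
```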
To get a lower bound, consider the process with constant jump sizes r_n = e for some
0 < e < ρ, and also choose ρ′ > R_0. Let τ = inf{n : R_n ≤ ρ} and τ′ = inf{n : R_n ≥ ρ′},
and let t = τ ∧ τ′. First we show Pr(t < ∞) = 1, that is, we hit one barrier or the other
eventually. Note the x-coordinate of the Enterprise's location follows a random walk,
since the x-projections of the jumps of length e are i.i.d. Let X_n be the x-coordinate at time
n. By the symmetry of each spherical jump, X_n is a martingale, so E X_n = X_0. Also,
if J_n = X_n − X_{n−1}, note Var(J_n) = e²σ₀² for some σ₀ > 0 since, by the dilation symmetry of
the spherical jump, the variance scales as e².
Now let's consider the stopped martingale Z_n = 1/R_{t∧n}. This is a martingale since our
choice of e < ρ implies r_n = e < R_n for all n < t. We can apply Doob's optional stopping
theorem since

    1/(ρ′ + e) ≤ Z_n ≤ 1/(ρ − e)    (192)
and therefore Z_n is a bounded, positive martingale. We can break the expectation up into
two parts, according to whether R_n hits the upper barrier ρ′ or the lower barrier ρ:

    1/R_0 = E[Z_t] = E[Z_t ; t = τ] + E[Z_t ; t = τ′] ≤ Pr(t = τ)/(ρ − e) + Pr(t = τ′)/ρ′    (193)

where we've estimated the expectations by the simple upper bounds implied by

    ρ − e ≤ R_t ≤ ρ        if t = τ
    ρ′ ≤ R_t ≤ ρ′ + e      if t = τ′    (194)
Now as ρ′ → ∞, the event {t = τ} is essentially the event that the Enterprise returns
to the solar system: it says that R_n ≤ ρ occurs before R_n → ∞. So we can
interpret (193) in the limit ρ′ → ∞ to say

    Pr(Enterprise returns) ≥ (ρ − e)/R_0    (196)
Let f(x) = log|x| = ½ log(x₁² + x₂²). Then

    ∂f/∂x_i = x_i/r²,    ∂²f/∂x_i² = 1/r² − 2x_i²/r⁴,    ∇²f = 2/r² − 2r²/r⁴ = 0    (197)

    X_n = log R_n − log R_{n−1} = log αR_n − log αR_{n−1}   for any α > 0    (198)
For any real number α > 0,

    Pr(inf_n S_n = −∞) ≥ Pr(S_n ≤ −ασ√n i.o.)    (200)

since if the event on the right-hand side happens, the event on the left-hand side cer-
tainly does: S_n takes on arbitrarily negative values if it is less than −ασ√n
infinitely often.
Let E_k be a sequence of events. Then the event that the E_k occur infinitely often is given
by

    E_{i.o.} = ∩_{n∈N} ∪_{k≥n} E_k    (201)

The limit Pr(∪_{k≥n} E_k) exists because the terms are non-increasing in n, and it equals the
lim sup. Since Pr(∪_{k≥n} E_k) ≥ Pr(E_n), a lower bound is given by
and therefore Pr(inf_n S_n = −∞) > 0. This implies that Pr(inf_n S_n = −∞) = 1 by Kolmo-
gorov's 0–1 law, since we will show this event is in the tail field.
There are two loose ends to tie up. First we will demonstrate that the event E =
{inf S_n = −∞} is in the tail field. On E, select a subsequence n_k such that S_{n_k} ≤ −k. Now
replace X_1, . . . , X_m by X′_1, . . . , X′_m, and let

    S′_n = X′_1 + X′_2 + · · · + X′_n                                   if n ≤ m
         = X′_1 + X′_2 + · · · + X′_m + X_{m+1} + · · · + X_n           if n > m    (205)

If D = ∑_{i=1}^m (X′_i − X_i), then for n > m, S′_n = S_n + D. From this it's obvious that
S′_{n_k} ≤ −k + D → −∞. Thus inf S′_n = −∞ as well. Therefore E is in
σ(X_{m+1}, X_{m+2}, . . . ), since it's independent of the values of X_1, . . . , X_m. Since this is true
for any m, E is in the tail field.
Finally we show that σ² = Var(X_i) < ∞; it is enough to show E X_i² < ∞. Without loss of
generality, assume R_{n−1} = 1, and we'll compute the pdf of R = R_n. Let Θ be the direction
the Enterprise travels, with Θ = 0 corresponding to the direction toward the center of the
solar system. By hypothesis Θ ∼ U(0, 2π). Now R is the base of an isosceles triangle with
equal edges 1 and opposite angle Θ, so

    R = 2 sin(Θ/2)   ⇒   p_R(r) = 2/(π √(4 − r²))    (206)

where p_R(r) is the pdf of R. Thus we want to compute

    E (log R)² = ∫_0^2 (2 (log r)²)/(π √(4 − r²)) dr = ∫_{−∞}^{log 2} (2 u² e^u)/(π √(4 − e^{2u})) du    (207)

The left tail, where r is close to zero, corresponds to the transformed integrand where u is
close to −∞. There the integral resembles the tail of Γ(3) = ∫_0^∞ u² e^{−u} du, and is therefore
finite. On the right tail, as r → 2, the integrand resembles a constant multiple of 1/√(4 − r²),
which has antiderivative proportional to arcsin(r/2) and therefore finite integral on [2 − e, 2].
On the intermediate range [e, 2 − e], the integrand is continuous and hence bounded, so the
integral is bounded there too. Thus E (log R)² < ∞.
Assume that if X denotes any of the X_k^{(n)} then E X = μ and σ² = Var(X) < ∞,
and deduce that M is bounded in L² iff μ > 1. Show that when μ > 1,

    Var(M_∞) = σ²/(μ(μ−1))    (211)
Note that since E(X_k^{(n+1)} | F_n) = E X_k^{(n+1)} = μ and since Z_n is F_n-measurable,

    E(Z_{n+1} | F_n) = ∑_{k=1}^{Z_n} E(X_k^{(n+1)} | F_n) = Z_n μ    (212)
But by definition, Var(Z_{n+1} | F_n) = E(Z_{n+1}² | F_n) − (E(Z_{n+1} | F_n))², so

    E(M_{n+1}² | F_n) = (μ² Z_n² + Z_n σ²)/μ^{2n+2} = M_n² + M_n σ²/μ^{n+2}    (216)

Thus

    b_{n+1} = E(M_{n+1}² − M_n²) = (σ²/μ^{n+2}) E M_n = σ²/μ^{n+2}    (217)
Note bn is a geometric series and therefore ∑n bn < ∞ iff µ > 1. By the theorem in section
12.1, M is bounded in L2 iff ∑n bn < ∞, and in that case it converges almost surely to
some random variable M_∞ with E M_∞² = ∑_n b_n + M_0². When μ > 1 we can sum the series:

    Var(M_∞) = E M_∞² − (E M_∞)² = (σ²/μ²)/(1 − μ^{−1}) = σ²/(μ(μ−1))    (218)

since E M_∞ = M_0 = 1.
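The variance formula can be checked by simulating a Galton-Watson process. A sketch with a toy offspring law taking values 1 or 3 with probability 1/2 each, so that μ = 2, σ² = 1, and the predicted Var(M_∞) = 1/(2 · 1) = 1/2 (the generation depth, trial count, and seed are arbitrary, and at n generations the exact variance is (1/2)(1 − 2^{−n})):

```python
import random

random.seed(5)

def m_n(generations):
    """Simulate Z_n with offspring uniform on {1, 3}; return M_n = Z_n / mu^n."""
    z = 1
    for _ in range(generations):
        z = sum(1 if random.random() < 0.5 else 3 for _ in range(z))
    return z / 2 ** generations

gens, trials = 8, 10000
samples = [m_n(gens) for _ in range(trials)]
mean = sum(samples) / trials                           # expect ~1 (= M_0)
var = sum((m - mean) ** 2 for m in samples) / trials   # expect ~0.5
```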
12.2 Let E_1, E_2, . . . be independent events with Pr(E_n) = 1/n. Let Y_k = I_{E_k}. Prove that
∑(Y_k − 1/k)/log k converges a.s. and use Kronecker's Lemma to deduce that

    N_n/log n → 1   a.s.    (219)

where N_n = Y_1 + · · · + Y_n.
First note E Y_k = Pr(E_k) = 1/k and

    Var Y_k ≤ E Y_k² = E I²_{E_k} = E I_{E_k} = Pr(E_k) = 1/k    (220)

Consider the random variable Z_k = (Y_k − k^{−1})/log k. Note that E Z_k = 0 and

    σ_k² = Var Z_k = Var(Y_k)/(log k)² ≤ 1/(k (log k)²)    (221)
Now ∫ dx/(x (log x)²) = −1/log x, so by the integral test ∑_k σ_k² < ∞. Also the Z_k are
independent, so by theorem 12.2,

    ∑_k Z_k = ∑_k (Y_k − k^{−1})/log k    (222)
converges almost surely. By Kronecker’s lemma, this means
    (1/log n) ∑_{k=1}^n (Y_k − k^{−1}) = (N_n − H_n)/log n → 0   a.s.    (223)
where N_n = Y_1 + · · · + Y_n and H_n = 1 + ½ + · · · + 1/n is the nth harmonic number. Now
H_n − log n → γ, where γ is the Euler–Mascheroni constant; in particular H_n/log n → 1.
Therefore
Nn
→1 a.s. (224)
log n
Note that for X_1, X_2, . . . drawn i.i.d. from a continuous distribution, the events of 4.3,
that there is a "record" at time k, satisfy the hypothesis of this problem: in 4.3 we show
the events are independent and that they have probability 1/k.
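The record interpretation can be simulated directly: among n i.i.d. continuous draws, the expected number of records is H_n (a sketch; n, the trial count, and the seed are arbitrary choices):

```python
import random

random.seed(3)

def count_records(n):
    """Number of running maxima ('records') among n i.i.d. uniforms."""
    best, records = float("-inf"), 0
    for _ in range(n):
        x = random.random()
        if x > best:
            best, records = x, records + 1
    return records

n, trials = 500, 10000
mean_records = sum(count_records(n) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))  # H_n, the exact expectation
```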
12.3 Star Trek 3. Prove that if the strategy in 10.11 is (in the obvious sense) employed,
and for ever, in R³ rather than in R², then
Let E_n be the event that the Enterprise returns to the solar system for the first time at time
n. The area of a cap on a sphere with central angle θ is 2πR²(1 − cos θ), and therefore the
probability of a jump landing in that cap is ½(1 − cos θ). The area formula can be seen by
integrating the surface area element R² sin θ dθ dφ, or by using Archimedes' observation
that the projection of a sphere onto an enclosing cylinder preserves surface area. For
the jump strategy described in this problem, the radius of the jump sphere for jump n is
R_{n−1}, the distance from the Sun. The probability that we enter the solar system corresponds
to the cap whose edge is at distance ρ from the Sun. Thus there is an isosceles triangle with
equal sides R and base ρ, and the angle opposite the base is θ. By the law of cosines,
ρ² = 2R² − 2R² cos θ. Let T be the stopping time at which the Enterprise returns to the solar
system, T = inf{n : R_n ≤ ρ}. This implies

    ξ_k = Pr(E_k | F_{k−1}) = ρ²/(4R_{k−1}²)   if k ≤ T
                            = 0                otherwise    (226)
Using the notation of theorem 12.15, Z_n = ∑_{k≤n} I_{E_k} ≤ 1 for all n. Let Y_∞ = ∑_{k=1}^∞ ξ_k =
∑_{k=1}^T ρ²/(4R_{k−1}²). If Y_∞ = ∞ then 12.15b implies that lim_{n→∞} Z_n/Y_n = 1, but this
is a contradiction because Z_n is bounded. Therefore Y_∞ < ∞ almost surely.
TODO: to clinch the argument I want to take ρ → 0 and T → ∞, but the sum also gets
bigger in this limit, since it includes the terms where R_n is really close to 0.
13.1 Prove that a class C of RVs is UI if and only if both of the following conditions hold:
(i) C is bounded in L¹, so that A := sup_{X∈C} E|X| < ∞
(ii) for every e > 0, ∃δ > 0 such that if F ∈ F, Pr(F) < δ, and X ∈ C, then E(|X|; F) < e
Assume the conditions are true. Let A = sup_{X∈C} E|X|, which is finite by condition (i),
and let B_K = sup_{X∈C} Pr(|X| > K). By Markov's inequality, for any X ∈ C,

    Pr(|X| > K) ≤ E|X|/K ≤ A/K    (228)
Taking the supremum over all X ∈ C , this shows BK ≤ A/K. Therefore BK → 0 as K → ∞.
For any e > 0, there is a δ satisfying condition (ii). Take K large enough that BK < δ. From
the definition of BK , all of the sets F = {| X | > K } satisfy Pr( F ) < δ, and hence condition
(ii) implies E(| X |; | X | > K ) < e for each X ∈ C . This verifies that C is UI.
Conversely assume C is UI. For any K > 0 and F ∈ F , note
Hence, given e > 0, choose K such that E(|X|; |X| > K) < e/2 for all X ∈ C, and then choose
δ = e/(2K). Equation (230) shows that for this choice of δ, condition (ii) is satisfied. Further-
more, with F = Ω and K chosen so that E(|X|; |X| > K) ≤ 1, (230) gives a uniform bound of
1 + K for E|X|, so condition (i) is satisfied.
C + D = { X + Y : X ∈ C , Y ∈ D} (231)
We'll verify the conditions in 13.1. First note that for Z₀ ∈ C + D with Z₀ = X₀ + Y₀,
E|Z₀| ≤ E|X₀| + E|Y₀| ≤ sup_{X∈C} E|X| + sup_{Y∈D} E|Y|. The bound on the right-hand side
is independent of Z₀, so it's a bound for sup_{Z∈C+D} E|Z|, and condition (i) is satisfied.
For e > 0, choose δ such that for any X ∈ C, Y ∈ D, and F ∈ F with Pr(F) < δ,
we have E(|X|; F) < e/2 and E(|Y|; F) < e/2 (take δ to be the minimum of the δ's needed
for C and D separately). Then for Z = X + Y,
e e
E(| Z |; F ) ≤ E(| X |; F ) + E(|Y |; F ) < + =e (233)
2 2
Thus condition (ii) is satisfied for C + D
13.3 Let C be a UI family of RVs. Say that Y ∈ D if for some X ∈ C and some sub-σ-
algebra G of F , we have Y = E( X | G), a.s. Prove that D is UI.
14.1 Hunt's lemma. Suppose that (X_n) is a sequence of RVs such that X = lim X_n
exists a.s. and that (X_n) is dominated by Y ∈ (L¹)⁺. Then

    E(X_n | F_n) → E(X | F_∞)   a.s.    (237)
14.2 Azuma-Hoeffding
(b) Prove that if M is a martingale null at 0 such that for some sequence (c_n :
n ∈ N) of positive constants,

    |M_n − M_{n−1}| ≤ c_n,   ∀n    (242)
Write Y = cα − c(1−α), where α = (Y + c)/(2c) ∈ [0, 1]; then E α = ½ since E Y = 0, and by
convexity of the exponential,

    E e^{θY} = E e^{θcα − θc(1−α)} ≤ e^{θc} E α + e^{−θc} E(1−α) = ½(e^{θc} + e^{−θc}) = cosh θc    (245)
2
Now compare the Taylor series of cosh x vs. exp(½x²):

    cosh x = ∑_k x^{2k}/(2k)!,    exp(½x²) = ∑_k x^{2k}/(2^k k!)    (246)

For x ≥ 0, exp(½x²) dominates cosh x term by term, since (2k)! ≥ 2^k k! (the product
defining (2k)! contains all the factors of 2^k k! = 2 · 4 · · · 2k).
Thus by induction E e^{θM_n} = e^{½θ²(c_n² + c_{n−1}² + · · · + c_1²)} E e^{θM_0} = exp(½θ² ∑_k c_k²). If we
choose θ to minimize exp(−θx + ½θ² ∑_k c_k²) we get θ = x/(∑_k c_k²). Plugging this into (248),
where we can rewrite the LHS because e^{θx} is monotonically increasing,

    Pr( sup_{k≤n} M_k ≥ x ) ≤ exp( −x²/(2 ∑_k c_k²) )    (250)
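The bound (250) can be sanity-checked on a simple ±1 random walk, where c_k = 1 and the bound reads Pr(sup_{k≤n} M_k ≥ x) ≤ exp(−x²/(2n)) (a sketch; n, x, the seed, and the trial count are arbitrary):

```python
import math
import random

random.seed(7)

n, x, trials = 100, 20, 20000
bound = math.exp(-x * x / (2 * n))  # Azuma bound with c_k = 1

exceed = 0
for _ in range(trials):
    s, peak = 0, 0
    for _ in range(n):
        s += 1 if random.random() < 0.5 else -1
        peak = max(peak, s)
    exceed += (peak >= x)

emp = exceed / trials  # empirical Pr(sup M_k >= x); should not exceed the bound
```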
Define

    f(λ) = L[x^{−1} sin x] = ∫_0^∞ (e^{−λx} sin x)/x dx    (252)

Then by (97), f′(λ) = −L[sin x]. We can compute this handily:

    −f′(λ) = ∫_0^∞ e^{−λx} sin x dx    (253)
           = [−cos x e^{−λx}]_0^∞ − λ ∫_0^∞ e^{−λx} cos x dx    (254)
           = 1 − λ [sin x e^{−λx}]_0^∞ − λ² ∫_0^∞ e^{−λx} sin x dx    (255)
           = 1 + λ² f′(λ)    (256)

so f′(λ) = −1/(1 + λ²). Integrating, and using f(λ) → 0 as λ → ∞,

    f(λ) = ∫_λ^∞ dμ/(1 + μ²) = π/2 − arctan(λ)    (257)

Set λ = 0 to find ∫_0^∞ (sin x)/x dx = π/2.
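The Laplace-transform computation is easy to confirm numerically at a particular value: at λ = 1 the transform ∫_0^∞ e^{−λx}(sin x)/x dx should equal π/2 − arctan 1 = π/4. A midpoint-rule sketch (the truncation point and step count are arbitrary; the truncated tail is O(e^{−λ·40})):

```python
import math

def f(lam, upper=40.0, steps=400000):
    """Midpoint-rule approximation of the Laplace transform of sin(x)/x,
    truncated at `upper`."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += math.exp(-lam * x) * math.sin(x) / x
    return total * h

val = f(1.0)
expected = math.pi / 2 - math.atan(1.0)  # = pi/4
```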
A second solution is to use complex analysis. Consider the contour γ in the complex
plane along the real line from − T to −e, then a semicircle centered at 0 from −e to e, then
along the real line from e to T, then a semicircle centered at 0 from T to − T. By Cauchy’s
theorem,

    lim_{e→0} ∫_γ (e^{iz}/z) dz = iπ    (258)
since the integrand is holomorphic inside of the region bound by γ, but has a residue at
0, and the arc of radius e is π radians. On the semicircle of radius T, writing z = Teiθ we
have dz = iTeiθ dθ.
    | ∫_0^π (exp(iTe^{iθ})/(Te^{iθ})) iTe^{iθ} dθ | ≤ ∫_0^π |e^{iT(cos θ + i sin θ)}| dθ = ∫_0^π e^{−T sin θ} dθ    (259)
The integrand is dominated by 1, so by the dominated convergence theorem the integral
converges to 0 as T → ∞. Therefore as T → ∞ and e → 0 we have
    lim_{T→∞, e→0} ∫_γ (e^{iz}/z) dz = ∫_{−∞}^∞ (e^{ix}/x) dx    (260)

and we conclude this integral equals iπ. Making the substitution x → −x we conclude
and we conclude this integral equals iπ. Making the substitution x → − x we conclude
    ∫_{−∞}^∞ (e^{−ix}/x) dx = −iπ    (261)
Using the fact that sin x = (e^{ix} − e^{−ix})/(2i), this implies

    ∫_{−∞}^∞ (sin x)/x dx = π   ⇒   ∫_0^∞ (sin x)/x dx = π/2    (262)

since the integrand on the left is even.
X − Y ∼ U [−1, 1] (264)
Computing,

    φ_Z(θ) = E e^{iθZ} = ∫_{−1}^1 ½ e^{iθx} dx = [e^{iθx}/(2iθ)]_{x=−1}^1 = (sin θ)/θ    (265)
For X and Y i.i.d. and Z = X − Y,

    φ_Z(θ) = E e^{iθ(X−Y)} = φ_X(θ) φ_X(−θ)    (266)

where we used the property that E f(X)g(Y) = E f(X) E g(Y) for independent X and Y.
Since φ_X(−θ) = \overline{φ_X(θ)}, in order for Z = X − Y to satisfy Z ∼ U[−1, 1] it must
be that

    |φ_X(θ)|² = (sin θ)/θ    (267)

But this is impossible, since the left-hand side is non-negative whereas the right-hand
side is sometimes negative.
16.3 Calculate φ_X(θ) where X has the Cauchy distribution, which has pdf 1/(π(1 + x²)).
Show that the Cauchy distribution is stable, i.e., that (X_1 + X_2 + · · · + X_n)/n ∼ X for
i.i.d. Cauchy X_i.
First assume θ > 0. Consider the contour γ which goes from − R to R along the real line,
and then along a semicircle centered at 0 from R to − R. By Cauchy’s theorem
    ∫_γ (e^{iθz}/(1 + z²)) dz = 2πi · (e^{−θ}/(2i)) = π e^{−θ}    (268)
since there is exactly one residue at z = i (the denominator can be written (i + z)(i − z))
and the residue there is e−θ /2. Along the semicircle we can use the elementary bound
    | ∫_{γ′} (e^{iθz}/(1 + z²)) dz | ≤ πR/(R² − 1) → 0    (269)

since for z = Re^{iψ} the magnitude |exp(iθRe^{iψ})| = e^{−Rθ sin ψ} ≤ 1 (this is where we use
the hypothesis θ > 0 and that the path is a semicircle in the upper half plane). Also
|1 + z²| ≥ |z|² − 1 = R² − 1. Thus the integrand is bounded by 1/(R² − 1) and the length of
the path is πR. Thus in the limit only the segment along the real line makes a contribution
to the path integral, and we conclude for θ > 0
    φ_X(θ) = (1/π) ∫_{−∞}^∞ (e^{iθx}/(1 + x²)) dx = e^{−θ}   if θ ≥ 0    (270)
For θ < 0 we can modify the above argument, using a semicircle in the lower half
plane from R to − R instead. Then in Cauchy’s theorem we pick up the residue at −i (in-
stead of i) going along a clockwise path (instead of counterclockwise). Thus the integral
equals πeθ . We get the same bounds on the integral which show the contribution of the
segment along the semicircle tends to 0. Thus
    φ_X(θ) = (1/π) ∫_{−∞}^∞ (e^{iθx}/(1 + x²)) dx = e^{θ}   if θ ≤ 0    (271)
Both cases may be summarized by the formula φ_X(θ) = e^{−|θ|}. From this it immediately
follows that X is stable, since

    φ_{S_n/n}(θ) = E e^{iθ(X_1+···+X_n)/n} = (E e^{iθX_1/n}) · · · (E e^{iθX_n/n}) = φ_X(θ/n)^n = (e^{−|θ|/n})^n = φ_X(θ)    (273)
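Stability is visible in simulation: the sample mean of n i.i.d. Cauchy variables has exactly the Cauchy distribution again, so its CDF at x is ½ + arctan(x)/π. A sketch generating Cauchy samples by the inverse-CDF transform tan(π(U − ½)) (n, the seed, and the trial count are arbitrary):

```python
import math
import random

random.seed(9)

def cauchy():
    # standard Cauchy via the inverse-CDF transform of a uniform
    return math.tan(math.pi * (random.random() - 0.5))

n, trials = 20, 50000
means = [sum(cauchy() for _ in range(n)) / n for _ in range(trials)]

def ecdf(x):
    """Empirical CDF of the sample means."""
    return sum(m <= x for m in means) / trials

# Cauchy CDF: F(x) = 1/2 + arctan(x)/pi, so F(0) = 0.5 and F(1) = 0.75
```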
Note that

    φ_X(θ) = E e^{iθX} = (1/√(2π)) ∫_{−∞}^∞ e^{iθx} e^{−x²/2} dx
           = (1/√(2π)) ∫_{−∞}^∞ e^{−(x−iθ)²/2 − θ²/2} dx = e^{−θ²/2} I(iθ)    (274)

where I(α) = (1/√(2π)) ∫_{−∞}^∞ e^{−(x−α)²/2} dx. For a real parameter α, I(α) = I(0) = 1, since
we can just perform a change of variables u = x − α, and the range of integration is un-
changed.
In the complex case, consider a path integral in C along the rectangle with corners at
±R and ±R − iθ of the function f(z) = (1/√(2π)) e^{−z²/2}. Since f(z) is entire, the path integral
is 0. As R → ∞, the integral along the bottom side from −R − iθ to R − iθ tends to I(iθ). The
integral along the top side from R to −R tends to −I(0) (since the limits of the integral are
reversed). Along the sides of the rectangle, z = ±R − ix for x ∈ [0, θ], so the magnitude
of the integrand satisfies

    |(1/√(2π)) e^{−(±R−ix)²/2}| = (1/√(2π)) |e^{−(R² − x² ∓ 2Rxi)/2}| ≤ (1/√(2π)) e^{−(R² − θ²)/2} → 0    (275)
The path along this side is constant length θ as R → ∞. Therefore the contribution to the
path integral from the left and right sides of the rectangle is negligible as R → ∞, by the
simple bound of the path length times the maximum integrand magnitude. We conclude
    0 = ∫_γ f(z) dz → I(iθ) − I(0)    (276)

Thus I(iθ) = I(0) = 1 and φ_X(θ) = e^{−θ²/2}.
    ∑_{j,k} c_j \bar{c}_k φ(θ_j − θ_k) ≥ 0    (277)

This quantity equals

    E|Z|² ≥ 0    (278)

for Z = ∑_j c_j e^{iθ_j X}, since

    |Z|² = Z\bar{Z} = ( ∑_j c_j e^{iθ_j X} ) ( ∑_k \bar{c}_k e^{−iθ_k X} ) = ∑_{j,k} c_j \bar{c}_k e^{i(θ_j − θ_k)X}    (279)
16.6
(a) Let (Ω, F , Pr) = ([0, 1], B[0, 1], Leb). What is the distribution of the random variable
Z = 2ω − 1? Let ω = ∑ 2^{−n} R_n(ω) be the binary expansion of ω. Let

    U = ∑_{odd n} 2^{−n} (2R_n(ω) − 1)    (280)

Find a random variable V independent of U such that U and V are identically dis-
tributed and U + ½V is uniformly distributed on [−1, 1].
(b) Now suppose that (on some probability triple) X and Y are i.i.d. RV such that
1
X+ Y is uniformly distributed on [−1, 1] (281)
2
Let φ be the characteristic function of X. Calculate φ(θ )/φ( 14 θ ). Show that the dis-
tribution of X must be the same as that of U in part (a) and deduce that there exists
a set F ∈ B[−1, 1] such that Leb( F ) = 0 and that Pr( X ∈ F ) = 1
(a) The Lebesgue measure gives the uniform distribution on [0, 1]. Under the linear trans-
form Z = 2ω − 1 the distribution remains uniform, but over [−1, 1]. To see this note
for any z ∈ [−1, 1]
    Pr(Z < z) = Pr(ω < ½(z + 1)) = Leb[0, ½(z + 1)) = ½(z − (−1))    (282)

which is exactly the distribution function of U[−1, 1]. Let V be given by an expression
similar to U:

    V(ω) = ∑_{odd n} 2^{−n} (2R_{n+1}(ω) − 1) = ∑_{even n} 2^{−n+1} (2R_n(ω) − 1)    (283)
odd n even n
the difference being that the nth term uses the (n+1)st Rademacher function R_{n+1}
rather than R_n. It's a standard result that the Rademacher functions R_n(ω) are i.i.d.
The variables U and V are functions of disjoint collections of the R_n, so they are indepen-
dent. Because their expressions as functions of identically distributed Rademacher
functions are the same, the variables have the same distribution. It's also clear that
    U + ½V = ∑_{odd n} 2^{−n}(2R_n(ω) − 1) + ∑_{even n} 2^{−n}(2R_n(ω) − 1)
           = 2 ( ∑_{n∈N} 2^{−n} R_n(ω) ) − 1 = 2ω − 1    (284)

Thus U + ½V ∼ U[−1, 1].
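The digit identity (284) can be checked in code, truncating the binary expansions at K bits, which introduces only O(2^{−K}) error (a sketch; the cutoff K, seed, and sample count are arbitrary):

```python
import random

random.seed(11)

def rademacher(w, n):
    """R_n(w): the nth binary digit of w in [0, 1)."""
    return int(w * 2 ** n) % 2

def u_v(w, K=40):
    """Truncated versions of U and V built from the first K binary digits."""
    u = sum(2 ** -n * (2 * rademacher(w, n) - 1) for n in range(1, K + 1, 2))
    v = sum(2 ** -(n - 1) * (2 * rademacher(w, n) - 1) for n in range(2, K + 1, 2))
    return u, v

errs = []
for _ in range(100):
    w = random.random()
    u, v = u_v(w)
    errs.append(abs((u + v / 2) - (2 * w - 1)))
max_err = max(errs)  # should be of order 2**-40
```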
Since X + ½Y ∼ U[−1, 1] with X, Y i.i.d., we have φ_X(θ) φ_X(θ/2) = (sin θ)/θ, and replacing
θ by θ/2 gives φ_X(θ/2) φ_X(θ/4) = sin(θ/2)/(θ/2). Hence

    φ_X(θ)/φ_X(θ/4) = (φ_X(θ) φ_X(θ/2))/(φ_X(θ/2) φ_X(θ/4)) = (sin θ)/(2 sin(θ/2)) = cos(θ/2)    (286)
Hence, by induction,

    φ_X(θ)/φ_X(4^{−k}θ) = (φ_X(θ)/φ_X(θ/4)) (φ_X(θ/4)/φ_X(θ/16)) · · · (φ_X(4^{−k+1}θ)/φ_X(4^{−k}θ))
                        = cos(θ/2) cos(θ/8) · · · cos(2^{−2k+1}θ)    (287)
Since φ_X is uniformly continuous and φ_X(0) = 1, we conclude by taking the limit k → ∞ that

    φ_X(θ) = ∏_{odd k} cos(2^{−k}θ) = φ_U(θ)    (290)
odd k
So it must be that U and X have the same distribution. TODO: prove the measure-zero
claim. It's not hard to show that the values of X lie in a Cantor-like set with middle
thirds excluded. However, this approach doesn't use the characteristic function at all...
is there some clever thing with Fourier analysis?
18.1
(a) Suppose that λ > 0 and that (for n > λ) F_n is the DF associated with the binomial
distribution Bin(n, λ/n). Prove using CFs that F_n converges weakly to F, where F is
the DF of the Poisson distribution with parameter λ.
(b) Suppose that X1 , X2 , . . . are i.i.d. RV’s each with the density function (1 − cos x )/πx2
on R. Prove that for x ∈ R
    lim_{n→∞} Pr( (X_1 + X_2 + · · · + X_n)/n ≤ x ) = ½ + π^{−1} arctan x    (291)

where arctan takes values in (−π/2, π/2).
(a) Since Bin(n, p) is the sum of n independent Bernoulli(p) random variables, if F_n is
the DF of Bin(n, λ/n) then, with p = λ/n and q = 1 − p,

    φ_{F_n}(θ) = (p e^{iθ} + q)^n = (1 + (λ/n)(e^{iθ} − 1))^n → exp(λ(e^{iθ} − 1))    (293)
On the other hand, if F has a Poisson distribution with parameter λ, its CF is given by

    φ_F(θ) = E e^{iθF} = ∑_{k=0}^∞ e^{iθk} e^{−λ} λ^k/k! = exp(λ e^{iθ} − λ)    (294)
which is exactly the same as lim_{n→∞} φ_{F_n}(θ). Thus we conclude that F_n converges
weakly to F.
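The convergence is already visible numerically at moderate n: by Le Cam's inequality, the total variation distance between Bin(n, λ/n) and Poisson(λ) is at most np² = λ²/n (a sketch; n and λ are arbitrary example values):

```python
import math

n, lam = 500, 3.0
p = lam / n

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson pmfs computed iteratively to avoid huge factorials
pois = [math.exp(-lam)]
for k in range(1, n + 1):
    pois.append(pois[-1] * lam / k)

tv = 0.5 * sum(abs(binom_pmf(k) - pois[k]) for k in range(n + 1))
lecam = lam ** 2 / n  # Le Cam's bound on the total variation distance
```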
(b) Let's compute

    φ_X(θ) = E e^{iθX} = ∫_R e^{iθx} (1 − cos x)/(πx²) dx    (295)
To this end, first let's compute for k ∈ N and α > 0

    I_k(α) = ∫_R x^{−k} e^{iαx} dx    (296)
(Aside: there's something subtle going on here because, like the integral in 16.1, this
is not integrable in the L¹ sense.) As in problem 16.1, take a contour along the real
axis from −R to −e, a semicircle in the upper half plane from −e to e, a line from e
to R, and then a semicircle in the upper half plane back to −R. We use a variation
of the residue theorem, ∫_{γ₀} f(z)/(z − z₀)^k dz = (i∆θ) f^{(k−1)}(z₀)/(k−1)!, where ∆θ is
the angle subtended by the infinitesimal arc. This gives

    ∮_γ (e^{iαz}/z^k) dz = πi (iα)^{k−1}/(k−1)!    (297)
Using the technique of 16.1, we get an estimate of the integral along the big semicircle:

    | ∫_{γ''} (e^{iαz}/z^k) dz | ≤ ∫_0^π (e^{−Rα sin θ}/R^{k−1}) dθ → 0   as R → ∞    (298)
Since | x −k | ≤ | x −1 | on the big semicircle, the estimate for the contribution of this
segment holds. Therefore as R → ∞ and e → 0 we get
    ∮_γ (e^{iαz}/z^k) dz → I_k(α)    (299)
When α < 0 we can modify the argument by taking semicircles in the lower half
plane. This allows us to conclude the integral vanishes along the big semicircle, but
now the sense of integration along the small semicircle is reversed to clockwise from
counterclockwise. We can summarize both cases as

    I_k(α) = π (iα)^k / ((k−1)! |α|)    (300)
This gives I₁(α) = πi sgn(α) and I₂(α) = −π|α|.
Expressing cos x = ½(e^{ix} + e^{−ix}), our CF can be written

    φ_X(θ) = (1/π) [ I₂(θ) − ½ ( I₂(θ+1) + I₂(θ−1) ) ]
           = ½ ( |θ+1| + |θ−1| ) − |θ|    (301)
           = 1 − |θ|   if |θ| ≤ 1
           = 0         otherwise
For any fixed θ, for large enough n, |θ |/n ≤ 1 so the first case applies eventually.
Taking the limit as n → ∞ this implies
18.2 Prove the weak law of large numbers in the following form. Suppose X1 , X2 , . . .
are i.i.d. RV each with the same distribution as X. Suppose X ∈ L1 and that E X = µ.
Prove by use of CFs that the distribution of

    S_n = n^{−1}(X_1 + · · · + X_n)    (305)

converges weakly to the unit mass at μ, and deduce that

    S_n → μ   in probability    (306)
Now consider the Taylor expansion of φ_X(θ). Note that

    φ_X(0) = 1,    φ′_X(0) = E[iX e^{iθX}]|_{θ=0} = iμ    (308)

Therefore

    φ_X(θ) = 1 + iμθ + o(θ)    (309)
So the characteristic function satisfies

    φ_{S_n}(θ) = φ_X(θ/n)^n = ( 1 + iμθ/n + o(θ/n) )^n → exp(iθμ) = φ_Z(θ)    (310)
This implies that Sn → δµ in distribution. However, for a unit mass, convergence in
distribution is the same as convergence in probability since for any e > 0
    Pr(S_n < μ − e) → Pr(Z < μ − e) = 0,    Pr(S_n > μ + e) → Pr(Z > μ + e) = 0    (311)
and hence Pr(|Sn − µ| > e) → 0 for any e > 0. This proves the weak law of large
numbers.
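The CF convergence in (310) can be illustrated for an assumed concrete distribution: for \(X \sim \mathrm{Exp}(1)\), whose standard CF is \(\phi_X(\theta) = 1/(1 - i\theta)\) with \(\mu = 1\), the product \(\phi_X(\theta/n)^n\) approaches \(e^{i\mu\theta}\). A sketch under that assumption:

```python
import cmath

# Assumed example: X ~ Exp(1), so phi_X(theta) = 1/(1 - i*theta) and mu = 1.
# phi_{S_n}(theta) = phi_X(theta/n)^n should approach exp(i*mu*theta),
# the CF of the unit mass at mu.
def phi_exp(theta):
    return 1 / (1 - 1j * theta)

theta, n = 2.0, 10**6
assert abs(phi_exp(theta / n) ** n - cmath.exp(1j * theta)) < 1e-5
```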
18.3 Let X and Y be RV's taking values in [0, 1]. Suppose that
\[
\mathbf{E}X^k = \mathbf{E}Y^k, \qquad k = 0, 1, 2, \ldots \tag{312}
\]
Prove that
Item (ii) follows from the Weierstrass theorem. On the compact set [0, 1], every continuous function f can be approximated uniformly by a sequence of polynomials \(p_n\). Choose N large enough so that for all n > N, \(|f(x) - p_n(x)| < \epsilon\) for all x. In this case
\[
|\mathbf{E}f(X) - \mathbf{E}p_n(X)| \le \mathbf{E}|f(X) - p_n(X)| \le \epsilon \tag{314}
\]
Therefore, using \(\mathbf{E}p_n(X) = \mathbf{E}p_n(Y)\) (which holds by (312) since \(p_n\) is a polynomial),
\[
|\mathbf{E}f(X) - \mathbf{E}f(Y)| \le |\mathbf{E}f(X) - \mathbf{E}p_n(X)| + |\mathbf{E}p_n(X) - \mathbf{E}p_n(Y)| + |\mathbf{E}p_n(Y) - \mathbf{E}f(Y)| \le \epsilon + 0 + \epsilon = 2\epsilon \tag{315}
\]
Since ε is arbitrary, this shows \(\mathbf{E}f(X) = \mathbf{E}f(Y)\).
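The Weierstrass step can be made constructive with Bernstein polynomials, which converge uniformly on [0, 1]; this sketch is an assumed illustration (not the text's particular \(p_n\)), checking the uniform error for \(f(x) = \sin(\pi x)\).

```python
import math

# Bernstein polynomial B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k):
# a constructive uniform approximation of continuous f on [0, 1].
def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: math.sin(math.pi * x)
n = 200
err = max(abs(bernstein(f, n, j / 100) - f(j / 100)) for j in range(101))
assert err < 0.01  # uniform error is on the order of pi^2/(8n)
```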
Approximate \(I_{[0,\alpha]}\) by the continuous piecewise linear function
\[
f_n(x) = \begin{cases} 1 & x \in [0, \alpha) \\ 1 - n(x - \alpha) & x \in [\alpha, \alpha + \frac{1}{n}) \\ 0 & x \in [\alpha + \frac{1}{n}, 1] \end{cases} \tag{316}
\]
Since \(f_n \to I_{[0,\alpha]}\) pointwise and boundedly, \(\mathbf{E}f_n(X) \to \Pr(X \le \alpha)\) by bounded convergence. The same holds for Y. However, \(\mathbf{E}f_n(X) = \mathbf{E}f_n(Y)\) for all n by part (ii). Therefore the limits are equal and \(\Pr(X \le \alpha) = \Pr(Y \le \alpha)\). Thus the moments uniquely determine the distribution on [0, 1].
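The approximation (316) can be tested in an assumed concrete case, X uniform on [0, 1], where \(\Pr(X \le \alpha) = \alpha\) and a direct computation gives \(\mathbf{E}f_n(X) = \alpha + \frac{1}{2n} \to \alpha\):

```python
# Assumed example: X ~ Uniform[0,1], so E f_n(X) = alpha + 1/(2n) -> Pr(X <= alpha).
def f_n(x, alpha, n):
    if x < alpha:
        return 1.0
    if x < alpha + 1 / n:
        return 1 - n * (x - alpha)
    return 0.0

alpha, n, m = 0.3, 50, 200000
# midpoint Riemann-sum approximation of E f_n(X) for uniform X
expect = sum(f_n((i + 0.5) / m, alpha, n) for i in range(m)) / m
assert abs(expect - (alpha + 1 / (2 * n))) < 1e-4
```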
18.4 Suppose that
\[
m_k = \lim_n \int_{[0,1]} x^k\,dF_n \text{ exists for } k = 0, 1, 2, \ldots \tag{319}
\]
Use the Helly-Bray Lemma and 18.3 to show that \(F_n \xrightarrow{w} F\), where F is characterized by \(\int_{[0,1]} x^k\,dF = m_k\), \(\forall k\).
Let \(F_n\) be associated with the measure \(\mu_n\). By Helly-Bray there is a subsequence \(n_i\) such that \(F_{n_i} \xrightarrow{w} F\); call the associated measure µ. By the definition of weak convergence, since \(x^k\) is a continuous function on [0, 1], and since \(n_i\) is a subsequence of n,
\[
m_k = \lim_n \mu_n(x^k) = \lim_{n_i} \mu_{n_i}(x^k) = \mu(x^k) = \int_{[0,1]} x^k\,dF \tag{320}
\]
If another subsequence \(n_i'\) satisfies \(F_{n_i'} \xrightarrow{w} F'\) with associated measure \(\mu'\), then 18.3 implies that \(F' = F\), since the above argument shows \(\mu'(x^k) = m_k\) as well, so \(\mu'(x^k) = \mu(x^k)\) for all k.
This implies that \(F_n \xrightarrow{w} F\). For suppose there is some continuous \(g : [0,1] \to \mathbf{R}\) such that \(\limsup \mu_n(g) \ne \mu(g)\) or \(\liminf \mu_n(g) \ne \mu(g)\). Then choose a subsequence \(n_i\) so that \(\mu_{n_i}(g)\) converges to a value other than \(\mu(g)\). By Helly-Bray we can take a further subsequence converging to a distribution \(F'\) with associated measure \(\mu'\). This \(\mu'\) must satisfy \(\mu'(g) \ne \mu(g)\), but this contradicts what we've previously shown, that every convergent subsequence has the same limit. Therefore \(\lim_n \mu_n(g)\) exists for all \(g \in C[0,1]\) and \(\lim \mu_n(g) = \mu(g)\).
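As an assumed illustration of 18.4, let \(F_n\) put mass \(1/n\) at each of \(0, 1/n, \ldots, (n-1)/n\); its k-th moment tends to \(1/(k+1)\), the moments of Uniform[0, 1], which is the corresponding weak limit F:

```python
# Assumed example: F_n uniform on {0, 1/n, ..., (n-1)/n}.  Its k-th moment
# tends to 1/(k+1), the k-th moment of Uniform[0,1], the weak limit F.
def moment(n, k):
    return sum((i / n) ** k for i in range(n)) / n

n = 50000
for k in range(6):
    assert abs(moment(n, k) - 1 / (k + 1)) < 1e-4
```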
18.5 Improving on 18.3, a moment inversion formula. Let F be a distribution with \(F(0-) = 0\) and \(F(1) = 1\). Let µ be the associated law, and define
\[
m_k = \int_{[0,1]} x^k\,dF(x) \tag{321}
\]
Define
This models the situation in which Θ is chosen with law µ, and a coin with heads-probability Θ is minted and tossed at times 1, 2, …. See E10.8. The RV \(H_k\) is 1 if the k-th toss produces heads, 0 otherwise. Define
\[
S_n = H_1 + H_2 + \cdots + H_n \tag{323}
\]
By the strong law of large numbers and Fubini's theorem,
\[
S_n/n \to \Theta, \text{ a.s.} \tag{324}
\]
For a sequence \(a = (a_n)\), define the difference operator
\[
Da = (a_n - a_{n+1} : n \in \mathbf{Z}^+) \tag{325}
\]
Prove that
\[
F_n(x) = \sum_{i \le nx} \binom{n}{i} (D^{n-i} m)_i \to F(x) \tag{326}
\]
at every point of continuity of F.
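The inversion formula (326) can be tried on an assumed concrete case: \(m_k = 1/(k+1)\), the moments of Uniform[0, 1] (so \(F(x) = x\)). Exact rational arithmetic is used because the alternating difference sums suffer catastrophic cancellation in floating point.

```python
import math
from fractions import Fraction

# Assumed concrete case: m_k = 1/(k+1), the moments of Uniform[0,1] (F(x) = x).
def D_power(m, p, i):
    # (D^p m)_i = sum_j (-1)^j C(p, j) m_{i+j}, the iterated a_n - a_{n+1}
    return sum((-1) ** j * math.comb(p, j) * m(i + j) for j in range(p + 1))

def F_n(x, n, m):
    return float(sum(math.comb(n, i) * D_power(m, n - i, i)
                     for i in range(n + 1) if i <= n * x))

m = lambda k: Fraction(1, k + 1)
for x in [0.2, 0.5, 0.8]:
    assert abs(F_n(x, 100, m) - x) < 0.02
```

For these moments each summand works out to exactly \(1/(n+1)\), so \(F_n(x) = (\lfloor nx \rfloor + 1)/(n+1) \to x\), as (326) predicts.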
Clearly this is true for n = 0, for n ≥ 1
18.6 Moment Problem. Prove that if \((m_k : k \in \mathbf{Z}^+)\) is a sequence of numbers in [0, 1], then there exists a RV X with values in [0, 1] such that \(\mathbf{E}(X^k) = m_k\).
18.7 Using Laplace transforms instead of CF's. Suppose that F and G are DF's on \(\mathbf{R}^+\) such that \(F(0-) = G(0-)\) and
\[
\int_{[0,\infty)} e^{-\lambda x}\,dF(x) = \int_{[0,\infty)} e^{-\lambda x}\,dG(x), \quad \forall \lambda \ge 0 \tag{331}
\]
Note that the integral on the LHS has a contribution F(0) from {0}. Prove that F = G.
Suppose that \((F_n)\) is a sequence of distribution functions on \(\mathbf{R}\), each with \(F_n(0-) = 0\), and such that
\[
L(\lambda) = \lim_n \int e^{-\lambda x}\,dF_n(x) \tag{332}
\]
exists for λ ≥ 0 and that L is continuous at 0. Prove that \((F_n)\) is tight and that
\[
F_n \xrightarrow{w} F \text{ where } \int e^{-\lambda x}\,dF(x) = L(\lambda), \ \forall \lambda \ge 0 \tag{333}
\]
Let \(U = e^{-X}\) and \(V = e^{-Y}\). Since X and Y take values in \([0, \infty)\), U and V take values in [0, 1]. From (331),
\[
\mathbf{E}U^k = \mathbf{E}e^{-kX} = \mathbf{E}e^{-kY} = \mathbf{E}V^k \tag{334}
\]
Hence by 18.3 the distributions of U and V are equal. However, since \(e^{-x}\) is monotonic and invertible, this means that the distributions of X and Y are equal. More concretely, the relationship \(\Pr(U \ge c) = \Pr(X \le -\log c)\) provides a dictionary to translate between the distributions of X and U, and similarly for Y and V.
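The dictionary can be checked on an assumed example: for \(X \sim \mathrm{Exp}(1)\), \(U = e^{-X}\) is Uniform(0, 1], and \(\mathbf{E}U^k = \mathbf{E}e^{-kX}\), the Laplace transform at \(\lambda = k\), equals \(1/(k+1)\), exactly the k-th uniform moment:

```python
import math

# Assumed example: X ~ Exp(1), so E U^k = E e^{-kX} = int_0^inf e^{-(k+1)x} dx
# = 1/(k+1), matching the k-th moment of Uniform(0,1].
def laplace_exp1(k, T=40.0, m=100000):
    # midpoint Riemann sum of int_0^T e^{-(k+1)x} dx (the tail beyond T is negligible)
    h = T / m
    return sum(math.exp(-(k + 1) * (i + 0.5) * h) for i in range(m)) * h

for k in range(5):
    assert abs(laplace_exp1(k) - 1 / (k + 1)) < 1e-4
```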
Given ε > 0, choose λ small enough so that \(L(\lambda) > 1 - \epsilon/4\). This is possible because L is continuous at 0 and \(L(0) = 1\). Thus, because L is the limit of the Laplace transforms of the \(F_n\), for all but finitely many n we must have
\[
\int_{[0,\infty)} (1 - e^{-\lambda x})\,dF_n(x) < \epsilon/2 \tag{335}
\]
Thus for any K, for all but finitely many n,
\[
(1 - F_n(K))(1 - e^{-\lambda K}) \le \int_{[0,\infty)} (1 - e^{-\lambda x})\,dF_n(x) < \epsilon/2 \tag{336}
\]
Choose K large enough so that \(1 - e^{-\lambda K} > \tfrac12\). Thus we get, for all but finitely many n,
\[
F_n(K) > 1 - \epsilon \tag{337}
\]
which shows that Fn is tight.
The rest of the proof is basically the same as 18.4. Having proved tightness, Helly-Bray implies that some subsequence converges in distribution to some distribution, say F. Any distribution which is the limit of a subsequence of \(F_n\) has Laplace transform L (since the limit along a subsequence equals the limit of the full sequence). By the argument above, a distribution is determined by its Laplace transform. Thus every weakly convergent subsequence of \(F_n\) has the same limit F, regardless of the subsequence. This implies that \(F_n \xrightarrow{w} F\) since, for any continuous bounded function g, we must have \(\limsup \int g\,dF_n = \liminf \int g\,dF_n = \int g\,dF\); otherwise we could use Helly-Bray to find a subsequence converging to a distribution other than F.
(e) Deduce from (a) and (d) that \(X_n \to X\) in probability if and only if every subsequence of \((X_n)\) contains a further subsequence which converges a.s. to X.
(a) Suppose there is an event \(E \subset \Omega\) with \(\Pr(E) > 0\) such that for some ε > 0, \(|X_n - X| > \epsilon\) infinitely often for every \(\omega \in E\). Then for every \(\omega \in E\) it can't be that \(X_n \to X\), since either \(X_n > X + \epsilon\) infinitely often or \(X_n < X - \epsilon\) infinitely often, and hence either the lim inf or the lim sup differs from X. Thus \(\Pr(X_n \to X) \le 1 - \Pr(E) < 1\).
(b) Let Ω = [0, 1], taking Pr as the Lebesgue measure on the Borel sets. Let
\[
X_{n,k}(x) = \begin{cases} 1 & \text{if } \frac{k}{n} \le x < \frac{k+1}{n} \\ 0 & \text{otherwise} \end{cases} \tag{338}
\]
Then take the \(X_{n,k}\) as a sequence, taking the indices in lexicographic order: take the n's in order, and for a given n take \(k = 0, \ldots, n-1\) in order. Now for \(n \ge N\) and \(\epsilon \in (0, 1)\),
\[
\Pr(|X_{n,k} - 0| > \epsilon) = \Pr(X_{n,k} = 1) = \frac{1}{n} \le \frac{1}{N} \tag{339}
\]
Thus \(X_{n,k} \to 0\) in probability. On the other hand, \(\limsup X_{n,k}(x) = 1\) for any \(x \in [0, 1)\): for any n, let \(k_n(x) = \lfloor nx \rfloor\). Then \(X_{n,k_n(x)}(x) = 1\), so \(X_{n,k}(x) = 1\) infinitely often along the sequence. Therefore it can't be that \(X_{n,k} \to 0\) almost surely (in fact, almost surely the sequence doesn't converge).
A second solution is given by considering \(I_{E_n}\) for independent events \(E_n\) with \(\Pr(E_n) = p_n\). Note that for \(\epsilon \in (0,1)\),
\[
\Pr(|I_{E_m} - I_{E_n}| > \epsilon) = \Pr(E_n \triangle E_m) = (1 - p_n)p_m + p_n(1 - p_m) \tag{340}
\]
Take \(p_n \to 0\) with \(\sum_n p_n = \infty\) (e.g. \(p_n = 1/n\)). Then \(\Pr(|I_{E_n} - 0| > \epsilon) = p_n \to 0\), so \(I_{E_n} \to 0\) in probability; but since the \(E_n\) are independent, the second Borel-Cantelli lemma gives \(\Pr(E_n \text{ i.o.}) = 1\), so \(I_{E_n} = 1\) infinitely often a.s. and the sequence does not converge almost surely.
A13.2 If \(\xi \sim N(0,1)\), show \(\mathbf{E}e^{\lambda\xi} = \exp(\tfrac12\lambda^2)\). Suppose \(\xi_1, \xi_2, \ldots\) are i.i.d. RV's each with a N(0, 1) distribution. Let \(S_n = \sum_{i=1}^n \xi_i\), let \(a, b \in \mathbf{R}\), and define \(X_n = \exp(aS_n - bn)\). Show that
\[
(X_n \to 0 \text{ a.s.}) \Leftrightarrow (b > 0) \tag{342}
\]
but that for \(r \ge 1\),
\[
(X_n \to 0 \text{ in } L^r) \Leftrightarrow (r < 2b/a^2) \tag{343}
\]
Calculating,
\[
\mathbf{E}e^{\lambda\xi} = \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} e^{\lambda\xi} e^{-\frac12\xi^2}\,d\xi = \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} e^{-\frac12(\xi-\lambda)^2} e^{\frac12\lambda^2}\,d\xi = e^{\frac12\lambda^2} \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} e^{-\frac12\zeta^2}\,d\zeta = e^{\frac12\lambda^2} \tag{344}
\]
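The Gaussian MGF identity can be verified numerically with a midpoint Riemann sum over a range wide enough that the tails are negligible; this is an illustration only.

```python
import math

# Riemann-sum check of E e^{lambda*xi} = e^{lambda^2/2} for xi ~ N(0,1).
def mgf_normal(lam, lo=-12.0, hi=12.0, m=100000):
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        x = lo + (i + 0.5) * h
        total += math.exp(lam * x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2 * math.pi)

for lam in [0.0, 0.5, 1.0, 2.0]:
    assert abs(mgf_normal(lam) - math.exp(lam * lam / 2)) < 1e-6
```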
Suppose b > 0. By the SLLN, \(S_n/n \to 0\) almost surely. Thus for a.e. ω there is an \(N(\omega)\) such that \(a\frac{S_n}{n} - b < -\frac12 b\) for all \(n > N\). In this case \(X_n = \exp\bigl(n(a\frac{S_n}{n} - b)\bigr) < \exp(-\frac12 bn) \to 0\). Now suppose b < 0. Then similarly there is an \(N(\omega)\) such that \(a\frac{S_n}{n} - b > -\frac12 b > 0\), and therefore \(X_n > \exp(-\frac12 bn) \to \infty\). For b = 0 (and \(a \ne 0\)), the law of the iterated logarithm says that \(\limsup S_n/\sqrt{2n\log\log n} = 1\) a.s., so \(aS_n\) takes arbitrarily large positive values infinitely often, and \(X_n = e^{aS_n}\) does not converge to 0. Since \(-S_n\) is also a sum of i.i.d. N(0, 1) RV's, \(aS_n\) also takes arbitrarily large negative values infinitely often. Thus \(e^{aS_n}\) has no limit: it oscillates, reaching arbitrarily high and arbitrarily small positive values.
On the other hand,
\[
\mathbf{E}X_n^r = \mathbf{E}e^{r(aS_n - bn)} = e^{-rbn}\left(\mathbf{E}e^{ra\xi}\right)^n = e^{(\frac12(ra)^2 - br)n} \tag{345}
\]
so \(\|X_n\|_r = (\mathbf{E}X_n^r)^{1/r} = e^{(\frac12 ra^2 - b)n}\). If \(r < 2b/a^2\) then \(\|X_n\|_r \to 0\), but if \(r \ge 2b/a^2\) this fails: if \(r > 2b/a^2\) then \(\|X_n\|_r \to \infty\), and if \(r = 2b/a^2\) then \(\|X_n\|_r = 1\) for all n.
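The threshold in (343) is just the sign of the exponent in (345); a quick check of that algebra for assumed sample parameters \(a = b = 1\) (so the threshold sits at \(r = 2b/a^2 = 2\)):

```python
import math

# ||X_n||_r = (E X_n^r)^{1/r} = exp((0.5*r*a^2 - b)*n); the exponent is
# negative iff r < 2b/a^2.
def lr_norm(r, a, b, n):
    return math.exp((0.5 * r * a * a - b) * n)

a, b = 1.0, 1.0                          # assumed sample values; threshold r = 2
assert lr_norm(1.0, a, b, 50) < 1e-10    # r < 2: norm -> 0
assert lr_norm(2.0, a, b, 50) == 1.0     # r = 2: norm is identically 1
assert lr_norm(3.0, a, b, 50) > 1e10     # r > 2: norm -> infinity
```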