

Solutions to Exercises
06/09-2018
Thomas Mikaelsen
Advanced Probability Theory 1 & 2 - Ernst Hansen
Contents

1 Week 1
1.1 Exercise 1.1
1.2 Exercise 1.2
1.3 Exercise 1.3
1.4 Exercise 1.4
1.5 Exercise 1.5
1.6 Exercise 1.6

2 Week 2
2.1 Exercise 2.2
2.2 Exercise 2.4
2.3 Exercise 2.5
2.4 Exercise 2.6
2.5 Exercise 2.7
2.6 Exercise 2.9
2.7 Exercise 2.11

3 Week 3
3.1 Exercise 2.12
3.2 Exercise 2.13
3.3 Exercise 2.14
3.4 Exercise 2.15
3.5 Exercise 2.16
3.6 Exercise 2.17
3.7 Exercise 2.18

4 Week 4
4.1 Exercise 1.21 [3]
4.2 Exercise 1.23
4.3 Exercise 1.24
4.4 Exercise 1.26
4.5 Exercise 1.29
4.6 Exam January 2014
4.6.1 Question 1.1
4.6.2 Question 1.2
4.6.3 Question 1.3
4.6.4 Question 1.4
4.7 Exam January 2013
4.7.1 Problem 2.1
4.7.2 Problem 2.2
4.7.3 Problem 2.3
4.7.4 Problem 2.4
4.7.5 Problem 2.5
4.7.6 Problem 2.6

5 Week 5
5.1 Exercise 2.1 [3]
5.2 Exercise 2.4
5.3 Exercise 2.10
5.4 Example 7.11 [2]
5.5 Exercise 2.14
5.6 Exam Stok2 November ’16
5.6.1 Question 1
5.6.2 Question 2
5.6.3 Question 1.3
5.6.4 Question 1.4
5.6.5 Question 1.5

6 Week 6
6.1 Exercise 3.2 [3]
6.2 Exercise 3.3 [3]
6.3 Exercise 3.4
6.4 Exercise 3.7
6.5 Exam January 2014 Question 2
6.5.1 Question 2.1
6.5.2 Question 2.2
6.6 Re-Exam January 2015 Question 3
6.6.1 Question 3.1
6.6.2 Question 3.2
6.6.3 Question 3.3

7 Week 7
7.1 Exercise 3.8 [3]
7.2 Exercise 3.9
7.3 Exercise 3.10
7.4 Exercise 3.11
7.5 Exercise 3.12
7.6 Exercise 3.13
7.7 Exam 2015, Problem 3 - Bernstein’s Theorem
7.7.1 Question 3.1
7.7.2 Question 3.2
7.7.3 Question 3.3
7.7.4 Question 3.4
7.7.5 Question 3.5


1 Week 1

1.1 Exercise 1.1


We wish to show that the quantile transformation method mentioned in [3] chapter 17 for simulation of variables with standard exponential distribution amounts to computing X = − log(1 − U), where U ∼ Unif[0, 1].

We take it as known that the standard exponential distribution has density f_Y(y) = e^(−y) for y ≥ 0 and hence has distribution function

F(y) = P(Y ≤ y) = ∫_0^y e^(−t) dt = 1 − e^(−y) for y ≥ 0.

Since y ↦ e^y is strictly increasing, F is strictly increasing by monotonicity of the integral, and also clearly continuous. Thus F is injective and has a unique quantile function, namely its inverse F^(−1). We find F^(−1) by solving

F(F^(−1)(u)) = u ⇔ 1 − e^(−F^(−1)(u)) = u ⇔ F^(−1)(u) = − log(1 − u).

It now follows by [3] that q(U) = − log(1 − U) ∼ exp(1), which is what we wanted.
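As a quick sanity check (our own addition, not part of the exercise), here is a minimal Python sketch of the inverse-transform simulation; the comparison against the theoretical mean and variance 1 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)      # U ~ Unif[0, 1)
x = -np.log(1 - u)                 # quantile transform: F^{-1}(U) ~ exp(1)

# The standard exponential has mean 1 and variance 1.
print(x.mean(), x.var())           # both should be close to 1
```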

1.2 Exercise 1.2


We wish to use the quantile method on the Cauchy distribution as well. It has density f_Y(y) = 1/(π(1 + y²)) and thus it has distribution function

F(y) = P(Y ≤ y) = (1/π) ∫_(−∞)^y 1/(1 + t²) dt = (1/π) arctan(y) + 1/2.

Since arctan is strictly increasing and continuous, F has a unique quantile function. Similarly to Exercise 1.1 we solve

F(F^(−1)(u)) = u ⇔ (1/π) arctan(F^(−1)(u)) + 1/2 = u ⇔ F^(−1)(u) = tan(π(2u − 1)/2).

By [3], q(U) = tan(π(2U − 1)/2) ∼ Cauchy, which is what we wanted.
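A similar sketch for the Cauchy case (our own addition); since the Cauchy distribution has no mean, we check the empirical median (theoretically 0) and the empirical distribution function at 1 (theoretically 3/4) instead.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
x = np.tan(np.pi * (2 * u - 1) / 2)   # F^{-1}(U) ~ standard Cauchy

print(np.median(x))        # ~ 0, the Cauchy median
print(np.mean(x <= 1.0))   # ~ 0.75, since F(1) = (1/pi) arctan(1) + 1/2 = 3/4
```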

1.3 Exercise 1.3


This is about rejection sampling. Assume we want to simulate a real-valued random
variable X with density f and distribution function F . We can generate real-valued
random variables Y with density g as well as random variables U ∼ U nif [0, 1].
We assume that there exists c > 1 such that
f (x) ≤ cg(x) for all x ∈ R. (1)


1.3(a). Assume that Y ⊥⊥ U. We wish to show by explicit computation in the joint distribution that

P(U ≤ f(Y)/(cg(Y))) = 1/c.    (2)

Notice that the requirement c > 1 in condition (1) rules out the degenerate case c = 1: if c = 1 we either have f(x) = g(x) for λ-almost all x ∈ R, which makes the construction trivial, or f < g on a set of positive measure, in which case

1 = ∫_R f dλ < ∫_R g dλ = 1,

which is a contradiction. We can always calculate the probability in question by using

P(U ≤ f(Y)/(cg(Y)))
= ∫ 1{U ≤ f(Y)/(cg(Y))} dP    (definition)
= ∫ 1_(−∞, f(y)/(cg(y))](u) d(U, Y)(P)(u, y)    (A.C.V., Thm. 10.8)
= ∫ 1_(−∞, f(y)/(cg(y))](u) d(U(P) ⊗ Y(P))(u, y)    (Y ⊥⊥ U)
= ∫ ( ∫ 1_(−∞, f(y)/(cg(y))](u) dU(P)(u) ) dY(P)(y)    (Tonelli)
= ∫ P(U ≤ f(y)/(cg(y))) dY(P)(y)    (definition of integral w.r.t. U(P))
= ∫ f(y)/(cg(y)) dY(P)(y)    (U ∼ Unif[0, 1] and f/(cg) ∈ [0, 1])
= ∫ f(y)/(cg(y)) g(y) λ(dy)    (Y has density g w.r.t. λ)
= (1/c) ∫ f(y) λ(dy) = 1/c,    (f is a density)

which is what we wanted.

1.3(b). We now wish to show that

P(Y ≤ x, U ≤ f(Y)/(cg(Y))) = F(x)/c for x ∈ R.

We cannot simply factor the expression into a product using independence, since Y appears in both sets. We get

P(Y ≤ x, U ≤ f(Y)/(cg(Y)))
= ∫ 1_(−∞,x](Y) 1_(−∞, f(Y)/(cg(Y))](U) dP
= ∫ 1_(−∞,x](y) 1_(−∞, f(y)/(cg(y))](u) d(Y, U)(P)(y, u)    (A.C.V.)
= ∫ 1_(−∞,x](y) 1_(−∞, f(y)/(cg(y))](u) d(Y(P) ⊗ U(P))(y, u)    (Y ⊥⊥ U)
= ∫ 1_(−∞,x](y) ( ∫ 1_(−∞, f(y)/(cg(y))](u) dU(P)(u) ) dY(P)(y)    (Tonelli)
= ∫ 1_(−∞,x](y) P(U ≤ f(y)/(cg(y))) dY(P)(y)
= ∫ 1_(−∞,x](y) f(y)/(cg(y)) dY(P)(y)    (U ∼ Unif[0, 1])
= ∫ 1_(−∞,x](y) f(y)/(cg(y)) g(y) λ(dy)    (Y has density g)
= (1/c) ∫ 1_(−∞,x](y) f(y) λ(dy)
= F(x)/c,

which is what we wanted.

1.3(c). Let Y1, Y2, ... be a sequence of random variables from the Y mechanism and U1, U2, ... correspondingly. Assume all variables are independent. Define

τ = inf{n ∈ N | Un ≤ f(Yn)/(cg(Yn))}

with the convention that τ = ∞ if Un > f(Yn)/(cg(Yn)) for all n. We wish to show that P(τ = ∞) = 0. For convenience let An = {Un > f(Yn)/(cg(Yn))}. We wish to use the continuity properties of the measure, but to do that we need to construct a decreasing sequence, so let

Bn = ∩_(k=1)^n Ak.

We then have

B1 ⊇ B2 ⊇ ... and ∩_(n=1)^∞ An = ∩_(n=1)^∞ Bn,

which gives us

P(τ = ∞) = P(∩_(n=1)^∞ An)
= P(∩_(n=1)^∞ Bn)
= lim_(n→∞) P(Bn)
= lim_(n→∞) P(∩_(k=1)^n Ak)
= lim_(n→∞) ∏_(k=1)^n (1 − 1/c)    (independence of the pairs (Uk, Yk) and 1.3(a))
= lim_(n→∞) (1 − 1/c)^n = 0,

as we wanted.

We now wish to show that Eτ = c. Since τ is a discrete variable, we get

Eτ = Σ_(n=1)^∞ n P(τ = n)
= Σ_(n=1)^∞ n P(∩_(k=1)^(n−1) Ak ∩ An^C)
= Σ_(n=1)^∞ n (1 − 1/c)^(n−1) (1/c)
= (1/c) Σ_(n=1)^∞ n x^(n−1), where x = 1 − 1/c,
= (1/c) (d/dx) Σ_(n=0)^∞ x^n    (*)
= (1/c) · 1/(1 − x)² = (1/c) · c² = c,

where (*) uses the fact that the geometric power series has radius of convergence 1, and since c > 1 we have |x| = |1 − 1/c| < 1, so we are within that radius and may switch summation and differentiation, which is what we wanted.

1.3(d). We now define

X = Yn on (τ = n), for n = 1, 2, ...

We wish to show that

P(τ = n, X ≤ x) = (1 − 1/c)^(n−1) F(x)/c for n ∈ N, x ∈ R.

By construction we have

P(τ = n, X ≤ x) = P(τ = n, Yn ≤ x)
= P(∩_(k=1)^(n−1) Ak ∩ An^C ∩ (Yn ≤ x))
= ( ∏_(k=1)^(n−1) P(Ak) ) · P(An^C ∩ (Yn ≤ x))    (independence)
= (1 − 1/c)^(n−1) F(x)/c,    (Exc. 1.3(b)-(c))

which is what we wanted.

1.3(e). We wish to show that P(X ≤ x) = F(x), that is, X has the desired distribution. Since P(τ = ∞) = 0 we get

P(X ≤ x) = P(∪_(n=1)^∞ (τ = n) ∩ (X ≤ x))
= Σ_(n=1)^∞ P(τ = n, X ≤ x)    (disjoint union)
= Σ_(n=1)^∞ (1 − 1/c)^(n−1) F(x)/c
= (F(x)/c) · 1/(1 − (1 − 1/c)) = (F(x)/c) · c = F(x),

which is what we wanted.
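To tie (a)-(e) together, here is a hedged Python sketch of the rejection sampler. The concrete target f (the Beta(2, 2) density), the proposal g = Unif[0, 1] and the constant c = 1.5 are our own illustrative choices, not from the exercise; the loop is exactly the τ mechanism from (c)-(d).

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choice: target f = Beta(2, 2) density on [0, 1],
# proposal g = Unif[0, 1] density (g = 1), and c = 1.5 since sup f/g = f(1/2) = 1.5.
f = lambda x: 6.0 * x * (1.0 - x)
c = 1.5

def sample_once():
    """Draw one X ~ f by rejection; also return tau, the number of proposals."""
    tau = 0
    while True:
        tau += 1
        y = rng.uniform()            # Y ~ g
        u = rng.uniform()            # U ~ Unif[0, 1], independent of Y
        if u <= f(y) / c:            # accept with probability f(Y)/(c g(Y))
            return y, tau

draws = [sample_once() for _ in range(50_000)]
xs = np.array([d[0] for d in draws])
taus = np.array([d[1] for d in draws])
print(xs.mean())    # ~ 0.5, the Beta(2, 2) mean
print(taus.mean())  # ~ c = 1.5, matching E(tau) = c from part (c)
```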

1.4 Exercise 1.4


Lacks a rigorous proof.

1.5 Exercise 1.5


Let X ∼ N (0, 1), let a > 0.

1.5(a). Let b > 0. We wish to show that

e^(ab) 1_[a,∞)(x) ≤ e^(bx) for x ∈ R.    (3)

This is easily seen by plugging in x < a and x ≥ a respectively in the left-hand side: for x < a the left-hand side is 0 ≤ e^(bx), and for x ≥ a we have e^(ab) ≤ e^(bx) since b > 0. We now show that

P(X ≥ a) ≤ e^(−ab) e^(b²/2)    (4)

by integrating both sides of (3) with respect to the distribution of X. By monotonicity of the integral we get

∫ e^(ab) 1_[a,∞)(x) X(P)(dx) = e^(ab) P(X ≥ a) ≤ ∫ e^(bx) X(P)(dx) = E e^(bX) = e^(b²/2),

where the last equality is the moment generating function of the standard normal distribution. Rearranging gives us (4).

1.5(b). We now wish to minimize the right-hand side of (4) over b, giving us the bound

P(X ≥ a) ≤ e^(−a²/2).

Since t ↦ e^t is strictly increasing, this amounts to minimizing b²/2 − ab over b > 0, and the first order condition b − a = 0 gives b = a. Plugging b = a into (4) yields e^(−a²) e^(a²/2) = e^(−a²/2).
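A quick numeric comparison of the bound with Monte Carlo tail estimates (our own addition, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)

for a in (0.5, 1.0, 2.0, 3.0):
    tail = np.mean(x >= a)             # Monte Carlo estimate of P(X >= a)
    bound = np.exp(-a * a / 2.0)       # the bound from 1.5(b)
    print(a, tail, bound)              # the tail should never exceed the bound
```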

1.6 Exercise 1.6


We wish to show that if P (X ≤ x) ∈ {0, 1} for all x, then X has a degenerate
distribution.

Consider c = sup{x | P(X ≤ x) = 0}. It holds that c < ∞, since F(x) → 1 as x → ∞ by the properties of distribution functions, so F(x) = 1 for all large enough x; similarly c > −∞ since F(x) → 0 as x → −∞. We now have

P(X = c) = F(c) − P(X < c)
= F(c) − lim_(x→c−) F(x)    (Ch. 17 in MT)
= lim_(x→c+) F(x) − lim_(x→c−) F(x)    (F is always right-continuous)
= lim_(x→c+) 1 − lim_(x→c−) 0 = 1,    (by construction of c)

which is what we wanted.


2 Week 2

2.1 Exercise 2.2


Let X1, X2, ... and Y1, Y2, ... be real-valued random variables. Assume that

P(|Yn| ≤ |Xn|) = 1 for all n ∈ N.    (1)

We wish to show that

Xn → 0 a.s. ⇒ Yn → 0 a.s.    (2)

Assume that Xn → 0 a.s. and define A = {ω ∈ Ω | Xn(ω) → 0} = (Xn → 0) and B = ∩_(n=1)^∞ {ω ∈ Ω | |Yn(ω)| ≤ |Xn(ω)|}. By assumption we have P(A) = 1, and by (1), B is a countable intersection of events of probability 1, so P(A ∩ B) = 1.

Let ω ∈ A ∩ B and let ε > 0. We then have that Xn(ω) → 0 (pointwise, not just almost surely), so we can choose N ∈ N such that |Xn(ω) − 0| = |Xn(ω)| ≤ ε for all n ≥ N. We then have

|Yn(ω) − 0| = |Yn(ω)| ≤ |Xn(ω)| ≤ ε for all n ≥ N,

and thus Yn(ω) → 0 pointwise. It follows that

A ∩ B ⊆ (Yn → 0)
⇒ 1 = P(A ∩ B) ≤ P(Yn → 0)    (monotonicity of P)
⇒ P(Yn → 0) = 1
⇒ Yn → 0 a.s.,    (definition)

which is what we wanted.

2.2 Exercise 2.4


Let X, X1, X2, ... ∈ R be random variables. Assume that X1 ≤ X2 ≤ ... almost surely. We wish to show that

Xn →P X ⇔ Xn → X a.s.

We show each direction in turn.

"⇐": This follows directly from Lemma 2.14.

"⇒": Assume that Xn →P X and define A = {ω ∈ Ω | X1(ω) ≤ X2(ω) ≤ ...}, with P(A) = 1 by assumption. Since Xn →P X it follows from Theorem 2.26 that there is a sub-sequence (X_(n_k))_(k∈N) such that X_(n_k) → X a.s. for k → ∞, which is equivalent to P(X_(n_k) → X) = 1. Let B = (X_(n_k) → X); we then have P(A ∩ B) = 1. Let ω ∈ A ∩ B; this means that (Xn(ω)) is monotonically increasing and has a convergent sub-sequence. It is a standard result in analysis that these two conditions imply Xn(ω) → X(ω). Using the same arguments as in Exercise 2.2 we have thus shown that

A ∩ B ⊆ (Xn → X) ⇒ 1 = P(A ∩ B) ≤ P(Xn → X)
⇒ P(Xn → X) = 1
⇒ Xn → X a.s.,

which is what we wanted. This result clearly holds for decreasing sequences as well, as the only thing needed is that monotone sequences with convergent sub-sequences converge.

2.3 Exercise 2.5


Let X, X1, X2, ... ∈ R be random variables. We wish to show that

sup_(k≥n) |Xk − X| →P 0 ⇔ Xn → X a.s.

Let Yn = sup_(k≥n) |Xk − X|. This is a decreasing sequence in n: as n increases we take the supremum over smaller sets, thereby decreasing the supremum. The result from Exercise 2.4 therefore applies, yielding Yn →P 0 ⇔ Yn → 0 a.s. It is therefore enough to show that Yn → 0 a.s. ⇔ Xn → X a.s. Consider

Yn(ω) → 0 ⇔ ∀ε > 0 ∃N ∈ N: sup_(k≥n) |Xk(ω) − X(ω)| < ε for all n ≥ N
⇔ ∀ε > 0 ∃N ∈ N: |Xn(ω) − X(ω)| < ε for all n ≥ N
⇔ Xn(ω) → X(ω).

This shows that (Yn → 0) = (Xn → X), so if Yn → 0 a.s. then P(Yn → 0) = 1 and it follows that P(Xn → X) = 1, and conversely. That is, Yn → 0 a.s. ⇔ Xn → X a.s., which is what we wanted.

2.4 Exercise 2.6


Let U1 , U2 , ... ∈ R be an i.i.d. sequence of random variables with U1 ∼ Unif[0, 1].
Define

Mn = max{U1 , ..., Un } for n ∈ N.

We wish to show that Mn → 1 almost surely and in Lp for p ≥ 1.

We start by showing almost sure convergence, and begin by noticing that Mn is an increasing sequence, so Exercise 2.4 implies that Mn → 1 a.s. ⇔ Mn →P 1. It is therefore enough to show convergence in probability. Let ε ∈ (0, 1). We then have

P(|1 − Mn| ≥ ε) = P(1 − Mn ≥ ε)    (Mn ∈ [0, 1])
= P(Mn ≤ 1 − ε)
= P(∩_(k=1)^n (Uk ≤ 1 − ε))
= ∏_(k=1)^n P(Uk ≤ 1 − ε)    (the Uk are independent)
= (1 − ε)^n → 0,    (Uk ∼ Unif[0, 1] and 1 − ε ∈ [0, 1))

which is the definition of convergence in probability, which is what we wanted.


For the Lp convergence, note that 0 ≤ 1 − Mn ≤ 1, so |1 − Mn|^p = (1 − Mn)^p is bounded by the constant 1, which is integrable since P(Ω) = 1, and (1 − Mn)^p → 0 almost surely by the above. By Dominated Convergence we get

lim_(n→∞) E|1 − Mn|^p = E lim_(n→∞) (1 − Mn)^p = 0,

that is, Mn → 1 in Lp for every p ≥ 1, which is what we wanted.
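A small illustration (our own addition): simulate the running maximum of uniforms and watch E(1 − Mn) vanish; the exact value 1/(n + 1) follows from Mn having density n x^(n−1) on [0, 1].

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 2000
u = rng.uniform(size=(reps, n))
m = np.maximum.accumulate(u, axis=1)   # M_k = max(U_1, ..., U_k), per replication

for k in (10, 100, 1000):
    # Monte Carlo estimate of E(1 - M_k); exact value is 1/(k+1).
    print(k, np.mean(1 - m[:, k - 1]), 1 / (k + 1))
```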

2.5 Exercise 2.7


Let X1, X2, ... ≥ 0 be a sequence of random variables. Assume that

Σ_(n=1)^∞ EXn < ∞.    (1)

Let Sn = Σ_(i=1)^n Xi for n ∈ N. We wish to show that there is a random variable S such that Sn → S almost surely and in L1.

We start by showing L1 convergence. Recall that L1 is a Banach space and therefore complete, so series that are absolutely convergent with respect to the norm ||X||_1 = E|X| are also convergent. To establish convergence of Sn we therefore show absolute convergence of the series Σ Xk. Consider

Σ_(n=1)^∞ ||Xn||_1 = Σ_(n=1)^∞ E|Xn| = Σ_(n=1)^∞ EXn < ∞,

where we use that Xn ≥ 0 and our assumption (1). By completeness it follows that the partial sums Sn converge in L1 to some random variable S.

To show almost sure convergence, we notice that Sn is an increasing sequence as Xn ≥ 0, so by Exercise 2.4 it is enough to show convergence in probability. But Lemma 2.21 gives us that L1 convergence implies convergence in probability, which we have just shown, and so we are done.

2.6 Exercise 2.9

Define φ: [0, ∞) → R by

φ(x) = x/(1 + x) for x ≥ 0.

Ad (a): φ is clearly continuous, since the denominator never vanishes on [0, ∞); moreover φ(x) ∈ [0, 1), φ(0) = 0, φ′(x) = 1/(1 + x)² > 0 and φ″(x) = −2/(1 + x)³ < 0. All of this follows by direct computation.

(b): We wish to show that

φ(x + y) ≤ φ(x) + φ(y).

Notice that according to the Fundamental Theorem of Calculus we have φ(x) = ∫_0^x φ′(t) dt, so we get

φ(x + y) = ∫_0^(x+y) φ′(t) dt
= ∫_0^x φ′(t) dt + ∫_x^(x+y) φ′(t) dt
= φ(x) + ∫_x^(x+y) φ′(t) dt
= φ(x) + ∫_0^y φ′(t + x) dt
≤ φ(x) + ∫_0^y φ′(t) dt
= φ(x) + φ(y) − φ(0) = φ(x) + φ(y),

where the inequality uses that φ′ is decreasing (because φ″ < 0) together with monotonicity of the integral.

Ad (c): We wish to show that if we identify random variables that are almost
surely equal, then

d(X, Y ) = Eφ(|X − Y |)

defines a metric on the space of real random variables defined on (Ω, F, P ). There
are four things to show.

(i) d(X, Y) ≥ 0, since it is an integral of a non-negative function.

(ii) d(X, Y) = Eφ(|X − Y|) = Eφ(|Y − X|) = d(Y, X) follows from | · | being symmetric.

(iii) The triangle inequality follows from the computation

d(X, Y) = Eφ(|X − Y|) = Eφ(|X − Z + Z − Y|)
≤ Eφ(|X − Z| + |Z − Y|) ≤ E(φ(|X − Z|) + φ(|Z − Y|))
= Eφ(|X − Z|) + Eφ(|Z − Y|) = d(X, Z) + d(Z, Y),

where the first inequality follows from the triangle inequality for | · | together with φ being increasing and E monotone, and the second inequality follows from the subadditivity of φ shown in (b) together with monotonicity of E.

(iv) Eφ(|X − Y|) = 0 implies that φ(|X − Y|) = 0 almost surely, since a non-negative function with integral zero vanishes almost everywhere; φ(|X − Y|) = 0 implies |X − Y| = 0 almost surely, since φ(0) = 0 and φ is strictly increasing by (a); and |X − Y| = 0 implies X = Y almost surely. Since we have identified random variables that are equal almost surely, we conclude X = Y, and so d defines a metric on the space of (equivalence classes of) real-valued random variables on (Ω, F, P).

Ad (d): We wish to show that

P(|X − Y| ≥ ε) ≤ d(X, Y)/φ(ε).

Since φ is increasing we have

P(|X − Y| ≥ ε) = P(φ(|X − Y|) ≥ φ(ε)) ≤ Eφ(|X − Y|)/φ(ε) = d(X, Y)/φ(ε),

where the inequality is Markov's inequality, applicable since φ(|X − Y|) ≥ 0 by (a).

Let ε, η > 0 and assume that there is a sequence (Xn) of random variables such that Xn → X with respect to the metric d. Since d(Xn, X) → 0 we can choose N ∈ N such that d(Xn, X) ≤ η · φ(ε) for all n ≥ N. It now follows that

P(|Xn − X| ≥ ε) ≤ d(Xn, X)/φ(ε) ≤ η · φ(ε)/φ(ε) = η for all n ≥ N,

which is the definition of Xn →P X.

(e) Let ε > 0. We wish to show that

d(X, Y) ≤ φ(ε) + P(|X − Y| ≥ ε).

Consider

d(X, Y) = Eφ(|X − Y|)
= E(1_(|X−Y|<ε) φ(|X − Y|)) + E(1_(|X−Y|≥ε) φ(|X − Y|))
≤ φ(ε) + E(1_(|X−Y|≥ε) φ(|X − Y|))    (φ increasing, E linear)
≤ φ(ε) + E 1_(|X−Y|≥ε) = φ(ε) + P(|X − Y| ≥ ε),    (φ ∈ [0, 1))

which is what we wanted.


Assume that there is a sequence (Xn) such that Xn →P X. We wish to show that Xn → X with respect to d. Since lim sup always exists and respects weak inequalities, (e) gives

lim sup_(n→∞) d(Xn, X) ≤ lim sup_(n→∞) φ(ε) + lim sup_(n→∞) P(|Xn − X| ≥ ε)
= φ(ε) + 0 = φ(ε) for all ε > 0.

Since φ is continuous with φ(0) = 0, we have φ(ε) → 0 for ε → 0 and thus

lim sup_(n→∞) d(Xn, X) ≤ 0.

Since

0 ≤ lim inf_(n→∞) d(Xn, X) ≤ lim sup_(n→∞) d(Xn, X) ≤ 0

always, we have

lim inf_(n→∞) d(Xn, X) = lim sup_(n→∞) d(Xn, X) = 0 ⇒ lim_(n→∞) d(Xn, X) = 0,

which is what we wanted.

Alternatively, and somewhat easier, instead of using lim sup we can let ε > 0 and put η = ε − φ(ε), which is strictly positive since φ(ε) < ε. Since Xn →P X we can choose N ∈ N such that P(|Xn − X| ≥ ε) ≤ η for all n ≥ N, so that

d(Xn, X) ≤ φ(ε) + η = φ(ε) + (ε − φ(ε)) = ε for all n ≥ N,

and thus Xn → X in the metric d, which is what we wanted.
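A numerical illustration of (d)-(e) (our own addition): estimate d(Xn, X) = Eφ(|Xn − X|) by Monte Carlo for the illustrative choice Xn = X + Z/n and watch it vanish, which by (d) forces convergence in probability.

```python
import numpy as np

rng = np.random.default_rng(5)
phi = lambda t: t / (1.0 + t)

reps = 200_000
x = rng.standard_normal(reps)          # X
z = rng.standard_normal(reps)          # perturbation

for n in (1, 10, 100):
    xn = x + z / n                      # X_n = X + Z/n converges to X in probability
    d = np.mean(phi(np.abs(xn - x)))    # Monte Carlo estimate of d(X_n, X)
    print(n, d)                         # decreases toward 0
```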

2.7 Exercise 2.11


Let X, X1, X2, ... ∈ R be random variables and let p > 0. Assume that

Σ_(n=1)^∞ E|Xn − X|^p < ∞.

We wish to show that Xn → X a.s. We want to use Theorem 2.25, so let ε > 0 and notice that since t ↦ t^p is increasing on [0, ∞) for p > 0, and we apply it to non-negative terms, we get

Σ_(n=1)^∞ P(|Xn − X| ≥ ε) = Σ_(n=1)^∞ P(|Xn − X|^p ≥ ε^p)
≤ Σ_(n=1)^∞ E|Xn − X|^p/ε^p    (Markov)
= (1/ε^p) Σ_(n=1)^∞ E|Xn − X|^p < ∞,

and now Theorem 2.25 implies that Xn → X a.s., which is what we wanted.

An alternative solution notices that Sn = Σ_(k=1)^n |Xk − X|^p is an increasing, non-negative and measurable sequence, so Monotone Convergence implies

E Σ_(n=1)^∞ |Xn − X|^p = Σ_(n=1)^∞ E|Xn − X|^p < ∞
⇒ Σ_(n=1)^∞ |Xn − X|^p < ∞ a.s.
⇒ |Xn − X|^p → 0 a.s.
⇒ |Xn − X| → 0 a.s.,

which is what we wanted.


3 Week 3

3.1 Exercise 2.12


Let X1, X2, ... ∈ R be random variables. Assume there exists a c > 0 such that

Σ_(n=1)^∞ P(Xn > c) < ∞.    (1)

We wish to show that sup_n Xn is almost surely finite.

By (1) Borel-Cantelli implies that

P(Xn > c i.o.) = 0 ⇒ P(Xn ≤ c evt.) = P((Xn > c i.o.)^C) = 1.

And by definition

(Xn ≤ c evt.) = ∪_(n=1)^∞ ∩_(k=n)^∞ (Xk ≤ c),

so if we write this using quantifiers, we get

ω ∈ (Xn ≤ c evt.) ⇔ ∃N ∈ N ∀k ≥ N: Xk(ω) ≤ c
⇒ sup_n Xn(ω) ≤ max{X1(ω), ..., X_N(ω), c} < ∞,

because from X_N onward, all members of the sequence are dominated by c. Furthermore, either c is the largest, in which case the entire sequence is dominated by c, or one of the first N elements of the sequence is the largest, call it Xj(ω), in which case sup_n Xn(ω) = Xj(ω). Either way, the supremum is finite. Thus we get

(Xn ≤ c evt.) ⊆ (sup_n Xn < ∞)
⇒ 1 = P(Xn ≤ c evt.) ≤ P(sup_n Xn < ∞)    (monotonicity of measures)
⇒ P(sup_n Xn < ∞) = 1,

which is what we wanted.


3.2 Exercise 2.13


Let X, X1, X2, ... ∈ R be real-valued random variables, and let Y ≥ 0 be a non-negative variable with EY < ∞. Assume that

P(|Xn| ≤ Y) = 1 for all n ∈ N.

We wish to show that

Xn →P X ⇒ EXn → EX.

We want to use double-thinning on R, so let (EX_(n_k))_(k∈N) be a sub-sequence of (EXn)_(n∈N), with associated sub-sequence (X_(n_k))_(k∈N) of random variables. Since Xn →P X we have X_(n_k) →P X as well, and by Theorem 2.26 there exists a sub-sub-sequence (X_(n_(k_m)))_(m∈N) such that X_(n_(k_m)) → X a.s. for m → ∞. We especially have P(|X_(n_(k_m))| ≤ Y) = 1, so by ordinary Dominated Convergence it follows that EX_(n_(k_m)) → EX in R. It now follows from the double-thinning principle that EXn → EX, which is what we wanted.

3.3 Exercise 2.14


We wish to show that the double-thinning principle does not hold for almost sure convergence. We start by showing that every sequence that converges in probability but not almost surely is a counter-example. Let (Xn)_(n∈N) be such that Xn →P X but Xn does not converge to X almost surely, for some random variable X. Let (X_(n_k))_(k∈N) be some sub-sequence. Since Xn →P X, it follows that X_(n_k) →P X as well. By Theorem 2.26 there exists a sub-sub-sequence (X_(n_(k_m)))_(m∈N) such that X_(n_(k_m)) → X a.s. for m → ∞. We have just shown that given any sub-sequence, we can find a sub-sub-sequence converging almost surely. If the double-thinning principle were true for almost sure convergence, it would now follow that Xn → X a.s. But that contradicts our assumption. Thus, the double-thinning principle does not hold for almost sure convergence.

A specific counter-example is given in Example 2.15 in [4].

3.4 Exercise 2.15


Let X, X1, X2, ... be real-valued random variables. Let F, F1, F2, ... be the corresponding distribution functions. Assume that F is continuous and let x ∈ R.

2.15(a). We wish to show that

Xn → X a.s. ⇒ Fn(x) → F(x).

Lemma. If an → a in R, then 1_(−∞,x](an) → 1_(−∞,x](a) for all x ∈ R \ {a}.

Proof. The function 1_(−∞,x] is continuous on (−∞, x) and on (x, ∞). If a < x then an ∈ (−∞, x) eventually, so 1_(−∞,x](an) → 1 = 1_(−∞,x](a), and similarly if a > x.

Also notice that 1_(Y≤x) = 1_(−∞,x](Y). Now let ω ∈ (X ≠ x) ∩ (Xn → X). By the Lemma we have 1_(Xn(ω)≤x) → 1_(X(ω)≤x), and therefore (X ≠ x) ∩ (Xn → X) ⊆ (1_(Xn≤x) → 1_(X≤x)). Both (X ≠ x) and (Xn → X) have full probability, the former because F is continuous, so P(X = x) = 0, and hence

1 = P((X ≠ x) ∩ (Xn → X)) ≤ P(1_(Xn≤x) → 1_(X≤x))

by monotonicity of measures. Thus 1_(Xn≤x) → 1_(X≤x) a.s. The indicators are dominated by the constant 1, which is integrable since P(Ω) = 1, so by Dominated Convergence we have

Fn(x) = ∫ 1_(−∞,x](Xn) dP → ∫ 1_(−∞,x](X) dP = F(x),

which is what we wanted.

2.15(b). We wish to show that

Xn →P X ⇒ Fn(x) → F(x).

We want to use the double-thinning principle, so let (F_(n_k)(x))_(k∈N) be a sub-sequence. The associated sub-sequence (X_(n_k))_(k∈N) converges in probability to X because the parent sequence does, so by Theorem 2.26 there exists a sub-sub-sequence (X_(n_(k_m)))_(m∈N) such that X_(n_(k_m)) → X a.s. It now follows from Exercise 2.15(a) that F_(n_(k_m))(x) → F(x) in R, and we have thereby found a convergent sub-sub-sequence. Since R is a metric space, it follows from the double-thinning principle that Fn(x) → F(x), which is what we wanted.

3.5 Exercise 2.16


Let X, X1, X2, ... be real-valued random variables. Assume that Xn →L2 X. We wish to show that if every Xn has a Gaussian distribution, say Xn ∼ N(ξn, σn²), then X has a Gaussian distribution or a degenerate distribution.

Since P is a finite probability measure, convergence in L2 implies convergence in L1. Thus EX and EX² exist and are finite, and by Theorem 2.20 we have ξn = EXn → EX =: ξ and σn² = V Xn = EXn² − (EXn)² → EX² − (EX)² = V X =: σ². Now consider

Fn(x) = ∫_(−∞)^x 1/√(2πσn²) e^(−(t−ξn)²/(2σn²)) dt
= ∫_(−∞)^((x−ξn)/σn) 1/√(2π) e^(−s²/2) ds = Φ((x − ξn)/σn),    (substitution s = (t − ξn)/σn)

where Φ is the standard normal distribution function. Since Φ is continuous, we have Fn(x) = Φ((x − ξn)/σn) → Φ((x − ξ)/σ) when σ ≠ 0, that is, the limit of the distribution functions is itself Gaussian. We now just need to establish that this Gaussian is actually the distribution function F of X. Since Xn →L2 X implies Xn →P X, Exercise 2.15(a) gives Fn(x) → F(x) for all x ∈ R with P(X = x) = 0. By uniqueness of limits we have F(x) = Φ((x − ξ)/σ) for all such x, hence for all x by right-continuity, so X ∼ N(ξ, σ²). Finally, if σ = 0 then V X = σ² = 0, and so X has a degenerate distribution, which is what we wanted.

3.6 Exercise 2.17

Let X1, X2, ... be a sequence of real-valued random variables. We wish to show that there exists a sequence c1, c2, ... of real numbers such that

cn Xn → 0 a.s.

Since each Xn is real-valued, we have

P(∩_(k∈N) (|Xn| > k)) = P(|Xn| = ∞) = 0,

and because (|Xn| > 1) ⊇ (|Xn| > 2) ⊇ ... it follows from downward continuity of measures that

lim_(k→∞) P(|Xn| > k) = P(∩_(k∈N) (|Xn| > k)) = 0.

Thus for every n ∈ N there exists Kn ∈ N such that

P(|Xn| > Kn) ≤ 2^(−n).



Define cn = 1/(2^n Kn), let ε > 0 and consider

Σ_(n=1)^∞ P(|cn Xn − 0| > ε) = Σ_(n=1)^∞ P(|Xn| > ε/cn)
= Σ_(n=1)^∞ P(|Xn| > ε 2^n Kn)
= Σ_(n=1)^N P(|Xn| > ε 2^n Kn) + Σ_(n=N+1)^∞ P(|Xn| > ε 2^n Kn)
= S + Σ_(n=N+1)^∞ P(|Xn| > ε 2^n Kn).

By choosing N such that 2^n ε ≥ 1 for all n > N, we get

≤ S + Σ_(n=N+1)^∞ P(|Xn| > Kn) ≤ S + Σ_(n=N+1)^∞ 2^(−n) < ∞,

where S is a finite sum of probabilities and hence finite. It now follows from the corollary to Borel-Cantelli that cn Xn → 0 a.s., which is what we wanted.

3.7 Exercise 2.18


Let X, X1, X2, ... be real-valued random variables. Let p > 1 and assume that

sup_(n∈N) E|Xn|^p < ∞

and that Xn →P X.

2.18(a). We wish to show that

E|X|^p ≤ sup_n E|Xn|^p < ∞.

By Lemma 2.27 we have |Xn|^p →P |X|^p, and so by Theorem 2.26 there exists a sub-sequence such that |X_(n_k)|^p → |X|^p a.s., and so the limit exists. We thus have

|X|^p = lim_(k→∞) |X_(n_k)|^p = lim inf_(k→∞) |X_(n_k)|^p almost surely,

and so we get

E|X|^p = E lim inf_(k→∞) |X_(n_k)|^p
≤ lim inf_(k→∞) E|X_(n_k)|^p    (Fatou)
≤ lim sup_(k→∞) E|X_(n_k)|^p
≤ sup_k E|X_(n_k)|^p
≤ sup_(n≥1) E|Xn|^p < ∞,

where the final inequality follows from the fact that we take the supremum over a larger set of numbers, which can only make the supremum larger. Thus E|X|^p < ∞, which is what we wanted.

2.18(b). We wish to show that for any r ∈ [1, p) and any ε > 0 it holds that

E(|Xn − X|^r 1_(|Xn−X|>ε)) ≤ 2^(r+1) (sup_n E|Xn|^p)^(r/p) P(|Xn − X| > ε)^((p−r)/p).

We have

E(|Xn − X|^r 1_(|Xn−X|>ε))
≤ (E|Xn − X|^p)^(r/p) (E(1_(|Xn−X|>ε))^(p/(p−r)))^((p−r)/p)    (Hölder with exponents p/r and p/(p−r))
= (E|Xn − X|^p)^(r/p) P(|Xn − X| > ε)^((p−r)/p)
≤ 2^(r+1) (sup_n E|Xn|^p)^(r/p) P(|Xn − X| > ε)^((p−r)/p),    (Claim)

where we used that a power of an indicator is the indicator itself. We now prove the Claim: (E|Xn − X|^p)^(r/p) ≤ 2^(r+1) (sup_n E|Xn|^p)^(r/p), after which we are done. We have by the triangle inequality that

|Xn − X|^p ≤ (|Xn| + |X|)^p ≤ (2 max{|Xn|, |X|})^p = 2^p max{|Xn|^p, |X|^p}.

We thereby get

E|Xn − X|^p ≤ 2^p E max{|Xn|^p, |X|^p} ≤ 2^p E(|Xn|^p + |X|^p) ≤ 2^(p+1) sup_n E|Xn|^p,

where the final inequality uses E|Xn|^p ≤ sup_n E|Xn|^p together with E|X|^p ≤ sup_n E|Xn|^p from Exercise 2.18(a). Raising to the power r/p and using 2^((p+1)r/p) = 2^(r + r/p) ≤ 2^(r+1), since r < p, proves the Claim, and so we are done.

2.18(c). We wish to show that Xn →Lr X for any r ∈ [1, p). Let ε > 0; then

E|Xn − X|^r = E(|Xn − X|^r 1_(|Xn−X|≤ε)) + E(|Xn − X|^r 1_(|Xn−X|>ε))
≤ ε^r + E(|Xn − X|^r 1_(|Xn−X|>ε))
≤ ε^r + 2^(r+1) (sup_n E|Xn|^p)^(r/p) P(|Xn − X| > ε)^((p−r)/p).    (Exercise 2.18(b))

Since lim sup respects weak inequalities, this implies

lim sup_(n→∞) E|Xn − X|^r ≤ ε^r + 2^(r+1) (sup_n E|Xn|^p)^(r/p) lim sup_(n→∞) P(|Xn − X| > ε)^((p−r)/p) = ε^r + 0 = ε^r,

because the second term is a constant times something that goes to 0, since Xn →P X by assumption. As ε > 0 was arbitrary, we have

0 ≤ lim inf_(n→∞) E|Xn − X|^r ≤ lim sup_(n→∞) E|Xn − X|^r ≤ 0
⇒ lim_(n→∞) E|Xn − X|^r = 0,

which is the definition of Xn →Lr X, and what we wanted.
4 Week 4

4.1 Exercise 1.21 [3]


Let (Xn) be a sequence of random variables, and let J = ∩_(n=1)^∞ σ(Xn, Xn+1, ...) be the tail σ-algebra. Let B ∈ B. We wish to show that (Xn ∈ B i.o.) and (Xn ∈ B evt.) are in J.

Recalling the definition, we have (Xn ∈ B i.o.) = ∩_(n=1)^∞ ∪_(k=n)^∞ (Xk ∈ B) and

∪_(k=1)^∞ (Xk ∈ B) ⊇ ∪_(k=2)^∞ (Xk ∈ B) ⊇ ...,

because we are taking unions over fewer and fewer sets. Thus it doesn't matter from which point we start intersecting, i.e. for any N ∈ N we have

∩_(n=1)^∞ ∪_(k=n)^∞ (Xk ∈ B) = ∩_(n=N)^∞ ∪_(k=n)^∞ (Xk ∈ B).    (1)

Since Xk is a random variable it is measurable, and so (Xk ∈ B) ∈ σ(Xk) ⊆ σ(XN, XN+1, ...) for any k ≥ N, where the inclusion follows from the fact that σ(Xk) is the smallest σ-algebra making Xk measurable, and σ(XN, XN+1, ...) is the smallest σ-algebra making XN, XN+1, ... measurable. It then follows from the properties of σ-algebras that ∩_(n=N)^∞ ∪_(k=n)^∞ (Xk ∈ B) ∈ σ(XN, XN+1, ...), and thus by (1), (Xn ∈ B i.o.) ∈ σ(XN, XN+1, ...) for every N ∈ N, which is equivalent to (Xn ∈ B i.o.) ∈ J, which is what we wanted.

A similar argument shows (Xn ∈ B evt.) ∈ J, where we use that

∩_(k=1)^∞ (Xk ∈ B) ⊆ ∩_(k=2)^∞ (Xk ∈ B) ⊆ ...,

because we are intersecting over fewer and fewer sets.

4.2 Exercise 1.23


Let (Xn) be a sequence of independent random variables with Xn ∈ {0, 1} and P(Xn = 1) = pn. We wish to show that (1) Xn →P 0 if and only if lim_(n→∞) pn = 0, and that (2) Xn → 0 a.s. if and only if Σ_(n=1)^∞ pn is finite.


Ad (1), we have by definition that

Xn →P 0 ⇔ ∀ε > 0: P(|Xn − 0| > ε) = P(|Xn| > ε) → 0.

Since Xn ∈ {0, 1}, it holds for ε ∈ (0, 1) that |Xn| > ε if and only if Xn = 1, while P(|Xn| > ε) = 0 for ε ≥ 1, so we get

Xn →P 0 ⇔ P(Xn = 1) = pn → 0,

which is what we wanted.

Ad (2), we have by definition that Xn → 0 a.s. if and only if P(Xn → 0) = 1. Let ω ∈ (Xn → 0). Since Xn ∈ {0, 1} it follows that ω ∈ (Xn = 0 evt.), so (Xn → 0) ⊆ (Xn = 0 evt.), and thus by monotonicity of measures P(Xn = 0 evt.) = 1. Taking complements, we get

0 = P((Xn = 0 evt.)^C) = P(Xn ≠ 0 i.o.) = P(Xn = 1 i.o.),

where the last equality follows from Xn ∈ {0, 1}. Since the Xn's are independent, it follows from Lemma 1.3.12 (Second Borel-Cantelli) [3] that

Σ_(n=1)^∞ pn = Σ_(n=1)^∞ P(Xn = 1) < ∞.

Conversely, if Σ pn < ∞, the first Borel-Cantelli lemma gives P(Xn = 1 i.o.) = 0, so Xn = 0 eventually almost surely and hence Xn → 0 a.s., which is what we wanted.
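A hedged illustration (our own addition): with pn = 1/n the Xn converge to 0 in probability but, since Σ 1/n = ∞, the second Borel-Cantelli lemma forces Xn = 1 infinitely often; with pn = 1/n² the index of the last 1 is finite.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000
n = np.arange(1, N + 1)

for label, p in (("p_n = 1/n", 1.0 / n), ("p_n = 1/n^2", 1.0 / n**2)):
    x = (rng.uniform(size=N) < p).astype(int)   # X_n ~ Bernoulli(p_n), independent
    # Index of the last n with X_n = 1; stays bounded iff X_n = 0 eventually.
    last_one = n[x == 1].max() if x.any() else 0
    print(label, "ones:", x.sum(), "last one at n =", last_one)
```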

4.3 Exercise 1.24


Let (Xn) be a sequence of non-negative random variables. We wish to show that if Σ_(n=1)^∞ EXn < ∞, then Σ_(k=1)^n Xk is almost surely convergent.

See Exercise 2.7.

4.4 Exercise 1.26


Here we present an example of a sequence (Xn) of independent random variables with first moment such that Σ_(k=1)^∞ Xk converges almost surely while Σ_(k=1)^∞ EXk diverges.

Let (Xn) be a sequence of independent variables with distribution P(Xk = k) = 1/k² and P(Xk = 0) = 1 − 1/k². Such a sequence exists according to a theorem due to Kolmogorov. We first notice that Xk ∈ {0, k} almost surely, so Xk is bounded and thus all moments exist for all k ∈ N. We have

Σ_(k=1)^∞ EXk = Σ_(k=1)^∞ k · (1/k²) = Σ_(k=1)^∞ 1/k = ∞,

and we have

Σ_(n=1)^∞ P(Xn = n) = Σ_(n=1)^∞ 1/n² = π²/6 < ∞,

so by Borel-Cantelli it follows that P(Xn = n i.o.) = 0 which, by taking complements, implies P(Xn ≠ n evt.) = P(Xn = 0 evt.) = 1, because Xn ∈ {0, n} almost surely. Let ω ∈ (Xn = 0 evt.); then by definition there exists N ∈ N such that Xn(ω) = 0 for all n ≥ N, and we get

Σ_(n=1)^∞ Xn(ω) = Σ_(n=1)^N Xn(ω) + Σ_(n=N+1)^∞ Xn(ω) = Σ_(n=1)^N Xn(ω) + 0 < ∞,

because the first term is a finite sum. Thus ω ∈ (Σ_(n=1)^∞ Xn < ∞), so (Xn = 0 evt.) ⊆ (Σ_(n=1)^∞ Xn < ∞), and by monotonicity of measures we have P(Σ_(n=1)^∞ Xn < ∞) = 1, i.e. the sum converges almost surely, which is what we wanted.
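A quick simulation of this example (our own addition): the random series settles on a finite value while the sum of expectations grows like log N.

```python
import numpy as np

rng = np.random.default_rng(7)
N, reps = 10_000, 20
k = np.arange(1, N + 1)

# X_k = k with probability 1/k^2, else 0; the partial sums almost surely settle.
x = np.where(rng.uniform(size=(reps, N)) < 1.0 / k**2, k, 0)
totals = x.sum(axis=1)
print(totals)                  # finite, typically small, varying per path
print((1.0 / k).sum())         # the diverging sum of expectations (~ log N)
```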

4.5 Exercise 1.29


Let (Xn) be a sequence of iid random variables. Assume that there is a c such that (1/n) Σ_(k=1)^n Xk → c a.s. We wish to show that E|X1| < ∞ and that EX1 = c.

We first notice that

Xn/n = (1/n) Σ_(k=1)^n Xk − (1/n) Σ_(k=1)^(n−1) Xk = (1/n) Σ_(k=1)^n Xk − ((n−1)/n) · (1/(n−1)) Σ_(k=1)^(n−1) Xk → c − 1 · c = 0

almost surely by assumption, so Xn/n → 0 a.s., i.e. P(Xn/n → 0) = 1. Consider

(Xn/n → 0) = ∩_(m=1)^∞ ∪_(n=1)^∞ ∩_(k=n)^∞ (|Xk|/k < 1/m)
⊆ ∪_(n=1)^∞ ∩_(k=n)^∞ (|Xk|/k < 1) = (|Xk| < k evt.),

and thus by monotonicity of measures 1 = P(|Xk| < k evt.). Taking complements we get P(|Xk| ≥ k i.o.) = 0, and since we have independence, the second Borel-Cantelli lemma implies

∞ > Σ_(n=1)^∞ P(|Xn| ≥ n) = Σ_(n=1)^∞ P(|X1| ≥ n).    (iid)    (1)

According to the corollary to Lemma 1.8 of [2], (1) holds if and only if E|X1| < ∞. Finally we need to establish that EX1 = c. Since the variables are iid and we have just established E|X1| < ∞, it follows from the Strong Law of Large Numbers that (1/n) Σ_(k=1)^n Xk → EX1 a.s., and since we have assumed that (1/n) Σ_(k=1)^n Xk → c a.s. as well, it follows by uniqueness of almost sure limits that EX1 = c, which is what we wanted.

4.6 Exam January 2014


Let (Xn) be a sequence of independent real-valued random variables. The distribution of Xn is given by

P(Xn = −2^n) = P(Xn = 2^n) = 1/5^n, P(Xn = 0) = 1 − 2/5^n.

4.6.1 Question 1.1


We wish to compute EXn² and show that Xn →L2 0 for n → ∞.

Xn has discrete support, so we get

EXn² = (−2^n)² · (1/5^n) + 0 + (2^n)² · (1/5^n) = 2 (4/5)^n → 0.    (1)

And we have

Xn →L2 0 ⇔ ||Xn − 0||_2² = E|Xn|² = EXn² = 2 (4/5)^n → 0,

which is what we wanted.

4.6.2 Question 1.2


We wish to show that Xn → 0 a.s., and would like to use Borel-Cantelli, so let ε > 0. We have

Σ_(n=1)^∞ P(|Xn − 0| > ε) ≤ Σ_(n=1)^∞ P(Xn = −2^n ∪ Xn = 2^n)
= Σ_(n=1)^∞ 2/5^n = 2 · (1/5)/(1 − 1/5) = 1/2 < ∞,    (disjoint events, geometric series)

and it now follows from Borel-Cantelli that Xn → 0 a.s., which is what we wanted.

4.6.3 Question 1.3


Define the sequence (Sn) by Sn = Σ_(k=1)^n Xk. We wish to show that Sn converges almost surely. We wish to use Khintchine-Kolmogorov, so we note that EXn² < ∞ by Question 1.1, that EXn = 0 for all n ∈ N by symmetry of the distribution, and, finally, that

Σ_(n=1)^∞ EXn² = Σ_(n=1)^∞ 2 (4/5)^n = 2 · (4/5)/(1 − 4/5) = 8 < ∞.

Since the random variables are assumed to be independent, it follows from Khintchine-Kolmogorov that Sn converges almost surely and in L2 to some variable S = Σ_(n=1)^∞ Xn, which is what we wanted.

4.6.4 Question 1.4


We wish to show that S has finite variance and compute V S. Since Sn →L2 S, the limit S has second moment and hence finite variance. To compute the variance, note that it follows from Theorem 2.20 in [2] that ESn → ES and ESn² → ES², and so especially V Sn = ESn² − (ESn)² → ES² − (ES)² = V S. We have

ESn = E Σ_(k=1)^n Xk = Σ_(k=1)^n EXk = 0 → 0 = ES,

and we have

ESn² = E(Σ_(k=1)^n Xk)² = Σ_(k=1)^n EXk² → 8 = ES²,

where the cross terms E(Xj Xk) = EXj EXk = 0 for j ≠ k vanish by independence. So V S = 8 − 0 = 8, which is what we wanted. Alternatively, using independence, one can conclude that

V Sn = V(Σ_(k=1)^n Xk)
= Σ_(k=1)^n V Xk    (independence)
= Σ_(k=1)^n (EXk² − (EXk)²)
= Σ_(k=1)^n EXk² → 8 = V S,

which is the same.
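A simulation sketch of this exam problem (our own addition): the series is dominated by its first terms, so truncating at n = 30 is numerically exact here.

```python
import numpy as np

rng = np.random.default_rng(8)
N, reps = 30, 200_000
n = np.arange(1, N + 1)

u = rng.uniform(size=(reps, N))
x = np.where(u < 1 / 5**n, -2.0**n,
     np.where(u < 2 / 5**n, 2.0**n, 0.0))   # P(X_n = -2^n) = P(X_n = 2^n) = 1/5^n
s = x.sum(axis=1)                            # truncated version of S = sum X_n
print(s.mean(), s.var())                     # ~ 0 and ~ 8
```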



4.7 Exam January 2013


Let (Xn) be a sequence of independent variables such that Xn ∈ {−n − 1, 0, n + 1} with

P(Xn = −n − 1) = P(Xn = n + 1) = 1/(2(n + 1) log(n + 1)), and
P(Xn = 0) = 1 − 1/((n + 1) log(n + 1)).

4.7.1 Problem 2.1


We wish to show that Xn is integrable and that EXn = 0 for all n ≥ 1. Xn is measurable because it is a random variable, and it is concentrated on {−n − 1, 0, n + 1}, so each Xn is bounded and thus has moments of all orders; in particular it is integrable. EXn = 0 by symmetry of the distribution.

4.7.2 Problem 2.2


We wish to show that Xn has second moment and that V Xn = (n + 1)/log(n + 1). Xn has moments of all orders by the argument in Problem 2.1, so it especially has second moment. Using the fact that Xn has discrete support, we compute

V Xn = EXn² − (EXn)² = EXn²
= (−n − 1)² P(Xn = −n − 1) + 0 + (n + 1)² P(Xn = n + 1)
= 2(n + 1)²/(2(n + 1) log(n + 1)) = (n + 1)/log(n + 1),

which is what we wanted.

4.7.3 Problem 2.3


We wish to show that (1/n²) Σ_(k=1)^n V Xk → 0 for n → ∞. Notice that

V Xk/k = (k + 1)/(k log(k + 1)) → 0,

and so by Lemma 5.1 in [2] (Cesàro convergence) it follows that

(1/n) Σ_(k=1)^n V Xk/k → 0.

But since 1/n ≤ 1/k for every k ≤ n,

(1/n²) Σ_(k=1)^n V Xk = (1/n) Σ_(k=1)^n V Xk/n ≤ (1/n) Σ_(k=1)^n V Xk/k → 0,

and so (1/n²) Σ_(k=1)^n V Xk → 0, which is what we wanted.

4.7.4 Problem 2.4


We wish to show that (1/n) Σ_(k=1)^n Xk →P 0. We start by showing convergence in L1, so consider

||(1/n) Sn||_1 = ||(1/n) Σ_(k=1)^n Xk||_1 ≤ (1/n) Σ_(k=1)^n ||Xk||_1
= (1/n) Σ_(k=1)^n E|Xk|
= (1/n) Σ_(k=1)^n 2(k + 1)/(2(k + 1) log(k + 1))
= (1/n) Σ_(k=1)^n 1/log(k + 1).

Notice that 1/log(n + 1) → 0 for n → ∞, so it follows from Lemma 5.1 in [2] that (1/n) Σ_(k=1)^n 1/log(k + 1) → 0 for n → ∞. Thus (1/n) Sn →L1 0, and it now follows from Lemma 2.21 in [2] that (1/n) Sn →P 0, which is what we wanted.

Alternatively, since the random variables are independent, we have

V((1/n) Σ_(k=1)^n Xk) = (1/n²) Σ_(k=1)^n V Xk → 0 for n → ∞,

where the limit is shown in Problem 2.3. We now use Chebyshev's inequality to obtain

P(|(1/n) Σ_(k=1)^n Xk − E((1/n) Σ_(k=1)^n Xk)| > ε) = P(|(1/n) Σ_(k=1)^n Xk − 0| > ε)
= P(|(1/n) Σ_(k=1)^n Xk| > ε)
≤ V((1/n) Σ_(k=1)^n Xk)/ε² → 0,    (Chebyshev)

so P(|(1/n) Sn| > ε) → 0, which is equivalent to (1/n) Sn →P 0, which is what we wanted.

4.7.5 Problem 2.5


We wish to show that P(|Xn| ≥ n + 1 i.o.) = 1. We would like to use the second Borel-Cantelli lemma, so consider

Σ_(n=1)^∞ P(|Xn| ≥ n + 1) = Σ_(n=1)^∞ P(Xn = n + 1 ∪ Xn = −n − 1)
= Σ_(n=1)^∞ 1/((n + 1) log(n + 1))    (disjoint events)
= Σ_(m=2)^∞ 1/(m log m) = ∞,

where we are allowed to assume that the last sum diverges (it does, by the integral test). Since the random variables are independent, it now follows from the Second Borel-Cantelli lemma that P(|Xn| ≥ n + 1 i.o.) = 1, which is what we wanted.

4.7.6 Problem 2.6


We wish to show that

|(1/(n + 1)) Σ_(k=1)^(n+1) Xk| ≥ |X_(n+1)|/(n + 1) − |(1/n) Σ_(k=1)^n Xk|    (1)

and then use (1) to show that (1/n) Σ_(k=1)^n Xk does not converge almost surely to 0. Consider

|X_(n+1)|/(n + 1) = |(1/(n + 1)) Σ_(k=1)^(n+1) Xk − (1/(n + 1)) Σ_(k=1)^n Xk|
≤ |(1/(n + 1)) Σ_(k=1)^(n+1) Xk| + |(1/(n + 1)) Σ_(k=1)^n Xk|    (triangle inequality)
≤ |(1/(n + 1)) Σ_(k=1)^(n+1) Xk| + |(1/n) Σ_(k=1)^n Xk|;    (2)

re-arranging, we get the desired inequality. We have shown in Problem 2.5 that P(|Xn| ≥ n + 1 i.o.) = 1, so let ω ∈ (|Xn| ≥ n + 1 i.o.) and assume for contradiction that (1/n) Σ_(k=1)^n Xk(ω) → 0. Then there exists N ∈ N such that |(1/n) Σ_(k=1)^n Xk(ω)| < 1/2 for all n ≥ N. Since |Xn(ω)| ≥ n + 1 infinitely often, we can pick ñ ≥ N with |X_(ñ+1)(ω)| ≥ ñ + 2 > ñ + 1. By (2) we then get

1 < |X_(ñ+1)(ω)|/(ñ + 1) ≤ |(1/(ñ + 1)) Σ_(k=1)^(ñ+1) Xk(ω)| + |(1/ñ) Σ_(k=1)^ñ Xk(ω)| < 1/2 + 1/2 = 1,

which is a contradiction. Thus (|Xn| ≥ n + 1 i.o.) ⊆ ((1/n) Σ_(k=1)^n Xk does not converge to 0), and so by monotonicity of measures the latter event has probability 1; therefore (1/n) Σ_(k=1)^n Xk almost surely does not converge to 0, which is what we wanted.
5 Week 5

5.1 Exercise 2.1 [3]


Consider the probability space ([0, 1), B_[0,1), P) where P = m is the Lebesgue measure. Define T(x) = 2x − [2x] and S(x) = x + λ − [x + λ] for λ ∈ R. Here [·]: R → Z is the floor function, i.e. the unique function satisfying [x] ≤ x < [x] + 1 for all x ∈ R. We wish to show that T and S are measure-preserving.

By plotting T(x), we notice that we can write it thusly:

T(x) = 2x for x ∈ [0, 1/2), and T(x) = 2x − 1 for x ∈ [1/2, 1).    (1)

T is piece-wise continuous and hence measurable by the Tuborg lemma, so according to Definition 2.1.1 in [3] we need to show that T(P)(A) = P(T^(−1)(A)) = P(A) = m(A) for all A ∈ B_[0,1). By Lemma 2.2.1 it suffices to show this for all elements of a ∩-stable generator for B_[0,1), and since {[0, α) | α ∈ (0, 1)} is such a generator, it suffices to show that

m(T^(−1)([0, α))) = m([0, α))

for all α ∈ (0, 1). Consider

T^(−1)([0, α)) = {x ∈ [0, 1) | T(x) = 2x − [2x] ∈ [0, α)}.

Using (1) we have for x ∈ [0, 1/2) that

T(x) = 2x < α ⇔ x < α/2, so T^(−1)([0, α)) ∩ [0, 1/2) = [0, α/2),    (2)

and similarly for x ∈ [1/2, 1) that

T(x) = 2x − 1 < α ⇔ x < α/2 + 1/2, so T^(−1)([0, α)) ∩ [1/2, 1) = [1/2, α/2 + 1/2).    (3)

The sets in (2) and (3) are clearly disjoint, so we get

T^(−1)([0, α)) = [0, α/2) ∪ [1/2, α/2 + 1/2)
⇒ m(T^(−1)([0, α))) = m([0, α/2)) + m([1/2, α/2 + 1/2)) = α/2 + α/2 = α = m([0, α)),

thus T is measure-preserving, which is what we wanted.

Since [y + n] = [y] + n for n ∈ Z we have

S_λ(x) = x + λ − [x + λ] = x + (λ − [λ]) − [x + (λ − [λ])] = S_(λ−[λ])(x),

and λ − [λ] ∈ [0, 1), so it is enough to consider λ ∈ [0, 1). Plotting S(x) we notice that

S(x) = x + λ for x ∈ [0, 1 − λ), and S(x) = x + λ − 1 for x ∈ [1 − λ, 1).

By plotting, we also notice that the pre-image depends on whether or not α < λ. If α < λ we get

x + λ < α ⇔ x < α − λ < 0 and x + λ − 1 < α ⇔ x < α − λ + 1,

and the first condition never happens because x ∈ [0, 1). If α ≥ λ we get

x + λ < α ⇔ x < α − λ and x + λ − 1 < α ⇔ x < α − λ + 1, where α − λ + 1 ≥ 1.

Thus

S^(−1)([0, α)) = [1 − λ, 1 − λ + α) if α < λ, and S^(−1)([0, α)) = [0, α − λ) ∪ [1 − λ, 1) if α ≥ λ,

and in either case we have

m(S^(−1)([0, α))) = α = m([0, α)),

thus S is measure-preserving, which is what we wanted.
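A quick empirical check (our own addition, with an illustrative rotation parameter): pushing uniform samples through T and S should leave the histogram flat.

```python
import numpy as np

rng = np.random.default_rng(9)
lam = 0.3                              # illustrative choice of rotation parameter
x = rng.uniform(size=500_000)

t = (2 * x) % 1.0                      # T(x) = 2x - [2x], the doubling map
s = (x + lam) % 1.0                    # S(x) = x + lam - [x + lam], the rotation

for y in (t, s):
    hist, _ = np.histogram(y, bins=10, range=(0, 1))
    print(hist / len(y))               # each bin ~ 0.1: the image is still uniform
```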



5.2 Exercise 2.4

Consider the probability space ([0, 1), B_[0,1), P) where P = m. Define T: [0, 1) → [0, 1) by T(x) = x + λ − [x + λ]. T is then measure-preserving by Exercise 2.1. We wish to show that if λ ∈ Q, i.e. λ = n/m with n ∈ Z and m ∈ N, then T is not ergodic.

According to Definition 2.1.3 in [3], T is ergodic if every element F in the T-invariant σ-algebra I_T = {F ∈ B_[0,1) | T^(−1)(F) = F} obeys P(F) = m(F) ∈ {0, 1}. Thus to show that T is not ergodic, we need to produce an element A ∈ I_T such that m(A) ∈ (0, 1).

Claim: T^k(x) = x + kn/m − l for some l ∈ Z. This follows by induction.

Thus it holds especially for k = m, whereby T^m(x) = x + n − l. Since T^m(x) ∈ [0, 1) and x ∈ [0, 1), we have n − l = T^m(x) − x ∈ (−1, 1). But n − l ∈ Z since both are whole numbers, and so n − l = 0, whereby T^m(x) = x, i.e. T^m = Id = T^0. We now wish to use this fact to construct elements of I_T.

Let α < 1; we then define

F_α = ∪_(i=0)^(m−1) T^(−i)([0, α)),

and so it holds that

T^(−1)(F_α) = T^(−1)(∪_(i=0)^(m−1) T^(−i)([0, α)))
= ∪_(i=0)^(m−1) T^(−1)(T^(−i)([0, α)))
= ∪_(i=1)^m T^(−i)([0, α))
= ∪_(i=1)^(m−1) T^(−i)([0, α)) ∪ T^(−m)([0, α))
= ∪_(i=1)^(m−1) T^(−i)([0, α)) ∪ T^0([0, α)) = ∪_(i=0)^(m−1) T^(−i)([0, α)) = F_α,

whereby F_α ∈ I_T. Now choose α such that 0 < α < 1/m. We then have

0 < α = m([0, α)) = m(T^0([0, α))) ≤ m(∪_(i=0)^(m−1) T^(−i)([0, α))) = m(F_α).

Furthermore we have

m(F_α) = m(∪_(i=0)^(m−1) T^(−i)([0, α))) ≤ Σ_(i=0)^(m−1) m(T^(−i)([0, α)))
= Σ_(i=0)^(m−1) m([0, α)) = m · α < 1,    (T is measure-preserving)

where the final inequality follows from α < 1/m by construction. Thus F_α ∈ I_T and m(F_α) ∈ (0, 1), so T is not ergodic, which is what we wanted.

5.3 Exercise 2.10


Consider the probability space ([0, 1), B_[0,1), P) where P = m. Define T: [0, 1) → [0, 1) by T(x) = 2x − [2x]. T is then measure-preserving by Exercise 2.1. We wish to show that T is mixing.

Since the intervals [0, α) form a ∩-stable generator for B_[0,1), it follows from Lemma 2.2.6 that it is enough to show that

lim_(n→∞) m(T^(−n)([0, α)) ∩ [0, β)) = m([0, α)) m([0, β)) = αβ

for α, β ∈ [0, 1). It is a good idea to plot T^n for a couple of values of n. We also notice that T^n(x) = 2^n x − [2^n x] by induction, and we get this idea by looking at the plot, because the number of line segments doubles with each application of T, each segment having the same slope. We thus get that

T^n(x) = 2^n x − i for x ∈ [i/2^n, (i + 1)/2^n), i = 0, 1, ..., 2^n − 1.

Preparing to study T^(−n)([0, α)), we notice that on the i'th branch

2^n x − i < α ⇔ x < (i + α)/2^n,

and thus we get

T^(−n)([0, α)) = ∪_(i=0)^(2^n−1) [i/2^n, (i + α)/2^n).

Thus

T^(−n)([0, α)) ∩ [0, β) = ∪_(i=0)^(p−1) [i/2^n, (i + α)/2^n) ∪ ([p/2^n, (p + α)/2^n) ∩ [0, β)),

where p ∈ N0 is the unique number satisfying β ∈ [p/2^n, (p + 1)/2^n). Observe that

β ∈ [p/2^n, (p + 1)/2^n) ⇔ p ≤ β 2^n < p + 1,

i.e. p = [2^n β]. Notice that

∪_(i=0)^([2^n β]−1) [i/2^n, (i + α)/2^n) ⊆ T^(−n)([0, α)) ∩ [0, β) ⊆ ∪_(i=0)^([2^n β]) [i/2^n, (i + α)/2^n),

and so by monotonicity of measures we have

m(∪_(i=0)^([2^n β]−1) [i/2^n, (i + α)/2^n)) ≤ m(T^(−n)([0, α)) ∩ [0, β)) ≤ m(∪_(i=0)^([2^n β]) [i/2^n, (i + α)/2^n)).

The unions are disjoint, so

m(∪_(i=0)^([2^n β]−1) [i/2^n, (i + α)/2^n)) = [2^n β] · α/2^n ≥ (2^n β − 1) α/2^n = αβ − α/2^n → αβ,

and similarly

m(∪_(i=0)^([2^n β]) [i/2^n, (i + α)/2^n)) = ([2^n β] + 1) α/2^n ≤ (2^n β + 1) α/2^n = αβ + α/2^n → αβ,

where we use the property [x] ≤ x < [x] + 1 to get both inequalities. Thus lim_(n→∞) m(T^(−n)([0, α)) ∩ [0, β)) = αβ, which is what we wanted.
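A Monte Carlo sanity check of the mixing property (our own addition, with illustrative α and β): estimate m(T^(−n)([0, α)) ∩ [0, β)) by sampling x uniformly and checking T^n(x) < α and x < β.

```python
import numpy as np

rng = np.random.default_rng(10)
alpha, beta = 0.3, 0.6
x = rng.uniform(size=1_000_000)

for n in (1, 2, 5, 10):
    tn = (2.0**n * x) % 1.0                       # T^n(x) = 2^n x mod 1
    est = np.mean((tn < alpha) & (x < beta))      # estimates m(T^-n[0,a) n [0,b))
    print(n, est, alpha * beta)                   # approaches alpha * beta = 0.18
```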

5.4 Example 7.11 [2]


A brief note about Example 7.11. Assume (Xn) is one or more of the following: measure-preserving, mixing, ergodic, stationary; call these properties (1)-(4). Let φ: R^∞ → R, i.e. φ takes a sequence and gives a real value. If we define Yk = φ(Xk, Xk+1, ...), i.e. Yk = φ(S^(k−1)((Xn)_(n≥1))) where S is the shift operator, then (Yk) inherits the properties (1)-(4) from (Xn). Notice that iid sequences are in particular measure-preserving, mixing, ergodic and stationary. So if we have an example where (Xn) is an iid sequence and Yk = Xk X_(k+1), then we can write Yk = φ(S^(k−1)((Xn)_(n≥1))) where φ: R^∞ → R is given by φ((xn)_(n≥1)) = x1 x2. It now follows from Example 7.11 that (Yk) has the properties (1)-(4); note that (Yk) is typically not iid itself, since consecutive Yk share a factor.

5.5 Exercise 2.14


We say that a process (Xn) is weakly stationary if Xn has second moment for all n ≥ 1, EXn = EXk for all k, n ≥ 1, and Cov(Xn, Xk) = γ(|n − k|) for some γ: N0 → R. Assume that (Xn) is a process such that Xn has second moment for all n ≥ 1. We wish to show that if (Xn) is stationary, then (Xn) is weakly stationary.

Assume that EXn² < ∞ and that (Xn) is stationary. We have already assumed second moments, so to show weak stationarity, we need to show the final two conditions. By Lemma 2.3.9 [3] it follows that for all k, n ≥ 1 we have (X1, ..., Xn) ∼ (X_(1+k), ..., X_(n+k)), and for 1 ≤ k ≤ n we especially have (Xk, ..., Xn) ∼ (X1, ..., X_(n−k+1)). Since the joint distributions determine the marginal distributions, we can pick out any coordinates as long as they sit in the same places, e.g. (Xk, Xn) ∼ (X1, X_(n−k+1)); in particular Xn ∼ X1, so EXn = EX1 for all n. Now define

γ(m) = Cov(X1, X_(m+1)),

which exists because we have assumed second moments. Since (Xk, Xn) ∼ (X1, X_(n−k+1)), it follows that

Cov(Xk, Xn) = Cov(X1, X_(n−k+1)) = γ(n − k) = γ(|n − k|) for k ≤ n,

and thus (Xn) is weakly stationary, which is what we wanted.

5.6 Exam Stok2 November ’16


Let ..., X_(−1), X_0, X_1, X_2, ... ∼ N(0, 1) be independent. We recall that

E|Xn| = √(2/π), E|Xn|³ = √(8/π), EXn⁴ = 3.

Define Zn = Xn X²_(n−1) for n ∈ Z.

5.6.1 Question 1
We wish to show that Zn has finite second moment, and compute EZn and V Zn .

Notice that Xn ∼ N(0, 1) implies EXn = 0 and V Xn = EXn² = 1. So we have

EZn² = E(Xn² X⁴_(n−1)) = EXn² · EX⁴_(n−1) = 1 · 3 = 3 < ∞,    (independence)

and we get

EZn = E(Xn X²_(n−1)) = EXn · EX²_(n−1) = 0 · 1 = 0,

whereby V Zn = EZn² − (EZn)² = 3, which is what we wanted.

5.6.2 Question 2
We wish to show that Cov(Zn , Zm ) = 0 for n 6= m, and establish that (Zn )n∈Z is
weakly stationary and find the auto-covariance function.

We check two cases. For k ≥ 2, we get

Cov(Zn, Zn+k) = E(Zn Zn+k) − EZn EZn+k
= E(Xn X²_(n−1) X_(n+k) X²_(n+k−1)) − 0    (Question 1)
= EXn · EX²_(n−1) · EX_(n+k) · EX²_(n+k−1)    (all four indices are distinct since k ≥ 2)
= 0.

For k = 1 we get

Cov(Zn, Zn+1) = E(Zn Zn+1) − EZn EZn+1
= E(Xn X²_(n−1) X_(n+1) Xn²) − 0
= E(Xn³ X²_(n−1) X_(n+1))
= EXn³ · EX²_(n−1) · EX_(n+1)    (independence)
= EXn³ · 1 · 0 = 0.

In Question 1 we established that EZn² < ∞ for all n ∈ Z, and EZn = 0 = EZk for all n, k ∈ Z, so in order to show weak stationarity we just need to find γ: N0 → R such that Cov(Zn, Zk) = γ(|n − k|), according to Exercise 2.14. We have just shown that

Cov(Zn, Zk) = 0 for n ≠ k, and Cov(Zn, Zn) = V Zn = V Z1 = 3,

so define γ(0) = 3 and γ(n) = 0 for n ≠ 0. We then have γ(|n − k|) = Cov(Zn, Zk), which according to Exercise 2.14 shows that (Zn) is weakly stationary, which is what we wanted.

5.6.3 Question 1.3


We wish to compute Cov(|Zn |, |Zm |) for n, m ∈ Z and use the result to determine
whether the Zn -variables are independent.

For k ≥ 2 we get

Cov(|Zn|, |Zn+k|) = E|Zn Zn+k| − E|Zn| E|Zn+k|
= E(|Xn| X²_(n−1) |X_(n+k)| X²_(n+k−1)) − E(|Xn| X²_(n−1)) E(|X_(n+k)| X²_(n+k−1))
= E|Xn| EX²_(n−1) E|X_(n+k)| EX²_(n+k−1) − E|Xn| EX²_(n−1) E|X_(n+k)| EX²_(n+k−1) = 0.    (independence)

For k = 1 we get

Cov(|Zn|, |Zn+1|) = E|Zn Zn+1| − E|Zn| E|Zn+1|
= E(|Xn|³ X²_(n−1) |X_(n+1)|) − E(|Xn| X²_(n−1)) E(|X_(n+1)| Xn²)
= E|Xn|³ EX²_(n−1) E|X_(n+1)| − E|Xn| EX²_(n−1) E|X_(n+1)| EXn²    (independence)
= √(8/π) √(2/π) − (√(2/π))² = 4/π − 2/π = 2/π ≠ 0.

It now follows that the Zn-variables are not independent. For if they were, then so were the |Zn|, because | · | is a measurable map, and independence would give Cov(|Zn|, |Zn+1|) = 0, which we have just shown not to be the case.

5.6.4 Question 1.4


Define Wn = 2Zn + Z_(n−1) for n ∈ Z. We wish to show that (1/n) Σ_(i=1)^n Wi converges almost surely and identify the limit.

Since the Wn are not independent, we cannot use the standard Strong Law of Large Numbers, but must instead use Example 7.11 in order to apply Theorem 7.12. Note that

Wn = 2Zn + Z_(n−1) = 2Xn X²_(n−1) + X_(n−1) X²_(n−2),

so Wn is a fixed measurable function of (X_(n−2), X_(n−1), Xn); in the notation of Example 7.11 we can write Wk = φ(S^k((Xn))) for a suitable φ: R^∞ → R reading off the relevant coordinates. Since the Xn's are iid, it follows from Corollary 2.3.14 [3] that (Xn) is stationary and ergodic, and it then follows from Example 7.11 [2] that (Wk) is stationary and ergodic. So in order to apply the ergodic SLLN, we just need to show that E|W1| < ∞. We compute

E|W1| = E|2Z1 + Z0| ≤ 2E|Z1| + E|Z0| < ∞,

and so it follows from Theorem 7.12 that

(1/n) Σ_(i=1)^n Wi → EW1 = 2EZ1 + EZ0 = 0 a.s.,

which is what we wanted.



5.6.5 Question 1.5


We wish to show that (1/n) Σ_(i=1)^n Wi² converges almost surely and identify the limit.

Define φ′: R^∞ → R by φ′((xn)_(n≥1)) = x1², and notice that we can write Wk² = φ′(S^k((Wn))). Question 1.4 showed that the Wn's are stationary and ergodic, so it follows from Example 7.11 that the Wn²'s are stationary and ergodic as well. So to apply Theorem 7.12 we just need to show that EW1² < ∞. We compute

EW1² = E(2Z1 + Z0)² = 4EZ1² + EZ0² + 4E(Z1 Z0)
= 4EZ1² + EZ1² + 0 = 5EZ1² = 5 · 3 = 15 < ∞,    (Z0 ∼ Z1 by Lemma 2.3.9, and E(Z1 Z0) = Cov(Z1, Z0) = 0 by Question 2)

It now follows from Theorem 7.12 that

(1/n) Σ_(i=1)^n Wi² → EW1² = 15 a.s.,

which is what we wanted.
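A simulation of these ergodic averages (our own addition):

```python
import numpy as np

rng = np.random.default_rng(11)
N = 1_000_000
x = rng.standard_normal(N + 2)

z = x[2:] * x[1:-1]**2                 # Z_n = X_n * X_{n-1}^2
w = 2 * z[1:] + z[:-1]                 # W_n = 2 Z_n + Z_{n-1}
print(w.mean())                        # ~ E W_1 = 0
print((w**2).mean())                   # ~ E W_1^2 = 15
```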


6 Week 6

6.1 Exercise 3.2 [3]


Let (µn) be a sequence of probability measures concentrated on N0, i.e. µn({x}) = 0 for all x ∉ N0. Let µ be another such probability measure. We wish to show that µn → µ weakly if and only if lim_(n→∞) µn({k}) = µ({k}) for all k ≥ 0.

"⇐": Assume that lim_(n→∞) µn({k}) = µ({k}) for all k ≥ 0 and define

gn(x) = µn({x}) for x ∈ N0, gn(x) = 0 otherwise,
g(x) = µ({x}) for x ∈ N0, g(x) = 0 otherwise.

Then µn = gn · τ and µ = g · τ, i.e. µn and µ have densities gn and g with respect to the counting measure τ on N0. To see this, note that for any B ∈ B(R) we have

µn(B) = µn(B ∩ N0)    (µn is concentrated on N0)
= Σ_(x∈B∩N0) µn({x})    (countable and disjoint)
= ∫_(B∩N0) gn dτ    (integration w.r.t. τ)
= ∫_B gn dτ,    (gn = 0 outside N0)

which shows that µn = gn · τ, and similarly µ = g · τ. We have by construction that gn(x) → g(x) for n → ∞: if x ∈ N0 then gn(x) = µn({x}) → µ({x}) = g(x) by assumption, and if x ∉ N0 then gn(x) = 0 = g(x). Since gn and g are densities of µn and µ, it follows by Scheffé's Lemma (Lemma 3.1.9 [3]) that µn → µ weakly, which is what we wanted.


"⇒": Assume that µn → µ weakly and notice that for any k ∈ N0 we have [k − 1/2, k + 1/2] ⊆ (k − 1, k + 1). So by Lemma 3.1.3 [3] there exists a bounded and uniformly continuous function f ∈ Cbu(R) such that 1_[k−1/2, k+1/2](x) ≤ f(x) ≤ 1_(k−1, k+1)(x) for all x ∈ R. Notice that f(i) = 0 for integers i ≠ k, since then i ∉ (k − 1, k + 1), and that f(k) = 1, since k ∈ [k − 1/2, k + 1/2]. Since µn is concentrated on N0, it now holds that

∫ f dµn = Σ_(i=0)^∞ f(i) µn({i}) = f(k) µn({k}) = µn({k}),

because k is the only non-negative integer where f is non-zero; similarly ∫ f dµ = µ({k}). We have assumed that µn → µ weakly, which by definition means that ∫ f dµn → ∫ f dµ for all f ∈ Cb(R), and since f ∈ Cbu(R) ⊆ Cb(R) we have that

µn({k}) = ∫ f dµn → ∫ f dµ = µ({k}),

which is what we wanted.

6.2 Exercise 3.3 [3]


Let µ_n denote the Student's t-distribution with shape parameter n and density
f_n(x) = Γ(n + 1/2)/(√(2nπ) Γ(n)) · (1 + x²/(2n))^{−(n + 1/2)}. We wish to show
that µ_n converges weakly to the standard normal distribution.

According to Scheffé's Lemma it is enough to show that, for every x ∈ R,
f_n(x) → φ(x) = (1/√(2π)) e^{−x²/2}, because φ is the density of the standard
normal distribution. So consider

f_n(x) = (1/√(2π)) · Γ(n + 1/2)/(√n Γ(n)) · (1 + (x²/2)/n)^{−n} · (1 + (x²/2)/n)^{−1/2}.   (1)

Notice that

(1 + (x²/2)/n)^{−1/2} → 1

and that

(1 + (x²/2)/n)^{−n} → e^{−x²/2},

because we generally have that (1 + x/n)^n → e^x for n → ∞. So far we therefore
have that

f_n(x) → (1/√(2π)) e^{−x²/2} · 1 · lim_{n→∞} Γ(n + 1/2)/(√n Γ(n)),

so all that is left to show is that Γ(n + 1/2)/(√n Γ(n)) → 1. By re-arranging
Legendre's Doubling Formula (Theorem 8.19 [2]) we get
Γ(n + 1/2) = Γ(2n)√π/(2^{2n−1} Γ(n)), and using this on the numerator gives

Γ(n + 1/2)/(√n Γ(n)) = Γ(2n)√π / (2^{2n−1} √n Γ(n)²)
= [Γ(2n) / (√(2π) (2n)^{2n−1/2} e^{−2n})] · [√π √(2π) (2n)^{2n−1/2} e^{−2n} / (2^{2n−1} √n Γ(n)²)].   (2)

According to Stirling's Formula,

Γ(2n) / (√(2π) (2n)^{2n−1/2} e^{−2n}) → 1 for n → ∞,

so we are left with considering

√π √(2π) (2n)^{2n−1/2} e^{−2n} / (2^{2n−1} √n Γ(n)²)
= √π √(2π) 2^{2n−1/2} n^{2n−1/2} e^{−2n} / (2^{2n−1} √n Γ(n)²)
= 2π n^{2n−1} e^{−2n} / Γ(n)²
= [√(2π) n^{n−1/2} e^{−n} / Γ(n)] · [√(2π) n^{n−1/2} e^{−n} / Γ(n)] → 1 · 1 for n → ∞,   (3)

using 2^{2n−1/2}/2^{2n−1} = √2, √π · √2 · √(2π) = 2π and n^{2n−1/2}/√n = n^{2n−1}.
The limit in (3) holds by Stirling's Formula, according to which

Γ(n) / (√(2π) n^{n−1/2} e^{−n}) → 1,

and since the two factors in (3) are the reciprocals of this expression, they
also converge to 1. Thus f_n(x) → φ(x) for every x ∈ R, and so by Scheffé
µ_n →^{wk} N(0, 1), which is what we wanted.
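The convergence of the densities can also be observed numerically. The density
f_n above is exactly the classical Student t density with 2n degrees of freedom
(the exponent is −(n + 1/2) = −(2n + 1)/2), so a short sketch can compare scipy's
t density at df = 2n with the standard normal density:

# Max pointwise gap between f_n and the standard normal density phi;
# it should shrink as n grows.
import numpy as np
from scipy.stats import norm, t

x = np.linspace(-4, 4, 801)
for n in (1, 10, 100, 1000):
    print(n, np.max(np.abs(t.pdf(x, df=2 * n) - norm.pdf(x))))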

6.3 Exercise 3.4


Let (p_n) be a sequence in (0, 1) and let µ_n be the binomial distribution with
n trials and success probability p_n, i.e. µ_n = bi(n, p_n). Assume that
lim_{n→∞} np_n = λ for some λ ≥ 0. We wish to show that if λ > 0 then µ_n
converges weakly to the Poisson distribution with parameter λ, and if λ = 0 then
µ_n converges weakly to the Dirac measure at zero.

“λ > 0”: Notice that µ_n = π_n · τ where

π_n(x) = µ_n({x}) for x ∈ {0, 1, ..., n} and π_n(x) = 0 otherwise,

by a similar argument to the one given in Exercise 3.2. For k ∈ N_0 and n > k we
have

π_n(k) = C(n, k) p_n^k (1 − p_n)^{n−k}
= n! / (k! (n − k)! n^k) · (np_n)^k (1 − np_n/n)^n (1 − p_n)^{−k}
= [n(n − 1) ··· (n − k + 1) / n^k] · (np_n)^k (1 − np_n/n)^n (1 − p_n)^{−k} / k!
→ 1 · λ^k · e^{−λ} · 1/k! = µ({k}),

where µ = Po(λ), and where we use that np_n → λ < ∞ implies p_n → 0. Thus the
density π_n of µ_n converges pointwise to the density of the Poisson
distribution. It then follows from Exercise 3.2 that µ_n converges weakly to the
Poisson distribution.

“λ = 0”: If k = 0, then

µ_n({0}) = C(n, 0) p_n^0 (1 − p_n)^n = (1 − p_n)^n = (1 − np_n/n)^n → e^0 = 1,

and if k ≥ 1 then

µ_n({k}) = C(n, k) p_n^k (1 − p_n)^{n−k} → λ^k e^{−λ} / k! = 0,

where the limit follows from the previous argument and is 0 because we have
assumed that λ = 0. But this is exactly δ_0({k}), so µ_n({k}) → δ_0({k}) for all
k ∈ N_0, and by Exercise 3.2 we get µ_n →^{wk} δ_0, which is what we wanted.
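A short numerical illustration of the pointwise pmf convergence, via the
criterion from Exercise 3.2 and the toy choice p_n = λ/n (assumed here so that
np_n → λ exactly):

# bi(n, lambda/n) pmf versus Po(lambda) pmf, pointwise in k.
import numpy as np
from scipy.stats import binom, poisson

lam = 2.5
k = np.arange(0, 15)
for n in (10, 100, 10000):
    print(n, np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam))))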

6.4 Exercise 3.7


Let (ξ_n) and (σ_n) be sequences in R, where σ_n > 0. Let µ_n = N(ξ_n, σ_n²)
denote the normal distribution with mean ξ_n and variance σ_n², that is
µ_n = φ_n · m where

φ_n(x) = (1/√(2πσ_n²)) e^{−(x − ξ_n)²/(2σ_n²)}.

We wish to show that µ_n converges weakly if and only if ξ_n and σ_n both
converge. In the affirmative case, we also want to determine which distribution
µ_n converges weakly to.

“⇐”: Assume that ξ_n → ξ and σ_n → σ with σ² > 0. Then by continuity of the
involved functions we get

φ_n(x) = (1/√(2πσ_n²)) e^{−(x − ξ_n)²/(2σ_n²)} → (1/√(2πσ²)) e^{−(x − ξ)²/(2σ²)} = φ(x),

where φ(x) is clearly the density corresponding to N(ξ, σ²). It now follows from
Scheffé's Lemma that µ_n →^{wk} µ where µ = N(ξ, σ²), which is what we wanted.

For the case where σ² = 0, let X_n ∼ µ_n and consider E(X_n − ξ_n) = 0 as well
as V(X_n − ξ_n) = V X_n = σ_n² → 0, where we use that the variance is unaffected
by subtracting the constant ξ_n. Let ǫ > 0 be given. Using Chebyshev's
inequality we have that

P(|X_n − ξ_n| ≥ ǫ) = P(|X_n − EX_n| ≥ ǫ) ≤ V X_n / ǫ² → 0,

and thus X_n − ξ_n →^{P} 0. Since ξ_n → ξ deterministically, we in particular
have ξ_n →^{D} ξ. It follows from Slutsky's Lemma (Lemma 3.3.2 [3]) that

X_n = (X_n − ξ_n) + ξ_n →^{D} ξ ∼ δ_ξ,

where δ_ξ(A) = 1 if ξ ∈ A and δ_ξ(A) = 0 if ξ ∉ A, which is what we wanted.


“⇒”: Assume now that µ_n →^{wk} µ for some distribution µ. We wish to show that
ξ_n → ξ and σ_n → σ for suitable ξ, σ ∈ R. We decompose the proof into four more
manageable sub-proofs. First, we consider the case where ξ_n = 0 for all n ∈ N.
Because µ_n →^{wk} µ it follows by Uniform Tightness (Lemma 3.1.6 [3]) that

lim_{M→∞} sup_{n≥1} µ_n([−M, M]^C) = 0.

Assume for contradiction that (σ_n²) is unbounded. Then there exists a
sub-sequence (σ_{n_k}²) such that σ_{n_k}² → ∞ for k → ∞. By Uniform Tightness
there exists an M_0 ∈ N such that for all M ≥ M_0 we have

sup_{n≥1} µ_n([−M, M]^C) ≤ 1/2,

and so especially µ_n([−M_0, M_0]^C) ≤ 1/2 for all n ∈ N, or equivalently
µ_n([−M_0, M_0]) ≥ 1/2 for all n. Consider the densities φ_{n_k} associated with
the sub-sequence; since ξ_n = 0 they are given by

φ_{n_k}(x) = (1/√(2πσ_{n_k}²)) e^{−x²/(2σ_{n_k}²)} → 0 for k → ∞.

For k large enough we have σ_{n_k}² ≥ 1, so that φ_{n_k}(x) ≤ 1 for all x, and
the constant 1 is integrable on [−M_0, M_0]. It then follows by Dominated
Convergence that

∫_{[−M_0, M_0]} φ_{n_k}(x) λ(dx) → ∫_{[−M_0, M_0]} 0 λ(dx) = 0,

but we also have

∫_{[−M_0, M_0]} φ_{n_k}(x) λ(dx) = µ_{n_k}([−M_0, M_0]) ≥ 1/2 for all k ∈ N,

and so we have a contradiction. Thus (σ_n²) is not unbounded and is therefore
bounded. We now prove a small lemma.

Lemma. Every convergent sub-sequence of (σ_n²) converges to the same value.

Proof. If (σ_{n_k}²) and (σ_{n_m}²) converge to some σ_1², σ_2² respectively, it
follows from the first part of this exercise that µ_{n_k} →^{wk} N(0, σ_1²) and
that µ_{n_m} →^{wk} N(0, σ_2²), since ξ_n = 0 by assumption and hence trivially
converges. But we have assumed that µ_n →^{wk} µ, so N(0, σ_1²) = µ = N(0, σ_2²)
and therefore σ_1² = σ_2², which is what we wanted. □

Let (σ_{n_k}²) be a given sub-sequence. It is bounded because (σ_n²) is bounded,
so (σ_{n_k}²) has a convergent sub-sequence (σ_{n_{k_m}}²), and since this is a
sub-sub-sequence of (σ_n²), it converges to the same value as any other
convergent sub-sub-sequence by the Lemma. It therefore follows by the Double
Thinning Principle that (σ_n²) is convergent. Since we assumed ξ_n = 0, we thus
have that (σ_n) and (ξ_n) are both convergent, which is what we wanted.

For the second part, assume that ξ_n → ξ for some ξ and let X_n ∼ N(ξ_n, σ_n²).
Since µ_n →^{wk} µ for some µ by assumption, it follows from Lemma 3.1.2 that
there exists an X ∼ µ such that X_n →^{D} X. As ξ_n → ξ we in particular have
ξ_n →^{P} ξ, and so it follows from the Generalized Slutsky Lemma (Theorem 3.3.3)
that X_n − ξ_n →^{D} X − ξ. But since X_n ∼ N(ξ_n, σ_n²), it follows that
X_n − ξ_n ∼ N(0, σ_n²), so the new sequence (X_n − ξ_n) has means ξ_n′ = 0 and
variances σ_n′² = σ_n². It then follows from the previous part of this exercise
that (σ_n²) is convergent, which is what we wanted.

For the third part, assume that |ξ_n| ≤ K for all n ∈ N. Let (σ_{n_k}) be a given
sub-sequence of (σ_n), with associated sub-sequence (ξ_{n_k}) of (ξ_n). Since
(ξ_{n_k}) is bounded, it is contained in a compact subset of R, and thus by
Bolzano-Weierstrass it has a convergent sub-sequence (ξ_{n_{k_m}}), with
associated sub-sequence (σ_{n_{k_m}}) of (σ_{n_k}). By the second part,
(σ_{n_{k_m}}) is then convergent. Thus every sub-sequence of (σ_n) has a
convergent sub-sequence, and by the Lemma all of these converge to the same
value, so by the Double Thinning Principle (σ_n) is convergent. The same
thinning argument applies to (ξ_n) itself: if two convergent sub-sequences of
(ξ_n) had limits ξ′ ≠ ξ′′, the first part would give N(ξ′, σ²) = µ = N(ξ′′, σ²)
for the common limit σ² of the variances, a contradiction; hence (ξ_n) is
convergent as well, which is what we wanted.

For the fourth and final part we just need to show that (ξ_n) is bounded. Since
µ_n →^{wk} µ it follows by Uniform Tightness (Lemma 3.1.6 [3]) that there exists
M_0 ∈ N such that

µ_n([−M_0, M_0]^C) < 1/2 for all n ∈ N.

Assume for contradiction that (ξ_n) is unbounded. Then there exists n_0 ∈ N such
that |ξ_{n_0}| > M_0; we may assume ξ_{n_0} > M_0, the case ξ_{n_0} < −M_0 being
symmetric. We then have that

1/2 > µ_{n_0}([−M_0, M_0]^C)
= µ_{n_0}((−∞, −M_0) ∪ (M_0, ∞))
≥ µ_{n_0}((M_0, ∞))
[ξ_{n_0} > M_0] → ≥ µ_{n_0}((ξ_{n_0}, ∞)) = 1/2,

where µ_{n_0}((ξ_{n_0}, ∞)) = 1/2 because µ_{n_0} = N(ξ_{n_0}, σ_{n_0}²) is
symmetric around its mean ξ_{n_0}. This is a contradiction, and thus (ξ_n) is
bounded, so it follows from part 3 of the argument that (ξ_n) and (σ_n) both
converge, which is what we wanted. □
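For the “⇐” direction with σ > 0, the weak convergence can also be illustrated
through the CDFs (pointwise convergence of the CDFs to the continuous limit CDF
is equivalent to weak convergence here). The sequence ξ_n = ξ + 1/n,
σ_n = σ + 1/n below is an assumed toy example:

# CDFs of N(xi_n, sigma_n^2) approaching the CDF of N(xi, sigma^2).
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 5, 11)
xi, sigma = 1.0, 2.0
for n in (10, 100, 10000):
    xi_n, sigma_n = xi + 1 / n, sigma + 1 / n   # assumed toy sequence
    print(n, np.max(np.abs(norm.cdf(x, xi_n, sigma_n) - norm.cdf(x, xi, sigma))))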

6.5 Exam January 2014 Question 2


Let (X_n) be a sequence of independent and identically distributed real-valued
random variables with X_n ∼ N(0, 1). Define

Y_n = X_{n+1}(X_n + X_{n+2}).

6.5.1 Question 2.1


We wish to show that (Yn ) is stationary and ergodic.

(X_n) is stationary and ergodic because it is iid, by Corollary 2.3.14. Also
Y_n = φ(S^n((X_n)_{n≥1})), where S is the shift operator and φ : R^∞ → R is given
by φ(X_1, X_2, ...) = X_2(X_1 + X_3). If we can show that φ is B_∞-B measurable,
it follows from Example 7.11 [2] that (Y_n) is stationary and ergodic. To that
end define φ̃ : R³ → R by φ̃(x, y, z) = y(x + z); then
φ((X_n)_{n≥1}) = φ̃(X̂_1, X̂_2, X̂_3), where X̂_i : R^∞ → R is the ith projection
mapping, X̂_i(X_1, X_2, ...) = X_i. φ̃ is measurable because it is continuous.
Furthermore, B_∞ is defined to be the smallest σ-algebra such that X̂_n is B_∞-B
measurable for every n, and thus X̂_1, X̂_2, X̂_3 are in particular B_∞-B
measurable. So φ is a composition of measurable functions and is therefore
itself measurable. It follows from Example 7.11 that (Y_n) is stationary and
ergodic, which is what we wanted.

6.5.2 Question 2.2


We wish to show that

(1/n) Σ_{k=1}^n Y_k →^{a.s.} 0 for n → ∞.

Consider

E|Y_1| = E|X_2(X_1 + X_3)|
[Independence] → = E|X_2| E|X_1 + X_3|
≤ E|X_2| E|X_1| + E|X_2| E|X_3|
[Identically distributed] → = 2 (E|X_1|)² < ∞,

where the inequality follows from the fact that X_1 ∼ N(0, 1) and thus has first
moment. Since (Y_n) is stationary and ergodic by Question 2.1, Theorem 7.12 now
gives

(1/n) Σ_{k=1}^n Y_k →^{a.s.} EY_1 = EX_2 · E(X_1 + X_3) = 0,

which is what we wanted.
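A Monte Carlo sanity check of this ergodic average (illustration only):

# Y_k = X_{k+1}(X_k + X_{k+2}) with X_i iid N(0,1); the average should be near 0.
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
X = rng.standard_normal(n + 2)
Y = X[1:-1] * (X[:-2] + X[2:])
print(np.mean(Y))   # close to E Y_1 = 0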

6.6 Re-Exam January 2015 Question 3


Let U_1, U_2, ... ∼ U(0, 1) be iid. Define

V_n = U_n / (1 + U_{n+1}), for n ∈ N.

6.6.1 Question 3.1


We wish to show that V_1, V_2, ... is stationary and ergodic. Notice that

V_n = φ(S^n((U_n)_{n≥1})),

where φ : R^∞ → R is given by φ(X_1, X_2, ...) = X_1/(1 + X_2), which is B_∞-B
measurable by the same argument as in the previous exam question. It now follows
by Example 7.11 that (V_n) is stationary and ergodic, which is what we wanted.

6.6.2 Question 3.2


We wish to show that

(1/n) Σ_{i=1}^n V_i →^{a.s.} α

for some α ∈ R, and we wish to determine α.

Notice that |V_i| ≤ U_i almost surely for all i ∈ N, because 1 ≤ 1 + U_{i+1}
almost surely for all i. Thus V_i is bounded and has moments of all orders, in
particular a first moment, i.e. E|V_1| < ∞. It then follows from Khintchine's
pointwise ergodic theorem (Theorem 7.12 [2]) that

(1/n) Σ_{i=1}^n V_i →^{a.s.} EV_1 = E[U_1/(1 + U_2)]
[Independence] → = EU_1 · E[1/(1 + U_2)]
[Density 1_{(0,1)} w.r.t. Lebesgue] → = (1/2) ∫ 1_{(0,1)}(u) · 1/(1 + u) λ(du)
[Continuous on (0, 1)] → = (1/2) ∫_0^1 1/(1 + u) du
= (1/2) [log(1 + u)]_0^1 = log(2)/2 = α,

which is what we wanted.
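A short Monte Carlo check that the average indeed settles near
α = log(2)/2 ≈ 0.3466 (illustration only):

# V_k = U_k / (1 + U_{k+1}) with U_i iid U(0,1).
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
U = rng.uniform(size=n + 1)
V = U[:-1] / (1 + U[1:])
print(np.mean(V), np.log(2) / 2)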

6.6.3 Question 3.3


We wish to show that

Π_{i=1}^n V_i →^{a.s.} β

for some β ∈ R and to determine the value of β.

To make it easier on ourselves we start by considering log V_i, and notice that

U_n/2 ≤ V_n ≤ U_n,

because the denominator in V_n satisfies 1 ≤ 1 + U_{n+1} ≤ 2. Taking logs, this
implies

log(U_n) − log(2) ≤ log(V_n) ≤ log(U_n),



so if log U_n has first moment, then log V_n also has first moment. Consider

E|log U_n| = ∫ 1_{(0,1)}(u) |log(u)| λ(du)
= ∫_0^1 |log(u)| du
[u ∈ (0, 1) ⇒ log(u) < 0] → = ∫_0^1 −log(u) du
= −lim_{M→0} [u log(u) − u]_M^1
= −(−1 − 0) = 1,

where we use that u log(u) → 0 for u → 0. Furthermore, we can write
log V_i = φ(S^i((V_n)_{n≥1})), where S is the shift operator and φ : R^∞ → R is
given by φ(X_1, X_2, ...) = log(X_1). Thus, since (V_n) is stationary and
ergodic, so is (log(V_n)) by Example 7.11 [2], and so it follows from Theorem
7.12 [2] that

(1/n) Σ_{k=1}^n log V_k →^{a.s.} E log V_1.

Since log V_1 ≤ log U_1 almost surely and E log U_1 = −1 (log U_1 < 0 and
E|log U_1| = 1 by the computation above), we get E log V_1 ≤ E log U_1 = −1 < 0.
Because the averages (1/n) Σ_{k=1}^n log V_k converge almost surely to this
strictly negative limit, the partial sums satisfy

Σ_{k=1}^n log V_k = n · (1/n) Σ_{k=1}^n log V_k →^{a.s.} −∞.

But Σ_{k=1}^n log V_k = log( Π_{k=1}^n V_k ), and so, taking the exponential, we
get that

Π_{k=1}^n V_k = exp( Σ_{k=1}^n log V_k ) →^{a.s.} 0 = β,

which is what we wanted.
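Numerically the product itself underflows to 0 almost immediately, so it is more
informative to track the logarithms. A side computation along the lines above
gives E log V_1 = E log U_1 − E log(1 + U_2) = −1 − (2 log 2 − 1) = −2 log 2 ≈
−1.39, so the log-averages should settle near this negative value, forcing the
partial sums, and hence the product, towards −∞ and 0 respectively:

# (1/n) sum log V_k estimates E log V_1 < 0; sum log V_k is the log of the product.
import numpy as np

rng = np.random.default_rng(3)
n = 10**5
U = rng.uniform(size=n + 1)
logV = np.log(U[:-1]) - np.log1p(U[1:])
print(np.mean(logV))   # near -2 log 2, i.e. about -1.386
print(np.sum(logV))    # hugely negative: the product is effectively 0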


7 Week 7

7.1 Exercise 3.8 [3]


Let (µ_n) be a sequence of probability measures on (R, B) such that µ_n has CDF
F_n. Let µ be some other probability measure with CDF F. Assume that F is
continuous and µ_n →^{wk} µ. Let (x_n) be a sequence of real numbers converging
to x. We wish to show that lim_n F_n(x_n) = F(x).

First solution. Let X_n ∼ µ_n and X ∼ µ. Then by Lemma 3.2.1 [3] we have
X_n →^{D} X if and only if µ_n →^{wk} µ. Since x_n → x, we in particular have
x_n →^{P} x if we consider (x_n) as a sequence of (deterministic) random
variables. Hence it follows from Generalized Slutsky (Theorem 3.3.3 [3]) that
X_n − x_n →^{D} X − x, or equivalently, according to Lemma 3.2.1, ν_n →^{wk} ν
where X_n − x_n ∼ ν_n and X − x ∼ ν. Let ν_n and ν have CDFs G_n and G,
respectively. We then have

F_n(x_n) = P(X_n ≤ x_n) = P(X_n − x_n ≤ 0) = G_n(0) → G(0) = P(X − x ≤ 0) = P(X ≤ x) = F(x),

where the convergence G_n(0) → G(0) holds because G(t) = F(x + t) is continuous
at 0, F being continuous. This is what we wanted.

Second solution. Let ǫ > 0 be given. Because x_n → x there exists N_0 ∈ N such
that |x_n − x| < ǫ, that is, x − ǫ < x_n < x + ǫ, for all n ≥ N_0. So for all
n ≥ N_0 we have

F_n(x_n) ≥ F_n(x − ǫ),

because F_n is increasing. And F_n(x − ǫ) → F(x − ǫ) by Lemma 3.2.1, because
µ_n →^{wk} µ and F is continuous. Hence we get

lim inf_{n→∞} F_n(x_n) ≥ lim inf_{n→∞} F_n(x − ǫ) = F(x − ǫ).

Similarly, we have for all n ≥ N_0 that

F_n(x_n) ≤ F_n(x + ǫ),

and by Lemma 3.2.1 we have F_n(x + ǫ) → F(x + ǫ). So we get

lim sup_{n→∞} F_n(x_n) ≤ lim sup_{n→∞} F_n(x + ǫ) = F(x + ǫ).


Collecting the inequalities we have

F(x − ǫ) ≤ lim inf_{n→∞} F_n(x_n) ≤ lim sup_{n→∞} F_n(x_n) ≤ F(x + ǫ)

for all ǫ > 0, and so, letting ǫ → 0⁺, we get by continuity of F that

F(x) ≤ lim inf_{n→∞} F_n(x_n) ≤ lim sup_{n→∞} F_n(x_n) ≤ F(x).

Thus lim inf_{n→∞} F_n(x_n) = lim sup_{n→∞} F_n(x_n) = F(x), i.e.
lim_{n→∞} F_n(x_n) = F(x), which is what we wanted.

7.2 Exercise 3.9


Let µ_n be the measure on (R, B) concentrated on {k/n | k ≥ 1} such that
µ_n({k/n}) = (1/n)(1 − 1/n)^{k−1} for each k ∈ N. We wish to show that µ_n is a
probability measure and that µ_n →^{wk} exp(1).

µ_n is already assumed to be a measure, so we just need to show that
µ_n({k/n | k ≥ 1}) = 1 to establish that it is a probability measure. Consider

µ_n({k/n | k ≥ 1}) = µ_n( ∪_{k=1}^∞ {k/n} )
[Disjoint] → = Σ_{k=1}^∞ µ_n({k/n})
= Σ_{k=1}^∞ (1/n)(1 − 1/n)^{k−1}
= Σ_{k=0}^∞ (1/n)(1 − 1/n)^k
[Geometric series] → = (1/n) · 1/(1 − (1 − 1/n)) = 1,
so µ_n is a probability measure. To show that µ_n →^{wk} exp(1), let F_n and F
be the CDFs of µ_n and µ, respectively. Notice that if x > 0, then

k/n ≤ x ⇔ k ≤ nx ⇔ k ≤ ⌊nx⌋,

so

F_n(x) = µ_n((−∞, x]) = Σ_{k: k/n ≤ x} (1/n)(1 − 1/n)^{k−1} = Σ_{k=1}^{⌊nx⌋} (1/n)(1 − 1/n)^{k−1}
[Finite geometric series] → = (1/n) · (1 − (1 − 1/n)^{⌊nx⌋}) / (1 − (1 − 1/n)) = 1 − (1 − 1/n)^{⌊nx⌋} = 1 − e^{⌊nx⌋ ln(1 − 1/n)}.

Now we have

n ln(1 − 1/n) = ln(1 − 1/n) / (1/n) ”→” 0/0,

so by L'Hôpital's Rule we can differentiate numerator and denominator with
respect to n and consider the resulting limit

−1 / (1 − 1/n) → −1,

and thus nx ln(1 − 1/n) → −x. Furthermore,
(nx − 1) ln(1 − 1/n) = nx ln(1 − 1/n) − ln(1 − 1/n) → −x. So since
nx − 1 ≤ ⌊nx⌋ ≤ nx by definition, it follows that ⌊nx⌋ ln(1 − 1/n) → −x. So
F_n(x) = 1 − e^{⌊nx⌋ ln(1 − 1/n)} → 1 − e^{−x} = F(x) by continuity of the
exponential function. If x ≤ 0, then F_n(x) = 0 = F(x). Thus we have shown that
F_n(x) → F(x) for all x ∈ R, where F is the CDF of exp(1), and by Theorem 3.2.3
it follows that µ_n →^{wk} exp(1), which is what we wanted.
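The convergence can be checked directly from the closed form
F_n(x) = 1 − (1 − 1/n)^{⌊nx⌋} derived above (illustration only):

# F_n versus the exp(1) CDF 1 - e^{-x} on a grid of positive x.
import numpy as np

x = np.linspace(0.1, 5.0, 8)
for n in (10, 100, 10000):
    Fn = 1 - (1 - 1 / n) ** np.floor(n * x)
    print(n, np.max(np.abs(Fn - (1 - np.exp(-x)))))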

7.3 Exercise 3.10

Let X ∼ bin(n, p) = µ, so that f(x) = C(n, x) p^x (1 − p)^{n−x} for
x ∈ {0, 1, ..., n} is the density with respect to the counting measure τ. We
wish to find the characteristic function of X. Using the definition we get

φ(θ) = ∫_{{0,1,...,n}} e^{iθx} dµ(x)
[Density w.r.t. τ] → = ∫_{{0,1,...,n}} e^{iθx} C(n, x) p^x (1 − p)^{n−x} dτ(x)
= Σ_{x=0}^n e^{iθx} C(n, x) p^x (1 − p)^{n−x}
= Σ_{x=0}^n C(n, x) (pe^{iθ})^x (1 − p)^{n−x}
[Binomial Theorem] → = (1 − p + pe^{iθ})^n,

which is what we wanted.



7.4 Exercise 3.11


Let X ∼ P o(λ). We wish to find the characteristic function of X. Using the
definition, we get

φ(θ) = ∫_{N_0} e^{iθx} (e^{−λ} λ^x / x!) dτ(x)
= Σ_{x=0}^∞ e^{iθx} e^{−λ} λ^x / x!
= e^{−λ} Σ_{x=0}^∞ (λe^{iθ})^x / x!
[Taylor expansion] → = e^{−λ} e^{λe^{iθ}} = e^{λ(e^{iθ} − 1)},

which is what we wanted.
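Both closed forms (this exercise and Exercise 3.10) are easy to verify
numerically against the defining sums; the Poisson sum below is truncated at
k = 200, which is harmless since the tail is negligible:

# E e^{i theta X} computed from the pmf versus the closed-form expressions.
import numpy as np
from scipy.stats import binom, poisson

theta, n, p, lam = 0.7, 20, 0.3, 2.5

k = np.arange(0, n + 1)
print(np.sum(np.exp(1j * theta * k) * binom.pmf(k, n, p)),
      (1 - p + p * np.exp(1j * theta)) ** n)

k = np.arange(0, 200)
print(np.sum(np.exp(1j * theta * k) * poisson.pmf(k, lam)),
      np.exp(lam * (np.exp(1j * theta) - 1)))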

7.5 Exercise 3.12


Let (Ω, E, P) be a probability space with independent variables X and Y with
distributions µ and ν respectively, and characteristic functions φ_X and φ_Y
respectively. We wish to show that the variable XY has characteristic function
ψ(θ) = ∫ φ_X(θy) dν(y).

Let ρ : R² → R be given by ρ(x, y) = xy. Using the definition and remembering
that the joint distribution (X, Y)(P) is the image measure given by
(X, Y)(P)(G) = P((X, Y)^{−1}(G)) for G ∈ B ⊗ B, we get

ψ(θ) = ∫ e^{iθt} d(XY)(P)(t)
= ∫ e^{iθt} d(ρ(X, Y))(P)(t)
[A.C.V.] → = ∫ e^{iθρ(x,y)} d(X, Y)(P)(x, y)
= ∫ e^{iθxy} d(X, Y)(P)(x, y)
[Independence, Def. 18.4 [1]] → = ∫ e^{iθxy} d(X(P) ⊗ Y(P))(x, y)
[X ∼ µ, Y ∼ ν] → = ∫ e^{iθxy} d(µ ⊗ ν)(x, y)
[Fubini, Theorem 3.4.6 [3]] → = ∫ ∫ e^{i(θy)x} dµ(x) dν(y)
= ∫ φ_X(θy) dν(y),

which is what we wanted.
7.6. EXERCISE 3.13 59

7.6 Exercise 3.13


Consider a probability space with four independent variables X, Y, Z, W ∼ N (0, 1).
We wish to find the characteristic function of XY − ZW and argue that XY −
ZW ∼ Laplace.

We start by finding the characteristic function of XY , φXY . Using Exercise 3.12,


we get
φ_{XY}(θ) = ∫ φ_X(θy) dY(P)(y)
[Example 3.4.10 [3]] → = ∫ e^{−(θy)²/2} dY(P)(y)
[Density w.r.t. m] → = ∫ e^{−(θy)²/2} (1/√(2π)) e^{−y²/2} dy
= ∫ (1/√(2π)) e^{−y²(θ² + 1)/2} dy
= √((θ² + 1)^{−1}) ∫ (1/√(2π(θ² + 1)^{−1})) e^{−y²/(2(θ² + 1)^{−1})} dy
= √((θ² + 1)^{−1}),   (1)

where we get the last equality by noticing that the integrand is the density of
a normal distribution with µ = 0 and σ² = (θ² + 1)^{−1}, which therefore
integrates to 1. Since XY ∼ ZW we have φ_{XY}(θ) = φ_{ZW}(θ); by definition
φ_{−ZW}(θ) = φ_{ZW}(−θ), and since φ_{ZW} is even by (1) we have
φ_{ZW}(−θ) = φ_{ZW}(θ). Now, to determine the distribution, we notice that
XY ⊥⊥ ZW (and hence XY ⊥⊥ −ZW), because we are dealing with measurable
transformations of independent variables. Letting XY ∼ µ and −ZW ∼ ν, it then
follows from Lemma 3.4.14 [3] that XY − ZW = XY + (−ZW) ∼ µ ∗ ν. By Lemma
3.4.15, µ ∗ ν has characteristic function

φ_{XY}(θ) φ_{−ZW}(θ) = φ_{XY}(θ)² = (√((θ² + 1)^{−1}))² = 1/(θ² + 1),

which we recognise to be the characteristic function of the Laplace distribution
from Example 3.4.12 [3]. Since XY − ZW has the same characteristic function as
the Laplace distribution, it follows from Theorem 3.4.19 that XY − ZW is
Laplace-distributed, which is what we wanted.
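A Monte Carlo sketch (illustration only): the empirical characteristic function
of XY − ZW should match the Laplace one, 1/(1 + θ²):

# Empirical versus exact characteristic function of XY - ZW.
import numpy as np

rng = np.random.default_rng(4)
N = 10**6
X, Y, Z, W = rng.standard_normal((4, N))
T = X * Y - Z * W

for theta in (0.0, 0.5, 1.0, 2.0):
    emp = np.mean(np.exp(1j * theta * T))
    print(theta, emp.real, 1 / (1 + theta ** 2))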

7.7 Exam 2015, Problem 3 - Bernstein’s Theorem


Bernstein’s Theorem is the fact that if X + Y is indenpendent of X − Y , then X
and Y are Gaussian with the same variance. Assume that X Y , X + Y X − Y ,
|=

|=

EX 2 < ∞ and EY 2 < ∞ and that the characteristic function of X, called φ satis-
fies φ(θ) ∈ (0, ∞) for every θ ∈ R, which implies that 1θ and log(θ) are legitimate
60 KAPITEL 7. WEEK 7

expressions. Furthermore, assume that EX = EY = 0 and that EX 2 = 1. Let Y


have characteristic function ψ.

Consider I(θ) = (X − Y )eiθ(X+Y ) dP for θ ∈ R.


R

7.7.1 Question 3.1


We wish to verify that I(θ) is well-defined and to show that I(θ) = 0. To show
that it is well-defined, consider

∫ |(X − Y) e^{iθ(X+Y)}| dP = ∫ |X − Y| dP ≤ ∫ |X| dP + ∫ |Y| dP < ∞,

where the inequality follows from the fact that X and Y both have first moments,
since they have second moments. To show that I(θ) = 0, notice that
X + Y ⊥⊥ X − Y implies X − Y ⊥⊥ e^{iθ(X+Y)}; call this (1). We get

I(θ) = ∫ (X − Y) e^{iθ(X+Y)} dP = E[(X − Y) e^{iθ(X+Y)}]
(1) → = E(X − Y) · E e^{iθ(X+Y)}
= (EX − EY) · E e^{iθ(X+Y)} = (0 − 0) · E e^{iθ(X+Y)} = 0,

which is what we wanted.

7.7.2 Question 3.2


 
We wish to show that I(θ) = (1/i)(φ′(θ)ψ(θ) − φ(θ)ψ′(θ)).

Notice that X ⊥⊥ Y implies X e^{iθX} ⊥⊥ e^{iθY} and e^{iθX} ⊥⊥ Y e^{iθY}; call
this (2). We get

I(θ) = ∫ (X − Y) e^{iθ(X+Y)} dP = ∫ X e^{iθ(X+Y)} dP − ∫ Y e^{iθ(X+Y)} dP
= ∫ X e^{iθX} e^{iθY} dP − ∫ Y e^{iθY} e^{iθX} dP
(2) → = ∫ X e^{iθX} dP ∫ e^{iθY} dP − ∫ Y e^{iθY} dP ∫ e^{iθX} dP
[Lemma 3.4.8] → = (1/i) φ′(θ) ∫ e^{iθY} dP − (1/i) ψ′(θ) ∫ e^{iθX} dP
= (1/i) φ′(θ)ψ(θ) − (1/i) ψ′(θ)φ(θ)
= (1/i)(φ′(θ)ψ(θ) − φ(θ)ψ′(θ)),

which is what we wanted.

7.7.3 Question 3.3


We wish to show that X and Y have the same distribution. By Questions 3.1 and
3.2 we have φ′(θ)ψ(θ) − φ(θ)ψ′(θ) = 0. Now consider

(ψ/φ)′(θ) = (ψ′(θ)φ(θ) − ψ(θ)φ′(θ)) / φ(θ)² = 0,

which implies that ψ/φ is constant, i.e. ψ(θ)/φ(θ) = k for some k ∈ R. By Lemma
3.4.8(i) we have

ψ(0)/φ(0) = 1/1 = 1,

and so, since the fraction is constant, it follows that
ψ(θ)/φ(θ) = ψ(0)/φ(0) = 1 for all θ ∈ R. Thus φ(θ) = ψ(θ) for all θ ∈ R, i.e. X
and Y have the same characteristic function, and so it follows from Theorem
3.4.19 that X and Y have the same distribution, which is what we wanted.

7.7.4 Question 3.4


Consider J(θ) = ∫ (X − Y)² e^{iθ(X+Y)} dP for θ ∈ R. We wish to show three
things: (1) that J is well-defined, (2) that J(θ) = 2φ(θ)², and (3) that
J(θ) = −2φ′′(θ)φ(θ) + 2φ′(θ)².

Ad 1: Consider

∫ |(X − Y)² e^{iθ(X+Y)}| dP = ∫ (X − Y)² dP = EX² + EY² − 2EX·EY = 2 < ∞,

using X ⊥⊥ Y and EX = EY = 0 in the middle step, and EY² = EX² = 1 since Y ∼ X
by Question 3.3. So J is well-defined.

Ad 2: Since X − Y ⊥⊥ X + Y, also (X − Y)² ⊥⊥ e^{iθ(X+Y)}, so

J(θ) = ∫ (X − Y)² e^{iθ(X+Y)} dP = ∫ (X − Y)² dP ∫ e^{iθ(X+Y)} dP
[X ⊥⊥ Y] → = ∫ (X − Y)² dP ∫ e^{iθX} dP ∫ e^{iθY} dP
[φ = ψ by Q3.3] → = E[(X − Y)²] φ(θ)φ(θ) = E[(X − Y)²] φ(θ)².

We know that E(X − Y) = 0 by assumption, so (E(X − Y))² = 0 and hence
V(X − Y) = E[(X − Y)²]. But since X ⊥⊥ Y we have
V(X − Y) = V(X + (−Y)) = V X + (−1)² V Y = V X + V Y, which is equal to 2 by
assumption, and so we get

J(θ) = E[(X − Y)²] φ(θ)² = 2φ(θ)².
 

Ad 3: Consider

J(θ) = ∫ (X − Y)² e^{iθ(X+Y)} dP
= ∫ X² e^{iθ(X+Y)} dP + ∫ Y² e^{iθ(X+Y)} dP − 2 ∫ XY e^{iθ(X+Y)} dP
= ∫ X² e^{iθX} e^{iθY} dP + ∫ Y² e^{iθY} e^{iθX} dP − 2 ∫ X e^{iθX} Y e^{iθY} dP
[X ⊥⊥ Y] → = ∫ X² e^{iθX} dP ∫ e^{iθY} dP + ∫ Y² e^{iθY} dP ∫ e^{iθX} dP − 2 ∫ X e^{iθX} dP ∫ Y e^{iθY} dP
[Lemma 3.4.8] → = (1/i²) φ′′(θ)ψ(θ) + (1/i²) ψ′′(θ)φ(θ) − 2 (1/i²) φ′(θ)ψ′(θ)
[φ = ψ] → = 2φ′(θ)² − 2φ′′(θ)φ(θ),

which is what we wanted.

7.7.5 Question 3.5


We wish to show that X ∼ N (0, 1).

Using Question 3.4, we consider

d²/dθ² log(φ(θ)) = d/dθ (φ′(θ)/φ(θ)) = (φ′′(θ)φ(θ) − φ′(θ)φ′(θ)) / φ(θ)²
= −(1/2)(−2φ′′(θ)φ(θ) + 2φ′(θ)²) / φ(θ)²
= −J(θ) / (2φ(θ)²)
[Question 3.4] → = −2φ(θ)² / (2φ(θ)²) = −1.

Thus log(φ(θ)) has constant second derivative −1, so it is a second-order
polynomial in θ (with leading coefficient −1/2), and we can write its Taylor
expansion around 0 as

log(φ(θ)) = log(φ(0)) + (d/dθ log(φ(θ)))|_{θ=0} · θ + (d²/dθ² log(φ(θ)))|_{θ=0} · θ²/2
[φ(0) = 1] → = (φ′(0)/φ(0)) θ − θ²/2 = φ′(0)θ − θ²/2,

so taking the exponential on both sides gives

φ(θ) = e^{φ′(0)θ − θ²/2}.

Using Lemma 3.4.8 we now consider

φ′(0) = i ∫ X e^{i·0·X} dP = i ∫ X dP = iEX = 0,

where the last equality follows by assumption. Thus

φ(θ) = e^{φ′(0)θ − θ²/2} = e^{−θ²/2},

which is exactly the characteristic function of the standard normal
distribution. Thus X has the characteristic function of a standard normal
distribution, and it then follows from Theorem 3.4.19 [3] that X ∼ N(0, 1),
which is what we wanted.
Bibliography

[1] Hansen, Ernst (2015). Measure Theory, Københavns Universitetsforlag.

[2] Hansen, Ernst (2018). Notes on Advanced Probability Theory.

[3] Sokol, Alexander & Rønn-Nielsen, Anders (2016). Advanced Probability, 4th
edition, Department of Mathematical Sciences, University of Copenhagen.

[4] Thorbjørnsen, Steen (2014). Grundlæggende Mål- og Integralteori, Aarhus
Universitetsforlag.

[5] Thorbjørnsen, Steen (2015). Videregående Sandsynlighedsteori, Institut for
Matematiske Fag, Aarhus Universitet.
