
Lecture notes

Probability and Measure


Gábor Szabó
KU Leuven – G0P63a
Academic Year 2022–2023, Semester 1

Contents
1 Lebesgue’s Integration Theory 1
1.1 σ-Algebras and Measurable Spaces . . . . . . . . . . . . . . . 1
1.2 Measurable functions . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Measures on σ-algebras, Measure Spaces . . . . . . . . . . . . 5
1.4 Integration theory . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Convergence Theorems . . . . . . . . . . . . . . . . . . . . . . 18

2 Carathéodory’s Construction of Measures 22


2.1 Measures on Semi-Rings and Rings . . . . . . . . . . . . . . . 22
2.2 Outer measures . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Application: The Lebesgue measure on R . . . . . . . . . . . . 29
2.4 Application: Lebesgue–Stieltjes Measures . . . . . . . . . . . . 32
2.5 Product measures . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Infinite products of probability measures . . . . . . . . . . . . 37

3 Probability 39
3.1 Probability spaces and random variables . . . . . . . . . . . . 39
3.2 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . 49

4 Selected further topics 58


4.1 Tail events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Radon–Nikodym theorem . . . . . . . . . . . . . . . . . . . . 59

1 Lebesgue’s Integration Theory
1.1 σ-Algebras and Measurable Spaces
Definition 1.1. Let Ω be a non-empty set. A σ-algebra M on Ω is a set of
subsets of Ω satisfying
A1. ∅ ∈ M.

A2. If E ∈ M, then E c ∈ M.

A3. For any sequence En ∈ M, one has ⋃_{n∈N} En ∈ M.

We will refer to the pair (Ω, M) as a measurable space.


Remark 1.2. A σ-algebra M is closed with respect to all countable oper-
ations on sets one can perform using complement, union, intersection, and
difference. In fact, any intersection and difference can be rewritten in terms
of unions and complements, namely

E ∩ F = (E c ∪ F c )c , E \ F = (E c ∪ F )c .

Example 1.3. The entire power set 2Ω of Ω is the largest possible σ-algebra
on Ω, whereas {∅, Ω} is the smallest one. We may also consider

M = {E ⊆ Ω | E or E c is countable} ,

which sits strictly in between if Ω is an uncountably infinite set.


Proposition 1.4. If {Mi}_{i∈I} is an arbitrary family of σ-algebras on a set
Ω, then the intersection ⋂_{i∈I} Mi is a σ-algebra. Consequently, if S ⊆ 2Ω is
a family of subsets of Ω, then there exists a smallest σ-algebra M containing
S. We will refer to it as the σ-algebra generated by S.
Proof. It is a trivial exercise to prove the first part of the statement. For the
second part, if S is given, consider the family
{ M ⊆ 2Ω | M is a σ-algebra with S ⊆ M } .

Since this family contains 2Ω , it is non-empty, and hence the intersection of


this family is the smallest possible σ-algebra that contains S.
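To make the notion of a generated σ-algebra concrete, here is a small Python sketch (an illustration added alongside these notes; the function name and the example family are ad hoc choices, not part of the course material). For a finite Ω, countable unions reduce to finite ones, so the σ-algebra generated by a family S is obtained by closing S together with ∅ and Ω under complements and pairwise unions until nothing new appears.

from itertools import combinations

def sigma_algebra_generated_by(omega, S):
    # omega: finite ground set; S: iterable of subsets.
    # On a finite set, a sigma-algebra is the same thing as an algebra,
    # so closing under complements and pairwise unions suffices.
    family = {frozenset(), frozenset(omega)} | {frozenset(A) for A in S}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:                      # close under complements
            comp = frozenset(omega) - A
            if comp not in family:
                family.add(comp); changed = True
        for A, B in combinations(current, 2):  # close under pairwise unions
            if A | B not in family:
                family.add(A | B); changed = True
    return family

# Example: the sigma-algebra on {1, 2, 3, 4} generated by {{1}, {1, 2}} has
# 8 elements, corresponding to the atoms {1}, {2}, {3, 4}.
M = sigma_algebra_generated_by({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(A) for A in M))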
Example 1.5. Recall that a topology T on a set Ω is a family of subsets of
Ω satisfying
T1. ∅, Ω ∈ T .

T2. For any family of sets O ⊆ T , one has ⋃O ∈ T .

T3. For O1 , O2 ∈ T , one has O1 ∩ O2 ∈ T .

If we are given a topological space (Ω, T ), then the σ-algebra generated by


T is called the Borel-σ-algebra of (Ω, T ). Its elements are called the Borel
subsets of Ω.

Remark 1.6. Given a topology T as above, a base B ⊆ T is a subset with


the property that every set O ∈ T can be expressed as a union of a family of
sets in B. If T has a countable base B, then every σ-algebra M containing B
will necessarily also contain T . It follows that the Borel-σ-algebra generated
by T coincides with the σ-algebra generated by B.

Proposition 1.7. Let (Ω, d) be a separable metric space. Then Ω has a


countable base consisting of open balls, i.e., sets of the form

B(x, r) = {y ∈ Ω | d(x, y) < r}

for x ∈ Ω and r > 0. In particular, the Borel-σ-algebra on Ω coincides with


the σ-algebra generated by all open balls.

Proof. In light of the previous remark, the second part of the statement
follows automatically if we can show that the metric topology on Ω has a
countable base consisting of open balls. As we assumed Ω to be separable,
choose a countable dense set D ⊆ Ω, and consider

B = {B(x, r) | x ∈ D, 0 < r ∈ Q} .

This is a countable family of open balls, and we claim that it is a base for
the metric topology. Indeed, let O ⊆ Ω be an open set. For x ∈ D ∩ O, set
Rx = {0 < r ∈ Q | B(x, r) ⊆ O}. Then evidently
⋃ {B(x, r) | x ∈ D ∩ O, r ∈ Rx } ⊆ O.

We claim that this is an equality: Given y ∈ O, it follows from the definition


of openness that there exists δ > 0 with B(y, δ) ⊆ O. By making δ smaller,
if necessary, we may assume δ ∈ Q. Since D is a dense subset of Ω, there
exists x ∈ D with d(x, y) < δ/2. By the triangle inequality, we have y ∈
B(x, δ/2) ⊆ B(y, δ) ⊆ O. In summary, we have shown that O is the countable
union of sets in B, which finishes the proof.

Example 1.8. Let us consider Ω = R with the standard topology. Then
half-open intervals are Borel because, for example, for a, b ∈ R with a < b,
one has
[a, b) = ⋂_{n∈N} (a − 1/n, b).
Singleton sets are also Borel since they are closed. To summarize, there are
in general many more Borel sets than open sets. Note that in light of the
previous remark, the Borel-σ-algebra on R is generated by all open intervals.

1.2 Measurable functions


Definition 1.9. Let (Ω1 , M1 ) and (Ω2 , M2 ) be two measurable spaces. A
map f : Ω1 → Ω2 is called measurable, if for every subset E ⊆ Ω2 , E ∈ M2
implies f −1 (E) ∈ M1 .

Remark 1.10. The composition of two measurable maps is always measur-


able.

Proposition 1.11. Let (Ω1 , M1 ) and (Ω2 , M2 ) be two measurable spaces


and f : Ω1 → Ω2 a map. Suppose that M2 is the σ-algebra generated by a
set S ⊆ 2Ω2 . Then f is measurable if and only if for all O ∈ S, one has
f −1 (O) ∈ M1 .

Proof. The “only if” part is trivial, so let us consider the “if part”. Consider
the set M = { O ⊆ Ω2 | f −1 (O) ∈ M1 } .
By assumption, we have S ⊆ M. The claim amounts to showing that M2 ⊆
M, and by assumption on S, it therefore suffices to show that M is a σ-
algebra. But this will be part of the exercise sessions.

Proposition 1.12. Let (Ω, M) be a measurable space and (Y, T ) a topological


space. For a map f : Ω → Y , the following are equivalent:

(i) f is measurable with respect to the Borel-σ-algebra on Y .

(ii) For all O ∈ T , one has f −1 (O) ∈ M.

If furthermore there exists a countable base B ⊆ T , then this is equivalent to

(iii) For all O ∈ B, one has f −1 (O) ∈ M.

Proof. By definition of the Borel-σ-algebra as the one generated by T , the
equivalence (i)⇔(ii) becomes a special case of Proposition 1.11. If we further-
more assume that B is a countable base of T , then the equivalence (i)⇔(iii)
is also a special case of Proposition 1.11 in light of Remark 1.6.

Theorem 1.13. Let (Ω, M) be a measurable space and (Y, d) a metric space.
We equip Y with the Borel-σ-algebra associated to the metric topology. Sup-
pose that a sequence of measurable functions fn : Ω → Y converges to a map
f : Ω → Y pointwise. Then f is measurable.

Proof. As a consequence of Proposition 1.12, it suffices to show that the


preimages of open sets under f belong to M. Since preimages respect com-
plements of sets, it suffices to show that the preimage of every closed set
C ⊆ Y belongs to M. Since we have a metric space, we have
C = C̄ = ⋂_{k∈N} { y ∈ Y | inf_{x∈C} d(x, y) < 1/k } =: ⋂_{k∈N} Ck .

Then each of the sets Ck is open. For every x ∈ Ω, we use fn (x) → f (x) and
observe
x ∈ f −1 (C)
⇔ f (x) ∈ C
⇔ ∀ k ∈ N : f (x) ∈ Ck
⇔ ∀ k ∈ N : ∃ n0 ∈ N : ∀ n ≥ n0 : fn (x) ∈ Ck      (using fn (x) → f (x))
⇔ x ∈ ⋂_{k∈N} ⋃_{n0∈N} ⋂_{n≥n0} fn−1 (Ck ).

In summary, the preimage f −1 (C) can be realized as a countable intersection


of countable unions of countable intersections of sets in M, and hence also
belongs to M.

Notation 1.14. We will equip [0, ∞] := [0, ∞) ∪ {∞} with the topology of
the one point compactification, that is, we define a subset O ⊆ [0, ∞] to be
open when the following holds:

• if 0 ∈ O, then there exists ε > 0 with [0, ε) ⊆ O.

• for every x ∈ O ∩ (0, ∞), there exists ε > 0 with (x − ε, x + ε) ⊆ O.

• if ∞ ∈ O, then there is b ≥ 0 with (b, ∞] ⊆ O.

This topology is induced by a metric, for example by

d(x, y) = | 1/(1 + x) − 1/(1 + y) | ,

where we follow the convention 1/∞ := 0. We extend the usual addition and
multiplication from [0, ∞) to [0, ∞] by defining¹

x + ∞ := ∞,    x · ∞ := ∞ (x > 0),    0 · ∞ := 0.
Then the addition map + : [0, ∞] × [0, ∞] → [0, ∞] is continuous, but this is
not true for the multiplication map.² We also extend the usual order relation
“≤” of numbers to [0, ∞] in the obvious way.
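As a quick illustration of this metric (a sketch added here, not taken from the notes; the helper name d_extended is an ad hoc choice), the only special care needed in code is the convention 1/(1 + ∞) = 0:

import math

def d_extended(x, y):
    # d(x, y) = |1/(1+x) - 1/(1+y)| on [0, infinity], with 1/(1+inf) read as 0
    def h(t):
        return 0.0 if math.isinf(t) else 1.0 / (1.0 + t)
    return abs(h(x) - h(y))

# The sequence x_n = n converges to infinity in this metric:
print([round(d_extended(n, math.inf), 4) for n in (1, 10, 100, 1000)])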

1.3 Measures on σ-algebras, Measure Spaces


Definition 1.15. Let (Ω, M) be a measurable space. A (positive) measure
µ on (Ω, M) is a map M → [0, ∞] satisfying:
M1. µ(∅) = 0.
M2. µ is σ-additive, i.e., for every sequence En ∈ M consisting of pairwise
disjoint sets, one has

µ( ⋃_{n∈N} En ) = ∑_{n∈N} µ(En ).

The triple (Ω, M, µ) is called a measure space. If µ(Ω) < ∞, we call µ a finite
measure, and if more specifically µ(Ω) = 1, we call it a probability measure
and the triple (Ω, M, µ) a probability space. If there exists a sequence En ∈
M with µ(En ) < ∞ and Ω = ⋃_{n∈N} En , then we say that µ is σ-finite.

Remark 1.16. Note that


• the series ∑_{n∈N} µ(En ) is a series in [0, ∞], which we may define as the
supremum sup_{N≥1} ∑_{n=1}^{N} µ(En ).

• σ-additivity implies finite additivity: If E1 , E2 ∈ M are two disjoint


sets, then
µ(E1 ∪ E2 ) = µ(E1 ∪ E2 ∪ ∅ ∪ ∅ ∪ . . . )
            = µ(E1 ) + µ(E2 ) + µ(∅) + µ(∅) + . . .
            = µ(E1 ) + µ(E2 ),
where the terms µ(∅) sum to zero.
¹ Keep in mind that we implicitly force everything to be commutative, so the order of addition and multiplication does not matter here by default.
² Convince yourself why this is not the case!

• If there is at least some E ∈ M with µ(E) < ∞, then σ-additivity
already implies µ(∅) = 0:

µ(E) = µ(E ∪ ∅ ∪ ∅ ∪ . . . )
= µ(E) + ∞ · µ(∅).

If µ(E) < ∞, then the above can only happen when µ(∅) = 0.

Example 1.17 (Counting measure). For any non-empty set Ω, we can con-
sider the σ-algebra 2Ω and define a measure µ via

µ(E) = #E if E is finite, and µ(E) = ∞ if E is infinite.

Example 1.18 (Dirac measure). If (Ω, M) is any measurable space with a


distinguished point x ∈ Ω, one can define the measure δx via

δx (E) = 1 if x ∈ E, and δx (E) = 0 if x ∉ E.
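As a small sanity check of these two examples (an illustration added here with ad hoc names, not part of the notes), both measures can be written as plain functions on subsets of a finite Ω, and finite additivity can be verified on a pair of disjoint sets:

def counting_measure(E):
    # mu(E) = #E (every subset of a finite ground set is finite)
    return len(E)

def dirac_measure(x):
    # delta_x(E) = 1 if x is in E, and 0 otherwise
    return lambda E: 1 if x in E else 0

delta_3 = dirac_measure(3)
A, B = {1, 2}, {3, 5}                      # two disjoint sets
for mu in (counting_measure, delta_3):
    assert mu(A | B) == mu(A) + mu(B)      # additivity on disjoint sets
print(counting_measure(A | B), delta_3(A | B))   # prints: 4 1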

Example 1.19. Consider R with its Borel-σ-algebra B. We will later con-


struct the Lebesgue measure on B, which is a unique measure µ : B → [0, ∞]
with the property that µ([a, b]) = b − a for all a, b ∈ R with a ≤ b. An anal-
ogous unique measure exists on Rn as well.

Proposition 1.20. Let (Ω, M, µ) be a measure space. Then µ has the fol-
lowing extra properties:

(i) µ is monotone, i.e., for E, F ∈ M with E ⊆ F one has µ(E) ≤ µ(F ).


If µ(F ) < ∞, then µ(E) = µ(F ) − µ(F \ E).

(ii) µ is subadditive, i.e., for E, F ∈ M one has µ(E ∪ F ) ≤ µ(E) + µ(F ).

(iii) µ is σ-subadditive, i.e., for a sequence En ∈ M one has µ( ⋃_{n∈N} En ) ≤ ∑_{n∈N} µ(En ).

(iv) If En ∈ M is an increasing sequence (w.r.t. “⊆”), then µ( ⋃_{n∈N} En ) = sup_{n∈N} µ(En ).

(v) If En ∈ M is a decreasing sequence and µ(E1 ) < ∞, then µ( ⋂_{n∈N} En ) = inf_{n∈N} µ(En ).

Proof. (i): If E ⊆ F , then we may write F = E ∪ (F \ E) as a disjoint
union, so it follows from additivity that µ(F ) = µ(E) + µ(F \ E) ≥ µ(E). If
µ(F ) < ∞, we also obtain the last part of the claim by subtracting µ(F \ E).
(ii): One has E ∪ F = (E \ F ) ∪ F , which is a disjoint union. Thus
µ(E ∪ F ) = µ(E \ F ) + µ(F ) ≤ µ(E) + µ(F ).
(iii)+(iv): We construct a sequence Fn ∈ M as follows. We set F1 = E1
and Fn = En \ ( ⋃_{k<n} Ek ) for n ≥ 2. Then the sequence Fn consists of pairwise
disjoint sets, but at the same time one has that ⋃_{k≤n} Ek = ⋃_{k≤n} Fk for all
n ≥ 1. Hence µ( ⋃_{n∈N} En ) = µ( ⋃_{n∈N} Fn ) = ∑_{n∈N} µ(Fn ) ≤ ∑_{n∈N} µ(En ), which
proves (iii). For (iv), additionally assume that En was increasing. Then

µ( ⋃_{n∈N} En ) = ∑_{n∈N} µ(Fn )
               = sup_{n∈N} ∑_{k≤n} µ(Fk )
               = sup_{n∈N} µ( ⋃_{k≤n} Fk )
               = sup_{n∈N} µ(En ).

(v): Consider Fn = E1 \ En for all n ≥ 1. As En was assumed to be
decreasing, the sets Fn will be increasing, and moreover ⋃_{n∈N} Fn = E1 \ ⋂_{n∈N} En .
By using (i) and (iv), we see

µ(E1 ) − inf_{n∈N} µ(En ) = sup_{n∈N} µ(Fn )
                          = µ( ⋃_{n∈N} Fn )
                          = µ(E1 ) − µ( ⋂_{n∈N} En ).

This implies the claim.


Remark 1.21. In condition (v) above, it is really necessary to assume that
at least one of the sets En has finite measure. For example, we may consider
Ω = N with the counting measure µ, and the sets En = {k ∈ N | k ≥ n}.
Then µ(En ) = ∞ for all n, the sequence is decreasing, and ⋂_{n∈N} En = ∅.

We omit the proof of the following statement, which is a very simple


exercise:
Proposition 1.22. Let (Ω, M) be a measurable space with two measures
µ1 , µ2 . For any numbers α1 , α2 ≥ 0, the map
α1 µ1 + α2 µ2 : M → [0, ∞], E 7→ α1 µ1 (E) + α2 µ2 (E)
is a measure.

Definition 1.23. Let (Ω, M, µ) be a measure space. A subset N ⊆ Ω is
called a null set if there is some E ∈ M with N ⊆ E and µ(E) = 0.
If P (x) is a statement about elements x ∈ Ω that can either be true or
false, we say that P holds almost everywhere (w.r.t. µ) if the set of elements
x ∈ Ω for which P (x) is false is a null set.
Notation 1.24. For brevity, we may also say “P holds a.e.” (a.e.=almost
everywhere) or “P (x) holds for (µ-)almost all x”.

1.4 Integration theory


Notation 1.25. For a set Ω with a subset E ⊆ Ω, its indicator function (or
characteristic function) is defined via

χE : Ω → {0, 1} ,    χE (x) = 1 if x ∈ E, and χE (x) = 0 if x ∉ E.
Definition 1.26. Let Ω be a set. A simple function (or step function) on Ω
is a function with finite range. In particular, a simple function s : Ω → Y
with Y ∈ {[0, ∞), [0, ∞], R, C} can always be written as
s = ∑_{λ∈s(Ω)} λ · χ_{s−1 (λ)} .

We will refer to this as the canonical form for s.


Proposition 1.27. Let (Ω, M) be a measurable space. Let s : Ω → Y be a
simple function, and assume that Y is equipped with a σ-algebra that contains
all singleton sets. (In particular this is the case for Y ∈ {[0, ∞), [0, ∞], R, C}.)
Then s is measurable if and only if s−1 (λ) ∈ M for all λ ∈ s(Ω).
Proof. The “only if” part is clear. Since we assumed the σ-algebra on Y
to contain all singleton sets, it follows in particular that s−1 (y) ∈ M for all
y ∈Y.
For the “if” part, let A be a measurable subset of Y . Then s−1 (A) =
⋃_{λ∈s(Ω)} s−1 (A ∩ {λ}). By assumption this is a finite union of sets of the form
s−1 (λ) for λ ∈ s(Ω), and hence if these are measurable, then so is s−1 (A).
Definition 1.28. Let (Ω, M, µ) be a measure space. Let s : Ω → [0, ∞) be
a measurable simple function with canonical form s = ∑_{λ∈s(Ω)} λ · χ_{s−1 (λ)} . We
define the integral of s over Ω with respect to µ as

∫_Ω s dµ := ∑_{λ∈s(Ω)} λ · µ(s−1 (λ)).

(Keep in mind the convention 0 · ∞ = 0.)
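Definition 1.28 can be carried out literally on a finite measure space. The following sketch (an added illustration; the function and variable names are ad hoc) groups points by the value of s and sums λ · µ(s−1 (λ)):

def integral_of_simple_function(s, mu, omega):
    # s: dict assigning to each point of omega a value in [0, infinity)
    # mu: dict of point masses describing a measure on the power set of omega
    total = 0.0
    for lam in set(s.values()):
        preimage = [x for x in omega if s[x] == lam]
        total += lam * sum(mu[x] for x in preimage)
    return total

omega = ["a", "b", "c", "d"]
mu = {"a": 0.5, "b": 0.5, "c": 2.0, "d": 1.0}   # a finite measure given by point masses
s = {"a": 3.0, "b": 3.0, "c": 1.0, "d": 0.0}    # a simple function with values {0, 1, 3}
print(integral_of_simple_function(s, mu, omega))  # 3*(0.5+0.5) + 1*2.0 + 0*1.0 = 5.0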

Lemma 1.29. Let (Ω, M, µ) be a measure space. For all k ≥ 0, numbers
α1 , . . . , αk ≥ 0, and sets A1 , . . . , Ak ∈ M, the function s = ∑_{j=1}^{k} αj · χ_{Aj} is a
positive measurable simple function with ∫_Ω s dµ = ∑_{j=1}^{k} αj µ(Aj ).

Proof. It is clear that these are positive measurable simple functions, so the
relevant part of the claim is the last equation. To prove it, we proceed by
induction on k. Let us first consider k = 1, so s = α1 χA1 . If α1 = 0, there is
nothing to prove, so let us assume α1 ̸= 0. Then the canonical form of s is
s = α1 χ_{A1} + 0 · χ_{A1^c} , and hence ∫_Ω s dµ = α1 µ(A1 ) + 0 · µ(A1^c ) = α1 µ(A1 ).

Now suppose that ℓ ≥ 1 is any natural number and that the statement
of the Lemma is true for all k ≤ ℓ. Suppose that s = ∑_{j=1}^{ℓ+1} αj χ_{Aj} is a step
function with ℓ + 1 non-zero summands. Write s0 = ∑_{j=1}^{ℓ} αj χ_{Aj} . We will
try to find the canonical form for the sum s = s0 + αℓ+1 χ_{Aℓ+1} in terms of the
canonical form of s0 . Let λ ∈ [0, ∞). Then

s−1 (λ) = ∅                                                      if λ ∉ s0 (Ω) ∪ (s0 (Ω) + αℓ+1 ),
s−1 (λ) = s0−1 (λ − αℓ+1 ) ∩ Aℓ+1                                 if λ ∈ (s0 (Ω) + αℓ+1 ) \ s0 (Ω),
s−1 (λ) = s0−1 (λ) \ Aℓ+1                                         if λ ∈ s0 (Ω) \ (s0 (Ω) + αℓ+1 ),
s−1 (λ) = [s0−1 (λ − αℓ+1 ) ∩ Aℓ+1 ] ∪ [s0−1 (λ) \ Aℓ+1 ]          if λ ∈ s0 (Ω) ∩ (s0 (Ω) + αℓ+1 ).

This allows us to write the canonical form of s as


s = ∑_{λ∈(s0 (Ω)+αℓ+1 )\s0 (Ω)} λ · χ_{s0−1 (λ−αℓ+1 )∩Aℓ+1}
  + ∑_{λ∈s0 (Ω)\(s0 (Ω)+αℓ+1 )} λ · χ_{s0−1 (λ)\Aℓ+1}
  + ∑_{λ∈s0 (Ω)∩(s0 (Ω)+αℓ+1 )} λ · χ_{[s0−1 (λ−αℓ+1 )∩Aℓ+1 ]∪[s0−1 (λ)\Aℓ+1 ]} .

We compute its integral as follows by using our computation rules for mea-
sures and the fact that Ω = ⊔_{λ∈s0 (Ω)} s0−1 (λ):

∫_Ω s dµ = ∑_{λ∈(s0 (Ω)+αℓ+1 )\s0 (Ω)} λ · µ(s0−1 (λ − αℓ+1 ) ∩ Aℓ+1 )
         + ∑_{λ∈s0 (Ω)\(s0 (Ω)+αℓ+1 )} λ · µ(s0−1 (λ) \ Aℓ+1 )
         + ∑_{λ∈s0 (Ω)∩(s0 (Ω)+αℓ+1 )} λ · µ( [s0−1 (λ − αℓ+1 ) ∩ Aℓ+1 ] ∪ [s0−1 (λ) \ Aℓ+1 ] )
         = ∑_{λ∈s0 (Ω)} λ · µ(s0−1 (λ) \ Aℓ+1 ) + ∑_{λ∈s0 (Ω)} (λ + αℓ+1 ) · µ(s0−1 (λ) ∩ Aℓ+1 )
         = ∑_{λ∈s0 (Ω)} λ · µ(s0−1 (λ)) + αℓ+1 · µ(Aℓ+1 )
         = ∫_Ω s0 dµ + αℓ+1 µ(Aℓ+1 ).

By induction hypothesis, the latter sum is equal to

∑_{j=1}^{ℓ} αj µ(Aj ) + αℓ+1 µ(Aℓ+1 ) = ∑_{j=1}^{ℓ+1} αj µ(Aj ),

which shows our claim.

Proposition 1.30. Let (Ω, M, µ) be a measure space. Then the assignment
s ↦ ∫_Ω s dµ, which assigns to every measurable step function Ω → [0, ∞) its
integral, satisfies the following properties:

(i) ∫_Ω (c · s) dµ = c · ∫_Ω s dµ for all constants c ≥ 0.

(ii) ∫_Ω (s + t) dµ = ∫_Ω s dµ + ∫_Ω t dµ.

(iii) If s ≤ t, then ∫_Ω s dµ ≤ ∫_Ω t dµ.

Proof. Both (i) and (ii) are straightforward consequences of Lemma 1.29.
For (iii), suppose that s and t are given as

s = ∑_{j=1}^{k} αj χ_{Aj} ,    t = ∑_{i=1}^{ℓ} βi χ_{Bi}

with Ω = ⊔_{j=1}^{k} Aj = ⊔_{i=1}^{ℓ} Bi , for example via their canonical forms. Then we
can also write

s = ∑_{j=1}^{k} ∑_{i=1}^{ℓ} αj · χ_{Aj ∩Bi} ,    t = ∑_{i=1}^{ℓ} ∑_{j=1}^{k} βi · χ_{Aj ∩Bi} .

Now clearly s ≤ t implies that αj ≤ βi whenever Aj ∩ Bi ≠ ∅. Hence it follows
from Lemma 1.29 applied to the above equalities that ∫_Ω s dµ ≤ ∫_Ω t dµ.

Proposition 1.31. Let (Ω, M, µ) be a measure space and s : Ω → [0, ∞) a
measurable simple function. Then the map ν : M → [0, ∞], E ↦ ∫_Ω sχE dµ
defines a measure.

Proof. Since positive linear combinations of measures are again measures, it


suffices by Proposition 1.30 to consider the case where s is of the form s = χA
for A ∈ M. In this case ν(E) = ∫_Ω χA χE dµ = ∫_Ω χ_{A∩E} dµ = µ(A ∩ E).

Indeed it is clear that ν(∅) = 0. For σ-additivity, let En ∈ M be a sequence
of pairwise disjoint sets, and observe

ν( ⋃_{n∈N} En ) = µ( A ∩ ⋃_{n∈N} En )
               = µ( ⋃_{n∈N} (A ∩ En ) )
               = ∑_{n∈N} µ(A ∩ En )
               = ∑_{n∈N} ν(En ).

Definition 1.32. Let (Ω, M, µ) be a measure space. For a measurable func-


tion f : Ω → [0, ∞], we define its integral as
∫_Ω f dµ = sup { ∫_Ω s dµ | s is a positive measurable step function with s ≤ f } .

We call f integrable when ∫_Ω f dµ < ∞.

Remark 1.33. We should first convince ourselves that the new definition of
the integral does not contradict the old one in the case where f is assumed
to be a step function. But indeed, s = f is the largest step function with
the property that s ≤ f , so by Proposition 1.30(iii) the above supremum is
indeed attained at the value ∫_Ω s dµ in the sense of Definition 1.28.

Secondly, we remark that the above definition a priori makes sense even
when f is not assumed to be measurable. However, we will see in the exercises
that the resulting notion of integral will have various undesirable properties
when evaluated on non-measurable functions, for example not being additive.

Example 1.34. If δx is the Dirac measure associated to a point x ∈ Ω in a
measurable space, then ∫_Ω f dδx = f (x).
If µ is the counting measure on an infinite set Ω, then ∫_Ω f dµ = ∑_{x∈Ω} f (x).³

If µ is the (yet to be constructed) Lebesgue measure and f : [a, b] → [0, ∞)


is a continuous function, then f is measurable and integrable, and its integral
in the sense of Definition 1.32 will coincide with its Riemann integral.

Proposition 1.35. Let (Ω, M) be a measurable space and f : Ω → [0, ∞] a


positive function. Then f is measurable if and only if there exists a (point-
wise) increasing sequence of positive measurable step functions sn : Ω →
[0, ∞) such that f = supn∈N sn = limn→∞ sn .
³ Note that this sum might be over an uncountable index set!

Proof. We already know that the “if” part is true. For the “only if” part,
assume that f is measurable. We claim that it suffices to consider the special
case f = id as a function [0, ∞] → [0, ∞]. Indeed, if we can realize id =
supn∈N tn for an increasing sequence of positive measurable step functions tn ,
then
f = id ◦ f = lim_{n→∞} tn ◦ f = sup_{n∈N} (tn ◦ f ),

where sn = tn ◦ f is an increasing sequence of positive measurable step


functions. But we can come up with the following sequence tn , which is
easily seen to do the trick:
tn = n·χ_{[n,∞]} + ∑_{k=1}^{n·2^n} (k−1)/2^n · χ_{[(k−1)/2^n , k/2^n )} .
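The dyadic step functions tn from this proof are easy to evaluate numerically. The following sketch (an illustration added here; dyadic_approx is an ad hoc name) checks on a sample point that the values increase towards the input, as claimed:

import math

def dyadic_approx(n, x):
    # t_n(x) = n for x >= n (including x = infinity),
    # and floor(x * 2^n) / 2^n for 0 <= x < n
    if math.isinf(x) or x >= n:
        return n
    return math.floor(x * 2**n) / 2**n

x = math.pi
values = [dyadic_approx(n, x) for n in range(1, 8)]
print(values)                                       # increases towards pi
assert all(a <= b for a, b in zip(values, values[1:]))
assert abs(values[-1] - x) <= 2**-7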

Proposition 1.36. Let (Ω, M, µ) be a measure space and f : Ω → [0, ∞] a


positive measurable function. Suppose that sn is an increasing sequence of
positive measurable step functions such that f = sup_{n∈N} sn . Then ∫_Ω f dµ =
sup_{n∈N} ∫_Ω sn dµ.
Proof. Since we have by assumption sn ≤ f for all n, it follows from the
definition of the integral that sup_{n∈N} ∫_Ω sn dµ ≤ ∫_Ω f dµ. For the reverse
inequality, it suffices to prove that c · ∫_Ω s dµ ≤ sup_{n∈N} ∫_Ω sn dµ for every
positive measurable step function s ≤ f and all constants 0 < c < 1. For a
fixed such function s and constant c, we consider the sequence of sets

En = {x ∈ Ω | sn (x) ≥ c·s(x)} .

Since sn and s are measurable, it follows that En ∈ M. Since sn converges to
f pointwise and s ≤ f , we can see that Ω = ⋃_{n∈N} En . As in Proposition 1.31,
we define a new measure ν via ν(E) = ∫_Ω sχE dµ. Then it follows from
Proposition 1.20(iv) that

c ∫_Ω s dµ = c·ν(Ω)
           = c · sup_{n∈N} ν(En )
           = sup_{n∈N} ∫_Ω c·s·χ_{En} dµ
           ≤ sup_{n∈N} ∫_Ω sn ·χ_{En} dµ
           ≤ sup_{n∈N} ∫_Ω sn dµ.

Proposition 1.37. Let (Ω, M, µ) be a measure space. Let f, g : Ω → [0, ∞]
be measurable functions. Then
(i) ∫_Ω (c · f ) dµ = c · ∫_Ω f dµ for all constants c ≥ 0.

(ii) ∫_Ω (f + g) dµ = ∫_Ω f dµ + ∫_Ω g dµ.

(iii) If f ≤ g, then ∫_Ω f dµ ≤ ∫_Ω g dµ.

Proof. Both (i) and (iii) are trivial consequences of the definition of the inte-
gral. For (ii), take sequences sn and tn of positive measurable step functions
that increase pointwise to f and g, respectively. Then sn + tn increases
pointwise to f + g. Hence it follows from Proposition 1.36 that
∫_Ω f + g dµ = sup_{n∈N} ∫_Ω sn + tn dµ
             = (sup_{n∈N} ∫_Ω sn dµ) + (sup_{n∈N} ∫_Ω tn dµ)
             = ∫_Ω f dµ + ∫_Ω g dµ.

Here we have used that these suprema are in fact limits.


Proposition 1.38. Let (Ω, M, µ) be a measure space and f : Ω → [0, ∞]
a measurable function. Then one has ∫_Ω f dµ = 0 if and only if f (x) = 0
for µ-almost all x ∈ Ω. Furthermore, if f is integrable, then f (x) < ∞ for
µ-almost all x ∈ Ω.
Proof. Suppose ∫_Ω f dµ = 0. Consider the measurable set En = f −1 ((1/n, ∞]),
and observe that by definition (1/n)·χ_{En} ≤ f . Hence (1/n)·µ(En ) ≤ ∫_Ω f dµ = 0,
so µ(En ) = 0 for all n. The union of the En is equal to E = f −1 ((0, ∞]).
By continuity of measures, we have µ(E) = sup_{n∈N} µ(En ) = 0, so indeed
f (x) = 0 for µ-almost all x ∈ Ω.
For the converse, assume that f (x) = 0 for µ-almost all x ∈ Ω, which
means that the set E above is a null set. If s : Ω → [0, ∞) is a simple
measurable function with s ≤ f , then it means in particular that s−1 (λ) ⊆ E
for all λ ≠ 0. This immediately implies ∫_Ω s dµ = 0, but hence also ∫_Ω f dµ =
0 by definition.
Now assume that f is integrable. Consider E = f −1 (∞) ∈ M and sn =
n·χE . Then sn ≤ f for all n ∈ N, and hence n·µ(E) = ∫_Ω sn dµ ≤ ∫_Ω f dµ.

Since this holds for all n ≥ 1 and we assume the right side to be finite, this
is only possible for µ(E) = 0. But this is what it means that f (x) < ∞ for
µ-almost all x ∈ Ω. This finishes the proof.

Remark 1.39. Let Ω be a set and f : Ω → C a function. Then it is
always possible to decompose f into f = u + i · v for real-valued functions
u, v : Ω → R, the real and imaginary parts of f . Furthermore we can always
uniquely write
u = u+ − u− , v = v + − v − , u+ u− = 0, v + v − = 0,
where u+ , u− , v + , v − take values in [0, ∞).⁴ We then have u± , v ± ≤ |f |, but
also |f | ≤ u+ + u− + v + + v − by triangle inequality.
Furthermore, if Ω carries a σ-algebra for which f becomes measurable,
then the functions u+ , u− , v + , v − are also measurable. In what follows we wish
to exploit such decompositions to linearly extend the integral construction.
Definition 1.40. Let (Ω, M, µ) be a measure space. We say that a mea-
surable function f : Ω → C is integrable, if |f | is integrable in the sense of
Definition 1.32. In this case we define the integral as
∫_Ω f dµ = ∫_Ω u+ dµ − ∫_Ω u− dµ + i ( ∫_Ω v+ dµ − ∫_Ω v− dµ ) ,

where we use the decomposition from Remark 1.39. The set of all such
functions is denoted L1 (Ω, M, µ) or L1 (µ).
Theorem 1.41. L1 (Ω, M, µ) is a complex vector space with the usual oper-
ations. Moreover the integral
L1 (Ω, M, µ) → C,    f ↦ ∫_Ω f dµ

is a linear map.
Proof. The fact that L1 is a vector space is left as an exercise. Let us proceed
to prove linearity of the integral.
Let us first assume that f, g ∈ L1 are real-valued. Set h = f + g and
observe that hence h+ −h− = f + −f − +g + −g − , or equivalently h+ +f − +g − =
h− + f + + g + . All of these summands are positive integrable functions, so it
follows by Proposition 1.37 that
∫_Ω h+ dµ + ∫_Ω f − dµ + ∫_Ω g − dµ = ∫_Ω h− dµ + ∫_Ω f + dµ + ∫_Ω g + dµ.

All of these summands are finite, and hence we can rearrange this equation
to
∫_Ω h+ dµ − ∫_Ω h− dµ = ∫_Ω f + dµ − ∫_Ω f − dµ + ∫_Ω g + dµ − ∫_Ω g − dµ,
⁴ For example, u+ is the composition of u with the function R → [0, ∞) that is the identity on [0, ∞) and sends every negative number to zero.

and hence ∫_Ω h dµ = ∫_Ω f dµ + ∫_Ω g dµ. It is also clear from the definition
that ∫_Ω f + ig dµ = ∫_Ω f dµ + i ∫_Ω g dµ. These two equations together imply
that the integral is indeed additive.


Consider a scalar α > 0. Then (αf )± = α · f ± , so from Proposition 1.37
it follows that
∫_Ω αf dµ = ∫_Ω αf + dµ − ∫_Ω αf − dµ = α ∫_Ω f dµ.

If α < 0, then (αf )± = (−α)f ∓ , so a similar calculation shows

∫_Ω αf dµ = ∫_Ω (−αf )− dµ − ∫_Ω (−αf )+ dµ = α ∫_Ω f dµ.

If more generally α = α1 + iα2 ∈ C for α1 , α2 ∈ R, then we use the above to
calculate

∫_Ω α(f + ig) dµ = ∫_Ω α1 f − α2 g + i(α1 g + α2 f ) dµ
                 = ∫_Ω α1 f − α2 g dµ + i ∫_Ω α1 g + α2 f dµ
                 = α1 ∫_Ω f dµ − α2 ∫_Ω g dµ + i ( α1 ∫_Ω g dµ + α2 ∫_Ω f dµ )
                 = α ( ∫_Ω f dµ + i ∫_Ω g dµ )
                 = α ∫_Ω f + ig dµ.

This finishes the proof.

Proposition 1.42. For all f ∈ L1 (Ω, M, µ), one has


| ∫_Ω f dµ | ≤ ∫_Ω |f | dµ.

Proof. By considering the polar decomposition of ∫_Ω f dµ and multiplying f
with a scalar of modulus one, we may assume without loss of generality that
∫_Ω f dµ ≥ 0. Write f = u + iv for real-valued functions u, v. Then

0 ≤ ∫_Ω f dµ = ∫_Ω u dµ ≤ ∫_Ω u+ dµ ≤ ∫_Ω |f | dµ,

where the last inequality uses u+ ≤ |f |.

Proposition 1.43. Let (Ω, M, µ) be a measure space and f : Ω → C a


measurable function.

(i) If f = 0 a.e., then f is integrable and ∫_Ω f dµ = 0.

(ii) If f is integrable and g : Ω → C is another measurable function such
that f = g a.e., then g is integrable and ∫_Ω f dµ = ∫_Ω g dµ.

(iii) If f is integrable and ∫_Ω f · χE dµ = 0 for all E ∈ M, then f = 0 a.e.

Proof. (i): By the previous proposition, it suffices to show ∫_Ω |f | dµ = 0, but
this is a consequence of Proposition 1.38.


(ii): The assumption means that g − f = 0 a.e., so g − f is integrable and
has vanishing integral. Hence g = f + (g − f ) is integrable, and according to
additivity of the integral it has the same integral as f .
(iii): Consider E = {x ∈ Ω | Ref (x) ≥ 0}. Decompose f = u+ − u− +
i(v + − v − ) as before, and observe
0 = Re ( ∫_Ω f χE dµ ) = ∫_Ω Re(f χE ) dµ = ∫_Ω u+ dµ.

By Proposition 1.38 it follows that u+ = 0 a.e.. Repeating this argument


with different choices for E, one can also see that u− , v + , v − vanish almost
everywhere. But this clearly implies that f = 0 a.e..
Proposition 1.44. Let (Ω, M, µ) be a σ-finite measure space and f : Ω → C
an integrable function. Let F ⊆ C be a closed set with the property that for
all E ∈ M with 0 < µ(E) < ∞, one has
(1/µ(E)) ∫_Ω f χE dµ ∈ F.
Then f (x) ∈ F for µ-almost all x ∈ Ω.
Proof. As µ is σ-finite, we can write Ω = ⋃ En as an increasing union of sets
En ∈ M with µ(En ) < ∞. If the set of all x ∈ En with f (x) ∉ F is a null
set for all n ≥ 1, then by σ-subadditivity the same is true for the set of all
x ∈ Ω with f (x) ∉ F . So by restricting the problem to each En in place of
Ω, we may assume without loss of generality that µ is a finite measure.
If F = C, there is nothing to prove, so let us assume that F has non-
empty complement. Let B ⊂ C \ F be a closed ball around some point z
with radius r > 0. Set g = f χf −1 (B) and write g = zχf −1 (B) + g0 , where
|g0 | ≤ r·χ_{f −1 (B)} . Then either µ(f −1 (B)) = 0, or

| z − (1/µ(f −1 (B))) ∫_Ω g dµ | = (1/µ(f −1 (B))) | ∫_Ω g0 dµ | ≤ r.

The latter would imply that the number (1/µ(f −1 (B))) ∫_Ω g dµ is in B, which
is disjoint from F , a contradiction. Hence we conclude µ(f −1 (B)) = 0.


Since we can write C \ F as a countable union of closed balls, this implies
µ(f −1 (C \ F )) = 0, which confirms the claim.

Proposition 1.45. Let (Ω, M, µ) be a measure space. Consider

N = {f : Ω → C | f is measurable and f (x) = 0 for µ-almost all x} .

Then N is a linear subspace of L1 (Ω, M, µ), and f ∈ N holds if and only if


f is measurable and |f | ∈ N .
Proof. From the previous proposition we get that indeed N ⊆ L1 (Ω, M, µ),
and the second equivalence of the statement is clear.
Let f, g ∈ N be given. First of all, it is clear that α · f ∈ N for all α ∈ C.
So we shall show f + g ∈ N . We do have that this function is measurable,
so it remains to show that it vanishes almost everywhere. Indeed, we have
(f +g)−1 (C\{0}) ⊆ f −1 (C\{0})∪g −1 (C\{0}), which by assumption implies
µ((f + g)−1 (C \ {0})) = 0, or that indeed f (x) + g(x) = 0 holds for µ-almost
all x.
Definition 1.46. Let (Ω, M, µ) be a measure space. Then in light of the
above, we define the quotient vector space

L1 (µ) = L1 (Ω, M, µ) = L1 (Ω, M, µ)/N,

where the space being divided out by N is the space of integrable functions from Definition 1.40 (often written 𝓛1 (Ω, M, µ) to distinguish it from the quotient).

For a function f ∈ L1 , we may for the moment denote its coset by [f ] = f +N ,


but in time we will transition to denote it simply by f , even though this is
some abuse of notation.⁵ From the previous results above it follows that the
assignment Z
L (Ω, M, µ) → C, [f ] 7→ f dµ
1

is a well-defined linear map, which we will call the integral. Since f ∈
N happens precisely when |f | ∈ N , it makes sense to define |[f ]| = [|f |].
Consequently, the following semi-norm ∥ · ∥1 on L1 becomes a norm (cf.
Proposition 1.38) on the L1 -space via
∥f ∥1 = ∫_Ω |f | dµ,    ∥[f ]∥1 := ∥f ∥1 .

1.5 Convergence Theorems


Theorem 1.47 (Monotone Convergence Theorem). Let (Ω, M, µ) be a mea-
sure space and fn : Ω → [0, ∞] a (pointwise) increasing sequence of measur-
able functions. Then f = sup_{n≥1} fn is also measurable, and we have

∫_Ω f dµ = sup_{n≥1} ∫_Ω fn dµ.
⁵ But this is justified by the fact that we will mostly form integrals, which do not depend on the representative for such a coset.

Proof. By Proposition 1.35, for every n ≥ 1, we find an increasing sequence
of measurable step functions sn,k : Ω → [0, ∞] such that fn = supk≥1 sn,k .
Not only that, but the construction of these functions in the proof allows us
to assume that |fn (x) − sn,k (x)| ≤ 2−k whenever fn (x) ≤ k, and sn,k (x) = k
whenever fn (x) ≥ k.
Since (sn,k )k is increasing, the sequence of positive measurable step func-
tions tn = maxj≤n sj,n is increasing. Since fn is increasing, it follows that
tn ≤ fn for all n. We claim f = supn≥1 tn . Indeed, let x ∈ Ω be given. If
f (x) = ∞, then fn (x) → ∞, so tn (x) ≥ sn,n (x) ≥ min(n, fn (x) − 2−n ) → ∞.
If f (x) < ∞, then in particular f (x) ≤ n for all large enough n, and hence
|f (x) − tn (x)| ≤ |f (x) − sn,n (x)|
≤ |fn (x) − f (x)| + |fn (x) − sn,n (x)|
≤ |fn (x) − f (x)| + 2−n → 0.
We appeal to Proposition 1.36 and see that
∫_Ω f dµ = sup_{n≥1} ∫_Ω tn dµ ≤ sup_{n≥1} ∫_Ω fn dµ,  using tn ≤ fn in the last step.

The “≥” relation is on the other hand clear from the fact that the integral
is monotone. This finishes the proof.
Lemma 1.48 (Fatou). Let (Ω, M, µ) be a measure space and fn : Ω → [0, ∞]
be a sequence of measurable functions. Then lim inf n→∞ fn is also measurable,
and we have

∫_Ω (lim inf_{n→∞} fn ) dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.

Proof. Denote gk = inf_{n≥k} fn , and recall that lim inf_{n→∞} fn = sup_{k≥1} gk .
We thus see that lim inf_{n→∞} fn is a measurable function. If n ≥ k, then
evidently gk ≤ fn , so ∫_Ω gk dµ ≤ ∫_Ω fn dµ. In particular, this is true for
arbitrarily large n, so ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ. Using the Monotone
Convergence Theorem, we thus see

∫_Ω lim inf_{n→∞} fn dµ = sup_{k≥1} ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.

Theorem 1.49 (Dominated Convergence Theorem). Let (Ω, M, µ) be a mea-


sure space and fn ∈ L1 (Ω, M, µ) a sequence converging to a function f point-
wise. Suppose that there exists a positive integrable function g : Ω → [0, ∞)
such that |fn | ≤ g for all n. Then f ∈ L1 (Ω, M, µ), and
lim_{n→∞} ∫_Ω |f − fn | dµ = 0,    ∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ.

Proof. We have that f is measurable (Theorem 1.13) and |f | ≤ g, hence
f ∈ L1 (Ω, M, µ). Furthermore, we have |f − fn | ≤ 2g, and so we may apply
Fatou’s Lemma to the sequence 2g − |f − fn | to get
∫_Ω 2g dµ ≤ lim inf_{n→∞} ∫_Ω 2g − |f − fn | dµ = ∫_Ω 2g dµ − lim sup_{n→∞} ∫_Ω |f − fn | dµ,

where the last lim sup is ≥ 0.

Since g was assumed to be integrable, this is equivalent to


0 = lim sup_{n→∞} ∫_Ω |f − fn | dµ = lim_{n→∞} ∫_Ω |f − fn | dµ.

As a consequence we obtain
| ∫_Ω f dµ − ∫_Ω fn dµ | = | ∫_Ω f − fn dµ | ≤ ∫_Ω |f − fn | dµ → 0 as n → ∞.

This finishes the proof.

Remark 1.50. For practical applications of the Dominated Convergence


Theorem, it is useful to observe that it is only necessary to assume that the
pointwise convergence fn (x) → f (x) and the inequality |fn (x)| ≤ g(x) holds
for µ-almost all x ∈ Ω. This is a consequence of Proposition 1.43, as we may
just re-define all functions fn on a common measurable null set to have value
zero and thus enforce these statements to hold on all points, yet the integrals
in the conclusion of the statement remain the same.
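As a reminder of why some dominating function is needed, consider the classical example fn = n · χ_{(0,1/n)} on (0, 1) with the Lebesgue measure: fn → 0 pointwise, yet ∫ fn dλ = 1 for every n, so the conclusion of the theorem fails (and indeed sup_n fn behaves like 1/x near 0, which is not integrable). The short numerical sketch below is an added illustration; it approximates the integrals by midpoint Riemann sums.

def f(n, x):
    # f_n = n on the interval (0, 1/n), and 0 elsewhere
    return n if 0 < x < 1.0 / n else 0.0

def riemann_integral(g, a=0.0, b=1.0, steps=100000):
    # midpoint Riemann sum; for these bounded step-like functions it
    # approximates the Lebesgue integral over (a, b)
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

for n in (1, 10, 100, 1000):
    print(n, round(riemann_integral(lambda x: f(n, x)), 3))   # stays at 1.0
# The pointwise limit is 0, whose integral is 0, so the limit of the integrals
# is not the integral of the limit.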

Remark 1.51. The Dominated Convergence Theorem really only holds for
sequences of functions, and its analogous generalizations for more general
families of functions (such as nets) are false. A counterexample is discussed
in the exercise sessions.

Theorem 1.52. Let (Ω, M, µ) be a measure space. Suppose that a sequence


fn ∈ L1 (Ω, M, µ) satisfies the Cauchy criterion in the semi-norm ∥·∥1 . Then
there exists a subsequence (fnk )k and a function f ∈ L1 (Ω, M, µ) such that
fnk (x) → f (x) as k → ∞ for µ-almost all x ∈ Ω,
and moreover ∥f − fn ∥1 → 0 as n → ∞. In particular, it follows that L1 (Ω, M, µ) is
a Banach space with respect to the norm ∥ · ∥1 .

Proof. By applying the Cauchy criterion inductively, we can find a subse-


quence (fnk )k such that ∥fnk − fnk+1 ∥1 ≤ 2−k . We define the measurable
function g : Ω → [0, ∞] as g(x) = ∑_{k=1}^{∞} |fnk (x) − fnk+1 (x)|. Then it follows
from the Monotone Convergence Theorem that

∫_Ω g dµ = ∑_{k=1}^{∞} ∫_Ω |fnk − fnk+1 | dµ = ∑_{k=1}^{∞} ∥fnk − fnk+1 ∥1 ≤ ∑_{k=1}^{∞} 2−k = 1.

In particular g is integrable, and by Proposition 1.38, the set E = g −1 ([0, ∞))


has a complement of zero measure. For all x ∈ E, we have by definition that

the series ∑_{k=1}^{∞} [fnk+1 (x) − fnk (x)] converges absolutely, and hence the function

f (x) := fn1 (x) + ∑_{k=1}^{∞} [fnk+1 (x) − fnk (x)] = lim_{k→∞} fnk (x),    x ∈ E,

is well-defined and measurable on E. We extend f to a measurable function


on Ω by defining it to be zero on the complement of E. We get by the
triangle inequality that for all k ≥ 1, the function fnk is dominated (on E)
by the integrable function |fn1 | + g. Therefore it follows from the Dominated
Convergence Theorem that ∥f − fnk ∥1 → 0 as k → ∞. Since the sequence (fn )n was
assumed to satisfy the Cauchy criterion in ∥ · ∥1 and we just showed that a
subsequence converges to f in this norm, it follows that also ∥f − fn ∥1 → 0.⁶
This finishes the proof.
Proposition 1.53. Let (Ω1 , M1 , µ1 ) be a measure space, Ω2 a non-empty
set, and φ : Ω1 → Ω2 a map. Then
φ∗ M1 := { E ⊆ Ω2 | φ−1 (E) ∈ M1 }
is a σ-algebra and
φ∗ µ1 : φ∗ M1 → [0, ∞], φ∗ µ1 (E) = µ1 (φ−1 (E))
is a measure. These are called the push-forward σ-algebra and the push-
forward measure with respect to φ. Then with respect to this measure space
structure on Ω2 , φ becomes a measurable map.
Proof. We have already seen in the exercise sessions that φ∗ M1 is a σ-algebra.
For the measure condition, we observe that φ∗ µ1 (∅) = 0 is trivial. Moreover
if En ∈ φ∗ M1 is a sequence of disjoint sets, then φ−1 (En ) ∈ M1 is a sequence
of disjoint sets, and hence
φ∗ µ1 ( ⋃ En ) = µ1 ( ⋃ φ−1 (En )) = ∑ µ1 (φ−1 (En )) = ∑ φ∗ µ1 (En ).

The last part of the statement is trivial.


⁶ As an exercise, fill in this detail yourself! This is a standard ε/2-argument, and it is exactly how one proves the completeness of R out of the axiom that bounded sets have suprema.

Theorem 1.54. Let (Ω1 , M1 , µ1 ) be a measure space, Ω2 a non-empty set,
and φ : Ω1 → Ω2 a map. We define M2 = φ∗ M1 and µ2 = φ∗ µ1 in the above
sense.

(i) If f : Ω2 → [0, ∞] is a measurable function, then


∫_{Ω2} f dµ2 = ∫_{Ω1} f ◦ φ dµ1 .

(ii) If f : Ω2 → C is a µ2 -integrable function, then f ◦ φ is µ1 -integrable,


and we have ∫_{Ω2} f dµ2 = ∫_{Ω1} f ◦ φ dµ1 .

Proof. (i): By definition, a set E ⊆ Ω2 belongs to M2 precisely when


φ−1 (E) ∈ M1 . Since χE ◦ φ = χφ−1 (E) , we have
∫_{Ω2} χE dµ2 = µ2 (E) = µ1 (φ−1 (E)) = ∫_{Ω1} χE ◦ φ dµ1 .

In other words, the claim holds for functions of the form f = χE . By linearity
of the integral, the desired equation holds for all positive measurable step
functions in place of f . Now let f be as general as in the statement, and
write f = supn∈N sn for an increasing sequence of positive measurable step
functions sn : Ω2 → [0, ∞), using Proposition 1.35. Clearly we also have
f ◦ φ = supn∈N sn ◦ φ. Then by Proposition 1.36, we see that
∫_{Ω2} f dµ2 = sup_{n∈N} ∫_{Ω2} sn dµ2 = sup_{n∈N} ∫_{Ω1} sn ◦ φ dµ1 = ∫_{Ω1} f ◦ φ dµ1 .

(ii): By definition, f being µ2 -integrable means that ∫_{Ω2} |f | dµ2 < ∞, which
by part (i) implies ∫_{Ω1} |f ◦ φ| dµ1 < ∞. In other words, f ◦ φ is µ1 -integrable.


The desired equality thus follows directly from part (i), the linearity of the in-
tegral, and the fact that we may write f as a linear combination of integrable
positive functions.

Remark 1.55. In the above theorem, we pushed forward the measure space
structure from Ω1 to get one on Ω2 which makes the statement of the theorem
true. It may of course happen that we have an a priori given measure space
(Ω2 , M2 , µ2 ) and a measurable map φ : Ω1 → Ω2 with the property that
µ1 (φ−1 (E)) = µ2 (E) for all E ∈ M2 . Convince yourself that the statement
of the theorem will still be true!
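To see Proposition 1.53 and Theorem 1.54 at work on a toy example (an added illustration with ad hoc names, not from the notes), one can push a measure on a finite set forward along a map φ and check the change-of-variables identity ∫ f d(φ∗µ) = ∫ f ◦ φ dµ:

def pushforward(mu, phi, omega1):
    # mu: dict of point masses on omega1; phi: a map omega1 -> omega2
    out = {}
    for x in omega1:
        y = phi(x)
        out[y] = out.get(y, 0.0) + mu[x]
    return out

omega1 = [0, 1, 2, 3, 4, 5]
mu = {x: 1.0 for x in omega1}          # counting measure on omega1
phi = lambda x: x % 3                  # a map onto {0, 1, 2}
nu = pushforward(mu, phi, omega1)      # every fiber has two points, so nu = 2 * counting

f = lambda y: y ** 2
lhs = sum(f(y) * m for y, m in nu.items())       # integral of f w.r.t. the push-forward
rhs = sum(f(phi(x)) * mu[x] for x in omega1)     # integral of f o phi w.r.t. mu
print(lhs, rhs)                                  # 10.0 10.0
assert lhs == rhs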

2 Carathéodory’s Construction of Measures
2.1 Measures on Semi-Rings and Rings
Definition 2.1. Let Ω be a non-empty set. A semi-ring on Ω is a set of
subsets A ⊆ 2Ω such that

(1) ∅ ∈ A.

(2) If E, F ∈ A, then E ∩ F ∈ A.

(3) If E, F ∈ A, then E \ F is a finite disjoint union of sets in A.

Example 2.2. The set A of all subsets of Ω with at most one element forms
a semi-ring. More interestingly, if n ≥ 2, then the set of half-open cubes

A = {(a1 , b1 ] × · · · × (an , bn ] | aj , bj ∈ R, aj ≤ bj }

is also a semi-ring on Rn . (This will be justified later.)

Definition 2.3. Let Ω be a non-empty set. A ring over Ω is a set of subsets


R ⊆ 2Ω such that

(1) ∅ ∈ R.

(2) If E, F ∈ R, then E ∪ F ∈ R.

(3) If E, F ∈ R, then E \ F ∈ R.

Remark 2.4. One always has E ∩ F = E \ (E \ F ), so every ring is closed


under forming intersections. In particular, it follows that every ring is also a
semi-ring. The converse is of course not true, e.g., the set of (left) half open
intervals in R is a semi-ring, but not a ring.

Lemma 2.5. Let A be a semi-ring over Ω. Then the smallest ring R over
Ω containing A is given as

R = {E1 ∪ · · · ∪ En | Ej ∈ A are pairwise disjoint} .

We will refer to it as the ring generated by A.

Proof. If R is given as above, then evidently every ring containing A will


also contain R. Hence the claim will follow once we show that R is indeed
a ring. It is first of all clear that ∅ ∈ R. Let E, F ∈ R. We shall first show
E \ F ∈ R.

By definition, we find pairwise disjoint sets E1 , . . . , En ∈ A and pairwise
disjoint sets F1 , . . . , Fm ∈ A such that E = ⋃_{j=1}^{n} Ej and F = ⋃_{ℓ=1}^{m} Fℓ . Then

E \ F = ⋃_{j=1}^{n} ( Ej \ ⋃_{ℓ=1}^{m} Fℓ ) = ⋃_{j=1}^{n} ( ((Ej \ F1 ) \ F2 ) \ . . . \ Fm ).

Since A was assumed to be a semi-ring, the set Ej \ F1 is a finite union of


pairwise disjoint sets in A. Once we know this, it follows again by the semi-
ring property that (Ej \ F1 ) \ F2 is a finite union of pairwise disjoint sets in
A. By induction, it follows that each expression of the form ((Ej \ F1 ) \
F2 ) \ . . . \ Fm is a finite union of disjoint sets in A. These unions are in
turn pairwise disjoint in j, and hence it follows that E \ F ∈ R.
Now from this we also get E ∪F ∈ R because one can write it as a disjoint
union E ∪ F = E ∪ (F \ E), and it is clear that R is closed under disjoint
unions of its elements.
Definition 2.6. Let A be a semi-ring on Ω. A measure on A is a map
µ : A → [0, ∞] such that
(i) µ(∅) = 0.
(ii) If En ∈ A is a sequence of pairwise disjoint sets with ⋃_{n≥1} En ∈ A,
then µ( ⋃_{n≥1} En ) = ∑_{n≥1} µ(En ). (σ-additivity)
In particular, if R is a ring on Ω, then a measure on the ring R is defined as


a measure on R viewed as a semi-ring.
Proposition 2.7. Let A be a semi-ring over Ω, and let R be the ring gen-
erated by A. Then every measure µ0 on A extends uniquely to a measure µ
on R.
Proof. By the definition of R, it is clear that if µ exists, then the σ-additivity
of µ implies that µ is uniquely determined by µ0 . So let us argue why µ exists
in the first place. Let E ∈ R, and write it as E = E1 ∪ · · · ∪ En with pairwise
disjoint sets Ej ∈ A. Define
µ(E) := µ0 (E1 ) + · · · + µ0 (En ).
We have to show that µ is well-defined, and that it is σ-additive. Suppose
that F1 , . . . , Fm ∈ A is another collection of pairwise disjoint sets with E =
F1 ∪ · · · ∪ Fm . Then one has for all j ∈ {1, . . . , n} that
Ej = Ej ∩ E = ⋃_{ℓ=1}^{m} (Ej ∩ Fℓ ),

and this union is a disjoint union. Analogously we have Fℓ = ⋃_{j=1}^{n} (Ej ∩ Fℓ )
for all ℓ ∈ {1, . . . , m}. By using the additivity of µ0 , it follows that

∑_{j=1}^{n} µ0 (Ej ) = ∑_{j=1}^{n} ∑_{ℓ=1}^{m} µ0 (Ej ∩ Fℓ ) = ∑_{ℓ=1}^{m} ∑_{j=1}^{n} µ0 (Ej ∩ Fℓ ) = ∑_{ℓ=1}^{m} µ0 (Fℓ ).

So we see that µ is a well-defined map.


Now let us see why µ is σ-additive. Suppose that En ∈ R is a sequence
of pairwise disjoint sets with E = ⋃_{n≥1} En ∈ R. Write E = A1 ∪ · · · ∪ Am
for pairwise disjoint sets A1 , . . . , Am ∈ A, and moreover write En = An,1 ∪
· · · ∪ An,mn with pairwise disjoint sets An,1 , . . . , An,mn ∈ A for all n ≥ 1 and
some mn ≥ 1. Then applying σ-additivity of µ0 several times, we can see

µ(E) = ∑_{j=1}^{m} µ0 (Aj )
     = ∑_{j=1}^{m} µ0 ( ⋃_{n≥1} (Aj ∩ En ) )
     = ∑_{j=1}^{m} µ0 ( ⋃_{n≥1} ⋃_{k≤mn} (Aj ∩ An,k ) )
     = ∑_{j=1}^{m} ∑_{n=1}^{∞} ∑_{k=1}^{mn} µ0 (Aj ∩ An,k )
     = ∑_{n=1}^{∞} ∑_{j=1}^{m} ∑_{k=1}^{mn} µ0 (Aj ∩ An,k ) = ∑_{n=1}^{∞} µ(En ).

2.2 Outer measures


Definition 2.8. Let Ω be a non-empty set. An outer measure on Ω is a map
ν : 2Ω → [0, ∞] satisfying:

OM1. ν(∅) = 0.

OM2. If E ⊆ F ⊆ Ω, then ν(E) ≤ ν(F ).


OM3. If En ⊆ Ω is any sequence of pairwise disjoint sets, then ν( ⋃_{n≥1} En ) ≤ ∑_{n=1}^{∞} ν(En ). (σ-subadditivity)

Remark 2.9. WARNING! The terminology is a bit confusing. Contrary to


the usual rules of the English language, an “outer measure” in the above sense
does not describe a measure that has an additional “outerness” property.
Instead, it describes a weaker concept than that of a measure.

Proposition 2.10. Let Ω be a non-empty set, R a ring on Ω, and µ a
measure on R. Define for every subset E ⊂ Ω the value

µ∗ (E) = inf { ∑_{n=1}^{∞} µ(En ) | En ∈ R is a sequence with E ⊆ ⋃_{n≥1} En } .⁷

Then µ∗ defines an outer measure on Ω.

⁷ Keep in mind that by convention, inf ∅ := ∞. For certain sets E there may not exist any sequence En with these properties.


Proof. One gets µ∗ (∅) = 0 by choosing En = ∅ for all n. If E ⊆ F ⊆ Ω are two subsets,
then evidently there are at least as many ways to cover E by sequences in R
as for F , which leads directly to µ∗ (E) ≤ µ∗ (F ).
Now let An ⊆ Ω be a sequence of (pairwise disjoint) sets. We shall
show that µ∗ ( ⋃_{n≥1} An ) ≤ ∑_{n=1}^{∞} µ∗ (An ). If the right side is infinite, there is
nothing to show, so let us assume that it is finite. Let ε > 0. For every
n ≥ 1, we may by definition find a sequence En,k ∈ R with An ⊆ ⋃_{k≥1} En,k
and ∑_{k=1}^{∞} µ(En,k ) ≤ µ∗ (An ) + 2−n ε. Then the set ⋃_{n≥1} An is of course covered
by the countably many sets En,k ∈ R for n, k ≥ 1, and hence

µ∗ ( ⋃_{n≥1} An ) ≤ ∑_{n=1}^{∞} ∑_{k=1}^{∞} µ(En,k ) ≤ ∑_{n=1}^{∞} (2−n ε + µ∗ (An )) = ε + ∑_{n=1}^{∞} µ∗ (An ).

Since ε > 0 was arbitrary, this shows the claim.


Definition 2.11. Let ν be an outer measure on a set Ω. We say that a
set E ⊆ Ω is ν-measurable, if for every subset A ⊆ Ω, we have ν(A) =
ν(A ∩ E) + ν(A \ E).
Theorem 2.12 (Carathéodory). Let ν be an outer measure on a set Ω. Then
the ν-measurable subsets of Ω form a σ-algebra M, and the restriction of ν
to M defines a measure.
Proof. Evidently ∅ ∈ M and if E ∈ M, then also E c ∈ M. Let us consider
finite unions. Let E, F ∈ M, and A ⊆ Ω an arbitrary subset. Then

ν(A) = ν( (A ∩ (E ∪ F )) ∪ (A \ (E ∪ F )) )
     ≤ ν(A ∩ (E ∪ F )) + ν(A \ (E ∪ F ))
     = ν( (A ∩ E) ∪ ((A ∩ F ) \ E) ) + ν(A \ (E ∪ F ))
     ≤ ν(A ∩ E) + ν((A \ E) ∩ F ) + ν((A \ E) \ F )
     = ν(A ∩ E) + ν(A \ E) = ν(A).
But from this we get equality everywhere, which implies E ∪ F ∈ M because
A was arbitrary. If additionally E ∩ F = ∅, then we have ν(A ∩ (E ∪ F )) =
ν(A ∩ (E ∪ F ) ∩ E) + ν(A ∩ (E ∪ F ) \ E) = ν(A ∩ E) + ν(A ∩ F ). So inserting
A = E ∪ F yields that ν is finitely additive on M.
From E ∪ F ∈ M it immediately follows that M is closed under intersec-
tions and differences. Hence M becomes a σ-algebra if we can show that for
all sequences of pairwise disjoint sets En ∈ M, we have E = ⋃_{n≥1} En ∈ M.
For m ≥ 1 and all sets A ⊆ Ω, we have

ν(A) = ν( A ∩ ⋃_{n≤m} En ) + ν( A \ ⋃_{n≤m} En )
     = ∑_{n=1}^{m} ν(A ∩ En ) + ν( A \ ⋃_{n≤m} En )
     ≥ ∑_{n=1}^{m} ν(A ∩ En ) + ν(A \ E).


Since m was arbitrary, we get ν(A) ≥ ∑_{n=1}^{∞} ν(A ∩ En ) + ν(A \ E). On the other
hand, the equality A = (A \ E) ∪ (A ∩ E) = (A \ E) ∪ ⋃_{n≥1} (A ∩ En ) together
with σ-subadditivity yields

ν(A) ≤ ν(A ∩ E) + ν(A \ E) ≤ ∑_{n=1}^{∞} ν(A ∩ En ) + ν(A \ E) ≤ ν(A).

In particular we get the equality ν(A) = ν(A ∩ E) + ν(A \ E), which yields
E ∈ M as A was arbitrary. Furthermore, if we insert A = E, then we
also have ν(E) = ∑_{n=1}^{∞} ν(En ), which shows that ν is σ-additive on M. In

particular it is indeed a measure when restricted to M.

Definition 2.13. A measure space (Ω, M, µ) is called complete, if for all sets
E ⊆ F ⊆ Ω, one has that F ∈ M and µ(F ) = 0 implies E ∈ M.

Proposition 2.14. Let Ω be a non-empty set and ν an outer measure on Ω.


Let M be the σ-algebra of ν-measurable sets, and define the measure µ = ν|M .
Then (Ω, M, µ) is a complete measure space.

Proof. Suppose E ⊆ F ∈ M are given with µ(F ) = 0. Then we observe for


all A ⊆ Ω that
ν(A) ≤ ν(A ∩ E) + ν(A \ E)
≤ ν(A ∩ F ) + ν(A \ E)
≤ 0 + ν(A).
So we see that these are all equalities. Since A was arbitrary, this implies
that E ∈ M.

Theorem 2.15. Let R be a ring on a set Ω, and µ a measure on R. Then
the σ-algebra M of µ∗ -measurable sets contains R, and we have µ∗ |R = µ.
Proof. We see right away for all E ∈ R that µ∗ (E) ≤ µ(E) since we can write
E = E ∪ ∅ ∪ ∅ ∪ . . . . On the other hand, if En ∈ R is any sequence of sets
with E ⊆ ⋃_{n≥1} En , then it follows from σ-subadditivity and monotonicity of
µ that

µ(E) = µ( ⋃_{n≥1} (E ∩ En ) ) ≤ ∑_{n=1}^{∞} µ(E ∩ En ) ≤ ∑_{n=1}^{∞} µ(En ).

By taking the infimum over all possible choices of such sequences, we arrive
at µ(E) ≤ µ∗ (E), and hence µ(E) = µ∗ (E). Since E ∈ R was arbitrary, we have just shown µ∗ |R = µ.
Now we need to show that every set E ∈ R is µ∗ -measurable. Let A ⊆ Ω
be any set. Since we always have µ∗ (A) ≤ µ∗ (A ∩ E) + µ∗ (A \ E), we may
assume without loss of generality that µ∗ (A) < ∞. Let An ∈ R be a sequence
of sets with A ⊆ ⋃_{n≥1} An . Then

µ∗ (A) ≤ µ∗ (A ∩ E) + µ∗ (A \ E)
       ≤ ∑_{n=1}^{∞} [ µ(An ∩ E) + µ(An \ E) ]
       = ∑_{n=1}^{∞} µ(An ).

If we take the infimum over all possible such sequences An , then the right
side approaches the value µ∗ (A), and hence we get the equality µ∗ (A) =
µ∗ (A ∩ E) + µ∗ (A \ E). Since A ⊆ Ω and E ∈ R were arbitrary, this finishes
the proof.
Definition 2.16. Let A be a semi-ring over Ω, and µ a measure on A. We
say that µ is σ-finite, if there is a sequence En ∈ A with µ(En ) < ∞ and
Ω = ⋃_{n≥1} En .

Theorem 2.17. Let R be a ring on a set Ω, and µ a σ-finite measure on R.


Let M0 be the σ-algebra generated by R. Then the measure extension µ∗ |M0
from R to M0 is the unique measure on M0 extending µ on R.
Proof. Suppose that µ1 is any measure on M0 with µ1 |R = µ. We claim
µ1 ≤ µ∗ |M0 . Let E ∈ M0 , and assume without loss of generality µ∗ (E) < ∞.
So in particular there exist sequences En ∈ R with E ⊆ ⋃_{n≥1} En . By σ-
subadditivity of µ1 , it follows that µ1 (E) ≤ ∑_{n=1}^{∞} µ(En ). But since this
holds for any choice of (En )n , we obtain µ1 (E) ≤ µ∗ (E).


Since we assumed µ to be σ-finite, we can choose some sequence Fn ∈ R
with µ(Fn ) < ∞ and Ω = ⋃_{n≥1} Fn . Without loss of generality we may assume
that Fn ⊆ Fn+1 for all n ≥ 1. For every A ∈ M0 we have


µ1 (Fn ∩ A) + µ1 (Fn \ A) = µ1 (Fn ) = µ∗ (Fn ) = µ∗ (Fn ∩ A) + µ∗ (Fn \ A).

Note that all these summands are finite, and we know from above µ1 (Fn ∩
A) ≤ µ∗ (Fn ∩ A). It follows that necessarily µ1 (Fn ∩ A) = µ∗ (Fn ∩ A). By
taking the supremum over n, we conclude µ1 (A) = µ∗ (A).
Corollary 2.18. Let A be a semi-ring over a set Ω, and µ0 a measure on
A. Let M be the σ-algebra generated by A. Then there exists a measure
µ : M → [0, ∞] extending µ0 . If µ0 is σ-finite, then µ is unique.
Proof. We note that if R is the ring generated by A, then M is also the
σ-algebra generated by R. So this is a direct consequence of Proposition 2.7,
Proposition 2.10, Theorem 2.12, and Theorem 2.15. In the case of µ0 being
σ-finite, uniqueness of µ is exactly Theorem 2.17.

2.3 Application: The Lebesgue measure on R


Notation. In what follows, we will fix a bijection φ on a set Ω. We will also
denote by φ the induced bijection on 2Ω , which associates to every subset
E ⊆ Ω its image φ(E) under φ.
Proposition 2.19. Let A be a semi-ring on a set Ω, and let µ0 : A → [0, ∞]
be a σ-finite measure. Let R be the ring generated by A, and µ : R → [0, ∞]
the unique extension to a measure. Suppose that φ is a bijection on Ω that
restricts to a bijection on A, and suppose that µ0 = µ0 ◦ φ. Then φ restricts
to a bijection on R, and µ = µ ◦ φ.
Proof. This is immediate from the definition of both R and µ, and is left as
an exercise.
Proposition 2.20. Let R be a ring on a set Ω, and let µ : R → [0, ∞] be a
measure. Suppose that φ is a bijection on Ω that restricts to a bijection on R,
and suppose that µ = µ ◦ φ. Then the outer measure µ∗ satisfies µ∗ ◦ φ = µ∗ .
Moreover, φ restricts to a bijection on the σ-algebra of µ∗ -measurable subsets
of Ω.
Proof. Let A ⊆ Ω be an arbitrary subset, and let En ⊆ Ω be any sequence of
sets. Then clearly A ⊆ ⋃_{n≥1} En if and only if φ(A) ⊆ ⋃_{n≥1} φ(En ). If φ defines
a bijection on R, then also En ∈ R if and only if φ(En ) ∈ R. Furthermore


we have from assumption that µ(φ(En )) = µ(En ). By the definition of µ∗ ,
we immediately get µ∗ (A) = µ∗ (φ(A)).
Now assume additionally that E is µ∗ -measurable. Then we have
µ∗ (A) = µ∗ (φ−1 (A))
= µ∗ (φ−1 (A) ∩ E) + µ∗ (φ−1 (A) \ E)
= µ∗ (A ∩ φ(E)) + µ∗ (A \ φ(E)).

Since A is arbitrary, it follows that φ(E) is µ∗ -measurable. The reverse
argument shows that if φ(E) is µ∗ -measurable, then E was µ∗ -measurable to
begin with. This finishes the proof.

Proposition 2.21. Consider the set A ⊆ 2R of half-open intervals A =
{(a, b] | a, b ∈ R, a ≤ b}.⁸ Then A is a semi-ring, and the map µ0 : A →
[0, ∞] given by µ0 ((a, b]) = b − a is a measure on A.

⁸ By convention, we set (a, b] := ∅ if a ≥ b.
Proof. Evidently ∅ ∈ A. We have (a, b] ∩ (c, d] = (max(a, c), min(b, d)],
so A is closed under intersections. The set-difference is given as a disjoint
union (a, b] \ (c, d] = (a, min(b, c)] ∪ (max(a, d), b], so we see that A is a semi-ring. Evidently
µ0 (∅) = 0, so we only need to show the σ-additivity.
Suppose that E = (a, b] ∈ A, and let En = (an , bn ] ∈ A be a sequence
of pairwise disjoint sets with E = ⋃_{n≥1} En . Given any M ≥ 1, we have in
particular ⋃_{n=1}^{M} En ⊂ E. By reordering En for 1 ≤ n ≤ M and discarding
any empty sets if necessary, we may assume bn ≤ an+1 for all n < M . This
leads to

∑_{n=1}^{M} µ0 (En ) = ∑_{n=1}^{M} (bn − an ) ≤ bM − aM + ∑_{n=1}^{M−1} (an+1 − an ) = bM − a1 ≤ b − a = µ0 (E).

Since M is arbitrary, this leads to ∑_{n=1}^{∞} µ0 (En ) ≤ µ0 (E). Let ε > 0 with
ε < b − a. Then in particular

[a + ε, b] ⊂ (a, b] = ⋃_{n≥1} (an , bn ] ⊂ ⋃_{n≥1} (an , bn + 2−n ε).

The right-hand side is an open covering of the compact set on the left side,
and hence there is some N ≥ 1 such that [a + ε, b] ⊂ ⋃_{n=1}^{N} (an , bn + 2−n ε). We

change the ordering of the intervals appearing in this union by the following
inductive procedure: Choose k1 ∈ {1, . . . , N } to be the index so that
ak1 = max { aj | a + ε ∈ (aj , bj + 2−j ε) } .

If b < bk1 + 2−k1 ε, then the procedure stops here. Otherwise, choose k2 ∈
{1, . . . , N } to be the index so that
ak2 = max { aj | bk1 + 2−k1 ε ∈ (aj , bj + 2−j ε) } .

If b < bk2 +2−k2 ε, then the procedure stops here. Otherwise one continues in-
ductively until the procedure stops after L ≤ N steps. This yields an injective
map k : {1, . . . , L} → {1, . . . , N } such that [a + ε, b] ⊂ ⋃_{n=1}^{L} (akn , bkn + 2−kn ε)
and such that for all n < L, we have akn+1 < bkn + 2−kn ε. From this we can
deduce (with the help of the first part of the proof)

b − a − ε < bkL + 2−kL ε − ak1
          = bkL + 2−kL ε − akL + ∑_{n=1}^{L−1} (akn+1 − akn )
          < bkL + 2−kL ε − akL + ∑_{n=1}^{L−1} (bkn + 2−kn ε − akn )
          = ∑_{n=1}^{L} (bkn + 2−kn ε − akn )
          ≤ ε + ∑_{n=1}^{N} (bn − an )
          ≤ ε + b − a.

From this we may conclude


b − a ≤ 2ε + ∑_{n=1}^{N} (bn − an ) ≤ 2ε + ∑_{n=1}^{∞} µ0 (En ).

Since ε > 0 was arbitrary, this finally implies µ0 (E) = ∑_{n=1}^{∞} µ0 (En ) and
finishes the proof.
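The interval manipulations used in this proof are easy to mirror in code. The sketch below (an added illustration; names are ad hoc) represents (a, b] as a pair, implements the intersection and set-difference formulas from the beginning of the proof, and checks finite additivity of µ0 on the resulting disjoint decomposition:

def length(I):
    # mu_0((a, b]) = b - a, with (a, b] empty whenever a >= b
    a, b = I
    return max(b - a, 0.0)

def intersect(I, J):
    # (a, b] ∩ (c, d] = (max(a, c), min(b, d)]
    (a, b), (c, d) = I, J
    return (max(a, c), min(b, d))

def difference(I, J):
    # (a, b] \ (c, d] = (a, min(b, c)] ∪ (max(a, d), b], a disjoint union
    (a, b), (c, d) = I, J
    return [(a, min(b, c)), (max(a, d), b)]

E, F = (0.0, 10.0), (3.0, 5.0)
parts = difference(E, F) + [intersect(E, F)]     # disjoint decomposition of E
assert abs(sum(length(p) for p in parts) - length(E)) < 1e-12
print([p for p in parts if length(p) > 0])       # [(0.0, 3.0), (5.0, 10.0), (3.0, 5.0)]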

Theorem 2.22 (Lebesgue). The measure µ0 : A → [0, ∞] defined on the


semi-ring of half-open intervals A ⊂ 2R extends to a translation-invariant
measure λ : L → [0, ∞] on the σ-algebra L ⊂ 2R of Lebesgue-measurable
sets, which contains the Borel σ-algebra of R. On the Borel σ-algebra, λ is
the unique measure extending µ0 , and in fact the unique translation-invariant
measure with λ((0, 1]) = 1.

Proof. We use Proposition 2.7 and extend µ0 to a measure µ : R → [0, ∞]


on the ring R ⊂ 2R generated by A. We consider the outer measure µ∗ on
R as in Proposition 2.10. We appeal to Theorem 2.12 and define L as the
σ-algebra of µ∗ -measurable sets (which we call Lebesgue-measurable), which
contains A. Since A generates the Borel σ-algebra, it follows that L contains
the Borel σ-algebra. By Theorem 2.15, the measure λ = µ∗ |L indeed extends
µ on A.
Let us argue why λ is translation-invariant. Let t ∈ R. Consider the
bijection φ on R given by φ(a) = t + a. Then evidently φ restricts to a
bijection on A, and we have µ = µ ◦ φ. By combining Proposition 2.19 and
Proposition 2.20, it follows that φ restricts to a bijection on L, and λ = λ ◦ φ.

In other words, we have λ(E + t) = λ(E) for every Lebesgue-measurable set
E ∈ L. Since t ∈ R is arbitrary, this shows the claim.
Lastly, the measure µ0 on A is σ-finite. Since A generates the Borel σ-
algebra, the uniqueness of the measure follows from Theorem 2.17. On the
other hand, if we have a translation-invariant measure µ with µ((0, 1]) = 1,
then it is easy to see that µ((0, 1/n]) = 1/n for all n ≥ 1, which one can use
to show that µ agrees with λ on all sets in A with rational endpoints. If
a, b ∈ R with a < b, pick some number c ∈ Q strictly between them. Choose
decreasing sequences of rational numbers an , bn ∈ Q such that an → a and
bn → b. Then
(a, c] = ⋃_{n≥1} (an , c],    (c, b] = ⋂_{n≥1} (c, bn ].

By the continuity of measures, we can conclude that µ((a, b]) = b − a, so µ


agrees with λ on all of A, hence µ = λ on the Borel σ-algebra.

Remark 2.23. WARNING! The σ-algebra of Lebesgue sets is indeed big-


ger than the Borel σ-algebra.⁹ Nevertheless, one sometimes refers to the
Lebesgue measure to mean its restriction on the Borel σ-algebra, and de-
notes that also by λ. In the exercise sessions, we have already discussed an
example of a set A ⊂ [0, 1] that is not even Lebesgue-measurable.

2.4 Application: Lebesgue–Stieltjes Measures


Definition 2.24. Let X be a locally compact, σ-compact Hausdorff space.
A Borel measure on X is a measure on the Borel σ-algebra that assigns a
finite value to every compact subset of X. In the special case X = R, these
are called Lebesgue–Stieltjes measures.

Proposition 2.25. Let µ be a Lebesgue–Stieltjes measure. Then there exists


a unique increasing right-continuous function F : R → R such that F (0) = 0
and µ((a, b]) = F (b) − F (a) for all a, b ∈ R, a < b.

Proof. From these properties it follows that if such a function F exists at all,
then it has to be given by the formula

    F (t) = µ((0, t]) ,   t > 0
            0 ,           t = 0
            −µ((t, 0]) ,  t < 0.

9
This is not so easy to see, but an example is discussed here, for whoever is interested:
https://www.math3ma.com/blog/lebesgue-but-not-borel.

We claim that this function has indeed the right properties. Let a, b ∈ R
with a < b. We aim to show µ((a, b]) = F (b) − F (a). If a ≥ 0, then
µ((a, b]) = µ((0, b] \ (0, a]) = µ((0, b]) − µ((0, a]) = F (b) − F (a). If b < 0,
we can prove this analogously. If a < 0 ≤ b, then we have µ((a, b]) =
µ((a, 0] ∪ (0, b]) = µ((a, 0]) + µ((0, b]) = F (b) − F (a).
The fact that F is increasing follows immediately from the fact that µ
has nonnegative values. The right-continuity follows from

    lim_{n→∞} F (a + εn) = lim_{n→∞} µ((0, a + εn]) = µ((0, a]) = F (a)

for any sequence εn > 0 with εn → 0. Here we used the continuity property
of µ as a measure with respect to countable decreasing intersections.

Theorem 2.26. The assignment µ 7→ F , which assigns to every Lebesgue–
Stieltjes measure its increasing right-continuous function as in Proposition 2.25,
is a bijection. In particular, whenever F : R → R is an increasing right-
continuous function with F (0) = 0, there exists a unique Lebesgue–Stieltjes
measure µ such that µ((a, b]) = F (b) − F (a) for all a, b ∈ R with a < b.

Proof. We have already seen that every Lebesgue–Stieltjes measure gives an increasing right-continuous function with these properties. Furthermore, the assignment µ ↦ F is injective. This is because F uniquely determines how
µ is defined on the semi-ring of half-open intervals. Since µ restricted to this
semi-ring is σ-finite, it follows from Theorem 2.17 that µ is hence uniquely
determined by F .
It remains to be shown that for every choice of F , there exists a corre-
sponding Lebesgue–Stieltjes measure. Indeed, let us define A as the semi-ring
of bounded half-open intervals. As in Proposition 2.21, we define µ : A →
[0, ∞] via µ((a, b]) = F (b) − F (a). For σ-additivity, assume E = (a, b] can
be expressed as the disjoint union of En = (an , bn ]. If M ≥ 1 is any number
and we reorder the En for n ≤ M to ensure bn ≤ an+1 , then it follows from
the fact that F is increasing that
    Σ_{n=1}^{M} µ(En) = Σ_{n=1}^{M} (F (bn) − F (an))
                      ≤ F (bM) − F (aM) + Σ_{n=1}^{M−1} (F (an+1) − F (an))
                      = F (bM) − F (a1)
                      ≤ F (b) − F (a) = µ(E).

Since M was arbitrary, we have Σ_{n=1}^{∞} µ(En) ≤ µ(E).

For the reverse inequality, choose ε > 0. We use right-continuity to pick for every n ≥ 1 a small δn > 0 such that F (bn + δn) − F (bn) ≤ 2^{−n} ε. Then

    [a + ε, b] ⊂ E = ⋃_{n=1}^{∞} En ⊂ ⋃_{n=1}^{∞} (an, bn + δn).

Since the left side is compact, we obtain some N ≥ 1 such that [a + ε, b] ⊂ ⋃_{n=1}^{N} (an, bn + δn). Now repeating exactly the same argument as in the proof of Proposition 2.21, we may deduce F (b) − F (a + ε) ≤ ε + Σ_{n=1}^{∞} (F (bn) − F (an)). Letting ε → 0, we obtain the σ-additivity for µ.
The rest of the claim follows from the Carathéodory construction, exactly
as in the proof of Theorem 2.22, see also Corollary 2.18.
Example 2.27. For the choice F = idR , we recover the Lebesgue measure. On the other hand, if a > 0 is some chosen number and we set

    F (t) = 0 ,  t < a
            1 ,  t ≥ a,

then one can show that we recover the Dirac measure µ = δa . For a ≤ 0 one also recovers this measure with the function

    F (t) = −1 ,  t < a
            0 ,   t ≥ a.

More generally, if µ is the measure corresponding to an increasing right-continuous function F , one can show that µ({a}) = 0 if and only if F is continuous at a.
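To make the correspondence between F and µ concrete, here is a small Python sketch (an illustration only, not part of the formal development; the numbers are arbitrary). It evaluates µ((a, b]) = F (b) − F (a) for the two choices of F above and checks that the step function reproduces the Dirac measure on half-open intervals.

    def interval_measure(F, a, b):
        """Mass of the half-open interval (a, b] under the Lebesgue-Stieltjes measure of F."""
        if a >= b:          # convention: (a, b] is the empty set
            return 0.0
        return F(b) - F(a)

    # F = identity recovers the Lebesgue measure (the length of the interval).
    identity = lambda t: t
    print(interval_measure(identity, 0.0, 2.5))   # 2.5

    # A step function jumping at a = 1 recovers the Dirac measure delta_1.
    a0 = 1.0
    step = lambda t: 0.0 if t < a0 else 1.0
    print(interval_measure(step, 0.0, 2.0))       # 1.0, since 1 lies in (0, 2]
    print(interval_measure(step, 1.0, 2.0))       # 0.0, since (1, 2] does not contain 1
    print(interval_measure(step, 0.5, 1.0))       # 1.0, since 1 lies in (0.5, 1]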

2.5 Product measures
Definition 2.28. Let (Ωi , Mi ) be two measurable spaces for i = 1, 2. Then
the product σ-algebra M1 ⊗ M2 on Ω1 × Ω2 is defined as the σ-algebra
generated by all sets of the form E1 × E2 for E1 ∈ M1 and E2 ∈ M2 , the
so-called measurable rectangles in Ω1 × Ω2 .
Proposition 2.29. Let (Ωi , Mi , µi ) be two measure spaces for i = 1, 2. Then
the set of measurable rectangles

A = {E1 × E2 | E1 ∈ M1 , E2 ∈ M2 } ⊆ 2Ω1 ×Ω2

is a semi-ring. Furthermore, the map µ0 : A → [0, ∞] given by µ0 (E1 ×E2 ) =
µ1 (E1 )µ2 (E2 ) is a measure on A.

Proof. This is proved in the exercise sessions.

Theorem 2.30. Let (Ωi , Mi , µi ) be two measure spaces for i = 1, 2. Then
the measure µ0 on the measurable rectangles extends to a measure on the
product σ-algebra µ1 ⊗ µ2 : M1 ⊗ M2 → [0, ∞].

Proof. This is a special case of Corollary 2.18.

Definition 2.31. Let d ≥ 1. Then the Lebesgue measure on Rd is defined
as the d-fold product measure

    λ(d) = λ ⊗ λ ⊗ · · · ⊗ λ    (d times)

with respect to the measure space (R, L, λ). The Lebesgue σ-algebra L(d)
on Rd is the one consisting of all λ(d) -measurable sets in the sense of Defi-
nition 2.11, which contains the Borel σ-algebra. If the dimension d is clear
from context, we may sometimes slightly abuse notation and just write λ for
the Lebesgue measure on Rd .

Corollary 2.32. The Lebesgue measure λ(d) : L(d) → [0, ∞] is translation
invariant for all d ≥ 1.

Proof. We already know that the Lebesgue measure on R is translation in-
variant, so there is nothing to prove when d = 1. We proceed by induction
and assume that d ≥ 2 is a number so that λ(d−1) is translation invariant.
Let t = (t1 , . . . , td ) = (t′ , td ) ∈ Rd = Rd−1 × R. Denote by µ0 the product
measure defined on the measurable rectangles. If E ∈ L(d−1) and F ∈ L are
measurable, then (E × F ) + t = (E + t′ ) × (F + td ), and so µ0 ((E × F ) + t) =
λ(d−1) (E + t′ )λ(F + td ) = λ(d−1) (E)λ(F ) = µ0 (E × F ). It now follows directly
from Proposition 2.19 and Proposition 2.20 that λ(d) (A + t) = λ(d) (A) for all
A ∈ L(d) , concluding the proof.
We conclude this section with Fubini’s theorem, which is a fundamental
result that tells us how to compute integrals over product measures.

Theorem 2.33 (Tonelli; see exercises). Let (Ωi , Mi , µi ) be two σ-finite mea-
sure spaces for i = 1, 2. Let f : Ω1 × Ω2 → [0, ∞] be a M1 ⊗ M2 -measurable
function. Denote fx = f (x, _) : Ω2 → [0, ∞] and f y = f (_, y) : Ω1 → [0, ∞].
Then:
(a) For every x ∈ Ω1 , the function fx is M2 -measurable.

(b) For every y ∈ Ω2 , the function f y is M1 -measurable.

(c) The function Ω1 → [0, ∞] given by x ↦ ∫_{Ω2} fx dµ2 is M1 -measurable.

(d) The function Ω2 → [0, ∞] given by y ↦ ∫_{Ω1} f^y dµ1 is M2 -measurable.

(e) One has the equalities

    ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} ( ∫_{Ω2} fx dµ2 ) dµ1(x) = ∫_{Ω2} ( ∫_{Ω1} f^y dµ1 ) dµ2(y).

Theorem 2.34 (Fubini). Let (Ωi , Mi , µi ) be two σ-finite measure spaces for
i = 1, 2. Let f : Ω1 × Ω2 → C be a M1 ⊗ M2 -measurable function. Denote
fx = f (x, _) : Ω2 → C and f y = f (_, y) : Ω1 → C. Then the following are
equivalent:
(A) ∫_{Ω1×Ω2} |f| d(µ1 ⊗ µ2) < ∞;

(B) ∫_{Ω1} ( ∫_{Ω2} |fx| dµ2 ) dµ1(x) < ∞;

(C) ∫_{Ω2} ( ∫_{Ω1} |f^y| dµ1 ) dµ2(y) < ∞.

If any (or every) one of these statements holds, then we have that

(a) The function Ω1 → C given by x ↦ ∫_{Ω2} fx dµ2 is well-defined on a conull set and µ1 -integrable.

(b) The function Ω2 → C given by y ↦ ∫_{Ω1} f^y dµ1 is well-defined on a conull set and µ2 -integrable.

(c) One has the equalities

    ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} ( ∫_{Ω2} fx dµ2 ) dµ1(x) = ∫_{Ω2} ( ∫_{Ω1} f^y dµ1 ) dµ2(y).

Proof. It follows from Tonelli’s theorem that the integrals appearing in (A),
(B), and (C) are always the same, so their finiteness is indeed equivalent.
Now let us assume that these statements are true. By splitting f up into its
real and imaginary parts, let us assume without loss of generality that f is
a real function.
By part (c) in Tonelli’s theorem and Proposition 1.38, it follows that the
set E of all x ∈ Ω1 , for which fx is integrable, is in M1 and its complement
is a null set. Let f = f + − f − be the canonical decomposition into positive
functions as in Remark 1.39. Then it is very easy to see that (f + )x = (fx )+
and (f − )x = (fx )− . So fx+ and fx− are both integrable whenever x ∈ E.
Moreover the map
    E ∋ x ↦ ∫_{Ω2} fx dµ2 = ∫_{Ω2} fx^+ dµ2 − ∫_{Ω2} fx^− dµ2

is M1 -measurable by part (c) in Tonelli’s theorem, and this function is in
fact µ1 -integrable because
    ∫_{Ω1} | ∫_{Ω2} fx dµ2 | dµ1(x) ≤ ∫_{Ω1} ( ∫_{Ω2} |fx| dµ2 ) dµ1(x) < ∞.

Hence it follows that

    ∫_{Ω1} ( ∫_{Ω2} fx dµ2 ) dµ1(x)
      = ∫_{Ω1} χE ( ∫_{Ω2} fx dµ2 ) dµ1(x)
      = ∫_{Ω1} χE ( ∫_{Ω2} fx^+ dµ2 − ∫_{Ω2} fx^− dµ2 ) dµ1(x)
      = ∫_{Ω1} χE ( ∫_{Ω2} fx^+ dµ2 ) dµ1(x) − ∫_{Ω1} χE ( ∫_{Ω2} fx^− dµ2 ) dµ1(x)
      = ∫_{Ω1} ( ∫_{Ω2} fx^+ dµ2 ) dµ1(x) − ∫_{Ω1} ( ∫_{Ω2} fx^− dµ2 ) dµ1(x)
      = ∫_{Ω1×Ω2} f^+ d(µ1 ⊗ µ2) − ∫_{Ω1×Ω2} f^− d(µ1 ⊗ µ2)
      = ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2).

The remaining equality follows exactly in the same way by exchanging the
roles of Ω1 and Ω2 .

2.6 Infinite products of probability measures
Lemma 2.35. Let Ω be a non-empty set and A ⊆ 2Ω a semi-ring with
Ω ∈ A. Suppose that µ : A → [0, 1] is a map with µ(Ω) = 1. Then µ is a
measure if and only if for every sequence An ∈ A of pairwise disjoint sets
with Ω = ⋃_{n≥1} An , one has Σ_{n=1}^{∞} µ(An ) = 1.

Proof. Clearly any measure must satisfy this property, so the “only if” part
is clear.
For the “if” part, we may first take A1 = Ω and An = ∅ for all n ≥
2, which immediately implies µ(∅) = 0. We need to show that µ is σ-
additive. Let Bn ∈ A be a sequence of pairwise disjoint sets such that
B = ⋃_{n≥1} Bn ∈ A. As A is a semi-ring, we have Ω \ B = A1 ∪ · · · ∪ Aℓ

for pairwise disjoint sets A1 , . . . , Aℓ ∈ A. Then the assumption implies on the one hand that 1 = µ(B) + Σ_{n=1}^{ℓ} µ(An ). On the other hand, if we set Aℓ+k = Bk for k ≥ 1, then the sequence (An )n≥1 defines a pairwise disjoint covering of Ω, so 1 = Σ_{n=1}^{∞} µ(An ). Comparing these two equations, we see that µ(B) = Σ_{n>ℓ} µ(An ) = Σ_{n=1}^{∞} µ(Bn ), which shows our claim.

Definition 2.36. Let I be a non-empty index set and (Ωi , Mi , µi ) a proba-
bility space for every i ∈ I. We denote Ω = i∈I Ωi , and set
Q

    A = { ∏_{i∈I} Ei | Ei ∈ Mi for all i ∈ I, Ei = Ωi for all but finitely many i ∈ I } .

The σ-algebra generated by A is denoted by ⊗_{i∈I} Mi . We consider the map µ : A → [0, 1] given by µ(∏_{i∈I} Ei ) = ∏_{i∈I} µi (Ei ). Note that these products are well-defined as all but finitely many factors are assumed to be 1.

Theorem 2.37. Adopt the notation from the above definition. Then A is a
semi-ring and µ is a measure on A. In particular, it extends uniquely to a
probability measure µ on ⊗_{i∈I} Mi . One also denotes µ = ⊗_{i∈I} µi .

Proof. The “in particular” part is due to Corollary 2.18. By definition, every
set in A is a product of subsets of the spaces Ωi which are proper subsets
only over finitely many indices. In particular, given any sequence An , there
are countably many indices in I so that over every other index i ∈ I, the
projection of every set An to the i-th coordinate yields Ωi . Considering the
semi-ring axioms for A and the axioms of being a measure for µ, we can see
that it is enough to consider the case where I is countable. As we already
know that the claim is true if I is finite, we may from now on assume I = N.
The fact that A is a semi-ring now follows directly from the exercises. We
hence need to show that µ is a measure on A, where our goal is to appeal
to the condition in the above lemma. Suppose that An ∈ A is a sequence
of pairwise disjoint sets with Ω = ⋃_{n≥1} An . It is our intention to show Σ_{n=1}^{∞} µ(An ) = 1.

For each n ≥ 1 let us write An = ∏_{i=1}^{∞} An,i and pick in ≥ 1 such that An,i = Ωi whenever i > in . For all m, n ∈ N and ω = (ωi )i ∈ Am we claim to
have the equation

    ∏_{i≤im} χ_{An,i}(ωi ) · ∏_{i>im} µi (An,i ) = δn,m .

Indeed, if n = m, then all the involved factors are equal to one, so the
equation holds. Assume n ≠ m. We observe χ_{Ak}((ωi )i ) = ∏_{i≤ik} χ_{Ak,i}(ωi ). As 1 = Σ_{k=1}^{∞} χ_{Ak} , we conclude for (ωi )i ∈ Am that 0 = ∏_{i≤in} χ_{An,i}(ωi ). So either in ≤ im , in which case the desired equation above follows immediately. Or,
if in > im , then every tuple of the form (ω1 , . . . , ωim , αim +1 , . . . ) is also in Am ,
so analogously

    0 = ∏_{i≤im} χ_{An,i}(ωi ) · ∏_{i=im+1}^{in} χ_{An,i}(αi ).

By consecutively integrating over αi for i = im + 1, . . . , in , we recover the equation we wish to show. Finally, let us assume that Σ_{n=1}^{∞} µ(An ) ≠ 1. We have

    Σ_{n=1}^{∞} µ(An ) = Σ_{n=1}^{∞} ∏_{i=1}^{∞} µi (An,i )
                       = Σ_{n=1}^{∞} ∫_{Ω1} χ_{An,1}(ω1 ) · ∏_{i=2}^{∞} µi (An,i ) dµ1 (ω1 )
                       = ∫_{Ω1} Σ_{n=1}^{∞} ( χ_{An,1}(ω1 ) · ∏_{i=2}^{∞} µi (An,i ) ) dµ1 (ω1 ).

As this quantity is not 1, there must be some ω1 ∈ Ω1 for which

    Σ_{n=1}^{∞} χ_{An,1}(ω1 ) · ∏_{i=2}^{∞} µi (An,i ) ≠ 1.

By iterating this process, we can come up with a tuple ω = (ωi )i ∈ Ω
satisfying
    Σ_{n=1}^{∞} [ ∏_{i≤k} χ_{An,i}(ωi ) · ∏_{i>k} µi (An,i ) ] ≠ 1

for all k ≥ 1. However, since ω ∈ Am for some m, it follows from our previous
observation for k = im that precisely one of these summands is 1 and the
others are zero. This leads to a contradiction. Hence we have indeed that
Σ_{n=1}^{∞} µ(An ) = 1.

3 Probability
3.1 Probability spaces and random variables
Definition 3.1. A probability space is a measure space (Ω, M, µ) with
µ(Ω) = 1. In this context,
• µ is called a probability measure,

• Ω is called a sample space,

• the elements of Ω are called outcomes,

• the elements of M are called events,

• the value µ(A) is called the probability of (the event) A. (To be even
more suggestive, one often writes P instead of µ.)
The following easy special case illustrates how the most elementary prob-
abilistic experiments, such as tossing a coin only finitely many times, can be
thought of in this framework. We omit the easy proof.
Proposition 3.2. Let Ω be a non-empty countable set and M = 2Ω . Let p : Ω → [0, 1] be any map with the property Σ_{ω∈Ω} p(ω) = 1. Then the assignment 2Ω ∋ A ↦ P(A) = Σ_{ω∈A} p(ω) defines a probability measure. Conversely, if P is a probability measure on Ω, then p(ω) = P({ω}) defines a map with Σ_{ω∈Ω} p(ω) = 1.
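For instance, here is a tiny Python sketch of the countable (in fact finite) case (an illustration only, not part of the notes): a probability mass function p on a finite sample space determines P(A) by summation over the event.

    p = {k: 1/6 for k in range(1, 7)}   # a fair six-sided die

    def P(event):
        """P(A) = sum of p(omega) over omega in the event A."""
        return sum(p[omega] for omega in event)

    print(P({2, 4, 6}))        # 0.5, the event "the outcome is even"
    print(P(set(p)))           # 1.0, the whole sample space Omega
    print(P(set()))            # 0.0, the empty event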

For more involved probabilistic questions, the sample space is not neces-
sarily countable, which makes it somewhat more difficult to come up with
the right choices of events and probability measures. The following can be
seen as a model for tossing a coin infinitely often.
Example 3.3 (infinite coin tossing). For each toss, a coin can only come
up as heads or tails, which we conveniently denote as the outcomes 0 and 1.
Since we want to model tossing the coin infinitely often, the outcomes are
sequences having value 0 or 1. This gives rise to the sample space Ω = {0, 1}N .
A rather obvious example of an event is when the first k tosses are equal
to some k-tuple ω ∈ {0, 1}^k . The associated subset of Ω is given as Aω = {ω} × {0, 1}^{N>k} . For example, A1 is the event where the first coin toss comes
up as tails, or A0,1,1 is the event where the first three tosses come up as heads
→ tails → tails. Let M be the event σ-algebra generated by all such sets.
We note that a lot of other natural choices for events are automatically in
M. For example, the event that the n-th coin toss comes up as heads is given
by

    {0, 1}^{n−1} × {0} × {0, 1}^{N>n} = ⋃_{ω∈{0,1}^{n−1}} A(ω,0) .

As for determining the probability, we want all of our coin tosses to be


fair, meaning that there is always an equal chance that either heads or tails
comes up. In particular, the coin tosses should all be independent from each
other. If µ : M → [0, 1] is supposed to be a probability measure modelling
this behavior, then we can agree on µ(A0 ) = µ(A1 ) = 1/2. Inductively, we may conclude that for all ω ∈ {0, 1}^n , one should have µ(Aω ) = 2^{−n} . In
this case, if we view {0, 1} as a discrete space, we may give Ω the product
topology, in which case M is the Borel σ-algebra. If µ2 : {0, 1} → [0, 1] is
the measure that assigns the value 1/2 to both singletons, then µ is in fact the infinite product measure µ = ⊗_{n=1}^{∞} µ2 .
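A quick Monte Carlo sanity check of this model (a Python illustration only; the prefix and sample size are arbitrary): simulating fair coin tosses and estimating the probability of the cylinder event Aω for a fixed prefix ω.

    import random

    random.seed(0)
    omega = (0, 1, 1)              # the fixed prefix: heads, tails, tails
    k, trials = len(omega), 100_000

    # Only the first k tosses matter for the cylinder event A_omega.
    hits = sum(
        tuple(random.randint(0, 1) for _ in range(k)) == omega
        for _ in range(trials)
    )
    print(hits / trials, 2 ** (-k))   # empirical frequency vs. the exact value 1/8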
Definition 3.4. Let (Ω, M, P) be a probability space and W a topological
space. A W -valued random variable X is a measurable map X : Ω → W .
For a Borel subset B ⊆ W , we will frequently use the popular notation
(X ∈ B) for the preimage X −1 (B).

Remark 3.5. One sometimes also says “stochastic variable”. In the partic-
ular case W = Rn , one calls it a vector random variable, and for W = R, a
real random variable. The case W = RN is referred to as a random sequence.
In the case of a real random variable X, we will also freely play around with
the above notation, for example (|X| ≤ 1) is written instead of (X ∈ [−1, 1]),
etc.

Remark 3.6. From our previous study on measurable maps we can observe
that real random variables are closed under addition and multiplication, and
pointwise limits. General random variables are closed under the same type
of operations under which measurable maps are closed.

Definition 3.7. Let (Ω, M, P) be a probability space and X a W -valued
random variable. The distribution of X is the Borel probability measure PX
on W given by PX (B) = P(X ∈ B). If X and Y are two W -valued random
variables (defined on possibly different probability spaces), we say that X
and Y are identically distributed if PX = PY .

Definition 3.8. Let X be a real random variable. Then its (cumulative)
distribution function FX : R → R is defined as FX (t) = PX ((−∞, t]) =
P(X ≤ t).

Indeed, in the situation above, the measure PX on R is a Lebesgue–
Stieltjes measure, so we know from the previous chapter that FX is an in-
creasing and right-continuous function.

Definition 3.9. Let X be a real random variable on a probability space
(Ω, M, P). We say that X has an expected value (or mean), if X is integrable
as a function X : Ω → R. In this case, its expected value (or mean) is defined
as

    E(X) = ∫_Ω X dP.

Proposition 3.10. Let X be a real random variable. Then X has an expected value if and only if id : R → R is PX -integrable. In this case, its expected value is E(X) = ∫_R x dPX (x). In particular, the expected value only depends
on the distribution of X.

Proof. This is a direct consequence of Theorem 1.54 and Remark 1.55.

Proposition 3.11. Let W be a topological space and X a W -valued random
variable with distribution PX . Let f : W → R be a Borel measurable function.
Then f ◦ X = f (X) has an expected value if and only if f is PX -integrable. In that case, E(f (X)) = ∫_W f dPX .
Notation 3.12. Certain expected values get a special name. Let X be a
real random variable.
(i) Given k ≥ 0, we call E(|X|k ) the k-th absolute moment. If it exists,
then E(X k ) is called the k-th moment.
(ii) Suppose X has an expected value mX = E(X). Then Var(X) = E((X−
mX )2 ) is called the variance of X.
(iii) More generally, suppose X and Y are two real random variables on the
same probability space, for which the second moments exist. Then it
follows from Hölder’s inequality that all of X, Y , XY have expected
values, and the value Cov(X, Y ) = E((X − mX )(Y − mY )) is called the
covariance of X and Y .

3.2 Independence
Definition 3.13. Let (Ω, M, P) be a probability space. Two events A, B ∈
M are called independent, if P(A ∩ B) = P(A)P(B).
Example 3.14. In the context of seeing measure theory as a way to assign
a “volume” to certain sets, this notion has no clear meaning. The concept
does however become relevant if one adopts the probabilistic point of view.
For example, keep in mind Example 3.3 modelling the infinite coin tossing.
In that context, we may for example define A to be the event where both
the first and second toss comes up as heads, and B the event where both the
third and fourth toss comes up as tails. In other words,
    A = {0} × {0} × {0, 1}^{N≥3} ,        B = {0, 1} × {0, 1} × {1} × {1} × {0, 1}^{N≥5} .
In our model we assume that the individual coin tosses are not supposed
to influence each other, and hence we should certainly view these events as
independent. Indeed, we have here
    A ∩ B = {0} × {0} × {1} × {1} × {0, 1}^{N≥5} ,
so by the properties of the product measure µ we can see here that µ(A) = 1/4, µ(B) = 1/4, and µ(A ∩ B) = 1/16. Similarly, all pairs of events defined by the
outcomes of coin tosses happening at distinct times will be independent in
this model.
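As a numerical illustration of this independence (a Python sketch with arbitrary simulation parameters, not part of the notes), one can estimate µ(A), µ(B) and µ(A ∩ B) from simulated fair coin tosses and compare µ(A ∩ B) with the product µ(A)µ(B).

    import random

    random.seed(1)
    trials = 200_000
    count_A = count_B = count_AB = 0

    for _ in range(trials):
        toss = [random.randint(0, 1) for _ in range(4)]   # only the first four tosses matter
        in_A = toss[0] == 0 and toss[1] == 0              # first two tosses are heads
        in_B = toss[2] == 1 and toss[3] == 1              # third and fourth tosses are tails
        count_A += in_A
        count_B += in_B
        count_AB += in_A and in_B

    pA, pB, pAB = count_A / trials, count_B / trials, count_AB / trials
    print(pA, pB, pAB, pA * pB)   # pAB should be close to pA * pB = 1/16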

Definition 3.15. Let (Ω, M, P) be a probability space. A family of events
{Ai }i∈I ⊆ M is called independent, if for all pairwise distinct indices i1 , . . . , in ∈
I, one has P(Ai1 ∩ Ai2 ∩ · · · ∩ Ain ) = P(Ai1 )P(Ai2 ) · · · P(Ain ).

Definition 3.16. Let Ω be a non-empty set. For a sequence An ∈ 2Ω , we define

    lim sup_{n→∞} An = ⋂_{n≥1} ⋃_{k≥n} Ak    and    lim inf_{n→∞} An = ⋃_{n≥1} ⋂_{k≥n} Ak .

Example 3.17. Let us motivate this again from the point of view of our
infinite coin tossing model. Let Hn be the event where the n-th toss
comes up heads. Then the event lim supn→∞ Hn describes the situation where
heads comes up infinitely many times, and the event lim inf n→∞ Hn describes
the situation where heads comes up in all but finitely many tosses. For these
reasons, it is not uncommon in a probabilistic context to use the notation

    lim sup_{n→∞} An = (An , infinitely often) = (An , i.o.)

and

    lim inf_{n→∞} An = (An , almost always) = (An , a.a.).

Proposition 3.18. Let (Ω, M, P) be a probability space, and An ∈ M a sequence. Then

    P(lim inf_{n→∞} An ) ≤ lim inf_{n→∞} P(An ) ≤ lim sup_{n→∞} P(An ) ≤ P(lim sup_{n→∞} An ).

Proof. See exercises.

Lemma 3.19 (Borel–Cantelli Lemma). Let (Ω, M, P) be a probability space, and An ∈ M a sequence.

(i) Suppose Σ_{n=1}^{∞} P(An ) < ∞. Then P(lim sup_{n→∞} An ) = 0.

(ii) Suppose Σ_{n=1}^{∞} P(An ) = ∞. If {An }n∈N is an independent family, then P(lim sup_{n→∞} An ) = 1.

Proof. We prove (i) in the exercise sessions, so we only need to prove (ii).
We will use the fact from the exercise sessions that the family of com-
plements {Acn }n∈N is also independent. Moreover, we are about to use the
well-known inequality 1 − x ≤ e^{−x} for all x ∈ R. We observe for all n ≥ 1 that

    P( ⋂_{k=n}^{∞} Ak^c ) = lim_{m→∞} P( ⋂_{k=n}^{m} Ak^c )
                          = lim_{m→∞} ∏_{k=n}^{m} P(Ak^c)
                          = lim_{m→∞} ∏_{k=n}^{m} (1 − P(Ak ))
                          ≤ lim_{m→∞} ∏_{k=n}^{m} exp(−P(Ak ))
                          = lim_{m→∞} exp( −Σ_{k=n}^{m} P(Ak ) )
                          = 0.
We conclude that ⋃_{n≥1} ⋂_{k≥n} Ak^c = lim inf_{n→∞} An^c is a null set. Therefore lim sup_{n→∞} An = (lim inf_{n→∞} An^c)^c is co-null, which shows our claim.
Example 3.20. Let us have yet another look at our infinite coin tossing
model and what we observed in Example 3.14. Let Hn denote the event
where the n-th coin toss comes up as heads. Then the family {Hn }n∈N is
independent, each event has probability 1/2, and hence it follows from the
above that the event (Hn , infinitely often) has probability 1.
Definition 3.21. Let (Ω, M, P) be a probability space. A family of sub-
σ-algebras Mi ⊆ M for i ∈ I is called independent, if every collection {Ai }i∈I ⊆ M with Ai ∈ Mi is independent.
Definition 3.22. Let (Ω, M, P) be a probability space. Let I be an index set,
and let Xi be a random variable on (Ω, M, P) with values in the topological
space Wi for i ∈ I. We call the family {Xi }i∈I independent, if for every
collection {Si }i∈I of Borel sets Si ⊆ Wi , we have that the family of events
{(Xi ∈ Si )}i∈I is independent. (In other words, if we denote by Bi the Borel-
σ-algebra on Wi , this condition means that the family of pullback σ-algebras
{Xi∗ (Bi )}i∈I is independent in the above sense.)
Theorem 3.23. Let {Pi }i∈I be a family of Borel probability measures on R.
Then there exists a probability space (Ω, M, P) with an independent family of
real random variables {Xi }i∈I such that Pi = PXi .
Proof. We set (Ω, M, P) = ⊗_{i∈I} (R, B, Pi ) and define Xi : Ω → R to be the

projection onto the i-th component. It is an easy exercise to see that this
yields an independent family with the desired property.
Since the proof of the following is very easy, we leave it as an exercise.

Proposition 3.24. Let (Ω, M, P) be a probability space. Let I be an index
set, and let Xi be a random variable on (Ω, M, P) with values in the topological
space Wi for i ∈ I. For each i ∈ I, let Vi be another topological space and
fi : Wi → Vi a Borel measurable map. If the family of random variables
{Xi }i∈I is independent, then so is {fi (Xi )}i∈I .

Proposition 3.25. Let X and Y be two mutually independent, real random
variables defined on a probability space (Ω, M, P). If both X and Y have a
mean, then so does XY , and in fact E(XY ) = E(X)E(Y ).

Proof. Consider the pullback σ-algebras

MX = {(X ∈ S) | S ⊆ R Borel} , MY = {(Y ∈ S) | S ⊆ R Borel} ,

which are both sub-σ-algebras of M. Given any A ∈ MX and B ∈ MY , it
follows by independence that
    ∫_Ω χA χB dP = P(A ∩ B) = P(A)P(B) = ∫_Ω χA dP · ∫_Ω χB dP.

By linearity, it follows that for any MX -measurable simple function s : Ω → [0, ∞) and MY -measurable simple function t : Ω → [0, ∞), one has the equation

    ∫_Ω st dP = ∫_Ω s dP · ∫_Ω t dP.
By the Monotone Convergence Theorem, this even holds when s and t are
not assumed to be simple. Applying this to s = |X| and t = |Y | yields
E(|XY |) = E(|X|)E(|Y |) < ∞, so indeed XY has a mean. Furthermore if
we decompose X = X + − X − and Y = Y + − Y − , then

    E(XY ) = ∫_Ω XY dP
           = ∫_Ω (X^+ − X^−)(Y^+ − Y^−) dP
           = ∫_Ω X^+Y^+ + X^−Y^− − X^−Y^+ − X^+Y^− dP
           = E(X^+)E(Y^+) + E(X^−)E(Y^−) − E(X^−)E(Y^+) − E(X^+)E(Y^−)
           = (E(X^+) − E(X^−))(E(Y^+) − E(Y^−))
           = E(X)E(Y ).

3.3 Law of Large Numbers
Before we come to the main point of this subsection, we need to cover some
observations that are useful in computations.

Theorem 3.26 (Chebyshev’s inequality). Let X be a real random variable
on the probability space (Ω, M, P). Suppose that f : R → R≥0 is an even
measurable function whose restriction to R≥0 is increasing. Let a ≥ 0 be any
number with f (a) > 0. Then

    P(|X| ≥ a) ≤ E(f (X)) / f (a).

Proof. We simply observe:

    E(f (X)) = ∫_Ω f ◦ X dP
             ≥ ∫_Ω (f ◦ X) χ_(|X|≥a) dP
             ≥ f (a) ∫_Ω χ_(|X|≥a) dP
             = f (a) P(|X| ≥ a).

Corollary 3.27. Let X be a real random variable on the probability space (Ω, M, P).

(i) For all a, p > 0, one has P(|X| ≥ a) ≤ E(|X|^p) / a^p.
(ii) If X has an expected value, then for all a > 0, one has

    P(|X − mX | ≥ a) ≤ Var(X) / a^2.

Proof. The first part follows from the general Chebyshev inequality for f (x) =
|x|p . The second part follows when applying it to X −mX as the real random
variable and the function f (x) = x2 .
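A quick Monte Carlo illustration of Corollary 3.27(ii) in Python (parameters chosen arbitrarily, not part of the notes): for X uniform on [−1, 1], which has mean 0 and variance 1/3, the empirical tail probability P(|X − mX| ≥ a) stays well below the Chebyshev bound Var(X)/a^2.

    import random

    random.seed(2)
    trials, a = 100_000, 0.9

    # X uniform on [-1, 1]: mean 0 and variance 1/3.
    samples = [random.uniform(-1.0, 1.0) for _ in range(trials)]
    tail = sum(abs(x) >= a for x in samples) / trials
    chebyshev_bound = (1.0 / 3.0) / a**2

    print(tail, chebyshev_bound)   # roughly 0.10 versus the (much larger) bound of about 0.41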

Definition 3.28. Let Xn be a sequence of real random variables defined on
a probability space (Ω, M, P). Given a real random variable X on the same
probability space, we say that Xn converges to X in probability, if for all
ε > 0, one has P(|X − Xn | > ε) → 0 as n → ∞. One writes Xn →^p X.

Theorem 3.29 (Weak Law of Large Numbers). Let Xn be a sequence of
identically distributed real random variables defined on a probability space
(Ω, M, P) that are pairwise independent. Suppose that every (or any) Xn has
a mean m ∈ R. Then it follows that (1/n) Σ_{k=1}^{n} Xk →^p m.

Proof. Since they are identically distributed, it follows that both the mean
and the variance of Xn are the same for every n ≥ 1. We may assume without
loss of generality m = 0. Let us first assume that the variance of Xn is finite
and equal to σ^2 for all n ≥ 1. For every n ≥ 1, denote X̄n = (1/n) Σ_{k=1}^{n} Xk . We

have that X̄n also has mean 0, and hence


1 n
nσ 2 σ2
 X 
Var(X̄n ) = E(X̄n2 ) = 2E Xj Xk = = .
n j,k=1 n2 n

Here we used Proposition 3.25 for j ≠ k. By applying Corollary 3.27(ii), we may hence see

    P(|X̄n | ≥ ε) ≤ E(X̄n^2) / ε^2 = σ^2 / (nε^2) → 0   as n → ∞
for all ε > 0.
Now for the general case, we allow the possibility that Xn has infinite
variance. Let us still assume m = 0. For all n, N ≥ 1, write Xn = Xn≤N +
Xn>N , where Xn≤N = Xn · χ(|Xn |≤N ) . Then by the Monotone Convergence
Theorem
    aN := E(|Xn>N |) = E(|Xn |) − E(|Xn≤N |) → 0   as N → ∞,
where we are using that these values do not depend on n as the Xn were
identically distributed. Let us also denote
    X̄n≤N = (1/n) Σ_{j=1}^{n} Xj≤N    and    X̄n>N = (1/n) Σ_{j=1}^{n} Xj>N .

We have in particular for any ε > 0 that

    P(|X̄n>N | ≥ ε) ≤ E(|X̄n>N |) / ε ≤ aN / ε.
So, given any δ > 0 with δ < 1 and any N ≥ 1 large enough such that
aN ≤ δε/2 for all n ≥ 1, it follows that P(|X̄n>N | ≥ ε/2) ≤ δ. Hence

    P(|X̄n | ≥ ε) ≤ P( |X̄n>N | + |X̄n≤N | ≥ ε )
                 ≤ δ + P( |X̄n>N | + |X̄n≤N | ≥ ε ∧ |X̄n>N | ≤ ε/2 )
                 ≤ δ + P( |X̄n≤N | ≥ ε/2 )
                 ≤ δ + P( |X̄n≤N − (E(X̄n≤N ) + E(X̄n>N ))| ≥ ε/2 )
                 ≤ δ + P( |X̄n≤N − E(X̄n≤N )| ≥ (ε/2)(1 − δ) )
                 → δ   as n → ∞.

Here we have used that Xn≤N has finite variance and the above falls into our
previous subcase. As δ > 0 was arbitrary, this verifies X̄n →^p 0 in general.
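To see the statement "in action", here is a small Python simulation sketch (illustrative only; the distribution and sample sizes are arbitrary): for i.i.d. fair die rolls with mean m = 3.5, the estimated probability that the sample mean deviates from m by more than ε visibly decreases as n grows.

    import random

    random.seed(3)
    m, eps, trials = 3.5, 0.25, 2000

    def sample_mean(n):
        """Mean of n independent fair die rolls."""
        return sum(random.randint(1, 6) for _ in range(n)) / n

    for n in [10, 100, 1000]:
        deviations = sum(abs(sample_mean(n) - m) > eps for _ in range(trials))
        print(n, deviations / trials)   # estimate of P(|mean_n - m| > eps), decreasing in n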

There is also a stronger law, i.e., a theorem with a stronger conclusion
than the weak law of large numbers. We will prove this strong law under an
additional assumption.10
Theorem 3.30 (Strong Law of Large Numbers). Let Xn be an independent
sequence of identically distributed real random variables defined on a proba-
bility space (Ω, M, P). Suppose that every (or any) Xn has a fourth moment,
meaning that E(Xn^4) < ∞. If m = E(Xn ) is the mean of all these random variables, then it follows that P( (1/n) Σ_{k=1}^{n} Xk → m ) = 1.

Proof. We first note that applying Hölder's inequality twice yields E(|Xn |)^4 ≤ E(Xn^2)^2 ≤ E(Xn^4) =: M4 , hence the assumptions on Xn ensure that m exists.
As before, we assume without loss of generality m = 0 by replacing each
Xn by Xn − m, if necessary. Denote X̄n = (1/n) Σ_{k=1}^{n} Xk . Then

    E(X̄n^4) = (1/n^4) Σ_{i,j,k,l=1}^{n} E(Xi Xj Xk Xl ).

Considering the summand over the tuple (i, j, k, l), it follows from the inde-
pendence of the sequence Xn and Proposition 3.25 that it is zero if there is
one entry in this tuple that is different from all the other three. In other
words, the summand can only be non-zero if all four entries agree, or if the
tuple has two different indices occurring exactly twice. Two different entries
can possibly occur in the pattern (k, k, l, l), (k, l, k, l) or (k, l, l, k), leading
to 3n(n − 1) possible summands. Using Proposition 3.25 once again, the
expression hence simplifies to
    E(X̄n^4) = (1/n^4) ( Σ_{k=1}^{n} E(Xk^4) + 3 Σ_{k,l=1, k≠l}^{n} E(Xk^2)E(Xl^2) ).

By the very beginning of the proof, each summand in the bracket can be esti-
mated above by M4 , and there are a total of n + 3n(n − 1) ≤ 3n^2 summands, hence

    E(X̄n^4) ≤ 3n^2 M4 / n^4 = 3M4 / n^2.
If we apply Corollary 3.27 for p = 4, it follows for every ε > 0 that
    P(|X̄n | ≥ ε) ≤ 3M4 / (n^2 ε^4).
10
The conclusion of the theorem is actually true under the same assumptions we had for the weak law. The proof of the general case is however quite demanding, which is why we add some common extra assumptions allowing for a much simpler proof. Anyone who is interested in the proof of the general case is referred to, for example, “Theorem 22.1” in the book “Probability and Measure” by Patrick Billingsley.

The right side is a summable sequence of non-negative numbers. Hence it
follows by the Borel–Cantelli Lemma that
P(|X̄n | ≥ ε, infinitely often) = 0.
However, we note that by the definition of convergence of sequences, we have
    (X̄n ̸→ 0) = ⋃_{ε>0, ε∈Q} (|X̄n | ≥ ε, infinitely often),

and hence it really follows that X̄n → 0 occurs almost surely.

3.4 Central Limit Theorem

Definition 3.31. Given a real random variable X, one defines its characteristic function φX : R → C via φX (t) = E(e^{itX}).
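As an illustration (a Python sketch only, with an arbitrarily chosen test distribution): φX can be estimated by averaging e^{itX} over simulated samples. For a fair ±1 coin flip one has φX (t) = cos(t), which the estimate reproduces.

    import cmath
    import math
    import random

    random.seed(4)
    samples = [random.choice([-1.0, 1.0]) for _ in range(50_000)]   # X takes values +-1 with prob 1/2

    def phi_hat(t):
        """Monte Carlo estimate of the characteristic function E(exp(itX))."""
        return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

    for t in [0.0, 0.5, 1.0, 2.0]:
        print(t, phi_hat(t).real, math.cos(t))   # for this X the exact value is cos(t)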
Theorem 3.32 (Lévy’s Inversion Theorem). Let X be a real random variable
with characteristic function φX and distribution PX . Then one has for all
a, b ∈ R with a < b that
    lim_{T→∞} (1/2π) ∫_{−T}^{T} (e^{−ita} − e^{−itb})/(it) · φX (t) dt = (1/2) PX ({a}) + PX ((a, b)) + (1/2) PX ({b}).
Proof of Theorem 3.32. Note first that due to the l'Hôpital rule, we have lim_{t→0} (e^{−ita} − e^{−itb})/t = i(b − a), so the function in the integral is to be understood as a continuous function in this sense. Using Fubini's theorem, we may write
    ∫_{−T}^{T} (e^{−ita} − e^{−itb})/(it) · φX (t) dt = ∫_R ( ∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/(it) dt ) dPX (x),

where we used φX (t) = E(e^{itX}).

For the purpose of the proof, we define the function κ : R → R via κ(x) = ∫_0^x (sin t)/t dt for x ≥ 0, and κ(x) = −κ(−x) when x < 0. We will appeal (without proof) to a fact from calculus stating that lim_{x→∞} κ(x) = π/2. We obviously have the identity (d/dx) κ(x) = sin(x)/x for all x ≥ 0. Using the fact that
cos is an even function, we hence observe that
    ∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/(it) dt
      = ∫_{−T}^{T} ( cos(t(x − a)) − cos(t(x − b)) + i sin(t(x − a)) − i sin(t(x − b)) )/(it) dt
      = ∫_{−T}^{T} ( sin(t(x − a)) − sin(t(x − b)) )/t dt
      = 2κ(T (x − a)) − 2κ(T (x − b)).

In particular,
    ∫_{−T}^{T} (e^{−ita} − e^{−itb})/(it) · φX (t) dt = 2 ∫_R κ(T (x − a)) − κ(T (x − b)) dPX (x).

In order to determine what happens within the integral when T → ∞, we
note that due to the initial remark about κ we obtain

    lim_{T→∞} κ(T (x − a)) − κ(T (x − b)) = 0 ,    x < a or x > b
                                           π ,    a < x < b
                                           π/2 ,  x = a or x = b.

Since PX is a probability measure, it follows from the Dominated Convergence Theorem that

    lim_{T→∞} (1/2π) ∫_{−T}^{T} (e^{−ita} − e^{−itb})/(it) · φX (t) dt = ∫_R ( (1/2) χ_{{a,b}} + χ_{(a,b)} ) dPX ,

which is precisely the claim.


We will use the following fact from real analysis without proof:11

Theorem 3.33. A monotone function f : R → R can have at most countably
many points of discontinuity.

Corollary 3.34. Two real random variables are identically distributed if and
only if their characteristic functions are equal.

Proof. The “only if” part is clear, so we show the “if” part. Let X and Y be
two real random variables. The distribution function FX (t) = PX ((−∞, t])
is increasing, and therefore by the above theorem, it is discontinuous in at
most countably many points. In analogy to Example 2.27, this means that
we may have PX ({t}) ̸= 0 for at most countably many t ∈ R. The same
observation is of course true for Y in place of X. Let W ⊆ R be the subset
of all points t with PX ({t}) = 0 = PY ({t}). Then W is co-countable and in
particular dense. It is then an easy exercise to show that the semi-ring

AW = {(a, b] | a, b ∈ W, a < b} ⊆ 2R

generates the Borel σ-algebra. By Theorem 2.17, it follows that PX = PY


holds if and only if PX and PY agree on AW . If we assume φX = φY , then
this is a direct consequence of Theorem 3.32.
11
For those interested, see for example “Theorem 4.30” in the book “Principles of Math-
ematical Analysis” (third edition) by Walter Rudin.

Definition 3.35. A sequence Pn of Borel probability measures on R is said
to weakly converge to a Borel probability measure P, if one has
    lim_{n→∞} ∫_R f dPn = ∫_R f dP

for every compactly supported continuous function f : R → R. We write Pn →^w P.
If Fn is the distribution function for Pn and F the distribution function
for P, we say that Fn converges to F weakly, written Fn →^w F , if Pn →^w P.

Definition 3.36. A sequence of real random variables Xn is said to converge in distribution to a real random variable X, written Xn →^d X, if PXn →^w PX .

Remark. WARNING! Unlike the notions of convergence we considered before, convergence of real random variables in distribution is not well-behaved with respect to standard operations such as addition or multiplication. That is, if Xn →^d X and Yn →^d Y holds, it is not necessarily true that Xn + Yn →^d X + Y or Xn Yn →^d XY .

We also note that it is shown in the exercises that for a sequence Pn →^w P,
the same kind of limit behavior follows for f being any bounded continuous
function on R, not necessarily compactly supported. We will subsequently
use this characterization of weak convergence without further mention.

Theorem 3.37. Let Fn , n ≥ 1, and F be distribution functions with respect to Borel probability measures Pn and P. Let C(F ) ⊆ R be the set of all points over which F is continuous. Then Fn →^w F if and only if Fn (x) → F (x) for
all x ∈ C(F ).
Proof. Let us first assume Fn →^w F holds. We shall first show the following
intermediate claim. If A ⊆ R is a closed interval, then lim supn→∞ Pn (A) ≤
P(A). Indeed, we may define a pointwise decreasing sequence of piecewise
linear functions fk : R → R with fk |A = 1 and lim_{k→∞} fk (t) = 0 for all t ∉ A. Then

    lim sup_{n→∞} Pn (A) ≤ lim sup_{n→∞} ∫_R fk dPn = ∫_R fk dP,    k ≥ 1.
Here we used the fact that Pn →^w P. Letting k → ∞ on the right side, we
obtain P(A) as the limit by the Dominated Convergence Theorem, so the
intermediate claim is proved.
Now let x ∈ C(F ). Then it follows from the intermediate claim that

    lim sup_{n→∞} Fn (x) = lim sup_{n→∞} Pn ((−∞, x]) ≤ P((−∞, x]) = F (x).

Since x is a continuous point of F , we also have P({x}) = 0 and hence

    F (x) = P((−∞, x)) = 1 − P([x, ∞))
          ≤ 1 − lim sup_{n→∞} Pn ([x, ∞))
          = lim inf_{n→∞} Pn ((−∞, x))
          ≤ lim inf_{n→∞} Fn (x).

Conversely, let us assume that Fn (x) → F (x) holds for all x ∈ C(F ). By
definition, we may easily observe Pn ((a, b]) → P((a, b]) for all a, b ∈ C(F )
with a < b. Therefore, it follows that
    ∫_R f dPn → ∫_R f dP

holds whenever f belongs to the linear subspace spanned by all indicator


functions χ(a,b] for a, b ∈ C(F ). Since C(F ) is dense in R, it is easy to see
that the closure of this linear subspace with respect to the sup-norm ∥ · ∥∞
contains the space of continuous compactly supported functions. So (with a
standard ε/2-argument) we may conclude that the above limit behavior even
holds for f being any compactly supported continuous function.
Definition 3.38. A sequence of Borel probability measures (Pn )n on R is
called tight, if
    lim_{R→∞} lim inf_{n→∞} Pn ([−R, R]) = 1.

Remark. In the exercise sessions, we will prove the following analytical
statement which is useful for the proof of the next theorem. Let Fn : R →
[0, 1] be a sequence of increasing right-continuous functions satisfying the
limit formula lim_{R→∞} (Fn (R) − Fn (−R)) = 1.
• If the sequence Fn (x) is convergent for a dense set of numbers x ∈ R, then
there exists an increasing right-continuous function F : R → [0, 1] such
that Fn (x) → F (x) holds whenever F is continuous in x.
• Suppose that a limit function F as above exists and that

    lim_{R→∞} lim inf_{n→∞} (Fn (R) − Fn (−R)) = 1.

Then F is the distribution function for a Borel probability measure on R.


Theorem 3.39 (Helly’s Selection Theorem). For any tight sequence (Pn )n
of Borel probability measures on R, there exists a subsequence (Pnk )k and a
Borel probability measure P on R such that Pnk →^w P.

Proof. Let Fn be the distribution function of Pn for every n ≥ 1. Then Pn
being tight translates to
    lim_{R→∞} lim inf_{n→∞} (Fn (R) − Fn (−R)) = 1.

Of course this tightness criterion holds for every subsequence as well. By


Theorem 3.37 and the above remark, it suffices to show that there is some
increasing sequence of natural numbers nk such that Fnk converges pointwise
on a dense set of real numbers, for example on the rational numbers Q.
Let N ∋ ℓ 7→ qℓ be an enumeration of Q. By Bolzano–Weierstrass, we
know that there is some increasing sequence of numbers n(1,k) such that
Fn(1,k) (q1 ) converges as k → ∞. Applying Bolzano–Weierstrass again, we
know that (n(1, k))k admits a subsequence (n(2, k))k such that Fn(2,k) (q2 )
converges as k → ∞. Proceed like this inductively, and find finer and finer
subsequences (n(ℓ, k))k such that Fn(ℓ,k) (qℓ ) converges as k → ∞. Finally
define nk = n(k, k). Then (nk )k is a subsequence of (n(ℓ, k))k (up to the
finitely many indices k ≤ ℓ) for every ℓ ≥ 1, so indeed Fnk (qℓ ) converges as
k → ∞, for every ℓ ≥ 1. This finishes the proof.
Lemma 3.40. Let X be a real random variable. Then we have for all R > 0
the estimate

    P(|X| > 2R) ≤ R ∫_{−1/R}^{1/R} (1 − φX (t)) dt.

Proof. Using Fubini's theorem we compute

    R ∫_{−1/R}^{1/R} (1 − φX (t)) dt = 2 − R ∫_{−1/R}^{1/R} φX (t) dt
        = 2 − R ∫_{−1/R}^{1/R} ∫_R e^{itx} dPX (x) dt
        = 2 − R ∫_R ( ∫_{−1/R}^{1/R} cos(tx) dt ) dPX (x)
        = 2 − 2R ∫_R ( ∫_{0}^{1/R} cos(tx) dt ) dPX (x)
        = 2 − 2R ∫_R sin(x/R)/x dPX (x)
        = 2 E( 1 − sin(X/R)/(X/R) ).
Note that we have sin(X/R)/(X/R) ≤ 1. Moreover, if any x ∈ R satisfies |x| > 2R, then |x/R| > 2, in which case |sin(x/R)| ≤ 1 ≤ |x|/2R, hence 1 − sin(x/R)/(x/R) ≥ 1/2. This leads to the inequality χ_(|X|>2R) ≤ 2(1 − sin(X/R)/(X/R)), and hence forming the expected value on both sides yields the claim.

Theorem 3.41 (Lévy’s Continuity Theorem). Let Xn and X be real random
variables for n ≥ 1. Then Xn →^d X holds if and only if φXn (t) → φX (t) for all t ∈ R.

Proof. The direction “⇒” holds by definition of the characteristic functions
and by what it means to converge in distribution. For the converse, let us
assume φXn (t) → φX (t) for all t ∈ R. Let us first show that the sequence
PXn is tight. Indeed, using the above lemma we observe for R > 0 that

    PXn ([−R, R]) = 1 − P(|Xn | > R)
                  ≥ 1 − (R/2) ∫_{−2/R}^{2/R} (1 − φXn (t)) dt
                  → 1 − 2 · (R/4) ∫_{−2/R}^{2/R} (1 − φX (t)) dt    as n → ∞.
Here we also used the Dominated Convergence Theorem. Using the fact
that φX is a continuous function with φX (0) = 1, we see that the integral
(R/4) ∫_{−2/R}^{2/R} (1 − φX (t)) dt goes to zero as R → ∞. Hence the above inequality yields

    lim_{R→∞} lim inf_{n→∞} PXn ([−R, R]) = 1,

or in other words the tightness of the sequence PXn .


In order to show Xn →^d X we proceed by contradiction, and suppose
that Xn does not converge to X in distribution. This means that for some compactly supported continuous function f : R → R, one has that ∫_R f dPXn does not converge to ∫_R f dPX . Thus, after passing to a subsequence, we may assume without loss of generality that

    lim inf_{n→∞} | ∫_R f dPXn − ∫_R f dPX | > 0.

By Theorem 3.39, we may again pass to a subsequence and assume without loss of generality that Xn →^d Y for some real random variable Y . By the “⇒” part, we obtain φX (t) = lim_{n→∞} φXn (t) = φY (t) for all t ∈ R. By Corollary 3.34, it follows that PX = PY , but this leads to a contradiction to what we assumed above. We conclude that Xn →^d X must hold in the first
place.

Lemma 3.42. For all x ∈ R and n ≥ 0, we have the inequality

    | e^{ix} − Σ_{k=0}^{n} (ix)^k / k! | ≤ min( 2|x|^n / n! , |x|^{n+1} / (n + 1)! ).

Proof. We proceed by induction in n. We note that it suffices to consider the
case x ≥ 0 since we may obtain the claim for x < 0 by complex conjugation of the involved terms. Let us first proceed with proving that 2|x|^n / n! is an upper bound. For n = 0, this is equal to 2, and since e^{ix} has norm one, this follows from the triangle inequality. Now assume that this inequality holds for a
given number n ≥ 0 and all x ≥ 0. We then compute
    | e^{ix} − Σ_{k=0}^{n+1} (ix)^k / k! | = | ∫_0^x ( e^{it} − Σ_{k=0}^{n} (it)^k / k! ) dt |
                                           ≤ ∫_0^x | e^{it} − Σ_{k=0}^{n} (it)^k / k! | dt
                                           ≤ ∫_0^x 2t^n / n! dt = 2x^{n+1} / (n + 1)!.
We can also proceed by induction to show the upper bound |x|^{n+1} / (n + 1)!. Then
one can first consider n = −1, in which case both the left and right side is
1. One can then perform the induction step (n − 1) → n with a completely
analogous calculation as above, namely

    | e^{ix} − Σ_{k=0}^{n} (ix)^k / k! | = | ∫_0^x ( e^{it} − Σ_{k=0}^{n−1} (it)^k / k! ) dt |
                                         ≤ ∫_0^x | e^{it} − Σ_{k=0}^{n−1} (it)^k / k! | dt
                                         ≤ ∫_0^x t^n / n! dt = x^{n+1} / (n + 1)!.

Lemma 3.43. Let X be a real random variable such that E(X 2 ) = 1 and
E(X) = 0. Then one has the limit formula
    lim_{n→∞} φX (t/√n)^n = e^{−t^2/2} ,    t ∈ R.

Proof. As an application of Lemma 3.42, we obtain for all t ∈ R the estimate

    | e^{itX} − (1 + itX − t^2 X^2/2) | ≤ min( t^2 X^2 , |t|^3 |X|^3 / 6 ) = t^2 min( X^2 , |t| |X|^3 / 6 ).
In particular, since X has a second absolute moment we see that the random
variable on the left has a mean. Moreover the expected value of the random variable X0^t = min( X^2 , |t| |X|^3 / 6 ) tends to zero as t → 0 as a consequence of
the Dominated Convergence Theorem. So by integrating and applying the
triangle inequality, we see
    | φX (t) − (1 − t^2/2) | ≤ E( | e^{itX} − 1 − itX + t^2 X^2/2 | ) ≤ t^2 E(X0^t).

If we substitute t → t/√n, this leads to the observation that

    xn := n( φX (t/√n) − 1 + t^2/(2n) ) → 0   as n → ∞.
Using the fact from calculus that one has convergence

    e^x = lim_{n→∞} (1 + x/n)^n
uniformly over bounded intervals, we finally conclude that
    φX (t/√n)^n = ( 1 + (φX (t/√n) − 1) )^n
                = ( 1 − [ t^2/2 − n(φX (t/√n) − 1 + t^2/(2n)) ] / n )^n
                = ( 1 − (t^2/2 − xn) / n )^n
                → e^{−t^2/2}   as n → ∞.

Theorem 3.44 (Central Limit Theorem). Let Xn be an independent se-
quence of identically distributed real random variables with mean E(Xn ) = 0
and variance E(Xn^2) = 1. Consider the standard normal variable N given by PN ((a, b]) = (1/√(2π)) ∫_a^b e^{−t^2/2} dt. Then one has

    (1/√n) Σ_{k=1}^{n} Xk →^d N .
Proof. Since the Xn are identically distributed, they all have the same charac-
teristic function, which we shall denote by φX . Denote Yn = n^{−1/2} Σ_{k=1}^{n} Xk , so that we wish to show Yn →^d N . Using that the Xn are independent, we
observe for all t ∈ R that
    φYn (t) = E(e^{itYn})
            = E(e^{i(t/√n) Σ_{k=1}^{n} Xk})
            = φ_{Σ_{k=1}^{n} Xk}(t/√n)
            = ∏_{k=1}^{n} φ_{Xk}(t/√n)
            = φX (t/√n)^n.

Given that the standard normal variable has the characteristic function
φN (t) = e^{−t^2/2} , the proof is complete by combining Lemma 3.43 and Theorem 3.41.

Corollary 3.45. Let Xn be an independent sequence of identically distributed
real random variables with a finite mean E(Xn ) = 0 and finite variance
E(Xn^2) = σ^2 . Then one has

    (1/√n) Σ_{k=1}^{n} Xk →^d N (0, σ),

where N (0, σ) is the real random variable given by the distribution P((a, b]) = (1/(σ√(2π))) ∫_a^b e^{−t^2/(2σ^2)} dt.
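A numerical illustration of the theorem (a Python sketch with arbitrary parameters, not part of the notes): standardized sums of i.i.d. uniform random variables have an empirical distribution function close to the standard normal one.

    import math
    import random

    random.seed(5)

    def standardized_sum(n):
        """(1/sqrt(n)) times the sum of n i.i.d. variables with mean 0 and variance 1."""
        # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
        return sum(random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n)) / math.sqrt(n)

    def Phi(x):
        """Standard normal distribution function."""
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

    n, trials = 50, 20_000
    samples = [standardized_sum(n) for _ in range(trials)]
    for x in (-1.0, 0.0, 1.0):
        empirical = sum(s <= x for s in samples) / trials
        print(x, empirical, Phi(x))   # the empirical CDF is close to Phi(x)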

4 Selected further topics
In this final chapter we treat a few selected further topics, both from proba-
bility theory and general measure theory.

4.1 Tail events
Definition 4.1. Let (Ω, M) be a measurable space and An ∈ M a sequence
of measurable sets. For each n ≥ 1, denote by Mn ⊆ M the σ-algebra
generated by {Ak }k≥n . The tail σ-algebra for the sequence An is defined as

    M∞ = ⋂_{n≥1} Mn .

Example 4.2. As an important example notice in the above definition that one always has that lim sup_{n→∞} An and lim inf_{n→∞} An belong to the tail σ-algebra.

Lemma 4.3. Let (Ω, M, P) be a probability space. Let A ∈ M, {Bi }i∈I ⊆ M,
and {Cj }j∈J ⊆ M be given.
(i) Suppose {A} ∪ {Bi }i∈I is an independent family. Then for every set B
belonging to the σ-algebra generated by {Bi }, one has that A and B are
independent.
(ii) Suppose that {Bi }i∈I ∪ {Cj }j∈J is an independent family. Let B be the
σ-algebra generated by {Bi } and C the σ-algebra generated by {Cj }.
Then B and C are independent sub-σ-algebras of M.
Proof. (i): We denote by B the σ-algebra generated by {Bi }. If A is a null set, there is nothing to show, so let us assume P(A) > 0. Consider a new measure P1 on (Ω, M) defined by P1 (C) = P(A ∩ C)/P(A).
Let A ⊆ M be the semi-ring defined by
    A = { ⋂_{ℓ=1}^{m} Dℓ | m ≥ 0, i1 , . . . , im ∈ I are distinct and Dℓ ∈ {Biℓ , Biℓ^c} } .

Note that we use the convention that this intersection is defined to be empty
when m = 0. We leave the details why A is a semi-ring as an exercise. Then
B is also generated by A as a σ-algebra. By the independence assumption, it
follows that P(A ∩ B) = P(A)P(B) for all B ∈ A, and hence P1 (B) = P(B).
By Theorem 2.17, we see that this implies P1 (B) = P(B) for all B ∈ B, or
in other words P(A ∩ B) = P(A)P(B) for all B ∈ B.
(ii): We have to show that for every B ∈ B and C ∈ C, it follows
that B and C are independent. First of all, it follows from the definition of

independence that for any set B belonging to the above defined semi-ring
A, we have that the family {B} ∪ {Cj }j∈J is independent. By (i), it hence
follows that every such B ∈ A and every C ∈ C are mutually independent. By
the definition of A as a set and the definition of independence, we see that
hence {Bi }i∈I ∪ {C} is an independent family for all C ∈ C. Applying (i)
one more time, we can conclude that all B ∈ B and C ∈ C are mutually
independent.
Theorem 4.4 (Kolmogorov’s zero-one law). Let (Ω, M, P) be a probability
space and An ∈ M an independent sequence of events. Then for any events
A ∈ M∞ , one has P(A) ∈ {0, 1}.
Proof. Suppose A ∈ M∞ . Let n ≥ 1. Then A ∈ Mn+1 , the σ-algebra gener-
ated by {Ak }k>n . Since the sequence {Ak }k≥1 is independent by assumption,
it follows from the above lemma that {A} ∪ {Ak }k≤n is an independent fam-
ily. However, as n was arbitrary, it follows that in fact {A} ∪ {Ak }k≥1 is an
independent family. Appealing to the lemma above again, it follows that A
and B are independent for every B ∈ M1 . Since M∞ ⊆ M1 , it follows that in
particular A is independent of itself. This leads to P(A) = P(A∩A) = P(A)2 ,
which implies P(A) ∈ {0, 1}.

4.2 Radon–Nikodym theorem
Definition 4.5. Let (Ω, M, µ) be a measure space. In analogy to Defini-
tion 1.40, we consider
    L2 (Ω, M, µ) = L2 (µ) = { f : Ω → C | f is measurable and |f |^2 is integrable } .

For f ∈ L2 (µ), one defines the 2-seminorm

    ∥f ∥2 = ( ∫ |f |^2 dµ )^{1/2} .

Theorem 4.6. Let (Ω, M, µ) be a measure space. Then the formula

    ⟨f | g⟩ = ∫ f ḡ dµ
defines a sesquilinear conjugate-symmetric positive semi-definite form on L2 (µ).
If one considers the linear subspace

N = {f : Ω → C | f is measurable and f = 0 almost everywhere} ,

then L2 (µ) = L2 (Ω, M, µ)/N becomes a Hilbert space with respect to the induced
inner product and the resulting 2-norm.

Proof. We can conclude from the general Hölder inequality with p = q = 2
(see also the version proved in the exercises) that ⟨·|·⟩ is well-defined. Its
claimed properties are immediate. In particular, it follows from results in
basic linear algebra that ∥ · ∥2 is indeed a seminorm. Like before (cf. Proposi-
tion 1.45), the subspace N can be identified with those functions f ∈ L2 (µ)
with ∥f ∥2 = 0 = ⟨f | f ⟩. Therefore the induced sesquilinear form ⟨·|·⟩ on
L2 (µ) defines an inner product, so that ∥ · ∥2 becomes a norm in analogy to
Definition 1.46. The only thing that remains to be shown is that L2 (µ) is
complete. For this it suffices to prove a statement analogous to Theorem 1.52
(in fact with an analogous proof).
Let fn ∈ L2 (µ) be a sequence that satisfies the Cauchy criterion in the
2-norm. By applying the Cauchy criterion inductively, we can find a sub-
sequence (fnk )k such that ∥fnk − fnk+1 ∥2 ≤ 2−k . We define the measurable
function g : Ω → [0, ∞] as g(x) = Σ_{k=1}^{∞} |fnk (x) − fnk+1 (x)|. Then it follows

from the Monotone Convergence Theorem and the triangle inequality for the
2-(semi-)norm that
    ∫_Ω g^2 dµ = lim_{n→∞} ∫_Ω ( Σ_{k=1}^{n} |fnk − fnk+1| )^2 dµ
               = lim_{n→∞} ∥ Σ_{k=1}^{n} |fnk − fnk+1| ∥_2^2
               ≤ lim_{n→∞} ( Σ_{k=1}^{n} ∥fnk − fnk+1∥_2 )^2
               ≤ ( Σ_{k=1}^{∞} 2^{−k} )^2 = 1.

In particular g^2 is integrable, and by Proposition 1.38, the set E = g^{−1}([0, ∞)) has a complement of zero measure. For all x ∈ E, we have by definition that

the series Σ_{k=1}^{∞} [fnk+1 (x) − fnk (x)] converges absolutely, and hence the function

    f (x) := fn1 (x) + Σ_{k=1}^{∞} [fnk+1 (x) − fnk (x)] = lim_{k→∞} fnk (x),    x ∈ E,

is well-defined and measurable on E. We extend f to a measurable function
on Ω by defining it to be zero on the complement of E. We get by the triangle
inequality that for all k ≥ 1, the function fnk is dominated (on E) by the
square-integrable function |fn1 | + g. Therefore it follows from the Dominated
Convergence Theorem that ∥f − fnk ∥2 → 0 as k → ∞. Since the sequence (fn )n was
assumed to satisfy the Cauchy criterion in ∥ · ∥2 and we just showed that a
subsequence converges to f in this norm, it follows that also ∥f − fn ∥2 → 0.
(As before this is an easy ε/2-argument.) This finishes the proof.

Definition 4.7. Let (Ω, M) be a measurable space. Given two measures
µ, ν on (Ω, M), we call ν absolutely continuous with respect to µ, if for all
E ∈ M, one has that µ(E) = 0 implies ν(E) = 0.
Lemma 4.8. Let (Ω, M) be a measurable space. Let µ be an arbitrary mea-
sure and ν a finite measure. Then ν is absolutely continuous with respect to
µ if and only if the following condition holds: For every ε > 0, there exists
δ > 0, such that for every E ∈ M with µ(E) < δ, one has ν(E) < ε.12
Proof. The “if” part is clear, so we shall prove the “only if” part and assume
that ν is absolutely continuous with respect to µ. Suppose that the claimed
condition were to fail. This means that there exists some ε0 > 0 for which
no δ > 0 satisfies the required property. Let us then choose for every k ≥ 1
a set Ek ∈ M so that ν(Ek ) ≥ ε0 but µ(Ek ) < 2^{−k} . Then the set A =
lim supk→∞ Ek satisfies µ(A) = 0 by the (first) Borel–Cantelli lemma. So by
absolute continuity, we must have ν(A) = 0. However, since ν is a finite
measure, we conclude with Proposition 1.20(v) that

    ν(A) = ν( ⋂_{n≥1} ⋃_{k≥n} Ek ) = inf_{n≥1} ν( ⋃_{k≥n} Ek ) ≥ ε0 .

This is a contradiction and hence the claimed ε-δ-property must hold.


We will appeal to the following theorem without proof. It is covered in
the course on functional analysis.
Theorem 4.9 (Riesz–Fischer). Let H be a Hilbert space. Let θ : H → C be
a bounded linear functional, i.e., a C-linear map such that there exists some
constant C > 0 such that |θ(x)| ≤ C∥x∥ for all x ∈ H. Then there exists
an element y ∈ H such that θ(x) = ⟨x | y⟩ for all x ∈ H.
Theorem 4.10 (Radon–Nikodym Theorem). Let (Ω, M) be a measurable
space. Let ν and µ both be σ-finite measures on (Ω, M). Then ν is absolutely
continuous with respect to µ if and only if there exists a non-negative M-
measurable function h : Ω → [0, ∞) such that ν(E) = ∫_E h dµ for all E ∈ M.

In this context, h is called the Radon–Nikodym derivative and is often denoted h = dν/dµ. It is uniquely determined up to equality µ-almost everywhere.
Proof. The “if” part is easy. If such a function h exists, then it follows
immediately that ν must be absolutely continuous with respect to µ.
So we need to show the “only if” part and assume that ν is absolutely
continuous with respect to µ. We first claim that it is enough to consider
12
Notice that this is equivalent to saying that for any sequence En ∈ M, the
condition µ(En ) → 0 implies ν(En ) → 0.

the case where the involved measures are finite. Indeed, using that ν and
µ are σ-finite, we may find a sequence Ak ∈ M with µ(Ak ), ν(Ak ) < ∞
and Ω = ⋃_{k≥1} Ak . If we know that the claim is true for finite measures, we can employ it to the restrictions of the measures on Ak \ Ak−1 for all k ≥ 1 (where A0 := ∅) and find a function hk on Ak \ Ak−1 with the required property for E ∈ M with E ⊆ Ak \ Ak−1 . If we extend these functions by zero, we may obtain a non-negative measurable function h = Σ_{k=1}^{∞} hk : Ω → [0, ∞) satisfying the general formula ν(E) = ∫_E h dµ for all E ∈ M by virtue of the Monotone Convergence Theorem.


So let us from now on assume that both ν and µ are finite measures. Set
λ = ν + µ. By the Hölder inequality, it follows that L2 (λ) ⊆ L1 (λ) and ∥f ∥1 ≤ ∥f ∥2 √(λ(Ω)) for all f ∈ L2 (λ). Since ν ≤ λ by definition, it follows that we can view the assignment

    L2 (λ) → C,    [f ] ↦ ∫ f dν

as a bounded linear functional on a Hilbert space. By the Riesz–Fischer theorem, we therefore find a function g ∈ L2 (λ) with the property that

    ∫_Ω f dν = ∫_Ω f g dλ.

By definition of λ, this leads to the equivalent formula

    ∫_Ω f (1 − g) dν = ∫_Ω f g dµ.

We note also that for all E ∈ M with λ(E) > 0, we have in particular
    0 ≤ ν(E) = ∫_Ω χE dν = ∫_Ω g χE dλ ≤ λ(E),

so we are in a situation to apply Proposition 1.44 and conclude that g(x) ∈
[0, 1] holds λ-almost everywhere. Note that by absolute continuity, µ and
λ have the same null sets, so this holds in fact µ-almost everywhere. We
claim in addition that µ(g −1 (1)) = 0. Indeed, for E = g −1 (1) ∈ M, its
characteristic function fits into the equation
    0 = ∫_Ω χE (1 − g) dν = ∫_Ω χE g dµ = µ(E).

After redefining g on a measurable null set with respect to µ (and hence also with respect to ν, by absolute continuity), we may assume that g takes values in [0, 1). In particular, it makes sense to define h = Σ_{n=1}^{∞} g^n as a
measurable non-negative function on Ω. We claim that this is the desired
function. Indeed, we observe due to the equations above that for any E ∈ M,
we get for every n ≥ 1 that
    ∫_Ω χE (1 − g^{n+1}) dν = ∫_Ω χE (1 − g) Σ_{k=0}^{n} g^k dν = ∫_Ω χE g Σ_{k=0}^{n} g^k dµ.

If we let n → ∞, then it follows from the Monotone Convergence Theorem that the left side converges to ν(E), whereas the right side converges to ∫_E h dµ. This finishes the proof.
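On a finite measurable space the theorem is elementary, and the following Python sketch (an illustration only, with arbitrary weights) computes the Radon–Nikodym derivative pointwise as h(ω) = ν({ω})/µ({ω}) and checks that ν(E) = ∫_E h dµ for a few events.

    # Two measures on a finite set, with nu absolutely continuous w.r.t. mu
    # (mu gives positive mass to every point, so there are no null sets to worry about).
    mu = {"a": 0.2, "b": 0.3, "c": 0.5}
    nu = {"a": 0.1, "b": 0.6, "c": 0.3}

    # Radon-Nikodym derivative h = dnu/dmu, defined pointwise.
    h = {omega: nu[omega] / mu[omega] for omega in mu}

    def integral_over(E, f, measure):
        """Integral of f over the event E with respect to the given measure."""
        return sum(f[omega] * measure[omega] for omega in E)

    for E in [{"a"}, {"a", "b"}, {"a", "b", "c"}]:
        lhs = sum(nu[omega] for omega in E)          # nu(E)
        rhs = integral_over(E, h, mu)                # integral of h over E with respect to mu
        print(sorted(E), lhs, rhs)                   # the two values agree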
