Contents
1 Lebesgue’s Integration Theory
  1.1 σ-Algebras and Measurable Spaces
  1.2 Measurable functions
  1.3 Measures on σ-algebras, Measure Spaces
  1.4 Integration theory
  1.5 Convergence Theorems

3 Probability
  3.1 Probability spaces and random variables
  3.2 Independence
  3.3 Law of Large Numbers
  3.4 Central Limit Theorem
1 Lebesgue’s Integration Theory
1.1 σ-Algebras and Measurable Spaces
Definition 1.1. Let Ω be a non-empty set. A σ-algebra M on Ω is a set of
subsets of Ω satisfying

A1. ∅ ∈ M.

A2. If E ∈ M, then E^c ∈ M.

A3. If En ∈ M for all n ∈ N, then ⋃_{n∈N} En ∈ M.

In particular, a σ-algebra is closed under intersections and differences, since

E ∩ F = (E^c ∪ F^c)^c ,   E \ F = (E^c ∪ F)^c .
Example 1.3. The entire power set 2Ω of Ω is the largest possible σ-algebra
on Ω, whereas {∅, Ω} is the smallest one. We may also consider
M = {E ⊆ Ω | E or E^c is countable} ,

which is again a σ-algebra on Ω (the countable–cocountable σ-algebra).
T2. For any family of sets O ⊆ T , one has ⋃_{O∈O} O ∈ T .
Proof. In light of the previous remark, the second part of the statement
follows automatically if we can show that the metric topology on Ω has a
countable base consisting of open balls. As we assumed Ω to be separable,
choose a countable dense set D ⊆ Ω, and consider
B = {B(x, r) | x ∈ D, 0 < r ∈ Q} .
This is a countable family of open balls, and we claim that it is a base for
the metric topology. Indeed, let O ⊆ Ω be an open set. For x ∈ D ∩ O, set
Rx = {0 < r ∈ Q | B(x, r) ⊆ O}. Then evidently
⋃ {B(x, r) | x ∈ D ∩ O, r ∈ Rx} ⊆ O.
Example 1.8. Let us consider Ω = R with the standard topology. Then
half-open intervals are Borel because, for example, for a, b ∈ R with a < b,
one has
[a, b) = ⋂_{n∈N} (a − 1/n, b).
Singleton sets are also Borel since they are closed. To summarize, there are
in general many more Borel sets than open sets. Note that in light of the
previous remark, the Borel-σ-algebra on R is generated by all open intervals.
Proof. The “only if” part is trivial, so let us consider the “if” part. Consider
the set

M = {O ⊆ Ω2 | f^{−1}(O) ∈ M1} .
By assumption, we have S ⊆ M. The claim amounts to showing that M2 ⊆
M, and by assumption on S, it therefore suffices to show that M is a σ-
algebra. But this will be part of the exercise sessions.
Proof. By definition of the Borel-σ-algebra as the one generated by T , the
equivalence (i)⇔(ii) becomes a special case of Proposition 1.11. If we further-
more assume that B is a countable base of T , then the equivalence (i)⇔(iii)
is also a special case of Proposition 1.11 in light of Remark 1.6.
Theorem 1.13. Let (Ω, M) be a measurable space and (Y, d) a metric space.
We equip Y with the Borel-σ-algebra associated to the metric topology. Sup-
pose that a sequence of measurable functions fn : Ω → Y converges to a map
f : Ω → Y pointwise. Then f is measurable.
Then each of the sets Ck is open. For every x ∈ Ω, we use fn (x) → f (x) and
observe
x ∈ f^{−1}(C) ⇔ f(x) ∈ C
           ⇔ ∀ k ∈ N : f(x) ∈ Ck
           ⇔ ∀ k ∈ N : ∃ n0 ∈ N : ∀ n ≥ n0 : fn(x) ∈ Ck
           ⇔ x ∈ ⋂_{k∈N} ⋃_{n0∈N} ⋂_{n≥n0} fn^{−1}(Ck),

where the third equivalence uses fn(x) → f(x).
Notation 1.14. We will equip [0, ∞] := [0, ∞) ∪ {∞} with the topology of
the one point compactification, that is, we define a subset O ⊆ [0, ∞] to be
open when the following holds:
This topology is induced by a metric, for example by
d(x, y) = | 1/(1 + x) − 1/(1 + y) | ,

where we follow the convention 1/∞ := 0. We extend the usual addition and
multiplication from [0, ∞) to [0, ∞] by defining

x + ∞ := ∞,   x · ∞ := ∞ (x > 0),   0 · ∞ := 0.

Then the addition map + : [0, ∞] × [0, ∞] → [0, ∞] is continuous, but this is
not true for the multiplication map. We also extend the usual order relation
“≤” of numbers to [0, ∞] in the obvious way.
The triple (Ω, M, µ) is called a measure space. If µ(Ω) < ∞, we call µ a finite
measure, and if more specifically µ(Ω) = 1, we call it a probability measure
and the triple (Ω, M, µ) a probability space. If there exists a sequence En ∈
M with µ(En) < ∞ and Ω = ⋃_{n∈N} En, then we say that µ is σ-finite.
• If there is at least some E ∈ M with µ(E) < ∞, then σ-additivity
already implies µ(∅) = 0:
µ(E) = µ(E ∪ ∅ ∪ ∅ ∪ . . . )
= µ(E) + ∞ · µ(∅).
If µ(E) < ∞, then the above can only happen when µ(∅) = 0.
Example 1.17 (Counting measure). For any non-empty set Ω, we can con-
sider the σ-algebra 2Ω and define a measure µ via
µ(E) = #E if E is finite,   µ(E) = ∞ if E is infinite.
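As a small sanity check of the counting measure (our own illustration in Python, not part of the notes), the helper `counting_measure` below is a hypothetical name for µ restricted to finite sets; we check additivity on disjoint sets and monotonicity.

```python
# Hypothetical illustration of Example 1.17: the counting measure on finite
# sets, µ(E) = #E. (The notes additionally set µ(E) = ∞ for infinite E.)

def counting_measure(E):
    """µ(E) = #E for a finite set E."""
    return len(E)

E = {1, 2, 3}
F = {4, 5}  # disjoint from E

# Additivity on disjoint sets: µ(E ∪ F) = µ(E) + µ(F).
assert counting_measure(E | F) == counting_measure(E) + counting_measure(F)

# Monotonicity: E ⊆ E ∪ F implies µ(E) ≤ µ(E ∪ F).
assert counting_measure(E) <= counting_measure(E | F)
```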
Proposition 1.20. Let (Ω, M, µ) be a measure space. Then µ has the fol-
lowing extra properties:

(i) If E, F ∈ M with E ⊆ F, then µ(E) ≤ µ(F). If moreover µ(F) < ∞,
then µ(F \ E) = µ(F) − µ(E).

(ii) For all E, F ∈ M, one has µ(E ∪ F) ≤ µ(E) + µ(F).

(iii) For every sequence En ∈ M, one has µ(⋃_{n∈N} En) ≤ Σ_{n∈N} µ(En).

(iv) For every increasing sequence En ∈ M, one has µ(⋃_{n∈N} En) = sup_{n∈N} µ(En).
Proof. (i): If E ⊆ F , then we may write F = E ∪ (F \ E) as a disjoint
union, so it follows from additivity that µ(F ) = µ(E) + µ(F \ E) ≥ µ(E). If
µ(F) < ∞, we also obtain the last part of the claim by subtracting µ(F \ E).
(ii): One has E ∪ F = (E \ F ) ∪ F , which is a disjoint union. Thus
µ(E ∪ F ) = µ(E \ F ) + µ(F ) ≤ µ(E) + µ(F ).
(iii)+(iv): We construct a sequence Fn ∈ M as follows. We set F1 = E1
and Fn = En \ (⋃_{k<n} Ek) for n ≥ 2. Then the sequence Fn consists of pairwise
disjoint sets, but at the same time one has ⋃_{k≤n} Ek = ⋃_{k≤n} Fk for all n.
Since µ(Fn) ≤ µ(En) by (i), σ-additivity applied to the sequence Fn
proves (iii). For (iv), additionally assume that En is increasing. Then
µ(⋃_{n∈N} En) = Σ_{n∈N} µ(Fn)
             = sup_{n∈N} Σ_{k≤n} µ(Fk)
             = sup_{n∈N} µ(⋃_{k≤n} Fk)
             = sup_{n∈N} µ(En).
Definition 1.23. Let (Ω, M, µ) be a measure space. A subset N ⊆ Ω is
called a null set if there is some E ∈ M with N ⊆ E and µ(E) = 0.
If P (x) is a statement about elements x ∈ Ω that can either be true or
false, we say that P holds almost everywhere (w.r.t. µ) if the set of elements
x ∈ Ω for which P (x) is false is a null set.
Notation 1.24. For brevity, we may also say “P holds a.e.” (a.e.=almost
everywhere) or “P (x) holds for (µ-)almost all x”.
s^{−1}(λ) for λ ∈ s(Ω), and hence if these are measurable, then so is s^{−1}(A).

Definition 1.28. Let (Ω, M, µ) be a measure space. Let s : Ω → [0, ∞) be
a measurable simple function with canonical form s = Σ_{λ∈s(Ω)} λ · χ_{s^{−1}(λ)}. We
then define its integral as

∫_Ω s dµ := Σ_{λ∈s(Ω)} λ · µ(s^{−1}(λ)).
Lemma 1.29. Let (Ω, M, µ) be a measure space. For all k ≥ 0, numbers
α1, . . . , αk ≥ 0, and sets A1, . . . , Ak ∈ M, the function s = Σ_{j=1}^{k} αj · χ_{Aj} is a
positive measurable simple function with ∫_Ω s dµ = Σ_{j=1}^{k} αj · µ(Aj).
Proof. It is clear that these are positive measurable simple functions, so the
relevant part of the claim is the last equation. To prove it, we proceed by
induction on k. Let us first consider k = 1, so s = α1 χA1 . If α1 = 0, there is
nothing to prove, so let us assume α1 ̸= 0. Then the canonical form of s is
s = α1 χ_{A1} + 0 · χ_{A1^c}, and hence ∫_Ω s dµ = α1 µ(A1) + 0 · µ(A1^c) = α1 µ(A1).
Now suppose that ℓ ≥ 1 is any natural number and that the statement
of the Lemma is true for all k ≤ ℓ. Suppose that s = Σ_{j=1}^{ℓ+1} αj χ_{Aj} is a step
function with ℓ + 1 non-zero summands. Write s0 = Σ_{j=1}^{ℓ} αj χ_{Aj}. We will
try to find the canonical form for the sum s = s0 + α_{ℓ+1} χ_{A_{ℓ+1}} in terms of the
canonical form of s0. Let λ ∈ [0, ∞). Then
We compute its integral as follows by using our computation rules for measures
and the fact that Ω = ⊔_{λ∈s0(Ω)} s0^{−1}(λ):

∫_Ω s dµ = Σ_{λ∈(s0(Ω)+α_{ℓ+1})\s0(Ω)} λ · µ(s0^{−1}(λ − α_{ℓ+1}) ∩ A_{ℓ+1})
         + Σ_{λ∈s0(Ω)} λ · µ(s0^{−1}(λ) \ A_{ℓ+1})
       = ∫_Ω s0 dµ + α_{ℓ+1} µ(A_{ℓ+1}).
By induction hypothesis, the latter sum is equal to

Σ_{j=1}^{ℓ} αj µ(Aj) + α_{ℓ+1} µ(A_{ℓ+1}) = Σ_{j=1}^{ℓ+1} αj µ(Aj),

which finishes the induction.
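The content of Lemma 1.29 is that the value Σ_j αj µ(Aj) does not depend on the chosen representation of s. The following Python sketch (our own, not from the notes) checks this for a weighted counting measure on a four-point set, comparing the given representation against the canonical form Σ_λ λ · µ(s^{−1}(λ)).

```python
from collections import defaultdict

# Our own sketch of Lemma 1.29: for a weighted counting measure on a finite Ω,
# the integral of s = Σ_j α_j χ_{A_j} computed from the representation agrees
# with the integral computed from the canonical form s = Σ_λ λ · χ_{s^{-1}(λ)}.

Omega = {"a", "b", "c", "d"}
weights = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 1.5}  # µ({ω}) = weights[ω]

def mu(E):
    return sum(weights[w] for w in E)

alphas = [3.0, 1.0]
As = [{"a", "b"}, {"b", "c"}]  # the sets may overlap

def s(w):
    return sum(a for a, A in zip(alphas, As) if w in A)

# Integral from the given representation: Σ_j α_j µ(A_j).
integral_repr = sum(a * mu(A) for a, A in zip(alphas, As))

# Integral from the canonical form: Σ_λ λ · µ(s^{-1}(λ)).
level_sets = defaultdict(set)
for w in Omega:
    level_sets[s(w)].add(w)
integral_canon = sum(lam * mu(E) for lam, E in level_sets.items())

assert abs(integral_repr - integral_canon) < 1e-12
```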
Proof. Both (i) and (ii) are straightforward consequences of Lemma 1.29.
For (iii), suppose that s and t are given as
s = Σ_{j=1}^{k} αj χ_{Aj} ,   t = Σ_{i=1}^{ℓ} βi χ_{Bi}
Indeed it is clear that ν(∅) = 0. For σ-additivity, let En ∈ M be a sequence
of pairwise disjoint sets, and observe
ν(⋃_{n∈N} En) = µ(A ∩ ⋃_{n∈N} En)
             = µ(⋃_{n∈N} (A ∩ En))
             = Σ_{n∈N} µ(A ∩ En)
             = Σ_{n∈N} ν(En).
Remark 1.33. We should first convince ourselves that the new definition of
the integral does not contradict the old one in the case where f is assumed
to be a step function. But indeed, s = f is the largest step function with
the property that s ≤ f, so by Proposition 1.30(iii) the above supremum is
indeed attained at the value ∫_Ω s dµ in the sense of Definition 1.28.
Secondly, we remark that the above definition a priori makes sense even
when f is not assumed to be measurable. However, we will see in the exercises
that the resulting notion of integral will have various undesirable properties
when evaluated on non-measurable functions, for example not being additive.
Proof. We already know that the “if” part is true. For the “only if” part,
assume that f is measurable. We claim that it suffices to consider the special
case f = id as a function [0, ∞] → [0, ∞]. Indeed, if we can realize id =
supn∈N tn for an increasing sequence of positive measurable step functions tn ,
then
f = id ◦ f = lim_{n→∞} tn ◦ f = sup_{n∈N} (tn ◦ f),

∫_Ω f dµ = sup_{n∈N} ∫_Ω sn dµ.
Proof. Since we have by assumption sn ≤ f for all n, it follows from the
definition of the integral that sup_{n∈N} ∫_Ω sn dµ ≤ ∫_Ω f dµ. For the reverse
inequality, it suffices to show that c ∫_Ω s dµ ≤ sup_{n∈N} ∫_Ω sn dµ for every
positive measurable step function s ≤ f and all constants 0 < c < 1. For a
fixed such function s and constant c, we consider the sequence of sets
En = {x ∈ Ω | sn (x) ≥ cs(x)} .
we define a new measure ν via ν(E) = ∫_Ω s χE dµ. Then it follows from
Proposition 1.20(iv) that

c ∫_Ω s dµ = c ν(Ω)
           = c · sup_{n∈N} ν(En)
           = sup_{n∈N} ∫_Ω c s χ_{En} dµ
           ≤ sup_{n∈N} ∫_Ω sn χ_{En} dµ
           ≤ sup_{n∈N} ∫_Ω sn dµ.
Proposition 1.37. Let (Ω, M, µ) be a measure space. Let f, g : Ω → [0, ∞]
be measurable functions. Then
(i) ∫_Ω (c · f) dµ = c · ∫_Ω f dµ for all constants c ≥ 0.

(ii) ∫_Ω (f + g) dµ = ∫_Ω f dµ + ∫_Ω g dµ.

(iii) If f ≤ g, then ∫_Ω f dµ ≤ ∫_Ω g dµ.
Proof. Both (i) and (iii) are trivial consequences of the definition of the inte-
gral. For (ii), take sequences sn and tn of positive measurable step functions
that increase pointwise to f and g, respectively. Then sn + tn increases
pointwise to f + g. Hence it follows from Proposition 1.36 that
∫_Ω (f + g) dµ = sup_{n∈N} ∫_Ω (sn + tn) dµ
             = (sup_{n∈N} ∫_Ω sn dµ) + (sup_{n∈N} ∫_Ω tn dµ)
             = ∫_Ω f dµ + ∫_Ω g dµ.
0 by definition.
Now assume that f is integrable. Consider E = f^{−1}(∞) ∈ M and sn =
n χE. Then sn ≤ f for all n ∈ N, and hence n µ(E) = ∫_Ω sn dµ ≤ ∫_Ω f dµ.
Since this holds for all n ≥ 1 and we assume the right-hand side to be finite,
this is only possible for µ(E) = 0. But this is precisely what it means for
f(x) < ∞ to hold for µ-almost all x ∈ Ω. This finishes the proof.
Remark 1.39. Let Ω be a set and f : Ω → C a function. Then it is
always possible to decompose f into f = u + i · v for real-valued functions
u, v : Ω → R, the real and imaginary parts of f . Furthermore we can always
uniquely write
u = u+ − u− , v = v + − v − , u+ u− = 0, v + v − = 0,
where u+, u−, v+, v− take values in [0, ∞).4 We then have u±, v± ≤ |f|, but
also |f| ≤ u+ + u− + v+ + v− by the triangle inequality.
Furthermore, if Ω carries a σ-algebra for which f becomes measurable,
then the functions u+ , u− , v + , v − are also measurable. In what follows we wish
to exploit such decompositions to linearly extend the integral construction.
Definition 1.40. Let (Ω, M, µ) be a measure space. We say that a mea-
surable function f : Ω → C is integrable, if |f | is integrable in the sense of
Definition 1.32. In this case we define the integral as
∫_Ω f dµ = (∫_Ω u+ dµ − ∫_Ω u− dµ) + i (∫_Ω v+ dµ − ∫_Ω v− dµ),
where we use the decomposition from Remark 1.39. The set of all such
functions is denoted L1 (Ω, M, µ) or L1 (µ).
Theorem 1.41. L1(Ω, M, µ) is a complex vector space with the usual oper-
ations. Moreover the integral

L1(Ω, M, µ) → C,   f ↦ ∫_Ω f dµ

is a linear map.
Proof. The fact that L1 is a vector space is left as an exercise. Let us proceed
to prove linearity of the integral.
Let us first assume that f, g ∈ L1 are real-valued. Set h = f + g and
observe that h+ − h− = f+ − f− + g+ − g−, or equivalently h+ + f− + g− =
h− + f+ + g+. All of these summands are positive integrable functions, so it
follows by Proposition 1.37 that
∫_Ω h+ dµ + ∫_Ω f− dµ + ∫_Ω g− dµ = ∫_Ω h− dµ + ∫_Ω f+ dµ + ∫_Ω g+ dµ.
All of these summands are finite, and hence we can rearrange this equation
to
∫_Ω h+ dµ − ∫_Ω h− dµ = (∫_Ω f+ dµ − ∫_Ω f− dµ) + (∫_Ω g+ dµ − ∫_Ω g− dµ),
4 For example, u+ is the composition of u with the function R → [0, ∞) that is the
identity on [0, ∞) and sends every negative number to zero.
15
and hence ∫_Ω h dµ = ∫_Ω f dµ + ∫_Ω g dµ. It is also clear from the definition
that ∫_Ω (f + ig) dµ = ∫_Ω f dµ + i ∫_Ω g dµ. These two equations together imply
linearity of the integral.

For the inequality |∫_Ω f dµ| ≤ ∫_Ω |f| dµ, note that after multiplying f
with a scalar of modulus one, we may assume without loss of generality that
∫_Ω f dµ ≥ 0. Write f = u + iv for real-valued functions u, v. Then

0 ≤ ∫_Ω f dµ = ∫_Ω u dµ ≤ ∫_Ω u+ dµ ≤ ∫_Ω |f| dµ,

where the last step uses u+ ≤ |f|.
(ii) If f is integrable and g : Ω → C is another measurable function such
that f = g a.e., then g is integrable and ∫_Ω f dµ = ∫_Ω g dµ.
The latter would imply that the number (1/µ(f^{−1}(B))) ∫_Ω g dµ is in B, which
Proposition 1.45. Let (Ω, M, µ) be a measure space. Consider
Proof. By Proposition 1.35, for every n ≥ 1, we find an increasing sequence
of measurable step functions sn,k : Ω → [0, ∞] such that fn = supk≥1 sn,k .
Not only that, but the construction of these functions in the proof allows us
to assume that |fn (x) − sn,k (x)| ≤ 2−k whenever fn (x) ≤ k, and sn,k (x) = k
whenever fn (x) ≥ k.
Since (sn,k )k is increasing, the sequence of positive measurable step func-
tions tn = maxj≤n sj,n is increasing. Since fn is increasing, it follows that
tn ≤ fn for all n. We claim f = supn≥1 tn . Indeed, let x ∈ Ω be given. If
f (x) = ∞, then fn (x) → ∞, so tn (x) ≥ sn,n (x) ≥ min(n, fn (x) − 2−n ) → ∞.
If f (x) < ∞, then in particular f (x) ≤ n for all large enough n, and hence
|f (x) − tn (x)| ≤ |f (x) − sn,n (x)|
≤ |fn (x) − f (x)| + |fn (x) − sn,n (x)|
≤ |fn (x) − f (x)| + 2−n → 0.
We appeal to Proposition 1.36 and see that
∫_Ω f dµ = sup_{n≥1} ∫_Ω tn dµ ≤ sup_{n≥1} ∫_Ω fn dµ,

where the inequality uses tn ≤ fn.
The “≥” relation is on the other hand clear from the fact that the integral
is monotone. This finishes the proof.
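A minimal discrete instance of the Monotone Convergence Theorem can be checked by hand (our own illustration, not from the notes): on Ω = N with the counting measure, take f(k) = 2^{−k} and fn = f · χ_{{1,…,n}}; the fn increase pointwise to f and their integrals Σ_{k≤n} 2^{−k} increase to ∫_Ω f dµ = 1.

```python
from fractions import Fraction

# Our own discrete illustration of the Monotone Convergence Theorem on Ω = N
# with counting measure: f(k) = 2^{-k}, f_n = f · χ_{{1,...,n}}.

def integral_fn(n):
    """∫ f_n dµ = Σ_{k=1}^{n} 2^{-k}, computed with exact arithmetic."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

integrals = [integral_fn(n) for n in range(1, 20)]

# The sequence of integrals is increasing ...
assert all(a <= b for a, b in zip(integrals, integrals[1:]))
# ... and approaches sup_n ∫ f_n dµ = ∫ f dµ = 1 from below.
assert integrals[-1] == 1 - Fraction(1, 2**19)
assert all(I < 1 for I in integrals)
```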
Lemma 1.48 (Fatou). Let (Ω, M, µ) be a measure space and fn : Ω → [0, ∞]
be a sequence of measurable functions. Then lim inf n→∞ fn is also measurable,
and we have

∫_Ω (lim inf_{n→∞} fn) dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.
Proof. Denote gk = inf_{n≥k} fn, and recall that lim inf_{n→∞} fn = sup_{k≥1} gk.
We thus see that lim inf_{n→∞} fn is a measurable function. If n ≥ k, then
evidently gk ≤ fn, so ∫_Ω gk dµ ≤ ∫_Ω fn dµ. In particular, this is true for
arbitrarily large n, so ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ. Using the Monotone
Convergence Theorem, we thus see

∫_Ω (lim inf_{n→∞} fn) dµ = sup_{k≥1} ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.
Proof. We have that f is measurable (Theorem 1.13) and |f | ≤ g, hence
f ∈ L1 (Ω, M, µ). Furthermore, we have |f − fn | ≤ 2g, and so we may apply
Fatou’s Lemma to the sequence 2g − |f − fn | to get
∫_Ω 2g dµ ≤ lim inf_{n→∞} ∫_Ω (2g − |f − fn|) dµ = ∫_Ω 2g dµ − lim sup_{n→∞} ∫_Ω |f − fn| dµ.

Since ∫_Ω 2g dµ is finite and lim sup_{n→∞} ∫_Ω |f − fn| dµ ≥ 0, this forces
lim sup_{n→∞} ∫_Ω |f − fn| dµ = 0. As a consequence we obtain

|∫_Ω f dµ − ∫_Ω fn dµ| ≤ ∫_Ω |f − fn| dµ → 0   (n → ∞).
Remark 1.51. The Dominated Convergence Theorem really only holds for
sequences of functions, and its analogous generalizations for more general
families of functions (such as nets) are false. A counterexample is discussed
in the exercise sessions.
function g : Ω → [0, ∞] as g(x) = Σ_{k=1}^{∞} |f_{nk}(x) − f_{nk+1}(x)|. Then it follows

f(x) := f_{n1}(x) + Σ_{k=1}^{∞} [f_{nk+1}(x) − f_{nk}(x)] = lim_{k→∞} f_{nk}(x),   x ∈ E,
Theorem 1.54. Let (Ω1 , M1 , µ1 ) be a measure space, Ω2 a non-empty set,
and φ : Ω1 → Ω2 a map. We define M2 = φ∗ M1 and µ2 = φ∗ µ1 in the above
sense.
In other words, the claim holds for functions of the form f = χE . By linearity
of the integral, the desired equation holds for all positive measurable step
functions in place of f . Now let f be as general as in the statement, and
write f = supn∈N sn for an increasing sequence of positive measurable step
functions sn : Ω2 → [0, ∞), using Proposition 1.35. Clearly we also have
f ◦ φ = supn∈N sn ◦ φ. Then by Proposition 1.36, we see that
∫_{Ω2} f dµ2 = sup_{n∈N} ∫_{Ω2} sn dµ2 = sup_{n∈N} ∫_{Ω1} sn ◦ φ dµ1 = ∫_{Ω1} f ◦ φ dµ1.
Remark 1.55. In the above theorem, we pushed forward the measure space
structure from Ω1 to get one on Ω2 which makes the statement of the theorem
true. It may of course happen that we have an a priori given measure space
(Ω2 , M2 , µ2 ) and a measurable map φ : Ω1 → Ω2 with the property that
µ1 (φ−1 (E)) = µ2 (E) for all E ∈ M2 . Convince yourself that the statement
of the theorem will still be true!
2 Carathéodory’s Construction of Measures
2.1 Measures on Semi-Rings and Rings
Definition 2.1. Let Ω be a non-empty set. A semi-ring on Ω is a set of
subsets A ⊆ 2Ω such that
(1) ∅ ∈ A.
(2) If E, F ∈ A, then E ∩ F ∈ A.

(3) If E, F ∈ A, then E \ F is a finite disjoint union of sets in A.
Example 2.2. The set A of all subsets of Ω with at most one element forms
a semi-ring. More interestingly, if n ≥ 2, then the set of half-open cubes

A = {(a1, b1] × · · · × (an, bn] | aj, bj ∈ R, aj ≤ bj}

is a semi-ring on Rn as well.
(1) ∅ ∈ R.
(2) If E, F ∈ R, then E ∪ F ∈ R.
(3) If E, F ∈ R, then E \ F ∈ R.
Lemma 2.5. Let A be a semi-ring over Ω. Then the smallest ring R over
Ω containing A is given as

R = {E1 ∪ · · · ∪ En | n ∈ N, E1, . . . , En ∈ A pairwise disjoint} .
By definition, we find pairwise disjoint sets E1, . . . , En ∈ A and pairwise
disjoint sets F1, . . . , Fm ∈ A such that E = ⋃_{j=1}^{n} Ej and F = ⋃_{ℓ=1}^{m} Fℓ. Then

E \ F = ⋃_{j=1}^{n} (Ej \ ⋃_{ℓ=1}^{m} Fℓ) = ⋃_{j=1}^{n} ((Ej \ F1) \ F2 \ . . . \ Fm)
and this union is a disjoint union. Analogously we have Fℓ = ⋃_{j=1}^{n} (Ej ∩ Fℓ).

= Σ_{j=1}^{m} µ0(⋃_{n≥1} (Aj ∩ En))
= Σ_{j=1}^{m} µ0(⋃_{n≥1} ⋃_{k≤mn} (Aj ∩ An,k))
= Σ_{j=1}^{m} Σ_{n=1}^{∞} Σ_{k=1}^{mn} µ0(Aj ∩ An,k)
OM1. ν(∅) = 0.
Proposition 2.10. Let Ω be a non-empty set, R a ring on Ω, and µ a
measure on R. Define for every subset E ⊆ Ω the value

µ∗(E) = inf { Σ_{n=1}^{∞} µ(En) | En ∈ R is a sequence with E ⊆ ⋃_{n≥1} En } .
nothing to show, so let us assume that it is finite. Let ε > 0. For every
n ≥ 1, we may by definition find a sequence En,k ∈ R with An ⊆ ⋃_{k≥1} En,k
ν(A ∩ (E ∪ F ) ∩ E) + ν(A ∩ (E ∪ F ) \ E) = ν(A ∩ E) + ν(A ∩ F ). So inserting
A = E ∪ F yields that ν is finitely additive on M.
From E ∪ F ∈ M it immediately follows that M is closed under intersec-
tions and differences. Hence M becomes a σ-algebra if we can show that for
all sequences of pairwise disjoint sets En ∈ M, we have E = ⋃_{n≥1} En ∈ M.
Since m was arbitrary, we get ν(A) ≥ Σ_{n=1}^{∞} ν(A ∩ En) + ν(A \ E). On the
other hand, the equality A = (A \ E) ∪ (A ∩ E) = (A \ E) ∪ ⋃_{n≥1} (A ∩ En) together
with σ-subadditivity yields
In particular we get the equality ν(A) = ν(A ∩ E) + ν(A \ E), which yields
E ∈ M as A was arbitrary. Furthermore, if we insert A = E, then we
also have ν(E) = Σ_{n=1}^{∞} ν(En), which shows that ν is σ-additive on M.
Definition 2.13. A measure space (Ω, M, µ) is called complete if, for all sets
E ⊆ F ⊆ Ω with F ∈ M and µ(F) = 0, one has E ∈ M.
Theorem 2.15. Let R be a ring on a set Ω, and µ a measure on R. Then
the σ-algebra M of µ∗ -measurable sets contains R, and we have µ∗ |R = µ.
Proof. We see right away for all E ∈ R that µ∗ (E) ≤ µ(E) since we can write
E = E ∪ ∅ ∪ ∅ ∪ . . . . On the other hand, if En ∈ R is any sequence of sets
with E ⊆ ⋃_{n≥1} En, then it follows from σ-subadditivity and monotonicity of
µ that

µ(E) = µ(⋃_{n≥1} (E ∩ En)) ≤ Σ_{n=1}^{∞} µ(E ∩ En) ≤ Σ_{n=1}^{∞} µ(En).
By taking the infimum over all possible choices of such sequences, we arrive
at µ(E) = µ∗ (E). Since E ∈ R was arbitrary, we have just shown µ∗ |R = µ.
Now we need to show that every set E ∈ R is µ∗ -measurable. Let A ⊆ Ω
be any set. Since we always have µ∗ (A) ≤ µ∗ (A ∩ E) + µ∗ (A \ E), we may
assume without loss of generality that µ∗(A) < ∞. Let An ∈ R be a sequence
of sets with A ⊆ ⋃_{n≥1} An. Then
µ∗(A) ≤ µ∗(A ∩ E) + µ∗(A \ E)
       ≤ Σ_{n=1}^{∞} (µ(An ∩ E) + µ(An \ E))
       = Σ_{n=1}^{∞} µ(An).
If we take the infimum over all possible such sequences An , then the right
side approaches the value µ∗ (A), and hence we get the equality µ∗ (A) =
µ∗ (A ∩ E) + µ∗ (A \ E). Since A ⊆ Ω and E ∈ R were arbitrary, this finishes
the proof.
Definition 2.16. Let A be a semi-ring over Ω, and µ a measure on A. We
say that µ is σ-finite, if there is a sequence En ∈ A with µ(En ) < ∞ and
Ω = ⋃_{n≥1} En.
Note that all these summands are finite, and we know from above µ1 (Fn ∩
A) ≤ µ∗ (Fn ∩ A). It follows that necessarily µ1 (Fn ∩ A) = µ∗ (Fn ∩ A). By
taking the supremum over n, we conclude µ1 (A) = µ∗ (A).
Corollary 2.18. Let A be a semi-ring over a set Ω, and µ0 a measure on
A. Let M be the σ-algebra generated by A. Then there exists a measure
µ : M → [0, ∞] extending µ0 . If µ0 is σ-finite, then µ is unique.
Proof. We note that if R is the ring generated by A, then M is also the
σ-algebra generated by R. So this is a direct consequence of Proposition 2.7,
Proposition 2.10, Theorem 2.12, and Theorem 2.15. In the case of µ0 being
σ-finite, uniqueness of µ is exactly Theorem 2.17.
Since A is arbitrary, it follows that φ(E) is µ∗ -measurable. The reverse
argument shows that if φ(E) is µ∗ -measurable, then E was µ∗ -measurable to
begin with. This finishes the proof.
empty sets if necessary, we may assume bn ≤ an+1 for all n < M. This
leads to

Σ_{n=1}^{M} µ0(En) = Σ_{n=1}^{M} (bn − an) ≤ (bM − aM) + Σ_{n=1}^{M−1} (an+1 − an) = bM − a1 ≤ b − a = µ0(E).
The right-hand side is an open covering of the compact set on the left-hand
side, and hence there is some N ≥ 1 such that [a + ε, b] ⊂ ⋃_{n=1}^{N} (an, bn + 2^{−n} ε). We
change the ordering of the intervals appearing in this union by the following
inductive procedure: Choose k1 ∈ {1, . . . , N} to be the index so that

a_{k1} = max { aj | a + ε ∈ (aj, bj + 2^{−j} ε) } .

If b < b_{k1} + 2^{−k1} ε, then the procedure stops here. Otherwise, choose k2 ∈
{1, . . . , N} to be the index so that

a_{k2} = max { aj | b_{k1} + 2^{−k1} ε ∈ (aj, bj + 2^{−j} ε) } .
If b < bk2 +2−k2 ε, then the procedure stops here. Otherwise one continues in-
ductively until the procedure stops after L ≤ N steps. This yields an injective
8 By convention, we set (a, b] := ∅ if a ≥ b.
map k : {1, . . . , L} → {1, . . . , N} such that [a + ε, b] ⊂ ⋃_{n=1}^{L} (a_{kn}, b_{kn} + 2^{−kn} ε)
and such that for all n < L, we have akn+1 < bkn + 2−kn ε. From this we can
deduce (with the help of the first part of the proof)
b − a − ε ≤ Σ_{n=1}^{L} (b_{kn} + 2^{−kn} ε − a_{kn})
         ≤ ε + Σ_{n=1}^{N} (bn − an)
         ≤ ε + b − a.

Since ε > 0 was arbitrary, this finally implies µ0(E) = Σ_{n=1}^{∞} µ0(En) and
finishes the proof.
In other words, we have λ(E + t) = λ(E) for every Lebesgue-measurable set
E ∈ L. Since t ∈ R is arbitrary, this shows the claim.
Lastly, the measure µ0 on A is σ-finite. Since A generates the Borel σ-
algebra, the uniqueness of the measure follows from Theorem 2.17. On the
other hand, if we have a translation-invariant measure µ with µ((0, 1]) = 1,
then it is easy to see that µ((0, 1/n]) = 1/n for all n ≥ 1, which one can use
to show that µ agrees with λ on all sets in A with rational endpoints. If
a, b ∈ R with a < b, pick some number c ∈ Q strictly between them. Choose
decreasing sequences of rational numbers an , bn ∈ Q such that an → a and
bn → b. Then
(a, c] = ⋃_{n≥1} (an, c],   (c, b] = ⋂_{n≥1} (c, bn].
Proof. From these properties it follows that if such a function F exists at all,
then it has to be given by the formula

F(t) = µ((0, t]) if t > 0,   F(t) = 0 if t = 0,   F(t) = −µ((t, 0]) if t < 0.
9 This is not so easy to see, but an example is discussed here, for whoever is interested:
https://www.math3ma.com/blog/lebesgue-but-not-borel.
We claim that this function has indeed the right properties. Let a, b ∈ R
with a < b. We aim to show µ((a, b]) = F (b) − F (a). If a ≥ 0, then
µ((a, b]) = µ((0, b] \ (0, a]) = µ((0, b]) − µ((0, a]) = F (b) − F (a). If b < 0,
we can prove this analogously. If a < 0 ≤ b, then we have µ((a, b]) =
µ((a, 0] ∪ (0, b]) = µ((a, 0]) + µ((0, b]) = F (b) − F (a).
The fact that F is increasing follows immediately from the fact that µ
has nonnegative values. The right-continuity follows from

F(t + εn) − F(t) = µ((t, t + εn]) → µ(∅) = 0

for any sequence εn > 0 with εn → 0. Here we used the continuity property
of µ as a measure with respect to countable decreasing intersections.
Σ_{n=1}^{M} µ(En) = Σ_{n=1}^{M} (F(bn) − F(an))
               ≤ F(bM) − F(aM) + Σ_{n=1}^{M−1} (F(an+1) − F(an))
               = F(bM) − F(a1)
               ≤ F(b) − F(a) = µ(E).
For the reverse inequality, choose ε > 0. We use right-continuity to pick
for every n ≥ 1 a small δn > 0 such that F (bn + δn ) − F (bn ) ≤ 2−n ε. Then
[a + ε, b] ⊂ E = ⋃_{n=1}^{∞} En ⊂ ⋃_{n=1}^{∞} (an, bn + δn).

Proceeding as in the proof of Proposition 2.21, we may deduce
F(b) − F(a + ε) ≤ ε + Σ_{n=1}^{∞} (F(bn) − F(an)).
then one can show that we recover the Dirac measure µ = δa. For a ≤ 0 one
also recovers this measure with the function

F(t) = −1 if t < a,   F(t) = 0 if t ≥ a.
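The recipe µ((a, b]) = F(b) − F(a) from this example can be checked directly (our own sketch, not from the notes): with the jump function above, the interval (x, y] receives mass 1 exactly when it contains the point a.

```python
# Our own check of the Lebesgue-Stieltjes recipe µ((x, y]) = F(y) - F(x) for
# the jump function of the example: F(t) = -1 for t < a and F(t) = 0 for
# t >= a, which should recover the Dirac measure δ_a.

a = -2.0  # any a <= 0, as in the example

def F(t):
    return -1.0 if t < a else 0.0

def mu_interval(x, y):
    """µ((x, y]) = F(y) - F(x) for x < y."""
    return F(y) - F(x)

# δ_a((x, y]) is 1 exactly when x < a <= y.
assert mu_interval(a - 1, a) == 1.0      # a ∈ (a-1, a]
assert mu_interval(a, a + 1) == 0.0      # a ∉ (a, a+1]
assert mu_interval(a - 3, a + 3) == 1.0
assert mu_interval(a + 1, a + 2) == 0.0
```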
Proof. This is proved in the exercise sessions.
λ(d) = λ ⊗ λ ⊗ · · · ⊗ λ   (d times)
with respect to the measure space (R, L, λ). The Lebesgue σ-algebra L(d)
on Rd is the one consisting of all λ(d) -measurable sets in the sense of Defi-
nition 2.11, which contains the Borel σ-algebra. If the dimension d is clear
from context, we may sometimes slightly abuse notation and just write λ for
the Lebesgue measure on Rd .
Theorem 2.33 (Tonelli; see exercises). Let (Ωi , Mi , µi ) be two σ-finite mea-
sure spaces for i = 1, 2. Let f : Ω1 × Ω2 → [0, ∞] be a M1 ⊗ M2 -measurable
function. Denote fx = f (x, _) : Ω2 → [0, ∞] and f y = f (_, y) : Ω1 → [0, ∞].
Then:
(a) For every x ∈ Ω1, the function fx is M2-measurable.

(b) For every y ∈ Ω2, the function f y is M1-measurable.

(c) The function Ω1 → [0, ∞] given by x ↦ ∫_{Ω2} fx dµ2 is M1-measurable.
Theorem 2.34 (Fubini). Let (Ωi , Mi , µi ) be two σ-finite measure spaces for
i = 1, 2. Let f : Ω1 × Ω2 → C be a M1 ⊗ M2 -measurable function. Denote
fx = f (x, _) : Ω2 → C and f y = f (_, y) : Ω1 → C. Then the following are
equivalent:
(A) ∫_{Ω1×Ω2} |f| d(µ1 ⊗ µ2) < ∞;

(B) ∫_{Ω1} (∫_{Ω2} |fx| dµ2) dµ1(x) < ∞;

(C) ∫_{Ω2} (∫_{Ω1} |f y| dµ1) dµ2(y) < ∞.
If any (or every) one of these statements holds, then we have that

∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} (∫_{Ω2} fx dµ2) dµ1(x) = ∫_{Ω2} (∫_{Ω1} f y dµ1) dµ2(y).
Proof. It follows from Tonelli’s theorem that the integrals appearing in (A),
(B), and (C) are always the same, so their finiteness conditions are indeed equivalent.
Now let us assume that these statements are true. By splitting f up into its
real and imaginary parts, let us assume without loss of generality that f is
a real function.
By part (c) in Tonelli’s theorem and Proposition 1.38, it follows that the
set E of all x ∈ Ω1 , for which fx is integrable, is in M1 and its complement
is a null set. Let f = f + − f − be the canonical decomposition into positive
functions as in Remark 1.39. Then it is very easy to see that (f + )x = (fx )+
and (f − )x = (fx )− . So fx+ and fx− are both integrable whenever x ∈ E.
Moreover the map

E ∋ x ↦ ∫_{Ω2} fx dµ2 = ∫_{Ω2} fx+ dµ2 − ∫_{Ω2} fx− dµ2
The remaining equality follows exactly in the same way by exchanging the
roles of Ω1 and Ω2 .
Proof. Clearly any measure must satisfy this property, so the “only if” part
is clear.
For the “if” part, we may first take A1 = Ω and An = ∅ for all n ≥
2, which immediately implies µ(∅) = 0. We need to show that µ is σ-
additive. Let Bn ∈ A be a sequence of pairwise disjoint sets such that
B = ⋃_{n≥1} Bn ∈ A. As A is a semi-ring, we have Ω \ B = A1 ∪ · · · ∪ Aℓ
the one hand that 1 = µ(B) + Σ_{n=1}^{ℓ} µ(An). On the other hand, if we set
Aℓ+k = Bk for k ≥ 1, then the sequence (An)_{n≥1} defines a pairwise disjoint
covering of Ω, so 1 = Σ_{n=1}^{∞} µ(An). Comparing these two equations, we see
that µ(B) = Σ_{n>ℓ} µ(An) = Σ_{n=1}^{∞} µ(Bn), which shows our claim.
A = { ∏_{i∈I} Ei | Ei ∈ Mi for all i ∈ I, Ei = Ωi for all but finitely many i ∈ I } .
Theorem 2.37. Adopt the notation from the above definition. Then A is a
semi-ring and µ is a measure on A. In particular, it extends uniquely to a
probability measure µ on ⊗_{i∈I} Mi. One also denotes µ = ⊗_{i∈I} µi.
Proof. The “in particular” part is due to Corollary 2.18. By definition, every
set in A is a product of subsets of the spaces Ωi which are proper subsets
only over finitely many indices. In particular, given any sequence An , there
are countably many indices in I so that over every other index i ∈ I, the
projection of every set An to the i-th coordinate yields Ωi . Considering the
semi-ring axioms for A and the axioms of being a measure for µ, we can see
that it is enough to consider the case where I is countable. As we already
know that the claim is true if I is finite, we may from now on assume I = N.
The fact that A is a semi-ring now follows directly from the exercises. We
hence need to show that µ is a measure on A, where our goal is to appeal
to the condition in the above lemma. Suppose that An ∈ A is a sequence
of pairwise disjoint sets with Ω = ⋃_{n≥1} An. It is our intention to show
Σ_{n=1}^{∞} µ(An) = 1.
For each n ≥ 1 let us write An = ∏_{i=1}^{∞} An,i and pick in ≥ 1 such that
An,i = Ωi for all i > in, that is, An = ∏_{i≤in} An,i × ∏_{i>in} Ωi.
Indeed, if n = m, then all the involved factors are equal to one, so the
equation holds. Assume n ̸= m. We observe χ_{Ak}((ωi)i) = ∏_{i≤ik} χ_{Ak,i}(ωi). As
1 = Σ_{k=1}^{∞} χ_{Ak}, we conclude for (ωi)i ∈ Am that 0 = ∏_{i≤in} χ_{An,i}(ωi). So either
if in > im , then every tuple of the form (ω1 , . . . , ωim , αim +1 , . . . ) is also in Am ,
so analogously
0 = ∏_{i≤im} χ_{An,i}(ωi) · ∏_{i=im+1}^{in} χ_{An,i}(αi).
have

Σ_{n=1}^{∞} µ(An) = Σ_{n=1}^{∞} µ1(An,1) · ∏_{i=2}^{∞} µi(An,i)
               = ∫_{Ω1} Σ_{n=1}^{∞} χ_{An,1}(ω1) · ∏_{i=2}^{∞} µi(An,i) dµ1(ω1)
for all k ≥ 1. However, since ω ∈ Am for some m, it follows from our previous
observation for k = im that precisely one of these summands is 1 and the
others are zero. This leads to a contradiction. Hence we have indeed that
Σ_{n=1}^{∞} µ(An) = 1.
3 Probability
3.1 Probability spaces and random variables
Definition 3.1. A probability space is a measure space (Ω, M, µ) with
µ(Ω) = 1. In this context,
• µ is called a probability measure,
• the value µ(A) is called the probability of (the event) A. (To be even
more suggestive, one often writes P instead of µ.)
The following easy special case illustrates how the most elementary prob-
abilistic experiments, such as tossing a coin only finitely many times, can be
thought of in this framework. We omit the easy proof.
Proposition 3.2. Let Ω be a non-empty countable set and M = 2Ω. Let
p : Ω → [0, 1] be any map with the property Σ_{ω∈Ω} p(ω) = 1. Then the
assignment P(A) = Σ_{ω∈A} p(ω) defines a probability measure P on (Ω, M).
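The proposition can be illustrated concretely (our own sketch, not part of the notes) with the geometric weights p(n) = 2^{−n} on Ω = {1, 2, 3, …}, using exact rational arithmetic for finite events.

```python
from fractions import Fraction

# Our own illustration of Proposition 3.2: weights p(ω) with Σ p(ω) = 1
# determine a probability measure P(A) = Σ_{ω∈A} p(ω). Here p(n) = 2^{-n}
# on Ω = {1, 2, 3, ...}; we evaluate P on finite events only.

def p(n):
    return Fraction(1, 2**n)

def P(A):
    """P(A) = Σ_{ω∈A} p(ω) for a finite event A ⊆ N."""
    return sum((p(n) for n in A), Fraction(0))

A = {1, 3}
B = {2, 4}

# Additivity on disjoint events, and P takes values in [0, 1].
assert P(A | B) == P(A) + P(B)
assert 0 <= P(A | B) <= 1
assert P(A) == Fraction(1, 2) + Fraction(1, 8)
```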
For more involved probabilistic questions, the sample space is not neces-
sarily countable, which makes it somewhat more difficult to come up with
the right choices of events and probability measures. The following can be
seen as a model for tossing a coin infinitely often.
Example 3.3 (infinite coin tossing). For each toss, a coin can only come
up as heads or tails, which we conveniently denote as the outcomes 0 and 1.
Since we want to model tossing the coin infinitely often, the outcomes are
sequences having value 0 or 1. This gives rise to the sample space Ω = {0, 1}N .
A rather obvious example of an event is when the first k tosses are equal
to some k-tuple ω ∈ {0, 1}^k. The associated subset of Ω is given as Aω =
{ω} × {0, 1}^{N>k}. For example, A1 is the event where the first coin toss comes
up as tails, or A0,1,1 is the event where the first three tosses come up as heads
→ tails → tails. Let M be the event σ-algebra generated by all such sets.
We note that a lot of other natural choices for events are automatically in
M. For example, the event that the n-th coin toss comes up as heads is given
by

{0, 1}^{n−1} × {0} × {0, 1}^{N>n} = ⋃_{ω∈{0,1}^{n−1}} A(ω,0).
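Since cylinder events only depend on finitely many tosses, their probabilities can be computed by finite enumeration (our own sketch, not part of the notes): under the fair product measure, fixing the first k tosses gives probability 2^{−k}, and "the n-th toss is heads" has probability 1/2.

```python
from itertools import product

# Our own finite-dimensional sanity check for Example 3.3: enumerate the first
# n tosses under the uniform (fair) product measure on {0, 1}^n.

k, n = 3, 4
outcomes = list(product((0, 1), repeat=n))  # all length-n toss sequences
prob = 1 / len(outcomes)                    # each is equally likely

omega = (0, 1, 1)  # heads, tails, tails: the event A_{0,1,1}
P_A_omega = sum(prob for o in outcomes if o[:k] == omega)
assert abs(P_A_omega - 2**-k) < 1e-12       # = 1/8

# The event "the n-th toss is heads (= 0)" has probability 1/2.
P_nth_heads = sum(prob for o in outcomes if o[n - 1] == 0)
assert abs(P_nth_heads - 0.5) < 1e-12
```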
Definition 3.4. Let (Ω, M, P) be a probability space and W a topological
space. A W -valued random variable X is a measurable map X : Ω → W .
For a Borel subset B ⊆ W , we will frequently use the popular notation
(X ∈ B) for the preimage X −1 (B).
Remark 3.5. One sometimes also says “stochastic variable”. In the partic-
ular case W = Rn , one calls it a vector random variable, and for W = R, a
real random variable. The case W = RN is referred to as a random sequence.
In the case of a real random variable X, we will also freely play around with
the above notation, for example (|X| ≤ 1) is written instead of (X ∈ [−1, 1]),
etc.
Remark 3.6. From our previous study on measurable maps we can observe
that real random variables are closed under addition and multiplication, and
pointwise limits. General random variables are closed under the same type
of operations under which measurable maps are closed.
Proposition 3.11. Let W be a topological space and X a W-valued random
variable with distribution P_X. Let f : W → R be a Borel measurable function.
Then f ◦ X = f(X) has an expected value if and only if f is P_X-integrable.
In that case, E(f(X)) = ∫_W f dP_X.
Notation 3.12. Certain expected values get a special name. Let X be a
real random variable.
(i) Given k ≥ 0, we call E(|X|k ) the k-th absolute moment. If it exists,
then E(X k ) is called the k-th moment.
(ii) Suppose X has an expected value mX = E(X). Then Var(X) = E((X−
mX )2 ) is called the variance of X.
(iii) More generally, suppose X and Y are two real random variables on the
same probability space, for which the second moments exist. Then it
follows from Hölder’s inequality that all of X, Y , XY have expected
values, and the value Cov(X, Y ) = E((X − mX )(Y − mY )) is called the
covariance of X and Y .
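For concreteness, the quantities in Notation 3.12 can be estimated by Monte Carlo. The sketch below is an illustration only (the chosen distributions, sample size, and seed are assumptions of this example, not from the text): for X uniform on [−1, 1] one has E(X) = 0 and E(X²) = Var(X) = 1/3, and for Y = X plus independent noise, Cov(X, Y) = Var(X).

```python
import random

random.seed(1)
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]   # X ~ Uniform[-1, 1]
ys = [x + random.gauss(0.0, 0.5) for x in xs]        # Y = X + independent noise

def mean(v):
    return sum(v) / len(v)

m_x, m_y = mean(xs), mean(ys)
second_moment = mean([x * x for x in xs])                           # E(X^2)
variance = mean([(x - m_x) ** 2 for x in xs])                       # Var(X)
covariance = mean([(x - m_x) * (y - m_y) for x, y in zip(xs, ys)])  # Cov(X, Y)

print(second_moment, variance, covariance)  # all three should be near 1/3
```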
3.2 Independence
Definition 3.13. Let (Ω, M, P) be a probability space. Two events A, B ∈
M are called independent, if P(A ∩ B) = P(A)P(B).
Example 3.14. In the context of seeing measure theory as a way to assign
a “volume” to certain sets, this notion has no clear meaning. The concept
does however become relevant if one adopts the probabilistic point of view.
For example, keep in mind Example 3.3 modelling the infinite coin tossing.
In that context, we may for example define A to be the event where both
the first and second toss comes up as heads, and B the event where both the
third and fourth toss comes up as tails. In other words,

    A = {0} × {0} × {0, 1}^{N≥3},    B = {0, 1} × {0, 1} × {1} × {1} × {0, 1}^{N≥5}.

In our model we assume that the individual coin tosses are not supposed
to influence each other, and hence we should certainly view these events as
independent. Indeed, we have here

    A ∩ B = {0} × {0} × {1} × {1} × {0, 1}^{N≥5},

so by the properties of the product measure µ we can see here that µ(A) = 1/4,
µ(B) = 1/4, and µ(A ∩ B) = 1/16. Similarly, all pairs of events defined by the
outcomes of coin tosses happening at distinct times will be independent in
this model.
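A quick numerical cross-check of this independence (an illustration added here, not part of the text): simulating fair, independent tosses with heads = 0 and tails = 1, the empirical frequencies of A, B and A ∩ B should come out near 1/4, 1/4 and 1/16.

```python
import random

random.seed(2)
trials = 200_000
count_a = count_b = count_ab = 0
for _ in range(trials):
    t = [random.randint(0, 1) for _ in range(4)]  # only tosses 1-4 matter
    a = t[0] == 0 and t[1] == 0                   # first two tosses heads
    b = t[2] == 1 and t[3] == 1                   # tosses three and four tails
    count_a += a
    count_b += b
    count_ab += a and b

print(count_a / trials, count_b / trials, count_ab / trials)
```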
Definition 3.15. Let (Ω, M, P) be a probability space. A family of events
{A_i}_{i∈I} ⊆ M is called independent, if for all pairwise distinct indices i_1, . . . , i_n ∈
I, one has P(A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_n}) = P(A_{i_1})P(A_{i_2}) · · · P(A_{i_n}).
Example 3.17. Let us motivate this again from the point of view of our
infinite coin tossing model. Let Hn be the event that where the n-th toss
comes up heads. Then the event lim supn→∞ Hn describes the situation where
heads comes up infinitely many times, and the event lim inf n→∞ Hn describes
the situation where heads comes up in all but finitely many tosses. For these
reasons, it is not uncommon in a probabilistic context to use the notation

    lim sup_{n→∞} A_n = (A_n, infinitely often) = (A_n, i.o.)

and

    lim inf_{n→∞} A_n = (A_n, almost always) = (A_n, a.a.).

(ii) Suppose ∑_{n=1}^∞ P(A_n) = ∞. If {A_n}_{n∈N} is an independent family, then

    P(lim sup_{n→∞} A_n) = 1.
Proof. We prove (i) in the exercise sessions, so we only need to prove (ii).
We will use the fact from the exercise sessions that the family of com-
plements {Acn }n∈N is also independent. Moreover, we are about to use the
well-known inequality 1 − x ≤ e^{−x} for all x ∈ R. We observe for all n ≥ 1
that

    P(⋂_{k=n}^∞ A_k^c) = lim_{m→∞} P(⋂_{k=n}^m A_k^c)
                       = lim_{m→∞} ∏_{k=n}^m P(A_k^c)
                       = lim_{m→∞} ∏_{k=n}^m (1 − P(A_k))
                       ≤ lim_{m→∞} ∏_{k=n}^m exp(−P(A_k))
                       = lim_{m→∞} exp(−∑_{k=n}^m P(A_k))
                       = 0.

We conclude that ⋃_{n≥1} ⋂_{k≥n} A_k^c = lim inf_{n→∞} A_n^c is a null set. Therefore
lim sup_{n→∞} A_n = (lim inf_{n→∞} A_n^c)^c is co-null, which shows our claim.
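The decisive estimate in this proof, ∏(1 − p_k) ≤ exp(−∑ p_k), can be made concrete numerically. In the sketch below, the choice p_k = 1/(k+1) is made purely for illustration; it is a divergent series whose partial products telescope to exactly 1/(m+1), so both sides visibly tend to 0.

```python
import math

def tail_product(m):
    """Return (prod_{k=1}^m (1 - p_k), exp(-sum_{k=1}^m p_k)) for p_k = 1/(k+1)."""
    prod, s = 1.0, 0.0
    for k in range(1, m + 1):
        p = 1.0 / (k + 1)
        prod *= 1.0 - p
        s += p
    return prod, math.exp(-s)

for m in (10, 100, 1000):
    prod, bound = tail_product(m)
    assert prod <= bound  # the inequality 1 - x <= e^{-x}, applied factorwise
    print(m, prod, bound)
```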
Example 3.20. Let us have yet another look at our infinite coin tossing
model and what we observed in Example 3.14. Let Hn denote the event
where the n-th coin toss comes up as heads. Then the family {Hn }n∈N is
independent, each event has probability 12 , and hence it follows from the
above that the event (Hn , infinitely often) has probability 1.
Definition 3.21. Let (Ω, M, P) be a probability space. A family of sub-
σ-algebras M_i ⊆ M for i ∈ I is called independent, if every collection
{A_i}_{i∈I} ⊆ M with A_i ∈ M_i is an independent family of events.
Definition 3.22. Let (Ω, M, P) be a probability space. Let I be an index set,
and let Xi be a random variable on (Ω, M, P) with values in the topological
space Wi for i ∈ I. We call the family {Xi }i∈I independent, if for every
collection {Si }i∈I of Borel sets Si ⊆ Wi , we have that the family of events
{(Xi ∈ Si )}i∈I is independent. (In other words, if we denote by Bi the Borel-
σ-algebra on Wi , this condition means that the family of pullback σ-algebras
{Xi∗ (Bi )}i∈I is independent in the above sense.)
Theorem 3.23. Let {Pi }i∈I be a family of Borel probability measures on R.
Then there exists a probability space (Ω, M, P) with an independent family of
real random variables {Xi }i∈I such that Pi = PXi .
Proof. We set (Ω, M, P) = ⊗_{i∈I} (R, B, P_i) and define X_i : Ω → R to be the
projection onto the i-th component. It is an easy exercise to see that this
yields an independent family with the desired property.
Since the proof of the following is very easy, we leave it as an exercise.
Proposition 3.24. Let (Ω, M, P) be a probability space. Let I be an index
set, and let Xi be a random variable on (Ω, M, P) with values in the topological
space Wi for i ∈ I. For each i ∈ I, let Vi be another topological space and
fi : Wi → Vi a Borel measurable map. If the family of random variables
{Xi }i∈I is independent, then so is {fi (Xi )}i∈I .
    E(XY) = ∫_Ω XY dP
          = ∫_Ω (X^+ − X^−)(Y^+ − Y^−) dP
          = ∫_Ω X^+Y^+ + X^−Y^− − X^−Y^+ − X^+Y^− dP
          = E(X^+)E(Y^+) + E(X^−)E(Y^−) − E(X^−)E(Y^+) − E(X^+)E(Y^−)
          = (E(X^+) − E(X^−))(E(Y^+) − E(Y^−))
          = E(X)E(Y).
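The identity E(XY) = E(X)E(Y) derived above can be sanity-checked by simulation. The sketch below is an illustration only (the particular distributions and seed are assumptions of this example): X uniform on [0, 2] has mean 1, Y exponential with rate 2 has mean 1/2, and they are sampled independently.

```python
import random

random.seed(3)
n = 200_000
xs = [random.uniform(0.0, 2.0) for _ in range(n)]  # E(X) = 1
ys = [random.expovariate(2.0) for _ in range(n)]   # E(Y) = 1/2, independent of X

e_x = sum(xs) / n
e_y = sum(ys) / n
e_xy = sum(x * y for x, y in zip(xs, ys)) / n

print(e_x, e_y, e_xy)  # e_xy should be close to e_x * e_y, i.e. about 1/2
```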
Theorem 3.26 (Chebyshev’s inequality). Let X be a real random variable
on the probability space (Ω, M, P). Suppose that f : R → R≥0 is an even
measurable function whose restriction to R≥0 is increasing. Let a ≥ 0 be any
number with f (a) > 0. Then
    P(|X| ≥ a) ≤ E(f(X)) / f(a).
Proof. The first part follows from the general Chebyshev inequality for f(x) =
|x|^p. The second part follows when applying it to X − m_X as the real random
variable and the function f(x) = x².
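Chebyshev's inequality is easy to test empirically. The following sketch is an illustration with choices made here (f(x) = x², a standard normal X, a fixed seed); it compares the empirical left and right sides of Theorem 3.26.

```python
import random

random.seed(4)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # X standard normal

a = 2.0
lhs = sum(abs(x) >= a for x in xs) / n       # empirical P(|X| >= a)
rhs = sum(x * x for x in xs) / n / (a * a)   # empirical E(f(X))/f(a) for f(x) = x^2

print(lhs, rhs)  # lhs is about 0.046, safely below the bound rhs of about 0.25
```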
Proof. Since they are identically distributed, it follows that both the mean
and the variance of Xn are the same for every n ≥ 1. We may assume without
loss of generality m = 0. Let us first assume that the variance of Xn is finite
and equal to σ² for all n ≥ 1. For every n ≥ 1, denote X̄_n = (1/n) ∑_{k=1}^n X_k. We
Here we have used that X_n^{≤N} has finite variance and the above falls into our
previous subcase. As δ > 0 was arbitrary, this verifies X̄_n → 0 in probability in general.
There is also a stronger law, i.e., a theorem with a stronger conclusion
than the weak law of large numbers. We will prove this strong law under an
additional assumption.10
Theorem 3.30 (Strong Law of Large Numbers). Let Xn be an independent
sequence of identically distributed real random variables defined on a proba-
bility space (Ω, M, P). Suppose that every (or any) Xn has a fourth moment,
meaning that E(X_n^4) < ∞. If m = E(X_n) is the mean of all these random
variables, then it follows that P((1/n) ∑_{k=1}^n X_k → m) = 1.
Proof. We first note that applying Hölder’s inequality twice yields E(|X_n|)^4 ≤
E(X_n^2)^2 ≤ E(X_n^4) =: M_4, hence the assumptions on X_n ensure that m exists.
As before, we assume without loss of generality m = 0 by replacing each
X_n by X_n − m, if necessary. Denote X̄_n = (1/n) ∑_{k=1}^n X_k. Then

    E(X̄_n^4) = (1/n^4) ∑_{i,j,k,l=1}^n E(X_i X_j X_k X_l).
Considering the summand over the tuple (i, j, k, l), it follows from the inde-
pendence of the sequence X_n and Proposition 3.25 that it is zero if there is
one entry in this tuple that is different from all the other three. In other
words, the summand can only be non-zero if all four entries agree, or if the
tuple has two different indices occurring exactly twice. Two different entries
can possibly occur in the pattern (k, k, l, l), (k, l, k, l) or (k, l, l, k), leading
to 3n(n − 1) possible summands. Using Proposition 3.25 once again, the
expression hence simplifies to

    E(X̄_n^4) = (1/n^4) ( ∑_{k=1}^n E(X_k^4) + 3 ∑_{k,l=1, k≠l}^n E(X_k^2) E(X_l^2) ).
By the very beginning of the proof, each summand in the bracket can be esti-
mated above by M_4, and there are a total of n + 3n(n − 1) ≤ 3n² summands,
hence

    E(X̄_n^4) ≤ 3n²M_4 / n^4 = 3M_4 / n².
If we apply Corollary 3.27 for p = 4, it follows for every ε > 0 that

    P(|X̄_n| ≥ ε) ≤ 3M_4 / (n²ε^4).
¹⁰ The conclusion of the theorem is actually true under the same assumptions we had
for the weak law. The proof of the general case is however quite demanding, which is why
we add some common extra assumptions allowing for a much simpler proof. Anyone who
is interested in the proof of the general case is referred to, for example, Theorem 22.1 in
the book “Probability and Measure” by Patrick Billingsley.
The right side is a summable sequence of non-negative numbers. Hence it
follows by the Borel–Cantelli Lemma that
P(|X̄n | ≥ ε, infinitely often) = 0.
However, we note that by the definition of convergence of sequences, we have

    (X̄_n ̸→ 0) = ⋃_{ε>0, ε∈Q} (|X̄_n| ≥ ε, infinitely often),

so (X̄_n ̸→ 0) is a null set as a countable union of null sets.
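The almost sure convergence X̄_n → 0 can be watched along a single simulated trajectory. This is only an illustration (the uniform distribution, checkpoints, and seed are choices made here); uniform variables on [−1, 1] are centered and certainly have a fourth moment.

```python
import random

random.seed(5)
running_sum = 0.0
means = []
for n in range(1, 100_001):
    running_sum += random.uniform(-1.0, 1.0)  # X_n with E(X_n) = 0
    if n in (100, 10_000, 100_000):
        means.append(running_sum / n)

print(means)  # the recorded sample means should shrink toward 0
```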
For the purpose of the proof, we define the function κ : R → R via κ(x) =
∫_0^x (sin t)/t dt for x ≥ 0, and κ(x) = −κ(−x) when x < 0. We will appeal
(without proof) to a fact from calculus stating that lim_{x→∞} κ(x) = π/2. We
obviously have the identity (d/dx) κ(x) = sin(x)/x for all x ≥ 0. Using the fact that
cos is an even function, we hence observe that
    ∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/(it) dt
      = ∫_{−T}^{T} (cos(t(x−a)) − cos(t(x−b)) + i sin(t(x−a)) − i sin(t(x−b)))/(it) dt
      = ∫_{−T}^{T} (sin(t(x−a)) − sin(t(x−b)))/t dt
In particular,

    ∫_{−T}^{T} (e^{−ita} − e^{−itb})/(it) · φ_X(t) dt = 2 ∫_R κ(T(x−a)) − κ(T(x−b)) dP_X(x),

where pointwise

    lim_{T→∞} κ(T(x−a)) − κ(T(x−b)) =  0    if x < a or x > b,
                                       π    if a < x < b,
                                       π/2  if x = a or x = b.
Corollary 3.34. Two real random variables are identically distributed if and
only if their characteristic functions are equal.
Proof. The “only if” part is clear, so we show the “if” part. Let X and Y be
two real random variables. The distribution function FX (t) = PX ((−∞, t])
is increasing, and therefore by the above theorem, it is discontinuous in at
most countably many points. In analogy to Example 2.27, this means that
we may have PX ({t}) ̸= 0 for at most countably many t ∈ R. The same
observation is of course true for Y in place of X. Let W ⊆ R be the subset
of all points t with PX ({t}) = 0 = PY ({t}). Then W is co-countable and in
particular dense. It is then an easy exercise to show that the semi-ring
    A_W = {(a, b] | a, b ∈ W, a < b} ⊆ 2^R
Definition 3.35. A sequence Pn of Borel probability measures on R is said
to weakly converge to a Borel probability measure P, if one has
    lim_{n→∞} ∫_R f dP_n = ∫_R f dP
    lim sup_{n→∞} F_n(x) = lim sup_{n→∞} P_n((−∞, x]) ≤ P((−∞, x]) = F(x).
Since x is a continuous point of F , we also have P({x}) = 0 and hence
Conversely, let us assume that Fn (x) → F (x) holds for all x ∈ C(F ). By
definition, we may easily observe Pn ((a, b]) → P((a, b]) for all a, b ∈ C(F )
with a < b. Therefore, it follows that
    ∫_R f dP_n → ∫_R f dP
Proof. Let Fn be the distribution function of Pn for every n ≥ 1. Then Pn
being tight translates to
    lim_{R→∞} lim inf_{n→∞} (F_n(R) − F_n(−R)) = 1.
Theorem 3.41 (Lévy’s Continuity Theorem). Let X_n and X be real random
variables for n ≥ 1. Then X_n → X in distribution holds if and only if
φ_{X_n}(t) → φ_X(t) for all t ∈ R.
Proof. We proceed by induction on n. We note that it suffices to consider the
case x ≥ 0, since we may obtain the claim for x < 0 by complex conjugation
of the involved terms. Let us first proceed with proving that 2|x|^n/n! is an upper
bound. For n = 0, this is equal to 2, and since e^{ix} has norm one, this follows
from the triangle inequality. Now assume that this inequality holds for a
given number n ≥ 0 and all x ≥ 0. We then compute

    |e^{ix} − ∑_{k=0}^{n+1} (ix)^k/k!| = |∫_0^x (e^{it} − ∑_{k=0}^{n} (it)^k/k!) dt|
                                       ≤ ∫_0^x |e^{it} − ∑_{k=0}^{n} (it)^k/k!| dt
                                       ≤ ∫_0^x (2t^n/n!) dt = 2x^{n+1}/(n+1)!.
We can also proceed by induction to show the upper bound |x|^{n+1}/(n+1)!. Then
one can first consider n = −1, in which case both the left and the right side equal
1. One can then perform the induction step (n − 1) → n with a completely
analogous calculation as above, namely

    |e^{ix} − ∑_{k=0}^{n} (ix)^k/k!| = |∫_0^x (e^{it} − ∑_{k=0}^{n−1} (it)^k/k!) dt|
                                     ≤ ∫_0^x |e^{it} − ∑_{k=0}^{n−1} (it)^k/k!| dt
                                     ≤ ∫_0^x (t^n/n!) dt = x^{n+1}/(n+1)!.
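The two remainder bounds just proved, |e^{ix} − ∑_{k=0}^n (ix)^k/k!| ≤ min(2|x|^n/n!, |x|^{n+1}/(n+1)!), can be verified numerically for small n. A minimal check (the sample points are arbitrary choices for this illustration):

```python
import cmath
import math

def remainder(x, n):
    """|e^{ix} - sum_{k=0}^{n} (ix)^k / k!| for real x."""
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

for x in (0.5, 1.0, 3.0, 10.0):
    for n in range(8):
        bound = min(2 * x ** n / math.factorial(n),
                    x ** (n + 1) / math.factorial(n + 1))
        assert remainder(x, n) <= bound + 1e-12  # both bounds from the proof
```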
Lemma 3.43. Let X be a real random variable such that E(X²) = 1 and
E(X) = 0. Then one has the limit formula

    lim_{n→∞} φ_X(t/√n)^n = e^{−t²/2},    t ∈ R.
the Dominated Convergence Theorem. So by integrating and applying the
triangle inequality, we see
    |φ_X(t) − 1 + t²/2| ≤ E(|e^{itX} − 1 − itX + t²X²/2|) ≤ t² E(X_0^t).

If we substitute t → t/√n, this leads to the observation that

    x_n = n(φ_X(t/√n) − 1 + t²/(2n)) → 0  as n → ∞.
Using the fact from calculus that one has convergence

    e^x = lim_{n→∞} (1 + x/n)^n

uniformly over bounded intervals, we finally conclude that

    φ_X(t/√n)^n = (1 + (φ_X(t/√n) − 1))^n
                = (1 − (t²/2 − n(φ_X(t/√n) − 1 + t²/(2n)))/n)^n
                = (1 − (t²/2 − x_n)/n)^n
                → e^{−t²/2}  as n → ∞.
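Lemma 3.43 can be checked against a concrete distribution. In the sketch below, the choice of X is an assumption of this illustration: X uniform on [−√3, √3] satisfies E(X) = 0, E(X²) = 1, and has characteristic function φ_X(t) = sin(√3 t)/(√3 t).

```python
import math

def phi(t):
    """Characteristic function of the uniform distribution on [-sqrt(3), sqrt(3)]."""
    u = math.sqrt(3.0) * t
    return 1.0 if u == 0 else math.sin(u) / u

t = 1.5
target = math.exp(-t * t / 2.0)  # e^{-t^2/2}
for n in (10, 100, 10_000):
    print(n, phi(t / math.sqrt(n)) ** n, target)  # second column approaches third
```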
    (1/√n) ∑_{k=1}^n X_k → N  in distribution.
Proof. Since the X_n are identically distributed, they all have the same charac-
teristic function, which we shall denote by φ_X. Denote Y_n = n^{−1/2} ∑_{k=1}^n X_k,
so that we wish to show Y_n → N in distribution. Using that the X_n are independent, we
observe for all t ∈ R that

    φ_{Y_n}(t) = E(e^{itY_n})
               = E(e^{i(t/√n) ∑_{k=1}^n X_k})
               = φ_{∑_{k=1}^n X_k}(t/√n)
               = ∏_{k=1}^n φ_{X_k}(t/√n)
               = φ_X(t/√n)^n.
Given that the standard normal variable has the characteristic function
φ_N(t) = e^{−t²/2}, the proof is complete by combining Lemma 3.43 and Theo-
rem 3.41.
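The theorem invites a direct Monte Carlo check. The sketch below is an illustration only (distribution, sample sizes, and seed are choices made here): summands uniform on [−√3, √3] have mean 0 and variance 1, and the frequency of the event (Y_n ≤ 1) should approach Φ(1) ≈ 0.8413.

```python
import math
import random

random.seed(6)
a = math.sqrt(3.0)        # Uniform[-a, a] has variance a^2 / 3 = 1
n, trials = 100, 20_000
count = 0
for _ in range(trials):
    s = sum(random.uniform(-a, a) for _ in range(n))
    if s / math.sqrt(n) <= 1.0:   # the event (Y_n <= 1)
        count += 1

print(count / trials)  # should be close to 0.8413
```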
    (1/√n) ∑_{k=1}^n X_k → N(0, σ)  in distribution,

where N(0, σ) is the real random variable given by the distribution

    P((a, b]) = (1/(σ√(2π))) ∫_a^b e^{−t²/(2σ²)} dt.
4 Selected further topics
In this final chapter we treat a few selected further topics, both from proba-
bility theory and general measure theory.
    M_∞ = ⋂_{n≥1} M_n.
Note that we use the convention that this intersection is defined to be empty
when m = 0. We leave the details why A is a semi-ring as an exercise. Then
B is also generated by A as a σ-algebra. By the independence assumption, it
follows that P(A ∩ B) = P(A)P(B) for all B ∈ A, and hence P1 (B) = P(B).
By Theorem 2.17, we see that this implies P1 (B) = P(B) for all B ∈ B, or
in other words P(A ∩ B) = P(A)P(B) for all B ∈ B.
(ii): We have to show that for every B ∈ B and C ∈ C, it follows
that B and C are independent. First of all, it follows from the definition of
independence that for any set B belonging to the above defined semi-ring
A, we have that the family {B} ∪ {C_j}_{j∈J} is independent. By (i), it hence
follows that every such B ∈ A and every C ∈ C are mutually independent. By
the definition of A as a set and the definition of independence, we see that
hence {Bi }i∈I ∪ {C} is an independent family for all C ∈ C. Applying (i)
one more time, we can conclude that all B ∈ B and C ∈ C are mutually
independent.
Theorem 4.4 (Kolmogorov’s zero-one law). Let (Ω, M, P) be a probability
space and A_n ∈ M an independent sequence of events. Then for any event
A ∈ M_∞, one has P(A) ∈ {0, 1}.
Proof. Suppose A ∈ M∞ . Let n ≥ 1. Then A ∈ Mn+1 , the σ-algebra gener-
ated by {Ak }k>n . Since the sequence {Ak }k≥1 is independent by assumption,
it follows from the above lemma that {A} ∪ {Ak }k≤n is an independent fam-
ily. However, as n was arbitrary, it follows that in fact {A} ∪ {Ak }k≥1 is an
independent family. Appealing to the lemma above again, it follows that A
and B are independent for every B ∈ M1 . Since M∞ ⊆ M1 , it follows that in
particular A is independent of itself. This leads to P(A) = P(A∩A) = P(A)2 ,
which implies P(A) ∈ {0, 1}.
then L²(µ) = L²(Ω, M, µ)/N becomes a Hilbert space with respect to the induced
inner product and the resulting 2-norm.
Proof. We can conclude from the general Hölder inequality with p = q = 2
(see also the version proved in the exercises) that ⟨·|·⟩ is well-defined. Its
claimed properties are immediate. In particular, it follows from results in
basic linear algebra that ∥ · ∥2 is indeed a seminorm. Like before (cf. Proposi-
tion 1.45), the subspace N can be identified with those functions f ∈ L2 (µ)
with ∥f ∥2 = 0 = ⟨f | f ⟩. Therefore the induced sesquilinear form ⟨·|·⟩ on
L2 (µ) defines an inner product, so that ∥ · ∥2 becomes a norm in analogy to
Definition 1.46. The only thing that remains to be shown is that L2 (µ) is
complete. For this it suffices to prove a statement analogous to Theorem 1.52
(in fact with an analogous proof).
Let f_n ∈ L²(µ) be a sequence that satisfies the Cauchy criterion in the
2-norm. By applying the Cauchy criterion inductively, we can find a subsequence
(f_{n_k})_k such that ∥f_{n_k} − f_{n_{k+1}}∥₂ ≤ 2^{−k}. We define the measurable
function g : Ω → [0, ∞] as g(x) = ∑_{k=1}^∞ |f_{n_k}(x) − f_{n_{k+1}}(x)|. Then it follows
from the Monotone Convergence Theorem and the triangle inequality for the
2-(semi-)norm that

    ∫_Ω g² dµ = lim_{n→∞} ∫_Ω (∑_{k=1}^n |f_{n_k} − f_{n_{k+1}}|)² dµ
              = lim_{n→∞} ∥∑_{k=1}^n |f_{n_k} − f_{n_{k+1}}|∥₂²
              ≤ lim_{n→∞} (∑_{k=1}^n ∥f_{n_k} − f_{n_{k+1}}∥₂)²
              ≤ (∑_{k=1}^∞ 2^{−k})² = 1.
    f(x) := f_{n_1}(x) + ∑_{k=1}^∞ [f_{n_{k+1}}(x) − f_{n_k}(x)] = lim_{k→∞} f_{n_k}(x),    x ∈ E
Definition 4.7. Let (Ω, M) be a measurable space. Given two measures
µ, ν on (Ω, M), we call ν absolutely continuous with respect to µ, if for all
E ∈ M, one has that µ(E) = 0 implies ν(E) = 0.
Lemma 4.8. Let (Ω, M) be a measurable space. Let µ be an arbitrary mea-
sure and ν a finite measure. Then ν is absolutely continuous with respect to
µ if and only if the following condition holds: For every ε > 0, there exists
δ > 0, such that for every E ∈ M with µ(E) < δ, one has ν(E) < ε.12
Proof. The “if” part is clear, so we shall prove the “only if” part and assume
that ν is absolutely continuous with respect to µ. Suppose that the claimed
condition were to fail. This means that there exists some ε0 > 0 for which
no δ > 0 satisfies the required property. Let us then choose for every k ≥ 1
a set E_k ∈ M so that ν(E_k) ≥ ε₀ but µ(E_k) < 2^{−k}. Then the set A =
lim sup_{k→∞} E_k satisfies µ(A) = 0 by the (first) Borel–Cantelli lemma. So by
absolute continuity, we must have ν(A) = 0. However, since ν is a finite
measure, we conclude with Proposition 1.20(v) that

    ν(A) = ν(⋂_{n≥1} ⋃_{k≥n} E_k) = inf_{n≥1} ν(⋃_{k≥n} E_k) ≥ ε₀.
the case where the involved measures are finite. Indeed, using that ν and
µ are σ-finite, we may find a sequence Ak ∈ M with µ(Ak ), ν(Ak ) < ∞
and Ω = ⋃_{k≥1} A_k. If we know that the claim is true for finite measures, we
We note also that for all E ∈ M with λ(E) > 0, we have in particular
    0 ≤ ν(E) = ∫_Ω χ_E dν = ∫_Ω g χ_E dλ ≤ λ(E),
measurable non-negative function on Ω. We claim that this is the desired
function. Indeed, we observe due to the equations above that for any E ∈ M,
we get for every n ≥ 1 that
    ∫_Ω χ_E (1 − g^{n+1}) dν = ∫_Ω χ_E (1 − g) ∑_{k=0}^n g^k dν = ∫_Ω χ_E g ∑_{k=0}^n g^k dµ.