
Measure and integration theory for statisticians

Sebastian Holtz
Humboldt-Universität zu Berlin

April 2, 2019
Preface
This script contains the topics that were discussed in the lecture ’Maßtheorie
für Statistiker‘ at Humboldt-Universität zu Berlin. It is based on the book
of the same name by Prof. Uwe Küchler and the scripts offered by the
preceding lecturers of this course, Mathias Trabs and Martin Wahl. The
current script claims neither to be complete nor to be free from errors. If
you find an error or an unclear point, I would appreciate it if you contacted
me.

Contents

I Prelude

1 About measure and integration theory

2 Preliminaries
2.1 Set theory
2.2 The real numbers
2.3 Maps
2.4 Sequences & Countability
2.5 Product sets
2.6 Series

II Measure theory

3 Introduction
3.1 The ’problem of measure‘
3.2 Outline

4 Measurable spaces
4.1 σ-Algebras
4.2 Borel σ-Algebra
4.3 σ-algebras under maps

5 Measures
5.1 Definition & Properties
5.2 Construction Of Measures
5.3 The Lebesgue Measure
5.4 Probability measures

III Integration theory

6 Measurable maps
6.1 Definition & first Properties
6.2 Induced measures
6.3 Simple functions
6.4 Approximation of Borel functions

7 Basics of Integration theory
7.1 The integral of simple functions
7.2 The integral of non-negative functions
7.3 Integrable functions
7.4 Convergence theorems

8 Product measures
8.1 Product σ-algebras
8.2 Product measures
8.3 Fubini’s theorem

Part I
Prelude
1 About measure and integration theory
What is measure theory? The aim of measure theory is the generali-
sation of well-known geometric concepts such as lengths, areas or volumes.
Two important questions that immediately arise are: what object do we
measure and how do we measure these objects? For instance, if we consider
the Cartesian plane R^2 we could be interested in the area of squares, rectangles,
etc. In order to calculate those areas, formulas have been derived that
capture our geometric understanding of two-dimensional objects, e.g. the
area of a square equals the square of the side length, the area of a rectangle
equals the product of the side lengths, etc.
Abstractly speaking, we assign to an object a specific number that rep-
resents the mass of the object, i.e. we measure the object. The procedure
of measuring can be mathematically interpreted as a function whose inputs
are certain (geometric) objects and whose outputs are real numbers. In the
following we will call such a function a measure. One measure we are all
familiar with is the one described above, which assigns to a rectangle the
product of the lengths of its sides. In the following we will call this mea-
sure (more precisely: the measure induced by this formula) the Lebesgue
measure.
Besides the Lebesgue measure there are plenty of other measures. For
instance, consider a collection of rectangles with varying side lengths. In-
stead of the area of each rectangle according to the Lebesgue measure one
could be just interested in the quantity of rectangles, i.e. one would like
to count the number of rectangles. Then each rectangle counts as one no
matter how large it is. In other words, from a counting perspective, each
rectangle has the mass one. Since the process of counting is nothing but
assigning a number to a geometric object we have just created a second
candidate for a measure, the so-called counting measure.
Now we would like to link the concept of measures to Stochastics. For
this consider a third example involving rectangles. Suppose you draw a
rectangle on a blackboard and that you are trying to hit this rectangle with
a piece of chalk from a distance of five meters. Let us assume that you are
a good thrower (or that the rectangle is pretty large) and that the chances
of hitting the rectangle are given by 9/10 and of missing it by 1/10. In
other words, from the perspective of how likely a successful throw is, the
rectangle has the mass 9/10 and the area around it has the mass 1/10, even
though the latter might be much larger according to the Lebesgue measure.
Again, we have found an example of a measure. Since this measure tells us
the probabilities of hitting or missing the rectangle it is called a probability
measure.
Admittedly, these examples are of a very simple nature. However, it
will turn out that already the Lebesgue measure is not easy to define as it
is not possible to measure any object we like (at least not by maintaining
a few desirable properties). Thus one goal of measure theory is to classify
what the objects we would like to measure should ’offer‘ and what properties
characterise a ’meaningful‘ measure. Besides the derivation of the Lebesgue
measure this leads to a general theory which is the theoretical foundation
to investigate random phenomena mathematically. By the general concept
of measures a unified approach and toolbox is given that facilitates the
handling of stochastic objects, no matter whether they are of simple or
complex nature, one-dimensional or multi-dimensional, discrete or continuous.

What is integration theory?

2 Preliminaries
In this section we would like to introduce a collection of standard concepts
and statements of the mathematical branches linear algebra and analysis.

2.1 Set theory


In the following we consider the general notion of sets. Sets will help us
to model any kind of object that we would like to measure. In statistics
possible outcomes or events of experiments are represented by sets.

Definition 2.1. A set (Menge) is a well-defined collection of distinct objects.
The objects that constitute a set are called elements (Elemente).

Sets will usually be denoted by capital letters, e.g. A, Ω, . . . To declare
a collection of elements as a set, the braces ’{’ and ’}’ are used. For instance,
let
Ω = {1, 2, 3, 4, 5, 6}.
Then Ω is the set that consists of all possible outcomes of rolling a die.
A set can contain arbitrary objects (not only numbers). For large sets
one often uses abbreviations in terms of ’. . .’. Two examples are given by

Ω = {1, 2, . . . , 6} and N = {1, 2, . . .}

Another way to write down sets compactly is via rules that apply to the
contained elements:

B = {2, 4, 6, . . .} = {n ∈ N : 2 divides n} = {2n : n ∈ N}.

For sets A and B we introduce the following notations.

• x ∈ A: x is an element of A.

• x ∉ A: x is not an element of A.

• A ⊆ B: For any x ∈ A it also holds that x ∈ B (often the less precise
notation A ⊂ B is used).

• A = B: It holds that A ⊆ B and that B ⊆ A.

• A ⊊ B: It holds that A ⊆ B but not B ⊆ A.

• ∅ : The empty set, i.e. ∅ = {}.

Example 2.2. Let Ω = {1, . . . , 6} and let A = {1, 2, 3}. Then 1 ∈ A and
4 ∉ A. Moreover, we have A ⊆ Ω, or more precisely: A ⊊ Ω, i.e. neither
Ω ⊆ A nor Ω = A holds.

Next the elementary operations on sets will be considered. We will use the
symbol ’:=’ to express that a definition is made.

Definition 2.3. For sets A and B define the following operations.

• A ∪ B := {x : x ∈ A or x ∈ B} is called the union (Vereinigung) of A
and B.

• A ∩ B := {x : x ∈ A and x ∈ B} is called the intersection (Durchschnitt)
of A and B.

• If A ∩ B = ∅ then A and B are called disjoint (disjunkt).

• A\B := {x : x ∈ A and x ∉ B} is called the difference (Differenz) of
A and B.

Examples for these operations can be found on the exercise sheet.
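
As a small illustration, the operations of Definition 2.3 map directly onto Python's built-in set type. The following is only a sketch: the set B below is a hypothetical choice made for this example and does not come from the lecture.

```python
# Omega and A are taken from Example 2.2; B is a hypothetical second set.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {2, 4, 6}

union = A | B                     # A ∪ B
intersection = A & B              # A ∩ B
difference = A - B                # A \ B
is_subset = A <= omega            # A ⊆ Ω
is_proper = A < omega             # A ⊊ Ω
disjoint = A.isdisjoint({4, 5})   # A ∩ {4, 5} = ∅

print(union, intersection, difference, is_subset, is_proper, disjoint)
```

Note that `<=` tests for a subset while `<` tests for a proper subset, which mirrors the distinction between ⊆ and ⊊ made above.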

Definition 2.4. A set A is said to be finite (endlich) if it is empty or if it
allows for the following representation

A = {a_n : n = 1, . . . , N},

where the elements a_n, n = 1, . . . , N, are pairwise distinct. The number
N of elements is called the cardinality (Kardinalität/Mächtigkeit) and is
denoted by |A| (sometimes #A). If A has infinitely many elements then we
set |A| := ∞.

In the following we will usually fix a set Ω and consider relations of
subsets of Ω.

Definition 2.5. Let Ω be a set. The power set (Potenzmenge) P(Ω) of Ω
is the set that contains all subsets of Ω, i.e.

P(Ω) = {A : A ⊆ Ω}.

Remark 2.6. Another notation for P(Ω) is given by 2^Ω. The reason for this
is that, as is not hard to see, in case of |Ω| < ∞ we have |P(Ω)| = 2^|Ω|.

Example 2.7. Let Ω = {1, 2, 3}. Then

P(Ω) = {{1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}, ∅}.
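
For finite Ω the power set can be enumerated mechanically, which also makes the count |P(Ω)| = 2^|Ω| from Remark 2.6 tangible. The helper name `power_set` is an assumption made for this sketch.

```python
from itertools import combinations

def power_set(omega):
    """Return all subsets of a finite set omega as a set of frozensets."""
    elements = list(omega)
    return {
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
    }

P = power_set({1, 2, 3})
print(len(P))            # number of subsets of a three-element set
print(frozenset() in P)  # the empty set is always a subset
```

Subsets are stored as frozensets because ordinary Python sets are not hashable and therefore cannot be elements of another set.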

Lemma 2.8. Let Ω be a non-empty set and let A, B, C ∈ P(Ω). Then the
following rules apply

A ∪ B = B ∪ A, A ∩ B = B ∩ A,

A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C,
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

A relation between subsets of high interest is given by the following
definition.

Definition 2.9. Let Ω be a set and let A ⊆ Ω. Then

A^c_Ω := Ω\A = {x ∈ Ω : x ∉ A}

is called the complement of A in Ω (Komplement).

We will often consider Ω to be fixed and therefore omit the dependence
of the complement on Ω, i.e. we write A^c as shorthand for A^c_Ω.

Example 2.10. Let Ω = {1, . . . , 6} and let A = {1, 2, 4}. Then A^c =
{3, 5, 6}.

Lemma 2.11. Let Ω be a non-empty set. For any pair of sets A, B ∈ P(Ω)
it holds that

(A^c)^c = A, A ∩ A^c = ∅ and A ∪ A^c = Ω,

as well as
A\B = A ∩ B^c.

Proof. Exercise

Often we are interested in unions or intersections of more than two or
even infinitely many sets. For this we use the concept of index sets I, which
are usually of the form I = {1, . . . , n} or I = N, i.e. they contain finitely or
infinitely many natural numbers.

Definition 2.12. Let I be some non-empty index set. If we are given for
each i ∈ I a subset A_i of Ω then we call (A_i)_{i∈I} = {A_i : i ∈ I} a family of
subsets of Ω (Familie/Menge von Mengen).

Definition 2.13. Let (A_i)_{i∈I} be a family of subsets of Ω. Then

⋃_{i∈I} A_i := {x : x ∈ A_i for (at least) one i ∈ I}

is called the union of the sets A_i, i ∈ I, and

⋂_{i∈I} A_i := {x : x ∈ A_i for all i ∈ I}

is called the intersection of the sets A_i, i ∈ I.

Example 2.14. Let I = {1, 2, 3}. Then

⋃_{i∈I} A_i = A_1 ∪ A_2 ∪ A_3 and ⋂_{i∈I} A_i = A_1 ∩ A_2 ∩ A_3.

For finite index sets of type I = {1, 2, . . . , n} one usually uses the notations
⋃_{i=1}^n A_i and ⋂_{i=1}^n A_i and for the infinite index set I = N one uses
⋃_{i=1}^∞ A_i and ⋂_{i=1}^∞ A_i, respectively.

Lemma 2.15 (De Morgan’s law). Let Ω be a non-empty set. For a family
(A_i)_{i∈I} of subsets of Ω it holds that

(⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c, (⋃_{i∈I} A_i)^c = ⋂_{i∈I} A_i^c.

Proof. See exercises.
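
Both identities of De Morgan's law can be checked on a small finite family. This is a sanity check on one hypothetical family chosen for illustration, not a substitute for the proof.

```python
omega = set(range(1, 7))
# Hypothetical family (A_i) with index set I = {1, 2, 3}.
family = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 5}}

def complement(a):
    """Complement of a in omega."""
    return omega - a

big_union = set().union(*family.values())
big_intersection = omega.intersection(*family.values())

# (intersection of A_i)^c versus union of the complements
lhs1 = complement(big_intersection)
rhs1 = set().union(*(complement(a) for a in family.values()))

# (union of A_i)^c versus intersection of the complements
lhs2 = complement(big_union)
rhs2 = omega.intersection(*(complement(a) for a in family.values()))

print(lhs1 == rhs1, lhs2 == rhs2)
```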

2.2 The real numbers


Next we focus on a particular set of great interest, the real numbers.
In the preceding part we already introduced the (infinite) set of natural
numbers
N = {1, 2, . . .}.
If we extend the natural numbers by the number zero and the set of negative
integers −N := {−1, −2, . . .} we obtain the set of integers Z:

Z := −N ∪ {0} ∪ N.

A further extension is the set of rational numbers given by

Q := {a/b : a, b ∈ Z, b ≠ 0}.

However, the rational numbers are not ’complete’ or, roughly speaking, ’have
gaps’. A prominent example is √2, which can be shown to be not representable
as a fraction of two integers. Therefore, √2 is called irrational. Nevertheless,
any irrational number can be approximated by rational numbers. For instance
√2 = 1.41421356... can be approximated by the rational numbers

q_1 = 1.4 = 14/10,
q_2 = 1.41 = 141/100,
q_3 = 1.414 = 1414/1000,
q_4 = 1.4142 = 14142/10000,
...
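
The decimal truncations of √2 listed above can be reproduced with exact rational arithmetic. The function name `sqrt2_truncation` is an assumption of this sketch; it relies on `math.isqrt`, which returns the integer part of a square root.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(k):
    """Return sqrt(2) rounded down to k decimal digits, as an exact Fraction."""
    scale = 10 ** k
    # isqrt(2 * scale^2) is the largest integer m with m^2 <= 2 * scale^2,
    # so m / scale is the k-digit decimal truncation of sqrt(2).
    return Fraction(isqrt(2 * scale * scale), scale)

for k in range(1, 5):
    q = sqrt2_truncation(k)
    # The gap 2 - q^2 stays positive and shrinks as k grows.
    print(k, q, float(q), float(2 - q * q))
```

`Fraction` prints each value in lowest terms (for example 7/5 instead of 14/10), but the numbers are the same rationals as in the list above.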

Definition 2.16. The set R that we obtain if we add to Q all those finite
numbers that can be approximated by elements of Q is called the (set of )
real numbers.
It is clear that one has the relation N ⊊ Z ⊊ Q ⊊ R. Moreover, for two
real numbers a and b exactly one of the relations a < b, a > b and a = b
applies. For a, b ∈ R satisfying a ≤ b (i.e. not a > b) we can define the
following sets:

[a, b] := {x ∈ R : a ≤ x ≤ b},
(a, b) := {x ∈ R : a < x < b},
[a, b) := {x ∈ R : a ≤ x < b},
(a, b] := {x ∈ R : a < x ≤ b},

and we call those sets closed, open, right half-open and left half-open
interval, respectively. Note that a ∉ (a, b]. The above intervals are called finite
intervals, i.e. of finite length b − a. The infinite intervals are given by

[a, ∞) := {x ∈ R : a ≤ x < ∞},
(a, ∞) := {x ∈ R : a < x < ∞},
(−∞, b) := {x ∈ R : −∞ < x < b},
(−∞, b] := {x ∈ R : −∞ < x ≤ b},
(−∞, ∞) := {x ∈ R : −∞ < x < ∞} = R.

Definition 2.17. Let A ⊆ R be a subset of R. A number M is called an upper
(or lower) bound of A, if x ≤ M (or x ≥ M) for all x ∈ A. A set A ⊆ R
is called bounded from above (or below) if there is an upper (or lower)
bound.

Definition 2.18. Let A ⊆ R. A number M is called supremum (or
infimum) of A if M is the least upper (or greatest lower) bound of A. Here
M is called least upper bound if
1. M is an upper bound of A,
2. if M' is another upper bound, then it holds that M ≤ M'.
The greatest lower bound of A is defined analogously.
M is called maximum (or minimum) of A if M is supremum (or
infimum) of A and M ∈ A.
Example 2.19. Let A = {1/n : n ∈ N} = {1, 1/2, 1/3, 1/4, . . .}. The numbers 2, 100
and 3000 are examples of upper bounds of A. The numbers −4 and −50 are
examples of lower bounds of A. However, the supremum of A is 1 and the
infimum is 0. Moreover, 1 is the maximum of A but A has no minimum as
0 ∉ A. In particular, the supremum or infimum is not necessarily contained
in A and a minimum or maximum does not necessarily exist.
Theorem 2.20. Every non-empty set A ⊆ R that is bounded from above
(or below) has a supremum (or infimum). We write sup A = sup_{a∈A} a (or
inf A = inf_{a∈A} a).

2.3 Maps
Definition 2.21. Let E, F be two non-empty sets. A map f (Abbildung)
is a rule that associates each x ∈ E with exactly one y ∈ F. We write
f(x) = y and
f : E → F, x ↦ f(x).
The set E is called the domain (Definitionsbereich) of f. For a subset
A ⊆ E the set
f(A) := {y ∈ F : y = f(x) for at least one x ∈ A}
is called the image of A (under f) (Bild). The set f(E) is called the
image of f, sometimes written as Im(f).
For B ⊆ F the set
f^{−1}(B) := {x ∈ E : f(x) ∈ B}
is called the preimage of B (Urbild).
Example 2.22. A map f is given by
f : R → R, x ↦ x^2.
For this map, we write f(x) = x^2. The image of the set [0, 3] ⊊ R is given
by f([0, 3]) = [0, 9]. The image of f equals Im(f) = R^+_0 := {x ∈ R : x ≥ 0}.
For the set [4, ∞) the preimage is given by f^{−1}([4, ∞)) = (−∞, −2] ∪
[2, ∞) = R\(−2, 2).
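
Images and preimages are easy to compute when the domain is finite. The sketch below restricts the squaring map to a hypothetical finite domain E = {−3, . . . , 3}; the helper names `image` and `preimage` are assumptions of this example.

```python
def image(f, A):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, E, B):
    """f^{-1}(B) = {x in E : f(x) in B}, for a finite domain E."""
    return {x for x in E if f(x) in B}

E = set(range(-3, 4))   # hypothetical finite domain {-3, ..., 3}

def f(x):
    return x * x

print(image(f, {0, 1, 2, 3}))
print(preimage(f, E, {4, 9}))
```

The preimage is defined for every map, whether or not an inverse function exists: here f is not injective, and the preimage of {4, 9} contains both the positive and the negative roots.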

Example 2.23. An important function in measure theory is the indicator
function (sometimes also called characteristic function). Let Ω be a non-empty
set and let A ⊆ Ω. Then the indicator function of the subset A is
given by

1_A : Ω → {0, 1}, x ↦ 1 if x ∈ A, and x ↦ 0 if x ∉ A.
Remark 2.24. Technically the above definition of a map coincides with
the one of a function. Although the notion ’function‘ is sometimes preferred
over the use of ’map‘ (and vice versa) we treat the two terminologies as
interchangeable in this lecture.

2.4 Sequences & Countability


Definition 2.25. Let A be a non-empty set. A sequence (Folge) of elements
of A is a map N → A, i.e. each n ∈ N is associated to one element
a_n ∈ A. For the sequence we write

(a_n)_{n∈N}, (a_1, a_2, a_3, . . .),

or shorter (a_n).
Definition 2.26. Let (a_n) be a sequence of real numbers. The sequence is
called
1. bounded (beschränkt), if there is a number M ≥ 0, such that |a_n| ≤
M, for all n ∈ N.

2. monotonically increasing (or decreasing), if a_n ≤ a_{n+1} (or a_n ≥
a_{n+1}), for all n ∈ N.

3. convergent (konvergent), if there is a number a ∈ R with the following
property: For any ε > 0 there is a number N ∈ N such that

|a_n − a| < ε for all n ≥ N.

The number a is called the limit of (a_n) and we write

lim_{n→∞} a_n = a.

Example 2.27. Let a_n = 1/n. Obviously the sequence (a_n) = (1, 1/2, 1/3, . . .) is
monotonically decreasing and bounded from above by 1 and bounded from
below by 0. Moreover, it holds that lim_{n→∞} a_n = 0. To see that the number
zero has the property of a limit let ε > 0 be fixed. Then |a_n − 0| = 1/n < ε if
and only if n > 1/ε. Therefore one could choose N = inf{M ∈ N : M > 1/ε}
and would get |a_n − 0| < ε for all n ≥ N. Since this can be done for any
ε > 0 the limit of (a_n) is zero.
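
The choice of N in the example can be computed explicitly: the smallest natural number strictly larger than 1/ε is floor(1/ε) + 1. The sketch below assumes a rational ε so that all comparisons are exact; the function name `smallest_index` is hypothetical.

```python
from fractions import Fraction
from math import floor

def smallest_index(eps):
    """Smallest natural number N with N > 1/eps, for a rational eps > 0."""
    return floor(1 / Fraction(eps)) + 1

eps = Fraction(1, 10)
N = smallest_index(eps)
print(N)

# From N onwards every term 1/n lies within eps of the limit 0 ...
print(all(Fraction(1, n) < eps for n in range(N, N + 1000)))
# ... while the term just before N does not.
print(Fraction(1, N - 1) < eps)
```

Only finitely many indices can be checked this way; that the inequality holds for all n ≥ N follows from the monotonicity of 1/n, as argued in the example.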

The result of the above example can be generalised to monotone sequences.

Theorem 2.28. Every bounded and monotone sequence (a_n) converges. It
holds that

lim_{n→∞} a_n = sup{a_1, a_2, . . .}, if (a_n) is monotonically increasing,
lim_{n→∞} a_n = inf{a_1, a_2, . . .}, if (a_n) is monotonically decreasing.

Next we connect the concept of sequences to sets.

Definition 2.29. A non-empty set B is called countable (abzählbar) if
there is a sequence (a_n) such that B = {a_n : n ∈ N}.

It is clear that finite sets are by definition countable (see Exercise). The
concept of countability is of more interest for infinite sets, which are then
called countably infinite, as the following examples show.

Example 2.30. The sets N, Z and Q are all countable. The real numbers
R are not countable (see Exercise).

2.5 Product sets


Definition 2.31. Let d ≥ 1 be a natural number and A_1, . . . , A_d be non-empty
sets. The collection of all d-tuples a = (a_1, a_2, . . . , a_d) such that a_1 ∈
A_1, a_2 ∈ A_2, . . . , a_d ∈ A_d is called a product set (Produkt-/Kreuzmenge)
and is denoted by

∏_{i=1}^d A_i = A_1 × A_2 × . . . × A_d.

If all factors are the same, i.e. if A_1 = A_2 = . . . = A_d = A for some set A
then we write A^d = ∏_{i=1}^d A_i.

Examples 2.32.
(i) For d ≥ 1 the product set

R^d = R × . . . × R (d times)

is the space of all vectors of dimension d. We therefore speak of the space
R^d.
(ii) If A = {0, 1} is the set of binary numbers then

A^2 = {(0, 0), (0, 1), (1, 0), (1, 1)}
and A^3 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), . . . , (1, 1, 1)}.

(iii) If A = {0, 1} and B = {2, 3} then A × B = {(0, 2), (0, 3), (1, 2), (1, 3)}.

(iv) For real numbers a_1 ≤ b_1 and a_2 ≤ b_2 let A = [a_1, b_1] and B = [a_2, b_2]
be two closed intervals. Then the set A × B = {(x, y) : a_1 ≤ x ≤ b_1, a_2 ≤
y ≤ b_2} can be considered as a rectangle in the Cartesian plane R^2.
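
The finite product sets from examples (ii) and (iii) can be enumerated with `itertools.product` from the Python standard library. This is only a sketch of the finite case; the variable names are chosen for this illustration.

```python
from itertools import product

A = {0, 1}
B = {2, 3}

A_times_B = set(product(A, B))        # A × B as a set of pairs
A_cubed = set(product(A, repeat=3))   # A^3 as a set of triples

print(sorted(A_times_B))
print(len(A_cubed))
```

The number of tuples is the product of the cardinalities of the factors, so A^3 has 2 · 2 · 2 elements.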

For the space R intervals will be essential for the derivation of the
(one-dimensional) Lebesgue measure. Analogously, for the space R^2 rectangles
of type (iv) will be essential for the derivation of the (two-dimensional)
Lebesgue measure. The following definition generalises the concept of
intervals or rectangles to any arbitrary (finite) dimension.

Definition 2.33. Consider the space R^d. Let a_1, . . . , a_d and b_1, . . . , b_d be
real numbers such that a_j ≤ b_j for j = 1, . . . , d. Then a set Q of the form

Q = ∏_{j=1}^d [a_j, b_j]

is called a (closed d-dimensional) hyperrectangle (Quader). Similarly
one defines the open and half-open hyperrectangles.

In the one-dimensional case Q = [a, b] we have already introduced the
length of the interval, given by b − a. For a two-dimensional hyperrectangle
Q = [a_1, b_1] × [a_2, b_2] we analogously introduce the area (b_1 − a_1) · (b_2 −
a_2). Generalising this to a hyperrectangle Q = ∏_{j=1}^d [a_j, b_j] we introduce the
volume

∏_{j=1}^d (b_j − a_j) = (b_1 − a_1) · (b_2 − a_2) · . . . · (b_d − a_d).
Remark 2.34. The multiple use of the symbol ’∏‘ is unfortunate but well
established in the literature. However, it should be always clear from the
context whether a product set or a product of real numbers is meant.
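
The volume formula for hyperrectangles translates into a few lines of code. In this sketch a hyperrectangle is represented by a list of pairs (a_j, b_j); this representation and the function name `volume` are assumptions made for illustration.

```python
from math import prod

def volume(bounds):
    """Volume of the closed hyperrectangle given by pairs (a_j, b_j)."""
    for a, b in bounds:
        if a > b:
            raise ValueError("each pair must satisfy a_j <= b_j")
    return prod(b - a for a, b in bounds)

print(volume([(0, 2)]))            # length of the interval [0, 2]
print(volume([(0, 2), (1, 4)]))    # area of the rectangle [0, 2] × [1, 4]
print(volume([(0, 1)] * 3))        # volume of the unit cube in R^3
print(volume([(1, 1), (0, 5)]))    # a degenerate rectangle has volume zero
```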

2.6 Series
For a sequence (a_n)_{n∈N} of real numbers or functions and a natural number
N we introduce the notation of a finite series (endliche Reihe) by

∑_{n=1}^N a_n = a_1 + a_2 + . . . + a_N

and of an infinite series (unendliche Reihe) by

∑_{n=1}^∞ a_n := lim_{m→∞} ∑_{n=1}^m a_n = lim_{m→∞} (a_1 + a_2 + . . . + a_m)

if the limit exists.

Definition 2.35. A double sequence (Doppelfolge) of elements in a non-empty
set A is a map N × N → A, i.e. each pair (n, m) ∈ N × N is associated
to one a_{n,m} ∈ A.

Examples 2.36.
a_{n,m} = n/m gives all positive rational numbers
a_{n,m} = n^m gives all powers of natural numbers
a_{n,m} = f_n(b_m) for sequences of functions f_n : R → R and real numbers b_m.

Theorem 2.37 (Großer Umordnungssatz). Let (a_{n,m}) be a double sequence
of non-negative elements such that

∑_{n,m=1}^∞ a_{n,m} := ∑_{n=1}^∞ ∑_{m=1}^∞ a_{n,m} < ∞.

Then it holds that

∑_{n=1}^∞ ∑_{m=1}^∞ a_{n,m} = ∑_{m=1}^∞ ∑_{n=1}^∞ a_{n,m}.

Part II
Measure theory
3 Introduction
3.1 The ’problem of measure‘
In the following we consider the space R^d and the task of measuring its
subsets on the basis of our geometric understanding of hyperrectangles. The
corresponding measure µ should have the following desirable properties.

1. The measure is given by a map µ : P(R^d) → [0, ∞]. In particular, µ
produces non-negative outcomes and each subset of R^d can be ’measured‘.

2. For any pair A, B ⊆ R^d with A ⊆ B it holds that µ(A) ≤ µ(B)
(Monotonicity). I.e. if a set B is ’larger‘ than another set A it should
also have at least the same mass.

3. For all A ⊆ R^d and x ∈ R^d it holds that µ(A + x) = µ(A), where
A + x := {a + x : a ∈ A} (Invariance under translations). This means
that it does not matter ’where‘ we measure the object, i.e. the mass of
the object (w.r.t. µ) does not depend on its location.

4. For all hyperrectangles Q = [a_1, b_1] × . . . × [a_d, b_d] it holds that

µ(Q) = ∏_{k=1}^d (b_k − a_k).

This is the generalisation of the formula for the length of an interval (or
the area of a rectangle) to higher dimensions.

5. For all sequences (A_n) of pairwise disjoint subsets of R^d it holds that

µ(⋃_{n=1}^∞ A_n) = ∑_{n=1}^∞ µ(A_n).

The above property is called σ-additivity and can be motivated as
follows. Consider sets that can only be approximated but not partitioned
by a finite number of disjoint hyperrectangles, e.g. a circle in
R^2. Then it would be preferable that the mass of the entire set is just
given by the limit of the summed masses of the hyperrectangles that
constitute the set.

Theorem 3.1. There is no such function µ with the properties (1)-(5).

Proof. (Sketch) The statement follows if it can be shown that already on
[0, 1]^d ⊊ R^d such a function does not exist. For this consider for x ∈
[0, 1]^d the set
A_x := {y ∈ [0, 1]^d : x − y ∈ Q^d}.
Now let I be a (non-unique) index set that yields the following partition

[0, 1]^d = ⋃_{x∈I} A_x (see footnote 1),

i.e. the family (A_x)_{x∈I} consists of pairwise disjoint sets. If we set

B := ⋃_{r∈[−1,1]^d∩Q^d} (I + {r}),

then the following holds

(i) The set B is a countable union (since Q^d is countable).
(ii) The set B is a disjoint union.
(iii) The set B satisfies [0, 1]^d ⊆ B ⊆ [−1, 2]^d.
Now we would like to measure B and assume that we can do this with a
function µ that obeys the properties (1)-(5). Then properties (2) and (4)
along with (iii) imply 1 ≤ µ(B) ≤ 3^d. Moreover, the facts (i) and (ii) in
conjunction with properties (3) and (5) yield

1 ≤ ∑_{r∈[−1,1]^d∩Q^d} µ(I + {r}) ≤ 3^d,

where each summand satisfies µ(I + {r}) = µ(I) and the whole sum equals µ(B).

It is clear that one has µ(I) > 0 since otherwise the above sum would equal
zero, which is impossible since it has to be at least one. On the other hand
µ(B) is then an infinite sum of the same positive number µ(I), which is
infinite and therefore cannot be bounded by 3^d. Thus we get a contradiction
and the assumption that such a function exists cannot hold.

There are also other statements that show that the measure problem has no
solution. Prominent examples are given by Vitali’s theorem, the Banach-
Tarski-Paradox and the Hausdorff-Paradox.

3.2 Outline
Theorem 3.1 implies that the set P(R^d) is too large to establish a ’nice‘
measure theory on. We therefore consider in Section 4 collections of sets
that are as large as possible but still allow for a ’meaningful‘ definition of
measures in Section 5.

Footnote 1: This is possible since x − y ∈ Q^d defines a so-called equivalence relation on [0, 1]^d.

4 Measurable spaces
As we have seen in Section 3 it is impossible to introduce the Lebesgue measure
on the entire power set P(R^d) = {A : A ⊆ R^d}. The aim of this chapter
is to characterise systems of subsets 𝒜 ⊆ P(R^d) that are ’very large‘ and
still allow for the derivation of measures with desirable properties. These
systems will be introduced in general for an arbitrary underlying set Ω, such
that Ω = R^d is only a special case.

4.1 σ-Algebras
We begin with a first type of system of sets that will be important for the
construction of the Lebesgue measure.
Definition 4.1. Let Ω be a non-empty set. A collection 𝒜 ⊆ P(Ω) of
subsets of Ω is called an algebra on Ω if the following properties are satisfied.
1. Ω ∈ 𝒜, i.e. the system 𝒜 contains the underlying set Ω.

2. If A ∈ 𝒜 then also A^c ∈ 𝒜, i.e. if a set A is contained in 𝒜 so is its
complement A^c.

3. If A, B ∈ 𝒜, then also A ∪ B ∈ 𝒜, i.e. the system 𝒜 is closed under
unions.
Examples 4.2.
(i) The trivial pair 𝒜 = {∅, Ω} is an algebra (coarse case).
(ii) The power set 𝒜 = P(Ω) is an algebra (fine case).
(iii) For any subset A ∈ P(Ω) the collection 𝒜 = {A, A^c, ∅, Ω} is an algebra.
(iv) For Ω = {1, 2, 3} the system 𝒜 = {{1}, {2}, {1, 2, 3}, ∅} is not an algebra.
For instance {1} ∈ 𝒜 but {1}^c = {2, 3} ∉ 𝒜, or {1}, {2} ∈ 𝒜 but {1} ∪ {2} ∉ 𝒜.
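
For a finite underlying set the three properties of an algebra can be verified by brute force. The following sketch checks examples (iii) and (iv); the function name `is_algebra` is an assumption of this illustration.

```python
def is_algebra(omega, system):
    """Check the three defining properties of an algebra on a finite set omega."""
    omega = frozenset(omega)
    system = {frozenset(a) for a in system}
    # Property 1: the underlying set itself belongs to the system.
    if omega not in system:
        return False
    # Property 2: the system is closed under complements.
    if any(omega - a not in system for a in system):
        return False
    # Property 3: the system is closed under unions of two sets.
    return all(a | b in system for a in system for b in system)

omega = {1, 2, 3}
print(is_algebra(omega, [{1}, {2, 3}, set(), omega]))   # example (iii) with A = {1}
print(is_algebra(omega, [{1}, {2}, omega, set()]))      # example (iv)
```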
Having in mind that a circle is only approximable by infinitely many
rectangles we need to strengthen the concept of an algebra a little bit more.
This leads to the following definition.
Definition 4.3. Let Ω be a set. A system 𝒜 ⊆ P(Ω) is called σ-algebra
on Ω if the following properties are satisfied.
1. 𝒜 is an algebra.
2. For any sequence (A_n) of sets in 𝒜 also ⋃_{n=1}^∞ A_n ∈ 𝒜.
Examples 4.4.
(i) All algebras from Examples 4.2 are also σ-algebras. More generally: Any
σ-algebra is an algebra.

(ii) If Ω is infinite then 𝒜 = {A ⊆ Ω : A finite or A^c finite} is an algebra
but not a σ-algebra, cf. Exercises.

Lemma 4.5. Let 𝒜 be an algebra. Then it holds that

1. ∅ ∈ 𝒜.

2. For any A, B ∈ 𝒜 also A ∩ B ∈ 𝒜.

3. For any A_1, . . . , A_m ∈ 𝒜 also ⋃_{n=1}^m A_n ∈ 𝒜 and ⋂_{n=1}^m A_n ∈ 𝒜.

4. If 𝒜 is even a σ-algebra then for A_n ∈ 𝒜, n ≥ 1, also ⋂_{n=1}^∞ A_n ∈ 𝒜.

Remark 4.6. Some definitions of (σ-)algebras in the literature use the
criterion ∅ ∈ 𝒜 instead of the equivalent condition that Ω ∈ 𝒜.

Proof. 1. Since Ω ∈ 𝒜 and since A^c ∈ 𝒜 for any A ∈ 𝒜 we have that
∅ = Ω^c ∈ 𝒜.

2. Let A, B ∈ 𝒜. By De Morgan’s rule we know that (A ∩ B)^c = A^c ∪ B^c,
which is equivalent to A ∩ B = (A^c ∪ B^c)^c. The claim now follows if
we show that (A^c ∪ B^c)^c ∈ 𝒜. First note that A^c, B^c ∈ 𝒜 (since 𝒜 is
an algebra). Similarly, we see that A^c ∪ B^c belongs to 𝒜 and therefore
also its complement (A^c ∪ B^c)^c. Thus also A ∩ B ∈ 𝒜.

3. Since 𝒜 is an algebra also B_1 = A_1 ∪ A_2 is an element of 𝒜. Now
since B_1 and A_3 are elements of 𝒜 also B_2 = B_1 ∪ A_3 = A_1 ∪ A_2 ∪ A_3
belongs to 𝒜. Repeating this argument yields the claim ⋃_{n=1}^m A_n ∈ 𝒜.
Using 2., one proceeds in the same way to show ⋂_{n=1}^m A_n ∈ 𝒜.

4. As in 2., we make use of De Morgan’s rule to see that ⋂_{n=1}^∞ A_n =
(⋃_{n=1}^∞ A_n^c)^c. Since A_n^c ∈ 𝒜 for all n ∈ N and therefore also
⋃_{n=1}^∞ A_n^c ∈ 𝒜, we finally get (⋃_{n=1}^∞ A_n^c)^c ∈ 𝒜, which proves the claim.

Lemma 4.7. Let Ω be a non-empty set and (𝒜_i)_{i∈I} be a family of σ-algebras,
where I denotes a non-empty index set. Then

⋂_{i∈I} 𝒜_i = {A ⊆ Ω : A ∈ 𝒜_i, ∀i ∈ I}

is again a σ-algebra, i.e. the intersection of (arbitrarily many) σ-algebras is
again a σ-algebra.

Proof. We need to show that ⋂_{i∈I} 𝒜_i satisfies the properties of a σ-algebra.

• We show: Ω ∈ ⋂_{i∈I} 𝒜_i. Since for any i ∈ I each 𝒜_i is a σ-algebra
𝒜_i must contain Ω, i.e. Ω ∈ 𝒜_i for all i ∈ I. But this exactly means
Ω ∈ ⋂_{i∈I} 𝒜_i.

• We show: If A ∈ ⋂_{i∈I} 𝒜_i then also A^c ∈ ⋂_{i∈I} 𝒜_i. Let A ∈ ⋂_{i∈I} 𝒜_i,
i.e. A ∈ 𝒜_i for all i ∈ I. Since for any i ∈ I each 𝒜_i is a σ-algebra
𝒜_i must also contain A^c, i.e. A^c ∈ 𝒜_i for all i ∈ I. But this exactly
means A^c ∈ ⋂_{i∈I} 𝒜_i.

• We show: If (A_n) is a sequence in ⋂_{i∈I} 𝒜_i, then also ⋃_{n=1}^∞ A_n ∈
⋂_{i∈I} 𝒜_i. Let (A_n) be a sequence in ⋂_{i∈I} 𝒜_i, i.e. (A_n) is a sequence in
𝒜_i for all i ∈ I. Since for any i ∈ I each 𝒜_i is a σ-algebra 𝒜_i must
also contain ⋃_{n=1}^∞ A_n, i.e. ⋃_{n=1}^∞ A_n ∈ 𝒜_i for all i ∈ I. But this exactly
means ⋃_{n=1}^∞ A_n ∈ ⋂_{i∈I} 𝒜_i.

Therefore we have shown all properties.

Lemma 4.8. Let Ω be a non-empty set and let S ⊆ P(Ω) be a collection
of subsets of Ω. Then

σ(S) = ⋂_{𝒜 is σ-algebra with S⊆𝒜} 𝒜

is again a σ-algebra. It is the smallest σ-algebra that contains S.

Proof. By the previous Lemma it is clear that σ(S) is again a σ-algebra.
Moreover, since S ⊆ 𝒜 for any σ-algebra 𝒜 in the intersection it also holds
that S ⊆ σ(S).
Finally, assume that there is a smaller σ-algebra B that contains S, i.e.
B satisfies S ⊆ B and B ⊆ σ(S). Then B also appears in the intersection
that yields σ(S), i.e. σ(S) ⊆ B. Since we therefore have σ(S) ⊆ B as well
as B ⊆ σ(S) it holds that B = σ(S), i.e. B is not a smaller system than
σ(S).

Definition 4.9. The system σ(S) is called the σ-algebra that is gener-
ated by S. The system S is called the generator of the σ-algebra σ(S).

Example 4.10. Let Ω = {1, 2, 3, 4} and let S = {{1}, {2}}. Then there are
the following possible σ-algebras containing S:

𝒜_1 = P(Ω), 𝒜_2 = {{1}, {2}, {1, 2}, {3, 4}, {2, 3, 4}, {1, 3, 4}, {1, 2, 3, 4}, ∅}.

Thus we obtain by 𝒜_2 ⊆ 𝒜_1

σ(S) = 𝒜_1 ∩ 𝒜_2 = 𝒜_2.
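
On a finite set Ω the generated σ-algebra can be computed by repeatedly adding complements and pairwise unions until nothing new appears. This works because on a finite set there are only finitely many subsets, so closure under finite unions already gives closure under countable unions. The sketch below uses this closure procedure instead of the intersection from Lemma 4.8; the function name `generated_sigma_algebra` is hypothetical.

```python
def generated_sigma_algebra(omega, generator):
    """Close a generator under complements and unions on a finite set omega."""
    omega = frozenset(omega)
    system = {omega, frozenset()} | {frozenset(s) for s in generator}
    while True:
        new = {omega - a for a in system}
        new |= {a | b for a in system for b in system}
        if new <= system:        # nothing new was produced: closure reached
            return system
        system |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(sigma))
print(sorted(sorted(a) for a in sigma))
```

For the generator of Example 4.10 the procedure returns the eight sets of the smaller of the two σ-algebras listed there.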

Lemma 4.11. Let Ω be a non-empty set.

1. If 𝒜 is a σ-algebra on Ω then σ(𝒜) = 𝒜.

2. If S, S' ⊆ P(Ω) are two systems of sets such that S ⊆ S' then σ(S) ⊆
σ(S').

Proof. Since σ(𝒜) is the smallest σ-algebra that contains 𝒜 and 𝒜 itself is a
σ-algebra that contains 𝒜, the first statement holds.
For the second statement let 𝒜' be a σ-algebra that contains S'. Because
S ⊆ S', the σ-algebra 𝒜' also contains S. In other words: Any σ-algebra
that appears in the intersection defining σ(S') also appears in the
intersection defining σ(S). Since intersecting over fewer systems can only
yield a larger system, we get

σ(S) = ⋂_{𝒜 is σ-algebra with S⊆𝒜} 𝒜 ⊆ ⋂_{𝒜' is σ-algebra with S'⊆𝒜'} 𝒜' = σ(S').

Definition 4.12. Let Ω be a non-empty set and 𝒜 be a σ-algebra on Ω.
Then the pair (Ω, 𝒜) is called a measurable space. The elements of 𝒜 are
called measurable sets.

Example 4.13. For a non-empty set Ω and a subset A ⊆ Ω a measurable
space is given by (Ω, σ({A})), where σ({A}) = {A, A^c, ∅, Ω}.

4.2 Borel σ-Algebra


Now we consider the special case Ω = Rd and the system of so-called Borel
sets, the Borel σ-algebra. We first focus on the one-dimensional case d = 1,
i.e. Ω = R, and generalise to arbitrary d ∈ N afterwards.

Definition 4.14. Denote by I_1 ⊆ P(R) the system of sets that consists of
all left half-open intervals, i.e. I_1 contains all elements of type

(a, b], (−∞, b], (a, ∞), (−∞, ∞),

where a ≤ b are real numbers. The σ-algebra B(R) := σ(I_1) is called the
Borel σ-algebra. A subset A ∈ B(R) is called a Borel set.

Examples 4.15.
(i) Any one-element set {x} for x ∈ R is a Borel set. This can be seen
since {x} = ⋂_{n=1}^∞ (x − 1/n, x].
(ii) By (i) also any finite or countably infinite set B ⊆ R is a Borel set. In
particular, the rational numbers Q form a Borel set in R.

Lemma 4.16. Any of the following systems of sets is also a generator of
the Borel σ-algebra.

(a) S_1 = {(a, b] : a, b ∈ R, a ≤ b}
(b) S_2 = {(a, b) : a, b ∈ R, a ≤ b}
(c) S_3 = {[a, b] : a, b ∈ R, a ≤ b}
(d) S_4 = {(a, ∞) : a ∈ R}
(e) S_5 = {[a, ∞) : a ∈ R}

Proof. For (a) and (b), cf. exercises. The remaining systems (c), (d) & (e)
are left to the reader.
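As a hedged hint at the kind of argument behind (a) and (b) (the exercises may proceed differently), each half-open interval can be written through countably many open intervals and vice versa. For a < b:

```latex
\[
(a,b] = \bigcap_{n=1}^{\infty} \Bigl(a,\; b + \tfrac{1}{n}\Bigr),
\qquad
(a,b) = \bigcup_{n=1}^{\infty} \Bigl(a,\; b - \tfrac{1}{n}\Bigr],
\]
% with the convention (a, c] = \emptyset whenever c \le a.
```

The first identity shows S1 ⊆ σ(S2 ), the second shows S2 ⊆ σ(S1 ), and Lemma 4.11 then gives σ(S1 ) = σ(S2 ).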

Remark 4.17. The advantage of I1 as generator of B(R) over any of the systems Si , i = 1, . . . , 5, lies in the fact that complements of elements of I1 are representable by finite disjoint unions of elements of I1 :
\[
(a,b]^c = (-\infty,a] \cup (b,\infty), \qquad (-\infty,b]^c = (b,\infty),
\]
\[
(a,\infty)^c = (-\infty,a], \qquad (-\infty,\infty)^c = \emptyset.
\]


Note that ∅ ∈ I1 since for any a ∈ R we have (a, a] = ∅. Similarly, also
intersections between elements of I1 are again elements of I1 .
Definition 4.18. A set U ⊆ R is called open if for any x ∈ U there is a
number ε > 0 such that
(x − ε, x + ε) ⊆ U.
A set V ⊆ R is called closed if its complement R\V is open.
The definition implies that the union of arbitrarily many open sets is
again open. Using complements, one can see that the intersection of arbi-
trarily many closed sets is again closed.
Examples 4.19.
(i) Any open interval (a, b) with −∞ ≤ a ≤ b ≤ ∞ is indeed an open set.
(ii) Any one-element set {x} with x ∈ R is a closed set.
(iii) Any closed interval [a, b] with −∞ < a ≤ b < ∞ is indeed a closed set.
(iv) Any non-empty bounded half-open interval, i.e. (a, b] or [a, b) with −∞ < a < b < ∞, is neither closed nor open.
Theorem 4.20. All open and closed sets of R are Borel sets. Moreover, if
U and V denote the collection of all open and closed sets of R, respectively,
then it holds that
B(R) = σ(U) = σ(V).
Proof. If U ∈ U is an open set then for any x ∈ U there is some ε > 0 such that
(x − ε, x + ε) ⊆ U . In particular, we have that

x − ε < x < x + ε.

It is easy to see that there are numbers a, b ∈ Q such that

x − ε ≤ a < x < b ≤ x + ε,

hence (a, b) ⊆ (x − ε, x + ε) ⊆ U . In other words: x ∈ U implies x ∈ (a, b), where (a, b) ⊆ U with some a, b ∈ Q. This gives
\[
U = \bigcup_{a,b \in \mathbb{Q} \text{ with } (a,b) \subseteq U} (a,b).
\]

In particular, U is a countable union of sets of type (a, b), hence U ∈ σ(S2 ). Since U ∈ U was arbitrary, this means U ⊆ σ(S2 ) and therefore σ(U) ⊆ σ(S2 ) = B(R).
On the other hand, we know that (a, b) with a ≤ b is an open set², i.e. S2 ⊆ U and thus B(R) = σ(S2 ) ⊆ σ(U). This together with σ(U) ⊆ B(R) gives σ(U) = B(R).
Finally, for any closed set V ∈ V we have that V c is open, i.e. V c ∈
σ(U). But σ(U) is a σ-algebra, therefore also V = (V c )c ∈ σ(U). This
gives σ(V) ⊆ σ(U). In the same way we get σ(U) ⊆ σ(V) and the claim
follows.

Lemma 4.21. The system of sets given by
\[
\mathcal{F}_1 := \Bigl\{ \bigcup_{i=1}^{m} Q_i : m \in \mathbb{N},\ Q_1,\dots,Q_m \in \mathcal{I}_1 \text{ pairwise disjoint} \Bigr\}
\]
is an algebra. It is the smallest algebra that contains I1 .

Proof. We check the properties of an algebra.

• We show R ∈ F1 : This is clear since R = (−∞, ∞) ∈ I1 .

• Let A ∈ F1 , i.e. $A = \bigcup_{i=1}^{m} Q_i$ for some m ∈ N and some pairwise disjoint Q1 , . . . , Qm in I1 . W.l.o.g. assume that Qi ≠ ∅ for all i = 1, . . . , m and that the sets are ordered, i.e. for x ∈ Qi and y ∈ Qi+1 we always have x < y for any i = 1, . . . , m − 1. We consider the following cases:

– If Qi = (−∞, ∞) for one i ∈ {1, . . . , m} then m = 1, hence


A = (−∞, ∞) and thus Ac = ∅ ∈ I1 and in particular Ac ∈ F1 .
– If Q1 = (−∞, b1 ] and Qi = (ai , bi ], for i = 2, . . . , m, then, by De Morgan’s rule,
\[
A^c = \Bigl(\bigcup_{i=1}^{m} Q_i\Bigr)^c = \bigcap_{i=1}^{m} Q_i^c
= (b_1,\infty) \cap \bigcap_{i=2}^{m} \bigl((-\infty,a_i] \cup (b_i,\infty)\bigr).
\]
Since for arbitrary sets C, D it holds that C ∩ D = C ∩ (C ∩ D), a repetition of this argument gives
\[
A^c = \bigcap_{i=2}^{m} \Bigl( (b_1,\infty) \cap \bigl((-\infty,a_i] \cup (b_i,\infty)\bigr) \Bigr).
\]
² It can be shown that (a, a) = ∅ is open (and closed at the same time).

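The distributive law C ∩ (D ∪ E) = (C ∩ D) ∪ (C ∩ E) implies
\begin{align*}
A^c &= \bigcap_{i=2}^{m} \Bigl( (b_1,\infty) \cap \bigl((-\infty,a_i] \cup (b_i,\infty)\bigr) \Bigr) \\
&= \bigcap_{i=2}^{m} \Bigl( \bigl((b_1,\infty) \cap (-\infty,a_i]\bigr) \cup \bigl((b_1,\infty) \cap (b_i,\infty)\bigr) \Bigr) \\
&= \bigcap_{i=2}^{m} \bigl( (b_1,a_i] \cup (b_i,\infty) \bigr)
= \bigcup_{\substack{I,J \subseteq \{2,\dots,m\},\\ I = J^c}} \Bigl( \bigcap_{i \in I} (b_1,a_i] \cap \bigcap_{j \in J} (b_j,\infty) \Bigr).
\end{align*}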

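The last equality is shown by a repetition of the distributive law via induction over m. Note that by assumption b1 ≤ a2 < b2 ≤ a3 < b3 ≤ . . . < bm . This implies
\[
\bigcap_{i \in I} (b_1,a_i] =
\begin{cases}
(b_1, a_{\min I}], & \text{if } I \neq \emptyset, \\
(-\infty,\infty), & \text{else},
\end{cases}
\]
and
\[
\bigcap_{j \in J} (b_j,\infty) =
\begin{cases}
(b_{\max J}, \infty), & \text{if } J \neq \emptyset, \\
(-\infty,\infty), & \text{else}.
\end{cases}
\]
We therefore get
\[
A^c = (b_1,a_2] \cup (b_m,\infty) \cup \bigcup_{\substack{I,J \subseteq \{2,\dots,m\},\\ I = J^c,\ I,J \neq \emptyset}} \Bigl( (b_1,a_{\min I}] \cap (b_{\max J},\infty) \Bigr).
\]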

But $(b_1,a_{\min I}] \cap (b_{\max J},\infty) \neq \emptyset$ is only possible if max J < min I, i.e. (since I = Jᶜ) if min I − 1 = max J. In this particular case we have $(b_1,a_{\min I}] \cap (b_{\max J},\infty) = (b_{\max J}, a_{\min I}] = (b_{\min I-1}, a_{\min I}]$. All in all we get
\[
A^c = (b_1,a_2] \cup (b_m,\infty) \cup \bigcup_{k=3}^{m} (b_{k-1}, a_k]
\]
and in particular Aᶜ ∈ F1 .
– The two other cases involving Qm = (am , ∞) can be shown analogously.

• Now let A, B ∈ F1 , i.e. $A = \bigcup_{i=1}^{m} Q_i$ and $B = \bigcup_{j=1}^{n} R_j$, where Q1 , . . . , Qm , R1 , . . . , Rn ∈ I1 . As before, assume that the sets are ordered. Tbd

We now give the analogous higher-dimensional definitions and statements; the proofs are omitted.

Definition 4.22. Denote by Id ⊆ P(Rᵈ) the system of sets that consists of all left half-open hyperrectangles, i.e.
\[
\mathcal{I}_d = \Bigl\{ \prod_{j=1}^{d} (a_j, b_j] \cap \mathbb{R}^d : -\infty \le a_j \le b_j \le \infty,\ j = 1,\dots,d \Bigr\}.
\]
The σ-algebra B(Rᵈ) := σ(Id ) is called the Borel σ-algebra. A subset A ∈ B(Rᵈ) is called a Borel set.
Definition 4.23. A set U ⊆ Rᵈ is called open if for any x = (x1 , . . . , xd ) ∈ U there is a number ε > 0 such that
\[
\prod_{j=1}^{d} (x_j - \varepsilon,\, x_j + \varepsilon) \subseteq U.
\]
A set V ⊆ Rᵈ is called closed if its complement Rᵈ\V is open.


Examples 4.24.
(i) Any of the open hyperrectangles $\prod_{j=1}^{d} (a_j, b_j)$ with −∞ < aj < bj < ∞ is indeed an open set.
(ii) Any closed hyperrectangle is indeed a closed set.
(iii) Any one-element set {x} with x ∈ Rᵈ is a closed set.
(iv) The set {(x, y) : x² + y² < 1} ⊆ R² is an open set.
Theorem 4.25. All open and closed sets of Rd are Borel sets. Moreover, if
U and V denote the collection of all open and closed sets of Rd , respectively,
then it holds that
B(Rd ) = σ(U) = σ(V).
Lemma 4.26. The system of sets given by
\[
\mathcal{F}_d := \Bigl\{ \bigcup_{i=1}^{m} Q_i : m \in \mathbb{N},\ Q_1,\dots,Q_m \in \mathcal{I}_d \text{ pairwise disjoint} \Bigr\}
\]
is an algebra. It is the smallest algebra that contains Id .

4.3 σ-algebras under maps


A nice structural property of σ-algebras appears in the interplay with maps.
First note the following.
Lemma 4.27. Let E, F be non-empty sets and f : E → F be a map. If (Bi )i∈I is a family of subsets of F , where I is some non-empty index set, then it holds that
\[
f^{-1}\Bigl(\bigcup_{i \in I} B_i\Bigr) = \bigcup_{i \in I} f^{-1}(B_i).
\]
If A and B are subsets of F then it holds that
\[
f^{-1}(B \setminus A) = f^{-1}(B) \setminus f^{-1}(A).
\]
In particular, f⁻¹(Aᶜ) = (f⁻¹(A))ᶜ.

Proof. See exercises.

Theorem 4.28. Let f : E → F be a map from a non-empty set E to a


measurable space (F, B). Then the following statements hold:

1. The system of sets

f −1 (B) = {f −1 (B) : B ∈ B}

is a σ-algebra (on E).

2. For any system of sets S ⊆ P(F ) we have that

σ(f −1 (S)) = f −1 (σ(S)).

Proof. 1. We check the properties of a σ-algebra.

• Show E ∈ f −1 (B): Since B is a σ-algebra we have F ∈ B. Now

f −1 (F ) = {y ∈ E : f (y) ∈ F } = {y : y ∈ E} = E.

The second equality holds true since f is a map, i.e. f assigns


to each element y ∈ E exactly one element f (y) in F . This gives
E ∈ f −1 (B).
• Show that A ∈ f −1 (B) implies Ac ∈ f −1 (B): Let A ∈ f −1 (B),
i.e. there is some set B ∈ B such that A = f −1 (B). Now, by
Lemma 4.27 we have that Ac = (f −1 (B))c = f −1 (B c ). Since B
is a σ-algebra we know B c ∈ B, thus Ac is the pre-image of an
element of B. This means Ac ∈ f −1 (B).
• Let (An ) be a sequence of sets in f⁻¹(B). We show that also $\bigcup_{n=1}^{\infty} A_n \in f^{-1}(\mathcal{B})$. Since An ∈ f⁻¹(B) there is a set Bn ∈ B such that An = f⁻¹(Bn ) for all n ∈ N. Use Lemma 4.27 to see that
\[
\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} f^{-1}(B_n) = f^{-1}\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr).
\]
Since B is a σ-algebra we have that $\bigcup_{n=1}^{\infty} B_n \in \mathcal{B}$, hence $\bigcup_{n=1}^{\infty} A_n$ is the pre-image of a set in B, i.e. $\bigcup_{n=1}^{\infty} A_n \in f^{-1}(\mathcal{B})$.

2. Since f⁻¹(S) ⊆ f⁻¹(σ(S)) and since f⁻¹(σ(S)) is a σ-algebra by 1., it follows that σ(f⁻¹(S)) ⊆ f⁻¹(σ(S)).
To show the reverse inclusion set C := {C ⊆ F : f⁻¹(C) ∈ σ(f⁻¹(S))}. By Lemma 4.27 the system C is a σ-algebra on F and, by definition of C, we have that S ⊆ C. Thus we also have σ(S) ⊆ C and therefore f⁻¹(σ(S)) ⊆ f⁻¹(C) ⊆ σ(f⁻¹(S)).

The above theorem implies that we are always given a σ-algebra f −1 (B)
on the domain E which is ’inherited‘ from B via the map f . To determine
f −1 (B) it suffices to generate over all pre-images from a generator S of B.
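The identity σ(f⁻¹(S)) = f⁻¹(σ(S)) can be checked by brute force on finite sets. The sketch below is illustrative only: the sets `E`, `F`, the map `f` and the helper names are assumptions chosen for the example, and the closure routine only works because everything is finite.

```python
from itertools import combinations


def generate_sigma_algebra(omega, generator):
    """Return sigma(generator) on the finite set omega as a set of frozensets."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in generator}
    while True:
        closed = set(sets)
        closed |= {omega - a for a in sets}                   # complements
        closed |= {a | b for a, b in combinations(sets, 2)}   # finite unions
        if closed == sets:
            return sets
        sets = closed


def preimage(f, domain, target_set):
    """Return f^{-1}(target_set) as a frozenset of points of the domain."""
    return frozenset(x for x in domain if f(x) in target_set)


def f(x):
    return x % 3


E = {0, 1, 2, 3, 4, 5}    # domain
F = {0, 1, 2}             # codomain
S = [{0}, {1}]            # generator on F

# Left-hand side: generate from the pre-images of the generator.
lhs = generate_sigma_algebra(E, [preimage(f, E, s) for s in S])
# Right-hand side: take pre-images of every set of sigma(S).
rhs = {preimage(f, E, b) for b in generate_sigma_algebra(F, S)}
```

Both `lhs` and `rhs` contain the same eight subsets of E, namely all unions of the blocks {0, 3}, {1, 4} and {2, 5}.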

5 Measures
5.1 Definition & Properties
Definition 5.1. Let (Ω, A) be a measurable space. A function µ : A → R
is called a measure on A if
1. µ(∅) = 0,

2. µ(A) ≥ 0, for any A ∈ A,

3. µ is σ-additive on A, i.e. for any sequence (An ) of pairwise disjoint sets in A it holds that
\[
\mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n).
\]

Remark 5.2. Note that it is allowed that µ assigns the outcome ’∞’ to a
set.
Definition 5.3. Let (Ω, A) be a measurable space. If µ is a measure on A
then the triplet (Ω, A, µ) is called a measure space.
Definition 5.4. Let (Ω, A, µ) be a measure space.
1. If µ(Ω) < ∞ then µ is called a finite measure.

2. If µ(Ω) = 1 then µ is called a probability measure and the triplet


(Ω, A, µ) is called a probability space.

3. If Ω is finite or countably infinite and if A = P(Ω) then (Ω, P(Ω), µ)


is called a discrete measure space.

4. If there is a sequence (An ) in A such that $\Omega = \bigcup_{n=1}^{\infty} A_n$ and µ(An ) < ∞, n ≥ 1, then µ is called a σ-finite measure.
Examples 5.5.

(i) Consider Ω = N and A = P(N). For A ∈ A let
\[
\mu(A) =
\begin{cases}
|A|, & \text{if } |A| < \infty, \\
\infty, & \text{else},
\end{cases}
\]
i.e. µ(A) equals the number |A| of elements in A if A is finite. Then µ is a measure on P(N), the so-called counting measure, and the triplet (N, P(N), µ) is a discrete measure space. Note that µ is not finite but σ-finite.
(ii) Let (Ω, A) be a measurable space. Then for each x ∈ Ω the function
\[
\delta_x(A) := \mathbf{1}_A(x) =
\begin{cases}
1, & \text{if } x \in A, \\
0, & \text{else},
\end{cases}
\]
is a probability measure and (Ω, A, δx ) is a probability space.

(iii) Let Ω = {1, 2, . . . , 6} and A = P(Ω). Then the function
\[
P(A) = \sum_{i=1}^{6} \frac{1}{6} \cdot \mathbf{1}_A(i)
\]
is a probability measure and (Ω, A, P ) is a probability space.


(iv) Let (Ω, A) = (R, B(R)). We will see in the next sections that there is a
unique (continuous) measure λ on B(R) - the Lebesgue measure - such that
λ((a, b]) = b − a, for any −∞ < a ≤ b < +∞.
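Examples (ii) and (iii) are simple enough to be evaluated directly. The following sketch encodes sets as Python sets and uses exact fractions; the function names `dirac` and `die_measure` are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def dirac(x):
    """Dirac measure delta_x: maps a set A to 1 if x is in A and to 0 otherwise."""
    return lambda A: 1 if x in A else 0


def die_measure(A):
    """P(A) = sum_{i=1}^{6} (1/6) * 1_A(i) from Example 5.5 (iii)."""
    return sum(Fraction(1, 6) for i in range(1, 7) if i in A)


even, odd = {2, 4, 6}, {1, 3, 5}
delta_2 = dirac(2)
```

For the disjoint sets `even` and `odd` one can check additivity, `die_measure(even) + die_measure(odd) == die_measure(even | odd)`, and that the whole space has mass 1.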
Lemma 5.6. Let (Ω, A, µ) be a measure space.
1. (Additivity) For A1 , . . . , Am ∈ A pairwise disjoint, m ≥ 2, it holds that
\[
\mu\Bigl(\bigcup_{n=1}^{m} A_n\Bigr) = \sum_{n=1}^{m} \mu(A_n).
\]

2. (Monotonicity) For A, B ∈ A with A ⊆ B it holds that

µ(A) ≤ µ(B).

3. (Subtractivity) For A, B ∈ A with A ⊆ B and µ(A) < ∞ it holds that

µ(B\A) = µ(B) − µ(A).

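Proof. 1. Let (Bn ) be the sequence given by Bn = An for any n ∈ {1, . . . , m} and Bn = ∅ for any n > m. Then (Bn ) is a sequence of pairwise disjoint sets in A, so µ(∅) = 0 and the σ-additivity of µ imply
\[
\mu\Bigl(\bigcup_{n=1}^{m} A_n\Bigr) = \mu\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) = \sum_{n=1}^{\infty} \mu(B_n) = \sum_{n=1}^{m} \mu(B_n) = \sum_{n=1}^{m} \mu(A_n).
\]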

2. For A ⊆ B we have that A and B\A are disjoint. By the previous
statement (additivity) we have

µ(B) = µ(A ∪ B\A) = µ(A) + µ(B\A) ≥ µ(A),

since µ(B\A) ≥ 0.

3. Consider again the derived equality

µ(B) = µ(A) + µ(B\A).

Since µ(A) < ∞ this is equivalent to

µ(B) − µ(A) = µ(B\A).

Theorem 5.7. Let (Ω, A, µ) be a measure space.

1. If (An ) is a sequence of sets in A such that An ⊆ An+1 for all n ≥ 1 then it holds that
\[
\lim_{n\to\infty} \mu(A_n) = \mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr).
\]

2. If (An ) is a sequence of sets in A such that An ⊇ An+1 for all n ≥ 1 and if µ(A1 ) < ∞ then it holds that
\[
\lim_{n\to\infty} \mu(A_n) = \mu\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr).
\]

These two properties are called σ-continuity (from below and above).

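Proof. 1. Set B1 = A1 and set Bk = Ak \(A1 ∪ . . . ∪ Ak−1 ) = Ak \Ak−1 for k ≥ 2. Then it holds that

(a) (Bn ) is a sequence of pairwise disjoint sets of A,
(b) $\bigcup_{n=1}^{m} B_n = A_m$ for all m ∈ N,
(c) $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$.

Now it follows that
\begin{align*}
\lim_{n\to\infty} \mu(A_n) &= \lim_{n\to\infty} \mu\Bigl(\bigcup_{k=1}^{n} B_k\Bigr) = \lim_{n\to\infty} \sum_{k=1}^{n} \mu(B_k) = \sum_{k=1}^{\infty} \mu(B_k) \\
&= \mu\Bigl(\bigcup_{k=1}^{\infty} B_k\Bigr) = \mu\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr).
\end{align*}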

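2. Set Cn = A1 \An . Then we have Cn ⊆ Cn+1 . By 1. we get
\[
\lim_{n\to\infty} \mu(C_n) = \mu\Bigl(\bigcup_{n=1}^{\infty} C_n\Bigr).
\]
Moreover, since µ(An ) ≤ µ(A1 ) < ∞, subtractivity gives

µ(Cn ) = µ(A1 \An ) = µ(A1 ) − µ(An )

as well as
\[
\mu\Bigl(\bigcup_{n=1}^{\infty} C_n\Bigr) = \mu\Bigl(\bigcup_{n=1}^{\infty} (A_1 \setminus A_n)\Bigr) = \mu\Bigl(A_1 \setminus \bigcap_{n=1}^{\infty} A_n\Bigr) = \mu(A_1) - \mu\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr).
\]
A combination of the last three equations gives
\[
\lim_{n\to\infty} \bigl(\mu(A_1) - \mu(A_n)\bigr) = \lim_{n\to\infty} \mu(C_n) = \mu\Bigl(\bigcup_{n=1}^{\infty} C_n\Bigr) = \mu(A_1) - \mu\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr).
\]
Since limn→∞ (µ(A1 ) − µ(An )) = µ(A1 ) − limn→∞ µ(An ) and µ(A1 ) < ∞ we get the claim.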
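The assumption µ(A1 ) < ∞ in the second statement cannot simply be dropped. A standard counterexample (not taken from the lecture, stated here as an illustration) uses the counting measure µ on (N, P(N)) from Examples 5.5 (i):

```latex
\[
A_n := \{n, n+1, n+2, \dots\} \subseteq \mathbb{N}, \qquad A_n \supseteq A_{n+1}, \qquad
\mu(A_n) = \infty \ \text{ for all } n \in \mathbb{N},
\]
\[
\text{but} \qquad \mu\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr) = \mu(\emptyset) = 0 \neq \infty = \lim_{n\to\infty} \mu(A_n).
\]
```

The intersection is empty because every k ∈ N fails to lie in A_{k+1}.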

5.2 Construction Of Measures


In general we do not know all elements of a σ-algebra A from a measurable
space (Ω, A). Since we still would like to assign to each A ∈ A a measure
µ(A) we need to find a way how the measure of known sets induces a measure
on any arbitrary set. We begin with the discrete case.

Definition 5.8. Let I ⊆ N, Ω be some set and A := {ai : i ∈ I} ⊆ Ω be a finite or countably infinite subset of Ω. Moreover, let (pi )i∈I be a sequence of positive real numbers. Then the function µ given by
\[
\mu(B) := \sum_{i \in I} p_i \cdot \mathbf{1}_B(a_i) = \sum_{i \in I :\, a_i \in B} p_i, \qquad B \subseteq \Omega,
\]
defines a measure on the σ-algebra P(Ω). In particular, it holds that µ({ai }) = pi , for all i ∈ I, and µ(Ω\A) = 0. The measure µ is called discrete measure and A is called the support of µ. If it holds that pi = 1, for all i ∈ I then µ is called counting measure on A, since
\[
\mu(B) = |A \cap B| \mathrel{\widehat{=}} \text{number of elements } a \in A \text{ that are also elements of } B.
\]

In particular, discrete measures are completely determined by the pairs


(ai , pi )i∈I , hence they are rather easy to handle.

Example 5.9. If $\sum_{i \in I} p_i = 1$ then µ is a probability measure, more precisely: a discrete probability measure or discrete probability distribution. If the support A = {a1 , . . . , an } is finite and if pi = 1/n then the corresponding probability measure is called uniform distribution on {a1 , . . . , an }. Examples are given by rolling a die, where n = 6, pi = 1/6, i = 1, . . . , 6, or tossing a coin, where n = 2, pi = 1/2, i = 1, 2.
So far, we have introduced the concept of measures on given σ-algebras. However, measures are generally defined on rather small systems, i.e. subsystems of σ-algebras, and extended to larger structures. We have just seen how this works in case of discrete measures, where the knowledge of (ai , pi )i∈I gives rise to a measure on the entire power set. In the following we will present a general approach that will allow us to define (continuous) measures on (R, B(R)).
Definition 5.10. Let A be an algebra on some non-empty set Ω. A function
µ0 : A → R is called a content on A if
1. µ0 (∅) = 0

2. µ0 (A) ≥ 0, for any A ∈ A,

3. µ0 (A ∪ B) = µ0 (A) + µ0 (B) for all disjoint A, B ∈ A.

If µ0 additionally satisfies for sequences (An ) of pairwise disjoint sets in A with $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ that
\[
\mu_0\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu_0(A_n),
\]
then µ0 is called a pre-measure.


Remark 5.11. The only difference to the definition of a measure is that a
pre-measure is defined on an algebra instead of a σ-algebra.
Theorem 5.12 (Extension theorem). Let Ω be a non-empty set and µ0 be
a σ-finite pre-measure on some algebra A0 ⊆ P(Ω). Then there is a unique
σ-finite measure µ on the measurable space (Ω, σ(A0 )) such that

µ(A) = µ0 (A) for all A ∈ A0 .

The theorem is sometimes called Carathéodory extension theorem. The


proof is rather lengthy and complicated but we will give a brief sketch of
the ideas:

Proof. For any A ⊆ Ω set
\[
\mu^*(A) := \inf\Bigl\{ \sum_{n=1}^{\infty} \mu_0(A_n) : A_n \in \mathcal{A}_0 \text{ for all } n \in \mathbb{N},\ A \subseteq \bigcup_{n=1}^{\infty} A_n \Bigr\},
\]
where inf ∅ := 0. The function µ∗ : P(Ω) → [0, ∞] is called the outer measure induced by µ0 . We further set

Aµ∗ := {A ∈ P(Ω) : µ∗ (B) = µ∗ (A∩B)+µ∗ (Ac ∩B) for all B ⊆ Ω with µ∗ (B) < ∞}.

Now show the following properties of µ∗ and Aµ∗ :

1. Aµ∗ is a σ-algebra,

2. A0 ⊆ Aµ∗ , hence σ(A0 ) ⊆ Aµ∗

3. the restriction µ := µ∗ |Aµ∗ of µ∗ to Aµ∗ is a measure on (Ω, Aµ∗ ),

4. µ(A) = µ∗ (A) = µ0 (A) for all A ∈ A0 ,

5. the restriction of µ to σ(A0 ) is the only σ-finite extension of µ0 to σ(A0 ).

Remark 5.13. For A ∈ σ(A0 ) it holds that
\[
\mu(A) = \inf\Bigl\{ \sum_{n=1}^{\infty} \mu_0(A_n) : A_n \in \mathcal{A}_0,\ A \subseteq \bigcup_{n=1}^{\infty} A_n \Bigr\}.
\]

5.3 The Lebesgue Measure


Lemma 5.14. For $A = \bigcup_{k=1}^{m} (a_k, b_k] \in \mathcal{F}_1$ with (a1 , b1 ], . . . , (am , bm ] pairwise disjoint and ak ≤ bk , k = 1, . . . , m, set
\[
\lambda_0(A) := \sum_{k=1}^{m} (b_k - a_k),
\]
where λ0 (A) := +∞ if A contains an infinite interval (i.e. an interval of type (−∞, b], (a, ∞) or (−∞, ∞)). Then λ0 is a content on the algebra F1 .

Proof. We check the properties.

• With ∅ = (a, a] for some a ∈ R one clearly has λ0 (∅) = a − a = 0.

• Let A ∈ F1 . It suffices to check the case where A contains no infinite interval, i.e. $A = \bigcup_{k=1}^{m} (a_k, b_k] \in \mathcal{F}_1$ with (a1 , b1 ], . . . , (am , bm ] pairwise disjoint and ak ≤ bk , k = 1, . . . , m. Then bk − ak ≥ 0 for all k = 1, . . . , m, hence $\lambda_0(A) = \sum_{k=1}^{m} (b_k - a_k) \ge 0$.

• If A, B ∈ F1 are disjoint then it also suffices to consider the cases where A and B contain no infinite interval, i.e. $A = \bigcup_{i=1}^{m} Q_i$ and $B = \bigcup_{i=m+1}^{m+n} Q_i$, where Qi = (ai , bi ], i = 1, . . . , m + n, are pairwise disjoint sets. Then, by definition of λ0 ,
\begin{align*}
\lambda_0(A \cup B) &= \lambda_0\Bigl(\bigcup_{i=1}^{m+n} Q_i\Bigr) = \sum_{i=1}^{m+n} (b_i - a_i) \\
&= \sum_{i=1}^{m} (b_i - a_i) + \sum_{i=m+1}^{m+n} (b_i - a_i) = \lambda_0(A) + \lambda_0(B).
\end{align*}
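The content λ0 is easy to compute for concrete elements of F1 . The sketch below assumes a set is passed as a list of pairs (a, b), each encoding an interval (a, b], with `math.inf` standing in for unbounded endpoints; the name `lambda0` and this encoding are choices made for the illustration only.

```python
import math


def lambda0(intervals):
    """Content of a finite union of pairwise disjoint intervals (a, b].

    An unbounded interval is encoded with a = -math.inf and/or b = math.inf.
    """
    for a, b in intervals:
        if a > b:
            raise ValueError("need a <= b for every interval (a, b]")
    # Empty intervals (a, a] do not contribute and are dropped.
    pieces = sorted((a, b) for a, b in intervals if a < b)
    # After sorting, disjointness means each interval ends before the next starts.
    for (_, b1), (a2, _) in zip(pieces, pieces[1:]):
        if b1 > a2:
            raise ValueError("intervals are not pairwise disjoint")
    if any(math.isinf(a) or math.isinf(b) for a, b in pieces):
        return math.inf
    return sum(b - a for a, b in pieces)


A = [(0, 1), (2, 5)]   # (0, 1] and (2, 5]
B = [(1, 2)]           # (1, 2], disjoint from A
```

With these sets, `lambda0(A) + lambda0(B)` equals `lambda0(A + B)`, which is the additivity shown in the last step of the proof.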

Theorem 5.15. The content λ0 is even a σ-finite pre-measure on the algebra F1 . In particular, the pre-measure λ0 can be uniquely extended to a σ-finite measure λ on B(R).

Proof. By $\mathbb{R} = \bigcup_{n=1}^{\infty} (-n, n]$ and λ0 ((−n, n]) = 2n, for all n ∈ N, we see that λ0 is σ-finite.
It is left to show σ-additivity. Assume that (An ) is a sequence of pairwise disjoint elements in F1 such that $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}_1$. Here we only consider the case in which $(a, b] = \bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} (a_n, b_n]$ with (an , bn ] being pairwise disjoint (the other cases follow similarly). Then we need to show that
\[
\lambda_0((a,b]) = b - a = \sum_{n=1}^{\infty} (b_n - a_n) = \sum_{n=1}^{\infty} \lambda_0((a_n,b_n]).
\]

We first show that the left-hand side is greater than or equal to the right-hand side and then we show the reverse.
Fix m ∈ N and consider the first m intervals (a1 , b1 ], . . . , (am , bm ] of the sequence; they are pairwise disjoint and their union is contained in (a, b]. W.l.o.g. assume that these intervals are non-empty and ordered (otherwise throw the empty ones out and/or rearrange), i.e.
\[
a \le a_1 < b_1 \le a_2 < b_2 \le \dots < b_{m-1} \le a_m < b_m \le b.
\]
Then this implies that
\[
\lambda_0((a,b]) = b - a \ge b_m - a_1 \ge \sum_{i=1}^{m} (b_i - a_i) = \sum_{i=1}^{m} \lambda_0((a_i,b_i]).
\]
Since m was arbitrary, we get $\lambda_0((a,b]) \ge \sum_{i=1}^{\infty} \lambda_0((a_i,b_i])$.
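We show the reverse. The case a = b is trivial, so let a < b, 0 < ε < b − a and εi = ε2^{−i−1}. Then, by a standard result of analysis (geometric series), $\sum_{i=1}^{\infty} \varepsilon_i = \varepsilon/2$. Since every point of (a, b] lies in some (ai , bi ], it holds that
\[
[a + \varepsilon, b] \subseteq \bigcup_{i=1}^{\infty} (a_i - \varepsilon_i,\, b_i + \varepsilon_i) \qquad \text{(’open covering‘)}.
\]
Then, by the Heine-Borel theorem, there is a finite covering, i.e. there is a finite set I ⊆ N such that
\[
[a + \varepsilon, b] \subseteq \bigcup_{j \in I} (a_j - \varepsilon_j,\, b_j + \varepsilon_j).
\]
By monotonicity and finite subadditivity of the content λ0 we therefore get
\begin{align*}
\lambda_0((a,b]) - \varepsilon = \lambda_0((a + \varepsilon, b]) &\le \sum_{j \in I} \lambda_0((a_j - \varepsilon_j,\, b_j + \varepsilon_j]) = \sum_{j \in I} (b_j - a_j + 2\varepsilon_j) \\
&\le \varepsilon + \sum_{j \in I} \lambda_0((a_j,b_j]) \le \varepsilon + \sum_{j=1}^{\infty} \lambda_0((a_j,b_j]).
\end{align*}
Since ε > 0 was arbitrary we get $\lambda_0((a,b]) \le \sum_{i=1}^{\infty} \lambda_0((a_i,b_i])$.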

Definition 5.16. The measure λ constructed in Theorem 5.15 is called Lebesgue measure on B(R).

Lemma 5.17. The Lebesgue measure λ on B(R) satisfies the following:

1. λ({x}) = 0 for all x ∈ R.

2. λ((a, b]) = λ([a, b]) = λ((a, b)) = λ([a, b)) = b − a, for all a ≤ b.

3. If A ⊆ R is countable then λ(A) = 0, hence λ(Q) = 0.

4. λ(A + x) = λ(A) for all A ∈ B(R), x ∈ R.

In particular the Lebesgue measure λ satisfies all the desirable properties


discussed in Section 3 on the measurable space (R, B(R)).

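Proof. 1. Follows from σ-continuity via:
\begin{align*}
\lambda(\{x\}) &= \lambda\Bigl(\bigcap_{n=1}^{\infty} (x - 1/n,\, x]\Bigr) = \lim_{n\to\infty} \lambda((x - 1/n,\, x]) \\
&= \lim_{n\to\infty} \frac{1}{n} = 0.
\end{align*}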

2. Follows from 1 and additivity.

3. Follows since there is a sequence (ai )i∈I of distinct points in R with I ⊆ N such that A = {ai : i ∈ I} and λ({ai }) = 0 (by 1). The σ-additivity of λ gives the claim:
\[
\lambda(A) = \lambda\Bigl(\bigcup_{i \in I} \{a_i\}\Bigr) = \sum_{i \in I} \lambda(\{a_i\}) = 0.
\]

4. Follows from Remark 5.13.
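The second statement is only sketched in the proof. One possible way to spell it out for a < b (the case a = b reduces to the first statement) uses additivity and subtractivity from Lemma 5.6 together with λ({a}) = λ({b}) = 0:

```latex
\begin{align*}
\lambda([a,b]) &= \lambda(\{a\}) + \lambda((a,b]) = 0 + (b - a) = b - a, \\
\lambda((a,b)) &= \lambda\bigl((a,b] \setminus \{b\}\bigr) = \lambda((a,b]) - \lambda(\{b\}) = (b - a) - 0 = b - a, \\
\lambda([a,b)) &= \lambda(\{a\}) + \lambda((a,b)) = 0 + (b - a) = b - a.
\end{align*}
```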

More generally, it is possible to see that for hyperrectangles $Q = \prod_{i=1}^{d} (a_i, b_i]$ the function
\[
\lambda_0^d(Q) = \prod_{i=1}^{d} (b_i - a_i),
\]
extended additively to finite disjoint unions, defines a σ-finite pre-measure on the algebra Fd from Lemma 4.26. Thus $\lambda_0^d$ induces a measure $\lambda^d$ on (Rᵈ, B(Rᵈ)), the d-dimensional Lebesgue measure.

5.4 Probability measures


Definition 5.18. On a probability space of the form (R, B(R), P ) the dis-
tribution function F : R → [0, 1] of the measure P is defined by

F (x) = P ((−∞, x]), x ∈ R.

Theorem 5.19. If P is a probability measure on B(R) then its distribution function F has the following properties:

(i) F is monotonically increasing, i.e. for x, y ∈ R with x ≤ y it holds that F (x) ≤ F (y).

(ii) It holds that limx→−∞ F (x) = 0 and limx→∞ F (x) = 1.

(iii) F is right-continuous, i.e. the limit limy↓x F (y) exists and equals F (x) for all x ∈ R.

Proof. (i) Let x ≤ y. Then (−∞, x] ⊆ (−∞, y] and, by monotonicity of


P,
F (x) = P ((−∞, x]) ≤ P ((−∞, y]) = F (y).

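(ii) Let (xn ) be a monotonically increasing sequence that is unbounded from above. Then it holds that (−∞, xn ] ⊆ (−∞, xn+1 ] and $\bigcup_{n=1}^{\infty} (-\infty, x_n] = (-\infty, \infty)$. But then σ-continuity from below implies
\begin{align*}
\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} P((-\infty, x_n]) &= P\Bigl(\bigcup_{n=1}^{\infty} (-\infty, x_n]\Bigr) \\
&= P((-\infty,\infty)) = 1.
\end{align*}
Now let (xn ) be a monotonically decreasing sequence that is unbounded from below. Then it holds that (−∞, xn ] ⊇ (−∞, xn+1 ] and $\bigcap_{n=1}^{\infty} (-\infty, x_n] = \emptyset$. But then σ-continuity from above implies
\begin{align*}
\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} P((-\infty, x_n]) &= P\Bigl(\bigcap_{n=1}^{\infty} (-\infty, x_n]\Bigr) \\
&= P(\emptyset) = 0.
\end{align*}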

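(iii) For a monotonically decreasing sequence (xn ) with xn → x (as n → ∞) it holds that $\bigcap_{n=1}^{\infty} (-\infty, x_n] = (-\infty, x]$. Again, by σ-continuity,
\begin{align*}
\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} P((-\infty, x_n]) &= P\Bigl(\bigcap_{n=1}^{\infty} (-\infty, x_n]\Bigr) \\
&= P((-\infty, x]) = F(x).
\end{align*}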

Example 5.20. The distribution function of rolling a die is given by $F(x) = \frac{1}{6} \sum_{i=1}^{6} \mathbf{1}_{\{y \in \mathbb{R} :\, i \le y\}}(x)$. More generally, if µ is a discrete measure on (R, B(R)) that is given by (an , pn )n∈N then its distribution function equals
\[
F(x) = \mu((-\infty, x]) = \sum_{k \in \mathbb{N} :\, a_k \le x} p_k.
\]

Distribution functions could be more generally defined for any finite


measure µ, i.e. any measure with µ(Ω) = C < ∞. The corresponding
statements only need a slight adjustment w.r.t. C, e.g. the distribution
function F (x) = µ((−∞, x]) satisfies limx→∞ F (x) = C, etc.
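The distribution function of a discrete measure is a step function that can be evaluated directly from the pairs (ak , pk ). The following sketch assumes the measure is passed as a dictionary mapping atoms to weights; the helper name `discrete_cdf` is hypothetical.

```python
from fractions import Fraction


def discrete_cdf(atoms):
    """Return F with F(x) = sum of p_k over all atoms a_k <= x.

    atoms: dict mapping each atom a_k to its weight p_k.
    """
    def F(x):
        return sum((p for a, p in atoms.items() if a <= x), Fraction(0))
    return F


die = {i: Fraction(1, 6) for i in range(1, 7)}
F = discrete_cdf(die)
```

For the die, F jumps by 1/6 at each of the points 1, . . . , 6. Evaluating F slightly to the left and to the right of such a point illustrates that F is right-continuous but not left-continuous there.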

Part III
Integration theory
6 Measurable maps
6.1 Definition & first Properties
Definition 6.1. Let (Ω, A) and (E, B) be measurable spaces. A map f : Ω → E is called (A, B)-measurable (or short: measurable), if f⁻¹(B) ∈ A for all B ∈ B. If (E, B) = (R, B(R)) then f is called Borel measurable.

Example 6.2. Let α ∈ R\{0} and A ⊆ Ω. Then the function f (y) = α1A (y), y ∈ Ω, is (A, B(R))-measurable if and only if A ∈ A. In particular, every constant function is measurable. To see this we note that
\[
f^{-1}(B) =
\begin{cases}
\emptyset, & 0 \notin B,\ \alpha \notin B, \\
A, & 0 \notin B,\ \alpha \in B, \\
A^c, & 0 \in B,\ \alpha \notin B, \\
\Omega, & 0 \in B,\ \alpha \in B,
\end{cases}
\qquad B \in \mathcal{B}(\mathbb{R}).
\]
Thus f⁻¹(B) ∈ A, ∀B ∈ B(R), if and only if A ∈ A.


The function f is constant if and only if A = Ω or A = ∅, which in both
cases implies A ∈ A.

Lemma 6.3. Let Ω be a non-empty set, (F, B) be a measurable space and


f : Ω → F be a map. Then the system σ(f ) := f −1 (B) is the smallest
σ-algebra A on Ω such that f is a (A, B)-measurable function. We call σ(f )
the σ-algebra that is induced by f .

Proof. In Theorem 4.28 we already showed that σ(f ) is a σ-algebra. By construction of σ(f ) it is clear that f is (σ(f ), B)-measurable. On the other hand, if A is a σ-algebra on Ω such that f is (A, B)-measurable, then it holds that σ(f ) = f⁻¹(B) ⊆ A.

Theorem 6.4. If S is a generator of B, i.e. B = σ(S), then f is (A, B)-


measurable if and only if f −1 (S) ⊆ A, i.e. if and only if f −1 (S) ∈ A for all
S ∈ S.

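Proof. If f is (A, B)-measurable then, by definition, we clearly have f⁻¹(S) ⊆ f⁻¹(B) ⊆ A. Conversely, if f⁻¹(S) ⊆ A then by Theorem 4.28 and Lemma 4.11 we obtain
\[
f^{-1}(\mathcal{B}) = f^{-1}(\sigma(\mathcal{S})) = \sigma(f^{-1}(\mathcal{S})) \subseteq \sigma(\mathcal{A}) = \mathcal{A},
\]
i.e. f is (A, B)-measurable.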

Corollary 6.5. Let (Ω, A) be a measurable space and f : Ω → R be a map.


Then the following statements are equivalent.

(i) f is Borel measurable.

(ii) {x ∈ Ω : f (x) ≤ y} ∈ A for all y ∈ R.

(iii) {x ∈ Ω : f (x) < y} ∈ A for all y ∈ R.

(iv) {x ∈ Ω : f (x) > y} ∈ A for all y ∈ R.

(v) {x ∈ Ω : f (x) ≥ y} ∈ A for all y ∈ R.


Proof. (i) implies (ii) and (iii): The set (−∞, y] is a Borel set, for any y ∈ R. Since f⁻¹((−∞, y]) = {x ∈ Ω : f (x) ≤ y} and since f is Borel measurable, (ii) must hold. In the same way (iii) follows from f⁻¹((−∞, y)) = {x ∈ Ω : f (x) < y}.
(ii) implies (i): For any a, b ∈ R with a < b we have (a, b] = (−∞, b]\(−∞, a] and hence, by Lemma 4.27, f⁻¹((a, b]) = f⁻¹((−∞, b])\f⁻¹((−∞, a]) ∈ A. The previous theorem can now be applied with S = S1 = {(a, b] : a, b ∈ R, a ≤ b}: Since S1 generates B(R) we get f⁻¹(B(R)) ⊆ A, i.e. f is (A, B(R))-measurable and thus Borel measurable.
(iii) implies (i): This follows analogously using that S2 = {(a, b) : a, b ∈ R, a ≤ b} generates B(R).
Since the sets in (iv) and (v) are the complements of the sets in (ii) and (iii), respectively, and since A is closed under complements, (ii) is equivalent to (iv) and (iii) is equivalent to (v).

Corollary 6.6. A function f : Ω → Rᵈ is Borel measurable if f⁻¹(U ) ∈ A for all open sets U ⊆ Rᵈ. In particular, every continuous function f : Rᵐ → Rᵈ is Borel measurable.
Proof. Since the collection U of all open sets in Rᵈ generates B(Rᵈ), it is clear that f⁻¹(U ) ∈ A, for all U ∈ U, implies measurability (by the previous theorem). Moreover, a function is continuous if and only if the pre-images of open sets are again open. Hence, for continuous f , the set f⁻¹(U ) is open in Rᵐ and therefore belongs to B(Rᵐ).

Lemma 6.7. Let f, g : Ω → R be two Borel measurable functions and let


c ∈ R. Then the functions

cf, f², f + g, f g, |f |

are Borel measurable as well.


Proof. See exercises.

Lemma 6.8. Let (fn ) be a sequence of Borel measurable functions fn : Ω → R. Then also the functions $\overline{f}$, $\underline{f}$ given via
\[
\overline{f}(x) := \sup_{n \ge 1} f_n(x), \qquad \underline{f}(x) := \inf_{n \ge 1} f_n(x)
\]
are Borel measurable in case that they are well-defined.
If fn (x) converges for every x ∈ Ω then also the limit function f given by f (x) := limn→∞ fn (x) is Borel measurable.

Proof. Since fn is Borel measurable we have {x ∈ Ω : fn (x) ≤ y} ∈ A for any y ∈ R. This gives
\[
\{x \in \Omega : \sup_{n \in \mathbb{N}} f_n(x) \le y\} = \bigcap_{n=1}^{\infty} \{x \in \Omega : f_n(x) \le y\} \in \mathcal{A}.
\]
But this implies that supn∈N fn is Borel measurable (by the above Corollary).
Analogously we get with
\[
\{x \in \Omega : \inf_{n \in \mathbb{N}} f_n(x) < y\} = \bigcup_{n=1}^{\infty} \{x \in \Omega : f_n(x) < y\} \in \mathcal{A}, \qquad \text{for all } y \in \mathbb{R},
\]
the statement for the infimum.
Note that fn converges if and only if lim supn→∞ fn = lim infn→∞ fn , where the identities
\[
\limsup_{n\to\infty} f_n = \inf_{n \ge 1} \sup_{m \ge n} f_m, \qquad \liminf_{n\to\infty} f_n = \sup_{n \ge 1} \inf_{m \ge n} f_m
\]
hold. But, by the preceding, suprema and infima of sequences of measurable functions are measurable, hence also lim infn fn and lim supn fn are measurable and thus also f = lim supn fn .

6.2 Induced measures


Lemma 6.9. Let (Ω, A, µ) be a measure space, (F, B) be a measurable space and f : Ω → F be an (A, B)-measurable map. Then the map

µf (B) := µ(f −1 (B)) = µ({x ∈ Ω : f (x) ∈ B}), B ∈ B,

defines a measure on B. If µ is a probability measure then so is µf .

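Proof. It holds that µf (∅) = µ(f⁻¹(∅)) = µ(∅) = 0. Since µ is a measure and since f is (A, B)-measurable, we clearly have µf (B) = µ(f⁻¹(B)) ≥ 0 for all B ∈ B.
Now let (Bn ) be a sequence of pairwise disjoint sets in B. Then the pre-images f⁻¹(Bn ) are pairwise disjoint sets in A and, by Lemma 4.27, it holds that
\begin{align*}
\mu_f\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) &= \mu\Bigl(f^{-1}\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr)\Bigr) = \mu\Bigl(\bigcup_{n=1}^{\infty} f^{-1}(B_n)\Bigr) \\
&= \sum_{n=1}^{\infty} \mu(f^{-1}(B_n)) = \sum_{n=1}^{\infty} \mu_f(B_n).
\end{align*}
Moreover, if µ is a probability measure, then µf (F ) = µ(f⁻¹(F )) = µ(Ω) = 1 gives the claim for probability measures.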

Definition 6.10. The measure µf is called an induced measure or the


measure induced by f .
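For a discrete measure the induced measure can be computed by collecting the weights of all points that are mapped to the same value. The sketch below assumes µ is given through its point masses µ({ω}) in a dictionary; the names `pushforward` and `parity` are hypothetical and chosen for the example.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(weights, f):
    """Return the point masses mu_f({y}) = mu(f^{-1}({y})) of the induced measure.

    weights: dict mapping each point omega to mu({omega}).
    """
    image = defaultdict(Fraction)   # missing keys start at Fraction(0)
    for omega, p in weights.items():
        image[f(omega)] += p
    return dict(image)


def parity(x):
    return x % 2


die = {i: Fraction(1, 6) for i in range(1, 7)}
mu_f = pushforward(die, parity)
```

For the uniform distribution on {1, . . . , 6} and the parity map, the induced measure puts mass 1/2 on each of the values 0 and 1, and it is again a probability measure.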

6.3 Simple functions
Lemma 6.11. Let (Ω, A) be a measurable space. If α1 , . . . , αm ∈ R and A1 , . . . , Am ∈ A then the function
\[
f = \sum_{j=1}^{m} \alpha_j \mathbf{1}_{A_j} \tag{6.1}
\]
is Borel measurable.

Definition 6.12. A function of the form (6.1) is called simple.

Definition 6.13. A family (Ai )i∈I of pairwise disjoint sets in A, where I is some non-empty index set, is called a partition of Ω if $\Omega = \bigcup_{i \in I} A_i$.

Example 6.14. Let Ω = [0, ∞). Then the sets An = [n − 1, n), n ∈ N, form a partition of Ω.

Remark 6.15. If f is a simple function of type (6.1) then there is a rep-


resentation in which (Aj )j=1,...,m is a partition of Ω. This representation
is unique if we demand that α1 , . . . , αm are distinct and that A1 , . . . , Am
are non-empty. We call the corresponding representation the canonical
representation of f .

6.4 Approximation of Borel functions


In integration theory it is of advantage to allow for functions that also take the values −∞, ∞. Such functions are sometimes called numerical functions. In this case we consider the extended real line R̄ := R ∪ {−∞, ∞} together with the σ-algebra
\[
\mathcal{B}(\overline{\mathbb{R}}) = \{A \cup B : A \in \mathcal{B}(\mathbb{R}),\ B \subseteq \{-\infty, \infty\}\}.
\]

Definition 6.16. Let (Ω, A) be a measurable space. A numerical function f : Ω → R̄ is called Borel measurable if it is (A, B(R̄))-measurable. By the previous we know that this is the case if and only if {x ∈ Ω : f (x) > y} ∈ A, for all y ∈ R.

Remark 6.17. All the results that we have derived in this section can be
extended to numerical functions.

Theorem 6.18. Let (Ω, A) be a measurable space and f : Ω → [0, ∞] be a non-negative Borel measurable numerical function. Then there is a sequence (fn ) of simple functions such that

• 0 ≤ fn (x) ≤ fn+1 (x) ≤ f (x) for all n ∈ N, x ∈ Ω,

• limn→∞ fn (x) = f (x) for all x ∈ Ω.

Proof. Define a sequence of functions via
\[
f_n(x) :=
\begin{cases}
\frac{k}{2^n}, & \text{if } f(x) \in \bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\bigr),\ k = 0, \dots, n2^n - 1, \\
n, & \text{if } f(x) \ge n.
\end{cases}
\]

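Now, since f is Borel measurable, it is easy to see that the sets
\begin{align*}
A_{n,k} &:= \bigl\{x \in \Omega : f(x) \in \bigl[\tfrac{k}{2^n}, \tfrac{k+1}{2^n}\bigr)\bigr\} = f^{-1}\bigl(\bigl[\tfrac{k}{2^n}, \tfrac{k+1}{2^n}\bigr)\bigr), \qquad k = 0, \dots, n2^n - 1, \\
B_n &:= \{x \in \Omega : f(x) \ge n\} = f^{-1}([n, \infty])
\end{align*}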

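belong to A. Now, by construction we can gather the following facts:

• $f_n = n \mathbf{1}_{B_n} + \sum_{k=0}^{n2^n - 1} \frac{k}{2^n} \mathbf{1}_{A_{n,k}}$, i.e. fn is simple and fn ≥ 0, for all n ∈ N.
• fn ≤ f , for all n ∈ N, since for any x with $f(x) \in \bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\bigr)$ we have $f_n(x) = \frac{k}{2^n} \le f(x)$, for k = 0, . . . , n2ⁿ − 1, and fn (x) = n ≤ f (x), otherwise.
• fn ≤ fn+1 , since for any x with $f(x) \in \bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\bigr)$ we have that
\[
f_n(x) = \frac{k}{2^n} \qquad \text{and} \qquad
f_{n+1}(x) =
\begin{cases}
\frac{k}{2^n}, & \text{if } f(x) \in \bigl[\frac{k}{2^n}, \frac{k + \frac{1}{2}}{2^n}\bigr), \\
\frac{k + \frac{1}{2}}{2^n}, & \text{if } f(x) \in \bigl[\frac{k + \frac{1}{2}}{2^n}, \frac{k+1}{2^n}\bigr).
\end{cases}
\]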

Additionally, we also get for x such that f (x) ≥ n the monotonicity.


It remains to show the pointwise convergence. If f (x) = ∞ then fn (x) = n, for all n ∈ N, and clearly fn (x) → ∞. If f (x) < ∞ then there is some n0 ∈ N such that f (x) < n, for all n ≥ n0 . This means that, for n ≥ n0 , f (x) is in one of the intervals $\bigl[\frac{k}{2^n}, \frac{k+1}{2^n}\bigr)$ and it holds that
\[
0 \le f(x) - f_n(x) \le \frac{k+1}{2^n} - \frac{k}{2^n} = \frac{1}{2^n}.
\]
Taking the limit for n → ∞ yields
\[
0 \le f(x) - \lim_{n\to\infty} f_n(x) \le 0,
\]
which is equivalent to limn→∞ fn (x) = f (x).
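The approximating sequence from the proof can be evaluated pointwise, which makes the monotonicity and the error bound 2⁻ⁿ visible numerically. This is a minimal sketch assuming f is an ordinary Python function with non-negative values (possibly `math.inf`); the names `simple_approximation` and `square` are hypothetical.

```python
import math


def simple_approximation(f, n):
    """Return f_n from the proof: f_n(x) = k / 2^n if f(x) lies in
    [k / 2^n, (k + 1) / 2^n) and f(x) < n, and f_n(x) = n if f(x) >= n."""
    def f_n(x):
        y = f(x)
        if y >= n:
            return float(n)
        # k is the index of the dyadic interval of length 2^-n containing y.
        k = math.floor(y * 2 ** n)
        return k / 2 ** n
    return f_n


def square(x):
    return x * x


f_3 = simple_approximation(square, 3)
f_4 = simple_approximation(square, 4)
```

At any point x the values `f_3(x)`, `f_4(x)`, `square(x)` are increasing in this order, and as soon as `square(x) < n` the gap between `square(x)` and the n-th approximation is at most 2⁻ⁿ.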

7 Basics of Integration theory


In the following we always let (Ω, A, µ) be a measure space. For measurable maps f : Ω → R we will construct integrals of type
\[
\int f(x)\, d\mu(x).
\]
The idea is to measure with µ the pre-image of each possible outcome of
f . As before we introduce this concept very generally such that the cases
µ = λ, i.e. when µ equals the Lebesgue measure, or µ(Ω) = 1, i.e. when µ
is a probability measure, are only particular examples of µ.

7.1 The integral of simple functions


We begin with the construction of the integral for simple functions of type
\[
f = \sum_{j=1}^{m} \alpha_j \mathbf{1}_{A_j}, \tag{7.1}
\]
where α1 , . . . , αm are (not necessarily distinct) real numbers and (Aj )j=1,...,m are sets in A that form a partition of Ω. Since the image of simple functions is finite - we have Im(f ) = {α1 , . . . , αm } - this will lead to a sum over j.
Definition 7.1. Let f be a non-negative, simple function of type (7.1). Then we set
\[
\int_{\Omega} f \, d\mu := \sum_{j=1}^{m} \alpha_j \mu(A_j),
\]
where we use the convention that 0 · ∞ = 0 and a + ∞ = ∞. The object $\int_{\Omega} f \, d\mu$ is called the integral of f over Ω w.r.t. µ.

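Examples 7.2.
Consider the function
\[
f(x) = \tfrac{1}{2} \mathbf{1}_{[-2,2]}(x) + 3 \cdot \mathbf{1}_{(2,3]}(x),
\]
i.e. α1 = 1/2, A1 = [−2, 2], α2 = 3, A2 = (2, 3], α3 = 0, A3 = (−∞, −2) ∪ (3, ∞). Let λ be the Lebesgue measure on (R, B(R)). Then
\begin{align*}
\int f \, d\lambda &= \frac{1}{2} \cdot \lambda([-2,2]) + 3 \cdot \lambda((2,3]) + 0 \cdot \lambda((-\infty,-2) \cup (3,\infty)) \\
&= \frac{1}{2} \cdot 4 + 3 \cdot 1 + 0 \cdot \infty = 5.
\end{align*}
Now consider the integral w.r.t. the Dirac measure δ2 (A) = 1A (2), A ∈ B(R). Then
\begin{align*}
\int f \, d\delta_2 &= \frac{1}{2} \cdot \delta_2([-2,2]) + 3 \cdot \delta_2((2,3]) + 0 \cdot \delta_2((-\infty,-2) \cup (3,\infty)) \\
&= \frac{1}{2} \cdot 1 + 3 \cdot 0 + 0 \cdot 0 = \frac{1}{2}.
\end{align*}
Let (Ω, A, µ) = (R, B(R), λ), where λ denotes the Lebesgue measure. Then
\[
\int_{\mathbb{R}} \mathbf{1}_{(a,b]} \, d\lambda = \lambda((a,b]) = b - a
\]
and the Dirichlet function satisfies
\[
\int_{\mathbb{R}} \mathbf{1}_{\mathbb{Q}} \, d\lambda = \lambda(\mathbb{Q}) = 0.
\]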

Let Ω = {1, . . . , n}, A = P(Ω) and µ be a probability measure on A. Then any function f : Ω → [0, ∞) is simple (since f has at most n different outcomes because there are only n different arguments). Moreover, we have that
\[
\int_{\Omega} f \, d\mu = \sum_{k=1}^{n} f(k) \mu(\{k\}).
\]
Let f = 0 be the zero function. Then it obviously holds for any measure µ that $\int_{\Omega} f \, d\mu = 0$.
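The integral of a simple function is a finite weighted sum, but the convention 0 · ∞ = 0 has to be handled explicitly in floating-point arithmetic, where `0.0 * math.inf` evaluates to `nan`. The sketch below assumes the function is passed as a list of pairs (αj , µ(Aj )) for a partition (Aj ); the name `integrate_simple` is hypothetical.

```python
import math


def integrate_simple(terms):
    """Integral of a non-negative simple function.

    terms: list of pairs (alpha_j, mu(A_j)) for a partition (A_j) of Omega.
    Uses the convention 0 * inf = 0 from Definition 7.1.
    """
    total = 0.0
    for alpha, mass in terms:
        if alpha < 0 or mass < 0:
            raise ValueError("coefficients and measures must be non-negative")
        if alpha == 0 or mass == 0:
            continue  # convention: 0 * inf = 0
        total += alpha * mass
    return total


# f = 1/2 * 1_[-2,2] + 3 * 1_(2,3] with A3 = (-inf, -2) u (3, inf)
lebesgue_terms = [(0.5, 4.0), (3.0, 1.0), (0.0, math.inf)]   # lambda(A_j)
dirac_terms = [(0.5, 1.0), (3.0, 0.0), (0.0, 0.0)]           # delta_2(A_j)
```

With the measures of A1 , A2 , A3 under λ and under δ2 this gives the values 5 and 1/2, respectively.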

Lemma 7.3. Let f be a non-negative, simple function with the representations $f = \sum_{j=1}^{m} \alpha_j \mathbf{1}_{A_j} = \sum_{k=1}^{n} \beta_k \mathbf{1}_{B_k}$, where (Aj )j=1,...,m as well as (Bk )k=1,...,n are partitions of Ω. Then it holds that
\[
\sum_{j=1}^{m} \alpha_j \mu(A_j) = \sum_{k=1}^{n} \beta_k \mu(B_k).
\]
In particular, the introduced integral does not depend on the representation of f , thus we say it is well-defined.

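Proof. First note that $A_j = \bigcup_{k=1}^{n} (A_j \cap B_k)$ is a disjoint
union for any j = 1, . . . , m, since $(B_k)_{k=1,\dots,n}$ is a partition. By
the same argument we get that $B_k = \bigcup_{j=1}^{m} (A_j \cap B_k)$ is a
disjoint union for any k = 1, . . . , n.
If $A_j \cap B_k \neq \emptyset$ then $\alpha_j = f(x) = \beta_k$ for all
$x \in A_j \cap B_k$. This gives along with additivity of µ:
\[
\begin{aligned}
\sum_{j=1}^{m} \alpha_j \mu(A_j)
&= \sum_{j=1}^{m} \alpha_j \mu\Big( \bigcup_{k=1}^{n} (A_j \cap B_k) \Big)
 = \sum_{j=1}^{m} \sum_{k=1}^{n} \alpha_j \mu(A_j \cap B_k) \\
&= \sum_{j=1}^{m} \sum_{k=1}^{n} \beta_k \mu(A_j \cap B_k)
 = \sum_{k=1}^{n} \beta_k \mu\Big( \bigcup_{j=1}^{m} (A_j \cap B_k) \Big) \\
&= \sum_{k=1}^{n} \beta_k \mu(B_k).
\end{aligned}
\]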

Lemma 7.4. For non-negative simple functions f, g and a real number c ≥ 0
the following holds.
(i) \[ \int_\Omega c f \, d\mu = c \int_\Omega f \, d\mu. \]
(ii) \[ \int_\Omega (f + g) \, d\mu = \int_\Omega f \, d\mu + \int_\Omega g \, d\mu. \]
(iii) If f ≤ g, i.e. f (x) ≤ g(x) for all x ∈ Ω, then
\[ \int_\Omega f \, d\mu \le \int_\Omega g \, d\mu. \]

Proof. Ad (i): By definition of the integral it is easy to see that
\[
\int_\Omega c f \, d\mu = \sum_{j=1}^{m} c \alpha_j \mu(A_j) = c \sum_{j=1}^{m} \alpha_j \mu(A_j) = c \int_\Omega f \, d\mu.
\]

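Ad (ii): Write $f = \sum_{j=1}^{m} \alpha_j 1_{A_j}$ and
$g = \sum_{k=1}^{n} \beta_k 1_{B_k}$ with partitions $(A_j)$ and $(B_k)$ of Ω.
Note that the sets $A_j \cap B_k$, j = 1, . . . , m, k = 1, . . . , n, form a
partition of Ω. Therefore
\[
f + g = \sum_{j=1}^{m} \sum_{k=1}^{n} (\alpha_j + \beta_k) 1_{A_j \cap B_k}
\]
is again simple and we clearly have (by an argument similar to the one in
the proof of the previous Lemma)
\[
\begin{aligned}
\int_\Omega (f + g) \, d\mu
&= \sum_{j=1}^{m} \sum_{k=1}^{n} (\alpha_j + \beta_k) \mu(A_j \cap B_k) \\
&= \sum_{j=1}^{m} \sum_{k=1}^{n} \alpha_j \mu(A_j \cap B_k) + \sum_{j=1}^{m} \sum_{k=1}^{n} \beta_k \mu(A_j \cap B_k) \\
&= \sum_{j=1}^{m} \alpha_j \mu\Big( \bigcup_{k=1}^{n} (A_j \cap B_k) \Big) + \sum_{k=1}^{n} \beta_k \mu\Big( \bigcup_{j=1}^{m} (A_j \cap B_k) \Big) \\
&= \sum_{j=1}^{m} \alpha_j \mu(A_j) + \sum_{k=1}^{n} \beta_k \mu(B_k) = \int_\Omega f \, d\mu + \int_\Omega g \, d\mu.
\end{aligned}
\]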

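Ad (iii): Note that in case of $A_j \cap B_k \neq \emptyset$ we have that
\[
\alpha_j = f(x) \le g(x) = \beta_k
\]
for all $x \in A_j \cap B_k$. It further holds that
$\alpha_j \mu(A_j \cap B_k) = 0 = \beta_k \mu(A_j \cap B_k)$ in case that
$A_j \cap B_k = \emptyset$. This gives (using again the additivity of µ)
\[
\begin{aligned}
\int_\Omega f \, d\mu &= \sum_{j=1}^{m} \alpha_j \mu(A_j) = \sum_{j=1}^{m} \sum_{k=1}^{n} \alpha_j \mu(A_j \cap B_k) \\
&\le \sum_{j=1}^{m} \sum_{k=1}^{n} \beta_k \mu(A_j \cap B_k) = \sum_{k=1}^{n} \beta_k \mu(B_k) = \int_\Omega g \, d\mu.
\end{aligned}
\]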
7.2 The integral of non-negative functions
With the help of the integral of simple functions we now introduce the
integral for non-negative functions. Remember that we can approximate
these functions arbitrarily close from below by simple functions, cf. the
Approximation theorem. Therefore the following definition is meaningful.

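Definition 7.5. Denote by $\mathcal{M}^+(\Omega, \mathcal{A})$ the set of all
non-negative Borel-measurable functions f : Ω → R. Then we set for any
$f \in \mathcal{M}^+(\Omega, \mathcal{A})$
\[
\int_\Omega f \, d\mu := \sup \Big\{ \int_\Omega \varphi \, d\mu : \varphi \text{ is simple},\ 0 \le \varphi \le f \Big\} \in [0, \infty]
\]
and call the quantity $\int_\Omega f \, d\mu$ the integral of f over Ω w.r.t. µ.
Equivalent notations are
\[
\int_\Omega f \, d\mu, \qquad \int_\Omega f(x) \, \mu(dx), \qquad \int_\Omega f(x) \, d\mu(x),
\]
all of which mean the same.

For A ∈ A we further set
\[
\int_A f \, d\mu := \int_\Omega 1_A f \, d\mu.
\]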

Examples 7.6.
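(i) If f is a simple function $f = \sum_{j=1}^{m} \alpha_j 1_{A_j}$ of type
(7.1) then still $\int f \, d\mu = \sum_{j=1}^{m} \alpha_j \mu(A_j)$ (that is
why we can use the same notation).
(ii) Let Ω = N, A = P(N) and µ a probability measure on P(N). Then for
any function f : N → [0, ∞) we have that
\[
\int_{\mathbb{N}} f \, d\mu = \sum_{k=1}^{\infty} f(k) \mu(\{k\}).
\]
(iii) Let f be a function in $\mathcal{M}^+(\mathbb{R}, \mathcal{B}(\mathbb{R}))$
and λ be the Lebesgue measure. Then
\[
\int_{\mathbb{Q}} f \, d\lambda = \int_{\mathbb{R}} 1_{\mathbb{Q}} f \, d\lambda = 0,
\]
since any simple function ϕ such that $\varphi \le 1_{\mathbb{Q}} f$ must
satisfy ϕ(x) = 0 for all $x \in \mathbb{R} \setminus \mathbb{Q}$ and
$\varphi(x) = \sum_{j=1}^{m} \alpha_j 1_{A_j}(x)$ for appropriate
$\alpha_1, \dots, \alpha_m$ and $\bigcup_{j=1}^{m} A_j = \mathbb{Q}$. Since
$\lambda(\mathbb{Q}) = 0$ also $\lambda(A_j) = 0$, j = 1, . . . , m, hence
$\int \varphi \, d\lambda = 0$ for all simple $\varphi \le f 1_{\mathbb{Q}}$.
This gives $\int_{\mathbb{Q}} f \, d\lambda = 0$.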

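Lemma 7.7. (i) For $f, g \in \mathcal{M}^+(\Omega, \mathcal{A})$ with f ≤ g it
holds that
\[
\int_\Omega f \, d\mu \le \int_\Omega g \, d\mu.
\]
(ii) For $f \in \mathcal{M}^+(\Omega, \mathcal{A})$ and A, B ∈ A with A ⊆ B we
have that
\[
\int_A f \, d\mu \le \int_B f \, d\mu.
\]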

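Proof. Ad (i): If ϕ is simple such that 0 ≤ ϕ ≤ f then f ≤ g implies
0 ≤ ϕ ≤ g as well. Thus
\[
\begin{aligned}
\int f \, d\mu &= \sup \Big\{ \int \varphi \, d\mu : \varphi \text{ is simple},\ 0 \le \varphi \le f \Big\} \\
&\le \sup \Big\{ \int \varphi \, d\mu : \varphi \text{ is simple},\ 0 \le \varphi \le g \Big\} = \int g \, d\mu.
\end{aligned}
\]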

Ad (ii): Since 1A f ≤ 1B f (f is non-negative) the claim follows with (i).

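Theorem 7.8 (Lebesgue's monotone convergence theorem). Let $(f_n)$ be a
monotonically increasing sequence of functions in
$\mathcal{M}^+(\Omega, \mathcal{A})$ that converges to f , i.e.
$f_n \in \mathcal{M}^+(\Omega, \mathcal{A})$, $f_n \le f_{n+1}$ for all
n ∈ N, and $f_n \uparrow f$. Then it holds that
\[
\int_\Omega f \, d\mu = \int_\Omega \lim_{n\to\infty} f_n \, d\mu = \lim_{n\to\infty} \int_\Omega f_n \, d\mu.
\]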

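Proof. First note that $f_n$ is Borel-measurable and that $f_n \ge 0$ for all
n ∈ N. But then also f ≥ 0 and $f = \lim_{n\to\infty} f_n$ is
Borel-measurable, i.e. $f \in \mathcal{M}^+(\Omega, \mathcal{A})$. From
$f_n \le f_{n+1} \le f$ the previous Lemma implies
\[
\int_\Omega f_n \, d\mu \le \int_\Omega f_{n+1} \, d\mu \le \int_\Omega f \, d\mu,
\]
which gives $\lim_{n\to\infty} \int_\Omega f_n \, d\mu \le \int_\Omega f \, d\mu$.
It remains to show the reverse inequality.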
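Let c ∈ (0, 1) and let ϕ be a simple function with 0 ≤ ϕ ≤ f . Set
\[
B_n := \{ x \in \Omega : f_n(x) \ge c \varphi(x) \}.
\]
By $f_n \uparrow f$ (i.e. $(f_n)$ converges from below to f ) we get
$B_n \subseteq B_{n+1}$. Moreover $\Omega = \bigcup_{n=1}^{\infty} B_n$: if
ϕ(x) = 0 then $x \in B_n$ for every n, and if ϕ(x) > 0 then
cϕ(x) < ϕ(x) ≤ f (x), so that $f_n(x) \ge c\varphi(x)$ for all sufficiently
large n. By definition of $B_n$ and the monotonicity of the integral for
non-negative functions it holds that
\[
c \int_\Omega 1_{B_n} \varphi \, d\mu \le \int_\Omega 1_{B_n} f_n \, d\mu \le \int_\Omega f_n \, d\mu, \tag{7.2}
\]
which can be seen from
\[
c 1_{B_n}(x) \varphi(x)
= \begin{cases} c\varphi(x), & \text{if } x \in B_n \\ 0, & \text{if } x \notin B_n \end{cases}
\;\le\; \begin{cases} f_n(x), & \text{if } x \in B_n \\ 0, & \text{if } x \notin B_n \end{cases}
= 1_{B_n}(x) f_n(x) \le f_n(x).
\]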
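For simple $\varphi = \sum_{j=1}^{m} \alpha_j 1_{A_j}$, σ-continuity of µ and
$A_j = \bigcup_{n=1}^{\infty} (A_j \cap B_n)$ yield
\[
\begin{aligned}
\int_\Omega c \varphi \, d\mu
&= c \sum_{j=1}^{m} \alpha_j \mu(A_j)
 = c \sum_{j=1}^{m} \alpha_j \mu\Big( \bigcup_{n=1}^{\infty} (A_j \cap B_n) \Big) \\
&= c \sum_{j=1}^{m} \alpha_j \lim_{n\to\infty} \mu(A_j \cap B_n)
 = \lim_{n\to\infty} c \sum_{j=1}^{m} \alpha_j \mu(A_j \cap B_n) \\
&= \lim_{n\to\infty} c \int_\Omega 1_{B_n} \varphi \, d\mu,
\end{aligned}
\]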

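which gives in conjunction with (7.2)
\[
c \int_\Omega \varphi \, d\mu \le \lim_{n\to\infty} \int_\Omega f_n \, d\mu.
\]
But since c ∈ (0, 1) was arbitrary, letting c ↑ 1 we also get that
\[
\int_\Omega \varphi \, d\mu \le \lim_{n\to\infty} \int_\Omega f_n \, d\mu.
\]
Moreover, since ϕ was an arbitrary simple function with 0 ≤ ϕ ≤ f :
\[
\int_\Omega f \, d\mu = \sup \Big\{ \int_\Omega \varphi \, d\mu : \varphi \text{ simple},\ 0 \le \varphi \le f \Big\} \le \lim_{n\to\infty} \int_\Omega f_n \, d\mu.
\]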

This means that we showed both directions and the claim follows.

Corollary 7.9. Let $f \in \mathcal{M}^+(\Omega, \mathcal{A})$ and let $(f_n)$
be a monotonically increasing sequence of non-negative, simple functions
that converges to f . Then it holds that
\[
\int_\Omega f \, d\mu = \int_\Omega \lim_{n\to\infty} f_n \, d\mu = \lim_{n\to\infty} \int_\Omega f_n \, d\mu.
\]

Proof. Follows immediately from the previous theorem and the fact that
non-negative, simple functions are in M+ (Ω, A).
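
The corollary can be illustrated numerically. The sketch below assumes a hypothetical discrete measure with three atoms and uses the dyadic approximations $f_n = \min(n, \lfloor 2^n f \rfloor / 2^n)$, one common choice of increasing simple functions; the measure, the function f and the helper names are illustrative assumptions, not taken from the script.

```python
from fractions import Fraction
from math import floor

def dyadic_approx(value, n):
    """n-th dyadic approximation from below: min(n, floor(2^n * value) / 2^n)."""
    return min(Fraction(n), Fraction(floor(value * 2**n), 2**n))

# Hypothetical discrete measure: atoms a with weights mu({a}) = p.
weights = {
    Fraction(1, 3): Fraction(1, 2),
    Fraction(5, 2): Fraction(1, 4),
    Fraction(7): Fraction(1, 4),
}

def f(x):
    return x * x  # a non-negative function, chosen only for illustration

def integral(g):
    """Integral w.r.t. the discrete measure: sum of g(a) * mu({a})."""
    return sum(g(a) * p for a, p in weights.items())

exact = integral(f)
# Integrals of the simple functions f_1 <= f_2 <= ... <= f.
approx = [integral(lambda x, n=n: dyadic_approx(f(x), n)) for n in range(1, 60)]

print(float(exact), float(approx[0]), float(approx[-1]))
```

The integrals of the approximations increase with n and stay below the integral of f, in line with the monotone convergence theorem.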

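Corollary 7.10. For $f, g \in \mathcal{M}^+(\Omega, \mathcal{A})$ and c ≥ 0 it
holds that
\[
\int_\Omega c f \, d\mu = c \int_\Omega f \, d\mu
\]
and
\[
\int_\Omega (f + g) \, d\mu = \int_\Omega f \, d\mu + \int_\Omega g \, d\mu.
\]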

Proof. If c = 0 then the claim follows immediately. Let c > 0 and let $(f_n)$
be a sequence of non-negative, simple functions with $f_n \uparrow f$
(existence is guaranteed by the Approximation theorem). Then also $(cf_n)$
is a sequence of non-negative, simple functions with $cf_n \uparrow cf$. In
particular, the Monotone convergence theorem can be applied twice and the
linearity of the integral for simple functions yields
\[
\begin{aligned}
\int_\Omega c f \, d\mu &= \int_\Omega \lim_{n\to\infty} c f_n \, d\mu = \lim_{n\to\infty} \int_\Omega c f_n \, d\mu = \lim_{n\to\infty} c \int_\Omega f_n \, d\mu \\
&= c \lim_{n\to\infty} \int_\Omega f_n \, d\mu = c \int_\Omega \lim_{n\to\infty} f_n \, d\mu = c \int_\Omega f \, d\mu.
\end{aligned}
\]

In the same way the second statement follows.

7.3 Integrable functions


In order to use the theory that we have developed for non-negative functions
for general functions f : Ω → R we make use of a small trick. This trick
consists of decomposing f into two functions, such that one function takes
all the positive values of f and the other all the negative values. The part
for the negative values can then be represented as a non-negative function
times the number (−1).
Definition 7.11. For a Borel-measurable function f : Ω → R we define the
positive part $f^+$ and the negative part $f^-$ via
\[
f^+(x) := \max(f(x), 0) \quad \text{and} \quad f^-(x) := \max(-f(x), 0).
\]
Note that $f^+$ and $f^-$ are non-negative functions and that it holds
$f = f^+ - f^-$ as well as $|f| = f^+ + f^-$.
Example 7.12. For the function $f(x) = 1_{(-\infty,5)}(x) - 2 \cdot 1_{[5,\infty)}(x)$
we see that
\[
f^+(x) = \begin{cases} 1, & x \in (-\infty, 5) \\ 0, & x \in [5, \infty) \end{cases},
\qquad
f^-(x) = \begin{cases} 0, & x \in (-\infty, 5) \\ 2, & x \in [5, \infty) \end{cases},
\]
which clearly gives $f(x) = f^+(x) - f^-(x)$ for all x ∈ R. Moreover, it is
easy to see that
\[
|f(x)| = 1_{(-\infty,5)}(x) + 2 \cdot 1_{[5,\infty)}(x) = f^+(x) + f^-(x).
\]
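
The decomposition of Example 7.12 can be checked pointwise. This is a minimal sketch; the helper names `positive_part` and `negative_part` are illustrative.

```python
def positive_part(f):
    """Return f^+ = max(f, 0) as a function."""
    return lambda x: max(f(x), 0)

def negative_part(f):
    """Return f^- = max(-f, 0) as a function."""
    return lambda x: max(-f(x), 0)

def f(x):
    # f = 1_{(-inf, 5)} - 2 * 1_{[5, inf)} from Example 7.12
    return 1 if x < 5 else -2

f_plus, f_minus = positive_part(f), negative_part(f)

for x in (-3, 4.9, 5, 10):
    # f = f^+ - f^-  and  |f| = f^+ + f^-  at each sample point
    assert f(x) == f_plus(x) - f_minus(x)
    assert abs(f(x)) == f_plus(x) + f_minus(x)
```

Both parts are non-negative by construction, so the integral theory for non-negative functions applies to each of them separately.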

Definition 7.13. A Borel-measurable function f : Ω → R is called
integrable (w.r.t. µ) if
\[
\int_\Omega f^+ \, d\mu < \infty \quad \text{and} \quad \int_\Omega f^- \, d\mu < \infty.
\]
In this case we set
\[
\int_\Omega f \, d\mu := \int_\Omega f^+ \, d\mu - \int_\Omega f^- \, d\mu
\]
and we call the quantity $\int_\Omega f \, d\mu$ the integral of f over Ω
w.r.t. µ. Equivalent notations are
\[
\int_\Omega f \, d\mu, \qquad \int_\Omega f(x) \, \mu(dx), \qquad \int_\Omega f(x) \, d\mu(x),
\]
all of which mean the same.

For A ∈ A we further set
\[
\int_A f \, d\mu := \int_A f^+ \, d\mu - \int_A f^- \, d\mu.
\]

The set of all integrable functions is denoted by $L^1(\Omega, \mathcal{A}, \mu)$.

Remarks 7.14.
(i) For a Borel-measurable function f : Ω → R the following are equivalent:

(a) $f \in L^1(\Omega, \mathcal{A}, \mu)$
(b) $|f| \in L^1(\Omega, \mathcal{A}, \mu)$.

This can be easily seen as the linearity of the integral for non-negative
functions gives
\[
\int_\Omega |f| \, d\mu = \int_\Omega (f^+ + f^-) \, d\mu = \int_\Omega f^+ \, d\mu + \int_\Omega f^- \, d\mu,
\]
which is finite if and only if f is integrable.

(ii) The definition immediately gives the 'reverse': If $f = f_1 - f_2$ for
two non-negative, integrable functions $f_1$ and $f_2$ then
\[
\int_\Omega f \, d\mu = \int_\Omega f_1 \, d\mu - \int_\Omega f_2 \, d\mu.
\]

Definition 7.15. We say that a statement about the points x ∈ Ω holds
µ-almost everywhere (short: µ-a.e.), if there is a set N ∈ A with µ(N ) = 0
such that this statement holds true for all $x \in N^c$.

Examples 7.16.
(i) Consider again the Dirichlet function f (x) = 1Q (x) and the Lebesgue
measure λ . Then this function equals zero λ-almost everywhere, since
λ(Q) = 0.
(ii) Consider the functions $f(x) = x^2$, g(x) = 1 and the Dirac measure
$\delta_0(A) = 1_A(0)$ for A ∈ A. Then g is greater than f $\delta_0$-almost
everywhere, since g(0) = 1 > 0 = f (0).

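Theorem 7.17. Let $f, g \in L^1(\Omega, \mathcal{A}, \mu)$ and let c ∈ R. Then
the following holds.

1. The functions cf and f + g are also in $L^1(\Omega, \mathcal{A}, \mu)$ and
\[
\int_\Omega c f \, d\mu = c \int_\Omega f \, d\mu, \qquad \int_\Omega (f + g) \, d\mu = \int_\Omega f \, d\mu + \int_\Omega g \, d\mu.
\]

2. If f ≤ g, then it holds that
\[
\int_\Omega f \, d\mu \le \int_\Omega g \, d\mu.
\]

3. The modulus (absolute value) of the integral of f satisfies
\[
\Big| \int_\Omega f \, d\mu \Big| \le \int_\Omega |f| \, d\mu.
\]

4. If A and B are two disjoint sets in A then
\[
\int_{A \cup B} f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu.
\]

5. For A ∈ A with µ(A) = 0 it holds that
\[
\int_A f \, d\mu = 0.
\]

6. The following equivalence holds true:
\[
\int_\Omega |f| \, d\mu = 0 \iff f = 0 \ \mu\text{-almost everywhere}.
\]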

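Proof. Ad 1.: For arbitrary c ∈ R we first show that
$cf \in L^1(\Omega, \mathcal{A}, \mu)$. The relation
$f \in L^1(\Omega, \mathcal{A}, \mu) \Leftrightarrow |f| \in L^1(\Omega, \mathcal{A}, \mu)$
yields
\[
\int_\Omega |cf| \, d\mu = \int_\Omega |c| |f| \, d\mu = |c| \int_\Omega |f| \, d\mu < \infty,
\]
where we have used the linearity of the integral for non-negative functions
(note: $|f| \in \mathcal{M}^+(\Omega, \mathcal{A})$). Now consider the case in
which c = 0. Then it obviously holds that
\[
\int_\Omega 0 \cdot f \, d\mu = 0 = 0 \cdot \int_\Omega f \, d\mu.
\]
If c > 0 then we get by $(cf)^+ = cf^+$, $(cf)^- = cf^-$ and the linearity for
non-negative functions that
\[
\begin{aligned}
\int_\Omega c f \, d\mu &= \int_\Omega (cf)^+ \, d\mu - \int_\Omega (cf)^- \, d\mu = c \int_\Omega f^+ \, d\mu - c \int_\Omega f^- \, d\mu \\
&= c \Big( \int_\Omega f^+ \, d\mu - \int_\Omega f^- \, d\mu \Big) = c \int_\Omega f \, d\mu.
\end{aligned}
\]
The case c < 0 follows analogously, using $(cf)^+ = |c| f^-$ and
$(cf)^- = |c| f^+$.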
Ad 2.: It is clear that f ≤ g implies 0 ≤ g − f , i.e. g − f is a
non-negative function. Since the zero function x ↦ 0, x ∈ Ω, is also
non-negative, the monotonicity of the integral for functions in
$\mathcal{M}^+(\Omega, \mathcal{A})$ along with statement 1. gives
\[
0 = \int_\Omega 0 \, d\mu \le \int_\Omega (g - f) \, d\mu = \int_\Omega g \, d\mu - \int_\Omega f \, d\mu,
\]
which is equivalent to the claim.
Next we show that for $f, g \in L^1(\Omega, \mathcal{A}, \mu)$ also
$(f + g) \in L^1(\Omega, \mathcal{A}, \mu)$, which can be easily seen by the
triangle inequality |f + g| ≤ |f | + |g|:
\[
\int_\Omega |f + g| \, d\mu \le \int_\Omega (|f| + |g|) \, d\mu = \int_\Omega |f| \, d\mu + \int_\Omega |g| \, d\mu < \infty,
\]
using the derived properties of the integral for the non-negative functions
|f |, |g| and |f + g|. Moreover, we clearly have that $f^+ + g^+$ and
$f^- + g^-$ are functions in $\mathcal{M}^+(\Omega, \mathcal{A})$ such that
$f + g = (f^+ + g^+) - (f^- + g^-)$. This gives (cf. Remark 7.14)
\[
\begin{aligned}
\int_\Omega (f + g) \, d\mu &= \int_\Omega (f^+ + g^+) \, d\mu - \int_\Omega (f^- + g^-) \, d\mu \\
&= \int_\Omega f^+ \, d\mu + \int_\Omega g^+ \, d\mu - \int_\Omega f^- \, d\mu - \int_\Omega g^- \, d\mu \\
&= \int_\Omega f \, d\mu + \int_\Omega g \, d\mu,
\end{aligned}
\]
where the second equality is obtained from the linearity of the integral of
functions in $\mathcal{M}^+(\Omega, \mathcal{A})$. Note that in general we do
not have $(f + g)^+ = f^+ + g^+$!
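Ad 3.: By $|f| = f^+ + f^-$ and again by the triangle inequality it holds
that
\[
\begin{aligned}
\Big| \int_\Omega f \, d\mu \Big| &= \Big| \int_\Omega f^+ \, d\mu - \int_\Omega f^- \, d\mu \Big| \le \Big| \int_\Omega f^+ \, d\mu \Big| + \Big| \int_\Omega f^- \, d\mu \Big| \\
&= \int_\Omega f^+ \, d\mu + \int_\Omega f^- \, d\mu = \int_\Omega (f^+ + f^-) \, d\mu = \int_\Omega |f| \, d\mu,
\end{aligned}
\]
where we used once more the linearity of the integral for non-negative
functions.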
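Ad 4.: Since A and B are disjoint we know that
$1_{A \cup B} f = (1_A + 1_B) f = 1_A f + 1_B f$. This along with 1. gives
\[
\int_{A \cup B} f \, d\mu = \int_\Omega 1_{A \cup B} f \, d\mu = \int_\Omega (1_A f + 1_B f) \, d\mu = \int_\Omega 1_A f \, d\mu + \int_\Omega 1_B f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu.
\]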

Ad 5.: This statement follows by the approach of measure theoretic induction
and can be easily verified for the case that
$f = \sum_{j=1}^{m} \alpha_j 1_{A_j}$ is simple:
\[
\int_A f \, d\mu = \int_\Omega 1_A f \, d\mu = \int_\Omega \sum_{j=1}^{m} \alpha_j 1_A 1_{A_j} \, d\mu = \sum_{j=1}^{m} \alpha_j \mu(A \cap A_j) = 0,
\]
since by $(A \cap A_j) \subseteq A$ we have $\mu(A \cap A_j) \le \mu(A) = 0$,
i.e. $\mu(A \cap A_j) = 0$, for j = 1, . . . , m.
Now assume that $f \in \mathcal{M}^+(\Omega, \mathcal{A})$. Then, by the
Approximation theorem, there is a monotonically increasing sequence $(f_n)$
of non-negative, simple functions such that $f_n \uparrow f$. But then also
$(1_A f_n)$ is a monotonically increasing sequence of non-negative simple
functions with $1_A f_n \uparrow 1_A f$. The Monotone convergence theorem
then gives
\[
\begin{aligned}
\int_A f \, d\mu &= \int_\Omega 1_A f \, d\mu = \int_\Omega \lim_{n\to\infty} 1_A f_n \, d\mu = \lim_{n\to\infty} \int_\Omega 1_A f_n \, d\mu \\
&= \lim_{n\to\infty} \int_A f_n \, d\mu = \lim_{n\to\infty} 0 = 0,
\end{aligned}
\]
since we already verified the claim for simple functions.


Finally, let $f \in L^1(\Omega, \mathcal{A}, \mu)$. Then it holds that
$(1_A f)^+ = 1_A f^+$ as well as $(1_A f)^- = 1_A f^-$, which gives
\[
\int_A f \, d\mu = \int_\Omega 1_A f \, d\mu = \int_\Omega 1_A f^+ \, d\mu - \int_\Omega 1_A f^- \, d\mu = \int_A f^+ \, d\mu - \int_A f^- \, d\mu = 0,
\]
since we verified the claim already for non-negative functions. This gives
the proof of statement 5.
Ad 6.: Assume first that f = 0 µ-a.e., i.e. µ(A) = 0 for
\[
\begin{aligned}
A :=& \{x \in \Omega : |f(x)| > 0\} = \{x \in \Omega : f(x) \neq 0\} \\
=& \{x \in \Omega : f(x) < 0\} \cup \{x \in \Omega : f(x) > 0\},
\end{aligned}
\]
which clearly belongs to A since f is Borel-measurable. Since
|f (x)| = 0 for any $x \in A^c$, an application of 4. (with
$\Omega = A \cup A^c$) and 5. gives
\[
\int_\Omega |f| \, d\mu = \int_A |f| \, d\mu + \int_{A^c} |f| \, d\mu = 0 + 0 = 0,
\]
where we used that $1_{A^c}(x) f(x) = 0$.
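Now we show the reverse. Let $\int_\Omega |f| \, d\mu = 0$ and assume that the
set A (defined as before) has non-zero measure, i.e. µ(A) > 0. It is not
hard to see that
\[
A = \{x \in \Omega : |f(x)| > 0\} = \bigcup_{n=1}^{\infty} \{x \in \Omega : |f(x)| > \tfrac{1}{n}\}.
\]
By σ-continuity we get
\[
0 < \mu(A) = \mu\Big( \bigcup_{n=1}^{\infty} \{x \in \Omega : |f(x)| > \tfrac{1}{n}\} \Big) = \lim_{n\to\infty} \mu(\{x \in \Omega : |f(x)| > \tfrac{1}{n}\}).
\]
This means that there is some $n_0 \in \mathbb{N}$ such that
$\mu(A_{n_0}) > 0$ for $A_{n_0} = \{x \in \Omega : |f(x)| > \tfrac{1}{n_0}\}$.
Note that $\tfrac{1}{n_0} 1_{A_{n_0}} \le |f| 1_{A_{n_0}}$. But this would
imply (applying statement 2. twice)
\[
\int_\Omega |f| \, d\mu \ge \int_\Omega |f| 1_{A_{n_0}} \, d\mu \ge \frac{1}{n_0} \int_\Omega 1_{A_{n_0}} \, d\mu = \frac{1}{n_0} \mu(A_{n_0}) > 0,
\]
which contradicts the assumption $\int_\Omega |f| \, d\mu = 0$. Therefore it
cannot hold that µ(A) > 0, i.e. f = 0 µ-a.e.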

Corollary 7.18. If $f, g \in L^1(\Omega, \mathcal{A}, \mu)$ such that f = g
µ-almost everywhere then
\[
\int_\Omega f \, d\mu = \int_\Omega g \, d\mu.
\]

Proof. We use that
\[
\{x \in \Omega : f(x) \neq g(x)\} = \{x \in \Omega : f(x) - g(x) \neq 0\},
\]
i.e. f = g µ-a.e. is equivalent to f − g = 0 µ-a.e., which itself is
equivalent to
\[
\int_\Omega |f - g| \, d\mu = 0,
\]
by statement 6 of the last Theorem. The claim 3. from the same theorem
now gives
\[
0 \le \Big| \int_\Omega (f - g) \, d\mu \Big| \le \int_\Omega |f - g| \, d\mu = 0,
\]
i.e. already $\int_\Omega (f - g) \, d\mu$ must be equal to zero, which is
equivalent to the claim, by the linearity of the integral.

With the help of the above corollary it is easy to compute integrals when
the underlying measure is discrete.

Example 7.19. Let (Ω, P(Ω), µ) be a discrete measure space, where µ is
finite. This means that µ is completely determined by a sequence of pairs
$(a_i, p_i)_{i \in I} \subseteq \Omega \times (0, \infty)$ for some non-empty
index set I ⊆ N with $\sum_{i \in I} p_i < \infty$ and such that
\[
\mu(B) = \sum_{i \in I} 1_B(a_i) p_i, \qquad B \in \mathcal{P}(\Omega).
\]
It is clear that any function f : Ω → R is (P(Ω), B(R))-measurable (since
any pre-image under f lies in the power set P(Ω)). The support
$A = \{a_i : i \in I\}$ satisfies $\mu(A^c) = 0$, i.e. it holds that
$f = 1_A \cdot f$ µ-almost everywhere, which implies
\[
\int_\Omega f \, d\mu = \int_\Omega 1_A f \, d\mu.
\]
We now calculate this integral with the help of approximating functions.
For this, set $A_n := \{a_i : i \in I, i \le n\}$ and define a sequence
$(f_n)$ by
\[
f_n(x) = 1_{A_n}(x) f(x) = \sum_{i \in I : i \le n} 1_{\{a_i\}}(x) f(a_i).
\]
These functions satisfy
\[
\int_\Omega 1_{A_n} f \, d\mu = \sum_{i \in I : i \le n} f(a_i) p_i.
\]
In case that f ≥ 0 (i.e. f non-negative) we have $f_n \uparrow 1_A f$, and
it follows from the Monotone convergence theorem that
\[
\int_\Omega f \, d\mu = \sum_{i \in I} f(a_i) p_i. \tag{7.3}
\]
For arbitrary f one similarly uses the decomposition $f = f^+ - f^-$. We
therefore just showed the following theorem.
Theorem 7.20. Let (Ω, P(Ω), µ) be a measure space with
$\mu(B) = \sum_{i \in I} 1_B(a_i) p_i$, B ∈ P(Ω). Then a map f : Ω → R is
integrable w.r.t. µ if and only if $\sum_{i \in I} |f(a_i)| p_i < \infty$.
The corresponding integral is given by (7.3).
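
The partial sums from Example 7.19 can be computed exactly for a concrete measure. The sketch below assumes a hypothetical measure with atoms $a_i = i$ and weights $p_i = 2^{-i}$, and f(a) = a; for this choice the partial sums have the known closed form $2 - (n+2)/2^n$ and increase towards 2. The helper names are illustrative.

```python
from fractions import Fraction

def geometric_atoms(n):
    """Hypothetical measure: atoms a_i = i with weights p_i = 2^{-i}, i = 1, ..., n."""
    return [(i, Fraction(1, 2**i)) for i in range(1, n + 1)]

def partial_integral(f, atoms, n):
    """Sum of f(a_i) * p_i over the first n atoms, i.e. the integral of 1_{A_n} f."""
    return sum(f(a) * p for a, p in atoms(n))

# Integrals of f_n = 1_{A_n} f for f(a) = a and n = 1, ..., 30.
partials = [partial_integral(lambda a: a, geometric_atoms, n) for n in range(1, 31)]

print(partials[0], partials[1], partials[2])  # 1/2 1 11/8
```

Since f is non-negative here, the sum in (7.3) is the limit of these increasing partial sums.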

For continuous measures the following general statement helps with the
calculation of integrals.

Theorem 7.21 (Substitution). Let (Ω, A, µ) be a measure space, (F, B) be
a measurable space and f be an (A, B)-measurable map with induced measure
$\mu^f$. Then it holds for any Borel-measurable function g : F → R that
\[
\int_F g(y) \, d\mu^f(y) = \int_\Omega g(f(x)) \, d\mu(x). \tag{7.4}
\]
In particular, a Borel-measurable function g : F → R is in
$L^1(F, \mathcal{B}, \mu^f)$ if and only if g ◦ f is in
$L^1(\Omega, \mathcal{A}, \mu)$, in which case (7.4) holds.

Proof. We proceed by the concept of 'measure theoretic induction'.

1. Let $g(y) = 1_B(y)$, B ∈ B, y ∈ F . Then
\[
\begin{aligned}
\int_F g(y) \, d\mu^f(y) &= \mu^f(B) = \mu(f^{-1}(B)) = \int_\Omega 1_{f^{-1}(B)}(x) \, d\mu(x) \\
&= \int_\Omega 1_B(f(x)) \, d\mu(x) = \int_\Omega g(f(x)) \, d\mu(x).
\end{aligned}
\]

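2. Let g be a simple function. Then the identity stated in the theorem
holds true by the previous step and due to the linearity of the integral. To
see this, let $g = \sum_{j=1}^{m} \alpha_j 1_{A_j}$. Then
\[
\begin{aligned}
\int_F g(y) \, d\mu^f(y) &= \int_F \sum_{j=1}^{m} \alpha_j 1_{A_j}(y) \, d\mu^f(y) = \sum_{j=1}^{m} \alpha_j \int_F 1_{A_j}(y) \, d\mu^f(y) \\
&= \sum_{j=1}^{m} \alpha_j \int_\Omega 1_{A_j}(f(x)) \, d\mu(x) = \int_\Omega \sum_{j=1}^{m} \alpha_j 1_{A_j}(f(x)) \, d\mu(x) \\
&= \int_\Omega g(f(x)) \, d\mu(x).
\end{aligned}
\]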

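3. Let $g \in \mathcal{M}^+(F, \mathcal{B})$ and let $(g_n)$ be an increasing
sequence of non-negative, simple functions such that
$\lim_{n\to\infty} g_n(y) = g(y)$ for all y ∈ F . It is easy to see that also
$(g_n \circ f)$ is an increasing sequence of non-negative, simple functions
and that, by the Monotone convergence theorem,
\[
\begin{aligned}
\int_F g(y) \, d\mu^f(y) &= \lim_{n\to\infty} \int_F g_n(y) \, d\mu^f(y) = \lim_{n\to\infty} \int_\Omega g_n(f(x)) \, d\mu(x) \\
&= \int_\Omega g(f(x)) \, d\mu(x).
\end{aligned}
\]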

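4. For $g \in L^1(F, \mathcal{B}, \mu^f)$ the claim now follows by the
decomposition $g = g^+ - g^-$ and the fact that $g^+$ and $g^-$ are in
$\mathcal{M}^+(F, \mathcal{B})$:
\[
\begin{aligned}
\int_\Omega g(f(x)) \, d\mu(x) &= \int_\Omega g^+(f(x)) \, d\mu(x) - \int_\Omega g^-(f(x)) \, d\mu(x) \\
&= \int_F g^+(y) \, d\mu^f(y) - \int_F g^-(y) \, d\mu^f(y) \\
&= \int_F g(y) \, d\mu^f(y).
\end{aligned}
\]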

The above theorem implies that for the calculation of
$\int_\Omega (g \circ f) \, d\mu$ it is only necessary to know the induced
measure $\mu^f$ (and not f or µ). This is of fundamental importance for
Statistics where the underlying probability space (Ω, A, P ) is often
unknown and only derived quantities - the so-called random variables
X : Ω → F - are observed. The corresponding probability distribution $P^X$
is then accessible via the observations of samples w.r.t. $P^X$.
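
On a finite space, identity (7.4) can be verified by direct summation. The following sketch assumes a hypothetical uniform measure on {−2, . . . , 2}, the map f (x) = x² and g(y) = y + 1; the helper names `pushforward` and `integrate` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu, f):
    """Induced measure mu^f on the image of f: mu^f({y}) = mu(f^{-1}({y}))."""
    image = defaultdict(Fraction)
    for x, p in mu.items():
        image[f(x)] += p
    return dict(image)

def integrate(g, measure):
    """Integral of g w.r.t. a measure given by its point masses."""
    return sum(g(z) * p for z, p in measure.items())

# Hypothetical example: uniform measure on {-2, ..., 2}, f(x) = x^2, g(y) = y + 1.
mu = {x: Fraction(1, 5) for x in range(-2, 3)}
f = lambda x: x * x
g = lambda y: y + 1

lhs = integrate(g, pushforward(mu, f))   # integral of g over F w.r.t. mu^f
rhs = integrate(lambda x: g(f(x)), mu)   # integral of g o f over Omega w.r.t. mu
print(lhs, rhs)  # 3 3
```

Note that the left-hand side only needs the three point masses of the induced measure, not the five points of the original space.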
In the remainder we will often use the notation
\[
\int_{\mathbb{R}} f(x) \, dx := \int_{\mathbb{R}} f(x) \, d\lambda(x),
\]

and equivalently for

Definition 7.22. Let µ be a σ-finite measure on (R, B(R)) and let
$f : \mathbb{R} \to \mathbb{R}_+$ be a non-negative function such that
\[
\mu((a, b]) = \int_{(a,b]} f(x) \, dx =: \int_a^b f(x) \, dx, \qquad -\infty \le a < b \le \infty.
\]
Then the measure µ is called absolutely continuous (w.r.t. λ) and f is
called the density function of µ. In case that µ is a probability measure
f is called the probability density function.

Examples 7.23.
(i) The Lebesgue measure λ restricted to an interval [a, b], a < b, has the
density f (x) = 1[a,b] (x), x ∈ R.
(ii) For fixed m ∈ R and $\sigma^2 > 0$ the density of the so-called Gaussian
measure µ is given by
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x - m)^2}{2\sigma^2} \Big), \qquad x \in \mathbb{R}.
\]
The corresponding distribution is a probability distribution, the so-called
normal distribution, denoted by $N(m, \sigma^2)$.

For the distribution function F : R → [0, 1] of an absolutely continuous
probability measure P on (R, B(R)) with density f it holds that
\[
F(x) = P((-\infty, x]) = \int_{-\infty}^{x} f(y) \, dy, \qquad x \in \mathbb{R}.
\]
If f is (piecewise) continuous then $F'(x) = f(x)$ for P -almost all x ∈ R.


Remark 7.24. It is also possible to define for non-negative functions
$f \in L^1(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$ with
$\int_{\mathbb{R}} f \, d\lambda = 1$ a probability measure P by setting
$P(B) = \int_B f \, d\lambda$ for B ∈ B(R).

Theorem 7.25. Let P be an absolutely continuous probability measure on
(R, B(R)) with density function $f : \mathbb{R} \to \mathbb{R}_+$. If
h : R → R is a Borel-measurable function such that either h ≥ 0 or
$\int |h(y)| \, dP(y) < \infty$ then it holds that
\[
\int_{\mathbb{R}} h(y) \, dP(y) = \int_{\mathbb{R}} h(y) f(y) \, dy.
\]
In particular, it holds that $P(B) = \int_B f(y) \, dy$ for all B ∈ B(R).

Proof. Follows with measure theoretic induction, where the statement holds
for the case $h = 1_{(a,b]}$ by the definition of a density.
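
The formula of Theorem 7.25 can be approximated numerically. The sketch below assumes the normal density with m = 1 and σ² = 4 and uses a plain midpoint rule on a truncated interval as a stand-in for the Lebesgue integral; the truncation range, the step count and the helper names are assumptions made for illustration. For h(y) = y² the value should be close to the known second moment m² + σ² = 5.

```python
from math import exp, pi, sqrt

def normal_density(y, m, sigma2):
    """Density of N(m, sigma2) as in Examples 7.23 (ii)."""
    return exp(-(y - m) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def integrate_against_density(h, density, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of h * density over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(
        h(lo + (k + 0.5) * width) * density(lo + (k + 0.5) * width)
        for k in range(steps)
    )

m, sigma2 = 1.0, 4.0
density = lambda y: normal_density(y, m, sigma2)
lo, hi = m - 24.0, m + 24.0  # 12 standard deviations; the mass outside is negligible

total_mass = integrate_against_density(lambda y: 1.0, density, lo, hi)
second_moment = integrate_against_density(lambda y: y * y, density, lo, hi)
print(round(total_mass, 6), round(second_moment, 6))
```

This only approximates the integral; the theorem itself is an exact statement about Lebesgue integrals.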

7.4 Convergence theorems
Definition 7.26. A sequence $(f_n)$ of Borel-measurable functions
$f_n : \Omega \to \mathbb{R}$, n ∈ N, is called convergent µ-almost
everywhere to some Borel-measurable function f on (Ω, A, µ) if there is a
µ-null set N ∈ A (i.e. N satisfies µ(N ) = 0) such that
\[
\lim_{n\to\infty} f_n(x) = f(x) \quad \text{for all } x \in N^c.
\]
We write short: $f_n \to f$ µ-a.e.


Theorem 7.27. Let $f_n, g_n : \Omega \to \mathbb{R}$, n ∈ N, constitute
sequences of Borel-measurable functions. Then the following holds.
1. If $f_n \to f$ µ-a.e. and if $f_n \to f'$ µ-a.e. then $f = f'$ µ-a.e.

2. From $f_n \to f$ µ-a.e. and $f_n = g_n$ µ-a.e., for all n ∈ N, it follows
that $g_n \to f$ µ-a.e.

3. Let $h : \mathbb{R}^2 \to \mathbb{R}$ be a continuous map and let
$f_n \to f$ µ-a.e. as well as $g_n \to g$ µ-a.e. Then it follows that
$h(f_n, g_n) \to h(f, g)$ µ-a.e. In particular,
\[
f_n \cdot g_n \to f \cdot g \ \ \mu\text{-a.e.}, \qquad
\frac{f_n}{g_n} \to \frac{f}{g} \ \text{ on } \{x \in \Omega : g(x) \neq 0\} \ \ \mu\text{-a.e.}, \qquad
\alpha f_n + \beta g_n \to \alpha f + \beta g \ \ \mu\text{-a.e.}
\]

Proof. Ad 1.: Let $N = \{x \in \Omega : \lim_{n\to\infty} f_n(x) \neq f(x)\}$
and $N' = \{x \in \Omega : \lim_{n\to\infty} f_n(x) \neq f'(x)\}$. Then we
have by assumption $\mu(N) = \mu(N') = 0$. Moreover, the uniqueness of the
limit of a sequence gives
\[
f(x) = \lim_{n\to\infty} f_n(x) = f'(x) \quad \text{for all } x \in \Omega \setminus (N \cup N').
\]
Since $\mu(N \cup N') = 0$ the claim follows.


Ad 2.: For n ∈ N let $N_n = \{x \in \Omega : f_n(x) \neq g_n(x)\}$ for which
we clearly have $\mu(N_n) = 0$. Additionally it holds that µ(N ) = 0, where
$N = \{x \in \Omega : \lim_{n\to\infty} f_n(x) \neq f(x)\}$. Then we clearly
have
\[
\lim_{n\to\infty} g_n(x) = f(x) \quad \text{for all } x \in \Omega \setminus \Big( N \cup \bigcup_{n=1}^{\infty} N_n \Big).
\]
By $\mu\big( \bigcup_{n=1}^{\infty} N_n \big) \le \sum_{n=1}^{\infty} \mu(N_n) = 0$
the claim is easily obtained.
Ad 3.: Set $N = \{x \in \Omega : \lim_{n\to\infty} f_n(x) \neq f(x)\}$ as well
as $N' = \{x \in \Omega : \lim_{n\to\infty} g_n(x) \neq g(x)\}$ (with
$\mu(N) = \mu(N') = 0$). Continuity of h implies
\[
\lim_{n\to\infty} h(f_n(x), g_n(x)) = h(f(x), g(x)) \quad \text{for all } x \in \Omega \setminus (N \cup N').
\]
With $\mu(N \cup N') = 0$ the claim follows as before.
Example 7.28. Consider the measure space (R, B(R), λ) and the function
$f_n(x) = x^n 1_{[0,1]}(x)$. Then one has $f_n(x) \to 1_{\{1\}}(x)$ for all
x ∈ R but at the same time also $f_n \to 0$ λ-a.e.

Theorem 7.29 (Monotone Convergence Theorem, Beppo-Levi). Let f be a
Borel-measurable function and let $(f_n)$ be a sequence of non-negative
Borel-measurable functions such that $f_n \le f_{n+1}$ for all n ∈ N, and
$f_n \to f$ µ-a.e. Then it holds that
\[
\lim_{n\to\infty} \int_\Omega f_n(x) \, d\mu(x) = \int_\Omega \lim_{n\to\infty} f_n(x) \, d\mu(x) = \int_\Omega f(x) \, d\mu(x).
\]

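Proof. Set $N = \{x \in \Omega : \lim_{n\to\infty} f_n(x) \neq f(x)\}$ with
µ(N ) = 0. Then $1_{N^c}(x) f_n(x)$ converges monotonically increasing
towards $1_{N^c}(x) f(x)$. Thus Lebesgue's Monotone Convergence Theorem and
properties of the integral give
\[
\begin{aligned}
\lim_{n\to\infty} \int_\Omega f_n(x) \, d\mu(x) &= \lim_{n\to\infty} \int_{N^c} f_n(x) \, d\mu(x) + \lim_{n\to\infty} \int_N f_n(x) \, d\mu(x) \\
&= \lim_{n\to\infty} \int_{N^c} f_n(x) \, d\mu(x) = \int_{N^c} f(x) \, d\mu(x) \\
&= \int_{N^c} f(x) \, d\mu(x) + \int_N f(x) \, d\mu(x) = \int_\Omega f(x) \, d\mu(x).
\end{aligned}
\]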

Lemma 7.30 (Fatou's Lemma). Let $(f_n)$ be a sequence of non-negative
Borel-measurable functions. Then it holds that
\[
\int_\Omega \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_\Omega f_n \, d\mu.
\]

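Proof. Set $g_n(x) := \inf_{k \in \mathbb{N} : k \ge n} f_k(x)$. Then
$g_n \le f_n$, $g_n \le g_{n+1}$ for all n ∈ N, and
$\lim_{n\to\infty} g_n(x) = \liminf_{n\to\infty} f_n(x)$. Moreover, $g_n$ is
non-negative and measurable (since infima of countably many measurable
functions are measurable), i.e. $g_n \in \mathcal{M}^+(\Omega, \mathcal{A})$,
and monotonicity of the integral gives
\[
\int_\Omega g_n \, d\mu \le \int_\Omega f_n \, d\mu \quad \text{for all } n \in \mathbb{N}.
\]
This gives by the Monotone convergence theorem
\[
\begin{aligned}
\int_\Omega \liminf_{n\to\infty} f_n(x) \, d\mu(x) &= \int_\Omega \lim_{n\to\infty} g_n(x) \, d\mu(x) = \lim_{n\to\infty} \int_\Omega g_n(x) \, d\mu(x) \\
&= \liminf_{n\to\infty} \int_\Omega g_n(x) \, d\mu(x) \le \liminf_{n\to\infty} \int_\Omega f_n(x) \, d\mu(x).
\end{aligned}
\]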
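Example 7.31. On (Ω, A, µ) = (R, B(R), λ) the sequence
$f_n = -\tfrac{1}{n} 1_{[n,2n]}$ violates the non-negativity condition and
the statement of Fatou's Lemma does not hold, since $f_n \to 0$ and
$\int_{\mathbb{R}} f_n \, d\lambda = -\tfrac{1}{n} \cdot n = -1$, i.e.
\[
\int_{\mathbb{R}} \liminf_{n\to\infty} f_n \, d\lambda = 0 > -1 = \liminf_{n\to\infty} \int_{\mathbb{R}} f_n \, d\lambda.
\]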

Corollary 7.32 (Reverse Fatou Lemma). Let $(f_n)$ be a sequence of
Borel-measurable functions and suppose that there is a non-negative function
$g \in L^1(\Omega, \mathcal{A}, \mu)$ such that $f_n \le g$ for all n ∈ N.
Then
\[
\limsup_{n\to\infty} \int_\Omega f_n \, d\mu \le \int_\Omega \limsup_{n\to\infty} f_n \, d\mu.
\]

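Proof. Set $h_n := g - f_n$ which clearly is a non-negative function. Then
linearity of the integral,
$\liminf_{n\to\infty} (g - f_n) = g - \limsup_{n\to\infty} f_n$ and Fatou's
Lemma imply that
\[
\begin{aligned}
\int_\Omega g \, d\mu - \int_\Omega \limsup_{n\to\infty} f_n \, d\mu &= \int_\Omega \liminf_{n\to\infty} h_n \, d\mu \\
&\le \liminf_{n\to\infty} \int_\Omega h_n \, d\mu \\
&= \int_\Omega g \, d\mu - \limsup_{n\to\infty} \int_\Omega f_n \, d\mu.
\end{aligned}
\]
Subtracting $\int_\Omega g \, d\mu$ and multiplying by −1 on each side gives
the desired inequality.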

Theorem 7.33 (Dominated convergence theorem). Let f be a Borel-measurable
function and let $(f_n)$ be a sequence of Borel-measurable functions such
that

(i) $f_n \to f$ µ-a.e.,

(ii) there is a function $g \in L^1(\Omega, \mathcal{A}, \mu)$ with
$|f_n| \le g$ µ-a.e., for all n ∈ N.

Then it holds that $f \in L^1(\Omega, \mathcal{A}, \mu)$ and
\[
\lim_{n\to\infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu.
\]

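Proof. First note that $|f_n| \le g$ for all n ∈ N and $f_n \to f$ imply that
also |f | ≤ g. Since f is measurable (as limit of measurable functions) it
is also integrable by
\[
\int_\Omega |f| \, d\mu \le \int_\Omega g \, d\mu < \infty.
\]
Now we clearly have that
\[
|f - f_n| \le |f| + |f_n| \le 2g,
\]
i.e. $|f - f_n|$ is integrable as well, and it even holds that
\[
\limsup_{n\to\infty} |f - f_n| = \lim_{n\to\infty} |f - f_n| = 0.
\]
Linearity and monotonicity of the integral give
\[
\Big| \int_\Omega f \, d\mu - \int_\Omega f_n \, d\mu \Big| = \Big| \int_\Omega (f - f_n) \, d\mu \Big| \le \int_\Omega |f - f_n| \, d\mu.
\]
The reverse Fatou Lemma (with the integrable bound 2g) now gives
\[
\limsup_{n\to\infty} \int_\Omega |f - f_n| \, d\mu \le \int_\Omega \limsup_{n\to\infty} |f - f_n| \, d\mu = 0,
\]
hence
\[
\lim_{n\to\infty} \int_\Omega |f - f_n| \, d\mu = 0.
\]
But, again by monotonicity, we get
\[
\lim_{n\to\infty} \Big| \int_\Omega f \, d\mu - \int_\Omega f_n \, d\mu \Big| = \lim_{n\to\infty} \Big| \int_\Omega (f - f_n) \, d\mu \Big| \le \lim_{n\to\infty} \int_\Omega |f - f_n| \, d\mu = 0,
\]
which gives the claim.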

Example 7.34. On the measure space $([0,1], \mathcal{B}([0,1]), \lambda_{[0,1]})$
consider the sequence of functions $f_n(x) = n 1_{[0,1/n]}(x)$, x ∈ [0, 1],
n ∈ N. Then $f_n \to 0$ $\lambda_{[0,1]}$-a.e. However, it holds that
\[
\lim_{n\to\infty} \int_{[0,1]} f_n \, d\lambda_{[0,1]} = 1, \qquad \int_{[0,1]} \lim_{n\to\infty} f_n \, d\lambda_{[0,1]} = 0,
\]
i.e. the dominated convergence theorem does not apply here. The reason
for this is that there is no integrable map g such that $|f_n| \le g$
$\lambda_{[0,1]}$-a.e. for all n ∈ N.
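
The behaviour in Example 7.34 can be reproduced with exact arithmetic. This is a small sketch: `f` evaluates $f_n$ at a point and `integral_f` uses the integral of the simple function $f_n$, namely n · λ([0, 1/n]); the sample point x = 1/100 is an arbitrary choice.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n * 1_{[0, 1/n]}(x)."""
    return n if 0 <= x <= Fraction(1, n) else 0

def integral_f(n):
    """Integral of the simple function f_n: n * lambda([0, 1/n])."""
    return n * (Fraction(1, n) - 0)

indices = (1, 50, 100, 101, 1000)
x = Fraction(1, 100)

values_at_x = [f(n, x) for n in indices]     # eventually 0 for every fixed x > 0
integrals = [integral_f(n) for n in indices]  # constant, equal to 1

print(values_at_x)                 # [1, 50, 100, 0, 0]
print([int(v) for v in integrals])  # [1, 1, 1, 1, 1]
```

For every fixed x > 0 the values vanish once n > 1/x, while the integrals stay at 1, so limit and integral cannot be interchanged here.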

8 Product measures
In order to model a sequence of either successive or simultaneous random
experiments product σ-algebras and product measures are used.

8.1 Product σ-algebras


In the following we are interested in how to construct, from two σ-algebras
A and B of given measurable spaces (Ω, A) and (F, B), a σ-algebra on the
product set Ω × F . A meaningful way to introduce such a system is
given by the next definition.
Definition 8.1. Let (Ω, A) and (F, B) be measurable spaces. The system

A ⊗ B := σ({A × B : A ∈ A, B ∈ B})

is called the product σ-algebra on Ω × F .

Remark 8.2. Consider the coordinate projections given by

ΠΩ :Ω × F → Ω, ΠΩ (x, y) = x,
ΠF :Ω × F → F, ΠF (x, y) = y.

Then A ⊗ B is the smallest σ-algebra such that ΠΩ is (A ⊗ B, A)-measurable


and such that ΠF is (A ⊗ B, B)-measurable.

Examples 8.3.
(i) Let Ω = F = {0, 1} and consider the σ-algebras A = {∅, {0, 1}} on Ω
and B = P({0, 1}) = {∅, {0}, {1}, {0, 1}} on F , respectively. Then (with
A × ∅ = ∅ = ∅ × A for any set A)
\[
\begin{aligned}
\mathcal{A} \otimes \mathcal{B} &= \sigma(\{\emptyset, \{0,1\} \times \{0\}, \{0,1\} \times \{1\}, \{0,1\}^2\}) \\
&= \sigma(\{\emptyset, \{(0,0),(1,0)\}, \{(0,1),(1,1)\}, \{0,1\}^2\}) \\
&= \{\emptyset, \{(0,0),(1,0)\}, \{(0,1),(1,1)\}, \{0,1\}^2\}.
\end{aligned}
\]

(ii) Not every σ-algebra on Ω × F is necessarily a product of σ-algebras on
Ω and F . To see this, let
$S = \{\emptyset, \{(0,0),(1,1)\}, \{(1,0),(0,1)\}, \Omega \times F\}$. Then
S is a σ-algebra on Ω × F . However, there are only two possible σ-algebras
on {0, 1}, given by A and B. Since
$\mathcal{A} \otimes \mathcal{A} = \{\emptyset, \{0,1\}^2\} \neq S$,
$\mathcal{B} \otimes \mathcal{B} = \mathcal{P}(\{0,1\}^2) \neq S$ and
$\mathcal{A} \otimes \mathcal{B} \neq S$ (cf. above; likewise
$\mathcal{B} \otimes \mathcal{A} \neq S$) we see that S cannot be a
product σ-algebra.

Lemma 8.4. Let (Ω, A) and (F, B) be measurable spaces and let S and T
be generators of A and B, respectively. Then any of the following systems
is a generator of A ⊗ B:

1. {A × F : A ∈ A} ∪ {Ω × B : B ∈ B},

2. {S × F : S ∈ S} ∪ {Ω × T : T ∈ T },

3. {S × T : S ∈ S, T ∈ T }, if Ω ∈ S and F ∈ T .

Remarks 8.5.
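(i) For measurable spaces $(\Omega_i, \mathcal{A}_i)$, i = 1, . . . , n, where
each $\mathcal{A}_i$ has a generator $\mathcal{S}_i$ with
$\Omega_i \in \mathcal{S}_i$ we get by iteration
\[
\begin{aligned}
\mathcal{A}_1 \otimes \dots \otimes \mathcal{A}_n &= \sigma(\{A_1 \times \dots \times A_n : A_i \in \mathcal{A}_i\}) \\
&= \sigma(\{S_1 \times \dots \times S_n : S_i \in \mathcal{S}_i\}).
\end{aligned}
\]
The system $\mathcal{A}_1 \otimes \dots \otimes \mathcal{A}_n$ is called
product σ-algebra on $\prod_{i=1}^{n} \Omega_i$. If we consider the
projections $\Pi_j : \prod_{i=1}^{n} \Omega_i \to \Omega_j$ given by
$\Pi_j((x_1, \dots, x_n)) = x_j$ it holds that
\[
\mathcal{A}_1 \otimes \dots \otimes \mathcal{A}_n = \sigma\Big( \bigcup_{i=1}^{n} \Pi_i^{-1}(\mathcal{A}_i) \Big).
\]
If $\Omega_1 = \Omega_2 = \dots = \Omega_n = \Omega$ for some Ω and if
$\mathcal{A}_1 = \mathcal{A}_2 = \dots = \mathcal{A}_n = \mathcal{A}$ for
some A we write $\mathcal{A}^{\otimes n}$ to denote the product σ-algebra on
$\Omega^n$.
(ii) Similarly, we call for arbitrary non-empty index sets I and a family
$(\Omega_i, \mathcal{A}_i)_{i \in I}$ of measurable spaces the system
\[
\bigotimes_{i \in I} \mathcal{A}_i = \sigma\Big( \bigcup_{i \in I} \Pi_i^{-1}(\mathcal{A}_i) \Big)
\]
product σ-algebra on $\prod_{i \in I} \Omega_i$. It is the smallest σ-algebra
such that all projections $\Pi_j$ are measurable maps.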
(iii) Note that B(R)⊗n = B(Rn ).

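Example 8.6. Consider again $\Omega_1 = \Omega_2 = \{0, 1\}$ with
$\mathcal{A}_1 = \{\emptyset, \{0, 1\}\}$ and
$\mathcal{A}_2 = \mathcal{P}(\{0, 1\})$. Then
$\Pi_1^{-1}(\mathcal{A}_1) = \{\Pi_1^{-1}(A) : A \in \mathcal{A}_1\}$. Since
$\mathcal{A}_1$ only contains the empty set and $\Omega_1$ it suffices to
consider the two pre-images
\[
\begin{aligned}
\Pi_1^{-1}(\emptyset) &= \{(x_1, x_2) \in \{0,1\}^2 : x_1 \in \emptyset\} = \emptyset, \\
\Pi_1^{-1}(\Omega_1) &= \{(x_1, x_2) \in \{0,1\}^2 : x_1 \in \Omega_1\} = \{0,1\}^2,
\end{aligned}
\]
which gives $\Pi_1^{-1}(\mathcal{A}_1) = \{\emptyset, \{0,1\}^2\}$. For the
sets of $\mathcal{A}_2$ we get
\[
\Pi_2^{-1}(\{0\}) = \{(0,0), (1,0)\}, \qquad \Pi_2^{-1}(\{1\}) = \{(0,1), (1,1)\}
\]
and again $\Pi_2^{-1}(\emptyset) = \emptyset$ and
$\Pi_2^{-1}(\{0,1\}) = \{0,1\}^2$. Thus
\[
\Pi_2^{-1}(\mathcal{A}_2) = \{\{(0,0),(1,0)\}, \{(0,1),(1,1)\}, \emptyset, \{0,1\}^2\}.
\]
All in all, we get
\[
\begin{aligned}
\mathcal{A}_1 \otimes \mathcal{A}_2 &= \sigma(\Pi_1^{-1}(\mathcal{A}_1) \cup \Pi_2^{-1}(\mathcal{A}_2)) \\
&= \sigma(\{\emptyset, \{(0,0),(1,0)\}, \{(0,1),(1,1)\}, \{0,1\}^2\}) \\
&= \{\emptyset, \{(0,0),(1,0)\}, \{(0,1),(1,1)\}, \{0,1\}^2\}.
\end{aligned}
\]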
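
On a finite set the generated σ-algebra can be computed by brute force, which allows a check of the result above. The sketch below closes a generator under complements and pairwise unions (sufficient on a finite set); the function names are illustrative and the approach is only practical for very small sets.

```python
from itertools import product

def generated_sigma_algebra(omega, generator):
    """Close a family of subsets of a finite set under complement and union."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in generator}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            for candidate in [omega - a] + [a | b for b in current]:
                if candidate not in sets:
                    sets.add(candidate)
                    changed = True
    return sets

def rectangles(sigma_1, sigma_2):
    """All measurable rectangles A x B with A in sigma_1 and B in sigma_2."""
    return [frozenset(product(a, b)) for a in sigma_1 for b in sigma_2]

A1 = [set(), {0, 1}]
A2 = [set(), {0}, {1}, {0, 1}]
omega = set(product({0, 1}, repeat=2))

product_sigma = generated_sigma_algebra(omega, rectangles(A1, A2))
print(sorted(sorted(s) for s in product_sigma))
```

The same function applied to the generator {{(0, 0), (1, 1)}} returns the σ-algebra S from Examples 8.3 (ii), which differs from the product σ-algebra computed here.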

8.2 Product measures


We now consider pairs of measure spaces (Ω, A, µ) and (F, B, ν). As we have
seen before it is possible to derive from A and B a σ-algebra A ⊗ B on Ω ×F .
As a next step we would like to construct a measure on A⊗B that is induced
by µ and ν. For this we make use of the Extension theorem.

Theorem 8.7. Let (Ω, A, µ) and (F, B, ν) be measure spaces, where µ and
ν are σ-finite. Then there is a unique measure π on A ⊗ B such that

π(A × B) = µ(A) · ν(B)

for all A ∈ A, B ∈ B.
Definition 8.8. The measure π on A ⊗ B is called the product measure
of µ and ν. We write π = µ ⊗ ν.

8.3 Fubini’s theorem


The following results show how to compute a double integral w.r.t. a
product measure µ ⊗ ν by computing iterated integrals w.r.t. µ and ν
successively (or the other way round).
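Theorem 8.9. Let (Ω, A, µ) and (F, B, ν) be measure spaces, where µ and
ν are σ-finite.
1. (Tonelli) For any $f \in \mathcal{M}^+(\Omega \times F, \mathcal{A} \otimes \mathcal{B})$
the functions
\[
g_1 : x \mapsto \int_F f(x, y) \, d\nu(y), \qquad g_2 : y \mapsto \int_\Omega f(x, y) \, d\mu(x)
\]
are in $\mathcal{M}^+(\Omega, \mathcal{A})$ and $\mathcal{M}^+(F, \mathcal{B})$,
respectively, and it holds that
\[
\int_{\Omega \times F} f \, d(\mu \otimes \nu) = \int_\Omega \Big( \int_F f(x, y) \, d\nu(y) \Big) d\mu(x) = \int_F \Big( \int_\Omega f(x, y) \, d\mu(x) \Big) d\nu(y).
\]

2. (Fubini) Let $f \in L^1(\Omega \times F, \mathcal{A} \otimes \mathcal{B}, \mu \otimes \nu)$.
Then it holds that
\[
\begin{aligned}
A &:= \{x \in \Omega : f(x, \cdot) \in L^1(F, \mathcal{B}, \nu)\} \in \mathcal{A}, \\
B &:= \{y \in F : f(\cdot, y) \in L^1(\Omega, \mathcal{A}, \mu)\} \in \mathcal{B},
\end{aligned}
\]
with $\mu(A^c) = \nu(B^c) = 0$ and the functions $g_1$ and $g_2$ are
integrable over A and B, respectively. Moreover,
\[
\int_{\Omega \times F} f \, d(\mu \otimes \nu) = \int_A \Big( \int_F f(x, y) \, d\nu(y) \Big) d\mu(x) = \int_B \Big( \int_\Omega f(x, y) \, d\mu(x) \Big) d\nu(y).
\]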

Corollary 8.10. If f : Ω × F → R is Borel-measurable and if any of the
integrals
\[
\int_{\Omega \times F} |f| \, d(\mu \otimes \nu), \qquad \int_\Omega \Big( \int_F |f(x, y)| \, d\nu(y) \Big) d\mu(x), \qquad \int_F \Big( \int_\Omega |f(x, y)| \, d\mu(x) \Big) d\nu(y)
\]
is finite, then all of the above integrals are finite and they coincide.
Moreover, it holds that
$f \in L^1(\Omega \times F, \mathcal{A} \otimes \mathcal{B}, \mu \otimes \nu)$
and the statement of Fubini's theorem applies.
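
For finite spaces all integrals become finite sums, so the equality of the double integral and both iterated integrals can be checked directly. The sketch below assumes two hypothetical discrete measures µ and ν and an arbitrary non-negative f; it uses that the product measure of a point (x, y) is µ({x}) · ν({y}).

```python
from fractions import Fraction

# Hypothetical measures on Omega = {0, 1} and F = {"a", "b", "c"}.
mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
nu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def f(x, y):
    """A non-negative function on Omega x F, chosen only for illustration."""
    return (x + 1) * {"a": 1, "b": 2, "c": 5}[y]

# Integral w.r.t. the product measure, point mass of (x, y) is mu({x}) * nu({y}).
double = sum(f(x, y) * p * q for x, p in mu.items() for y, q in nu.items())

# Iterated integrals in both orders.
x_outer = sum(p * sum(f(x, y) * q for y, q in nu.items()) for x, p in mu.items())
y_outer = sum(q * sum(f(x, y) * p for x, p in mu.items()) for y, q in nu.items())

print(double, x_outer, y_outer)  # 15/4 15/4 15/4
```

In the finite case the statement reduces to reordering a finite double sum; the theorem extends this to σ-finite measure spaces.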
