# An introduction to probability theory

Christel Geiss and Stefan Geiss
Department of Mathematics and Statistics
June 3, 2009
2
Contents
1 Probability spaces 9
1.1 Deﬁnition of σ-algebras . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Probability measures . . . . . . . . . . . . . . . . . . . . . . . 14
1.3 Examples of distributions . . . . . . . . . . . . . . . . . . . . 25
1.3.1 Binomial distribution with parameter 0 < p < 1 . . . . 25
1.3.2 Poisson distribution with parameter λ > 0 . . . . . . . 25
1.3.3 Geometric distribution with parameter 0 < p < 1 . . . 25
1.3.4 Lebesgue measure and uniform distribution . . . . . . 26
1.3.5 Gaussian distribution on R with mean m ∈ R and
variance σ
2
> 0 . . . . . . . . . . . . . . . . . . . . . . 28
1.3.6 Exponential distribution on R with parameter λ > 0 . 28
1.3.7 Poisson’s Theorem . . . . . . . . . . . . . . . . . . . . 29
1.4 A set which is not a Borel set . . . . . . . . . . . . . . . . . . 31
2 Random variables 35
2.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Measurable maps . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Integration 45
3.1 Deﬁnition of the expected value . . . . . . . . . . . . . . . . . 45
3.2 Basic properties of the expected value . . . . . . . . . . . . . . 49
3.3 Connections to the Riemann-integral . . . . . . . . . . . . . . 56
3.4 Change of variables in the expected value . . . . . . . . . . . . 58
3.5 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6 Some inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.7 Theorem of Radon-Nikodym . . . . . . . . . . . . . . . . . . . 70
3.8 Modes of convergence . . . . . . . . . . . . . . . . . . . . . . . 71
4 Exercises 77
4.1 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3
4 CONTENTS
Introduction
Probability theory can be understood as a mathematical model for the in-
tuitive notion of uncertainty. Without probability theory all the stochastic
models in Physics, Biology, and Economics would either not have been devel-
oped or would not be rigorous. Also, probability is used in many branches of
pure mathematics, even in branches one does not expect this, like in convex
geometry.
The modern period of probability theory is connected with names like S.N.
Bernstein (1880-1968), E. Borel (1871-1956), and A.N. Kolmogorov (1903-
1987). In particular, in 1933 A.N. Kolmogorov published his modern ap-
proach of Probability Theory, including the notion of a measurable space
and a probability space. This lecture will start from this notion, to con-
tinue with random variables and basic parts of integration theory, and to
ﬁnish with some ﬁrst limit theorems. Historical information about mathe-
maticians can be found in the MacTutor History of Mathematics Archive un-
der www-history.mcs.st-andrews.ac.uk/history/index.html and is also
used throughout this script.
Example 1. You stand at a bus-stop knowing that every 30 minutes a bus
comes. But you do not know when the last bus came. What is the intuitively
expected time you have to wait for the next bus? Or you have a light bulb
and would like to know how many hours you can expect the light bulb will
work? How to treat the above two questions by mathematical tools?
die: you have six possible outcomes {1, 2, 3, 4, 5, 6}. If you want to roll a
certain number, let’s say 3, assuming the die is fair (whatever this means) it
is intuitively clear that the chance to dice really 3 is 1:6. How to formalize
this?
Example 3. Now the situation is a bit more complicated: Somebody has
two dice and transmits to you only the sum of the two dice. The set of all
outcomes of the experiment is the set
Ω := {(1, 1), ..., (1, 6), ..., (6, 1), ...., (6, 6)}
5
6 CONTENTS
but what you see is only the result of the map f : Ω →R deﬁned as
f((a, b)) := a + b,
that means you see f(ω) but not ω = (a, b). This is an easy example of a
random variable f, which we introduce later.
Example 4. In Example 3 we know the set of states ω ∈ Ω explicitly. But
this is not always possible nor necessary: Say, that we would like to measure
the temperature outside our home. We can do this by an electronic ther-
mometer which consists of a sensor outside and a display, including some
electronics, inside. The number we get from the system might not be correct
because of several reasons: the electronics is inﬂuenced by the inside temper-
ature and the voltage of the power-supply. Changes of these parameters have
to be compensated by the electronics, but this can, probably, not be done in
a perfect way. Hence, we might not have a systematic error, but some kind
of a random error which appears as a positive or negative deviation from the
exact value. It is impossible to describe all these sources of randomness or
uncertainty explicitly.
We denote the exact temperature by T and the displayed temperature by S,
so that the diﬀerence T −S is inﬂuenced by the above sources of uncertainty.
If we would measure simultaneously, by using thermometers of the same type,
we would get values S
1
, S
2
, ... with corresponding diﬀerences
D
1
:= T −S
1
, D
2
:= T −S
2
, D
3
:= T −S
3
, ...
Intuitively, we get random numbers D
1
, D
2
, ... having a certain distribution.
How to develop an exact mathematical theory out of this?
Firstly, we take an abstract set Ω. Each element ω ∈ Ω will stand for a speciﬁc
conﬁguration of our sources inﬂuencing the measured value. Secondly, we
take a function
f : Ω →R
which gives for all ω ∈ Ω the diﬀerence f(ω) = T − S(ω). From properties
of this function we would like to get useful information of our thermometer
and, in particular, about the correctness of the displayed values.
To put Examples 3 and 4 on a solid ground we go ahead with the following
questions:
Step 1: How to model the randomness of ω, or how likely an ω is? We do
this by introducing the probability spaces in Chapter 1.
Step 2: What mathematical properties of f we need to transport the ran-
domness from ω to f(ω)? This yields to the introduction of the random
variables in Chapter 2.
CONTENTS 7
Step 3: What are properties of f which might be important to know in
practice? For example the mean-value and the variance, denoted by
Ef and E(f −Ef)
2
.
If the ﬁrst expression is zero, then in Example 3 the calibration of the ther-
mometer is right, if the second one is small the displayed values are very
likely close to the real temperature. To deﬁne these quantities one needs the
integration theory developed in Chapter 3.
Step 4: Is it possible to describe the distribution of the values f may take?
Or before, what do we mean by a distribution? Some basic distributions are
discussed in Section 1.3.
Step 5: What is a good method to estimate Ef? We can take a sequence of
independent (take this intuitive for the moment) random variables f
1
, f
2
, ...,
having the same distribution as f, and expect that
1
n
n
¸
i=1
f
i
(ω) and Ef
are close to each other. This lieds us to the Strong Law of Large Numbers
discussed in Section 3.8.
Notation. Given a set Ω and subsets A, B ⊆ Ω, then the following notation
is used:
intersection: A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
union: A ∪ B = {ω ∈ Ω : ω ∈ A or (or both) ω ∈ B}
set-theoretical minus: A\B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
complement: A
c
= {ω ∈ Ω : ω ∈ A}
empty set: ∅ = set, without any element
real numbers: R
natural numbers: N = {1, 2, 3, ...}
rational numbers: Q
indicator-function: 1I
A
(x) =

1 if x ∈ A
0 if x ∈ A
Given real numbers α, β, we use α ∧ β := min {α, β}.
8 CONTENTS
Chapter 1
Probability spaces
In this chapter we introduce the probability space, the fundamental notion
of probability theory. A probability space (Ω, F, P) consists of three compo-
nents.
(1) The elementary events or states ω which are collected in a non-empty
set Ω.
Example 1.0.1 (a) If we roll a die, then all possible outcomes are the
numbers between 1 and 6. That means
Ω = {1, 2, 3, 4, 5, 6}.
(b) If we ﬂip a coin, then we have either ”heads” or ”tails” on top, that
means
Ω = {H, T}.
If we have two coins, then
Ω = {(H, H), (H, T), (T, H), (T, T)}
is the set of all possible outcomes.
(c) For the lifetime of a bulb in hours we can choose
Ω = [0, ∞).
(2) A σ-algebra F, which is the system of observable subsets or events
A ⊆ Ω. The interpretation is that one can usually not decide whether a
system is in the particular state ω ∈ Ω, but one can decide whether ω ∈ A
or ω ∈ A.
(3) A measure P, which gives a probability to all A ∈ F. This probability
is a number P(A) ∈ [0, 1] that describes how likely it is that the event A
occurs.
For the formal mathematical approach we proceed in two steps: in the ﬁrst
step we deﬁne the σ-algebras F, here we do not need any measure. In the
second step we introduce the measures.
9
10 CHAPTER 1. PROBABILITY SPACES
1.1 Deﬁnition of σ-algebras
The σ-algebra is a basic tool in probability theory. It is the set the proba-
bility measures are deﬁned on. Without this notion it would be impossible
to consider the fundamental Lebesgue measure on the interval [0, 1] or to
consider Gaussian measures, without which many parts of mathematics can
not live.
Deﬁnition 1.1.1 [σ-algebra, algebra, measurable space] Let Ω be
a non-empty set. A system F of subsets A ⊆ Ω is called σ-algebra on Ω if
(1) ∅, Ω ∈ F,
(2) A ∈ F implies that A
c
:= Ω\A ∈ F,
(3) A
1
, A
2
, ... ∈ F implies that
¸

i=1
A
i
∈ F.
The pair (Ω, F), where F is a σ-algebra on Ω, is called measurable space.
The elements A ∈ F are called events. An event A occurs if ω ∈ A and it
does not occur if ω ∈ A.
If one replaces (3) by
(3

) A, B ∈ F implies that A ∪ B ∈ F,
then F is called an algebra.
Every σ-algebra is an algebra. Sometimes, the terms σ-ﬁeld and ﬁeld are
used instead of σ-algebra and algebra. We consider some ﬁrst examples.
Example 1.1.2 (a) The largest σ-algebra on Ω: if F = 2

is the system
of all subsets A ⊆ Ω, then F is a σ-algebra.
(b) The smallest σ-algebra: F = {Ω, ∅}.
(c) If A ⊆ Ω, then F = {Ω, ∅, A, A
c
} is a σ-algebra.
Some more concrete examples are the following:
Example 1.1.3 (a) Assume a model for a die, i.e. Ω := {1, ..., 6} and
F := 2

. The event ”the die shows an even number” is described by
A = {2, 4, 6}.
(b) Assume a model with two dice, i.e. Ω := {(a, b) : a, b = 1, ..., 6} and
F := 2

. The event ”the sum of the two dice equals four” is described
by
A = {(1, 3), (2, 2), (3, 1)}.
1.1. DEFINITION OF σ-ALGEBRAS 11
(c) Assume a model for two coins, i.e. Ω := {(H, H), (H, T), (T, H), (T, T)}
and F := 2

. ”Exactly one of two coins shows heads” is modeled via
A = {(H, T), (T, H)}.
(d) Assume that we want to model the lifetime of a bulb, so that Ω :=
[0, ∞). ”The bulb works more than 200 hours” we express by
A = (200, ∞).
But what is the right σ-algebra in this case? It is not 2

which would
be too big.
If Ω = {ω
1
, ..., ω
n
}, then any algebra F on Ω is automatically a σ-algebra.
However, in general this is not the case as shown by the next example:
Example 1.1.4 [algebra, which is not a σ-algebra] Let G be the
system of subsets A ⊆ R such that A can be written as
A = (a
1
, b
1
] ∪ (a
2
, b
2
] ∪ · · · ∪ (a
n
, b
n
]
where −∞ ≤ a
1
≤ b
1
≤ · · · ≤ a
n
≤ b
n
≤ ∞ with the convention that
(a, ∞] = (a, ∞) and (a, a] = ∅. Then G is an algebra, but not a σ-algebra.
Unfortunately, most of the important σ–algebras can not be constructed
explicitly. Surprisingly, one can work practically with them nevertheless. In
the following we describe a simple procedure which generates σ–algebras. We
Proposition 1.1.5 [intersection of σ-algebras is a σ-algebra] Let
Ω be an arbitrary non-empty set and let F
j
, j ∈ J, J = ∅, be a family of
σ-algebras on Ω, where J is an arbitrary index set. Then
F :=
¸
j∈J
F
j
is a σ-algebra as well.
Proof. The proof is very easy, but typical and fundamental. First we notice
that ∅, Ω ∈ F
j
for all j ∈ J, so that ∅, Ω ∈
¸
j∈J
F
j
. Now let A, A
1
, A
2
, ... ∈
¸
j∈J
F
j
. Hence A, A
1
, A
2
, ... ∈ F
j
for all j ∈ J, so that (F
j
are σ–algebras!)
A
c
= Ω\A ∈ F
j
and

¸
i=1
A
i
∈ F
j
for all j ∈ J. Consequently,
A
c

¸
j∈J
F
j
and

¸
i=1
A
i

¸
j∈J
F
j
.

12 CHAPTER 1. PROBABILITY SPACES
Proposition 1.1.6 [smallest σ-algebra containing a set-system]
Let Ω be an arbitrary non-empty set and G be an arbitrary system of subsets
A ⊆ Ω. Then there exists a smallest σ-algebra σ(G) on Ω such that
G ⊆ σ(G).
Proof. We let
J := {C is a σ–algebra on Ω such that G ⊆ C} .
According to Example 1.1.2 one has J = ∅, because
G ⊆ 2

and 2

is a σ–algebra. Hence
σ(G) :=
¸
C∈J
C
yields to a σ-algebra according to Proposition 1.1.5 such that (by construc-
tion) G ⊆ σ(G). It remains to show that σ(G) is the smallest σ-algebra
containing G. Assume another σ-algebra F with G ⊆ F. By deﬁnition of J
we have that F ∈ J so that
σ(G) =
¸
C∈J
C ⊆ F.

The construction is very elegant but has, as already mentioned, the slight
disadvantage that one cannot construct all elements of σ(G) explicitly. Let
us now turn to one of the most important examples, the Borel σ-algebra on
R. To do this we need the notion of open and closed sets.
Deﬁnition 1.1.7 [open and closed sets]
(1) A subset A ⊆ R is called open, if for each x ∈ A there is an ε > 0 such
that (x −ε, x + ε) ⊆ A.
(2) A subset B ⊆ R is called closed, if A := R\B is open.
Given −∞ ≤ a ≤ b ≤ ∞, the interval (a, b) is open and the interval [a, b] is
closed. Moreover, by deﬁnition the empty set ∅ is open and closed.
Proposition 1.1.8 [Generation of the Borel σ-algebra on R] We
let
G
0
be the system of all open subsets of R,
G
1
be the system of all closed subsets of R,
G
2
be the system of all intervals (−∞, b], b ∈ R,
G
3
be the system of all intervals (−∞, b), b ∈ R,
G
4
be the system of all intervals (a, b], −∞< a < b < ∞,
G
5
be the system of all intervals (a, b), −∞< a < b < ∞.
Then σ(G
0
) = σ(G
1
) = σ(G
2
) = σ(G
3
) = σ(G
4
) = σ(G
5
).
1.1. DEFINITION OF σ-ALGEBRAS 13
Deﬁnition 1.1.9 [Borel σ-algebra on R]
1
The σ-algebra constructed
in Proposition 1.1.8 is called Borel σ-algebra and denoted by B(R).
In the same way one can introduce Borel σ-algebras on metric spaces:
Given a metric space M with metric d a set A ⊆ M is open provided that
for all x ∈ A there is a ε > 0 such that {y ∈ M : d(x, y) < ε} ⊆ A. A set
B ⊆ M is closed if the complement M\B is open. The Borel σ-algebra
B(M) is the smallest σ-algebra that contains all open (closed) subsets of M.
Proof of Proposition 1.1.8. We only show that
σ(G
0
) = σ(G
1
) = σ(G
3
) = σ(G
5
).
Because of G
3
⊆ G
0
one has
σ(G
3
) ⊆ σ(G
0
).
Moreover, for −∞< a < b < ∞ one has that
(a, b) =

¸
n=1

(−∞, b)\(−∞, a +
1
n
)

∈ σ(G
3
)
so that G
5
⊆ σ(G
3
) and
σ(G
5
) ⊆ σ(G
3
).
Now let us assume a bounded non-empty open set A ⊆ R. For all x ∈ A
there is a maximal ε
x
> 0 such that
(x −ε
x
, x + ε
x
) ⊆ A.
Hence
A =
¸
x∈A∩Q
(x −ε
x
, x + ε
x
),
which proves G
0
⊆ σ(G
5
) and
σ(G
0
) ⊆ σ(G
5
).
Finally, A ∈ G
0
implies A
c
∈ G
1
⊆ σ(G
1
) and A ∈ σ(G
1
). Hence G
0
⊆ σ(G
1
)
and
σ(G
0
) ⊆ σ(G
1
).
The remaining inclusion σ(G
1
) ⊆ σ(G
0
) can be shown in the same way.
1
F´elix Edouard Justin
´
Emile Borel, 07/01/1871-03/02/1956, French mathematician.
14 CHAPTER 1. PROBABILITY SPACES
1.2 Probability measures
Now we introduce the measures we are going to use:
Deﬁnition 1.2.1 [probability measure, probability space] Let
(Ω, F) be a measurable space.
(1) A map P : F → [0, 1] is called probability measure if P(Ω) = 1 and
for all A
1
, A
2
, ... ∈ F with A
i
∩ A
j
= ∅ for i = j one has
P

¸
i=1
A
i

=

¸
i=1
P(A
i
). (1.1)
The triplet (Ω, F, P) is called probability space.
(2) A map µ : F → [0, ∞] is called measure if µ(∅) = 0 and for all
A
1
, A
2
, ... ∈ F with A
i
∩ A
j
= ∅ for i = j one has
µ

¸
i=1
A
i

=

¸
i=1
µ(A
i
).
The triplet (Ω, F, µ) is called measure space.
(3) A measure space (Ω, F, µ) or a measure µ is called σ-ﬁnite provided
that there are Ω
k
⊆ Ω, k = 1, 2, ..., such that
(a) Ω
k
∈ F for all k = 1, 2, ...,
(b) Ω
i
∩ Ω
j
= ∅ for i = j,
(c) Ω =
¸

k=1

k
,
(d) µ(Ω
k
) < ∞.
The measure space (Ω, F, µ) or the measure µ are called ﬁnite if
µ(Ω) < ∞.
Remark 1.2.2 Of course, any probability measure is a ﬁnite measure: We
need only to check µ(∅) = 0 which follows from µ(∅) =
¸

i=1
µ(∅) (note that
∅ ∩ ∅ = ∅) and µ(Ω) < ∞.
Example 1.2.3 (a) We assume the model of a die, i.e. Ω = {1, ..., 6} and
F = 2

. Assuming that all outcomes for rolling a die are equally likely,
P({ω}) :=
1
6
.
Then, for example,
P({2, 4, 6}) =
1
2
.
1.2. PROBABILITY MEASURES 15
(b) If we assume to have two coins, i.e.
Ω = {(T, T), (H, T), (T, H), (H, H)}
and F = 2

, then the intuitive assumption ’fair’ leads to
P({ω}) :=
1
4
.
That means, for example, the probability that exactly one of two coins
P({(H, T), (T, H)}) =
1
2
.
Example 1.2.4 [Dirac and counting measure]
2
(a) Dirac measure: For F = 2

and a ﬁxed x
0
∈ Ω we let
δ
x
0
(A) :=

1 : x
0
∈ A
0 : x
0
∈ A
.
(b) Counting measure: Let Ω := {ω
1
, ..., ω
N
} and F = 2

. Then
µ(A) := cardinality of A.
Let us now discuss a typical example in which the σ–algebra F is not the set
of all subsets of Ω.
Example 1.2.5 Assume that there are n communication channels between
the points A and B. Each of the channels has a communication rate of ρ > 0
(say ρ bits per second), which yields to the communication rate ρk, in case
k channels are used. Each of the channels fails with probability p, so that
we have a random communication rate R ∈ {0, ρ, ..., nρ}. What is the right
model for this? We use
Ω := {ω = (ε
1
, ..., ε
n
) : ε
i
∈ {0, 1})
with the interpretation: ε
i
= 0 if channel i is failing, ε
i
= 1 if channel i is
working. F consists of all unions of
A
k
:= {ω ∈ Ω : ε
1
+· · · + ε
n
= k} .
Hence A
k
consists of all ω such that the communication rate is ρk. The
system F is the system of observable sets of events since one can only observe
how many channels are failing, but not which channel fails. The measure P
is given by
P(A
k
) :=

n
k

p
n−k
(1 −p)
k
, 0 < p < 1.
Note that P describes the binomial distribution with parameter p on
{0, ..., n} if we identify A
k
with the natural number k.
2
Paul Adrien Maurice Dirac, 08/08/1902 (Bristol, England) - 20/10/1984 (Tallahassee,
Florida, USA), Nobelprice in Physics 1933.
16 CHAPTER 1. PROBABILITY SPACES
We continue with some basic properties of a probability measure.
Proposition 1.2.6 Let (Ω, F, P) be a probability space. Then the following
assertions are true:
(1) If A
1
, ..., A
n
∈ F such that A
i
∩ A
j
= ∅ if i = j, then P(
¸
n
i=1
A
i
) =
¸
n
i=1
P(A
i
).
(2) If A, B ∈ F, then P(A\B) = P(A) −P(A ∩ B).
(3) If B ∈ F, then P(B
c
) = 1 −P(B).
(4) If A
1
, A
2
, ... ∈ F then P(
¸

i=1
A
i
) ≤
¸

i=1
P(A
i
).
(5) Continuity from below: If A
1
, A
2
, ... ∈ F such that A
1
⊆ A
2

A
3
⊆ · · · , then
lim
n→∞
P(A
n
) = P

¸
n=1
A
n

.
(6) Continuity from above: If A
1
, A
2
, ... ∈ F such that A
1
⊇ A
2

A
3
⊇ · · · , then
lim
n→∞
P(A
n
) = P

¸
n=1
A
n

.
Proof. (1) We let A
n+1
= A
n+2
= · · · = ∅, so that
P

n
¸
i=1
A
i

= P

¸
i=1
A
i

=

¸
i=1
P(A
i
) =
n
¸
i=1
P(A
i
) ,
because of P(∅) = 0.
(2) Since (A ∩ B) ∩ (A\B) = ∅, we get that
P(A ∩ B) +P(A\B) = P((A ∩ B) ∪ (A\B)) = P(A).
(3) We apply (2) to A = Ω and observe that Ω\B = B
c
by deﬁnition and
Ω ∩ B = B.
(4) Put B
1
:= A
1
and B
i
:= A
c
1
∩A
c
2
∩· · ·∩A
c
i−1
∩A
i
for i = 2, 3, . . . Obviously,
P(B
i
) ≤ P(A
i
) for all i. Since the B
i
’s are disjoint and
¸

i=1
A
i
=
¸

i=1
B
i
it
follows
P

¸
i=1
A
i

= P

¸
i=1
B
i

=

¸
i=1
P(B
i
) ≤

¸
i=1
P(A
i
).
(5) We deﬁne B
1
:= A
1
, B
2
:= A
2
\A
1
, B
3
:= A
3
\A
2
, B
4
:= A
4
\A
3
, ... and
get that

¸
n=1
B
n
=

¸
n=1
A
n
and B
i
∩ B
j
= ∅
1.2. PROBABILITY MEASURES 17
for i = j. Consequently,
P

¸
n=1
A
n

=P

¸
n=1
B
n

=

¸
n=1
P(B
n
) = lim
N→∞
N
¸
n=1
P(B
n
) = lim
N→∞
P(A
N
)
since
¸
N
n=1
B
n
= A
N
. (6) is an exercise.
The sequence a
n
= (−1)
n
, n = 1, 2, . . . does not converge, i.e. the limit of
(a
n
)

n=1
does not exist. But the limit superior and the limit inferior for a
given sequence of real numbers exists always.
Deﬁnition 1.2.7 [liminf
n
a
n
and limsup
n
a
n
] For a
1
, a
2
, ... ∈ R we let
liminf
n
a
n
:= lim
n
inf
k≥n
a
k
and limsup
n
a
n
:= lim
n
sup
k≥n
a
k
.
Remark 1.2.8 (1) The value liminf
n
a
n
is the inﬁmum of all c such that
there is a subsequence n
1
< n
2
< n
3
< · · · such that lim
k
a
n
k
= c.
(2) The value limsup
n
a
n
is the supremum of all c such that there is a
subsequence n
1
< n
2
< n
3
< · · · such that lim
k
a
n
k
= c.
(3) By deﬁnition one has that
−∞ ≤ liminf
n
a
n
≤ limsup
n
a
n
≤ ∞.
Moreover, if liminf
n
a
n
= limsup
n
a
n
= a ∈ R, then lim
n
a
n
= a.
(4) For example, taking a
n
= (−1)
n
, gives
liminf
n
a
n
= −1 and limsup
n
a
n
= 1.
As we will see below, also for a sequence of sets one can deﬁne a limit superior
and a limit inferior.
Deﬁnition 1.2.9 [liminf
n
A
n
and limsup
n
A
n
] Let (Ω, F) be a measurable
space and A
1
, A
2
, ... ∈ F. Then
liminf
n
A
n
:=

¸
n=1

¸
k=n
A
k
and limsup
n
A
n
:=

¸
n=1

¸
k=n
A
k
.
The deﬁnition above says that ω ∈ liminf
n
A
n
if and only if all events A
n
,
except a ﬁnite number of them, occur, and that ω ∈ limsup
n
A
n
if and only
if inﬁnitely many of the events A
n
occur.
18 CHAPTER 1. PROBABILITY SPACES
Proposition 1.2.10 [Lemma of Fatou]
3
Let (Ω, F, P) be a probability
space and A
1
, A
2
, ... ∈ F. Then
P

liminf
n
A
n

≤ liminf
n
P(A
n
) ≤ limsup
n
P(A
n
) ≤ P

limsup
n
A
n

.
The proposition will be deduced from Proposition 3.2.6.
Remark 1.2.11 If liminf
n
A
n
= limsup
n
A
n
= A, then the Lemma of Fatou
gives that
liminf
n
P(A
n
) = limsup
n
P(A
n
) = lim
n
P(A
n
) = P(A) .
Examples for such systems are obtained by assuming A
1
⊆ A
2
⊆ A
3
⊆ · · ·
or A
1
⊇ A
2
⊇ A
3
⊇ · · · .
Now we turn to the fundamental notion of independence.
Deﬁnition 1.2.12 [independence of events] Let (Ω, F, P) be a prob-
ability space. The events (A
i
)
i∈I
⊆ F, where I is an arbitrary non-empty
index set, are called independent, provided that for all distinct i
1
, ..., i
n
∈ I
one has that
P(A
i
1
∩ A
i
2
∩ · · · ∩ A
i
n
) = P(A
i
1
) P(A
i
2
) · · · P(A
i
n
) .
Given A
1
, ..., A
n
∈ F, one can easily see that only demanding
P(A
1
∩ A
2
∩ · · · ∩ A
n
) = P(A
1
) P(A
2
) · · · P(A
n
)
would not yield to an appropriate notion for the independence of A
1
, ..., A
n
:
for example, taking A and B with
P(A ∩ B) = P(A)P(B)
and C = ∅ gives
P(A ∩ B ∩ C) = P(A)P(B)P(C),
which is surely not, what we had in mind. Independence can be also expressed
through conditional probabilities. Let us ﬁrst deﬁne them:
Deﬁnition 1.2.13 [conditional probability] Let (Ω, F, P) be a proba-
bility space, A ∈ F with P(A) > 0. Then
P(B|A) :=
P(B ∩ A)
P(A)
, for B ∈ F,
is called conditional probability of B given A.
3
Pierre Joseph Louis Fatou, 28/02/1878-10/08/1929, French mathematician (dynami-
cal systems, Mandelbrot-set).
1.2. PROBABILITY MEASURES 19
It is now obvious that A and B are independent if and only if P(B|A) = P(B).
An important place of the conditional probabilities in Statistics is guaranteed
by Bayes’ formula. Before we formulate this formula in Proposition 1.2.15
we consider A, B ∈ F, with 0 < P(B) < 1 and P(A) > 0. Then
A = (A ∩ B) ∪ (A ∩ B
c
),
where (A ∩ B) ∩ (A ∩ B
c
) = ∅, and therefore,
P(A) = P(A ∩ B) +P(A ∩ B
c
)
= P(A|B)P(B) +P(A|B
c
)P(B
c
).
This implies
P(B|A) =
P(B ∩ A)
P(A)
=
P(A|B)P(B)
P(A)
=
P(A|B)P(B)
P(A|B)P(B) +P(A|B
c
)P(B
c
)
.
Example 1.2.14 A laboratory blood test is 95% eﬀective in detecting a
certain disease when it is, in fact, present. However, the test also yields a
”false positive” result for 1% of the healthy persons tested. If 0.5% of the
population actually has the disease, what is the probability a person has the
disease given his test result is positive? We set
B := ”the person has the disease”,
A := ”the test result is positive”.
Hence we have
P(A|B) = P(”a positive test result”|”person has the disease”) = 0.95,
P(A|B
c
) = 0.01,
P(B) = 0.005.
Applying the above formula we get
P(B|A) =
0.95 ×0.005
0.95 ×0.005 + 0.01 ×0.995
≈ 0.323.
That means only 32% of the persons whose test results are positive actually
have the disease.
Proposition 1.2.15 [Bayes’ formula]
4
Assume A, B
j
∈ F, with Ω =
¸
n
j=1
B
j
, where B
i
∩ B
j
= ∅ for i = j and P(A) > 0, P(B
j
) > 0 for j =
1, . . . , n. Then
P(B
j
|A) =
P(A|B
j
)P(B
j
)
¸
n
k=1
P(A|B
k
)P(B
k
)
.
4
Thomas Bayes, 1702-17/04/1761, English mathematician.
20 CHAPTER 1. PROBABILITY SPACES
The proof is an exercise. An event B
j
is also called hypothesis, the prob-
abilities P(B
j
) the prior probabilities (or a priori probabilities), and the
probabilities P(B
j
|A) the posterior probabilities (or a posteriori proba-
bilities) of B
j
.
Now we continue with the fundamental Lemma of Borel-Cantelli.
Proposition 1.2.16 [Lemma of Borel-Cantelli]
5
Let (Ω, F, P) be a
probability space and A
1
, A
2
, ... ∈ F. Then one has the following:
(1) If
¸

n=1
P(A
n
) < ∞, then P(limsup
n→∞
A
n
) = 0.
(2) If A
1
, A
2
, ... are assumed to be independent and
¸

n=1
P(A
n
) = ∞, then
P(limsup
n→∞
A
n
) = 1.
Proof. (1) It holds by deﬁnition limsup
n→∞
A
n
=
¸

n=1
¸

k=n
A
k
. By

¸
k=n+1
A
k

¸
k=n
A
k
and the continuity of P from above (see Proposition 1.2.6) we get that
P

limsup
n→∞
A
n

= P

¸
n=1

¸
k=n
A
k

= lim
n→∞
P

¸
k=n
A
k

≤ lim
n→∞

¸
k=n
P(A
k
) = 0,
where the last inequality follows again from Proposition 1.2.6.
(2) It holds that

limsup
n
A
n

c
= liminf
n
A
c
n
=

¸
n=1

¸
k=n
A
c
n
.
So, we would need to show
P

¸
n=1

¸
k=n
A
c
n

= 0.
Letting B
n
:=
¸

k=n
A
c
k
we get that B
1
⊆ B
2
⊆ B
3
⊆ · · · , so that
P

¸
n=1

¸
k=n
A
c
n

= lim
n→∞
P(B
n
)
5
Francesco Paolo Cantelli, 20/12/1875-21/07/1966, Italian mathematician.
1.2. PROBABILITY MEASURES 21
so that it suﬃces to show
P(B
n
) = P

¸
k=n
A
c
k

= 0.
Since the independence of A
1
, A
2
, ... implies the independence of A
c
1
, A
c
2
, ...,
we ﬁnally get (setting p
n
:= P(A
n
)) that
P

¸
k=n
A
c
k

= lim
N→∞,N≥n
P

N
¸
k=n
A
c
k

= lim
N→∞,N≥n
N
¸
k=n
P(A
c
k
)
= lim
N→∞,N≥n
N
¸
k=n
(1 −p
k
)
≤ lim
N→∞,N≥n
N
¸
k=n
e
−p
k
= lim
N→∞,N≥n
e

P
N
k=n
p
k
= e

P

k=n
p
k
= e
−∞
= 0
where we have used that 1 −x ≤ e
−x
.
Although the deﬁnition of a measure is not diﬃcult, to prove existence and
uniqueness of measures may sometimes be diﬃcult. The problem lies in the
fact that, in general, the σ-algebras are not constructed explicitly, one only
knows their existence. To overcome this diﬃculty, one usually exploits
Proposition 1.2.17 [Carath
´
eodory’s extension theorem]
6
Let Ω be a non-empty set and G an algebra on Ω such that
F := σ(G).
Assume that P
0
: G →[0, ∞) satisﬁes:
(1) P
0
(Ω) < ∞.
(2) If A
1
, A
2
, ... ∈ G, A
i
∩ A
j
= ∅ for i = j, and
¸

i=1
A
i
∈ G, then
P
0

¸
i=1
A
i

=

¸
i=1
P
0
(A
i
).
6
Constantin Carath´eodory, 13/09/1873 (Berlin, Germany) - 02/02/1950 (Munich, Ger-
many).
22 CHAPTER 1. PROBABILITY SPACES
Then there exists a unique ﬁnite measure P on F such that
P(A) = P
0
(A) for all A ∈ G.
Proof. See [3] (Theorem 3.1).
As an application we construct (more or less without rigorous proof) the
product space
(Ω
1
×Ω
2
, F
1
⊗F
2
, P
1
×P
2
)
of two probability spaces (Ω
1
, F
1
, P
1
) and (Ω
2
, F
2
, P
2
). We do this as follows:
(1) Ω
1
×Ω
2
:= {(ω
1
, ω
2
) : ω
1
∈ Ω
1
, ω
2
∈ Ω
2
}.
(2) F
1
⊗F
2
is the smallest σ-algebra on Ω
1
×Ω
2
which contains all sets of
type
A
1
×A
2
:= {(ω
1
, ω
2
) : ω
1
∈ A
1
, ω
2
∈ A
2
} with A
1
∈ F
1
, A
2
∈ F
2
.
(3) As algebra G we take all sets of type
A :=

A
1
1
×A
1
2

∪ · · · ∪ (A
n
1
×A
n
2
)
with A
k
1
∈ F
1
, A
k
2
∈ F
2
, and (A
i
1
×A
i
2
) ∩

A
j
1
×A
j
2

= ∅ for i = j.
Finally, we deﬁne P
0
: G →[0, 1] by
P
0

A
1
1
×A
1
2

∪ · · · ∪ (A
n
1
×A
n
2
)

:=
n
¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
).
Proposition 1.2.18 The system G is an algebra. The map P
0
: G → [0, 1]
is correctly deﬁned and satisﬁes the assumptions of Carath´eodory’s extension
theorem Proposition 1.2.17.
Proof. (i) Assume
A =

A
1
1
×A
1
2

∪ · · · ∪ (A
n
1
×A
n
2
) =

B
1
1
×B
1
2

∪ · · · ∪ (B
m
1
×B
m
2
)
where the (A
k
1
×A
k
2
)
n
k=1
and the (B
l
1
×B
l
2
)
m
l=1
are pair-wise disjoint, respec-
tively. We ﬁnd partitions C
1
1
, ..., C
N
1
1
of Ω
1
and C
1
2
, ..., C
N
2
2
of Ω
2
so that A
k
i
and B
l
i
can be represented as disjoint unions of the sets C
r
i
, r = 1, ..., N
i
.
Hence there is a representation
A =
¸
(r,s)∈I
(C
r
1
×C
s
2
)
for some I ⊆ {1, ..., N
1
} ×{1, ...., N
2
}. By drawing a picture and using that
P
1
and P
2
are measures one observes that
n
¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
) =
¸
(r,s)∈I
P
1
(C
r
1
)P
2
(C
s
2
) =
m
¸
l=1
P
1
(B
l
1
)P
2
(B
l
2
).
1.2. PROBABILITY MEASURES 23
(ii) To check that P
0
is σ-additive on G it is suﬃcient to prove the following:
For A
i
, A
k
i
∈ F
i
with
A
1
×A
2
=

¸
k=1
(A
k
1
×A
k
2
)
and (A
k
1
×A
k
2
) ∩ (A
l
1
×A
l
2
) = ∅ for k = l one has that
P
1
(A
1
)P
2
(A
2
) =

¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
).
Since the inequality
P
1
(A
1
)P
2
(A
2
) ≥
N
¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
)
can be easily seen for all N = 1, 2, ... by an argument like in step (i) we
concentrate ourselves on the converse inequality
P
1
(A
1
)P
2
(A
2
) ≤

¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
).
We let
ϕ(ω
1
) :=

¸
n=1
1I

1
∈A
n
1
}
P
2
(A
n
2
) and ϕ
N

1
) :=
N
¸
n=1
1I

1
∈A
n
1
}
P
2
(A
n
2
)
for N ≥ 1, so that 0 ≤ ϕ
N

1
) ↑
N
ϕ(ω
1
) = 1I
A
1

1
)P
2
(A
2
). Let ε ∈ (0, 1)
and
B
N
ε
:= {ω
1
∈ Ω
1
: (1 −ε)P
2
(A
2
) ≤ ϕ
N

1
)} ∈ F
1
.
The sets B
N
ε
are non-decreasing in N and
¸

N=1
B
N
ε
= A
1
so that
(1 −ε)P
1
(A
1
)P
2
(A
2
) = lim
N
(1 −ε)P
1
(B
N
ε
)P
2
(A
2
).
Because (1 −ε)P
2
(A
2
) ≤ ϕ
N

1
) for all ω
1
∈ B
N
ε
one gets (after some calcu-
lation...) (1 −ε)P
2
(A
2
)P
1
(B
N
ε
) ≤
¸
N
k=1
P
1
(A
k
1
)P
2
(A
k
2
) and therefore
lim
N
(1 −ε)P
1
(B
N
ε
)P
2
(A
2
) ≤ lim
N
N
¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
) =

¸
k=1
P
1
(A
k
1
)P
2
(A
k
2
).
Since ε ∈ (0, 1) was arbitrary, we are done.
Deﬁnition 1.2.19 [product of probability spaces] The extension of
P
0
to F
1
⊗ F
2
according to Proposition 1.2.17 is called product measure
and denoted by P
1
×P
2
. The probability space (Ω
1
×Ω
2
, F
1
⊗F
2
, P
1
×P
2
)
is called product probability space.
24 CHAPTER 1. PROBABILITY SPACES
One can prove that
(F
1
⊗F
2
) ⊗F
3
= F
1
⊗(F
2
⊗F
3
) and (P
1
×P
2
) ×P
3
= P
1
×(P
2
×P
3
),
which we simply denote by
(Ω
1
×Ω
2
×Ω
3
, F
1
⊗F
2
⊗F
3
, P
1
×P
2
×P
3
).
Iterating this procedure, we can deﬁne ﬁnite products
(Ω
1
×Ω
2
×· · · ×Ω
n
, F
1
⊗F
2
⊗· · · ⊗F
n
, P
1
×P
2
×· · · ×P
n
)
by iteration. The case of inﬁnite products requires more work. Here the
interested reader is referred, for example, to [5] where a special case is con-
sidered.
Now we deﬁne the the Borel σ-algebra on R
n
.
Deﬁnition 1.2.20 For n ∈ {1, 2, ...} we let
B(R
n
) := σ ((a
1
, b
1
) ×· · · ×(a
n
, b
n
) : a
1
< b
1
, ..., a
n
< b
n
) .
Letting |x − y| := (
¸
n
k=1
|x
k
−y
k
|
2
)
1
2
for x = (x
1
, ..., x
n
) ∈ R
n
and y =
(y
1
, ..., y
n
) ∈ R
n
, we say that a set A ⊆ R
n
is open provided that for all
x ∈ A there is an ε > 0 such that
{y ∈ R
n
: |x −y| < ε} ⊆ A.
As in the case n = 1 one can show that B(R
n
) is the smallest σ-algebra which
contains all open subsets of R
n
. Regarding the above product spaces there
is
Proposition 1.2.21 B(R
n
) = B(R) ⊗· · · ⊗B(R).
If one is only interested in the uniqueness of measures one can also use the
following approach as a replacement of Carath
´
eodory’s extension theo-
rem:
Deﬁnition 1.2.22 [π-system] A system G of subsets A ⊆ Ω is called π-
system, provided that
A ∩ B ∈ G for all A, B ∈ G.
Any algebra is a π-system but a π-system is not an algebra in general, take
for example the π-system {(a, b) : −∞< a < b < ∞} ∪ {∅}.
Proposition 1.2.23 Let (Ω, F) be a measurable space with F = σ(G), where
G is a π-system. Assume two probability measures P
1
and P
2
on F such that
P
1
(A) = P
2
(A) for all A ∈ G.
Then P
1
(B) = P
2
(B) for all B ∈ F.
1.3. EXAMPLES OF DISTRIBUTIONS 25
1.3 Examples of distributions
1.3.1 Binomial distribution with parameter 0 < p < 1
(1) Ω := {0, 1, ..., n}.
(2) F := 2

(system of all subsets of Ω).
(3) P(B) = µ
n,p
(B) :=
¸
n
k=0

n
k

p
k
(1 − p)
n−k
δ
k
(B), where δ
k
is the Dirac
measure introduced in Example 1.2.4.
Interpretation: Coin-tossing with one coin, such that one has heads with
probability p and tails with probability 1 − p. Then µ
n,p
({k}) equals the
probability, that within n trials one has k-times heads.
1.3.2 Poisson distribution with parameter λ > 0
(1) Ω := {0, 1, 2, 3, ...}.
(2) F := 2

(system of all subsets of Ω).
(3) P(B) = π
λ
(B) :=
¸

k=0
e
−λ λ
k
k!
δ
k
(B).
The Poisson distribution
7
is used, for example, to model stochastic processes
with a continuous time parameter and jumps: the probability that the process
jumps k times between the time-points s and t with 0 ≤ s < t < ∞ is equal
to π
λ(t−s)
({k}).
1.3.3 Geometric distribution with parameter 0 < p < 1
(1) Ω := {0, 1, 2, 3, ...}.
(2) F := 2

(system of all subsets of Ω).
(3) P(B) = µ
p
(B) :=
¸

k=0
(1 −p)
k

k
(B).
Interpretation: The probability that an electric light bulb breaks down
is p ∈ (0, 1). The bulb does not have a ”memory”, that means the break
down is independent of the time the bulb has been already switched on. So,
we get the following model: at day 0 the probability of breaking down is p. If
the bulb survives day 0, it breaks down again with probability p at the ﬁrst
day so that the total probability of a break down at day 1 is (1 −p)p. If we
continue in this way we get that breaking down at day k has the probability
(1 −p)
k
p.
7
Sim´eon Denis Poisson, 21/06/1781 (Pithiviers, France) - 25/04/1840 (Sceaux, France).
26 CHAPTER 1. PROBABILITY SPACES
1.3.4 Lebesgue measure and uniform distribution
Using Carath
´
eodory’s extension theorem, we ﬁrst construct the Lebesgue
measure on the intervals (a, b] with −∞ < a < b < ∞. For this purpose we
let
(1) Ω := (a, b],
(2) F = B((a, b]) := σ((c, d] : a ≤ c < d ≤ b),
(3) As generating algebra G for B((a, b]) we take the system of subsets
A ⊆ (a, b] such that A can be written as
A = (a
1
, b
1
] ∪ (a
2
, b
2
] ∪ · · · ∪ (a
n
, b
n
]
where a ≤ a
1
≤ b
1
≤ · · · ≤ a
n
≤ b
n
≤ b. For such a set A we let
P
0
(A) :=
1
b −a
n
¸
i=1
(b
i
−a
i
).
Proposition 1.3.1 The system G is an algebra. The map P
0
: G → [0, 1]
is correctly deﬁned and satisﬁes the assumptions of Carath´eodory’s extension
theorem Proposition 1.2.17.
Proof. For notational simplicity we let a = 0 and b = 1. After a standard
reduction (check!) we have to show the following: given 0 ≤ a ≤ b ≤ 1 and
pair-wise disjoint intervals (a
n
, b
n
] with
(a, b] =

¸
n=1
(a
n
, b
n
]
we have that b −a =
¸

n=1
(b
n
−a
n
). Let ε ∈ (0, b −a) and observe that
[a + ε, b] ⊆

¸
n=1

a
n
, b
n
+
ε
2
n

.
Hence we have an open covering of a compact set and there is a ﬁnite sub-
cover:
[a + ε, b] ⊆
¸
n∈I(ε)
(a
n
, b
n
] ∪

b
n
, b
n
+
ε
2
n

for some ﬁnite set I(ε). The total length of the intervals

b
n
, b
n
+
ε
2
n

is at
most ε > 0, so that
b −a −ε ≤
¸
n∈I(ε)
(b
n
−a
n
) + ε ≤

¸
n=1
(b
n
−a
n
) + ε.
1.3. EXAMPLES OF DISTRIBUTIONS 27
Letting ε ↓ 0 we arrive at
b −a ≤

¸
n=1
(b
n
−a
n
).
Since b−a ≥
¸
N
n=1
(b
n
−a
n
) for all N ≥ 1, the opposite inequality is obvious.

Deﬁnition 1.3.2 [Uniform distribution] The unique extension P of P
0
to B((a, b]) according to Proposition 1.2.17 is called uniform distribution
on (a, b].
Hence P is the unique measure on B((a, b]) such that P((c, d]) =
d−c
b−a
for
a ≤ c < d ≤ b. To get the Lebesgue measure on R we observe that B ∈ B(R)
implies that B ∩ (a, b] ∈ B((a, b]) (check!). Then we can proceed as follows:
Deﬁnition 1.3.3 [Lebesgue measure]
8
Given B ∈ B(R) we deﬁne the
Lebesgue measure on R by
λ(B) :=

¸
n=−∞
P
n
(B ∩ (n −1, n])
where P
n
is the uniform distribution on (n −1, n].
Accordingly, λ is the unique σ-ﬁnite measure on B(R) such that λ((c, d]) =
d −c for all −∞< c < d < ∞. We can also write λ(B) =

B
dλ(x). Now we
can go backwards: In order to obtain the Lebesgue measure on a set I ⊆ R
with I ∈ B(R) we let B
I
:= {B ⊆ I : B ∈ B(R)} and
λ
I
(B) := λ(B) for B ∈ B
I
.
Given that λ(I) > 0, then λ
I
/λ(I) is the uniform distribution on I. Impor-
tant cases for I are the closed intervals [a, b]. Furthermore, for −∞ < a <
b < ∞ we have that B
(a,b]
= B((a, b]) (check!).
8
Henri L´eon Lebesgue, 28/06/1875-26/07/1941, French mathematician (generalized
the Riemann integral by the Lebesgue integral; continuation of work of Emile Borel and
Camille Jordan).
28 CHAPTER 1. PROBABILITY SPACES
1.3.5 Gaussian distribution on R with mean m ∈ R and
variance σ
2
> 0
(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) We take the algebra G considered in Example 1.1.4 and deﬁne
P
0
(A) :=
n
¸
i=1

b
i
a
i
1

2πσ
2
e

(x−m)
2

2
dx
for A := (a
1
, b
1
]∪(a
2
, b
2
]∪· · ·∪(a
n
, b
n
] where we consider the Riemann-
integral
9
on the right-hand side. One can show (we do not do this here,
but compare with Proposition 3.5.8 below) that P
0
satisﬁes the assump-
tions of Proposition 1.2.17, so that we can extend P
0
to a probability
measure N
m,σ
2 on B(R).
The measure N
m,σ
2 is called Gaussian distribution
10
(normal distribu-
tion) with mean m and variance σ
2
. Given A ∈ B(R) we write
N
m,σ
2(A) =

A
p
m,σ
2(x)dx with p
m,σ
2(x) :=
1

2πσ
2
e

(x−m)
2

2
.
The function p
m,σ
2(x) is called Gaussian density.
1.3.6 Exponential distribution on R with parameter
λ > 0
(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) For A and G as in Subsection 1.3.5 we deﬁne, via the Riemann-integral,
P
0
(A) :=
n
¸
i=1

b
i
a
i
p
λ
(x)dx with p
λ
(x) := 1I
[0,∞)
(x)λe
−λx
Again, P
0
satisﬁes the assumptions of Proposition 1.2.17, so that we
can extend P
0
to the exponential distribution µ
λ
with parameter λ
and density p
λ
(x) on B(R).
9
Georg Friedrich Bernhard Riemann, 17/09/1826 (Germany) - 20/07/1866 (Italy),
Ph.D. thesis under Gauss.
10
Johann Carl Friedrich Gauss, 30/04/1777 (Brunswick, Germany) - 23/02/1855
(G¨ottingen, Hannover, Germany).
1.3. EXAMPLES OF DISTRIBUTIONS 29
Given A ∈ B(R) we write
µ
λ
(A) =

A
p
λ
(x)dx.
The exponential distribution can be considered as a continuous time version
of the geometric distribution. In particular, we see that the distribution does
not have a memory in the sense that for a, b ≥ 0 we have
µ
λ
([a + b, ∞)|[a, ∞)) = µ
λ
([b, ∞))
with the conditional probability on the left-hand side. In words: the proba-
bility of a realization larger or equal to a+b under the condition that one has
already a value larger or equal a is the same as having a realization larger or
equal b. Indeed, it holds
µ
λ
([a + b, ∞)|[a, ∞)) =
µ
λ
([a + b, ∞) ∩ [a, ∞))
µ
λ
([a, ∞))
=
λ

a+b
e
−λx
dx
λ

a
e
−λx
dx
=
e
−λ(a+b)
e
−λa
= µ
λ
([b, ∞)).
Example 1.3.4 Suppose that the amount of time one spends in a post oﬃce
is exponential distributed with λ =
1
10
.
(a) What is the probability, that a customer will spend more than 15 min-
utes?
(b) What is the probability, that a customer will spend more than 15 min-
utes from the beginning in the post oﬃce, given that the customer
already spent at least 10 minutes?
The answer for (a) is µ
λ
([15, ∞)) = e
−15
1
10
≈ 0.220. For (b) we get
µ
λ
([15, ∞)|[10, ∞)) = µ
λ
([5, ∞)) = e
−5
1
10
≈ 0.604.
1.3.7 Poisson’s Theorem
For large n and small p the Poisson distribution provides a good approxima-
tion for the binomial distribution.
Proposition 1.3.5 [Poisson’s Theorem] Let λ > 0, p
n
∈ (0, 1), n =
1, 2, ..., and assume that np
n
→λ as n →∞. Then, for all k = 0, 1, . . . ,
µ
n,p
n
({k}) →π
λ
({k}), n →∞.
30 CHAPTER 1. PROBABILITY SPACES
Proof. Fix an integer k ≥ 0. Then
µ
n,p
n
({k}) =

n
k

p
k
n
(1 −p
n
)
n−k
=
n(n −1) . . . (n −k + 1)
k!
p
k
n
(1 −p
n
)
n−k
=
1
k!
n(n −1) . . . (n −k + 1)
n
k
(np
n
)
k
(1 −p
n
)
n−k
.
Of course, lim
n→∞
(np
n
)
k
= λ
k
and lim
n→∞
n(n−1)...(n−k+1)
n
k
= 1. So we have
to show that lim
n→∞
(1 −p
n
)
n−k
= e
−λ
. By np
n
→λ we get that there exists
a sequence ε
n
such that
np
n
= λ + ε
n
with lim
n→∞
ε
n
= 0.
Choose ε
0
> 0 and n
0
≥ 1 such that |ε
n
| ≤ ε
0
for all n ≥ n
0
. Then

1 −
λ + ε
0
n

n−k

1 −
λ + ε
n
n

n−k

1 −
λ −ε
0
n

n−k
.
Using l’Hospital’s rule we get
lim
n→∞
ln

1 −
λ + ε
0
n

n−k
= lim
n→∞
(n −k) ln

1 −
λ + ε
0
n

= lim
n→∞
ln

1 −
λ+ε
0
n

1/(n −k)
= lim
n→∞

1 −
λ+ε
0
n

−1
λ+ε
0
n
2
−1/(n −k)
2
= −(λ + ε
0
).
Hence
e
−(λ+ε
0
)
= lim
n→∞

1 −
λ + ε
0
n

n−k
≤ lim
n→∞

1 −
λ + ε
n
n

n−k
.
In the same way we get that
lim
n→∞

1 −
λ + ε
n
n

n−k
≤ e
−(λ−ε
0
)
.
Finally, since we can choose ε
0
> 0 arbitrarily small
lim
n→∞
(1 −p
n
)
n−k
= lim
n→∞

1 −
λ + ε
n
n

n−k
= e
−λ
.

1.4. A SET WHICH IS NOT A BOREL SET 31
1.4 A set which is not a Borel set
In this section we shall construct a set which is a subset of (0, 1] but not an
element of
B((0, 1]) := {B = A ∩ (0, 1] : A ∈ B(R)} .
Before we start we need
Deﬁnition 1.4.1 [λ-system] A class L is a λ-system if
(1) Ω ∈ L,
(2) A, B ∈ L and A ⊆ B imply B\A ∈ L,
(3) A
1
, A
2
, · · · ∈ L and A
n
⊆ A
n+1
, n = 1, 2, . . . imply
¸

n=1
A
n
∈ L.
Proposition 1.4.2 [π-λ-Theorem] If P is a π-system and L is a λ-
system, then P ⊆ L implies σ(P) ⊆ L.
Deﬁnition 1.4.3 [equivalence relation] An relation ∼ on a set X is
called equivalence relation if and only if
(1) x ∼ x for all x ∈ X (reﬂexivity),
(2) x ∼ y implies y ∼ x for all x, y ∈ X (symmetry),
(3) x ∼ y and y ∼ z imply x ∼ z for all x, y, z ∈ X (transitivity).
Given x, y ∈ (0, 1] and A ⊆ (0, 1], we also need the addition modulo one
x ⊕y :=

x + y if x + y ∈ (0, 1]
x + y −1 otherwise
and
A ⊕x := {a ⊕x : a ∈ A}.
Now deﬁne
L := {A ∈ B((0, 1]) :
A ⊕x ∈ B((0, 1]) and λ(A ⊕x) = λ(A) for all x ∈ (0, 1]} (1.2)
where λ is the Lebesgue measure on (0, 1].
Lemma 1.4.4 The system L from (1.2) is a λ-system.
32 CHAPTER 1. PROBABILITY SPACES
Proof. The property (1) is clear since Ω ⊕x = Ω. To check (2) let A, B ∈ L
and A ⊆ B, so that
λ(A ⊕x) = λ(A) and λ(B ⊕x) = λ(B).
We have to show that B\ A ∈ L. By the deﬁnition of ⊕ it is easy to see that
A ⊆ B implies A ⊕x ⊆ B ⊕x and
(B ⊕x) \ (A ⊕x) = (B \ A) ⊕x,
and therefore, (B \ A) ⊕ x ∈ B((0, 1]). Since λ is a probability measure it
follows
λ(B \ A) = λ(B) −λ(A)
= λ(B ⊕x) −λ(A ⊕x)
= λ((B ⊕x) \ (A ⊕x))
= λ((B \ A) ⊕x)
and B\A ∈ L. Property (3) is left as an exercise.
Finally, we need the axiom of choice.
Proposition 1.4.5 [Axiom of choice] Let I be a non-empty set and
(M
α
)
α∈I
be a system of non-empty sets M
α
. Then there is a function ϕ
on I such that
ϕ : α →m
α
∈ M
α
.
In other words, one can form a set by choosing of each set M
α
a representative
m
α
.
Proposition 1.4.6 There exists a subset H ⊆ (0, 1] which does not belong
to B((0, 1]).
Proof. We take the system L from (1.2). If (a, b] ⊆ [0, 1], then (a, b] ∈ L.
Since
P := {(a, b] : 0 ≤ a < b ≤ 1}
is a π-system which generates B((0, 1]) it follows by the π-λ-Theorem 1.4.2
that
B((0, 1]) ⊆ L.
Let us deﬁne the equivalence relation
x ∼ y if and only if x ⊕r = y for some rational r ∈ (0, 1].
Let H ⊆ (0, 1] be consisting of exactly one representative point from each
equivalence class (such set exists under the assumption of the axiom of
1.4. A SET WHICH IS NOT A BOREL SET 33
choice). Then H ⊕ r
1
and H ⊕ r
2
are disjoint for r
1
= r
2
: if they were
not disjoint, then there would exist h
1
⊕r
1
∈ (H⊕r
1
) and h
2
⊕r
2
∈ (H⊕r
2
)
with h
1
⊕ r
1
= h
2
⊕ r
2
. But this implies h
1
∼ h
2
and hence h
1
= h
2
and
r
1
= r
2
. So it follows that (0, 1] is the countable union of disjoint sets
(0, 1] =
¸
r∈(0,1] rational
(H ⊕r).
If we assume that H ∈ B((0, 1]) then B((0, 1]) ⊆ L implies H ⊕r ∈ B((0, 1])
and
λ((0, 1]) = λ

¸
¸
r∈(0,1] rational
(H ⊕r)

=
¸
r∈(0,1] rational
λ(H ⊕r).
By B((0, 1]) ⊆ L we have λ(H ⊕r) = λ(H) = a ≥ 0 for all rational numbers
r ∈ (0, 1]. Consequently,
1 = λ((0, 1]) =
¸
r∈(0,1] rational
λ(H ⊕r) = a + a + . . .
So, the right hand side can either be 0 (if a = 0) or ∞ (if a > 0). This leads
to a contradiction, so H ∈ B((0, 1]).
34 CHAPTER 1. PROBABILITY SPACES
Chapter 2
Random variables
Given a probability space (Ω, F, P), in many stochastic models functions
f : Ω → R which describe certain random phenomena are considered and
one is interested in the computation of expressions like
P({ω ∈ Ω : f(ω) ∈ (a, b)}) , where a < b.
This leads us to the condition
{ω ∈ Ω : f(ω) ∈ (a, b)} ∈ F
and hence to random variables we will introduce now.
2.1 Random variables
Deﬁnition 2.1.1 [(measurable) step-function] Let (Ω, F) be a mea-
surable space. A function f : Ω → R is called measurable step-function
or step-function, provided that there are α
1
, ..., α
n
∈ R and A
1
, ..., A
n
∈ F
such that f can be written as
f(ω) =
n
¸
i=1
α
i
1I
A
i
(ω),
where
1I
A
i
(ω) :=

1 : ω ∈ A
i
0 : ω ∈ A
i
.
Some particular examples for step-functions are
1I

= 1,
1I

= 0,
1I
A
+ 1I
A
c = 1,
35
36 CHAPTER 2. RANDOM VARIABLES
1I
A∩B
= 1I
A
1I
B
,
1I
A∪B
= 1I
A
+ 1I
B
−1I
A∩B
.
The deﬁnition above concerns only functions which take ﬁnitely many values,
which will be too restrictive in future. So we wish to extend this deﬁnition.
Deﬁnition 2.1.2 [random variables] Let (Ω, F) be a measurable space.
A map f : Ω → R is called random variable provided that there is a
sequence (f
n
)

n=1
of measurable step-functions f
n
: Ω →R such that
f(ω) = lim
n→∞
f
n
(ω) for all ω ∈ Ω.
Does our deﬁnition give what we would like to have? Yes, as we see from
Proposition 2.1.3 Let (Ω, F) be a measurable space and let f : Ω → R be
a function. Then the following conditions are equivalent:
(1) f is a random variable.
(2) For all −∞< a < b < ∞ one has that
f
−1
((a, b)) := {ω ∈ Ω : a < f(ω) < b} ∈ F.
Proof. (1) =⇒(2) Assume that
f(ω) = lim
n→∞
f
n
(ω)
where f
n
: Ω → R are measurable step-functions. For a measurable step-
function one has that
f
−1
n
((a, b)) ∈ F
so that
f
−1
((a, b)) =

ω ∈ Ω : a < lim
n
f
n
(ω) < b
¸
=

¸
m=1

¸
N=1

¸
n=N

ω ∈ Ω : a +
1
m
< f
n
(ω) < b −
1
m

∈ F.
(2) =⇒(1) First we observe that we also have that
f
−1
([a, b)) = {ω ∈ Ω : a ≤ f(ω) < b}
=

¸
m=1

ω ∈ Ω : a −
1
m
< f(ω) < b

∈ F
so that we can use the step-functions
f
n
(ω) :=
4
n
−1
¸
k=−4
n
k
2
n
1I
{
k
2
n
≤f<
k+1
2
n
}
(ω).

Sometimes the following proposition is useful which is closely connected to
Proposition 2.1.3.
2.2. MEASURABLE MAPS 37
Proposition 2.1.4 Assume a measurable space (Ω, F) and a sequence of
random variables f
n
: Ω → R such that f(ω) := lim
n
f
n
(ω) exists for all
ω ∈ Ω. Then f : Ω →R is a random variable.
The proof is an exercise.
Proposition 2.1.5 [properties of random variables] Let (Ω, F) be a
measurable space and f, g : Ω →R random variables and α, β ∈ R. Then the
following is true:
(1) (αf + βg)(ω) := αf(ω) + βg(ω) is a random variable.
(2) (fg)(ω) := f(ω)g(ω) is a random-variable.
(3) If g(ω) = 0 for all ω ∈ Ω, then

f
g

(ω) :=
f(ω)
g(ω)
is a random variable.
(4) |f| is a random variable.
Proof. (2) We ﬁnd measurable step-functions f
n
, g
n
: Ω →R such that
f(ω) = lim
n→∞
f
n
(ω) and g(ω) = lim
n→∞
g
n
(ω).
Hence
(fg)(ω) = lim
n→∞
f
n
(ω)g
n
(ω).
Finally, we remark, that f
n
(ω)g
n
(ω) is a measurable step-function. In fact,
assuming that
f
n
(ω) =
k
¸
i=1
α
i
1I
A
i
(ω) and g
n
(ω) =
l
¸
j=1
β
j
1I
B
j
(ω),
yields
(f
n
g
n
)(ω) =
k
¸
i=1
l
¸
j=1
α
i
β
j
1I
A
i
(ω)1I
B
j
(ω) =
k
¸
i=1
l
¸
j=1
α
i
β
j
1I
A
i
∩B
j
(ω)
and we again obtain a step-function, since A
i
∩ B
j
∈ F. Items (1), (3), and
(4) are an exercise.
2.2 Measurable maps
Now we extend the notion of random variables to the notion of measurable
maps, which is necessary in many considerations and even more natural.
38 CHAPTER 2. RANDOM VARIABLES
Deﬁnition 2.2.1 [measurable map] Let (Ω, F) and (M, Σ) be measurable
spaces. A map f : Ω →M is called (F, Σ)-measurable, provided that
f
−1
(B) = {ω ∈ Ω : f(ω) ∈ B} ∈ F for all B ∈ Σ.
The connection to the random variables is given by
Proposition 2.2.2 Let (Ω, F) be a measurable space and f : Ω →R. Then
the following assertions are equivalent:
(1) The map f is a random variable.
(2) The map f is (F, B(R))-measurable.
For the proof we need
Lemma 2.2.3 Let (Ω, F) and (M, Σ) be measurable spaces and let f : Ω →
M. Assume that Σ
0
⊆ Σ is a system of subsets such that σ(Σ
0
) = Σ. If
f
−1
(B) ∈ F for all B ∈ Σ
0
,
then
f
−1
(B) ∈ F for all B ∈ Σ.
Proof. Deﬁne
A :=
¸
B ⊆ M : f
−1
(B) ∈ F
¸
.
By assumption, Σ
0
⊆ A. We show that A is a σ–algebra.
(1) f
−1
(M) = Ω ∈ F implies that M ∈ A.
(2) If B ∈ A, then
f
−1
(B
c
) = {ω : f(ω) ∈ B
c
}
= {ω : f(ω) / ∈ B}
= Ω \ {ω : f(ω) ∈ B}
= f
−1
(B)
c
∈ F.
(3) If B
1
, B
2
, · · · ∈ A, then
f
−1

¸
i=1
B
i

=

¸
i=1
f
−1
(B
i
) ∈ F.
By deﬁnition of Σ = σ(Σ
0
) this implies that Σ ⊆ A, which implies our
lemma.
Proof of Proposition 2.2.2. (2) =⇒ (1) follows from (a, b) ∈ B(R) for a < b
which implies that f
−1
((a, b)) ∈ F.
(1) =⇒ (2) is a consequence of Lemma 2.2.3 since B(R) = σ((a, b) : −∞ <
a < b < ∞).
2.2. MEASURABLE MAPS 39
Example 2.2.4 If f : R → R is continuous, then f is (B(R), B(R))-
measurable.
Proof. Since f is continuous we know that f
−1
((a, b)) is open for all −∞ <
a < b < ∞, so that f
−1
((a, b)) ∈ B(R). Since the open intervals generate
B(R) we can apply Lemma 2.2.3.
Now we state some general properties of measurable maps.
Proposition 2.2.5 Let (Ω
1
, F
1
), (Ω
2
, F
2
), (Ω
3
, F
3
) be measurable spaces.
Assume that f : Ω
1
→ Ω
2
is (F
1
, F
2
)-measurable and that g : Ω
2
→ Ω
3
is
(F
2
, F
3
)-measurable. Then the following is satisﬁed:
(1) g ◦ f : Ω
1
→Ω
3
deﬁned by
(g ◦ f)(ω
1
) := g(f(ω
1
))
is (F
1
, F
3
)-measurable.
(2) Assume that P
1
is a probability measure on F
1
and deﬁne
P
2
(B
2
) := P
1
({ω
1
∈ Ω
1
: f(ω
1
) ∈ B
2
}) .
Then P
2
is a probability measure on F
2
.
The proof is an exercise.
Example 2.2.6 We want to simulate the ﬂipping of an (unfair) coin by the
random number generator: the random number generator of the computer
gives us a number which has (a discrete) uniform distribution on [0, 1]. So
we take the probability space ([0, 1], B([0, 1]), λ) and deﬁne for p ∈ (0, 1) the
random variable
f(ω) := 1I
[0,p)
(ω).
Then it holds
P
2
({1}) := P
1
({ω ∈ Ω : f(ω) = 1}) = λ([0, p)) = p,
P
2
({0}) := P
1
({ω
1
∈ Ω
1
: f(ω
1
) = 0}) = λ([p, 1]) = 1 −p.
Assume the random number generator gives out the number x. If we would
write a program such that ”output” = ”heads” in case x ∈ [0, p) and ”output”
= ”tails” in case x ∈ [p, 1], ”output” would simulate the ﬂipping of an (unfair)
coin, or in other words, ”output” has binomial distribution µ
1,p
.
Deﬁnition 2.2.7 [law of a random variable] Let (Ω, F, P) be a prob-
ability space and f : Ω →R be a random variable. Then
P
f
(B) := P({ω ∈ Ω : f(ω) ∈ B})
is called the law or image measure of the random variable f.
40 CHAPTER 2. RANDOM VARIABLES
The law of a random variable is completely characterized by its distribution
function which we introduce now.
Deﬁnition 2.2.8 [distribution-function] Given a random variable f :
Ω →R on a probability space (Ω, F, P), the function
F
f
(x) := P({ω ∈ Ω : f(ω) ≤ x})
is called distribution function of f.
Proposition 2.2.9 [Properties of distribution-functions]
The distribution-function F
f
: R →[0, 1] is a right-continuous non-decreasing
function such that
lim
x→−∞
F(x) = 0 and lim
x→∞
F(x) = 1.
Proof. (i) F is non-decreasing: given x
1
< x
2
one has that
{ω ∈ Ω : f(ω) ≤ x
1
} ⊆ {ω ∈ Ω : f(ω) ≤ x
2
}
and
F(x
1
) = P({ω ∈ Ω : f(ω) ≤ x
1
}) ≤ P({ω ∈ Ω : f(ω) ≤ x
2
}) = F(x
2
).
(ii) F is right-continuous: let x ∈ R and x
n
↓ x. Then
F(x) = P({ω ∈ Ω : f(ω) ≤ x})
= P

¸
n=1
{ω ∈ Ω : f(ω) ≤ x
n
}

= lim
n
P({ω ∈ Ω : f(ω) ≤ x
n
})
= lim
n
F(x
n
).
(iii) The properties lim
x→−∞
F(x) = 0 and lim
x→∞
F(x) = 1 are an exercise.

Proposition 2.2.10 Assume that P
1
and P
2
are probability measures on
B(R) and F
1
and F
2
are the corresponding distribution functions. Then the
following assertions are equivalent:
(1) P
1
= P
2
.
(2) F
1
(x) = P
1
((−∞, x]) = P
2
((−∞, x]) = F
2
(x) for all x ∈ R.
2.2. MEASURABLE MAPS 41
Proof. (1) ⇒(2) is of course trivial. We consider (2) ⇒(1): For sets of type
A := (a, b]
one can show that
F
1
(b) −F
1
(a) = P
1
(A) = P
2
(A) = F
2
(b) −F
2
(a).
Now one can apply Proposition 1.2.23.
Summary: Let (Ω, F) be a measurable space and f : Ω →R be a function.
Then the following relations hold true:
f
−1
(A) ∈ F for all A ∈ G
where G is one of the systems given in Proposition 1.1.8 or
any other system such that σ(G) = B(R).

¸
¸
¸
¸
Lemma 2.2.3
f is measurable: f
−1
(A) ∈ F for all A ∈ B(R)

¸
¸
¸
¸
Proposition 2.2.2
f is a random variable i.e.
there exist measurable step functions (f
n
)

n=1
i.e.
f
n
=
¸
N
n
k=1
a
n
k
1I
A
n
k
with a
n
k
∈ R and A
n
k
∈ F such that
f
n
(ω) →f(ω) for all ω ∈ Ω as n →∞.

¸
¸
¸
¸
Proposition 2.1.3
f
−1
((a, b)) ∈ F for all −∞< a < b < ∞
42 CHAPTER 2. RANDOM VARIABLES
2.3 Independence
Let us ﬁrst start with the notion of a family of independent random variables.
Deﬁnition 2.3.1 [independence of a family of random variables]
Let (Ω, F, P) be a probability space and f
i
: Ω → R, i ∈ I, be random
variables where I is a non-empty index-set. The family (f
i
)
i∈I
is called
independent provided that for all distinct i
1
, ..., i
n
∈ I, n = 1, 2, ..., and
all B
1
, ..., B
n
∈ B(R) one has that
P(f
i
1
∈ B
1
, ..., f
i
n
∈ B
n
) = P(f
i
1
∈ B
1
) · · · P(f
i
n
∈ B
n
) .
In case, we have a ﬁnite index set I, that means for example I = {1, ..., n},
then the deﬁnition above is equivalent to
Deﬁnition 2.3.2 [independence of a finite family of random vari-
ables] Let (Ω, F, P) be a probability space and f
i
: Ω → R, i = 1, . . . , n,
random variables. The random variables f
1
, . . . , f
n
are called independent
provided that for all B
1
, ..., B
n
∈ B(R) one has that
P(f
1
∈ B
1
, ..., f
n
∈ B
n
) = P(f
1
∈ B
1
) · · · P(f
n
∈ B
n
) .
The connection between the independence of random variables and of events
is obvious:
Proposition 2.3.3 Let (Ω, F, P) be a probability space and f
i
: Ω → R,
i ∈ I, be random variables where I is a non-empty index-set. Then the
following assertions are equivalent.
(1) The family (f
i
)
i∈I
is independent.
(2) For all families (B
i
)
i∈I
of Borel sets B
i
∈ B(R) one has that the events
({ω ∈ Ω : f
i
(ω) ∈ B
i
})
i∈I
are independent.
Sometimes we need to group independent random variables. In this respect
the following proposition turns out to be useful. For the following we say
that g : R
n
→R is Borel-measurable (or a Borel function) provided that g
is (B(R
n
), B(R))-measurable.
Proposition 2.3.4 [Grouping of independent random variables]
Let f
k
: Ω → R, k = 1, 2, 3, ... be independent random variables. As-
sume Borel functions g
i
: R
n
i
→ R for i = 1, 2, ... and n
i
∈ {1, 2, ...}.
Then the random variables g
1
(f
1
(ω), ..., f
n
1
(ω)), g
2
(f
n
1
+1
(ω), ..., f
n
1
+n
2
(ω)),
g
3
(f
n
1
+n
2
+1
(ω), ..., f
n
1
+n
2
+n
3
(ω)), ... are independent.
The proof is an exercise.
2.3. INDEPENDENCE 43
Proposition 2.3.5 [independence and product of laws] Assume that
(Ω, F, P) is a probability space and that f, g : Ω → R are random variables
with laws P
f
and P
g
and distribution-functions F
f
and F
g
, respectively. Then
the following assertions are equivalent:
(1) f and g are independent.
(2) P((f, g) ∈ B) = (P
f
×P
g
)(B) for all B ∈ B(R
2
).
(3) P(f ≤ x, g ≤ y) = F
f
(x)F
g
(y) for all x, y ∈ R.
The proof is an exercise.
Remark 2.3.6 Assume that there are Riemann-integrable functions p
f
, p
g
:
R →[0, ∞) such that

R
p
f
(x)dx =

R
p
g
(x)dx = 1,
F
f
(x) =

x
−∞
p
f
(y)dy, and F
g
(x) =

x
−∞
p
g
(y)dy
for all x ∈ R (one says that the distribution-functions F
f
and F
g
are abso-
lutely continuous with densities p
f
and p
g
, respectively). Then the indepen-
dence of f and g is also equivalent to the representation
F
(f,g)
(x, y) =

x
−∞

y
−∞
p
f
(u)p
g
(v)d(u)d(v).
In other words: the distribution-function of the random vector (f, g) has a
density which is the product of the densities of f and g.
Often one needs the existence of sequences of independent random variables
f
1
, f
2
, ... : Ω → R having a certain distribution. How to construct such
sequences? First we let
Ω := R
N
= {x = (x
1
, x
2
, ...) : x
n
∈ R} .
Then we deﬁne the projections π
n
: R
N
→R given by
π
n
(x) := x
n
,
that means π
n
ﬁlters out the n-th coordinate. Now we take the smallest
σ-algebra such that all these projections are random variables, that means
we take
B(R
N
) = σ

π
−1
n
(B) : n = 1, 2, ..., B ∈ B(R)

,
44 CHAPTER 2. RANDOM VARIABLES
see Proposition 1.1.6. Finally, let P
1
, P
2
, ... be a sequence of probability
measures on B(R). Using Carath
´
eodory’s extension theorem (Proposition
1.2.17) we ﬁnd an unique probability measure P on B(R
N
) such that
P(B
1
×B
2
×· · · ×B
n
×R ×R ×· · · ) = P
1
(B
1
) · · · P
n
(B
n
)
for all n = 1, 2, ... and B
1
, ..., B
n
∈ B(R), where
B
1
×B
2
×· · · ×B
n
×R ×R ×· · · :=
¸
x ∈ R
N
: x
1
∈ B
1
, ..., x
n
∈ B
n
¸
.
Proposition 2.3.7 [Realization of independent random variab-
les] Let (R
N
, B(R
N
), P) and π
n
: R
N
→R be deﬁned as above. Then (π
n
)

n=1
is a sequence of independent random variables such that the law of π
n
is P
n
,
that means
P(π
n
∈ B) = P
n
(B)
for all B ∈ B(R).
Proof. Take Borel sets B
1
, ..., B
n
∈ B(R). Then
P({x : π
1
(x) ∈ B
1
, ..., π
n
(x) ∈ B
n
})
= P(B
1
×B
2
×· · · ×B
n
×R ×R ×· · · )
= P
1
(B
1
) · · · P
n
(B
n
)
=
n
¸
k=1
P(R ×· · · ×R ×B
k
×R ×· · · )
=
n
¸
k=1
P({x : π
k
(x) ∈ B
k
}).

Chapter 3
Integration
Given a probability space (Ω, F, P) and a random variable f : Ω → R, we
deﬁne the expectation or integral
Ef =

fdP =

f(ω)dP(ω)
and investigate its basic properties.
3.1 Deﬁnition of the expected value
The deﬁnition of the integral is done within three steps.
Deﬁnition 3.1.1 [step one, f is a step-function] Given a probability
space (Ω, F, P) and an F-measurable g : Ω →R with representation
g =
n
¸
i=1
α
i
1I
A
i
where α
i
∈ R and A
i
∈ F, we let
Eg =

gdP =

g(ω)dP(ω) :=
n
¸
i=1
α
i
P(A
i
).
We have to check that the deﬁnition is correct, since it might be that diﬀerent
representations give diﬀerent expected values Eg. However, this is not the
case as shown by
Lemma 3.1.2 Assuming measurable step-functions
g =
n
¸
i=1
α
i
1I
A
i
=
m
¸
j=1
β
j
1I
B
j
,
one has that
¸
n
i=1
α
i
P(A
i
) =
¸
m
j=1
β
j
P(B
j
).
45
46 CHAPTER 3. INTEGRATION
Proof. By subtracting in both equations the right-hand side from the left-
hand one we only need to show that
n
¸
i=1
α
i
1I
A
i
= 0
implies that
n
¸
i=1
α
i
P(A
i
) = 0.
By taking all possible intersections of the sets A
i
complements we ﬁnd a system of sets C
1
, ..., C
N
∈ F such that
(a) C
j
∩ C
k
= ∅ if j = k,
(b)
¸
N
j=1
C
j
= Ω,
(c) for all A
i
there is a set I
i
⊆ {1, ..., N} such that A
i
=
¸
j∈I
i
C
j
.
Now we get that
0 =
n
¸
i=1
α
i
1I
A
i
=
n
¸
i=1
¸
j∈I
i
α
i
1I
C
j
=
N
¸
j=1

¸
i:j∈I
i
α
i

1I
C
j
=
N
¸
j=1
γ
j
1I
C
j
so that γ
j
= 0 if C
j
= ∅. From this we get that
n
¸
i=1
α
i
P(A
i
) =
n
¸
i=1
¸
j∈I
i
α
i
P(C
j
) =
N
¸
j=1

¸
i:j∈I
i
α
i

P(C
j
) =
N
¸
j=1
γ
j
P(C
j
) = 0.

Proposition 3.1.3 Let (Ω, F, P) be a probability space and f, g : Ω → R be
measurable step-functions. Given α, β ∈ R one has that
E(αf + βg) = αEf + βEg.
Proof. The proof follows immediately from Lemma 3.1.2 and the deﬁnition
of the expected value of a step-function since, for
f =
n
¸
i=1
α
i
1I
A
i
and g =
m
¸
j=1
β
j
1I
B
j
,
one has that
αf + βg = α
n
¸
i=1
α
i
1I
A
i
+ β
m
¸
j=1
β
j
1I
B
j
and
E(αf + βg) = α
n
¸
i=1
α
i
P(A
i
) + β
m
¸
j=1
β
j
P(B
j
) = αEf + βEg.

3.1. DEFINITION OF THE EXPECTED VALUE 47
Deﬁnition 3.1.4 [step two, f is non-negative] Given a probability
space (Ω, F, P) and a random variable f : Ω → R with f(ω) ≥ 0 for all
ω ∈ Ω. Then
Ef =

fdP =

f(ω)dP(ω)
:= sup {Eg : 0 ≤ g(ω) ≤ f(ω), g is a measurable step-function} .
Note that in this deﬁnition the case Ef = ∞ is allowed. In the last step
we will deﬁne the expectation for a general random variable. To this end we
decompose a random variable f : Ω →R into its positive and negative part
f(ω) = f
+
(ω) −f

(ω)
with
f
+
(ω) := max {f(ω), 0} ≥ 0 and f

(ω) := max {−f(ω), 0} ≥ 0.
Deﬁnition 3.1.5 [step three, f is general] Let (Ω, F, P) be a proba-
bility space and f : Ω →R be a random variable.
(1) If Ef
+
< ∞ or Ef

< ∞, then we say that the expected value of f
exists and set
Ef := Ef
+
−Ef

∈ [−∞, ∞].
(2) The random variable f is called integrable provided that
Ef
+
< ∞ and Ef

< ∞.
(3) If the expected value of f exists and A ∈ F, then

A
fdP =

A
f(ω)dP(ω) :=

f(ω)1I
A
(ω)dP(ω).
The expression Ef is called expectation or expected value of the random
variable f.
Remark 3.1.6 The fundamental Lebesgue-integral on the real line can be
introduced by the means, we have to our disposal so far, as follows: Assume
a function f : R → R which is (B(R), B(R))-measurable. Let f
n
: (n −
1, n] →R be the restriction of f which is a random variable with respect to
B((n − 1, n]) = B
(n−1,n]
. In Section 1.3.4 we have introduced the Lebesgue
measure λ = λ
n
on (n −1, n]. Assume that f
n
is integrable on (n −1, n] for
all n = 1, 2, ... and that

¸
n=−∞

(n−1,n]
|f(x)|dλ(x) < ∞,
48 CHAPTER 3. INTEGRATION
then f : R → R is called integrable with respect to the Lebesgue-measure
on the real line and the Lebesgue-integral is deﬁned by

R
f(x)dλ(x) :=

¸
n=−∞

(n−1,n]
f(x)dλ(x).
Now we go the opposite way: Given a Borel set I and a map f : I →R which
is (B
I
, B(R)) measurable, we can extend f to a (B(R), B(R))-measurable
function
¯
f by
¯
f(x) := f(x) if x ∈ I and
¯
f(x) := 0 if x ∈ I. If
¯
f is
Lebesgue-integrable, then we deﬁne

I
f(x)dλ(x) :=

R
¯
f(x)dλ(x).
Example 3.1.7 A basic example for our integration is as follows: Let Ω =

1
, ω
2
, ...}, F := 2

, and P({ω
n
}) = q
n
∈ [0, 1] with
¸

n=1
q
n
= 1. Given
f : Ω →R we get that f is integrable if and only if

¸
n=1
|f(ω
n
)|q
n
< ∞,
and the expected value exists if either
¸
{n:f(ω
n
)≥0}
f(ω
n
)q
n
< ∞ or
¸
{n:f(ω
n
)≤0}
(−f(ω
n
))q
n
< ∞.
If the expected value exists, then it computes to
Ef =

¸
n=1
f(ω
n
)q
n
∈ [−∞, ∞].
A simple example for the expectation is the expected value while rolling a
die:
Example 3.1.8 Assume that Ω := {1, 2, . . . , 6}, F := 2

, and P({k}) :=
1
6
,
which models rolling a die. If we deﬁne f(k) = k, i.e.
f(k) :=
6
¸
i=1
i1I
{i}
(k),
then f is a measurable step-function and it follows that
Ef =
6
¸
i=1
iP({i}) =
1 + 2 +· · · + 6
6
= 3.5.
3.2. BASIC PROPERTIES OF THE EXPECTED VALUE 49
Besides the expected value, the variance is often of interest.
Deﬁnition 3.1.9 [variance] Let (Ω, F, P) be a probability space and f :
Ω →R be an integrable random variable. Then
var(f) = σ
2
f
= E[f −Ef]
2
∈ [0, ∞]
is called variance.
Let us summarize some simple properties:
Proposition 3.1.10 (1) If f is integrable and α, c ∈ R, then
var(αf −c) = α
2
var(f).
(2) If Ef
2
< ∞ then var(f) = Ef
2
−(Ef)
2
< ∞.
Proof. (1) follows from
var(αf −c) = E[(αf −c) −E(αf −c)]
2
= E[αf −αEf]
2
= α
2
var(f).
(2) First we remark that E|f| ≤ (Ef
2
)
1
2
as we shall see later by H¨older’s in-
equality (Corollary 3.6.6), that means any square integrable random variable
is integrable. Then we simply get that
var(f) = E[f −Ef]
2
= Ef
2
−2E(fEf) + (Ef)
2
= Ef
2
−2(Ef)
2
+ (Ef)
2
.

3.2 Basic properties of the expected value
We say that a property P(ω), depending on ω, holds P-almost surely or
almost surely (a.s.) if
{ω ∈ Ω : P(ω) holds}
belongs to F and is of measure one. Let us start with some ﬁrst properties
of the expected value.
Proposition 3.2.1 Assume a probability space (Ω, F, P) and random vari-
ables f, g : Ω →R.
(1) If 0 ≤ f(ω) ≤ g(ω), then 0 ≤ Ef ≤ Eg.
(2) The random variable f is integrable if and only if |f| is integrable. In
this case one has
|Ef| ≤ E|f|.
50 CHAPTER 3. INTEGRATION
(3) If f = 0 a.s., then Ef = 0.
(4) If f ≥ 0 a.s. and Ef = 0, then f = 0 a.s.
(5) If f = g a.s. and Ef exists, then Eg exists and Ef = Eg.
Proof. (1) follows directly from the deﬁnition. Property (2) can be seen
as follows: by deﬁnition, the random variable f is integrable if and only if
Ef
+
< ∞ and Ef

< ∞. Since
¸
ω ∈ Ω : f
+
(ω) = 0
¸

¸
ω ∈ Ω : f

(ω) = 0
¸
= ∅
and since both sets are measurable, it follows that |f| = f
+
+f

is integrable
if and only if f
+
and f

are integrable and that
|Ef| = |Ef
+
−Ef

| ≤ Ef
+
+Ef

= E|f|.
(3) If f = 0 a.s., then f
+
= 0 a.s. and f

= 0 a.s., so that we can restrict
ourselves to the case f(ω) ≥ 0. If g is a measurable step-function with
g =
¸
n
k=1
a
k
1I
A
k
, g(ω) ≥ 0, and g = 0 a.s., then a
k
= 0 implies P(A
k
) = 0.
Hence
Ef = sup {Eg : 0 ≤ g ≤ f, g is a measurable step-function} = 0
since 0 ≤ g ≤ f implies g = 0 a.s. Properties (4) and (5) are exercises.
The next lemma is useful later on. In this lemma we use, as an approximation
for f, the so-called staircase-function. This idea was already exploited in the
proof of Proposition 2.1.3.
Lemma 3.2.2 Let (Ω, F, P) be a probability space and f : Ω → R be a
random variable.
(1) Then there exists a sequence of measurable step-functions f
n
: Ω → R
such that, for all n = 1, 2, . . . and for all ω ∈ Ω,
|f
n
(ω)| ≤ |f
n+1
(ω)| ≤ |f(ω)| and f(ω) = lim
n→∞
f
n
(ω).
If f(ω) ≥ 0 for all ω ∈ Ω, then one can arrange f
n
(ω) ≥ 0 for all
ω ∈ Ω.
(2) If f ≥ 0 and if (f
n
)

n=1
is a sequence of measurable step-functions with
0 ≤ f
n
(ω) ↑ f(ω) for all ω ∈ Ω as n →∞, then
Ef = lim
n→∞
Ef
n
.
3.2. BASIC PROPERTIES OF THE EXPECTED VALUE 51
Proof. (1) It is easy to verify that the staircase-functions
f
n
(ω) :=
4
n
−1
¸
k=0
k
2
n
1I
{
k
2
n
≤f<
k+1
2
n
}
(ω) +
−1
¸
k=−4
n
k + 1
2
n
1I
{
k
2
n
≤f<
k+1
2
n
}
(ω)
fulﬁll all the conditions.
(2) Letting
f
0
n
(ω) :=
4
n
−1
¸
k=0
k
2
n
1I
{
k
2
n
≤f<
k+1
2
n
}
(ω)
we get 0 ≤ f
0
n
(ω) ↑ f(ω) for all ω ∈ Ω. On the other hand, by the deﬁnition
of the expectation there exists a sequence 0 ≤ g
n
(ω) ≤ f(ω) of measurable
step-functions such that Eg
n
↑ Ef. Hence
h
n
:= max
¸
f
0
n
, g
1
, . . . , g
n
¸
is a measurable step-function with 0 ≤ g
n
(ω) ≤ h
n
(ω) ↑ f(ω), so that
Eg
n
≤ Eh
n
≤ Ef, and lim
n→∞
Eg
n
= lim
n→∞
Eh
n
= Ef.
Now we will show that for every sequence (f
k
)

k=1
of measurable step-
functions with 0 ≤ f
k
↑ f it holds lim
k→∞
Ef
k
= Ef. Consider
d
k,n
:= f
k
∧ h
n
.
Clearly, d
k,n
↑ f
k
as n →∞ and d
k,n
↑ h
n
as k →∞. Let
z
k,n
:= arctan Ed
k,n
so that 0 ≤ z
k,n
≤ 1. Since (z
k,n
)

k=1
is increasing for ﬁxed n and (z
k,n
)

n=1
is
increasing for ﬁxed k one quickly checks that
lim
k
lim
n
z
k,n
= lim
n
lim
k
z
k,n
.
Hence
Ef = lim
n
Eh
n
= lim
n
lim
k
Ed
k,n
= lim
k
lim
n
Ed
k,n
= lim
k
Ef
k
where we have used the following fact: if 0 ≤ ϕ
n
(ω) ↑ ϕ(ω) for step-functions
ϕ
n
and ϕ, then
lim
n

n
= Eϕ.
To check this, it is suﬃcient to assume that ϕ(ω) = 1I
A
(ω) for some A ∈ F.
Let ε ∈ (0, 1) and
B
n
ε
:= {ω ∈ A : 1 −ε ≤ ϕ
n
(ω)} .
52 CHAPTER 3. INTEGRATION
Then
(1 −ε)1I
B
n
ε
(ω) ≤ ϕ
n
(ω) ≤ 1I
A
(ω).
Since B
n
ε
⊆ B
n+1
ε
and
¸

n=1
B
n
ε
= A we get, by the monotonicity of the
measure, that lim
n
P(B
n
ε
) = P(A) so that
(1 −ε)P(A) ≤ lim
n

n
.
Since this is true for all ε > 0 we get
Eϕ = P(A) ≤ lim
n

n
≤ Eϕ
and are done.
Now we continue with some basic properties of the expectation.
Proposition 3.2.3 [properties of the expectation] Let (Ω, F, P) be a
probability space and f, g : Ω →R be random variables such that Ef and Eg
exist.
(1) If f ≥ 0 and g ≥ 0, then E(f + g) = Ef +Eg.
(2) If c ∈ R, then E(cf) exists and E(cf) = cEf.
(3) If Ef
+
+ Eg
+
< ∞ or Ef

+ Eg

< ∞, then E(f + g)
+
< ∞ or
E(f + g)

< ∞ and E(f + g) = Ef +Eg.
(4) If f ≤ g, then Ef ≤ Eg.
(5) If f and g are integrable and a, b ∈ R, then af + bg is integrable and
aEf + bEg = E(af + bg).
Proof. (1) Here we use Lemma 3.2.2 (2) by ﬁnding step-functions 0 ≤ f
n
(ω) ↑
f(ω) and 0 ≤ g
n
(ω) ↑ g(ω) such that 0 ≤ f
n
(ω) + g
n
(ω) ↑ f(ω) + g(ω) and
E(f + g) = lim
n
E(f
n
+ g
n
) = lim
n
(Ef
n
+Eg
n
) = Ef +Eg
by Proposition 3.1.3.
(2) is an exercise.
(3) We only consider the case that Ef
+
+Eg
+
< ∞. Because of (f +g)
+

f
+
+ g
+
one gets that E(f + g)
+
< ∞. Moreover, one quickly checks that
(f + g)
+
+ f

+ g

= f
+
+ g
+
+ (f + g)

so that Ef

+Eg

= ∞if and only if E(f +g)

= ∞if and only if Ef +Eg =
E(f +g) = −∞. Assuming that Ef

+Eg

< ∞ gives that E(f +g)

< ∞
and
E[(f + g)
+
+ f

+ g

] = E[f
+
+ g
+
+ (f + g)

]
3.2. BASIC PROPERTIES OF THE EXPECTED VALUE 53
which implies that E(f + g) = Ef +Eg because of (1).
(4) If Ef

= ∞ or Eg
+
= ∞, then Ef = −∞ or Eg = ∞ so that nothing is
to prove. Hence assume that Ef

< ∞ and Eg
+
< ∞. The inequality f ≤ g
gives 0 ≤ f
+
≤ g
+
and 0 ≤ g

≤ f

so that f and g are integrable and
Ef = Ef
+
−Ef

≤ Eg
+
−Eg

= Eg.
(5) Since (af + bg)
+
≤ |a||f| + |b||g| and (af + bg)

≤ |a||f| + |b||g| we get
that af +bg is integrable. The equality for the expected values follows from
(2) and (3).
Proposition 3.2.4 [monotone convergence] Let (Ω, F, P) be a proba-
bility space and f, f
1
, f
2
, ... : Ω →R be random variables.
(1) If 0 ≤ f
n
(ω) ↑ f(ω) a.s., then lim
n
Ef
n
= Ef.
(2) If 0 ≥ f
n
(ω) ↓ f(ω) a.s., then lim
n
Ef
n
= Ef.
Proof. (a) First suppose
0 ≤ f
n
(ω) ↑ f(ω) for all ω ∈ Ω.
For each f
n
take a sequence of step functions (f
n,k
)
k≥1
such that 0 ≤ f
n,k
↑ f
n
,
as k →∞. Setting
h
N
:= max
1≤k≤N
1≤n≤N
f
n,k
we get h
N−1
≤ h
N
≤ max
1≤n≤N
f
n
= f
N
. Deﬁne h := lim
N→∞
h
N
. For
1 ≤ n ≤ N it holds that
f
n,N
≤ h
N
≤ f
N
so that, by N →∞,
f
n
≤ h ≤ f,
and therefore
f = lim
n→∞
f
n
≤ h ≤ f.
Since h
N
is a step function for each N and h
N
↑ f we have by Lemma 3.2.2
that lim
N→∞
Eh
N
= Ef and therefore, since h
N
≤ f
N
,
Ef ≤ lim
N→∞
Ef
N
.
On the other hand, f
n
≤ f
n+1
≤ f implies Ef
n
≤ Ef and hence
lim
n→∞
Ef
n
≤ Ef.
(b) Now let 0 ≤ f
n
(ω) ↑ f(ω) a.s. By deﬁnition, this means that
0 ≤ f
n
(ω) ↑ f(ω) for all ω ∈ Ω \ A,
54 CHAPTER 3. INTEGRATION
where P(A) = 0. Hence 0 ≤ f
n
(ω)1I
A
c(ω) ↑ f(ω)1I
A
c(ω) for all ω and step (a)
implies that
lim
n
Ef
n
1I
A
c = Ef1I
A
c.
Since f
n
1I
A
c = f
n
a.s. and f1I
A
c = f a.s. we get E(f
n
1I
A
c) = Ef
n
and
E(f1I
A
c) = Ef by Proposition 3.2.1 (5).
(c) Assertion (2) follows from (1) since 0 ≥ f
n
↓ f implies 0 ≤ −f
n
↑ −f.
Corollary 3.2.5 Let (Ω, F, P) be a probability space and g, f, f
1
, f
2
, ... : Ω →
R be random variables, where g is integrable. If
(1) g(ω) ≤ f
n
(ω) ↑ f(ω) a.s. or
(2) g(ω) ≥ f
n
(ω) ↓ f(ω) a.s.,
then lim
n→∞
Ef
n
= Ef.
Proof. We only consider (1). Let h
n
:= f
n
−g and h := f −g. Then
0 ≤ h
n
(ω) ↑ h(ω) a.s.
Proposition 3.2.4 implies that lim
n
Eh
n
= Eh. Since f

n
and f

are integrable
Proposition 3.2.3 (1) implies that Eh
n
= Ef
n
− Eg and Eh = Ef − Eg so
that we are done.
In Deﬁnition 1.2.7 we deﬁned limsup
n→∞
a
n
for a sequence (a
n
)

n=1
⊂ R. Nat-
urally, one can extend this deﬁnition to a sequence (f
n
)

n=1
of random vari-
ables: For any ω ∈ Ω limsup
n→∞
f
n
(ω) and liminf
n→∞
f
n
(ω) exist. Hence
limsup
n→∞
f
n
and liminf
n→∞
f
n
are random variables.
Proposition 3.2.6 [Lemma of Fatou] Let (Ω, F, P) be a probability space
and g, f
1
, f
2
, ... : Ω →R be random variables with |f
n
(ω)| ≤ g(ω) a.s. Assume
that g is integrable. Then limsup f
n
and liminf f
n
are integrable and one has
that
Eliminf
n→∞
f
n
≤ liminf
n→∞
Ef
n
≤ limsup
n→∞
Ef
n
≤ Elimsup
n→∞
f
n
.
Proof. We only prove the ﬁrst inequality. The second one follows from the
deﬁnition of limsup and liminf, the third one can be proved like the ﬁrst
one. So we let
Z
k
:= inf
n≥k
f
n
so that Z
k
↑ liminf
n
f
n
and, a.s.,
|Z
k
| ≤ g and | liminf
n
f
n
| ≤ g.
3.2. BASIC PROPERTIES OF THE EXPECTED VALUE 55
Applying monotone convergence in the form of Corollary 3.2.5 gives that
Eliminf
n
f
n
= lim
k
EZ
k
= lim
k

E inf
n≥k
f
n

≤ lim
k

inf
n≥k
Ef
n

= liminf
n
Ef
n
.

Proposition 3.2.7 [Lebesgue’s Theorem, dominated convergence]
Let (Ω, F, P) be a probability space and g, f, f
1
, f
2
, ... : Ω → R be random
variables with |f
n
(ω)| ≤ g(ω) a.s. Assume that g is integrable and that
f(ω) = lim
n→∞
f
n
(ω) a.s. Then f is integrable and one has that
Ef = lim
n
Ef
n
.
Proof. Applying Fatou’s Lemma gives
Ef = Eliminf
n→∞
f
n
≤ liminf
n→∞
Ef
n
≤ limsup
n→∞
Ef
n
≤ Elimsup
n→∞
f
n
= Ef.

Finally, we state a useful formula for independent random variables.
Proposition 3.2.8 If f and g are independent and E|f| < ∞and E|g| < ∞,
then E|fg| < ∞ and
Efg = EfEg.
The proof is an exercise. Concerning the variance of the sum of independent
random variables we get the fundamental
Proposition 3.2.9 Let f
1
, ..., f
n
be independent random variables with ﬁnite
second moment. Then one has that
var(f
1
+· · · + f
n
) = var(f
1
) +· · · + var(f
n
).
Proof. The formula follows from
var(f
1
+· · · + f
n
) = E((f
1
+· · · + f
n
) −E(f
1
+· · · + f
n
))
2
= E

n
¸
i=1
(f
i
−Ef
i
)

2
= E
n
¸
i,j=1
(f
i
−Ef
i
)(f
j
−Ef
j
)
=
n
¸
i,j=1
E((f
i
−Ef
i
)(f
j
−Ef
j
))
56 CHAPTER 3. INTEGRATION
=
n
¸
i=1
E(f
i
−Ef
i
)
2
+
¸
i=j
E((f
i
−Ef
i
)(f
j
−Ef
j
))
=
n
¸
i=1
var(f
i
) +
¸
i=j
E(f
i
−Ef
i
)E(f
j
−Ef
j
)
=
n
¸
i=1
var(f
i
)
because E(f
i
−Ef
i
) = Ef
i
−Ef
i
= 0.
3.3 Connections to the Riemann-integral
In two typical situations we formulate (without proof) how our expected
value connects to the Riemann-integral. For this purpose we use the Lebesgue
measure deﬁned in Section 1.3.4.
Proposition 3.3.1 Let f : [0, 1] →R be a continuous function. Then

1
0
f(x)dx = Ef
with the Riemann-integral on the left-hand side and the expectation of the
random variable f with respect to the probability space ([0, 1], B([0, 1]), λ),
where λ is the Lebesgue measure, on the right-hand side.
Now we consider a continuous function p : R →[0, ∞) such that

−∞
p(x)dx = 1
and deﬁne a measure P on B(R) by
P((a
1
, b
1
] ∩ · · · ∩ (a
n
, b
n
]) :=
n
¸
i=1

b
i
a
i
p(x)dx
for −∞ ≤ a
1
≤ b
1
≤ · · · ≤ a
n
≤ b
n
≤ ∞ (again with the convention that
(a, ∞] = (a, ∞)) via Carath´eodory’s Theorem (Proposition 1.2.17). The
function p is called density of the measure P.
Proposition 3.3.2 Let f : R →R be a continuous function such that

−∞
|f(x)|p(x)dx < ∞.
3.3. CONNECTIONS TO THE RIEMANN-INTEGRAL 57
Then

−∞
f(x)p(x)dx = Ef
with the Riemann-integral on the left-hand side and the expectation of the
random variable f with respect to the probability space (R, B(R), P) on the
right-hand side.
Let us consider two examples indicating the diﬀerence between the Riemann-
integral and our expected value.
Example 3.3.3 We give the standard example of a function which has an
expected value, but which is not Riemann-integrable. Let
f(x) :=

1, x ∈ [0, 1] irrational
0, x ∈ [0, 1] rational
.
Then f is not Riemann integrable, but Lebesgue integrable with Ef = 1 if
we use the probability space ([0, 1], B([0, 1]), λ).
Example 3.3.4 The expression
lim
t→∞

t
0
sin x
x
dx =
π
2
is deﬁned as limit in the Riemann sense although

0

sin x
x

+
dx = ∞ and

0

sin x
x

dx = ∞.
Transporting this into a probabilistic setting we take the exponential distri-
bution with parameter λ > 0 from Section 1.3.6. Let f : R → R be given
by f(x) = 0 if x ≤ 0 and f(x) :=
sin x
λx
e
λx
if x > 0 and recall that the
exponential distribution µ
λ
with parameter λ > 0 is given by the density
p
λ
(x) = 1I
[0,∞)
(x)λe
−λx
. The above yields that
lim
t→∞

t
0
f(x)p
λ
(x)dx =
π
2
but

R
f(x)
+

λ
(x) =

R
f(x)

λ
(x) = ∞.
Hence the expected value of f does not exists, but the Riemann-integral gives
a way to deﬁne a value, which makes sense. The point of this example is
rather abstract expected value.
58 CHAPTER 3. INTEGRATION
3.4 Change of variables in the expected value
We want to prove a change of variable formula for the integrals

fdP. In
many cases, only by this formula it is possible to compute explicitly expected
values.
Proposition 3.4.1 [Change of variables] Let (Ω, F, P) be a probability
space, (E, E) be a measurable space, ϕ : Ω → E be a measurable map, and
g : E →R be a random variable. Assume that P
ϕ
is the image measure of P
with respect to ϕ
1
, that means
P
ϕ
(A) = P({ω : ϕ(ω) ∈ A}) = P(ϕ
−1
(A)) for all A ∈ E.
Then

A
g(η)dP
ϕ
(η) =

ϕ
−1
(A)
g(ϕ(ω))dP(ω)
for all A ∈ E in the sense that if one integral exists, the other exists as well,
and their values are equal.
Proof. (i) Letting ¯ g(η) := 1I
A
(η)g(η) we have
¯ g(ϕ(ω)) = 1I
ϕ
−1
(A)
(ω)g(ϕ(ω))
so that it is suﬃcient to consider the case A = E. Hence we have to show
that

E
g(η)dP
ϕ
(η) =

g(ϕ(ω))dP(ω).
(ii) Since, for f(ω) := g(ϕ(ω)) one has that f
+
= g
+
◦ϕ and f

= g

◦ϕ it is
suﬃcient to consider the positive part of g and its negative part separately.
In other words, we can assume that g(η) ≥ 0 for all η ∈ E.
(iii) Assume now a sequence of measurable step-function 0 ≤ g
n
(η) ↑ g(η)
for all η ∈ E which does exist according to Lemma 3.2.2 so that g
n
(ϕ(ω)) ↑
g(ϕ(ω)) for all ω ∈ Ω as well. If we can show that

E
g
n
(η)dP
ϕ
(η) =

g
n
(ϕ(ω))dP(ω)
then we are done. By additivity it is enough to check g
n
(η) = 1I
B
(η) for some
B ∈ E (if this is true for this case, then one can multiply by real numbers
and can take sums and the equality remains true). But now we get

E
g
n
(η)dP
ϕ
(η) = P
ϕ
(B) = P(ϕ
−1
(B)) =

1I
ϕ
−1
(B)
(ω)dP(ω)
=

1I
B
(ϕ(ω))dP(ω) =

g
n
(ϕ(ω))dP(ω).

Let us give an example for the change of variable formula.
1
In other words, P
ϕ
is the law of ϕ.
3.5. FUBINI’S THEOREM 59
Deﬁnition 3.4.2 [Moments] Assume that n ∈ {1, 2, ....}.
(1) For a random variable f : Ω → R the expected value E|f|
n
is called
n-th absolute moment of f. If Ef
n
exists, then Ef
n
is called n-th
moment of f.
(2) For a probability measure µ on (R, B(R)) the expected value

R
|x|
n
dµ(x)
is called n-th absolute moment of µ. If

R
x
n
dµ(x) exists, then

R
x
n
dµ(x) is called n-th moment of µ.
Corollary 3.4.3 Let (Ω, F, P) be a probability space and f : Ω → R be a
random variable with law P
f
. Then, for all n = 1, 2, ...,
E|f|
n
=

R
|x|
n
dP
f
(x) and Ef
n
=

R
x
n
dP
f
(x),
where the latter equality has to be understood as follows: if one side exists,
then the other exists as well and they coincide. If the law P
f
has a density
p in the sense of Proposition 3.3.2, then

R
|x|
n
dP
f
(x) can be replaced by

R
|x|
n
p(x)dx and

R
x
n
dP
f
(x) by

R
x
n
p(x)dx.
3.5 Fubini’s Theorem
In this section we consider iterated integrals, as they appear very often in
applications, and show in Fubini’s
2
Theorem that integrals with respect
to product measures can be written as iterated integrals and that one can
change the order of integration in these iterated integrals. In many cases
this provides an appropriate tool for the computation of integrals. Before we
start with Fubini’s Theorem we need some preparations. First we recall the
notion of a vector space.
Deﬁnition 3.5.1 [vector space] A set L equipped with operations ”+” :
L×L →L and ”·” : R×L →L is called vector space over R if the following
conditions are satisﬁed:
(1) x + y = y + x for all x, y ∈ L.
(2) x + (y + z) = (x + y) + z for all x, y, z ∈ L.
(3) There exists a 0 ∈ L such that x + 0 = x for all x ∈ L.
(4) For all x ∈ L there exists a −x such that x + (−x) = 0.
2
Guido Fubini, 19/01/1879 (Venice, Italy) - 06/06/1943 (New York, USA).
60 CHAPTER 3. INTEGRATION
(5) 1 x = x.
(6) α(βx) = (αβ)x for all α, β ∈ R and x ∈ L.
(7) (α + β)x = αx + βx for all α, β ∈ R and x ∈ L.
(8) α(x + y) = αx + αy for all α ∈ R and x, y ∈ L.
Usually one uses the notation x−y := x+(−y) and −x+y := (−x) +y etc.
Now we state the Monotone Class Theorem. It is a powerful tool by which,
for example, measurability assertions can be proved.
Proposition 3.5.2 [Monotone Class Theorem] Let H be a class of
bounded functions from Ω into R satisfying the following conditions:
(1) H is a vector space over R where the natural point-wise operations ”+”
and ” · ” are used.
(2) 1I

∈ H.
(3) If f
n
∈ H, f
n
≥ 0, and f
n
↑ f, where f is bounded on Ω, then f ∈ H.
Then one has the following: if H contains the indicator function of every
set from some π-system I of subsets of Ω, then H contains every bounded
σ(I)-measurable function on Ω.
Proof. See for example [5] (Theorem 3.14).
For the following it is convenient to allow that the random variables may
take inﬁnite values.
Deﬁnition 3.5.3 [extended random variable] Let (Ω, F) be a measur-
able space. A function f : Ω → R ∪ {−∞, ∞} is called extended random
variable iﬀ
f
−1
(B) := {ω : f(ω) ∈ B} ∈ F for all B ∈ B(R) or B = {−∞}.
If we have a non-negative extended random variable, we let (for example)

fdP = lim
N→∞

[f ∧ N]dP.
For the following, we recall that the product space (Ω
1
×Ω
2
, F
1
⊗F
2
, P
1
×P
2
)
of the two probability spaces (Ω
1
, F
1
, P
1
) and (Ω
2
, F
2
, P
2
) was deﬁned in
Deﬁnition 1.2.19.
3.5. FUBINI’S THEOREM 61
Proposition 3.5.4 [Fubini’s Theorem for non-negative functions]
Let f : Ω
1
× Ω
2
→ R be a non-negative F
1
⊗ F
2
-measurable function such
that

1
×Ω
2
f(ω
1
, ω
2
)d(P
1
×P
2
)(ω
1
, ω
2
) < ∞. (3.1)
Then one has the following:
(1) The functions ω
1
→ f(ω
1
, ω
0
2
) and ω
2
→ f(ω
0
1
, ω
2
) are F
1
-measurable
and F
2
-measurable, respectively, for all ω
0
i
∈ Ω
i
.
(2) The functions
ω
1

2
f(ω
1
, ω
2
)dP
2

2
) and ω
2

1
f(ω
1
, ω
2
)dP
1

1
)
are extended F
1
-measurable and F
2
-measurable, respectively, random
variables.
(3) One has that

1
×Ω
2
f(ω
1
, ω
2
)d(P
1
×P
2
) =

1
¸

2
f(ω
1
, ω
2
)dP
2

2
)

dP
1

1
)
=

2
¸

1
f(ω
1
, ω
2
)dP
1

1
)

dP
2

2
).
It should be noted, that item (3) together with Formula (3.1) automatically
implies that
P
2

ω
2
:

1
f(ω
1
, ω
2
)dP
1

1
) = ∞

= 0
and
P
1

ω
1
:

2
f(ω
1
, ω
2
)dP
2

2
) = ∞

= 0.
Proof of Proposition 3.5.4.
(i) First we remark it is suﬃcient to prove the assertions for
f
N

1
, ω
2
) := min {f(ω
1
, ω
2
), N}
which is bounded. The statements (1), (2), and (3) can be obtained via
N → ∞ if we use Proposition 2.1.4 to get the necessary measurabilities
(which also works for our extended random variables) and the monotone
convergence formulated in Proposition 3.2.4 to get to values of the integrals.
Hence we can assume for the following that sup
ω
1

2
f(ω
1
, ω
2
) < ∞.
(ii) We want to apply the Monotone Class Theorem Proposition 3.5.2. Let
H be the class of bounded F
1
× F
2
-measurable functions f : Ω
1
× Ω
2
→ R
such that
62 CHAPTER 3. INTEGRATION
(a) the functions ω
1
→ f(ω
1
, ω
0
2
) and ω
2
→ f(ω
0
1
, ω
2
) are F
1
-measurable
and F
2
-measurable, respectively, for all ω
0
i
∈ Ω
i
,
(b) the functions
ω
1

2
f(ω
1
, ω
2
)dP
2

2
) and ω
2

1
f(ω
1
, ω
2
)dP
1

1
)
are F
1
-measurable and F
2
-measurable, respectively,
(c) one has that

1
×Ω
2
f(ω
1
, ω
2
)d(P
1
×P
2
) =

1
¸

2
f(ω
1
, ω
2
)dP
2

2
)

dP
1

1
)
=

2
¸

1
f(ω
1
, ω
2
)dP
1

1
)

dP
2

2
).
Again, using Propositions 2.1.4 and 3.2.4 we see that H satisﬁes the assump-
tions (1), (2), and (3) of Proposition 3.5.2. As π-system I we take the system
of all F = A×B with A ∈ F
1
and B ∈ F
2
. Letting f(ω
1
, ω
2
) = 1I
A

1
)1I
B

2
)
we easily can check that f ∈ H. For instance, property (c) follows from

1
×Ω
2
f(ω
1
, ω
2
)d(P
1
×P
2
) = (P
1
×P
2
)(A ×B) = P
1
(A)P
2
(B)
and, for example,

1
¸

2
f(ω
1
, ω
2
)dP
2

2
)

dP
1

1
) =

1
1I
A

1
)P
2
(B)dP
1

1
)
= P
1
(A)P
2
(B).
Applying the Monotone Class Theorem Proposition 3.5.2 gives that H con-
tains all bounded functions f : Ω
1
×Ω
2
→R measurable with respect F
1
⊗F
2
.
Hence we are done.
Now we state Fubini’s Theorem for general random variables f : Ω
1
×Ω
2

R.
Proposition 3.5.5 [Fubini’s Theorem] Let f : Ω
1
× Ω
2
→ R be an
F
1
⊗F
2
-measurable function such that

1
×Ω
2
|f(ω
1
, ω
2
)|d(P
1
×P
2
)(ω
1
, ω
2
) < ∞. (3.2)
Then the following holds:
(1) The functions ω
1
→ f(ω
1
, ω
0
2
) and ω
2
→ f(ω
0
1
, ω
2
) are F
1
-measurable
and F
2
-measurable, respectively, for all ω
0
i
∈ Ω
i
.
3.5. FUBINI’S THEOREM 63
(2) There are M
i
∈ F
i
with P
i
(M
i
) = 1 such that the integrals

1
f(ω
1
, ω
0
2
)dP
1

1
) and

2
f(ω
0
1
, ω
2
)dP
2

2
)
exist and are ﬁnite for all ω
0
i
∈ M
i
.
(3) The maps
ω
1
→1I
M
1

1
)

2
f(ω
1
, ω
2
)dP
2

2
)
and
ω
2
→1I
M
2

2
)

1
f(ω
1
, ω
2
)dP
1

1
)
are F
1
-measurable and F
2
-measurable, respectively, random variables.
(4) One has that

1
×Ω
2
f(ω
1
, ω
2
)d(P
1
×P
2
)
=

1
¸
1I
M
1

1
)

2
f(ω
1
, ω
2
)dP
2

2
)

dP
1

1
)
=

2
¸
1I
M
2

2
)

1
f(ω
1
, ω
2
)dP
1

1
)

dP
2

2
).
Remark 3.5.6 (1) Our understanding is that writing, for example, an
expression like
1I
M
2

2
)

1
f(ω
1
, ω
2
)dP
1

1
)
we only consider and compute the integral for ω
2
∈ M
2
.
(2) The expressions in (3.1) and (3.2) can be replaced by

1
¸

2
f(ω
1
, ω
2
)dP
2

2
)

dP
1

1
) < ∞,
and the same expression with |f(ω
1
, ω
2
1
, ω
2
), respec-
tively.
Proof of Proposition 3.5.5. The proposition follows by decomposing f =
f
+
−f

and applying Proposition 3.5.4.
In the following example we show how to compute the integral

−∞
e
−x
2
dx
by Fubini’s Theorem.
64 CHAPTER 3. INTEGRATION
Example 3.5.7 Let f : R ×R →R be a non-negative continuous function.
Fubini’s Theorem applied to the uniform distribution on [−N, N], N ∈
{1, 2, ...} gives that

N
−N
¸
N
−N
f(x, y)
dλ(y)
2N

dλ(x)
2N
=

[−N,N]×[−N,N]
f(x, y)
d(λ ×λ)(x, y)
(2N)
2
where λ is the Lebesgue measure. Letting f(x, y) := e
−(x
2
+y
2
)
, the above
yields that

N
−N
¸
N
−N
e
−x
2
e
−y
2
dλ(y)

dλ(x) =

[−N,N]×[−N,N]
e
−(x
2
+y
2
)
d(λ ×λ)(x, y).
For the left-hand side we get
lim
N→∞

N
−N
¸
N
−N
e
−x
2
e
−y
2
dλ(y)

dλ(x)
= lim
N→∞

N
−N
e
−x
2
¸
N
−N
e
−y
2
dλ(y)

dλ(x)
=
¸
lim
N→∞

N
−N
e
−x
2
dλ(x)

2
=
¸

−∞
e
−x
2
dλ(x)

2
.
For the right-hand side we get
lim
N→∞

[−N,N]×[−N,N]
e
−(x
2
+y
2
)
d(λ ×λ)(x, y)
= lim
R→∞

x
2
+y
2
≤R
2
e
−(x
2
+y
2
)
d(λ ×λ)(x, y)
= lim
R→∞

R
0

0
e
−r
2
rdrdϕ
= π lim
R→∞

1 −e
−R
2

= π
where we have used polar coordinates. Comparing both sides gives

−∞
e
−x
2
dλ(x) =

π.
As corollary we show that the deﬁnition of the Gaussian measure in Section
1.3.5 was “correct”.
3.5. FUBINI’S THEOREM 65
Proposition 3.5.8 For σ > 0 and m ∈ R let
p
m,σ
2(x) :=
1

2πσ
2
e

(x−m)
2

2
.
Then,

R
p
m,σ
2(x)dx = 1,

R
xp
m,σ
2(x)dx = m, and

R
(x −m)
2
p
m,σ
2(x)dx = σ
2
. (3.3)
In other words: if a random variable f : Ω → R has as law the normal
distribution N
m,σ
2, then
Ef = m and E(f −Ef)
2
= σ
2
. (3.4)
Proof. By the change of variable x → m + σx it is suﬃcient to show the
statements for m = 0 and σ = 1. Firstly, by putting x = z/

2 one gets
1 =
1

π

−∞
e
−x
2
dx =
1

−∞
e

z
2
2
dz
where we have used Example 3.5.7 so that

R
p
0,1
(x)dx = 1. Secondly,

R
xp
0,1
(x)dx = 0
follows from the symmetry of the density p
0,1
(x) = p
0,1
(−x). Finally, by
(x exp(−x
2
/2))

= exp(−x
2
/2) −x
2
exp(−x
2
/2) one can also compute that
1

−∞
x
2
e

x
2
2
dx =
1

−∞
e

x
2
2
dx = 1.

We close this section with a “counterexample” to Fubini’s Theorem.
Example 3.5.9 Let Ω = [−1, 1] ×[−1, 1] and µ be the uniform distribution
on [−1, 1] (see Section 1.3.4). The function
f(x, y) :=
xy
(x
2
+ y
2
)
2
for (x, y) = (0, 0) and f(0, 0) := 0 is not integrable on Ω, even though the
iterated integrals exist end are equal. In fact

1
−1
f(x, y)dµ(x) = 0 and

1
−1
f(x, y)dµ(y) = 0
so that

1
−1

1
−1
f(x, y)dµ(x)

dµ(y) =

1
−1

1
−1
f(x, y)dµ(y)

dµ(x) = 0.
66 CHAPTER 3. INTEGRATION
On the other hand, using polar coordinates we get
4

[−1,1]×[−1,1]
|f(x, y)|d(µ ×µ)(x, y) ≥

1
0

0
| sin ϕcos ϕ|
r
dϕdr
= 2

1
0
1
r
dr = ∞.
The inequality holds because on the right hand side we integrate only over
the area {(x, y) : x
2
+ y
2
≤ 1} which is a subset of [−1, 1] ×[−1, 1] and

0
| sin ϕcos ϕ|dϕ = 4

π/2
0
sin ϕcos ϕdϕ = 2
follows by a symmetry argument.
3.6 Some inequalities
In this section we prove some basic inequalities.
Proposition 3.6.1 [Chebyshev’s inequality]
3
Let f be a non-negative
integrable random variable deﬁned on a probability space (Ω, F, P). Then, for
all λ > 0,
P({ω : f(ω) ≥ λ}) ≤
Ef
λ
.
Proof. We simply have
λP({ω : f(ω) ≥ λ}) = λE1I
{f≥λ}
≤ Ef1I
{f≥λ}
≤ Ef.

Deﬁnition 3.6.2 [convexity] A function g : R →R is convex if and only
if
g(px + (1 −p)y) ≤ pg(x) + (1 −p)g(y)
for all 0 ≤ p ≤ 1 and all x, y ∈ R. A function g is concave if −g is convex.
Every convex function g : R → R is continuous (check!) and hence
(B(R), B(R))-measurable.
Proposition 3.6.3 [Jensen’s inequality]
4
If g : R → R is convex and
f : Ω →R a random variable with E|f| < ∞, then
g(Ef) ≤ Eg(f)
where the expected value on the right-hand side might be inﬁnity.
3
Pafnuty Lvovich Chebyshev, 16/05/1821 (Okatovo, Russia) - 08/12/1894 (St Peters-
burg, Russia)
4
Johan Ludwig William Valdemar Jensen, 08/05/1859 (Nakskov, Denmark)- 05/
03/1925 (Copenhagen, Denmark).
3.6. SOME INEQUALITIES 67
Proof. Let x
0
= Ef. Since g is convex we ﬁnd a “supporting line” in x
0
, that
means a, b ∈ R such that
ax
0
+ b = g(x
0
) and ax + b ≤ g(x)
for all x ∈ R. It follows af(ω) + b ≤ g(f(ω)) for all ω ∈ Ω and
g(Ef) = aEf + b = E(af + b) ≤ Eg(f).

Example 3.6.4 (1) The function g(x) := |x| is convex so that, for any
integrable f,
|Ef| ≤ E|f|.
(2) For 1 ≤ p < ∞ the function g(x) := |x|
p
is convex, so that Jensen’s
inequality applied to |f| gives that
(E|f|)
p
≤ E|f|
p
.
For the second case in the example above there is another way we can go. It
uses the famous H
¨
older-inequality.
Proposition 3.6.5 [H
¨
older’s inequality]
5
Assume a probability space
(Ω, F, P) and random variables f, g : Ω →R. If 1 < p, q < ∞ with
1
p
+
1
q
= 1,
then
E|fg| ≤ (E|f|
p
)
1
p
(E|g|
q
)
1
q
.
Proof. We can assume that E|f|
p
> 0 and E|g|
q
> 0. For example, assuming
E|f|
p
= 0 would imply |f|
p
= 0 a.s. according to Proposition 3.2.1 so that
fg = 0 a.s. and E|fg| = 0. Hence we may set
˜
f :=
f
(E|f|
p
)
1
p
and ˜ g :=
g
(E|g|
q
)
1
q
.
We notice that
x
a
y
b
≤ ax + by
for x, y ≥ 0 and positive a, b with a+b = 1, which follows from the concavity
of the logarithm (we can assume for a moment that x, y > 0)
ln(ax + by) ≥ a ln x + b ln y = ln x
a
+ ln y
b
= ln x
a
y
b
.
5
Otto Ludwig H¨older, 22/12/1859 (Stuttgart, Germany) - 29/08/1937 (Leipzig, Ger-
many).
68 CHAPTER 3. INTEGRATION
Setting x := |
˜
f|
p
, y := |˜ g|
q
, a :=
1
p
, and b :=
1
q
, we get
|
˜
f˜ g| = x
a
y
b
≤ ax + by =
1
p
|
˜
f|
p
+
1
q
|˜ g|
q
and
E|
˜
f˜ g| ≤
1
p
E|
˜
f|
p
+
1
q
E|˜ g|
q
=
1
p
+
1
q
= 1.
On the other hand side,
E|
˜
f˜ g| =
E|fg|
(E|f|
p
)
1
p
(E|g|
q
)
1
q
so that we are done.
Corollary 3.6.6 For 0 < p < q < ∞ one has that (E|f|
p
)
1
p
≤ (E|f|
q
)
1
q
.
The proof is an exercise.
Corollary 3.6.7 [H
¨
older’s inequality for sequences] Let (a
n
)

n=1
and (b
n
)

n=1
be sequences of real numbers. If 1 < p, q < ∞ with
1
p
+
1
q
= 1,
then

¸
n=1
|a
n
b
n
| ≤

¸
n=1
|a
n
|
p
1
p

¸
n=1
|b
n
|
q
1
q
.
Proof. It is suﬃcient to prove the inequality for ﬁnite sequences (b
n
)
N
n=1
since
by letting N → ∞ we get the desired inequality for inﬁnite sequences. Let
Ω = {1, ..., N}, F := 2

, and P({k}) := 1/N. Deﬁning f, g : Ω → R by
f(k) := a
k
and g(k) := b
k
we get
1
N
N
¸
n=1
|a
n
b
n
| ≤

1
N
N
¸
n=1
|a
n
|
p

1
p

1
N
N
¸
n=1
|b
n
|
q

1
q
from Proposition 3.6.5. Multiplying by N and letting N → ∞ gives our
assertion.
Proposition 3.6.8 [Minkowski inequality]
6
Assume a probability space
(Ω, F, P), random variables f, g : Ω →R, and 1 ≤ p < ∞. Then
(E|f + g|
p
)
1
p
≤ (E|f|
p
)
1
p
+ (E|g|
p
)
1
p
. (3.5)
6
Hermann Minkowski, 22/06/1864 (Alexotas, Russian Empire; now Kaunas, Lithuania)
- 12/01/1909 (G¨ottingen, Germany).
3.6. SOME INEQUALITIES 69
Proof. For p = 1 the inequality follows from |f + g| ≤ |f| + |g|. So assume
that 1 < p < ∞. The convexity of x →|x|
p
gives that

a + b
2

p

|a|
p
+|b|
p
2
and (a+b)
p
≤ 2
p−1
(a
p
+b
p
) for a, b ≥ 0. Consequently, |f +g|
p
≤ (|f|+|g|)
p

2
p−1
(|f|
p
+|g|
p
) and
E|f + g|
p
≤ 2
p−1
(E|f|
p
+E|g|
p
).
Assuming now that (E|f|
p
)
1
p
+ (E|g|
p
)
1
p
< ∞, otherwise there is nothing to
prove, we get that E|f +g|
p
< ∞as well by the above considerations. Taking
1 < q < ∞ with
1
p
+
1
q
= 1, we continue with
E|f + g|
p
= E|f + g||f + g|
p−1
≤ E(|f| +|g|)|f + g|
p−1
= E|f||f + g|
p−1
+E|g||f + g|
p−1
≤ (E|f|
p
)
1
p

E|f + g|
(p−1)q

1
q
+ (E|g|
p
)
1
p

E|f + g|
(p−1)q

1
q
,
where we have used H
¨
older’s inequality. Since (p − 1)q = p, (3.5) follows
by dividing the above inequality by (E|f + g|
p
)
1
q
and taking into the account
1 −
1
q
=
1
p
.
We close with a simple deviation inequality for f.
Corollary 3.6.9 Let f be a random variable deﬁned on a probability space
(Ω, F, P) such that Ef
2
< ∞. Then one has, for all λ > 0,
P(|f −Ef| ≥ λ) ≤
E(f −Ef)
2
λ
2

Ef
2
λ
2
.
Proof. From Corollary 3.6.6 we get that E|f| < ∞ so that Ef exists. Apply-
ing Proposition 3.6.1 to |f −Ef|
2
gives that
P({|f −Ef| ≥ λ}) = P({|f −Ef|
2
≥ λ
2
}) ≤
E|f −Ef|
2
λ
2
.
Finally, we use that E(f −Ef)
2
= Ef
2
−(Ef)
2
≤ Ef
2
.
70 CHAPTER 3. INTEGRATION
Deﬁnition 3.7.1 (Signed measures) Let (Ω, F) be a measurable space.
(i) A map µ : F →R is called (ﬁnite) signed measure if and only if
µ = αµ
+
−βµ

,
where α, β ≥ 0 and µ
+
and µ

are probability measures on F.
(ii) Assume that (Ω, F, P) is a probability space and that µ is a signed
measure on (Ω, F). Then µ P (µ is absolutely continuous with
respect to P) if and only if
P(A) = 0 implies µ(A) = 0.
Example 3.7.2 Let (Ω, F, P) be a probability space, L : Ω → R be an
integrable random variable, and
µ(A) :=

A
LdP.
Then µ is a signed measure and µ P.
Proof. We let L
+
:= max {L, 0} and L

:= max {−L, 0} so that L = L
+
−L

.
Assume that

L
±
dP > 0 and deﬁne
µ
±
(A) =

1I
A
L
±
dP

L
±
dP
.
Now we check that µ
±
are probability measures. First we have that
µ
±
(Ω) =

1I

L
±
dP

L
±
dP
= 1.
Then assume A
n
∈ F to be disjoint sets such that A =

¸
n=1
A
n
. Set α :=

L
+
dP. Then
µ
+

n=1
A
n

=
1
α

1I ∞

n=1
A
n
L
+
dP
=
1
α

¸
n=1
1I
A
n
(ω)

L
+
dP
=
1
α

lim
N→∞

N
¸
n=1
1I
A
n

L
+
dP
=
1
α
lim
N→∞

N
¸
n=1
1I
A
n

L
+
dP
3.8. MODES OF CONVERGENCE 71
=

¸
n=1
µ
+
(A
n
)
where we have used Lebesgue’s dominated convergence theorem. The same
can be done for L

.
Theorem 3.7.3 (Radon-Nikodym) Let (Ω, F, P) be a probability space
and µ a signed measure with µ P. Then there exists an integrable random
variable L : Ω →R such that
µ(A) =

A
L(ω)dP(ω), A ∈ F. (3.6)
The random variable L is unique in the following sense. If L and L

are
random variables satisfying (3.6), then
P(L = L

) = 0.
7
in 1913 in the case of
R
n
. The extension to the general case was done by Nikodym
8
in 1930.
Deﬁnition 3.7.4 L is called Radon-Nikodym derivative. We shall write
L =

dP
.
We should keep in mind the rule
µ(A) =

1I
A
dµ =

1I
A
LdP,
so that ’dµ = LdP’.
3.8 Modes of convergence
First we introduce some basic types of convergence.
Deﬁnition 3.8.1 [Types of convergence] Let (Ω, F, P) be a probabil-
ity space and f, f
1
, f
2
, ... : Ω →R random variables.
(1) The sequence (f
n
)

n=1
converges almost surely (a.s.) or with prob-
ability 1 to f (f
n
→f a.s. or f
n
→f P-a.s.) if and only if
P({ω : f
n
(ω) →f(ω) as n →∞}) = 1.
7
Johann Radon, 16/12/1887 (Tetschen, Bohemia; now Decin, Czech Republic) -
25/05/1956 (Vienna, Austria).
8
Otton Marcin Nikodym, 13/08/1887 (Zablotow, Galicia, Austria-Hungary; now
Ukraine) - 04/05/1974 (Utica, USA).
72 CHAPTER 3. INTEGRATION
(2) The sequence (f
n
)

n=1
converges in probability to f (f
n
P
→f) if and
only if for all ε > 0 one has
P({ω : |f
n
(ω) −f(ω)| > ε}) →0 as n →∞.
(3) If 0 < p < ∞, then the sequence (f
n
)

n=1
converges with respect to
L
p
or in the L
p
-mean to f (f
n
L
p
→f) if and only if
E|f
n
−f|
p
→0 as n →∞.
Note that {ω : f
n
(ω) →f(ω) as n →∞} and {ω : |f
n
(ω) −f(ω)| > ε} are
measurable sets.
For the above types of convergence the random variables have to be deﬁned
on the same probability space. There is a variant without this assumption.
Deﬁnition 3.8.2 [Convergence in distribution] Let (Ω
n
, F
n
, P
n
) and
(Ω, F, P) be probability spaces and let f
n
: Ω
n
→ R and f : Ω → R be
random variables. Then the sequence (f
n
)

n=1
converges in distribution
to f (f
n
d
→f) if and only if
Eψ(f
n
) →Eψ(f) as n →∞
for all bounded and continuous functions ψ : R →R.
We have the following relations between the above types of convergence.
Proposition 3.8.3 Let (Ω, F, P) be a probability space and f, f
1
, f
2
, ... : Ω →
R be random variables.
(1) If f
n
→f a.s., then f
n
P
→f.
(2) If 0 < p < ∞ and f
n
L
p
→f, then f
n
P
→f.
(3) If f
n
P
→f, then f
n
d
→f.
(4) One has that f
n
d
→ f if and only if F
f
n
(x) → F
f
(x) at each point x of
continuity of F
f
(x), where F
f
n
and F
f
are the distribution-functions of
f
n
and f, respectively.
(5) If f
n
P
→ f, then there is a subsequence 1 ≤ n
1
< n
2
< n
3
< · · · such
that f
n
k
→f a.s. as k →∞.
Proof. See [4].
3.8. MODES OF CONVERGENCE 73
Example 3.8.4 Assume ([0, 1], B([0, 1]), λ) where λ is the Lebesgue mea-
sure. We take
f
1
= 1I
[0,
1
2
)
, f
2
= 1I
[
1
2
,1]
,
f
3
= 1I
[0,
1
4
)
, f
4
= 1I
[
1
4
,
1
2
]
, f
5
= 1I
[
1
2
,
3
4
)
, f
6
= 1I
[
3
4
,1]
,
f
7
= 1I
[0,
1
8
)
, . . .
This implies lim
n→∞
f
n
(x) →0 for all x ∈ [0, 1]. But it holds convergence in
probability f
n
λ
→0: choosing 0 < ε < 1 we get
λ({x ∈ [0, 1] : |f
n
(x)| > ε}) = λ({x ∈ [0, 1] : f
n
(x) = 0})
=

1
2
if n = 1, 2
1
4
if n = 3, 4, . . . , 6
1
8
if n = 7, . . .
.
.
.
As a preview on the next probability course we give some examples for the
above concepts of convergence. We start with the weak law of large numbers
as an example of the convergence in probability:
Proposition 3.8.5 [Weak law of large numbers] Let (f
n
)

n=1
be a
sequence of independent random variables with
Ef
k
= m and E(f
k
−m)
2
= σ
2
for all k = 1, 2, . . . .
Then
f
1
+· · · + f
n
n
P
−→m as n →∞,
that means, for each ε > 0,
lim
n
P

ω : |
f
1
+· · · + f
n
n
−m| > ε

→0.
Proof. By Chebyshev’s inequality (Corollary 3.6.9) we have that
P

ω :

f
1
+· · · + f
n
−nm
n

> ε

E|f
1
+· · · + f
n
−nm|
2
n
2
ε
2
=
E(
¸
n
k=1
(f
k
−m))
2
n
2
ε
2
=

2
n
2
ε
2
→0
as n →∞.
Using a stronger condition, we get easily more: the almost sure convergence
instead of the convergence in probability. This gives a form of the strong law
of large numbers.
74 CHAPTER 3. INTEGRATION
Proposition 3.8.6 [Strong law of large numbers]
Let (f
n
)

n=1
be a sequence of independent random variables with Ef
k
= 0,
k = 1, 2, . . . , and c := sup
n
Ef
4
n
< ∞. Then
f
1
+· · · + f
n
n
a.s.
→ 0.
Proof. Let S
n
:=
¸
n
k=1
f
k
. It holds
ES
4
n
= E

n
¸
k=1
f
k

4
= E
n
¸
i,j,k,l,=1
f
i
f
j
f
k
f
l
=
n
¸
k=1
Ef
4
k
+ 3
n
¸
k,l=1
k=l
Ef
2
k
Ef
2
l
,
because for distinct {i, j, k, l} it holds
Ef
i
f
3
j
= Ef
i
f
2
j
f
k
= Ef
i
f
j
f
k
f
l
= 0
by independence. For example, Ef
i
f
3
j
= Ef
i
Ef
3
j
= 0 · Ef
3
j
= 0, where one
gets that f
3
j
is integrable by E|f
j
|
3
≤ (E|f
j
|
4
)
3
4
≤ c
3
4
. Moreover, by Jensen’s
inequality,

Ef
2
k

2
≤ Ef
4
k
≤ c.
Hence Ef
2
k
f
2
l
= Ef
2
k
Ef
2
l
≤ c for k = l. Consequently,
ES
4
n
≤ nc + 3n(n −1)c ≤ 3cn
2
,
and
E

¸
n=1
S
4
n
n
4
=

¸
n=1
E
S
4
n
n
4

¸
n=1
3c
n
2
< ∞.
This implies that
S
4
n
n
4
a.s.
→ 0 and therefore
S
n
n
a.s.
→ 0.
There are several strong laws of large numbers with other, in particular
weaker, conditions. We close with a fundamental example concerning the
convergence in distribution: the Central Limit Theorem (CLT). For this we
need
Deﬁnition 3.8.7 Let (Ω, F, P) be a probability spaces. A sequence of
Independent random variables f
n
: Ω → R is called Identically Distributed
(i.i.d.) provided that the random variables f
n
have the same law, that means
P(f
n
≤ λ) = P(f
k
≤ λ)
for all n, k = 1, 2, ... and all λ ∈ R.
3.8. MODES OF CONVERGENCE 75
Let (Ω, F, P) be a probability space and (f
n
)

n=1
be a sequence of i.i.d. ran-
dom variables with Ef
1
= 0 and Ef
2
1
= σ
2
. By the law of large numbers we
know
f
1
+· · · + f
n
n
P
−→0.
Hence the law of the limit is the Dirac-measure δ
0
. Is there a right scaling
factor c(n) such that
f
1
+· · · + f
n
c(n)
→g,
where g is a non-degenerate random variable in the sense that P
g
= δ
0
? And
in which sense does the convergence take place? The answer is the following
Proposition 3.8.8 [Central Limit Theorem] Let (f
n
)

n=1
be a sequence
of i.i.d. random variables with Ef
1
= 0 and Ef
2
1
= σ
2
> 0. Then
P

f
1
+· · · + f
n
σ

n
≤ x

1

x
−∞
e

u
2
2
du
for all x ∈ R as n →∞, that means that
f
1
+· · · + f
n
σ

n
d
→g
for any g with P(g ≤ x) =
1

x
−∞
e

u
2
2
du.
76 CHAPTER 3. INTEGRATION
Chapter 4
Exercises
4.1 Probability spaces
1. Prove that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
2. Prove that
¸
i∈I
A
i

c
=
¸
i∈I
A
c
i
where A
i
⊆ Ω and I is an arbitrary
index set.
3. Given a set Ω and two non-empty sets A, B ⊆ Ω such that A∩ B = ∅.
Give all elements of the smallest σ-algebra F on Ω which contains A
and B.
4. Let x ∈ R. Is it true that {x} ∈ B(R), where {x} is the set, consisting
of the element x only?
5. Assume that Q is the set of rational numbers. Is it true that Q ∈ B(R)?
6. Given two dice with numbers {1, 2, ..., 6}. Assume that the probability
that one die shows a certain number is
1
6
. What is the probability that
the sum of the two dice is m ∈ {1, 2, ..., 12}?
7. There are three students. Assuming that a year has 365 days, what
is the probability that at least two of them have their birthday at the
same day ?
8.* Deﬁnition: The system F ⊆ 2

is a monotonic class, if
(a) A
1
, A
2
, ... ∈ F, A
1
⊆ A
2
⊆ A
3
⊆ · · · =⇒
¸
n
A
n
∈ F, and
(b) A
1
, A
2
, ... ∈ F, A
1
⊇ A
2
⊇ A
3
⊇ · · · =⇒
¸
n
A
n
∈ F.
Show that if F ⊆ 2

is an algebra and a monotonic class, then F is a
σ-algebra.
9. Let E, F, G, be three events. Find expressions for the events that of
E, F, G,
77
78 CHAPTER 4. EXERCISES
(a) only F occurs,
(b) both E and F but not G occur,
(c) at least one event occurs,
(d) at least two events occur,
(e) all three events occur,
(f) none occurs,
(g) at most one occurs,
(h) at most two occur.
10. Show that in the deﬁnition of an algebra (Deﬁnition 1.1.1 one can
replace
(3’) A, B ∈ F implies that A ∪ B ∈ F
by
(3”) A, B ∈ F implies that A ∩ B ∈ F.
11. Prove that A \ B ∈ F if F is an algebra and A, B ∈ F.
12. Prove that
¸

i=1
A
i
∈ F if F is a σ-algebra and A
1
, A
2
, ... ∈ F.
13. Give an example where the union of two σ-algebras is not a σ-algebra.
14. Let F be a σ-algebra and A ∈ F. Prove that
G := {B ∩ A : B ∈ F}
is a σ-algebra.
15. Prove that
{B ∩ [α, β] : B ∈ B(R)} = σ {[a, b] : α ≤ a < b ≤ β}
and that this σ-algebra is the smallest σ-algebra generated by the sub-
sets A ⊆ [α, β] which are open within [α, β]. The generated σ-algebra
is denoted by B([α, β]).
16. Show the equality σ(G
2
) = σ(G
4
) = σ(G
0
) in Proposition 1.1.8 holds.
17. Show that A ⊆ B implies that σ(A) ⊆ σ(B).
18. Prove σ(σ(G)) = σ(G).
19. Let Ω = ∅, A ⊆ Ω, A = ∅ and F := 2

. Deﬁne
P(B) :=

1 : B ∩ A = ∅
0 : B ∩ A = ∅
.
Is (Ω, F, P) a probability space?
4.1. PROBABILITY SPACES 79
20. Prove Proposition 1.2.6 (7).
21. Let (Ω, F, P) be a probability space and A ∈ F such that P(A) > 0.
Show that (Ω, F, µ) is a probability space, where
µ(B) := P(B|A).
22. Let (Ω, F, P) be a probability space. Show that if A, B ∈ F are inde-
pendent this implies
(a) A and B
c
are independent,
(b) A
c
and B
c
are independent.
23. Let (Ω, F, P) be the model of rolling two dice, i.e. Ω = {(k, l) : 1 ≤
k, l ≤ 6}, F = 2

, and P((k, l)) =
1
36
for all (k, l) ∈ Ω. Assume
A := {(k, l) : l = 1, 2 or 5},
B := {(k, l) : l = 4, 5 or 6},
C := {(k, l) : k + l = 9}.
Show that
P(A ∩ B ∩ C) = P(A)P(B)P(C).
Are A, B, C independent ?
24. (a) Let Ω := {1, ..., 6}, F := 2

and P(B) :=
1
6
card(B). We deﬁne
A := {1, 4} and B := {2, 5}. Are A and B independent?
(b) Let Ω := {(k, l) : k, l = 1, ..., 6}, F := 2

and P(B) :=
1
36
card(B).
We deﬁne
A := {(k, l) : k = 1 or k = 4} and B := {(k, l) : l = 2 or l = 5} .
Are A and B independent?
25. In a certain community 60 % of the families own their own car, 30 %
own their own home, and 20% own both (their own car and their own
home). If a family is randomly chosen, what is the probability that
this family owns a car or a house but not both?
26. Prove Bayes’ formula: Proposition 1.2.15.
27. Suppose we have 10 coins which are such that if the ith one is ﬂipped
then heads will appear with probability
i
10
, i = 1, 2, . . . 10. One chooses
randomly one of the coins. What is the conditionally probability that
one has chosen the ﬁfth coin given it had shown heads after ﬂipping?
Hint: Use Bayes’ formula.
80 CHAPTER 4. EXERCISES
28.* A class that is both, a π-system and a λ-system, is a σ-algebra.
29. Prove Property (3) in Lemma 1.4.4.
30.* Let (Ω, F, P) be a probability space and assume A
1
, A
2
, ..., A
n
∈ F are
independent. Show that then A
c
1
, A
c
2
, ..., A
c
n
are independent.
31. Let us play the following game: If we roll a die it shows each of the
numbers {1, 2, 3, 4, 5, 6} with probability
1
6
. Let k
1
, k
2
, ... ∈ {1, 2, 3, ...}
be a given sequence.
(a) First go: One has to roll the die k
1
times. If it did show all k
1
times 6 we won.
(b) Second go: We roll k
2
times. Again, we win if it would all k
2
times
show 6.
And so on... Show that
(a) The probability of winning inﬁnitely many often is 1 if and only
if

¸
n=1

1
6

k
n
= ∞,
(b) The probability of loosing inﬁnitely many often is always 1.
Hint: Use the lemma of Borel-Cantelli.
32. Let n ≥ 1, k ∈ {0, 1, ..., n} and

n
k

:=
n!
k!(n −k)!
where 0! := 1 and k! := 1 · 2 · · · k, for k ≥ 1.
Prove that one has

n
k

possibilities to choose k elements out of n ele-
ments.
33. Binomial distributon: Assume 0 < p < 1, Ω := {0, 1, 2, ..., n},
n ≥ 1, F := 2

and
µ
n,p
(B) :=
¸
k∈B

n
k

p
n−k
(1 −p)
k
.
(a) Is (Ω, F, µ
n,p
) a probability space?
(b) Compute max
k=0,...,n
µ
n,p
({k}) for p =
1
2
.
34. Geometric distribution: Let 0 < p < 1, Ω := {0, 1, 2, ...}, F := 2

and
µ
p
(B) :=
¸
k∈B
p(1 −p)
k
.
4.2. RANDOM VARIABLES 81
(a) Is (Ω, F, µ
p
) a probability space?
(b) Compute µ
p
({0, 2, 4, 6, ...}).
35. Poisson distribution: Let λ > 0, Ω := {0, 1, 2, ...}, F := 2

and
π
λ
(B) :=
¸
k∈B
e
−λ
λ
k
k!
.
(a) Is (Ω, F, π
λ
) a probability space?
(b) Compute
¸
k∈Ω

λ
({k}).
4.2 Random variables
1. Let A
1
, A
2
, ... ⊆ Ω. Show that
(a) liminf
n→∞
1I
A
n
(ω) = 1I
liminf
n→∞
A
n
(ω),
(b) limsup
n→∞
1I
A
n
(ω) = 1I
limsup
n→∞
A
n
(ω),
for all ω ∈ Ω.
2. Let (Ω, F, P) be a probability space and A ⊆ Ω a set. Show that A ∈ F
if and only if 1I
A
: Ω →R is a random variable.
3. Show the assertions (1), (3) and (4) of Proposition 2.1.5.
4. Let (Ω, F, P) be a probability space and f : Ω →R.
(a) Show that, if F = {∅, Ω}, then f is measurable if and only if f is
constant.
(b) Show that, if P(A) is 0 or 1 for every A ∈ F and f is measurable,
then
P({ω : f(ω) = c}) = 1 for a constant c.
5. Let (Ω, F) be a measurable space and f
n
, n = 1, 2, . . . a sequence of
random variables. Show that
liminf
n→∞
f
n
and limsup
n→∞
f
n
are random variables.
6. Complete the proof of Proposition 2.2.9 by showing that for a distribu-
tion function F
g
(x) = P({ω : g ≤ x}) of a random variable g it holds
lim
x→−∞
F
g
(x) = 0 and lim
x→∞
F
g
(x) = 1.
82 CHAPTER 4. EXERCISES
7. Let f : Ω →R be a map. Show that for A
1
, A
2
, · · · ⊆ R it holds
f
−1

¸
i=1
A
i

=

¸
i=1
f
−1
(A
i
).
8. Let (Ω
1
, F
1
), (Ω
2
, F
2
), (Ω
3
, F
3
) be measurable spaces. Assume that
f : Ω
1
→ Ω
2
is (F
1
, F
2
)-measurable and that g : Ω
2
→ Ω
3
is (F
2
, F
3
)-
measurable. Show that then g ◦ f : Ω
1
→Ω
3
deﬁned by
(g ◦ f)(ω
1
) := g(f(ω
1
))
is (F
1
, F
3
)-measurable.
9. Prove Proposition 2.1.4.
10. Let (Ω, F, P) be a probability space, (M, Σ) a measurable space and
assume that f : Ω →M is (F, Σ)-measurable. Show that µ with
µ(B) := P({ω : f(ω) ∈ B}) for B ∈ Σ
is a probability measure on Σ.
11. Assume the probability space (Ω, F, P) and let f, g : Ω → R be mea-
surable step-functions. Show that f and g are independent if and only
if for all x, y ∈ R it holds
P({f = x, g = y}) = P({f = x}) P({g = y})
12. Assume the product space ([0, 1] ×[0, 1], B([0, 1]) ⊗B([0, 1]), λ×λ) and
the random variables f(x, y) := 1I
[0,p)
(x) and g(x, y) := 1I
[0,p)
(y), where
0 < p < 1. Show that
(a) f and g are independent,
(b) the law of f + g, P
f+g
({k}), for k = 0, 1, 2 is the binomial distri-
bution µ
2,p
.
13. Let ([0, 1], B([0, 1])) be a measurable space and deﬁne the functions
f(x) := 1I
[0,1/2)
(x) + 21I
[1/2,1])
(x),
and
g(x) := 1I
[0,1/2)
(x) −1I
{1/4}
(x).
Compute (a) σ(f) and (b) σ(g).
14. Assume a measurable space (Ω, F) and random variables f, g : Ω →R.
Is the set {ω : f(ω) = g(ω)} measurable?
4.3. INTEGRATION 83
15. Let G := σ{(a, b), 0 < a < b < 1} be a σ -algebra on [0, 1]. Which
continuous functions f : [0, 1] →R are measurable with respect to G ?
16. Let f
k
: Ω → R, k = 1, . . . , n be independent random variables on
(Ω, F, P) and assume the functions g
k
: R →R, k = 1, . . . , n are B(R)–
measurable. Show that g
1
(f
1
), g
2
(f
2
), . . . , g
n
(f
n
) are independent.
17. Assume n ∈ N and deﬁne by F := σ{

k
n
,
k+1
n

for k = 0, 1, . . . , n − 1}
a σ–algebra on [0, 1].
(a) Why is the function f(x) = x, x ∈ [0, 1] not F–measurable?
(b) Give an example of an F–measurable function.
18. Prove Proposition 2.3.4.
19.* Show that families of random variables are independent iﬀ (if and only
if) their generated σ-algebras are independent.
20. Show Proposition 2.3.5.
4.3 Integration
1. Let (Ω, F, P) be a probability space and f, g : Ω →R random variables
with
f =
n
¸
i=1
a
i
1I
A
i
and g =
m
¸
j=1
b
j
1I
B
j
,
where a
i
, b
j
∈ R and A
i
, B
j
∈ F. Assume that {A
1
, . . . , A
n
} and
{B
1
, . . . B
m
} are independent. Show that
Efg = EfEg.
2. Prove assertion (4) and (5) of Proposition 3.2.1
Hints: To prove (4), set A
n
:=
¸
ω : f(ω) >
1
n
¸
and show that
0 = Ef1I
A
n
≥ E
1
n
1I
A
n
=
1
n
E1I
A
n
=
1
n
P(A
n
).
This implies P(A
n
) = 0 for all n. Using this one gets
P{ω : f(ω) > 0} = 0.
To prove (5) use
Ef = E(f1I
{f=g}
+ f1I
{f=g}
) = Ef1I
{f=g}
+Ef1I
{f=g}
= Ef1I
{f=g}
.
84 CHAPTER 4. EXERCISES
3. Let (Ω, F, P) be a probability space and A
1
, A
2
, ... ∈ F. Show that
P

liminf
n
A
n

≤ liminf
n
P(A
n
)
≤ limsup
n
P(A
n
) ≤ P

limsup
n
A
n

.
Hint: Use Fatou’s lemma and Exercise 4.2.1.
4. Assume the probability space ([0, 1], B([0, 1]), λ) and the random vari-
able
f =

¸
k=0
k1I
[a
k−1
,a
k
)
with a
−1
:= 0 and
a
k
:= e
−λ
k
¸
m=0
λ
m
m!
, for k = 0, 1, 2, . . .
where λ > 0. Compute the law P
f
({k}), k = 0, 1, 2, . . . of f. Which
distribution has f?
5. Assume the probability space ((0, 1], B((0, 1]), λ) and
f(x) :=

1 for irrational x ∈ (0, 1]
1
x
for rational x ∈ (0, 1]
Compute Ef.
6. uniform distribution:
Assume the probability space ([a, b], B([a, b]),
λ
b−a
), where
−∞< a < b < ∞.
(a) Show that
Ef =

[a,b]
f(ω)
1
b −a
dλ(ω) =
a + b
2
where f(ω) := ω.
(b) Compute
Eg =

[a,b]
g(ω)
1
b −a
dλ(ω) = where g(ω) := ω
2
.
(c) Compute the variance E(f −Ef)
2
of the random variable f(ω) :=
ω.
Hint: Use a) and b).
4.3. INTEGRATION 85
7. Poisson distribution: Let λ > 0, Ω := {0, 1, 2, 3, ...}, F := 2

and
π
λ
=

¸
k=0
e
−λ
λ
k
k!
δ
k
.
(a) Show that
Ef =

f(ω)dπ
λ
(ω) = λ where f(k) := k.
(b) Show that
Eg =

g(ω)dµ
λ
(ω) = λ
2
where g(k) := k(k −1).
(c) Compute the variance E(f −Ef)
2
of the random variable f(k) :=
k.
Hint: Use a) and b).
8. Binomial distribution: Let 0 < p < 1, Ω := {0, 1, ..., n}, n ≥ 2,
F := 2

and
µ
n,p
=
n
¸
k=0

n
k

p
k
(1 −p)
(n−k)
δ
k
.
(a) Show that
Ef =

f(ω)dµ
n,p
(ω) = np where f(k) := k.
(b) Show that
Eg =

g(ω)dµ
n,p
(ω) = n(n −1)p
2
where g(k) := k(k −1).
(c) Compute the variance E(f −Ef)
2
of the random variable f(k) :=
k.
Hint: Use a) and b).
9. Assume integrable, independent random variables f
i
: Ω → R with
Ef
2
i
< ∞, mean Ef
i
= m
i
and variance σ
2
i
= E(f
i
− m
i
)
2
, for i =
1, 2, . . . , n. Compute the mean and the variance of
(a) g = af
1
+ b, where a, b ∈ R,
(b) g =
¸
n
i=1
f
i
.
86 CHAPTER 4. EXERCISES
10. Let the random variables f and g be independent and Poisson dis-
trubuted with parameters λ
1
and λ
2
, respectively. Show that f + g is
Poisson distributed
P
f+g
({k}) = P(f + g = k) =
k
¸
l=0
P(f = l, g = k −l) = . . . .
Which parameter does this Poisson distribution have?
11. Assume that f ja g are independent random variables where E|f| < ∞
and E|g| < ∞. Show that E|fg| < ∞ and
Efg = EfEg.
Hint:
(a) Assume ﬁrst f ≥ 0 and g ≥ 0 and show using the ”stair-case
functions” to approximate f and g that
Efg = EfEg.
(b) Use (a) to show E|fg| < ∞, and then Lebesgue’s Theorem for
Efg = EfEg.
in the general case.
12. Use H¨older’s inequality to show Corollary 3.6.6.
13. Let f
1
, f
2
, . . . be non-negative random variables on (Ω, F, P). Show that
E

¸
k=1
f
k
=

¸
k=1
Ef
k
(≤ ∞).
14. Use Minkowski’s inequality to show that for sequences of real numbers
(a
n
)

n=1
and (b
n
)

n=1
and 1 ≤ p < ∞ it holds

¸
n=1
|a
n
+ b
n
|
p
1
p

¸
n=1
|a
n
|
p
1
p
+

¸
n=1
|b
n
|
p
1
p
.
15. Prove assertion (2) of Proposition 3.2.3.
16. Assume a sequence of i.i.d. random variables (f
k
)

k=1
with Ef
1
= m
and variance E(f
1
−m)
2
= σ
2
. Use the Central Limit Theorem to show
that
P

ω :
f
1
+ f
2
+· · · + f
n
−nm
σ

n
≤ x

1

x
−∞
e

u
2
2
du
as n →∞.
4.3. INTEGRATION 87
17. Let ([0, 1], B([0, 1]), λ) be a probability space. Deﬁne the functions
f
n
: Ω →R by
f
2n
(x) := n
3
1I
[0,1/2n]
(x) and f
2n−1
(x) := n
3
1I
[1−1/2n,1]
(x),
where n = 1, 2, . . . .
(a) Does there exist a random variable f : Ω → R such that f
n
→ f
almost surely?
(b) Does there exist a random variable f : Ω →R such that f
n
P
→f?
(c) Does there exist a random variable f : Ω →R such that f
n
L
P
→f?
Index
λ-system, 31
liminf
n
A
n
, 17
liminf
n
a
n
, 17
limsup
n
A
n
, 17
limsup
n
a
n
, 17
π-system, 24
π-systems and uniqueness of mea-
sures, 24
π −λ-Theorem, 31
σ-algebra, 10
σ-ﬁnite, 14
algebra, 10
axiom of choice, 32
Bayes’ formula, 19
binomial distribution, 25
Borel σ-algebra, 13
Borel σ-algebra on R
n
, 24
Carath´eodory’s extension
theorem, 21
central limit theorem, 75
change of variables, 58
Chebyshev’s inequality, 66
closed set, 12
conditional probability, 18
convergence almost surely, 71
convergence in L
p
, 71
convergence in distribution, 72
convergence in probability, 71
convexity, 66
counting measure, 15
Dirac measure, 15
distribution-function, 40
dominated convergence, 55
equivalence relation, 31
event, 10
existence of sets, which are not Borel,
32
expectation of a random variable, 47
expected value, 47
exponential distribution on R, 28
extended random variable, 60
Fubini’s Theorem, 61, 62
Gaussian distribution on R, 28
geometric distribution, 25
H¨older’s inequality, 67
i.i.d. sequence, 74
independence of a family of random
variables, 42
independence of a ﬁnite family of ran-
dom variables, 42
independence of a sequence of events,
18
Jensen’s inequality, 66
law of a random variable, 39
Lebesgue integrable, 47
Lebesgue measure, 26, 27
Lebesgue’s Theorem, 55
lemma of Borel-Cantelli, 20
lemma of Fatou, 18
lemma of Fatou for random variables,
54
measurable map, 38
measurable space, 10
measurable step-function, 35
measure, 14
measure space, 14
Minkowski’s inequality, 68
88
INDEX 89
moments, absolute moments, 59
monotone class theorem, 60
monotone convergence, 53
open set, 12
Poisson distribution, 25
Poisson’s Theorem, 29
probability measure, 14
probability space, 14
product of probability spaces, 23
random variable, 36
realization of independent
random variables, 44
step-function, 35
strong law of large numbers, 74
uniform distribution, 26, 27
variance, 49
vector space, 59
weak law of large numbers, 73
90 INDEX
Bibliography
[1] H. Bauer. Probability theory. Walter de Gruyter, 1996.
[2] H. Bauer. Measure and integration theory. Walter de Gruyter, 2001.
[3] P. Billingsley. Probability and Measure. Wiley, 1995.
[4] A.N. Shiryaev. Probability. Springer, 1996.
[5] D. Williams. Probability with martingales. Cambridge University Press,
1991.
91

2

Contents
1 Probability spaces 1.1 Deﬁnition of σ-algebras . . . . . . . . . . . . . . . . . . . . . 1.2 Probability measures . . . . . . . . . . . . . . . . . . . . . . 1.3 Examples of distributions . . . . . . . . . . . . . . . . . . . 1.3.1 Binomial distribution with parameter 0 < p < 1 . . . 1.3.2 Poisson distribution with parameter λ > 0 . . . . . . 1.3.3 Geometric distribution with parameter 0 < p < 1 . . 1.3.4 Lebesgue measure and uniform distribution . . . . . 1.3.5 Gaussian distribution on R with mean m ∈ R and variance σ 2 > 0 . . . . . . . . . . . . . . . . . . . . . 1.3.6 Exponential distribution on R with parameter λ > 0 1.3.7 Poisson’s Theorem . . . . . . . . . . . . . . . . . . . 1.4 A set which is not a Borel set . . . . . . . . . . . . . . . . . 9 10 14 25 25 25 25 26 28 28 29 31

. . . . . . . . . . .

2 Random variables 35 2.1 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.2 Measurable maps . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.3 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 Integration 3.1 Deﬁnition of the expected value . . . . . 3.2 Basic properties of the expected value . . 3.3 Connections to the Riemann-integral . . 3.4 Change of variables in the expected value 3.5 Fubini’s Theorem . . . . . . . . . . . . . 3.6 Some inequalities . . . . . . . . . . . . . 3.7 Theorem of Radon-Nikodym . . . . . . . 3.8 Modes of convergence . . . . . . . . . . . 45 45 49 56 58 59 66 70 71

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

4 Exercises 77 4.1 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.2 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3

4 CONTENTS .

.. Surely many lectures about probability start with rolling a die: you have six possible outcomes {1.. and to ﬁnish with some ﬁrst limit theorems. Also. . . 3. Kolmogorov (19031987). 6).. You stand at a bus-stop knowing that every 30 minutes a bus comes. Let us start with some introducing examples.. and Economics would either not have been developed or would not be rigorous. 1).st-andrews. 1).. 5. (6.. Kolmogorov published his modern approach of Probability Theory. even in branches one does not expect this.mcs.ac. including the notion of a measurable space and a probability space. Example 1. Now the situation is a bit more complicated: Somebody has two dice and transmits to you only the sum of the two dice. The modern period of probability theory is connected with names like S. This lecture will start from this notion. probability is used in many branches of pure mathematics. 6)} 5 . Borel (1871-1956). 4. Biology.Introduction Probability theory can be understood as a mathematical model for the intuitive notion of uncertainty. How to formalize this? Example 3. If you want to roll a certain number. 6}.N. E. to continue with random variables and basic parts of integration theory. 2.uk/history/index. like in convex geometry. The set of all outcomes of the experiment is the set Ω := {(1. Historical information about mathematicians can be found in the MacTutor History of Mathematics Archive under www-history.. .N. in 1933 A. (1. But you do not know when the last bus came. and A. In particular. (6. What is the intuitively expected time you have to wait for the next bus? Or you have a light bulb and would like to know how many hours you can expect the light bulb will work? How to treat the above two questions by mathematical tools? Example 2.. assuming the die is fair (whatever this means) it is intuitively clear that the chance to dice really 3 is 1:6. let’s say 3. Bernstein (1880-1968).N.. Without probability theory all the stochastic models in Physics.html and is also used throughout this script.

. Intuitively. Secondly. which we introduce later. Hence. D3 := T − S3 . Example 4. but this can. we take a function f :Ω→R which gives for all ω ∈ Ω the diﬀerence f (ω) = T − S(ω). This is an easy example of a random variable f . by using thermometers of the same type. we might not have a systematic error. having a certain distribution.. . How to develop an exact mathematical theory out of this? Firstly. probably. If we would measure simultaneously. including some electronics. It is impossible to describe all these sources of randomness or uncertainty explicitly. S2 . We denote the exact temperature by T and the displayed temperature by S. The number we get from the system might not be correct because of several reasons: the electronics is inﬂuenced by the inside temperature and the voltage of the power-supply. that we would like to measure the temperature outside our home. D2 := T − S2 . But this is not always possible nor necessary: Say. .. we take an abstract set Ω. . We can do this by an electronic thermometer which consists of a sensor outside and a display.. From properties of this function we would like to get useful information of our thermometer and. or how likely an ω is? We do this by introducing the probability spaces in Chapter 1. Each element ω ∈ Ω will stand for a speciﬁc conﬁguration of our sources inﬂuencing the measured value. we get random numbers D1 .6 CONTENTS but what you see is only the result of the map f : Ω → R deﬁned as f ((a. b)) := a + b. . with corresponding diﬀerences D1 := T − S1 . In Example 3 we know the set of states ω ∈ Ω explicitly. inside. that means you see f (ω) but not ω = (a.. we would get values S1 . b).. in particular. so that the diﬀerence T − S is inﬂuenced by the above sources of uncertainty. Step 2: What mathematical properties of f we need to transport the randomness from ω to f (ω)? This yields to the introduction of the random variables in Chapter 2. not be done in a perfect way. Changes of these parameters have to be compensated by the electronics. To put Examples 3 and 4 on a solid ground we go ahead with the following questions: Step 1: How to model the randomness of ω. but some kind of a random error which appears as a positive or negative deviation from the exact value. D2 . about the correctness of the displayed values.

2. then the following notation is used: intersection: A ∩ B union: A ∪ B set-theoretical minus: A\B complement: Ac empty set: ∅ real numbers: R natural numbers: N rational numbers: Q indicator-function: 1 A (x) I = = = = = {ω ∈ Ω : ω ∈ A and ω ∈ B} {ω ∈ Ω : ω ∈ A or (or both) ω ∈ B} {ω ∈ Ω : ω ∈ A and ω ∈ B} {ω ∈ Ω : ω ∈ A} set. .. what do we mean by a distribution? Some basic distributions are discussed in Section 1. Given a set Ω and subsets A. β. To deﬁne these quantities one needs the integration theory developed in Chapter 3. 3. Step 5: What is a good method to estimate Ef ? We can take a sequence of independent (take this intuitive for the moment) random variables f1 ..3. . If the ﬁrst expression is zero. This lieds us to the Strong Law of Large Numbers discussed in Section 3. Step 4: Is it possible to describe the distribution of the values f may take? Or before.CONTENTS 7 Step 3: What are properties of f which might be important to know in practice? For example the mean-value and the variance. β}. f2 . . Notation. without any element = {1.. B ⊆ Ω.. and expect that 1 n n fi (ω) and i=1 Ef are close to each other. we use α ∧ β := min {α. having the same distribution as f . denoted by Ef and E(f − Ef )2 .8.. if the second one is small the displayed values are very likely close to the real temperature. then in Example 3 the calibration of the thermometer is right.} = 1 if x ∈ A 0 if x ∈ A Given real numbers α.

8 CONTENTS .

5. 3. (2) A σ-algebra F. 4. 2. then we have either ”heads” or ”tails” on top. (H. F. T )} is the set of all possible outcomes. P) consists of three components. If we have two coins. here we do not need any measure. H).1 (a) If we roll a die. 1] that describes how likely it is that the event A occurs. (T. ∞). which is the system of observable subsets or events A ⊆ Ω. but one can decide whether ω ∈ A or ω ∈ A. 6}. that means Ω = {H. That means Ω = {1. the fundamental notion of probability theory.0. H). (1) The elementary events or states ω which are collected in a non-empty set Ω. then Ω = {(H. Example 1. (T. A probability space (Ω.Chapter 1 Probability spaces In this chapter we introduce the probability space. (c) For the lifetime of a bulb in hours we can choose Ω = [0. This probability is a number P(A) ∈ [0. In the second step we introduce the measures. 9 . The interpretation is that one can usually not decide whether a system is in the particular state ω ∈ Ω. then all possible outcomes are the numbers between 1 and 6. (3) A measure P. T ). T }. For the formal mathematical approach we proceed in two steps: in the ﬁrst step we deﬁne the σ-algebras F. (b) If we ﬂip a coin. which gives a probability to all A ∈ F.

the terms σ-ﬁeld and ﬁeld are used instead of σ-algebra and algebra. 6} and F := 2Ω . A system F of subsets A ⊆ Ω is called σ-algebra on Ω if (1) ∅. ∅. The event ”the sum of the two dice equals four” is described by A = {(1. measurable space] Let Ω be a non-empty set. algebra. then F is called an algebra. ∅}. i.1. (2. b = 1.. then F = {Ω. . We consider some ﬁrst examples. (3) A1 . If one replaces (3) by (3 ) A. 6} and F := 2Ω .1 [σ-algebra. 4. (c) If A ⊆ Ω. . Deﬁnition 1. Every σ-algebra is an algebra.e. (3. An event A occurs if ω ∈ A and it does not occur if ω ∈ A. F).1. Ac } is a σ-algebra.. A. i.. .. (b) Assume a model with two dice.. . The pair (Ω. without which many parts of mathematics can not live. It is the set the probability measures are deﬁned on. 1)}.1 Deﬁnition of σ-algebras The σ-algebra is a basic tool in probability theory. then F is a σ-algebra. Some more concrete examples are the following: Example 1. Sometimes.. (b) The smallest σ-algebra: F = {Ω. Without this notion it would be impossible to consider the fundamental Lebesgue measure on the interval [0. 1] or to consider Gaussian measures.. (2) A ∈ F implies that Ac := Ω\A ∈ F. 2). 6}. B ∈ F implies that A ∪ B ∈ F. where F is a σ-algebra on Ω. Example 1. is called measurable space. ∈ F implies that ∞ i=1 Ai ∈ F. A2 . The elements A ∈ F are called events. Ω := {(a.e. Ω ∈ F.2 (a) The largest σ-algebra on Ω: if F = 2Ω is the system of all subsets A ⊆ Ω. PROBABILITY SPACES 1.3 (a) Assume a model for a die. 3). b) : a.. Ω := {1.10 CHAPTER 1. The event ”the die shows an even number” is described by A = {2.1.

(T. .. However. T ). so that (Fj are σ–algebras!) ∞ Ac = Ω\A ∈ Fj for all j ∈ J. where J is an arbitrary index set. H)}. (T.5 [intersection of σ-algebras is a σ-algebra] Let Ω be an arbitrary non-empty set and let Fj . H). b1 ] ∪ (a2 . H). Consequently. (H. But what is the right σ-algebra in this case? It is not 2Ω which would be too big.4 [algebra.. ωn }. The proof is very easy. i.1. ∞] = (a. .1. ”Exactly one of two coins shows heads” is modeled via A = {(H.. T )} and F := 2Ω . Unfortunately. ∈ Fj for all j ∈ J. ∞).1. A1 . Then G is an algebra. one can work practically with them nevertheless. In the following we describe a simple procedure which generates σ–algebras. Ω ∈ j∈J Fj . First we notice that ∅. but not a σ-algebra. ”The bulb works more than 200 hours” we express by A = (200. . which is not a σ-algebra] Let G be the system of subsets A ⊆ R such that A can be written as A = (a1 . ∞). . a] = ∅. ∈ j∈J Fj .. A1 . A2 .. Now let A. Ω ∈ Fj for all j ∈ J. b2 ] ∪ · · · ∪ (an . then any algebra F on Ω is automatically a σ-algebra. We start with the fundamental Proposition 1. Ω := {(H. in general this is not the case as shown by the next example: Example 1. and i=1 Ai ∈ F j ∞ A ∈ j∈J c Fj and i=1 Ai ∈ j∈J Fj . (d) Assume that we want to model the lifetime of a bulb. J = ∅. Hence A. Proof. If Ω = {ω1 . be a family of σ-algebras on Ω. T ). Then F := j∈J Fj is a σ-algebra as well. but typical and fundamental. so that Ω := [0. Surprisingly. DEFINITION OF σ-ALGEBRAS 11 (c) Assume a model for two coins. so that ∅. ∞) and (a.. most of the important σ–algebras can not be constructed explicitly.1. (T.e. bn ] where −∞ ≤ a1 ≤ b1 ≤ · · · ≤ an ≤ bn ≤ ∞ with the convention that (a. j ∈ J.. A2 .

The construction is very elegant but has.8 [Generation of the Borel σ-algebra on R] We let G0 G1 G2 G3 G4 G5 be be be be be be the the the the the the system system system system system system of of of of of of all all all all all all open subsets of R. intervals (a. the Borel σ-algebra on R. By deﬁnition of J we have that F ∈ J so that σ(G) = C∈J C ⊆ F. b). To do this we need the notion of open and closed sets.6 [smallest σ-algebra containing a set-system] Let Ω be an arbitrary non-empty set and G be an arbitrary system of subsets A ⊆ Ω. Proof. as already mentioned. PROBABILITY SPACES Proposition 1. because G ⊆ 2Ω and 2Ω is a σ–algebra. Hence σ(G) := C∈J C yields to a σ-algebra according to Proposition 1. Proposition 1.2 one has J = ∅. the slight disadvantage that one cannot construct all elements of σ(G) explicitly.1. (2) A subset B ⊆ R is called closed. According to Example 1.1. It remains to show that σ(G) is the smallest σ-algebra containing G. −∞ < a < b < ∞. b) is open and the interval [a. if for each x ∈ A there is an ε > 0 such that (x − ε.5 such that (by construction) G ⊆ σ(G). intervals (−∞. b]. Then σ(G0 ) = σ(G1 ) = σ(G2 ) = σ(G3 ) = σ(G4 ) = σ(G5 ).12 CHAPTER 1. b]. intervals (a.7 [open and closed sets] (1) A subset A ⊆ R is called open. the interval (a.1. Deﬁnition 1. x + ε) ⊆ A. Given −∞ ≤ a ≤ b ≤ ∞.1. b] is closed. Assume another σ-algebra F with G ⊆ F. We let J := {C is a σ–algebra on Ω such that G ⊆ C} . b ∈ R. if A := R\B is open. closed subsets of R. b). −∞ < a < b < ∞. Let us now turn to one of the most important examples. b ∈ R. Then there exists a smallest σ-algebra σ(G) on Ω such that G ⊆ σ(G). intervals (−∞.1. . Moreover. by deﬁnition the empty set ∅ is open and closed.

A set B ⊆ M is closed if the complement M \B is open. e . We only show that σ(G0 ) = σ(G1 ) = σ(G3 ) = σ(G5 ). b) = n=1 (−∞.8 is called Borel σ-algebra and denoted by B(R). x + εx ) ⊆ A. The Borel σ-algebra B(M ) is the smallest σ-algebra that contains all open (closed) subsets of M . For all x ∈ A there is a maximal εx > 0 such that (x − εx .1. x∈A∩Q 1 ´ F´lix Edouard Justin Emile Borel. for −∞ < a < b < ∞ one has that ∞ (a. Now let us assume a bounded non-empty open set A ⊆ R. b)\(−∞. (x − εx . Hence G0 ⊆ σ(G1 ) and σ(G0 ) ⊆ σ(G1 ).8. Moreover. In the same way one can introduce Borel σ-algebras on metric spaces: Given a metric space M with metric d a set A ⊆ M is open provided that for all x ∈ A there is a ε > 0 such that {y ∈ M : d(x. y) < ε} ⊆ A. Finally. Proof of Proposition 1. French mathematician. DEFINITION OF σ-ALGEBRAS 13 Deﬁnition 1.1. A ∈ G0 implies Ac ∈ G1 ⊆ σ(G1 ) and A ∈ σ(G1 ).1. The remaining inclusion σ(G1 ) ⊆ σ(G0 ) can be shown in the same way. Hence A= which proves G0 ⊆ σ(G5 ) and σ(G0 ) ⊆ σ(G5 ).1. Because of G3 ⊆ G0 one has σ(G3 ) ⊆ σ(G0 ).9 [Borel σ-algebra on R] 1 The σ-algebra constructed in Proposition 1.1. a + 1 ) n ∈ σ(G3 ) so that G5 ⊆ σ(G3 ) and σ(G5 ) ⊆ σ(G3 ). x + εx ). 07/01/1871-03/02/1956.

2. . The measure space (Ω. 4.2.. any probability measure is a ﬁnite measure: We need only to check µ(∅) = 0 which follows from µ(∅) = ∞ µ(∅) (note that i=1 ∅ ∩ ∅ = ∅) and µ(Ω) < ∞.. leads to 1 P({ω}) := .1) The triplet (Ω. (d) µ(Ωk ) < ∞.14 CHAPTER 1. for example. (3) A measure space (Ω.1 [probability measure. Example 1. 1] is called probability measure if P(Ω) = 1 and for all A1 . ∈ F with Ai ∩ Aj = ∅ for i = j one has ∞ ∞ µ i=1 Ai = i=1 µ(Ai ). A2 .. .e. 2 . 2. . (2) A map µ : F → [0. k = 1. 1 P({2. 2.. such that (a) Ωk ∈ F for all k = 1.2. µ) is called measure space.. (1. A2 . F. The triplet (Ω.. P) is called probability space. F) be a measurable space.. PROBABILITY SPACES 1. ∞] is called measure if µ(∅) = 0 and for all A1 . 6 Then. . probability space] (Ω. µ) or a measure µ is called σ-ﬁnite provided that there are Ωk ⊆ Ω. ∈ F with Ai ∩ Aj = ∅ for i = j one has ∞ ∞ P i=1 Ai = i=1 P(Ai ). Assuming that all outcomes for rolling a die are equally likely. i.3 (a) We assume the model of a die. (c) Ω = ∞ k=1 Ωk . (1) A map P : F → [0.. F. Remark 1.... µ) or the measure µ are called ﬁnite if µ(Ω) < ∞. Ω = {1. .2 Probability measures Let Now we introduce the measures we are going to use: Deﬁnition 1. 6} and F = 2Ω . F..2 Of course. F.. 6}) = . (b) Ωi ∩ Ωj = ∅ for i = j.

England) . the probability that exactly one of two coins shows heads is 1 P({(H.. T ). USA). .20/10/1984 (Tallahassee. (H. ..1... (H. 2 Example 1...2. T ). Example 1. 2 Paul Adrien Maurice Dirac. for example. k Note that P describes the binomial distribution with parameter p on {0.e. n} if we identify Ak with the natural number k.. . Let us now discuss a typical example in which the σ–algebra F is not the set of all subsets of Ω. The system F is the system of observable sets of events since one can only observe how many channels are failing. εn ) : εi ∈ {0.. Hence Ak consists of all ω such that the communication rate is ρk. which yields to the communication rate ρk. (T.. but not which channel fails.. ρ. H).5 Assume that there are n communication channels between the points A and B.2. The measure P is given by n n−k P(Ak ) := p (1 − p)k . Florida. 0 < p < 1. (T. 08/08/1902 (Bristol. . Ω = {(T. 4 That means. T ). H)} and F = 2Ω . i. Nobelprice in Physics 1933. 1}) with the interpretation: εi = 0 if channel i is failing. PROBABILITY MEASURES (b) If we assume to have two coins. . What is the right model for this? We use Ω := {ω = (ε1 . 0 : x0 ∈ A (b) Counting measure: Let Ω := {ω1 .4 [Dirac and counting measure] 2 (a) Dirac measure: For F = 2Ω and a ﬁxed x0 ∈ Ω we let δx0 (A) := 1 : x0 ∈ A . ωN } and F = 2Ω . F consists of all unions of Ak := {ω ∈ Ω : ε1 + · · · + εn = k} .. in case k channels are used. Each of the channels fails with probability p. H)}) = . then the intuitive assumption ’fair’ leads to 15 1 P({ω}) := . nρ}. εi = 1 if channel i is working. Each of the channels has a communication rate of ρ > 0 (say ρ bits per second).2. so that we have a random communication rate R ∈ {0.. Then µ(A) := cardinality of A.

∈ F then P ( ∞ i=1 n i=1 Ai ) = Ai ) ≤ ∞ i=1 P (Ai ). then ∞ n→∞ lim P(An ) = P n=1 An . then P(B c ) = 1 − P(B). B ∈ F... so that n ∞ ∞ n P i=1 Ai =P i=1 Ai = i=1 P (Ai ) = i=1 P (Ai ) .. (1) We let An+1 = An+2 = · · · = ∅. B2 := A2 \A1 .. A2 .. then P(A\B) = P(A) − P(A ∩ B). .. A2 . because of P(∅) = 0. An ∈ F such that Ai ∩ Aj = ∅ if i = j. and get that ∞ ∞ Bn = n=1 n=1 An and Bi ∩ Bj = ∅ . PROBABILITY SPACES We continue with some basic properties of a probability measure. ∈ F such that A1 ⊇ A2 ⊇ A3 ⊇ · · · . Since the Bi ’s are disjoint and ∞ Ai = ∞ Bi it i=1 i=1 follows ∞ ∞ ∞ ∞ P i=1 Ai =P i=1 Bi = i=1 P(Bi ) ≤ i=1 P(Ai ).. Proof.16 CHAPTER 1. (5) Continuity from below: If A1 .. B4 := A4 \A3 . . . then ∞ n→∞ lim P(An ) = P n=1 An . . (2) If A. we get that P(A ∩ B) + P(A\B) = P ((A ∩ B) ∪ (A\B)) = P(A). 3. . Then the following assertions are true: (1) If A1 . (4) Put B1 := A1 and Bi := Ac ∩Ac ∩· · ·∩Ac ∩Ai for i = 2. P) be a probability space. (6) Continuity from above: If A1 . ... Proposition 1. (4) If A1 . .6 Let (Ω. (3) We apply (2) to A = Ω and observe that Ω\B = B c by deﬁnition and Ω ∩ B = B. 1 2 i−1 P(Bi ) ≤ P(Ai ) for all i. F. Obviously. . (5) We deﬁne B1 := A1 . B3 := A3 \A2 . A2 . ∈ F such that A1 ⊆ A2 ⊆ A3 ⊆ · · · .. (3) If B ∈ F.2. then P ( n i=1 P (Ai ). (2) Since (A ∩ B) ∩ (A\B) = ∅.

2. A2 .2. then limn an = a. . F) be a measurable space and A1 . occur. PROBABILITY MEASURES for i = j.. ∞ ∞ ∞ N 17 P n=1 An N n=1 =P n=1 Bn = n=1 P (Bn ) = lim N →∞ P (Bn ) = lim P(AN ) n=1 N →∞ since Bn = AN . does not converge.1. Deﬁnition 1. if lim inf n an = lim supn an = a ∈ R. (2) The value lim supn an is the supremum of all c such that there is a subsequence n1 < n2 < n3 < · · · such that limk ank = c.e. But the limit superior and the limit inferior for a n=1 given sequence of real numbers exists always. taking an = (−1)n . The deﬁnition above says that ω ∈ lim inf n An if and only if all events An . ∈ F.9 [lim inf n An and lim supn An ] Let (Ω... the limit of (an )∞ does not exist. 2. . and that ω ∈ lim supn An if and only if inﬁnitely many of the events An occur. ∈ R we let lim inf an := lim inf ak n n k≥n and lim sup an := lim sup ak . n n Moreover.7 [lim inf n an and lim supn an ] For a1 . Deﬁnition 1. . .. . n n k≥n Remark 1. a2 . i. Consequently. gives lim inf an = −1 and n lim sup an = 1. The sequence an = (−1)n . (6) is an exercise.8 (1) The value lim inf n an is the inﬁmum of all c such that there is a subsequence n1 < n2 < n3 < · · · such that limk ank = c. except a ﬁnite number of them. n As we will see below. Then ∞ ∞ ∞ ∞ lim inf An := n n=1 k=n Ak and lim sup An := n n=1 k=n Ak .2. n = 1. (3) By deﬁnition one has that −∞ ≤ lim inf an ≤ lim sup an ≤ ∞. (4) For example. .2. also for a sequence of sets one can deﬁne a limit superior and a limit inferior.

F. where I is an arbitrary non-empty index set.. n n n Examples for such systems are obtained by assuming A1 ⊆ A2 ⊆ A3 ⊆ · · · or A1 ⊇ A2 ⊇ A3 ⊇ · · · .2. . taking A and B with P(A ∩ B) = P(A)P(B) and C = ∅ gives P(A ∩ B ∩ C) = P(A)P(B)P(C).12 [independence of events] Let (Ω.. one can easily see that only demanding P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 ) P (A2 ) · · · P (An ) would not yield to an appropriate notion for the independence of A1 . . F. n n The proposition will be deduced from Proposition 3. are called independent. Then P(B|A) := P(B ∩ A) . 3 Pierre Joseph Louis Fatou.. . Independence can be also expressed through conditional probabilities. what we had in mind.6..10 [Lemma of Fatou] space and A1 . provided that for all distinct i1 . Then P lim inf An n n Let (Ω. Deﬁnition 1..2. Mandelbrot-set). Given A1 . An : for example...2.18 CHAPTER 1. is called conditional probability of B given A. A2 . A ∈ F with P(A) > 0.13 [conditional probability] Let (Ω. 28/02/1878-10/08/1929. P) be a probability space. An ∈ F. ∈ F.2. then the Lemma of Fatou gives that lim inf P (An ) = lim sup P (An ) = lim P (An ) = P (A) . Let us ﬁrst deﬁne them: Deﬁnition 1.11 If lim inf n An = lim supn An = A. Now we turn to the fundamental notion of independence. P) be a probability space. The events (Ai )i∈I ⊆ F.2. F.. P) be a probability ≤ lim inf P (An ) ≤ lim sup P (An ) ≤ P lim sup An ... which is surely not.. Remark 1. in ∈ I one has that P (Ai1 ∩ Ai2 ∩ · · · ∩ Ain ) = P (Ai1 ) P (Ai2 ) · · · P (Ain ) . . P(A) for B ∈ F. . PROBABILITY SPACES 3 Proposition 1. French mathematician (dynamical systems.

1702-17/04/1761. . Then P(Bj |A) = 4 n k=1 P(A|Bj )P(Bj ) . . Applying the above formula we get 0.95. PROBABILITY MEASURES 19 It is now obvious that A and B are independent if and only if P(B|A) = P(B).2. English mathematician.995 That means only 32% of the persons whose test results are positive actually have the disease. in fact. Bj ∈ F.323. = P(A|B)P(B) + P(A|B c )P(B c ) Example 1.005 + 0.15 [Bayes’ formula] 4 Assume A.95 × 0. 0.14 A laboratory blood test is 95% eﬀective in detecting a certain disease when it is. Hence we have P(A|B) = P(”a positive test result”|”person has the disease”) = 0. the test also yields a ”false positive” result for 1% of the healthy persons tested. . with 0 < P(B) < 1 and P(A) > 0. where Bi ∩ Bj = ∅ for i = j and P(A) > 0. A := ”the test result is positive”.01 × 0. where (A ∩ B) ∩ (A ∩ B c ) = ∅. . P(A|B c ) = 0. P(B|A) = Proposition 1. However. what is the probability a person has the disease given his test result is positive? We set B := ”the person has the disease”. If 0. Then A = (A ∩ B) ∪ (A ∩ B c ).2.005 ≈ 0.95 × 0. B ∈ F. An important place of the conditional probabilities in Statistics is guaranteed by Bayes’ formula.01. P(A|Bk )P(Bk ) Thomas Bayes. with Ω = n j=1 Bj . n. present.2.005. Before we formulate this formula in Proposition 1.5% of the population actually has the disease.15 we consider A. P(A) = P(A ∩ B) + P(A ∩ B c ) = P(A|B)P(B) + P(A|B c )P(B c ). This implies P(B|A) = P(B ∩ A) P(A|B)P(B) = P(A) P(A) P(A|B)P(B) .1. P(B) = 0. and therefore. P(Bj ) > 0 for j = 1.2. .

F.16 [Lemma of Borel-Cantelli] 5 Let (Ω.2. . .20 CHAPTER 1. (2) It holds that c ∞ ∞ lim sup An n = lim inf n Ac n = n=1 k=n Ac . the probabilities P(Bj ) the prior probabilities (or a priori probabilities). then P (lim supn→∞ An ) = 0. By Ak ⊆ k=n+1 k=n Ak and the continuity of P from above (see Proposition 1.6. and the probabilities P(Bj |A) the posterior probabilities (or a posteriori probabilities) of Bj . then ∞ n=1 ∞ k=n Ak . n So. 20/12/1875-21/07/1966.. ∞ n=1 (2) If A1 . . we would need to show ∞ ∞ P n=1 k=n Ac n = 0. A2 . k=n where the last inequality follows again from Proposition 1. PROBABILITY SPACES The proof is an exercise. Now we continue with the fundamental Lemma of Borel-Cantelli. P) be a probability space and A1 ... A2 . are assumed to be independent and P (lim supn→∞ An ) = 1. Letting Bn := ∞ k=n Ac we get that B1 ⊆ B2 ⊆ B3 ⊆ · · · . An event Bj is also called hypothesis. Proof. Italian mathematician. Then one has the following: (1) If ∞ n=1 P(An ) < ∞.2.2. ∈ F..6) we get that ∞ ∞ P lim sup An n→∞ = P = ≤ lim P Ak n=1 k=n ∞ n→∞ Ak k=n ∞ n→∞ lim P (Ak ) = 0. Proposition 1. so that k ∞ ∞ P n=1 k=n 5 Ac n = lim P(Bn ) n→∞ Francesco Paolo Cantelli. (1) It holds by deﬁnition lim supn→∞ An = ∞ ∞ P(An ) = ∞.

Ac .N ≥n lim P k=n N Ac k P (Ac ) k = N →∞. N →∞. Since the independence of A1 . and ∞ ∞ ∞ i=1 6 Ai ∈ G. To overcome this diﬃculty. Assume that P0 : G → [0. 6 Constantin Carath´odory.N ≥n lim (1 − pk ) k=n N N →∞.1.N ≥n lim k=n N = ≤ = N →∞.2. A2 . to prove existence and uniqueness of measures may sometimes be diﬃcult. .. 13/09/1873 (Berlin. Ai ∩ Aj = ∅ for i = j. one only knows their existence.2. ∈ G. . then P0 i=1 Ai = i=1 P0 (Ai ). (2) If A1 .N ≥n P − ∞ pk k=n lim e Although the deﬁnition of a measure is not diﬃcult.. .N ≥n lim e−pk k=n P − N pk k=n = e = e−∞ = 0 where we have used that 1 − x ≤ e−x . the σ-algebras are not constructed explicitly. implies the independence of Ac .17 [Caratheodory’s extension theorem] Let Ω be a non-empty set and G an algebra on Ω such that F := σ(G). The problem lies in the fact that. ∞) satisﬁes: (1) P0 (Ω) < ∞. A2 . one usually exploits ´ Proposition 1.. . Gere many)..02/02/1950 (Munich. Germany) .. PROBABILITY MEASURES so that it suﬃces to show ∞ 21 P(Bn ) = P k=n Ac k = 0. 1 2 we ﬁnally get (setting pn := P(An )) that ∞ N P k=n Ac k = N →∞.. in general..

Ak ∈ F2 . . F2 .1). P1 ) and (Ω2 .17. 1] by n with A1 ∈ F1 . ω2 ∈ A2 } (3) As algebra G we take all sets of type A := A1 × A1 ∪ · · · ∪ (An × An ) 1 2 1 2 with Ak ∈ F1 . N1 } × {1. Proof. We do this as follows: (1) Ω1 × Ω2 := {(ω1 . and (Ai × Ai ) ∩ Aj × Aj = ∅ for i = j.. As an application we construct (more or less without rigorous proof) the product space (Ω1 × Ω2 .18 The system G is an algebra. . PROBABILITY SPACES Then there exists a unique ﬁnite measure P on F such that P(A) = P0 (A) for all A ∈ G.. F1 . we deﬁne P0 : G → [0.s)∈I r s (C1 × C2 ) for some I ⊆ {1. Proof..2. ω2 ) : ω1 ∈ A1 . .22 CHAPTER 1. . respec1 2 k=1 l=1 N N 1 1 tively. The map P0 : G → [0.. (i) Assume 1 1 m m A = A1 × A1 ∪ · · · ∪ (An × An ) = B1 × B2 ∪ · · · ∪ (B1 × B2 ) 1 2 1 2 l l where the (Ak × Ak )n and the (B1 × B2 )m are pair-wise disjoint.s)∈I r s P1 (C1 )P2 (C2 ) = l=1 l l P1 (B1 )P2 (B2 ). 1 2 1 2 1 2 Finally. By drawing a picture and using that P1 and P2 are measures one observes that n m P1 (Ak )P2 (Ak ) = 1 2 k=1 (r. 1 2 Proposition 1. r = 1. F1 ⊗ F2 ..2.... C1 1 of Ω1 and C2 . P0 A1 1 × A1 2 ∪ ··· ∪ (An 1 × An ) 2 := k=1 P1 (Ak )P2 (Ak ). ω2 ) : ω1 ∈ Ω1 . Ni . A2 ∈ F2 . N2 }.. ... .. ω2 ∈ Ω2 }. See [3] (Theorem 3.. C2 2 of Ω2 so that Ak i and Bil can be represented as disjoint unions of the sets Cir . (2) F1 ⊗ F2 is the smallest σ-algebra on Ω1 × Ω2 which contains all sets of type A1 × A2 := {(ω1 .. 1] is correctly deﬁned and satisﬁes the assumptions of Carath´odory’s extension e theorem Proposition 1. Hence there is a representation A= (r.. P1 × P2 ) of two probability spaces (Ω1 . We ﬁnd partitions C1 . P2 )..

1 2 Since the inequality N P1 (A1 )P2 (A2 ) ≥ k=1 P1 (Ak )P2 (Ak ) 1 2 can be easily seen for all N = 1. so that 0 ≤ ϕN (ω1 ) ↑N ϕ(ω1 ) = 1 A1 (ω1 )P2 (A2 ).. PROBABILITY MEASURES 23 (ii) To check that P0 is σ-additive on G it is suﬃcient to prove the following: For Ai .2. The probability space (Ω1 × Ω2 .1.19 [product of probability spaces] The extension of P0 to F1 ⊗ F2 according to Proposition 1. . Let ε ∈ (0. F1 ⊗ F2 . 2.2.17 is called product measure and denoted by P1 × P2 . we are done. 1 2 Since ε ∈ (0. 1) I and N Bε := {ω1 ∈ Ω1 : (1 − ε)P2 (A2 ) ≤ ϕN (ω1 )} ∈ F1 .. Ak ∈ Fi with i ∞ A1 × A2 = k=1 (Ak × Ak ) 1 2 and (Ak × Ak ) ∩ (Al × Al ) = ∅ for k = l one has that 1 2 1 2 ∞ P1 (A1 )P2 (A2 ) = k=1 P1 (Ak )P2 (Ak ). P1 × P2 ) is called product probability space.2. 1 2 We let ∞ N ϕ(ω1 ) := n=1 1 I {ω1 ∈An } 1 P2 (An ) 2 and ϕN (ω1 ) := n=1 1 {ω1 ∈An } P2 (An ) I 2 1 for N ≥ 1. Deﬁnition 1. by an argument like in step (i) we concentrate ourselves on the converse inequality ∞ P1 (A1 )P2 (A2 ) ≤ k=1 P1 (Ak )P2 (Ak ). N N Because (1 − ε)P2 (A2 ) ≤ ϕN (ω1 ) for all ω1 ∈ Bε one gets (after some calcuN N lation.. 1) was arbitrary.) (1 − ε)P2 (A2 )P1 (Bε ) ≤ k=1 P1 (Ak )P2 (Ak ) and therefore 1 2 N N lim(1 − ε)P1 (Bε )P2 (A2 ) ≤ lim N N k=1 ∞ P1 (Ak )P2 (Ak ) = 1 2 k=1 P1 (Ak )P2 (Ak ). N The sets Bε are non-decreasing in N and ∞ N =1 N Bε = A1 so that N (1 − ε)P1 (A1 )P2 (A2 ) = lim(1 − ε)P1 (Bε )P2 (A2 ). ..

Deﬁnition 1. Here the interested reader is referred. yn ) ∈ Rn . ..23 Let (Ω..22 [π-system] A system G of subsets A ⊆ Ω is called πsystem.2. 2..2. Now we deﬁne the the Borel σ-algebra on Rn . bn ) : a1 < b1 . Then P1 (B) = P2 (B) for all B ∈ F. Assume two probability measures P1 and P2 on F such that P1 (A) = P2 (A) for all A ∈ G... F1 ⊗ F2 ⊗ F3 . b1 ) × · · · × (an . where G is a π-system. to [5] where a special case is considered. . If one is only interested in the uniqueness of measures one can also use the ´ following approach as a replacement of Caratheodory’s extension theorem: Deﬁnition 1. . P1 × P2 × · · · × Pn ) by iteration. P1 × P2 × P3 ). provided that A∩B ∈G for all A.. b) : −∞ < a < b < ∞} ∪ {∅}. As in the case n = 1 one can show that B(Rn ) is the smallest σ-algebra which contains all open subsets of Rn . Regarding the above product spaces there is Proposition 1.. Letting |x − y| := ( n |xk − yk |2 ) 2 for x = (x1 .20 For n ∈ {1. take for example the π-system {(a.2. B ∈ G.24 One can prove that CHAPTER 1. for example. xn ) ∈ Rn and y = k=1 (y1 .... PROBABILITY SPACES (F1 ⊗ F2 ) ⊗ F3 = F1 ⊗ (F2 ⊗ F3 ) and (P1 × P2 ) × P3 = P1 × (P2 × P3 ). Iterating this procedure. F) be a measurable space with F = σ(G).. which we simply denote by (Ω1 × Ω2 × Ω3 . we can deﬁne ﬁnite products (Ω1 × Ω2 × · · · × Ωn . The case of inﬁnite products requires more work. we say that a set A ⊆ Rn is open provided that for all x ∈ A there is an ε > 0 such that {y ∈ Rn : |x − y| < ε} ⊆ A.2. 1 Any algebra is a π-system but a π-system is not an algebra in general. .} we let B(Rn ) := σ ((a1 . .21 B(Rn ) = B(R) ⊗ · · · ⊗ B(R). an < bn ) . Proposition 1. F1 ⊗ F2 ⊗ · · · ⊗ Fn .

p (B) := n k=0 k p (1 − p) measure introduced in Example 1.3.3. 1.25/04/1840 (Sceaux.. 1.. we get the following model: at day 0 the probability of breaking down is p. .3. that within n trials one has k-times heads. where δk is the Dirac (3) P(B) = µn. 1). 1...}.1 Examples of distributions Binomial distribution with parameter 0 < p < 1 (1) Ω := {0.p ({k}) equals the probability.1. . Interpretation: The probability that an electric light bulb breaks down is p ∈ (0. (2) F := 2Ω (system of all subsets of Ω).3. So.2 Poisson distribution with parameter λ > 0 (1) Ω := {0. k=0 e k! k The Poisson distribution 7 is used. 3. Then µn.. n k n−k δk (B).4.. EXAMPLES OF DISTRIBUTIONS 25 1. 1.. e . (2) F := 2Ω (system of all subsets of Ω). such that one has heads with probability p and tails with probability 1 − p.3 Geometric distribution with parameter 0 < p < 1 (1) Ω := {0. 21/06/1781 (Pithiviers.}. (3) P(B) = πλ (B) := ∞ −λ λk δ (B). n}. 2. 3. (2) F := 2Ω (system of all subsets of Ω). 7 Sim´on Denis Poisson. that means the break down is independent of the time the bulb has been already switched on. The bulb does not have a ”memory”. France) . If the bulb survives day 0. it breaks down again with probability p at the ﬁrst day so that the total probability of a break down at day 1 is (1 − p)p. 2. Interpretation: Coin-tossing with one coin.2.3 1. to model stochastic processes with a continuous time parameter and jumps: the probability that the process jumps k times between the time-points s and t with 0 ≤ s < t < ∞ is equal to πλ(t−s) ({k}). (3) P(B) = µp (B) := ∞ k=0 (1 − p)k pδk (B). France). . 1. for example. If we continue in this way we get that breaking down at day k has the probability (1 − p)k p.

3. (3) As generating algebra G for B((a. we ﬁrst construct the Lebesgue measure on the intervals (a. After a standard reduction (check!) we have to show the following: given 0 ≤ a ≤ b ≤ 1 and pair-wise disjoint intervals (an . PROBABILITY SPACES 1. bn ] ∪ bn .17. b]) := σ((c. (2) F = B((a. b] with −∞ < a < b < ∞. 2n Hence we have an open covering of a compact set and there is a ﬁnite subcover: ε [a + ε. bn ] we have that b − a = ∞ n=1 (bn − an ).26 CHAPTER 1. b].3. For such a set A we let 1 P0 (A) := b−a n (bi − ai ).1 The system G is an algebra. bn + ε . bn ] with ∞ (a. b] ⊆ (an . b] ⊆ n=1 an . b2 ] ∪ · · · ∪ (an . b] = n=1 (an . b − a) and observe that ∞ [a + ε. The map P0 : G → [0. so that ∞ ε 2n is at b−a−ε≤ n∈I(ε) (bn − an ) + ε ≤ n=1 (bn − an ) + ε. b1 ] ∪ (a2 . b] such that A can be written as A = (a1 . Let ε ∈ (0. d] : a ≤ c < d ≤ b). The total length of the intervals bn . bn + most ε > 0. For notational simplicity we let a = 0 and b = 1. bn ] where a ≤ a1 ≤ b1 ≤ · · · ≤ an ≤ bn ≤ b. b]) we take the system of subsets A ⊆ (a. bn + n 2 n∈I(ε) for some ﬁnite set I(ε). .2. Proof. i=1 Proposition 1.4 Lebesgue measure and uniform distribution ´ Using Caratheodory’s extension theorem. For this purpose we let (1) Ω := (a. 1] is correctly deﬁned and satisﬁes the assumptions of Carath´odory’s extension e theorem Proposition 1.

Now we can go backwards: In order to obtain the Lebesgue measure on a set I ⊆ R with I ∈ B(R) we let BI := {B ⊆ I : B ∈ B(R)} and λI (B) := λ(B) for B ∈ BI . b]. continuation of work of Emile Borel and Camille Jordan). then λI /λ(I) is the uniform distribution on I. Important cases for I are the closed intervals [a. d]) = b−a for a ≤ c < d ≤ b.2. French mathematician (generalized e the Riemann integral by the Lebesgue integral.1. 8 Henri L´on Lebesgue. . EXAMPLES OF DISTRIBUTIONS Letting ε ↓ 0 we arrive at ∞ 27 b−a≤ n=1 (bn − an ). b]) according to Proposition 1. Furthermore. Deﬁnition 1. λ is the unique σ-ﬁnite measure on B(R) such that λ((c. the opposite inequality is obvious. We can also write λ(B) = B dλ(x).2 [Uniform distribution] The unique extension P of P0 to B((a. b]) (check!). b]) (check!).3. Then we can proceed as follows: Deﬁnition 1. Given that λ(I) > 0.3. d]) = d − c for all −∞ < c < d < ∞. for −∞ < a < b < ∞ we have that B(a. b]) such that P((c.17 is called uniform distribution on (a. 28/06/1875-26/07/1941. b] ∈ B((a. To get the Lebesgue measure on R we observe that B ∈ B(R) implies that B ∩ (a.b] = B((a. n]) where Pn is the uniform distribution on (n − 1. Since b − a ≥ N n=1 (bn − an ) for all N ≥ 1.3. n].3 [Lebesgue measure] Lebesgue measure on R by ∞ 8 Given B ∈ B(R) we deﬁne the λ(B) := n=−∞ Pn (B ∩ (n − 1. d−c Hence P is the unique measure on B((a. b]. Accordingly.

σ2 is called Gaussian distribution 10 (normal distribution) with mean m and variance σ 2 .3.1.D.5. 9 Georg Friedrich Bernhard Riemann. but compare with Proposition 3. 1.σ2 (x) := √ 1 2πσ 2 e− (x−m)2 2σ 2 . so that we can extend P0 to a probability measure Nm. (2) F := B(R) Borel σ-algebra. The function pm.4 and deﬁne n bi P0 (A) := i=1 ai √ 1 2πσ 2 e− (x−m)2 2σ 2 dx for A := (a1 .17. n bi ai P0 (A) := i=1 pλ (x)dx with pλ (x) := 1 [0. PROBABILITY SPACES 1.17.28 CHAPTER 1.2.23/02/1855 (G¨ttingen.σ2 (A) = A pm. One can show (we do not do this here. P0 satisﬁes the assumptions of Proposition 1. 17/09/1826 (Germany) . o .3. The measure Nm. b1 ]∪(a2 . b2 ]∪· · ·∪(an .σ2 (x) is called Gaussian density. Germany) . Hannover. so that we can extend P0 to the exponential distribution µλ with parameter λ and density pλ (x) on B(R).5 Gaussian distribution on R with mean m ∈ R and variance σ 2 > 0 (1) Ω := R. 10 Johann Carl Friedrich Gauss. via the Riemann-integral.σ2 (x)dx with pm. (3) For A and G as in Subsection 1.∞) (x)λe−λx I Again.2. thesis under Gauss. 30/04/1777 (Brunswick.5 we deﬁne. Ph.3. Given A ∈ B(R) we write Nm. Germany).8 below) that P0 satisﬁes the assumptions of Proposition 1.6 Exponential distribution on R with parameter λ>0 (1) Ω := R.σ2 on B(R). (3) We take the algebra G considered in Example 1. bn ] where we consider the Riemannintegral 9 on the right-hand side. (2) F := B(R) Borel σ-algebra.20/07/1866 (Italy).

3. ∞)|[a.3. Proposition 1. Example 1. . 1). . that a customer will spend more than 15 minutes? (b) What is the probability. that a customer will spend more than 15 minutes from the beginning in the post oﬃce. 1 For (b) we get 1. ∞)|[a. EXAMPLES OF DISTRIBUTIONS Given A ∈ B(R) we write µλ (A) = A 29 pλ (x)dx... ∞)) = e−15 10 ≈ 0. ∞)) = e−5 10 ≈ 0. . n = 1. Then. we see that the distribution does not have a memory in the sense that for a. and assume that npn → λ as n → ∞. . ∞)) with the conditional probability on the left-hand side. it holds µλ ([a + b. n → ∞. b ≥ 0 we have µλ ([a + b. ∞) ∩ [a. 2. for all k = 0. ∞)|[10. ∞)) = = = µλ ([a + b. pn ∈ (0. given that the customer already spent at least 10 minutes? The answer for (a) is µλ ([15.3. The exponential distribution can be considered as a continuous time version of the geometric distribution. ∞)) ∞ −λx λ a+b e dx λ e ∞ −λx e dx a −λ(a+b) e−λa = µλ ([b. .. 1 µλ ([15. . In particular. (a) What is the probability. Indeed.1.5 [Poisson’s Theorem] Let λ > 0.3. ∞)) = µλ ([b.7 Poisson’s Theorem For large n and small p the Poisson distribution provides a good approximation for the binomial distribution.604.pn ({k}) → πλ ({k}). 1. µn.4 Suppose that the amount of time one spends in a post oﬃce 1 is exponential distributed with λ = 10 .220. ∞)). ∞)) µλ ([a. In words: the probability of a realization larger or equal to a+b under the condition that one has already a value larger or equal a is the same as having a realization larger or equal b. ∞)) = µλ ([5.

In the same way we get that lim λ + εn 1− n n−k n→∞ ≤ e−(λ−ε0 ) .. .30 CHAPTER 1. limn→∞ (npn )k = λk and limn→∞ n(n−1). Using l’Hospital’s rule we get lim ln 1 − λ + ε0 n n−k n→∞ = = n→∞ lim (n − k) ln 1 − lim λ + ε0 n ln 1 − λ+ε0 n n→∞ 1/(n − k) −1 λ+ε0 1 − λ+ε0 n n2 = lim 2 n→∞ −1/(n − k) = −(λ + ε0 ). . So we have nk to show that limn→∞ (1 − pn )n−k = e−λ . n→∞ Choose ε0 > 0 and n0 ≥ 1 such that |εn | ≤ ε0 for all n ≥ n0 . k k! n Of course. . Fix an integer k ≥ 0. since we can choose ε0 > 0 arbitrarily small lim (1 − pn )n−k = lim 1− λ + εn n n−k n→∞ n→∞ = e−λ .pn ({k}) = n k pn (1 − pn )n−k k n(n − 1) . (n − k + 1) = (npn )k (1 − pn )n−k . Then λ + ε0 1− n n−k ≤ λ + εn 1− n n−k ≤ λ − ε0 1− n n−k . Then µn. (n − k + 1) k = pn (1 − pn )n−k k! 1 n(n − 1) .(n−k+1) = 1.. By npn → λ we get that there exists a sequence εn such that npn = λ + εn with lim εn = 0. . PROBABILITY SPACES Proof. . Hence e−(λ+ε0 ) = lim 1− λ + ε0 n n−k n→∞ ≤ lim n→∞ 1− λ + εn n n−k . Finally.

A SET WHICH IS NOT A BOREL SET 31 1. .4.1. (3) x ∼ y and y ∼ z imply x ∼ z for all x. imply ∞ n=1 An ∈ L.2) where λ is the Lebesgue measure on (0. x+y x+y−1 if x + y ∈ (0. Lemma 1. A2 . y ∈ (0. then P ⊆ L implies σ(P) ⊆ L. 1]) and λ(A ⊕ x) = λ(A) for all x ∈ (0. 1] and A ⊆ (0. 1]) : A ⊕ x ∈ B((0.4. 1] otherwise . (3) A1 .1 [λ-system] A class L is a λ-system if (1) Ω ∈ L. 1] but not an element of B((0. B ∈ L and A ⊆ B imply B\A ∈ L. 1]) := {B = A ∩ (0. 1]. 1]} (1. (2) A. 1]. y ∈ X (symmetry). Now deﬁne L := {A ∈ B((0. 2. z ∈ X (transitivity).2) is a λ-system.4. n = 1.4.2 [π-λ-Theorem] If P is a π-system and L is a λsystem. · · · ∈ L and An ⊆ An+1 . Before we start we need Deﬁnition 1. y.4 The system L from (1. Proposition 1. . Given x. we also need the addition modulo one x ⊕ y := and A ⊕ x := {a ⊕ x : a ∈ A}. Deﬁnition 1.4 A set which is not a Borel set In this section we shall construct a set which is a subset of (0.4. (2) x ∼ y implies y ∼ x for all x. 1] : A ∈ B(R)} . .3 [equivalence relation] An relation ∼ on a set X is called equivalence relation if and only if (1) x ∼ x for all x ∈ X (reﬂexivity).

PROBABILITY SPACES Proof. Proposition 1. b] ∈ L. Finally. Then there is a function ϕ on I such that ϕ : α → m α ∈ Mα . so that λ(A ⊕ x) = λ(A) and λ(B ⊕ x) = λ(B). one can form a set by choosing of each set Mα a representative mα . Proof. 1]). 1]). We take the system L from (1. and therefore. Since λ is a probability measure it follows λ(B \ A) = = = = λ(B) − λ(A) λ(B ⊕ x) − λ(A ⊕ x) λ((B ⊕ x) \ (A ⊕ x)) λ((B \ A) ⊕ x) and B\A ∈ L. By the deﬁnition of ⊕ it is easy to see that A ⊆ B implies A ⊕ x ⊆ B ⊕ x and (B ⊕ x) \ (A ⊕ x) = (B \ A) ⊕ x. 1]) ⊆ L. Since P := {(a.4. We have to show that B \ A ∈ L.2). Property (3) is left as an exercise.4. Proposition 1. The property (1) is clear since Ω ⊕ x = Ω. 1]) it follows by the π-λ-Theorem 1. B ∈ L and A ⊆ B.5 [Axiom of choice] Let I be a non-empty set and (Mα )α∈I be a system of non-empty sets Mα .32 CHAPTER 1. 1] be consisting of exactly one representative point from each equivalence class (such set exists under the assumption of the axiom of . If (a. we need the axiom of choice. 1] which does not belong to B((0.4.6 There exists a subset H ⊆ (0. Let H ⊆ (0. (B \ A) ⊕ x ∈ B((0.2 that B((0. In other words. b] ⊆ [0. 1]. then (a. 1]. Let us deﬁne the equivalence relation x∼y if and only if x⊕r =y for some rational r ∈ (0. To check (2) let A. b] : 0 ≤ a < b ≤ 1} is a π-system which generates B((0.

1 = λ((0.4. 1]. 1]) ⊆ L implies H ⊕ r ∈ B((0. the right hand side can either be 0 (if a = 0) or ∞ (if a > 0).1.1] (H ⊕ r) = rational r∈(0. 1] = r∈(0. 1]). . This leads to a contradiction. then there would exist h1 ⊕ r1 ∈ (H ⊕ r1 ) and h2 ⊕ r2 ∈ (H ⊕ r2 ) with h1 ⊕ r1 = h2 ⊕ r2 . . So it follows that (0. Consequently. so H ∈ B((0.1] λ(H ⊕ r). rational By B((0. . Then H ⊕ r1 and H ⊕ r2 are disjoint for r1 = r2 : if they were not disjoint. 1]) ⊆ L we have λ(H ⊕ r) = λ(H) = a ≥ 0 for all rational numbers r ∈ (0. 1]) then B((0. 1]) and   λ((0. rational So. 1]) = r∈(0. 1] is the countable union of disjoint sets (0. But this implies h1 ∼ h2 and hence h1 = h2 and r1 = r2 . 1]) = λ  r∈(0.1] (H ⊕ r).1] λ(H ⊕ r) = a + a + . rational If we assume that H ∈ B((0. A SET WHICH IS NOT A BOREL SET 33 choice).

PROBABILITY SPACES .34 CHAPTER 1.

I 1 ∅ = 0. 2. αn ∈ R and A1 ..Chapter 2 Random variables Given a probability space (Ω. I where 1 Ai (ω) := I 1 : ω ∈ Ai . b)} ∈ F and hence to random variables we will introduce now. where a < b. This leads us to the condition {ω ∈ Ω : f (ω) ∈ (a.... F. F) be a measurable space.1 [(measurable) step-function] Let (Ω. I 1 A + 1 Ac = 1.. provided that there are α1 .. I I 35 . . b)}) . A function f : Ω → R is called measurable step-function or step-function. . in many stochastic models functions f : Ω → R which describe certain random phenomena are considered and one is interested in the computation of expressions like P ({ω ∈ Ω : f (ω) ∈ (a. P). 0 : ω ∈ Ai Some particular examples for step-functions are 1 Ω = 1.1. An ∈ F such that f can be written as n f (ω) = i=1 αi 1 Ai (ω). Deﬁnition 2.1 Random variables We start with the most simple random variables.

I I I I The deﬁnition above concerns only functions which take ﬁnitely many values. So we wish to extend this deﬁnition. as we see from Proposition 2.3. I I I 1 A∪B = 1 A + 1 B − 1 A∩B . b)) := {ω ∈ Ω : a < f (ω) < b} ∈ F. n→∞ Does our deﬁnition give what we would like to have? Yes. A map f : Ω → R is called random variable provided that there is a sequence (fn )∞ of measurable step-functions fn : Ω → R such that n=1 f (ω) = lim fn (ω) for all ω ∈ Ω.36 CHAPTER 2. b)) = {ω ∈ Ω : a ≤ f (ω) < b} ∞ = m=1 ω ∈Ω:a− 1 < f (ω) < b m ∈F so that we can use the step-functions 4n −1 fn (ω) := k=−4n k 1 k I k+1 (ω).3 Let (Ω. 2n { 2n ≤f < 2n } Sometimes the following proposition is useful which is closely connected to Proposition 2.1. F) be a measurable space. (1) =⇒ (2) Assume that f (ω) = lim fn (ω) n→∞ where fn : Ω → R are measurable step-functions. (2) =⇒ (1) First we observe that we also have that f −1 ([a. RANDOM VARIABLES 1 A∩B = 1 A 1 B . Then the following conditions are equivalent: (1) f is a random variable. b)) ∈ F so that f −1 ((a.1. which will be too restrictive in future. b)) = = m=1 N =1 n=N ω ∈ Ω : a < lim fn (ω) < b n ∞ ∞ ∞ ω ∈Ω:a+ 1 1 < fn (ω) < b − m m ∈ F.1. For a measurable stepfunction one has that −1 fn ((a.2 [random variables] Let (Ω. F) be a measurable space and let f : Ω → R be a function. . Deﬁnition 2. (2) For all −∞ < a < b < ∞ one has that f −1 ((a. Proof.

which is necessary in many considerations and even more natural. (2) We ﬁnd measurable step-functions fn .2. (3).2 Measurable maps Now we extend the notion of random variables to the notion of measurable maps.5 [properties of random variables] Let (Ω.1. Then f : Ω → R is a random variable. (2) (f g)(ω) := f (ω)g(ω) is a random-variable.1. Items (1). then (4) |f | is a random variable. n→∞ Finally. g : Ω → R random variables and α. 2. (3) If g(ω) = 0 for all ω ∈ Ω. since Ai ∩ Bj ∈ F. gn : Ω → R such that f (ω) = lim fn (ω) and g(ω) = lim gn (ω). Then the following is true: (1) (αf + βg)(ω) := αf (ω) + βg(ω) is a random variable. In fact. Proposition 2.4 Assume a measurable space (Ω.2. assuming that k l fn (ω) = i=1 αi 1 Ai (ω) and gn (ω) = I j=1 βj 1 Bj (ω). I yields k l k l (fn gn )(ω) = i=1 j=1 αi βj 1 Ai (ω)1 Bj (ω) = I I i=1 j=1 αi βj 1 Ai ∩Bj (ω) I and we again obtain a step-function. that fn (ω)gn (ω) is a measurable step-function. MEASURABLE MAPS 37 Proposition 2. n→∞ n→∞ Hence (f g)(ω) = lim fn (ω)gn (ω). and (4) are an exercise. β ∈ R. Proof. F) be a measurable space and f. F) and a sequence of random variables fn : Ω → R such that f (ω) := limn fn (ω) exists for all ω ∈ Ω. The proof is an exercise. we remark. f g (ω) := f (ω) g(ω) is a random variable. .

We show that A is a σ–algebra. · · · ∈ A. By deﬁnition of Σ = σ(Σ0 ) this implies that Σ ⊆ A. f −1 i=1 Bi = i=1 f −1 (Bi ) ∈ F. F) and (M. then ∞ ∞ for all B ∈ Σ0 . B(R))-measurable. A map f : Ω → M is called (F. Σ)-measurable. Proof of Proposition 2. which implies our lemma. F) and (M. Σ) be measurable spaces and let f : Ω → M . F) be a measurable space and f : Ω → R. (1) f −1 (M ) = Ω ∈ F implies that M ∈ A. Σ) be measurable spaces. RANDOM VARIABLES Deﬁnition 2. for all B ∈ Σ. provided that f −1 (B) = {ω ∈ Ω : f (ω) ∈ B} ∈ F for all B ∈ Σ. {ω : f (ω) ∈ B c } {ω : f (ω) ∈ B} / Ω \ {ω : f (ω) ∈ B} f −1 (B)c ∈ F. (2) =⇒ (1) follows from (a. b) : −∞ < a < b < ∞).2.3 Let (Ω. then f −1 (B c ) = = = = (3) If B1 . By assumption. Assume that Σ0 ⊆ Σ is a system of subsets such that σ(Σ0 ) = Σ.2. Deﬁne A := B ⊆ M : f −1 (B) ∈ F .3 since B(R) = σ((a.2. .2. For the proof we need Lemma 2.2. The connection to the random variables is given by Proposition 2. b) ∈ B(R) for a < b which implies that f −1 ((a. Σ0 ⊆ A.2. (1) =⇒ (2) is a consequence of Lemma 2. (2) If B ∈ A. b)) ∈ F. (2) The map f is (F. B2 .1 [measurable map] Let (Ω.38 CHAPTER 2. If f −1 (B) ∈ F then f −1 (B) ∈ F Proof. Then the following assertions are equivalent: (1) The map f is a random variable.2 Let (Ω.

2. Then Pf (B) := P ({ω ∈ Ω : f (ω) ∈ B}) is called the law or image measure of the random variable f . B([0. 1].p) (ω).2. then f is (B(R). MEASURABLE MAPS 39 Example 2. p)) = p. Assume that f : Ω1 → Ω2 is (F1 . Then P2 is a probability measure on F2 . (2) Assume that P1 is a probability measure on F1 and deﬁne P2 (B2 ) := P1 ({ω1 ∈ Ω1 : f (ω1 ) ∈ B2 }) . F2 ). F2 )-measurable and that g : Ω2 → Ω3 is (F2 . Assume the random number generator gives out the number x. Now we state some general properties of measurable maps. I Then it holds P2 ({1}) := P1 ({ω ∈ Ω : f (ω) = 1}) = λ([0. (Ω2 .2. (Ω3 . F3 ) be measurable spaces. ”output” would simulate the ﬂipping of an (unfair) coin.5 Let (Ω1 . P2 ({0}) := P1 ({ω1 ∈ Ω1 : f (ω1 ) = 0}) = λ([p. F3 )-measurable. 1]) = 1 − p. P) be a probability space and f : Ω → R be a random variable. p) and ”output” = ”tails” in case x ∈ [p. . Since the open intervals generate B(R) we can apply Lemma 2. B(R))measurable.3.p . F1 ). 1]. So we take the probability space ([0. Since f is continuous we know that f −1 ((a. Proposition 2. b)) is open for all −∞ < a < b < ∞. b)) ∈ B(R). 1]).2.4 If f : R → R is continuous. 1]. Example 2.2.2.2.6 We want to simulate the ﬂipping of an (unfair) coin by the random number generator: the random number generator of the computer gives us a number which has (a discrete) uniform distribution on [0.7 [law of a random variable] Let (Ω. or in other words. F. The proof is an exercise. so that f −1 ((a. Proof. Deﬁnition 2. Then the following is satisﬁed: (1) g ◦ f : Ω1 → Ω3 deﬁned by (g ◦ f )(ω1 ) := g(f (ω1 )) is (F1 . ”output” has binomial distribution µ1. λ) and deﬁne for p ∈ (0. If we would write a program such that ”output” = ”heads” in case x ∈ [0. F3 )-measurable. 1) the random variable f (ω) := 1 [0.

Proposition 2. F. (2) F1 (x) = P1 ((−∞.2. (ii) F is right-continuous: let x ∈ R and xn ↓ x.10 Assume that P1 and P2 are probability measures on B(R) and F1 and F2 are the corresponding distribution functions.2. Then the following assertions are equivalent: (1) P1 = P2 . P).9 [Properties of distribution-functions] The distribution-function Ff : R → [0.8 [distribution-function] Given a random variable f : Ω → R on a probability space (Ω.40 CHAPTER 2. Proof. the function Ff (x) := P({ω ∈ Ω : f (ω) ≤ x}) is called distribution function of f . (i) F is non-decreasing: given x1 < x2 one has that {ω ∈ Ω : f (ω) ≤ x1 } ⊆ {ω ∈ Ω : f (ω) ≤ x2 } and F (x1 ) = P({ω ∈ Ω : f (ω) ≤ x1 }) ≤ P({ω ∈ Ω : f (ω) ≤ x2 }) = F (x2 ). RANDOM VARIABLES The law of a random variable is completely characterized by its distribution function which we introduce now. Deﬁnition 2. x]) = P2 ((−∞. n (iii) The properties limx→−∞ F (x) = 0 and limx→∞ F (x) = 1 are an exercise. . Then F (x) = P({ω ∈ Ω : f (ω) ≤ x}) ∞ = P n=1 n {ω ∈ Ω : f (ω) ≤ xn } = lim P ({ω ∈ Ω : f (ω) ≤ xn }) = lim F (xn ). x]) = F2 (x) for all x ∈ R. 1] is a right-continuous non-decreasing function such that x→−∞ lim F (x) = 0 and x→∞ lim F (x) = 1. Proposition 2.2.

there exist measurable step functions (fn )∞ i. F) be a measurable space and f : Ω → R be a function.2.23. We consider (2) ⇒ (1): For sets of type A := (a.2. Then the following relations hold true: f −1 (A) ∈ F for all A ∈ G where G is one of the systems given in Proposition 1.1. Now one can apply Proposition 1.1. Lemma 2. b)) ∈ F for all − ∞ < a < b < ∞ . MEASURABLE MAPS 41 Proof. (1) ⇒ (2) is of course trivial.e.2. Summary: Let (Ω.2 f is a random variable i. Proposition 2.3 f is measurable: f −1 (A) ∈ F for all A ∈ B(R) Proposition 2. b] one can show that F1 (b) − F1 (a) = P1 (A) = P2 (A) = F2 (b) − F2 (a).3 f −1 ((a. n=1 I k fn = Nn an 1 An k=1 k with an ∈ R and An ∈ F such that k k fn (ω) → f (ω) for all ω ∈ Ω as n → ∞.8 or any other system such that σ(G) = B(R).e.2.2.

then the deﬁnition above is equivalent to Deﬁnition 2. The family (fi )i∈I is called independent provided that for all distinct i1 ..... be random variables where I is a non-empty index-set.3. . In case. Bn ∈ B(R) one has that P (fi1 ∈ B1 ... RANDOM VARIABLES 2. . i = 1. . that means for example I = {1. . .3. . . and all B1 .. .. . n = 1.. g2 (fn1 +1 (ω). fn ∈ Bn ) = P (f1 ∈ B1 ) · · · P (fn ∈ Bn ) ..}. fin ∈ Bn ) = P (fi1 ∈ B1 ) · · · P (fin ∈ Bn ) ... we have a ﬁnite index set I.. . 2. Then the following assertions are equivalent. P) be a probability space and fi : Ω → R. be random variables where I is a non-empty index-set. For the following we say that g : Rn → R is Borel-measurable (or a Borel function) provided that g is (B(Rn ). The proof is an exercise.. Sometimes we need to group independent random variables..4 [Grouping of independent random variables] Let fk : Ω → R.... and ni ∈ {1. 2.1 [independence of a family of random variables] Let (Ω. In this respect the following proposition turns out to be useful. .3. 2. 2. random variables.... .. Assume Borel functions gi : Rni → R for i = 1. fn1 +n2 (ω)).42 CHAPTER 2. (2) For all families (Bi )i∈I of Borel sets Bi ∈ B(R) one has that the events ({ω ∈ Ω : fi (ω) ∈ Bi })i∈I are independent. B(R))-measurable... Bn ∈ B(R) one has that P (f1 ∈ B1 . F. F. F. P) be a probability space and fi : Ω → R.2 [independence of a finite family of random variables] Let (Ω.3. ..... g3 (fn1 +n2 +1 (ω). fn1 +n2 +n3 (ω)). .3 Independence Let us ﬁrst start with the notion of a family of independent random variables. are independent... n... P) be a probability space and fi : Ω → R.. n}.. Then the random variables g1 (f1 (ω). . .. in ∈ I. 3. . i ∈ I. k = 1. . . The connection between the independence of random variables and of events is obvious: Proposition 2.. fn are called independent provided that for all B1 . The random variables f1 .3 Let (Ω. . . fn1 (ω)). . Proposition 2. (1) The family (fi )i∈I is independent. i ∈ I. Deﬁnition 2.. .. be independent random variables.

3. g) has a density which is the product of the densities of f and g. and Fg (x) = −∞ pg (y)dy for all x ∈ R (one says that the distribution-functions Ff and Fg are absolutely continuous with densities pf and pg .. In other words: the distribution-function of the random vector (f.) : xn ∈ R} . How to construct such sequences? First we let Ω := RN = {x = (x1 . g) ∈ B) = (Pf × Pg )(B) for all B ∈ B(R2 ). : Ω → R having a certain distribution. . . . g : Ω → R are random variables with laws Pf and Pg and distribution-functions Ff and Fg . 2. Then the following assertions are equivalent: (1) f and g are independent.g) (x.3.3. y) = −∞ −∞ pf (u)pg (v)d(u)d(v). that means πn ﬁlters out the n-th coordinate.5 [independence and product of laws] Assume that (Ω. respectively. f2 .. that means we take −1 B(RN ) = σ πn (B) : n = 1. g ≤ y) = Ff (x)Fg (y) for all x. respectively). pg : R → [0.. The proof is an exercise. x2 . ..6 Assume that there are Riemann-integrable functions pf .2. INDEPENDENCE 43 Proposition 2. P) is a probability space and that f. Remark 2. Often one needs the existence of sequences of independent random variables f1 .. F. Then we deﬁne the projections πn : RN → R given by πn (x) := xn .. Ff (x) = −∞ pf (y)dy. y ∈ R. (3) P(f ≤ x. ∞) such that pf (x)dx = R x R x pg (x)dx = 1. B ∈ B(R) . Then the independence of f and g is also equivalent to the representation x y F(f.. Now we take the smallest σ-algebra such that all these projections are random variables. (2) P ((f.

. RANDOM VARIABLES see Proposition 1. ... πn (x) ∈ Bn }) = P(B1 × B2 × · · · × Bn × R × R × · · · ) = P1 (B1 ) · · · Pn (Bn ) n = k=1 n P(R × · · · × R × Bk × R × · · · ) P({x : πk (x) ∈ Bk }).. Finally. Then P({x : π1 (x) ∈ B1 . B(RN ). where B1 × B2 × · · · × Bn × R × R × · · · := x ∈ RN : x1 ∈ B1 ...... Bn ∈ B(R). 2. ..1. Bn ∈ B(R). .17) we ﬁnd an unique probability measure P on B(RN ) such that P(B1 × B2 × · · · × Bn × R × R × · · · ) = P1 (B1 ) · · · Pn (Bn ) for all n = 1.. Take Borel sets B1 ... . Then (πn )∞ n=1 is a sequence of independent random variables such that the law of πn is Pn . Proof. that means P(πn ∈ B) = Pn (B) for all B ∈ B(R). P) and πn : RN → R be deﬁned as above. . and B1 . P2 .6.7 [Realization of independent random variables] Let (RN . Using Caratheodory’s extension theorem (Proposition 1. k=1 = .3.44 CHAPTER 2. . xn ∈ Bn .2. be a sequence of probability ´ measures on B(R)... let P1 . Proposition 2..

1.1 [step one. this is not the case as shown by Lemma 3. P) and a random variable f : Ω → R. we deﬁne the expectation or integral Ef = Ω f dP = Ω f (ω)dP(ω) and investigate its basic properties.1.Chapter 3 Integration Given a probability space (Ω. F. P) and an F-measurable g : Ω → R with representation n g= i=1 αi 1 Ai I where αi ∈ R and Ai ∈ F. F. We have to check that the deﬁnition is correct.1 Deﬁnition of the expected value The deﬁnition of the integral is done within three steps. we let n Eg = Ω gdP = Ω g(ω)dP(ω) := i=1 αi P(Ai ). 45 .2 Assuming measurable step-functions n m g= i=1 αi 1 Ai = I j=1 m j=1 βj 1 Bj . since it might be that diﬀerent representations give diﬀerent expected values Eg. 3. f is a step-function] Given a probability space (Ω. However. I one has that n i=1 αi P(Ai ) = βj P(Bj ). Deﬁnition 3.

Proof. Proposition 3. From this we get that n n N N αi P(Ai ) = i=1 i=1 j∈Ii αi P(Cj ) = j=1 i:j∈Ii αi P(Cj ) = j=1 γj P(Cj ) = 0. I one has that αf + βg = α αi 1 Ai + β I i=1 j=1 βj 1 Bj I and n m E(αf + βg) = α i=1 αi P(Ai ) + β j=1 βj P(Bj ) = αEf + βEg.. .46 CHAPTER 3.. (b) N j=1 Cj = Ω. .. The proof follows immediately from Lemma 3.. By subtracting in both equations the right-hand side from the lefthand one we only need to show that n αi 1 Ai = 0 I i=1 implies that n αi P(Ai ) = 0. INTEGRATION Proof. β ∈ R one has that E (αf + βg) = αEf + βEg. Given α. for n m f= i=1 αi 1 Ai I n and g= j=1 m βj 1 Bj . P) be a probability space and f. N 0= i=1 αi 1 Ai = I i=1 j∈Ii αi 1 Cj = I j=1 i:j∈Ii I αi 1 Cj = j=1 γj 1 Cj I so that γj = 0 if Cj = ∅. g : Ω → R be measurable step-functions. i=1 By taking all possible intersections of the sets Ai and by adding appropriate complements we ﬁnd a system of sets C1 . CN ∈ F such that (a) Cj ∩ Ck = ∅ if j = k.2 and the deﬁnition of the expected value of a step-function since.1... .3 Let (Ω. j∈Ii (c) for all Ai there is a set Ii ⊆ {1. F. N } such that Ai = Now we get that n n N Cj .1.

DEFINITION OF THE EXPECTED VALUE 47 Deﬁnition 3.1. f is general] Let (Ω. Then Ef = Ω f dP = Ω f (ω)dP(ω) := sup {Eg : 0 ≤ g(ω) ≤ f (ω). B(R))-measurable. as follows: Assume a function f : R → R which is (B(R).6 The fundamental Lebesgue-integral on the real line can be introduced by the means. n] for all n = 1. In Section 1. (3) If the expected value of f exists and A ∈ F. g is a measurable step-function} .4 we have introduced the Lebesgue measure λ = λn on (n − 1.3. Let fn : (n − 1. n=−∞ (n−1.3. In the last step we will deﬁne the expectation for a general random variable. F. Note that in this deﬁnition the case Ef = ∞ is allowed. Deﬁnition 3. we have to our disposal so far. (1) If Ef + < ∞ or Ef − < ∞. I The expression Ef is called expectation or expected value of the random variable f . f is non-negative] Given a probability space (Ω. P) be a probability space and f : Ω → R be a random variable. (2) The random variable f is called integrable provided that Ef + < ∞ and Ef − < ∞.n] . . and that ∞ |f (x)|dλ(x) < ∞. 2.n] . n] → R be the restriction of f which is a random variable with respect to B((n − 1.4 [step two.. To this end we decompose a random variable f : Ω → R into its positive and negative part f (ω) = f + (ω) − f − (ω) with f + (ω) := max {f (ω). P) and a random variable f : Ω → R with f (ω) ≥ 0 for all ω ∈ Ω. Remark 3. 0} ≥ 0. Assume that fn is integrable on (n − 1..1.1.1. n].5 [step three. n]) = B(n−1. ∞]. F. then we say that the expected value of f exists and set Ef := Ef + − Ef − ∈ [−∞. 0} ≥ 0 and f − (ω) := max {−f (ω). then f dP = A A f (ω)dP(ω) := Ω f (ω)1 A (ω)dP(ω).

. 6}. n=1 and the expected value exists if either f (ωn )qn < ∞ or {n:f (ωn )≥0} {n:f (ωn )≤0} (−f (ωn ))qn < ∞. . Now we go the opposite way: Given a Borel set I and a map f : I → R which is (BI . then it computes to ∞ Ef = n=1 f (ωn )qn ∈ [−∞. we can extend f to a (B(R). B(R)) measurable.}.e. which models rolling a die. ω2 . F := 2Ω . A simple example for the expectation is the expected value while rolling a die: 1 Example 3. Example 3. INTEGRATION then f : R → R is called integrable with respect to the Lebesgue-measure on the real line and the Lebesgue-integral is deﬁned by ∞ f (x)dλ(x) := R n=−∞ (n−1. . and P({ωn }) = qn ∈ [0.7 A basic example for our integration is as follows: Let Ω = {ω1 .. F := 2Ω . . 6 . then we deﬁne f (x)dλ(x) := I R f (x)dλ(x). B(R))-measurable function f by f (x) := f (x) if x ∈ I and f (x) := 0 if x ∈ I.48 CHAPTER 3. If we deﬁne f (k) = k. 1] with ∞ qn = 1. If the expected value exists..8 Assume that Ω := {1.1. Given n=1 f : Ω → R we get that f is integrable if and only if ∞ |f (ωn )|qn < ∞. 2.5. ∞]. .1. 6 f (k) := i=1 i1 {i} (k). and P({k}) := 6 . If f is Lebesgue-integrable.n] f (x)dλ(x). i. I then f is a measurable step-function and it follows that 6 Ef = i=1 iP({i}) = 1 + 2 + ··· + 6 = 3.

g : Ω → R.1. 49 Deﬁnition 3. that means any square integrable random variable is integrable.3. (2) The random variable f is integrable if and only if |f | is integrable. 1 3.2. F.10 (1) If f is integrable and α. Proposition 3. (2) If Ef 2 < ∞ then var(f) = Ef 2 − (Ef)2 < ∞. .2 Basic properties of the expected value We say that a property P(ω). (1) follows from var(αf − c) = E[(αf − c) − E(αf − c)]2 = E[αf − αEf]2 = α2 var(f). ∞] is called variance.2. (2) First we remark that E|f | ≤ (Ef 2 ) 2 as we shall see later by H¨lder’s ino equality (Corollary 3. P) and random variables f. then 0 ≤ Ef ≤ Eg. Let us summarize some simple properties: Proposition 3. Let us start with some ﬁrst properties of the expected value.1 Assume a probability space (Ω.) if {ω ∈ Ω : P(ω) holds} belongs to F and is of measure one. depending on ω.s. Proof. In this case one has |Ef | ≤ E|f |. the variance is often of interest. BASIC PROPERTIES OF THE EXPECTED VALUE Besides the expected value. then var(αf − c) = α2 var(f).9 [variance] Let (Ω. holds P-almost surely or almost surely (a. P) be a probability space and f : Ω → R be an integrable random variable. Then var(f) = σf2 = E[f − Ef]2 ∈ [0. F. (1) If 0 ≤ f (ω) ≤ g(ω).6).6.1. Then we simply get that var(f) = E[f − Ef]2 = Ef 2 − 2E(fEf) + (Ef)2 = Ef 2 − 2(Ef)2 + (Ef)2 . c ∈ R.

50 (3) If f = 0 a.s., then Ef = 0.

CHAPTER 3. INTEGRATION

(4) If f ≥ 0 a.s. and Ef = 0, then f = 0 a.s. (5) If f = g a.s. and Ef exists, then Eg exists and Ef = Eg. Proof. (1) follows directly from the deﬁnition. Property (2) can be seen as follows: by deﬁnition, the random variable f is integrable if and only if Ef + < ∞ and Ef − < ∞. Since ω ∈ Ω : f + (ω) = 0 ∩ ω ∈ Ω : f − (ω) = 0 = ∅ and since both sets are measurable, it follows that |f | = f + +f − is integrable if and only if f + and f − are integrable and that |Ef | = |Ef + − Ef − | ≤ Ef + + Ef − = E|f |. (3) If f = 0 a.s., then f + = 0 a.s. and f − = 0 a.s., so that we can restrict ourselves to the case f (ω) ≥ 0. If g is a measurable step-function with g = n ak 1 Ak , g(ω) ≥ 0, and g = 0 a.s., then ak = 0 implies P(Ak ) = 0. I k=1 Hence Ef = sup {Eg : 0 ≤ g ≤ f, g is a measurable step-function} = 0 since 0 ≤ g ≤ f implies g = 0 a.s. Properties (4) and (5) are exercises. The next lemma is useful later on. In this lemma we use, as an approximation for f , the so-called staircase-function. This idea was already exploited in the proof of Proposition 2.1.3. Lemma 3.2.2 Let (Ω, F, P) be a probability space and f : Ω → R be a random variable. (1) Then there exists a sequence of measurable step-functions fn : Ω → R such that, for all n = 1, 2, . . . and for all ω ∈ Ω, |fn (ω)| ≤ |fn+1 (ω)| ≤ |f (ω)| and f (ω) = lim fn (ω).
n→∞

If f (ω) ≥ 0 for all ω ∈ Ω, then one can arrange fn (ω) ≥ 0 for all ω ∈ Ω. (2) If f ≥ 0 and if (fn )∞ is a sequence of measurable step-functions with n=1 0 ≤ fn (ω) ↑ f (ω) for all ω ∈ Ω as n → ∞, then Ef = lim Efn .
n→∞

3.2. BASIC PROPERTIES OF THE EXPECTED VALUE Proof. (1) It is easy to verify that the staircase-functions
4n −1

51

fn (ω) :=
k=0

k k+1 1 k I 1 k I k+1 (ω) + k+1 (ω) n { 2n ≤f < 2n } 2 2n { 2n ≤f < 2n } k=−4n

−1

fulﬁll all the conditions. (2) Letting
4n −1 0 fn (ω) 0 fn (ω)

:=
k=0

k 1 k I k+1 (ω) 2n { 2n ≤f < 2n }

we get 0 ≤ ↑ f (ω) for all ω ∈ Ω. On the other hand, by the deﬁnition of the expectation there exists a sequence 0 ≤ gn (ω) ≤ f (ω) of measurable step-functions such that Egn ↑ Ef . Hence
0 hn := max fn , g1 , . . . , gn

is a measurable step-function with 0 ≤ gn (ω) ≤ hn (ω) ↑ f (ω), so that Egn ≤ Ehn ≤ Ef, and
n→∞

lim Egn = lim Ehn = Ef.
n→∞

Now we will show that for every sequence (fk )∞ of measurable stepk=1 functions with 0 ≤ fk ↑ f it holds limk→∞ Efk = Ef . Consider dk,n := fk ∧ hn . Clearly, dk,n ↑ fk as n → ∞ and dk,n ↑ hn as k → ∞. Let zk,n := arctan Edk,n so that 0 ≤ zk,n ≤ 1. Since (zk,n )∞ is increasing for ﬁxed n and (zk,n )∞ is n=1 k=1 increasing for ﬁxed k one quickly checks that lim lim zk,n = lim lim zk,n .
k n n k

Hence Ef = lim Ehn = lim lim Edk,n = lim lim Edk,n = lim Efk
n n k k n k

where we have used the following fact: if 0 ≤ ϕn (ω) ↑ ϕ(ω) for step-functions ϕn and ϕ, then lim Eϕn = Eϕ.
n

To check this, it is suﬃcient to assume that ϕ(ω) = 1 A (ω) for some A ∈ F. I Let ε ∈ (0, 1) and
n Bε := {ω ∈ A : 1 − ε ≤ ϕn (ω)} .

52 Then

CHAPTER 3. INTEGRATION

(1 − ε)1 Bε (ω) ≤ ϕn (ω) ≤ 1 A (ω). I n I
n n+1 n Since Bε ⊆ Bε and ∞ Bε = A we get, by the monotonicity of the n=1 n measure, that limn P(Bε ) = P(A) so that

(1 − ε)P(A) ≤ lim Eϕn .
n

Since this is true for all ε > 0 we get Eϕ = P(A) ≤ lim Eϕn ≤ Eϕ
n

and are done. Now we continue with some basic properties of the expectation. Proposition 3.2.3 [properties of the expectation] Let (Ω, F, P) be a probability space and f, g : Ω → R be random variables such that Ef and Eg exist. (1) If f ≥ 0 and g ≥ 0, then E(f + g) = Ef + Eg. (2) If c ∈ R, then E(cf ) exists and E(cf ) = cEf . (3) If Ef + + Eg + < ∞ or Ef − + Eg − < ∞, then E(f + g)+ < ∞ or E(f + g)− < ∞ and E(f + g) = Ef + Eg. (4) If f ≤ g, then Ef ≤ Eg. (5) If f and g are integrable and a, b ∈ R, then af + bg is integrable and aEf + bEg = E(af + bg). Proof. (1) Here we use Lemma 3.2.2 (2) by ﬁnding step-functions 0 ≤ fn (ω) ↑ f (ω) and 0 ≤ gn (ω) ↑ g(ω) such that 0 ≤ fn (ω) + gn (ω) ↑ f (ω) + g(ω) and E(f + g) = lim E(fn + gn ) = lim(Efn + Egn ) = Ef + Eg
n n

by Proposition 3.1.3. (2) is an exercise. (3) We only consider the case that Ef + + Eg + < ∞. Because of (f + g)+ ≤ f + + g + one gets that E(f + g)+ < ∞. Moreover, one quickly checks that (f + g)+ + f − + g − = f + + g + + (f + g)− so that Ef − +Eg − = ∞ if and only if E(f +g)− = ∞ if and only if Ef +Eg = E(f + g) = −∞. Assuming that Ef − + Eg − < ∞ gives that E(f + g)− < ∞ and E[(f + g)+ + f − + g − ] = E[f + + g + + (f + g)− ]

n→∞ Since hN is a step function for each N and hN ↑ f we have by Lemma 3. (2) If 0 ≥ fn (ω) ↓ f (ω) a. Setting hN := max fn. For 1 ≤ n ≤ N it holds that fn. fn ≤ h ≤ f. The inequality f ≤ g gives 0 ≤ f + ≤ g + and 0 ≤ g − ≤ f − so that f and g are integrable and Ef = Ef + − Ef − ≤ Eg + − Eg − = Eg.. For each fn take a sequence of step functions (fn. By deﬁnition. (5) Since (af + bg)+ ≤ |a||f | + |b||g| and (af + bg)− ≤ |a||f | + |b||g| we get that af + bg is integrable. .. : Ω → R be random variables. Proof. (a) First suppose 0 ≤ fn (ω) ↑ f (ω) for all ω ∈ Ω. Ef ≤ lim EfN .2 that limN →∞ EhN = Ef and therefore. then limn Efn = Ef .. Hence assume that Ef − < ∞ and Eg + < ∞.s.2.s. (4) If Ef − = ∞ or Eg + = ∞.k 1≤k≤N 1≤n≤N we get hN −1 ≤ hN ≤ max1≤n≤N fn = fN . by N → ∞. (b) Now let 0 ≤ fn (ω) ↑ f (ω) a.k ↑ fn . fn ≤ fn+1 ≤ f implies Efn ≤ Ef and hence n→∞ lim Efn ≤ Ef. BASIC PROPERTIES OF THE EXPECTED VALUE 53 which implies that E(f + g) = Ef + Eg because of (1). f1 . P) be a probability space and f. (1) If 0 ≤ fn (ω) ↑ f (ω) a.k )k≥1 such that 0 ≤ fn.N ≤ hN ≤ fN so that. F. N →∞ On the other hand. and therefore f = lim fn ≤ h ≤ f.. this means that 0 ≤ fn (ω) ↑ f (ω) for all ω ∈ Ω \ A.2. as k → ∞.s. then limn Efn = Ef . Deﬁne h := limN →∞ hN .3. Proposition 3. then Ef = −∞ or Eg = ∞ so that nothing is to prove. since hN ≤ fN .4 [monotone convergence] Let (Ω. . The equality for the expected values follows from (2) and (3). f2 .2.

54

CHAPTER 3. INTEGRATION

where P(A) = 0. Hence 0 ≤ fn (ω)1 Ac (ω) ↑ f (ω)1 Ac (ω) for all ω and step (a) I I implies that lim Efn 1 Ac = Ef 1 Ac . I I
n

Since fn 1 Ac = fn a.s. and f 1 Ac = f a.s. we get E(fn 1 Ac ) = Efn and I I I E(f 1 Ac ) = Ef by Proposition 3.2.1 (5). I (c) Assertion (2) follows from (1) since 0 ≥ fn ↓ f implies 0 ≤ −fn ↑ −f.

Corollary 3.2.5 Let (Ω, F, P) be a probability space and g, f, f1 , f2 , ... : Ω → R be random variables, where g is integrable. If (1) g(ω) ≤ fn (ω) ↑ f (ω) a.s. or (2) g(ω) ≥ fn (ω) ↓ f (ω) a.s., then limn→∞ Efn = Ef . Proof. We only consider (1). Let hn := fn − g and h := f − g. Then 0 ≤ hn (ω) ↑ h(ω) a.s.
− Proposition 3.2.4 implies that limn Ehn = Eh. Since fn and f − are integrable Proposition 3.2.3 (1) implies that Ehn = Efn − Eg and Eh = Ef − Eg so that we are done.

In Deﬁnition 1.2.7 we deﬁned lim supn→∞ an for a sequence (an )∞ ⊂ R. Natn=1 urally, one can extend this deﬁnition to a sequence (fn )∞ of random varin=1 ables: For any ω ∈ Ω lim supn→∞ fn (ω) and lim inf n→∞ fn (ω) exist. Hence lim supn→∞ fn and lim inf n→∞ fn are random variables. Proposition 3.2.6 [Lemma of Fatou] Let (Ω, F, P) be a probability space and g, f1 , f2 , ... : Ω → R be random variables with |fn (ω)| ≤ g(ω) a.s. Assume that g is integrable. Then lim sup fn and lim inf fn are integrable and one has that E lim inf fn ≤ lim inf Efn ≤ lim sup Efn ≤ E lim sup fn .
n→∞ n→∞ n→∞ n→∞

Proof. We only prove the ﬁrst inequality. The second one follows from the deﬁnition of lim sup and lim inf, the third one can be proved like the ﬁrst one. So we let Zk := inf fn
n≥k

so that Zk ↑ lim inf n fn and, a.s., |Zk | ≤ g and | lim inf fn | ≤ g.
n

3.2. BASIC PROPERTIES OF THE EXPECTED VALUE Applying monotone convergence in the form of Corollary 3.2.5 gives that E lim inf fn = lim EZk = lim E inf fn
n k k n≥k

55

≤ lim inf Efn
k n≥k

= lim inf Efn .
n

Proposition 3.2.7 [Lebesgue’s Theorem, dominated convergence] Let (Ω, F, P) be a probability space and g, f, f1 , f2 , ... : Ω → R be random variables with |fn (ω)| ≤ g(ω) a.s. Assume that g is integrable and that f (ω) = limn→∞ fn (ω) a.s. Then f is integrable and one has that Ef = lim Efn .
n

Proof. Applying Fatou’s Lemma gives Ef = E lim inf fn ≤ lim inf Efn ≤ lim sup Efn ≤ E lim sup fn = Ef.
n→∞ n→∞ n→∞ n→∞

Finally, we state a useful formula for independent random variables. Proposition 3.2.8 If f and g are independent and E|f | < ∞ and E|g| < ∞, then E|f g| < ∞ and Ef g = Ef Eg. The proof is an exercise. Concerning the variance of the sum of independent random variables we get the fundamental Proposition 3.2.9 Let f1 , ..., fn be independent random variables with ﬁnite second moment. Then one has that var(f1 + · · · + fn ) = var(f1 ) + · · · + var(fn ). Proof. The formula follows from var(f1 + · · · + fn ) = E((f1 + · · · + fn ) − E(f1 + · · · + fn ))2
n 2

= E
i=1 n

(fi − Efi ) (fi − Efi )(fj − Efj ) E ((fi − Efi )(fj − Efj ))

= E
i,j=1 n

=
i,j=1

56
n

CHAPTER 3. INTEGRATION =
i=1 n

E(fi − Efi )2 +
i=j

E ((fi − Efi )(fj − Efj ))

=
i=1 n

var(fi ) +
i=j

E(fi − Efi )E(fj − Efj )

=
i=1

var(fi )

because E(fi − Efi ) = Efi − Efi = 0.

3.3

Connections to the Riemann-integral

In two typical situations we formulate (without proof) how our expected value connects to the Riemann-integral. For this purpose we use the Lebesgue measure deﬁned in Section 1.3.4. Proposition 3.3.1 Let f : [0, 1] → R be a continuous function. Then
1

f (x)dx = Ef
0

with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space ([0, 1], B([0, 1]), λ), where λ is the Lebesgue measure, on the right-hand side.

Now we consider a continuous function p : R → [0, ∞) such that

p(x)dx = 1
−∞

and deﬁne a measure P on B(R) by
n bi

P((a1 , b1 ] ∩ · · · ∩ (an , bn ]) :=
i=1 ai

p(x)dx

for −∞ ≤ a1 ≤ b1 ≤ · · · ≤ an ≤ bn ≤ ∞ (again with the convention that (a, ∞] = (a, ∞)) via Carath´odory’s Theorem (Proposition 1.2.17). The e function p is called density of the measure P. Proposition 3.3.2 Let f : R → R be a continuous function such that

|f (x)|p(x)dx < ∞.
−∞

1]). Let us consider two examples indicating the diﬀerence between the Riemannintegral and our expected value.4 The expression t t→∞ lim 0 sin x π dx = x 2 is deﬁned as limit in the Riemann sense although ∞ 0 sin x x + ∞ dx = ∞ and 0 sin x x − dx = ∞. . Transporting this into a probabilistic setting we take the exponential distribution with parameter λ > 0 from Section 1. Example 3.6. The above yields that I t t→∞ lim f (x)pλ (x)dx = 0 π 2 but f (x)+ dµλ (x) = R R f (x)− dµλ (x) = ∞.3. B([0.3 We give the standard example of a function which has an expected value. x ∈ [0.∞) (x)λe−λx . 1] rational Then f is not Riemann integrable.3. Let f : R → R be given by f (x) = 0 if x ≤ 0 and f (x) := sin x eλx if x > 0 and recall that the λx exponential distribution µλ with parameter λ > 0 is given by the density pλ (x) = 1 [0. The point of this example is that the Riemann-integral takes more information into the account than the rather abstract expected value. but the Riemann-integral gives a way to deﬁne a value. but which is not Riemann-integrable. Hence the expected value of f does not exists. P) on the right-hand side. 1] irrational . 1]. Example 3.3. but Lebesgue integrable with Ef = 1 if we use the probability space ([0.3. B(R). λ). which makes sense. CONNECTIONS TO THE RIEMANN-INTEGRAL Then ∞ 57 f (x)p(x)dx = Ef −∞ with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space (R. Let f (x) := 1. 0.3. x ∈ [0.

P) be a probability space. Proposition 3. (iii) Assume now a sequence of measurable step-function 0 ≤ gn (η) ↑ g(η) for all η ∈ E which does exist according to Lemma 3. In other words. If we can show that gn (η)dPϕ (η) = E Ω gn (ϕ(ω))dP(ω) then we are done. Pϕ is the law of ϕ. In many cases. only by this formula it is possible to compute explicitly expected values. E) be a measurable space. and their values are equal. that means Pϕ (A) = P({ω : ϕ(ω) ∈ A}) = P(ϕ−1 (A)) Then g(η)dPϕ (η) = A ϕ−1 (A) for all A ∈ E. g(ϕ(ω))dP(ω) for all A ∈ E in the sense that if one integral exists. But now we get gn (η)dPϕ (η) = Pϕ (B) = P(ϕ−1 (B)) = E Ω 1 ϕ−1 (B) (ω)dP(ω) I gn (ϕ(ω))dP(ω).4. then one can multiply by real numbers and can take sums and the equality remains true). Proof. INTEGRATION 3. Ω = Ω 1 B (ϕ(ω))dP(ω) = I Let us give an example for the change of variable formula. (i) Letting g(η) := 1 A (η)g(η) we have I g(ϕ(ω)) = 1 ϕ−1 (A) (ω)g(ϕ(ω)) I so that it is suﬃcient to consider the case A = E.2 so that gn (ϕ(ω)) ↑ g(ϕ(ω)) for all ω ∈ Ω as well. (E.1 [Change of variables] Let (Ω. Assume that Pϕ is the image measure of P with respect to ϕ 1 . . the other exists as well.2. 1 In other words. By additivity it is enough to check gn (η) = 1 B (η) for some I B ∈ E (if this is true for this case.4 Change of variables in the expected value We want to prove a change of variable formula for the integrals Ω f dP. for f (ω) := g(ϕ(ω)) one has that f + = g + ◦ ϕ and f − = g − ◦ ϕ it is suﬃcient to consider the positive part of g and its negative part separately. E Ω (ii) Since. Hence we have to show that g(η)dPϕ (η) = g(ϕ(ω))dP(ω). and g : E → R be a random variable. we can assume that g(η) ≥ 0 for all η ∈ E.58 CHAPTER 3. F. ϕ : Ω → E be a measurable map.

Italy) . Then. B(R)) the expected value |x|n dµ(x) R is called n-th absolute moment of µ. .. 2.. Before we start with Fubini’s Theorem we need some preparations. 2.5.}. y ∈ L.4. In many cases this provides an appropriate tool for the computation of integrals. R R xn dµ(x) exists.. then Corollary 3. If xn dµ(x) is called n-th moment of µ. (3) There exists a 0 ∈ L such that x + 0 = x for all x ∈ L. USA). If the law Pf has a density p in the sense of Proposition 3. P) be a probability space and f : Ω → R be a random variable with law Pf . y. (4) For all x ∈ L there exists a −x such that x + (−x) = 0. for all n = 1. . .1 [vector space] A set L equipped with operations ” + ” : L×L → L and ”·” : R×L → L is called vector space over R if the following conditions are satisﬁed: (1) x + y = y + x for all x. (2) For a probability measure µ on (R. 2 Guido Fubini. E|f |n = R |x|n dPf (x) and Ef n = R xn dPf (x).2 [Moments] Assume that n ∈ {1.5. then Ef n is called n-th moment of f . and show in Fubini’s 2 Theorem that integrals with respect to product measures can be written as iterated integrals and that one can change the order of integration in these iterated integrals.5 Fubini’s Theorem In this section we consider iterated integrals..2.3.4. 59 (1) For a random variable f : Ω → R the expected value E|f |n is called n-th absolute moment of f . as they appear very often in applications.. then the other exists as well and they coincide. FUBINI’S THEOREM Deﬁnition 3.3 Let (Ω.3. F. 19/01/1879 (Venice. If Ef n exists. (2) x + (y + z) = (x + y) + z for all x. Deﬁnition 3. z ∈ L. First we recall the notion of a vector space.06/06/1943 (New York. where the latter equality has to be understood as follows: if one side exists. then R |x|n dPf (x) can be replaced by |x|n p(x)dx and R xn dPf (x) by R xn p(x)dx. R 3..

fn ≥ 0. for example. measurability assertions can be proved. F1 . Then one has the following: if H contains the indicator function of every set from some π-system I of subsets of Ω. Proposition 3. (7) (α + β)x = αx + βx for all α. we let (for example) f dP = lim Ω N →∞ [f ∧ N ]dP. then H contains every bounded σ(I)-measurable function on Ω. (8) α(x + y) = αx + αy for all α ∈ R and x. F2 . β ∈ R and x ∈ L. Proof.60 (5) 1 x = x.5. P1 ) and (Ω2 . For the following it is convenient to allow that the random variables may take inﬁnite values. INTEGRATION (6) α(βx) = (αβ)x for all α.2 [Monotone Class Theorem] Let H be a class of bounded functions from Ω into R satisfying the following conditions: (1) H is a vector space over R where the natural point-wise operations ”+” and ” · ” are used.5. Usually one uses the notation x − y := x + (−y) and −x + y := (−x) + y etc. I (3) If fn ∈ H.2. we recall that the product space (Ω1 × Ω2 . (2) 1 Ω ∈ H. Now we state the Monotone Class Theorem. where f is bounded on Ω. . y ∈ L. See for example [5] (Theorem 3. P2 ) was deﬁned in Deﬁnition 1. Deﬁnition 3. If we have a non-negative extended random variable. Ω For the following. P1 × P2 ) of the two probability spaces (Ω1 . CHAPTER 3. It is a powerful tool by which. F) be a measurable space. F1 ⊗ F2 . ∞} is called extended random variable iﬀ f −1 (B) := {ω : f (ω) ∈ B} ∈ F for all B ∈ B(R) or B = {−∞}. A function f : Ω → R ∪ {−∞.3 [extended random variable] Let (Ω. and fn ↑ f .19.14). then f ∈ H. β ∈ R and x ∈ L.

3.5. random variables. ω2 )dP1 (ω1 ) dP2 (ω2 ). Ω2 Ω1 = It should be noted.4 to get to values of the integrals. ω2 )dP1 (ω1 ) are extended F1 -measurable and F2 -measurable. for all ωi ∈ Ωi . ω2 )dP1 (ω1 ) = ∞ =0 and P1 ω1 : Ω2 f (ω1 .ω2 f (ω1 . ω2 ) < ∞. ω2 ) are F1 -measurable 0 and F2 -measurable. (i) First we remark it is suﬃcient to prove the assertions for fN (ω1 . respectively. ω2 )dP2 (ω2 ) = ∞ = 0. Proof of Proposition 3. (3) One has that f (ω1 .4. ω2 )d(P1 × P2 ) = Ω1 ×Ω2 Ω1 Ω2 f (ω1 .4 [Fubini’s Theorem for non-negative functions] Let f : Ω1 × Ω2 → R be a non-negative F1 ⊗ F2 -measurable function such that f (ω1 . Let H be the class of bounded F1 × F2 -measurable functions f : Ω1 × Ω2 → R such that .5. Hence we can assume for the following that supω1 . ω2 ) < ∞.2. ω2 ). N } which is bounded.2. respectively.5. (2) The functions ω1 → Ω2 f (ω1 . ω2 ) := min {f (ω1 .1) Ω1 ×Ω2 Then one has the following: 0 0 (1) The functions ω1 → f (ω1 . (2). and (3) can be obtained via N → ∞ if we use Proposition 2.4 to get the necessary measurabilities (which also works for our extended random variables) and the monotone convergence formulated in Proposition 3. (ii) We want to apply the Monotone Class Theorem Proposition 3.1. FUBINI’S THEOREM 61 Proposition 3. (3.1) automatically implies that P2 ω2 : Ω1 f (ω1 . The statements (1). ω2 )dP2 (ω2 ) and ω2 → Ω1 f (ω1 . ω2 ) and ω2 → f (ω1 . ω2 )dP2 (ω2 ) dP1 (ω1 ) f (ω1 . ω2 )d(P1 × P2 )(ω1 . that item (3) together with Formula (3.5.

using Propositions 2. ω2 )d(P1 × P2 ) = Ω1 ×Ω2 Ω1 Ω2 f (ω1 . ω2 ) = 1 A (ω1 )1 B (ω2 ) I I we easily can check that f ∈ H. for all ωi ∈ Ωi . ω2 )d(P1 × P2 ) = (P1 × P2 )(A × B) = P1 (A)P2 (B) Ω1 ×Ω2 and.4 we see that H satisﬁes the assumptions (1). Letting f (ω1 . ω2 ) < ∞.1.4 and 3. property (c) follows from f (ω1 . for all ωi ∈ Ωi . Hence we are done.5 [Fubini’s Theorem] Let f : Ω1 × Ω2 → R be an F1 ⊗ F2 -measurable function such that |f (ω1 . Proposition 3. ω2 )dP1 (ω1 ) are F1 -measurable and F2 -measurable.5. Ω1 ×Ω2 (3. ω2 )dP2 (ω2 ) dP1 (ω1 ) f (ω1 .5. INTEGRATION 0 0 (a) the functions ω1 → f (ω1 . (c) one has that f (ω1 . respectively. As π-system I we take the system of all F = A×B with A ∈ F1 and B ∈ F2 . respectively. .5. For instance. ω2 ) are F1 -measurable 0 and F2 -measurable. ω2 ) are F1 -measurable 0 and F2 -measurable. ω2 ) and ω2 → f (ω1 .2. ω2 ) and ω2 → f (ω1 . (2). respectively. and (3) of Proposition 3. for example. ω2 )dP2 (ω2 ) dP1 (ω1 ) = Ω1 Ω2 Ω1 1 A (ω1 )P2 (B)dP1 (ω1 ) I = P1 (A)P2 (B).2 gives that H contains all bounded functions f : Ω1 ×Ω2 → R measurable with respect F1 ⊗F2 .2) Then the following holds: 0 0 (1) The functions ω1 → f (ω1 .2. ω2 )dP2 (ω2 ) and ω2 → Ω1 f (ω1 . Ω2 Ω1 = Again. ω2 )dP1 (ω1 ) dP2 (ω2 ). Applying the Monotone Class Theorem Proposition 3. Now we state Fubini’s Theorem for general random variables f : Ω1 × Ω2 → R.62 CHAPTER 3. f (ω1 . (b) the functions ω1 → Ω2 f (ω1 . ω2 )|d(P1 × P2 )(ω1 .

FUBINI’S THEOREM (2) There are Mi ∈ Fi with Pi (Mi ) = 1 such that the integrals 0 f (ω1 .5. Ω1 = Ω2 1 M2 (ω2 ) I Remark 3. 2 . ω2 ). ω2 )dP2 (ω2 ) dP1 (ω1 ) f (ω1 . Ω1 Ω2 and the same expression with |f (ω1 . ω2 )| instead of f (ω1 . (2) The expressions in (3. respectively. ω2 )dP1 (ω1 ) are F1 -measurable and F2 -measurable.2) can be replaced by f (ω1 . ω2 )dP1 (ω1 ) dP2 (ω2 ). (4) One has that f (ω1 .3. ω2 )dP1 (ω1 ) and Ω1 0 exist and are ﬁnite for all ωi ∈ Mi .5.5.5. Proof of Proposition 3. The proposition follows by decomposing f = f + − f − and applying Proposition 3. respectively. for example. random variables.4.1) and (3. ω2 )dP2 (ω2 ) dP1 (ω1 ) < ∞. ω2 )d(P1 × P2 ) Ω1 ×Ω2 = Ω1 1 M1 (ω1 ) I Ω2 f (ω1 . In the following example we show how to compute the integral ∞ −∞ e−x dx by Fubini’s Theorem.6 (1) Our understanding is that writing. ω2 )dP2 (ω2 ) and ω2 → 1 M2 (ω2 ) I Ω1 f (ω1 . Ω2 0 f (ω1 .5. an expression like 1 M2 (ω2 ) I Ω1 f (ω1 . ω2 )dP1 (ω1 ) we only consider and compute the integral for ω2 ∈ M2 . ω2 )dP2 (ω2 ) 63 (3) The maps ω1 → 1 M1 (ω1 ) I Ω2 f (ω1 .

y) −N dλ(y) dλ(x) = 2N 2N f (x. . the above e−x e−y dλ(y) dλ(x) = [−N.7 Let f : R × R → R be a non-negative continuous function.5. y) e−(x 2 +y 2 ) = = R→∞ lim d(λ × λ)(x. y) x2 +y 2 ≤R2 R 2π 0 0 2 R→∞ lim e−r rdrdϕ 2 = π lim 1 − e−R R→∞ = π where we have used polar coordinates. y) (2N )2 2 +y 2 ) where λ is the Lebesgue measure. For the right-hand side we get lim e−(x [−N. y) [−N. y) := e−(x yields that N −N N −N . . Comparing both sides gives ∞ −∞ e−x dλ(x) = 2 √ π. y)..N ] 2 +y 2 ) N →∞ d(λ × λ)(x..N ]×[−N.N ] d(λ × λ)(x.5 was “correct”.N ] 2 2 e−(x 2 +y 2 ) d(λ × λ)(x.N ]×[−N. N ]. N ∈ {1.} gives that N −N N f (x. Letting f (x. As corollary we show that the deﬁnition of the Gaussian measure in Section 1.64 CHAPTER 3.3. INTEGRATION Example 3. Fubini’s Theorem applied to the uniform distribution on [−N.N ]×[−N. For the left-hand side we get N N →∞ N −N N lim e−x e−y dλ(y) dλ(x) −N 2 2 = = = N →∞ lim e−x −N N 2 N −N 2 e−y dλ(y) dλ(x) dλ(x) 2 2 N →∞ ∞ −∞ lim e −N −x2 −x2 e dλ(x) . 2.

y)dµ(y) dµ(x) = 0. z2 where we have used Example 3. then Ef = m and E(f − Ef )2 = σ 2 . by putting x = z/ 2 one gets 1 1= √ π ∞ −∞ e −x2 1 dx = √ 2π R ∞ −∞ e− 2 dz p0.9 Let Ω = [−1. .σ2 (x)dx = m.5. pm. Finally. and R (x − m)2 pm. y) := xy (x2 + y 2 )2 for (x.3) R In other words: if a random variable f : Ω → R has as law the normal distribution Nm. The function f (x.5. 0) and f (0. (3.σ2 (x)dx = 1.8 For σ > 0 and m ∈ R let pm.σ2 (x) := √ Then. R 65 1 2πσ 2 e− (x−m)2 2σ 2 . Secondly. by (x exp(−x2 /2)) = exp(−x2 /2) − x2 exp(−x2 /2) one can also compute that 1 √ 2π ∞ −∞ xe 2 −x 2 2 1 dx = √ 2π ∞ −∞ e− 2 dx = 1.1 (−x). In fact 1 1 f (x.σ2 (x)dx = σ 2 .3. even though the iterated integrals exist end are equal.5. x2 We close this section with a “counterexample” to Fubini’s Theorem. y)dµ(x) = 0 and −1 −1 f (x. y)dµ(x) dµ(y) = −1 −1 −1 f (x. xpm.4). y) = (0.1 (x) = p0.1 (x)dx = 1.5.σ2 . By the change of variable x → m + σx it is suﬃcient to show the √ statements for m = 0 and σ = 1. 1] × [−1. 1] (see Section 1. 1] and µ be the uniform distribution on [−1. 0) := 0 is not integrable on Ω. FUBINI’S THEOREM Proposition 3. y)dµ(y) = 0 so that 1 −1 1 1 1 f (x.7 so that xp0.4) Proof. (3.1 (x)dx = 0 R follows from the symmetry of the density p0. Firstly.3. Example 3.

r The inequality holds because on the right hand side we integrate only over the area {(x.1] |f (x. We simply have λP({ω : f (ω) ≥ λ}) = λE1 {f ≥λ} ≤ Ef 1 {f ≥λ} ≤ Ef.3 [Jensen’s inequality] 4 If g : R → R is convex and f : Ω → R a random variable with E|f | < ∞. Denmark).6 Some inequalities In this section we prove some basic inequalities. . and hence Proposition 3.1 [Chebyshev’s inequality] 3 Let f be a non-negative integrable random variable deﬁned on a probability space (Ω.05/ 03/1925 (Copenhagen. P). 16/05/1821 (Okatovo.08/12/1894 (St Petersburg.2 [convexity] A function g : R → R is convex if and only if g(px + (1 − p)y) ≤ pg(x) + (1 − p)g(y) for all 0 ≤ p ≤ 1 and all x.6. 1] × [−1. Then. then g(Ef ) ≤ Eg(f ) where the expected value on the right-hand side might be inﬁnity. 1] and 2π π/2 | sin ϕ cos ϕ|dϕ = 4 0 0 sin ϕ cos ϕdϕ = 2 follows by a symmetry argument. F. Ef .1]×[−1. y ∈ R.6. Proposition 3. y) : x2 + y 2 ≤ 1} which is a subset of [−1. B(R))-measurable. 08/05/1859 (Nakskov. y) ≥ 0 | sin ϕ cos ϕ| dϕdr r = 2 0 1 dr = ∞. Russia) 4 Johan Ludwig William Valdemar Jensen. INTEGRATION On the other hand. Russia) . for all λ > 0.6. 3. P({ω : f (ω) ≥ λ}) ≤ λ Proof. using polar coordinates we get 1 2π 0 1 4 [−1. A function g is concave if −g is convex.66 CHAPTER 3. 3 Pafnuty Lvovich Chebyshev. I I Deﬁnition 3. y)|d(µ × µ)(x. Every convex function g : R → R is continuous (check!) (B(R). Denmark).

|Ef | ≤ E|f |. 22/12/1859 (Stuttgart. b with a + b = 1.4 (1) The function g(x) := |x| is convex so that. ¨ Proposition 3. and E|f g| = 0. Example 3. which follows from the concavity of the logarithm (we can assume for a moment that x. It follows af (ω) + b ≤ g(f (ω)) for all ω ∈ Ω and g(Ef ) = aEf + b = E(af + b) ≤ Eg(f ). For example.1 so that f g = 0 a. y ≥ 0 and positive a.2. that means a. assuming E|f |p = 0 would imply |f |p = 0 a. according to Proposition 3.29/08/1937 (Leipzig. It ¨ uses the famous Holder-inequality. b ∈ R such that ax0 + b = g(x0 ) and ax + b ≤ g(x) for all x ∈ R. Since g is convex we ﬁnd a “supporting line” in x0 . so that Jensen’s inequality applied to |f | gives that (E|f |)p ≤ E|f |p . SOME INEQUALITIES 67 Proof. If 1 < p. F. f (E|f |p ) 1 p and g := ˜ g (E|g|q ) q 1 . . For the second case in the example above there is another way we can go. Gero many). q < ∞ with p + 1 = 1. Proof.s. Hence we may set ˜ f := We notice that xa y b ≤ ax + by for x. g : Ω → R. y > 0) ln(ax + by) ≥ a ln x + b ln y = ln xa + ln y b = ln xa y b .6.6. 5 Otto Ludwig H¨lder.3. q then 1 1 E|f g| ≤ (E|f |p ) p (E|g|q ) q . We can assume that E|f |p > 0 and E|g|q > 0.5 [Holder’s inequality] 5 Assume a probability space 1 (Ω.6. for any integrable f . Germany) . (2) For 1 ≤ p < ∞ the function g(x) := |x|p is convex.s. P) and random variables f. Let x0 = Ef .

12/01/1909 (G¨ttingen. g : Ω → R by f (k) := ak and g(k) := bk we get 1 N N |an bn | ≤ n=1 1 N N 1 p |an |p n=1 1 N N 1 q |bn |q n=1 from Proposition 3. random variables f. and 1 ≤ p < ∞. ¨ Corollary 3. Corollary 3. Lithuania) . g : Ω → R.6. g p q p q On the other hand side. . now Kaunas.. y := |˜|q . q < ∞ with p + 1 = 1.5) 6 Hermann Minkowski.8 [Minkowski inequality] 6 Assume a probability space (Ω. Let Ω = {1. ˜˜ E|f g | = so that we are done. If 1 < p. Deﬁning f. It is suﬃcient to prove the inequality for ﬁnite sequences (bn )N since n=1 by letting N → ∞ we get the desired inequality for inﬁnite sequences. Proposition 3. Multiplying by N and letting N → ∞ gives our assertion.6 For 0 < p < q < ∞ one has that (E|f |p ) p ≤ (E|f |q ) q . P). N }. Proof. we get g q 1 1 ˜ ˜˜ g |f g | = xa y b ≤ ax + by = |f |p + |˜|q p q and 1 1 ˜ 1 1 ˜˜ E|f g | ≤ E|f |p + E|˜|q = + = 1. Then (E|f + g|p ) p ≤ (E|f |p ) p + (E|g|p ) p . Germany). q then 1 1 ∞ ∞ p 1 1 E|f g| (E|f |p ) p (E|g|q ) q 1 1 ∞ q |an bn | ≤ n=1 n=1 |an | p n=1 |bn |q . 22/06/1864 (Alexotas.6.6. The proof is an exercise.7 [Holder’s inequality for sequences] Let (an )∞ n=1 1 ∞ and (bn )n=1 be sequences of real numbers.6.. and b := 1 . F := 2Ω . INTEGRATION 1 ˜ Setting x := |f |p . Russian Empire.68 CHAPTER 3. 1 1 1 (3. and P({k}) := 1/N .5.. o . a := p . F.

1 to |f − Ef |2 gives that P({|f − Ef | ≥ λ}) = P({|f − Ef |2 ≥ λ2 }) ≤ E|f − Ef |2 .3. we get that E|f +g|p < ∞ as well by the above considerations. λ2 λ Proof. Then one has. From Corollary 3.6. otherwise there is nothing to prove. Applying Proposition 3.6. So assume that 1 < p < ∞. P) such that Ef 2 < ∞. Consequently. |f +g|p ≤ (|f |+|g|)p ≤ 2p−1 (|f |p + |g|p ) and E|f + g|p ≤ 2p−1 (E|f |p + E|g|p ). we continue with q E|f + g|p = E|f + g||f + g|p−1 ≤ E(|f | + |g|)|f + g|p−1 = E|f ||f + g|p−1 + E|g||f + g|p−1 ≤ (E|f |p ) p E|f + g|(p−1)q 1 1 q 1 1 + (E|g|p ) p E|f + g|(p−1)q 1 1 q . we use that E(f − Ef )2 = Ef 2 − (Ef )2 ≤ Ef 2 .6. For p = 1 the inequality follows from |f + g| ≤ |f | + |g|. q We close with a simple deviation inequality for f . for all λ > 0. Corollary 3.6. . F. P(|f − Ef | ≥ λ) ≤ E(f − Ef )2 Ef 2 ≤ 2 . λ2 Finally.5) follows 1 by dividing the above inequality by (E|f + g|p ) q and taking into the account 1 1 − 1 = p.9 Let f be a random variable deﬁned on a probability space (Ω. Taking 1 1 < q < ∞ with p + 1 = 1.6 we get that E|f | < ∞ so that Ef exists. ¨ where we have used Holder’s inequality. (3. Assuming now that (E|f |p ) p + (E|g|p ) p < ∞. b ≥ 0. Since (p − 1)q = p. The convexity of x → |x|p gives that a+b 2 p ≤ |a|p + |b|p 2 and (a+b)p ≤ 2p−1 (ap +bp ) for a. SOME INEQUALITIES 69 Proof.

Assume that Ω L± dP > 0 and deﬁne µ± (A) = Ω 1 A L± dP I . F) be a measurable space.1 (Signed measures) Let (Ω. (i) A map µ : F → R is called (ﬁnite) signed measure if and only if µ = αµ+ − βµ− . Then µ P (µ is absolutely continuous with respect to P) if and only if P(A) = 0 implies µ(A) = 0. L± dP Ω ∞ Then assume An ∈ F to be disjoint sets such that A = n=1 Ω An .70 CHAPTER 3. Then µ+ ∞ n=1 ∪ An = 1 α Ω 1 ∞ A L+ dP I∪ n=1 n 1 = α 1 = α = ∞ 1 An (ω) L+ dP I Ω n=1 N Ω N →∞ lim 1 An I n=1 N L+ dP L+ dP 1 lim α N →∞ 1 An I Ω n=1 . β ≥ 0 and µ+ and µ− are probability measures on F. Set α := L+ dP. Proof. 0} so that L = L+ −L− . where α. P) be a probability space. Then µ is a signed measure and µ P. P) is a probability space and that µ is a signed measure on (Ω. L± dP Ω Now we check that µ± are probability measures.2 Let (Ω. F. INTEGRATION 3.7. 0} and L− := max {−L.7. L : Ω → R be an integrable random variable. F). Example 3. and µ(A) := A LdP. F. We let L+ := max {L.7 Theorem of Radon-Nikodym Deﬁnition 3. First we have that µ (Ω) = ± Ω 1 Ω L± dP I = 1. (ii) Assume that (Ω.

Theorem 3. I so that ’dµ = LdP’. f2 . P) be a probability space and f.7. now Decin. F. Deﬁnition 3.8. F. or fn → f P-a. The Radon-Nikodym theorem was proved by Radon 7 in 1913 in the case of Rn .3. P) be a probability space and µ a signed measure with µ P. 8 Otton Marcin Nikodym.s. A ∈ F. 13/08/1887 (Zablotow. then P(L = L ) = 0.) or with probn=1 ability 1 to f (fn → f a. The extension to the general case was done by Nikodym 8 in 1930. dP 1 A dµ = I Ω 1 A LdP. The same can be done for L− .s. f1 . USA).. Galicia. We shall write L= We should keep in mind the rule µ(A) = Ω dµ . Bohemia.) if and only if P({ω : fn (ω) → f (ω) as n → ∞}) = 1.6) The random variable L is unique in the following sense. 16/12/1887 (Tetschen. (1) The sequence (fn )∞ converges almost surely (a. If L and L are random variables satisfying (3. Then there exists an integrable random variable L : Ω → R such that µ(A) = A L(ω)dP(ω).8. Austria). MODES OF CONVERGENCE ∞ 71 = n=1 µ+ (An ) where we have used Lebesgue’s dominated convergence theorem.s. now Ukraine) .1 [Types of convergence] Let (Ω. 7 Johann Radon.. Czech Republic) 25/05/1956 (Vienna. . . Deﬁnition 3.04/05/1974 (Utica. 3. Austria-Hungary.4 L is called Radon-Nikodym derivative.8 Modes of convergence First we introduce some basic types of convergence. (3.6).7.3 (Radon-Nikodym) Let (Ω. : Ω → R random variables.

2 [Convergence in distribution] Let (Ωn . For the above types of convergence the random variables have to be deﬁned on the same probability space. Proof. (5) If fn → f . P) be probability spaces and let fn : Ωn → R and f : Ω → R be random variables. Pn ) and (Ω. See [4]. where Ffn and Ff are the distribution-functions of fn and f . P d P d Lp P P Lp . Deﬁnition 3. F. F. then fn → f . f1 . : Ω → R be random variables.s. f2 . INTEGRATION P (2) The sequence (fn )∞ converges in probability to f (fn → f ) if and n=1 only if for all ε > 0 one has P({ω : |fn (ω) − f (ω)| > ε}) → 0 as n → ∞. respectively. (1) If fn → f a. (3) If 0 < p < ∞. (4) One has that fn → f if and only if Ffn (x) → Ff (x) at each point x of continuity of Ff (x). (3) If fn → f . as k → ∞. Fn .8.72 CHAPTER 3.. We have the following relations between the above types of convergence. then fn → f . . P) be a probability space and f. There is a variant without this assumption.. (2) If 0 < p < ∞ and fn → f . Proposition 3.8. then there is a subsequence 1 ≤ n1 < n2 < n3 < · · · such that fnk → f a. then the sequence (fn )∞ converges with respect to n=1 Lp or in the Lp -mean to f (fn → f ) if and only if E|fn − f |p → 0 as n → ∞.3 Let (Ω. then fn → f . Note that {ω : fn (ω) → f (ω) as n → ∞} and {ω : |fn (ω) − f (ω)| > ε} are measurable sets. Then the sequence (fn )∞ converges in distribution n=1 d to f (fn → f ) if and only if Eψ(fn ) → Eψ(f ) as n → ∞ for all bounded and continuous functions ψ : R → R..s.

I I I I 4 4 2 2 4 4 f7 = 1 [0. . 6 4 1 =  8 if n = 7. . 1]. Using a stronger condition.8.6. 1] : |fn (x)| > ε}) = λ({x ∈ [0. 4.4 Assume ([0.8. . .3. 2. .1] . f1 + · · · + fn P −→ m n that means. This gives a form of the strong law of large numbers.8. . 1]. . B([0. 1] : fn (x) = 0})  1  2 if n = 1. . λ) where λ is the Lebesgue measure. 1 ) . . 1 ) . We start with the weak law of large numbers as an example of the convergence in probability: Proposition 3. I 8 This implies limn→∞ fn (x) → 0 for all x ∈ [0. MODES OF CONVERGENCE 73 Example 3. .9) we have that P ω: f1 + · · · + fn − nm >ε n ≤ = = as n → ∞. 1]). lim P n as n → ∞. As a preview on the next probability course we give some examples for the above concepts of convergence.  . But it holds convergence in λ probability fn → 0: choosing 0 < ε < 1 we get λ({x ∈ [0. we get easily more: the almost sure convergence instead of the convergence in probability. .  . 1 ] .1] . f2 = 1 [ 1 . E|f1 + · · · + fn − nm|2 n 2 ε2 E( n k=1 (fk − n 2 ε2 m)) 2 nσ 2 →0 n 2 ε2 . 1 ) . . ω:| f1 + · · · + fn − m| > ε n → 0. We take f1 = 1 [0.5 [Weak law of large numbers] Let (fn )∞ be a n=1 sequence of independent random variables with Efk = m Then and E(fk − m)2 = σ 2 for all k = 1. By Chebyshev’s inequality (Corollary 3.  . 2   1  if n = 3. for each ε > 0. . f6 = 1 [ 3 . . 3 ) . Proof. f5 = 1 [ 1 . f4 = 1 [ 1 . I I 2 2 f3 = 1 [0.

For this we need Deﬁnition 3.8.74 CHAPTER 3. A sequence of Independent random variables fn : Ω → R is called Identically Distributed (i.s. 2 2 4 Efk ≤ Efk ≤ c. → 0 and therefore a. → 0. 2. and c := supn Efn < ∞. 4 ESn ≤ nc + 3n(n − 1)c ≤ 3cn2 . n2 This implies that 4 Sn n4 a. n=1 4 k = 1.l. where one 3 3 gets that fj3 is integrable by E|fj |3 ≤ (E|fj |4 ) 4 ≤ c 4 .k.s. k.. in particular weaker.s.d. 2 2 Hence Efk fl2 = Efk Efl2 ≤ c for k = l. . P) be a probability spaces. There are several strong laws of large numbers with other.j. INTEGRATION Proposition 3. because for distinct {i. n Proof. . . 2. Consequently. j. l} it holds Efi fj3 = Efi fj2 fk = Efi fj fk fl = 0 by independence. . Let Sn := 4 ESn n k=1 fk .) provided that the random variables fn have the same law. . F..i. We close with a fundamental example concerning the convergence in distribution: the Central Limit Theorem (CLT). k = 1. and E ∞ n=1 4 Sn = n4 S4 E n ≤ n4 n=1 Sn n ∞ ∞ n=1 3c < ∞.8. → 0.6 [Strong law of large numbers] Let (fn )∞ be a sequence of independent random variables with Efk = 0. conditions. It holds n 4 n =E k=1 fk = E = fi fj fk fl n i. Efi fj3 = Efi Efj3 = 0 · Efj3 = 0. Moreover. Then f1 + · · · + fn a. by Jensen’s inequality. . and all λ ∈ R.l=1 k=l 2 Efk Efl2 . that means P(fn ≤ λ) = P(fk ≤ λ) for all n.=1 n 4 Efk + k=1 3 k.7 Let (Ω. For example.

that means that f1 + · · · + fn d √ →g σ n for any g with P(g ≤ x) = √1 2π u2 x e− 2 du. rann=1 2 dom variables with Ef1 = 0 and Ef1 = σ 2 . Is there a right scaling factor c(n) such that f1 + · · · + fn → g.i.8.8 [Central Limit Theorem] Let (fn )∞ be a sequence n=1 2 of i. c(n) where g is a non-degenerate random variable in the sense that Pg = δ0 ? And in which sense does the convergence take place? The answer is the following Proposition 3. P) be a probability space and (fn )∞ be a sequence of i.8.3. n Hence the law of the limit is the Dirac-measure δ0 . By the law of large numbers we know f1 + · · · + fn P −→ 0. −∞ . F. Then P f1 + · · · + fn √ ≤x σ n 1 →√ 2π x −∞ e− 2 du u2 for all x ∈ R as n → ∞.d.i.d. random variables with Ef1 = 0 and Ef1 = σ 2 > 0. MODES OF CONVERGENCE 75 Let (Ω.

76 CHAPTER 3. INTEGRATION .

Show that if F ⊆ 2Ω is an algebra and a monotonic class. 2.. A1 ⊆ A2 ⊆ A3 ⊆ · · · =⇒ (b) A1 . . .1 Probability spaces Ai c 1. G. ∈ F.Chapter 4 Exercises 4. be three events. i∈I = i∈I Ac where Ai ⊆ Ω and I is an arbitrary i 3. then F is a σ-algebra. Prove that A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). Prove that index set. and An ∈ F. A1 ⊇ A2 ⊇ A3 ⊇ · · · =⇒ n n An ∈ F.. consisting of the element x only? 5.. . 6}.. Find expressions for the events that of E. G. 4.. Let x ∈ R.. 12}? 7. Let E.. where {x} is the set. Give all elements of the smallest σ-algebra F on Ω which contains A and B. B ⊆ Ω such that A ∩ B = ∅. . F. A2 . Is it true that Q ∈ B(R)? 6.. Assume that the probability 1 that one die shows a certain number is 6 . 2. There are three students. ∈ F. 77 . if (a) A1 . What is the probability that the sum of the two dice is m ∈ {1. what is the probability that at least two of them have their birthday at the same day ? 8. F. Assume that Q is the set of rational numbers. A2 .* Deﬁnition: The system F ⊆ 2Ω is a monotonic class. Given two dice with numbers {1. 2. Is it true that {x} ∈ B(R). Given a set Ω and two non-empty sets A.. Assuming that a year has 365 days. 9..

. B ∈ F implies that A ∪ B ∈ F by (3”) A.78 (a) only F occurs. Prove σ(σ(G)) = σ(G). (h) at most two occur. 14. Give an example where the union of two σ-algebras is not a σ-algebra. Deﬁne P(B) := 1 : B∩A=∅ . The generated σ-algebra is denoted by B([α. (d) at least two events occur. (f) none occurs. CHAPTER 4. B ∈ F implies that A ∩ B ∈ F. 0 : B∩A=∅ Is (Ω. 18. (g) at most one occurs. A2 .1 one can replace (3’) A..1. b] : α ≤ a < b ≤ β} and that this σ-algebra is the smallest σ-algebra generated by the subsets A ⊆ [α. 12. 15. Prove that ∞ i=1 Ai ∈ F if F is a σ-algebra and A1 .8 holds. (b) both E and F but not G occur. β].1. (c) at least one event occurs. Show the equality σ(G2 ) = σ(G4 ) = σ(G0 ) in Proposition 1. Let Ω = ∅. Let F be a σ-algebra and A ∈ F. (e) all three events occur. B ∈ F. Show that in the deﬁnition of an algebra (Deﬁnition 1. EXERCISES 10. β]). 19. 17. 13. 11. A ⊆ Ω. Prove that G := {B ∩ A : B ∈ F} is a σ-algebra. β] : B ∈ B(R)} = σ {[a. P) a probability space? . . A = ∅ and F := 2Ω . Show that A ⊆ B implies that σ(A) ⊆ σ(B). ∈ F. Prove that {B ∩ [α. F. 16. β] which are open within [α. Prove that A \ B ∈ F if F is an algebra and A.

F.1. . i = 1. µ) is a probability space. 23. F := 2Ω and P(B) := We deﬁne 1 card(B). l) : k = 1 or k = 4} and B := {(k. Let (Ω. 36 A := {(k. l)) = 36 for all (k. Show that P(A ∩ B ∩ C) = P(A)P(B)P(C).. what is the probability that this family owns a car or a house but not both? 26. P) be a probability space. . Show that if A.2. l = 1. F.2. l) : k. Assume A := {(k. Let (Ω. . B. l ≤ 6}. . Suppose we have 10 coins which are such that if the ith one is ﬂipped i then heads will appear with probability 10 . What is the conditionally probability that one has chosen the ﬁfth coin given it had shown heads after ﬂipping? Hint: Use Bayes’ formula. . In a certain community 60 % of the families own their own car. i. C independent ? 24. l) ∈ Ω.15. l) : k + l = 9}. 27. 5}. l) : l = 1.. where µ(B) := P(B|A). We deﬁne 6 A := {1. F = 2Ω . 4} and B := {2. F..4. B := {(k. . F := 2Ω and P(B) := 1 card(B). Let (Ω. Ω = {(k. Show that (Ω. B ∈ F are independent this implies (a) A and B c are independent. 6}. Are A and B independent? 25. 22.. and 20% own both (their own car and their own home). 30 % own their own home.e. P) be a probability space and A ∈ F such that P(A) > 0. 79 21. l) : 1 ≤ 1 k.6 (7). Are A and B independent? (b) Let Ω := {(k. F. Prove Proposition 1. If a family is randomly chosen. l) : l = 2 or l = 5} . and P((k. l) : l = 4. Prove Bayes’ formula: Proposition 1. PROBABILITY SPACES 20. Are A. 10. P) be the model of rolling two dice. C := {(k. (b) Ac and B c are independent. 6}. (a) Let Ω := {1. 2. 2 or 5}... One chooses randomly one of the coins. 5 or 6}.

..* Let (Ω. 2.. 32. Let us play the following game: If we roll a die it shows each of the 1 numbers {1.. Prove Property (3) in Lemma 1..p (B) := k∈B n n−k p (1 − p)k . F := 2Ω and µp (B) := p(1 − p)k . .. possibilities to choose k elements out of n ele- 33. we win if it would all k2 times show 6.}.. 4.. ∈ {1. n}. A2 . n ≥ 1. 3.. 1. . . k∈B .80 CHAPTER 4. 6 n=1 (b) The probability of loosing inﬁnitely many often is always 1. 2. a π-system and a λ-system.. .4. k2 . Ω := {0.. n} and n k n k := n! k!(n − k)! where 0! := 1 and k! := 1 · 2 · · · k.* A class that is both. Again. .n µn.} be a given sequence. 1. If it did show all k1 times 6 we won. 1 2 n 31. .. (b) Second go: We roll k2 times. Ω := {0. Ac .. 2 34. Prove that one has ments.. Show that (a) The probability of winning inﬁnitely many often is 1 if and only if ∞ k 1 n = ∞. Let k1 . Geometric distribution: Let 0 < p < 1. 30. for k ≥ 1. 29. An ∈ F are independent. 2... k (a) Is (Ω. (a) First go: One has to roll the die k1 times.. is a σ-algebra.. 2. EXERCISES 28. F := 2Ω and µn. Show that then Ac . Binomial distributon: Assume 0 < p < 1...p ({k}) for p = 1 . 5. k ∈ {0..p ) a probability space? (b) Compute maxk=0.4.. F.. Hint: Use the lemma of Borel-Cantelli. P) be a probability space and assume A1 . Ac are independent. 1. 6} with probability 6 . µn.. Let n ≥ 1.. 3. F. And so on.

Complete the proof of Proposition 2.. Ω := {0. 1. F... . P) be a probability space and f : Ω → R. Let (Ω. I I (b) lim supn→∞ 1 An (ω) = 1 lim supn→∞ An (ω). (a) Show that. 4. 2. . F := 2Ω and πλ (B) := k∈B 81 e−λ λk . P) be a probability space and A ⊆ Ω a set. RANDOM VARIABLES (a) Is (Ω. if F = {∅.5. (b) Show that.. 2. .2 Random variables (a) lim inf n→∞ 1 An (ω) = 1 lim inf n→∞ An (ω). Poisson distribution: Let λ > 0.9 by showing that for a distribution function Fg (x) = P ({ω : g ≤ x}) of a random variable g it holds x→−∞ lim Fg (x) = 0 and lim Fg (x) = 1. 35. if P(A) is 0 or 1 for every A ∈ F and f is measurable. then f is measurable if and only if f is constant.2. 6. Let (Ω. F. x→∞ . 5. 6. ⊆ Ω. Ω}. Show that lim inf fn n→∞ and lim sup fn n→∞ are random variables. Let (Ω. Show that 2.}). F.}.4. n = 1. Show that A ∈ F if and only if 1 A : Ω → R is a random variable. then P({ω : f (ω) = c}) = 1 for a constant c. F) be a measurable space and fn ..2. µp ) a probability space? (b) Compute µp ({0. a sequence of random variables. F. 4. πλ ) a probability space? (b) Compute k∈Ω kπλ ({k}).. 4. I I for all ω ∈ Ω. k! (a) Is (Ω. . Show the assertions (1). Let A1 . . . A2 . 2. I 3. 1. (3) and (4) of Proposition 2.1.

where I I 0 < p < 1. F3 )-measurable.p) (y).1/2) (x) + 21 [1/2. Σ) a measurable space and assume that f : Ω → M is (F. 1] × [0. B([0. g : Ω → R. 1].p) (x) and g(x. 13. Show that for A1 . 9. I I and g(x) := 1 [0. P) be a probability space. Σ)-measurable. 1])) be a measurable space and deﬁne the functions f (x) := 1 [0. Let ([0. A2 . (Ω2 . 1]). Let (Ω. F2 )-measurable and that g : Ω2 → Ω3 is (F2 . F3 ) be measurable spaces. 1]) ⊗ B([0. Assume a measurable space (Ω. F3 )measurable. (b) the law of f + g. Show that (a) f and g are independent. Let f : Ω → R be a map. F1 ). 1]. F2 ). F. Assume the probability space (Ω. Pf +g ({k}).1. Prove Proposition 2. (M. for k = 0. Assume the product space ([0. F.1]) (x). 2 is the binomial distribution µ2. y ∈ R it holds P ({f = x. · · · ⊆ R it holds ∞ ∞ f −1 i=1 Ai = i=1 f −1 (Ai ). F) and random variables f. y) := 1 [0. EXERCISES 7. Let (Ω1 . g = y}) = P ({f = x}) P ({g = y}) 12.1/2) (x) − 1 {1/4} (x). 10. B([0. I I Compute (a) σ(f ) and (b) σ(g). P) and let f. Show that µ with µ(B) := P ({ω : f (ω) ∈ B}) is a probability measure on Σ. 1.p . Is the set {ω : f (ω) = g(ω)} measurable? for B ∈ Σ . 8.82 CHAPTER 4. 14. Show that f and g are independent if and only if for all x. g : Ω → R be measurable step-functions. 11. y) := 1 [0. Assume that f : Ω1 → Ω2 is (F1 .4. (Ω3 . Show that then g ◦ f : Ω1 → Ω3 deﬁned by (g ◦ f )(ω1 ) := g(f (ω1 )) is (F1 . λ × λ) and the random variables f (x.

Assume n ∈ N and deﬁne by F := σ{ a σ–algebra on [0. Prove assertion (4) and (5) of Proposition 3. n − 1} (a) Why is the function f (x) = x. b). k = 1. . Bj ∈ F. gn (fn ) are independent. .1 1 Hints: To prove (4). Using this one gets P {ω : f (ω) > 0} = 0. . . . 18. . Show that g1 (f1 ).5. . Prove Proposition 2. .3 Integration n m 1. . g2 (f2 ). . . Show that Ef g = Ef Eg. P) be a probability space and f. k = 1. To prove (5) use Ef = E(f 1 {f =g} + f 1 {f =g} ) = Ef 1 {f =g} + Ef 1 {f =g} = Ef 1 {f =g} . 4. I where ai . 1. . . Which continuous functions f : [0. bj ∈ R and Ai . x ∈ [0.4. F. . 1] → R are measurable with respect to G ? 16. . g : Ω → R random variables with f= i=1 ai 1 Ai I and g= j=1 bj 1 Bj . n n for k = 0. Let fk : Ω → R. . . . set An := ω : f (ω) > n and show that 1 1 1 0 = Ef 1 An ≥ E 1 An = E1 An = P(An ). INTEGRATION 83 15.4. n are B(R)– measurable. 19. 1]. 17. 20. Let G := σ{(a. . Bm } are independent. . . I I I n n n This implies P(An ) = 0 for all n. Assume that {A1 . . n be independent random variables on (Ω. Show Proposition 2. 0 < a < b < 1} be a σ -algebra on [0. k k+1 .3. I I I I I . F. Let (Ω. An } and {B1 . 1]. . Show that families of random variables are independent iﬀ (if and only * if) their generated σ-algebras are independent. 2.3.3. P) and assume the functions gk : R → R.2. 1] not F–measurable? (b) Give an example of an F–measurable function.

. F. Assume the probability space ([0. 1] x (a) Show that Ef = [a. b−a ). .. 1. λ) and f (x) := Compute Ef. . of f . 4.b] g(ω) 1 dλ(ω) = b−a where g(ω) := ω 2 . . . 2. . 1 for irrational x ∈ (0. Hint: Use a) and b). .2.ak ) I with a−1 := 0 and ak := e −λ λm .. where λ > 0. 1. uniform distribution: λ Assume the probability space ([a. 1]. Which distribution has f ? 5. b]. 6. k = 0. Assume the probability space ((0.1. 1]). ∈ F. 2. n n Hint: Use Fatou’s lemma and Exercise 4. m! m=0 k for k = 0. Show that P lim inf An n ≤ ≤ lim inf P (An ) n lim sup P (An ) ≤ P lim sup An . Let (Ω. b]). B((0. B([0. where −∞ < a < b < ∞. Compute the law Pf ({k}).84 CHAPTER 4. A2 . EXERCISES 3. (c) Compute the variance E(f − Ef )2 of the random variable f (ω) := ω.b] f (ω) a+b 1 dλ(ω) = b−a 2 where f (ω) := ω. . 1]). 1] 1 for rational x ∈ (0. λ) and the random variable ∞ f= k=0 k1 [ak−1 . P) be a probability space and A1 . 1]. (b) Compute Eg = [a. B([a.

for i = 1. F := 2Ω and n n k µn. 2. mean Efi = mi and variance σi = E(fi − mi )2 .}. (b) g = n i=1 fi . Poisson distribution: Let λ > 0. b ∈ R. Binomial distribution: Let 0 < p < 1. . k! (a) Show that Ef = Ω f (ω)dπλ (ω) = λ where f (k) := k. . (b) Show that Eg = Ω g(ω)dµλ (ω) = λ2 where g(k) := k(k − 1). n}. (b) Show that Eg = Ω g(ω)dµn.p (ω) = n(n − 1)p2 where g(k) := k(k − 1). 2.. F := 2Ω and ∞ πλ = k=0 e−λ λk δk . n. . k k=0 (a) Show that Ef = Ω f (ω)dµn.4.p = p (1 − p)(n−k) δk .p (ω) = np where f (k) := k. . (c) Compute the variance E(f − Ef )2 of the random variable f (k) := k.. (c) Compute the variance E(f − Ef )2 of the random variable f (k) := k. 3.. Hint: Use a) and b). 9. INTEGRATION 85 7. Ω := {0. Assume integrable.3.. independent random variables fi : Ω → R with 2 Efi2 < ∞. where a. Ω := {0. 1. 8. . n ≥ 2. . 1. . Compute the mean and the variance of (a) g = af1 + b. Hint: Use a) and b)..

. (b) Use (a) to show E|f g| < ∞.6.i. 16.86 CHAPTER 4. P).6. Prove assertion (2) of Proposition 3. . 12. Which parameter does this Poisson distribution have? 11. . EXERCISES 10. Let the random variables f and g be independent and Poisson distrubuted with parameters λ1 and λ2 . Show that f + g is Poisson distributed k Pf +g ({k}) = P(f + g = k) = l=0 P(f = l. g = k − l) = . 15. Show that ∞ ∞ E k=1 fk = k=1 Efk (≤ ∞). Hint: (a) Assume ﬁrst f ≥ 0 and g ≥ 0 and show using the ”stair-case functions” to approximate f and g that Ef g = Ef Eg.2. 14.d. Let f1 .3. in the general case. Show that E|f g| < ∞ and Ef g = Ef Eg. Use H¨lder’s inequality to show Corollary 3. . Use the Central Limit Theorem to show that P f1 + f2 + · · · + fn − nm √ ω: ≤x σ n 1 → 2π x −∞ e− 2 du u2 as n → ∞. . . o 13. random variables (fk )∞ with Ef1 = m k=1 and variance E(f1 − m)2 = σ 2 . . Assume a sequence of i. and then Lebesgue’s Theorem for Ef g = Ef Eg. F. Assume that f ja g are independent random variables where E|f | < ∞ and E|g| < ∞. Use Minkowski’s inequality to show that for sequences of real numbers (an )∞ and (bn )∞ and 1 ≤ p < ∞ it holds n=1 n=1 ∞ 1 p ∞ 1 p ∞ 1 p |an + bn |p n=1 ≤ n=1 |an |p + n=1 |bn |p . respectively. be non-negative random variables on (Ω. f2 .

. 1]. INTEGRATION 87 17.3. . . I P L .4.1] (x). λ) be a probability space. B([0. Let ([0. 2. .1/2n] (x) I where n = 1. (a) Does there exist a random variable f : Ω → R such that fn → f almost surely? (b) Does there exist a random variable f : Ω → R such that fn → f ? P (c) Does there exist a random variable f : Ω → R such that fn → f ? and f2n−1 (x) := n3 1 [1−1/2n. Deﬁne the functions fn : Ω → R by f2n (x) := n3 1 [0. 1]).

32 Bayes’ formula. 62 π − λ-Theorem. 66 counting measure. 67 o i. 14 geometric distribution. 13 Borel σ-algebra on Rn . 66 law of a random variable. 17 expectation of a random variable. 12 conditional probability. 60 sures. 39 Lebesgue integrable. 47 lim supn An . 55 equivalence relation. 74 independence of a family of random variables. 71 convergence in distribution. 31 existence of sets. 75 change of variables. 10 λ-system. 15 Dirac measure. 18 lemma of Fatou for random variables. 14 Minkowski’s inequality. 28 σ-ﬁnite. 27 Lebesgue’s Theorem. 54 measurable map. 47 lim supn an . 10 measurable step-function. 17 exponential distribution on R. 72 convergence in probability. which are not Borel. 14 measure space. 18 Jensen’s inequality. 17 32 lim inf n an . 21 central limit theorem.d. 38 measurable space. 15 distribution-function. 24 π-systems and uniqueness of mea. 10 axiom of choice. 55 lemma of Borel-Cantelli.Index event. 31 σ-algebra. 24 Fubini’s Theorem. 58 Chebyshev’s inequality. 28 π-system. 47 Lebesgue measure. 17 expected value. 61. sequence. 18 convergence almost surely. 42 independence of a sequence of events. 42 independence of a ﬁnite family of random variables. 25 Borel σ-algebra. 10 Gaussian distribution on R. lim inf n An . 68 88 . 24 Carath´odory’s extension e theorem. 26. 66 closed set. 19 binomial distribution. 20 lemma of Fatou.i. 71 convexity. 25 algebra. 35 measure. 71 convergence in Lp .extended random variable. 31 H¨lder’s inequality. 40 dominated convergence.

44 step-function. 29 probability measure. 59 monotone class theorem. 74 uniform distribution. 14 probability space. 73 89 . 26. absolute moments. 14 product of probability spaces. 23 random variable. 59 weak law of large numbers.INDEX moments. 12 Poisson distribution. 36 realization of independent random variables. 35 strong law of large numbers. 27 variance. 60 monotone convergence. 49 vector space. 53 open set. 25 Poisson’s Theorem.

90 INDEX .

Williams.Bibliography [1] H. 1996. 91 . Cambridge University Press. Bauer.N. Probability. Probability and Measure. Springer. [5] D. 1996. [4] A. Billingsley. Walter de Gruyter. [2] H. Shiryaev. Probability with martingales. 1995. 1991. [3] P. Bauer. Probability theory. Walter de Gruyter. 2001. Wiley. Measure and integration theory.