the averages (1/n) Σ_{i=1}^n f_i(ω) and the expected value Ef are close to each other. This leads us to the Strong Law of Large Numbers discussed in Section 3.8.
Notation. Given a set Ω and subsets A, B ⊆ Ω, then the following notation is used:

intersection: A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}
union: A ∪ B = {ω ∈ Ω : ω ∈ A or (or both) ω ∈ B}
set-theoretical minus: A\B = {ω ∈ Ω : ω ∈ A and ω ∉ B}
complement: A^c = {ω ∈ Ω : ω ∉ A}
empty set: ∅ = set without any element
real numbers: R
natural numbers: N = {1, 2, 3, ...}
rational numbers: Q
indicator-function: 1I_A(x) = 1 if x ∈ A and 0 if x ∉ A

Given real numbers α, β, we use α ∧ β := min{α, β}.
Chapter 1
Probability spaces
In this chapter we introduce the probability space, the fundamental notion of probability theory. A probability space (Ω, F, P) consists of three components.

(1) The elementary events or states ω which are collected in a non-empty set Ω.

Example 1.0.1 (a) If we roll a die, then all possible outcomes are the numbers between 1 and 6. That means
Ω = {1, 2, 3, 4, 5, 6}.
(b) If we flip a coin, then we have either heads or tails on top, that means
Ω = {H, T}.
If we have two coins, then
Ω = {(H, H), (H, T), (T, H), (T, T)}
is the set of all possible outcomes.
(c) For the lifetime of a bulb in hours we can choose
Ω = [0, ∞).

(2) A σ-algebra F, which is the system of observable subsets or events A ⊆ Ω. The interpretation is that one can usually not decide whether a system is in the particular state ω ∈ Ω, but one can decide whether ω ∈ A or ω ∉ A.

(3) A measure P, which gives a probability to all A ∈ F. This probability is a number P(A) ∈ [0, 1] that describes how likely it is that the event A occurs.

For the formal mathematical approach we proceed in two steps: in the first step we define the σ-algebras F; here we do not need any measure. In the second step we introduce the measures.
1.1 Definition of σ-algebras

The σ-algebra is a basic tool in probability theory. It is the set the probability measures are defined on. Without this notion it would be impossible to consider the fundamental Lebesgue measure on the interval [0, 1] or to consider Gaussian measures, without which many parts of mathematics can not live.
Definition 1.1.1 [σ-algebra, algebra, measurable space] Let Ω be a non-empty set. A system F of subsets A ⊆ Ω is called σ-algebra on Ω if

(1) ∅, Ω ∈ F,
(2) A ∈ F implies that A^c := Ω\A ∈ F,
(3) A_1, A_2, ... ∈ F implies that ∪_{i=1}^∞ A_i ∈ F.

The pair (Ω, F), where F is a σ-algebra on Ω, is called measurable space. The elements A ∈ F are called events. An event A occurs if ω ∈ A and it does not occur if ω ∉ A.

If one replaces (3) by

(3') A, B ∈ F implies that A ∪ B ∈ F,

then F is called an algebra.

Every σ-algebra is an algebra. Sometimes, the terms σ-field and field are used instead of σ-algebra and algebra. We consider some first examples.
Example 1.1.2 (a) The largest σ-algebra on Ω: if F = 2^Ω is the system of all subsets A ⊆ Ω, then F is a σ-algebra.
(b) The smallest σ-algebra: F = {Ω, ∅}.
(c) If A ⊆ Ω, then F = {Ω, ∅, A, A^c} is a σ-algebra.
Some more concrete examples are the following:

Example 1.1.3 (a) Assume a model for rolling two dice, i.e. Ω := {1, ..., 6} × {1, ..., 6} and F := 2^Ω. The event "the sum of the two dice equals four" is described by
A = {(1, 3), (2, 2), (3, 1)}.
(c) Assume a model for two coins, i.e. Ω := {(H, H), (H, T), (T, H), (T, T)} and F := 2^Ω.
If Ω = {ω_1, ..., ω_n}, then any algebra F on Ω is automatically a σ-algebra. However, in general this is not the case, as shown by the next example:

Example 1.1.4 [algebra which is not a σ-algebra] Let G be the system of subsets A ⊆ R such that A can be written as
A = (a_1, b_1] ∪ (a_2, b_2] ∪ ... ∪ (a_n, b_n]
where −∞ ≤ a_1 ≤ b_1 ≤ ... ≤ a_n ≤ b_n ≤ ∞, with the convention that (a, ∞] = (a, ∞) and (a, a] = ∅. Then G is an algebra, but not a σ-algebra.

Unfortunately, most of the important σ-algebras can not be constructed explicitly. Surprisingly, one can work practically with them nevertheless. In the following we describe a simple procedure which generates σ-algebras. We start with the fundamental
Proposition 1.1.5 [intersection of σ-algebras is a σ-algebra] Let Ω be an arbitrary non-empty set and let F_j, j ∈ J, J ≠ ∅, be a family of σ-algebras on Ω, where J is an arbitrary index set. Then
F := ∩_{j∈J} F_j
is a σ-algebra as well.

Proof. The proof is very easy, but typical and fundamental. First we notice that ∅, Ω ∈ F_j for all j ∈ J, so that ∅, Ω ∈ ∩_{j∈J} F_j. Now let A, A_1, A_2, ... ∈ ∩_{j∈J} F_j. Hence A, A_1, A_2, ... ∈ F_j for all j ∈ J, so that (F_j are σ-algebras!)
A^c = Ω\A ∈ F_j and ∪_{i=1}^∞ A_i ∈ F_j
for all j ∈ J. Consequently,
A^c ∈ ∩_{j∈J} F_j and ∪_{i=1}^∞ A_i ∈ ∩_{j∈J} F_j.
Given an arbitrary non-empty system G of subsets of Ω, let J be the set of all σ-algebras C on Ω that contain G. The set J is non-empty since 2^Ω is a σ-algebra. Hence
σ(G) := ∩_{C∈J} C
yields a σ-algebra according to Proposition 1.1.5 such that (by construction) G ⊆ σ(G). It remains to show that σ(G) is the smallest σ-algebra containing G. Assume another σ-algebra F with G ⊆ F. By definition of J we have that F ∈ J so that
σ(G) = ∩_{C∈J} C ⊆ F.

The construction is very elegant but has, as already mentioned, the slight disadvantage that one cannot construct all elements of σ(G) explicitly. Let us now turn to one of the most important examples, the Borel σ-algebra on R. To do this we need the notion of open and closed sets.
Definition 1.1.7 [open and closed sets]
(1) A subset A ⊆ R is called open, if for each x ∈ A there is an ε > 0 such that (x − ε, x + ε) ⊆ A.
(2) A subset B ⊆ R is called closed, if A := R\B is open.

Given −∞ ≤ a ≤ b ≤ ∞, the interval (a, b) is open and the interval [a, b] is closed. Moreover, by definition the empty set ∅ is open and closed.
Proposition 1.1.8 [Generation of the Borel σ-algebra on R] We let
G_0 be the system of all open subsets of R,
G_1 be the system of all closed subsets of R,
G_2 be the system of all intervals (−∞, b], b ∈ R,
G_3 be the system of all intervals (−∞, b), b ∈ R,
G_4 be the system of all intervals (a, b], −∞ < a < b < ∞,
G_5 be the system of all intervals (a, b), −∞ < a < b < ∞.
Then σ(G_0) = σ(G_1) = σ(G_2) = σ(G_3) = σ(G_4) = σ(G_5).
Definition 1.1.9 [Borel σ-algebra on R]^1 The σ-algebra constructed in Proposition 1.1.8 is called Borel σ-algebra and denoted by B(R).

In the same way one can introduce Borel σ-algebras on metric spaces: Given a metric space M with metric d, a set A ⊆ M is open provided that for all x ∈ A there is an ε > 0 such that {y ∈ M : d(x, y) < ε} ⊆ A. A set B ⊆ M is closed if the complement M\B is open. The Borel σ-algebra B(M) is the smallest σ-algebra that contains all open (closed) subsets of M.
Proof of Proposition 1.1.8. We only show that
σ(G_0) = σ(G_1) = σ(G_3) = σ(G_5).
Because of G_3 ⊆ G_0 one has
σ(G_3) ⊆ σ(G_0).
Moreover, for −∞ < a < b < ∞ one has that
(a, b) = ∪_{n=1}^∞ ((−∞, b)\(−∞, a + 1/n)) ∈ σ(G_3)
so that G_5 ⊆ σ(G_3) and
σ(G_5) ⊆ σ(G_3).
Now let us assume a bounded non-empty open set A ⊆ R. For all x ∈ A there is a maximal ε_x > 0 such that
(x − ε_x, x + ε_x) ⊆ A.
Hence
A = ∪_{x∈A∩Q} (x − ε_x, x + ε_x),
which proves G_0 ⊆ σ(G_5) and
σ(G_0) ⊆ σ(G_5).
Finally, A ∈ G_0 implies A^c ∈ G_1 ⊆ σ(G_1) and A ∈ σ(G_1). Hence G_0 ⊆ σ(G_1) and
σ(G_0) ⊆ σ(G_1).
The remaining inclusion σ(G_1) ⊆ σ(G_0) can be shown in the same way.

^1 Félix Édouard Justin Émile Borel, 07/01/1871-03/02/1956, French mathematician.
1.2 Probability measures
Now we introduce the measures we are going to use:
Definition 1.2.1 [probability measure, probability space] Let (Ω, F) be a measurable space.

(1) A map P : F → [0, 1] is called probability measure if P(Ω) = 1 and for all A_1, A_2, ... ∈ F with A_i ∩ A_j = ∅ for i ≠ j one has
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i). (1.1)
The triplet (Ω, F, P) is called probability space.

(2) A map μ : F → [0, ∞] is called measure if μ(∅) = 0 and for all A_1, A_2, ... ∈ F with A_i ∩ A_j = ∅ for i ≠ j one has
μ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ(A_i).
The triplet (Ω, F, μ) is called measure space.

(3) A measure space (Ω, F, μ) or a measure μ is called σ-finite provided that there are Ω_k ⊆ Ω, k = 1, 2, ..., such that
(a) Ω_k ∈ F for all k = 1, 2, ...,
(b) Ω_i ∩ Ω_j = ∅ for i ≠ j,
(c) Ω = ∪_{k=1}^∞ Ω_k,
(d) μ(Ω_k) < ∞.
The measure space (Ω, F, μ) or the measure μ are called finite if μ(Ω) < ∞.
Remark 1.2.2 Of course, any probability measure is a finite measure: We need only to check P(∅) = 0, which follows from P(∅) = Σ_{i=1}^∞ P(∅) (note that ∅ ∩ ∅ = ∅) and P(∅) < ∞.
Example 1.2.3 (a) We assume the model of a die, i.e. Ω = {1, ..., 6} and F = 2^Ω. Assuming that all outcomes for rolling a die are equally likely leads to
P({ω}) := 1/6.
Then, for example,
P({2, 4, 6}) = 1/2.
(b) If we assume to have two coins, i.e.
Ω = {(T, T), (H, T), (T, H), (H, H)}
and F = 2^Ω, then assuming all outcomes to be equally likely leads to
P({ω}) := 1/4.

Example 1.2.4 (a) Dirac measure^2: Given a measurable space (Ω, F) and a fixed x_0 ∈ Ω, we let
δ_{x_0}(A) := 1 if x_0 ∈ A and 0 if x_0 ∉ A.

(b) Counting measure: Let Ω := {ω_1, ..., ω_N} and F = 2^Ω. Then
μ(A) := cardinality of A.
Let us now discuss a typical example in which the σ-algebra F is not the set of all subsets of Ω.

Example 1.2.5 Assume that there are n communication channels between the points A and B. Each of the channels has a communication rate of ρ > 0 (say bits per second), which yields the communication rate ρk in case k channels are used. Each of the channels fails with probability p, so that we have a random communication rate R ∈ {0, ρ, ..., nρ}. What is the right model for this? We use
Ω := {ω = (ω_1, ..., ω_n) : ω_i ∈ {0, 1}}
with the interpretation: ω_i = 0 if channel i is failing, ω_i = 1 if channel i is working. F consists of all unions of
A_k := {ω ∈ Ω : ω_1 + ... + ω_n = k}.
Hence A_k consists of all ω such that the communication rate is ρk. The system F is the system of observable sets of events since one can only observe how many channels are failing, but not which channel fails. The measure P is given by
P(A_k) := (n choose k) p^{n−k} (1 − p)^k, 0 < p < 1.
Note that P describes the binomial distribution with parameter p on {0, ..., n} if we identify A_k with the natural number k.
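The weights P(A_k) from the example can be computed directly; the function name below is my own, and since the A_k partition Ω the weights must sum to 1.

```python
from math import comb

def P_channels(n, k, p):
    # P(A_k) = C(n, k) p^(n-k) (1-p)^k : k working channels,
    # each channel failing independently with probability p
    return comb(n, k) * p**(n - k) * (1 - p)**k

n, p = 5, 0.1
probs = [P_channels(n, k, p) for k in range(n + 1)]
print(sum(probs))  # ≈ 1.0, since the A_k partition Ω
```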
^2 Paul Adrien Maurice Dirac, 08/08/1902 (Bristol, England) - 20/10/1984 (Tallahassee, Florida, USA), Nobel Prize in Physics 1933.
We continue with some basic properties of a probability measure.
Proposition 1.2.6 Let (Ω, F, P) be a probability space. Then the following assertions are true:

(1) If A_1, ..., A_n ∈ F such that A_i ∩ A_j = ∅ if i ≠ j, then P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).
(2) If A, B ∈ F, then P(A\B) = P(A) − P(A ∩ B).
(3) If B ∈ F, then P(B^c) = 1 − P(B).
(4) If A_1, A_2, ... ∈ F, then P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
(5) Continuity from below: If A_1, A_2, ... ∈ F such that A_1 ⊆ A_2 ⊆ A_3 ⊆ ..., then
lim_n P(A_n) = P(∪_{n=1}^∞ A_n).
(6) Continuity from above: If A_1, A_2, ... ∈ F such that A_1 ⊇ A_2 ⊇ A_3 ⊇ ..., then
lim_n P(A_n) = P(∩_{n=1}^∞ A_n).
Proof. (1) We let A_{n+1} = A_{n+2} = ... = ∅, so that
P(∪_{i=1}^n A_i) = P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^n P(A_i),
because of P(∅) = 0.
(2) Since (A ∩ B) ∩ (A\B) = ∅, we get that
P(A ∩ B) + P(A\B) = P((A ∩ B) ∪ (A\B)) = P(A).
(3) We apply (2) to A = Ω and observe that Ω\B = B^c by definition and Ω ∩ B = B.
(4) Put B_1 := A_1 and B_i := A_1^c ∩ A_2^c ∩ ... ∩ A_{i−1}^c ∩ A_i for i = 2, 3, ... Obviously, P(B_i) ≤ P(A_i) for all i. Since the B_i's are disjoint and ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ B_i, it follows
P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^∞ B_i) = Σ_{i=1}^∞ P(B_i) ≤ Σ_{i=1}^∞ P(A_i).
(5) We define B_1 := A_1, B_2 := A_2\A_1, B_3 := A_3\A_2, B_4 := A_4\A_3, ... and get that
∪_{n=1}^∞ B_n = ∪_{n=1}^∞ A_n and B_i ∩ B_j = ∅
for i ≠ j. Consequently,
P(∪_{n=1}^∞ A_n) = P(∪_{n=1}^∞ B_n) = Σ_{n=1}^∞ P(B_n) = lim_{N→∞} Σ_{n=1}^N P(B_n) = lim_{N→∞} P(A_N)
since ∪_{n=1}^N B_n = A_N. (6) is an exercise.
The sequence a_n = (−1)^n, n = 1, 2, ..., does not converge, i.e. the limit of (a_n)_{n=1}^∞ does not exist. But the limit superior and the limit inferior for a given sequence of real numbers always exist.

Definition 1.2.7 [liminf_n a_n and limsup_n a_n] For a_1, a_2, ... ∈ R we let
liminf_n a_n := lim_n inf_{k≥n} a_k and limsup_n a_n := lim_n sup_{k≥n} a_k.
Remark 1.2.8 (1) The value liminf_n a_n is the infimum of all c such that there is a subsequence n_1 < n_2 < n_3 < ... such that lim_k a_{n_k} = c.
(2) The value limsup_n a_n is the supremum of all c such that there is a subsequence n_1 < n_2 < n_3 < ... such that lim_k a_{n_k} = c.
(3) By definition one has that
−∞ ≤ liminf_n a_n ≤ limsup_n a_n ≤ ∞.
Moreover, if liminf_n a_n = limsup_n a_n = a ∈ R, then lim_n a_n = a.
(4) For example, taking a_n = (−1)^n gives
liminf_n a_n = −1 and limsup_n a_n = 1.
As we will see below, also for a sequence of sets one can define a limit superior and a limit inferior.
Definition 1.2.9 [liminf_n A_n and limsup_n A_n] Let (Ω, F) be a measurable space and A_1, A_2, ... ∈ F. Then
liminf_n A_n := ∪_{n=1}^∞ ∩_{k=n}^∞ A_k and limsup_n A_n := ∩_{n=1}^∞ ∪_{k=n}^∞ A_k.
The definition above says that ω ∈ liminf_n A_n if and only if all events A_n, except a finite number of them, occur, and that ω ∈ limsup_n A_n if and only if infinitely many of the events A_n occur.
Proposition 1.2.10 [Lemma of Fatou]^3 Let (Ω, F, P) be a probability space and A_1, A_2, ... ∈ F. Then
P(liminf_n A_n) ≤ liminf_n P(A_n) ≤ limsup_n P(A_n) ≤ P(limsup_n A_n).
The proposition will be deduced from Proposition 3.2.6.
Remark 1.2.11 If liminf_n A_n = limsup_n A_n = A, then the Lemma of Fatou gives that
liminf_n P(A_n) = limsup_n P(A_n) = lim_n P(A_n) = P(A).
Examples for such systems are obtained by assuming A_1 ⊆ A_2 ⊆ A_3 ⊆ ... or A_1 ⊇ A_2 ⊇ A_3 ⊇ ...
Now we turn to the fundamental notion of independence.
Definition 1.2.12 [independence of events] Let (Ω, F, P) be a probability space. The events (A_i)_{i∈I} ⊆ F, where I is an arbitrary non-empty index set, are called independent, provided that for all distinct i_1, ..., i_n ∈ I one has that
P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_n}) = P(A_{i_1}) P(A_{i_2}) ... P(A_{i_n}).
Given A_1, ..., A_n ∈ F, one can easily see that only demanding
P(A_1 ∩ A_2 ∩ ... ∩ A_n) = P(A_1) P(A_2) ... P(A_n)
would not yield an appropriate notion for the independence of A_1, ..., A_n: for example, taking A and B with
P(A ∩ B) ≠ P(A)P(B)
and C = ∅ gives
P(A ∩ B ∩ C) = P(A)P(B)P(C),
which is surely not what we had in mind. Independence can also be expressed through conditional probabilities. Let us first define them:
Definition 1.2.13 [conditional probability] Let (Ω, F, P) be a probability space, A ∈ F with P(A) > 0. Then
P(B|A) := P(B ∩ A)/P(A), for B ∈ F,
is called conditional probability of B given A.
^3 Pierre Joseph Louis Fatou, 28/02/1878-10/08/1929, French mathematician (dynamical systems, Mandelbrot set).
It is now obvious that A and B are independent if and only if P(B|A) = P(B). An important place of the conditional probabilities in Statistics is guaranteed by Bayes' formula. Before we formulate this formula in Proposition 1.2.15 we consider A, B ∈ F with 0 < P(B) < 1 and P(A) > 0. Then
A = (A ∩ B) ∪ (A ∩ B^c),
where (A ∩ B) ∩ (A ∩ B^c) = ∅, and therefore,
P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A|B)P(B) + P(A|B^c)P(B^c).
This implies
P(B|A) = P(B ∩ A)/P(A) = P(A|B)P(B)/P(A) = P(A|B)P(B)/(P(A|B)P(B) + P(A|B^c)P(B^c)).
Example 1.2.14 A laboratory blood test is 95% effective in detecting a certain disease when it is, in fact, present. However, the test also yields a false positive result for 1% of the healthy persons tested. If 0.5% of the population actually has the disease, what is the probability a person has the disease given his test result is positive? We set
B := "the person has the disease",
A := "the test result is positive".
Hence we have
P(A|B) = P(a positive test result | person has the disease) = 0.95,
P(A|B^c) = 0.01,
P(B) = 0.005.
Applying the above formula we get
P(B|A) = (0.95 · 0.005)/(0.95 · 0.005 + 0.01 · 0.995) ≈ 0.323.
That means only about 32% of the persons whose test results are positive actually have the disease.
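The computation from the example can be written out directly; the function name `bayes` is mine, implementing the formula derived above.

```python
def bayes(p_A_given_B, p_A_given_Bc, p_B):
    # P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c))
    num = p_A_given_B * p_B
    return num / (num + p_A_given_Bc * (1 - p_B))

# blood test: sensitivity 0.95, false-positive rate 0.01, prevalence 0.005
print(round(bayes(0.95, 0.01, 0.005), 3))  # 0.323
```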
Proposition 1.2.15 [Bayes' formula]^4 Assume A, B_j ∈ F, with Ω = ∪_{j=1}^n B_j, where B_i ∩ B_j = ∅ for i ≠ j, and P(A) > 0, P(B_j) > 0 for j = 1, ..., n. Then
P(B_j|A) = P(A|B_j)P(B_j) / Σ_{k=1}^n P(A|B_k)P(B_k).

^4 Thomas Bayes, 1702-17/04/1761, English mathematician.
The proof is an exercise. An event B_j is also called hypothesis, the probabilities P(B_j) the prior probabilities (or a priori probabilities), and the probabilities P(B_j|A) the posterior probabilities (or a posteriori probabilities) of B_j.
Now we continue with the fundamental Lemma of Borel-Cantelli.
Proposition 1.2.16 [Lemma of Borel-Cantelli]^5 Let (Ω, F, P) be a probability space and A_1, A_2, ... ∈ F. Then one has the following:
(1) If Σ_{n=1}^∞ P(A_n) < ∞, then P(limsup_n A_n) = 0.
(2) If A_1, A_2, ... are assumed to be independent and Σ_{n=1}^∞ P(A_n) = ∞, then P(limsup_n A_n) = 1.
Proof. (1) It holds by definition limsup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k. By
∪_{k=n+1}^∞ A_k ⊆ ∪_{k=n}^∞ A_k
and the continuity of P from above (see Proposition 1.2.6) we get that
P(limsup_n A_n) = P(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = lim_n P(∪_{k=n}^∞ A_k) ≤ lim_n Σ_{k=n}^∞ P(A_k) = 0,
where the last inequality follows again from Proposition 1.2.6.
(2) It holds that
(limsup_n A_n)^c = liminf_n A_n^c = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k^c.
So, we would need to show
P(∪_{n=1}^∞ ∩_{k=n}^∞ A_k^c) = 0.
Letting B_n := ∩_{k=n}^∞ A_k^c we get that B_1 ⊆ B_2 ⊆ B_3 ⊆ ..., so that
P(∪_{n=1}^∞ ∩_{k=n}^∞ A_k^c) = lim_n P(B_n),

^5 Francesco Paolo Cantelli, 20/12/1875-21/07/1966, Italian mathematician.
so that it suffices to show
P(B_n) = P(∩_{k=n}^∞ A_k^c) = 0.
Since the independence of A_1, A_2, ... implies the independence of A_1^c, A_2^c, ..., we finally get (setting p_n := P(A_n)) that
P(∩_{k=n}^∞ A_k^c) = lim_{N→∞, N≥n} P(∩_{k=n}^N A_k^c)
= lim_{N→∞, N≥n} Π_{k=n}^N P(A_k^c)
= lim_{N→∞, N≥n} Π_{k=n}^N (1 − p_k)
≤ lim_{N→∞, N≥n} Π_{k=n}^N e^{−p_k}
= lim_{N→∞, N≥n} e^{−Σ_{k=n}^N p_k}
= e^{−Σ_{k=n}^∞ p_k} = e^{−∞} = 0,
where we have used that 1 − x ≤ e^{−x}.
Although the definition of a measure is not difficult, to prove existence and uniqueness of measures may sometimes be difficult. The problem lies in the fact that, in general, the σ-algebras are not constructed explicitly, one only knows their existence. To overcome this difficulty, one usually exploits

Proposition 1.2.17 [Carathéodory's extension theorem]^6 Let Ω be a non-empty set and G be an algebra on Ω, and set F := σ(G). Assume that P_0 : G → [0, 1] satisfies P_0(Ω) = 1 and the following: if A_1, A_2, ... ∈ G with A_i ∩ A_j = ∅ for i ≠ j and ∪_{i=1}^∞ A_i ∈ G, then
P_0(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P_0(A_i).

^6 Constantin Carathéodory, 13/09/1873 (Berlin, Germany) - 02/02/1950 (Munich, Germany).
Then there exists a unique finite measure P on F such that
P(A) = P_0(A) for all A ∈ G.

Proof. See [3] (Theorem 3.1).
As an application we construct (more or less without rigorous proof) the product space
(Ω_1 × Ω_2, F_1 ⊗ F_2, P_1 × P_2)
of two probability spaces (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2). We do this as follows:

(1) Ω_1 × Ω_2 := {(ω_1, ω_2) : ω_1 ∈ Ω_1, ω_2 ∈ Ω_2}.
(2) F_1 ⊗ F_2 is the smallest σ-algebra on Ω_1 × Ω_2 which contains all sets of type
A_1 × A_2 := {(ω_1, ω_2) : ω_1 ∈ A_1, ω_2 ∈ A_2} with A_1 ∈ F_1, A_2 ∈ F_2.
(3) As algebra G we take all sets of type
A := (A_1^1 × A_2^1) ∪ ... ∪ (A_1^n × A_2^n)
with A_1^k ∈ F_1, A_2^k ∈ F_2, and (A_1^i × A_2^i) ∩ (A_1^j × A_2^j) = ∅ for i ≠ j.

Finally, we define P_0 : G → [0, 1] by
P_0((A_1^1 × A_2^1) ∪ ... ∪ (A_1^n × A_2^n)) := Σ_{k=1}^n P_1(A_1^k) P_2(A_2^k).
Proposition 1.2.18 The system G is an algebra. The map P_0 : G → [0, 1] is correctly defined and satisfies the assumptions of Carathéodory's extension theorem Proposition 1.2.17.
Proof. (i) Assume
A = (A_1^1 × A_2^1) ∪ ... ∪ (A_1^n × A_2^n) = (B_1^1 × B_2^1) ∪ ... ∪ (B_1^m × B_2^m)
where the (A_1^k × A_2^k)_{k=1}^n and the (B_1^l × B_2^l)_{l=1}^m are pair-wise disjoint, respectively. We find partitions C_1^1, ..., C_1^{N_1} of Ω_1 and C_2^1, ..., C_2^{N_2} of Ω_2 so that A_i^k and B_i^l can be represented as disjoint unions of the sets C_i^r, r = 1, ..., N_i. Hence there is a representation
A = ∪_{(r,s)∈I} (C_1^r × C_2^s)
for some I ⊆ {1, ..., N_1} × {1, ..., N_2}. By drawing a picture and using that P_1 and P_2 are measures one observes that
Σ_{k=1}^n P_1(A_1^k) P_2(A_2^k) = Σ_{(r,s)∈I} P_1(C_1^r) P_2(C_2^s) = Σ_{l=1}^m P_1(B_1^l) P_2(B_2^l).
(ii) To check that P_0 is σ-additive on G it is sufficient to prove the following: For A_i, A_i^k ∈ F_i with
A_1 × A_2 = ∪_{k=1}^∞ (A_1^k × A_2^k)
and (A_1^k × A_2^k) ∩ (A_1^l × A_2^l) = ∅ for k ≠ l one has that
P_1(A_1) P_2(A_2) = Σ_{k=1}^∞ P_1(A_1^k) P_2(A_2^k).
Since the inequality
P_1(A_1) P_2(A_2) ≥ Σ_{k=1}^N P_1(A_1^k) P_2(A_2^k)
can be easily seen for all N = 1, 2, ... by an argument like in step (i), we concentrate ourselves on the converse inequality
P_1(A_1) P_2(A_2) ≤ Σ_{k=1}^∞ P_1(A_1^k) P_2(A_2^k).
We let
φ(ω_1) := Σ_{n=1}^∞ 1I_{{ω_1 ∈ A_1^n}} P_2(A_2^n) and φ_N(ω_1) := Σ_{n=1}^N 1I_{{ω_1 ∈ A_1^n}} P_2(A_2^n)
for N ≥ 1, so that 0 ≤ φ_N(ω_1) ↑ φ(ω_1) = 1I_{A_1}(ω_1) P_2(A_2). Let ε ∈ (0, 1) and
B_N := {ω_1 ∈ Ω_1 : (1 − ε) P_2(A_2) ≤ φ_N(ω_1)} ∈ F_1.
The sets B_N increase and ∪_{N=1}^∞ B_N = A_1, so that
(1 − ε) P_1(A_1) P_2(A_2) = lim_N (1 − ε) P_1(B_N) P_2(A_2).
Because (1 − ε) P_2(A_2) ≤ φ_N(ω_1) for all ω_1 ∈ B_N, one gets that
(1 − ε) P_1(B_N) P_2(A_2) ≤ Σ_{k=1}^N P_1(A_1^k) P_2(A_2^k) and therefore
lim_N (1 − ε) P_1(B_N) P_2(A_2) ≤ lim_N Σ_{k=1}^N P_1(A_1^k) P_2(A_2^k) = Σ_{k=1}^∞ P_1(A_1^k) P_2(A_2^k).
Since ε ∈ (0, 1) was arbitrary, we are done.
Definition 1.2.19 [product of probability spaces] The extension of P_0 to F_1 ⊗ F_2 according to Proposition 1.2.17 is called product measure and denoted by P_1 × P_2. The probability space (Ω_1 × Ω_2, F_1 ⊗ F_2, P_1 × P_2) is called product probability space.
One can prove that
(F_1 ⊗ F_2) ⊗ F_3 = F_1 ⊗ (F_2 ⊗ F_3) and (P_1 × P_2) × P_3 = P_1 × (P_2 × P_3),
which we simply denote by
(Ω_1 × Ω_2 × Ω_3, F_1 ⊗ F_2 ⊗ F_3, P_1 × P_2 × P_3).
Iterating this procedure, we can define finite products
(Ω_1 × Ω_2 × ... × Ω_n, F_1 ⊗ F_2 ⊗ ... ⊗ F_n, P_1 × P_2 × ... × P_n)
by iteration. The case of infinite products requires more work. Here the interested reader is referred, for example, to [5] where a special case is considered.
Now we define the Borel σ-algebra on R^n.

Definition 1.2.20 For n ∈ {1, 2, ...} we let
B(R^n) := σ((a_1, b_1) × ... × (a_n, b_n) : a_1 < b_1, ..., a_n < b_n).

Letting |x − y| := (Σ_{k=1}^n |x_k − y_k|^2)^{1/2} for x = (x_1, ..., x_n) ∈ R^n and y = (y_1, ..., y_n) ∈ R^n, we say that a set A ⊆ R^n is open provided that for all x ∈ A there is an ε > 0 such that
{y ∈ R^n : |x − y| < ε} ⊆ A.
As in the case n = 1 one can show that B(R^n) is the smallest σ-algebra which contains all open subsets of R^n. Regarding the above product spaces there is

Proposition 1.2.21 B(R^n) = B(R) ⊗ ... ⊗ B(R).
If one is only interested in the uniqueness of measures one can also use the following approach as a replacement of Carathéodory's extension theorem.

1.3 Examples of distributions

1.3.1 Binomial distribution with parameter 0 < p < 1

(1) Ω := {0, 1, ..., n}.
(2) F := 2^Ω (system of all subsets of Ω).
(3) μ_{n,p}(B) := Σ_{k=0}^n (n choose k) p^k (1 − p)^{n−k} δ_k(B), where δ_k is the Dirac measure introduced in Example 1.2.4.

Interpretation: Coin-tossing with one coin, such that one has heads with probability p and tails with probability 1 − p. Then μ_{n,p}({k}) equals the probability that within n trials one has k times heads.
1.3.2 Poisson distribution with parameter λ > 0

(1) Ω := {0, 1, 2, 3, ...}.
(2) F := 2^Ω (system of all subsets of Ω).
(3) π_λ(B) := Σ_{k=0}^∞ e^{−λ} (λ^k/k!) δ_k(B).

The Poisson distribution^7 is used, for example, to model stochastic processes with a continuous time parameter and jumps: the probability that the process jumps k times between the time-points s and t with 0 ≤ s < t < ∞ is equal to π_{λ(t−s)}({k}).
1.3.3 Geometric distribution with parameter 0 < p < 1

(1) Ω := {0, 1, 2, 3, ...}.
(2) F := 2^Ω (system of all subsets of Ω).
(3) μ_p(B) := Σ_{k=0}^∞ (1 − p)^k p δ_k(B).

Interpretation: The probability that an electric light bulb breaks down is p ∈ (0, 1). The bulb does not have a memory, that means the break down is independent of the time the bulb has been already switched on. So, we get the following model: at day 0 the probability of breaking down is p. If the bulb survives day 0, it breaks down with probability p at the first day, so that the total probability of a break down at day 1 is (1 − p)p. If we continue in this way we get that breaking down at day k has the probability (1 − p)^k p.
^7 Siméon Denis Poisson, 21/06/1781 (Pithiviers, France) - 25/04/1840 (Sceaux, France).
1.3.4 Lebesgue measure and uniform distribution

Using Carathéodory's extension theorem we first construct the uniform distribution on an interval (a, b] with −∞ < a < b < ∞: as algebra G we take all finite unions of pair-wise disjoint intervals (a_i, b_i] ⊆ (a, b] and let
P_0((a_1, b_1] ∪ ... ∪ (a_n, b_n]) := (1/(b − a)) Σ_{i=1}^n (b_i − a_i).

Proposition 1.3.1 The system G is an algebra. The map P_0 : G → [0, 1] is correctly defined and satisfies the assumptions of Carathéodory's extension theorem Proposition 1.2.17.
Proof. For notational simplicity we let a = 0 and b = 1. After a standard reduction (check!) we have to show the following: given 0 ≤ a ≤ b ≤ 1 and pair-wise disjoint intervals (a_n, b_n] with
(a, b] = ∪_{n=1}^∞ (a_n, b_n]
we have that b − a = Σ_{n=1}^∞ (b_n − a_n). Let ε ∈ (0, b − a) and observe that
[a + ε, b] ⊆ ∪_{n=1}^∞ (a_n, b_n + ε 2^{−n}).
Hence we have an open covering of a compact set and there is a finite subcover:
[a + ε, b] ⊆ ∪_{n∈I(ε)} ((a_n, b_n] ∪ (b_n, b_n + ε 2^{−n}))
for some finite set I(ε). The total length of the intervals (b_n, b_n + ε 2^{−n}) is at most ε > 0, so that
b − a − ε ≤ Σ_{n∈I(ε)} (b_n − a_n) + ε ≤ Σ_{n=1}^∞ (b_n − a_n) + ε.
Letting ε → 0 we arrive at
b − a ≤ Σ_{n=1}^∞ (b_n − a_n).
Since b − a ≥ Σ_{n=1}^N (b_n − a_n) for all N ≥ 1, the opposite inequality is obvious.
The Lebesgue measure^8 λ on R is then given by
λ(B) := Σ_{n=−∞}^{+∞} P_n(B ∩ (n − 1, n]),
where P_n is the uniform distribution on (n − 1, n]. Accordingly, λ is the unique σ-finite measure on B(R) such that λ((c, d]) = d − c for all −∞ < c < d < ∞. We can also write λ(B) = ∫_B dλ(x). Now we can go backwards: In order to obtain the Lebesgue measure on a set I ⊆ R with I ∈ B(R) we let B_I := {B ∩ I : B ∈ B(R)} and
λ_I(B) := λ(B) for B ∈ B_I.
Given that λ(I) > 0, then λ_I/λ(I) is the uniform distribution on I. Important cases for I are the closed intervals [a, b]. Furthermore, for −∞ < a < b < ∞ we have that B_{(a,b]} = B((a, b]) (check!).

^8 Henri Léon Lebesgue, 28/06/1875-26/07/1941, French mathematician (generalized the Riemann integral by the Lebesgue integral; continuation of work of Émile Borel and Camille Jordan).
1.3.5 Gaussian distribution on R with mean m ∈ R and variance σ² > 0

(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) We take the algebra G considered in Example 1.1.4 and define
P_0(A) := Σ_{i=1}^n ∫_{a_i}^{b_i} (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} dx
for A := (a_1, b_1] ∪ (a_2, b_2] ∪ ... ∪ (a_n, b_n], where we consider the Riemann integral^9 on the right-hand side. One can show (we do not do this here, but compare with Proposition 3.5.8 below) that P_0 satisfies the assumptions of Proposition 1.2.17, so that we can extend P_0 to a probability measure N_{m,σ²} on B(R).

The measure N_{m,σ²} is called Gaussian distribution^10 (normal distribution) with mean m and variance σ². Given A ∈ B(R) we write
N_{m,σ²}(A) = ∫_A p_{m,σ²}(x) dx with p_{m,σ²}(x) := (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.
The function p_{m,σ²}(x) is called Gaussian density.
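As a numerical sanity check (my own sketch, not part of the notes), the Gaussian density integrates to approximately 1 over a wide interval; a midpoint Riemann sum is used, matching the Riemann-integral viewpoint of the definition.

```python
from math import exp, pi, sqrt

def p(x, m=0.0, s2=1.0):
    # Gaussian density p_{m,σ²}(x) = exp(-(x-m)²/(2σ²)) / sqrt(2πσ²)
    return exp(-(x - m)**2 / (2 * s2)) / sqrt(2 * pi * s2)

N, lo, hi = 100_000, -10.0, 10.0
h = (hi - lo) / N
total = sum(p(lo + (i + 0.5) * h) * h for i in range(N))  # midpoint rule
print(abs(total - 1.0) < 1e-6)  # True: the mass outside [-10, 10] is tiny
```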
1.3.6 Exponential distribution on R with parameter λ > 0

(1) Ω := R.
(2) F := B(R) Borel σ-algebra.
(3) For A and G as in Subsection 1.3.5 we define, via the Riemann integral,
P_0(A) := Σ_{i=1}^n ∫_{a_i}^{b_i} p_λ(x) dx with p_λ(x) := 1I_{[0,∞)}(x) λ e^{−λx}.
Again, P_0 satisfies the assumptions of Proposition 1.2.17, so that we can extend P_0 to the exponential distribution μ_λ with parameter λ and density p_λ(x) on B(R).

^9 Georg Friedrich Bernhard Riemann, 17/09/1826 (Germany) - 20/07/1866 (Italy), Ph.D. thesis under Gauss.
^10 Johann Carl Friedrich Gauss, 30/04/1777 (Brunswick, Germany) - 23/02/1855 (Göttingen, Hannover, Germany).
Given A ∈ B(R) we write
μ_λ(A) = ∫_A p_λ(x) dx.
The exponential distribution can be considered as a continuous time version of the geometric distribution. In particular, we see that the distribution does not have a memory in the sense that for a, b ≥ 0 we have
μ_λ([a + b, ∞)|[a, ∞)) = μ_λ([b, ∞))
with the conditional probability on the left-hand side. In words: the probability of a realization larger or equal to a + b under the condition that one has already a value larger or equal to a is the same as having a realization larger or equal to b. Indeed, it holds
μ_λ([a + b, ∞)|[a, ∞)) = μ_λ([a + b, ∞) ∩ [a, ∞))/μ_λ([a, ∞)) = (∫_{a+b}^∞ λ e^{−λx} dx)/(∫_a^∞ λ e^{−λx} dx) = e^{−λ(a+b)}/e^{−λa} = μ_λ([b, ∞)).

Example 1.3.4 Suppose that the amount of time one spends in a post office is exponentially distributed with λ = 1/10.
(a) What is the probability that a customer will spend more than 15 minutes?
(b) What is the probability that a customer will spend more than 15 minutes from the beginning in the post office, given that the customer already spent at least 10 minutes?
The answer for (a) is μ_λ([15, ∞)) = e^{−15·(1/10)} ≈ 0.223. For (b) we get
μ_λ([15, ∞)|[10, ∞)) = μ_λ([5, ∞)) = e^{−5·(1/10)} ≈ 0.607.
1.3.7 Poisson's Theorem

For large n and small p the Poisson distribution provides a good approximation for the binomial distribution.

Proposition 1.3.5 [Poisson's Theorem] Let λ > 0, p_n ∈ (0, 1), n = 1, 2, ..., and assume that np_n → λ as n → ∞. Then, for all k = 0, 1, ...,
μ_{n,p_n}({k}) → π_λ({k}), n → ∞.
Proof. Fix an integer k ≥ 0. Then
μ_{n,p_n}({k}) = (n choose k) p_n^k (1 − p_n)^{n−k}
= [n(n − 1) ... (n − k + 1)/k!] p_n^k (1 − p_n)^{n−k}
= (1/k!) [n(n − 1) ... (n − k + 1)/n^k] (np_n)^k (1 − p_n)^{n−k}.
Of course, lim_n (np_n)^k = λ^k and lim_n n(n − 1) ... (n − k + 1)/n^k = 1. So we have to show that lim_n (1 − p_n)^{n−k} = e^{−λ}. By np_n → λ we get that there exists a sequence ε_n such that
np_n = λ + ε_n with lim_n ε_n = 0.
Choose ε_0 > 0 and n_0 ≥ 1 such that |ε_n| ≤ ε_0 for all n ≥ n_0. Then
(1 − (λ + ε_0)/n)^{n−k} ≤ (1 − (λ + ε_n)/n)^{n−k} ≤ (1 − (λ − ε_0)/n)^{n−k}.
Using l'Hospital's rule we get
lim_n ln(1 − (λ + ε_0)/n)^{n−k} = lim_n (n − k) ln(1 − (λ + ε_0)/n)
= lim_n ln(1 − (λ + ε_0)/n) / (1/(n − k))
= lim_n [(1 − (λ + ε_0)/n)^{−1} (λ + ε_0)/n²] / [−1/(n − k)²]
= −(λ + ε_0).
Hence
e^{−(λ+ε_0)} = lim_n (1 − (λ + ε_0)/n)^{n−k} ≤ liminf_n (1 − (λ + ε_n)/n)^{n−k}.
In the same way we get that
limsup_n (1 − (λ + ε_n)/n)^{n−k} ≤ e^{−(λ−ε_0)}.
Finally, since we can choose ε_0 > 0 arbitrarily small,
lim_n (1 − p_n)^{n−k} = lim_n (1 − (λ + ε_n)/n)^{n−k} = e^{−λ}.

1.4 A set which is not a Borel set

In this section we want to show that there exists a subset of (0, 1] which does not belong to B((0, 1]). For this we need the following two notions.

Definition 1.4.1 [π-system, λ-system] A system P of subsets of Ω is called π-system, provided that A ∩ B ∈ P for all A, B ∈ P. A system L of subsets of Ω is called λ-system, provided that
(1) Ω ∈ L,
(2) A, B ∈ L and A ⊆ B imply B\A ∈ L,
(3) A_1, A_2, ... ∈ L with A_1 ⊆ A_2 ⊆ A_3 ⊆ ... imply ∪_{n=1}^∞ A_n ∈ L.
Proposition 1.4.2 [π-λ-Theorem] If P is a π-system and L is a λ-system, then P ⊆ L implies σ(P) ⊆ L.

Definition 1.4.3 [equivalence relation] A relation ∼ on a set X is called equivalence relation if and only if
(1) x ∼ x for all x ∈ X (reflexivity),
(2) x ∼ y implies y ∼ x for all x, y ∈ X (symmetry),
(3) x ∼ y and y ∼ z imply x ∼ z for all x, y, z ∈ X (transitivity).

Given x, y ∈ (0, 1] and A ⊆ (0, 1], we also need the addition modulo one
x ⊕ y := x + y if x + y ∈ (0, 1] and x + y − 1 otherwise,
and
A ⊕ x := {a ⊕ x : a ∈ A}.
Now define
L := {A ∈ B((0, 1]) : A ⊕ x ∈ B((0, 1]) and λ(A ⊕ x) = λ(A) for all x ∈ (0, 1]} (1.2)
where λ is the Lebesgue measure on (0, 1].

Lemma 1.4.4 The system L from (1.2) is a λ-system.
Proof. The property (1) is clear since Ω ⊕ x = Ω. To check (2) let A, B ∈ L and A ⊆ B, so that
λ(A ⊕ x) = λ(A) and λ(B ⊕ x) = λ(B).
We have to show that B\A ∈ L. By the definition of ⊕ it is easy to see that A ⊆ B implies A ⊕ x ⊆ B ⊕ x and
(B ⊕ x)\(A ⊕ x) = (B\A) ⊕ x,
and therefore, (B\A) ⊕ x ∈ B((0, 1]). Since λ is a probability measure it follows
λ(B\A) = λ(B) − λ(A)
= λ(B ⊕ x) − λ(A ⊕ x)
= λ((B ⊕ x)\(A ⊕ x))
= λ((B\A) ⊕ x)
and B\A ∈ L. Property (3) is left as an exercise.
Finally, we need the axiom of choice.

Proposition 1.4.5 [Axiom of choice] Let I be a non-empty set and (M_α)_{α∈I} be a system of non-empty sets M_α. Then there exists a family (m_α)_{α∈I} with m_α ∈ M_α for all α ∈ I. In other words, one can form a set by choosing of each set M_α a representative m_α.

Proposition 1.4.6 There exists a subset H ⊆ (0, 1] which does not belong to B((0, 1]).
Proof. We take the system L from (1.2). If (a, b] ⊆ [0, 1], then (a, b] ∈ L. Since
P := {(a, b] : 0 ≤ a < b ≤ 1}
is a π-system which generates B((0, 1]), it follows by the π-λ-Theorem 1.4.2 that
B((0, 1]) ⊆ L.
Let us define the equivalence relation
x ∼ y if and only if x ⊕ r = y for some rational r ∈ (0, 1].
Let H ⊆ (0, 1] be consisting of exactly one representative point from each equivalence class (such a set exists under the assumption of the axiom of
choice). Then H ⊕ r_1 and H ⊕ r_2 are disjoint for r_1 ≠ r_2: if they were not disjoint, then there would exist h_1 ⊕ r_1 ∈ (H ⊕ r_1) and h_2 ⊕ r_2 ∈ (H ⊕ r_2) with h_1 ⊕ r_1 = h_2 ⊕ r_2. But this implies h_1 ∼ h_2 and hence h_1 = h_2 and r_1 = r_2. So it follows that (0, 1] is the countable union of disjoint sets
(0, 1] = ∪_{r∈(0,1] rational} (H ⊕ r).
If we assume that H ∈ B((0, 1]) then B((0, 1]) ⊆ L implies H ⊕ r ∈ B((0, 1]) and
λ((0, 1]) = λ(∪_{r∈(0,1] rational} (H ⊕ r)) = Σ_{r∈(0,1] rational} λ(H ⊕ r).
By B((0, 1]) ⊆ L we have λ(H ⊕ r) = λ(H) = a ≥ 0 for all rational numbers r ∈ (0, 1]. Consequently,
1 = λ((0, 1]) = Σ_{r∈(0,1] rational} λ(H ⊕ r) = a + a + ...
So the right-hand side can either be 0 (if a = 0) or ∞ (if a > 0). This leads to a contradiction, so H ∉ B((0, 1]).
Chapter 2
Random variables
Given a probability space (Ω, F, P), in many stochastic models functions f : Ω → R which describe certain random phenomena are considered, and one is interested in the computation of expressions like
P({ω ∈ Ω : f(ω) ∈ (a, b)}), where a < b.
This leads us to the condition
{ω ∈ Ω : f(ω) ∈ (a, b)} ∈ F
and hence to random variables we will introduce now.

2.1 Random variables

We start with the most simple random variables.

Definition 2.1.1 [(measurable) step-function] Let (Ω, F) be a measurable space. A function f : Ω → R is called measurable step-function or step-function, provided that there are α_1, ..., α_n ∈ R and A_1, ..., A_n ∈ F such that f can be written as
f(ω) = Σ_{i=1}^n α_i 1I_{A_i}(ω),
where
1I_{A_i}(ω) := 1 if ω ∈ A_i and 0 if ω ∉ A_i.
Some particular examples for step-functions are
1I_Ω = 1,
1I_∅ = 0,
1I_A + 1I_{A^c} = 1,
1I_{A∩B} = 1I_A 1I_B,
1I_{A∪B} = 1I_A + 1I_B − 1I_{A∩B}.
The definition above concerns only functions which take finitely many values, which will be too restrictive in the future. So we wish to extend this definition.
Definition 2.1.2 [random variables] Let (Ω, F) be a measurable space. A map f : Ω → R is called random variable provided that there is a sequence (f_n)_{n=1}^∞ of measurable step-functions f_n : Ω → R such that
f(ω) = lim_{n→∞} f_n(ω) for all ω ∈ Ω.
Does our definition give what we would like to have? Yes, as we see from
Proposition 2.1.3 Let (Ω, F) be a measurable space and let f : Ω → R be a function. Then the following conditions are equivalent:
(1) f is a random variable.
(2) For all −∞ < a < b < ∞ one has that
f^{−1}((a, b)) := {ω ∈ Ω : a < f(ω) < b} ∈ F.
Proof. (1) ⇒ (2) Assume that
f(ω) = lim_{n→∞} f_n(ω)
where f_n : Ω → R are measurable step-functions. For a measurable step-function one has that
f_n^{−1}((a, b)) ∈ F,
so that
f^{−1}((a, b)) = {ω ∈ Ω : a < lim_n f_n(ω) < b}
= ⋃_{m=1}^∞ ⋃_{N=1}^∞ ⋂_{n=N}^∞ {ω ∈ Ω : a + 1/m < f_n(ω) < b − 1/m} ∈ F.
(2) ⇒ (1) First we observe that we also have that
f^{−1}([a, b)) = {ω ∈ Ω : a ≤ f(ω) < b}
= ⋂_{m=1}^∞ {ω ∈ Ω : a − 1/m < f(ω) < b} ∈ F,
so that we can use the step-functions
f_n(ω) := Σ_{k=−4^n}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω).
Writing
f_n(ω) = Σ_{i=1}^k α_i 1I_{A_i}(ω) and g_n(ω) = Σ_{j=1}^l β_j 1I_{B_j}(ω)
yields
(f_n g_n)(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i}(ω) 1I_{B_j}(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i ∩ B_j}(ω),
and we again obtain a step-function, since A_i ∩ B_j ∈ F. Items (1), (3), and (4) are an exercise.
2.2 Measurable maps
Now we extend the notion of random variables to the notion of measurable
maps, which is necessary in many considerations and even more natural.
Definition 2.2.1 [measurable map] Let (Ω, F) and (M, Σ) be measurable spaces. A map f : Ω → M is called (F, Σ)-measurable, provided that
f^{−1}(B) = {ω ∈ Ω : f(ω) ∈ B} ∈ F for all B ∈ Σ.
The connection to the random variables is given by
Proposition 2.2.2 Let (Ω, F) be a measurable space and f : Ω → R. Then the following assertions are equivalent:
(1) The map f is a random variable.
(2) The map f is (F, B(R))-measurable.
For the proof we need
Lemma 2.2.3 Let (Ω, F) and (M, Σ) be measurable spaces and let f : Ω → M. Assume that Σ_0 ⊆ Σ is a system of subsets such that σ(Σ_0) = Σ. If
f^{−1}(B) ∈ F for all B ∈ Σ_0,
then
f^{−1}(B) ∈ F for all B ∈ Σ.
Proof. Define
A := {B ⊆ M : f^{−1}(B) ∈ F}.
By assumption, Σ_0 ⊆ A. We show that A is a σ-algebra.
(1) f^{−1}(M) = Ω ∈ F implies that M ∈ A.
(2) If B ∈ A, then
f^{−1}(B^c) = {ω : f(ω) ∈ B^c}
= {ω : f(ω) ∉ B}
= Ω \ {ω : f(ω) ∈ B}
= f^{−1}(B)^c ∈ F.
(3) If B_1, B_2, ... ∈ A, then
f^{−1}( ⋃_{i=1}^∞ B_i ) = ⋃_{i=1}^∞ f^{−1}(B_i) ∈ F.
By definition of Σ = σ(Σ_0) this implies that Σ ⊆ A, which implies our lemma.
Proof of Proposition 2.2.2. (2) ⇒ (1) follows from (a, b) ∈ B(R) for a < b, which implies that f^{−1}((a, b)) ∈ F.
(1) ⇒ (2) is a consequence of Lemma 2.2.3 since B(R) = σ((a, b) : −∞ < a < b < ∞).
Example 2.2.4 If f : R → R is continuous, then f is (B(R), B(R))-measurable.
Proof. Since f is continuous we know that f^{−1}((a, b)) is open for all −∞ < a < b < ∞, so that f^{−1}((a, b)) ∈ B(R). Since the open intervals generate B(R) we can apply Lemma 2.2.3.
Now we state some general properties of measurable maps.
Proposition 2.2.5 Let (Ω_1, F_1), (Ω_2, F_2), (Ω_3, F_3) be measurable spaces. Assume that f : Ω_1 → Ω_2 is (F_1, F_2)-measurable and that g : Ω_2 → Ω_3 is (F_2, F_3)-measurable. Then the following is satisfied:
(1) g ∘ f : Ω_1 → Ω_3 defined by
(g ∘ f)(ω_1) := g(f(ω_1))
is (F_1, F_3)-measurable.
(2) Assume that P_1 is a probability measure on F_1 and define
P_2(B_2) := P_1({ω_1 ∈ Ω_1 : f(ω_1) ∈ B_2}).
Then P_2 is a probability measure on F_2.
The proof is an exercise.
Example 2.2.6 We want to simulate the flipping of an (unfair) coin by the random number generator: the random number generator of the computer gives us a number which has (a discrete) uniform distribution on [0, 1]. So we take the probability space ([0, 1], B([0, 1]), λ) and define for p ∈ (0, 1) the random variable
f(ω) := 1I_{[0,p)}(ω).
Then it holds
P_2({1}) := P_1({ω : f(ω) = 1}) = λ([0, p)) = p,
P_2({0}) := P_1({ω : f(ω) = 0}) = λ([p, 1]) = 1 − p.
Assume the random number generator gives out the number x. If we would write a program such that output = "heads" in case x ∈ [0, p) and output = "tails" in case x ∈ [p, 1], output would simulate the flipping of an (unfair) coin, or in other words, output has binomial distribution μ_{1,p}.
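In Python, the program sketched above might look as follows; the standard library's pseudo-random generator stands in for the random number generator of the computer, and the function and variable names are ours:

```python
import random

def flip_unfair_coin(p, rng):
    """One flip of an unfair coin: returns "heads" iff the uniform
    sample x falls into [0, p), mirroring f = 1I_[0,p)."""
    x = rng.random()                 # x is (approximately) uniform on [0, 1)
    return "heads" if x < p else "tails"

rng = random.Random(0)               # fixed seed for reproducibility
p, n = 0.3, 100_000
heads = sum(flip_unfair_coin(p, rng) == "heads" for _ in range(n))
print(heads / n)                     # relative frequency, close to p = 0.3
```

The relative frequency of heads stabilizes near p, in line with the law of large numbers discussed later.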
Definition 2.2.7 [law of a random variable] Let (Ω, F, P) be a probability space and f : Ω → R be a random variable. Then
P_f(B) := P({ω : f(ω) ∈ B})
is called the law or image measure of the random variable f.
The law of a random variable is completely characterized by its distribution
function which we introduce now.
Definition 2.2.8 [distribution-function] Given a random variable f : Ω → R on a probability space (Ω, F, P), the function
F_f(x) := P({ω : f(ω) ≤ x})
is called distribution function of f.
Proposition 2.2.9 [Properties of distribution-functions]
The distribution-function F_f : R → [0, 1] is a right-continuous non-decreasing function such that
lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
Proof. (i) F is non-decreasing: given x_1 < x_2 one has that
{ω : f(ω) ≤ x_1} ⊆ {ω : f(ω) ≤ x_2}
and
F(x_1) = P({ω : f(ω) ≤ x_1}) ≤ P({ω : f(ω) ≤ x_2}) = F(x_2).
(ii) F is right-continuous: let x ∈ R and x_n ↓ x. Then
F(x) = P({ω : f(ω) ≤ x})
= P( ⋂_{n=1}^∞ {ω : f(ω) ≤ x_n} )
= lim_{n→∞} P({ω : f(ω) ≤ x_n})
= lim_{n→∞} F(x_n).
(iii) The properties lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1 are an exercise.
The equivalence of Proposition 2.1.3 can be summarized as follows: f : Ω → R is a random variable if and only if there exist measurable step-functions
f_n = Σ_{k=1}^{N_n} a_k^n 1I_{A_k^n}
with a_k^n ∈ R and A_k^n ∈ F such that
f_n(ω) → f(ω) for all ω ∈ Ω as n → ∞,
which in turn is equivalent to
f^{−1}((a, b)) ∈ F for all −∞ < a < b < ∞.
2.3 Independence
Let us first start with the notion of a family of independent random variables.
Definition 2.3.1 [independence of a family of random variables] Let (Ω, F, P) be a probability space and f_i : Ω → R, i ∈ I, be random variables where I is a non-empty index-set. The family (f_i)_{i∈I} is called independent provided that for all distinct i_1, ..., i_n ∈ I, n = 1, 2, ..., and all B_1, ..., B_n ∈ B(R) one has that
P(f_{i_1} ∈ B_1, ..., f_{i_n} ∈ B_n) = P(f_{i_1} ∈ B_1) ··· P(f_{i_n} ∈ B_n).
In case we have a finite index set I, that means for example I = {1, ..., n}, the definition above is equivalent to
Definition 2.3.2 [independence of a finite family of random variables] Let (Ω, F, P) be a probability space and f_i : Ω → R, i = 1, . . . , n, random variables. The random variables f_1, . . . , f_n are called independent provided that for all B_1, ..., B_n ∈ B(R) one has that
P(f_1 ∈ B_1, ..., f_n ∈ B_n) = P(f_1 ∈ B_1) ··· P(f_n ∈ B_n).
The connection between the independence of random variables and of events
is obvious:
Proposition 2.3.3 Let (Ω, F, P) be a probability space and f_i : Ω → R, i ∈ I, be random variables where I is a non-empty index-set. Then the following assertions are equivalent.
(1) The family (f_i)_{i∈I} is independent.
(2) For all families (B_i)_{i∈I} of Borel sets B_i ∈ B(R) one has that the events
({ω ∈ Ω : f_i(ω) ∈ B_i})_{i∈I}
are independent.
Sometimes we need to group independent random variables. In this respect
the following proposition turns out to be useful. For the following we say
that g : R^n → R is Borel-measurable (or a Borel function) provided that g is (B(R^n), B(R))-measurable.
Proposition 2.3.4 [Grouping of independent random variables] Let f_k : Ω → R, k = 1, 2, 3, ... be independent random variables. Assume Borel functions g_i : R^{n_i} → R for i = 1, 2, ... and n_i ∈ {1, 2, ...}. Then the random variables
g_1(f_1(ω), ..., f_{n_1}(ω)), g_2(f_{n_1+1}(ω), ..., f_{n_1+n_2}(ω)), g_3(f_{n_1+n_2+1}(ω), ..., f_{n_1+n_2+n_3}(ω)), ...
are independent.
The proof is an exercise.
Proposition 2.3.5 [independence and product of laws] Assume that (Ω, F, P) is a probability space and that f, g : Ω → R are random variables with laws P_f and P_g and distribution-functions F_f and F_g, respectively. Then the following assertions are equivalent:
(1) f and g are independent.
(2) P((f, g) ∈ B) = (P_f × P_g)(B) for all B ∈ B(R^2).
(3) P(f ≤ x, g ≤ y) = F_f(x) F_g(y) for all x, y ∈ R.
The proof is an exercise.
Remark 2.3.6 Assume that there are Riemann-integrable functions p_f, p_g : R → [0, ∞) such that
∫_R p_f(x) dx = ∫_R p_g(x) dx = 1,
F_f(x) = ∫_{−∞}^x p_f(y) dy, and F_g(x) = ∫_{−∞}^x p_g(y) dy
for all x ∈ R (one says that the distribution-functions F_f and F_g are absolutely continuous with densities p_f and p_g, respectively). Then the independence of f and g is also equivalent to the representation
F_{(f,g)}(x, y) = ∫_{−∞}^x ∫_{−∞}^y p_f(u) p_g(v) dv du.
In other words: the distribution-function of the random vector (f, g) has a density which is the product of the densities of f and g.
Often one needs the existence of sequences of independent random variables f_1, f_2, ... : Ω → R having a certain distribution. How to construct such sequences? First we let
Ω := R^N = {x = (x_1, x_2, ...) : x_n ∈ R}.
Then we define the projections π_n : R^N → R given by
π_n(x) := x_n,
that means π_n filters out the n-th coordinate. Now we take the smallest σ-algebra such that all these projections are random variables, that means we take
B(R^N) = σ( π_n^{−1}(B) : n = 1, 2, ..., B ∈ B(R) ),
see Proposition 1.1.6. Finally, let P_1, P_2, ... be a sequence of probability measures on B(R). Using Carathéodory's Theorem one obtains a probability measure P on B(R^N) such that (π_n)_{n=1}^∞ is a sequence of independent random variables such that the law of π_n is P_n, that means
P(π_n ∈ B) = P_n(B)
for all B ∈ B(R).
Proof. Take Borel sets B_1, ..., B_n ∈ B(R). Then
P({x : π_1(x) ∈ B_1, ..., π_n(x) ∈ B_n})
= P(B_1 × B_2 × ··· × B_n × R × R × ···)
= P_1(B_1) ··· P_n(B_n)
= ∏_{k=1}^n P(R × ··· × R × B_k × R × ···)
= ∏_{k=1}^n P({x : π_k(x) ∈ B_k}).
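For finite coordinate spaces the product-measure computation in this proof can be checked exactly; the two coordinate laws below are hypothetical weights chosen only for illustration:

```python
from fractions import Fraction as F

# Two finite coordinate laws P1, P2 (hypothetical weights for illustration)
P1 = {0: F(1, 4), 1: F(3, 4)}
P2 = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}

# Product measure on pairs x = (x_1, x_2): P({x}) = P1({x_1}) * P2({x_2})
P = {(x1, x2): p1 * p2 for x1, p1 in P1.items() for x2, p2 in P2.items()}

def prob(event):
    """P of the set of outcomes x for which event(x) is true."""
    return sum(p for x, p in P.items() if event(x))

B1, B2 = {1}, {0, 2}
lhs = prob(lambda x: x[0] in B1 and x[1] in B2)   # P(pi_1 in B1, pi_2 in B2)
rhs = prob(lambda x: x[0] in B1) * prob(lambda x: x[1] in B2)
print(lhs, rhs)   # equal, so the projections behave independently
```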
Chapter 3
Integration
Given a probability space (Ω, F, P) and a random variable f : Ω → R, we define the expectation or integral
Ef = ∫_Ω f dP = ∫_Ω f(ω) dP(ω)
and investigate its basic properties.
3.1 Definition of the expected value
The definition of the integral is done within three steps.
Definition 3.1.1 [step one, f is a step-function] Given a probability space (Ω, F, P) and an F-measurable g : Ω → R with representation
g = Σ_{i=1}^n α_i 1I_{A_i}
where α_i ∈ R and A_i ∈ F, we let
Eg = ∫_Ω g dP = ∫_Ω g(ω) dP(ω) := Σ_{i=1}^n α_i P(A_i).
We have to check that the definition is correct, since it might be that different representations give different expected values Eg. However, this is not the case as shown by
Lemma 3.1.2 Assuming measurable step-functions
g = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{j=1}^m β_j 1I_{B_j},
one has that
Σ_{i=1}^n α_i P(A_i) = Σ_{j=1}^m β_j P(B_j).
Proof. By subtracting in both equations the right-hand side from the left-hand one we only need to show that
Σ_{i=1}^n α_i 1I_{A_i} = 0
implies that
Σ_{i=1}^n α_i P(A_i) = 0.
By taking all possible intersections of the sets A_i and by adding appropriate complements we find a system of sets C_1, ..., C_N ∈ F such that
(a) C_j ∩ C_k = ∅ if j ≠ k,
(b) ⋃_{j=1}^N C_j = Ω,
(c) for all A_i there is a set I_i ⊆ {1, ..., N} such that A_i = ⋃_{j∈I_i} C_j.
Now we get that
0 = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{i=1}^n Σ_{j∈I_i} α_i 1I_{C_j} = Σ_{j=1}^N ( Σ_{i: j∈I_i} α_i ) 1I_{C_j} = Σ_{j=1}^N γ_j 1I_{C_j},
so that γ_j = 0 if C_j ≠ ∅. From this we get that
Σ_{i=1}^n α_i P(A_i) = Σ_{i=1}^n Σ_{j∈I_i} α_i P(C_j) = Σ_{j=1}^N ( Σ_{i: j∈I_i} α_i ) P(C_j) = Σ_{j=1}^N γ_j P(C_j) = 0.
Given step-functions
f = Σ_{i=1}^n α_i 1I_{A_i} and g = Σ_{j=1}^m β_j 1I_{B_j},
one has that
f + g = Σ_{i=1}^n α_i 1I_{A_i} + Σ_{j=1}^m β_j 1I_{B_j}
and
E(f + g) = Σ_{i=1}^n α_i P(A_i) + Σ_{j=1}^m β_j P(B_j) = Ef + Eg.
In the second step the expected value of a non-negative random variable f : Ω → R is defined by
Ef = ∫_Ω f dP = ∫_Ω f(ω) dP(ω) := sup {Eg : 0 ≤ g(ω) ≤ f(ω), g is a measurable step-function}.
Note that in this definition the case Ef = ∞ is allowed. In the last step we will define the expectation for a general random variable. To this end we decompose a random variable f : Ω → R into its positive and negative part
f(ω) = f^+(ω) − f^−(ω)
with
f^+(ω) := max {f(ω), 0} ≥ 0 and f^−(ω) := max {−f(ω), 0} ≥ 0.
Definition 3.1.5 [step three, f is general] Let (Ω, F, P) be a probability space and f : Ω → R be a random variable.
(1) If Ef^+ < ∞ or Ef^− < ∞, then we say that the expected value of f exists and set Ef := Ef^+ − Ef^− ∈ [−∞, ∞].
(2) The random variable f is called integrable provided that
Ef^+ < ∞ and Ef^− < ∞.
(3) If the expected value of f exists and A ∈ F, then
∫_A f dP = ∫_A f(ω) dP(ω) := ∫_Ω f(ω) 1I_A(ω) dP(ω).
The expression Ef is called expectation or expected value of the random variable f.
Remark 3.1.6 The fundamental Lebesgue-integral on the real line can be introduced by the means we have at our disposal so far, as follows: Assume a function f : R → R which is (B(R), B(R))-measurable. Let f_n : (n−1, n] → R be the restriction of f, which is a random variable with respect to B((n−1, n]) = B_{(n−1,n]}. In Section 1.3.4 we have introduced the Lebesgue measure λ = λ_n on (n−1, n]. Assume that f_n is integrable on (n−1, n] for all n and that
Σ_{n=−∞}^∞ ∫_{(n−1,n]} |f(x)| dλ(x) < ∞,
then f : R → R is called integrable with respect to the Lebesgue-measure on the real line and the Lebesgue-integral is defined by
∫_R f(x) dλ(x) := Σ_{n=−∞}^∞ ∫_{(n−1,n]} f(x) dλ(x).
Now we go the opposite way: Given a Borel set I and a map f : I → R which is (B_I, B(R))-measurable, we can extend f to a (B(R), B(R))-measurable function f̄ by f̄(x) := f(x) if x ∈ I and f̄(x) := 0 if x ∉ I. If f̄ is Lebesgue-integrable, then we define
∫_I f(x) dλ(x) := ∫_R f̄(x) dλ(x).
Example 3.1.7 A basic example for our integration is as follows: Let Ω = {ω_1, ω_2, ...}, F := 2^Ω, and P({ω_n}) = q_n ∈ [0, 1] with Σ_{n=1}^∞ q_n = 1. Given f : Ω → R we get that f is integrable if and only if
Σ_{n=1}^∞ |f(ω_n)| q_n < ∞,
and the expected value exists if either
Σ_{{n : f(ω_n) ≥ 0}} f(ω_n) q_n < ∞ or Σ_{{n : f(ω_n) ≤ 0}} (−f(ω_n)) q_n < ∞.
If the expected value exists, then it computes to
Ef = Σ_{n=1}^∞ f(ω_n) q_n ∈ [−∞, ∞].
A simple example for the expectation is the expected value while rolling a die:
Example 3.1.8 Assume that Ω := {1, 2, . . . , 6}, F := 2^Ω, and P({k}) := 1/6, which models rolling a die. If we define f(k) = k, i.e.
f(k) := Σ_{i=1}^6 i 1I_{{i}}(k),
then f is a measurable step-function and it follows that
Ef = Σ_{i=1}^6 i P({i}) = (1 + 2 + ··· + 6)/6 = 3.5.
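The die computation can be reproduced with exact rational arithmetic (a small illustrative sketch):

```python
from fractions import Fraction

# Omega = {1,...,6}, P({k}) = 1/6, and f(k) = k written as a step-function
# f = sum_i i * 1I_{i}; then Ef = sum_i i * P({i}).
P = {k: Fraction(1, 6) for k in range(1, 7)}
Ef = sum(i * p for i, p in P.items())
print(Ef)   # 7/2, i.e. 3.5
```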
Besides the expected value, the variance is often of interest.
Definition 3.1.9 [variance] Let (Ω, F, P) be a probability space and f : Ω → R be an integrable random variable. Then
var(f) = σ_f^2 = E[f − Ef]^2 ∈ [0, ∞]
is called variance.
Let us summarize some simple properties:
Proposition 3.1.10 (1) If f is integrable and α, c ∈ R, then
var(αf − c) = α^2 var(f).
(2) If Ef^2 < ∞, then var(f) = Ef^2 − (Ef)^2 < ∞.
Proof. (1) follows from
var(αf − c) = E[(αf − c) − E(αf − c)]^2 = E[α(f − Ef)]^2 = α^2 var(f).
(2) First we remark that E|f| ≤ (Ef^2)^{1/2}, as we shall see later by Hölder's inequality (Corollary 3.6.6); that means any square integrable random variable is integrable. Then we simply get that
var(f) = E[f − Ef]^2 = Ef^2 − 2E(f Ef) + (Ef)^2 = Ef^2 − 2(Ef)^2 + (Ef)^2 = Ef^2 − (Ef)^2.
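Both expressions for the variance can be compared exactly on the die space of Example 3.1.8 (an illustrative check):

```python
from fractions import Fraction

P = {k: Fraction(1, 6) for k in range(1, 7)}       # rolling a die
E = lambda g: sum(g(k) * p for k, p in P.items())  # expectation on this space

Ef = E(lambda k: k)
var_def = E(lambda k: (k - Ef) ** 2)     # definition: E[f - Ef]^2
var_alt = E(lambda k: k ** 2) - Ef ** 2  # Proposition 3.1.10 (2): Ef^2 - (Ef)^2
print(var_def, var_alt)                  # both equal 35/12
```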
Since
{ω : f^+(ω) ≠ 0} ∩ {ω : f^−(ω) ≠ 0} = ∅
and since both sets are measurable, it follows that |f| = f^+ + f^− is integrable if and only if f^+ and f^− are integrable, and in this case
|Ef| ≤ Ef^+ + Ef^− = E|f|.
(3) If f = 0 a.s., then f^+ = 0 a.s. and f^− = 0 a.s. If
g = Σ_{k=1}^n a_k 1I_{A_k}, g(ω) ≥ 0, and g = 0 a.s., then a_k ≠ 0 implies P(A_k) = 0.
Hence
Ef = sup {Eg : 0 ≤ g ≤ f, g is a measurable step-function} = 0,
since 0 ≤ g ≤ f implies g = 0 a.s. Properties (4) and (5) are exercises.
The next lemma is useful later on. In this lemma we use, as an approximation
for f, the so-called staircase-function. This idea was already exploited in the
proof of Proposition 2.1.3.
Lemma 3.2.2 Let (Ω, F, P) be a probability space and f : Ω → R be a random variable.
(1) Then there exists a sequence of measurable step-functions f_n : Ω → R such that, for all n = 1, 2, . . . and for all ω ∈ Ω,
|f_n(ω)| ≤ |f_{n+1}(ω)| ≤ |f(ω)| and f(ω) = lim_{n→∞} f_n(ω).
If f(ω) ≥ 0 for all ω ∈ Ω, then one can arrange f_n(ω) ≥ 0 for all ω ∈ Ω.
(2) If f ≥ 0 and if (f_n)_{n=1}^∞ is a sequence of measurable step-functions with 0 ≤ f_n(ω) ↑ f(ω) for all ω ∈ Ω as n → ∞, then
Ef = lim_{n→∞} Ef_n.
Proof. (1) It is easy to verify that the staircase-functions
f_n(ω) := Σ_{k=0}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω) + Σ_{k=−4^n}^{−1} ((k+1)/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)
fulfill all the conditions.
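Evaluated pointwise, the staircase-functions are easy to implement; the helper below is a sketch of the defining formula (the function name is ours):

```python
import math

def staircase(t, n):
    """Value f_n(omega) of the staircase-function when f(omega) = t:
    k/2^n for t >= 0 and (k+1)/2^n for t < 0, where k/2^n <= t < (k+1)/2^n,
    and 0 outside [-2^n, 2^n) (the range covered by k = -4^n, ..., 4^n - 1)."""
    if not (-2 ** n <= t < 2 ** n):
        return 0.0
    k = math.floor(t * 2 ** n)
    return k / 2 ** n if t >= 0 else (k + 1) / 2 ** n

approx = [staircase(math.pi, n) for n in range(1, 7)]
print(approx)   # [0.0, 3.0, 3.125, 3.125, 3.125, 3.140625], increasing to pi
```

Note that |f_n| grows monotonically towards |f|, e.g. staircase(-0.7, 3) gives -0.625.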
(2) Letting
f_n^0(ω) := Σ_{k=0}^{4^n − 1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)
we get 0 ≤ f_n^0(ω) ↑ f(ω) for all ω ∈ Ω. On the other hand, by the definition of the expectation there exists a sequence 0 ≤ g_n(ω) ≤ f(ω) of measurable step-functions such that Eg_n → Ef. Hence
h_n := max {f_n^0, g_1, . . . , g_n}
is a measurable step-function with 0 ≤ g_n(ω) ≤ h_n(ω) ≤ f(ω), so that Eg_n ≤ Eh_n ≤ Ef, and lim_n Eg_n = lim_n Eh_n = Ef.
Now we will show that for every sequence (f_k)_{k=1}^∞ of measurable step-functions with 0 ≤ f_k ↑ f it holds lim_k Ef_k = Ef. Consider
d_{k,n} := f_k ∧ h_n.
Clearly, d_{k,n} ↑ f_k as n → ∞ and d_{k,n} ↑ h_n as k → ∞. Let
z_{k,n} := arctan Ed_{k,n},
so that 0 ≤ z_{k,n} < π/2. Since (z_{k,n})_{k=1}^∞ is increasing for fixed n and (z_{k,n})_{n=1}^∞ is increasing for fixed k, one quickly checks that
lim_k lim_n z_{k,n} = lim_n lim_k z_{k,n}.
Hence
Ef = lim_n Eh_n = lim_n lim_k Ed_{k,n} = lim_k lim_n Ed_{k,n} = lim_k Ef_k,
where we have used the following fact: if 0 ≤ φ_n(ω) ↑ φ(ω) for step-functions φ_n and φ, then
lim_n Eφ_n = Eφ.
To check this, it is sufficient to assume that φ(ω) = 1I_A(ω) for some A ∈ F. Let ε ∈ (0, 1) and
B_n := {ω ∈ A : 1 − ε ≤ φ_n(ω)}.
Then
(1 − ε) 1I_{B_n}(ω) ≤ φ_n(ω) ≤ 1I_A(ω).
Since B_n ⊆ B_{n+1} and ⋃_{n=1}^∞ B_n = A, we get P(B_n) ↑ P(A), so that
(1 − ε) P(A) ≤ lim_n Eφ_n.
Since this is true for all ε > 0 we get
Eφ = P(A) ≤ lim_n Eφ_n ≤ Eφ
and are done.
Now we continue with some basic properties of the expectation.
Proposition 3.2.3 [properties of the expectation] Let (Ω, F, P) be a probability space and f, g : Ω → R be random variables such that Ef and Eg exist.
(1) If f ≥ 0 and g ≥ 0, then E(f + g) = Ef + Eg.
(2) If c ∈ R, then E(cf) exists and E(cf) = cEf.
(3) If Ef^+ + Eg^+ < ∞ or Ef^− + Eg^− < ∞, then E(f + g) exists and E(f + g) = Ef + Eg.
(4) If f ≤ g, then Ef ≤ Eg.
(5) If f and g are integrable and a, b ∈ R, then af + bg is integrable and E(af + bg) = aEf + bEg.
Proof. (3) Assume Ef^+ + Eg^+ < ∞. Since (f + g)^+ ≤ f^+ + g^+ one gets that E(f + g)^+ < ∞. Moreover, one quickly checks that
(f + g)^+ + f^− + g^− = f^+ + g^+ + (f + g)^−,
so that
E[(f + g)^+ + f^− + g^−] = E[f^+ + g^+ + (f + g)^−],
which implies that E(f + g) = Ef + Eg because of (1).
(4) If Ef^− = ∞ or Eg^+ = ∞, then Ef = −∞ or Eg = ∞ so that nothing is to prove. Hence assume that Ef^− < ∞ and Eg^+ < ∞. The inequality f ≤ g gives 0 ≤ f^+ ≤ g^+ and 0 ≤ g^− ≤ f^−, so that
Ef = Ef^+ − Ef^− ≤ Eg^+ − Eg^− = Eg.
(5) Since (af + bg)^+ ≤ |a||f| + |b||g| and (af + bg)^− ≤ |a||f| + |b||g|, the random variable af + bg is integrable, and the equality of the expected values follows from (2) and (3).
Since h_n and f − g are integrable, Proposition 3.2.3 (1) implies that Eh_n = Ef_n − Eg and Eh = Ef − Eg, so that we are done.
In Definition 1.2.7 we defined limsup_n a_n for a sequence (a_n)_{n=1}^∞ ⊆ R. Naturally, one can extend this definition to a sequence (f_n)_{n=1}^∞ of random variables: For any ω ∈ Ω, limsup_n f_n(ω) and liminf_n f_n(ω) exist. Hence
limsup_n f_n and liminf_n f_n
are random variables.
Proposition 3.2.6 [Lemma of Fatou] Let (Ω, F, P) be a probability space and g, f_1, f_2, ... : Ω → R be random variables with |f_n(ω)| ≤ g(ω) a.s. Assume that g is integrable. Then limsup f_n and liminf f_n are integrable and one has that
E liminf_n f_n ≤ liminf_n Ef_n ≤ limsup_n Ef_n ≤ E limsup_n f_n.
Proof. We only prove the first inequality. The second one follows from the definition of limsup and liminf, the third one can be proved like the first one. So we let
Z_k := inf_{n≥k} f_n,
so that Z_k ↑ liminf_n f_n and, a.s.,
|Z_k| ≤ g and |liminf_n f_n| ≤ g.
Applying monotone convergence in the form of Corollary 3.2.5 gives that
E liminf_n f_n = lim_k EZ_k = lim_k ( E inf_{n≥k} f_n ) ≤ lim_k ( inf_{n≥k} Ef_n ) = liminf_n Ef_n.
For independent random variables f_1, ..., f_n one obtains
E( Σ_{i=1}^n (f_i − Ef_i) )^2 = E Σ_{i,j=1}^n (f_i − Ef_i)(f_j − Ef_j)
= Σ_{i,j=1}^n E((f_i − Ef_i)(f_j − Ef_j))
= Σ_{i=1}^n E(f_i − Ef_i)^2 + Σ_{i≠j} E((f_i − Ef_i)(f_j − Ef_j))
= Σ_{i=1}^n var(f_i) + Σ_{i≠j} E(f_i − Ef_i) E(f_j − Ef_j)
= Σ_{i=1}^n var(f_i),
because E(f_i − Ef_i) = Ef_i − Ef_i = 0.
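This additivity of the variance for independent summands can be verified exactly on a small product space; two independent fair-coin variables serve as illustration:

```python
from fractions import Fraction as F
from itertools import product

# Product space of two independent fair coins: P({(x, y)}) = 1/4
P = {w: F(1, 4) for w in product((0, 1), repeat=2)}
E = lambda g: sum(g(w) * p for w, p in P.items())
var = lambda g: E(lambda w: (g(w) - E(g)) ** 2)

f1 = lambda w: w[0]
f2 = lambda w: w[1]
s = lambda w: w[0] + w[1]
print(var(s), var(f1) + var(f2))   # both 1/2
```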
3.3 Connections to the Riemann-integral
In two typical situations we formulate (without proof) how our expected
value connects to the Riemann-integral. For this purpose we use the Lebesgue
measure dened in Section 1.3.4.
Proposition 3.3.1 Let f : [0, 1] → R be a continuous function. Then
∫_0^1 f(x) dx = Ef
with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space ([0, 1], B([0, 1]), λ), where λ is the Lebesgue measure, on the right-hand side.
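Proposition 3.3.1 has a sampling interpretation: averaging f over uniform samples from [0, 1] approximates the common value of the Riemann integral and the expectation. A Monte Carlo sketch, with the arbitrarily chosen f(x) = x²:

```python
import random

# E f = int_0^1 x^2 dx = 1/3 on ([0,1], B([0,1]), lambda);
# the sample mean of f over uniform samples approximates this value.
rng = random.Random(1)
n = 200_000
estimate = sum(rng.random() ** 2 for _ in range(n)) / n
print(estimate)   # close to 1/3
```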
Now we consider a continuous function p : R → [0, ∞) such that
∫_{−∞}^∞ p(x) dx = 1
and define a measure P on B(R) by
P((a_1, b_1] ∪ ··· ∪ (a_n, b_n]) := Σ_{i=1}^n ∫_{a_i}^{b_i} p(x) dx
for −∞ ≤ a_1 ≤ b_1 ≤ ··· ≤ a_n ≤ b_n ≤ ∞ (again with the convention that (a, ∞] = (a, ∞)) via Carathéodory's Theorem (Proposition 1.2.17). The function p is called density of the measure P.
Proposition 3.3.2 Let f : R → R be a continuous function such that
∫_{−∞}^∞ |f(x)| p(x) dx < ∞.
Then
∫_{−∞}^∞ f(x) p(x) dx = Ef
with the Riemann-integral on the left-hand side and the expectation of the random variable f with respect to the probability space (R, B(R), P) on the right-hand side.
Let us consider two examples indicating the difference between the Riemann-integral and our expected value.
Example 3.3.3 We give the standard example of a function which has an expected value, but which is not Riemann-integrable. Let
f(x) := 1 if x ∈ [0, 1] is irrational, and f(x) := 0 if x ∈ [0, 1] is rational.
Then f is not Riemann-integrable, but Lebesgue-integrable with Ef = 1 if we use the probability space ([0, 1], B([0, 1]), λ).
Example 3.3.4 The expression
lim_{t→∞} ∫_0^t (sin x)/x dx = π/2
is defined as limit in the Riemann sense although
∫_0^∞ ((sin x)/x)^+ dx = ∞ and ∫_0^∞ ((sin x)/x)^− dx = ∞.
Transporting this into a probabilistic setting we take the exponential distribution with parameter λ > 0 from Section 1.3.6. Let f : R → R be given by f(x) = 0 if x ≤ 0 and f(x) := (sin x)/(λx) e^{λx} if x > 0, and recall that the exponential distribution μ_λ has the density p_λ(x) = 1I_{[0,∞)}(x) λ e^{−λx}. The above yields that
lim_{t→∞} ∫_0^t f(x) p_λ(x) dx = π/2,
but
∫_R f(x)^+ dμ_λ(x) = ∫_R f(x)^− dμ_λ(x) = ∞.
Hence the expected value of f does not exist, but the Riemann-integral gives a way to define a value, which makes sense. The point of this example is that the Riemann-integral takes more information into account than the rather abstract expected value.
3.4 Change of variables in the expected value
We want to prove a change of variable formula for the integrals ∫_Ω f dP. In many cases it is only by this formula that expected values can be computed explicitly.
Proposition 3.4.1 [Change of variables] Let (Ω, F, P) be a probability space, (E, E) be a measurable space, φ : Ω → E be a measurable map, and g : E → R be a random variable. Assume that P_φ is the image measure of P with respect to φ, that means
P_φ(A) := P({ω : φ(ω) ∈ A}) = P(φ^{−1}(A)) for all A ∈ E.
Then
∫_A g(η) dP_φ(η) = ∫_{φ^{−1}(A)} g(φ(ω)) dP(ω)
for all A ∈ E, in the sense that if one integral exists, the other exists as well, and their values are equal.
(ii) Since, for f() := g(()) one has that f
+
= g
+
and f
= g
it is
sucient to consider the positive part of g and its negative part separately.
In other words, we can assume that g() 0 for all E.
(iii) Assume now a sequence of measurable step-function 0 g
n
() g()
for all E which does exist according to Lemma 3.2.2 so that g
n
(())
g(()) for all as well. If we can show that
_
E
g
n
()dP
() =
_
g
n
(())dP()
then we are done. By additivity it is enough to check g
n
() = 1I
B
() for some
B E (if this is true for this case, then one can multiply by real numbers
and can take sums and the equality remains true). But now we get
_
E
g
n
()dP
() = P
(B) = P(
1
(B)) =
_
1I
1
(B)
()dP()
=
_
1I
B
(())dP() =
_
g
n
(())dP().
is the law of .
Definition 3.4.2 [Moments] Assume that n ∈ {1, 2, ...}.
(1) For a random variable f : Ω → R the expected value E|f|^n is called n-th absolute moment of f. If Ef^n exists, then Ef^n is called n-th moment of f.
(2) For a probability measure μ on (R, B(R)) the expected value
∫_R |x|^n dμ(x)
is called n-th absolute moment of μ. If ∫_R x^n dμ(x) exists, then ∫_R x^n dμ(x) is called n-th moment of μ.
Corollary 3.4.3 Let (Ω, F, P) be a probability space and f : Ω → R be a random variable with law P_f. Then, for all n = 1, 2, ...,
E|f|^n = ∫_R |x|^n dP_f(x) and Ef^n = ∫_R x^n dP_f(x),
where the latter equality has to be understood as follows: if one side exists, then the other exists as well and they coincide. If the law P_f has a density p in the sense of Proposition 3.3.2, then ∫_R |x|^n dP_f(x) can be replaced by ∫_R |x|^n p(x) dx and ∫_R x^n dP_f(x) by ∫_R x^n p(x) dx.
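As a numeric illustration of computing moments through a density, take the exponential density p(x) = λe^{−λx} on [0, ∞) (our choice of example), for which the n-th moment is n!/λ^n:

```python
import math

lam = 2.0

def moment(n, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of int_0^upper x^n * lam*e^(-lam*x) dx;
    the tail beyond `upper` is negligible here."""
    h = upper / steps
    return sum(((i + 0.5) * h) ** n * lam * math.exp(-lam * (i + 0.5) * h) * h
               for i in range(steps))

m1, m2 = moment(1), moment(2)
print(m1, m2)   # approx 1/lam = 0.5 and 2/lam^2 = 0.5
```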
3.5 Fubini's Theorem
In this section we consider iterated integrals, as they appear very often in applications, and show in Fubini's² Theorem that integrals with respect to product measures can be written as iterated integrals and that one can change the order of integration in these iterated integrals. In many cases this provides an appropriate tool for the computation of integrals. Before we start with Fubini's Theorem we need some preparations. First we recall the notion of a vector space.
Definition 3.5.1 [vector space] A set L equipped with operations + : L × L → L and · : R × L → L is called vector space over R if the following conditions are satisfied:
(1) x + y = y + x for all x, y ∈ L.
(2) x + (y + z) = (x + y) + z for all x, y, z ∈ L.
(3) There exists a 0 ∈ L such that x + 0 = x for all x ∈ L.
(4) For all x ∈ L there exists a −x such that x + (−x) = 0.
(5) 1 x = x for all x ∈ L.
(6) α(βx) = (αβ)x for all α, β ∈ R and x ∈ L.
(7) (α + β)x = αx + βx for all α, β ∈ R and x ∈ L.
(8) α(x + y) = αx + αy for all α ∈ R and x, y ∈ L.
² Guido Fubini, 19/01/1879 (Venice, Italy) - 06/06/1943 (New York, USA).
Usually one uses the notation x − y := x + (−y) and −x + y := (−x) + y etc.
Now we state the Monotone Class Theorem. It is a powerful tool by which,
for example, measurability assertions can be proved.
Proposition 3.5.2 [Monotone Class Theorem] Let H be a class of bounded functions from Ω into R satisfying the following conditions:
(1) H is a vector space over R where the natural point-wise operations + and · are used.
(2) 1I_Ω ∈ H.
(3) If f_n ∈ H, f_n ≥ 0, and f_n ↑ f, where f is bounded on Ω, then f ∈ H.
Then one has the following: if H contains the indicator function of every set from some π-system I of subsets of Ω, then H contains every bounded σ(I)-measurable function on Ω.
Proof. See for example [5] (Theorem 3.14).
For the following it is convenient to allow that the random variables may take infinite values.
Definition 3.5.3 [extended random variable] Let (Ω, F) be a measurable space. A function f : Ω → R ∪ {−∞, ∞} is called extended random variable iff
f^{−1}(B) := {ω : f(ω) ∈ B} ∈ F for all B ∈ B(R) or B = {−∞} or B = {∞}.
If we have a non-negative extended random variable, we let (for example)
∫_Ω f dP = lim_{N→∞} ∫_Ω [f ∧ N] dP.
For the following, we recall that the product space (Ω_1 × Ω_2, F_1 ⊗ F_2, P_1 × P_2) of the two probability spaces (Ω_1, F_1, P_1) and (Ω_2, F_2, P_2) was defined in Definition 1.2.19.
Proposition 3.5.4 [Fubini's Theorem for non-negative functions] Let f : Ω_1 × Ω_2 → R be a non-negative F_1 ⊗ F_2-measurable function such that
∫_{Ω_1 × Ω_2} f(ω_1, ω_2) d(P_1 × P_2)(ω_1, ω_2) < ∞. (3.1)
Then one has the following:
(1) The functions ω_1 ↦ f(ω_1, ω_2^0) and ω_2 ↦ f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i.
(2) The functions
ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) and ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
are extended F_1-measurable and F_2-measurable, respectively, random variables.
(3) One has that
∫_{Ω_1 × Ω_2} f(ω_1, ω_2) d(P_1 × P_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
= ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2).
It should be noted that item (3) together with Formula (3.1) automatically implies that
P_2({ω_2 : ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) = ∞}) = 0
and
P_1({ω_1 : ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) = ∞}) = 0.
Proof of Proposition 3.5.4.
(i) First we remark it is sufficient to prove the assertions for
f_N(ω_1, ω_2) := min {f(ω_1, ω_2), N},
which is bounded. The statements (1), (2), and (3) can be obtained via N → ∞ if we use Proposition 2.1.4 to get the necessary measurabilities (which also works for our extended random variables) and the monotone convergence formulated in Proposition 3.2.4 to get to the values of the integrals. Hence we can assume for the following that sup_{ω_1, ω_2} f(ω_1, ω_2) < ∞.
(ii) We want to apply the Monotone Class Theorem Proposition 3.5.2. Let H be the class of bounded F_1 ⊗ F_2-measurable functions f : Ω_1 × Ω_2 → R such that
(a) the functions ω_1 ↦ f(ω_1, ω_2^0) and ω_2 ↦ f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i,
(b) the functions
ω_1 ↦ ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) and ω_2 ↦ ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
are F_1-measurable and F_2-measurable, respectively,
(c) one has that
∫_{Ω_1 × Ω_2} f d(P_1 × P_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
= ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2).
tions (1), (2), and (3) of Proposition 3.5.2. As -system I we take the system
of all F = AB with A F
1
and B F
2
. Letting f(
1
,
2
) = 1I
A
(
1
)1I
B
(
2
)
we easily can check that f H. For instance, property (c) follows from
_
2
f(
1
,
2
)d(P
1
P
2
) = (P
1
P
2
)(A B) = P
1
(A)P
2
(B)
and, for example,
_
1
__
2
f(
1
,
2
)dP
2
(
2
)
_
dP
1
(
1
) =
_
1
1I
A
(
1
)P
2
(B)dP
1
(
1
)
= P
1
(A)P
2
(B).
Applying the Monotone Class Theorem Proposition 3.5.2 gives that H con-
tains all bounded functions f :
1
2
R measurable with respect F
1
F
2
.
Hence we are done.
Now we state Fubini's Theorem for general random variables f : Ω_1 × Ω_2 → R.
Proposition 3.5.5 [Fubini's Theorem] Let f : Ω_1 × Ω_2 → R be an F_1 ⊗ F_2-measurable function such that
∫_{Ω_1 × Ω_2} |f(ω_1, ω_2)| d(P_1 × P_2)(ω_1, ω_2) < ∞. (3.2)
Then the following holds:
(1) The functions ω_1 ↦ f(ω_1, ω_2^0) and ω_2 ↦ f(ω_1^0, ω_2) are F_1-measurable and F_2-measurable, respectively, for all ω_i^0 ∈ Ω_i.
(2) There are M_i ∈ F_i with P_i(M_i) = 1 such that the integrals
∫_{Ω_1} f(ω_1, ω_2^0) dP_1(ω_1) and ∫_{Ω_2} f(ω_1^0, ω_2) dP_2(ω_2)
exist and are finite for all ω_i^0 ∈ M_i.
(3) The maps
ω_1 ↦ 1I_{M_1}(ω_1) ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)
and
ω_2 ↦ 1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
are F_1-measurable and F_2-measurable, respectively, random variables.
(4) One has that
∫_{Ω_1 × Ω_2} f(ω_1, ω_2) d(P_1 × P_2)
= ∫_{Ω_1} ( 1I_{M_1}(ω_1) ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
= ∫_{Ω_2} ( 1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2).
Remark 3.5.6 (1) Our understanding is that writing, for example, an expression like
1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)
we only consider and compute the integral for ω_2 ∈ M_2.
(2) The expressions in (3.1) and (3.2) can be replaced by
∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1) < ∞,
and the same expression with |f(ω_1, ω_2)| instead of f(ω_1, ω_2), respectively.
Proof of Proposition 3.5.5. The proposition follows by decomposing f = f^+ − f^− and applying Proposition 3.5.4 to f^+ and f^−.
As an example we now compute ∫_R e^{−x²} dx by Fubini's Theorem.
Example 3.5.7 Let f : R × R → R be a non-negative continuous function. Fubini's Theorem applied to the uniform distribution on [−N, N], N ∈ {1, 2, ...}, gives that
∫_{−N}^N ( ∫_{−N}^N f(x, y) dλ(y)/(2N) ) dλ(x)/(2N) = ∫_{[−N,N]×[−N,N]} f(x, y) d(λ × λ)(x, y)/(2N)^2,
where λ is the Lebesgue measure. Letting f(x, y) := e^{−(x²+y²)}, the above yields that
∫_{−N}^N ( ∫_{−N}^N e^{−x²} e^{−y²} dλ(y) ) dλ(x) = ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ × λ)(x, y).
For the left-hand side we get
lim_{N→∞} ∫_{−N}^N ( ∫_{−N}^N e^{−x²} e^{−y²} dλ(y) ) dλ(x)
= lim_{N→∞} ∫_{−N}^N e^{−x²} ( ∫_{−N}^N e^{−y²} dλ(y) ) dλ(x)
= ( lim_{N→∞} ∫_{−N}^N e^{−x²} dλ(x) )^2
= ( ∫_R e^{−x²} dλ(x) )^2.
For the right-hand side we get
lim_{N→∞} ∫_{[−N,N]×[−N,N]} e^{−(x²+y²)} d(λ × λ)(x, y)
= lim_{R→∞} ∫_{x²+y² ≤ R²} e^{−(x²+y²)} d(λ × λ)(x, y)
= lim_{R→∞} ∫_0^R ∫_0^{2π} e^{−r²} r dφ dr
= lim_{R→∞} π (1 − e^{−R²})
= π,
where we have used polar coordinates. Comparing both sides gives
∫_R e^{−x²} dλ(x) = √π.
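The value √π can be checked numerically with a simple midpoint rule; truncating the domain to [−8, 8] leaves a negligible tail (a sketch):

```python
import math

steps, a = 100_000, 8.0
h = 2 * a / steps
# midpoint rule for int_{-a}^{a} exp(-x^2) dx
integral = sum(math.exp(-(-a + (i + 0.5) * h) ** 2) * h for i in range(steps))
print(integral, math.sqrt(math.pi))   # both approx 1.77245385
```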
As a corollary we show that the definition of the Gaussian measure in Section
1.3.5 was correct.
Proposition 3.5.8 For σ > 0 and m ∈ R let
p_{m,σ²}(x) := (1/√(2πσ²)) e^{−(x−m)²/(2σ²)}.
Then,
∫_R p_{m,σ²}(x) dx = 1, ∫_R x p_{m,σ²}(x) dx = m, and ∫_R (x − m)² p_{m,σ²}(x) dx = σ². (3.3)
In other words: if a random variable f : Ω → R has as law the normal distribution N_{m,σ²}, then
Ef = m and E(f − Ef)² = σ². (3.4)
Proof. By the change of variable x ↦ m + σx it is sufficient to show the statements for m = 0 and σ = 1. Firstly, by putting x = z/√2 one gets
1 = (1/√π) ∫_{−∞}^∞ e^{−x²} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz,
where we have used Example 3.5.7, so that ∫_R p_{0,1}(x) dx = 1. Secondly,
∫_R x p_{0,1}(x) dx = 0
follows from the symmetry of the density, p_{0,1}(x) = p_{0,1}(−x). Finally, by (x exp(−x²/2))' = exp(−x²/2) − x² exp(−x²/2) one can also compute that
(1/√(2π)) ∫_{−∞}^∞ x² e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^∞ e^{−x²/2} dx = 1.
Proof. We simply have
λ P({ω : f(ω) ≥ λ}) = E λ 1I_{{f ≥ λ}} ≤ E f 1I_{{f ≥ λ}} ≤ Ef.
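The bound λP(f ≥ λ) ≤ Ef used above can be checked exactly on a finite probability space; the weights below are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# values of a non-negative random variable -> their probabilities
law = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
Ef = sum(v * p for v, p in law.items())            # = 5/4
lam = 2
tail = sum(p for v, p in law.items() if v >= lam)  # P(f >= 2) = 1/4
print(lam * tail, Ef)   # 1/2 <= 5/4
```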
Example 3.6.4 (1) The function g(x) := |x| is convex so that, for any integrable f,
|Ef| ≤ E|f|.
(2) For 1 ≤ p < ∞ the function g(x) := |x|^p is convex, so that Jensen's inequality applied to |f| gives that
(E|f|)^p ≤ E|f|^p.
For the second case in the example above there is another way we can go. It uses the famous Hölder-inequality.
Proposition 3.6.5 [Hölder's inequality]⁵ Assume a probability space (Ω, F, P) and random variables f, g : Ω → R. If 1 < p, q < ∞ with 1/p + 1/q = 1, then
E|fg| ≤ (E|f|^p)^{1/p} (E|g|^q)^{1/q}.
Proof. We can assume that E|f|
p
> 0 and E|g|
q
> 0. For example, assuming
E|f|
p
= 0 would imply |f|
p
= 0 a.s. according to Proposition 3.2.1 so that
fg = 0 a.s. and E|fg| = 0. Hence we may set
f :=
f
(E|f|
p
)
1
p
and g :=
g
(E|g|
q
)
1
q
.
We notice that
x
a
y
b
ax + by
for x, y 0 and positive a, b with a+b = 1, which follows from the concavity
of the logarithm (we can assume for a moment that x, y > 0)
ln(ax + by) a ln x + b ln y = ln x
a
+ ln y
b
= ln x
a
y
b
.
5
Otto Ludwig Holder, 22/12/1859 (Stuttgart, Germany) - 29/08/1937 (Leipzig, Ger-
many).
Setting $x := |\tilde{f}|^p$, $y := |\tilde{g}|^q$, $a := \frac{1}{p}$, and $b := \frac{1}{q}$, we get
$$ |\tilde{f}\tilde{g}| = x^a y^b \le ax + by = \frac{1}{p}|\tilde{f}|^p + \frac{1}{q}|\tilde{g}|^q $$
and
$$ E|\tilde{f}\tilde{g}| \le \frac{1}{p}E|\tilde{f}|^p + \frac{1}{q}E|\tilde{g}|^q = \frac{1}{p} + \frac{1}{q} = 1. $$
On the other hand,
$$ E|\tilde{f}\tilde{g}| = \frac{E|fg|}{(E|f|^p)^{\frac{1}{p}} (E|g|^q)^{\frac{1}{q}}}, $$
so that we are done. $\Box$
Corollary 3.6.6 For $0 < p < q < \infty$ one has that $(E|f|^p)^{\frac{1}{p}} \le (E|f|^q)^{\frac{1}{q}}$.

The proof is an exercise.
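Corollary 3.6.6 says that the moment norms $(E|f|^p)^{1/p}$ are non-decreasing in $p$. The inequality holds in particular for the empirical law of any finite sample, which gives a quick numerical illustration; the sample and the helper name `norm_p` are our own choices.

```python
import random

random.seed(4)
xs = [random.uniform(-2.0, 2.0) for _ in range(1000)]

def norm_p(p):
    # (E|f|^p)^(1/p) for f carrying the empirical (uniform-weight) law of xs
    return (sum(abs(x) ** p for x in xs) / len(xs)) ** (1.0 / p)

ps = [0.5, 1.0, 2.0, 4.0, 8.0]
values = [norm_p(p) for p in ps]
print([round(v, 4) for v in values])  # non-decreasing in p
```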
Corollary 3.6.7 [Hölder's inequality for sequences] Let $(a_n)_{n=1}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ be sequences of real numbers. If $1 < p, q < \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$, then
$$ \sum_{n=1}^{\infty} |a_n b_n| \le \left( \sum_{n=1}^{\infty} |a_n|^p \right)^{\frac{1}{p}} \left( \sum_{n=1}^{\infty} |b_n|^q \right)^{\frac{1}{q}}. $$
Proof. It is sufficient to prove the inequality for finite sequences $(b_n)_{n=1}^{N}$, since by letting $N \to \infty$ we get the desired inequality for infinite sequences. Let $\Omega := \{1, ..., N\}$, $\mathcal{F} := 2^{\Omega}$, and $P(\{n\}) := \frac{1}{N}$; defining $f(n) := a_n$ and $g(n) := b_n$ we get
$$ \frac{1}{N} \sum_{n=1}^{N} |a_n b_n| \le \left( \frac{1}{N} \sum_{n=1}^{N} |a_n|^p \right)^{\frac{1}{p}} \left( \frac{1}{N} \sum_{n=1}^{N} |b_n|^q \right)^{\frac{1}{q}} $$
from Proposition 3.6.5. Multiplying by $N$ and letting $N \to \infty$ gives our assertion. $\Box$
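The sequence version of Hölder's inequality is easy to test on random data. The sketch below (our own, with hypothetical variable names) checks one instance with the conjugate pair $p = 3$, $q = 3/2$:

```python
import random

random.seed(0)
a = [random.uniform(-1.0, 1.0) for _ in range(50)]
b = [random.uniform(-1.0, 1.0) for _ in range(50)]
p, q = 3.0, 1.5  # conjugate exponents: 1/p + 1/q = 1

lhs = sum(abs(x * y) for x, y in zip(a, b))
rhs = (sum(abs(x) ** p for x in a) ** (1.0 / p)
       * sum(abs(y) ** q for y in b) ** (1.0 / q))
print(lhs, "<=", rhs)
```

A single random instance proves nothing, of course, but rerunning with other seeds and exponents makes violations of the inequality easy to hunt for (none will be found).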
Proposition 3.6.8 [Minkowski's inequality] (6) Assume a probability space $(\Omega, \mathcal{F}, P)$, random variables $f, g : \Omega \to \mathbb{R}$, and $1 \le p < \infty$. Then
$$ (E|f+g|^p)^{\frac{1}{p}} \le (E|f|^p)^{\frac{1}{p}} + (E|g|^p)^{\frac{1}{p}}. \quad (3.5) $$

(6) Hermann Minkowski, 22/06/1864 (Alexotas, Russian Empire; now Kaunas, Lithuania) - 12/01/1909 (Göttingen, Germany).
Proof. For $p = 1$ the inequality follows from $|f+g| \le |f| + |g|$. So assume that $1 < p < \infty$. The convexity of $x \mapsto |x|^p$ gives that
$$ \left| \frac{a+b}{2} \right|^p \le \frac{|a|^p + |b|^p}{2}, $$
and hence $(a+b)^p \le 2^{p-1}(a^p + b^p)$ for $a, b \ge 0$. Consequently, $|f+g|^p \le (|f| + |g|)^p \le 2^{p-1}(|f|^p + |g|^p)$ and
$$ E|f+g|^p \le 2^{p-1}(E|f|^p + E|g|^p). $$
Assuming now that $(E|f|^p)^{\frac{1}{p}} + (E|g|^p)^{\frac{1}{p}} < \infty$, otherwise there is nothing to prove, we get that $E|f+g|^p < \infty$ as well by the above considerations. Taking $1 < q < \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$, we continue with
$$ E|f+g|^p = E|f+g| \, |f+g|^{p-1} \le E(|f| + |g|) |f+g|^{p-1} = E|f| \, |f+g|^{p-1} + E|g| \, |f+g|^{p-1} \le (E|f|^p)^{\frac{1}{p}} \left( E|f+g|^{(p-1)q} \right)^{\frac{1}{q}} + (E|g|^p)^{\frac{1}{p}} \left( E|f+g|^{(p-1)q} \right)^{\frac{1}{q}}, $$
where we have used Hölder's inequality. Since $(p-1)q = p$, dividing both sides by $(E|f+g|^p)^{\frac{1}{q}}$ (which we may assume to be positive) and using $1 - \frac{1}{q} = \frac{1}{p}$ gives (3.5). $\Box$

As an application we obtain

Corollary 3.6.9 [Chebyshev's inequality] Let $f$ be a random variable with $Ef^2 < \infty$ and let $\lambda > 0$. Then
$$ P(\{|f - Ef| \ge \lambda\}) \le \frac{E(f - Ef)^2}{\lambda^2} \le \frac{Ef^2}{\lambda^2}. $$
Proof. From Corollary 3.6.6 we get that $E|f| < \infty$, so that $Ef$ exists. Applying Proposition 3.6.1 to $|f - Ef|^2$ gives that
$$ P(\{|f - Ef| \ge \lambda\}) = P(\{|f - Ef|^2 \ge \lambda^2\}) \le \frac{E|f - Ef|^2}{\lambda^2}. $$
Finally, we use that $E(f - Ef)^2 = Ef^2 - (Ef)^2 \le Ef^2$. $\Box$
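Chebyshev's inequality is crude but universal. A small simulation (our own sketch, with hypothetical names) compares the empirical deviation frequency of a uniform random variable on $[0,1]$ (mean $1/2$, variance $1/12$) with the Chebyshev bound:

```python
import random

random.seed(1)
n, lam = 100_000, 0.4
samples = [random.random() for _ in range(n)]  # uniform on [0,1]: Ef = 1/2, variance 1/12
freq = sum(abs(x - 0.5) >= lam for x in samples) / n
bound = (1.0 / 12.0) / lam ** 2
print(freq, "<=", bound)  # empirical frequency ~0.2 versus Chebyshev bound ~0.52
```

The true probability here is $0.2$, so the bound holds with plenty of room; Chebyshev trades sharpness for requiring nothing beyond a finite second moment.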
3.7 Theorem of Radon-Nikodym

Definition 3.7.1 (Signed measures) Let $(\Omega, \mathcal{F})$ be a measurable space.

(i) A map $\nu : \mathcal{F} \to \mathbb{R}$ is called a (finite) signed measure if and only if
$$ \nu = \alpha \mu^+ - \beta \mu^-, $$
where $\alpha, \beta \ge 0$ and $\mu^+$ and $\mu^-$ are probability measures on $\mathcal{F}$.

(ii) The signed measure $\nu$ is absolutely continuous with respect to a probability measure $P$ ($\nu \ll P$) if and only if $P(A) = 0$ implies $\nu(A) = 0$.

Example 3.7.2 Let $L : \Omega \to \mathbb{R}$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$ and write $L = L^+ - L^-$ with $L^+ := \max\{L, 0\}$ and $L^- := \max\{-L, 0\}$. Assume that $\alpha := \int_{\Omega} L^+ \, dP > 0$ and $\beta := \int_{\Omega} L^- \, dP > 0$, and set
$$ \mu^+(A) := \frac{\int_{\Omega} 1\!\mathrm{I}_A L^+ \, dP}{\int_{\Omega} L^+ \, dP}, \qquad \mu^-(A) := \frac{\int_{\Omega} 1\!\mathrm{I}_A L^- \, dP}{\int_{\Omega} L^- \, dP}, $$
so that $\nu(A) := \int_A L \, dP = \alpha \mu^+(A) - \beta \mu^-(A)$ is a signed measure. We check that $\mu^+$ is indeed a probability measure. Firstly,
$$ \mu^+(\Omega) = \frac{\int_{\Omega} 1\!\mathrm{I}_{\Omega} L^+ \, dP}{\int_{\Omega} L^+ \, dP} = 1. $$
Then assume $A_n \in \mathcal{F}$ to be disjoint sets such that $A = \bigcup_{n=1}^{\infty} A_n$. Then
$$ \mu^+\left( \bigcup_{n=1}^{\infty} A_n \right) = \frac{1}{\alpha} \int_{\Omega} 1\!\mathrm{I}_{\bigcup_{n=1}^{\infty} A_n} L^+ \, dP = \frac{1}{\alpha} \int_{\Omega} \left( \sum_{n=1}^{\infty} 1\!\mathrm{I}_{A_n}(\omega) \right) L^+ \, dP = \frac{1}{\alpha} \lim_{N \to \infty} \int_{\Omega} \left( \sum_{n=1}^{N} 1\!\mathrm{I}_{A_n} \right) L^+ \, dP = \sum_{n=1}^{\infty} \mu^+(A_n), $$
where we have used Lebesgue's dominated convergence theorem. The same can be done for $L^-$.
Theorem 3.7.3 (Radon-Nikodym) Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\nu$ a signed measure with $\nu \ll P$. Then there exists an integrable random variable $L : \Omega \to \mathbb{R}$ such that
$$ \nu(A) = \int_A L(\omega) \, dP(\omega), \quad A \in \mathcal{F}. \quad (3.6) $$
The random variable $L$ is unique in the following sense: if $L$ and $L'$ are random variables satisfying (3.6), then
$$ P(L \ne L') = 0. $$
The Radon-Nikodym theorem was proved by Radon (7) in 1913 in the case of $\mathbb{R}^n$. The extension to the general case was done by Nikodym (8) in 1930.

Definition 3.7.4 $L$ is called the Radon-Nikodym derivative. We shall write
$$ L = \frac{d\nu}{dP}. $$
We should keep in mind the rule
$$ \nu(A) = \int_{\Omega} 1\!\mathrm{I}_A \, d\nu = \int_{\Omega} 1\!\mathrm{I}_A L \, dP, $$
so that $d\nu = L \, dP$.
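On a finite $\Omega$ with $\mathcal{F} = 2^{\Omega}$ and $P(\{\omega\}) > 0$ for every $\omega$, any $\nu$ is automatically absolutely continuous with respect to $P$, and the Radon-Nikodym derivative is simply the pointwise ratio $L(\omega) = \nu(\{\omega\})/P(\{\omega\})$. The discrete sketch below (our own illustration; the particular $\Omega$, $P$, and $L$ are invented) shows the rule $\nu(A) = \int_A L \, dP$ with a density that takes negative values, so $\nu$ is a genuine signed measure:

```python
from fractions import Fraction

# finite Omega with F = 2^Omega and uniform P; nu is defined via a density L = d(nu)/dP
Omega = [0, 1, 2, 3]
P = {w: Fraction(1, 4) for w in Omega}
L = {0: Fraction(2), 1: Fraction(1), 2: Fraction(0), 3: Fraction(-3)}  # L may be negative

def nu(A):
    # discrete form of (3.6): nu(A) = sum over A of L(w) * P({w})
    return sum(L[w] * P[w] for w in A)

print(nu(Omega))    # 0: the total signed mass
print(nu([0, 1]))   # 3/4
```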
3.8 Modes of convergence

First we introduce some basic types of convergence.

Definition 3.8.1 [Types of convergence] Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f, f_1, f_2, ... : \Omega \to \mathbb{R}$ random variables.

(1) The sequence $(f_n)_{n=1}^{\infty}$ converges almost surely (a.s.) or with probability 1 to $f$ ($f_n \to f$ a.s. or $f_n \to f$ $P$-a.s.) if and only if
$$ P(\{\omega : f_n(\omega) \to f(\omega) \text{ as } n \to \infty\}) = 1. $$

(7) Johann Radon, 16/12/1887 (Tetschen, Bohemia; now Decin, Czech Republic) - 25/05/1956 (Vienna, Austria).
(8) Otton Marcin Nikodym, 13/08/1887 (Zablotow, Galicia, Austria-Hungary; now Ukraine) - 04/05/1974 (Utica, USA).
(2) The sequence $(f_n)_{n=1}^{\infty}$ converges in probability to $f$ ($f_n \stackrel{P}{\to} f$) if and only if for all $\varepsilon > 0$ one has
$$ P(\{\omega : |f_n(\omega) - f(\omega)| > \varepsilon\}) \to 0 \quad \text{as } n \to \infty. $$
(3) If $0 < p < \infty$, then the sequence $(f_n)_{n=1}^{\infty}$ converges with respect to $L_p$ or in the $L_p$-mean to $f$ ($f_n \stackrel{L_p}{\to} f$) if and only if
$$ E|f_n - f|^p \to 0 \quad \text{as } n \to \infty. $$
Note that $\{\omega : f_n(\omega) \to f(\omega) \text{ as } n \to \infty\}$ and $\{\omega : |f_n(\omega) - f(\omega)| > \varepsilon\}$ are measurable sets.

For the above types of convergence the random variables have to be defined on the same probability space. There is a variant without this assumption.
Definition 3.8.2 [Convergence in distribution] Let $(\Omega_n, \mathcal{F}_n, P_n)$ and $(\Omega, \mathcal{F}, P)$ be probability spaces and let $f_n : \Omega_n \to \mathbb{R}$ and $f : \Omega \to \mathbb{R}$ be random variables. Then the sequence $(f_n)_{n=1}^{\infty}$ converges in distribution to $f$ ($f_n \stackrel{d}{\to} f$) if and only if
$$ E\psi(f_n) \to E\psi(f) \quad \text{as } n \to \infty $$
for all bounded and continuous functions $\psi : \mathbb{R} \to \mathbb{R}$.
We have the following relations between the above types of convergence.

Proposition 3.8.3 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f, f_1, f_2, ... : \Omega \to \mathbb{R}$ be random variables.

(1) If $f_n \to f$ a.s., then $f_n \stackrel{P}{\to} f$.

(2) If $0 < p < \infty$ and $f_n \stackrel{L_p}{\to} f$, then $f_n \stackrel{P}{\to} f$.

(3) If $f_n \stackrel{P}{\to} f$, then $f_n \stackrel{d}{\to} f$.

(4) One has that $f_n \stackrel{d}{\to} f$ if and only if $F_{f_n}(x) \to F_f(x)$ at each point $x$ of continuity of $F_f(x)$, where $F_{f_n}$ and $F_f$ are the distribution-functions of $f_n$ and $f$, respectively.

(5) If $f_n \stackrel{P}{\to} f$, then there is a subsequence $1 \le n_1 < n_2 < n_3 < \cdots$ such that $f_{n_k} \to f$ a.s. as $k \to \infty$.

Proof. See [4].
Example 3.8.4 Assume $([0, 1], \mathcal{B}([0, 1]), \lambda)$ where $\lambda$ is the Lebesgue measure. We take
$$ f_1 = 1\!\mathrm{I}_{[0,\frac{1}{2})}, \quad f_2 = 1\!\mathrm{I}_{[\frac{1}{2},1]}, $$
$$ f_3 = 1\!\mathrm{I}_{[0,\frac{1}{4})}, \quad f_4 = 1\!\mathrm{I}_{[\frac{1}{4},\frac{1}{2}]}, \quad f_5 = 1\!\mathrm{I}_{[\frac{1}{2},\frac{3}{4})}, \quad f_6 = 1\!\mathrm{I}_{[\frac{3}{4},1]}, $$
$$ f_7 = 1\!\mathrm{I}_{[0,\frac{1}{8})}, \quad \dots $$
This implies that $f_n(x)$ does not converge to $0$ for any $x \in [0, 1]$. But it holds convergence in probability $f_n \stackrel{\lambda}{\to} 0$: for $0 < \varepsilon < 1$ one has
$$ \lambda(\{x : |f_n(x)| > \varepsilon\}) = \begin{cases} \frac{1}{2} & \text{if } n = 1, 2, \\ \frac{1}{4} & \text{if } n = 3, 4, \dots, 6, \\ \frac{1}{8} & \text{if } n = 7, \dots, \\ \vdots & \end{cases} $$
so that $\lambda(\{x : |f_n(x)| > \varepsilon\}) \to 0$ as $n \to \infty$.
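This "typewriter" sequence can be programmed directly, which makes the two behaviours visible at once: the measure of the support shrinks to zero, while each fixed point $x$ is hit by one indicator on every level. The sketch below is our own (the names `level` and `f` are invented, and the interval endpoint conventions are slightly simplified compared to the example):

```python
def level(n):
    # f_1, f_2 live on level 1; f_3..f_6 on level 2; f_7..f_14 on level 3; ...
    k = 1
    while n >= 2 ** (k + 1) - 1:
        k += 1
    return k

def f(n, x):
    # f_n is the indicator of the j-th dyadic interval of length 2^-k on its level
    k = level(n)
    j = n - (2 ** k - 1)
    return 1.0 if j / 2 ** k <= x < (j + 1) / 2 ** k else 0.0

# measure of {f_n > 0} is 2^(-level(n)) -> 0: convergence in probability to 0 ...
print([2.0 ** -level(n) for n in (1, 2, 3, 6, 7, 14, 15)])
# ... but for fixed x the sequence f_n(x) returns to 1 once on every level: no a.s. convergence
hits = [n for n in range(1, 63) if f(n, 0.3) == 1.0]
print(hits)  # one index per level
```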
As a preview on the next probability course we give some examples for the above concepts of convergence. We start with the weak law of large numbers as an example of convergence in probability:

Proposition 3.8.5 [Weak law of large numbers] Let $(f_n)_{n=1}^{\infty}$ be a sequence of independent random variables with
$$ Ef_k = m \quad \text{and} \quad E(f_k - m)^2 = \sigma^2 \quad \text{for all } k = 1, 2, \dots $$
Then
$$ \frac{f_1 + \cdots + f_n}{n} \stackrel{P}{\to} m \quad \text{as } n \to \infty, $$
that means, for each $\varepsilon > 0$,
$$ \lim_{n \to \infty} P\left( \left\{ \omega : \left| \frac{f_1 + \cdots + f_n}{n} - m \right| > \varepsilon \right\} \right) = 0. $$
Proof. By Chebyshev's inequality (Corollary 3.6.9) we have that
$$ P\left( \left\{ \omega : \left| \frac{f_1 + \cdots + f_n - nm}{n} \right| > \varepsilon \right\} \right) \le \frac{E|f_1 + \cdots + f_n - nm|^2}{n^2 \varepsilon^2} = \frac{E\left( \sum_{k=1}^{n} (f_k - m) \right)^2}{n^2 \varepsilon^2} = \frac{n \sigma^2}{n^2 \varepsilon^2} \to 0 $$
as $n \to \infty$. $\Box$
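The weak law is easy to watch in simulation. The sketch below (our own; the sample sizes, trial count, and the name `deviation_freq` are arbitrary choices) estimates, for fair coin flips with $m = 1/2$, how often the sample mean deviates from $m$ by more than $\varepsilon = 0.1$; the frequency visibly shrinks as $n$ grows:

```python
import random

random.seed(2)

def deviation_freq(n, eps=0.1, trials=2000):
    # relative frequency of |(f_1 + ... + f_n)/n - m| > eps over independent experiments
    bad = 0
    for _ in range(trials):
        s = sum(random.randint(0, 1) for _ in range(n))  # fair coin flips, m = 1/2
        if abs(s / n - 0.5) > eps:
            bad += 1
    return bad / trials

freqs = [deviation_freq(n) for n in (10, 40, 160)]
print(freqs)  # decreasing toward 0
```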
Using a stronger condition, we easily get more: almost sure convergence instead of convergence in probability. This gives a form of the strong law of large numbers.

Proposition 3.8.6 [Strong law of large numbers] Let $(f_n)_{n=1}^{\infty}$ be a sequence of independent random variables with $Ef_k = 0$, $k = 1, 2, \dots$, and $c := \sup_n Ef_n^4 < \infty$. Then
$$ \frac{f_1 + \cdots + f_n}{n} \stackrel{\text{a.s.}}{\to} 0. $$
Proof. Let $S_n := \sum_{k=1}^{n} f_k$. It holds
$$ ES_n^4 = E\left( \sum_{k=1}^{n} f_k \right)^4 = E \sum_{i,j,k,l=1}^{n} f_i f_j f_k f_l = \sum_{k=1}^{n} Ef_k^4 + 3 \sum_{\substack{k,l=1 \\ k \ne l}}^{n} Ef_k^2 \, Ef_l^2, $$
because for distinct $\{i, j, k, l\}$ it holds
$$ Ef_i f_j^3 = Ef_i f_j^2 f_k = Ef_i f_j f_k f_l = 0 $$
by independence. For example, $Ef_i f_j^3 = Ef_i \, Ef_j^3 = 0 \cdot Ef_j^3 = 0$, where one gets that $f_j^3$ is integrable by $E|f_j|^3 \le (E|f_j|^4)^{\frac{3}{4}} \le c^{\frac{3}{4}}$. Moreover, by Jensen's inequality,
$$ \left( Ef_k^2 \right)^2 \le Ef_k^4 \le c. $$
Hence $Ef_k^2 f_l^2 = Ef_k^2 \, Ef_l^2 \le c$ for $k \ne l$. Consequently,
$$ ES_n^4 \le nc + 3n(n-1)c \le 3cn^2, $$
and
$$ E \sum_{n=1}^{\infty} \frac{S_n^4}{n^4} = \sum_{n=1}^{\infty} E\frac{S_n^4}{n^4} \le \sum_{n=1}^{\infty} \frac{3c}{n^2} < \infty. $$
This implies that $\frac{S_n^4}{n^4} \stackrel{\text{a.s.}}{\to} 0$ and therefore $\frac{S_n}{n} \stackrel{\text{a.s.}}{\to} 0$. $\Box$
There are several strong laws of large numbers with other, in particular weaker, conditions. We close with a fundamental example concerning convergence in distribution: the Central Limit Theorem (CLT). For this we need

Definition 3.8.7 Let $(\Omega, \mathcal{F}, P)$ be a probability space. A sequence of independent random variables $f_n : \Omega \to \mathbb{R}$ is called identically distributed (i.i.d.) provided that the random variables $f_n$ have the same law, that means
$$ P(f_n \le \lambda) = P(f_k \le \lambda) $$
for all $n, k = 1, 2, ...$ and all $\lambda \in \mathbb{R}$.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(f_n)_{n=1}^{\infty}$ be a sequence of i.i.d. random variables with $Ef_1 = 0$ and $Ef_1^2 = \sigma^2$. By the law of large numbers we know
$$ \frac{f_1 + \cdots + f_n}{n} \stackrel{P}{\to} 0. $$
Hence the law of the limit is the Dirac-measure $\delta_0$. Is there a right scaling factor $c(n)$ such that
$$ \frac{f_1 + \cdots + f_n}{c(n)} \to g, $$
where $g$ is a non-degenerate random variable in the sense that $P_g \ne \delta_0$? And in which sense does the convergence take place? The answer is the following:

Proposition 3.8.8 [Central Limit Theorem] Let $(f_n)_{n=1}^{\infty}$ be a sequence of i.i.d. random variables with $Ef_1 = 0$ and $Ef_1^2 = \sigma^2 > 0$. Then
$$ P\left( \frac{f_1 + \cdots + f_n}{\sigma \sqrt{n}} \le x \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}} \, du $$
for all $x \in \mathbb{R}$ as $n \to \infty$, that means that
$$ \frac{f_1 + \cdots + f_n}{\sigma \sqrt{n}} \stackrel{d}{\to} g $$
for any $g$ with $P(g \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}} \, du$.
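A simulation illustrates the CLT concretely. The sketch below (our own; the choice $f_k = \pm 1$, the sample size, and the trial count are arbitrary) compares the empirical distribution function of the scaled sums against the standard normal distribution function $\Phi$, computed via the error function:

```python
import math
import random

random.seed(3)

def Phi(x):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# f_k = +/-1 with probability 1/2 each: Ef_1 = 0 and sigma^2 = 1
n, trials = 400, 5000
vals = [sum(random.choice((-1, 1)) for _ in range(n)) / math.sqrt(n)
        for _ in range(trials)]

for x in (-1.0, 0.0, 1.0):
    emp = sum(v <= x for v in vals) / trials
    print(x, emp, Phi(x))  # empirical distribution function versus Phi
```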
Chapter 4
Exercises
4.1 Probability spaces
1. Prove that $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

2. Prove that $\left( \bigcup_{i \in I} A_i \right)^c = \bigcap_{i \in I} A_i^c$, where $A_i \subseteq \Omega$ and $I$ is an arbitrary index set.

3. Given a set $\Omega$ and two non-empty sets $A, B \subseteq \Omega$ such that $A \cap B = \emptyset$. Give all elements of the smallest $\sigma$-algebra $\mathcal{F}$ on $\Omega$ which contains $A$ and $B$.

4. Let $x \in \mathbb{R}$. Is it true that $\{x\} \in \mathcal{B}(\mathbb{R})$, where $\{x\}$ is the set consisting of the element $x$ only?

5. Assume that $\mathbb{Q}$ is the set of rational numbers. Is it true that $\mathbb{Q} \in \mathcal{B}(\mathbb{R})$?

6. Given two dice with numbers $\{1, 2, ..., 6\}$. Assume that the probability that one die shows a certain number is $\frac{1}{6}$. What is the probability that the sum of the two dice is $m \in \{1, 2, ..., 12\}$?
7. There are three students. Assuming that a year has 365 days, what is the probability that at least two of them have their birthday on the same day?
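One way to sanity-check the answer to Exercise 7, assuming birthdays uniform over 365 days and independent, is the standard complement count: the probability that all three birthdays are distinct is $\frac{365 \cdot 364 \cdot 363}{365^3}$. This sketch is ours, not part of the exercises:

```python
from fractions import Fraction

# complement: probability that all three birthdays are distinct
distinct = Fraction(365 * 364 * 363, 365 ** 3)
p_shared = 1 - distinct
print(p_shared, float(p_shared))  # 1093/133225, roughly 0.0082
```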
8.* Definition: The system $\mathcal{F} \subseteq 2^{\Omega}$ is a monotonic class, if

(a) $A_1, A_2, ... \in \mathcal{F}$, $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ imply $\bigcup_n A_n \in \mathcal{F}$, and

(b) $A_1, A_2, ... \in \mathcal{F}$, $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$ imply $\bigcap_n A_n \in \mathcal{F}$.

Show that if $\mathcal{F} \subseteq 2^{\Omega}$ is an algebra and a monotonic class, then $\mathcal{F}$ is a $\sigma$-algebra.

12. Show that $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$ if $\mathcal{F}$ is a $\sigma$-algebra and $A_1, A_2, ... \in \mathcal{F}$.
13. Give an example where the union of two $\sigma$-algebras is not a $\sigma$-algebra.

14. Let $\mathcal{F}$ be a $\sigma$-algebra and $A \in \mathcal{F}$. Prove that
$$ \mathcal{G} := \{B \cap A : B \in \mathcal{F}\} $$
is a $\sigma$-algebra.

15. Prove that
$$ \sigma(\{B \subseteq [-\infty, \infty] : B \in \mathcal{B}(\mathbb{R})\}) = \sigma(\{[-\infty, b] : -\infty < b < \infty\}) $$
and that this $\sigma$-algebra is the smallest $\sigma$-algebra generated by the subsets $A \subseteq [-\infty, \infty]$ which are open within $[-\infty, \infty]$. The generated $\sigma$-algebra is denoted by $\mathcal{B}([-\infty, \infty])$.
16. Show that the equality $\sigma(\mathcal{G}_2) = \sigma(\mathcal{G}_4) = \sigma(\mathcal{G}_0)$ in Proposition 1.1.8 holds.

17. Show that $\mathcal{A} \subseteq \mathcal{B}$ implies that $\sigma(\mathcal{A}) \subseteq \sigma(\mathcal{B})$.

18. Prove $\sigma(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$.

19. Let $\Omega \ne \emptyset$, $A \subseteq \Omega$ with $A \ne \emptyset$, and $\mathcal{F} := 2^{\Omega}$. Define
$$ P(B) := \begin{cases} 1 & : B \cap A \ne \emptyset \\ 0 & : B \cap A = \emptyset \end{cases}. $$
Is $(\Omega, \mathcal{F}, P)$ a probability space?
20. Prove Proposition 1.2.6 (7).

21. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A \in \mathcal{F}$ such that $P(A) > 0$. Show that $(\Omega, \mathcal{F}, \mu)$ is a probability space, where
$$ \mu(B) := P(B|A). $$

22. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Show that if $A, B \in \mathcal{F}$ are independent, this implies that

(a) $A$ and $B^c$ are independent,

(b) $A^c$ and $B^c$ are independent.

23. (a) Let $(\Omega, \mathcal{F}, P)$ be the model of rolling one die, i.e. $\Omega = \{1, ..., 6\}$, $\mathcal{F} = 2^{\Omega}$, and $P(B) := \frac{1}{6} \, \mathrm{card}(B)$. We define
$$ A := \{1, 4\} \quad \text{and} \quad B := \{2, 5\}. $$
Are $A$ and $B$ independent?

(b) Let $\Omega := \{(k, l) : k, l = 1, ..., 6\}$, $\mathcal{F} := 2^{\Omega}$, and $P(B) := \frac{1}{36} \, \mathrm{card}(B)$ be the model of rolling two dice. We define
$$ A := \{(k, l) : k = 1 \text{ or } k = 4\} \quad \text{and} \quad B := \{(k, l) : l = 2 \text{ or } l = 5\}. $$
Are $A$ and $B$ independent?

25. In a certain community 60% of the families own their own car, 30% own their own home, and 20% own both (their own car and their own home). If a family is randomly chosen, what is the probability that this family owns a car or a house but not both?

26. Prove Bayes' formula: Proposition 1.2.15.

27. Suppose we have 10 coins such that if the $i$-th one is flipped then heads will appear with probability $\frac{i}{10}$, $i = 1, 2, \dots, 10$. One chooses randomly one of the coins. What is the conditional probability that one has chosen the fifth coin, given it showed heads after flipping? Hint: Use Bayes' formula.
28.* A class that is both a $\pi$-system and a $\lambda$-system is a $\sigma$-algebra.

29. Prove Property (3) in Lemma 1.4.4.

30.* Let $(\Omega, \mathcal{F}, P)$ be a probability space and assume $A_1, A_2, ..., A_n \in \mathcal{F}$ are independent. Show that then $A_1^c, A_2^c, ..., A_n^c$ are independent.

31. Let us play the following game: if we roll a die, it shows each of the numbers $\{1, 2, 3, 4, 5, 6\}$ with probability $\frac{1}{6}$. Let $k_1, k_2, ... \in \{1, 2, 3, ...\}$ be a given sequence.

(a) First go: one has to roll the die $k_1$ times. If it showed 6 all $k_1$ times, we won.

(b) Second go: we roll $k_2$ times. Again, we win if it shows 6 all $k_2$ times.

And so on... Show that

(a) The probability of winning infinitely often is 1 if and only if
$$ \sum_{n=1}^{\infty} \left( \frac{1}{6} \right)^{k_n} = \infty, $$

(b) The probability of losing infinitely often is always 1.

Hint: Use the lemma of Borel-Cantelli.
32. Let $n \ge 1$, $k \in \{0, 1, ..., n\}$ and
$$ \binom{n}{k} := \frac{n!}{k!(n-k)!}, $$
where $0! := 1$ and $k! := 1 \cdot 2 \cdots k$ for $k \ge 1$. Prove that one has $\binom{n}{k}$ possibilities to choose $k$ elements out of $n$ elements.

33. Binomial distribution: Assume $0 < p < 1$, $\Omega := \{0, 1, 2, ..., n\}$, $n \ge 1$, $\mathcal{F} := 2^{\Omega}$, and
$$ \mu_{n,p}(B) := \sum_{k \in B} \binom{n}{k} p^k (1-p)^{n-k}. $$

(a) Is $(\Omega, \mathcal{F}, \mu_{n,p})$ a probability space?

(b) Compute $\max_{k=0,...,n} \mu_{n,p}(\{k\})$ for $p = \frac{1}{2}$.

34. Geometric distribution: Let $0 < p < 1$, $\Omega := \{0, 1, 2, ...\}$, $\mathcal{F} := 2^{\Omega}$, and
$$ \mu_p(B) := \sum_{k \in B} p(1-p)^k. $$

(a) Is $(\Omega, \mathcal{F}, \mu_p)$ a probability space?

(b) Compute $\mu_p(\{0, 2, 4, 6, ...\})$.

35. Poisson distribution: Let $\lambda > 0$, $\Omega := \{0, 1, 2, ...\}$, $\mathcal{F} := 2^{\Omega}$, and
$$ \pi_{\lambda}(B) := \sum_{k \in B} e^{-\lambda} \frac{\lambda^k}{k!}. $$

(a) Is $(\Omega, \mathcal{F}, \pi_{\lambda})$ a probability space?

(b) Compute $\sum_{k=0}^{\infty} k \, \pi_{\lambda}(\{k\})$.
4.2 Random variables

1. Let $A_1, A_2, ... \subseteq \Omega$. Show that

(a) $\liminf_n 1\!\mathrm{I}_{A_n}(\omega) = 1\!\mathrm{I}_{\liminf_n A_n}(\omega)$,

(b) $\limsup_n 1\!\mathrm{I}_{A_n}(\omega) = 1\!\mathrm{I}_{\limsup_n A_n}(\omega)$,

for all $\omega \in \Omega$.

2. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A \subseteq \Omega$ a set. Show that $A \in \mathcal{F}$ if and only if $1\!\mathrm{I}_A : \Omega \to \mathbb{R}$ is a random variable.
3. Show the assertions (1), (3) and (4) of Proposition 2.1.5.

4. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f : \Omega \to \mathbb{R}$.

(a) Show that, if $\mathcal{F} = \{\emptyset, \Omega\}$, then $f$ is measurable if and only if $f$ is constant.

(b) Show that, if $P(A)$ is 0 or 1 for every $A \in \mathcal{F}$ and $f$ is measurable, then
$$ P(\{\omega : f(\omega) = c\}) = 1 \quad \text{for a constant } c. $$

5. Let $(\Omega, \mathcal{F})$ be a measurable space and $f_n$, $n = 1, 2, \dots$, a sequence of random variables. Show that
$$ \liminf_n f_n \quad \text{and} \quad \limsup_n f_n $$
are random variables.

6. Complete the proof of Proposition 2.2.9 by showing that for the distribution function $F_g(x) = P(\{\omega : g \le x\})$ of a random variable $g$ it holds
$$ \lim_{x \to -\infty} F_g(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_g(x) = 1. $$
7. Let $f : \Omega \to \mathbb{R}$ be a map. Show that for $A_1, A_2, ... \subseteq \mathbb{R}$ it holds
$$ f^{-1}\left( \bigcup_{i=1}^{\infty} A_i \right) = \bigcup_{i=1}^{\infty} f^{-1}(A_i). $$

8. Let $(\Omega_1, \mathcal{F}_1)$, $(\Omega_2, \mathcal{F}_2)$, $(\Omega_3, \mathcal{F}_3)$ be measurable spaces. Assume that $f : \Omega_1 \to \Omega_2$ is $(\mathcal{F}_1, \mathcal{F}_2)$-measurable and that $g : \Omega_2 \to \Omega_3$ is $(\mathcal{F}_2, \mathcal{F}_3)$-measurable. Show that then $g \circ f : \Omega_1 \to \Omega_3$ defined by
$$ (g \circ f)(\omega_1) := g(f(\omega_1)) $$
is $(\mathcal{F}_1, \mathcal{F}_3)$-measurable.

9. Prove Proposition 2.1.4.

10. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(M, \Sigma)$ a measurable space, and assume that $f : \Omega \to M$ is $(\mathcal{F}, \Sigma)$-measurable. Show that $\mu$ with
$$ \mu(B) := P(\{\omega : f(\omega) \in B\}) \quad \text{for } B \in \Sigma $$
is a probability measure on $\Sigma$.
11. Assume the probability space $(\Omega, \mathcal{F}, P)$ and let $f, g : \Omega \to \mathbb{R}$ be measurable step-functions. Show that $f$ and $g$ are independent if and only if for all $x, y \in \mathbb{R}$ it holds
$$ P(\{f = x, g = y\}) = P(\{f = x\}) \, P(\{g = y\}). $$

12. Assume the product space $([0,1] \times [0,1], \mathcal{B}([0,1]) \otimes \mathcal{B}([0,1]), \lambda \times \lambda)$ and the random variables $f(x, y) := 1\!\mathrm{I}_{[0,p)}(x)$ and $g(x, y) := 1\!\mathrm{I}_{[0,p)}(y)$, where $0 < p < 1$. Show that

(a) $f$ and $g$ are independent,

(b) the law of $f + g$, $P_{f+g}(\{k\})$, for $k = 0, 1, 2$ is the binomial distribution $\mu_{2,p}$.

13. Let $([0,1], \mathcal{B}([0,1]))$ be a measurable space and define the functions
$$ f(x) := 1\!\mathrm{I}_{[0,1/2)}(x) + 2 \cdot 1\!\mathrm{I}_{[1/2,1]}(x) $$
and
$$ g(x) := 1\!\mathrm{I}_{[0,1/2)}(x) - 1\!\mathrm{I}_{\{1/4\}}(x). $$
Compute (a) $\sigma(f)$ and (b) $\sigma(g)$.

14. Assume a measurable space $(\Omega, \mathcal{F})$ and random variables $f, g : \Omega \to \mathbb{R}$. Is the set $\{\omega : f(\omega) = g(\omega)\}$ measurable?
15. Let $\mathcal{G} := \sigma(\{(a, b) : 0 < a < b < 1\})$ be a $\sigma$-algebra on $[0, 1]$. Which continuous functions $f : [0, 1] \to \mathbb{R}$ are measurable with respect to $\mathcal{G}$?

16. Let $f_k : \Omega \to \mathbb{R}$, $k = 1, \dots, n$, be independent random variables on $(\Omega, \mathcal{F}, P)$ and assume the functions $g_k : \mathbb{R} \to \mathbb{R}$, $k = 1, \dots, n$, are $\mathcal{B}(\mathbb{R})$-measurable. Show that $g_1(f_1), g_2(f_2), \dots, g_n(f_n)$ are independent.

17. Assume $n \in \mathbb{N}$ and define by
$$ \mathcal{F} := \sigma\left( \left\{ \left[ \frac{k}{n}, \frac{k+1}{n} \right) : k = 0, 1, \dots, n-1 \right\} \right) $$
a $\sigma$-algebra on $[0, 1]$.

(a) Why is the function $f(x) = x$, $x \in [0, 1]$, not $\mathcal{F}$-measurable?

(b) Give an example of an $\mathcal{F}$-measurable function.

18. Prove Proposition 2.3.4.

19.* Show that families of random variables are independent iff (if and only if) their generated $\sigma$-algebras are independent.

20. Show Proposition 2.3.5.
4.3 Integration

1. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $f, g : \Omega \to \mathbb{R}$ random variables with
$$ f = \sum_{i=1}^{n} a_i 1\!\mathrm{I}_{A_i} \quad \text{and} \quad g = \sum_{j=1}^{m} b_j 1\!\mathrm{I}_{B_j}, $$
where $a_i, b_j \in \mathbb{R}$ and $A_i, B_j \in \mathcal{F}$. Assume that $\{A_1, \dots, A_n\}$ and $\{B_1, \dots, B_m\}$ are independent. Show that
$$ Efg = Ef \, Eg. $$

2. Prove assertions (4) and (5) of Proposition 3.2.1.

Hints: To prove (4), set $A_n := \left\{ \omega : f(\omega) > \frac{1}{n} \right\}$ and show that
$$ 0 = Ef 1\!\mathrm{I}_{A_n} \ge E \frac{1}{n} 1\!\mathrm{I}_{A_n} = \frac{1}{n} E 1\!\mathrm{I}_{A_n} = \frac{1}{n} P(A_n). $$
This implies $P(A_n) = 0$ for all $n$. Using this one gets
$$ P(\{\omega : f(\omega) > 0\}) = 0. $$
To prove (5) use
$$ Ef = E(f 1\!\mathrm{I}_{\{f = g\}} + f 1\!\mathrm{I}_{\{f \ne g\}}) = Ef 1\!\mathrm{I}_{\{f = g\}} + Ef 1\!\mathrm{I}_{\{f \ne g\}} = Ef 1\!\mathrm{I}_{\{f = g\}}. $$
3. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A_1, A_2, ... \in \mathcal{F}$. Show that
$$ P\left( \liminf_n A_n \right) \le \liminf_n P(A_n) \le \limsup_n P(A_n) \le P\left( \limsup_n A_n \right). $$
Hint: Use Fatou's lemma and Exercise 4.2.1.

4. Assume the probability space $([0,1], \mathcal{B}([0,1]), \lambda)$ and the random variable
$$ f = \sum_{k=0}^{\infty} k \, 1\!\mathrm{I}_{[a_{k-1}, a_k)} $$
with $a_{-1} := 0$ and
$$ a_k := \sum_{m=0}^{k} e^{-\lambda} \frac{\lambda^m}{m!} \quad \text{for } k = 0, 1, 2, \dots, $$
where $\lambda > 0$. Compute the law $P_f(\{k\})$, $k = 0, 1, 2, \dots$, of $f$. Which distribution has $f$?

5. Assume the probability space $((0, 1], \mathcal{B}((0, 1]), \lambda)$ and
$$ f(x) := \begin{cases} 1 & \text{for irrational } x \in (0, 1] \\ \frac{1}{x} & \text{for rational } x \in (0, 1] \end{cases}. $$
Compute $Ef$.
6. Uniform distribution: Assume the probability space $\left( [a, b], \mathcal{B}([a, b]), \frac{\lambda}{b-a} \right)$, where $-\infty < a < b < \infty$.

(a) Show that
$$ Ef = \int_{[a,b]} f(\omega) \frac{1}{b-a} \, d\lambda(\omega) = \frac{a+b}{2}, \quad \text{where } f(\omega) := \omega. $$

(b) Compute
$$ Eg = \int_{[a,b]} g(\omega) \frac{1}{b-a} \, d\lambda(\omega), \quad \text{where } g(\omega) := \omega^2. $$

(c) Compute the variance $E(f - Ef)^2$ of the random variable $f(\omega) := \omega$.

Hint: Use (a) and (b).

7. Poisson distribution: Let $\lambda > 0$, $\Omega := \{0, 1, 2, 3, ...\}$, $\mathcal{F} := 2^{\Omega}$, and
$$ \pi_{\lambda} = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \delta_k. $$

(a) Show that
$$ Ef = \int_{\Omega} f(\omega) \, d\pi_{\lambda}(\omega) = \lambda, \quad \text{where } f(k) := k. $$

(b) Show that
$$ Eg = \int_{\Omega} g(\omega) \, d\pi_{\lambda}(\omega) = \lambda^2, \quad \text{where } g(k) := k(k-1). $$

(c) Compute the variance $E(f - Ef)^2$ of the random variable $f(k) := k$.

Hint: Use (a) and (b).

8. Binomial distribution: Let $0 < p < 1$, $\Omega := \{0, 1, ..., n\}$, $n \ge 2$, $\mathcal{F} := 2^{\Omega}$, and
$$ \mu_{n,p} = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \delta_k. $$

(a) Show that
$$ Ef = \int_{\Omega} f(\omega) \, d\mu_{n,p}(\omega) = np, \quad \text{where } f(k) := k. $$

(b) Show that
$$ Eg = \int_{\Omega} g(\omega) \, d\mu_{n,p}(\omega) = n(n-1)p^2, \quad \text{where } g(k) := k(k-1). $$

(c) Compute the variance $E(f - Ef)^2$ of the random variable $f(k) := k$.

Hint: Use (a) and (b).
9. Assume integrable, independent random variables $f_i : \Omega \to \mathbb{R}$ with $Ef_i^2 < \infty$, mean $Ef_i = m_i$, and variance $\sigma_i^2 = E(f_i - m_i)^2$, for $i = 1, 2, \dots, n$. Compute the mean and the variance of

(a) $g = af_1 + b$, where $a, b \in \mathbb{R}$,

(b) $g = \sum_{i=1}^{n} f_i$.

10. Let the random variables $f$ and $g$ be independent and Poisson distributed with parameters $\lambda_1$ and $\lambda_2$, respectively. Show that $f + g$ is Poisson distributed:
$$ P_{f+g}(\{k\}) = P(f + g = k) = \sum_{l=0}^{k} P(f = l, g = k - l) = \dots $$
Which parameter does this Poisson distribution have?

11. Assume that $f$ and $g$ are independent random variables where $E|f| < \infty$ and $E|g| < \infty$. Show that $E|fg| < \infty$ and
$$ Efg = Ef \, Eg. $$
Hint:

(a) Assume first $f \ge 0$ and $g \ge 0$ and show, using stair-case functions to approximate $f$ and $g$, that
$$ Efg = Ef \, Eg. $$

(b) Use (a) to show $E|fg| < \infty$, and then Lebesgue's Theorem for
$$ Efg = Ef \, Eg $$
in the general case.
12. Use Hölder's inequality to show Corollary 3.6.6.
13. Let $f_1, f_2, \dots$ be non-negative random variables on $(\Omega, \mathcal{F}, P)$. Show that
$$ E \sum_{k=1}^{\infty} f_k = \sum_{k=1}^{\infty} Ef_k \quad (\le \infty). $$

14. Use Minkowski's inequality to show that for sequences of real numbers $(a_n)_{n=1}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ and $1 \le p < \infty$ it holds
$$ \left( \sum_{n=1}^{\infty} |a_n + b_n|^p \right)^{\frac{1}{p}} \le \left( \sum_{n=1}^{\infty} |a_n|^p \right)^{\frac{1}{p}} + \left( \sum_{n=1}^{\infty} |b_n|^p \right)^{\frac{1}{p}}. $$
15. Prove assertion (2) of Proposition 3.2.3.
16. Assume a sequence of i.i.d. random variables $(f_k)_{k=1}^{\infty}$ with $Ef_1 = m$ and variance $E(f_1 - m)^2 = \sigma^2$. Use the Central Limit Theorem to show that
$$ P\left( \left\{ \omega : \frac{f_1 + f_2 + \cdots + f_n - nm}{\sigma \sqrt{n}} \le x \right\} \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}} \, du $$
as $n \to \infty$.
17. Let $([0, 1], \mathcal{B}([0, 1]), \lambda)$ be a probability space. Define the functions $f_n : \Omega \to \mathbb{R}$ by
$$ f_{2n}(x) := n^3 \, 1\!\mathrm{I}_{[0, 1/(2n)]}(x) \quad \text{and} \quad f_{2n-1}(x) := n^3 \, 1\!\mathrm{I}_{[1 - 1/(2n), 1]}(x), $$
where $n = 1, 2, \dots$.

(a) Does there exist a random variable $f : \Omega \to \mathbb{R}$ such that $f_n \to f$ almost surely?

(b) Does there exist a random variable $f : \Omega \to \mathbb{R}$ such that $f_n \stackrel{P}{\to} f$?

(c) Does there exist a random variable $f : \Omega \to \mathbb{R}$ such that $f_n \stackrel{L_p}{\to} f$?
Index
λ-system, 31
liminf_n A_n, 17
liminf_n a_n, 17
limsup_n A_n, 17
limsup_n a_n, 17
π-system, 24
π-systems and uniqueness of measures, 24
π-λ-Theorem, 31
σ-algebra, 10
σ-finite, 14
algebra, 10
axiom of choice, 32
Bayes formula, 19
binomial distribution, 25
Borel -algebra, 13
Borel -algebra on R
n
, 24
Carathéodory's extension theorem, 21
central limit theorem, 75
change of variables, 58
Chebyshev's inequality, 66
closed set, 12
conditional probability, 18
convergence almost surely, 71
convergence in L
p
, 71
convergence in distribution, 72
convergence in probability, 71
convexity, 66
counting measure, 15
Dirac measure, 15
distribution-function, 40
dominated convergence, 55
equivalence relation, 31
event, 10
existence of sets, which are not Borel,
32
expectation of a random variable, 47
expected value, 47
exponential distribution on R, 28
extended random variable, 60
Fubini's Theorem, 61, 62
Gaussian distribution on R, 28
geometric distribution, 25
Hölder's inequality, 67
i.i.d. sequence, 74
independence of a family of random
variables, 42
independence of a finite family of random variables, 42
independence of a sequence of events,
18
Jensen's inequality, 66
law of a random variable, 39
Lebesgue integrable, 47
Lebesgue measure, 26, 27
Lebesgue's Theorem, 55
lemma of Borel-Cantelli, 20
lemma of Fatou, 18
lemma of Fatou for random variables,
54
measurable map, 38
measurable space, 10
measurable step-function, 35
measure, 14
measure space, 14
Minkowski's inequality, 68
moments, absolute moments, 59
monotone class theorem, 60
monotone convergence, 53
open set, 12
Poisson distribution, 25
Poisson's Theorem, 29
probability measure, 14
probability space, 14
product of probability spaces, 23
random variable, 36
realization of independent
random variables, 44
step-function, 35
strong law of large numbers, 74
uniform distribution, 26, 27
variance, 49
vector space, 59
weak law of large numbers, 73
Bibliography
[1] H. Bauer. Probability theory. Walter de Gruyter, 1996.
[2] H. Bauer. Measure and integration theory. Walter de Gruyter, 2001.
[3] P. Billingsley. Probability and Measure. Wiley, 1995.
[4] A.N. Shiryaev. Probability. Springer, 1996.
[5] D. Williams. Probability with martingales. Cambridge University Press,
1991.