An Introduction To Probability Theory - Geiss
Contents

1 Probability spaces
  1.1 Definition of σ-algebras
  1.2 Probability measures
  1.3 Examples of distributions
      1.3.1 Binomial distribution with parameter 0 < p < 1
      1.3.2 Poisson distribution with parameter λ > 0
      1.3.3 Geometric distribution with parameter 0 < p < 1
      1.3.4 Lebesgue measure and uniform distribution
      1.3.5 Gaussian distribution on R with mean m ∈ R and variance σ² > 0
      1.3.6 Exponential distribution on R with parameter λ > 0
      1.3.7 Poisson's Theorem
  1.4 A set which is not a Borel set

2 Random variables
  2.1 Random variables
  2.2 Measurable maps
  2.3 Independence

3 Integration
  3.1 Definition of the expected value
  3.2 Basic properties of the expected value
  3.3 Connections to the Riemann-integral
  3.4 Change of variables in the expected value
  3.5 Fubini's Theorem
  3.6 Some inequalities

4 Modes of convergence
  4.1 Definitions
  4.2 Some applications
Introduction
The modern period of probability theory is connected with names like S.N. Bernstein (1880-1968), E. Borel (1871-1956), and A.N. Kolmogorov (1903-1987). In particular, in 1933 A.N. Kolmogorov published his modern approach to probability theory, including the notions of a measurable space and of a probability space. This lecture will start from these notions, continue with random variables and basic parts of integration theory, and finish with some first limit theorems.
The lecture is based on a mathematical axiomatic approach and is intended for students of mathematics, but also for other students who need more mathematical background for their further studies. We assume that integration with respect to the Riemann-integral on the real line is known. The approach we follow may seem more difficult at the beginning, but once one has a solid basis, many things will be easier and more transparent later. Let us start with an introductory example leading to a problem which should motivate our axiomatic approach.
D1 := T − S1 , D2 := T − S2 , D3 := T − S3 , ...
Chapter 1

Probability spaces
Example 1.0.1 (a) If we roll a die, then all possible outcomes are the numbers between 1 and 6. That means

Ω = {1, 2, 3, 4, 5, 6}.

(b) If we flip a coin, then we have either "heads" or "tails" on top, that means

Ω = {H, T}.

If we have two coins, then we would get

Ω = {(H, H), (H, T), (T, H), (T, T)}.

(c) For the lifetime of a bulb in hours we can take

Ω = [0, ∞).
Example 1.0.2 (a) The event ”the die shows an even number” can be
described by
A = {2, 4, 6}.
(b) The event "exactly one of two coins shows heads" can be described by

A = {(H, T), (T, H)}.
(c) The event "the bulb works more than 200 hours" we express via

A = (200, ∞).
Example 1.0.3 (a) We assume that all outcomes of rolling a die are equally likely, that is

P({ω}) = 1/6.

Then

P({2, 4, 6}) = 1/2.

(b) If we assume that we have two fair coins, which means that each coin shows heads and tails equally likely, then the probability that exactly one of the two coins shows heads is

P({(H, T), (T, H)}) = 1/2.

(c) The probability distribution for the lifetime of a bulb will be considered at the end of Chapter 1.
A system F of subsets of Ω is called a σ-algebra on Ω if

(1) ∅, Ω ∈ F,
(2) A ∈ F implies that A^c := Ω\A ∈ F,
(3) A_1, A_2, ... ∈ F implies that ⋃_{i=1}^∞ A_i ∈ F.

If one requires (3) only for finite unions, then F is called an algebra. Every σ-algebra is an algebra. Sometimes, the terms σ-field and field are used instead of σ-algebra and algebra. We consider some first examples.
If (F_j)_{j∈J} is an arbitrary nonempty family of σ-algebras on Ω, then the intersection ⋂_{j∈J} F_j is a σ-algebra as well.
Proof. The proof is very easy, but typical and fundamental. First we notice that ∅, Ω ∈ F_j for all j ∈ J, so that ∅, Ω ∈ ⋂_{j∈J} F_j. Now let A, A_1, A_2, ... ∈ ⋂_{j∈J} F_j. Hence A, A_1, A_2, ... ∈ F_j for all j ∈ J, so that (the F_j are σ-algebras!)

A^c = Ω\A ∈ F_j  and  ⋃_{i=1}^∞ A_i ∈ F_j

for all j ∈ J, and therefore A^c, ⋃_{i=1}^∞ A_i ∈ ⋂_{j∈J} F_j.
For an arbitrary system of sets G ⊆ 2^Ω there exists a smallest σ-algebra σ(G) which contains G, that is

G ⊆ σ(G).

Proof. Given G ⊆ 2^Ω, one takes σ(G) to be the intersection of all σ-algebras on Ω which contain G; this family is nonempty, since 2^Ω is such a σ-algebra, and the intersection is a σ-algebra by the preceding proposition.
The construction is very elegant but has, as already mentioned, the slight
disadvantage that one cannot explicitly construct all elements of σ(G). Let
us now turn to one of the most important examples, the Borel σ-algebra on
R. To do this we need the notion of open and closed sets.
It should be noted that, by definition, the empty set ∅ is both open and closed.
σ(G_3) ⊆ σ(G_0).

Given an open set A ⊆ R and x ∈ A, there exists an ε_x > 0 such that

(x − ε_x, x + ε_x) ⊆ A.

Hence

A = ⋃_{x ∈ A∩ℚ} (x − ε_x, x + ε_x),
σ(G0 ) ⊆ σ(G5 ).
µ(A) := cardinality of A.
Let us now discuss a typical example in which the σ–algebra F is not the set
of all subsets of Ω.
Ak := {ω ∈ Ω : ε1 + · · · + εn = k} .
Hence Ak consists of all ω such that the communication rate is ρk. The
system F is the system of observable sets of events since one can only observe
how many channels are failing, but not which channels are failing. The
measure P is given by
P(Ak ) := nk pn−k (1 − p)k , 0 < p < 1.
Note that P describes the binomial distribution with parameter p on
{0, ..., n} if we identify Ak with the natural number k.
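As a numerical illustration (not part of the text), the following Python sketch evaluates these probabilities for the hypothetical choice of n = 10 channels that each work with probability p = 0.95:

```python
from math import comb

n, p = 10, 0.95  # hypothetical values: 10 channels, each works with probability p

# P(A_k) = C(n, k) * p^(n-k) * (1-p)^k: exactly k of the n channels fail
probs = [comb(n, k) * p**(n - k) * (1 - p)**k for k in range(n + 1)]

for k, prob in enumerate(probs[:4]):
    print(f"P(A_{k}) = {prob:.6f}")
print("sum over all k:", sum(probs))  # a probability measure sums to 1
```

The printout shows that almost all mass sits on small k, i.e. with these parameters it is rare that many channels fail at once.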
(1) Without assuming that P(∅) = 0 the σ-additivity (1.1) implies that
P(∅) = 0.
(2) If A_1, ..., A_n ∈ F are such that A_i ∩ A_j = ∅ for i ≠ j, then

P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i),

because of P(∅) = 0: apply the σ-additivity (1.1) with A_{n+1} = A_{n+2} = ··· = ∅.
(3) Since (A ∩ B) ∩ (A\B) = ∅, we get that
P(A ∩ B) + P(A\B) = P ((A ∩ B) ∪ (A\B)) = P(A).
(4) We apply (3) to A = Ω and observe that Ω\B = B c by definition and
Ω ∩ B = B.
(5) Put B_1 := A_1 and B_i := A_1^c ∩ A_2^c ∩ ··· ∩ A_{i−1}^c ∩ A_i for i = 2, 3, .... Obviously, P(B_i) ≤ P(A_i) for all i. Since the B_i are disjoint and ⋃_{i=1}^∞ A_i = ⋃_{i=1}^∞ B_i, it follows that

P(⋃_{i=1}^∞ A_i) = P(⋃_{i=1}^∞ B_i) = Σ_{i=1}^∞ P(B_i) ≤ Σ_{i=1}^∞ P(A_i).
Definition 1.2.5 [lim inf_n A_n and lim sup_n A_n] Let (Ω, F) be a measurable space and A_1, A_2, ... ∈ F. Then

lim inf_n A_n := ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k  and  lim sup_n A_n := ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k.
The definition above says that ω ∈ lim inf n An if and only if all events An ,
except a finite number of them, occur, and that ω ∈ lim supn An if and only
if infinitely many of the events An occur.
Definition 1.2.6 [lim inf_n ξ_n and lim sup_n ξ_n] For ξ_1, ξ_2, ... ∈ R we let

lim inf_n ξ_n := sup_n inf_{k≥n} ξ_k  and  lim sup_n ξ_n := inf_n sup_{k≥n} ξ_k.
Remark 1.2.7 (1) The value lim inf n ξn is the infimum of all c such that
there is a subsequence n1 < n2 < n3 < · · · such that limk ξnk = c.
(2) The value lim supn ξn is the supremum of all c such that there is a
subsequence n1 < n2 < n3 < · · · such that limk ξnk = c.
P(A ∩ B) ≠ P(A)P(B),

while C = ∅ gives

P(A ∩ B ∩ C) = P(A)P(B)P(C) = 0,

which is surely not what we had in mind.
P(B|A) := P(B ∩ A)/P(A), for B ∈ F,
P(B|A) = P(B ∩ A)/P(A) = P(A|B)P(B)/P(A) = P(A|B)P(B)/(P(A|B)P(B) + P(A|B^c)P(B^c)).
Let us consider an example: a test for a certain disease. Evaluating Bayes' formula with the test's parameters, one finds that only 32% of the persons whose test results are positive actually have the disease.
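The text's concrete numbers are not reproduced above, so the following Python sketch evaluates Bayes' formula with one hypothetical set of parameters (prevalence 1%, sensitivity 95%, false-positive rate 2%), chosen so that the posterior comes out near the quoted 32%:

```python
# Bayes' formula: P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c))
# B = "person has the disease", A = "test result is positive".
# Hypothetical parameters (illustrative only, not taken from the text):
p_B = 0.01           # prevalence P(B)
p_A_given_B = 0.95   # sensitivity P(A|B)
p_A_given_Bc = 0.02  # false-positive rate P(A|B^c)

posterior = (p_A_given_B * p_B) / (p_A_given_B * p_B + p_A_given_Bc * (1 - p_B))
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.324
```

The low posterior despite a sensitive test comes from the large group of healthy people producing false positives.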
Since the independence of A_1, A_2, ... implies the independence of A_1^c, A_2^c, ..., we finally get (setting p_n := P(A_n)) that

P(⋂_{k=n}^∞ A_k^c) = lim_{N→∞, N≥n} P(⋂_{k=n}^N A_k^c)
                   = lim_{N→∞, N≥n} ∏_{k=n}^N P(A_k^c)
                   = lim_{N→∞, N≥n} ∏_{k=n}^N (1 − p_k)
                   ≤ lim_{N→∞, N≥n} ∏_{k=n}^N e^{−p_k}
                   = lim_{N→∞, N≥n} e^{−Σ_{k=n}^N p_k}
                   = e^{−Σ_{k=n}^∞ p_k} = e^{−∞} = 0,

where we have used that 1 − x ≤ e^{−x} for x ≥ 0.
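As a quick numerical sanity check (my addition, with the illustrative choice p_k = 1/k, for which Σ p_k = ∞), the partial products indeed tend to zero:

```python
# Partial products prod_{k=n}^{N} (1 - p_k) for p_k = 1/k (so sum p_k = infinity).
n = 2
for N in [10, 100, 1000, 10000]:
    prod = 1.0
    for k in range(n, N + 1):
        prod *= 1 - 1 / k
    print(f"N = {N:5d}: product = {prod:.6f}")
# Each product telescopes to (n-1)/N here, so the limit is 0, matching e^{-infinity}.
```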
F := σ(G), where G consists of the rectangles A_1 × A_2 defined below, with

(1) Ω_1 × Ω_2 := {(ω_1, ω_2) : ω_1 ∈ Ω_1, ω_2 ∈ Ω_2},
(2) A_1 × A_2 := {(ω_1, ω_2) : ω_1 ∈ A_1, ω_2 ∈ A_2} with A_1 ∈ F_1, A_2 ∈ F_2.
(3) P(B) = µ_{n,p}(B) := Σ_{k=0}^n \binom{n}{k} p^k (1 − p)^{n−k} δ_k(B), where δ_k is the Dirac measure at k.

Interpretation: coin-tossing with one coin, such that one has heads with probability p and tails with probability 1 − p. Then µ_{n,p}({k}) equals the probability that within n trials one has heads exactly k times.
or

A = {a} ∪ (a_1, b_1] ∪ (a_2, b_2] ∪ ··· ∪ (a_n, b_n]

where a ≤ a_1 ≤ b_1 ≤ ··· ≤ a_n ≤ b_n ≤ b. For such a set A we let

λ_0(⋃_{i=1}^∞ (a_i, b_i]) := Σ_{i=1}^∞ (b_i − a_i).
where we have on the left-hand side the conditional probability. In words: the probability of a realization larger than or equal to a + b, under the condition that one has already a value larger than or equal to a, is the same as that of a realization larger than or equal to b. Indeed, it holds that
Example 1.3.2 Suppose that the amount of time one spends in a post office is exponentially distributed with λ = 1/10.

(a) What is the probability that a customer will spend more than 15 minutes?

(b) What is the probability that a customer will spend more than 15 minutes in the post office, given that she or he has already been there for at least 10 minutes?

The answer for (a) is µ_λ([15, ∞)) = e^{−15·(1/10)} ≈ 0.223. For (b) we get

µ_λ([15, ∞)|[10, ∞)) = µ_λ([5, ∞)) = e^{−5·(1/10)} ≈ 0.607.
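Both answers can be verified with a few lines of Python, using the tail formula µ_λ([t, ∞)) = e^{−λt}; this is only an illustration of the computation above:

```python
from math import exp

lam = 1 / 10  # parameter lambda of the exponential distribution

def tail(t):
    # mu_lambda([t, infinity)) = exp(-lambda * t)
    return exp(-lam * t)

print(tail(15))             # (a): ~0.223
print(tail(15) / tail(10))  # (b): conditional probability, ~0.607
print(tail(5))              # memorylessness: the same value as (b)
```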
so that, by l'Hôpital's rule,

lim_{n→∞} (n − k) ln(1 − (λ + ε_0)/n) = −(λ + ε_0).

Hence

e^{−(λ+ε_0)} = lim_{n→∞} (1 − (λ + ε_0)/n)^{n−k} ≤ lim_{n→∞} (1 − (λ + ε_n)/n)^{n−k}.

In the same way we get

lim_{n→∞} (1 − (λ + ε_n)/n)^{n−k} ≤ e^{−(λ−ε_0)}.
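Poisson's Theorem asserts that the binomial probabilities converge to their Poisson counterparts when n p_n → λ; the following sketch illustrates this numerically for the illustrative choice λ = 2, k = 3, p_n = λ/n:

```python
from math import comb, exp, factorial

lam, k = 2.0, 3  # lambda and the value k whose probability we track
for n in [10, 100, 1000, 10000]:
    p = lam / n  # so that n * p_n -> lambda
    print(f"n = {n:5d}: mu_n_p({{{k}}}) = {comb(n, k) * p**k * (1 - p)**(n - k):.6f}")
print(f"Poisson limit: {exp(-lam) * lam**k / factorial(k):.6f}")
```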
1.4 A set which is not a Borel set
Given x, y ∈ (0, 1] and A ⊆ (0, 1], we also need the addition modulo one

x ⊕ y := x + y        if x + y ∈ (0, 1],
         x + y − 1    otherwise,

and

A ⊕ x := {a ⊕ x : a ∈ A}.
One then has, since A ↦ A ⊕ x is a bijection of (0, 1],

(B ⊕ x) \ (A ⊕ x) = (B \ A) ⊕ x.

By the axiom of choice there exists a map

ϕ : α ↦ m_α ∈ M_α.

In other words, one can form a set by choosing from each set M_α a representative m_α.
Proposition 1.4.6 There exists a subset H ⊆ (0, 1] which does not belong
to B((0, 1]).
P := {(a, b] : 0 ≤ a < b ≤ 1}
So the right-hand side can either be 0 (if a = 0) or ∞ (if a > 0). This leads to a contradiction, so H ∉ B((0, 1]).
Chapter 2
Random variables
Given a probability space (Ω, F, P), in many stochastic models one considers functions f : Ω → R which describe certain random phenomena, and one is interested in the computation of expressions like

P({ω ∈ Ω : f(ω) ∈ (a, b)}), where a < b.

This leads to the condition

{ω ∈ Ω : f(ω) ∈ (a, b)} ∈ F

and hence to the notion of a random variable, which we introduce now.
where

1I_{A_i}(ω) := 1 if ω ∈ A_i,  and  1I_{A_i}(ω) := 0 if ω ∉ A_i.
The definition above concerns only functions which take finitely many values, which will be too restrictive in the future. So we wish to extend this definition.
Does our definition give what we would like to have? Yes, as we see from the staircase approximation

f_n(ω) := Σ_{k=−4^n}^{4^n−1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω),

a sequence of measurable step-functions with f_n(ω) → f(ω) for all ω ∈ Ω.
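The following small Python sketch illustrates this staircase construction; the function f(x) = e^x and the sample points are arbitrary illustrative choices:

```python
import math

def staircase(y, n):
    # Value of the n-th staircase at a point where f equals y:
    # the largest k/2^n with k/2^n <= y, truncated to k in [-4^n, 4^n - 1].
    k = math.floor(y * 2**n)
    k = max(-(4**n), min(k, 4**n - 1))
    return k / 2**n

f = math.exp  # an arbitrary example function playing the role of f
for n in [1, 2, 4, 8]:
    print(n, [staircase(f(x), n) for x in (0.0, 0.5, 1.0)])
# The printed values increase towards f(x) = 1, 1.6487..., 2.7182... as n grows.
```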
Sometimes the following proposition, which is closely connected to Proposition 2.1.3, is useful.
Hence

(fg)(ω) = lim_{n→∞} f_n(ω) g_n(ω).
yields

(f_n g_n)(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i}(ω) 1I_{B_j}(ω) = Σ_{i=1}^k Σ_{j=1}^l α_i β_j 1I_{A_i ∩ B_j}(ω)
Proof of Proposition 2.2.2. (2) =⇒ (1) follows from (a, b) ∈ B(R) for a < b
which implies that f −1 ((a, b)) ∈ F.
(1) =⇒ (2) is a consequence of Lemma 2.2.3 since B(R) = σ((a, b) : −∞ <
a < b < ∞).
Proof. Since f is continuous we know that f −1 ((a, b)) is open for all −∞ <
a < b < ∞, so that f −1 ((a, b)) ∈ B(R). Since the open intervals generate
B(R) we can apply Lemma 2.2.3.
Now we state some general properties of measurable maps.
is (F1 , F3 )-measurable.
(2) Assume that P is a probability measure on F1 and define
µ(B2 ) := P ({ω1 ∈ Ω1 : f (ω1 ) ∈ B2 }) .
Pf (B) := P (ω ∈ Ω : f (ω) ∈ B)
is called the law of the random variable f .
{ω ∈ Ω : f (ω) ≤ x1 } ⊆ {ω ∈ Ω : f (ω) ≤ x2 }
and
(iii) The properties limx→−∞ F (x) = 0 and limx→∞ F (x) = 1 are an exercise.
(1) µ1 = µ2 .
Proof. (1) ⇒ (2) is of course trivial. We consider (2) ⇒ (1): For sets of type
A := (a1 , b1 ] ∪ · · · ∪ (an , bn ],
These equivalences follow by Lemma 2.2.3 and Proposition 2.2.2, respectively.
2.3 Independence
Let us first start with the notion of a family of independent random variables. In case we have a finite index set I, for example I = {1, ..., n}, the definition above is equivalent to
(2) For all families (Bi )i∈I of Borel sets Bi ∈ B(R) one has that the events
({ω ∈ Ω : fi (ω) ∈ Bi })i∈I are independent.
for all x ∈ R (one says that the distribution-functions Ff and Fg are abso-
lutely continuous with densities pf and pg , respectively). Then the indepen-
dence of f and g is also equivalent to
F_{(f,g)}(x, y) = ∫_{−∞}^x ∫_{−∞}^y p_f(u) p_g(v) dv du.
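A Monte Carlo illustration (not from the text): for two independent exponentially distributed random variables, the empirical joint distribution function factorizes into the product of the marginals up to sampling error.

```python
import random

random.seed(0)
N = 200_000
# f and g independent, both exponentially distributed with parameter 1
samples = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(N)]

x, y = 1.0, 0.5
joint = sum(1 for u, v in samples if u <= x and v <= y) / N
F_f = sum(1 for u, _ in samples if u <= x) / N
F_g = sum(1 for _, v in samples if v <= y) / N
print(f"F_(f,g)({x},{y}) = {joint:.4f}, F_f({x}) * F_g({y}) = {F_f * F_g:.4f}")
# The two numbers agree up to Monte Carlo error, as independence requires.
```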
Ω := RN = {x = (x1 , x2 , ...) : xn ∈ R} .
Πn (x) := xn ,
that means Πn filters out the n-th coordinate. Now we take the smallest
σ-algebra such that all these projections are random variables, that means
we take
B(R^ℕ) := σ(Π_n^{−1}(B) : n = 1, 2, ..., B ∈ B(R)).

The cylinder sets take the form

B_1 × B_2 × ··· × B_n × R × R × ··· := {x ∈ R^ℕ : x_1 ∈ B_1, ..., x_n ∈ B_n},

that means

P(Π_n ∈ B) = P_n(B)

for all B ∈ B(R).
Chapter 3
Integration
Ef = ∫_Ω f dP = ∫_Ω f(ω) dP(ω)
We have to check that the definition is correct, since it might be that different
representations give different expected values Eg. However, this is not the
case as shown by
one has that

Σ_{i=1}^n α_i P(A_i) = Σ_{j=1}^m β_j P(B_j).
Proof. By subtracting in both equations the right-hand side from the left-hand one we only need to show that

Σ_{i=1}^n α_i 1I_{A_i} = 0

implies that

Σ_{i=1}^n α_i P(A_i) = 0.
By taking all possible intersections of the sets A_i and by adding appropriate complements we find a system of sets C_1, ..., C_N ∈ F such that

(a) C_j ∩ C_k = ∅ if j ≠ k,

(b) ⋃_{j=1}^N C_j = Ω,

(c) for all A_i there is a set I_i ⊆ {1, ..., N} such that A_i = ⋃_{j∈I_i} C_j.
Now we get that

0 = Σ_{i=1}^n α_i 1I_{A_i} = Σ_{i=1}^n Σ_{j∈I_i} α_i 1I_{C_j} = Σ_{j=1}^N (Σ_{i: j∈I_i} α_i) 1I_{C_j} = Σ_{j=1}^N γ_j 1I_{C_j}
and

E(αf + βg) = α Σ_{i=1}^n α_i P(A_i) + β Σ_{j=1}^m β_j P(B_j) = αEf + βEg.
Ef = ∫_Ω f dP = ∫_Ω f(ω) dP(ω) := sup{Eg : 0 ≤ g(ω) ≤ f(ω), g is a measurable step-function}.
Note that in this definition the case Ef = ∞ is allowed. In the last step we
define the expectation for a general random variable.
A simple example for the expectation is the expected value when rolling a die: with f(ω) := ω on Ω = {1, ..., 6} and P({ω}) = 1/6 one gets

Ef = 1·(1/6) + 2·(1/6) + ··· + 6·(1/6) = 21/6 = 3.5.
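The same arithmetic as a one-line check in Python:

```python
# Expected value of a fair die: sum of (value * probability) over all outcomes.
print(sum(k * (1 / 6) for k in range(1, 7)))  # 3.5
```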
A property P(ω) is said to hold P-almost surely (a.s.) if the set

{ω ∈ Ω : P(ω) holds}

belongs to F and is of measure one. Let us start with some first properties of the expected value.
Proof. (1) follows directly from the definition. Property (2) can be seen
as follows: by definition, the random variable f is integrable if and only if
Ef + < ∞ and Ef − < ∞. Since
{ω ∈ Ω : f⁺(ω) ≠ 0} ∩ {ω ∈ Ω : f⁻(ω) ≠ 0} = ∅
The next lemma is useful later on. In this lemma we use, as an approximation
for f , a staircase-function. This idea was already exploited in the proof of
Proposition 2.1.3.
If f (ω) ≥ 0 for all ω ∈ Ω, then one can arrange fn (ω) ≥ 0 for all
ω ∈ Ω.
Ef = lim_{n→∞} Ef_n.

Proof. As in the proof of Proposition 2.1.3, staircase-functions of type

f_n(ω) = Σ_{k=−4^n}^{4^n−1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω)

converge to f. Setting

f_n^0(ω) := Σ_{k=0}^{4^n−1} (k/2^n) 1I_{{k/2^n ≤ f < (k+1)/2^n}}(ω),
we get 0 ≤ f_n^0(ω) ↑ f(ω) for all ω ∈ Ω. On the other hand, by the definition of the expectation there exists a sequence 0 ≤ g_n(ω) ≤ f(ω) of measurable step-functions such that Eg_n ↑ Ef. Hence

h_n := max{f_n^0, g_1, ..., g_n}
Consider
dk,n := fk ∧ hn .
Clearly, dk,n ↑ fk as n → ∞ and dk,n ↑ hn as k → ∞. Let
Hence

Ef = lim_n Eh_n = lim_n lim_k Ed_{k,n} = lim_k lim_n Ed_{k,n} = lim_k Ef_k,

where we have used the following fact: if 0 ≤ ϕ_n(ω) ↑ ϕ(ω) for step-functions ϕ_n and ϕ, then

lim_n Eϕ_n = Eϕ.
To check this, it is sufficient to assume that ϕ(ω) = 1IA (ω) for some A ∈ F.
Let ε ∈ (0, 1) and
Bεn := {ω ∈ A : 1 − ε ≤ ϕn (ω)} .
Then
(1 − ε)1IBεn (ω) ≤ ϕn (ω) ≤ 1IA (ω).
Since B_ε^n ⊆ B_ε^{n+1} and ⋃_{n=1}^∞ B_ε^n = A we get, by the monotonicity of the measure, that lim_n P(B_ε^n) = P(A), so that

(1 − ε)Eϕ = (1 − ε)P(A) ≤ lim_n Eϕ_n ≤ Eϕ.

Since ε ∈ (0, 1) was arbitrary, we are done.
for all ω ∈ Ω. Lemma 3.2.2, Proposition 3.1.3, and ϕn (ω) + ψn (ω) ↑ ϕ(ω) +
ψ(ω) give that
Eϕ + Eψ = lim_n Eϕ_n + lim_n Eψ_n = lim_n E(ϕ_n + ψ_n) = E(ϕ + ψ).
(2) is an exercise.
(3) If Ef⁻ = ∞ or Eg⁺ = ∞, then Ef = −∞ or Eg = ∞, so that there is nothing to prove. Hence assume that Ef⁻ < ∞ and Eg⁺ < ∞. The inequality f ≤ g gives 0 ≤ f⁺ ≤ g⁺ and 0 ≤ g⁻ ≤ f⁻, so that f and g are integrable and

Ef = Ef⁺ − Ef⁻ ≤ Eg⁺ − Eg⁻ = Eg.
(4) Since (af + bg)+ ≤ |a||f | + |b||g| and (af + bg)− ≤ |a||f | + |b||g| we get
that af + bg is integrable. The equality for the expected values follows from
(1) and (2).
For each f_n take a sequence of step functions (f_{n,k})_{k≥1} such that 0 ≤ f_{n,k} ↑ f_n as k → ∞. Setting

h_N := max_{1≤n≤N, 1≤k≤N} f_{n,k}

we get a nondecreasing sequence of step functions h_N with

f_{n,N} ≤ h_N ≤ f_N for all 1 ≤ n ≤ N.

Letting h := lim_N h_N, the first inequality gives f_n ≤ h ≤ f for all n, and therefore

f = lim_{n→∞} f_n ≤ h ≤ f,

so that h = f. Hence, by the lemma above, Eh_N ↑ Ef, and h_N ≤ f_N implies

Ef ≤ lim_{N→∞} Ef_N.

On the other hand, f_n ≤ f_{n+1} ≤ f implies Ef_n ≤ Ef and hence

lim_{n→∞} Ef_n ≤ Ef.
(b) Now let 0 ≤ f_n(ω) ↑ f(ω) a.s. By definition, this means that 0 ≤ f_n(ω) ↑ f(ω) for all ω ∉ A, where P(A) = 0. Hence 0 ≤ f_n(ω)1I_{A^c}(ω) ↑ f(ω)1I_{A^c}(ω) for all ω, and step (a) implies that

lim_n Ef_n 1I_{A^c} = Ef 1I_{A^c}.
Proof. We only prove the first inequality. The second one follows from the
definition of lim sup and lim inf, the third one can be proved like the first
one. So we let
Z_k := inf_{n≥k} f_n.
Proposition 3.2.8 If f and g are independent with E|f| < ∞ and E|g| < ∞, then E|fg| < ∞ and

Efg = Ef Eg.

The proof is an exercise.
with the Riemann-integral on the left-hand side and the expectation of the
random variable f with respect to the probability space ([0, 1], B([0, 1]), λ),
where λ is the Lebesgue measure, on the right-hand side.
Then

∫_{−∞}^∞ f(x) p(x) dx = Ef
with the Riemann-integral on the left-hand side and the expectation of the
random variable f with respect to the probability space (R, B(R), P) on the
right-hand side.
Let us consider two examples indicating the difference between the Riemann-
integral and our expected value.
Then

∫_A g(η) dP_ϕ(η) = ∫_{ϕ^{−1}(A)} g(ϕ(ω)) dP(ω)
for all A ∈ E in the sense that if one integral exists, the other exists as well,
and their values are equal.
then we are done. By additivity it is enough to check gn (η) = 1IB (η) for some
B ∈ E (if this is true for this case, then one can multiply by real numbers
and can take sums and the equality remains true). But now we get
for all −∞ < a < b < ∞, where p : R → [0, ∞) is a continuous function such that ∫_{−∞}^∞ p(x) dx = 1, using the Riemann-integral. Letting n ∈ {1, 2, ...} and g(x) := x^n, we get that

Eϕ^n = ∫_Ω ϕ(ω)^n dP(ω) = ∫_R g(x) dP_ϕ(x) = ∫_{−∞}^∞ x^n p(x) dx.
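As a numerical illustration (my addition), one can compare both sides for the standard Gaussian density and n = 2: quadrature for the integral on the right, and a Monte Carlo estimate for the expectation on the left.

```python
import math
import random

random.seed(1)

def p(x):
    # standard Gaussian density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Right-hand side: Riemann sum of x^2 p(x) over [-8, 8].
h = 0.001
integral = sum((i * h) ** 2 * p(i * h) * h for i in range(-8000, 8001))

# Left-hand side: Monte Carlo estimate of E phi^2 for phi ~ N(0, 1).
mc = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)) / 100_000

print(f"integral = {integral:.4f}, Monte Carlo = {mc:.4f}")  # both close to 1
```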
with p_k ≥ 0, Σ_{k=1}^∞ p_k = 1, and some η_k ∈ E (that means that the image measure of P with respect to ϕ is 'discrete'). Then

∫_Ω g(ϕ(ω)) dP(ω) = ∫_E g(η) dP_ϕ(η) = Σ_{k=1}^∞ p_k g(η_k).
(5) 1 x = x.
(2) 1IΩ ∈ H.
For the following it is convenient to allow that the random variables may
take infinite values.
∫_Ω f dP = lim_{N→∞} ∫_Ω [f ∧ N] dP.
For the following, we recall that the product space (Ω1 ×Ω2 , F1 ⊗F2 , P1 × P2 )
of the two probability spaces (Ω1 , F1 , P1 ) and (Ω2 , F2 , P2 ) was defined in
Definition 1.2.15.
∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(P_1 × P_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
  = ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2).
It should be noted that item (3) together with Formula (3.2) automatically implies that

P_2({ω_2 : ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) = ∞}) = 0

and

P_1({ω_1 : ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) = ∞}) = 0.
Proof of Proposition 3.5.4.
(i) First we remark that it is sufficient to prove the assertions for
fN (ω1 , ω2 ) := min {f (ω1 , ω2 ), N }
which is bounded. The statements (1), (2), and (3) can be obtained via
N → ∞ if we use Proposition 2.1.4 to get the necessary measurabilities
(which also works for our extended random variables) and the monotone
convergence formulated in Proposition 3.2.4 to get the values of the integrals.
Hence we can assume for the following that supω1 ,ω2 f (ω1 , ω2 ) < ∞.
(ii) We want to apply the Monotone Class Theorem Proposition 3.5.2. Let
H be the class of bounded F1 × F2 -measurable functions f : Ω1 × Ω2 → R
such that
(a) the functions ω1 → f (ω1 , ω20 ) and ω2 → f (ω10 , ω2 ) are F1 -measurable
and F2 -measurable, respectively, for all ωi0 ∈ Ωi ,
(b) the functions

ω_1 → ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2)  and  ω_2 → ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1)

are F_1-measurable and F_2-measurable, respectively,

(c) one has

∫_{Ω_1×Ω_2} f d(P_1 × P_2) = ∫_{Ω_1} ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
  = ∫_{Ω_2} ( ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2).
Again, using Propositions 2.1.4 and 3.2.4 we see that H satisfies the assump-
tions (1), (2), and (3) of Proposition 3.5.2. As π-system I we take the system
of all F = A×B with A ∈ F1 and B ∈ F2 . Letting f (ω1 , ω2 ) = 1IA (ω1 )1IB (ω2 )
we can easily check that f ∈ H. For instance, property (c) follows from

∫_{Ω_1×Ω_2} f d(P_1 × P_2) = ∫_{Ω_1} ( ∫_{Ω_2} 1I_A(ω_1) 1I_B(ω_2) dP_2(ω_2) ) dP_1(ω_1) = ∫_{Ω_1} 1I_A(ω_1) P_2(B) dP_1(ω_1) = P_1(A) P_2(B).
Applying the Monotone Class Theorem Proposition 3.5.2 gives that H consists of all bounded functions f : Ω_1 × Ω_2 → R which are measurable with respect to F_1 × F_2. Hence we are done.
and

ω_2 → 1I_{M_2}(ω_2) ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1).

Moreover,

∫_{Ω_1×Ω_2} f(ω_1, ω_2) d(P_1 × P_2) = ∫_{Ω_1} 1I_{M_1}(ω_1) ( ∫_{Ω_2} f(ω_1, ω_2) dP_2(ω_2) ) dP_1(ω_1)
  = ∫_{Ω_2} 1I_{M_2}(ω_2) ( ∫_{Ω_1} f(ω_1, ω_2) dP_1(ω_1) ) dP_2(ω_2)
by Fubini’s Theorem.
so that
∫_{−1}^1 ( ∫_{−1}^1 f(x, y) dµ(x) ) dµ(y) = ∫_{−1}^1 ( ∫_{−1}^1 f(x, y) dµ(y) ) dµ(x) = 0.
The inequality holds because on the right hand side we integrate only over
the area {(x, y) : x2 + y 2 ≤ 1} which is a subset of [−1, 1] × [−1, 1] and
∫_0^{2π} |sin ϕ cos ϕ| dϕ = 4 ∫_0^{π/2} sin ϕ cos ϕ dϕ = 2.
g(Ef ) ≤ Eg(f )
Example 3.6.4 (1) The function g(x) := |x| is convex so that, for any
integrable f ,
|Ef | ≤ E|f |.
(2) For 1 ≤ p < ∞ the function g(x) := |x|^p is convex, so that Jensen's inequality applied to |f| gives that

|Ef|^p ≤ (E|f|)^p ≤ E|f|^p.
For the second case in the example above there is another way we can go. It
uses the famous Hölder-inequality.
Proof. We can assume that E|f |p > 0 and E|g|q > 0. For example, assuming
E|f |p = 0 would imply |f |p = 0 a.s. according to Proposition 3.2.1 so that
f g = 0 a.s. and E|f g| = 0. Hence we may set
f̃ := f / (E|f|^p)^{1/p}  and  g̃ := g / (E|g|^q)^{1/q}.
We notice that

x^a y^b ≤ ax + by

for x, y ≥ 0 and positive a, b with a + b = 1, which follows from the concavity of the logarithm (we can assume for a moment that x, y > 0):

ln(ax + by) ≥ a ln x + b ln y = ln x^a + ln y^b = ln(x^a y^b).

Setting x := |f̃|^p, y := |g̃|^q, a := 1/p, and b := 1/q, we get

|f̃ g̃| = x^a y^b ≤ ax + by = (1/p)|f̃|^p + (1/q)|g̃|^q
and

E|f̃ g̃| ≤ (1/p)E|f̃|^p + (1/q)E|g̃|^q = 1/p + 1/q = 1.

On the other hand,

E|f̃ g̃| = E|fg| / ((E|f|^p)^{1/p} (E|g|^q)^{1/q}),

so that E|fg| ≤ (E|f|^p)^{1/p} (E|g|^q)^{1/q}.
Corollary 3.6.6 For 0 < p < q < ∞ one has that (E|f|^p)^{1/p} ≤ (E|f|^q)^{1/q}.
(1/N) Σ_{n=1}^N |a_n b_n| ≤ ((1/N) Σ_{n=1}^N |a_n|^p)^{1/p} ((1/N) Σ_{n=1}^N |b_n|^q)^{1/q}
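A quick numerical check of this discrete version (my addition), with the illustrative choice p = q = 2, i.e. the Cauchy-Schwarz case:

```python
import random

random.seed(2)
p = q = 2.0  # conjugate exponents with 1/p + 1/q = 1 (Cauchy-Schwarz case)
N = 1000
a = [random.uniform(-1, 1) for _ in range(N)]
b = [random.uniform(-1, 1) for _ in range(N)]

lhs = sum(abs(x * y) for x, y in zip(a, b)) / N
rhs = (sum(abs(x) ** p for x in a) / N) ** (1 / p) \
    * (sum(abs(y) ** q for y in b) / N) ** (1 / q)
print(f"lhs = {lhs:.4f} <= rhs = {rhs:.4f}")  # Hoelder's inequality holds
```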
prove, we get that E|f + g|^p < ∞ as well by the above considerations. Taking 1 < q < ∞ with 1/p + 1/q = 1, we continue the estimate, noting that 1 − 1/q = 1/p.
P(|f − Ef| ≥ λ) ≤ E(f − Ef)² / λ² ≤ Ef² / λ².
Proof. From Corollary 3.6.6 we get that E|f| < ∞, so that Ef exists. Applying Proposition 3.6.1 to |f − Ef|² gives that

P(|f − Ef| ≥ λ) = P(|f − Ef|² ≥ λ²) ≤ E(f − Ef)² / λ².
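A small simulation (not from the text) comparing the empirical tail probability with Chebyshev's bound for uniformly distributed random variables:

```python
import random

random.seed(3)
N = 100_000
samples = [random.uniform(0, 1) for _ in range(N)]
mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N  # ~1/12 for Uniform(0,1)

lam = 0.4
tail = sum(1 for x in samples if abs(x - mean) >= lam) / N
print(f"P(|f - Ef| >= {lam}) ~ {tail:.4f} <= bound {var / lam**2:.4f}")
# Empirical tail ~0.2, Chebyshev bound ~0.52: the inequality holds with room.
```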
Chapter 4

Modes of convergence
4.1 Definitions
Let us introduce some basic types of convergence.
E|fn − f |p → 0 as n → ∞.
For the above types of convergence the random variables have to be defined
on the same probability space. There is a variant without this assumption.
Eψ(f_n) → Eψ(f) as n → ∞
Example 4.1.4 Assume ([0, 1], B([0, 1]), λ) where λ is the Lebesgue measure. We take

f_1 = 1I_{[0,1/2)}, f_2 = 1I_{[1/2,1]},
f_3 = 1I_{[0,1/4)}, f_4 = 1I_{[1/4,1/2]}, f_5 = 1I_{[1/2,3/4)}, f_6 = 1I_{[3/4,1]},
f_7 = 1I_{[0,1/8)}, ...

This implies that f_n(x) does not converge to 0 for any x ∈ [0, 1]. But convergence in probability f_n →^λ 0 does hold: choosing 0 < ε < 1 we get

λ({x ∈ [0, 1] : |f_n(x)| > ε}) = λ({x ∈ [0, 1] : f_n(x) ≠ 0})
  = 1/2  if n = 1, 2,
  = 1/4  if n = 3, 4, ..., 6,
  = 1/8  if n = 7, ...,
  ...

which tends to zero as n → ∞.
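The structure of this example can be made concrete with a short Python sketch (an illustration only; interval endpoints are ignored):

```python
def support(n):
    # Support [a, b) of f_n: f_1, f_2 run over halves, f_3..f_6 over quarters,
    # f_7..f_14 over eighths, and so on (interval endpoints ignored here).
    level, first = 1, 1
    while n >= first + 2**level:
        first += 2**level
        level += 1
    width = 1 / 2**level
    j = n - first
    return j * width, (j + 1) * width

for n in [1, 2, 3, 6, 7, 14, 15]:
    a, b = support(n)
    print(f"f_{n}: indicator of [{a}, {b}), measure {b - a}")
# The measure of the support tends to 0 (convergence in probability to 0),
# but the supports sweep through [0, 1] on every level, so each x is hit
# infinitely often and f_n(x) does not converge pointwise.
```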
Then

(f_1 + ··· + f_n)/n →^P m as n → ∞,

that means, for each ε > 0,

P({ω : |(f_1 + ··· + f_n)/n − m| > ε}) → 0 as n → ∞.
= E(Σ_{k=1}^n (f_k − m))² / (n² ε²) = nσ² / (n² ε²) = σ² / (nε²) → 0

as n → ∞.
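A simulation sketch of the weak law (my addition), averaging fair die rolls, for which m = 3.5:

```python
import random

random.seed(4)
m = 3.5  # expected value of one fair die roll
for n in [10, 1000, 100_000]:
    avg = sum(random.randint(1, 6) for _ in range(n)) / n
    print(f"n = {n:6d}: sample mean = {avg:.3f}  (m = {m})")
# The sample means concentrate around m as n grows.
```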
Using a stronger condition, we easily get more: almost sure convergence instead of convergence in probability.
inequality,

Ef_k² ≤ (Ef_k⁴)^{1/2} ≤ c^{1/2}.
and

T := ⋂_{n=1}^∞ F_n^∞.
Example 4.2.5 Let us come back to the set A considered in Formula (4.1). For all n ∈ {1, 2, ...} we have

A = {ω : Σ_{k=n}^∞ ε_k(ω)/k converges} ∈ F_n^∞

so that A ∈ T.
P(fn ≤ λ) = P(fk ≤ λ)
for all n, k = 1, 2, ... and all λ ∈ R.
Let (Ω, F, P) be a probability space and (f_n)_{n=1}^∞ a sequence of i.i.d. random variables with Ef_1 = 0 and Ef_1² = σ². By the law of large numbers we know

(f_1 + ··· + f_n)/n →^P 0.
Hence the law of the limit is the Dirac measure δ_0. Is there a right scaling factor c(n) such that

(f_1 + ··· + f_n)/c(n) → g,

where g is a non-degenerate random variable in the sense that P_g ≠ δ_0? And in which sense does the convergence take place? The answer is the following:

P((f_1 + ··· + f_n)/(σ√n) ≤ x) → (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du

for all x ∈ R as n → ∞, that means that

(f_1 + ··· + f_n)/(σ√n) →^d g

for any g with P(g ≤ x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.
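A simulation sketch of the central limit theorem (my addition): standardized sums of centered die rolls are compared against the Gaussian distribution function at x = 1. The choices n = 50 and 20,000 trials are illustrative only.

```python
import math
import random

random.seed(5)
n, trials = 50, 20_000
m, sigma = 3.5, math.sqrt(35 / 12)  # mean and standard deviation of one die roll

count = 0
for _ in range(trials):
    s = sum(random.randint(1, 6) for _ in range(n))
    if (s - n * m) / (sigma * math.sqrt(n)) <= 1.0:  # standardized sum
        count += 1

Phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # Gaussian value, ~0.8413
print(f"empirical P <= 1: {count / trials:.4f},  Phi(1) = {Phi_1:.4f}")
```

The empirical frequency matches Φ(1) to within sampling error, illustrating the convergence in distribution asserted above.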