
Probability and Measure

Stefan Grosskinsky
Cambridge, Michaelmas 2006

These notes and other information about the course are available on
www.statslab.cam.ac.uk/stefan/teaching/probmeas.html
The text is based on and partially copied from notes on the same course by James Norris, Alan Stacey and Geoffrey Grimmett.

Contents

Introduction

1 Set systems and measures
  1.1 Set systems
  1.2 Measures
  1.3 Extension and uniqueness
  1.4 Lebesgue(-Stieltjes) measure
  1.5 Independence and Borel-Cantelli lemmas

2 Measurable Functions and Random Variables
  2.1 Measurable Functions
  2.2 Random Variables
  2.3 Convergence of measurable functions

3 Integration
  3.1 Definition and basic properties
  3.2 Integrals and limits
  3.3 Integration in R and differentiation
  3.4 Product measure and Fubini's theorem

4 Lp-spaces
  4.1 Norms and inequalities
  4.2 L2 as a Hilbert space
  4.3 Convergence in L1

5 Characteristic functions and Gaussian random variables
  5.1 Definitions
  5.2 Properties of characteristic functions
  5.3 Gaussian random variables

6 Ergodic theory and sums of random variables
  6.1 Motivation
  6.2 Measure-preserving transformations
  6.3 Ergodic Theorems
  6.4 Limit theorems for sums of random variables

Appendix A Example sheets
  A.1 Example sheet 1: Set systems and measures
  A.2 Example sheet 2: Measurable functions and integration
  A.3 Example sheet 3: Convergence, Fubini, Lp-spaces
  A.4 Example sheet 4: Characteristic functions, Gaussian rvs, ergodic theory

Appendix B Hand-outs
  B.1 Hand-out 1: Proof of Carathéodory's extension theorem
  B.2 Hand-out 2: Convergence of random variables
  B.3 Hand-out 3: Connection between Lebesgue and Riemann integration
  B.4 Hand-out 4: Ergodic theorems

Introduction

Motivation from two perspectives:

1. Probability
Let (Ω, P(Ω), P) be a probability space, where Ω is a set, P(Ω) the set of events (the power set in this case) and P : P(Ω) → [0, 1] is the probability measure.
If Ω is countable then we have for every A ∈ P(Ω)

    P(A) = Σ_{ω∈A} P({ω}) .

So calculating probabilities just involves (possibly infinite) sums.
If Ω = [0, 1] and P is the uniform probability measure on [0, 1], then P({ω}) = 0 for every ω ∈ Ω. So

    1 = P([0, 1]) ≠ Σ_{ω∈[0,1]} P({ω}) .

2. Integration (Analysis)
When P can be described by a density ρ : Ω → [0, ∞) we can handle the situation via

    P(A) = ∫_A ρ(x) dx = ∫ 1_A(x) ρ(x) dx ,        (*)

where 1_A is the indicator function of the set A. In the example above ρ(x) ≡ 1, and this leads to P([a, b]) = ∫₀¹ 1_{[a,b]}(x) dx = ∫_a^b dx = b − a.
In general this approach only makes sense if the integral (*) exists. Using the theory of Riemann integration we are fine as long as A is a finite union or intersection of intervals and ρ(x) is, e.g., continuous. But e.g. for A = [0, 1] ∩ Q, the Riemann integral ∫₀¹ 1_A(x) dx is not defined, although the probability of this event is intuitively 0.
Moreover, since A is countable, it can be written as A = {a_n : n ∈ N}. Define f_n := 1_{{a₁,…,a_n}}, with f_n → 1_A pointwise as n → ∞. For every n, f_n is Riemann integrable and ∫₀¹ f_n(x) dx = 0. So it should be the case that

    lim_{n→∞} ∫₀¹ f_n(x) dx = 0 =(?) ∫₀¹ 1_A(x) dx ,

but the latter integral is not defined. Thus the concept of Riemann integrals is not satisfactory for two reasons: the set of Riemann-integrable functions is not closed (under pointwise limits), and there are too many functions which are not Riemann integrable.
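The failure can be made concrete numerically. Below is a small Python sketch (illustrative, not from the notes) computing Riemann sums for 1_{[0,1]∩Q}: every subinterval contains both rational and irrational points, so sampling at rationals gives sum 1 and sampling at irrationals gives sum 0, for every partition — no common limit exists. The rationality "test" via the Fraction type is only a device that is exact for the particular sample points chosen here.

```python
from fractions import Fraction
import math

def riemann_sum(f, sample, n):
    # Riemann sum of f over [0, 1] with n equal subintervals,
    # evaluating f at the point sample(i, n) of the i-th subinterval
    return sum(f(sample(i, n)) for i in range(n)) / n

def indicator_Q(x):
    # stand-in for 1_Q: exact for the sample points used below, which are
    # Fractions (rational) or floats built from sqrt(2) (irrational)
    return 1.0 if isinstance(x, Fraction) else 0.0

rational_pt = lambda i, n: Fraction(2 * i + 1, 2 * n)        # midpoint, always rational
irrational_pt = lambda i, n: i / n + math.sqrt(2) / (2 * n)  # irrational point in the same subinterval

for n in (10, 100, 1000):
    print(n, riemann_sum(indicator_Q, rational_pt, n),
             riemann_sum(indicator_Q, irrational_pt, n))   # always 1.0 and 0.0
```

Refining the partition changes nothing: the upper and lower sums stay at 1 and 0.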
Goals of this course
• Generalisation of Riemann integration to Lebesgue integration using measure theory, involving a precise treatment of the sets A and functions ρ for which (*) is defined
• Using measure theory as the basis of advanced probability and discussing applications in that area

Official schedule
Measure spaces, σ-algebras, π-systems and uniqueness of extension, statement * and proof * of Carathéodory's extension theorem. Construction of Lebesgue measure on R. The Borel σ-algebra of R. Existence of non-measurable subsets of R. Lebesgue-Stieltjes measures and probability distribution functions. Independence of events, independence of σ-algebras. The Borel-Cantelli lemmas. Kolmogorov's zero-one law. [6]
Measurable functions, random variables, independence of random variables. Construction of the integral, expectation. Convergence in measure and convergence almost everywhere. Fatou's lemma, monotone and dominated convergence, differentiation under the integral sign. Discussion of product measure and statement of Fubini's theorem. [6]
Chebyshev's inequality, tail estimates. Jensen's inequality. Completeness of Lp for 1 ≤ p ≤ ∞. The Hölder and Minkowski inequalities, uniform integrability. [4]
L2 as a Hilbert space. Orthogonal projection, relation with elementary conditional probability. Variance and covariance. Gaussian random variables, the multivariate normal distribution. [2]
The strong law of large numbers, proof for independent random variables with bounded fourth moments. Measure-preserving transformations, Bernoulli shifts. Statements * and proofs * of the maximal ergodic theorem and Birkhoff's almost everywhere ergodic theorem, proof of the strong law. [4]
The Fourier transform of a finite measure, characteristic functions, uniqueness and inversion. Weak convergence, statement of Lévy's convergence theorem for characteristic functions. The central limit theorem. [2]
Appropriate books
P. Billingsley, Probability and Measure. Wiley 1995 (hardback).
R.M. Dudley, Real Analysis and Probability. CUP 2002 (paperback).
R.L.Schilling, Measures, Integrals and Martingales. CUP 2005 (paperback).
R.T. Durrett, Probability: Theory and Examples. Wadsworth & Brooks/Cole 1991 (hardback).
D. Williams, Probability with Martingales. CUP (paperback).
From the point of view of analysis, the first chapters of this book might be interesting:
S. Kantorovitz, Introduction to Modern Analysis. Oxford 2003 (hardback).

Non-examinable material
• proof of Carathéodory's extension theorem on hand-out 1
• part (a) of the proof of Skorohod's representation theorem on hand-out 2
• connection between Lebesgue and Riemann integration on hand-out 3
• proof of the maximal ergodic lemma and Birkhoff's almost everywhere ergodic theorem on hand-out 4 and in Section 6

1 Set systems and measures

Let E be an arbitrary set and ℰ ⊆ P(E) a set of subsets. To define a measure μ : ℰ → [0, ∞] (see Section 1.2) we first need to identify a proper domain of definition.

1.1 Set systems

Definition 1.1. Say that ℰ is a ring if for all A, B ∈ ℰ:
(i) ∅ ∈ ℰ ,  (ii) B \ A ∈ ℰ ,  (iii) A ∪ B ∈ ℰ .
Say that ℰ is an algebra (or field) if for all A, B ∈ ℰ:
(i) ∅ ∈ ℰ ,  (ii) A^c = E \ A ∈ ℰ ,  (iii) A ∪ B ∈ ℰ .
Say that ℰ is a σ-algebra (or σ-field) if for all A and A₁, A₂, … ∈ ℰ:
(i) ∅ ∈ ℰ ,  (ii) A^c ∈ ℰ ,  (iii) ∪_{n∈N} A_n ∈ ℰ .

Properties. (i) A (σ-)algebra is closed under (countably) finitely many set operations, since

    A ∩ B = (A^c ∪ B^c)^c ∈ ℰ ,    ∩_{n∈N} A_n = ( ∪_{n∈N} A_n^c )^c ,¹
    A \ B = A ∩ B^c ∈ ℰ ,    A △ B = (A \ B) ∪ (B \ A) ∈ ℰ .

(ii) Thus: ℰ is a σ-algebra ⇒ ℰ is an algebra ⇒ ℰ is a ring.
In general the converse statements are false, but in special cases they hold: an algebra is a σ-algebra if ℰ is finite, and a ring is an algebra if E ∈ ℰ.
Examples. (i) P(E) and {∅, E} are the largest and smallest σ-algebras on E, respectively.
(ii) E = R, ℛ = { ∪_{i=1}^n (aᵢ, bᵢ] : aᵢ < bᵢ, i = 1, …, n, n ∈ N₀ } is the ring of half-open intervals (∅ is given by the empty union, n = 0).
ℛ is an algebra if we allow for infinite intervals and identify R = (−∞, ∞].
(iii) Beware: { ∪_{i=1}^∞ (aᵢ, bᵢ] : aᵢ, bᵢ ∈ [−∞, ∞], aᵢ < bᵢ, i ∈ N } is not a σ-algebra (see problem 1.9(a)).

Lemma 1.1. Let {ℰᵢ : i ∈ I} be a (possibly uncountable) collection of σ-algebras. Then ∩_{i∈I} ℰᵢ is a σ-algebra, whereas ∪_{i∈I} ℰᵢ in general is not.

Proof. Let ℰ = ∩_{i∈I} ℰᵢ. We check (i) to (iii) in the above definition:
(i) Since ∅ ∈ ℰᵢ for all i ∈ I, ∅ ∈ ℰ. (ii) For all A ∈ ℰ, A^c ∈ ℰᵢ for all i ∈ I, so A^c ∈ ℰ.
(iii) Let A₁, A₂, … ∈ ℰ. Then A_k ∈ ℰᵢ for all k ∈ N and i ∈ I, hence ∪_{n∈N} A_n ∈ ℰᵢ for each i ∈ I and so ∪_{n∈N} A_n ∈ ℰ. For the second part see problem 1.1(c).  □

Definition 1.2. Let 𝒜 ⊆ P(E). Then the σ-algebra generated by 𝒜 is

    σ(𝒜) := ∩_{ℰ ⊇ 𝒜, ℰ σ-alg.} ℰ ,    the smallest σ-algebra containing 𝒜.

¹ as long as (ii) is fulfilled, ∪ and ∩ are equivalent

Remarks. (i) If ℰ is a σ-algebra, then σ(ℰ) = ℰ.
(ii) Let 𝒜₁, 𝒜₂ ⊆ P(E) with 𝒜₁ ⊆ 𝒜₂. Then σ(𝒜₁) ⊆ σ(𝒜₂).

Examples. (i) Let ∅ ≠ A ⊊ E. Then σ({A}) = {∅, E, A, A^c}.
(ii) If ℰ is a σ-algebra, so is A ∩ ℰ = {A ∩ B : B ∈ ℰ} for each A ∈ ℰ, called the trace σ-algebra of A.
The next example is so important that we spend an extra definition on it.

Definition 1.3. Let (E, τ) be a topological space with topology τ ⊆ P(E) (the set of open sets)². Then σ(τ) is called the Borel σ-algebra of E, denoted by ℬ(E). A ∈ ℬ(E) is called a Borel set. One usually writes ℬ(R) = ℬ.
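On a finite set, a generated σ-algebra can be computed by brute force: close the generators under complement and union (countable unions reduce to finite ones here). A Python sketch (illustrative helper, not part of the notes), reproducing Example (i), σ({A}) = {∅, E, A, A^c}:

```python
from itertools import combinations

def generate_sigma_algebra(E, generators):
    """Close `generators` under complement and pairwise union;
    on a finite set E this yields the generated sigma-algebra."""
    E = frozenset(E)
    sigma = {frozenset(), E} | {frozenset(A) for A in generators}
    changed = True
    while changed:
        changed = False
        for A in list(sigma):
            if E - A not in sigma:          # close under complement
                sigma.add(E - A)
                changed = True
        for A, B in combinations(list(sigma), 2):
            if A | B not in sigma:          # close under union
                sigma.add(A | B)
                changed = True
    return sigma

E = {1, 2, 3, 4}
sigma = generate_sigma_algebra(E, [{1, 2}])
print(sorted(tuple(sorted(S)) for S in sigma))
# [(), (1, 2), (1, 2, 3, 4), (3, 4)] -- exactly the four sets of Example (i)
```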
Lemma 1.2. Let E = R, ℛ the ring of half-open intervals and ℐ = { (a, b] : a < b } the set of all half-open intervals. Then σ(ℛ) = σ(ℐ) = ℬ.

Proof. (i) ℐ ⊆ ℛ ⇒ σ(ℐ) ⊆ σ(ℛ). On the other hand, each A ∈ ℛ can be written as A = ∪_{i=1}^n (aᵢ, bᵢ] ∈ σ(ℐ). Thus ℛ ⊆ σ(ℐ) ⇒ σ(ℛ) ⊆ σ(ℐ).
(ii) Each A ∈ ℐ can be written as A = (a, b] = ∩_{n=1}^∞ (a, b + 1/n) ∈ ℬ ⇒ σ(ℐ) ⊆ ℬ.
Let A ⊆ R be open, i.e. ∀ x ∈ A ∃ εₓ > 0 : (x − εₓ, x + εₓ) ⊆ A. Thus

    ∀ x ∈ A ∃ aₓ, bₓ ∈ Q : {x} ⊆ (aₓ, bₓ] ⊆ A .

Then A = ∪_{x∈A} (aₓ, bₓ], which is a countable union since aₓ, bₓ ∈ Q.
Thus A ∈ σ(ℐ) ⇒ ℬ ⊆ σ(ℐ).  □

Remarks. (i) The Borel σ-algebra on A ⊆ R is ℬ(A) = A ∩ ℬ (the trace σ-algebra of A).
(ii) Analogously, ℬ(R^d) is generated by ℐ^d = { ∏_{i=1}^d (aᵢ, bᵢ] : aᵢ < bᵢ, i = 1, …, d }, and this is consistent with d = 1 in the sense that ℬ(R^d) = ℬ^d.

Definition 1.4. Let ℰ ⊆ P(E) be a σ-algebra. The pair (E, ℰ) is a measurable space and the elements of ℰ are measurable sets.
If E is finite or countably infinite, one usually takes ℰ = P(E) as the relevant σ-algebra.

1.2 Measures

Definition 1.5. Let ℰ be a ring on E. A set function is any μ : ℰ → [0, ∞] with μ(∅) = 0.
μ is called additive if for all A, B ∈ ℰ with A ∩ B = ∅:  μ(A ∪ B) = μ(A) + μ(B).
μ is called countably additive (or σ-additive) if for all sequences (A_n)_{n∈N} in ℰ with Aᵢ ∩ Aⱼ = ∅ for i ≠ j and ∪_{n∈N} A_n ∈ ℰ:

    μ( ∪_{n∈N} A_n ) = Σ_{n∈N} μ(A_n) .

² Besides the direct definition of open sets for E = R^d, there is also an axiomatic definition of a topology τ, namely: (i) ∅, E ∈ τ; (ii) A ∩ B ∈ τ for all A, B ∈ τ; (iii) ∪_{i∈I} Aᵢ ∈ τ whenever Aᵢ ∈ τ for all i ∈ I.

Note. countably additive ⇒ additive (take A₁ = A, A₂ = B, A₃ = A₄ = … = ∅).

Definition 1.6. Let (E, ℰ) be a measurable space. A countably additive set function μ : ℰ → [0, ∞] is called a measure, and the triple (E, ℰ, μ) is called a measure space.
If μ(E) < ∞, μ is called finite. If μ(E) = 1, μ is a probability measure and (E, ℰ, μ) is a probability space. If E is a topological space and ℰ = ℬ(E), then μ is called a Borel measure.

Basic properties. Let (E, ℰ, μ) be a measure space.
(i) μ is non-decreasing: for all A, B ∈ ℰ with A ⊆ B, μ(B) = μ(B \ A) + μ(A) ≥ μ(A).
(Note: the version μ(B \ A) = μ(B) − μ(A) only makes sense if μ(A) < ∞.)
(ii) μ is subadditive: for all A, B ∈ ℰ, μ(A ∪ B) ≤ μ(A) + μ(B), since

    μ(A) + μ(B) = μ(A \ B) + μ(A ∩ B) + μ(B \ A) + μ(A ∩ B) = μ(A ∪ B) + μ(A ∩ B) ≥ μ(A ∪ B) .

(Again: μ(A ∪ B) = μ(A) + μ(B) − μ(A ∩ B) only if μ(A ∩ B) < ∞.)
(iii) μ is also countably subadditive (see problem 1.6(b)).
(iv) Let ℰ₁ ⊆ ℰ₂ be σ-algebras. If μ is a measure on ℰ₂, then it is also one on ℰ₁.
(v) For A ∈ ℰ the restriction μ|_A = μ(· ∩ A) is a measure on (E, ℰ).

Remark. These properties also hold for countably additive set functions (called pre-measures) on a ring; (i) and (ii) hold also for additive set functions on a ring.

Examples. (i) For every x ∈ E, the Dirac measure is given by δₓ(A) = 1 if x ∈ A and δₓ(A) = 0 if x ∉ A.
(ii) Discrete measure theory: let E be countable. Every measure μ on (E, P(E)) can be characterised by a mass function m : E → [0, ∞],

    μ = Σ_{x∈E} m(x) δₓ   or equivalently   ∀ A ⊆ E : μ(A) = Σ_{x∈A} μ({x}) = Σ_{x∈A} m(x) .

If m(x) ≡ 1 for all x ∈ E, μ is called the counting measure.
(iii) Let E = R and ℛ be the ring of half-open intervals. For A ∈ ℛ write A = ∪_{i=1}^n (aᵢ, bᵢ] with disjoint intervals, n ∈ N₀. We call this a standard representation of A. Although it is not unique, the set function μ : ℛ → [0, ∞] with μ(A) := Σ_{i=1}^n (bᵢ − aᵢ) is independent of the particular representation and thus well defined.
Further, μ is additive and translation invariant, i.e. ∀ x ∈ R : μ(A + x) = μ(A), where A + x := {x + y : y ∈ A}. The key question is:
Can μ be extended to a measure on ℬ = σ(ℛ)?
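Example (ii) translates directly into code. A hedged sketch (helper name assumed): a mass function m determines μ(A) = Σ_{x∈A} m(x), with Dirac and counting measures as special cases.

```python
from fractions import Fraction

def measure_from_mass(m):
    """mu(A) = sum of m(x) over x in A, for a mass function m on a countable E."""
    return lambda A: sum(m.get(x, 0) for x in A)

E = range(1, 7)
uniform = measure_from_mass({x: Fraction(1, 6) for x in E})   # fair die
dirac_3 = measure_from_mass({3: 1})                           # Dirac measure at x = 3
counting = measure_from_mass({x: 1 for x in E})               # counting measure, m == 1

evens = {2, 4, 6}
print(uniform(evens), dirac_3(evens), counting(evens))   # 1/2 0 3
```

Exact Fractions keep the probability measure normalised: uniform over all of E returns exactly 1.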

In order to attack this question in the next subsection, it is useful to introduce the following property of set functions.

Definition 1.7. Let ℰ be a ring on E and μ : ℰ → [0, ∞] an additive set function. μ is continuous at A ∈ ℰ if:
(i) μ is continuous from below: given any A₁ ⊆ A₂ ⊆ A₃ ⊆ … in ℰ with ∪_{n∈N} A_n = A ∈ ℰ (written A_n ↗ A), then lim_{n→∞} μ(A_n) = μ(A);
(ii) μ is continuous from above: given any A₁ ⊇ A₂ ⊇ A₃ ⊇ … in ℰ with ∩_{n∈N} A_n = A ∈ ℰ (written A_n ↘ A) and μ(A_n) < ∞ for some n ∈ N, then lim_{n→∞} μ(A_n) = μ(A).

Lemma 1.3. Let ℰ be a ring on E and μ : ℰ → [0, ∞] an additive set function. Then:
(i) μ is countably additive ⇒ μ is continuous at all A ∈ ℰ;
(ii) μ is continuous from below at all A ∈ ℰ ⇒ μ is countably additive;
(iii) μ is continuous from above at ∅ and μ(A) < ∞ for all A ∈ ℰ ⇒ μ is countably additive.

Remark. The condition μ(A_n) < ∞ for some n ∈ N in (ii) of the definition is necessary for measures to be continuous from above. Consider e.g. E = N, A_k = {k, k+1, …} and μ the counting measure. Then μ(A_k) = ∞ for all k ∈ N, but μ(A) = μ(∅) = 0.

Proof. (i) Given A_n ↗ A in ℰ, then A = (A₁ \ A₀) ∪ (A₂ \ A₁) ∪ (A₃ \ A₂) ∪ … (with A₀ = ∅), so

    μ(A) = Σ_{n=0}^∞ μ(A_{n+1} \ A_n) = lim_{m→∞} Σ_{n=0}^{m−1} μ(A_{n+1} \ A_n) = lim_{m→∞} μ(A_m) .

Given A_n ↘ A in ℰ and μ(A_m) < ∞ for some m ∈ N, let B_n := A_m \ A_n for n ≥ m. Then B_n ↗ (A_m \ A) for n → ∞ and thus, following the above,

    μ(A_m) − μ(A_n) = μ(B_n) → μ(A_m \ A) = μ(A_m) − μ(A)   as n → ∞ .

Since μ(A_m) < ∞ this implies lim_{n→∞} μ(A_n) = μ(A).
(ii) See problem 1.6(a).
(iii) Analogous to (i) and (ii).  □

1.3 Extension and uniqueness

Theorem 1.4 (Carathéodory's extension theorem). Let ℰ be a ring on E and μ : ℰ → [0, ∞] a countably additive set function. Then there exists a measure μ′ on (E, σ(ℰ)) such that μ′(A) = μ(A) for all A ∈ ℰ.

Proof. The proof is not examinable and is given on Hand-out 1 in the appendix.  □

To formulate a result on uniqueness, two further notions are useful.

Definition 1.8. Let E be a set. 𝒜 ⊆ P(E) is called a π-system if A, B ∈ 𝒜 ⇒ A ∩ B ∈ 𝒜.
𝒜 is called a d-system if
(i) E ∈ 𝒜 ,
(ii) A, B ∈ 𝒜, A ⊆ B ⇒ B \ A ∈ 𝒜 ,
(iii) A₁, A₂, … ∈ 𝒜 with A₁ ⊆ A₂ ⊆ … ⇒ ∪_{n∈N} A_n ∈ 𝒜 .

Remarks. (i) The set ℐ ∪ {∅} = { (a, b] : a < b } ∪ {∅} is a π-system on R, and we have shown in Lemma 1.2 that σ(ℐ) = ℬ.
(ii) ℰ is a σ-algebra ⇔ ℰ is both a π-system and a d-system (see problem 1.8(a)).

Lemma 1.5 (Dynkin's π-system lemma). Let 𝒜 be a π-system. Then for any d-system 𝒟 ⊇ 𝒜 we have 𝒟 ⊇ σ(𝒜).

Proof. The intersection d(𝒜) = ∩_{𝒟 ⊇ 𝒜, 𝒟 d-syst.} 𝒟 of all d-systems containing 𝒜 is itself a d-system. We shall show that d(𝒜) is also a π-system. Then it is also a σ-algebra, and for any d-system 𝒟 ⊇ 𝒜 we have 𝒟 ⊇ d(𝒜) ⊇ σ(𝒜), thus proving the lemma. Consider

    𝒟′ = { B ∈ d(𝒜) : B ∩ A ∈ d(𝒜) for all A ∈ 𝒜 } ⊆ d(𝒜) .

Then 𝒜 ⊆ 𝒟′ because 𝒜 is a π-system. We check that 𝒟′ is a d-system, and hence 𝒟′ = d(𝒜):
(i) clearly E ∈ 𝒟′;
(ii) suppose B₁, B₂ ∈ 𝒟′ with B₁ ⊆ B₂; then for A ∈ 𝒜 we have

    (B₂ \ B₁) ∩ A = (B₂ ∩ A) \ (B₁ ∩ A) ∈ d(𝒜) ,

because d(𝒜) is a d-system, so B₂ \ B₁ ∈ 𝒟′;
(iii) finally, if B_n ∈ 𝒟′ and B_n ↗ B, then for A ∈ 𝒜, B_n ∩ A ↗ B ∩ A ∈ d(𝒜), so B ∈ 𝒟′.
Now consider

    𝒟″ = { B ∈ d(𝒜) : B ∩ A ∈ d(𝒜) for all A ∈ d(𝒜) } ⊆ 𝒟′ .

Then 𝒜 ⊆ 𝒟″ because 𝒟′ = d(𝒜). We can check that 𝒟″ is a d-system, just as we did for 𝒟′. Hence 𝒟″ = d(𝒜), which shows that d(𝒜) is a π-system.  □
Theorem 1.6 (Uniqueness of extension). Let 𝒜 ⊆ P(E) be a π-system. Suppose that μ₁, μ₂ are measures on σ(𝒜) with μ₁(E) = μ₂(E) < ∞. If μ₁ = μ₂ on 𝒜, then μ₁ = μ₂ on σ(𝒜).
Equivalently, if μ(E) < ∞ the measure μ on σ(𝒜) is uniquely determined by its values on the π-system 𝒜.

Proof. Consider 𝒟 = { A ∈ σ(𝒜) : μ₁(A) = μ₂(A) } ⊆ σ(𝒜). By hypothesis, 𝒜 ⊆ 𝒟, and E ∈ 𝒟 since μ₁(E) = μ₂(E).
For A, B ∈ 𝒟 with A ⊆ B we have

    μ₁(B \ A) = μ₁(B) − μ₁(A) = μ₂(B) − μ₂(A) = μ₂(B \ A) < ∞ ,

thus also B \ A ∈ 𝒟. If A_n ∈ 𝒟, n ∈ N, with A_n ↗ A, then

    μ₁(A) = lim_{n→∞} μ₁(A_n) = lim_{n→∞} μ₂(A_n) = μ₂(A)   ⇒   A ∈ 𝒟 .

Thus 𝒟 ⊆ σ(𝒜) is a d-system containing the π-system 𝒜, so 𝒟 = σ(𝒜) by Dynkin's lemma.  □

These theorems provide general tools for the construction and characterisation of measures and will be applied in a specific context in the next subsection.

1.4 Lebesgue(-Stieltjes) measure

Theorem 1.7. There exists a unique Borel measure λ on (R, ℬ) such that

    λ((a, b]) = b − a   for all a, b ∈ R with a < b.

The measure λ is called Lebesgue measure on R.

Proof. (Existence) Let ℛ be the ring of half-open intervals. Consider the set function λ(A) = Σ_{i=1}^n (bᵢ − aᵢ), where A = ∪_{i=1}^n (aᵢ, bᵢ], n ∈ N₀. We aim to show that λ is countably additive on ℛ, which then proves existence by Carathéodory's extension theorem.
Since λ(A) < ∞ for all A ∈ ℛ, by Lemma 1.3(iii) it suffices to show that λ is continuous from above at ∅. Suppose not. Then there exist ε > 0 and A_n ↘ ∅ with λ(A_n) ≥ 2ε for all n. For each n we can find C_n ∈ ℛ with closure C̄_n ⊆ A_n and λ(A_n \ C_n) ≤ ε 2⁻ⁿ (see problem 1.9(b)). Then

    λ( A_n \ (C₁ ∩ … ∩ C_n) ) ≤ λ( ∪_{k=1}^n (A_n \ C_k) ) ≤ Σ_{k=1}^n λ(A_k \ C_k) ≤ Σ_{k=1}^n ε 2⁻ᵏ ≤ ε ,

and since λ(A_n) ≥ 2ε we have λ(C₁ ∩ … ∩ C_n) ≥ ε and in particular C₁ ∩ … ∩ C_n ≠ ∅. Thus K_n = C̄₁ ∩ … ∩ C̄_n, n ∈ N, is a monotone sequence of compact non-empty sets. Thus there exists a sequence (x_n)_{n∈N} with x_n ∈ K_n, which has at least one accumulation point x, since all x_n ∈ K₁, which is compact. Since K_n ↘, x ∈ ∩_{n∈N} K_n. Thus ∅ ≠ ∩_{n∈N} K_n ⊆ ∩_{n∈N} A_n, which contradicts A_n ↘ ∅.
(Uniqueness) For each n ∈ Z define

    μ_n(A) := λ( (n, n+1] ∩ A )   for all A ∈ ℬ .

Then μ_n is a probability measure on (R, ℬ), so, by Theorem 1.6, μ_n is uniquely determined by its values on the π-system ℐ generating ℬ. Since λ(A) = Σ_{n∈Z} μ_n(A), it follows that λ is also uniquely determined.  □

Definition 1.9. A ⊆ R is called null if A ⊆ B for some B ∈ ℬ with λ(B) = 0. Denote by 𝒩 the set of all null sets. Then ℒ = { B ∪ N : B ∈ ℬ, N ∈ 𝒩 } is the completion of ℬ and is called the Lebesgue σ-algebra. The sets in ℒ are called Lebesgue-measurable sets or Lebesgue sets.

Remark. The Lebesgue measure can be extended to ℒ via (see problem 1.10)

    λ(B ∪ N) := λ(B)   for all B ∈ ℬ, N ∈ 𝒩 .

In some books only this extension is called Lebesgue measure. This makes sense e.g. in analysis, where one usually works with a fixed measure space (R, ℒ, λ).
Theorem 1.8. ℬ ⊊ ℒ ⊊ P(R). Moreover³,

    card(ℬ) = card(R) = c ,   whereas   card(ℒ) = card(P(R)) = 2^c .

Proof. (i) According to problem 1.4, ℬ is separable, i.e. generated by a countable set system; thus card(ℬ) ≤ c. On the other hand, {x} ∈ ℬ for all x ∈ R, and thus card(ℬ) = c. By problem 1.10 the Cantor set C ⊆ R is uncountable and thus card(C) = c. Since also λ(C) = 0, with Definition 1.9 we have P(C) ⊆ ℒ and thus card(ℒ) = card(P(R)) = 2^c > c.
(ii) Using the axiom of choice we construct a subset of U = [0, 1] which is not in ℒ.
Define the equivalence relation ∼ on U by x ∼ y iff x − y ∈ Q. Write {Eᵢ : i ∈ I} for the equivalence classes of ∼ and let R = {eᵢ : i ∈ I} be a collection of representatives eᵢ ∈ Eᵢ, chosen by the axiom of choice. Then U can be partitioned as

    U = ∪_{i∈I} Eᵢ = ∪_{i∈I} ∪_{q∈Q∩[0,1)} (eᵢ + q) = ∪_{q∈Q∩[0,1)} ∪_{i∈I} (eᵢ + q) = ∪_{q∈Q∩[0,1)} (R + q) ,

where + is to be understood modulo 1. Suppose R ∈ ℒ; then R + q ∈ ℒ and λ(R) = λ(R + q) for all q ∈ Q by translation invariance of λ. But by countable additivity of λ this means

    Σ_{q∈Q∩[0,1)} λ(R) = λ(U) = 1 ,

which leads to a contradiction both for λ(R) = 0 (the sum is 0) and for λ(R) > 0 (the sum is ∞). Thus R ∉ ℒ.  □

Remarks. (i) Every set of positive measure has non-measurable subsets and, moreover: P(A) ⊆ ℒ ⇔ λ(A) = 0.
(ii) There exists no translation invariant, countably additive set function μ on P(R) with μ([0, 1]) ∈ (0, ∞).

Definition 1.10. F : R → R is called a distribution function if
(i) F is non-decreasing ,
(ii) F is right-continuous, i.e. lim_{x↘x₀} F(x) = F(x₀) .
F is a probability distribution function if in addition
(iii) lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0 .

³ Notation: the cardinality of a countable set (such as N or Q) is ℵ₀; for a set of continuum cardinality (such as P(N), R or the Cantor set C) it is 2^{ℵ₀} = c. For power sets of the latter we simply write card(P(R)) = 2^c.


Proposition 1.9. Let μ be a Radon measure⁴ on (R, ℬ). Then for every r ∈ R

    F_r(x) := μ((r, x]) for x > r ,   F_r(x) := −μ((x, r]) for x ≤ r   (in particular F_r(r) = −μ(∅) = 0)

is a distribution function with μ((a, b]) = F_r(b) − F_r(a), a < b.
Also F_r + C is a distribution function with that property for all C ∈ R.
The F_r differ only by an additive constant, namely F_r(x) = F₀(x) − F₀(r) for all r ∈ R.

Proof. F_r is non-decreasing by monotonicity of μ, and μ((a, b]) = F_r(b) − F_r(a) for b > a by definition. For x_n ↘ x > r we have F_r(x_n) = μ((r, x_n]) ↘ μ((r, x]) = F_r(x) by continuity of measures. For x_n ↘ r we have (r, x_n] ↘ ∅, such that F_r(x_n) ↘ μ(∅) = 0 = F_r(r).
The F_r differ only by a constant, since for all x > r

    0 = μ((r, x]) − μ((r, x]) = F₀(x) − F₀(r) − F_r(x) + F_r(r) ,

and F_r(r) = 0 = F₀(r) − F₀(r). Both statements follow analogously for x ≤ r.  □

Remarks. (i) The distribution functions for the Lebesgue measure are F_r(x) = x − r, r ∈ R.
(ii) Note that for x_n ↗ x we have (x_n, x] ↘ {x} ≠ ∅, so that F_r as defined in Proposition 1.9 is in general not left-continuous.
(iii) If μ is a probability measure one usually uses the cumulative distribution function

    CDF(x) := F(x) = μ((−∞, x]) .

E.g. for the Dirac measure δ_a concentrated at a ∈ R, CDF_{δ_a}(x) = 0 for x < a and 1 for x ≥ a.
On the other hand, a distribution function uniquely determines a Radon measure.


Theorem 1.10. Let F : R → R be a distribution function. Then there exists a unique Radon measure μ_F on (R, ℬ) such that

    μ_F((a, b]) = F(b) − F(a)   for all a < b .

The measure μ_F is called the Lebesgue-Stieltjes measure of F.

Proof. As for Lebesgue measure, the set function μ_F is well defined on the ring ℛ via

    μ_F(A) := Σ_{i=1}^n ( F(bᵢ) − F(aᵢ) )   where A = ∪_{i=1}^n (aᵢ, bᵢ] .

The proof is then the same as that of Theorem 1.7 for Lebesgue measure.  □

Remark. Analogous to Definition 1.9, ℬ can also be completed with respect to the Lebesgue-Stieltjes measure μ_F. However, the completion ℒ_F depends on the measure μ_F. Although ℒ is much larger than ℬ (see Theorem 1.8), it is therefore preferable to work with the measure space (R, ℬ), since it is defined independently of the measure and all μ_F have the same domain of definition.

⁴ i.e. μ(K) < ∞ for K ∈ ℬ compact
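The correspondence between F and μ_F on half-open intervals is easy to illustrate. A small sketch (function names are assumptions, not from the notes): F(x) = x recovers Lebesgue measure, while a unit step at 0 gives a Dirac atom there — and right-continuity of F is exactly what keeps the atom out of (0, b].

```python
def stieltjes(F):
    """mu_F((a, b]) = F(b) - F(a) for a distribution function F."""
    def mu(a, b):
        assert a < b
        return F(b) - F(a)
    return mu

lebesgue = stieltjes(lambda x: x)                        # F(x) = x
dirac0 = stieltjes(lambda x: 1.0 if x >= 0 else 0.0)     # right-continuous step at 0

print(lebesgue(0.25, 0.75))          # 0.5 = length of (0.25, 0.75]
print(dirac0(-1, 0), dirac0(0, 1))   # 1.0 0.0 -- (-1, 0] contains the atom, (0, 1] does not
```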


1.5 Independence and Borel-Cantelli lemmas

Let (E, ℰ, P) be a probability space. It provides a model for an experiment whose outcome is random. E describes the set of possible outcomes, ℰ the set of events (observable sets of outcomes), and P(A) is the probability of an event A ∈ ℰ.

Definition 1.11. The events (Aᵢ)_{i∈I}, Aᵢ ∈ ℰ, are said to be independent if

    P( ∩_{i∈J} Aᵢ ) = ∏_{i∈J} P(Aᵢ)   for all finite, nonempty J ⊆ I .

The σ-algebras (ℰᵢ)_{i∈I}, ℰᵢ ⊆ ℰ, are said to be independent if the events (Aᵢ)_{i∈I} are independent for any choice Aᵢ ∈ ℰᵢ.

A useful way to establish independence of two σ-algebras is given below.

Theorem 1.11. Let ℰ₁, ℰ₂ ⊆ ℰ be π-systems and suppose that

    P(A₁ ∩ A₂) = P(A₁) P(A₂)   whenever A₁ ∈ ℰ₁, A₂ ∈ ℰ₂ .

Then σ(ℰ₁) and σ(ℰ₂) are independent.

Proof. Fix A₁ ∈ ℰ₁ and define the measures μ, ν by

    μ(A) = P(A₁ ∩ A) ,   ν(A) = P(A₁) P(A)   for all A ∈ ℰ .

μ and ν agree on the π-system ℰ₂ with μ(E) = ν(E) = P(A₁) < ∞. So, by uniqueness of extension (Theorem 1.6), for all A₁ ∈ ℰ₁ and A₂ ∈ σ(ℰ₂)

    P(A₁ ∩ A₂) = μ(A₂) = ν(A₂) = P(A₁) P(A₂) .

Now fix A₂ ∈ σ(ℰ₂) and repeat the same argument with

    μ′(A) := P(A ∩ A₂) ,   ν′(A) := P(A) P(A₂)

to show that for all A₁ ∈ σ(ℰ₁) we have P(A₁ ∩ A₂) = P(A₁) P(A₂).  □

Remark. In particular, the σ-algebras σ({A₁}) and σ({A₂}) generated by single events are independent if and only if A₁ and A₂ are independent.
Background. Let (a_n)_{n∈N} be a sequence in R. Then lim_{n→∞} a_n does not necessarily exist, e.g. for a_n = (−1)ⁿ. To nevertheless study asymptotic properties of (a_n)_{n∈N}, consider

    a̲_n = inf_{k≥n} a_k   and   ā_n = sup_{k≥n} a_k .

Then a̲_n ↗ and ā_n ↘ are monotone and both have limits in R̄ = R ∪ {−∞, +∞}. Define

    liminf_{n→∞} a_n := lim_{n→∞} inf_{k≥n} a_k   and   limsup_{n→∞} a_n := lim_{n→∞} sup_{k≥n} a_k ,

which are equal to the smallest and largest accumulation points of (a_n)_{n∈N}, respectively. In general liminf_{n→∞} a_n ≤ limsup_{n→∞} a_n, since a̲_m ≤ a_{m∨n} ≤ ā_n for all m, n ∈ N. They may be different, as e.g. liminf_{n→∞} (−1)ⁿ = −1 < 1 = limsup_{n→∞} (−1)ⁿ. Both are equal if and only if lim_{n→∞} a_n exists. Note that a sequence may have many more than two accumulation points; e.g. for a_n = sin n the set of accumulation points is [−1, 1], with liminf_{n→∞} a_n = −1 and limsup_{n→∞} a_n = 1.
We use the same concept for a sequence (A_n)_{n∈N} of subsets of a set E. Note that ∩_{k≥n} A_k ↗ and ∪_{k≥n} A_k ↘ are monotone in n.

Definition 1.12. Let (A_n)_{n∈N} be a sequence of sets in E. Then define the sets

    liminf_{n→∞} A_n := ∪_{n∈N} ∩_{k≥n} A_k   and   limsup_{n→∞} A_n := ∩_{n∈N} ∪_{k≥n} A_k .

Remark. This definition can be interpreted as

    liminf_{n→∞} A_n = { x : ∃ n∈N ∀ k≥n : x ∈ A_k } = { x : x ∈ A_n for all but finitely many n } ,
    limsup_{n→∞} A_n = { x : ∀ n∈N ∃ k≥n : x ∈ A_k } = { x : x ∈ A_n for infinitely many n } .

One also writes liminf_{n→∞} A_n = {A_n ev.} (eventually) and limsup_{n→∞} A_n = {A_n i.o.} (infinitely often).

Properties. (i) Let ℰ be a σ-algebra. If A_n ∈ ℰ for all n ∈ N then liminf A_n, limsup A_n ∈ ℰ.
(ii) liminf A_n ⊆ limsup A_n , since x ∈ A_n eventually ⇒ x ∈ A_n infinitely often.
(iii) ( limsup A_n )^c = liminf A_n^c , since x ∈ A_n only finitely often ⇔ x ∈ A_n^c eventually.
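Definition 1.12 can be probed on a truncated sequence. In the sketch below (illustrative; the cut-off M is kept well below the prefix length so that every tail is long), A_n alternates between {0, 1} and {1, 2}: only 1 lies in A_n eventually, while 0, 1, 2 each lie in A_n infinitely often.

```python
from functools import reduce

def liminf_sets(As, M):
    # union over n < M of the intersections of the tails A_n, A_{n+1}, ...
    return reduce(set.union, (reduce(set.intersection, As[n:]) for n in range(M)))

def limsup_sets(As, M):
    # intersection over n < M of the unions of the tails
    return reduce(set.intersection, (reduce(set.union, As[n:]) for n in range(M)))

As = [{0, 1} if n % 2 == 0 else {1, 2} for n in range(20)]
print(liminf_sets(As, 5), limsup_sets(As, 5))   # {1} and {0, 1, 2}
```

This also lets one check property (iii) numerically by applying the same functions to the complements A_n^c within a fixed ground set.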

Lemma 1.12 (First Borel-Cantelli lemma). Let (E, ℰ, μ) be a measure space and (A_n)_{n∈N} a sequence of sets in ℰ.
If Σ_{n∈N} μ(A_n) < ∞, then μ(A_n i.o.) = 0.

Proof. μ(A_n i.o.) = μ( ∩_{n∈N} ∪_{k≥n} A_k ) ≤ μ( ∪_{k≥n} A_k ) ≤ Σ_{k≥n} μ(A_k) → 0 for n → ∞.  □

Lemma 1.13 (Second Borel-Cantelli lemma). Let (E, ℰ, P) be a probability space and suppose that the events (A_n)_{n∈N} are independent.
If Σ_{n∈N} P(A_n) = ∞, then P(A_n i.o.) = 1.

Proof. We use the inequality 1 − a ≤ e⁻ᵃ. With (A_n)_{n∈N}, also (A_n^c)_{n∈N} are independent (see problem 1.11). Then we have for all n ∈ N

    P( ∩_{k≥n} A_k^c ) = ∏_{k≥n} ( 1 − P(A_k) ) ≤ exp( − Σ_{k≥n} P(A_k) ) = 0 .

Hence

    P(A_n i.o.) = 1 − P( liminf_{n→∞} A_n^c ) = 1 − P( ∪_{n∈N} ∩_{k≥n} A_k^c ) = 1 .  □
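Both lemmas are easy to watch in a quick Monte Carlo sketch (seed and sizes are arbitrary choices): for independent events with P(A_n) = 1/n² the first lemma predicts only finitely many occurrences almost surely, while for P(A_n) = 1/n the second lemma forces occurrences to keep appearing arbitrarily late.

```python
import random

def occurrences(p, N, seed):
    """Indices n <= N at which independent events A_n with P(A_n) = p(n) occur."""
    rng = random.Random(seed)
    return [n for n in range(1, N + 1) if rng.random() < p(n)]

N = 100000
summable = occurrences(lambda n: 1 / n**2, N, seed=0)   # sum p(n) < infinity
divergent = occurrences(lambda n: 1 / n, N, seed=0)     # sum p(n) = infinity

print(len(summable), max(summable))    # typically few occurrences, all early
print(len(divergent), max(divergent))  # occurrences typically persist deep into the sequence
```

In either case A_1 occurs with probability 1/1 = 1, so both lists are nonempty.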

2 Measurable Functions and Random Variables

2.1 Measurable Functions

Definition 2.1. Let (E, ℰ) and (F, ℱ) be measurable spaces. A function f : E → F is called measurable (with respect to ℰ and ℱ), or ℰ/ℱ-measurable, if

    ∀ A ∈ ℱ : f⁻¹(A) = { x ∈ E : f(x) ∈ A } ∈ ℰ    ( short: f⁻¹(ℱ) ⊆ ℰ ) .

Often (F, ℱ) = (R, ℬ) or (R̄, ℬ̄) with the extended real line R̄ = R ∪ {−∞, ∞} and ℬ̄ = { B ∪ C : B ∈ ℬ, C ⊆ {−∞, ∞} }. If in addition E is a topological space with ℰ = ℬ(E), f is called a Borel function.
Remarks. (i) Every function f : E → F is measurable w.r.t. P(E) and ℱ.
(ii) Preimages of functions preserve the set operations,

    f⁻¹( ∪_{n∈N} A_n ) = ∪_{n∈N} f⁻¹(A_n) ,   f⁻¹(A^c) = ( f⁻¹(A) )^c ,

since e.g. {x ∈ E : f(x) ∉ A} = {x ∈ E : f(x) ∈ A}^c. Note that this second property does not hold for images, since in general f(A^c) ≠ (f(A))^c for A ⊆ E.
(iii) With (ii) the following holds for any function f : E → F:
If ℱ is a σ-algebra on F, then σ(f) := f⁻¹(ℱ) is a σ-algebra on E, called the σ-algebra generated by f. This is the smallest σ-algebra on E w.r.t. which f is measurable.
If ℰ is a σ-algebra on E, then 𝒞 = { A ⊆ F : f⁻¹(A) ∈ ℰ } is the largest σ-algebra on F w.r.t. which f is measurable. Note that 𝒞 ≠ f(ℰ), which is in general not a σ-algebra.
Lemma 2.1. Let f : E → F and ℱ = σ(𝒜) for some 𝒜 ⊆ P(F).
Then f is ℰ/ℱ-measurable if and only if f⁻¹(A) ∈ ℰ for all A ∈ 𝒜.

Proof. According to Remark (iii), 𝒞 := { A ⊆ F : f⁻¹(A) ∈ ℰ } is a σ-algebra on F. Now if 𝒜 ⊆ 𝒞, then σ(𝒜) = ℱ ⊆ 𝒞 and f is ℰ/ℱ-measurable. On the other hand, if f is ℰ/ℱ-measurable then certainly f⁻¹(A) ∈ ℰ for all A ∈ 𝒜 ⊆ ℱ.  □
Lemma 2.2. f : E → R is ℰ/ℬ-measurable if and only if one of the following holds:
(i) f⁻¹((−∞, c]) = { x ∈ E : f(x) ≤ c } ∈ ℰ for all c ∈ R ,
(ii) f⁻¹((−∞, c)) = { x ∈ E : f(x) < c } ∈ ℰ for all c ∈ R ,
(iii) f⁻¹([c, ∞)) = { x ∈ E : f(x) ≥ c } ∈ ℰ for all c ∈ R ,
(iv) f⁻¹((c, ∞)) = { x ∈ E : f(x) > c } ∈ ℰ for all c ∈ R .

Proof. (i) In problem 1.3 it was shown that ℬ = σ({ (−∞, c] : c ∈ R }). The statement then follows with Lemma 2.1.
(ii)–(iv) Show analogously that ℬ is generated by the respective sets.  □


Lemma 2.3. Let E and F be topological spaces and f : E → F be continuous (i.e. f⁻¹(U) ⊆ E open whenever U ⊆ F open). Then f is measurable w.r.t. ℬ(E) and ℬ(F).

Proof. Let τ_F = { U ⊆ F : U open } be the topology on F. Then for all U ∈ τ_F, f⁻¹(U) is open and thus f⁻¹(U) ∈ ℬ(E). Since ℬ(F) = σ(τ_F), f is measurable by Lemma 2.1.  □

However, not every measurable function is continuous. Next we introduce another important class of functions which turn out to be measurable.
Definition 2.2. Let 1_A : E → R be the indicator function of A ⊆ E. f : E → R is called simple if

    f = Σ_{i=1}^n cᵢ 1_{Aᵢ}   for some n ∈ N, cᵢ ∈ R and A₁, …, A_n ⊆ E .

We call this a standard representation of f if Aᵢ ≠ ∅, cᵢ ≠ cⱼ and Aᵢ ∩ Aⱼ = ∅ for i ≠ j. f is called ℰ-simple if there exists a standard representation such that Aᵢ ∈ ℰ for all i = 1, …, n. Let S(ℰ) denote the set of all ℰ-simple functions.

Remark. (i) Simple functions are more general than step functions, where the Aᵢ are intervals.
(ii) Standard representations are not unique: the order of indices may change, and a cᵢ may or may not take the value 0.
Lemma 2.4. Let (E, ℰ) be a measurable space.
(i) A simple function f : E → R is ℰ/ℬ-measurable if and only if f ∈ S(ℰ).
(ii) S(ℰ) is a vector space, i.e. if f₁, f₂ ∈ S(ℰ) then α₁f₁ + α₂f₂ ∈ S(ℰ) for all α₁, α₂ ∈ R. In addition f₁ f₂ ∈ S(ℰ).

Proof. (i) Let f = Σ_{i=1}^n cᵢ 1_{Aᵢ} be a simple function in standard representation.
If Aᵢ ∈ ℰ for i = 1, …, n, then for all B ∈ ℬ, f⁻¹(B) = ∪_{i : cᵢ ∈ B} Aᵢ ∈ ℰ.
On the other hand, if Aᵢ ∉ ℰ for some i, then f⁻¹({cᵢ}) = Aᵢ ∉ ℰ and f is not measurable.
(ii) Let f₁, f₂ ∈ S(ℰ) with standard representations f₁ = Σ_{i=1}^n cᵢ 1_{Aᵢ} and f₂ = Σ_{i=1}^m dᵢ 1_{Bᵢ}. Define C_{ij} = Aᵢ ∩ Bⱼ. Then the {C_{ij}} are disjoint, and each C_{ij} ∈ ℰ, as are all possible unions of the C_{ij}. f₁ f₂ and α₁f₁ + α₂f₂ are constant on each C_{ij} and thus ℰ-simple.  □

Remark. In particular, 1_A : E → R is ℰ/ℬ-measurable if and only if A ∈ ℰ.
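For a function with finitely many values on a finite measurable space, ℰ/ℬ-measurability reduces to checking that each level set f⁻¹({c}) lies in ℰ, as in the proof of Lemma 2.4(i). A sketch with hypothetical helper names, using σ({A}) = {∅, E, A, A^c} from Section 1.1:

```python
def preimage(f, E, B):
    return frozenset(x for x in E if f(x) in B)

def is_measurable(f, E, sigma):
    # with finitely many values it suffices that every level set lies in sigma,
    # since any preimage is then a finite union of level sets
    values = {f(x) for x in E}
    return all(preimage(f, E, {c}) in sigma for c in values)

E = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
sigma = {frozenset(), E, A, E - A}            # sigma({A})

ind_A = lambda x: 1.0 if x in A else 0.0      # indicator 1_A, A in sigma
ind_1 = lambda x: 1.0 if x == 1 else 0.0      # indicator of {1}, not in sigma

print(is_measurable(ind_A, E, sigma), is_measurable(ind_1, E, sigma))   # True False
```

This matches the Remark: an indicator 1_A is measurable exactly when A belongs to the σ-algebra.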


Lemma 2.5. Let f₁ : E₁ → E₂ and f₂ : E₂ → E₃. If f₁ is ℰ₁/ℰ₂-measurable and f₂ is ℰ₂/ℰ₃-measurable, then f₂ ∘ f₁ : E₁ → E₃ is ℰ₁/ℰ₃-measurable.

Proof. For every A ∈ ℰ₃, (f₂ ∘ f₁)⁻¹(A) = f₁⁻¹( f₂⁻¹(A) ) ∈ ℰ₁ since f₂⁻¹(A) ∈ ℰ₂.  □

Proposition 2.6. Let f_n : E → R̄, n ∈ N, be ℰ/ℬ̄-measurable. Then so are

    (i) c f₁ for all c ∈ R   (ii) f₁ + f₂   (iii) f₁ f₂
    (iv) inf_{n∈N} f_n   (v) sup_{n∈N} f_n   (vi) liminf_{n→∞} f_n   (vii) limsup_{n→∞} f_n ,

whenever they are defined (∞ − ∞ and −∞ + ∞ are not well defined).

Remarks. (i) Notation: inf_{n∈N} f_n : x ↦ inf{ f_n(x) : n ∈ N } ∈ R̄ and liminf_{n→∞} f_n : x ↦ lim_{n→∞} inf{ f_k(x) : k ≥ n } ∈ R̄.
(ii) In particular f₁ ∨ f₂ = max{f₁, f₂} and f₁ ∧ f₂ = min{f₁, f₂} are measurable. Whenever it exists, lim_{n→∞} f_n is also measurable.
(iii) f = f⁺ − f⁻ measurable ⇒ f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0 measurable ⇒ |f| = f⁺ + f⁻ measurable.
The converse of the last implication is in general false, e.g. f⁺ = 1_A and f⁻ = 1_{A^c} for A ∉ ℰ.

Proof. (i) If c ≠ 0 we have for all y ∈ R

    { x ∈ E : c f₁(x) ≤ y } = { x ∈ E : f₁(x) ≤ y/c } ∈ ℰ   since f₁ is measurable

(with ≤ y/c replaced by ≥ y/c for c < 0). If c = 0, then { x ∈ E : 0 ≤ y } equals E for y ≥ 0 and ∅ for y < 0, both in ℰ, so c f₁ is measurable for all c ∈ R.
(ii) See example sheet.
(iii) f₁ f₂ = ¼( (f₁ + f₂)² − (f₁ − f₂)² ) is measurable by (i), (ii) and Lemmas 2.3 and 2.5, since g : R → R, g(x) = x², is continuous and thus measurable.
(iv)–(vii) See example sheet.  □

Definition 2.3. Let (E, E) and (F, F) be measurable spaces and let μ be a measure on (E, E). Then any E/F-measurable function f : E → F induces the image measure ν = μ ∘ f⁻¹ on F, given by ν(A) = μ(f⁻¹(A)) for all A ∈ F.
Remark. ν is a measure since f⁻¹(A) ∈ E for all A ∈ F and f⁻¹ preserves set operations, as has been shown above.

2.2 Random Variables

Let (Ω, A, P) be a probability space and (E, E) a measurable space.

Definition 2.4. An A/E-measurable function X : Ω → E is called a random variable in E, or simply a random variable (if E = R). The image measure μ_X = P ∘ X⁻¹ on (E, E) is called the law or (probability) distribution of X.
For E = R the cumulative probability distribution function F_X = CDF_X : R → [0, 1] with

    F_X(x) = μ_X((−∞, x]) = P({ω ∈ Ω : X(ω) ≤ x}) = P(X ≤ x)

is called the distribution function of X.

Remark. By Proposition 1.9 and Theorem 1.10, μ_X is characterised by F_X. Usually random variables are given by their distribution function, without specifying (Ω, A, P) and the function X : Ω → R.
Definition 2.5. The random variables (Xn)_{n∈N}, defined on (Ω, A, P) and taking values in (E, E), are called independent if the σ-algebras σ(Xn) = Xn⁻¹(E) ⊆ A are independent.

Lemma 2.7. Real random variables (Xn)_{n∈N} are independent if and only if

    P(X1 ≤ x1, ..., Xk ≤ xk) = P(X1 ≤ x1) ⋯ P(Xk ≤ xk)  for all x1, ..., xk ∈ R, k ∈ N.

Proof. See problem 2.5 for two random variables X1, X2. This extends to X1, ..., Xk, noting that by continuity of measures, e.g. for k = 3,

    P(X1 ≤ x1, X3 ≤ x3) = lim_{x2→∞} P(X1 ≤ x1, X2 ≤ x2, X3 ≤ x3)
        = lim_{x2→∞} P(X1 ≤ x1) P(X2 ≤ x2) P(X3 ≤ x3) = P(X1 ≤ x1) P(X3 ≤ x3) . □

Remark. This lemma provides a characterisation of independence using only distribution functions. But to relate this to the definition, the functions Xn have to be defined on the same probability space. Although one usually does not bother to define them, at least it has to be possible to do so. This is guaranteed by the next theorem which, although rarely used in practice, is of great conceptual importance. It is split in two parts; the second is Theorem 2.12.

Theorem 2.8. Skorohod representation theorem, part 1.
For all probability distribution functions F1, F2, ... : R → [0, 1] there exists a probability space (Ω, A, P) and random variables X1, X2, ... : Ω → R such that Fn is the distribution function of Xn for each n. The Xn can be chosen to be independent.

Proof. see problem 2.4; for independence see hand-out 2
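The construction behind problem 2.4 is the quantile transform: for ω uniform on Ω = (0, 1), X(ω) = inf{x : F(x) ≥ ω} has distribution function F. The following Python sketch (an illustration, not the hand-out's code) computes this generalised inverse by bisection for F of an Exp(1) variable and checks the empirical distribution function of the constructed samples against F.

```python
import math
import random

def quantile(F, omega, lo=-50.0, hi=50.0, tol=1e-9):
    """Generalised inverse X(omega) = inf{x : F(x) >= omega}, by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= omega:
            hi = mid
        else:
            lo = mid
    return hi

F = lambda x: 1 - math.exp(-x) if x > 0 else 0.0  # Exp(1) distribution function

rng = random.Random(0)  # fixed seed: omega plays the role of the sample point
samples = [quantile(F, rng.random()) for _ in range(20000)]

# Empirical distribution function of the constructed X should match F
for x in (0.5, 1.0, 2.0):
    emp = sum(s <= x for s in samples) / len(samples)
    assert abs(emp - F(x)) < 0.02
```

For a continuous strictly increasing F the bisection simply inverts F; the inf-definition also handles flat pieces and jumps of F correctly.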
Remark. (Xn)_{n≥0} is often regarded as a stochastic process with state space E and discrete time n. The σ-algebra generated by X0, ..., Xn,

    Fn = σ(X0, ..., Xn) = σ( ⋃_{i=0}^n Xi⁻¹(E) ) ⊆ A ,

contains events depending measurably on X0, ..., Xn and represents what is known about the process by time n. Fn ⊆ F_{n+1} for each n, and the family (Fn)_{n∈N} is called the filtration generated by the process (Xn)_{n≥0}.

Definition 2.6. Let (Xn)_{n∈N} be a sequence of random variables. Define

    Tn = σ(X_{n+1}, X_{n+2}, ...)  (decreasing in n)  and  T = ⋂_{n∈N} Tn ⊆ A .

Then T is called the tail σ-algebra of (Xn)_{n∈N} and elements of T are called tail events.






Example. A = {ω : lim_{n→∞} Xn(ω) exists} ∈ T, since A = {ω : lim_{n→∞} X_{N+n}(ω) exists} for every fixed N ∈ N. Similarly, {ω : lim sup_{n→∞} Xn(ω) = 137} ∈ T.

Theorem 2.9. Kolmogorov's 0-1 law.
Suppose (Xn)_{n∈N} is a sequence of independent random variables. Then every tail event has probability 0 or 1. Moreover, any T-measurable random variable Y is almost surely constant, i.e. P(Y = c) = 1 for some c ∈ R.

Proof. The σ-algebra Fn = σ(X1, ..., Xn) is generated by the π-system of events

    A = {X1 ≤ x1, ..., Xn ≤ xn} ,

whereas Tn is generated by the π-system of events

    B = {X_{n+1} ≤ x_{n+1}, ..., X_{n+k} ≤ x_{n+k}} ,  k ∈ N .

Since P(A ∩ B) = P(A) P(B) for all such A and B by independence, Fn and Tn are independent by Theorem 1.11 for all n ∈ N. Hence Fn and T are independent, since T ⊆ Tn.
Since ⋃_n Fn is a π-system generating the σ-algebra F_∞ = σ(Xn : n ∈ N), F_∞ and T are independent, again by Theorem 1.11. But T ⊆ F_∞, and thus every A ∈ T is independent of itself, i.e.

    P(A) = P(A ∩ A) = P(A) P(A)  ⇒  P(A) ∈ {0, 1} .

Let Y be a T-measurable random variable. Then F_Y(y) = P(Y ≤ y) takes values in {0, 1}, so P(Y = c) = 1 for c = inf{y ∈ R : F_Y(y) = 1}. □
Remark. Kolmogorov's 0-1 law involves the σ-algebras generated by random variables, rather than the random variables themselves. Thus it can be formulated without using r.v.s:
Let (Fn)_{n∈N} be a sequence of independent σ-algebras in A. Let A be a tail event, i.e.

    A ∈ T ,  where  T = ⋂_{n∈N} Tn  with  Tn = σ( ⋃_{m>n} Fm ) .

Then P(A) = 0 or P(A) = 1.

2.3 Convergence of measurable functions

Let (E, E, μ) be a measure space.

Remark. (Convergence to infinity.) Let (xn)_{n∈N} be a sequence in R. Here and in the following we say that

    xn → ∞  if  ∀ y ∈ R ∃ N ∈ N ∀ n ≥ N : xn ≥ y ,

and xn ↗ ∞ if in addition x_{n+1} ≥ xn for all n ∈ N (analogously xn → −∞ and xn ↘ −∞). Remember that xn is unbounded from above if ∀ y ∈ R ∀ N ∈ N ∃ n ≥ N : xn ≥ y, i.e. ∀ y ∈ R : xn ≥ y for infinitely many n, whereas xn → ∞ means that ∀ y ∈ R : xn ≥ y for all but finitely many n. This is not convergence in the usual sense, since |∞ − xn| is either not well defined or equal to ∞.

Definition 2.7. We say that A ∈ E holds almost everywhere (short: a.e.) if μ(Aᶜ) = 0. If μ is a probability measure (μ(E) = 1) one uses almost surely (short: a.s.) instead.
Let f, f1, f2, ... : E → R be measurable functions. Say that
(i) fn → f everywhere or pointwise if fn(x) → f(x) for all x ∈ E,
(ii) fn → f almost everywhere (fn → f a.e.), almost surely (fn → f a.s.) for μ(E) = 1, if

    μ({x ∈ E : fn(x) ↛ f(x)}) = μ(fn ↛ f) = 0 ,

(iii) fn → f in measure, in probability (fn →ᴾ f) for μ(E) = 1, if

    ∀ ε > 0 : μ(|fn − f| > ε) → 0  as  n → ∞ .

Theorem 2.10. Let (fn)_{n∈N} be a sequence of measurable functions.
(i) Assume that μ(E) < ∞. If fn → f a.e. then fn → f in measure.
(ii) If fn → f in measure then there exists a subsequence (nk)_{k∈N} such that f_{nk} → f a.e..

Proof. (i) Set gn = fn − f and suppose gn → 0 a.e.. Then for every ε > 0

    μ(|gn| ≤ ε) ≥ μ( ⋂_{m≥n} {|gm| ≤ ε} ) ↗ μ( |gm| ≤ ε eventually ) ≥ μ(gn → 0) = μ(E) .

Hence μ(|gn| > ε) → 0 as n → ∞ and fn → f in measure.
(ii) Suppose gn = fn − f → 0 in measure. Thus μ(|gn| > 1/k) → 0 for every k ∈ N and we can find a subsequence (nk)_{k∈N} such that

    μ(|g_{nk}| > 1/k) < 2⁻ᵏ  and thus  ∑_{k∈N} μ(|g_{nk}| > 1/k) < ∞ .

So, by the first Borel–Cantelli lemma (Lemma 1.12), μ(|g_{nk}| > 1/k i.o.) = 0. Since {|g_{nk}| > 1/k i.o.}ᶜ ⊆ {g_{nk} → 0}, we get

    μ(g_{nk} ↛ 0) ≤ μ( |g_{nk}| > 1/k i.o. ) = 0 ,

so g_{nk} → 0 a.e.. □
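The subsequence in part (ii) is really needed: the classical "typewriter" sequence of indicator functions of shrinking dyadic subintervals of [0, 1] (not discussed in the notes, added here as an illustration) converges to 0 in Lebesgue measure but converges at no point, since every point is hit by one interval of each dyadic level. A small Python check:

```python
# The n-th function is the indicator of I_n = [j/2**k, (j+1)/2**k)
# where n = 2**k + j with 0 <= j < 2**k ("typewriter" enumeration).
def interval(n):
    k = n.bit_length() - 1          # largest k with 2**k <= n
    j = n - 2**k
    return j / 2**k, (j + 1) / 2**k

def f(n, x):                        # f_n = indicator of I_n
    a, b = interval(n)
    return 1 if a <= x < b else 0

# mu(|f_n| > eps) = length of I_n -> 0: convergence to 0 in measure
lengths = [interval(n)[1] - interval(n)[0] for n in range(1, 200)]
assert lengths[-1] < 0.01

# but at x = 0.3 each dyadic level contains exactly one n with f_n(x) = 1,
# so f_n(0.3) equals 1 infinitely often and 0 infinitely often: no convergence.
for k in range(3, 11):
    hits = sum(f(n, 0.3) for n in range(2**k, 2**(k + 1)))
    assert hits == 1
```

Along the subsequence nk = 2ᵏ one does get f_{nk}(x) → 0 for every x > 0, matching part (ii).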

Definition 2.8. Let X, X1, X2, ... be random variables with distribution functions F, F1, F2, .... Say that X1 and X2 are identically distributed, written X1 ~ X2, if F1(x) = F2(x) for all x ∈ R. Say that Xn → X in distribution (short: Xn →ᴰ X) if for all continuity points x of F,

    Fn(x) = P(Xn ≤ x) → P(X ≤ x) = F(x)  as  n → ∞ .
Proposition 2.11. Let X, X1, X2, ... be random variables on the probability space (Ω, A, P). Then

    Xn →ᴾ X  ⇒  Xn →ᴰ X ,  and  Xn →ᴰ c ∈ R  ⇒  Xn →ᴾ c .

Proof. The first statement is proved on hand-out 2; the second follows directly from

    P(|Xn − c| > ε) = P(Xn > c + ε) + P(Xn < c − ε) → 0  as  n → ∞, for all ε > 0 . □

Theorem 2.12. Skorohod representation theorem, part 2.
Let X, X1, X2, ... be random variables such that Xn → X in distribution. Then there exists a probability space (Ω, A, P) and random variables Y ~ X, Y1 ~ X1, Y2 ~ X2, ... defined on Ω such that Yn → Y a.s..
Proof. see problem 2.4
Example. Let X1, X2, ... ∈ {0, 1} be independent random variables with

    P(Xn = 0) = 1 − 1/n  and  P(Xn = 1) = 1/n .

Then ∀ ε > 0 : P(|Xn| > ε) = 1/n → 0 as n → ∞, i.e. Xn → 0 in probability.
On the other hand, ∑_n P(Xn = 1) = ∞ and the {Xn = 1} are independent events. Thus with the second Borel–Cantelli lemma

    P(Xn ↛ 0) ≥ P(Xn = 1 i.o.) = 1 ,  and thus  Xn ↛ 0 a.s. .

3 Integration

3.1 Definition and basic properties

Let (E, E, μ) be a measure space.

Theorem 3.1. Let f : E → [0, ∞] be E/B-measurable. Then fn(x) = (⌊f(x)·2ⁿ⌋ / 2ⁿ) ∧ n defines a sequence of E-simple, non-negative functions, such that fn ↗ f pointwise as n → ∞.

Proof. We can write fn(x) = ∑_{k=0}^{2ⁿn} 1_{A_{k,n}}(x) · 2⁻ⁿk, where

    A_{k,n} = f⁻¹([2⁻ⁿk, 2⁻ⁿ(k+1)))  for k < 2ⁿn ,  and  A_{2ⁿn, n} = f⁻¹([n, ∞]) .

Since f is measurable, so are the sets A_{k,n}, and thus fn ∈ S(E) for all n ∈ N.
From the first representation it follows immediately that f_{n+1}(x) ≥ fn(x) for all x ∈ E, and that |fn(x) − f(x)| ≤ 2⁻ⁿ for n ≥ f(x), while fn(x) = n ↗ ∞ if f(x) = ∞. Thus fn ↗ f. □
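The dyadic approximation of Theorem 3.1 is easy to check numerically; the following Python sketch (an illustration, not part of the notes) builds f_n = ⌊2ⁿf⌋/2ⁿ ∧ n and verifies monotone pointwise convergence for a sample function.

```python
import math

def dyadic_approx(f, n):
    """n-th dyadic approximation f_n = min(floor(f * 2^n)/2^n, n) from Theorem 3.1."""
    return lambda x: min(math.floor(f(x) * 2**n) / 2**n, n)

f = lambda x: math.exp(x)            # any measurable f >= 0 will do
for x in [0.0, 0.5, 1.3, 2.0]:
    vals = [dyadic_approx(f, n)(x) for n in range(1, 12)]
    # monotone increasing, and within 2^-n of f(x) once n >= f(x)
    assert all(a <= b for a, b in zip(vals, vals[1:]))
    assert abs(vals[-1] - f(x)) <= 2**-11
```

Monotonicity holds because each dyadic grid refines the previous one and the cut-off level n increases.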
This motivates the following definition.

Definition 3.1. Let f : E → [0, ∞] be an E/B-measurable function. We define the integral of f, written μ(f) = ∫ f dμ = ∫_E f dμ = ∫_E f(x) μ(dx), by

    ∫_E f dμ := sup{ ∫_E g dμ : g ∈ S(E), 0 ≤ g ≤ f } ,

where ∫_E g dμ := ∑_{k=1}^n ck μ(Ak) is the integral of an E-simple function g : E → R with standard representation g(x) = ∑_{k=1}^n ck 1_{Ak}(x). We adopt the convention 0·∞ = ∞·0 = 0.

Remarks. (i) ∫_E g dμ is independent of the representation of the E-simple function g.
(ii) If f, g : E → R are E-simple then: f ≤ g ⇒ ∫_E f dμ ≤ ∫_E g dμ, and

    ∫_E (c1 f + c2 g) dμ = c1 ∫_E f dμ + c2 ∫_E g dμ  for all c1, c2 ∈ R .

Lemma 3.2. Let f : E → [0, ∞] be measurable and (fn)_{n∈N} a sequence in S(E) with 0 ≤ fn ↗ f. Then ∫_E fn dμ ↗ ∫_E f dμ.

Proof. ∫_E fn dμ ≤ ∫_E f dμ for all n ∈ N by definition of ∫_E f dμ. It remains to show that for any E-simple g = ∑_{k=1}^n ak 1_{Ak} ≤ f (with standard representation and ak ≠ 0)

    lim_{n→∞} ∫_E fn dμ ≥ ∫_E g dμ .

Choose ε > 0 and set Bn := {x ∈ E : fn(x) ≥ g(x) − ε}. Then Bn ↗ E, and for any A ∈ E: μ(Bn ∩ A) ↗ μ(A).
Case (i): ∫_E g dμ = ∞, i.e. μ(Ar) = ∞ for some r ∈ {1, ..., n} with ar > 0. Then

    ∫_E fn dμ ≥ ∫ fn 1_{Bn∩Ar} dμ ≥ ∫ (g − ε) 1_{Bn∩Ar} dμ = (ar − ε) μ(Bn ∩ Ar) → ∞

as n → ∞, provided ε < ar.
Case (ii): ∫_E g dμ < ∞; for A = ⋃_{k=1}^n Ak we have μ(A) < ∞. Then

    ∫_E fn dμ ≥ ∫ fn 1_{Bn∩A} dμ ≥ ∫ (g − ε) 1_{Bn∩A} dμ
        = ∫ g 1_{Bn∩A} dμ − ε μ(Bn ∩ A) → ∫_E g dμ − ε μ(A)  as  n → ∞ .

This is true for ε arbitrarily small, and thus lim_{n→∞} ∫_E fn dμ ≥ ∫_E g dμ. □

Definition 3.2. Let f : E → R be a measurable function. Then f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0 are measurable by Proposition 2.6. f is called (μ-)integrable if ∫_E f⁺ dμ < ∞ and ∫_E f⁻ dμ < ∞, and the integral of f is defined as

    ∫_E f dμ = ∫_E f⁺ dμ − ∫_E f⁻ dμ ∈ R .

For random variables X : Ω → R the integral ∫_Ω X dP = E(X) is also called expectation.
For A ∈ E and f integrable, f 1_A is integrable and we write ∫_A f dμ := ∫_E f 1_A dμ.

Remark. The integral can be well defined even if f is not integrable, namely if only one of ∫_E f⁺ dμ = ∞ or ∫_E f⁻ dμ = ∞ holds; it then takes a value ±∞. In particular a measurable function f : E → [0, ∞] is integrable if and only if ∫_E f dμ < ∞.
Theorem 3.3. Basic properties of integration.
Let f, g : E → R be integrable functions on (E, E, μ).
(i) Linearity: f + g and, for any c ∈ R, c·f are integrable with

    ∫_E (f + g) dμ = ∫_E f dμ + ∫_E g dμ ,  ∫_E (c f) dμ = c ∫_E f dμ .

(ii) Monotonicity: f ≤ g ⇒ ∫_E f dμ ≤ ∫_E g dμ.
(iii) If f ≥ 0, then ∫_E f dμ = 0 ⇔ f = 0 a.e..
Let f : E → R be measurable. Then
(iv) f integrable ⇔ |f| integrable, and in this case |∫_E f dμ| ≤ ∫_E |f| dμ.

Proof. (i) If f, g ≥ 0 choose sequences of E-simple functions with 0 ≤ fn ↗ f and 0 ≤ gn ↗ g. Then fn + gn is E-simple for all n ∈ N and

    ∫_E (fn + gn) dμ = ∫_E fn dμ + ∫_E gn dμ .

Since 0 ≤ fn + gn ↗ f + g it follows by Lemma 3.2 that ∫_E (f + g) dμ = ∫_E f dμ + ∫_E g dμ.
For f, g : E → R we have

    (f + g)⁺ − (f + g)⁻ = f⁺ − f⁻ + g⁺ − g⁻  and thus  (f + g)⁺ + f⁻ + g⁻ = (f + g)⁻ + f⁺ + g⁺ .

Since each of the terms is non-negative we also have

    ∫_E (f+g)⁺ dμ + ∫_E f⁻ dμ + ∫_E g⁻ dμ = ∫_E (f+g)⁻ dμ + ∫_E f⁺ dμ + ∫_E g⁺ dμ

and the statement follows by reordering the terms.
For c·f with c ≥ 0 the argument is analogous; for c < 0 use 0 = ∫_E (f − f) dμ = ∫_E f dμ + ∫_E (−f) dμ.
(ii) With f ≤ g, using (i): ∫_E g dμ = ∫_E (g − f) dμ + ∫_E f dμ ≥ ∫_E f dμ, since ∫_E (g − f) dμ ≥ 0.
(iii) Let f ≥ 0 and suppose that μ(f > 0) > 0. Then, since {f > 0} = ⋃_{m∈N} {f ≥ 1/m}, we have μ(f ≥ 1/n) = ε > 0 for some n ∈ N. Thus f ≥ (1/n) 1_{f≥1/n} and ∫_E f dμ ≥ ε/n > 0.
On the other hand, let f = 0 a.e. ⇒ f⁺, f⁻ = 0 a.e. ⇒ ∫_E f⁺ dμ = ∫_E f⁻ dμ = 0 by definition, since ∫_E g dμ = 0 for all E-simple 0 ≤ g ≤ f⁺ (resp. f⁻).
(iv) Follows with |f| = f⁺ + f⁻, ±f ≤ f⁺ + f⁻ = |f|, and (i), (ii). □
Remark. Let f, g : E → R be measurable and f = g a.e.. Then f is integrable if and only if g is integrable, and then ∫_E f dμ = ∫_E g dμ, which is a direct consequence of (iii). In particular, whenever f ≥ 0 a.e. the integral ∫_E f dμ ∈ [0, ∞] is well defined.
Proposition 3.4. Let (E, E, μ) and (F, F, ν) be measure spaces and suppose that ν = μ ∘ f⁻¹ is the image measure of a measurable f : E → F. Then

    ∫_F g dν = ∫_E (g ∘ f) dμ  for all ν-integrable g : F → R .

Remark. In particular, for random variables X with distribution μ_X this leads to the useful formula E(g(X)) = ∫_R g(x) μ_X(dx).

Proof. For g = 1_A, A ∈ F, the identity ν(A) = μ(f⁻¹(A)) is the definition of ν.
The identity extends to all F-simple functions by linearity of integration, then to all measurable g : F → [0, ∞] with Lemma 3.2, using the approximations gn = 2⁻ⁿ⌊2ⁿg⌋ ∧ n, and finally to all integrable g = g⁺ − g⁻ : F → R, again by linearity. □
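The formula E(g(X)) = ∫ g dμ_X can be checked directly in a finite example. The following Python sketch (illustrative, with a made-up toy probability space) computes the expectation once on Ω and once via the image measure μ_X = P ∘ X⁻¹, using exact rational arithmetic.

```python
from fractions import Fraction as F
from collections import defaultdict

# A toy probability space: Omega = {0,...,5}, uniform P, and a random variable X.
Omega = range(6)
P = {w: F(1, 6) for w in Omega}
X = lambda w: (w % 3) - 1          # takes values -1, 0, 1

g = lambda x: x * x + 2 * x

# E[g(X)] computed on Omega ...
lhs = sum(g(X(w)) * P[w] for w in Omega)

# ... and via the image measure mu_X = P o X^{-1} on R
mu_X = defaultdict(F)
for w in Omega:
    mu_X[X(w)] += P[w]
rhs = sum(g(x) * p for x, p in mu_X.items())

assert lhs == rhs
```

Both sums run over different index sets (sample points vs. values) but are rearrangements of the same finite sum, which is exactly the content of Proposition 3.4 for simple g.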


Examples. This notion of integration includes in particular Riemann integrals, as discussed in Section 3.3, but is much more general than the latter.
(i) ∫_E f dδ_y = f(y) for all integrable f : E → R and y ∈ E.
For g = ∑_{k=1}^n ck 1_{Ak} ∈ S(E), ∫_E g dδ_y = ∑_{k=1}^n ck δ_y(Ak) = g(y), since in standard representation δ_y(Ak) = 1 for exactly one k. So for E-simple fn ↗ f : E → [0, ∞]

    ∫_E f dδ_y = lim_{n→∞} ∫_E fn dδ_y = lim_{n→∞} fn(y) = f(y)  by Lemma 3.2 ,

which extends to f : E → R by linearity of integration.
(ii) Let (E, E) = (N, P(N)) and μ be the counting measure. Then for all f : N → [0, ∞)

    ∫_E f dμ = lim_{n→∞} ∫_E f 1_{{1,...,n}} dμ = lim_{n→∞} ∑_{k=1}^n f(k) μ({k}) = ∑_{k=1}^∞ f(k) .

By our definition:  f integrable ⇔ |f| integrable ⇔ ∑_{k=1}^∞ |f(k)| < ∞.
So f(k) = (−1)^{k+1}/k is not integrable, although the partial sums ∑_{k=1}^n (−1)^{k+1}/k converge to ln 2 ("∞ − ∞" is not well defined).
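That absolute summability is the right notion of integrability here can be seen numerically: a conditionally convergent series changes its value under rearrangement, so no order-independent "integral" can be attached to it. A Python illustration (not from the notes; the rearrangement takes two positive terms, then one negative, which is known to converge to (3/2) ln 2):

```python
import math

# Partial sums of sum_{k>=1} (-1)^{k+1}/k approach ln 2 ...
s = sum((-1) ** (k + 1) / k for k in range(1, 200001))
assert abs(s - math.log(2)) < 1e-5

# ... but rearranging the same terms (two positive, then one negative)
# converges to a different limit, (3/2) ln 2.
pos = (1 / k for k in range(1, 10**7, 2))      # 1, 1/3, 1/5, ...
neg = (1 / k for k in range(2, 10**7, 2))      # 1/2, 1/4, ...
t = 0.0
for _ in range(100000):
    t += next(pos) + next(pos) - next(neg)

assert abs(t - 1.5 * math.log(2)) < 1e-2
assert abs(t - s) > 0.3                        # the two limits differ
```

For an integrable f (i.e. ∑|f(k)| < ∞) every rearrangement gives the same sum, matching the order-free definition of ∫ f dμ.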

3.2 Integrals and limits

Let (E, E, μ) be a measure space. We are interested under which conditions fn → f implies ∫_E fn dμ → ∫_E lim fn dμ.

Theorem 3.5. Monotone convergence.
Let f, f1, f2, ... : E → R be measurable with fn ≥ 0 a.e. for all n ∈ N and fn ↗ f a.e.. Then

    ∫_E fn dμ ↗ ∫_E f dμ .

Proof. Suppose first that fn ≥ 0 and fn ↗ f pointwise.
For each n ∈ N let (fnᵏ)_{k∈N} be a sequence of E-simple functions with 0 ≤ fnᵏ ↗ fn as k → ∞, and let gn := max{f1ⁿ, ..., fnⁿ}. Then (gn) is an increasing sequence of E-simple functions with fmⁿ ≤ gn ≤ fn ≤ f for each m ≤ n, n ∈ N. Taking the limit n → ∞ we get

    fm ≤ g ≤ f  for each m ∈ N ,  with  g = lim_{n→∞} gn : E → [0, ∞] .

Taking the limit m → ∞ gives g = f. Hence ∫_E f dμ = lim_{n→∞} ∫_E gn dμ by Lemma 3.2. But ∫_E fmⁿ dμ ≤ ∫_E gn dμ ≤ ∫_E fn dμ for each m ≤ n, n ∈ N, and so with

    n → ∞ :  ∫_E fm dμ ≤ ∫_E f dμ ≤ lim_{n→∞} ∫_E fn dμ ,
    m → ∞ :  lim_{m→∞} ∫_E fm dμ ≤ ∫_E f dμ ≤ lim_{n→∞} ∫_E fn dμ .

Since also ∫_E fn dμ ≤ ∫_E f dμ for all n by monotonicity, both limits equal ∫_E f dμ, i.e. ∫_E fn dμ ↗ ∫_E f dμ.
Now let fn ≥ 0 a.e. and fn ↗ f a.e.. Since N = ⋃_n ({fn < 0} ∪ {fn > f_{n+1}}) ∪ {fn ↛ f} is a countable union of null sets, μ(N) = 0. Then use monotone convergence on Nᶜ to get

    ∫_E fn dμ = ∫_{Nᶜ} fn dμ ↗ ∫_{Nᶜ} f dμ = ∫_E f dμ . □

Lemma 3.6. Fatou's lemma.
Let f1, f2, ... : E → R be measurable functions with fn ≥ 0 a.e. for all n ∈ N. Then

    ∫_E lim inf_{n→∞} fn dμ ≤ lim inf_{n→∞} ∫_E fn dμ .

Proof. As previously, N = ⋃_n {fn < 0} is a null set, so suppose fn ≥ 0 pointwise w.l.o.g.. Let gn := inf_{k≥n} fk. Then the gn are measurable by Proposition 2.6 and gn ↗ lim inf_{n→∞} fn. So, since fn ≥ gn and by monotone convergence,

    ∫_E lim inf_{n→∞} fn dμ = lim_{n→∞} ∫_E gn dμ ≤ lim inf_{n→∞} ∫_E fn dμ ,

which proves the statement. □

Theorem 3.7. Dominated convergence.
Let f, f1, f2, ... : E → R be measurable and g, g1, g2, ... : E → [0, ∞] be integrable with fn → f a.e., gn → g a.e., |fn| ≤ gn a.e. for all n ∈ N and ∫_E gn dμ → ∫_E g dμ < ∞. Then f and the fn are integrable and ∫_E fn dμ → ∫_E f dμ.

Proof. f and the fn are integrable since |fn| ≤ gn and |f| ≤ g a.e., with ∫_E |fn| dμ, ∫_E |f| dμ < ∞.
As before, N = ⋃_n ({|fn| > gn} ∪ {fn ↛ f} ∪ {gn ↛ g}) is a null set which does not affect the integrals, so we assume pointwise validity of the assumptions w.l.o.g..
We have 0 ≤ gn ± fn → g ± f, so lim inf_{n→∞} (gn ± fn) = g ± f. By Fatou's lemma,

    ∫ g dμ + ∫ f dμ = ∫ lim inf_n (gn + fn) dμ ≤ lim inf_n ∫ (gn + fn) dμ = ∫ g dμ + lim inf_n ∫ fn dμ ,
    ∫ g dμ − ∫ f dμ = ∫ lim inf_n (gn − fn) dμ ≤ lim inf_n ∫ (gn − fn) dμ = ∫ g dμ − lim sup_n ∫ fn dμ .

Since ∫ g dμ < ∞ it follows that

    ∫ f dμ ≤ lim inf_n ∫ fn dμ ≤ lim sup_n ∫ fn dμ ≤ ∫ f dμ ,

proving that ∫ fn dμ → ∫ f dμ as n → ∞. □

Remark. If gn = g for all n ∈ N this is Lebesgue's dominated convergence theorem.


Corollary 3.8. Bounded convergence.
Let μ(E) < ∞ and f, f1, f2, ... : E → R be measurable with fn → f a.e. and |fn| ≤ C a.e. for some C ∈ R and all n ∈ N. Then f and the fn are integrable and ∫_E fn dμ → ∫_E f dμ.

Proof. Apply dominated convergence with gn ≡ C, noting that ∫_E g dμ = C μ(E) < ∞. □

The following example shows that the inequality in Lemma 3.6 can be strict, and that domination by an integrable function in Theorem 3.7 is crucial.

Example. On (R, B, λ) with Lebesgue measure take fn = n² 1_{(0,1/n)}.
Then fn → f ≡ 0 pointwise, but ∫ f dλ = 0 < ∫ fn dλ = n → ∞.

Remarks. Equivalent series versions of Theorems 3.5 and 3.7:
(i) Let (fn)_{n∈N}, fn : E → [0, ∞], be measurable. Then

    ∫_E ∑_{n=1}^∞ fn dμ = ∑_{n=1}^∞ ∫_E fn dμ .

(ii) Let (fn)_{n∈N}, fn : E → R, be measurable. If ∑_{n=1}^∞ fn converges and |∑_{k=1}^n fk| ≤ g for all n, where g is integrable, then ∑_{n=1}^∞ fn and the fn are integrable and

    ∫_E ∑_{n=1}^∞ fn dμ = ∑_{n=1}^∞ ∫_E fn dμ .

Definition 3.3. Let μ, μ1, μ2, ... be measures on (R, B). Say that μn converges weakly to μ, written μn ⇒ μ, if

    ∫_R f dμn → ∫_R f dμ  for all f ∈ Cb(R, R), i.e. f : R → R bounded and continuous.

Theorem 3.9. Let X, X1, X2, ... be random variables with distributions μ, μ1, μ2, .... Then

    Xn →ᴰ X  ⇔  μn ⇒ μ  ( ⇔ E(f(Xn)) → E(f(X)) for all f ∈ Cb(R, R), by Prop. 3.4 ) .

Proof. Suppose Xn →ᴰ X. Then by the Skorohod theorem 2.12 there exist Y ~ X and Yn ~ Xn on a common probability space (Ω, A, P) such that Yn → Y a.s., and hence f(Yn) → f(Y) a.s. since f ∈ Cb(R, R) (see also problem 2.6). Thus by bounded convergence

    ∫_R f dμn = ∫_Ω f(Yn) dP → ∫_Ω f(Y) dP = ∫_R f dμ ,  so μn ⇒ μ .

Suppose μn ⇒ μ and let y be a continuity point of F_X. For δ > 0, approximate 1_{(−∞,y]} by

    f_δ(x) = 1_{(−∞,y]}(x) for x ∉ (y, y+δ) ,  f_δ(x) = 1 + (y−x)/δ for x ∈ (y, y+δ) ,

such that ∫_R |1_{(−∞,y]} − f_δ| dμ ≤ ∫_R g_δ dμ, where

    g_δ(x) = 1 + (x−y)/δ for x ∈ (y−δ, y) ,  1 + (y−x)/δ for x ∈ [y, y+δ) ,  0 otherwise .

The same inequality holds for μn for all n ∈ N. Then as n → ∞

    |F_{Xn}(y) − F_X(y)| = | ∫_R 1_{(−∞,y]} dμn − ∫_R 1_{(−∞,y]} dμ |
        ≤ ∫_R g_δ dμn + ∫_R g_δ dμ + | ∫_R f_δ dμn − ∫_R f_δ dμ | → 2 ∫_R g_δ dμ ,

since f_δ, g_δ ∈ Cb(R, R). Now ∫_R g_δ dμ ≤ μ((y−δ, y+δ)) → 0 as δ → 0, since μ({y}) = 0, so Xn →ᴰ X. □

3.3 Integration in R and differentiation

Theorem 3.10. Differentiation under the integral sign.
Let (E, E, μ) be a measure space, U ⊆ R be open and suppose that f : U × E → R satisfies:
(i) x ↦ f(t, x) is integrable for all t ∈ U,
(ii) t ↦ f(t, x) is differentiable for all x ∈ E,
(iii) |∂f/∂t (t, x)| ≤ g(x) for some integrable g : E → R and all x ∈ E, t ∈ U.
Then ∂f/∂t (t, ·) is integrable for all t, and the function F : U → R defined by

    F(t) = ∫_E f(t, x) μ(dx)

is differentiable with

    d/dt F(t) = ∫_E ∂f/∂t (t, x) μ(dx) .

Proof. Take a sequence hn → 0 and set

    gn(t, x) := ( f(t + hn, x) − f(t, x) ) / hn − ∂f/∂t (t, x) .

Then for all x ∈ E, t ∈ U, gn(t, x) → 0 and |gn(t, x)| ≤ 2g(x) for all n ∈ N by the MVT.
∂f/∂t (t, ·) is measurable as the limit of measurable functions, and integrable since |∂f/∂t| ≤ g. Then by dominated convergence, as n → ∞,

    ( F(t + hn) − F(t) ) / hn − ∫_E ∂f/∂t (t, x) μ(dx) = ∫_E gn(t, x) μ(dx) → 0 . □
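Theorem 3.10 can be sanity-checked numerically. The following Python sketch (an illustration, not part of the notes, with f(t, x) = e^{tx} on E = (0, 1) and μ Lebesgue measure) compares a finite-difference derivative of F with the integral of ∂f/∂t, both approximated by midpoint Riemann sums.

```python
import math

def F(t, m=20000):
    # F(t) = integral_0^1 e^{t x} dx, midpoint Riemann sum with m cells
    return sum(math.exp(t * (k + 0.5) / m) for k in range(m)) / m

def dF(t, m=20000):
    # differentiating under the integral: F'(t) = integral_0^1 x e^{t x} dx
    return sum(((k + 0.5) / m) * math.exp(t * (k + 0.5) / m) for k in range(m)) / m

t, h = 0.7, 1e-5
finite_diff = (F(t + h) - F(t - h)) / (2 * h)
assert abs(finite_diff - dF(t)) < 1e-4
```

Here the dominating function of assumption (iii) is g(x) = x e^{Tx} for t restricted to a bounded interval (−T, T), which is integrable on (0, 1).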
Remarks. (i) The integral on R w.r.t. Lebesgue measure is called the Lebesgue integral. We write

    ∫_R f dλ = ∫_R f(x) dx  and  ∫_R f 1_{(a,b]} dλ = ∫_a^b f(x) dx .

(ii) Linearity of the integral then implies

    ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx  for all c ∈ R ,

using the convention ∫_b^a f(x) dx = −∫_a^b f(x) dx.

Theorem 3.11. Fundamental theorem of calculus.
(i) Let f : [a, b] → R be a continuous function and set Fa(t) = ∫_a^t f(x) dx. Then Fa is differentiable on [a, b] with Fa′ = f.
(ii) Let F : [a, b] → R be differentiable with continuous derivative f. Then

    ∫_a^b f(x) dx = F(b) − F(a) .

Proof. (i) Fix t ∈ [a, b). ∀ ε > 0 ∃ δ > 0 : |x − t| < δ ⇒ |f(x) − f(t)| < ε. So for 0 < h < δ,

    | ( Fa(t + h) − Fa(t) ) / h − f(t) | = | (1/h) ∫_t^{t+h} ( f(x) − f(t) ) dx | ≤ (1/h) ∫_t^{t+h} ε dx = ε .

Analogously for negative h and t ∈ (a, b], thus Fa′ = f.
(ii) (F − Fa)′(t) = 0 for all t ∈ (a, b), so by the MVT

    F(b) − F(a) = Fa(b) − Fa(a) = ∫_a^b f(x) dx . □

So the methods of calculating Riemann integrals also apply to Lebesgue integrals.


Proposition 3.12. Partial integration and change of variables.
(i) Let u, v ∈ C¹([a, b], R), i.e. differentiable with continuous derivative. Then

    ∫_a^b u(x) v′(x) dx = u(b) v(b) − u(a) v(a) − ∫_a^b u′(x) v(x) dx .

(ii) Let φ ∈ C¹([a, b], R) be strictly increasing. Then

    ∫_{φ(a)}^{φ(b)} f(y) dy = ∫_a^b f(φ(x)) φ′(x) dx  for all f ∈ C([φ(a), φ(b)], R) .

Proof. see problems 2.12 and 2.13


Definition 3.4. Let (E, E, μ) be a measure space and f : E → [0, ∞) be integrable. We say a measure ν on (E, E) has density f with respect to μ, short ν = f·μ, if

    ν(A) = ∫_A f dμ  for all A ∈ E .

Lemma 3.13. Let (E, E, μ) be a measure space. For every integrable f : E → [0, ∞), ν : A ↦ ∫_A f dμ is a measure on (E, E) with μ-density f, and

    ∫_E g dν = ∫_E f·g dμ  for all ν-integrable g : E → R .

Let μ be a Radon measure on (R, B) with distribution function F ∈ C¹(R, R). Then μ has density f = F′ with respect to Lebesgue measure.

Proof. For the first part see problem 2.15(a).
With Theorems 1.10 and 3.11, μ((a, b]) = F(b) − F(a) = ∫_a^b f(x) dx.
So μ coincides with f·λ on the π-system I ∪ {∅} = {(a, b] : a < b} ∪ {∅} that generates B. Thus by uniqueness of extension, μ = f·λ on B and μ has λ-density f. □


3.4 Product measure and Fubini's theorem

Let (E1, E1, μ1) and (E2, E2, μ2) be finite measure spaces and E = E1 × E2.

Definition 3.5. The product σ-algebra E = E1 ⊗ E2 := σ(A) is generated by the π-system

    A = { A1 × A2 : A1 ∈ E1, A2 ∈ E2 } .

Example. If E1 = E2 = R and E1 = E2 = B then E1 ⊗ E2 = B(R²).

Lemma 3.14. Let f : E → R be E-measurable. Then the following holds:
(i) f(x1, ·) : E2 → R is E2-measurable for all x1 ∈ E1.
(ii) If f is bounded, f1(x1) := ∫_{E2} f(x1, x2) μ2(dx2) is bounded and E1-measurable.

Proof. (i) For fixed x1 ∈ E1 define T_{x1} : E2 → E by T_{x1} x2 = (x1, x2). For A = A1 × A2 ∈ A,

    T_{x1}⁻¹(A) = A2 if x1 ∈ A1 ,  ∅ if x1 ∉ A1 ,

in both cases in E2, and thus with Lemma 2.1 T_{x1} is E2/E-measurable. So f(x1, ·) = f(T_{x1}(·)) is E2/B-measurable with Lemma 2.5.
(ii) By (i) and since f is bounded, f1 is well defined and bounded, since μ2(E2) < ∞.
For f = 1_A, f1(x1) = μ2(T_{x1}⁻¹(A)). Denote D = {A ∈ E : f1 is measurable}, which can be checked to be a d-system. Since f1(x1) = 1_{A1}(x1) μ2(A2) for A = A1 × A2, A ⊆ D and thus E = σ(A) = D with Dynkin's lemma (1.5).
By linearity of integration the statement also holds for non-negative E-simple functions, and by monotone convergence for all bounded measurable f, using

    f1(x1) := ∫_{E2} f⁺(x1, x2) μ2(dx2) − ∫_{E2} f⁻(x1, x2) μ2(dx2) . □

Theorem 3.15. Product measure.
There exists a unique measure μ = μ1 ⊗ μ2 on E, such that

    μ(A1 × A2) = μ1(A1) μ2(A2)  for all A1 ∈ E1 and A2 ∈ E2 ,

defined as μ(A) := ∫_{E1} ( ∫_{E2} 1_A(x1, x2) μ2(dx2) ) μ1(dx1).

Proof. With Lemma 3.14, μ is a well defined function of A. Using monotone convergence, μ can be seen to be countably additive and is thus a measure.
Since 1_{A1×A2} = 1_{A1} 1_{A2}, the above property is fulfilled for all A1 ∈ E1 and A2 ∈ E2.
Since A = {A1 × A2 : Ai ∈ Ei} is a π-system generating E and μ(E) < ∞, μ is uniquely determined by its values on A by Theorem 1.6 (uniqueness of extension). □

Remark. f : E1 × E2 → R is measurable if and only if f̃ : E2 × E1 → R with f̃(x2, x1) = f(x1, x2) is measurable, and for integrable f: ∫_{E2×E1} f̃ d(μ2 ⊗ μ1) = ∫_{E1×E2} f d(μ1 ⊗ μ2).

Theorem 3.16. Fubini's theorem.
(i) Let f : E → [0, ∞] be E-measurable. Then

    ∫_E f dμ = ∫_{E1} ( ∫_{E2} f(x1, x2) μ2(dx2) ) μ1(dx1) = ∫_{E2} ( ∫_{E1} f(x1, x2) μ1(dx1) ) μ2(dx2) ,

taking values in [0, ∞].
(ii) Let f : E → R be E-measurable. If at least one of the following integrals is finite,

    ∫_E |f| dμ ,  ∫_{E1} ( ∫_{E2} |f(x1, x2)| μ2(dx2) ) μ1(dx1) ,  ∫_{E2} ( ∫_{E1} |f(x1, x2)| μ1(dx1) ) μ2(dx2) ,

then all three are finite and f is integrable. Furthermore,
f(x1, ·) is μ2-integrable for μ1-almost all x1 and x1 ↦ ∫_{E2} f(x1, x2) μ2(dx2) is μ1-integrable,
f(·, x2) is μ1-integrable for μ2-almost all x2 and x2 ↦ ∫_{E1} f(x1, x2) μ1(dx1) is μ2-integrable,
and the formula in (i) holds.

Proof. (i) If f = 1_A for some A ∈ E the formula holds by definition of μ and the preceding remark, and can be extended to non-negative measurable f as in the proof of Lemma 3.14(ii).
(ii) Since |f| is non-negative, the formula in (i) holds for |f|, so all three integrals coincide and are finite. By Lemma 3.14, f(x1, ·) is measurable, and it is μ2-integrable for μ1-a.e. x1 since

    ∫_{E2} |f(x1, x2)| μ2(dx2) < ∞  for μ1-a.e. x1 ∈ E1 .

Furthermore x1 ↦ ∫_{E2} f(x1, x2) μ2(dx2) is μ1-integrable, since

    ∫_{E1} | ∫_{E2} f(x1, x2) μ2(dx2) | μ1(dx1) ≤ ∫_{E1} ( ∫_{E2} |f(x1, x2)| μ2(dx2) ) μ1(dx1) = ∫_E |f| dμ < ∞ .

The same follows for f(·, x2), and finally the formula in (i) holds for f⁺, f⁻ and thus for f = f⁺ − f⁻ by linearity. □

Remarks. (i) Product measures and Fubini can be extended to σ-finite measure spaces, i.e. where for i = 1, 2 there exist An ∈ Ei, n ∈ N, with μi(An) < ∞ for all n and Ei = ⋃_n An.
(ii) However, without σ-finiteness Fubini's theorem does in general not hold. Consider e.g. the measure μ(∅) = 0, μ(A) = ∞ for A ≠ ∅ on (R, B). This is not σ-finite, and with Lebesgue measure λ on (R, B) we have

    ∫_R ( ∫_R 1_Q(x + y) λ(dx) ) μ(dy) = 0  whereas  ∫_R ( ∫_R 1_Q(x + y) μ(dy) ) λ(dx) = ∞ .

(iii) The operation of taking products of measure spaces is associative,

    E1 ⊗ E2 ⊗ E3 := (E1 ⊗ E2) ⊗ E3 = E1 ⊗ (E2 ⊗ E3)  (also for measures) .

So products can be taken without specifying the order, e.g. (Rᵈ, B(Rᵈ), λᵈ).


Example. I = ∫_R e^{−x²} dx = √π, since by Fubini's theorem and polar coordinates

    I² = ∫_{R²} e^{−(x²+y²)} dx dy = ∫_0^{2π} ∫_0^∞ e^{−r²} r dr dθ = 2π [ −e^{−r²}/2 ]_0^∞ = π .
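A quick numerical check of this example (illustrative Python, truncating the integral to [−8, 8], where the discarded tail is far below the working precision):

```python
import math

# Midpoint Riemann sum for I = integral of exp(-x^2) over R, truncated to [-8, 8]
m = 200000
a, b = -8.0, 8.0
h = (b - a) / m
I = sum(math.exp(-((a + (k + 0.5) * h) ** 2)) for k in range(m)) * h

assert abs(I - math.sqrt(math.pi)) < 1e-6
assert abs(I * I - math.pi) < 1e-5
```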
Proposition 3.17. Let (E, E, μ) be a σ-finite measure space. Then

    ∫_E f dμ = ∫_0^∞ μ(f ≥ x) dx  for all E/B-measurable f : E → [0, ∞) .

Proof. see problem 2.15(b)

Remark. Together with Proposition 3.4, this consequence of Fubini's theorem is particularly useful to calculate expectations of random variables.


4 Lp-spaces

4.1 Norms and inequalities

Let (E, E, μ) be a measure space.

Theorem 4.1. Chebyshev's inequality.
Let f : E → [0, ∞] be measurable. Then for any λ ≥ 0:

    λ μ(f ≥ λ) ≤ ∫_E f dμ .

Proof. Integrate λ 1_{{f ≥ λ}} ≤ f. □

Example. Let X be a random variable with m = E(X) < ∞. Then take f = |X − m|² to get P(|X − m| ≥ λ) = P(|X − m|² ≥ λ²) ≤ Var(X)/λ² for all λ > 0.
Definition 4.1. For 1 ≤ p ≤ ∞ we denote by Lp = Lp(E, E, μ) the set of measurable functions f : E → R with finite Lp-norm, ‖f‖_p < ∞, where

    ‖f‖_p = ( ∫_E |f|^p dμ )^{1/p}  for p < ∞ ,  ‖f‖_∞ = inf{ λ ≥ 0 : |f| ≤ λ a.e. } .

We say that fn converges to f in Lp, fn →^{Lp} f, if ‖f − fn‖_p → 0 as n → ∞.

Remarks. (i) At the end of this section we will see in what sense ‖.‖_p is a norm on Lp.
(ii) For f ∈ C(R), ‖f‖_∞ = sup_{x∈R} |f(x)|.
(iii) For 1 ≤ p < ∞: ‖f‖_p ≤ μ(E)^{1/p} ‖f‖_∞.
(iv) Let f ∈ Lp, 1 ≤ p < ∞. Then μ(|f| ≥ λ) ≤ (‖f‖_p/λ)^p for all λ > 0 by Chebyshev's inequality. For f ∈ Lp(R) this includes that f essentially tends to zero as |x| → ∞, in the sense that ‖f 1_{{|x|≥y}}‖_p → 0 as y → ∞. For random variables X ∈ Lp the relation P(|X| ≥ λ) = O(λ^{−p}) as λ → ∞ is called a tail estimate (see also problem 3.6).
Definition 4.2. A function f : R → R is convex if, for all x, y ∈ R and t ∈ [0, 1],

    f(t x + (1 − t) y) ≤ t f(x) + (1 − t) f(y) .

Remark. Let f : R → R be convex. Then f is continuous (in particular measurable) and

    ∀ x0 ∈ R ∃ a ∈ R : f(x) ≥ a (x − x0) + f(x0)  for all x ∈ R .

Theorem 4.2. Jensen's inequality.
Let X be an integrable r.v. and f : R → R convex. Then

    E(f(X)) ≥ f(E(X)) .

Proof. With m = E(X) < ∞ choose a ∈ R such that f(X) ≥ a (X − m) + f(m).
In particular E(f(X)⁻) ≤ |a| E(|X − m|) + |f(m)| < ∞, so E(f(X)) ∈ (−∞, ∞] is well defined. Moreover

    E(f(X)) ≥ a E(X − m) + f(m) = f(m) = f(E(X)) . □


Theorem 4.3. Hölder's inequality.
Let p, q ∈ [1, ∞] be conjugate indices, i.e. 1/p + 1/q = 1 or equivalently q = p/(p − 1). Then for all measurable f, g : E → R,

    ∫_E |f g| dμ = ‖f g‖_1 ≤ ‖f‖_p ‖g‖_q .

Equality holds if and only if |f(x)|^p / ‖f‖_p^p = |g(x)|^q / ‖g‖_q^q a.e..

Proof. For p = 1, q = ∞ the result follows with |f g| ≤ |f| ‖g‖_∞ a.e.. If ‖f‖_p, ‖g‖_q = 0 or ∞ the result is trivial, so in the following p, q ∈ (1, ∞) and ‖f‖_p, ‖g‖_q ∈ (0, ∞).
For given 0 < a, b < ∞ let a = e^{s/p}, b = e^{t/q}; by convexity of e^x we get

    e^{s/p + t/q} ≤ e^s/p + e^t/q  and thus  a b ≤ a^p/p + b^q/q  (Young's inequality) .

By strict convexity of e^x equality holds if and only if s = t, i.e. b = a^{p−1}.
Now insert a = |f|/‖f‖_p and b = |g|/‖g‖_q and integrate:

    ‖f g‖_1 / (‖f‖_p ‖g‖_q) ≤ (1/p) ∫ |f|^p dμ / ‖f‖_p^p + (1/q) ∫ |g|^q dμ / ‖g‖_q^q = 1/p + 1/q = 1 ,

hence ‖f g‖_1 ≤ ‖f‖_p ‖g‖_q. After integration, equality holds if and only if b = a^{p−1} a.e., finishing the proof. □
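For sequences (the counting-measure case), Hölder's inequality and its equality case can be checked directly; an illustrative Python sketch with arbitrary sample data:

```python
# Hoelder for sequences: sum |f g| <= ||f||_p ||g||_q with 1/p + 1/q = 1.
f = [1.0, -2.0, 3.0, 0.5]
g = [0.3, 1.5, -1.0, 2.0]
p, q = 3.0, 1.5                     # conjugate indices: 1/3 + 1/1.5 = 1

lhs = sum(abs(a * b) for a, b in zip(f, g))
norm_p = sum(abs(a) ** p for a in f) ** (1 / p)
norm_q = sum(abs(b) ** q for b in g) ** (1 / q)
assert lhs <= norm_p * norm_q + 1e-12

# equality case: |g| proportional to |f|^(p-1), i.e. |f|^p/||f||_p^p = |g|^q/||g||_q^q
g2 = [abs(a) ** (p - 1) for a in f]
lhs2 = sum(abs(a * b) for a, b in zip(f, g2))
norm_q2 = sum(abs(b) ** q for b in g2) ** (1 / q)
assert abs(lhs2 - norm_p * norm_q2) < 1e-9
```

In the equality branch both sides reduce to ∑|f(k)|^p, mirroring b = a^{p−1} in the proof.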
Corollary 4.4. Minkowski's inequality.
For p ∈ [1, ∞] and measurable f, g : E → R we have

    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p .

Remark. This is the triangle inequality for p-norms. For every norm this implies the reverse triangle inequality |‖fn‖ − ‖f‖| ≤ ‖fn − f‖, and thus ‖fn‖ → ‖f‖ whenever ‖fn − f‖ → 0.

Proof. The cases p = 1, ∞ follow directly from the triangle inequality on R, so assume p ∈ (1, ∞) and ‖f‖_p, ‖g‖_p < ∞, ‖f + g‖_p > 0.
Then |f + g|^p ≤ (2 (|f| ∨ |g|))^p ≤ 2^p (|f|^p + |g|^p), so ‖f + g‖_p < ∞.
The result then follows from

    ‖f + g‖_p^p = ∫_E |f + g|^p dμ ≤ ∫_E |f| |f + g|^{p−1} dμ + ∫_E |g| |f + g|^{p−1} dμ
        ≤ ( ‖f‖_p + ‖g‖_p ) ( ∫_E |f + g|^{(p−1)q} dμ )^{1/q}  using Hölder with q = p/(p−1)
        = ( ‖f‖_p + ‖g‖_p ) ‖f + g‖_p^{p−1} ,

dividing by ‖f + g‖_p^{p−1} ∈ (0, ∞). □
Corollary 4.5. Monotonicity of Lp-norms.
Let 1 ≤ p < q ≤ ∞. Then for all f ∈ Lq(μ) we have ‖f‖_p ≤ μ(E)^{1/p − 1/q} ‖f‖_q, which includes Lq(μ) ⊆ Lp(μ) in the case μ(E) < ∞.

Proof. For q = ∞ the result follows from Remark (iii) on the previous page. For q < ∞ apply Hölder with indices p̂ = q/p and q̂ = p̂/(p̂ − 1) = q/(q − p) to get

    ‖f‖_p^p = ∫_E |f|^p · 1 dμ ≤ ( ∫_E |f|^{p p̂} dμ )^{1/p̂} μ(E)^{1/q̂} = ‖f‖_q^p μ(E)^{1 − p/q} .

Since x ↦ x^p is monotone increasing for x ≥ 0 this implies the result. □

Examples. (i) There is no monotonicity of Lp-norms if μ(E) = ∞. Take e.g. f(x) = 1/x on (0, ∞) with Lebesgue measure. Then

    ‖f 1_{[1,∞)}‖_1 = ∞ > 1 = ‖f 1_{[1,∞)}‖_2  and  ‖√f 1_{(0,1)}‖_1 = 2 < ∞ = ‖√f 1_{(0,1)}‖_2 .

(ii) Consider the counting measure μ = ∑_n δ_n on the measurable space (N, P(N)). Then

    ‖f‖_p = ( ∑_{n=1}^∞ |f(n)|^p )^{1/p}  for p < ∞  and  ‖f‖_∞ = sup_{n∈N} |f(n)| .

So Lp(N, μ) = ℓ^p is the space of sequences with finite p-norm.


Definition 4.3. For f, g ∈ 𝓛p we say that g is a version of f if g = f a.e.. This defines an equivalence relation on 𝓛p, and we denote by Lp = 𝓛p/[0] the quotient space of all equivalence classes [f] = {g ∈ 𝓛p : g − f = 0 a.e.}.

Proposition 4.6. (Lp, ‖.‖_p) is a normed vector space.

Proof. If f, g ∈ 𝓛p with f = g a.e. then ‖f‖_p = ‖g‖_p < ∞ by Theorem 3.3, so ‖.‖_p is well defined on Lp. In particular f = 0 a.e. implies ‖f‖_p = 0. Furthermore ‖λf‖_p = |λ| ‖f‖_p for all λ ∈ R by linearity of integration, and ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p by Minkowski's inequality. These properties extend to equivalence classes. In particular [f], [g] ∈ Lp implies that λ[f] = [λf] and [f] + [g] = [f + g] are in Lp, so that Lp is a vector space. □

Remark. In the following we follow the usual abuse of notation and identify Lp with 𝓛p.

Theorem 4.7. Completeness of Lp.
(Lp, ‖.‖_p) is a Banach space, i.e. a complete normed vector space, for every p ∈ [1, ∞].

Proof. The case p = ∞ is left as problem 3.3; in the following p < ∞.
Let (fn)_{n∈N} be a Cauchy sequence in Lp, i.e. ‖fn − fm‖_p → 0 as n, m → ∞. Choose a subsequence (nk)_{k∈N} such that S := ∑_{k=1}^∞ ‖f_{n_{k+1}} − f_{n_k}‖_p < ∞.
By Minkowski's inequality, for any K ∈ N,

    ‖ ∑_{k=1}^K |f_{n_{k+1}} − f_{n_k}| ‖_p ≤ S .

By monotone convergence this bound holds also for K → ∞, so ∑_{k=1}^∞ |f_{n_{k+1}} − f_{n_k}| < ∞ a.e..
So for a.e. x ∈ E, (f_{n_k}(x)) is Cauchy and thus converges by completeness of R. We define

    f(x) := lim_{k→∞} f_{n_k}(x) if the limit exists ,  and f(x) := 0 otherwise .

Given ε > 0, we can find N ∈ N such that ∫ |fn − fm|^p dμ < ε for all m ≥ n ≥ N, and in particular ∫ |fn − f_{n_k}|^p dμ < ε for sufficiently large k. Hence by Fatou's lemma

    ∫ |fn − f|^p dμ = ∫ lim inf_{k→∞} |fn − f_{n_k}|^p dμ ≤ lim inf_{k→∞} ∫ |fn − f_{n_k}|^p dμ ≤ ε  for all n ≥ N .

Hence ‖fn − f‖_p → 0 since ε > 0 was arbitrary, and f ∈ Lp since for n large enough

    ‖f‖_p ≤ ‖f − fn‖_p + ‖fn‖_p ≤ 1 + ‖fn‖_p < ∞ . □

4.2

L2 as a Hilbert space

Let (E, E, ) be a measure space and L2 = L2 (E, E, ).


R
Proposition 4.8. The form h., .i : L2 L2 R with hf, gi = E f g d is an inner
product on L2 , and the innerpproduct space (L2 , h., .i) is a Hilbert space, i.e. complete with
respect to the norm kf k2 = hf, f i.
Proof. By Holders inequality we have for all f, g L2
Z


hf, gi
|f g| d kf k2 kgk2
(Cauchy-Schwarz inequality) .
E

Thus h., .i is finite and well defined on L2 , symmetric by definition and bilinear by linearity of
integration. Further hf, f i = kf k22 0 with equality if and only if f = 0 a.e. and (L2 , k.k2 )
is complete by Theorem 4.7.
2
Proposition 4.9. For f, g L2 we have Pythagoras rule
kf + gk22 = kf k22 + 2 hf, gi + kgk22 ,
and the parallelogram law

kf + gk22 + kf gk22 = 2 kf k22 + kgk22 .
Proof. Follows directly from kf gk22 = hf g, f gi.

Definition 4.4. We say $f, g \in L^2$ are orthogonal if $\langle f, g\rangle = 0$. For $V \subseteq L^2$ we define

$$V^\perp = \big\{ f \in L^2 : \langle f, v\rangle = 0 \text{ for all } v \in V \big\} .$$

$V \subseteq L^2$ is called closed if, for every sequence $(f_n)_{n\in\mathbb{N}}$ in $V$ with $f_n \to f$ in $L^2$, we have $f = v$ a.e. for some $v \in V$.

Remark. For all $V \subseteq L^2$, $V^\perp$ is a closed subspace, since $f, g \in V^\perp$ implies $\lambda_1 f + \lambda_2 g \in V^\perp$ for all $\lambda_1, \lambda_2 \in \mathbb{R}$, and for $(f_n)_{n\in\mathbb{N}}$ in $V^\perp$ with $f_n \to f$ in $L^2$ we have for all $v \in V$

$$|\langle f, v\rangle| = |\langle f - f_n, v\rangle| \le \|f_n - f\|_2 \|v\|_2 \to 0 \quad \text{as } n \to \infty .$$

Theorem 4.10. Orthogonal projection

Let $V$ be a closed subspace of $L^2$. Then each $f \in L^2$ has a decomposition $f = v + u$, with $v \in V$ and $u \in V^\perp$. The decomposition is unique up to a version, and $v$ is called the orthogonal projection of $f$ on $V$. Moreover, $\|f - v\|_2 \le \|f - g\|_2$ for all $g \in V$, with equality iff $g = v$ a.e.

Proof. Uniqueness: Suppose $f = v + u = \tilde v + \tilde u$ a.e. with $v, \tilde v \in V$ and $u, \tilde u \in V^\perp$. Then, with Pythagoras' rule,

$$0 = \|v - \tilde v + u - \tilde u\|_2^2 = \|v - \tilde v\|_2^2 + \|u - \tilde u\|_2^2 \quad\Rightarrow\quad u = \tilde u ,\ v = \tilde v \ \text{a.e.}$$

Existence: Choose a sequence $v_n \in V$ such that, as $n \to \infty$,

$$\|f - v_n\|_2 \to d(f, V) := \inf\big\{ \|f - g\|_2 : g \in V \big\} .$$

By the parallelogram law,

$$\big\| 2\big(f - (v_n + v_m)/2\big) \big\|_2^2 + \|v_n - v_m\|_2^2 = 2\big( \|f - v_n\|_2^2 + \|f - v_m\|_2^2 \big) .$$

But $\big\| 2\big(f - (v_n + v_m)/2\big) \big\|_2^2 \ge 4\, d(f, V)^2$, so we must have $\|v_n - v_m\|_2 \to 0$ as $n, m \to \infty$. By completeness, $\|v_n - g\|_2 \to 0$ for some $g \in L^2$, and by closure $g = v$ a.e. for some $v \in V$. Hence

$$\|f - v\|_2 = \lim_{n\to\infty} \|f - v_n\|_2 = d(f, V) \le \|f - h\|_2 \quad \text{for all } h \in V .$$

In particular, for all $t \in \mathbb{R}$ and $h \in V$, we have

$$d(f, V)^2 \le \big\| f - (v + t h) \big\|_2^2 = d(f, V)^2 - 2t \langle f - v, h\rangle + t^2 \|h\|_2^2 .$$

So we must have $\langle f - v, h\rangle = 0$, and $u = f - v \in V^\perp$, as required. $\Box$
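In the finite-dimensional case $L^2$ reduces to $\mathbb{R}^d$, and the projection of Theorem 4.10 can be computed by least squares; a small sketch (the matrix $A$ and its column span $V$ are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Closed subspace V = column span of a d x k matrix A; project f on V.
d, k = 20, 3
A = rng.normal(size=(d, k))
f = rng.normal(size=d)

# v = orthogonal projection of f on V via least squares; u = f - v.
coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
v = A @ coeffs
u = f - v

# u is orthogonal to every element of V ...
assert np.allclose(A.T @ u, 0)
# ... and v minimises the distance: ||f - v|| <= ||f - g|| for any g in V.
g = A @ rng.normal(size=k)
assert np.linalg.norm(f - v) <= np.linalg.norm(f - g) + 1e-12
```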

Definition 4.5. For $\mathbb{R}$-valued random variables $X, Y \in L^2(\mathbb{P})$ with means $m_X = \mathbb{E}(X)$ and $m_Y = \mathbb{E}(Y)$ we define variance, covariance and correlation by

$$\mathrm{var}(X) = \mathbb{E}\big((X - m_X)^2\big) ,\quad \mathrm{cov}(X, Y) = \mathbb{E}\big((X - m_X)(Y - m_Y)\big) ,$$
$$\mathrm{corr}(X, Y) = \mathrm{cov}(X, Y)\big/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)} .$$

For an $\mathbb{R}^n$-valued random variable $X = (X_1, \ldots, X_n) \in L^2(\mathbb{P})$ (this means that each coordinate $X_i \in L^2(\mathbb{P})$) the variance is given by the covariance matrix

$$\mathrm{var}(X) = \big( \mathrm{cov}(X_i, X_j) \big)_{i,j=1,\ldots,n} .$$

Remarks. (i) $\mathrm{var}(X) = 0$ if and only if $X = m_X$ a.s.

(ii) $\mathrm{cov}(X, Y) = \mathrm{cov}(Y, X)$, $\mathrm{cov}(X, X) = \mathrm{var}(X)$, and if $X$ and $Y$ are independent, then $\mathrm{cov}(X, Y) = 0$.

(iii) By Hölder, $|\mathrm{cov}(X, Y)| \le \sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$ and thus $\mathrm{corr}(X, Y) \in [-1, 1]$.

Proposition 4.11. Every covariance matrix is symmetric and non-negative definite.

Proof. Symmetry holds by definition. For $X = (X_1, \ldots, X_n) \in L^2(\mathbb{P})$ and $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$,

$$a^T \mathrm{var}(X)\, a = \sum_{i,j=1}^n a_i a_j \,\mathrm{cov}(X_i, X_j) = \mathrm{var}(a^T X) \ge 0 ,$$

since $a^T X = \sum_i a_i X_i \in L^2(\mathbb{P})$. $\Box$
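Proposition 4.11 can be illustrated with an empirical covariance matrix; a minimal sketch, where the mixing matrix used to correlate the coordinates is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical covariance matrix of an R^3-valued sample.  By Proposition 4.11
# the true covariance matrix is symmetric and non-negative definite, and the
# empirical estimate shares both properties.
X = rng.normal(size=(10_000, 3)) @ rng.normal(size=(3, 3))  # correlated coordinates
Sigma = np.cov(X, rowvar=False)

assert np.allclose(Sigma, Sigma.T)          # symmetry
eigvals = np.linalg.eigvalsh(Sigma)
assert eigvals.min() >= -1e-10              # non-negative definite (up to rounding)
```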

Revision. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and let $G \in \mathcal{A}$ be some event. For $\mathbb{P}(G) > 0$ the conditional probability $\mathbb{P}(\,\cdot\, | G)$, given by

$$\mathbb{P}(A | G) = \frac{\mathbb{P}(A \cap G)}{\mathbb{P}(G)} \quad \text{for all } A \in \mathcal{A} ,$$

is a probability measure on $(\Omega, \mathcal{A})$.

Definition 4.6. For a random variable $X : \Omega \to \mathbb{R}$ we denote

$$\mathbb{E}(X | G) = \int_\Omega X \,d\mathbb{P}(\,\cdot\,|G) = \int_\Omega X \mathbf{1}_G \,d\mathbb{P} \Big/ \mathbb{P}(G) = \mathbb{E}(X \mathbf{1}_G)/\mathbb{P}(G) ,$$

whenever $\mathbb{P}(G) > 0$, and we set $\mathbb{E}(X | G) = 0$ when $\mathbb{P}(G) = 0$.

Let $(G_i)_{i\in I}$ be a countable family of disjoint events with $\bigcup_i G_i = \Omega$ and $\mathcal{G} = \sigma(G_i : i \in I)$. Then the conditional expectation of a r.v. $X$ given $\mathcal{G}$ is given by

$$\mathbb{E}(X | \mathcal{G}) = \sum_{i\in I} \mathbb{E}(X | G_i)\, \mathbf{1}_{G_i} .$$

Remarks. (i) $\mathbb{E}(X | \mathcal{G})$ is a $\mathcal{G}/\mathcal{B}$-measurable r.v., taking constant values on each $G_i$. In particular, for $\mathcal{G} = \sigma(\emptyset) = \{\emptyset, \Omega\}$, $\mathbb{E}\big(X \,\big|\, \{\emptyset, \Omega\}\big) = \mathbb{E}(X | \Omega)\, \mathbf{1}_\Omega = \mathbb{E}(X)$.

(ii) Every $A \in \mathcal{G}$ is of the form $A = \bigcup_{i\in J} G_i$ for some $J \subseteq I$. Thus

$$\int_A \mathbb{E}(X | \mathcal{G}) \,d\mathbb{P} = \int_A \sum_{i\in I} \frac{\mathbb{E}(X \mathbf{1}_{G_i})}{\mathbb{P}(G_i)}\, \mathbf{1}_{G_i} \,d\mathbb{P} = \sum_{i\in J} \mathbb{E}(X \mathbf{1}_{G_i}) = \int_A X \,d\mathbb{P} .$$

In particular, if $\mathbb{E}(X) < \infty$, $\mathbb{E}(X | \mathcal{G})$ is integrable and $\mathbb{E}\big(\mathbb{E}(X | \mathcal{G})\big) = \mathbb{E}(X)$.

(iii) For a $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{A}$, $L^2(\mathcal{G}, \mathbb{P})$ is complete and therefore a closed subspace of $L^2(\mathcal{A}, \mathbb{P})$. If $X \in L^2(\mathcal{A}, \mathbb{P})$ then $\mathbb{E}(X | \mathcal{G}) \in L^2(\mathcal{G}, \mathbb{P})$.

Proposition 4.12. If $X \in L^2(\mathcal{A}, \mathbb{P})$ then $\mathbb{E}(X | \mathcal{G})$ is a version of the orthogonal projection of $X$ on $L^2(\mathcal{G}, \mathbb{P})$.

Proof. See problem 3.10. $\Box$

Remarks on the general case

(i) For a general $\sigma$-algebra $\mathcal{F} \subseteq \mathcal{A}$ one can show that for every integrable r.v. $X$ there exists an $\mathcal{F}$-measurable, integrable r.v. $Y$ with $\int_F Y \,d\mathbb{P} = \int_F X \,d\mathbb{P}$ for every $F \in \mathcal{F}$. It is unique up to a version, defining the conditional expectation $Y = \mathbb{E}(X | \mathcal{F})$. For $X \in L^2(\mathcal{A}, \mathbb{P})$, $\mathbb{E}(X | \mathcal{F})$ is the orthogonal projection of $X$ on $L^2(\mathcal{F}, \mathbb{P})$.

(ii) If $X$ is $\mathcal{F}$-measurable, $\mathbb{E}(X | \mathcal{F}) = X$. In particular $\mathbb{E}(X | \mathcal{A}) = X$.

(iii) For $\sigma$-algebras $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \mathcal{A}$ we have

$$\mathbb{E}\big( \mathbb{E}(X | \mathcal{F}_2) \,\big|\, \mathcal{F}_1 \big) = \mathbb{E}(X | \mathcal{F}_1) = \mathbb{E}\big( \mathbb{E}(X | \mathcal{F}_1) \,\big|\, \mathcal{F}_2 \big) .$$
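For a finite partition, Definition 4.6 and Remark (ii) can be checked directly on a simulated sample; a small sketch (the choice $X = \sin(2\pi U)$ and the four-block partition of $(0,1]$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Conditional expectation given the sigma-algebra generated by a finite
# partition: E(X | G) is constant on each block G_i, with value
# E(X 1_{G_i}) / P(G_i), here approximated by block averages.
n = 100_000
U = rng.uniform(size=n)          # sample points omega
X = np.sin(2 * np.pi * U)        # a random variable X(omega)
labels = (U * 4).astype(int)     # partition (0,1] into 4 blocks G_0..G_3

cond_exp = np.empty(n)
for i in range(4):
    block = labels == i
    cond_exp[block] = X[block].mean()   # empirical E(X | G_i)

# Averaging property: E(E(X | G)) = E(X), exact up to float rounding.
assert abs(cond_exp.mean() - X.mean()) < 1e-9
```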


4.3  Convergence in $L^1$

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and consider $L^1 = L^1(\Omega, \mathcal{A}, \mathbb{P})$.

By monotonicity of $L^p$-norms, $X_n \xrightarrow{L^p} X$ implies $X_n \xrightarrow{L^q} X$ for all $1 \le q \le p$, so convergence in $L^1$ is the weakest. From problem 3.4 we know that $X_n \xrightarrow{L^1} X$ implies convergence in probability. The converse holds only under additional assumptions.

Theorem 4.13. Bounded convergence

Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random variables with $X_n \to X$ in probability. If in addition $|X_n| \le C$ a.s. for all $n \in \mathbb{N}$ and some $C < \infty$, then $X_n \to X$ in $L^1$.

Proof. By Theorem 2.10(ii), $X$ is the almost sure limit of a subsequence, so $|X| \le C$ a.s. For $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that for all $n \ge N$: $\mathbb{P}\big(|X_n - X| > \epsilon/2\big) \le \epsilon/(4C)$. Then

$$\mathbb{E}|X_n - X| = \mathbb{E}\big( |X_n - X| \mathbf{1}_{|X_n - X| > \epsilon/2} \big) + \mathbb{E}\big( |X_n - X| \mathbf{1}_{|X_n - X| \le \epsilon/2} \big) \le 2C\, \epsilon/(4C) + \epsilon/2 = \epsilon . \qquad\Box$$

Remark. Corollary 3.8 on bounded convergence gives a similar statement under the stronger assumption $X_n \to X$ a.s. Although the assumptions in 4.13 are weaker, they are still not necessary for the conclusion to hold. The main motivation of this section is to provide a necessary and sufficient extra condition such that convergence in probability implies convergence in $L^1$.
Lemma 4.14. For $X \in L^1(\mathcal{A}, \mathbb{P})$ set

$$I_X(\delta) = \sup\big\{ \mathbb{E}(|X| \mathbf{1}_A) : A \in \mathcal{A},\ \mathbb{P}(A) \le \delta \big\} .$$

Then $I_X(\delta) \searrow 0$ as $\delta \searrow 0$.

Proof. Suppose not. Then, for some $\epsilon > 0$, there exist $A_n \in \mathcal{A}$ with $\mathbb{P}(A_n) \le 2^{-n}$ and $\mathbb{E}\big(|X| \mathbf{1}_{A_n}\big) \ge \epsilon$ for all $n \in \mathbb{N}$. By the first Borel--Cantelli lemma, $\mathbb{P}(A_n \text{ i.o.}) = 0$. But then, by dominated convergence,

$$\epsilon \le \mathbb{E}\big( |X| \mathbf{1}_{\bigcup_{m\ge n} A_m} \big) \to \mathbb{E}\big( |X| \mathbf{1}_{\{A_n \text{ i.o.}\}} \big) = 0 \quad \text{as } n \to \infty ,$$

which is a contradiction. $\Box$

Definition 4.7. Let $\mathcal{X}$ be a family of random variables on $(\Omega, \mathcal{A}, \mathbb{P})$. For $1 \le p \le \infty$ we say that $\mathcal{X}$ is uniformly bounded in $L^p$ if $\sup\big\{ \|X\|_p : X \in \mathcal{X} \big\} < \infty$. Define

$$I_{\mathcal{X}}(\delta) = \sup\big\{ \mathbb{E}(|X| \mathbf{1}_A) : X \in \mathcal{X},\ A \in \mathcal{A},\ \mathbb{P}(A) \le \delta \big\} .$$

We say that $\mathcal{X}$ is uniformly integrable (UI) if $\mathcal{X}$ is uniformly bounded in $L^1$ and $I_{\mathcal{X}}(\delta) \searrow 0$ as $\delta \searrow 0$.

Remarks. (i) $\mathcal{X}$ is uniformly bounded in $L^1$ if and only if $I_{\mathcal{X}}(1) = \sup\big\{ \|X\|_1 : X \in \mathcal{X} \big\} < \infty$.

(ii) With Lemma 4.14, any single integrable random variable is UI, which can easily be extended to finitely many.

(iii) If $(\Omega, \mathcal{A}, \mathbb{P}) = \big( (0,1], \mathcal{B}((0,1]), \lambda \big)$ and $I_{\mathcal{X}}(\delta) \searrow 0$, then there exists $\delta > 0$ such that, with $n = \lceil 1/\delta \rceil$,

$$\mathbb{E}|X| = \sum_{k=0}^{n-1} \mathbb{E}\big( |X| \mathbf{1}_{(k/n,(k+1)/n]} \big) \le n\, I_{\mathcal{X}}(\delta) \quad \text{for all } X \in \mathcal{X} ,$$

which implies that $\mathcal{X}$ is uniformly bounded in $L^1$. In general this does not hold.

(iv) Some sufficient conditions: $\mathcal{X}$ is UI if

- there exists $Y \in L^1(\mathcal{A}, \mathbb{P})$ such that $|X| \le Y$ for all $X \in \mathcal{X}$, since then $\mathbb{E}\big(|X| \mathbf{1}_A\big) \le \mathbb{E}\big(Y \mathbf{1}_A\big)$ for all $A \in \mathcal{A}$, and one can use (ii);
- there exists $p > 1$ such that $\mathcal{X}$ is uniformly bounded in $L^p$, since by Hölder, for conjugate indices $p$ and $q < \infty$, $\mathbb{E}\big(|X| \mathbf{1}_A\big) \le \|X\|_p\, \mathbb{P}(A)^{1/q}$.
Example. $X_n = n \mathbf{1}_{(0,1/n)}$ is uniformly bounded in $L^1\big( (0,1], \mathcal{B}((0,1]), \lambda \big)$, but not UI.
Proposition 4.15. A family $\mathcal{X}$ of random variables on $(\Omega, \mathcal{A}, \mathbb{P})$ is UI if and only if

$$\sup\big\{ \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) : X \in \mathcal{X} \big\} \to 0 \quad \text{as } K \to \infty .$$

Proof. Suppose $\mathcal{X}$ is UI. Given $\epsilon > 0$, choose $\delta > 0$ so that $I_{\mathcal{X}}(\delta) < \epsilon$, then choose $K < \infty$ so that $I_{\mathcal{X}}(1) \le \delta K$. Then with $A = \{|X| \ge K\}$ we have $K\,\mathbb{P}(A) \le \|X\|_1 \le I_{\mathcal{X}}(1) \le \delta K$, so that $\mathbb{P}(A) \le \delta$ and $\mathbb{E}\big(|X| \mathbf{1}_A\big) < \epsilon$ for all $X \in \mathcal{X}$. Hence, as $K \to \infty$,

$$\sup\big\{ \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) : X \in \mathcal{X} \big\} \to 0 .$$

On the other hand, if this condition holds, then $I_{\mathcal{X}}(1) < \infty$, since $\mathbb{E}|X| \le K + \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big)$. Given $\epsilon > 0$, choose $K < \infty$ so that $\mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) < \epsilon/2$ for all $X \in \mathcal{X}$. Then choose $\delta > 0$ so that $K\delta < \epsilon/2$. For all $X \in \mathcal{X}$ and $A \in \mathcal{A}$ with $\mathbb{P}(A) < \delta$, we have

$$\mathbb{E}\big( |X| \mathbf{1}_A \big) \le \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) + K\,\mathbb{P}(A) < \epsilon .$$

Hence $\mathcal{X}$ is UI. $\Box$
Theorem 4.16. Let $X_n$, $n \in \mathbb{N}$, and $X$ be random variables. The following are equivalent:

(i) $X_n \in L^1$ for all $n \in \mathbb{N}$, $X \in L^1$ and $X_n \to X$ in $L^1$;

(ii) $\{X_n : n \in \mathbb{N}\}$ is UI and $X_n \to X$ in probability.

Proof. Suppose (i) holds. Then $X_n \to X$ in probability, following problem 3.4. Moreover, given $\epsilon > 0$, there exists $N$ such that $\mathbb{E}|X_n - X| < \epsilon/2$ whenever $n \ge N$. Then we can find $\delta > 0$ so that $\mathbb{P}(A) \le \delta$ implies, using Lemma 4.14,

$$\mathbb{E}\big( |X| \mathbf{1}_A \big) \le \epsilon/2 ,\quad \mathbb{E}\big( |X_n| \mathbf{1}_A \big) \le \epsilon \quad \text{for all } n = 1, \ldots, N .$$

Then, also for $n \ge N$ and $\mathbb{P}(A) \le \delta$, $\mathbb{E}\big( |X_n| \mathbf{1}_A \big) \le \mathbb{E}|X_n - X| + \mathbb{E}\big( |X| \mathbf{1}_A \big) \le \epsilon$. Hence $\{X_n : n \in \mathbb{N}\}$ is UI, and we have shown that (i) implies (ii).

Now suppose that (ii) holds. Then there is a subsequence $(n_k)$ such that $X_{n_k} \to X$ a.s. So, by Fatou's lemma, $\mathbb{E}|X| \le \liminf_k \mathbb{E}|X_{n_k}| < \infty$. Now with Proposition 4.15, given $\epsilon > 0$, there exists $K < \infty$ such that, for all $n$,

$$\mathbb{E}\big( |X_n| \mathbf{1}_{|X_n|\ge K} \big) < \epsilon/3 ,\quad \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) < \epsilon/3 .$$

Consider the uniformly bounded sequence $X_n^K = (-K) \vee X_n \wedge K$ and set $X^K = (-K) \vee X \wedge K$. Then $X_n^K \to X^K$ in probability so, by bounded convergence, there exists $N \in \mathbb{N}$ such that, for all $n \ge N$, $\mathbb{E}\big| X_n^K - X^K \big| < \epsilon/3$. But then, for all $n \ge N$,

$$\mathbb{E}|X_n - X| \le \mathbb{E}\big( |X_n| \mathbf{1}_{|X_n|\ge K} \big) + \mathbb{E}\big| X_n^K - X^K \big| + \mathbb{E}\big( |X| \mathbf{1}_{|X|\ge K} \big) < \epsilon .$$

Since $\epsilon > 0$ was arbitrary, we have shown that (ii) implies (i). $\Box$


5  Characteristic functions and Gaussian random variables

5.1  Definitions

Definition 5.1. For a finite measure $\mu$ on $\big( \mathbb{R}^n, \mathcal{B}(\mathbb{R}^n) \big)$, define the Fourier transform $\hat\mu : \mathbb{R}^n \to \mathbb{C}$ by

$$\hat\mu(u) = \int_{\mathbb{R}^n} e^{i\langle u, x\rangle} \,\mu(dx) ,\quad \text{for all } u \in \mathbb{R}^n .$$

Here $\langle u, x\rangle = \sum_{i=1}^n u_i x_i$ denotes the usual inner product on $\mathbb{R}^n$.

For a random variable $X$ in $\mathbb{R}^n$, the characteristic function $\phi_X : \mathbb{R}^n \to \mathbb{C}$ is given by

$$\phi_X(u) = \mathbb{E}\big( e^{i\langle u, X\rangle} \big) ,\quad \text{for all } u \in \mathbb{R}^n .$$

Thus $\phi_X = \hat\mu_X$, where $\mu_X$ is the distribution of $X$ in $\mathbb{R}^n$.

Remark. For measurability of $f : \mathbb{R} \to \mathbb{C}$ identify $f = (\mathrm{Re}f, \mathrm{Im}f) \in \mathbb{R}^2$ and use $\mathcal{B}(\mathbb{R}^2)$. The integral of such functions is to be understood as

$$\int_{\mathbb{R}^n} f \,d\mu = \int_{\mathbb{R}^n} \mathrm{Re}f \,d\mu + i \int_{\mathbb{R}^n} \mathrm{Im}f \,d\mu .$$

Since $e^{ix} = \cos x + i \sin x$ has bounded real and imaginary part, it is integrable with respect to every finite measure. Thus $\hat\mu(u)$ and $\phi_X(u)$ are well defined for all $u \in \mathbb{R}^n$ (in contrast to moment generating functions $M_X$, see problem 3.13).
Definition 5.2. A random variable $X$ in $\mathbb{R}^n$ is called standard Gaussian if

$$\mathbb{P}(X \in A) = \frac{1}{(2\pi)^{n/2}} \int_A e^{-|x|^2/2} \,dx ,\quad \text{for all } A \in \mathcal{B}(\mathbb{R}^n) .$$

Example. For a standard Gaussian random variable $X$ in $\mathbb{R}$,

$$\phi_X(u) = \int_{\mathbb{R}} e^{iux}\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \,dx = e^{-u^2/2}\, I ,\quad \text{where } I = \int_{\mathbb{R}} \frac{e^{-(x-iu)^2/2}}{\sqrt{2\pi}} \,dx .$$

$I$ can be evaluated by considering the complex integral $\oint_\gamma e^{-z^2/2} \,dz$ around the rectangular contour $\gamma$ with corners $-R$, $R$, $R - iu$, $-R - iu$. Since $e^{-z^2/2}$ is analytic, the integral vanishes by Cauchy's theorem for every $R > 0$. In the limit $R \to \infty$, the contributions from the vertical sides of $\gamma$ also vanish, and thus

$$I = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-x^2/2} \,dx = 1 \quad\Rightarrow\quad \phi_X(u) = e^{-u^2/2} .$$
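The identity $\phi_X(u) = e^{-u^2/2}$ can be checked by Monte Carlo, estimating $\mathbb{E}(e^{iuX})$ by an empirical average; a small sketch (sample size and tolerance are ad hoc choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of phi_X(u) = exp(-u^2/2) for standard Gaussian X.
X = rng.standard_normal(1_000_000)
for u in (0.5, 1.0, 2.0):
    phi_hat = np.exp(1j * u * X).mean()   # empirical E(e^{iuX})
    assert abs(phi_hat - np.exp(-u**2 / 2)) < 5e-3
```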
In the next subsection we will also make use of the following.

Definition 5.3. For $t > 0$ and $x, y \in \mathbb{R}^n$ we define the heat kernel

$$p(t, x, y) = \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/(2t)} \in \mathbb{R} .$$

Remark. From the previous calculation we have $e^{-w^2/2} = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{iwu} e^{-u^2/2} \,du$. With $w = (x-y)/\sqrt{t}$ and the change of variable $v = u/\sqrt{t}$, we deduce for $n = 1$

$$p(t, x, y) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{ixv}\, e^{-v^2 t/2}\, e^{-iyv} \,dv .$$

For $n \ge 1$ we obtain analogously

$$p(t, x, y) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i\langle x, v\rangle}\, e^{-|v|^2 t/2}\, e^{-i\langle y, v\rangle} \,dv .$$

5.2  Properties of characteristic functions

The characteristic function of a random variable uniquely determines its distribution.

Theorem 5.1. Uniqueness and inversion

Let $X$ be a random variable in $\mathbb{R}^n$. The law $\mu_X$ of $X$ is uniquely determined by its characteristic function $\phi_X$. Moreover, if $\phi_X$ is integrable, then $X$ has density function $f_X$, with

$$f_X(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\, e^{-i\langle u, x\rangle} \,du .$$

Remark. The above formula is also called the inverse Fourier transformation.

Proof. Let $Y$ be a standard Gaussian r.v. in $\mathbb{R}^n$, independent of $X$, and let $g : \mathbb{R}^n \to \mathbb{R}$ be a bounded Borel function. Then, for $t > 0$, by the change of variable $y' = x + \sqrt{t}\, y$ and Fubini,

$$\mathbb{E}\big( g(X + \sqrt{t}\, Y) \big) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} g(x + \sqrt{t}\, y)\, (2\pi)^{-n/2} e^{-|y|^2/2} \,dy\, \mu_X(dx)$$
$$= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} p(t, x, y')\, g(y') \,dy'\, \mu_X(dx)$$
$$= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \Big( \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i\langle u, x\rangle}\, e^{-|u|^2 t/2}\, e^{-i\langle u, y\rangle} \,du \Big)\, \mu_X(dx)\, g(y) \,dy$$
$$= \int_{\mathbb{R}^n} \Big( \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\, e^{-|u|^2 t/2}\, e^{-i\langle u, y\rangle} \,du \Big)\, g(y) \,dy .$$

By this formula, $\phi_X$ determines $\mathbb{E}\big( g(X + \sqrt{t}\, Y) \big)$. For any bounded continuous $g$, we have

$$\mathbb{E}\big( g(X + \sqrt{t}\, Y) \big) \to \mathbb{E}\big( g(X) \big) \quad \text{as } t \searrow 0 ,$$

so $\phi_X$ determines $\mathbb{E}\big( g(X) \big)$. Hence $\phi_X$ determines $\mu_X$ due to problem 4.1.

If $\phi_X$ is integrable and $g$ is continuous and bounded, then

$$(u, y) \mapsto \phi_X(u)\, g(y) \in L^1(du \times dy) .$$

So, by dominated convergence, as $t \searrow 0$, the last integral above converges to

$$\int_{\mathbb{R}^n} \Big( \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\, e^{-i\langle u, y\rangle} \,du \Big)\, g(y) \,dy .$$

Hence $X$ has the claimed density function. $\Box$

Remark. Let $X, Y$ be independent r.v.s in $\mathbb{R}^n$. Then the characteristic function of the sum is

$$\phi_{X+Y}(u) = \mathbb{E}\big( e^{i\langle u, X+Y\rangle} \big) = \mathbb{E}\big( e^{i\langle u, X\rangle} \big)\, \mathbb{E}\big( e^{i\langle u, Y\rangle} \big) = \phi_X(u)\, \phi_Y(u) .$$

The next result shows that independence of r.v.s is equivalent to factorisation of the joint characteristic function.

Theorem 5.2. Let $X = (X_1, \ldots, X_n)$ be a r.v. in $\mathbb{R}^n$. Then the following are equivalent:

(i) $X_1, \ldots, X_n$ are independent;

(ii) $\mu_X = \mu_{X_1} \otimes \cdots \otimes \mu_{X_n}$;

(iii) $\mathbb{E}\Big( \prod_{k=1}^n f_k(X_k) \Big) = \prod_{k=1}^n \mathbb{E}\big( f_k(X_k) \big)$, for all bounded Borel functions $f_1, \ldots, f_n$;

(iv) $\phi_X(u) = \prod_{k=1}^n \phi_{X_k}(u_k)$, for all $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$.

Proof. If (i) holds, $\mu_X(A_1 \times \cdots \times A_n) = \prod_k \mu_{X_k}(A_k)$ for all Borel sets $A_1, \ldots, A_n$. So (ii) holds, since this formula characterises the product measure by Theorem 3.15.

If (ii) holds, then, for $f_1, \ldots, f_n$ bounded Borel,

$$\mathbb{E}\Big( \prod_k f_k(X_k) \Big) = \int_{\mathbb{R}^n} \prod_k f_k(x_k)\, \mu_X(dx) = \prod_k \int_{\mathbb{R}} f_k(x_k)\, \mu_{X_k}(dx_k) = \prod_k \mathbb{E}\big( f_k(X_k) \big) ,$$

so (iii) holds. Statement (iv) is a special case of (iii), with $f_k(x_k) = e^{i u_k x_k}$.

Suppose, finally, that (iv) holds and take independent r.v.s $\tilde X_1, \ldots, \tilde X_n$ with $\mu_{\tilde X_k} = \mu_{X_k}$ for all $k$. Then $\phi_{\tilde X_k} = \phi_{X_k}$, and we know that (i) implies (iv) for $\tilde X = (\tilde X_1, \ldots, \tilde X_n)$, so

$$\phi_{\tilde X}(u) = \prod_k \phi_{\tilde X_k}(u_k) = \prod_k \phi_{X_k}(u_k) = \phi_X(u) ,$$

and $\mu_{\tilde X} = \mu_X$ by uniqueness of characteristic functions. Hence (i) holds. $\Box$

5.3  Gaussian random variables

Definition 5.4. A random variable $X$ in $\mathbb{R}$ is Gaussian if it has density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

for some $\mu \in \mathbb{R}$ and $\sigma^2 \in (0, \infty)$. We write $X \sim N(\mu, \sigma^2)$. We also admit as Gaussian the degenerate case $X = \mu$ a.s., corresponding to taking $\sigma^2 = 0$.

Proposition 5.3. Suppose $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R}$. Then

(i) $\mathbb{E}(X) = \mu$,  (ii) $\mathrm{var}(X) = \sigma^2$,  (iii) $aX + b \sim N(a\mu + b, a^2\sigma^2)$,  (iv) $\phi_X(u) = e^{iu\mu - u^2\sigma^2/2}$.

Proof. See problem 4.7. $\Box$


Definition 5.5. A random variable $X$ in $\mathbb{R}^n$ is Gaussian if $\langle u, X\rangle$ is Gaussian for all $u \in \mathbb{R}^n$.

Examples. (i) If $X = (X_1, \ldots, X_n)$ is Gaussian, then in particular each $X_i$ is Gaussian.

(ii) Let $X_1, \ldots, X_n$ be independent $N(0,1)$ random variables. Then $X = (X_1, \ldots, X_n)$ is Gaussian, since for all $u \in \mathbb{R}^n$

$$\mathbb{E}\big( e^{iv\langle u, X\rangle} \big) = \mathbb{E}\Big( \prod_{k=1}^n e^{iv u_k X_k} \Big) = e^{-v^2 |u|^2/2} ,\quad \text{for all } v \in \mathbb{R} .$$

Thus $\langle u, X\rangle \sim N(0, |u|^2)$ by uniqueness of characteristic functions.

Remark. Let $X$ be a random variable in $\mathbb{R}^n$. Then the covariance matrix

$$\Sigma = \mathrm{var}(X) = \big( \mathrm{cov}(X_i, X_j) \big)_{i,j=1,\ldots,n} = \mathbb{E}\big( (X - \mathbb{E}(X))(X - \mathbb{E}(X))^T \big)$$

is symmetric and non-negative definite by Proposition 4.11. Thus $\Sigma$ has $n$ real eigenvalues $\lambda_i \ge 0$ and the eigenvectors $v_i$ form an orthonormal basis of $\mathbb{R}^n$, i.e. $\langle v_i, v_j\rangle = \delta_{i,j}$. So

$$x = \sum_{i=1}^n \langle v_i, x\rangle\, v_i = \sum_{i=1}^n (v_i^T x)\, v_i \quad\text{and}\quad \Sigma x = \sum_{i=1}^n \lambda_i\, v_i\, (v_i^T x) \quad \text{for all } x \in \mathbb{R}^n .$$

So we can write $\Sigma = \sum_{i=1}^n \lambda_i\, v_i v_i^T$, and we define

$$\Sigma^{1/2} := \sum_{i=1}^n \sqrt{\lambda_i}\, v_i v_i^T .$$

$P_i = v_i v_i^T \in \mathbb{R}^{n\times n}$ is the projection on the one-dimensional eigenspace corresponding to the eigenvalue $\lambda_i$. Since $P_i P_j = \delta_{i,j} P_i$, it follows that $\big( \Sigma^{1/2} \big)^2 = \Sigma$.

Theorem 5.4. Let $X : \Omega \to \mathbb{R}^n$ be a Gaussian random variable. Then

(i) $AX + b$ is Gaussian, for all $A \in \mathbb{R}^{n\times n}$ and all $b \in \mathbb{R}^n$;

(ii) $X \in L^2(\mathbb{P})$ (coordinatewise) and its distribution is determined by the mean $\mu = \mathbb{E}(X) \in \mathbb{R}^n$ and the covariance matrix $\Sigma = \mathrm{var}(X) \in \mathbb{R}^{n\times n}$; we write $X \sim N(\mu, \Sigma)$;

(iii) $\phi_X(u) = e^{i\langle u, \mu\rangle - \langle u, \Sigma u\rangle/2}$;

(iv) if $\Sigma$ is invertible, then $X$ has a density function on $\mathbb{R}^n$, given by

$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n \det\Sigma}}\, \exp\Big( -\big\langle x - \mu,\, \Sigma^{-1}(x - \mu) \big\rangle/2 \Big) ;$$

(v) if $X = (Y, Z)$, with $Y$ in $\mathbb{R}^m$ and $Z$ in $\mathbb{R}^p$ ($m + p = n$), then the block structure

$$\mathrm{var}(X) = \begin{pmatrix} \mathrm{var}(Y) & 0 \\ 0 & \mathrm{var}(Z) \end{pmatrix}$$

implies that $Y$ and $Z$ are independent.

Proof. We use $\langle u, v\rangle = u^T v$ and $(Av)^T = v^T A^T$ for all $u, v \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$.

(i) For all $u \in \mathbb{R}^n$, $\langle u, AX + b\rangle = \langle A^T u, X\rangle + \langle u, b\rangle$ is Gaussian, by Proposition 5.3.

(ii),(iii) Each $X_k$ is Gaussian, so $X \in L^2$. For all $u \in \mathbb{R}^n$ we have $\mathbb{E}\big( \langle u, X\rangle \big) = \langle u, \mu\rangle$ and

$$\mathrm{var}\big( \langle u, X\rangle \big) = \mathbb{E}\big( u^T (X - \mu)(X - \mu)^T u \big) = \big\langle u,\, \mathbb{E}\big( (X - \mu)(X - \mu)^T \big)\, u \big\rangle = \langle u, \Sigma u\rangle .$$

Since $\langle u, X\rangle$ is Gaussian, by Proposition 5.3 we must have $\langle u, X\rangle \sim N\big( \langle u, \mu\rangle, \langle u, \Sigma u\rangle \big)$ and

$$\phi_X(u) = \mathbb{E}\big( e^{i\langle u, X\rangle} \big) = \phi_{\langle u, X\rangle}(1) = e^{i\langle u, \mu\rangle - \langle u, \Sigma u\rangle/2} .$$

This is (iii), and (ii) follows by uniqueness of characteristic functions.

(iv) Let $Y_1, \ldots, Y_n$ be independent $N(0,1)$ r.v.s. Then $Y = (Y_1, \ldots, Y_n)$ has density

$$f_Y(y) = \frac{1}{\sqrt{(2\pi)^n}}\, e^{-|y|^2/2} .$$

Set $\tilde X = \Sigma^{1/2} Y + \mu$; then $\tilde X$ is Gaussian, with $\mathbb{E}(\tilde X) = \mu$ and $\mathrm{var}(\tilde X) = \Sigma$, since

$$\mathrm{cov}(\tilde X_i, \tilde X_j) = \mathbb{E}\big( (\Sigma^{1/2} Y)_i\, (\Sigma^{1/2} Y)_j \big) = \sum_{k,l=1}^n \Sigma^{1/2}_{ik}\, \Sigma^{1/2}_{jl}\, \mathbb{E}(Y_k Y_l) = \Sigma_{ij}$$

due to $\mathbb{E}(Y_k Y_l) = \delta_{k,l}$. So $\tilde X \sim X$. If $\Sigma$ is invertible, then $X$ has the density claimed in (iv), by the linear change of variables $y = \Sigma^{-1/2}(x - \mu)$ leading to

$$|y|^2 = \langle y, y\rangle = \big\langle x - \mu,\, \Sigma^{-1}(x - \mu) \big\rangle \quad\text{and}\quad d^n y = \big| \det \Sigma^{-1/2} \big|\, d^n x = \frac{d^n x}{\sqrt{\det\Sigma}} .$$

(v) Finally, if $X = (Y, Z)$ and $\Sigma = \mathrm{var}(X)$ has the block structure given in (v), then for all $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^p$,

$$\big\langle (v, w),\, \Sigma (v, w) \big\rangle = \langle v, \Sigma_Y v\rangle + \langle w, \Sigma_Z w\rangle ,\quad \text{where } \Sigma_Y = \mathrm{var}(Y) \text{ and } \Sigma_Z = \mathrm{var}(Z) .$$

With $\mu = (\mu_Y, \mu_Z)$, the joint characteristic function $\phi_X$ then splits into a product

$$\phi_X(v, w) = e^{i\langle v, \mu_Y\rangle - \langle v, \Sigma_Y v\rangle/2}\, e^{i\langle w, \mu_Z\rangle - \langle w, \Sigma_Z w\rangle/2} ,$$

so $Y$ and $Z$ are independent by Theorem 5.2. $\Box$

Remarks. Let $X = (X_1, \ldots, X_n) \sim N(\mu, \Sigma)$ be Gaussian.

(i) $X_1, \ldots, X_n$ are independent if and only if $\Sigma$ is a diagonal matrix.

(ii) If $\Sigma$ is invertible, the coordinates of $Y = \Sigma^{-1/2}(X - \mu) \sim N(0, I_n)$ are independent $N(0,1)$.
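The construction $\tilde X = \Sigma^{1/2} Y + \mu$ from the proof of Theorem 5.4(iv) can be simulated directly, building $\Sigma^{1/2}$ from the spectral decomposition as in the remark before the theorem; a sketch with an arbitrary positive definite $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(5)

# Target mean and an arbitrary symmetric positive definite covariance matrix.
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)

# Sigma^{1/2} via the spectral decomposition Sigma = V diag(lambda) V^T.
eigvals, V = np.linalg.eigh(Sigma)
sqrt_Sigma = V @ np.diag(np.sqrt(eigvals)) @ V.T
assert np.allclose(sqrt_Sigma @ sqrt_Sigma, Sigma)   # (Sigma^{1/2})^2 = Sigma

# X = Sigma^{1/2} Y + mu with Y standard Gaussian gives X ~ N(mu, Sigma).
Y = rng.standard_normal((200_000, 3))
X = Y @ sqrt_Sigma + mu        # sqrt_Sigma is symmetric, so no transpose needed

assert np.allclose(X.mean(axis=0), mu, atol=0.05)
assert np.allclose(np.cov(X, rowvar=False), Sigma, atol=0.15)
```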


6  Ergodic theory and sums of random variables

6.1  Motivation

Theorem 6.1. Strong law of large numbers

Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of independent random variables such that, for some constants $\mu \in \mathbb{R}$, $M > 0$,

$$\mathbb{E}(X_n) = \mu ,\quad \mathbb{E}(X_n^4) \le M ,\quad \text{for all } n \in \mathbb{N} .$$

Then, with $S_n = \sum_{i=1}^n X_i$, we have $S_n/n \to \mu$ a.s. as $n \to \infty$.

Proof. For $Y_n := X_n - \mu$ we have

$$Y_n^4 \le \big( |X_n| + |\mu| \big)^4 \le \big( 2 \max\{ |X_n|, |\mu| \} \big)^4 \le 16\big( |X_n|^4 + |\mu|^4 \big) ,$$

and thus $\mathbb{E}(Y_n^4) \le 16(M + \mu^4) =: \tilde M < \infty$ for all $n \in \mathbb{N}$. With $Y_n^4$ also $Y_n$, $Y_n^2$ and $Y_n^3$ are integrable, and by independence and $\mathbb{E}(Y_n) = 0$,

$$\mathbb{E}\big( Y_i Y_j^3 \big) = \mathbb{E}\big( Y_i Y_j Y_k^2 \big) = \mathbb{E}\big( Y_i Y_j Y_k Y_l \big) = 0$$

for distinct indices $i, j, k, l$. Hence

$$\mathbb{E}\big( (S_n - n\mu)^4 \big) = \mathbb{E}\Big( \sum_{i,j,k,l} Y_i Y_j Y_k Y_l \Big) = \mathbb{E}\Big( \sum_i Y_i^4 + 6 \sum_{i<j} Y_i^2 Y_j^2 \Big) ,$$

and by Jensen's inequality $\big( \mathbb{E}(Y_i^2) \big)^2 \le \mathbb{E}\big( (Y_i^2)^2 \big) = \mathbb{E}(Y_i^4) \le \tilde M$, so using independence

$$\mathbb{E}\big( (S_n - n\mu)^4 \big) \le n \tilde M + 6\, \frac{n(n-1)}{2}\, \tilde M \le 3 n^2 \tilde M .$$

Thus $\mathbb{E}\Big( \sum_n (S_n/n - \mu)^4 \Big) \le 3 \tilde M \sum_n 1/n^2 < \infty$ by monotone convergence. Therefore

$$\sum_n (S_n/n - \mu)^4 < \infty \ \text{a.s.} \quad\text{and thus}\quad S_n/n \to \mu \ \text{a.s.} \qquad\Box$$

Intuitively, the above result should also hold without the restrictive assumption on the fourth
moment of the random variables. One goal of this chapter is in fact to prove the above statement
with a much weaker assumption. For this purpose it is convenient to use a different approach,
leading to ergodic theory which is introduced in the next two sections.

6.2  Measure-preserving transformations

Let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space and $(F, \mathcal{F})$ a measurable space.

Definition 6.1. Let $\Theta : E \to E$ be measurable. $A \in \mathcal{E}$ is called invariant (under $\Theta$) if $\Theta^{-1}(A) = A$. A measurable function $f : E \to F$ is called invariant (under $\Theta$) if $f \circ \Theta = f$.

Remarks. (i) For general $A \in \mathcal{E}$ we have $\Theta\big( \Theta^{-1}(A) \big) \subseteq A \subseteq \Theta^{-1}\big( \Theta(A) \big)$ (where the first and second inclusion become equalities if $\Theta$ is surjective and injective, respectively). If $A$ is invariant, then also $\Theta(A) \subseteq A$ (motivating the definition), since in addition to the general relation we have $\Theta(A) = \Theta\big( \Theta^{-1}(A) \big) \subseteq A$.

(ii) $\mathcal{E}_\Theta := \big\{ A \in \mathcal{E} : \Theta^{-1}(A) = A \big\}$ is a $\sigma$-algebra, since pre-images preserve set operations.

(iii) $A \in \mathcal{E}$ is invariant $\Leftrightarrow$ $\mathbf{1}_A$ is invariant, since $\mathbf{1}_A \circ \Theta = \mathbf{1}_{\Theta^{-1}(A)}$.

(iv) $f : E \to F$ is invariant $\Leftrightarrow$ for all $B \in \mathcal{F}$: $f^{-1}(B) = \Theta^{-1}\big( f^{-1}(B) \big)$ $\Leftrightarrow$ for all $B \in \mathcal{F}$: $f^{-1}(B) \in \mathcal{E}_\Theta$, i.e. $f$ is $\mathcal{E}_\Theta$-measurable.

Definition 6.2. A measurable function $\Theta : E \to E$ is called measure-preserving if

$$\mu\big( \Theta^{-1}(A) \big) = \mu(A) \quad \text{for all } A \in \mathcal{E} .$$

Such $\Theta$ is ergodic if $\mathcal{E}_\Theta$ is trivial, i.e. contains only sets of measure $0$ and their complements.

Examples. (i) The constant function $\Theta(x) = c \in E$ is in general not measure-preserving. The identity $\Theta(x) = x$ is measure-preserving, but not ergodic, since $\mathcal{E}_\Theta = \mathcal{E}$.

(ii) Translation map on the torus. Take $E = [0,1)^n$ with Lebesgue measure, and for $a \in E$ set

$$\Theta_a(x_1, \ldots, x_n) = (x_1 + a_1, \ldots, x_n + a_n) \quad \text{with addition modulo } 1 .$$

In problem 4.10 it is shown for $n = 1$ that $\Theta_a$ is measure-preserving, and also ergodic if and only if $a$ is irrational.

(iii) Baker's map. Take $E = (0,1]$ with Lebesgue measure and set $\Theta(x) = 2x - \lfloor 2x \rfloor$. In problem 4.11 it is shown that $\Theta$ is measure-preserving and ergodic.
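That the baker's map preserves Lebesgue measure can be checked empirically, which is of course no substitute for the proof in problem 4.11; a small Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

# The baker's (doubling) map theta(x) = 2x mod 1 preserves Lebesgue measure:
# if U is uniform on (0,1), then theta(U) is uniform again, so
# P(theta(U) in (a,b]) should be close to b - a for any interval.
theta = lambda x: (2 * x) % 1.0

U = rng.uniform(size=1_000_000)
TU = theta(U)
for a, b in [(0.0, 0.5), (0.2, 0.7), (0.9, 1.0)]:
    p_hat = np.mean((TU > a) & (TU <= b))
    assert abs(p_hat - (b - a)) < 5e-3
```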
Proposition 6.2. If $f : E \to \mathbb{R}$ is integrable and $\Theta : E \to E$ is measure-preserving, then $f \circ \Theta$ is integrable and

$$\int_E f \circ \Theta \,d\mu = \int_E f \,d\mu .$$

Proof. For $f = \mathbf{1}_A$, $A \in \mathcal{E}$, the statement reduces to $\mu(A) = \mu\big( \Theta^{-1}(A) \big)$, which holds since $\Theta$ is measure-preserving. This extends to simple functions by linearity, to non-negative measurable functions by monotone convergence, and to integrable $f = f^+ - f^-$ again by linearity. $\Box$

Proposition 6.3. If $\Theta : E \to E$ is ergodic and $f : E \to \mathbb{R}$ is invariant, then $f = c$ a.e. for some constant $c \in \mathbb{R}$.

Proof. For all $A \in \mathcal{B}$, $\mu(f \in A) = 0$ or $\mu(f \in A^c) = 0$, since $f$ is $\mathcal{E}_\Theta$-measurable and $\Theta$ is ergodic. Set $c := \inf\big\{ a \in \mathbb{R} : \mu(f > a) = 0 \big\}$. So $\mu(f \le a) = 0$ for all $a < c$ and $\mu(f > a) = 0$ for all $a > c$, and thus $f = c$ a.e. $\Box$

Interpretation. $\Theta : E \to E$ defines a dynamical system $x_n = x_n(x_0) = \Theta^n(x_0) \in E$ with discrete time $n \in \mathbb{N}$ and initial condition $x_0 \in E$. The dynamics is defined on the abstract state space $(E, \mathcal{E}, \mu)$ and the observables are given by measurable functions $f : E \to \mathbb{R}$. If $\Theta$ is measure-preserving, then $\int_E f(x_n)\, \mu(dx_0) = \int_E f(x_0)\, \mu(dx_0)$ for all $f$ by Proposition 6.2, and $\mu$ can be interpreted as a stationary distribution for the process $(x_n)_n$. If $f$ is invariant, then $f(x_n) = f(x_0)$ for all $n$ and $f$ is a conserved quantity, such as energy in a physical system. If there exists such a non-constant $f$, the state space can be partitioned into subsets $f^{-1}(y) \in \mathcal{E}$ for all $y \in f(E)$, which are non-communicating under the time evolution defined by $\Theta$. However, if $\Theta$ is ergodic, Proposition 6.3 implies that the only invariant functions are constant a.e. So an ergodic dynamical system does not have conserved quantities which partition the state space into non-communicating classes of non-zero measure (compare to Markov chains).
For the rest of this section we consider the infinite product space

$$E = \mathbb{R}^{\mathbb{N}} = \big\{ x = (x_n)_{n\in\mathbb{N}} : x_n \in \mathbb{R},\ n \in \mathbb{N} \big\}$$

with $\sigma$-algebra $\mathcal{E} = \sigma(X_n : n \in \mathbb{N})$ generated by the coordinate maps $X_n : E \to \mathbb{R}$ with $X_n(x) = x_n$.

Remark. $\mathcal{E} = \sigma(\mathcal{C})$ is generated by the $\pi$-system

$$\mathcal{C} = \Big\{ \prod_{n\in\mathbb{N}} A_n : A_n \in \mathcal{B},\ A_n = \mathbb{R} \text{ for all but finitely many } n \Big\} ,$$

which consists of so-called cylinder sets, where only finitely many coordinates are specified.

Let $(Y_n)_{n\in\mathbb{N}}$ be a sequence of iidrvs with distribution $m$. With the Skorohod theorem they can be constructed on a common probability space $(\Omega, \mathcal{A}, \mathbb{P})$. The map $Y : \Omega \to \mathbb{R}^{\mathbb{N}}$ defined as $Y(\omega) = \big( Y_n(\omega) \big)_{n\in\mathbb{N}}$ is $\mathcal{A}/\mathcal{E}$-measurable, and the distribution $\mu = \mathbb{P} \circ Y^{-1}$ of $Y$ satisfies

$$\mu(A) = \prod_{n\in\mathbb{N}} m(A_n) \quad \text{for all cylinder sets } A = \prod_{n\in\mathbb{N}} A_n \in \mathcal{C} .$$

Since $\mathcal{C}$ is a $\pi$-system generating $\mathcal{E}$, this is the unique measure on $(E, \mathcal{E})$ with this property. Therefore $(\Omega, \mathcal{A}, \mathbb{P}) = (\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ is a generic example of such a common probability space.

Definition 6.3. On the probability space $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ the coordinate maps $X_n : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}$ themselves are iidrvs with distribution $m$, and this is called the canonical model for such a sequence. The shift map $\Theta : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}^{\mathbb{N}}$ is defined as $\Theta(x_1, x_2, \ldots) = (x_2, x_3, \ldots)$.

Theorem 6.4. The shift map $\Theta$ is ergodic.

Proof. $\Theta$ is measurable and measure-preserving (see problem 4.9). To see that $\Theta$ is ergodic, recall the tail $\sigma$-algebra

$$\mathcal{T} = \bigcap_{n\in\mathbb{N}} \mathcal{T}_n \quad\text{where}\quad \mathcal{T}_n = \sigma(X_m : m > n) \subseteq \mathcal{E} .$$

For $A = \prod_{k\in\mathbb{N}} A_k \in \mathcal{C}$, $\Theta^{-n}(A) = \big\{ x \in E : X_{n+k}(x) \in A_k \text{ for all } k \ge 1 \big\} \in \mathcal{T}_n$. Since $\mathcal{T}_n$ is a $\sigma$-algebra, $\Theta^{-n}(A) \in \mathcal{T}_n$ for all $A \in \mathcal{E}$. If $A \in \mathcal{E}_\Theta = \big\{ B \in \mathcal{E} : \Theta^{-1}(B) = B \big\}$, then $A = \Theta^{-n}(A) \in \mathcal{T}_n$ for all $n \in \mathbb{N}$ and thus $A \in \bigcap_n \mathcal{T}_n = \mathcal{T}$, so that $\mathcal{E}_\Theta \subseteq \mathcal{T}$. By Kolmogorov's 0-1 law, $\mathcal{T}$ and thus $\mathcal{E}_\Theta$ is trivial, and $\Theta$ is ergodic. $\Box$

6.3  Ergodic Theorems

In the following let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space with a measure-preserving transformation $\Theta : E \to E$. Let $f : E \to \mathbb{R}$ be integrable, and define $S_n : E \to \mathbb{R}$ by $S_0 = 0$ and

$$S_n = S_n(f) = f + f \circ \Theta + \ldots + f \circ \Theta^{n-1} \quad \text{for } n \ge 1 .$$

Example. Let $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ be the canonical model for iidrvs $(X_n)_{n\in\mathbb{N}}$, $f = X_1 : E \to \mathbb{R}$ the first coordinate map and $\Theta$ the shift map from the previous section. Then $S_n(X_1) = \sum_{i=1}^n X_i$.

Lemma 6.5. Maximal ergodic lemma

Let $S^* = \sup_{n\in\mathbb{N}} S_n : E \to \mathbb{R}$. Then

$$\int_{\{S^* > 0\}} f \,d\mu \ge 0 .$$

Proof. Set $S_n^* = \max_{0\le m\le n} S_m$ and $A_n = \{S_n^* > 0\}$. Then, for $m = 1, \ldots, n$,

$$S_m = f + S_{m-1} \circ \Theta \le f + S_n^* \circ \Theta .$$

On $A_n$ we have $S_n^* = \max_{1\le m\le n} S_m$, so $S_n^* \le f + S_n^* \circ \Theta$. On $A_n^c$ we have $S_n^* = 0 \le S_n^* \circ \Theta$. So, integrating and adding, we obtain

$$\int_E S_n^* \,d\mu \le \int_{A_n} f \,d\mu + \int_E S_n^* \circ \Theta \,d\mu .$$

But $S_n^*$ is integrable and $\Theta$ is measure-preserving, so

$$\int_E S_n^* \circ \Theta \,d\mu = \int_E S_n^* \,d\mu < \infty \quad\text{which implies}\quad \int_{A_n} f \,d\mu \ge 0 .$$

As $n \to \infty$, $A_n \nearrow \{S^* > 0\}$ so, by dominated convergence (with dominating function $|f|$),

$$\int_{\{S^* > 0\}} f \,d\mu \ge 0 . \qquad\Box$$

Theorem 6.6. Birkhoff's almost everywhere ergodic theorem

There exists $\bar f : E \to \mathbb{R}$ invariant, with $\int_E |\bar f| \,d\mu \le \int_E |f| \,d\mu$, such that $S_n/n \to \bar f$ a.e. as $n \to \infty$.

Proof. The functions $\liminf_n (S_n/n)$ and $\limsup_n (S_n/n)$ are invariant, since

$$\liminf_n \Big( \frac{S_n}{n} \circ \Theta \Big) = \liminf_n \frac{S_{n+1} - f}{n} = \liminf_n \Big( \frac{S_{n+1}}{n+1} \cdot \frac{n+1}{n} \Big) = \liminf_n \frac{S_n}{n} .$$

Therefore, for $a < b$,

$$D = D(a, b) = \Big\{ \liminf_n (S_n/n) < a < b < \limsup_n (S_n/n) \Big\}$$

is an invariant event. We shall show that $\mu(D) = 0$. First, by invariance, we can restrict everything to $D$ and thereby reduce to the case $D = E$. Note that either $b > 0$ or $a < 0$; we can interchange the two cases by replacing $f$ by $-f$. Let us assume then that $b > 0$.

Let $B \in \mathcal{E}$ with $\mu(B) < \infty$; then $g = f - b \mathbf{1}_B$ is integrable and, for each $x \in D$, for some $n$,

$$S_n(g)(x) \ge S_n(f)(x) - n b > 0 .$$

Hence $S^*(g) > 0$ everywhere and, by the maximal ergodic lemma,

$$0 \le \int_D (f - b \mathbf{1}_B) \,d\mu = \int_D f \,d\mu - b\, \mu(B) .$$

Since $\mu$ is $\sigma$-finite, we can let $B \nearrow D$ to obtain

$$b\, \mu(D) \le \int_D f \,d\mu .$$

In particular, we see that $\mu(D) < \infty$. A similar argument applied to $-f$ and $-a$, this time with $B = D$, shows that $(-a)\, \mu(D) \le \int_D (-f) \,d\mu$. Hence

$$b\, \mu(D) \le \int_D f \,d\mu \le a\, \mu(D) .$$

Since $a < b$ and the integral is finite, this forces $\mu(D) = 0$.

Back to general $E$. Set

$$\Delta = \Big\{ \liminf_n (S_n/n) < \limsup_n (S_n/n) \Big\} ,$$

then $\Delta$ is invariant. Also, $\Delta = \bigcup_{a,b\in\mathbb{Q},\, a<b} D(a, b)$, so $\mu(\Delta) = 0$. On the complement of $\Delta$, $S_n/n$ converges in $[-\infty, \infty]$, so we can define an invariant function $\bar f : E \to \mathbb{R}$ by

$$\bar f = \begin{cases} \lim_n (S_n/n) & \text{on } \Delta^c \\ 0 & \text{on } \Delta \end{cases} .$$

Finally, we have $\int_E |f \circ \Theta^n| \,d\mu = \int_E |f| \,d\mu$, so $\int_E |S_n| \,d\mu \le n \int_E |f| \,d\mu$ for all $n$. Hence, by Fatou's lemma,

$$\int_E |\bar f| \,d\mu = \int_E \liminf_n |S_n/n| \,d\mu \le \liminf_n \int_E |S_n/n| \,d\mu \le \int_E |f| \,d\mu . \qquad\Box$$

Theorem 6.7. von Neumann's $L^p$ ergodic theorem

Assume that $\mu(E) < \infty$ and $p \in [1, \infty)$, and let $\bar f$ be the invariant limit function of Theorem 6.6. Then, for $f \in L^p$, $S_n/n \to \bar f$ in $L^p$.

Proof. Since $\Theta$ is measure-preserving, we have

$$\| f \circ \Theta^n \|_p = \Big( \int_E |f \circ \Theta^n|^p \,d\mu \Big)^{1/p} = \| f \circ \Theta^{n-1} \|_p = \ldots = \| f \|_p .$$

So, by Minkowski's inequality, $\| S_n(f)/n \|_p \le \| f \|_p$.

Given $\epsilon > 0$, choose $K < \infty$ so that $\| f - g \|_p < \epsilon/3$, where $g = (-K) \vee f \wedge K$. By Birkhoff's theorem, $S_n(g)/n \to \bar g$ a.e. We have $|S_n(g)/n| \le K$ for all $n$ so, by bounded convergence (using $\mu(E) < \infty$), there exists $N$ such that, for $n \ge N$,

$$\| S_n(g)/n - \bar g \|_p < \epsilon/3 .$$

By Fatou's lemma,

$$\| \bar f - \bar g \|_p^p = \int_E \liminf_n \Big| \frac{S_n(f - g)}{n} \Big|^p \,d\mu \le \liminf_n \int_E \Big| \frac{S_n(f - g)}{n} \Big|^p \,d\mu \le \| f - g \|_p^p .$$

Hence, for $n \ge N$,

$$\Big\| \frac{S_n(f)}{n} - \bar f \Big\|_p \le \Big\| \frac{S_n(f - g)}{n} \Big\|_p + \Big\| \frac{S_n(g)}{n} - \bar g \Big\|_p + \| \bar g - \bar f \|_p < 3\,\epsilon/3 = \epsilon . \qquad\Box$$

Corollary 6.8. Let $\mu(E) < \infty$, $f \in L^1$ and $\bar f$ be the invariant limit function of Theorem 6.6. Then

$$\int_E \bar f \,d\mu = \int_E f \,d\mu \quad\text{and, if } \Theta \text{ is ergodic,}\quad \bar f = \int_E f \,d\mu \Big/ \mu(E) \ \text{a.e.}$$

Proof. By Theorem 6.7,

$$\Big| \int_E \frac{S_n}{n} \,d\mu - \int_E \bar f \,d\mu \Big| \le \big\| S_n/n - \bar f \big\|_1 \to 0 .$$

By the definition of $S_n$,

$$\int_E \frac{S_n}{n} \,d\mu = \int_E f \,d\mu \quad \text{for all } n \in \mathbb{N} ,$$

since $\Theta$ is measure-preserving, and the first statement follows. If $\Theta$ is ergodic, the invariant function $\bar f$ is constant a.e. by Proposition 6.3, and together with the first this implies the second statement. $\Box$

6.4  Limit theorems for sums of random variables

Theorem 6.9. Strong law of large numbers

Let $Y_n : \Omega \to \mathbb{R}$, $n \in \mathbb{N}$, be iidrvs with $\mathbb{E}(Y_n) = \mu \in \mathbb{R}$ and $\mathbb{E}|Y_n| < \infty$, i.e. $Y_n \in L^1$. For $S_n = Y_1 + \ldots + Y_n$ we have

$$S_n/n \to \mu \ \text{a.s.} ,\quad \text{as } n \to \infty .$$

Proof. Let $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \nu)$ be the canonical model for the sequence $Y := (Y_n)_{n\in\mathbb{N}} \in \mathbb{R}^{\mathbb{N}}$ with distribution $\nu$ as in Def. 6.3. Take $f = X_1 \in L^1$ to be the first coordinate map. Note that

$$S_n = Y_1 + \ldots + Y_n = f + f \circ \Theta + \ldots + f \circ \Theta^{n-1} ,$$

where $\Theta : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}^{\mathbb{N}}$ is the shift map, which is measure-preserving and ergodic by Theorem 6.4. With $\nu(\mathbb{R}^{\mathbb{N}}) = 1$ we have, by Theorem 6.6 and Corollary 6.8,

$$S_n/n \xrightarrow{\text{a.s.}} \mathbb{E}(f) = \mathbb{E}(Y_1) = \mu . \qquad\Box$$

Remarks. (i) By Theorem 6.7 we also have convergence in $L^1$ in Theorem 6.9.

(ii) Theorem 6.9 is stronger than Theorem 6.1, where we needed $\mathbb{E}(Y_n^4) \le M$ for all $n \in \mathbb{N}$. But here the $Y_n$ have to be identically distributed for $\Theta$ to be measure-preserving.

(iii) With Thm 2.10 and Prop 2.11 the strong implies the weak law of large numbers:

$$S_n/n \to \mu \ \text{a.s.} \;\Rightarrow\; S_n/n \to \mu \ \text{in probability} \;\Rightarrow\; S_n/n \xrightarrow{D} \mu \quad \text{as } n \to \infty .$$
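Theorem 6.9 can be illustrated by simulation; a sketch with Pareto (Lomax) summands of shape $a = 2.5$, chosen because they are integrable (so Theorem 6.9 applies) but have no fourth moment (so the bound of Theorem 6.1 fails); the sample size and tolerance are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)

# Lomax / Pareto II with shape a = 2.5: E(Y) = 1/(a-1) = 2/3 is finite,
# but E(Y^4) = infinity, so Theorem 6.1 does not apply while 6.9 does.
a = 2.5
mu = 1.0 / (a - 1.0)
n = 1_000_000
Y = rng.pareto(a, size=n)
running_mean = np.cumsum(Y) / np.arange(1, n + 1)   # S_k / k along one path

# One sample path of S_n/n should be close to mu for large n.
assert abs(running_mean[-1] - mu) < 0.01
```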
Theorem 6.10. Lévy's convergence theorem for characteristic functions

Let $X_n$, $n \in \mathbb{N}$, and $X$ be random variables in $\mathbb{R}$ with characteristic functions $\phi_X(u) = \mathbb{E}\big( e^{iuX} \big)$ and $\phi_{X_n}(u)$. Then

$$\phi_{X_n}(u) \to \phi_X(u) \ \text{for all } u \in \mathbb{R} \quad\Leftrightarrow\quad X_n \to X \ \text{in distribution} .$$

Proof. $\Leftarrow$: By Theorem 3.9, $X_n \xrightarrow{D} X$ $\Leftrightarrow$ $\mathbb{E}\big( f(X_n) \big) \to \mathbb{E}\big( f(X) \big)$ for all $f \in C_b(\mathbb{R})$, and $e^{iux} = \cos(ux) + i \sin(ux)$ is bounded and continuous for all $u \in \mathbb{R}$. $\Rightarrow$ is more involved; see e.g. Billingsley, Probability and Measure (3rd ed.), Thm 26.3. $\Box$

Theorem 6.11. Central limit theorem

Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of iidrvs with mean $0$ and variance $1$. Set $S_n = X_1 + \ldots + X_n$. Then $\mathbb{P}\big( S_n/\sqrt{n} \in \cdot\, \big) \to N(0, 1)$, i.e. for all $a < b$,

$$\mathbb{P}\big( S_n/\sqrt{n} \in [a, b] \big) \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \,dy \quad \text{as } n \to \infty .$$

Proof. Set $\phi(u) = \mathbb{E}\big( e^{iuX_1} \big)$. Since $\mathbb{E}(X_1^2) < \infty$, we can differentiate $\mathbb{E}\big( e^{iuX_1} \big)$ twice under the expectation, to show that (see problem 4.2(b))

$$\phi(0) = 1 ,\quad \phi'(0) = 0 ,\quad \phi''(0) = -1 .$$

Hence, by Taylor's theorem, $\phi(u) = 1 - u^2/2 + o(u^2)$ as $u \to 0$. So, for the characteristic function $\phi_n$ of $S_n/\sqrt{n}$,

$$\phi_n(u) = \mathbb{E}\big( e^{iu(X_1 + \ldots + X_n)/\sqrt{n}} \big) = \Big( \mathbb{E}\big( e^{i(u/\sqrt{n})X_1} \big) \Big)^n = \big( 1 - u^2/(2n) + o(u^2/n) \big)^n .$$

The complex logarithm satisfies $\log(1 + z) = z + o(|z|)$ as $z \to 0$, so, for each $u \in \mathbb{R}$,

$$\log \phi_n(u) = n \log\big( 1 - u^2/(2n) + o(u^2/n) \big) = -u^2/2 + o(1) ,\quad \text{as } n \to \infty .$$

Hence $\phi_n(u) \to e^{-u^2/2}$ for all $u$. But $e^{-u^2/2}$ is the characteristic function of the $N(0, 1)$ distribution, so Lévy's convergence theorem completes the proof. $\Box$
Remarks. (i) This is only the simplest version of the central limit theorem. It holds in more general cases, e.g. for non-independent or not identically distributed r.v.s.

(ii) Problem 4.5 indicates that $S_n/n$ can also converge to other (so-called stable) distributions than the Gaussian.

Appendix

A  Example sheets

A.1  Example sheet 1: Set systems and measures

1.1 Let $E$ be a set.

(a) Let $\mathcal{F} \subseteq \mathcal{P}(E)$ be the set of all finite sets and their complements (called cofinite sets). Show that $\mathcal{F}$ is an algebra.
(b) Let $\mathcal{G} \subseteq \mathcal{P}(E)$ be the set of all countable sets and their complements. Show that $\mathcal{G}$ is a $\sigma$-algebra.
(c) Give a simple example of $E$ and $\sigma$-algebras $\mathcal{E}_1, \mathcal{E}_2$, such that $\mathcal{E}_1 \cup \mathcal{E}_2$ is not a $\sigma$-algebra.
1.2 A non-empty set $A$ in a $\sigma$-algebra $\mathcal{E}$ is called an atom if there is no proper subset $B \subsetneq A$ such that $B \in \mathcal{E}$. Let $A_1, \ldots, A_N$ be non-empty subsets of a set $E$.

(a) If the $A_n$ are mutually disjoint and $\bigcup_n A_n = E$, how many elements does $\sigma(\{A_1, \ldots, A_N\})$ have and what are its atoms?
(b) Show that in general $\sigma(\{A_1, \ldots, A_N\})$ consists of only finitely many sets.
1.3 Show that the following families of subsets of $\mathbb{R}$ generate the same $\sigma$-algebra $\mathcal{B}$:
(i) $\{(a, b) : a < b\}$,  (ii) $\{(a, b] : a < b\}$,  (iii) $\{(-\infty, b] : b \in \mathbb{R}\}$.

1.4 A $\sigma$-algebra is called separable if it can be generated by a countable family of sets. Show that the Borel $\sigma$-algebra $\mathcal{B}$ of $\mathbb{R}$ is separable.
1.5 For which $\sigma$-algebras on $\mathbb{R}$ are the following set functions measures:

$$\mu_1(A) = \begin{cases} 0 , & \text{if } A = \emptyset \\ 1 , & \text{if } A \ne \emptyset \end{cases} ,\quad \mu_2(A) = \begin{cases} 0 , & \text{if } A = \emptyset \\ \infty , & \text{if } A \ne \emptyset \end{cases} ,\quad \mu_3(A) = \begin{cases} 0 , & \text{if } A \text{ is finite} \\ 1 , & \text{if } A^c \text{ is finite} \end{cases} ?$$
1.6 Let $\mathcal{E}$ be a ring on $E$ and $\mu : \mathcal{E} \to [0, \infty]$ an additive set function. Show that:
(a) If $\mu$ is continuous from below at all $A \in \mathcal{E}$, it is also countably additive.
(b) If $\mu$ is countably additive, it is also countably subadditive.
1.7 Let $(E, \mathcal{E}, \mu)$ be a measure space. Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of sets in $\mathcal{E}$, and define

$$\liminf_n A_n = \bigcup_{n\in\mathbb{N}} \bigcap_{m\ge n} A_m ,\quad \limsup_n A_n = \bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} A_m .$$

(a) Show that $\mu(\liminf_n A_n) \le \liminf_n \mu(A_n)$.
(b) Show that $\mu(\limsup_n A_n) \ge \limsup_n \mu(A_n)$ if $\mu(E) < \infty$. Give an example with $\mu(E) = \infty$ where this inequality fails.

1.8 (a) Show that a $\pi$-system which is also a d-system is a $\sigma$-algebra.
(b) Give an example of a d-system that is not a $\sigma$-algebra.
1.9 (a) Find a Borel set that cannot be written as a countable union of intervals.


(b) Let $B \in \mathcal{B}$ be a Borel set with $\lambda(B) < \infty$, where $\lambda$ is the Lebesgue measure. Show that, for every $\epsilon > 0$, there exists a finite union of disjoint intervals $A = (a_1, b_1] \cup \ldots \cup (a_n, b_n]$ such that $\lambda(A \triangle B) < \epsilon$, where $A \triangle B = (A \setminus B) \cup (B \setminus A)$.
(Hint: First consider all bounded sets $B$ for which the conclusion holds and show that they form a d-system.)
1.10 Completion
Let $(E, \mathcal{E}, \mu)$ be a measure space. A subset $N$ of $E$ is called null if $N \subseteq B$ for some $B \in \mathcal{E}$ with $\mu(B) = 0$. Write $\mathcal{N}$ for the set of all null sets.

(a) Prove that the family of subsets $\mathcal{C} = \big\{ A \cup N : A \in \mathcal{E},\ N \in \mathcal{N} \big\}$ is a $\sigma$-algebra.
(b) Show that the measure $\mu$ may be extended to a measure $\mu'$ on $\mathcal{C}$ with $\mu'(A \cup N) = \mu(A)$.
The $\sigma$-algebra $\mathcal{C}$ is called the completion of $\mathcal{E}$ with respect to $\mu$.

1.11 Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of events in the probability space $(\Omega, \mathcal{A}, \mathbb{P})$, i.e. $A_n \in \mathcal{A}$ for all $n$. Show that the $A_n$, $n \in \mathbb{N}$, are independent if and only if the $\sigma$-algebras which they generate, $\mathcal{A}_n = \{\emptyset, A_n, A_n^c, \Omega\}$, are independent.
1.12 (a) Let μF be the Lebesgue-Stieltjes measure on R associated with the distribution function F.
Show that F is continuous at x if and only if μF({x}) = 0.
(b) Let (Fn)n∈N be a sequence of distribution functions on R such that F(x) = lim_{n→∞} Fn(x) exists
for all x ∈ R. Show that F need not be a distribution function.
1.13 Cantor set
Let C0 = [0, 1], and let C1, C2, . . . be constructed iteratively by deletion of middle-thirds. Thus
C1 = [0, 1/3] ∪ [2/3, 1] , C2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1] , and so on.
The set C = lim_{n→∞} Cn = ⋂_{n∈N} Cn is called the Cantor set.
Let Fn be the distribution function of the uniform probability measure concentrated on Cn.
(a) Show that C is uncountable and has Lebesgue measure 0.
(b) Show that the limit F(x) = lim_{n→∞} Fn(x) exists for all x ∈ [0, 1].
(Hint: Establish a recursion relation for Fn(x) and use the contraction mapping theorem.)
(c) Show that F is continuous on [0, 1] with F(0) = 0, F(1) = 1.
(d) Show that F is differentiable except on a set of measure 0, and that F′(x) = 0 wherever F is
differentiable.
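The recursion in the hint can be made concrete: the limit F satisfies F(x) = F(3x)/2 on [0, 1/3], F(x) = 1/2 on [1/3, 2/3] and F(x) = 1/2 + F(3x − 2)/2 on [2/3, 1], and iterating this contraction gives a quick numerical sketch of the Cantor function (a Python illustration, not part of the notes; the cut-off depth is our own choice):

```python
def cantor_F(x, depth=60):
    """Approximate the Cantor distribution function by iterating its
    self-similarity recursion; the contraction makes the depth cut-off harmless."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if depth == 0:
        return 0.5                      # any bounded value works here
    if x < 1/3:
        return 0.5*cantor_F(3*x, depth - 1)
    if x <= 2/3:
        return 0.5
    return 0.5 + 0.5*cantor_F(3*x - 2, depth - 1)

assert cantor_F(0.0) == 0.0 and cantor_F(1.0) == 1.0
assert cantor_F(0.5) == 0.5
assert abs(cantor_F(1/9) - 0.25) < 1e-6    # F(1/9) = 1/4
assert abs(cantor_F(0.25) - 1/3) < 1e-6    # 1/4 = 0.020202... in base 3
```

The last assertion illustrates (d): F increases from 1/4 to 1/3 on a set of Lebesgue measure zero around x = 1/4.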
1.14 Riemann zeta function
The Riemann zeta function is given by ζ(s) = Σ_{n=1}^∞ n^{−s} , s > 1.
Let s > 1 and Pf : P(N) → [0, 1] be the probability measure with mass function f(n) = n^{−s}/ζ(s).
For p ∈ {1, 2, . . .} let Ap = {n ∈ N : p | n} (p divides n).
(a) Show that the events {Ap : p prime} are independent. Deduce Euler's formula

    1/ζ(s) = ∏_{p prime} (1 − p^{−s}) .

(b) Show that

    Pf({n ∈ N : n is square-free}) = 1/ζ(2s) .
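Both identities are easy to check numerically. The following sketch (Python, illustration only; the truncation points 1000 and 20000 are our own choices) compares truncations of the Euler product and of the square-free sum for s = 2, where ζ(2) = π²/6 and ζ(4) = π⁴/90:

```python
import math

zeta2, zeta4 = math.pi**2/6, math.pi**4/90

# primes up to 1000 by a sieve of Eratosthenes
N = 1000
sieve = [True]*(N + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(N**0.5) + 1):
    if sieve[i]:
        for j in range(i*i, N + 1, i):
            sieve[j] = False
primes = [i for i in range(2, N + 1) if sieve[i]]

# Euler's formula: 1/zeta(s) = prod_p (1 - p^{-s}), here s = 2
prod = 1.0
for p in primes:
    prod *= 1 - p**-2
assert abs(prod - 1/zeta2) < 1e-2

# square-free probability: sum over square-free n of n^{-2}/zeta(2) -> 1/zeta(4)
def squarefree(n):
    d = 2
    while d*d <= n:
        if n % (d*d) == 0:
            return False
        d += 1
    return True

s_sum = sum(n**-2 for n in range(1, 20001) if squarefree(n))
assert abs(s_sum/zeta2 - 1/zeta4) < 1e-3
```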

A.2 Example sheet 2: Measurable functions and integration

Unless otherwise specified, let (E, E), (F, F) be measurable spaces and (Ω, A, P) be a probability space.
2.1 Let f : E → F be any function (not necessarily measurable).
(a) Show that σ(f⁻¹(A)) = f⁻¹(σ(A)) for all A ⊆ P(F).
(b) Let f be E/F-measurable. Under which circumstances is f(E) ⊆ P(F) a σ-algebra?
(c) Take (E, E) = (F, F) = (R, B). Find the σ-algebras σ(fi) generated by the functions
f1(x) = x , f2(x) = x² , f3(x) = |x| , f4(x) = 1Q(x) .
2.2 Let fn : E → R, n ∈ N, be E/B-measurable functions. Show that also the following functions are
measurable, whenever they are well defined:
(a) f1 + f2   (b) inf_{n∈N} fn   (c) sup_{n∈N} fn   (d) lim inf_{n→∞} fn   (e) lim sup_{n→∞} fn .
(f) Deduce further that

    { x ∈ E : fn(x) converges as n → ∞ } ∈ E .


2.3 Let f : E → R^d be written in the form f(x) = (f1(x), . . . , fd(x)). Show that f is measurable
w.r.t. E and B(R^d) if and only if each fi : E → R is measurable w.r.t. E and B.
2.4 Skorohod representation theorem
Let Fn : R → [0, 1], n ∈ N, be probability distribution functions. Consider the probability space
(Ω, A, P) where Ω = (0, 1], A = B((0, 1]) are the Borel sets on (0, 1] and P is the restriction of
Lebesgue measure to A. For each n define Xn : (0, 1] → R , Xn(ω) = inf{x : ω ≤ Fn(x)}.
(a) Show that the Xn are random variables with distributions Fn. Are the Xn independent?
(b)* Suppose F is a probability distribution function such that lim_{n→∞} Fn(x) = F(x) for all x ∈ R
at which F is continuous. Let X : (0, 1] → R be a random variable with distribution F defined
analogously to the Xn. Show that Xn → X a.s.
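The map ω ↦ inf{x : ω ≤ Fn(x)} is the generalized inverse (quantile function) of Fn. A small Python sketch (the exponential distribution function and the bisection bracket are our own choices of example) computes it by bisection and checks that it inverts F:

```python
import math

def quantile(F, w, lo=-50.0, hi=50.0):
    """Generalized inverse G(w) = inf{x : w <= F(x)}, by bisection.
    Assumes the answer lies in [lo, hi] and F is non-decreasing."""
    for _ in range(200):
        mid = 0.5*(lo + hi)
        if F(mid) >= w:
            hi = mid      # keep the invariant F(hi) >= w
        else:
            lo = mid
    return hi

F = lambda x: 1 - math.exp(-x) if x > 0 else 0.0   # Exp(1) distribution function

assert abs(quantile(F, 0.5) - math.log(2)) < 1e-8  # the median of Exp(1) is log 2
assert abs(F(quantile(F, 0.9)) - 0.9) < 1e-8
```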
2.5 Let X1, X2, . . . be random variables on (Ω, A, P).
(a) Show that X1 and X2 are independent if and only if

    P(X1 ≤ x, X2 ≤ y) = P(X1 ≤ x) P(X2 ≤ y) for all x, y ∈ R .   (∗)

(b) Suppose (∗) holds for all pairs Xi, Xj, i ≠ j. Is this sufficient for the (Xn)n∈N to be independent? Justify your answer.
(c) Let X1, X2 be independent and identically distributed. Show that X1 = X2 almost surely
implies that X1 and X2 are almost surely constant.
2.6 Let X1, X2, . . . be random variables with Xn →ᴰ X. Show that then also h(Xn) →ᴰ h(X) for
all continuous functions h : R → R.
(Hint: Use the Skorohod representation theorem.)
2.7 Let X1, X2, . . . be random variables on (Ω, A, P) and T the tail σ-algebra of (Xn)n∈N.
For each n ∈ N let Sn = X1 + . . . + Xn. Which of the following are tail events in T :

    {Xn ≥ 0 ev.} , {Sn ≥ 0 i.o.} , { lim inf_{n→∞} Sn ≤ 0 } , { lim_{n→∞} Sn exists } ?


2.8 Let X1 , X2 , . . . be random variables with

Xn =

n2 1 with probability 1/n2


.
1 with probability 1 1/n2

X + + X 
X1 + + Xn
1
n
= 0 for each n , but
1 almost surely.
Show that E
n
n
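The mean-zero claim is pure arithmetic, (n² − 1)·n⁻² − (1 − n⁻²) = 0, and the a.s. limit −1 follows from Borel–Cantelli because Σ 1/n² < ∞, so a.s. only finitely many Xn differ from −1. A quick check of the arithmetic (Python, illustration only):

```python
# E(Xn) = (n^2 - 1)/n^2 - (1 - 1/n^2) = 0 for every n
for n in range(1, 200):
    p = 1.0/(n*n)
    mean = (n*n - 1)*p + (-1)*(1 - p)
    assert abs(mean) < 1e-12

# the probabilities of the exceptional events {Xn = n^2 - 1} are summable,
# their partial sums stay below zeta(2) = pi^2/6 = 1.6449...
total = sum(1.0/(n*n) for n in range(1, 10**6))
assert total < 1.645
```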
2.9 Let X, X1, X2, . . . be random variables on (Ω, A, P).
(a) Show that { ω : Xn(ω) → X(ω) } ∈ A .
(b) Show that Xn → X almost surely ⇔ sup_{m≥n} |Xm − X| → 0 in probability .

2.10 Let X1, X2, . . . be independent random variables with distribution N(0, 1). Prove that

    lim sup_{n→∞} Xn/√(2 log n) = 1   a.s.

(Hint: Consider the events An = { Xn > α√(2 log n) } for α ∈ (0, ∞).)


2.11 Show that, as n → ∞,

    (a) ∫ sin(e^x)/(1 + nx²) dx → 0 ,   (b) ∫ (n cos x)/(1 + n²x^{3/2}) dx → 0 .

2.12 Let u, v : R → R be differentiable on [a, b] with continuous derivatives u′ and v′.
Show that for a < b

    ∫_a^b u(x) v′(x) dx = u(b) v(b) − u(a) v(a) − ∫_a^b u′(x) v(x) dx .

2.13 Let φ : [a, b] → R be continuously differentiable and strictly increasing. Show that for all continuous functions g on [φ(a), φ(b)]

    ∫_{φ(a)}^{φ(b)} g(y) dy = ∫_a^b g(φ(x)) φ′(x) dx .

2.14 Show that the function f(x) = x⁻¹ sin x is not Lebesgue integrable over [1, ∞) but that

    lim_{y→∞} ∫_0^y f(x) dx = π/2 .

(Use e.g. Fubini's theorem and x⁻¹ = ∫_0^∞ e^{−xt} dt.)
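The limit π/2 (the Dirichlet integral) can be checked numerically. In the sketch below (Python, illustration only) the cut-off y = 1000.5π is our own choice, made so that cos y = 0 and the neglected tail is O(y⁻²); the second integral confirms that ∫ |f| grows without bound, here already past 4:

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a)/n
    s = f(a) + f(b)
    s += 4*sum(f(a + (2*k - 1)*h) for k in range(1, n//2 + 1))
    s += 2*sum(f(a + 2*k*h) for k in range(1, n//2))
    return s*h/3

f = lambda x: math.sin(x)/x if x != 0.0 else 1.0
y = 1000.5*math.pi                       # cos(y) = 0, tail beyond y is O(1/y^2)

val = simpson(f, 0.0, y, 200000)
assert abs(val - math.pi/2) < 1e-4       # Dirichlet integral

g = lambda x: abs(math.sin(x))/x         # grows like (2/pi) log y, no limit
assert simpson(g, 1.0, y, 200000) > 4.0
```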
2.15 (a) Let μ be a measure on (E, E) and f : E → [0, ∞) be E/B-measurable with ∫_E f dμ < ∞.
Define ν(A) = ∫_A f dμ for each A ∈ E. Show that ν is a measure on (E, E) and that

    ∫_E g dν = ∫_E f g dμ   for all integrable g : E → R .

(b) Let ν = μ ∘ f⁻¹ be the image measure on (E, E) of a σ-finite measure μ under an E/E-measurable map f. Show that for all E/B-measurable g : E → [0, ∞)

    ∫_E g dν = ∫_E (g ∘ f) dμ .


A.3 Example sheet 3: Convergence, Fubini, Lp-spaces

Unless otherwise specified, let (E, E, μ) be a measure space and (Ω, A, P) be a probability space.

3.1 Let μ be the Lebesgue measure on (R², B(R²)).
(a) For f(x, y) = (x² − y²)/(x² + y²)² calculate the iterated Lebesgue integrals

    ∫_0^1 ∫_0^1 f(x, y) dx dy and ∫_0^1 ∫_0^1 f(x, y) dy dx .

What does the result tell about the double integral ∫_{(0,1)²} f dμ ?
(b) Show that for

    f(x, y) = { xy/(x² + y²)² , (x, y) ≠ (0, 0) ; 0 , (x, y) = (0, 0) }

the iterated integrals ∫_{−1}^1 ∫_{−1}^1 f(x, y) dx dy and ∫_{−1}^1 ∫_{−1}^1 f(x, y) dy dx coincide, but that
the double integral ∫_{(−1,1)²} f dμ does not exist.
(c) Let ν be the counting measure on (R, B), i.e. ν(A) is equal to the number of elements in A
whenever A is finite, and ν(A) = ∞ otherwise. Denote by Δ = { (x, y) ∈ (0, 1)² : x = y }
the diagonal in (0, 1)² and calculate the iterated integrals

    ∫_0^1 ∫_0^1 1Δ(x, y) dx ν(dy) and ∫_0^1 ∫_0^1 1Δ(x, y) ν(dy) dx .

Does the result contradict Fubini's theorem?
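For (a) the inner integral can be done by hand: ∂/∂x [−x/(x² + y²)] = (x² − y²)/(x² + y²)², so ∫_0^1 f(x, y) dx = −1/(1 + y²) and the two iterated integrals come out as −π/4 and +π/4. A Python sketch checking this (illustration only; the antiderivative is hand-computed and merely spot-checked below):

```python
import math

def f(x, y):
    return (x*x - y*y)/((x*x + y*y)**2)

def F(x, y):
    # antiderivative of f in x, for (x, y) != (0, 0); F(0, y) = 0
    return -x/(x*x + y*y) if x != 0.0 else 0.0

# spot-check dF/dx = f by central differences
for (x, y) in [(0.3, 0.7), (0.9, 0.2), (0.5, 0.5)]:
    h = 1e-6
    assert abs((F(x + h, y) - F(x - h, y))/(2*h) - f(x, y)) < 1e-5

inner = lambda y: F(1.0, y) - F(0.0, y)     # = -1/(1 + y^2)

def simpson(g, a, b, n):
    h = (b - a)/n
    s = g(a) + g(b)
    s += 4*sum(g(a + (2*k - 1)*h) for k in range(1, n//2 + 1))
    s += 2*sum(g(a + 2*k*h) for k in range(1, n//2))
    return s*h/3

I_dx_dy = simpson(inner, 0.0, 1.0, 1000)    # integrate in x first, then y
assert abs(I_dx_dy + math.pi/4) < 1e-9
# since f(y, x) = -f(x, y), the other order gives +pi/4: the iterated
# integrals differ, so f cannot be mu-integrable on (0, 1)^2
```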
3.2 (a) Are the following statements equivalent? (Justify your answer.)
(i) f is continuous almost everywhere,
(ii) f = g a.e. for a continuous function g.
(b) Let Xn ~ U[−1/n, 1/n] be uniform random variables on [−1/n, 1/n] for n ∈ N.
Do the Xn converge, and if yes in what sense?
3.3 Prove that the space L^∞(E, E, μ) is complete.
3.4 Let p ∈ [1, ∞] and let fn, f ∈ Lp(E, E, μ) for n ∈ N. Show that:

    fn → f in Lp ⇒ fn → f in measure , but the converse is not true .

3.5 Read hand-out 2 carefully. Find examples which show that the reverse implications, concerning
the concepts of convergence on page 1, are in general false. How does the picture change if the
underlying measure space is not finite?
3.6 Let X be a random variable in R and let 1 ≤ p < q < ∞. Show that

    E(|X|^p) = ∫_0^∞ p λ^{p−1} P(|X| ≥ λ) dλ

and deduce:

    X ∈ Lq(P) ⇒ P(|X| ≥ λ) = O(λ^{−q}) ⇒ X ∈ Lp(P) .
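The first identity (the "layer-cake formula") is easy to test numerically; for example for X ~ Exp(1) one has P(X ≥ λ) = e^{−λ} and E(X^p) = Γ(p + 1). A Python illustration (the distribution is our own convenient choice):

```python
import math

def simpson(g, a, b, n):
    h = (b - a)/n
    s = g(a) + g(b)
    s += 4*sum(g(a + (2*k - 1)*h) for k in range(1, n//2 + 1))
    s += 2*sum(g(a + 2*k*h) for k in range(1, n//2))
    return s*h/3

for p in (1, 2, 3):
    # layer-cake integral: int_0^infty p t^{p-1} P(X >= t) dt with P(X >= t) = e^{-t}
    layer = simpson(lambda t: p*t**(p - 1)*math.exp(-t), 0.0, 60.0, 6000)
    assert abs(layer - math.gamma(p + 1)) < 1e-6   # = E(X^p) = p!
```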

Remark on questions 3.7(a) and 3.8(a): Start with an indicator function and extend your argument to the
general case, analogous to the proof of Lemma 3.14(ii).
3.7 A step function g : R → R is any finite linear combination of indicator functions of finite intervals.
(a) Show that the set of step functions I is dense in Lp(R) for all p ∈ [1, ∞),
i.e. for all f ∈ Lp(R) and every ε > 0 there exists g ∈ I such that ‖f − g‖p < ε.
(Hint: Use the result of question 1.9.)

(b) Using (a), argue that the set of continuous functions C(R) is dense in Lp(R), p ∈ [1, ∞).
3.8 (a) Show that, if X and Y are independent random variables, then ‖X Y‖₁ = ‖X‖₁ ‖Y‖₁,
but that the converse is in general not true.
(b) Show that, if X and Y are independent and integrable, then E(X Y) = E(X) E(Y) .
3.9 Let V1 ⊆ V2 ⊆ . . . be an increasing sequence of closed subspaces of L² = L²(E, E, μ).
For f ∈ L², denote by fn the orthogonal projection of f on Vn. Show that fn converges in L².
3.10 Given a countable family of disjoint events (Gn)n∈N, Gn ∈ A, with ⋃_{n∈N} Gn = Ω.
Set G = σ(Gn : n ∈ N) and V = L²(Ω, G, P).
Show that, for X ∈ L²(Ω, A, P), the conditional expectation E(X | G) is a version of the
orthogonal projection of X on V.
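On a finite probability space the projection property can be checked directly: E(X | G) averages X over each block Gn, and X − E(X | G) is orthogonal to every G-measurable variable. A Python illustration with a made-up 12-point space and uniform P (the choice of X is arbitrary):

```python
# uniform P on Omega = {0,...,11}, partitioned into three blocks
omega = range(12)
P = {w: 1/12 for w in omega}
blocks = [range(0, 4), range(4, 8), range(8, 12)]

X = {w: w*w for w in omega}                  # an arbitrary square-integrable rv

condE = {}
for B in blocks:                             # E(X | G) is the block average
    avg = sum(X[w] for w in B) / len(B)
    for w in B:
        condE[w] = avg

# orthogonality: <X - E(X|G), 1_B> = 0 for every block indicator 1_B,
# hence for every G-measurable variable (linear combinations of the 1_B)
for B in blocks:
    ip = sum((X[w] - condE[w]) * P[w] for w in B)
    assert abs(ip) < 1e-12
```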
3.11 (a) Find a sequence of random variables (Xn)n∈N which is not bounded in L¹, but satisfies the
other condition for uniform integrability, i.e.

    ∀ε > 0 ∃δ > 0 ∀A ∈ A ∀n ∈ N : P(A) < δ ⇒ E(|Xn| 1A) < ε .

(b) Find a uniformly integrable sequence of random variables (Xn)n∈N such that

    Xn → 0 a.s. and E(sup_n |Xn|) = ∞ .

3.12 Let (Xn)n∈N be a sequence of identically distributed r.v.s in L²(P). Show that, as n → ∞,
(a) for all ε > 0 , n P(|X1| > ε√n) → 0 ,
(b) n^{−1/2} max_{k≤n} |Xk| → 0 in probability ,
(c) n^{−1/2} max_{k≤n} |Xk| → 0 in L¹ .

3.13 The moment generating function MX of a real-valued random variable X is defined by

    MX(θ) = E(e^{θX}) , θ ∈ R .

(a) Show that the maximal domain of definition I = { θ ∈ R : MX(θ) < ∞ } is an interval
and find examples for I = R, {0} and (−∞, 1).
Assume for simplicity that X ≥ 0 from now on.
(b) Show that if I contains a neighbourhood of 0 then X has finite moments of all orders given by

    E(X^n) = (d/dθ)^n MX(θ) |_{θ=0} .

(c) Find a necessary and sufficient condition on the sequence of moments mn = E(X^n) for I to
contain a neighbourhood of 0.
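For example, X ~ Exp(1) has MX(θ) = 1/(1 − θ) with I = (−∞, 1), and differentiating at 0 recovers the moments E(X^n) = n!. A quick finite-difference check (Python, illustration only):

```python
M = lambda t: 1.0/(1.0 - t)     # MGF of Exp(1); finite exactly for t < 1

h = 1e-5
m1 = (M(h) - M(-h)) / (2*h)                 # approx M'(0)  = E(X)   = 1
m2 = (M(h) - 2*M(0.0) + M(-h)) / h**2       # approx M''(0) = E(X^2) = 2
assert abs(m1 - 1.0) < 1e-6
assert abs(m2 - 2.0) < 1e-4
```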


A.4 Example sheet 4: Characteristic functions, Gaussian rvs, ergodic theory


4.1 Let μ1, μ2 be finite measures on (R, B) such that ∫ g dμ1 = ∫ g dμ2 for all bounded continuous
functions g : R → R. Show that μ1 = μ2 .


4.2 Let μ be a finite measure on (R, B) with Fourier transform μ̂. Show the following:
(a) μ̂ is a bounded continuous function.
(b) If ∫_R |x|^k μ(dx) < ∞, then μ̂ has a k-th continuous derivative, which at 0 is given by

    μ̂^(k)(0) = i^k ∫_R x^k μ(dx) .

4.3 Let X be a real-valued random variable with characteristic function φX.
(a) Show that φX(u) ∈ R for all u ∈ R if and only if X ~ −X, i.e. μX = μ₋X.
(b) Suppose that |φX(u)| = 1 for all |u| < ε with some ε > 0. Show that X is a.s. constant.
(Hint: Take an independent copy X′ of X, calculate φ_{X−X′} to see that X = X′ a.s.)
4.4 By considering characteristic functions or otherwise, show that there do not exist iidrvs X, Y such
that X − Y is uniformly distributed on [−1, 1] .
4.5 The Cauchy distribution has density function

    f(x) = 1/(π(1 + x²)) , x ∈ R .

(a) Show that the corresponding characteristic function is given by φ(u) = e^{−|u|}.
(b) Show also that, if X1, . . . , Xn are independent Cauchy random variables, then
(X1 + · · · + Xn)/n is also Cauchy.
Comment on this in the light of the strong law of large numbers and the central limit theorem.
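Part (b) follows from (a) by independence: φ_{(X1+···+Xn)/n}(u) = φ(u/n)^n = (e^{−|u|/n})^n = e^{−|u|} = φ(u). A trivial numerical confirmation of this algebra (Python, illustration only):

```python
import math

phi = lambda u: math.exp(-abs(u))   # Cauchy characteristic function from (a)

for n in (2, 5, 50):
    for u in (0.5, 1.0, 3.0):
        # cf of the average of n iid copies is phi(u/n)**n, which equals phi(u)
        assert abs(phi(u/n)**n - phi(u)) < 1e-12
```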
4.6 Let X, Y ~ N(0, 1) and Z ~ N(0, σ²) be independent Gaussian random variables. Calculate the
characteristic function of XY − Z .
4.7 Suppose X ~ N(μ, σ²) and a, b ∈ R. Prove Proposition 5.3, i.e. show that
(a) E(X) = μ , (b) var(X) = σ² , (c) aX + b ~ N(aμ + b, a²σ²) , (d) φX(u) = e^{iuμ − u²σ²/2} .

4.8 Let X1, . . . , Xn be independent N(0, 1) random variables. Show that

    ( X̄ , Σ_{m=1}^{n} (Xm − X̄)² )   and   ( Xn/√n , Σ_{m=1}^{n−1} Xm² )

have the same distribution, where X̄ = (X1 + · · · + Xn)/n.


4.9 Show that the shift map of Definition 6.3 is measurable and measure-preserving.
4.10 Let E = [0, 1) with Lebesgue measure. For a ∈ E consider the mapping

    θa : E → E ,   θa(x) = (x + a) mod 1 .

(a) Show that θa is measure-preserving.
(b) Show that θa is not ergodic when a is rational.
(c) Show that θa is ergodic when a is irrational.
(Hint: Consider an = ∫_E f(x) e^{−2πinx} dx to show that every invariant function is constant.)
(d) Let f : E → R be integrable. Determine for each a ∈ E the limit function

    f̄ = lim_{n→∞} ( f + f ∘ θa + . . . + f ∘ θa^{n−1} )/n .
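For irrational a the ergodic theorem says the averages in (d) converge to the space average ∫_E f dλ, and this is easy to watch numerically. A Python sketch (the golden rotation a = (√5 − 1)/2, the starting point and f(x) = cos 2πx are all arbitrary choices of example):

```python
import math

a = (math.sqrt(5) - 1)/2            # an irrational rotation number
f = lambda x: math.cos(2*math.pi*x)

x, s, n = 0.1, 0.0, 20000
for _ in range(n):                  # Birkhoff sum along the orbit of x0 = 0.1
    s += f(x)
    x = (x + a) % 1.0

# time average -> space average, and here the integral of f over [0,1) is 0
assert abs(s/n - 0.0) < 1e-3
```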

4.11 Show that θ(x) = 2x mod 1 is a measure-preserving transformation on E = [0, 1) with
Lebesgue measure, and that θ is ergodic. Find f̄ for each integrable function f.
(Hint: Consider the binary expansion x = 0.x1x2x3 . . . and use that Xn(x) = xn are iidrvs with
P(Xn = 0) = P(Xn = 1) = 1/2, which is proved on hand-out 2.)
4.12 Call a sequence of random variables (Xn)n∈N on a common probability space stationary if for each
n, k ∈ N the random vectors (X1, . . . , Xn) and (Xk+1, . . . , Xk+n) have the same distribution, i.e.
for A1, . . . , An ∈ B,

    P(X1 ∈ A1, . . . , Xn ∈ An) = P(Xk+1 ∈ A1, . . . , Xk+n ∈ An) .

Show that, if (Xn)n∈N is a stationary sequence and X1 ∈ Lp, for some p ∈ [1, ∞), then

    (1/n) Σ_{i=1}^n Xi → X̄ a.s. and in Lp ,

for some random variable X̄ ∈ Lp, and find E(X̄).



4.13 Find a sequence (Xn)n∈N of independent random variables with E(|Xn|) < ∞ and E(Xn) = 0
for all n ∈ N, such that (X1 + . . . + Xn)/n does not almost surely converge to 0.
4.14 Let (Xn)n∈N be independent random variables with P(Xn = 0) = P(Xn = 1) = 1/2, and define

    Un = X1X2 + X2X3 + . . . + X2nX2n+1 .

Show that Un/n → c a.s. for some c ∈ R, and determine c.

B Hand-outs
B.1 Hand-out 1: Proof of Carathéodory's extension theorem

Theorem 1.4. Carathéodory's extension theorem
Let E be a ring on E and μ : E → [0, ∞] be a countably additive set function. Then there
exists a measure μ′ on (E, σ(E)) such that μ′(A) = μ(A) for all A ∈ E.
Proof. For any B ⊆ E, define the outer measure

    μ*(B) = inf Σ_n μ(An) ,

where the infimum is taken over all sequences (An)n∈N in E such that B ⊆ ⋃_n An, and is taken
to be ∞ if there is no such sequence. Note that μ* is increasing and μ*(∅) = 0. Let us say that
A ⊆ E is μ*-measurable if, for all B ⊆ E,

    μ*(B) = μ*(B ∩ A) + μ*(B ∩ Aᶜ) .

Write M for the set of all μ*-measurable sets. We shall show that M is a σ-algebra containing
E and that μ* is a measure on M, extending μ. This will prove the theorem.

Step I. We show that μ* is countably subadditive.
Suppose that B ⊆ ⋃_n Bn. If μ*(Bn) < ∞ for all n, then, given ε > 0, there exist sequences
(Anm)m∈N in E, with

    Bn ⊆ ⋃_m Anm ,   μ*(Bn) + ε/2^n ≥ Σ_m μ(Anm) .

Then B ⊆ ⋃_n ⋃_m Anm and thus

    μ*(B) ≤ Σ_n Σ_m μ(Anm) ≤ Σ_n μ*(Bn) + ε .

Hence, in any case, μ*(B) ≤ Σ_n μ*(Bn) .

Step II. We show that μ* extends μ.
Since E is a ring and μ is countably additive, μ is countably subadditive. Hence, for A ∈ E and
any sequence (An)n∈N in E with A ⊆ ⋃_n An, we have μ(A) ≤ Σ_n μ(An).
On taking the infimum over all such sequences, we see that μ(A) ≤ μ*(A). On the other hand,
it is obvious that μ*(A) ≤ μ(A) for A ∈ E.
Step III. We show that M contains E.
Let A ∈ E and B ⊆ E. We have to show that

    μ*(B) = μ*(B ∩ A) + μ*(B ∩ Aᶜ) .

By subadditivity of μ*, it is enough to show that

    μ*(B) ≥ μ*(B ∩ A) + μ*(B ∩ Aᶜ) .

If μ*(B) = ∞, this is trivial, so let us assume that μ*(B) < ∞. Then, given ε > 0, we can
find a sequence (An)n∈N in E such that

    B ⊆ ⋃_n An ,   μ*(B) + ε ≥ Σ_n μ(An) .

Then

    B ∩ A ⊆ ⋃_n (An ∩ A) ,   B ∩ Aᶜ ⊆ ⋃_n (An ∩ Aᶜ) ,

so that

    μ*(B ∩ A) + μ*(B ∩ Aᶜ) ≤ Σ_n μ(An ∩ A) + Σ_n μ(An ∩ Aᶜ) = Σ_n μ(An) ≤ μ*(B) + ε .

Since ε > 0 was arbitrary, we are done.


Step IV. We show that M is an algebra.
Clearly E ∈ M and Aᶜ ∈ M whenever A ∈ M. Suppose that A1, A2 ∈ M and B ⊆ E. Then

    μ*(B) = μ*(B ∩ A1) + μ*(B ∩ A1ᶜ)
          = μ*(B ∩ A1 ∩ A2) + μ*(B ∩ A1 ∩ A2ᶜ) + μ*(B ∩ A1ᶜ)
          = μ*(B ∩ A1 ∩ A2) + μ*(B ∩ (A1 ∩ A2)ᶜ ∩ A1) + μ*(B ∩ (A1 ∩ A2)ᶜ ∩ A1ᶜ)
          = μ*(B ∩ (A1 ∩ A2)) + μ*(B ∩ (A1 ∩ A2)ᶜ) .

Hence A1 ∩ A2 ∈ M.
Step V. We show that M is a σ-algebra and that μ* is a measure on M.
We already know that M is an algebra, so it suffices to show that, for any sequence of disjoint
sets (An)n∈N in M, for A = ⋃_n An we have

    A ∈ M ,   μ*(A) = Σ_n μ*(An) .

So, take any B ⊆ E, then

    μ*(B) = μ*(B ∩ A1) + μ*(B ∩ A1ᶜ) = μ*(B ∩ A1) + μ*(B ∩ A2) + μ*(B ∩ A1ᶜ ∩ A2ᶜ)
          = . . . = Σ_{i=1}^n μ*(B ∩ Ai) + μ*(B ∩ A1ᶜ ∩ . . . ∩ Anᶜ) .

Note that μ*(B ∩ A1ᶜ ∩ . . . ∩ Anᶜ) ≥ μ*(B ∩ Aᶜ) for all n. Hence, on letting n → ∞ and using
countable subadditivity, we get

    μ*(B) ≥ Σ_{n=1}^∞ μ*(B ∩ An) + μ*(B ∩ Aᶜ) ≥ μ*(B ∩ A) + μ*(B ∩ Aᶜ) .

The reverse inequality holds by subadditivity, so we have equality. Hence A ∈ M and, setting
B = A, we get μ*(A) = Σ_{n=1}^∞ μ*(An) . □


B.2 Hand-out 2: Convergence of random variables

Let X, Xn : Ω → R be random variables on a probability space (Ω, A, P) with distributions μ, μn.
There are several concepts of convergence of random variables, which are summarised in the following:
(i) Xn → X everywhere or pointwise if Xn(ω) → X(ω) for all ω ∈ Ω as n → ∞.
(ii) Xn → X almost surely (a.s.) if P(Xn ↛ X) = 0.
(iii) Xn → X in probability if ∀ε > 0 : P(|Xn − X| > ε) → 0 as n → ∞.
(iv) Xn → X in Lp for p ∈ [1, ∞], if ‖Xn − X‖p → 0 as n → ∞.
(v) Xn →ᴰ X in distribution or in law if P(Xn ≤ x) → P(X ≤ x) as n → ∞, for all continuity
points x of P(X ≤ x). Since equivalent to (vi), this is often also called weak convergence.
(vi) μn → μ weakly if ∫_R f dμn → ∫_R f dμ for all f ∈ Cb(R, R) .

The following implications hold (q ≥ p ≥ 1):

    Xn → X everywhere ⇒ Xn → X a.s. ⇒ Xn → X in probability ⇒ Xn →ᴰ X ,
    Xn → X in Lq ⇒ Xn → X in Lp ⇒ Xn → X in probability .

Proofs are given in Theorem 2.10, Proposition 2.11 (see below), Theorem 3.9 (see below), Corollary 5.3
and example sheet question 3.2.
Proof of Proposition 2.11: Xn → X in probability ⇒ Xn →ᴰ X.
Suppose Xn → X in probability and write Fn(x) = P(Xn ≤ x) , F(x) = P(X ≤ x) for the distribution
functions. If ε > 0,

    Fn(x) = P(Xn ≤ x, X ≤ x + ε) + P(Xn ≤ x, X > x + ε) ≤ F(x + ε) + P(|Xn − X| > ε) .

Similarly,

    F(x − ε) = P(X ≤ x − ε, Xn ≤ x) + P(X ≤ x − ε, Xn > x) ≤ Fn(x) + P(|Xn − X| > ε) .

Thus

    F(x − ε) − P(|Xn − X| > ε) ≤ Fn(x) ≤ F(x + ε) + P(|Xn − X| > ε) ,

and as n → ∞

    F(x − ε) ≤ lim inf_{n→∞} Fn(x) ≤ lim sup_{n→∞} Fn(x) ≤ F(x + ε) for all ε > 0 .

If F is continuous at x, F(x − ε) ↗ F(x) and F(x + ε) ↘ F(x) as ε ↘ 0, proving the result. □

Proof of Theorem 3.9: Xn →ᴰ X ⇔ μn → μ weakly.
Suppose Xn →ᴰ X. Then by the Skorohod theorem 2.12 there exist Y ~ X and Yn ~ Xn on a common
probability space (Ω, A, P) such that f(Yn) → f(Y) a.e. since f ∈ Cb(R, R). Thus

    ∫_R f dμn = ∫_Ω f(Yn) dP → ∫_Ω f(Y) dP = ∫_R f dμ as n → ∞, by bounded convergence.

Suppose μn → μ weakly and let y be a continuity point of FX. For δ > 0, approximate 1_{(−∞,y]} by
fδ such that

    fδ(x) = { 1_{(−∞,y]}(x) , x ∉ (y, y + δ) ; 1 + (y − x)/δ , x ∈ (y, y + δ) } ,

so that

    ∫_R |1_{(−∞,y]} − fδ| dμ ≤ ∫ gδ dμ ,   where gδ(x) = { 1 + (x − y)/δ , x ∈ (y − δ, y] ; 1 + (y − x)/δ , x ∈ [y, y + δ) ; 0 , otherwise } .

The same inequality holds for μn for all n ∈ N. Then as n → ∞

    |FXn(y) − FX(y)| = | ∫_R 1_{(−∞,y]} dμn − ∫_R 1_{(−∞,y]} dμ |
    ≤ ∫ gδ dμn + | ∫ fδ dμn − ∫ fδ dμ | + ∫ gδ dμ → 2 ∫ gδ dμ ,

since fδ, gδ ∈ Cb(R, R). Now, ∫_R gδ dμ ≤ μ((y − δ, y + δ)) → μ({y}) = 0 as δ → 0,
since μ({y}) = 0, so Xn →ᴰ X. □

Skorohod representation theorem
For all probability distribution functions F1, F2, . . . : R → [0, 1] there exists a probability space
(Ω, A, P) and random variables X1, X2, . . . : Ω → R such that Xn has distribution function Fn.
(a) The Xn can be chosen to be independent.
(b) If Fn → F at all continuity points of the probability distribution function F, then the Xn can also
be chosen such that Xn → X a.s. with X : Ω → R having distribution function F.

Proof. Consider the probability space (Ω, A, P) where Ω = (0, 1], A = B((0, 1]) and P is the restriction
of Lebesgue measure to A. For each n ∈ N define Gn : (0, 1] → R , Gn(ω) = inf{x : ω ≤ Fn(x)}.
(b) In problem 2.4 it is shown that Xn = Gn are random variables with distribution functions Fn and
that Xn → X a.s. under the assumption in (b), where X(ω) = G(ω) is defined analogously.
(a) Each ω ∈ Ω has a unique binary expansion ω = 0.ω1ω2ω3 . . ., where we forbid infinite sequences
of 0s. The Rademacher functions Rn : Ω → {0, 1} are defined as Rn(ω) = ωn. Note that

    R1 = 1_{(1/2,1]} ,   R2 = 1_{(1/4,1/2]} + 1_{(3/4,1]} ,   R3 = 1_{(1/8,1/4]} + 1_{(3/8,1/2]} + 1_{(5/8,3/4]} + 1_{(7/8,1]} ,   . . .

thus in general Rn = 1_{An} where An = ⋃_{k=1}^{2^{n−1}} In,k and In,k = ( (2k − 1)/2^n , 2k/2^n ].
With problem 1.11 the Rn are independent if and only if the An are. To see this, take n1 < . . . < nL for
some L ∈ N and we see that An1, . . . , AnL are independent by induction, using that

    P(Inl,k ∩ Anl+1) = (1/2) P(Inl,k) = P(Inl,k) P(Anl+1) for all k = 1, . . . , 2^{nl − 1} ,

and thus

    P(An1 ∩ . . . ∩ Anl ∩ Anl+1) = P(An1 ∩ . . . ∩ Anl) P(Anl+1) .

Now choose a bijection m : N² → N and set

    Yk,n = Rm(k,n)   and   Yn = Σ_{k=1}^∞ 2^{−k} Yk,n .

Then Y1, Y2, . . . are independent and P( i 2^{−k} < Yn ≤ (i + 1) 2^{−k} ) = 2^{−k} for all n, k, i .
Thus P(Yn ≤ x) = x for all x ∈ (0, 1]. So Xn = Gn(Yn) are independent random variables with
distribution Fn, which can be shown analogously to (b). □


B.3 Hand-out 3: Connection between Lebesgue and Riemann integration

Definition. f : [a, b] → R is Riemann integrable (R-integrable) with integral R ∈ R, if

    ∀ε > 0 ∃δ > 0 ∀xj ∈ Ij :   | R − Σ_{j=1}^n f(xj) |Ij| | < ε ,   (B.1)

for some finite partition {I1, . . . , In} of [a, b] into subintervals of lengths |Ij| < δ.
This corresponds to an approximation of f by step functions Σ_{j=1}^n f(xj) 1_{Ij}, a special case of
simple functions which are constant on intervals.

The picture is taken from R.L. Schilling, Measures, Integrals and Martingales, CUP 2005. He
writes:
"... the Riemann sums partition the domain of the function without taking into
account the shape of the function, thus slicing up the area under the function vertically. Lebesgue's approach is exactly the opposite: the domain is partitioned
according to the values of the function at hand, leading to a horizontal decomposition of the area."
Theorem. Lebesgue's integrability criterion
f : [a, b] → R is R-integrable if and only if f is bounded on [a, b] and continuous almost
everywhere, i.e. the set of points in [a, b] where f is not continuous has Lebesgue measure 0.
Corollary. Suppose f : [a, b] → R is R-integrable. Then it is also Lebesgue integrable
(L-integrable) and the values of both integrals coincide.
Proof. For a partition {I1, . . . , In} define the step functions

    ḡn = Σ_{j=1}^n sup{f(x) : x ∈ Ij} 1_{Ij} ,   g̲n = Σ_{j=1}^n inf{f(x) : x ∈ Ij} 1_{Ij} .

Thus g̲n ≤ f ≤ ḡn and, if f is continuous a.e., g̲n, ḡn → f a.e. as n → ∞ and |Ij| → 0.
Since f is bounded, it follows by dominated convergence for the L-integrals that

    ∫_a^b ḡn(x) dx → ∫_a^b f(x) dx ,   ∫_a^b g̲n(x) dx → ∫_a^b f(x) dx ,

so the sum in (B.1) converges and R = ∫_a^b f(x) dx.
On the other hand, if the sum in (B.1) converges, then ∫_a^b (ḡn(x) − g̲n(x)) dx → 0 and thus
for the limit functions ḡ = g̲ a.e., and f is continuous a.e. since g̲ ≤ f ≤ ḡ.
If f was not bounded, one could choose the xj in (B.1) such that the sum does not converge. □
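The sandwich g̲n ≤ f ≤ ḡn is easy to visualise numerically: for a continuous f the upper and lower step-function integrals squeeze the Riemann integral. A Python sketch with the arbitrary example f(x) = x² on [0, 1], where ∫_0^1 f dx = 1/3:

```python
f = lambda x: x*x
n = 1000
h = 1.0/n

# upper/lower Darboux sums over the partition into n equal subintervals;
# f is increasing, so sup/inf on [j/n, (j+1)/n] sit at the endpoints
upper = sum(f((j + 1)*h)*h for j in range(n))
lower = sum(f(j*h)*h for j in range(n))

assert lower <= 1/3 <= upper
assert upper - lower < 2.0/n    # the gap is exactly (f(1) - f(0))/n = 1/n
```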
On the other hand, not every L-integrable function is also R-integrable. The standard example
is f = 1_{[0,1]∩Q}, which can be made R-integrable by changing it on a set of L-measure 0.
This might suggest that for every L-integrable f there exists an R-integrable g with f = g a.e.
This is not true, as demonstrated by the following example:
Let {r1, r2, . . .} be an enumeration of the rationals in (0, 1). For small ε > 0 and each n ∈ N
choose an open interval In ⊆ (0, 1) with rn ∈ In and L-measure λ(In) < ε 2^{−n}. Put
A = ⋃_n In. Then A is dense in (0, 1) with 0 < λ(A) < ε and thus for any non-degenerate
subinterval I of (0, 1), λ(A ∩ I) > 0.
Take f = 1_A and suppose that f = g a.e. Let {Ij} be some decomposition of (0, 1) into
subintervals. Since for each j, λ(Ij ∩ A ∩ {f = g}) = λ(Ij ∩ A) > 0, g(xj) = f(xj) = 1 for
some xj ∈ Ij ∩ A, and thus

    Σ_{j=1}^n g(xj) λ(Ij) = 1 > λ(A) .   (B.2)

If g were R-integrable, its integral would have to coincide with the L-integral ∫_0^1 f dλ = λ(A),
which is in contradiction to (B.2).

B.4 Hand-out 4: Ergodic theorems

Hand-out 4 contains statements and proofs of Lemma 6.5 and Theorems 6.6 and 6.7, which
can be found in Section 6.3.
