Probability and Measure
Stefan Grosskinsky
Cambridge, Michaelmas 2006
These notes and other information about the course are available on
www.statslab.cam.ac.uk/stefan/teaching/probmeas.html
The text is based on and partially copied from notes on the same course by James Norris, Alan Stacey and Geoffrey Grimmett.
Contents

Introduction

1 Set systems and measures
    1.1 Set systems
    1.2 Measures
    1.3 Extension and uniqueness
    1.4 Lebesgue(-Stieltjes) measure
    1.5 Independence

2 Measurable functions and random variables
    2.1 Measurable Functions
    2.2 Random Variables
    2.3 Convergence of measurable functions

3 Integration
    3.1 Definition and basic properties
    3.2 Integrals and limits
    3.3 Integration in R and differentiation
    3.4 Product measure and Fubini's theorem

4 Lp spaces
    4.1 Norms and inequalities
    4.2 L2 as a Hilbert space
    4.3 Convergence in L1

5 Characteristic functions
    5.1 Definitions and examples
    5.2 Uniqueness and inversion
    5.3 Gaussian random variables

6 Ergodic theory and limit theorems
    6.1 A first strong law of large numbers
    6.2 Measure-preserving transformations
    6.3 Ergodic Theorems
    6.4 Limit theorems

Appendix
A Example sheets
    A.1 Example sheet 1: Set systems and measures
    A.2 Example sheet 2
    A.3 Example sheet 3
    A.4 Example sheet 4
B Handouts
    B.1 Handout 1: Proof of Caratheodory's extension theorem
    B.2 Handout 2: Convergence of random variables
    B.3 Handout 3: Connection between Lebesgue and Riemann integration
    B.4 Handout 4: Ergodic theorems
Introduction
Motivation from two perspectives:

1. Probability

Let $(\Omega, \mathcal{P}(\Omega), \mathbb{P})$ be a probability space, where $\Omega$ is a set, $\mathcal{P}(\Omega)$ the set of events (the power set in this case) and $\mathbb{P} : \mathcal{P}(\Omega) \to [0,1]$ is the probability measure. If $\Omega$ is countable then we have for every $A \in \mathcal{P}(\Omega)$
\[ \mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}(\{\omega\}) . \]
For a point chosen uniformly in $\Omega = [0,1]$, however, every single outcome has probability $0$, and such a sum cannot determine $\mathbb{P}$.
2. Integration (Analysis)

When $\mathbb{P}$ can be described by a density $\rho : \Omega \to [0,\infty)$ we can handle the situation via
\[ \mathbb{P}(A) = \int_A \rho(x)\,dx = \int 1_A(x)\,\rho(x)\,dx , \qquad (*) \]
where $1_A$ is the indicator function of the set $A$. In the example above $\rho(x) \equiv 1$ and this leads to $\mathbb{P}([a,b]) = \int_0^1 1_{[a,b]}(x)\,dx = \int_a^b dx = b - a$.
In general this approach only makes sense if the integral $(*)$ exists. Using the theory of Riemann integration we are fine as long as $A$ is a finite union or intersection of intervals and $\rho(x)$ is e.g. continuous. But for $A = [0,1] \cap \mathbb{Q}$, say, the Riemann integral $\int_0^1 1_A(x)\,dx$ is not defined, although the probability of this event is intuitively $0$.
Moreover, since $A$ is countable, it can be written as $A = \{a_n : n \in \mathbb{N}\}$. Define $f_n := 1_{\{a_1,\ldots,a_n\}}$, so that $f_n \to 1_A$ as $n \to \infty$. For every $n$, $f_n$ is Riemann integrable with $\int_0^1 f_n(x)\,dx = 0$. So it should be the case that
\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 0 \overset{?}{=} \int_0^1 1_A(x)\,dx , \]
but the latter integral is not defined. Thus the concept of Riemann integrals is not satisfactory for two reasons: the set of Riemann integrable functions is not closed under pointwise limits, and there are too many functions which are not Riemann integrable.
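The failure can even be seen numerically. The following sketch (added, not from the notes; the helper name is hypothetical) shows that Riemann sums of $1_{[0,1]\cap\mathbb{Q}}$ depend entirely on whether the chosen sample points are rational, so they cannot converge as the mesh is refined:

```python
def riemann_sum_indicator_Q(n, rational_tags):
    # Riemann sum of 1_{[0,1] ∩ Q} over n equal cells of width 1/n.
    # A rational tag always lies in the set (value 1); an irrational
    # tag never does (value 0), so the sums stay at 1 resp. 0 for all n.
    value_at_tag = 1.0 if rational_tags else 0.0
    return sum(value_at_tag * (1.0 / n) for _ in range(n))

for n in (10, 100, 1000):
    print(n, riemann_sum_indicator_Q(n, True), riemann_sum_indicator_Q(n, False))
# prints 1.0 vs 0.0 for every n: the Riemann sums have no common limit
```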
Goals of this course
• Generalisation of Riemann integration to Lebesgue integration using measure theory, involving a precise treatment of sets $A$ and functions $\rho$ for which $(*)$ is defined
• Using measure theory as the basis of advanced probability and discussing applications in that area
Official schedule
Measure spaces, $\sigma$-algebras, $\pi$-systems and uniqueness of extension, statement * and proof * of Caratheodory's extension theorem. Construction of Lebesgue measure on R. The Borel $\sigma$-algebra of R. Existence of non-measurable subsets of R. Lebesgue-Stieltjes measures and probability distribution functions. Independence of events, independence of $\sigma$-algebras. The Borel-Cantelli lemmas. Kolmogorov's zero-one law. [6]

Measurable functions, random variables, independence of random variables. Construction of the integral, expectation. Convergence in measure and convergence almost everywhere. Fatou's lemma, monotone and dominated convergence, differentiation under the integral sign. Discussion of product measure and statement of Fubini's theorem. [6]

Chebyshev's inequality, tail estimates. Jensen's inequality. Completeness of $L^p$ for $1 \le p \le \infty$. The Holder and Minkowski inequalities, uniform integrability. [4]

$L^2$ as a Hilbert space. Orthogonal projection, relation with elementary conditional probability. Variance and covariance. Gaussian random variables, the multivariate normal distribution. [2]

The strong law of large numbers, proof for independent random variables with bounded fourth moments. Measure-preserving transformations, Bernoulli shifts. Statements * and proofs * of maximal ergodic theorem and Birkhoff's almost everywhere ergodic theorem, proof of the strong law. [4]

The Fourier transform of a finite measure, characteristic functions, uniqueness and inversion. Weak convergence, statement of Levy's convergence theorem for characteristic functions. The central limit theorem. [2]
Appropriate books
P. Billingsley, Probability and Measure. Wiley 1995 (hardback).
R.M. Dudley, Real Analysis and Probability. CUP 2002 (paperback).
R.L. Schilling, Measures, Integrals and Martingales. CUP 2005 (paperback).
R.T. Durrett, Probability: Theory and Examples. Wadsworth & Brooks/Cole 1991 (hardback).
D. Williams, Probability with Martingales. CUP (paperback).
From the point of view of analysis, the first chapters of this book might be interesting:
S. Kantorovitz, Introduction to Modern Analysis. Oxford 2003 (hardback).
Non-examinable material
• proof of Caratheodory's extension theorem on handout 1
• part (a) of the proof of Skorohod's representation theorem on handout 2
• connection between Lebesgue and Riemann integration on handout 3
• proof of the maximal ergodic lemma and Birkhoff's almost everywhere ergodic theorem on handout 4 and in Section 6
1 Set systems and measures

Let $E$ be an arbitrary set and $\mathcal{E} \subseteq \mathcal{P}(E)$ a set of subsets. To define a measure $\mu : \mathcal{E} \to [0,\infty]$ (see Section 1.2) we first need to identify a proper domain of definition.

1.1 Set systems

Definition 1.1. Let $\mathcal{E} \subseteq \mathcal{P}(E)$. $\mathcal{E}$ is called a ring if for all $A, B \in \mathcal{E}$
(i) $\emptyset \in \mathcal{E}$, (ii) $B \setminus A \in \mathcal{E}$, (iii) $A \cup B \in \mathcal{E}$.
$\mathcal{E}$ is called an algebra if for all $A, B \in \mathcal{E}$
(i) $\emptyset \in \mathcal{E}$, (ii) $A^c = E \setminus A \in \mathcal{E}$, (iii) $A \cup B \in \mathcal{E}$.
If, moreover, (iii) holds for countable unions, i.e. $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{E}$ for every sequence $(A_n)_{n\in\mathbb{N}}$ in $\mathcal{E}$, then $\mathcal{E}$ is called a $\sigma$-algebra.
Properties. (i) A ($\sigma$-)algebra is closed under (countably) finitely many set operations, since
\[ A \cap B = (A^c \cup B^c)^c \in \mathcal{E} , \qquad \bigcap_{n\in\mathbb{N}} A_n = \Big( \bigcup_{n\in\mathbb{N}} A_n^c \Big)^c , \]
\[ A \setminus B = A \cap B^c \in \mathcal{E} , \qquad A \,\triangle\, B = (A \setminus B) \cup (B \setminus A) \in \mathcal{E} . \]
(ii) $\mathcal{R} = \Big\{ \bigcup_{i=1}^{n} (a_i, b_i] : a_i < b_i,\ i = 1,\ldots,n,\ n \in \mathbb{N}_0 \Big\}$ is the ring of half-open intervals ($\emptyset$ is given by the empty union $n = 0$). $\mathcal{R}$ is an algebra if we allow for infinite intervals and identify $\mathbb{R} = (-\infty, \infty]$.
(iii) Beware: $\Big\{ \bigcup_{i=1}^{\infty} (a_i, b_i] : a_i, b_i \in [-\infty, \infty],\ a_i < b_i,\ i \in \mathbb{N} \Big\}$ is not a $\sigma$-algebra (see problem 1.9(a)).
Lemma 1.1. Let $\{\mathcal{E}_i : i \in I\}$ be a (possibly uncountable) collection of $\sigma$-algebras. Then $\bigcap_{i\in I} \mathcal{E}_i$ is a $\sigma$-algebra, whereas $\bigcup_{i\in I} \mathcal{E}_i$ in general is not.

Proof. Let $\mathcal{E} = \bigcap_{i\in I} \mathcal{E}_i$. We check (i) to (iii) in the above definition:
(i) Since $\emptyset \in \mathcal{E}_i$ for all $i \in I$, $\emptyset \in \mathcal{E}$. (ii) For every $A \in \mathcal{E}$ we have $A^c \in \mathcal{E}_i$ for all $i \in I$, so $A^c \in \mathcal{E}$.
(iii) Let $A_1, A_2, \ldots \in \mathcal{E}$. Then $A_k \in \mathcal{E}_i$ for all $k \in \mathbb{N}$ and $i \in I$, hence $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{E}_i$ for each $i \in I$ and so $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{E}$. For the second part see problem 1.1(c). □
Definition 1.2. Let $\mathcal{A} \subseteq \mathcal{P}(E)$. Then the $\sigma$-algebra generated by $\mathcal{A}$ is
\[ \sigma(\mathcal{A}) := \bigcap_{\mathcal{E} \supseteq \mathcal{A},\ \mathcal{E}\ \sigma\text{-alg.}} \mathcal{E} , \]
the smallest $\sigma$-algebra containing $\mathcal{A}$.

Lemma 1.2. Let $\mathcal{I} = \{(a,b] : a < b\}$ and let $\mathcal{R}$ be the ring of half-open intervals. Then $\sigma(\mathcal{I}) = \sigma(\mathcal{R}) = \mathcal{B}$, the Borel $\sigma$-algebra generated by the open subsets of $\mathbb{R}$.
Proof. (i) $\mathcal{I} \subseteq \mathcal{R}$ $\Rightarrow$ $\sigma(\mathcal{I}) \subseteq \sigma(\mathcal{R})$. On the other hand, each $A \in \mathcal{R}$ can be written as $A = \bigcup_{i=1}^{n} (a_i, b_i] \in \sigma(\mathcal{I})$. Thus $\mathcal{R} \subseteq \sigma(\mathcal{I})$ $\Rightarrow$ $\sigma(\mathcal{R}) \subseteq \sigma(\mathcal{I})$.
(ii) Each $A \in \mathcal{I}$ can be written as $A = (a,b] = \bigcap_{n=1}^{\infty} \big( a, b + \tfrac1n \big) \in \mathcal{B}$ $\Rightarrow$ $\sigma(\mathcal{I}) \subseteq \mathcal{B}$.
Let $A \subseteq \mathbb{R}$ be open, i.e. $\forall x \in A\ \exists \epsilon_x > 0 : (x - \epsilon_x, x + \epsilon_x) \subseteq A$. Thus
\[ \forall x \in A\ \exists a_x, b_x \in \mathbb{Q} : \{x\} \subseteq (a_x, b_x] \subseteq A . \]
Then $A = \bigcup_{x\in A} (a_x, b_x]$, which is a countable union since $a_x, b_x \in \mathbb{Q}$.
Thus $A \in \sigma(\mathcal{I})$ $\Rightarrow$ $\mathcal{B} \subseteq \sigma(\mathcal{I})$. □
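As a minimal illustration of Definition 1.2 (an added example): for a single set $A \subseteq E$,
\[ \sigma(\{A\}) = \{\emptyset, A, A^c, E\} , \]
since any $\sigma$-algebra containing $A$ must contain these four sets, and they already form a $\sigma$-algebra.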
1.2 Measures

A set function $\mu : \mathcal{E} \to [0,\infty]$ with $\mu(\emptyset) = 0$ is called additive if $\mu(A \cup B) = \mu(A) + \mu(B)$ for all disjoint $A, B \in \mathcal{E}$, and countably additive if $\mu\big( \bigcup_{n\in\mathbb{N}} A_n \big) = \sum_{n\in\mathbb{N}} \mu(A_n)$ for every sequence of pairwise disjoint sets $A_n \in \mathcal{E}$ with $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{E}$. Countable additivity implies additivity, taking $A_1 = A$, $A_2 = B$, $A_3 = A_4 = \ldots = \emptyset$.
Definition 1.6. Let $(E, \mathcal{E})$ be a measurable space. A countably additive set function $\mu : \mathcal{E} \to [0,\infty]$ is called a measure, and the triple $(E, \mathcal{E}, \mu)$ is called a measure space. If $\mu(E) < \infty$, $\mu$ is called finite. If $\mu(E) = 1$, $\mu$ is a probability measure and $(E, \mathcal{E}, \mu)$ is a probability space. If $E$ is a topological space and $\mathcal{E} = \mathcal{B}(E)$, then $\mu$ is called a Borel measure.
Basic properties. Let $(E, \mathcal{E}, \mu)$ be a measure space.
(i) $\mu$ is non-decreasing: for all $A, B \in \mathcal{E}$ with $A \subseteq B$ it is $\mu(B) = \mu(B \setminus A) + \mu(A) \ge \mu(A)$.
(Note: the version $\mu(B \setminus A) = \mu(B) - \mu(A)$ only makes sense if $\mu(A) < \infty$.)
(ii) $\mu$ is subadditive: for all $A, B \in \mathcal{E}$,
\[ \mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B) \ge \mu(A \cup B) . \]
(Again: $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$ only if $\mu(A \cap B) < \infty$.)
(iii) $\mu$ is also countably subadditive (see problem 1.6(b)).
(iv) Let $\mathcal{E}_1 \subseteq \mathcal{E}_2$ be $\sigma$-algebras. If $\mu$ is a measure on $\mathcal{E}_2$, then it is also one on $\mathcal{E}_1$.
(v) For $A \in \mathcal{E}$ the restriction $\mu_A = \mu(\,\cdot \cap A)$ is a measure on $(E, \mathcal{E})$.
Remark. These properties also hold for countably additive set functions (called premeasures) on a ring, and (i) and (ii) also for additive set functions on a ring.
Examples. (i) For every $x \in E$, the Dirac measure $\delta_x$ is given by $\delta_x(A) = 1$ if $x \in A$ and $\delta_x(A) = 0$ if $x \notin A$.
(ii) Discrete measure theory: let $E$ be countable. Every measure $\mu$ on $\big( E, \mathcal{P}(E) \big)$ can be characterised by a mass function $m : E \to [0,\infty]$,
\[ \mu = \sum_{x\in E} m(x)\,\delta_x \quad\text{or equivalently}\quad \forall A \subseteq E : \ \mu(A) = \sum_{x\in A} \mu(\{x\}) = \sum_{x\in A} m(x) . \]
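For intuition only (an added sketch; the DiscreteMeasure class is hypothetical), a measure on a countable set is literally a dictionary of masses:

```python
class DiscreteMeasure:
    """Measure on a countable set, determined by a mass function m."""

    def __init__(self, mass):          # mass: dict mapping points to m(x) >= 0
        self.mass = dict(mass)

    def __call__(self, A):             # mu(A) = sum of m(x) over x in A
        return sum(self.mass.get(x, 0.0) for x in A)

# Dirac measure at 3 and a geometric probability measure on {1, 2, ...}
delta_3 = DiscreteMeasure({3: 1.0})
geom = DiscreteMeasure({n: 2.0**-n for n in range(1, 60)})  # truncated tail
print(delta_3({1, 2, 3}), geom(range(1, 60)))               # 1.0 and ~1.0
```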
In order to attack this question in the next subsection, it is useful to introduce the following property of set functions.

Definition 1.7. Let $\mathcal{E}$ be a ring on $E$ and $\mu : \mathcal{E} \to [0,\infty]$ an additive set function. $\mu$ is continuous at $A \in \mathcal{E}$ if
(i) $\mu$ is continuous from below: given any $A_1 \subseteq A_2 \subseteq A_3 \subseteq \ldots$ in $\mathcal{E}$ with $\bigcup_{n\in\mathbb{N}} A_n = A \in \mathcal{E}$ ($A_n \nearrow A$), then $\mu(A_n) \to \mu(A)$;
(ii) $\mu$ is continuous from above: given any $A_1 \supseteq A_2 \supseteq A_3 \supseteq \ldots$ in $\mathcal{E}$ with $\bigcap_{n\in\mathbb{N}} A_n = A \in \mathcal{E}$ ($A_n \searrow A$) and $\mu(A_n) < \infty$ for some $n \in \mathbb{N}$, then $\mu(A_n) \to \mu(A)$.
Lemma 1.3. Let $\mathcal{E}$ be a ring on $E$ and $\mu : \mathcal{E} \to [0,\infty]$ an additive set function. Then:
(i) $\mu$ is countably additive $\Rightarrow$ $\mu$ is continuous at all $A \in \mathcal{E}$;
(ii) $\mu$ is continuous from below at all $A \in \mathcal{E}$ $\Rightarrow$ $\mu$ is countably additive;
(iii) if $\mu(A) < \infty$ for all $A \in \mathcal{E}$: $\mu$ is continuous from above at $\emptyset$ $\Rightarrow$ $\mu$ is countably additive.

Remark. The condition $\mu(A_n) < \infty$ for some $n \in \mathbb{N}$ in (ii) of the definition is necessary for measures to be continuous. Consider e.g. $E = \mathbb{N}$, $A_k = \{k, k+1, \ldots\} \searrow \emptyset$ and $\mu$ the counting measure. Then $\mu(A_k) = \infty$ for all $k \in \mathbb{N}$, but $\mu(A) = \mu(\emptyset) = 0$.
Proof. (i) Given $A_n \nearrow A$ in $\mathcal{E}$, then $A = (A_1 \setminus A_0) \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \ldots$ (with $A_0 = \emptyset$) and
\[ \mu(A) = \sum_{n=0}^{\infty} \mu(A_{n+1} \setminus A_n) = \lim_{m\to\infty} \sum_{n=0}^{m-1} \mu(A_{n+1} \setminus A_n) = \lim_{m\to\infty} \mu(A_m) . \]
Given $A_n \searrow A$ in $\mathcal{E}$ and $\mu(A_m) < \infty$ for some $m \in \mathbb{N}$, let $B_n := A_m \setminus A_n$ for $n \ge m$. Then $B_n \nearrow (A_m \setminus A)$ for $n \to \infty$ and thus, following the above, $\mu(A_m) - \mu(A_n) = \mu(B_n) \to \mu(A_m \setminus A) = \mu(A_m) - \mu(A)$, so $\mu(A_n) \to \mu(A)$ as $n \to \infty$.
1.3 Extension and uniqueness

Theorem 1.4. Caratheodory's extension theorem
Let $\mathcal{E}$ be a ring of subsets of $E$ and $\mu : \mathcal{E} \to [0,\infty]$ a countably additive set function. Then $\mu$ extends to a measure on the $\sigma$-algebra $\sigma(\mathcal{E})$.

Proof. The proof is not examinable and is given on Handout 1 in the appendix. □
To formulate a result on uniqueness two further notions are useful.

Definition 1.8. Let $E$ be a set. $\mathcal{E} \subseteq \mathcal{P}(E)$ is called a $\pi$-system if
\[ A, B \in \mathcal{E} : \ A \cap B \in \mathcal{E} . \]
$\mathcal{E}$ is called a d-system if
(i) $E \in \mathcal{E}$,
(ii) $A, B \in \mathcal{E}$, $A \subseteq B$ : $B \setminus A \in \mathcal{E}$,
(iii) $A_1, A_2, \ldots \in \mathcal{E}$ with $A_1 \subseteq A_2 \subseteq \ldots$ : $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{E}$.
Remarks.
(i) The set $\mathcal{I} \cup \{\emptyset\} = \{(a,b] : a < b\} \cup \{\emptyset\}$ is a $\pi$-system on $\mathbb{R}$, and we have shown in Lemma 1.2 that $\sigma(\mathcal{I}) = \mathcal{B}$.
(ii) $\mathcal{E}$ is a $\sigma$-algebra $\Leftrightarrow$ $\mathcal{E}$ is a $\pi$- and a d-system.

Lemma 1.5. Dynkin's $\pi$-system lemma
Let $\mathcal{E}$ be a $\pi$-system. Then the smallest d-system $\delta(\mathcal{E})$ containing $\mathcal{E}$ satisfies $\delta(\mathcal{E}) = \sigma(\mathcal{E})$; in particular, any d-system containing $\mathcal{E}$ contains $\sigma(\mathcal{E})$.

Proof. Since every $\sigma$-algebra is a d-system, $\delta(\mathcal{E}) \subseteq \sigma(\mathcal{E})$, and by Remark (ii) it suffices to show that $\delta(\mathcal{E})$ is a $\pi$-system. Consider
\[ \mathcal{D}' = \big\{ B \in \delta(\mathcal{E}) : B \cap A \in \delta(\mathcal{E}) \text{ for all } A \in \mathcal{E} \big\} . \]
Since $\mathcal{E}$ is a $\pi$-system, $\mathcal{E} \subseteq \mathcal{D}'$, and one checks that $\mathcal{D}'$ is a d-system, hence $\mathcal{D}' = \delta(\mathcal{E})$, i.e. every $B \in \delta(\mathcal{E})$ satisfies $B \in \mathcal{D}'$.
Now consider
\[ \mathcal{D}'' = \big\{ B \in \delta(\mathcal{E}) : B \cap A \in \delta(\mathcal{E}) \text{ for all } A \in \delta(\mathcal{E}) \big\} \subseteq \mathcal{D}' . \]
Then $\mathcal{E} \subseteq \mathcal{D}''$ because $\mathcal{D}' = \delta(\mathcal{E})$. We can check that $\mathcal{D}''$ is a d-system, just as we did for $\mathcal{D}'$. Hence $\mathcal{D}'' = \delta(\mathcal{E})$, which shows that $\delta(\mathcal{E})$ is a $\pi$-system. □
Theorem 1.6. Uniqueness of extension
Let $\mathcal{E} \subseteq \mathcal{P}(E)$ be a $\pi$-system. Suppose that $\mu_1, \mu_2$ are measures on $\sigma(\mathcal{E})$ with $\mu_1(E) = \mu_2(E) < \infty$. If $\mu_1 = \mu_2$ on $\mathcal{E}$, then $\mu_1 = \mu_2$ on $\sigma(\mathcal{E})$.
Equivalently, if $\mu(E) < \infty$ the measure $\mu$ on $\sigma(\mathcal{E})$ is uniquely determined by its values on the $\pi$-system $\mathcal{E}$.

Proof. Consider $\mathcal{D} = \{ A \in \sigma(\mathcal{E}) : \mu_1(A) = \mu_2(A) \} \subseteq \sigma(\mathcal{E})$. By hypothesis, $\mathcal{E} \subseteq \mathcal{D}$ and $E \in \mathcal{D}$. For $A, B \in \mathcal{D}$ with $A \subseteq B$ we have
\[ \mu_1(B \setminus A) = \mu_1(B) - \mu_1(A) = \mu_2(B) - \mu_2(A) = \mu_2(B \setminus A) < \infty , \]
thus also $B \setminus A \in \mathcal{D}$. If $A_n \in \mathcal{D}$, $n \in \mathbb{N}$, with $A_n \nearrow A$, then
\[ \mu_1(A) = \lim_{n\to\infty} \mu_1(A_n) = \lim_{n\to\infty} \mu_2(A_n) = \mu_2(A) \quad\Rightarrow\quad A \in \mathcal{D} . \]
So $\mathcal{D}$ is a d-system containing the $\pi$-system $\mathcal{E}$, and hence $\mathcal{D} = \sigma(\mathcal{E})$ by Dynkin's lemma (1.5). □
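The $\pi$-system assumption cannot be dropped; a standard counterexample (added for illustration): on $E = \{1,2,3,4\}$ let $\mathcal{E} = \{\{1,2\}, \{2,3\}\}$, which is not a $\pi$-system, let $\mu_1$ put mass $1/4$ on each point, and let $\mu_2$ put mass $1/2$ on each of $2$ and $4$. Then $\mu_1 = \mu_2 = 1/2$ on $\mathcal{E}$ and $\mu_1(E) = \mu_2(E) = 1$, but $\mu_1(\{2\}) = 1/4 \neq 1/2 = \mu_2(\{2\})$ with $\{2\} = \{1,2\} \cap \{2,3\} \in \sigma(\mathcal{E})$, so $\mu_1 \neq \mu_2$ on $\sigma(\mathcal{E})$.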
1.4 Lebesgue(-Stieltjes) measure

Theorem 1.7. There exists a unique Borel measure $\lambda$ on $(\mathbb{R}, \mathcal{B})$ such that
\[ \lambda\big( (a,b] \big) = b - a , \quad\text{for all } a, b \in \mathbb{R} \text{ with } a < b . \]
The measure $\lambda$ is called Lebesgue measure on $\mathbb{R}$.
Proof. (Existence) Let $\mathcal{R}$ be the ring of half-open intervals. Consider the set function $\lambda(A) = \sum_{i=1}^{n} (b_i - a_i)$, where $A = \bigcup_{i=1}^{n} (a_i, b_i]$ is a disjoint union, $n \in \mathbb{N}_0$. We aim to show that $\lambda$ is countably additive on $\mathcal{R}$, which then proves existence by Caratheodory's extension theorem.
Since $\lambda(A) < \infty$ for all $A \in \mathcal{R}$, by Lemma 1.3(iii) it suffices to show that $\lambda$ is continuous from above at $\emptyset$. Suppose not. Then there exist $\epsilon > 0$ and $A_n \searrow \emptyset$ with $\lambda(A_n) \ge 2\epsilon$ for all $n$. For each $n$ we can find $C_n \in \mathcal{R}$ with $\bar{C}_n \subseteq A_n$ and $\lambda(A_n \setminus C_n) \le \epsilon\,2^{-n}$ (see problem 1.9(b)). Then
\[ \lambda\big( A_n \setminus (C_1 \cap \ldots \cap C_n) \big) \le \sum_{k=1}^{n} \lambda(A_n \setminus C_k) \le \sum_{k=1}^{n} \lambda(A_k \setminus C_k) \le \sum_{k=1}^{n} \epsilon\,2^{-k} \le \epsilon , \]
so that $\lambda(C_1 \cap \ldots \cap C_n) \ge \lambda(A_n) - \epsilon \ge \epsilon$, and in particular $K_n := \bar{C}_1 \cap \ldots \cap \bar{C}_n \neq \emptyset$. The $K_n$ are compact and nested, so $\bigcap_n K_n \neq \emptyset$, contradicting $K_n \subseteq A_n \searrow \emptyset$.
Definition 1.9. The completion of $\mathcal{B}$ with respect to the Lebesgue measure $\lambda$ is the $\sigma$-algebra $\mathcal{L} = \{ A \cup N : A \in \mathcal{B},\ N \text{ null} \}$, where $N \subseteq \mathbb{R}$ is null if $N \subseteq B$ for some $B \in \mathcal{B}$ with $\lambda(B) = 0$ (see problem 1.10). Sets in $\mathcal{L}$ are called Lebesgue measurable.

Theorem 1.8. $\mathcal{B} \subsetneq \mathcal{L} \subsetneq \mathcal{P}(\mathbb{R})$. Moreover,
\[ \mathrm{card}(\mathcal{B}) = \mathrm{card}(\mathbb{R}) = \mathfrak{c} \quad\text{whereas}\quad \mathrm{card}(\mathcal{L}) = \mathrm{card}\big( \mathcal{P}(\mathbb{R}) \big) = 2^{\mathfrak{c}} . \]

Proof. (i) According to problem 1.4, $\mathcal{B}$ is separable, i.e. generated by a countable set system $\mathcal{E}$; this gives $\mathrm{card}(\mathcal{B}) \le \mathfrak{c}$. On the other hand, $\{x\} \in \mathcal{B}$ for all $x \in \mathbb{R}$, and thus $\mathrm{card}(\mathcal{B}) = \mathfrak{c}$. With problem 1.10 the Cantor set $C \subseteq \mathbb{R}$ is uncountable and thus $\mathrm{card}(C) = \mathfrak{c}$. Since also $\lambda(C) = 0$, with Definition 1.9 we have $\mathcal{P}(C) \subseteq \mathcal{L}$ and thus $\mathrm{card}(\mathcal{L}) = \mathrm{card}\big( \mathcal{P}(\mathbb{R}) \big) = 2^{\mathfrak{c}} > \mathfrak{c}$.
(ii) Using the axiom of choice we construct a subset of $U = [0,1]$ which is not in $\mathcal{L}$:
Define the equivalence relation $\sim$ on $U$ by $x \sim y$ if $x - y \in \mathbb{Q}$.
Write $\{E_i : i \in I\}$ for the equivalence classes of $\sim$ and let $R = \{e_i : i \in I\}$ be a collection of representatives $e_i \in E_i$, chosen by the axiom of choice. Then $U$ can be partitioned as
\[ U = \bigcup_{i\in I} E_i = \bigcup_{i\in I} \bigcup_{q\in\mathbb{Q}\cap[0,1)} (e_i + q) = \bigcup_{q\in\mathbb{Q}\cap[0,1)} \underbrace{\bigcup_{i\in I} (e_i + q)}_{= R + q} = \bigcup_{q\in\mathbb{Q}\cap[0,1)} (R + q) , \]
with addition modulo 1, so that the $R + q$ are disjoint translates of $R$. If $R \in \mathcal{L}$, then by translation invariance of $\lambda$
\[ 1 = \lambda(U) = \sum_{q\in\mathbb{Q}\cap[0,1)} \lambda(R + q) = \sum_{q\in\mathbb{Q}\cap[0,1)} \lambda(R) \in \{0, \infty\} , \]
a contradiction. Thus $R \notin \mathcal{L}$. □

Remarks. (i) Every set of positive measure has non-measurable subsets and, moreover,
\[ \mathcal{P}(A) \subseteq \mathcal{L} \quad\Leftrightarrow\quad \lambda(A) = 0 . \]
Definition 1.10. A non-decreasing, right-continuous function $F : \mathbb{R} \to \mathbb{R}$ is called a distribution function; if moreover $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$, it is called a probability distribution function.
Proposition 1.9. Let $\mu$ be a Radon measure⁴ on $(\mathbb{R}, \mathcal{B})$. Then for every $r \in \mathbb{R}$
\[ F_r(x) := \begin{cases} \mu\big( (r,x] \big) , & x > r \\ -\mu\big( (x,r] \big) , & x \le r \end{cases} \qquad\text{(in particular } F_r(r) = \mu(\emptyset) = 0\text{)} \]
is a distribution function with $\mu\big( (a,b] \big) = F_r(b) - F_r(a)$, $a < b$.
Also $F_r + C$ is a distribution function with that property for all $C \in \mathbb{R}$.
The $F_r$ differ only in an additive constant, namely $F_r(x) = F_0(x) - F_0(r)$ for all $r \in \mathbb{R}$.

Proof. $F_r \nearrow$ by monotonicity of $\mu$, and $\mu\big( (a,b] \big) = F_r(b) - F_r(a)$ for $b > a$ by definition. For $x_n \searrow x > r$ it is $F_r(x_n) = \mu\big( (r, x_n] \big) \searrow \mu\big( (r,x] \big) = F_r(x)$ by continuity of measures, so $F_r$ is right-continuous. For $x_n \searrow r$ we have $(r, x_n] \searrow \emptyset$, such that $F_r(x_n) \searrow \mu(\emptyset) = 0 = F_r(r)$.
The $F_r$ differ only in a constant, since for all $x > r$
\[ 0 = \mu\big( (r,x] \big) - \mu\big( (r,x] \big) = F_0(x) - F_0(r) - F_r(x) + F_r(r) , \]
and $F_r(r) = 0 = F_0(r) - F_0(r)$. Both statements follow analogously for $x \le r$. □
Remarks. (i) The distribution functions for the Lebesgue measure are $F_r(x) = x - r$, $r \in \mathbb{R}$.
(ii) Note that for $x_n \nearrow x$ we have $(x_n, x] \searrow \{x\} \neq \emptyset$, so that $F_r$ as defined in Proposition 1.9 is in general not left-continuous.
(iii) If $\mu$ is a probability measure one usually uses the cumulative distribution function
\[ \mathrm{CDF}_\mu(x) := F(x) = \mu\big( (-\infty, x] \big) . \]
E.g. for the Dirac measure $\delta_a$ concentrated in $a \in \mathbb{R}$, $\mathrm{CDF}_{\delta_a}(x) = 0$ for $x < a$ and $1$ for $x \ge a$.
Theorem 1.10. Let $F : \mathbb{R} \to \mathbb{R}$ be a distribution function. Then there exists a unique Radon measure $\mu_F$ on $(\mathbb{R}, \mathcal{B})$ with $\mu_F\big( (a,b] \big) = F(b) - F(a)$ for all $a < b$, the Lebesgue-Stieltjes measure associated with $F$. On the ring $\mathcal{R}$ one sets
\[ \mu_F(A) = \sum_{i=1}^{n} \big( F(b_i) - F(a_i) \big) \quad\text{where } A = \bigcup_{i=1}^{n} (a_i, b_i] . \]
The proof is then the same as that of Theorem 1.7 for Lebesgue measure.
Remark. Analogous to Definition 1.9, $\mathcal{B}$ can also be completed with respect to the Lebesgue-Stieltjes measure $\mu_F$. However, the completion $\mathcal{L}_F$ depends on the measure $\mu_F$. Although $\mathcal{L}$ is much larger than $\mathcal{B}$ (see Theorem 1.8), it is therefore preferable to work with the measure space $(\mathbb{R}, \mathcal{B})$, since it is defined independently of the measure, and all $\mu_F$ then have the same domain of definition.
⁴A Borel measure $\mu$ on $(\mathbb{R}, \mathcal{B})$ is called a Radon measure if $\mu(K) < \infty$ for every compact $K \subseteq \mathbb{R}$.
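As a concrete instance (an added example): for the exponential distribution function the Lebesgue-Stieltjes recipe reproduces the familiar probability of an interval,
\[ F(x) = \big( 1 - e^{-x} \big)\,1_{[0,\infty)}(x) \quad\Rightarrow\quad \mu_F\big( (a,b] \big) = F(b) - F(a) = e^{-a} - e^{-b} \quad (0 \le a < b) , \]
which is also $\int_a^b e^{-x}\,dx$, anticipating the notion of a density from Section 3.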
1.5 Independence

Let $(E, \mathcal{E}, \mathbb{P})$ be a probability space. It provides a model for an experiment whose outcome is random. $E$ describes the set of possible outcomes, $\mathcal{E}$ the set of events (observable sets of outcomes), and $\mathbb{P}(A)$ is the probability of an event $A \in \mathcal{E}$.
Definition 1.11. The events $(A_i)_{i\in I}$, $A_i \in \mathcal{E}$, are said to be independent if
\[ \mathbb{P}\Big( \bigcap_{i\in J} A_i \Big) = \prod_{i\in J} \mathbb{P}(A_i) \quad\text{for all finite, nonempty } J \subseteq I . \]
The $\sigma$-algebras $(\mathcal{E}_i)_{i\in I}$, $\mathcal{E}_i \subseteq \mathcal{E}$, are said to be independent if the events $(A_i)_{i\in I}$ are independent for any choice $A_i \in \mathcal{E}_i$.
A useful way to establish independence of two $\sigma$-algebras is given below.

Theorem 1.11. Let $\mathcal{E}_1, \mathcal{E}_2 \subseteq \mathcal{E}$ be $\pi$-systems and suppose that
\[ \mathbb{P}(A_1 \cap A_2) = \mathbb{P}(A_1)\,\mathbb{P}(A_2) \quad\text{whenever } A_1 \in \mathcal{E}_1,\ A_2 \in \mathcal{E}_2 . \]
Then $\sigma(\mathcal{E}_1)$ and $\sigma(\mathcal{E}_2)$ are independent.

Proof. Fix $A_1 \in \mathcal{E}_1$ and define the measures $\mu, \nu$ by
\[ \mu(A) = \mathbb{P}(A_1 \cap A) \quad\text{and}\quad \nu(A) = \mathbb{P}(A_1)\,\mathbb{P}(A) . \]
$\mu$ and $\nu$ agree on the $\pi$-system $\mathcal{E}_2$ with $\mu(E) = \nu(E) = \mathbb{P}(A_1) < \infty$. So, by uniqueness of extension (Theorem 1.6), for all $A_1 \in \mathcal{E}_1$ and $A_2 \in \sigma(\mathcal{E}_2)$
\[ \mathbb{P}(A_1 \cap A_2) = \mu(A_2) = \nu(A_2) = \mathbb{P}(A_1)\,\mathbb{P}(A_2) . \]
Now fix $A_2 \in \sigma(\mathcal{E}_2)$ and repeat the same argument with
\[ \mu'(A) := \mathbb{P}(A \cap A_2) \quad\text{and}\quad \nu'(A) := \mathbb{P}(A)\,\mathbb{P}(A_2) . \quad\square \]

Remark. In particular, the $\sigma$-algebras $\sigma(\{A_1\})$ and $\sigma(\{A_2\})$ generated by single events are independent if and only if $A_1$ and $A_2$ are independent.
Background. Let $(a_n)_{n\in\mathbb{N}}$ be a sequence in $\mathbb{R}$. Then $\lim_{n\to\infty} a_n$ does not necessarily exist, e.g. for $a_n = (-1)^n$. To nevertheless study asymptotic properties of $(a_n)_{n\in\mathbb{N}}$ consider
\[ \underline{a}_n = \inf_{k\ge n} a_k \quad\text{and}\quad \bar{a}_n = \sup_{k\ge n} a_k . \]
Then $\underline{a}_n \nearrow$ and $\bar{a}_n \searrow$ are monotone and both have limits in $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$. Define
\[ \liminf_{n\to\infty} a_n := \lim_{n\to\infty} \inf_{k\ge n} a_k \quad\text{and}\quad \limsup_{n\to\infty} a_n := \lim_{n\to\infty} \sup_{k\ge n} a_k , \]
which are equal to the smallest and largest accumulation point of $(a_n)_{n\in\mathbb{N}}$, respectively. In general $\liminf_n a_n \le \limsup_n a_n$, since $\inf_{k\ge m\vee n} a_k \le \sup_{k\ge m\vee n} a_k$ for all $m, n \in \mathbb{N}$. They may be different, as e.g. $\liminf_n (-1)^n = -1 < 1 = \limsup_n (-1)^n$. Both are equal if and only if $\lim_n a_n$ exists. Note that a sequence may have many more than two accumulation points, e.g. for $a_n = \sin n$ the set of accumulation points is $[-1,1]$, with $\liminf_n a_n = -1$ and $\limsup_n a_n = 1$.
We use the same concept for a sequence $(A_n)_{n\in\mathbb{N}}$ of subsets of a set $E$. Note that $\bigcap_{k\ge n} A_k \nearrow$ and $\bigcup_{k\ge n} A_k \searrow$ are monotone in $n$.

Definition 1.12. Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of sets in $E$. Then define the sets
\[ \liminf_{n\to\infty} A_n := \bigcup_{n\in\mathbb{N}} \bigcap_{k\ge n} A_k \quad\text{and}\quad \limsup_{n\to\infty} A_n := \bigcap_{n\in\mathbb{N}} \bigcup_{k\ge n} A_k . \]
One also writes $\liminf_n A_n = \{A_n \text{ ev.}\}$ (eventually) and $\limsup_n A_n = \{A_n \text{ i.o.}\}$ (infinitely often).
Properties. (i) Let $\mathcal{E}$ be a $\sigma$-algebra. If $A_n \in \mathcal{E}$ for all $n \in \mathbb{N}$ then $\liminf_n A_n, \limsup_n A_n \in \mathcal{E}$.
(iii) $\big( \limsup_n A_n \big)^c = \liminf_n A_n^c$, since $\big( \bigcap_{n} \bigcup_{k\ge n} A_k \big)^c = \bigcup_{n} \bigcap_{k\ge n} A_k^c$.

Lemma. First Borel-Cantelli lemma
Let $(E, \mathcal{E}, \mu)$ be a measure space and $A_n \in \mathcal{E}$, $n \in \mathbb{N}$, with $\sum_{n\in\mathbb{N}} \mu(A_n) < \infty$. Then $\mu(A_n \text{ i.o.}) = 0$.

Proof.
\[ \mu(A_n \text{ i.o.}) = \mu\Big( \bigcap_{n\in\mathbb{N}} \bigcup_{k\ge n} A_k \Big) \le \mu\Big( \bigcup_{k\ge n} A_k \Big) \le \sum_{k\ge n} \mu(A_k) \to 0 \quad\text{for } n \to \infty . \quad\square \]
kn
Proof. We use the inequality 1 a ea . With (An )nN also (Acn )nN are independent
(see problem 1.11). Then we have for all n N
\
Y
h X
i
P
Ack =
1 P(Ak ) exp
P(Ak ) = 0 .
kn
Hence
kn
kn
[ \
nN kn
14
Ack = 1 .
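The two lemmas give a sharp dichotomy for independent events; a standard example (added): if the $A_n$ are independent with $\mathbb{P}(A_n) = n^{-2}$, then $\sum_n \mathbb{P}(A_n) < \infty$ and a.s. only finitely many $A_n$ occur, whereas for $\mathbb{P}(A_n) = n^{-1}$ we have $\sum_n \mathbb{P}(A_n) = \infty$ and a.s. infinitely many occur, even though $\mathbb{P}(A_n) \to 0$ in both cases.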
2 Measurable functions and random variables

2.1 Measurable Functions

Definition 2.1. Let $(E, \mathcal{E})$ and $(F, \mathcal{F})$ be measurable spaces. A function $f : E \to F$ is called measurable (with respect to $\mathcal{E}$ and $\mathcal{F}$) or $\mathcal{E}/\mathcal{F}$-measurable if
\[ \forall A \in \mathcal{F} : \ f^{-1}(A) = \{ x \in E : f(x) \in A \} \in \mathcal{E} \qquad \text{(short: } f^{-1}(\mathcal{F}) \subseteq \mathcal{E}\text{)} . \]
Often $(F, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$ or $(\bar{\mathbb{R}}, \bar{\mathcal{B}})$ with the extended real line $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ and $\bar{\mathcal{B}} = \{ B \cup C : B \in \mathcal{B},\ C \subseteq \{-\infty, \infty\} \}$. If in addition $E$ is a topological space and $\mathcal{E} = \mathcal{B}(E)$, $f$ is called a Borel function.
Remarks. (i) Every function $f : E \to F$ is measurable w.r.t. $\mathcal{P}(E)$ and $\mathcal{F}$.
(ii) Preimages of functions preserve the set operations:
\[ f^{-1}\Big( \bigcup_{n\in\mathbb{N}} A_n \Big) = \bigcup_{n\in\mathbb{N}} f^{-1}(A_n) , \qquad f^{-1}(A^c) = \big( f^{-1}(A) \big)^c . \]
Definition 2.2. A function of the form $f = \sum_{i=1}^{n} c_i\,1_{A_i}$ with $c_i \in \mathbb{R}$ and $A_i \in \mathcal{E}$ is called $\mathcal{E}$-simple; write $S(\mathcal{E})$ for the set of simple functions. Simple functions are measurable.

Proposition 2.6. Let $f_1, f_2$ and $f_n$, $n \in \mathbb{N}$, be measurable functions $E \to \bar{\mathbb{R}}$. Then the following are also measurable, whenever they are well defined:
(i) $c\,f_1$ for all $c \in \mathbb{R}$, (ii) $f_1 + f_2$, (iii) $f_1\,f_2$, (iv) $\inf_{n\in\mathbb{N}} f_n$, (v) $\sup_{n\in\mathbb{N}} f_n$,
and consequently also $\liminf_n f_n$ and $\limsup_n f_n$, where
\[ \inf_{n\in\mathbb{N}} f_n : x \mapsto \inf\big\{ f_n(x) : n \in \mathbb{N} \big\} \in \bar{\mathbb{R}} \quad\text{and}\quad \liminf_{n\to\infty} f_n : x \mapsto \lim_{n\to\infty} \inf\big\{ f_k(x) : k \ge n \big\} \in \bar{\mathbb{R}} . \]
Definition 2.3. Let $(E, \mathcal{E})$ and $(F, \mathcal{F})$ be measurable spaces and let $\mu$ be a measure on $(E, \mathcal{E})$. Then any $\mathcal{E}/\mathcal{F}$-measurable function $f : E \to F$ induces the image measure $\nu = \mu \circ f^{-1}$ on $F$, given by $\nu(A) = \mu\big( f^{-1}(A) \big)$ for all $A \in \mathcal{F}$.
Remark. $\nu$ is a measure since $f^{-1}(A) \in \mathcal{E}$ for all $A \in \mathcal{F}$ and $f^{-1}$ preserves set operations, as has been shown above.
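A small worked example (added): let $\mu$ be Lebesgue measure restricted to $(0,1]$ and $f(x) = -\log x$. For $t \ge 0$,
\[ \nu\big( (t, \infty) \big) = \mu\big( \{ x \in (0,1] : -\log x > t \} \big) = \mu\big( (0, e^{-t}) \big) = e^{-t} , \]
so the image measure $\nu = \mu \circ f^{-1}$ is the exponential distribution.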
2.2 Random Variables

Definition 2.4. A random variable $X$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ is an $\mathcal{A}/\mathcal{B}$-measurable function $X : \Omega \to \mathbb{R}$; its distribution is the image measure $\mu_X = \mathbb{P} \circ X^{-1}$, with distribution function $F_X(x) = \mathbb{P}(X \le x)$.

Remark. By Proposition 1.9 and Theorem 1.10, $\mu_X$ is characterised by $F_X$. Usually random variables are given by their distribution function without specifying $(\Omega, \mathcal{A}, \mathbb{P})$ and the function $X : \Omega \to \mathbb{R}$.
Definition 2.5. The random variables $(X_n)_{n\in\mathbb{N}}$ defined on $(\Omega, \mathcal{A}, \mathbb{P})$ and taking values in $(E, \mathcal{E})$ are called independent if the $\sigma$-algebras $\sigma(X_n) = X_n^{-1}(\mathcal{E}) \subseteq \mathcal{A}$ are independent.

Lemma 2.7. Real random variables $(X_n)_{n\in\mathbb{N}}$ are independent if and only if
\[ \mathbb{P}\big( X_1 \le x_1, \ldots, X_k \le x_k \big) = \mathbb{P}(X_1 \le x_1) \cdots \mathbb{P}(X_k \le x_k) \]
for all $x_1, \ldots, x_k \in \mathbb{R}$, $k \in \mathbb{N}$.

Proof. See problem 2.5 for two random variables $X_1, X_2$. This extends to $X_1, \ldots, X_k$, noting that by continuity of measures, e.g. for $k = 3$,
\[ \mathbb{P}(X_1 \le x_1,\ X_3 \le x_3) = \lim_{x_2\to\infty} \mathbb{P}(X_1 \le x_1,\ X_2 \le x_2,\ X_3 \le x_3) . \quad\square \]
The $\sigma$-algebra
\[ \mathcal{F}_n = \sigma(X_0, \ldots, X_n) = \sigma\Big( \bigcup_{i=0}^{n} X_i^{-1}(\mathcal{E}) \Big) \subseteq \mathcal{A} \]
contains the events depending measurably on $X_0, \ldots, X_n$ and represents what is known about the process by time $n$. $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$ for each $n$, and the family $(\mathcal{F}_n)_{n\in\mathbb{N}}$ is called the filtration generated by the process $(X_n)_{n\ge 0}$.
Definition 2.6. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random variables. Define
\[ \mathcal{T}_n = \sigma(X_{n+1}, X_{n+2}, \ldots) \quad (\searrow \text{ in } n) \qquad\text{and}\qquad \mathcal{T} = \bigcap_{n\in\mathbb{N}} \mathcal{T}_n \subseteq \mathcal{A} . \]
Then $\mathcal{T}$ is called the tail $\sigma$-algebra of $(X_n)_{n\in\mathbb{N}}$, and elements in $\mathcal{T}$ are called tail events.
Example. $A = \{\omega : \lim_n X_n(\omega) \text{ exists}\} \in \mathcal{T}$, since $A = \{\omega : \lim_n X_{N+n}(\omega) \text{ exists}\} \in \mathcal{T}_N$ for every fixed $N \in \mathbb{N}$. Similarly, $\{\omega : \limsup_n X_n(\omega) = 137\} \in \mathcal{T}$.

Theorem. Kolmogorov's 0-1 law
Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of independent random variables. Then the tail $\sigma$-algebra $\mathcal{T}$ is trivial, i.e. $\mathbb{P}(A) \in \{0,1\}$ for every $A \in \mathcal{T}$, and every $\mathcal{T}$-measurable random variable $Y$ is almost surely constant.

Proof. $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ is generated by the $\pi$-system of events $A = \{X_1 \le x_1, \ldots, X_n \le x_n\}$, and $\mathcal{T}_n$ by the $\pi$-system of events $B = \{X_{n+1} \le x_{n+1}, \ldots, X_{n+k} \le x_{n+k}\}$, $k \in \mathbb{N}$.
Since $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$ for all such $A$ and $B$ by independence, $\mathcal{F}_n$ and $\mathcal{T}_n$ are independent by Theorem 1.11 for all $n \in \mathbb{N}$. Hence $\mathcal{F}_n$ and $\mathcal{T}$ are independent, since $\mathcal{T} \subseteq \mathcal{T}_n$. Since $\bigcup_n \mathcal{F}_n$ is a $\pi$-system generating the $\sigma$-algebra $\mathcal{F}_\infty = \sigma(X_n : n \in \mathbb{N})$, $\mathcal{F}_\infty$ and $\mathcal{T}$ are independent, again by Theorem 1.11. But $\mathcal{T} \subseteq \mathcal{F}_\infty$, and thus every $A \in \mathcal{T}$ is independent of itself, i.e.
\[ \mathbb{P}(A) = \mathbb{P}(A \cap A) = \mathbb{P}(A)\,\mathbb{P}(A) \quad\Rightarrow\quad \mathbb{P}(A) \in \{0,1\} . \]
Let $Y$ be a $\mathcal{T}$-measurable random variable. Then $F_Y(y) = \mathbb{P}(Y \le y)$ takes values in $\{0,1\}$, so $\mathbb{P}(Y = c) = 1$ for $c = \inf\{ y \in \mathbb{R} : F_Y(y) = 1 \}$. □
Remark. Kolmogorov's 0-1 law involves the $\sigma$-algebras generated by random variables, rather than the random variables themselves. Thus it can be formulated without using r.v.s:
Let $(\mathcal{F}_n)_{n\in\mathbb{N}}$ be a sequence of independent $\sigma$-algebras in $\mathcal{A}$, and let $A$ be a tail event, i.e.
\[ A \in \mathcal{T} , \quad\text{where } \mathcal{T} = \bigcap_{n\in\mathbb{N}} \mathcal{T}_n \text{ with } \mathcal{T}_n = \sigma\Big( \bigcup_{m\ge n} \mathcal{F}_m \Big) . \]
Then $\mathbb{P}(A) \in \{0,1\}$.
2.3 Convergence of measurable functions

Definition 2.7. We say that $A \in \mathcal{E}$ holds almost everywhere (short: a.e.) if $\mu(A^c) = 0$. If $\mu$ is a probability measure ($\mu(E) = 1$) one uses almost surely (short: a.s.) instead.
Let $f, f_1, f_2, \ldots : E \to \mathbb{R}$ be measurable functions. Say that, for $n \to \infty$,
(i) $f_n \to f$ everywhere or pointwise if $f_n(x) \to f(x)$ for all $x \in E$,
(ii) $f_n \to f$ a.e. (a.s.) if $\mu\big( \{ x \in E : f_n(x) \not\to f(x) \} \big) = \mu(f_n \not\to f) = 0$,
(iii) $f_n \to f$ in measure (in probability, if $\mu$ is a probability measure) if $\mu\big( |f_n - f| > \epsilon \big) \to 0$ for all $\epsilon > 0$.

Theorem 2.10. Let $f, f_1, f_2, \ldots : E \to \mathbb{R}$ be measurable.
(i) If $\mu(E) < \infty$ and $f_n \to f$ a.e., then $f_n \to f$ in measure.
(ii) If $f_n \to f$ in measure, then $f_{n_k} \to f$ a.e. for some subsequence $(n_k)_{k\in\mathbb{N}}$.

Proof of (ii). Set $g_n := f_n - f$ and choose $n_k$ such that $\mu\big( |g_{n_k}| > \tfrac1k \big) \le 2^{-k}$. Then $\sum_k \mu\big( |g_{n_k}| > \tfrac1k \big) < \infty$, so by the first Borel-Cantelli lemma $\mu\big( |g_{n_k}| > \tfrac1k \text{ i.o.} \big) = 0$, so $g_{n_k} \to 0$ a.e.
Proposition 2.11. Let $X, X_1, X_2, \ldots$ be random variables. Then
\[ X_n \to X \text{ in probability} \quad\Rightarrow\quad X_n \to X \text{ in distribution,} \]
and for a constant $c \in \mathbb{R}$,
\[ X_n \to c \text{ in distribution} \quad\Leftrightarrow\quad X_n \to c \text{ in probability.} \]

Proof. The first statement is proved on handout 2; the second follows directly from
\[ \mathbb{P}\big( |X_n - c| > \epsilon \big) = \mathbb{P}(X_n > c + \epsilon) + \mathbb{P}(X_n < c - \epsilon) \to 0 \quad\text{for all } \epsilon > 0 . \quad\square \]
Example. Let $(X_n)_{n\in\mathbb{N}}$ be independent with $\mathbb{P}(X_n = 1) = \tfrac1n$ and $\mathbb{P}(X_n = 0) = 1 - \tfrac1n$. Then $X_n \to 0$ in probability, but since $\sum_n \mathbb{P}(X_n = 1) = \infty$, the second Borel-Cantelli lemma gives $\mathbb{P}(X_n = 1 \text{ i.o.}) = 1$, so $X_n \not\to 0$ a.s.
3 Integration

3.1 Definition and basic properties

Every measurable $f : E \to [0,\infty]$ can be approximated from below by simple functions: set
\[ f_n := 2^{-n} \lfloor 2^n f \rfloor \wedge n = \sum_{k=0}^{n2^n - 1} 2^{-n} k\,1_{A_{k,n}} + n\,1_{\{f \ge n\}} , \quad\text{where}\quad A_{k,n} = f^{-1}\big( \big( 2^{-n}k,\ 2^{-n}(k+1) \big] \big) \ \text{ for } k < n2^n . \]
Since $f$ is measurable, so are the sets $A_{k,n}$, and thus $f_n \in S(\mathcal{E})$ for all $n \in \mathbb{N}$.
From the first representation it follows immediately that $f_{n+1}(x) \ge f_n(x)$ for all $x \in E$, and that $f_n(x) \ge f(x) - 2^{-n}$ for $n \ge f(x)$, or $f_n(x) = n \to \infty$ for $f(x) = \infty$. Thus $f_n \nearrow f$. □
This motivates the following definition.
Definition 3.1. Let $f : E \to [0,\infty]$ be an $\mathcal{E}/\bar{\mathcal{B}}$-measurable function. We define the integral of $f$, written as
\[ \mu(f) = \int_E f\,d\mu = \int f\,d\mu = \int_E f(x)\,\mu(dx) , \]
by
\[ \int_E f\,d\mu := \sup\Big\{ \int_E g\,d\mu : g \in S(\mathcal{E}),\ 0 \le g \le f \Big\} , \quad\text{where}\quad \int_E g\,d\mu := \sum_{k=1}^{n} c_k\,\mu(A_k) \ \text{ for } g = \sum_{k=1}^{n} c_k\,1_{A_k} . \]
We adopt $0 \cdot \infty = \infty \cdot 0 = 0$.
Remarks. (i) For simple functions the integral is linear,
\[ \int_E (c_1 f + c_2 g)\,d\mu = c_1 \int_E f\,d\mu + c_2 \int_E g\,d\mu \quad\text{for all } c_1, c_2 \in \mathbb{R} . \]
Lemma 3.2. Let $f : E \to [0,\infty]$ be measurable and $f_n \in S(\mathcal{E})$ with $f_n \nearrow f$, e.g. the approximations $f_n = 2^{-n}\lfloor 2^n f \rfloor \wedge n$ from above. Then $\int_E f_n\,d\mu \nearrow \int_E f\,d\mu$.

Proof. $\int_E f_n\,d\mu \le \int_E f\,d\mu$ for all $n \in \mathbb{N}$ by definition of $\int_E f\,d\mu$. It remains to show that for any $\mathcal{E}$-simple $g = \sum_{k=1}^{n} a_k\,1_{A_k} \le f$ (with standard representation and $a_k \neq 0$)
\[ \lim_{n\to\infty} \int_E f_n\,d\mu \ge \int_E g\,d\mu . \]
Choose $\epsilon > 0$ and set $B_n := \{ x \in E : f_n(x) \ge g(x) - \epsilon \}$. Then $B_n \nearrow E$ and for any $A \in \mathcal{E}$: $\mu(B_n \cap A) \nearrow \mu(A)$.
Case (i): $\int_E g\,d\mu = \infty$ $\Rightarrow$ $\mu(A_r) = \infty$ for some $r \in \{1, \ldots, n\}$ with $a_r > 0$. Then
\[ \int_E f_n\,d\mu \ge \int_E f_n\,1_{B_n \cap A_r}\,d\mu \ge \int_E (g - \epsilon)\,1_{B_n \cap A_r}\,d\mu = (a_r - \epsilon)\,\mu(B_n \cap A_r) \to \infty \]
as $n \to \infty$, provided $\epsilon < a_r$.
Case (ii): $\int_E g\,d\mu < \infty$ $\Rightarrow$ for $A = \bigcup_{k=1}^{n} A_k$ it is $\mu(A) < \infty$. Then
\[ \int_E f_n\,d\mu \ge \int_E f_n\,1_{B_n \cap A}\,d\mu \ge \int_E (g - \epsilon)\,1_{B_n \cap A}\,d\mu = \int_E g\,1_{B_n \cap A}\,d\mu - \epsilon\,\mu(B_n \cap A) \to \int_E g\,d\mu - \epsilon\,\mu(A) \]
as $n \to \infty$. This is true for arbitrarily small $\epsilon$, and thus $\lim_{n\to\infty} \int_E f_n\,d\mu \ge \int_E g\,d\mu$. □
For a measurable function $f : E \to \bar{\mathbb{R}}$ write $f = f^+ - f^-$ with $f^\pm = (\pm f) \vee 0 \ge 0$. $f$ is called ($\mu$-)integrable if $\int_E |f|\,d\mu < \infty$, and then its integral is defined as
\[ \int_E f\,d\mu := \int_E f^+\,d\mu - \int_E f^-\,d\mu . \]
For random variables $X : \Omega \to \mathbb{R}$ the integral $\mathbb{E}(X) := \int_\Omega X\,d\mathbb{P}$ is called the expectation of $X$.
Remark. The integral can be well defined even if $f$ is not integrable, namely if either $\int_E f^+\,d\mu = \infty$ or $\int_E f^-\,d\mu = \infty$ (but not both); it then takes a value $\pm\infty$. In particular, a measurable function $f : E \to [0,\infty]$ is integrable if and only if $\int_E f\,d\mu < \infty$.
Theorem 3.3. Basic properties of integration
Let $f, g : E \to \mathbb{R}$ be integrable functions on $(E, \mathcal{E}, \mu)$.
(i) Linearity: $f + g$ and, for any $c \in \mathbb{R}$, $c f$ are integrable with
\[ \int_E (f + g)\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu , \qquad \int_E (c f)\,d\mu = c \int_E f\,d\mu . \]
(ii) Monotonicity: $f \le g$ $\Rightarrow$ $\int_E f\,d\mu \le \int_E g\,d\mu$.
(iii) For $f \ge 0$: $\int_E f\,d\mu = 0$ $\Leftrightarrow$ $f = 0$ a.e.; in general, $f = 0$ a.e. $\Rightarrow$ $\int_E f\,d\mu = 0$.
(iv) $|f|$ is integrable, and $\big| \int_E f\,d\mu \big| \le \int_E |f|\,d\mu$.
Proof (of additivity in (i)). For $f, g : E \to \mathbb{R}$ we have
\[ (f+g)^+ - (f+g)^- = f + g = f^+ - f^- + g^+ - g^- \]
and thus
\[ (f+g)^+ + f^- + g^- = (f+g)^- + f^+ + g^+ . \]
Since each of the terms is non-negative, we also have
\[ \int_E (f+g)^+\,d\mu + \int_E f^-\,d\mu + \int_E g^-\,d\mu = \int_E (f+g)^-\,d\mu + \int_E f^+\,d\mu + \int_E g^+\,d\mu , \]
and rearranging gives $\int_E (f+g)\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$. □
Theorem. (Integration with respect to an image measure)
Let $\nu = \mu \circ f^{-1}$ be the image measure of Definition 2.3. Then, for every measurable $g : F \to \bar{\mathbb{R}}$,
\[ \int_F g\,d\nu = \int_E (g \circ f)\,d\mu , \]
whenever either side is defined.
Remark. In particular, for random variables $X$ with distribution $\mu_X$ this leads to the useful formula $\mathbb{E}\big( g(X) \big) = \int_{\mathbb{R}} g(x)\,\mu_X(dx)$.

Proof. For $g = 1_A$, $A \in \mathcal{F}$, the identity $\nu(A) = \mu\big( f^{-1}(A) \big)$ is the definition of $\nu$.
The identity extends to all $\mathcal{F}$-simple functions by linearity of integration, then to all measurable $g : F \to [0,\infty]$ with Lemma 3.2, using the approximations $g_n = 2^{-n}\lfloor 2^n g \rfloor \wedge n$, and finally to all integrable $g = g^+ - g^- : F \to \mathbb{R}$, again by linearity. □
Example. Let $\mu = \sum_{n\in\mathbb{N}} \delta_n$ be the counting measure on $\big( \mathbb{N}, \mathcal{P}(\mathbb{N}) \big)$ and $f : \mathbb{N} \to \mathbb{R}$. By our definition,
\[ \int_{\mathbb{N}} f\,d\mu = \lim_{n\to\infty} \int_{\mathbb{N}} f\,1_{\{1,\ldots,n\}}\,d\mu = \lim_{n\to\infty} \sum_{k=1}^{n} f(k)\,\mu(\{k\}) = \sum_{k=1}^{\infty} f(k) , \]
and
\[ f \text{ integrable} \quad\Leftrightarrow\quad |f| \text{ integrable} \quad\Leftrightarrow\quad \sum_{k=1}^{\infty} |f(k)| < \infty . \]
So integrability corresponds to absolute convergence of the series; e.g. $f(k) = (-1)^k/k$ with $\sum_{k=1}^{\infty} (-1)^k/k = -\ln 2$ is not integrable, since $\sum_k 1/k = \infty$.
3.2 Integrals and limits

Theorem. Monotone convergence
Let $f, f_1, f_2, \ldots : E \to [0,\infty]$ be measurable with $f_n \nearrow f$ (and later $f_n \ge 0$, $f_n \nearrow f$ only a.e.). Then
\[ \int_E f_n\,d\mu \nearrow \int_E f\,d\mu \quad\text{as } n \to \infty . \]

Proof. Set $g_n := 2^{-n}\lfloor 2^n f_n \rfloor \wedge n \in S(\mathcal{E})$, so that $g_n \le f_n \le f$ and, for each $m \le n$, $g_n \ge 2^{-n}\lfloor 2^n f_m \rfloor \wedge n$. The sequence $(g_n)$ is non-decreasing; taking the limit $n \to \infty$ in the last inequality gives $g := \lim_n g_n \ge f_m$ for each $m$, and taking the limit $m \to \infty$ gives $g = f$. Hence, by the convergence result for simple approximations (Lemma 3.2),
\[ \int_E f\,d\mu = \lim_{n\to\infty} \int_E g_n\,d\mu \le \liminf_{n\to\infty} \int_E f_n\,d\mu \le \limsup_{n\to\infty} \int_E f_n\,d\mu \le \int_E f\,d\mu , \]
using $g_n \le f_n \le f$ and monotonicity, and the statement follows.
Now, let $f_n \ge 0$ a.e. and $f_n \nearrow f$ a.e. Since $N = \bigcup_n \big( \{f_n < 0\} \cup \{f_n > f_{n+1}\} \big) \cup \{f_n \not\to f\}$ is a countable union of null sets, $\mu(N) = 0$. Then use monotone convergence on $N^c$ to get
\[ \int_E f_n\,d\mu = \int_{N^c} f_n\,d\mu \nearrow \int_{N^c} f\,d\mu = \int_E f\,d\mu . \quad\square \]
Lemma 3.6. Fatou's lemma
Let $f_n : E \to \mathbb{R}$ be measurable with $f_n \ge 0$ a.e. for all $n \in \mathbb{N}$. Then
\[ \int_E \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_E f_n\,d\mu . \]

Proof. As previously, $N = \bigcup_n \{f_n < 0\}$ is a null set, so suppose $f_n \ge 0$ pointwise w.l.o.g. Let $g_n := \inf_{k\ge n} f_k$. Then the $g_n$ are measurable by Proposition 2.6 and $g_n \nearrow \liminf_n f_n$. So, since $f_k \ge g_n$ for all $k \ge n$ and by monotone convergence,
\[ \int_E \liminf_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty} \int_E g_n\,d\mu \quad\text{and}\quad \int_E g_n\,d\mu \le \inf_{k\ge n} \int_E f_k\,d\mu , \]
which proves the statement, taking $n \to \infty$ on the right-hand side. □
Theorem 3.7. Dominated convergence
Let $f, f_n$ be measurable with $f_n \to f$ a.e., and let $g, g_n$ be integrable with $g_n \to g$ a.e., $\int_E g_n\,d\mu \to \int_E g\,d\mu$ and $|f_n| \le g_n$ a.e. for all $n \in \mathbb{N}$ (the most common case being the fixed dominating function $g_n \equiv g$). Then $f$ and the $f_n$ are integrable, and $\int_E f_n\,d\mu \to \int_E f\,d\mu$ as $n \to \infty$.

Proof. $f$ and the $f_n$ are integrable since $|f| \le g$ and $|f_n| \le g_n$ a.e., with $\int_E |f_n|\,d\mu \le \int_E g_n\,d\mu < \infty$.
As before, $N = \bigcup_n \{|f_n| > g_n\} \cup \{f_n \not\to f\} \cup \{g_n \not\to g\}$ is a null set which does not affect the integrals, so we assume pointwise validity of the assumptions w.l.o.g.
We have $0 \le g_n \pm f_n \to g \pm f$, so $\liminf_n (g_n \pm f_n) = g \pm f$. By Fatou's lemma,
\[ \int_E g\,d\mu + \int_E f\,d\mu = \int_E \liminf_{n\to\infty} (g_n + f_n)\,d\mu \le \liminf_{n\to\infty} \int_E (g_n + f_n)\,d\mu = \int_E g\,d\mu + \liminf_{n\to\infty} \int_E f_n\,d\mu , \]
and analogously
\[ \int_E g\,d\mu - \int_E f\,d\mu \le \int_E g\,d\mu - \limsup_{n\to\infty} \int_E f_n\,d\mu . \]
Since $\int_E g\,d\mu < \infty$, it follows that
\[ \limsup_{n\to\infty} \int_E f_n\,d\mu \le \int_E f\,d\mu \le \liminf_{n\to\infty} \int_E f_n\,d\mu , \]
proving that $\int_E f_n\,d\mu \to \int_E f\,d\mu$ as $n \to \infty$. □
Corollary 3.8. Bounded convergence
Let $\mu(E) < \infty$ and let $f_n$ be measurable with $f_n \to f$ a.e. and $|f_n| \le C < \infty$ a.e. for all $n$. Then $\int_E f_n\,d\mu \to \int_E f\,d\mu$; this follows from Theorem 3.7 with the constant dominating function $g \equiv C$, $\int_E g\,d\mu = C\,\mu(E) < \infty$.

The following example shows that the inequality in Lemma 3.6 can be strict and that domination by an integrable function in Theorem 3.7 is crucial.

Example. On $(\mathbb{R}, \mathcal{B}, \lambda)$ with Lebesgue measure take $f_n = n^2\,1_{(0,1/n)}$. Then $f_n \to f \equiv 0$ pointwise, but $\int f\,d\lambda = 0 < \int f_n\,d\lambda = n \to \infty$.
Corollary. (Integration of series)
(i) Let $(f_n)_{n\in\mathbb{N}}$, $f_n : E \to [0,\infty]$, be measurable. Then
\[ \int_E \sum_{n=1}^{\infty} f_n\,d\mu = \sum_{n=1}^{\infty} \int_E f_n\,d\mu . \]
(ii) If $\sum_n f_n$ converges and $\big| \sum_{k=1}^{n} f_k \big| \le g$ for all $n$, where $g$ is integrable, then $\sum_{n=1}^{\infty} f_n$ and the $f_n$ are integrable, and
\[ \int_E \sum_{n=1}^{\infty} f_n\,d\mu = \sum_{n=1}^{\infty} \int_E f_n\,d\mu . \]
Definition 3.3. Let $\mu, \mu_1, \mu_2, \ldots$ be measures on $(\mathbb{R}, \mathcal{B})$. Say that $\mu_n$ converges weakly to $\mu$, written $\mu_n \Rightarrow \mu$, if
\[ \int_{\mathbb{R}} f\,d\mu_n \to \int_{\mathbb{R}} f\,d\mu \quad\text{for all } f \in C_b(\mathbb{R}, \mathbb{R}) , \text{ i.e. } f : \mathbb{R} \to \mathbb{R} \text{ bounded and continuous.} \]
Theorem 3.9. Let $X, X_1, X_2, \ldots$ be random variables with distributions $\mu, \mu_1, \mu_2, \ldots$. Then $X_n \overset{D}{\to} X$ if and only if $\mu_n \Rightarrow \mu$.

Proof. Suppose $X_n \overset{D}{\to} X$. Then by the Skorohod theorem 2.12 there exist $Y \sim X$ and $Y_n \sim X_n$ on a common probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with $Y_n \to Y$ a.e., such that $f(Y_n) \to f(Y)$ a.e. since $f \in C_b(\mathbb{R}, \mathbb{R})$ (see also problem 2.6). Thus, by bounded convergence,
\[ \int_{\mathbb{R}} f\,d\mu_n = \int_\Omega f(Y_n)\,d\mathbb{P} \to \int_\Omega f(Y)\,d\mathbb{P} = \int_{\mathbb{R}} f\,d\mu , \quad\text{so } \mu_n \Rightarrow \mu . \]
Conversely, suppose $\mu_n \Rightarrow \mu$ and let $y \in \mathbb{R}$ with $\mu(\{y\}) = 0$. For $\delta > 0$ define $f_\delta, g_\delta \in C_b(\mathbb{R}, \mathbb{R})$ by
\[ f_\delta(x) = \begin{cases} 1 , & x \le y \\ 1 - (x-y)/\delta , & x \in (y, y+\delta) \\ 0 , & \text{otherwise} \end{cases} \qquad g_\delta(x) = \begin{cases} 1 + (x-y)/\delta , & x \in (y-\delta, y] \\ 1 - (x-y)/\delta , & x \in (y, y+\delta) \\ 0 , & \text{otherwise} \end{cases} \]
such that $\big| 1_{(-\infty,y]} - f_\delta \big| \le g_\delta$, and thus $\big| \int_{\mathbb{R}} (1_{(-\infty,y]} - f_\delta)\,d\nu \big| \le \int_{\mathbb{R}} g_\delta\,d\nu$ for every measure $\nu$. Hence
\[ \big| F_{X_n}(y) - F_X(y) \big| \le \Big| \int_{\mathbb{R}} f_\delta\,d\mu_n - \int_{\mathbb{R}} f_\delta\,d\mu \Big| + \int_{\mathbb{R}} g_\delta\,d\mu_n + \int_{\mathbb{R}} g_\delta\,d\mu \to 2 \int_{\mathbb{R}} g_\delta\,d\mu \quad\text{as } n \to \infty , \]
since $f_\delta, g_\delta \in C_b(\mathbb{R}, \mathbb{R})$. Now, $\int_{\mathbb{R}} g_\delta\,d\mu \le \mu\big( (y-\delta, y+\delta) \big) \to 0$ as $\delta \to 0$, since $\mu(\{y\}) = 0$, so $X_n \overset{D}{\to} X$. □
3.3 Integration in R and differentiation

Theorem. (Differentiation under the integral sign)
Let $U \subseteq \mathbb{R}$ be open and $f : U \times E \to \mathbb{R}$ be such that $x \mapsto f(t,x)$ is integrable for all $t \in U$, $t \mapsto f(t,x)$ is differentiable for all $x \in E$, and, for some integrable $g : E \to \mathbb{R}$,
\[ \Big| \frac{\partial f}{\partial t}(t,x) \Big| \le g(x) \quad\text{for all } t \in U,\ x \in E . \]
Then $x \mapsto \frac{\partial f}{\partial t}(t,x)$ is integrable for all $t$, the function $F : U \to \mathbb{R}$ defined by
\[ F(t) = \int_E f(t,x)\,\mu(dx) \]
is differentiable, and
\[ \frac{d}{dt} F(t) = \int_E \frac{\partial f}{\partial t}(t,x)\,\mu(dx) . \]

Proof. Take any sequence $h_n \to 0$ and set
\[ g_n(t,x) := \frac{f(t + h_n, x) - f(t,x)}{h_n} - \frac{\partial f}{\partial t}(t,x) . \]
Then for all $x \in E$, $t \in U$, $g_n(t,x) \to 0$ and $|g_n(t,x)| \le 2\,g(x)$ for all $n \in \mathbb{N}$ by the MVT.
$\frac{\partial f}{\partial t}(t, \cdot)$ is measurable as the limit of measurable functions, and integrable since $\big| \frac{\partial f}{\partial t} \big| \le g$. Then by dominated convergence, as $n \to \infty$,
\[ \frac{F(t + h_n) - F(t)}{h_n} - \int_E \frac{\partial f}{\partial t}(t,x)\,\mu(dx) = \int_E g_n(t,x)\,\mu(dx) \to 0 . \quad\square \]
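A worked instance (added for illustration): on $E = (0,\infty)$ with Lebesgue measure and $U = (a,b) \subset (0,\infty)$, take $f(t,x) = e^{-tx}$. Then $F(t) = \int_0^\infty e^{-tx}\,dx = 1/t$, and $\big| \frac{\partial f}{\partial t}(t,x) \big| = x\,e^{-tx} \le x\,e^{-ax} =: g(x)$ is integrable, so the theorem gives
\[ F'(t) = -\int_0^\infty x\,e^{-tx}\,dx = -\frac{1}{t^2} , \]
in agreement with differentiating $1/t$ directly.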
Remarks. (i) The integral on $\mathbb{R}$ w.r.t. Lebesgue measure is called the Lebesgue integral. We write
\[ \int_{\mathbb{R}} f\,d\lambda = \int_{\mathbb{R}} f(x)\,dx \quad\text{and}\quad \int_{\mathbb{R}} f\,1_{(a,b]}\,d\lambda = \int_a^b f(x)\,dx . \]

Theorem 3.11. Fundamental theorem of calculus
(i) Let $f : [a,b] \to \mathbb{R}$ be continuous and set $F_a(t) = \int_a^t f(x)\,dx$. Then $F_a$ is differentiable with $F_a' = f$.
(ii) Let $F : [a,b] \to \mathbb{R}$ be differentiable with continuous derivative $f = F'$. Then
\[ F(b) - F(a) = \int_a^b f(x)\,dx . \]
Proof. (i) Fix $t \in [a,b)$. Given $\epsilon > 0$ there is $\delta > 0$ such that $|x - t| < \delta$ implies $|f(x) - f(t)| < \epsilon$. So for $0 < h < \delta$,
\[ \Big| \frac{F_a(t+h) - F_a(t)}{h} - f(t) \Big| = \Big| \frac1h \int_t^{t+h} \big( f(x) - f(t) \big)\,dx \Big| \le \frac1h \int_t^{t+h} \epsilon\,dx = \epsilon . \]
Analogously for negative $h$ and $t \in (a,b]$; thus $F_a' = f$.
(ii) $(F - F_a)'(t) = 0$ for all $t \in (a,b)$, so by the MVT $F - F_a$ is constant, and
\[ F(b) - F(a) = F_a(b) - F_a(a) = \int_a^b f(x)\,dx . \quad\square \]

Corollary. (i) Integration by parts: for $u, v \in C^1\big( [a,b], \mathbb{R} \big)$,
\[ \int_a^b u(x)\,v'(x)\,dx = u(b)\,v(b) - u(a)\,v(a) - \int_a^b u'(x)\,v(x)\,dx . \]
(ii) Substitution: let $\phi \in C^1\big( [a,b], \mathbb{R} \big)$ be strictly increasing. Then
\[ \int_{\phi(a)}^{\phi(b)} f(y)\,dy = \int_a^b f\big( \phi(x) \big)\,\phi'(x)\,dx \quad\text{for all } f \in C\big( [\phi(a), \phi(b)], \mathbb{R} \big) . \]
Lemma 3.13. Let $(E, \mathcal{E}, \mu)$ be a measure space. For every integrable $f : E \to [0,\infty)$,
\[ \nu : A \mapsto \int_A f\,d\mu \]
is a measure on $(E, \mathcal{E})$, said to have density $f$ with respect to $\mu$, and
\[ \int_E g\,d\nu = \int_E f g\,d\mu \quad\text{for all integrable } g : E \to \mathbb{R} . \]
Let $\mu$ be a Radon measure on $(\mathbb{R}, \mathcal{B})$ with distribution function $F \in C^1(\mathbb{R}, \mathbb{R})$. Then $\mu$ has density $f = F'$ with respect to Lebesgue measure.

Proof. For the first part see problem 2.15(a).
With Theorems 1.10 and 3.11,
\[ \mu\big( (a,b] \big) = F(b) - F(a) = \int_a^b f(x)\,dx . \]
So $\mu$ coincides with $f\lambda$ on the $\pi$-system $\mathcal{I} \cup \{\emptyset\} = \{(a,b] : a < b\} \cup \{\emptyset\}$ that generates $\mathcal{B}$. Thus, by uniqueness of extension, $\mu = f\lambda$ on $\mathcal{B}$ and $\mu$ has density $f$. □
3.4 Product measure and Fubini's theorem

Let $(E_1, \mathcal{E}_1, \mu_1)$ and $(E_2, \mathcal{E}_2, \mu_2)$ be finite measure spaces and let $\mathcal{E} = \sigma(\mathcal{A})$ be the product $\sigma$-algebra on $E = E_1 \times E_2$, generated by the $\pi$-system $\mathcal{A} = \{ A_1 \times A_2 : A_i \in \mathcal{E}_i \}$.

Lemma 3.14. Let $f : E \to \mathbb{R}$ be $\mathcal{E}$-measurable and bounded. Then
(i) for fixed $x_1 \in E_1$, $x_2 \mapsto f(x_1, x_2)$ is $\mathcal{E}_2$-measurable, and
(ii) $f_1 : x_1 \mapsto \int_{E_2} f(x_1, x_2)\,\mu_2(dx_2)$ is $\mathcal{E}_1$-measurable and bounded.

Proof. For fixed $x_1 \in E_1$ consider
\[ \mathcal{D} = \Big\{ A \in \mathcal{E} : x_1 \mapsto \int_{E_2} 1_A(x_1, x_2)\,\mu_2(dx_2) \text{ is } \mathcal{E}_1\text{-measurable} \Big\} , \]
which can be checked to be a d-system. Since $f_1(x_1) = 1_{A_1}(x_1)\,\mu_2(A_2)$ for $A = A_1 \times A_2$, $\mathcal{A} \subseteq \mathcal{D}$ and thus $\mathcal{E} = \sigma(\mathcal{A}) = \mathcal{D}$ with Dynkin's lemma (1.5).
By linearity of integration the statement also holds for non-negative $\mathcal{E}$-simple functions, and by monotone convergence for all bounded, measurable $f$, using
\[ f_1(x_1) := \int_{E_2} f^+(x_1, x_2)\,\mu_2(dx_2) - \int_{E_2} f^-(x_1, x_2)\,\mu_2(dx_2) . \quad\square \]

Theorem 3.15. Product measure
There exists a unique measure $\mu = \mu_1 \otimes \mu_2$ on $(E, \mathcal{E})$ with $\mu(A_1 \times A_2) = \mu_1(A_1)\,\mu_2(A_2)$ for all $A_1 \in \mathcal{E}_1$, $A_2 \in \mathcal{E}_2$; it is given by
\[ \mu(A) = \int_{E_1} \Big( \int_{E_2} 1_A(x_1, x_2)\,\mu_2(dx_2) \Big)\,\mu_1(dx_1) , \quad A \in \mathcal{E} . \]

Proof. With Lemma 3.14, $\mu$ is a well-defined function of $A$. Using monotone convergence, $\mu$ can be seen to be countably additive and is thus a measure.
Since $1_{A_1 \times A_2} = 1_{A_1}\,1_{A_2}$, the above property is fulfilled for all $A_1 \in \mathcal{E}_1$ and $A_2 \in \mathcal{E}_2$.
Since $\mathcal{A} = \{ A_1 \times A_2 : A_i \in \mathcal{E}_i \}$ is a $\pi$-system generating $\mathcal{E}$ and $\mu(E) < \infty$, $\mu$ is uniquely determined by its values on $\mathcal{A}$, following Theorem 1.6 (uniqueness of extension). □
Theorem. Fubini's theorem
(i) Let $f : E \to [0,\infty]$ be $\mathcal{E}$-measurable. Then
\[ \int_E f\,d\mu = \int_{E_1} \Big( \int_{E_2} f(x_1, x_2)\,\mu_2(dx_2) \Big)\,\mu_1(dx_1) = \int_{E_2} \Big( \int_{E_1} f(x_1, x_2)\,\mu_1(dx_1) \Big)\,\mu_2(dx_2) . \]
(ii) The same formula holds for every $\mu$-integrable $f : E \to \mathbb{R}$; in particular, $f(x_1, \cdot)$ is then $\mu_2$-integrable for $\mu_1$-a.e. $x_1$ (and vice versa), and
\[ \int_E |f|\,d\mu = \int_{E_1} \Big( \int_{E_2} |f(x_1, x_2)|\,\mu_2(dx_2) \Big)\,\mu_1(dx_1) < \infty . \]

Proof. Part (i) extends from Theorem 3.15 via simple functions and monotone convergence. For (ii), applying (i) to $|f|$ gives $\int_{E_2} |f(x_1, x_2)|\,\mu_2(dx_2) < \infty$ for $\mu_1$-a.e. $x_1$; the same follows for $f(\cdot, x_2)$, and finally the formula in (i) holds for $f^+$ and $f^-$ and thus for $f = f^+ - f^-$ by linearity. □
Remarks. (i) Product measures and Fubini's theorem can be extended to $\sigma$-finite measure spaces, i.e. for all $A \in \mathcal{E}_1$ there exist $A_n \in \mathcal{E}_1$, $n \in \mathbb{N}$, with $\mu_1(A_n) < \infty$ for all $n$ and $A = \bigcup_n A_n$.
(ii) However, without $\sigma$-finiteness Fubini's theorem does in general not hold. Consider e.g. the measure $\mu(\emptyset) = 0$, $\mu(A) = \infty$ for $A \neq \emptyset$ on $(\mathbb{R}, \mathcal{B})$. This is not $\sigma$-finite, and with Lebesgue measure $\lambda$ on $(\mathbb{R}, \mathcal{B})$ we have
\[ \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} 1_{\mathbb{Q}}(x+y)\,\mu(dy) \Big)\,\lambda(dx) = \infty \neq 0 = \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} 1_{\mathbb{Q}}(x+y)\,\lambda(dx) \Big)\,\mu(dy) . \]
(iii) Products can be iterated without specifying the order, e.g. $\big( \mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \lambda^d \big)$.
Example. $I = \int_{\mathbb{R}} e^{-x^2}\,dx = \sqrt{\pi}$, since by Fubini's theorem and polar coordinates
\[ I^2 = \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy = \int_{\theta=0}^{2\pi} \int_{r=0}^{\infty} e^{-r^2}\,r\,dr\,d\theta = 2\pi \Big[ -\tfrac12\,e^{-r^2} \Big]_0^{\infty} = \pi . \]
4 Lp spaces

4.1 Norms and inequalities

Definition 4.1. For $1 \le p < \infty$ and measurable $f : E \to \mathbb{R}$ let
\[ \|f\|_p := \Big( \int_E |f|^p\,d\mu \Big)^{1/p} , \qquad \|f\|_\infty := \inf\big\{ C \ge 0 : |f| \le C \text{ a.e.} \big\} . \]
$L^p = L^p(E, \mathcal{E}, \mu)$ denotes the set of measurable $f$ with $\|f\|_p < \infty$, and $f_n \to f$ in $L^p$ if $\|f - f_n\|_p \to 0$ as $n \to \infty$.

Remarks. (i) At the end of this section we will see in what sense $\|\cdot\|_p$ is a norm on $L^p$.
(ii) For $f \in C(\mathbb{R})$, $\|f\|_\infty = \sup_{x\in\mathbb{R}} |f(x)|$.
(iii) For $1 \le p < \infty$: $\|f\|_p \le \mu(E)^{1/p}\,\|f\|_\infty$.
(iv) Let $f \in L^p$, $1 \le p < \infty$. Then $\mu\big( |f| \ge \epsilon \big) \le (\|f\|_p/\epsilon)^p$ for all $\epsilon > 0$ by Chebyshev's inequality.
For $f \in L^p(\mathbb{R})$ this includes that $f(x)$ essentially tends to zero as $|x| \to \infty$, in the sense $\big\| f\,1_{\{|x|\ge y\}} \big\|_p \to 0$ as $y \to \infty$. For random variables $X \in L^p$ the relation $\mathbb{P}\big( |X| \ge \epsilon \big) = O(\epsilon^{-p})$ as $\epsilon \to \infty$ is called a tail estimate (see also problem 3.6).
Definition 4.2. A function $f : \mathbb{R} \to \mathbb{R}$ is convex if, for all $x, y \in \mathbb{R}$ and $t \in [0,1]$,
\[ f\big( t x + (1-t) y \big) \le t f(x) + (1-t) f(y) . \]
Remark. Let $f : \mathbb{R} \to \mathbb{R}$ be convex. Then $f$ is continuous (in particular measurable) and
\[ \forall x_0 \in \mathbb{R}\ \exists a \in \mathbb{R} : \ f(x) \ge a\,(x - x_0) + f(x_0) \quad\text{for all } x \in \mathbb{R} . \]

Theorem 4.2. Jensen's inequality
Let $X$ be an integrable r.v. and $f : \mathbb{R} \to \mathbb{R}$ convex. Then
\[ \mathbb{E}\big( f(X) \big) \ge f\big( \mathbb{E}(X) \big) . \]
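Two standard consequences (added for illustration): with $f(x) = x^2$, Jensen gives $\mathbb{E}(X^2) \ge \big( \mathbb{E}X \big)^2$, i.e. $\mathrm{var}(X) \ge 0$; and with $f(x) = |x|^{q/p}$ applied to $|X|^p$, one gets the monotonicity $\|X\|_p \le \|X\|_q$ for $1 \le p \le q$ on a probability space.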
Theorem 4.3. Holder's inequality
Let $p, q \in (1,\infty)$ with $\frac1p + \frac1q = 1$. Then for measurable $f, g : E \to \mathbb{R}$,
\[ \int_E |f g|\,d\mu \le \|f\|_p\,\|g\|_q , \]
with equality (for $0 < \|f\|_p, \|g\|_q < \infty$) if and only if
\[ \frac{|f(x)|^p}{\|f\|_p^p} = \frac{|g(x)|^q}{\|g\|_q^q} \quad\text{a.e.} \]

Proof. By convexity of the exponential function,
\[ e^{s/p + t/q} \le \frac{e^s}{p} + \frac{e^t}{q} \quad\text{and thus}\quad a\,b \le \frac{a^p}{p} + \frac{b^q}{q} \ \text{ for } a, b \ge 0 \qquad\text{(Young's inequality)} , \]
with equality if and only if $b = a^{p-1}$. Applying this with $a = |f(x)|/\|f\|_p$ and $b = |g(x)|/\|g\|_q$ and integrating, using $\frac1p + \frac1q = 1$, gives
\[ \int_E |f g|\,d\mu \le \Big( \frac1p + \frac1q \Big)\,\|f\|_p\,\|g\|_q = \|f\|_p\,\|g\|_q . \]
After integration, equality holds if and only if $b = a^{p-1}$ a.e., finishing the proof. □
Corollary 4.4. Minkowski's inequality
For $p \in [1,\infty]$ and measurable $f, g : E \to \mathbb{R}$ we have
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p . \]
Proof (sketch). Since $x \mapsto x^p$ is convex, $|f+g|^p \le 2^{p-1}\big( |f|^p + |g|^p \big)$, so $f + g \in L^p$; the inequality follows by applying Holder's inequality to $|f+g|^{p-1}|f|$ and $|f+g|^{p-1}|g|$. □

Examples. (i) There is no monotonicity of $L^p$ norms if $\mu(E) = \infty$. Take e.g. $f(x) = 1/x$ on $(0,\infty)$ with Lebesgue measure. Then
\[ \|f\,1_{[1,\infty)}\|_1 = \infty > 1 = \|f\,1_{[1,\infty)}\|_2 \quad\text{and}\quad \|\sqrt{f}\,1_{(0,1)}\|_1 = 2 < \infty = \|\sqrt{f}\,1_{(0,1)}\|_2 . \]
(ii) Consider the counting measure $\mu = \sum_n \delta_n$ on the measurable space $\big( \mathbb{N}, \mathcal{P}(\mathbb{N}) \big)$. Then
\[ \|f\|_p = \Big( \sum_{n=1}^{\infty} |f(n)|^p \Big)^{1/p} \ \text{ for } p < \infty \quad\text{and}\quad \|f\|_\infty = \sup_{n\in\mathbb{N}} |f(n)| . \]
Theorem 4.7. $(L^p, \|\cdot\|_p)$ is complete for every $p \in [1,\infty]$, i.e. every Cauchy sequence $(f_n)_{n\in\mathbb{N}}$ in $L^p$ converges to some $f \in L^p$.

Proof (for $p < \infty$; for $p = \infty$ see problem 3.3). Choose a subsequence $(n_k)_{k\in\mathbb{N}}$ such that $S := \sum_{k=1}^{\infty} \|f_{n_{k+1}} - f_{n_k}\|_p < \infty$. By Minkowski's inequality, for any $K \in \mathbb{N}$,
\[ \Big\| \sum_{k=1}^{K} |f_{n_{k+1}} - f_{n_k}| \Big\|_p \le S , \]
and by monotone convergence $\sum_{k=1}^{\infty} |f_{n_{k+1}} - f_{n_k}| < \infty$ a.e. So for a.e. $x$, $\big( f_{n_k}(x) \big)_{k}$ is Cauchy and thus converges by completeness of $\mathbb{R}$. We define
\[ f(x) := \begin{cases} \lim_{k\to\infty} f_{n_k}(x) , & \text{if the limit exists} \\ 0 , & \text{otherwise.} \end{cases} \]
Given $\epsilon > 0$, we can find $N \in \mathbb{N}$ such that $\int_E |f_n - f_m|^p\,d\mu < \epsilon$ for all $m \ge n \ge N$, and in particular $\int_E |f_n - f_{n_k}|^p\,d\mu < \epsilon$ for sufficiently large $k$. Hence by Fatou's lemma
\[ \int_E |f_n - f|^p\,d\mu = \int_E \liminf_{k\to\infty} |f_n - f_{n_k}|^p\,d\mu \le \liminf_{k\to\infty} \int_E |f_n - f_{n_k}|^p\,d\mu \le \epsilon \quad\text{for all } n \ge N . \]
Hence $\|f_n - f\|_p \to 0$ since $\epsilon > 0$ was arbitrary, and $f \in L^p$ since for $n$ large enough
\[ \|f\|_p \le \|f - f_n\|_p + \|f_n\|_p \le 1 + \|f_n\|_p < \infty . \quad\square \]
4.2 L2 as a Hilbert space

For $f, g \in L^2$ define the inner product $\langle f, g \rangle := \int_E f g\,d\mu$; by Holder's inequality (Cauchy-Schwarz) $|\langle f, g \rangle| \le \|f\|_2\,\|g\|_2 < \infty$. Thus $\langle \cdot, \cdot \rangle$ is finite and well defined on $L^2$, symmetric by definition and bilinear by linearity of integration. Further $\langle f, f \rangle = \|f\|_2^2 \ge 0$ with equality if and only if $f = 0$ a.e., and $(L^2, \|\cdot\|_2)$ is complete by Theorem 4.7. □

Proposition 4.9. For $f, g \in L^2$ we have Pythagoras' rule
\[ \|f + g\|_2^2 = \|f\|_2^2 + 2\,\langle f, g \rangle + \|g\|_2^2 , \]
and the parallelogram law
\[ \|f + g\|_2^2 + \|f - g\|_2^2 = 2\,\big( \|f\|_2^2 + \|g\|_2^2 \big) . \]
Proof. Follows directly from $\|f \pm g\|_2^2 = \langle f \pm g,\ f \pm g \rangle$. □
Theorem. (Orthogonal projection)
Let $V \subseteq L^2$ be a closed subspace. Then each $f \in L^2$ has a decomposition $f = v + u$ with $v \in V$ and $\langle u, h \rangle = 0$ for all $h \in V$; $v$ is called a version of the orthogonal projection of $f$ on $V$, and the decomposition is unique up to null sets, i.e. $u = u'$ and $v = v'$ a.e. for any other such decomposition.

Proof (sketch). Choose $f_n \in V$ with $\|f - f_n\|_2 \to \inf\{ \|f - h\|_2 : h \in V \}$; by the parallelogram law $(f_n)$ is Cauchy, so $f_n \to v \in V$ by completeness and closedness. For $u := f - v$ and $h \in V$ one checks $\langle u, h \rangle = 0$, using
\[ \big| \langle f - v, h \rangle - \langle f - f_n, h \rangle \big| \le \|f_n - v\|_2\,\|h\|_2 \to 0 \quad\text{as } n \to \infty . \quad\square \]
Definition 4.5. For $\mathbb{R}$-valued random variables $X, Y \in L^2(\mathbb{P})$ with means $m_X = \mathbb{E}(X)$ and $m_Y = \mathbb{E}(Y)$ we define variance, covariance and correlation by
\[ \mathrm{var}(X) = \mathbb{E}\big( (X - m_X)^2 \big) , \qquad \mathrm{cov}(X, Y) = \mathbb{E}\big( (X - m_X)(Y - m_Y) \big) , \]
\[ \mathrm{corr}(X, Y) = \mathrm{cov}(X, Y) \Big/ \sqrt{ \mathrm{var}(X)\,\mathrm{var}(Y) } . \]
For an $\mathbb{R}^n$-valued random variable $X = (X_1, \ldots, X_n) \in L^2(\mathbb{P})$ (this means that each coordinate $X_i \in L^2(\mathbb{P})$) the variance is given by the covariance matrix
\[ \mathrm{var}(X) = \big( \mathrm{cov}(X_i, X_j) \big)_{i,j=1,\ldots,n} . \]
Remarks. (i) $\mathrm{var}(X) = 0$ if and only if $X = m_X$ a.s.
(ii) Covariance matrices are non-negative definite: for all $a \in \mathbb{R}^n$,
\[ a^T \mathrm{var}(X)\,a = \sum_{i,j=1}^{n} a_i a_j\,\mathrm{cov}(X_i, X_j) = \mathrm{var}\big( a^T X \big) \ge 0 , \]
since $a^T X = \sum_i a_i X_i \in L^2(\mathbb{P})$.
Revision. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and let $G \in \mathcal{A}$ be some event with $\mathbb{P}(G) > 0$. The conditional probability $\mathbb{P}(\,\cdot \mid G)$, given by
\[ \mathbb{P}(A \mid G) = \frac{\mathbb{P}(A \cap G)}{\mathbb{P}(G)} \quad\text{for all } A \in \mathcal{A} , \]
is again a probability measure. Given a countable family of disjoint events $(G_i)_{i\in I}$ with $\bigcup_i G_i = \Omega$ and $\mathcal{G} = \sigma(G_i : i \in I)$, the conditional expectation of an integrable r.v. $X$ is the $\mathcal{G}$-measurable random variable
\[ \mathbb{E}(X \mid \mathcal{G}) = \sum_{i\in I} \frac{\mathbb{E}(X\,1_{G_i})}{\mathbb{P}(G_i)}\,1_{G_i} , \]
which satisfies, for every $A = \bigcup_{i\in J} G_i \in \mathcal{G}$,
\[ \int_A \mathbb{E}(X \mid \mathcal{G})\,d\mathbb{P} = \sum_{i\in J} \frac{\mathbb{E}(X\,1_{G_i})}{\mathbb{P}(G_i)}\,\mathbb{P}(G_i) = \sum_{i\in J} \mathbb{E}(X\,1_{G_i}) = \int_A X\,d\mathbb{P} . \]
In particular, $\mathbb{E}(X \mid \mathcal{G})$ is integrable and $\mathbb{E}\big( \mathbb{E}(X \mid \mathcal{G}) \big) = \mathbb{E}(X)$.
(iii) For a $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{A}$, $L^2(\mathcal{G}, \mathbb{P})$ is complete and therefore a closed subspace of $L^2(\mathcal{A}, \mathbb{P})$. If $X \in L^2(\mathcal{A}, \mathbb{P})$ then $\mathbb{E}(X \mid \mathcal{G}) \in L^2(\mathcal{G}, \mathbb{P})$.
Proposition 4.12. If $X \in L^2(\mathcal{A}, \mathbb{P})$, then $\mathbb{E}(X \mid \mathcal{G})$ is a version of the orthogonal projection of $X$ on $L^2(\mathcal{G}, \mathbb{P})$.
Proof. See problem 3.10. □

Remarks on the general case
(i) For a general $\sigma$-algebra $\mathcal{F} \subseteq \mathcal{A}$ one can show that for every integrable r.v. $X$ there exists an $\mathcal{F}$-measurable, integrable r.v. $Y$ with $\int_F Y\,d\mathbb{P} = \int_F X\,d\mathbb{P}$ for every $F \in \mathcal{F}$. It is unique up to a version, defining the conditional expectation $Y = \mathbb{E}(X \mid \mathcal{F})$.
For $X \in L^2(\mathcal{A}, \mathbb{P})$, $\mathbb{E}(X \mid \mathcal{F})$ is the orthogonal projection of $X$ on $L^2(\mathcal{F}, \mathbb{P})$.
(ii) If $X$ is $\mathcal{F}$-measurable, $\mathbb{E}(X \mid \mathcal{F}) = X$. In particular $\mathbb{E}(X \mid \mathcal{A}) = X$.
(iii) For $\sigma$-algebras $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \mathcal{A}$ we have the tower property
\[ \mathbb{E}\big( \mathbb{E}(X \mid \mathcal{F}_2) \,\big|\, \mathcal{F}_1 \big) = \mathbb{E}(X \mid \mathcal{F}_1) = \mathbb{E}\big( \mathbb{E}(X \mid \mathcal{F}_1) \,\big|\, \mathcal{F}_2 \big) . \]
4.3 Convergence in L1

Among the $L^p$ convergences, convergence in $L^1$ is the weakest. From problem 3.4 we know that $X_n \overset{L^p}{\to} X$ implies convergence in probability. The converse holds only under additional assumptions.

Theorem 4.13. Bounded convergence
Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random variables with $X_n \to X$ in probability. If in addition $|X_n| \le C$ a.s. for all $n \in \mathbb{N}$ and some $C < \infty$, then $X_n \to X$ in $L^1$.

Proof. By Theorem 2.10(ii), $X$ is the almost sure limit of a subsequence, so $|X| \le C$ a.s.
For $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that for all $n \ge N$: $\mathbb{P}\big( |X_n - X| > \epsilon/2 \big) \le \epsilon/(4C)$. Then
\[ \mathbb{E}\,|X_n - X| = \mathbb{E}\big( |X_n - X|\,1_{|X_n - X| > \epsilon/2} \big) + \mathbb{E}\big( |X_n - X|\,1_{|X_n - X| \le \epsilon/2} \big) \le 2C\,\frac{\epsilon}{4C} + \frac{\epsilon}{2} = \epsilon . \quad\square \]
Remark. Corollary 3.8 on bounded convergence gives a similar statement under the stronger assumption $X_n \to X$ a.s. Although the assumptions in 4.13 are weaker, they are still not necessary for the conclusion to hold. The main motivation of this section is to provide a necessary and sufficient extra condition, such that convergence in probability implies convergence in $L^1$.

Lemma 4.14. For $X \in L^1(\mathcal{A}, \mathbb{P})$ set
\[ I_X(\delta) = \sup\big\{ \mathbb{E}\big( |X|\,1_A \big) : A \in \mathcal{A},\ \mathbb{P}(A) \le \delta \big\} . \]
Then $I_X(\delta) \searrow 0$ as $\delta \searrow 0$.

Proof. Suppose not. Then, for some $\epsilon > 0$, there exist $A_n \in \mathcal{A}$ with $\mathbb{P}(A_n) \le 2^{-n}$ and $\mathbb{E}\big( |X|\,1_{A_n} \big) \ge \epsilon$ for all $n \in \mathbb{N}$. By the first Borel-Cantelli lemma, $\mathbb{P}(A_n \text{ i.o.}) = 0$. But then, by dominated convergence,
\[ \mathbb{E}\big( |X|\,1_{\bigcup_{m\ge n} A_m} \big) \to \mathbb{E}\big( |X|\,1_{\{A_n \text{ i.o.}\}} \big) = 0 \quad\text{as } n \to \infty , \]
which is a contradiction. □

Definition. A family $\mathcal{X}$ of random variables is called uniformly integrable (UI) if
\[ \sup_{X\in\mathcal{X}} \mathbb{E}\big( |X|\,1_{\{|X|\ge K\}} \big) \to 0 \quad\text{as } K \to \infty ; \]
equivalently, $\mathcal{X}$ is UI if and only if $\mathcal{X}$ is bounded in $L^1$ and
\[ \forall \epsilon > 0\ \exists \delta > 0 : \ \mathbb{P}(A) \le \delta \ \Rightarrow\ \mathbb{E}\big( |X|\,1_A \big) \le \epsilon \ \text{ for all } X \in \mathcal{X} . \]
In particular UI includes that $\mathcal{X}$ is bounded in $L^1$; in general the converse does not hold (see problem 3.11).
Example. If $|X| \le Y$ for all $X \in \mathcal{X}$ with a single integrable $Y$, then $\mathbb{E}\big( |X|\,1_A \big) \le I_Y\big( \mathbb{P}(A) \big)$ and Lemma 4.14 applies. Hence $\mathcal{X}$ is UI.
Theorem 4.16. Let $X_n$, $n \in \mathbb{N}$, and $X$ be random variables. The following are equivalent:
(i) $X_n \in L^1$ for all $n \in \mathbb{N}$, $X \in L^1$ and $X_n \to X$ in $L^1$;
(ii) $\{X_n : n \in \mathbb{N}\}$ is UI and $X_n \to X$ in probability.
5 Characteristic functions

5.1 Definitions and examples

Definition 5.1. For a finite measure $\mu$ on $\big( \mathbb{R}^n, \mathcal{B}(\mathbb{R}^n) \big)$, define the Fourier transform $\hat\mu : \mathbb{R}^n \to \mathbb{C}$ by
\[ \hat\mu(u) = \int_{\mathbb{R}^n} e^{i\langle u, x\rangle}\,\mu(dx) , \quad\text{for all } u \in \mathbb{R}^n , \]
where $\langle u, x \rangle = \sum_{i=1}^{n} u_i x_i$. For a random variable $X$ in $\mathbb{R}^n$ with distribution $\mu_X$, the characteristic function is $\phi_X(u) = \mathbb{E}\big( e^{i\langle u, X\rangle} \big) = \hat\mu_X(u)$.
Since $e^{ix} = \cos x + i \sin x$ has bounded real and imaginary parts, it is integrable with respect to every finite measure. Thus $\hat\mu(u)$ and $\phi_X(u)$ are well defined for all $u \in \mathbb{R}^n$ (in contrast to moment generating functions $M_X$, see problem 3.13).

Definition 5.2. A random variable $X$ in $\mathbb{R}^n$ is called standard Gaussian if
\[ \mathbb{P}(X \in A) = \frac{1}{(2\pi)^{n/2}} \int_A e^{-|x|^2/2}\,dx , \quad\text{for all } A \in \mathcal{B}(\mathbb{R}^n) . \]
Example. For a standard Gaussian random variable $X$ in $\mathbb{R}$ it is
\[ \phi_X(u) = \int_{\mathbb{R}} e^{iux}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = e^{-u^2/2}\,I , \quad\text{where}\quad I = \int_{\mathbb{R}} \frac{e^{-(x - iu)^2/2}}{\sqrt{2\pi}}\,dx . \]
$I$ can be evaluated by considering the complex integral $\oint_\gamma e^{-z^2/2}\,dz$ around the rectangular contour $\gamma$ with corners $-R$, $R$, $R - iu$, $-R - iu$. Since $e^{-z^2/2}$ is analytic, the integral vanishes by Cauchy's theorem for every $R > 0$. In the limit $R \to \infty$, the contributions from the vertical sides of $\gamma$ also vanish, and thus
\[ I = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 1 \quad\Rightarrow\quad \phi_X(u) = e^{-u^2/2} . \]
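A numerical sanity check of this identity (added; standard library only):

```python
import math

def phi_standard_gaussian(u, n=200000, xmax=12.0):
    # Trapezoidal approximation of E[cos(uX)] for X ~ N(0,1);
    # the sine part vanishes by symmetry, so phi_X(u) is real.
    h = 2 * xmax / n
    total = 0.0
    for k in range(n + 1):
        x = -xmax + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.cos(u * x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

for u in (0.0, 1.0, 2.0):
    print(u, phi_standard_gaussian(u), math.exp(-u * u / 2))
# the two columns agree to many decimal places
```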
In the next subsection we will also make use of the following.

Definition 5.3. For $t > 0$ and $x, y \in \mathbb{R}^n$ we define the heat kernel
\[ p(t, x, y) = \frac{1}{(2\pi t)^{n/2}}\,e^{-|x-y|^2/(2t)} \in \mathbb{R} . \]

Remark. From the previous calculation we have
\[ e^{-w^2/2} = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{iwu}\,e^{-u^2/2}\,du . \]
5.2 Uniqueness and inversion
Theorem 5.1. The Fourier transform $\phi_X = \hat\mu_X$ uniquely determines the distribution $\mu_X$ of a random variable $X$ in $\mathbb{R}^n$. If moreover $\phi_X$ is integrable, then $X$ has a bounded continuous density given by the inversion formula
\[ f_X(y) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\,e^{-i\langle u, y\rangle}\,du . \]

Proof. Let $Y$ be standard Gaussian in $\mathbb{R}^n$, independent of $X$, and let $g$ be continuous and bounded. Then, for $t > 0$, using the heat kernel and the remark above,
\[ \mathbb{E}\big( g(X + \sqrt{t}\,Y) \big) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} g(x + \sqrt{t}\,y)\,(2\pi)^{-n/2}\,e^{-|y|^2/2}\,dy\,\mu_X(dx) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} p(t, x, y')\,g(y')\,dy'\,\mu_X(dx) \]
\[ = \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i\langle u, x\rangle}\,e^{-|u|^2 t/2}\,e^{-i\langle u, y\rangle}\,du\,\mu_X(dx) \Big)\,g(y)\,dy = \int_{\mathbb{R}^n} \Big( \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\,e^{-|u|^2 t/2}\,e^{-i\langle u, y\rangle}\,du \Big)\,g(y)\,dy . \]
By this formula, $\phi_X$ determines $\mathbb{E}\big( g(X + \sqrt{t}\,Y) \big)$. For any bounded continuous $g$, we have
\[ \mathbb{E}\big( g(X + \sqrt{t}\,Y) \big) \to \mathbb{E}\big( g(X) \big) \quad\text{as } t \searrow 0 , \]
so $\phi_X$ determines $\mathbb{E}\big( g(X) \big)$. Hence $\phi_X$ determines $\mu_X$ due to problem 4.1.
If $\phi_X$ is integrable and if $g$ is continuous and integrable, then $(u, y) \mapsto \phi_X(u)\,g(y) \in L^1(du\,dy)$. So, by dominated convergence, as $t \searrow 0$, the last integral above converges to
\[ \int_{\mathbb{R}^n} \Big( \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \phi_X(u)\,e^{-i\langle u, y\rangle}\,du \Big)\,g(y)\,dy , \]
identifying the density $f_X$. □
Remark. Let $X, Y$ be independent r.v.s in $\mathbb{R}^n$. Then the characteristic function of the sum is
\[ \phi_{X+Y}(u) = \mathbb{E}\big( e^{i\langle u, X+Y\rangle} \big) = \mathbb{E}\big( e^{i\langle u, X\rangle}\,e^{i\langle u, Y\rangle} \big) = \phi_X(u)\,\phi_Y(u) . \]
The next result shows that independence of r.v.s is equivalent to factorisation of the joint characteristic function.

Theorem 5.2. Let $X = (X_1, \ldots, X_n)$ be a r.v. in $\mathbb{R}^n$. Then the following are equivalent:
(i) $X_1, \ldots, X_n$ are independent;
(ii) $\mu_X = \mu_{X_1} \otimes \ldots \otimes \mu_{X_n}$;
(iii) $\mathbb{E}\Big( \prod_{k=1}^{n} f_k(X_k) \Big) = \prod_{k=1}^{n} \mathbb{E}\big( f_k(X_k) \big)$, for all bounded Borel functions $f_1, \ldots, f_n$;
(iv) $\phi_X(u) = \prod_{k=1}^{n} \phi_{X_k}(u_k)$ for all $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$.

Proof. If (i) holds, $\mu_X(A_1 \times \ldots \times A_n) = \prod_k \mu_{X_k}(A_k)$ for all Borel sets $A_1, \ldots, A_n$. So (ii) holds, since this formula characterizes the product measure by Theorem 3.15.
If (ii) holds, then, for $f_1, \ldots, f_n$ bounded Borel,
\[ \mathbb{E}\Big( \prod_k f_k(X_k) \Big) = \int_{\mathbb{R}^n} \prod_k f_k(x_k)\,\mu_X(dx) = \prod_k \int_{\mathbb{R}} f_k(x_k)\,\mu_{X_k}(dx_k) = \prod_k \mathbb{E}\big( f_k(X_k) \big) , \]
so (iii) holds. (iii) implies (iv) by taking $f_k(x) = e^{iu_k x}$ (real and imaginary parts), and (iv) implies (ii), hence (i), by uniqueness of Fourier transforms (Theorem 5.1), since both sides of (ii) then have the same transform. □
5.3 Gaussian random variables

A random variable $X$ in $\mathbb{R}$ is called Gaussian with mean $\mu$ and variance $\sigma^2 > 0$, written $X \sim N(\mu, \sigma^2)$, if it has density
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/(2\sigma^2)} . \]

Proposition 5.3. For $X \sim N(\mu, \sigma^2)$:
(i) $\mathbb{E}(X) = \mu$, (ii) $\mathrm{var}(X) = \sigma^2$, (iii) $aX + b \sim N(a\mu + b, a^2\sigma^2)$ for $a, b \in \mathbb{R}$, (iv) $\phi_X(u) = e^{iu\mu - u^2\sigma^2/2}$.
(See problem 4.7.)

A random variable $X$ in $\mathbb{R}^n$ is Gaussian if $\langle v, X \rangle$ is Gaussian (or constant) for all $v \in \mathbb{R}^n$. Write $\mu = \mathbb{E}(X)$ and $\Sigma = \mathrm{var}(X)$. $\Sigma$ is non-negative definite and can be diagonalised:
\[ \Sigma v_i = \lambda_i v_i ,\quad \lambda_i \ge 0 , \quad\text{with orthonormal eigenvectors } v_1, \ldots, v_n \quad\text{and}\quad x = \sum_{i=1}^{n} \langle x, v_i \rangle\,v_i . \]
So we can write $\Sigma = \sum_{i=1}^{n} \lambda_i\,v_i v_i^T$, and we define
\[ \Sigma^{1/2} := \sum_{i=1}^{n} \sqrt{\lambda_i}\,v_i v_i^T . \]

Theorem. Let $X$ be Gaussian in $\mathbb{R}^n$ with mean $\mu$ and covariance matrix $\Sigma$. Then $X \sim \Sigma^{1/2} Y + \mu$ with $Y$ standard Gaussian; in particular, a Gaussian random variable exists for every pair $(\mu, \Sigma)$. Moreover:
(iv) if $\Sigma$ is invertible, $X$ has the density
\[ f_X(x) = \frac{1}{\sqrt{(2\pi)^n \det\Sigma}}\,\exp\Big( -\big\langle x - \mu,\ \Sigma^{-1}(x - \mu) \big\rangle \big/ 2 \Big) ; \]
(v) if $X = (Y, Z)$, with $Y$ in $\mathbb{R}^m$ and $Z$ in $\mathbb{R}^p$ ($m + p = n$), then the block structure
\[ \mathrm{var}(X) = \begin{pmatrix} \mathrm{var}(Y) & 0 \\ 0 & \mathrm{var}(Z) \end{pmatrix} \]
implies that $Y$ and $Z$ are independent.
Proof (sketch). Set $\tilde{X} = \Sigma^{1/2}\,Y + \mu$ with $Y$ standard Gaussian; then $\tilde{X}$ is Gaussian with $\mathbb{E}(\tilde{X}) = \mu$ and $\mathrm{var}(\tilde{X}) = \Sigma$, since
\[ \mathrm{cov}(\tilde{X}_i, \tilde{X}_j) = \mathbb{E}\big( (\Sigma^{1/2} Y)_i\,(\Sigma^{1/2} Y)_j \big) = \mathbb{E}\Big( \sum_{k,l=1}^{n} \Sigma^{1/2}_{ik} Y_k\,\Sigma^{1/2}_{jl} Y_l \Big) = \Sigma_{ij} , \]
due to $\mathbb{E}(Y_k Y_l) = \delta_{kl}$. So $\tilde{X} \sim X$. If $\Sigma$ is invertible, then $\tilde{X}$, and hence $X$, has the density claimed in (iv), by the linear change of variables $y = \Sigma^{-1/2}(x - \mu)$ leading to
\[ |y|^2 = \langle y, y \rangle = \big\langle x - \mu,\ \Sigma^{-1}(x - \mu) \big\rangle \quad\text{and}\quad d^n y = d^n x\,\big| \det\Sigma^{-1/2} \big| = \frac{d^n x}{\sqrt{\det\Sigma}} . \]
(v) Finally, if $X = (Y, Z)$ and $\Sigma = \mathrm{var}(X)$ has the block structure given in (v), then, for all $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^p$,
\[ \big\langle (v, w),\ \Sigma\,(v, w) \big\rangle = \langle v,\ \Sigma_Y v \rangle + \langle w,\ \Sigma_Z w \rangle , \quad\text{where } \Sigma_Y = \mathrm{var}(Y) \text{ and } \Sigma_Z = \mathrm{var}(Z) . \]
With $\mu = (\mu_Y, \mu_Z)$, the joint characteristic function $\phi_X$ then splits into a product,
\[ \phi_X(v, w) = e^{i\langle v, \mu_Y\rangle - \langle v, \Sigma_Y v\rangle/2}\; e^{i\langle w, \mu_Z\rangle - \langle w, \Sigma_Z w\rangle/2} = \phi_Y(v)\,\phi_Z(w) , \]
so $Y$ and $Z$ are independent by Theorem 5.2. □
6 Ergodic theory and limit theorems

6.1 A first strong law of large numbers

Theorem 6.1. Strong law of large numbers (fourth-moment version)
Let $(X_n)_{n\in\mathbb{N}}$ be independent random variables with
\[ \mathbb{E}(X_n) = \mu \quad\text{and}\quad \mathbb{E}(X_n^4) \le M < \infty \quad\text{for all } n \in \mathbb{N} . \]
Then for $S_n = \sum_{i=1}^{n} X_i$ we have $S_n/n \to \mu$ a.s. as $n \to \infty$.

Proof. Set $Y_i = X_i - \mu$; by Minkowski's and Jensen's inequalities there is $\tilde{M} < \infty$ with
\[ \mathbb{E}(Y_i^4) \le \tilde{M} \quad\text{and}\quad \mathbb{E}(Y_i^2)^2 \le \mathbb{E}\big( (Y_i^2)^2 \big) \le \tilde{M} . \]
Expanding the fourth power of $\sum_i Y_i$ and using independence (all terms containing a factor $\mathbb{E}(Y_i) = 0$ vanish),
\[ \mathbb{E}\big( (S_n - n\mu)^4 \big) = \sum_{i} \mathbb{E}(Y_i^4) + 6 \sum_{i<j} \mathbb{E}(Y_i^2)\,\mathbb{E}(Y_j^2) \le n\tilde{M} + 3n(n-1)\tilde{M} \le 3n^2\tilde{M} . \]
Thus
\[ \mathbb{E}\Big( \sum_{n} (S_n/n - \mu)^4 \Big) \le 3\tilde{M} \sum_{n} 1/n^2 < \infty \quad\text{by monotone convergence.} \]
Therefore $\sum_n (S_n/n - \mu)^4 < \infty$ a.s., and thus $S_n/n \to \mu$ a.s. □

Intuitively, the above result should also hold without the restrictive assumption on the fourth moment of the random variables. One goal of this chapter is in fact to prove the above statement with a much weaker assumption. For this purpose it is convenient to use a different approach, leading to ergodic theory, which is introduced in the next two sections.
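A simulation illustrating the statement (added; here with bounded uniform summands, so all moments are finite):

```python
import random

def running_means(n_steps, seed=0):
    # Sample means S_n / n for i.i.d. Uniform(-1, 1) + 0.5, so mu = 0.5
    # and all moments are bounded; the means settle near 0.5 a.s.
    rng = random.Random(seed)
    total, means = 0.0, []
    for n in range(1, n_steps + 1):
        total += rng.uniform(-1.0, 1.0) + 0.5
        means.append(total / n)
    return means

m = running_means(100000)
print(m[99], m[9999], m[99999])   # drifts towards 0.5
```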
6.2 Measure-preserving transformations

Let $(E, \mathcal{E}, \mu)$ be a measure space and $\theta : E \to E$ measurable. A set $A \in \mathcal{E}$ is called invariant if $\theta^{-1}(A) = A$, and a measurable function $f : E \to F$ is invariant if $f = f \circ \theta$. Properties:
(ii) $\mathcal{E}_\theta := \{ A \in \mathcal{E} : \theta^{-1}(A) = A \}$ is a $\sigma$-algebra, since preimages preserve set operations.
(iii) $A \in \mathcal{E}$ is invariant $\Leftrightarrow$ $1_A = 1_A \circ \theta$, since $1_A \circ \theta = 1_{\theta^{-1}(A)}$.
(iv) $f : E \to F$ is invariant $\Leftrightarrow$ $\forall B \in \mathcal{F} : f^{-1}(B) = \theta^{-1}\big( f^{-1}(B) \big)$ $\Leftrightarrow$ $\forall B \in \mathcal{F} : f^{-1}(B) \in \mathcal{E}_\theta$, i.e. $f$ is $\mathcal{E}_\theta$-measurable.

Definition 6.2. A measurable function $\theta : E \to E$ is called measure-preserving if
\[ \mu\big( \theta^{-1}(A) \big) = \mu(A) \quad\text{for all } A \in \mathcal{E} . \]
Such a $\theta$ is ergodic if $\mathcal{E}_\theta$ is trivial, i.e. contains only sets of measure $0$ and their complements.

Examples. (i) The constant function $\theta(x) = c \in E$ is in general not measure-preserving. The identity $\theta(x) = x$ is measure-preserving, but not ergodic, since $\mathcal{E}_\theta = \mathcal{E}$.
(ii) Translation map on the torus: take $E = [0,1)^n$ with Lebesgue measure, and for $a \in E$ set
\[ \theta_a(x_1, \ldots, x_n) = (x_1 + a, \ldots, x_n + a) \quad\text{with addition modulo } 1 . \]
In problem 4.10 it is shown for $n = 1$ that $\theta_a$ is measure-preserving, and also ergodic if and only if $a$ is irrational.
(iii) Baker's map: take $E = (0,1]$ with Lebesgue measure and set $\theta(x) = 2x - \lfloor 2x \rfloor$. In problem 4.11 it is shown that $\theta$ is measure-preserving and ergodic.

Proposition 6.2. If $f : E \to \mathbb{R}$ is integrable and $\theta : E \to E$ is measure-preserving, then $f \circ \theta$ is integrable and
\[ \int_E f \circ \theta\,d\mu = \int_E f\,d\mu . \]
Proof. For $f = 1_A$, $A \in \mathcal{E}$, the statement reduces to $\mu(A) = \mu\big( \theta^{-1}(A) \big)$, which holds since $\theta$ is measure-preserving. This extends to simple functions by linearity, to non-negative measurable functions by monotone convergence, and to integrable $f = f^+ - f^-$ again by linearity. □

Proposition 6.3. If $\theta : E \to E$ is ergodic and $f : E \to \mathbb{R}$ is invariant, then $f = c$ a.e. for some constant $c \in \mathbb{R}$.
Proof. For all $A \in \mathcal{B}$, $\mu(f \in A) = 0$ or $\mu(f \in A^c) = 0$, since $f$ is $\mathcal{E}_\theta$-measurable and $\theta$ is ergodic. Set $c := \inf\{ a \in \mathbb{R} : \mu(f > a) = 0 \}$. Then $\mu(f > a) = 0$ for all $a > c$ and $\mu(f \le a) = 0$ for all $a < c$, and thus $f = c$ a.e. □
If $f$ is invariant, the level sets $\{f = y\}$, $y \in f(E)$, are invariant and non-communicating under the time evolution defined by $\theta$. However, if $\theta$ is ergodic, Proposition 6.3 implies that the only invariant functions are constant a.e. So an ergodic dynamical system does not have conserved quantities which partition the state space into non-communicating classes of non-zero measure (compare to Markov chains).

For the rest of this section we consider the infinite product space
\[ E = \mathbb{R}^{\mathbb{N}} = \big\{ x = (x_n)_{n\in\mathbb{N}} : x_n \in \mathbb{R},\ n \in \mathbb{N} \big\} \]
with $\sigma$-algebra $\mathcal{E} = \sigma(X_n : n \in \mathbb{N})$ generated by the coordinate maps $X_n : E \to \mathbb{R}$ with $X_n(x) = x_n$.

Remark. $\mathcal{E} = \sigma(\mathcal{C})$ is generated by the $\pi$-system
\[ \mathcal{C} = \Big\{ \bigotimes_{n\in\mathbb{N}} A_n : A_n \in \mathcal{B},\ A_n = \mathbb{R} \text{ for all but finitely many } n \Big\} , \]
which consists of so-called cylinder sets, where only finitely many coordinates are specified.

Let $(Y_n)_{n\in\mathbb{N}}$ be a sequence of iidrvs with distribution $m$. With the Skorohod theorem they can be constructed on a common probability space $(\Omega, \mathcal{A}, \mathbb{P})$. $Y : \Omega \to \mathbb{R}^{\mathbb{N}}$ defined as $Y(\omega) = \big( Y_n(\omega) \big)_{n\in\mathbb{N}}$ is $\mathcal{A}/\mathcal{E}$-measurable, and the distribution $\mu = \mathbb{P} \circ Y^{-1}$ of $Y$ satisfies
\[ \mu(A) = \prod_{n\in\mathbb{N}} m(A_n) \quad\text{for all cylinder sets } A = \bigotimes_{n\in\mathbb{N}} A_n \in \mathcal{C} . \]
Since $\mathcal{C}$ is a $\pi$-system generating $\mathcal{E}$, this is the unique measure on $(E, \mathcal{E})$ with this property. Therefore $(\Omega, \mathcal{A}, \mathbb{P}) = (\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ is a generic example of such a common probability space.

Definition 6.3. On the probability space $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ the coordinate maps $X_n : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}$ themselves are iidrvs with distribution $m$, and this is called the canonical model for such a sequence. The shift map $\Theta : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}^{\mathbb{N}}$ is defined as $\Theta(x_1, x_2, \ldots) = (x_2, x_3, \ldots)$.

Theorem 6.4. The shift map $\Theta$ is ergodic.

Proof. $\Theta$ is measurable and measure-preserving (see problem 4.9).
To see that $\Theta$ is ergodic, recall the tail $\sigma$-algebra
\[ \mathcal{T} = \bigcap_{n\in\mathbb{N}} \mathcal{T}_n \quad\text{where}\quad \mathcal{T}_n = \sigma(X_m : m > n) \subseteq \mathcal{E} . \]
For $A = \bigotimes_{k\in\mathbb{N}} A_k \in \mathcal{C}$, $\Theta^{-n}(A) = \big\{ x \in E : X_{n+k}(x) \in A_k \text{ for all } k \ge 1 \big\} \in \mathcal{T}_n$.
Since $\mathcal{T}_n$ is a $\sigma$-algebra, $\Theta^{-n}(A) \in \mathcal{T}_n$ for all $A \in \mathcal{E}$. If $A \in \mathcal{E}_\Theta = \big\{ B \in \mathcal{E} : \Theta^{-1}(B) = B \big\}$, then $A = \Theta^{-n}(A) \in \mathcal{T}_n$ for all $n \in \mathbb{N}$ and thus $A \in \bigcap_n \mathcal{T}_n = \mathcal{T}$, so that $\mathcal{E}_\Theta \subseteq \mathcal{T}$. By Kolmogorov's 0-1 law, $\mathcal{T}$, and thus $\mathcal{E}_\Theta$, is trivial and $\Theta$ is ergodic. □
6.3 Ergodic Theorems

In the following let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space with a measure-preserving transformation $\theta : E \to E$. Let $f : E \to \mathbb{R}$ be integrable and define $S_n : E \to \mathbb{R}$ by $S_0 = 0$ and
\[ S_n = S_n(f) = f + f \circ \theta + \ldots + f \circ \theta^{n-1} \quad\text{for } n \ge 1 . \]

Example. Let $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ be the canonical model for iidrvs $(X_n)_{n\in\mathbb{N}}$, $f = X_1 : E \to \mathbb{R}$ the first coordinate map and $\Theta$ the shift map from the previous section. Then $S_n(X_1) = \sum_{i=1}^{n} X_i$.

Lemma 6.5. Maximal ergodic lemma
Let $S^* = \sup_{n\ge 0} S_n$. Then
\[ \int_{\{S^* > 0\}} f\,d\mu \ge 0 . \]

Proof (sketch; the full proof is on handout 4). Set $S_n^* = \max_{0\le m\le n} S_m$ and $A_n = \{S_n^* > 0\}$. For $1 \le m \le n$,
\[ S_m = f + S_{m-1} \circ \theta \le f + S_n^* \circ \theta . \]
On $A_n$ we have $S_n^* = \max_{1\le m\le n} S_m$, so $S_n^* \le f + S_n^* \circ \theta$ there, and hence
\[ \int_E S_n^*\,d\mu \le \int_{A_n} f\,d\mu + \int_E S_n^* \circ \theta\,d\mu . \]
Since $\theta$ is measure-preserving, the last integral equals $\int_E S_n^*\,d\mu$, so
\[ \int_{A_n} f\,d\mu \ge 0 , \]
and letting $n \to \infty$, with $A_n \nearrow \{S^* > 0\}$ and dominated convergence,
\[ \int_{\{S^* > 0\}} f\,d\mu \ge 0 . \quad\square \]
Theorem 6.6. Birkhoff's almost everywhere ergodic theorem
Let $f$ be integrable. Then there exists an invariant, integrable $\bar{f} : E \to \mathbb{R}$ with $S_n(f)/n \to \bar{f}$ a.e.

Proof (beginning; non-examinable, on handout 4). For $a < b$,
\[ D = D_{a,b} = \Big\{ x \in E : \liminf_{n\to\infty} (S_n/n)(x) < a < b < \limsup_{n\to\infty} (S_n/n)(x) \Big\} \]
is an invariant event. We shall show that $\mu(D) = 0$. First, by invariance, we can restrict everything to $D$ and thereby reduce to the case $D = E$. Note that either $b > 0$ or $a < 0$. We can interchange the two cases by replacing $f$ by $-f$. Let us assume then that $b > 0$.
Let $B \in \mathcal{E}$ with $\mu(B) < \infty$; then $g = f - b\,1_B$ is integrable and, for each $x \in D$, for some $n$,
\[ S_n(g)(x) \ge S_n(f)(x) - n b > 0 , \]
so the maximal ergodic lemma gives $\int_D (f - b\,1_B)\,d\mu \ge 0$. Since $\mu$ is $\sigma$-finite, we can let $B \nearrow D$ to obtain $b\,\mu(D) \le \int_D f\,d\mu$.
In particular, we see that $\mu(D) < \infty$. A similar argument applied to $-f$ and $-a$, this time with $B = D$, shows that $(-a)\,\mu(D) \le \int_D (-f)\,d\mu$. Hence
\[ b\,\mu(D) \le \int_D f\,d\mu \le a\,\mu(D) , \]
which forces $\mu(D) = 0$ since $a < b$. Taking a union over rational $a < b$, $S_n/n$ converges a.e. to some invariant $\bar{f}$, and by Fatou's lemma
\[ \int_E |\bar{f}|\,d\mu = \int_E \liminf_{n\to\infty} |S_n/n|\,d\mu \le \liminf_{n\to\infty} \int_E |S_n/n|\,d\mu \le \int_E |f|\,d\mu , \]
so $\bar{f}$ is integrable. □
Theorem 6.7. Ergodic theorem in $L^p$
Let $\mu(E) < \infty$, $p \in [1,\infty)$ and $f \in L^p$. Then $S_n(f)/n \to \bar{f}$ in $L^p$.

Proof (sketch). For bounded $g$, $|S_n(g)/n| \le \|g\|_\infty$ and bounded convergence give $S_n(g)/n \to \bar{g}$ in $L^p$. For general $f$ choose bounded $g$ with $\|f - g\|_p < \epsilon/3$; by Fatou's lemma,
\[ \int_E \big| \overline{(f-g)} \big|^p\,d\mu = \int_E \liminf_{n\to\infty} \Big| \frac{S_n(f-g)}{n} \Big|^p\,d\mu \le \liminf_{n\to\infty} \int_E \Big| \frac{S_n(f-g)}{n} \Big|^p\,d\mu \le \|f - g\|_p^p , \]
using $\|S_n(f-g)/n\|_p \le \|f-g\|_p$, which holds by Minkowski's inequality and Proposition 6.2. Hence, for $n$ large enough,
\[ \Big\| \frac{S_n(f)}{n} - \bar{f} \Big\|_p \le \Big\| \frac{S_n(f-g)}{n} \Big\|_p + \Big\| \frac{S_n(g)}{n} - \bar{g} \Big\|_p + \big\| \bar{g} - \bar{f} \big\|_p < 3\,\epsilon/3 = \epsilon . \quad\square \]
Corollary 6.8. Let $\mu(E) < \infty$, $f \in L^1$ and $\bar{f}$ be the invariant limit function of Theorem 6.6. Then
\[ \int_E \bar{f}\,d\mu = \int_E f\,d\mu , \quad\text{and if } \theta \text{ is ergodic,}\quad \bar{f} = \int_E f\,d\mu \Big/ \mu(E) \ \text{ a.e.} \]

Proof.
\[ \Big| \int_E \frac{S_n}{n}\,d\mu - \int_E \bar{f}\,d\mu \Big| \le \Big\| \frac{S_n}{n} - \bar{f} \Big\|_1 \to 0 \quad\text{by Theorem 6.7.} \]
By the definition of $S_n$,
\[ \int_E \frac{S_n}{n}\,d\mu = \int_E f\,d\mu \quad\text{for all } n \in \mathbb{N} , \]
since $\theta$ is measure-preserving, and the first statement follows.
If $\theta$ is ergodic, the invariant function $\bar{f}$ is constant a.e. by Proposition 6.3, and together with the first this implies the second statement. □
6.4 Limit theorems

Theorem 6.9. Strong law of large numbers
Let $(Y_n)_{n\in\mathbb{N}}$ be iidrvs with $\mathbb{E}\,|Y_1| < \infty$. Then $S_n/n = (Y_1 + \ldots + Y_n)/n \to \mathbb{E}(Y_1)$ a.s. as $n \to \infty$.

Proof. Let $(\mathbb{R}^{\mathbb{N}}, \mathcal{E}, \mu)$ be the canonical model for the sequence $Y := (Y_n)_{n\in\mathbb{N}} \in \mathbb{R}^{\mathbb{N}}$ with distribution $\mu$ as in Def. 6.3. Take $f = Y_1 \in L^1$ to be the first coordinate map. Note that
\[ S_n = Y_1 + \ldots + Y_n = f + f \circ \Theta + \ldots + f \circ \Theta^{n-1} , \]
where $\Theta : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}^{\mathbb{N}}$ is the shift map, which is measure-preserving and ergodic by Theorem 6.4. With $\mu(\mathbb{R}^{\mathbb{N}}) = 1$ we have, by Theorem 6.6 and Corollary 6.8,
\[ S_n/n \overset{\text{a.s.}}{\to} \mathbb{E}(f) = \mathbb{E}(Y_1) . \quad\square \]

Remarks. (i) By Theorem 6.7 we also have convergence in $L^1$ in Theorem 6.9.
(ii) Theorem 6.9 is stronger than Theorem 6.1, where we needed $\mathbb{E}(Y_n^4) \le M$ for all $n \in \mathbb{N}$. But here the $Y_n$ have to be identically distributed for $\Theta$ to be measure-preserving.
(iii) With Thm 2.10 and Prop 2.11 the strong implies the weak law of large numbers:
\[ S_n/n \to \mathbb{E}(Y_1) \text{ a.s.} \ \Rightarrow\ S_n/n \to \mathbb{E}(Y_1) \text{ in probability} \ \Rightarrow\ S_n/n \overset{D}{\to} \mathbb{E}(Y_1) \quad\text{as } n \to \infty . \]
Theorem 6.10. Levy's convergence theorem for characteristic functions
Let $X_n$, $n \in \mathbb{N}$, and $X$ be random variables in $\mathbb{R}$ with characteristic functions $\phi_X(u) = \mathbb{E}\big( e^{iuX} \big)$ and $\phi_{X_n}(u)$. Then
\[ \phi_{X_n}(u) \to \phi_X(u) \ \text{ for all } u \in \mathbb{R} \quad\Leftrightarrow\quad X_n \to X \ \text{in distribution} . \]

Proof. "⇐": By Theorem 3.9, $X_n \overset{D}{\to} X$ implies $\mathbb{E}\big( f(X_n) \big) \to \mathbb{E}\big( f(X) \big)$ for all $f \in C_b(\mathbb{R})$, and $e^{iux} = \cos(ux) + i\sin(ux)$ is bounded and continuous for all $u \in \mathbb{R}$.
"⇒" is more involved, see e.g. Billingsley, Probability and Measure (3rd ed.), Thm 26.3. □
Theorem. Central limit theorem
Let $(X_n)_{n\in\mathbb{N}}$ be iidrvs with $\mathbb{E}(X_1) = 0$ and $\mathrm{var}(X_1) = 1$, and $S_n = X_1 + \ldots + X_n$. Then, for all $a < b$,
\[ \mathbb{P}\big( S_n/\sqrt{n} \in [a,b] \big) \to \int_a^b \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy \quad\text{as } n \to \infty . \]

Proof. Set $\phi(u) = \mathbb{E}\big( e^{iuX_1} \big)$. Since $\mathbb{E}(X_1^2) < \infty$, we can differentiate $\mathbb{E}\big( e^{iuX_1} \big)$ twice under the expectation, to show that (see problem 4.2(b))
\[ \phi(0) = 1 , \qquad \phi'(0) = 0 , \qquad \phi''(0) = -1 . \]
Hence
\[ \phi_n(u) := \mathbb{E}\big( e^{iu(X_1+\ldots+X_n)/\sqrt{n}} \big) = \Big( \mathbb{E}\big( e^{i(u/\sqrt{n})X_1} \big) \Big)^n = \big( 1 - u^2/(2n) + o(u^2/n) \big)^n . \]
The complex logarithm satisfies $\log(1+z) = z + o(z)$ as $z \to 0$, so, for each $u \in \mathbb{R}$,
\[ \log \phi_n(u) = n \log\big( 1 - u^2/(2n) + o(u^2/n) \big) = -u^2/2 + o(1) \quad\text{as } n \to \infty . \]
Hence $\phi_n(u) \to e^{-u^2/2}$ for all $u$. But $e^{-u^2/2}$ is the characteristic function of the $N(0,1)$ distribution, so Levy's convergence theorem completes the proof. □
Remarks. (i) This is only the simplest version of the central limit theorem. It holds in more general cases, e.g. for non-independent or not identically distributed r.v.s.
(ii) Problem 4.5 indicates that $S_n/n$ can also converge to other (so-called stable) distributions than the Gaussian.
Appendix

A Example sheets

A.1 Example sheet 1: Set systems and measures
(b) Let $B \in \mathcal{B}$ be a Borel set with $\lambda(B) < \infty$, where $\lambda$ is the Lebesgue measure. Show that, for every $\epsilon > 0$, there exists a finite union of disjoint intervals $A = (a_1, b_1] \cup \ldots \cup (a_n, b_n]$ such that $\lambda(A \,\triangle\, B) < \epsilon$, where $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$.
(Hint: First consider all bounded sets $B$ for which the conclusion holds and show that they form a d-system.)
1.10 Completion
Let $(E, \mathcal{E}, \mu)$ be a measure space. A subset $N$ of $E$ is called null if $N \subseteq B$ for some $B \in \mathcal{E}$ with $\mu(B) = 0$. Write $\mathcal{N}$ for the set of all null sets.
(a) Prove that the family of subsets $\mathcal{C} = \{ A \cup N : A \in \mathcal{E},\ N \in \mathcal{N} \}$ is a $\sigma$-algebra.
(b) Show that the measure $\mu$ may be extended to a measure $\mu'$ on $\mathcal{C}$ with $\mu'(A \cup N) = \mu(A)$.
The $\sigma$-algebra $\mathcal{C}$ is called the completion of $\mathcal{E}$ with respect to $\mu$.

1.11 Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of events in the probability space $(\Omega, \mathcal{A}, \mathbb{P})$, i.e. $A_n \in \mathcal{A}$ for all $n$. Show that the $A_n$, $n \in \mathbb{N}$, are independent if and only if the $\sigma$-algebras which they generate, $\mathcal{A}_n = \{\emptyset, A_n, A_n^c, \Omega\}$, are independent.
1.12 (a) Let $\mu_F$ be the Lebesgue-Stieltjes measure on $\mathbb{R}$ associated with the distribution function $F$. Show that $F$ is continuous at $x$ if and only if $\mu_F(\{x\}) = 0$.
(b) Let $(F_n)_{n\in\mathbb{N}}$ be a sequence of distribution functions on $\mathbb{R}$ such that $F(x) = \lim_n F_n(x)$ exists for all $x \in \mathbb{R}$. Show that $F$ need not be a distribution function.

1.13 Cantor set
Let $C_0 = [0,1]$, and let $C_1, C_2, \ldots$ be constructed iteratively by deletion of middle thirds. Thus
\[ C_1 = [0, \tfrac13] \cup [\tfrac23, 1] , \quad C_2 = [0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1] , \ \text{and so on.} \]
The set $C = \lim_n C_n = \bigcap_{n\in\mathbb{N}} C_n$ is called the Cantor set.
(Hint: Establish a recursion relation for $F_n(x)$ and use the contraction mapping theorem.)
(c) Show that $F$ is continuous on $[0,1]$ with $F(0) = 0$, $F(1) = 1$.
(d) Show that $F$ is differentiable except on a set of measure $0$, and that $F'(x) = 0$ wherever $F$ is differentiable.
1.14 Riemann zeta function
The Riemann zeta function is given by $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$, $s > 1$.
Let $s > 1$ and $P_f : \mathcal{P}(\mathbb{N}) \to [0,1]$ be the probability measure with mass function $f(n) = n^{-s}/\zeta(s)$. For $p \in \{1, 2, \ldots\}$ let $A_p = \{ n \in \mathbb{N} : p \mid n \}$ ($p$ divides $n$).
(a) Show that the events $\{A_p : p \text{ prime}\}$ are independent. Deduce Euler's formula
\[ \frac{1}{\zeta(s)} = \prod_{p\ \text{prime}} \Big( 1 - \frac{1}{p^s} \Big) . \]
(b) Show that
\[ P_f\big( \{ n \in \mathbb{N} : n \text{ is squarefree} \} \big) = \frac{1}{\zeta(2s)} . \]
A.2 Example sheet 2

Unless otherwise specified, let $(E, \mathcal{E})$, $(F, \mathcal{F})$ be measurable spaces and $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space.

2.1 Let $f : E \to F$ be any function (not necessarily measurable).
(a) Show that $f^{-1}\big( \sigma(\mathcal{A}) \big) = \sigma\big( f^{-1}(\mathcal{A}) \big)$ for all $\mathcal{A} \subseteq \mathcal{P}(F)$.
(b) Let $f$ be $\mathcal{E}/\mathcal{F}$-measurable. Under which circumstances is $f(\mathcal{E}) \subseteq \mathcal{P}(F)$ a $\sigma$-algebra?
(c) Take $(E, \mathcal{E}) = (F, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$. Find the $\sigma$-algebras $\sigma(f_i)$ generated by the functions
\[ f_1(x) = x , \quad f_2(x) = x^2 , \quad f_3(x) = |x| , \quad f_4(x) = 1_{\mathbb{Q}}(x) . \]

2.2 Let $f_n : E \to \mathbb{R}$, $n \in \mathbb{N}$, be $\mathcal{E}/\mathcal{B}$-measurable functions. Show that also the following functions are measurable, whenever they are well defined:
(a) $f_1 + f_2$, (b) $\inf_{n\in\mathbb{N}} f_n$, (c) $\sup_{n\in\mathbb{N}} f_n$.

2.3 Let $f : E \to \mathbb{R}^d$ be written in the form $f(x) = \big( f_1(x), \ldots, f_d(x) \big)$. Show that $f$ is measurable w.r.t. $\mathcal{E}$ and $\mathcal{B}(\mathbb{R}^d)$ if and only if each $f_i : E \to \mathbb{R}$ is measurable w.r.t. $\mathcal{E}$ and $\mathcal{B}$.
2.4 Skorohod representation theorem
Let $F_n : \mathbb{R} \to [0,1]$, $n \in \mathbb{N}$, be probability distribution functions. Consider the probability space $(\Omega, \mathcal{A}, \mathbb{P})$ where $\Omega = (0,1]$, $\mathcal{A} = \mathcal{B}\big( (0,1] \big)$ are the Borel sets on $(0,1]$ and $\mathbb{P}$ is the restriction of Lebesgue measure to $\mathcal{A}$. For each $n$ define $X_n : (0,1] \to \mathbb{R}$, $X_n(\omega) = \inf\{ x : F_n(x) \ge \omega \}$.
(a) Show that the $X_n$ are random variables with distributions $F_n$. Are the $X_n$ independent?
(b)* Suppose $F(x)$ is a probability distribution function such that $\lim_n F_n(x) = F(x)$ for all $x \in \mathbb{R}$.

2.6 Let $X, X_1, X_2, \ldots$ be random variables with $X_n \overset{D}{\to} X$. Show that then also $h(X_n) \overset{D}{\to} h(X)$ for all continuous functions $h : \mathbb{R} \to \mathbb{R}$.
(Hint: Use the Skorohod representation theorem.)

2.7 Let $X_1, X_2, \ldots$ be random variables on $(\Omega, \mathcal{A}, \mathbb{P})$ and $\mathcal{T}$ the tail $\sigma$-algebra of $(X_n)_{n\in\mathbb{N}}$. For each $n \in \mathbb{N}$ let $S_n = X_1 + \ldots + X_n$. Which of the following are tail events in $\mathcal{T}$:
\[ \{ X_n \ge 0 \text{ ev.} \} , \quad \{ S_n \ge 0 \text{ i.o.} \} , \quad \big\{ \liminf_n S_n \ge 0 \big\} , \quad \big\{ \lim_n S_n \text{ exists} \big\} ? \]
2.8 Give an example of random variables $X_1, X_2, \ldots$ with
\[ \mathbb{E}\Big( \frac{X_1 + \cdots + X_n}{n} \Big) = 0 \ \text{ for each } n , \quad\text{but}\quad \frac{X_1 + \cdots + X_n}{n} \to 1 \ \text{ almost surely.} \]

2.9 Let $X, X_1, X_2, \ldots$ be random variables on $(\Omega, \mathcal{A}, \mathbb{P})$.
(a) Show that $\big\{ \omega : X_n(\omega) \to X(\omega) \big\} \in \mathcal{A}$.
(b) Show that $X_n \to X$ almost surely $\Leftrightarrow$ $\sup_{m\ge n} |X_m - X| \to 0$ in probability.

2.10 Let $X_1, X_2, \ldots$ be independent random variables with distribution $N(0,1)$. Prove that
\[ \limsup_{n\to\infty} \frac{X_n}{\sqrt{2 \log n}} = 1 \quad\text{a.s.} \]

(b) Show that $\displaystyle\int \frac{n \cos x}{1 + n^2\,x^{3/2}}\,dx \to 0$ as $n \to \infty$.
2.14 Show that the function $f(x) = x^{-1} \sin x$ is not Lebesgue integrable over $[1, \infty)$, but that
\[ \lim_{y\to\infty} \int_0^y f(x)\,dx = \frac{\pi}{2} . \]
(Use e.g. Fubini's theorem and $x^{-1} = \int_0^\infty e^{-xt}\,dt$.)

2.15 (a) Let $\mu$ be a measure on $(E, \mathcal{E})$ and $f : E \to [0,\infty)$ be $\mathcal{E}/\mathcal{B}$-measurable with $\int_E f\,d\mu < \infty$. Define $\nu(A) = \int_A f\,d\mu$ for each $A \in \mathcal{E}$. Show that $\nu$ is a measure on $(E, \mathcal{E})$ and that
\[ \int_E g\,d\nu = \int_E f g\,d\mu \quad\text{for all integrable } g : E \to \mathbb{R} . \]
(b) Let $\mu$ be a finite measure on $(E, \mathcal{E})$, $\theta : E \to E$ measurable, and $\nu = \mu \circ \theta^{-1}$ the image measure. Show that for all $\mathcal{E}/\mathcal{B}$-measurable $g : E \to [0,\infty)$
\[ \int_E g\,d\nu = \int_E (g \circ \theta)\,d\mu . \]
A.3 Example sheet 3

Unless otherwise specified, let $(E, \mathcal{E}, \mu)$ be a measure space and $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space.
3.1 Let $\mu$ be the Lebesgue measure on $\big( \mathbb{R}^2, \mathcal{B}(\mathbb{R}^2) \big)$.
(a) For $f(x,y) = \dfrac{x^2 - y^2}{(x^2 + y^2)^2}$, calculate the iterated Lebesgue integrals
\[ \int_0^1 \int_0^1 f(x,y)\,dx\,dy \quad\text{and}\quad \int_0^1 \int_0^1 f(x,y)\,dy\,dx . \]
What does the result tell about the double integral $\int_{(0,1)^2} f\,d\mu$?
(b) Show that for
\[ f(x,y) = \begin{cases} \dfrac{x y}{(x^2 + y^2)^2} , & (x,y) \neq (0,0) \\ 0 , & (x,y) = (0,0) \end{cases} \]
the iterated integrals
\[ \int_{-1}^1 \int_{-1}^1 f(x,y)\,dx\,dy \quad\text{and}\quad \int_{-1}^1 \int_{-1}^1 f(x,y)\,dy\,dx \]
coincide, but that the double integral $\int_{(-1,1)^2} f\,d\mu$ does not exist.
(c) Let $\nu$ be the counting measure on $(\mathbb{R}, \mathcal{B})$, i.e. $\nu(A)$ is equal to the number of elements in $A$ whenever $A$ is finite, and $\nu(A) = \infty$ otherwise. Denote by $\Delta = \big\{ (x,y) \in (0,1)^2 : x = y \big\}$ the diagonal in $(0,1)^2$ and calculate the iterated integrals
\[ \int_0^1 \int_0^1 1_\Delta(x,y)\,dx\,\nu(dy) \quad\text{and}\quad \int_0^1 \int_0^1 1_\Delta(x,y)\,\nu(dy)\,dx . \]
Does the result contradict Fubini's theorem?
3.2 (a) Are the following statements equivalent? (Justify your answer.)
(i) $f$ is continuous almost everywhere,
(ii) $f = g$ a.e. for a continuous function $g$.
(b) Let $X_n \sim U[-1/n, 1/n]$ be uniform random variables on $[-1/n, 1/n]$ for $n \in \mathbb{N}$. Do the $X_n$ converge, and if yes, in what sense?

3.3 Prove that the space $L^\infty(E, \mathcal{E}, \mu)$ is complete.

3.4 Let $p \in [1,\infty]$ and let $f_n, f \in L^p(E, \mathcal{E}, \mu)$ for $n \in \mathbb{N}$. Show that:
\[ f_n \to f \text{ in } L^p \quad\Rightarrow\quad f_n \to f \text{ in measure.} \]

3.5 Read handout 2 carefully. Find examples which show that the reverse implications, concerning the concepts of convergence on page 1, are in general false. How does the picture change if the measure space $(\Omega, \mathcal{A}, \mathbb{P})$ is not finite?

3.6 Let $X$ be a random variable in $\mathbb{R}$ and let $1 \le p < q < \infty$. Show that
\[ \mathbb{E}\big( |X|^p \big) = \int_0^\infty p\,\epsilon^{p-1}\,\mathbb{P}\big( |X| \ge \epsilon \big)\,d\epsilon \]
and deduce:
\[ X \in L^q(\mathbb{P}) \ \Rightarrow\ \mathbb{P}\big( |X| \ge \epsilon \big) = O(\epsilon^{-q}) \ \Rightarrow\ X \in L^p(\mathbb{P}) . \]

Remark on questions 3.7(a) and 3.8(a): Start with an indicator function and extend your argument to the general case, analogous to the proof of Lemma 3.14(ii).

3.7 A step function $g : \mathbb{R} \to \mathbb{R}$ is any finite linear combination of indicator functions of finite intervals.
(a) Show that the set of step functions $\mathcal{I}$ is dense in $L^p(\mathbb{R})$ for all $p \in [1,\infty)$, i.e. for all $f \in L^p(\mathbb{R})$ and every $\epsilon > 0$ there exists $g \in \mathcal{I}$ such that $\|f - g\|_p < \epsilon$.
(Hint: Use the result of question 1.9.)
(b) Using (a), argue that the set of continuous functions $C(\mathbb{R})$ is dense in $L^p(\mathbb{R})$, $p \in [1,\infty)$.

3.8 (a) Show that, if $X$ and $Y$ are independent random variables, then $\|X Y\|_1 = \|X\|_1\,\|Y\|_1$, but that the converse is in general not true.
(b) Show that, if $X$ and $Y$ are independent and integrable, then $\mathbb{E}(X Y) = \mathbb{E}(X)\,\mathbb{E}(Y)$.

3.9 Let $V_1 \subseteq V_2 \subseteq \ldots$ be an increasing sequence of closed subspaces of $L^2 = L^2(E, \mathcal{E}, \mu)$. For $f \in L^2$, denote by $f_n$ the orthogonal projection of $f$ on $V_n$. Show that $f_n$ converges in $L^2$.

3.10 Given a countable family of disjoint events $(G_i)_{i\in I}$, $G_i \in \mathcal{A}$, with $\bigcup_{i\in I} G_i = \Omega$, set $\mathcal{G} = \sigma(G_n : n \in \mathbb{N})$ and $V = L^2(\Omega, \mathcal{G}, \mathbb{P})$. Show that, for $X \in L^2(\Omega, \mathcal{A}, \mathbb{P})$, the conditional expectation $\mathbb{E}(X \mid \mathcal{G})$ is a version of the orthogonal projection of $X$ on $V$.

3.11 (a) Find a sequence of random variables $(X_n)_{n\in\mathbb{N}}$ which is not bounded in $L^1$, but satisfies the other condition for uniform integrability, i.e.
\[ \forall \epsilon > 0\ \exists \delta > 0\ \forall A \in \mathcal{A}\ \forall i \in I : \ \mathbb{P}(A) < \delta \ \Rightarrow\ \mathbb{E}\big( |X_i|\,1_A \big) < \epsilon . \]
(b) Find a uniformly integrable sequence of random variables $(X_n)_{n\in\mathbb{N}}$ such that
\[ X_n \to 0 \ \text{a.s.} \quad\text{and}\quad \mathbb{E}\big( \sup_n |X_n| \big) = \infty . \]

3.12 Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of identically distributed r.v.s in $L^2(\mathbb{P})$. Show that, as $n \to \infty$,
(a) for all $\epsilon > 0$, $n\,\mathbb{P}\big( |X_1| > \epsilon\sqrt{n} \big) \to 0$,
(b) $n^{-1/2} \max_{k\le n} |X_k| \to 0$ in probability,
(c) $n^{-1/2} \max_{k\le n} |X_k| \to 0$ in $L^1$.
A.4 Example sheet 4

4.1 Let $\mu_1, \mu_2$ be finite measures on $\big( \mathbb{R}^n, \mathcal{B}(\mathbb{R}^n) \big)$ with $\int g\,d\mu_1 = \int g\,d\mu_2$ for all bounded continuous $g$. Show that $\mu_1 = \mu_2$.

4.2 (b) Let $X$ be a random variable with $\mathbb{E}\big( |X|^k \big) < \infty$ and characteristic function $\phi$. Show that $\phi$ is $k$ times differentiable with $\phi^{(k)}(0) = i^k\,\mathbb{E}(X^k)$.

4.5 Cauchy distribution
A random variable $X$ is Cauchy if it has density
\[ f(x) = \frac{1}{\pi (1 + x^2)} , \quad x \in \mathbb{R} . \]
(a) Show that the corresponding characteristic function is given by $\phi(u) = e^{-|u|}$.
(b) Show also that, if $X_1, \ldots, X_n$ are independent Cauchy random variables, then $(X_1 + \cdots + X_n)/n$ is also Cauchy.
Comment on this in the light of the strong law of large numbers and the central limit theorem.

4.6 Let $X, Y \sim N(0,1)$ and $Z \sim N(0, \sigma^2)$ be independent Gaussian random variables. Calculate the characteristic function of $\Xi = XY - Z$.

4.7 Suppose $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R}$. Prove Proposition 5.3, i.e. show that
(a) $\mathbb{E}(X) = \mu$, (b) $\mathrm{var}(X) = \sigma^2$, (c) $aX + b \sim N(a\mu + b, a^2\sigma^2)$, (d) $\phi_X(u) = e^{iu\mu - u^2\sigma^2/2}$.

4.10 Translation map: for $a \in [0,1)$ define $\theta_a : [0,1) \to [0,1)$ by $\theta_a(x) = (x + a) \bmod 1$. Show that $\theta_a$ is measure-preserving, and ergodic if and only if $a$ is irrational.
B Handouts

B.1 Handout 1: Proof of Caratheodory's extension theorem

Define the outer measure
\[ \mu^*(B) = \inf \sum_{n} \mu(A_n) , \]
where the infimum is taken over all sequences $(A_n)_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that $B \subseteq \bigcup_n A_n$, and is taken to be $\infty$ if there is no such sequence. Note that $\mu^*$ is increasing and $\mu^*(\emptyset) = 0$. Let us say that $A \subseteq E$ is $\mu^*$-measurable if, for all $B \subseteq E$,
\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c) . \]
Write $\mathcal{M}$ for the set of all $\mu^*$-measurable sets. We shall show that $\mathcal{M}$ is a $\sigma$-algebra containing $\mathcal{E}$ and that $\mu^*$ is a measure on $\mathcal{M}$, extending $\mu$. This will prove the theorem.
Step I. $\mu^*$ is countably subadditive: if $A \subseteq \bigcup_n B_n$, then $\mu^*(A) \le \sum_n \mu^*(B_n)$. Given $\epsilon > 0$, choose for each $n$ a sequence $(A_{nm})_{m\in\mathbb{N}}$ in $\mathcal{E}$ with $B_n \subseteq \bigcup_m A_{nm}$ and $\sum_m \mu(A_{nm}) \le \mu^*(B_n) + \epsilon\,2^{-n}$. Then
\[ A \subseteq \bigcup_n \bigcup_m A_{nm} \quad\text{and}\quad \sum_n \sum_m \mu(A_{nm}) \le \sum_n \mu^*(B_n) + \epsilon , \]
and the claim follows since $\epsilon > 0$ was arbitrary.

Step II. $\mu^*$ extends $\mu$: for $A \in \mathcal{E}$ and any covering sequence $(A_n)$ in $\mathcal{E}$, countable additivity of $\mu$ on $\mathcal{E}$ gives $\mu(A) \le \sum_n \mu(A_n)$. On taking the infimum over all such sequences, we see that $\mu(A) \le \mu^*(A)$. On the other hand, it is obvious that $\mu^*(A) \le \mu(A)$ for $A \in \mathcal{E}$.
Step III. We show that $\mathcal{M}$ contains $\mathcal{E}$.
Let $A \in \mathcal{E}$ and $B \subseteq E$. We have to show that
\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c) . \]
By subadditivity of $\mu^*$, it is enough to show that
\[ \mu^*(B) \ge \mu^*(B \cap A) + \mu^*(B \cap A^c) . \]
If $\mu^*(B) = \infty$, this is trivial, so let us assume that $\mu^*(B) < \infty$. Then, given $\epsilon > 0$, we can find a sequence $(A_n)_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that
\[ B \subseteq \bigcup_n A_n , \qquad \mu^*(B) + \epsilon \ge \sum_n \mu(A_n) . \]
Then
\[ B \cap A \subseteq \bigcup_n (A_n \cap A) , \qquad B \cap A^c \subseteq \bigcup_n (A_n \cap A^c) , \]
so that
\[ \mu^*(B \cap A) + \mu^*(B \cap A^c) \le \sum_n \mu(A_n \cap A) + \sum_n \mu(A_n \cap A^c) = \sum_n \mu(A_n) \le \mu^*(B) + \epsilon . \]
Now let $A = \bigcup_n A_n$ with disjoint $A_n \in \mathcal{M}$, and let $B \subseteq E$; inductively one obtains
\[ \mu^*(B) \ge \sum_{k=1}^{n} \mu^*(B \cap A_k) + \mu^*\big( B \cap A_1^c \cap \ldots \cap A_n^c \big) . \]
Note that $\mu^*\big( B \cap A_1^c \cap \ldots \cap A_n^c \big) \ge \mu^*(B \cap A^c)$ for all $n$. Hence, on letting $n \to \infty$ and using countable subadditivity, we get
\[ \mu^*(B) \ge \sum_{n=1}^{\infty} \mu^*(B \cap A_n) + \mu^*(B \cap A^c) \ge \mu^*(B \cap A) + \mu^*(B \cap A^c) . \]
The reverse inequality holds by subadditivity, so we have equality. Hence $A \in \mathcal{M}$ and, setting $B = A$, we get
\[ \mu^*(A) = \sum_{n=1}^{\infty} \mu^*(A_n) . \quad\square \]
B.2 Handout 2: Convergence of random variables

Let $X, X_1, X_2, \ldots$ be random variables on $(\Omega, \mathcal{A}, \mathbb{P})$. Say that, as $n \to \infty$,
(i) $X_n \to X$ everywhere or pointwise if $X_n(\omega) \to X(\omega)$ for all $\omega \in \Omega$,
(ii) $X_n \to X$ a.s. if $\mathbb{P}(X_n \not\to X) = 0$,
(iii) $X_n \to X$ in $L^p$ for $p \in [1,\infty]$ if $\|X_n - X\|_p \to 0$,
(iv) $X_n \to X$ in probability if $\mathbb{P}\big( |X_n - X| > \epsilon \big) \to 0$ for all $\epsilon > 0$.
These notions are related as follows ($q \ge p \ge 1$):
\[ X_n \overset{\text{a.s.}}{\to} X \ \Rightarrow\ X_n \overset{\mathbb{P}}{\to} X \ \Rightarrow\ X_n \overset{D}{\to} X , \qquad X_n \overset{L^q}{\to} X \ \Rightarrow\ X_n \overset{L^p}{\to} X \ \Rightarrow\ X_n \overset{\mathbb{P}}{\to} X . \]
Proofs are given in Theorem 2.10, Proposition 2.11 (see below), Theorem 3.9 (see below), Corollary 5.3 and example sheet question 3.2.
Thus $\mathbb{P}\big( |X_n - X| > \epsilon \big) \to 0$, and as $n \to \infty$
\[ \int_{\mathbb{R}} f\,d\mu_n = \int_\Omega f(Y_n)\,d\mathbb{P} \to \int_\Omega f(Y)\,d\mathbb{P} = \int_{\mathbb{R}} f\,d\mu \quad\text{by bounded convergence.} \]
For the converse direction, for $y \in \mathbb{R}$ with $\mu(\{y\}) = 0$ and $\delta > 0$, define $f_\delta, g_\delta \in C_b(\mathbb{R}, \mathbb{R})$ as in the proof of Theorem 3.9,
\[ f_\delta(x) = \begin{cases} 1 , & x \le y \\ 1 - (x-y)/\delta , & x \in (y, y+\delta) \\ 0 , & \text{otherwise} \end{cases} \qquad g_\delta(x) = \begin{cases} 1 + (x-y)/\delta , & x \in (y-\delta, y] \\ 1 - (x-y)/\delta , & x \in (y, y+\delta) \\ 0 , & \text{otherwise} \end{cases} \]
such that $\big| 1_{(-\infty,y]} - f_\delta \big| \le g_\delta$, and thus $\big| \int_{\mathbb{R}} (1_{(-\infty,y]} - f_\delta)\,d\nu \big| \le \int_{\mathbb{R}} g_\delta\,d\nu$ for every measure $\nu$, since $f_\delta, g_\delta \in C_b(\mathbb{R}, \mathbb{R})$. Now, $\int_{\mathbb{R}} g_\delta\,d\mu \le \mu\big( (y-\delta, y+\delta) \big) \to 0$ as $\delta \to 0$, since $\mu(\{y\}) = 0$, so $X_n \overset{D}{\to} X$.
On $\Omega = (0,1]$ with Lebesgue measure define the Rademacher functions
\[ R_1 = 1_{(\frac12, 1]} , \quad R_2 = 1_{(\frac14, \frac12]} + 1_{(\frac34, 1]} , \quad R_3 = 1_{(\frac18, \frac14]} + 1_{(\frac38, \frac12]} + 1_{(\frac58, \frac34]} + 1_{(\frac78, 1]} , \ \ldots \]
thus in general $R_n = 1_{A_n}$, where
\[ A_n = \bigcup_{k=1}^{2^{n-1}} I_{n,k} \quad\text{and}\quad I_{n,k} = \Big( \frac{2k-1}{2^n},\ \frac{2k}{2^n} \Big] . \]
With problem 1.11 the $R_n$ are independent if and only if the $A_n$ are. To see this, take $n_1 < \ldots < n_L$ for some $L \in \mathbb{N}$; then $A_{n_1}, \ldots, A_{n_L}$ are independent by induction, using that each level of dyadic intervals splits the previous ones in half.
Take a bijection $m : \mathbb{N}^2 \to \mathbb{N}$ and set
\[ Y_{k,n} = R_{m(k,n)} \quad\text{and}\quad Y_n = \sum_{k=1}^{\infty} 2^{-k}\,Y_{k,n} . \]
Then $Y_1, Y_2, \ldots$ are independent and $\mathbb{P}\big( i\,2^{-k} < Y_n \le (i+1)\,2^{-k} \big) = 2^{-k}$ for all $n, k, i$.
Thus $\mathbb{P}(Y_n \le x) = x$ for all $x \in (0,1]$. So $X_n = G_n(Y_n)$ are independent random variables with distribution functions $F_n$, which can be shown analogously to (b). □
B.3 Handout 3: Connection between Lebesgue and Riemann integration

$f : [a,b] \to \mathbb{R}$ is Riemann integrable (R-integrable) with integral $R$ if its Riemann sums converge: for every $\epsilon > 0$ there is $\delta > 0$ such that
\[ \Big| \sum_{j=1}^{n} f(x_j)\,|I_j| - R \Big| < \epsilon \quad\text{for all } x_j \in I_j \tag{B.1} \]
for some finite partition $\{I_1, \ldots, I_n\}$ of $[a,b]$ into subintervals of lengths $|I_j| < \delta$.
This corresponds to an approximation of $f$ by step functions — simple functions which are constant on intervals.
The above description is adapted from R.L. Schilling, Measures, Integrals and Martingales, CUP 2005. He writes:
"... the Riemann sums partition the domain of the function without taking into account the shape of the function, thus slicing up the area under the function vertically. Lebesgue's approach is exactly the opposite: the domain is partitioned according to the values of the function at hand, leading to a horizontal decomposition of the area."

Theorem. Lebesgue's integrability criterion
$f : [a,b] \to \mathbb{R}$ is R-integrable if and only if $f$ is bounded on $[a,b]$ and continuous almost everywhere, i.e. the set of points in $[a,b]$ where $f$ is not continuous has Lebesgue measure $0$.

Corollary. Suppose $f : [a,b] \to \mathbb{R}$ is R-integrable. Then it is also Lebesgue integrable (L-integrable) and the values of both integrals coincide.

Proof. For a partition $\{I_1, \ldots, I_n\}$ define the step functions
\[ \underline{g}_n = \sum_{j=1}^{n} \Big( \inf_{x\in I_j} f(x) \Big)\,1_{I_j} \quad\text{and}\quad \bar{g}_n = \sum_{j=1}^{n} \Big( \sup_{x\in I_j} f(x) \Big)\,1_{I_j} , \]
so that $\underline{g}_n \le f \le \bar{g}_n$. If $f$ is bounded and continuous a.e., then along refining partitions $\underline{g}_n \to f \leftarrow \bar{g}_n$ a.e., and bounded convergence shows that both $\int_a^b \underline{g}_n\,dx$ and $\int_a^b \bar{g}_n\,dx$ converge to the L-integral of $f$; so the sum in (B.1) converges and $R = \int_a^b f(x)\,dx$.
On the other hand, if the sum in (B.1) converges, then $\int_a^b \big( \bar{g}_n(x) - \underline{g}_n(x) \big)\,dx \to 0$, and thus for the limit functions $\underline{g} = \bar{g}$ a.e., and $f$ is continuous a.e. since $\underline{g} \le f \le \bar{g}$.
If $f$ were not bounded, one could choose the $x_j$ in (B.1) such that the sum does not converge. □
On the other hand, not every L-integrable function is also R-integrable. The standard example is $f = 1_{[0,1]\cap\mathbb{Q}}$, which can be made R-integrable by changing it on a set of L-measure $0$.
This might suggest that for every L-integrable $f$ there exists an R-integrable $g$ with $f = g$ a.e. This is not true, as demonstrated by the following example:
Let $\{r_1, r_2, \ldots\}$ be an enumeration of the rationals in $(0,1)$. For small $\epsilon > 0$ and each $n \in \mathbb{N}$ choose an open interval $I_n \subseteq (0,1)$ with $r_n \in I_n$ and L-measure $\lambda(I_n) < \epsilon\,2^{-n}$. Put $A = \bigcup_n I_n$. Then $A$ is dense in $(0,1)$ with $0 < \lambda(A) < \epsilon$, and thus for any non-degenerate subinterval $I$ of $(0,1)$, $\lambda(A \cap I) > 0$.
Take $f = 1_A$ and suppose that $f = g$ a.e. Let $\{I_j\}$ be some decomposition of $(0,1)$ into subintervals. Since for each $j$, $\lambda\big( I_j \cap A \cap \{f = g\} \big) = \lambda(I_j \cap A) > 0$, we have $g(x_j) = f(x_j) = 1$ for some $x_j \in I_j \cap A$, and thus
\[ \sum_{j=1}^{n} g(x_j)\,|I_j| = 1 . \tag{B.2} \]
If $g$ were R-integrable, its integral would have to coincide with the L-integral $\int_0^1 f\,d\lambda = \lambda(A) < \epsilon$, which is in contradiction to (B.2).

B.4 Handout 4: Ergodic theorems
Handout 4 contains statements and proofs of Lemma 6.5 and Theorems 6.6 and 6.7, which
can be found in Section 6.3.