Terence Tao
Department of Mathematics, UCLA, Los Angeles, CA
90095
Email address: tao@math.ucla.edu
To Garth Gaudry, who set me on the road;
To my family, for their constant support;
And to the readers of my blog, for their feedback and contributions.
Contents
Preface
Notation
Acknowledgments
Chapter 1. Measure theory
§1.1. Prologue: The problem of measure
§1.2. Lebesgue measure
§1.3. The Lebesgue integral
§1.4. Abstract measure spaces
§1.5. Modes of convergence
§1.6. Differentiation theorems
§1.7. Outer measures, premeasures, and product measures
Chapter 2. Related articles
§2.1. Problem solving strategies
§2.2. The Rademacher differentiation theorem
§2.3. Probability spaces
§2.4. Infinite product spaces and the Kolmogorov extension theorem
Bibliography
Index
Preface
In the fall of 2010, I taught an introductory one-quarter course on
graduate real analysis, focusing in particular on the basics of measure
and integration theory, both in Euclidean spaces and in abstract
measure spaces. This text is based on my lecture notes of that course,
which are also available online on my blog terrytao.wordpress.com,
together with some supplementary material, such as a section on problem
solving strategies in real analysis (Section 2.1) which evolved from
discussions with my students.
This text is intended to form a prequel to my graduate text
[Ta2010] (henceforth referred to as An epsilon of room, Vol. I),
which is an introduction to the analysis of Hilbert and Banach spaces
(such as $L^p$ and Sobolev spaces), point-set topology, and related
topics such as Fourier analysis and the theory of distributions; together,
they serve as a text for a complete first-year graduate course in real
analysis.
The approach to measure theory here is inspired by the text
[StSk2005], which was used as a secondary text in my course. In
particular, the first half of the course is devoted almost exclusively
to measure theory on Euclidean spaces $\mathbf{R}^d$ (starting with the more
elementary Jordan-Riemann-Darboux theory, and only then moving
on to the more sophisticated Lebesgue theory), deferring the abstract
aspects of measure theory to the second half of the course. I found
that this approach strengthened the student’s intuition in the early
stages of the course, and helped provide motivation for more abstract
constructions, such as Carathéodory’s general construction of a measure
from an outer measure.
Most of the material here is self-contained, assuming only an
undergraduate knowledge of real analysis (and in particular, of the
Heine-Borel theorem, which we will use as the foundation for our
construction of Lebesgue measure); a secondary real analysis text can
be used in conjunction with this one, but it is not strictly necessary.
A small number of exercises, however, will require some knowledge of
point-set topology or of set-theoretic concepts such as cardinals and
ordinals.
A large number of exercises are interspersed throughout the text,
and it is intended that the reader perform a signiﬁcant fraction of
these exercises while going through the text. Indeed, many of the key
results and examples in the subject will in fact be presented through
the exercises. In my own course, I used the exercises as the basis
for the examination questions, and signalled this well in advance, to
encourage the students to attempt as many of the exercises as they
could as preparation for the exams.
The core material is contained in Chapter 1, and already comprises
a full quarter’s worth of material. Section 2.1 is a much more
informal section than the rest of the book, focusing on describing
problem solving strategies, either specific to real analysis exercises, or
more generally applicable to a wider set of mathematical problems;
this section evolved from various discussions with students throughout
the course. The remaining three sections in Chapter 2 are optional
topics, which require understanding of most of the material in
Chapter 1 as a prerequisite (although Section 2.3 can be read after
completing Section 1.4).
Notation
For reasons of space, we will not be able to define every single mathematical
term that we use in this book. If a term is italicised for
reasons other than emphasis or for definition, then it denotes a standard
mathematical object, result, or concept, which can be easily
looked up in any number of references. (In the blog version of the
book, many of these terms were linked to their Wikipedia pages, or
other online reference pages.)
Given a subset $E$ of a space $X$, the indicator function $1_E : X \to \mathbf{R}$
is defined by setting $1_E(x)$ equal to $1$ for $x \in E$ and equal to $0$ for
$x \notin E$.
For any natural number $d$, we refer to the vector space
\[ \mathbf{R}^d := \{ (x_1, \dots, x_d) : x_1, \dots, x_d \in \mathbf{R} \} \]
as ($d$-dimensional) Euclidean space. A vector $(x_1, \dots, x_d)$ in $\mathbf{R}^d$ has length
\[ |(x_1, \dots, x_d)| := (x_1^2 + \dots + x_d^2)^{1/2} \]
and two vectors $(x_1, \dots, x_d)$, $(y_1, \dots, y_d)$ have dot product
\[ (x_1, \dots, x_d) \cdot (y_1, \dots, y_d) := x_1 y_1 + \dots + x_d y_d. \]
The extended nonnegative real axis $[0, +\infty]$ is the nonnegative
real axis $[0, +\infty) := \{ x \in \mathbf{R} : x \geq 0 \}$ with an additional element
adjoined to it, which we label $+\infty$; we will need to work with this
system because many sets (e.g. $\mathbf{R}^d$) will have infinite measure. Of
course, $+\infty$ is not a real number, but we think of it as an extended real
number. We extend the addition, multiplication, and order structures
on $[0, +\infty)$ to $[0, +\infty]$ by declaring
\[ +\infty + x = x + +\infty = +\infty \]
for all $x \in [0, +\infty]$,
\[ +\infty \cdot x = x \cdot +\infty = +\infty \]
for all nonzero $x \in (0, +\infty]$,
\[ +\infty \cdot 0 = 0 \cdot +\infty = 0, \]
and
\[ x < +\infty \text{ for all } x \in [0, +\infty). \]
Most of the laws of algebra for addition, multiplication, and order
continue to hold in this extended number system; for instance addition
and multiplication are commutative and associative, with the
latter distributing over the former, and an order relation $x \leq y$ is
preserved under addition or multiplication of both sides of that relation
by the same quantity. However, we caution that the laws of
cancellation do not apply once some of the variables are allowed to be
infinite; for instance, we cannot deduce $x = y$ from $+\infty + x = +\infty + y$
or from $+\infty \cdot x = +\infty \cdot y$. This is related to the fact that the forms
$+\infty - +\infty$ and $+\infty / +\infty$ are indeterminate (one cannot assign a
value to them without breaking a lot of the rules of algebra). A general
rule of thumb is that if one wishes to use cancellation (or proxies
for cancellation, such as subtraction or division), this is only safe if
one can guarantee that all quantities involved are finite (and in the
case of multiplicative cancellation, the quantity being cancelled also
needs to be nonzero, of course). However, as long as one avoids using
cancellation and works exclusively with nonnegative quantities,
there is little danger in working in the extended real number system.
We note also that once one adopts the convention $+\infty \cdot 0 =
0 \cdot +\infty = 0$, then multiplication becomes upward continuous (in the
sense that whenever $x_n \in [0, +\infty]$ increases to $x \in [0, +\infty]$, and
$y_n \in [0, +\infty]$ increases to $y \in [0, +\infty]$, then $x_n y_n$ increases to $xy$)
but not downward continuous (e.g. $1/n \to 0$ but $1/n \cdot +\infty \not\to 0 \cdot +\infty$).
This asymmetry will ultimately cause us to define integration
from below rather than from above, which leads to other asymmetries
(e.g. the monotone convergence theorem (Theorem 1.4.44) applies
for monotone increasing functions, but not necessarily for monotone
decreasing ones).
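The asymmetry between upward and downward continuity can be checked numerically. The following Python fragment is only an illustrative sketch: the helper `ext_mul` and the use of `math.inf` as a stand-in for $+\infty$ are assumptions of this example and are not part of the text. A helper is needed because floating-point arithmetic treats `0 * inf` as undefined (`nan`) rather than adopting the convention $+\infty \cdot 0 = 0$.

```python
import math

INF = math.inf  # stands in for +infinity; an assumption of this sketch

def ext_mul(x: float, y: float) -> float:
    """Multiply in [0, +inf] with the convention (+inf) * 0 = 0 * (+inf) = 0."""
    if x < 0 or y < 0:
        raise ValueError("only nonnegative extended reals are supported")
    if x == 0 or y == 0:
        return 0.0
    return x * y

# Upward continuity: x_n = 1 - 1/n increases to 1 and y_n = n increases to
# +inf; the products x_n * y_n = n - 1 increase (to 1 * (+inf) = +inf).
increasing_products = [ext_mul(1 - 1 / n, float(n)) for n in range(1, 6)]

# Failure of downward continuity: 1/n decreases to 0, yet (1/n) * (+inf) is
# +inf for every n, while the product of the limits, 0 * (+inf), is 0.
decreasing_products = [ext_mul(1 / n, INF) for n in range(1, 6)]
limit_product = ext_mul(0.0, INF)
```

Here every entry of `decreasing_products` is infinite while `limit_product` is zero, which is exactly the failure of downward continuity described above.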
Remark 0.0.1. Note that there is a tradeoff here: if one wants
to keep as many useful laws of algebra as one can, then one can
add in infinity, or have negative numbers, but it is difficult to have
both at the same time. Because of this tradeoff, we will see two
overlapping types of measure and integration theory: the nonnegative
theory, which involves quantities taking values in $[0, +\infty]$, and the
absolutely integrable theory, which involves quantities taking values in
$(-\infty, +\infty)$ or $\mathbf{C}$. For instance, the fundamental convergence theorem
for the former theory is the monotone convergence theorem (Theorem
1.4.44), while the fundamental convergence theorem for the latter is
the dominated convergence theorem (Theorem 1.4.49). Both branches
of the theory are important, and both will be covered in later notes.
One important feature of the extended nonnegative real axis is
that all sums are convergent: given any sequence $x_1, x_2, \dots \in [0, +\infty]$,
we can always form the sum
\[ \sum_{n=1}^\infty x_n \in [0, +\infty] \]
as the limit of the partial sums $\sum_{n=1}^N x_n$, which may be either finite
or infinite. An equivalent definition of this infinite sum is as the
supremum of all finite subsums:
\[ \sum_{n=1}^\infty x_n = \sup_{F \subset \mathbf{N}, F \text{ finite}} \sum_{n \in F} x_n. \]
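Motivated by this, given any collection $(x_\alpha)_{\alpha \in A}$ of numbers
$x_\alpha \in [0, +\infty]$ indexed by an arbitrary set $A$ (finite or infinite, countable or
uncountable), we can define the sum $\sum_{\alpha \in A} x_\alpha$ by the formula
\[ \sum_{\alpha \in A} x_\alpha = \sup_{F \subset A, F \text{ finite}} \sum_{\alpha \in F} x_\alpha. \tag{0.1} \]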
Note from this definition that one can relabel the collection in an
arbitrary fashion without affecting the sum; more precisely, given
any bijection $\phi : B \to A$, one has the change of variables formula
\[ \sum_{\alpha \in A} x_\alpha = \sum_{\beta \in B} x_{\phi(\beta)}. \tag{0.2} \]
Note that when dealing with signed sums, the above rearrangement
identity can fail when the series is not absolutely convergent (cf. the
Riemann rearrangement theorem).
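To see the failure for signed sums concretely, one can compare partial sums of the alternating harmonic series in its usual order with those of a rearrangement. The sketch below is an illustration only; the particular rearrangement (two positive terms followed by one negative term) and the function names are choices made for this example, and the stated limits $\log 2$ and $\frac{3}{2} \log 2$ are the classical values for this pair of orderings.

```python
import math

def alternating_partial(n_terms: int) -> float:
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_partial(n_blocks: int) -> float:
    """Partial sum of the same terms, reordered as two positive terms
    followed by one negative term: 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ..."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

usual = alternating_partial(200_000)       # close to log 2
rearranged = rearranged_partial(100_000)   # close to (3/2) log 2
```

Both computations use every term $\pm 1/k$ exactly once in the limit, yet the two values differ by roughly $0.35$; no such discrepancy can occur for sums of nonnegative terms, by (0.2).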
Exercise 0.0.1. If $(x_\alpha)_{\alpha \in A}$ is a collection of numbers $x_\alpha \in [0, +\infty]$
such that $\sum_{\alpha \in A} x_\alpha < \infty$, show that $x_\alpha = 0$ for all but at most
countably many $\alpha \in A$, even if $A$ itself is uncountable.
We will rely frequently on the following basic fact (a special case
of the Fubini-Tonelli theorem, Corollary 1.7.23):
Theorem 0.0.2 (Tonelli’s theorem for series). Let $(x_{n,m})_{n,m \in \mathbf{N}}$ be a
doubly infinite sequence of extended nonnegative reals $x_{n,m} \in [0, +\infty]$.
Then
\[ \sum_{(n,m) \in \mathbf{N}^2} x_{n,m} = \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m} = \sum_{m=1}^\infty \sum_{n=1}^\infty x_{n,m}. \]
Informally, Tonelli’s theorem asserts that we may rearrange infinite
series with impunity as long as all summands are nonnegative.
Proof. We shall just show the equality of the first and second expressions;
the equality of the first and third is proven similarly.
We first show that
\[ \sum_{(n,m) \in \mathbf{N}^2} x_{n,m} \leq \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m}. \]
Let $F$ be any finite subset of $\mathbf{N}^2$. Then $F \subset \{1, \dots, N\} \times \{1, \dots, N\}$
for some finite $N$, and thus (by the nonnegativity of the $x_{n,m}$)
\[ \sum_{(n,m) \in F} x_{n,m} \leq \sum_{(n,m) \in \{1,\dots,N\} \times \{1,\dots,N\}} x_{n,m}. \]
The right-hand side can be rearranged as
\[ \sum_{n=1}^N \sum_{m=1}^N x_{n,m}, \]
which is clearly at most $\sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m}$ (again by nonnegativity
of $x_{n,m}$). This gives
\[ \sum_{(n,m) \in F} x_{n,m} \leq \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m} \]
for any finite subset $F$ of $\mathbf{N}^2$, and the claim then follows from (0.1).
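It remains to show the reverse inequality
\[ \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m}. \]
It suffices to show that
\[ \sum_{n=1}^N \sum_{m=1}^\infty x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m} \]
for each finite $N$.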
Fix $N$. As each $\sum_{m=1}^\infty x_{n,m}$ is the limit of $\sum_{m=1}^M x_{n,m}$, the left-hand
side is the limit of $\sum_{n=1}^N \sum_{m=1}^M x_{n,m}$ as $M \to \infty$. Thus it
suffices to show that
\[ \sum_{n=1}^N \sum_{m=1}^M x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m} \]
for each finite $M$. But the left-hand side is
\[ \sum_{(n,m) \in \{1,\dots,N\} \times \{1,\dots,M\}} x_{n,m}, \]
and the claim follows.
Remark 0.0.3. Note how important it was that the $x_{n,m}$ were nonnegative
in the above argument. In the signed case, one needs an
additional assumption of absolute summability of $x_{n,m}$ on $\mathbf{N}^2$ before
one is permitted to interchange sums; this is Fubini’s theorem for
series, which we will encounter later in this text. Without absolute
summability or nonnegativity hypotheses, the theorem can fail (consider
for instance the case when $x_{n,m}$ equals $+1$ when $n = m$, $-1$
when $n = m + 1$, and $0$ otherwise).
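The counterexample in this remark can be verified directly, since every row and every column of the array has at most two nonzero entries. The following Python sketch (the function name `x` and the cutoff `SIZE` are choices made for this illustration) computes the two iterated sums exactly.

```python
def x(n: int, m: int) -> int:
    """The array from the remark: +1 when n = m, -1 when n = m + 1, else 0."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

SIZE = 50  # number of rows and columns inspected; a choice made for this sketch

# For fixed n, x(n, m) vanishes once m > n, so summing m up to n + 1 is exact.
row_sums = [sum(x(n, m) for m in range(1, n + 2)) for n in range(1, SIZE + 1)]
# For fixed m, x(n, m) vanishes once n > m + 1, so summing n up to m + 2 is exact.
col_sums = [sum(x(n, m) for n in range(1, m + 3)) for m in range(1, SIZE + 1)]

rows_then_total = sum(row_sums)  # sum over n of (sum over m)
cols_then_total = sum(col_sums)  # sum over m of (sum over n)
```

The first row sums to $1$ and every other row sums to $0$, whereas every column sums to $0$; thus the two iterated sums are $1$ and $0$ respectively, no matter how large `SIZE` is taken.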
Exercise 0.0.2 (Tonelli’s theorem for series over arbitrary sets). Let
$A, B$ be sets (possibly infinite or uncountable), and $(x_{n,m})_{n \in A, m \in B}$
be a doubly infinite sequence of extended nonnegative reals $x_{n,m} \in
[0, +\infty]$ indexed by $A$ and $B$. Show that
\[ \sum_{(n,m) \in A \times B} x_{n,m} = \sum_{n \in A} \sum_{m \in B} x_{n,m} = \sum_{m \in B} \sum_{n \in A} x_{n,m}. \]
(Hint: although not strictly necessary, you may find it convenient to
first establish the fact that if $\sum_{n \in A} x_n$ is finite, then $x_n$ is nonzero
for at most countably many $n$.)
Next, we recall the axiom of choice, which we shall be assuming
throughout the text:
Axiom 0.0.4 (Axiom of choice). Let $(E_\alpha)_{\alpha \in A}$ be a family of non-empty
sets $E_\alpha$, indexed by an index set $A$. Then we can find a family
$(x_\alpha)_{\alpha \in A}$ of elements $x_\alpha$ of $E_\alpha$, indexed by the same set $A$.
This axiom is trivial when $A$ is a singleton set, and from mathematical
induction one can also prove it without difficulty when $A$
is finite. However, when $A$ is infinite, one cannot deduce this axiom
from the other axioms of set theory, but must explicitly add it to the
list of axioms. We isolate the countable case as a particularly useful
corollary (though one which is strictly weaker than the full axiom of
choice):
Corollary 0.0.5 (Axiom of countable choice). Let $E_1, E_2, E_3, \dots$ be
a sequence of non-empty sets. Then one can find a sequence $x_1, x_2, \dots$
such that $x_n \in E_n$ for all $n = 1, 2, 3, \dots$.
Remark 0.0.6. The question of how much of real analysis still survives
when one is not permitted to use the axiom of choice is a delicate
one, involving a fair amount of logic and descriptive set theory to answer.
We will not discuss these matters in this text. We will however
note a theorem of Gödel [Go1938] that states that any statement that
can be phrased in the first-order language of Peano arithmetic, and
which is proven with the axiom of choice, can also be proven without
the axiom of choice. So, roughly speaking, Gödel’s theorem tells us
that for any “finitary” application of real analysis (which includes
most of the “practical” applications of the subject), it is safe to use
the axiom of choice; it is only when asking questions about “infinitary”
objects that are beyond the scope of Peano arithmetic that one
can encounter statements that are provable using the axiom of choice,
but are not provable without it.
Acknowledgments
This text was strongly influenced by the real analysis text of Stein
and Shakarchi [StSk2005], which was used as a secondary text when
teaching the course on which these notes were based. In particular,
the strategy of focusing first on Lebesgue measure and Lebesgue integration,
before moving onwards to abstract measure and integration
theory, was directly inspired by the treatment in [StSk2005], and
the material on differentiation theorems also closely follows that in
[StSk2005]. On the other hand, our discussion here differs from that
in [StSk2005] in other respects; for instance, a far greater emphasis
is placed on Jordan measure and the Riemann integral as being an
elementary precursor to Lebesgue measure and the Lebesgue integral.
I am greatly indebted to my students of the course on which this
text was based, as well as many further commenters on my blog,
including Marco Angulo, J. Balachandran, Farzin Barekat, Marek
Bernát, Lewis Bowen, Chris Breeden, Danny Calegari, Yu Cao, Chandrasekhar,
David Chang, Nick Cook, Damek Davis, Eric Davis, Marton
Eekes, Wenying Gan, Nick Gill, Ulrich Groh, Tim Gowers, Laurens
Gunnarsen, Tobias Hagge, Xueping Huang, Bo Jacoby, Apoorva
Khare, Shiping Liu, Colin McQuillan, David Milovich, Hossein Naderi,
Brent Nelson, Constantin Niculescu, Mircea Petrache, Walt Pohl,
Jim Ralston, David Roberts, Mark Schwarzmann, Vladimir Slepnev,
David Speyer, Tim Sullivan, Jonathan Weinstein, Duke Zhang, Lei
Zhang, Pavel Zorin, and several anonymous commenters, for providing
corrections and useful commentary on the material here. These
comments can be viewed online at
terrytao.wordpress.com/category/teaching/245arealanalysis
The author is supported by a grant from the MacArthur Foundation,
by NSF grant DMS-0649473, and by the NSF Waterman award.
Chapter 1
Measure theory
1.1. Prologue: The problem of measure
One of the most fundamental concepts in Euclidean geometry is that
of the measure $m(E)$ of a solid body $E$ in one or more dimensions. In
one, two, and three dimensions, we refer to this measure as the length,
area, or volume of $E$ respectively. In the classical approach to geometry,
the measure of a body was often computed by partitioning that
body into finitely many components, moving around each component
by a rigid motion (e.g. a translation or rotation), and then reassembling
those components to form a simpler body which presumably
has the same area. One could also obtain lower and upper bounds on
the measure of a body by computing the measure of some inscribed
or circumscribed body; this ancient idea goes all the way back to the
work of Archimedes at least. Such arguments can be justified by an
appeal to geometric intuition, or simply by postulating the existence
of a measure $m(E)$ that can be assigned to all solid bodies $E$, and
which obeys a collection of geometrically reasonable axioms. One can
also justify the concept of measure on “physical” or “reductionistic”
grounds, viewing the measure of a macroscopic body as the sum of
the measures of its microscopic components.
With the advent of analytic geometry, however, Euclidean geometry
became reinterpreted as the study of Cartesian products $\mathbf{R}^d$ of
the real line $\mathbf{R}$. Using this analytic foundation rather than the classical
geometrical one, it was no longer intuitively obvious how to define
the measure $m(E)$ of a general¹ subset $E$ of $\mathbf{R}^d$; we will refer to this
(somewhat vaguely defined) problem of writing down the “correct”
definition of measure as the problem of measure.
To see why this problem exists at all, let us try to formalise some
of the intuition for measure discussed earlier. The physical intuition
of defining the measure of a body $E$ to be the sum of the measure
of its component “atoms” runs into an immediate problem: a typical
solid body would consist of an infinite (and uncountable) number of
points, each of which has a measure of zero; and the product $\infty \cdot 0$ is
indeterminate. To make matters worse, two bodies that have exactly
the same number of points need not have the same measure. For
instance, in one dimension, the intervals $A := [0, 1]$ and $B := [0, 2]$
are in one-to-one correspondence (using the bijection $x \mapsto 2x$ from $A$
to $B$), but of course $B$ is twice as long as $A$. So one can disassemble
$A$ into an uncountable number of points and reassemble them to form
a set of twice the length.
¹ One can also pose the problem of measure on other domains than Euclidean
space, such as a Riemannian manifold, but we will focus on the Euclidean case here for
simplicity, and refer to any text on Riemannian geometry for a treatment of integration
on manifolds.
Of course, one can point to the infinite (and uncountable) number
of components in this disassembly as being the cause of this breakdown
of intuition, and restrict attention to just finite partitions. But
one still runs into trouble here for a number of reasons, the most
striking of which is the Banach-Tarski paradox, which shows that the
unit ball $B := \{(x, y, z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 \leq 1\}$ in three dimensions²
can be disassembled into a finite number of pieces (in fact, just five
pieces suffice), which can then be reassembled (after translating and
rotating each of the pieces) to form two disjoint copies of the ball $B$.
Here, the problem is that the pieces used in this decomposition are
highly pathological in nature; among other things, their construction
requires use of the axiom of choice. (This is in fact necessary; there
are models of set theory without the axiom of choice in which the
Banach-Tarski paradox does not occur, thanks to a famous theorem
of Solovay [So1970].) Such pathological sets almost never come up in
practical applications of mathematics. Because of this, the standard
solution to the problem of measure has been to abandon the goal
of measuring every subset $E$ of $\mathbf{R}^d$, and instead to settle for only
measuring a certain subclass of “non-pathological” subsets of $\mathbf{R}^d$,
which are then referred to as the measurable sets. The problem of
measure then divides into several subproblems:
(i) What does it mean for a subset $E$ of $\mathbf{R}^d$ to be measurable?
(ii) If a set $E$ is measurable, how does one define its measure?
(iii) What nice properties or axioms does measure (or the concept
of measurability) obey?
(iv) Are “ordinary” sets such as cubes, balls, polyhedra, etc.
measurable?
(v) Does the measure of an “ordinary” set equal the “naive geometric
measure” of such sets? (e.g. is the measure of an
$a \times b$ rectangle equal to $ab$?)
² The paradox only works in three dimensions and higher, for reasons having to
do with the group-theoretic property of amenability; see §2.2 of An epsilon of room,
Vol. I for further discussion.
These questions are somewhat open-ended in formulation, and
there is no unique answer to them; in particular, one can expand the
class of measurable sets at the expense of losing one or more nice
properties of measure in the process (e.g. finite or countable additivity,
translation invariance, or rotation invariance). However, there
are two basic answers which, between them, suffice for most applications.
The first is the concept of Jordan measure (or Jordan content)
of a Jordan measurable set, which is a concept closely related to that
of the Riemann integral (or Darboux integral). This concept is elementary
enough to be systematically studied in an undergraduate
analysis course, and suffices for measuring most of the “ordinary”
sets (e.g. the area under the graph of a continuous function) in many
branches of mathematics. However, when one turns to the type of
sets that arise in analysis, and in particular those sets that arise as
limits (in various senses) of other sets, it turns out that the Jordan
concept of measurability is not quite adequate, and must be extended
to the more general notion of Lebesgue measurability, with the corresponding
notion of Lebesgue measure that extends Jordan measure.
With the Lebesgue theory (which can be viewed as a completion of
the Jordan-Darboux-Riemann theory), one keeps almost all of the desirable
properties of Jordan measure, but with the crucial additional
property that many features of the Lebesgue theory are preserved under
limits (as exemplified in the fundamental convergence theorems
of the Lebesgue theory, such as the monotone convergence theorem
(Theorem 1.4.44) and the dominated convergence theorem (Theorem
1.4.49), which do not hold in the Jordan-Darboux-Riemann setting).
As such, they are particularly well suited³ for applications in analysis,
where limits of functions or sets arise all the time.
In later sections, we will formally define Lebesgue measure and
the Lebesgue integral, as well as the more general concept of an abstract
measure space and the associated integration operation. In
the rest of the current section, we will discuss the more elementary
concepts of Jordan measure and the Riemann integral. This material
will eventually be superseded by the more powerful theory to be
treated in later sections; but it will serve as motivation for that later
material, as well as providing some continuity with the treatment of
measure and integration in undergraduate analysis courses.
1.1.1. Elementary measure. Before we discuss Jordan measure,
we discuss the even simpler notion of elementary measure, which allows
one to measure a very simple class of sets, namely the elementary
sets (finite unions of boxes).
Definition 1.1.1 (Intervals, boxes, elementary sets). An interval is
a subset of $\mathbf{R}$ of the form $[a, b] := \{x \in \mathbf{R} : a \leq x \leq b\}$,
$[a, b) := \{x \in \mathbf{R} : a \leq x < b\}$, $(a, b] := \{x \in \mathbf{R} : a < x \leq b\}$, or
$(a, b) := \{x \in \mathbf{R} : a < x < b\}$, where $a \leq b$ are real numbers. We define the length⁴ $|I|$
of an interval $I = [a, b], [a, b), (a, b], (a, b)$ to be $|I| := b - a$. A box in
$\mathbf{R}^d$ is a Cartesian product $B := I_1 \times \dots \times I_d$ of $d$ intervals $I_1, \dots, I_d$
(not necessarily of the same length), thus for instance an interval is
a one-dimensional box. The volume $|B|$ of such a box $B$ is defined as
$|B| := |I_1| \times \dots \times |I_d|$. An elementary set is any subset of $\mathbf{R}^d$ which
is the union of a finite number of boxes.
Exercise 1.1.1 (Boolean closure). Show that if $E, F \subset \mathbf{R}^d$ are elementary
sets, then the union $E \cup F$, the intersection $E \cap F$, and the
set theoretic difference $E \backslash F := \{x \in E : x \notin F\}$, and the symmetric
difference $E \Delta F := (E \backslash F) \cup (F \backslash E)$ are also elementary. If $x \in \mathbf{R}^d$,
show that the translate $E + x := \{y + x : y \in E\}$ is also an elementary
set.
³ There are other ways to extend Jordan measure and the Riemann integral, see
for instance Exercise 1.6.53 or Section 1.7.3, but the Lebesgue approach handles limits
and rearrangement better than the other alternatives, and so has become the standard
approach in analysis; it is also particularly well suited for providing the rigorous
foundations of probability theory, as discussed in Section 2.3.
⁴ Note we allow degenerate intervals of zero length.
We now give each elementary set a measure.
Lemma 1.1.2 (Measure of an elementary set). Let $E \subset \mathbf{R}^d$ be an
elementary set.
(i) $E$ can be expressed as the finite union of disjoint boxes.
(ii) If $E$ is partitioned as the finite union $B_1 \cup \dots \cup B_k$ of disjoint
boxes, then the quantity $m(E) := |B_1| + \dots + |B_k|$ is independent
of the partition. In other words, given any other
partition $B'_1 \cup \dots \cup B'_{k'}$ of $E$, one has
$|B_1| + \dots + |B_k| = |B'_1| + \dots + |B'_{k'}|$.
We refer to $m(E)$ as the elementary measure of $E$. (We occasionally
write $m(E)$ as $m^d(E)$ to emphasise the $d$-dimensional nature of the
measure.) Thus, for example, the elementary measure of $(1, 2) \cup [3, 6]$
is $4$.
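In one dimension, the recipe behind part (i) of the lemma (merge the given intervals into disjoint ones, then add up lengths) is easy to carry out by machine. The following Python sketch is offered purely as an illustration; the function name `elementary_measure_1d` and the representation of an interval by its pair of endpoints are assumptions of this example. Since endpoints contribute zero length, the sketch does not track whether intervals are open or closed.

```python
from fractions import Fraction

def elementary_measure_1d(intervals):
    """Total length of a finite union of intervals given as (a, b) pairs, a <= b.

    Whether endpoints are included does not affect length, so it is ignored.
    """
    total = Fraction(0)
    current_start = current_end = None
    for a, b in sorted((Fraction(a), Fraction(b)) for a, b in intervals):
        if current_end is None or a > current_end:
            # Gap found: close off the previous merged interval, if any.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = a, b
        else:
            # Overlapping or touching: extend the current merged interval.
            current_end = max(current_end, b)
    if current_end is not None:
        total += current_end - current_start
    return total

example = elementary_measure_1d([(1, 2), (3, 6)])              # (1,2) ∪ [3,6]
overlapping = elementary_measure_1d([(0, 2), (1, 3), (5, 5)])  # [0,3] ∪ {5}
```

The first call reproduces the value $4$ from the example above; the second shows that overlaps are not double-counted and that a degenerate interval contributes nothing.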
Proof. We first prove (i) in the one-dimensional case $d = 1$. Given
any finite collection of intervals $I_1, \dots, I_k$, one can place the $2k$ endpoints
of these intervals in increasing order (discarding repetitions).
Looking at the open intervals between these endpoints, together with
the endpoints themselves (viewed as intervals of length zero), we see
that there exists a finite collection of disjoint intervals $J_1, \dots, J_{k'}$
such that each of the $I_1, \dots, I_k$ is a union of some subcollection of
the $J_1, \dots, J_{k'}$. This already gives (i) when $d = 1$. To prove the
higher dimensional case, we express $E$ as the union $B_1 \cup \dots \cup B_k$ of
boxes $B_i = I_{i,1} \times \dots \times I_{i,d}$. For each $j = 1, \dots, d$, we use the one-dimensional
argument to express $I_{1,j}, \dots, I_{k,j}$ as the union of subcollections
of a collection $J_{1,j}, \dots, J_{k'_j,j}$ of disjoint intervals. Taking
Cartesian products, we can express the $B_1, \dots, B_k$ as finite unions of
boxes $J_{i_1,1} \times \dots \times J_{i_d,d}$, where $1 \leq i_j \leq k'_j$ for all $1 \leq j \leq d$. Such
boxes are all disjoint, and the claim follows.
To prove (ii) we use a discretisation argument. Observe (exercise!)
that for any interval $I$, the length of $I$ can be recovered by the limiting
formula
\[ |I| = \lim_{N \to \infty} \frac{1}{N} \#(I \cap \frac{1}{N} \mathbf{Z}) \]
where $\frac{1}{N} \mathbf{Z} := \{\frac{n}{N} : n \in \mathbf{Z}\}$ and $\#A$ denotes the cardinality of a finite
set $A$. Taking Cartesian products, we see that
\[ |B| = \lim_{N \to \infty} \frac{1}{N^d} \#(B \cap \frac{1}{N} \mathbf{Z}^d) \]
for any box $B$, and in particular that
\[ |B_1| + \dots + |B_k| = \lim_{N \to \infty} \frac{1}{N^d} \#(E \cap \frac{1}{N} \mathbf{Z}^d). \]
Denoting the right-hand side as $m(E)$, we obtain the claim (ii).
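The limiting formula used in this argument can be explored numerically. The sketch below counts lattice points of $\frac{1}{N} \mathbf{Z}^d$ in a closed box exactly, using rational arithmetic; the function names and the particular box are hypothetical choices made for this illustration, and only closed boxes are handled for simplicity.

```python
from fractions import Fraction
from math import ceil, floor

def lattice_count_closed_interval(a, b, N):
    """Number of points of (1/N)Z lying in the closed interval [a, b]."""
    a, b = Fraction(a), Fraction(b)
    # n/N lies in [a, b] exactly when ceil(a*N) <= n <= floor(b*N).
    return max(0, floor(b * N) - ceil(a * N) + 1)

def discretised_volume(box, N):
    """(1/N^d) * #(B ∩ (1/N)Z^d) for a closed box B given as a list of (a, b)."""
    count = 1
    for a, b in box:
        # The lattice points of a box are the products of those of its sides.
        count *= lattice_count_closed_interval(a, b, N)
    return Fraction(count, N ** len(box))

# Closed box [1/3, 3/2] x [0, 2], whose volume is (7/6) * 2 = 7/3.
box = [(Fraction(1, 3), Fraction(3, 2)), (0, 2)]
approximations = [discretised_volume(box, N) for N in (10, 100, 1000, 10000)]
```

For this box the normalised counts approach $7/3$, with an error that shrinks roughly like $1/N$, consistent with the limiting formula.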
Exercise 1.1.2. Give an alternate proof of Lemma 1.1.2(ii) by showing
that any two partitions of $E$ into boxes admit a mutual refinement
into boxes that arise from taking Cartesian products of elements from
finite collections of disjoint intervals.
Remark 1.1.3. One might be tempted to now define the measure
$m(E)$ of an arbitrary set $E \subset \mathbf{R}^d$ by the formula
\[ m(E) := \lim_{N \to \infty} \frac{1}{N^d} \#(E \cap \frac{1}{N} \mathbf{Z}^d), \tag{1.1} \]
since this worked well for elementary sets. However, this definition
is not particularly satisfactory for a number of reasons. Firstly, one
can concoct examples in which the limit does not exist (Exercise!).
Even when the limit does exist, this concept does not obey reasonable
properties such as translation invariance. For instance, if $d = 1$ and
$E := \mathbf{Q} \cap [0, 1] := \{x \in \mathbf{Q} : 0 \leq x \leq 1\}$, then this definition would give
$E$ a measure of $1$, but would give the translate
$E + \sqrt{2} := \{x + \sqrt{2} : x \in \mathbf{Q}; 0 \leq x \leq 1\}$ a measure of zero. Nevertheless, the formula (1.1)
will be valid for all Jordan measurable sets (see Exercise 1.1.13). It
also makes precise an important intuition, namely that the continuous
concept of measure can be viewed⁵ as a limit of the discrete concept
of (normalised) cardinality.
From the definitions, it is clear that $m(E)$ is a nonnegative real
number for every elementary set $E$, and that
\[ m(E \cup F) = m(E) + m(F) \]
whenever $E$ and $F$ are disjoint elementary sets. We refer to the latter
property as finite additivity; by induction it also implies that
\[ m(E_1 \cup \dots \cup E_k) = m(E_1) + \dots + m(E_k) \]
whenever $E_1, \dots, E_k$ are disjoint elementary sets. We also have the
obvious degenerate case
\[ m(\emptyset) = 0. \]
Finally, elementary measure clearly extends the notion of volume, in
the sense that
\[ m(B) = |B| \]
for all boxes $B$.
⁵ Another way to obtain continuous measure as the limit of discrete measure is
via Monte Carlo integration, although in order to rigorously introduce the probability
theory needed to set up Monte Carlo integration properly, one already needs to develop
a large part of measure theory, so this perspective, while intuitive, is not suitable for
foundational purposes.
From nonnegativity and finite additivity (and Exercise 1.1.1) we
conclude the monotonicity property
\[ m(E) \leq m(F) \]
whenever $E \subset F$ are nested elementary sets. From this and finite
additivity (and Exercise 1.1.1) we easily obtain the finite subadditivity
property
\[ m(E \cup F) \leq m(E) + m(F) \]
whenever $E, F$ are elementary sets (not necessarily disjoint); by induction
one then has
\[ m(E_1 \cup \dots \cup E_k) \leq m(E_1) + \dots + m(E_k) \]
whenever $E_1, \dots, E_k$ are elementary sets (not necessarily disjoint).
It is also clear from the definition that we have the translation
invariance
\[ m(E + x) = m(E) \]
for all elementary sets $E$ and $x \in \mathbf{R}^d$.
These properties in fact define elementary measure up to normalisation:
Exercise 1.1.3 (Uniqueness of elementary measure). Let $d \geq 1$. Let
$m' : \mathcal{E}(\mathbf{R}^d) \to \mathbf{R}^+$ be a map from the collection $\mathcal{E}(\mathbf{R}^d)$ of elementary
subsets of $\mathbf{R}^d$ to the nonnegative reals that obeys the nonnegativity,
finite additivity, and translation invariance properties. Show that
there exists a constant $c \in \mathbf{R}^+$ such that $m'(E) = c m(E)$ for all
elementary sets $E$. In particular, if we impose the additional normalisation
$m'([0, 1)^d) = 1$, then $m' \equiv m$. (Hint: Set $c := m'([0, 1)^d)$, and
then compute $m'([0, \frac{1}{n})^d)$ for any positive integer $n$.)
Exercise 1.1.4. Let $d_1, d_2 \geq 1$, and let $E_1 \subset \mathbf{R}^{d_1}$, $E_2 \subset \mathbf{R}^{d_2}$ be
elementary sets. Show that $E_1 \times E_2 \subset \mathbf{R}^{d_1 + d_2}$ is elementary, and
$m^{d_1 + d_2}(E_1 \times E_2) = m^{d_1}(E_1) \times m^{d_2}(E_2)$.
1.1.2. Jordan measure. We now have a satisfactory notion of measure
for elementary sets. But of course, the elementary sets are a very
restrictive class of sets, far too small for most applications. For instance,
a solid triangle or disk in the plane will not be elementary, or
even a rotated box. On the other hand, as essentially observed long
ago by Archimedes, such sets $E$ can be approximated from within and
without by elementary sets $A \subset E \subset B$, and the inscribing elementary
set $A$ and the circumscribing elementary set $B$ can be used to
give lower and upper bounds on the putative measure of $E$. As one
makes the approximating sets $A, B$ increasingly fine, one can hope
that these two bounds eventually match. This gives rise to the following
definitions.
Definition 1.1.4 (Jordan measure). Let $E \subset \mathbf{R}^d$ be a bounded set.
• The Jordan inner measure $m_{*,(J)}(E)$ of $E$ is defined as
$$m_{*,(J)}(E) := \sup_{A \subset E,\ A \text{ elementary}} m(A).$$
• The Jordan outer measure $m^{*,(J)}(E)$ of $E$ is defined as
$$m^{*,(J)}(E) := \inf_{B \supset E,\ B \text{ elementary}} m(B).$$
• If $m_{*,(J)}(E) = m^{*,(J)}(E)$, then we say that $E$ is Jordan measurable, and call $m(E) := m_{*,(J)}(E) = m^{*,(J)}(E)$ the Jordan measure of $E$. As before, we write $m(E)$ as $m^d(E)$ when we wish to emphasise the dimension $d$.
By convention, we do not consider unbounded sets to be Jordan measurable (they will be deemed to have infinite Jordan outer measure).
Jordan measurable sets are those sets which are “almost elementary” with respect to Jordan outer measure. More precisely, we have
Exercise 1.1.5 (Characterisation of Jordan measurability). Let $E \subset \mathbf{R}^d$ be bounded. Show that the following are equivalent:
(1) $E$ is Jordan measurable.
(2) For every $\varepsilon > 0$, there exist elementary sets $A \subset E \subset B$ such that $m(B \setminus A) \leq \varepsilon$.
(3) For every $\varepsilon > 0$, there exists an elementary set $A$ such that $m^{*,(J)}(A \Delta E) \leq \varepsilon$.
As one corollary of this exercise, we see that every elementary set $E$ is Jordan measurable, and that Jordan measure and elementary measure coincide for such sets; this justifies the use of $m(E)$ to denote both. In particular, we still have $m(\emptyset) = 0$.
Jordan measurability also inherits many of the properties of elementary measure:
Exercise 1.1.6. Let $E, F \subset \mathbf{R}^d$ be Jordan measurable sets.
(1) (Boolean closure) Show that $E \cup F$, $E \cap F$, $E \setminus F$, and $E \Delta F$ are Jordan measurable.
(2) (Nonnegativity) $m(E) \geq 0$.
(3) (Finite additivity) If $E, F$ are disjoint, then $m(E \cup F) = m(E) + m(F)$.
(4) (Monotonicity) If $E \subset F$, then $m(E) \leq m(F)$.
(5) (Finite subadditivity) $m(E \cup F) \leq m(E) + m(F)$.
(6) (Translation invariance) For any $x \in \mathbf{R}^d$, $E + x$ is Jordan measurable, and $m(E + x) = m(E)$.
Now we give some examples of Jordan measurable sets:
Exercise 1.1.7 (Regions under graphs are Jordan measurable). Let $B$ be a closed box in $\mathbf{R}^d$, and let $f : B \to \mathbf{R}$ be a continuous function.
(1) Show that the graph $\{(x, f(x)) : x \in B\} \subset \mathbf{R}^{d+1}$ is Jordan measurable in $\mathbf{R}^{d+1}$ with Jordan measure zero. (Hint: on a compact metric space, continuous functions are uniformly continuous.)
(2) Show that the set $\{(x, t) : x \in B;\ 0 \leq t \leq f(x)\} \subset \mathbf{R}^{d+1}$ is Jordan measurable.
Exercise 1.1.8. Let $A, B, C$ be three points in $\mathbf{R}^2$.
(1) Show that the solid triangle with vertices $A, B, C$ is Jordan measurable.
(2) Show that the Jordan measure of the solid triangle is equal to $\frac{1}{2}|(B - A) \wedge (C - A)|$, where $|(a, b) \wedge (c, d)| := |ad - bc|$.
(Hint: It may help to first do the case when one of the edges, say $AB$, is horizontal.)
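As a numerical sanity check (not a proof), the area formula in part (2) can be evaluated directly. The sketch below is illustrative only; the function names `wedge` and `triangle_area` are hypothetical and do not come from the text.

```python
def wedge(u, v):
    """Return |u ∧ v| = |u1*v2 - u2*v1| for two planar vectors u and v."""
    return abs(u[0] * v[1] - u[1] * v[0])


def triangle_area(A, B, C):
    """Area (1/2)|(B - A) ∧ (C - A)| of the solid triangle with vertices A, B, C."""
    u = (B[0] - A[0], B[1] - A[1])
    v = (C[0] - A[0], C[1] - A[1])
    return 0.5 * wedge(u, v)
```

For a right triangle with legs 4 and 3 along the axes, this returns half the product of the legs, and three collinear points give area zero, in line with the fact that a line segment is a Jordan null set in the plane.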
Exercise 1.1.9. Show that every compact convex polytope$^6$ in $\mathbf{R}^d$ is Jordan measurable.
Exercise 1.1.10. (1) Show that all open and closed Euclidean balls $B(x, r) := \{y \in \mathbf{R}^d : |y - x| < r\}$, $\overline{B(x, r)} := \{y \in \mathbf{R}^d : |y - x| \leq r\}$ in $\mathbf{R}^d$ are Jordan measurable, with Jordan measure $c_d r^d$ for some constant $c_d > 0$ depending only on $d$.
(2) Establish the crude bounds
$$\left(\frac{2}{\sqrt{d}}\right)^d \leq c_d \leq 2^d.$$
(An exact formula for $c_d$ is $c_d = \frac{1}{d}\omega_d$, where $\omega_d := \frac{2\pi^{d/2}}{\Gamma(d/2)}$ is the volume of the unit sphere $S^{d-1} \subset \mathbf{R}^d$ and $\Gamma$ is the Gamma function, but we will not derive this formula here.)
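The exact formula quoted above can be compared numerically with the crude bounds of part (2). The following is a minimal sketch that takes the formula $c_d = \omega_d/d$ as given (the text does not derive it); the function names and the small floating-point tolerance are assumptions made for illustration.

```python
import math


def unit_ball_volume(d):
    """c_d = omega_d / d, with omega_d = 2 * pi^(d/2) / Gamma(d/2)."""
    omega_d = 2 * math.pi ** (d / 2) / math.gamma(d / 2)
    return omega_d / d


def crude_bounds_hold(d, tol=1e-12):
    """Check (2 / sqrt(d))^d <= c_d <= 2^d up to a relative rounding tolerance."""
    c_d = unit_ball_volume(d)
    lower = (2 / math.sqrt(d)) ** d
    upper = 2.0 ** d
    return lower <= c_d * (1 + tol) and c_d <= upper * (1 + tol)
```

In dimension $d = 1$ both bounds are attained (the unit ball is the interval $(-1, 1)$ of length 2), which is why the comparison allows for rounding error. The lower bound comes from the cube inscribed in the ball, the upper bound from the cube circumscribing it.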
Exercise 1.1.11. This exercise assumes familiarity with linear algebra. Let $L : \mathbf{R}^d \to \mathbf{R}^d$ be a linear transformation.
(1) Show that there exists a nonnegative real number $D$ such that $m(L(E)) = D\,m(E)$ for every elementary set $E$ (note from previous exercises that $L(E)$ is Jordan measurable). (Hint: apply Exercise 1.1.3 to the map $E \mapsto m(L(E))$.)
(2) Show that if $E$ is Jordan measurable, then $L(E)$ is also, and $m(L(E)) = D\,m(E)$.
$^6$ A closed convex polytope is a subset of $\mathbf{R}^d$ formed by intersecting together finitely many closed half-spaces of the form $\{x \in \mathbf{R}^d : x \cdot v \leq c\}$, where $v \in \mathbf{R}^d$, $c \in \mathbf{R}$, and $\cdot$ denotes the usual dot product on $\mathbf{R}^d$. A compact convex polytope is a closed convex polytope which is also bounded.
(3) Show that $D = |\det L|$. (Hint: Work first with the case when $L$ is an elementary transformation, using Gaussian elimination. Alternatively, work with the cases when $L$ is a diagonal transformation or an orthogonal transformation, using the unit ball in the latter case, and use the polar decomposition.)
Exercise 1.1.12. Deﬁne a Jordan null set to be a Jordan measurable
set of Jordan measure zero. Show that any subset of a Jordan null
set is a Jordan null set.
Exercise 1.1.13. Show that (1.1) holds for all Jordan measurable $E \subset \mathbf{R}^d$.
Exercise 1.1.14 (Metric entropy formulation of Jordan measurability). Define a dyadic cube to be a half-open box of the form
$$\left[\frac{i_1}{2^n}, \frac{i_1 + 1}{2^n}\right) \times \ldots \times \left[\frac{i_d}{2^n}, \frac{i_d + 1}{2^n}\right)$$
for some integers $n, i_1, \ldots, i_d$. Let $E \subset \mathbf{R}^d$ be a bounded set. For each integer $n$, let $\mathcal{E}_*(E, 2^{-n})$ denote the number of dyadic cubes of sidelength $2^{-n}$ that are contained in $E$, and let $\mathcal{E}^*(E, 2^{-n})$ be the number of dyadic cubes$^7$ of sidelength $2^{-n}$ that intersect $E$. Show that $E$ is Jordan measurable if and only if
$$\lim_{n \to \infty} 2^{-dn} \left( \mathcal{E}^*(E, 2^{-n}) - \mathcal{E}_*(E, 2^{-n}) \right) = 0,$$
in which case one has
$$m(E) = \lim_{n \to \infty} 2^{-dn} \mathcal{E}_*(E, 2^{-n}) = \lim_{n \to \infty} 2^{-dn} \mathcal{E}^*(E, 2^{-n}).$$
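To see the dyadic counts in action, the sketch below works through one concrete case chosen here for illustration (it is not an example from the text): the quarter disk $E = \{(x, y) : x, y \geq 0,\ x^2 + y^2 \leq 1\}$ in dimension $d = 2$, whose area is $\pi/4$. For this particular set the two counts reduce to exact integer tests on the corners of each dyadic square; the function names are hypothetical.

```python
import math


def dyadic_counts_quarter_disk(n):
    """Return (inner, outer) for E = {x, y >= 0, x^2 + y^2 <= 1} at scale 2**-n.

    inner = number of dyadic squares of sidelength 2**-n contained in E,
    outer = number of dyadic squares of sidelength 2**-n that intersect E.
    Squares are indexed by integers (i, j) and equal [i, i+1) x [j, j+1) / 2**n.
    """
    side = 2 ** n      # number of squares per unit length
    r2 = side * side   # squared radius, measured in units of the sidelength
    inner = outer = 0
    for i in range(side + 1):
        for j in range(side + 1):
            # The lower-left corner (i, j) belongs to the half-open square and
            # is its point nearest the origin, so it decides intersection.
            if i * i + j * j <= r2:
                outer += 1
            # The square lies in E exactly when the far corner of its closure does.
            if (i + 1) ** 2 + (j + 1) ** 2 <= r2:
                inner += 1
    return inner, outer


def jordan_bounds(n):
    """Scaled counts 2^{-dn} * inner and 2^{-dn} * outer with d = 2."""
    inner, outer = dyadic_counts_quarter_disk(n)
    scale = 4.0 ** (-n)
    return inner * scale, outer * scale
```

Calling `jordan_bounds(n)` for increasing `n` gives a lower and an upper estimate that bracket $\pi/4$, with a gap that shrinks roughly like $2^{-n}$, since only the squares meeting the boundary arc contribute to the difference of the two counts.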
Exercise 1.1.15 (Uniqueness of Jordan measure). Let $d \geq 1$. Let $m' : \mathcal{J}(\mathbf{R}^d) \to \mathbf{R}^+$ be a map from the collection $\mathcal{J}(\mathbf{R}^d)$ of Jordan measurable subsets of $\mathbf{R}^d$ to the nonnegative reals that obeys the nonnegativity, finite additivity, and translation invariance properties. Show that there exists a constant $c \in \mathbf{R}^+$ such that $m'(E) = c\,m(E)$ for all Jordan measurable sets $E$. In particular, if we impose the additional normalisation $m'([0, 1)^d) = 1$, then $m' \equiv m$.
$^7$ This quantity could be called the (dyadic) metric entropy of $E$ at scale $2^{-n}$.
Exercise 1.1.16. Let $d_1, d_2 \geq 1$, and let $E_1 \subset \mathbf{R}^{d_1}$, $E_2 \subset \mathbf{R}^{d_2}$ be Jordan measurable sets. Show that $E_1 \times E_2 \subset \mathbf{R}^{d_1+d_2}$ is Jordan measurable, and $m^{d_1+d_2}(E_1 \times E_2) = m^{d_1}(E_1) \times m^{d_2}(E_2)$.
Exercise 1.1.17. Let $P, Q$ be two polytopes in $\mathbf{R}^d$. Suppose that $P$ can be partitioned into finitely many sub-polytopes which, after being rotated and translated, form a cover of $Q$, with any two of the sub-polytopes in $Q$ intersecting only at their boundaries. Conclude that $P$ and $Q$ have the same Jordan measure. The converse statement is true in one and two dimensions $d = 1, 2$ (this is the Bolyai-Gerwien theorem), but false in higher dimensions (this was Dehn’s negative answer [De1901] to Hilbert’s third problem).
The above exercises give a fairly large class of Jordan measurable sets. However, not every subset of $\mathbf{R}^d$ is Jordan measurable. First of all, the unbounded sets are not Jordan measurable, by construction. But there are also bounded sets that are not Jordan measurable:
Exercise 1.1.18. Let $E \subset \mathbf{R}^d$ be a bounded set.
(1) Show that $E$ and the closure $\overline{E}$ of $E$ have the same Jordan outer measure.
(2) Show that $E$ and the interior $E^\circ$ of $E$ have the same Jordan inner measure.
(3) Show that $E$ is Jordan measurable if and only if the topological boundary $\partial E$ of $E$ has Jordan outer measure zero.
(4) Show that the bullet-riddled square $[0, 1]^2 \setminus \mathbf{Q}^2$, and set of bullets $[0, 1]^2 \cap \mathbf{Q}^2$, both have Jordan inner measure zero and Jordan outer measure one. In particular, both sets are not Jordan measurable.
Informally, any set with a lot of “holes”, or a very “fractal”
boundary, is unlikely to be Jordan measurable. In order to measure
such sets we will need to develop Lebesgue measure, which is done in
the next set of notes.
Exercise 1.1.19 (Carathéodory type property). Let $E \subset \mathbf{R}^d$ be a bounded set, and $F \subset \mathbf{R}^d$ be an elementary set. Show that $m^{*,(J)}(E) = m^{*,(J)}(E \cap F) + m^{*,(J)}(E \setminus F)$.
1.1.3. Connection with the Riemann integral. To conclude this section, we briefly discuss the relationship between Jordan measure and the Riemann integral (or the equivalent Darboux integral). For simplicity we will only discuss the classical one-dimensional Riemann integral on an interval $[a, b]$, though one can extend the Riemann theory without much difficulty to higher-dimensional integrals on Jordan measurable sets. (In later sections, this Riemann integral will be superseded by the Lebesgue integral.)
Definition 1.1.5 (Riemann integrability). Let $[a, b]$ be an interval of positive length, and $f : [a, b] \to \mathbf{R}$ be a function. A tagged partition $\mathcal{P} = ((x_0, x_1, \ldots, x_n), (x^*_1, \ldots, x^*_n))$ of $[a, b]$ is a finite sequence of real numbers $a = x_0 < x_1 < \ldots < x_n = b$, together with additional numbers $x_{i-1} \leq x^*_i \leq x_i$ for each $i = 1, \ldots, n$. We abbreviate $x_i - x_{i-1}$ as $\delta x_i$. The quantity $\Delta(\mathcal{P}) := \sup_{1 \leq i \leq n} \delta x_i$ will be called the norm of the tagged partition. The Riemann sum $\mathcal{R}(f, \mathcal{P})$ of $f$ with respect to the tagged partition $\mathcal{P}$ is defined as
$$\mathcal{R}(f, \mathcal{P}) := \sum_{i=1}^n f(x^*_i)\,\delta x_i.$$
We say that $f$ is Riemann integrable on $[a, b]$ if there exists a real number, denoted $\int_a^b f(x)\,dx$ and referred to as the Riemann integral of $f$ on $[a, b]$, for which we have
$$\int_a^b f(x)\,dx = \lim_{\Delta(\mathcal{P}) \to 0} \mathcal{R}(f, \mathcal{P}),$$
by which we mean that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|\mathcal{R}(f, \mathcal{P}) - \int_a^b f(x)\,dx| \leq \varepsilon$ for every tagged partition $\mathcal{P}$ with $\Delta(\mathcal{P}) \leq \delta$.
If [a, b] is an interval of zero length, we adopt the convention that
every function f : [a, b] → R is Riemann integrable, with a Riemann
integral of zero.
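The definition of a Riemann sum translates almost verbatim into code. The following is a minimal sketch, with hypothetical function names, that evaluates the Riemann sum for a tagged partition given as a list of partition points and a list of tags; the uniform midpoint-tagged partition is just one convenient choice and is an assumption of this example.

```python
def riemann_sum(f, points, tags):
    """Riemann sum of f for the tagged partition (points, tags).

    points = [x_0, ..., x_n] with x_0 < x_1 < ... < x_n,
    tags   = [x*_1, ..., x*_n] with x_{i-1} <= x*_i <= x_i.
    """
    n = len(points) - 1
    if len(tags) != n:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for i in range(1, n + 1):
        x_prev, x_i, tag = points[i - 1], points[i], tags[i - 1]
        if not (x_prev < x_i and x_prev <= tag <= x_i):
            raise ValueError("not a valid tagged partition")
        total += f(tag) * (x_i - x_prev)  # f(x*_i) * delta x_i
    return total


def partition_norm(points):
    """Norm of the partition: the largest gap x_i - x_{i-1}."""
    return max(points[i] - points[i - 1] for i in range(1, len(points)))


def uniform_midpoint_partition(a, b, n):
    """Uniform partition of [a, b] into n pieces, tagged at the midpoints."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    tags = [(points[i - 1] + points[i]) / 2 for i in range(1, n + 1)]
    return points, tags
```

For $f(x) = x^2$ on $[0, 1]$, the sums computed this way approach $1/3$ as the norm of the partition shrinks. Note that the definition requires convergence over all tagged partitions of small norm, so a single family of partitions only illustrates the limit and does not establish integrability.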
Note that unbounded functions cannot be Riemann integrable
(why?).
The above definition, while geometrically natural, can be awkward to use in practice. A more convenient formulation of the Riemann integral can be given using some additional machinery.
Exercise 1.1.20 (Piecewise constant functions). Let $[a, b]$ be an interval. A piecewise constant function $f : [a, b] \to \mathbf{R}$ is a function for which there exists a partition of $[a, b]$ into finitely many intervals $I_1, \ldots, I_n$, such that $f$ is equal to a constant $c_i$ on each of the intervals $I_i$. If $f$ is piecewise constant, show that the expression
$$\sum_{i=1}^n c_i |I_i|$$
is independent of the choice of partition used to demonstrate the piecewise constant nature of $f$. We will denote this quantity by $\mathrm{p.c.}\int_a^b f(x)\,dx$, and refer to it as the piecewise constant integral of $f$ on $[a, b]$.
Exercise 1.1.21 (Basic properties of the piecewise constant integral). Let $[a, b]$ be an interval, and let $f, g : [a, b] \to \mathbf{R}$ be piecewise constant functions. Establish the following statements:
(1) (Linearity) For any real number $c$, $cf$ and $f + g$ are piecewise constant, with $\mathrm{p.c.}\int_a^b cf(x)\,dx = c\,\mathrm{p.c.}\int_a^b f(x)\,dx$ and $\mathrm{p.c.}\int_a^b f(x) + g(x)\,dx = \mathrm{p.c.}\int_a^b f(x)\,dx + \mathrm{p.c.}\int_a^b g(x)\,dx$.
(2) (Monotonicity) If $f \leq g$ pointwise (i.e. $f(x) \leq g(x)$ for all $x \in [a, b]$) then $\mathrm{p.c.}\int_a^b f(x)\,dx \leq \mathrm{p.c.}\int_a^b g(x)\,dx$.
(3) (Indicator) If $E$ is an elementary subset of $[a, b]$, then the indicator function $1_E : [a, b] \to \mathbf{R}$ (defined by setting $1_E(x) := 1$ when $x \in E$ and $1_E(x) := 0$ otherwise) is piecewise constant, and $\mathrm{p.c.}\int_a^b 1_E(x)\,dx = m(E)$.
Definition 1.1.6 (Darboux integral). Let $[a, b]$ be an interval, and $f : [a, b] \to \mathbf{R}$ be a bounded function. The lower Darboux integral $\underline{\int_a^b} f(x)\,dx$ of $f$ on $[a, b]$ is defined as
$$\underline{\int_a^b} f(x)\,dx := \sup_{g \leq f,\ \text{piecewise constant}} \mathrm{p.c.}\int_a^b g(x)\,dx,$$
where $g$ ranges over all piecewise constant functions that are pointwise bounded above by $f$. (The hypothesis that $f$ is bounded ensures that the supremum is over a nonempty set.) Similarly, we define the upper Darboux integral $\overline{\int_a^b} f(x)\,dx$ of $f$ on $[a, b]$ by the formula
$$\overline{\int_a^b} f(x)\,dx := \inf_{h \geq f,\ \text{piecewise constant}} \mathrm{p.c.}\int_a^b h(x)\,dx.$$
Clearly $\underline{\int_a^b} f(x)\,dx \leq \overline{\int_a^b} f(x)\,dx$. If these two quantities are equal, we say that $f$ is Darboux integrable, and refer to this quantity as the Darboux integral of $f$ on $[a, b]$.
Note that the upper and lower Darboux integrals are related by the reflection identity
$$\overline{\int_a^b} -f(x)\,dx = -\underline{\int_a^b} f(x)\,dx.$$
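For a nondecreasing function, explicit piecewise constant minorants and majorants are easy to write down: on each piece of a uniform partition one takes the value at the left or the right endpoint. The sketch below (hypothetical function name; exact rational arithmetic via `fractions` is a choice made here, and monotonicity of `f` is an assumption that the code does not check) computes the piecewise constant integrals of these two step functions, which bound the lower Darboux integral from below and the upper Darboux integral from above.

```python
from fractions import Fraction


def darboux_bounds_monotone(f, a, b, n):
    """Piecewise constant integrals of a step minorant and majorant of f.

    Assumes f is nondecreasing on [a, b]. On the i-th piece of the uniform
    partition the minorant takes the left endpoint value and the majorant
    the right endpoint value. Returns (lower, upper) as exact fractions.
    """
    a, b = Fraction(a), Fraction(b)
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(f(xs[i]) * h for i in range(n))      # sum of c_i |I_i| for g <= f
    upper = sum(f(xs[i + 1]) * h for i in range(n))  # sum of c_i |I_i| for h >= f
    return lower, upper
```

The difference of the two bounds telescopes to $(f(b) - f(a))(b - a)/n$, which tends to zero as $n$ grows; this is one way to see that monotone functions are Darboux integrable.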
Exercise 1.1.22. Let [a, b] be an interval, and f : [a, b] → R be a
bounded function. Show that f is Riemann integrable if and only
if it is Darboux integrable, in which case the Riemann integral and
Darboux integrals are equal.
Exercise 1.1.23. Show that any continuous function $f : [a, b] \to \mathbf{R}$ is Riemann integrable. More generally, show that any bounded, piecewise continuous$^8$ function $f : [a, b] \to \mathbf{R}$ is Riemann integrable.
Now we connect the Riemann integral to Jordan measure in two ways. First, we connect the Riemann integral to one-dimensional Jordan measure:
Exercise 1.1.24 (Basic properties of the Riemann integral). Let $[a, b]$ be an interval, and let $f, g : [a, b] \to \mathbf{R}$ be Riemann integrable. Establish the following statements:
(1) (Linearity) For any real number $c$, $cf$ and $f + g$ are Riemann integrable, with $\int_a^b cf(x)\,dx = c \int_a^b f(x)\,dx$ and $\int_a^b f(x) + g(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
(2) (Monotonicity) If $f \leq g$ pointwise (i.e. $f(x) \leq g(x)$ for all $x \in [a, b]$) then $\int_a^b f(x)\,dx \leq \int_a^b g(x)\,dx$.
$^8$ A function $f : [a, b] \to \mathbf{R}$ is piecewise continuous if one can partition $[a, b]$ into finitely many intervals, such that $f$ is continuous on each interval.
(3) (Indicator) If $E$ is a Jordan measurable subset of $[a, b]$, then the indicator function $1_E : [a, b] \to \mathbf{R}$ (defined by setting $1_E(x) := 1$ when $x \in E$ and $1_E(x) := 0$ otherwise) is Riemann integrable, and $\int_a^b 1_E(x)\,dx = m(E)$.
Finally, show that these properties uniquely define the Riemann integral, in the sense that the functional $f \mapsto \int_a^b f(x)\,dx$ is the only map from the space of Riemann integrable functions on $[a, b]$ to $\mathbf{R}$ which obeys all three of the above properties.
Next, we connect the integral to two-dimensional Jordan measure:
Exercise 1.1.25 (Area interpretation of the Riemann integral). Let $[a, b]$ be an interval, and let $f : [a, b] \to \mathbf{R}$ be a bounded function. Show that $f$ is Riemann integrable if and only if the sets $E_+ := \{(x, t) : x \in [a, b];\ 0 \leq t \leq f(x)\}$ and $E_- := \{(x, t) : x \in [a, b];\ f(x) \leq t \leq 0\}$ are both Jordan measurable in $\mathbf{R}^2$, in which case one has
$$\int_a^b f(x)\,dx = m^2(E_+) - m^2(E_-),$$
where $m^2$ denotes two-dimensional Jordan measure. (Hint: First establish this in the case when $f$ is nonnegative.)
Exercise 1.1.26. Extend the deﬁnition of the Riemann and Darboux
integrals to higher dimensions, in such a way that analogues of all the
previous results hold.
1.2. Lebesgue measure
In Section 1.1, we recalled the classical theory of Jordan measure on Euclidean spaces $\mathbf{R}^d$. This theory proceeded in the following stages:
(i) First, one defined the notion of a box $B$ and its volume $|B|$.
(ii) Using this, one defined the notion of an elementary set $E$ (a finite union of boxes), and defined the elementary measure $m(E)$ of such sets.
(iii) From this, one defined the Jordan inner and outer measures $m_{*,(J)}(E)$, $m^{*,(J)}(E)$ of an arbitrary bounded set $E \subset \mathbf{R}^d$. If those measures match, we say that $E$ is Jordan measurable, and call $m(E) = m_{*,(J)}(E) = m^{*,(J)}(E)$ the Jordan measure of $E$.
As long as one is lucky enough to only have to deal with Jordan
measurable sets, the theory of Jordan measure works well enough.
However, as noted previously, not all sets are Jordan measurable, even
if one restricts attention to bounded sets. In fact, we shall see later
in these notes that there even exist bounded open sets, or compact
sets, which are not Jordan measurable, so the Jordan theory does
not cover many classes of sets of interest. Another class that it fails
to cover is countable unions or intersections of sets that are already
known to be measurable:
Exercise 1.2.1. Show that the countable union $\bigcup_{n=1}^\infty E_n$ or countable intersection $\bigcap_{n=1}^\infty E_n$ of Jordan measurable sets $E_1, E_2, \ldots \subset \mathbf{R}$ need not be Jordan measurable, even when bounded.
This creates problems with Riemann integrability (which, as we saw in Section 1.1, was closely related to Jordan measure) and pointwise limits:
Exercise 1.2.2. Give an example of a sequence of uniformly bounded, Riemann integrable functions $f_n : [0, 1] \to \mathbf{R}$ for $n = 1, 2, \ldots$ that converge pointwise to a bounded function $f : [0, 1] \to \mathbf{R}$ that is not Riemann integrable. What happens if we replace pointwise convergence with uniform convergence?
These issues can be rectified by using a more powerful notion of measure than Jordan measure, namely Lebesgue measure. To define this measure, we first tinker with the notion of the Jordan outer measure
$$m^{*,(J)}(E) := \inf_{B \supset E;\ B \text{ elementary}} m(B)$$
of a set $E \subset \mathbf{R}^d$ (we adopt the convention that $m^{*,(J)}(E) = +\infty$ if $E$ is unbounded, thus $m^{*,(J)}$ now takes values in the extended nonnegative reals $[0, +\infty]$, whose properties we will briefly review below). Observe from the finite additivity and subadditivity of elementary measure that we can also write the Jordan outer measure as
$$m^{*,(J)}(E) := \inf_{B_1 \cup \ldots \cup B_k \supset E;\ B_1, \ldots, B_k \text{ boxes}} |B_1| + \ldots + |B_k|,$$
i.e. the Jordan outer measure is the infimal cost required to cover $E$ by a finite union of boxes. (The natural number $k$ is allowed to vary freely in the above infimum.) We now modify this by replacing the finite union of boxes by a countable union of boxes, leading to the Lebesgue outer measure$^9$ $m^*(E)$ of $E$:
$$m^*(E) := \inf_{\bigcup_{n=1}^\infty B_n \supset E;\ B_1, B_2, \ldots \text{ boxes}} \sum_{n=1}^\infty |B_n|,$$
thus the Lebesgue outer measure is the infimal cost required to cover $E$ by a countable union of boxes. Note that the countable sum $\sum_{n=1}^\infty |B_n|$ may be infinite, and so the Lebesgue outer measure $m^*(E)$ could well equal $+\infty$.
Clearly, we always have $m^*(E) \leq m^{*,(J)}(E)$ (since we can always pad out a finite union of boxes into an infinite union by adding an infinite number of empty boxes). But $m^*(E)$ can be a lot smaller:
Example 1.2.1. Let $E = \{x_1, x_2, x_3, \ldots\} \subset \mathbf{R}^d$ be a countable set. We know that the Jordan outer measure of $E$ can be quite large; for instance, in one dimension, $m^{*,(J)}(\mathbf{Q})$ is infinite, and $m^{*,(J)}(\mathbf{Q} \cap [-R, R]) = m^{*,(J)}([-R, R]) = 2R$ since $\mathbf{Q} \cap [-R, R]$ has $[-R, R]$ as its closure (see Exercise 1.1.18). On the other hand, all countable sets $E$ have Lebesgue outer measure zero. Indeed, one simply covers $E$ by the degenerate boxes $\{x_1\}, \{x_2\}, \ldots$ of sidelength and volume zero. Alternatively, if one does not like degenerate boxes, one can cover each $x_n$ by a cube $B_n$ of sidelength $\varepsilon/2^n$ (say) for some arbitrary $\varepsilon > 0$, leading to a total cost of $\sum_{n=1}^\infty (\varepsilon/2^n)^d$, which converges to $C_d \varepsilon^d$ for some absolute constant $C_d$. As $\varepsilon$ can be arbitrarily small, we see that the Lebesgue outer measure must be zero. We will refer to this type of trick as the $\varepsilon/2^n$ trick; it will be used many further times in this text.
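The total cost in the $\varepsilon/2^n$ trick is a geometric series with ratio $2^{-d}$, so one can take $C_d = 1/(2^d - 1)$ (this explicit value is worked out here and is not stated in the text). The sketch below, with hypothetical function names and exact rational arithmetic, compares partial sums of the cost with that closed form.

```python
from fractions import Fraction


def cover_cost(eps, d, terms):
    """Partial sum of sum_{n >= 1} (eps / 2^n)^d over the first `terms` cubes."""
    eps = Fraction(eps)
    return sum((eps / 2 ** n) ** d for n in range(1, terms + 1))


def cover_cost_limit(eps, d):
    """Closed form eps^d / (2^d - 1) of the full geometric series."""
    return Fraction(eps) ** d / (2 ** d - 1)
```

Every partial sum lies strictly below the limit, and the limit itself shrinks to zero with $\varepsilon$, which is exactly what forces the outer measure of a countable set to vanish.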
From this example we see in particular that a set may be unbounded while still having Lebesgue outer measure zero, in contrast to Jordan outer measure.
$^9$ Lebesgue outer measure is also denoted $m_*(E)$ in some texts.
As we shall see in Section 1.7, Lebesgue outer measure (also known as Lebesgue exterior measure) is a special case of a more general concept known as an outer measure.
In analogy with the Jordan theory, we would also like to define a concept of “Lebesgue inner measure” to complement that of outer measure. Here, there is an asymmetry (which ultimately arises from the fact that elementary measure is subadditive rather than superadditive): one does not gain any increase in power in the Jordan inner measure by replacing finite unions of boxes with countable ones. But one can get a sort of Lebesgue inner measure by taking complements; see Exercise 1.2.18. This leads to one possible definition for Lebesgue measurability, namely the Carathéodory criterion for Lebesgue measurability, see Exercise 1.2.17. However, this is not the most intuitive formulation of this concept to work with, and we will instead use a different (but logically equivalent) definition of Lebesgue measurability. The starting point is the observation (see Exercise 1.1.13) that Jordan measurable sets can be efficiently contained in elementary sets, with an error that has small Jordan outer measure. In a similar vein, we will define Lebesgue measurable sets to be sets that can be efficiently contained in open sets, with an error that has small Lebesgue outer measure:
Definition 1.2.2 (Lebesgue measurability). A set $E \subset \mathbf{R}^d$ is said to be Lebesgue measurable if, for every $\varepsilon > 0$, there exists an open set $U \subset \mathbf{R}^d$ containing $E$ such that $m^*(U \setminus E) \leq \varepsilon$. If $E$ is Lebesgue measurable, we refer to $m(E) := m^*(E)$ as the Lebesgue measure of $E$ (note that this quantity may be equal to $+\infty$). We also write $m(E)$ as $m^d(E)$ when we wish to emphasise the dimension $d$.
Remark 1.2.3. The intuition that measurable sets are almost open is also known as Littlewood’s first principle; this principle is a triviality with our current choice of definitions, though less so if one uses other, equivalent, definitions of Lebesgue measurability. See Section 1.3.5 for a further discussion of Littlewood’s principles.
As we shall see later, Lebesgue measure extends Jordan measure, in the sense that every Jordan measurable set is Lebesgue measurable, and the Lebesgue measure and Jordan measure of a Jordan measurable set are always equal. We will also see a few other equivalent descriptions of the concept of Lebesgue measurability.
In the notes below we will establish the basic properties of Lebesgue measure. Broadly speaking, this concept obeys all the intuitive properties one would ask of measure, so long as one restricts attention to countable operations rather than uncountable ones, and as long as one restricts attention to Lebesgue measurable sets. The latter is not a serious restriction in practice, as almost every set one actually encounters in analysis will be measurable (the main exceptions being some pathological sets that are constructed using the axiom of choice). In the next set of notes we will use Lebesgue measure to set up the Lebesgue integral, which extends the Riemann integral in the same way that Lebesgue measure extends Jordan measure; and the many pleasant properties of Lebesgue measure will be reflected in analogous pleasant properties of the Lebesgue integral (most notably the convergence theorems).
We will treat all dimensions $d = 1, 2, \ldots$ equally here, but for the purposes of drawing pictures, we recommend to the reader that one sets $d$ equal to 2. However, for this topic at least, no additional mathematical difficulties will be encountered in the higher-dimensional case (though of course there are significant visual difficulties once $d$ exceeds 3).
1.2.1. Properties of Lebesgue outer measure. We begin by studying the Lebesgue outer measure $m^*$, which was defined earlier, and takes values in the extended nonnegative real axis $[0, +\infty]$. We first record three easy properties of Lebesgue outer measure, which we will use repeatedly in the sequel without further comment:
Exercise 1.2.3 (The outer measure axioms).
(i) (Empty set) $m^*(\emptyset) = 0$.
(ii) (Monotonicity) If $E \subset F \subset \mathbf{R}^d$, then $m^*(E) \leq m^*(F)$.
(iii) (Countable subadditivity) If $E_1, E_2, \ldots \subset \mathbf{R}^d$ is a countable sequence of sets, then $m^*(\bigcup_{n=1}^\infty E_n) \leq \sum_{n=1}^\infty m^*(E_n)$. (Hint: Use the axiom of countable choice, Tonelli’s theorem for series, and the $\varepsilon/2^n$ trick used previously to show that countable sets had outer measure zero.)
Note that countable subadditivity, when combined with the empty set axiom, gives as a corollary the finite subadditivity property
$$m^*(E_1 \cup \ldots \cup E_k) \leq m^*(E_1) + \ldots + m^*(E_k)$$
for any $k \geq 0$. These subadditivity properties will be useful in establishing upper bounds on Lebesgue outer measure. Establishing lower bounds will often be a bit trickier. (More generally, when dealing with a quantity that is defined using an infimum, it is usually easier to obtain upper bounds on that quantity than lower bounds, because the former requires one to bound just one element of the infimum, whereas the latter requires one to bound all elements.)
Remark 1.2.4. Later on in this text, when we study abstract measure theory on a general set $X$, we will define the concept of an outer measure on $X$, which is an assignment $E \mapsto m^*(E)$ of elements of $[0, +\infty]$ to arbitrary subsets $E$ of a space $X$ that obeys the above three axioms of the empty set, monotonicity, and countable subadditivity; thus Lebesgue outer measure is a model example of an abstract outer measure. On the other hand (and somewhat confusingly), Jordan outer measure will not be an abstract outer measure (even after adopting the convention that unbounded sets have Jordan outer measure $+\infty$): it obeys the empty set and monotonicity axioms, but is only finitely subadditive rather than countably subadditive. (For instance, the rationals $\mathbf{Q}$ have infinite Jordan outer measure, despite being the countable union of points, each of which has a Jordan outer measure of zero.) Thus we already see a major benefit of allowing countable unions of boxes in the definition of Lebesgue outer measure, in contrast to the finite unions of boxes in the definition of Jordan outer measure, in that finite subadditivity is upgraded to countable subadditivity.
Of course, one cannot hope to upgrade countable subadditivity to uncountable subadditivity: $\mathbf{R}^d$ is an uncountable union of points, each of which has Lebesgue outer measure zero, but (as we shall shortly see), $\mathbf{R}^d$ has infinite Lebesgue outer measure.
It is natural to ask whether Lebesgue outer measure has the finite additivity property, that is to say that $m^*(E \cup F) = m^*(E) + m^*(F)$ whenever $E, F \subset \mathbf{R}^d$ are disjoint. The answer to this question is somewhat subtle: as we shall see later, we have finite additivity (and even countable additivity) when all sets involved are Lebesgue measurable, but that finite additivity (and hence also countable additivity) can break down in the non-measurable case. The difficulty here (which, incidentally, also appears in the theory of Jordan outer measure) is that if $E$ and $F$ are sufficiently “entangled” with each other, it is not always possible to take a countable cover of $E \cup F$ by boxes and split the total volume of that cover into separate covers of $E$ and $F$ without some duplication. However, we can at least recover finite additivity if the sets $E, F$ are separated by some positive distance:
Lemma 1.2.5 (Finite additivity for separated sets). Let $E, F \subset \mathbf{R}^d$ be such that $\mathrm{dist}(E, F) > 0$, where
$$\mathrm{dist}(E, F) := \inf\{|x - y| : x \in E, y \in F\}$$
is the distance$^{10}$ between $E$ and $F$. Then $m^*(E \cup F) = m^*(E) + m^*(F)$.
Proof. From subadditivity one has $m^*(E \cup F) \leq m^*(E) + m^*(F)$, so it suffices to prove the other direction $m^*(E) + m^*(F) \leq m^*(E \cup F)$. This is trivial if $E \cup F$ has infinite Lebesgue outer measure, so we may assume that it has finite Lebesgue outer measure (and then the same is true for $E$ and $F$, by monotonicity).
We use the standard “give yourself an epsilon of room” trick (see Section 2.7 of An epsilon of room, Vol I.). Let $\varepsilon > 0$. By definition of Lebesgue outer measure, we can cover $E \cup F$ by a countable family $B_1, B_2, \ldots$ of boxes such that
$$\sum_{n=1}^\infty |B_n| \leq m^*(E \cup F) + \varepsilon.$$
Suppose it was the case that each box intersected at most one of $E$ and $F$. Then we could divide this family into two subfamilies $B'_1, B'_2, \ldots$
$^{10}$ Recall from the preface that we use the usual Euclidean metric $|(x_1, \ldots, x_d)| := \sqrt{x_1^2 + \ldots + x_d^2}$ on $\mathbf{R}^d$.
and $B''_1, B''_2, B''_3, \ldots$, the first of which covered $E$, and the second of which covered $F$. From definition of Lebesgue outer measure, we have
$$m^*(E) \leq \sum_{n=1}^\infty |B'_n|$$
and
$$m^*(F) \leq \sum_{n=1}^\infty |B''_n|;$$
summing, we obtain
$$m^*(E) + m^*(F) \leq \sum_{n=1}^\infty |B_n|$$
and thus
$$m^*(E) + m^*(F) \leq m^*(E \cup F) + \varepsilon.$$
Since $\varepsilon$ was arbitrary, this gives $m^*(E) + m^*(F) \leq m^*(E \cup F)$ as required.
Of course, it is quite possible for some of the boxes $B_n$ to intersect both $E$ and $F$, particularly if the boxes are big, in which case the above argument does not work because that box would be double-counted. However, observe that given any $r > 0$, one can always partition a large box $B_n$ into a finite number of smaller boxes, each of which has diameter$^{11}$ at most $r$, with the total volume of these sub-boxes equal to the volume of the original box. Applying this observation to each of the boxes $B_n$, we see that given any $r > 0$, we may assume without loss of generality that the boxes $B_1, B_2, \ldots$ covering $E \cup F$ have diameter at most $r$. In particular, we may assume that all such boxes have diameter strictly less than $\mathrm{dist}(E, F)$. Once we do this, then it is no longer possible for any box to intersect both $E$ and $F$, and then the previous argument now applies.
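The subdivision step used in this proof can be made concrete. The sketch below is one possible way to do it (the function names are hypothetical, and boxes are represented as lists of `(lo, hi)` pairs): each side is cut into equal pieces of length at most $r/\sqrt{d}$, so that every sub-box has diameter at most $r$. Endpoint conventions (open versus closed faces) are ignored here, since they do not affect volume.

```python
import itertools
import math


def subdivide(box, r):
    """Cut a box into sub-boxes of diameter at most r.

    `box` is a list of (lo, hi) pairs, one per coordinate. Each side is cut
    into equal pieces of length at most r / sqrt(d).
    """
    d = len(box)
    max_side = r / math.sqrt(d)
    axes = []
    for lo, hi in box:
        k = max(1, math.ceil((hi - lo) / max_side))  # pieces along this axis
        step = (hi - lo) / k
        axes.append([(lo + i * step, lo + (i + 1) * step) for i in range(k)])
    # Every choice of one piece per axis gives one sub-box.
    return [list(cell) for cell in itertools.product(*axes)]


def volume(box):
    """Product of the sidelengths."""
    return math.prod(hi - lo for lo, hi in box)


def diameter(box):
    """Length of the diagonal, i.e. the supremum of |x - y| over the box."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in box))
```

For instance, subdividing the box $[0, 3] \times [0, 1]$ with $r = 1/2$ produces $9 \times 3 = 27$ sub-boxes whose volumes add up (to within rounding) to the original volume 3. Because the code uses floating-point arithmetic, the diameter bound should be read as holding up to rounding error.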
In general, disjoint sets E, F need not have a positive separation
from each other (e.g. E = [0, 1) and F = [1, 2]). But the situation
improves when E, F are closed, and at least one of E, F is compact:
$^{11}$ The diameter of a set $B$ is defined as $\sup\{|x - y| : x, y \in B\}$.
Exercise 1.2.4. Let $E, F \subset \mathbf{R}^d$ be disjoint closed sets, with at least one of $E, F$ being compact. Show that $\mathrm{dist}(E, F) > 0$. Give a counterexample to show that this claim fails when the compactness hypothesis is dropped.
We already know that countable sets have Lebesgue outer measure zero. Now we start computing the outer measure of some other sets. We begin with elementary sets:
Lemma 1.2.6 (Outer measure of elementary sets). Let $E$ be an elementary set. Then the Lebesgue outer measure $m^*(E)$ of $E$ is equal to the elementary measure $m(E)$ of $E$: $m^*(E) = m(E)$.
Remark 1.2.7. Since countable sets have zero outer measure, we note that we have managed to give a proof of Cantor’s theorem that $\mathbf{R}^d$ is uncountable. Of course, much quicker proofs of this theorem are available. However, this observation shows that the proof of this lemma must somehow use some crucial fact about the real line which is not true for countable subfields of $\mathbf{R}$, such as the rationals $\mathbf{Q}$. In the proof we give here, the key fact about the real line we use is the Heine-Borel theorem, which ultimately exploits the important fact that the reals are complete. In the one-dimensional case $d = 1$, it is also possible to exploit the fact that the reals are connected as a substitute for completeness (note that proper subfields of the reals are neither connected nor complete).
Proof. We already know that $m^*(E) \leq m^{*,(J)}(E) = m(E)$, so it suffices to show that $m(E) \leq m^*(E)$.
We first establish this in the case when the elementary set $E$ is closed. As the elementary set $E$ is also bounded, this allows us to use the powerful Heine-Borel theorem, which asserts that every open cover of $E$ has a finite subcover (or in other words, $E$ is compact).
We again use the epsilon of room strategy. Let $\varepsilon > 0$ be arbitrary; then we can find a countable family $B_1, B_2, \ldots$ of boxes that cover $E$,
$$E \subset \bigcup_{n=1}^\infty B_n,$$
and such that
$$\sum_{n=1}^\infty |B_n| \leq m^*(E) + \varepsilon.$$
We would like to use the Heine-Borel theorem, but the boxes $B_n$ need not be open. But this is not a serious problem, as one can spend another epsilon to enlarge the boxes to be open. More precisely, for each box $B_n$ one can find an open box $B'_n$ containing $B_n$ such that $|B'_n| \leq |B_n| + \varepsilon/2^n$ (say). The $B'_n$ still cover $E$, and we have
$$\sum_{n=1}^\infty |B'_n| \leq \sum_{n=1}^\infty (|B_n| + \varepsilon/2^n) = \left(\sum_{n=1}^\infty |B_n|\right) + \varepsilon \leq m^*(E) + 2\varepsilon.$$
As the $B'_n$ are open, we may apply the Heine-Borel theorem and conclude that
$$E \subset \bigcup_{n=1}^N B'_n$$
for some finite $N$. Using the finite subadditivity of elementary measure, we conclude that
$$m(E) \leq \sum_{n=1}^N |B'_n|$$
and thus
$$m(E) \leq m^*(E) + 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.
Now we consider the case when the elementary set $E$ is not closed. Then we can write $E$ as the finite union $Q_1 \cup \ldots \cup Q_k$ of disjoint boxes, which need not be closed. But, similarly to before, we can use the epsilon of room strategy: for every $\varepsilon > 0$ and every $1 \leq j \leq k$, one can find a closed sub-box $Q'_j$ of $Q_j$ such that $|Q'_j| \geq |Q_j| - \varepsilon/k$ (say); then $E$ contains the finite union $Q'_1 \cup \ldots \cup Q'_k$ of disjoint closed boxes, which is a closed elementary set. By the previous discussion and the finite additivity of elementary measure, we have
$$\begin{aligned}
m^*(Q'_1 \cup \ldots \cup Q'_k) &= m(Q'_1 \cup \ldots \cup Q'_k) \\
&= m(Q'_1) + \ldots + m(Q'_k) \\
&\geq m(Q_1) + \ldots + m(Q_k) - \varepsilon \\
&= m(E) - \varepsilon.
\end{aligned}$$
Applying monotonicity of Lebesgue outer measure, we conclude that
$$m^*(E) \geq m(E) - \varepsilon$$
for every $\varepsilon > 0$. Since $\varepsilon > 0$ was arbitrary, the claim follows.
The above lemma allows us to compute the Lebesgue outer measure of a finite union of boxes. From this and monotonicity we conclude that the Lebesgue outer measure of any set is bounded below by its Jordan inner measure. As it is also bounded above by the Jordan outer measure, we have
$$(1.2)\qquad m_{*,(J)}(E) \leq m^*(E) \leq m^{*,(J)}(E)$$
for every $E \subset \mathbf{R}^d$.
Remark 1.2.8. We are now able to explain why not every bounded
open set or compact set is Jordan measurable. Consider the countable
set $\mathbf{Q} \cap [0,1]$, which we enumerate as
$\{q_1, q_2, q_3, \ldots\}$, let $\varepsilon > 0$ be a small number,
and consider the set
$$U := \bigcup_{n=1}^\infty (q_n - \varepsilon/2^n, q_n + \varepsilon/2^n).$$
This is the union of open sets and is thus open. On the other hand,
by countable subadditivity, one has
$$m^*(U) \le \sum_{n=1}^\infty 2\varepsilon/2^n = 2\varepsilon.$$
Finally, as $U$ is dense in $[0,1]$ (i.e. the closure $\overline{U}$
contains $[0,1]$), we have
$$m^{*,(J)}(U) = m^{*,(J)}(\overline{U}) \ge m^{*,(J)}([0,1]) = 1.$$
For $\varepsilon$ small enough (e.g. $\varepsilon := 1/3$), we see that
the Lebesgue outer measure and Jordan outer measure of $U$ disagree.
Using (1.2), we conclude that the bounded open set $U$ is not Jordan
measurable. This in turn implies that the complement of $U$ in, say,
$[-2,2]$ is also not Jordan measurable, despite being a compact set.
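The bound $m^*(U) \le 2\varepsilon$ in Remark 1.2.8 can be checked
numerically on finite truncations of $U$. The following Python sketch is
only an illustration: the particular enumeration of
$\mathbf{Q} \cap [0,1]$ (by increasing denominator) and the helper names
`rationals_in_unit_interval` and `union_length` are our own choices, not
part of the text. It computes the exact elementary measure of the union
of the first 200 intervals with $\varepsilon = 1/3$.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` distinct rationals in [0, 1], ordered by denominator (one possible enumeration)."""
    seen, out = set(), []
    den = 1
    while len(out) < count:
        for num in range(den + 1):
            q = Fraction(num, den)
            if q not in seen:
                seen.add(q)
                out.append(q)
                if len(out) == count:
                    break
        den += 1
    return out

def union_length(intervals):
    """Total length of a finite union of intervals, merging overlaps first."""
    total = Fraction(0)
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            # The current run is finished; record its length and start a new one.
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

eps = Fraction(1, 3)
qs = rationals_in_unit_interval(200)
# The n-th interval is (q_n - eps/2^n, q_n + eps/2^n), as in the remark.
intervals = [(q - eps / 2**n, q + eps / 2**n) for n, q in enumerate(qs, start=1)]
covered = union_length(intervals)
print(float(covered), "<=", float(2 * eps))
```

Each truncation is an elementary set of measure at most
$2\varepsilon = 2/3$, even though the centres $q_n$ become dense in
$[0,1]$ as more intervals are added.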
Now we turn to countable unions of boxes. It is convenient to
introduce the following notion: two boxes are almost disjoint if their
interiors are disjoint, thus for instance $[0,1]$ and $[1,2]$ are
almost disjoint. As a box has the same elementary measure as its
interior, we see that the finite additivity property
(1.3) $$m(B_1 \cup \ldots \cup B_k) = |B_1| + \ldots + |B_k|$$
holds for almost disjoint boxes $B_1, \ldots, B_k$, and not just for
disjoint boxes. This (and Lemma 1.2.6) has the following consequence:
Lemma 1.2.9 (Outer measure of countable unions of almost disjoint
boxes). Let $E = \bigcup_{n=1}^\infty B_n$ be a countable union of
almost disjoint boxes $B_1, B_2, \ldots$. Then
$$m^*(E) = \sum_{n=1}^\infty |B_n|.$$
Thus, for instance, $\mathbf{R}^d$ itself has an infinite outer measure.
Proof. From countable subadditivity and Lemma 1.2.6 we have
$$m^*(E) \le \sum_{n=1}^\infty m^*(B_n) = \sum_{n=1}^\infty |B_n|,$$
so it suffices to show that
$$\sum_{n=1}^\infty |B_n| \le m^*(E).$$
But for each natural number $N$, $E$ contains the elementary set
$B_1 \cup \ldots \cup B_N$, so by monotonicity and Lemma 1.2.6,
$$\begin{aligned}
m^*(E) &\ge m^*(B_1 \cup \ldots \cup B_N) \\
&= m(B_1 \cup \ldots \cup B_N)
\end{aligned}$$
and thus by (1.3), one has
$$\sum_{n=1}^N |B_n| \le m^*(E).$$
Letting $N \to \infty$ we obtain the claim.
Remark 1.2.10. The above lemma has the following immediate
corollary: if $E = \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty B'_n$
can be decomposed in two different ways as the countable union of
almost disjoint boxes, then
$\sum_{n=1}^\infty |B_n| = \sum_{n=1}^\infty |B'_n|$. Although this
statement is intuitively obvious and does not explicitly use the
concepts of Lebesgue outer measure or Lebesgue measure, it is
remarkably difficult to prove this statement rigorously without
essentially using one of these two concepts. (Try it!)
Exercise 1.2.5. Show that if a set $E \subset \mathbf{R}^d$ is
expressible as the countable union of almost disjoint boxes, then the
Lebesgue outer measure of $E$ is equal to the Jordan inner measure:
$m^*(E) = m_{*,(J)}(E)$, where we extend the definition of Jordan inner
measure to unbounded sets in the obvious manner.
Not every set can be expressed as the countable union of almost
disjoint boxes (consider for instance the irrationals
$\mathbf{R} \setminus \mathbf{Q}$, which contain no boxes other than the
singleton sets). However, there is an important class of sets of this
form, namely the open sets:
Lemma 1.2.11. Let $E \subset \mathbf{R}^d$ be an open set. Then $E$ can
be expressed as the countable union of almost disjoint boxes (and, in
fact, as the countable union of almost disjoint closed cubes).
Proof. We will use the dyadic mesh structure of the Euclidean space
$\mathbf{R}^d$, which is a convenient tool for “discretising” certain
aspects of real analysis.
Define a closed dyadic cube to be a cube $Q$ of the form
$$Q = \left[\frac{i_1}{2^n}, \frac{i_1+1}{2^n}\right] \times \ldots \times \left[\frac{i_d}{2^n}, \frac{i_d+1}{2^n}\right]$$
for some integers $n, i_1, \ldots, i_d$. To avoid some technical issues
we shall restrict attention here to “small” cubes of sidelength at most
1, thus we restrict $n$ to the non-negative integers, and we will
completely ignore “large” cubes of sidelength greater than one. Observe
that the closed dyadic cubes of a fixed sidelength $2^{-n}$ are almost
disjoint, and cover all of $\mathbf{R}^d$. Also observe that each dyadic
cube of sidelength $2^{-n}$ is contained in exactly one “parent” cube of
sidelength $2^{-n+1}$ (which, conversely, has $2^d$ “children” of
sidelength $2^{-n}$), giving the dyadic cubes a structure analogous to
that of a binary tree (or more precisely, an infinite forest of
$2^d$-ary trees). As a consequence of these facts, we also obtain the
important dyadic nesting property: given any two closed dyadic cubes
(possibly of different sidelength), either they are almost disjoint, or
one of them is contained in the other.
If $E$ is open, and $x \in E$, then by definition there is an open ball
centered at $x$ that is contained in $E$, and it is easy to conclude
that there is also a closed dyadic cube containing $x$ that is contained
in $E$. Thus, if we let $\mathcal{Q}$ be the collection of all the
dyadic cubes $Q$ that are contained in $E$, we see that the union
$\bigcup_{Q \in \mathcal{Q}} Q$ of all these cubes is exactly equal to
$E$.
As there are only countably many dyadic cubes, $\mathcal{Q}$ is at most
countable. But we are not done yet, because these cubes are not
almost disjoint (for instance, any cube $Q$ in $\mathcal{Q}$ will of
course overlap with its child cubes). But we can deal with this by
exploiting the dyadic nesting property. Let $\mathcal{Q}^*$ denote those
cubes in $\mathcal{Q}$ which are maximal with respect to set inclusion,
which means that they are not contained in any other cube in
$\mathcal{Q}$. From the nesting property (and the fact that we have
capped the maximum size of our cubes) we see that every cube in
$\mathcal{Q}$ is contained in exactly one maximal cube in
$\mathcal{Q}^*$, and that any two such maximal cubes in $\mathcal{Q}^*$
are almost disjoint. Thus, we see that $E$ is the union
$E = \bigcup_{Q \in \mathcal{Q}^*} Q$ of almost disjoint cubes. As
$\mathcal{Q}^*$ is at most countable, the claim follows (adding empty
boxes if necessary to pad out the cardinality).
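The selection of maximal dyadic cubes in the proof above can be sketched
in code in dimension $d = 1$. This is an illustration under our own
assumptions: the open set is a single interval $(a, b)$, the search is
truncated at a finite depth `max_n` (so only finitely many of the
countably many cubes are produced), and the function names are
hypothetical. A dyadic interval of sidelength $2^{-n}$ is kept exactly
when it lies in $(a, b)$ and its parent does not; by the nesting
property this is the same as being maximal.

```python
from fractions import Fraction
import math

def contained(i, n, a, b):
    """True if the closed dyadic interval [i/2^n, (i+1)/2^n] lies inside the open interval (a, b)."""
    return a < Fraction(i, 2**n) and Fraction(i + 1, 2**n) < b

def maximal_dyadic_intervals(a, b, max_n):
    """Maximal closed dyadic intervals of sidelength 2^-n, 0 <= n <= max_n, contained in (a, b)."""
    out = []
    for n in range(max_n + 1):
        scale = 2**n
        # Only indices i with a <= i/2^n <= b can possibly qualify.
        for i in range(math.floor(a * scale), math.ceil(b * scale) + 1):
            if not contained(i, n, a, b):
                continue
            # Sidelength is capped at 1, so intervals with n = 0 have no parent to check.
            if n > 0 and contained(i // 2, n - 1, a, b):
                continue  # the parent also fits, so this interval is not maximal
            out.append((Fraction(i, scale), Fraction(i + 1, scale)))
    return sorted(out)

cubes = maximal_dyadic_intervals(Fraction(0), Fraction(1), 10)
total = sum(hi - lo for lo, hi in cubes)
print(len(cubes), total)
```

For $(0,1)$ the intervals found are $[1/4,1/2]$, $[1/2,3/4]$,
$[1/8,1/4]$, $[3/4,7/8]$, and so on; their total length increases to
$1$ as the depth grows, in agreement with Lemma 1.2.9.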
We now have a formula for the Lebesgue outer measure of any
open set: it is exactly equal to the Jordan inner measure of that set,
or to the total volume of any partitioning of that set into almost
disjoint boxes. Finally, we have a formula for the Lebesgue outer
measure of an arbitrary set:
Lemma 1.2.12 (Outer regularity). Let $E \subset \mathbf{R}^d$ be an
arbitrary set. Then one has
$$m^*(E) = \inf_{E \subset U,\ U \text{ open}} m^*(U).$$
Proof. From monotonicity one trivially has
$$m^*(E) \le \inf_{E \subset U,\ U \text{ open}} m^*(U)$$
so it suffices to show that
$$\inf_{E \subset U,\ U \text{ open}} m^*(U) \le m^*(E).$$
This is trivial for $m^*(E)$ infinite, so we may assume that $m^*(E)$
is finite.
Let $\varepsilon > 0$. By definition of outer measure, there exists a
countable family $B_1, B_2, \ldots$ of boxes covering $E$ such that
$$\sum_{n=1}^\infty |B_n| \le m^*(E) + \varepsilon.$$
We use the $\varepsilon/2^n$ trick again. We can enlarge each of these
boxes $B_n$ to an open box $B'_n$ such that
$|B'_n| \le |B_n| + \varepsilon/2^n$. Then the set
$\bigcup_{n=1}^\infty B'_n$, being a union of open sets, is itself open,
and contains $E$; and
$$\sum_{n=1}^\infty |B'_n| \le m^*(E) + \varepsilon + \sum_{n=1}^\infty \varepsilon/2^n = m^*(E) + 2\varepsilon.$$
By countable subadditivity, this implies that
$$m^*\Big(\bigcup_{n=1}^\infty B'_n\Big) \le m^*(E) + 2\varepsilon$$
and thus
$$\inf_{E \subset U,\ U \text{ open}} m^*(U) \le m^*(E) + 2\varepsilon.$$
As $\varepsilon > 0$ was arbitrary, we obtain the claim.
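The bookkeeping behind the $\varepsilon/2^n$ trick is just the geometric
series $\sum_{n \ge 1} \varepsilon/2^n = \varepsilon$. As a small sanity
check, the Python sketch below pads a finite list of box volumes and
confirms that the total padding stays below $\varepsilon$; the volumes
$1/n^2$ and the name `padded_total` are arbitrary choices made for this
illustration.

```python
from fractions import Fraction

def padded_total(volumes, eps):
    """Sum of |B_n| + eps/2^n over a finite list of box volumes |B_1|, ..., |B_N|."""
    return sum(v + eps / 2**n for n, v in enumerate(volumes, start=1))

volumes = [Fraction(1, n * n) for n in range(1, 51)]  # sample volumes |B_n| = 1/n^2
eps = Fraction(1, 100)
extra = padded_total(volumes, eps) - sum(volumes)
print(extra < eps)
```

The extra volume after $N$ enlargements is exactly
$\varepsilon(1 - 2^{-N})$, so it never reaches $\varepsilon$ no matter
how many boxes are enlarged.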
Exercise 1.2.6. Give an example to show that the reverse statement
$$m^*(E) = \sup_{U \subset E,\ U \text{ open}} m^*(U)$$
is false. (For the corrected version of this statement, see Exercise
1.2.15.)
1.2.2. Lebesgue measurability. We now deﬁne the notion of a
Lebesgue measurable set as one which can be eﬃciently contained
in open sets in the sense of Deﬁnition 1.2.2, and set out their basic
properties.
First, we show that there are plenty of Lebesgue measurable sets.
Lemma 1.2.13 (Existence of Lebesgue measurable sets).
(i) Every open set is Lebesgue measurable.
(ii) Every closed set is Lebesgue measurable.
(iii) Every set of Lebesgue outer measure zero is measurable.
(Such sets are called null sets.)
(iv) The empty set $\emptyset$ is Lebesgue measurable.
(v) If $E \subset \mathbf{R}^d$ is Lebesgue measurable, then so is its
complement $\mathbf{R}^d \setminus E$.
(vi) If $E_1, E_2, E_3, \ldots \subset \mathbf{R}^d$ are a sequence of
Lebesgue measurable sets, then the union $\bigcup_{n=1}^\infty E_n$ is
Lebesgue measurable.
(vii) If $E_1, E_2, E_3, \ldots \subset \mathbf{R}^d$ are a sequence of
Lebesgue measurable sets, then the intersection
$\bigcap_{n=1}^\infty E_n$ is Lebesgue measurable.
Proof. Claim (i) is obvious from definition, as are Claims (iii) and
(iv).
To prove Claim (vi), we use the $\varepsilon/2^n$ trick. Let
$\varepsilon > 0$ be arbitrary. By hypothesis, each $E_n$ is contained
in an open set $U_n$ whose difference $U_n \setminus E_n$ has Lebesgue
outer measure at most $\varepsilon/2^n$. By countable subadditivity,
this implies that $\bigcup_{n=1}^\infty E_n$ is contained in
$\bigcup_{n=1}^\infty U_n$, and the difference
$(\bigcup_{n=1}^\infty U_n) \setminus (\bigcup_{n=1}^\infty E_n)$ has
Lebesgue outer measure at most $\varepsilon$. The set
$\bigcup_{n=1}^\infty U_n$, being a union of open sets, is itself open,
and the claim follows.
Now we establish Claim (ii). Every closed set $E$ is the countable
union of closed and bounded sets (by intersecting $E$ with, say, the
closed balls $\overline{B(0,n)}$ of radius $n$ for $n = 1, 2, 3, \ldots$),
so by (vi), it suffices to verify the claim when $E$ is closed and
bounded, hence compact by the Heine-Borel theorem. Note that the
boundedness of $E$ implies that $m^*(E)$ is finite.
Let $\varepsilon > 0$. By outer regularity (Lemma 1.2.12), we can find
an open set $U$ containing $E$ such that
$m^*(U) \le m^*(E) + \varepsilon$. It suffices to show that
$m^*(U \setminus E) \le \varepsilon$.
The set $U \setminus E$ is open, and so by Lemma 1.2.11 is the countable
union $\bigcup_{n=1}^\infty Q_n$ of almost disjoint closed cubes. By
Lemma 1.2.9, $m^*(U \setminus E) = \sum_{n=1}^\infty |Q_n|$. So it will
suffice to show that $\sum_{n=1}^N |Q_n| \le \varepsilon$ for every
finite $N$.
The set $\bigcup_{n=1}^N Q_n$ is a finite union of closed cubes and is
thus closed. It is disjoint from the compact set $E$, so by Exercise
1.2.4 followed by Lemma 1.2.5 one has
$$m^*\Big(E \cup \bigcup_{n=1}^N Q_n\Big) = m^*(E) + m^*\Big(\bigcup_{n=1}^N Q_n\Big).$$
By monotonicity, the left-hand side is at most $m^*(U)$, which is in
turn at most $m^*(E) + \varepsilon$. Since $m^*(E)$ is finite, we may
cancel it and conclude that
$m^*(\bigcup_{n=1}^N Q_n) \le \varepsilon$, as required.
Next, we establish Claim (v). If $E$ is Lebesgue measurable, then
for every $n$ we can find an open set $U_n$ containing $E$ such that
$m^*(U_n \setminus E) \le 1/n$. Letting $F_n$ be the complement of
$U_n$, we conclude that the complement $\mathbf{R}^d \setminus E$ of $E$
contains all of the $F_n$, and that
$m^*((\mathbf{R}^d \setminus E) \setminus F_n) \le 1/n$. If we let
$F := \bigcup_{n=1}^\infty F_n$, then $\mathbf{R}^d \setminus E$
contains $F$, and from monotonicity
$m^*((\mathbf{R}^d \setminus E) \setminus F) = 0$, thus
$\mathbf{R}^d \setminus E$ is the union of $F$ and a set of Lebesgue
outer measure zero. But $F$ is in turn the union of countably many
closed sets $F_n$. The claim now follows from (ii), (iii), (vi).
Finally, Claim (vii) follows from (v), (vi), and de Morgan's laws
$(\bigcup_{\alpha \in A} E_\alpha)^c = \bigcap_{\alpha \in A} E_\alpha^c$,
$(\bigcap_{\alpha \in A} E_\alpha)^c = \bigcup_{\alpha \in A} E_\alpha^c$
(which work for infinite unions and intersections without any
difficulty).
Informally, the above lemma asserts (among other things) that if
one starts with such basic subsets of $\mathbf{R}^d$ as open or closed
sets and then takes at most countably many boolean operations, one will
always end up with a Lebesgue measurable set. This is already enough
to ensure that the majority of sets that one actually encounters in real
analysis will be Lebesgue measurable. (Nevertheless, using the axiom
of choice one can construct sets that are not Lebesgue measurable; we
will see an example of this later. As a consequence, we cannot
generalise the countable closure properties here to uncountable closure
properties.)
Remark 1.2.14. The properties (iv), (v), (vi) of Lemma 1.2.13 assert
that the collection of Lebesgue measurable subsets of $\mathbf{R}^d$
forms a $\sigma$-algebra, which is a strengthening of the more classical
concept of a boolean algebra. We will study abstract $\sigma$-algebras
in more detail in Section 1.4.
Note how Lemma 1.2.13 is significantly stronger than the
counterpart for Jordan measurability (Exercise 1.1.6), in particular
by allowing countably many boolean operations instead of just finitely
many. This is one of the main reasons why we use Lebesgue measure
instead of Jordan measure.
Exercise 1.2.7 (Criteria for measurability). Let
$E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) (Outer approximation by open) For every $\varepsilon > 0$, one can
contain $E$ in an open set $U$ with
$m^*(U \setminus E) \le \varepsilon$.
(iii) (Almost open) For every $\varepsilon > 0$, one can find an open
set $U$ such that $m^*(U \Delta E) \le \varepsilon$. (In other words,
$E$ differs from an open set by a set of outer measure at most
$\varepsilon$.)
(iv) (Inner approximation by closed) For every $\varepsilon > 0$, one
can find a closed set $F$ contained in $E$ with
$m^*(E \setminus F) \le \varepsilon$.
(v) (Almost closed) For every $\varepsilon > 0$, one can find a closed
set $F$ such that $m^*(F \Delta E) \le \varepsilon$. (In other words,
$E$ differs from a closed set by a set of outer measure at most
$\varepsilon$.)
(vi) (Almost measurable) For every $\varepsilon > 0$, one can find a
Lebesgue measurable set $E_\varepsilon$ such that
$m^*(E_\varepsilon \Delta E) \le \varepsilon$. (In other words, $E$
differs from a measurable set by a set of outer measure at most
$\varepsilon$.)
(Hint: Some of these deductions are either trivial or very easy. To
deduce (i) from (vi), use the $\varepsilon/2^n$ trick to show that $E$
is contained in a Lebesgue measurable set $E'_\varepsilon$ with
$m^*(E'_\varepsilon \Delta E) \le \varepsilon$, and then take countable
intersections to show that $E$ differs from a Lebesgue measurable set
by a null set.)
Exercise 1.2.8. Show that every Jordan measurable set is Lebesgue
measurable.
Exercise 1.2.9 (Middle thirds Cantor set). Let $I_0 := [0,1]$ be the
unit interval, let $I_1 := [0,1/3] \cup [2/3,1]$ be $I_0$ with the
interior of the middle third interval removed, let
$I_2 := [0,1/9] \cup [2/9,1/3] \cup [2/3,7/9] \cup [8/9,1]$ be $I_1$
with the interior of the middle third of each of the two intervals of
$I_1$ removed, and so forth. More formally, write
$$I_n := \bigcup_{a_1,\ldots,a_n \in \{0,2\}} \left[\sum_{i=1}^n \frac{a_i}{3^i},\ \sum_{i=1}^n \frac{a_i}{3^i} + \frac{1}{3^n}\right].$$
Let $C := \bigcap_{n=1}^\infty I_n$ be the intersection of all the
elementary sets $I_n$. Show that $C$ is compact, uncountable, and a
null set.
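As a computational aid for Exercise 1.2.9 (not a solution), the digit
formula for $I_n$ can be evaluated exactly. The Python sketch below
builds the $2^n$ intervals of $I_n$ and adds up their lengths; the
function names `cantor_stage` and `stage_measure` are our own.

```python
from fractions import Fraction
from itertools import product

def cantor_stage(n):
    """Intervals of I_n as sorted (left, right) pairs, built from digits a_1, ..., a_n in {0, 2}."""
    width = Fraction(1, 3**n)
    intervals = []
    for digits in product((0, 2), repeat=n):
        # Left endpoint: sum of a_i / 3^i for i = 1, ..., n.
        left = sum((Fraction(a, 3**i) for i, a in enumerate(digits, start=1)), Fraction(0))
        intervals.append((left, left + width))
    return sorted(intervals)

def stage_measure(n):
    """Elementary measure of I_n (its intervals are pairwise disjoint)."""
    return sum(hi - lo for lo, hi in cantor_stage(n))

for n in range(5):
    print(n, stage_measure(n))
```

One finds $m(I_n) = (2/3)^n$ for the stages computed here, which
suggests how monotonicity might be used to show that $C$ is a null set.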
Exercise 1.2.10. (This exercise presumes some familiarity with
point-set topology.) Show that the half-open interval $[0,1)$ cannot be
expressed as the countable union of disjoint closed intervals. (Hint: It
is easy to prevent $[0,1)$ from being expressed as the finite union of
disjoint closed intervals. Next, assume for sake of contradiction that
$[0,1)$ is the union of infinitely many closed intervals, and conclude
that $[0,1)$ is homeomorphic to the middle thirds Cantor set, which is
absurd. It is also possible to proceed using the Baire category
theorem (§1.7 of An epsilon of room, Vol. I).) For an additional
challenge, show that $[0,1)$ cannot be expressed as the countable union
of disjoint closed sets.
Now we look at the Lebesgue measure $m(E)$ of a Lebesgue measurable
set $E$, which is defined to equal its Lebesgue outer measure $m^*(E)$.
If $E$ is Jordan measurable, we see from (1.2) that the Lebesgue
measure and the Jordan measure of $E$ coincide, thus Lebesgue measure
extends Jordan measure. This justifies the use of the notation $m(E)$
to denote both Lebesgue measure of a Lebesgue measurable set, and
Jordan measure of a Jordan measurable set (as well as elementary
measure of an elementary set).
Lebesgue measure obeys significantly better properties than Lebesgue
outer measure, when restricted to Lebesgue measurable sets:
Lemma 1.2.15 (The measure axioms).
(i) (Empty set) $m(\emptyset) = 0$.
(ii) (Countable additivity) If $E_1, E_2, \ldots \subset \mathbf{R}^d$
is a countable sequence of disjoint Lebesgue measurable sets, then
$m(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty m(E_n)$.
Proof. The first claim is trivial, so we focus on the second. We first
deal with the easy case when all of the $E_n$ are compact. By repeated
use of Lemma 1.2.5 and Exercise 1.2.4, we have
$$m\Big(\bigcup_{n=1}^N E_n\Big) = \sum_{n=1}^N m(E_n).$$
Using monotonicity, we conclude that
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \ge \sum_{n=1}^N m(E_n).$$
(We can use $m$ instead of $m^*$ throughout this argument, thanks to
Lemma 1.2.13.) Sending $N \to \infty$ we obtain
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \ge \sum_{n=1}^\infty m(E_n).$$
On the other hand, from countable subadditivity one has
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \le \sum_{n=1}^\infty m(E_n),$$
and the claim follows.
Next, we handle the case when the $E_n$ are bounded but not
necessarily compact. We use the $\varepsilon/2^n$ trick. Let
$\varepsilon > 0$. Applying Exercise 1.2.7, we know that each $E_n$ is
the union of a compact set $K_n$ and a set of outer measure at most
$\varepsilon/2^n$. Thus
$$m(E_n) \le m(K_n) + \varepsilon/2^n$$
and hence
$$\sum_{n=1}^\infty m(E_n) \le \Big(\sum_{n=1}^\infty m(K_n)\Big) + \varepsilon.$$
Finally, from the compact case of this lemma we already know that
$$m\Big(\bigcup_{n=1}^\infty K_n\Big) = \sum_{n=1}^\infty m(K_n)$$
while from monotonicity
$$m\Big(\bigcup_{n=1}^\infty K_n\Big) \le m\Big(\bigcup_{n=1}^\infty E_n\Big).$$
Putting all this together we see that
$$\sum_{n=1}^\infty m(E_n) \le m\Big(\bigcup_{n=1}^\infty E_n\Big) + \varepsilon$$
for every $\varepsilon > 0$, while from countable subadditivity we have
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \le \sum_{n=1}^\infty m(E_n).$$
The claim follows.
Finally, we handle the case when the $E_n$ are not assumed to be
bounded or closed. Here, the basic idea is to decompose each $E_n$ as a
countable disjoint union of bounded Lebesgue measurable sets. First,
decompose $\mathbf{R}^d$ as the countable disjoint union
$\mathbf{R}^d = \bigcup_{m=1}^\infty A_m$ of bounded measurable sets
$A_m$; for instance one could take the annuli
$A_m := \{x \in \mathbf{R}^d : m-1 \le |x| < m\}$. Then each $E_n$ is
the countable disjoint union of the bounded measurable sets
$E_n \cap A_m$ for $m = 1, 2, \ldots$, and thus
$$m(E_n) = \sum_{m=1}^\infty m(E_n \cap A_m)$$
by the previous arguments. In a similar vein,
$\bigcup_{n=1}^\infty E_n$ is the countable disjoint union of the
bounded measurable sets $E_n \cap A_m$ for $n, m = 1, 2, \ldots$, and
thus
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) = \sum_{n=1}^\infty \sum_{m=1}^\infty m(E_n \cap A_m),$$
and the claim follows.
From Lemma 1.2.15 one of course can conclude finite additivity
$$m(E_1 \cup \ldots \cup E_k) = m(E_1) + \ldots + m(E_k)$$
whenever $E_1, \ldots, E_k \subset \mathbf{R}^d$ are disjoint Lebesgue
measurable sets. We also have another important result:
Exercise 1.2.11 (Monotone convergence theorem for measurable
sets).
(i) (Upward monotone convergence) Let
$E_1 \subset E_2 \subset \ldots \subset \mathbf{R}^d$ be a countable
non-decreasing sequence of Lebesgue measurable sets. Show that
$m(\bigcup_{n=1}^\infty E_n) = \lim_{n \to \infty} m(E_n)$.
(Hint: Express $\bigcup_{n=1}^\infty E_n$ as the countable union of the
lacunae $E_n \setminus \bigcup_{n'=1}^{n-1} E_{n'}$.)
(ii) (Downward monotone convergence) Let
$\mathbf{R}^d \supset E_1 \supset E_2 \supset \ldots$ be a countable
non-increasing sequence of Lebesgue measurable sets. If at least one of
the $m(E_n)$ is finite, show that
$m(\bigcap_{n=1}^\infty E_n) = \lim_{n \to \infty} m(E_n)$.
(iii) Give a counterexample to show that the hypothesis that at
least one of the $m(E_n)$ is finite in the downward monotone
convergence theorem cannot be dropped.
Exercise 1.2.12. Show that any map $E \mapsto m(E)$ from Lebesgue
measurable sets to elements of $[0, +\infty]$ that obeys the above empty
set and countable additivity axioms will also obey the monotonicity and
countable subadditivity axioms from Exercise 1.2.3, when restricted
to Lebesgue measurable sets of course.
Exercise 1.2.13. We say that a sequence $E_n$ of sets in $\mathbf{R}^d$
converges pointwise to another set $E$ in $\mathbf{R}^d$ if the
indicator functions $1_{E_n}$ converge pointwise to $1_E$.
(i) Show that if the $E_n$ are all Lebesgue measurable, and converge
pointwise to $E$, then $E$ is Lebesgue measurable also.
(Hint: use the identity $1_E(x) = \liminf_{n \to \infty} 1_{E_n}(x)$ or
$1_E(x) = \limsup_{n \to \infty} 1_{E_n}(x)$ to write $E$ in terms of
countable unions and intersections of the $E_n$.)
(ii) (Dominated convergence theorem) Suppose that the $E_n$ are
all contained in another Lebesgue measurable set $F$ of finite
measure. Show that $m(E_n)$ converges to $m(E)$. (Hint: use
the upward and downward monotone convergence theorems,
Exercise 1.2.11.)
(iii) Give a counterexample to show that the dominated convergence
theorem fails if the $E_n$ are not contained in a set of
finite measure, even if we assume that the $m(E_n)$ are all
uniformly bounded.
In later sections we will generalise the monotone and dominated
convergence theorems to measurable functions instead of measurable
sets; see Theorem 1.4.44 and Theorem 1.4.49.
Exercise 1.2.14. Let $E \subset \mathbf{R}^d$. Show that $E$ is
contained in a Lebesgue measurable set of measure exactly equal to
$m^*(E)$.
Exercise 1.2.15 (Inner regularity). Let $E \subset \mathbf{R}^d$ be
Lebesgue measurable. Show that
$$m(E) = \sup_{K \subset E,\ K \text{ compact}} m(K).$$
Remark 1.2.16. The inner and outer regularity properties of measure
can be used to define the concept of a Radon measure (see §1.10
of An epsilon of room, Vol. I).
Exercise 1.2.16 (Criteria for finite measure). Let
$E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable with finite measure.
(ii) (Outer approximation by open) For every $\varepsilon > 0$, one can
contain $E$ in an open set $U$ of finite measure with
$m^*(U \setminus E) \le \varepsilon$.
(iii) (Almost open bounded) $E$ differs from a bounded open set
by a set of arbitrarily small Lebesgue outer measure. (In
other words, for every $\varepsilon > 0$ there exists a bounded open
set $U$ such that $m^*(E \Delta U) \le \varepsilon$.)
(iv) (Inner approximation by compact) For every $\varepsilon > 0$, one
can find a compact set $F$ contained in $E$ with
$m^*(E \setminus F) \le \varepsilon$.
(v) (Almost compact) $E$ differs from a compact set by a set of
arbitrarily small Lebesgue outer measure.
(vi) (Almost bounded measurable) $E$ differs from a bounded
Lebesgue measurable set by a set of arbitrarily small Lebesgue
outer measure.
(vii) (Almost finite measure) $E$ differs from a Lebesgue measurable
set with finite measure by a set of arbitrarily small
Lebesgue outer measure.
(viii) (Almost elementary) $E$ differs from an elementary set by a
set of arbitrarily small Lebesgue outer measure.
(ix) (Almost dyadically elementary) For every $\varepsilon > 0$, there
exists an integer $n$ and a finite union $F$ of closed dyadic cubes of
sidelength $2^{-n}$ such that $m^*(E \Delta F) \le \varepsilon$.
One can interpret the equivalence of (i) and (ix) in the above
exercise as asserting that Lebesgue measurable sets are those which look
(locally) “pixelated” at sufficiently fine scales. This will be
formalised in later sections with the Lebesgue differentiation theorem
(Exercise 1.6.24).
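To make the “pixelated” picture concrete, one can approximate a simple
set by dyadic squares. The Python sketch below is an illustration for
one specific set of our own choosing, the closed unit disk in
$\mathbf{R}^2$: it counts the closed dyadic squares of sidelength
$2^{-n}$ contained in the disk, and those grid squares in $[-1,1]^2$
that meet it. The function name `dyadic_disk_areas` is hypothetical.

```python
import math

def dyadic_disk_areas(n):
    """Total area of the dyadic squares of sidelength 2^-n inside, and meeting, the closed unit disk."""
    scale = 2**n
    inner = outer = 0
    for i in range(-scale, scale):
        for j in range(-scale, scale):
            # Largest and smallest values of |x| on [i/2^n, (i+1)/2^n], in units of 2^-n.
            far_x = max(abs(i), abs(i + 1))
            far_y = max(abs(j), abs(j + 1))
            near_x = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            near_y = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            if far_x**2 + far_y**2 <= scale**2:
                inner += 1  # the farthest corner is in the disk, so the whole square is
            if near_x**2 + near_y**2 <= scale**2:
                outer += 1  # the nearest point is in the disk, so the square meets it
    cell = 1 / scale**2
    return inner * cell, outer * cell

for n in (2, 4, 6):
    print(n, dyadic_disk_areas(n))
```

If $F$ denotes the union of the inner squares, then $m(E \Delta F)$ is at
most the gap between the two areas, and this gap shrinks as $n$ grows,
in the spirit of criterion (ix).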
Exercise 1.2.17 (Carathéodory criterion, one direction). Let
$E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) For every elementary set $A$, one has
$m(A) = m^*(A \cap E) + m^*(A \setminus E)$.
(iii) For every box $B$, one has
$|B| = m^*(B \cap E) + m^*(B \setminus E)$.
Exercise 1.2.18 (Inner measure). Let $E \subset \mathbf{R}^d$ be a
bounded set. Define the Lebesgue inner measure $m_*(E)$ of $E$ by the
formula
$$m_*(E) := m(A) - m^*(A \setminus E)$$
for any elementary set $A$ containing $E$.
(i) Show that this definition is well defined, i.e. that if $A, A'$ are
two elementary sets containing $E$, then $m(A) - m^*(A \setminus E)$ is
equal to $m(A') - m^*(A' \setminus E)$.
(ii) Show that $m_*(E) \le m^*(E)$, and that equality holds if and
only if $E$ is Lebesgue measurable.
Define a $G_\delta$ set to be a countable intersection
$\bigcap_{n=1}^\infty U_n$ of open sets, and an $F_\sigma$ set to be a
countable union $\bigcup_{n=1}^\infty F_n$ of closed sets.
Exercise 1.2.19. Let $E \subset \mathbf{R}^d$. Show that the following
are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) $E$ is a $G_\delta$ set with a null set removed.
(iii) $E$ is the union of an $F_\sigma$ set and a null set.
Remark 1.2.17. From the above exercises, we see that when describing
what it means for a set to be Lebesgue measurable, there is
a tradeoff between the type of approximation one is willing to bear,
and the type of things one can say about the approximation. If one
is only willing to approximate to within a null set, then one can only
say that a measurable set is approximated by a $G_\delta$ or an
$F_\sigma$ set, which is a fairly weak amount of structure. If one is
willing to add on an epsilon of error (as measured in the Lebesgue outer
measure), one can make a measurable set open; dually, if one is willing
to take away an epsilon of error, one can make a measurable set closed.
Finally, if one is willing to both add and subtract an epsilon of error,
then one can make a measurable set (of finite measure) elementary, or
even a finite union of dyadic cubes.
Exercise 1.2.20 (Translation invariance). If $E \subset \mathbf{R}^d$ is
Lebesgue measurable, show that $E + x$ is Lebesgue measurable for any
$x \in \mathbf{R}^d$, and that $m(E + x) = m(E)$.
Exercise 1.2.21 (Change of variables). If $E \subset \mathbf{R}^d$ is
Lebesgue measurable, and $T : \mathbf{R}^d \to \mathbf{R}^d$ is a linear
transformation, show that $T(E)$ is Lebesgue measurable, and that
$m(T(E)) = |\det T|\, m(E)$. We caution that if
$T : \mathbf{R}^d \to \mathbf{R}^{d'}$ is a linear map to a space
$\mathbf{R}^{d'}$ of strictly smaller dimension than $\mathbf{R}^d$,
then $T(E)$ need not be Lebesgue measurable; see Exercise 1.2.27.
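The formula $m(T(E)) = |\det T|\, m(E)$ can be sanity-checked in the
simplest case, where $E$ is the unit square in $\mathbf{R}^2$ and $T(E)$
is a parallelogram. The Python sketch below is only a check of this one
instance and not a proof; the matrix `T` and the helper names are
arbitrary choices made for the illustration, and the area of the image
is computed with the shoelace formula for polygons.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = T
    return a * d - b * c

def apply(T, p):
    """Image of the point p = (x, y) under the linear map T."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around the boundary."""
    s = 0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
T = ((2, 1), (-1, 3))  # an arbitrary invertible example
image = [apply(T, p) for p in unit_square]
print(shoelace_area(image), abs(det2(T)) * shoelace_area(unit_square))
```

Both printed values agree for this choice of `T`; the exercise asks for
the general statement, for which one natural route is to treat boxes
first and then pass to arbitrary measurable sets.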
Exercise 1.2.22. Let $d, d' \ge 1$ be natural numbers.
(i) If $E \subset \mathbf{R}^d$ and $F \subset \mathbf{R}^{d'}$, show
that $(m^{d+d'})^*(E \times F) \le (m^d)^*(E)\,(m^{d'})^*(F)$, where
$(m^d)^*$ denotes $d$-dimensional Lebesgue outer measure, etc.
(ii) Let $E \subset \mathbf{R}^d$, $F \subset \mathbf{R}^{d'}$ be
Lebesgue measurable sets. Show that
$E \times F \subset \mathbf{R}^{d+d'}$ is Lebesgue measurable, with
$m^{d+d'}(E \times F) = m^d(E) \cdot m^{d'}(F)$. (Note that we allow $E$
or $F$ to have infinite measure, and so one may have to divide into
cases or take advantage of the monotone convergence theorem for
Lebesgue measure, Exercise 1.2.11.)
Exercise 1.2.23 (Uniqueness of Lebesgue measure). Show that Lebesgue
measure $E \mapsto m(E)$ is the only map from Lebesgue measurable sets
to $[0, +\infty]$ that obeys the following axioms:
(i) (Empty set) $m(\emptyset) = 0$.
(ii) (Countable additivity) If $E_1, E_2, \ldots \subset \mathbf{R}^d$
is a countable sequence of disjoint Lebesgue measurable sets, then
$m(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty m(E_n)$.
(iii) (Translation invariance) If $E$ is Lebesgue measurable and
$x \in \mathbf{R}^d$, then $m(E + x) = m(E)$.
(iv) (Normalisation) $m([0,1]^d) = 1$.
Hint: First show that $m$ must match elementary measure on elementary
sets, then show that $m$ is bounded by outer measure.
Exercise 1.2.24 (Lebesgue measure as the completion of elementary
measure). The purpose of the following exercise is to indicate how
Lebesgue measure can be viewed as a metric completion of elementary
measure in some sense. To avoid some technicalities we will not work
in all of $\mathbf{R}^d$, but in some fixed elementary set $A$ (e.g.
$A = [0,1]^d$).
(i) Let $2^A := \{E : E \subset A\}$ be the power set of $A$. We say
that two sets $E, F \in 2^A$ are equivalent if $E \Delta F$ is a null
set. Show that this is an equivalence relation.
(ii) Let $2^A/\sim$ be the set of equivalence classes
$[E] := \{F \in 2^A : E \sim F\}$ of $2^A$ with respect to the above
equivalence relation. Define a distance
$d : 2^A/\sim \times\, 2^A/\sim\ \to \mathbf{R}^+$ between two
equivalence classes $[E], [E']$ by defining
$d([E], [E']) := m^*(E \Delta E')$. Show that this distance is
well-defined (in the sense that $m^*(E \Delta E') = m^*(F \Delta F')$
whenever $[E] = [F]$ and $[E'] = [F']$) and gives $2^A/\sim$ the
structure of a complete metric space.
(iii) Let $\mathcal{E} \subset 2^A$ be the elementary subsets of $A$,
and let $\mathcal{L} \subset 2^A$ be the Lebesgue measurable subsets of
$A$. Show that $\mathcal{L}/\sim$ is the closure of $\mathcal{E}/\sim$
with respect to the metric defined above. In particular,
$\mathcal{L}/\sim$ is a complete metric space that contains
$\mathcal{E}/\sim$ as a dense subset; in other words,
$\mathcal{L}/\sim$ is a metric completion of $\mathcal{E}/\sim$.
(iv) Show that Lebesgue measure $m : \mathcal{L} \to \mathbf{R}^+$
descends to a continuous function
$m : \mathcal{L}/\sim\ \to \mathbf{R}^+$, which by abuse of notation we
shall still call $m$. Show that
$m : \mathcal{L}/\sim\ \to \mathbf{R}^+$ is the unique continuous
extension of the analogous elementary measure function
$m : \mathcal{E}/\sim\ \to \mathbf{R}^+$ to $\mathcal{L}/\sim$.
For a further discussion of how measures can be viewed as completions
of elementary measures, see §2.1 of An epsilon of room, Vol. I.
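On elementary sets the distance $d([E],[F]) = m^*(E \Delta F)$ of
Exercise 1.2.24 can be computed directly. The Python sketch below works
in dimension one with finite unions of half-open intervals
$[\mathrm{lo}, \mathrm{hi})$; this representation and the function names
are assumptions made for the illustration. It evaluates the distance
between three sample sets, which can then be compared against the
metric axioms.

```python
from fractions import Fraction

def indicator(intervals, x):
    """True if x lies in the union of the half-open intervals [lo, hi)."""
    return any(lo <= x < hi for lo, hi in intervals)

def sym_diff_measure(E, F):
    """m(E delta F) for finite unions E, F of half-open intervals with Fraction endpoints."""
    points = sorted({p for lo, hi in E + F for p in (lo, hi)})
    total = Fraction(0)
    for left, right in zip(points, points[1:]):
        # Both indicators are constant on [left, right), so testing the midpoint suffices.
        mid = (left + right) / 2
        if indicator(E, mid) != indicator(F, mid):
            total += right - left
    return total

E = [(Fraction(0), Fraction(1, 2))]
F = [(Fraction(1, 4), Fraction(3, 4))]
G = [(Fraction(1, 2), Fraction(1))]
print(sym_diff_measure(E, F), sym_diff_measure(F, G), sym_diff_measure(E, G))
```

In this example the triangle inequality
$d(E,G) \le d(E,F) + d(F,G)$ holds with equality, since
$E \Delta G$ is the disjoint union of $E \Delta F$ and $F \Delta G$.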
Exercise 1.2.25. Define a continuously differentiable curve in
$\mathbf{R}^d$ to be a set of the form $\{\gamma(t) : a \le t \le b\}$
where $[a,b]$ is a closed interval and
$\gamma : [a,b] \to \mathbf{R}^d$ is a continuously differentiable
function.
(i) If $d \ge 2$, show that every continuously differentiable curve
has Lebesgue measure zero. (Why is the condition $d \ge 2$
necessary?)
(ii) Conclude that if $d \ge 2$, then the unit cube $[0,1]^d$ cannot
be covered by countably many continuously differentiable
curves.
We remark that if the curve is only assumed to be continuous, rather
than continuously differentiable, then these claims fail, thanks to the
existence of space-filling curves.
1.2.3. Non-measurable sets. In the previous section we have set
out a rich theory of Lebesgue measure, which enjoys many nice
properties when applied to Lebesgue measurable sets.
Thus far, we have not ruled out the possibility that every single
set is Lebesgue measurable. There is good reason for this: a famous
theorem of Solovay [So1970] asserts that, if one is willing to drop the
axiom of choice, there exist models of set theory in which all subsets
of $\mathbf{R}^d$ are measurable. So any demonstration of the existence
of non-measurable sets must use the axiom of choice in some essential
way.
That said, we can give an informal (and highly non-rigorous)
motivation as to why non-measurable sets should exist, using intuition
from probability theory rather than from set theory. The starting
point is the observation that Lebesgue sets of finite measure (and
in particular, bounded Lebesgue sets) have to be “almost elementary”,
in the sense of Exercise 1.2.16. So all we need to do to build
a non-measurable set is to exhibit a bounded set which is not almost
elementary. Intuitively, we want to build a set which has oscillatory
structure even at arbitrarily fine scales.
We will non-rigorously do this as follows. We will work inside
the unit interval $[0,1]$. For each $x \in [0,1]$, we imagine that we
flip a coin to give either heads or tails (with an independent coin flip
for each $x$), and let $E \subset [0,1]$ be the set of all the
$x \in [0,1]$ for which the coin flip came up heads. We suppose for
contradiction that $E$ is Lebesgue measurable. Intuitively, since each
$x$ had a 50% chance of being heads, $E$ should occupy about “half” of
$[0,1]$; applying the law of large numbers (see e.g. [Ta2009, §1.4]) in
an extremely non-rigorous fashion, we thus expect $m(E)$ to equal $1/2$.
Moreover, given any subinterval $[a,b]$ of $[0,1]$, the same reasoning
leads us to expect that $E \cap [a,b]$ should occupy about half of
$[a,b]$, so that $m(E \cap [a,b])$ should be $|[a,b]|/2$. More
generally, given any elementary set $F$ in $[0,1]$, we should have
$m(E \cap F) = m(F)/2$. This makes it very hard for $E$ to be
approximated by an elementary set; indeed, a little algebra then shows
that $m(E \Delta F) = 1/2$ for any elementary $F \subset [0,1]$. Thus
$E$ is not Lebesgue measurable.
Unfortunately, the above argument is terribly non-rigorous for a
number of reasons, not the least of which is that it uses an uncountable
number of coin flips, and the rigorous probabilistic theory that one
would have to use to model such a system of random variables is too
weak$^{12}$ to be able to assign meaningful probabilities to such events
as “$E$ is Lebesgue measurable”. So we now turn to more rigorous
arguments that establish the existence of non-measurable sets. The
arguments will be fairly simple, but the sets constructed are somewhat
artificial in nature.
Proposition 1.2.18. There exists a subset E ⊂ [0, 1] which is not
Lebesgue measurable.
Proof. We use the fact that the rationals $\mathbf{Q}$ are an additive
subgroup of the reals $\mathbf{R}$, and so partition the reals
$\mathbf{R}$ into disjoint cosets $x + \mathbf{Q}$. This creates a
quotient group
$\mathbf{R}/\mathbf{Q} := \{x + \mathbf{Q} : x \in \mathbf{R}\}$. Each
coset $C$ of $\mathbf{R}/\mathbf{Q}$ is dense in $\mathbf{R}$, and so
has a non-empty intersection with $[0,1]$. Applying the axiom of choice,
we may thus find an element $x_C \in C \cap [0,1]$ for each
$C \in \mathbf{R}/\mathbf{Q}$. We then let
$E := \{x_C : C \in \mathbf{R}/\mathbf{Q}\}$ be the collection of all
these coset representatives. By construction, $E \subset [0,1]$.
$^{12}$ For some further discussion of this point, see [Ta2009, §1.10].
Let $y$ be any element of $[0,1]$. Then it must lie in some coset $C$
of $\mathbf{R}/\mathbf{Q}$, and thus differs from $x_C$ by some rational
number in $[-1,1]$. In other words, we have
(1.4) $$[0,1] \subset \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q).$$
On the other hand, we clearly have
(1.5) $$\bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q) \subset [-1,2].$$
Also, the different translates $E + q$ are disjoint, because $E$
contains only one element from each coset of $\mathbf{Q}$.
We claim that $E$ is not Lebesgue measurable. To see this, suppose
for contradiction that $E$ was Lebesgue measurable. Then the
translates $E + q$ would also be Lebesgue measurable. By countable
additivity, we thus have
$$m\Big(\bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q)\Big) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E + q),$$
and thus by translation invariance and (1.4), (1.5)
$$1 \le \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E) \le 3.$$
On the other hand, the sum $\sum_{q \in \mathbf{Q} \cap [-1,1]} m(E)$
is either zero (if $m(E) = 0$) or infinite (if $m(E) > 0$), leading to
the desired contradiction.
Exercise 1.2.26 (Outer measure is not finitely additive). Show that
there exist disjoint bounded subsets $E, F$ of the real line such that
$m^*(E \cup F) \ne m^*(E) + m^*(F)$. (Hint: Show that the set
constructed in the proof of the above proposition has positive outer
measure.)
Exercise 1.2.27 (Projections of measurable sets need not be measurable). Let $\pi : \mathbf{R}^2 \to \mathbf{R}$ be the coordinate projection $\pi(x,y) := x$. Show that there exists a measurable subset $E$ of $\mathbf{R}^2$ such that $\pi(E)$ is not measurable.
Remark 1.2.19. The above discussion shows that, in the presence of the axiom of choice, one cannot hope to extend Lebesgue measure to arbitrary subsets of $\mathbf{R}$ while retaining both the countable additivity and the translation invariance properties. If one drops the translation invariance requirement, then this question concerns the theory of measurable cardinals, and is not decidable from the standard ZFC axioms. On the other hand, one can construct finitely additive translation invariant extensions of Lebesgue measure to the power set of $\mathbf{R}$ by use of the Hahn-Banach theorem (§1.5 of An epsilon of room, Vol. I) to extend the integration functional, though we will not do so here.
1.3. The Lebesgue integral
In Section 1.2, we defined the Lebesgue measure $m(E)$ of a Lebesgue measurable set $E \subset \mathbf{R}^d$, and set out the basic properties of this measure. In this set of notes, we use Lebesgue measure to define the Lebesgue integral
$$\int_{\mathbf{R}^d} f(x)\,dx$$
of functions $f : \mathbf{R}^d \to \mathbf{C} \cup \{\infty\}$. Just as not every set can be measured by Lebesgue measure, not every function can be integrated by the Lebesgue integral; the function will need to be Lebesgue measurable. Furthermore, the function will either need to be unsigned (taking values in $[0,+\infty]$), or absolutely integrable.
To motivate the Lebesgue integral, let us first briefly review two simpler integration concepts. The first is that of an infinite summation
$$\sum_{n=1}^\infty c_n$$
of a sequence of numbers $c_n$, which can be viewed as a discrete analogue of the Lebesgue integral. Actually, there are two overlapping, but different, notions of summation that we wish to recall here. The first is that of the unsigned infinite sum, when the $c_n$ lie in the extended non-negative real axis $[0,+\infty]$. In this case, the infinite sum can be defined as the limit of the partial sums
$$\sum_{n=1}^\infty c_n = \lim_{N \to \infty} \sum_{n=1}^N c_n \tag{1.6}$$
or equivalently as a supremum of arbitrary finite partial sums:
$$\sum_{n=1}^\infty c_n = \sup_{A \subset \mathbf{N},\ A \text{ finite}} \sum_{n \in A} c_n. \tag{1.7}$$
The unsigned infinite sum $\sum_{n=1}^\infty c_n$ always exists, but its value may be infinite, even when each term is individually finite (consider e.g. $\sum_{n=1}^\infty 1$).
The second notion of a summation is the absolutely summable infinite sum, in which the $c_n$ lie in the complex plane $\mathbf{C}$ and obey the absolute summability condition
$$\sum_{n=1}^\infty |c_n| < \infty,$$
where the left-hand side is of course an unsigned infinite sum. When this occurs, one can show that the partial sums $\sum_{n=1}^N c_n$ converge to a limit, and we can then define the infinite sum by the same formula (1.6) as in the unsigned case, though now the sum takes values in $\mathbf{C}$ rather than $[0,+\infty]$. The absolute summability condition confers a number of useful properties that are not obeyed by sums that are merely conditionally convergent; most notably, the value of an absolutely convergent sum is unchanged if one rearranges the terms in the series in an arbitrary fashion. Note also that the absolutely summable infinite sums can be defined in terms of the unsigned infinite sums by taking advantage of the formulae
$$\sum_{n=1}^\infty c_n = \Big(\sum_{n=1}^\infty \operatorname{Re}(c_n)\Big) + i \Big(\sum_{n=1}^\infty \operatorname{Im}(c_n)\Big)$$
for complex absolutely summable $c_n$, and
$$\sum_{n=1}^\infty c_n = \sum_{n=1}^\infty c_n^+ - \sum_{n=1}^\infty c_n^-$$
for real absolutely summable $c_n$, where $c_n^+ := \max(c_n, 0)$ and $c_n^- := \max(-c_n, 0)$ are the (magnitudes of the) positive and negative parts of $c_n$.
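As a rough numerical illustration of the rearrangement property (a sketch only; the function names are hypothetical and the specific series are chosen here for illustration, not taken from the text), one can compare a conditionally convergent series with an absolutely convergent one. The alternating harmonic series converges to $\log 2$, but reordering it as two positive terms followed by one negative term is known to give $\frac{3}{2} \log 2$ instead. The same reordering applied to the absolutely summable series $\sum_n (-1)^{n+1}/n^2$ leaves its value $\pi^2/12$ unchanged.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... taken in the standard order."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))

def rearranged_harmonic(n_blocks):
    """Same terms, reordered in blocks: two positive terms, then one negative."""
    return sum(1 / (4 * k - 3) + 1 / (4 * k - 1) - 1 / (2 * k)
               for k in range(1, n_blocks + 1))

def alternating_squares(n_terms):
    """Partial sum of the absolutely summable series 1 - 1/4 + 1/9 - ..."""
    return sum((-1) ** (n + 1) / n ** 2 for n in range(1, n_terms + 1))

def rearranged_squares(n_blocks):
    """The same block reordering applied to the absolutely summable series."""
    return sum(1 / (4 * k - 3) ** 2 + 1 / (4 * k - 1) ** 2 - 1 / (2 * k) ** 2
               for k in range(1, n_blocks + 1))

if __name__ == "__main__":
    print(alternating_harmonic(200_000), rearranged_harmonic(100_000))
    print(alternating_squares(200_000), rearranged_squares(100_000))
```

The first pair of printed values should differ noticeably, while the second pair should agree to several decimal places. Partial sums only suggest the limits, of course; they do not prove them.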
In an analogous spirit, we will first define an unsigned Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ of (measurable) unsigned functions $f : \mathbf{R}^d \to [0,+\infty]$, and then use that to define the absolutely convergent Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ of absolutely integrable functions $f : \mathbf{R}^d \to \mathbf{C} \cup \{\infty\}$. (In contrast to absolutely summable series, which cannot have any infinite terms, absolutely integrable functions will be allowed to occasionally become infinite. However, as we will see, this can only happen on a set of Lebesgue measure zero.)
To define the unsigned Lebesgue integral, we now turn to another more basic notion of integration, namely the Riemann integral $\int_a^b f(x)\,dx$ of a Riemann integrable function $f : [a,b] \to \mathbf{R}$. Recall from Section 1.1 that this integral is equal to the lower Darboux integral
$$\int_a^b f(x)\,dx = \underline{\int_a^b} f(x)\,dx := \sup_{g \leq f;\ g \text{ piecewise constant}} \text{p.c.} \int_a^b g(x)\,dx.$$
(It is also equal to the upper Darboux integral; but much as the theory of Lebesgue measure is easiest to define by relying solely on outer measure and not on inner measure, the theory of the unsigned Lebesgue integral is easiest to define by relying solely on lower integrals rather than upper ones; the upper integral is somewhat problematic when dealing with “improper” integrals of functions that are unbounded or are supported on sets of infinite measure.) Compare this formula also with (1.7). The integral $\text{p.c.} \int_a^b g(x)\,dx$ is a piecewise constant integral, formed by breaking up the piecewise constant function $g$ into a finite linear combination of indicator functions $1_I$ of intervals $I$, and then measuring the length of each interval.
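To make the lower Darboux integral concrete, here is a minimal sketch (the function name `lower_darboux_sum` is hypothetical, and the code assumes that $f$ is non-decreasing on $[a,b]$, so that its infimum on each subinterval is attained at the left endpoint). It integrates one particular piecewise constant minorant $g \leq f$, built on $n$ equal subintervals; the supremum in the formula above is then approached as the partition is refined.

```python
def lower_darboux_sum(f, a, b, n):
    """Piecewise constant integral of a minorant of f on n equal subintervals.

    Assumes f is non-decreasing on [a, b]: on each subinterval the minorant g
    takes the value of f at the left endpoint, which is then the infimum of f
    there, and each piece contributes (value of g) * (length of interval).
    """
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

if __name__ == "__main__":
    square = lambda x: x * x
    for n in (1, 4, 16, 64, 256, 1024):
        # The sums increase towards the Riemann integral 1/3 of x^2 on [0, 1].
        print(n, lower_darboux_sum(square, 0.0, 1.0, n))
```

For a function that is not monotone, one would need the true infimum on each subinterval rather than the left endpoint value; the sketch does not attempt this.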
It turns out that virtually the same definition allows us to define a lower Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$ of any unsigned function $f : \mathbf{R}^d \to [0,+\infty]$, simply by replacing intervals with the more general class of Lebesgue measurable sets (and thus replacing piecewise constant functions with the more general class of simple functions). If the function is Lebesgue measurable (a concept that we will define presently), then we refer to the lower Lebesgue integral simply as the Lebesgue integral. As we shall see, it obeys all the basic properties one expects of an integral, such as monotonicity and additivity; in subsequent notes we will also see that it behaves quite well with respect to limits, by establishing the two basic convergence theorems of the unsigned Lebesgue integral, namely Fatou’s lemma (Corollary 1.4.47) and the monotone convergence theorem (Theorem 1.4.44).
Once we have the theory of the unsigned Lebesgue integral, we will then be able to define the absolutely convergent Lebesgue integral, similarly to how the absolutely convergent infinite sum can be defined using the unsigned infinite sum. This integral also obeys all the basic properties one expects, such as linearity and compatibility with the more classical Riemann integral; in subsequent notes we will see that it also obeys a fundamentally important convergence theorem, the dominated convergence theorem (Theorem 1.4.49). This convergence theorem makes the Lebesgue integral (and its abstract generalisations to other measure spaces than $\mathbf{R}^d$) particularly suitable for analysis, as well as allied fields that rely heavily on limits of functions, such as PDE, probability, and ergodic theory.
Remark 1.3.1. This is not the only route to setting up the unsigned and absolutely convergent Lebesgue integrals. For instance, one can proceed with the unsigned integral but then make an auxiliary stop at integration of functions that are bounded and are supported on a set of finite measure, before going to the absolutely convergent Lebesgue integral; see e.g. [StSk2005]. Another approach (which will not be discussed here) is to take the metric completion of the Riemann integral with respect to the $L^1$ metric.
The Lebesgue integral and Lebesgue measure can be viewed as completions of the Riemann integral and Jordan measure respectively. This means three things. Firstly, the Lebesgue theory extends the Riemann theory: every Jordan measurable set is Lebesgue measurable, and every Riemann integrable function is Lebesgue measurable, with the measures and integrals from the two theories being compatible. Conversely, the Lebesgue theory can be approximated by the Riemann theory; as we saw in Section 1.2, every Lebesgue measurable set can be approximated (in various senses) by simpler sets, such as open sets or elementary sets, and in a similar fashion, Lebesgue measurable functions can be approximated by nicer functions, such as Riemann integrable or continuous functions. Finally, the Lebesgue theory is complete in various ways; this is formalised in §1.3 of An epsilon of room, Vol. I, but the convergence theorems mentioned above already hint at this completeness. A related fact, known as Egorov’s theorem, asserts that a pointwise converging sequence of functions can be approximated as a (locally) uniformly converging sequence of functions. The facts listed here are manifestations of Littlewood’s three principles of real analysis (Section 1.3.5), which capture much of the essence of the Lebesgue theory.
1.3.1. Integration of simple functions.

Much as the Riemann integral was set up by first using the integral for piecewise constant functions, the Lebesgue integral is set up using the integral for simple functions.
Definition 1.3.2 (Simple function). A (complex-valued) simple function $f : \mathbf{R}^d \to \mathbf{C}$ is a finite linear combination
$$f = c_1 1_{E_1} + \dots + c_k 1_{E_k} \tag{1.8}$$
of indicator functions $1_{E_i}$ of Lebesgue measurable sets $E_i \subset \mathbf{R}^d$ for $i = 1, \dots, k$, where $k \geq 0$ is a natural number and $c_1, \dots, c_k \in \mathbf{C}$ are complex numbers. An unsigned simple function $f : \mathbf{R}^d \to [0,+\infty]$ is defined similarly, but with the $c_i$ taking values in $[0,+\infty]$ rather than $\mathbf{C}$.
It is clear from construction that the space $\operatorname{Simp}(\mathbf{R}^d)$ of complex-valued simple functions forms a complex vector space; also, $\operatorname{Simp}(\mathbf{R}^d)$ is closed under pointwise product $f, g \mapsto fg$ and complex conjugation $f \mapsto \overline{f}$. In short, $\operatorname{Simp}(\mathbf{R}^d)$ is a commutative $*$-algebra. Meanwhile, the space $\operatorname{Simp}^+(\mathbf{R}^d)$ of unsigned simple functions is a $[0,+\infty]$-module; it is closed under addition, and under scalar multiplication by elements in $[0,+\infty]$.
In this definition, we did not require the $E_1, \dots, E_k$ to be disjoint. However, it is easy enough to arrange this, basically by exploiting Venn diagrams (or, to use fancier language, finite boolean algebras). Indeed, any $k$ subsets $E_1, \dots, E_k$ of $\mathbf{R}^d$ partition $\mathbf{R}^d$ into $2^k$ disjoint sets, each of which is an intersection of $E_i$ or the complement $\mathbf{R}^d \backslash E_i$ for $i = 1, \dots, k$ (and in particular, is measurable). The (complex or unsigned) simple function is constant on each of these sets, and so can easily be decomposed as a linear combination of the indicator functions of these sets. One easy consequence of this is that if $f$ is a complex-valued simple function, then its absolute value $|f| : x \mapsto |f(x)|$ is an unsigned simple function.
It is geometrically intuitive that we should define the integral $\int_{\mathbf{R}^d} 1_E(x)\,dx$ of an indicator function of a measurable set $E$ to equal $m(E)$:
$$\int_{\mathbf{R}^d} 1_E(x)\,dx = m(E).$$
Using this and applying the laws of integration formally, we are led to propose the following definition for the integral of an unsigned simple function:
Definition 1.3.3 (Integral of an unsigned simple function). If $f = c_1 1_{E_1} + \dots + c_k 1_{E_k}$ is an unsigned simple function, the integral $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$ is defined by the formula
$$\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx := c_1 m(E_1) + \dots + c_k m(E_k),$$
thus $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$ will take values in $[0,+\infty]$.
However, one has to actually check that this definition is well-defined, in the sense that different representations
$$f = c_1 1_{E_1} + \dots + c_k 1_{E_k} = c'_1 1_{E'_1} + \dots + c'_{k'} 1_{E'_{k'}}$$
of a function as a finite unsigned combination of indicator functions of measurable sets will give the same value for the integral $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$. This is the purpose of the following lemma:
Lemma 1.3.4 (Well-definedness of simple integral). Let $k, k' \geq 0$ be natural numbers, $c_1, \dots, c_k, c'_1, \dots, c'_{k'} \in [0,+\infty]$, and let $E_1, \dots, E_k, E'_1, \dots, E'_{k'} \subset \mathbf{R}^d$ be Lebesgue measurable sets such that the identity
$$c_1 1_{E_1} + \dots + c_k 1_{E_k} = c'_1 1_{E'_1} + \dots + c'_{k'} 1_{E'_{k'}} \tag{1.9}$$
holds identically on $\mathbf{R}^d$. Then one has
$$c_1 m(E_1) + \dots + c_k m(E_k) = c'_1 m(E'_1) + \dots + c'_{k'} m(E'_{k'}).$$
Proof. We again use a Venn diagram argument. The $k + k'$ sets $E_1, \dots, E_k, E'_1, \dots, E'_{k'}$ partition $\mathbf{R}^d$ into $2^{k+k'}$ disjoint sets, each of which is an intersection of some of the $E_1, \dots, E_k, E'_1, \dots, E'_{k'}$ and their complements. We throw away any sets that are empty, leaving us with a partition of $\mathbf{R}^d$ into $m$ non-empty disjoint sets $A_1, \dots, A_m$ for some $0 \leq m \leq 2^{k+k'}$. As the $E_1, \dots, E_k, E'_1, \dots, E'_{k'}$ are Lebesgue measurable, the $A_1, \dots, A_m$ are too. By construction, each of the $E_1, \dots, E_k, E'_1, \dots, E'_{k'}$ arises as a union of some of the $A_1, \dots, A_m$, thus we can write
$$E_i = \bigcup_{j \in J_i} A_j$$
and
$$E'_{i'} = \bigcup_{j \in J'_{i'}} A_j$$
for all $i = 1, \dots, k$ and $i' = 1, \dots, k'$, and some subsets $J_i, J'_{i'} \subset \{1, \dots, m\}$. By finite additivity of Lebesgue measure, we thus have
$$m(E_i) = \sum_{j \in J_i} m(A_j)$$
and
$$m(E'_{i'}) = \sum_{j \in J'_{i'}} m(A_j).$$
Thus, our objective is now to show that
$$\sum_{i=1}^k c_i \sum_{j \in J_i} m(A_j) = \sum_{i'=1}^{k'} c'_{i'} \sum_{j \in J'_{i'}} m(A_j). \tag{1.10}$$
To obtain this, we fix $1 \leq j \leq m$ and evaluate (1.9) at a point $x$ in the non-empty set $A_j$. At such a point, $1_{E_i}(x)$ is equal to $1_{J_i}(j)$, and similarly $1_{E'_{i'}}(x)$ is equal to $1_{J'_{i'}}(j)$. From (1.9) we conclude that
$$\sum_{i=1}^k c_i 1_{J_i}(j) = \sum_{i'=1}^{k'} c'_{i'} 1_{J'_{i'}}(j).$$
Multiplying this by $m(A_j)$ and then summing over all $j = 1, \dots, m$ we obtain (1.10). □
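The Venn diagram argument can be tried out on a toy model. The sketch below is only an illustration under strong simplifying assumptions: $\mathbf{R}^d$ is replaced by a finite set and Lebesgue measure by counting measure (`len`), and all function names are hypothetical. It computes the non-empty atoms $A_1, \dots, A_m$ generated by a family of sets, and checks that two different representations of the same simple function give the same value of $c_1 m(E_1) + \dots + c_k m(E_k)$.

```python
def atoms(universe, sets):
    """Partition `universe` into the non-empty atoms generated by `sets`.

    Two points lie in the same atom exactly when they belong to the same
    members of `sets` (the same region of the Venn diagram).
    """
    cells = {}
    for x in universe:
        pattern = tuple(x in s for s in sets)
        cells.setdefault(pattern, set()).add(x)
    return list(cells.values())

def simple_integral(coeffs, sets, measure=len):
    """c_1 m(E_1) + ... + c_k m(E_k); counting measure is the default m."""
    return sum(c * measure(s) for c, s in zip(coeffs, sets))

def integral_via_atoms(coeffs, sets, universe, measure=len):
    """Evaluate the same quantity atom by atom, as in the proof."""
    total = 0
    for atom in atoms(universe, sets):
        x = next(iter(atom))  # the function is constant on each atom
        value = sum(c for c, s in zip(coeffs, sets) if x in s)
        total += value * measure(atom)
    return total

if __name__ == "__main__":
    universe = set(range(6))
    # Two representations of the same function on {0, ..., 5}.
    rep1 = ([2, 3], [{0, 1, 2, 3}, {2, 3, 4}])
    rep2 = ([2, 5, 3], [{0, 1}, {2, 3}, {4}])
    print(simple_integral(*rep1), simple_integral(*rep2))
    print(integral_via_atoms(*rep1, universe))
```

All three printed values agree, which is the finite analogue of the identity (1.10). The sketch does not address the conventions for $0 \cdot (+\infty)$ that arise for genuinely infinite coefficients or measures.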
We now make some important definitions that we will use repeatedly in this text:

Definition 1.3.5 (Almost everywhere and support). A property $P(x)$ of a point $x \in \mathbf{R}^d$ is said to hold (Lebesgue) almost everywhere in $\mathbf{R}^d$, or for (Lebesgue) almost every point $x \in \mathbf{R}^d$, if the set of $x \in \mathbf{R}^d$ for which $P(x)$ fails has Lebesgue measure zero (i.e. $P$ is true outside of a null set). We usually omit the prefix Lebesgue, and often abbreviate “almost everywhere” or “almost every” as a.e.

Two functions $f, g : \mathbf{R}^d \to Z$ into an arbitrary range $Z$ are said to agree almost everywhere if one has $f(x) = g(x)$ for almost every $x \in \mathbf{R}^d$.

The support of a function $f : \mathbf{R}^d \to \mathbf{C}$ or $f : \mathbf{R}^d \to [0,+\infty]$ is defined to be the set $\{x \in \mathbf{R}^d : f(x) \neq 0\}$ where $f$ is non-zero.
Note that if $P(x)$ holds for almost every $x$, and $P(x)$ implies $Q(x)$, then $Q(x)$ holds for almost every $x$. Also, if $P_1(x), P_2(x), \dots$ are an at most countable family of properties, each of which individually holds for almost every $x$, then they will simultaneously be true for almost every $x$, because the countable union of null sets is still a null set. Because of these properties, one can (as a rule of thumb) treat the almost universal quantifier “for almost every” as if it was the truly universal quantifier “for every”, as long as one is only concatenating at most countably many properties together, and as long as one never specialises the free variable $x$ to a null set. Observe also that the property of agreeing almost everywhere is an equivalence relation, which we will refer to as almost everywhere equivalence.

In An epsilon of room, Vol. I we will also see the notion of the closed support of a function $f : \mathbf{R}^d \to \mathbf{C}$, defined as the closure of the support.
The following properties of the simple unsigned integral are easily obtained from the definitions:

Exercise 1.3.1 (Basic properties of the simple unsigned integral). Let $f, g : \mathbf{R}^d \to [0,+\infty]$ be simple unsigned functions.

(i) (Unsigned linearity) We have
$$\operatorname{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx + \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$$
and
$$\operatorname{Simp} \int_{\mathbf{R}^d} c f(x)\,dx = c \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$$
for all $c \in [0,+\infty]$.
(ii) (Finiteness) We have $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx < \infty$ if and only if $f$ is finite almost everywhere, and its support has finite measure.
(iii) (Vanishing) We have $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx = 0$ if and only if $f$ is zero almost everywhere.
(iv) (Equivalence) If $f$ and $g$ agree almost everywhere, then $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$.
(v) (Monotonicity) If $f(x) \leq g(x)$ for almost every $x \in \mathbf{R}^d$, then $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx \leq \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$.
(vi) (Compatibility with Lebesgue measure) For any Lebesgue measurable $E$, one has $\operatorname{Simp} \int_{\mathbf{R}^d} 1_E(x)\,dx = m(E)$.

Furthermore, show that the simple unsigned integral $f \mapsto \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$ is the only map from the space $\operatorname{Simp}^+(\mathbf{R}^d)$ of unsigned simple functions to $[0,+\infty]$ that obeys all of the above properties.
We can now define an absolutely convergent counterpart to the simple unsigned integral. This integral will soon be superseded by the absolutely convergent Lebesgue integral, but we give it here as motivation for that more general notion of integration.
Definition 1.3.6 (Absolutely convergent simple integral). A complex-valued simple function $f : \mathbf{R}^d \to \mathbf{C}$ is said to be absolutely integrable if $\operatorname{Simp} \int_{\mathbf{R}^d} |f(x)|\,dx < \infty$. If $f$ is absolutely integrable, the integral $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$ is defined for real signed $f$ by the formula
$$\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx := \operatorname{Simp} \int_{\mathbf{R}^d} f_+(x)\,dx - \operatorname{Simp} \int_{\mathbf{R}^d} f_-(x)\,dx$$
where $f_+(x) := \max(f(x), 0)$ and $f_-(x) := \max(-f(x), 0)$ (note that these are unsigned simple functions that are pointwise dominated by $|f|$ and thus have finite integral), and for complex-valued $f$ by the formula [13]
$$\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx := \operatorname{Simp} \int_{\mathbf{R}^d} \operatorname{Re} f(x)\,dx + i \operatorname{Simp} \int_{\mathbf{R}^d} \operatorname{Im} f(x)\,dx.$$
Note from the preceding exercise that a complex-valued simple function $f$ is absolutely integrable if and only if it has finite measure support (since finiteness almost everywhere is automatic). In particular, the space $\operatorname{Simp}^{\text{abs}}(\mathbf{R}^d)$ of absolutely integrable simple functions is closed under addition and scalar multiplication by complex numbers, and is thus a complex vector space.

The properties of the unsigned simple integral then can be used to deduce analogous properties for the complex-valued integral:
Exercise 1.3.2 (Basic properties of the complex-valued simple integral). Let $f, g : \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable simple functions.

(i) ($*$-linearity) We have
$$\operatorname{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx + \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$$
and
$$\operatorname{Simp} \int_{\mathbf{R}^d} c f(x)\,dx = c \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx \tag{1.11}$$
for all $c \in \mathbf{C}$. Also we have
$$\operatorname{Simp} \int_{\mathbf{R}^d} \overline{f}(x)\,dx = \overline{\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx}.$$
(ii) (Equivalence) If $f$ and $g$ agree almost everywhere, then $\operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$.
(iii) (Compatibility with Lebesgue measure) For any Lebesgue measurable $E$, one has $\operatorname{Simp} \int_{\mathbf{R}^d} 1_E(x)\,dx = m(E)$.

(Hints: Work out the real-valued counterpart of the linearity property first. To establish (1.11), treat the cases $c > 0$, $c = 0$, $c = -1$ separately. To deal with the additivity for real functions $f, g$, start with the identity
$$f + g = (f+g)_+ - (f+g)_- = (f_+ - f_-) + (g_+ - g_-)$$
and rearrange the second equality so that no subtraction appears.) Furthermore, show that the complex-valued simple integral $f \mapsto \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$ is the only map from the space $\operatorname{Simp}^{\text{abs}}(\mathbf{R}^d)$ of absolutely integrable simple functions to $\mathbf{C}$ that obeys all of the above properties.

[Footnote 13] Strictly speaking, this is an abuse of notation as we have now defined the simple integral $\operatorname{Simp} \int_{\mathbf{R}^d}$ three different times, for unsigned, real signed, and complex-valued simple functions, but one easily verifies that these three definitions agree with each other on their common domains of definition, so it is safe to use a single notation for all three.
We now comment further on the fact that (simple) functions that agree almost everywhere have the same integral. We can view this as an assertion that integration is a noise-tolerant operation: one can have “noise” or “errors” in a function $f(x)$ on a null set, and this will not affect the final value of the integral. Indeed, once one has this noise tolerance, one can even integrate functions $f$ that are not defined everywhere on $\mathbf{R}^d$, but merely defined almost everywhere on $\mathbf{R}^d$ (i.e. $f$ is defined on some set $\mathbf{R}^d \backslash N$ where $N$ is a null set), simply by extending $f$ to all of $\mathbf{R}^d$ in some arbitrary fashion (e.g. by setting $f$ equal to zero on $N$). This is extremely convenient for analysis, as there are many natural functions (e.g. $\frac{\sin x}{x}$ in one dimension, or $\frac{1}{|x|^\alpha}$ for various $\alpha > 0$ in higher dimensions) that are only defined almost everywhere instead of everywhere (often due to “division by zero” problems when a denominator vanishes). While such functions cannot be evaluated at certain singular points, they can still be integrated (provided they obey some integrability condition, of course, such as absolute integrability), and so one can still perform a large portion of analysis on such functions.
In fact, in the subfield of analysis known as functional analysis, it is convenient to abstract the notion of an almost everywhere defined function somewhat, by replacing any such function $f$ with the equivalence class of almost everywhere defined functions that are equal to $f$ almost everywhere. Such classes are then no longer functions in the standard set-theoretic sense (they do not map each point in the domain to a unique point in the range, since points in $\mathbf{R}^d$ have measure zero), but the properties of various function spaces improve when one does this (various semi-norms become norms, various topologies become Hausdorff, and so forth). See §1.3 of An epsilon of room, Vol. I for further discussion.
Remark 1.3.7. The “Lebesgue philosophy” that one is willing to lose control on sets of measure zero is a perspective that distinguishes Lebesgue-type analysis from other types of analysis, most notably that of descriptive set theory, which is also interested in studying subsets of $\mathbf{R}^d$, but can give completely different structural classifications to a pair of sets that agree almost everywhere. This loss of control on null sets is the price one has to pay for gaining access to the powerful tool of the Lebesgue integral; if one needs to control a function at absolutely every point, and not just almost every point, then one often needs to use other tools than integration theory (unless one has some regularity on the function, such as continuity, that lets one pass from almost everywhere true statements to everywhere true statements).
1.3.2. Measurable functions.

Much as the piecewise constant integral can be completed to the Riemann integral, the unsigned simple integral can be completed to the unsigned Lebesgue integral, by extending the class of unsigned simple functions to the larger class of unsigned Lebesgue measurable functions. One of the shortest ways to define this class is as follows:
Definition 1.3.8 (Unsigned measurable function). An unsigned function $f : \mathbf{R}^d \to [0,+\infty]$ is unsigned Lebesgue measurable, or measurable for short, if it is the pointwise limit of unsigned simple functions, i.e. if there exists a sequence $f_1, f_2, f_3, \dots : \mathbf{R}^d \to [0,+\infty]$ of unsigned simple functions such that $f_n(x) \to f(x)$ for every $x \in \mathbf{R}^d$.
This particular definition is not always the most tractable. Fortunately, it has many equivalent forms:

Lemma 1.3.9 (Equivalent notions of measurability). Let $f : \mathbf{R}^d \to [0,+\infty]$ be an unsigned function. Then the following are equivalent:

(i) $f$ is unsigned Lebesgue measurable.
(ii) $f$ is the pointwise limit of unsigned simple functions $f_n$ (thus the limit $\lim_{n \to \infty} f_n(x)$ exists and is equal to $f(x)$ for all $x \in \mathbf{R}^d$).
(iii) $f$ is the pointwise almost everywhere limit of unsigned simple functions $f_n$ (thus the limit $\lim_{n \to \infty} f_n(x)$ exists and is equal to $f(x)$ for almost every $x \in \mathbf{R}^d$).
(iv) $f$ is the supremum $f(x) = \sup_n f_n(x)$ of an increasing sequence $0 \leq f_1 \leq f_2 \leq \dots$ of unsigned simple functions $f_n$, each of which is bounded with finite measure support.
(v) For every $\lambda \in [0,+\infty]$, the set $\{x \in \mathbf{R}^d : f(x) > \lambda\}$ is Lebesgue measurable.
(vi) For every $\lambda \in [0,+\infty]$, the set $\{x \in \mathbf{R}^d : f(x) \geq \lambda\}$ is Lebesgue measurable.
(vii) For every $\lambda \in [0,+\infty]$, the set $\{x \in \mathbf{R}^d : f(x) < \lambda\}$ is Lebesgue measurable.
(viii) For every $\lambda \in [0,+\infty]$, the set $\{x \in \mathbf{R}^d : f(x) \leq \lambda\}$ is Lebesgue measurable.
(ix) For every interval $I \subset [0,+\infty)$, the set $f^{-1}(I) := \{x \in \mathbf{R}^d : f(x) \in I\}$ is Lebesgue measurable.
(x) For every (relatively) open set $U \subset [0,+\infty)$, the set $f^{-1}(U) := \{x \in \mathbf{R}^d : f(x) \in U\}$ is Lebesgue measurable.
(xi) For every (relatively) closed set $K \subset [0,+\infty)$, the set $f^{-1}(K) := \{x \in \mathbf{R}^d : f(x) \in K\}$ is Lebesgue measurable.
Proof. (i) and (ii) are equivalent by definition. (ii) clearly implies (iii). As every monotone sequence in $[0,+\infty]$ converges, (iv) implies (ii). Now we show that (iii) implies (v). If $f$ is the pointwise almost everywhere limit of $f_n$, then for almost every $x \in \mathbf{R}^d$ one has
$$f(x) = \lim_{n \to \infty} f_n(x) = \limsup_{n \to \infty} f_n(x) = \inf_{N > 0} \sup_{n \geq N} f_n(x).$$
This implies that, for any $\lambda$, the set $\{x \in \mathbf{R}^d : f(x) > \lambda\}$ is equal to
$$\bigcup_{M > 0} \bigcap_{N > 0} \Big\{x \in \mathbf{R}^d : \sup_{n \geq N} f_n(x) > \lambda + \frac{1}{M}\Big\}$$
outside of a set of measure zero; this set in turn is equal to
$$\bigcup_{M > 0} \bigcap_{N > 0} \bigcup_{n \geq N} \Big\{x \in \mathbf{R}^d : f_n(x) > \lambda + \frac{1}{M}\Big\}$$
outside of a set of measure zero. But as each $f_n$ is an unsigned simple function, the sets $\{x \in \mathbf{R}^d : f_n(x) > \lambda + \frac{1}{M}\}$ are Lebesgue measurable. Since countable unions or countable intersections of Lebesgue measurable sets are Lebesgue measurable, and modifying a Lebesgue measurable set on a null set produces another Lebesgue measurable set, we obtain (v).
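To obtain the equivalence of (v) and (vi), observe that
$$\{x \in \mathbf{R}^d : f(x) \geq \lambda\} = \bigcap_{\lambda' \in \mathbf{Q}^+ : \lambda' < \lambda} \{x \in \mathbf{R}^d : f(x) > \lambda'\}$$
for $\lambda \in (0,+\infty]$ and
$$\{x \in \mathbf{R}^d : f(x) > \lambda\} = \bigcup_{\lambda' \in \mathbf{Q}^+ : \lambda' > \lambda} \{x \in \mathbf{R}^d : f(x) \geq \lambda'\}$$
for $\lambda \in [0,+\infty)$, where $\mathbf{Q}^+ := \mathbf{Q} \cap [0,+\infty]$ are the non-negative rationals. The claim then easily follows from the countable nature of $\mathbf{Q}^+$ (treating the extreme cases $\lambda = 0, +\infty$ separately if necessary). A similar argument lets one deduce (v) or (vi) from (ix).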
The equivalence of (v), (vi) with (vii), (viii) comes from the observation that $\{x \in \mathbf{R}^d : f(x) \leq \lambda\}$ is the complement of $\{x \in \mathbf{R}^d : f(x) > \lambda\}$, and $\{x \in \mathbf{R}^d : f(x) < \lambda\}$ is the complement of $\{x \in \mathbf{R}^d : f(x) \geq \lambda\}$. A similar argument shows that (x) and (xi) are equivalent.

By expressing an interval as the intersection of two half-intervals, we see that (ix) follows from (v)-(viii), and so all of (v)-(ix) are now shown to be equivalent.

Clearly (x) implies (vii), and hence (v)-(ix). Conversely, because every open set in $[0,+\infty)$ is the union of countably many open intervals in $[0,+\infty)$, (ix) implies (x).
The only remaining task is to show that (v)-(xi) implies (iv). Let $f$ obey (v)-(xi). For each positive integer $n$, we let $f_n(x)$ be defined to be the largest integer multiple of $2^{-n}$ that is less than or equal to $\min(f(x), n)$ when $|x| \leq n$, with $f_n(x) := 0$ for $|x| > n$. From construction it is easy to see that the $f_n : \mathbf{R}^d \to [0,+\infty]$ are increasing and have $f$ as their supremum. Furthermore, each $f_n$ takes on only finitely many values, and for each non-zero value $c$ it attains, the set $f_n^{-1}(c)$ takes the form $f^{-1}(I_c) \cap \{x \in \mathbf{R}^d : |x| \leq n\}$ for some interval or ray $I_c$, and is thus measurable. As a consequence, $f_n$ is a simple function, and by construction it is bounded and has finite measure support. The claim follows. □
With these equivalent formulations, we can now generate plenty of measurable functions:

Exercise 1.3.3.

(i) Show that every continuous function $f : \mathbf{R}^d \to [0,+\infty]$ is measurable.
(ii) Show that every unsigned simple function is measurable.
(iii) Show that the supremum, infimum, limit superior, or limit inferior of unsigned measurable functions is unsigned measurable.
(iv) Show that an unsigned function that is equal almost everywhere to an unsigned measurable function is itself measurable.
(v) Show that if a sequence $f_n$ of unsigned measurable functions converges pointwise almost everywhere to an unsigned limit $f$, then $f$ is also measurable.
(vi) If $f : \mathbf{R}^d \to [0,+\infty]$ is measurable and $\phi : [0,+\infty] \to [0,+\infty]$ is continuous, show that $\phi \circ f : \mathbf{R}^d \to [0,+\infty]$ is measurable.
(vii) If $f, g$ are unsigned measurable functions, show that $f + g$ and $fg$ are measurable.
In view of Exercise 1.3.3(iv), one can define the concept of measurability for an unsigned function that is only defined almost everywhere on $\mathbf{R}^d$, rather than everywhere on $\mathbf{R}^d$, by extending that function arbitrarily to the null set where it is currently undefined.
Exercise 1.3.4. Let $f : \mathbf{R}^d \to [0,+\infty]$. Show that $f$ is a bounded unsigned measurable function if and only if $f$ is the uniform limit of bounded simple functions.

Exercise 1.3.5. Show that an unsigned function $f : \mathbf{R}^d \to [0,+\infty]$ is a simple function if and only if it is measurable and takes on at most finitely many values.

Exercise 1.3.6. Let $f : \mathbf{R}^d \to [0,+\infty]$ be an unsigned measurable function. Show that the region $\{(x,t) \in \mathbf{R}^d \times \mathbf{R} : 0 \leq t \leq f(x)\}$ is a measurable subset of $\mathbf{R}^{d+1}$. (There is a converse to this statement, but we will wait until Exercise 1.7.24 to prove it, once we have the Fubini-Tonelli theorem (Corollary 1.7.23) available to us.)
Remark 1.3.10. Lemma 1.3.9 tells us that if $f : \mathbf{R}^d \to [0,+\infty]$ is measurable, then $f^{-1}(E)$ is Lebesgue measurable for many classes of sets $E$. However, we caution that it is not necessarily the case that $f^{-1}(E)$ is Lebesgue measurable if $E$ is Lebesgue measurable. To see this, we let $C$ be the Cantor set
$$C := \Big\{ \sum_{j=1}^\infty a_j 3^{-j} : a_j \in \{0,2\} \text{ for all } j \Big\}$$
and let $f : \mathbf{R} \to [0,+\infty]$ be the function defined by setting
$$f(x) := \sum_{j=1}^\infty 2 b_j 3^{-j}$$
whenever $x \in [0,1]$ is not a terminating binary decimal, and so has a unique binary expansion $x = \sum_{j=1}^\infty b_j 2^{-j}$ for some $b_j \in \{0,1\}$, and $f(x) := 0$ otherwise. We thus see that $f$ takes values in $C$, and is bijective on the set $A$ of non-terminating decimals in $[0,1]$. Using Lemma 1.3.9, it is not difficult to show that $f$ is measurable. On the other hand, by modifying the construction from the previous notes, we can find a subset $F$ of $A$ which is non-measurable. If we set $E := f(F)$, then $E$ is a subset of the null set $C$ and is thus itself a null set; but $f^{-1}(E) = F$ is non-measurable, and so the inverse image of a Lebesgue measurable set by a measurable function need not remain Lebesgue measurable.

However, we will later see that it is still true that $f^{-1}(E)$ is Lebesgue measurable if $E$ has a slightly stronger measurability property than Lebesgue measurability, namely Borel measurability; see Exercise 1.4.29(iii).
Now we can define the concept of a complex-valued measurable function. As discussed earlier, it will be convenient to allow for such functions to only be defined almost everywhere, rather than everywhere, to allow for the possibility that the function becomes singular or otherwise undefined on a null set.

Definition 1.3.11 (Complex measurability). An almost everywhere defined complex-valued function $f : \mathbf{R}^d \to \mathbf{C}$ is Lebesgue measurable, or measurable for short, if it is the pointwise almost everywhere limit of complex-valued simple functions.
As before, there are several equivalent definitions:

Exercise 1.3.7. Let $f : \mathbf{R}^d \to \mathbf{C}$ be an almost everywhere defined complex-valued function. Then the following are equivalent:

(i) $f$ is measurable.
(ii) $f$ is the pointwise almost everywhere limit of complex-valued simple functions.
(iii) The (magnitudes of the) positive and negative parts of $\operatorname{Re}(f)$ and $\operatorname{Im}(f)$ are unsigned measurable functions.
(iv) $f^{-1}(U)$ is Lebesgue measurable for every open set $U \subset \mathbf{C}$.
(v) $f^{-1}(K)$ is Lebesgue measurable for every closed set $K \subset \mathbf{C}$.

From the above exercise, we see that the notions of complex-valued measurability and unsigned measurability are compatible when applied to a function that takes values in $[0,+\infty) = [0,+\infty] \cap \mathbf{C}$ everywhere (or almost everywhere).
Exercise 1.3.8.
(i) Show that every continuous function $f : \mathbf{R}^d \to \mathbf{C}$ is measurable.
(ii) Show that a function $f : \mathbf{R}^d \to \mathbf{C}$ is simple if and only if it is measurable and takes on at most finitely many values.
(iii) Show that a complex-valued function that is equal almost everywhere to a measurable function is itself measurable.
(iv) Show that if a sequence $f_n$ of complex-valued measurable functions converges pointwise almost everywhere to a complex-valued limit f, then f is also measurable.
(v) If $f : \mathbf{R}^d \to \mathbf{C}$ is measurable and $\phi : \mathbf{C} \to \mathbf{C}$ is continuous, show that $\phi \circ f : \mathbf{R}^d \to \mathbf{C}$ is measurable.
(vi) If f, g are measurable functions, show that $f + g$ and $fg$ are measurable.
Exercise 1.3.9. Let $f : [a, b] \to \mathbf{R}$ be a Riemann integrable function. Show that if one extends f to all of $\mathbf{R}$ by defining $f(x) = 0$ for $x \notin [a, b]$, then f is measurable.
1.3.3. Unsigned Lebesgue integrals. We are now ready to integrate unsigned measurable functions. We begin with the notion of the lower unsigned Lebesgue integral, which can be defined for arbitrary unsigned functions (not necessarily measurable):
Definition 1.3.12 (Lower unsigned Lebesgue integral). Let $f : \mathbf{R}^d \to [0, +\infty]$ be an unsigned function (not necessarily measurable). We define the lower unsigned Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$ to be the quantity
$$\underline{\int_{\mathbf{R}^d}} f(x)\,dx := \sup_{0 \le g \le f;\ g \text{ simple}} \operatorname{Simp} \int_{\mathbf{R}^d} g(x)\,dx$$
where g ranges over all unsigned simple functions $g : \mathbf{R}^d \to [0, +\infty]$ that are pointwise bounded by f.
One can also define the upper unsigned Lebesgue integral
$$\overline{\int_{\mathbf{R}^d}} f(x)\,dx := \inf_{h \ge f;\ h \text{ simple}} \operatorname{Simp} \int_{\mathbf{R}^d} h(x)\,dx$$
but we will use this integral much more rarely. Note that both integrals take values in $[0, +\infty]$, and that the upper Lebesgue integral is always at least as large as the lower Lebesgue integral.
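
To make the supremum in Definition 1.3.12 concrete, here is a minimal numerical sketch. It assumes the illustrative function $f(x) = x^2$ on $[0,1]$ (zero elsewhere), which is not taken from the text, and it uses one particular family of simple functions below f, namely the dyadic roundings $g_n := 2^{-n} \lfloor 2^n f \rfloor$. The helper names are hypothetical.

```python
from math import sqrt

def lower_simple_integral(level_measure, n, height_cap):
    """Simple integral of g_n = 2^-n * floor(2^n * f), for 0 <= f <= height_cap.

    level_measure(t) must return m({x : f(x) >= t}) for t > 0.
    """
    step = 2.0 ** -n
    layers = int(height_cap * 2 ** n)
    # g_n = sum over k >= 1 of step * 1_{f >= k*step}, so its simple integral
    # is step times the sum of the measures of these super-level sets.
    return sum(step * level_measure(k * step) for k in range(1, layers + 1))

# Assumed example: f(x) = x^2 on [0, 1], zero elsewhere, so that
# {f >= t} = [sqrt(t), 1] has measure 1 - sqrt(t) for 0 < t <= 1.
def level_measure_square(t):
    return max(1.0 - sqrt(t), 0.0)

approximations = [lower_simple_integral(level_measure_square, n, 1.0)
                  for n in range(1, 12)]
```

Each entry of `approximations` is the simple integral of a simple function bounded by f, so it is a lower bound for the lower Lebesgue integral; for this particular f the values increase towards 1/3.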
In the definition of the lower unsigned Lebesgue integral, g is required to be bounded by f pointwise everywhere, but it is easy to see that one could also require g to just be bounded by f pointwise almost everywhere without affecting the value of the integral, since the simple integral is not affected by modifications on sets of measure zero.
The following properties of the lower Lebesgue integral are easy to establish:
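Exercise 1.3.10 (Basic properties of the lower Lebesgue integral). Let $f, g : \mathbf{R}^d \to [0, +\infty]$ be unsigned functions (not necessarily measurable).
(i) (Compatibility with the simple integral) If f is simple, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \overline{\int_{\mathbf{R}^d}} f(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$.
(ii) (Monotonicity) If $f \le g$ pointwise almost everywhere, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx \le \underline{\int_{\mathbf{R}^d}} g(x)\,dx$ and $\overline{\int_{\mathbf{R}^d}} f(x)\,dx \le \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(iii) (Homogeneity) If $c \in [0, +\infty)$, then $\underline{\int_{\mathbf{R}^d}} c f(x)\,dx = c \underline{\int_{\mathbf{R}^d}} f(x)\,dx$. (The claim unfortunately fails for $c = +\infty$, but this is somewhat tricky to show.)
(iv) (Equivalence) If f, g agree almost everywhere, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \underline{\int_{\mathbf{R}^d}} g(x)\,dx$ and $\overline{\int_{\mathbf{R}^d}} f(x)\,dx = \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(v) (Superadditivity) $\underline{\int_{\mathbf{R}^d}} f(x) + g(x)\,dx \ge \underline{\int_{\mathbf{R}^d}} f(x)\,dx + \underline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(vi) (Subadditivity of upper integral) $\overline{\int_{\mathbf{R}^d}} f(x) + g(x)\,dx \le \overline{\int_{\mathbf{R}^d}} f(x)\,dx + \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(vii) (Divisibility) For any measurable set E, one has $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \underline{\int_{\mathbf{R}^d}} f(x) 1_E(x)\,dx + \underline{\int_{\mathbf{R}^d}} f(x) 1_{\mathbf{R}^d \setminus E}(x)\,dx$.
(viii) (Horizontal truncation) As $n \to \infty$, $\underline{\int_{\mathbf{R}^d}} \min(f(x), n)\,dx$ converges to $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$.
(ix) (Vertical truncation) As $n \to \infty$, $\underline{\int_{\mathbf{R}^d}} f(x) 1_{|x| \le n}\,dx$ converges to $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$. Hint: From Exercise 1.2.11 one has $m(E \cap \{x : |x| \le n\}) \to m(E)$ for any measurable set E.
(x) (Reflection) If $f + g$ is a simple function that is bounded with finite measure support (i.e. it is absolutely integrable), then $\operatorname{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\,dx = \underline{\int_{\mathbf{R}^d}} f(x)\,dx + \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
Do the horizontal and vertical truncation properties hold if the lower Lebesgue integral is replaced with the upper Lebesgue integral?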
Now we restrict attention to measurable functions.
Definition 1.3.13 (Unsigned Lebesgue integral). If $f : \mathbf{R}^d \to [0, +\infty]$ is measurable, we define the unsigned Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ of f to equal the lower unsigned Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$. (For non-measurable functions, we leave the unsigned Lebesgue integral undefined.)
One nice feature of measurable functions is that the lower and upper Lebesgue integrals can match, if one also assumes some boundedness:
Exercise 1.3.11. Let $f : \mathbf{R}^d \to [0, +\infty]$ be measurable, bounded, and vanishing outside of a set of finite measure. Show that the lower and upper Lebesgue integrals of f agree. (Hint: use Exercise 1.3.4.) There is a converse to this statement, but we will defer it to later notes. What happens if f is allowed to be unbounded, or is not supported inside a set of finite measure?
This gives an important corollary:
Corollary 1.3.14 (Finite additivity of the Lebesgue integral). Let $f, g : \mathbf{R}^d \to [0, +\infty]$ be measurable. Then
$$\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx.$$
Proof. From the horizontal truncation property and a limiting argument, we may assume that f, g are bounded. From the vertical truncation property and another limiting argument, we may assume that f, g are supported inside a bounded set. From Exercise 1.3.11, we now see that the lower and upper Lebesgue integrals of f, g, and $f + g$ agree. The claim now follows by combining the superadditivity of the lower Lebesgue integral with the subadditivity of the upper Lebesgue integral.
In the next section we will improve this finite additivity property for the unsigned Lebesgue integral further, to countable additivity; this property is also known as the monotone convergence theorem (Theorem 1.4.44).
Exercise 1.3.12 (Upper Lebesgue integral and outer Lebesgue measure). Show that for any set $E \subset \mathbf{R}^d$, $\overline{\int_{\mathbf{R}^d}} 1_E(x)\,dx = m^*(E)$. Conclude that the upper and lower Lebesgue integrals are not necessarily additive if no measurability hypotheses are assumed.
Exercise 1.3.13 (Area interpretation of integral). If $f : \mathbf{R}^d \to [0, +\infty]$ is measurable, show that $\int_{\mathbf{R}^d} f(x)\,dx$ is equal to the $d + 1$-dimensional Lebesgue measure of the region $\{(x, t) \in \mathbf{R}^d \times \mathbf{R} : 0 \le t \le f(x)\}$. (This can be used as an alternate, and more geometrically intuitive, definition of the unsigned Lebesgue integral; it is a more convenient formulation for establishing the basic convergence theorems, but not quite as convenient for establishing basic properties such as additivity.) (Hint: use Exercise 1.2.22.)
Exercise 1.3.14 (Uniqueness of the Lebesgue integral). Show that the Lebesgue integral $f \mapsto \int_{\mathbf{R}^d} f(x)\,dx$ is the only map from measurable unsigned functions $f : \mathbf{R}^d \to [0, +\infty]$ to $[0, +\infty]$ that obeys the following properties for measurable $f, g : \mathbf{R}^d \to [0, +\infty]$:
(i) (Compatibility with the simple integral) If f is simple, then $\int_{\mathbf{R}^d} f(x)\,dx = \operatorname{Simp} \int_{\mathbf{R}^d} f(x)\,dx$.
(ii) (Finite additivity) $\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx$.
(iii) (Horizontal truncation) As $n \to \infty$, $\int_{\mathbf{R}^d} \min(f(x), n)\,dx$ converges to $\int_{\mathbf{R}^d} f(x)\,dx$.
(iv) (Vertical truncation) As $n \to \infty$, $\int_{\mathbf{R}^d} f(x) 1_{|x| \le n}\,dx$ converges to $\int_{\mathbf{R}^d} f(x)\,dx$.
Exercise 1.3.15 (Translation invariance). Let $f : \mathbf{R}^d \to [0, +\infty]$ be measurable. Show that $\int_{\mathbf{R}^d} f(x + y)\,dx = \int_{\mathbf{R}^d} f(x)\,dx$ for any $y \in \mathbf{R}^d$.
Exercise 1.3.16 (Linear change of variables). Let $f : \mathbf{R}^d \to [0, +\infty]$ be measurable, and let $T : \mathbf{R}^d \to \mathbf{R}^d$ be an invertible linear transformation. Show that $\int_{\mathbf{R}^d} f(T^{-1}(x))\,dx = |\det T| \int_{\mathbf{R}^d} f(x)\,dx$, or equivalently $\int_{\mathbf{R}^d} f(Tx)\,dx = \frac{1}{|\det T|} \int_{\mathbf{R}^d} f(x)\,dx$.
Exercise 1.3.17 (Compatibility with the Riemann integral). Let $f : [a, b] \to [0, +\infty]$ be Riemann integrable. If we extend f to $\mathbf{R}$ by declaring f to equal zero outside of $[a, b]$, show that $\int_{\mathbf{R}} f(x)\,dx = \int_a^b f(x)\,dx$.
We record a basic inequality, known as Markov’s inequality, that asserts that the Lebesgue integral of an unsigned measurable function controls how often that function can be large:
Lemma 1.3.15 (Markov’s inequality). Let $f : \mathbf{R}^d \to [0, +\infty]$ be measurable. Then for any $0 < \lambda < \infty$, one has
$$m(\{x \in \mathbf{R}^d : f(x) \ge \lambda\}) \le \frac{1}{\lambda} \int_{\mathbf{R}^d} f(x)\,dx.$$
Proof. We have the trivial pointwise inequality
$$\lambda 1_{\{x \in \mathbf{R}^d : f(x) \ge \lambda\}} \le f(x).$$
From the definition of the lower Lebesgue integral, we conclude that
$$\lambda\, m(\{x \in \mathbf{R}^d : f(x) \ge \lambda\}) \le \int_{\mathbf{R}^d} f(x)\,dx$$
and the claim follows.
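
As a sanity check of Markov's inequality, the following sketch evaluates both sides for a step function $f = \sum_n c_n 1_{[n,n+1)}$ on $\mathbf{R}$ with finitely many non-negative heights. The heights and thresholds are illustrative assumptions, and `markov_sides` is a hypothetical helper name; for such an f the measure of $\{f \ge \lambda\}$ is just the number of heights that are at least $\lambda$.

```python
def markov_sides(heights, lam):
    """For f = sum_n heights[n] * 1_[n, n+1), return the pair
    (m({f >= lam}), (1/lam) * integral of f)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Each interval [n, n+1) has length 1, so the measure is a count.
    measure_of_large_set = sum(1 for h in heights if h >= lam)
    integral = sum(heights)
    return measure_of_large_set, integral / lam

heights = [0.5, 3.0, 1.0, 0.0, 7.5, 2.0]  # illustrative values only
checks = [markov_sides(heights, lam) for lam in (0.25, 1.0, 2.0, 5.0, 10.0)]
```

In every pair the first entry is bounded by the second, as the lemma predicts; the bound is far from sharp for small $\lambda$ and becomes more informative as $\lambda$ grows.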
By sending $\lambda$ to infinity or to zero, we obtain the following important corollary:
Exercise 1.3.18. Let $f : \mathbf{R}^d \to [0, +\infty]$ be measurable.
(i) Show that if $\int_{\mathbf{R}^d} f(x)\,dx < \infty$, then f is finite almost everywhere. Give a counterexample to show that the converse statement is false.
(ii) Show that $\int_{\mathbf{R}^d} f(x)\,dx = 0$ if and only if f is zero almost everywhere.
Remark 1.3.16. The use of the integral $\int_{\mathbf{R}^d} f(x)\,dx$ to control the distribution of f is known as the first moment method. One can also control this distribution using higher moments such as $\int_{\mathbf{R}^d} |f(x)|^p\,dx$ for various values of p, or exponential moments such as $\int_{\mathbf{R}^d} e^{t f(x)}\,dx$ or the Fourier moments $\int_{\mathbf{R}^d} e^{i t f(x)}\,dx$ for various values of t; such moment methods are fundamental to probability theory.
1.3.4. Absolute integrability. Having set out the theory of the unsigned Lebesgue integral, we can now define the absolutely convergent Lebesgue integral.
Definition 1.3.17 (Absolute integrability). An almost everywhere defined measurable function $f : \mathbf{R}^d \to \mathbf{C}$ is said to be absolutely integrable if the unsigned integral
$$\|f\|_{L^1(\mathbf{R}^d)} := \int_{\mathbf{R}^d} |f(x)|\,dx$$
is finite. We refer to this quantity $\|f\|_{L^1(\mathbf{R}^d)}$ as the $L^1(\mathbf{R}^d)$ norm of f, and use $L^1(\mathbf{R}^d)$ or $L^1(\mathbf{R}^d \to \mathbf{C})$ to denote the space of absolutely integrable functions. If f is real-valued and absolutely integrable, we define the Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ by the formula
$$\int_{\mathbf{R}^d} f(x)\,dx := \int_{\mathbf{R}^d} f_+(x)\,dx - \int_{\mathbf{R}^d} f_-(x)\,dx \qquad (1.12)$$
where $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ are the magnitudes of the positive and negative components of f (note that the two unsigned integrals on the right-hand side are finite, as $f_+, f_-$ are pointwise dominated by $|f|$). If f is complex-valued and absolutely integrable, we define the Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ by the formula
$$\int_{\mathbf{R}^d} f(x)\,dx := \int_{\mathbf{R}^d} \operatorname{Re} f(x)\,dx + i \int_{\mathbf{R}^d} \operatorname{Im} f(x)\,dx$$
where the two integrals on the right are interpreted as real-valued absolutely integrable Lebesgue integrals. It is easy to see that the unsigned, real-valued, and complex-valued Lebesgue integrals defined in this manner are compatible on their common domains of definition.
Note from construction that the absolutely integrable Lebesgue integral extends the absolutely integrable simple integral, which is now redundant and will not be needed any further in the sequel.
Remark 1.3.18. One can attempt to define integrals for non-absolutely integrable functions, analogous to the improper integrals $\int_0^\infty f(x)\,dx := \lim_{R \to \infty} \int_0^R f(x)\,dx$ or the principal value integrals $\mathrm{p.v.} \int_{-\infty}^\infty f(x)\,dx := \lim_{R \to \infty} \int_{-R}^R f(x)\,dx$ one sees in the classical one-dimensional Riemannian theory. While one can certainly generate any number of such extensions of the Lebesgue integral concept, such extensions tend to be poorly behaved with respect to various important operations, such as change of variables or exchanging limits and integrals, so it is usually not worthwhile to try to set up a systematic theory for such non-absolutely-integrable integrals that is anywhere near as complete as the absolutely integrable theory, and instead deal with such exotic integrals on an ad hoc basis.
From the pointwise triangle inequality $|f(x) + g(x)| \le |f(x)| + |g(x)|$, we conclude the $L^1$ triangle inequality
$$\|f + g\|_{L^1(\mathbf{R}^d)} \le \|f\|_{L^1(\mathbf{R}^d)} + \|g\|_{L^1(\mathbf{R}^d)} \qquad (1.13)$$
for any almost everywhere defined measurable $f, g : \mathbf{R}^d \to \mathbf{C}$. It is also easy to see that
$$\|cf\|_{L^1(\mathbf{R}^d)} = |c| \|f\|_{L^1(\mathbf{R}^d)}$$
for any complex number c. As such, we see that $L^1(\mathbf{R}^d \to \mathbf{C})$ is a complex vector space. (The $L^1$ norm is then a seminorm on this space; see §1.3 of An epsilon of room, Vol. I.) From Exercise 1.3.18 we make the important observation that a function $f \in L^1(\mathbf{R}^d \to \mathbf{C})$ has zero $L^1$ norm, $\|f\|_{L^1(\mathbf{R}^d)} = 0$, if and only if f is zero almost everywhere.
Given two functions $f, g \in L^1(\mathbf{R}^d \to \mathbf{C})$, we can define the $L^1$ distance $d_{L^1}(f, g)$ between them by the formula
$$d_{L^1}(f, g) := \|f - g\|_{L^1(\mathbf{R}^d)}.$$
Thanks to (1.13), this distance obeys almost all the axioms of a metric on $L^1(\mathbf{R}^d)$, with one exception: it is possible for two different functions $f, g \in L^1(\mathbf{R}^d \to \mathbf{C})$ to have a zero $L^1$ distance, if they agree almost everywhere. As such, $d_{L^1}$ is only a semi-metric (also known as a pseudo-metric) rather than a metric. However, if one adopts the convention that any two functions that agree almost everywhere are considered equivalent (or more formally, one works in the quotient space of $L^1(\mathbf{R}^d)$ by the equivalence relation of almost everywhere agreement, which by abuse of notation is also denoted $L^1(\mathbf{R}^d)$), then one recovers a genuine metric. (Later on, we will establish the important fact that this metric makes the (quotient space) $L^1(\mathbf{R}^d)$ a complete metric space, a fact known as the $L^1$ Riesz-Fischer theorem; this completeness is one of the main reasons we spend so much effort setting up Lebesgue integration theory in the first place.)
The linearity properties of the unsigned integral induce analogous linearity properties of the absolutely convergent Lebesgue integral:
Exercise 1.3.19 (Integration is linear). Show that integration $f \mapsto \int_{\mathbf{R}^d} f(x)\,dx$ is a (complex) linear operation from $L^1(\mathbf{R}^d)$ to $\mathbf{C}$. In other words, show that
$$\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx$$
and
$$\int_{\mathbf{R}^d} c f(x)\,dx = c \int_{\mathbf{R}^d} f(x)\,dx$$
for all absolutely integrable $f, g : \mathbf{R}^d \to \mathbf{C}$ and complex numbers c. Also establish the identity
$$\int_{\mathbf{R}^d} \overline{f(x)}\,dx = \overline{\int_{\mathbf{R}^d} f(x)\,dx},$$
which makes integration not just a linear operation, but a *-linear operation.
Exercise 1.3.20. Show that Exercises 1.3.15, 1.3.16, and 1.3.17 also hold for complex-valued, absolutely integrable functions rather than for unsigned measurable functions.
Exercise 1.3.21 (Absolute summability is a special case of absolute integrability). Let $(c_n)_{n \in \mathbf{Z}}$ be a doubly infinite sequence of complex numbers, and let $f : \mathbf{R} \to \mathbf{C}$ be the function
$$f(x) := \sum_{n \in \mathbf{Z}} c_n 1_{[n, n+1)}(x) = c_{\lfloor x \rfloor}$$
where $\lfloor x \rfloor$ is the greatest integer less than or equal to x. Show that f is absolutely integrable if and only if the series $\sum_{n \in \mathbf{Z}} c_n$ is absolutely convergent, in which case one has $\int_{\mathbf{R}} f(x)\,dx = \sum_{n \in \mathbf{Z}} c_n$.
We can localise the absolutely convergent integral to any measurable subset E of $\mathbf{R}^d$. Indeed, if $f : E \to \mathbf{C}$ is a function, we say that f is measurable (resp. absolutely integrable) if its extension $\tilde f : \mathbf{R}^d \to \mathbf{C}$ is measurable (resp. absolutely integrable), where $\tilde f(x)$ is defined to equal $f(x)$ when $x \in E$ and zero otherwise, and then we define $\int_E f(x)\,dx := \int_{\mathbf{R}^d} \tilde f(x)\,dx$. Thus, for instance, the absolutely integrable analogue of Exercise 1.3.17 tells us that
$$\int_a^b f(x)\,dx = \int_{[a,b]} f(x)\,dx$$
for any Riemann-integrable $f : [a, b] \to \mathbf{C}$.
Exercise 1.3.22. If E, F are disjoint measurable subsets of $\mathbf{R}^d$, and $f : E \cup F \to \mathbf{C}$ is absolutely integrable, show that
$$\int_E f(x)\,dx = \int_{E \cup F} f(x) 1_E(x)\,dx$$
and
$$\int_E f(x)\,dx + \int_F f(x)\,dx = \int_{E \cup F} f(x)\,dx.$$
We will study the properties of the absolutely convergent Lebesgue integral in more detail in later notes, as a special case of the more general Lebesgue integration theory on abstract measure spaces. For now, we record one very basic inequality:
Lemma 1.3.19 (Triangle inequality). Let $f \in L^1(\mathbf{R}^d \to \mathbf{C})$. Then
$$\left| \int_{\mathbf{R}^d} f(x)\,dx \right| \le \int_{\mathbf{R}^d} |f(x)|\,dx.$$
Proof. If f is real-valued, then $|f| = f_+ + f_-$ and the claim is obvious from (1.12). When f is complex-valued, one cannot argue quite so simply; a naive mimicking of the real-valued argument would lose a factor of 2, giving the inferior bound
$$\left| \int_{\mathbf{R}^d} f(x)\,dx \right| \le 2 \int_{\mathbf{R}^d} |f(x)|\,dx.$$
To do better, we exploit the phase rotation invariance properties of the absolute value operation and of the integral, as follows. Note that for any complex number z, one can write $|z|$ as $z e^{i\theta}$ for some real $\theta$. In particular, we have
$$\left| \int_{\mathbf{R}^d} f(x)\,dx \right| = e^{i\theta} \int_{\mathbf{R}^d} f(x)\,dx = \int_{\mathbf{R}^d} e^{i\theta} f(x)\,dx$$
for some real $\theta$. Taking real parts of both sides, we obtain
$$\left| \int_{\mathbf{R}^d} f(x)\,dx \right| = \int_{\mathbf{R}^d} \operatorname{Re}(e^{i\theta} f(x))\,dx.$$
Since $\operatorname{Re}(e^{i\theta} f(x)) \le |e^{i\theta} f(x)| = |f(x)|$, we obtain the claim.
1.3.5. Littlewood’s three principles. Littlewood’s three principles are informal heuristics that convey much of the basic intuition behind the measure theory of Lebesgue. Briefly, the three principles are as follows:
(i) Every (measurable) set is nearly a finite sum of intervals;
(ii) Every (absolutely integrable) function is nearly continuous; and
(iii) Every (pointwise) convergent sequence of functions is nearly uniformly convergent.
Various manifestations of the first principle were given in Exercise 1.2.7 and Exercise 1.2.16. Now we turn to the second principle. Define a step function to be a finite linear combination of indicator functions $1_B$ of boxes B.
Theorem 1.3.20 (Approximation of $L^1$ functions). Let $f \in L^1(\mathbf{R}^d)$ and $\varepsilon > 0$.
(i) There exists an absolutely integrable simple function g such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.
(ii) There exists a step function g such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.
(iii) There exists a continuous, compactly supported g such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.
To put things another way, the absolutely integrable simple functions, the step functions, and the continuous, compactly supported functions are all dense subsets of $L^1(\mathbf{R}^d)$ with respect to the $L^1(\mathbf{R}^d)$ (semi-)metric. In §1.13 of An epsilon of room, Vol. I it is shown that a similar statement holds if one replaces continuous, compactly supported functions with smooth, compactly supported functions, also known as test functions; this is an important fact for the theory of distributions.
Proof. We begin with part (i). When f is unsigned, we see from the definition of the lower Lebesgue integral that there exists an unsigned simple function g such that $g \le f$ (so, in particular, g is absolutely integrable) and
$$\int_{\mathbf{R}^d} g(x)\,dx \ge \int_{\mathbf{R}^d} f(x)\,dx - \varepsilon,$$
which by linearity implies that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$. This gives (i) when f is unsigned. The case when f is real-valued then follows by splitting f into positive and negative parts (and adjusting $\varepsilon$ as necessary), and the case when f is complex-valued then follows by splitting f into real and imaginary parts (and adjusting $\varepsilon$ yet again).
To establish part (ii), we see from (i) and the triangle inequality in $L^1$ that it suffices to show this when f is an absolutely integrable simple function. By linearity (and more applications of the triangle inequality), it then suffices to show this when $f = 1_E$ is the indicator function of a measurable set $E \subset \mathbf{R}^d$ of finite measure. But then, by Exercise 1.2.16, such a set can be approximated (up to an error of measure at most $\varepsilon$) by an elementary set, and the claim follows.
To establish part (iii), we see from (ii) and the argument from the preceding paragraph that it suffices to show this when $f = 1_E$ is the indicator function of a box. But one can then establish the claim by direct construction. Indeed, if one makes a slightly larger box F that contains the closure of E in its interior, but has a volume at most $\varepsilon$ more than that of E, then one can directly construct a piecewise linear continuous function g supported on F that equals 1 on E (e.g. one can set $g(x) = \max(1 - R \operatorname{dist}(x, E), 0)$ for some sufficiently large R; one may also invoke Urysohn’s lemma, see §1.10 of An epsilon of room, Vol. I). It is then clear from construction that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$ as required.
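
The direct construction in part (iii) can be checked in one dimension. The sketch below assumes the box $E = [0, 1]$ and uses the function $g(x) = \max(1 - R \operatorname{dist}(x, E), 0)$ from the proof; in this special case $|1_E - g|$ consists of two triangles of base $1/R$ and height 1, so the exact $L^1$ error is $1/R$. The helper names and the midpoint-rule quadrature are illustrative choices, not part of the text.

```python
def tent(x, a, b, R):
    """g(x) = max(1 - R * dist(x, [a, b]), 0) for the one-dimensional box [a, b]."""
    dist = max(a - x, 0.0, x - b)
    return max(1.0 - R * dist, 0.0)

def l1_error(a, b, R, samples=100_000):
    """Midpoint-rule estimate of the L^1 distance between 1_[a,b] and tent(., a, b, R)."""
    lo, hi = a - 1.0 / R, b + 1.0 / R   # g vanishes outside [lo, hi]
    h = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        x = lo + (i + 0.5) * h
        indicator = 1.0 if a <= x <= b else 0.0
        total += abs(indicator - tent(x, a, b, R)) * h
    return total

eps = 0.01
R = 1.0 / eps                 # the exact error is 1/R, so this choice gives eps
estimate = l1_error(0.0, 1.0, R)
```

Taking R larger shrinks the error further, which is the one-dimensional content of "for some sufficiently large R" in the proof.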
This is not the only way to make Littlewood’s second principle manifest; we return to this point shortly. For now, we turn to Littlewood’s third principle. We recall three basic ways in which a sequence $f_n : \mathbf{R}^d \to \mathbf{C}$ of functions can converge to a limit $f : \mathbf{R}^d \to \mathbf{C}$:
(i) (Pointwise convergence) $f_n(x) \to f(x)$ for every $x \in \mathbf{R}^d$.
(ii) (Pointwise almost everywhere convergence) $f_n(x) \to f(x)$ for almost every $x \in \mathbf{R}^d$.
(iii) (Uniform convergence) For every $\varepsilon > 0$, there exists N such that $|f_n(x) - f(x)| \le \varepsilon$ for all $n \ge N$ and all $x \in \mathbf{R}^d$.
Uniform convergence implies pointwise convergence, which in turn implies pointwise almost everywhere convergence.
We now add a fourth mode of convergence, that is weaker than uniform convergence but stronger than pointwise convergence:
Definition 1.3.21 (Locally uniform convergence). A sequence of functions $f_n : \mathbf{R}^d \to \mathbf{C}$ converges locally uniformly to a limit $f : \mathbf{R}^d \to \mathbf{C}$ if, for every bounded subset E of $\mathbf{R}^d$, $f_n$ converges uniformly to f on E. In other words, for every bounded $E \subset \mathbf{R}^d$ and every $\varepsilon > 0$, there exists $N > 0$ such that $|f_n(x) - f(x)| \le \varepsilon$ for all $n \ge N$ and $x \in E$.
Remark 1.3.22. At least as far as $\mathbf{R}^d$ is concerned, an equivalent definition of local uniform convergence is: $f_n$ converges locally uniformly to f if, for every point $x_0 \in \mathbf{R}^d$, there exists an open neighbourhood U of $x_0$ such that $f_n$ converges uniformly to f on U. The equivalence of the two definitions is immediate from the Heine-Borel theorem. More generally, the adverb “locally” in mathematics is usually used in this fashion; a property P is said to hold locally on some domain X if, for every point $x_0$ in that domain, there is an open neighbourhood of $x_0$ in X on which P holds.
One should caution, though, that on domains on which the Heine-Borel theorem does not hold, the bounded-set notion of local uniform convergence is not equivalent to the open-set notion of local uniform convergence (though, for locally compact spaces, one can recover equivalence if one replaces “bounded” by “compact”).
Example 1.3.23. The functions $x \mapsto x/n$ on $\mathbf{R}$ for $n = 1, 2, \ldots$ converge locally uniformly (and hence pointwise) to zero on $\mathbf{R}$, but do not converge uniformly.
Example 1.3.24. The partial sums $\sum_{n=0}^N \frac{x^n}{n!}$ of the Taylor series $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$ converge to $e^x$ locally uniformly (and hence pointwise) on $\mathbf{R}$, but not uniformly.
Example 1.3.25. The functions $f_n(x) := \frac{1}{nx} 1_{x > 0}$ for $n = 1, 2, \ldots$ (with the convention that $f_n(0) = 0$) converge pointwise everywhere to zero, but do not converge locally uniformly.
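
The contrast between Examples 1.3.23 and 1.3.25 can be seen numerically. The following sketch is only illustrative: the bounded set $[-5, 5]$, the grid size, and the helper names are assumptions, and a grid maximum only gives a lower bound for a true supremum.

```python
def sup_on_grid(f, lo, hi, points=10_001):
    """Largest value of |f| on an evenly spaced grid in [lo, hi]
    (a lower bound for the true supremum)."""
    step = (hi - lo) / (points - 1)
    return max(abs(f(lo + i * step)) for i in range(points))

def scaled_identity(n):
    return lambda x: x / n                              # Example 1.3.23

def reciprocal_spike(n):
    return lambda x: 1.0 / (n * x) if x > 0 else 0.0    # Example 1.3.25

indices = (1, 10, 100, 1000)

# On the bounded set [-5, 5] the sup of |x/n| is 5/n, which tends to zero.
sups_23 = [sup_on_grid(scaled_identity(n), -5.0, 5.0) for n in indices]

# For 1/(nx) the point x = 1/n lies in (0, 1] and f_n(1/n) = 1, so the sup
# over (0, 1] never drops below 1 even though f_n(x) -> 0 at each fixed x.
values_at_moving_point = [reciprocal_spike(n)(1.0 / n) for n in indices]
```

The second list shows why no choice of N works uniformly on the bounded set $(0, 1]$: the point where $f_n$ is large moves towards the origin as n grows.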
From the preceding example, we see that pointwise convergence (either everywhere or almost everywhere) is a weaker concept than local uniform convergence. Nevertheless, a remarkable theorem of Egorov, which demonstrates Littlewood’s third principle, asserts that one can recover local uniform convergence as long as one is willing to delete a set of small measure:
Theorem 1.3.26 (Egorov’s theorem). Let $f_n : \mathbf{R}^d \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise almost everywhere to another function $f : \mathbf{R}^d \to \mathbf{C}$, and let $\varepsilon > 0$. Then there exists a Lebesgue measurable set A of measure at most $\varepsilon$, such that $f_n$ converges locally uniformly to f outside of A.
Note that Example 1.3.25 demonstrates that the exceptional set A in Egorov’s theorem cannot be taken to have zero measure, at least if one uses the bounded-set definition of local uniform convergence from Definition 1.3.21. (If one instead takes the “open neighbourhood” definition, then the sequence in Example 1.3.25 does converge locally uniformly on $\mathbf{R} \setminus \{0\}$ in the open neighbourhood sense, even if it does not do so in the bounded-set sense. On a domain such as $\mathbf{R}^d \setminus A$, bounded-set locally uniform convergence implies open-neighbourhood locally uniform convergence, but not conversely, so for the purposes of applying Egorov’s theorem, the distinction is not too important since one has local uniform convergence in both senses.)
Proof. By modifying $f_n$ and f on a set of measure zero (that can be absorbed into A at the end of the argument) we may assume that $f_n$ converges pointwise everywhere to f, thus for every $x \in \mathbf{R}^d$ and $m > 0$ there exists $N \ge 0$ such that $|f_n(x) - f(x)| \le 1/m$ for all $n \ge N$. We can rewrite this fact set-theoretically as
$$\bigcap_{N=0}^\infty E_{N,m} = \emptyset$$
for each m, where
$$E_{N,m} := \{x \in \mathbf{R}^d : |f_n(x) - f(x)| > 1/m \text{ for some } n \ge N\}.$$
It is clear that the $E_{N,m}$ are Lebesgue measurable, and are decreasing in N. Applying downward monotone convergence (Exercise 1.2.11(ii)) we conclude that, for any radius $R > 0$, one has
$$\lim_{N \to \infty} m(E_{N,m} \cap B(0, R)) = 0.$$
(The restriction to the ball $B(0, R)$ is necessary, because the downward monotone convergence property only works when the sets involved have finite measure.) In particular, for any $m \ge 1$, we can find $N_m$ such that
$$m(E_{N,m} \cap B(0, m)) \le \frac{\varepsilon}{2^m}$$
for all $N \ge N_m$.
Now let $A := \bigcup_{m=1}^\infty E_{N_m,m} \cap B(0, m)$. Then A is Lebesgue measurable, and by countable subadditivity, $m(A) \le \varepsilon$. By construction, we have
$$|f_n(x) - f(x)| \le 1/m$$
whenever $m \ge 1$, $x \in \mathbf{R}^d \setminus A$, $|x| \le m$, and $n \ge N_m$. In particular, we see for any ball $B(0, m_0)$ with an integer radius, $f_n$ converges uniformly to f on $B(0, m_0) \setminus A$. Since every bounded set is contained in such a ball, the claim follows.
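
A standard illustration of Egorov's theorem, not taken from the text, is the sequence $f_n(x) := x^n 1_{[0,1]}(x)$, which converges to zero at every point except $x = 1$ but does not converge uniformly on $[0, 1)$. The sketch below assumes the exceptional set $A := (1 - \varepsilon, 1]$ of measure $\varepsilon$ and computes, for a given tolerance, an index beyond which $f_n$ is uniformly small outside of A. The function names are hypothetical.

```python
def sup_outside_exceptional_set(n, eps):
    """sup of |x**n - 0| over [0, 1 - eps], i.e. outside A = (1 - eps, 1]."""
    return (1.0 - eps) ** n   # x**n is increasing on [0, 1]

def first_index_within(tol, eps):
    """Smallest N with sup over [0, 1 - eps] of x**n <= tol for all n >= N."""
    n = 0
    while sup_outside_exceptional_set(n, eps) > tol:
        n += 1
    return n   # valid for all larger n too, because (1 - eps)**n decreases in n

eps = 0.05
thresholds = {tol: first_index_within(tol, eps) for tol in (0.1, 0.01, 0.001)}
```

Shrinking `eps` makes the required indices larger, which reflects the trade-off in the theorem: a smaller exceptional set costs a slower uniform rate outside it.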
Remark 1.3.27. Unfortunately, one cannot in general upgrade local uniform convergence to uniform convergence in Egorov’s theorem. A basic example here is the moving bump example $f_n := 1_{[n, n+1]}$ on $\mathbf{R}$, which “escapes to horizontal infinity”. This sequence converges pointwise (and locally uniformly) to the zero function $f \equiv 0$. However, for any $0 < \varepsilon < 1$ and any n, we have $|f_n(x) - f(x)| > \varepsilon$ on a set of measure 1, namely on the interval $[n, n + 1]$. Thus, if one wanted $f_n$ to converge uniformly to f outside of a set A, then that set A has to contain a set of measure 1. In fact, it must contain the intervals $[n, n + 1]$ for all sufficiently large n and must therefore have infinite measure.
However, if all the $f_n$ and f were supported on a fixed set E of finite measure (e.g. on a ball $B(0, R)$), then the above “escape to horizontal infinity” cannot occur, and it is easy to see from the above argument that one can recover uniform convergence (and not just locally uniform convergence) outside of a set of arbitrarily small measure.
We now use Theorem 1.3.20 to give another version of Littlewood’s second principle, known as Lusin’s theorem:
Theorem 1.3.28 (Lusin’s theorem). Let $f : \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable, and let $\varepsilon > 0$. Then there exists a Lebesgue measurable set $E \subset \mathbf{R}^d$ of measure at most $\varepsilon$ such that the restriction of f to the complementary set $\mathbf{R}^d \setminus E$ is continuous on that set.
Caution: this theorem does not imply that the unrestricted function f is continuous on $\mathbf{R}^d \setminus E$. For instance, the absolutely integrable function $f := 1_{\mathbf{Q}} : \mathbf{R} \to \mathbf{C}$ is nowhere continuous, so is certainly not continuous on $\mathbf{R} \setminus E$ for any E of finite measure; but on the other hand, if one deletes the measure zero set $E := \mathbf{Q}$ from the reals, then the restriction of f to $\mathbf{R} \setminus E$ is identically zero and thus continuous.
Proof. By Theorem 1.3.20, for any $n \ge 1$ one can find a continuous, compactly supported function $f_n$ such that $\|f - f_n\|_{L^1(\mathbf{R}^d)} \le \varepsilon/4^n$ (say). By Markov’s inequality (Lemma 1.3.15), that implies that $|f(x) - f_n(x)| \le 1/2^{n-1}$ for all x outside of a Lebesgue measurable set $A_n$ of measure at most $\varepsilon/2^{n+1}$. Letting $A := \bigcup_{n=1}^\infty A_n$, we conclude that A is Lebesgue measurable with measure at most $\varepsilon/2$, and $f_n$ converges uniformly to f outside of A. But the uniform limit of continuous functions is continuous, and the same is true for local uniform limits (because continuity is itself a local property). We conclude that the restriction of f to $\mathbf{R}^d \setminus A$ is continuous, so the claim follows with $E := A$.
Exercise 1.3.23. Show that the hypothesis that f is absolutely integrable in Lusin’s theorem can be relaxed to being locally absolutely integrable (i.e. absolutely integrable on every bounded set), and then relaxed further to that of being measurable (but still finite everywhere or almost everywhere). (To achieve the latter goal, one can replace f locally with a horizontal truncation $f 1_{|f| \le n}$; alternatively, one can replace f with a bounded variant, such as $\frac{f}{(1 + |f|^2)^{1/2}}$.)
Exercise 1.3.24. Show that a function $f : \mathbf{R}^d \to \mathbf{C}$ is measurable if and only if it is the pointwise almost everywhere limit of continuous functions $f_n : \mathbf{R}^d \to \mathbf{C}$. (Hint: if $f : \mathbf{R}^d \to \mathbf{C}$ is measurable and $n \ge 1$, show that there exists a continuous function $f_n : \mathbf{R}^d \to \mathbf{C}$ for which the set $\{x \in B(0, n) : |f(x) - f_n(x)| \ge 1/n\}$ has measure at most $\frac{1}{2^n}$. You may find Exercise 1.3.25 below to be useful for this.) Use this (and Egorov’s theorem, Theorem 1.3.26) to give an alternate proof of Lusin’s theorem for arbitrary measurable functions.
Remark 1.3.29. This is a trivial but important remark: when dealing with unsigned measurable functions such as $f : \mathbf{R}^d \to [0, +\infty]$, then Lusin’s theorem does not apply directly because f could be infinite on a set of positive measure, which is clearly in contradiction with the conclusion of Lusin’s theorem (unless one allows the continuous function to also take values in the extended non-negative reals $[0, +\infty]$ with the extended topology). However, if one knows already that f is almost everywhere finite (which is for instance the case when f is absolutely integrable), then Lusin’s theorem applies (since one can simply zero out f on the null set where it is infinite, and add that null set to the exceptional set of Lusin’s theorem).
Remark 1.3.30. By combining Lusin’s theorem with inner regularity (Exercise 1.2.15) and the Tietze extension theorem (see §1.10 of An epsilon of room, Vol. I), one can conclude that every measurable function $f : \mathbf{R}^d \to \mathbf{C}$ agrees (outside of a set of arbitrarily small measure) with a continuous function $g : \mathbf{R}^d \to \mathbf{C}$.
Exercise 1.3.25 (Littlewood-like principles). The following facts are not, strictly speaking, instances of any of Littlewood’s three principles, but are in a similar spirit.
(i) (Absolutely integrable functions almost have bounded support) Let $f : \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function, and let $\varepsilon > 0$. Show that there exists a ball $B(0, R)$ outside of which f has an $L^1$ norm of at most $\varepsilon$, or in other words that $\int_{\mathbf{R}^d \setminus B(0,R)} |f(x)|\,dx \le \varepsilon$.
(ii) (Measurable functions are almost locally bounded) Let $f : \mathbf{R}^d \to \mathbf{C}$ be a measurable function supported on a set of finite measure, and let $\varepsilon > 0$. Show that there exists a measurable set $E \subset \mathbf{R}^d$ of measure at most $\varepsilon$ outside of which f is locally bounded, or in other words that for every $R > 0$ there exists $M < \infty$ such that $|f(x)| \le M$ for all $x \in B(0, R) \setminus E$.
As with Remark 1.3.29, it is important in the second part of the exercise that f is known to be finite everywhere (or at least almost everywhere); the result would of course fail if f was, say, unsigned but took the value $+\infty$ on a set of positive measure.
1.4. Abstract measure spaces
Thus far, we have only focused on measure and integration theory in the context of Euclidean spaces $\mathbf{R}^d$. Now, we will work in a more abstract and general setting, in which the Euclidean space $\mathbf{R}^d$ is replaced by a more general space X.
It turns out that in order to properly define measure and integration on a general space X, it is not enough to just specify the set X. One also needs to specify two additional pieces of data:
(i) A collection $\mathcal{B}$ of subsets of X that one is allowed to measure; and
(ii) The measure $\mu(E) \in [0, +\infty]$ one assigns to each measurable set $E \in \mathcal{B}$.
For instance, Lebesgue measure theory covers the case when X is a Euclidean space $\mathbf{R}^d$, $\mathcal{B}$ is the collection $\mathcal{B} = \mathcal{L}[\mathbf{R}^d]$ of all Lebesgue measurable subsets of $\mathbf{R}^d$, and $\mu(E)$ is the Lebesgue measure $\mu(E) = m(E)$ of E.
The collection $\mathcal{B}$ has to obey a number of axioms (e.g. being closed with respect to countable unions) that make it a σ-algebra, which is a stronger variant of the more well-known concept of a Boolean algebra. Similarly, the measure $\mu$ has to obey a number of axioms (most notably, a countable additivity axiom) in order to obtain a measure and integration theory comparable to the Lebesgue theory on Euclidean spaces. When all these axioms are satisfied, the triple $(X, \mathcal{B}, \mu)$ is known as a measure space. These play much the same role in abstract measure theory that metric spaces or topological spaces play in abstract point-set topology, or that vector spaces play in abstract linear algebra.
On any measure space, one can set up the unsigned and absolutely
convergent integrals in almost exactly the same way as was done in
80 1. Measure theory
the previous notes for the Lebesgue integral on Euclidean spaces,
although the approximation theorems are largely unavailable at this
level of generality due to the lack of such concepts as “elementary set”
or “continuous function” for an abstract measure space. On the other
hand, one does have the fundamental convergence theorems for the
subject, namely Fatou’s lemma, the monotone convergence theorem
and the dominated convergence theorem, and we present these results
here.
One question that will not be addressed much in this section is
how one actually constructs interesting examples of measures. We will
return to this issue in Section 1.7 (although one of the most powerful
tools for such constructions, namely the Riesz representation theorem,
will not be covered here, but instead in §1.10 of An epsilon of room,
Vol. I).
1.4.1. Boolean algebras. We begin by recalling the concept of a
Boolean algebra.
Definition 1.4.1 (Boolean algebras). Let X be a set. A (concrete)
Boolean algebra on X is a collection B of subsets of X which obeys the
following properties:
(i) (Empty set) ∅ ∈ B.
(ii) (Complement) If E ∈ B, then the complement $E^c := X \setminus E$
also lies in B.
(iii) (Finite unions) If E, F ∈ B, then E ∪ F ∈ B.
We sometimes say that E is B-measurable, or measurable with respect
to B, if E ∈ B.
Given two Boolean algebras B, B′ on X, we say that B′ is finer
than or a refinement of B, or that B is coarser than, a sub-algebra of,
or a coarsening of B′, if B ⊂ B′.
We have chosen a “minimalist” deﬁnition of a Boolean algebra,
in which one is only assumed to be closed under two of the basic
Boolean operations, namely complement and ﬁnite union. However,
by using the laws of Boolean algebra (such as de Morgan’s laws),
it is easy to see that a Boolean algebra is also closed under other
Boolean algebra operations such as intersection E ∩ F, set difference
E \ F, and symmetric difference E∆F. So one could have placed these
additional closure properties inside the deﬁnition of a Boolean algebra
without any loss of generality. However, when we are verifying that a
given collection B of sets is indeed a Boolean algebra, it is convenient
to have as minimal a set of axioms as possible.
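The following is a minimal Python sketch of this point on a finite set, not part of the original text. The helper names `is_boolean_algebra` and `derived_closures_hold` are hypothetical; the first checks only the three axioms of Definition 1.4.1, and the second confirms that closure under intersection, set difference and symmetric difference then comes for free.

```python
def is_boolean_algebra(X, B):
    """Check the three axioms of Definition 1.4.1 for a finite collection B
    of subsets of a finite set X (assumed: every member of B is a subset of X)."""
    X = frozenset(X)
    B = {frozenset(E) for E in B}
    if frozenset() not in B:                      # (i) empty set
        return False
    if any(X - E not in B for E in B):            # (ii) complement
        return False
    return all(E | F in B for E in B for F in B)  # (iii) finite unions


def derived_closures_hold(B):
    """Check closure under intersection, difference and symmetric difference."""
    B = {frozenset(E) for E in B}
    return all(
        (E & F) in B and (E - F) in B and (E ^ F) in B
        for E in B for F in B
    )


X = {1, 2, 3, 4}
B = [set(), {1, 2}, {3, 4}, X]
print(is_boolean_algebra(X, B), derived_closures_hold(B))
```

Only the minimal axioms need to be verified by hand; the derived closure properties are then consequences, which is why the shorter axiom list is convenient in practice.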
Remark 1.4.2. One can also consider abstract Boolean algebras B,
which do not necessarily live in an ambient domain X, but for which
one has a collection of abstract Boolean operations such as meet ∧
and join ∨ instead of the concrete operations of intersection ∩ and
union ∪. We will not take this abstract perspective here, but see
§2.3 of An epsilon of room, Vol. I for some further discussion of the
relationship between concrete and abstract Boolean algebras, which
is codiﬁed by Stone’s theorem.
Example 1.4.3 (Trivial and discrete algebra). Given any set X, the
coarsest Boolean algebra is the trivial algebra {∅, X}, in which the
only measurable sets are the empty set and the whole set. The finest
Boolean algebra is the discrete algebra $2^X := \{E : E \subset X\}$, in which
every set is measurable. All other Boolean algebras are intermediate
between these two extremes: finer than the trivial algebra, but coarser
than the discrete one.
Exercise 1.4.1 (Elementary algebra). Let $\mathcal{E}[\mathbf{R}^d]$ be the collection of
those sets $E \subset \mathbf{R}^d$ that are either elementary sets, or co-elementary
sets (i.e. the complement of an elementary set). Show that $\mathcal{E}[\mathbf{R}^d]$ is
a Boolean algebra. We will call this algebra the elementary Boolean
algebra of $\mathbf{R}^d$.
Example 1.4.4 (Jordan algebra). Let $\mathcal{J}[\mathbf{R}^d]$ be the collection of subsets
of $\mathbf{R}^d$ that are either Jordan measurable or co-Jordan measurable
(i.e. the complement of a Jordan measurable set). Then $\mathcal{J}[\mathbf{R}^d]$ is a
Boolean algebra that is finer than the elementary algebra. We refer to
this algebra as the Jordan algebra on $\mathbf{R}^d$ (but caution that there is a
completely different concept of a Jordan algebra in abstract algebra).
Example 1.4.5 (Lebesgue algebra). Let $\mathcal{L}[\mathbf{R}^d]$ be the collection of
Lebesgue measurable subsets of $\mathbf{R}^d$. Then $\mathcal{L}[\mathbf{R}^d]$ is a Boolean algebra
that is finer than the Jordan algebra; we refer to this as the Lebesgue
algebra on $\mathbf{R}^d$.
Example 1.4.6 (Null algebra). Let $\mathcal{N}(\mathbf{R}^d)$ be the collection of subsets
of $\mathbf{R}^d$ that are either Lebesgue null sets or Lebesgue co-null sets
(the complements of null sets). Then $\mathcal{N}(\mathbf{R}^d)$ is a Boolean algebra that
is coarser than the Lebesgue algebra; we refer to it as the null algebra
on $\mathbf{R}^d$.
Exercise 1.4.2 (Restriction). Let B be a Boolean algebra on a set
X, and let Y be a subset of X (not necessarily B-measurable). Show
that the restriction $\mathcal{B}\downharpoonright_Y := \{E \cap Y : E \in \mathcal{B}\}$ of B to Y is a Boolean
algebra on Y. If Y is B-measurable, show that
$$\mathcal{B}\downharpoonright_Y = \mathcal{B} \cap 2^Y = \{E \subset Y : E \in \mathcal{B}\}.$$
Example 1.4.7 (Atomic algebra). Let X be partitioned into a union
$X = \bigcup_{\alpha \in I} A_\alpha$ of disjoint sets $A_\alpha$, which we refer to as atoms. Then
this partition generates a Boolean algebra $\mathcal{A}((A_\alpha)_{\alpha \in I})$, defined as the
collection of all the sets E of the form $E = \bigcup_{\alpha \in J} A_\alpha$ for some J ⊂ I,
i.e. $\mathcal{A}((A_\alpha)_{\alpha \in I})$ is the collection of all sets that can be represented as
the union of one or more atoms. This is easily verified to be a Boolean
algebra, and we refer to it as the atomic algebra with atoms $(A_\alpha)_{\alpha \in I}$.
The trivial algebra corresponds to the trivial partition X = X into
a single atom; at the other extreme, the discrete algebra corresponds
to the discrete partition $X = \bigcup_{x \in X} \{x\}$ into singleton atoms. More
generally, note that finer (resp. coarser) partitions lead to finer (resp.
coarser) atomic algebras. In this definition, we permit some of the
atoms in the partition to be empty; but it is clear that empty atoms
have no impact on the final atomic algebra, and so without loss of
generality one can delete all empty atoms and assume that all atoms
are non-empty if one wishes.
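For a finite partition, the atomic algebra can be enumerated directly. The Python sketch below is an illustration only (the function name `atomic_algebra` is hypothetical, and the atoms are assumed to be pairwise disjoint finite sets); it forms the union of every sub-collection of atoms, including the empty sub-collection, which yields ∅.

```python
from itertools import combinations


def atomic_algebra(atoms):
    """Return all unions of sub-collections of the given pairwise disjoint atoms."""
    atoms = [frozenset(a) for a in atoms]
    algebra = set()
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            # frozenset().union() with no arguments gives the empty set
            algebra.add(frozenset().union(*chosen))
    return algebra


coarse = atomic_algebra([{1}, {2, 3}, {4}])
fine = atomic_algebra([{1}, {2}, {3}, {4}])
print(len(coarse), len(fine))
```

In this sketch an extra empty atom leaves the resulting collection unchanged, and the finer partition produces a finer algebra, in line with the remarks above.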
Example 1.4.8 (Dyadic algebras). Let n be an integer. The dyadic
algebra $\mathcal{D}_n(\mathbf{R}^d)$ at scale $2^{-n}$ in $\mathbf{R}^d$ is defined to be the atomic algebra
generated by the half-open dyadic cubes
$$\left[\frac{i_1}{2^n}, \frac{i_1 + 1}{2^n}\right) \times \dots \times \left[\frac{i_d}{2^n}, \frac{i_d + 1}{2^n}\right)$$
of length $2^{-n}$ (see Exercise 1.1.14). These are Boolean algebras which
are increasing in n: $\mathcal{D}_{n+1} \supset \mathcal{D}_n$. Draw a diagram to indicate how
these algebras sit in relation to the elementary, Jordan, Lebesgue,
null, discrete, and trivial algebras.
Remark 1.4.9. The dyadic algebras are analogous to the finite resolution
one has on modern computer monitors, which subdivide space
into square pixels. A low resolution monitor (in which each pixel has
a large size) can only resolve a very small set of “blocky” images, as
opposed to the larger class of images that can be resolved by a ﬁner
resolution monitor.
Exercise 1.4.3. Show that the non-empty atoms of an atomic algebra
are determined up to relabeling. More precisely, show that if
$X = \bigcup_{\alpha \in I} A_\alpha = \bigcup_{\alpha' \in I'} A'_{\alpha'}$ are two partitions of X into non-empty
atoms $A_\alpha$, $A'_{\alpha'}$, then $\mathcal{A}((A_\alpha)_{\alpha \in I}) = \mathcal{A}((A'_{\alpha'})_{\alpha' \in I'})$ if and only if there exists
a bijection $\phi : I \to I'$ such that $A'_{\phi(\alpha)} = A_\alpha$ for all α ∈ I.
While many Boolean algebras are atomic, many are not, as the
following two exercises indicate.
Exercise 1.4.4. Show that every finite Boolean algebra is an atomic
algebra. (A Boolean algebra B is finite if its cardinality is finite,
i.e. there are only finitely many measurable sets.) Conclude that
every finite Boolean algebra has a cardinality of the form $2^n$ for some
natural number n. From this exercise and Exercise 1.4.3 we see that
there is a one-to-one correspondence between finite Boolean algebras
on X and finite partitions of X into non-empty sets (up to relabeling).
Exercise 1.4.5. Show that the elementary, Jordan, Lebesgue, and
null algebras are not atomic algebras. (Hint: argue by contradiction.
If these algebras were atomic, what must the atoms be?)
Now we describe some further ways to generate Boolean algebras.
Exercise 1.4.6 (Intersection of algebras). Let $(\mathcal{B}_\alpha)_{\alpha \in I}$ be a family
of Boolean algebras on a set X, indexed by a (possibly infinite or
uncountable) label set I. Show that the intersection
$\bigwedge_{\alpha \in I} \mathcal{B}_\alpha := \bigcap_{\alpha \in I} \mathcal{B}_\alpha$ of these algebras is still a Boolean algebra, and is the finest
Boolean algebra that is coarser than all of the $\mathcal{B}_\alpha$. (If I is empty, we
adopt the convention that $\bigwedge_{\alpha \in I} \mathcal{B}_\alpha$ is the discrete algebra.)
Definition 1.4.10 (Generation of algebras). Let $\mathcal{F}$ be any family
of sets in X. We define $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ to be the intersection of all the
Boolean algebras that contain $\mathcal{F}$, which is again a Boolean algebra by
Exercise 1.4.6. Equivalently, $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the coarsest Boolean algebra
that contains $\mathcal{F}$. We say that $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the Boolean algebra generated
by $\mathcal{F}$.
Example 1.4.11. $\mathcal{F}$ is a Boolean algebra if and only if $\langle \mathcal{F} \rangle_{\mathrm{bool}} = \mathcal{F}$;
thus each Boolean algebra is generated by itself.
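On a finite set X, the generated Boolean algebra can be computed by repeatedly closing a family under complements and pairwise unions until nothing new appears. The Python sketch below illustrates this; it is not the definition via intersections used in the text, and the name `generate_boolean_algebra` is hypothetical. On a finite set the two descriptions give the same collection, since every set added by the loop is forced to lie in any Boolean algebra containing the family.

```python
def generate_boolean_algebra(X, family):
    """Smallest collection containing `family` that is closed under
    complement in X and finite unions (X is assumed finite)."""
    X = frozenset(X)
    algebra = {frozenset(), X} | {frozenset(E) for E in family}
    changed = True
    while changed:
        changed = False
        current = list(algebra)
        for E in current:
            # candidates forced by the complement and union axioms
            for S in [X - E] + [E | F for F in current]:
                if S not in algebra:
                    algebra.add(S)
                    changed = True
    return algebra


X = {1, 2, 3, 4}
gen = generate_boolean_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(E) for E in gen))
```

Applying the function to its own output returns the same collection, which is the finite-set form of Example 1.4.11.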
Exercise 1.4.7. Show that the elementary algebra $\mathcal{E}[\mathbf{R}^d]$ is generated
by the collection of boxes in $\mathbf{R}^d$.
Exercise 1.4.8. Let n be a natural number. Show that if $\mathcal{F}$ is a
finite collection of n sets, then $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is a finite Boolean algebra
of cardinality at most $2^{2^n}$ (in particular, finite sets generate finite
algebras). Give an example to show that this bound is best possible.
(Hint: for the latter, it may be convenient to use a discrete ambient
space such as the discrete cube $X = \{0, 1\}^n$.)
The Boolean algebra $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ can be described explicitly in terms
of $\mathcal{F}$ as follows:
Exercise 1.4.9 (Recursive description of a generated Boolean algebra).
Let $\mathcal{F}$ be a collection of sets in a set X. Define the sets
$\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \dots$ recursively as follows:
(i) $\mathcal{F}_0 := \mathcal{F}$.
(ii) For each n ≥ 1, we define $\mathcal{F}_n$ to be the collection of all
sets that are either the union of a finite number of sets in $\mathcal{F}_{n-1}$
(including the empty union ∅), or the complement of such a
union.
Show that $\langle \mathcal{F} \rangle_{\mathrm{bool}} = \bigcup_{n=0}^\infty \mathcal{F}_n$.
1.4.2. σ-algebras and measurable spaces. In order to obtain a
measure and integration theory that can cope well with limits, the
finite union axiom of a Boolean algebra is insufficient, and must be
improved to a countable union axiom:
Definition 1.4.12 (Sigma algebras). Let X be a set. A σ-algebra
on X is a collection B of subsets of X which obeys the following properties:
(i) (Empty set) ∅ ∈ B.
(ii) (Complement) If E ∈ B, then the complement $E^c := X \setminus E$
also lies in B.
(iii) (Countable unions) If $E_1, E_2, \dots \in \mathcal{B}$, then $\bigcup_{n=1}^\infty E_n \in \mathcal{B}$.
We refer to the pair (X, B) of a set X together with a σ-algebra on
that set as a measurable space.
Remark 1.4.13. The prefix σ usually denotes “countable union”.
Other instances of this prefix include a σ-compact topological space (a
countable union of compact sets), a σ-finite measure space (a countable
union of sets of finite measure), or an $F_\sigma$ set (a countable union of
closed sets).
From de Morgan’s law (which is just as valid for infinite unions
and intersections as it is for finite ones), we see that σ-algebras are
closed under countable intersections as well as countable unions.
By padding a finite union into a countable union by using the
empty set, we see that every σ-algebra is automatically a Boolean
algebra. Thus, we automatically inherit the notion of being measurable
with respect to a σ-algebra, or of one σ-algebra being coarser or finer
than another.
Exercise 1.4.10. Show that all atomic algebras are σ-algebras. In
particular, the discrete algebra and trivial algebra are σ-algebras, as
are the finite algebras and the dyadic algebras on Euclidean spaces.
Exercise 1.4.11. Show that the Lebesgue and null algebras are σ-algebras,
but the elementary and Jordan algebras are not.
Exercise 1.4.12. Show that any restriction $\mathcal{B}\downharpoonright_Y$ of a σ-algebra B to
a subspace Y of X (as defined in Exercise 1.4.2) is again a σ-algebra
on the subspace Y.
There is an exact analogue of Exercise 1.4.6:
Exercise 1.4.13 (Intersection of σ-algebras). Show that the intersection
$\bigwedge_{\alpha \in I} \mathcal{B}_\alpha := \bigcap_{\alpha \in I} \mathcal{B}_\alpha$ of an arbitrary (and possibly infinite or
uncountable) number of σ-algebras $\mathcal{B}_\alpha$ is again a σ-algebra, and is
the finest σ-algebra that is coarser than all of the $\mathcal{B}_\alpha$.
Similarly, we have a notion of generation:
Definition 1.4.14 (Generation of σ-algebras). Let $\mathcal{F}$ be any family
of sets in X. We define $\langle \mathcal{F} \rangle$ to be the intersection of all the σ-algebras
that contain $\mathcal{F}$, which is again a σ-algebra by Exercise 1.4.13. Equivalently,
$\langle \mathcal{F} \rangle$ is the coarsest σ-algebra that contains $\mathcal{F}$. We say that
$\langle \mathcal{F} \rangle$ is the σ-algebra generated by $\mathcal{F}$.
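Since every σ-algebra is a Boolean algebra, we have the trivial
inclusion
$$\langle \mathcal{F} \rangle_{\mathrm{bool}} \subset \langle \mathcal{F} \rangle.$$
However, equality need not hold; it holds if and only if $\langle \mathcal{F} \rangle_{\mathrm{bool}}$
is a σ-algebra. For instance, if $\mathcal{F}$ is the collection of all boxes in
$\mathbf{R}^d$, then $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the elementary algebra (Exercise 1.4.7), but $\langle \mathcal{F} \rangle$
cannot equal this algebra, as the elementary algebra is not a σ-algebra.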
Remark 1.4.15. From the definitions, it is clear that we have the
following principle, somewhat analogous to the principle of mathematical
induction: if $\mathcal{F}$ is a family of sets in X, and P(E) is a
property of sets E ⊂ X which obeys the following axioms:
(i) P(∅) is true.
(ii) P(E) is true for all $E \in \mathcal{F}$.
(iii) If P(E) is true for some E ⊂ X, then P(X \ E) is true also.
(iv) If $E_1, E_2, \dots \subset X$ are such that $P(E_n)$ is true for all n, then
$P(\bigcup_{n=1}^\infty E_n)$ is true also.
Then one can conclude that P(E) is true for all $E \in \langle \mathcal{F} \rangle$. Indeed,
the set of all E for which P(E) holds is a σ-algebra that contains $\mathcal{F}$,
whence the claim. This principle is particularly useful for establishing
properties of Borel measurable sets (see below).
We now turn to an important example of a σ-algebra:
Definition 1.4.16 (Borel σ-algebra). Let X be a metric space, or
more generally a topological space. The Borel σ-algebra $\mathcal{B}[X]$ of X
is defined to be the σ-algebra generated by the open subsets of X.
Elements of $\mathcal{B}[X]$ will be called Borel measurable.
Thus, for instance, the Borel σ-algebra contains the open sets,
the closed sets (which are complements of open sets), the countable
unions of closed sets (known as $F_\sigma$ sets), the countable intersections
of open sets (known as $G_\delta$ sets), the countable intersections of $F_\sigma$
sets, and so forth.
In $\mathbf{R}^d$, every open set is Lebesgue measurable, and so we see that
the Borel σ-algebra is coarser than the Lebesgue σ-algebra. We will
shortly see, though, that the two σ-algebras are not equal.
We defined the Borel σ-algebra to be generated by the open sets.
However, it is also generated by several other families of sets:
Exercise 1.4.14. Show that the Borel σ-algebra $\mathcal{B}[\mathbf{R}^d]$ of a Euclidean
space is generated by any of the following collections of sets:
(i) The open subsets of $\mathbf{R}^d$.
(ii) The closed subsets of $\mathbf{R}^d$.
(iii) The compact subsets of $\mathbf{R}^d$.
(iv) The open balls of $\mathbf{R}^d$.
(v) The boxes in $\mathbf{R}^d$.
(vi) The elementary sets in $\mathbf{R}^d$.
(Hint: To show that two families $\mathcal{F}$, $\mathcal{F}'$ of sets generate the same
σ-algebra, it suffices to show that every σ-algebra that contains $\mathcal{F}$
contains $\mathcal{F}'$ also, and conversely.)
There is an analogue of Exercise 1.4.9, which illustrates the extent
to which a generated σ-algebra is “larger” than the analogous
generated Boolean algebra:
Exercise 1.4.15 (Recursive description of a generated σ-algebra).
(This exercise requires familiarity with the theory of ordinals, which
is reviewed in §2.4 of An epsilon of room, Vol. I. Recall that we
are assuming the axiom of choice throughout this text.) Let $\mathcal{F}$ be
a collection of sets in a set X, and let $\omega_1$ be the first uncountable
ordinal. Define the sets $\mathcal{F}_\alpha$ for every countable ordinal $\alpha \in \omega_1$ via
transfinite induction as follows:
(i) $\mathcal{F}_0 := \mathcal{F}$.
(ii) For each countable successor ordinal α = β + 1, we define
$\mathcal{F}_\alpha$ to be the collection of all sets that are either the union of
an at most countable number of sets in $\mathcal{F}_\beta$ (including the
empty union ∅), or the complement of such a union.
(iii) For each countable limit ordinal $\alpha = \sup_{\beta < \alpha} \beta$, we define
$\mathcal{F}_\alpha := \bigcup_{\beta < \alpha} \mathcal{F}_\beta$.
Show that $\langle \mathcal{F} \rangle = \bigcup_{\alpha \in \omega_1} \mathcal{F}_\alpha$.
Remark 1.4.17. The first uncountable ordinal $\omega_1$ will make several
further cameo appearances here and in An epsilon of room, Vol. I,
for instance by generating counterexamples to various plausible statements
in point-set topology. In the case when $\mathcal{F}$ is the collection of
open sets in a topological space, so that $\langle \mathcal{F} \rangle$ is the Borel σ-algebra,
the sets $\mathcal{F}_\alpha$ are essentially the Borel hierarchy (which starts at the
open and closed sets, then moves on to the $F_\sigma$ and $G_\delta$ sets, and so
forth); these play an important role in descriptive set theory.
Exercise 1.4.16. (This exercise requires familiarity with the theory
of cardinals.) Let $\mathcal{F}$ be an infinite family of subsets of X of cardinality
κ (thus κ is an infinite cardinal). Show that $\langle \mathcal{F} \rangle$ has cardinality at
most $\kappa^{\aleph_0}$. (Hint: use Exercise 1.4.15.) In particular, show that the
Borel σ-algebra $\mathcal{B}[\mathbf{R}^d]$ has cardinality at most $c := 2^{\aleph_0}$.
Conclude that there exist Jordan measurable (and hence Lebesgue
measurable) subsets of $\mathbf{R}^d$ which are not Borel measurable. (Hint:
How many subsets of the Cantor set are there?) Use this to place the
Borel σ-algebra on the diagram that you drew for Example 1.4.8.
Remark 1.4.18. Despite this demonstration that not all Lebesgue
measurable subsets are Borel measurable, it is remarkably difficult
(though not impossible) to exhibit a specific set that is not Borel
measurable. Indeed, a large majority of the explicitly constructible
sets that one actually encounters in practice tend to be Borel measurable,
and one can view the property of Borel measurability intuitively
as a kind of “constructibility” property. (Indeed, as a very crude first
approximation, one can view the Borel measurable sets as those sets
of “countable descriptive complexity”; in contrast, sets of finite
descriptive complexity tend to be Jordan measurable (assuming they
are bounded, of course).)
Exercise 1.4.17. Let E, F be Borel measurable subsets of $\mathbf{R}^{d_1}$, $\mathbf{R}^{d_2}$
respectively. Show that E × F is a Borel measurable subset of $\mathbf{R}^{d_1 + d_2}$.
(Hint: first establish this in the case when F is a box, by using
Remark 1.4.15. To obtain the general case, apply Remark 1.4.15 yet
again.)
The above exercise has a partial converse:
Exercise 1.4.18. Let E be a Borel measurable subset of $\mathbf{R}^{d_1 + d_2}$.
(i) Show that for any $x_1 \in \mathbf{R}^{d_1}$, the slice $\{x_2 \in \mathbf{R}^{d_2} : (x_1, x_2) \in E\}$
is a Borel measurable subset of $\mathbf{R}^{d_2}$. Similarly, show
that for every $x_2 \in \mathbf{R}^{d_2}$, the slice $\{x_1 \in \mathbf{R}^{d_1} : (x_1, x_2) \in E\}$
is a Borel measurable subset of $\mathbf{R}^{d_1}$.
(ii) Give a counterexample to show that this claim is not true
if “Borel” is replaced with “Lebesgue” throughout. (Hint:
the Cartesian product of any set with a point is a null set,
even if the first set was not measurable.)
Exercise 1.4.19. Show that the Lebesgue σ-algebra on $\mathbf{R}^d$ is generated
by the union of the Borel σ-algebra and the null σ-algebra.
1.4.3. Countably additive measures and measure spaces. Having
set out the concept of a σ-algebra and a measurable space, we now
endow these structures with a measure.
We begin with the ﬁnitely additive theory, although this theory
is too weak for our purposes and will soon be supplanted by the
countably additive theory.
Definition 1.4.19 (Finitely additive measure). Let B be a Boolean
algebra on a space X. An (unsigned) finitely additive measure µ on
B is a map µ : B → [0, +∞] that obeys the following axioms:
(i) (Empty set) µ(∅) = 0.
(ii) (Finite additivity) Whenever E, F ∈ B are disjoint, then
µ(E ∪ F) = µ(E) + µ(F).
Remark 1.4.20. The empty set axiom is needed in order to rule out
the degenerate situation in which every set (including the empty set)
has inﬁnite measure.
Example 1.4.21. Lebesgue measure m is a finitely additive measure
on the Lebesgue σ-algebra, and hence on all sub-algebras (such as the
null algebra, the Jordan algebra, or the elementary algebra). In particular,
Jordan measure and elementary measure are finitely additive
(adopting the convention that co-Jordan measurable sets have infinite
Jordan measure, and co-elementary sets have infinite elementary
measure).
On the other hand, as we saw in previous notes, Lebesgue outer
measure is not finitely additive on the discrete algebra, and Jordan
outer measure is not finitely additive on the Lebesgue algebra.
Example 1.4.22 (Dirac measure). Let x ∈ X and B be an arbitrary
Boolean algebra on X. Then the Dirac measure $\delta_x$ at x, defined by
setting $\delta_x(E) := 1_E(x)$, is finitely additive.
Example 1.4.23 (Zero measure). The zero measure 0 : E ↦ 0 is a
finitely additive measure on any Boolean algebra.
Example 1.4.24 (Linear combinations of measures). If B is a Boolean
algebra on X, and µ, ν : B → [0, +∞] are finitely additive measures on
B, then µ + ν : E ↦ µ(E) + ν(E) is also a finitely additive measure, as
is cµ : E ↦ c × µ(E) for any c ∈ [0, +∞]. Thus, for instance, the sum
of Lebesgue measure and a Dirac measure is also a finitely additive
measure on the Lebesgue algebra (or on any of its sub-algebras).
Example 1.4.25 (Restriction of a measure). If B is a Boolean algebra
on X, µ : B → [0, +∞] is a finitely additive measure, and Y is a
B-measurable subset of X, then the restriction $\mu\downharpoonright_Y : \mathcal{B}\downharpoonright_Y \to [0, +\infty]$ of
µ to Y, defined by setting $\mu\downharpoonright_Y(E) := \mu(E)$ whenever $E \in \mathcal{B}\downharpoonright_Y$ (i.e.
if E ∈ B and E ⊂ Y), is also a finitely additive measure.
Example 1.4.26 (Counting measure). If B is a Boolean algebra on
X, then the function # : B → [0, +∞] defined by setting #(E) to be
the cardinality of E if E is finite, and #(E) := +∞ if E is infinite, is
a finitely additive measure, known as counting measure.
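The examples above can be checked mechanically on a small finite set. The Python sketch below is an illustration under simplifying assumptions: the ambient set is {1, 2, 3} with the discrete algebra, all sets are finite (so counting measure is just `len`), and the helper names `dirac`, `counting`, `add_measures` and `is_finitely_additive` are hypothetical.

```python
from itertools import combinations


def power_set(X):
    """The discrete algebra 2^X of a finite set X, as a list of frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def dirac(x):
    """Dirac measure at x: E -> 1_E(x)."""
    return lambda E: 1 if x in E else 0


def counting(E):
    """Counting measure, restricted here to finite sets."""
    return len(E)


def add_measures(mu, nu):
    """The sum measure E -> mu(E) + nu(E)."""
    return lambda E: mu(E) + nu(E)


def is_finitely_additive(mu, algebra):
    """Check the two axioms of Definition 1.4.19 on a finite algebra."""
    if mu(frozenset()) != 0:
        return False
    return all(mu(E | F) == mu(E) + mu(F)
               for E in algebra for F in algebra if not (E & F))


algebra = power_set({1, 2, 3})
print(is_finitely_additive(add_measures(counting, dirac(2)), algebra))
```

By contrast, a map such as E ↦ |E|² fails the additivity axiom, and a constant map E ↦ 1 fails the empty set axiom.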
As with our definition of Boolean algebras and σ-algebras, we
adopted a “minimalist” definition so that the axioms are easy to verify.
But they imply several further useful properties:
Exercise 1.4.20. Let µ : B → [0, +∞] be a finitely additive measure
on a Boolean algebra B. Establish the following facts:
(i) (Monotonicity) If E, F are B-measurable and E ⊂ F, then
µ(E) ≤ µ(F).
(ii) (Finite additivity) If k is a natural number, and $E_1, \dots, E_k$
are B-measurable and disjoint, then
$\mu(E_1 \cup \dots \cup E_k) = \mu(E_1) + \dots + \mu(E_k)$.
(iii) (Finite subadditivity) If k is a natural number, and $E_1, \dots, E_k$
are B-measurable, then
$\mu(E_1 \cup \dots \cup E_k) \le \mu(E_1) + \dots + \mu(E_k)$.
(iv) (Inclusion-exclusion for two sets) If E, F are B-measurable,
then µ(E ∪ F) + µ(E ∩ F) = µ(E) + µ(F).
(Caution: remember that the cancellation law a + c = b + c ⟹ a = b
does not hold in [0, +∞] if c is infinite, and so the use of cancellation
(or subtraction) should be avoided if possible.)
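The caution about cancellation can be seen concretely with floating-point infinity, which behaves like +∞ for addition. The short Python sketch below is only an illustration; the variable names are hypothetical, and the measures shown are those of E = F = R under Lebesgue measure, where every quantity involved is +∞.

```python
import math

a, b, c = 1.0, 2.0, math.inf

same_sum = (a + c == b + c)      # both sides equal +inf
cancellation_valid = (a == b)    # yet a and b are different

# Inclusion-exclusion for E = F = R under Lebesgue measure:
mu_E = mu_F = mu_union = mu_inter = math.inf
additive_form_ok = (mu_union + mu_inter == mu_E + mu_F)
subtracted = mu_E + mu_F - mu_inter   # inf - inf is undefined (nan)

print(same_sum, cancellation_valid, additive_form_ok, subtracted)
```

This is why part (iv) is stated with both sides written as sums: the additive form stays meaningful when some of the measures are infinite, whereas the subtracted form does not.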
One can characterise measures completely for any ﬁnite algebra:
Exercise 1.4.21. Let B be a finite Boolean algebra, generated by
a finite family $A_1, \dots, A_k$ of non-empty atoms. Show that for every
finitely additive measure µ on B there exist $c_1, \dots, c_k \in [0, +\infty]$ such
that
$$\mu(E) = \sum_{1 \le j \le k : A_j \subset E} c_j.$$
Equivalently, if $x_j$ is a point in $A_j$ for each 1 ≤ j ≤ k, then
$$\mu = \sum_{j=1}^k c_j \delta_{x_j}.$$
Furthermore, show that the $c_1, \dots, c_k$ are uniquely determined by µ.
This is about the limit of what one can say about finitely additive
measures at this level of generality. We now specialise to the countably
additive measures on σ-algebras.
Definition 1.4.27 (Countably additive measure). Let (X, B) be a
measurable space. An (unsigned) countably additive measure µ on
B, or measure for short, is a map µ : B → [0, +∞] that obeys the
following axioms:
(i) (Empty set) µ(∅) = 0.
(ii) (Countable additivity) Whenever $E_1, E_2, \dots \in \mathcal{B}$ are a countable
sequence of disjoint measurable sets, then
$\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$.
A triplet (X, B, µ), where (X, B) is a measurable space and µ : B →
[0, +∞] is a countably additive measure, is known as a measure space.
Note the distinction between a measure space and a measurable
space. The latter has the capability to be equipped with a measure,
but the former is actually equipped with a measure.
Example 1.4.28. Lebesgue measure is a countably additive measure
on the Lebesgue σ-algebra, and hence on every sub-σ-algebra (such
as the Borel σ-algebra).
Example 1.4.29. The Dirac measures from Example 1.4.22 are countably
additive, as is counting measure.
Example 1.4.30. Any restriction of a countably additive measure
to a measurable subspace is again countably additive.
Exercise 1.4.22 (Countable combinations of measures). Let (X, B)
be a measurable space.
(i) If µ is a countably additive measure on B, and c ∈ [0, +∞],
then cµ is also countably additive.
(ii) If $\mu_1, \mu_2, \dots$ are a sequence of countably additive measures
on B, then the sum $\sum_{n=1}^\infty \mu_n : E \mapsto \sum_{n=1}^\infty \mu_n(E)$ is also a
countably additive measure.
Note that countably additive measures are necessarily finitely
additive (by padding out a finite union into a countable union using
the empty set), and so countably additive measures inherit all the
properties of finitely additive measures, such as monotonicity and
finite subadditivity. But one also has additional properties:
Exercise 1.4.23. Let (X, B, µ) be a measure space.
(i) (Countable subadditivity) If $E_1, E_2, \dots$ are B-measurable,
then $\mu(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu(E_n)$.
(ii) (Upwards monotone convergence) If $E_1 \subset E_2 \subset \dots$ are
B-measurable, then
$$\mu\Big(\bigcup_{n=1}^\infty E_n\Big) = \lim_{n \to \infty} \mu(E_n) = \sup_n \mu(E_n).$$
(iii) (Downwards monotone convergence) If $E_1 \supset E_2 \supset \dots$ are
B-measurable, and $\mu(E_n) < \infty$ for at least one n, then
$$\mu\Big(\bigcap_{n=1}^\infty E_n\Big) = \lim_{n \to \infty} \mu(E_n) = \inf_n \mu(E_n).$$
Show that the downward monotone convergence claim can fail if the
hypothesis that $\mu(E_n) < \infty$ for at least one n is dropped. (Hint:
mimic the solution to Exercise 1.2.11.)
Exercise 1.4.24 (Dominated convergence for sets). Let (X, B, µ)
be a measure space. Let $E_1, E_2, \dots$ be a sequence of B-measurable
sets that converge to another set E, in the sense that $1_{E_n}$ converges
pointwise to $1_E$.
(i) Show that E is also B-measurable.
(ii) If there exists a B-measurable set F of finite measure (i.e.
µ(F) < ∞) that contains all of the $E_n$, show that
$\lim_{n \to \infty} \mu(E_n) = \mu(E)$. (Hint: Apply downward monotonicity to the sets
$\bigcup_{n > N} (E_n \Delta E)$.)
(iii) Show that the previous part of this exercise can fail if the
hypothesis that all the $E_n$ are contained in a set of finite
measure is omitted.
Exercise 1.4.25. Let X be an at most countable set with the discrete
σ-algebra. Show that every measure µ on this measurable space can
be uniquely represented in the form
$$\mu = \sum_{x \in X} c_x \delta_x$$
for some $c_x \in [0, +\infty]$, thus
$$\mu(E) = \sum_{x \in E} c_x$$
for all E ⊂ X. (This claim fails in the uncountable case, although
showing this is slightly tricky.)
A useful technical property, enjoyed by some measure spaces, is
that of completeness:
Definition 1.4.31 (Completeness). A null set of a measure space
(X, B, µ) is defined to be a B-measurable set of measure zero. A sub-null
set is any subset of a null set. A measure space is said to be
complete if every sub-null set is a null set.
Thus, for instance, the Lebesgue measure space $(\mathbf{R}^d, \mathcal{L}[\mathbf{R}^d], m)$ is
complete, but the Borel measure space $(\mathbf{R}^d, \mathcal{B}[\mathbf{R}^d], m)$ is not (as can
be seen from the solution to Exercise 1.4.16).
Completeness is a convenient property to have in some cases, particularly
when dealing with properties that hold almost everywhere.
Fortunately, it is fairly easy to modify any measure space to be complete:
Exercise 1.4.26 (Completion). Let (X, B, µ) be a measure space.
Show that there exists a unique refinement $(X, \overline{\mathcal{B}}, \overline{\mu})$, known as the
completion of (X, B, µ), which is the coarsest refinement of (X, B, µ)
that is complete. Furthermore, show that $\overline{\mathcal{B}}$ consists precisely of those
sets that differ from a B-measurable set by a B-sub-null set.
Exercise 1.4.27. Show that the Lebesgue measure space $(\mathbf{R}^d, \mathcal{L}[\mathbf{R}^d], m)$
is the completion of the Borel measure space $(\mathbf{R}^d, \mathcal{B}[\mathbf{R}^d], m)$.
Exercise 1.4.28 (Approximation by an algebra). Let $\mathcal{A}$ be a Boolean
algebra on X, and let µ be a measure on $\langle \mathcal{A} \rangle$.
(i) If µ(X) < ∞, show that for every $E \in \langle \mathcal{A} \rangle$ and ε > 0 there
exists $F \in \mathcal{A}$ such that µ(E∆F) < ε.
(ii) More generally, if $X = \bigcup_{n=1}^\infty A_n$ for some $A_1, A_2, \dots \in \mathcal{A}$
with $\mu(A_n) < \infty$ for all n, $E \in \langle \mathcal{A} \rangle$ has finite measure, and
ε > 0, show that there exists $F \in \mathcal{A}$ such that µ(E∆F) < ε.
1.4.4. Measurable functions, and integration on a measure
space. Now we are ready to define integration on measure spaces.
We first need the notion of a measurable function, which is analogous
to that of a continuous function in topology. Recall that a function
f : X → Y between two topological spaces X, Y is continuous if the
inverse image $f^{-1}(U)$ of any open set U is open. In a similar spirit, we
have
Definition 1.4.32. Let (X, B) be a measurable space, and let f :
X → [0, +∞] or f : X → C be an unsigned or complex-valued
function. We say that f is measurable if $f^{-1}(U)$ is B-measurable for
every open subset U of [0, +∞] or C.
From Lemma 1.3.9, we see that this generalises the notion of a
Lebesgue measurable function.
Exercise 1.4.29. Let (X, B) be a measurable space.
(i) Show that a function f : X → [0, +∞] is measurable if and
only if the level sets {x ∈ X : f(x) > λ} are B-measurable.
(ii) Show that an indicator function $1_E$ of a set E ⊂ X is measurable
if and only if E itself is B-measurable.
(iii) Show that a function f : X → [0, +∞] or f : X → C is
measurable if and only if $f^{-1}(E)$ is B-measurable for every
Borel-measurable subset E of [0, +∞] or C.
(iv) Show that a function f : X → C is measurable if and only
if its real and imaginary parts are measurable.
(v) Show that a function f : X → R is measurable if and only
if the magnitudes $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ of its
positive and negative parts are measurable.
(vi) If $f_n : X \to [0, +\infty]$ are a sequence of measurable functions
that converge pointwise to a limit f : X → [0, +∞], then
show that f is also measurable. Obtain the same claim if
[0, +∞] is replaced by C.
(vii) If f : X → [0, +∞] is measurable and φ : [0, +∞] → [0, +∞]
is continuous, show that φ ◦ f is measurable. Obtain the
same claim if [0, +∞] is replaced by C.
(viii) Show that the sum or product of two measurable functions
in [0, +∞] or C is still measurable.
Remark 1.4.33. One can also view measurable functions in a more
category-theoretic fashion. Define a measurable morphism or measurable
map f from one measurable space (X, B) to another $(Y, \mathcal{C})$ to
be a function f : X → Y with the property that $f^{-1}(E)$ is B-measurable
for every $\mathcal{C}$-measurable set E. Then a measurable function
f : X → [0, +∞] or f : X → C is the same thing as a measurable
morphism from X to [0, +∞] or C, where the latter is equipped with
the Borel σ-algebra. Also, one σ-algebra B on a space X is coarser
than another B′ precisely when the identity map $\mathrm{id}_X : X \to X$ is
a measurable morphism from (X, B′) to (X, B). The main purpose
of adopting this viewpoint is that it is obvious that the composition
of measurable morphisms is again a measurable morphism. This
is important in those fields of mathematics, such as ergodic theory
(discussed in [Ta2009]), in which one frequently wishes to compose
measurable transformations (and in particular, to compose a transformation
T : (X, B) → (X, B) with itself repeatedly); but it will not
play a major role in this text.
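On finite measurable spaces the definition of a measurable morphism can be tested by brute force. The Python sketch below is an illustration only; the helper names `preimage` and `is_measurable_map` and the particular maps `half` and `parity` are hypothetical choices, and σ-algebras are represented as explicit lists of sets.

```python
from itertools import combinations


def power_set(X):
    """The discrete sigma-algebra on a finite set X."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def preimage(f, X, E):
    """The inverse image f^{-1}(E) inside the domain X."""
    return frozenset(x for x in X if f(x) in E)


def is_measurable_map(f, X, B, C):
    """True if f^{-1}(E) lies in B for every E in C."""
    B = {frozenset(E) for E in B}
    return all(preimage(f, X, E) in B for E in C)


X = {0, 1, 2, 3}
B = [set(), {0, 1}, {2, 3}, X]      # atomic algebra with atoms {0,1}, {2,3}
Y = {0, 1}
C = power_set(Y)

half = lambda x: x // 2             # constant on each atom of B
parity = lambda x: x % 2            # not constant on the atoms of B
identity = lambda x: x

print(is_measurable_map(half, X, B, C), is_measurable_map(parity, X, B, C))
```

The identity map is measurable from the discrete space (X, 2^X) to the coarser space (X, B) but not in the other direction, matching the characterisation of coarseness given in the remark.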
Measurable functions are particularly easy to describe on atomic
spaces:
Exercise 1.4.30. Let (X, B) be a measurable space that is atomic,
thus $\mathcal{B} = \mathcal{A}((A_\alpha)_{\alpha \in I})$ for some partition $X = \bigcup_{\alpha \in I} A_\alpha$ of X into
disjoint non-empty atoms. Show that a function f : X → [0, +∞] or
f : X → C is measurable if and only if it is constant on each atom,
or equivalently if one has a representation of the form
$$f = \sum_{\alpha \in I} c_\alpha 1_{A_\alpha}$$
for some constants $c_\alpha$ in [0, +∞] or in C as appropriate. Furthermore,
the $c_\alpha$ are uniquely determined by f.
Exercise 1.4.31 (Egorov’s theorem). Let (X, B, µ) be a finite measure
space (so µ(X) < ∞), and let $f_n : X \to \mathbf{C}$ be a sequence of
measurable functions that converge pointwise almost everywhere to a
limit f : X → C, and let ε > 0. Show that there exists a measurable
set E of measure at most ε such that $f_n$ converges uniformly to f
outside of E. Give an example to show that the claim can fail when
the measure µ is not finite.
In Section 1.3 we defined first a simple integral, then an unsigned
integral, and then finally an absolutely convergent integral.
We perform the same three stages here. We begin with the simple
integral, which in the abstract setting becomes integration in the case
when the σ-algebra is finite:
Definition 1.4.34 (Simple integral). Let $(X, \mathcal{B}, \mu)$ be a measure space with $\mathcal{B}$ finite. By Exercise 1.4.4, $X$ is partitioned into a finite number of atoms $A_1, \ldots, A_n$. If $f : X \to [0, +\infty]$ is measurable, then by Exercise 1.4.30 it has a unique representation of the form
$$f = \sum_{i=1}^n c_i 1_{A_i}$$
for some $c_1, \ldots, c_n \in [0, +\infty]$. We then define the simple integral $\mathrm{Simp} \int_X f\, d\mu$ of $f$ by the formula
$$\mathrm{Simp} \int_X f\, d\mu := \sum_{i=1}^n c_i \mu(A_i).$$
Note that, thanks to Exercise 1.4.3, the precise decomposition into atoms does not affect the definition of the simple integral.
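As a purely illustrative sketch (not part of the text's development), the formula for the simple integral can be evaluated directly once the atoms are listed. The function name `simple_integral` and the sample data are hypothetical; the only convention assumed is the usual one that $0 \cdot (+\infty) = 0$ in the extended non-negative reals.

```python
import math

def simple_integral(coeffs, atom_measures):
    """Return sum_i c_i * mu(A_i) for f = sum_i c_i 1_{A_i}.

    coeffs[i] is the value c_i of f on the atom A_i and atom_measures[i]
    is mu(A_i); both may be math.inf. The convention 0 * (+inf) = 0 is
    applied explicitly, since Python evaluates 0 * math.inf as nan.
    """
    total = 0.0
    for c, m in zip(coeffs, atom_measures):
        if c == 0 or m == 0:
            continue  # this atom contributes nothing, by convention
        total += c * m
    return total

# Hypothetical example: three atoms of measure 1, 2 and +inf.
measures = [1.0, 2.0, math.inf]
f = [3.0, 0.5, 0.0]  # f vanishes on the atom of infinite measure
g = [3.0, 0.5, 1.0]  # g does not
print(simple_integral(f, measures))
print(simple_integral(g, measures))
```

Here the first integral is finite because $f$ vanishes on the atom of infinite measure, while the second is infinite.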
Exercise 1.4.32. Propose a definition for the simple integral for absolutely convergent complex-valued functions on a measurable space with a finite σ-algebra.
With this definition, it is clear that one has the monotonicity property
$$\mathrm{Simp} \int_X f\, d\mu \le \mathrm{Simp} \int_X g\, d\mu$$
whenever $f \le g$ are unsigned measurable, as well as the linearity properties
$$\mathrm{Simp} \int_X f + g\, d\mu = \mathrm{Simp} \int_X f\, d\mu + \mathrm{Simp} \int_X g\, d\mu$$
and
$$\mathrm{Simp} \int_X cf\, d\mu = c\, \mathrm{Simp} \int_X f\, d\mu$$
for unsigned measurable $f, g$ and $c \in [0, +\infty]$. We also make the following important technical observation:
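Exercise 1.4.33 (Simple integral unaffected by refinements). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $(X, \mathcal{B}', \mu')$ be a refinement of $(X, \mathcal{B}, \mu)$, which means that $\mathcal{B}'$ contains $\mathcal{B}$ and $\mu' : \mathcal{B}' \to [0, +\infty]$ agrees with $\mu : \mathcal{B} \to [0, +\infty]$ on $\mathcal{B}$. Suppose that both $\mathcal{B}, \mathcal{B}'$ are finite, and let $f : X \to [0, +\infty]$ be $\mathcal{B}$-measurable. Show that
$$\mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu'.$$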
This allows one to extend the simple integral to simple functions:
Definition 1.4.35 (Integral of simple functions). An (unsigned) simple function $f : X \to [0, +\infty]$ on a measurable space $(X, \mathcal{B})$ is a measurable function that takes on finitely many values $a_1, \ldots, a_k$. Note that such a function is then automatically measurable with respect to at least one finite sub-σ-algebra $\mathcal{B}'$ of $\mathcal{B}$, namely the σ-algebra $\mathcal{B}'$ generated by the preimages $f^{-1}(\{a_1\}), \ldots, f^{-1}(\{a_k\})$ of $a_1, \ldots, a_k$. We then define the simple integral $\mathrm{Simp} \int_X f\, d\mu$ by the formula
$$\mathrm{Simp} \int_X f\, d\mu := \mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}'},$$
where $\mu|_{\mathcal{B}'} : \mathcal{B}' \to [0, +\infty]$ is the restriction of $\mu : \mathcal{B} \to [0, +\infty]$ to $\mathcal{B}'$.
Note that there could be multiple finite σ-algebras with respect to which $f$ is measurable, but Exercise 1.4.33 guarantees that all such extensions will give the same simple integral. Indeed, if $f$ were measurable with respect to two separate finite sub-σ-algebras $\mathcal{B}'$ and $\mathcal{B}''$ of $\mathcal{B}$, then it would also be measurable with respect to their common refinement $\mathcal{B}' \vee \mathcal{B}'' := \langle \mathcal{B}' \cup \mathcal{B}'' \rangle$, which is also finite (by Exercise 1.4.8), and then by Exercise 1.4.33, $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}'}$ and $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}''}$ are both equal to $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}' \vee \mathcal{B}''}$, and hence equal to each other.
From this we can deduce the following properties of the simple integral. As with the Lebesgue theory, we say that a property $P(x)$ of an element $x \in X$ of a measure space $(X, \mathcal{B}, \mu)$ holds $\mu$-almost everywhere if it holds outside of a sub-null set.
Exercise 1.4.34 (Basic properties of the simple integral). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0, +\infty]$ be simple functions.
(i) (Monotonicity) If $f \le g$ pointwise, then $\mathrm{Simp} \int_X f\, d\mu \le \mathrm{Simp} \int_X g\, d\mu$.
(ii) (Compatibility with measure) For every $\mathcal{B}$-measurable set $E$, we have $\mathrm{Simp} \int_X 1_E\, d\mu = \mu(E)$.
(iii) (Homogeneity) For every $c \in [0, +\infty]$, one has $\mathrm{Simp} \int_X cf\, d\mu = c\, \mathrm{Simp} \int_X f\, d\mu$.
(iv) (Finite additivity) $\mathrm{Simp} \int_X (f + g)\, d\mu = \mathrm{Simp} \int_X f\, d\mu + \mathrm{Simp} \int_X g\, d\mu$.
(v) (Insensitivity to refinement) If $(X, \mathcal{B}', \mu')$ is a refinement of $(X, \mathcal{B}, \mu)$ (as defined in Exercise 1.4.33), then $\mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu'$.
(vi) (Almost everywhere equivalence) If $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X g\, d\mu$.
(vii) (Finiteness) $\mathrm{Simp} \int_X f\, d\mu < \infty$ if and only if $f$ is finite almost everywhere, and is supported on a set of finite measure.
(viii) (Vanishing) $\mathrm{Simp} \int_X f\, d\mu = 0$ if and only if $f$ is zero almost everywhere.
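Exercise 1.4.35 (Inclusion-exclusion principle). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $A_1, \ldots, A_n$ be $\mathcal{B}$-measurable sets of finite measure. Show that
$$\mu\Big( \bigcup_{i=1}^n A_i \Big) = \sum_{J \subset \{1, \ldots, n\} : J \neq \emptyset} (-1)^{|J| - 1} \mu\Big( \bigcap_{i \in J} A_i \Big).$$
(Hint: Compute $\mathrm{Simp} \int_X \big(1 - \prod_{i=1}^n (1 - 1_{A_i})\big)\, d\mu$ in two different ways.)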
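The identity can be sanity-checked numerically in the simplest setting, namely counting measure on a finite set, where $\mu(A)$ is just the cardinality of $A$. The following is only a sketch of such a check (it is not a proof and does not replace the exercise); the function name and the sample sets are hypothetical.

```python
from itertools import combinations

def union_measure_by_inclusion_exclusion(sets):
    """Right-hand side of the inclusion-exclusion formula for counting measure."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):                    # k = |J|
        for J in combinations(range(n), k):      # non-empty index sets J
            intersection = set.intersection(*(sets[i] for i in J))
            total += (-1) ** (k - 1) * len(intersection)
    return total

# Hypothetical sample sets A_1, A_2, A_3.
sets = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
lhs = len(set.union(*sets))                      # measure of the union
rhs = union_measure_by_inclusion_exclusion(sets)
print(lhs, rhs)
```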
Remark 1.4.36. The simple integral could also be defined on finitely additive measure spaces, rather than countably additive ones, and all the above properties would still apply. However, on a finitely additive measure space one would have difficulty extending the integral beyond simple functions, as we will now do.
From the simple integral, we can now define the unsigned integral, in analogy to how the unsigned Lebesgue integral was constructed in Section 1.3.3.
Definition 1.4.37. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f : X \to [0, +\infty]$ be measurable. Then we define the unsigned integral $\int_X f\, d\mu$ of $f$ by the formula
(1.14) $$\int_X f\, d\mu := \sup_{0 \le g \le f;\ g \text{ simple}} \mathrm{Simp} \int_X g\, d\mu.$$
Clearly, this definition generalises Definition 1.3.13. Indeed, if $f : \mathbf{R}^d \to [0, +\infty]$ is Lebesgue measurable, then $\int_{\mathbf{R}^d} f(x)\, dx = \int_{\mathbf{R}^d} f\, dm$.
We record some easy properties of this integral:
Exercise 1.4.36 (Easy properties of the unsigned integral). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0, +\infty]$ be measurable.
(i) (Almost everywhere equivalence) If $f = g$ $\mu$-almost everywhere, then $\int_X f\, d\mu = \int_X g\, d\mu$.
(ii) (Monotonicity) If $f \le g$ $\mu$-almost everywhere, then $\int_X f\, d\mu \le \int_X g\, d\mu$.
(iii) (Homogeneity) We have $\int_X cf\, d\mu = c \int_X f\, d\mu$ for every $c \in [0, +\infty]$.
(iv) (Superadditivity) We have $\int_X (f + g)\, d\mu \ge \int_X f\, d\mu + \int_X g\, d\mu$.
(v) (Compatibility with the simple integral) If $f$ is simple, then $\int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu$.
(vi) (Markov's inequality) For any $0 < \lambda < \infty$, one has
$$\mu(\{x \in X : f(x) \ge \lambda\}) \le \frac{1}{\lambda} \int_X f\, d\mu.$$
In particular, if $\int_X f\, d\mu < \infty$, then the sets $\{x \in X : f(x) \ge \lambda\}$ have finite measure for each $\lambda > 0$.
(vii) (Finiteness) If $\int_X f\, d\mu < \infty$, then $f(x)$ is finite for $\mu$-almost every $x$.
(viii) (Vanishing) If $\int_X f\, d\mu = 0$, then $f(x)$ is zero for $\mu$-almost every $x$.
(ix) (Vertical truncation) We have $\lim_{n \to \infty} \int_X \min(f, n)\, d\mu = \int_X f\, d\mu$.
(x) (Horizontal truncation) If $E_1 \subset E_2 \subset \ldots$ is an increasing sequence of $\mathcal{B}$-measurable sets, then
$$\lim_{n \to \infty} \int_X f 1_{E_n}\, d\mu = \int_X f 1_{\bigcup_{n=1}^\infty E_n}\, d\mu.$$
(xi) (Restriction) If $Y$ is a measurable subset of $X$, then $\int_X f 1_Y\, d\mu = \int_Y f|_Y\, d\mu|_Y$, where $f|_Y : Y \to [0, +\infty]$ is the restriction of $f : X \to [0, +\infty]$ to $Y$, and the restriction $\mu|_Y$ was defined in Example 1.4.25. We will often abbreviate $\int_Y f|_Y\, d\mu|_Y$ (by slight abuse of notation) as $\int_Y f\, d\mu$.
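To get a feel for Markov's inequality before proving it, one can test it on a finite set $X$ carrying a weighted counting measure, where the unsigned integral reduces to a finite weighted sum. The sketch below does this; the names `integral`, `level_set_measure`, and the sample weights are hypothetical choices made only for illustration.

```python
def integral(f, weights):
    """Unsigned integral of f against the measure giving {x} mass weights[x]."""
    return sum(f[x] * w for x, w in weights.items())

def level_set_measure(f, weights, lam):
    """Measure of the set {x : f(x) >= lam}."""
    return sum(w for x, w in weights.items() if f[x] >= lam)

# Hypothetical finite measure space and unsigned function.
weights = {"a": 0.5, "b": 1.5, "c": 2.0}
f = {"a": 4.0, "b": 1.0, "c": 0.0}

for lam in (1.0, 2.0, 4.0):
    lhs = level_set_measure(f, weights, lam)
    rhs = integral(f, weights) / lam
    print(lam, lhs, rhs, lhs <= rhs)
```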
As before, one of the key properties of this integral is its additivity:
Theorem 1.4.38. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0, +\infty]$ be measurable. Then
$$\int_X (f + g)\, d\mu = \int_X f\, d\mu + \int_X g\, d\mu.$$
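Proof. In view of superadditivity, it suffices to establish the subadditivity property
$$\int_X (f + g)\, d\mu \le \int_X f\, d\mu + \int_X g\, d\mu.$$
We establish this in stages. We first deal with the case when $\mu$ is a finite measure (which means that $\mu(X) < \infty$) and $f, g$ are bounded. Pick an $\varepsilon > 0$, and let $f_\varepsilon$ be $f$ rounded down to the nearest integer multiple of $\varepsilon$, and $f^\varepsilon$ be $f$ rounded up to the nearest integer multiple. Clearly, we have the pointwise bounds
$$f_\varepsilon(x) \le f(x) \le f^\varepsilon(x)$$
and
$$f^\varepsilon(x) - f_\varepsilon(x) \le \varepsilon.$$
Since $f$ is bounded, $f_\varepsilon$ and $f^\varepsilon$ are simple. Similarly define $g_\varepsilon, g^\varepsilon$. We then have the pointwise bound
$$f + g \le f^\varepsilon + g^\varepsilon \le f_\varepsilon + g_\varepsilon + 2\varepsilon,$$
hence by Exercise 1.4.36 and the properties of the simple integral,
$$\int_X f + g\, d\mu \le \int_X f_\varepsilon + g_\varepsilon + 2\varepsilon\, d\mu = \mathrm{Simp} \int_X f_\varepsilon + g_\varepsilon + 2\varepsilon\, d\mu = \mathrm{Simp} \int_X f_\varepsilon\, d\mu + \mathrm{Simp} \int_X g_\varepsilon\, d\mu + 2\varepsilon \mu(X).$$
From (1.14) we conclude that
$$\int_X f + g\, d\mu \le \int_X f\, d\mu + \int_X g\, d\mu + 2\varepsilon \mu(X).$$
Letting $\varepsilon \to 0$ and using the assumption that $\mu(X)$ is finite, we obtain the claim.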
Now we continue to assume that $\mu$ is a finite measure, but now do not assume that $f, g$ are bounded. Then for any natural number $n$, we can use the previous case to deduce that
$$\int_X \min(f, n) + \min(g, n)\, d\mu \le \int_X \min(f, n)\, d\mu + \int_X \min(g, n)\, d\mu.$$
Since $\min(f + g, n) \le \min(f, n) + \min(g, n)$, we conclude that
$$\int_X \min(f + g, n)\, d\mu \le \int_X \min(f, n)\, d\mu + \int_X \min(g, n)\, d\mu.$$
Taking limits as $n \to \infty$ using vertical truncation, we obtain the claim.
Finally, we no longer assume that $\mu$ is of finite measure, and also do not require $f, g$ to be bounded. If either $\int_X f\, d\mu$ or $\int_X g\, d\mu$ is infinite, then by monotonicity, $\int_X f + g\, d\mu$ is infinite as well, and the claim follows; so we may assume that $\int_X f\, d\mu$ and $\int_X g\, d\mu$ are both finite. By Markov's inequality (Exercise 1.4.36(vi)), we conclude that for each natural number $n$, the set $E_n := \{x \in X : f(x) > \frac{1}{n}\} \cup \{x \in X : g(x) > \frac{1}{n}\}$ has finite measure. These sets are increasing in $n$, and $f, g, f + g$ are supported on $\bigcup_{n=1}^\infty E_n$, and so by horizontal truncation
$$\int_X (f + g)\, d\mu = \lim_{n \to \infty} \int_X (f + g) 1_{E_n}\, d\mu.$$
From the previous case, we have
$$\int_X (f + g) 1_{E_n}\, d\mu \le \int_X f 1_{E_n}\, d\mu + \int_X g 1_{E_n}\, d\mu.$$
Letting $n \to \infty$ and using horizontal truncation we obtain the claim.
Exercise 1.4.37 (Linearity in $\mu$). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f : X \to [0, +\infty]$ be measurable.
(i) Show that $\int_X f\, d(c\mu) = c \int_X f\, d\mu$ for every $c \in [0, +\infty]$.
(ii) If $\mu_1, \mu_2, \ldots$ are a sequence of measures on $\mathcal{B}$, show that
$$\int_X f\, d\sum_{n=1}^\infty \mu_n = \sum_{n=1}^\infty \int_X f\, d\mu_n.$$
Exercise 1.4.38 (Change of variables formula). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $\phi : X \to Y$ be a measurable morphism (as defined in Remark 1.4.33) from $(X, \mathcal{B})$ to another measurable space $(Y, \mathcal{C})$. Define the pushforward $\phi_* \mu : \mathcal{C} \to [0, +\infty]$ of $\mu$ by $\phi$ by the formula $\phi_* \mu(E) := \mu(\phi^{-1}(E))$.
(i) Show that $\phi_* \mu$ is a measure on $\mathcal{C}$, so that $(Y, \mathcal{C}, \phi_* \mu)$ is a measure space.
(ii) If $f : Y \to [0, +\infty]$ is measurable, show that $\int_Y f\, d\phi_* \mu = \int_X (f \circ \phi)\, d\mu$.
(Hint: the quickest proof here is via the monotone convergence theorem (Theorem 1.4.44) below, but it is also possible to prove the exercise without this theorem.)
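On a finite set with the discrete σ-algebra, the pushforward simply adds up the masses of all points with the same image, and the change of variables formula becomes a regrouping of a finite sum. The following sketch checks this in one hypothetical example; `pushforward`, `integral`, and the particular choices of `mu`, `phi` and `f` are illustrative names, not notation from the text.

```python
from collections import defaultdict

def pushforward(weights, phi):
    """Point masses of phi_* mu, where mu gives {x} mass weights[x]."""
    out = defaultdict(int)
    for x, w in weights.items():
        out[phi(x)] += w  # mass of x is transported to phi(x)
    return dict(out)

def integral(f, weights):
    """Integral of an unsigned function against a finite weighted point measure."""
    return sum(f(x) * w for x, w in weights.items())

# Hypothetical data: X = {0, 1, 2, 3}, Y = {0, 1}.
mu = {0: 1, 1: 2, 2: 3, 3: 4}
phi = lambda x: x % 2
f = lambda y: 10 * y + 1

lhs = integral(f, pushforward(mu, phi))   # integral of f over Y against phi_* mu
rhs = integral(lambda x: f(phi(x)), mu)   # integral of f o phi over X against mu
print(lhs, rhs)
```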
Exercise 1.4.39. Let $T : \mathbf{R}^d \to \mathbf{R}^d$ be an invertible linear transformation, and let $m$ be Lebesgue measure on $\mathbf{R}^d$. Show that $T_* m = \frac{1}{|\det T|} m$, where the pushforward $T_* m$ of $m$ was defined in Exercise 1.4.38.
Exercise 1.4.40 (Sums as integrals). Let $X$ be an arbitrary set (with the discrete σ-algebra), let $\#$ be counting measure (see Exercise 1.4.26), and let $f : X \to [0, +\infty]$ be an arbitrary unsigned function. Show that $f$ is measurable with
$$\int_X f\, d\# = \sum_{x \in X} f(x).$$
Once one has the unsigned integral, one can define the absolutely convergent integral exactly as in the Lebesgue case:
Definition 1.4.39 (Absolutely convergent integral). Let $(X, \mathcal{B}, \mu)$ be a measure space. A measurable function $f : X \to \mathbf{C}$ is said to be absolutely integrable if the unsigned integral
$$\|f\|_{L^1(X, \mathcal{B}, \mu)} := \int_X |f|\, d\mu$$
is finite, and we use $L^1(X, \mathcal{B}, \mu)$, $L^1(X)$, or $L^1(\mu)$ to denote the space of absolutely integrable functions. If $f$ is real-valued and absolutely integrable, we define the integral $\int_X f\, d\mu$ by the formula
$$\int_X f\, d\mu := \int_X f_+\, d\mu - \int_X f_-\, d\mu$$
where $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ are the magnitudes of the positive and negative components of $f$. If $f$ is complex-valued and absolutely integrable, we define the integral $\int_X f\, d\mu$ by the formula
$$\int_X f\, d\mu := \int_X \operatorname{Re} f\, d\mu + i \int_X \operatorname{Im} f\, d\mu$$
where the two integrals on the right are interpreted as real-valued integrals. It is easy to see that the unsigned, real-valued, and complex-valued integrals defined in this manner are compatible on their common domains of definition.
Clearly, this definition generalises Definition 1.3.17.
We record some of the key facts about the absolutely convergent integral:
Exercise 1.4.41. Let $(X, \mathcal{B}, \mu)$ be a measure space.
(i) Show that $L^1(X, \mathcal{B}, \mu)$ is a complex vector space.
(ii) Show that the integration map $f \mapsto \int_X f\, d\mu$ is a complex-linear map from $L^1(X, \mathcal{B}, \mu)$ to $\mathbf{C}$.
(iii) Establish the triangle inequality $\|f + g\|_{L^1(\mu)} \le \|f\|_{L^1(\mu)} + \|g\|_{L^1(\mu)}$ and the homogeneity property $\|cf\|_{L^1(\mu)} = |c| \|f\|_{L^1(\mu)}$ for all $f, g \in L^1(X, \mathcal{B}, \mu)$ and $c \in \mathbf{C}$.
(iv) Show that if $f, g \in L^1(X, \mathcal{B}, \mu)$ are such that $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\int_X f\, d\mu = \int_X g\, d\mu$.
(v) If $f \in L^1(X, \mathcal{B}, \mu)$, and $(X, \mathcal{B}', \mu')$ is a refinement of $(X, \mathcal{B}, \mu)$, then $f \in L^1(X, \mathcal{B}', \mu')$, and $\int_X f\, d\mu' = \int_X f\, d\mu$. (Hint: it is easy to get one inequality. To get the other inequality, first work in the case when $f$ is both bounded and has finite measure support (i.e. is both vertically and horizontally truncated).)
(vi) Show that if $f \in L^1(X, \mathcal{B}, \mu)$, then $\|f\|_{L^1(\mu)} = 0$ if and only if $f$ is zero $\mu$-almost everywhere.
(vii) If $Y \subset X$ is $\mathcal{B}$-measurable and $f \in L^1(X, \mathcal{B}, \mu)$, then $f|_Y \in L^1(Y, \mathcal{B}|_Y, \mu|_Y)$ and $\int_Y f|_Y\, d\mu|_Y = \int_X f 1_Y\, d\mu$. As before, by abuse of notation we write $\int_Y f\, d\mu$ for $\int_Y f|_Y\, d\mu|_Y$.
1.4.5. The convergence theorems. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0, +\infty]$ be a sequence of measurable functions. Suppose that as $n \to \infty$, $f_n(x)$ converges pointwise either everywhere, or $\mu$-almost everywhere, to a measurable limit $f$. A basic question in the subject is to determine the conditions under which such pointwise convergence would imply convergence of the integral:
$$\int_X f_n\, d\mu \stackrel{?}{\to} \int_X f\, d\mu.$$
To put it another way: when can we ensure that one can interchange integrals and limits,
$$\lim_{n \to \infty} \int_X f_n\, d\mu \stackrel{?}{=} \int_X \lim_{n \to \infty} f_n\, d\mu?$$
There are certainly some cases in which one can safely do this:
Exercise 1.4.42 (Uniform convergence on a finite measure space). Suppose that $(X, \mathcal{B}, \mu)$ is a finite measure space (so $\mu(X) < \infty$), and $f_n : X \to [0, +\infty]$ (resp. $f_n : X \to \mathbf{C}$) are a sequence of unsigned measurable functions (resp. absolutely integrable functions) that converge uniformly to a limit $f$. Show that $\int_X f_n\, d\mu$ converges to $\int_X f\, d\mu$.
However, there are also cases in which one cannot interchange limits and integrals, even when the $f_n$ are unsigned. We give the three classic examples, all of "moving bump" type, though the way in which the bump moves varies from example to example:
Example 1.4.40 (Escape to horizontal infinity). Let $X$ be the real line with Lebesgue measure, and let $f_n := 1_{[n, n+1]}$. Then $f_n$ converges pointwise to $f := 0$, but $\int_{\mathbf{R}} f_n(x)\, dx = 1$ does not converge to $\int_{\mathbf{R}} f(x)\, dx = 0$. Somehow, all the mass in the $f_n$ has escaped by moving off to infinity in a horizontal direction, leaving none behind for the pointwise limit $f$.
Example 1.4.41 (Escape to width infinity). Let $X$ be the real line with Lebesgue measure, and let $f_n := \frac{1}{n} 1_{[0, n]}$. Then $f_n$ now converges uniformly to $f := 0$, but $\int_{\mathbf{R}} f_n(x)\, dx = 1$ still does not converge to $\int_{\mathbf{R}} f(x)\, dx = 0$. Exercise 1.4.42 would prevent this from happening if all the $f_n$ were supported in a single set of finite measure, but the increasingly wide nature of the support of the $f_n$ prevents this from happening.
Example 1.4.42 (Escape to vertical infinity). Let $X$ be the unit interval $[0, 1]$ with Lebesgue measure (restricted from $\mathbf{R}$), and let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Now, we have finite measure, and $f_n$ converges pointwise to $f := 0$, but no uniform convergence. And again, $\int_{[0,1]} f_n(x)\, dx = 1$ is not converging to $\int_{[0,1]} f(x)\, dx = 0$. This time, the mass has escaped vertically, through the increasingly large values of $f_n$.
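The three moving bumps are all of the form $h 1_{[a, b]}$, so their integrals and pointwise values can be tabulated exactly. The sketch below does this with exact rational arithmetic; representing a bump by the triple `(height, a, b)` is a hypothetical encoding chosen for brevity, and the bumps are treated as functions on the whole real line (so for the vertical example we start at $n = 2$, where the support already lies inside $[0, 1]$).

```python
from fractions import Fraction

def integral(bump):
    """Integral of height * 1_[a,b] with respect to Lebesgue measure."""
    height, a, b = bump
    return height * (b - a)

def value(bump, x):
    """Pointwise value of height * 1_[a,b] at x."""
    height, a, b = bump
    return height if a <= x <= b else 0

def horizontal(n):  # 1_[n, n+1]
    return (Fraction(1), Fraction(n), Fraction(n + 1))

def width(n):       # (1/n) 1_[0, n]
    return (Fraction(1, n), Fraction(0), Fraction(n))

def vertical(n):    # n 1_[1/n, 2/n]
    return (Fraction(n), Fraction(1, n), Fraction(2, n))

x = Fraction(1, 2)  # a fixed sample point
for name, family in (("horizontal", horizontal), ("width", width), ("vertical", vertical)):
    for n in (2, 10, 1000):
        print(name, n, integral(family(n)), value(family(n), x))
```

In each family the integral stays equal to 1 for every $n$ shown, while the value at the fixed point $x$ tends to zero.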
Remark 1.4.43. From the perspective of time-frequency analysis (or perhaps more accurately, space-frequency analysis), these three escapes are analogous (though not quite identical) to escape to spatial infinity, escape to zero frequency, and escape to infinite frequency respectively, thus describing the three different ways in which phase space fails to be compact (if one excises the zero frequency as being singular).
However, once one shuts down these avenues of escape to infinity, it turns out that one can recover convergence of the integral. There are two major ways to accomplish this. One is to enforce monotonicity, which prevents each $f_n$ from abandoning the location where the mass of the preceding $f_1, \ldots, f_{n-1}$ was concentrated and which thus shuts down the above three escape scenarios. More precisely, we have the monotone convergence theorem:
Theorem 1.4.44 (Monotone convergence theorem). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $0 \le f_1 \le f_2 \le \ldots$ be a monotone non-decreasing sequence of unsigned measurable functions on $X$. Then we have
$$\lim_{n \to \infty} \int_X f_n\, d\mu = \int_X \lim_{n \to \infty} f_n\, d\mu.$$
Note that in the special case when each $f_n$ is an indicator function $f_n = 1_{E_n}$, this theorem collapses to the upwards monotone convergence property (Exercise 1.4.23(ii)). Conversely, the upwards monotone convergence property will play a key role in the proof of this theorem.
Proof. Write $f := \lim_{n \to \infty} f_n = \sup_n f_n$, then $f : X \to [0, +\infty]$ is measurable. Since the $f_n$ are non-decreasing to $f$, we see from monotonicity that $\int_X f_n\, d\mu$ are non-decreasing and bounded above by $\int_X f\, d\mu$, which gives the bound
$$\lim_{n \to \infty} \int_X f_n\, d\mu \le \int_X f\, d\mu.$$
It remains to establish the reverse inequality
$$\int_X f\, d\mu \le \lim_{n \to \infty} \int_X f_n\, d\mu.$$
By definition, it suffices to show that
$$\int_X g\, d\mu \le \lim_{n \to \infty} \int_X f_n\, d\mu$$
whenever $g$ is a simple function that is bounded pointwise by $f$. By vertical truncation we may assume without loss of generality that $g$ also is finite everywhere, then we can write
$$g = \sum_{i=1}^k c_i 1_{A_i}$$
for some $0 \le c_i < \infty$ and some disjoint $\mathcal{B}$-measurable sets $A_1, \ldots, A_k$, thus
$$\int_X g\, d\mu = \sum_{i=1}^k c_i \mu(A_i).$$
Let $0 < \varepsilon < 1$ be arbitrary. Then we have
$$f(x) = \sup_n f_n(x) > (1 - \varepsilon) c_i$$
for all $x \in A_i$. Thus, if we define the sets
$$A_{i,n} := \{x \in A_i : f_n(x) > (1 - \varepsilon) c_i\}$$
then the $A_{i,n}$ increase to $A_i$ and are measurable. By upwards monotonicity of measure, we conclude that
$$\lim_{n \to \infty} \mu(A_{i,n}) = \mu(A_i).$$
On the other hand, observe the pointwise bound
$$f_n \ge \sum_{i=1}^k (1 - \varepsilon) c_i 1_{A_{i,n}}$$
for any $n$; integrating this, we obtain
$$\int_X f_n\, d\mu \ge (1 - \varepsilon) \sum_{i=1}^k c_i \mu(A_{i,n}).$$
Taking limits as $n \to \infty$, we obtain
$$\lim_{n \to \infty} \int_X f_n\, d\mu \ge (1 - \varepsilon) \sum_{i=1}^k c_i \mu(A_i);$$
sending $\varepsilon \to 0$ we then obtain the claim.
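As a concrete illustration of the theorem (an example of our own choosing, not taken from the text), let $X = (0, 1]$ with Lebesgue measure, let $f(x) := x^{-1/2}$, and consider the vertical truncations $f_n := \min(f, n)$, which are non-decreasing in $n$ and converge pointwise to $f$. Since $x^{-1/2} \ge n$ exactly when $x \le 1/n^2$, one can compute each integral by hand:

$$\int_{(0,1]} f_n(x)\, dx = n \cdot \frac{1}{n^2} + \int_{1/n^2}^1 x^{-1/2}\, dx = \frac{1}{n} + \Big( 2 - \frac{2}{n} \Big) = 2 - \frac{1}{n}.$$

These values increase to $2$ as $n \to \infty$, and the monotone convergence theorem then identifies this limit with $\int_{(0,1]} x^{-1/2}\, dx$, even though $f$ itself is unbounded.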
Remark 1.4.45. It is easy to see that the result still holds if the monotonicity $f_n \le f_{n+1}$ only holds almost everywhere rather than everywhere.
This has a number of important corollaries. Firstly, we can generalise (part of) Tonelli's theorem for exchanging sums (see Theorem 0.0.2):
Corollary 1.4.46 (Tonelli's theorem for sums and integrals). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0, +\infty]$ be a sequence of unsigned measurable functions. Then one has
$$\int_X \sum_{n=1}^\infty f_n\, d\mu = \sum_{n=1}^\infty \int_X f_n\, d\mu.$$
Proof. Apply the monotone convergence theorem (Theorem 1.4.44) to the partial sums $F_N := \sum_{n=1}^N f_n$.
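Spelled out, the argument is the following chain of equalities, in which the second step is the monotone convergence theorem (the partial sums $F_N$ are non-decreasing in $N$ because the $f_n$ are unsigned) and the third step is finite additivity of the unsigned integral (Theorem 1.4.38), applied $N - 1$ times:

$$\int_X \sum_{n=1}^\infty f_n\, d\mu = \int_X \lim_{N \to \infty} F_N\, d\mu = \lim_{N \to \infty} \int_X F_N\, d\mu = \lim_{N \to \infty} \sum_{n=1}^N \int_X f_n\, d\mu = \sum_{n=1}^\infty \int_X f_n\, d\mu.$$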
Exercise 1.4.43. Give an example to show that this corollary can fail if the $f_n$ are assumed to be absolutely integrable rather than unsigned measurable, even if the sum $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for each $x$. (Hint: think about the three escapes to infinity.)
Exercise 1.4.44 (Borel-Cantelli lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $E_1, E_2, E_3, \ldots$ be a sequence of $\mathcal{B}$-measurable sets such that $\sum_{n=1}^\infty \mu(E_n) < \infty$. Show that almost every $x \in X$ is contained in at most finitely many of the $E_n$ (i.e. $\{n \in \mathbf{N} : x \in E_n\}$ is finite for almost every $x \in X$). (Hint: Apply Tonelli's theorem to the indicator functions $1_{E_n}$.)
Exercise 1.4.45.
(i) Give an alternate proof of the Borel-Cantelli lemma (Exercise 1.4.44) that does not go through any of the convergence theorems, but instead exploits the more basic properties of measure from Exercise 1.4.23.
(ii) Give a counterexample that shows that the Borel-Cantelli lemma can fail if the condition $\sum_{n=1}^\infty \mu(E_n) < \infty$ is relaxed to $\lim_{n \to \infty} \mu(E_n) = 0$.
Secondly, when one does not have monotonicity, one can at least obtain an important inequality, known as Fatou's lemma:
Corollary 1.4.47 (Fatou's lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0, +\infty]$ be a sequence of unsigned measurable functions. Then
$$\int_X \liminf_{n \to \infty} f_n\, d\mu \le \liminf_{n \to \infty} \int_X f_n\, d\mu.$$
Proof. Write $F_N := \inf_{n \ge N} f_n$ for each $N$. Then the $F_N$ are measurable and non-decreasing, and hence by the monotone convergence theorem (Theorem 1.4.44)
$$\int_X \sup_{N > 0} F_N\, d\mu = \sup_{N > 0} \int_X F_N\, d\mu.$$
By definition of lim inf, we have $\sup_{N > 0} F_N = \liminf_{n \to \infty} f_n$. By monotonicity, we have $\int_X F_N\, d\mu \le \int_X f_n\, d\mu$ for all $n \ge N$, and thus
$$\int_X F_N\, d\mu \le \inf_{n \ge N} \int_X f_n\, d\mu.$$
Hence we have
$$\int_X \liminf_{n \to \infty} f_n\, d\mu \le \sup_{N > 0} \inf_{n \ge N} \int_X f_n\, d\mu.$$
The claim then follows by another appeal to the definition of the lim inferior.
Remark 1.4.48. Informally, Fatou's lemma tells us that when taking the pointwise limit of unsigned functions $f_n$, the mass $\int_X f_n\, d\mu$ can be destroyed in the limit (as was the case in the three key moving bump examples), but it cannot be created in the limit. Of course the unsigned hypothesis is necessary here (consider for instance multiplying any of the moving bump examples by $-1$). While this lemma was stated only for pointwise limits, the same general principle (that mass can be destroyed, but not created, by the process of taking limits) tends to hold for other "weak" notions of convergence. See §1.9 of An epsilon of room, Vol. I for some examples of this.
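To see the loss of mass quantitatively, one can evaluate both sides of Fatou's lemma for the horizontally escaping bump $f_n := 1_{[n, n+1]}$ on the real line with Lebesgue measure (Example 1.4.40). Each fixed $x$ lies in $[n, n+1]$ for at most two values of $n$, so $\liminf_{n \to \infty} f_n(x) = 0$ for every $x$, and therefore

$$\int_{\mathbf{R}} \liminf_{n \to \infty} f_n(x)\, dx = 0 < 1 = \liminf_{n \to \infty} \int_{\mathbf{R}} f_n(x)\, dx.$$

The inequality in Fatou's lemma can thus be strict, with the gap equal to the unit of mass that escaped. Replacing $f_n$ by $-f_n$ reverses the inequality, which is one way to see that the unsigned hypothesis cannot be dropped.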
Finally, we give the other major way to shut down loss of mass via escape to infinity, which is to dominate all of the functions involved by an absolutely convergent one. This result is known as the dominated convergence theorem:
Theorem 1.4.49 (Dominated convergence theorem). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise $\mu$-almost everywhere to a measurable limit $f : X \to \mathbf{C}$. Suppose that there is an unsigned absolutely integrable function $G : X \to [0, +\infty]$ such that $|f_n|$ are pointwise $\mu$-almost everywhere bounded by $G$ for each $n$. Then we have
$$\lim_{n \to \infty} \int_X f_n\, d\mu = \int_X f\, d\mu.$$
From the moving bump examples we see that this statement fails if there is no absolutely integrable dominating function $G$. The reader is encouraged to see why, in each of the moving bump examples, no such dominating function exists, without appealing to the above theorem. Note also that when each of the $f_n$ is an indicator function $f_n = 1_{E_n}$, the dominated convergence theorem collapses to Exercise 1.4.24.
Proof. By modifying $f_n, f$ on a null set, we may assume without loss of generality that the $f_n$ converge to $f$ pointwise everywhere rather than $\mu$-almost everywhere, and similarly we can assume that $|f_n|$ are bounded by $G$ pointwise everywhere rather than $\mu$-almost everywhere.
By taking real and imaginary parts we may assume without loss of generality that $f_n, f$ are real, thus $-G \le f_n \le G$ pointwise. Of course, this implies that $-G \le f \le G$ pointwise also.
If we apply Fatou's lemma (Corollary 1.4.47) to the unsigned functions $f_n + G$, we see that
$$\int_X f + G\, d\mu \le \liminf_{n \to \infty} \int_X f_n + G\, d\mu,$$
which on subtracting the finite quantity $\int_X G\, d\mu$ gives
$$\int_X f\, d\mu \le \liminf_{n \to \infty} \int_X f_n\, d\mu.$$
Similarly, if we apply that lemma to the unsigned functions $G - f_n$, we obtain
$$\int_X G - f\, d\mu \le \liminf_{n \to \infty} \int_X G - f_n\, d\mu;$$
negating this inequality and then cancelling $\int_X G\, d\mu$ again we conclude that
$$\limsup_{n \to \infty} \int_X f_n\, d\mu \le \int_X f\, d\mu.$$
The claim then follows by combining these inequalities.
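For a simple instance of the theorem in action (again an example of our own choosing), take $X = [0, 1]$ with Lebesgue measure and $f_n(x) := x^n$. These functions are dominated by the constant function $G := 1$, which is absolutely integrable since $\mu(X) = 1$, and they converge pointwise to $1_{\{1\}}$, which vanishes almost everywhere. The theorem therefore predicts that the integrals tend to zero, in agreement with the direct computation

$$\int_{[0,1]} x^n\, dx = \frac{1}{n+1} \to 0 = \int_{[0,1]} 1_{\{1\}}(x)\, dx.$$

Note that the convergence here is not uniform, so Exercise 1.4.42 would not suffice on its own.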
Remark 1.4.50. We deduced the dominated convergence theorem from Fatou's lemma, and Fatou's lemma from the monotone convergence theorem. However, one can obtain these theorems in a different order, depending on one's taste, as they are so closely related. For instance, in [StSk2005], the logic is somewhat different; one first obtains the slightly simpler bounded convergence theorem, which is the dominated convergence theorem under the assumption that the functions are uniformly bounded and all supported on a single set of finite measure, and then uses that to deduce Fatou's lemma, which in turn is used to deduce the monotone convergence theorem; and then the horizontal and vertical truncation properties are used to extend the bounded convergence theorem to the dominated convergence theorem. It is instructive to view a couple of different derivations of these key theorems to get more of an intuitive understanding as to how they work.
Exercise 1.4.46. Under the hypotheses of the dominated convergence theorem (Theorem 1.4.49), establish also that $\|f_n - f\|_{L^1} \to 0$ as $n \to \infty$.
Exercise 1.4.47 (Almost dominated convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise $\mu$-almost everywhere to a measurable limit $f : X \to \mathbf{C}$. Suppose that there are unsigned absolutely integrable functions $G, g_1, g_2, \ldots : X \to [0, +\infty]$ such that the $|f_n|$ are pointwise $\mu$-almost everywhere bounded by $G + g_n$, and that $\int_X g_n\, d\mu \to 0$ as $n \to \infty$. Show that
$$\lim_{n \to \infty} \int_X f_n\, d\mu = \int_X f\, d\mu.$$
Exercise 1.4.48 (Defect version of Fatou's lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0, +\infty]$ be a sequence of unsigned absolutely integrable functions that converges pointwise to an absolutely integrable limit $f$. Show that
$$\int_X f_n\, d\mu - \int_X f\, d\mu - \|f - f_n\|_{L^1(\mu)} \to 0$$
as $n \to \infty$. (Hint: Apply the dominated convergence theorem (Theorem 1.4.49) to $\min(f_n, f)$.) Informally, this tells us that the gap between the left and right hand sides of Fatou's lemma can be measured by the quantity $\|f - f_n\|_{L^1(\mu)}$.
Exercise 1.4.49. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $g : X \to [0, +\infty]$ be measurable. Show that the function $\mu_g : \mathcal{B} \to [0, +\infty]$ defined by the formula
$$\mu_g(E) := \int_X 1_E g\, d\mu = \int_E g\, d\mu$$
is a measure. (Such measures are studied in greater detail in §1.2 of An epsilon of room, Vol. I.)
The monotone convergence theorem is, in some sense, a defining property of the unsigned integral, as the following exercise illustrates.
Exercise 1.4.50 (Characterisation of the unsigned integral). Let $(X, \mathcal{B})$ be a measurable space. Let $I : f \mapsto I(f)$ be a map from the space $\mathcal{U}(X, \mathcal{B})$ of unsigned measurable functions $f : X \to [0, +\infty]$ to $[0, +\infty]$ that obeys the following axioms:
(i) (Homogeneity) For every $f \in \mathcal{U}(X, \mathcal{B})$ and $c \in [0, +\infty]$, one has $I(cf) = cI(f)$.
(ii) (Finite additivity) For every $f, g \in \mathcal{U}(X, \mathcal{B})$, one has $I(f + g) = I(f) + I(g)$.
(iii) (Monotone convergence) If $0 \le f_1 \le f_2 \le \ldots$ are a non-decreasing sequence of unsigned measurable functions, then $I(\lim_{n \to \infty} f_n) = \lim_{n \to \infty} I(f_n)$.
Then there exists a unique measure $\mu$ on $(X, \mathcal{B})$ such that $I(f) = \int_X f\, d\mu$ for all $f \in \mathcal{U}(X, \mathcal{B})$. Furthermore, $\mu$ is given by the formula $\mu(E) := I(1_E)$ for all $\mathcal{B}$-measurable sets $E$.
Exercise 1.4.51. Let $(X, \mathcal{B}, \mu)$ be a finite measure space (i.e. $\mu(X) < \infty$), and let $f : X \to \mathbf{R}$ be a bounded function. Suppose that $\mu$ is complete (see Definition 1.4.31). Suppose that the upper integral
$$\overline{\int_X} f\, d\mu := \inf_{g \ge f;\ g \text{ simple}} \int_X g\, d\mu$$
and lower integral
$$\underline{\int_X} f\, d\mu := \sup_{h \le f;\ h \text{ simple}} \int_X h\, d\mu$$
agree. Show that $f$ is measurable. (This is a converse to Exercise 1.3.11.)
We will continue to see the monotone convergence theorem, Fatou's lemma, and the dominated convergence theorem make an appearance throughout the rest of this text (and in An epsilon of room, Vol. I).
1.5. Modes of convergence
If one has a sequence $x_1, x_2, x_3, \ldots \in \mathbf{R}$ of real numbers $x_n$, it is unambiguous what it means for that sequence to converge to a limit $x \in \mathbf{R}$: it means that for every $\varepsilon > 0$, there exists an $N$ such that $|x_n - x| \le \varepsilon$ for all $n > N$. Similarly for a sequence $z_1, z_2, z_3, \ldots \in \mathbf{C}$ of complex numbers $z_n$ converging to a limit $z \in \mathbf{C}$.
More generally, if one has a sequence $v_1, v_2, v_3, \ldots$ of $d$-dimensional vectors $v_n$ in a real vector space $\mathbf{R}^d$ or complex vector space $\mathbf{C}^d$, it is also unambiguous what it means for that sequence to converge to a limit $v \in \mathbf{R}^d$ or $v \in \mathbf{C}^d$; it means that for every $\varepsilon > 0$, there exists an $N$ such that $\|v_n - v\| \le \varepsilon$ for all $n \ge N$. Here, the norm $\|v\|$ of a vector $v = (v^{(1)}, \ldots, v^{(d)})$ can be chosen to be the Euclidean norm $\|v\|_2 := (\sum_{j=1}^d (v^{(j)})^2)^{1/2}$, the supremum norm $\|v\|_\infty := \sup_{1 \le j \le d} |v^{(j)}|$, or any other number of norms, but for the purposes of convergence, these norms are all equivalent; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space $\mathbf{R}^d$ or $\mathbf{C}^d$.
If however one has a sequence $f_1, f_2, f_3, \ldots$ of functions $f_n : X \to \mathbf{R}$ or $f_n : X \to \mathbf{C}$ on a common domain $X$, and a putative limit $f : X \to \mathbf{R}$ or $f : X \to \mathbf{C}$, there can now be many different ways in which the sequence $f_n$ may or may not converge to the limit $f$. (One could also consider convergence of functions $f_n : X_n \to \mathbf{C}$ on different domains $X_n$, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars $x_n$ or $z_n$ (which corresponds to the case when $X$ is a single point) or vectors $v_n$ (which corresponds to the case when $X$ is a finite set such as $\{1, \ldots, d\}$). Once $X$ becomes infinite, the functions $f_n$ acquire an infinite number of degrees of freedom, and this allows them to approach $f$ in any number of inequivalent ways.
What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:
(i) We say that $f_n$ converges to $f$ pointwise if, for every $x \in X$, $f_n(x)$ converges to $f(x)$. In other words, for every $\varepsilon > 0$ and $x \in X$, there exists $N$ (that depends on both $\varepsilon$ and $x$) such that $|f_n(x) - f(x)| \le \varepsilon$ whenever $n \ge N$.
(ii) We say that $f_n$ converges to $f$ uniformly if, for every $\varepsilon > 0$, there exists $N$ such that for every $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for every $x \in X$. The difference between uniform convergence and pointwise convergence is that with the former, the time $N$ at which $f_n(x)$ must be permanently $\varepsilon$-close to $f(x)$ is not permitted to depend on $x$, but must instead be chosen uniformly in $x$.
Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions $f_n : \mathbf{R} \to \mathbf{R}$ defined by $f_n(x) := x/n$ converge pointwise to the zero function $f(x) := 0$, but not uniformly.
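The dependence of $N$ on $x$ in this example can be made explicit. For fixed $x \in \mathbf{R}$ and $\varepsilon > 0$ one has

$$|f_n(x) - 0| = \frac{|x|}{n} \le \varepsilon \quad \text{whenever} \quad n \ge N(x, \varepsilon) := \frac{|x|}{\varepsilon},$$

which gives pointwise convergence, but the threshold $N(x, \varepsilon)$ is unbounded in $x$. Correspondingly, for each fixed $n$,

$$\sup_{x \in \mathbf{R}} |f_n(x) - 0| = \sup_{x \in \mathbf{R}} \frac{|x|}{n} = +\infty,$$

so no single $N$ can work for all $x$ at once, and the convergence is not uniform.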
However, pointwise and uniform convergence are only two of dozens of modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see §1.9 of An epsilon of room, Vol. I). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain $X$ is equipped with the structure of a measure space $(X, \mathcal{B}, \mu)$, and the functions $f_n$ (and their limit $f$) are measurable with respect to this space. In this context, we have some additional modes of convergence:
(i) We say that $f_n$ converges to $f$ pointwise almost everywhere if, for $\mu$-almost every $x \in X$, $f_n(x)$ converges to $f(x)$.
(ii) We say that $f_n$ converges to $f$ uniformly almost everywhere, essentially uniformly, or in $L^\infty$ norm if, for every $\varepsilon > 0$, there exists $N$ such that for every $n \ge N$, $|f_n(x) - f(x)| \le \varepsilon$ for $\mu$-almost every $x \in X$.
(iii) We say that $f_n$ converges to $f$ almost uniformly if, for every $\varepsilon > 0$, there exists an exceptional set $E \in \mathcal{B}$ of measure $\mu(E) \le \varepsilon$ such that $f_n$ converges uniformly to $f$ on the complement of $E$.
(iv) We say that $f_n$ converges to $f$ in $L^1$ norm if the quantity $\|f_n - f\|_{L^1(\mu)} = \int_X |f_n(x) - f(x)|\, d\mu$ converges to $0$ as $n \to \infty$.
(v) We say that $f_n$ converges to $f$ in measure if, for every $\varepsilon > 0$, the measures $\mu(\{x \in X : |f_n(x) - f(x)| \ge \varepsilon\})$ converge to zero as $n \to \infty$.
Observe that each of these five modes of convergence is unaffected if one modifies $f_n$ or $f$ on a set of measure zero. In contrast, the pointwise and uniform modes of convergence can be affected if one modifies $f_n$ or $f$ even on a single point. The $L^1$ and $L^\infty$ modes of convergence are special cases of the $L^p$ mode of convergence, which is discussed in §1.3 of An epsilon of room, Vol. I.
Remark 1.5.1. In the context of probability theory (see Section 2.3), in which $f_n$ and $f$ are interpreted as random variables, convergence in $L^1$ norm is often referred to as convergence in mean, pointwise convergence almost everywhere is often referred to as almost sure convergence, and convergence in measure is often referred to as convergence in probability.
Exercise 1.5.1 (Linearity of convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, let $f_n, g_n : X \to \mathbf{C}$ be sequences of measurable functions, and let $f, g : X \to \mathbf{C}$ be measurable functions.
(i) Show that $f_n$ converges to $f$ along one of the above seven modes of convergence if and only if $|f_n - f|$ converges to $0$ along the same mode.
(ii) If $f_n$ converges to $f$ along one of the above seven modes of convergence, and $g_n$ converges to $g$ along the same mode, show that $f_n + g_n$ converges to $f + g$ along the same mode, and that $cf_n$ converges to $cf$ along the same mode for any $c \in \mathbf{C}$.
(iii) (Squeeze test) If $f_n$ converges to $0$ along one of the above seven modes, and $|g_n| \le f_n$ pointwise for each $n$, show that $g_n$ converges to $0$ along the same mode.
We have some easy implications between modes:
Exercise 1.5.2 (Easy implications). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions.
(i) If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ pointwise.
(ii) If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ in $L^\infty$ norm. Conversely, if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ uniformly outside of a null set (i.e. there exists a null set $E$ such that the restriction $f_n|_{X \setminus E}$ of $f_n$ to the complement of $E$ converges uniformly to the restriction $f|_{X \setminus E}$ of $f$).
(iii) If $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ almost uniformly.
(iv) If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ pointwise almost everywhere.
(v) If $f_n$ converges to $f$ pointwise, then $f_n$ converges to $f$ pointwise almost everywhere.
(vi) If $f_n$ converges to $f$ in $L^1$ norm, then $f_n$ converges to $f$ in measure.
(vii) If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ in measure.
The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.
We give four key examples that distinguish between these modes, in the case when $X$ is the real line $\mathbf{R}$ with Lebesgue measure. The first three of these examples were already introduced in Section 1.4, but the fourth is new, and also important.
Example 1.5.2 (Escape to horizontal infinity). Let $f_n := 1_{[n, n+1]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in $L^\infty$ norm, almost uniformly, in $L^1$ norm, or in measure.
Example 1.5.3 (Escape to width infinity). Let $f_n := \frac{1}{n} 1_{[0, n]}$. Then $f_n$ converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in $L^\infty$ norm, almost uniformly, and in measure), but not in $L^1$ norm.
Example 1.5.4 (Escape to vertical infinity). Let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in $L^\infty$ norm, or in $L^1$ norm.
Example 1.5.5 (Typewriter sequence). Let $f_n$ be defined by the formula
$$f_n := 1_{\left[\frac{n - 2^k}{2^k}, \frac{n - 2^k + 1}{2^k}\right]}$$
whenever $k \ge 0$ and $2^k \le n < 2^{k+1}$. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval $[0, 1]$ over and over again. Then $f_n$ converges to zero in measure and in $L^1$ norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in $L^\infty$ norm, nor uniformly).
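To make the typewriter sequence concrete, the following Python sketch computes the interval supporting each $f_n$ with exact rational arithmetic. The function names are illustrative choices, not notation from the text. It records the interval lengths (which equal the $L^1$ norms) and lists the indices $n < 64$ at which the fixed point $x = 1/3$ is hit; that point is an arbitrary sample.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Return the endpoints (left, right) of the support of f_n, for n >= 1."""
    k = n.bit_length() - 1               # the unique k >= 0 with 2**k <= n < 2**(k+1)
    left = Fraction(n - 2**k, 2**k)
    right = Fraction(n - 2**k + 1, 2**k)
    return left, right

def f(n, x):
    """Value of the typewriter function f_n at the point x."""
    left, right = typewriter_interval(n)
    return 1 if left <= x <= right else 0

# The L^1 norm of f_n is the length of its interval, namely 2**-k.
widths = [typewriter_interval(n)[1] - typewriter_interval(n)[0] for n in range(1, 9)]
print(widths)

# A fixed non-dyadic point is hit once in every block 2**k <= n < 2**(k+1),
# so f_n(x) equals 1 for infinitely many n and cannot converge to 0 at x.
x = Fraction(1, 3)
hits = [n for n in range(1, 64) if f(n, x) == 1]
print(hits)
```

The widths shrink like $2^{-k}$, which is what drives convergence in $L^1$ norm and in measure, while the list of hits contains one index per dyadic block, which is what prevents pointwise convergence at $x$.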
Remark 1.5.6. The $L^\infty$ norm $\|f\|_{L^\infty(\mu)}$ of a measurable function $f : X \to \mathbf{C}$ is defined to be the infimum of all the quantities $M \in [0, +\infty]$ that are essential upper bounds for $f$ in the sense that $|f(x)| \le M$ for almost every $x$. Then $f_n$ converges to $f$ in $L^\infty$ norm if and only if $\|f_n - f\|_{L^\infty(\mu)} \to 0$ as $n \to \infty$. The $L^\infty$ and $L^1$ norms are part of the larger family of $L^p$ norms, studied in §1.3 of An epsilon of room, Vol. I.
One particular advantage of $L^1$ convergence is that, in the case when the $f_n$ are absolutely integrable, it implies convergence of the integrals,
$$\int_X f_n \, d\mu \to \int_X f \, d\mu,$$
as one sees from the triangle inequality. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.
The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three “escape to infinity” scenarios described above, as well as to the “typewriter” behaviour when a single set is “overwritten” many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis $\mu(X) < \infty$ or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.
1.5.1. Uniqueness. Throughout these notes, $(X, \mathcal{B}, \mu)$ denotes a measure space. We abbreviate “$\mu$-almost everywhere” as “almost everywhere” throughout.
Even though the modes of convergence all differ from each other, they are all compatible in the sense that they never disagree about which function $f$ a sequence of functions $f_n$ converges to, outside of a set of measure zero. More precisely:
Proposition 1.5.7. Let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions, and let $f, g : X \to \mathbf{C}$ be two additional measurable functions. Suppose that $f_n$ converges to $f$ along one of the seven modes of convergence defined above, and $f_n$ converges to $g$ along another of the seven modes of convergence (or perhaps the same mode of convergence as for $f$). Then $f$ and $g$ agree almost everywhere.
Note that the conclusion is the best one can hope for in the case of the last five modes of convergence, since as remarked earlier, these modes of convergence are unaffected if one modifies $f$ or $g$ on a set of measure zero.
Proof. In view of Exercise 1.5.2, we may assume that $f_n$ converges to $f$ either pointwise almost everywhere, or in measure, and similarly that $f_n$ converges to $g$ either pointwise almost everywhere, or in measure.
Suppose first that $f_n$ converges to both $f$ and $g$ pointwise almost everywhere. Then by Exercise 1.5.1, $0$ converges to $f - g$ pointwise almost everywhere, which clearly implies that $f - g$ is zero almost everywhere, and the claim follows. A similar argument applies if $f_n$ converges to both $f$ and $g$ in measure.
By symmetry, the only remaining case that needs to be considered is when $f_n$ converges to $f$ pointwise almost everywhere, and $f_n$ converges to $g$ in measure. We need to show that $f = g$ almost everywhere. It suffices to show, for every $\varepsilon > 0$, that $|f(x) - g(x)| \le \varepsilon$ for almost every $x$, as the claim then follows by setting $\varepsilon = 1/m$ for $m = 1, 2, 3, \dots$ and using the fact that the countable union of null sets is again a null set.
Fix $\varepsilon > 0$, and let $A := \{x \in X : |f(x) - g(x)| > \varepsilon\}$. This is a measurable set; our task is to show that it has measure zero. Suppose for contradiction that $\mu(A) > 0$. We consider the sets
$$A_N := \{x \in A : |f_n(x) - f(x)| \le \varepsilon/2 \text{ for all } n \ge N\}.$$
These are measurable sets that are increasing in $N$. As $f_n$ converges to $f$ almost everywhere, we see that almost every $x \in A$ belongs to at least one of the $A_N$, thus $\bigcup_{N=1}^\infty A_N$ is equal to $A$ outside of a null set. In particular,
$$\mu\Bigl(\bigcup_{N=1}^\infty A_N\Bigr) > 0.$$
Applying monotone convergence for sets, we conclude that
$$\mu(A_N) > 0$$
for some finite $N$. But by the triangle inequality, we have $|f_n(x) - g(x)| > \varepsilon/2$ for all $x \in A_N$ and all $n \ge N$. As a consequence, $f_n$ cannot converge in measure to $g$, which gives the desired contradiction.
1.5.2. The case of a step function. One way to appreciate the distinctions between the above modes of convergence is to focus on the case when $f = 0$, and when each of the $f_n$ is a step function, by which we mean a constant multiple $f_n = A_n 1_{E_n}$ of the indicator function of a measurable set $E_n$. For simplicity we will assume that the $A_n > 0$ are positive reals, and that the $E_n$ have a positive measure $\mu(E_n) > 0$. We also assume the $A_n$ exhibit one of two modes of behaviour: either the $A_n$ converge to zero, or else they are bounded away from zero (i.e. there exists $c > 0$ such that $A_n \ge c$ for every $n$). It is easy to see that if a sequence $A_n$ does not converge to zero, then it has a subsequence that is bounded away from zero, so it does not cause too much loss of generality to restrict to one of these two cases.
Given such a sequence $f_n = A_n 1_{E_n}$ of step functions, we now ask, for each of the seven modes of convergence, what it means for this sequence to converge to zero along that mode. It turns out that the answer to this question is controlled more or less completely by the following three quantities:
(i) The height $A_n$ of the $n^{\text{th}}$ function $f_n$;
(ii) The width $\mu(E_n)$ of the $n^{\text{th}}$ function $f_n$; and
(iii) The $N^{\text{th}}$ tail support $E_N^* := \bigcup_{n \ge N} E_n$ of the sequence $f_1, f_2, f_3, \dots$.
Indeed, we have:
Exercise 1.5.3 (Convergence for step functions). Let the notation and assumptions be as above. Establish the following claims:
(i) $f_n$ converges uniformly to zero if and only if $A_n \to 0$ as $n \to \infty$.
(ii) $f_n$ converges in $L^\infty$ norm to zero if and only if $A_n \to 0$ as $n \to \infty$.
(iii) $f_n$ converges almost uniformly to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\mu(E_N^*) \to 0$ as $N \to \infty$.
(iv) $f_n$ converges pointwise to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\bigcap_{N=1}^\infty E_N^* = \emptyset$.
(v) $f_n$ converges pointwise almost everywhere to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\bigcap_{N=1}^\infty E_N^*$ is a null set.
(vi) $f_n$ converges in measure to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\mu(E_n) \to 0$ as $n \to \infty$.
(vii) $f_n$ converges in $L^1$ norm to zero if and only if $A_n \mu(E_n) \to 0$ as $n \to \infty$.
To put it more informally: when the height goes to zero, then one has convergence to zero in all modes except possibly for $L^1$ convergence, which requires that the product of the height and the width goes to zero. If instead the height is bounded away from zero and the width is positive, then we never have uniform or $L^\infty$ convergence, but we have convergence in measure if the width goes to zero, we have almost uniform convergence if the tail support (which has larger measure than the width) has measure that goes to zero, we have pointwise almost everywhere convergence if the tail support shrinks to a null set, and pointwise convergence if the tail support shrinks to the empty set.
It is instructive to compare this exercise with Exercise 1.5.2, or with the four examples given in the introduction. In particular:
(i) In the escape to horizontal infinity scenario, the height and width do not shrink to zero, but the tail support shrinks to the empty set (while remaining of infinite measure throughout).
(ii) In the escape to width infinity scenario, the height goes to zero, but the width (and tail support) go to infinity, causing the $L^1$ norm to stay bounded away from zero.
(iii) In the escape to vertical infinity scenario, the height goes to infinity, but the width (and tail support) go to zero (or the empty set), causing the $L^1$ norm to stay bounded away from zero.
(iv) In the typewriter example, the width goes to zero, but the height and the tail support stay fixed (and thus bounded away from zero).
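The comparison above can be tabulated numerically. The sketch below models each of the four examples as a step function $A_n 1_{E_n}$ and prints the height $A_n$, the width $\mu(E_n)$, and the $L^1$ norm $A_n \mu(E_n)$ for a few values of $n$. The function names and the sampled values of $n$ are illustrative assumptions; a finite table can only suggest the limiting behaviour, it does not prove it.

```python
from fractions import Fraction

# Each example returns (height A_n, width mu(E_n)) in exact rational arithmetic.
def horizontal_escape(n):      # f_n = 1_{[n, n+1]}
    return Fraction(1), Fraction(1)

def width_escape(n):           # f_n = (1/n) 1_{[0, n]}
    return Fraction(1, n), Fraction(n)

def vertical_escape(n):        # f_n = n 1_{[1/n, 2/n]}
    return Fraction(n), Fraction(1, n)

def typewriter(n):             # f_n = indicator of a dyadic interval of length 2**-k
    k = n.bit_length() - 1     # the k with 2**k <= n < 2**(k+1)
    return Fraction(1), Fraction(1, 2**k)

def l1_norm(height, width):
    """L^1 norm of the step function: height times width."""
    return height * width

examples = {
    "horizontal": horizontal_escape,
    "width": width_escape,
    "vertical": vertical_escape,
    "typewriter": typewriter,
}

for name, step in examples.items():
    for n in (1, 10, 100, 1000):
        height, width = step(n)
        print(name, n, height, width, l1_norm(height, width))
```

In the first three rows of examples the product of height and width stays equal to $1$, matching the failure of $L^1$ convergence, while in the typewriter example the product shrinks with the width.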
Remark 1.5.8. The monotone convergence theorem (Theorem 1.4.44) can also be specialised to this case. Observe that the $f_n = A_n 1_{E_n}$ are monotone increasing if and only if $A_n \le A_{n+1}$ and $E_n \subset E_{n+1}$ for each $n$. In such cases, observe that the $f_n$ converge pointwise to $f := A 1_E$, where $A := \lim_{n \to \infty} A_n$ and $E := \bigcup_{n=1}^\infty E_n$. The monotone convergence theorem then asserts that $A_n \mu(E_n) \to A \mu(E)$ as $n \to \infty$, which is a consequence of the monotone convergence theorem $\mu(E_n) \to \mu(E)$ for sets.
1.5.3. Finite measure spaces. The situation simplifies somewhat if the space $X$ has finite measure (and in particular, in the case when $(X, \mathcal{B}, \mu)$ is a probability space, see Section 2.3). This shuts down two of the four examples (namely, escape to horizontal infinity or width infinity) and creates a few more equivalences. Indeed, from Egorov's theorem (Exercise 1.4.31), we now have
Theorem 1.5.9 (Egorov's theorem, again). Let $X$ have finite measure, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions. Then $f_n$ converges to $f$ pointwise almost everywhere if and only if $f_n$ converges to $f$ almost uniformly.
Note that when one specialises to step functions using Exercise 1.5.3, then Egorov's theorem collapses to the downward monotone convergence property for sets (Exercise 1.4.23(iii)).
Another nice feature of the finite measure case is that $L^\infty$ convergence implies $L^1$ convergence:
Exercise 1.5.4. Let $X$ have finite measure, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions. Show that if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ also converges to $f$ in $L^1$ norm.
1.5.4. Fast convergence. The typewriter example shows that $L^1$ convergence is not strong enough to force almost uniform or pointwise almost everywhere convergence. However, this can be rectified if one assumes that the $L^1$ convergence is sufficiently fast:
Exercise 1.5.5 (Fast $L^1$ convergence). Suppose that $f_n, f : X \to \mathbf{C}$ are measurable functions such that $\sum_{n=1}^\infty \|f_n - f\|_{L^1(\mu)} < \infty$; thus, not only do the quantities $\|f_n - f\|_{L^1(\mu)}$ go to zero (which would mean $L^1$ convergence), but they converge in an absolutely summable fashion.
(i) Show that $f_n$ converges pointwise almost everywhere to $f$.
(ii) Show that $f_n$ converges almost uniformly to $f$.
(Hint: If you have trouble getting started, try working first in the special case in which $f_n = A_n 1_{E_n}$ are step functions and $f = 0$ and use Exercise 1.5.3 in order to gain some intuition. The second part of the exercise implies the first, but the first is a little easier to prove and may thus serve as a useful warmup. The $\varepsilon/2^n$ trick may come in handy for the second part.)
As a corollary, we see that $L^1$ convergence implies almost uniform or pointwise almost everywhere convergence if we are allowed to pass to a subsequence:
Corollary 1.5.10. Suppose that $f_n : X \to \mathbf{C}$ is a sequence of measurable functions that converges in $L^1$ norm to a limit $f$. Then there exists a subsequence $f_{n_j}$ that converges almost uniformly (and hence, pointwise almost everywhere) to $f$ (while remaining convergent in $L^1$ norm, of course).
Proof. Since $\|f_n - f\|_{L^1(\mu)} \to 0$ as $n \to \infty$, we can select $n_1 < n_2 < n_3 < \dots$ such that $\|f_{n_j} - f\|_{L^1(\mu)} \le 2^{-j}$ (say). This is enough for the previous exercise to apply.
Actually, one can strengthen this corollary a bit by relaxing $L^1$ convergence to convergence in measure:
Exercise 1.5.6. Suppose that $f_n : X \to \mathbf{C}$ is a sequence of measurable functions that converges in measure to a limit $f$. Show that there exists a subsequence $f_{n_j}$ that converges almost uniformly (and hence, pointwise almost everywhere) to $f$. (Hint: Choose the $n_j$ so that the sets $\{x \in X : |f_{n_j}(x) - f(x)| > 1/j\}$ have a suitably small measure.)
It is instructive to see how this subsequence is extracted in the case of the typewriter sequence. In general, one can view the operation of passing to a subsequence as being able to eliminate “typewriter” situations in which the tail support is much larger than the width.
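As a small illustration for the typewriter sequence, one possible choice of subsequence is $n_j = 2^j$; this choice is an assumption of the sketch, and other choices work equally well. Then $f_{n_j}$ is the indicator of $[0, 2^{-j}]$, the supports are nested, and so the $J^{\text{th}}$ tail support of the subsequence is just $[0, 2^{-J}]$, whose measure tends to zero.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints (left, right) of the support of the typewriter function f_n."""
    k = n.bit_length() - 1               # the k with 2**k <= n < 2**(k+1)
    return Fraction(n - 2**k, 2**k), Fraction(n - 2**k + 1, 2**k)

# Illustrative subsequence n_j = 2**j for j = 0, ..., 10.
subsequence = [2**j for j in range(11)]
intervals = [typewriter_interval(n) for n in subsequence]

# The supports are nested, so the union of the supports with index >= J
# is the J-th support itself.
nested = all(a[0] <= b[0] and b[1] <= a[1] for a, b in zip(intervals, intervals[1:]))

# Measure of the J-th tail support of the subsequence, for J = 0, ..., 10.
tail_measures = [right - left for left, right in intervals]
print(nested, tail_measures)
```

By Exercise 1.5.3(iii), a tail support whose measure goes to zero is what almost uniform convergence requires here. Note that the point $x = 0$ lies in every one of these intervals, so the convergence is pointwise almost everywhere but not pointwise everywhere.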
Exercise 1.5.7. Let $(X, \mathcal{B}, \mu)$ be a measure space, let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions converging pointwise almost everywhere as $n \to \infty$ to a measurable limit $f : X \to \mathbf{C}$, and for each $n$, let $f_{n,m} : X \to \mathbf{C}$ be a sequence of measurable functions converging pointwise almost everywhere as $m \to \infty$ (keeping $n$ fixed) to $f_n$.
(i) If $\mu(X)$ is finite, show that there exists a sequence $m_1, m_2, \dots$ such that $f_{n,m_n}$ converges pointwise almost everywhere to $f$.
(ii) Show the same claim is true if, instead of assuming that $\mu(X)$ is finite, we merely assume that $X$ is $\sigma$-finite, i.e. it is the countable union of sets of finite measure.
(The claim can fail if $X$ is not $\sigma$-finite. A counterexample is if $X = \mathbf{N}^{\mathbf{N}}$ with counting measure, $f_n$ and $f$ are identically zero for all $n \in \mathbf{N}$, and $f_{n,m}$ is the indicator function of the space of all sequences $(a_i)_{i \in \mathbf{N}} \in \mathbf{N}^{\mathbf{N}}$ with $a_n \ge m$.)
Exercise 1.5.8. Let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that the following are equivalent:
(i) $f_n$ converges in measure to $f$.
(ii) Every subsequence $f_{n_j}$ of the $f_n$ has a further subsequence $f_{n_{j_i}}$ that converges almost uniformly to $f$.
1.5.5. Domination and uniform integrability. Now we turn to the reverse question, of whether almost uniform convergence, pointwise almost everywhere convergence, or convergence in measure can imply $L^1$ convergence. The escape to vertical and width infinity examples show that without any further hypotheses, the answer to this question is no. However, one can do better if one places some domination hypotheses on the $f_n$ that shut down both of these escape routes.
We say that a sequence $f_n : X \to \mathbf{C}$ is dominated if there exists an absolutely integrable function $g : X \to \mathbf{C}$ such that $|f_n(x)| \le g(x)$ for all $n$ and almost every $x$. For instance, if $X$ has finite measure and the $f_n$ are uniformly bounded, then they are dominated. Observe that the sequences in the vertical and width escape to infinity examples are not dominated (why?).
The dominated convergence theorem (Theorem 1.4.49) then asserts that if a dominated sequence $f_n$ converges to $f$ pointwise almost everywhere, then it necessarily converges to $f$ in $L^1$ norm (and hence also in measure). Here is a variant:
Exercise 1.5.9. Suppose that $f_n : X \to \mathbf{C}$ is a dominated sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that $f_n$ converges in $L^1$ norm to $f$ if and only if $f_n$ converges in measure to $f$. (Hint: one way to establish the “if” direction is to first show that every subsequence of the $f_n$ has a further subsequence that converges in $L^1$ to $f$, using Exercise 1.5.6 and the dominated convergence theorem (Theorem 1.4.49). Alternatively, use monotone convergence to find a set $E$ of finite measure such that $\int_{X \setminus E} g \, d\mu$, and hence $\int_{X \setminus E} f_n \, d\mu$ and $\int_{X \setminus E} f \, d\mu$, are small.)
There is a more general notion than domination, known as uniform integrability, which serves as a substitute for domination in many (but not all) contexts.
Definition 1.5.11 (Uniform integrability). A sequence $f_n : X \to \mathbf{C}$ of absolutely integrable functions is said to be uniformly integrable if the following three statements hold:
(i) (Uniform bound on $L^1$ norm) One has $\sup_n \|f_n\|_{L^1(\mu)} = \sup_n \int_X |f_n| \, d\mu < +\infty$.
(ii) (No escape to vertical infinity) One has $\sup_n \int_{|f_n| \ge M} |f_n| \, d\mu \to 0$ as $M \to +\infty$.
(iii) (No escape to width infinity) One has $\sup_n \int_{|f_n| \le \delta} |f_n| \, d\mu \to 0$ as $\delta \to 0$.
Remark 1.5.12. It is instructive to understand uniform integrability in the step function case $f_n = A_n 1_{E_n}$. The uniform bound on the $L^1$ norm then asserts that $A_n \mu(E_n)$ stays bounded. The lack of escape to vertical infinity means that along any subsequence for which $A_n \to \infty$, $A_n \mu(E_n)$ must go to zero. Similarly, the lack of escape to width infinity means that along any subsequence for which $A_n \to 0$, $A_n \mu(E_n)$ must go to zero.
Exercise 1.5.10.
(i) Show that if $f$ is an absolutely integrable function, then the constant sequence $f_n = f$ is uniformly integrable. (Hint: use the monotone convergence theorem.)
(ii) Show that every dominated sequence of measurable functions is uniformly integrable.
(iii) Give an example of a sequence that is uniformly integrable but not dominated.
In the case of a finite measure space, there is no escape to width infinity, and the criterion for uniform integrability simplifies to just that of excluding vertical infinity:
Exercise 1.5.11. Suppose that $X$ has finite measure, and let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions. Show that $f_n$ is uniformly integrable if and only if $\sup_n \int_{|f_n| \ge M} |f_n| \, d\mu \to 0$ as $M \to +\infty$.
Exercise 1.5.12 (Uniform $L^p$ bound on finite measure implies uniform integrability). Suppose that $X$ has finite measure, let $1 < p < \infty$, and suppose that $f_n : X \to \mathbf{C}$ is a sequence of measurable functions such that $\sup_n \int_X |f_n|^p \, d\mu < \infty$. Show that the sequence $f_n$ is uniformly integrable.
Exercise 1.5.13. Let $f_n : X \to \mathbf{C}$ be a uniformly integrable sequence of functions. Show that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\int_E |f_n| \, d\mu \le \varepsilon$$
whenever $n \ge 1$ and $E$ is a measurable set with $\mu(E) \le \delta$.
Exercise 1.5.14. This exercise is a partial converse to Exercise 1.5.13. Let $X$ be a probability space, and let $f_n : X \to \mathbf{C}$ be a sequence of absolutely integrable functions with $\sup_n \|f_n\|_{L^1} < \infty$. Suppose that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\int_E |f_n| \, d\mu \le \varepsilon$$
whenever $n \ge 1$ and $E$ is a measurable set with $\mu(E) \le \delta$. Show that the sequence $f_n$ is uniformly integrable.
The dominated convergence theorem (Theorem 1.4.49) does not
have an analogue in the uniformly integrable setting:
Exercise 1.5.15. Give an example of a sequence $f_n$ of uniformly integrable functions that converge pointwise almost everywhere to zero, but do not converge almost uniformly, in measure, or in $L^1$ norm.
However, one does have an analogue of Exercise 1.5.9:
Theorem 1.5.13 (Uniformly integrable convergence in measure). Let $f_n : X \to \mathbf{C}$ be a uniformly integrable sequence of functions, and let $f : X \to \mathbf{C}$ be another function. Then $f_n$ converges in $L^1$ norm to $f$ if and only if $f_n$ converges to $f$ in measure.
Proof. The “only if” part follows from Exercise 1.5.2, so we establish the “if” part.
By uniform integrability, there exists a finite $A > 0$ such that
$$\int_X |f_n| \, d\mu \le A$$
for all $n$. By Exercise 1.5.6, there is a subsequence of the $f_n$ that converges pointwise almost everywhere to $f$. Applying Fatou's lemma (Corollary 1.4.47), we conclude that
$$\int_X |f| \, d\mu \le A,$$
thus $f$ is absolutely integrable.
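Now let $\varepsilon > 0$ be arbitrary. By uniform integrability, one can find $\delta > 0$ such that
$$\int_{|f_n| \le \delta} |f_n| \, d\mu \le \varepsilon \tag{1.15}$$
for all $n$. By monotone convergence, and decreasing $\delta$ if necessary, we may say the same for $f$, thus
$$\int_{|f| \le \delta} |f| \, d\mu \le \varepsilon. \tag{1.16}$$
Let $0 < \kappa < \delta/2$ be another small quantity (that can depend on $A, \varepsilon, \delta$) that we will choose a bit later. From (1.15), (1.16) and the hypothesis $\kappa < \delta/2$ we have
$$\int_{|f_n - f| < \kappa;\ |f| \le \delta/2} |f_n| \, d\mu \le \varepsilon$$
and
$$\int_{|f_n - f| < \kappa;\ |f| \le \delta/2} |f| \, d\mu \le \varepsilon$$
and hence by the triangle inequality
$$\int_{|f_n - f| < \kappa;\ |f| \le \delta/2} |f - f_n| \, d\mu \le 2\varepsilon. \tag{1.17}$$
Finally, from Markov's inequality (Exercise 1.4.36(vi)) we have
$$\mu(\{x : |f(x)| > \delta/2\}) \le \frac{A}{\delta/2}$$
and thus
$$\int_{|f_n - f| < \kappa;\ |f| > \delta/2} |f - f_n| \, d\mu \le \frac{A}{\delta/2} \kappa.$$
In particular, by shrinking $\kappa$ further if necessary we see that
$$\int_{|f_n - f| < \kappa;\ |f| > \delta/2} |f - f_n| \, d\mu \le \varepsilon$$
and hence by (1.17)
$$\int_{|f_n - f| < \kappa} |f - f_n| \, d\mu \le 3\varepsilon \tag{1.18}$$
for all $n$.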
Meanwhile, since $f_n$ converges in measure to $f$, we know that there exists an $N$ (depending on $\kappa$) such that
$$\mu(\{x : |f_n(x) - f(x)| \ge \kappa\}) \le \kappa$$
for all $n \ge N$. Applying Exercise 1.5.13, we conclude (making $\kappa$ smaller if necessary) that
$$\int_{|f_n - f| \ge \kappa} |f_n| \, d\mu \le \varepsilon$$
and
$$\int_{|f_n - f| \ge \kappa} |f| \, d\mu \le \varepsilon$$
and hence by the triangle inequality
$$\int_{|f_n - f| \ge \kappa} |f - f_n| \, d\mu \le 2\varepsilon$$
for all $n \ge N$. Combining this with (1.18) we conclude that
$$\|f_n - f\|_{L^1(\mu)} = \int_X |f - f_n| \, d\mu \le 5\varepsilon$$
for all $n \ge N$, and so $f_n$ converges to $f$ in $L^1$ norm as desired.
Finally, we recall two results from the previous notes for unsigned
functions.
Exercise 1.5.16 (Monotone convergence theorem). Suppose that $f_n : X \to [0, +\infty)$ are measurable, monotone non-decreasing in $n$ and are such that $\sup_n \int_X f_n \, d\mu < \infty$. Show that $f_n$ converges in $L^1$ norm to $\sup_n f_n$. (Note that $\sup_n f_n$ can be infinite on a null set, but the definition of $L^1$ convergence can be easily modified to accommodate this.)
Exercise 1.5.17 (Defect version of Fatou's lemma). Suppose that $f_n : X \to [0, +\infty)$ are measurable, are such that $\sup_n \int_X f_n \, d\mu < \infty$, and converge pointwise almost everywhere to some measurable limit $f : X \to [0, +\infty)$. Show that $f_n$ converges in $L^1$ norm to $f$ if and only if $\int_X f_n \, d\mu$ converges to $\int_X f \, d\mu$. Informally, we see that in the unsigned, bounded mass case, pointwise convergence implies $L^1$ norm convergence if and only if there is no loss of mass.
Exercise 1.5.18. Suppose that $f_n : X \to \mathbf{C}$ is a dominated sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that $f_n$ converges pointwise almost everywhere to $f$ if and only if $f_n$ converges almost uniformly to $f$.
Exercise 1.5.19. Let $X$ be a probability space (see Section 2.3). Given any real-valued measurable function $f : X \to \mathbf{R}$, we define the cumulative distribution function $F : \mathbf{R} \to [0, 1]$ of $f$ to be the function $F(\lambda) := \mu(\{x \in X : f(x) \le \lambda\})$. Given another sequence $f_n : X \to \mathbf{R}$ of real-valued measurable functions, we say that $f_n$ converges in distribution to $f$ if the cumulative distribution function $F_n(\lambda)$ of $f_n$ converges pointwise to the cumulative distribution function $F(\lambda)$ of $f$ at all $\lambda \in \mathbf{R}$ for which $F$ is continuous.
(i) Show that if $f_n$ converges to $f$ in any of the seven senses discussed above (uniformly, essentially uniformly, almost uniformly, pointwise, pointwise almost everywhere, in $L^1$, or in measure), then it converges in distribution to $f$.
(ii) Give an example in which $f_n$ converges to $f$ in distribution, but not in any of the above seven senses.
(iii) Show that convergence in distribution is not linear, in the sense that if $f_n$ converges to $f$ in distribution, and $g_n$ converges to $g$ in distribution, then $f_n + g_n$ need not converge to $f + g$ in distribution.
(iv) Show that a sequence $f_n$ can converge in distribution to two different limits $f, g$, which are not equal almost everywhere.
Convergence in distribution (not to be confused with convergence in the sense of distributions, which is studied in §1.13 of An epsilon of room, Vol. I) is commonly used in probability; but, as the above exercise demonstrates, it is quite a weak notion of convergence, lacking many of the properties of the modes of convergence discussed here.
1.6. Differentiation theorems
Let $[a, b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F : [a, b] \to \mathbf{R}$ is said to be differentiable at a point $x \in [a, b]$ if the limit
$$F'(x) := \lim_{y \to x;\ y \in [a, b] \setminus \{x\}} \frac{F(y) - F(x)}{y - x} \tag{1.19}$$
exists. In that case, we call $F'(x)$ the strong derivative, classical derivative, or just derivative for short, of $F$ at $x$. We say that $F$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points $x \in [a, b]$, and differentiable almost everywhere if it is differentiable at almost every point $x \in [a, b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is continuously differentiable.
Remark 1.6.1. In §1.13 of An epsilon of room, Vol. I, the notion of a weak derivative or distributional derivative is introduced. This type of derivative can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing “Lebesgue” type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.
Exercise 1.6.1. If $F : [a, b] \to \mathbf{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a, b]$ outside of a null set), but give an example to demonstrate that $F$ need not be continuous.
Exercise 1.6.2. Give an example of a function $F : [a, b] \to \mathbf{R}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an $F$ that vanishes quickly at some point, say at the origin $0$, but which also oscillates rapidly near that point.)
In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle's theorem.
Theorem 1.6.2 (Rolle's theorem). Let $[a, b]$ be a compact interval of positive length, and let $F : [a, b] \to \mathbf{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a, b)$ such that $F'(x) = 0$.
Proof. By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a, b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a, b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x \in [a, b]$ such that $F(x) \ge F(y)$ for all $y \in [a, b]$. Then $F(x)$ must be positive and so $x$ cannot equal either $a$ or $b$, and thus must lie in the interior. From the right limit of (1.19) we see that $F'(x) \le 0$, while from the left limit we have $F'(x) \ge 0$. Thus $F'(x) = 0$ and the claim follows.
Remark 1.6.3. Observe that the same proof also works if $F$ is only differentiable in the interior $(a, b)$ of the interval $[a, b]$, so long as it is continuous all the way up to the boundary of $[a, b]$.
Exercise 1.6.3. Give an example to show that Rolle's theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.
Remark 1.6.4. It is important to note that Rolle's theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the domain $\mathbf{R}$. If, for instance, we consider complex-valued scalar functions $F : [a, b] \to \mathbf{C}$, then the theorem can fail; for instance, the function $F : [0, 1] \to \mathbf{C}$ defined by $F(x) := e^{2\pi i x} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = 2\pi i e^{2\pi i x}$ is never zero. (Rolle's theorem does imply that the real and imaginary parts of the derivative $F'$ both vanish somewhere, but the problem is that they don't simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbf{R}^n$.
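The complex counterexample can be checked numerically. The sketch below evaluates $F(x) = e^{2\pi i x} - 1$ and the derivative formula $2\pi i e^{2\pi i x}$ on an evenly spaced sample grid; the grid and the variable names are illustrative choices. A finite sample cannot prove that the derivative never vanishes, but it is consistent with the exact identity $|F'(x)| = 2\pi$.

```python
import cmath
import math

def F(x):
    """The closed curve F(x) = exp(2 pi i x) - 1 on [0, 1]."""
    return cmath.exp(2j * math.pi * x) - 1

def F_prime(x):
    """Its derivative, 2 pi i exp(2 pi i x)."""
    return 2j * math.pi * cmath.exp(2j * math.pi * x)

samples = [i / 100 for i in range(101)]

# F vanishes at both endpoints, up to floating-point rounding.
endpoint_values = (abs(F(0.0)), abs(F(1.0)))

# The modulus of the derivative is 2*pi at every sample point.
min_speed = min(abs(F_prime(x)) for x in samples)
print(endpoint_values, min_speed)

# The real part -2*pi*sin(2*pi*x) vanishes at x = 1/2, while the imaginary
# part 2*pi*cos(2*pi*x) vanishes at x = 1/4: each part has zeros, but never
# at the same point.
print(F_prime(0.5).real, F_prime(0.25).imag)
```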
One can easily amplify Rolle's theorem to the mean value theorem:
Corollary 1.6.5 (Mean value theorem). Let $[a, b]$ be a compact interval of positive length, and let $F : [a, b] \to \mathbf{R}$ be a differentiable function. Then there exists $x \in (a, b)$ such that $F'(x) = \frac{F(b) - F(a)}{b - a}$.
Proof. Apply Rolle's theorem to the function $x \mapsto F(x) - \frac{F(b) - F(a)}{b - a}(x - a)$.
Remark 1.6.6. As Rolle's theorem is only applicable to real scalar-valued functions, the more general mean value theorem is also only applicable to such functions.
Exercise 1.6.4 (Uniqueness of antiderivatives up to constants). Let $[a, b]$ be a compact interval of positive length, and let $F : [a, b] \to \mathbf{R}$ and $G : [a, b] \to \mathbf{R}$ be differentiable functions. Show that $F'(x) = G'(x)$ for every $x \in [a, b]$ if and only if $F(x) = G(x) + C$ for some constant $C \in \mathbf{R}$ and all $x \in [a, b]$.
We can use the mean value theorem to deduce one of the fundamental theorems of calculus:
Theorem 1.6.7 (Second fundamental theorem of calculus). Let $F : [a, b] \to \mathbf{R}$ be a differentiable function, such that $F'$ is Riemann integrable. Then the Riemann integral $\int_a^b F'(x) \, dx$ of $F'$ is equal to $F(b) - F(a)$. In particular, we have $\int_a^b F'(x) \, dx = F(b) - F(a)$ whenever $F$ is continuously differentiable.
Proof. Let $\varepsilon > 0$. By the definition of Riemann integrability, there exists a finite partition $a = t_0 < t_1 < \dots < t_k = b$ such that
$$\Bigl| \sum_{j=1}^k F'(t_j^*)(t_j - t_{j-1}) - \int_a^b F'(x) \, dx \Bigr| \le \varepsilon$$
for every choice of $t_j^* \in [t_{j-1}, t_j]$.
Fix this partition. From the mean value theorem, for each $1 \le j \le k$ one can find $t_j^* \in [t_{j-1}, t_j]$ such that
$$F'(t_j^*)(t_j - t_{j-1}) = F(t_j) - F(t_{j-1})$$
and thus by telescoping series
$$\Bigl| (F(b) - F(a)) - \int_a^b F'(x) \, dx \Bigr| \le \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.
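The telescoping step in this proof can be seen in a concrete case. The sketch below takes the sample function $F(x) = x^3$ on $[0, 1]$, for which the mean value point on $[s, t]$ can be written down explicitly as $t^* = \sqrt{(s^2 + st + t^2)/3}$. The choice of $F$, the uniform partition, and the function names are all assumptions made for illustration. With these tags, the Riemann sum of $F'$ equals $F(1) - F(0)$ up to floating-point rounding, for every partition size.

```python
import math

def F(x):
    return x**3

def F_prime(x):
    return 3 * x**2

def mean_value_tag(s, t):
    """Point t* in [s, t] with F'(t*) (t - s) = F(t) - F(s), valid for F(x) = x**3."""
    return math.sqrt((s * s + s * t + t * t) / 3)

def tagged_riemann_sum(k):
    """Riemann sum of F' on the uniform k-piece partition of [0, 1], using mean value tags."""
    points = [j / k for j in range(k + 1)]
    total = 0.0
    for s, t in zip(points, points[1:]):
        total += F_prime(mean_value_tag(s, t)) * (t - s)   # equals F(t) - F(s)
    return total

for k in (1, 10, 100):
    print(k, tagged_riemann_sum(k), F(1.0) - F(0.0))
```

Even the coarsest partition ($k = 1$) reproduces $F(1) - F(0)$, because each term of the sum is exactly one increment of $F$; Riemann integrability is what lets the proof replace this specially tagged sum by the integral.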
Remark 1.6.8. Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.
Of course, we also have the other half of the fundamental theorem of calculus:
Theorem 1.6.9 (First fundamental theorem of calculus). Let $[a, b]$ be a compact interval of positive length. Let $f : [a, b] \to \mathbf{C}$ be a continuous function, and let $F : [a, b] \to \mathbf{C}$ be the indefinite integral $F(x) := \int_a^x f(t) \, dt$. Then $F$ is differentiable on $[a, b]$, with derivative $F'(x) = f(x)$ for all $x \in [a, b]$. In particular, $F$ is continuously differentiable.
Proof. It suffices to show that
\[ \lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x) \]
for all $x \in [a,b)$, and
\[ \lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h} = f(x) \]
for all $x \in (a,b]$. After a change of variables, we can write
\[ \frac{F(x+h) - F(x)}{h} = \int_0^1 f(x+ht)\,dt \]
for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any
$x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous,
the function $t \mapsto f(x+ht)$ converges uniformly to $f(x)$ on $[0,1]$
as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded,
$\int_0^1 f(x+ht)\,dt$ thus converges to $\int_0^1 f(x)\,dt = f(x)$, and
the claim follows.
Corollary 1.6.10 (Differentiation theorem for continuous functions).
Let $f: [a,b] \to \mathbf{C}$ be a continuous function on a compact
interval. Then we have
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt = f(x) \]
for all $x \in [a,b)$,
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\,dt = f(x) \]
for all $x \in (a,b]$, and thus
\[ \lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\,dt = f(x) \]
for all $x \in (a,b)$.
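The convergence of averages in Corollary 1.6.10 can be observed numerically. The sketch below is only an illustration under stated assumptions: the test function $f = \exp$, the point $x = 0.5$, and the midpoint-rule helper `average` are all choices made for this example and do not come from the text.

```python
import math

def average(f, lo, hi, samples=2000):
    """Midpoint-rule approximation of the average value of f over [lo, hi]."""
    width = (hi - lo) / samples
    return sum(f(lo + (i + 0.5) * width) for i in range(samples)) / samples

f = math.exp          # a continuous function (illustrative choice)
x = 0.5
steps = (0.1, 0.01, 0.001)

# Right averages (1/h) * integral over [x, x+h], and two-sided averages over [x-h, x+h].
right = [average(f, x, x + h) for h in steps]
two_sided = [average(f, x - h, x + h) for h in steps]

print([abs(v - f(x)) for v in right])
print([abs(v - f(x)) for v in two_sided])
```

Both lists of errors decrease towards zero as $h$ shrinks; for a smooth $f$ the two-sided averages converge faster, since the first-order terms cancel by symmetry.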
In these notes we explore the question of the extent to which these
theorems continue to hold when the differentiability or integrability
conditions on the various functions $F$, $F'$, $f$ are relaxed. Among the
results proven in these notes are
(i) The Lebesgue differentiation theorem, which roughly speaking asserts
that Corollary 1.6.10 continues to hold for almost every $x$ if $f$ is
merely absolutely integrable, rather than continuous;
(ii) A number of differentiation theorems, which assert for instance
that monotone, Lipschitz, or bounded variation functions in one
dimension are almost everywhere differentiable; and
(iii) The second fundamental theorem of calculus for absolutely
continuous functions.
1.6.1. The Lebesgue differentiation theorem in one dimension. The main
objective of this section is to show
Theorem 1.6.11 (Lebesgue differentiation theorem, one-dimensional case).
Let $f: \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function,
and let $F: \mathbf{R} \to \mathbf{C}$ be the definite integral
$F(x) := \int_{[-\infty,x]} f(t)\,dt$. Then $F$ is continuous and almost
everywhere differentiable, and $F'(x) = f(x)$ for almost every
$x \in \mathbf{R}$.
This can be viewed as a variant of Corollary 1.6.10; the hypotheses are
weaker because $f$ is only assumed to be absolutely integrable, rather
than continuous (and can live on the entire real line, and not just on a
compact interval); but the conclusion is weaker too, because $F$ is only
found to be almost everywhere differentiable, rather than everywhere
differentiable. (But such a relaxation of the conclusion is necessary at
this level of generality; consider for instance the example when
$f = 1_{[0,1]}$.)
The continuity is an easy exercise:
Exercise 1.6.5. Let $f: \mathbf{R} \to \mathbf{C}$ be an absolutely
integrable function, and let $F: \mathbf{R} \to \mathbf{C}$ be the
definite integral $F(x) := \int_{[-\infty,x]} f(t)\,dt$. Show that $F$ is
continuous.
The main difficulty is to show that $F'(x) = f(x)$ for almost every
$x \in \mathbf{R}$. This will follow from
Theorem 1.6.12 (Lebesgue differentiation theorem, second formulation).
Let $f: \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function.
Then
(1.20) \[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt = f(x) \]
for almost every $x \in \mathbf{R}$, and
(1.21) \[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\,dt = f(x) \]
for almost every $x \in \mathbf{R}$.
Exercise 1.6.6. Show that Theorem 1.6.11 follows from Theorem 1.6.12.
We will just prove the first fact (1.20); the second fact (1.21) is
similar (or can be deduced from (1.20) by replacing $f$ with the
reflected function $x \mapsto f(-x)$).
We are taking $f$ to be complex valued, but it is clear from taking real
and imaginary parts that it suffices to prove the claim when $f$ is
real-valued, and we shall thus assume this for the rest of the argument.
The conclusion (1.20) we want to prove is a convergence theorem: an
assertion that for all functions $f$ in a given class (in this case, the
class of absolutely integrable functions $f: \mathbf{R} \to \mathbf{R}$),
a certain sequence of linear expressions $T_h f$ (in this case, the right
averages $T_h f(x) = \frac{1}{h} \int_{[x,x+h]} f(t)\,dt$) converges in
some sense (in this case, pointwise almost everywhere) to a specified
limit (in this case, $f$). There is a general and very useful argument to
prove such convergence theorems, known as the density argument. This
argument requires two ingredients, which we state informally as follows:
(i) A verification of the convergence result for some “dense subclass”
of “nice” functions $f$, such as continuous functions, smooth functions,
simple functions, etc. By “dense”, we mean that a general function $f$
in the original class can be approximated to arbitrary accuracy in a
suitable sense by a function in the nice subclass.
(ii) A quantitative estimate that upper bounds the maximal fluctuation
of the linear expressions $T_h f$ in terms of the “size” of the function
$f$ (where the precise definition of “size” depends on the nature of the
approximation in the first ingredient).
Once one has these two ingredients, it is usually not too hard to put
them together to obtain the desired convergence theorem for general
functions $f$ (not just those in the dense subclass). We illustrate this
with a simple example:
Proposition 1.6.13 (Translation is continuous in $L^1$). Let
$f: \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function,
and for each $h \in \mathbf{R}^d$, let $f_h: \mathbf{R}^d \to \mathbf{C}$
be the shifted function
\[ f_h(x) := f(x-h). \]
Then $f_h$ converges in $L^1$ norm to $f$ as $h \to 0$, thus
\[ \lim_{h \to 0} \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx = 0. \]
Proof. We first verify this claim for a dense subclass of $f$, namely the
functions $f$ which are continuous and compactly supported (i.e. they
vanish outside of a compact set). Such functions are uniformly
continuous, and thus $f_h$ converges uniformly to $f$ as $h \to 0$.
Furthermore, as $f$ is compactly supported, the support of $f_h - f$
stays uniformly bounded for $h$ in a bounded set. From this we see that
$f_h$ also converges to $f$ in $L^1$ norm as required.
Next, we observe the quantitative estimate
(1.22) \[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq 2 \int_{\mathbf{R}^d} |f(x)|\,dx \]
for any $h \in \mathbf{R}^d$. This follows easily from the triangle
inequality
\[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq \int_{\mathbf{R}^d} |f_h(x)|\,dx + \int_{\mathbf{R}^d} |f(x)|\,dx \]
together with the translation invariance of the Lebesgue integral:
\[ \int_{\mathbf{R}^d} |f_h(x)|\,dx = \int_{\mathbf{R}^d} |f(x)|\,dx. \]
Now we put the two ingredients together. Let
$f: \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable, and let
$\varepsilon > 0$ be arbitrary. Applying Littlewood’s second principle
(Theorem 1.3.20(iii)) to the absolutely integrable function $f$, we can
find a continuous, compactly supported function
$g: \mathbf{R}^d \to \mathbf{C}$ such that
\[ \int_{\mathbf{R}^d} |f(x) - g(x)|\,dx \leq \varepsilon. \]
Applying (1.22) to $f - g$, we conclude that
\[ \int_{\mathbf{R}^d} |(f-g)_h(x) - (f-g)(x)|\,dx \leq 2\varepsilon, \]
which we rearrange as
\[ \int_{\mathbf{R}^d} |(f_h - f)(x) - (g_h - g)(x)|\,dx \leq 2\varepsilon. \]
By the dense subclass result, we also know that
\[ \int_{\mathbf{R}^d} |g_h(x) - g(x)|\,dx \leq \varepsilon \]
for all $h$ sufficiently close to zero. From the triangle inequality, we
conclude that
\[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq 3\varepsilon \]
for all $h$ sufficiently close to zero, and the claim follows.
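To see Proposition 1.6.13 in a concrete case, the sketch below numerically estimates $\int_{\mathbf{R}} |f_h(x) - f(x)|\,dx$ for the indicator $f = 1_{[0,1]}$ in one dimension, where a direct computation gives $2|h|$ for $|h| \leq 1$. The function names, the integration window $[-2,3]$ and the midpoint rule are assumptions of this example only.

```python
def l1_shift_distance(f, h, lo=-2.0, hi=3.0, samples=100_000):
    """Midpoint-rule estimate of the integral of |f(x - h) - f(x)| over [lo, hi]."""
    width = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        x = lo + (i + 0.5) * width
        total += abs(f(x - h) - f(x))
    return total * width

def indicator(x):
    """The indicator function of [0, 1] (illustrative choice of f)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# f is supported in [0, 1] and |h| <= 0.5, so the window [-2, 3] captures everything.
distances = {h: l1_shift_distance(indicator, h) for h in (0.5, 0.1, 0.01)}
print(distances)
```

The estimates track $2h$ and tend to zero with $h$, even though $f$ is discontinuous and $f_h$ does not converge to $f$ uniformly.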
Remark 1.6.14. In the above application of the density argument, we
proved the required quantitative estimate directly for all functions $f$
in the original class of functions. However, it is also possible to use
the density argument a second time and initially verify the quantitative
estimate just for functions $f$ in a nice subclass (e.g. continuous
functions of compact support). In many cases, one can then extend that
estimate to the general case by using tools such as Fatou’s lemma
(Corollary 1.4.47), which are particularly suited for showing that upper
bound estimates are preserved with respect to limits.
Exercise 1.6.7. Let $f: \mathbf{R}^d \to \mathbf{C}$,
$g: \mathbf{R}^d \to \mathbf{C}$ be Lebesgue measurable functions such
that $f$ is absolutely integrable and $g$ is essentially bounded (i.e.
bounded outside of a null set). Show that the convolution
$f * g: \mathbf{R}^d \to \mathbf{C}$ defined by the formula
\[ f * g(x) = \int_{\mathbf{R}^d} f(y) g(x-y)\,dy \]
is well-defined (in the sense that the integrand on the right-hand side
is absolutely integrable) and that $f * g$ is a bounded, continuous
function.
The above exercise is illustrative of a more general intuition, which is
that convolutions tend to be smoothing in nature; the convolution
$f * g$ of two functions is usually at least as regular as, and often
more regular than, either of the two factors $f$, $g$.
This smoothing phenomenon gives rise to an important fact, namely the
Steinhaus theorem:
Exercise 1.6.8 (Steinhaus theorem). Let $E \subset \mathbf{R}^d$ be a
Lebesgue measurable set of positive measure. Show that the set
$E - E := \{x - y : x, y \in E\}$ contains an open neighbourhood of the
origin. (Hint: reduce to the case when $E$ is bounded, and then apply
the previous exercise to the convolution $1_E * 1_{-E}$, where
$-E := \{-y : y \in E\}$.)
Exercise 1.6.9. A homomorphism $f: \mathbf{R}^d \to \mathbf{C}$ is a map
with the property that $f(x+y) = f(x) + f(y)$ for all
$x, y \in \mathbf{R}^d$.
(i) Show that all measurable homomorphisms are continuous. (Hint: for
any disk $D$ centered at the origin in the complex plane, show that
$f^{-1}(z + D)$ has positive measure for at least one $z \in \mathbf{C}$,
and then use the Steinhaus theorem from the previous exercise.)
(ii) Show that $f$ is a measurable homomorphism if and only if it takes
the form $f(x_1, \ldots, x_d) = x_1 z_1 + \ldots + x_d z_d$ for all
$x_1, \ldots, x_d \in \mathbf{R}$ and some complex coefficients
$z_1, \ldots, z_d$. (Hint: first establish this for rational
$x_1, \ldots, x_d$, and then use the previous part of this exercise.)
(iii) (For readers familiar with Zorn’s lemma, see §2.4 of An epsilon of
room, Vol. I) Show that there exist homomorphisms
$f: \mathbf{R}^d \to \mathbf{C}$ which are not of the form in the
previous part of this exercise. (Hint: view $\mathbf{R}^d$ (or
$\mathbf{C}$) as a vector space over the rationals $\mathbf{Q}$, and use
the fact (from Zorn’s lemma) that every vector space, even an
infinite-dimensional one, has at least one basis.) This gives an
alternate construction of a non-measurable set to that given in previous
notes.
Remark 1.6.15. One drawback with the density argument is that it gives
convergence results which are qualitative rather than quantitative:
there is no explicit bound on the rate of convergence. For instance, in
Proposition 1.6.13, we know that for any $\varepsilon > 0$, there exists
$\delta > 0$ such that
$\int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq \varepsilon$ whenever
$|h| \leq \delta$, but we do not know exactly how $\delta$ depends on
$\varepsilon$ and $f$. Actually, the proof does eventually give such a
bound, but it depends on “how measurable” the function $f$ is, or more
precisely how “easy” it is to approximate $f$ by a “nice” function. To
illustrate this issue, let’s work in one dimension and consider the
function $f(x) := \sin(Nx) 1_{[0,2\pi]}(x)$, where $N \geq 1$ is a large
integer. On the one hand, $f$ is bounded in the $L^1$ norm uniformly in
$N$: $\int_{\mathbf{R}} |f(x)|\,dx \leq 2\pi$ (indeed, the left-hand side
is equal to $4$). On the other hand, it is not hard to see that
$\int_{\mathbf{R}} |f_{\pi/N}(x) - f(x)|\,dx \geq c$ for some absolute
constant $c > 0$. Thus, if one wants to force
$\int_{\mathbf{R}} |f_h(x) - f(x)|\,dx$ to drop below $c$, one has to make
$h$ at most $\pi/N$ from the origin. Making $N$ large, we thus see that
the rate of convergence of $\int_{\mathbf{R}} |f_h(x) - f(x)|\,dx$ to zero
can be arbitrarily slow, even though $f$ is bounded in $L^1$. The problem
is that as $N$ gets large, it becomes increasingly difficult to
approximate $f$ well by a “nice” function, by which we mean a uniformly
continuous function with a reasonable modulus of continuity, due to the
increasingly oscillatory nature of $f$. See [Ta2008, §1.4] for some
further discussion of this issue, and what quantitative substitutes are
available for such qualitative results.
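The oscillatory example in Remark 1.6.15 can be checked numerically. The sketch below estimates $\int_{\mathbf{R}} |f_{\pi/N}(x) - f(x)|\,dx$ for $f(x) = \sin(Nx) 1_{[0,2\pi]}(x)$ with a midpoint rule; the helper name, the window $[-1,10]$ and the sample count are assumptions of this illustration. Since $\sin(N(x - \pi/N)) = -\sin(Nx)$, a hand computation suggests the value is $8$ for every integer $N \geq 1$, which is what the sketch tests.

```python
import math

def shifted_l1_distance(N, lo=-1.0, hi=10.0, samples=200_000):
    """Estimate the integral of |f(x - h) - f(x)| for f = sin(N x) on [0, 2 pi], h = pi / N."""
    h = math.pi / N

    def f(x):
        return math.sin(N * x) if 0.0 <= x <= 2.0 * math.pi else 0.0

    width = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        x = lo + (i + 0.5) * width
        total += abs(f(x - h) - f(x))
    return total * width

# The window [-1, 10] contains the supports of f and of every shift f_h with h <= pi.
results = {N: shifted_l1_distance(N) for N in (1, 10, 100)}
print(results)
```

The distance stays bounded away from zero at the shift $h = \pi/N$ no matter how large $N$ is, so no bound on $\delta$ in terms of $\varepsilon$ and the $L^1$ norm of $f$ alone is possible.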
Now we return to the Lebesgue differentiation theorem, and apply the
density argument. The dense subclass result is already contained in
Corollary 1.6.10, which asserts that (1.20) holds for all continuous
functions $f$. The quantitative estimate we will need is the following
special case of the Hardy-Littlewood maximal inequality:
Lemma 1.6.16 (One-sided Hardy-Littlewood maximal inequality). Let
$f: \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and
let $\lambda > 0$. Then
\[ m\left(\left\{x \in \mathbf{R} : \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt \geq \lambda\right\}\right) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt. \]
We will prove this lemma shortly, but let us first see how this,
combined with the dense subclass result, will give the Lebesgue
differentiation theorem. Let $f: \mathbf{R} \to \mathbf{C}$ be absolutely
integrable, and let $\varepsilon, \lambda > 0$ be arbitrary. Then by
Littlewood’s second principle, we can find a function
$g: \mathbf{R} \to \mathbf{C}$ which is continuous and compactly
supported, with
\[ \int_{\mathbf{R}} |f(x) - g(x)|\,dx \leq \varepsilon. \]
Applying the one-sided Hardy-Littlewood maximal inequality, we conclude
that
\[ m\left(\left\{x \in \mathbf{R} : \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} |f(t) - g(t)|\,dt \geq \lambda\right\}\right) \leq \frac{\varepsilon}{\lambda}. \]
In a similar spirit, from Markov’s inequality (Lemma 1.3.15) we have
\[ m(\{x \in \mathbf{R} : |f(x) - g(x)| \geq \lambda\}) \leq \frac{\varepsilon}{\lambda}. \]
By subadditivity, we conclude that for all $x \in \mathbf{R}$ outside of
a set $E$ of measure at most $2\varepsilon/\lambda$, one has both
(1.23) \[ \frac{1}{h} \int_{[x,x+h]} |f(t) - g(t)|\,dt < \lambda \]
and
(1.24) \[ |f(x) - g(x)| < \lambda \]
for all $h > 0$.
Now let $x \in \mathbf{R} \setminus E$. From the dense subclass result
(Corollary 1.6.10) applied to the continuous function $g$, we have
\[ \left| \frac{1}{h} \int_{[x,x+h]} g(t)\,dt - g(x) \right| < \lambda \]
whenever $h > 0$ is sufficiently close to zero. Combining this with
(1.23), (1.24), and the triangle inequality, we conclude that
\[ \left| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \right| < 3\lambda \]
for all $h > 0$ sufficiently close to zero. In particular we have
\[ \limsup_{h \to 0} \left| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \right| \leq 3\lambda \]
for all $x$ outside of a set of measure at most $2\varepsilon/\lambda$.
Keeping $\lambda$ fixed and sending $\varepsilon$ to zero, we conclude
that
\[ \limsup_{h \to 0} \left| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \right| \leq 3\lambda \]
for almost every $x \in \mathbf{R}$. If we then let $\lambda$ go to zero
along a countable sequence (e.g. $\lambda := 1/n$ for $n = 1, 2, \ldots$),
we conclude that
\[ \limsup_{h \to 0} \left| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \right| = 0 \]
for almost every $x \in \mathbf{R}$, and the claim follows.
The only remaining task is to establish the one-sided Hardy-Littlewood
maximal inequality. We will do so by using the rising sun lemma:
Lemma 1.6.17 (Rising sun lemma). Let $[a,b]$ be a compact interval, and
let $F: [a,b] \to \mathbf{R}$ be a continuous function. Then one can find
an at most countable family of disjoint non-empty open intervals
$I_n = (a_n, b_n)$ in $[a,b]$ with the following properties:
(i) For each $n$, either $F(a_n) = F(b_n)$, or else $a_n = a$ and
$F(b_n) \geq F(a_n)$.
(ii) If $x \in [a,b]$ does not lie in any of the intervals $I_n$, then
one must have $F(y) \leq F(x)$ for all $x \leq y \leq b$.
Remark 1.6.18. To explain the name “rising sun lemma”, imagine the graph
$\{(x, F(x)) : x \in [a,b]\}$ of $F$ as depicting a hilly landscape, with
the sun shining horizontally from the rightward infinity $(+\infty, 0)$
(or rising from the east, if you will). Those $x$ for which
$F(y) \leq F(x)$ for all $x \leq y \leq b$ are the locations on the
landscape which are illuminated by the sun. The intervals $I_n$ then
represent the portions of the landscape that are in shadow. The reader
is encouraged to draw a picture (see footnote 14) to illustrate this
perspective.
Footnote 14. Author’s note: I have deliberately omitted including such
pictures in the text, as I feel that it is far more instructive and
useful for the reader to directly create a personalised visual aid for
these results.
This lemma is proven using the following basic fact:
Exercise 1.6.10. Show that any open subset $U$ of $\mathbf{R}$ can be
written as the union of at most countably many disjoint non-empty open
intervals, whose endpoints lie outside of $U$. (Hint: first show that
every $x$ in $U$ is contained in a maximal open subinterval $(a,b)$ of
$U$, and that these maximal open subintervals are disjoint, with each
such interval containing at least one rational number.)
Proof. (Proof of rising sun lemma) Let $U$ be the set of all
$x \in (a,b)$ such that $F(y) > F(x)$ for at least one $x < y < b$. As
$F$ is continuous, $U$ is open, and so $U$ is the union of at most
countably many disjoint non-empty open intervals $I_n = (a_n, b_n)$,
with the endpoints $a_n, b_n$ lying outside of $U$.
The second conclusion of the rising sun lemma is clear from
construction, so it suffices to establish the first. Suppose first that
$I_n = (a_n, b_n)$ is such that $a_n \neq a$. As the endpoint $a_n$ does
not lie in $U$, we must have $F(y) \leq F(a_n)$ for all
$a_n \leq y \leq b$; similarly we have $F(y) \leq F(b_n)$ for all
$b_n \leq y \leq b$. In particular we have $F(b_n) \leq F(a_n)$. By the
continuity of $F$, it will then suffice to show that
$F(b_n) \geq F(t)$ for all $a_n < t < b_n$.
Suppose for contradiction that there was $a_n < t < b_n$ with
$F(b_n) < F(t)$. Let $A := \{s \in [t,b] : F(s) \geq F(t)\}$, then $A$ is
a closed set that contains $t$ but is disjoint from $[b_n, b]$ (since
$F(s) \leq F(b_n) < F(t)$ for all $b_n \leq s \leq b$). Set
$t_* := \sup(A)$, then $t_* \in [t, b_n) \subset I_n \subset U$, and thus
there exists $t_* < y \leq b$ such that $F(y) > F(t_*)$. Since
$F(t_*) \geq F(t) > F(b_n)$, and $F(b_n) \geq F(z)$ for all
$b_n \leq z \leq b$, we see that $y$ cannot exceed $b_n$, and thus lies
in $A$, but this contradicts the fact that $t_*$ is the supremum of $A$.
The case when $a_n = a$ is similar and is left to the reader; the only
difference is that we can no longer assert that $F(y) \leq F(a_n)$ for
all $a_n \leq y \leq b$, and so do not have the upper bound
$F(b_n) \leq F(a_n)$.
Now we can prove the one-sided Hardy-Littlewood maximal inequality. By
upwards monotonicity, it will suffice to show that
\[ m\left(\left\{x \in [a,b] : \sup_{h > 0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt \geq \lambda\right\}\right) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt \]
for any compact interval $[a,b]$. By modifying $\lambda$ by an epsilon,
we may replace the non-strict inequality here with strict inequality:
(1.25) \[ m\left(\left\{x \in [a,b] : \sup_{h > 0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda\right\}\right) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt. \]
Fix $[a,b]$. We apply the rising sun lemma to the function
$F: [a,b] \to \mathbf{R}$ defined as
\[ F(x) := \int_{[a,x]} |f(t)|\,dt - (x-a)\lambda. \]
By Exercise 1.6.5, $F$ is continuous, and so we can find an at most
countable sequence of intervals $I_n = (a_n, b_n)$ with the properties
given by the rising sun lemma. From the second property of that lemma,
we observe that
\[ \left\{x \in [a,b] : \sup_{h > 0; [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda\right\} \subset \bigcup_n I_n, \]
since the property $\frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda$ can
be rearranged as $F(x+h) > F(x)$. By countable additivity, we may thus
upper bound the left-hand side of (1.25) by $\sum_n (b_n - a_n)$. On the
other hand, since $F(b_n) - F(a_n) \geq 0$, we have
\[ \int_{I_n} |f(t)|\,dt \geq \lambda (b_n - a_n) \]
and thus
\[ \sum_n (b_n - a_n) \leq \frac{1}{\lambda} \sum_n \int_{I_n} |f(t)|\,dt. \]
As the $I_n$ are disjoint intervals in $[a,b]$, we may apply monotone
convergence and monotonicity to conclude that
\[ \sum_n \int_{I_n} |f(t)|\,dt \leq \int_{[a,b]} |f(t)|\,dt, \]
and the claim follows.
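As a sanity check on Lemma 1.6.16, here is a short worked example (an illustration added under the assumption $f = 1_{[0,1]}$ and $0 < \lambda \leq 1$; it is not taken from the text), in which the one-sided maximal function can be computed by hand.

```latex
% Worked example: f = 1_{[0,1]}, 0 < \lambda \le 1.
% Write f^\sharp(x) := \sup_{h>0} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt  (notation for this example only).
\begin{align*}
  f^\sharp(x) &= \sup_{h>0} \frac{m([x,x+h] \cap [0,1])}{h}
   = \begin{cases}
       \dfrac{1}{1-x}, & x < 0 \quad (\text{supremum attained at } h = 1-x),\\[4pt]
       1, & 0 \le x < 1 \quad (\text{take } h \le 1-x),\\[4pt]
       0, & x \ge 1.
     \end{cases}\\
  \{x : f^\sharp(x) \ge \lambda\} &= [\,1 - 1/\lambda,\ 1\,),
  \qquad
  m(\{x : f^\sharp(x) \ge \lambda\}) = \frac{1}{\lambda}
   = \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt.
\end{align*}
```

So for this $f$ and this range of $\lambda$ the inequality holds with equality, which shows that the constant $1/\lambda$ in the lemma cannot be improved; for $\lambda > 1$ the set on the left is empty and the inequality is trivial.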
Exercise 1.6.11 (Two-sided Hardy-Littlewood maximal inequality). Let
$f: \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and
let $\lambda > 0$. Show that
\[ m\left(\left\{x \in \mathbf{R} : \sup_{x \in I} \frac{1}{|I|} \int_I |f(t)|\,dt \geq \lambda\right\}\right) \leq \frac{2}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt, \]
where the supremum ranges over all intervals $I$ of positive length that
contain $x$.
Exercise 1.6.12 (Rising sun inequality). Let
$f: \mathbf{R} \to \mathbf{R}$ be an absolutely integrable function, and
let $f^*: \mathbf{R} \to \mathbf{R}$ be the one-sided signed
Hardy-Littlewood maximal function
\[ f^*(x) := \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt. \]
Establish the rising sun inequality
\[ \lambda\, m(\{f^*(x) > \lambda\}) \leq \int_{x : f^*(x) > \lambda} f(x)\,dx \]
for all real $\lambda$ (note here that we permit $\lambda$ to be zero or
negative), and show that this inequality implies Lemma 1.6.16. (Hint:
First do the $\lambda = 0$ case, by invoking the rising sun lemma.) See
[Ta2009, §2.9] for some further discussion of inequalities of this type,
and applications to ergodic theory (and in particular the maximal
ergodic theorem).
Exercise 1.6.13. Show that the left and right-hand sides in Lemma 1.6.16
are in fact equal. (Hint: one may first wish to try this in the case
when $f$ has compact support, in which case one can apply the rising sun
lemma to a sufficiently large interval containing the support of $f$.)
1.6.2. The Lebesgue differentiation theorem in higher dimensions. Now we
extend the Lebesgue differentiation theorem to higher dimensions.
Theorem 1.6.11 does not have an obvious high-dimensional analogue, but
Theorem 1.6.12 does:
Theorem 1.6.19 (Lebesgue differentiation theorem in general dimension).
Let $f: \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable
function. Then for almost every $x \in \mathbf{R}^d$, one has
(1.26) \[ \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0 \]
and
\[ \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\,dy = f(x), \]
where $B(x,r) := \{y \in \mathbf{R}^d : |x - y| < r\}$ is the open ball of
radius $r$ centred at $x$.
From the triangle inequality we see that
\[ \left| \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\,dy - f(x) \right| = \left| \frac{1}{m(B(x,r))} \int_{B(x,r)} (f(y) - f(x))\,dy \right| \leq \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy, \]
so we see that the first conclusion of Theorem 1.6.19 implies the
second. A point $x$ for which (1.26) holds is called a Lebesgue point of
$f$; thus, for an absolutely integrable function $f$, almost every point
in $\mathbf{R}^d$ will be a Lebesgue point for $f$.
Exercise 1.6.14. Call a function $f: \mathbf{R}^d \to \mathbf{C}$ locally
integrable if, for every $x \in \mathbf{R}^d$, there exists an open
neighbourhood of $x$ on which $f$ is absolutely integrable.
(i) Show that $f$ is locally integrable if and only if
$\int_{B(0,r)} |f(x)|\,dx < \infty$ for all $r > 0$.
(ii) Show that Theorem 1.6.19 implies a generalisation of itself in
which the condition of absolute integrability of $f$ is weakened to
local integrability.
Exercise 1.6.15. For each $h > 0$, let $E_h$ be a subset of $B(0,h)$ with
the property that $m(E_h) \geq c\, m(B(0,h))$ for some $c > 0$ independent
of $h$. Show that if $f: \mathbf{R}^d \to \mathbf{C}$ is locally
integrable, and $x$ is a Lebesgue point of $f$, then
\[ \lim_{h \to 0} \frac{1}{m(E_h)} \int_{x + E_h} f(y)\,dy = f(x). \]
Conclude that Theorem 1.6.19 implies Theorem 1.6.12.
To prove Theorem 1.6.19, we use the density argument. The dense subclass
case is easy:
Exercise 1.6.16. Show that Theorem 1.6.19 holds whenever $f$ is
continuous.
The quantitative estimate needed is the following:
Theorem 1.6.20 (Hardy-Littlewood maximal inequality). Let
$f: \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function,
and let $\lambda > 0$. Then
\[ m\left(\left\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy \geq \lambda\right\}\right) \leq \frac{C_d}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy \]
for some constant $C_d > 0$ depending only on $d$.
Remark 1.6.21. The expression
$\sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy$ is known as the
Hardy-Littlewood maximal function of $f$, and is often denoted $Mf(x)$.
It is an important function in the field of (real-variable) harmonic
analysis.
Exercise 1.6.17. Use the density argument to show that Theorem 1.6.20
implies Theorem 1.6.19.
In the one-dimensional case, this estimate was established via the
rising sun lemma. Unfortunately, that lemma relied heavily on the
ordered nature of $\mathbf{R}$, and does not have an obvious analogue in
higher dimensions. Instead, we will use the following covering lemma.
Given an open ball $B = B(x,r)$ in $\mathbf{R}^d$ and a real number
$c > 0$, we write $cB := B(x, cr)$ for the ball with the same centre as
$B$, but $c$ times the radius. (Note that this is slightly different from
the set $c \cdot B := \{cy : y \in B\}$; why?) Note that
$|cB| = c^d |B|$ for any open ball $B \subset \mathbf{R}^d$ and any
$c > 0$.
Lemma 1.6.22 (Vitali-type covering lemma). Let $B_1, \ldots, B_n$ be a
finite collection of open balls in $\mathbf{R}^d$ (not necessarily
disjoint). Then there exists a subcollection $B'_1, \ldots, B'_m$ of
disjoint balls in this collection, such that
(1.27) \[ \bigcup_{i=1}^n B_i \subset \bigcup_{j=1}^m 3B'_j. \]
In particular, by finite subadditivity,
\[ m\left(\bigcup_{i=1}^n B_i\right) \leq 3^d \sum_{j=1}^m m(B'_j). \]
Proof. We use a greedy algorithm argument, selecting the balls $B'_i$ to
be as large as possible while remaining disjoint. More precisely, we run
the following algorithm:
Step 0. Initialise $m = 0$ (so that, initially, there are no balls
$B'_1, \ldots, B'_m$ in the desired collection).
Step 1. Consider all the balls $B_j$ that do not already intersect one
of the $B'_1, \ldots, B'_m$ (so, initially, all of the balls
$B_1, \ldots, B_n$ will be considered). If there are no such balls, STOP.
Otherwise, go on to Step 2.
Step 2. Locate the largest ball $B_j$ that does not already intersect
one of the $B'_1, \ldots, B'_m$. (If there are multiple largest balls
with exactly the same radius, break the tie arbitrarily.) Add this ball
to the collection $B'_1, \ldots, B'_m$ by setting $B'_{m+1} := B_j$ and
then incrementing $m$ to $m+1$. Then return to Step 1.
Note that at each iteration of this algorithm, the number of available
balls amongst the $B_1, \ldots, B_n$ drops by at least one (since each
ball selected certainly intersects itself and so cannot be selected
again). So this algorithm terminates in finite time. It is also clear
from construction that the $B'_1, \ldots, B'_m$ are a subcollection of
the $B_1, \ldots, B_n$ consisting of disjoint balls. So the only task
remaining is to verify that (1.27) holds at the completion of the
algorithm, i.e. to show that each ball $B_i$ in the original collection
is covered by the triples $3B'_j$ of the subcollection.
For this, we argue as follows. Take any ball $B_i$ in the original
collection. Because the algorithm only halts when there are no more
balls that are disjoint from the $B'_1, \ldots, B'_m$, the ball $B_i$
must intersect at least one of the balls $B'_j$ in the subcollection.
Let $B'_j$ be the first ball with this property, thus $B_i$ is disjoint
from $B'_1, \ldots, B'_{j-1}$, but intersects $B'_j$. Because $B'_j$ was
chosen to be largest amongst all balls that did not intersect
$B'_1, \ldots, B'_{j-1}$, we conclude that the radius of $B_i$ cannot
exceed that of $B'_j$. From the triangle inequality, this implies that
$B_i \subset 3B'_j$, and the claim follows.
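The greedy selection in the proof can be sketched in code as follows. This is a minimal illustration, not the text's own formulation: balls are represented as hypothetical `(centre, radius)` pairs, and instead of looping through Steps 1 and 2 the sketch scans the balls once in order of decreasing radius, which selects the same subcollection (up to tie-breaking).

```python
import math

def intersects(ball1, ball2):
    """Two open balls meet exactly when their centres are closer than the sum of the radii."""
    (c1, r1), (c2, r2) = ball1, ball2
    return math.dist(c1, c2) < r1 + r2

def vitali_subcollection(balls):
    """Greedily pick disjoint balls, always taking the largest ball disjoint from those chosen so far."""
    chosen = []
    for ball in sorted(balls, key=lambda b: b[1], reverse=True):
        if not any(intersects(ball, other) for other in chosen):
            chosen.append(ball)
    return chosen

def covered_by_triples(ball, chosen):
    """Check that ball is contained in 3B' for some chosen B' (via the triangle inequality)."""
    c, r = ball
    return any(math.dist(c, c2) + r <= 3 * r2 for (c2, r2) in chosen)

balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 0.8), ((4.0, 0.0), 0.5), ((0.0, 1.2), 0.3)]
chosen = vitali_subcollection(balls)
print(chosen)  # [((0.0, 0.0), 1.0), ((4.0, 0.0), 0.5)]
print(all(covered_by_triples(b, chosen) for b in balls))  # True
```

The containment test `dist + r <= 3 * r'` is sufficient for $B_i \subset 3B'_j$, and it is exactly what the proof obtains when $B_i$ meets $B'_j$ and has radius at most that of $B'_j$.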
Exercise 1.6.18. Technically speaking, the above algorithmic argument
was not phrased in the standard language of formal mathematical
deduction, because in that language, any mathematical object (such as
the natural number $m$) can only be defined once, and not redefined
multiple times as is done in most algorithms. Rewrite the above argument
in a way that avoids redefining any variable. (Hint: introduce a “time”
variable $t$, and recursively construct families
$B'_{1,t}, \ldots, B'_{m_t,t}$ of balls that represent the outcome of the
above algorithm after $t$ iterations (or $t_*$ iterations, if the
algorithm halted at some previous time $t_* < t$). For this particular
algorithm, there are also more ad hoc approaches that exploit the
relatively simple nature of the algorithm to allow for a less
notationally complicated construction.) More generally, it is possible
to use this time parameter trick to convert any construction involving a
provably terminating algorithm into a construction that does not
redefine any variable. (It is however dangerous to work with any
algorithm that has an infinite run time, unless one has a suitably
strong convergence result for the algorithm that allows one to take
limits, either in the classical sense or in the more general sense of
jumping to limit ordinals; in the latter case, one needs to use
transfinite induction in order to ensure that the use of such algorithms
is rigorous; see §2.4 of An epsilon of room, Vol. I.)
Remark 1.6.23. The actual Vitali covering lemma [Vi1908] is slightly
different to this one, but we will not need it here. Actually there is a
family of related covering lemmas which are useful for a variety of
tasks in harmonic analysis, see for instance [deG1981] for further
discussion.
Now we can prove the Hardy-Littlewood inequality, which we will do with
the constant $C_d := 3^d$. It suffices to verify the claim with strict
inequality,
\[ m\left(\left\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda\right\}\right) \leq \frac{C_d}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy \]
as the non-strict case then follows by perturbing $\lambda$ slightly and
then taking limits.
Fix $f$ and $\lambda$. By inner regularity, it suffices to show that
\[ m(K) \leq \frac{3^d}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy \]
whenever $K$ is a compact set that is contained in
$\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda\}$.
By construction, for every $x \in K$, there exists an open ball $B(x,r)$
such that
(1.28) \[ \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda. \]
By compactness of $K$, we can cover $K$ by a finite number
$B_1, \ldots, B_n$ of such balls. Applying the Vitali-type covering
lemma, we can find a subcollection $B'_1, \ldots, B'_m$ of disjoint balls
such that
\[ m\left(\bigcup_{i=1}^n B_i\right) \leq 3^d \sum_{j=1}^m m(B'_j). \]
By (1.28), on each ball $B'_j$ we have
\[ m(B'_j) < \frac{1}{\lambda} \int_{B'_j} |f(y)|\,dy; \]
summing in $j$ and using the disjointness of the $B'_j$ we conclude that
\[ m\left(\bigcup_{i=1}^n B_i\right) \leq \frac{3^d}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy. \]
Since the $B_1, \ldots, B_n$ cover $K$, we obtain Theorem 1.6.20 as
desired.
Exercise 1.6.19. Improve the constant $3^d$ in the Hardy-Littlewood
maximal inequality to $2^d$. (Hint: observe that with the construction
used to prove the Vitali-type covering lemma, the centres of the balls
$B_i$ are contained in $\bigcup_{j=1}^m 2B'_j$ and not just in
$\bigcup_{j=1}^m 3B'_j$. To exploit this observation one may need to
first create an epsilon of room, as the centers are not by themselves
sufficient to cover the required set.)
Remark 1.6.24. The optimal value of $C_d$ is not known in general,
although a fairly recent result of Melas [Me2003] gives the surprising
conclusion that the optimal value of $C_1$ is
$C_1 = \frac{11 + \sqrt{61}}{12} = 1.56\ldots$. It is known that $C_d$
grows at most linearly in $d$, thanks to a result of Stein and Strömberg
[StSt1983], but it is not known if $C_d$ is bounded in $d$ or grows as
$d \to \infty$.
Exercise 1.6.20 (Dyadic maximal inequality). If
$f: \mathbf{R}^d \to \mathbf{C}$ is an absolutely integrable function and
$\lambda > 0$, establish the dyadic Hardy-Littlewood maximal inequality
\[ m\left(\left\{x \in \mathbf{R}^d : \sup_{x \in Q} \frac{1}{|Q|} \int_Q |f(y)|\,dy \geq \lambda\right\}\right) \leq \frac{1}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy \]
where the supremum ranges over all dyadic cubes $Q$ that contain $x$.
(Hint: the nesting property of dyadic cubes will be useful when it comes
to the covering lemma stage of the argument, much as it was in Exercise
1.1.14.)
Exercise 1.6.21 (Besicovitch covering lemma in one dimension). Let
$I_1, \ldots, I_n$ be a finite family of open intervals in $\mathbf{R}$
(not necessarily disjoint). Show that there exists a subfamily
$I'_1, \ldots, I'_m$ of intervals such that
(i) $\bigcup_{i=1}^n I_i = \bigcup_{j=1}^m I'_j$; and
(ii) Each point $x \in \mathbf{R}$ is contained in at most two of the
$I'_j$.
(Hint: First refine the family of intervals so that no interval $I_i$ is
contained in the union of the other intervals. At that point, show that
it is no longer possible for a point to be contained in three of the
intervals.) There is a variant of this lemma that holds in higher
dimensions, known as the Besicovitch covering lemma.
Exercise 1.6.22. Let $\mu$ be a Borel measure (i.e. a countably additive
measure on the Borel $\sigma$-algebra) on $\mathbf{R}$, such that
$0 < \mu(I) < \infty$ for every interval $I$ of positive length. Assume
that $\mu$ is inner regular, in the sense that
$\mu(E) = \sup_{K \subset E, \text{ compact}} \mu(K)$ for every Borel
measurable set $E$. (As it turns out, from the theory of Radon measures,
all locally finite Borel measures have this property, but we will not
prove this here; see §1.10 of An epsilon of room, Vol. I.) Establish the
Hardy-Littlewood maximal inequality
\[ \mu\left(\left\{x \in \mathbf{R} : \sup_{x \in I} \frac{1}{\mu(I)} \int_I |f(y)|\,d\mu(y) \geq \lambda\right\}\right) \leq \frac{2}{\lambda} \int_{\mathbf{R}} |f(y)|\,d\mu(y) \]
for any absolutely integrable function $f \in L^1(\mu)$, where the
supremum ranges over all open intervals $I$ that contain $x$. Note that
this essentially generalises Exercise 1.6.11, in which $\mu$ is replaced
by Lebesgue measure. (Hint: Repeat the proof of the usual
Hardy-Littlewood maximal inequality, but use the Besicovitch covering
lemma in place of the Vitali-type covering lemma. Why do we need the
former lemma here instead of the latter?)
Exercise 1.6.23 (Cousin’s theorem). Prove Cousin’s theorem: given any
function $\delta: [a,b] \to (0, +\infty)$ on a compact interval $[a,b]$
of positive length, there exists a partition
$a = t_0 < t_1 < \ldots < t_k = b$ with $k \geq 1$, together with real
numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \leq j \leq k$, such that
$t_j - t_{j-1} \leq \delta(t_j^*)$ for each $1 \leq j \leq k$. (Hint: use
the Heine-Borel theorem, which asserts that any open cover of $[a,b]$
has a finite subcover, followed by the Besicovitch covering lemma.) This
theorem is useful in a variety of applications related to the second
fundamental theorem of calculus, as we shall see below. The positive
function $\delta$ is known as a gauge function.
Now we turn to consequences of the Lebesgue differentiation theorem.
Given a Lebesgue measurable set $E \subset \mathbf{R}^d$, call a point
$x \in \mathbf{R}^d$ a point of density for $E$ if
$\frac{m(E \cap B(x,r))}{m(B(x,r))} \to 1$ as $r \to 0$. Thus, for
instance, if $E = [-1,1] \setminus \{0\}$, then every point in $(-1,1)$
(including the boundary point $0$) is a point of density for $E$, but
the endpoints $-1, 1$ (as well as the exterior of $E$) are not points of
density. One can think of a point of density as being an “almost
interior” point of $E$; it is not necessarily the case that one can fit
a small ball $B(x,r)$ centred at $x$ inside of $E$, but one can fit most
of that small ball inside $E$.
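The example $E = [-1,1] \setminus \{0\}$ can be checked directly in one dimension, where $B(x,r) = (x-r, x+r)$ and removing the single point $0$ does not change any measure. The sketch below (the function name and sample radii are assumptions of this illustration) evaluates the ratio $m(E \cap B(x,r)) / m(B(x,r))$ at an interior point, at an endpoint, and at an exterior point.

```python
def density_ratio(x, r, lo=-1.0, hi=1.0):
    """m(E ∩ (x - r, x + r)) / (2r) for E equal to [lo, hi] up to a null set."""
    overlap = max(0.0, min(hi, x + r) - max(lo, x - r))
    return overlap / (2.0 * r)

radii = (0.1, 0.01, 0.001)
at_zero = [density_ratio(0.0, r) for r in radii]       # the removed point 0
at_endpoint = [density_ratio(1.0, r) for r in radii]   # the endpoint 1
at_exterior = [density_ratio(2.0, r) for r in radii]   # a point outside E

print(at_zero, at_endpoint, at_exterior)
```

The ratio is $1$ at $0$, stays near $1/2$ at the endpoint $1$ (only half of each small ball lies in $E$), and is $0$ at the exterior point, matching the classification described above.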
Exercise 1.6.24. If $E \subset \mathbf{R}^d$ is Lebesgue measurable, show
that almost every point in $E$ is a point of density for $E$, and almost
every point in the complement of $E$ is not a point of density for $E$.
154 1. Measure theory
Exercise 1.6.25. Let E ⊂ R
d
be a measurable set of positive mea
sure, and let ε > 0.
(i) Using Exercise 1.6.15 and Exercise 1.6.24, show that there
exists a cube Q ⊂ R
d
of positive sidelength such that m(E∩
Q) > (1 −ε)m(Q).
(ii) Give an alternate proof of the above claim that avoids the
Lebesgue diﬀerentiation theorem. (Hint: reduce to the case
when E is bounded, then approximate E by an almost dis
joint union of cubes.)
(iii) Use the above result to give an alternate proof of the Stein
haus theorem (Exercise 1.6.8).
Of course, one can replace cubes here by other comparable shapes,
such as balls. (Indeed, a good principle to adopt in analysis is that
cubes and balls are "equivalent up to constants", in that a cube of
some sidelength can be contained in a ball of comparable radius, and
vice versa. This type of mental equivalence is analogous to, though
not identical with, the famous dictum that a topologist cannot
distinguish a doughnut from a coffee cup.)
Exercise 1.6.26. (i) Give an example of a compact set $K \subset \mathbf{R}$
of positive measure such that $m(K \cap I) < |I|$ for every
interval I of positive length. (Hint: first construct an open
dense subset of [0, 1] of measure strictly less than 1.)
(ii) Give an example of a measurable set $E \subset \mathbf{R}$ such that
$0 < m(E \cap I) < |I|$ for every interval I of positive length.
(Hint: first work in a bounded interval, such as (−1, 2). The
complement of the set K in the first example is the union of
at most countably many open intervals, thanks to Exercise
1.6.10. Now fill in these open intervals and iterate.)
Exercise 1.6.27 (Approximations to the identity). Define a good
kernel¹⁵ to be a measurable function $P : \mathbf{R}^d \to \mathbf{R}^+$ which is
non-negative, radial (which means that there is a function
$\tilde P : [0,+\infty) \to \mathbf{R}^+$ such that $P(x) = \tilde P(|x|)$), radially nonincreasing
(so that $\tilde P$ is a nonincreasing function), and has total mass
$\int_{\mathbf{R}^d} P(x)\, dx$ equal to 1. The functions $P_t(x) := \frac{1}{t^d} P(\frac{x}{t})$ for t > 0
are then said to be a good family of approximations to the identity.
¹⁵ Different texts have slightly different notions of what a good kernel is; the
"right" class of kernels to consider depends to some extent on what type of convergence
results one is interested in (e.g. almost everywhere convergence, convergence in $L^1$ or
$L^\infty$ norm, etc.), and on what hypotheses one wishes to place on the original function f.
(i) Show that the heat kernels¹⁶ $P_t(x) := \frac{1}{(4\pi t^2)^{d/2}} e^{-|x|^2/4t^2}$ and
Poisson kernels $P_t(x) := c_d \frac{t}{(t^2+|x|^2)^{(d+1)/2}}$ are good families
of approximations to the identity, if the constant $c_d > 0$ is
chosen correctly (in fact one has $c_d = \Gamma((d+1)/2)/\pi^{(d+1)/2}$,
but you are not required to establish this).
(ii) Show that if P is a good kernel, then
$$c_d < \sum_{n=-\infty}^{\infty} 2^{dn} \tilde P(2^n) \le C_d$$
for some constants $0 < c_d < C_d$ depending only on d. (Hint:
compare P with such "horizontal wedding cake" functions
as $\sum_{n=-\infty}^{\infty} 1_{2^{n-1} < |x| \le 2^n} \tilde P(2^n)$.)
(iii) Establish the quantitative upper bound
$$\Big|\int_{\mathbf{R}^d} f(y) P_t(x-y)\, dy\Big| \le C'_d \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\, dy$$
for any absolutely integrable function f and some constant
$C'_d > 0$ depending only on d.
(iv) Show that if $f : \mathbf{R}^d \to \mathbf{C}$ is absolutely integrable and x is a
Lebesgue point of f, then the convolution
$$f * P_t(x) := \int_{\mathbf{R}^d} f(y) P_t(x-y)\, dy$$
converges to f(x) as t → 0. (Hint: split f(y) as the sum
of f(x) and f(y) − f(x).) In particular, $f * P_t$ converges
pointwise almost everywhere to f.
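The behaviour described in this exercise can be observed numerically. The following Python sketch treats only the one-dimensional heat kernel (d = 1) and the test function f = 1_[0,1]; the midpoint-rule helper `midpoint_integral`, the truncation of the real line to [−20t, 20t], and the sample points are assumptions made for illustration, not part of the exercise.

```python
import math


def heat_kernel(t, x):
    # Case d = 1 of the heat kernel from part (i), with t^2 in place of t.
    return math.exp(-x * x / (4 * t * t)) / math.sqrt(4 * math.pi * t * t)


def midpoint_integral(g, lo, hi, n=20000):
    # Plain midpoint rule; accurate enough here for smooth integrands.
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


def smoothed_indicator(t, x):
    # (f * P_t)(x) for f = 1_[0,1]: integrate P_t(x - y) over y in [0, 1].
    return midpoint_integral(lambda y: heat_kernel(t, x - y), 0.0, 1.0)


for t in (1.0, 0.1, 0.01):
    mass = midpoint_integral(lambda x: heat_kernel(t, x), -20.0 * t, 20.0 * t)
    print(t, mass, smoothed_indicator(t, 0.25), smoothed_indicator(t, 0.0))
```

The total mass stays at 1 for every t, and at the Lebesgue point x = 0.25 the convolution tends to f(0.25) = 1 as t decreases. At x = 0, which is not a Lebesgue point of this f, the values tend to 1/2 rather than to f(0) = 1, so the Lebesgue point hypothesis in part (iv) cannot simply be dropped.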
¹⁶ Note that we have modified the usual formulation of the heat kernel by replacing
t with $t^2$ in order to make it conform to the notational conventions used in this exercise.
1.6.3. Almost everywhere differentiability. As we see in undergraduate
real analysis, not every continuous function $f : \mathbf{R} \to \mathbf{R}$ is
differentiable, with the standard example being the absolute value
function f(x) := |x|, which is continuous but not differentiable at the
origin x = 0. Of course, this function is still almost everywhere
differentiable. With a bit more effort, one can construct continuous
functions that are in fact nowhere differentiable:
Exercise 1.6.28 (Weierstrass function). Let $F : \mathbf{R} \to \mathbf{R}$ be the
function
$$F(x) := \sum_{n=1}^{\infty} 4^{-n} \sin(8^n \pi x).$$
(i) Show that F is well-defined (in the sense that the series is
absolutely convergent) and that F is a bounded continuous
function.
(ii) Show that for every 8-dyadic interval $[\frac{j}{8^n}, \frac{j+1}{8^n}]$ with $n \ge 1$,
one has $|F(\frac{j+1}{8^n}) - F(\frac{j}{8^n})| \ge c 4^{-n}$ for some absolute constant
c > 0.
(iii) Show that F is not differentiable at any point $x \in \mathbf{R}$. (Hint:
argue by contradiction and use the previous part of this
exercise.) Note that it is not enough to formally differentiate
the series term by term and observe that the resulting series
is divergent; why not?
The difficulty here is that a continuous function can still contain a
large amount of oscillation, which can lead to breakdown of
differentiability. However, if one can somehow limit the amount of oscillation
present, then one can often recover a fair bit of differentiability. For
instance, we have
Theorem 1.6.25 (Monotone diﬀerentiation theorem). Any function
F : R → R which is monotone (either monotone nondecreasing or
monotone nonincreasing) is diﬀerentiable almost everywhere.
Exercise 1.6.29. Show that every monotone function is measurable.
To prove this theorem, we just treat the case when F is monotone
nondecreasing, as the nonincreasing case is similar (and can be
deduced from the nondecreasing case by replacing F with −F).
We also first focus on the case when F is continuous, as this allows
us to use the rising sun lemma. To understand the differentiability of
F, we introduce the four Dini derivatives of F at x:
(i) The upper right derivative $\overline{D^+}F(x) := \limsup_{h \to 0^+} \frac{F(x+h)-F(x)}{h}$;
(ii) The lower right derivative $\underline{D^+}F(x) := \liminf_{h \to 0^+} \frac{F(x+h)-F(x)}{h}$;
(iii) The upper left derivative $\overline{D^-}F(x) := \limsup_{h \to 0^-} \frac{F(x+h)-F(x)}{h}$;
(iv) The lower left derivative $\underline{D^-}F(x) := \liminf_{h \to 0^-} \frac{F(x+h)-F(x)}{h}$.
Regardless of whether F is differentiable or not (or even whether F
is continuous or not), the four Dini derivatives always exist and take
values in the extended real line [−∞, ∞]. (If F is only defined on an
interval [a, b], rather than on all of R, then some of the Dini
derivatives may not exist at the endpoints, but this is a measure zero
set and will not impact our analysis.)
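To see how the Dini derivatives can differ from one another, consider the standard example F(x) = x sin(1/x) (with F(0) = 0) at the origin, where the right difference quotient equals sin(1/h) and so oscillates between −1 and 1. The Python sketch below samples this quotient along one particular sequence of step sizes; the sampling grid is an assumption of the sketch, and a finite sample can only suggest, not prove, the values of the limsup and liminf.

```python
import math


def F(x):
    # Continuous everywhere, but oscillating rapidly near the origin.
    return x * math.sin(1.0 / x) if x != 0 else 0.0


def right_quotients(F, x, hs):
    # Difference quotients (F(x + h) - F(x)) / h for positive steps h.
    return [(F(x + h) - F(x)) / h for h in hs]


# Step sizes h = 10/k, so that 1/h runs through 100.0, 100.1, ..., 10000.0.
hs = [10.0 / k for k in range(1000, 100001)]
q = right_quotients(F, 0.0, hs)
# Crude stand-ins for the upper and lower right derivatives at 0.
print(max(q), min(q))
```

The sampled maximum and minimum are close to 1 and −1 respectively, consistent with the upper right derivative at 0 being 1 and the lower right derivative being −1, so that F is not differentiable at the origin.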
Exercise 1.6.30. If F is monotone, show that the four Dini derivatives
of F are measurable. (Hint: the main difficulty is to reformulate
the derivatives so that h ranges over a countable set rather than an
uncountable one.)
A function F is differentiable at x precisely when the four derivatives
are equal and finite:
(1.29) $\overline{D^+}F(x) = \underline{D^+}F(x) = \overline{D^-}F(x) = \underline{D^-}F(x) \in (-\infty,+\infty)$.
We also have the trivial inequalities
$$\underline{D^+}F(x) \le \overline{D^+}F(x); \quad \underline{D^-}F(x) \le \overline{D^-}F(x).$$
If F is nondecreasing, all these quantities are nonnegative, thus
$$0 \le \underline{D^+}F(x) \le \overline{D^+}F(x); \quad 0 \le \underline{D^-}F(x) \le \overline{D^-}F(x).$$
The one-sided Hardy-Littlewood maximal inequality has an analogue
in this setting:
Lemma 1.6.26 (One-sided Hardy-Littlewood inequality). Let
$F : [a,b] \to \mathbf{R}$ be a continuous monotone nondecreasing function, and
let λ > 0. Then we have
$$m(\{x \in [a,b] : \overline{D^+}F(x) \ge \lambda\}) \le \frac{F(b)-F(a)}{\lambda}.$$
Similarly for the other three Dini derivatives of F.
If F is not assumed to be continuous, then we have the weaker
inequality
$$m(\{x \in [a,b] : \overline{D^+}F(x) \ge \lambda\}) \le C \frac{F(b)-F(a)}{\lambda}$$
for some absolute constant C > 0.
Remark 1.6.27. Note that if one naively applies the fundamental
theorems of calculus, one can formally see that the ﬁrst part of Lemma
1.6.26 is equivalent to Lemma 1.6.16. We cannot however use this
argument rigorously because we have not established the necessary
fundamental theorems of calculus to do this. Nevertheless, we can
borrow the proof of Lemma 1.6.16 without diﬃculty to use here, and
this is exactly what we will do.
Proof. We just prove the continuous case and leave the discontinuous
case as an exercise.
It suffices to prove the claim for $\overline{D^+}F$; by reflection (replacing
F(x) with −F(−x), and [a, b] with [−b, −a]), the same argument
works for $\overline{D^-}F$, and then this trivially implies the same inequalities
for $\underline{D^+}F$ and $\underline{D^-}F$. By modifying λ by an epsilon, and dropping the
endpoints from [a, b] as they have measure zero, it suffices to show
that
$$m(\{x \in (a,b) : \overline{D^+}F(x) > \lambda\}) \le \frac{F(b)-F(a)}{\lambda}.$$
We may apply the rising sun lemma (Lemma 1.6.17) to the continuous
function G(x) := F(x) − λx. This gives us an at most countable
family of intervals $I_n = (a_n, b_n)$ in (a, b), such that $G(b_n) \ge G(a_n)$
for each n, and such that G(y) ≤ G(x) whenever a ≤ x ≤ y ≤ b and
x lies outside of all of the $I_n$.
Observe that if x ∈ (a, b), and G(y) ≤ G(x) for all x ≤ y ≤ b, then
$\overline{D^+}F(x) \le \lambda$. Thus we see that the set $\{x \in (a,b) : \overline{D^+}F(x) > \lambda\}$ is
contained in the union of the $I_n$, and so by countable additivity
$$m(\{x \in (a,b) : \overline{D^+}F(x) > \lambda\}) \le \sum_n b_n - a_n.$$
But we can rearrange the inequality $G(b_n) \ge G(a_n)$ as
$b_n - a_n \le \frac{F(b_n)-F(a_n)}{\lambda}$. From telescoping series and the monotone nature of F
we have $\sum_n F(b_n) - F(a_n) \le F(b) - F(a)$ (this is easiest to prove by
first working with a finite subcollection of the intervals $(a_n, b_n)$, and
then taking suprema), and the claim follows.
The discontinuous case is left as an exercise.
Exercise 1.6.31. Prove Lemma 1.6.26 in the discontinuous case.
(Hint: the rising sun lemma is no longer available, but one can use
either the Vitali-type covering lemma (which will give C = 3) or the
Besicovitch lemma (which will give C = 2), by modifying the proof
of Theorem 1.6.20.)
Sending λ → ∞ in the above lemma (cf. Exercise 1.3.18), and
then sending [a, b] to R, we conclude as a corollary that all the four
Dini derivatives of a continuous monotone nondecreasing function are
finite almost everywhere. So to prove Theorem 1.6.25 for continuous
monotone nondecreasing functions, it suffices to show that (1.29)
holds for almost every x. In view of the trivial inequalities, it suffices
to show that $\overline{D^+}F(x) \le \underline{D^-}F(x)$ and $\overline{D^-}F(x) \le \underline{D^+}F(x)$ for almost
every x. We will just show the first inequality, as the second follows
by replacing F with its reflection $x \mapsto -F(-x)$. It will suffice to show
that for every pair 0 < r < R of real numbers, the set
$$E = E_{r,R} := \{x \in \mathbf{R} : \overline{D^+}F(x) > R > r > \underline{D^-}F(x)\}$$
is a null set, since by letting R, r range over rationals with R > r > 0
and taking countable unions, we would conclude that the set
$\{x \in \mathbf{R} : \overline{D^+}F(x) > \underline{D^-}F(x)\}$ is a null set (recall that the Dini derivatives are
all nonnegative when F is nondecreasing), and the claim follows.
Clearly E is a measurable set. To prove that it is null, we will
establish the following estimate:
Lemma 1.6.28 (E has density less than one). For any interval [a, b]
and any 0 < r < R, one has $m(E_{r,R} \cap [a,b]) \le \frac{r}{R} |b-a|$.
Indeed, this lemma implies that E has no points of density, which
by Exercise 1.6.24 forces E to be a null set.
Proof. We begin by applying the rising sun lemma to the function
G(x) := rx + F(−x) on [−b, −a]; the large number of negative signs
present here is needed in order to properly deal with the lower left Dini
derivative $\underline{D^-}F$. This gives an at most countable family of disjoint
intervals $-I_n = (-b_n, -a_n)$ in (−b, −a), such that $G(-a_n) \ge G(-b_n)$
for all n, and such that G(−y) ≤ G(−x) whenever −x ≤ −y ≤ −a and
−x ∈ (−b, −a) lies outside of all of the $-I_n$. Observe that if x ∈ (a, b),
and G(−y) ≤ G(−x) for all −x ≤ −y ≤ −a, then $\underline{D^-}F(x) \ge r$.
Thus we see that $E_{r,R} \cap (a,b)$ is contained inside the union of the intervals
$I_n = (a_n, b_n)$. On the other hand, from the first part of Lemma 1.6.26
we have
$$m(E_{r,R} \cap (a_n, b_n)) \le \frac{F(b_n)-F(a_n)}{R}.$$
But we can rearrange the inequality $G(-a_n) \ge G(-b_n)$ as
$F(b_n) - F(a_n) \le r(b_n - a_n)$. From countable additivity, one thus has
$$m(E_{r,R} \cap [a,b]) \le \frac{r}{R} \sum_n b_n - a_n.$$
But the $(a_n, b_n)$ are disjoint inside (a, b), so from countable additivity
again, we have $\sum_n b_n - a_n \le b - a$, and the claim follows.
Remark 1.6.29. Note that if F was not assumed to be continuous, then
one would lose a factor of C here from the second part of Lemma
1.6.26, and one would then be unable to prevent $\overline{D^+}F$ from being up
to C times as large as $\underline{D^-}F$. So sometimes, even when all one is seeking
is a qualitative result such as differentiability, it is still important
to keep track of constants. (But this is the exception rather than the
rule: for a large portion of arguments in analysis, the constants are
not terribly important.)
This concludes the proof of Theorem 1.6.25 in the continuous
monotone nondecreasing case. Now we work on removing the continuity
hypothesis (which was needed in order to make the rising sun
lemma work properly). If we naively try to run the density argument
as we did in previous sections, then (for once) the argument does not
work very well, as the space of continuous monotone functions is not
sufficiently dense in the space of all monotone functions in the relevant
sense (which, in this case, is in the total variation sense, which is
what is needed to invoke such tools as Lemma 1.6.26). To bridge this
gap, we have to supplement the continuous monotone functions with
another class of monotone functions, known as the jump functions.
Definition 1.6.30 (Jump function). A basic jump function J is a
function of the form
$$J(x) := \begin{cases} 0 & \text{when } x < x_0 \\ \theta & \text{when } x = x_0 \\ 1 & \text{when } x > x_0 \end{cases}$$
for some real numbers $x_0 \in \mathbf{R}$ and $0 \le \theta \le 1$; we call $x_0$ the point of
discontinuity for J and θ the fraction. Observe that such functions
are monotone nondecreasing, but have a discontinuity at one point.
A jump function is any absolutely convergent combination of basic
jump functions, i.e. a function of the form $F = \sum_n c_n J_n$, where n
ranges over an at most countable set, each $J_n$ is a basic jump function,
and the $c_n$ are positive reals with $\sum_n c_n < \infty$. If there are only finitely
many n involved, we say that F is a piecewise constant jump function.
Thus, for instance, if $q_1, q_2, q_3, \ldots$ is any enumeration of the
rationals, then $\sum_{n=1}^{\infty} 2^{-n} 1_{[q_n,+\infty)}$ is a jump function.
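The example just given can be explored with exact rational arithmetic. The Python sketch below evaluates partial sums of the series for one particular enumeration of the rationals; the enumeration produced by `rationals` and the helper name `partial_jump` are assumptions chosen for illustration (any enumeration would do).

```python
from fractions import Fraction
from itertools import count, islice


def rationals():
    """Yield every rational exactly once: 0, 1, -1, 1/2, -1/2, 2, -2, ...
    (one possible enumeration, chosen only for this sketch)."""
    yield Fraction(0)
    seen = set()
    for s in count(2):              # s = numerator + denominator
        for p in range(1, s):
            q = Fraction(p, s - p)  # automatically reduced to lowest terms
            if q not in seen:
                seen.add(q)
                yield q
                yield -q


def partial_jump(x, N):
    """Partial sum of 2^{-n} 1_{[q_n, +infinity)}(x) over n = 1, ..., N."""
    qs = islice(rationals(), N)
    return sum(Fraction(1, 2 ** n) for n, q in enumerate(qs, start=1) if x >= q)


# The jump at q_1 = 0 has size 2^{-1}, already visible in a short partial sum.
print(partial_jump(Fraction(0), 5) - partial_jump(Fraction(-1, 1000), 5))
```

Each partial sum is a piecewise constant jump function, and the N-th partial sum differs from the full series by at most 2^{-N} at every point, which is the uniform approximation property used later in the density argument.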
Clearly, all jump functions are monotone nondecreasing. From
the absolute convergence of the $c_n$ we see that every jump function is
the uniform limit of piecewise constant jump functions, for instance
$\sum_{n=1}^{\infty} c_n J_n$ is the uniform limit of $\sum_{n=1}^{N} c_n J_n$. One consequence of
this is that the points of discontinuity of a jump function $\sum_{n=1}^{\infty} c_n J_n$
are precisely those of the individual summands $c_n J_n$, i.e. the points
$x_n$ where each $J_n$ jumps.
The key fact is that these functions, together with the continuous
monotone functions, essentially generate all monotone functions, at
least in the bounded case:
Lemma 1.6.31 (Continuous-singular decomposition for monotone
functions). Let $F : \mathbf{R} \to \mathbf{R}$ be a monotone nondecreasing function.
(i) The only discontinuities of F are jump discontinuities. More
precisely, if x is a point where F is discontinuous, then the
limits $\lim_{y \to x^-} F(y)$ and $\lim_{y \to x^+} F(y)$ both exist, but are
unequal, with $\lim_{y \to x^-} F(y) < \lim_{y \to x^+} F(y)$.
(ii) There are at most countably many discontinuities of F.
(iii) If F is bounded, then F can be expressed as the sum of
a continuous monotone nondecreasing function $F_c$ and a
jump function $F_{pp}$.
Remark 1.6.32. This decomposition is part of the more general
Lebesgue decomposition, discussed in §1.2 of An epsilon of room, Vol.
I.
Proof. By monotonicity, the limits $F_-(x) := \lim_{y \to x^-} F(y)$ and
$F_+(x) := \lim_{y \to x^+} F(y)$ always exist, with $F_-(x) \le F(x) \le F_+(x)$ for all x.
This gives (i).
By (i), whenever there is a discontinuity x of F, there is at least
one rational number $q_x$ strictly between $F_-(x)$ and $F_+(x)$, and from
monotonicity, each rational number can be assigned to at most one
discontinuity. This gives (ii).
Now we prove (iii). Let A be the set of discontinuities of F,
thus A is at most countable. For each x ∈ A, we define the jump
$c_x := F_+(x) - F_-(x) > 0$, and the fraction $\theta_x := \frac{F(x)-F_-(x)}{F_+(x)-F_-(x)} \in [0,1]$.
Thus
$$F_+(x) = F_-(x) + c_x \quad \text{and} \quad F(x) = F_-(x) + \theta_x c_x.$$
Note that $c_x$ is the measure of the interval $(F_-(x), F_+(x))$. By
monotonicity, these intervals are disjoint; by the boundedness of F,
their union is bounded. By countable additivity, we thus have
$\sum_{x \in A} c_x < \infty$, and so if we let $J_x$ be the basic jump function with point of
discontinuity x and fraction $\theta_x$, then the function
$$F_{pp} := \sum_{x \in A} c_x J_x$$
is a jump function.
As discussed previously, $F_{pp}$ is discontinuous only at A, and for
each x ∈ A one easily checks that
$$(F_{pp})_+(x) = (F_{pp})_-(x) + c_x \quad \text{and} \quad F_{pp}(x) = (F_{pp})_-(x) + \theta_x c_x$$
where $(F_{pp})_-(x) := \lim_{y \to x^-} F_{pp}(y)$, and $(F_{pp})_+(x) := \lim_{y \to x^+} F_{pp}(y)$.
We thus see that the difference $F_c := F - F_{pp}$ is continuous. The only
remaining task is to verify that $F_c$ is monotone nondecreasing, thus
we need
$$F_{pp}(b) - F_{pp}(a) \le F(b) - F(a)$$
for all a < b. But the left-hand side can be rewritten as $\sum_{x \in A \cap [a,b]} c_x$.
As each $c_x$ is the measure of the interval $(F_-(x), F_+(x))$, and these
intervals for x ∈ A ∩ [a, b] are disjoint and lie in (F(a), F(b)), the
claim follows from countable additivity.
Exercise 1.6.32. Show that the decomposition of a bounded monotone
nondecreasing function F into continuous $F_c$ and jump components
$F_{pp}$ given by the above lemma is unique.
Exercise 1.6.33. Find a suitable generalisation of the notion of a
jump function that allows one to extend the above decomposition to
unbounded monotone functions, and then prove this extension. (Hint:
the notion to shoot for here is that of a "locally jump function".)
Now we can finish the proof of Theorem 1.6.25. As noted previously,
it suffices to prove the claim for monotone nondecreasing
functions. As differentiability is a local condition, we can easily reduce
to the case of bounded monotone nondecreasing functions, since
to test differentiability of a monotone nondecreasing function F in
any compact interval [a, b] we may replace F by the bounded monotone
nondecreasing function max(min(F, F(b)), F(a)) with no change
in the differentiability in [a, b] (except perhaps at the endpoints a, b,
but these form a set of measure zero). As we have already proven
the claim for continuous functions, it suffices by Lemma 1.6.31 (and
linearity of the derivative) to verify the claim for jump functions.
Now, finally, we are able to use the density argument, using the
piecewise constant jump functions as the dense subclass, and using
the second part of Lemma 1.6.26 for the quantitative estimate;
fortunately for us, the density argument does not particularly care that
there is a loss of a constant factor in this estimate.
For piecewise constant jump functions, the claim is clear (indeed,
the derivative exists and is zero outside of finitely many discontinuities).
Now we run the density argument. Let F be a bounded jump
function, and let ε > 0 and λ > 0 be arbitrary. As every jump function
is the uniform limit of piecewise constant jump functions, we can find
a piecewise constant jump function $F_\varepsilon$ such that $|F(x) - F_\varepsilon(x)| \le \varepsilon$
for all x. Indeed, by taking $F_\varepsilon$ to be a partial sum of the basic jump
functions that make up F, we can ensure that $F - F_\varepsilon$ is also a monotone
nondecreasing function. Applying the second part of Lemma
1.6.26, we have
$$m(\{x \in \mathbf{R} : \overline{D^+}(F - F_\varepsilon)(x) \ge \lambda\}) \le \frac{2C\varepsilon}{\lambda}$$
for some absolute constant C, and similarly for the other three Dini
derivatives. Thus, outside of a set of measure at most 8Cε/λ, all
of the Dini derivatives of $F - F_\varepsilon$ are less than λ. Since $F_\varepsilon$ is almost
everywhere differentiable, we conclude that outside of a set of measure
at most 8Cε/λ, all the Dini derivatives of F(x) lie within λ of $F'_\varepsilon(x)$,
and in particular are finite and lie within 2λ of each other. Sending
ε to zero (holding λ fixed), we conclude that for almost every x, the
Dini derivatives of F are finite and lie within 2λ of each other. If
we then send λ to zero, we see that for almost every x, the Dini
derivatives of F agree with each other and are finite, and the claim
follows. This concludes the proof of Theorem 1.6.25.
Just as the integration theory of unsigned functions can be used to
develop the integration theory of the absolutely integrable functions
(see Section 1.3.4), the differentiation theory of monotone functions
can be used to develop a parallel differentiation theory for the class
of functions of bounded variation:
Definition 1.6.33 (Bounded variation). Let $F : \mathbf{R} \to \mathbf{R}$ be a function.
The total variation $\|F\|_{TV(\mathbf{R})}$ (or $\|F\|_{TV}$ for short) of F is
defined to be the supremum
$$\|F\|_{TV(\mathbf{R})} := \sup_{x_0 < \ldots < x_n} \sum_{i=1}^{n} |F(x_i) - F(x_{i-1})|$$
where the supremum ranges over all finite increasing sequences $x_0, \ldots, x_n$
of real numbers with n ≥ 0; this is a quantity in [0, +∞]. We say
that F has bounded variation (on R) if $\|F\|_{TV(\mathbf{R})}$ is finite. (In this
case, $\|F\|_{TV(\mathbf{R})}$ is often written as $\|F\|_{BV(\mathbf{R})}$ or just $\|F\|_{BV}$.)
Given any interval [a, b], we define the total variation $\|F\|_{TV([a,b])}$
of F on [a, b] as
$$\|F\|_{TV([a,b])} := \sup_{a \le x_0 < \ldots < x_n \le b} \sum_{i=1}^{n} |F(x_i) - F(x_{i-1})|;$$
thus the definition is the same, but the points $x_0, \ldots, x_n$ are restricted
to lie in [a, b]. Thus for instance $\|F\|_{TV(\mathbf{R})} = \sup_{N > 0} \|F\|_{TV([-N,N])}$.
We say that a function F has bounded variation on [a, b] if $\|F\|_{TV([a,b])}$
is finite.
Exercise 1.6.34. If $F : \mathbf{R} \to \mathbf{R}$ is a monotone function, show that
$\|F\|_{TV([a,b])} = |F(b) - F(a)|$ for any interval [a, b], and that F has
bounded variation on R if and only if it is bounded.
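The sums appearing in the definition of total variation are easy to compute for a given partition. The following Python sketch evaluates such a sum along uniform partitions; the helper names `variation_along` and `uniform_partition` are assumptions of the sketch, and a sum along one partition only gives a lower bound for the supremum defining the total variation.

```python
import math


def variation_along(F, points):
    # Sum of |F(x_i) - F(x_{i-1})| over consecutive points of the partition.
    return sum(abs(F(y) - F(x)) for x, y in zip(points, points[1:]))


def uniform_partition(a, b, n):
    return [a + (b - a) * i / n for i in range(n + 1)]


# For a monotone function the sum telescopes to |F(b) - F(a)| on any partition.
mono = variation_along(math.atan, uniform_partition(-3.0, 5.0, 37))
print(mono, math.atan(5.0) - math.atan(-3.0))

# For sin on [0, 2*pi] the sums approach the total variation 4.
for n in (4, 40, 1000):
    print(n, variation_along(math.sin, uniform_partition(0.0, 2 * math.pi, n)))
```

For the monotone function the partition is irrelevant, as the exercise asserts, while for the sine function the sum only reaches the full variation 4 once the partition captures the turning points at π/2 and 3π/2.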
Exercise 1.6.35. For any functions $F, G : \mathbf{R} \to \mathbf{R}$, establish the
triangle property $\|F + G\|_{TV(\mathbf{R})} \le \|F\|_{TV(\mathbf{R})} + \|G\|_{TV(\mathbf{R})}$ and the
homogeneity property $\|cF\|_{TV(\mathbf{R})} = |c| \|F\|_{TV(\mathbf{R})}$ for any $c \in \mathbf{R}$. Also
show that $\|F\|_{TV} = 0$ if and only if F is constant.
Exercise 1.6.36. If $F : \mathbf{R} \to \mathbf{R}$ is a function, show that
$\|F\|_{TV([a,b])} + \|F\|_{TV([b,c])} = \|F\|_{TV([a,c])}$ whenever a ≤ b ≤ c.
Exercise 1.6.37. (i) Show that every function $f : \mathbf{R} \to \mathbf{R}$ of
bounded variation is bounded, and that the limits $\lim_{x \to +\infty} f(x)$
and $\lim_{x \to -\infty} f(x)$ are well-defined.
(ii) Give an example of a bounded, continuous, compactly
supported function f that is not of bounded variation.
Exercise 1.6.38. Let $f : \mathbf{R} \to \mathbf{R}$ be an absolutely integrable function,
and let $F : \mathbf{R} \to \mathbf{R}$ be the indefinite integral $F(x) := \int_{[-\infty,x]} f(t)\, dt$.
Show that F is of bounded variation, and that $\|F\|_{TV(\mathbf{R})} = \|f\|_{L^1(\mathbf{R})}$.
(Hint: the upper bound $\|F\|_{TV(\mathbf{R})} \le \|f\|_{L^1(\mathbf{R})}$ is relatively easy to
establish. To obtain the lower bound, use the density argument.)
Much as an absolutely integrable function can be expressed as
the diﬀerence of its positive and negative parts, a bounded variation
function can be expressed as the diﬀerence of two bounded monotone
functions:
Proposition 1.6.34. A function F : R → R is of bounded variation
if and only if it is the diﬀerence of two bounded monotone functions.
Proof. It is clear from Exercises 1.6.34, 1.6.35 that the difference of
two bounded monotone functions is of bounded variation. Now define the positive
variation $F^+ : \mathbf{R} \to \mathbf{R}$ of F by the formula
(1.30) $F^+(x) := \sup_{x_0 < \ldots < x_n \le x} \sum_{i=1}^{n} \max(F(x_i) - F(x_{i-1}), 0)$.
It is clear from construction that this is a monotone nondecreasing function,
taking values between 0 and $\|F\|_{TV(\mathbf{R})}$, and is thus bounded. To
conclude the proposition, it suffices (by writing $F = F^+ - (F^+ - F)$)
to show that $F^+ - F$ is nondecreasing, or in other words to show that
$$F^+(b) \ge F^+(a) + F(b) - F(a)$$
whenever a < b. If F(b) − F(a) is negative then this is clear from the monotone
nondecreasing nature of $F^+$, so assume that F(b) − F(a) ≥ 0. But then
the claim follows because any sequence of real numbers $x_0 < \ldots < x_n \le a$
can be extended by one or two elements by adding a and b,
thus increasing the sum $\sum_{i=1}^{n} \max(F(x_i) - F(x_{i-1}), 0)$
by at least F(b) − F(a).
Exercise 1.6.39. Let $F : \mathbf{R} \to \mathbf{R}$ be of bounded variation. Define
the positive variation $F^+$ by (1.30), and the negative variation $F^-$ by
$$F^-(x) := \sup_{x_0 < \ldots < x_n \le x} \sum_{i=1}^{n} \max(-F(x_i) + F(x_{i-1}), 0).$$
Establish the identities
$$F(x) = F(-\infty) + F^+(x) - F^-(x),$$
$$\|F\|_{TV[a,b]} = F^+(b) - F^+(a) + F^-(b) - F^-(a),$$
and
$$\|F\|_{TV} = F^+(+\infty) + F^-(+\infty)$$
for every interval [a, b], where $F(-\infty) := \lim_{x \to -\infty} F(x)$,
$F^+(+\infty) := \lim_{x \to +\infty} F^+(x)$, and $F^-(+\infty) := \lim_{x \to +\infty} F^-(x)$. (Hint: The main
difficulty comes from the fact that a partition $x_0 < \ldots < x_n \le x$ that
is good for $F^+$ need not be good for $F^-$, and vice versa. However, this
can be fixed by taking a good partition for $F^+$ and a good partition
for $F^-$ and combining them together into a common refinement.)
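The role of the common refinement in the hint can be seen on a small example. The Python sketch below computes the positive and negative sums along a partition and along a common refinement of two partitions; the helper names and the sample function F(x) = x³ − 3x on [−2, 2] are assumptions chosen for illustration.

```python
def variations_along(F, points):
    """Positive and negative sums from (1.30) along a fixed partition."""
    steps = [F(y) - F(x) for x, y in zip(points, points[1:])]
    pos = sum(max(d, 0.0) for d in steps)
    neg = sum(max(-d, 0.0) for d in steps)
    return pos, neg


def common_refinement(p, q):
    # Merge two partitions into one increasing sequence without repeats.
    return sorted(set(p) | set(q))


def F(x):
    # Rises on [-2, -1], falls on [-1, 1], rises again on [1, 2].
    return x ** 3 - 3 * x


p = [-2.0, -1.0, 2.0]
q = [-2.0, 1.0, 2.0]
for points in (p, q, common_refinement(p, q)):
    pos, neg = variations_along(F, points)
    print(points, pos, neg, pos - neg, pos + neg)
```

Along every partition the difference of the two sums equals the increment of F between the first and last point, here F(2) − F(−2) = 4, while passing to the common refinement increases both sums (from 4 and 0 to 8 and 4), which is why refinements are the natural tool for comparing the two suprema.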
From Proposition 1.6.34 and Theorem 1.6.25 we immediately obtain
Corollary 1.6.35 (BV differentiation theorem). Every bounded variation
function is differentiable almost everywhere.
Exercise 1.6.40. Call a function locally of bounded variation if it
is of bounded variation on every compact interval [a, b]. Show that
every function that is locally of bounded variation is differentiable
almost everywhere.
Exercise 1.6.41 (Lipschitz differentiation theorem, one-dimensional
case). A function $f : \mathbf{R} \to \mathbf{R}$ is said to be Lipschitz continuous
if there exists a constant C > 0 such that $|f(x) - f(y)| \le C|x - y|$
for all $x, y \in \mathbf{R}$; the smallest C with this property is known as
the Lipschitz constant of f. Show that every Lipschitz continuous
function F is locally of bounded variation, and hence differentiable
almost everywhere. Furthermore, show that the derivative $F'$, when
it exists, is bounded in magnitude by the Lipschitz constant of F.
Remark 1.6.36. The same result is true in higher dimensions, and
is known as the Rademacher differentiation theorem, but we will defer
the proof of this theorem to Section 2.2, when we have the powerful
tool of the Fubini-Tonelli theorem (Corollary 1.7.23) available, which is
particularly useful for deducing higher-dimensional results in analysis
from lower-dimensional ones.
Exercise 1.6.42. A function $f : \mathbf{R} \to \mathbf{R}$ is said to be convex if one
has $f((1-t)x + ty) \le (1-t)f(x) + tf(y)$ for all x < y and 0 < t < 1.
Show that if f is convex, then it is continuous and almost everywhere
differentiable, and its derivative $f'$ is equal almost everywhere to a
monotone nondecreasing function, and so is itself almost everywhere
differentiable. (Hint: Drawing the graph of f, together with a number
of chords and tangent lines, is likely to be very helpful in providing
visual intuition.) Thus we see that in some sense, convex functions
are "almost everywhere twice differentiable". Similar claims also hold
for concave functions, of course.
1.6.4. The second fundamental theorem of calculus. We are
now finally ready to attack the second fundamental theorem of calculus
in the cases where F is not assumed to be continuously differentiable.
We begin with the case when $F : [a,b] \to \mathbf{R}$ is monotone
nondecreasing. From Theorem 1.6.25 (extending F to the rest of the
real line if needed), this implies that F is differentiable almost everywhere
in [a, b], so $F'$ is defined a.e.; from monotonicity we see that $F'$
is nonnegative whenever it is defined. Also, an easy modification of
Exercise 1.6.1 shows that $F'$ is measurable.
One half of the second fundamental theorem is easy:
Proposition 1.6.37 (Upper bound for second fundamental theorem).
Let $F : [a,b] \to \mathbf{R}$ be monotone nondecreasing (so that, as
discussed above, $F'$ is defined almost everywhere, is unsigned, and is
measurable). Then
$$\int_{[a,b]} F'(x)\, dx \le F(b) - F(a).$$
In particular, $F'$ is absolutely integrable.
Proof. It is convenient to extend F to all of R by declaring F(x) :=
F(b) for x > b and F(x) := F(a) for x < a; then F is now a bounded
monotone function on R, and $F'$ vanishes outside of [a, b]. As F is
almost everywhere differentiable, the Newton quotients
$$f_n(x) := \frac{F(x + 1/n) - F(x)}{1/n}$$
converge pointwise almost everywhere to $F'$. Applying Fatou's lemma
(Corollary 1.4.47), we conclude that
$$\int_{[a,b]} F'(x)\, dx \le \liminf_{n \to \infty} \int_{[a,b]} \frac{F(x + 1/n) - F(x)}{1/n}\, dx.$$
The right-hand side can be rearranged as
$$\liminf_{n \to \infty} n\Big(\int_{[a+1/n,b+1/n]} F(y)\, dy - \int_{[a,b]} F(x)\, dx\Big)$$
which can be rearranged further as
$$\liminf_{n \to \infty} n\Big(\int_{[b,b+1/n]} F(x)\, dx - \int_{[a,a+1/n]} F(x)\, dx\Big).$$
Since F is equal to F(b) for the first integral and is at least F(a) for
the second integral, this expression is at most
$$\liminf_{n \to \infty} n(F(b)/n - F(a)/n) = F(b) - F(a)$$
and the claim follows.
Exercise 1.6.43. Show that any function of bounded variation has
an (almost everywhere defined) derivative that is absolutely integrable.
In the Lipschitz case, one can do better:
Exercise 1.6.44 (Second fundamental theorem for Lipschitz functions).
Let $F : [a,b] \to \mathbf{R}$ be Lipschitz continuous. Show that
$\int_{[a,b]} F'(x)\, dx = F(b) - F(a)$. (Hint: Argue as in the proof of
Proposition 1.6.37, but use the dominated convergence theorem (Theorem
1.4.49) in place of Fatou's lemma (Corollary 1.4.47).)
Exercise 1.6.45 (Integration by parts formula). Let $F, G : [a,b] \to \mathbf{R}$
be Lipschitz continuous functions. Show that
$$\int_{[a,b]} F'(x) G(x)\, dx = F(b)G(b) - F(a)G(a) - \int_{[a,b]} F(x) G'(x)\, dx.$$
(Hint: first show that the product of two Lipschitz continuous functions
on [a, b] is again Lipschitz continuous.)
Now we return to the monotone case. Inspired by the Lipschitz
case, one may hope to recover equality in Proposition 1.6.37 for such
functions F. However, there is an important obstruction to this,
which is that all the variation of F may be concentrated in a set of
measure zero, and thus undetectable by the Lebesgue integral of $F'$.
This is most obvious in the case of a discontinuous monotone function,
such as the (appropriately named) Heaviside function $F := 1_{[0,+\infty)}$;
it is clear that $F'$ vanishes almost everywhere, but F(b) − F(a) is
not equal to $\int_{[a,b]} F'(x)\, dx$ if b and a lie on opposite sides of the
discontinuity at 0. In fact, the same problem arises for all jump
functions:
Exercise 1.6.46. Show that if F is a jump function, then $F'$ vanishes
almost everywhere. (Hint: use the density argument, starting from
piecewise constant jump functions and using Proposition 1.6.37 as the
quantitative estimate.)
One may hope that jump functions (in which all the fluctuation
is concentrated in a countable set) are the only obstruction to
the second fundamental theorem of calculus holding for monotone
functions, and that as long as one restricts attention to continuous
monotone functions, one can recover the second fundamental
theorem. However, this is still not true, because it is possible for
all the fluctuation to now be concentrated, not in a countable collection
of jump discontinuities, but instead in an uncountable set of zero
measure, such as the middle thirds Cantor set (Exercise 1.2.9). This
can be illustrated by the key counterexample of the Cantor function,
also known as the Devil's staircase function. The construction of this
function is detailed in the exercise below.
Exercise 1.6.47 (Cantor function). Define the functions
$F_0, F_1, F_2, \ldots : [0,1] \to \mathbf{R}$ recursively as follows:
1. Set $F_0(x) := x$ for all x ∈ [0, 1].
2. For each n = 1, 2, . . . in turn, define
$$F_n(x) := \begin{cases} \frac{1}{2} F_{n-1}(3x) & \text{if } x \in [0, 1/3]; \\ \frac{1}{2} & \text{if } x \in (1/3, 2/3); \\ \frac{1}{2} + \frac{1}{2} F_{n-1}(3x - 2) & \text{if } x \in [2/3, 1]. \end{cases}$$
(i) Graph $F_0$, $F_1$, $F_2$, and $F_3$ (preferably on a single graph).
(ii) Show that for each n = 0, 1, . . ., $F_n$ is a continuous monotone
nondecreasing function with $F_n(0) = 0$ and $F_n(1) = 1$.
(Hint: induct on n.)
(iii) Show that for each n = 0, 1, . . ., one has $|F_{n+1}(x) - F_n(x)| \le 2^{-n}$
for each x ∈ [0, 1]. Conclude that the $F_n$ converge
uniformly to a limit $F : [0,1] \to \mathbf{R}$. This limit is known as
the Cantor function.
(iv) Show that the Cantor function F is continuous and monotone
nondecreasing, with F(0) = 0 and F(1) = 1.
1.6. Diﬀerentiation theorems 171
(v) Show that if x ∈ [0, 1] lies outside the middle thirds Can
tor set (Exercise 1.2.9), then F is constant in a neighbour
hood of x, and in particular F
/
(x) = 0. Conclude that
[0,1]
F
/
(x) dx = 0 = 1 = F(1) − F(0), so that the second
fundamental theorem of calculus fails for this function.
(vi) Show that F(
¸
∞
n=1
a
n
3
−n
) =
¸
∞
n=1
an
2
2
−n
for any digits
a
1
, a
2
, . . . ∈ ¦0, 2¦. Thus the Cantor function, in some sense,
converts base three expansions to base two expansions.
(1) Let I = [
¸
n
i=1
ai
3
i
,
¸
n
i=1
ai
3
i
+
1
3
n
] be one of the intervals used
in the n
th
cover I
n
of C (see Exercise 1.2.9), thus n ≥ 0 and
a
1
, . . . , a
n
∈ ¦0, 2¦. Show that I is an interval of length 3
−n
,
but F(I) is an interval of length 2
−n
.
(2) Show that F is not diﬀerentiable at any element of the Can
tor set C.
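The recursion in Exercise 1.6.47 can be explored numerically. The following is a minimal sketch in exact rational arithmetic, not a solution to the exercise: the helper names `cantor_iterate`, `ternary_point` and `binary_image` and the sample grid are illustrative assumptions. It spot-checks the bound in part (iii) on a finite grid and the digit-conversion identity in part (vi) for one finite digit string. The latter uses the observation that for a point with $k$ ternary digits in $\{0,2\}$ the value $F_n(x)$ no longer changes once $n \ge k$, since $F_n(0) = 0$ for every $n$.

```python
from fractions import Fraction


def cantor_iterate(n, x):
    """Evaluate F_n(x) via the recursion of Exercise 1.6.47 (exact arithmetic)."""
    x = Fraction(x)
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_iterate(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_iterate(n - 1, 3 * x - 2) / 2


def ternary_point(digits):
    """The point sum_i a_i 3^{-i} for a finite list of digits a_i in {0, 2}."""
    return sum(Fraction(a, 3 ** (i + 1)) for i, a in enumerate(digits))


def binary_image(digits):
    """The value sum_i (a_i / 2) 2^{-i} predicted by part (vi)."""
    return sum(Fraction(a // 2, 2 ** (i + 1)) for i, a in enumerate(digits))


# Spot-check of part (iii) on a finite grid (illustration only, not a proof).
grid = [Fraction(k, 243) for k in range(244)]
max_gaps = [
    max(abs(cantor_iterate(n + 1, x) - cantor_iterate(n, x)) for x in grid)
    for n in range(4)
]

# Spot-check of part (vi) for the digit string 2, 0, 2, 2.
digits = [2, 0, 2, 2]
value_at_point = cantor_iterate(len(digits), ternary_point(digits))
```

Here `max_gaps[n]` is the largest observed value of $|F_{n+1}(x) - F_n(x)|$ on the grid, and `value_at_point` can be compared with `binary_image(digits)`.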
Remark 1.6.38. This example shows that the classical derivative $F'(x) := \lim_{h \to 0; h \ne 0} \frac{F(x+h) - F(x)}{h}$ of a function has some defects; it cannot "see" some of the variation of a continuous monotone function such as the Cantor function. In §1.13 of An epsilon of room, Vol. I, this will be rectified by introducing the concept of the weak derivative of a function, which despite the name, is more able than the strong derivative to detect this type of singular variation behaviour. (We will also encounter in Section 1.7.3 the Lebesgue-Stieltjes integral, which is another (closely related) way to capture all of the variation of a monotone function, and which is related to the classical derivative via the Lebesgue-Radon-Nikodym theorem, see §1.2 of An epsilon of room, Vol. I.)
In view of this counterexample, we see that we need to add an additional hypothesis to the continuous monotone nondecreasing function $F$ before we can recover the second fundamental theorem. One such hypothesis is absolute continuity. To motivate this definition, let us recall two existing definitions:
(i) A function $F : \mathbf{R} \to \mathbf{R}$ is continuous if, for every $\varepsilon > 0$ and $x_0 \in \mathbf{R}$, there exists a $\delta > 0$ such that $|F(b) - F(a)| \le \varepsilon$ whenever $(a,b)$ is an interval of length at most $\delta$ that contains $x_0$.
(ii) A function $F : \mathbf{R} \to \mathbf{R}$ is uniformly continuous if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|F(b) - F(a)| \le \varepsilon$ whenever $(a,b)$ is an interval of length at most $\delta$.
Definition 1.6.39. A function $F : \mathbf{R} \to \mathbf{R}$ is said to be absolutely continuous if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\sum_{j=1}^n |F(b_j) - F(a_j)| \le \varepsilon$ whenever $(a_1,b_1), \ldots, (a_n,b_n)$ is a finite collection of disjoint intervals of total length $\sum_{j=1}^n b_j - a_j$ at most $\delta$.
We define absolute continuity for a function $F : [a,b] \to \mathbf{R}$ defined on an interval $[a,b]$ similarly, with the only difference being that the intervals $[a_j,b_j]$ are of course now required to lie in the domain $[a,b]$ of $F$.
The following exercise places absolute continuity in relation to
other regularity properties:
Exercise 1.6.48.
(i) Show that every absolutely continuous function is uniformly continuous and therefore continuous.
(ii) Show that every absolutely continuous function is of bounded variation on every compact interval $[a,b]$. (Hint: first show this is true for any sufficiently small interval.) In particular (by Exercise 1.6.40), absolutely continuous functions are differentiable almost everywhere.
(iii) Show that every Lipschitz continuous function is absolutely continuous.
(iv) Show that the function $x \mapsto \sqrt{x}$ is absolutely continuous, but not Lipschitz continuous, on the interval $[0,1]$.
(v) Show that the Cantor function from Exercise 1.6.47 is continuous, monotone, and uniformly continuous, but not absolutely continuous, on $[0,1]$.
(vi) If $f : \mathbf{R} \to \mathbf{R}$ is absolutely integrable, show that the indefinite integral $F(x) := \int_{[-\infty,x]} f(y)\,dy$ is absolutely continuous, and that $F$ is differentiable almost everywhere with $F'(x) = f(x)$ for almost every $x$.
(vii) Show that the sum or product of two absolutely continuous functions on an interval $[a,b]$ remains absolutely continuous. What happens if we work on $\mathbf{R}$ instead of on $[a,b]$?
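Part (v) of Exercise 1.6.48 can be made concrete with a short computation. The sketch below (helper names are illustrative assumptions; it is a numerical illustration, not a proof) takes the $2^n$ intervals of the $n$th cover of the Cantor set described in Exercise 1.6.47 and computes both their total length and the total change of the Cantor function across them. The value of $F$ at an endpoint of such an interval is evaluated through the iterate $F_n$, which already agrees with the limit $F$ at these points.

```python
from fractions import Fraction
from itertools import product


def cantor_iterate(n, x):
    """Evaluate F_n(x) via the recursion of Exercise 1.6.47 (exact arithmetic)."""
    x = Fraction(x)
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_iterate(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_iterate(n - 1, 3 * x - 2) / 2


def cover_intervals(n):
    """Yield the 2^n intervals [left, left + 3^{-n}] of the n-th cover."""
    for digits in product((0, 2), repeat=n):
        left = sum(Fraction(a, 3 ** (i + 1)) for i, a in enumerate(digits))
        yield left, left + Fraction(1, 3 ** n)


def total_length_and_variation(n):
    """Return (sum of interval lengths, sum of |F(b_j) - F(a_j)|)."""
    length = Fraction(0)
    variation = Fraction(0)
    for left, right in cover_intervals(n):
        length += right - left
        variation += abs(cantor_iterate(n, right) - cantor_iterate(n, left))
    return length, variation
```

For each $n$ the total length is $(2/3)^n$, which tends to zero, while the total variation stays equal to $1$; this is exactly the behaviour that Definition 1.6.39 rules out.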
Exercise 1.6.49.
(i) Show that absolutely continuous functions map null sets to null sets, i.e. if $F : \mathbf{R} \to \mathbf{R}$ is absolutely continuous and $E$ is a null set then $F(E) := \{F(x) : x \in E\}$ is also a null set.
(ii) Show that the Cantor function does not have this property.
For absolutely continuous functions, we can recover the second fundamental theorem of calculus:
Theorem 1.6.40 (Second fundamental theorem for absolutely continuous functions). Let $F : [a,b] \to \mathbf{R}$ be absolutely continuous. Then $\int_{[a,b]} F'(x)\,dx = F(b) - F(a)$.
Proof. Our main tool here will be Cousin's theorem (Exercise 1.6.23).
By Exercise 1.6.43, $F'$ is absolutely integrable. By Exercise 1.5.10, $F'$ is thus uniformly integrable. Now let $\varepsilon > 0$. By Exercise 1.5.13, we can find $\kappa > 0$ such that $\int_U |F'(x)|\,dx \le \varepsilon$ whenever $U \subset [a,b]$ is a measurable set of measure at most $\kappa$. (Here we adopt the convention that $F'$ vanishes outside of $[a,b]$.) By making $\kappa$ small enough, we may also assume from absolute continuity that $\sum_{j=1}^n |F(b_j) - F(a_j)| \le \varepsilon$ whenever $(a_1,b_1), \ldots, (a_n,b_n)$ is a finite collection of disjoint intervals of total length $\sum_{j=1}^n b_j - a_j$ at most $\kappa$.
Let $E \subset [a,b]$ be the set of points $x$ where $F$ is not differentiable, together with the endpoints $a, b$, as well as the points $x$ that are not Lebesgue points of $F'$. Thus $E$ is a null set. By outer regularity (or the definition of outer measure) we can find an open set $U$ containing $E$ of measure $m(U) < \kappa$. In particular, $\int_U |F'(x)|\,dx \le \varepsilon$.
Now define a gauge function $\delta : [a,b] \to (0,+\infty)$ as follows.
(i) If $x \in E$, we define $\delta(x) > 0$ to be small enough that the open interval $(x - \delta(x), x + \delta(x))$ lies in $U$.
(ii) If $x \notin E$, then $F$ is differentiable at $x$ and $x$ is a Lebesgue point of $F'$. We let $\delta(x) > 0$ be small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon|y-x|$ holds whenever $|y-x| \le \delta(x)$, and such that $|\frac{1}{|I|} \int_I F'(y)\,dy - F'(x)| \le \varepsilon$ whenever $I$ is an interval containing $x$ of length at most $\delta(x)$; such a $\delta(x)$ exists by the definition of differentiability, and of Lebesgue point. We rewrite these properties using big-O notation (see footnote 17) as $F(y) - F(x) = (y-x)F'(x) + O(\varepsilon|y-x|)$ and $\int_I F'(y)\,dy = |I|F'(x) + O(\varepsilon|I|)$.
Applying Cousin's theorem, we can find a partition $a = t_0 < t_1 < \ldots < t_k = b$ with $k \ge 1$, together with real numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \le j \le k$, such that $t_j - t_{j-1} \le \delta(t_j^*)$.
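To make the role of the gauge more tangible, here is a small sketch of how a tagged partition of the kind provided by Cousin's theorem can be found by bisection for one specific, well-behaved gauge. Everything in it is an illustrative assumption: the gauge `gauge`, the function name `fine_tagged_partition`, and the strategy of trying only the two endpoints and the midpoint as tags. Cousin's theorem itself is proved non-constructively, and this naive search is not guaranteed to terminate for an arbitrary gauge.

```python
from fractions import Fraction


def gauge(x):
    """A hypothetical gauge on [0, 1]: small near x = 1/2, larger elsewhere."""
    return Fraction(1, 100) + abs(x - Fraction(1, 2)) / 4


def fine_tagged_partition(a, b, delta):
    """Bisect [a, b] until every piece [lo, hi] has a tag t with hi - lo <= delta(t).

    Returns a left-to-right list of triples (lo, hi, tag). Only lo, the midpoint
    and hi are tried as tags, so termination relies on delta being bounded below.
    """
    pieces = []
    stack = [(a, b)]
    while stack:
        lo, hi = stack.pop()
        mid = (lo + hi) / 2
        tag = next((t for t in (lo, mid, hi) if hi - lo <= delta(t)), None)
        if tag is None:
            # Push the right half first so the left half is processed next.
            stack.append((mid, hi))
            stack.append((lo, mid))
        else:
            pieces.append((lo, hi, tag))
    return pieces


pieces = fine_tagged_partition(Fraction(0), Fraction(1), gauge)

# Tagged Riemann sum of F'(x) = 2x, to be compared with F(1) - F(0) = 1 for F(x) = x^2.
riemann_sum = sum(2 * tag * (hi - lo) for lo, hi, tag in pieces)
max_width = max(hi - lo for lo, hi, tag in pieces)
```

For this smooth example the tagged sum differs from $F(1) - F(0)$ by at most the largest piece width, which mirrors the $O(\varepsilon|t_j - t_{j-1}|)$ error terms used in the proof.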
We can express $F(b) - F(a)$ as a telescoping series
\[ F(b) - F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}). \]
To estimate the size of this sum, let us first consider those $j$ for which $t_j^* \in E$. Then, by construction, the intervals $(t_{j-1}, t_j)$ are disjoint in $U$. By construction of $\kappa$, we thus have
\[ \sum_{j : t_j^* \in E} |F(t_j) - F(t_{j-1})| \le \varepsilon \]
and thus
\[ \sum_{j : t_j^* \in E} F(t_j) - F(t_{j-1}) = O(\varepsilon). \]
Next, we consider those $j$ for which $t_j^* \notin E$. By construction, for those $j$ we have
\[ F(t_j) - F(t_j^*) = (t_j - t_j^*) F'(t_j^*) + O(\varepsilon |t_j - t_j^*|) \]
and
\[ F(t_j^*) - F(t_{j-1}) = (t_j^* - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j^* - t_{j-1}|) \]
and thus
\[ F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|). \]
On the other hand, from construction again we have
\[ \int_{[t_{j-1},t_j]} F'(y)\,dy = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|) \]
and thus
\[ F(t_j) - F(t_{j-1}) = \int_{[t_{j-1},t_j]} F'(y)\,dy + O(\varepsilon |t_j - t_{j-1}|). \]
Summing in $j$, we conclude that
\[ \sum_{j : t_j^* \notin E} F(t_j) - F(t_{j-1}) = \int_S F'(y)\,dy + O(\varepsilon(b-a)), \]
where $S$ is the union of all the $[t_{j-1}, t_j]$ with $t_j^* \notin E$. By construction, this set is contained in $[a,b]$ and contains $[a,b] \setminus U$. Since $\int_U |F'(x)|\,dx \le \varepsilon$, we conclude that
\[ \int_S F'(y)\,dy = \int_{[a,b]} F'(y)\,dy + O(\varepsilon). \]
Putting everything together, we conclude that
\[ F(b) - F(a) = \int_{[a,b]} F'(y)\,dy + O(\varepsilon) + O(\varepsilon|b-a|). \]
Since $\varepsilon > 0$ was arbitrary, the claim follows.
Footnote 17. In this notation, we use $O(X)$ to denote a quantity $Y$ whose magnitude $|Y|$ is at most $CX$ for some absolute constant $C$. This notation is convenient for managing error terms when it is not important to keep track of the exact value of constants such as $C$, due to such rules as $O(X) + O(X) = O(X)$.
Combining this result with Exercise 1.6.48, we obtain a satisfactory classification of the absolutely continuous functions:
Exercise 1.6.50. Show that a function $F : [a,b] \to \mathbf{R}$ is absolutely continuous if and only if it takes the form $F(x) = \int_{[a,x]} f(y)\,dy + C$ for some absolutely integrable $f : [a,b] \to \mathbf{R}$ and a constant $C$.
Exercise 1.6.51 (Compatibility of the strong and weak derivatives in the absolutely continuous case). Let $F : [a,b] \to \mathbf{R}$ be an absolutely continuous function, and let $\varphi : [a,b] \to \mathbf{R}$ be a continuously differentiable function supported in a compact subset of $(a,b)$. Show that $\int_{[a,b]} F'\varphi(x)\,dx = -\int_{[a,b]} F\varphi'(x)\,dx$.
Inspecting the proof of Theorem 1.6.40, we see that the absolute continuity was used primarily in two ways: firstly, to ensure the almost everywhere existence of the derivative, and secondly, to control an exceptional null set $E$. It turns out that one can achieve the latter control by making a different hypothesis, namely that the function $F$ is everywhere differentiable rather than merely almost everywhere differentiable. More precisely, we have
Proposition 1.6.41 (Second fundamental theorem of calculus, again). Let $[a,b]$ be a compact interval of positive length, let $F : [a,b] \to \mathbf{R}$ be a differentiable function, such that $F'$ is absolutely integrable. Then the Lebesgue integral $\int_{[a,b]} F'(x)\,dx$ of $F'$ is equal to $F(b) - F(a)$.
Proof. This will be similar to the proof of Theorem 1.6.40, the one main new twist being that we need several open sets $U$ instead of just one. Let $E \subset [a,b]$ be the set of points $x$ which are not Lebesgue points of $F'$, together with the endpoints $a, b$. This is a null set. Let $\varepsilon > 0$, and then let $\kappa > 0$ be small enough that $\int_U |F'(x)|\,dx \le \varepsilon$ whenever $U$ is measurable with $m(U) \le \kappa$. We can also ensure that $\kappa \le \varepsilon$.
For every natural number $m = 1, 2, \ldots$ we can find an open set $U_m$ containing $E$ of measure $m(U_m) \le \kappa/4^m$. In particular we see that $m(\bigcup_{m=1}^\infty U_m) \le \kappa$ and thus $\int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\,dx \le \varepsilon$.
Now define a gauge function $\delta : [a,b] \to (0,+\infty)$ as follows.
(i) If $x \in E$, we define $\delta(x) > 0$ to be small enough that the open interval $(x - \delta(x), x + \delta(x))$ lies in $U_m$, where $m$ is the first natural number such that $|F'(x)| \le 2^m$, and also small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon|y-x|$ holds whenever $|y-x| \le \delta(x)$. (Here we crucially use the everywhere differentiability to ensure that $F'(x)$ exists and is finite here.)
(ii) If $x \notin E$, we let $\delta(x) > 0$ be small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon|y-x|$ holds whenever $|y-x| \le \delta(x)$, and such that $|\frac{1}{|I|} \int_I F'(y)\,dy - F'(x)| \le \varepsilon$ whenever $I$ is an interval containing $x$ of length at most $\delta(x)$, exactly as in the proof of Theorem 1.6.40.
Applying Cousin's theorem, we can find a partition $a = t_0 < t_1 < \ldots < t_k = b$ with $k \ge 1$, together with real numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \le j \le k$, such that $t_j - t_{j-1} \le \delta(t_j^*)$.
As before, we express $F(b) - F(a)$ as a telescoping series
\[ F(b) - F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}). \]
For the contributions of those $j$ with $t_j^* \notin E$, we argue exactly as in the proof of Theorem 1.6.40 to conclude eventually that
\[ \sum_{j : t_j^* \notin E} F(t_j) - F(t_{j-1}) = \int_S F'(y)\,dy + O(\varepsilon(b-a)), \]
where $S$ is the union of all the $[t_{j-1}, t_j]$ with $t_j^* \notin E$. Since
\[ \int_{[a,b] \setminus S} |F'(x)|\,dx \le \int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\,dx \le \varepsilon \]
we thus have
\[ \int_S F'(y)\,dy = \int_{[a,b]} F'(y)\,dy + O(\varepsilon). \]
Now we turn to those $j$ with $t_j^* \in E$. By construction, we have
\[ F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|) \]
for these intervals, and so
\[ \sum_{j : t_j^* \in E} F(t_j) - F(t_{j-1}) = \Bigl( \sum_{j : t_j^* \in E} (t_j - t_{j-1}) F'(t_j^*) \Bigr) + O(\varepsilon(b-a)). \]
Next, for each such $j$ we have $|F'(t_j^*)| \le 2^m$ and $[t_{j-1}, t_j] \subset U_m$ for some natural number $m = 1, 2, \ldots$, by construction. By countable additivity, we conclude that
\[ \Bigl| \sum_{j : t_j^* \in E} (t_j - t_{j-1}) F'(t_j^*) \Bigr| \le \sum_{m=1}^\infty 2^m m(U_m) \le \sum_{m=1}^\infty 2^m \varepsilon/4^m = O(\varepsilon). \]
Putting all this together, we again have
\[ F(b) - F(a) = \int_{[a,b]} F'(y)\,dy + O(\varepsilon) + O(\varepsilon|b-a|). \]
Since $\varepsilon > 0$ was arbitrary, the claim follows.
Remark 1.6.42. The above proposition is yet another illustration of how the property of everywhere differentiability is significantly better than that of almost everywhere differentiability. In practice, though, the above proposition is not as useful as one might initially think, because there are very few methods that establish the everywhere differentiability of a function that do not also establish continuous differentiability (or at least Riemann integrability of the derivative), at which point one could just use Theorem 1.6.7 instead.
Exercise 1.6.52. Let $F : [-1,1] \to \mathbf{R}$ be the function defined by setting $F(x) := x^2 \sin(\frac{1}{x^3})$ when $x$ is nonzero, and $F(0) := 0$. Show that $F$ is everywhere differentiable, but the derivative $F'$ is not absolutely integrable, and so the second fundamental theorem of calculus does not apply in this case (at least if we interpret $\int_{[a,b]} F'(x)\,dx$ using the absolutely convergent Lebesgue integral). See however the next exercise.
Exercise 1.6.53 (Henstock-Kurzweil integral). Let $[a,b]$ be a compact interval of positive length. We say that a function $f : [a,b] \to \mathbf{R}$ is Henstock-Kurzweil integrable with integral $L \in \mathbf{R}$ if for every $\varepsilon > 0$ there exists a gauge function $\delta : [a,b] \to (0,+\infty)$ such that one has
\[ \Bigl| \sum_{j=1}^k f(t_j^*)(t_j - t_{j-1}) - L \Bigr| \le \varepsilon \]
whenever $k \ge 1$ and $a = t_0 < t_1 < \ldots < t_k = b$ and $t_1^*, \ldots, t_k^*$ are such that $t_j^* \in [t_{j-1}, t_j]$ and $|t_j - t_{j-1}| \le \delta(t_j^*)$ for every $1 \le j \le k$. When this occurs, we call $L$ the Henstock-Kurzweil integral of $f$ and write it as $\int_{[a,b]} f(x)\,dx$.
(i) Show that if a function is Henstock-Kurzweil integrable, it has a unique Henstock-Kurzweil integral. (Hint: use Cousin's theorem.)
(ii) Show that if a function is Riemann integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} f(x)\,dx$ is equal to the Riemann integral $\int_a^b f(x)\,dx$.
(iii) Show that if a function $f : [a,b] \to \mathbf{R}$ is everywhere defined, everywhere finite, and is absolutely integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} f(x)\,dx$ is equal to the Lebesgue integral $\int_{[a,b]} f(x)\,dx$. (Hint: this is a variant of the proof of Theorem 1.6.40 or Proposition 1.6.41.)
(iv) Show that if $F : [a,b] \to \mathbf{R}$ is everywhere differentiable, then $F'$ is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} F'(x)\,dx$ is equal to $F(b) - F(a)$. (Hint: this is a variant of the proof of Theorem 1.6.40 or Proposition 1.6.41.)
(v) Explain why the above results give an alternate proof of Exercise 1.6.4 and of Proposition 1.6.41.
Remark 1.6.43. As the above exercise indicates, the Henstock-Kurzweil integral (also known as the Denjoy integral or Perron integral) extends the Riemann integral and the absolutely convergent Lebesgue integral, at least as long as one restricts attention to functions that are defined and are finite everywhere (in contrast to the Lebesgue integral, which is willing to tolerate functions being infinite or undefined so long as this only occurs on a null set). It is the notion of integration that is most naturally associated with the fundamental theorem of calculus for everywhere differentiable functions, as seen in part (iv) of the above exercise; it can also be used as a unified framework for all the proofs in this section that invoked Cousin's theorem. The Henstock-Kurzweil integral can also integrate some (highly oscillatory) functions that the Lebesgue integral cannot, such as the derivative $F'$ of the function $F$ appearing in Exercise 1.6.52. This is analogous to how conditional summation $\lim_{N \to \infty} \sum_{n=1}^N a_n$ can sum conditionally convergent series $\sum_{n=1}^\infty a_n$, even if they are not absolutely convergent. However, much as conditional summation is not always well-behaved with respect to rearrangement, the Henstock-Kurzweil integral does not always react well to changes of variable; also, due to its reliance on the order structure of the real line $\mathbf{R}$, it is difficult to extend the Henstock-Kurzweil integral to more general spaces, such as the Euclidean space $\mathbf{R}^d$, or to abstract measure spaces.
1.7. Outer measures, premeasures, and product
measures
In this text so far, we have focused primarily on one specific example of a countably additive measure, namely Lebesgue measure. This measure was constructed from a more primitive concept of Lebesgue outer measure, which in turn was constructed from the even more primitive concept of elementary measure.
It turns out that both of these constructions can be abstracted. In this section, we will give the Carathéodory extension theorem, which constructs a countably additive measure from any abstract outer measure; this generalises the construction of Lebesgue measure from Lebesgue outer measure. One can in turn construct outer measures from another concept known as a premeasure, of which elementary measure is a typical example.
With these tools, one can start constructing many more measures, such as Lebesgue-Stieltjes measures, product measures, and Hausdorff measures. With a little more effort, one can also establish the Kolmogorov extension theorem, which allows one to construct a variety of measures on infinite-dimensional spaces, and is of particular importance in the foundations of probability theory, as it allows one to set up probability spaces associated to both discrete and continuous random processes, even if they have infinite length.
The most important result about product measure, beyond the fact that it exists, is that one can use it to evaluate iterated integrals, and to interchange their order, provided that the integrand is either unsigned or absolutely integrable. This fact is known as the Fubini-Tonelli theorem, and is an absolutely indispensable tool for computing integrals, and for deducing higher-dimensional results from lower-dimensional ones.
In this section we will however omit a very important way to construct measures, namely the Riesz representation theorem, which is discussed in §1.10 of An epsilon of room, Vol. I.
1.7.1. Outer measures and the Carathéodory extension theorem. We begin with the abstract concept of an outer measure.
Definition 1.7.1 (Abstract outer measure). Let $X$ be a set. An abstract outer measure (or outer measure for short) is a map $\mu^* : 2^X \to [0,+\infty]$ that assigns an unsigned extended real number $\mu^*(E) \in [0,+\infty]$ to every set $E \subset X$ which obeys the following axioms:
(i) (Empty set) $\mu^*(\emptyset) = 0$.
(ii) (Monotonicity) If $E \subset F$, then $\mu^*(E) \le \mu^*(F)$.
(iii) (Countable subadditivity) If $E_1, E_2, \ldots \subset X$ is a countable sequence of subsets of $X$, then $\mu^*(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu^*(E_n)$.
Outer measures are also known as exterior measures.
Thus, for instance, Lebesgue outer measure $m^*$ is an outer measure (see Exercise 1.2.3). On the other hand, Jordan outer measure $m^{*,(J)}$ is only finitely subadditive rather than countably subadditive and thus is not, strictly speaking, an outer measure; for this reason this concept is often referred to as Jordan outer content rather than Jordan outer measure.
Note that outer measures are weaker than measures in that they are merely countably subadditive, rather than countably additive. On the other hand, they are able to measure all subsets of $X$, whereas measures can only measure a σ-algebra of measurable sets.
In Definition 1.2.2, we used Lebesgue outer measure together with the notion of an open set to define the concept of Lebesgue measurability. This definition is not available in our more abstract setting, as we do not necessarily have the notion of an open set. An alternative definition of measurability was put forth in Exercise 1.2.17, but this still required the notion of a box or an elementary set, which is still not available in this setting. Nevertheless, we can modify that definition to give an abstract definition of measurability:
Definition 1.7.2 (Carathéodory measurability). Let $\mu^*$ be an outer measure on a set $X$. A set $E \subset X$ is said to be Carathéodory measurable with respect to $\mu^*$ if one has
\[ \mu^*(A) = \mu^*(A \cap E) + \mu^*(A \setminus E) \]
for every set $A \subset X$.
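On a finite set the Carathéodory condition can be checked by brute force, which gives a feel for which sets it selects. The sketch below uses a toy outer measure that is an assumption made purely for illustration: on $X = \{0,1,2,3\}$, the outer measure of $E$ counts how many of the two blocks $\{0,1\}$ and $\{2,3\}$ are met by $E$ (one checks directly that this obeys the axioms of Definition 1.7.1). The names `outer`, `subsets` and `is_caratheodory_measurable` are likewise hypothetical.

```python
from itertools import combinations

X = frozenset(range(4))
BLOCKS = (frozenset({0, 1}), frozenset({2, 3}))


def outer(E):
    """Toy outer measure: the number of blocks that E intersects."""
    return sum(1 for block in BLOCKS if block & E)


def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    items = sorted(S)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def is_caratheodory_measurable(E):
    """Check mu*(A) = mu*(A & E) + mu*(A - E) for every test set A in X."""
    return all(outer(A) == outer(A & E) + outer(A - E) for A in subsets(X))


measurable = {E for E in subsets(X) if is_caratheodory_measurable(E)}
```

In this example only the unions of blocks pass the test: a singleton such as $\{0\}$ fails against the test set $A = \{0,1\}$, since both pieces of $A$ have outer measure $1$ while $A$ itself has outer measure $1$. The measurable sets form a Boolean algebra on which the toy outer measure is additive, in line with Theorem 1.7.3.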
Exercise 1.7.1 (Null sets are Carathéodory measurable). Suppose that $E$ is a null set for an outer measure $\mu^*$ (i.e. $\mu^*(E) = 0$). Show that $E$ is Carathéodory measurable with respect to $\mu^*$.
Exercise 1.7.2 (Compatibility with Lebesgue measurability). Show that a set $E \subset \mathbf{R}^d$ is Carathéodory measurable with respect to Lebesgue outer measure if and only if it is Lebesgue measurable. (Hint: one direction follows from Exercise 1.2.17. For the other direction, first verify simple cases, such as when $E$ is a box, or when $E$ or $A$ are bounded.)
The construction of Lebesgue measure can then be abstracted as follows:
Theorem 1.7.3 (Carathéodory extension theorem). Let $\mu^* : 2^X \to [0,+\infty]$ be an outer measure on a set $X$, let $\mathcal{B}$ be the collection of all subsets of $X$ that are Carathéodory measurable with respect to $\mu^*$, and let $\mu : \mathcal{B} \to [0,+\infty]$ be the restriction of $\mu^*$ to $\mathcal{B}$ (thus $\mu(E) := \mu^*(E)$ whenever $E \in \mathcal{B}$). Then $\mathcal{B}$ is a σ-algebra, and $\mu$ is a measure.
Proof. We begin with the σ-algebra property. It is easy to see that the empty set lies in $\mathcal{B}$, and that the complement of a set in $\mathcal{B}$ lies in $\mathcal{B}$ also. Next, we verify that $\mathcal{B}$ is closed under finite unions (which will make $\mathcal{B}$ a Boolean algebra). Let $E, F \in \mathcal{B}$, and let $A \subset X$ be arbitrary. By definition, it suffices to show that
\[ \mu^*(A) = \mu^*(A \cap (E \cup F)) + \mu^*(A \setminus (E \cup F)). \tag{1.31} \]
To simplify the notation, we partition $A$ into the four disjoint sets
\[ A_{00} := A \setminus (E \cup F); \quad A_{10} := (A \setminus F) \cap E; \quad A_{01} := (A \setminus E) \cap F; \quad A_{11} := A \cap E \cap F \]
(the reader may wish to draw a Venn diagram here to understand the nature of these sets). Thus (1.31) becomes
\[ \mu^*(A_{00} \cup A_{01} \cup A_{10} \cup A_{11}) = \mu^*(A_{01} \cup A_{10} \cup A_{11}) + \mu^*(A_{00}). \tag{1.32} \]
On the other hand, from the Carathéodory measurability of $E$, one has
\[ \mu^*(A_{00} \cup A_{01} \cup A_{10} \cup A_{11}) = \mu^*(A_{00} \cup A_{01}) + \mu^*(A_{10} \cup A_{11}) \]
and
\[ \mu^*(A_{01} \cup A_{10} \cup A_{11}) = \mu^*(A_{01}) + \mu^*(A_{10} \cup A_{11}) \]
while from the Carathéodory measurability of $F$ one has
\[ \mu^*(A_{00} \cup A_{01}) = \mu^*(A_{00}) + \mu^*(A_{01}); \]
putting these identities together we obtain (1.32). (Note that no subtraction is employed here, and so the arguments still work when some sets have infinite outer measure.)
Now we verify that $\mathcal{B}$ is a σ-algebra. As it is already a Boolean algebra, it suffices (see Exercise 1.7.3 below) to verify that $\mathcal{B}$ is closed with respect to countable disjoint unions. Thus, let $E_1, E_2, \ldots$ be a disjoint sequence of Carathéodory-measurable sets, and let $A$ be arbitrary. We wish to show that
\[ \mu^*(A) = \mu^*\Bigl(A \cap \bigcup_{n=1}^\infty E_n\Bigr) + \mu^*\Bigl(A \setminus \bigcup_{n=1}^\infty E_n\Bigr). \]
In view of subadditivity, it suffices to show that
\[ \mu^*(A) \ge \mu^*\Bigl(A \cap \bigcup_{n=1}^\infty E_n\Bigr) + \mu^*\Bigl(A \setminus \bigcup_{n=1}^\infty E_n\Bigr). \]
For any $N \ge 1$, $\bigcup_{n=1}^N E_n$ is Carathéodory measurable (as $\mathcal{B}$ is a Boolean algebra), and so
\[ \mu^*(A) \ge \mu^*\Bigl(A \cap \bigcup_{n=1}^N E_n\Bigr) + \mu^*\Bigl(A \setminus \bigcup_{n=1}^N E_n\Bigr). \]
By monotonicity, $\mu^*(A \setminus \bigcup_{n=1}^N E_n) \ge \mu^*(A \setminus \bigcup_{n=1}^\infty E_n)$. Taking limits as $N \to \infty$, it thus suffices to show that
\[ \mu^*\Bigl(A \cap \bigcup_{n=1}^\infty E_n\Bigr) \le \lim_{N \to \infty} \mu^*\Bigl(A \cap \bigcup_{n=1}^N E_n\Bigr). \]
But by the Carathéodory measurability of $\bigcup_{n=1}^N E_n$, we have
\[ \mu^*\Bigl(A \cap \bigcup_{n=1}^{N+1} E_n\Bigr) = \mu^*\Bigl(A \cap \bigcup_{n=1}^N E_n\Bigr) + \mu^*\Bigl(A \cap E_{N+1} \setminus \bigcup_{n=1}^N E_n\Bigr) \]
for any $N \ge 0$, and thus on iteration
\[ \lim_{N \to \infty} \mu^*\Bigl(A \cap \bigcup_{n=1}^N E_n\Bigr) = \sum_{N=0}^\infty \mu^*\Bigl(A \cap E_{N+1} \setminus \bigcup_{n=1}^N E_n\Bigr). \]
On the other hand, from countable subadditivity one has
\[ \mu^*\Bigl(A \cap \bigcup_{n=1}^\infty E_n\Bigr) \le \sum_{N=0}^\infty \mu^*\Bigl(A \cap E_{N+1} \setminus \bigcup_{n=1}^N E_n\Bigr) \]
and the claim follows.
Finally, we show that $\mu$ is a measure. It is clear that $\mu(\emptyset) = 0$, so it suffices to establish countable additivity, thus we need to show that
\[ \mu^*\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \sum_{n=1}^\infty \mu^*(E_n) \]
whenever $E_1, E_2, \ldots$ are Carathéodory-measurable and disjoint. By subadditivity it suffices to show that
\[ \mu^*\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) \ge \sum_{n=1}^\infty \mu^*(E_n). \]
By monotonicity it suffices to show that
\[ \mu^*\Bigl(\bigcup_{n=1}^N E_n\Bigr) = \sum_{n=1}^N \mu^*(E_n) \]
for any finite $N$. But from the Carathéodory measurability of $\bigcup_{n=1}^N E_n$ one has
\[ \mu^*\Bigl(\bigcup_{n=1}^{N+1} E_n\Bigr) = \mu^*\Bigl(\bigcup_{n=1}^N E_n\Bigr) + \mu^*(E_{N+1}) \]
for any $N \ge 0$, and the claim follows from induction.
Exercise 1.7.3. Let $\mathcal{B}$ be a Boolean algebra on a set $X$. Show that $\mathcal{B}$ is a σ-algebra if and only if it is closed under countable disjoint unions, which means that $\bigcup_{n=1}^\infty E_n \in \mathcal{B}$ whenever $E_1, E_2, E_3, \ldots \in \mathcal{B}$ are a countable sequence of disjoint sets in $\mathcal{B}$.
Remark 1.7.4. Note that the above theorem, combined with Exercise 1.7.2, gives a slightly alternate way to construct Lebesgue measure from Lebesgue outer measure than the construction given in Section 1.2. This is arguably a more efficient way to proceed, but is also less geometrically intuitive than the approach taken in Section 1.2.
Remark 1.7.5. From Exercise 1.7.1 we see that the measure $\mu$ constructed by the Carathéodory extension theorem is automatically complete (see Definition 1.4.31).
Remark 1.7.6. In §1.15 of An epsilon of room, Vol. I, an important example of a measure constructed by Carathéodory's theorem is given, namely the $d$-dimensional Hausdorff measure $\mathcal{H}^d$ on $\mathbf{R}^n$ that is good for measuring the size of $d$-dimensional subsets of $\mathbf{R}^n$.
1.7.2. Premeasures. In previous sections, we saw that finitely additive measures, such as elementary measure or Jordan measure, could be extended to a countably additive measure, namely Lebesgue measure. It is natural to ask whether this property is true in general. In other words, given a finitely additive measure $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ on a Boolean algebra $\mathcal{B}_0$, is it possible to find a σ-algebra $\mathcal{B}$ refining $\mathcal{B}_0$, and a countably additive measure $\mu : \mathcal{B} \to [0,+\infty]$ that extends $\mu_0$?
There is an obvious necessary condition in order for $\mu_0$ to have a countably additive extension, namely that $\mu_0$ already has to be countably additive within $\mathcal{B}_0$. More precisely, suppose that $E_1, E_2, E_3, \ldots \in \mathcal{B}_0$ were disjoint sets such that their union $\bigcup_{n=1}^\infty E_n$ was also in $\mathcal{B}_0$. (Note that this latter property is not automatic as $\mathcal{B}_0$ is merely a Boolean algebra rather than a σ-algebra.) Then, in order for $\mu_0$ to be extendible to a countably additive measure, it is clearly necessary that
\[ \mu_0\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \sum_{n=1}^\infty \mu_0(E_n). \]
Using the Carathéodory extension theorem, we can show that this necessary condition is also sufficient. More precisely, we have
Definition 1.7.7 (Premeasure). A premeasure on a Boolean algebra $\mathcal{B}_0$ is a finitely additive measure $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ with the property that $\mu_0(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu_0(E_n)$ whenever $E_1, E_2, E_3, \ldots \in \mathcal{B}_0$ are disjoint sets such that $\bigcup_{n=1}^\infty E_n$ is in $\mathcal{B}_0$.
Exercise 1.7.4.
(i) Show that the requirement that $\mu_0$ is finitely additive can be relaxed to the condition that $\mu_0(\emptyset) = 0$ without affecting the definition of a premeasure.
(ii) Show that the condition $\mu_0(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu_0(E_n)$ can be relaxed to $\mu_0(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu_0(E_n)$ without affecting the definition of a premeasure.
(iii) On the other hand, give an example to show that if one performs both of the above two relaxations at once, one starts admitting objects $\mu_0$ that are not premeasures.
Exercise 1.7.5. Without using the theory of Lebesgue measure, show that elementary measure (on the elementary Boolean algebra) is a premeasure. (Hint: use Lemma 1.2.6. Note that one has to also deal with co-elementary sets as well as elementary sets in the elementary Boolean algebra.)
Exercise 1.7.6. Construct a finitely additive measure $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ that is not a premeasure. (Hint: take $X$ to be the natural numbers, take $\mathcal{B}_0 = 2^{\mathbf{N}}$ to be the discrete algebra, and define $\mu_0$ separately for finite and infinite sets.)
Theorem 1.7.8 (Hahn-Kolmogorov theorem). Every premeasure $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ on a Boolean algebra $\mathcal{B}_0$ in $X$ can be extended to a countably additive measure $\mu : \mathcal{B} \to [0,+\infty]$.
Proof. We mimic the construction of Lebesgue measure from elementary measure. Namely, for any set $E \subset X$, define the outer measure $\mu^*(E)$ of $E$ to be the quantity
\[ \mu^*(E) := \inf\Bigl\{ \sum_{n=1}^\infty \mu_0(E_n) : E \subset \bigcup_{n=1}^\infty E_n; \ E_n \in \mathcal{B}_0 \text{ for all } n \Bigr\}. \]
It is easy to verify (cf. Exercise 1.2.3) that $\mu^*$ is indeed an outer measure. Let $\mathcal{B}$ be the collection of all sets $E \subset X$ that are Carathéodory measurable with respect to $\mu^*$, and let $\mu$ be the restriction of $\mu^*$ to $\mathcal{B}$. By the Carathéodory extension theorem, $\mathcal{B}$ is a σ-algebra and $\mu$ is a countably additive measure.
It remains to show that $\mathcal{B}$ contains $\mathcal{B}_0$ and that $\mu$ extends $\mu_0$. Thus, let $E \in \mathcal{B}_0$; we need to show that $E$ is Carathéodory measurable with respect to $\mu^*$ and that $\mu^*(E) = \mu_0(E)$. To prove the first claim, let $A \subset X$ be arbitrary. We need to show that
\[ \mu^*(A) = \mu^*(A \cap E) + \mu^*(A \setminus E); \]
by subadditivity, it suffices to show that
\[ \mu^*(A) \ge \mu^*(A \cap E) + \mu^*(A \setminus E). \]
We may assume that $\mu^*(A)$ is finite, since the claim is trivial otherwise.
Fix $\varepsilon > 0$. By definition of $\mu^*$, one can find $E_1, E_2, \ldots \in \mathcal{B}_0$ covering $A$ such that
\[ \sum_{n=1}^\infty \mu_0(E_n) \le \mu^*(A) + \varepsilon. \]
The sets $E_n \cap E$ lie in $\mathcal{B}_0$ and cover $A \cap E$ and thus
\[ \mu^*(A \cap E) \le \sum_{n=1}^\infty \mu_0(E_n \cap E). \]
Similarly we have
\[ \mu^*(A \setminus E) \le \sum_{n=1}^\infty \mu_0(E_n \setminus E). \]
Meanwhile, from finite additivity we have
\[ \mu_0(E_n \cap E) + \mu_0(E_n \setminus E) = \mu_0(E_n). \]
Combining all of these estimates, we obtain
\[ \mu^*(A \cap E) + \mu^*(A \setminus E) \le \mu^*(A) + \varepsilon; \]
since $\varepsilon > 0$ was arbitrary, the claim follows.
Finally, we have to show that $\mu^*(E) = \mu_0(E)$. Since $E$ covers itself, we certainly have $\mu^*(E) \le \mu_0(E)$. To show the converse inequality, it suffices to show that
\[ \sum_{n=1}^\infty \mu_0(E_n) \ge \mu_0(E) \]
whenever $E_1, E_2, \ldots \in \mathcal{B}_0$ cover $E$. By replacing each $E_n$ with the smaller set $E_n \setminus \bigcup_{m=1}^{n-1} E_m$ (which still lies in $\mathcal{B}_0$, and still covers $E$), we may assume without loss of generality (thanks to the monotonicity of $\mu_0$) that the $E_n$ are disjoint. Similarly, by replacing each $E_n$ with the smaller set $E_n \cap E$ we may assume without loss of generality that the union of the $E_n$ is exactly equal to $E$. But then the claim follows from the hypothesis that $\mu_0$ is a premeasure (and not merely a finitely additive measure).
Let us call the measure $\mu$ constructed in the above proof the Hahn-Kolmogorov extension of the premeasure $\mu_0$. Thus, for instance, from Exercise 1.7.2, the Hahn-Kolmogorov extension of elementary measure (with the convention that co-elementary sets have infinite elementary measure) is Lebesgue measure. This is not quite the unique extension of $\mu_0$ to a countably additive measure, though. For instance, one could restrict Lebesgue measure to the Borel σ-algebra, and this would still be a countably additive extension of elementary measure. However, the extension is unique within its own σ-algebra:
Exercise 1.7.7. Let $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ be a premeasure, let $\mu : \mathcal{B} \to [0,+\infty]$ be the Hahn-Kolmogorov extension of $\mu_0$, and let $\mu' : \mathcal{B}' \to [0,+\infty]$ be another countably additive extension of $\mu_0$. Suppose also that $\mu_0$ is σ-finite, which means that one can express the whole space $X$ as the countable union of sets $E_1, E_2, \ldots \in \mathcal{B}_0$ for which $\mu_0(E_n) < \infty$ for all $n$. Show that $\mu$ and $\mu'$ agree on their common domain of definition. In other words, show that $\mu(E) = \mu'(E)$ for all $E \in \mathcal{B} \cap \mathcal{B}'$. (Hint: first show that $\mu'(E) \le \mu^*(E)$ for all $E \in \mathcal{B}'$.)
Exercise 1.7.8. The purpose of this exercise is to show that the σ-finite hypothesis in Exercise 1.7.7 cannot be removed. Let $\mathcal{A}$ be the collection of all subsets in $\mathbf{R}$ that can be expressed as finite unions of half-open intervals $[a,b)$. Let $\mu_0 : \mathcal{A} \to [0,+\infty]$ be the function such that $\mu_0(E) = +\infty$ for nonempty $E$ and $\mu_0(\emptyset) = 0$.
(i) Show that $\mu_0$ is a premeasure.
(ii) Show that $\langle \mathcal{A} \rangle$ is the Borel σ-algebra $\mathcal{B}[\mathbf{R}]$.
(iii) Show that the Hahn-Kolmogorov extension $\mu : \mathcal{B}[\mathbf{R}] \to [0,+\infty]$ of $\mu_0$ assigns an infinite measure to any nonempty Borel set.
(iv) Show that counting measure $\#$ (or more generally, $c\#$ for any $c \in (0,+\infty]$) is another extension of $\mu_0$ on $\mathcal{B}[\mathbf{R}]$.
Exercise 1.7.9. Let $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ be a premeasure which is σ-finite (thus $X$ is the countable union of sets in $\mathcal{B}_0$ of finite $\mu_0$-measure), and let $\mu : \mathcal{B} \to [0,+\infty]$ be the Hahn-Kolmogorov extension of $\mu_0$.
(i) Show that if $E \in \mathcal{B}$, then there exists $F \in \langle \mathcal{B}_0 \rangle$ containing $E$ such that $\mu(F \setminus E) = 0$ (thus $F$ consists of the union of $E$ and a null set). Furthermore, show that $F$ can be chosen to be a countable intersection $F = \bigcap_{n=1}^\infty F_n$ of sets $F_n$, each of which is a countable union $F_n = \bigcup_{m=1}^\infty F_{n,m}$ of sets $F_{n,m}$ in $\mathcal{B}_0$.
(ii) If $E \in \mathcal{B}$ has finite measure (i.e. $\mu(E) < \infty$), and $\varepsilon > 0$, show that there exists $F \in \mathcal{B}_0$ such that $\mu(E \Delta F) \le \varepsilon$.
(iii) Conversely, if $E$ is a set such that for every $\varepsilon > 0$ there exists $F \in \mathcal{B}_0$ such that $\mu^*(E \Delta F) \le \varepsilon$, show that $E \in \mathcal{B}$.
1.7.3. Lebesgue-Stieltjes measure. Now we use the Hahn-Kolmogorov extension theorem to construct a variety of measures. We begin with Lebesgue-Stieltjes measure.
Theorem 1.7.9 (Existence of Lebesgue-Stieltjes measure). Let $F : \mathbf{R} \to \mathbf{R}$ be a monotone non-decreasing function, and define the left and right limits
$$F_-(x) := \sup_{y<x} F(y); \quad F_+(x) := \inf_{y>x} F(y),$$
thus one has $F_-(x) \le F(x) \le F_+(x)$ for all $x$. Let $\mathcal{B}[\mathbf{R}]$ be the Borel σ-algebra on $\mathbf{R}$. Then there exists a unique Borel measure $\mu_F : \mathcal{B}[\mathbf{R}] \to [0,+\infty]$ such that
(1.33)
$$\begin{aligned}
\mu_F([a,b]) &= F_+(b) - F_-(a), & \mu_F([a,b)) &= F_-(b) - F_-(a),\\
\mu_F((a,b]) &= F_+(b) - F_+(a), & \mu_F((a,b)) &= F_-(b) - F_+(a)
\end{aligned}$$
for all $-\infty < a < b < \infty$, and
(1.34)
$$\mu_F(\{a\}) = F_+(a) - F_-(a)$$
for all $a \in \mathbf{R}$.
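As an illustration of the formulae (1.33) and (1.34), here is a minimal Python sketch for one particular choice of $F$. The choice $F(x) = x + \sum_{p \le x} c_p$ with two jumps, and all function names, are assumptions made for this example only; for such an $F$ the one-sided limits can be written down exactly, so no numerical limit is needed.

```python
from fractions import Fraction

# Hypothetical example data: F(x) = x + (sum of the jumps c_p at points p <= x).
# This F is monotone non-decreasing and right-continuous at each jump.
JUMPS = {Fraction(0): Fraction(2), Fraction(1): Fraction(1, 2)}

def F_minus(x):
    """Left limit F_-(x) = sup_{y<x} F(y): only jumps strictly left of x count."""
    return x + sum(c for p, c in JUMPS.items() if p < x)

def F_plus(x):
    """Right limit F_+(x) = inf_{y>x} F(y): jumps at or to the left of x count."""
    return x + sum(c for p, c in JUMPS.items() if p <= x)

def mu_closed(a, b):       # mu_F([a, b])
    return F_plus(b) - F_minus(a)

def mu_closed_open(a, b):  # mu_F([a, b))
    return F_minus(b) - F_minus(a)

def mu_open_closed(a, b):  # mu_F((a, b])
    return F_plus(b) - F_plus(a)

def mu_open(a, b):         # mu_F((a, b))
    return F_minus(b) - F_plus(a)

def mu_point(a):           # mu_F({a}), formula (1.34)
    return F_plus(a) - F_minus(a)
```

With this choice of jumps, $\mu_F([0,1]) = 7/2$, which splits as $\mu_F(\{0\}) + \mu_F((0,1)) + \mu_F(\{1\}) = 2 + 1 + 1/2$, as the additivity of a measure requires.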
Proof. (Sketch) For this proof, we will deviate from our previous notational conventions, and allow intervals to be unbounded, thus in particular including the half-infinite intervals $[a,+\infty)$, $(a,+\infty)$, $(-\infty,a]$, $(-\infty,a)$ and the doubly infinite interval $(-\infty,+\infty)$ as intervals.

Define the $F$-volume $|I|_F \in [0,+\infty]$ of any interval $I$, adopting the obvious conventions that $F_-(+\infty) = \sup_{y \in \mathbf{R}} F(y)$ and $F_+(-\infty) = \inf_{y \in \mathbf{R}} F(y)$, and also adopting the convention that the empty interval $\emptyset$ has zero $F$-volume, $|\emptyset|_F = 0$. Note that $F_-(+\infty)$ could equal $+\infty$ and $F_+(-\infty)$ could equal $-\infty$, but in all circumstances the $F$-volume $|I|_F$ is well-defined and takes values in $[0,+\infty]$, after adopting the obvious conventions to evaluate expressions such as $+\infty - (-\infty)$.
A somewhat tedious case check (Exercise!) gives the additivity property
$$|I \cup J|_F = |I|_F + |J|_F$$
whenever $I, J$ are disjoint intervals that share a common endpoint. As a corollary, we see that if an interval $I$ is partitioned into finitely many disjoint subintervals $I_1, \ldots, I_k$, we have $|I|_F = |I_1|_F + \ldots + |I_k|_F$.
Let $\mathcal{B}_0$ be the Boolean algebra generated by the (possibly infinite) intervals; then $\mathcal{B}_0$ consists of those sets that can be expressed as a finite union of intervals. (This is slightly larger than the elementary algebra, as it allows for half-infinite intervals such as $[0,+\infty)$, whereas the elementary algebra does not.) We can define a measure $\mu_0$ on this algebra by declaring
$$\mu_0(E) = |I_1|_F + \ldots + |I_k|_F$$
whenever $E = I_1 \cup \ldots \cup I_k$ is the disjoint union of finitely many intervals. One can check (Exercise!) that this measure is well-defined (in the sense that it gives a unique value to $\mu_0(E)$ for each $E \in \mathcal{B}_0$) and is finitely additive. We now claim that $\mu_0$ is a premeasure: thus we suppose that $E \in \mathcal{B}_0$ is the disjoint union of countably many sets $E_1, E_2, \ldots \in \mathcal{B}_0$, and wish to show that
$$\mu_0(E) = \sum_{n=1}^\infty \mu_0(E_n).$$
By splitting up $E$ into intervals and then intersecting each of the $E_n$ with these intervals and using finite additivity, we may assume that $E$ is a single interval. By splitting up the $E_n$ into their component intervals and using finite additivity, we may assume that the $E_n$ are also individual intervals. Finite additivity and monotonicity already give $\sum_{n=1}^N \mu_0(E_n) \le \mu_0(E)$ for every $N$, so it suffices to show that
$$\mu_0(E) \le \sum_{n=1}^\infty \mu_0(E_n).$$
By the definition of $\mu_0(E)$, one can check that
(1.35)
$$\mu_0(E) = \sup_{K \subset E} \mu_0(K)$$
where $K$ ranges over all compact intervals contained in $E$ (Exercise!). Thus, it suffices to show that
$$\mu_0(K) \le \sum_{n=1}^\infty \mu_0(E_n)$$
for each compact subinterval $K$ of $E$. In a similar spirit, one can show that
$$\mu_0(E_n) = \inf_{U \supset E_n} \mu_0(U)$$
where $U$ ranges over all open intervals containing $E_n$ (Exercise!). Using the $\varepsilon/2^n$ trick, it thus suffices to show that
$$\mu_0(K) \le \sum_{n=1}^\infty \mu_0(U_n)$$
whenever $U_n$ is an open interval containing $E_n$. But by the Heine-Borel theorem, one can cover $K$ by a finite number $\bigcup_{n=1}^N U_n$ of the $U_n$, hence by finite subadditivity
$$\mu_0(K) \le \sum_{n=1}^N \mu_0(U_n)$$
and the claim follows.
As $\mu_0$ is now verified to be a premeasure, we may use the Hahn-Kolmogorov extension theorem to extend it to a countably additive measure $\mu$ on a σ-algebra $\mathcal{B}$ that contains $\mathcal{B}_0$. In particular, $\mathcal{B}$ contains all the elementary sets and hence (by Exercise 1.4.14) contains the Borel σ-algebra. Restricting $\mu$ to the Borel σ-algebra we obtain the existence claim.

Finally, we establish uniqueness. If $\mu'$ is another Borel measure with the stated properties, then $\mu'(K) = |K|_F$ for every compact interval $K$, and hence by (1.35) and upward monotone convergence, one has $\mu'(I) = |I|_F$ for every interval (including the unbounded ones). This implies that $\mu'$ agrees with $\mu_0$ on $\mathcal{B}_0$, and thus (by Exercise 1.7.7, noting that $\mu_0$ is σ-finite) agrees with $\mu$ on Borel measurable sets.
Exercise 1.7.10. Verify the claims marked "Exercise!" in the above proof.

The measure $\mu_F$ given by the above theorem is known as the Lebesgue-Stieltjes measure $\mu_F$ of $F$. (In some texts, this measure is only defined when $F$ is right-continuous, or equivalently if $F = F_+$.)
Exercise 1.7.11. Define a Radon measure on $\mathbf{R}$ to be a Borel measure $\mu$ obeying the following additional properties:
(i) (Local finiteness) $\mu(K) < \infty$ for every compact $K$.
(ii) (Inner regularity) One has $\mu(E) = \sup_{K \subset E, K \text{ compact}} \mu(K)$ for every Borel set $E$.
(iii) (Outer regularity) One has $\mu(E) = \inf_{U \supset E, U \text{ open}} \mu(U)$ for every Borel set $E$.
Show that for every monotone function $F : \mathbf{R} \to \mathbf{R}$, the Lebesgue-Stieltjes measure $\mu_F$ is a Radon measure on $\mathbf{R}$; conversely, if $\mu$ is a Radon measure on $\mathbf{R}$, show that there exists a monotone function $F : \mathbf{R} \to \mathbf{R}$ such that $\mu = \mu_F$.

Radon measures are studied in more detail in §1.10 of An epsilon of room, Vol. I.
Exercise 1.7.12 (Near uniqueness). If $F, F' : \mathbf{R} \to \mathbf{R}$ are monotone non-decreasing functions, show that $\mu_F = \mu_{F'}$ if and only if there exists a constant $C \in \mathbf{R}$ such that $F_+(x) = F'_+(x) + C$ and $F_-(x) = F'_-(x) + C$ for all $x \in \mathbf{R}$. Note that this implies that the values of $F$ at its points of discontinuity are irrelevant for the purposes of determining the Lebesgue-Stieltjes measure $\mu_F$; in particular, $\mu_F = \mu_{F_+} = \mu_{F_-}$.
In the special case when $F_+(-\infty) = 0$ and $F_-(+\infty) = 1$, $\mu_F$ is a probability measure, and $F_+(x) = \mu_F((-\infty,x])$ is known as the cumulative distribution function of $\mu_F$.

Now we give some examples of Lebesgue-Stieltjes measure.
Exercise 1.7.13 (Lebesgue-Stieltjes measure, absolutely continuous case).
(i) If $F : \mathbf{R} \to \mathbf{R}$ is the identity function $F(x) = x$, show that $\mu_F$ is equal to Lebesgue measure $m$.
(ii) If $F : \mathbf{R} \to \mathbf{R}$ is monotone non-decreasing and absolutely continuous (which in particular implies that $F'$ exists and is absolutely integrable), show that $\mu_F = m_{F'}$ in the sense of Exercise 1.4.49, thus
$$\mu_F(E) = \int_E F'(x)\, dx$$
for any Borel measurable $E$, and
$$\int_{\mathbf{R}} f(x)\, d\mu_F(x) = \int_{\mathbf{R}} f(x) F'(x)\, dx$$
for any unsigned Borel measurable $f : \mathbf{R} \to [0,+\infty]$.

In view of the above exercise, the integral $\int_{\mathbf{R}} f\, d\mu_F$ is often abbreviated $\int_{\mathbf{R}} f\, dF$, and referred to as the Lebesgue-Stieltjes integral of $f$ with respect to $F$. In particular, observe the identity
$$\int_{[a,b]} dF = F_+(b) - F_-(a)$$
for any monotone non-decreasing $F : \mathbf{R} \to \mathbf{R}$ and any $-\infty < a < b < +\infty$, which can be viewed as yet another formulation of the fundamental theorem of calculus.
Exercise 1.7.14 (Lebesgue-Stieltjes measure, pure point case).
(i) If $H : \mathbf{R} \to \mathbf{R}$ is the Heaviside function $H := 1_{[0,+\infty)}$, show that $\mu_H$ is equal to the Dirac measure $\delta_0$ at the origin (defined in Example 1.4.22).
(ii) If $F = \sum_n c_n J_n$ is a jump function (as defined in Definition 1.6.30), show that $\mu_F$ is equal to the linear combination $\sum_n c_n \delta_{x_n}$ of Dirac measures (as defined in Example 1.4.22), where $x_n$ is the point of discontinuity for the basic jump function $J_n$.
Exercise 1.7.15 (Lebesgue-Stieltjes measure, singular continuous case).
(i) If $F : \mathbf{R} \to \mathbf{R}$ is a monotone non-decreasing function, show that $F$ is continuous if and only if $\mu_F(\{x\}) = 0$ for all $x \in \mathbf{R}$.
(ii) If $F$ is the Cantor function (defined in Exercise 1.6.47), show that $\mu_F$ is a probability measure supported on the middle-thirds Cantor set $C$ (see Exercise 1.2.9) in the sense that $\mu_F(\mathbf{R} \setminus C) = 0$. The measure $\mu_F$ is known as Cantor measure.
(iii) If $\mu = \mu_F$ is Cantor measure, establish the self-similarity properties $\mu(\frac{1}{3} E) = \frac{1}{2} \mu(E)$ and $\mu(\frac{1}{3} E + \frac{2}{3}) = \frac{1}{2} \mu(E)$ for every Borel-measurable $E \subset [0,1]$, where $\frac{1}{3} E := \{\frac{1}{3} x : x \in E\}$.
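The self-similarity in part (iii) can be checked on intervals with a short Python sketch. It assumes the usual recursive description of the Cantor function ($F(x) = \frac12 F(3x)$ on $[0,\frac13]$, $F = \frac12$ on $[\frac13,\frac23]$, $F(x) = \frac12 + \frac12 F(3x-2)$ on $[\frac23,1]$); the function names and the recursion depth are choices made for this example. Since $F$ is continuous, $\mu_F$ of an interval with endpoints $a < b$ is $F(b) - F(a)$.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function at x, extended by 0 to the left of 0 and 1 to the right of 1.

    Exact when x has a finite ternary expansion of length <= depth;
    otherwise the recursion is truncated with error at most 2**-depth.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    if depth == 0:
        return Fraction(1, 2)  # truncation value for non-terminating expansions
    if x < Fraction(1, 3):
        return cantor(3 * x, depth - 1) / 2
    if x > Fraction(2, 3):
        return Fraction(1, 2) + cantor(3 * x - 2, depth - 1) / 2
    return Fraction(1, 2)      # constant on the removed middle third

def cantor_measure(a, b):
    """Cantor measure of an interval with endpoints a <= b (F is continuous)."""
    return cantor(b) - cantor(a)

a, b = Fraction(1, 9), Fraction(7, 9)
third = Fraction(1, 3)
shift = Fraction(2, 3)
whole = cantor_measure(a, b)
left_copy = cantor_measure(a * third, b * third)
right_copy = cantor_measure(a * third + shift, b * third + shift)
```

Here `left_copy` and `right_copy` both come out as exactly half of `whole`, and the removed middle third $(\frac13, \frac23)$ receives measure zero, consistent with parts (ii) and (iii) for intervals.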
Exercise 1.7.16 (Connection with Riemann-Stieltjes integral). Let $F : \mathbf{R} \to \mathbf{R}$ be monotone non-decreasing, let $[a,b]$ be a compact interval, and let $f : [a,b] \to \mathbf{R}$ be continuous. Suppose that $F$ is continuous at the endpoints $a, b$ of the interval. Show that for every $\varepsilon > 0$ there exists $\delta > 0$ such that
$$\left| \sum_{i=1}^n f(t_i^*) (F(t_i) - F(t_{i-1})) - \int_{[a,b]} f\, dF \right| \le \varepsilon$$
whenever $a = t_0 < t_1 < \ldots < t_n = b$ and $t_i^* \in [t_{i-1}, t_i]$ for $1 \le i \le n$ are such that $\sup_{1 \le i \le n} |t_i - t_{i-1}| \le \delta$. In the language of the Riemann-Stieltjes integral, this result asserts that the Lebesgue-Stieltjes integral extends the Riemann-Stieltjes integral.
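The sums appearing in this exercise are easy to compute numerically. The following Python sketch uses a uniform partition with midpoint tags, which is one admissible choice among many; the function name and the test case $f(t) = t$, $F(t) = t^2$ on $[0,1]$ are assumptions for illustration. For this test case, Exercise 1.7.13(ii) gives the exact value $\int_{[0,1]} t \cdot 2t\, dt = 2/3$.

```python
def riemann_stieltjes_sum(f, F, a, b, n):
    """Sum of f(t_i*) (F(t_i) - F(t_{i-1})) over a uniform partition of [a, b]
    into n pieces, with each tag t_i* taken at the midpoint of [t_{i-1}, t_i]."""
    total = 0.0
    for i in range(1, n + 1):
        t_prev = a + (b - a) * (i - 1) / n
        t_cur = a + (b - a) * i / n
        tag = 0.5 * (t_prev + t_cur)
        total += f(tag) * (F(t_cur) - F(t_prev))
    return total

# Illustrative choice: f(t) = t, F(t) = t^2, so the integral of f dF is 2/3.
approx = riemann_stieltjes_sum(lambda t: t, lambda t: t * t, 0.0, 1.0, 1000)
```

As the mesh $1/n$ shrinks, `approx` approaches $2/3$, in line with the conclusion of the exercise.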
Exercise 1.7.17 (Integration by parts formula). Let $F, G : \mathbf{R} \to \mathbf{R}$ be monotone non-decreasing and continuous. Show that
$$\int_{[a,b]} F\, dG = -\int_{[a,b]} G\, dF + F(b)G(b) - F(a)G(a)$$
for any compact interval $[a,b]$. (Hint: use Exercise 1.7.16.) This formula can be partially extended to the case when one or both of $F, G$ have discontinuities, but care must be taken when $F$ and $G$ are simultaneously discontinuous at the same location.
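A numerical sanity check of the integration by parts formula, as a hedged Python sketch: the integrals are approximated by Riemann-Stieltjes sums (justified by Exercise 1.7.16), and the choices $F(t) = t^2$, $G(t) = t^3$ on $[0,1]$ as well as the helper name `stieltjes_sum` are assumptions for this example. Here $\int F\, dG = 3/5$ and $\int G\, dF = 2/5$, while the boundary term equals $1$.

```python
def stieltjes_sum(f, G, a, b, n):
    """Midpoint-tagged Riemann-Stieltjes sum approximating the integral of f dG on [a, b]."""
    total = 0.0
    for i in range(1, n + 1):
        t_prev = a + (b - a) * (i - 1) / n
        t_cur = a + (b - a) * i / n
        total += f(0.5 * (t_prev + t_cur)) * (G(t_cur) - G(t_prev))
    return total

def F(t):
    return t * t        # illustrative monotone continuous function on [0, 1]

def G(t):
    return t * t * t    # illustrative monotone continuous function on [0, 1]

a, b, n = 0.0, 1.0, 2000
lhs = stieltjes_sum(F, G, a, b, n)                              # approximates 3/5
rhs = -stieltjes_sum(G, F, a, b, n) + F(b) * G(b) - F(a) * G(a)  # approximates -2/5 + 1
```

The two sides agree up to the discretisation error of the sums.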
1.7.4. Product measure. Given two sets $X$ and $Y$, one can form their Cartesian product $X \times Y = \{(x,y) : x \in X, y \in Y\}$. This set is naturally equipped with the coordinate projection maps $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ defined by setting $\pi_X(x,y) := x$ and $\pi_Y(x,y) := y$. One can certainly take Cartesian products $X_1 \times \ldots \times X_d$ of more than two sets, or even take an infinite product $\prod_{\alpha \in A} X_\alpha$, but for simplicity we will only discuss the theory for products of two sets for now.
Now suppose that $(X, \mathcal{B}_X)$ and $(Y, \mathcal{B}_Y)$ are measurable spaces. Then we can still form the Cartesian product $X \times Y$ and the projection maps $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$. But now we can also form the pullback σ-algebras
$$\pi_X^*(\mathcal{B}_X) := \{\pi_X^{-1}(E) : E \in \mathcal{B}_X\} = \{E \times Y : E \in \mathcal{B}_X\}$$
and
$$\pi_Y^*(\mathcal{B}_Y) := \{\pi_Y^{-1}(F) : F \in \mathcal{B}_Y\} = \{X \times F : F \in \mathcal{B}_Y\}.$$
We then define the product σ-algebra $\mathcal{B}_X \times \mathcal{B}_Y$ to be the σ-algebra generated by the union of these two σ-algebras:
$$\mathcal{B}_X \times \mathcal{B}_Y := \langle \pi_X^*(\mathcal{B}_X) \cup \pi_Y^*(\mathcal{B}_Y) \rangle.$$
This definition has several equivalent formulations:
Exercise 1.7.18. Let $(X, \mathcal{B}_X)$ and $(Y, \mathcal{B}_Y)$ be measurable spaces.
(i) Show that $\mathcal{B}_X \times \mathcal{B}_Y$ is the σ-algebra generated by the sets $E \times F$ with $E \in \mathcal{B}_X$, $F \in \mathcal{B}_Y$. In other words, $\mathcal{B}_X \times \mathcal{B}_Y$ is the coarsest σ-algebra on $X \times Y$ with the property that the product of a $\mathcal{B}_X$-measurable set and a $\mathcal{B}_Y$-measurable set is always $\mathcal{B}_X \times \mathcal{B}_Y$-measurable.
(ii) Show that $\mathcal{B}_X \times \mathcal{B}_Y$ is the coarsest σ-algebra on $X \times Y$ that makes the projection maps $\pi_X, \pi_Y$ both measurable morphisms (see Remark 1.4.33).
(iii) If $E \in \mathcal{B}_X \times \mathcal{B}_Y$, show that the sets $E_x := \{y \in Y : (x,y) \in E\}$ lie in $\mathcal{B}_Y$ for every $x \in X$, and similarly that the sets $E^y := \{x \in X : (x,y) \in E\}$ lie in $\mathcal{B}_X$ for every $y \in Y$.
(iv) If $f : X \times Y \to [0,+\infty]$ is measurable (with respect to $\mathcal{B}_X \times \mathcal{B}_Y$), show that the function $f_x : y \mapsto f(x,y)$ is $\mathcal{B}_Y$-measurable for every $x \in X$, and similarly that the function $f^y : x \mapsto f(x,y)$ is $\mathcal{B}_X$-measurable for every $y \in Y$.
(v) If $E \in \mathcal{B}_X \times \mathcal{B}_Y$, show that the slices $E_x := \{y \in Y : (x,y) \in E\}$ lie in a countably generated σ-algebra. In other words, show that there exists an at most countable collection $\mathcal{A} = \mathcal{A}_E$ of sets (which can depend on $E$) such that $\{E_x : x \in X\} \subset \langle \mathcal{A} \rangle$. Conclude in particular that the number of distinct slices $E_x$ is at most $c$, the cardinality of the continuum. (The last part of this exercise is only suitable for students who are comfortable with cardinal arithmetic.)
Exercise 1.7.19.
(i) Show that the product of two trivial σ-algebras (on two different spaces $X, Y$) is again trivial.
(ii) Show that the product of two atomic σ-algebras is again atomic.
(iii) Show that the product of two finite σ-algebras is again finite.
(iv) Show that the product of two Borel σ-algebras (on two Euclidean spaces $\mathbf{R}^d, \mathbf{R}^{d'}$ with $d, d' \ge 1$) is again the Borel σ-algebra (on $\mathbf{R}^d \times \mathbf{R}^{d'} \equiv \mathbf{R}^{d+d'}$).
(v) Show that the product of two Lebesgue σ-algebras (on two Euclidean spaces $\mathbf{R}^d, \mathbf{R}^{d'}$ with $d, d' \ge 1$) is not the Lebesgue σ-algebra. (Hint: argue by contradiction and use Exercise 1.7.18(iii).)
(vi) However, show that the Lebesgue σ-algebra on $\mathbf{R}^{d+d'}$ is the completion (see Exercise 1.4.26) of the product of the Lebesgue σ-algebras of $\mathbf{R}^d$ and $\mathbf{R}^{d'}$ with respect to $d+d'$-dimensional Lebesgue measure.
(vii) This part of the exercise is only for students who are comfortable with cardinal arithmetic. Give an example to show that the product of two discrete σ-algebras is not necessarily discrete.
(viii) On the other hand, show that the product of two discrete σ-algebras $2^X, 2^Y$ is again a discrete σ-algebra if at least one of the domains $X, Y$ is at most countably infinite.
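For finite sets, the product σ-algebra can be computed by brute force, which gives a concrete feel for Exercise 1.7.18(i), (iii) and Exercise 1.7.19(i), (iii). The Python sketch below is an illustration only: the function names and the small example spaces are assumptions, and the closure loop relies on the fact that on a finite set a family closed under complements and pairwise unions is already a σ-algebra.

```python
from itertools import product

def generate_sigma_algebra(universe, generators):
    """Smallest family of subsets of a finite universe that contains the
    generators and is closed under complements and pairwise unions."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            for b in list(family):
                for new in (universe - a, a | b):
                    if new not in family:
                        family.add(new)
                        changed = True
    return family

def product_sigma_algebra(X, BX, Y, BY):
    """Sigma-algebra on X x Y generated by the rectangles E x F, E in BX, F in BY."""
    rectangles = [frozenset(product(E, F)) for E in BX for F in BY]
    return generate_sigma_algebra(product(X, Y), rectangles)

# Hypothetical example spaces.
X = {0, 1, 2}
Y = {"a", "b"}
BX = generate_sigma_algebra(X, [{0}])    # atoms {0} and {1, 2}
BY = generate_sigma_algebra(Y, [{"a"}])  # discrete sigma-algebra on Y
BXY = product_sigma_algebra(X, BX, Y, BY)

def slice_at(S, x):
    """The slice S_x = {y : (x, y) in S} from Exercise 1.7.18(iii)."""
    return frozenset(y for (x0, y) in S if x0 == x)
```

In this example the product has four atoms (the products of the atoms of `BX` and `BY`) and hence sixteen elements, and every slice of every set in `BXY` lies in `BY`.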
Now suppose we have two measure spaces $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$. Given that we can multiply together the sets $X$ and $Y$ to form a product set $X \times Y$, and can multiply the σ-algebras $\mathcal{B}_X$ and $\mathcal{B}_Y$ together to form a product σ-algebra $\mathcal{B}_X \times \mathcal{B}_Y$, it is natural to expect that we can multiply the two measures $\mu_X : \mathcal{B}_X \to [0,+\infty]$ and $\mu_Y : \mathcal{B}_Y \to [0,+\infty]$ to form a product measure $\mu_X \times \mu_Y : \mathcal{B}_X \times \mathcal{B}_Y \to [0,+\infty]$. In view of the "base times height formula" that one learns in elementary school, one expects to have
(1.36)
$$\mu_X \times \mu_Y(E \times F) = \mu_X(E)\mu_Y(F)$$
whenever $E \in \mathcal{B}_X$ and $F \in \mathcal{B}_Y$.
To construct this measure, it is convenient to make the assumption that both spaces are σ-finite:

Definition 1.7.10 (σ-finite). A measure space $(X, \mathcal{B}, \mu)$ is σ-finite if $X$ can be expressed as the countable union of sets of finite measure.

Thus, for instance, $\mathbf{R}^d$ with Lebesgue measure is σ-finite, as $\mathbf{R}^d$ can be expressed as the union of (for instance) the balls $B(0,n)$ for $n = 1, 2, 3, \ldots$, each of which has finite measure. On the other hand, $\mathbf{R}^d$ with counting measure is not σ-finite (why?). But most measure spaces that one actually encounters in analysis (including, clearly, all probability spaces) are σ-finite. It is possible to partially extend the theory of product spaces to the non-σ-finite setting, but there are a number of very delicate technical issues that arise and so we will not discuss them here.

As long as we restrict attention to the σ-finite case, product measure always exists and is unique:
Proposition 1.7.11 (Existence and uniqueness of product measure). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be σ-finite measure spaces. Then there exists a unique measure $\mu_X \times \mu_Y$ on $\mathcal{B}_X \times \mathcal{B}_Y$ that obeys $\mu_X \times \mu_Y(E \times F) = \mu_X(E)\mu_Y(F)$ whenever $E \in \mathcal{B}_X$ and $F \in \mathcal{B}_Y$.

Proof. We first show existence. Inspired by the fact that Lebesgue measure is the Hahn-Kolmogorov extension of elementary (pre-)measure, we shall first construct an "elementary product premeasure" to which we will then apply Theorem 1.7.8.
Let $\mathcal{B}_0$ be the collection of all finite unions
$$S := (E_1 \times F_1) \cup \ldots \cup (E_k \times F_k)$$
of Cartesian products of $\mathcal{B}_X$-measurable sets $E_1, \ldots, E_k$ and $\mathcal{B}_Y$-measurable sets $F_1, \ldots, F_k$. (One can think of such sets as being somewhat analogous to elementary sets in Euclidean space, although the analogy is not perfectly exact.) It is not difficult to verify that this is a Boolean algebra (though it is not, in general, a σ-algebra). Also, any set in $\mathcal{B}_0$ can be easily decomposed into a disjoint union of product sets $E_1 \times F_1, \ldots, E_k \times F_k$ of $\mathcal{B}_X$-measurable sets and $\mathcal{B}_Y$-measurable sets (cf. Exercise 1.1.2). We then define the quantity $\mu_0(S)$ associated to such a disjoint union $S$ by the formula
$$\mu_0(S) := \sum_{j=1}^k \mu_X(E_j)\mu_Y(F_j)$$
whenever $S$ is the disjoint union of products $E_1 \times F_1, \ldots, E_k \times F_k$ of $\mathcal{B}_X$-measurable sets and $\mathcal{B}_Y$-measurable sets. One can show that this definition does not depend on exactly how $S$ is decomposed, and gives a finitely additive measure $\mu_0 : \mathcal{B}_0 \to [0,+\infty]$ (cf. Exercise 1.1.2 and Exercise 1.4.33).
Now we show that $\mu_0$ is a premeasure. It suffices to show that if $S \in \mathcal{B}_0$ is the countable disjoint union of sets $S_1, S_2, \ldots \in \mathcal{B}_0$, then $\mu_0(S) = \sum_{n=1}^\infty \mu_0(S_n)$.

Splitting $S$ up into disjoint product sets, and restricting the $S_n$ to each of these product sets in turn, we may assume without loss of generality (using the finite additivity of $\mu_0$) that $S = E \times F$ for some $E \in \mathcal{B}_X$ and $F \in \mathcal{B}_Y$. In a similar spirit, by breaking each $S_n$ up into component product sets and using finite additivity again, we may assume without loss of generality that each $S_n$ takes the form $S_n = E_n \times F_n$ for some $E_n \in \mathcal{B}_X$ and $F_n \in \mathcal{B}_Y$. By definition of $\mu_0$, our objective is now to show that
$$\mu_X(E)\mu_Y(F) = \sum_{n=1}^\infty \mu_X(E_n)\mu_Y(F_n).$$
To do this, first observe from construction that we have the pointwise identity
$$1_E(x) 1_F(y) = \sum_{n=1}^\infty 1_{E_n}(x) 1_{F_n}(y)$$
for all $x \in X$ and $y \in Y$. We fix $x \in X$, and integrate this identity in $y$ (noting that both sides are measurable and unsigned) to conclude that
$$\int_Y 1_E(x) 1_F(y)\, d\mu_Y(y) = \int_Y \sum_{n=1}^\infty 1_{E_n}(x) 1_{F_n}(y)\, d\mu_Y(y).$$
The left-hand side simplifies to $1_E(x)\mu_Y(F)$. To compute the right-hand side, we use the monotone convergence theorem (Theorem 1.4.44) to interchange the summation and integration, and soon see that the right-hand side is $\sum_{n=1}^\infty 1_{E_n}(x)\mu_Y(F_n)$, thus
$$1_E(x)\mu_Y(F) = \sum_{n=1}^\infty 1_{E_n}(x)\mu_Y(F_n)$$
for all $x$. Both sides are measurable and unsigned in $x$, so we may integrate in $X$ and conclude that
$$\int_X 1_E(x)\mu_Y(F)\, d\mu_X(x) = \int_X \sum_{n=1}^\infty 1_{E_n}(x)\mu_Y(F_n)\, d\mu_X(x).$$
The left-hand side here is $\mu_X(E)\mu_Y(F)$. Using monotone convergence as before, the right-hand side simplifies to $\sum_{n=1}^\infty \mu_X(E_n)\mu_Y(F_n)$, and the claim follows.
Now that we have established that $\mu_0$ is a premeasure, we may apply Theorem 1.7.8 to extend this measure to a countably additive measure $\mu_X \times \mu_Y$ on a σ-algebra containing $\mathcal{B}_0$. By Exercise 1.7.18(i), $\mu_X \times \mu_Y$ is a countably additive measure on $\mathcal{B}_X \times \mathcal{B}_Y$, and as it extends $\mu_0$, it will obey (1.36). Finally, to show uniqueness, observe from finite additivity that any measure $\mu_X \times \mu_Y$ on $\mathcal{B}_X \times \mathcal{B}_Y$ that obeys (1.36) must extend $\mu_0$, and so uniqueness follows from Exercise 1.7.7.
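In the finite setting the construction can be made completely explicit. The Python sketch below assumes two finite measure spaces with discrete σ-algebras given by hypothetical point masses `w_X`, `w_Y` (all names and values are illustrative), and checks that the premeasure formula for $\mu_0$ gives the same answer for two different decompositions of an L-shaped set into disjoint rectangles, both agreeing with the product measure computed point by point.

```python
from fractions import Fraction
from itertools import product

# Hypothetical point masses defining mu_X and mu_Y on finite sets.
w_X = {0: Fraction(1, 2), 1: Fraction(3), 2: Fraction(1, 4)}
w_Y = {"a": Fraction(2), "b": Fraction(1, 3)}

def measure(weights, E):
    """Measure of a subset E as the sum of its point masses."""
    return sum((weights[p] for p in E), Fraction(0))

def product_measure(S):
    """Product measure of S in X x Y, summing the masses w_X[x] * w_Y[y]."""
    return sum((w_X[x] * w_Y[y] for (x, y) in S), Fraction(0))

def mu0(rectangles):
    """Premeasure formula: sum of mu_X(E_j) * mu_Y(F_j) over a list of
    pairwise disjoint rectangles E_j x F_j (disjointness is assumed, not checked)."""
    return sum((measure(w_X, E) * measure(w_Y, F) for E, F in rectangles), Fraction(0))

# Two decompositions of the same L-shaped set into disjoint rectangles.
decomposition_1 = [({0, 1}, {"a", "b"}), ({2}, {"a"})]
decomposition_2 = [({0, 1, 2}, {"a"}), ({0, 1}, {"b"})]
S = set(product({0, 1}, {"a", "b"})) | set(product({2}, {"a"}))
```

Both decompositions give $\mu_0(S) = 26/3$, which is also the value of `product_measure(S)`.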
Remark 1.7.12. When $X, Y$ are not both σ-finite, then one can still construct at least one product measure, but it will, in general, not be unique. This makes the theory much more subtle, and we will not discuss it in these notes.
Example 1.7.13. From Exercise 1.2.22, we see that the product $m^d \times m^{d'}$ of the Lebesgue measures $m^d, m^{d'}$ on $(\mathbf{R}^d, \mathcal{L}[\mathbf{R}^d])$ and $(\mathbf{R}^{d'}, \mathcal{L}[\mathbf{R}^{d'}])$ respectively will agree with Lebesgue measure $m^{d+d'}$ on the product σ-algebra $\mathcal{L}[\mathbf{R}^d] \times \mathcal{L}[\mathbf{R}^{d'}]$, which as noted in Exercise 1.7.19 is a subalgebra of $\mathcal{L}[\mathbf{R}^{d+d'}]$. After taking the completion $\overline{m^d \times m^{d'}}$ of this product measure, one obtains the full Lebesgue measure $m^{d+d'}$.
Exercise 1.7.20. Let $(X, \mathcal{B}_X)$, $(Y, \mathcal{B}_Y)$ be measurable spaces.
(i) Show that the product of two Dirac measures on $(X, \mathcal{B}_X)$, $(Y, \mathcal{B}_Y)$ is a Dirac measure on $(X \times Y, \mathcal{B}_X \times \mathcal{B}_Y)$.
(ii) If $X, Y$ are at most countable, show that the product of the two counting measures on $(X, \mathcal{B}_X)$, $(Y, \mathcal{B}_Y)$ is the counting measure on $(X \times Y, \mathcal{B}_X \times \mathcal{B}_Y)$.
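Exercise 1.7.21 (Associativity of product). Let $(X, \mathcal{B}_X, \mu_X)$, $(Y, \mathcal{B}_Y, \mu_Y)$, $(Z, \mathcal{B}_Z, \mu_Z)$ be σ-finite measure spaces. We may identify the Cartesian products $(X \times Y) \times Z$ and $X \times (Y \times Z)$ with each other in the obvious manner. If we do so, show that $(\mathcal{B}_X \times \mathcal{B}_Y) \times \mathcal{B}_Z = \mathcal{B}_X \times (\mathcal{B}_Y \times \mathcal{B}_Z)$ and $(\mu_X \times \mu_Y) \times \mu_Z = \mu_X \times (\mu_Y \times \mu_Z)$.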
Now we integrate using this product measure. We will need the following technical lemma. Define a monotone class in $X$ to be a collection $\mathcal{B}$ of subsets of $X$ with the following two closure properties:
(i) If $E_1 \subset E_2 \subset \ldots$ is a countable increasing sequence of sets in $\mathcal{B}$, then $\bigcup_{n=1}^\infty E_n \in \mathcal{B}$.
(ii) If $E_1 \supset E_2 \supset \ldots$ is a countable decreasing sequence of sets in $\mathcal{B}$, then $\bigcap_{n=1}^\infty E_n \in \mathcal{B}$.
Lemma 1.7.14 (Monotone class lemma). Let $\mathcal{A}$ be a Boolean algebra on $X$. Then $\langle \mathcal{A} \rangle$ is the smallest monotone class that contains $\mathcal{A}$.

Proof. Let $\mathcal{B}$ be the intersection of all the monotone classes that contain $\mathcal{A}$. Since $\langle \mathcal{A} \rangle$ is clearly one such class, $\mathcal{B}$ is a subset of $\langle \mathcal{A} \rangle$. Our task is then to show that $\mathcal{B}$ contains $\langle \mathcal{A} \rangle$.

It is also clear that $\mathcal{B}$ is a monotone class that contains $\mathcal{A}$. By replacing all the elements of $\mathcal{B}$ with their complements, we see that $\mathcal{B}$ is necessarily closed under complements.

For any $E \in \mathcal{A}$, consider the set $\mathcal{C}_E$ of all sets $F \in \mathcal{B}$ such that $F \setminus E$, $E \setminus F$, $F \cap E$, and $X \setminus (E \cup F)$ all lie in $\mathcal{B}$. It is clear that $\mathcal{C}_E$ contains $\mathcal{A}$; since $\mathcal{B}$ is a monotone class, we see that $\mathcal{C}_E$ is also. By definition of $\mathcal{B}$, we conclude that $\mathcal{C}_E = \mathcal{B}$ for all $E \in \mathcal{A}$.

Next, let $\mathcal{D}$ be the set of all $E \in \mathcal{B}$ such that $F \setminus E$, $E \setminus F$, $F \cap E$, and $X \setminus (E \cup F)$ all lie in $\mathcal{B}$ for all $F \in \mathcal{B}$. By the previous discussion, we see that $\mathcal{D}$ contains $\mathcal{A}$. One also easily verifies that $\mathcal{D}$ is a monotone class. By definition of $\mathcal{B}$, we conclude that $\mathcal{D} = \mathcal{B}$. Since $\mathcal{B}$ is also closed under complements, this implies that $\mathcal{B}$ is closed with respect to finite unions. Since this class also contains $\mathcal{A}$, which contains $\emptyset$, we conclude that $\mathcal{B}$ is a Boolean algebra. Since $\mathcal{B}$ is also closed under increasing countable unions, we conclude that it is closed under arbitrary countable unions, and is thus a σ-algebra. As it contains $\mathcal{A}$, it must also contain $\langle \mathcal{A} \rangle$.
Theorem 1.7.15 (Tonelli's theorem, incomplete version). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be σ-finite measure spaces, and let $f : X \times Y \to [0,+\infty]$ be measurable with respect to $\mathcal{B}_X \times \mathcal{B}_Y$. Then:
(i) The functions $x \mapsto \int_Y f(x,y)\, d\mu_Y(y)$ and $y \mapsto \int_X f(x,y)\, d\mu_X(x)$ (which are well-defined, thanks to Exercise 1.7.18) are measurable with respect to $\mathcal{B}_X$ and $\mathcal{B}_Y$ respectively.
(ii) We have
$$\begin{aligned}
\int_{X \times Y} f(x,y)\, d\mu_X \times \mu_Y(x,y)
&= \int_X \left( \int_Y f(x,y)\, d\mu_Y(y) \right) d\mu_X(x)\\
&= \int_Y \left( \int_X f(x,y)\, d\mu_X(x) \right) d\mu_Y(y).
\end{aligned}$$
Proof. By writing the σ-finite space $X$ as an increasing union $X = \bigcup_{n=1}^\infty X_n$ of finite measure sets, we see from several applications of the monotone convergence theorem (Theorem 1.4.44) that it suffices to prove the claims with $X$ replaced by $X_n$. Thus we may assume without loss of generality that $X$ has finite measure. Similarly we may assume $Y$ has finite measure. Note from (1.36) that this implies that $X \times Y$ has finite measure also.

Every unsigned measurable function is the increasing limit of unsigned simple functions. By several applications of the monotone convergence theorem (Theorem 1.4.44), we thus see that it suffices to verify the claim when $f$ is a simple function. By linearity, it then suffices to verify the claim when $f$ is an indicator function, thus $f = 1_S$ for some $S \in \mathcal{B}_X \times \mathcal{B}_Y$.

Let $\mathcal{C}$ be the set of all $S \in \mathcal{B}_X \times \mathcal{B}_Y$ for which the claims hold. From the repeated applications of the monotone convergence theorem (Theorem 1.4.44) and the downward monotone convergence theorem (which is available in this finite measure setting) we see that $\mathcal{C}$ is a monotone class.

By direct computation (using (1.36)), we see that $\mathcal{C}$ contains as an element any product $S = E \times F$ with $E \in \mathcal{B}_X$ and $F \in \mathcal{B}_Y$. By finite additivity, we conclude that $\mathcal{C}$ also contains as an element any disjoint finite union $S = E_1 \times F_1 \cup \ldots \cup E_k \times F_k$ of such products. This implies that $\mathcal{C}$ also contains the Boolean algebra $\mathcal{B}_0$ in the proof of Proposition 1.7.11, as such sets can always be expressed as the disjoint finite union of Cartesian products of measurable sets. Applying the monotone class lemma, we conclude that $\mathcal{C}$ contains $\langle \mathcal{B}_0 \rangle = \mathcal{B}_X \times \mathcal{B}_Y$, and the claim follows.
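On finite measure spaces Tonelli's theorem reduces to the interchange of two finite sums, which can be checked directly. The Python sketch below is an illustration under assumed data: the point masses `w_X`, `w_Y` and the unsigned function `f` are hypothetical choices, and the three functions compute the three quantities appearing in part (ii) of the theorem.

```python
from fractions import Fraction

# Hypothetical finite measure spaces given by point masses.
w_X = {0: Fraction(1, 2), 1: Fraction(3), 2: Fraction(1, 4)}
w_Y = {1: Fraction(2), 2: Fraction(1, 3)}

def f(x, y):
    """An arbitrary unsigned function on X x Y, chosen only for illustration."""
    return Fraction(x * x + y)

def integral_product():
    """Integral of f against the product measure, summed over all pairs (x, y)."""
    return sum((f(x, y) * w_X[x] * w_Y[y] for x in w_X for y in w_Y), Fraction(0))

def integral_y_then_x():
    """Integrate in y first (inner), then in x (outer)."""
    inner = {x: sum((f(x, y) * w_Y[y] for y in w_Y), Fraction(0)) for x in w_X}
    return sum((inner[x] * w_X[x] for x in w_X), Fraction(0))

def integral_x_then_y():
    """Integrate in x first (inner), then in y (outer)."""
    inner = {y: sum((f(x, y) * w_X[x] for x in w_X), Fraction(0)) for y in w_Y}
    return sum((inner[y] * w_Y[y] for y in w_Y), Fraction(0))
```

All three quantities coincide exactly, since exact rational arithmetic is used throughout.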
Remark 1.7.16. Note that Tonelli's theorem for sums (Theorem 0.0.2) is a special case of the above result when $\mu_X, \mu_Y$ are counting measure. In a similar spirit, Corollary 1.4.46 is the special case when just one of $\mu_X, \mu_Y$ is counting measure.
Corollary 1.7.17. Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be σ-finite measure spaces, and let $E \in \mathcal{B}_X \times \mathcal{B}_Y$ be a null set with respect to $\mu_X \times \mu_Y$. Then for $\mu_X$-almost every $x \in X$, the set $E_x := \{y \in Y : (x,y) \in E\}$ is a $\mu_Y$-null set; and similarly, for $\mu_Y$-almost every $y \in Y$, the set $E^y := \{x \in X : (x,y) \in E\}$ is a $\mu_X$-null set.

Proof. Applying the Tonelli theorem to the indicator function $1_E$, we conclude that
$$0 = \int_X \left( \int_Y 1_E(x,y)\, d\mu_Y(y) \right) d\mu_X(x) = \int_Y \left( \int_X 1_E(x,y)\, d\mu_X(x) \right) d\mu_Y(y)$$
and thus
$$0 = \int_X \mu_Y(E_x)\, d\mu_X(x) = \int_Y \mu_X(E^y)\, d\mu_Y(y),$$
and the claim follows.
With this corollary, we can easily extend Tonelli's theorem to the completion $(X \times Y, \overline{\mathcal{B}_X \times \mathcal{B}_Y}, \overline{\mu_X \times \mu_Y})$ of the product space $(X \times Y, \mathcal{B}_X \times \mathcal{B}_Y, \mu_X \times \mu_Y)$, as constructed in Exercise 1.4.26:
Theorem 1.7.18 (Tonelli's theorem, complete version). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be complete σ-finite measure spaces, and let $f : X \times Y \to [0,+\infty]$ be measurable with respect to $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$. Then:
(i) For $\mu_X$-almost every $x \in X$, the function $y \mapsto f(x,y)$ is $\mathcal{B}_Y$-measurable, and in particular $\int_Y f(x,y)\, d\mu_Y(y)$ exists. Furthermore, the ($\mu_X$-almost everywhere defined) map $x \mapsto \int_Y f(x,y)\, d\mu_Y(y)$ is $\mathcal{B}_X$-measurable.
(ii) For $\mu_Y$-almost every $y \in Y$, the function $x \mapsto f(x,y)$ is $\mathcal{B}_X$-measurable, and in particular $\int_X f(x,y)\, d\mu_X(x)$ exists. Furthermore, the ($\mu_Y$-almost everywhere defined) map $y \mapsto \int_X f(x,y)\, d\mu_X(x)$ is $\mathcal{B}_Y$-measurable.
(iii) We have
(1.37)
$$\begin{aligned}
\int_{X \times Y} f(x,y)\, d\overline{\mu_X \times \mu_Y}(x,y)
&= \int_X \left( \int_Y f(x,y)\, d\mu_Y(y) \right) d\mu_X(x)\\
&= \int_Y \left( \int_X f(x,y)\, d\mu_X(x) \right) d\mu_Y(y).
\end{aligned}$$
Proof. From Exercise 1.4.28, every measurable set in $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$ is equal to a measurable set in $\mathcal{B}_X \times \mathcal{B}_Y$ outside of a $\mu_X \times \mu_Y$-null set. This implies that the $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$-measurable function $f$ agrees with a $\mathcal{B}_X \times \mathcal{B}_Y$-measurable function $\tilde f$ outside of a $\mu_X \times \mu_Y$-null set $E$ (as can be seen by expressing $f$ as the limit of simple functions). From Corollary 1.7.17, we see that for $\mu_X$-almost every $x \in X$, the function $y \mapsto f(x,y)$ agrees with $y \mapsto \tilde f(x,y)$ outside of a $\mu_Y$-null set (and is in particular measurable, as $(Y, \mathcal{B}_Y, \mu_Y)$ is complete); and similarly for $\mu_Y$-almost every $y \in Y$, the function $x \mapsto f(x,y)$ agrees with $x \mapsto \tilde f(x,y)$ outside of a $\mu_X$-null set and is measurable. The claim now follows by applying Theorem 1.7.15 to $\tilde f$.
Specialising to the case when $f$ is an indicator function $f = 1_E$, we conclude

Corollary 1.7.19 (Tonelli's theorem for sets). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be complete σ-finite measure spaces, and let $E \in \overline{\mathcal{B}_X \times \mathcal{B}_Y}$. Then:
(i) For $\mu_X$-almost every $x \in X$, the set $E_x := \{y \in Y : (x,y) \in E\}$ lies in $\mathcal{B}_Y$, and the ($\mu_X$-almost everywhere defined) map $x \mapsto \mu_Y(E_x)$ is $\mathcal{B}_X$-measurable.
(ii) For $\mu_Y$-almost every $y \in Y$, the set $E^y := \{x \in X : (x,y) \in E\}$ lies in $\mathcal{B}_X$, and the ($\mu_Y$-almost everywhere defined) map $y \mapsto \mu_X(E^y)$ is $\mathcal{B}_Y$-measurable.
(iii) We have
(1.38)
$$\overline{\mu_X \times \mu_Y}(E) = \int_X \mu_Y(E_x)\, d\mu_X(x) = \int_Y \mu_X(E^y)\, d\mu_Y(y).$$
Exercise 1.7.22. The purpose of this exercise is to demonstrate that Tonelli's theorem can fail if the σ-finite hypothesis is removed, and also that product measure need not be unique. Let $X$ be the unit interval $[0,1]$ with Lebesgue measure $m$ (and the Lebesgue σ-algebra $\mathcal{L}([0,1])$) and let $Y$ be the unit interval $[0,1]$ with counting measure $\#$ (and the discrete σ-algebra $2^{[0,1]}$). Let $f := 1_E$ be the indicator function of the diagonal $E := \{(x,x) : x \in [0,1]\}$.
(i) Show that $f$ is measurable in the product σ-algebra.
(ii) Show that $\int_X \left( \int_Y f(x,y)\, d\#(y) \right) dm(x) = 1$.
(iii) Show that $\int_Y \left( \int_X f(x,y)\, dm(x) \right) d\#(y) = 0$.
(iv) Show that there is more than one measure $\mu$ on $\mathcal{L}([0,1]) \times 2^{[0,1]}$ with the property that $\mu(E \times F) = m(E)\#(F)$ for all $E \in \mathcal{L}([0,1])$ and $F \in 2^{[0,1]}$. (Hint: use the two different ways to perform a double integral to create two different measures.)
Remark 1.7.20. If $f$ is not assumed to be measurable in the product space (or its completion), then of course the expression $\int_{X \times Y} f(x,y)\, d\overline{\mu_X \times \mu_Y}(x,y)$ does not make sense. Furthermore, in this case the remaining two expressions in (1.37) may become different as well (in some models of set theory, at least), even when $X$ and $Y$ have finite measure. For instance, let us assume the continuum hypothesis, which implies that the unit interval $[0,1]$ can be placed in one-to-one correspondence with the first uncountable ordinal $\omega_1$. Let $\prec$ be the ordering of $[0,1]$ that is associated to this ordinal, let $E := \{(x,y) \in [0,1]^2 : x \prec y\}$, and let $f := 1_E$. Then, for any $y \in [0,1]$, there are at most countably many $x$ such that $x \prec y$, and so $\int_{[0,1]} f(x,y)\, dx$ exists and is equal to zero for every $y$. On the other hand, for every $x \in [0,1]$, one has $x \prec y$ for all but countably many $y \in [0,1]$, and so $\int_{[0,1]} f(x,y)\, dy$ exists and is equal to one for every $x$, and so the last two expressions in (1.37) exist but are unequal. (In particular, Tonelli's theorem implies that $E$ cannot be a Lebesgue measurable subset of $[0,1]^2$.) Thus we see that measurability in the product space is an important hypothesis. (There do however exist models of set theory (with the axiom of choice) in which such counterexamples cannot be constructed, at least in the case when $X$ and $Y$ are the unit interval with Lebesgue measure.)
Tonelli's theorem is for the unsigned integral, but it leads to an important analogue for the absolutely convergent integral, known as Fubini's theorem:
Theorem 1.7.21 (Fubini's theorem). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be complete σ-finite measure spaces, and let $f : X \times Y \to \mathbf{C}$ be absolutely integrable with respect to $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$. Then:
(i) For $\mu_X$-almost every $x \in X$, the function $y \mapsto f(x,y)$ is absolutely integrable with respect to $\mu_Y$, and in particular $\int_Y f(x,y)\, d\mu_Y(y)$ exists. Furthermore, the ($\mu_X$-almost everywhere defined) map $x \mapsto \int_Y f(x,y)\, d\mu_Y(y)$ is absolutely integrable with respect to $\mu_X$.
(ii) For $\mu_Y$-almost every $y \in Y$, the function $x \mapsto f(x,y)$ is absolutely integrable with respect to $\mu_X$, and in particular $\int_X f(x,y)\, d\mu_X(x)$ exists. Furthermore, the ($\mu_Y$-almost everywhere defined) map $y \mapsto \int_X f(x,y)\, d\mu_X(x)$ is absolutely integrable with respect to $\mu_Y$.
(iii) We have
$$\begin{aligned}
\int_{X \times Y} f(x,y)\, d\overline{\mu_X \times \mu_Y}(x,y)
&= \int_X \left( \int_Y f(x,y)\, d\mu_Y(y) \right) d\mu_X(x)\\
&= \int_Y \left( \int_X f(x,y)\, d\mu_X(x) \right) d\mu_Y(y).
\end{aligned}$$
Proof. By taking real and imaginary parts we may assume that $f$ is real; by taking positive and negative parts we may assume that $f$ is unsigned. But then the claim follows from Tonelli's theorem; note from (1.37) that $\int_X \left( \int_Y f(x,y)\, d\mu_Y(y) \right) d\mu_X(x)$ is finite, and so $\int_Y f(x,y)\, d\mu_Y(y) < \infty$ for $\mu_X$-almost every $x \in X$, and similarly $\int_X f(x,y)\, d\mu_X(x) < \infty$ for $\mu_Y$-almost every $y \in Y$.
Exercise 1.7.23. Give an example of a Borel measurable function $f : [0,1]^2 \to \mathbf{R}$ such that the integrals $\int_{[0,1]} f(x,y)\, dy$ and $\int_{[0,1]} f(x,y)\, dx$ exist and are absolutely integrable for all $x \in [0,1]$ and $y \in [0,1]$ respectively, and that $\int_{[0,1]} \left( \int_{[0,1]} f(x,y)\, dy \right) dx$ and $\int_{[0,1]} \left( \int_{[0,1]} f(x,y)\, dx \right) dy$ exist and are absolutely integrable, but such that
$$\int_{[0,1]} \left( \int_{[0,1]} f(x,y)\, dy \right) dx \ne \int_{[0,1]} \left( \int_{[0,1]} f(x,y)\, dx \right) dy.$$
(Hint: adapt the example from Remark 0.0.3.) Thus we see that Fubini's theorem fails when one drops the hypothesis that $f$ is absolutely integrable with respect to the product space.
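The discrete analogue of this phenomenon is easy to exhibit. The Python sketch below uses one standard doubly infinite array whose row sums and column sums disagree; whether this is exactly the array of Remark 0.0.3 is an assumption, since that remark is not reproduced in this section. The array is $a_{n,m} = 1$ if $m = n$, $a_{n,m} = -1$ if $m = n+1$, and $0$ otherwise, for $n, m \ge 0$. Each row and each column has only finitely many non-zero entries, so the inner sums are computed exactly.

```python
def a(n, m):
    """Entries of the (assumed) counterexample array, indexed by n, m >= 0."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

def row_sum(n):
    """Sum over m of a(n, m); only m = n and m = n + 1 contribute."""
    return a(n, n) + a(n, n + 1)

def col_sum(m):
    """Sum over n of a(n, m); only n = m and n = m - 1 contribute."""
    return a(m, m) + (a(m - 1, m) if m >= 1 else 0)

N = 1000
sum_rows_first = sum(row_sum(n) for n in range(N))  # every row sums to 0
sum_cols_first = sum(col_sum(m) for m in range(N))  # only column 0 contributes
```

Every row sum vanishes, while the column sums are $1, 0, 0, \ldots$, so the two iterated sums are $0$ and $1$ respectively. This does not contradict Fubini's theorem for counting measure, because $\sum_{n,m} |a_{n,m}| = \infty$.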
Remark 1.7.22. Despite the failure of Tonelli's theorem in the non-σ-finite setting, it is possible to (carefully) extend Fubini's theorem to the non-σ-finite setting, as the absolute integrability hypotheses, when combined with Markov's inequality (Exercise 1.4.36(vi)), can provide a substitute for the σ-finite property. However, we will not do so here, and indeed I would recommend proceeding with extreme caution when performing any sort of interchange of integrals or invoking of product measure when one is not in the σ-finite setting.
Informally, Fubini’s theorem allows one to always interchange the order of two integrals, as long as the integrand is absolutely integrable in the product space (or its completion). In particular, specialising to Lebesgue measure, we have
$$\int_{\mathbf{R}^{d+d}} f(x, y)\, d(x, y) = \int_{\mathbf{R}^d} \Big( \int_{\mathbf{R}^d} f(x, y)\, dy \Big)\, dx = \int_{\mathbf{R}^d} \Big( \int_{\mathbf{R}^d} f(x, y)\, dx \Big)\, dy$$
whenever $f : \mathbf{R}^{d+d} \to \mathbf{C}$ is absolutely integrable. In view of this, we often write dxdy (or dydx) for d(x, y).
By combining Fubini’s theorem with Tonelli’s theorem, we can
recast the absolute integrability hypothesis:
Corollary 1.7.23 (Fubini-Tonelli theorem). Let $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$ be complete σ-finite measure spaces, and let $f : X \times Y \to \mathbf{C}$ be measurable with respect to $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$. If
$$\int_X \Big( \int_Y |f(x, y)|\, d\mu_Y(y) \Big)\, d\mu_X(x) < \infty$$
(note the left-hand side always exists, by Tonelli’s theorem) then $f$ is absolutely integrable with respect to $\overline{\mathcal{B}_X \times \mathcal{B}_Y}$, and in particular the conclusions of Fubini’s theorem hold. Similarly if we use $\int_Y (\int_X |f(x, y)|\, d\mu_X(x))\, d\mu_Y(y)$ instead of $\int_X (\int_Y |f(x, y)|\, d\mu_Y(y))\, d\mu_X(x)$.
The Fubini-Tonelli theorem is an indispensable tool for computing integrals. We give some basic examples below:
Exercise 1.7.24 (Area interpretation of integral). Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space, and let $\mathbf{R}$ be equipped with Lebesgue measure $m$ and the Borel σ-algebra $\mathcal{B}[\mathbf{R}]$. Show that $f : X \to [0, +\infty]$ is measurable if and only if the set $\{(x, t) \in X \times \mathbf{R} : 0 \le t \le f(x)\}$ is measurable in $\mathcal{B} \times \mathcal{B}[\mathbf{R}]$, in which case we have
$$(\mu \times m)(\{(x, t) \in X \times \mathbf{R} : 0 \le t \le f(x)\}) = \int_X f(x)\, d\mu(x).$$
Similarly if we replace $\{(x, t) \in X \times \mathbf{R} : 0 \le t \le f(x)\}$ by $\{(x, t) \in X \times \mathbf{R} : 0 \le t < f(x)\}$.
Exercise 1.7.25 (Distribution formula). Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space, and let $f : X \to [0, +\infty]$ be measurable. Show that
$$\int_X f(x)\, d\mu(x) = \int_{[0,+\infty]} \mu(\{x \in X : f(x) \ge \lambda\})\, d\lambda.$$
(Note that the integrand on the right-hand side is monotone and thus Lebesgue measurable.) Similarly if we replace $\{x \in X : f(x) \ge \lambda\}$ by $\{x \in X : f(x) > \lambda\}$.
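As a sanity check on the distribution formula (not a proof, and not a solution to the exercise), one can test it on a finite set with a weighted counting measure, where both sides are finite sums. The following Python sketch does this with exact rational arithmetic; the function names `integral` and `layer_cake` and the sample data are assumptions made purely for illustration.

```python
from fractions import Fraction

def integral(f_values, weights):
    """Integral of f against a finite weighted counting measure."""
    return sum(w * v for v, w in zip(f_values, weights))

def layer_cake(f_values, weights):
    """Integrate lambda -> mu({f >= lambda}) over (0, +infinity).

    The distribution function is a step function that only changes at the
    distinct values of f, so the integral reduces to an exact finite sum.
    """
    levels = sorted(set(f_values) | {0})
    total = 0
    for lo, hi in zip(levels, levels[1:]):
        # For lambda in (lo, hi], the set {f >= lambda} equals {f >= hi}.
        mass = sum(w for v, w in zip(f_values, weights) if v >= hi)
        total += (hi - lo) * mass
    return total

# Hypothetical sample data: four points with unsigned values and positive weights.
f_values = [Fraction(1, 2), 2, 2, 5]
weights = [1, Fraction(1, 3), 2, 4]

lhs = integral(f_values, weights)
rhs = layer_cake(f_values, weights)
```

Here both `lhs` and `rhs` come out to the same rational number, as the formula predicts; the general case requires the Fubini-Tonelli theorem rather than a finite rearrangement.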
Exercise 1.7.26 (Approximations to the identity). Let $P : \mathbf{R}^d \to \mathbf{R}^+$ be a good kernel (see Exercise 1.6.27), and let $P_t(x) := \frac{1}{t^d} P(\frac{x}{t})$ be the associated rescaled functions. Show that if $f : \mathbf{R}^d \to \mathbf{C}$ is absolutely integrable, then $f * P_t$ converges in $L^1$ norm to $f$ as $t \to 0$. (Hint: use the density argument. You will need an upper bound on $\|f * P_t\|_{L^1(\mathbf{R}^d)}$, which can be obtained using Tonelli’s theorem.)
Chapter 2
Related articles
2.1. Problem solving strategies
The purpose of this section is to list (in no particular order) a number of common problem solving strategies for attacking real analysis exercises such as those presented in this text. Some of these strategies are specific to real analysis type problems, but others are quite general and would be applicable to other mathematical exercises.
2.1.1. Split up equalities into inequalities. If one has to show
that two numerical quantities X and Y are equal, try proving that
X ≤ Y and Y ≤ X separately. Often one of these will be very easy,
and the other one harder; but the easy direction may still provide
some clue as to what needs to be done to establish the other direction.
Exercise 1.1.6(iii) is a typical problem in which this strategy can be
applied.
In a similar spirit, to show that two sets E and F are equal, try
proving that E ⊂ F and F ⊂ E. See for instance the proof of Lemma
1.2.11 for a simple example of this.
2.1.2. Give yourself an epsilon of room. If one has to show that X ≤ Y, try proving that X ≤ Y + ε for any ε > 0. (This trick combines well with §2.1.1.) See for instance Lemma 1.2.5 for an example of this.
In a similar spirit:
• if one needs to show that a quantity X vanishes, try showing that |X| ≤ ε for every ε > 0. (Exercise 1.2.19 is a simple application of this strategy.)
• if one wishes to show that two functions f, g agree almost everywhere, try showing first that |f(x) − g(x)| ≤ ε holds for almost every x, or even just outside of a set of measure at most ε, for any given ε > 0. (See for instance the proof of Lemma 1.5.7 for an example of this.)
• if one wants to show that a sequence $x_n$ of real numbers converges to zero, try showing that $\limsup_{n \to \infty} |x_n| \le \varepsilon$ for every ε > 0. (The proof of the Lebesgue differentiation theorem, Theorem 1.6.12, is in this spirit.)
Don’t be too focused on getting all your error terms adding up to exactly ε; usually, as long as the final error bound consists of terms that can all be made as small as one wishes by choosing parameters in a suitable way, that is enough. For instance, an error term such as 10ε is certainly OK, or even more complicated expressions such as 10ε/δ + 4δ if one has the ability to choose δ as small as one wishes, and then after δ is chosen, one can also set ε as small as one wishes (in a manner that can depend on δ).
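To make the order of choices concrete, here is one possible way to bring the expression 10ε/δ + 4δ below a prescribed target. The target η > 0 and the particular constants are introduced only for this sketch; many other choices work equally well. One chooses δ first, and only then ε in terms of δ:

```latex
\delta := \frac{\eta}{8}, \qquad
\varepsilon := \frac{\delta \eta}{20} = \frac{\eta^2}{160}
\quad\Longrightarrow\quad
\frac{10 \varepsilon}{\delta} + 4 \delta
= \frac{\eta}{2} + \frac{\eta}{2} = \eta .
```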
One caveat: for finite x, and any ε > 0, it is true that x + ε > x and x − ε < x, but this statement is not true when x is equal to +∞ (or −∞). So remember to exercise some care with the epsilon of room trick when some quantities are infinite.
See also §2.7 of An epsilon of room, Vol. I.
2.1.3. Decompose (or approximate) a rough or general object into (or by) a smoother or simpler one. If one has to prove something about an unbounded (or infinite measure) set, consider proving it for bounded (or finite measure) sets first if this looks easier.
In a similar spirit:
• If one has to prove something about a measurable set, try proving it for open, closed, compact, bounded, or elementary sets first.
• If one has to prove something about a measurable function, try proving it for functions that are continuous, bounded, compactly supported, simple, absolutely integrable, etc.
• If one has to prove something about an infinite sum or sequence, try proving it first for finite truncations of that sum or sequence (but try to get all the bounds independent of the number of terms in that truncation, so that you can still pass to the limit!).
• If one has to prove something about a complex-valued function, try it for real-valued functions first.
• If one has to prove something about a real-valued function, try it for unsigned functions first.
• If one has to prove something about a simple function, try it for indicator functions first.
In order to pass back to the general case from these special cases, one will have to somehow decompose the general object into a combination of special ones, or approximate general objects by special ones (or as a limit of a sequence of special objects). In the latter case, one may need an epsilon of room (§2.1.2), and some sort of limiting analysis may be needed to deal with the errors in the approximation (it is not always enough to just “pass to the limit”, as one has to justify that the desirable properties of the approximating object are preserved in the limit). Littlewood’s principles (Section 1.3.5) and their variants are often useful for this purpose.
Note: one should not do this blindly, as one might then be loading
on a bunch of distracting but ultimately useless hypotheses that end
up being a lot less help than one might hope. But they should be
kept in mind as something to try if one starts having thoughts such
as “Gee, it would be nice at this point if I could assume that f is
continuous / real-valued / simple / unsigned / etc.”.
In the more quantitative areas of analysis and PDE, one sees a common variant of the above technique, namely the method of a priori estimates. Here, one needs to prove an estimate or inequality for all functions in a large, rough class (e.g. all rough solutions to a PDE). One can often then first prove this inequality in a much smaller (but still “dense”) class of “nice” functions, so that there is little difficulty justifying the various manipulations (e.g. exchanging integrals, sums, or limits, or integrating by parts) that one wishes to perform. Once one obtains these a priori estimates, one can then often use some sort of limiting argument to recover the general case.
2.1.4. If one needs to flip an upper bound to a lower bound or vice versa, look for a way to take reflections or complements. Sometimes one needs a lower bound for some quantity, but only has techniques that give upper bounds. In some cases, though, one can “reflect” an upper bound into a lower bound (or vice versa) by replacing a set E contained in some space X with its complement $X \setminus E$, or a function f with its negation −f (or perhaps subtracting f from some dominating function F to obtain F − f). This trick works best when the objects being reflected are contained in some sort of “bounded”, “finite measure”, or “absolutely integrable” container, so that one avoids the dangerous situation of having to subtract infinite quantities from each other.
A typical example of this is when one deduces downward monotone convergence for sets from upward monotone convergence for sets (Exercise 1.2.11).
2.1.5. Uncountable unions can sometimes be replaced by countable or finite unions. Uncountable unions are not well-behaved in measure theory; for instance, an uncountable union of null sets need not be a null set (or even a measurable set). (On the other hand, the uncountable union of open sets remains open; this can often be important to know.) However, in many cases one can replace an uncountable union by a countable one. For instance, if one needs to prove a statement for all ε > 0, then there are an uncountable number of ε’s one needs to check, which may threaten measurability; but in many cases it is enough to only work with a countable sequence of ε’s, such as the numbers 1/m for m = 1, 2, 3, . . . (Exercise 1.6.30 relies heavily on this trick.)
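A standard instance of this reduction (stated here as an illustration, not quoted from the text) is the set on which a sequence of finite-valued measurable functions $f_n$ converges pointwise to a finite-valued measurable function $f$. Restricting ε to the values 1/m expresses this set using only countable unions and intersections of measurable sets, so it is measurable:

```latex
\{ x : f_n(x) \to f(x) \}
= \bigcap_{m=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty}
\Big\{ x : |f_n(x) - f(x)| \le \frac{1}{m} \Big\}.
```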
In a similar spirit, given a real parameter λ, this parameter initially ranges over uncountably many values, but in some cases one can get away with only working with a countable set of such values, such as the rationals. In a similar spirit, rather than work with all boxes (of which there are uncountably many), one might work with the dyadic boxes (of which there are only countably many; also, they obey nicer nesting properties than general boxes and so are often desirable to work with in any event).
If you are working on a compact set, then one can often replace even uncountable unions with finite ones, so long as one is working with open sets. (The proof of Theorem 1.6.20 is a good example of this strategy.) When this option is available, it is often worth spending an epsilon of measure (or whatever other resource is available to spend) to make one’s sets open, just so that one can take advantage of compactness.
2.1.6. If it is difficult to work globally, work locally instead. A domain such as Euclidean space $\mathbf{R}^d$ has infinite measure, and this creates a number of technical difficulties when trying to do measure theory directly on such spaces. Sometimes it is best to work more locally, for instance working on a large ball B(0, R) or even a small ball such as B(x, ε) first, and then figuring out how to patch things together later. Compactness (or the closely related property of total boundedness) is often useful for patching together small balls to cover a large ball. Patching together large balls into the whole space tends to work well when the properties one is trying to establish are local in nature (such as continuity, or pointwise convergence) or behave well with respect to countable unions. For instance, to prove that a sequence of functions $f_n$ converges pointwise almost everywhere to $f$ on $\mathbf{R}^d$, it suffices to verify this pointwise almost everywhere convergence on the ball B(0, R) for every R > 0 (which one can take to be an integer to get countability, see §2.1.5). The application of vertical truncation (as done, for instance, in the proof of Corollary 1.3.14) is an instance of this idea.
2.1.7. Be willing to throw away an exceptional set. The “Lebesgue philosophy” to measure theory is that null sets are often “irrelevant”, and so one should be very willing to cut out a set of measure zero on which bad things are happening (e.g. a function is undefined or infinite, a sequence of functions is not converging, etc.). One should also be only slightly less willing to throw away sets of positive but small measure, e.g. sets of measure at most ε. If such sets can be made arbitrarily small in measure, this is often almost as good as just throwing away a null set.
Many things in measure theory improve after throwing away a
small set. The most notable examples of this are Egorov’s theorem
(Theorem 1.3.26) and Lusin’s theorem (Theorem 1.3.28); see also
Exercise 1.3.25 for some other examples of this idea.
It is also common to see a similar trick¹ of throwing away most of a sequence and working with a subsequence instead. See §2.1.17 below.
2.1.8. Draw pictures and try to build counterexamples. Measure theory, particularly on Euclidean spaces, has a significant geometric aspect to it, and you should be exploiting your geometric intuition. Drawing pictures and graphs of all the objects being studied is a good way to start. These pictures need not be completely realistic; they should be just complicated enough to hint at the complexities of the problem, but not more. For instance, usually one- or two-dimensional pictures suffice for understanding problems in $\mathbf{R}^d$; drawing intricate 3D (or 4D, etc.) pictures does not often make things simpler. To indicate that a function is not continuous, one or two discontinuities or oscillations might suffice; make it too ornate and it becomes less clear what to do about that function. One should view these pictures as providing a “cartoon sketch” of the situation, which exaggerates key features and downplays others, rather than a photo-realistic image of the situation; too much detail or accuracy in a picture may be a waste of time, or otherwise counterproductive.
A common mistake is to try to draw a picture in which both the hypotheses and conclusion of the problem hold. This is actually not all that useful, as it often does not reveal the causal relationship between the former and the latter. One should try instead to draw a picture in which the hypotheses hold but for which the conclusion does not; in other words, a counterexample to the problem. Of course, you should be expected to fail at this task, given that the statement of the problem is presumably true. However, the way in which your picture fails to accomplish this task is often very instructive, and can reveal vital clues as to how the solution to the problem is supposed to proceed.
I have deliberately avoided drawing pictures in this book. This is not because I feel that pictures are not useful (far from it), but because I have found that it is far more informative for a reader to draw his or her own pictures of a given mathematical situation, rather than rely on the author’s images (except in situations where the geometric situation is particularly complicated or subtle), as such pictures will naturally be adapted to the reader’s mindset rather than the author’s. Besides, the process of actually drawing the picture is at least as instructive as the picture itself.
¹ This trick can also be interpreted as “throwing away a small set”, but to understand what “small” means in this context, one needs the language of ultrafilters, which will not be discussed here; see [Ta2008, §1.5] for a discussion.
2.1.9. Try simpler cases first. This advice of course extends well beyond measure theory, but if one is completely stuck on a problem, try making the problem simpler (while still capturing at least one of the difficulties of the problem that you cannot currently resolve). For instance, if faced with a problem in $\mathbf{R}^d$, try the one-dimensional case d = 1 first. Faced with a problem about a general measurable function f, try a much simpler case first, such as an indicator function $f = 1_E$. Faced with a problem about a general measurable set, try an elementary set first. Faced with a problem about a sequence of functions, try a monotone sequence of functions first. And so forth. (Note that this trick overlaps quite a bit with §2.1.3.)
The problem should not be made so simple that it becomes trivial, as this doesn’t really gain you any new insight about the original problem; instead, one should try to keep the “essential” difficulties of the problem while throwing away those aspects that you think are less important (but are still serving to add to the overall difficulty level).
On the other hand, if the simplified problem is unexpectedly easy, but one cannot extend the methods to the general case (or somehow leverage the simplified case to the general case, as in §2.1.3), this is an indication that the true difficulty lies elsewhere. For instance, if a problem involving general functions could be solved easily for monotone functions, but one cannot then extend that argument to the general case, this suggests that the true enemy is oscillation, and perhaps one should try another simple case in which the function is allowed to be highly oscillatory (but perhaps simple in other ways, e.g. bounded with compact support).
2.1.10. Abstract away any information that you believe or suspect to be irrelevant. Sometimes one is faced with an embarrassment of riches when it comes to what choice of technique to use on a problem; there are so many different facts that one knows about the problem, and so many different pieces of theory that one could apply, that one doesn’t quite know where to begin.
When this happens, abstraction can be a vital tool to clear away
some of the conceptual clutter. Here, one wants to “forget” part of
the setting that the problem is phrased in, and only keep the part
that seems to be most relevant to the hypotheses and conclusions of
the problem (and thus, presumably, to the solution as well).
For instance, if one is working in a problem that is set in Euclidean space $\mathbf{R}^d$, but the hypotheses and conclusions only involve measure-theoretic concepts (e.g. measurability, integrability, measure, etc.) rather than topological structure, metric structure, etc., then it may be worthwhile to try abstracting the problem to the more general setting of an abstract measure space, thus forgetting that one was initially working in $\mathbf{R}^d$. The point of doing this is that it cuts down on the number of possible ways to start attacking the problem. For instance, facts such as outer regularity (every measurable set can be approximated from above by an open set) do not hold in abstract measure spaces (which do not even have a meaningful notion of an open set), and so presumably will not play a role in the solution; similarly for any facts involving boxes. Instead, one should be trying to use general facts about measure, such as countable additivity, which are not specific to $\mathbf{R}^d$.
Remark 2.1.1. It is worth noting that this abstraction method does not always work; for instance, when viewed as a measure space, $\mathbf{R}^d$ is not completely arbitrary, but does have one or two features that distinguish it from a generic measure space, most notably the fact that it is σ-finite. So, even if the hypotheses and conclusion of a problem about $\mathbf{R}^d$ are purely measure-theoretic in nature, one may still need to use some measure-theoretic facts specific to $\mathbf{R}^d$. Here, it becomes useful to know a little bit about the classification of measure spaces to have some intuition as to how “generic” a measure space such as $\mathbf{R}^d$ really is. This intuition is hard to convey at this level of the subject, but in general, measure spaces form a very “non-rigid” category, with very few invariants, and so it is largely true that one measure space usually behaves much the same as any other.
Another example of abstraction: suppose that a problem involves a large number of sets (e.g. $E_n$ and $F_n$) and their measures, but that the conclusion of the problem only involves the measures $m(E_n)$, $m(F_n)$ of the sets, rather than the sets themselves. Then one can try to abstract the sets out of the problem, by trying to write down every single relationship between the numerical quantities $m(E_n)$, $m(F_n)$ that one can easily deduce from the given hypotheses (together with basic properties of measure, such as monotonicity or countable additivity). One can then rename these quantities (e.g. $a_n := m(E_n)$ and $b_n := m(F_n)$) to “forget” that these quantities arose from a measure-theoretic context, and then work with a purely numerical problem, in which one is starting with hypotheses on some sequences $a_n$, $b_n$ of numbers and trying to deduce a conclusion about such sequences. Such a problem is often easier to solve than the original problem due to the simpler context. Sometimes, this simplified problem will end up being false, but the counterexample will often be instructive, either in indicating the need to add an additional hypothesis connecting the $a_n$, $b_n$, or in indicating that one cannot work at this level of abstraction but must introduce some additional concrete ingredient.
Note that this trick is in many ways the antithesis of §2.1.9, because by passing to a special case, one often makes the problem more concrete, with more things that one is now able to start trying. However, the two tricks can work together. One particularly useful “advanced move” in mathematical problem solving is to first abstract the problem to a more general one, and then consider a special case of that more abstract problem which is not directly related to the original one, but is somehow simpler than the original while still capturing some of the essence of the difficulty. Attacking this alternate problem can then lead to some indirect but important ways to make progress on the original problem.
2.1.11. Exploit Zeno’s paradox: a single epsilon can be cut up into countably many sub-epsilons. A particularly useful fact in measure theory is that one can cut up a single epsilon into countably many pieces, for instance by using the geometric series identity
ε = ε/2 + ε/4 + ε/8 + . . . ;
this observation arguably goes all the way back to Zeno. As such, even if one only has an epsilon of room budgeted for a problem, one can still use this budget to do a countably infinite number of things. This fact underlies many of the countable additivity and subadditivity properties in measure theory, and also makes the ability to approximate rough objects by smoother ones useful even when countably many rough objects need to be approximated. (Exercise 1.2.3 is a typical example in which this trick is used.)
In general, one should be alert to this sort of trick when one has to spend an epsilon or so on an infinite number of objects. If one were forced to spend the same epsilon on each object, one would soon end up with an unacceptable loss; but if one can get away with using a different epsilon each time, then Zeno’s trick comes in very handy.
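As a small numerical illustration of the budgeting idea, the Python sketch below allots ε/2^n to the n-th object and confirms, with exact rational arithmetic, that the first fifty allotments together stay strictly below ε. The function name `zeno_budgets` and the values of ε and the cutoff are assumptions chosen only for this example.

```python
from fractions import Fraction

def zeno_budgets(epsilon, count):
    """Return the budgets epsilon/2, epsilon/4, ..., epsilon/2**count."""
    return [epsilon / 2**n for n in range(1, count + 1)]

epsilon = Fraction(1, 100)
budgets = zeno_budgets(epsilon, 50)

# The partial sum equals epsilon * (1 - 2**-50), strictly below epsilon.
total = sum(budgets)
```

Since every partial sum is below ε, the full countable sum is at most ε, which is exactly what is needed when each of countably many objects must be given its own positive error allowance.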
2.1.12. If you expand your way to a double sum, a double integral, a sum of an integral, or an integral of a sum, try interchanging the two operations. Or, to put it another way: “The Fubini-Tonelli theorem (Corollary 1.7.23) is your friend”. Provided that one is either in the unsigned or absolutely convergent worlds, this theorem allows you to interchange sums and integrals with each other. In many cases, a double sum or integral that is difficult to sum in one direction can become easier to sum (or at least to upper bound, which is often all that one needs in analysis). In fact, if in the course of expanding an expression, you encounter such a double sum or integral, you should reflexively try interchanging the operations to see if the resulting expression looks any simpler.
Note that in some cases the parameters in the summation may be constrained, and one may have to take a little care to sum it properly. For instance,
$$\sum_{n=-\infty}^{\infty} \sum_{m=n}^{\infty} a_{m,n} \tag{2.1}$$
interchanges (assuming that the $a_{m,n}$ are either unsigned or absolutely convergent) to
$$\sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{m} a_{m,n}$$
(why? try plotting the set of pairs (m, n) that appear in both). If one is having trouble interchanging constrained sums or integrals, one solution is to re-express the constraint using indicator functions. For instance, one can rewrite the constrained sum (2.1) as the unconstrained sum
$$\sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} 1_{m \ge n}\, a_{m,n}$$
(extending the domain of $a_{m,n}$ if necessary), at which point interchanging the summations is easily accomplished.
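The interchange can be checked numerically when the summand is finitely supported, so that all the sums are finite. In the Python sketch below, the summand `a` and the window size are hypothetical choices made for illustration; the three expressions correspond to the sum (2.1), its interchanged form, and the unconstrained form with the indicator of m ≥ n.

```python
def a(m, n):
    """Hypothetical summand, supported on a finite window so all sums are finite."""
    if abs(m) > 6 or abs(n) > 6:
        return 0
    return (m + 2 * n) * (m - n + 1)

# The window contains the support of a, so truncating to it loses nothing.
window = range(-8, 9)

# Original order (2.1): outer sum over n, inner sum over m >= n.
n_then_m = sum(a(m, n) for n in window for m in window if m >= n)

# Interchanged order: outer sum over m, inner sum over n <= m.
m_then_n = sum(a(m, n) for m in window for n in window if n <= m)

# Unconstrained form: the constraint becomes the indicator 1_{m >= n}.
with_indicator = sum(
    (1 if m >= n else 0) * a(m, n) for n in window for m in window
)
```

All three quantities agree because each one sums `a(m, n)` over the same set of pairs, namely those with m ≥ n; the indicator form makes this explicit, which is why it is the safest way to carry out the interchange.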
The following point is obvious, but bears mentioning explicitly:
while the interchanging sums/integrals trick can be very powerful,
one should not apply it twice in a row to the same double sum or
double operation, unless one is doing something genuinely nontrivial
in between those two applications. So, after one exchanges a sum
or integral, the next move should be something other than another
exchange (unless one is dealing with a triple or higher sum or integral).
A related move (not so commonly used in measure theory, but occurring in other areas of analysis, particularly those involving the geometry of Euclidean spaces) is to merge two sums or integrals into a single sum or integral over the product space, in order to use some additional feature of the product space (e.g. rotation symmetry) that is not readily visible in the factor spaces. The classic example of this trick is the evaluation of the gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\, dx$ by squaring it, rewriting that square as the two-dimensional gaussian integral $\int_{\mathbf{R}^2} e^{-x^2 - y^2}\, dx\, dy$, and then switching to polar coordinates.
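For reference, the standard computation runs as follows. The first equality uses Tonelli’s theorem (the integrand is unsigned) and the second is the change to polar coordinates, which is taken for granted here rather than derived:

```latex
\Big( \int_{-\infty}^{\infty} e^{-x^2}\, dx \Big)^2
= \int_{\mathbf{R}^2} e^{-x^2 - y^2}\, dx\, dy
= \int_0^{2\pi} \int_0^{\infty} e^{-r^2}\, r\, dr\, d\theta
= 2\pi \cdot \frac{1}{2} = \pi,
\qquad\text{so}\qquad
\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}.
```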
2.1.13. Pointwise control, uniform control, and integrated
(average) control are all partially convertible to each other.
There are three main ways to control functions (or sequences of functions, etc.) in analysis. Firstly, there is pointwise control, in which one can control the function at every point (or almost every point), but in a non-uniform way. Then there is uniform control, where one can control the function in the same way at most points (possibly throwing out a set of zero measure, or small measure). Finally, there is integrated control (or control “on the average”), in which one controls the integral of a function, rather than the pointwise values of that function.
It is important to realise that control of one type can often be partially converted to another type. Simple examples include the deduction of pointwise convergence from uniform convergence, or integrating a pointwise bound f(x) ≤ g(x) to obtain an integrated bound $\int f \le \int g$. Of course, these conversions are not reversible and thus lose some information; not every pointwise convergent sequence is uniformly convergent, and an integral bound does not imply a pointwise bound. However, one can partially reverse such implications if one is willing to throw away an exceptional set (§2.1.7). For instance, Egorov’s theorem (Theorem 1.3.26) lets one convert pointwise convergence to (local) uniform convergence after throwing away an exceptional set, and Markov’s inequality (Exercise 1.4.36(vi)) lets one convert integral bounds to pointwise bounds, again after throwing away an exceptional set.
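To spell out the last conversion in one line: suppose that $f$ is measurable on a measure space $(X, \mathcal{B}, \mu)$ with $0 < \int_X |f|\, d\mu < \infty$, and let ε > 0 (this notation is chosen for the sketch). Markov’s inequality with a suitable threshold λ then gives a pointwise bound outside a set of measure at most ε:

```latex
\mu(\{ x : |f(x)| \ge \lambda \}) \le \frac{1}{\lambda} \int_X |f|\, d\mu
\quad\text{for every } \lambda > 0;
\qquad
\lambda := \frac{1}{\varepsilon} \int_X |f|\, d\mu
\;\Longrightarrow\;
|f(x)| < \lambda \text{ outside a set of measure at most } \varepsilon .
```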
2.1.14. If the conclusion and hypotheses look particularly close to each other, just expand out all the definitions and follow your nose. This trick is particularly useful when building the most basic foundations of a theory. Here, one may not need to experiment too much with generalisations, abstractions, or special cases, or try to marshall a lot of possibly relevant facts about the objects being studied: sometimes, all one has to do is go back to first principles, write out all the definitions with their epsilons and deltas, and start plugging away at the problem.
Knowing when to just follow one’s nose, and when to instead look for a more high-level approach to a problem, can require some judgement or experience. A direct approach tends to work best when the conclusion and hypothesis already look quite similar to each other (e.g. they both state that a certain set or family of sets is measurable, or they both state that a certain function or family of functions is continuous, etc.). But when the conclusion looks quite different from the hypotheses (e.g. the conclusion is some sort of integral identity, and the hypotheses involve measurability or convergence properties), then one may need to use more sophisticated tools than what one can easily get from using first principles.
2.1.15. Don’t worry too much about exactly what ε (or δ, or
N, etc.) needs to be. It can usually be chosen or tweaked
later if necessary. Often in the middle of an argument, you will
want to use some fact that involves a parameter, such as ε, that
you are completely free to choose (subject of course to reasonable
constraints, such as requiring ε to be positive). For instance, you
may have a measurable set and decide to approximate it from above
by an open set of at most ε more measure. But it may not be obvious
exactly what value to give this parameter ε; you have so many choices
available that you don’t know which one to pick!
In many cases, one can postpone thinking about this problem by leaving ε undetermined for now, and continuing on with one’s argument, which will gradually start being decorated with ε’s all over the place. At some point, one will need ε to do something (and, in the particular case of ε, “doing something” almost always means “being small enough”), e.g. one may need 3nε to be less than δ, where n, δ are some other positive quantities in one’s problem that do not depend on ε. At this point, one could now set ε to be whatever is needed to get past this step in the argument, e.g. one could set ε to equal δ/(4n). But perhaps one still wishes to retain the freedom to set ε because it might come in handy later. In that case, one sets aside the requirement “3nε < δ” and keeps going. Perhaps a bit later on, one might need ε to do something else; for instance, one might also need $5\varepsilon \le 2^{-n}$. Once one has compiled the complete “wish list” of everything one wishes one’s parameters to do, then one can finally make the decision of what value to set those parameters equal to. For instance, if the above two inequalities are the only inequalities required of ε, one can choose ε equal to $\min(\delta/(4n), 2^{-n}/5)$. This may be a choice of ε which was not obvious at the start of the argument, but becomes so as the argument progresses.
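The “wish list” above can be checked mechanically. The following Python sketch uses a hypothetical helper `choose_epsilon` and sample values of n and δ (both assumptions for this illustration) to select ε only after both constraints are known, and confirms in exact arithmetic that each constraint holds.

```python
from fractions import Fraction

def choose_epsilon(n, delta):
    """Pick epsilon meeting both wish-list items: 3*n*eps < delta and 5*eps <= 2**-n."""
    return min(delta / (4 * n), Fraction(1, 5 * 2**n))

# Sample values; n and delta do not depend on epsilon.
n, delta = 3, Fraction(1, 10)
eps = choose_epsilon(n, delta)

first_wish = 3 * n * eps < delta            # requirement "3n eps < delta"
second_wish = 5 * eps <= Fraction(1, 2**n)  # requirement "5 eps <= 2^-n"
```

Taking the minimum of the two candidate values is what allows both requirements to be met at once; a third requirement would simply add a third argument to the minimum.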
There is however one big caveat when adopting this “choose parameters later” approach, which is that one needs to avoid a circular dependence of constants. For instance, it is perfectly fine to have two arbitrary parameters ε and δ floating around unspecified for most of the argument, until at some point you realise that you need ε to be smaller than δ, and so one chooses ε accordingly (e.g. one sets it to equal δ/2). Or, perhaps instead one needs δ to be smaller than ε, and so sets δ equal to ε/2. One can execute either of these two choices separately, but of course one cannot perform them simultaneously; this sets up an inconsistent circular dependency in which ε needs to be defined after δ is chosen, and δ can only be chosen after ε is fixed. So, if one is going to delay choosing a parameter such as ε until later, it becomes important to mentally keep track of what objects in one’s argument depend on ε, and which ones are independent of ε. One can choose ε in terms of the latter quantities, but one usually cannot do so in terms of the former quantities (unless one takes the care to show that the interlinked constraints between the quantities are still consistent, and thus simultaneously satisfiable).
2.1.16. Once one has started to lose some constants, don’t
be hesitant to lose some more. Many techniques in analysis end
up giving inequalities that are ineﬃcient by a constant factor. For
instance, any argument involving dyadic decomposition and powers
of two tends to involve losses of factors of 2. When arguing using balls
in Euclidean space, one sometimes loses factors involving the volume
of the unit ball (although this factor often cancels itself out if one
tracks it more carefully). And so forth. However, in many cases these
constant factors end up being of little importance: an upper bound
of 2ε or 100ε is often just as good as an upper bound of ε for the
purposes of analysis (cf. §2.1.15). So it is often best not to invest too
much energy in carefully computing and optimising these constants;
giving these constants a symbol such as C, and not worrying about
their exact value, is often the simplest approach. (One can also use
asymptotic notation, such as O(), which is very convenient to use
once you know how it works.)
Now there are some cases in which one really does not want to
lose any constants at all. For instance, if one is using §2.1.1 to prove
that X = Y , it is not enough to show that X ≤ 2Y and Y ≤ 2X;
one really needs to show X ≤ Y and Y ≤ X without losing any
constants. (But proving X ≤ (1 + ε)Y and Y ≤ (1 + ε)X is OK,
by §2.1.2.) But once one has already performed one step that loses
a constant, there is little further to be lost by losing more; there can
be a big diﬀerence between X ≤ Y and X ≤ 2Y , but there is little
diﬀerence in practice between X ≤ 2Y and X ≤ 100Y , at least for
the purposes of mathematical analysis. At that stage, one should
put oneself in the mental mode of thought where “constants don’t
matter”, which can lead to some simpliﬁcations. For instance, if one
has to estimate a sum X + Y of two positive quantities, one can start
using such estimates as
$$\max(X, Y) \le X + Y \le 2\max(X, Y),$$
which says that, up to a factor of 2, X + Y is the same thing as
max(X, Y). In some cases the latter is easier to work with (e.g.
$\max(X, Y)^n$ is equal to $\max(X^n, Y^n)$, whereas the formula for
$(X + Y)^n$ is messier).
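As a quick sanity check of these two facts, here is a small Python sketch. The helper names are hypothetical, and the checks are only run on a few sample positive rationals, so this illustrates the estimates rather than proving them.

```python
from fractions import Fraction


def sum_comparable_to_max(x, y) -> bool:
    """Check max(x, y) <= x + y <= 2*max(x, y) for positive x, y."""
    m = max(x, y)
    return m <= x + y <= 2 * m


def max_commutes_with_power(x, y, n: int) -> bool:
    """Check max(x, y)**n == max(x**n, y**n) for positive x, y."""
    return max(x, y) ** n == max(x**n, y**n)


# A few sample positive quantities (chosen arbitrarily for illustration).
samples = [(Fraction(1, 3), Fraction(5, 2)), (Fraction(7), Fraction(7)), (Fraction(9, 10), Fraction(1, 1000))]
assert all(sum_comparable_to_max(x, y) for x, y in samples)
assert all(max_commutes_with_power(x, y, n) for x, y in samples for n in range(1, 6))
```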
2.1.17. One can often pass to a subsequence to improve the
convergence properties. In real analysis, one often ends up possessing
a sequence of objects, such as a sequence of functions $f_n$,
which may converge in some rather slow or weak fashion to a limit f.
Often, one can improve the convergence of this sequence by passing
to a subsequence. For instance:
• In a metric space, if a sequence $x_n$ converges to a limit
x, then one can find a subsequence $x_{n_j}$ which converges
quickly to the same limit x, for instance one can ensure that
$d(x_{n_j}, x) \le 2^{-j}$ (or one can replace $2^{-j}$ with any other positive
expression depending on j). In particular, one can make
$\sum_{j=1}^\infty d(x_{n_j}, x)$ and $\sum_{j=1}^\infty d(x_{n_j}, x_{n_{j+1}})$ absolutely
convergent, which is sometimes useful.
• A sequence of functions that converges in $L^1$ norm or in measure
can be refined to a subsequence that converges pointwise
almost everywhere as well.
• A sequence in a (sequentially) compact space may not converge
at all, but some subsequence of it will always converge.
• The pigeonhole principle: A sequence which takes only finitely
many values has a subsequence that is constant. More generally,
a sequence which lives in the union of finitely many
sets has a subsequence that lives in just one of these sets.
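The first item in the list above (extracting a rapidly converging subsequence) can be sketched as follows. This is a minimal illustration, assuming the limit is already known and using the sample sequence x_n = 1/n in the real line; the function name `fast_subsequence` and the `search_cap` safeguard are hypothetical.

```python
from fractions import Fraction


def fast_subsequence(x, limit, dist, how_many, search_cap=10**6):
    """Return indices n_1 < n_2 < ... with dist(x(n_j), limit) <= 2**(-j)."""
    indices = []
    n = 0
    for j in range(1, how_many + 1):
        n += 1  # move past the previous index so the indices strictly increase
        while dist(x(n), limit) > Fraction(1, 2**j):
            n += 1
            if n > search_cap:
                raise RuntimeError("no suitable index found below search_cap")
        indices.append(n)
    return indices


# Sample sequence x_n = 1/n, which converges (slowly) to 0.
idx = fast_subsequence(lambda n: Fraction(1, n), Fraction(0), lambda a, b: abs(a - b), 6)

# The distances along the subsequence are summable: the partial sum stays below 1.
assert sum(Fraction(1, n) for n in idx) < 1
```

For this particular sequence the selected indices are the powers of two, so the subsequence converges geometrically even though the original sequence does not.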
Often, the subsequence is good enough for one’s applications, and
there are also a number of ways to get back from a subsequence to
the original sequence, such as:
• In a metric space, if you know that $x_n$ is a Cauchy sequence,
and some subsequence of $x_n$ already converges to x, then
this drags the entire sequence with it, i.e. $x_n$ converges to
x also.
• The Urysohn subsequence principle: in a topological space,
if every subsequence of a sequence $x_n$ itself has a subsequence
that converges to a limit x, then the entire sequence
converges to x.
2.1.18. A real limit can be viewed as a meeting of the limit
superior and limit inferior. A sequence $x_n$ of real numbers does
not necessarily have a limit $\lim_{n\to\infty} x_n$, but the limit superior
$\limsup_{n\to\infty} x_n := \inf_N \sup_{n>N} x_n$ and the limit inferior
$\liminf_{n\to\infty} x_n := \sup_N \inf_{n>N} x_n$
always exist (though they may be infinite), and can be easily defined
in terms of inﬁma and suprema. Because of this, it is often convenient
to work with the lim sup and lim inf instead of a limit. For instance,
to show that a limit $\lim_{n\to\infty} x_n$ exists, it suffices to show that
$$\limsup_{n\to\infty} x_n \le \liminf_{n\to\infty} x_n + \varepsilon$$
for all ε > 0. In a similar spirit, to show that a sequence $x_n$ of real
numbers converges to zero, it suffices to show that
$$\limsup_{n\to\infty} |x_n| \le \varepsilon$$
for all ε > 0. It can be more convenient to work with lim sups and
lim infs instead of limits because one does not need to worry about
the issue of whether the limit exists or not, and many tools (notably
Fatou’s lemma and its relatives) still work in this setting. One should
however be cautious that lim sups and lim infs tend to have only
one half of the linearity properties that limits do; for instance, lim
sups are subadditive but not necessarily additive, while lim infs are
superadditive but not necessarily additive.
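The failure of additivity can be seen on a simple example, sketched here in Python. The sequences x_n = (-1)^n and y_n = (-1)^{n+1} are chosen for illustration, and the helper names are hypothetical. Because these sequences are periodic, the supremum (or infimum) over a finite window of a tail already equals the true lim sup (or lim inf); for a general sequence a finite window would only be a heuristic proxy.

```python
def tail_sup(seq, start, window):
    """Supremum of seq over the finite window start, ..., start + window - 1."""
    return max(seq(n) for n in range(start, start + window))


def tail_inf(seq, start, window):
    """Infimum of seq over the finite window start, ..., start + window - 1."""
    return min(seq(n) for n in range(start, start + window))


def x(n):
    return (-1) ** n


def y(n):
    return (-1) ** (n + 1)


def s(n):
    return x(n) + y(n)  # identically zero


START, WINDOW = 1000, 100  # arbitrary; any window of length >= 2 works for period-2 sequences

# lim sup is subadditive, with strict inequality here: 0 < 1 + 1.
assert tail_sup(s, START, WINDOW) < tail_sup(x, START, WINDOW) + tail_sup(y, START, WINDOW)
# lim inf is superadditive, with strict inequality here: 0 > (-1) + (-1).
assert tail_inf(s, START, WINDOW) > tail_inf(x, START, WINDOW) + tail_inf(y, START, WINDOW)
```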
The proof of the monotone diﬀerentiation theorem (Theorem
1.6.25) given in the text relies quite heavily on this strategy.
2.2. The Rademacher differentiation theorem
The Fubini-Tonelli theorem (Corollary 1.7.23) is often used in extending
lower-dimensional results to higher-dimensional ones. We
illustrate this by extending the one-dimensional Lipschitz differentiation
theorem (Exercise 1.6.41) to higher dimensions, obtaining the
Rademacher differentiation theorem.
We first recall some higher-dimensional definitions:
Definition 2.2.1 (Lipschitz continuity). A function f : X → Y from
one metric space $(X, d_X)$ to another $(Y, d_Y)$ is said to be Lipschitz
continuous if there exists a constant C > 0 such that
$d_Y(f(x), f(x')) \le C d_X(x, x')$ for all $x, x' \in X$. (In the applications of this section, X
will be $\mathbf{R}^d$ and Y will be $\mathbf{R}$, with the usual metrics.)
Exercise 2.2.1. Show that Lipschitz continuous functions are uniformly
continuous, and hence continuous. Then give an example of a
uniformly continuous function f : [0, 1] → [0, 1] that is not Lipschitz
continuous.
Definition 2.2.2 (Differentiability). Let $f : \mathbf{R}^d \to \mathbf{R}$ be a function,
and let $x_0 \in \mathbf{R}^d$. For any $v \in \mathbf{R}^d$, we say that f is directionally
differentiable at $x_0$ in the direction v if the limit
$$D_v f(x_0) := \lim_{h \to 0;\, h \in \mathbf{R} \setminus \{0\}} \frac{f(x_0 + hv) - f(x_0)}{h}$$
exists, in which case we call $D_v f(x_0)$ the directional derivative of f
at $x_0$ in this direction. If $v = e_i$ is one of the standard basis vectors
$e_1, \ldots, e_d$ of $\mathbf{R}^d$, we write $D_v f(x_0)$ as $\frac{\partial f}{\partial x_i}(x_0)$, and refer to this as
the partial derivative of f at $x_0$ in the $e_i$ direction.
We say that f is totally differentiable at $x_0$ if there exists a vector
$\nabla f(x_0) \in \mathbf{R}^d$ with the property that
$$\lim_{h \to 0;\, h \in \mathbf{R}^d \setminus \{0\}} \frac{f(x_0 + h) - f(x_0) - h \cdot \nabla f(x_0)}{|h|} = 0,$$
where $v \cdot w$ is the usual dot product on $\mathbf{R}^d$. We refer to $\nabla f(x_0)$ (if it
exists) as the gradient of f at $x_0$.
Remark 2.2.3. From the viewpoint of differential geometry, it is
better to work not with the gradient vector $\nabla f(x_0) \in \mathbf{R}^d$, but rather
with the derivative covector $df(x_0) : \mathbf{R}^d \to \mathbf{R}$ given by
$df(x_0) : v \mapsto \nabla f(x_0) \cdot v$. This is because one can then define the notion of
total differentiability without any mention of the Euclidean dot product,
which allows one to extend this notion to other manifolds in
which there is no Euclidean (or more generally, Riemannian) structure.
However, as we are working exclusively in Euclidean space for
this application, this distinction will not be important for us.
Total differentiability implies directional and partial differentiability,
but not conversely, as the following three exercises demonstrate.
Exercise 2.2.2 (Total differentiability implies directional and partial
differentiability). Show that if $f : \mathbf{R}^d \to \mathbf{R}$ is totally differentiable
at $x_0$, then it is directionally differentiable at $x_0$ in each direction
$v \in \mathbf{R}^d$, and one has the formula
$$D_v f(x_0) = v \cdot \nabla f(x_0). \tag{2.2}$$
In particular, the partial derivatives $\frac{\partial f}{\partial x_i}(x_0)$ exist for i = 1, . . . , d
and
$$\nabla f(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_d}(x_0) \right). \tag{2.3}$$
Exercise 2.2.3 (Continuous partial differentiability implies total
differentiability). Let $f : \mathbf{R}^d \to \mathbf{R}$ be such that the partial derivatives
$\frac{\partial f}{\partial x_i} : \mathbf{R}^d \to \mathbf{R}$ exist everywhere and are continuous. Then show
that f is totally differentiable everywhere, which in particular implies
that the gradient is given by the formula (2.3) and the directional
derivatives are given by (2.2).
Exercise 2.2.4 (Directional differentiability does not imply total
differentiability). Let $f : \mathbf{R}^2 \to \mathbf{R}$ be defined by setting f(0, 0) := 0 and
$f(x_1, x_2) := \frac{x_1 x_2^2}{x_1^2 + x_2^2}$ for $(x_1, x_2) \in \mathbf{R}^2 \setminus \{(0, 0)\}$. Show that the
directional derivatives $D_v f(x)$ exist for all $x, v \in \mathbf{R}^2$ (so in particular, the
partial derivatives exist), but that f is not totally differentiable at
the origin (0, 0).
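Before attempting the exercise, it can help to compute a few difference quotients at the origin exactly. The following Python sketch (helper names hypothetical) does this with rational arithmetic; it is numerical evidence for what goes wrong at the origin, not a substitute for the proof requested in the exercise.

```python
from fractions import Fraction


def f(x1, x2):
    """The function of Exercise 2.2.4, evaluated in exact rational arithmetic."""
    x1, x2 = Fraction(x1), Fraction(x2)
    if x1 == 0 and x2 == 0:
        return Fraction(0)
    return x1 * x2**2 / (x1**2 + x2**2)


def difference_quotient_at_origin(v, h):
    """(f(0 + h v) - f(0)) / h for a direction v = (v1, v2) and a nonzero h."""
    h = Fraction(h)
    return (f(h * v[0], h * v[1]) - f(0, 0)) / h


small_steps = [Fraction(1, 10), Fraction(-1, 1000), Fraction(1, 10**9)]

# Along the coordinate axes the quotients vanish, so both partial derivatives at 0 are 0.
assert all(difference_quotient_at_origin((1, 0), h) == 0 for h in small_steps)
assert all(difference_quotient_at_origin((0, 1), h) == 0 for h in small_steps)

# Along the diagonal the quotient is 1/2 for every step size, which disagrees with
# the value v . (0, 0) = 0 that formula (2.2) would force if f were totally differentiable.
assert all(difference_quotient_at_origin((1, 1), h) == Fraction(1, 2) for h in small_steps)
```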
Now we can state the Rademacher diﬀerentiation theorem.
Theorem 2.2.4 (Rademacher differentiation theorem). Let $f : \mathbf{R}^d \to \mathbf{R}$
be Lipschitz continuous. Then f is totally differentiable at $x_0$ for
almost every $x_0 \in \mathbf{R}^d$.
Note that the d = 1 case of this theorem is Exercise 1.6.41, and
indeed we will use the one-dimensional theorem to imply the
higher-dimensional one, though there will be some technical issues due to
the gap between directional and total differentiability.
Proof. The strategy here is to ﬁrst aim for the more modest goal of
directional diﬀerentiability, and then ﬁnd a way to link the directional
derivatives together to get total diﬀerentiability.
Let $v, x_0 \in \mathbf{R}^d$. As f is continuous, we see that in order for the
directional derivative
$$D_v f(x_0) := \lim_{h \to 0;\, h \in \mathbf{R} \setminus \{0\}} \frac{f(x_0 + hv) - f(x_0)}{h}$$
to exist, it suffices to let h range in the dense subset $\mathbf{Q} \setminus \{0\}$ of $\mathbf{R} \setminus \{0\}$
for the purposes of determining whether the limit exists. In particular,
$D_v f(x_0)$ exists if and only if
$$\limsup_{h \to 0;\, h \in \mathbf{Q} \setminus \{0\}} \frac{f(x_0 + hv) - f(x_0)}{h} = \liminf_{h \to 0;\, h \in \mathbf{Q} \setminus \{0\}} \frac{f(x_0 + hv) - f(x_0)}{h}.$$
From this we easily conclude that for each direction $v \in \mathbf{R}^d$, the set
$$E_v := \{x_0 \in \mathbf{R}^d : D_v f(x_0) \text{ does not exist}\}$$
is Lebesgue measurable in $\mathbf{R}^d$ (indeed, it is even Borel measurable). A
similar argument reveals that $D_v f$ is a measurable function outside
of $E_v$. From the Lipschitz nature of f, we see that $D_v f$ is also a
bounded function.
Now we claim that $E_v$ is a null set for each v. For v = 0, $E_v$ is
clearly empty, so we may assume $v \neq 0$. Applying an invertible linear
transformation to map v to $e_1$ (noting that such transformations will
map Lipschitz functions to Lipschitz functions, and null sets to null
sets) we may assume without loss of generality that v is the basis
vector $e_1$. Thus our task is now to show that $\frac{\partial f}{\partial x_1}(x)$ exists for almost
every $x \in \mathbf{R}^d$.
We now split $\mathbf{R}^d$ as $\mathbf{R} \times \mathbf{R}^{d-1}$. For each $x_0 \in \mathbf{R}$ and $y_0 \in \mathbf{R}^{d-1}$,
we see from the definitions that $\frac{\partial f}{\partial x_1}(x_0, y_0)$ exists if and only if the
one-dimensional function $x \mapsto f(x, y_0)$ is differentiable at $x_0$. But this
function is Lipschitz continuous (this is inherited from the Lipschitz
continuity of f), and so we see from the one-dimensional theorem
(Exercise 1.6.41) that for each fixed $y_0 \in \mathbf{R}^{d-1}$, the set
$E_{y_0} := \{x_0 \in \mathbf{R} : (x_0, y_0) \in E_{e_1}\}$ is a null set in $\mathbf{R}$. Applying Tonelli’s
theorem for sets (Corollary 1.7.19), we conclude that $E_{e_1}$ is a null set
as required.
We would like to now conclude that $\bigcup_{v \in \mathbf{R}^d} E_v$ is a null set, but
there are uncountably many v’s, so this is not directly possible. However,
as $\mathbf{Q}^d$ is countable, we can at least assert that $E := \bigcup_{v \in \mathbf{Q}^d} E_v$ is
a null set. In particular, for almost every $x_0 \in \mathbf{R}^d$, f is directionally
differentiable in every rational direction $v \in \mathbf{Q}^d$.
Now we perform an important trick, in which we interpret the
directional derivative $D_v f$ as a weak derivative. We already know
that $D_v f$ is almost everywhere defined, bounded and measurable.
Now let $g : \mathbf{R}^d \to \mathbf{R}$ be any function that is compactly supported
and Lipschitz continuous. We investigate the integral
$$\int_{\mathbf{R}^d} D_v f(x) g(x)\, dx.$$
This integral is absolutely convergent since $D_v f(x)$ is bounded and
measurable, and g(x) is continuous and compactly supported, hence
bounded. We expand this out as
$$\int_{\mathbf{R}^d} \lim_{h \to 0;\, h \in \mathbf{R} \setminus \{0\}} \frac{f(x + hv) - f(x)}{h} g(x)\, dx.$$
Note (from the Lipschitz nature of f) that the expression $\frac{f(x+hv)-f(x)}{h} g(x)$
is bounded uniformly in h and x, and is also uniformly compactly
supported in x for h in a bounded set. We may thus apply the dominated
convergence theorem (Theorem 1.4.49) to pull the limit out of
the integral to obtain
$$\lim_{h \to 0;\, h \in \mathbf{R} \setminus \{0\}} \int_{\mathbf{R}^d} \frac{f(x + hv) - f(x)}{h} g(x)\, dx.$$
Now, from translation invariance of the Lebesgue integral (Exercise
1.3.15) we have
$$\int_{\mathbf{R}^d} f(x + hv) g(x)\, dx = \int_{\mathbf{R}^d} f(x) g(x - hv)\, dx$$
and so (by the linearity of the Lebesgue integral) we may rearrange
the previous expression as
$$\lim_{h \to 0;\, h \in \mathbf{R} \setminus \{0\}} \int_{\mathbf{R}^d} f(x) \frac{g(x - hv) - g(x)}{h}\, dx.$$
Now, as g is Lipschitz, we know that $\frac{g(x-hv)-g(x)}{h}$ is uniformly bounded
and converges pointwise almost everywhere to $D_{-v} g(x)$ as h → 0. We
may thus apply the dominated convergence theorem again and end
up with the integration by parts formula
$$\int_{\mathbf{R}^d} D_v f(x) g(x)\, dx = \int_{\mathbf{R}^d} f(x) D_{-v} g(x)\, dx. \tag{2.4}$$
This formula moves the directional derivative operator $D_v$ from f over
to g. At present, this does not look like much of an advantage, because
g is the same sort of function that f is. However, the key point
is that we can choose g to be whatever we please, whereas f is fixed.
In particular, we can choose g to be a compactly supported, continuously
differentiable function (such functions are Lipschitz from the
fundamental theorem of calculus, as their derivatives are bounded).
By Exercise 2.2.3, one has $D_{-v} g = -v \cdot \nabla g$ for such functions, and so
$$\int_{\mathbf{R}^d} D_v f(x) g(x)\, dx = -\int_{\mathbf{R}^d} f(x) (v \cdot \nabla g)(x)\, dx.$$
The right-hand side is linear in v, and so the left-hand side must be
linear in v also. In particular, if $v = (v_1, \ldots, v_d)$, then we have
$$\int_{\mathbf{R}^d} D_v f(x) g(x)\, dx = \sum_{j=1}^d v_j \int_{\mathbf{R}^d} D_{e_j} f(x) g(x)\, dx.$$
If we define the gradient candidate function
$$\nabla f(x) := (D_{e_1} f(x), \ldots, D_{e_d} f(x)) = \left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_d}(x) \right)$$
(note that this function is well-defined almost everywhere, even though
we don’t know yet whether f is totally differentiable almost everywhere),
we thus have
$$\int_{\mathbf{R}^d} (D_v f - v \cdot \nabla f)(x) g(x)\, dx = 0$$
for all compactly supported, continuously differentiable g. This implies
(see Exercise 2.2.5 below) that $F_v := D_v f - v \cdot \nabla f$ vanishes
almost everywhere, thus (by countable subadditivity) we have
$$D_v f(x_0) = v \cdot \nabla f(x_0) \tag{2.5}$$
for almost every $x_0 \in \mathbf{R}^d$ and every $v \in \mathbf{Q}^d$.
Let $x_0$ be such that (2.5) holds for all $v \in \mathbf{Q}^d$. We claim that
this forces f to be totally differentiable at $x_0$, which would give the
claim. Let $F : \mathbf{R}^d \to \mathbf{R}$ be the modified function
$$F(h) := f(x_0 + h) - f(x_0) - h \cdot \nabla f(x_0).$$
Our objective is to show that
$$\lim_{h \to 0;\, h \in \mathbf{R}^d \setminus \{0\}} |F(h)|/|h| = 0.$$
On the other hand, we have F(0) = 0, F is Lipschitz, and from (2.5)
we see that $D_v F(0) = 0$ for every $v \in \mathbf{Q}^d$.
Let ε > 0, and suppose that $h \in \mathbf{R}^d \setminus \{0\}$. Then we can write
h = ru where r := |h| and u := h/|h| lies on the unit sphere. This u
need not lie in $\mathbf{Q}^d$, but we can approximate it by some vector $v \in \mathbf{Q}^d$
with |u − v| ≤ ε. Furthermore, by the total boundedness of the unit
sphere, we can make v lie in a finite subset $V_\varepsilon$ of $\mathbf{Q}^d$ that only depends
on ε (and on d).
Since $D_v F(0) = 0$ for all $v \in V_\varepsilon$, we see (by making |h| small
enough depending on $V_\varepsilon$) that we have
$$\left| \frac{F(rv) - F(0)}{r} \right| \le \varepsilon$$
for all $v \in V_\varepsilon$, and thus
$$|F(rv)| \le \varepsilon r.$$
On the other hand, from the Lipschitz nature of F, we have
$$|F(ru) - F(rv)| \le Cr|u - v| \le Cr\varepsilon$$
where C is the Lipschitz constant of F. As h = ru, we conclude that
$$|F(h)| \le (C + 1) r \varepsilon.$$
In other words, we have shown that
$$|F(h)|/|h| \le (C + 1)\varepsilon$$
whenever |h| is sufficiently small depending on ε. Letting ε → 0, we
obtain the claim.
Exercise 2.2.5. Let $F : \mathbf{R}^d \to \mathbf{R}$ be a locally integrable function
with the property that $\int_{\mathbf{R}^d} F(x) g(x)\, dx = 0$ whenever g is a compactly
supported, continuously differentiable function. Show that F
is zero almost everywhere. (Hint: if not, use the Lebesgue differentiation
theorem to find a Lebesgue point $x_0$ of F for which $F(x_0) \neq 0$,
then pick a g which is supported in a sufficiently small neighbourhood
of $x_0$.)
2.3. Probability spaces
In this section we isolate an important special type of measure space,
namely a probability space. As the name suggests, these spaces are of
fundamental importance in the foundations of probability, although
it should be emphasised that probability theory should not be viewed
as the study of probability spaces, as these are merely models for the
true objects of study of that theory, namely the behaviour of random
events and random variables. (See §??? of ??? for further discussion
of this point. Cross-reference will be added once the remaining
sections of the blog are converted to book form - T.) This text
will however not be focused on applications to probability theory.
Definition 2.3.1 (Probability space). A probability space is a measure
space $(\Omega, \mathcal{F}, P)$ of total measure 1: P(Ω) = 1. The measure P
is known as a probability measure.
Note the change of notation: whereas measure spaces are traditionally
denoted by symbols such as $(X, \mathcal{B}, \mu)$, probability spaces are
traditionally denoted by symbols such as $(\Omega, \mathcal{F}, P)$. Of course, such
notational changes have no impact on the underlying mathematical
formalism, but they reflect the different cultures of measure theory
and probability theory. In particular, the various components Ω, $\mathcal{F}$,
P carry the following interpretations in probability theory, that are
absent in other applications of measure theory:
(i) The space Ω is known as the sample space, and is interpreted
as the set of all possible states ω ∈ Ω that a random system
could be in.
(ii) The σ-algebra $\mathcal{F}$ is known as the event space, and is interpreted
as the set of all possible events $E \in \mathcal{F}$ that one can
measure.
(iii) The measure P(E) of an event is known as the probability
of that event.
The various axioms of a probability space then formalise the foundational
axioms of probability, as set out by Kolmogorov.
Example 2.3.2 (Normalised measure). Given any measure space
$(X, \mathcal{B}, \mu)$ with 0 < µ(X) < +∞, the space $(X, \mathcal{B}, \frac{1}{\mu(X)} \mu)$ is a probability
space. For instance, if Ω is a non-empty finite set with the
discrete σ-algebra $2^\Omega$ and the counting measure #, then the normalised
counting measure $\frac{1}{\#\Omega} \#$ is a probability measure (known as
the (discrete) uniform probability measure on Ω), and $(\Omega, 2^\Omega, \frac{1}{\#\Omega} \#)$
is a probability space. In probability theory, this probability space
models the act of drawing an element of the discrete set Ω uniformly
at random.
Similarly, if $\Omega \subset \mathbf{R}^d$ is a Lebesgue measurable set of positive finite
Lebesgue measure, 0 < m(Ω) < ∞, then $(\Omega, \mathcal{L}[\mathbf{R}^d]|_\Omega, \frac{1}{m(\Omega)} m|_\Omega)$ is
a probability space. The probability measure $\frac{1}{m(\Omega)} m|_\Omega$ is known as
the (continuous) uniform probability measure on Ω. In probability
theory, this probability space models the act of drawing an element
of the continuous set Ω uniformly at random.
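The discrete case of Example 2.3.2 is easy to realise concretely. The sketch below builds the normalised counting measure on a finite set in Python; the function name `uniform_measure` and the choice of a six-sided die as sample space are assumptions made for illustration.

```python
from fractions import Fraction


def uniform_measure(omega):
    """Return the normalised counting measure E -> #E / #Omega on a finite set."""
    omega = frozenset(omega)
    if not omega:
        raise ValueError("the sample space must be non-empty")

    def P(event):
        event = frozenset(event)
        if not event <= omega:
            raise ValueError("an event must be a subset of the sample space")
        return Fraction(len(event), len(omega))

    return P


die = range(1, 7)           # sample space of a fair six-sided die
P = uniform_measure(die)

assert P(die) == 1                                  # total measure 1
assert P({2, 4, 6}) == Fraction(1, 2)               # probability of an even roll
assert P({1, 2}) + P({5}) == P({1, 2, 5})           # additivity on disjoint events
```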
Example 2.3.3 (Discrete and continuous probability measures). If
Ω is a (possibly infinite) non-empty set with the discrete σ-algebra
$2^\Omega$, and if $(p_\omega)_{\omega \in \Omega}$ are a collection of real numbers in [0, 1] with
$\sum_{\omega \in \Omega} p_\omega = 1$, then the measure P defined by $P := \sum_{\omega \in \Omega} p_\omega \delta_\omega$,
or in other words
$$P(E) := \sum_{\omega \in E} p_\omega,$$
is indeed a probability measure, and $(\Omega, 2^\Omega, P)$ is a probability space.
The function $\omega \mapsto p_\omega$ is known as the (discrete) probability distribution
of the state variable ω.
Similarly, if Ω is a Lebesgue measurable subset of $\mathbf{R}^d$ of positive
(and possibly infinite) measure, and f : Ω → [0, +∞] is a Lebesgue
measurable function on Ω (where of course we restrict the Lebesgue
measure space on $\mathbf{R}^d$ to Ω in the usual fashion) with $\int_\Omega f(x)\, dx = 1$,
then $(\Omega, \mathcal{L}[\mathbf{R}^d]|_\Omega, P)$ is a probability space, where $P := m_f$ is the
measure
$$P(E) := \int_\Omega 1_E(x) f(x)\, dx = \int_E f(x)\, dx.$$
The function f is known as the (continuous) probability density of
the state variable ω. (This density is not quite unique, since one can
modify it on a set of probability zero, but it is well-defined up to
this ambiguity. See §1.2 of An epsilon of room, Vol. I for further
discussion.)
Exercise 2.3.1 (No translation-invariant random integer). Show that
there is no probability measure P on the integers $\mathbf{Z}$ with the discrete
σ-algebra $2^{\mathbf{Z}}$ with the translation-invariance property P(E + n) =
P(E) for every event $E \in 2^{\mathbf{Z}}$ and every integer n.
Exercise 2.3.2 (No translation-invariant random real). Show that
there is no probability measure P on the reals $\mathbf{R}$ with the Lebesgue
σ-algebra $\mathcal{L}[\mathbf{R}]$ with the translation-invariance property P(E + x) =
P(E) for every event $E \in \mathcal{L}[\mathbf{R}]$ and every real x.
Many concepts in measure theory are of importance in probability
theory, although the terminology is changed to reflect the different
perspective on the subject. For instance, the notion of a property
holding almost everywhere is now replaced with that of a property
holding almost surely. A measurable function is now referred to as a
random variable and is often denoted by symbols such as X, and the
integral of that function on the probability space (if the random variable
is unsigned or absolutely integrable) is known as the expectation
of that random variable, and is denoted E(X). Thus, for instance,
the Borel-Cantelli lemma (Exercise 1.4.44) now reads as follows: given
any sequence $E_1, E_2, E_3, \ldots$ of events such that $\sum_{n=1}^\infty P(E_n) < \infty$, it
is almost surely true that at most finitely many of these events hold.
In a similar spirit, Markov’s inequality (Exercise 1.4.36(vi)) becomes
the assertion that $P(X \ge \lambda) \le \frac{1}{\lambda} E(X)$ for any non-negative random
variable X and any 0 < λ < ∞.
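Markov's inequality in this probabilistic form can be checked exactly on a finite probability space. The following Python sketch is an illustration only: the helper names are hypothetical, and the sample space (a fair die, with X the face value) is an assumption chosen for concreteness.

```python
from fractions import Fraction


def expectation(p, X):
    """E(X) on a finite probability space given as a dict state -> probability."""
    return sum(p[w] * X(w) for w in p)


def probability(p, event):
    """P(event), where the event is given as a predicate on states."""
    return sum((p[w] for w in p if event(w)), Fraction(0))


def markov_holds(p, X, lam):
    """Check P(X >= lam) <= E(X) / lam for a non-negative X and lam > 0."""
    return probability(p, lambda w: X(w) >= lam) <= expectation(p, X) / lam


# Fair die with X the face value, so that E(X) = 7/2.
p = {w: Fraction(1, 6) for w in range(1, 7)}


def X(w):
    return w


assert expectation(p, X) == Fraction(7, 2)
assert all(markov_holds(p, X, Fraction(lam)) for lam in (1, 2, 3, 5, 6, 10))
```

For instance, with λ = 5 the left-hand side is P(X ≥ 5) = 1/3, while the right-hand side is 7/10.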
2.4. Inﬁnite product spaces and the Kolmogorov
extension theorem
In Section 1.7.4 we considered the product of two sets, measurable
spaces, or (σ-finite) measure spaces. We now consider how to generalise
this concept to products of more than two such spaces. The
axioms of set theory allow us to form a Cartesian product
$X_A := \prod_{\alpha \in A} X_\alpha$ of any family $(X_\alpha)_{\alpha \in A}$ of sets indexed by another set A,
which consists of the space of all tuples $x_A = (x_\alpha)_{\alpha \in A}$ indexed by A,
for which $x_\alpha \in X_\alpha$ for all α ∈ A. This concept allows for a succinct
formulation of the axiom of choice (Axiom 0.0.4), namely that an
arbitrary Cartesian product of non-empty sets remains non-empty.
For any β ∈ A, we have the coordinate projection maps $\pi_\beta : X_A \to X_\beta$
defined by $\pi_\beta((x_\alpha)_{\alpha \in A}) := x_\beta$. More generally, given any
B ⊂ A, we define the partial projections $\pi_B : X_A \to X_B$ to the partial
product space $X_B := \prod_{\alpha \in B} X_\alpha$ by $\pi_B((x_\alpha)_{\alpha \in A}) := (x_\alpha)_{\alpha \in B}$. More
generally still, given two subsets C ⊂ B ⊂ A, we have the partial
subprojections $\pi_{C \leftarrow B} : X_B \to X_C$ defined by
$\pi_{C \leftarrow B}((x_\alpha)_{\alpha \in B}) := (x_\alpha)_{\alpha \in C}$. These partial subprojections obey the composition law
$\pi_{D \leftarrow C} \circ \pi_{C \leftarrow B} = \pi_{D \leftarrow B}$ for all D ⊂ C ⊂ B ⊂ A (and thus form
a very simple example of a category).
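One concrete way to picture these maps is to model a tuple indexed by B as a Python dictionary with key set B; a partial subprojection is then just restriction of the dictionary to a smaller key set. The sketch below (with the hypothetical helper name `proj` and arbitrary sample data) checks the composition law on one example.

```python
def proj(x, C):
    """pi_{C <- B}: restrict a tuple x indexed by B (a dict) to the sub-index set C."""
    missing = set(C) - set(x)
    if missing:
        raise KeyError(f"C is not a subset of the index set of x: {sorted(missing)}")
    return {alpha: x[alpha] for alpha in C}


# A sample point x_B with index set B = {a, b, c, d}, and nested subsets D of C of B.
x_B = {"a": 0, "b": 1, "c": 1, "d": 0}
C = {"a", "b", "c"}
D = {"a", "c"}

# Composition law: projecting B -> C -> D agrees with projecting B -> D directly.
assert proj(proj(x_B, C), D) == proj(x_B, D)
# Projecting onto the full index set is the identity.
assert proj(x_B, set(x_B)) == x_B
```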
As before, given any σ-algebra $\mathcal{B}_\beta$ on $X_\beta$, we can pull it back by
$\pi_\beta$ to create a σ-algebra
$$\pi_\beta^*(\mathcal{B}_\beta) := \{\pi_\beta^{-1}(E_\beta) : E_\beta \in \mathcal{B}_\beta\}$$
on $X_A$. One easily verifies that this is indeed a σ-algebra. Informally,
$\pi_\beta^*(\mathcal{B}_\beta)$ describes those sets (or “events”, if one is thinking in probabilistic
terms) that depend only on the $x_\beta$ coordinate of the state
$x_A = (x_\alpha)_{\alpha \in A}$, and whose dependence on $x_\beta$ is $\mathcal{B}_\beta$-measurable. We
can then define the product σ-algebra
$$\prod_{\beta \in A} \mathcal{B}_\beta := \left\langle \bigcup_{\beta \in A} \pi_\beta^*(\mathcal{B}_\beta) \right\rangle.$$
We have a generalisation of Exercise 1.7.18:
Exercise 2.4.1. Let $((X_\alpha, \mathcal{B}_\alpha))_{\alpha \in A}$ be a family of measurable spaces.
For any B ⊂ A, write $\mathcal{B}_B := \prod_{\beta \in B} \mathcal{B}_\beta$.
(1) Show that $\mathcal{B}_A$ is the coarsest σ-algebra on $X_A$ that makes
the projection maps $\pi_\beta$ measurable morphisms for all β ∈ A.
(2) Show that for each B ⊂ A, $\pi_B$ is a measurable morphism
from $(X_A, \mathcal{B}_A)$ to $(X_B, \mathcal{B}_B)$.
(3) If $E \in \mathcal{B}_A$, show that there exists an at most countable
set B ⊂ A and a set $E_B \in \mathcal{B}_B$ such that $E = \pi_B^{-1}(E_B)$.
Informally, this asserts that a measurable event can only
depend on at most countably many of the coordinates.
(4) If $f : X_A \to [0, +\infty]$ is $\mathcal{B}_A$-measurable, show that there
exists an at most countable set B ⊂ A and a $\mathcal{B}_B$-measurable
function $f_B : X_B \to [0, +\infty]$ such that $f = f_B \circ \pi_B$.
(5) If A is at most countable, show that $\mathcal{B}_A$ is the σ-algebra
generated by the sets $\prod_{\beta \in A} E_\beta$ with $E_\beta \in \mathcal{B}_\beta$ for all β ∈ A.
(6) On the other hand, show that if A is uncountable and the
$\mathcal{B}_\alpha$ are all non-trivial, then $\mathcal{B}_A$ is not the σ-algebra
generated by sets $\prod_{\beta \in A} E_\beta$ with $E_\beta \in \mathcal{B}_\beta$ for all β ∈ A.
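(7) If B ⊂ A, $E \in \mathcal{B}_A$, and $x_{A \setminus B} \in X_{A \setminus B}$, show that the set
$E_{x_{A \setminus B}, B} := \{x_B \in X_B : (x_B, x_{A \setminus B}) \in E\}$ lies in $\mathcal{B}_B$, where
we identify $X_B \times X_{A \setminus B}$ with $X_A$ in the obvious manner.
(8) If B ⊂ A, $f : X_A \to [0, +\infty]$ is $\mathcal{B}_A$-measurable, and
$x_{A \setminus B} \in X_{A \setminus B}$, show that the function $f_{x_{A \setminus B}, B} : x_B \mapsto f(x_B, x_{A \setminus B})$
is $\mathcal{B}_B$-measurable.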
Now we consider the problem of constructing a measure $\mu_A$ on
the product space $X_A$. Any such measure $\mu_A$ will induce pushforward
measures $\mu_B := (\pi_B)_* \mu_A$ on $X_B$ (introduced in Exercise 1.4.38), thus
$$\mu_B(E_B) := \mu_A(\pi_B^{-1}(E_B))$$
for all $E_B \in \mathcal{B}_B$. These measures obey the compatibility relation
$$(\pi_{C \leftarrow B})_* \mu_B = \mu_C \tag{2.6}$$
whenever C ⊂ B ⊂ A, as can be easily seen by chasing the definitions.
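The compatibility relation (2.6) can be verified by brute force in a small finite example. In the Python sketch below, the index set A has three elements, each factor is {0, 1}, and a measure is stored as a dictionary from points to masses; the helper name `pushforward` and the particular weights are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def pushforward(mu, B, C):
    """(pi_{C <- B})_* mu for a measure mu on {0,1}^B stored as a dict point -> mass.

    Points are tuples whose coordinates are listed in the order of the tuple B,
    and C must consist of elements of B.
    """
    positions = [B.index(alpha) for alpha in C]
    out = {}
    for point, mass in mu.items():
        key = tuple(point[i] for i in positions)
        out[key] = out.get(key, Fraction(0)) + mass
    return out


A = ("a", "b", "c")
weights = [1, 2, 3, 4, 5, 6, 7, 8]  # arbitrary weights summing to 36
mu_A = {pt: Fraction(w, 36) for pt, w in zip(product((0, 1), repeat=3), weights)}

B, C = ("a", "b"), ("a",)
mu_B = pushforward(mu_A, A, B)
mu_C = pushforward(mu_A, A, C)

assert sum(mu_A.values()) == 1
# Compatibility (2.6): pushing mu_B forward to C gives the same measure as mu_C.
assert pushforward(mu_B, B, C) == mu_C
```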
One can then ask whether one can reconstruct $\mu_A$ just from
the projections $\mu_B$ to finite subsets B. This is possible in the important
special case when the $\mu_B$ (and hence $\mu_A$) are probability measures,
provided one imposes an additional inner regularity hypothesis
on the measures $\mu_B$. More precisely:
Definition 2.4.1 (Inner regularity). A (metrisable) inner regular
measure space $(X, \mathcal{B}, \mu, d)$ is a measure space $(X, \mathcal{B}, \mu)$ equipped with
a metric d such that
(1) Every compact set is measurable; and
(2) One has $\mu(E) = \sup_{K \subset E,\, K \text{ compact}} \mu(K)$ for all measurable E.
We say that µ is inner regular if it is associated to an inner regular
measure space.
Thus for instance Lebesgue measure is inner regular, as are Dirac
measures and counting measures. Indeed, most measures that one actually
encounters in applications will be inner regular. For instance,
any finite Borel measure on $\mathbf{R}^d$ (or more generally, on a locally compact,
σ-compact space) is inner regular, as is any Radon measure; see
§1.10 of An epsilon of room, Vol. I.
Remark 2.4.2. One can generalise the concept of an inner regular
measure space to one which is given by a topology rather than a metric;
Kolmogorov’s extension theorem still holds in this more general
setting, but requires Tychonoff’s theorem, which is discussed in §1.8
of An epsilon of room, Vol. I. However, some minimal regularity hypotheses
of a topological nature are needed to make the Kolmogorov
extension theorem work, although this is usually not a severe restriction
in practice.
Theorem 2.4.3 (Kolmogorov extension theorem). Let $((X_\alpha, \mathcal{B}_\alpha), \mathcal{F}_\alpha)_{\alpha \in A}$
be a family of measurable spaces $(X_\alpha, \mathcal{B}_\alpha)$, equipped with a topology
$\mathcal{F}_\alpha$. For each finite B ⊂ A, let $\mu_B$ be an inner regular probability measure
on $\mathcal{B}_B := \prod_{\alpha \in B} \mathcal{B}_\alpha$ with the product topology $\mathcal{F}_B := \prod_{\alpha \in B} \mathcal{F}_\alpha$,
obeying the compatibility condition (2.6) whenever C ⊂ B ⊂ A are
two nested finite subsets of A. Then there exists a unique probability
measure $\mu_A$ on $\mathcal{B}_A$ with the property that $(\pi_B)_* \mu_A = \mu_B$ for all finite
B ⊂ A.
Proof. Our main tool here will be the Hahn-Kolmogorov extension
theorem for pre-measures (Theorem 1.7.8), combined with the Heine-Borel
theorem.
Let $\mathcal{B}_0$ be the set of all subsets of $X_A$ that are of the form $\pi_B^{-1}(E_B)$
for some finite B ⊂ A and some $E_B \in \mathcal{B}_B$. One easily verifies that
this is a Boolean algebra that is contained in $\mathcal{B}_A$. We define a function
$\mu_0 : \mathcal{B}_0 \to [0, +\infty]$ by setting
$$\mu_0(E) := \mu_B(E_B)$$
whenever E takes the form $\pi_B^{-1}(E_B)$ for some finite B ⊂ A and $E_B \in \mathcal{B}_B$.
Note that a set $E \in \mathcal{B}_0$ may have two different representations
$E = \pi_B^{-1}(E_B) = \pi_{B'}^{-1}(E_{B'})$ for some finite $B, B' \subset A$, but then one
must have $E_B = \pi_{B \leftarrow B \cup B'}(E_{B \cup B'})$ and $E_{B'} = \pi_{B' \leftarrow B \cup B'}(E_{B \cup B'})$,
where $E_{B \cup B'} := \pi_{B \cup B'}(E)$. Applying (2.6), we see that
$$\mu_B(E_B) = \mu_{B \cup B'}(E_{B \cup B'})$$
and
$$\mu_{B'}(E_{B'}) = \mu_{B \cup B'}(E_{B \cup B'})$$
and thus $\mu_B(E_B) = \mu_{B'}(E_{B'})$. This shows that $\mu_0(E)$ is well defined.
As the $\mu_B$ are probability measures, we see that $\mu_0(X_A) = 1$.
It is not difficult to see that $\mu_0$ is finitely additive. We now claim
that $\mu_0$ is a pre-measure. In other words, we claim that if $E \in \mathcal{B}_0$
is the disjoint countable union $E = \bigcup_{n=1}^\infty E_n$ of sets $E_n \in \mathcal{B}_0$, then
$\mu_0(E) = \sum_{n=1}^\infty \mu_0(E_n)$.
For each N ≥ 1, let $F_N := E \setminus \bigcup_{n=1}^N E_n$. Then the $F_N$ lie in
$\mathcal{B}_0$, are decreasing, and are such that $\bigcap_{N=1}^\infty F_N = \emptyset$. By finite additivity
(and the finiteness of $\mu_0$), we see that it suffices to show that
$\lim_{N \to \infty} \mu_0(F_N) = 0$.
Suppose this is not the case, then there exists ε > 0 such that
$\mu_0(F_N) > \varepsilon$ for all N. As each $F_N$ lies in $\mathcal{B}_0$, we have $F_N = \pi_{B_N}^{-1}(G_N)$
for some finite sets $B_N \subset A$ and some $\mathcal{B}_{B_N}$-measurable sets $G_N$. By
enlarging each $B_N$ as necessary we may assume that the $B_N$ are
increasing in N. The decreasing nature of the $F_N$ then gives the
inclusions
$$G_{N+1} \subset \pi_{B_N \leftarrow B_{N+1}}^{-1}(G_N).$$
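By inner regularity, one can find a compact subset $K_N$ of each $G_N$
such that
$$\mu_{B_N}(K_N) \ge \mu_{B_N}(G_N) - \varepsilon/2^{N+1}.$$
If we then set
$$K'_N := \bigcap_{N'=1}^N \pi_{B_{N'} \leftarrow B_N}^{-1}(K_{N'})$$
then we see that each $K'_N$ is compact and (since $G_N \setminus K'_N$ is covered by the
pullbacks of the sets $G_{N'} \setminus K_{N'}$ for $N' \le N$, whose measures are preserved by (2.6))
$$\mu_{B_N}(K'_N) \ge \mu_{B_N}(G_N) - \sum_{N'=1}^N \varepsilon/2^{N'+1} \ge \varepsilon - \varepsilon/2 = \varepsilon/2.$$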
In particular, the sets $K'_N$ are non-empty. By construction, we also
have the inclusions
$$K'_{N+1} \subset \pi_{B_N \leftarrow B_{N+1}}^{-1}(K'_N)$$
and thus the sets $H_N := \pi_{B_N}^{-1}(K'_N)$ are decreasing in N. On the other
hand, since these sets are contained in $F_N$, we have $\bigcap_{N=1}^\infty H_N = \emptyset$.
By the axiom of choice, we can select an element $x_N \in H_N$ from
$H_N$ for each N. Observe that for any $N_0$, $\pi_{B_{N_0}}(x_N)$ will lie
in the compact set $K'_{N_0}$ whenever $N \ge N_0$. Applying the Heine-Borel
theorem repeatedly, we may thus find a subsequence $x_{N_{1,m}}$ of
the $x_N$ for m = 1, 2, . . . such that $\pi_{B_1}(x_{N_{1,m}})$ converges; then we
can find a further subsequence $x_{N_{2,m}}$ of that subsequence such that
$\pi_{B_2}(x_{N_{2,m}})$ converges, and more generally obtain nested subsequences $x_{N_{j,m}}$ for
m = 1, 2, . . . and j = 1, 2, . . . such that for each j = 1, 2, . . ., the
sequence $m \mapsto \pi_{B_j}(x_{N_{j,m}})$ converges.
Now we use the diagonalisation trick. Consider the sequence
$x_{N_{m,m}} =: (y_{m,\alpha})_{\alpha \in A}$ for m = 1, 2, . . .. By construction, we see that for each j,
$\pi_{B_j}(x_{N_{m,m}})$ converges to a limit as m → ∞. This implies that for
each $\alpha \in \bigcup_{j=1}^\infty B_j$, $y_{m,\alpha}$ converges to a limit $y_\alpha$ as m → ∞. As $K'_j$ is
closed, we see that $(y_\alpha)_{\alpha \in B_j} \in K'_j$ for each j. If we then extend $y_\alpha$
arbitrarily from $\alpha \in \bigcup_{j=1}^\infty B_j$ to α ∈ A, then the point $y := (y_\alpha)_{\alpha \in A}$
lies in $H_j$ for each j. But this contradicts the fact that $\bigcap_{N=1}^\infty H_N = \emptyset$.
This contradiction completes the proof that $\mu_0$ is a pre-measure.
If we then let µ be the Hahn-Kolmogorov extension of $\mu_0$, one easily
verifies that µ obeys all the required properties, and the uniqueness
follows from Exercise 1.7.7.
The Kolmogorov extension theorem is a fundamental tool in the
foundations of probability theory, as it allows one to construct a probability
space to hold a variety of random processes $(X_t)_{t \in T}$, both in
the discrete case (when the set of times T is something like the integers
$\mathbf{Z}$) and in the continuous case (when the set of times T is
something like $\mathbf{R}$). In particular, it can be used to rigorously construct
a process for Brownian motion, known as the Wiener process.
We will however not focus on this topic, which can be found in many
graduate probability texts. But we will give one common special case
of the Kolmogorov extension theorem, which is to construct product
probability measures:
Theorem 2.4.4 (Existence of product measures). Let $A$ be an arbitrary set. For each $\alpha \in A$, let $(X_\alpha, \mathcal{B}_\alpha, \mu_\alpha)$ be a probability space in which $X_\alpha$ is a locally compact, σ-compact metric space, with $\mathcal{B}_\alpha$ being its Borel σ-algebra (i.e. the σ-algebra generated by the open sets). Then there exists a unique probability measure $\mu_A = \prod_{\alpha \in A} \mu_\alpha$ on $(X_A, \mathcal{B}_A) := (\prod_{\alpha \in A} X_\alpha, \prod_{\alpha \in A} \mathcal{B}_\alpha)$ with the property that
$$\mu_A\Big(\prod_{\alpha \in A} E_\alpha\Big) = \prod_{\alpha \in A} \mu_\alpha(E_\alpha)$$
whenever $E_\alpha \in \mathcal{B}_\alpha$ for each $\alpha \in A$, and one has $E_\alpha = X_\alpha$ for all but finitely many of the $\alpha$.
Proof. We apply the Kolmogorov extension theorem to the finite product measures $\mu_B := \prod_{\alpha \in B} \mu_\alpha$ for finite $B \subset A$, which can be constructed using the machinery in Section 1.7.4. These are Borel probability measures on a locally compact, σ-compact space and are thus inner regular (see §1.10 of An epsilon of room, Vol. I). The compatibility condition (2.6) can be verified from the uniqueness properties of finite product measures.

Remark 2.4.5. This result can also be obtained from the Riesz representation theorem, which is covered in §1.10 of An epsilon of room, Vol. I.
Example 2.4.6 (Bernoulli cube). Let $A := \mathbf{N}$, and for each $\alpha \in A$, let $(X_\alpha, \mathcal{B}_\alpha, \mu_\alpha)$ be the two-element set $X_\alpha = \{0, 1\}$ with the discrete metric (and thus discrete σ-algebra) and the uniform probability measure $\mu_\alpha$. Then Theorem 2.4.4 gives a probability measure $\mu$ on the infinite discrete cube $X_A := \{0, 1\}^{\mathbf{N}}$, known as the (uniform) Bernoulli measure on this cube. The coordinate functions $\pi_\alpha : X_A \to \{0, 1\}$ can then be interpreted as a countable sequence of random variables taking values in $\{0, 1\}$. From the properties of product measure one can easily check that these random variables are uniformly distributed on $\{0, 1\}$ and are jointly independent$^2$. Informally, Bernoulli measure allows one to model an infinite number of "coin flips". One can replace the natural numbers here by any other index set, and have a similar construction.
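The defining property of the product measure in Theorem 2.4.4 can be checked by hand for the Bernoulli cube on sets that constrain only finitely many coordinates. The sketch below is an illustration under stated assumptions: coordinates are indexed from 0 rather than 1, and the function names `cylinder_measure` and `brute_force_measure` are hypothetical. The first function applies the product formula; the second counts points of the finite cube $\{0,1\}^n$ directly, which is the finite product measure on the first $n$ coordinates.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(constraints):
    """Bernoulli measure of {x : x[alpha] in E_alpha for each constrained alpha}.

    `constraints` maps a coordinate alpha to a subset E_alpha of {0, 1};
    unconstrained coordinates have E_alpha = {0, 1} and contribute a factor 1.
    """
    measure = Fraction(1)
    for allowed in constraints.values():
        measure *= Fraction(len(set(allowed) & {0, 1}), 2)
    return measure


def brute_force_measure(constraints, n):
    """Same quantity, computed by counting points of {0,1}^n.

    Assumes every constrained coordinate is smaller than n.
    """
    hits = sum(
        all(point[alpha] in allowed for alpha, allowed in constraints.items())
        for point in product((0, 1), repeat=n)
    )
    return Fraction(hits, 2 ** n)


# The event "coordinate 0 is heads, coordinate 3 is tails, coordinate 5 is anything".
event = {0: {1}, 3: {0}, 5: {0, 1}}
print(cylinder_measure(event), brute_force_measure(event, 6))
```

The agreement of the two computations for every finite set of constrained coordinates is exactly the joint independence and uniform distribution of the coordinate functions described in the example.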
Example 2.4.7 (Continuous cube). We repeat the previous example, but replace $\{0, 1\}$ with the unit interval $[0, 1]$ (with the usual metric, the Borel σ-algebra, and the uniform probability measure). This gives a probability measure on the infinite continuous cube $[0, 1]^{\mathbf{N}}$, and the coordinate functions $\pi_\alpha : X_A \to [0, 1]$ can now be interpreted as jointly independent random variables, each having the uniform distribution on $[0, 1]$.
Example 2.4.8 (Independent gaussians). We repeat the previous example, but now replace $[0, 1]$ with $\mathbf{R}$ (with the usual metric, and the Borel σ-algebra), and the normal probability distribution $d\mu_\alpha = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx$ (thus $\mu_\alpha(E) = \int_E \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx$ for every Borel set $E$). This gives a probability space that supports a countable sequence of jointly independent gaussian random variables $\pi_\alpha$.
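For the gaussian case, the measure of a product of intervals that constrains finitely many coordinates can be evaluated numerically. The following sketch assumes each $E_\alpha$ is an interval $(a, b)$ (possibly unbounded) and uses the standard identity expressing the normal distribution function through the error function; the names `gaussian_measure` and `box_measure` are hypothetical, and the results are floating-point approximations rather than exact values.

```python
import math


def normal_cdf(t):
    """Distribution function of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_measure(a, b):
    """Approximate mu_alpha((a, b)) for the standard normal distribution.

    The endpoints may be -math.inf or math.inf.
    """
    return normal_cdf(b) - normal_cdf(a)


def box_measure(intervals):
    """Product-measure of a box constraining finitely many coordinates.

    `intervals` is a list of (a, b) pairs, one per constrained coordinate;
    all other coordinates are unconstrained and contribute a factor 1.
    """
    measure = 1.0
    for a, b in intervals:
        measure *= gaussian_measure(a, b)
    return measure


# First coordinate positive, second negative, third within one standard deviation.
print(box_measure([(0.0, math.inf), (-math.inf, 0.0), (-1.0, 1.0)]))
```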
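$^2$ A family of random variables $(Y_\alpha)_{\alpha \in A}$ is said to be jointly independent if one has $\mathbf{P}(\bigwedge_{\alpha \in B} Y_\alpha \in E_\alpha) = \prod_{\alpha \in B} \mathbf{P}(Y_\alpha \in E_\alpha)$ for every finite subset $B$ of $A$ and every collection $E_\alpha$ of measurable sets in the range of $Y_\alpha$.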
Bibliography
[De1901] M. Dehn, Über den Rauminhalt, Mathematische Annalen 55 (1901), no. 3, 465–478.
[deG1981] M. de Guzmán, Real variable methods in Fourier analysis. North-Holland Mathematics Studies, 46. Notas de Matemática, 75. North-Holland Publishing Co., Amsterdam-New York, 1981.
[Go1938] K. Gödel, Consistency of the axiom of choice and of the generalized continuum-hypothesis with the axioms of set theory, Proc. Nat. Acad. Sci. 24 (1938), 556–557.
[Me2003] A. Melas, The best constant for the centered Hardy-Littlewood maximal inequality, Ann. of Math. 157 (2003), no. 2, 647–688.
[So1970] R. Solovay, A model of set-theory in which every set of reals is Lebesgue measurable, Annals of Mathematics 92 (1970), 1–56.
[StSk2005] E. Stein, R. Shakarchi, Real analysis. Measure theory, integration, and Hilbert spaces. Princeton Lectures in Analysis, III. Princeton University Press, Princeton, NJ, 2005.
[StSt1983] E. Stein, J.-O. Strömberg, Behavior of maximal functions in $\mathbf{R}^n$ for large $n$, Ark. Mat. 21 (1983), no. 2, 259–269.
[Ta2008] T. Tao, Structure and Randomness: pages from year one of a mathematical blog, American Mathematical Society, Providence RI, 2008.
[Ta2009] T. Tao, Poincaré's Legacies: pages from year two of a mathematical blog, Vol. I, American Mathematical Society, Providence RI, 2009.
[Ta2010] T. Tao, An epsilon of room, Vol. I, American Mathematical Society, Providence RI, 2010.
[Vi1908] G. Vitali, Sui gruppi di punti e sulle funzioni di variabili reali, Atti dell'Accademia delle Scienze di Torino 43 (1908), 75–92.
Index
$F_\sigma$ set, 40, 87
$G_\delta$ set, 40, 87
$L^p$ norm, 119, 127
σ-algebra, 85
σ-compact, 85
σ-finite, 85, 197
a priori estimate, 212
absolute continuity, 172
absolute integrability, 54, 68, 104
absolutely convergent integral, 104
almost disjoint, 28
almost dominated convergence, 112
almost everywhere, 53
almost everywhere diﬀerentiability,
132
almost sure convergence, 116
almost uniform convergence, 116
approximation by an algebra, 95
approximation to the identity, 155,
208
area interpretation of integral, 207
area interpretation of Lebesgue
integral, 66
area interpretation of Riemann
integral, 17
atomic algebra, 82
axiom of choice, xv
axiom of countable choice, xvi
ball, 11
Banach-Tarski paradox, 3
basic jump function, 160
Bernoulli random variables, 241
Besicovitch covering lemma, 152
Bolyai-Gerwien theorem, 13
Boolean algebra, 10, 80
Borel σ-algebra, 87
Borel hierarchy, 88
Borel measure, 153
Borel-Cantelli lemma, 109
bounded variation, 165
bounded variation differentiation theorem, 167
box, 5
bullet-riddled square, 13
Cantor function, 170
Cantor set, 35, 61
Cantor's theorem, 25
Carathéodory measurability, 40, 181
category of measure spaces, 96
change of variables (linear), 41, 66
change of variables (measure), 103
change of variables (series), xiii
classical derivative, 132
closed dyadic cube, 29
coarsening, 80
compactness, 25
complete measure, 94
completion (measure), 94
continuity of translation, 138
continuous diﬀerentiability, 132
continuous probability distribution,
234
continuously diﬀerentiable curve,
43
convergence in $L^1$ norm, 116
convergence in $L^\infty$ norm, 116
convergence in distribution, 131
convergence in mean, 116
convergence in measure, 116
convergence in probability, 116
convolution, 140
countable additivity, 36, 92
countable subadditivity, 22, 93
Cousin’s theorem, 153
create an epsilon of room, 23, 25,
210
cumulative distribution function,
131
Darboux integrability, 16
Darboux integral, 16
de Morgan’s laws, 33
defect version of Fatou’s lemma,
113, 130
density argument, 138
Devil’s staircase, 170
diameter, 24
diﬀerentiability, 132
Dini derivative, 156
Dirac measure, 90
directional diﬀerentiability, 226
discrete algebra, 81
discrete probability distribution,
234
discretisation, 7
distance ($L^1$), 69
dominated convergence theorem,
111
dominated convergence theorem
(sets), 38, 93
domination, 126
dot product, xi
downward monotone convergence
(sets), 38
downwards monotone convergence,
93
Dyadic algebra, 83
dyadic cube, 12
dyadic maximal inequality, 152
dyadic mesh, 29
dyadic nesting property, 30
Egorov’s theorem, 75, 97, 123
elementary algebra, 81
elementary measure, 6
elementary set, 5
escape to horizontal inﬁnity, 76,
106, 118
escape to vertical inﬁnity, 106, 118,
126
escape to width inﬁnity, 106, 118,
126
essential upper bound, 119
essentially uniform convergence,
116
Euclidean space, xi
event space, 233
existence of non-measurable sets, 44
extended real, xi
exterior measure, 180
fast convergence, 124
Fatou’s lemma, 110
ﬁnite additivity, 8, 10, 23, 65, 90,
91, 99, 101
ﬁnite subadditivity, 10, 22, 91
ﬁrst fundamental theorem of
calculus, 135
ﬁrst uncountable ordinal, 88
Fubini’s theorem, 205
Fubini-Tonelli theorem, 207
gauge function, 153
generation of algebras, 84, 86
good kernel, 155
gradient, 227
graphs, 10
greedy algorithm, 149
Hahn-Kolmogorov extension, 188
Hahn-Kolmogorov theorem, 186
Hardy-Littlewood maximal inequality, 142, 146, 148, 157
heat kernel, 155
Heaviside function, 193
height (step function), 121
Heine-Borel theorem, 25
Henstock-Kurzweil integral, 178
Hilbert’s third problem, 13
homogeneity, 100
homogeneity (integral), 64, 99
homomorphism, 141
horizontal truncation, 64, 101
inclusion-exclusion principle, 91, 99
indeterminate forms, xii
indicator function, xi
inﬁnite series (absolutely
summable), 47
inﬁnite series (unsigned), xiii, 47
inner regularity, 39, 237
integration by parts, 169, 194, 230
interval, 5, 189
Jordan algebra, 81
Jordan inner measure, 9
Jordan measurability, 9
Jordan null set, 12
Jordan outer measure, 9, 18
jump function, 160
Kolmogorov extension theorem,
238
Lebesgue algebra, 82
Lebesgue decomposition, 162
Lebesgue diﬀerentiation theorem,
136, 137, 147
Lebesgue exterior measure, 20
Lebesgue inner measure, 40
Lebesgue integral (absolutely
integrable), 68
Lebesgue integral (unsigned), 65
Lebesgue measurability, 20
Lebesgue measurability (complex
functions), 62
Lebesgue measurability (unsigned
functions), 57
Lebesgue outer measure, 19
Lebesgue philosophy, 57
Lebesgue point, 147
Lebesgue-Stieltjes measure, 189
length, xi
length (intervals), 5
linearity (integral), 15, 16, 54, 55,
70, 98
Lipschitz continuity, 226
Lipschitz diﬀerentiation theorem,
167
Littlewood’s ﬁrst principle, 20, 34,
40, 72
Littlewood’s second principle, 72,
77
Littlewood’s third principle, 72, 75
Littlewood-like principles, 78
local integrability, 147
locally uniform convergence, 74
lower Darboux integral, 15
lower unsigned Lebesgue integral,
63
Lusin’s theorem, 77
Markov’s inequality, 67, 100
mean value theorem, 134
measurability (function), 95
measurability (set), 80
measurable map, 96
measurable morphism, 96
measure, 36, 92
measure space, 92
metric completion, 42
metric entropy, 12
monotone class lemma, 200
monotone convergence theorem,
107, 130
monotone convergence theorem
(sets), 38, 93
monotone diﬀerentiation theorem,
156
monotonicity (integral), 15, 16, 54,
64, 97, 99, 100
monotonicity (measure), 8, 10, 21,
91
moving bump example, 76
moving bump function, 106
noise tolerance, 56
non-atomic algebra, 83
non-negativity (measure), 10
norm (partition), 14
Notation, x
null algebra, 82
null set, 32, 94
outer measure, 20, 22, 180, 186
outer regularity, 30
partial derivative, 226
piecewise constant function, 15
piecewise constant integral, 15
pointwise almost everywhere
convergence, 74, 116
pointwise convergence, 73, 115
pointwise convergence (sets), 38
Poisson kernel, 155
polytope, 11
premeasure, 185
probability, 233
probability density, 234
probability measure, 232
probability space, 232
problem of measure, 2
product σ-algebra, 195, 236
product measure, 197, 240
product space, 235
pullback (σ-algebra), 195
pushforward, 103, 237
Radamacher diﬀerentiation
theorem, 228
Radon measure, 192
recursive description of a σ-algebra, 88
recursive description of Boolean
algebra, 84
reﬁnement, 80
reﬂection, 16, 213
restriction (Boolean algebra), 82
restriction (measure), 101
Riemann integrability, 14
Riemann integral, 14
Riemann sum, 14
Riemann-Stieltjes integral, 194
rising sun inequality, 146
rising sun lemma, 143
Rolle’s theorem, 132
sample space, 233
second fundamental theorem of
calculus, 134, 168, 169, 173,
176
seminorm, 69
simple function, 50
simple integral, 51, 97, 98
Solovay’s theorem, 43
space-filling curve, 43
Steinhaus theorem, 140, 154
step function, 72
strong derivative, 132
subnull set, 94
subadditivity (integral), 64
sums of measures, 92
superadditivity, 100
superadditivity (integral), 64
support, 53
symmetric diﬀerence, 5
tagged partition, 14
tail support, 121
Tonelli’s theorem, 201, 203, 204
Tonelli’s theorem (series), xiii, xv
Tonelli’s theorem (sums and
integrals), 109
total diﬀerentiability, 227
total variation, 165
translation (of a set in Euclidean
space), 5
translation invariance, 8, 10, 41, 66
triangle inequality, 71
trivial algebra, 81
typewriter sequence, 118
uniform continuity, 172
uniform convergence, 74, 115
uniform integrability, 126
uniformly almost everywhere
convergence, 116
uniqueness of antiderivative, 134
uniqueness of Jordan measure, 12
uniqueness of Lebesgue measure, 42
uniqueness of the Lebesgue
integral, 66
uniqueness of the Riemann
integral, 17
uniqueness of the unsigned integral,
113
unsigned integral, 100
upper Darboux integral, 16
upper unsigned Lebesgue integral,
63
upward monotone convergence
(sets), 37
upwards monotone convergence, 93
vertical truncation, 64, 101
Vitali-type covering lemma, 149
volume (box), 5
weak derivative, 229
Weierstrass function, 156
width (step function), 121
zero measure, 90
Preface

In the fall of 2010, I taught an introductory one-quarter course on graduate real analysis, focusing in particular on the basics of measure and integration theory, both in Euclidean spaces and in abstract measure spaces. This text is based on my lecture notes of that course, which are also available online on my blog terrytao.wordpress.com, together with some supplementary material, such as a section on problem solving strategies in real analysis (Section 2.1) which evolved from discussions with my students.

This text is intended to form a prequel to my graduate text [Ta2010] (henceforth referred to as An epsilon of room, Vol. I), which is an introduction to the analysis of Hilbert and Banach spaces (such as $L^p$ and Sobolev spaces), point-set topology, and related topics such as Fourier analysis and the theory of distributions; together, they serve as a text for a complete first-year graduate course in real analysis.

The approach to measure theory here is inspired by the text [StSk2005], which was used as a secondary text in my course. In particular, the first half of the course is devoted almost exclusively to measure theory on Euclidean spaces $\mathbf{R}^d$ (starting with the more elementary Jordan-Riemann-Darboux theory, and only then moving on to the more sophisticated Lebesgue theory), deferring the abstract aspects of measure theory to the second half of the course. I found
that this approach strengthened the student's intuition in the early stages of the course, and helped provide motivation for more abstract constructions, such as Carathéodory's general construction of a measure from an outer measure.

Most of the material here is self-contained, assuming only an undergraduate knowledge in real analysis (and in particular, on the Heine-Borel theorem, which we will use as the foundation for our construction of Lebesgue measure); a secondary real analysis text can be used in conjunction with this one, but it is not strictly necessary. A small number of exercises however will require some knowledge of point-set topology or of set-theoretic concepts such as cardinals and ordinals.

A large number of exercises are interspersed throughout the text, and it is intended that the reader perform a significant fraction of these exercises while going through the text. Indeed, many of the key results and examples in the subject will in fact be presented through the exercises. In my own course, I used the exercises as the basis for the examination questions, and signalled this well in advance, to encourage the students to attempt as many of the exercises as they could as preparation for the exams.

The core material is contained in Chapter 1, and already comprises a full quarter's worth of material. Section 2.1 is a much more informal section than the rest of the book, focusing on describing problem solving strategies, either specific to real analysis exercises, or more generally applicable to a wider set of mathematical problems; this section evolved from various discussions with students throughout the course. The remaining three sections in Chapter 2 are optional topics, which require understanding of most of the material in Chapter 1 as a prerequisite (although Section 2.3 can be read after completing Section 1.4).

Notation

For reasons of space, we will not be able to define every single mathematical term that we use in this book. If a term is italicised for reasons other than emphasis or for definition, then it denotes a standard mathematical object, result, or concept, which can be easily
. +∞ · 0 = 0 · +∞ = 0. we caution that the laws of . multiplication. . +∞) := {x ∈ R : x ≥ 0} with an additional element adjointed to it. . . . or other online reference pages. . + xd yd . . and order continue to hold in this extended number system. .g. +∞) to [0. + x2 )1/2 1 d and two vectors (x1 . xd ) : x1 . xd ) := (x2 + . Of course. multiplication. +∞). Most of the laws of algebra for addition. .Notation xi looked up in any number of references. . . xd ). However.) Given a subset E of a space X. . A vector (x1 . xd ∈ R} as (ddimensional) Euclidean space. and an order relation x ≤ y is preserved under addition or multiplication of both sides of that relation by the same quantity. . . . +∞] by declaring +∞ + x = x + +∞ = +∞ for all x ∈ [0. yd ) have dot product (x1 . . . and x < +∞ for all x ∈ [0. we refer to the vector space Rd := {(x1 . with the latter distributing over the former. the indicator function 1E : X → R is deﬁned by setting 1E (x) equal to 1 for x ∈ E and equal to 0 for x ∈ E. . . . . and order structures on [0. we will need to work with this system because many sets (e. . . . . (y1 . +∞ · x = x · +∞ = +∞ for all nonzero x ∈ (0. . +∞ is not a real number. . +∞] is the nonnegative real axis [0. For any natural number d. which we label +∞. . . . +∞]. . for instance addition and multiplication are commutative and associative. . xd ) · (y1 . We extend the addition. Rd ) will have inﬁnite measure. The extended nonnegative real axis [0. xd ) in Rd has length (x1 . . (In the blog version of the book. but we think of it as an extended real number. . . . yd ) := x1 y1 + . many of these terms were linked to their Wikipedia pages. +∞].
cancellation do not apply once some of the variables are allowed to be infinite; for instance, we cannot deduce $x = y$ from $+\infty + x = +\infty + y$ or from $+\infty \cdot x = +\infty \cdot y$. This is related to the fact that the forms $+\infty - +\infty$ and $+\infty / +\infty$ are indeterminate (one cannot assign a value to them without breaking a lot of the rules of algebra). A general rule of thumb is that if one wishes to use cancellation (or proxies for cancellation, such as subtraction or division), this is only safe if one can guarantee that all quantities involved are finite (and in the case of multiplicative cancellation, the quantity being cancelled also needs to be non-zero, of course). However, as long as one avoids using cancellation and works exclusively with non-negative quantities, there is little danger in working in the extended real number system.

We note also that once one adopts the convention $+\infty \cdot 0 = 0 \cdot +\infty = 0$, then multiplication becomes upward continuous (in the sense that whenever $x_n \in [0, +\infty]$ increases to $x \in [0, +\infty]$, and $y_n \in [0, +\infty]$ increases to $y \in [0, +\infty]$, then $x_n y_n$ increases to $xy$) but not downward continuous (e.g. $1/n \to 0$ but $1/n \cdot +\infty \not\to 0 \cdot +\infty$). This asymmetry will ultimately cause us to define integration from below rather than from above, which leads to other asymmetries (e.g. the monotone convergence theorem (Theorem 1.4.44) applies for monotone increasing functions, but not necessarily for monotone decreasing ones).

Remark 0.0.1. Note that there is a tradeoff here: if one wants to keep as many useful laws of algebra as one can, then one can add in infinity, or have negative numbers, but it is difficult to have both at the same time. Because of this tradeoff, we will see two overlapping types of measure and integration theory: the non-negative theory, which involves quantities taking values in $[0, +\infty]$, and the absolutely integrable theory, which involves quantities taking values in $(-\infty, +\infty)$ or $\mathbf{C}$. For instance, the fundamental convergence theorem for the former theory is the monotone convergence theorem (Theorem 1.4.44), while the fundamental convergence theorem for the latter is the dominated convergence theorem (Theorem 1.4.49). Both branches of the theory are important, and both will be covered in later notes.

One important feature of the extended non-negative real axis is that all sums are convergent: given any sequence $x_1, x_2, \ldots \in [0, +\infty]$,
we can always form the sum
$$\sum_{n=1}^\infty x_n \in [0, +\infty]$$
as the limit of the partial sums $\sum_{n=1}^N x_n$, which may be either finite or infinite. An equivalent definition of this infinite sum is as the supremum of all finite subsums:
$$\sum_{n=1}^\infty x_n = \sup_{F \subset \mathbf{N},\ F \hbox{ finite}} \sum_{n \in F} x_n.$$
Motivated by this, given any collection $(x_\alpha)_{\alpha \in A}$ of numbers $x_\alpha \in [0, +\infty]$ indexed by an arbitrary set $A$ (finite or infinite, countable or uncountable), we can define the sum $\sum_{\alpha \in A} x_\alpha$ by the formula
$$\sum_{\alpha \in A} x_\alpha = \sup_{F \subset A,\ F \hbox{ finite}} \sum_{\alpha \in F} x_\alpha. \qquad (0.1)$$
Note from this definition that one can relabel the collection in an arbitrary fashion without affecting the sum; more precisely, given any bijection $\phi : B \to A$, one has the change of variables formula
$$\sum_{\alpha \in A} x_\alpha = \sum_{\beta \in B} x_{\phi(\beta)}. \qquad (0.2)$$
Note that when dealing with signed sums, the above rearrangement identity can fail when the series is not absolutely convergent (cf. the Riemann rearrangement theorem).

Exercise 0.0.1. If $(x_\alpha)_{\alpha \in A}$ is a collection of numbers $x_\alpha \in [0, +\infty]$ such that $\sum_{\alpha \in A} x_\alpha < \infty$, show that $x_\alpha = 0$ for all but at most countably many $\alpha \in A$, even if $A$ itself is uncountable.

We will rely frequently on the following basic fact (a special case of the Fubini-Tonelli theorem, Corollary 1.7.23):

Theorem 0.0.2 (Tonelli's theorem for series). Let $(x_{n,m})_{n,m \in \mathbf{N}}$ be a doubly infinite sequence of extended non-negative reals $x_{n,m} \in [0, +\infty]$. Then
$$\sum_{(n,m) \in \mathbf{N}^2} x_{n,m} = \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m} = \sum_{m=1}^\infty \sum_{n=1}^\infty x_{n,m}.$$
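As a numerical sanity check on the statement of Theorem 0.0.2, the sketch below compares the two orders of summation on two arrays. The first, `nonneg`, is an arbitrary non-negative array chosen for this illustration ($x_{n,m} = 2^{-(n+m)}$). The second, `signed`, is the array from the remark that follows the proof ($+1$ when $n = m$, $-1$ when $n = m + 1$, and $0$ otherwise). All function names are assumptions. For the signed array each row and column has only two non-zero entries, so truncating the inner sum one step beyond the outer range evaluates every inner series exactly; for the non-negative array the code only compares finite partial sums.

```python
from fractions import Fraction


def nonneg(n, m):
    """An illustrative non-negative array: x_{n,m} = 2^{-(n+m)}."""
    return Fraction(1, 2 ** (n + m))


def signed(n, m):
    """The signed array: +1 on the diagonal, -1 just below it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def rows_then_columns(x, outer, inner):
    """Sum over n = 1..outer of (sum over m = 1..inner of x(n, m))."""
    return sum(sum(x(n, m) for m in range(1, inner + 1)) for n in range(1, outer + 1))


def columns_then_rows(x, outer, inner):
    """Sum over m = 1..outer of (sum over n = 1..inner of x(n, m))."""
    return sum(sum(x(n, m) for n in range(1, inner + 1)) for m in range(1, outer + 1))


# Non-negative case: the two orders agree on every finite square.
print(rows_then_columns(nonneg, 10, 10) == columns_then_rows(nonneg, 10, 10))

# Signed case: the iterated sums differ, however far the outer sum is taken.
print(rows_then_columns(signed, 50, 51), columns_then_rows(signed, 50, 51))
```

The disagreement in the signed case does not depend on the cutoff 50: only the first row has a non-zero sum, while every column sums to zero.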
Informally, Tonelli's theorem asserts that we may rearrange infinite series with impunity as long as all summands are non-negative.

Proof. We shall just show the equality of the first and second expressions; the equality of the first and third is proven similarly. We first show that
$$\sum_{(n,m) \in \mathbf{N}^2} x_{n,m} \leq \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m}.$$
Let $F$ be any finite subset of $\mathbf{N}^2$. Then $F \subset \{1, \ldots, N\} \times \{1, \ldots, N\}$ for some finite $N$, and thus (by the non-negativity of the $x_{n,m}$)
$$\sum_{(n,m) \in F} x_{n,m} \leq \sum_{(n,m) \in \{1, \ldots, N\} \times \{1, \ldots, N\}} x_{n,m}.$$
The right-hand side can be rearranged as
$$\sum_{n=1}^N \sum_{m=1}^N x_{n,m},$$
which is clearly at most $\sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m}$ (again by non-negativity of $x_{n,m}$). This gives
$$\sum_{(n,m) \in F} x_{n,m} \leq \sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m}$$
for any finite subset $F$ of $\mathbf{N}^2$, and the claim then follows from (0.1).

It remains to show the reverse inequality
$$\sum_{n=1}^\infty \sum_{m=1}^\infty x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m}.$$
It suffices to show that
$$\sum_{n=1}^N \sum_{m=1}^\infty x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m}$$
for each finite $N$. Fix $N$. As each $\sum_{m=1}^\infty x_{n,m}$ is the limit of $\sum_{m=1}^M x_{n,m}$, the left-hand side is the limit of $\sum_{n=1}^N \sum_{m=1}^M x_{n,m}$ as $M \to \infty$. Thus it
suffices to show that
$$\sum_{n=1}^N \sum_{m=1}^M x_{n,m} \leq \sum_{(n,m) \in \mathbf{N}^2} x_{n,m}$$
for each finite $M$. But the left-hand side is $\sum_{(n,m) \in \{1, \ldots, N\} \times \{1, \ldots, M\}} x_{n,m}$, and the claim follows.

Remark 0.0.3. Note how important it was that the $x_{n,m}$ were non-negative in the above argument. In the signed case, one needs an additional assumption of absolute summability of $x_{n,m}$ on $\mathbf{N}^2$ before one is permitted to interchange sums; this is Fubini's theorem for series, which we will encounter later in this text. Without absolute summability or non-negativity hypotheses, the theorem can fail (consider for instance the case when $x_{n,m}$ equals $+1$ when $n = m$, $-1$ when $n = m + 1$, and $0$ otherwise).

Exercise 0.0.2 (Tonelli's theorem for series over arbitrary sets). Let $A, B$ be sets (possibly infinite or uncountable), and $(x_{n,m})_{n \in A, m \in B}$ be a doubly infinite sequence of extended non-negative reals $x_{n,m} \in [0, +\infty]$ indexed by $A$ and $B$. Show that
$$\sum_{(n,m) \in A \times B} x_{n,m} = \sum_{n \in A} \sum_{m \in B} x_{n,m} = \sum_{m \in B} \sum_{n \in A} x_{n,m}.$$
(Hint: although not strictly necessary, you may find it convenient to first establish the fact that if $\sum_{n \in A} x_n$ is finite, then $x_n$ is non-zero for at most countably many $n$.)

Next, we recall the axiom of choice, which we shall be assuming throughout the text:

Axiom 0.0.4 (Axiom of choice). Let $(E_\alpha)_{\alpha \in A}$ be a family of non-empty sets $E_\alpha$, indexed by an index set $A$. Then we can find a family $(x_\alpha)_{\alpha \in A}$ of elements $x_\alpha$ of $E_\alpha$, indexed by the same set $A$.

This axiom is trivial when $A$ is a singleton set, and from mathematical induction one can also prove it without difficulty when $A$ is finite. However, when $A$ is infinite, one cannot deduce this axiom from the other axioms of set theory, but must explicitly add it to the list of axioms. We isolate the countable case as a particularly useful
corollary (though one which is strictly weaker than the full axiom of choice):

Corollary 0.0.5 (Axiom of countable choice). Let $E_1, E_2, E_3, \ldots$ be a sequence of non-empty sets. Then one can find a sequence $x_1, x_2, \ldots$ such that $x_n \in E_n$ for all $n = 1, 2, 3, \ldots$.

Remark 0.0.6. The question of how much of real analysis still survives when one is not permitted to use the axiom of choice is a delicate one, involving a fair amount of logic and descriptive set theory to answer. We will not discuss these matters in this text. We will however note a theorem of Gödel [Go1938] that states that any statement that can be phrased in the first-order language of Peano arithmetic, and which is proven with the axiom of choice, can also be proven without the axiom of choice. So, roughly speaking, Gödel's theorem tells us that for any "finitary" application of real analysis (which includes most of the "practical" applications of the subject), it is safe to use the axiom of choice; it is only when asking questions about "infinitary" objects that are beyond the scope of Peano arithmetic that one can encounter statements that are provable using the axiom of choice, but are not provable without it.

Acknowledgments

This text was strongly influenced by the real analysis text of Stein and Shakarchi [StSk2005], which was used as a secondary text when teaching the course on which these notes were based. In particular, the strategy of focusing first on Lebesgue measure and Lebesgue integration, before moving onwards to abstract measure and integration theory, was directly inspired by the treatment in [StSk2005], and the material on differentiation theorems also closely follows that in [StSk2005]. On the other hand, our discussion here differs from that in [StSk2005] in other respects; for instance, a far greater emphasis is placed on Jordan measure and the Riemann integral as being an elementary precursor to Lebesgue measure and the Lebesgue integral.

I am greatly indebted to my students of the course on which this text was based, as well as many further commenters on my blog, including Marco Angulo, J. Balachandran, Farzin Barekat, Marek
Bernát, Lewis Bowen, Chris Breeden, Danny Calegari, Yu Cao, Chandrasekhar, David Chang, Nick Cook, Damek Davis, Eric Davis, Marton Eekes, Wenying Gan, Nick Gill, Tim Gowers, Ulrich Groh, Laurens Gunnarsen, Tobias Hagge, Xueping Huang, Bo Jacoby, Apoorva Khare, Shiping Liu, Colin McQuillan, David Milovich, Hossein Naderi, Brent Nelson, Constantin Niculescu, Mircea Petrache, Walt Pohl, Jim Ralston, David Roberts, Mark Schwarzmann, Vladimir Slepnev, David Speyer, Tim Sullivan, Jonathan Weinstein, Duke Zhang, Lei Zhang, Pavel Zorin, and several anonymous commenters, for providing corrections and useful commentary on the material here. These comments can be viewed online at
terrytao.wordpress.com/category/teaching/245a-real-analysis

The author is supported by a grant from the MacArthur Foundation, by NSF grant DMS-0649473, and by the NSF Waterman award.
Chapter 1. Measure theory
area.1. two. and three dimensions. moving around each component by a rigid motion (e. let us try to formalise some of the intuition for measure discussed earlier. two bodies that have exactly 1One can also pose the problem of measure on other domains than Euclidean space. Prologue: The problem of measure One of the most fundamental concepts in Euclidean geometry is that of the measure m(E) of a solid body E in one or more dimensions. . we refer to this measure as the length. Euclidean geometry became reinterpreted as the study of Cartesian products Rd of the real line R. or simply by postulating the existence of a measure m(E) that can be assigned to all solid bodies E. but we will focus on the Euclidean case here for simplicity. Using this analytic foundation rather than the classical geometrical one. viewing the measure of a macroscopic body as the sum of the measures of its microscopic components. or volume of E respectively. we will refer to this (somewhat vaguely deﬁned) problem of writing down the “correct” deﬁnition of measure as the problem of measure. Measure theory 1. One could also obtain lower and upper bounds on the measure of a body by computing the measure of some inscribed or circumscribed body. the measure of a body was often computed by partitioning that body into ﬁnitely many components. In one. such as a Riemannian manifold. a translation or rotation). it was no longer intuitively obvious how to deﬁne the measure m(E) of a general1 subset E of Rd . In the classical approach to geometry. With the advent of analytic geometry.2 1. however. and the product ∞ · 0 is indeterminate. The physical intuition of deﬁning the measure of a body E to be the sum of the measure of its component “atoms” runs into an immediate problem: a typical solid body would consist of an inﬁnite (and uncountable) number of points. To see why this problem exists at all. One can also justify the concept of measure on “physical” or “reductionistic” grounds. 
and which obeys a collection of geometrically reasonable axioms. To make matters worse. each of which has a measure of zero. and then reassembling those components to form a simpler body which presumably has the same area. Such arguments can be justiﬁed by an appeal to geometric intuition. this ancient idea goes all the way back to the work of Archimedes at least.g. and refer to any text on Riemannian geometry for a treatment of integration on manifolds.
the same number of points need not have the same measure. For instance, in one dimension, the intervals $A := [0,1]$ and $B := [0,2]$ are in one-to-one correspondence (using the bijection $x \mapsto 2x$ from $A$ to $B$), but of course $B$ is twice as long as $A$. So one can disassemble $A$ into an uncountable number of points and reassemble them to form a set of twice the length.

Of course, one can point to the infinite (and uncountable) number of components in this disassembly as being the cause of this breakdown of intuition, and restrict attention to just finite partitions. But one still runs into trouble here for a number of reasons, the most striking of which is the Banach-Tarski paradox, which shows that the unit ball $B := \{(x,y,z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 \le 1\}$ in three dimensions² can be disassembled into a finite number of pieces (in fact, just five pieces suffice), which can then be reassembled (after translating and rotating each of the pieces) to form two disjoint copies of the ball $B$.

² The paradox only works in three dimensions and higher, for reasons having to do with the group-theoretic property of amenability; see §2.2 of An epsilon of room, Vol. I for further discussion.

Here, the problem is that the pieces used in this decomposition are highly pathological in nature; among other things, their construction requires use of the axiom of choice. (This is in fact necessary; there are models of set theory without the axiom of choice in which the Banach-Tarski paradox does not occur, thanks to a famous theorem of Solovay [So1970].) Such pathological sets almost never come up in practical applications of mathematics. Because of this, the standard solution to the problem of measure has been to abandon the goal of measuring every subset $E$ of $\mathbf{R}^d$, and instead to settle for only measuring a certain subclass of "non-pathological" subsets of $\mathbf{R}^d$, which are then referred to as the measurable sets. The problem of measure then divides into several subproblems:

(i) What does it mean for a subset $E$ of $\mathbf{R}^d$ to be measurable?
(ii) If a set $E$ is measurable, how does one define its measure?
(iii) What nice properties or axioms does measure (or the concept of measurability) obey?
(iv) Are "ordinary" sets such as cubes, balls, polyhedra, etc. measurable?
(v) Does the measure of an "ordinary" set equal the "naive geometric measure" of such sets? (E.g. is the measure of an $a \times b$ rectangle equal to $ab$?)

These questions are somewhat open-ended in formulation, and there is no unique answer to them; in particular, one can expand the class of measurable sets at the expense of losing one or more nice properties of measure in the process (e.g. finite or countable additivity, translation invariance, or rotation invariance). However, there are two basic answers which, between them, suffice for most applications. The first is the concept of Jordan measure (or Jordan content) of a Jordan measurable set, which is a concept closely related to that of the Riemann integral (or Darboux integral). This concept is elementary enough to be systematically studied in an undergraduate analysis course, and suffices for measuring most of the "ordinary" sets (e.g. the area under the graph of a continuous function) in many branches of mathematics. However, when one turns to the type of sets that arise in analysis, and in particular those sets that arise as limits (in various senses) of other sets, it turns out that the Jordan concept of measurability is not quite adequate, and must be extended to the more general notion of Lebesgue measurability, with the corresponding notion of Lebesgue measure that extends Jordan measure. With the Lebesgue theory (which can be viewed as a completion of the Jordan-Darboux-Riemann theory), one keeps almost all of the desirable properties of Jordan measure, but with the crucial additional property that many features of the Lebesgue theory are preserved under limits (as exemplified in the fundamental convergence theorems of the Lebesgue theory, such as the monotone convergence theorem (Theorem 1.4.44) and the dominated convergence theorem (Theorem 1.4.49), which do not hold in the Jordan-Darboux-Riemann setting).
As such, they are particularly well suited³ for applications in analysis, where limits of functions or sets arise all the time.

³ There are other ways to extend Jordan measure and the Riemann integral, see for instance Exercise 1.6.53 or Section 1.7.3, but the Lebesgue approach handles limits and rearrangement better than the other alternatives, and so has become the standard approach in analysis; it is also particularly well suited for providing the rigorous foundations of probability theory, as discussed in Section 2.3.

In later sections, we will formally define Lebesgue measure and the Lebesgue integral, as well as the more general concept of an abstract measure space and the associated integration operation. In the rest of the current section, we will discuss the more elementary concepts of Jordan measure and the Riemann integral. This material will eventually be superseded by the more powerful theory to be treated in later sections, but it will serve as motivation for that later material, as well as providing some continuity with the treatment of measure and integration in undergraduate analysis courses.

1.1.1. Elementary measure. Before we discuss Jordan measure, we discuss the even simpler notion of elementary measure, which allows one to measure a very simple class of sets, namely the elementary sets (finite unions of boxes).

Definition 1.1.1 (Intervals, boxes, elementary sets). An interval is a subset of $\mathbf{R}$ of the form $[a,b] := \{x \in \mathbf{R} : a \le x \le b\}$, $[a,b) := \{x \in \mathbf{R} : a \le x < b\}$, $(a,b] := \{x \in \mathbf{R} : a < x \le b\}$, or $(a,b) := \{x \in \mathbf{R} : a < x < b\}$, where $a \le b$ are real numbers. We define the length⁴ $|I|$ of an interval $I = [a,b], [a,b), (a,b], (a,b)$ to be $|I| := b - a$. A box in $\mathbf{R}^d$ is a Cartesian product $B := I_1 \times \dots \times I_d$ of $d$ intervals $I_1, \dots, I_d$ (not necessarily of the same length), thus for instance an interval is a one-dimensional box. The volume $|B|$ of such a box $B$ is defined as $|B| := |I_1| \times \dots \times |I_d|$. An elementary set is any subset of $\mathbf{R}^d$ which is the union of a finite number of boxes.

⁴ Note we allow degenerate intervals of zero length.

Exercise 1.1.1 (Boolean closure). Show that if $E, F \subset \mathbf{R}^d$ are elementary sets, then the union $E \cup F$, the intersection $E \cap F$, the set theoretic difference $E \backslash F := \{x \in E : x \notin F\}$, and the symmetric difference $E \Delta F := (E \backslash F) \cup (F \backslash E)$ are also elementary. If $x \in \mathbf{R}^d$, show that the translate $E + x := \{y + x : y \in E\}$ is also an elementary set.
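The Boolean closure asserted in Exercise 1.1.1 can be made concrete in one dimension. The following is a minimal sketch, not a proof: it assumes every elementary set is given as a finite list of half-open intervals $[a,b)$ with $a < b$ (a restricted subclass of the intervals in Definition 1.1.1, chosen because it is closed under the operations without tracking endpoints), and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [a, b); an assumption of this sketch


def normalize(intervals: List[Interval]) -> List[Interval]:
    """Rewrite a finite union of half-open intervals as a sorted list of disjoint ones."""
    merged: List[Interval] = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if merged and a <= merged[-1][1]:
            # Overlapping or touching: extend the last interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def union(E: List[Interval], F: List[Interval]) -> List[Interval]:
    return normalize(E + F)


def intersection(E: List[Interval], F: List[Interval]) -> List[Interval]:
    out = []
    for a, b in normalize(E):
        for c, d in normalize(F):
            lo, hi = max(a, c), min(b, d)
            if lo < hi:
                out.append((lo, hi))
    return normalize(out)


def difference(E: List[Interval], F: List[Interval]) -> List[Interval]:
    """E \\ F: cut the pieces of F out of each interval of E."""
    out = []
    for a, b in normalize(E):
        cur = a  # left end of the part of [a, b) not yet removed
        for c, d in normalize(F):
            if d <= cur or c >= b:
                continue
            if c > cur:
                out.append((cur, c))
            cur = max(cur, d)
        if cur < b:
            out.append((cur, b))
    return normalize(out)


def symmetric_difference(E: List[Interval], F: List[Interval]) -> List[Interval]:
    return union(difference(E, F), difference(F, E))


def translate(E: List[Interval], x: int) -> List[Interval]:
    return [(a + x, b + x) for a, b in normalize(E)]
```

Each function returns another finite list of intervals, which is the one-dimensional content of the exercise; the higher-dimensional case needs boxes rather than intervals and is not covered by this sketch.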
We now give each elementary set a measure.

Lemma 1.1.2 (Measure of an elementary set). Let $E \subset \mathbf{R}^d$ be an elementary set.
(i) $E$ can be expressed as the finite union of disjoint boxes.
(ii) If $E$ is partitioned as the finite union $B_1 \cup \dots \cup B_k$ of disjoint boxes, then the quantity $m(E) := |B_1| + \dots + |B_k|$ is independent of the partition. In other words, given any other partition $B'_1 \cup \dots \cup B'_{k'}$ of $E$, one has $|B_1| + \dots + |B_k| = |B'_1| + \dots + |B'_{k'}|$.

We refer to $m(E)$ as the elementary measure of $E$. (We occasionally write $m(E)$ as $m^d(E)$ to emphasise the $d$-dimensional nature of the measure.) Thus, for example, the elementary measure of $(1,2) \cup [3,6]$ is $4$.

Proof. We first prove (i) in the one-dimensional case $d = 1$. Given any finite collection of intervals $I_1, \dots, I_k$, one can place the $2k$ endpoints of these intervals in increasing order (discarding repetitions). Looking at the open intervals between these endpoints, together with the endpoints themselves (viewed as intervals of length zero), we see that there exists a finite collection of disjoint intervals $J_1, \dots, J_{k'}$ such that each of the $I_1, \dots, I_k$ are a union of some subcollection of the $J_1, \dots, J_{k'}$. This already gives (i) when $d = 1$. To prove the higher dimensional case, we express $E$ as the union $B_1, \dots, B_k$ of boxes $B_i = I_{i,1} \times \dots \times I_{i,d}$. For each $j = 1, \dots, d$, we use the one-dimensional argument to express $I_{1,j}, \dots, I_{k,j}$ as the union of subcollections of a collection $J_{1,j}, \dots, J_{k'_j,j}$ of disjoint intervals. Taking Cartesian products, we can express the $B_1, \dots, B_k$ as finite unions of boxes $J_{i_1,1} \times \dots \times J_{i_d,d}$, where $1 \le i_j \le k'_j$ for all $1 \le j \le d$. Such boxes are all disjoint, and the claim follows.

To prove (ii) we use a discretisation argument. Observe (exercise!) that for any interval $I$, the length of $I$ can be recovered by the limiting formula
$$|I| = \lim_{N \to \infty} \frac{1}{N} \#\left(I \cap \frac{1}{N}\mathbf{Z}\right)$$
(1. 0 ≤ x ≤ 1} a measure of zero.2(ii) by showing that any two partitions of E into boxes admit a mutual reﬁnement into boxes that arise from taking Cartesian products of elements from ﬁnite collections of disjoint intervals. Firstly. it is clear that m(E) is a nonnegative real number for every elementary set E. we see that 1 1 #(B ∩ Zd ) B = lim d N →∞ N N for any box B. so this perspective. namely that the continuous concept of measure can be viewed5 as a limit of the discrete concept of (normalised) cardinality. we obtain the claim (ii). and in particular that 1 1 #(E ∩ Zd ). if d = 1 and E := Q∩[0.1) m(E) := lim N →∞ N d N since this worked well for elementary sets. .1. From the deﬁnitions. Prologue: The problem of measure 7 1 n where N Z := { N : n ∈ Z} and #A denotes the cardinality of a ﬁnite set A. Taking Cartesian products. Exercise 1. one already needs to develop a large part of measure theory. It also makes precise an important intuition. this deﬁnition is not particularly satisfactory for a number of reasons. this concept does not obey reasonable properties such as translation invariance. then this deﬁnition would √ give √ E a measure of 1.3.2.1) will be valid for all Jordan measurable sets (see Exercise 1. For instance. Give an alternate proof of Lemma 1. . the formula (1.1.1. One might be tempted to now deﬁne the measure m(E) of an arbitrary set E ⊂ Rd by the formula 1 1 #(E ∩ Zd ). 1] := {x ∈ Q : 0 ≤ x ≤ 1}. Even when the limit does exist.13). but would give the translate E + 2 := {x + 2 : x ∈ Q. although in order to rigorously introduce the probability theory needed to set up Monte Carlo integration properly. while intuitive. + Bk  = lim d N →∞ N N Denoting the righthand side as m(E). B1  + . .1. and that m(E ∪ F ) = m(E) + m(F ) 5Another way to obtain continuous measure as the limit of discrete measure is via Monte Carlo integration.1. However. one can concoct examples in which the limit does not exist (Exercise!). 
is not suitable for foundational purposes. Nevertheless.1. Remark 1.
whenever $E$ and $F$ are disjoint elementary sets. We refer to the latter property as finite additivity; by induction it also implies that $m(E_1 \cup \dots \cup E_k) = m(E_1) + \dots + m(E_k)$ whenever $E_1, \dots, E_k$ are disjoint elementary sets. We also have the obvious degenerate case $m(\emptyset) = 0$. Finally, elementary measure clearly extends the notion of volume, in the sense that $m(B) = |B|$ for all boxes $B$.

From non-negativity and finite additivity (and Exercise 1.1.1) we conclude the monotonicity property $m(E) \le m(F)$ whenever $E \subset F$ are nested elementary sets. From this and finite additivity (and Exercise 1.1.1) we easily obtain the finite subadditivity property $m(E \cup F) \le m(E) + m(F)$ whenever $E, F$ are elementary sets (not necessarily disjoint); by induction one then has $m(E_1 \cup \dots \cup E_k) \le m(E_1) + \dots + m(E_k)$ whenever $E_1, \dots, E_k$ are elementary sets (not necessarily disjoint).

It is also clear from the definition that we have the translation invariance $m(E + x) = m(E)$ for all elementary sets $E$ and $x \in \mathbf{R}^d$.

These properties in fact define elementary measure up to normalisation:

Exercise 1.1.3 (Uniqueness of elementary measure). Let $d \ge 1$. Let $m' : \mathcal{E}(\mathbf{R}^d) \to \mathbf{R}^+$ be a map from the collection $\mathcal{E}(\mathbf{R}^d)$ of elementary subsets of $\mathbf{R}^d$ to the non-negative reals that obeys the non-negativity, finite additivity, and translation invariance properties. Show that there exists a constant $c \in \mathbf{R}^+$ such that $m'(E) = c\,m(E)$ for all
elementary sets $E$. In particular, if we impose the additional normalisation $m'([0,1)^d) = 1$, then $m' \equiv m$. (Hint: Set $c := m'([0,1)^d)$, and then compute $m'([0,\frac{1}{n})^d)$ for any positive integer $n$.)

Exercise 1.1.4. Let $d_1, d_2 \ge 1$, and let $E_1 \subset \mathbf{R}^{d_1}$, $E_2 \subset \mathbf{R}^{d_2}$ be elementary sets. Show that $E_1 \times E_2 \subset \mathbf{R}^{d_1+d_2}$ is elementary, and $m^{d_1+d_2}(E_1 \times E_2) = m^{d_1}(E_1) \times m^{d_2}(E_2)$.

1.1.2. Jordan measure. We now have a satisfactory notion of measure for elementary sets. But of course, the elementary sets are a very restrictive class of sets, far too small for most applications. For instance, a solid triangle or disk in the plane will not be elementary, or even a rotated box. On the other hand, as essentially observed long ago by Archimedes, such sets $E$ can be approximated from within and without by elementary sets $A \subset E \subset B$, and the inscribing elementary set $A$ and the circumscribing elementary set $B$ can be used to give lower and upper bounds on the putative measure of $E$. As one makes the approximating sets $A, B$ increasingly fine, one can hope that these two bounds eventually match. This gives rise to the following definitions.

Definition 1.1.4 (Jordan measure). Let $E \subset \mathbf{R}^d$ be a bounded set.
• The Jordan inner measure $m_{*,(J)}(E)$ of $E$ is defined as
$$m_{*,(J)}(E) := \sup_{A \subset E,\ A \text{ elementary}} m(A).$$
• The Jordan outer measure $m^{*,(J)}(E)$ of $E$ is defined as
$$m^{*,(J)}(E) := \inf_{B \supset E,\ B \text{ elementary}} m(B).$$
• If $m_{*,(J)}(E) = m^{*,(J)}(E)$, then we say that $E$ is Jordan measurable, and call $m(E) := m_{*,(J)}(E) = m^{*,(J)}(E)$ the Jordan measure of $E$. As before, we write $m(E)$ as $m^d(E)$ when we wish to emphasise the dimension $d$.

By convention, we do not consider unbounded sets to be Jordan measurable (they will be deemed to have infinite Jordan outer measure).

Jordan measurable sets are those sets which are "almost elementary" with respect to Jordan outer measure. More precisely, we have
Exercise 1.1.5 (Characterisation of Jordan measurability). Let $E \subset \mathbf{R}^d$ be bounded. Show that the following are equivalent:
(1) $E$ is Jordan measurable.
(2) For every $\varepsilon > 0$, there exist elementary sets $A \subset E \subset B$ such that $m(B \backslash A) \le \varepsilon$.
(3) For every $\varepsilon > 0$, there exists an elementary set $A$ such that $m^{*,(J)}(A \Delta E) \le \varepsilon$.

As one corollary of this exercise, we see that every elementary set $E$ is Jordan measurable, and that Jordan measure and elementary measure coincide for such sets; this justifies the use of $m(E)$ to denote both. In particular, we still have $m(\emptyset) = 0$.

Jordan measurability also inherits many of the properties of elementary measure:

Exercise 1.1.6. Let $E, F \subset \mathbf{R}^d$ be Jordan measurable sets.
(1) (Boolean closure) Show that $E \cup F$, $E \cap F$, $E \backslash F$, and $E \Delta F$ are Jordan measurable.
(2) (Non-negativity) $m(E) \ge 0$.
(3) (Finite additivity) If $E, F$ are disjoint, then $m(E \cup F) = m(E) + m(F)$.
(4) (Monotonicity) If $E \subset F$, then $m(E) \le m(F)$.
(5) (Finite subadditivity) $m(E \cup F) \le m(E) + m(F)$.
(6) (Translation invariance) For any $x \in \mathbf{R}^d$, $E + x$ is Jordan measurable, and $m(E + x) = m(E)$.

Now we give some examples of Jordan measurable sets:

Exercise 1.1.7 (Regions under graphs are Jordan measurable). Let $B$ be a closed box in $\mathbf{R}^d$, and let $f : B \to \mathbf{R}$ be a continuous function.
(1) Show that the graph $\{(x, f(x)) : x \in B\} \subset \mathbf{R}^{d+1}$ is Jordan measurable in $\mathbf{R}^{d+1}$ with Jordan measure zero. (Hint: on a compact metric space, continuous functions are uniformly continuous.)
(2) Show that the set $\{(x,t) : x \in B; 0 \le t \le f(x)\} \subset \mathbf{R}^{d+1}$ is Jordan measurable.
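Criterion (2) of Exercise 1.1.5 can be watched in action for the region under a graph in Exercise 1.1.7(2). The sketch below is an illustration under simplifying assumptions, not a solution: it takes $d = 1$, assumes $f$ is non-negative and non-decreasing on $[a,b]$ (so the minimum and maximum on each subinterval sit at its endpoints), assumes integer endpoints so that exact rational arithmetic applies, and the function name `inner_outer_area` is hypothetical.

```python
from fractions import Fraction


def inner_outer_area(f, a: int, b: int, n: int):
    """Elementary measures m(A), m(B) of unions of n rectangles with A ⊂ E ⊂ B,
    where E = {(x, t) : a <= x <= b, 0 <= t <= f(x)}.

    Assumption of this sketch: f is non-negative and non-decreasing on [a, b],
    so on [x_i, x_{i+1}] the minimum of f is f(x_i) and the maximum is f(x_{i+1}).
    """
    h = Fraction(b - a, n)                      # common width of the rectangles
    xs = [a + i * h for i in range(n + 1)]      # partition points x_0 < ... < x_n
    inner = sum(f(xs[i]) * h for i in range(n))        # inscribed rectangles
    outer = sum(f(xs[i + 1]) * h for i in range(n))    # circumscribed rectangles
    return inner, outer


# Example: f(x) = x^2 on [0, 1]; the gap m(B) - m(A) telescopes to (f(b) - f(a)) * h.
inner, outer = inner_outer_area(lambda x: x * x, 0, 1, 100)
gap = outer - inner
```

Since $A \subset B$, the gap equals $m(B \backslash A)$, and it can be made smaller than any $\varepsilon > 0$ by increasing `n`. For a general continuous $f$ the endpoint values would have to be replaced by the infimum and supremum on each subinterval, which is where uniform continuity enters.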
Exercise 1.1.8. Let $A, B, C$ be three points in $\mathbf{R}^2$.
(1) Show that the solid triangle with vertices $A, B, C$ is Jordan measurable.
(2) Show that the Jordan measure of the solid triangle is equal to $\frac{1}{2}|(B - A) \wedge (C - A)|$, where $|(a,b) \wedge (c,d)| := |ad - bc|$.
(Hint: It may help to first do the case when one of the edges, say $AB$, is horizontal.)

Exercise 1.1.9. Show that every compact convex polytope⁶ in $\mathbf{R}^d$ is Jordan measurable.

⁶ A closed convex polytope is a subset of $\mathbf{R}^d$ formed by intersecting together finitely many closed half-spaces of the form $\{x \in \mathbf{R}^d : x \cdot v \le c\}$, where $v \in \mathbf{R}^d$, $c \in \mathbf{R}$, and $\cdot$ denotes the usual dot product on $\mathbf{R}^d$. A compact convex polytope is a closed convex polytope which is also bounded.

Exercise 1.1.10.
(1) Show that all open and closed Euclidean balls $B(x,r) := \{y \in \mathbf{R}^d : |y - x| < r\}$, $\overline{B(x,r)} := \{y \in \mathbf{R}^d : |y - x| \le r\}$ in $\mathbf{R}^d$ are Jordan measurable, with Jordan measure $c_d r^d$ for some constant $c_d > 0$ depending only on $d$.
(2) Establish the crude bounds
$$\left(\frac{2}{\sqrt{d}}\right)^d \le c_d \le 2^d.$$
(An exact formula for $c_d$ is $c_d = \frac{1}{d}\omega_d$, where $\omega_d := \frac{2\pi^{d/2}}{\Gamma(d/2)}$ is the volume of the unit sphere $S^{d-1} \subset \mathbf{R}^d$ and $\Gamma$ is the Gamma function, but we will not derive this formula here.)

Exercise 1.1.11. This exercise assumes familiarity with linear algebra. Let $L : \mathbf{R}^d \to \mathbf{R}^d$ be a linear transformation.
(1) Show that there exists a non-negative real number $D$ such that $m(L(E)) = D\,m(E)$ for every elementary set $E$ (note from previous exercises that $L(E)$ is Jordan measurable). (Hint: apply Exercise 1.1.3 to the map $E \mapsto m(L(E))$.)
(2) Show that if $E$ is Jordan measurable, then $L(E)$ is also, and $m(L(E)) = D\,m(E)$.
(3) Show that $D = |\det L|$. (Hint: Work first with the case when $L$ is an elementary transformation, using Gaussian elimination. Alternatively, work with the cases when $L$ is a diagonal transformation or an orthogonal transformation, using the unit ball in the latter case, and use the polar decomposition.)

Exercise 1.1.12. Define a Jordan null set to be a Jordan measurable set of Jordan measure zero. Show that any subset of a Jordan null set is a Jordan null set.

Exercise 1.1.13. Show that (1.1) holds for all Jordan measurable $E \subset \mathbf{R}^d$.

Exercise 1.1.14 (Metric entropy formulation of Jordan measurability). Define a dyadic cube to be a half-open box of the form
$$\left[\frac{i_1}{2^n}, \frac{i_1+1}{2^n}\right) \times \dots \times \left[\frac{i_d}{2^n}, \frac{i_d+1}{2^n}\right)$$
for some integers $n, i_1, \dots, i_d$. Let $E \subset \mathbf{R}^d$ be a bounded set. For each integer $n$, let $\mathcal{E}_*(E, 2^{-n})$ denote the number of dyadic cubes of sidelength $2^{-n}$ that are contained in $E$, and let $\mathcal{E}^*(E, 2^{-n})$ be the number of dyadic cubes⁷ of sidelength $2^{-n}$ that intersect $E$. Show that $E$ is Jordan measurable if and only if
$$\lim_{n \to \infty} 2^{-dn}\left(\mathcal{E}^*(E, 2^{-n}) - \mathcal{E}_*(E, 2^{-n})\right) = 0,$$
in which case one has
$$m(E) = \lim_{n \to \infty} 2^{-dn}\mathcal{E}_*(E, 2^{-n}) = \lim_{n \to \infty} 2^{-dn}\mathcal{E}^*(E, 2^{-n}).$$

⁷ This quantity could be called the (dyadic) metric entropy of $E$ at scale $2^{-n}$.

Exercise 1.1.15 (Uniqueness of Jordan measure). Let $d \ge 1$. Let $m' : \mathcal{J}(\mathbf{R}^d) \to \mathbf{R}^+$ be a map from the collection $\mathcal{J}(\mathbf{R}^d)$ of Jordan-measurable subsets of $\mathbf{R}^d$ to the non-negative reals that obeys the non-negativity, finite additivity, and translation invariance properties. Show that there exists a constant $c \in \mathbf{R}^+$ such that $m'(E) = c\,m(E)$ for all Jordan measurable sets $E$. In particular, if we impose the additional normalisation $m'([0,1)^d) = 1$, then $m' \equiv m$.
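The dyadic counting in Exercise 1.1.14 is easy to experiment with for the closed unit disk in $\mathbf{R}^2$. The sketch below is a slight variant of the exercise rather than a literal implementation: to avoid fussing over which faces of a half-open cube are included, it counts cubes whose closure lies in the disk (a lower bound for $\mathcal{E}_*$) and cubes whose closure meets the disk (an upper bound for $\mathcal{E}^*$). These still sandwich the Jordan measure. The function name `dyadic_counts` is hypothetical, and everything is done in integer arithmetic after rescaling by $2^n$.

```python
import math


def dyadic_counts(n: int):
    """Count dyadic squares of sidelength 2^-n relative to the closed unit disk.

    Returns (inner, outer): the number of squares whose closure is contained in
    the disk, and the number whose closure meets the disk. Coordinates are scaled
    by N = 2^n, so the square with index (i, j) is [i, i+1] x [j, j+1] and the
    disk has radius N.
    """
    N = 2 ** n
    inner = outer = 0
    for i in range(-N - 1, N + 1):
        for j in range(-N - 1, N + 1):
            # Squared distance from the origin to the farthest corner.
            far = max(i * i, (i + 1) ** 2) + max(j * j, (j + 1) ** 2)
            # Squared distance from the origin to the nearest point of the square.
            near_x = 0 if i in (-1, 0) else min(i * i, (i + 1) ** 2)
            near_y = 0 if j in (-1, 0) else min(j * j, (j + 1) ** 2)
            if far <= N * N:
                inner += 1
            if near_x + near_y <= N * N:
                outer += 1
    return inner, outer


def area_bounds(n: int):
    """Lower and upper bounds 2^{-2n} * count for the area of the unit disk."""
    inner, outer = dyadic_counts(n)
    return inner / 4 ** n, outer / 4 ** n
```

Calling `area_bounds(n)` for increasing `n` gives bounds that close in on $\pi$ from both sides, in line with the exercise's claim that the normalised gap tends to zero for a Jordan measurable set.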
Exercise 1.1.16. Let $d_1, d_2 \ge 1$, and let $E_1 \subset \mathbf{R}^{d_1}$, $E_2 \subset \mathbf{R}^{d_2}$ be Jordan measurable sets. Show that $E_1 \times E_2 \subset \mathbf{R}^{d_1+d_2}$ is Jordan measurable, and $m^{d_1+d_2}(E_1 \times E_2) = m^{d_1}(E_1) \times m^{d_2}(E_2)$.

Exercise 1.1.17. Let $P, Q$ be two polytopes in $\mathbf{R}^d$. Suppose that $P$ can be partitioned into finitely many sub-polytopes which, after being rotated and translated, form a cover of $Q$, with any two of the sub-polytopes in $Q$ intersecting only at their boundaries. Conclude that $P$ and $Q$ have the same Jordan measure. The converse statement is true in one and two dimensions $d = 1, 2$ (this is the Bolyai-Gerwien theorem), but false in higher dimensions (this was Dehn's negative answer [De1901] to Hilbert's third problem).

The above exercises give a fairly large class of Jordan measurable sets. However, not every subset of $\mathbf{R}^d$ is Jordan measurable. First of all, the unbounded sets are not Jordan measurable, by construction. But there are also bounded sets that are not Jordan measurable:

Exercise 1.1.18. Let $E \subset \mathbf{R}^d$ be a bounded set.
(1) Show that $E$ and the closure $\overline{E}$ of $E$ have the same Jordan outer measure.
(2) Show that $E$ and the interior $E^\circ$ of $E$ have the same Jordan inner measure.
(3) Show that $E$ is Jordan measurable if and only if the topological boundary $\partial E$ of $E$ has Jordan outer measure zero.
(4) Show that the bullet-riddled square $[0,1]^2 \backslash \mathbf{Q}^2$, and set of bullets $[0,1]^2 \cap \mathbf{Q}^2$, both have Jordan inner measure zero and Jordan outer measure one. In particular, both sets are not Jordan measurable.

Informally, any set with a lot of "holes", or a very "fractal" boundary, is unlikely to be Jordan measurable. In order to measure such sets we will need to develop Lebesgue measure, which is done in the next set of notes.

Exercise 1.1.19 (Carathéodory type property). Let $E \subset \mathbf{R}^d$ be a bounded set, and $F \subset \mathbf{R}^d$ be an elementary set. Show that $m^{*,(J)}(E) = m^{*,(J)}(E \cap F) + m^{*,(J)}(E \backslash F)$.
1.1.3. Connection with the Riemann integral. To conclude this section, we briefly discuss the relationship between Jordan measure and the Riemann integral (or the equivalent Darboux integral). For simplicity we will only discuss the classical one-dimensional Riemann integral on an interval $[a,b]$, though one can extend the Riemann theory without much difficulty to higher-dimensional integrals on Jordan measurable sets. (In later sections, this Riemann integral will be superseded by the Lebesgue integral.)

Definition 1.1.5 (Riemann integrability). Let $[a,b]$ be an interval of positive length, and $f : [a,b] \to \mathbf{R}$ be a function. A tagged partition $\mathcal{P} = ((x_0, x_1, \dots, x_n), (x_1^*, \dots, x_n^*))$ of $[a,b]$ is a finite sequence of real numbers $a = x_0 < x_1 < \dots < x_n = b$, together with additional numbers $x_{i-1} \le x_i^* \le x_i$ for each $i = 1, \dots, n$. We abbreviate $x_i - x_{i-1}$ as $\delta x_i$. The quantity $\Delta(\mathcal{P}) := \sup_{1 \le i \le n} \delta x_i$ will be called the norm of the tagged partition. The Riemann sum $\mathcal{R}(f, \mathcal{P})$ of $f$ with respect to the tagged partition $\mathcal{P}$ is defined as
$$\mathcal{R}(f, \mathcal{P}) := \sum_{i=1}^n f(x_i^*)\,\delta x_i.$$
We say that $f$ is Riemann integrable on $[a,b]$ if there exists a real number, denoted $\int_a^b f(x)\,dx$ and referred to as the Riemann integral of $f$ on $[a,b]$, for which we have
$$\int_a^b f(x)\,dx = \lim_{\Delta(\mathcal{P}) \to 0} \mathcal{R}(f, \mathcal{P}),$$
by which we mean that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|\mathcal{R}(f, \mathcal{P}) - \int_a^b f(x)\,dx| \le \varepsilon$ for every tagged partition $\mathcal{P}$ with $\Delta(\mathcal{P}) \le \delta$.

If $[a,b]$ is an interval of zero length, we adopt the convention that every function $f : [a,b] \to \mathbf{R}$ is Riemann integrable, with a Riemann integral of zero.

Note that unbounded functions cannot be Riemann integrable (why?).

The above definition, while geometrically natural, can be awkward to use in practice. A more convenient formulation of the Riemann integral can be formulated using some additional machinery.
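The objects in Definition 1.1.5 translate directly into code. The following is a minimal sketch in floating-point arithmetic, with hypothetical function names (`riemann_sum`, `partition_norm`, `uniform_left_tagged`); it evaluates a single Riemann sum and does not, of course, verify the limit over all tagged partitions that integrability requires.

```python
from typing import Callable, List, Sequence, Tuple


def riemann_sum(f: Callable[[float], float],
                points: Sequence[float],
                tags: Sequence[float]) -> float:
    """R(f, P) = sum of f(x_i^*) * (x_i - x_{i-1}) for a tagged partition P.

    points: a = x_0 < x_1 < ... < x_n = b
    tags:   x_1^*, ..., x_n^* with x_{i-1} <= x_i^* <= x_i
    """
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for i, tag in enumerate(tags, start=1):
        lo, hi = points[i - 1], points[i]
        if not (lo < hi and lo <= tag <= hi):
            raise ValueError("not a valid tagged partition")
        total += f(tag) * (hi - lo)
    return total


def partition_norm(points: Sequence[float]) -> float:
    """Delta(P): the largest subinterval length."""
    return max(hi - lo for lo, hi in zip(points, points[1:]))


def uniform_left_tagged(a: float, b: float, n: int) -> Tuple[List[float], List[float]]:
    """One particular tagged partition: n equal subintervals, tagged at left endpoints."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    return points, points[:-1]


# Usage: a Riemann sum of f(x) = x^2 on [0, 1] with norm 1/1000.
pts, tags = uniform_left_tagged(0.0, 1.0, 1000)
approx = riemann_sum(lambda x: x * x, pts, tags)
```

For this $f$ the sum is close to $1/3$; the definition demands that every tagged partition of small norm gives a nearby value, whatever the choice of tags, which a single evaluation like this cannot establish.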
Exercise 1.1.20 (Piecewise constant functions). Let $[a,b]$ be an interval. A piecewise constant function $f : [a,b] \to \mathbf{R}$ is a function for which there exists a partition of $[a,b]$ into finitely many intervals $I_1, \dots, I_n$, such that $f$ is equal to a constant $c_i$ on each of the intervals $I_i$. If $f$ is piecewise constant, show that the expression
$$\sum_{i=1}^n c_i |I_i|$$
is independent of the choice of partition used to demonstrate the piecewise constant nature of $f$. We will denote this quantity by $\text{p.c.}\int_a^b f(x)\,dx$, and refer to it as the piecewise constant integral of $f$ on $[a,b]$.

Exercise 1.1.21 (Basic properties of the piecewise constant integral). Let $[a,b]$ be an interval, and let $f, g : [a,b] \to \mathbf{R}$ be piecewise constant functions. Establish the following statements:
(1) (Linearity) For any real number $c$, $cf$ and $f + g$ are piecewise constant, with $\text{p.c.}\int_a^b cf(x)\,dx = c\,\text{p.c.}\int_a^b f(x)\,dx$ and $\text{p.c.}\int_a^b f(x) + g(x)\,dx = \text{p.c.}\int_a^b f(x)\,dx + \text{p.c.}\int_a^b g(x)\,dx$.
(2) (Monotonicity) If $f \le g$ pointwise (i.e. $f(x) \le g(x)$ for all $x \in [a,b]$) then $\text{p.c.}\int_a^b f(x)\,dx \le \text{p.c.}\int_a^b g(x)\,dx$.
(3) (Indicator) If $E$ is an elementary subset of $[a,b]$, then the indicator function $1_E : [a,b] \to \mathbf{R}$ (defined by setting $1_E(x) := 1$ when $x \in E$ and $1_E(x) := 0$ otherwise) is piecewise constant, and $\text{p.c.}\int_a^b 1_E(x)\,dx = m(E)$.

Definition 1.1.6 (Darboux integral). Let $[a,b]$ be an interval, and $f : [a,b] \to \mathbf{R}$ be a bounded function. The lower Darboux integral $\underline{\int_a^b} f(x)\,dx$ of $f$ on $[a,b]$ is defined as
$$\underline{\int_a^b} f(x)\,dx := \sup_{g \le f;\ g \text{ piecewise constant}} \text{p.c.}\int_a^b g(x)\,dx,$$
where $g$ ranges over all piecewise constant functions that are pointwise bounded above by $f$. (The hypothesis that $f$ is bounded ensures that the supremum is over a non-empty set.) Similarly, we define the upper
Darboux integral $\overline{\int_a^b} f(x)\,dx$ of $f$ on $[a,b]$ by the formula
$$\overline{\int_a^b} f(x)\,dx := \inf_{h \ge f;\ h \text{ piecewise constant}} \text{p.c.}\int_a^b h(x)\,dx.$$
Clearly $\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx$. If these two quantities are equal, we say that $f$ is Darboux integrable, and refer to this quantity as the Darboux integral of $f$ on $[a,b]$.

Note that the upper and lower Darboux integrals are related by the reflection identity
$$\overline{\int_a^b} -f(x)\,dx = -\underline{\int_a^b} f(x)\,dx.$$

Exercise 1.1.22. Let $[a,b]$ be an interval, and $f : [a,b] \to \mathbf{R}$ be a bounded function. Show that $f$ is Riemann integrable if and only if it is Darboux integrable, in which case the Riemann integral and Darboux integrals are equal.

Exercise 1.1.23. Show that any continuous function $f : [a,b] \to \mathbf{R}$ is Riemann integrable. More generally, show that any bounded, piecewise continuous⁸ function $f : [a,b] \to \mathbf{R}$ is Riemann integrable.

⁸ A function $f : [a,b] \to \mathbf{R}$ is piecewise continuous if one can partition $[a,b]$ into finitely many intervals, such that $f$ is continuous on each interval.

Now we connect the Riemann integral to Jordan measure in two ways. First, we connect the Riemann integral to one-dimensional Jordan measure:

Exercise 1.1.24 (Basic properties of the Riemann integral). Let $[a,b]$ be an interval, and let $f, g : [a,b] \to \mathbf{R}$ be Riemann integrable. Establish the following statements:
(1) (Linearity) For any real number $c$, $cf$ and $f + g$ are Riemann integrable, with $\int_a^b cf(x)\,dx = c \cdot \int_a^b f(x)\,dx$ and $\int_a^b f(x) + g(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
(2) (Monotonicity) If $f \le g$ pointwise (i.e. $f(x) \le g(x)$ for all $x \in [a,b]$) then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
(3) (Indicator) If $E$ is a Jordan measurable subset of $[a,b]$, then the indicator function $1_E : [a,b] \to \mathbf{R}$ (defined by setting $1_E(x) := 1$ when $x \in E$ and $1_E(x) := 0$ otherwise) is Riemann integrable, and $\int_a^b 1_E(x)\,dx = m(E)$.
Finally, show that these properties uniquely define the Riemann integral, in the sense that the functional $f \mapsto \int_a^b f(x)\,dx$ is the only map from the space of Riemann integrable functions on $[a,b]$ to $\mathbf{R}$ which obeys all three of the above properties.

Next, we connect the integral to two-dimensional Jordan measure:

Exercise 1.1.25 (Area interpretation of the Riemann integral). Let $[a,b]$ be an interval, and let $f : [a,b] \to \mathbf{R}$ be a bounded function. Show that $f$ is Riemann integrable if and only if the sets $E_+ := \{(x,t) : x \in [a,b]; 0 \le t \le f(x)\}$ and $E_- := \{(x,t) : x \in [a,b]; f(x) \le t \le 0\}$ are both Jordan measurable in $\mathbf{R}^2$, in which case one has
$$\int_a^b f(x)\,dx = m^2(E_+) - m^2(E_-),$$
where $m^2$ denotes two-dimensional Jordan measure. (Hint: First establish this in the case when $f$ is non-negative.)

Exercise 1.1.26. Extend the definition of the Riemann and Darboux integrals to higher dimensions, in such a way that analogues of all the previous results hold.

1.2. Lebesgue measure

In Section 1.1, we recalled the classical theory of Jordan measure on Euclidean spaces $\mathbf{R}^d$. This theory proceeded in the following stages:
(i) First, one defined the notion of a box $B$ and its volume $|B|$.
(ii) Using this, one defined the notion of an elementary set $E$ (a finite union of boxes), and defined the elementary measure $m(E)$ of such sets.
(iii) From this, one defined the inner and Jordan outer measures $m_{*,(J)}(E)$, $m^{*,(J)}(E)$ of an arbitrary bounded set $E \subset \mathbf{R}^d$. If those measures match, we say that $E$ is Jordan measurable,
and call $m(E) = m_{*,(J)}(E) = m^{*,(J)}(E)$ the Jordan measure of $E$.

As long as one is lucky enough to only have to deal with Jordan measurable sets, the theory of Jordan measure works well enough. However, as noted previously, not all sets are Jordan measurable, even if one restricts attention to bounded sets. In fact, we shall see later in these notes that there even exist bounded open sets, or compact sets, which are not Jordan measurable, so the Jordan theory does not cover many classes of sets of interest. Another class that it fails to cover is countable unions or intersections of sets that are already known to be measurable:

Exercise 1.2.1. Show that the countable union $\bigcup_{n=1}^\infty E_n$ or countable intersection $\bigcap_{n=1}^\infty E_n$ of Jordan measurable sets $E_1, E_2, \dots \subset \mathbf{R}$ need not be Jordan measurable, even when bounded.

This creates problems with Riemann integrability (which, as we saw in Section 1.1, was closely related to Jordan measure) and pointwise limits:

Exercise 1.2.2. Give an example of a sequence of uniformly bounded, Riemann integrable functions $f_n : [0,1] \to \mathbf{R}$ for $n = 1, 2, \dots$ that converge pointwise to a bounded function $f : [0,1] \to \mathbf{R}$ that is not Riemann integrable. What happens if we replace pointwise convergence with uniform convergence?

These issues can be rectified by using a more powerful notion of measure than Jordan measure, namely Lebesgue measure. To define this measure, we first tinker with the notion of the Jordan outer measure
$$m^{*,(J)}(E) := \inf_{B \supset E;\ B \text{ elementary}} m(B)$$
of a set $E \subset \mathbf{R}^d$ (we adopt the convention that $m^{*,(J)}(E) = +\infty$ if $E$ is unbounded, thus $m^{*,(J)}$ now takes values in the extended non-negative reals $[0,+\infty]$, whose properties we will briefly review below). Observe from the finite additivity and subadditivity of elementary measure that we can also write the Jordan outer measure as
$$m^{*,(J)}(E) := \inf_{B_1 \cup \dots \cup B_k \supset E;\ B_1, \dots, B_k \text{ boxes}} |B_1| + \dots + |B_k|,$$
i.e. the Jordan outer measure is the infimal cost required to cover $E$ by a finite union of boxes. (The natural number $k$ is allowed to vary freely in the above infimum.) We now modify this by replacing the finite union of boxes by a countable union of boxes, leading to the Lebesgue outer measure⁹ $m^*(E)$ of $E$:
$$m^*(E) := \inf_{\bigcup_{n=1}^\infty B_n \supset E;\ B_1, B_2, \dots \text{ boxes}} \sum_{n=1}^\infty |B_n|,$$
thus the Lebesgue outer measure is the infimal cost required to cover $E$ by a countable union of boxes. Note that the countable sum $\sum_{n=1}^\infty |B_n|$ may be infinite, and so the Lebesgue outer measure $m^*(E)$ could well equal $+\infty$.

⁹ Lebesgue outer measure is also denoted $m_*(E)$ in some texts.

Clearly, we always have $m^*(E) \le m^{*,(J)}(E)$ (since we can always pad out a finite union of boxes into an infinite union by adding an infinite number of empty boxes). But $m^*(E)$ can be a lot smaller:

Example 1.2.1. Let $E = \{x_1, x_2, x_3, \dots\} \subset \mathbf{R}^d$ be a countable set. We know that the Jordan outer measure of $E$ can be quite large; for instance, in one dimension, $m^{*,(J)}(\mathbf{Q})$ is infinite, and $m^{*,(J)}(\mathbf{Q} \cap [-R,R]) = m^{*,(J)}([-R,R]) = 2R$ since $\mathbf{Q} \cap [-R,R]$ has $[-R,R]$ as its closure (see Exercise 1.1.18). On the other hand, all countable sets $E$ have Lebesgue outer measure zero. Indeed, one simply covers $E$ by the degenerate boxes $\{x_1\}, \{x_2\}, \dots$ of sidelength and volume zero. Alternatively, if one does not like degenerate boxes, one can cover each $x_n$ by a cube $B_n$ of sidelength $\varepsilon/2^n$ (say) for some arbitrary $\varepsilon > 0$, leading to a total cost of $\sum_{n=1}^\infty (\varepsilon/2^n)^d$, which converges to $C_d \varepsilon^d$ for some absolute constant $C_d$. As $\varepsilon$ can be arbitrarily small, we see that the Lebesgue outer measure must be zero. We will refer to this type of trick as the $\varepsilon/2^n$ trick; it will be used many further times in this text.

From this example we see in particular that a set may be unbounded while still having Lebesgue outer measure zero, in contrast to Jordan outer measure.
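The total cost in the $\varepsilon/2^n$ trick is a geometric series, and summing it gives the constant explicitly: $\sum_{n=1}^\infty (\varepsilon/2^n)^d = \varepsilon^d/(2^d - 1)$, so one may take $C_d = 1/(2^d - 1)$. The sketch below checks this with exact rational arithmetic; the function names `cover_cost` and `limit_cost` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def cover_cost(eps: Fraction, d: int, terms: int) -> Fraction:
    """Total volume of the first `terms` cubes, the n-th having sidelength eps / 2^n."""
    return sum(((eps / 2 ** n) ** d for n in range(1, terms + 1)), Fraction(0))


def limit_cost(eps: Fraction, d: int) -> Fraction:
    """Closed form of the full series: eps^d / (2^d - 1)."""
    return eps ** d / (2 ** d - 1)


# Usage: covering a countable subset of the line (d = 1) with eps = 1/10.
eps = Fraction(1, 10)
partial = cover_cost(eps, 1, 50)    # cost of the first 50 intervals
bound = limit_cost(eps, 1)          # equals eps itself when d = 1
```

Every partial sum stays below `bound`, and `bound` shrinks to zero with `eps`, which is exactly why a countable set has Lebesgue outer measure zero. No finite cover can achieve this for $\mathbf{Q} \cap [-R,R]$, which is the contrast with Jordan outer measure drawn in Example 1.2.1.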
As we shall see in Section 1.7, Lebesgue outer measure (also known as Lebesgue exterior measure) is a special case of a more general concept known as an outer measure.

In analogy with the Jordan theory, we would also like to define a concept of "Lebesgue inner measure" to complement that of outer measure. Here, there is an asymmetry (which ultimately arises from the fact that elementary measure is subadditive rather than superadditive): one does not gain any increase in power in the Jordan inner measure by replacing finite unions of boxes with countable ones. But one can get a sort of Lebesgue inner measure by taking complements, see Exercise 1.2.18. This leads to one possible definition for Lebesgue measurability, namely the Carathéodory criterion for Lebesgue measurability, see Exercise 1.2.17. However, this is not the most intuitive formulation of this concept to work with, and we will instead use a different (but logically equivalent) definition of Lebesgue measurability. The starting point is the observation (see Exercise 1.1.13) that Jordan measurable sets can be efficiently contained in elementary sets, with an error that has small Jordan outer measure. In a similar vein, we will define Lebesgue measurable sets to be sets that can be efficiently contained in open sets, with an error that has small Lebesgue outer measure:

Definition 1.2.2 (Lebesgue measurability). A set $E \subset \mathbf{R}^d$ is said to be Lebesgue measurable if, for every $\varepsilon > 0$, there exists an open set $U \subset \mathbf{R}^d$ containing $E$ such that $m^*(U \backslash E) \le \varepsilon$. If $E$ is Lebesgue measurable, we refer to $m(E) := m^*(E)$ as the Lebesgue measure of $E$ (note that this quantity may be equal to $+\infty$). We also write $m(E)$ as $m^d(E)$ when we wish to emphasise the dimension $d$.

Remark 1.2.3. The intuition that measurable sets are almost open is also known as Littlewood's first principle; this principle is a triviality with our current choice of definitions, though less so if one uses other, equivalent, definitions of Lebesgue measurability. See Section 1.3.5 for a further discussion of Littlewood's principles.

As we shall see later, Lebesgue measure extends Jordan measure, in the sense that every Jordan measurable set is Lebesgue measurable,
and the Lebesgue measure and Jordan measure of a Jordan measurable set are always equal. We will also see a few other equivalent descriptions of the concept of Lebesgue measurability.

In the notes below we will establish the basic properties of Lebesgue measure. Broadly speaking, this concept obeys all the intuitive properties one would ask of measure, so long as one restricts attention to countable operations rather than uncountable ones, and as long as one restricts attention to Lebesgue measurable sets. The latter is not a serious restriction in practice, as almost every set one actually encounters in analysis will be measurable (the main exceptions being some pathological sets that are constructed using the axiom of choice). In the next set of notes we will use Lebesgue measure to set up the Lebesgue integral, which extends the Riemann integral in the same way that Lebesgue measure extends Jordan measure; and the many pleasant properties of Lebesgue measure will be reflected in analogous pleasant properties of the Lebesgue integral (most notably the convergence theorems).

We will treat all dimensions $d = 1, 2, \dots$ equally here, but for the purposes of drawing pictures, we recommend to the reader that one sets $d$ equal to $2$. However, for this topic at least, no additional mathematical difficulties will be encountered in the higher-dimensional case (though of course there are significant visual difficulties once $d$ exceeds $3$).

1.2.1. Properties of Lebesgue outer measure. We begin by studying the Lebesgue outer measure $m^*$, which was defined earlier, and takes values in the extended non-negative real axis $[0,+\infty]$. We first record three easy properties of Lebesgue outer measure, which we will use repeatedly in the sequel without further comment:

Exercise 1.2.3 (The outer measure axioms).
(i) (Empty set) $m^*(\emptyset) = 0$.
(ii) (Monotonicity) If $E \subset F \subset \mathbf{R}^d$, then $m^*(E) \le m^*(F)$.
(iii) (Countable subadditivity) If $E_1, E_2, \dots \subset \mathbf{R}^d$ is a countable sequence of sets, then $m^*(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty m^*(E_n)$. (Hint: Use the axiom of countable choice, Tonelli's theorem
. ∪ Ek ) ≤ m∗ (E1 ) + . Jordan outer measure will not be an abstract outer measure (even after adopting the convention that unbounded sets have Jordan outer measure +∞): it obeys the empty set and monotonicity axioms. whereas the latter requires one to bound all elements. Establishing lower bounds will often be a bit trickier. .2. Rd has inﬁnite Lebesgue outer measure. and countable subadditivity.) Thus we already see a major beneﬁt of allowing countable unions of boxes in the deﬁnition of Lebesgue outer measure.) Remark 1. one cannot hope to upgrade countable subadditivity to uncountable subadditivity: Rd is an uncountable union of points. +∞] to arbitrary subsets E of a space X that obeys the above three axioms of the empty set. but (as we shall shortly see). (For instance. gives as a corollary the ﬁnite subadditivity property m∗ (E1 ∪ . Measure theory for series. each of which has Lebesgue outer measure zero. but is only ﬁnitely subadditive rather than countably subadditive. On the other hand (and somewhat confusingly). thus Lebesgue outer measure is a model example of an abstract outer measure. when dealing with a quantity that is deﬁned using an inﬁmum. each of which have a Jordan outer measure of zero. and the ε/2n trick used previously to show that countable sets had outer measure zero. Of course. . it is usually easier to obtain upper bounds on that quantity than lower bounds. Later on in this text. because the former requires one to bound just one element of the inﬁmum. the rationals Q have inﬁnite Jordan outer measure. .) Note that countable subadditivity. (More generally. when combined with the empty set axiom. we will deﬁne the concept of an outer measure on X. monotonicity. despite being the countable union of points. in contrast to the ﬁnite unions of boxes in the deﬁnition of Jordan outer measure. These subadditivity properties will be useful in establishing upper bounds on Lebesgue outer measure. . 
which is an assigment E → m∗ (E) of element of [0.4. + m∗ (Ek ) for any k ≥ 0. in that ﬁnite subadditivity is upgraded to countable subadditivity. when we study abstract measure theory on a general set X.22 1.
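The $\varepsilon/2^n$ trick mentioned in the hint to Exercise 1.2.3 can be made concrete with a small computation. The following is a minimal sketch, not part of the original notes: it covers the first 50 rationals of $[0,1]$ (in an enumeration chosen here purely for illustration) by intervals, the $n$-th of length $\varepsilon/2^n$, and sums the lengths exactly. The helper names `rationals_in_unit_interval` and `epsilon_cover` are hypothetical.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """Return the first `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == count:
                    break
        q += 1
    return out


def epsilon_cover(points, eps):
    """Cover the n-th point (n = 1, 2, ...) by an interval of length eps / 2**n."""
    cover = []
    for n, x in enumerate(points, start=1):
        half = eps / 2 ** (n + 1)  # half-width, so the full length is eps / 2**n
        cover.append((x - half, x + half))
    return cover


eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
cover = epsilon_cover(points, eps)

# Total length of the cover is a geometric sum, strictly below eps.
total_length = sum(b - a for a, b in cover)
print(total_length < eps)
```

Because the lengths form a geometric series, the total stays below $\varepsilon$ no matter how many points are covered, which is the mechanism behind countable subadditivity bounds of this kind.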
It is natural to ask whether Lebesgue outer measure has the finite additivity property, that is to say that $m^*(E \cup F) = m^*(E) + m^*(F)$ whenever $E, F \subset \mathbf{R}^d$ are disjoint. The answer to this question is somewhat subtle: as we shall see later, we have finite additivity (and even countable additivity) when all sets involved are Lebesgue measurable, but finite additivity (and hence also countable additivity) can break down in the non-measurable case. The difficulty here (which, incidentally, also appears in the theory of Jordan outer measure) is that if $E$ and $F$ are sufficiently "entangled" with each other, it is not always possible to take a countable cover of $E \cup F$ by boxes and split the total volume of that cover into separate covers of $E$ and $F$ without some duplication. However, we can at least recover finite additivity if the sets $E, F$ are separated by some positive distance:

Lemma 1.2.5 (Finite additivity for separated sets). Let $E, F \subset \mathbf{R}^d$ be such that $\mathrm{dist}(E, F) > 0$, where
$$\mathrm{dist}(E, F) := \inf\{ |x - y| : x \in E,\ y \in F \}$$
is the distance$^{10}$ between $E$ and $F$. Then $m^*(E \cup F) = m^*(E) + m^*(F)$.

Proof. From subadditivity one has $m^*(E \cup F) \le m^*(E) + m^*(F)$, so it suffices to prove the other direction $m^*(E) + m^*(F) \le m^*(E \cup F)$. This is trivial if $E \cup F$ has infinite Lebesgue outer measure, so we may assume that it has finite Lebesgue outer measure (and then the same is true for $E$ and $F$, by monotonicity).

We use the standard "give yourself an epsilon of room" trick (see Section 2.7 of An epsilon of room, Vol. I). Let $\varepsilon > 0$. By definition of Lebesgue outer measure, we can cover $E \cup F$ by a countable family $B_1, B_2, \ldots$ of boxes such that
$$\sum_{n=1}^\infty |B_n| \le m^*(E \cup F) + \varepsilon.$$
Suppose it was the case that each box intersected at most one of $E$ and $F$. Then we could divide this family into two subfamilies $B'_1, B'_2, \ldots$ and $B''_1, B''_2, B''_3, \ldots$, the first of which covered $E$, and the second of which covered $F$. From the definition of Lebesgue outer measure, we have
$$m^*(E) \le \sum_{n=1}^\infty |B'_n| \quad \text{and} \quad m^*(F) \le \sum_{n=1}^\infty |B''_n|;$$
summing, we obtain
$$m^*(E) + m^*(F) \le \sum_{n=1}^\infty |B_n|$$
and thus
$$m^*(E) + m^*(F) \le m^*(E \cup F) + \varepsilon.$$
Since $\varepsilon$ was arbitrary, this gives $m^*(E) + m^*(F) \le m^*(E \cup F)$ as required.

Of course, it is quite possible for some of the boxes $B_n$ to intersect both $E$ and $F$, particularly if the boxes are big, in which case the above argument does not work because that box would be double-counted. However, observe that given any $r > 0$, one can always partition a large box $B_n$ into a finite number of smaller boxes, each of which has diameter$^{11}$ at most $r$, with the total volume of these sub-boxes equal to the volume of the original box. Applying this observation to each of the boxes $B_n$, we see that given any $r > 0$, we may assume without loss of generality that the boxes $B_1, B_2, \ldots$ covering $E \cup F$ have diameter at most $r$. In particular, we may assume that all such boxes have diameter strictly less than $\mathrm{dist}(E, F)$. Once we do this, then it is no longer possible for any box to intersect both $E$ and $F$, and then the previous argument now applies.

In general, disjoint sets $E, F$ need not have a positive separation from each other (e.g. $E = [0, 1)$ and $F = [1, 2]$). But the situation improves when $E, F$ are closed, and at least one of $E, F$ is compact:

$^{10}$ Recall from the preface that we use the usual Euclidean metric $|(x_1, \ldots, x_d)| := \sqrt{x_1^2 + \ldots + x_d^2}$ on $\mathbf{R}^d$.

$^{11}$ The diameter of a set $B$ is defined as $\sup\{ |x - y| : x, y \in B \}$.
Exercise 1.2.4. Let $E, F \subset \mathbf{R}^d$ be disjoint closed sets, with at least one of $E, F$ being compact. Show that $\mathrm{dist}(E, F) > 0$. Give a counterexample to show that this claim fails when the compactness hypothesis is dropped.

We already know that countable sets have Lebesgue outer measure zero. Now we start computing the outer measure of some other sets. We begin with elementary sets:

Lemma 1.2.6 (Outer measure of elementary sets). Let $E$ be an elementary set. Then the Lebesgue outer measure $m^*(E)$ of $E$ is equal to the elementary measure $m(E)$ of $E$: $m^*(E) = m(E)$.

Remark 1.2.7. Since countable sets have zero outer measure, we note that we have managed to give a proof of Cantor's theorem that $\mathbf{R}^d$ is uncountable. Of course, much quicker proofs of this theorem are available. However, this observation shows that the proof of this lemma must somehow use some crucial fact about the real line which is not true for countable subfields of $\mathbf{R}$, such as the rationals $\mathbf{Q}$. In the proof we give here, the key fact about the real line we use is the Heine-Borel theorem, which ultimately exploits the important fact that the reals are complete. In the one-dimensional case $d = 1$, it is also possible to exploit the fact that the reals are connected as a substitute for completeness (note that proper subfields of the reals are neither connected nor complete).

Proof. We already know that $m^*(E) \le m^{*,(J)}(E) = m(E)$, so it suffices to show that $m(E) \le m^*(E)$.

We first establish this in the case when the elementary set $E$ is closed. As the elementary set $E$ is also bounded, this allows us to use the powerful Heine-Borel theorem, which asserts that every open cover of $E$ has a finite subcover (or in other words, $E$ is compact).

We again use the epsilon of room strategy. Let $\varepsilon > 0$ be arbitrary; then we can find a countable family $B_1, B_2, \ldots$ of boxes that cover $E$,
$$E \subset \bigcup_{n=1}^\infty B_n,$$
and such that
$$\sum_{n=1}^\infty |B_n| \le m^*(E) + \varepsilon.$$
We would like to use the Heine-Borel theorem, but the boxes $B_n$ need not be open. But this is not a serious problem, as one can spend another epsilon to enlarge the boxes to be open. More precisely, for each box $B_n$ one can find an open box $B'_n$ containing $B_n$ such that $|B'_n| \le |B_n| + \varepsilon/2^n$ (say). The $B'_n$ still cover $E$, and we have
$$\sum_{n=1}^\infty |B'_n| \le \sum_{n=1}^\infty (|B_n| + \varepsilon/2^n) = \Big(\sum_{n=1}^\infty |B_n|\Big) + \varepsilon \le m^*(E) + 2\varepsilon.$$
As the $B'_n$ are open, we may apply the Heine-Borel theorem and conclude that
$$E \subset \bigcup_{n=1}^N B'_n$$
for some finite $N$. Using the finite subadditivity of elementary measure, we conclude that
$$m(E) \le \sum_{n=1}^N |B'_n|$$
and thus
$$m(E) \le m^*(E) + 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.

Now we consider the case when the elementary set $E$ is not closed. Then we can write $E$ as the finite union $Q_1 \cup \ldots \cup Q_k$ of disjoint boxes, which need not be closed. But, similarly to before, we can use the epsilon of room strategy: for every $\varepsilon > 0$ and every $1 \le j \le k$, one can find a closed sub-box $Q'_j$ of $Q_j$ such that $|Q'_j| \ge |Q_j| - \varepsilon/k$ (say); then $E$ contains the finite union $Q'_1 \cup \ldots \cup Q'_k$ of disjoint closed boxes, which is a closed elementary set. By the previous discussion and the finite additivity of elementary measure, we have
$$m^*(Q'_1 \cup \ldots \cup Q'_k) = m(Q'_1 \cup \ldots \cup Q'_k) = m(Q'_1) + \ldots + m(Q'_k) \ge m(Q_1) + \ldots + m(Q_k) - \varepsilon = m(E) - \varepsilon.$$
Applying monotonicity of Lebesgue outer measure, we conclude that
$$m^*(E) \ge m(E) - \varepsilon$$
for every $\varepsilon > 0$. Since $\varepsilon > 0$ was arbitrary, the claim follows.

The above lemma allows us to compute the Lebesgue outer measure of a finite union of boxes. From this and monotonicity we conclude that the Lebesgue outer measure of any set is bounded below by its Jordan inner measure. As it is also bounded above by the Jordan outer measure, we have

(1.2) $$m_{*,(J)}(E) \le m^*(E) \le m^{*,(J)}(E)$$

for every $E \subset \mathbf{R}^d$.

Remark 1.2.8. We are now able to explain why not every bounded open set or compact set is Jordan measurable. Consider the countable set $\mathbf{Q} \cap [0, 1]$, which we enumerate as $\{q_1, q_2, q_3, \ldots\}$, let $\varepsilon > 0$ be a small number, and consider the set
$$U := \bigcup_{n=1}^\infty (q_n - \varepsilon/2^n,\ q_n + \varepsilon/2^n).$$
This is the union of open sets and is thus open. On the other hand, by countable subadditivity, one has
$$m^*(U) \le \sum_{n=1}^\infty 2\varepsilon/2^n = 2\varepsilon.$$
Finally, as $U$ is dense in $[0, 1]$ (i.e. $\overline{U}$ contains $[0, 1]$), we have
$$m^{*,(J)}(U) = m^{*,(J)}(\overline{U}) \ge m^{*,(J)}([0, 1]) = 1.$$
For $\varepsilon$ small enough (e.g. $\varepsilon := 1/3$), we see that the Lebesgue outer measure and Jordan outer measure of $U$ disagree. Using (1.2), we conclude that the bounded open set $U$ is not Jordan measurable. This in turn implies that the complement of $U$ in, say, $[-2, 2]$, is also not Jordan measurable, despite being a compact set.

Now we turn to countable unions of boxes. It is convenient to introduce the following notion: two boxes are almost disjoint if their interiors are disjoint, thus for instance $[0, 1]$ and $[1, 2]$ are almost disjoint. As a box has the same elementary measure as its interior, we see that the finite additivity property

(1.3) $$m(B_1 \cup \ldots \cup B_k) = |B_1| + \ldots + |B_k|$$

holds for almost disjoint boxes $B_1, \ldots, B_k$, and not just for disjoint boxes. This (and Lemma 1.2.6) has the following consequence:

Lemma 1.2.9 (Outer measure of countable unions of almost disjoint boxes). Let $E = \bigcup_{n=1}^\infty B_n$ be a countable union of almost disjoint boxes $B_1, B_2, \ldots$. Then
$$m^*(E) = \sum_{n=1}^\infty |B_n|.$$
Thus, for instance, $\mathbf{R}^d$ itself has an infinite outer measure.

Proof. From countable subadditivity and Lemma 1.2.6 we have
$$m^*(E) \le \sum_{n=1}^\infty m^*(B_n) = \sum_{n=1}^\infty |B_n|,$$
so it suffices to show that
$$\sum_{n=1}^\infty |B_n| \le m^*(E).$$
But for each natural number $N$, $E$ contains the elementary set $B_1 \cup \ldots \cup B_N$, so by monotonicity and Lemma 1.2.6,
$$m^*(E) \ge m^*(B_1 \cup \ldots \cup B_N) = m(B_1 \cup \ldots \cup B_N)$$
and thus by (1.3), one has
$$\sum_{n=1}^N |B_n| \le m^*(E).$$
Letting $N \to \infty$ we obtain the claim.

Remark 1.2.10. The above lemma has the following immediate corollary: if $E = \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty B'_n$ can be decomposed in two different ways as the countable union of almost disjoint boxes, then
$$\sum_{n=1}^\infty |B_n| = \sum_{n=1}^\infty |B'_n|.$$
Although this statement is intuitively obvious and does not explicitly use the concepts of Lebesgue outer measure or Lebesgue measure, it is remarkably difficult to prove this statement rigorously without essentially using one of these two concepts. (Try it!)

Exercise 1.2.5. Show that if a set $E \subset \mathbf{R}^d$ is expressible as the countable union of almost disjoint boxes, then the Lebesgue outer measure of $E$ is equal to the Jordan inner measure: $m^*(E) = m_{*,(J)}(E)$, where we extend the definition of Jordan inner measure to unbounded sets in the obvious manner.

Not every set can be expressed as the countable union of almost disjoint boxes (consider for instance the irrationals $\mathbf{R} \setminus \mathbf{Q}$, which contain no boxes other than the singleton sets). However, there is an important class of sets of this form, namely the open sets:

Lemma 1.2.11. Let $E \subset \mathbf{R}^d$ be an open set. Then $E$ can be expressed as the countable union of almost disjoint boxes (and, in fact, as the countable union of almost disjoint closed cubes).

Proof. We will use the dyadic mesh structure of the Euclidean space $\mathbf{R}^d$, which is a convenient tool for "discretising" certain aspects of real analysis. Define a closed dyadic cube to be a cube $Q$ of the form
$$Q = \Big[\frac{i_1}{2^n}, \frac{i_1 + 1}{2^n}\Big] \times \ldots \times \Big[\frac{i_d}{2^n}, \frac{i_d + 1}{2^n}\Big]$$
for some integers $n, i_1, \ldots, i_d$. To avoid some technical issues we shall restrict attention here to "small" cubes of sidelength at most 1, thus we restrict $n$ to the non-negative integers, and we will completely ignore "large" cubes of sidelength greater than one. Observe that the closed dyadic cubes of a fixed sidelength $2^{-n}$ are almost disjoint, and cover all of $\mathbf{R}^d$. Also observe that each dyadic cube of sidelength $2^{-n}$ is contained in exactly one "parent" cube of sidelength $2^{-n+1}$ (which, conversely, has $2^d$ "children" of sidelength $2^{-n}$), giving the dyadic cubes a structure analogous to that of a binary tree (or more precisely, an infinite forest of $2^d$-ary trees). As a consequence of these facts, we also obtain the important dyadic nesting property: given any two closed dyadic cubes (possibly of different sidelength), either they are almost disjoint, or one of them is contained in the other.

If $E$ is open, and $x \in E$, then by definition there is an open ball centered at $x$ that is contained in $E$, and it is easy to conclude that there is also a closed dyadic cube containing $x$ that is contained in $E$. Thus, if we let $\mathcal{Q}$ be the collection of all the dyadic cubes $Q$ that are contained in $E$, we see that the union $\bigcup_{Q \in \mathcal{Q}} Q$ of all these cubes is exactly equal to $E$.

As there are only countably many dyadic cubes, $\mathcal{Q}$ is at most countable. But we are not done yet, because these cubes are not almost disjoint (for instance, any cube $Q$ in $\mathcal{Q}$ will of course overlap with its child cubes). But we can deal with this by exploiting the dyadic nesting property. Let $\mathcal{Q}^*$ denote those cubes in $\mathcal{Q}$ which are maximal with respect to set inclusion, which means that they are not contained in any other cube in $\mathcal{Q}$. From the nesting property (and the fact that we have capped the maximum size of our cubes) we see that every cube in $\mathcal{Q}$ is contained in exactly one maximal cube in $\mathcal{Q}^*$, and that any two such maximal cubes in $\mathcal{Q}^*$ are almost disjoint. Thus, we see that $E$ is the union $E = \bigcup_{Q \in \mathcal{Q}^*} Q$ of almost disjoint cubes. As $\mathcal{Q}^*$ is at most countable, the claim follows (adding empty boxes if necessary to pad out the cardinality).

We now have a formula for the Lebesgue outer measure of any open set: it is exactly equal to the Jordan inner measure of that set, or of the total volume of any partitioning of that set into almost disjoint boxes. Finally, we have a formula for the Lebesgue outer measure of an arbitrary set:

Lemma 1.2.12 (Outer regularity). Let $E \subset \mathbf{R}^d$ be an arbitrary set. Then one has
$$m^*(E) = \inf_{E \subset U,\ U \text{ open}} m^*(U).$$

Proof. From monotonicity one trivially has
$$m^*(E) \le \inf_{E \subset U,\ U \text{ open}} m^*(U),$$
so it suffices to show that
$$\inf_{E \subset U,\ U \text{ open}} m^*(U) \le m^*(E).$$
This is trivial for $m^*(E)$ infinite, so we may assume that $m^*(E)$ is finite.

Let $\varepsilon > 0$. By definition of outer measure, there exists a countable family $B_1, B_2, \ldots$ of boxes covering $E$ such that
$$\sum_{n=1}^\infty |B_n| \le m^*(E) + \varepsilon.$$
We use the $\varepsilon/2^n$ trick again. We can enlarge each of these boxes $B_n$ to an open box $B'_n$ such that $|B'_n| \le |B_n| + \varepsilon/2^n$. Then the set $\bigcup_{n=1}^\infty B'_n$, being a union of open sets, is itself open, and contains $E$; and
$$\sum_{n=1}^\infty |B'_n| \le m^*(E) + \varepsilon + \sum_{n=1}^\infty \varepsilon/2^n = m^*(E) + 2\varepsilon.$$
By countable subadditivity, this implies that
$$m^*\Big(\bigcup_{n=1}^\infty B'_n\Big) \le m^*(E) + 2\varepsilon$$
and thus
$$\inf_{E \subset U,\ U \text{ open}} m^*(U) \le m^*(E) + 2\varepsilon.$$
As $\varepsilon > 0$ was arbitrary, we obtain the claim.

Exercise 1.2.6. Give an example to show that the reverse statement
$$m^*(E) = \sup_{U \subset E,\ U \text{ open}} m^*(U)$$
is false. (For the corrected version of this statement, see Exercise 1.2.15.)

1.2.2. Lebesgue measurability. We now define the notion of a Lebesgue measurable set as one which can be efficiently contained in open sets in the sense of Definition 1.2.2, and set out their basic properties. First, we show that there are plenty of Lebesgue measurable sets.

Lemma 1.2.13 (Existence of Lebesgue measurable sets).
E2 . . being a union of open sets. By countable ∞ ∞ subadditivity. By Lemma 1. . . . is itself open. . this implies that n=1 En is contained in n=1 Un . then the intersection n=1 En is Lebesgue measurable.32 1. it suﬃces to verify the claim when E is closed and bounded. To prove Claim (vi). Now we establish Claim (ii).) (iv) The empty set ∅ is Lebesgue measurable. By hypothesis. (vi) If E1 . . ⊂ Rd are a sequence of Lebesgue measur∞ able sets. then so is its complement Rd \E. Claim (i) is obvious from deﬁnition. and so by Lemma 1. (vii) If E1 . .2. It suﬃces to show that m∗ (U \E) ≤ ε. (v) If E ⊂ Rd is Lebesgue measurable. 2. n) of radius n for n = 1. and ∞ ∞ the diﬀerence ( n=1 Un )\( n=1 En ) has Lebesgue outer measure at ∞ most ε. 3. . .11 is the countable ∞ union n=1 Qn of almost disjoint closed cubes. ∞ N m∗ (U \E) = n=1 Qn . By outer regularity (Lemma 1.2. then the union n=1 En is Lebesgue measurable. E3 .12). .9. the closed balls B(0. hence compact by the HeineBorel theorem. (ii) Every closed set is Lebesgue measurable. (iii) Every set of Lebesgue outer measure zero is measurable. Proof. Let ε > 0 be arbitrary. each En is contained in an open set Un whose diﬀerence Un \En has Lebesgue outer measure at most ε/2n . so by (vi). Note that the boundedness of E implies that m∗ (E) is ﬁnite. we use the ε/2n trick. ⊂ Rd are a sequence of Lebesgue measur∞ able sets. The set U \E is open. and the claim follows. Let ε > 0. (Such sets are called null sets. we can ﬁnd an open set U containing E such that m∗ (U ) ≤ m∗ (E) + ε. E3 . So it will suﬃce to show that n=1 Qn  ≤ ε for every ﬁnite N . as are Claims (iii) and (iv). Measure theory (i) Every open set is Lebesgue measurable. Every closed set E is the countable union of closed and bounded sets (by intersecting E with.2.). say. E2 . The set n=1 Un .
The set $\bigcup_{n=1}^N Q_n$ is a finite union of closed cubes and is thus closed. It is disjoint from the compact set $E$, so by Exercise 1.2.4 followed by Lemma 1.2.5 one has
$$m^*\Big(E \cup \bigcup_{n=1}^N Q_n\Big) = m^*(E) + m^*\Big(\bigcup_{n=1}^N Q_n\Big).$$
By monotonicity, the left-hand side is at most $m^*(U)$, which is in turn at most $m^*(E) + \varepsilon$. Since $m^*(E)$ is finite, we may cancel it and conclude that $m^*(\bigcup_{n=1}^N Q_n) \le \varepsilon$, as required.

Next, we establish Claim (v). If $E$ is Lebesgue measurable, then for every $n$ we can find an open set $U_n$ containing $E$ such that $m^*(U_n \setminus E) \le 1/n$. Letting $F_n$ be the complement of $U_n$, we conclude that the complement $\mathbf{R}^d \setminus E$ of $E$ contains all of the $F_n$, and that $m^*((\mathbf{R}^d \setminus E) \setminus F_n) \le 1/n$. If we let $F := \bigcup_{n=1}^\infty F_n$, then $\mathbf{R}^d \setminus E$ contains $F$, and from monotonicity $m^*((\mathbf{R}^d \setminus E) \setminus F) = 0$, thus $\mathbf{R}^d \setminus E$ is the union of $F$ and a set of Lebesgue outer measure zero. But $F$ is in turn the union of countably many closed sets $F_n$. The claim now follows from (ii), (iii), (vi).

Finally, Claim (vii) follows from (v), (vi), and de Morgan's laws
$$\Big(\bigcup_{\alpha \in A} E_\alpha\Big)^c = \bigcap_{\alpha \in A} E_\alpha^c, \qquad \Big(\bigcap_{\alpha \in A} E_\alpha\Big)^c = \bigcup_{\alpha \in A} E_\alpha^c$$
(which work for infinite unions and intersections without any difficulty).

Informally, the above lemma asserts (among other things) that if one starts with such basic subsets of $\mathbf{R}^d$ as open or closed sets and then takes at most countably many boolean operations, one will always end up with a Lebesgue measurable set. This is already enough to ensure that the majority of sets that one actually encounters in real analysis will be Lebesgue measurable. (Nevertheless, using the axiom of choice one can construct sets that are not Lebesgue measurable; we will see an example of this later. As a consequence, we cannot generalise the countable closure properties here to uncountable closure properties.)

Remark 1.2.14. The properties (iv), (v), (vi) of Lemma 1.2.13 assert that the collection of Lebesgue measurable subsets of $\mathbf{R}^d$ form a $\sigma$-algebra, which is a strengthening of the more classical concept of a boolean algebra. We will study abstract $\sigma$-algebras in more detail in Section 1.4. Note how Lemma 1.2.13 is significantly stronger than the counterpart for Jordan measurability (Exercise 1.1.6), in particular by allowing countably many boolean operations instead of just finitely many. This is one of the main reasons why we use Lebesgue measure instead of Jordan measure.

Exercise 1.2.7 (Criteria for measurability). Let $E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) (Outer approximation by open) For every $\varepsilon > 0$, one can contain $E$ in an open set $U$ with $m^*(U \setminus E) \le \varepsilon$.
(iii) (Almost open) For every $\varepsilon > 0$, one can find an open set $U$ such that $m^*(U \Delta E) \le \varepsilon$. (In other words, $E$ differs from an open set by a set of outer measure at most $\varepsilon$.)
(iv) (Inner approximation by closed) For every $\varepsilon > 0$, one can find a closed set $F$ contained in $E$ with $m^*(E \setminus F) \le \varepsilon$.
(v) (Almost closed) For every $\varepsilon > 0$, one can find a closed set $F$ such that $m^*(F \Delta E) \le \varepsilon$. (In other words, $E$ differs from a closed set by a set of outer measure at most $\varepsilon$.)
(vi) (Almost measurable) For every $\varepsilon > 0$, one can find a Lebesgue measurable set $E_\varepsilon$ such that $m^*(E_\varepsilon \Delta E) \le \varepsilon$. (In other words, $E$ differs from a measurable set by a set of outer measure at most $\varepsilon$.)
(Hint: Some of these deductions are either trivial or very easy. To deduce (i) from (vi), use the $\varepsilon/2^n$ trick to show that $E$ is contained in a Lebesgue measurable set $E_\varepsilon$ with $m^*(E_\varepsilon \Delta E) \le \varepsilon$, and then take countable intersections to show that $E$ differs from a Lebesgue measurable set by a null set.)

Exercise 1.2.8. Show that every Jordan measurable set is Lebesgue measurable.

Exercise 1.2.9 (Middle thirds Cantor set). Let $I_0 := [0, 1]$ be the unit interval, let $I_1 := [0, 1/3] \cup [2/3, 1]$ be $I_0$ with the interior of the middle third interval removed, let $I_2 := [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$ be $I_1$ with the interior of the middle third of each of the two intervals of $I_1$ removed, and so forth. More formally, write
$$I_n := \bigcup_{a_1, \ldots, a_n \in \{0, 2\}} \Big[\sum_{i=1}^n \frac{a_i}{3^i},\ \sum_{i=1}^n \frac{a_i}{3^i} + \frac{1}{3^n}\Big].$$
Let $C := \bigcap_{n=1}^\infty I_n$ be the intersection of all the elementary sets $I_n$. Show that $C$ is compact, uncountable, and a null set.

Exercise 1.2.10. (This exercise presumes some familiarity with point-set topology.) Show that the half-open interval $[0, 1)$ cannot be expressed as the countable union of disjoint closed intervals. (Hint: It is easy to prevent $[0, 1)$ from being expressed as the finite union of disjoint closed intervals. Next, assume for sake of contradiction that $[0, 1)$ is the union of infinitely many closed intervals, and conclude that $[0, 1)$ is homeomorphic to the middle thirds Cantor set, which is absurd. It is also possible to proceed using the Baire category theorem (§1.7 of An epsilon of room, Vol. I).) For an additional challenge, show that $[0, 1)$ cannot be expressed as the countable union of disjoint closed sets.

Now we look at the Lebesgue measure $m(E)$ of a Lebesgue measurable set $E$, which is defined to equal its Lebesgue outer measure $m^*(E)$. If $E$ is Jordan measurable, we see from (1.2) that the Lebesgue measure and the Jordan measure of $E$ coincide, thus Lebesgue measure extends Jordan measure. This justifies the use of the notation $m(E)$ to denote both Lebesgue measure of a Lebesgue measurable set, and Jordan measure of a Jordan measurable set (as well as elementary measure of an elementary set).

Lebesgue measure obeys significantly better properties than Lebesgue outer measure, when restricted to Lebesgue measurable sets:

Lemma 1.2.15 (The measure axioms).
(i) (Empty set) $m(\emptyset) = 0$.
(ii) (Countable additivity) If $E_1, E_2, \ldots \subset \mathbf{R}^d$ is a countable sequence of disjoint Lebesgue measurable sets, then $m(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty m(E_n)$.
Proof. The first claim is trivial, so we focus on the second. We first deal with the easy case when all of the $E_n$ are compact. By repeated use of Lemma 1.2.5 and Exercise 1.2.4, we have
$$m\Big(\bigcup_{n=1}^N E_n\Big) = \sum_{n=1}^N m(E_n).$$
Using monotonicity, we conclude that
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \ge \sum_{n=1}^N m(E_n).$$
(We can use $m$ instead of $m^*$ throughout this argument, thanks to Lemma 1.2.13.) Sending $N \to \infty$ we obtain
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \ge \sum_{n=1}^\infty m(E_n).$$
On the other hand, from countable subadditivity one has
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \le \sum_{n=1}^\infty m(E_n),$$
and the claim follows.

Next, we handle the case when the $E_n$ are bounded but not necessarily compact. We use the $\varepsilon/2^n$ trick. Let $\varepsilon > 0$. Applying Exercise 1.2.7, we know that each $E_n$ is the union of a compact set $K_n$ and a set of outer measure at most $\varepsilon/2^n$. Thus
$$m(E_n) \le m(K_n) + \varepsilon/2^n$$
and hence
$$\sum_{n=1}^\infty m(E_n) \le \Big(\sum_{n=1}^\infty m(K_n)\Big) + \varepsilon.$$
Finally, from the compact case of this lemma we already know that
$$m\Big(\bigcup_{n=1}^\infty K_n\Big) = \sum_{n=1}^\infty m(K_n)$$
while from monotonicity
$$m\Big(\bigcup_{n=1}^\infty K_n\Big) \le m\Big(\bigcup_{n=1}^\infty E_n\Big).$$
Putting all this together we see that
$$\sum_{n=1}^\infty m(E_n) \le m\Big(\bigcup_{n=1}^\infty E_n\Big) + \varepsilon$$
for every $\varepsilon > 0$, while from countable subadditivity we have
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) \le \sum_{n=1}^\infty m(E_n).$$
The claim follows.

Finally, we handle the case when the $E_n$ are not assumed to be bounded or closed. Here, the basic idea is to decompose each $E_n$ as a countable disjoint union of bounded Lebesgue measurable sets. First, decompose $\mathbf{R}^d$ as the countable disjoint union $\mathbf{R}^d = \bigcup_{m=1}^\infty A_m$ of bounded measurable sets $A_m$; for instance one could take the annuli $A_m := \{x \in \mathbf{R}^d : m - 1 \le |x| < m\}$. Then each $E_n$ is the countable disjoint union of the bounded measurable sets $E_n \cap A_m$ for $m = 1, 2, \ldots$, and thus
$$m(E_n) = \sum_{m=1}^\infty m(E_n \cap A_m)$$
by the previous arguments. In a similar vein, $\bigcup_{n=1}^\infty E_n$ is the countable disjoint union of the bounded measurable sets $E_n \cap A_m$ for $n, m = 1, 2, \ldots$, and thus
$$m\Big(\bigcup_{n=1}^\infty E_n\Big) = \sum_{n=1}^\infty \sum_{m=1}^\infty m(E_n \cap A_m),$$
and the claim follows.

From Lemma 1.2.15 one can of course conclude finite additivity
$$m(E_1 \cup \ldots \cup E_k) = m(E_1) + \ldots + m(E_k)$$
whenever $E_1, \ldots, E_k \subset \mathbf{R}^d$ are disjoint Lebesgue measurable sets. We also have another important result:

Exercise 1.2.11 (Monotone convergence theorem for measurable sets).
(i) (Upward monotone convergence) Let $E_1 \subset E_2 \subset \ldots \subset \mathbf{R}^d$ be a countable non-decreasing sequence of Lebesgue measurable sets. Show that $m(\bigcup_{n=1}^\infty E_n) = \lim_{n \to \infty} m(E_n)$. (Hint: Express $\bigcup_{n=1}^\infty E_n$ as the countable union of the lacunae $E_n \setminus \bigcup_{n'=1}^{n-1} E_{n'}$.)
(ii) (Downward monotone convergence) Let $\mathbf{R}^d \supset E_1 \supset E_2 \supset \ldots$ be a countable non-increasing sequence of Lebesgue measurable sets. If at least one of the $m(E_n)$ is finite, show that $m(\bigcap_{n=1}^\infty E_n) = \lim_{n \to \infty} m(E_n)$.
(iii) Give a counterexample to show that the hypothesis that at least one of the $m(E_n)$ is finite in the downward monotone convergence theorem cannot be dropped.
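Downward monotone convergence can be observed on the sets $I_n$ from the middle thirds construction of Exercise 1.2.9, which are nested, with $m(I_0) = 1$ finite. The sketch below is an illustration added here rather than part of the original exercise: it builds each $I_n$ as an explicit list of intervals and computes its length exactly. The function names `cantor_stage` and `measure` are hypothetical.

```python
from fractions import Fraction


def cantor_stage(n):
    """Return the closed intervals making up I_n as a list of (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # Keep the two outer thirds, drop the open middle third.
            next_intervals.append((a, a + third))
            next_intervals.append((b - third, b))
        intervals = next_intervals
    return intervals


def measure(intervals):
    """Total length of a finite family of disjoint intervals."""
    return sum(b - a for a, b in intervals)


# m(I_n) for n = 0, ..., 7; each stage keeps two thirds of the previous length.
measures = [measure(cantor_stage(n)) for n in range(8)]
for n, value in enumerate(measures):
    print(n, value)
```

Each stage retains exactly two thirds of the previous length, so the values decrease to zero, in line with the assertion of Exercise 1.2.9 that the intersection $C$ is a null set.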
Exercise 1.2.12. Show that any map $E \mapsto m(E)$ from Lebesgue measurable sets to elements of $[0, +\infty]$ that obeys the above empty set and countable additivity axioms will also obey the monotonicity and countable subadditivity axioms from Exercise 1.2.3, when restricted to Lebesgue measurable sets of course.

Exercise 1.2.13. We say that a sequence $E_n$ of sets in $\mathbf{R}^d$ converges pointwise to another set $E$ in $\mathbf{R}^d$ if the indicator functions $1_{E_n}$ converge pointwise to $1_E$.
(i) Show that if the $E_n$ are all Lebesgue measurable, and converge pointwise to $E$, then $E$ is Lebesgue measurable also. (Hint: use the identity $1_E(x) = \liminf_{n \to \infty} 1_{E_n}(x)$ or $1_E(x) = \limsup_{n \to \infty} 1_{E_n}(x)$ to write $E$ in terms of countable unions and intersections of the $E_n$.)
(ii) (Dominated convergence theorem) Suppose that the $E_n$ are all contained in another Lebesgue measurable set $F$ of finite measure. Show that $m(E_n)$ converges to $m(E)$. (Hint: use the upward and downward monotone convergence theorems, Exercise 1.2.11.)
(iii) Give a counterexample to show that the dominated convergence theorem fails if the $E_n$ are not contained in a set of finite measure, even if we assume that the $m(E_n)$ are all uniformly bounded.
In later sections we will generalise the monotone and dominated convergence theorems to measurable functions instead of measurable sets; see Theorem 1.4.44 and Theorem 1.4.49.

Exercise 1.2.14. Let $E \subset \mathbf{R}^d$. Show that $E$ is contained in a Lebesgue measurable set of measure exactly equal to $m^*(E)$.

Exercise 1.2.15 (Inner regularity). Let $E \subset \mathbf{R}^d$ be Lebesgue measurable. Show that
$$m(E) = \sup_{K \subset E,\ K \text{ compact}} m(K).$$
Remark 1.2.16. The inner and outer regularity properties of measure can be used to define the concept of a Radon measure (see §1.10 of An epsilon of room, Vol. I).

Exercise 1.2.16 (Criteria for finite measure). Let $E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable with finite measure.
(ii) (Outer approximation by open) For every $\varepsilon > 0$, one can contain $E$ in an open set $U$ of finite measure with $m^*(U \setminus E) \le \varepsilon$.
(iii) (Almost open bounded) $E$ differs from a bounded open set by a set of arbitrarily small Lebesgue outer measure. (In other words, for every $\varepsilon > 0$ there exists a bounded open set $U$ such that $m^*(E \Delta U) \le \varepsilon$.)
(iv) (Inner approximation by compact) For every $\varepsilon > 0$, one can find a compact set $F$ contained in $E$ with $m^*(E \setminus F) \le \varepsilon$.
(v) (Almost compact) $E$ differs from a compact set by a set of arbitrarily small Lebesgue outer measure.
(vi) (Almost bounded measurable) $E$ differs from a bounded Lebesgue measurable set by a set of arbitrarily small Lebesgue outer measure.
(vii) (Almost finite measure) $E$ differs from a Lebesgue measurable set with finite measure by a set of arbitrarily small Lebesgue outer measure.
(viii) (Almost elementary) $E$ differs from an elementary set by a set of arbitrarily small Lebesgue outer measure.
(ix) (Almost dyadically elementary) For every $\varepsilon > 0$, there exists an integer $n$ and a finite union $F$ of closed dyadic cubes of sidelength $2^{-n}$ such that $m^*(E \Delta F) \le \varepsilon$.
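Criterion (ix) can be illustrated numerically in a simple special case. The following sketch is an added illustration under stated assumptions: it takes $E$ to be the closed unit disk in $\mathbf{R}^2$ (a choice made here, not in the exercise), and for each $n$ counts the closed dyadic squares of sidelength $2^{-n}$ contained in $E$ and those meeting $E$. If $F$ is the union of the contained squares, then $E \Delta F = E \setminus F$ lies in the squares that meet $E$ without being contained in it, so the difference of the two areas bounds $m^*(E \Delta F)$. The function names are hypothetical.

```python
from fractions import Fraction


def nearest_sq(i):
    """Squared distance (in grid units) from 0 to the interval [i, i + 1]."""
    if i <= 0 <= i + 1:
        return 0
    return min(i * i, (i + 1) * (i + 1))


def farthest_sq(i):
    """Squared distance (in grid units) from 0 to the farthest point of [i, i + 1]."""
    return max(i * i, (i + 1) * (i + 1))


def pixelate_disk(n):
    """Return (inner_area, outer_area) for dyadic squares of sidelength 2**-n.

    inner_area: total area of squares contained in the closed unit disk.
    outer_area: total area of squares meeting the closed unit disk.
    """
    side = 2 ** n
    r2 = side * side  # squared radius measured in grid units
    inner = outer = 0
    for i in range(-side - 1, side + 1):
        for j in range(-side - 1, side + 1):
            if farthest_sq(i) + farthest_sq(j) <= r2:
                inner += 1
            if nearest_sq(i) + nearest_sq(j) <= r2:
                outer += 1
    cell = Fraction(1, r2)  # area of one square
    return inner * cell, outer * cell


gaps = []
for n in range(1, 7):
    inner_area, outer_area = pixelate_disk(n)
    gaps.append(outer_area - inner_area)
    print(n, float(inner_area), float(outer_area), float(gaps[-1]))
```

The printed gap between the two areas shrinks as the mesh is refined, which is the quantitative content of (ix) for this particular set.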
One can interpret the equivalence of (i) and (ix) in the above exercise as asserting that Lebesgue measurable sets are those which look (locally) "pixelated" at sufficiently fine scales. This will be formalised in later sections with the Lebesgue differentiation theorem (Exercise 1.6.24).

Exercise 1.2.17 (Carathéodory criterion, one direction). Let $E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) For every elementary set $A$, one has $m(A) = m^*(A \cap E) + m^*(A \setminus E)$.
(iii) For every box $B$, one has $|B| = m^*(B \cap E) + m^*(B \setminus E)$.

Exercise 1.2.18 (Inner measure). Let $E \subset \mathbf{R}^d$ be a bounded set. Define the Lebesgue inner measure $m_*(E)$ of $E$ by the formula
$$m_*(E) := m(A) - m^*(A \setminus E)$$
for any elementary set $A$ containing $E$.
(i) Show that this definition is well defined, i.e. that if $A, A'$ are two elementary sets containing $E$, then $m(A) - m^*(A \setminus E)$ is equal to $m(A') - m^*(A' \setminus E)$.
(ii) Show that $m_*(E) \le m^*(E)$, and that equality holds if and only if $E$ is Lebesgue measurable.

Define a $G_\delta$ set to be a countable intersection $\bigcap_{n=1}^\infty U_n$ of open sets, and an $F_\sigma$ set to be a countable union $\bigcup_{n=1}^\infty F_n$ of closed sets.

Exercise 1.2.19. Let $E \subset \mathbf{R}^d$. Show that the following are equivalent:
(i) $E$ is Lebesgue measurable.
(ii) $E$ is a $G_\delta$ set with a null set removed.
(iii) $E$ is the union of an $F_\sigma$ set and a null set.

Remark 1.2.17. From the above exercises, we see that when describing what it means for a set to be Lebesgue measurable, there is a tradeoff between the type of approximation one is willing to bear, and the type of things one can say about the approximation. If one is only willing to approximate to within a null set, then one can only say that a measurable set is approximated by a $G_\delta$ or an $F_\sigma$ set, which is a fairly weak amount of structure. If one is willing to add on an epsilon of error (as measured in the Lebesgue outer measure), one can make a measurable set open; dually, if one is willing to take away an epsilon of error, one can make a measurable set closed. Finally, if one is willing to both add and subtract an epsilon of error, then one can make a measurable set (of finite measure) elementary, or even a finite union of dyadic cubes.

Exercise 1.2.20 (Translation invariance). If $E \subset \mathbf{R}^d$ is Lebesgue measurable, show that $E + x$ is Lebesgue measurable for any $x \in \mathbf{R}^d$, and that $m(E + x) = m(E)$.

Exercise 1.2.21 (Change of variables). If $E \subset \mathbf{R}^d$ is Lebesgue measurable, and $T : \mathbf{R}^d \to \mathbf{R}^d$ is a linear transformation, show that $T(E)$ is Lebesgue measurable, and that $m(T(E)) = |\det T|\, m(E)$. We caution that if $T : \mathbf{R}^d \to \mathbf{R}^{d'}$ is a linear map to a space $\mathbf{R}^{d'}$ of strictly smaller dimension than $\mathbf{R}^d$, then $T(E)$ need not be Lebesgue measurable; see Exercise 1.2.27.

Exercise 1.2.22. Let $d, d' \ge 1$ be natural numbers.
(i) If $E \subset \mathbf{R}^d$ and $F \subset \mathbf{R}^{d'}$, show that $(m^{d+d'})^*(E \times F) \le (m^d)^*(E)\,(m^{d'})^*(F)$, where $(m^d)^*$ denotes $d$-dimensional Lebesgue outer measure, etc.
(ii) Let $E \subset \mathbf{R}^d$, $F \subset \mathbf{R}^{d'}$ be Lebesgue measurable sets. Show that $E \times F \subset \mathbf{R}^{d+d'}$ is Lebesgue measurable, with $m^{d+d'}(E \times F) = m^d(E) \cdot m^{d'}(F)$. (Note that we allow $E$ or $F$ to have infinite measure, and so one may have to divide into cases or take advantage of the monotone convergence theorem for Lebesgue measure, Exercise 1.2.11.)
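The change of variables formula of Exercise 1.2.21 can be checked by hand in the simplest case, where $d = 2$ and $E$ is a box, so that $T(E)$ is a parallelogram. The sketch below is an added illustration covering only that special case: it computes the area of the image polygon with the shoelace formula and compares it with $|\det T|\, m(E)$. The matrices and the helper names `apply`, `polygon_area` and `image_area_of_box` are hypothetical choices made for this example.

```python
from fractions import Fraction


def apply(T, v):
    """Apply the 2x2 matrix T (given as two rows) to the vector v."""
    (a, b), (c, d) = T
    x, y = v
    return (a * x + b * y, c * x + d * y)


def det(T):
    (a, b), (c, d) = T
    return a * d - b * c


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    s = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x0 * y1 - x1 * y0
    return Fraction(abs(s)) / 2


def image_area_of_box(T, width, height):
    """Area of T(E) where E = [0, width] x [0, height]."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return polygon_area([apply(T, v) for v in corners])


T = ((2, 1), (1, 3))           # det T = 5
width, height = Fraction(1, 2), Fraction(3)
lhs = image_area_of_box(T, width, height)
rhs = abs(det(T)) * width * height
print(lhs == rhs)
```

A singular matrix such as `((1, 2), (2, 4))` sends the box to a line segment, and the same computation returns area zero, consistent with $|\det T| = 0$.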
Exercise 1.2.23 (Uniqueness of Lebesgue measure). Show that Lebesgue measure $E \mapsto m(E)$ is the only map from Lebesgue measurable sets to $[0, +\infty]$ that obeys the following axioms:
(i) (Empty set) $m(\emptyset) = 0$.
(ii) (Countable additivity) If $E_1, E_2, \ldots \subset \mathbf{R}^d$ is a countable sequence of disjoint Lebesgue measurable sets, then $m(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty m(E_n)$.
(iii) (Translation invariance) If $E$ is Lebesgue measurable and $x \in \mathbf{R}^d$, then $m(E + x) = m(E)$.
(iv) (Normalisation) $m([0,1]^d) = 1$.
Hint: First show that $m$ must match elementary measure on elementary sets, then show that $m$ is bounded by outer measure.

Exercise 1.2.24 (Lebesgue measure as the completion of elementary measure). The purpose of the following exercise is to indicate how Lebesgue measure can be viewed as a metric completion of elementary measure in some sense. To avoid some technicalities we will not work in all of $\mathbf{R}^d$, but in some fixed elementary set $A$ (e.g. $A = [0,1]^d$).
(i) Let $2^A := \{E : E \subset A\}$ be the power set of $A$. We say that two sets $E, F \in 2^A$ are equivalent if $E \Delta F$ is a null set. Show that this is an equivalence relation.
(ii) Let $2^A/\sim$ be the set of equivalence classes $[E] := \{F \in 2^A : E \sim F\}$ of $2^A$ with respect to the above equivalence relation. Define a distance $d : 2^A/\sim \times 2^A/\sim \to \mathbf{R}^+$ between two equivalence classes $[E], [E']$ by defining $d([E], [E']) := m^*(E \Delta E')$. Show that this distance is well-defined (in the sense that $m^*(E \Delta E') = m^*(F \Delta F')$ whenever $[E] = [F]$ and $[E'] = [F']$) and gives $2^A/\sim$ the structure of a complete metric space.
(iii) Let $\mathcal{E} \subset 2^A$ be the elementary subsets of $A$, and let $\mathcal{L} \subset 2^A$ be the Lebesgue measurable subsets of $A$. Show that $\mathcal{L}/\sim$ is the closure of $\mathcal{E}/\sim$ with respect to the metric defined above. In particular, $\mathcal{L}/\sim$ is a complete metric space that contains $\mathcal{E}/\sim$ as a dense subset; in other words, $\mathcal{L}/\sim$ is a metric completion of $\mathcal{E}/\sim$.
(iv) Show that Lebesgue measure $m : \mathcal{L} \to \mathbf{R}^+$ descends to a continuous function $m : \mathcal{L}/\sim \to \mathbf{R}^+$, which by abuse of notation we shall still call $m$. Show that $m : \mathcal{L}/\sim \to \mathbf{R}^+$ is the unique continuous extension of the analogous elementary measure function $m : \mathcal{E}/\sim \to \mathbf{R}^+$ to $\mathcal{L}/\sim$.

For a further discussion of how measures can be viewed as completions of elementary measures, see §2.1 of An epsilon of room, Vol. I.

Exercise 1.2.25. Define a continuously differentiable curve in $\mathbf{R}^d$ to be a set of the form $\{\gamma(t) : a \leq t \leq b\}$ where $[a, b]$ is a closed interval and $\gamma : [a, b] \to \mathbf{R}^d$ is a continuously differentiable function.
(i) If $d \geq 2$, show that every continuously differentiable curve has Lebesgue measure zero. (Why is the condition $d \geq 2$ necessary?)
(ii) Conclude that if $d \geq 2$, then the unit cube $[0,1]^d$ cannot be covered by countably many continuously differentiable curves.
We remark that if the curve is only assumed to be continuous, rather than continuously differentiable, then these claims fail, thanks to the existence of space-filling curves.

1.2.3. Nonmeasurable sets. In the previous section we have set out a rich theory of Lebesgue measure, which enjoys many nice properties when applied to Lebesgue measurable sets. Thus far, we have not ruled out the possibility that every single set is Lebesgue measurable. There is good reason for this: a famous theorem of Solovay [So1970] asserts that, if one is willing to drop the axiom of choice, there exist models of set theory in which all subsets of $\mathbf{R}^d$ are measurable. So any demonstration of the existence of nonmeasurable sets must use the axiom of choice in some essential way.

That said, we can give an informal (and highly nonrigorous) motivation as to why nonmeasurable sets should exist, using intuition from probability theory rather than from set theory. The starting point is the observation that Lebesgue measurable sets of finite measure (and in particular, bounded Lebesgue measurable sets) have to be "almost elementary", in the sense of Exercise 1.2.16.
So all we need to do to build
a nonmeasurable set is to exhibit a bounded set which is not almost elementary. Intuitively, we want to build a set which has oscillatory structure even at arbitrarily fine scales. We will nonrigorously do this as follows. We will work inside the unit interval $[0,1]$. For each $x \in [0,1]$, we imagine that we flip a coin to give either heads or tails (with an independent coin flip for each $x$), and let $E \subset [0,1]$ be the set of all the $x \in [0,1]$ for which the coin flip came up heads. We suppose for contradiction that $E$ is Lebesgue measurable. Intuitively, since each $x$ had a 50% chance of being heads, $E$ should occupy about "half" of $[0,1]$; applying the law of large numbers (see e.g. [Ta2009, §1.4]) in an extremely nonrigorous fashion, we thus expect $m(E)$ to equal $1/2$. Moreover, given any subinterval $[a, b]$ of $[0,1]$, the same reasoning leads us to expect that $E \cap [a, b]$ should occupy about half of $[a, b]$, so that $m(E \cap [a, b])$ should be $|[a, b]|/2$. More generally, given any elementary set $F$ in $[0,1]$, we should have $m(E \cap F) = m(F)/2$. This makes it very hard for $E$ to be approximated by an elementary set; indeed, since $m(E \Delta F) = m(E) + m(F) - 2m(E \cap F)$, a little algebra then shows that $m(E \Delta F) = 1/2 + m(F) - m(F) = 1/2$ for any elementary $F \subset [0,1]$. Thus $E$ is not Lebesgue measurable.

Unfortunately, the above argument is terribly nonrigorous for a number of reasons, not the least of which is that it uses an uncountable number of coin flips, and the rigorous probabilistic theory that one would have to use to model such a system of random variables is too weak¹² to be able to assign meaningful probabilities to such events as "$E$ is Lebesgue measurable". So we now turn to more rigorous arguments that establish the existence of nonmeasurable sets. The arguments will be fairly simple, but the sets constructed are somewhat artificial in nature.

Proposition 1.2.18. There exists a subset $E \subset [0,1]$ which is not Lebesgue measurable.

Proof.
We use the fact that the rationals Q are an additive subgroup of the reals R, and so partition the reals R into disjoint cosets x + Q. This creates a quotient group R/Q := {x + Q : x ∈ R}. Each coset C of R/Q is dense in R, and so has a nonempty intersection
¹² For some further discussion of this point, see [Ta2009, §1.10].
with $[0,1]$. Applying the axiom of choice, we may thus find an element $x_C \in C \cap [0,1]$ for each $C \in \mathbf{R}/\mathbf{Q}$. We then let $E := \{x_C : C \in \mathbf{R}/\mathbf{Q}\}$ be the collection of all these coset representatives. By construction, $E \subset [0,1]$.

Let $y$ be any element of $[0,1]$. Then it must lie in some coset $C$ of $\mathbf{R}/\mathbf{Q}$, and thus differs from $x_C$ by some rational number in $[-1,1]$. In other words, we have
\[ [0,1] \subset \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q). \tag{1.4} \]
On the other hand, we clearly have
\[ \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q) \subset [-1,2]. \tag{1.5} \]
Also, the different translates $E + q$ are disjoint, because $E$ contains only one element from each coset of $\mathbf{Q}$.

We claim that $E$ is not Lebesgue measurable. To see this, suppose for contradiction that $E$ was Lebesgue measurable. Then the translates $E + q$ would also be Lebesgue measurable. By countable additivity, we thus have
\[ m\Big( \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q) \Big) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E + q), \]
and thus by translation invariance and (1.4), (1.5)
\[ 1 \leq \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E) \leq 3. \]
On the other hand, the sum $\sum_{q \in \mathbf{Q} \cap [-1,1]} m(E)$ is either zero (if $m(E) = 0$) or infinite (if $m(E) > 0$), leading to the desired contradiction.

Exercise 1.2.26 (Outer measure is not finitely additive). Show that there exist disjoint bounded subsets $E, F$ of the real line such that $m^*(E \cup F) \neq m^*(E) + m^*(F)$. (Hint: Show that the set constructed in the proof of the above proposition has positive outer measure.)

Exercise 1.2.27 (Projections of measurable sets need not be measurable). Let $\pi : \mathbf{R}^2 \to \mathbf{R}$ be the coordinate projection $\pi(x, y) := x$. Show that there exists a measurable subset $E$ of $\mathbf{R}^2$ such that $\pi(E)$ is not measurable.
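The measure-counting step that closes the proof of Proposition 1.2.18 can be laid out as a single chain. The display below is only a restatement under the same hypothesis (that $E$ is measurable), using (1.4), (1.5), monotonicity, countable additivity and translation invariance; it introduces no new notation.

```latex
\begin{align*}
1 = m([0,1])
  &\leq m\Big( \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q) \Big)
   \leq m([-1,2]) = 3, \\
m\Big( \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (E + q) \Big)
  &= \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E + q)
   = \sum_{q \in \mathbf{Q} \cap [-1,1]} m(E)
   \in \{0, +\infty\}.
\end{align*}
```

The last membership holds because $\mathbf{Q} \cap [-1,1]$ is countably infinite and every summand equals the same constant $m(E)$; neither $0$ nor $+\infty$ lies in $[1, 3]$.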
Remark 1.2.19. The above discussion shows that, in the presence of the axiom of choice, one cannot hope to extend Lebesgue measure to arbitrary subsets of $\mathbf{R}$ while retaining both the countable additivity and the translation invariance properties. If one drops the translation invariance requirement, then this question concerns the theory of measurable cardinals, and is not decidable from the standard ZFC axioms. On the other hand, one can construct finitely additive translation invariant extensions of Lebesgue measure to the power set of $\mathbf{R}$ by use of the Hahn-Banach theorem (§1.5 of An epsilon of room, Vol. I) to extend the integration functional, though we will not do so here.
1.3. The Lebesgue integral
In Section 1.2, we defined the Lebesgue measure $m(E)$ of a Lebesgue measurable set $E \subset \mathbf{R}^d$, and set out the basic properties of this measure. In this set of notes, we use Lebesgue measure to define the Lebesgue integral
\[ \int_{\mathbf{R}^d} f(x)\, dx \]
of functions $f : \mathbf{R}^d \to \mathbf{C} \cup \{\infty\}$. Just as not every set can be measured by Lebesgue measure, not every function can be integrated by the Lebesgue integral; the function will need to be Lebesgue measurable. Furthermore, the function will either need to be unsigned (taking values in $[0, +\infty]$), or absolutely integrable.

To motivate the Lebesgue integral, let us first briefly review two simpler integration concepts. The first is that of an infinite summation
\[ \sum_{n=1}^\infty c_n \]
of a sequence of numbers $c_n$, which can be viewed as a discrete analogue of the Lebesgue integral. Actually, there are two overlapping, but different, notions of summation that we wish to recall here. The first is that of the unsigned infinite sum, when the $c_n$ lie in the extended nonnegative real axis $[0, +\infty]$. In this case, the infinite sum
can be defined as the limit of the partial sums
\[ \sum_{n=1}^\infty c_n = \lim_{N \to \infty} \sum_{n=1}^N c_n \tag{1.6} \]
or equivalently as a supremum of arbitrary finite partial sums:
\[ \sum_{n=1}^\infty c_n = \sup_{A \subset \mathbf{N},\ A \text{ finite}} \ \sum_{n \in A} c_n. \tag{1.7} \]
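As a quick sanity check that (1.6) and (1.7) agree, here is a worked example; the specific choice $c_n = 2^{-n}$ is an assumption made purely for illustration.

```latex
\begin{align*}
\sum_{n=1}^{N} 2^{-n} &= 1 - 2^{-N} \longrightarrow 1 \quad (N \to \infty), \\
\sum_{n \in A} 2^{-n} &\leq \sum_{n=1}^{\max A} 2^{-n} = 1 - 2^{-\max A} < 1
  \quad \text{for every finite non-empty } A \subset \mathbf{N}.
\end{align*}
```

The second line shows that the supremum in (1.7) is at most $1$, and taking $A = \{1, \ldots, N\}$ shows that it is at least $1 - 2^{-N}$ for every $N$; so both formulas assign the value $1$ to this sum.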
The unsigned infinite sum $\sum_{n=1}^\infty c_n$ always exists, but its value may be infinite, even when each term is individually finite (consider e.g. $\sum_{n=1}^\infty 1$).

The second notion of a summation is the absolutely summable infinite sum, in which the $c_n$ lie in the complex plane $\mathbf{C}$ and obey the absolute summability condition
\[ \sum_{n=1}^\infty |c_n| < \infty, \]
where the left-hand side is of course an unsigned infinite sum. When this occurs, one can show that the partial sums $\sum_{n=1}^N c_n$ converge to a limit, and we can then define the infinite sum by the same formula (1.6) as in the unsigned case, though now the sum takes values in $\mathbf{C}$ rather than $[0, +\infty]$. The absolute summability condition confers a number of useful properties that are not obeyed by sums that are merely conditionally convergent; most notably, the value of an absolutely convergent sum is unchanged if one rearranges the terms in the series in an arbitrary fashion. Note also that the absolutely summable infinite sums can be defined in terms of the unsigned infinite sums by taking advantage of the formulae
\[ \sum_{n=1}^\infty c_n = \Big( \sum_{n=1}^\infty \mathrm{Re}(c_n) \Big) + i \Big( \sum_{n=1}^\infty \mathrm{Im}(c_n) \Big) \]
for complex absolutely summable $c_n$, and
\[ \sum_{n=1}^\infty c_n = \sum_{n=1}^\infty c_n^+ - \sum_{n=1}^\infty c_n^- \]
for real absolutely summable $c_n$, where $c_n^+ := \max(c_n, 0)$ and $c_n^- := \max(-c_n, 0)$ are the (magnitudes of the) positive and negative parts of $c_n$.

In an analogous spirit, we will first define an unsigned Lebesgue integral $\int_{\mathbf{R}^d} f(x)\, dx$ of (measurable) unsigned functions $f : \mathbf{R}^d \to [0, +\infty]$, and then use that to define the absolutely convergent Lebesgue integral $\int_{\mathbf{R}^d} f(x)\, dx$ of absolutely integrable functions $f : \mathbf{R}^d \to \mathbf{C} \cup \{\infty\}$. (In contrast to absolutely summable series, which cannot have any infinite terms, absolutely integrable functions will be allowed to occasionally become infinite. However, as we will see, this can only happen on a set of Lebesgue measure zero.)

To define the unsigned Lebesgue integral, we now turn to another more basic notion of integration, namely the Riemann integral $\int_a^b f(x)\, dx$ of a Riemann integrable function $f : [a, b] \to \mathbf{R}$. Recall from Section 1.1 that this integral is equal to the lower Darboux integral
\[ \int_a^b f(x)\, dx = \underline{\int_a^b} f(x)\, dx := \sup_{g \leq f;\ g \text{ piecewise constant}} \ \mathrm{p.c.}\!\int_a^b g(x)\, dx. \]
(It is also equal to the upper Darboux integral; but much as the theory of Lebesgue measure is easiest to define by relying solely on outer measure and not on inner measure, the theory of the unsigned Lebesgue integral is easiest to define by relying solely on lower integrals rather than upper ones; the upper integral is somewhat problematic when dealing with "improper" integrals of functions that are unbounded or are supported on sets of infinite measure.) Compare this formula also with (1.7). The integral $\mathrm{p.c.}\!\int_a^b g(x)\, dx$ is a piecewise constant integral, formed by breaking up the piecewise constant function $g$ into a finite linear combination of indicator functions $1_I$ of intervals $I$, and then measuring the length of each interval.

It turns out that virtually the same definition allows us to define a lower Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\, dx$ of any unsigned function $f : \mathbf{R}^d \to [0, +\infty]$, simply by replacing intervals with the more general class of Lebesgue measurable sets (and thus replacing piecewise constant functions with the more general class of simple functions). If the function is Lebesgue measurable (a concept that we will define presently), then we refer to the lower Lebesgue integral simply as the
see e.49). Another approach (which will not be discussed here) is to take the metric completion of the Riemann integral with respect to the L1 metric. The Lebesgue integral 49 Lebesgue integral. similarly to how the absolutely convergent inﬁnite sum can be deﬁned using the unsigned inﬁnite sum. it obeys all the basic properties one expects of an integral. The Lebesgue integral and Lebesgue measure can be viewed as completions of the Riemann integral and Jordan measure respectively. we will then be able to deﬁne the absolutely convergent Lebesgue integral. [StSk2005]. one can proceed with the unsigned integral but then making an auxiliary stop at integration of functions that are bounded and are supported on a set of ﬁnite measure.4. Remark 1.1.2.47) and the monotone convergence theorem (Theorem 1.4.3. For instance. the Lebesgue theory can be approximated by the Riemann theory. namely Fatou’s lemma (Corollary 1. in subsequent notes we will see that it also obeys a fundamentally important convergence theorem.g. probability. such as PDE. as we shall see by establishing the two basic convergence theorems of the unsigned Lebesgue integral.1. and every Riemann integrable function is Lebesgue measurable. This integral also obeys all the basic properties one expects. This is not the only route to setting up the unsigned and absolutely convergent Lebesgue integrals.44). as well as allied ﬁelds that rely heavily on limits of functions. Firstly. such . the dominated convergence theorem (Theorem 1. This means three things. before going to the absolutely convergent Lebesgue integral.4. in subsequent notes we will also see that it behaves quite well with respect to limits. Conversely. and ergodic theory. As we shall see. as we saw in Section 1. This convergence theorem makes the Lebesgue integral (and its abstract generalisations to other measure spaces than Rd ) particularly suitable for analysis. Once we have the theory of the unsigned Lebesgue integral. 
with the measures and integrals from the two theories being compatible. such as linearity and compatibility with the more classical Riemann integral. the Lebesgue theory extends the Riemann theory: every Jordan measurable set is Lebesgue measurable. such as monotonicity and additivity.3. every Lebesgue measurable set can be approximated (in various senses) by simpler sets.
as open sets or elementary sets, and in a similar fashion, Lebesgue measurable functions can be approximated by nicer functions, such as Riemann integrable or continuous functions. Finally, the Lebesgue theory is complete in various ways; this is formalised in §1.3 of An epsilon of room, Vol. I, but the convergence theorems mentioned above already hint at this completeness. A related fact, known as Egorov's theorem, asserts that a pointwise converging sequence of functions can be approximated as a (locally) uniformly converging sequence of functions. The facts listed here are manifestations of Littlewood's three principles of real analysis (Section 1.3.5), which capture much of the essence of the Lebesgue theory.

1.3.1. Integration of simple functions. Much as the Riemann integral was set up by first using the integral for piecewise constant functions, the Lebesgue integral is set up using the integral for simple functions.

Definition 1.3.2 (Simple function). A (complex-valued) simple function $f : \mathbf{R}^d \to \mathbf{C}$ is a finite linear combination
\[ f = c_1 1_{E_1} + \ldots + c_k 1_{E_k} \tag{1.8} \]
of indicator functions $1_{E_i}$ of Lebesgue measurable sets $E_i \subset \mathbf{R}^d$ for $i = 1, \ldots, k$, where $k \geq 0$ is a natural number and $c_1, \ldots, c_k \in \mathbf{C}$ are complex numbers. An unsigned simple function $f : \mathbf{R}^d \to [0, +\infty]$ is defined similarly, but with the $c_i$ taking values in $[0, +\infty]$ rather than $\mathbf{C}$.

It is clear from construction that the space $\mathrm{Simp}(\mathbf{R}^d)$ of complex-valued simple functions forms a complex vector space; also, $\mathrm{Simp}(\mathbf{R}^d)$ is closed under pointwise product $f, g \mapsto fg$ and complex conjugation $f \mapsto \overline{f}$. In short, $\mathrm{Simp}(\mathbf{R}^d)$ is a commutative $*$-algebra. Meanwhile, the space $\mathrm{Simp}^+(\mathbf{R}^d)$ of unsigned simple functions is a $[0, +\infty]$-module; it is closed under addition, and under scalar multiplication by elements in $[0, +\infty]$.

In this definition, we did not require the $E_1, \ldots, E_k$ to be disjoint. However, it is easy enough to arrange this, basically by exploiting Venn diagrams (or, to use fancier language, finite boolean algebras). Indeed, any $k$ subsets $E_1, \ldots, E_k$ of $\mathbf{R}^d$ partition $\mathbf{R}^d$ into $2^k$ disjoint
sets, each of which is an intersection of $E_i$ or the complement $\mathbf{R}^d \backslash E_i$ for $i = 1, \ldots, k$ (and in particular, is measurable). The (complex or unsigned) simple function is constant on each of these sets, and so can easily be decomposed as a linear combination of the indicator functions of these sets. One easy consequence of this is that if $f$ is a complex-valued simple function, then its absolute value $|f| : x \mapsto |f(x)|$ is an unsigned simple function.

It is geometrically intuitive that we should define the integral $\int_{\mathbf{R}^d} 1_E(x)\, dx$ of an indicator function of a measurable set $E$ to equal $m(E)$:
\[ \int_{\mathbf{R}^d} 1_E(x)\, dx = m(E). \]
Using this and applying the laws of integration formally, we are led to propose the following definition for the integral of an unsigned simple function:

Definition 1.3.3 (Integral of an unsigned simple function). If $f = c_1 1_{E_1} + \ldots + c_k 1_{E_k}$ is an unsigned simple function, the integral $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$ is defined by the formula
\[ \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx := c_1 m(E_1) + \ldots + c_k m(E_k), \]
thus $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$ will take values in $[0, +\infty]$.

However, one has to actually check that this definition is well-defined, in the sense that different representations
\[ f = c_1 1_{E_1} + \ldots + c_k 1_{E_k} = c'_1 1_{E'_1} + \ldots + c'_{k'} 1_{E'_{k'}} \]
of a function as a finite unsigned combination of indicator functions of measurable sets will give the same value for the integral $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$. This is the purpose of the following lemma:

Lemma 1.3.4 (Well-definedness of simple integral). Let $k, k' \geq 0$ be natural numbers, $c_1, \ldots, c_k, c'_1, \ldots, c'_{k'} \in [0, +\infty]$, and let $E_1, \ldots, E_k, E'_1, \ldots, E'_{k'} \subset \mathbf{R}^d$ be Lebesgue measurable sets such that the identity
\[ c_1 1_{E_1} + \ldots + c_k 1_{E_k} = c'_1 1_{E'_1} + \ldots + c'_{k'} 1_{E'_{k'}} \tag{1.9} \]
holds identically on $\mathbf{R}^d$. Then one has
\[ c_1 m(E_1) + \ldots + c_k m(E_k) = c'_1 m(E'_1) + \ldots + c'_{k'} m(E'_{k'}). \]
Proof. We again use a Venn diagram argument. The $k + k'$ sets $E_1, \ldots, E_k, E'_1, \ldots, E'_{k'}$ partition $\mathbf{R}^d$ into $2^{k+k'}$ disjoint sets, each of which is an intersection of some of the $E_1, \ldots, E_k, E'_1, \ldots, E'_{k'}$ and their complements. We throw away any sets that are empty, leaving us with a partition of $\mathbf{R}^d$ into $m$ nonempty disjoint sets $A_1, \ldots, A_m$ for some $0 \leq m \leq 2^{k+k'}$. As the $E_1, \ldots, E_k, E'_1, \ldots, E'_{k'}$ are Lebesgue measurable, the $A_1, \ldots, A_m$ are too. By construction, each of the $E_1, \ldots, E_k, E'_1, \ldots, E'_{k'}$ arise as unions of some of the $A_1, \ldots, A_m$, thus we can write
\[ E_i = \bigcup_{j \in J_i} A_j \quad \text{and} \quad E'_{i'} = \bigcup_{j' \in J'_{i'}} A_{j'} \]
for all $i = 1, \ldots, k$ and $i' = 1, \ldots, k'$, and some subsets $J_i, J'_{i'} \subset \{1, \ldots, m\}$. By finite additivity of Lebesgue measure, we thus have
\[ m(E_i) = \sum_{j \in J_i} m(A_j) \quad \text{and} \quad m(E'_{i'}) = \sum_{j \in J'_{i'}} m(A_j). \]
Thus, our objective is now to show that
\[ \sum_{i=1}^{k} c_i \sum_{j \in J_i} m(A_j) = \sum_{i'=1}^{k'} c'_{i'} \sum_{j \in J'_{i'}} m(A_j). \tag{1.10} \]
To obtain this, we fix $1 \leq j \leq m$ and evaluate (1.9) at a point $x$ in the nonempty set $A_j$. At such a point, $1_{E_i}(x)$ is equal to $1_{J_i}(j)$, and similarly $1_{E'_{i'}}(x)$ is equal to $1_{J'_{i'}}(j)$. From (1.9) we conclude that
\[ \sum_{i=1}^{k} c_i 1_{J_i}(j) = \sum_{i'=1}^{k'} c'_{i'} 1_{J'_{i'}}(j). \]
Multiplying this by $m(A_j)$ and then summing over all $j = 1, \ldots, m$ we obtain (1.10).

We now make some important definitions that we will use repeatedly in this text:
Definition 1.3.5 (Almost everywhere and support). A property $P(x)$ of a point $x \in \mathbf{R}^d$ is said to hold (Lebesgue) almost everywhere in $\mathbf{R}^d$, or for (Lebesgue) almost every point $x \in \mathbf{R}^d$, if the set of $x \in \mathbf{R}^d$ for which $P(x)$ fails has Lebesgue measure zero (i.e. $P$ is true outside of a null set). We usually omit the prefix Lebesgue, and often abbreviate "almost everywhere" or "almost every" as a.e.

Two functions $f, g : \mathbf{R}^d \to Z$ into an arbitrary range $Z$ are said to agree almost everywhere if one has $f(x) = g(x)$ for almost every $x \in \mathbf{R}^d$.

The support of a function $f : \mathbf{R}^d \to \mathbf{C}$ or $f : \mathbf{R}^d \to [0, +\infty]$ is defined to be the set $\{x \in \mathbf{R}^d : f(x) \neq 0\}$ where $f$ is nonzero.

Note that if $P(x)$ holds for almost every $x$, and $P(x)$ implies $Q(x)$, then $Q(x)$ holds for almost every $x$. Also, if $P_1(x), P_2(x), \ldots$ are an at most countable family of properties, each of which individually holds for almost every $x$, then they will simultaneously be true for almost every $x$, because the countable union of null sets is still a null set. Because of these properties, one can (as a rule of thumb) treat the almost universal quantifier "for almost every" as if it was the truly universal quantifier "for every", as long as one is only concatenating at most countably many properties together, and as long as one never specialises the free variable $x$ to a null set. Observe also that the property of agreeing almost everywhere is an equivalence relation, which we will refer to as almost everywhere equivalence.

In An epsilon of room, Vol. I we will also see the notion of the closed support of a function $f : \mathbf{R}^d \to \mathbf{C}$, defined as the closure of the support.

The following properties of the simple unsigned integral are easily obtained from the definitions:

Exercise 1.3.1 (Basic properties of the simple unsigned integral). Let $f, g : \mathbf{R}^d \to [0, +\infty]$ be simple unsigned functions.
(i) (Unsigned linearity) We have
\[ \mathrm{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\, dx = \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx + \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\, dx \]
and
\[ \mathrm{Simp} \int_{\mathbf{R}^d} c f(x)\, dx = c \times \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx \]
for all $c \in [0, +\infty]$.
(ii) (Finiteness) We have $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx < \infty$ if and only if $f$ is finite almost everywhere, and its support has finite measure.
(iii) (Vanishing) We have $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx = 0$ if and only if $f$ is zero almost everywhere.
(iv) (Equivalence) If $f$ and $g$ agree almost everywhere, then $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx = \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\, dx$.
(v) (Monotonicity) If $f(x) \leq g(x)$ for almost every $x \in \mathbf{R}^d$, then $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx \leq \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\, dx$.
(vi) (Compatibility with Lebesgue measure) For any Lebesgue measurable $E$, one has $\mathrm{Simp} \int_{\mathbf{R}^d} 1_E(x)\, dx = m(E)$.
Furthermore, show that the simple unsigned integral $f \mapsto \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$ is the only map from the space $\mathrm{Simp}^+(\mathbf{R}^d)$ of unsigned simple functions to $[0, +\infty]$ that obeys all of the above properties.

We can now define an absolutely convergent counterpart to the simple unsigned integral. This integral will soon be superseded by the absolutely convergent Lebesgue integral, but we give it here as motivation for that more general notion of integration.

Definition 1.3.6 (Absolutely convergent simple integral). A complex-valued simple function $f : \mathbf{R}^d \to \mathbf{C}$ is said to be absolutely integrable if $\mathrm{Simp} \int_{\mathbf{R}^d} |f(x)|\, dx < \infty$. If $f$ is absolutely integrable, the integral $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$ is defined for real signed $f$ by the formula
\[ \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx := \mathrm{Simp} \int_{\mathbf{R}^d} f_+(x)\, dx - \mathrm{Simp} \int_{\mathbf{R}^d} f_-(x)\, dx \]
where $f_+(x) := \max(f(x), 0)$ and $f_-(x) := \max(-f(x), 0)$ (note that these are unsigned simple functions that are pointwise dominated by $|f|$ and thus have finite integral), and for complex-valued $f$ by the formula¹³
\[ \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx := \mathrm{Simp} \int_{\mathbf{R}^d} \mathrm{Re}\, f(x)\, dx + i\, \mathrm{Simp} \int_{\mathbf{R}^d} \mathrm{Im}\, f(x)\, dx. \]
Note from the preceding exercise that a complex-valued simple function $f$ is absolutely integrable if and only if it has finite measure support (since finiteness almost everywhere is automatic). In particular, the space $\mathrm{Simp}^{\mathrm{abs}}(\mathbf{R}^d)$ of absolutely integrable simple functions is closed under addition and scalar multiplication by complex numbers, and is thus a complex vector space.

The properties of the unsigned simple integral then can be used to deduce analogous properties for the complex-valued integral:

Exercise 1.3.2 (Basic properties of the complex-valued simple integral). Let $f, g : \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable simple functions.
(i) (*-linearity) We have
\[ \mathrm{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\, dx = \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx + \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\, dx \]
and
\[ \mathrm{Simp} \int_{\mathbf{R}^d} c f(x)\, dx = c \times \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx \tag{1.11} \]
for all $c \in \mathbf{C}$. Also we have
\[ \mathrm{Simp} \int_{\mathbf{R}^d} \overline{f}(x)\, dx = \overline{\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx}. \]
(ii) (Equivalence) If $f$ and $g$ agree almost everywhere, then $\mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx = \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\, dx$.

¹³ Strictly speaking, this is an abuse of notation as we have now defined the simple integral $\mathrm{Simp} \int_{\mathbf{R}^d}$ three different times, for unsigned, real signed, and complex-valued simple functions, but one easily verifies that these three definitions agree with each other on their common domains of definition, so it is safe to use a single notation for all three.
(iii) (Compatibility with Lebesgue measure) For any Lebesgue measurable $E$, one has $\mathrm{Simp} \int_{\mathbf{R}^d} 1_E(x)\, dx = m(E)$.
(Hints: Work out the real-valued counterpart of the linearity property first. To establish (1.11), treat the cases $c > 0$, $c = 0$, $c = -1$ separately. To deal with the additivity for real functions $f, g$, start with the identity
\[ f + g = (f + g)_+ - (f + g)_- = (f_+ - f_-) + (g_+ - g_-) \]
and rearrange the second equality so that no subtraction appears.) Furthermore, show that the complex-valued simple integral $f \mapsto \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\, dx$ is the only map from the space $\mathrm{Simp}^{\mathrm{abs}}(\mathbf{R}^d)$ of absolutely integrable simple functions to $\mathbf{C}$ that obeys all of the above properties.

We now comment further on the fact that (simple) functions that agree almost everywhere have the same integral. We can view this as an assertion that integration is a noise-tolerant operation: one can have "noise" or "errors" in a function $f(x)$ on a null set, and this will not affect the final value of the integral. Indeed, once one has this noise tolerance, one can even integrate functions $f$ that are not defined everywhere on $\mathbf{R}^d$, but merely defined almost everywhere on $\mathbf{R}^d$ (i.e. $f$ is defined on some set $\mathbf{R}^d \backslash N$ where $N$ is a null set), simply by extending $f$ to all of $\mathbf{R}^d$ in some arbitrary fashion (e.g. by setting $f$ equal to zero on $N$). This is extremely convenient for analysis, as there are many natural functions (e.g. $\frac{\sin x}{x}$ in one dimension, or $\frac{1}{|x|^\alpha}$ for various $\alpha > 0$ in higher dimensions) that are only defined almost everywhere instead of everywhere (often due to "division by zero" problems when a denominator vanishes). While such functions cannot be evaluated at certain singular points, they can still be integrated (provided they obey some integrability condition, of course, such as absolute integrability), and so one can still perform a large portion of analysis on such functions.

In fact, in the subfield of analysis known as functional analysis, it is convenient to abstract the notion of an almost everywhere defined function somewhat, by replacing any such function $f$ with the equivalence class of almost everywhere defined functions that are equal to $f$ almost everywhere. Such classes are then no longer functions in the
standard set-theoretic sense (they do not map each point in the domain to a unique point in the range, since points in $\mathbf{R}^d$ have measure zero), but the properties of various function spaces improve when one does this (various semi-norms become norms, various topologies become Hausdorff, and so forth). See §1.3 of An epsilon of room, Vol. I for further discussion.

Remark 1.3.7. The "Lebesgue philosophy" that one is willing to lose control on sets of measure zero is a perspective that distinguishes Lebesgue-type analysis from other types of analysis, most notably that of descriptive set theory, which is also interested in studying subsets of $\mathbf{R}^d$, but can give completely different structural classifications to a pair of sets that agree almost everywhere. This loss of control on null sets is the price one has to pay for gaining access to the powerful tool of the Lebesgue integral; if one needs to control a function at absolutely every point, and not just almost every point, then one often needs to use other tools than integration theory (unless one has some regularity on the function, such as continuity, that lets one pass from almost everywhere true statements to everywhere true statements).

1.3.2. Measurable functions. Much as the piecewise constant integral can be completed to the Riemann integral, the unsigned simple integral can be completed to the unsigned Lebesgue integral, by extending the class of unsigned simple functions to the larger class of unsigned Lebesgue measurable functions. One of the shortest ways to define this class is as follows:

Definition 1.3.8 (Unsigned measurable function). An unsigned function $f : \mathbf{R}^d \to [0, +\infty]$ is unsigned Lebesgue measurable, or measurable for short, if it is the pointwise limit of unsigned simple functions, i.e. if there exists a sequence $f_1, f_2, f_3, \ldots : \mathbf{R}^d \to [0, +\infty]$ of unsigned simple functions such that $f_n(x) \to f(x)$ for every $x \in \mathbf{R}^d$.

This particular definition is not always the most tractable. Fortunately, it has many equivalent forms:

Lemma 1.3.9 (Equivalent notions of measurability). Let $f : \mathbf{R}^d \to [0, +\infty]$ be an unsigned function. Then the following are equivalent:
(i) $f$ is unsigned Lebesgue measurable.
(ii) $f$ is the pointwise limit of unsigned simple functions $f_n$ (thus the limit $\lim_{n \to \infty} f_n(x)$ exists and is equal to $f(x)$ for all $x \in \mathbf{R}^d$).
(iii) $f$ is the pointwise almost everywhere limit of unsigned simple functions $f_n$ (thus the limit $\lim_{n \to \infty} f_n(x)$ exists and is equal to $f(x)$ for almost every $x \in \mathbf{R}^d$).
(iv) $f$ is the supremum $f(x) = \sup_n f_n(x)$ of an increasing sequence $0 \leq f_1 \leq f_2 \leq \ldots$ of unsigned simple functions $f_n$, each of which are bounded with finite measure support.
(v) For every $\lambda \in [0, +\infty]$, the set $\{x \in \mathbf{R}^d : f(x) > \lambda\}$ is Lebesgue measurable.
(vi) For every $\lambda \in [0, +\infty]$, the set $\{x \in \mathbf{R}^d : f(x) \geq \lambda\}$ is Lebesgue measurable.
(vii) For every $\lambda \in [0, +\infty]$, the set $\{x \in \mathbf{R}^d : f(x) < \lambda\}$ is Lebesgue measurable.
(viii) For every $\lambda \in [0, +\infty]$, the set $\{x \in \mathbf{R}^d : f(x) \leq \lambda\}$ is Lebesgue measurable.
(ix) For every interval $I \subset [0, +\infty)$, the set $f^{-1}(I) := \{x \in \mathbf{R}^d : f(x) \in I\}$ is Lebesgue measurable.
(x) For every (relatively) open set $U \subset [0, +\infty)$, the set $f^{-1}(U) := \{x \in \mathbf{R}^d : f(x) \in U\}$ is Lebesgue measurable.
(xi) For every (relatively) closed set $K \subset [0, +\infty)$, the set $f^{-1}(K) := \{x \in \mathbf{R}^d : f(x) \in K\}$ is Lebesgue measurable.

Proof. (i) and (ii) are equivalent by definition. (ii) clearly implies (iii). As every monotone sequence in $[0, +\infty]$ converges, (iv) implies (ii). Now we show that (iii) implies (v). If $f$ is the pointwise almost everywhere limit of $f_n$, then for almost every $x \in \mathbf{R}^d$ one has
\[ f(x) = \lim_{n \to \infty} f_n(x) = \limsup_{n \to \infty} f_n(x) = \inf_{N > 0} \sup_{n \geq N} f_n(x). \]
This implies that, for any $\lambda$, the set $\{x \in \mathbf{R}^d : f(x) > \lambda\}$ is equal to
\[ \bigcup_{M > 0} \bigcap_{N > 0} \Big\{ x \in \mathbf{R}^d : \sup_{n \geq N} f_n(x) > \lambda + \frac{1}{M} \Big\} \]
outside of a set of measure zero; this set in turn is equal to
\[ \bigcup_{M > 0} \bigcap_{N > 0} \bigcup_{n \geq N} \Big\{ x \in \mathbf{R}^d : f_n(x) > \lambda + \frac{1}{M} \Big\} \]
outside of a set of measure zero. But as each $f_n$ is an unsigned simple function, the sets $\{x \in \mathbf{R}^d : f_n(x) > \lambda + \frac{1}{M}\}$ are Lebesgue measurable. Since countable unions or countable intersections of Lebesgue measurable sets are Lebesgue measurable, and modifying a Lebesgue measurable set on a null set produces another Lebesgue measurable set, we obtain (v).

To obtain the equivalence of (v) and (vi), observe that
\[ \{x \in \mathbf{R}^d : f(x) \geq \lambda\} = \bigcap_{\lambda' \in \mathbf{Q}^+ : \lambda' < \lambda} \{x \in \mathbf{R}^d : f(x) > \lambda'\} \]
for $\lambda \in (0, +\infty]$ and
\[ \{x \in \mathbf{R}^d : f(x) > \lambda\} = \bigcup_{\lambda' \in \mathbf{Q}^+ : \lambda' > \lambda} \{x \in \mathbf{R}^d : f(x) \geq \lambda'\} \]
for $\lambda \in [0, +\infty)$, where $\mathbf{Q}^+ := \mathbf{Q} \cap [0, +\infty]$ are the nonnegative rationals. The claim then easily follows from the countable nature of $\mathbf{Q}^+$ (treating the extreme cases $\lambda = 0, +\infty$ separately if necessary). A similar argument lets one deduce (v) or (vi) from (ix).

The equivalence of (v), (vi) with (vii), (viii) comes from the observation that $\{x \in \mathbf{R}^d : f(x) \leq \lambda\}$ is the complement of $\{x \in \mathbf{R}^d : f(x) > \lambda\}$, and $\{x \in \mathbf{R}^d : f(x) < \lambda\}$ is the complement of $\{x \in \mathbf{R}^d : f(x) \geq \lambda\}$. By expressing an interval as the intersection of two half-intervals, we see that (ix) follows from (v)-(viii), and so all of (v)-(ix) are now shown to be equivalent.

Clearly (x) implies (vii), and hence (v)-(ix). Conversely, because every open set in $[0, +\infty)$ is the union of countably many open intervals in $[0, +\infty)$, (ix) implies (x). A similar argument shows that (x) and (xi) are equivalent.

The only remaining task is to show that (v)-(xi) implies (iv). Let $f$ obey (v)-(xi). For each positive integer $n$, we let $f_n(x)$ be defined to be the largest integer multiple of $2^{-n}$ that is less than or equal to $\min(f(x), n)$ when $|x| \leq n$, with $f_n(x) := 0$ for $|x| > n$.
From construction it is easy to see that the $f_n : \mathbf{R}^d \to [0,+\infty]$ are increasing and have $f$ as their supremum. Furthermore, each $f_n$ takes on only finitely many values, and for each non-zero value $c$ it attains, the set $f_n^{-1}(c)$ takes the form $f^{-1}(I_c) \cap \{x \in \mathbf{R}^d : |x| \le n\}$ for some interval or ray $I_c$, and is thus measurable. As a consequence, $f_n$ is a simple function, and by construction it is bounded and has finite measure support. The claim follows.

With these equivalent formulations, we can now generate plenty of measurable functions:

Exercise 1.3.3.
(i) Show that every continuous function $f : \mathbf{R}^d \to [0,+\infty]$ is measurable.
(ii) Show that every unsigned simple function is measurable.
(iii) Show that the supremum, infimum, limit superior, or limit inferior of unsigned measurable functions is unsigned measurable.
(iv) Show that an unsigned function that is equal almost everywhere to an unsigned measurable function is itself measurable.
(v) Show that if a sequence $f_n$ of unsigned measurable functions converges pointwise almost everywhere to an unsigned limit $f$, then $f$ is also measurable.
(vi) If $f : \mathbf{R}^d \to [0,+\infty]$ is measurable and $\phi : [0,+\infty] \to [0,+\infty]$ is continuous, show that $\phi \circ f : \mathbf{R}^d \to [0,+\infty]$ is measurable.
(vii) If $f, g$ are unsigned measurable functions, show that $f + g$ and $fg$ are measurable.

In view of Exercise 1.3.3(iv), one can define the concept of measurability for an unsigned function that is only defined almost everywhere on $\mathbf{R}^d$, rather than everywhere on $\mathbf{R}^d$, by extending that function arbitrarily to the null set where it is currently undefined.
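The truncated dyadic approximations $f_n$ used in the proof of Lemma 1.3.9 can be made concrete in one dimension. The following is only a minimal sketch: the helper name `dyadic_approx` and the sample function $f(x) = x^2$ are illustrative assumptions, not part of the text, and floating-point arithmetic stands in for exact real numbers.

```python
import math

def dyadic_approx(f, n):
    """Return f_n: the largest integer multiple of 2**-n that is
    <= min(f(x), n) when |x| <= n, and 0 when |x| > n (case d = 1)."""
    def f_n(x):
        if abs(x) > n:
            return 0.0
        capped = min(f(x), n)  # also handles f(x) == math.inf
        return math.floor(capped * 2**n) / 2**n
    return f_n

# Hypothetical sample function, chosen only for illustration.
f = lambda x: x * x

# Values f_1(1.3), ..., f_7(1.3): non-decreasing in n and bounded by f(1.3).
values = [dyadic_approx(f, n)(1.3) for n in range(1, 8)]
print(values)
```

At the sample point the values increase with $n$ and stay below $f(1.3)$, which is the behaviour the proof relies on; the cap at $n$ is what keeps each $f_n$ bounded even where $f$ is infinite.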
Exercise 1.3.4. Let $f : \mathbf{R}^d \to [0,+\infty]$. Show that $f$ is a bounded unsigned measurable function if and only if $f$ is the uniform limit of bounded simple functions.

Exercise 1.3.5. Show that an unsigned function $f : \mathbf{R}^d \to [0,+\infty]$ is a simple function if and only if it is measurable and takes on at most finitely many values.

Exercise 1.3.6. Let $f : \mathbf{R}^d \to [0,+\infty]$ be an unsigned measurable function. Show that the region $\{(x,t) \in \mathbf{R}^d \times \mathbf{R} : 0 \le t \le f(x)\}$ is a measurable subset of $\mathbf{R}^{d+1}$. (There is a converse to this statement, but we will wait until Exercise 1.7.24 to prove it, once we have the Fubini-Tonelli theorem (Corollary 1.7.23) available to us.)

Remark 1.3.10. Lemma 1.3.9 tells us that if $f : \mathbf{R}^d \to [0,+\infty]$ is measurable, then $f^{-1}(E)$ is Lebesgue measurable for many classes of sets $E$. However, we caution that it is not necessarily the case that $f^{-1}(E)$ is Lebesgue measurable if $E$ is Lebesgue measurable. To see this, we let $C$ be the Cantor set
$$C := \Big\{ \sum_{j=1}^\infty a_j 3^{-j} : a_j \in \{0,2\} \text{ for all } j \Big\}$$
and let $f : \mathbf{R} \to [0,+\infty]$ be the function defined by setting
$$f(x) := \sum_{j=1}^\infty 2 b_j 3^{-j}$$
whenever $x \in [0,1]$ is not a terminating binary decimal, and so has a unique binary expansion $x = \sum_{j=1}^\infty b_j 2^{-j}$ for some $b_j \in \{0,1\}$, and $f(x) := 0$ otherwise. We thus see that $f$ takes values in $C$, and is bijective on the set $A$ of non-terminating decimals in $[0,1]$. Using Lemma 1.3.9, it is not difficult to show that $f$ is measurable. On the other hand, by modifying the construction from the previous notes, we can find a subset $F$ of $A$ which is non-measurable. If we set $E := f(F)$, then $E$ is a subset of the null set $C$ and is thus itself a null set; but $f^{-1}(E) = F$ is non-measurable, and so the inverse image of a Lebesgue measurable set by a measurable function need not remain Lebesgue measurable.
However, we will later see that it is still true that $f^{-1}(E)$ is Lebesgue measurable if $E$ has a slightly stronger measurability property than Lebesgue measurability, namely Borel measurability; see Exercise 1.4.29(iii).

Now we can define the concept of a complex-valued measurable function. As discussed earlier, it will be convenient to allow for such functions to only be defined almost everywhere, rather than everywhere, to allow for the possibility that the function becomes singular or otherwise undefined on a null set.

Definition 1.3.11 (Complex measurability). An almost everywhere defined complex-valued function $f : \mathbf{R}^d \to \mathbf{C}$ is Lebesgue measurable, or measurable for short, if it is the pointwise almost everywhere limit of complex-valued simple functions.

As before, there are several equivalent definitions:

Exercise 1.3.7. Let $f : \mathbf{R}^d \to \mathbf{C}$ be an almost everywhere defined complex-valued function. Then the following are equivalent:
(i) $f$ is measurable.
(ii) $f$ is the pointwise almost everywhere limit of complex-valued simple functions.
(iii) The (magnitudes of the) positive and negative parts of $\mathrm{Re}(f)$ and $\mathrm{Im}(f)$ are unsigned measurable functions.
(iv) $f^{-1}(U)$ is Lebesgue measurable for every open set $U \subset \mathbf{C}$.
(v) $f^{-1}(K)$ is Lebesgue measurable for every closed set $K \subset \mathbf{C}$.

From the above exercise, we see that the notions of complex-valued measurability and unsigned measurability are compatible when applied to a function that takes values in $[0,+\infty) = [0,+\infty] \cap \mathbf{C}$ everywhere (or almost everywhere).

Exercise 1.3.8.
(i) Show that every continuous function $f : \mathbf{R}^d \to \mathbf{C}$ is measurable.
(ii) Show that a function $f : \mathbf{R}^d \to \mathbf{C}$ is simple if and only if it is measurable and takes on at most finitely many values.
(iii) Show that a complex-valued function that is equal almost everywhere to a measurable function is itself measurable.
(iv) Show that if a sequence $f_n$ of complex-valued measurable functions converges pointwise almost everywhere to a complex-valued limit $f$, then $f$ is also measurable.
(v) If $f : \mathbf{R}^d \to \mathbf{C}$ is measurable and $\phi : \mathbf{C} \to \mathbf{C}$ is continuous, show that $\phi \circ f : \mathbf{R}^d \to \mathbf{C}$ is measurable.
(vi) If $f, g$ are measurable functions, show that $f + g$ and $fg$ are measurable.

Exercise 1.3.9. Let $f : [a,b] \to \mathbf{R}$ be a Riemann integrable function. Show that if one extends $f$ to all of $\mathbf{R}$ by defining $f(x) = 0$ for $x \notin [a,b]$, then $f$ is measurable.

1.3.3. Unsigned Lebesgue integrals. We are now ready to integrate unsigned measurable functions. We begin with the notion of the lower unsigned Lebesgue integral, which can be defined for arbitrary unsigned functions (not necessarily measurable):

Definition 1.3.12 (Lower unsigned Lebesgue integral). Let $f : \mathbf{R}^d \to [0,+\infty]$ be an unsigned function (not necessarily measurable). We define the lower unsigned Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$ to be the quantity
$$\underline{\int_{\mathbf{R}^d}} f(x)\,dx := \sup_{0 \le g \le f;\ g \text{ simple}} \mathrm{Simp} \int_{\mathbf{R}^d} g(x)\,dx$$
where $g$ ranges over all unsigned simple functions $g : \mathbf{R}^d \to [0,+\infty]$ that are pointwise bounded by $f$.

One can also define the upper unsigned Lebesgue integral
$$\overline{\int_{\mathbf{R}^d}} f(x)\,dx := \inf_{h \ge f;\ h \text{ simple}} \mathrm{Simp} \int_{\mathbf{R}^d} h(x)\,dx$$
but we will use this integral much more rarely. Note that both integrals take values in $[0,+\infty]$, and that the upper Lebesgue integral is always at least as large as the lower Lebesgue integral.

In the definition of the lower unsigned Lebesgue integral, $g$ is required to be bounded by $f$ pointwise everywhere, but it is easy to
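see that one could also require $g$ to just be bounded by $f$ pointwise almost everywhere without affecting the value of the integral, since the simple integral is not affected by modifications on sets of measure zero.

The following properties of the lower Lebesgue integral are easy to establish:

Exercise 1.3.10 (Basic properties of the lower Lebesgue integral). Let $f, g : \mathbf{R}^d \to [0,+\infty]$ be unsigned functions (not necessarily measurable).
(i) (Compatibility with the simple integral) If $f$ is simple, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \overline{\int_{\mathbf{R}^d}} f(x)\,dx = \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\,dx$.
(ii) (Monotonicity) If $f \le g$ pointwise almost everywhere, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx \le \underline{\int_{\mathbf{R}^d}} g(x)\,dx$ and $\overline{\int_{\mathbf{R}^d}} f(x)\,dx \le \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(iii) (Homogeneity) If $c \in [0,+\infty)$, then $\underline{\int_{\mathbf{R}^d}} c f(x)\,dx = c \underline{\int_{\mathbf{R}^d}} f(x)\,dx$. (The claim unfortunately fails for $c = +\infty$, but this is somewhat tricky to show.)
(iv) (Equivalence) If $f, g$ agree almost everywhere, then $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \underline{\int_{\mathbf{R}^d}} g(x)\,dx$ and $\overline{\int_{\mathbf{R}^d}} f(x)\,dx = \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(v) (Superadditivity) $\underline{\int_{\mathbf{R}^d}} f(x) + g(x)\,dx \ge \underline{\int_{\mathbf{R}^d}} f(x)\,dx + \underline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(vi) (Subadditivity of upper integral) $\overline{\int_{\mathbf{R}^d}} f(x) + g(x)\,dx \le \overline{\int_{\mathbf{R}^d}} f(x)\,dx + \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.
(vii) (Divisibility) For any measurable set $E$, one has $\underline{\int_{\mathbf{R}^d}} f(x)\,dx = \underline{\int_{\mathbf{R}^d}} f(x) 1_E(x)\,dx + \underline{\int_{\mathbf{R}^d}} f(x) 1_{\mathbf{R}^d \setminus E}(x)\,dx$.
(viii) (Horizontal truncation) As $n \to \infty$, $\underline{\int_{\mathbf{R}^d}} \min(f(x), n)\,dx$ converges to $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$.
(ix) (Vertical truncation) As $n \to \infty$, $\underline{\int_{\mathbf{R}^d}} f(x) 1_{|x| \le n}\,dx$ converges to $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$. Hint: From Exercise 1.2.11 one has $m(E \cap \{x : |x| \le n\}) \to m(E)$ for any measurable set $E$.
(x) (Reflection) If $f + g$ is a simple function that is bounded with finite measure support (i.e. it is absolutely integrable), then $\mathrm{Simp} \int_{\mathbf{R}^d} f(x) + g(x)\,dx = \underline{\int_{\mathbf{R}^d}} f(x)\,dx + \overline{\int_{\mathbf{R}^d}} g(x)\,dx$.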
Do the horizontal and vertical truncation properties hold if the lower Lebesgue integral is replaced with the upper Lebesgue integral?

Now we restrict attention to measurable functions.

Definition 1.3.13 (Unsigned Lebesgue integral). If $f : \mathbf{R}^d \to [0,+\infty]$ is measurable, we define the unsigned Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ of $f$ to equal the lower unsigned Lebesgue integral $\underline{\int_{\mathbf{R}^d}} f(x)\,dx$. (For non-measurable functions, we leave the unsigned Lebesgue integral undefined.)

One nice feature of measurable functions is that the lower and upper Lebesgue integrals can match, if one also assumes some boundedness:

Exercise 1.3.11. Let $f : \mathbf{R}^d \to [0,+\infty]$ be measurable, bounded, and vanishing outside of a set of finite measure. Show that the lower and upper Lebesgue integrals of $f$ agree. (Hint: use Exercise 1.3.4.) There is a converse to this statement, but we will defer it to later notes. What happens if $f$ is allowed to be unbounded, or is not supported inside a set of finite measure?

This gives an important corollary:

Corollary 1.3.14 (Finite additivity of the Lebesgue integral). Let $f, g : \mathbf{R}^d \to [0,+\infty]$ be measurable. Then
$$\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx.$$

Proof. From the horizontal truncation property and a limiting argument, we may assume that $f, g$ are bounded. From the vertical truncation property and another limiting argument, we may assume that $f, g$ are supported inside a bounded set. From Exercise 1.3.11, we now see that the lower and upper Lebesgue integrals of $f$, $g$, and $f + g$ agree. The claim now follows by combining the superadditivity of the lower Lebesgue integral with the subadditivity of the upper Lebesgue integral.

In the next section we will improve this finite additivity property for the unsigned Lebesgue integral further, to countable additivity;
this property is also known as the monotone convergence theorem (Theorem 1.4.44).

Exercise 1.3.12 (Upper Lebesgue integral and outer Lebesgue measure). Show that for any set $E \subset \mathbf{R}^d$, $\overline{\int_{\mathbf{R}^d}} 1_E(x)\,dx = m^*(E)$. Conclude that the upper and lower Lebesgue integrals are not necessarily additive if no measurability hypotheses are assumed.

Exercise 1.3.13 (Area interpretation of integral). If $f : \mathbf{R}^d \to [0,+\infty]$ is measurable, show that $\int_{\mathbf{R}^d} f(x)\,dx$ is equal to the $d+1$-dimensional Lebesgue measure of the region $\{(x,t) \in \mathbf{R}^d \times \mathbf{R} : 0 \le t \le f(x)\}$. (This can be used as an alternate, and more geometrically intuitive, definition of the unsigned Lebesgue integral; it is a more convenient formulation for establishing the basic convergence theorems, but not quite as convenient for establishing basic properties such as additivity.) (Hint: use Exercise 1.2.22.)

Exercise 1.3.14 (Uniqueness of the Lebesgue integral). Show that the Lebesgue integral $f \mapsto \int_{\mathbf{R}^d} f(x)\,dx$ is the only map from measurable unsigned functions $f : \mathbf{R}^d \to [0,+\infty]$ to $[0,+\infty]$ that obeys the following properties for measurable $f, g : \mathbf{R}^d \to [0,+\infty]$:
(i) (Compatibility with the simple integral) If $f$ is simple, then $\int_{\mathbf{R}^d} f(x)\,dx = \mathrm{Simp} \int_{\mathbf{R}^d} f(x)\,dx$.
(ii) (Finite additivity) $\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx$.
(iii) (Horizontal truncation) As $n \to \infty$, $\int_{\mathbf{R}^d} \min(f(x), n)\,dx$ converges to $\int_{\mathbf{R}^d} f(x)\,dx$.
(iv) (Vertical truncation) As $n \to \infty$, $\int_{\mathbf{R}^d} f(x) 1_{|x| \le n}\,dx$ converges to $\int_{\mathbf{R}^d} f(x)\,dx$.

Exercise 1.3.15 (Translation invariance). Let $f : \mathbf{R}^d \to [0,+\infty]$ be measurable. Show that $\int_{\mathbf{R}^d} f(x+y)\,dx = \int_{\mathbf{R}^d} f(x)\,dx$ for any $y \in \mathbf{R}^d$.

Exercise 1.3.16 (Linear change of variables). Let $f : \mathbf{R}^d \to [0,+\infty]$ be measurable, and let $T : \mathbf{R}^d \to \mathbf{R}^d$ be an invertible linear transformation. Show that $\int_{\mathbf{R}^d} f(T^{-1}(x))\,dx = |\det T| \int_{\mathbf{R}^d} f(x)\,dx$, or equivalently $\int_{\mathbf{R}^d} f(Tx)\,dx = \frac{1}{|\det T|} \int_{\mathbf{R}^d} f(x)\,dx$.
Exercise 1.3.17 (Compatibility with the Riemann integral). Let $f : [a,b] \to [0,+\infty]$ be Riemann integrable. If we extend $f$ to $\mathbf{R}$ by declaring $f$ to equal zero outside of $[a,b]$, show that $\int_{\mathbf{R}} f(x)\,dx = \int_a^b f(x)\,dx$.

We record a basic inequality, known as Markov's inequality, that asserts that the Lebesgue integral of an unsigned measurable function controls how often that function can be large:

Lemma 1.3.15 (Markov's inequality). Let $f : \mathbf{R}^d \to [0,+\infty]$ be measurable. Then for any $0 < \lambda < \infty$, one has
$$m(\{x \in \mathbf{R}^d : f(x) \ge \lambda\}) \le \frac{1}{\lambda} \int_{\mathbf{R}^d} f(x)\,dx.$$

Proof. We have the trivial pointwise inequality
$$\lambda 1_{\{x \in \mathbf{R}^d : f(x) \ge \lambda\}} \le f(x).$$
From the definition of the lower Lebesgue integral, we conclude that
$$\lambda\, m(\{x \in \mathbf{R}^d : f(x) \ge \lambda\}) \le \int_{\mathbf{R}^d} f(x)\,dx$$
and the claim follows.

By sending $\lambda$ to infinity or to zero, we obtain the following important corollary:

Exercise 1.3.18. Let $f : \mathbf{R}^d \to [0,+\infty]$ be measurable.
(i) Show that if $\int_{\mathbf{R}^d} f(x)\,dx < \infty$, then $f$ is finite almost everywhere. Give a counterexample to show that the converse statement is false.
(ii) Show that $\int_{\mathbf{R}^d} f(x)\,dx = 0$ if and only if $f$ is zero almost everywhere.

Remark 1.3.16. The use of the integral $\int_{\mathbf{R}^d} f(x)\,dx$ to control the distribution of $f$ is known as the first moment method. One can also control this distribution using higher moments such as $\int_{\mathbf{R}^d} |f(x)|^p\,dx$ for various values of $p$, or exponential moments such as $\int_{\mathbf{R}^d} e^{t f(x)}\,dx$ or the Fourier moments $\int_{\mathbf{R}^d} e^{i t f(x)}\,dx$ for various values of $t$; such moment methods are fundamental to probability theory.
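Markov's inequality (Lemma 1.3.15) can be checked by hand for an unsigned simple function, where both sides are finite sums. The sketch below assumes a representation that is not in the text: a simple function is stored as a list of hypothetical `(value, measure)` pairs, one per disjoint measurable piece, and the numbers are made up for illustration.

```python
def simple_integral(pieces):
    """Simple integral of sum_i c_i 1_{E_i}, given (c_i, m(E_i)) pairs
    for pairwise disjoint measurable sets E_i."""
    return sum(c * m for c, m in pieces)

def superlevel_measure(pieces, lam):
    """Measure of {x : f(x) >= lam} for the same simple function."""
    return sum(m for c, m in pieces if c >= lam)

# Illustrative data: f = 0.5 on a set of measure 4, 2 on measure 1.5,
# 7 on measure 0.25, and 0 elsewhere.
pieces = [(0.5, 4.0), (2.0, 1.5), (7.0, 0.25)]
total = simple_integral(pieces)

for lam in (0.25, 0.5, 1.0, 2.0, 5.0, 7.0, 10.0):
    lhs = lam * superlevel_measure(pieces, lam)
    # lam * m({f >= lam}) never exceeds the integral of f.
    print(lam, lhs, total, lhs <= total)
```

The comparison holds at every threshold because $\lambda 1_{\{f \ge \lambda\}} \le f$ pointwise, which is exactly the inequality used in the proof.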
1.3.4. Absolute integrability. Having set out the theory of the unsigned Lebesgue integral, we can now define the absolutely convergent Lebesgue integral.

Definition 1.3.17 (Absolute integrability). An almost everywhere defined measurable function $f : \mathbf{R}^d \to \mathbf{C}$ is said to be absolutely integrable if the unsigned integral
$$\|f\|_{L^1(\mathbf{R}^d)} := \int_{\mathbf{R}^d} |f(x)|\,dx$$
is finite. We refer to this quantity $\|f\|_{L^1(\mathbf{R}^d)}$ as the $L^1(\mathbf{R}^d)$ norm of $f$, and use $L^1(\mathbf{R}^d)$ or $L^1(\mathbf{R}^d \to \mathbf{C})$ to denote the space of absolutely integrable functions. If $f$ is real-valued and absolutely integrable, we define the Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ by the formula
$$(1.12) \qquad \int_{\mathbf{R}^d} f(x)\,dx := \int_{\mathbf{R}^d} f_+(x)\,dx - \int_{\mathbf{R}^d} f_-(x)\,dx$$
where $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ are the magnitudes of the positive and negative components of $f$ (note that the two unsigned integrals on the right-hand side are finite, as $f_+, f_-$ are pointwise dominated by $|f|$). If $f$ is complex-valued and absolutely integrable, we define the Lebesgue integral $\int_{\mathbf{R}^d} f(x)\,dx$ by the formula
$$\int_{\mathbf{R}^d} f(x)\,dx := \int_{\mathbf{R}^d} \mathrm{Re}\, f(x)\,dx + i \int_{\mathbf{R}^d} \mathrm{Im}\, f(x)\,dx$$
where the two integrals on the right are interpreted as real-valued absolutely integrable Lebesgue integrals. It is easy to see that the unsigned, real-valued, and complex-valued Lebesgue integrals defined in this manner are compatible on their common domains of definition.

Note from construction that the absolutely integrable Lebesgue integral extends the absolutely integrable simple integral, which is now redundant and will not be needed any further in the sequel.

Remark 1.3.18. One can attempt to define integrals for non-absolutely-integrable functions, analogous to the improper integrals $\int_0^\infty f(x)\,dx := \lim_{R \to \infty} \int_0^R f(x)\,dx$ or the principal value integrals $\mathrm{p.v.} \int_{-\infty}^\infty f(x)\,dx := \lim_{R \to \infty} \int_{-R}^R f(x)\,dx$ one sees in the classical one-dimensional Riemannian theory. While one can certainly generate any number of such extensions of the Lebesgue integral concept, such extensions tend
to be poorly behaved with respect to various important operations, such as change of variables or exchanging limits and integrals, so it is usually not worthwhile to try to set up a systematic theory for such non-absolutely-integrable integrals that is anywhere near as complete as the absolutely integrable theory, and instead deal with such exotic integrals on an ad hoc basis.

From the pointwise triangle inequality $|f(x) + g(x)| \le |f(x)| + |g(x)|$, we conclude the $L^1$ triangle inequality
$$(1.13) \qquad \|f + g\|_{L^1(\mathbf{R}^d)} \le \|f\|_{L^1(\mathbf{R}^d)} + \|g\|_{L^1(\mathbf{R}^d)}$$
for any almost everywhere defined measurable $f, g : \mathbf{R}^d \to \mathbf{C}$. It is also easy to see that
$$\|c f\|_{L^1(\mathbf{R}^d)} = |c|\, \|f\|_{L^1(\mathbf{R}^d)}$$
for any complex number $c$. As such, we see that $L^1(\mathbf{R}^d \to \mathbf{C})$ is a complex vector space. (The $L^1$ norm is then a seminorm on this space; see §1.3 of An epsilon of room, Vol. I.) From Exercise 1.3.18 we make the important observation that a function $f \in L^1(\mathbf{R}^d \to \mathbf{C})$ has zero $L^1$ norm, $\|f\|_{L^1(\mathbf{R}^d)} = 0$, if and only if $f$ is zero almost everywhere.

Given two functions $f, g \in L^1(\mathbf{R}^d \to \mathbf{C})$, we can define the $L^1$ distance $d_{L^1}(f,g)$ between them by the formula
$$d_{L^1}(f,g) := \|f - g\|_{L^1(\mathbf{R}^d)}.$$
Thanks to (1.13), this distance obeys almost all the axioms of a metric on $L^1(\mathbf{R}^d)$, with one exception: it is possible for two different functions $f, g \in L^1(\mathbf{R}^d \to \mathbf{C})$ to have a zero $L^1$ distance, if they agree almost everywhere. As such, $d_{L^1}$ is only a semi-metric (also known as a pseudo-metric) rather than a metric. However, if one adopts the convention that any two functions that agree almost everywhere are considered equivalent (or more formally, one works in the quotient space of $L^1(\mathbf{R}^d)$ by the equivalence relation of almost everywhere agreement, which by abuse of notation is also denoted $L^1(\mathbf{R}^d)$), then one recovers a genuine metric. (Later on, we will establish the important fact that this metric makes the (quotient space) $L^1(\mathbf{R}^d)$ a
complete metric space, a fact known as the $L^1$ Riesz-Fischer theorem; this completeness is one of the main reasons we spend so much effort setting up Lebesgue integration theory in the first place.)

The linearity properties of the unsigned integral induce analogous linearity properties of the absolutely convergent Lebesgue integral:

Exercise 1.3.19 (Integration is linear). Show that integration $f \mapsto \int_{\mathbf{R}^d} f(x)\,dx$ is a (complex) linear operation from $L^1(\mathbf{R}^d)$ to $\mathbf{C}$. In other words, show that
$$\int_{\mathbf{R}^d} f(x) + g(x)\,dx = \int_{\mathbf{R}^d} f(x)\,dx + \int_{\mathbf{R}^d} g(x)\,dx$$
and
$$\int_{\mathbf{R}^d} c f(x)\,dx = c \int_{\mathbf{R}^d} f(x)\,dx$$
for all absolutely integrable $f, g : \mathbf{R}^d \to \mathbf{C}$ and complex numbers $c$. Also establish the identity
$$\int_{\mathbf{R}^d} \overline{f(x)}\,dx = \overline{\int_{\mathbf{R}^d} f(x)\,dx},$$
which makes integration not just a linear operation, but a $*$-linear operation.

Exercise 1.3.20. Show that Exercises 1.3.15, 1.3.16, and 1.3.17 also hold for complex-valued, absolutely integrable functions rather than for unsigned measurable functions.

Exercise 1.3.21 (Absolute summability is a special case of absolute integrability). Let $(c_n)_{n \in \mathbf{Z}}$ be a doubly infinite sequence of complex numbers, and let $f : \mathbf{R} \to \mathbf{C}$ be the function
$$f(x) := \sum_{n \in \mathbf{Z}} c_n 1_{[n,n+1)}(x) = c_{\lfloor x \rfloor}$$
where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. Show that $f$ is absolutely integrable if and only if the series $\sum_{n \in \mathbf{Z}} c_n$ is absolutely convergent, in which case one has $\int_{\mathbf{R}} f(x)\,dx = \sum_{n \in \mathbf{Z}} c_n$.

We can localise the absolutely convergent integral to any measurable subset $E$ of $\mathbf{R}^d$. Indeed, if $f : E \to \mathbf{C}$ is a function, we say that $f$ is measurable (resp. absolutely integrable) if its extension
$\tilde f : \mathbf{R}^d \to \mathbf{C}$ is measurable (resp. absolutely integrable), where $\tilde f(x)$ is defined to equal $f(x)$ when $x \in E$ and zero otherwise, and then we define $\int_E f(x)\,dx := \int_{\mathbf{R}^d} \tilde f(x)\,dx$. Thus, for instance, the absolutely integrable analogue of Exercise 1.3.17 tells us that
$$\int_a^b f(x)\,dx = \int_{[a,b]} f(x)\,dx$$
for any Riemann-integrable $f : [a,b] \to \mathbf{C}$.

Exercise 1.3.22. If $E, F$ are disjoint measurable subsets of $\mathbf{R}^d$, and $f : E \cup F \to \mathbf{C}$ is absolutely integrable, show that
$$\int_E f(x)\,dx = \int_{E \cup F} f(x) 1_E(x)\,dx$$
and
$$\int_E f(x)\,dx + \int_F f(x)\,dx = \int_{E \cup F} f(x)\,dx.$$

We will study the properties of the absolutely convergent Lebesgue integral in more detail in later notes, as a special case of the more general Lebesgue integration theory on abstract measure spaces. For now, we record one very basic inequality:

Lemma 1.3.19 (Triangle inequality). Let $f \in L^1(\mathbf{R}^d \to \mathbf{C})$. Then
$$\Big| \int_{\mathbf{R}^d} f(x)\,dx \Big| \le \int_{\mathbf{R}^d} |f(x)|\,dx.$$

Proof. If $f$ is real-valued, then $|f| = f_+ + f_-$ and the claim is obvious from (1.12). When $f$ is complex-valued, one cannot argue quite so simply; a naive mimicking of the real-valued argument would lose a factor of 2, giving the inferior bound
$$\Big| \int_{\mathbf{R}^d} f(x)\,dx \Big| \le 2 \int_{\mathbf{R}^d} |f(x)|\,dx.$$
To do better, we exploit the phase rotation invariance properties of the absolute value operation and of the integral, as follows. Note that for any complex number $z$, one can write $|z|$ as $z e^{i\theta}$ for some real $\theta$. In particular, we have
$$\Big| \int_{\mathbf{R}^d} f(x)\,dx \Big| = e^{i\theta} \int_{\mathbf{R}^d} f(x)\,dx = \int_{\mathbf{R}^d} e^{i\theta} f(x)\,dx$$
for some real $\theta$. Taking real parts of both sides, we obtain
$$\Big| \int_{\mathbf{R}^d} f(x)\,dx \Big| = \int_{\mathbf{R}^d} \mathrm{Re}(e^{i\theta} f(x))\,dx.$$
Since
$$\mathrm{Re}(e^{i\theta} f(x)) \le |e^{i\theta} f(x)| = |f(x)|,$$
we obtain the claim.

1.3.5. Littlewood's three principles. Littlewood's three principles are informal heuristics that convey much of the basic intuition behind the measure theory of Lebesgue. Briefly, the three principles are as follows:
(i) Every (measurable) set is nearly a finite sum of intervals;
(ii) Every (absolutely integrable) function is nearly continuous; and
(iii) Every (pointwise) convergent sequence of functions is nearly uniformly convergent.

Various manifestations of the first principle were given in Exercise 1.2.7 and Exercise 1.2.16. Now we turn to the second principle. Define a step function to be a finite linear combination of indicator functions $1_B$ of boxes $B$.

Theorem 1.3.20 (Approximation of $L^1$ functions). Let $f \in L^1(\mathbf{R}^d)$ and $\varepsilon > 0$.
(i) There exists an absolutely integrable simple function $g$ such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.
(ii) There exists a step function $g$ such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.
(iii) There exists a continuous, compactly supported $g$ such that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$.

To put things another way, the absolutely integrable simple functions, the step functions, and the continuous, compactly supported functions are all dense subsets of $L^1(\mathbf{R}^d)$ with respect to the $L^1(\mathbf{R}^d)$ (semi-)metric. In §1.13 of An epsilon of room, Vol. I it is shown that a similar statement holds if one replaces continuous, compactly supported functions with smooth, compactly supported functions, also known as test functions; this is an important fact for the theory of distributions.
Proof. We begin with part (i). When $f$ is unsigned, we see from the definition of the lower Lebesgue integral that there exists an unsigned simple function $g$ such that $g \le f$ (so, in particular, $g$ is absolutely integrable) and
$$\int_{\mathbf{R}^d} g(x)\,dx \ge \int_{\mathbf{R}^d} f(x)\,dx - \varepsilon,$$
which by linearity implies that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$. This gives (i) when $f$ is unsigned. The case when $f$ is real-valued then follows by splitting $f$ into positive and negative parts (and adjusting $\varepsilon$ as necessary), and the case when $f$ is complex-valued then follows by splitting $f$ into real and imaginary parts (and adjusting $\varepsilon$ yet again).

To establish part (ii), we see from (i) and the triangle inequality in $L^1$ that it suffices to show this when $f$ is an absolutely integrable simple function. By linearity (and more applications of the triangle inequality), it then suffices to show this when $f = 1_E$ is the indicator function of a measurable set $E \subset \mathbf{R}^d$ of finite measure. But then, by Exercise 1.2.16, such a set can be approximated (up to an error of measure at most $\varepsilon$) by an elementary set, and the claim follows.

To establish part (iii), we see from (ii) and the argument from the preceding paragraph that it suffices to show this when $f = 1_E$ is the indicator function of a box. But one can then establish the claim by direct construction. Indeed, if one makes a slightly larger box $F$ that contains the closure of $E$ in its interior, but has a volume at most $\varepsilon$ more than that of $E$, then one can directly construct a piecewise linear continuous function $g$ supported on $F$ that equals $1$ on $E$ (e.g. one can set $g(x) = \max(1 - R\, \mathrm{dist}(x,E), 0)$ for some sufficiently large $R$; one may also invoke Urysohn's lemma, see §1.10 of An epsilon of room, Vol. I). It is then clear from construction that $\|f - g\|_{L^1(\mathbf{R}^d)} \le \varepsilon$ as required.

This is not the only way to make Littlewood's second principle manifest; we return to this point shortly. For now, we turn to Littlewood's third principle. We recall three basic ways in which a sequence $f_n : \mathbf{R}^d \to \mathbf{C}$ of functions can converge to a limit $f : \mathbf{R}^d \to \mathbf{C}$:
(i) (Pointwise convergence) $f_n(x) \to f(x)$ for every $x \in \mathbf{R}^d$.
(ii) (Pointwise almost everywhere convergence) $f_n(x) \to f(x)$ for almost every $x \in \mathbf{R}^d$.
(iii) (Uniform convergence) For every $\varepsilon > 0$, there exists $N$ such that $|f_n(x) - f(x)| \le \varepsilon$ for all $n \ge N$ and all $x \in \mathbf{R}^d$.

Uniform convergence implies pointwise convergence, which in turn implies pointwise almost everywhere convergence.

We now add a fourth mode of convergence, that is weaker than uniform convergence but stronger than pointwise convergence:

Definition 1.3.21 (Locally uniform convergence). A sequence of functions $f_n : \mathbf{R}^d \to \mathbf{C}$ converges locally uniformly to a limit $f : \mathbf{R}^d \to \mathbf{C}$ if, for every bounded subset $E$ of $\mathbf{R}^d$, $f_n$ converges uniformly to $f$ on $E$. In other words, for every bounded $E \subset \mathbf{R}^d$ and every $\varepsilon > 0$, there exists $N > 0$ such that $|f_n(x) - f(x)| \le \varepsilon$ for all $n \ge N$ and $x \in E$.

Remark 1.3.22. At least as far as $\mathbf{R}^d$ is concerned, an equivalent definition of local uniform convergence is: $f_n$ converges locally uniformly to $f$ if, for every point $x_0 \in \mathbf{R}^d$, there exists an open neighbourhood $U$ of $x_0$ such that $f_n$ converges uniformly to $f$ on $U$. The equivalence of the two definitions is immediate from the Heine-Borel theorem. More generally, the adverb "locally" in mathematics is usually used in this fashion; a property $P$ is said to hold locally on some domain $X$ if, for every point $x_0$ in that domain, there is an open neighbourhood of $x_0$ in $X$ on which $P$ holds.

One should caution, though, that on domains on which the Heine-Borel theorem does not hold, the bounded-set notion of local uniform convergence is not equivalent to the open-set notion of local uniform convergence (though, for locally compact spaces, one can recover equivalence if one replaces "bounded" by "compact").

Example 1.3.23. The functions $x \mapsto x/n$ on $\mathbf{R}$ for $n = 1, 2, \ldots$ converge locally uniformly (and hence pointwise) to zero on $\mathbf{R}$, but do not converge uniformly.

Example 1.3.24. The partial sums $\sum_{n=0}^N \frac{x^n}{n!}$ of the Taylor series $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$ converge to $e^x$ locally uniformly (and hence pointwise) on $\mathbf{R}$, but not uniformly.
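The gap between locally uniform and uniform convergence for the functions $x \mapsto x/n$ can be seen numerically. This is a rough sketch only: the helper `sup_abs_on_grid` is a hypothetical name, and a finite grid merely approximates a supremum over a bounded interval.

```python
def sup_abs_on_grid(n, radius, samples=2001):
    """Approximate the supremum of |x/n| over |x| <= radius
    using an evenly spaced grid that includes both endpoints."""
    step = 2.0 * radius / (samples - 1)
    return max(abs((-radius + k * step) / n) for k in range(samples))

ns = (1, 10, 100, 1000)

# On the bounded set [-10, 10] the sup error is about 10/n, which tends to 0.
local_sups = [sup_abs_on_grid(n, radius=10.0) for n in ns]

# On all of R the sup error never drops below 1: evaluate at x = n.
far_values = [abs(float(n) / n) for n in ns]

print(local_sups)
print(far_values)
```

The first list shrinks with $n$ while the second stays at $1$, matching the claim that convergence is uniform on each bounded set but not on the whole line.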
Example 1.3.25. The functions $f_n(x) := \frac{1}{nx} 1_{x > 0}$ for $n = 1, 2, \ldots$ (with the convention that $f_n(0) = 0$) converge pointwise everywhere to zero, but do not converge locally uniformly.

From the preceding example, we see that pointwise convergence (either everywhere or almost everywhere) is a weaker concept than local uniform convergence. Nevertheless, a remarkable theorem of Egorov, which demonstrates Littlewood's third principle, asserts that one can recover local uniform convergence as long as one is willing to delete a set of small measure:

Theorem 1.3.26 (Egorov's theorem). Let $f_n : \mathbf{R}^d \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise almost everywhere to another function $f : \mathbf{R}^d \to \mathbf{C}$, and let $\varepsilon > 0$. Then there exists a Lebesgue measurable set $A$ of measure at most $\varepsilon$, such that $f_n$ converges locally uniformly to $f$ outside of $A$.

Note that Example 1.3.25 demonstrates that the exceptional set $A$ in Egorov's theorem cannot be taken to have zero measure, at least if one uses the bounded-set definition of local uniform convergence from Definition 1.3.21. (If one instead takes the "open neighbourhood" definition, then the sequence in Example 1.3.25 does converge locally uniformly on $\mathbf{R} \setminus \{0\}$ in the open neighbourhood sense, even if it does not do so in the bounded-set sense. On a domain such as $\mathbf{R}^d \setminus A$, bounded-set locally uniform convergence implies open-neighbourhood locally uniform convergence, but not conversely, so for the purposes of applying Egorov's theorem, the distinction is not too important since one has local uniform convergence in both senses.)

Proof. By modifying $f_n$ and $f$ on a set of measure zero (that can be absorbed into $A$ at the end of the argument) we may assume that $f_n$ converges pointwise everywhere to $f$, thus for every $x \in \mathbf{R}^d$ and $m > 0$ there exists $N \ge 0$ such that $|f_n(x) - f(x)| \le 1/m$ for all $n \ge N$. We can rewrite this fact set-theoretically as
$$\bigcap_{N=0}^\infty E_{N,m} = \emptyset$$
for each $m$, where
$$E_{N,m} := \{x \in \mathbf{R}^d : |f_n(x) - f(x)| > 1/m \text{ for some } n \ge N\}.$$
It is clear that the $E_{N,m}$ are Lebesgue measurable, and are decreasing in $N$. Applying downward monotone convergence (Exercise 1.2.11(ii)) we conclude that, for any radius $R > 0$, one has
$$\lim_{N \to \infty} m(E_{N,m} \cap B(0,R)) = 0.$$
(The restriction to the ball $B(0,R)$ is necessary, because the downward monotone convergence property only works when the sets involved have finite measure.) In particular, for any $m \ge 1$, we can find $N_m$ such that
$$m(E_{N,m} \cap B(0,m)) \le \frac{\varepsilon}{2^m}$$
for all $N \ge N_m$.

Now let $A := \bigcup_{m=1}^\infty E_{N_m,m} \cap B(0,m)$. Then $A$ is Lebesgue measurable, and by countable subadditivity, $m(A) \le \varepsilon$. By construction, we have
$$|f_n(x) - f(x)| \le 1/m$$
whenever $m \ge 1$, $x \in \mathbf{R}^d \setminus A$, $|x| \le m$, and $n \ge N_m$. In particular, we see for any ball $B(0,m_0)$ with an integer radius, $f_n$ converges uniformly to $f$ on $B(0,m_0) \setminus A$. Since every bounded set is contained in such a ball, the claim follows.

Remark 1.3.27. Unfortunately, one cannot in general upgrade local uniform convergence to uniform convergence in Egorov's theorem. A basic example here is the moving bump example $f_n := 1_{[n,n+1]}$ on $\mathbf{R}$, which "escapes to horizontal infinity". This sequence converges pointwise (and locally uniformly) to the zero function $f \equiv 0$. However, for any $0 < \varepsilon < 1$ and any $n$, we have $|f_n(x) - f(x)| > \varepsilon$ on a set of measure $1$, namely on the interval $[n,n+1]$. Thus, if one wanted $f_n$ to converge uniformly to $f$ outside of a set $A$, then that set $A$ has to contain a set of measure $1$. In fact, it must contain the intervals $[n,n+1]$ for all sufficiently large $n$ and must therefore have infinite measure.

However, if all the $f_n$ and $f$ were supported on a fixed set $E$ of finite measure (e.g. on a ball $B(0,R)$), then the above "escape to horizontal infinity" cannot occur, and it is easy to see from the above argument that one can recover uniform convergence (and not just locally uniform convergence) outside of a set of arbitrarily small measure.
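Both phenomena around Egorov's theorem can be illustrated with closed-form computations. The sketch below uses hypothetical helper names and makes two assumptions that are not in the text: for Example 1.3.25 the deleted set is taken to be $A = (0,\varepsilon)$, and the supremum over $x \ge \varepsilon$ is evaluated at $x = \varepsilon$, where $1/(nx)$ is largest.

```python
def sup_error_after_deleting(n, eps):
    """Sup of |f_n - 0| over x >= eps for f_n(x) = 1/(n x) 1_{x > 0}.
    The function is decreasing in x, so the sup is attained at x = eps."""
    return 1.0 / (n * eps)

def moving_bump(n, x):
    """f_n = indicator of [n, n+1], as in the moving bump example."""
    return 1.0 if n <= x <= n + 1 else 0.0

eps = 0.01
# After deleting A = (0, eps), of measure eps, the sup error tends to 0.
errors = [sup_error_after_deleting(n, eps) for n in (10**2, 10**4, 10**6)]

# Moving bump: at a fixed point the values are eventually 0 ...
at_fixed_point = [moving_bump(n, 3.0) for n in range(4, 10)]
# ... but the bump always has height 1 somewhere, here at x = n + 0.5.
at_bump_centre = [moving_bump(n, n + 0.5) for n in range(4, 10)]

print(errors, at_fixed_point, at_bump_centre)
```

In the first case removing a small set restores uniform convergence; in the second the bump keeps height $1$ on an interval of length $1$ that moves off to infinity, so no set of finite measure can absorb it.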
We now use Theorem 1.3.20 to give another version of Littlewood's second principle, known as Lusin's theorem:

Theorem 1.3.28 (Lusin's theorem). Let $f : \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable, and let $\varepsilon > 0$. Then there exists a Lebesgue measurable set $E \subset \mathbf{R}^d$ of measure at most $\varepsilon$ such that the restriction of $f$ to the complementary set $\mathbf{R}^d \setminus E$ is continuous on that set.

Caution: this theorem does not imply that the unrestricted function $f$ is continuous on $\mathbf{R}^d \setminus E$. For instance, the absolutely integrable function $1_{\mathbf{Q}} : \mathbf{R} \to \mathbf{C}$ is nowhere continuous, so is certainly not continuous on $\mathbf{R} \setminus E$ for any $E$ of finite measure; but on the other hand, if one deletes the measure zero set $E := \mathbf{Q}$ from the reals, then the restriction of $f$ to $\mathbf{R} \setminus E$ is identically zero and thus continuous.

Proof. By Theorem 1.3.20, for any $n \ge 1$ one can find a continuous, compactly supported function $f_n$ such that $\|f - f_n\|_{L^1(\mathbf{R}^d)} \le \varepsilon/4^n$ (say). By Markov's inequality (Lemma 1.3.15), that implies that $|f(x) - f_n(x)| \le 1/2^{n-1}$ for all $x$ outside of a Lebesgue measurable set $A_n$ of measure at most $\varepsilon/2^{n+1}$. Letting $A := \bigcup_{n=1}^\infty A_n$, we conclude that $A$ is Lebesgue measurable with measure at most $\varepsilon/2$, and $f_n$ converges uniformly to $f$ outside of $A$. But the uniform limit of continuous functions is continuous, and the same is true for local uniform limits (because continuity is itself a local property). Taking $E := A$, we conclude that the restriction of $f$ to $\mathbf{R}^d \setminus E$ is continuous, as required.

Exercise 1.3.23. Show that the hypothesis that $f$ is absolutely integrable in Lusin's theorem can be relaxed to being locally absolutely integrable (i.e. absolutely integrable on every bounded set), and then relaxed further to that of being measurable (but still finite everywhere or almost everywhere). (To achieve the latter goal, one can replace $f$ locally with a horizontal truncation $f 1_{|f| \le n}$; alternatively, one can replace $f$ with a bounded variant, such as $\frac{f}{(1+|f|^2)^{1/2}}$.)

Exercise 1.3.24. Show that a function $f : \mathbf{R}^d \to \mathbf{C}$ is measurable if and only if it is the pointwise almost everywhere limit of continuous functions $f_n : \mathbf{R}^d \to \mathbf{C}$. (Hint: if $f : \mathbf{R}^d \to \mathbf{C}$ is measurable and $n \ge 1$, show that there exists a continuous function $f_n : \mathbf{R}^d \to \mathbf{C}$ for which the set $\{x \in B(0,n) : |f(x) - f_n(x)| \ge 1/n\}$ has measure at
+∞] with the extended topology). then Lusin’s theorem applies (since one can simply zero out f on the null set where it is inﬁnite. Remark 1. if one knows already that f is almost everywhere ﬁnite (which is for instance the case when f is absolutely integrable). However. R)\E.10 of An epsilon of room. .3. Show that there exists a measurable set E ⊂ Rd of measure at most ε outside of which f is locally bounded.3. I ). Measure theory most 21 .2. or in other words that Rd \B(0. Show that there exists a ball B(0.26) to give an alternate proof of Lusin’s theorem for arbitrary measurable functions.3. Remark 1. but are in a similar spirit. +∞].3.78 1. then Lusin’s theorem does not apply directly because f could be inﬁnite on a set of positive measure. Vol. and let ε > 0. The following facts are not. Theorem 1. R) outside of which f has an L1 norm of at most ε. This is a trivial but important remark: when dealing with unsigned measurable functions such as f : Rd → [0.29.25 (Littlewoodlike principles). (ii) (Measurable functions are almost locally bounded) Let f : Rd → C be a measurable function supported on a set of ﬁnite measure.R) f (x) dx ≤ ε. Exercise 1. which is clearly in contradiction with the conclusion of Lusin’s theorem (unless one allows the continuous function to also take values in the extended nonnegative reals [0.3.25 below to be useful for this. You may ﬁnd Exercise 1. By combining Lusin’s theorem with inner regularity (Exercise 1.30. one can conclude that every measurable function f : Rd → C agrees (outside of a set of arbitrarily small measure) with a continuous function g : Rd → C.15) and the Tietze extension theorem (see §1. and let ε > 0. strictly speaking. instances of any of Littlewood’s three principles. and add that null set to the exceptional set of Lusin’s theorem). or in other words that for every R > 0 there exists M < ∞ such that f (x) ≤ M for all x ∈ B(0.) n Use this (and Egorov’s theorem. 
(i) (Absolutely integrable functions almost have bounded support) Let f : Rd → C be an absolutely integrable function.
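As a concrete instance of part (i) of the exercise above (and not a proof of it), consider the specific function $f(x) = e^{-|x|}$ on $\mathbf{R}$, chosen here purely for illustration. Its tail integral is $\int_{|x| > R} e^{-|x|}\ dx = 2e^{-R}$, so any $R \geq \log(2/\varepsilon)$ works. The sketch below, with hypothetical helper names `tail_l1_norm` and `radius_for`, checks this for a few values of $\varepsilon$.

```python
import math

def tail_l1_norm(R):
    """Exact integral of exp(-|x|) over |x| > R (for R >= 0), which equals 2*exp(-R)."""
    return 2.0 * math.exp(-R)

def radius_for(eps):
    """A radius R >= 0 with tail_l1_norm(R) <= eps (assumes eps > 0)."""
    return max(0.0, math.log(2.0 / eps))

# Radii that make the L^1 norm outside B(0, R) at most eps, for several eps.
radii = {eps: radius_for(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)}
```

The required radius grows only logarithmically in $1/\varepsilon$ for this particular $f$; for a general absolutely integrable function no such rate is available, only the existence of some $R$.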
As with Remark 1.3.29, it is important in the second part of the exercise that $f$ is known to be finite everywhere (or at least almost everywhere); the result would of course fail if $f$ was, say, unsigned but took the value $+\infty$ on a set of positive measure.

1.4. Abstract measure spaces

Thus far, we have only focused on measure and integration theory in the context of Euclidean spaces $\mathbf{R}^d$. Now, we will work in a more abstract and general setting, in which the Euclidean space $\mathbf{R}^d$ is replaced by a more general space $X$.

It turns out that in order to properly define measure and integration on a general space $X$, it is not enough to just specify the set $X$. One also needs to specify two additional pieces of data:

(i) A collection $\mathcal{B}$ of subsets of $X$ that one is allowed to measure; and

(ii) The measure $\mu(E) \in [0,+\infty]$ one assigns to each measurable set $E \in \mathcal{B}$.

For instance, Lebesgue measure theory covers the case when $X$ is a Euclidean space $\mathbf{R}^d$, $\mathcal{B}$ is the collection $\mathcal{B} = \mathcal{L}[\mathbf{R}^d]$ of all Lebesgue measurable subsets of $\mathbf{R}^d$, and $\mu(E)$ is the Lebesgue measure $\mu(E) = m(E)$ of $E$.

The collection $\mathcal{B}$ has to obey a number of axioms (e.g. being closed with respect to countable unions) that make it a σ-algebra, which is a stronger variant of the more well-known concept of a Boolean algebra. Similarly, the measure $\mu$ has to obey a number of axioms (most notably, a countable additivity axiom) in order to obtain a measure and integration theory comparable to the Lebesgue theory on Euclidean spaces. When all these axioms are satisfied, the triple $(X, \mathcal{B}, \mu)$ is known as a measure space. These play much the same role in abstract measure theory that metric spaces or topological spaces play in abstract point-set topology, or that vector spaces play in abstract linear algebra.

On any measure space, one can set up the unsigned and absolutely convergent integrals in almost exactly the same way as was done in
the previous notes for the Lebesgue integral on Euclidean spaces, although the approximation theorems are largely unavailable at this level of generality due to the lack of such concepts as “elementary set” or “continuous function” for an abstract measure space. On the other hand, one does have the fundamental convergence theorems for the subject, namely Fatou's lemma, the monotone convergence theorem and the dominated convergence theorem, and we present these results here.

One question that will not be addressed much in this section is how one actually constructs interesting examples of measures. We will return to this issue in Section 1.7 (although one of the most powerful tools for such constructions, namely the Riesz representation theorem, will not be covered here, but instead in §1.10 of An epsilon of room, Vol. I).

1.4.1. Boolean algebras. We begin by recalling the concept of a Boolean algebra.

Definition 1.4.1 (Boolean algebras). Let $X$ be a set. A (concrete) Boolean algebra on $X$ is a collection $\mathcal{B}$ of subsets of $X$ which obeys the following properties:

(i) (Empty set) $\emptyset \in \mathcal{B}$.

(ii) (Complement) If $E \in \mathcal{B}$, then the complement $E^c := X \backslash E$ also lies in $\mathcal{B}$.

(iii) (Finite unions) If $E, F \in \mathcal{B}$, then $E \cup F \in \mathcal{B}$.

We sometimes say that $E$ is $\mathcal{B}$-measurable, or measurable with respect to $\mathcal{B}$, if $E \in \mathcal{B}$.

Given two Boolean algebras $\mathcal{B}, \mathcal{B}'$ on $X$, we say that $\mathcal{B}'$ is finer than, or a refinement of, $\mathcal{B}$, or that $\mathcal{B}$ is coarser than, a sub-algebra of, or a coarsening of $\mathcal{B}'$, if $\mathcal{B} \subset \mathcal{B}'$.

We have chosen a “minimalist” definition of a Boolean algebra, in which one is only assumed to be closed under two of the basic Boolean operations, namely complement and finite union. However, by using the laws of Boolean algebra (such as de Morgan's laws), it is easy to see that a Boolean algebra is also closed under other
Boolean algebra operations such as intersection $E \cap F$, set difference $E \backslash F$, and symmetric difference $E \Delta F$. So one could have placed these additional closure properties inside the definition of a Boolean algebra without any loss of generality. However, when we are verifying that a given collection $\mathcal{B}$ of sets is indeed a Boolean algebra, it is convenient to have as minimal a set of axioms as possible.

Remark 1.4.2. One can also consider abstract Boolean algebras $\mathcal{B}$, which do not necessarily live in an ambient domain $X$, but for which one has a collection of abstract Boolean operations such as meet $\wedge$ and join $\vee$ instead of the concrete operations of intersection $\cap$ and union $\cup$. We will not take this abstract perspective here, but see §2.3 of An epsilon of room, Vol. I for some further discussion of the relationship between concrete and abstract Boolean algebras, which is codified by Stone's theorem.

Example 1.4.3 (Trivial and discrete algebra). Given any set $X$, the coarsest Boolean algebra is the trivial algebra $\{\emptyset, X\}$, in which the only measurable sets are the empty set and the whole set. The finest Boolean algebra is the discrete algebra $2^X := \{E: E \subset X\}$, in which every set is measurable. All other Boolean algebras are intermediate between these two extremes: finer than the trivial algebra, but coarser than the discrete one.

Exercise 1.4.1 (Elementary algebra). Let $\mathcal{E}[\mathbf{R}^d]$ be the collection of those sets $E \subset \mathbf{R}^d$ that are either elementary sets, or co-elementary sets (i.e. the complement of an elementary set). Show that $\mathcal{E}[\mathbf{R}^d]$ is a Boolean algebra. We will call this algebra the elementary Boolean algebra of $\mathbf{R}^d$.

Example 1.4.4 (Jordan algebra). Let $\mathcal{J}[\mathbf{R}^d]$ be the collection of subsets of $\mathbf{R}^d$ that are either Jordan measurable or co-Jordan measurable (i.e. the complement of a Jordan measurable set). Then $\mathcal{J}[\mathbf{R}^d]$ is a Boolean algebra that is finer than the elementary algebra. We refer to this algebra as the Jordan algebra on $\mathbf{R}^d$ (but caution that there is a completely different concept of a Jordan algebra in abstract algebra).

Example 1.4.5 (Lebesgue algebra). Let $\mathcal{L}[\mathbf{R}^d]$ be the collection of Lebesgue measurable subsets of $\mathbf{R}^d$. Then $\mathcal{L}[\mathbf{R}^d]$ is a Boolean algebra
that is finer than the Jordan algebra; we refer to this as the Lebesgue algebra on $\mathbf{R}^d$.

Example 1.4.6 (Null algebra). Let $\mathcal{N}(\mathbf{R}^d)$ be the collection of subsets of $\mathbf{R}^d$ that are either Lebesgue null sets or Lebesgue co-null sets (the complement of null sets). Then $\mathcal{N}(\mathbf{R}^d)$ is a Boolean algebra that is coarser than the Lebesgue algebra; we refer to it as the null algebra on $\mathbf{R}^d$.

Exercise 1.4.2 (Restriction). Let $\mathcal{B}$ be a Boolean algebra on a set $X$, and let $Y$ be a subset of $X$ (not necessarily $\mathcal{B}$-measurable). Show that the restriction $\mathcal{B}|_Y := \{E \cap Y: E \in \mathcal{B}\}$ of $\mathcal{B}$ to $Y$ is a Boolean algebra on $Y$. If $Y$ is $\mathcal{B}$-measurable, show that
$$\mathcal{B}|_Y = \mathcal{B} \cap 2^Y = \{E \subset Y: E \in \mathcal{B}\}.$$

Example 1.4.7 (Atomic algebra). Let $X$ be partitioned into a union $X = \bigcup_{\alpha \in I} A_\alpha$ of disjoint sets $A_\alpha$, which we refer to as atoms. Then this partition generates a Boolean algebra $\mathcal{A}((A_\alpha)_{\alpha \in I})$, defined as the collection of all the sets $E$ of the form $E = \bigcup_{\alpha \in J} A_\alpha$ for some $J \subset I$, i.e. $\mathcal{A}((A_\alpha)_{\alpha \in I})$ is the collection of all sets that can be represented as the union of zero or more atoms. This is easily verified to be a Boolean algebra, and we refer to it as the atomic algebra with atoms $(A_\alpha)_{\alpha \in I}$. The trivial algebra corresponds to the trivial partition $X = X$ into a single atom; at the other extreme, the discrete algebra corresponds to the discrete partition $X = \bigcup_{x \in X} \{x\}$ into singleton atoms. More generally, note that finer (resp. coarser) partitions lead to finer (resp. coarser) atomic algebras. In this definition, we permit some of the atoms in the partition to be empty; but it is clear that empty atoms have no impact on the final atomic algebra, and so without loss of generality one can delete all empty atoms and assume that all atoms are non-empty if one wishes.

Example 1.4.8 (Dyadic algebras). Let $n$ be an integer. The dyadic algebra $\mathcal{D}_n(\mathbf{R}^d)$ at scale $2^{-n}$ in $\mathbf{R}^d$ is defined to be the atomic algebra generated by the half-open dyadic cubes
$$\left[\frac{i_1}{2^n}, \frac{i_1+1}{2^n}\right) \times \ldots \times \left[\frac{i_d}{2^n}, \frac{i_d+1}{2^n}\right)$$
of length $2^{-n}$ (see Exercise 1.1.14). These are Boolean algebras which are increasing in $n$: $\mathcal{D}_{n+1} \supset \mathcal{D}_n$. Draw a diagram to indicate how these algebras sit in relation to the elementary, Jordan, Lebesgue, null, discrete, and trivial algebras.

Remark 1.4.9. The dyadic algebras are analogous to the finite resolution one has on modern computer monitors, which subdivide space into square pixels. A low resolution monitor (in which each pixel has a large size) can only resolve a very small set of “blocky” images, as opposed to the larger class of images that can be resolved by a finer resolution monitor.

Exercise 1.4.3. Show that the non-empty atoms of an atomic algebra are determined up to relabeling. More precisely, show that if $X = \bigcup_{\alpha \in I} A_\alpha = \bigcup_{\alpha' \in I'} A'_{\alpha'}$ are two partitions of $X$ into non-empty atoms $A_\alpha$, $A'_{\alpha'}$, then $\mathcal{A}((A_\alpha)_{\alpha \in I}) = \mathcal{A}((A'_{\alpha'})_{\alpha' \in I'})$ if and only if there exists a bijection $\phi: I \to I'$ such that $A'_{\phi(\alpha)} = A_\alpha$ for all $\alpha \in I$.

While many Boolean algebras are atomic, many are not, as the following two exercises indicate.

Exercise 1.4.4. Show that every finite Boolean algebra is an atomic algebra. (A Boolean algebra $\mathcal{B}$ is finite if its cardinality is finite, i.e. there are only finitely many measurable sets.) Conclude that every finite Boolean algebra has a cardinality of the form $2^n$ for some natural number $n$. From this exercise and Exercise 1.4.3 we see that there is a one-to-one correspondence between finite Boolean algebras on $X$ and finite partitions of $X$ into non-empty sets (up to relabeling).

Exercise 1.4.5. Show that the elementary, Jordan, Lebesgue, and null algebras are not atomic algebras. (Hint: argue by contradiction. If these algebras were atomic, what must the atoms be?)

Now we describe some further ways to generate Boolean algebras.

Exercise 1.4.6 (Intersection of algebras). Let $(\mathcal{B}_\alpha)_{\alpha \in I}$ be a family of Boolean algebras on a set $X$, indexed by a (possibly infinite or uncountable) label set $I$. Show that the intersection $\bigwedge_{\alpha \in I} \mathcal{B}_\alpha := \bigcap_{\alpha \in I} \mathcal{B}_\alpha$ of these algebras is still a Boolean algebra, and is the finest
Boolean algebra that is coarser than all of the $\mathcal{B}_\alpha$. (If $I$ is empty, we adopt the convention that $\bigwedge_{\alpha \in I} \mathcal{B}_\alpha$ is the discrete algebra.)

Definition 1.4.10 (Generation of algebras). Let $\mathcal{F}$ be any family of sets in $X$. We define $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ to be the intersection of all the Boolean algebras that contain $\mathcal{F}$, which is again a Boolean algebra by Exercise 1.4.6. Equivalently, $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the coarsest Boolean algebra that contains $\mathcal{F}$. We say that $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the Boolean algebra generated by $\mathcal{F}$.

Example 1.4.11. $\mathcal{F}$ is a Boolean algebra if and only if $\langle \mathcal{F} \rangle_{\mathrm{bool}} = \mathcal{F}$; thus each Boolean algebra is generated by itself.

Exercise 1.4.7. Show that the elementary algebra $\mathcal{E}(\mathbf{R}^d)$ is generated by the collection of boxes in $\mathbf{R}^d$.

Exercise 1.4.8. Let $n$ be a natural number. Show that if $\mathcal{F}$ is a finite collection of $n$ sets, then $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is a finite Boolean algebra of cardinality at most $2^{2^n}$ (in particular, finite sets generate finite algebras). Give an example to show that this bound is best possible. (Hint: for the latter, it may be convenient to use a discrete ambient space such as the discrete cube $X = \{0,1\}^n$.)

The Boolean algebra $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ can be described explicitly in terms of $\mathcal{F}$ as follows:

Exercise 1.4.9 (Recursive description of a generated Boolean algebra). Let $\mathcal{F}$ be a collection of sets in a set $X$. Define the sets $\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots$ recursively as follows:

(i) $\mathcal{F}_0 := \mathcal{F}$.

(ii) For each $n \geq 1$, we define $\mathcal{F}_n$ to be the collection of all sets that are either the union of a finite number of sets in $\mathcal{F}_{n-1}$ (including the empty union $\emptyset$), or the complement of such a union.

Show that $\langle \mathcal{F} \rangle_{\mathrm{bool}} = \bigcup_{n=0}^\infty \mathcal{F}_n$.
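When $X$ is finite, the recursive description above can be run directly on a computer. The following is a minimal sketch, not part of the original text: the function name `generated_algebra` is hypothetical, and it closes a family under complements and pairwise unions until nothing new appears, which for finite $X$ yields $\langle \mathcal{F} \rangle_{\mathrm{bool}}$. It is then tried on the discrete cube $\{0,1\}^n$ suggested in the hint to Exercise 1.4.8, with $\mathcal{F}$ taken (as an assumption of this example) to be the $n$ coordinate sets $\{x: x_i = 1\}$.

```python
from itertools import product

def generated_algebra(X, family):
    """Smallest collection containing `family` and the empty set that is
    closed under complement in X and pairwise union (X assumed finite)."""
    X = frozenset(X)
    algebra = {frozenset()} | {frozenset(E) for E in family}
    while True:
        new = {X - E for E in algebra}                      # complements
        new |= {E | F for E in algebra for F in algebra}    # finite unions
        if new <= algebra:          # nothing new: closure reached
            return algebra
        algebra |= new

n = 2
cube = list(product((0, 1), repeat=n))                         # X = {0,1}^n
coords = [{x for x in cube if x[i] == 1} for i in range(n)]    # n generating sets
algebra = generated_algebra(cube, coords)
```

For $n = 2$ the result is the full discrete algebra on the four-point cube, so its cardinality is $16 = 2^{2^2}$, consistent with the bound in Exercise 1.4.8 being attained for this choice of $\mathcal{F}$.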
1.4.2. σ-algebras and measurable spaces. In order to obtain a measure and integration theory that can cope well with limits, the finite union axiom of a Boolean algebra is insufficient, and must be improved to a countable union axiom:

Definition 1.4.12 (Sigma algebras). Let $X$ be a set. A σ-algebra on $X$ is a collection $\mathcal{B}$ of subsets of $X$ which obeys the following properties:

(i) (Empty set) $\emptyset \in \mathcal{B}$.

(ii) (Complement) If $E \in \mathcal{B}$, then the complement $E^c := X \backslash E$ also lies in $\mathcal{B}$.

(iii) (Countable unions) If $E_1, E_2, \ldots \in \mathcal{B}$, then $\bigcup_{n=1}^\infty E_n \in \mathcal{B}$.

We refer to the pair $(X, \mathcal{B})$ of a set $X$ together with a σ-algebra on that set as a measurable space.

Remark 1.4.13. The prefix σ usually denotes “countable union”. Other instances of this prefix include a σ-compact topological space (a countable union of compact sets), a σ-finite measure space (a countable union of sets of finite measure), or an $F_\sigma$ set (a countable union of closed sets).

From de Morgan's law (which is just as valid for infinite unions and intersections as it is for finite ones), we see that σ-algebras are closed under countable intersections as well as countable unions.

By padding a finite union into a countable union by using the empty set, we see that every σ-algebra is automatically a Boolean algebra. Thus, we automatically inherit the notion of being measurable with respect to a σ-algebra, or of one σ-algebra being coarser or finer than another.

Exercise 1.4.10. Show that all atomic algebras are σ-algebras. In particular, the discrete algebra and trivial algebra are σ-algebras, as are the finite algebras and the dyadic algebras on Euclidean spaces.

Exercise 1.4.11. Show that the Lebesgue and null algebras are σ-algebras, but the elementary and Jordan algebras are not.

Exercise 1.4.12. Show that any restriction $\mathcal{B}|_Y$ of a σ-algebra $\mathcal{B}$ to a subspace $Y$ of $X$ (as defined in Exercise 1.4.2) is again a σ-algebra on the subspace $Y$.
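On a finite set $X$ every countable union is a finite union, so a σ-algebra on $X$ is the same thing as a Boolean algebra, and the axioms can be checked mechanically. The sketch below is an illustration only; the helper names `is_algebra` and `restrict` are hypothetical. It verifies the three axioms for a small atomic algebra and then forms the restriction $\mathcal{B}|_Y = \{E \cap Y: E \in \mathcal{B}\}$ to a set $Y$ that is not itself $\mathcal{B}$-measurable.

```python
def is_algebra(X, B):
    """Check the empty-set, complement and (pairwise) union axioms on a finite X."""
    X = frozenset(X)
    B = {frozenset(E) for E in B}
    return (frozenset() in B
            and all(X - E in B for E in B)
            and all(E | F in B for E in B for F in B))

def restrict(B, Y):
    """The restriction B|_Y = {E ∩ Y : E in B}."""
    Y = frozenset(Y)
    return {frozenset(E) & Y for E in B}

X = {1, 2, 3, 4}
B = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]   # atomic algebra with atoms {1,2}, {3,4}
Y = {2, 3}                                   # Y is not B-measurable
B_Y = restrict(B, Y)
```

Here `B_Y` consists of $\emptyset$, $\{2\}$, $\{3\}$ and $\{2,3\}$, which is the discrete algebra on $Y$; this is one finite instance of the restriction exercises, not a substitute for their proofs.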
There is an exact analogue of Exercise 1.4.6:

Exercise 1.4.13 (Intersection of σ-algebras). Show that the intersection $\bigwedge_{\alpha \in I} \mathcal{B}_\alpha := \bigcap_{\alpha \in I} \mathcal{B}_\alpha$ of an arbitrary (and possibly infinite or uncountable) number of σ-algebras $\mathcal{B}_\alpha$ is again a σ-algebra, and is the finest σ-algebra that is coarser than all of the $\mathcal{B}_\alpha$.

Similarly, we have a notion of generation:

Definition 1.4.14 (Generation of σ-algebras). Let $\mathcal{F}$ be any family of sets in $X$. We define $\langle \mathcal{F} \rangle$ to be the intersection of all the σ-algebras that contain $\mathcal{F}$, which is again a σ-algebra by Exercise 1.4.13. Equivalently, $\langle \mathcal{F} \rangle$ is the coarsest σ-algebra that contains $\mathcal{F}$. We say that $\langle \mathcal{F} \rangle$ is the σ-algebra generated by $\mathcal{F}$.

Since every σ-algebra is a Boolean algebra, we have the trivial inclusion
$$\langle \mathcal{F} \rangle_{\mathrm{bool}} \subset \langle \mathcal{F} \rangle.$$
However, equality need not hold; it holds if and only if $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is a σ-algebra. For instance, if $\mathcal{F}$ is the collection of all boxes in $\mathbf{R}^d$, then $\langle \mathcal{F} \rangle_{\mathrm{bool}}$ is the elementary algebra (Exercise 1.4.7), but $\langle \mathcal{F} \rangle$ cannot equal this algebra, as it is not a σ-algebra.

Remark 1.4.15. From the definitions, it is clear that we have the following principle, somewhat analogous to the principle of mathematical induction: if $\mathcal{F}$ is a family of sets in $X$, and $P(E)$ is a property of sets $E \subset X$ which obeys the following axioms:

(i) $P(\emptyset)$ is true.

(ii) $P(E)$ is true for all $E \in \mathcal{F}$.

(iii) If $P(E)$ is true for some $E \subset X$, then $P(X \backslash E)$ is true also.

(iv) If $E_1, E_2, \ldots \subset X$ are such that $P(E_n)$ is true for all $n$, then $P(\bigcup_{n=1}^\infty E_n)$ is true also.

Then one can conclude that $P(E)$ is true for all $E \in \langle \mathcal{F} \rangle$. Indeed, the set of all $E$ for which $P(E)$ holds is a σ-algebra that contains $\mathcal{F}$, whence the claim. This principle is particularly useful for establishing properties of Borel measurable sets (see below).

We now turn to an important example of a σ-algebra:
Definition 1.4.16 (Borel σ-algebra). Let $X$ be a metric space, or more generally a topological space. The Borel σ-algebra $\mathcal{B}[X]$ of $X$ is defined to be the σ-algebra generated by the open subsets of $X$. Elements of $\mathcal{B}[X]$ will be called Borel measurable.

Thus, for instance, the Borel σ-algebra contains the open sets, the closed sets (which are complements of open sets), the countable unions of closed sets (known as $F_\sigma$ sets), the countable intersections of open sets (known as $G_\delta$ sets), the countable intersections of $F_\sigma$ sets, and so forth.

In $\mathbf{R}^d$, every open set is Lebesgue measurable, and so we see that the Borel σ-algebra is coarser than the Lebesgue σ-algebra. We will shortly see, though, that the two σ-algebras are not equal.

We defined the Borel σ-algebra to be generated by the open sets. However, it is also generated by several other families of sets:

Exercise 1.4.14. Show that the Borel σ-algebra $\mathcal{B}[\mathbf{R}^d]$ of a Euclidean space is generated by any of the following collections of sets:

(i) The open subsets of $\mathbf{R}^d$.

(ii) The closed subsets of $\mathbf{R}^d$.

(iii) The compact subsets of $\mathbf{R}^d$.

(iv) The open balls of $\mathbf{R}^d$.

(v) The boxes in $\mathbf{R}^d$.

(vi) The elementary sets in $\mathbf{R}^d$.

(Hint: To show that two families $\mathcal{F}, \mathcal{F}'$ of sets generate the same σ-algebra, it suffices to show that every σ-algebra that contains $\mathcal{F}$ contains $\mathcal{F}'$ also, and conversely.)

There is an analogue of Exercise 1.4.9, which illustrates the extent to which a generated σ-algebra is “larger” than the analogous generated Boolean algebra:

Exercise 1.4.15 (Recursive description of a generated σ-algebra). (This exercise requires familiarity with the theory of ordinals, which is reviewed in §2.4 of An epsilon of room, Vol. I. Recall that we are assuming the axiom of choice throughout this text.) Let $\mathcal{F}$ be
a collection of sets in a set $X$, and let $\omega_1$ be the first uncountable ordinal. Define the sets $\mathcal{F}_\alpha$ for every countable ordinal $\alpha \in \omega_1$ via transfinite induction as follows:

(i) $\mathcal{F}_0 := \mathcal{F}$.

(ii) For each countable successor ordinal $\alpha = \beta + 1$, we define $\mathcal{F}_\alpha$ to be the collection of all sets that are either the union of an at most countable number of sets in $\mathcal{F}_\beta$ (including the empty union $\emptyset$), or the complement of such a union.

(iii) For each countable limit ordinal $\alpha = \sup_{\beta < \alpha} \beta$, we define $\mathcal{F}_\alpha := \bigcup_{\beta < \alpha} \mathcal{F}_\beta$.

Show that $\langle \mathcal{F} \rangle = \bigcup_{\alpha \in \omega_1} \mathcal{F}_\alpha$.

Remark 1.4.17. The first uncountable ordinal $\omega_1$ will make several further cameo appearances here and in An epsilon of room, Vol. I, for instance by generating counterexamples to various plausible statements in point-set topology. In the case when $\mathcal{F}$ is the collection of open sets in a topological space, so that $\langle \mathcal{F} \rangle$ is the Borel σ-algebra, then the sets $\mathcal{F}_\alpha$ are essentially the Borel hierarchy (which starts at the open and closed sets, then moves on to the $F_\sigma$ and $G_\delta$ sets, and so forth); these play an important role in descriptive set theory.

Exercise 1.4.16. (This exercise requires familiarity with the theory of cardinals.) Let $\mathcal{F}$ be an infinite family of subsets of $X$ of cardinality $\kappa$ (thus $\kappa$ is an infinite cardinal). Show that $\langle \mathcal{F} \rangle$ has cardinality at most $\kappa^{\aleph_0}$. (Hint: use Exercise 1.4.15.) In particular, show that the Borel σ-algebra $\mathcal{B}[\mathbf{R}^d]$ has cardinality at most $c := 2^{\aleph_0}$.

Conclude that there exist Jordan measurable (and hence Lebesgue measurable) subsets of $\mathbf{R}^d$ which are not Borel measurable. (Hint: How many subsets of the Cantor set are there?) Use this to place the Borel σ-algebra on the diagram that you drew for Example 1.4.8.

Remark 1.4.18. Despite this demonstration that not all Lebesgue measurable subsets are Borel measurable, it is remarkably difficult (though not impossible) to exhibit a specific set that is not Borel measurable. Indeed, a large majority of the explicitly constructible sets that one actually encounters in practice tend to be Borel measurable, and one can view the property of Borel measurability intuitively
as a kind of “constructibility” property. (Indeed, as a very crude first approximation, one can view the Borel measurable sets as those sets of “countable descriptive complexity”; in contrast, sets of finite descriptive complexity tend to be Jordan measurable (assuming they are bounded, of course).)

Exercise 1.4.17. Let $E, F$ be Borel measurable subsets of $\mathbf{R}^{d_1}, \mathbf{R}^{d_2}$ respectively. Show that $E \times F$ is a Borel measurable subset of $\mathbf{R}^{d_1+d_2}$. (Hint: first establish this in the case when $F$ is a box, by using Remark 1.4.15. To obtain the general case, apply Remark 1.4.15 yet again.)

The above exercise has a partial converse:

Exercise 1.4.18. Let $E$ be a Borel measurable subset of $\mathbf{R}^{d_1+d_2}$.

(i) Show that for any $x_1 \in \mathbf{R}^{d_1}$, the slice $\{x_2 \in \mathbf{R}^{d_2}: (x_1,x_2) \in E\}$ is a Borel measurable subset of $\mathbf{R}^{d_2}$. Similarly, show that for every $x_2 \in \mathbf{R}^{d_2}$, the slice $\{x_1 \in \mathbf{R}^{d_1}: (x_1,x_2) \in E\}$ is a Borel measurable subset of $\mathbf{R}^{d_1}$.

(ii) Give a counterexample to show that this claim is not true if “Borel” is replaced with “Lebesgue” throughout. (Hint: the Cartesian product of any set with a point is a null set, even if the first set was not measurable.)

Exercise 1.4.19. Show that the Lebesgue σ-algebra on $\mathbf{R}^d$ is generated by the union of the Borel σ-algebra and the null σ-algebra.

1.4.3. Countably additive measures and measure spaces. Having set out the concepts of a σ-algebra and a measurable space, we now endow these structures with a measure.

We begin with the finitely additive theory, although this theory is too weak for our purposes and will soon be supplanted by the countably additive theory.

Definition 1.4.19 (Finitely additive measure). Let $\mathcal{B}$ be a Boolean algebra on a space $X$. An (unsigned) finitely additive measure $\mu$ on $\mathcal{B}$ is a map $\mu: \mathcal{B} \to [0,+\infty]$ that obeys the following axioms:

(i) (Empty set) $\mu(\emptyset) = 0$.
(ii) (Finite additivity) Whenever $E, F \in \mathcal{B}$ are disjoint, then $\mu(E \cup F) = \mu(E) + \mu(F)$.

Remark 1.4.20. The empty set axiom is needed in order to rule out the degenerate situation in which every set (including the empty set) has infinite measure.

Example 1.4.21. Lebesgue measure $m$ is a finitely additive measure on the Lebesgue σ-algebra, and hence on all sub-algebras (such as the null algebra, the Jordan algebra, or the elementary algebra). In particular, Jordan measure and elementary measure are finitely additive (adopting the convention that co-Jordan measurable sets have infinite Jordan measure, and co-elementary sets have infinite elementary measure).

On the other hand, as we saw in previous notes, Lebesgue outer measure is not finitely additive on the discrete algebra, and Jordan outer measure is not finitely additive on the Lebesgue algebra.

Example 1.4.22 (Dirac measure). Let $x \in X$ and $\mathcal{B}$ be an arbitrary Boolean algebra on $X$. Then the Dirac measure $\delta_x$ at $x$, defined by setting $\delta_x(E) := 1_E(x)$, is finitely additive.

Example 1.4.23 (Zero measure). The zero measure $0: E \mapsto 0$ is a finitely additive measure on any Boolean algebra.

Example 1.4.24 (Linear combinations of measures). If $\mathcal{B}$ is a Boolean algebra on $X$, and $\mu, \nu: \mathcal{B} \to [0,+\infty]$ are finitely additive measures on $\mathcal{B}$, then $\mu + \nu: E \mapsto \mu(E) + \nu(E)$ is also a finitely additive measure, as is $c\mu: E \mapsto c \times \mu(E)$ for any $c \in [0,+\infty]$. Thus, for instance, the sum of Lebesgue measure and a Dirac measure is also a finitely additive measure on the Lebesgue algebra (or on any of its sub-algebras).

Example 1.4.25 (Restriction of a measure). If $\mathcal{B}$ is a Boolean algebra on $X$, $\mu: \mathcal{B} \to [0,+\infty]$ is a finitely additive measure, and $Y$ is a $\mathcal{B}$-measurable subset of $X$, then the restriction $\mu|_Y: \mathcal{B}|_Y \to [0,+\infty]$ of $\mu$ to $Y$, defined by setting $\mu|_Y(E) := \mu(E)$ whenever $E \in \mathcal{B}|_Y$ (i.e. if $E \in \mathcal{B}$ and $E \subset Y$), is also a finitely additive measure.

Example 1.4.26 (Counting measure). If $\mathcal{B}$ is a Boolean algebra on $X$, then the function $\#: \mathcal{B} \to [0,+\infty]$ defined by setting $\#(E)$ to be
the cardinality of $E$ if $E$ is finite, and $\#(E) := +\infty$ if $E$ is infinite, is a finitely additive measure, known as counting measure.

As with our definition of Boolean algebras and σ-algebras, we adopted a “minimalist” definition so that the axioms are easy to verify. But they imply several further useful properties:

Exercise 1.4.20. Let $\mu: \mathcal{B} \to [0,+\infty]$ be a finitely additive measure on a Boolean algebra $\mathcal{B}$. Establish the following facts:

(i) (Monotonicity) If $E, F$ are $\mathcal{B}$-measurable and $E \subset F$, then $\mu(E) \leq \mu(F)$.

(ii) (Finite additivity) If $k$ is a natural number, and $E_1, \ldots, E_k$ are $\mathcal{B}$-measurable and disjoint, then $\mu(E_1 \cup \ldots \cup E_k) = \mu(E_1) + \ldots + \mu(E_k)$.

(iii) (Finite subadditivity) If $k$ is a natural number, and $E_1, \ldots, E_k$ are $\mathcal{B}$-measurable, then $\mu(E_1 \cup \ldots \cup E_k) \leq \mu(E_1) + \ldots + \mu(E_k)$.

(iv) (Inclusion-exclusion for two sets) If $E, F$ are $\mathcal{B}$-measurable, then $\mu(E \cup F) + \mu(E \cap F) = \mu(E) + \mu(F)$.

(Caution: remember that the cancellation law $a + c = b + c \implies a = b$ does not hold in $[0,+\infty]$ if $c$ is infinite, and so the use of cancellation (or subtraction) should be avoided if possible.)

One can characterise measures completely for any finite algebra:

Exercise 1.4.21. Let $\mathcal{B}$ be a finite Boolean algebra, generated by a finite family $A_1, \ldots, A_k$ of non-empty atoms. Show that for every finitely additive measure $\mu$ on $\mathcal{B}$ there exist $c_1, \ldots, c_k \in [0,+\infty]$ such that
$$\mu(E) = \sum_{1 \leq j \leq k: A_j \subset E} c_j.$$
Equivalently, if $x_j$ is a point in $A_j$ for each $1 \leq j \leq k$, then
$$\mu = \sum_{j=1}^k c_j \delta_{x_j}.$$
Furthermore, show that the $c_1, \ldots, c_k$ are uniquely determined by $\mu$.
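The formula $\mu(E) = \sum_{j: A_j \subset E} c_j$ in Exercise 1.4.21 is easy to experiment with. The sketch below is illustrative only: the atoms, the weights, and the names `atoms`, `weights`, `mu`, `measurable` are all assumptions of this example. It builds such a measure on a three-atom algebra, with one atom of infinite weight, so that monotonicity and inclusion-exclusion from Exercise 1.4.20 can be checked by brute force in the extended arithmetic of $[0,+\infty]$.

```python
from itertools import combinations
import math

# Hypothetical atoms A_1, A_2, A_3 partitioning X = {0,...,5}, with weights c_j.
atoms = {"A1": frozenset({0, 1}), "A2": frozenset({2}), "A3": frozenset({3, 4, 5})}
weights = {"A1": 0.5, "A2": math.inf, "A3": 2.0}   # c_j in [0, +inf]

def mu(E):
    """mu(E) = sum of c_j over the atoms A_j contained in E."""
    return sum(weights[name] for name, A in atoms.items() if A <= E)

# The measurable sets are exactly the unions of atoms (2^3 = 8 of them).
names = list(atoms)
measurable = [frozenset().union(*(atoms[n] for n in combo))
              for r in range(len(names) + 1)
              for combo in combinations(names, r)]

# Inclusion-exclusion, written additively so that no subtraction of
# infinite quantities is ever needed.
incl_excl_ok = all(mu(E | F) + mu(E & F) == mu(E) + mu(F)
                   for E in measurable for F in measurable)
```

Writing inclusion-exclusion with both sides as sums, rather than solving for $\mu(E \cup F)$, follows the caution above about cancellation when some $c_j$ is infinite.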
This is about the limit of what one can say about finitely additive measures at this level of generality. We now specialise to the countably additive measures on σ-algebras.

Definition 1.4.27 (Countably additive measure). Let $(X, \mathcal{B})$ be a measurable space. An (unsigned) countably additive measure $\mu$ on $\mathcal{B}$, or measure for short, is a map $\mu: \mathcal{B} \to [0,+\infty]$ that obeys the following axioms:

(i) (Empty set) $\mu(\emptyset) = 0$.

(ii) (Countable additivity) Whenever $E_1, E_2, \ldots \in \mathcal{B}$ are a countable sequence of disjoint measurable sets, then $\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$.

A triplet $(X, \mathcal{B}, \mu)$, where $(X, \mathcal{B})$ is a measurable space and $\mu: \mathcal{B} \to [0,+\infty]$ is a countably additive measure, is known as a measure space.

Note the distinction between a measure space and a measurable space. The latter has the capability to be equipped with a measure, but the former is actually equipped with a measure.

Example 1.4.28. Lebesgue measure is a countably additive measure on the Lebesgue σ-algebra, and hence on every sub-σ-algebra (such as the Borel σ-algebra).

Example 1.4.29. The Dirac measures from Example 1.4.22 are countably additive, as is counting measure.

Example 1.4.30. Any restriction of a countably additive measure to a measurable subspace is again countably additive.

Exercise 1.4.22 (Countable combinations of measures). Let $(X, \mathcal{B})$ be a measurable space.

(i) If $\mu$ is a countably additive measure on $\mathcal{B}$, and $c \in [0,+\infty]$, then $c\mu$ is also countably additive.

(ii) If $\mu_1, \mu_2, \ldots$ are a sequence of countably additive measures on $\mathcal{B}$, then the sum $\sum_{n=1}^\infty \mu_n: E \mapsto \sum_{n=1}^\infty \mu_n(E)$ is also a countably additive measure.

Note that countably additive measures are necessarily finitely additive (by padding out a finite union into a countable union using
the empty set), and so countably additive measures inherit all the properties of finitely additive measures, such as monotonicity and finite subadditivity. But one also has additional properties:

Exercise 1.4.23. Let $(X, \mathcal{B}, \mu)$ be a measure space.

(i) (Countable subadditivity) If $E_1, E_2, \ldots$ are $\mathcal{B}$-measurable, then $\mu(\bigcup_{n=1}^\infty E_n) \leq \sum_{n=1}^\infty \mu(E_n)$.

(ii) (Upwards monotone convergence) If $E_1 \subset E_2 \subset \ldots$ are $\mathcal{B}$-measurable, then
$$\mu\left(\bigcup_{n=1}^\infty E_n\right) = \lim_{n \to \infty} \mu(E_n) = \sup_n \mu(E_n).$$

(iii) (Downwards monotone convergence) If $E_1 \supset E_2 \supset \ldots$ are $\mathcal{B}$-measurable, and $\mu(E_n) < \infty$ for at least one $n$, then
$$\mu\left(\bigcap_{n=1}^\infty E_n\right) = \lim_{n \to \infty} \mu(E_n) = \inf_n \mu(E_n).$$

Show that the downward monotone convergence claim can fail if the hypothesis that $\mu(E_n) < \infty$ for at least one $n$ is dropped. (Hint: mimic the solution to Exercise 1.2.11.)

Exercise 1.4.24 (Dominated convergence for sets). Let $(X, \mathcal{B}, \mu)$ be a measure space. Let $E_1, E_2, \ldots$ be a sequence of $\mathcal{B}$-measurable sets that converge to another set $E$, in the sense that $1_{E_n}$ converges pointwise to $1_E$.

(i) Show that $E$ is also $\mathcal{B}$-measurable.

(ii) If there exists a $\mathcal{B}$-measurable set $F$ of finite measure (i.e. $\mu(F) < \infty$) that contains all of the $E_n$, show that $\lim_{n \to \infty} \mu(E_n) = \mu(E)$. (Hint: Apply downward monotonicity to the sets $\bigcup_{n > N} (E_n \Delta E)$.)

(iii) Show that the previous part of this exercise can fail if the hypothesis that all the $E_n$ are contained in a set of finite measure is omitted.

Exercise 1.4.25. Let $X$ be an at most countable set with the discrete σ-algebra. Show that every measure $\mu$ on this measurable space can
be uniquely represented in the form
$$\mu = \sum_{x \in X} c_x \delta_x$$
for some $c_x \in [0,+\infty]$, thus
$$\mu(E) = \sum_{x \in E} c_x$$
for all $E \subset X$. (This claim fails in the uncountable case, although showing this is slightly tricky.)

A useful technical property, enjoyed by some measure spaces, is that of completeness:

Definition 1.4.31 (Completeness). A null set of a measure space $(X, \mathcal{B}, \mu)$ is defined to be a $\mathcal{B}$-measurable set of measure zero. A sub-null set is any subset of a null set. A measure space is said to be complete if every sub-null set is a null set.

Thus, for instance, the Lebesgue measure space $(\mathbf{R}^d, \mathcal{L}[\mathbf{R}^d], m)$ is complete, but the Borel measure space $(\mathbf{R}^d, \mathcal{B}[\mathbf{R}^d], m)$ is not (as can be seen from the solution to Exercise 1.4.16).

Completeness is a convenient property to have in some cases, particularly when dealing with properties that hold almost everywhere. Fortunately, it is fairly easy to modify any measure space to be complete:

Exercise 1.4.26 (Completion). Let $(X, \mathcal{B}, \mu)$ be a measure space. Show that there exists a unique refinement $(X, \overline{\mathcal{B}}, \overline{\mu})$, known as the completion of $(X, \mathcal{B}, \mu)$, which is the coarsest refinement of $(X, \mathcal{B}, \mu)$ that is complete. Furthermore, show that $\overline{\mathcal{B}}$ consists precisely of those sets that differ from a $\mathcal{B}$-measurable set by a $\mathcal{B}$-subnull set.

Exercise 1.4.27. Show that the Lebesgue measure space $(\mathbf{R}^d, \mathcal{L}[\mathbf{R}^d], m)$ is the completion of the Borel measure space $(\mathbf{R}^d, \mathcal{B}[\mathbf{R}^d], m)$.

Exercise 1.4.28 (Approximation by an algebra). Let $\mathcal{A}$ be a Boolean algebra on $X$, and let $\mu$ be a measure on $\langle \mathcal{A} \rangle$.

(i) If $\mu(X) < \infty$, show that for every $E \in \langle \mathcal{A} \rangle$ and $\varepsilon > 0$ there exists $F \in \mathcal{A}$ such that $\mu(E \Delta F) < \varepsilon$.
(ii) More generally, if $X = \bigcup_{n=1}^\infty A_n$ for some $A_1, A_2, \ldots \in \mathcal{A}$ with $\mu(A_n) < \infty$ for all $n$, $E \in \langle \mathcal{A} \rangle$ has finite measure, and $\varepsilon > 0$, show that there exists $F \in \mathcal{A}$ such that $\mu(E \Delta F) < \varepsilon$.

1.4.4. Measurable functions, and integration on a measure space. Now we are ready to define integration on measure spaces. We first need the notion of a measurable function, which is analogous to that of a continuous function in topology. Recall that a function $f: X \to Y$ between two topological spaces $X, Y$ is continuous if the inverse image $f^{-1}(U)$ of any open set is open. In a similar spirit, we have

Definition 1.4.32. Let $(X, \mathcal{B})$ be a measurable space, and let $f: X \to [0,+\infty]$ or $f: X \to \mathbf{C}$ be an unsigned or complex-valued function. We say that $f$ is measurable if $f^{-1}(U)$ is $\mathcal{B}$-measurable for every open subset $U$ of $[0,+\infty]$ or $\mathbf{C}$.

From Lemma 1.3.9, we see that this generalises the notion of a Lebesgue measurable function.

Exercise 1.4.29. Let $(X, \mathcal{B})$ be a measurable space.

(i) Show that a function $f: X \to [0,+\infty]$ or $f: X \to \mathbf{C}$ is measurable if and only if $f^{-1}(E)$ is $\mathcal{B}$-measurable for every Borel-measurable subset $E$ of $[0,+\infty]$ or $\mathbf{C}$.

(ii) Show that an indicator function $1_E$ of a set $E \subset X$ is measurable if and only if $E$ itself is $\mathcal{B}$-measurable.

(iii) Show that a function $f: X \to [0,+\infty]$ is measurable if and only if the level sets $\{x \in X: f(x) > \lambda\}$ are $\mathcal{B}$-measurable.

(iv) Show that a function $f: X \to \mathbf{C}$ is measurable if and only if its real and imaginary parts are measurable.

(v) Show that a function $f: X \to \mathbf{R}$ is measurable if and only if the magnitudes $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ of its positive and negative parts are measurable.

(vi) If $f_n: X \to [0,+\infty]$ are a sequence of measurable functions that converge pointwise to a limit $f: X \to [0,+\infty]$, then show that $f$ is also measurable. Obtain the same claim if $[0,+\infty]$ is replaced by $\mathbf{C}$.
(vii) If $f : X \to [0,+\infty]$ is measurable and $\phi : [0,+\infty] \to [0,+\infty]$ is continuous, show that $\phi \circ f$ is measurable. Obtain the same claim if $[0,+\infty]$ is replaced by $\mathbf{C}$.
(viii) Show that the sum or product of two measurable functions in $[0,+\infty]$ or $\mathbf{C}$ is still measurable.

Remark 1.4.33. One can also view measurable functions in a more category theoretic fashion. Define a measurable morphism or measurable map $f$ from one measurable space $(X, \mathcal{B})$ to another $(Y, \mathcal{C})$ to be a function $f : X \to Y$ with the property that $f^{-1}(E)$ is $\mathcal{B}$-measurable for every $\mathcal{C}$-measurable set $E$. Then a measurable function $f : X \to [0,+\infty]$ or $f : X \to \mathbf{C}$ is the same thing as a measurable morphism from $X$ to $[0,+\infty]$ or $\mathbf{C}$, where the latter is equipped with the Borel $\sigma$-algebra. Also, one $\sigma$-algebra $\mathcal{B}$ on a space $X$ is coarser than another $\mathcal{B}'$ precisely when the identity map $\mathrm{id}_X : X \to X$ is a measurable morphism from $(X, \mathcal{B}')$ to $(X, \mathcal{B})$. The main purpose of adopting this viewpoint is that it is obvious that the composition of measurable morphisms is again a measurable morphism. This is important in those fields of mathematics, such as ergodic theory (discussed in [Ta2009]), in which one frequently wishes to compose measurable transformations (and in particular, to compose a transformation $T : (X, \mathcal{B}) \to (X, \mathcal{B})$ with itself repeatedly); but it will not play a major role in this text.

Measurable functions are particularly easy to describe on atomic spaces:

Exercise 1.4.30. Let $(X, \mathcal{B})$ be a measurable space that is atomic, thus $\mathcal{B} = \mathcal{A}((A_\alpha)_{\alpha \in I})$ for some partition $X = \bigcup_{\alpha \in I} A_\alpha$ of $X$ into disjoint non-empty atoms. Show that a function $f : X \to [0,+\infty]$ or $f : X \to \mathbf{C}$ is measurable if and only if it is constant on each atom, or equivalently if one has a representation of the form
\[ f = \sum_{\alpha \in I} c_\alpha 1_{A_\alpha} \]
for some constants $c_\alpha$ in $[0,+\infty]$ or in $\mathbf{C}$ as appropriate. Furthermore, the $c_\alpha$ are uniquely determined by $f$.

Exercise 1.4.31 (Egorov's theorem). Let $(X, \mathcal{B}, \mu)$ be a finite measure space (so $\mu(X) < \infty$), and let $f_n : X \to \mathbf{C}$ be a sequence of
measurable functions that converge pointwise almost everywhere to a limit $f : X \to \mathbf{C}$, and let $\varepsilon > 0$. Show that there exists a measurable set $E$ of measure at most $\varepsilon$ such that $f_n$ converges uniformly to $f$ outside of $E$. Give an example to show that the claim can fail when the measure $\mu$ is not finite.

In Section 1.3 we defined first a simple integral, then an unsigned integral, and then finally an absolutely convergent integral. We perform the same three stages here. We begin with the simple integral, which in the abstract setting becomes integration in the case when the $\sigma$-algebra is finite:

Definition 1.4.34 (Simple integral). Let $(X, \mathcal{B}, \mu)$ be a measure space with $\mathcal{B}$ finite. By Exercise 1.4.4, $X$ is partitioned into a finite number of atoms $A_1, \ldots, A_n$. If $f : X \to [0,+\infty]$ is measurable, then by Exercise 1.4.30 it has a unique representation of the form
\[ f = \sum_{i=1}^n c_i 1_{A_i} \]
for some $c_1, \ldots, c_n \in [0,+\infty]$. We then define the simple integral $\mathrm{Simp} \int_X f\, d\mu$ of $f$ by the formula
\[ \mathrm{Simp} \int_X f\, d\mu := \sum_{i=1}^n c_i \mu(A_i). \]

Note that, thanks to Exercise 1.4.3, the precise decomposition into atoms does not affect the definition of the simple integral.

Exercise 1.4.32. Propose a definition for the simple integral for absolutely convergent complex-valued functions on a measurable space with a finite $\sigma$-algebra.

With this definition, it is clear that one has the monotonicity property
\[ \mathrm{Simp} \int_X f\, d\mu \leq \mathrm{Simp} \int_X g\, d\mu \]
whenever $f \leq g$ are unsigned measurable, as well as the linearity properties
\[ \mathrm{Simp} \int_X f + g\, d\mu = \mathrm{Simp} \int_X f\, d\mu + \mathrm{Simp} \int_X g\, d\mu \]
and
\[ \mathrm{Simp} \int_X cf\, d\mu = c \times \mathrm{Simp} \int_X f\, d\mu \]
for unsigned measurable $f, g$ and $c \in [0,+\infty]$.

We also make the following important technical observation:

Exercise 1.4.33 (Simple integral unaffected by refinements). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $(X, \mathcal{B}', \mu')$ be a refinement of $(X, \mathcal{B}, \mu)$, which means that $\mathcal{B}'$ contains $\mathcal{B}$ and $\mu' : \mathcal{B}' \to [0,+\infty]$ agrees with $\mu : \mathcal{B} \to [0,+\infty]$ on $\mathcal{B}$. Suppose that both $\mathcal{B}, \mathcal{B}'$ are finite, and let $f : X \to [0,+\infty]$ be $\mathcal{B}$-measurable. Show that
\[ \mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu'. \]

This allows one to extend the simple integral to simple functions:

Definition 1.4.35 (Integral of simple functions). An (unsigned) simple function $f : X \to [0,+\infty]$ on a measurable space $(X, \mathcal{B})$ is a measurable function that takes on finitely many values $a_1, \ldots, a_k$. Note that such a function is then automatically measurable with respect to at least one finite sub-$\sigma$-algebra $\mathcal{B}'$ of $\mathcal{B}$, namely the $\sigma$-algebra $\mathcal{B}'$ generated by the preimages $f^{-1}(\{a_1\}), \ldots, f^{-1}(\{a_k\})$ of $a_1, \ldots, a_k$. We then define the simple integral $\mathrm{Simp} \int_X f\, d\mu$ by the formula
\[ \mathrm{Simp} \int_X f\, d\mu := \mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}'}, \]
where $\mu|_{\mathcal{B}'} : \mathcal{B}' \to [0,+\infty]$ is the restriction of $\mu : \mathcal{B} \to [0,+\infty]$ to $\mathcal{B}'$.

Note that there could be multiple finite $\sigma$-algebras with respect to which $f$ is measurable, but Exercise 1.4.33 guarantees that all such extensions will give the same simple integral. Indeed, if $f$ were measurable with respect to two separate finite sub-$\sigma$-algebras $\mathcal{B}'$ and $\mathcal{B}''$ of $\mathcal{B}$, then it would also be measurable with respect to their common refinement $\mathcal{B}' \vee \mathcal{B}'' := \langle \mathcal{B}' \cup \mathcal{B}'' \rangle$, which is also finite (by Exercise 1.4.8), and then by Exercise 1.4.33, $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}'}$ and $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}''}$ are both equal to $\mathrm{Simp} \int_X f\, d\mu|_{\mathcal{B}' \vee \mathcal{B}''}$, and hence equal to each other.

From this we can deduce the following properties of the simple integral. As with the Lebesgue theory, we say that a property $P(x)$ of an element $x \in X$ of a measure space $(X, \mathcal{B}, \mu)$ holds $\mu$-almost everywhere if it holds outside of a sub-null set.
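The definition of the simple integral can be tried out on a small finite example. The following is only a sketch under stated assumptions: the space is a three-point set with the discrete $\sigma$-algebra, the measure and the function are stored as Python dictionaries, all values are finite (so the convention for $0 \times \infty$ never arises), and the names `simple_integral`, `mu`, `f`, `coarse`, `fine` are illustrative and not notation from the text.

```python
from fractions import Fraction


def simple_integral(f, mu):
    """Simple integral of f on a finite set with point masses mu.

    f maps each point to a finite non-negative value; mu maps each point
    to the mass of that point. The level sets f^{-1}({a}) play the role
    of the atoms of the sigma-algebra generated by f.
    """
    # Accumulate the mass of each level set f^{-1}({a}).
    level_mass = {}
    for x, mass in mu.items():
        level_mass[f[x]] = level_mass.get(f[x], 0) + mass
    # Sum a * mu(f^{-1}({a})) over the finitely many values a.
    return sum(a * m for a, m in level_mass.items())


mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": 2, "b": 2, "c": 5}

# Integral computed on the coarse sigma-algebra generated by f.
coarse = simple_integral(f, mu)
# Integral computed on the finest sigma-algebra, whose atoms are the points.
fine = sum(f[x] * mu[x] for x in mu)
```

The two values agree, as Exercise 1.4.33 predicts for this particular pair of $\sigma$-algebras; the check is of course no substitute for the proof.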
Exercise 1.4.34 (Basic properties of the simple integral). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0,+\infty]$ be simple functions.
(i) (Monotonicity) If $f \leq g$ pointwise, then $\mathrm{Simp} \int_X f\, d\mu \leq \mathrm{Simp} \int_X g\, d\mu$.
(ii) (Compatibility with measure) For every $\mathcal{B}$-measurable set $E$, we have $\mathrm{Simp} \int_X 1_E\, d\mu = \mu(E)$.
(iii) (Homogeneity) For every $c \in [0,+\infty]$, one has $\mathrm{Simp} \int_X cf\, d\mu = c \times \mathrm{Simp} \int_X f\, d\mu$.
(iv) (Finite additivity) $\mathrm{Simp} \int_X (f+g)\, d\mu = \mathrm{Simp} \int_X f\, d\mu + \mathrm{Simp} \int_X g\, d\mu$.
(v) (Insensitivity to refinement) If $(X, \mathcal{B}', \mu')$ is a refinement of $(X, \mathcal{B}, \mu)$ (as defined in Exercise 1.4.33), then $\mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu'$.
(vi) (Almost everywhere equivalence) If $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\mathrm{Simp} \int_X f\, d\mu = \mathrm{Simp} \int_X g\, d\mu$.
(vii) (Finiteness) $\mathrm{Simp} \int_X f\, d\mu < \infty$ if and only if $f$ is finite almost everywhere, and is supported on a set of finite measure.
(viii) (Vanishing) $\mathrm{Simp} \int_X f\, d\mu = 0$ if and only if $f$ is zero almost everywhere.

Exercise 1.4.35 (Inclusion-exclusion principle). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $A_1, \ldots, A_n$ be $\mathcal{B}$-measurable sets of finite measure. Show that
\[ \mu\Big( \bigcup_{i=1}^n A_i \Big) = \sum_{J \subset \{1,\ldots,n\} : J \neq \emptyset} (-1)^{|J|-1} \mu\Big( \bigcap_{i \in J} A_i \Big). \]
(Hint: Compute $\mathrm{Simp} \int_X \big(1 - \prod_{i=1}^n (1 - 1_{A_i})\big)\, d\mu$ in two different ways.)

Remark 1.4.36. The simple integral could also be defined on finitely additive measure spaces, rather than countably additive ones, and all the above properties would still apply. However, on a finitely additive measure space one would have difficulty extending the integral beyond simple functions, as we will now do.
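As a sanity check on the inclusion-exclusion formula of Exercise 1.4.35, one can evaluate both sides for counting measure on a small finite set. This is a sketch only: the helper name `inclusion_exclusion`, the choice of counting measure (`len`), and the three sample sets are assumptions made for illustration.

```python
from itertools import combinations


def inclusion_exclusion(sets, mu=len):
    """Right-hand side of the inclusion-exclusion formula.

    sets is a list of finite Python sets; mu assigns a measure to a
    finite set (counting measure by default).
    """
    total = 0
    n = len(sets)
    # Loop over all non-empty index sets J, grouped by their size |J|.
    for size in range(1, n + 1):
        sign = (-1) ** (size - 1)
        for J in combinations(range(n), size):
            intersection = set.intersection(*(sets[i] for i in J))
            total += sign * mu(intersection)
    return total


A = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}]
lhs = len(set.union(*A))        # measure of the union
rhs = inclusion_exclusion(A)    # alternating sum over intersections
```

Here the singletons contribute $4 + 3 + 4$, the pairwise intersections contribute $-(2 + 1 + 2)$, and the triple intersection contributes $+1$, matching the seven elements of the union.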
From the simple integral, we can now define the unsigned integral, in analogy to how the unsigned Lebesgue integral was constructed in Section 1.3.3.

Definition 1.4.37. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f : X \to [0,+\infty]$ be measurable. Then we define the unsigned integral $\int_X f\, d\mu$ of $f$ by the formula
\[ \int_X f\, d\mu := \sup_{0 \leq g \leq f;\ g \text{ simple}} \mathrm{Simp} \int_X g\, d\mu. \tag{1.14} \]

Clearly, this definition generalises Definition 1.3.13. Indeed, if $f : \mathbf{R}^d \to [0,+\infty]$ is Lebesgue measurable, then $\int_{\mathbf{R}^d} f(x)\, dx = \int_{\mathbf{R}^d} f\, dm$.

We record some easy properties of this integral:

Exercise 1.4.36 (Easy properties of the unsigned integral). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0,+\infty]$ be measurable.
(i) (Almost everywhere equivalence) If $f = g$ $\mu$-almost everywhere, then $\int_X f\, d\mu = \int_X g\, d\mu$.
(ii) (Monotonicity) If $f \leq g$ $\mu$-almost everywhere, then $\int_X f\, d\mu \leq \int_X g\, d\mu$.
(iii) (Homogeneity) We have $\int_X cf\, d\mu = c \int_X f\, d\mu$ for every $c \in [0,+\infty]$.
(iv) (Superadditivity) We have $\int_X (f+g)\, d\mu \geq \int_X f\, d\mu + \int_X g\, d\mu$.
(v) (Compatibility with the simple integral) If $f$ is simple, then $\int_X f\, d\mu = \mathrm{Simp} \int_X f\, d\mu$.
(vi) (Markov's inequality) For any $0 < \lambda < \infty$, one has
\[ \mu(\{x \in X : f(x) \geq \lambda\}) \leq \frac{1}{\lambda} \int_X f\, d\mu. \]
In particular, if $\int_X f\, d\mu < \infty$, then the sets $\{x \in X : f(x) \geq \lambda\}$ have finite measure for each $\lambda > 0$.
(vii) (Finiteness) If $\int_X f\, d\mu < \infty$, then $f(x)$ is finite for $\mu$-almost every $x$.
(viii) (Vanishing) If $\int_X f\, d\mu = 0$, then $f(x)$ is zero for $\mu$-almost every $x$.
(ix) (Vertical truncation) We have $\lim_{n \to \infty} \int_X \min(f, n)\, d\mu = \int_X f\, d\mu$.
(x) (Horizontal truncation) If $E_1 \subset E_2 \subset \ldots$ is an increasing sequence of $\mathcal{B}$-measurable sets, then
\[ \lim_{n \to \infty} \int_X f 1_{E_n}\, d\mu = \int_X f 1_{\bigcup_{n=1}^\infty E_n}\, d\mu. \]
(xi) (Restriction) If $Y$ is a measurable subset of $X$, then $\int_X f 1_Y\, d\mu = \int_Y f|_Y\, d\mu|_Y$, where $f|_Y : Y \to [0,+\infty]$ is the restriction of $f : X \to [0,+\infty]$ to $Y$, and the restriction $\mu|_Y$ was defined in Example 1.4.25. We will often abbreviate $\int_Y f|_Y\, d\mu|_Y$ (by slight abuse of notation) as $\int_Y f\, d\mu$.

As before, one of the key properties of this integral is its additivity:

Theorem 1.4.38. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f, g : X \to [0,+\infty]$ be measurable. Then
\[ \int_X (f+g)\, d\mu = \int_X f\, d\mu + \int_X g\, d\mu. \]

Proof. In view of superadditivity, it suffices to establish the sub-additivity property
\[ \int_X (f+g)\, d\mu \leq \int_X f\, d\mu + \int_X g\, d\mu. \]
We establish this in stages. We first deal with the case when $\mu$ is a finite measure (which means that $\mu(X) < \infty$) and $f, g$ are bounded. Pick an $\varepsilon > 0$, and let $f_\varepsilon$ be $f$ rounded down to the nearest integer multiple of $\varepsilon$, and $f^\varepsilon$ be $f$ rounded up to the nearest integer multiple. Clearly, we have the pointwise bounds
\[ f_\varepsilon(x) \leq f(x) \leq f^\varepsilon(x) \]
and
\[ f^\varepsilon(x) - f_\varepsilon(x) \leq \varepsilon. \]
Since $f$ is bounded, $f_\varepsilon$ and $f^\varepsilon$ are simple. Similarly define $g_\varepsilon$, $g^\varepsilon$. We then have the pointwise bound
\[ f + g \leq f^\varepsilon + g^\varepsilon \leq f_\varepsilon + g_\varepsilon + 2\varepsilon, \]
hence by Exercise 1.4.36 and the properties of the simple integral,
\[ \int_X f + g\, d\mu \leq \int_X f_\varepsilon + g_\varepsilon + 2\varepsilon\, d\mu = \mathrm{Simp} \int_X f_\varepsilon + g_\varepsilon + 2\varepsilon\, d\mu = \mathrm{Simp} \int_X f_\varepsilon\, d\mu + \mathrm{Simp} \int_X g_\varepsilon\, d\mu + 2\varepsilon \mu(X). \]
From (1.14) we conclude that
\[ \int_X f + g\, d\mu \leq \int_X f\, d\mu + \int_X g\, d\mu + 2\varepsilon \mu(X). \]
Letting $\varepsilon \to 0$ and using the assumption that $\mu(X)$ is finite, we obtain the claim.

Now we continue to assume that $\mu$ is a finite measure, but now do not assume that $f, g$ are bounded. Then for any natural number $n$, we can use the previous case to deduce that
\[ \int_X \min(f, n) + \min(g, n)\, d\mu \leq \int_X \min(f, n)\, d\mu + \int_X \min(g, n)\, d\mu. \]
Since $\min(f+g, n) \leq \min(f, n) + \min(g, n)$, we conclude that
\[ \int_X \min(f+g, n)\, d\mu \leq \int_X \min(f, n)\, d\mu + \int_X \min(g, n)\, d\mu. \]
Taking limits as $n \to \infty$ using vertical truncation, we obtain the claim.

Finally, we no longer assume that $\mu$ is of finite measure, and also do not require $f, g$ to be bounded. If either $\int_X f\, d\mu$ or $\int_X g\, d\mu$ is infinite, then by monotonicity, $\int_X f + g\, d\mu$ is infinite as well, and the claim follows; so we may assume that $\int_X f\, d\mu$ and $\int_X g\, d\mu$ are both finite. By Markov's inequality (Exercise 1.4.36(vi)), we conclude that for each natural number $n$, the set $E_n := \{x \in X : f(x) > \frac{1}{n}\} \cup \{x \in X : g(x) > \frac{1}{n}\}$ has finite measure. These sets are increasing in $n$, and $f, g, f+g$ are supported on $\bigcup_{n=1}^\infty E_n$, and so by horizontal truncation
\[ \int_X (f+g)\, d\mu = \lim_{n \to \infty} \int_X (f+g) 1_{E_n}\, d\mu. \]
From the previous case, we have
\[ \int_X (f+g) 1_{E_n}\, d\mu \leq \int_X f 1_{E_n}\, d\mu + \int_X g 1_{E_n}\, d\mu. \]
Letting $n \to \infty$ and using horizontal truncation we obtain the claim.

Exercise 1.4.37 (Linearity in $\mu$). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f : X \to [0,+\infty]$ be measurable.
(i) Show that $\int_X f\, d(c\mu) = c \times \int_X f\, d\mu$ for every $c \in [0,+\infty]$.
(ii) If $\mu_1, \mu_2, \ldots$ are a sequence of measures on $\mathcal{B}$, show that
\[ \int_X f\, d\sum_{n=1}^\infty \mu_n = \sum_{n=1}^\infty \int_X f\, d\mu_n. \]

Exercise 1.4.38 (Change of variables formula). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $\phi : X \to Y$ be a measurable morphism (as defined in Remark 1.4.33) from $(X, \mathcal{B})$ to another measurable space $(Y, \mathcal{C})$. Define the pushforward $\phi_* \mu : \mathcal{C} \to [0,+\infty]$ of $\mu$ by $\phi$ by the formula $\phi_* \mu(E) := \mu(\phi^{-1}(E))$.
(i) Show that $\phi_* \mu$ is a measure on $\mathcal{C}$, so that $(Y, \mathcal{C}, \phi_* \mu)$ is a measure space.
(ii) If $f : Y \to [0,+\infty]$ is measurable, show that $\int_Y f\, d\phi_* \mu = \int_X (f \circ \phi)\, d\mu$.
(Hint: the quickest proof here is via the monotone convergence theorem (Theorem 1.4.44) below, but it is also possible to prove the exercise without this theorem.)

Exercise 1.4.39. Let $T : \mathbf{R}^d \to \mathbf{R}^d$ be an invertible linear transformation, and let $m$ be Lebesgue measure on $\mathbf{R}^d$. Show that $T_* m = \frac{1}{|\det T|} m$, where the pushforward $T_* m$ of $m$ was defined in Exercise 1.4.38.

Exercise 1.4.40 (Sums as integrals). Let $X$ be an arbitrary set (with the discrete $\sigma$-algebra), let $\#$ be counting measure (see Exercise 1.4.26), and let $f : X \to [0,+\infty]$ be an arbitrary unsigned function.
Show that $f$ is measurable with
\[ \int_X f\, d\# = \sum_{x \in X} f(x). \]

Once one has the unsigned integral, one can define the absolutely convergent integral exactly as in the Lebesgue case:

Definition 1.4.39 (Absolutely convergent integral). Let $(X, \mathcal{B}, \mu)$ be a measure space. A measurable function $f : X \to \mathbf{C}$ is said to be absolutely integrable if the unsigned integral
\[ \|f\|_{L^1(X, \mathcal{B}, \mu)} := \int_X |f|\, d\mu \]
is finite, and we use $L^1(X, \mathcal{B}, \mu)$, $L^1(X)$, or $L^1(\mu)$ to denote the space of absolutely integrable functions. If $f$ is real-valued and absolutely integrable, we define the integral $\int_X f\, d\mu$ by the formula
\[ \int_X f\, d\mu := \int_X f_+\, d\mu - \int_X f_-\, d\mu \]
where $f_+ := \max(f, 0)$, $f_- := \max(-f, 0)$ are the magnitudes of the positive and negative components of $f$. If $f$ is complex-valued and absolutely integrable, we define the integral $\int_X f\, d\mu$ by the formula
\[ \int_X f\, d\mu := \int_X \mathrm{Re}\, f\, d\mu + i \int_X \mathrm{Im}\, f\, d\mu \]
where the two integrals on the right are interpreted as real-valued integrals. It is easy to see that the unsigned, real-valued, and complex-valued integrals defined in this manner are compatible on their common domains of definition.

Clearly, this definition generalises Definition 1.3.17. We record some of the key facts about the absolutely convergent integral:

Exercise 1.4.41. Let $(X, \mathcal{B}, \mu)$ be a measure space.
(i) Show that $L^1(X, \mathcal{B}, \mu)$ is a complex vector space.
(ii) Show that the integration map $f \mapsto \int_X f\, d\mu$ is a complex linear map from $L^1(X, \mathcal{B}, \mu)$ to $\mathbf{C}$.
(iii) Establish the triangle inequality $\|f+g\|_{L^1(\mu)} \leq \|f\|_{L^1(\mu)} + \|g\|_{L^1(\mu)}$ and the homogeneity property $\|cf\|_{L^1(\mu)} = |c| \|f\|_{L^1(\mu)}$ for all $f, g \in L^1(X, \mathcal{B}, \mu)$ and $c \in \mathbf{C}$.
(iv) Show that if $f, g \in L^1(X, \mathcal{B}, \mu)$ are such that $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\int_X f\, d\mu = \int_X g\, d\mu$.
(v) If $f \in L^1(X, \mathcal{B}, \mu)$, and $(X, \mathcal{B}', \mu')$ is a refinement of $(X, \mathcal{B}, \mu)$, then $f \in L^1(X, \mathcal{B}', \mu')$, and $\int_X f\, d\mu' = \int_X f\, d\mu$. (Hint: it is easy to get one inequality. To get the other inequality, first work in the case when $f$ is both bounded and has finite measure support (i.e. is both vertically and horizontally truncated).)
(vi) Show that if $f \in L^1(X, \mathcal{B}, \mu)$, then $\|f\|_{L^1(\mu)} = 0$ if and only if $f$ is zero $\mu$-almost everywhere.
(vii) If $Y \subset X$ is $\mathcal{B}$-measurable and $f \in L^1(X, \mathcal{B}, \mu)$, then $f|_Y \in L^1(Y, \mathcal{B}|_Y, \mu|_Y)$ and $\int_Y f|_Y\, d\mu|_Y = \int_X f 1_Y\, d\mu$. As before, by abuse of notation we write $\int_Y f\, d\mu$ for $\int_Y f|_Y\, d\mu|_Y$.

1.4.5. The convergence theorems.

Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0,+\infty]$ be a sequence of measurable functions. Suppose that as $n \to \infty$, $f_n(x)$ converges pointwise either everywhere, or $\mu$-almost everywhere, to a measurable limit $f$. A basic question in the subject is to determine the conditions under which such pointwise convergence would imply convergence of the integral:
\[ \int_X f_n\, d\mu \overset{?}{\to} \int_X f\, d\mu. \]
To put it another way: when can we ensure that one can interchange integrals and limits,
\[ \lim_{n \to \infty} \int_X f_n\, d\mu \overset{?}{=} \int_X \lim_{n \to \infty} f_n\, d\mu? \]

There are certainly some cases in which one can safely do this:

Exercise 1.4.42 (Uniform convergence on a finite measure space). Suppose that $(X, \mathcal{B}, \mu)$ is a finite measure space (so $\mu(X) < \infty$), and $f_n : X \to [0,+\infty]$ (resp. $f_n : X \to \mathbf{C}$) are a sequence of unsigned measurable functions (resp. absolutely integrable functions)
that converge uniformly to a limit $f$. Show that $\int_X f_n\, d\mu$ converges to $\int_X f\, d\mu$.

However, there are also cases in which one cannot interchange limits and integrals, even when the $f_n$ are unsigned. We give the three classic examples, all of "moving bump" type, though the way in which the bump moves varies from example to example:

Example 1.4.40 (Escape to horizontal infinity). Let $X$ be the real line with Lebesgue measure, and let $f_n := 1_{[n,n+1]}$. Then $f_n$ converges pointwise to $f := 0$, but $\int_{\mathbf{R}} f_n(x)\, dx = 1$ does not converge to $\int_{\mathbf{R}} f(x)\, dx = 0$. Somehow, all the mass in the $f_n$ has escaped by moving off to infinity in a horizontal direction, leaving none behind for the pointwise limit $f$.

Example 1.4.41 (Escape to width infinity). Let $X$ be the real line with Lebesgue measure, and let $f_n := \frac{1}{n} 1_{[0,n]}$. Then $f_n$ now converges uniformly to $f := 0$, but $\int_{\mathbf{R}} f_n(x)\, dx = 1$ still does not converge to $\int_{\mathbf{R}} f(x)\, dx = 0$. Exercise 1.4.42 would prevent this from happening if all the $f_n$ were supported in a single set of finite measure, but the increasingly wide nature of the support of the $f_n$ prevents this from happening.

Example 1.4.42 (Escape to vertical infinity). Let $X$ be the unit interval $[0,1]$ with Lebesgue measure (restricted from $\mathbf{R}$), and let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Now, we have finite measure, and $f_n$ converges pointwise to $f := 0$, but no uniform convergence. And again, $\int_{[0,1]} f_n(x)\, dx = 1$ is not converging to $\int_{[0,1]} f(x)\, dx = 0$. This time, the mass has escaped vertically, through the increasingly large values of $f_n$.

Remark 1.4.43. From the perspective of time-frequency analysis (or perhaps more accurately, space-frequency analysis), these three escapes are analogous (though not quite identical) to escape to spatial infinity, escape to zero frequency, and escape to infinite frequency respectively, thus describing the three different ways in which phase space fails to be compact (if one excises the zero frequency as being singular).
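The three moving bump examples can be tabulated exactly, since each $f_n$ is a constant multiple of the indicator of an interval, so its integral is simply height times width. The sketch below assumes this closed form rather than performing any numerical integration; the function names `horizontal`, `wide`, `vertical`, `integral` and `value` are illustrative only, and the vertical bump is taken with $n \geq 2$ so that its support lies inside $[0,1]$.

```python
from fractions import Fraction


# Each bump is stored as (height, left endpoint, right endpoint).
def horizontal(n):
    # f_n = 1_{[n, n+1]}
    return (Fraction(1), Fraction(n), Fraction(n + 1))


def wide(n):
    # f_n = (1/n) 1_{[0, n]}
    return (Fraction(1, n), Fraction(0), Fraction(n))


def vertical(n):
    # f_n = n 1_{[1/n, 2/n]}, considered for n >= 2
    return (Fraction(n), Fraction(1, n), Fraction(2, n))


def integral(bump):
    # Integral of height * indicator of [left, right].
    height, left, right = bump
    return height * (right - left)


def value(bump, x):
    # Pointwise value of the bump at the point x.
    height, left, right = bump
    return height if left <= x <= right else Fraction(0)


x = Fraction(1, 2)
# Total mass stays equal to 1 for every n ...
masses = [integral(b(n)) for b in (horizontal, wide, vertical) for n in range(2, 50)]
# ... while the value at the fixed point x becomes small or zero for large n.
values_at_x = [value(b(1000), x) for b in (horizontal, wide, vertical)]
```

In each case the mass is preserved along the sequence while the value at a fixed point tends to zero, which is the failure of interchange of limit and integral described in the examples.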
However, once one shuts down these avenues of escape to infinity, it turns out that one can recover convergence of the integral. There are two major ways to accomplish this. One is to enforce monotonicity, which prevents each $f_n$ from abandoning the location where the mass of the preceding $f_1, \ldots, f_{n-1}$ was concentrated and which thus shuts down the above three escape scenarios. More precisely, we have the monotone convergence theorem:

Theorem 1.4.44 (Monotone convergence theorem). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $0 \leq f_1 \leq f_2 \leq \ldots$ be a monotone non-decreasing sequence of unsigned measurable functions on $X$. Then we have
\[ \lim_{n \to \infty} \int_X f_n\, d\mu = \int_X \lim_{n \to \infty} f_n\, d\mu. \]

Note that in the special case when each $f_n$ is an indicator function $f_n = 1_{E_n}$, this theorem collapses to the upwards monotone convergence property (Exercise 1.4.23(ii)). Conversely, the upwards monotone convergence property will play a key role in the proof of this theorem.

Proof. Write $f := \lim_{n \to \infty} f_n = \sup_n f_n$, then $f : X \to [0,+\infty]$ is measurable. Since the $f_n$ are non-decreasing to $f$, we see from monotonicity that $\int_X f_n\, d\mu$ are non-decreasing and bounded above by $\int_X f\, d\mu$, which gives the bound
\[ \lim_{n \to \infty} \int_X f_n\, d\mu \leq \int_X f\, d\mu. \]
It remains to establish the reverse inequality
\[ \int_X f\, d\mu \leq \lim_{n \to \infty} \int_X f_n\, d\mu. \]
By definition, it suffices to show that
\[ \int_X g\, d\mu \leq \lim_{n \to \infty} \int_X f_n\, d\mu \]
whenever $g$ is a simple function that is bounded pointwise by $f$. By vertical truncation we may assume without loss of generality that $g$
also is finite everywhere, then we can write
\[ g = \sum_{i=1}^k c_i 1_{A_i} \]
for some $0 \leq c_i < \infty$ and some disjoint $\mathcal{B}$-measurable sets $A_1, \ldots, A_k$, thus
\[ \int_X g\, d\mu = \sum_{i=1}^k c_i \mu(A_i). \]
Let $0 < \varepsilon < 1$ be arbitrary. Then we have
\[ f(x) = \sup_n f_n(x) > (1-\varepsilon) c_i \]
for all $x \in A_i$. Thus, if we define the sets
\[ A_{i,n} := \{x \in A_i : f_n(x) > (1-\varepsilon) c_i\} \]
then the $A_{i,n}$ increase to $A_i$ and are measurable. By upwards monotonicity of measure, we conclude that
\[ \lim_{n \to \infty} \mu(A_{i,n}) = \mu(A_i). \]
On the other hand, observe the pointwise bound
\[ f_n \geq \sum_{i=1}^k (1-\varepsilon) c_i 1_{A_{i,n}} \]
for any $n$; integrating this, we obtain
\[ \int_X f_n\, d\mu \geq (1-\varepsilon) \sum_{i=1}^k c_i \mu(A_{i,n}). \]
Taking limits as $n \to \infty$, we obtain
\[ \lim_{n \to \infty} \int_X f_n\, d\mu \geq (1-\varepsilon) \sum_{i=1}^k c_i \mu(A_i); \]
sending $\varepsilon \to 0$ we then obtain the claim.

Remark 1.4.45. It is easy to see that the result still holds if the monotonicity $f_n \leq f_{n+1}$ only holds almost everywhere rather than everywhere.
This has a number of important corollaries. Firstly, we can generalise (part of) Tonelli's theorem for exchanging sums (see Theorem 0.0.2):

Corollary 1.4.46 (Tonelli's theorem for sums and integrals). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0,+\infty]$ be a sequence of unsigned measurable functions. Then one has
\[ \int_X \sum_{n=1}^\infty f_n\, d\mu = \sum_{n=1}^\infty \int_X f_n\, d\mu. \]

Proof. Apply the monotone convergence theorem (Theorem 1.4.44) to the partial sums $F_N := \sum_{n=1}^N f_n$.

Exercise 1.4.43. Give an example to show that this corollary can fail if the $f_n$ are assumed to be absolutely integrable rather than unsigned measurable, even if the sum $\sum_{n=1}^\infty f_n(x)$ is absolutely convergent for each $x$. (Hint: think about the three escapes to infinity.)

Exercise 1.4.44 (Borel-Cantelli lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $E_1, E_2, E_3, \ldots$ be a sequence of $\mathcal{B}$-measurable sets such that $\sum_{n=1}^\infty \mu(E_n) < \infty$. Show that almost every $x \in X$ is contained in at most finitely many of the $E_n$ (i.e. $\{n \in \mathbf{N} : x \in E_n\}$ is finite for almost every $x \in X$). (Hint: Apply Tonelli's theorem to the indicator functions $1_{E_n}$.)

Exercise 1.4.45.
(i) Give an alternate proof of the Borel-Cantelli lemma (Exercise 1.4.44) that does not go through any of the convergence theorems, but instead exploits the more basic properties of measure from Exercise 1.4.23.
(ii) Give a counterexample that shows that the Borel-Cantelli lemma can fail if the condition $\sum_{n=1}^\infty \mu(E_n) < \infty$ is relaxed to $\lim_{n \to \infty} \mu(E_n) = 0$.

Secondly, when one does not have monotonicity, one can at least obtain an important inequality, known as Fatou's lemma:
Corollary 1.4.47 (Fatou's lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0,+\infty]$ be a sequence of unsigned measurable functions. Then
\[ \int_X \liminf_{n \to \infty} f_n\, d\mu \leq \liminf_{n \to \infty} \int_X f_n\, d\mu. \]

Proof. Write $F_N := \inf_{n \geq N} f_n$ for each $N$. Then the $F_N$ are measurable and non-decreasing, and hence by the monotone convergence theorem (Theorem 1.4.44)
\[ \int_X \sup_{N > 0} F_N\, d\mu = \sup_{N > 0} \int_X F_N\, d\mu. \]
By definition of lim inf, we have $\sup_{N > 0} F_N = \liminf_{n \to \infty} f_n$. By monotonicity, we have $\int_X F_N\, d\mu \leq \int_X f_n\, d\mu$ for all $n \geq N$, and thus
\[ \int_X F_N\, d\mu \leq \inf_{n \geq N} \int_X f_n\, d\mu. \]
Hence we have
\[ \int_X \liminf_{n \to \infty} f_n\, d\mu \leq \sup_{N > 0} \inf_{n \geq N} \int_X f_n\, d\mu. \]
The claim then follows by another appeal to the definition of the lim inferior.

Remark 1.4.48. Informally, Fatou's lemma tells us that when taking the pointwise limit of unsigned functions $f_n$, mass $\int_X f_n\, d\mu$ can be destroyed in the limit (as was the case in the three key moving bump examples), but it cannot be created in the limit. Of course the unsigned hypothesis is necessary here (consider for instance multiplying any of the moving bump examples by $-1$). While this lemma was stated only for pointwise limits, the same general principle (that mass can be destroyed, but not created, by the process of taking limits) tends to hold for other "weak" notions of convergence. See §1.9 of An epsilon of room, Vol. I for some examples of this.

Finally, we give the other major way to shut down loss of mass via escape to infinity, which is to dominate all of the functions involved by an absolutely convergent one. This result is known as the dominated convergence theorem:
Theorem 1.4.49 (Dominated convergence theorem). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise $\mu$-almost everywhere to a measurable limit $f : X \to \mathbf{C}$. Suppose that there is an unsigned absolutely integrable function $G : X \to [0,+\infty]$ such that $|f_n|$ are pointwise $\mu$-almost everywhere bounded by $G$ for each $n$. Then we have
\[ \lim_{n \to \infty} \int_X f_n\, d\mu = \int_X f\, d\mu. \]

From the moving bump examples we see that this statement fails if there is no absolutely integrable dominating function $G$. The reader is encouraged to see why, in each of the moving bump examples, no such dominating function exists, without appealing to the above theorem. Note also that when each of the $f_n$ is an indicator function $f_n = 1_{E_n}$, the dominated convergence theorem collapses to Exercise 1.4.24.

Proof. By modifying $f_n$, $f$ on a null set, we may assume without loss of generality that the $f_n$ converge to $f$ pointwise everywhere rather than $\mu$-almost everywhere, and similarly we can assume that $f_n$ are bounded by $G$ pointwise everywhere rather than $\mu$-almost everywhere.

By taking real and imaginary parts we may assume without loss of generality that $f_n$, $f$ are real, thus $-G \leq f_n \leq G$ pointwise. Of course, this implies that $-G \leq f \leq G$ pointwise also.

If we apply Fatou's lemma (Corollary 1.4.47) to the unsigned functions $f_n + G$, we see that
\[ \int_X f + G\, d\mu \leq \liminf_{n \to \infty} \int_X f_n + G\, d\mu, \]
which on subtracting the finite quantity $\int_X G\, d\mu$ gives
\[ \int_X f\, d\mu \leq \liminf_{n \to \infty} \int_X f_n\, d\mu. \]
Similarly, if we apply that lemma to the unsigned functions $G - f_n$, we obtain
\[ \int_X G - f\, d\mu \leq \liminf_{n \to \infty} \int_X G - f_n\, d\mu; \]
negating this inequality and then cancelling $\int_X G\, d\mu$ again we conclude that
\[ \limsup_{n \to \infty} \int_X f_n\, d\mu \leq \int_X f\, d\mu. \]
The claim then follows by combining these inequalities.

Remark 1.4.50. We deduced the dominated convergence theorem from Fatou's lemma, and Fatou's lemma from the monotone convergence theorem. However, one can obtain these theorems in a different order, depending on one's taste, as they are so closely related. For instance, in [StSk2005], the logic is somewhat different; one first obtains the slightly simpler bounded convergence theorem, which is the dominated convergence theorem under the assumption that the functions are uniformly bounded and all supported on a single set of finite measure, and then uses that to deduce Fatou's lemma, which in turn is used to deduce the monotone convergence theorem; and then the horizontal and vertical truncation properties are used to extend the bounded convergence theorem to the dominated convergence theorem. It is instructive to view a couple of different derivations of these key theorems to get more of an intuitive understanding as to how they work.

Exercise 1.4.46. Under the hypotheses of the dominated convergence theorem (Theorem 1.4.49), establish also that $\|f_n - f\|_{L^1} \to 0$ as $n \to \infty$.

Exercise 1.4.47 (Almost dominated convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to \mathbf{C}$ be a sequence of measurable functions that converge pointwise $\mu$-almost everywhere to a measurable limit $f : X \to \mathbf{C}$. Suppose that there are unsigned absolutely integrable functions $G, g_1, g_2, \ldots : X \to [0,+\infty]$ such that the $|f_n|$ are pointwise $\mu$-almost everywhere bounded by $G + g_n$, and that $\int_X g_n\, d\mu \to 0$ as $n \to \infty$. Show that
\[ \lim_{n \to \infty} \int_X f_n\, d\mu = \int_X f\, d\mu. \]

Exercise 1.4.48 (Defect version of Fatou's lemma). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_1, f_2, \ldots : X \to [0,+\infty]$ be a sequence of
unsigned absolutely integrable functions that converge pointwise to an absolutely integrable limit $f$. Show that
\[ \int_X f_n\, d\mu - \int_X f\, d\mu - \|f - f_n\|_{L^1(\mu)} \to 0 \]
as $n \to \infty$. (Hint: Apply the dominated convergence theorem (Theorem 1.4.49) to $\min(f_n, f)$.) Informally, this tells us that the gap between the left and right hand sides of Fatou's lemma can be measured by the quantity $\|f - f_n\|_{L^1(\mu)}$.

Exercise 1.4.49. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $g : X \to [0,+\infty]$ be measurable. Show that the function $\mu_g : \mathcal{B} \to [0,+\infty]$ defined by the formula
\[ \mu_g(E) := \int_X 1_E g\, d\mu = \int_E g\, d\mu \]
is a measure. (Such measures are studied in greater detail in §1.2 of An epsilon of room, Vol. I.)

The monotone convergence theorem is, in some sense, a defining property of the unsigned integral, as the following exercise illustrates.

Exercise 1.4.50 (Characterisation of the unsigned integral). Let $(X, \mathcal{B})$ be a measurable space. Let $I : f \mapsto I(f)$ be a map from the space $\mathcal{U}(X, \mathcal{B})$ of unsigned measurable functions $f : X \to [0,+\infty]$ to $[0,+\infty]$ that obeys the following axioms:
(i) (Homogeneity) For every $f \in \mathcal{U}(X, \mathcal{B})$ and $c \in [0,+\infty]$, one has $I(cf) = c I(f)$.
(ii) (Finite additivity) For every $f, g \in \mathcal{U}(X, \mathcal{B})$, one has $I(f+g) = I(f) + I(g)$.
(iii) (Monotone convergence) If $0 \leq f_1 \leq f_2 \leq \ldots$ are a non-decreasing sequence of unsigned measurable functions, then $I(\lim_{n \to \infty} f_n) = \lim_{n \to \infty} I(f_n)$.
Then there exists a unique measure $\mu$ on $(X, \mathcal{B})$ such that $I(f) = \int_X f\, d\mu$ for all $f \in \mathcal{U}(X, \mathcal{B})$. Furthermore, $\mu$ is given by the formula $\mu(E) := I(1_E)$ for all $\mathcal{B}$-measurable sets $E$.
Exercise 1.4.51. Let $(X, \mathcal{B}, \mu)$ be a finite measure space (i.e. $\mu(X) < \infty$), and let $f : X \to \mathbf{R}$ be a bounded function. Suppose that $\mu$ is complete (see Definition 1.4.31). Suppose that the upper integral
\[ \overline{\int_X} f\, d\mu := \inf_{g \geq f;\ g \text{ simple}} \int_X g\, d\mu \]
and lower integral
\[ \underline{\int_X} f\, d\mu := \sup_{h \leq f;\ h \text{ simple}} \int_X h\, d\mu \]
agree. Show that $f$ is measurable. (This is a converse to Exercise 1.3.11.)

We will continue to see the monotone convergence theorem, Fatou's lemma, and the dominated convergence theorem make an appearance throughout the rest of this text (and in An epsilon of room, Vol. I).

1.5. Modes of convergence

If one has a sequence $x_1, x_2, x_3, \ldots \in \mathbf{R}$ of real numbers $x_n$, it is unambiguous what it means for that sequence to converge to a limit $x \in \mathbf{R}$: it means that for every $\varepsilon > 0$, there exists an $N$ such that $|x_n - x| \leq \varepsilon$ for all $n > N$. Similarly for a sequence $z_1, z_2, z_3, \ldots \in \mathbf{C}$ of complex numbers $z_n$ converging to a limit $z \in \mathbf{C}$.

More generally, if one has a sequence $v_1, v_2, v_3, \ldots$ of $d$-dimensional vectors $v_n$ in a real vector space $\mathbf{R}^d$ or complex vector space $\mathbf{C}^d$, it is also unambiguous what it means for that sequence to converge to a limit $v \in \mathbf{R}^d$ or $v \in \mathbf{C}^d$; it means that for every $\varepsilon > 0$, there exists an $N$ such that $\|v_n - v\| \leq \varepsilon$ for all $n \geq N$. Here, the norm $\|v\|$ of a vector $v = (v^{(1)}, \ldots, v^{(d)})$ can be chosen to be the Euclidean norm $\|v\|_2 := (\sum_{j=1}^d |v^{(j)}|^2)^{1/2}$, the supremum norm $\|v\|_\infty := \sup_{1 \leq j \leq d} |v^{(j)}|$, or any other number of norms, but for the purposes of convergence, these norms are all equivalent; a sequence of vectors converges in the Euclidean norm if and only if it converges in the supremum norm, and similarly for any other two norms on the finite-dimensional space $\mathbf{R}^d$ or $\mathbf{C}^d$.
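The equivalence of the Euclidean and supremum norms in finite dimension comes from the elementary comparison $\|v\|_\infty \leq \|v\|_2 \leq \sqrt{d}\, \|v\|_\infty$, which is a standard fact not spelled out in the text. The following sketch checks it numerically on a few sample vectors; the function names and the small floating-point tolerance `1e-12` are assumptions made for the illustration.

```python
import math


def sup_norm(v):
    # Supremum norm: largest absolute value of a coordinate.
    return max(abs(c) for c in v)


def euclidean_norm(v):
    # Euclidean norm; abs() handles real and complex coordinates alike.
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def norms_comparable(v, tol=1e-12):
    """Check sup_norm(v) <= euclidean_norm(v) <= sqrt(d) * sup_norm(v)."""
    d = len(v)
    s, e = sup_norm(v), euclidean_norm(v)
    return s <= e + tol and e <= math.sqrt(d) * s + tol


samples = [(3, -4, 0), (1, 1, 1, 1), (3 + 4j, 0), (0.5, -0.25)]
checks = [norms_comparable(v) for v in samples]
```

Since the constants $1$ and $\sqrt{d}$ do not depend on $v$, a sequence $v_n - v$ is small in one norm exactly when it is small in the other, which is the equivalence used in the paragraph above.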
If however one has a sequence $f_1, f_2, f_3, \ldots$ of functions $f_n : X \to \mathbf{R}$ or $f_n : X \to \mathbf{C}$ on a common domain $X$, and a putative limit $f : X \to \mathbf{R}$ or $f : X \to \mathbf{C}$, there can now be many different ways in which the sequence $f_n$ may or may not converge to the limit $f$. (One could also consider convergence of functions $f_n : X_n \to \mathbf{C}$ on different domains $X_n$, but we will not discuss this issue at all here.) This is in contrast with the situation with scalars $x_n$ or $z_n$ (which corresponds to the case when $X$ is a single point) or vectors $v_n$ (which corresponds to the case when $X$ is a finite set such as $\{1, \ldots, d\}$). Once $X$ becomes infinite, the functions $f_n$ acquire an infinite number of degrees of freedom, and this allows them to approach $f$ in any number of inequivalent ways.

What different types of convergence are there? As an undergraduate, one learns of the following two basic modes of convergence:
(i) We say that $f_n$ converges to $f$ pointwise if, for every $x \in X$, $f_n(x)$ converges to $f(x)$. In other words, for every $\varepsilon > 0$ and $x \in X$, there exists $N$ (that depends on both $\varepsilon$ and $x$) such that $|f_n(x) - f(x)| \leq \varepsilon$ whenever $n \geq N$.
(ii) We say that $f_n$ converges to $f$ uniformly if, for every $\varepsilon > 0$, there exists $N$ such that for every $n \geq N$, $|f_n(x) - f(x)| \leq \varepsilon$ for every $x \in X$. The difference between uniform convergence and pointwise convergence is that with the former, the time $N$ at which $f_n(x)$ must be permanently $\varepsilon$-close to $f(x)$ is not permitted to depend on $x$, but must instead be chosen uniformly in $x$.

Uniform convergence implies pointwise convergence, but not conversely. A typical example: the functions $f_n : \mathbf{R} \to \mathbf{R}$ defined by $f_n(x) := x/n$ converge pointwise to the zero function $f(x) := 0$, but not uniformly.

However, pointwise and uniform convergence are only two of the dozens of modes of convergence that are of importance in analysis. We will not attempt to exhaustively enumerate these modes here (but see §1.9 of An epsilon of room, Vol. I). We will, however, discuss some of the modes of convergence that arise from measure theory, when the domain $X$ is equipped with the structure
for every ε > 0. for every ε > 0. or in L∞ norm if. . In the context of probability theory (see Section 2. we have some additional modes of convergence: (i) We say that fn converges to f pointwise almost everywhere if. the pointwise and uniform modes of convergence can be aﬀected if one modiﬁes fn or f even on a single point. B. Measure theory of a measure space (X. Remark 1. (iii) We say that fn converges to f almost uniformly if. and the functions fn (and their limit f ) are measurable with respect to this space. Vol. fn (x) − f (x) ≤ ε for µalmost every x ∈ X. pointwise convergence almost everywhere is often referred to as almost sure convergence.116 1. there exists an exceptional set E ∈ B of measure µ(E) ≤ ε such that fn converges uniformly to f on the complement of E. the measures µ({x ∈ X : fn (x) − f (x) ≥ ε}) converge to zero as n → ∞. The L1 and L∞ modes of converges are special cases of the Lp mode of convergence. (ii) We say that fn converges to f uniformly almost everywhere. In contrast. there exists N such that for every n ≥ N .3). Observe that each of these ﬁve modes of convergence is unaﬀected if one modiﬁes fn or f on a set of measure zero. which is discussed in §1. for (µ)almost everywhere x ∈ X. µ). I. (v) We say that fn converges to f in measure if. In this context.3 of An epsilon of room. in which fn and f are interpreted as random variables. fn (x) converges to f (x). for every ε > 0.1.5. convergence in L1 norm is often referred to as convergence in mean. essentially uniformly. (iv) We say that fn converges to f in L1 norm if the quantity fn − f L1 (µ) = X fn (x) − f (x) dµ converges to 0 as n → ∞. and convergence in measure is often referred to as convergence in probability.
Exercise 1.5.1 (Linearity of convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, let $f_n, g_n : X \to \mathbf{C}$ be sequences of measurable functions, and let $f, g : X \to \mathbf{C}$ be measurable functions.

(i) Show that $f_n$ converges to $f$ along one of the above seven modes of convergence if and only if $|f_n - f|$ converges to $0$ along the same mode.

(ii) If $f_n$ converges to $f$ along one of the above seven modes of convergence, and $g_n$ converges to $g$ along the same mode, show that $f_n + g_n$ converges to $f + g$ along the same mode, and that $c f_n$ converges to $c f$ along the same mode for any $c \in \mathbf{C}$.

(iii) (Squeeze test) If $f_n$ converges to $0$ along one of the above seven modes, and $|g_n| \leq f_n$ pointwise for each $n$, show that $g_n$ converges to $0$ along the same mode.

We have some easy implications between modes:

Exercise 1.5.2 (Easy implications). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions.

(i) If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ pointwise.

(ii) If $f_n$ converges to $f$ uniformly, then $f_n$ converges to $f$ in $L^\infty$ norm. Conversely, if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ uniformly outside of a null set (i.e. there exists a null set $E$ such that the restriction $f_n|_{X \setminus E}$ of $f_n$ to the complement of $E$ converges to the restriction $f|_{X \setminus E}$ of $f$).

(iii) If $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ converges to $f$ almost uniformly.

(iv) If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ pointwise almost everywhere.

(v) If $f_n$ converges to $f$ pointwise, then $f_n$ converges to $f$ pointwise almost everywhere.

(vi) If $f_n$ converges to $f$ in $L^1$ norm, then $f_n$ converges to $f$ in measure.

(vii) If $f_n$ converges to $f$ almost uniformly, then $f_n$ converges to $f$ in measure.

The reader is encouraged to draw a diagram that summarises the logical implications between the seven modes of convergence that the above exercise describes.

We give four key examples that distinguish between these modes, in the case when $X$ is the real line $\mathbf{R}$ with Lebesgue measure. The first three of these examples already were introduced in Section 1.4, but the fourth is new, and also important.

Example 1.5.2 (Escape to horizontal infinity). Let $f_n := 1_{[n, n+1]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere), but not uniformly, in $L^\infty$ norm, almost uniformly, in $L^1$ norm, or in measure.

Example 1.5.3 (Escape to width infinity). Let $f_n := \frac{1}{n} 1_{[0, n]}$. Then $f_n$ converges to zero uniformly (and thus, pointwise, pointwise almost everywhere, in $L^\infty$ norm, almost uniformly, and in measure), but not in $L^1$ norm.

Example 1.5.4 (Escape to vertical infinity). Let $f_n := n 1_{[\frac{1}{n}, \frac{2}{n}]}$. Then $f_n$ converges to zero pointwise (and thus, pointwise almost everywhere) and almost uniformly (and hence in measure), but not uniformly, in $L^\infty$ norm, or in $L^1$ norm.

Example 1.5.5 (Typewriter sequence). Let $f_n$ be defined by the formula
$$f_n := 1_{[\frac{n - 2^k}{2^k}, \frac{n - 2^k + 1}{2^k}]}$$
whenever $k \geq 0$ and $2^k \leq n < 2^{k+1}$. This is a sequence of indicator functions of intervals of decreasing length, marching across the unit interval $[0, 1]$ over and over again. Then $f_n$ converges to zero in measure and in $L^1$ norm, but not pointwise almost everywhere (and hence also not pointwise, not almost uniformly, nor in $L^\infty$ norm, nor uniformly).
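The behaviour of the typewriter sequence can be checked by direct computation. The sketch below (with hypothetical helper names, and using exact rational arithmetic) lists, for each dyadic block $2^k \leq n < 2^{k+1}$, the $L^1$ norm $2^{-k}$ of the functions in that block together with the indices $n$ in the block for which $f_n(x) = 1$ at the sample point $x = 1/3$.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints of the interval whose indicator is f_n (Example 1.5.5), n >= 1."""
    k = n.bit_length() - 1            # the unique k >= 0 with 2**k <= n < 2**(k+1)
    left = Fraction(n - 2**k, 2**k)
    return left, left + Fraction(1, 2**k)

def l1_norm(n):
    """L^1 norm of f_n, i.e. the length of its interval."""
    a, b = typewriter_interval(n)
    return b - a

def hits(x, n):
    """True if f_n(x) = 1."""
    a, b = typewriter_interval(n)
    return a <= x <= b

x = Fraction(1, 3)
for k in range(5):
    block = range(2**k, 2**(k + 1))
    print(k, l1_norm(2**k), [n for n in block if hits(x, n)])
```

Every block contains at least one index with $f_n(x) = 1$, because the intervals in a block cover $[0, 1]$; so $f_n(x)$ returns to $1$ infinitely often even though the $L^1$ norms tend to zero.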
Remark 1.5.6. The $L^\infty$ norm $\|f\|_{L^\infty(\mu)}$ of a measurable function $f : X \to \mathbf{C}$ is defined to be the infimum of all the quantities $M \in [0, +\infty]$ that are essential upper bounds for $f$ in the sense that $|f(x)| \leq M$ for almost every $x$. Then $f_n$ converges to $f$ in $L^\infty$ norm if and only if $\|f_n - f\|_{L^\infty(\mu)} \to 0$ as $n \to \infty$. The $L^\infty$ and $L^1$ norms are part of the larger family of $L^p$ norms, studied in §1.3 of An epsilon of room, Vol. I.

One particular advantage of $L^1$ convergence is that, in the case when the $f_n$ are absolutely integrable, it implies convergence of the integrals,
$$\int_X f_n\, d\mu \to \int_X f\, d\mu,$$
as one sees from the triangle inequality. Unfortunately, none of the other modes of convergence automatically imply this convergence of the integral, as the above examples show.

The purpose of these notes is to compare these modes of convergence with each other. Unfortunately, the relationship between these modes is not particularly simple; unlike the situation with pointwise and uniform convergence, one cannot simply rank these modes in a linear order from strongest to weakest. This is ultimately because the different modes react in different ways to the three "escape to infinity" scenarios described above, as well as to the "typewriter" behaviour when a single set is "overwritten" many times. On the other hand, if one imposes some additional assumptions to shut down one or more of these escape to infinity scenarios, such as a finite measure hypothesis $\mu(X) < \infty$ or a uniform integrability hypothesis, then one can obtain some additional implications between the different modes.

1.5.1. Uniqueness. Throughout these notes, $(X, \mathcal{B}, \mu)$ denotes a measure space. We abbreviate "$\mu$-almost everywhere" as "almost everywhere" throughout.

Even though the modes of convergence all differ from each other, they are all compatible in the sense that they never disagree about which function $f$ a sequence of functions $f_n$ converges to, outside of a set of measure zero. More precisely:
Proposition 1.5.7. Let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions, and let $f, g : X \to \mathbf{C}$ be two additional measurable functions. Suppose that $f_n$ converges to $f$ along one of the seven modes of convergence defined above, and $f_n$ converges to $g$ along another of the seven modes of convergence (or perhaps the same mode of convergence as for $f$). Then $f$ and $g$ agree almost everywhere.

Note that the conclusion is the best one can hope for in the case of the last five modes of convergence, since as remarked earlier, these modes of convergence are unaffected if one modifies $f$ or $g$ on a set of measure zero.

Proof. In view of Exercise 1.5.2, we may assume that $f_n$ converges to $f$ either pointwise almost everywhere, or in measure, and similarly that $f_n$ converges to $g$ either pointwise almost everywhere, or in measure.

Suppose first that $f_n$ converges to both $f$ and $g$ pointwise almost everywhere. Then by Exercise 1.5.1, $0$ converges to $f - g$ pointwise almost everywhere, which clearly implies that $f - g$ is zero almost everywhere, and the claim follows. A similar argument applies if $f_n$ converges to both $f$ and $g$ in measure.

By symmetry, the only remaining case that needs to be considered is when $f_n$ converges to $f$ pointwise almost everywhere, and $f_n$ converges to $g$ in measure. We need to show that $f = g$ almost everywhere. It suffices to show that for every $\varepsilon > 0$, one has $|f(x) - g(x)| \leq \varepsilon$ for almost every $x$, as the claim then follows by setting $\varepsilon = 1/m$ for $m = 1, 2, 3, \ldots$ and using the fact that the countable union of null sets is again a null set.

Fix $\varepsilon > 0$, and let $A := \{x \in X : |f(x) - g(x)| > \varepsilon\}$. This is a measurable set; our task is to show that it has measure zero. Suppose for contradiction that $\mu(A) > 0$. We consider the sets
$$A_N := \{x \in A : |f_n(x) - f(x)| \leq \varepsilon/2 \text{ for all } n \geq N\}.$$
These are measurable sets that are increasing in $N$. As $f_n$ converges to $f$ almost everywhere, we see that almost every $x \in A$ belongs to at least one of the $A_N$, thus $\bigcup_{N=1}^\infty A_N$ is equal to $A$ outside of a null set. In particular,
$$\mu\Big(\bigcup_{N=1}^\infty A_N\Big) > 0.$$
Applying monotone convergence for sets, we conclude that $\mu(A_N) > 0$ for some finite $N$. But by the triangle inequality, we have $|f_n(x) - g(x)| > \varepsilon/2$ for all $x \in A_N$ and all $n \geq N$. As a consequence, $f_n$ cannot converge in measure to $g$, which gives the desired contradiction.

1.5.2. The case of a step function. One way to appreciate the distinctions between the above modes of convergence is to focus on the case when $f = 0$, and when each of the $f_n$ is a step function, by which we mean a constant multiple $f_n = A_n 1_{E_n}$ of the indicator function of a measurable set $E_n$. For simplicity we will assume that the $A_n > 0$ are positive reals, and that the $E_n$ have a positive measure $\mu(E_n) > 0$. We also assume the $A_n$ exhibit one of two modes of behaviour: either the $A_n$ converge to zero, or else they are bounded away from zero (i.e. there exists $c > 0$ such that $A_n \geq c$ for every $n$). It is easy to see that if a sequence $A_n$ does not converge to zero, then it has a subsequence that is bounded away from zero, so it does not cause too much loss of generality to restrict to one of these two cases.

Given such a sequence $f_n = A_n 1_{E_n}$ of step functions, we now ask, for each of the seven modes of convergence, what it means for this sequence to converge to zero along that mode. It turns out that the answer to this question is controlled more or less completely by the following three quantities:

(i) The height $A_n$ of the $n$th function $f_n$;

(ii) The width $\mu(E_n)$ of the $n$th function $f_n$; and

(iii) The $N$th tail support $E^*_N := \bigcup_{n \geq N} E_n$ of the sequence $f_1, f_2, f_3, \ldots$.

Indeed, we have:

Exercise 1.5.3 (Convergence for step functions). Let the notation and assumptions be as above. Establish the following claims:
(i) $f_n$ converges uniformly to zero if and only if $A_n \to 0$ as $n \to \infty$.

(ii) $f_n$ converges in $L^\infty$ norm to zero if and only if $A_n \to 0$ as $n \to \infty$.

(iii) $f_n$ converges almost uniformly to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\mu(E^*_N) \to 0$ as $N \to \infty$.

(iv) $f_n$ converges pointwise to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\bigcap_{N=1}^\infty E^*_N = \emptyset$.

(v) $f_n$ converges pointwise almost everywhere to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\bigcap_{N=1}^\infty E^*_N$ is a null set.

(vi) $f_n$ converges in measure to zero if and only if $A_n \to 0$ as $n \to \infty$, or $\mu(E_n) \to 0$ as $n \to \infty$.

(vii) $f_n$ converges in $L^1$ norm to zero if and only if $A_n \mu(E_n) \to 0$ as $n \to \infty$.

To put it more informally: when the height goes to zero, then one has convergence to zero in all modes except possibly for $L^1$ convergence, which requires that the product of the height and the width goes to zero. If instead the height is bounded away from zero and the width is positive, then we never have uniform or $L^\infty$ convergence, but we have convergence in measure if the width goes to zero, we have almost uniform convergence if the tail support (which has larger measure than the width) has measure that goes to zero, we have pointwise almost everywhere convergence if the tail support shrinks to a null set, and pointwise convergence if the tail support shrinks to the empty set.

It is instructive to compare this exercise with Exercise 1.5.2, or with the four examples given in the introduction. In particular:

(i) In the escape to horizontal infinity scenario, the height and width do not shrink to zero, but the tail set shrinks to the empty set (while remaining of infinite measure throughout).

(ii) In the escape to width infinity scenario, the height goes to zero, but the width (and tail support) go to infinity, causing the $L^1$ norm to stay bounded away from zero.

(iii) In the escape to vertical infinity, the height goes to infinity, but the width (and tail support) go to zero (or the empty set), causing the $L^1$ norm to stay bounded away from zero.

(iv) In the typewriter example, the width goes to zero, but the height and the tail support stay fixed (and thus bounded away from zero).

Remark 1.5.8. The monotone convergence theorem (Theorem 1.4.44) can also be specialised to this case. Observe that the $f_n = A_n 1_{E_n}$ are monotone increasing if and only if $A_n \leq A_{n+1}$ and $E_n \subset E_{n+1}$ for each $n$. In such cases, observe that the $f_n$ converge pointwise to $f := A 1_E$, where $A := \lim_{n \to \infty} A_n$ and $E := \bigcup_{n=1}^\infty E_n$. The monotone convergence theorem then asserts that $A_n \mu(E_n) \to A \mu(E)$ as $n \to \infty$, which is a consequence of the monotone convergence theorem $\mu(E_n) \to \mu(E)$ for sets.

1.5.3. Finite measure spaces. The situation simplifies somewhat if the space $X$ has finite measure (and in particular, in the case when $(X, \mathcal{B}, \mu)$ is a probability space, see Section 2.3). This shuts down two of the four examples (namely, escape to horizontal infinity or width infinity) and creates a few more equivalences. Indeed, from Egorov's theorem (Exercise 1.4.31), we now have

Theorem 1.5.9 (Egorov's theorem, again). Let $X$ have finite measure, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions. Then $f_n$ converges to $f$ pointwise almost everywhere if and only if $f_n$ converges to $f$ almost uniformly.

Note that when one specialises to step functions using Exercise 1.5.3, then Egorov's theorem collapses to the downward monotone convergence property for sets (Exercise 1.4.23(iii)).

Another nice feature of the finite measure case is that $L^\infty$ convergence implies $L^1$ convergence:

Exercise 1.5.4. Let $X$ have finite measure, and let $f_n : X \to \mathbf{C}$ and $f : X \to \mathbf{C}$ be measurable functions. Show that if $f_n$ converges to $f$ in $L^\infty$ norm, then $f_n$ also converges to $f$ in $L^1$ norm.

1.5.4. Fast convergence. The typewriter example shows that $L^1$ convergence is not strong enough to force almost uniform or pointwise almost everywhere convergence. However, this can be rectified if one assumes that the $L^1$ convergence is sufficiently fast:

Exercise 1.5.5 (Fast $L^1$ convergence). Suppose that $f_n, f : X \to \mathbf{C}$ are measurable functions such that $\sum_{n=1}^\infty \|f_n - f\|_{L^1(\mu)} < \infty$; thus, not only do the quantities $\|f_n - f\|_{L^1(\mu)}$ go to zero (which would mean $L^1$ convergence), but they converge in an absolutely summable fashion.

(i) Show that $f_n$ converges pointwise almost everywhere to $f$.

(ii) Show that $f_n$ converges almost uniformly to $f$.

(Hint: If you have trouble getting started, try working first in the special case in which $f_n = A_n 1_{E_n}$ are step functions and $f = 0$ and use Exercise 1.5.3 in order to gain some intuition. The second part of the exercise implies the first, but the first is a little easier to prove and may thus serve as a useful warmup. The $\varepsilon/2^n$ trick may come in handy for the second part.)

As a corollary, we see that $L^1$ convergence implies almost uniform or pointwise almost everywhere convergence if we are allowed to pass to a subsequence:

Corollary 1.5.10. Suppose that $f_n : X \to \mathbf{C}$ are a sequence of measurable functions that converge in $L^1$ norm to a limit $f$. Then there exists a subsequence $f_{n_j}$ that converges almost uniformly (and hence, pointwise almost everywhere) to $f$ (while remaining convergent in $L^1$ norm, of course).

Proof. Since $\|f_n - f\|_{L^1(\mu)} \to 0$ as $n \to \infty$, we can select $n_1 < n_2 < n_3 < \ldots$ such that $\|f_{n_j} - f\|_{L^1(\mu)} \leq 2^{-j}$ (say). This is enough for the previous exercise to apply.

Actually, one can strengthen this corollary a bit by relaxing $L^1$ convergence to convergence in measure:

Exercise 1.5.6. Suppose that $f_n : X \to \mathbf{C}$ are a sequence of measurable functions that converge in measure to a limit $f$. Then there exists a subsequence $f_{n_j}$ that converges almost uniformly (and hence, pointwise almost everywhere) to $f$. (Hint: Choose the $n_j$ so that the sets $\{x \in X : |f_{n_j}(x) - f(x)| > 1/j\}$ have a suitably small measure.)
It is instructive to see how this subsequence is extracted in the case of the typewriter sequence. In general, one can view the operation of passing to a subsequence as being able to eliminate "typewriter" situations in which the tail support is much larger than the width.

Exercise 1.5.7. Let $(X, \mathcal{B}, \mu)$ be a measure space, let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions converging pointwise almost everywhere as $n \to \infty$ to a measurable limit $f : X \to \mathbf{C}$, and for each $n$, let $f_{n,m} : X \to \mathbf{C}$ be a sequence of measurable functions converging pointwise almost everywhere as $m \to \infty$ (keeping $n$ fixed) to $f_n$.

(i) If $\mu(X)$ is finite, show that there exists a sequence $m_1, m_2, \ldots$ such that $f_{n, m_n}$ converges pointwise almost everywhere to $f$.

(ii) Show the same claim is true if, instead of assuming that $\mu(X)$ is finite, we merely assume that $X$ is $\sigma$-finite, i.e. it is the countable union of sets of finite measure.

(The claim can fail if $X$ is not $\sigma$-finite. A counterexample is if $X = \mathbf{N}^{\mathbf{N}}$ with counting measure, $f_n$ and $f$ are identically zero for all $n \in \mathbf{N}$, and $f_{n,m}$ is the indicator function of the space of all sequences $(a_i)_{i \in \mathbf{N}} \in \mathbf{N}^{\mathbf{N}}$ with $a_n \geq m$.)

Exercise 1.5.8. Let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that the following are equivalent:

(i) $f_n$ converges in measure to $f$.

(ii) Every subsequence $f_{n_j}$ of the $f_n$ has a further subsequence $f_{n_{j_i}}$ that converges almost uniformly to $f$.

1.5.5. Domination and uniform integrability. Now we turn to the reverse question, of whether almost uniform convergence, pointwise almost everywhere convergence, or convergence in measure can imply $L^1$ convergence. The escape to vertical and width infinity examples show that without any further hypotheses, the answer to this question is no. However, one can do better if one places some domination hypotheses on the $f_n$ that shut down both of these escape routes.
We say that a sequence $f_n : X \to \mathbf{C}$ is dominated if there exists an absolutely integrable function $g : X \to \mathbf{C}$ such that $|f_n(x)| \leq g(x)$ for all $n$ and almost every $x$. For instance, if $X$ has finite measure and the $f_n$ are uniformly bounded, then they are dominated. Observe that the sequences in the vertical and width escape to infinity examples are not dominated (why?). The dominated convergence theorem (Theorem 1.4.49) then asserts that if $f_n$ converges to $f$ pointwise almost everywhere, then it necessarily converges to $f$ in $L^1$ norm (and hence also in measure). Here is a variant:

Exercise 1.5.9. Suppose that $f_n : X \to \mathbf{C}$ are a dominated sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that $f_n$ converges in $L^1$ norm to $f$ if and only if $f_n$ converges in measure to $f$. (Hint: one way to establish the "if" direction is first show that every subsequence of the $f_n$ has a further subsequence that converges in $L^1$ to $f$, using Exercise 1.5.6 and the dominated convergence theorem (Theorem 1.4.49). Alternatively, use monotone convergence to find a set $E$ of finite measure such that $\int_{X \setminus E} g\, d\mu$, and hence $\int_{X \setminus E} f_n\, d\mu$ and $\int_{X \setminus E} f\, d\mu$, are small.)

There is a more general notion than domination, known as uniform integrability, which serves as a substitute for domination in many (but not all) contexts.

Definition 1.5.11 (Uniform integrability). A sequence $f_n : X \to \mathbf{C}$ of absolutely integrable functions is said to be uniformly integrable if the following three statements hold:

(i) (Uniform bound on $L^1$ norm) One has $\sup_n \|f_n\|_{L^1(\mu)} = \sup_n \int_X |f_n|\, d\mu < +\infty$.

(ii) (No escape to vertical infinity) One has $\sup_n \int_{|f_n| \geq M} |f_n|\, d\mu \to 0$ as $M \to +\infty$.

(iii) (No escape to width infinity) One has $\sup_n \int_{|f_n| \leq \delta} |f_n|\, d\mu \to 0$ as $\delta \to 0$.
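Conditions (ii) and (iii) are easy to evaluate for a step function $f = A 1_E$: the integral of $|f|$ over $\{|f| \geq M\}$ is $A \mu(E)$ when $A \geq M$ and zero otherwise, and similarly for $\{|f| \leq \delta\}$. The sketch below tabulates these truncated integrals for three families. The helper names are hypothetical, and the supremum over $n$ is replaced by a maximum over $1 \leq n \leq$ `n_max`, which is only a finite-range stand-in (it is informative only for thresholds well inside that range).

```python
from fractions import Fraction

def mass_where_large(height, width, M):
    """Integral of |f| over {|f| >= M} for f = height * 1_E with mu(E) = width."""
    return height * width if height >= M else Fraction(0)

def mass_where_small(height, width, delta):
    """Integral of |f| over {|f| <= delta} for the same step function."""
    return height * width if height <= delta else Fraction(0)

def sup_over_n(mass, family, threshold, n_max=1000):
    """Finite-range stand-in for sup_n: family(n) returns (height, width) of f_n."""
    return max(mass(*family(n), threshold) for n in range(1, n_max + 1))

def vertical(n):   # f_n = n 1_[1/n, 2/n]
    return Fraction(n), Fraction(1, n)

def wide(n):       # f_n = (1/n) 1_[0, n]
    return Fraction(1, n), Fraction(n)

def shrinking(n):  # f_n = 1_[0, 1/n], dominated by 1_[0, 1]
    return Fraction(1), Fraction(1, n)

for M in (10, 100):
    print("M =", M, sup_over_n(mass_where_large, vertical, M),
          sup_over_n(mass_where_large, shrinking, M))
for delta in (Fraction(1, 10), Fraction(1, 100)):
    print("delta =", delta, sup_over_n(mass_where_small, wide, delta),
          sup_over_n(mass_where_small, shrinking, delta))
```

In this range the vertical family keeps truncated mass $1$ above every height $M$, and the wide family keeps mass $1$ below every $\delta$, so (ii) and (iii) respectively fail for them; the dominated family has truncated mass $0$ in both tests.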
Remark 1.5.12. It is instructive to understand uniform integrability in the step function case $f_n = A_n 1_{E_n}$. The uniform bound on the $L^1$ norm then asserts that $A_n \mu(E_n)$ stays bounded. The lack of escape to vertical infinity means that along any subsequence for which $A_n \to \infty$, $A_n \mu(E_n)$ must go to zero. Similarly, the lack of escape to width infinity means that along any subsequence for which $A_n \to 0$, $A_n \mu(E_n)$ must go to zero.

Exercise 1.5.10. (i) Show that if $f$ is an absolutely integrable function, then the constant sequence $f_n = f$ is uniformly integrable. (Hint: use the monotone convergence theorem.)

(ii) Show that every dominated sequence of measurable functions is uniformly integrable.

(iii) Give an example of a sequence that is uniformly integrable but not dominated.

In the case of a finite measure space, there is no escape to width infinity, and the criterion for uniform integrability simplifies to just that of excluding vertical infinity:

Exercise 1.5.11. Suppose that $X$ has finite measure, and let $f_n : X \to \mathbf{C}$ be a sequence of measurable functions. Show that $f_n$ is uniformly integrable if and only if $\sup_n \int_{|f_n| \geq M} |f_n|\, d\mu \to 0$ as $M \to +\infty$.

Exercise 1.5.12 (Uniform $L^p$ bound on finite measure implies uniform integrability). Suppose that $X$ has finite measure, let $1 < p < \infty$, and suppose that $f_n : X \to \mathbf{C}$ is a sequence of measurable functions such that $\sup_n \int_X |f_n|^p\, d\mu < \infty$. Show that the sequence $f_n$ is uniformly integrable.

Exercise 1.5.13. Let $f_n : X \to \mathbf{C}$ be a uniformly integrable sequence of functions. Show that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\int_E |f_n|\, d\mu \leq \varepsilon$$
whenever $n \geq 1$ and $E$ is a measurable set with $\mu(E) \leq \delta$.

Exercise 1.5.14. This exercise is a partial converse to Exercise 1.5.13. Let $X$ be a probability space, and let $f_n : X \to \mathbf{C}$ be a sequence of absolutely integrable functions with $\sup_n \|f_n\|_{L^1} < \infty$.
Suppose that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\int_E |f_n|\, d\mu \leq \varepsilon$$
whenever $n \geq 1$ and $E$ is a measurable set with $\mu(E) \leq \delta$. Show that the sequence $f_n$ is uniformly integrable.

The dominated convergence theorem (Theorem 1.4.49) does not have an analogue in the uniformly integrable setting:

Exercise 1.5.15. Give an example of a sequence $f_n$ of uniformly integrable functions that converge pointwise almost everywhere to zero, but do not converge almost uniformly, in measure, or in $L^1$ norm.

However, one does have an analogue of Exercise 1.5.9:

Theorem 1.5.13 (Uniformly integrable convergence in measure). Let $f_n : X \to \mathbf{C}$ be a uniformly integrable sequence of functions, and let $f : X \to \mathbf{C}$ be another function. Then $f_n$ converges in $L^1$ norm to $f$ if and only if $f_n$ converges to $f$ in measure.

Proof. The "only if" part follows from Exercise 1.5.2, so we establish the "if" part.

By uniform integrability, there exists a finite $A > 0$ such that
$$\int_X |f_n|\, d\mu \leq A$$
for all $n$. By Exercise 1.5.6, there is a subsequence of the $f_n$ that converges pointwise almost everywhere to $f$. Applying Fatou's lemma (Corollary 1.4.47), we conclude that
$$\int_X |f|\, d\mu \leq A,$$
thus $f$ is absolutely integrable. Now let $\varepsilon > 0$ be arbitrary. By uniform integrability, one can find $\delta > 0$ such that
$$\int_{|f_n| \leq \delta} |f_n|\, d\mu \leq \varepsilon \qquad (1.15)$$
for all $n$. By monotone convergence, and decreasing $\delta$ if necessary, we may say the same for $f$, thus
$$\int_{|f| \leq \delta} |f|\, d\mu \leq \varepsilon. \qquad (1.16)$$
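Let $0 < \kappa < \delta/2$ be another small quantity (that can depend on $A$, $\varepsilon$, $\delta$) that we will choose a bit later. From (1.15), (1.16) and the hypothesis $\kappa < \delta/2$ we have
$$\int_{|f_n - f| < \kappa;\ |f| \leq \delta/2} |f_n|\, d\mu \leq \varepsilon$$
and
$$\int_{|f_n - f| < \kappa;\ |f| \leq \delta/2} |f|\, d\mu \leq \varepsilon$$
and hence by the triangle inequality
$$\int_{|f_n - f| < \kappa;\ |f| \leq \delta/2} |f - f_n|\, d\mu \leq 2\varepsilon. \qquad (1.17)$$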
Finally, from Markov's inequality (Exercise 1.4.36(vi)) we have
$$\mu(\{x : |f(x)| > \delta/2\}) \leq \frac{A}{\delta/2}$$
and thus
$$\int_{|f_n - f| < \kappa;\ |f| > \delta/2} |f - f_n|\, d\mu \leq \frac{A}{\delta/2} \kappa.$$
In particular, by shrinking $\kappa$ further if necessary we see that
$$\int_{|f_n - f| < \kappa;\ |f| > \delta/2} |f - f_n|\, d\mu \leq \varepsilon$$
and hence by (1.17)
$$\int_{|f_n - f| < \kappa} |f - f_n|\, d\mu \leq 3\varepsilon \qquad (1.18)$$
for all $n$. Meanwhile, since $f_n$ converges in measure to $f$, we know that there exists an $N$ (depending on $\kappa$) such that
$$\mu(\{x : |f_n(x) - f(x)| \geq \kappa\}) \leq \kappa$$
for all $n \geq N$. Applying Exercise 1.5.13, we conclude (making $\kappa$ smaller if necessary) that
$$\int_{|f_n - f| \geq \kappa} |f_n|\, d\mu \leq \varepsilon$$
and
$$\int_{|f_n - f| \geq \kappa} |f|\, d\mu \leq \varepsilon$$
and hence by the triangle inequality
$$\int_{|f_n - f| \geq \kappa} |f - f_n|\, d\mu \leq 2\varepsilon$$
for all $n \geq N$. Combining this with (1.18) we conclude that
$$\|f_n - f\|_{L^1(\mu)} = \int_X |f - f_n|\, d\mu \leq 5\varepsilon$$
for all $n \geq N$, and so $f_n$ converges to $f$ in $L^1$ norm as desired.

Finally, we recall two results from the previous notes for unsigned functions.

Exercise 1.5.16 (Monotone convergence theorem). Suppose that $f_n : X \to [0, +\infty)$ are measurable, monotone non-decreasing in $n$ and are such that $\sup_n \int_X f_n\, d\mu < \infty$. Show that $f_n$ converges in $L^1$ norm to $\sup_n f_n$. (Note that $\sup_n f_n$ can be infinite on a null set, but the definition of $L^1$ convergence can be easily modified to accommodate this.)

Exercise 1.5.17 (Defect version of Fatou's lemma). Suppose that $f_n : X \to [0, +\infty)$ are measurable, are such that $\sup_n \int_X f_n\, d\mu < \infty$, and converge pointwise almost everywhere to some measurable limit $f : X \to [0, +\infty)$. Show that $f_n$ converges in $L^1$ norm to $f$ if and only if $\int_X f_n\, d\mu$ converges to $\int_X f\, d\mu$. Informally, we see that in the unsigned, bounded mass case, pointwise convergence implies $L^1$ norm convergence if and only if there is no loss of mass.

Exercise 1.5.18. Suppose that $f_n : X \to \mathbf{C}$ are a dominated sequence of measurable functions, and let $f : X \to \mathbf{C}$ be another measurable function. Show that $f_n$ converges pointwise almost everywhere to $f$ if and only if $f_n$ converges almost uniformly to $f$.
Exercise 1.5.19. Let $X$ be a probability space (see Section 2.3). Given any real-valued measurable function $f : X \to \mathbf{R}$, we define the cumulative distribution function $F : \mathbf{R} \to [0, 1]$ of $f$ to be the function $F(\lambda) := \mu(\{x \in X : f(x) \leq \lambda\})$. Given another sequence $f_n : X \to \mathbf{R}$ of real-valued measurable functions, we say that $f_n$ converges in distribution to $f$ if the cumulative distribution function $F_n(\lambda)$ of $f_n$ converges pointwise to the cumulative distribution function $F(\lambda)$ of $f$ at all $\lambda \in \mathbf{R}$ for which $F$ is continuous.

(i) Show that if $f_n$ converges to $f$ in any of the seven senses discussed above (uniformly, essentially uniformly, almost uniformly, pointwise, pointwise almost everywhere, in $L^1$, or in measure), then it converges in distribution to $f$.

(ii) Give an example in which $f_n$ converges to $f$ in distribution, but not in any of the above seven senses.

(iii) Show that convergence in distribution is not linear, in the sense that if $f_n$ converges to $f$ in distribution, and $g_n$ converges to $g$, then $f_n + g_n$ need not converge to $f + g$.

(iv) Show that a sequence $f_n$ can converge in distribution to two different limits $f, g$, which are not equal almost everywhere.

Convergence in distribution (not to be confused with convergence in the sense of distributions, which is studied in §1.13 of An epsilon of room, Vol. I) is commonly used in probability; but, as the above exercise demonstrates, it is quite a weak notion of convergence, lacking many of the properties of the modes of convergence discussed here.

1.6. Differentiation theorems

Let $[a, b]$ be a compact interval of positive length (thus $-\infty < a < b < +\infty$). Recall that a function $F : [a, b] \to \mathbf{R}$ is said to be differentiable at a point $x \in [a, b]$ if the limit
$$F'(x) := \lim_{y \to x;\ y \in [a, b] \setminus \{x\}} \frac{F(y) - F(x)}{y - x} \qquad (1.19)$$
exists. In that case, we call $F'(x)$ the strong derivative, classical derivative, or just derivative for short, of $F$ at $x$. We say that $F$ is everywhere differentiable, or differentiable for short, if it is differentiable at all points $x \in [a, b]$, and differentiable almost everywhere if it is differentiable at almost every point $x \in [a, b]$. If $F$ is differentiable everywhere and its derivative $F'$ is continuous, then we say that $F$ is continuously differentiable.

Remark 1.6.1. In §1.13 of An epsilon of room, Vol. I, the notion of a weak derivative or distributional derivative is introduced. This type of derivative can be applied to a much rougher class of functions and is in many ways more suitable than the classical derivative for doing "Lebesgue" type analysis (i.e. analysis centred around the Lebesgue integral, and in particular allowing functions to be uncontrolled, infinite, or even undefined on sets of measure zero). However, for now we will stick with the classical approach to differentiation.

Exercise 1.6.1. If $F : [a, b] \to \mathbf{R}$ is everywhere differentiable, show that $F$ is continuous and $F'$ is measurable. If $F$ is almost everywhere differentiable, show that the (almost everywhere defined) function $F'$ is measurable (i.e. it is equal to an everywhere defined measurable function on $[a, b]$ outside of a null set), but give an example to demonstrate that $F$ need not be continuous.

Exercise 1.6.2. Give an example of a function $F : [a, b] \to \mathbf{R}$ which is everywhere differentiable, but not continuously differentiable. (Hint: choose an $F$ that vanishes quickly at some point, say at the origin $0$, but which also oscillates rapidly near that point.)

In single-variable calculus, the operations of integration and differentiation are connected by a number of basic theorems, starting with Rolle's theorem.

Theorem 1.6.2 (Rolle's theorem). Let $[a, b]$ be a compact interval of positive length, and let $F : [a, b] \to \mathbf{R}$ be a differentiable function such that $F(a) = F(b)$. Then there exists $x \in (a, b)$ such that $F'(x) = 0$.

Proof. By subtracting a constant from $F$ (which does not affect differentiability or the derivative) we may assume that $F(a) = F(b) = 0$. If $F$ is identically zero then the claim is trivial, so assume that $F$ is non-zero somewhere. By replacing $F$ with $-F$ if necessary, we may assume that $F$ is positive somewhere, thus $\sup_{x \in [a, b]} F(x) > 0$. On the other hand, as $F$ is continuous and $[a, b]$ is compact, $F$ must attain its maximum somewhere, thus there exists $x \in [a, b]$ such that $F(x) \geq F(y)$ for all $y \in [a, b]$. Then $F(x)$ must be positive and so $x$ cannot equal either $a$ or $b$, and thus must lie in the interior. From the right limit of (1.19) we see that $F'(x) \leq 0$, while from the left limit we have $F'(x) \geq 0$. Thus $F'(x) = 0$ and the claim follows.

Remark 1.6.3. Observe that the same proof also works if $F$ is only differentiable in the interior $(a, b)$ of the interval $[a, b]$, so long as it is continuous all the way up to the boundary of $[a, b]$.

Exercise 1.6.3. Give an example to show that Rolle's theorem can fail if $F$ is merely assumed to be almost everywhere differentiable, even if one adds the additional hypothesis that $F$ is continuous. This example illustrates that everywhere differentiability is a significantly stronger property than almost everywhere differentiability. We will see further evidence of this fact later in these notes; there are many theorems that assert in their conclusion that a function is almost everywhere differentiable, but few that manage to conclude everywhere differentiability.

Remark 1.6.4. It is important to note that Rolle's theorem only works in the real scalar case when $F$ is real-valued, as it relies heavily on the least upper bound property for the domain $\mathbf{R}$. If, for instance, we consider complex-valued scalar functions $F : [a, b] \to \mathbf{C}$, then the theorem can fail; for instance, the function $F : [0, 1] \to \mathbf{C}$ defined by $F(x) := e^{2\pi i x} - 1$ vanishes at both endpoints and is differentiable, but its derivative $F'(x) = 2\pi i e^{2\pi i x}$ is never zero. (Rolle's theorem does imply that the real and imaginary parts of the derivative $F'$ both vanish somewhere, but the problem is that they don't simultaneously vanish at the same point.) Similar remarks apply to functions taking values in a finite-dimensional vector space, such as $\mathbf{R}^n$.

One can easily amplify Rolle's theorem to the mean value theorem:

Corollary 1.6.5 (Mean value theorem). Let $[a, b]$ be a compact interval of positive length, and let $F : [a, b] \to \mathbf{R}$ be a differentiable function. Then there exists $x \in (a, b)$ such that $F'(x) = \frac{F(b) - F(a)}{b - a}$.
Remark 1. Then the Riemann integral a F (x) dx of F is equal to F (b) − F (a). b] be a compact interval of positive length. b] → R be a diﬀerentiable function. tj ] such that j F (t∗ )(tj − tj−1 ) = F (tj ) − F (tj−1 ) j and thus by telescoping series b (F (b) − F (a)) − a F (x) ≤ ε.7 (Second fundamental theorem of calculus). j Fix this partition. the claim follows. such that F is Riemann inb tegrable.6. As Rolle’s theorem is only applicable to real scalarvalued functions. b] if and only if F (x) = G(x) + C for some constant C ∈ R and all x ∈ [a. Measure theory Proof. Apply Rolle’s theorem to the function x → F (x)− F (b)−F (a) (x− b−a a). b]. the more general mean value theorem is also only applicable to such functions. From the mean value theorem. for each 1 ≤ j ≤ k one can ﬁnd t∗ ∈ [tj−1 . b] → R be diﬀerentiable functions.6. and let F : [a. Let ε > 0.6.6. Let [a. we have a F (x) dx = F (b) − F (a) whenever F is continuously diﬀerentiable. Proof. there exists a ﬁnite partition a = t0 < t1 < .4 (Uniqueness of antiderivatives up to constants). Since ε > 0 was arbitrary. . < tk = b such that k b b  j=1 F (t∗ )(tj − tj−1 ) − j a F (x) ≤ ε for every choice of t∗ ∈ [tj−1 . tj ]. b] → R and G : [a. We can use the mean value theorem to deduce one of the fundamental theorems of calculus: Theorem 1. . Exercise 1. Let F : [a. . Show that F (x) = G (x) for every x ∈ [a.134 1. In particular. By the deﬁnition of Riemann integrability.
Remark 1.6.8. Even though the mean value theorem only holds for real scalar functions, the fundamental theorem of calculus holds for complex or vector-valued functions, as one can simply apply that theorem to each component of that function separately.

Of course, we also have the other half of the fundamental theorem of calculus:

Theorem 1.6.9 (First fundamental theorem of calculus). Let $[a,b]$ be a compact interval of positive length. Let $f : [a,b] \to \mathbf{C}$ be a continuous function, and let $F : [a,b] \to \mathbf{C}$ be the indefinite integral $F(x) := \int_a^x f(t)\,dt$. Then $F$ is differentiable on $[a,b]$, with derivative $F'(x) = f(x)$ for all $x \in [a,b]$. In particular, $F$ is continuously differentiable.

Proof. It suffices to show that
\[ \lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h} = f(x) \]
for all $x \in [a,b)$, and
\[ \lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h} = f(x) \]
for all $x \in (a,b]$. After a change of variables, we can write
\[ \frac{F(x+h) - F(x)}{h} = \int_0^1 f(x+ht)\,dt \]
for any $x \in [a,b)$ and any sufficiently small $h > 0$, or any $x \in (a,b]$ and any sufficiently small $h < 0$. As $f$ is continuous, the function $t \mapsto f(x+ht)$ converges uniformly to $f(x)$ on $[0,1]$ as $h \to 0$ (keeping $x$ fixed). As the interval $[0,1]$ is bounded, $\int_0^1 f(x+ht)\,dt$ thus converges to $\int_0^1 f(x)\,dt = f(x)$, and the claim follows.

Corollary 1.6.10 (Differentiation theorem for continuous functions). Let $f : [a,b] \to \mathbf{C}$ be a continuous function on a compact interval. Then we have
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt = f(x) \]
for all $x \in [a,b)$,
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\,dt = f(x) \]
for all $x \in (a,b]$, and thus
\[ \lim_{h \to 0^+} \frac{1}{2h} \int_{[x-h,x+h]} f(t)\,dt = f(x) \]
for all $x \in (a,b)$.

In these notes we explore the question of the extent to which these theorems continue to hold when the differentiability or integrability conditions on the various functions $F, F', f$ are relaxed. Among the results proven in these notes are

(i) The Lebesgue differentiation theorem, which roughly speaking asserts that Corollary 1.6.10 continues to hold for almost every $x$ if $f$ is merely absolutely integrable, rather than continuous;
(ii) A number of differentiation theorems, which assert for instance that monotone, Lipschitz, or bounded variation functions in one dimension are almost everywhere differentiable; and
(iii) The second fundamental theorem of calculus for absolutely continuous functions.

1.6.1. The Lebesgue differentiation theorem in one dimension. The main objective of this section is to show

Theorem 1.6.11 (Lebesgue differentiation theorem, one-dimensional case). Let $f : \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and let $F : \mathbf{R} \to \mathbf{C}$ be the definite integral $F(x) := \int_{[-\infty,x]} f(t)\,dt$. Then $F$ is continuous and almost everywhere differentiable, and $F'(x) = f(x)$ for almost every $x \in \mathbf{R}$.

This can be viewed as a variant of Corollary 1.6.10; the hypotheses are weaker because $f$ is only assumed to be absolutely integrable, rather than continuous (and can live on the entire real line, and not just on a compact interval); but the conclusion is weaker too, because
$F$ is only found to be almost everywhere differentiable, rather than everywhere differentiable. (But such a relaxation of the conclusion is necessary at this level of generality; consider for instance the example when $f = 1_{[0,1]}$.)

The continuity is an easy exercise:

Exercise 1.6.5. Let $f : \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and let $F : \mathbf{R} \to \mathbf{C}$ be the definite integral $F(x) := \int_{[-\infty,x]} f(t)\,dt$. Show that $F$ is continuous.

The main difficulty is to show that $F'(x) = f(x)$ for almost every $x \in \mathbf{R}$. This will follow from

Theorem 1.6.12 (Lebesgue differentiation theorem, second formulation). Let $f : \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function. Then
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt = f(x) \tag{1.20} \]
for almost every $x \in \mathbf{R}$, and
\[ \lim_{h \to 0^+} \frac{1}{h} \int_{[x-h,x]} f(t)\,dt = f(x) \tag{1.21} \]
for almost every $x \in \mathbf{R}$.

Exercise 1.6.6. Show that Theorem 1.6.11 follows from Theorem 1.6.12.

We will just prove the first fact (1.20); the second fact (1.21) is similar (or can be deduced from (1.20) by replacing $f$ with the reflected function $x \mapsto f(-x)$).

We are taking $f$ to be complex valued, but it is clear from taking real and imaginary parts that it suffices to prove the claim when $f$ is real-valued, and we shall thus assume this for the rest of the argument.

The conclusion (1.20) we want to prove is a convergence theorem: an assertion that for all functions $f$ in a given class (in this case, the class of absolutely integrable functions $f : \mathbf{R} \to \mathbf{R}$), a certain sequence of linear expressions $T_h f$ (in this case, the right averages $T_h f(x) = \frac{1}{h} \int_{[x,x+h]} f(t)\,dt$) converge in some sense (in this case, pointwise almost everywhere) to a specified limit (in this case, $f$).
There is a general and very useful argument to prove such convergence theorems, known as the density argument. This argument requires two ingredients, which we state informally as follows:

(i) A verification of the convergence result for some "dense subclass" of "nice" functions $f$, such as continuous functions, smooth functions, simple functions, etc. By "dense", we mean that a general function $f$ in the original class can be approximated to arbitrary accuracy in a suitable sense by a function in the nice subclass.
(ii) A quantitative estimate that upper bounds the maximal fluctuation of the linear expressions $T_h f$ in terms of the "size" of the function $f$ (where the precise definition of "size" depends on the nature of the approximation in the first ingredient).

Once one has these two ingredients, it is usually not too hard to put them together to obtain the desired convergence theorem for general functions $f$ (not just those in the dense subclass). We illustrate this with a simple example:

Proposition 1.6.13 (Translation is continuous in $L^1$). Let $f : \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function, and for each $h \in \mathbf{R}^d$, let $f_h : \mathbf{R}^d \to \mathbf{C}$ be the shifted function $f_h(x) := f(x - h)$. Then $f_h$ converges in $L^1$ norm to $f$ as $h \to 0$, thus
\[ \lim_{h \to 0} \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx = 0. \]

Proof. We first verify this claim for a dense subclass of $f$, namely the functions $f$ which are continuous and compactly supported (i.e. they vanish outside of a compact set). Such functions are continuous (and hence, being compactly supported, uniformly continuous), and thus $f_h$ converges uniformly to $f$ as $h \to 0$. Furthermore, as $f$ is compactly supported, the support of $f_h - f$ stays uniformly bounded for $h$ in a bounded set. From this we see that $f_h$ also converges to $f$ in $L^1$ norm as required.
Next, we observe the quantitative estimate
\[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq 2 \int_{\mathbf{R}^d} |f(x)|\,dx \tag{1.22} \]
for any $h \in \mathbf{R}^d$. This follows easily from the triangle inequality
\[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq \int_{\mathbf{R}^d} |f_h(x)|\,dx + \int_{\mathbf{R}^d} |f(x)|\,dx \]
together with the translation invariance of the Lebesgue integral:
\[ \int_{\mathbf{R}^d} |f_h(x)|\,dx = \int_{\mathbf{R}^d} |f(x)|\,dx. \]

Now we put the two ingredients together. Let $f : \mathbf{R}^d \to \mathbf{C}$ be absolutely integrable, and let $\varepsilon > 0$ be arbitrary. Applying Littlewood's second principle (Theorem 1.3.20(iii)) to the absolutely integrable function $f$, we can find a continuous, compactly supported function $g : \mathbf{R}^d \to \mathbf{C}$ such that
\[ \int_{\mathbf{R}^d} |f(x) - g(x)|\,dx \leq \varepsilon. \]
Applying (1.22), we conclude that
\[ \int_{\mathbf{R}^d} |(f - g)_h(x) - (f - g)(x)|\,dx \leq 2\varepsilon, \]
which we rearrange as
\[ \int_{\mathbf{R}^d} |(f_h - f)(x) - (g_h - g)(x)|\,dx \leq 2\varepsilon. \]
By the dense subclass result, we also know that
\[ \int_{\mathbf{R}^d} |g_h(x) - g(x)|\,dx \leq \varepsilon \]
for all $h$ sufficiently close to zero. From the triangle inequality, we conclude that
\[ \int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq 3\varepsilon \]
for all $h$ sufficiently close to zero, and the claim follows.

Remark 1.6.14. In the above application of the density argument, we proved the required quantitative estimate directly for all functions $f$ in the original class of functions. However, it is also possible to use
the density argument a second time, and initially verify the quantitative estimate just for functions $f$ in a nice subclass (e.g. continuous functions of compact support). In many cases, one can then extend that estimate to the general case by using tools such as Fatou's lemma (Corollary 1.4.47), which are particularly suited for showing that upper bound estimates are preserved with respect to limits.

Exercise 1.6.7. Let $f : \mathbf{R}^d \to \mathbf{C}$, $g : \mathbf{R}^d \to \mathbf{C}$ be Lebesgue measurable functions such that $f$ is absolutely integrable and $g$ is essentially bounded (i.e. bounded outside of a null set). Show that the convolution $f * g : \mathbf{R}^d \to \mathbf{C}$ defined by the formula
\[ f * g(x) = \int_{\mathbf{R}^d} f(y) g(x - y)\,dy \]
is well-defined (in the sense that the integrand on the right-hand side is absolutely integrable) and that $f * g$ is a bounded, continuous function.

The above exercise is illustrative of a more general intuition, which is that convolutions tend to be smoothing in nature; the convolution $f * g$ of two functions is usually at least as regular as, and often more regular than, either of the two factors $f, g$.

This smoothing phenomenon gives rise to an important fact, namely the Steinhaus theorem:

Exercise 1.6.8 (Steinhaus theorem). Let $E \subset \mathbf{R}^d$ be a Lebesgue measurable set of positive measure. Show that the set $E - E := \{x - y : x, y \in E\}$ contains an open neighbourhood of the origin. (Hint: reduce to the case when $E$ is bounded, and then apply the previous exercise to the convolution $1_E * 1_{-E}$, where $-E := \{-y : y \in E\}$.)

Exercise 1.6.9. A homomorphism $f : \mathbf{R}^d \to \mathbf{C}$ is a map with the property that $f(x + y) = f(x) + f(y)$ for all $x, y \in \mathbf{R}^d$.

(i) Show that all measurable homomorphisms are continuous. (Hint: for any disk $D$ centered at the origin in the complex plane, show that $f^{-1}(z + D)$ has positive measure for at least one $z \in \mathbf{C}$, and then use the Steinhaus theorem from the previous exercise.)
(ii) Show that $f$ is a measurable homomorphism if and only if it takes the form $f(x_1, \ldots, x_d) = x_1 z_1 + \ldots + x_d z_d$ for all $x_1, \ldots, x_d \in \mathbf{R}$ and some complex coefficients $z_1, \ldots, z_d$. (Hint: first establish this for rational $x_1, \ldots, x_d$, and then use the previous part of this exercise.)
(iii) (For readers familiar with Zorn's lemma, see §2.4 of An epsilon of room, Vol. I) Show that there exist homomorphisms $f : \mathbf{R}^d \to \mathbf{C}$ which are not of the form in the previous part. (Hint: view $\mathbf{R}^d$ (or $\mathbf{C}$) as a vector space over the rationals $\mathbf{Q}$, and use the fact (from Zorn's lemma) that every vector space, even an infinite-dimensional one, has at least one basis.) This gives an alternate construction of a non-measurable set to that given in previous notes.

Remark 1.6.15. One drawback with the density argument is that it gives convergence results which are qualitative rather than quantitative: there is no explicit bound on the rate of convergence. For instance, in Proposition 1.6.13, we know that for any $\varepsilon > 0$, there exists $\delta > 0$ such that $\int_{\mathbf{R}^d} |f_h(x) - f(x)|\,dx \leq \varepsilon$ whenever $|h| \leq \delta$, but we do not know exactly how $\delta$ depends on $\varepsilon$ and $f$. Actually, the proof does eventually give such a bound, but it depends on "how measurable" the function $f$ is, or more precisely how "easy" it is to approximate $f$ by a "nice" function. To illustrate this issue, let's work in one dimension and consider the function $f(x) := \sin(Nx) 1_{[0,2\pi]}(x)$, where $N \geq 1$ is a large integer. On the one hand, $f$ is bounded in the $L^1$ norm uniformly in $N$: $\int_{\mathbf{R}} |f(x)|\,dx \leq 2\pi$ (indeed, the left-hand side is equal to $4$). On the other hand, it is not hard to see that $\int_{\mathbf{R}} |f_{\pi/N}(x) - f(x)|\,dx \geq c$ for some absolute constant $c > 0$. Thus, if one wants to force $\int_{\mathbf{R}} |f_h(x) - f(x)|\,dx$ to drop below $c$, one has to make $h$ at most $\pi/N$ from the origin. Making $N$ large, we thus see that the rate of convergence of $\int_{\mathbf{R}} |f_h(x) - f(x)|\,dx$ to zero can be arbitrarily slow, even though $f$ is bounded in $L^1$. The problem is that as $N$ gets large, it becomes increasingly difficult to approximate $f$ well by a "nice" function, by which we mean a uniformly continuous function with a reasonable modulus of continuity, due to the increasingly oscillatory nature of $f$. See [Ta2008, §1.4] for some further discussion
of this issue, and what quantitative substitutes are available for such qualitative results.

Now we return to the Lebesgue differentiation theorem, and apply the density argument. The dense subclass result is already contained in Corollary 1.6.10, which asserts that (1.20) holds for all continuous functions $f$. The quantitative estimate we will need is the following special case of the Hardy-Littlewood maximal inequality:

Lemma 1.6.16 (One-sided Hardy-Littlewood maximal inequality). Let $f : \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and let $\lambda > 0$. Then
\[ m\Big(\Big\{x \in \mathbf{R} : \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt \geq \lambda\Big\}\Big) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt. \]

We will prove this lemma shortly, but let us first see how this, combined with the dense subclass result, will give the Lebesgue differentiation theorem. Let $f : \mathbf{R} \to \mathbf{C}$ be absolutely integrable, and let $\varepsilon, \lambda > 0$ be arbitrary. Then by Littlewood's second principle, we can find a function $g : \mathbf{R} \to \mathbf{C}$ which is continuous and compactly supported, with
\[ \int_{\mathbf{R}} |f(x) - g(x)|\,dx \leq \varepsilon. \]
Applying the one-sided Hardy-Littlewood maximal inequality, we conclude that
\[ m\Big(\Big\{x \in \mathbf{R} : \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} |f(t) - g(t)|\,dt \geq \lambda\Big\}\Big) \leq \frac{\varepsilon}{\lambda}. \]
In a similar spirit, from Markov's inequality (Lemma 1.3.15) we have
\[ m(\{x \in \mathbf{R} : |f(x) - g(x)| \geq \lambda\}) \leq \frac{\varepsilon}{\lambda}. \]
By subadditivity, we conclude that for all $x \in \mathbf{R}$ outside of a set $E$ of measure at most $2\varepsilon/\lambda$, one has both
\[ \frac{1}{h} \int_{[x,x+h]} |f(t) - g(t)|\,dt < \lambda \tag{1.23} \]
and
\[ |f(x) - g(x)| < \lambda \tag{1.24} \]
for all $h > 0$.
Now let $x \in \mathbf{R} \backslash E$. From the dense subclass result (Corollary 1.6.10) applied to the continuous function $g$, we have
\[ \Big| \frac{1}{h} \int_{[x,x+h]} g(t)\,dt - g(x) \Big| < \lambda \]
whenever $h > 0$ is sufficiently close to zero. Combining this with (1.23), (1.24), and the triangle inequality, we conclude that
\[ \Big| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \Big| < 3\lambda \]
for all $h > 0$ sufficiently close to zero. In particular we have
\[ \limsup_{h \to 0^+} \Big| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \Big| \leq 3\lambda \]
for all $x$ outside of a set of measure at most $2\varepsilon/\lambda$. Keeping $\lambda$ fixed and sending $\varepsilon$ to zero, we conclude that
\[ \limsup_{h \to 0^+} \Big| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \Big| \leq 3\lambda \]
for almost every $x \in \mathbf{R}$. If we then let $\lambda$ go to zero along a countable sequence (e.g. $\lambda := 1/n$ for $n = 1, 2, \ldots$), we conclude that
\[ \limsup_{h \to 0^+} \Big| \frac{1}{h} \int_{[x,x+h]} f(t)\,dt - f(x) \Big| = 0 \]
for almost every $x \in \mathbf{R}$, and the claim follows.

The only remaining task is to establish the one-sided Hardy-Littlewood maximal inequality. We will do so by using the rising sun lemma:

Lemma 1.6.17 (Rising sun lemma). Let $[a,b]$ be a compact interval, and let $F : [a,b] \to \mathbf{R}$ be a continuous function. Then one can find an at most countable family of disjoint non-empty open intervals $I_n = (a_n, b_n)$ in $[a,b]$ with the following properties:

(i) For each $n$, either $F(a_n) = F(b_n)$, or else $a_n = a$ and $F(b_n) \geq F(a_n)$.
(ii) If $x \in [a,b]$ does not lie in any of the intervals $I_n$, then one must have $F(y) \leq F(x)$ for all $x \leq y \leq b$.
Remark 1.6.18. To explain the name "rising sun lemma", imagine the graph $\{(x, F(x)) : x \in [a,b]\}$ of $F$ as depicting a hilly landscape, with the sun shining horizontally from the rightward infinity $(+\infty, 0)$ (or rising from the east, if you will). Those $x$ for which $F(y) \leq F(x)$ for all $x \leq y \leq b$ are the locations on the landscape which are illuminated by the sun. The intervals $I_n$ then represent the portions of the landscape that are in shadow. The reader is encouraged to draw a picture [14] to illustrate this perspective.

[14] Author's note: I have deliberately omitted including such pictures in the text, as I feel that it is far more instructive and useful for the reader to directly create a personalised visual aid for these results.

This lemma is proven using the following basic fact:

Exercise 1.6.10. Show that any open subset $U$ of $\mathbf{R}$ can be written as the union of at most countably many disjoint non-empty open intervals, whose endpoints lie outside of $U$. (Hint: first show that every $x$ in $U$ is contained in a maximal open subinterval $(a,b)$ of $U$, and that these maximal open subintervals are disjoint, with each such interval containing at least one rational number.)

Proof. (Proof of rising sun lemma) Let $U$ be the set of all $x \in (a,b)$ such that $F(y) > F(x)$ for at least one $x < y < b$. As $F$ is continuous, $U$ is open, and so $U$ is the union of at most countably many disjoint non-empty open intervals $I_n = (a_n, b_n)$, with the endpoints $a_n, b_n$ lying outside of $U$.

The second conclusion of the rising sun lemma is clear from construction, so it suffices to establish the first. Suppose first that $I_n = (a_n, b_n)$ is such that $a_n \neq a$. As the endpoint $a_n$ does not lie in $U$, we must have $F(y) \leq F(a_n)$ for all $a_n \leq y \leq b$; similarly we have $F(y) \leq F(b_n)$ for all $b_n \leq y \leq b$. In particular we have $F(b_n) \leq F(a_n)$. By the continuity of $F$, it will then suffice to show that $F(b_n) \geq F(t)$ for all $a_n < t < b_n$.

Suppose for contradiction that there was $a_n < t < b_n$ with $F(b_n) < F(t)$. Let $A := \{s \in [t, b_n] : F(s) \geq F(t)\}$, then $A$ is a closed set that contains $t$ but not $b_n$. Set $t_* := \sup(A)$, then $t_* \in [t, b_n) \subset I_n \subset U$, and thus there exists $t_* < y \leq b$ such that $F(y) > F(t_*)$. Since $F(t_*) \geq F(t) > F(b_n)$, and $F(b_n) \geq F(z)$ for all
$b_n \leq z \leq b$, we see that $y$ cannot exceed $b_n$, and thus lies in $A$, but this contradicts the fact that $t_*$ is the supremum of $A$.

The case when $a_n = a$ is similar and is left to the reader; the only difference is that we can no longer assert that $F(y) \leq F(a_n)$ for all $a_n \leq y \leq b$, and so do not have the upper bound $F(b_n) \leq F(a_n)$.

Now we can prove the one-sided Hardy-Littlewood maximal inequality. By upwards monotonicity, it will suffice to show that
\[ m\Big(\Big\{x \in [a,b] : \sup_{h > 0;\ [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt \geq \lambda\Big\}\Big) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt \]
for any compact interval $[a,b]$. By modifying $\lambda$ by an epsilon, we may replace the non-strict inequality here with strict inequality:
\[ m\Big(\Big\{x \in [a,b] : \sup_{h > 0;\ [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda\Big\}\Big) \leq \frac{1}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt. \tag{1.25} \]

Fix $[a,b]$. We apply the rising sun lemma to the function $F : [a,b] \to \mathbf{R}$ defined as
\[ F(x) := \int_{[a,x]} |f(t)|\,dt - (x - a)\lambda. \]
By Exercise 1.6.5, $F$ is continuous, and so we can find an at most countable sequence of intervals $I_n = (a_n, b_n)$ with the properties given by the rising sun lemma. From the second property of that lemma, we observe that
\[ \Big\{x \in [a,b] : \sup_{h > 0;\ [x,x+h] \subset [a,b]} \frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda\Big\} \subset \bigcup_n I_n, \]
since the property $\frac{1}{h} \int_{[x,x+h]} |f(t)|\,dt > \lambda$ can be rearranged as $F(x+h) > F(x)$. By countable additivity, we may thus upper bound the left-hand side of (1.25) by $\sum_n (b_n - a_n)$. On the other hand, since $F(b_n) - F(a_n) \geq 0$, we have
\[ \int_{I_n} |f(t)|\,dt \geq \lambda (b_n - a_n) \]
and thus
\[ \sum_n (b_n - a_n) \leq \frac{1}{\lambda} \sum_n \int_{I_n} |f(t)|\,dt. \]
As the $I_n$ are disjoint intervals in $[a,b]$, we may apply monotone convergence and monotonicity to conclude that
\[ \sum_n \int_{I_n} |f(t)|\,dt \leq \int_{[a,b]} |f(t)|\,dt, \]
and the claim follows.

Exercise 1.6.11 (Two-sided Hardy-Littlewood maximal inequality). Let $f : \mathbf{R} \to \mathbf{C}$ be an absolutely integrable function, and let $\lambda > 0$. Show that
\[ m\Big(\Big\{x \in \mathbf{R} : \sup_{x \in I} \frac{1}{|I|} \int_I |f(t)|\,dt \geq \lambda\Big\}\Big) \leq \frac{2}{\lambda} \int_{\mathbf{R}} |f(t)|\,dt, \]
where the supremum ranges over all intervals $I$ of positive length that contain $x$.

Exercise 1.6.12 (Rising sun inequality). Let $f : \mathbf{R} \to \mathbf{R}$ be an absolutely integrable function, and let $f^* : \mathbf{R} \to \mathbf{R}$ be the one-sided signed Hardy-Littlewood maximal function
\[ f^*(x) := \sup_{h > 0} \frac{1}{h} \int_{[x,x+h]} f(t)\,dt. \]
Establish the rising sun inequality
\[ \lambda\, m(\{f^*(x) > \lambda\}) \leq \int_{x : f^*(x) > \lambda} f(x)\,dx \]
for all real $\lambda$ (note here that we permit $\lambda$ to be zero or negative), and show that this inequality implies Lemma 1.6.16. (Hint: First do the $\lambda = 0$ case, by invoking the rising sun lemma.) See [Ta2009, §2.9] for some further discussion of inequalities of this type, and applications to ergodic theory (and in particular the maximal ergodic theorem).

Exercise 1.6.13. Show that the left and right-hand sides in Lemma 1.6.16 are in fact equal. (Hint: one may first wish to try this in the case when $f$ has compact support, in which case one can apply the rising sun lemma to a sufficiently large interval containing the support of $f$.)
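The one-sided maximal inequality (in the strict form (1.25)) has a discrete analogue that is easy to test by brute force. The sketch below is an illustration under assumptions made here, not part of the text: a finite integer sequence `a` plays the role of $f$ on $[a,b]$, counting measure replaces Lebesgue measure, and the forward averages are restricted to windows that stay inside the sequence. The helper names are hypothetical.

```python
import random

def one_sided_maximal_exceeds(a, lam):
    """Indices n at which some forward average of |a| starting at n exceeds lam.

    The forward average over k terms is (|a[n]| + ... + |a[n+k-1]|) / k,
    restricted to windows with n + k <= len(a).
    """
    n_terms = len(a)
    exceed = []
    for n in range(n_terms):
        running = 0
        for k in range(1, n_terms - n + 1):
            running += abs(a[n + k - 1])
            if running > lam * k:  # compare without dividing, so integers stay exact
                exceed.append(n)
                break
    return exceed

def check_discrete_inequality(a, lam):
    """Check lam * #{n : maximal average > lam} <= sum of |a|."""
    exceed = one_sided_maximal_exceeds(a, lam)
    return lam * len(exceed) <= sum(abs(v) for v in a)

rng = random.Random(0)  # fixed seed, so the experiment is reproducible
samples = [[rng.randint(-5, 5) for _ in range(30)] for _ in range(200)]
all_ok = all(check_discrete_inequality(a, lam) for a in samples for lam in (1, 2, 3))
print(all_ok)
```

As in the continuous proof, the set where the maximal average exceeds $\lambda$ splits into blocks on which the partial sums of $|a_i| - \lambda$ are positive, which is the discrete counterpart of the intervals $I_n$ in the rising sun lemma.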
1.6.2. The Lebesgue differentiation theorem in higher dimensions. Now we extend the Lebesgue differentiation theorem to higher dimensions. Theorem 1.6.11 does not have an obvious high-dimensional analogue, but Theorem 1.6.12 does:

Theorem 1.6.19 (Lebesgue differentiation theorem in general dimension). Let $f : \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function. Then for almost every $x \in \mathbf{R}^d$, one has
\[ \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0 \tag{1.26} \]
and
\[ \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\,dy = f(x), \]
where $B(x,r) := \{y \in \mathbf{R}^d : |x - y| < r\}$ is the open ball of radius $r$ centred at $x$.

From the triangle inequality we see that
\[ \Big| \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\,dy - f(x) \Big| = \Big| \frac{1}{m(B(x,r))} \int_{B(x,r)} (f(y) - f(x))\,dy \Big| \leq \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy, \]
so we see that the first conclusion of Theorem 1.6.19 implies the second. A point $x$ for which (1.26) holds is called a Lebesgue point of $f$; thus, for an absolutely integrable function $f$, almost every point in $\mathbf{R}^d$ will be a Lebesgue point for $f$.

Exercise 1.6.14. Call a function $f : \mathbf{R}^d \to \mathbf{C}$ locally integrable if, for every $x \in \mathbf{R}^d$, there exists an open neighbourhood of $x$ on which $f$ is absolutely integrable.

(i) Show that $f$ is locally integrable if and only if $\int_{B(0,r)} |f(x)|\,dx < \infty$ for all $r > 0$.
(ii) Show that Theorem 1.6.19 implies a generalisation of itself in which the condition of absolute integrability of $f$ is weakened to local integrability.
Exercise 1.6.15. For each $h > 0$, let $E_h$ be a subset of $B(0,h)$ with the property that $m(E_h) \geq c\, m(B(0,h))$ for some $c > 0$ independent of $h$. Show that if $f : \mathbf{R}^d \to \mathbf{C}$ is locally integrable, and $x$ is a Lebesgue point of $f$, then
\[ \lim_{h \to 0} \frac{1}{m(E_h)} \int_{x + E_h} f(y)\,dy = f(x). \]
Conclude that Theorem 1.6.19 implies Theorem 1.6.12.

To prove Theorem 1.6.19, we use the density argument. The dense subclass case is easy:

Exercise 1.6.16. Show that Theorem 1.6.19 holds whenever $f$ is continuous.

The quantitative estimate needed is the following:

Theorem 1.6.20 (Hardy-Littlewood maximal inequality). Let $f : \mathbf{R}^d \to \mathbf{C}$ be an absolutely integrable function, and let $\lambda > 0$. Then
\[ m\Big(\Big\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy \geq \lambda\Big\}\Big) \leq \frac{C_d}{\lambda} \int_{\mathbf{R}^d} |f(t)|\,dt \]
for some constant $C_d > 0$ depending only on $d$.

Remark 1.6.21. The expression $\sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy$ is known as the Hardy-Littlewood maximal function of $f$, and is often denoted $Mf(x)$. It is an important function in the field of (real-variable) harmonic analysis.

Exercise 1.6.17. Use the density argument to show that Theorem 1.6.20 implies Theorem 1.6.19.

In the one-dimensional case, this estimate was established via the rising sun lemma. Unfortunately, that lemma relied heavily on the ordered nature of $\mathbf{R}$, and does not have an obvious analogue in higher dimensions. Instead, we will use the following covering lemma. Given an open ball $B = B(x,r)$ in $\mathbf{R}^d$ and a real number $c > 0$, we write $cB := B(x, cr)$ for the ball with the same centre as $B$, but $c$ times the radius. (Note that this is slightly different from the set $c \cdot B := \{cy : y \in B\}$; why?) Note that $|cB| = c^d |B|$ for any open ball $B \subset \mathbf{R}^d$ and any $c > 0$.
Lemma 1.6.22 (Vitali-type covering lemma). Let $B_1, \ldots, B_n$ be a finite collection of open balls in $\mathbf{R}^d$ (not necessarily disjoint). Then there exists a subcollection $B'_1, \ldots, B'_m$ of disjoint balls in this collection, such that
\[ \bigcup_{i=1}^n B_i \subset \bigcup_{j=1}^m 3B'_j. \tag{1.27} \]
In particular, by finite subadditivity,
\[ m\Big(\bigcup_{i=1}^n B_i\Big) \leq 3^d \sum_{j=1}^m m(B'_j). \]
Proof. We use a greedy algorithm argument, selecting the balls $B'_j$ to be as large as possible while remaining disjoint. More precisely, we run the following algorithm:

Step 0. Initialise $m = 0$ (so that, initially, there are no balls $B'_1, \ldots, B'_m$ in the desired collection).
Step 1. Consider all the balls $B_j$ that do not already intersect one of the $B'_1, \ldots, B'_m$ (so, initially, all of the balls $B_1, \ldots, B_n$ will be considered). If there are no such balls, STOP. Otherwise, go on to Step 2.
Step 2. Locate the largest ball $B_j$ that does not already intersect one of the $B'_1, \ldots, B'_m$. (If there are multiple largest balls with exactly the same radius, break the tie arbitrarily.) Add this ball to the collection $B'_1, \ldots, B'_m$ by setting $B'_{m+1} := B_j$ and then incrementing $m$ to $m + 1$. Then return to Step 1.

Note that at each iteration of this algorithm, the number of available balls amongst the $B_1, \ldots, B_n$ drops by at least one (since each ball selected certainly intersects itself and so cannot be selected again). So this algorithm terminates in finite time. It is also clear from construction that the $B'_1, \ldots, B'_m$ are a subcollection of the $B_1, \ldots, B_n$ consisting of disjoint balls. So the only task remaining is to verify that (1.27) holds at the completion of the algorithm, i.e. to show that each ball $B_i$ in the original collection is covered by the triples $3B'_j$ of the subcollection.
For this, we argue as follows. Take any ball $B_i$ in the original collection. Because the algorithm only halts when there are no more balls that are disjoint from the $B'_1, \ldots, B'_m$, the ball $B_i$ must intersect at least one of the balls $B'_j$ in the subcollection. Let $B'_j$ be the first ball with this property, thus $B_i$ is disjoint from $B'_1, \ldots, B'_{j-1}$, but intersects $B'_j$. Because $B'_j$ was chosen to be largest amongst all balls that did not intersect $B'_1, \ldots, B'_{j-1}$, we conclude that the radius of $B_i$ cannot exceed that of $B'_j$. From the triangle inequality, this implies that $B_i \subset 3B'_j$, and the claim follows.

Exercise 1.6.18. Technically speaking, the above algorithmic argument was not phrased in the standard language of formal mathematical deduction, because in that language, any mathematical object (such as the natural number $m$) can only be defined once, and not redefined multiple times as is done in most algorithms. Rewrite the above argument in a way that avoids redefining any variable. (Hint: introduce a "time" variable $t$, and recursively construct families $B'_{1,t}, \ldots, B'_{m_t,t}$ of balls that represent the outcome of the above algorithm after $t$ iterations (or $t_*$ iterations, if the algorithm halted at some previous time $t_* < t$). For this particular algorithm, there are also more ad hoc approaches that exploit the relatively simple nature of the algorithm to allow for a less notationally complicated construction.) More generally, it is possible to use this time parameter trick to convert any construction involving a provably terminating algorithm into a construction that does not redefine any variable.
(It is however dangerous to work with any algorithm that has an infinite run time, unless one has a suitably strong convergence result for the algorithm that allows one to take limits, either in the classical sense or in the more general sense of jumping to limit ordinals; in the latter case, one needs to use transfinite induction in order to ensure that the use of such algorithms is rigorous; see §2.4 of An epsilon of room, Vol. I.)

Remark 1.6.23. The actual Vitali covering lemma [Vi1908] is slightly different from this one, but we will not need it here. Actually there is a family of related covering lemmas which are useful for a variety of tasks in harmonic analysis; see for instance [deG1981] for further discussion.
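The greedy selection in the proof of Lemma 1.6.22 can be sketched in code. This is a minimal illustration under assumptions made here rather than anything from the text: balls are represented as `(centre, radius)` pairs with Euclidean centres, and the function names are hypothetical. Scanning the balls in order of decreasing radius and keeping each ball that is disjoint from all balls kept so far selects the same balls as Steps 0 to 2, with ties broken by the original ordering.

```python
from math import dist

def greedy_disjoint_subcollection(balls):
    """Greedy selection of pairwise disjoint open balls.

    balls: list of (centre, radius) pairs, centre a tuple of floats, radius > 0.
    Returns the indices of the selected balls, in order of selection.
    """
    selected = []
    # sorted() is stable, so balls of equal radius keep their original order.
    order = sorted(range(len(balls)), key=lambda i: -balls[i][1])
    for i in order:
        centre_i, radius_i = balls[i]
        # Open balls B(x, r) and B(y, s) are disjoint exactly when |x - y| >= r + s.
        if all(dist(centre_i, balls[j][0]) >= radius_i + balls[j][1] for j in selected):
            selected.append(i)
    return selected

def covered_by_triples(balls, selected):
    """Check (1.27): every ball lies inside 3B' for some selected ball B'."""
    # B(x, r) is contained in B(y, 3s) when |x - y| + r <= 3s.
    return all(
        any(dist(centre, balls[j][0]) + radius <= 3 * balls[j][1] for j in selected)
        for centre, radius in balls
    )

# Four balls in the plane; the second overlaps the first and the third.
balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 1.0), ((3.0, 0.0), 1.0), ((10.0, 0.0), 0.5)]
selected = greedy_disjoint_subcollection(balls)
print(selected, covered_by_triples(balls, selected))
```

Because each index is visited once and never revisited, this formulation also avoids the repeated redefinition of $m$ discussed in Exercise 1.6.18, although it is not a substitute for doing that exercise.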
Now we can prove the Hardy-Littlewood inequality, which we will do with the constant $C_d := 3^d$. It suffices to verify the claim with strict inequality,
\[ m\Big(\Big\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda\Big\}\Big) \leq \frac{C_d}{\lambda} \int_{\mathbf{R}^d} |f(t)|\,dt, \]
as the non-strict case then follows by perturbing $\lambda$ slightly and then taking limits.

Fix $f$ and $\lambda$. By inner regularity, it suffices to show that
\[ m(K) \leq \frac{3^d}{\lambda} \int_{\mathbf{R}^d} |f(t)|\,dt \]
whenever $K$ is a compact set that is contained in $\{x \in \mathbf{R}^d : \sup_{r > 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda\}$.

By construction, for every $x \in K$, there exists an open ball $B(x,r)$ such that
\[ \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y)|\,dy > \lambda. \tag{1.28} \]
By compactness of $K$, we can cover $K$ by a finite number $B_1, \ldots, B_n$ of such balls. Applying the Vitali-type covering lemma, we can find a subcollection $B'_1, \ldots, B'_m$ of disjoint balls such that
\[ m\Big(\bigcup_{i=1}^n B_i\Big) \leq 3^d \sum_{j=1}^m m(B'_j). \]
By (1.28), on each ball $B'_j$ we have
\[ m(B'_j) < \frac{1}{\lambda} \int_{B'_j} |f(y)|\,dy; \]
summing in $j$ and using the disjointness of the $B'_j$ we conclude that
\[ m\Big(\bigcup_{i=1}^n B_i\Big) \leq \frac{3^d}{\lambda} \int_{\mathbf{R}^d} |f(y)|\,dy. \]
Since the $B_1, \ldots, B_n$ cover $K$, we obtain Theorem 1.6.20 as desired.

Exercise 1.6.19. Improve the constant $3^d$ in the Hardy-Littlewood maximal inequality to $2^d$. (Hint: observe that with the construction used to prove the Vitali covering lemma, the centres of the balls $B_i$ are contained in $\bigcup_{j=1}^m 2B'_j$ and not just in $\bigcup_{j=1}^m 3B'_j$. To exploit this
observation one may need to first create an epsilon of room, as the centers are not by themselves sufficient to cover the required set.)

Remark 1.6.24. The optimal value of $C_d$ is not known in general, although a fairly recent result of Melas [Me2003] gives the surprising conclusion that the optimal value of $C_1$ is $C_1 = \frac{11 + \sqrt{61}}{12} = 1.56\ldots$. It is known that $C_d$ grows at most linearly in $d$, thanks to a result of Stein and Strömberg [StSt1983], but it is not known if $C_d$ is bounded in $d$ or grows as $d \to \infty$.

Exercise 1.6.20 (Dyadic maximal inequality). If $f : \mathbf{R}^d \to \mathbf{C}$ is an absolutely integrable function, establish the dyadic Hardy-Littlewood maximal inequality
\[ m\Big(\Big\{x \in \mathbf{R}^d : \sup_{x \in Q} \frac{1}{|Q|} \int_Q |f(y)|\,dy \geq \lambda\Big\}\Big) \leq \frac{1}{\lambda} \int_{\mathbf{R}^d} |f(t)|\,dt \]
where the supremum ranges over all dyadic cubes $Q$ that contain $x$. (Hint: the nesting property of dyadic cubes will be useful when it comes to the covering lemma stage of the argument, much as it was in Exercise 1.1.14.)

Exercise 1.6.21 (Besicovitch covering lemma in one dimension). Let $I_1, \ldots, I_n$ be a finite family of open intervals in $\mathbf{R}$ (not necessarily disjoint). Show that there exists a subfamily $I'_1, \ldots, I'_m$ of intervals such that

(i) $\bigcup_{i=1}^n I_i = \bigcup_{j=1}^m I'_j$; and
(ii) Each point $x \in \mathbf{R}$ is contained in at most two of the $I'_j$.

(Hint: First refine the family of intervals so that no interval $I_i$ is contained in the union of the other intervals. At that point, show that it is no longer possible for a point to be contained in three of the intervals.) There is a variant of this lemma that holds in higher dimensions, known as the Besicovitch covering lemma.

Exercise 1.6.22. Let $\mu$ be a Borel measure (i.e. a countably additive measure on the Borel $\sigma$-algebra) on $\mathbf{R}$, such that $0 < \mu(I) < \infty$ for every interval $I$ of positive length. Assume that $\mu$ is inner regular, in the sense that $\mu(E) = \sup_{K \subset E,\ K \text{ compact}} \mu(K)$ for every Borel measurable set $E$. (As it turns out, from the theory of Radon measures,
all locally finite Borel measures have this property, but we will not prove this here; see §1.10 of An epsilon of room, Vol. I.) Establish the Hardy-Littlewood maximal inequality
$$ \mu\Big(\Big\{ x \in \mathbf{R} : \sup_{x \in I} \frac{1}{\mu(I)} \int_I |f(y)|\, d\mu(y) \geq \lambda \Big\}\Big) \leq \frac{2}{\lambda} \int_{\mathbf{R}} |f(y)|\, d\mu(y) $$
for any absolutely integrable function $f \in L^1(\mu)$, where the supremum ranges over all open intervals $I$ that contain $x$. Note that this essentially generalises Exercise 1.6.11, in which $\mu$ is replaced by Lebesgue measure. (Hint: Repeat the proof of the usual Hardy-Littlewood maximal inequality, but use the Besicovitch covering lemma in place of the Vitali-type covering lemma. Why do we need the former lemma here instead of the latter?)

Exercise 1.6.23 (Cousin's theorem). Prove Cousin's theorem: given any function $\delta: [a,b] \to (0,+\infty)$ on a compact interval $[a,b]$ of positive length, there exists a partition $a = t_0 < t_1 < \ldots < t_k = b$ with $k \geq 1$, together with real numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \leq j \leq k$ and $t_j - t_{j-1} \leq \delta(t_j^*)$. (Hint: use the Heine-Borel theorem, which asserts that any open cover of $[a,b]$ has a finite subcover, followed by the Besicovitch covering lemma.) This theorem is useful in a variety of applications related to the second fundamental theorem of calculus, as we shall see below. The positive function $\delta$ is known as a gauge function.

Now we turn to consequences of the Lebesgue differentiation theorem. Given a Lebesgue measurable set $E \subset \mathbf{R}^d$, call a point $x \in \mathbf{R}^d$ a point of density for $E$ if $\frac{m(E \cap B(x,r))}{m(B(x,r))} \to 1$ as $r \to 0$. Thus, for instance, if $E = [-1,1] \backslash \{0\}$, then every point in $(-1,1)$ (including the boundary point $0$) is a point of density for $E$, but the endpoints $-1, 1$ (as well as the exterior of $E$) are not points of density. One can think of a point of density as being an "almost interior" point of $E$; it is not necessarily the case that one can fit a small ball $B(x,r)$ centred at $x$ inside of $E$, but one can fit most of that small ball inside $E$.

Exercise 1.6.24. If $E \subset \mathbf{R}^d$ is Lebesgue measurable, show that almost every point in $E$ is a point of density for $E$, and almost every point in the complement of $E$ is not a point of density for $E$.
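To make the notion of a point of density concrete, here is a minimal numerical sketch for the one-dimensional example $E = [-1,1] \backslash \{0\}$ discussed above. It is only an illustration, not part of the argument; the helper name `density_ratio` is our own, and the computation relies on the fact that deleting the single point $0$ does not change Lebesgue measure.

```python
def density_ratio(x, r):
    """Return m(E ∩ B(x, r)) / m(B(x, r)) for E = [-1, 1] minus {0}, in dimension one."""
    # Removing the single point 0 does not change Lebesgue measure, so
    # E ∩ (x - r, x + r) has the same measure as [-1, 1] ∩ (x - r, x + r).
    lo = max(x - r, -1.0)
    hi = min(x + r, 1.0)
    overlap = max(hi - lo, 0.0)
    return overlap / (2.0 * r)


if __name__ == "__main__":
    for r in (0.5, 0.1, 0.001):
        # Columns: r, then the ratio at 0 (a point of density), at the
        # endpoint 1 (ratio stays near 1/2), and at the exterior point 2
        # (ratio is 0 once r < 1).
        print(r, density_ratio(0.0, r), density_ratio(1.0, r), density_ratio(2.0, r))
```

The ratio at the endpoint stays near $1/2$ rather than tending to $1$, which is why $\pm 1$ fail to be points of density even though they belong to $E$.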
Exercise 1.6.25. Let $E \subset \mathbf{R}^d$ be a measurable set of positive measure, and let $\varepsilon > 0$.
(i) Using Exercise 1.6.15 and Exercise 1.6.24, show that there exists a cube $Q \subset \mathbf{R}^d$ of positive sidelength such that $m(E \cap Q) > (1-\varepsilon) m(Q)$.
(ii) Give an alternate proof of the above claim that avoids the Lebesgue differentiation theorem. (Hint: reduce to the case when $E$ is bounded, then approximate $E$ by an almost disjoint union of cubes.)
(iii) Use the above result to give an alternate proof of the Steinhaus theorem (Exercise 1.6.8).

Of course, one can replace cubes here by other comparable shapes, such as balls. (Indeed, a good principle to adopt in analysis is that cubes and balls are "equivalent up to constants", in that a cube of some sidelength can be contained in a ball of comparable radius, and vice versa. This type of mental equivalence is analogous to, though not identical with, the famous dictum that a topologist cannot distinguish a doughnut from a coffee cup.)

Exercise 1.6.26.
(i) Give an example of a compact set $K \subset \mathbf{R}$ of positive measure such that $m(K \cap I) < |I|$ for every interval $I$ of positive length. (Hint: first construct an open dense subset of $[0,1]$ of measure strictly less than $1$.)
(ii) Give an example of a measurable set $E \subset \mathbf{R}$ such that $0 < m(E \cap I) < |I|$ for every interval $I$ of positive length. (Hint: first work in a bounded interval, such as $(-1,2)$. The complement of the set $K$ in the first example is the union of at most countably many open intervals, thanks to Exercise 1.6.10. Now fill in these open intervals and iterate.)

Exercise 1.6.27 (Approximations to the identity). Define a good kernel (see Footnote 15) to be a measurable function $P: \mathbf{R}^d \to \mathbf{R}^+$ which is non-negative, radial (which means that there is a function $\tilde P: [0,+\infty) \to \mathbf{R}^+$ such that $P(x) = \tilde P(|x|)$), radially non-increasing (so that $\tilde P$ is a non-increasing function), and has total mass $\int_{\mathbf{R}^d} P(x)\, dx$ equal to $1$. The functions $P_t(x) := \frac{1}{t^d} P(\frac{x}{t})$ for $t > 0$ are then said to be a good family of approximations to the identity.
(i) Show that the heat kernels (see Footnote 16) $P_t(x) := \frac{1}{(4\pi t^2)^{d/2}} e^{-|x|^2/4t^2}$ and Poisson kernels $P_t(x) := c_d \frac{t}{(t^2+|x|^2)^{(d+1)/2}}$ are good families of approximations to the identity, if the constant $c_d > 0$ is chosen correctly (in fact one has $c_d = \Gamma((d+1)/2)/\pi^{(d+1)/2}$, but you are not required to establish this).
(ii) Show that if $P$ is a good kernel, then
$$ c_d < \sum_{n=-\infty}^\infty 2^{dn} \tilde P(2^n) \leq C_d $$
for some constants $0 < c_d < C_d$ depending only on $d$. (Hint: compare $P$ with such "horizontal wedding cake" functions as $\sum_{n=-\infty}^\infty 1_{2^{n-1} < |x| \leq 2^n} \tilde P(2^n)$.)
(iii) Establish the quantitative upper bound
$$ \Big| \int_{\mathbf{R}^d} f(y) P_t(x-y)\, dy \Big| \leq C'_d \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\, dy $$
for any absolutely integrable function $f$ and some constant $C'_d > 0$ depending only on $d$.
(iv) Show that if $f: \mathbf{R}^d \to \mathbf{C}$ is absolutely integrable and $x$ is a Lebesgue point of $f$, then the convolution $f * P_t(x) := \int_{\mathbf{R}^d} f(y) P_t(x-y)\, dy$ converges to $f(x)$ as $t \to 0$. (Hint: split $f(y)$ as the sum of $f(x)$ and $f(y) - f(x)$.) In particular, $f * P_t$ converges pointwise almost everywhere to $f$.

Footnote 15: Different texts have slightly different notions of what a good kernel is. Of course, the "right" class of kernels to consider depends to some extent on what type of convergence results one is interested in (e.g. almost everywhere convergence, convergence in $L^1$ or $L^\infty$ norm, etc.), and on what hypotheses one wishes to place on the original function $f$.

Footnote 16: Note that we have modified the usual formulation of the heat kernel by replacing $t$ with $t^2$ in order to make it conform to the notational conventions used in this exercise.

1.6.3. Almost everywhere differentiability. As we see in undergraduate real analysis, not every continuous function $f: \mathbf{R} \to \mathbf{R}$ is differentiable, with the standard example being the absolute value function $f(x) := |x|$, which is continuous but not differentiable at the
origin $x = 0$. Of course, this function is still almost everywhere differentiable. With a bit more effort, one can construct continuous functions that are in fact nowhere differentiable:

Exercise 1.6.28 (Weierstrass function). Let $F: \mathbf{R} \to \mathbf{R}$ be the function
$$ F(x) := \sum_{n=1}^\infty 4^{-n} \sin(8^n \pi x). $$
(i) Show that $F$ is well-defined (in the sense that the series is absolutely convergent) and that $F$ is a bounded continuous function.
(ii) Show that for every 8-dyadic interval $[\frac{j}{8^n}, \frac{j+1}{8^n}]$ with $n \geq 1$, one has $|F(\frac{j+1}{8^n}) - F(\frac{j}{8^n})| \geq c 4^{-n}$ for some absolute constant $c > 0$.
(iii) Show that $F$ is not differentiable at any point $x \in \mathbf{R}$. (Hint: argue by contradiction and use the previous part of this exercise.) Note that it is not enough to formally differentiate the series term by term and observe that the resulting series is divergent (why not?).

The difficulty here is that a continuous function can still contain a large amount of oscillation, which can lead to breakdown of differentiability. However, if one can somehow limit the amount of oscillation present, then one can often recover a fair bit of differentiability. For instance, we have

Theorem 1.6.25 (Monotone differentiation theorem). Any function $F: \mathbf{R} \to \mathbf{R}$ which is monotone (either monotone non-decreasing or monotone non-increasing) is differentiable almost everywhere.

Exercise 1.6.29. Show that every monotone function is measurable.

To prove this theorem, we just treat the case when $F$ is monotone non-decreasing, as the non-increasing case is similar (and can be deduced from the non-decreasing case by replacing $F$ with $-F$). We also first focus on the case when $F$ is continuous, as this allows us to use the rising sun lemma. To understand the differentiability of $F$, we introduce the four Dini derivatives of $F$ at $x$:
(i) The upper right derivative $\overline{D^+} F(x) := \limsup_{h \to 0^+} \frac{F(x+h)-F(x)}{h}$;
(ii) The lower right derivative $\underline{D^+} F(x) := \liminf_{h \to 0^+} \frac{F(x+h)-F(x)}{h}$;
(iii) The upper left derivative $\overline{D^-} F(x) := \limsup_{h \to 0^-} \frac{F(x+h)-F(x)}{h}$;
(iv) The lower left derivative $\underline{D^-} F(x) := \liminf_{h \to 0^-} \frac{F(x+h)-F(x)}{h}$.

Regardless of whether $F$ is differentiable or not (or even whether $F$ is continuous or not), the four Dini derivatives always exist and take values in the extended real line $[-\infty, \infty]$. (If $F$ is only defined on an interval $[a,b]$, rather than on all of $\mathbf{R}$, then some of the Dini derivatives may not exist at the endpoints, but this is a measure zero set and will not impact our analysis.)

Exercise 1.6.30. If $F$ is monotone, show that the four Dini derivatives of $F$ are measurable. (Hint: the main difficulty is to reformulate the derivatives so that $h$ ranges over a countable set rather than an uncountable one.)

A function $F$ is differentiable at $x$ precisely when the four derivatives are equal and finite:
$$ \overline{D^+} F(x) = \underline{D^+} F(x) = \overline{D^-} F(x) = \underline{D^-} F(x) \in (-\infty, +\infty). \qquad (1.29) $$
We also have the trivial inequalities
$$ \underline{D^+} F(x) \leq \overline{D^+} F(x); \quad \underline{D^-} F(x) \leq \overline{D^-} F(x). $$
If $F$ is non-decreasing, all these quantities are non-negative, thus
$$ 0 \leq \underline{D^+} F(x) \leq \overline{D^+} F(x); \quad 0 \leq \underline{D^-} F(x) \leq \overline{D^-} F(x). $$

The one-sided Hardy-Littlewood maximal inequality has an analogue in this setting:

Lemma 1.6.26 (One-sided Hardy-Littlewood inequality). Let $F: [a,b] \to \mathbf{R}$ be a continuous monotone non-decreasing function, and let $\lambda > 0$. Then we have
$$ m(\{ x \in [a,b] : \overline{D^+} F(x) \geq \lambda \}) \leq \frac{F(b) - F(a)}{\lambda}. $$
Similarly for the other three Dini derivatives of $F$.
If $F$ is not assumed to be continuous, then we have the weaker inequality
$$ m(\{ x \in [a,b] : \overline{D^+} F(x) \geq \lambda \}) \leq C \frac{F(b) - F(a)}{\lambda} $$
for some absolute constant $C > 0$.

Remark 1.6.27. Note that if one naively applies the fundamental theorems of calculus, one can formally see that the first part of Lemma 1.6.26 is equivalent to Lemma 1.6.16. We cannot however use this argument rigorously because we have not established the necessary fundamental theorems of calculus to do this. Nevertheless, we can borrow the proof of Lemma 1.6.16 without difficulty to use here, and this is exactly what we will do.

Proof. We just prove the continuous case and leave the discontinuous case as an exercise.

It suffices to prove the claim for $\overline{D^+} F$; by reflection (replacing $F(x)$ with $-F(-x)$, and $[a,b]$ with $[-b,-a]$), the same argument works for $\overline{D^-} F$, and then this trivially implies the same inequalities for $\underline{D^+} F$ and $\underline{D^-} F$. By modifying $\lambda$ by an epsilon, and dropping the endpoints from $[a,b]$ as they have measure zero, it suffices to show that
$$ m(\{ x \in (a,b) : \overline{D^+} F(x) > \lambda \}) \leq \frac{F(b) - F(a)}{\lambda}. $$

We may apply the rising sun lemma (Lemma 1.6.17) to the continuous function $G(x) := F(x) - \lambda x$. This gives us an at most countable family of intervals $I_n = (a_n, b_n)$ in $(a,b)$, such that $G(b_n) \geq G(a_n)$ for each $n$, and such that $G(y) \leq G(x)$ whenever $a \leq x \leq y \leq b$ and $x$ lies outside of all of the $I_n$.

Observe that if $x \in (a,b)$, and $G(y) \leq G(x)$ for all $x \leq y \leq b$, then $\overline{D^+} F(x) \leq \lambda$. Thus we see that the set $\{ x \in (a,b) : \overline{D^+} F(x) > \lambda \}$ is contained in the union of the $I_n$, and so by countable additivity
$$ m(\{ x \in (a,b) : \overline{D^+} F(x) > \lambda \}) \leq \sum_n (b_n - a_n). $$
But we can rearrange the inequality $G(b_n) \geq G(a_n)$ as $b_n - a_n \leq \frac{F(b_n) - F(a_n)}{\lambda}$. From telescoping series and the monotone nature of $F$ we have $\sum_n F(b_n) - F(a_n) \leq F(b) - F(a)$ (this is easiest to prove by
first working with a finite subcollection of the intervals $(a_n, b_n)$, and then taking suprema), and the claim follows.

The discontinuous case is left as an exercise.

Exercise 1.6.31. Prove Lemma 1.6.26 in the discontinuous case. (Hint: the rising sun lemma is no longer available, but one can use either the Vitali-type covering lemma (which will give $C = 3$) or the Besicovitch lemma (which will give $C = 2$), by modifying the proof of Theorem 1.6.20.)

Sending $\lambda \to \infty$ in the above lemma (cf. Exercise 1.3.18), and then sending $[a,b]$ to $\mathbf{R}$, we conclude as a corollary that all the four Dini derivatives of a continuous monotone non-decreasing function are finite almost everywhere. So to prove Theorem 1.6.25 for continuous monotone non-decreasing functions, it suffices to show that (1.29) holds for almost every $x$. In view of the trivial inequalities, it suffices to show that $\overline{D^+} F(x) \leq \underline{D^-} F(x)$ and $\overline{D^-} F(x) \leq \underline{D^+} F(x)$ for almost every $x$. We will just show the first inequality, as the second follows by replacing $F$ with its reflection $x \mapsto -F(-x)$. It will suffice to show that for every pair $0 < r < R$ of real numbers, the set
$$ E = E_{r,R} := \{ x \in \mathbf{R} : \overline{D^+} F(x) > R > r > \underline{D^-} F(x) \} $$
is a null set, since by letting $R, r$ range over rationals with $R > r > 0$ and taking countable unions, we would conclude that the set $\{ x \in \mathbf{R} : \overline{D^+} F(x) > \underline{D^-} F(x) \}$ is a null set (recall that the Dini derivatives are all non-negative when $F$ is non-decreasing), and the claim follows.

Clearly $E$ is a measurable set. To prove that it is null, we will establish the following estimate:

Lemma 1.6.28 ($E$ has density less than one). For any interval $[a,b]$ and any $0 < r < R$, one has $m(E_{r,R} \cap [a,b]) \leq \frac{r}{R} |b - a|$.

Indeed, this lemma implies that $E$ has no points of density, which by Exercise 1.6.24 forces $E$ to be a null set.

Proof. We begin by applying the rising sun lemma to the function $G(x) := rx + F(-x)$ on $[-b,-a]$; the large number of negative signs present here is needed in order to properly deal with the lower left Dini derivative $\underline{D^-} F$. This gives an at most countable family of disjoint
intervals $-I_n = (-b_n, -a_n)$ in $(-b,-a)$, such that $G(-a_n) \geq G(-b_n)$ for all $n$, and such that $G(-y) \leq G(-x)$ whenever $-x \leq -y \leq -a$ and $-x \in (-b,-a)$ lies outside of all of the $-I_n$. Observe that if $x \in (a,b)$, and $G(-y) \leq G(-x)$ for all $-x \leq -y \leq -a$, then $\underline{D^-} F(x) \geq r$. Thus we see that $E_{r,R}$ is contained inside the union of the intervals $I_n = (a_n, b_n)$. On the other hand, from the first part of Lemma 1.6.26 we have
$$ m(E_{r,R} \cap (a_n, b_n)) \leq \frac{F(b_n) - F(a_n)}{R}. $$
But we can rearrange the inequality $G(-a_n) \geq G(-b_n)$ as $F(b_n) - F(a_n) \leq r(b_n - a_n)$. From countable additivity, one thus has
$$ m(E_{r,R}) \leq \frac{r}{R} \sum_n (b_n - a_n). $$
But the $(a_n, b_n)$ are disjoint inside $(a,b)$, so from countable additivity again, we have $\sum_n (b_n - a_n) \leq b - a$, and the claim follows.

Remark 1.6.29. Note that if $F$ was not assumed to be continuous, then one would lose a factor of $C$ here from the second part of Lemma 1.6.26, and one would then be unable to prevent $\overline{D^+} F$ from being up to $C$ times as large as $\underline{D^-} F$. So sometimes, even when all one is seeking is a qualitative result such as differentiability, it is still important to keep track of constants. (But this is the exception rather than the rule: for a large portion of arguments in analysis, the constants are not terribly important.)

This concludes the proof of Theorem 1.6.25 in the continuous monotone non-decreasing case. Now we work on removing the continuity hypothesis (which was needed in order to make the rising sun lemma work properly). If we naively try to run the density argument as we did in previous sections, then (for once) the argument does not work very well, as the space of continuous monotone functions is not sufficiently dense in the space of all monotone functions in the relevant sense (which, in this case, is in the total variation sense, which is what is needed to invoke such tools as Lemma 1.6.26). To bridge this gap, we have to supplement the continuous monotone functions with another class of monotone functions, known as the jump functions.
Definition 1.6.30 (Jump function). A basic jump function $J$ is a function of the form
$$ J(x) := \begin{cases} 0 & \text{when } x < x_0 \\ \theta & \text{when } x = x_0 \\ 1 & \text{when } x > x_0 \end{cases} $$
for some real numbers $x_0 \in \mathbf{R}$ and $0 \leq \theta \leq 1$; we call $x_0$ the point of discontinuity for $J$ and $\theta$ the fraction. Observe that such functions are monotone non-decreasing, but have a discontinuity at one point. A jump function is any absolutely convergent combination of basic jump functions, i.e. a function of the form $F = \sum_n c_n J_n$, where $n$ ranges over an at most countable set, each $J_n$ is a basic jump function, and the $c_n$ are positive reals with $\sum_n c_n < \infty$. If there are only finitely many $n$ involved, we say that $F$ is a piecewise constant jump function.

Thus, for instance, if $q_1, q_2, q_3, \ldots$ is any enumeration of the rationals, then $\sum_{n=1}^\infty 2^{-n} 1_{[q_n,+\infty)}$ is a jump function.

Clearly, all jump functions are monotone non-decreasing. From the absolute convergence of the $c_n$ we see that every jump function is the uniform limit of piecewise constant jump functions, for instance $\sum_{n=1}^\infty c_n J_n$ is the uniform limit of $\sum_{n=1}^N c_n J_n$. One consequence of this is that the points of discontinuity of a jump function $\sum_{n=1}^\infty c_n J_n$ are precisely those of the individual summands $c_n J_n$, i.e. of the points $x_n$ where each $J_n$ jumps.

The key fact is that these functions, together with the continuous monotone functions, essentially generate all monotone functions, at least in the bounded case:

Lemma 1.6.31 (Continuous-singular decomposition for monotone functions). Let $F: \mathbf{R} \to \mathbf{R}$ be a monotone non-decreasing function.
(i) The only discontinuities of $F$ are jump discontinuities. More precisely, if $x$ is a point where $F$ is discontinuous, then the limits $\lim_{y \to x^-} F(y)$ and $\lim_{y \to x^+} F(y)$ both exist, but are unequal, with $\lim_{y \to x^-} F(y) < \lim_{y \to x^+} F(y)$.
(ii) There are at most countably many discontinuities of $F$.
(iii) If $F$ is bounded, then $F$ can be expressed as the sum of a continuous monotone non-decreasing function $F_c$ and a jump function $F_{pp}$.

Remark 1.6.32. This decomposition is part of the more general Lebesgue decomposition, discussed in §1.2 of An epsilon of room, Vol. I.

Proof. By monotonicity, the limits $F_-(x) := \lim_{y \to x^-} F(y)$ and $F_+(x) := \lim_{y \to x^+} F(y)$ always exist, with $F_-(x) \leq F(x) \leq F_+(x)$ for all $x$. This gives (i).

By (i), whenever there is a discontinuity $x$ of $F$, there is at least one rational number $q_x$ strictly between $F_-(x)$ and $F_+(x)$, and from monotonicity, each rational number can be assigned to at most one discontinuity. This gives (ii).

Now we prove (iii). Let $A$ be the set of discontinuities of $F$, thus $A$ is at most countable. For each $x \in A$, we define the jump $c_x := F_+(x) - F_-(x) > 0$, and the fraction $\theta_x := \frac{F(x) - F_-(x)}{F_+(x) - F_-(x)} \in [0,1]$. Thus
$$ F_+(x) = F_-(x) + c_x \quad \text{and} \quad F(x) = F_-(x) + \theta_x c_x. $$

Note that $c_x$ is the measure of the interval $(F_-(x), F_+(x))$. By monotonicity, these intervals are disjoint; by the boundedness of $F$, their union is bounded. By countable additivity, we thus have $\sum_{x \in A} c_x < \infty$, and so if we let $J_x$ be the basic jump function with point of discontinuity $x$ and fraction $\theta_x$, then the function
$$ F_{pp} := \sum_{x \in A} c_x J_x $$
is a jump function.

As discussed previously, $F_{pp}$ is discontinuous only at $A$, and for each $x \in A$ one easily checks that
$$ (F_{pp})_+(x) = (F_{pp})_-(x) + c_x \quad \text{and} \quad F_{pp}(x) = (F_{pp})_-(x) + \theta_x c_x $$
where $(F_{pp})_-(x) := \lim_{y \to x^-} F_{pp}(y)$, and $(F_{pp})_+(x) := \lim_{y \to x^+} F_{pp}(y)$. We thus see that the difference $F_c := F - F_{pp}$ is continuous. The only
remaining task is to verify that $F_c$ is monotone non-decreasing, thus we need
$$ F_{pp}(b) - F_{pp}(a) \leq F(b) - F(a) $$
for all $a < b$. But the left-hand side can be rewritten as $\sum_{x \in A \cap [a,b]} c_x$. As each $c_x$ is the measure of the interval $(F_-(x), F_+(x))$, and these intervals for $x \in A \cap [a,b]$ are disjoint and lie in $(F(a), F(b))$, the claim follows from countable additivity.

Exercise 1.6.32. Show that the decomposition of a bounded monotone non-decreasing function $F$ into continuous $F_c$ and jump components $F_{pp}$ given by the above lemma is unique.

Exercise 1.6.33. Find a suitable generalisation of the notion of a jump function that allows one to extend the above decomposition to unbounded monotone functions, and then prove this extension. (Hint: the notion to shoot for here is that of a "locally jump function".)

Now we can finish the proof of Theorem 1.6.25. As noted previously, it suffices to prove the claim for monotone non-decreasing functions. As differentiability is a local condition, we can easily reduce to the case of bounded monotone non-decreasing functions, since to test differentiability of a monotone non-decreasing function $F$ in any compact interval $[a,b]$ we may replace $F$ by the bounded monotone non-decreasing function $\max(\min(F, F(b)), F(a))$ with no change in the differentiability in $[a,b]$ (except perhaps at the endpoints $a, b$, but these form a set of measure zero). As we have already proven the claim for continuous functions, it suffices by Lemma 1.6.31 (and linearity of the derivative) to verify the claim for jump functions.

Now, finally, we are able to use the density argument, using the piecewise constant jump functions as the dense subclass, and using the second part of Lemma 1.6.26 for the quantitative estimate; fortunately for us, the density argument does not particularly care that there is a loss of a constant factor in this estimate.

For piecewise constant jump functions, the claim is clear (indeed, the derivative exists and is zero outside of finitely many discontinuities). Now we run the density argument. Let $F$ be a bounded jump function, and let $\varepsilon > 0$ and $\lambda > 0$ be arbitrary. As every jump function is the uniform limit of piecewise constant jump functions, we can find
a piecewise constant jump function $F_\varepsilon$ such that $|F(x) - F_\varepsilon(x)| \leq \varepsilon$ for all $x$. Indeed, by taking $F_\varepsilon$ to be a partial sum of the basic jump functions that make up $F$, we can ensure that $F - F_\varepsilon$ is also a monotone non-decreasing function. Applying the second part of Lemma 1.6.26, we have
$$ m(\{ x \in \mathbf{R} : \overline{D^+}(F - F_\varepsilon)(x) \geq \lambda \}) \leq \frac{2C\varepsilon}{\lambda} $$
for some absolute constant $C$, and similarly for the other three Dini derivatives. Thus, outside of a set of measure at most $8C\varepsilon/\lambda$, all of the Dini derivatives of $F - F_\varepsilon$ are less than $\lambda$. Since $F_\varepsilon$ is almost everywhere differentiable, we conclude that outside of a set of measure at most $8C\varepsilon/\lambda$, all the Dini derivatives of $F$ at $x$ lie within $\lambda$ of $F'_\varepsilon(x)$, and in particular are finite and lie within $2\lambda$ of each other. Sending $\varepsilon$ to zero (holding $\lambda$ fixed), we conclude that for almost every $x$, the Dini derivatives of $F$ are finite and lie within $2\lambda$ of each other. If we then send $\lambda$ to zero, we see that for almost every $x$, the Dini derivatives of $F$ agree with each other and are finite, and the claim follows.

This concludes the proof of Theorem 1.6.25.

Just as the integration theory of unsigned functions can be used to develop the integration theory of the absolutely integrable functions (see Section 1.3.4), the differentiation theory of monotone functions can be used to develop a parallel differentiation theory for the class of functions of bounded variation:

Definition 1.6.33 (Bounded variation). Let $F: \mathbf{R} \to \mathbf{R}$ be a function. The total variation $\|F\|_{TV(\mathbf{R})}$ (or $\|F\|_{TV}$ for short) of $F$ is defined to be the supremum
$$ \|F\|_{TV(\mathbf{R})} := \sup_{x_0 < \ldots < x_n} \sum_{i=1}^n |F(x_i) - F(x_{i-1})| $$
where the supremum ranges over all finite increasing sequences $x_0, \ldots, x_n$ of real numbers with $n \geq 0$; this is a quantity in $[0,+\infty]$. We say that $F$ has bounded variation (on $\mathbf{R}$) if $\|F\|_{TV(\mathbf{R})}$ is finite. (In this case, $\|F\|_{TV(\mathbf{R})}$ is often written as $\|F\|_{BV(\mathbf{R})}$ or just $\|F\|_{BV}$.)
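Any single finite increasing sequence gives a lower bound for the supremum in the definition of total variation, and refining the sequence can only increase the sum. The following is a minimal numerical sketch of this, under the assumption that we only sample points in $[0, 2\pi]$ and take $F = \sin$; the helper names `variation_sum` and `uniform_partition` are our own and do not appear in the text.

```python
import math


def variation_sum(F, points):
    """Sum of |F(x_i) - F(x_{i-1})| over consecutive points of an increasing sequence."""
    return sum(abs(F(b) - F(a)) for a, b in zip(points, points[1:]))


def uniform_partition(a, b, n):
    """Return the n + 1 equally spaced points a = x_0 < ... < x_n = b."""
    return [a + (b - a) * i / n for i in range(n + 1)]


if __name__ == "__main__":
    # A coarse sequence can miss almost all of the oscillation...
    coarse = [0.0, math.pi, 2.0 * math.pi]
    print(variation_sum(math.sin, coarse))
    # ...while a fine one captures the rise by 1, the fall by 2, and the rise by 1.
    fine = uniform_partition(0.0, 2.0 * math.pi, 1000)
    print(variation_sum(math.sin, fine))
```

The coarse sequence gives a sum close to $0$, while the fine one gives a sum close to $4$; of course, a computation of this kind only ever produces lower bounds for the supremum.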
34.N ]) .) Much as an absolutely integrable function can be expressed as the diﬀerence of its positive and negative parts.6. Diﬀerentiation theorems Given any interval [a.c]) whenever a ≤ b ≤ c.34.<xn ≤b i=1 F (xi ) − F (xi+1 ). If F : R → R is a monotone function. Exercise 1. establish the triangle property F + G T V (R) ≤ F T V (R) + G T V (R) and the homogeneity property cF T V (R) = c F T V (R) for any c ∈ R. b] as n 165 T V ([a.36. . and that F T V (R) = f L1 (R) .38. We say that a function F has bounded variation on [a.b]) F T V ([a.. Show that F is of bounded variation. compactly supported function f that is not of bounded variation. thus the deﬁnition is the same. Exercise 1.6. (i) Show that every function f : R → R of bounded variation is bounded. b] if F BV ([a.6. and that the limits limx→+∞ f (x) and limx→−∞ f (x). but the points x0 . and let F : R → R be the indeﬁnite integral F (x) := [−∞.b]) := sup a≤x0 <.37. T V ([a. show that F T V ([a.6.6.b]) = F (b) − F (a) for any interval [a. (Hint: the upper bound F T V (R) ≤ f L1 (R) is relatively easy to establish. Thus for instance F T V (R) = supN →∞ F T V ([−N. (ii) Give a counterexample of a bounded. .35. use the density argument.b]) + Exercise 1. b]. To obtain the lower bound.1. .. A function F : R → R is of bounded variation if and only if it is the diﬀerence of two bounded monotone functions. continuous. Also show that F T V = 0 if and only if F is constant. For any functions F. Exercise 1.b]) is ﬁnite. Let f : R → R be an absolutely integrable function. b]. .6.6. and that F has bounded variation on R if and only if it is bounded. are welldeﬁned. Exercise 1.c]) = F T V ([a. xn are restricted to lie in [a. G : R → R. show that F F T V ([b. . b]. If F : R → R is a function.x] f (x). we deﬁne the total variation F of F on [a. a bounded variation function can be expressed as the diﬀerence of two bounded monotone functions: Proposition 1.
34. However.. n thus increasing the sum supx0 <.. = F + (+∞) + F − (+∞) for every interval [a. it suﬃces to (by writing F = F+ −(F+ −F− ) to show that F+ −F is nondecreasing.<xn ≤x i=1 max(−F (xi+1 ) + F (xi ). .6. To conclude the proposition. Measure theory Proof.b] = F + (b) − F + (a) + F − (b) − F − (a). It is clear from construction that this is a monotone increasing function. b]. .6. so assume that F (b) − F (a) ≥ 0..35 that the diﬀerence of two bounded monotone functions is bounded.) .. Exercise 1. < xn ≤ x that is good for F + need not be good for F − .<xn ≤x i=1 max(F (xi+1 ) − F (xi ).39. Deﬁne the positive variation F + by (1. Let F : R → R be of bounded variation. < xn ≤ a can be extended by one or two elements by adding a and b.. and the negative variation F − by n F − (x) := sup x0 <.30). If F (b) − F (a) is negative then this is clear from the monotone nondecreasing nature of F + . 0). where F (−∞) := limx→−∞ F (x). and vice versa. Establish the identities F (x) = F (−∞) + F + (x) − F − (x). . 1.<xn i=1 max(F (xi ) − F (xi+1 ).166 1. or in other words to show that F + (b) ≥ F + (a) + F (b) − F (a).6. Now deﬁne the positive variation F + : R → R of F by the formula n (1. F and F TV T V [a.. taking values between 0 and F T V (R) . (Hint: The main diﬃculty comes from the fact that a partition x0 < . 0) by at least F (b) − F (a). .30) F (x) := + sup x0 <. But then the claim follows because any sequence of real numbers x0 < . and F − (+∞) := limx→+∞ F − (x). and is thus bounded. 0). this can be ﬁxed by taking a good partition for F + and a good partition for F − and combining them together into a common reﬁnement. F + (+∞) := limx→+∞ F + (x). It is clear from Exercises 1.
From Proposition 1.6.34 and Theorem 1.6.25 we immediately obtain

Corollary 1.6.35 (BV differentiation theorem). Every bounded variation function is differentiable almost everywhere.

Exercise 1.6.40. Call a function locally of bounded variation if it is of bounded variation on every compact interval $[a,b]$. Show that every function that is locally of bounded variation is differentiable almost everywhere.

Exercise 1.6.41 (Lipschitz differentiation theorem, one-dimensional case). A function $f: \mathbf{R} \to \mathbf{R}$ is said to be Lipschitz continuous if there exists a constant $C > 0$ such that $|f(x) - f(y)| \leq C|x - y|$ for all $x, y \in \mathbf{R}$; the smallest $C$ with this property is known as the Lipschitz constant of $f$. Show that every Lipschitz continuous function $F$ is locally of bounded variation, and hence differentiable almost everywhere. Furthermore, show that the derivative $F'$, when it exists, is bounded in magnitude by the Lipschitz constant of $F$.

Remark 1.6.36. The same result is true in higher dimensions, and is known as the Rademacher differentiation theorem, but we will defer the proof of this theorem to Section 2.2, when we have the powerful tool of the Fubini-Tonelli theorem (Corollary 1.7.23) available, that is particularly useful for deducing higher-dimensional results in analysis from lower-dimensional ones.

Exercise 1.6.42. A function $f: \mathbf{R} \to \mathbf{R}$ is said to be convex if one has $f((1-t)x + ty) \leq (1-t)f(x) + tf(y)$ for all $x < y$ and $0 < t < 1$. Show that if $f$ is convex, then it is continuous and almost everywhere differentiable, and its derivative $f'$ is equal almost everywhere to a monotone non-decreasing function, and so is itself almost everywhere differentiable. (Hint: Drawing the graph of $f$, together with a number of chords and tangent lines, is likely to be very helpful in providing visual intuition.) Thus we see that in some sense, convex functions are "almost everywhere twice differentiable". Similar claims also hold for concave functions, of course.
1.6.4. The second fundamental theorem of calculus. We are now finally ready to attack the second fundamental theorem of calculus in the cases where $F$ is not assumed to be continuously differentiable. We begin with the case when $F: [a,b] \to \mathbf{R}$ is monotone non-decreasing. From Theorem 1.6.25 (extending $F$ to the rest of the real line if needed), this implies that $F$ is differentiable almost everywhere in $[a,b]$, so $F'$ is defined a.e.; from monotonicity we see that $F'$ is non-negative whenever it is defined. Also, an easy modification of Exercise 1.6.1 shows that $F'$ is measurable.

One half of the second fundamental theorem is easy:

Proposition 1.6.37 (Upper bound for second fundamental theorem). Let $F: [a,b] \to \mathbf{R}$ be monotone non-decreasing (so that, as discussed above, $F'$ is defined almost everywhere, is unsigned, and is measurable). Then
$$ \int_{[a,b]} F'(x)\, dx \leq F(b) - F(a). $$
In particular, $F'$ is absolutely integrable.

Proof. It is convenient to extend $F$ to all of $\mathbf{R}$ by declaring $F(x) := F(b)$ for $x > b$ and $F(x) := F(a)$ for $x < a$; then $F$ is now a bounded monotone function on $\mathbf{R}$, and $F'$ vanishes outside of $[a,b]$. As $F$ is almost everywhere differentiable, the Newton quotients
$$ f_n(x) := \frac{F(x + 1/n) - F(x)}{1/n} $$
converge pointwise almost everywhere to $F'$. Applying Fatou's lemma (Corollary 1.4.47), we conclude that
$$ \int_{[a,b]} F'(x)\, dx \leq \liminf_{n \to \infty} \int_{[a,b]} \frac{F(x + 1/n) - F(x)}{1/n}\, dx. $$
The right-hand side can be rearranged as
$$ \liminf_{n \to \infty} n \Big( \int_{[a+1/n, b+1/n]} F(y)\, dy - \int_{[a,b]} F(x)\, dx \Big) $$
which can be rearranged further as
$$ \liminf_{n \to \infty} n \Big( \int_{[b, b+1/n]} F(x)\, dx - \int_{[a, a+1/n]} F(x)\, dx \Big). $$
Since $F$ is equal to $F(b)$ for the first integral and is at least $F(a)$ for the second integral, this expression is at most
$$ \liminf_{n \to \infty} n (F(b)/n - F(a)/n) = F(b) - F(a) $$
and the claim follows.

Exercise 1.6.43. Show that any function of bounded variation has an (almost everywhere defined) derivative that is absolutely integrable.

In the Lipschitz case, one can do better:

Exercise 1.6.44 (Second fundamental theorem for Lipschitz functions). Let $F: [a,b] \to \mathbf{R}$ be Lipschitz continuous. Show that $\int_{[a,b]} F'(x)\, dx = F(b) - F(a)$. (Hint: Argue as in the proof of Proposition 1.6.37, but use the dominated convergence theorem (Theorem 1.4.49) in place of Fatou's lemma (Corollary 1.4.47).)

Exercise 1.6.45 (Integration by parts formula). Let $F, G: [a,b] \to \mathbf{R}$ be Lipschitz continuous functions. Show that
$$ \int_{[a,b]} F'(x) G(x)\, dx = F(b)G(b) - F(a)G(a) - \int_{[a,b]} F(x) G'(x)\, dx. $$
(Hint: first show that the product of two Lipschitz continuous functions on $[a,b]$ is again Lipschitz continuous.)

Now we return to the monotone case. Inspired by the Lipschitz case, one may hope to recover equality in Proposition 1.6.37 for such functions $F$. However, there is an important obstruction to this, which is that all the variation of $F$ may be concentrated in a set of measure zero, and thus undetectable by the Lebesgue integral of $F'$. This is most obvious in the case of a discontinuous monotone function, such as the (appropriately named) Heaviside function $F := 1_{[0,+\infty)}$; it is clear that $F'$ vanishes almost everywhere, but $F(b) - F(a)$ is not equal to $\int_{[a,b]} F'(x)\, dx$ if $b$ and $a$ lie on opposite sides of the discontinuity at $0$. In fact, the same problem arises for all jump functions:
Exercise 1.6.46. Show that if $F$ is a jump function, then $F'$ vanishes almost everywhere. (Hint: use the density argument, starting from piecewise constant jump functions and using Proposition 1.6.37 as the quantitative estimate.)

One may hope that jump functions, in which all the fluctuation is concentrated in a countable set, are the only obstruction to the second fundamental theorem of calculus holding for monotone functions, and that as long as one restricts attention to continuous monotone functions, one can recover the second fundamental theorem. However, this is still not true, because it is possible for all the fluctuation to now be concentrated, not in a countable collection of jump discontinuities, but instead in an uncountable set of zero measure, such as the middle thirds Cantor set (Exercise 1.2.9). This can be illustrated by the key counterexample of the Cantor function, also known as the Devil's staircase function. The construction of this function is detailed in the exercise below.

Exercise 1.6.47 (Cantor function). Define the functions $F_0, F_1, F_2, \ldots: [0,1] \to \mathbf{R}$ recursively as follows:
1. Set $F_0(x) := x$ for all $x \in [0,1]$.
2. For each $n = 1, 2, \ldots$ in turn, define
$$ F_n(x) := \begin{cases} \frac{1}{2} F_{n-1}(3x) & \text{if } x \in [0,1/3]; \\ \frac{1}{2} & \text{if } x \in (1/3,2/3); \\ \frac{1}{2} + \frac{1}{2} F_{n-1}(3x-2) & \text{if } x \in [2/3,1]. \end{cases} $$
(i) Graph $F_0$, $F_1$, $F_2$, and $F_3$ (preferably on a single graph).
(ii) Show that for each $n = 0, 1, \ldots$, $F_n$ is a continuous monotone non-decreasing function with $F_n(0) = 0$ and $F_n(1) = 1$. (Hint: induct on $n$.)
(iii) Show that for each $n = 0, 1, \ldots$, one has $|F_{n+1}(x) - F_n(x)| \leq 2^{-n}$ for each $x \in [0,1]$. Conclude that the $F_n$ converge uniformly to a limit $F: [0,1] \to \mathbf{R}$. This limit is known as the Cantor function.
(iv) Show that the Cantor function $F$ is continuous and monotone non-decreasing, with $F(0) = 0$ and $F(1) = 1$.
(v) Show that if $x \in [0,1]$ lies outside the middle thirds Cantor set (Exercise 1.2.9), then $F$ is constant in a neighbourhood of $x$, and in particular $F'(x) = 0$. Conclude that $\int_{[0,1]} F'(x)\,dx = 0 \ne 1 = F(1) - F(0)$, so that the second fundamental theorem of calculus fails for this function.
(vi) Show that $F(\sum_{n=1}^\infty a_n 3^{-n}) = \sum_{n=1}^\infty \frac{a_n}{2} 2^{-n}$ for any digits $a_1, a_2, \ldots \in \{0,2\}$. Thus the Cantor function, in some sense, converts base three expansions to base two expansions.
(vii) Let $I = [\sum_{i=1}^n \frac{a_i}{3^i}, \sum_{i=1}^n \frac{a_i}{3^i} + \frac{1}{3^n}]$ be one of the intervals used in the $n$-th cover $I_n$ of $C$ (see Exercise 1.2.9), thus $n \ge 0$ and $a_1, \ldots, a_n \in \{0,2\}$. Show that $I$ is an interval of length $3^{-n}$, but $F(I)$ is an interval of length $2^{-n}$.
(viii) Show that $F$ is not differentiable at any element of the Cantor set $C$.

Remark 1.6.38. This example shows that the classical derivative $F'(x) := \lim_{h \to 0; h \ne 0} \frac{F(x+h) - F(x)}{h}$ of a function has some defects; it cannot "see" some of the variation of a continuous monotone function such as the Cantor function. In §1.13 of An epsilon of room, Vol. I, this will be rectified by introducing the concept of the weak derivative of a function, which despite the name, is more able than the strong derivative to detect this type of singular variation behaviour. (We will also encounter in Section 1.7.3 the Lebesgue-Stieltjes integral, which is another (closely related) way to capture all of the variation of a monotone function, and which is related to the classical derivative via the Lebesgue-Radon-Nikodym theorem; see §1.2 of An epsilon of room, Vol. I.)

In view of this counterexample, we see that we need to add an additional hypothesis to the continuous monotone non-decreasing function $F$ before we can recover the second fundamental theorem. One such hypothesis is absolute continuity. To motivate this definition, let us recall two existing definitions:

(i) A function $F : \mathbf{R} \to \mathbf{R}$ is continuous if, for every $\varepsilon > 0$ and $x_0 \in \mathbf{R}$, there exists a $\delta > 0$ such that $|F(b) - F(a)| \le \varepsilon$ whenever $(a,b)$ is an interval of length at most $\delta$ that contains $x_0$.
(ii) A function $F : \mathbf{R} \to \mathbf{R}$ is uniformly continuous if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|F(b) - F(a)| \le \varepsilon$ whenever $(a,b)$ is an interval of length at most $\delta$.

Definition 1.6.39. A function $F : \mathbf{R} \to \mathbf{R}$ is said to be absolutely continuous if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\sum_{j=1}^n |F(b_j) - F(a_j)| \le \varepsilon$ whenever $(a_1,b_1), \ldots, (a_n,b_n)$ is a finite collection of disjoint intervals of total length $\sum_{j=1}^n b_j - a_j$ at most $\delta$.

We define absolute continuity for a function $F : [a,b] \to \mathbf{R}$ defined on an interval $[a,b]$ similarly, with the only difference being that the intervals $[a_j,b_j]$ are of course now required to lie in the domain $[a,b]$ of $F$.

The following exercise places absolute continuity in relation to other regularity properties:

Exercise 1.6.48.
(i) Show that every absolutely continuous function is uniformly continuous and therefore continuous.
(ii) Show that every absolutely continuous function is of bounded variation on every compact interval $[a,b]$. (Hint: first show this is true for any sufficiently small interval.) In particular (by Exercise 1.6.40), absolutely continuous functions are differentiable almost everywhere.
(iii) Show that every Lipschitz continuous function is absolutely continuous.
(iv) Show that the function $x \mapsto \sqrt{x}$ is absolutely continuous, but not Lipschitz continuous, on the interval $[0,1]$.
(v) Show that the Cantor function from Exercise 1.6.47 is continuous, monotone, and uniformly continuous, but not absolutely continuous, on $[0,1]$.
(vi) If $f : \mathbf{R} \to \mathbf{R}$ is absolutely integrable, show that the indefinite integral $F(x) := \int_{[-\infty,x]} f(y)\,dy$ is absolutely continuous, and that $F$ is differentiable almost everywhere with $F'(x) = f(x)$ for almost every $x$.
(vii) Show that the sum or product of two absolutely continuous functions on an interval $[a,b]$ remains absolutely continuous. What happens if we work on $\mathbf{R}$ instead of on $[a,b]$?
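Part (v) of Exercise 1.6.48 can be explored numerically. The following is a minimal sketch, not part of the original text: it evaluates the approximants $F_n$ of Exercise 1.6.47 with exact rational arithmetic and sums the increments over the $2^n$ intervals of the $n$-th cover of the Cantor set. The function names are hypothetical, and the sketch relies on the fact (checkable by induction) that $F_n$ already agrees with the Cantor function $F$ at the endpoints of the $n$-th cover.

```python
from fractions import Fraction


def cantor_approx(n, x):
    """Return F_n(x) from the recursion in Exercise 1.6.47, for x in [0, 1]."""
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_approx(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_approx(n - 1, 3 * x - 2) / 2


def cover_intervals(n):
    """Return the 2**n intervals of the n-th cover of the middle thirds Cantor set."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_level = []
        for a, b in intervals:
            third = (b - a) / 3
            next_level.append((a, a + third))   # keep the left third
            next_level.append((b - third, b))   # keep the right third
        intervals = next_level
    return intervals


def total_length_and_variation(n):
    """Total length of the n-th cover, and the sum of the increments of F over it."""
    intervals = cover_intervals(n)
    length = sum(b - a for a, b in intervals)
    variation = sum(cantor_approx(n, b) - cantor_approx(n, a) for a, b in intervals)
    return length, variation


if __name__ == "__main__":
    for n in range(6):
        length, variation = total_length_and_variation(n)
        print(n, length, variation)
```

The total length $(2/3)^n$ tends to zero while the sum of the increments stays equal to $1$, so no $\delta$ can work for $\varepsilon < 1$ in Definition 1.6.39. This is only an illustration of the obstruction, not a substitute for the proof requested in the exercise.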
Exercise 1.6.49.
(i) Show that absolutely continuous functions map null sets to null sets, i.e. if $F : \mathbf{R} \to \mathbf{R}$ is absolutely continuous and $E$ is a null set then $F(E) := \{F(x) : x \in E\}$ is also a null set.
(ii) Show that the Cantor function does not have this property.

For absolutely continuous functions, we can recover the second fundamental theorem of calculus:

Theorem 1.6.40 (Second fundamental theorem for absolutely continuous functions). Let $F : [a,b] \to \mathbf{R}$ be absolutely continuous. Then $\int_{[a,b]} F'(x)\,dx = F(b) - F(a)$.

Proof. Our main tool here will be Cousin's theorem (Exercise 1.6.23).

By Exercise 1.6.43, $F'$ is absolutely integrable. By Exercise 1.5.10, $F'$ is thus uniformly integrable. Now let $\varepsilon > 0$. By Exercise 1.5.13, we can find $\kappa > 0$ such that $\int_U |F'(x)|\,dx \le \varepsilon$ whenever $U \subset [a,b]$ is a measurable set of measure at most $\kappa$. (Here we adopt the convention that $F'$ vanishes outside of $[a,b]$.) By making $\kappa$ small enough, we may also assume from absolute continuity that $\sum_{j=1}^n |F(b_j) - F(a_j)| \le \varepsilon$ whenever $(a_1,b_1), \ldots, (a_n,b_n)$ is a finite collection of disjoint intervals of total length $\sum_{j=1}^n b_j - a_j$ at most $\kappa$.

Let $E \subset [a,b]$ be the set of points $x$ where $F$ is not differentiable, together with the endpoints $a, b$, as well as the points where $x$ is not a Lebesgue point of $F'$; thus $E$ is a null set. By outer regularity (or the definition of outer measure) we can find an open set $U$ containing $E$ of measure $m(U) < \kappa$. In particular, $\int_U |F'(x)|\,dx \le \varepsilon$.

Now define a gauge function $\delta : [a,b] \to (0,+\infty)$ as follows.

(i) If $x \in E$, we define $\delta(x) > 0$ to be small enough that the open interval $(x - \delta(x), x + \delta(x))$ lies in $U$.
(ii) If $x \notin E$, then $F$ is differentiable at $x$ and $x$ is a Lebesgue point of $F'$. We let $\delta(x) > 0$ be small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon |y-x|$ holds whenever $|y-x| \le \delta(x)$, and such that $|\frac{1}{|I|} \int_I F'(y)\,dy - F'(x)| \le \varepsilon$ whenever $I$ is an interval containing $x$ of length at most $\delta(x)$; such a $\delta(x)$ exists by the definition of differentiability, and of Lebesgue point.

We rewrite these properties using big-O notation [17] as
$$F(y) - F(x) = (y-x)F'(x) + O(\varepsilon |y-x|)$$
and
$$\int_I F'(y)\,dy = |I| F'(x) + O(\varepsilon |I|).$$

[17] In this notation, we use $O(X)$ to denote a quantity $Y$ whose magnitude $|Y|$ is at most $CX$ for some absolute constant $C$. This notation is convenient for managing error terms when it is not important to keep track of the exact value of constants such as $C$, due to such rules as $O(X) + O(X) = O(X)$.

Applying Cousin's theorem, we can find a partition $a = t_0 < t_1 < \ldots < t_k = b$ with $k \ge 1$, together with real numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \le j \le k$ and $t_j - t_{j-1} \le \delta(t_j^*)$.

We can express $F(b) - F(a)$ as a telescoping series
$$F(b) - F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}).$$

To estimate the size of this sum, let us first consider those $j$ for which $t_j^* \in E$. Then, by construction, the intervals $(t_{j-1}, t_j)$ are disjoint and contained in $U$, and hence have total length at most $m(U) < \kappa$. By construction of $\kappa$, we thus have
$$\sum_{j : t_j^* \in E} |F(t_j) - F(t_{j-1})| \le \varepsilon$$
and thus
$$\sum_{j : t_j^* \in E} F(t_j) - F(t_{j-1}) = O(\varepsilon).$$

Next, we consider those $j$ for which $t_j^* \notin E$. By construction, for those $j$ we have
$$F(t_j) - F(t_j^*) = (t_j - t_j^*) F'(t_j^*) + O(\varepsilon |t_j - t_j^*|)$$
and
$$F(t_j^*) - F(t_{j-1}) = (t_j^* - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j^* - t_{j-1}|)$$
and thus
$$F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|).$$
On the other hand, from construction again we have
$$\int_{[t_{j-1},t_j]} F'(y)\,dy = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|)$$
and thus
$$F(t_j) - F(t_{j-1}) = \int_{[t_{j-1},t_j]} F'(y)\,dy + O(\varepsilon |t_j - t_{j-1}|).$$
Summing in $j$, we conclude that
$$\sum_{j : t_j^* \notin E} F(t_j) - F(t_{j-1}) = \int_S F'(y)\,dy + O(\varepsilon (b-a)),$$
where $S$ is the union of all the $[t_{j-1}, t_j]$ with $t_j^* \notin E$. By construction, this set is contained in $[a,b]$ and contains $[a,b] \backslash U$. Since $\int_U |F'(x)|\,dx \le \varepsilon$, we conclude that
$$\int_S F'(y)\,dy = \int_{[a,b]} F'(y)\,dy + O(\varepsilon).$$
Putting everything together, we conclude that
$$F(b) - F(a) = \int_{[a,b]} F'(y)\,dy + O(\varepsilon) + O(\varepsilon |b-a|).$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.

Combining this result with Exercise 1.6.48, we obtain a satisfactory classification of the absolutely continuous functions:

Exercise 1.6.50. Show that a function $F : [a,b] \to \mathbf{R}$ is absolutely continuous if and only if it takes the form $F(x) = \int_{[a,x]} f(y)\,dy + C$ for some absolutely integrable $f : [a,b] \to \mathbf{R}$ and a constant $C$.

Exercise 1.6.51 (Compatibility of the strong and weak derivatives in the absolutely continuous case). Let $F : [a,b] \to \mathbf{R}$ be an absolutely continuous function, and let $\varphi : [a,b] \to \mathbf{R}$ be a continuously differentiable function supported in a compact subset of $(a,b)$. Show that $\int_{[a,b]} F' \varphi(x)\,dx = - \int_{[a,b]} F \varphi'(x)\,dx$.

Inspecting the proof of Theorem 1.6.40, we see that the absolute continuity was used primarily in two ways: firstly, to ensure the almost everywhere existence of the derivative, and secondly, to control an exceptional null set $E$. It turns out that one can achieve the latter control by making a different hypothesis, namely that the function $F$ is everywhere differentiable rather than merely almost everywhere differentiable. More precisely, we have
Proposition 1.6.41 (Second fundamental theorem of calculus, again). Let $[a,b]$ be a compact interval of positive length, and let $F : [a,b] \to \mathbf{R}$ be a differentiable function such that $F'$ is absolutely integrable. Then the Lebesgue integral $\int_{[a,b]} F'(x)\,dx$ of $F'$ is equal to $F(b) - F(a)$.

Proof. This will be similar to the proof of Theorem 1.6.40, the one main new twist being that we need several open sets $U$ instead of just one. Let $E \subset [a,b]$ be the set of points $x$ which are not Lebesgue points of $F'$, together with the endpoints $a, b$. This is a null set. Let $\varepsilon > 0$, and then let $\kappa > 0$ be small enough that $\int_U |F'(x)|\,dx \le \varepsilon$ whenever $U$ is measurable with $m(U) \le \kappa$. We can also ensure that $\kappa \le \varepsilon$.

For every natural number $m = 1, 2, \ldots$, we can find an open set $U_m$ containing $E$ of measure $m(U_m) \le \kappa / 4^m$. In particular we see that $m(\bigcup_{m=1}^\infty U_m) \le \kappa$ and thus $\int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\,dx \le \varepsilon$.

Now define a gauge function $\delta : [a,b] \to (0,+\infty)$ as follows.

(i) If $x \in E$, we define $\delta(x) > 0$ to be small enough that the open interval $(x - \delta(x), x + \delta(x))$ lies in $U_m$, where $m$ is the first natural number such that $|F'(x)| \le 2^m$, and also small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon |y-x|$ holds whenever $|y-x| \le \delta(x)$. (Here we crucially use the everywhere differentiability to ensure that $F'(x)$ exists and is finite here.)
(ii) If $x \notin E$, we let $\delta(x) > 0$ be small enough that $|F(y) - F(x) - (y-x)F'(x)| \le \varepsilon |y-x|$ holds whenever $|y-x| \le \delta(x)$, and such that $|\frac{1}{|I|} \int_I F'(y)\,dy - F'(x)| \le \varepsilon$ whenever $I$ is an interval containing $x$ of length at most $\delta(x)$, exactly as in the proof of Theorem 1.6.40.

Applying Cousin's theorem, we can find a partition $a = t_0 < t_1 < \ldots < t_k = b$ with $k \ge 1$, together with real numbers $t_j^* \in [t_{j-1}, t_j]$ for each $1 \le j \le k$ and $t_j - t_{j-1} \le \delta(t_j^*)$.

As before, we express $F(b) - F(a)$ as a telescoping series
$$F(b) - F(a) = \sum_{j=1}^k F(t_j) - F(t_{j-1}).$$
For the contributions of those $j$ with $t_j^* \notin E$, we argue exactly as in the proof of Theorem 1.6.40 to conclude eventually that
$$\sum_{j : t_j^* \notin E} F(t_j) - F(t_{j-1}) = \int_S F'(y)\,dy + O(\varepsilon (b-a)),$$
where $S$ is the union of all the $[t_{j-1}, t_j]$ with $t_j^* \notin E$. Since
$$\Big| \int_{[a,b] \backslash S} F'(x)\,dx \Big| \le \int_{\bigcup_{m=1}^\infty U_m} |F'(x)|\,dx \le \varepsilon$$
we thus have
$$\int_S F'(y)\,dy = \int_{[a,b]} F'(y)\,dy + O(\varepsilon).$$

Now we turn to those $j$ with $t_j^* \in E$. By construction, we have
$$F(t_j) - F(t_{j-1}) = (t_j - t_{j-1}) F'(t_j^*) + O(\varepsilon |t_j - t_{j-1}|)$$
for these intervals, and so
$$\sum_{j : t_j^* \in E} F(t_j) - F(t_{j-1}) = \Big( \sum_{j : t_j^* \in E} (t_j - t_{j-1}) F'(t_j^*) \Big) + O(\varepsilon (b-a)).$$
Next, for each $j$ we have $|F'(t_j^*)| \le 2^m$ and $[t_{j-1}, t_j] \subset U_m$ for some natural number $m = 1, 2, \ldots$, by construction. By countable additivity, we conclude that
$$\Big| \sum_{j : t_j^* \in E} (t_j - t_{j-1}) F'(t_j^*) \Big| \le \sum_{m=1}^\infty 2^m m(U_m) \le \sum_{m=1}^\infty 2^m \varepsilon / 4^m = O(\varepsilon).$$
Putting all this together, we again have
$$F(b) - F(a) = \int_{[a,b]} F'(y)\,dy + O(\varepsilon) + O(\varepsilon |b-a|).$$
Since $\varepsilon > 0$ was arbitrary, the claim follows.

Remark 1.6.42. The above proposition is yet another illustration of how the property of everywhere differentiability is significantly better than that of almost everywhere differentiability. In practice, though, the above proposition is not as useful as one might initially think, because there are very few methods that establish the everywhere differentiability of a function that do not also establish continuous differentiability (or at least Riemann integrability of the derivative), at which point one could just use Theorem 1.6.7 instead.
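The proofs of Theorem 1.6.40 and Proposition 1.6.41 both rest on a partition that is fine with respect to a gauge $\delta$, followed by a telescoping sum. The following is a minimal sketch of that mechanism for one concrete, assumed example: $F(x) = x^3$ on $[0,2]$, with the hypothetical gauge $\delta(x) = \varepsilon / (3|x| + 1)$, chosen because $F(y) - F(x) - (y-x)F'(x) = (y-x)^2 (y+2x)$ is then at most $\varepsilon |y-x|$ in magnitude whenever $|y-x| \le \delta(x)$ and $\varepsilon \le 1$. The bisection search below tries only the endpoints and midpoint as tags; it is an illustration and is not guaranteed to terminate for an arbitrary gauge, unlike the existence statement in Cousin's theorem.

```python
def fine_partition(a, b, gauge, max_depth=60):
    """Search by bisection for a tagged partition of [a, b] that is fine for `gauge`.

    Returns a list of triples (u, v, tag) with tag in [u, v] and v - u <= gauge(tag).
    Only u, the midpoint, and v are tried as tags, so the search may fail
    for gauges that need other tags; in that case a RuntimeError is raised.
    """
    def split(u, v, depth):
        mid = (u + v) / 2
        for tag in (u, mid, v):
            if v - u <= gauge(tag):
                return [(u, v, tag)]
        if depth == 0:
            raise RuntimeError("no fine partition found at this depth")
        return split(u, mid, depth - 1) + split(mid, v, depth - 1)

    return split(a, b, max_depth)


EPS = 0.1  # the epsilon of the proof; assumed <= 1 for the gauge below


def F(x):
    return x ** 3


def dF(x):
    return 3 * x ** 2


def gauge(x):
    # Hypothetical gauge for F(x) = x**3; see the lead-in for why it works.
    return EPS / (3 * abs(x) + 1)


def tagged_sum(pieces):
    """Sum of F'(tag) * (v - u) over the tagged partition."""
    return sum(dF(tag) * (v - u) for u, v, tag in pieces)


if __name__ == "__main__":
    pieces = fine_partition(0.0, 2.0, gauge)
    print(len(pieces), tagged_sum(pieces), F(2.0) - F(0.0))
```

For this example the tagged sum differs from $F(b) - F(a)$ by at most $\varepsilon (b-a)$, which is the same error term $O(\varepsilon |b-a|)$ that appears at the end of both proofs.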
Exercise 1.6.52. Let $F : [-1,1] \to \mathbf{R}$ be the function defined by setting $F(x) := x^2 \sin(\frac{1}{x^3})$ when $x$ is non-zero, and $F(0) := 0$. Show that $F$ is everywhere differentiable, but the derivative $F'$ is not absolutely integrable, and so the second fundamental theorem of calculus does not apply in this case (at least if we interpret $\int_{[a,b]} F'(x)\,dx$ using the absolutely convergent Lebesgue integral). See however the next exercise.

Exercise 1.6.53 (Henstock-Kurzweil integral). Let $[a,b]$ be a compact interval of positive length. We say that a function $f : [a,b] \to \mathbf{R}$ is Henstock-Kurzweil integrable with integral $L \in \mathbf{R}$ if for every $\varepsilon > 0$ there exists a gauge function $\delta : [a,b] \to (0,+\infty)$ such that one has
$$\Big| \sum_{j=1}^k f(t_j^*)(t_j - t_{j-1}) - L \Big| \le \varepsilon$$
whenever $k \ge 1$ and $a = t_0 < t_1 < \ldots < t_k = b$ and $t_1^*, \ldots, t_k^*$ are such that $t_j^* \in [t_{j-1}, t_j]$ and $|t_j - t_{j-1}| \le \delta(t_j^*)$ for every $1 \le j \le k$. When this occurs, we call $L$ the Henstock-Kurzweil integral of $f$ and write it as $\int_{[a,b]} f(x)\,dx$.

(i) Show that if a function is Henstock-Kurzweil integrable, it has a unique Henstock-Kurzweil integral. (Hint: use Cousin's theorem.)
(ii) Show that if a function is Riemann integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} f(x)\,dx$ is equal to the Riemann integral $\int_a^b f(x)\,dx$.
(iii) Show that if a function $f : [a,b] \to \mathbf{R}$ is everywhere defined, everywhere finite, and is absolutely integrable, then it is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} f(x)\,dx$ is equal to the Lebesgue integral $\int_{[a,b]} f(x)\,dx$. (Hint: this is a variant of the proof of Theorem 1.6.40 or Proposition 1.6.41.)
(iv) Show that if $F : [a,b] \to \mathbf{R}$ is everywhere differentiable, then $F'$ is Henstock-Kurzweil integrable, and the Henstock-Kurzweil integral $\int_{[a,b]} F'(x)\,dx$ is equal to $F(b) - F(a)$. (Hint: this is a variant of the proof of Theorem 1.6.40 or Proposition 1.6.41.)
Remark 1. which in turn was constructed from the even more primitive concept of elementary measure. which is willing to tolerate functions being inﬁnite or undeﬁned so long as this only occurs on a null set). much as conditional summation is not always wellbehaved with respect to rearrangement.7. premeasures. In this section. due to its reliance on the order structure of the real line R. 1. we will give the Carath´odory extension theorem.6.6. and product measures In this text so far. also.1. It turns out that both of these constructions can be abstracted. the HenstockKurzweil integral does not always react well to changes of variable.43. product measure 179 (v) Explain why the above results give an alternate proof of Exercise 1.52. we have focused primarily on one speciﬁc example of a countably additive measure. such as the Euclidean space Rd . which e constructs a countably additive measure from any abstract outer . or to abstract measure spaces. As the above exercise indicates.6.7. The HenstockKurzweil integral can also integrate some (highly oscillatory) functions that the Lebesgue integral cannot. This measure was constructed from a more primitive concept of Lebesgue outer measure. Outer measure. at least as long as one restricts attention to functions that are deﬁned and are ﬁnite everywhere (in contrast to the Lebesgue integral. premeasure. as seen in part 4 of the above exercise. namely Lebesgue measure.4 and of Proposition 1. This is N analogous to how conditional summation limN →∞ n=1 an can sum ∞ conditionally convergent series n=1 an . the HenstockKurzweil integral (also known as the Denjoy integral or Perron integral ) extends the Riemann integral and the absolutely convergent Lebesgue integral. it can also be used as a uniﬁed framework for all the proofs in this section that invoked Cousin’s theorem. However. Outer measures. it is diﬃcult to extend the HenstockKurzweil integral to more general spaces. 
It is the notion of integration that is most naturally associated with the fundamental theorem of calculus for everywhere diﬀerentiable functions.6. even if they are not absolutely integrable.41. such as the derivative F of the function F appearing in Exercise 1.
The most important result about product measure. as it allows one to set up probability spaces associated to both discrete and continuous random processes. Vol. E2 . Outer measures and the Carath´odory extension thee orem. This fact is known as the FubiniTonelli theorem. Outer measures are also known as exterior measures. and is of particular importance in the foundations of probability theory. One can in turn construct outer measures from another concept known as a premeasure. then µ∗ (E) ≤ µ∗ (F ). 1. +∞] to every set E ⊂ X which obeys the following axioms: (i) (Empty set) µ∗ (∅) = 0. this generalises the construction of Lebesgue measure from Lebesgue outer measure. and is an absolutely indispensable tool for computing integrals.7. is that one can use it to evaluate iterated integrals. which is discussed in §1. We begin with the abstract concept of an outer measure.7. one can also establish the Kolmogorov extension theorem. product measures. and for deducing higherdimensional results from lowerdimensional ones. (iii) (Countable subadditivity) If E1 . . In this section we will however omit a very important way to construct measures. ⊂ X is a countable ∞ ∞ sequence of subsets of X. and Hausdorﬀ measures. and to interchange their order. of which elementary measure is a typical example. one can start constructing many more measures. With a little more eﬀort. provided that the integrand is either unsigned or absolutely integrable. . With these tools. Let X be a set. I.1 (Abstract outer measure). beyond the fact that it exists. An abstract outer measure (or outer measure for short) is a map µ∗ : 2X → [0.180 1. then µ∗ ( n=1 En ) ≤ n=1 µ∗ (En ).1. . . which allows one to construct a variety of measures on inﬁnitedimensional spaces. (ii) (Monotonicity) If E ⊂ F . namely the Riesz representation theorem. Deﬁnition 1. Measure theory measure. even if they have inﬁnite length. such as LebesgueStieltjes measures.10 of An epsilon of room. 
+∞] that assigns an unsigned extended real number µ∗ (E) ∈ [0.
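On a finite set $X$ the three axioms of Definition 1.7.1 can be checked by brute force, since a countable sequence of subsets then contains only finitely many distinct sets and countable subadditivity reduces to subadditivity for pairs. The following is a minimal sketch under that assumption; the helper names and the two toy set functions are hypothetical illustrations and do not come from the text.

```python
from itertools import combinations


def powerset(universe):
    """Return all subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_outer_measure(mu, universe):
    """Check the axioms of an outer measure on a finite universe.

    On a finite set, countable subadditivity follows from pairwise
    subadditivity by induction, so only pairs are tested here.
    """
    subsets = powerset(universe)
    if mu(frozenset()) != 0:
        return False  # empty set axiom fails
    for e in subsets:
        for f in subsets:
            if e <= f and mu(e) > mu(f):
                return False  # monotonicity fails
            if mu(e | f) > mu(e) + mu(f):
                return False  # subadditivity fails
    return True


def indicator_of_nonempty(e):
    """Toy set function: 0 on the empty set, 1 on every non-empty set."""
    return 0 if not e else 1


def squared_cardinality(e):
    """Toy set function that is monotone but not subadditive."""
    return len(e) ** 2


if __name__ == "__main__":
    X = {1, 2, 3}
    print(is_outer_measure(indicator_of_nonempty, X))
    print(is_outer_measure(squared_cardinality, X))
```

The first toy function satisfies all three axioms yet is not additive on disjoint sets (it assigns $1$ to $\{1\}$, to $\{2\}$, and to $\{1,2\}$), which illustrates in a small case how an outer measure can fall short of being a measure.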
Thus, for instance, Lebesgue outer measure $m^*$ is an outer measure (see Exercise 1.2.3). On the other hand, Jordan outer measure $m^{*,(J)}$ is only finitely subadditive rather than countably subadditive and thus is not, strictly speaking, an outer measure; for this reason this concept is often referred to as Jordan outer content rather than Jordan outer measure.

Note that outer measures are weaker than measures in that they are merely countably subadditive, rather than countably additive. On the other hand, they are able to measure all subsets of $X$, whereas measures can only measure a $\sigma$-algebra of measurable sets.

In Definition 1.2.2, we used Lebesgue outer measure together with the notion of an open set to define the concept of Lebesgue measurability. This definition is not available in our more abstract setting, as we do not necessarily have the notion of an open set. An alternative definition of measurability was put forth in Exercise 1.2.17, but this still required the notion of a box or an elementary set, which is still not available in this setting. Nevertheless, we can modify that definition to give an abstract definition of measurability:

Definition 1.7.2 (Carathéodory measurability). Let $\mu^*$ be an outer measure on a set $X$. A set $E \subset X$ is said to be Carathéodory measurable with respect to $\mu^*$ if one has
$$\mu^*(A) = \mu^*(A \cap E) + \mu^*(A \backslash E)$$
for every set $A \subset X$.

Exercise 1.7.1 (Null sets are Carathéodory measurable). Suppose that $E$ is a null set for an outer measure $\mu^*$ (i.e. $\mu^*(E) = 0$). Show that $E$ is Carathéodory measurable with respect to $\mu^*$.

Exercise 1.7.2 (Compatibility with Lebesgue measurability). Show that a set $E \subset \mathbf{R}^d$ is Carathéodory measurable with respect to Lebesgue outer measure if and only if it is Lebesgue measurable. (Hint: one direction follows from Exercise 1.2.17. For the other direction, first verify simple cases, such as when $E$ is a box, or when $E$ or $A$ are bounded.)

The construction of Lebesgue measure can then be abstracted as follows:
and so the arguments still work when some sets have inﬁnite outer measure. (Note that no subtraction is employed here. On the other hand. A11 := A ∩ E ∩ F (the reader may wish to draw a Venn diagram here to understand the nature of these sets). +∞] be the restriction of µ∗ to B (thus µ(E) := µ