
MEASURE THEORY

Volume 2
D.H.Fremlin
By the same author:
Topological Riesz Spaces and Measure Theory, Cambridge University Press, 1974.
Consequences of Martin's Axiom, Cambridge University Press, 1982.
Companions to the present volume:
Measure Theory, vol. 1, Torres Fremlin, 2000;
Measure Theory, vol. 3, Torres Fremlin, 2002;
Measure Theory, vol. 4, Torres Fremlin, 2003;
Measure Theory, vol. 5, Torres Fremlin, 2008.
First edition May 2001
Second edition January 2010
MEASURE THEORY
Volume 2
Broad Foundations
D.H.Fremlin
Research Professor in Mathematics, University of Essex
Dedicated by the Author
to the Publisher
This book may be ordered from the printers, http://www.lulu.com/buy.
First published in 2001
by Torres Fremlin, 25 Ireton Road, Colchester CO3 3AT, England
© D.H.Fremlin 2001
The right of D.H.Fremlin to be identified as author of this work has been asserted in accordance with the Copyright,
Designs and Patents Act 1988. This work is issued under the terms of the Design Science License as published in
http://www.gnu.org/licenses/dsl.html. For the source files see
http://www.essex.ac.uk/maths/staff/fremlin/mt2.2010/index.htm.
Library of Congress classification QA312.F72
AMS 2010 classification 28-01
ISBN 978-0-9538129-7-4
Typeset by AMS-TeX
Printed by Lulu.com
Contents
General Introduction
Introduction to Volume 2
*Chapter 21: Taxonomy of measure spaces
Introduction
211 Definitions
Complete, totally finite, σ-finite, strictly localizable, semi-finite, localizable, locally determined measure spaces; atoms; elementary relationships; countable-cocountable measures.
212 Complete spaces
Measurable and integrable functions on complete spaces; completion of a measure.
213 Semi-finite, locally determined and localizable spaces
Integration on semi-finite spaces; c.l.d. versions; measurable envelopes; characterizing localizability, strict localizability, σ-finiteness.
214 Subspaces
Subspace measures on arbitrary subsets; integration; direct sums of measure spaces; *extending measures to well-ordered families of sets.
215 σ-finite spaces and the principle of exhaustion
The principle of exhaustion; characterizations of σ-finiteness; the intermediate value theorem for atomless measures.
*216 Examples
A complete localizable non-locally-determined space; a complete locally determined non-localizable space; a complete locally determined localizable space which is not strictly localizable.
Chapter 22: The Fundamental Theorem of Calculus
Introduction
221 Vitali's theorem in $\mathbb{R}$
Vitali's theorem for intervals in $\mathbb{R}$.
222 Differentiating an indefinite integral
Monotonic functions are differentiable a.e., and their derivatives are integrable; $\frac{d}{dx}\int_a^x f = f$ a.e.; *the Denjoy-Young-Saks theorem.
223 Lebesgue's density theorems
$f(x)=\lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h}f$ a.e. ($x$); density points; $\lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h}|f-f(x)|=0$ a.e. ($x$); the Lebesgue set of a function.
224 Functions of bounded variation
Variation of a function; differences of monotonic functions; sums and products, limits, continuity and differentiability for b.v. functions; an inequality for $\int f\times g$.
225 Absolutely continuous functions
Absolute continuity of indefinite integrals; absolutely continuous functions on $\mathbb{R}$; integration by parts; lower semi-continuous functions; *direct images of negligible sets; the Cantor function.
*226 The Lebesgue decomposition of a function of bounded variation
Sums over arbitrary index sets; saltus functions; the Lebesgue decomposition.
Chapter 23: The Radon-Nikodým theorem
Introduction
231 Countably additive functionals
Additive and countably additive functionals; Jordan and Hahn decompositions.
232 The Radon-Nikodým theorem
Absolutely and truly continuous additive functionals; truly continuous functionals are indefinite integrals; *the Lebesgue decomposition of a countably additive functional.
233 Conditional expectations
σ-subalgebras; conditional expectations of integrable functions; convex functions; Jensen's inequality.
234 Operations on measures
Inverse-measure-preserving functions; image measures; sums of measures; indefinite-integral measures; ordering of measures.
235 Measurable transformations
The formula $\int g(y)\,\nu(dy)=\int J(x)g(\phi(x))\,\mu(dx)$; detailed conditions of applicability; inverse-measure-preserving functions; the image measure catastrophe; using the Radon-Nikodým theorem.
Chapter 24: Function spaces
Introduction
241 $\mathcal{L}^0$ and $L^0$
The linear, order and multiplicative structure of $L^0$; Dedekind completeness and localizability; action of Borel functions.
242 $L^1$
The normed lattice $L^1$; integration as a linear functional; completeness and Dedekind completeness; the Radon-Nikodým theorem and conditional expectations; convex functions; dense subspaces.
243 $L^\infty$
The normed lattice $L^\infty$; norm-completeness; the duality between $L^1$ and $L^\infty$; localizability, Dedekind completeness and the identification $L^\infty\cong(L^1)^*$.
244 $L^p$
The normed lattices $L^p$, for $1<p<\infty$; Hölder's inequality; completeness and Dedekind completeness; $(L^p)^*\cong L^q$; conditional expectations; *uniform convexity.
245 Convergence in measure
The topology of (local) convergence in measure on $L^0$; pointwise convergence; localizability and Dedekind completeness; embedding $L^p$ in $L^0$; $\|\ \|_1$-convergence and convergence in measure; σ-finite spaces, metrizability and sequential convergence.
246 Uniform integrability
Uniformly integrable sets in $\mathcal{L}^1$ and $L^1$; elementary properties; disjoint-sequence characterizations; $\|\ \|_1$ and convergence in measure on uniformly integrable sets.
247 Weak compactness in $L^1$
A subset of $L^1$ is uniformly integrable iff it is relatively weakly compact.
Chapter 25: Product measures
Introduction
251 Finite products
Primitive and c.l.d. products; basic properties; Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure; products of direct sums and subspaces; c.l.d. versions.
252 Fubini's theorem
When $\iint f(x,y)\,dx\,dy$ and $\int f(x,y)\,d(x,y)$ are equal; measures of ordinate sets; *the volume of a ball in $\mathbb{R}^r$.
253 Tensor products
Bilinear operators; bilinear operators $L^1(\mu)\times L^1(\nu)\to W$ and linear operators $L^1(\mu\times\nu)\to W$; positive bilinear operators and the ordering of $L^1(\mu\times\nu)$; conditional expectations; upper integrals.
254 Infinite products
Products of arbitrary families of probability spaces; basic properties; inverse-measure-preserving functions; usual measure on $\{0,1\}^I$; $\{0,1\}^{\mathbb{N}}$ isomorphic, as measure space, to $[0,1]$; subspaces of full outer measure; sets determined by coordinates in a subset of the index set; generalized associative law for products of measures; subproducts as image measures; factoring functions through subproducts; conditional expectations on subalgebras corresponding to subproducts.
255 Convolutions of functions
Shifts in $\mathbb{R}^2$ as measure space automorphisms; convolutions of functions on $\mathbb{R}$; $\int h\times(f*g)=\int h(x+y)f(x)g(y)\,d(x,y)$; $f*(g*h)=(f*g)*h$; $\|f*g\|_1\le\|f\|_1\|g\|_1$; the groups $\mathbb{R}^r$ and $]-\pi,\pi]$.
256 Radon measures on $\mathbb{R}^r$
Definition of Radon measures on $\mathbb{R}^r$; completions of Borel measures; Lusin measurability; image measures; products of two Radon measures; semi-continuous functions.
257 Convolutions of measures
Convolution of totally finite Radon measures on $\mathbb{R}^r$; $\int h\,d(\nu_1*\nu_2)=\iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)$; $\nu_1*(\nu_2*\nu_3)=(\nu_1*\nu_2)*\nu_3$.
Chapter 26: Change of variable in the integral
Introduction
261 Vitali's theorem in $\mathbb{R}^r$
Vitali's theorem for balls in $\mathbb{R}^r$; Lebesgue's Density Theorem.
262 Lipschitz and differentiable functions
Lipschitz functions; elementary properties; differentiable functions from $\mathbb{R}^r$ to $\mathbb{R}^s$; differentiability and partial derivatives; approximating a differentiable function by piecewise affine functions; *Rademacher's theorem.
263 Differentiable transformations in $\mathbb{R}^r$
In the formula $\int g(y)\,dy=\int J(x)g(\phi(x))\,dx$, find $J$ when $\phi$ is (i) linear (ii) differentiable; detailed conditions of applicability; polar coordinates; the one-dimensional case.
264 Hausdorff measures
$r$-dimensional Hausdorff measure on $\mathbb{R}^s$; Borel sets are measurable; Lipschitz functions; if $s=r$, we have a multiple of Lebesgue measure; *Cantor measure as a Hausdorff measure.
265 Surface measures
Normalized Hausdorff measure; action of linear operators and differentiable functions; surface measure on a sphere.
*266 The Brunn-Minkowski inequality
Arithmetic and geometric means; essential closures; the Brunn-Minkowski inequality.
Chapter 27: Probability theory
Introduction
271 Distributions
Terminology; distributions as Radon measures; distribution functions; densities; transformations of random variables; *distribution functions and convergence in measure.
272 Independence
Independent families of random variables; characterizations of independence; joint distributions of (finite) independent families, and product measures; the zero-one law; $\mathbb{E}(X\times Y)$, $\operatorname{Var}(X+Y)$; distribution of a sum as convolution of distributions; Etemadi's inequality; *Hoeffding's inequality.
273 The strong law of large numbers
$\frac{1}{n+1}\sum_{i=0}^{n}X_i\to 0$ a.e. if the $X_n$ are independent with zero expectation and either (i) $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\operatorname{Var}(X_n)<\infty$ or (ii) $\sum_{n=0}^{\infty}\mathbb{E}(|X_n|^{1+\delta})<\infty$ for some $\delta>0$ or (iii) the $X_n$ are identically distributed.
274 The Central Limit Theorem
Normally distributed r.v.s; Lindeberg's conditions for the Central Limit Theorem; corollaries; estimating $\int_\alpha^\infty e^{-x^2/2}\,dx$.
275 Martingales
Sequences of σ-algebras, and martingales adapted to them; up-crossings; Doob's Martingale Convergence Theorem; uniform integrability, $\|\ \|_1$-convergence and martingales as sequences of conditional expectations; reverse martingales; stopping times.
276 Martingale difference sequences
Martingale difference sequences; strong law of large numbers for m.d.ss.; Komlós's theorem.
Chapter 28: Fourier analysis
Introduction
281 The Stone-Weierstrass theorem
Approximating a function on a compact set by members of a given lattice or algebra of functions; real and complex cases; approximation by polynomials and trigonometric functions; Weyl's Equidistribution Theorem in $[0,1]^r$.
282 Fourier series
Fourier and Fejér sums; Dirichlet and Fejér kernels; Riemann-Lebesgue lemma; uniform convergence of Fejér sums of a continuous function; a.e. and $\|\ \|_1$-convergence of Fejér sums of an integrable function; $\|\ \|_2$-convergence of Fourier sums of a square-integrable function; convergence of Fourier sums of a differentiable or b.v. function; convolutions and Fourier coefficients.
283 Fourier transforms I
Fourier and inverse Fourier transforms; elementary properties; $\int_0^\infty\frac{1}{x}\sin x\,dx=\frac{1}{2}\pi$; the formula $(\hat f)^{\vee}=f$ for differentiable and b.v. $f$; convolutions; $e^{-x^2/2}$; $\int f\times\hat g=\int\hat f\times g$.
284 Fourier transforms II
Test functions; $(\hat h)^{\vee}=h$; tempered functions; tempered functions which represent each other's transforms; convolutions; square-integrable functions; Dirac's delta function.
285 Characteristic functions
The characteristic function of a distribution; independent r.v.s; the normal distribution; the vague topology on the space of distributions, and sequential convergence of characteristic functions; Poisson's theorem.
*286 Carleson's theorem
The Hardy-Littlewood Maximal Theorem; the Lacey-Thiele proof of Carleson's theorem.
Appendix to Volume 2
Introduction
2A1 Set theory
Ordered sets; transfinite recursion; ordinals; initial ordinals; Schröder-Bernstein theorem; filters; Axiom of Choice; Zermelo's Well-Ordering Theorem; Zorn's Lemma; ultrafilters.
2A2 The topology of Euclidean space
Closures; continuous functions; compact sets; open sets in $\mathbb{R}$.
2A3 General topology
Topologies; continuous functions; subspace topologies; closures and interiors; Hausdorff topologies; pseudometrics; convergence of sequences; compact spaces; cluster points of sequences; convergence of filters; lim sup and lim inf; product topologies; dense subsets.
2A4 Normed spaces
Normed spaces; linear subspaces; Banach spaces; bounded linear operators; dual spaces; extending a linear operator from a dense subspace; normed algebras.
2A5 Linear topological spaces
Linear topological spaces; topologies defined by functionals; convex sets; completeness; weak topologies.
2A6 Factorization of matrices
Determinants; orthonormal families; $T=PDQ$ where $D$ is diagonal and $P$, $Q$ are orthogonal.
Concordance
References for Volume 2
Index to Volumes 1 and 2
Principal topics and results
General index
General introduction In this treatise I aim to give a comprehensive description of modern abstract measure theory,
with some indication of its principal applications. The first two volumes are set at an introductory level; they are
intended for students with a solid grounding in the concepts of real analysis, but possibly with rather limited detailed
knowledge. As the book proceeds, the level of sophistication and expertise demanded will increase; thus for the volume
on topological measure spaces, familiarity with general topology will be assumed. The emphasis throughout is on the
mathematical ideas involved, which in this subject are mostly to be found in the details of the proofs.
My intention is that the book should be usable both as a first introduction to the subject and as a reference work.
For the sake of the first aim, I try to limit the ideas of the early volumes to those which are really essential to the
development of the basic theorems. For the sake of the second aim, I try to express these ideas in their full natural
generality, and in particular I take care to avoid suggesting any unnecessary restrictions in their applicability. Of course
these principles are to some extent contradictory. Nevertheless, I find that most of the time they are very nearly
reconcilable, provided that I indulge in a certain degree of repetition. For instance, right at the beginning, the puzzle
arises: should one develop Lebesgue measure first on the real line, and then in spaces of higher dimension, or should
one go straight to the multidimensional case? I believe that there is no single correct answer to this question. Most
students will find the one-dimensional case easier, and it therefore seems more appropriate for a first introduction, since
even in that case the technical problems can be daunting. But certainly every student of measure theory must at a
fairly early stage come to terms with Lebesgue area and volume as well as length; and with the correct formulations,
the multidimensional case differs from the one-dimensional case only in a definition and a (substantial) lemma. So
what I have done is to write them both out (§§114-115). In the same spirit, I have been uninhibited, when setting
out exercises, by the fact that many of the results I invite students to look for will appear in later chapters; I believe
that throughout mathematics one has a better chance of understanding a theorem if one has previously attempted
something similar alone.
The plan of the work is as follows:
Volume 1: The Irreducible Minimum
Volume 2: Broad Foundations
Volume 3: Measure Algebras
Volume 4: Topological Measure Spaces
Volume 5: Set-theoretic Measure Theory.
Volume 1 is intended for those with no prior knowledge of measure theory, but competent in the elementary techniques
of real analysis. I hope that it will be found useful by undergraduates meeting Lebesgue measure for the first time.
Volume 2 aims to lay out some of the fundamental results of pure measure theory (the Radon-Nikodým theorem,
Fubini's theorem), but also gives short introductions to some of the most important applications of measure theory
(probability theory, Fourier analysis). While I should like to believe that most of it is written at a level accessible
to anyone who has mastered the contents of Volume 1, I should not myself have the courage to try to cover it in an
undergraduate course, though I would certainly attempt to include some parts of it. Volumes 3 and 4 are set at a
rather higher level, suitable to postgraduate courses; while Volume 5 assumes a wide-ranging competence over large
parts of analysis and set theory.
There is a disclaimer which I ought to make in a place where you might see it in time to avoid paying for this book.
I make no attempt to describe the history of the subject. This is not because I think the history uninteresting or
unimportant; rather, it is because I have no confidence of saying anything which would not be seriously misleading.
Indeed I have very little confidence in anything I have ever read concerning the history of ideas. So while I am happy to
honour the names of Lebesgue and Kolmogorov and Maharam in more or less appropriate places, and I try to include
in the bibliographies the works which I have myself consulted, I leave any consideration of the details to those bolder
and better qualified than myself.
For the time being, at least, printing will be in short runs. I hope that readers will be energetic in commenting on
errors and omissions, since it should be possible to correct these relatively promptly. An inevitable consequence of this
is that paragraph references may go out of date rather quickly. I shall be most flattered if anyone chooses to rely on
this book as a source for basic material; and I am willing to attempt to maintain a concordance to such references,
indicating where migratory results have come to rest for the moment, if authors will supply me with copies of papers
which use them. In the concordance to the present volume you will find notes on the items which have been referred
to in other published volumes of this work.
I mention some minor points concerning the layout of the material. Most sections conclude with lists of basic
exercises and further exercises, which I hope will be generally instructive and occasionally entertaining. How many
of these you should attempt must be for you and your teacher, if any, to decide, as no two students will have quite
the same needs. I mark with a >>> those which seem to me to be particularly important. But while you may not need
to write out solutions to all the basic exercises, if you are in any doubt as to your capacity to do so you should take
this as a warning to slow down a bit. The further exercises are unbounded in difficulty, and are unified only by a
presumption that each has at least one solution based on ideas already introduced. Occasionally I add a final problem,
a question to which I do not know the answer and which seems to arise naturally in the course of the work.
The impulse to write this book is in large part a desire to present a unified account of the subject. Cross-references
are correspondingly abundant and wide-ranging. In order to be able to refer freely across the whole text, I have chosen
a reference system which gives the same code name to a paragraph wherever it is being called from. Thus 132E is the
fifth paragraph in the second section of the third chapter of Volume 1, and is referred to by that name throughout. Let
me emphasize that cross-references are supposed to help the reader, not distract her. Do not take the interpolation
(121A) as an instruction, or even a recommendation, to lift Volume 1 off the shelf and hunt for §121. If you are happy
with an argument as it stands, independently of the reference, then carry on. If, however, I seem to have made rather
a large jump, or the notation has suddenly become opaque, local cross-references may help you to fill in the gaps.
Each volume will have an appendix of useful facts, in which I set out material which is called on somewhere in that
volume, and which I do not feel I can take for granted. Typically the arrangement of material in these appendices is
directed very narrowly at the particular applications I have in mind, and is unlikely to be a satisfactory substitute for
conventional treatments of the topics touched on. Moreover, the ideas may well be needed only on rare and isolated
occasions. So as a rule I recommend you to ignore the appendices until you have some direct reason to suppose that a
fragment may be useful to you.
During the extended gestation of this project I have been helped by many people, and I hope that my friends and
colleagues will be pleased when they recognise their ideas scattered through the pages below. But I am especially
grateful to those who have taken the trouble to read through earlier drafts and comment on obscurities and errors.
Introduction to Volume 2
For this second volume I have chosen seven topics through which to explore the insights and challenges offered by
measure theory. Some, like the Radon-Nikodým theorem (Chapter 23) are necessary for any understanding of the
structure of the subject; others, like Fourier analysis (Chapter 28) and the discussion of function spaces (Chapter 24)
demonstrate the power of measure theory to attack problems in general real and functional analysis. But all have
applications outside measure theory, and all have influenced its development. These are the parts of measure theory
which any analyst may find himself using.
Every topic is one which ideally one would wish undergraduates to have seen, but the length of this volume makes
it plain that no ordinary undergraduate course could include very much of it. It is directed rather at graduate level,
where I hope it will be found adequate to support all but the most ambitious courses in measure theory, though it is
perhaps a bit too solid to be suitable for direct use as a course text, except with careful selection of the parts to be
covered. If you are using it to teach yourself measure theory, I strongly recommend an eclectic approach, looking for
particular subjects and theorems that seem startling or useful, and working backwards from them. My other objective,
of course, is to provide an account of the central ideas at this level in measure theory, rather fuller than can easily be
found in one volume elsewhere. I cannot claim that it is definitive, but I do think I cover a good deal of ground in ways
that provide a firm foundation for further study. As in Volume 1, I usually do not shrink from giving best results, like
Lindeberg's conditions for the Central Limit Theorem (§274), or the theory of products of arbitrary measure spaces
(§251). If I were teaching this material to students in a PhD programme I would rather accept a limitation in the
breadth of the course than leave them unaware of what could be done in the areas discussed.
The topics interact in complex ways; one of the purposes of this book is to exhibit their relationships. There is no
canonical linear ordering in which they should be taken. Nor do I think organization charts are very helpful, not least
because it may be only two or three paragraphs in a section which are needed for a given chapter later on. I do at least
try to lay the material of each section out in an order which makes initial segments useful by themselves. But the order
in which to take the chapters is to a considerable extent for you to choose, perhaps after a glance at their individual
introductions. I have done my best to pitch the exposition at much the same level throughout the volume, sometimes
allowing gradients to steepen in the course of a chapter or a section, but always trying to return to something which
anyone who has mastered Volume 1 ought to be able to cope with. (Though perhaps the main theorems of Chapter 26
are harder work than the principal results elsewhere, and §286 is only for the most determined.)
I said there were seven topics, and you will see eight chapters ahead of you. This is because Chapter 21 is rather
different from the rest. It is the purest of pure measure theory, and is here only because there are places later in
the volume where (in my view) the theorems make sense only in the light of some abstract concepts which are not
particularly difficult, but are also not obvious. However it is fair to say that the most important ideas of this volume
do not really depend on the work of Chapter 21.
As always, it is a puzzle to know how much prior knowledge to assume in this volume. I do of course call on the
results of Volume 1 of this treatise whenever they seem to be relevant. I do not doubt, however, that there will be
readers who have learnt the elementary theory from other sources. Provided you can, from first principles, construct
Lebesgue measure and prove the basic convergence theorems for integrals on arbitrary measure spaces, you ought to be
able to embark on the present volume. Perhaps it would be helpful to have in hand the results-only version of Volume
1, since that includes the most important definitions as well as statements of the theorems.
There is also the question of how much material from outside measure theory is needed. Chapter 21 calls for some
non-trivial set theory (given in §2A1), but the more advanced ideas are needed only for the counter-examples in §216,
and can be passed over to begin with. The problems become acute in Chapter 24. Here we need a variety of results
from functional analysis, some of them depending on non-trivial ideas in general topology. For a full understanding of
this material there is no substitute for a course in normed spaces up to and including a study of weak compactness.
But I do not like to insist on such a preparation, because it is likely to be simultaneously too much and too little.
Too much, because I hardly mention linear operators at this stage; too little, because I do ask for some of the theory
of non-locally-convex spaces, which is often omitted in first courses on functional analysis. At the risk, therefore, of
wasting paper, I have written out condensed accounts of the essential facts (§§2A3-2A5).
Note on second printing, April 2003
For the second printing of this volume, I have made two substantial corrections to inadequate proofs and a large
number of minor amendments; I am most grateful to T.D.Austin for his careful reading of the first printing. In addition,
I have added a dozen exercises and a handful of straightforward results which turn out to be relevant to the work of
later volumes and fit naturally here.
The regular process of revision of this work has led me to make a couple of notational innovations not described
explicitly in the early printings of Volume 1. I trust that most readers will find these immediately comprehensible.
If, however, you find that there is a puzzling cross-reference which you are unable to match with
anything in the version of Volume 1 which you are using, it may be worth while checking the errata pages in
http://www.essex.ac.uk/maths/staff/fremlin/mterr.htm.
Note on second edition, January 2010
For the new (Lulu) edition of this volume, I have eliminated a number of further errors; no doubt many remain.
There are many new exercises, several new theorems and some corresponding rearrangements of material. The new
results are mostly additions with little effect on the structure of the work, but there is a short new section (§266) on
the Brunn-Minkowski inequality.
*Chapter 21
Taxonomy of measure spaces
I begin this volume with a starred chapter. The point is that I do not really recommend this chapter for beginners.
It deals with a variety of technical questions which are of great importance for the later development of the subject,
but are likely to be both abstract and obscure for anyone who has not encountered the problems these techniques are
designed to solve. On the other hand, if (as is customary) this work is omitted, and the ideas are introduced only when
urgently needed, the student is likely to finish with very vague ideas on which theorems can be expected to apply to
which types of measure space, and with no vocabulary in which to express those ideas. I therefore take a few pages to
introduce the terminology and concepts which can be used to distinguish good measure spaces from others, with a
few of the basic relationships. The only paragraphs which are immediately relevant to the theory set out in Volume 1
are those on complete, σ-finite and semi-finite measure spaces (211A, 211D, 211F, 211Lc, §212, 213A-213B, 215B),
and on Lebesgue measure (211M). For the rest, I think that a newcomer to the subject can very well pass over this
chapter for the time being, and return to it for particular items when the text of later chapters refers to it. On the
other hand, it can also be used as an introduction to the flavour of the purest kind of measure theory, the study of
measure spaces for their own sake, with a systematic discussion of a few of the elementary constructions.
211 Definitions
I start with a list of definitions, corresponding to the concepts which I have found to be of value in distinguishing
different types of measure space. Necessarily, the significance of many of these ideas is likely to be obscure until you
have encountered some of the obstacles which arise later on. Nevertheless, you will I hope be able to deal with these
definitions on a formal, abstract basis, and to follow the elementary arguments involved in establishing the relationships
between them (211L).
In 216C-216E below I will give three substantial examples to demonstrate the rich variety of objects which the
definition of measure space encompasses. In the present section, therefore, I content myself with very brief descriptions
of sufficient cases to show at least that each of the definitions here discriminates between different spaces (211M-211R).
211A Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is (Carathéodory) complete if whenever
$A\subseteq E\in\Sigma$ and $\mu E=0$ then $A\in\Sigma$; that is, if every negligible subset of $X$ is measurable.
211B Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $(X,\Sigma,\mu)$ is a probability space if $\mu X=1$. In this case
$\mu$ is called a probability or probability measure.
211C Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is totally finite if $\mu X<\infty$.
211D Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is σ-finite if there is a sequence
$\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure such that $X=\bigcup_{n\in\mathbb{N}}E_n$.
Remark Note that in this case we can set
$$F_n=E_n\setminus\bigcup_{i<n}E_i,\quad G_n=\bigcup_{i\le n}E_i$$
for each $n$, to obtain a partition $\langle F_n\rangle_{n\in\mathbb{N}}$ of $X$ (that is, a disjoint cover of $X$) into measurable sets of finite measure,
and a non-decreasing sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure covering $X$.
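The verification behind this remark is routine. The following is a minimal sketch of it, assuming only the notation of 211D; it is offered as an illustration rather than as part of the original argument.

```latex
% Sketch: why <F_n> is a partition of X and <G_n> is a non-decreasing cover,
% all sets having finite measure. Notation as in 211D.
\begin{itemize}
\item If $m<n$ then $F_m\subseteq E_m\subseteq\bigcup_{i<n}E_i$, while
  $F_n\cap\bigcup_{i<n}E_i=\emptyset$; so $F_m\cap F_n=\emptyset$.
\item If $x\in X$, let $n$ be the least integer such that $x\in E_n$; then
  $x\in E_n\setminus\bigcup_{i<n}E_i=F_n$. So $\bigcup_{n\in\mathbb{N}}F_n=X$.
\item $\mu F_n\le\mu E_n<\infty$ and
  $\mu G_n\le\sum_{i\le n}\mu E_i<\infty$ for every $n$.
\item $G_n\subseteq G_{n+1}$ for every $n$, and
  $\bigcup_{n\in\mathbb{N}}G_n=\bigcup_{n\in\mathbb{N}}E_n=X$.
\end{itemize}
```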
211E Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is strictly localizable or decomposable
if there is a partition $\langle X_i\rangle_{i\in I}$ of $X$ into measurable sets of finite measure such that
$$\Sigma=\{E:E\subseteq X,\ E\cap X_i\in\Sigma\ \forall\ i\in I\},$$
$$\mu E=\sum_{i\in I}\mu(E\cap X_i)\text{ for every }E\in\Sigma.$$
I will call such a family $\langle X_i\rangle_{i\in I}$ a decomposition of $X$.
Remark In this context, we can interpret the sum $\sum_{i\in I}\mu(E\cap X_i)$ simply as
$$\sup\Bigl\{\sum_{i\in J}\mu(E\cap X_i):J\text{ is a finite subset of }I\Bigr\},$$
taking $\sum_{i\in\emptyset}\mu(E\cap X_i)=0$, because we are concerned only with sums of non-negative terms (cf. 112Bd).
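For a countable index set this convention agrees with the ordinary sum of a series. The following sketch takes $I=\mathbb{N}$ and writes $a_i$ for $\mu(E\cap X_i)$; the abbreviation $a_i$ is introduced here only for illustration and does not appear in the text.

```latex
% Assumption: a_i \ge 0 for every i \in \mathbb{N}, where a_i abbreviates \mu(E\cap X_i).
% All sums are taken in [0,\infty].
\[
\sup\Bigl\{\sum_{i\in J}a_i : J\subseteq\mathbb{N}\text{ finite}\Bigr\}
=\sup_{n\in\mathbb{N}}\sum_{i=0}^{n}a_i
=\sum_{i=0}^{\infty}a_i .
\]
% Reason: every finite J is included in some \{0,\dots,n\}, so
% \sum_{i\in J}a_i \le \sum_{i=0}^{n}a_i because the terms are non-negative;
% conversely, each \{0,\dots,n\} is itself a finite subset of \mathbb{N}.
```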
211F Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is semi-finite if whenever $E\in\Sigma$ and
$\mu E=\infty$ there is an $F\subseteq E$ such that $F\in\Sigma$ and $0<\mu F<\infty$.
211G Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is localizable if it is semi-finite and, for
every $\mathcal{E}\subseteq\Sigma$, there is an $H\in\Sigma$ such that (i) $E\setminus H$ is negligible for every $E\in\mathcal{E}$ (ii) if $G\in\Sigma$ and $E\setminus G$ is negligible for
every $E\in\mathcal{E}$, then $H\setminus G$ is negligible. It will be convenient to call such a set $H$ an essential supremum of $\mathcal{E}$ in $\Sigma$.
Remark The definition here is clumsy, because really the concept applies to measure algebras rather than to measure
spaces (see 211Ya-211Yb). However, the present definition can be made to work (see 213N, 241G, 243G below) and
enables us to proceed without a formal introduction to the concept of measure algebra before the time comes to do
the job properly in Volume 3.
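One case in which an essential supremum is easy to exhibit is that of a countable family. The sketch below is an illustration under that extra assumption; countability of the family is not part of Definition 211G, which must cover arbitrary families.

```latex
% Assumption (not in the definition): \mathcal{E}\subseteq\Sigma is countable.
% Claim: H=\bigcup\mathcal{E} is an essential supremum of \mathcal{E} in \Sigma.
\[
H=\bigcup_{E\in\mathcal{E}}E\in\Sigma .
\]
% (i)  E\setminus H=\emptyset for every E\in\mathcal{E}, so E\setminus H is negligible.
% (ii) If G\in\Sigma and E\setminus G is negligible for every E\in\mathcal{E}, then
\[
H\setminus G=\bigcup_{E\in\mathcal{E}}(E\setminus G)
\]
% is a countable union of negligible sets, and is therefore negligible.
```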
211H Denition Let (X, , ) be a measure space. Then , or (X, , ), is locally determined if it is semi-nite
and
= E : E X, E F whenever F and F < ;
that is to say, for any E TX there is an F such that F < and E F / .
211I Denition Let (X, , ) be a measure space. A set E is an atom for if E > 0 and whenever F ,
F E one of F, E F is negligible.
211J Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is atomless or diffused if there are no atoms for $\mu$. (Note that this is not the same thing as saying that all finite sets are negligible; see 211R below. Some authors use the word continuous in this context.)
211K Definition Let $(X,\Sigma,\mu)$ be a measure space. Then $\mu$, or $(X,\Sigma,\mu)$, is purely atomic if whenever $E\in\Sigma$ and $E$ is not negligible there is an atom for $\mu$ included in $E$.
Remark Recall that a measure $\mu$ on a set $X$ is point-supported if $\mu$ measures every subset of $X$ and $\mu E=\sum_{x\in E}\mu\{x\}$ for every $E\subseteq X$ (112Bd). Every point-supported measure is purely atomic, because $\{x\}$ must be an atom whenever $\mu\{x\}>0$, but not every purely atomic measure is point-supported (211R).
211L The relationships between the concepts above are in a sense very straightforward; all the direct implications in which one property implies another are given in the next theorem.
Theorem (a) A probability space is totally finite.
(b) A totally finite measure space is $\sigma$-finite.
(c) A $\sigma$-finite measure space is strictly localizable.
(d) A strictly localizable measure space is localizable and locally determined.
(e) A localizable measure space is semi-finite.
(f) A locally determined measure space is semi-finite.
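As an informal aid to memory, the six implications of the theorem can be recorded as a small directed graph and chained mechanically. The following is only a sketch; the dictionary `DIRECT` and the function `consequences` are hypothetical names introduced here, and the code encodes the statement of the theorem rather than proving anything.

```python
# Direct implications of Theorem 211L: property -> properties it directly implies.
DIRECT = {
    "probability": {"totally finite"},                                # 211La
    "totally finite": {"sigma-finite"},                               # 211Lb
    "sigma-finite": {"strictly localizable"},                         # 211Lc
    "strictly localizable": {"localizable", "locally determined"},    # 211Ld
    "localizable": {"semi-finite"},                                   # 211Le
    "locally determined": {"semi-finite"},                            # 211Lf
    "semi-finite": set(),
}


def consequences(prop):
    """Return every property obtained from `prop` by chaining 211L(a)-(f)."""
    seen = set()
    stack = [prop]
    while stack:
        for q in DIRECT[stack.pop()]:
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


if __name__ == "__main__":
    print(sorted(consequences("sigma-finite")))
```

Thus, for instance, `consequences("sigma-finite")` collects the four properties used for Lebesgue measure in 211Mc.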
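proof (a), (b), (e) and (f) are trivial.
(c) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space; let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence of measurable sets of finite measure covering $X$ (see the remark in 211D). If $E\in\Sigma$, then of course $E\cap F_n\in\Sigma$ for every $n\in\mathbb{N}$, and
$$\mu E=\sum_{n=0}^{\infty}\mu(E\cap F_n)=\sum_{n\in\mathbb{N}}\mu(E\cap F_n).$$
If $E\subseteq X$ and $E\cap F_n\in\Sigma$ for every $n\in\mathbb{N}$, then
$$E=\bigcup_{n\in\mathbb{N}}E\cap F_n\in\Sigma.$$
So $\langle F_n\rangle_{n\in\mathbb{N}}$ is a decomposition of $X$ and $(X,\Sigma,\mu)$ is strictly localizable.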
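(d) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space; let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$.
(i) Let $\mathcal{E}\subseteq\Sigma$ be a family of measurable subsets of $X$. Let $\mathcal{F}$ be the family of measurable sets $F\subseteq X$ such that $\mu(F\cap E)=0$ for every $E\in\mathcal{E}$. Note that $\emptyset\in\mathcal{F}$ and, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{F}$, then $\bigcup_{n\in\mathbb{N}}F_n\in\mathcal{F}$. For each $i\in I$, set $\gamma_i=\sup\{\mu(F\cap X_i):F\in\mathcal{F}\}$ and choose a sequence $\langle F_{in}\rangle_{n\in\mathbb{N}}$ in $\mathcal{F}$ such that $\lim_{n\to\infty}\mu(F_{in}\cap X_i)=\gamma_i$; set
$$F_i=\bigcup_{n\in\mathbb{N}}F_{in}\in\mathcal{F}.$$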
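Set
$$F=\bigcup_{i\in I}F_i\cap X_i\subseteq X$$
and $H=X\setminus F$.
We see that $F\cap X_i=F_i\cap X_i$ for each $i\in I$ (because $\langle X_i\rangle_{i\in I}$ is disjoint), so $F\in\Sigma$ and $H\in\Sigma$. For any $E\in\mathcal{E}$,
$$\mu(E\setminus H)=\mu(E\cap F)=\sum_{i\in I}\mu(E\cap F\cap X_i)=\sum_{i\in I}\mu(E\cap F_i\cap X_i)=0$$
because every $F_i$ belongs to $\mathcal{F}$. Thus $F\in\mathcal{F}$. If $G\in\Sigma$ and $\mu(E\setminus G)=0$ for every $E\in\mathcal{E}$, then $X\setminus G$ and $F'=F\cup(X\setminus G)$ belong to $\mathcal{F}$. So $\mu(F'\cap X_i)\le\gamma_i$ for each $i\in I$. But also $\mu(F\cap X_i)\ge\sup_{n\in\mathbb{N}}\mu(F_{in}\cap X_i)=\gamma_i$, so $\mu(F\cap X_i)=\mu(F'\cap X_i)$ for each $i$. Because $\mu X_i$ is finite, it follows that $\mu((F'\setminus F)\cap X_i)=0$, for each $i$. Summing over $i$, $\mu(F'\setminus F)=0$, that is, $\mu(H\setminus G)=0$.
Thus $H$ is an essential supremum for $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.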
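(ii) If $E\in\Sigma$ and $\mu E=\infty$, then there is some $i\in I$ such that
$$0<\mu(E\cap X_i)\le\mu X_i<\infty;$$
so $(X,\Sigma,\mu)$ is semi-finite. If $E\subseteq X$ and $E\cap F\in\Sigma$ whenever $\mu F<\infty$, then, in particular, $E\cap X_i\in\Sigma$ for every $i\in I$, so $E\in\Sigma$; thus $(X,\Sigma,\mu)$ is locally determined.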
211M Example: Lebesgue measure Let us consider Lebesgue measure in the light of the concepts above. Write $\mu$ for Lebesgue measure on $\mathbb{R}^r$ and $\Sigma$ for its domain.
(a) $\mu$ is complete, because it is constructed by Carathéodory's method; if $A\subseteq E$ and $\mu E=0$, then
$$\mu^*A=\mu^*E=0$$
(writing $\mu^*$ for Lebesgue outer measure), so, for any $B\subseteq\mathbb{R}^r$,
$$\mu^*(B\cap A)+\mu^*(B\setminus A)\le 0+\mu^*B=\mu^*B,$$
and $A$ must be measurable.
(b) $\mu$ is $\sigma$-finite, because $\mathbb{R}^r=\bigcup_{n\in\mathbb{N}}[-\mathbf{n},\mathbf{n}]$, writing $\mathbf{n}$ for the vector $(n,\ldots,n)$, and $\mu[-\mathbf{n},\mathbf{n}]=(2n)^r<\infty$ for every $n$. Of course $\mu$ is neither totally finite nor a probability measure.
(c) Because $\mu$ is $\sigma$-finite, it is strictly localizable (211Lc), localizable (211Ld), locally determined (211Ld) and semi-finite (211Le or 211Lf).
(d) $\mu$ is atomless. PPP Suppose that $E\in\Sigma$ and $\mu E>0$. Consider the function
$$a\mapsto f(a)=\mu(E\cap[-\mathbf{a},\mathbf{a}]):[0,\infty[\to\mathbb{R},$$
writing $\mathbf{a}$ for the vector $(a,\ldots,a)$. We have
$$f(a)\le f(b)\le f(a)+\mu([-\mathbf{b},\mathbf{b}]\setminus[-\mathbf{a},\mathbf{a}])=f(a)+(2b)^r-(2a)^r$$
whenever $a\le b$ in $[0,\infty[$, so $f$ is continuous. Now $f(0)=0$ and $\lim_{n\to\infty}f(n)=\mu E>0$. By the Intermediate Value Theorem there is an $a\in[0,\infty[$ such that $0<f(a)<\mu E$. So we have
$$0<\mu(E\cap[-\mathbf{a},\mathbf{a}])<\mu E,$$
and $E$ is not an atom. As $E$ is arbitrary, $\mu$ is atomless. QQQ
(e) It is now a trivial observation that $\mu$ cannot be purely atomic, because $\mathbb{R}^r$ itself is a set of positive measure not including any atom.
211N Counting measure Take $X$ to be any uncountable set (e.g., $X=\mathbb{R}$), and $\mu$ to be counting measure on $X$ (112Bd).
(a) $\mu$ is complete, because if $A\subseteq E$ and $\mu E=0$ then
$$A=E=\emptyset\in\Sigma.$$
(b) $\mu$ is not $\sigma$-finite, because if $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence of sets of finite measure then every $E_n$ is finite, therefore countable, and $\bigcup_{n\in\mathbb{N}}E_n$ is countable (1A1F), so cannot be $X$. A fortiori, $\mu$ is not a probability measure nor totally finite.
(c) $\mu$ is strictly localizable. PPP Set $X_x=\{x\}$ for every $x\in X$. Then $\langle X_x\rangle_{x\in X}$ is a partition of $X$, and for any $E\subseteq X$
$$\mu(E\cap X_x)=1\text{ if }x\in E,\quad 0\text{ otherwise}.$$
By the definition of $\mu$,
$$\mu E=\sum_{x\in X}\mu(E\cap X_x)$$
for every $E\subseteq X$, and $\langle X_x\rangle_{x\in X}$ is a decomposition of $X$. QQQ
Consequently $\mu$ is localizable, locally determined and semi-finite.
(d) $\mu$ is purely atomic. PPP $\{x\}$ is an atom for every $x\in X$, and if $\mu E>0$ then surely $E$ includes $\{x\}$ for some $x$. QQQ Obviously, $\mu$ is not atomless.
211O A non-semi-finite space Set $X=\{0\}$, $\Sigma=\{\emptyset,X\}$, $\mu\emptyset=0$ and $\mu X=\infty$. Then $\mu$ is not semi-finite, as $\mu X=\infty$ but $X$ has no subset of non-zero finite measure. It follows that $\mu$ cannot be localizable, locally determined, $\sigma$-finite, totally finite nor a probability measure. Because $\Sigma=\mathcal{P}X$, $\mu$ is complete. $X$ is an atom for $\mu$, so $\mu$ is purely atomic (indeed, it is point-supported).
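For a measure space with finitely many measurable sets, definitions 211A, 211F and 211I can be checked by brute force. The sketch below does this for the one-point space just described. The helper names (`subsets`, `is_semi_finite`, `is_complete`, `atoms`) are hypothetical, and the representation (a set of `frozenset`s together with a dictionary of values) is an assumption made only for this illustration.

```python
from itertools import combinations
import math


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]


def is_semi_finite(sigma, mu):
    """211F: each set of infinite measure includes one of non-zero finite measure."""
    return all(
        any(F <= E and 0 < mu[F] < math.inf for F in sigma)
        for E in sigma if mu[E] == math.inf
    )


def is_complete(sigma, mu):
    """211A: every subset of a negligible measurable set is measurable."""
    return all(A in sigma for E in sigma if mu[E] == 0 for A in subsets(E))


def atoms(sigma, mu):
    """211I: E with mu E > 0 such that each measurable F in E has mu F = 0 or mu(E \\ F) = 0."""
    return [
        E for E in sigma
        if mu[E] > 0 and all(mu[F] == 0 or mu[E - F] == 0 for F in sigma if F <= E)
    ]


# The space of 211O: X = {0}, Sigma = {empty set, X}, mu X = infinity.
X = frozenset({0})
sigma = {frozenset(), X}
mu = {frozenset(): 0, X: math.inf}

if __name__ == "__main__":
    print(is_semi_finite(sigma, mu), is_complete(sigma, mu), atoms(sigma, mu))
```

The three calls at the end report, in order, the failure of semi-finiteness, completeness, and the single atom $X$.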
211P A non-complete space Write $\mathcal{B}$ for the $\sigma$-algebra of Borel subsets of $\mathbb{R}$ (111G), and $\mu$ for the restriction of Lebesgue measure to $\mathcal{B}$ (recall that by 114G every Borel subset of $\mathbb{R}$ is Lebesgue measurable). Then $(\mathbb{R},\mathcal{B},\mu)$ is atomless, $\sigma$-finite and not complete.
proof (a) To see that $\mu$ is not complete, recall that there is a continuous, strictly increasing bijection $g:[0,1]\to[0,1]$ such that $\mu g[C]>0$, where $C$ is the Cantor set, so that there is a set $A\subseteq g[C]$ which is not Lebesgue measurable (134Ib). Now $g^{-1}[A]\subseteq C$ cannot be a Borel set, since $\chi A=\chi(g^{-1}[A])\circ g^{-1}$ is not Lebesgue measurable, therefore not Borel measurable, and the composition of two Borel measurable functions is Borel measurable (121Eg); so $g^{-1}[A]$ is a non-measurable subset of the negligible set $C$.
(b) The rest of the arguments of 211M apply to $\mu$ just as well as to true Lebesgue measure, so $\mu$ is $\sigma$-finite and atomless.
*Remark The argument offered in (a) could give rise to a seriously false impression. The set $A$ referred to there can be constructed only with the use of a strong form of the axiom of choice. No such device is necessary for the result here. There are many methods of constructing non-Borel subsets of the Cantor set, all illuminating in different ways. In the absence of any form of the axiom of choice, there are difficulties with the concept of Borel set, and others with the concept of Lebesgue measure, which I will come to in Chapter 56; but countable choice is quite sufficient for the existence of a non-Borel subset of $\mathbb{R}$. For details of a possible approach see 423L in Volume 4.
211Q Some probability spaces Two obvious constructions of probability spaces, restricting myself to the methods described in Volume 1, are
(a) the subspace measure induced by Lebesgue measure on $[0,1]$ (131B);
(b) the point-supported measure induced on a set $X$ by a function $h:X\to[0,1]$ such that $\sum_{x\in X}h(x)=1$, writing $\mu E=\sum_{x\in E}h(x)$ for every $E\subseteq X$; for instance, if $X$ is a singleton $\{x\}$ and $h(x)=1$, or if $X=\mathbb{N}$ and $h(n)=2^{-n-1}$.
Of these two, (a) gives an atomless probability measure and (b) gives a purely atomic probability measure.
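For the example $X=\mathbb{N}$, $h(n)=2^{-n-1}$ in (b), the measure of an initial segment $\{0,\ldots,N-1\}$ is $1-2^{-N}$, which is why the total mass is $1$. The following sketch checks this with exact rational arithmetic; the function names `h`, `mu_finite` and `mass_of_initial_segment` are hypothetical, and only finite sets are handled.

```python
from fractions import Fraction


def h(n):
    """Point mass at n for the example h(n) = 2^{-n-1}."""
    return Fraction(1, 2 ** (n + 1))


def mu_finite(E):
    """Measure of a finite set E of natural numbers: the sum of its point masses."""
    return sum((h(n) for n in E), Fraction(0))


def mass_of_initial_segment(N):
    """Measure of {0, 1, ..., N-1}."""
    return mu_finite(range(N))


if __name__ == "__main__":
    for N in (1, 5, 10):
        print(N, mass_of_initial_segment(N), 1 - Fraction(1, 2 ** N))
```

Every singleton $\{n\}$ has strictly positive measure here, so each is an atom, in line with the remark in 211K.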
211R Countable-cocountable measure The following is one of the basic constructions to keep in mind when considering abstract measure spaces.
(a) Let $X$ be any set. Let $\Sigma$ be the family of those sets $E\subseteq X$ such that either $E$ or $X\setminus E$ is countable. Then $\Sigma$ is a $\sigma$-algebra of subsets of $X$. PPP (i) $\emptyset$ is countable, so belongs to $\Sigma$. (ii) The condition for $E$ to belong to $\Sigma$ is symmetric between $E$ and $X\setminus E$, so $X\setminus E\in\Sigma$ for every $E\in\Sigma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be any sequence in $\Sigma$, and set $E=\bigcup_{n\in\mathbb{N}}E_n$. If every $E_n$ is countable, then $E$ is countable, so belongs to $\Sigma$. Otherwise, there is some $n$ such that $X\setminus E_n$ is countable, in which case $X\setminus E\subseteq X\setminus E_n$ is countable, so again $E\in\Sigma$. QQQ $\Sigma$ is called the countable-cocountable $\sigma$-algebra of $X$.
(b) Now consider the function $\mu:\Sigma\to\{0,1\}$ defined by writing $\mu E=0$ if $E$ is countable, $\mu E=1$ if $E\in\Sigma$ and $E$ is not countable. Then $\mu$ is a measure. PPP (i) $\emptyset$ is countable so $\mu\emptyset=0$. (ii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Sigma$, and $E$ its union. ($\alpha$) If every $E_m$ is countable, then so is $E$, so
$$\mu E=0=\sum_{n=0}^{\infty}\mu E_n.$$
($\beta$) If some $E_m$ is uncountable, then $E\supseteq E_m$ also is uncountable, and $\mu E=\mu E_m=1$. But in this case, because $E_m\in\Sigma$, $X\setminus E_m$ is countable, so $E_n$, being a subset of $X\setminus E_m$, is countable for every $n\ne m$; thus $\mu E_n=0$ for every $n\ne m$, and
$$\mu E=1=\sum_{n=0}^{\infty}\mu E_n.$$
As $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\mu$ is a measure. QQQ This is the countable-cocountable measure on $X$.
(c) If $X$ is any uncountable set and $\mu$ is the countable-cocountable measure on $X$, then $\mu$ is a complete, purely atomic probability measure, but is not point-supported. PPP (i) If $A\subseteq E$ and $\mu E=0$, then $E$ is countable, so $A$ also is countable and belongs to $\Sigma$. Thus $\mu$ is complete. (ii) Because $X$ is uncountable, $\mu X=1$ and $\mu$ is a probability measure. (iii) If $\mu E>0$, then $\mu F=\mu E=1$ whenever $F$ is a non-negligible measurable subset of $E$, so $E$ is itself an atom; thus $\mu$ is purely atomic. (iv) $\mu X=1>0=\sum_{x\in X}\mu\{x\}$, so $\mu$ is not point-supported. QQQ
211X Basic exercises >>>(a) Let $(X,\Sigma,\mu)$ be a measure space. Show that $\mu$ is $\sigma$-finite iff there is a totally finite measure $\nu$ on $X$ with the same measurable sets and the same negligible sets as $\mu$.
>>>(b) Let $g:\mathbb{R}\to\mathbb{R}$ be a non-decreasing function and $\mu_g$ the associated Lebesgue-Stieltjes measure (114Xa). Show that $\mu_g$ is complete and $\sigma$-finite. Show that
(i) $\mu_g$ is totally finite iff $g$ is bounded;
(ii) $\mu_g$ is a probability measure iff $\lim_{x\to\infty}g(x)-\lim_{x\to-\infty}g(x)=1$;
(iii) $\mu_g$ is atomless iff $g$ is continuous;
(iv) if $E$ is any atom for $\mu_g$, there is a point $x\in E$ such that $\mu_gE=\mu_g\{x\}$;
(v) $\mu_g$ is purely atomic iff it is point-supported.
>>>(c) Let $\mu$ be counting measure on a set $X$. Show that $\mu$ is always complete, strictly localizable and purely atomic, and that it is $\sigma$-finite iff $X$ is countable, totally finite iff $X$ is finite, a probability measure iff $X$ is a singleton, and atomless iff $X$ is empty.
(d) Show that a point-supported measure is always complete, and is strictly localizable iff it is semi-finite.
(e) Let $X$ be a set. Show that for any $\sigma$-ideal $\mathcal{I}$ of subsets of $X$ (definition: 112Db), the set
$$\Sigma=\mathcal{I}\cup\{X\setminus A:A\in\mathcal{I}\}$$
is a $\sigma$-algebra of subsets of $X$, and that there is a measure $\mu:\Sigma\to\{0,1\}$ given by setting
$$\mu E=0\text{ if }E\in\mathcal{I},\quad\mu E=1\text{ if }E\in\Sigma\setminus\mathcal{I}.$$
Show that $\mathcal{I}$ is precisely the null ideal of $\mu$, that $\mu$ is complete, totally finite and purely atomic, and is a probability measure iff $X\notin\mathcal{I}$.
(f) Show that a point-supported measure is strictly localizable iff it is semi-finite.
(g) Let $(X,\Sigma,\mu)$ be a measure space such that $\mu X>0$. Show that the set of conegligible subsets of $X$ is a filter on $X$.
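211Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space, and for $E$, $F\in\Sigma$ write $E\sim F$ if $\mu(E\triangle F)=0$. Show that $\sim$ is an equivalence relation on $\Sigma$. Let $\mathfrak{A}$ be the set of equivalence classes in $\Sigma$ for $\sim$; for $E\in\Sigma$, write $E^{\bullet}\in\mathfrak{A}$ for its equivalence class. Show that there is a partial ordering $\subseteq$ on $\mathfrak{A}$ defined by saying that, for $E$, $F\in\Sigma$,
$$E^{\bullet}\subseteq F^{\bullet}\iff\mu(E\setminus F)=0.$$
Show that $\mu$ is localizable iff for every $A\subseteq\mathfrak{A}$ there is an $h\in\mathfrak{A}$ such that (i) $a\subseteq h$ for every $a\in A$ (ii) whenever $g\in\mathfrak{A}$ is such that $a\subseteq g$ for every $a\in A$, then $h\subseteq g$.
(b) Let $(X,\Sigma,\mu)$ be a measure space, and construct $\mathfrak{A}$ as in 211Ya. Show that there are operations $\cup$, $\cap$, $\setminus$ on $\mathfrak{A}$ defined by saying that
$$E^{\bullet}\cup F^{\bullet}=(E\cup F)^{\bullet},\quad E^{\bullet}\cap F^{\bullet}=(E\cap F)^{\bullet},\quad E^{\bullet}\setminus F^{\bullet}=(E\setminus F)^{\bullet}$$
for all $E$, $F\in\Sigma$. Show that if $A\subseteq\mathfrak{A}$ is any countable set, then there is an $h\in\mathfrak{A}$ such that (i) $a\subseteq h$ for every $a\in A$ (ii) whenever $g\in\mathfrak{A}$ is such that $a\subseteq g$ for every $a\in A$, then $h\subseteq g$. Show that there is a functional $\bar\mu:\mathfrak{A}\to[0,\infty]$ defined by saying that $\bar\mu(E^{\bullet})=\mu E$ for every $E\in\Sigma$. ($(\mathfrak{A},\bar\mu)$ is called the measure algebra of $(X,\Sigma,\mu)$.)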


(c) Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Show that it is atomless iff whenever $\epsilon>0$, $E\in\Sigma$ and $\mu E<\infty$, then there is a finite partition of $E$ into measurable sets of measure at most $\epsilon$.
(d) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space. Show that it is atomless iff for every $\epsilon>0$ there is a decomposition of $X$ consisting of sets of measure at most $\epsilon$.
(e) Let $\Sigma$ be the countable-cocountable $\sigma$-algebra of $\mathbb{R}$. Show that $[0,\infty[\,\notin\Sigma$. Let $\mu$ be the restriction of counting measure to $\Sigma$. Show that $(\mathbb{R},\Sigma,\mu)$ is complete, semi-finite and purely atomic, but not localizable nor locally determined.
211 Notes and comments The list of definitions in 211A-211K probably strikes you as quite long enough, even though I have omitted many occasionally useful ideas. The concepts here vary widely in importance, and the importance of each varies widely with context. My own view is that it is absolutely necessary, when studying any measure space, to know its classification under the eleven discriminating features listed here, and to be able to describe any atoms which are present. Fortunately, for most 'ordinary' measure spaces, the classification is fairly quick, because if (for instance) the space is $\sigma$-finite, and you know the measure of the whole space, the only remaining questions concern completeness and atoms. The distinctions between spaces which are, or are not, strictly localizable, semi-finite, localizable and locally determined are relevant only for spaces which are not $\sigma$-finite, and do not arise in elementary applications.
I think it is also fair to say that the notions of 'complete' and 'locally determined' measure space are technical; I mean, that they do not correspond to significant features of the essential structure of a space, though there are some interesting problems concerning incomplete measures. One manifestation of this is the existence of canonical constructions for rendering spaces complete or complete and locally determined (212C, 213D-213E). In addition, measure spaces which are not semi-finite do not really belong to measure theory, but rather to the more general study of $\sigma$-algebras and $\sigma$-ideals. The most important classifications, in terms of the behaviour of a measure space, seem to me to be '$\sigma$-finite', 'localizable' and 'strictly localizable'; these are the critical features which cannot be forced by elementary constructions.
If you know anything about Borel subsets of the real line, the argument of part (a) of the proof of 211P must look very clumsy. But 'better' proofs rely on ideas which we shall not need until Volume 4, and the proof here is based on a construction which we have to understand for other reasons.
212 Complete spaces
In the next two sections of this chapter I give brief accounts of the theory of measure spaces possessing certain of the properties described in §211. I begin with 'completeness'. I give the elementary facts about complete measure spaces in 212A-212B; then I turn to the notion of 'completion' of a measure (212C) and its relationships with the other concepts of measure theory introduced so far (212D-212G).
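212A Proposition Any measure space constructed by Carathéodory's method is complete.
proof Recall that 'Carathéodory's method' starts from an arbitrary outer measure $\theta:\mathcal{P}X\to[0,\infty]$ and sets
$$\Sigma=\{E:E\subseteq X,\ \theta A=\theta(A\cap E)+\theta(A\setminus E)\text{ for every }A\subseteq X\},\quad\mu=\theta{\restriction}\Sigma$$
(113C). In this case, if $B\subseteq E\in\Sigma$ and $\mu E=0$, then $\theta B=\theta E=0$ (113A(ii)), so for any $A\subseteq X$ we have
$$\theta(A\cap B)+\theta(A\setminus B)=\theta(A\setminus B)\le\theta A\le\theta(A\cap B)+\theta(A\setminus B),$$
and $B\in\Sigma$.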
212B Proposition (a) If $(X,\Sigma,\mu)$ is a complete measure space, then any conegligible subset of $X$ is measurable.
(b) Let $(X,\Sigma,\mu)$ be a complete measure space, and $f$ a $[-\infty,\infty]$-valued function defined on a subset of $X$. If $f$ is virtually measurable (that is, there is a conegligible set $E\subseteq X$ such that $f{\restriction}E$ is measurable), then $f$ is measurable.
(c) Let $(X,\Sigma,\mu)$ be a complete measure space, and $f$ a real-valued function defined on a conegligible subset of $X$. Then the following are equiveridical, that is, if one is true so are the others:
(i) $f$ is integrable;
(ii) $f$ is measurable and $|f|$ is integrable;
(iii) $f$ is measurable and there is an integrable function $g$ such that $|f|\le_{\text{a.e.}}g$.
proof (a) If $E$ is conegligible, then $X\setminus E$ is negligible, therefore measurable, and $E$ is measurable.
(b) Let $a\in\mathbb{R}$. Then there is an $H\in\Sigma$ such that
$$\{x:(f{\restriction}E)(x)\le a\}=H\cap\operatorname{dom}(f{\restriction}E)=H\cap E\cap\operatorname{dom}f.$$
Now $F=\{x:x\in\operatorname{dom}f\setminus E,\ f(x)\le a\}$ is a subset of the negligible set $X\setminus E$, so is measurable, and, because $E\in\Sigma$ by (a),
$$\{x:f(x)\le a\}=(F\cup(H\cap E))\cap\operatorname{dom}f\in\Sigma_{\operatorname{dom}f},$$
writing $\Sigma_D=\{D\cap G:G\in\Sigma\}$, as in 121A. As $a$ is arbitrary, $f$ is measurable (135E).
(c)(i)$\Rightarrow$(ii) If $f$ is integrable, then by 122P $f$ is virtually measurable and by 122Re $|f|$ is integrable. By (b) here, $f$ is measurable, so (ii) is true.
(ii)$\Rightarrow$(iii) is trivial.
(iii)$\Rightarrow$(i) If $f$ is measurable and $g$ is integrable and $|f|\le_{\text{a.e.}}g$, then $f$ is virtually measurable, $|g|$ is integrable and $|f|\le_{\text{a.e.}}|g|$, so 122P tells us that $f$ is integrable.
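212C The completion of a measure Let $(X,\Sigma,\mu)$ be any measure space.
(a) Let $\hat\Sigma$ be the family of those sets $E\subseteq X$ such that there are $E'$, $E''\in\Sigma$ with $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. Then $\hat\Sigma$ is a $\sigma$-algebra of subsets of $X$. PPP (i) Of course $\emptyset$ belongs to $\hat\Sigma$, because we can take $E'=E''=\emptyset$. (ii) If $E\in\hat\Sigma$, take $E'$, $E''\in\Sigma$ such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. Then
$$X\setminus E''\subseteq X\setminus E\subseteq X\setminus E',\quad\mu((X\setminus E')\setminus(X\setminus E''))=\mu(E''\setminus E')=0,$$
so $X\setminus E\in\hat\Sigma$. (iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\hat\Sigma$, then for each $n$ choose $E'_n$, $E''_n\in\Sigma$ such that $E'_n\subseteq E_n\subseteq E''_n$ and $\mu(E''_n\setminus E'_n)=0$. Set
$$E=\bigcup_{n\in\mathbb{N}}E_n,\quad E'=\bigcup_{n\in\mathbb{N}}E'_n,\quad E''=\bigcup_{n\in\mathbb{N}}E''_n;$$
then $E'\subseteq E\subseteq E''$ and $E''\setminus E'\subseteq\bigcup_{n\in\mathbb{N}}(E''_n\setminus E'_n)$ is negligible, so $E\in\hat\Sigma$. QQQ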
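(b) For $E\in\hat\Sigma$, set
$$\hat\mu E=\mu^*E=\min\{\mu F:E\subseteq F\in\Sigma\}$$
(132A). It is worth remarking at once that if $E\in\hat\Sigma$, $E'$, $E''\in\Sigma$, $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$, then $\mu E'=\hat\mu E=\mu E''$; this is because
$$\mu E'=\mu^*E'\le\mu^*E\le\mu^*E''=\mu E''=\mu E'+\mu(E''\setminus E')=\mu E'$$
(recalling from 132A, or noting now, that $\mu^*A\le\mu^*B$ whenever $A\subseteq B\subseteq X$, and that $\mu^*$ agrees with $\mu$ on $\Sigma$).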
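(c) We now find that $(X,\hat\Sigma,\hat\mu)$ is a measure space. PPP (i) Of course $\hat\mu$, like $\mu$, takes values in $[0,\infty]$. (ii) $\hat\mu\emptyset=\mu\emptyset=0$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\hat\Sigma$, with union $E$. For each $n\in\mathbb{N}$ choose $E'_n$, $E''_n\in\Sigma$ such that $E'_n\subseteq E_n\subseteq E''_n$ and $\mu(E''_n\setminus E'_n)=0$. Set $E'=\bigcup_{n\in\mathbb{N}}E'_n$, $E''=\bigcup_{n\in\mathbb{N}}E''_n$. Then (as in (a-iii) above) $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$, so
$$\hat\mu E=\mu E'=\sum_{n=0}^{\infty}\mu E'_n=\sum_{n=0}^{\infty}\hat\mu E_n$$
because $\langle E'_n\rangle_{n\in\mathbb{N}}$, like $\langle E_n\rangle_{n\in\mathbb{N}}$, is disjoint. QQQ
(d) The measure space $(X,\hat\Sigma,\hat\mu)$ is called the completion of the measure space $(X,\Sigma,\mu)$; equally, I will call $\hat\mu$ the completion of $\mu$, and occasionally (if it seems plain which null ideal is under consideration) I will call $\hat\Sigma$ the completion of $\Sigma$. Members of $\hat\Sigma$ are sometimes called $\mu$-measurable.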
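When $X$ is finite the construction of 212Ca-212Cb can be carried out by exhaustive search. The sketch below is only an illustration under that assumption; the names `powerset` and `completion`, and the representation of $\mu$ as a dictionary keyed by `frozenset`s, are hypothetical. It uses the remark in (b) that $\hat\mu E=\mu E'$ for any sandwiching pair $E'\subseteq E\subseteq E''$ with $\mu(E''\setminus E')=0$.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]


def completion(X, mu):
    """Completion of a measure on a finite set X.

    `mu` maps each member of the sigma-algebra Sigma (a frozenset) to its measure.
    Returns a dict mapping each member of the completed sigma-algebra to its measure.
    """
    mu_hat = {}
    for E in powerset(X):
        for lo in mu:            # candidate E'
            for hi in mu:        # candidate E''
                # hi - lo lies in Sigma whenever lo is a subset of hi
                if lo <= E <= hi and mu[hi - lo] == 0:
                    mu_hat[E] = mu[lo]
    return mu_hat


# Example: Sigma = {empty, {1,2}, {3}, X} on X = {1,2,3}, with {1,2} negligible.
X = frozenset({1, 2, 3})
mu = {
    frozenset(): 0,
    frozenset({1, 2}): 0,
    frozenset({3}): 1,
    X: 1,
}
mu_hat = completion(X, mu)

if __name__ == "__main__":
    for E in sorted(mu_hat, key=lambda s: (len(s), sorted(s))):
        print(sorted(E), mu_hat[E])
```

In this example the negligible set $\{1,2\}$ has non-measurable subsets $\{1\}$ and $\{2\}$, and the completion measures every subset of $X$.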
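212D There is something I had better check at once.
Proposition Let $(X,\Sigma,\mu)$ be any measure space. Then $(X,\hat\Sigma,\hat\mu)$, as defined in 212C, is a complete measure space and $\hat\mu$ is an extension of $\mu$; and $(X,\hat\Sigma,\hat\mu)=(X,\Sigma,\mu)$ iff $(X,\Sigma,\mu)$ is complete.
proof (a) Suppose that $A\subseteq E\in\hat\Sigma$ and $\hat\mu E=0$. Then (by 212Cb) there is an $E''\in\Sigma$ such that $E\subseteq E''$ and $\mu E''=0$. Accordingly we have
$$\emptyset\subseteq A\subseteq E'',\quad\mu(E''\setminus\emptyset)=0,$$
so $A\in\hat\Sigma$. As $A$ is arbitrary, $\hat\mu$ is complete.
(b) If $E\in\Sigma$, then of course $E\in\hat\Sigma$, because $E\subseteq E\subseteq E$ and $\mu(E\setminus E)=0$; and $\hat\mu E=\mu^*E=\mu E$. Thus $\Sigma\subseteq\hat\Sigma$ and $\hat\mu$ extends $\mu$.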
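(c) If $\hat\mu=\mu$ then of course $\mu$ must be complete. If $\mu$ is complete, and $E\in\hat\Sigma$, then there are $E'$, $E''\in\Sigma$ such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. But now $E\setminus E'\subseteq E''\setminus E'\in\Sigma$, so (because $(X,\Sigma,\mu)$ is complete) $E\setminus E'\in\Sigma$ and $E=E'\cup(E\setminus E')\in\Sigma$. As $E$ is arbitrary, $\hat\Sigma\subseteq\Sigma$ and $\hat\Sigma=\Sigma$ and $\hat\mu=\mu$.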
212E The importance of this construction is such that it is worth spelling out some further elementary properties.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) The outer measures $\hat\mu^*$, $\mu^*$ defined from $\hat\mu$ and $\mu$ coincide.
(b) $\mu$, $\hat\mu$ give rise to the same negligible and conegligible sets and the same sets of full outer measure.
(c) $\hat\mu$ is the only measure with domain $\hat\Sigma$ which agrees with $\mu$ on $\Sigma$.
(d) A subset of $X$ belongs to $\hat\Sigma$ iff it is expressible as $F\triangle A$ where $F\in\Sigma$ and $A$ is $\mu$-negligible.
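proof (a) Take any $A\subseteq X$. (i) If $A\subseteq F\in\Sigma$, then $F\in\hat\Sigma$ and $\mu F=\hat\mu F$, so
$$\hat\mu^*A\le\hat\mu F=\mu F;$$
as $F$ is arbitrary, $\hat\mu^*A\le\mu^*A$. (ii) If $A\subseteq E\in\hat\Sigma$, there is an $E''\in\Sigma$ such that $E\subseteq E''$ and $\mu E''=\hat\mu E$, so
$$\mu^*A\le\mu E''=\hat\mu E;$$
as $E$ is arbitrary, $\mu^*A\le\hat\mu^*A$.
(b) Now, for $A\subseteq X$,
$$A\text{ is }\mu\text{-negligible}\iff\mu^*A=0\iff\hat\mu^*A=0\iff A\text{ is }\hat\mu\text{-negligible},$$
$$A\text{ is }\mu\text{-conegligible}\iff\mu^*(X\setminus A)=0\iff\hat\mu^*(X\setminus A)=0\iff A\text{ is }\hat\mu\text{-conegligible}.$$
If $A$ has full outer measure for $\mu$, $F\in\hat\Sigma$ and $F\cap A=\emptyset$, then there is an $F'\in\Sigma$ such that $F'\subseteq F$ and $\mu F'=\hat\mu F$; as $F'\cap A=\emptyset$, $F'$ is $\mu$-negligible and $F$ is $\hat\mu$-negligible; as $F$ is arbitrary, $A$ has full outer measure for $\hat\mu$. In the other direction, of course, if $A$ has full outer measure for $\hat\mu$ then
$$\mu^*(F\cap A)=\hat\mu^*(F\cap A)=\hat\mu F=\mu F$$
for every $F\in\Sigma$, so $A$ has full outer measure for $\mu$.
(c) If $\tilde\mu$ is any measure with domain $\hat\Sigma$ extending $\mu$, we must have
$$\mu E'\le\tilde\mu E\le\mu E'',\quad\mu E'=\hat\mu E=\mu E'',$$
so $\tilde\mu E=\hat\mu E$, whenever $E'$, $E''\in\Sigma$, $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$.
(d)(i) If $E\in\hat\Sigma$, take $E'$, $E''\in\Sigma$ such that $E'\subseteq E\subseteq E''$ and $\mu(E''\setminus E')=0$. Then $E\setminus E'\subseteq E''\setminus E'\in\Sigma$, so $E\setminus E'$ is $\mu$-negligible, and $E=E'\triangle(E\setminus E')$ is the symmetric difference of a member of $\Sigma$ and a negligible set.
(ii) If $E=F\triangle A$, where $F\in\Sigma$ and $A$ is $\mu$-negligible, take $G\in\Sigma$ such that $\mu G=0$ and $A\subseteq G$; then $F\setminus G\subseteq E\subseteq F\cup G$ and $\mu((F\cup G)\setminus(F\setminus G))=\mu G=0$, so $E\in\hat\Sigma$.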
212F Now let us consider integration with respect to the completion of a measure.
Proposition Let $(X,\Sigma,\mu)$ be a measure space and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) A $[-\infty,\infty]$-valued function $f$ defined on a subset of $X$ is $\hat\Sigma$-measurable iff it is $\mu$-virtually measurable.
(b) Let $f$ be a $[-\infty,\infty]$-valued function defined on a subset of $X$. Then $\int f\,d\mu=\int f\,d\hat\mu$ if either is defined in $[-\infty,\infty]$; in particular, $f$ is $\mu$-integrable iff it is $\hat\mu$-integrable.
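proof (a)(i) Suppose that $f$ is a $[-\infty,\infty]$-valued $\hat\Sigma$-measurable function. For $q\in\mathbb{Q}$ let $E_q\in\hat\Sigma$ be such that $\{x:f(x)\le q\}=\operatorname{dom}f\cap E_q$, and choose $E'_q$, $E''_q\in\Sigma$ such that $E'_q\subseteq E_q\subseteq E''_q$ and $\mu(E''_q\setminus E'_q)=0$. Set $H=X\setminus\bigcup_{q\in\mathbb{Q}}(E''_q\setminus E'_q)$; then $H$ is $\mu$-conegligible. For $a\in\mathbb{R}$ set
$$G_a=\bigcup_{q\in\mathbb{Q},\,q<a}E'_q\in\Sigma;$$
then
$$\{x:x\in\operatorname{dom}(f{\restriction}H),\ (f{\restriction}H)(x)<a\}=G_a\cap\operatorname{dom}(f{\restriction}H).$$
This shows that $f{\restriction}H$ is $\Sigma$-measurable, so that $f$ is $\mu$-virtually measurable.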
(ii) If $f$ is $\mu$-virtually measurable, then there is a $\mu$-conegligible set $H\subseteq X$ such that $f{\restriction}H$ is $\Sigma$-measurable. Since $\Sigma\subseteq\hat\Sigma$, $f{\restriction}H$ is also $\hat\Sigma$-measurable. And $H$ is $\hat\mu$-conegligible, by 212Eb. But this means that $f$ is $\hat\mu$-virtually measurable, therefore $\hat\Sigma$-measurable, by 212Bb.
(b)(i) Let $f:D\to[-\infty,\infty]$ be a function, where $D\subseteq X$. If either of $\int f\,d\mu$, $\int f\,d\hat\mu$ is defined in $[-\infty,\infty]$, then $f$ is virtually measurable, and defined almost everywhere, for one of the appropriate measures, and therefore for both (putting (a) above together with 212Bb).
(ii) Now suppose that $f$ is non-negative and integrable either with respect to $\mu$ or with respect to $\hat\mu$. Let $E\in\Sigma$ be a conegligible set included in $\operatorname{dom}f$ such that $f{\restriction}E$ is $\Sigma$-measurable. For $n\in\mathbb{N}$, $k\ge 1$ set
$$E_{nk}=\{x:x\in E,\ f(x)\ge 2^{-n}k\};$$
then each $E_{nk}$ belongs to $\Sigma$ and is of finite measure for both $\mu$ and $\hat\mu$. (If $f$ is $\mu$-integrable,
$$\hat\mu E_{nk}=\mu E_{nk}\le 2^n\int f\,d\mu;$$
if $f$ is $\hat\mu$-integrable,
$$\mu E_{nk}=\hat\mu E_{nk}\le 2^n\int f\,d\hat\mu.)$$
So
$$f_n=\sum_{k=1}^{4^n}2^{-n}\chi E_{nk}$$
is both $\mu$-simple and $\hat\mu$-simple, and $\int f_n\,d\mu=\int f_n\,d\hat\mu$. Observe that, for $x\in E$,
$$f_n(x)=2^{-n}k\text{ if }k<4^n\text{ and }2^{-n}k\le f(x)<2^{-n}(k+1),$$
$$f_n(x)=2^n\text{ if }f(x)\ge 2^n.$$
Thus $\langle f_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of functions converging to $f$ at every point of $E$, that is, both $\mu$-almost everywhere and $\hat\mu$-almost everywhere. So we have, for any $c\in\mathbb{R}$,
$$\int f\,d\mu=c\iff\lim_{n\to\infty}\int f_n\,d\mu=c\iff\lim_{n\to\infty}\int f_n\,d\hat\mu=c\iff\int f\,d\hat\mu=c.$$
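The dyadic approximations $f_n$ used in (b-ii) depend on $x$ only through the value $y=f(x)$, so their behaviour can be tabulated pointwise. The sketch below compares the defining sum with the closed form 'round $y$ down to a multiple of $2^{-n}$, capped at $2^n$'. The function names are hypothetical, and exact rationals are used so that no rounding error intervenes.

```python
from fractions import Fraction
from math import floor


def f_n_by_sum(n, y):
    """Value of f_n where f takes the value y >= 0, from the defining sum
    f_n = sum_{k=1}^{4^n} 2^{-n} * indicator(f >= 2^{-n} k)."""
    step = Fraction(1, 2 ** n)
    return sum((step for k in range(1, 4 ** n + 1) if y >= step * k), Fraction(0))


def f_n_closed_form(n, y):
    """Closed form: y rounded down to a multiple of 2^{-n}, capped at 2^n."""
    return min(Fraction(2 ** n), Fraction(floor(y * 2 ** n), 2 ** n))


if __name__ == "__main__":
    y = Fraction(7, 3)
    for n in range(5):
        print(n, f_n_by_sum(n, y), f_n_closed_form(n, y))
```

For $y=7/3$ the first three values are $1$, $2$ and $9/4$, illustrating that the sequence is non-decreasing and bounded above by $y$.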
(iii) As for infinite integrals, recall that for a non-negative function I write '$\int f=\infty$' just when $f$ is defined almost everywhere, is virtually measurable, and is not integrable. So (i) and (ii) together show that $\int f\,d\mu=\int f\,d\hat\mu$ whenever $f$ is non-negative and either integral is defined in $[0,\infty]$.
(iv) Since both $\mu$, $\hat\mu$ agree that $\int f$ is to be interpreted as $\int f^+-\int f^-$ just when this can be defined in $[-\infty,\infty]$, writing $f^+(x)=\max(f(x),0)$, $f^-(x)=\max(-f(x),0)$ for $x\in\operatorname{dom}f$, the result for general real-valued $f$ follows at once.
212G I turn now to the question of the effect of the construction on the properties listed in 211B-211K.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion.
(a) $(X,\hat\Sigma,\hat\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or semi-finite, or localizable, iff $(X,\Sigma,\mu)$ is.
(b) $(X,\hat\Sigma,\hat\mu)$ is strictly localizable if $(X,\Sigma,\mu)$ is, and any decomposition of $X$ for $\mu$ is a decomposition for $\hat\mu$.
(c) A set $H\in\hat\Sigma$ is an atom for $\hat\mu$ iff there is an $E\in\Sigma$ such that $E$ is an atom for $\mu$ and $\hat\mu(H\triangle E)=0$.
(d) $(X,\hat\Sigma,\hat\mu)$ is atomless or purely atomic iff $(X,\Sigma,\mu)$ is.
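proof (a)(i) Because $\hat\mu X=\mu X$, $(X,\hat\Sigma,\hat\mu)$ is a probability space, or totally finite, iff $(X,\Sigma,\mu)$ is.
(ii)($\alpha$) If $(X,\Sigma,\mu)$ is $\sigma$-finite, there is a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, covering $X$, with $\mu E_n<\infty$ for each $n$. Now $\hat\mu E_n<\infty$ for each $n$, so $(X,\hat\Sigma,\hat\mu)$ is $\sigma$-finite.
($\beta$) If $(X,\hat\Sigma,\hat\mu)$ is $\sigma$-finite, there is a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\hat\Sigma$, covering $X$, with $\hat\mu E_n<\infty$ for each $n$. Now we can find, for each $n$, an $E''_n\in\Sigma$ such that $\mu E''_n<\infty$ and $E_n\subseteq E''_n$; so that $\langle E''_n\rangle_{n\in\mathbb{N}}$ witnesses that $(X,\Sigma,\mu)$ is $\sigma$-finite.
(iii)($\alpha$) If $(X,\Sigma,\mu)$ is semi-finite and $\hat\mu E=\infty$, then there is an $E'\in\Sigma$ such that $E'\subseteq E$ and $\mu E'=\infty$. Next, there is an $F\in\Sigma$ such that $F\subseteq E'$ and $0<\mu F<\infty$. Of course we now have $F\in\hat\Sigma$, $F\subseteq E$ and $0<\hat\mu F<\infty$. As $E$ is arbitrary, $(X,\hat\Sigma,\hat\mu)$ is semi-finite.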
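($\beta$) If $(X,\hat\Sigma,\hat\mu)$ is semi-finite and $\mu E=\infty$, then $\hat\mu E=\infty$, so there is an $F\subseteq E$ such that $0<\hat\mu F<\infty$. Next, there is an $F'\in\Sigma$ such that $F'\subseteq F$ and $\mu F'=\hat\mu F$. Of course we now have $F'\subseteq E$ and $0<\mu F'<\infty$. As $E$ is arbitrary, $(X,\Sigma,\mu)$ is semi-finite.
(iv)($\alpha$) If $(X,\Sigma,\mu)$ is localizable and $\mathcal{E}\subseteq\hat\Sigma$, then set
$$\mathcal{F}=\{F:F\in\Sigma,\ \exists\,E\in\mathcal{E},\ F\subseteq E\}.$$
Let $H$ be an essential supremum of $\mathcal{F}$ in $\Sigma$, as in 211G.
If $E\in\mathcal{E}$, there is an $E'\in\Sigma$ such that $E'\subseteq E$ and $E\setminus E'$ is negligible; now $E'\in\mathcal{F}$, so
$$\hat\mu(E\setminus H)\le\hat\mu(E\setminus E')+\mu(E'\setminus H)=0.$$
If $G\in\hat\Sigma$ and $\hat\mu(E\setminus G)=0$ for every $E\in\mathcal{E}$, let $G''\in\Sigma$ be such that $G\subseteq G''$ and $\hat\mu(G''\setminus G)=0$; then, for any $F\in\mathcal{F}$, there is an $E\in\mathcal{E}$ including $F$, so that
$$\mu(F\setminus G'')\le\hat\mu(E\setminus G)=0.$$
As $F$ is arbitrary, $\mu(H\setminus G'')=0$ and $\hat\mu(H\setminus G)=0$. This shows that $H$ is an essential supremum of $\mathcal{E}$ in $\hat\Sigma$. As $\mathcal{E}$ is arbitrary, $(X,\hat\Sigma,\hat\mu)$ is localizable.
($\beta$) Suppose that $(X,\hat\Sigma,\hat\mu)$ is localizable and that $\mathcal{E}\subseteq\Sigma$. Working in $(X,\hat\Sigma,\hat\mu)$, let $H$ be an essential supremum for $\mathcal{E}$ in $\hat\Sigma$. Let $H'\in\Sigma$ be such that $H'\subseteq H$ and $\hat\mu(H\setminus H')=0$. Then
$$\mu(E\setminus H')\le\hat\mu(E\setminus H)+\hat\mu(H\setminus H')=0$$
for every $E\in\mathcal{E}$; while if $G\in\Sigma$ and $\mu(E\setminus G)=0$ for every $E\in\mathcal{E}$, we must have
$$\mu(H'\setminus G)\le\hat\mu(H\setminus G)=0.$$
Thus $H'$ is an essential supremum of $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.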


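(b) Let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$ for $\mu$, as in 211E. Of course it is a partition of $X$ into sets of finite $\hat\mu$-measure. If $H\subseteq X$ and $H\cap X_i\in\hat\Sigma$ for every $i$, choose for each $i\in I$ sets $E'_i$, $E''_i\in\Sigma$ such that
$$E'_i\subseteq H\cap X_i\subseteq E''_i,\quad\mu(E''_i\setminus E'_i)=0.$$
Set $E'=\bigcup_{i\in I}E'_i$, $E''=\bigcup_{i\in I}(E''_i\cap X_i)$. Then $E'\cap X_i=E'_i$, $E''\cap X_i=E''_i\cap X_i$ for each $i$, so $E'$ and $E''$ belong to $\Sigma$. Also
$$\mu(E''\setminus E')=\sum_{i\in I}\mu(E''_i\cap X_i\setminus E'_i)=0.$$
As $E'\subseteq H\subseteq E''$, $H\in\hat\Sigma$ and
$$\hat\mu H=\mu E'=\sum_{i\in I}\mu E'_i=\sum_{i\in I}\hat\mu(H\cap X_i).$$
As $H$ is arbitrary, $\langle X_i\rangle_{i\in I}$ is a decomposition of $X$ for $\hat\mu$.
Accordingly, $(X,\hat\Sigma,\hat\mu)$ is strictly localizable if such a decomposition exists, which is so if $(X,\Sigma,\mu)$ is strictly localizable.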
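(c)-(d)(i) Suppose that $E\in\hat\Sigma$ is an atom for $\hat\mu$. Let $E'\in\Sigma$ be such that $E'\subseteq E$ and $\hat\mu(E\setminus E')=0$. Then $\mu E'=\hat\mu E>0$. If $F\in\Sigma$ and $F\subseteq E'$, then $F\subseteq E$, so either $\mu F=\hat\mu F=0$ or $\mu(E'\setminus F)=\hat\mu(E\setminus F)=0$. As $F$ is arbitrary, $E'$ is an atom for $\mu$, and $\hat\mu(E\triangle E')=\hat\mu(E\setminus E')=0$.
(ii) Suppose that $E\in\Sigma$ is an atom for $\mu$, and that $H\in\hat\Sigma$ is such that $\hat\mu(H\triangle E)=0$. Then $\hat\mu H=\mu E>0$. If $F\in\hat\Sigma$ and $F\subseteq H$, let $F'\subseteq F$ be such that $F'\in\Sigma$ and $\hat\mu(F\setminus F')=0$. Then $E\cap F'\subseteq E$ and $\hat\mu(F\triangle(E\cap F'))=0$, so either $\hat\mu F=\mu(E\cap F')=0$ or $\hat\mu(H\setminus F)=\mu(E\setminus F')=0$. As $F$ is arbitrary, $H$ is an atom for $\hat\mu$.
(iii) It follows at once that $(X,\hat\Sigma,\hat\mu)$ is atomless iff $(X,\Sigma,\mu)$ is.
(iv)($\alpha$) On the other hand, if $(X,\Sigma,\mu)$ is purely atomic and $\hat\mu H>0$, there is an $E\in\Sigma$ such that $E\subseteq H$ and $\mu E>0$, and an atom $F$ for $\mu$ such that $F\subseteq E$; but $F$ is also an atom for $\hat\mu$. As $H$ is arbitrary, $(X,\hat\Sigma,\hat\mu)$ is purely atomic.
($\beta$) And if $(X,\hat\Sigma,\hat\mu)$ is purely atomic and $\mu E>0$, then there is an $H\subseteq E$ which is an atom for $\hat\mu$; now let $F\in\Sigma$ be such that $F\subseteq H$ and $\hat\mu(H\setminus F)=0$, so that $F$ is an atom for $\mu$ and $F\subseteq E$. As $E$ is arbitrary, $(X,\Sigma,\mu)$ is purely atomic.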
212X Basic exercises >>>(a) Let $(X,\Sigma,\mu)$ be a complete measure space. Suppose that $A\subseteq E\in\Sigma$ and that $\mu^*A+\mu^*(E\setminus A)=\mu E<\infty$. Show that $A\in\Sigma$.


>>>(b) Let $\mu$ and $\nu$ be two measures on a set $X$, with completions $\hat\mu$ and $\hat\nu$. Show that the following are equiveridical: (i) the outer measures $\mu^*$, $\nu^*$ defined from $\mu$ and $\nu$ coincide; (ii) $\hat\mu E=\hat\nu E$ whenever either is defined and finite; (iii) $\int f\,d\mu=\int f\,d\nu$ whenever $f$ is a real-valued function such that either integral is defined and finite. (Hint: for (i)$\Rightarrow$(ii), if $\hat\mu E<\infty$, take a measurable envelope $F$ of $E$ for $\nu$ and calculate $\nu^*E+\nu^*(F\setminus E)$.)
(c) Let $\mu$ be the restriction of Lebesgue measure to the Borel $\sigma$-algebra of $\mathbb{R}$, as in 211P. Show that its completion is Lebesgue measure itself. (Hint: 134F.)
(d) Repeat 212Xc for (i) Lebesgue measure on $\mathbb{R}^r$ (ii) Lebesgue-Stieltjes measures on $\mathbb{R}$ (114Xa).
(e) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Let $\mathcal{I}$ be a $\sigma$-ideal of subsets of $X$ (112Db). (i) Show that $\Sigma_1=\{E\triangle A:E\in\Sigma,\ A\in\mathcal{I}\}$ is a $\sigma$-algebra of subsets of $X$. (ii) Let $\Sigma_2$ be the family of sets $E\subseteq X$ such that there are $E'$, $E''\in\Sigma$ with $E'\subseteq E\subseteq E''$ and $E''\setminus E'\in\mathcal{I}$. Show that $\Sigma_2$ is a $\sigma$-algebra of subsets of $X$ and that $\Sigma_2\subseteq\Sigma_1$. (iii) Show that $\Sigma_2=\Sigma_1$ iff every member of $\mathcal{I}$ is included in a member of $\Sigma\cap\mathcal{I}$.
(f) Let $(X,\Sigma,\mu)$ be a measure space, $Y$ any set and $\phi:X\to Y$ a function. Set $\theta B=\mu^*\phi^{-1}[B]$ for every $B\subseteq Y$. (i) Show that $\theta$ is an outer measure on $Y$. (ii) Let $\nu$ be the measure defined from $\theta$ by Carathéodory's method, and $\mathrm{T}$ its domain. Show that if $C\subseteq Y$ and $\phi^{-1}[C]\in\Sigma$ then $C\in\mathrm{T}$. (iii) Suppose that $(X,\Sigma,\mu)$ is complete and totally finite. Show that $\nu$ is the image measure $\mu\phi^{-1}$.
(g) Let $g$, $h$ be two non-decreasing functions from $\mathbb{R}$ to itself, and $\mu_g$, $\mu_h$ the associated Lebesgue-Stieltjes measures. Show that a real-valued function $f$ defined on a subset of $\mathbb{R}$ is $\mu_{g+h}$-integrable iff it is both $\mu_g$-integrable and $\mu_h$-integrable, and that then $\int f\,d\mu_{g+h}=\int f\,d\mu_g+\int f\,d\mu_h$. (Hint: 114Yb.)
(h) Let $(X,\Sigma,\mu)$ be a measure space, and $\mathcal{I}$ a $\sigma$-ideal of subsets of $X$; set $\Sigma_1=\{E\triangle A:E\in\Sigma,\ A\in\mathcal{I}\}$, as in 212Xe. Show that if every member of $\Sigma\cap\mathcal{I}$ is $\mu$-negligible, then there is a unique extension of $\mu$ to a measure $\mu_1$ with domain $\Sigma_1$ such that $\mu_1A=0$ for every $A\in\mathcal{I}$.
(i) Let $(X,\Sigma,\mu)$ be a complete measure space such that $\mu X>0$, $Y$ a set, $f:X\to Y$ a function and $\mu f^{-1}$ the image measure on $Y$. Show that if $\mathcal{F}$ is the filter of $\mu$-conegligible subsets of $X$, then the image filter $f[[\mathcal{F}]]$ (2A1Ib) is the filter of $\mu f^{-1}$-conegligible subsets of $Y$.
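212Y Further exercises (a) Let $X$ be a set and $\phi$ an inner measure on $X$, that is, a functional from $\mathcal{P}X$ to $[0,\infty]$ such that
$\phi\emptyset=0$,
$\phi(A\cup B)\ge\phi A+\phi B$ if $A\cap B=\emptyset$,
$\phi(\bigcap_{n\in\mathbb{N}}A_n)=\lim_{n\to\infty}\phi A_n$ whenever $\langle A_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence of subsets of $X$ and $\phi A_0<\infty$,
if $\phi A=\infty$, $a\in\mathbb{R}$ there is a $B\subseteq A$ such that $a\le\phi B<\infty$.
Let $\mu$ be the measure defined from $\phi$, that is, $\mu=\phi{\restriction}\Sigma$, where
$$\Sigma=\{E:\phi(A)=\phi(A\cap E)+\phi(A\setminus E)\ \forall\,A\subseteq X\}$$
(113Yg). Show that $\mu$ must be complete.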
212 Notes and comments The process of completion is so natural, and so universally applicable, and so convenient, that over large parts of measure theory it is reasonable to use only complete measure spaces. Indeed many authors so phrase their definitions that, explicitly or implicitly, only complete measure spaces are considered. In this treatise I avoid taking quite such a large step, even though it would simplify the statements of many of the theorems in this volume (for instance). I did take the trouble, in Volume 1, to give a definition of 'integrable function' which, in effect, looks at integrability with respect to the completion of a measure (212Fb). There are non-complete measure spaces which are worthy of study (for example, the restriction of Lebesgue measure to the Borel $\sigma$-algebra of $\mathbb{R}$; see 211P), and some interesting questions to be dealt with in Volumes 3 and 5 apply to them. At the cost of rather a lot of verbal manoeuvres, therefore, I prefer to write theorems out in a form in which they can be applied to arbitrary measure spaces, without assuming completeness. But it would be reasonable, and indeed would sharpen your technique, if you regularly sought the alternative formulations which become natural if you are interested only in complete spaces.
213 Semi-finite, locally determined and localizable spaces
In this section I collect a variety of useful facts concerning these types of measure space. I start with the characteristic properties of semi-finite spaces (213A-213B), and continue with the complete locally determined spaces (213C) and the concept of 'c.l.d. version' (213D-213H), the most powerful of the universally available methods of modifying a measure space into a better-behaved one. I briefly discuss 'locally determined negligible sets' (213I-213L), and 'measurable envelopes' (213L-213M), and end with results on localizable spaces (213N) and strictly localizable spaces (213O).
213A Lemma Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Then
$$\mu E=\sup\{\mu F:F\in\Sigma,\ F\subseteq E,\ \mu F<\infty\}$$
for every $E\in\Sigma$.
proof Set $c=\sup\{\mu F:F\in\Sigma,\ F\subseteq E,\ \mu F<\infty\}$. Then surely $c\le\mu E$, so if $c=\infty$ we can stop. If $c<\infty$, let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable subsets of $E$, of finite measure, such that $\lim_{n\to\infty}\mu F_n=c$; set $F=\bigcup_{n\in\mathbb{N}}F_n$. For each $n\in\mathbb{N}$, $\bigcup_{k\le n}F_k$ is a measurable set of finite measure included in $E$, so $\mu(\bigcup_{k\le n}F_k)\le c$, and
$$\mu F=\lim_{n\to\infty}\mu\Bigl(\bigcup_{k\le n}F_k\Bigr)\le c.$$
Also
$$\mu F\ge\sup_{n\in\mathbb{N}}\mu F_n\ge c,$$
so $\mu F=c$.
If $F'$ is a measurable subset of $E\setminus F$ and $\mu F'<\infty$, then $F\cup F'$ has finite measure and is included in $E$, so has measure at most $c=\mu F$; it follows that $\mu F'=0$. But this means that $\mu(E\setminus F)$ cannot be infinite, since then, because $(X,\Sigma,\mu)$ is semi-finite, it would have to include a measurable set of non-zero finite measure. So $E\setminus F$ has finite measure, and is therefore in fact negligible; and $\mu E=c$, as claimed.
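To see concretely what the lemma asserts, and what goes wrong without semi-finiteness, the following Python sketch checks the formula for point-supported measures on a small finite set in which every subset is measurable. The helper names (`powerset`, `measure`, `sup_finite_subsets`, `is_semi_finite`) and the sample weights are illustrative assumptions, not notation from the text; a point of weight `inf` stands in for a purely infinite atom.

```python
from itertools import combinations
from math import inf

def powerset(points):
    """All subsets of `points`, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def measure(weights, subset):
    """Point-supported measure: the sum of the weights of the points of `subset`."""
    return sum(weights[x] for x in subset)

def sup_finite_subsets(weights, subset):
    """sup{ mu(F) : F a subset of `subset`, mu(F) < infinity }."""
    return max(measure(weights, f) for f in powerset(subset)
               if measure(weights, f) < inf)

def is_semi_finite(weights):
    """Every set of infinite measure includes a set of non-zero finite measure."""
    for e in powerset(weights):
        if measure(weights, e) == inf:
            if not any(0 < measure(weights, f) < inf for f in powerset(e)):
                return False
    return True

# Hypothetical sample data: the second measure has a purely infinite point "b".
finite_weights = {"a": 1.0, "b": 2.5}
infinite_weights = {"a": 1.0, "b": inf}
```

With `infinite_weights` the measure is not semi-finite, and the set `{"a", "b"}` has measure `inf` while the supremum over its subsets of finite measure is only `1.0`, so the conclusion of the lemma fails there. On a finite set a semi-finite point-supported measure is automatically totally finite, so this sketch illustrates only the failure mode and not the countable-union argument of the proof.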
213B Proposition Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Let $f$ be a $\mu$-virtually measurable $[0,\infty]$-valued function defined almost everywhere in $X$. Then
$$\int f=\sup\Bigl\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\Bigr\}=\sup_{F\in\Sigma,\,\mu F<\infty}\int_F f$$
in $[0,\infty]$.
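proof (a) For any measure space $(X,\Sigma,\mu)$, a $[0,\infty]$-valued function defined on a subset of $X$ is integrable iff there is a conegligible set $E$ such that
($\alpha$) $E\subseteq\operatorname{dom}f$ and $f\restriction E$ is measurable,
($\beta$) $\sup\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\}$ is finite,
($\gamma$) for every $\epsilon>0$, $\{x:x\in E,\ f(x)\ge\epsilon\}$ has finite measure,
($\delta$) $f$ is finite almost everywhere
(see 122Ja, 133B). But if $\mu$ is semi-finite, ($\gamma$) and ($\delta$) are consequences of the rest. PPP Let $\epsilon>0$. Set
$$E_\epsilon=\{x:x\in E,\ f(x)\ge\epsilon\},$$
$$c=\sup\Bigl\{\int g:g\text{ is a simple function},\ g\le_{\text{a.e.}}f\Bigr\};$$
we are supposing that $c$ is finite. If $F\subseteq E_\epsilon$ is measurable and $\mu F<\infty$, then $\epsilon\chi F$ is a simple function and $\epsilon\chi F\le_{\text{a.e.}}f$, so
$$\epsilon\mu F=\int\epsilon\chi F\le c,\qquad\mu F\le c/\epsilon.$$
As $F$ is arbitrary, 213A tells us that $\mu E_\epsilon\le c/\epsilon$ is finite. As $\epsilon$ is arbitrary, ($\gamma$) is satisfied.
As for ($\delta$), if $F=\{x:x\in E,\ f(x)=\infty\}$ then $\mu F$ is finite (by ($\gamma$)) and $n\chi F\le_{\text{a.e.}}f$, so $n\mu F\le c$, for every $n\in\mathbb{N}$, so $\mu F=0$. QQQ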
(b) Now suppose that $f:D\to[0,\infty]$ is a $\mu$-virtually measurable function, where $D\subseteq X$ is conegligible, so that $\int f$ is defined in $[0,\infty]$ (135F). Then (a) tells us that
$$\int f=\sup_{g\text{ is simple},\,g\le f\text{ a.e.}}\int g$$
(if either is finite, and therefore also if either is infinite)
$$=\sup_{g\text{ is simple},\,g\le f\text{ a.e.},\,\mu F<\infty}\int_F g\le\sup_{\mu F<\infty}\int_F f\le\int f,$$
so we have the equalities we seek.
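*213C Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $\mu^*$ the outer measure derived from $\mu$ (132A-132B). Then the measure defined from $\mu^*$ by Carathéodory's method is $\mu$ itself.
proof Write $\check\mu$ for the measure defined by Carathéodory's method from $\mu^*$, and $\check\Sigma$ for its domain.
(a) If $E\in\Sigma$ and $A\subseteq X$ then $\mu^*(A\cap E)+\mu^*(A\setminus E)=\mu^*A$ (132Af), so $E\in\check\Sigma$. Now $\check\mu E=\mu^*E=\mu E$ (132Ac). Thus $\Sigma\subseteq\check\Sigma$ and $\mu=\check\mu\restriction\Sigma$.
(b) Now suppose that $H\in\check\Sigma$. Let $E\in\Sigma$ be such that $\mu E<\infty$. Then $H\cap E\in\Sigma$. PPP Let $E_1$, $E_2\in\Sigma$ be measurable envelopes of $E\cap H$, $E\setminus H$ respectively, both included in $E$ (132Ee). Because $H\in\check\Sigma$,
$$\mu E_1+\mu E_2=\mu^*(E\cap H)+\mu^*(E\setminus H)=\mu^*E=\mu E.$$
As $E_1\cup E_2=E$,
$$\mu(E_1\cap E_2)=\mu E_1+\mu E_2-\mu E=0.$$
Now $E_1\setminus(E\cap H)\subseteq E_1\cap E_2$; because $\mu$ is complete, $E_1\setminus(E\cap H)$ and $E\cap H$ belong to $\Sigma$. QQQ
As $E$ is arbitrary, and $\mu$ is locally determined, $H\in\Sigma$. As $H$ is arbitrary, $\check\Sigma=\Sigma$ and $\check\mu=\mu$.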
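213D C.l.d. versions: Proposition Let $(X,\Sigma,\mu)$ be a measure space. Write $(X,\hat\Sigma,\hat\mu)$ for its completion (212C) and $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$. Set
$$\tilde\Sigma=\{H:H\subseteq X,\ H\cap E\in\hat\Sigma\text{ for every }E\in\Sigma^f\},$$
and for $H\in\tilde\Sigma$ set
$$\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}.$$
Then $(X,\tilde\Sigma,\tilde\mu)$ is a complete locally determined measure space.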
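proof (a) I check first that $\tilde\Sigma$ is a $\sigma$-algebra. PPP (i) $\emptyset\cap E=\emptyset\in\hat\Sigma$ for every $E\in\Sigma^f$, so $\emptyset\in\tilde\Sigma$. (ii) If $H\in\tilde\Sigma$ then
$$(X\setminus H)\cap E=E\setminus(E\cap H)\in\hat\Sigma$$
for every $E\in\Sigma^f$, so $X\setminus H\in\tilde\Sigma$. (iii) If $\langle H_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\tilde\Sigma$ with union $H$, then
$$H\cap E=\bigcup_{n\in\mathbb{N}}(H_n\cap E)\in\hat\Sigma$$
for every $E\in\Sigma^f$, so $H\in\tilde\Sigma$. QQQ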
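(b) It is obvious that $\tilde\mu\emptyset=0$. If $\langle H_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\tilde\Sigma$ with union $H$, then
$$\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}=\sup\Bigl\{\sum_{n=0}^\infty\hat\mu(H_n\cap E):E\in\Sigma^f\Bigr\}\le\sum_{n=0}^\infty\tilde\mu H_n.$$
On the other hand, given $a<\sum_{n=0}^\infty\tilde\mu H_n$, there is an $m\in\mathbb{N}$ such that $a<\sum_{n=0}^m\tilde\mu H_n$; now we can find $E_0,\ldots,E_m\in\Sigma^f$ such that $a\le\sum_{n=0}^m\hat\mu(H_n\cap E_n)$. Set $E=\bigcup_{n\le m}E_n\in\Sigma^f$; then
$$\tilde\mu H\ge\hat\mu(H\cap E)=\sum_{n=0}^\infty\hat\mu(H_n\cap E)\ge\sum_{n=0}^m\hat\mu(H_n\cap E_n)\ge a.$$
As $a$ is arbitrary, $\tilde\mu H\ge\sum_{n=0}^\infty\tilde\mu H_n$ and $\tilde\mu H=\sum_{n=0}^\infty\tilde\mu H_n$.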
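(c) Thus $(X,\tilde\Sigma,\tilde\mu)$ is a measure space. To see that it is semi-finite, note first that $\hat\Sigma\subseteq\tilde\Sigma$ (because if $H\in\hat\Sigma$ then surely $H\cap E\in\hat\Sigma$ for every $E\in\Sigma^f$), and that $\tilde\mu H=\hat\mu H$ whenever $\hat\mu H<\infty$ (because then, by the definition in 212Ca, there is an $E\in\Sigma^f$ such that $H\subseteq E$, so that $\tilde\mu H=\hat\mu(H\cap E)=\hat\mu H$). Now suppose that $H\in\tilde\Sigma$ and that $\tilde\mu H=\infty$. There is surely an $E\in\Sigma^f$ such that $\hat\mu(H\cap E)>0$; but now $0<\hat\mu(H\cap E)<\infty$, so $0<\tilde\mu(H\cap E)<\infty$.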
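(d) Thus $(X,\tilde\Sigma,\tilde\mu)$ is a semi-finite measure space. To see that it is locally determined, let $H\subseteq X$ be such that $H\cap G\in\tilde\Sigma$ whenever $G\in\tilde\Sigma$ and $\tilde\mu G<\infty$. Then, in particular, we must have $H\cap E\in\tilde\Sigma$ for every $E\in\Sigma^f$. But this means in fact that $H\cap E\in\hat\Sigma$ for every $E\in\Sigma^f$, so that $H\in\tilde\Sigma$. As $H$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is locally determined.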
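(e) To see that $(X,\tilde\Sigma,\tilde\mu)$ is complete, suppose that $A\subseteq H\in\tilde\Sigma$ and that $\tilde\mu H=0$. Then for every $E\in\Sigma^f$ we must have $\hat\mu(H\cap E)=0$. Because $(X,\hat\Sigma,\hat\mu)$ is complete, and $A\cap E\subseteq H\cap E$, $A\cap E\in\hat\Sigma$. As $E$ is arbitrary, $A\in\tilde\Sigma$.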
213E Definition For any measure space $(X,\Sigma,\mu)$, I will call $(X,\tilde\Sigma,\tilde\mu)$, as constructed in 213D, the c.l.d. version ('complete locally determined version') of $(X,\Sigma,\mu)$; and $\tilde\mu$ will be the c.l.d. version of $\mu$.
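As a small illustration of the construction in 213D, here is a Python sketch that computes the c.l.d. version of a point-supported measure on a finite set with every subset measurable. Under that assumption the completion coincides with the original measure and every subset already belongs to the domain of the c.l.d. version, so only the values of the measure change. The names `powerset`, `mu`, `cld_measure` and the sample weights are hypothetical, chosen for this example only; a weight of `inf` models a purely infinite point.

```python
from itertools import combinations
from math import inf

def powerset(points):
    """All subsets of `points`, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def mu(weights, subset):
    """Point-supported measure: the sum of the weights of the points of `subset`."""
    return sum(weights[x] for x in subset)

def cld_measure(weights, subset):
    """Value of the c.l.d. version: sup{ mu(H & E) : E measurable, mu(E) < infinity }."""
    finite_sets = [e for e in powerset(weights) if mu(weights, e) < inf]
    return max(mu(weights, subset & e) for e in finite_sets)

# Hypothetical sample data: "b" is a purely infinite point.
weights = {"a": 1.0, "b": inf, "c": 0.5}

for h in powerset(weights):
    # Each row shows a set, its original measure and its c.l.d. measure.
    print(sorted(h), mu(weights, h), cld_measure(weights, h))
```

In this example the singleton `{"b"}` has original measure `inf` but c.l.d. measure `0`, while every set of finite measure keeps its measure; this is the behaviour described in 213Fa and in the notes at the end of the section.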
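213F Following the same pattern as in 212E-212G, I start with some elementary remarks to facilitate manipulation of this construction.
Proposition Let $(X,\Sigma,\mu)$ be any measure space and $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) $\Sigma\subseteq\tilde\Sigma$ and $\tilde\mu E=\mu E$ whenever $E\in\Sigma$ and $\mu E<\infty$; in fact, if $(X,\hat\Sigma,\hat\mu)$ is the completion of $(X,\Sigma,\mu)$, $\hat\Sigma\subseteq\tilde\Sigma$ and $\tilde\mu E=\hat\mu E$ whenever $\hat\mu E<\infty$.
(b) Writing $\tilde\mu^*$ and $\mu^*$ for the outer measures defined from $\tilde\mu$ and $\mu$ respectively, $\tilde\mu^*A\le\mu^*A$ for every $A\subseteq X$, with equality if $\mu^*A$ is finite. In particular, $\mu$-negligible sets are $\tilde\mu$-negligible; consequently, $\mu$-conegligible sets are $\tilde\mu$-conegligible.
(c) For every $H\in\tilde\Sigma$ there is an $E\in\Sigma$ such that $E\subseteq H$ and $\mu E=\tilde\mu H$; if $\tilde\mu H<\infty$ then $\tilde\mu(H\setminus E)=0$.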
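proof (a) This is already covered by remarks in the proof of 213D.
(b) If $\mu^*A=\infty$ then surely $\tilde\mu^*A\le\mu^*A$. If $\mu^*A<\infty$, take $E\in\Sigma$ such that $A\subseteq E$ and $\mu E=\mu^*A$ (132Aa). Then
$$\tilde\mu^*A\le\tilde\mu E=\mu E=\mu^*A.$$
On the other hand, if $A\subseteq H\in\tilde\Sigma$, then
$$\tilde\mu H\ge\hat\mu(H\cap E)\ge\hat\mu^*A=\mu^*A,$$
using 212Ea. So $\tilde\mu^*A\ge\mu^*A$ and $\tilde\mu^*A=\mu^*A$.
(c) Write $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$; then, by the definition in 213D, $\tilde\mu H=\sup\{\hat\mu(H\cap E):E\in\Sigma^f\}$. Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\Sigma^f$ such that $\tilde\mu H=\sup_{n\in\mathbb{N}}\hat\mu(H\cap E_n)$. For each $n\in\mathbb{N}$ there is an $F_n\in\Sigma$ such that $F_n\subseteq H\cap E_n$ and $\mu F_n=\hat\mu(H\cap E_n)$ (212C). Set $E=\bigcup_{n\in\mathbb{N}}F_n$. Then $E\in\Sigma$, $E\subseteq H$ and
$$\tilde\mu H=\sup_{n\in\mathbb{N}}\mu F_n\le\lim_{n\to\infty}\mu\Bigl(\bigcup_{i\le n}F_i\Bigr)=\mu E=\lim_{n\to\infty}\tilde\mu\Bigl(\bigcup_{i\le n}F_i\Bigr)\le\tilde\mu H,$$
so $\mu E=\tilde\mu H$, and if $\tilde\mu H<\infty$ then $\tilde\mu(H\setminus E)=0$.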
213G The next step is to look at functions which are measurable or integrable with respect to $\tilde\mu$.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) If a real-valued function $f$ defined on a subset of $X$ is $\mu$-virtually measurable, it is $\tilde\Sigma$-measurable.
(b) If a real-valued function is $\mu$-integrable, it is $\tilde\mu$-integrable with the same integral.
(c) If $f$ is a $\tilde\mu$-integrable real-valued function, there is a $\mu$-integrable real-valued function which is equal to $f$ $\tilde\mu$-almost everywhere.
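proof Write $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$. By 213Fa, $\mu$ and $\tilde\mu$ agree on $\Sigma^f$.
(a) By 212Fa, $f$ is $\hat\Sigma$-measurable, where $\hat\Sigma$ is the domain of the completion of $\mu$; but since $\hat\Sigma\subseteq\tilde\Sigma$, $f$ is $\tilde\Sigma$-measurable.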
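(b)(i) If $f$ is a $\mu$-simple function it is $\tilde\mu$-simple, and $\int f\,d\mu=\int f\,d\tilde\mu$, because $\tilde\mu E=\mu E$ for every $E\in\Sigma^f$.
(ii) If $f$ is a non-negative $\mu$-integrable function, there is a non-decreasing sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of $\mu$-simple functions converging to $f$ $\mu$-almost everywhere; now (by 213Fb) $\mu$-negligible sets are $\tilde\mu$-negligible, so $\langle f_n\rangle_{n\in\mathbb{N}}$ converges to $f$ $\tilde\mu$-a.e. and (by B.Levi's theorem, 123A) $f$ is $\tilde\mu$-integrable, with
$$\int f\,d\tilde\mu=\lim_{n\to\infty}\int f_n\,d\tilde\mu=\lim_{n\to\infty}\int f_n\,d\mu=\int f\,d\mu.$$
(iii) In general, if $\int f\,d\mu$ is defined in $\mathbb{R}$, we have
$$\int f\,d\tilde\mu=\int f^+d\tilde\mu-\int f^-d\tilde\mu=\int f^+d\mu-\int f^-d\mu=\int f\,d\mu,$$
writing $f^+=f\vee 0$, $f^-=(-f)\vee 0$.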
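(c)(i) Let $f$ be a $\tilde\mu$-simple function. Express it as $\sum_{i=0}^n a_i\chi H_i$ where $\tilde\mu H_i<\infty$ for each $i$. Choose $E_0,\ldots,E_n\in\Sigma$ such that $E_i\subseteq H_i$ and $\tilde\mu(H_i\setminus E_i)=0$ for each $i$ (using 213Fc above). Then $g=\sum_{i=0}^n a_i\chi E_i$ is $\mu$-simple, $g=f$ $\tilde\mu$-a.e., and $\int g\,d\mu=\int f\,d\tilde\mu$.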
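(ii) Let $f$ be a non-negative $\tilde\mu$-integrable function. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\tilde\mu$-simple functions converging $\tilde\mu$-almost everywhere to $f$. For each $n$, choose a $\mu$-simple function $g_n$ equal $\tilde\mu$-almost everywhere to $f_n$. Then $\{x:g_{n+1}(x)<g_n(x)\}$ belongs to $\Sigma^f$ and is $\tilde\mu$-negligible, therefore $\mu$-negligible. So $\langle g_n\rangle_{n\in\mathbb{N}}$ is non-decreasing $\mu$-almost everywhere. Because
$$\lim_{n\to\infty}\int g_n\,d\mu=\lim_{n\to\infty}\int f_n\,d\tilde\mu=\int f\,d\tilde\mu,$$
B.Levi's theorem tells us that $\langle g_n\rangle_{n\in\mathbb{N}}$ converges $\mu$-almost everywhere to a $\mu$-integrable function $g$; because $\mu$-negligible sets are $\tilde\mu$-negligible,
$$(X\setminus\operatorname{dom}f)\cup(X\setminus\operatorname{dom}g)\cup\bigcup_{n\in\mathbb{N}}\{x:f_n(x)\ne g_n(x)\}$$
$$\cup\ \{x:x\in\operatorname{dom}f,\ f(x)\ne\sup_{n\in\mathbb{N}}f_n(x)\}\cup\{x:x\in\operatorname{dom}g,\ g(x)\ne\sup_{n\in\mathbb{N}}g_n(x)\}$$
is $\tilde\mu$-negligible, and $f=g$ $\tilde\mu$-a.e.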
(iii) If $f$ is $\tilde\mu$-integrable, express it as $f_1-f_2$ where $f_1$ and $f_2$ are $\tilde\mu$-integrable and non-negative; then there are $\mu$-integrable functions $g_1$, $g_2$ such that $f_1=g_1$, $f_2=g_2$ $\tilde\mu$-a.e., so that $g=g_1-g_2$ is $\mu$-integrable and equal to $f$ $\tilde\mu$-a.e.
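213H Thirdly, I turn to the effect of the construction here on the other properties being considered in this chapter.
Proposition Let $(X,\Sigma,\mu)$ be a measure space and $(X,\hat\Sigma,\hat\mu)$ its completion, $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version.
(a) If $(X,\Sigma,\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, and in all these cases $\tilde\mu=\hat\mu$;
(b) if $(X,\Sigma,\mu)$ is localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, and for every $H\in\tilde\Sigma$ there is an $E\in\Sigma$ such that $\tilde\mu(E\triangle H)=0$;
(c) $(X,\Sigma,\mu)$ is semi-finite iff $\tilde\mu F=\mu F$ for every $F\in\Sigma$, and in this case $\int f\,d\tilde\mu=\int f\,d\mu$ whenever the latter is defined in $[-\infty,\infty]$;
(d) a set $H\in\tilde\Sigma$ is an atom for $\tilde\mu$ iff there is an atom $E$ for $\mu$ such that $\mu E<\infty$ and $\tilde\mu(H\triangle E)=0$;
(e) if $(X,\Sigma,\mu)$ is atomless or purely atomic, so is $(X,\tilde\Sigma,\tilde\mu)$;
(f) $(X,\Sigma,\mu)$ is complete and locally determined iff $\tilde\mu=\mu$.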
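proof (a)(i) I start by showing that if $(X,\Sigma,\mu)$ is strictly localizable, then $\tilde\mu=\hat\mu$. PPP Let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$ for $\mu$; then it is also a decomposition for $\hat\mu$ (212Gb). If $H\in\tilde\Sigma$, we shall have $H\cap X_i\in\hat\Sigma$ for every $i$, and therefore $H\in\hat\Sigma$; moreover,
$$\hat\mu H=\sum_{i\in I}\hat\mu(H\cap X_i)=\sup\Bigl\{\sum_{i\in J}\hat\mu(H\cap X_i):J\subseteq I\text{ is finite}\Bigr\}$$
$$\le\sup\{\hat\mu(H\cap E):E\in\Sigma,\ \mu E<\infty\}=\tilde\mu H\le\hat\mu H.$$
So $\tilde\mu H=\hat\mu H$ for every $H\in\tilde\Sigma$ and $\tilde\mu=\hat\mu$. QQQ
(ii) Consequently, if $(X,\Sigma,\mu)$ is a probability space, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(X,\tilde\Sigma,\tilde\mu)$, using 212Ga-212Gb to see that $(X,\hat\Sigma,\hat\mu)$ has the property involved.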
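(b) If $(X,\Sigma,\mu)$ is localizable, let $\mathcal{H}$ be any subset of $\tilde\Sigma$. Set
$$\mathcal{E}=\{E:E\in\Sigma,\ \exists\,H\in\mathcal{H},\ E\subseteq H\}.$$
Working in $(X,\Sigma,\mu)$, let $F$ be an essential supremum for $\mathcal{E}$.
(i) ??? Suppose, if possible, that there is an $H\in\mathcal{H}$ such that $\tilde\mu(H\setminus F)>0$. Then there is an $E\in\Sigma$ such that $E\subseteq H\setminus F$ and $\mu E=\tilde\mu(H\setminus F)>0$ (213Fc). This $E$ belongs to $\mathcal{E}$ and $\mu(E\setminus F)=\mu E>0$; which is impossible if $F$ is an essential supremum of $\mathcal{E}$. XXX
(ii) Thus $\tilde\mu(H\setminus F)=0$ for every $H\in\mathcal{H}$. Now take any $G\in\tilde\Sigma$ such that $\tilde\mu(H\setminus G)=0$ for every $H\in\mathcal{H}$. Let $E_0\in\Sigma$ be such that $E_0\subseteq F\setminus G$ and $\mu E_0=\tilde\mu(F\setminus G)$; note that $F\setminus E_0\supseteq F\cap G$. If $E\in\mathcal{E}$, there is an $H\in\mathcal{H}$ such that $E\subseteq H$, so that
$$\mu(E\setminus(F\setminus E_0))\le\tilde\mu(H\setminus(F\cap G))\le\tilde\mu(H\setminus F)+\tilde\mu(H\setminus G)=0.$$
Because $F$ is an essential supremum for $\mathcal{E}$ in $\Sigma$,
$$0=\mu(F\setminus(F\setminus E_0))=\mu E_0=\tilde\mu(F\setminus G).$$
This shows that $F$ is an essential supremum for $\mathcal{H}$ in $\tilde\Sigma$. As $\mathcal{H}$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is localizable.
(iii) The argument of (i)-(ii) shows in fact that if $\mathcal{H}\subseteq\tilde\Sigma$ then $\mathcal{H}$ has an essential supremum $F$ in $\tilde\Sigma$ such that $F$ actually belongs to $\Sigma$. Taking $\mathcal{H}=\{H\}$, we see that if $H\in\tilde\Sigma$ there is an $F\in\Sigma$ such that $\tilde\mu(H\triangle F)=0$.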
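(c) We already know that $\tilde\mu E\le\mu E$ for every $E\in\Sigma$, with equality if $\mu E<\infty$, by 213Fa.
(i) If $(X,\Sigma,\mu)$ is semi-finite, then for any $F\in\Sigma$ we have
$$\mu F=\sup\{\mu E:E\in\Sigma,\ E\subseteq F,\ \mu E<\infty\}=\sup\{\tilde\mu E:E\in\Sigma,\ E\subseteq F,\ \mu E<\infty\}\le\tilde\mu F\le\mu F,$$
so that $\tilde\mu F=\mu F$.
(ii) Suppose that $\tilde\mu F=\mu F$ for every $F\in\Sigma$. If $\mu F=\infty$, then there must be an $E\in\Sigma$ such that $\mu E<\infty$, $\hat\mu(F\cap E)>0$; in which case $F\cap E\in\Sigma$ and $0<\mu(F\cap E)<\infty$. As $F$ is arbitrary, $(X,\Sigma,\mu)$ is semi-finite.
(iii) If $f$ is non-negative and $\int f\,d\mu=\infty$, then $f$ is $\mu$-virtually measurable, therefore $\tilde\Sigma$-measurable (213Ga), and defined $\mu$-almost everywhere, therefore $\tilde\mu$-almost everywhere. Now
$$\int f\,d\tilde\mu=\sup\Bigl\{\int g\,d\tilde\mu:g\text{ is }\tilde\mu\text{-simple},\ 0\le g\le f\ \tilde\mu\text{-a.e.}\Bigr\}$$
$$\ge\sup\Bigl\{\int g\,d\mu:g\text{ is }\mu\text{-simple},\ 0\le g\le f\ \mu\text{-a.e.}\Bigr\}=\infty$$
by 213B. With 213Gb, this shows that $\int f\,d\tilde\mu=\int f\,d\mu$ whenever $f$ is non-negative and $\int f\,d\mu$ is defined in $[0,\infty]$. Applying this to the positive and negative parts of $f$, we see that $\int f\,d\tilde\mu=\int f\,d\mu$ whenever the latter is defined in $[-\infty,\infty]$.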
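(d)(i) If $H\in\tilde\Sigma$ is an atom for $\tilde\mu$, then (because $\tilde\mu$ is semi-finite) there is surely an $H'\in\tilde\Sigma$ such that $H'\subseteq H$ and $0<\tilde\mu H'<\infty$, and we must have $\tilde\mu(H\setminus H')=0$, so that $\tilde\mu H<\infty$. Accordingly there is an $E\in\Sigma$ such that $E\subseteq H$ and $\tilde\mu(H\setminus E)=0$ (213Fc above). We have $\mu E=\tilde\mu H>0$. If $F\in\Sigma$ and $F\subseteq E$, then either $\mu F=\tilde\mu F=0$ or $\mu(E\setminus F)=\tilde\mu(H\setminus F)=0$. Thus $E\in\Sigma$ is an atom for $\mu$ with $\tilde\mu(H\triangle E)=0$ and $\mu E=\tilde\mu H<\infty$.
(ii) If $H\in\tilde\Sigma$ and there is an atom $E$ for $\mu$ such that $\mu E<\infty$ and $\tilde\mu(H\triangle E)=0$, let $G\in\tilde\Sigma$ be a subset of $H$. We have
$$\tilde\mu G\le\tilde\mu H=\mu E<\infty,$$
so there is an $F\in\Sigma$ such that $F\subseteq G$ and $\tilde\mu(G\setminus F)=0$. Now either $\tilde\mu G=\mu(E\cap F)=0$ or $\tilde\mu(H\setminus G)=\mu(E\setminus F)=0$. This is true whenever $G\in\tilde\Sigma$ and $G\subseteq H$; also $\tilde\mu H=\mu E>0$. So $H$ is an atom for $\tilde\mu$.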
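(e) If $(X,\Sigma,\mu)$ is atomless, then $(X,\tilde\Sigma,\tilde\mu)$ must be atomless, by (d).
If $(X,\Sigma,\mu)$ is purely atomic and $H\in\tilde\Sigma$, $\tilde\mu H>0$, then there is an $E\in\Sigma$ such that $0<\hat\mu(H\cap E)<\infty$. Let $E_1\in\Sigma$ be such that $E_1\subseteq H\cap E$ and $\mu E_1>0$. There is an atom $F$ for $\mu$ such that $F\subseteq E_1$; now $\mu F<\infty$ so $F$ is an atom for $\tilde\mu$, by (d). Also $F\subseteq H$. As $H$ is arbitrary, $(X,\tilde\Sigma,\tilde\mu)$ is purely atomic.
(f) If $\tilde\mu=\mu$, then of course $(X,\Sigma,\mu)$ must be complete and locally determined, because $(X,\tilde\Sigma,\tilde\mu)$ is. If $(X,\Sigma,\mu)$ is complete and locally determined, then $\hat\mu=\mu$ so (using the definition in 213D) $\tilde\Sigma\subseteq\Sigma$ and $\tilde\mu=\mu$, by (c) above.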
213I Locally determined negligible sets The following simple idea is occasionally useful.
Definition A measure space $(X,\Sigma,\mu)$ has locally determined negligible sets if for every non-negligible $A\subseteq X$ there is an $E\in\Sigma$ such that $\mu E<\infty$ and $A\cap E$ is not negligible.
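The definition can be tested mechanically on a finite example. The sketch below is in Python and assumes a point-supported measure on a finite set with every subset measurable, so that 'negligible' simply means 'of measure zero'. The function names and weights are illustrative assumptions, not part of the text.

```python
from itertools import combinations
from math import inf

def powerset(points):
    """All subsets of `points`, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def mu(weights, subset):
    """Point-supported measure: the sum of the weights of the points of `subset`."""
    return sum(weights[x] for x in subset)

def has_locally_determined_negligible_sets(weights):
    """True iff every non-negligible A meets some set E of finite measure
    in a non-negligible set (the definition above, with all sets measurable)."""
    sets = powerset(weights)
    finite_sets = [e for e in sets if mu(weights, e) < inf]
    for a in sets:
        if mu(weights, a) > 0 and all(mu(weights, a & e) == 0 for e in finite_sets):
            return False  # `a` is non-negligible but invisible to finite-measure sets
    return True
```

For the weights `{"a": 1.0, "b": inf}` the function returns `False`: the singleton `{"b"}` is not negligible, yet it meets every set of finite measure in the empty set. For the totally finite weights `{"a": 1.0, "b": 2.0}` it returns `True`.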
213J Proposition If a measure space $(X,\Sigma,\mu)$ is either strictly localizable or complete and locally determined, it has locally determined negligible sets.
proof Let $A\subseteq X$ be a set such that $A\cap E$ is negligible whenever $\mu E<\infty$; I need to show that $A$ is negligible.
(i) If $\mu$ is strictly localizable, let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$. For each $i\in I$, $A\cap X_i$ is negligible, so we can choose a negligible $E_i\in\Sigma$ such that $A\cap X_i\subseteq E_i$. Set $E=\bigcup_{i\in I}E_i\cap X_i$. Then $\mu E=\sum_{i\in I}\mu(E_i\cap X_i)=0$ and $A\subseteq E$, so $A$ is negligible.
(ii) If $\mu$ is complete and locally determined, take any measurable set $E$ of finite measure. Then $A\cap E$ is negligible, therefore measurable; as $E$ is arbitrary, $A$ is measurable; as $\mu$ is semi-finite, $A$ is negligible.
213K Lemma If a measure space $(X,\Sigma,\mu)$ has locally determined negligible sets, and $\mathcal{E}\subseteq\Sigma$ has an essential supremum $H\in\Sigma$ in the sense of 211G, then $H\setminus\bigcup\mathcal{E}$ is negligible.
proof Set $A=H\setminus\bigcup\mathcal{E}$. Take any $F\in\Sigma$ such that $\mu F<\infty$. Then $F\cap A$ has a measurable envelope $V$ say (132Ee again). If $E\in\mathcal{E}$, then
$$\mu(E\setminus(X\setminus V))=\mu(E\cap V)=\mu^*(E\cap F\cap A)=0,$$
so $H\cap V=H\setminus(X\setminus V)$ is negligible and $F\cap A$ is negligible. As $F$ is arbitrary and $\mu$ has locally determined negligible sets, $A$ is negligible, as claimed.
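213L Proposition Let $(X,\Sigma,\mu)$ be a localizable measure space with locally determined negligible sets. Then every subset $A$ of $X$ has a measurable envelope.
proof Set
$$\mathcal{E}=\{E:E\in\Sigma,\ \mu^*(A\cap E)=\mu E<\infty\}.$$
Let $G$ be an essential supremum for $\mathcal{E}$ in $\Sigma$.
(i) $A\setminus G$ is negligible. PPP Let $F$ be any set of finite measure for $\mu$. Let $E$ be a measurable envelope of $A\cap F$. Then $E\in\mathcal{E}$ so $E\setminus G$ is negligible. But $F\cap A\setminus G\subseteq E\setminus G$, so $F\cap A\setminus G$ is negligible. Because $\mu$ has locally determined negligible sets, this is enough to show that $A\setminus G$ is negligible. QQQ
(ii) Let $E_0$ be a negligible measurable set including $A\setminus G$, and set $\tilde G=E_0\cup G$, so that $\tilde G\in\Sigma$, $A\subseteq\tilde G$ and $\mu(\tilde G\setminus G)=0$. ??? Suppose, if possible, that there is an $F\in\Sigma$ such that $\mu^*(A\cap F)<\mu(\tilde G\cap F)$. Let $F_1\subseteq F$ be a measurable envelope of $A\cap F$. Set $H=X\setminus(F\setminus F_1)$; then $A\subseteq H$. If $E\in\mathcal{E}$ then
$$\mu E=\mu^*(A\cap E)\le\mu(H\cap E),$$
so $E\setminus H$ is negligible; as $E$ is arbitrary, $G\setminus H$ is negligible and $\tilde G\setminus H$ is negligible. But $\tilde G\cap F\setminus F_1\subseteq\tilde G\setminus H$ and
$$\mu(\tilde G\cap F\setminus F_1)=\mu(\tilde G\cap F)-\mu^*(A\cap F)>0.\ \text{XXX}$$
This shows that $\tilde G$ is a measurable envelope of $A$, as required.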
213M Corollary (a) If $(X,\Sigma,\mu)$ is $\sigma$-finite, then every subset of $X$ has a measurable envelope for $\mu$.
(b) If $(X,\Sigma,\mu)$ is localizable, then every subset of $X$ has a measurable envelope for the c.l.d. version of $\mu$.
proof (a) Use 132Ee, or 213L, 211Lc and 213J.
(b) Use 213L and the fact that the c.l.d. version of $\mu$ is localizable as well as being complete and locally determined (213Hb).
213N When we come to use the concept of 'localizability', it will frequently be through the following characterization.
Theorem Let $(X,\Sigma,\mu)$ be a localizable measure space. Suppose that $\Phi$ is a family of measurable real-valued functions, all defined on measurable subsets of $X$, such that whenever $f$, $g\in\Phi$ then $f=g$ almost everywhere in $\operatorname{dom}f\cap\operatorname{dom}g$. Then there is a measurable function $h:X\to\mathbb{R}$ such that every $f\in\Phi$ agrees with $h$ almost everywhere in $\operatorname{dom}f$.
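proof For $q\in\mathbb{Q}$, $f\in\Phi$ set
$$E_{fq}=\{x:x\in\operatorname{dom}f,\ f(x)\ge q\}\in\Sigma.$$
For each $q\in\mathbb{Q}$, let $E_q$ be an essential supremum of $\{E_{fq}:f\in\Phi\}$ in $\Sigma$. Set
$$h^*(x)=\sup\{q:q\in\mathbb{Q},\ x\in E_q\}\in[-\infty,\infty]$$
for $x\in X$, taking $\sup\emptyset=-\infty$ if necessary.
If $f$, $g\in\Phi$ and $q\in\mathbb{Q}$, then
$$E_{fq}\setminus(X\setminus(\operatorname{dom}g\setminus E_{gq}))=E_{fq}\cap\operatorname{dom}g\setminus E_{gq}\subseteq\{x:x\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x)\ne g(x)\}$$
is negligible; as $f$ is arbitrary,
$$E_q\cap\operatorname{dom}g\setminus E_{gq}=E_q\setminus(X\setminus(\operatorname{dom}g\setminus E_{gq}))$$
is negligible. Also $E_{gq}\setminus E_q$ is negligible, so $E_{gq}\triangle(E_q\cap\operatorname{dom}g)$ is negligible. Set $H_g=\bigcup_{q\in\mathbb{Q}}E_{gq}\triangle(E_q\cap\operatorname{dom}g)$; then $H_g$ is negligible. But if $x\in\operatorname{dom}g\setminus H_g$, then, for every $q\in\mathbb{Q}$, $x\in E_q\iff x\in E_{gq}$; it follows that for such $x$, $h^*(x)=g(x)$. Thus $h^*=g$ almost everywhere in $\operatorname{dom}g$; and this is true for every $g\in\Phi$.
The function $h^*$ is not necessarily real-valued. But it is measurable, because
$$\{x:h^*(x)>a\}=\bigcup\{E_q:q\in\mathbb{Q},\ q>a\}\in\Sigma$$
for every real $a$. So if we modify it by setting
$$h(x)=h^*(x)\text{ if }h^*(x)\in\mathbb{R},\qquad h(x)=0\text{ if }h^*(x)\in\{-\infty,\infty\},$$
we shall get a measurable real-valued function $h:X\to\mathbb{R}$; and for any $g\in\Phi$, $h(x)$ will be equal to $g(x)$ at least whenever $h^*(x)=g(x)$, which is true for almost every $x\in\operatorname{dom}g$. Thus $h$ is a suitable function.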
213O There is an interesting and useful criterion for a space to be strictly localizable which I introduce at this point, though it will be used rarely in this volume.
Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined space.
(a) Suppose that there is a disjoint family $\mathcal{E}\subseteq\Sigma$ such that ($\alpha$) $\mu E<\infty$ for every $E\in\mathcal{E}$ ($\beta$) whenever $F\in\Sigma$ and $\mu F>0$ then there is an $E\in\mathcal{E}$ such that $\mu(E\cap F)>0$. Then $(X,\Sigma,\mu)$ is strictly localizable, $\bigcup\mathcal{E}$ is conegligible, and $\mathcal{E}\cup\{X\setminus\bigcup\mathcal{E}\}$ is a decomposition of $X$.
(b) Suppose that $\langle X_i\rangle_{i\in I}$ is a partition of $X$ into measurable sets of finite measure such that whenever $E\in\Sigma$ and $\mu E>0$ there is an $i\in I$ such that $\mu(E\cap X_i)>0$. Then $(X,\Sigma,\mu)$ is strictly localizable, and $\langle X_i\rangle_{i\in I}$ is a decomposition of $X$.
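proof (a)(i) The first thing to note is that if $F\in\Sigma$ and $\mu F<\infty$, there is a countable $\mathcal{E}'\subseteq\mathcal{E}$ such that $\mu(F\setminus\bigcup\mathcal{E}')=0$. PPP Set
$$\mathcal{E}'_n=\{E:E\in\mathcal{E},\ \mu(F\cap E)\ge 2^{-n}\}\text{ for each }n\in\mathbb{N},$$
$$\mathcal{E}'=\bigcup_{n\in\mathbb{N}}\mathcal{E}'_n=\{E:E\in\mathcal{E},\ \mu(F\cap E)>0\}.$$
Because $\mathcal{E}$ is disjoint, we must have
$$\#(\mathcal{E}'_n)\le 2^n\mu F$$
for every $n\in\mathbb{N}$, so that every $\mathcal{E}'_n$ is finite and $\mathcal{E}'$, being the union of a sequence of countable sets, is countable. Set $E'=\bigcup\mathcal{E}'$ and $F'=F\setminus E'$, so that both $E'$ and $F'$ belong to $\Sigma$. If $E\in\mathcal{E}'$, then $E\subseteq E'$ so $\mu(E\cap F')=\mu\emptyset=0$; if $E\in\mathcal{E}\setminus\mathcal{E}'$, then $\mu(E\cap F')=\mu(E\cap F)=0$. Thus $\mu(E\cap F')=0$ for every $E\in\mathcal{E}$. By the hypothesis ($\beta$) on $\mathcal{E}$, $\mu F'=0$, so $\mu(F\setminus\bigcup\mathcal{E}')=0$, as required. QQQ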
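(ii) Now suppose that $H\subseteq X$ is such that $H\cap E\in\Sigma$ for every $E\in\mathcal{E}$. In this case $H\in\Sigma$. PPP Let $F\in\Sigma$ be such that $\mu F<\infty$. Let $\mathcal{E}'\subseteq\mathcal{E}$ be a countable set such that $\mu(F\setminus E')=0$, where $E'=\bigcup\mathcal{E}'$. Then $H\cap(F\setminus E')\in\Sigma$ because $(X,\Sigma,\mu)$ is complete. But also $H\cap E'=\bigcup_{E\in\mathcal{E}'}H\cap E\in\Sigma$. So
$$H\cap F=(H\cap(F\setminus E'))\cup(F\cap(H\cap E'))\in\Sigma.$$
As $F$ is arbitrary and $(X,\Sigma,\mu)$ is locally determined, $H\in\Sigma$. QQQ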
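(iii) We find also that $\mu H=\sum_{E\in\mathcal{E}}\mu(H\cap E)$ for every $H\in\Sigma$. PPP ($\alpha$) Because $\mathcal{E}$ is disjoint, we must have $\sum_{E\in\mathcal{E}'}\mu(H\cap E)\le\mu H$ for every finite $\mathcal{E}'\subseteq\mathcal{E}$, so
$$\sum_{E\in\mathcal{E}}\mu(H\cap E)=\sup\Bigl\{\sum_{E\in\mathcal{E}'}\mu(H\cap E):\mathcal{E}'\subseteq\mathcal{E}\text{ is finite}\Bigr\}\le\mu H.$$
($\beta$) For the reverse inequality, consider first the case $\mu H<\infty$. By (i), there is a countable $\mathcal{E}'\subseteq\mathcal{E}$ such that $\mu(H\setminus\bigcup\mathcal{E}')=0$, so that
$$\mu H=\mu\Bigl(H\cap\bigcup\mathcal{E}'\Bigr)=\sum_{E\in\mathcal{E}'}\mu(H\cap E)\le\sum_{E\in\mathcal{E}}\mu(H\cap E).$$
($\gamma$) In general, because $(X,\Sigma,\mu)$ is semi-finite,
$$\mu H=\sup\{\mu F:F\subseteq H,\ \mu F<\infty\}\le\sup\Bigl\{\sum_{E\in\mathcal{E}}\mu(F\cap E):F\subseteq H,\ \mu F<\infty\Bigr\}\le\sum_{E\in\mathcal{E}}\mu(H\cap E).$$
So in all cases we have $\mu H\le\sum_{E\in\mathcal{E}}\mu(H\cap E)$, and the two are equal. QQQ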
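(iv) In particular, setting $E_0=X\setminus\bigcup\mathcal{E}$, $E_0\in\Sigma$ and $\mu E_0=0$; that is, $\bigcup\mathcal{E}$ is conegligible. Consider $\mathcal{E}^*=\mathcal{E}\cup\{E_0\}$. This is a partition of $X$ into sets of finite measure (now using the hypothesis ($\alpha$) on $\mathcal{E}$). If $H\subseteq X$ is such that $H\cap E\in\Sigma$ for every $E\in\mathcal{E}^*$, then $H\in\Sigma$ and
$$\mu H=\sum_{E\in\mathcal{E}}\mu(H\cap E)=\sum_{E\in\mathcal{E}^*}\mu(H\cap E).$$
Thus $\mathcal{E}^*$ (or, if you prefer, the indexed family $\langle E\rangle_{E\in\mathcal{E}^*}$) is a decomposition witnessing that $(X,\Sigma,\mu)$ is strictly localizable.
(b) Apply (a) with $\mathcal{E}=\{X_i:i\in I\}$, noting that $E_0$ in (iv) is empty, so can be dropped.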
213X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, $\mu^*$ the outer measure defined from $\mu$, and $\check\mu$ the measure defined by Carathéodory's method from $\mu^*$; write $\check\Sigma$ for the domain of $\check\mu$. Show that (i) $\check\mu$ extends the completion $\hat\mu$ of $\mu$; (ii) if $H\subseteq X$ is such that $H\cap F\in\check\Sigma$ whenever $F\in\Sigma$ and $\mu F<\infty$, then $H\in\check\Sigma$; (iii) $(\check\mu)^*=\mu^*$, so that the integrable functions for $\check\mu$ and $\mu$ are the same (212Xb); (iv) if $\mu$ is strictly localizable then $\check\mu=\hat\mu$; (v) if $\mu$ is defined by Carathéodory's method from another outer measure, then $\mu=\check\mu$.
>>>(b) Let $\mu$ be counting measure restricted to the countable-cocountable $\sigma$-algebra of a set $X$ (211R, 211Ye). (i) Show that the c.l.d. version $\tilde\mu$ of $\mu$ is just counting measure on $X$. (ii) Show that $\check\mu$, as defined in 213Xa, is equal to $\tilde\mu$, and in particular strictly extends the completion of $\mu$.
(c) Let $(X,\Sigma,\mu)$ be any measure space. For $E\in\Sigma$ set
$$\mu_{sf}E=\sup\{\mu(E\cap F):F\in\Sigma,\ \mu F<\infty\}.$$
(i) Show that $(X,\Sigma,\mu_{sf})$ is a semi-finite measure space, and is equal to $(X,\Sigma,\mu)$ iff $(X,\Sigma,\mu)$ is semi-finite.
(ii) Show that a $\mu$-integrable real-valued function $f$ is $\mu_{sf}$-integrable, with the same integral.
(iii) Show that if $E\in\Sigma$ and $\mu_{sf}E<\infty$, then $E$ can be expressed as $E_1\cup E_2$ where $E_1$, $E_2\in\Sigma$, $\mu E_1=\mu_{sf}E_1$ and $\mu_{sf}E_2=0$.
(iv) Show that if $f$ is a $\mu_{sf}$-integrable real-valued function on $X$, it is equal $\mu_{sf}$-almost everywhere to a $\mu$-integrable function.
(v) Show that if $(X,\Sigma,\mu_{sf})$ is complete, so is $(X,\Sigma,\mu)$.
(vi) Show that $\mu$ and $\mu_{sf}$ have identical c.l.d. versions.
(d) Let $(X,\Sigma,\mu)$ be any measure space. Define $\check\mu$ as in 213Xa. Show that $(\check\mu)_{sf}$, as constructed in 213Xc, is precisely the c.l.d. version $\tilde\mu$ of $\mu$, so that $\check\mu=\tilde\mu$ iff $\check\mu$ is semi-finite.
(e) Let $(X,\Sigma,\mu)$ be a measure space. For $A\subseteq X$ set $\mu_*A=\sup\{\mu E:E\in\Sigma,\ \mu E<\infty,\ E\subseteq A\}$, as in 113Yh. (i) Show that the measure constructed from $\mu_*$ by the method of 113Yg is just the c.l.d. version $\tilde\mu$ of $\mu$. (ii) Show that $\tilde\mu_*=\mu_*$. (iii) Show that if $\nu$ is another measure on $X$, with domain $\mathrm{T}$, then $\tilde\mu=\tilde\nu$ iff $\mu_*=\nu_*$.
(f) Let $X$ be a set and $\theta$ an outer measure on $X$. Show that $\theta_{sf}$, defined by writing
$$\theta_{sf}A=\sup\{\theta B:B\subseteq A,\ \theta B<\infty\}$$
is also an outer measure on $X$. Show that the measures defined by Carathéodory's method from $\theta$, $\theta_{sf}$ have the same domains.
(g) Let $(X,\Sigma,\mu)$ be any measure space. Set
$$\mu^*_{sf}A=\sup\{\mu^*(A\cap E):E\in\Sigma,\ \mu E<\infty\}$$
for every $A\subseteq X$.
(i) Show that
$$\mu^*_{sf}A=\sup\{\mu^*B:B\subseteq A,\ \mu^*B<\infty\}$$
for every $A$.
(ii) Show that $\mu^*_{sf}$ is an outer measure.
(iii) Show that if $A\subseteq X$ and $\mu^*_{sf}A<\infty$, there is an $E\in\Sigma$ such that $\mu^*_{sf}A=\mu^*(A\cap E)=\mu E$, $\mu^*_{sf}(A\setminus E)=0$. (Hint: take a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure such that $\mu^*_{sf}A=\lim_{n\to\infty}\mu^*(A\cap E_n)$, and let $E\subseteq\bigcup_{n\in\mathbb{N}}E_n$ be a measurable envelope of $A\cap\bigcup_{n\in\mathbb{N}}E_n$.)
(iv) Show that the measure defined from $\mu^*_{sf}$ by Carathéodory's method is precisely the c.l.d. version $\tilde\mu$ of $\mu$.
(v) Show that $\tilde\mu^*=\mu^*_{sf}$, so that if $\mu$ is complete and locally determined then $\mu^*_{sf}=\mu^*$.
>>>(h) Let $(X,\Sigma,\mu)$ be a strictly localizable measure space with a decomposition $\langle X_i\rangle_{i\in I}$. Show that $\mu^*A=\sum_{i\in I}\mu^*(A\cap X_i)$ for every $A\subseteq X$.
>>>(i) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and let $A\subseteq X$ be such that $\max(\mu^*(E\cap A),\mu^*(E\setminus A))<\mu E$ whenever $E\in\Sigma$ and $0<\mu E<\infty$. Show that $A\in\Sigma$. (Hint: given $\mu F<\infty$, consider the intersection $E$ of measurable envelopes of $F\cap A$, $F\setminus A$ to see that $\mu^*(F\cap A)+\mu^*(F\setminus A)=\mu F$.)
>>>(j) Let $(X,\Sigma,\mu)$ be a measure space, $\tilde\mu$ its c.l.d. version, and $\check\mu$ the measure defined by Carathéodory's method from $\mu^*$. (i) Show that the following are equiveridical: ($\alpha$) $\mu$ has locally determined negligible sets; ($\beta$) $\mu$ and $\tilde\mu$ have the same negligible sets; ($\gamma$) $\check\mu=\tilde\mu$. (ii) Show that in this case $\mu$ is semi-finite.
(k) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) $(X,\Sigma,\mu)$ has locally determined negligible sets; (ii) the completion and c.l.d. version of $\mu$ have the same sets of finite measure; (iii) $\mu$ and $\tilde\mu$ have the same integrable functions; (iv) $\tilde\mu^*=\mu^*$; (v) the outer measure $\mu^*_{sf}$ of 213Xg is equal to $\mu^*$.
(l) Let us say that a measure space $(X,\Sigma,\mu)$ has the measurable envelope property if every subset of $X$ has a measurable envelope. (i) Show that a semi-finite space with the measurable envelope property has locally determined negligible sets. (ii) Show that a complete semi-finite space with the measurable envelope property is locally determined.
(m) Let $(X,\Sigma,\mu)$ be a semi-finite measure space, and suppose that it satisfies the conclusion of Theorem 213N. Show that it is localizable. (Hint: given $\mathcal{E}\subseteq\Sigma$, set $\mathrm{T}=\{F:F\in\Sigma,\ E\cap F\text{ is negligible for every }E\in\mathcal{E}\}$. Let $\Phi$ be the set of functions $f$ from subsets of $X$ to $\{0,1\}$ such that $f^{-1}[\{1\}]\in\mathcal{E}$ and $f^{-1}[\{0\}]\in\mathrm{T}$.)
(n) Let $(X,\Sigma,\mu)$ be a measure space. Show that its c.l.d. version is strictly localizable iff there is a disjoint family $\mathcal{E}\subseteq\Sigma$ such that $\mu E<\infty$ for every $E\in\mathcal{E}$ and whenever $F\in\Sigma$, $0<\mu F<\infty$ there is an $E\in\mathcal{E}$ such that $\mu(E\cap F)>0$.
(o) Show that the c.l.d. version of any point-supported measure is point-supported.
213Y Further exercises (a) Set $X=\mathbb{N}$, and for $A\subseteq X$ set
$$\theta A=\sqrt{\#(A)}\text{ if }A\text{ is finite},\qquad\theta A=\infty\text{ if }A\text{ is infinite}.$$
Show that $\theta$ is an outer measure on $X$, that $\theta A=\sup\{\theta B:B\subseteq A,\ \theta B<\infty\}$ for every $A\subseteq X$, but that the measure $\mu$ defined from $\theta$ by Carathéodory's method is not semi-finite. Show that if $\check\mu$ is the measure defined by Carathéodory's method from $\mu^*$ (213Xa), then $\check\mu\ne\mu$.
(b) Set $X=[0,1]\times\{0,1\}$, and let $\Sigma$ be the family of those subsets $E$ of $X$ such that
$$\{x:x\in[0,1],\ E[\{x\}]\ne\emptyset,\ E[\{x\}]\ne\{0,1\}\}$$
is countable, writing $E[\{x\}]=\{y:(x,y)\in E\}$ for each $x\in[0,1]$. Show that $\Sigma$ is a $\sigma$-algebra of subsets of $X$. For $E\in\Sigma$, set $\mu E=\#(\{x:(x,1)\in E\})$ if this is finite, $\infty$ otherwise. Show that $\mu$ is a complete semi-finite measure. Show that the measure $\check\mu$ defined from $\mu^*$ by Carathéodory's method (213Xa) is not semi-finite. Show that the domain of the c.l.d. version of $\mu$ is the whole of $\mathcal{P}X$.
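(c) Set $X=\mathbb{N}$, and for $A\subseteq X$ set
$$\phi A=\#(A)^2\text{ if }A\text{ is finite},\qquad\phi A=\infty\text{ if }A\text{ is infinite}.$$
Show that $\phi$ satisfies the conditions of 113Yg/212Ya, but that the measure defined from $\phi$ by the method of 113Yg is not semi-finite.
(d) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space. Suppose that $D\subseteq X$ and that $f:D\to\mathbb{R}$ is a function. Show that the following are equiveridical: (i) $f$ is measurable; (ii)
$$\mu^*\{x:x\in D\cap E,\ f(x)\le a\}+\mu^*\{x:x\in D\cap E,\ f(x)\ge b\}\le\mu E$$
whenever $a<b$ in $\mathbb{R}$, $E\in\Sigma$ and $\mu E<\infty$ (iii)
$$\max(\mu^*\{x:x\in D\cap E,\ f(x)\le a\},\ \mu^*\{x:x\in D\cap E,\ f(x)\ge b\})<\mu E$$
whenever $a<b$ in $\mathbb{R}$ and $0<\mu E<\infty$. (Hint: for (iii)$\Rightarrow$(i), show that if $E\subseteq X$ then
$$\mu^*\{x:x\in D\cap E,\ f(x)>a\}=\sup_{b>a}\mu^*\{x:x\in D\cap E,\ f(x)\ge b\},$$
and use 213Xi above.)
(e) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space and suppose that $\mathcal{E}\subseteq\Sigma$ is such that $\mu E<\infty$ for every $E\in\mathcal{E}$ and whenever $F\in\Sigma$ and $\mu F<\infty$ there is a countable $\mathcal{E}_0\subseteq\mathcal{E}$ such that $F\setminus\bigcup\mathcal{E}_0$, $F\cap\bigcup(\mathcal{E}\setminus\mathcal{E}_0)$ are negligible. Show that $(X,\Sigma,\mu)$ is strictly localizable.
(f) Let $(X,\Sigma,\mu)$ be a measure space. Show that $\mu$ is semi-finite iff there is a family $\mathcal{E}\subseteq\Sigma$ such that $\mu E<\infty$ for every $E\in\mathcal{E}$ and $\mu F=\sum_{E\in\mathcal{E}}\mu(F\cap E)$ for every $F\in\Sigma$. (Hint: take $\mathcal{E}$ maximal subject to the intersection of any two elements being negligible.)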
213 Notes and comments I think it is fair to say that if the definition of 'measure space' were re-written to exclude all spaces which are not semi-finite, nothing significant would be lost from the theory. There are solid reasons for not taking such a drastic step, starting with the fact that it would confuse everyone (if you say to an unprepared audience 'let $(X,\Sigma,\mu)$ be a measure space', there is a danger that some will imagine that you mean '$\sigma$-finite measure space', but very few will suppose that you mean 'semi-finite measure space'). But the whole point of measure theory is that we distinguish between sets by their measures, and if every subset of $E$ is either non-measurable, or negligible, or of infinite measure, the classification is too crude to support most of the usual ideas, starting, of course, with ordinary integration.
Let us say that a measurable set $E$ is purely infinite if $E$ itself and all its non-negligible measurable subsets have infinite measure. On the definition of the integral which I chose in Volume 1, every simple function, and therefore every integrable function, must be zero almost everywhere in $E$. This means that the whole theory of integration will ignore $E$ entirely. Looking at the definition of 'c.l.d. version' (213D-213E), you will see that the c.l.d. version of the measure will render $E$ negligible, as does the 'semi-finite version' described in 213Xc. These amendments do not, however, affect sets of finite measure, and consequently leave integrable functions integrable, with the same integrals.
The strongest reason we have yet seen for admitting non-semi-finite spaces into consideration is that Carathéodory's method does not always produce semi-finite spaces. (I give examples in 213Ya-213Yb; more important ones are the Hausdorff measures of §§264-265 below.) In practice the right thing to do is often to take the c.l.d. version of the measure produced by Carathéodory's construction.
It is a reasonable general philosophy, in measure theory, to say that we wish to measure as many sets, and integrate as many functions, as we can manage in a 'canonical' way; I mean, without making blatantly arbitrary choices about the values we assign to our measure or integral. The revision of a measure $\mu$ to its c.l.d. version $\tilde\mu$ is about as far as we can go with an arbitrary measure space in which we have no other structure to guide our choices.
You will observe that $\tilde\mu$ is not as close to $\mu$ as the completion $\hat\mu$ of $\mu$ is; naturally so, because if $E\in\Sigma$ is purely infinite for $\mu$ then we have to choose between setting $\tilde\mu E=0\ne\mu E$ and finding some way of fitting many sets of finite measure into $E$; which if $E$ is a singleton will be actually impossible, and in any case would be an arbitrary process. However the integrable functions for $\tilde\mu$, while not always the same as those for $\mu$ (since $\tilde\mu$ turns purely infinite sets into negligible ones, so that their characteristic functions become integrable), are 'nearly' the same, in the sense that any $\tilde\mu$-integrable function can be changed into a $\mu$-integrable function by adjusting it on a $\tilde\mu$-negligible set. This corresponds, of course, to the fact that any set of finite measure for $\tilde\mu$ is the symmetric difference of a set of finite measure for $\mu$ and a $\tilde\mu$-negligible set. For sets of infinite measure this can fail, unless $\mu$ is localizable (213Hb, 213Xb).
If $(X,\Sigma,\mu)$ is semi-finite, or localizable, or strictly localizable, then of course it is correspondingly closer to $(X,\tilde\Sigma,\tilde\mu)$, as detailed in 213Ha-c.
It is worth noting that while the measure $\check\mu$ obtained by Carathéodory's method directly from the outer measure $\mu^*$ defined from $\mu$ may fail to be semi-finite, even when $\mu$ is (213Yb), a simple modification of $\mu^*$ (213Xg) yields the c.l.d. version $\tilde\mu$ of $\mu$, which can also be obtained from an appropriate inner measure (213Xe). The measure $\check\mu$ is of course related in other ways to $\tilde\mu$; see 213Xd.
214 Subspaces
In §131 I described a construction for subspace measures on measurable subsets. It is now time to give the generalization to subspace measures on arbitrary subsets of a measure space. The relationship between this construction and the properties listed in §211 is not quite as straightforward as one might imagine, and in this section I try to give a full account of what can be expected of subspaces in general. I think that for the present volume only (i) general subspaces of $\sigma$-finite spaces and (ii) measurable subspaces of general measure spaces will be needed in any essential way, and these do not give any difficulty; but in later volumes we shall need the full theory.
I begin with a general construction for 'subspace measures' (214A-214C), with an account of integration with respect to a subspace measure (214E-214G); these (with 131E-131H) give a solid foundation for the concept of 'integration over a subset' (214D). I present this work in its full natural generality, which will eventually be essential, but even for Lebesgue measure alone it is important to be aware of the ideas here. I continue with answers to some obvious questions concerning subspace measures and the properties of measure spaces so far considered, both for general subspaces (214I) and for measurable subspaces (214K), and I mention a basic construction for assembling measure spaces side-by-side, the 'direct sums' of 214L-214M. At the end of the section I discuss a measure extension problem (214O-214P).
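214A Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $Y$ any subset of $X$. Let $\mu^*$ be the outer measure defined from $\mu$ (132A-132B), and set $\Sigma_Y=\{E\cap Y:E\in\Sigma\}$; let $\mu_Y$ be the restriction of $\mu^*$ to $\Sigma_Y$. Then $(Y,\Sigma_Y,\mu_Y)$ is a measure space.
proof (a) I have noted in 121A that $\Sigma_Y$ is a $\sigma$-algebra of subsets of $Y$.
(b) Of course $\mu_YF\in[0,\infty]$ for every $F\in\Sigma_Y$.
(c) $\mu_Y\emptyset=\mu^*\emptyset=0$.
(d) If $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma_Y$ with union $F$, then choose $E_n$, $E'_n$, $E\in\Sigma$ such that $F_n=Y\cap E_n$, $F_n\subseteq E'_n$ and $\mu_YF_n=\mu E'_n$ for each $n$, $F\subseteq E$ and $\mu_YF=\mu E$ (using 132Aa repeatedly). Set $G_n=E_n\cap E'_n\cap E\setminus\bigcup_{m<n}E_m$ for each $n\in\mathbb{N}$; then $\langle G_n\rangle_{n\in\mathbb{N}}$ is disjoint and $F_n\subseteq G_n\subseteq E'_n$ for each $n$, so $\mu_YF_n=\mu G_n$. Also $F\subseteq\bigcup_{n\in\mathbb{N}}G_n\subseteq E$, so
$$\mu_YF=\mu\Bigl(\bigcup_{n\in\mathbb{N}}G_n\Bigr)=\sum_{n=0}^\infty\mu G_n=\sum_{n=0}^\infty\mu_YF_n.$$
As $\langle F_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\mu_Y$ is a measure.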
214B Definition If $(X, \Sigma, \mu)$ is any measure space and $Y$ is any subset of $X$, then $\mu_Y$, defined as in 214A, is the
subspace measure on $Y$.
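As a small illustration of the definition (not taken from the text: the three-point space below is an assumption, chosen so that every outer measure can be read off by inspection), one can compute a subspace measure on a non-measurable set directly.

```latex
% Illustrative example only; the space (X, \Sigma, \mu) is assumed, not from the source.
Let $X = \{1,2,3\}$, $\Sigma = \{\emptyset, \{1,2\}, \{3\}, X\}$, with
$\mu\{1,2\} = 1$ and $\mu\{3\} = 2$, so that $\mu X = 3$.
Take $Y = \{1,3\}$, which does not belong to $\Sigma$. Then
\[
  \Sigma_Y = \{E \cap Y : E \in \Sigma\} = \{\emptyset, \{1\}, \{3\}, Y\},
\]
\[
  \mu_Y\{1\} = \mu^*\{1\} = \mu\{1,2\} = 1, \qquad
  \mu_Y\{3\} = \mu\{3\} = 2, \qquad
  \mu_Y Y = \mu^* Y = \mu X = 3,
\]
so $\mu_Y$ is additive on $\Sigma_Y$: $1 + 2 = 3$.
By contrast, $\mu^*$ itself is not additive on arbitrary subsets of $X$:
\[
  \mu^*\{1\} + \mu^*\{2\} = 1 + 1 = 2 > 1 = \mu^*\{1,2\}.
\]
```

The point of the example is that additivity is recovered only after restricting $\mu^*$ to traces of measurable sets on a single fixed $Y$.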
It is worth noting the following.
214C Lemma Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, $\mu_Y$ the subspace measure on $Y$ and $\Sigma_Y$ its
domain. Then
(a) for any $F \in \Sigma_Y$, there is an $E \in \Sigma$ such that $F = E \cap Y$ and $\mu E = \mu_Y F$;
(b) for any $A \subseteq Y$, $A$ is $\mu_Y$-negligible iff it is $\mu$-negligible;
(c)(i) if $A \subseteq X$ is $\mu$-conegligible, then $A \cap Y$ is $\mu_Y$-conegligible;
(ii) if $A \subseteq Y$ is $\mu_Y$-conegligible, then $A \cup (X \setminus Y)$ is $\mu$-conegligible;
(d) $(\mu_Y)^*$, the outer measure on $Y$ defined from $\mu_Y$, agrees with $\mu^*$ on $\mathcal{P}Y$;
(e) if $Z \subseteq Y \subseteq X$, then $\Sigma_Z = (\Sigma_Y)_Z$, the subspace $\sigma$-algebra of subsets of $Z$ regarded as a subspace of $(Y, \Sigma_Y)$, and
$\mu_Z = (\mu_Y)_Z$ is the subspace measure on $Z$ regarded as a subspace of $(Y, \mu_Y)$;
(f) if $Y \in \Sigma$, then $\mu_Y$, as defined here, is exactly the subspace measure on $Y$ defined in 131A-131B; that is,
$\Sigma_Y = \Sigma \cap \mathcal{P}Y$ and $\mu_Y = \mu \upharpoonright \Sigma_Y$.
proof (a) By the definition of $\Sigma_Y$, there is an $E_0 \in \Sigma$ such that $F = E_0 \cap Y$. By 132Aa, there is an $E_1 \in \Sigma$ such that
$F \subseteq E_1$ and $\mu^* F = \mu E_1$. Set $E = E_0 \cap E_1$; this serves.
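(b)(i) If $A$ is $\mu_Y$-negligible, there is a set $F \in \Sigma_Y$ such that $A \subseteq F$ and $\mu_Y F = 0$; now $\mu^* F = 0$ so $A$ is
$\mu$-negligible, by 132Ad. (ii) If $A$ is $\mu$-negligible, there is an $E \in \Sigma$ such that $A \subseteq E$ and $\mu E = 0$; now $A \subseteq E \cap Y \in \Sigma_Y$
and $\mu_Y(E \cap Y) = 0$, so $A$ is $\mu_Y$-negligible.
(c) If $A \subseteq X$ is $\mu$-conegligible, then $A \cap Y$ is $\mu_Y$-conegligible, because $Y \setminus A = Y \cap (X \setminus A)$ is $\mu$-negligible, therefore
$\mu_Y$-negligible. If $A \subseteq Y$ is $\mu_Y$-conegligible, then $A \cup (X \setminus Y)$ is $\mu$-conegligible because $X \setminus (A \cup (X \setminus Y)) = Y \setminus A$ is
$\mu_Y$-negligible, therefore $\mu$-negligible.
(d) Let $A \subseteq Y$. (i) If $A \subseteq E \in \Sigma$, then $A \subseteq E \cap Y \in \Sigma_Y$, so $\mu_Y^* A \le \mu_Y(E \cap Y) \le \mu E$; as $E$ is arbitrary, $\mu_Y^* A \le \mu^* A$.
(ii) If $A \subseteq F \in \Sigma_Y$, there is an $E \in \Sigma$ such that $F \subseteq E$ and $\mu_Y F = \mu^* F = \mu E$; now $A \subseteq E$ so $\mu^* A \le \mu E = \mu_Y F$. As
$F$ is arbitrary, $\mu^* A \le \mu_Y^* A$.
(e) That $\Sigma_Z = (\Sigma_Y)_Z$ is immediate from the definition of $\Sigma_Y$, etc.; now
$$(\mu_Y)_Z = \mu_Y^* \upharpoonright \Sigma_Z = \mu^* \upharpoonright \Sigma_Z = \mu_Z$$
by (d).
(f) This is elementary, because $E \cap Y \in \Sigma$ and $\mu^*(E \cap Y) = \mu(E \cap Y)$ for every $E \in \Sigma$.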
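214D Integration over subsets: Definition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$ and $f$ a
$[-\infty, \infty]$-valued function defined on a subset of $X$. By $\int_Y f$ (or $\int_Y f(x)\,\mu(dx)$, etc.) I mean $\int (f \upharpoonright Y)\,d\mu_Y$, if this exists in
$[-\infty, \infty]$, following the definitions of 214A-214B, 133A and 135F, and taking $\mathrm{dom}(f \upharpoonright Y) = Y \cap \mathrm{dom} f$, $(f \upharpoonright Y)(x) = f(x)$
for $x \in Y \cap \mathrm{dom} f$. (Compare 131D.)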
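214E Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y \subseteq X$, and $f$ a $[-\infty, \infty]$-valued function defined on a subset
$\mathrm{dom} f$ of $X$.
(a) If $f$ is $\mu$-integrable then $f \upharpoonright Y$ is $\mu_Y$-integrable, and $\int_Y f \le \int f$ if $f$ is non-negative.
(b) If $\mathrm{dom} f \subseteq Y$ and $f$ is $\mu_Y$-integrable, then there is a $\mu$-integrable function $\tilde f$ on $X$, extending $f$, such that
$\int_F \tilde f = \int_{F \cap Y} f$ for every $F \in \Sigma$.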
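proof (a)(i) If $f$ is $\mu$-simple, it is expressible as $\sum_{i=0}^n a_i \chi E_i$, where $E_0, \ldots, E_n \in \Sigma$, $a_0, \ldots, a_n \in \mathbb{R}$ and $\mu E_i < \infty$ for
each $i$. Now $f \upharpoonright Y = \sum_{i=0}^n a_i \chi_Y(E_i \cap Y)$, where $\chi_Y(E_i \cap Y) = (\chi E_i) \upharpoonright Y$ is the characteristic function of $E_i \cap Y$ regarded
as a subset of $Y$; and each $E_i \cap Y$ belongs to $\Sigma_Y$, with $\mu_Y(E_i \cap Y) \le \mu E_i < \infty$, so $f \upharpoonright Y : Y \to \mathbb{R}$ is $\mu_Y$-simple.
If $f : X \to \mathbb{R}$ is a non-negative simple function, it is expressible as $\sum_{i=0}^n a_i \chi E_i$ where $E_0, \ldots, E_n$ are disjoint sets of
finite measure (122Cb). Now $f \upharpoonright Y = \sum_{i=0}^n a_i \chi_Y(E_i \cap Y)$ and
$$\int (f \upharpoonright Y)\,d\mu_Y = \sum_{i=0}^n a_i \mu_Y(E_i \cap Y) \le \sum_{i=0}^n a_i \mu E_i = \int f\,d\mu$$
because $a_i \ge 0$ whenever $E_i \ne \emptyset$, so that $a_i \mu_Y(E_i \cap Y) \le a_i \mu E_i$ for every $i$.
(ii) If $f$ is a non-negative $\mu$-integrable function, there is a non-decreasing sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of non-negative
$\mu$-simple functions converging to $f$ $\mu$-almost everywhere; now $\langle f_n \upharpoonright Y\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\mu_Y$-simple
functions increasing to $f \upharpoonright Y$ $\mu_Y$-a.e. (by 214Cb), and
$$\sup_{n\in\mathbb{N}} \int (f_n \upharpoonright Y)\,d\mu_Y \le \sup_{n\in\mathbb{N}} \int f_n\,d\mu = \int f\,d\mu < \infty,$$
so $\int (f \upharpoonright Y)\,d\mu_Y$ exists and is at most $\int f\,d\mu$.
(iii) Finally, if $f$ is any $\mu$-integrable real-valued function, it is expressible as $f_1 - f_2$ where $f_1$ and $f_2$ are non-negative
$\mu$-integrable functions, so that $f \upharpoonright Y = (f_1 \upharpoonright Y) - (f_2 \upharpoonright Y)$ is $\mu_Y$-integrable.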
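(b) Let us say that if $f$ is a $\mu_Y$-integrable function, then an 'enveloping extension' of $f$ is a $\mu$-integrable function $\tilde f$,
extending $f$, real-valued on $X \setminus Y$, such that $\int_F \tilde f = \int_{F \cap Y} f$ for every $F \in \Sigma$.
(i) If $f$ is of the form $\chi H$, where $H \in \Sigma_Y$ and $\mu_Y H < \infty$, let $E_0 \in \Sigma$ be such that $H = Y \cap E_0$ and $E_1 \in \Sigma$ a
measurable envelope for $H$ (132Ee); then $E = E_0 \cap E_1$ is a measurable envelope for $H$ and $H = E \cap Y$. Set $\tilde f = \chi E$,
regarded as a function from $X$ to $\{0, 1\}$. Then $\tilde f \upharpoonright Y = f$, and for any $F \in \Sigma$ we have
$$\int_F \tilde f = \mu_F(E \cap F) = \mu(E \cap F) = \mu^*(H \cap F) = \mu_{Y \cap F}(H \cap F) = \int_{Y \cap F} f.$$
So $\tilde f$ is an enveloping extension of $f$.
(ii) If $f$, $g$ are $\mu_Y$-integrable functions with enveloping extensions $\tilde f$, $\tilde g$, and $a, b \in \mathbb{R}$, then $a\tilde f + b\tilde g$ extends $af + bg$
and
$$\int_F a\tilde f + b\tilde g = a\int_F \tilde f + b\int_F \tilde g = a\int_{F \cap Y} f + b\int_{F \cap Y} g = \int_{F \cap Y} af + bg$$
for every $F \in \Sigma$, so $a\tilde f + b\tilde g$ is an enveloping extension of $af + bg$.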
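(iii) Putting (i) and (ii) together, we see that every $\mu_Y$-simple function $f$ has an enveloping extension.
(iv) Now suppose that $\langle f_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of non-negative $\mu_Y$-simple functions converging $\mu_Y$-almost
everywhere to a $\mu_Y$-integrable function $f$. For each $n \in \mathbb{N}$ let $\tilde f_n$ be an enveloping extension of $f_n$. Then $\tilde f_n \le_{\text{a.e.}} \tilde f_{n+1}$. P If $F \in \Sigma$ then
$$\int_F \tilde f_n = \int_{F \cap Y} f_n \le \int_{F \cap Y} f_{n+1} = \int_F \tilde f_{n+1}.$$
So $\tilde f_n \le_{\text{a.e.}} \tilde f_{n+1}$, by 131Ha. Q Also
$$\lim_{n\to\infty} \int_F \tilde f_n = \lim_{n\to\infty} \int_{F \cap Y} f_n = \int_{F \cap Y} f$$
for every $F \in \Sigma$. Taking $F = X$ to begin with, B.Levi's theorem tells us that $h = \lim_{n\to\infty} \tilde f_n$ is defined (as a
real-valued function) $\mu$-almost everywhere; now letting $F$ vary, we have $\int_F h = \int_{F \cap Y} f$ for every $F \in \Sigma$, because
$h \upharpoonright F = \lim_{n\to\infty} \tilde f_n \upharpoonright F$ $\mu_F$-a.e. (I seem to be using 214Cb here.) Now $h \upharpoonright Y = f$ $\mu_Y$-a.e., by 214Cb again. If we define $\tilde f$
by setting
$\tilde f(x) = f(x)$ for $x \in \mathrm{dom} f$, $h(x)$ for $x \in \mathrm{dom} h \setminus \mathrm{dom} f$, $0$ for other $x \in X$,
then $\tilde f$ is defined everywhere in $X$ and is equal to $h$ $\mu$-almost everywhere; so that if $F \in \Sigma$, $\tilde f \upharpoonright F$ will be equal to
$h \upharpoonright F$ $\mu_F$-almost everywhere, and
$$\int_F \tilde f = \int_F h = \int_{F \cap Y} f.$$
As $F$ is arbitrary, $\tilde f$ is an enveloping extension of $f$.
(v) Thus every non-negative $\mu_Y$-integrable function has an enveloping extension. Using (ii) again, every $\mu_Y$-integrable
function has an enveloping extension, as claimed.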
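The role of the measurable envelope in step (b)(i) can be seen in a tiny example. This is a sketch under assumptions of my own (the three-point space is not from the text), intended only to show why extending by zero does not work and the envelope does.

```latex
% Illustrative example only; the space and the function are assumed.
Let $X = \{1,2,3\}$, $\Sigma = \{\emptyset, \{1,2\}, \{3\}, X\}$,
$\mu\{1,2\} = 1$, $\mu\{3\} = 2$, and $Y = \{1,3\}$, so that
$\Sigma_Y = \mathcal{P}Y$, $\mu_Y\{1\} = 1$, $\mu_Y\{3\} = 2$.
Take $f = \chi_Y\{1\}$ on $Y$, that is, $H = \{1\}$.
The extension of $f$ by zero is $\chi\{1\}$, which is not $\Sigma$-measurable.
A measurable envelope of $H$ is $E = \{1,2\}$ (here $\mu E = 1 = \mu^* H$ and
$E \cap Y = H$), and $\tilde f = \chi E$ satisfies, for each $F \in \Sigma$,
\[
\begin{array}{lll}
  F = \{1,2\}: & \int_F \tilde f = \mu\{1,2\} = 1, & \int_{F \cap Y} f = \mu^*\{1\} = 1, \\
  F = \{3\}:   & \int_F \tilde f = 0,              & \int_{F \cap Y} f = 0, \\
  F = X:       & \int_F \tilde f = 1,              & \int_{Y} f = \mu_Y\{1\} = 1 .
\end{array}
\]
```

So in this example $\chi\{1,2\}$ is an enveloping extension of $\chi_Y\{1\}$ in the sense used in the proof.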
214F Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $f$ a $[-\infty, \infty]$-valued function such that
$\int_X f$ is defined in $[-\infty, \infty]$. If either $Y$ is of full outer measure in $X$ or $f$ is zero almost everywhere in $X \setminus Y$, then
$\int_Y f$ is defined and equal to $\int_X f$.
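proof (a) Suppose first that $f$ is non-negative, $\Sigma$-measurable and defined everywhere in $X$. In this case $f \upharpoonright Y$ is $\Sigma_Y$-measurable.
Set $F_{nk} = \{x : x \in X,\ f(x) \ge 2^{-n}k\}$ for $k, n \in \mathbb{N}$, $f_n = \sum_{k=1}^{4^n} 2^{-n} \chi F_{nk}$ for $n \in \mathbb{N}$, so that $\langle f_n\rangle_{n\in\mathbb{N}}$ is a
non-decreasing sequence of real-valued measurable functions converging everywhere to $f$, and $\int_X f = \lim_{n\to\infty} \int_X f_n$.
For each $n \in \mathbb{N}$ and $k \ge 1$,
$$\mu_Y(F_{nk} \cap Y) = \mu^*(F_{nk} \cap Y) = \mu F_{nk}$$
either because $F_{nk} \setminus Y$ is negligible or because $X$ is a measurable envelope of $Y$. So
$$\int_Y f = \lim_{n\to\infty} \int_Y f_n = \lim_{n\to\infty} \sum_{k=1}^{4^n} 2^{-n} \mu_Y(F_{nk} \cap Y) = \lim_{n\to\infty} \sum_{k=1}^{4^n} 2^{-n} \mu F_{nk} = \lim_{n\to\infty} \int_X f_n = \int_X f.$$
(b) Now suppose that $f$ is non-negative, defined almost everywhere in $X$ and $\mu$-virtually measurable. In this case
there is a conegligible measurable set $E \subseteq \mathrm{dom} f$ such that $f \upharpoonright E$ is measurable. Set $\tilde f(x) = f(x)$ for $x \in E$, $0$ for
$x \in X \setminus E$; then $\tilde f$ satisfies the conditions of (a) and $f = \tilde f$ $\mu$-a.e. Accordingly $f \upharpoonright Y = \tilde f \upharpoonright Y$ $\mu_Y$-a.e. (214Cc), and
$$\int_Y f = \int_Y \tilde f = \int_X \tilde f = \int_X f.$$
(c) Finally, for the general case, we can apply (b) to the positive and negative parts $f^+$, $f^-$ of $f$ to get
$$\int_Y f = \int_Y f^+ - \int_Y f^- = \int_X f^+ - \int_X f^- = \int_X f.$$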
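Both alternatives in the hypothesis of 214F can be checked by hand on a finite space. The example below is an illustration under my own assumptions (the space and the function are not from the text).

```latex
% Illustrative example only; the space and the function are assumed.
Let $X = \{1,2,3\}$, $\Sigma = \{\emptyset, \{1,2\}, \{3\}, X\}$,
$\mu\{1,2\} = 1$, $\mu\{3\} = 2$, and let
$f = a\,\chi\{1,2\} + b\,\chi\{3\}$ with $a, b \in \mathbb{R}$, so that
$\int_X f = a + 2b$.

(1) $Y = \{1,3\}$ has full outer measure:
$\mu^*(E \cap Y) = \mu E$ for every $E \in \Sigma$
(for instance $\mu^*\{1\} = 1 = \mu\{1,2\}$). Accordingly
\[
  \int_Y f = a\,\mu_Y\{1\} + b\,\mu_Y\{3\} = a + 2b = \int_X f .
\]

(2) $Y' = \{3\}$ does not have full outer measure
($\mu^* Y' = 2 < 3 = \mu X$), and
\[
  \int_{Y'} f = 2b ,
\]
which equals $\int_X f$ exactly when $a = 0$, that is, when $f$ is zero
almost everywhere in $X \setminus Y' = \{1,2\}$.
```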
214G Corollary Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $E \in \Sigma$ a measurable envelope of $Y$. If $f$
is a $[-\infty, \infty]$-valued function such that $\int_E f$ is defined in $[-\infty, \infty]$, then $\int_Y f$ is defined and equal to $\int_E f$.
proof By 214Ce, we can identify the subspace measure $\mu_Y$ with the subspace measure $(\mu_E)_Y$ induced by the subspace
measure on $E$. Now, regarded as a subspace of $E$, $Y$ is of full outer measure, so 214F gives the result.
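214H Subspaces and Carathéodory's method The following easy technical results will occasionally be useful.
Lemma Let $X$ be a set, $Y \subseteq X$ a subset, and $\theta$ an outer measure on $X$.
(a) $\theta_Y = \theta \upharpoonright \mathcal{P}Y$ is an outer measure on $Y$.
(b) Let $\mu$, $\nu$ be the measures on $X$, $Y$ defined by Carathéodory's method from the outer measures $\theta$, $\theta_Y$, and $\Sigma$, $\mathrm{T}$
their domains; let $\mu_Y$ be the subspace measure on $Y$ induced by $\mu$, and $\Sigma_Y$ its domain. Then
(i) $\Sigma_Y \subseteq \mathrm{T}$ and $\nu F \le \mu_Y F$ for every $F \in \Sigma_Y$;
(ii) if $Y \in \Sigma$ then $\nu = \mu_Y$;
(iii) if $\theta = \mu^*$ (that is, $\theta$ is 'regular') then $\nu$ extends $\mu_Y$;
(iv) if $\theta = \mu^*$ and $\theta Y < \infty$ then $\nu = \mu_Y$.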
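proof (a) You have only to read the definition of 'outer measure' (113A).
(b)(i) Suppose that $F \in \Sigma_Y$. Then it is of the form $E \cap Y$ where $E \in \Sigma$. If $A \subseteq Y$, then
$$\theta_Y(A \cap F) + \theta_Y(A \setminus F) = \theta(A \cap F) + \theta(A \setminus F) = \theta(A \cap E) + \theta(A \setminus E) = \theta A = \theta_Y A,$$
so $F \in \mathrm{T}$. Now
$$\nu F = \theta_Y F = \theta F \le \mu^* F = \mu_Y F.$$
(ii) Suppose that $F \in \mathrm{T}$. If $A \subseteq X$, then
$$\theta A = \theta(A \cap Y) + \theta(A \setminus Y) = \theta_Y(A \cap Y) + \theta(A \setminus Y)$$
$$= \theta_Y(A \cap Y \cap F) + \theta_Y(A \cap Y \setminus F) + \theta(A \setminus Y)$$
$$= \theta(A \cap F) + \theta(A \cap Y \setminus F) + \theta(A \setminus Y)$$
$$= \theta(A \cap F) + \theta((A \setminus F) \cap Y) + \theta((A \setminus F) \setminus Y) = \theta(A \cap F) + \theta(A \setminus F);$$
as $A$ is arbitrary, $F \in \Sigma$ and therefore $F \in \Sigma_Y$. Also
$$\mu_Y F = \mu F = \theta F = \theta_Y F = \nu F.$$
Putting this together with (i), we see that $\mu_Y$ and $\nu$ are identical.
(iii) Let $F \in \Sigma_Y$. Then $F \in \mathrm{T}$, by (i). Now $\nu F = \theta F = \mu^* F = \mu_Y F$. As $F$ is arbitrary, $\nu$ extends $\mu_Y$.
(iv) Now suppose that $F \in \mathrm{T}$. Because $\mu^* Y = \theta Y < \infty$, we have measurable envelopes $E_1$, $E_2$ of $F$ and $Y \setminus F$
for $\mu$ (132Ee). Then
$$\theta Y = \theta_Y Y = \theta_Y F + \theta_Y(Y \setminus F) = \theta F + \theta(Y \setminus F)$$
$$= \mu^* F + \mu^*(Y \setminus F) = \mu E_1 + \mu E_2 \ge \mu(E_1 \cup E_2) = \theta(E_1 \cup E_2) \ge \theta Y,$$
so $\mu E_1 + \mu E_2 = \mu(E_1 \cup E_2)$ and
$$\mu(E_1 \cap E_2) = \mu E_1 + \mu E_2 - \mu(E_1 \cup E_2) = 0.$$
As $\mu$ is complete (212A) and $E_1 \cap Y \setminus F \subseteq E_1 \cap E_2$ is $\mu$-negligible, therefore belongs to $\Sigma$, $F = Y \cap (E_1 \setminus (E_1 \cap Y \setminus F))$
belongs to $\Sigma_Y$. Thus $\mathrm{T} \subseteq \Sigma_Y$; putting this together with (iii), we see that $\nu = \mu_Y$.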
214I I now turn to the relationships between subspace measures and the classification of measure spaces developed
in this chapter.
Theorem Let $(X, \Sigma, \mu)$ be a measure space and $Y$ a subset of $X$. Let $\mu_Y$ be the subspace measure on $Y$ and $\Sigma_Y$ its
domain.
(a) If $(X, \Sigma, \mu)$ is complete, or totally finite, or $\sigma$-finite, or strictly localizable, so is $(Y, \Sigma_Y, \mu_Y)$. If $\langle X_i\rangle_{i\in I}$ is a
decomposition of $X$ for $\mu$, then $\langle X_i \cap Y\rangle_{i\in I}$ is a decomposition of $Y$ for $\mu_Y$.
(b) Writing $\hat\mu$ for the completion of $\mu$, the subspace measure $\hat\mu_Y = (\hat\mu)_Y$ is the completion of $\mu_Y$.
(c) If $(X, \Sigma, \mu)$ has locally determined negligible sets, then $\mu_Y$ is semi-finite.
(d) If $(X, \Sigma, \mu)$ is complete and locally determined, then $(Y, \Sigma_Y, \mu_Y)$ is complete and semi-finite.
(e) If $(X, \Sigma, \mu)$ is complete, locally determined and localizable then so is $(Y, \Sigma_Y, \mu_Y)$.
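proof (a)(i) Suppose that $(X, \Sigma, \mu)$ is complete. If $A \subseteq U \in \Sigma_Y$ and $\mu_Y U = 0$, there is an $E \in \Sigma$ such that $U = E \cap Y$
and $\mu E = \mu_Y U = 0$; now $A \subseteq E$ so $A \in \Sigma$ and $A = A \cap Y \in \Sigma_Y$.
(ii) $\mu_Y Y = \mu^* Y \le \mu X$, so $\mu_Y$ is totally finite if $\mu$ is.
(iii) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a sequence of sets of finite measure for $\mu$ which covers $X$, then $\langle X_n \cap Y\rangle_{n\in\mathbb{N}}$ is a sequence of
sets of finite measure for $\mu_Y$ which covers $Y$. So $(Y, \Sigma_Y, \mu_Y)$ is $\sigma$-finite if $(X, \Sigma, \mu)$ is.
(iv) Suppose that $\langle X_i\rangle_{i\in I}$ is a decomposition of $X$ for $\mu$. Then $\langle X_i \cap Y\rangle_{i\in I}$ is a decomposition of $Y$ for $\mu_Y$. P
Because $\mu_Y(X_i \cap Y) \le \mu X_i < \infty$ for each $i$, $\langle X_i \cap Y\rangle_{i\in I}$ is a partition of $Y$ into sets of finite measure. Suppose that
$U \subseteq Y$ is such that $U_i = U \cap X_i \cap Y \in \Sigma_Y$ for every $i$. For each $i \in I$, choose $E_i \in \Sigma$ such that $U_i = E_i \cap Y$ and
$\mu E_i = \mu_Y U_i$; we may of course suppose that $E_i \subseteq X_i$. Set $E = \bigcup_{i\in I} E_i$. Then $E \cap X_i = E_i$ for every $i$, so $E \in \Sigma$
and $\mu E = \sum_{i\in I} \mu E_i$. Now $U = E \cap Y$ so $U \in \Sigma_Y$ and
$$\mu_Y U \le \mu E = \sum_{i\in I} \mu E_i = \sum_{i\in I} \mu_Y U_i.$$
On the other hand, $\mu_Y U$ is surely greater than or equal to $\sum_{i\in I} \mu_Y U_i = \sup_{J \subseteq I \text{ is finite}} \sum_{i\in J} \mu_Y U_i$, so they are equal.
As $U$ is arbitrary, $\langle X_i \cap Y\rangle_{i\in I}$ is a decomposition of $Y$ for $\mu_Y$. Q
Consequently $(Y, \Sigma_Y, \mu_Y)$ is strictly localizable if $(X, \Sigma, \mu)$ is.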
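(b) The domain of the completion $(\mu_Y)^{\wedge}$ is
$$\hat\Sigma_Y = \{F \triangle A : F \in \Sigma_Y,\ A \subseteq Y \text{ is } \mu_Y\text{-negligible}\}$$
$$= \{(E \cap Y) \triangle (A \cap Y) : E \in \Sigma,\ A \subseteq X \text{ is } \mu\text{-negligible}\}$$
(214Cb)
$$= \{(E \triangle A) \cap Y : E \in \Sigma,\ A \text{ is } \mu\text{-negligible}\} = \mathrm{dom}\,\hat\mu_Y.$$
If $H \in \hat\Sigma_Y$ then
$$(\mu_Y)^{\wedge}(H) = \mu_Y^* H = \mu^* H = (\hat\mu)^* H = \hat\mu_Y H,$$
using 214Cd for the second step, and 212Ea for the third.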
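(c) Take $U \in \Sigma_Y$ such that $\mu_Y U > 0$. Then there is an $E \in \Sigma$ such that $\mu E < \infty$ and $\mu^*(E \cap U) > 0$. P ?
Otherwise, $E \cap U$ is $\mu$-negligible whenever $\mu E < \infty$; because $\mu$ has locally determined negligible sets, $U$ is $\mu$-negligible
and $\mu_Y U = \mu^* U = 0$. X Q Now $E \cap U \in \Sigma_Y$ and
$$0 < \mu^*(E \cap U) = \mu_Y(E \cap U) \le \mu E < \infty.$$
(d) By (a), $\mu_Y$ is complete; by (c) and 213J, it is semi-finite.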
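(e) By (d), $\mu_Y$ is complete and semi-finite. To see that it is locally determined, take any $U \subseteq Y$ such that $U \cap V \in \Sigma_Y$
whenever $V \in \Sigma_Y$ and $\mu_Y V < \infty$. By 213L and 213J, there is a measurable envelope $E$ of $U$ for $\mu$; of course $E \cap Y \in \Sigma_Y$.
I claim that $\mu(E \cap Y \setminus U) = 0$. P Take any $F \in \Sigma$ with $\mu F < \infty$. Then $F \cap U \in \Sigma_Y$, so
$$\mu_Y(F \cap E \cap Y) \le \mu(F \cap E) = \mu^*(F \cap U) = \mu_Y(F \cap U) \le \mu_Y(F \cap E \cap Y);$$
thus $\mu_Y(F \cap E \cap Y) = \mu_Y(F \cap U)$ and
$$\mu^*(F \cap E \cap Y \setminus U) = \mu_Y(F \cap E \cap Y \setminus U) = 0.$$
Because $\mu$ is complete, $\mu(F \cap E \cap Y \setminus U) = 0$; because $\mu$ is locally determined and $F$ is arbitrary, $\mu(E \cap Y \setminus U) = 0$.
Q But this means that $E \cap Y \setminus U \in \Sigma_Y$ and $U \in \Sigma_Y$. As $U$ is arbitrary, $\mu_Y$ is locally determined.
To see that $\mu_Y$ is localizable, let $\mathcal{U}$ be any family in $\Sigma_Y$. Set
$$\mathcal{E} = \{E : E \in \Sigma,\ \mu E < \infty,\ \mu E = \mu^*(E \cap U) \text{ for some } U \in \mathcal{U}\},$$
and let $G \in \Sigma$ be an essential supremum for $\mathcal{E}$ in $\Sigma$. I claim that $G \cap Y$ is an essential supremum for $\mathcal{U}$ in $\Sigma_Y$. P (i)
? If $U \in \mathcal{U}$ and $U \setminus (G \cap Y)$ is not negligible, then (because $\mu_Y$ is semi-finite) there is a $V \in \Sigma_Y$ such that $V \subseteq U \setminus G$
and $0 < \mu_Y V < \infty$. Now there is an $E \in \Sigma$ such that $V \subseteq E$ and $\mu E = \mu^* V$. We have $\mu^*(E \cap U) \ge \mu^* V = \mu E$,
so $E \in \mathcal{E}$ and $E \setminus G$ must be negligible; but $V \subseteq E \setminus G$ is not negligible. X Thus $U \setminus (G \cap Y)$ is negligible for every
$U \in \mathcal{U}$. (ii) If $W \in \Sigma_Y$ is such that $U \setminus W$ is negligible for every $U \in \mathcal{U}$, express $W$ as $H \cap Y$ where $H \in \Sigma$. If $E \in \mathcal{E}$,
there is a $U \in \mathcal{U}$ such that $\mu E = \mu^*(E \cap U)$; now $\mu^*(E \cap U \setminus W) = 0$, so $\mu E = \mu^*(E \cap U \cap W) \le \mu(E \cap H)$ and
$E \setminus H$ is negligible. As $E$ is arbitrary, $H$ is an essential upper bound for $\mathcal{E}$ and $G \setminus H$ is negligible; but this means that
$G \cap Y \setminus W$ is negligible. As $W$ is arbitrary, $G \cap Y$ is an essential supremum for $\mathcal{U}$. Q
As $\mathcal{U}$ is arbitrary, $\mu_Y$ is localizable.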
214J Upper and lower integrals The following elementary facts are sometimes useful.
Proposition Let $(X, \Sigma, \mu)$ be a measure space, $A$ a subset of $X$ and $f$ a real-valued function defined almost everywhere
in $X$. Then
(a) if either $f$ is non-negative or $A$ has full outer measure in $X$, $\overline{\int} (f \upharpoonright A)\,d\mu_A \le \overline{\int} f\,d\mu$;
(b) if $A$ has full outer measure in $X$, $\underline{\int} f\,d\mu \le \underline{\int} (f \upharpoonright A)\,d\mu_A$.
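proof (a)(i) Suppose that $f$ is non-negative. If $\overline{\int} f\,d\mu = \infty$, the result is trivial. Otherwise, there is a $\mu$-integrable
function $g$ such that $f \le g$ $\mu$-a.e. and $\overline{\int} f\,d\mu = \int g\,d\mu$, by 133Ja. Now $f \upharpoonright A \le g \upharpoonright A$ $\mu_A$-a.e., by 214Cb, and
$\int (g \upharpoonright A)\,d\mu_A$ is defined and less than or equal to $\int g\,d\mu$, by 214Ea; so
$$\overline{\int} (f \upharpoonright A)\,d\mu_A \le \int (g \upharpoonright A)\,d\mu_A \le \int g\,d\mu = \overline{\int} f\,d\mu.$$
(ii) Now suppose that $A$ has full outer measure in $X$. If $g$ is such that $f \le g$ $\mu$-a.e. and $\int g\,d\mu$ is defined in
$[-\infty, \infty]$, then $f \upharpoonright A \le g \upharpoonright A$ $\mu_A$-a.e. and $\int (g \upharpoonright A)\,d\mu_A = \int g\,d\mu$, by 214F. So $\overline{\int} (f \upharpoonright A)\,d\mu_A \le \int g\,d\mu$. As $g$ is arbitrary,
$\overline{\int} (f \upharpoonright A)\,d\mu_A \le \overline{\int} f\,d\mu$.
(b) Apply (a) to $-f$, and use 133J(b-iv).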
214K Measurable subspaces: Proposition Let $(X, \Sigma, \mu)$ be a measure space.
(a) Let $E \in \Sigma$ and let $\mu_E$ be the subspace measure, with $\Sigma_E$ its domain. If $(X, \Sigma, \mu)$ is complete, or totally finite,
or $\sigma$-finite, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or purely atomic, so
is $(E, \Sigma_E, \mu_E)$.
(b) Suppose that $\langle X_i\rangle_{i\in I}$ is a partition of $X$ into measurable sets (not necessarily of finite measure) such that
$$\Sigma = \{E : E \subseteq X,\ E \cap X_i \in \Sigma \text{ for every } i \in I\},$$
$$\mu E = \sum_{i\in I} \mu(E \cap X_i) \text{ for every } E \in \Sigma.$$
Then $(X, \Sigma, \mu)$ is complete, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or
purely atomic, iff $(X_i, \Sigma_{X_i}, \mu_{X_i})$ has that property for every $i \in I$.
proof I really think that if you have read attentively up to this point, you ought to find this easy. If you are in any
doubt, this makes a very suitable set of sixteen exercises to do.
214L Direct sums Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be any indexed family of measure spaces. Set $X = \bigcup_{i\in I} (X_i \times \{i\})$; for
$E \subseteq X$, $i \in I$ set $E_i = \{x : (x, i) \in E\}$. Write
$$\Sigma = \{E : E \subseteq X,\ E_i \in \Sigma_i \text{ for every } i \in I\},$$
$$\mu E = \sum_{i\in I} \mu_i E_i \text{ for every } E \in \Sigma.$$
Then it is easy to check that $(X, \Sigma, \mu)$ is a measure space; I will call it the direct sum of the family $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$.
Note that if $(X, \Sigma, \mu)$ is any strictly localizable measure space, with decomposition $\langle X_i\rangle_{i\in I}$, then we have a natural
isomorphism between $(X, \Sigma, \mu)$ and the direct sum $(X', \Sigma', \mu') = \bigoplus_{i\in I} (X_i, \Sigma_{X_i}, \mu_{X_i})$ of the subspace measures, if we
match $(x, i) \in X'$ with $x \in X$ for every $i \in I$ and $x \in X_i$.
For some of the elementary properties (to put it plainly, I know of no properties which are not elementary) of direct
sums, see 214M and 214Xh-214Xk.
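A concrete instance of the construction may help fix the notation. The two summands below are assumptions chosen for illustration (they do not appear in the text): Lebesgue measure on the unit interval and a single point of mass 3.

```latex
% Illustrative example only; the two summands are assumed.
Let $I = \{0,1\}$; let $(X_0, \Sigma_0, \mu_0)$ be $[0,1]$ with Lebesgue
measure, and let $X_1 = \{p\}$, $\Sigma_1 = \{\emptyset, \{p\}\}$,
$\mu_1\{p\} = 3$. The direct sum has underlying set
\[
  X = ([0,1] \times \{0\}) \cup \{(p,1)\}.
\]
For $E = ([0,\tfrac12] \times \{0\}) \cup \{(p,1)\}$ we have
$E_0 = [0,\tfrac12] \in \Sigma_0$ and $E_1 = \{p\} \in \Sigma_1$, so
$E \in \Sigma$ and
\[
  \mu E = \mu_0 E_0 + \mu_1 E_1 = \tfrac12 + 3 = \tfrac72 .
\]
```

The tags $0$ and $1$ in the second coordinate serve only to keep the summands disjoint; they play no role in the measure.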
214M Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$. Let $f$ be a
real-valued function defined on a subset of $X$. For each $i \in I$, set $f_i(x) = f(x, i)$ whenever $(x, i) \in \mathrm{dom} f$.
(a) $f$ is measurable iff $f_i$ is measurable for every $i \in I$.
(b) If $f$ is non-negative, then $\int f\,d\mu = \sum_{i\in I} \int f_i\,d\mu_i$ if either is defined in $[0, \infty]$.
proof (a) For $a \in \mathbb{R}$, set $F_a = \{(x, i) : (x, i) \in \mathrm{dom} f,\ f(x, i) \ge a\}$. (i) If $f$ is measurable, $i \in I$ and $a \in \mathbb{R}$, then there
is an $E \in \Sigma$ such that $F_a = E \cap \mathrm{dom} f$; now
$$\{x : f_i(x) \ge a\} = \mathrm{dom} f_i \cap \{x : (x, i) \in E\}$$
belongs to the subspace $\sigma$-algebra on $\mathrm{dom} f_i$ induced by $\Sigma_i$. As $a$ is arbitrary, $f_i$ is measurable. (ii) If every $f_i$ is
measurable and $a \in \mathbb{R}$, then for each $i \in I$ there is an $E_i \in \Sigma_i$ such that $\{x : (x, i) \in F_a\} = E_i \cap \mathrm{dom} f_i$; setting
$E = \{(x, i) : i \in I,\ x \in E_i\}$, $F_a = \mathrm{dom} f \cap E$ belongs to the subspace $\sigma$-algebra on $\mathrm{dom} f$. As $a$ is arbitrary, $f$ is
measurable.
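(b)(i) Suppose first that $f$ is measurable and defined everywhere. Set $F_{nk} = \{(x, i) : (x, i) \in X,\ f(x, i) \ge 2^{-n}k\}$ for
$k, n \in \mathbb{N}$, $g_n = \sum_{k=1}^{4^n} 2^{-n} \chi F_{nk}$ for $n \in \mathbb{N}$, $F_{nki} = \{x : (x, i) \in F_{nk}\}$ for $k, n \in \mathbb{N}$ and $i \in I$, $g_{ni}(x) = g_n(x, i)$ for $i \in I$,
$x \in X_i$. Then
$$\int f\,d\mu = \lim_{n\to\infty} \int g_n\,d\mu = \sup_{n\in\mathbb{N}} \sum_{k=1}^{4^n} 2^{-n} \mu F_{nk} = \sup_{n\in\mathbb{N}} \sum_{k=1}^{4^n} \sum_{i\in I} 2^{-n} \mu_i F_{nki}$$
$$= \sum_{i\in I} \sup_{n\in\mathbb{N}} \sum_{k=1}^{4^n} 2^{-n} \mu_i F_{nki} = \sum_{i\in I} \sup_{n\in\mathbb{N}} \int g_{ni}\,d\mu_i = \sum_{i\in I} \int f_i\,d\mu_i.$$
(ii) Generally, if $\int f\,d\mu$ is defined, there are a measurable $g : X \to [0, \infty[$ and a conegligible measurable set $E \subseteq \mathrm{dom} f$
such that $g = f$ on $E$. Now $E_i = \{x : (x, i) \in E\}$ belongs to $\Sigma_i$ for each $i$, and
$$\sum_{i\in I} \mu_i(X_i \setminus E_i) = \mu(X \setminus E) = 0,$$
so $E_i$ is $\mu_i$-conegligible for every $i$. Setting $g_i(x) = g(x, i)$ for $x \in X_i$, (i) tells us that
$$\sum_{i\in I} \int f_i\,d\mu_i = \sum_{i\in I} \int g_i\,d\mu_i = \int g\,d\mu = \int f\,d\mu.$$
(iii) On the other hand, if $\int f_i\,d\mu_i$ is defined for every $i \in I$, then for each $i \in I$ we can find a measurable function
$g_i : X_i \to [0, \infty[$ and a $\mu_i$-conegligible measurable set $E_i \subseteq \mathrm{dom} f_i$ such that $g_i = f_i$ on $E_i$. Setting $g(x, i) = g_i(x)$ for
$i \in I$, $x \in X_i$, (a) tells us that $g$ is measurable, while $g = f$ on $\{(x, i) : i \in I,\ x \in E_i\}$, which is conegligible (by the
calculation in (ii) just above); so
$$\int f\,d\mu = \int g\,d\mu = \sum_{i\in I} \int g_i\,d\mu_i = \sum_{i\in I} \int f_i\,d\mu_i,$$
again using (i) for the middle step.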
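The formula of 214M(b) can be checked on a direct sum with two summands. The spaces and the function below are illustrative assumptions of mine, not examples from the text.

```latex
% Illustrative example only; the summands and the function f are assumed.
Let $(X_0, \Sigma_0, \mu_0)$ be $[0,1]$ with Lebesgue measure and let
$X_1 = \{p\}$ carry the measure with $\mu_1\{p\} = 3$; let $(X, \Sigma, \mu)$
be their direct sum. Define $f : X \to [0,\infty[$ by
\[
  f(x, 0) = x^2 \ \text{ for } x \in [0,1], \qquad f(p, 1) = 2 .
\]
Then $f_0(x) = x^2$, $f_1(p) = 2$, and
\[
  \int f \, d\mu
  = \int f_0 \, d\mu_0 + \int f_1 \, d\mu_1
  = \int_0^1 x^2 \, dx + 2 \cdot 3
  = \tfrac13 + 6 = \tfrac{19}{3} .
\]
```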
214N Corollary Let $(X, \Sigma, \mu)$ be a measure space with a decomposition $\langle X_i\rangle_{i\in I}$. If $f$ is a real-valued function
defined on a subset of $X$, then
(a) $f$ is measurable iff $f \upharpoonright X_i$ is measurable for every $i \in I$,
(b) if $f \ge 0$, then $\int f = \sum_{i\in I} \int_{X_i} f$ if either is defined in $[0, \infty]$.
proof Apply 214M to the direct sum of $\langle (X_i, \Sigma_{X_i}, \mu_{X_i})\rangle_{i\in I}$, identified with $(X, \Sigma, \mu)$ as in 214L.
*214O I make space here for a general theorem which puts rather heavy demands on the reader. So I ought to say
that I advise skipping it on first reading. It will not be quoted in this volume, in the full form here I do not expect
to use it anywhere in this treatise, only the special case of 214Xm is at all often applied, and the proof depends on a
concept ('ideal of sets') and a technique ('transfinite induction', part (d) of the proof of 214P) which are used nowhere
else in this volume. However, 'extension of measures' is one of the central themes of Volume 4, and this result may
help to make sense of some of the patterns which will appear there.
Lemma Let $(X, \Sigma, \mu)$ be a measure space, and $\mathcal{I}$ an ideal of subsets of $X$, that is, a family of subsets of $X$ such that
$\emptyset \in \mathcal{I}$, $I \cup J \in \mathcal{I}$ for all $I, J \in \mathcal{I}$, and $I \in \mathcal{I}$ whenever $I \subseteq J \in \mathcal{I}$. Then there is a measure $\lambda$ on $X$ such that
$\Sigma \cup \mathcal{I} \subseteq \mathrm{dom}\,\lambda$, $\mu E = \lambda E + \sup_{I\in\mathcal{I}} \mu^*(E \cap I)$ for every $E \in \Sigma$, and $\lambda I = 0$ for every $I \in \mathcal{I}$.
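proof (a) Let $\Lambda$ be the set of those $F \subseteq X$ such that there are $E \in \Sigma$ and a countable $\mathcal{J} \subseteq \mathcal{I}$ such that $E \triangle F \subseteq \bigcup \mathcal{J}$.
Then $\Lambda$ is a $\sigma$-algebra of subsets of $X$ including $\Sigma \cup \mathcal{I}$. P $\Sigma \subseteq \Lambda$ because $E \triangle E \subseteq \bigcup \emptyset$ for every $E \in \Sigma$. $\mathcal{I} \subseteq \Lambda$ because
$\emptyset \triangle I \subseteq \bigcup \{I\}$ for every $I \in \mathcal{I}$. In particular, $\emptyset \in \Lambda$. If $F \in \Lambda$, let $E \in \Sigma$ and $\mathcal{J} \subseteq \mathcal{I}$ be such that $\mathcal{J}$ is countable and
$F \triangle E \subseteq \bigcup \mathcal{J}$; then $(X \setminus F) \triangle (X \setminus E) \subseteq \bigcup \mathcal{J}$ so $X \setminus F \in \Lambda$. If $\langle F_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Lambda$ with union $F$, then for
each $n \in \mathbb{N}$ choose $E_n \in \Sigma$, $\mathcal{J}_n \subseteq \mathcal{I}$ such that $\mathcal{J}_n$ is countable and $E_n \triangle F_n \subseteq \bigcup \mathcal{J}_n$; then $E = \bigcup_{n\in\mathbb{N}} E_n$ belongs to $\Sigma$,
$\mathcal{J} = \bigcup_{n\in\mathbb{N}} \mathcal{J}_n$ is a countable subset of $\mathcal{I}$ and $E \triangle F \subseteq \bigcup \mathcal{J}$, so $F \in \Lambda$. Thus $\Lambda$ is a $\sigma$-algebra. Q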
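(b) For $F \in \Lambda$ set
$$\lambda F = \sup\{\mu E : E \in \Sigma,\ E \subseteq F,\ \mu^*(E \cap I) = 0 \text{ for every } I \in \mathcal{I}\}.$$
Then $\lambda$ is a measure. P The only subset of $\emptyset$ is $\emptyset$, so $\lambda\emptyset = 0$. Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Lambda$ with union $F$;
set $u = \sum_{n=0}^{\infty} \lambda F_n$. (i) If $E \in \Sigma$, $E \subseteq F$ and $\mu^*(E \cap I) = 0$ for every $I \in \mathcal{I}$, then for each $n$ set $E_n = E \cap F_n$. As
$\mu^*(E_n \cap I) = 0$ for every $I \in \mathcal{I}$, $\mu E_n \le \lambda F_n$ for each $n$. Now $\langle E_n\rangle_{n\in\mathbb{N}}$ is disjoint and has union $E$, so
$$\mu E = \sum_{n=0}^{\infty} \mu E_n \le \sum_{n=0}^{\infty} \lambda F_n = u.$$
As $E$ is arbitrary, $\lambda F \le u$. (ii) Take any $\gamma < u$. For $n \in \mathbb{N}$, set $\gamma_n = \lambda F_n - 2^{-n-1} \min(1, u - \gamma)$ if $\lambda F_n$ is finite, $\gamma$
otherwise. For each $n$, we can find an $E_n \in \Sigma$ such that $E_n \subseteq F_n$, $\mu^*(E_n \cap I) = 0$ for every $I \in \mathcal{I}$, and $\mu E_n \ge \gamma_n$. Set
$E = \bigcup_{n\in\mathbb{N}} E_n$; then $E \subseteq F$ and $E \cap I = \bigcup_{n\in\mathbb{N}} E_n \cap I$ is $\mu$-negligible for every $I \in \mathcal{I}$, so $\lambda F \ge \mu E = \sum_{n=0}^{\infty} \mu E_n \ge \gamma$.
As $\gamma$ is arbitrary, $\lambda F \ge u$. (iii) As $\langle F_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\lambda$ is a measure. Q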
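(c) Now take any $E \in \Sigma$ and set $u = \sup_{I\in\mathcal{I}} \mu^*(E \cap I)$. If $u = \infty$ then we certainly have $\mu E = \infty = \lambda E + u$.
Otherwise, let $\langle I_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{I}$ such that $\lim_{n\to\infty} \mu^*(E \cap I_n) = u$; replacing $I_n$ by $\bigcup_{m\le n} I_m$ for each $n$
if necessary, we may suppose that $\langle I_n\rangle_{n\in\mathbb{N}}$ is non-decreasing. Set $A = E \cap \bigcup_{n\in\mathbb{N}} I_n$; because $E \cap I_n$ has finite outer
measure for each $n$, $A$ can be covered by a sequence of sets of finite measure, and has a measurable envelope $H$ for $\mu$
included in $E$ (132Ee). Observe that
$$\mu H = \mu^* A = \sup_{n\in\mathbb{N}} \mu^*(E \cap I_n) = u$$
by 132Ae.
Set $G = E \setminus H$. Then $\mu^*(G \cap I) = 0$ for every $I \in \mathcal{I}$. P For any $n \in \mathbb{N}$ there is an $F \in \Sigma$ such that $F \supseteq E \cap (I_n \cup I)$
and $\mu F \le u$; in which case
$$\mu^*(G \cap I) + \mu^*(E \cap I_n) \le \mu(F \setminus H) + \mu(F \cap H) \le u.$$
As $n$ is arbitrary, $\mu^*(G \cap I) = 0$. Q Accordingly
$$u + \lambda E \ge \mu H + \mu G = \mu E.$$
On the other hand, if $F \in \Sigma$ is such that $F \subseteq E$ and $\mu^*(F \cap I) = 0$ for every $I \in \mathcal{I}$, then
$$\mu^*(E \cap I_n) \le \mu(E \setminus F) + \mu^*(F \cap I_n) = \mu(E \setminus F)$$
for every $n$, so
$$u + \mu F \le \mu(E \setminus F) + \mu F = \mu E;$$
as $F$ is arbitrary, $u + \lambda E \le \mu E$.
(d) If $J \in \mathcal{I}$, $F \in \Sigma$, $F \subseteq J$ and $\mu^*(F \cap I) = 0$ for every $I \in \mathcal{I}$, then $F \cap J = F$ is $\mu$-negligible; as $F$ is arbitrary,
$\lambda J = 0$. Thus $\lambda$ has all the required properties.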
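To see what the lemma produces in the simplest case, here is a worked instance on a finite space. The space and the ideal are illustrative assumptions of mine, not taken from the text; the symbols $\lambda$ and $\Lambda$ denote the measure and $\sigma$-algebra constructed in the proof.

```latex
% Illustrative example only; the space and the ideal are assumed.
Let $X = \{1,2,3\}$, $\Sigma = \{\emptyset, \{1,2\}, \{3\}, X\}$,
$\mu\{1,2\} = 1$, $\mu\{3\} = 2$, and let
$\mathcal{I} = \mathcal{P}\{1\} = \{\emptyset, \{1\}\}$.
Every subset of $X$ differs from a member of $\Sigma$ by a subset of $\{1\}$,
so $\Lambda = \mathcal{P}X$. The members $E$ of $\Sigma$ with
$\mu^*(E \cap \{1\}) = 0$ are $\emptyset$ and $\{3\}$, so
\[
  \lambda F = \sup\{\mu E : E \in \{\emptyset, \{3\}\},\ E \subseteq F\}
  = \begin{cases} 2 & \text{if } 3 \in F, \\ 0 & \text{otherwise.} \end{cases}
\]
Checking $\mu E = \lambda E + \mu^*(E \cap \{1\})$ on $\Sigma$:
\[
  \mu\{1,2\} = 1 = 0 + 1, \qquad
  \mu\{3\} = 2 = 2 + 0, \qquad
  \mu X = 3 = 2 + 1 ,
\]
and $\lambda\{1\} = 0$ as required.
```

Note that in this example $\lambda$ is not an extension of $\mu$ (for instance $\lambda\{1,2\} = 0$); the lemma only asserts the displayed decomposition.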
*214P Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $\mathcal{A}$ a family of subsets of $X$ which is well-ordered by the
relation $\subseteq$. Then there is an extension of $\mu$ to a measure $\lambda$ on $X$ such that $\lambda(E \cap A)$ is defined and equal to $\mu^*(E \cap A)$
whenever $E \in \Sigma$ and $A \in \mathcal{A}$.
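proof (a) Adding $\emptyset$ and $X$ to $\mathcal{A}$ if necessary, we may suppose that $\mathcal{A}$ has $\emptyset$ as its least member and $X$ as its greatest
member. By 2A1Dg, $\mathcal{A}$ is isomorphic, as ordered set, to some ordinal; since $\mathcal{A}$ has a greatest member, this ordinal is a
successor, expressible as $\zeta + 1$; let $\xi \mapsto A_\xi : \zeta + 1 \to \mathcal{A}$ be the order-isomorphism, so that $\langle A_\xi\rangle_{\xi\le\zeta}$ is a non-decreasing
family of subsets of $X$, $A_0 = \emptyset$ and $A_\zeta = X$.
(b) For each ordinal $\xi \le \zeta$, write $\mu_\xi$ for the subspace measure on $A_\xi$, $\Sigma_\xi$ for its domain and $\mathcal{I}_\xi$ for $\bigcup_{\eta<\xi} \mathcal{P}A_\eta$.
Because $A_\eta \cup A_{\eta'} = A_{\max(\eta,\eta')}$ for $\eta, \eta' < \xi$, $\mathcal{I}_\xi$ is an ideal of subsets of $A_\xi$. By 214O, we have a measure $\lambda_\xi$ on $A_\xi$,
with domain $\Lambda_\xi$, such that $\mu_\xi E = \lambda_\xi E + \sup_{I\in\mathcal{I}_\xi} \mu_\xi^*(E \cap I)$ for every $E \in \Sigma_\xi$ and $\lambda_\xi I = 0$ for every $I \in \mathcal{I}_\xi$.
Because every member of $\mathcal{I}_\xi$ is included in $A_\eta$ for some $\eta < \xi$, we have
$$\mu^*(E \cap A_\xi) = \lambda_\xi(E \cap A_\xi) + \sup_{\eta<\xi} \mu_\xi^*(E \cap A_\eta) = \lambda_\xi(E \cap A_\xi) + \sup_{\eta<\xi} \mu^*(E \cap A_\eta)$$
(214Cd) for every $E \in \Sigma$. Also, of course, $\lambda_\xi A_\eta = 0$ for every $\eta < \xi$.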
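(c) Now set
$$\Lambda = \{F : F \subseteq X,\ F \cap A_\xi \in \Lambda_\xi \text{ for every } \xi \le \zeta\},$$
$$\lambda F = \sum_{\xi\le\zeta} \lambda_\xi(F \cap A_\xi)$$
for every $F \in \Lambda$. Because $\Lambda_\xi$ is a $\sigma$-algebra of subsets of $A_\xi$ for each $\xi$, $\Lambda$ is a $\sigma$-algebra of subsets of $X$; because every
$\lambda_\xi$ is a measure, so is $\lambda$. If $E \in \Sigma$, then
$E \cap A_\xi \in \Sigma_\xi \subseteq \Lambda_\xi$ for each $\xi$, so $E \in \Lambda$. If $\eta \le \zeta$, then for each $\xi \le \zeta$ either $\eta < \xi$ and
$A_\eta \cap A_\xi = A_\eta \in \mathcal{I}_\xi \subseteq \Lambda_\xi$ or $\eta \ge \xi$ and $A_\eta \cap A_\xi = A_\xi$ belongs to $\Lambda_\xi$. So $A_\eta \in \Lambda$ for every $\eta \le \zeta$.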
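(d) Finally, $\lambda(E \cap A_\xi) = \mu^*(E \cap A_\xi)$ whenever $E \in \Sigma$ and $\xi \le \zeta$. P ? Otherwise, because the ordinal $\zeta + 1$ is
well-ordered, there is a least $\xi$ such that $\lambda(E \cap A_\xi) \ne \mu^*(E \cap A_\xi)$. As $A_0 = \emptyset$ we surely have $\lambda(E \cap A_0) = \mu^*(E \cap A_0)$
and $\xi > 0$. Note that if $\eta > \xi$, then $\lambda_\eta(E \cap A_\xi) = 0$; so
$$\lambda(E \cap A_\xi) = \sum_{\eta\le\zeta} \lambda_\eta(E \cap A_\xi \cap A_\eta) = \sum_{\eta\le\xi} \lambda_\eta(E \cap A_\eta).$$
Now
$$\mu^*(E \cap A_\xi) = \lambda_\xi(E \cap A_\xi) + \sup_{\eta<\xi} \mu^*(E \cap A_\eta)$$
((b) above)
$$= \lambda_\xi(E \cap A_\xi) + \sup_{\eta<\xi} \lambda(E \cap A_\eta)$$
(because $\xi$ was the first problematic ordinal)
$$= \lambda_\xi(E \cap A_\xi) + \sup_{\eta<\xi}\ \sup_{K \subseteq \eta+1 \text{ is finite}} \sum_{\theta\in K} \lambda_\theta(E \cap A_\theta)$$
(see the definition of 'sum' in 112Bd, or 226A below)
$$= \lambda_\xi(E \cap A_\xi) + \sup_{K \subseteq \xi \text{ is finite}} \sum_{\eta\in K} \lambda_\eta(E \cap A_\eta)$$
$$= \sup_{K \subseteq \xi+1 \text{ is finite}} \sum_{\eta\in K} \lambda_\eta(E \cap A_\eta) = \lambda(E \cap A_\xi) \ne \mu^*(E \cap A_\xi)$$
by the choice of $\xi$; but this is absurd. X Q
In particular,
$$\lambda E = \lambda(E \cap A_\zeta) = \mu^*(E \cap A_\zeta) = \mu E$$
for every $E \in \Sigma$. This completes the proof of the theorem.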
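In the special case of a single set $A$ the extension can be written down explicitly, using the formula suggested in the hint to 214Xm. The following check on a three-point space is an illustration under my own assumptions (the space and the set $A$ are not from the text), and it reads the hint as $\lambda((E \cap A) \cup (F \setminus A)) = \mu^*(E \cap A) + \sup\{\mu G : G \in \Sigma,\ G \subseteq F \setminus A\}$ for $E, F \in \Sigma$.

```latex
% Illustrative example only; the space and the set A are assumed.
Let $X = \{1,2,3\}$, $\Sigma = \{\emptyset, \{1,2\}, \{3\}, X\}$,
$\mu\{1,2\} = 1$, $\mu\{3\} = 2$, and $A = \{1\}$. Then
\[
  \lambda\{1\} = \mu^*\{1\} = 1, \qquad
  \lambda\{2\} = \sup\{\mu G : G \in \Sigma,\ G \subseteq \{2\}\} = 0, \qquad
  \lambda\{3\} = \mu\{3\} = 2 ,
\]
taking $(E,F) = (\{1,2\}, \emptyset)$, $(\emptyset, \{1,2\})$ and
$(\emptyset, \{3\})$ respectively. The resulting measure on $\mathcal{P}X$
extends $\mu$:
\[
  \lambda\{1,2\} = 1 + 0 = \mu\{1,2\}, \qquad \lambda X = 1 + 0 + 2 = \mu X ,
\]
and $\lambda(E \cap A) = \mu^*(E \cap A)$ for every $E \in \Sigma$.
```

The whole of the mass of the atom $\{1,2\}$ is assigned to its part inside $A$, which is what the condition $\lambda(E \cap A) = \mu^*(E \cap A)$ forces.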
214X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a localizable measure space. Show that there is an $E \in \Sigma$ such that
the subspace measure $\mu_E$ is purely atomic and $\mu_{X \setminus E}$ is atomless.
(b) Let $X$ be a set, $\theta$ a regular outer measure on $X$, and $Y$ a subset of $X$. Let $\mu$ be the measure on $X$ defined by
Carathéodory's method from $\theta$, $\mu_Y$ the subspace measure on $Y$, and $\nu$ the measure on $Y$ defined by Carathéodory's
method from $\theta \upharpoonright \mathcal{P}Y$. Show that if $\mu_Y$ is locally determined (in particular, if $\mu$ is locally determined and localizable)
then $\nu = \mu_Y$.
(c) Let $(X, \Sigma, \mu)$ be a localizable measure space, and $Y$ a subset of $X$ such that the subspace measure $\mu_Y$ is
semi-finite. Show that $\mu_Y$ is localizable.
>(d) Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ a subset of $X$ such that the subspace measure $\mu_Y$ is semi-finite. (i)
Show that a set $F \subseteq Y$ is an atom for $\mu_Y$ iff it is of the form $E \cap Y$ where $E$ is an atom for $\mu$. (ii) Show that if $\mu$ is
atomless or purely atomic, so is $\mu_Y$.
(e) Let $(X, \Sigma, \mu)$ be a localizable measure space, and $Y$ any subset of $X$. Show that the c.l.d. version of the subspace
measure on $Y$ is localizable.
(f) Let $(X, \Sigma, \mu)$ be a measure space with locally determined negligible sets, and $Y$ a subset of $X$, with its subspace
measure $\mu_Y$. Show that $\mu_Y$ has locally determined negligible sets.
>(g) Let $(X, \Sigma, \mu)$ be a measure space. Show that $(X, \Sigma, \mu)$ has locally determined negligible sets iff the subspace
measure $\mu_Y$ is semi-finite for every $Y \subseteq X$.
>(h) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X, \Sigma, \mu)$ (214L). Set $X'_i = X_i \times \{i\} \subseteq X$
for each $i \in I$. Show that $X'_i$, with the subspace measure, is isomorphic to $(X_i, \Sigma_i, \mu_i)$. Under what circumstances is
$\langle X'_i\rangle_{i\in I}$ a decomposition of $X$? Show that $\mu$ is complete, or strictly localizable, or localizable, or locally determined,
or semi-finite, or atomless, or purely atomic iff every $\mu_i$ is. Show that a measure space is strictly localizable iff it is
isomorphic to a direct sum of totally finite spaces.
>(i) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that the completion
of $(X, \Sigma, \mu)$ can be identified with the direct sum of the completions of the $(X_i, \Sigma_i, \mu_i)$, and that the c.l.d. version of
$(X, \Sigma, \mu)$ can be identified with the direct sum of the c.l.d. versions of the $(X_i, \Sigma_i, \mu_i)$.
(j) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces. Show that their direct sum has locally determined negligible
sets iff every $\mu_i$ has.
(k) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that $(X, \Sigma, \mu)$ has the
measurable envelope property (213Xl) iff every $(X_i, \Sigma_i, \mu_i)$ has.
(l) Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a subset of $X$, and $f : X \to [0, \infty]$ a function such that $\int_Y f$ is defined in
$[0, \infty]$. Show that $\int_Y f = \overline{\int} f \times \chi Y\,d\mu$.
>(m) Write out a direct proof of 214P in the special case in which $\mathcal{A} = \{A\}$. (Hint: for $E, F \in \Sigma$,
$\lambda((E \cap A) \cup (F \setminus A)) = \mu^*(E \cap A) + \sup\{\mu G : G \in \Sigma,\ G \subseteq F \setminus A\}$.)
>(n) Let $(X, \Sigma, \mu)$ be a measure space and $\mathcal{A}$ a finite family of subsets of $X$. Show that there is a measure on $X$,
extending $\mu$, which measures every member of $\mathcal{A}$.
214Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $A$ a subset of $X$ such that the subspace measure
on $A$ is semi-finite. Set $\alpha = \sup\{\mu E : E \in \Sigma,\ E \subseteq A\}$. Show that if $\alpha \le \gamma \le \mu^* A$ then there is a measure $\lambda$ on $X$,
extending $\mu$, such that $\lambda A = \gamma$.
(b) Let $(X, \Sigma, \mu)$ be a measure space and $\langle A_n\rangle_{n\in\mathbb{Z}}$ a double-ended sequence of subsets of $X$ such that $A_m \subseteq A_n$
whenever $m \le n$ in $\mathbb{Z}$. Show that there is a measure on $X$, extending $\mu$, which measures every $A_n$. (Hint: use 214P
twice.)
(c) Let $X$ be a set and $\mathcal{A}$ a family of subsets of $X$. Show that the following are equiveridical: (i) for every measure
$\mu$ on $X$ there is a measure on $X$ extending $\mu$ and measuring every member of $\mathcal{A}$; (ii) for every totally finite measure
$\mu$ on $X$ there is a measure on $X$ extending $\mu$ and measuring every member of $\mathcal{A}$. (Hint: 213Xa.)
214 Notes and comments I take the first part of the section, down to 214H, slowly and carefully, because while none
of the arguments are deep (214Eb is the longest) the patterns formed by the results are not always easy to predict.
There is a counter-example to a tempting extension of 214H/214Xb in 216Xb.
The message of the second part of the section (214I-214L) is that subspaces inherit many, but not all, of the properties
of a measure space; and in particular there is a difficulty with semi-finiteness, unless we have locally determined
negligible sets (214Xg). (I give an example in 216Xa.) Of course 213Hb shows that if we start with a localizable space,
we can convert it into a complete locally determined localizable space without doing great violence to the structure of
the space, so the difficulty is ordinarily superable.
By far the most important case of 214P is when $\mathcal{A} = \{A\}$ is a singleton, so that the argument simplifies dramatically
(214Xm). In §439 of Volume 4 I will return to the problem of extending a measure to a given larger $\sigma$-algebra in the
absence of any helpful auxiliary structure. That section will mostly offer counter-examples, in particular showing that
there is no general theorem extending 214Xn from finite families to countable families, and that the special conditions
in 214P and 214Yb are there for good reasons. But in §552 of Volume 5 I will present some positive results dependent
on special axioms beyond those of ZFC.
215 $\sigma$-finite spaces and the principle of exhaustion
I interpolate a short section to deal with some useful facts which might get lost if buried in one of the longer sections
of this chapter. The great majority of the applications of measure theory involve $\sigma$-finite spaces, to the point that
many authors skim over any others. I myself prefer to signal the importance of such concepts by explicitly stating just
which theorems apply only to the restricted class of spaces. But undoubtedly some facts about $\sigma$-finite spaces need to
be grasped early on. In 215B I give a list of properties characterizing $\sigma$-finite spaces. Some of these make better sense
in the light of the principle of exhaustion (215A). I take the opportunity to include a fundamental fact about atomless
measure spaces (215D).
215A The principle of exhaustion The following is an example of the use of one of the most important methods
in measure theory.
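Lemma Let $(X, \Sigma, \mu)$ be any measure space and $\mathcal{E} \subseteq \Sigma$ a non-empty set such that $\sup_{n\in\mathbb{N}} \mu F_n$ is finite for every
non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$.
(a) There is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that, for every $E \in \Sigma$, either there is an $n \in \mathbb{N}$ such that
$E \cup F_n$ is not included in any member of $\mathcal{E}$ or, setting $F = \bigcup_{n\in\mathbb{N}} F_n$,
$$\lim_{n\to\infty} \mu(E \setminus F_n) = \mu(E \setminus F) = 0.$$
In particular, if $E \in \mathcal{E}$ and $E \supseteq F$, then $E \setminus F$ is negligible.
(b) If $\mathcal{E}$ is upwards-directed, then there is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that, setting $F = \bigcup_{n\in\mathbb{N}} F_n$,
$\mu F = \sup_{E\in\mathcal{E}} \mu E$ and $E \setminus F$ is negligible for every $E \in \mathcal{E}$, so that $F$ is an essential supremum of $\mathcal{E}$ in $\Sigma$ in the sense of
211G.
(c) If the union of any non-decreasing sequence in $\mathcal{E}$ belongs to $\mathcal{E}$, then there is an $F \in \mathcal{E}$ such that $E \setminus F$ is negligible
whenever $E \in \mathcal{E}$ and $F \subseteq E$.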
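proof (a) Choose $\langle F_n\rangle_{n\in\mathbb{N}}$, $\langle \mathcal{E}_n\rangle_{n\in\mathbb{N}}$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ inductively, as follows. Take $F_0$ to be any member of $\mathcal{E}$. Given
$F_n \in \mathcal{E}$, set $\mathcal{E}_n = \{E : F_n \subseteq E \in \mathcal{E}\}$ and $u_n = \sup\{\mu E : E \in \mathcal{E}_n\}$ in $[0, \infty]$, and choose $F_{n+1} \in \mathcal{E}_n$ such that
$\mu F_{n+1} \ge \min(n, u_n - 2^{-n})$; continue.
Observe that this construction yields a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$. Since $\mathcal{E}_{n+1} \subseteq \mathcal{E}_n$ for every $n$, $\langle u_n\rangle_{n\in\mathbb{N}}$
is non-increasing, and has a limit $u$ in $[0, \infty]$. Since $\min(n, u - 2^{-n}) \le \mu F_{n+1} \le u_n$ for every $n$, $\lim_{n\to\infty} \mu F_n = u$. Our
hypothesis on $\mathcal{E}$ now tells us that $u$ is finite.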
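The limit claimed in the last step is a squeeze argument, which can be spelled out as follows. This is only an expansion of the inequality already stated in the proof, writing $u_n$, $u$ and $F_n$ for the quantities constructed there and $\mu$ for the measure.

```latex
% Expansion of the step "lim \mu F_n = u"; no new hypotheses are used.
Since $u \le u_n$ for every $n$,
\[
  \min(n,\, u - 2^{-n}) \;\le\; \min(n,\, u_n - 2^{-n})
  \;\le\; \mu F_{n+1} \;\le\; u_n .
\]
If $u = \infty$, the left-hand side is $n$, which tends to $\infty = u$.
If $u < \infty$, the left-hand side equals $u - 2^{-n}$ for all large $n$,
which tends to $u$. In either case
\[
  u = \lim_{n\to\infty} \min(n,\, u - 2^{-n})
  \;\le\; \liminf_{n\to\infty} \mu F_{n+1}
  \;\le\; \limsup_{n\to\infty} \mu F_{n+1}
  \;\le\; \lim_{n\to\infty} u_n = u ,
\]
so $\lim_{n\to\infty} \mu F_n = u$.
```

The truncation by $n$ in the choice of $F_{n+1}$ is what allows the construction to proceed even when some $u_n$ is infinite.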
If E is such that for every n N there is an E
n
c such that E F
n
E
n
, then E
n
c
n
, so
F
n
(E F
n
) E
n
u
n
for every n, and lim
n
(E F
n
) = u. But this means that
(E F) lim
n
(E F
n
) = lim
n
(E F
n
) F
n
= 0,
as stated. In particular, this is so if E c and E F.
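(b) Take $\langle F_n\rangle_{n\in\mathbb{N}}$ from (a). If $E\in\mathcal{E}$, then (because $\mathcal{E}$ is upwards-directed) $E\cup F_n$ is included in some member of $\mathcal{E}$ for every $n\in\mathbb{N}$; so we must have the second alternative of (a), and $E\setminus F$ is negligible. It follows that
$$\sup_{E\in\mathcal{E}}\mu E\le\mu F=\lim_{n\to\infty}\mu F_n\le\sup_{E\in\mathcal{E}}\mu E,$$
so $\mu F=\sup_{E\in\mathcal{E}}\mu E$.
If $G$ is any measurable set such that $E\setminus G$ is negligible for every $E\in\mathcal{E}$, then $F_n\setminus G$ is negligible for every $n$, so that $F\setminus G$ is negligible; thus $F$ is an essential supremum for $\mathcal{E}$.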
(c) Again take $\langle F_n\rangle_{n\in\mathbb{N}}$ from (a), and set $F=\bigcup_{n\in\mathbb{N}}F_n$. Our hypothesis now is that $F\in\mathcal{E}$, so has both the properties declared.
215B $\sigma$-finite spaces are so important that I think it is worth spelling out the following facts.
Proposition Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Write $\mathcal{N}$ for the family of $\mu$-negligible sets and $\Sigma^f$ for the family of measurable sets of finite measure. Then the following are equiveridical:
(i) $(X,\Sigma,\mu)$ is $\sigma$-finite;
(ii) every disjoint family in $\Sigma^f\setminus\mathcal{N}$ is countable;
(iii) every disjoint family in $\Sigma\setminus\mathcal{N}$ is countable;
(iv) for every $\mathcal{E}\subseteq\Sigma$ there is a countable set $\mathcal{E}_0\subseteq\mathcal{E}$ such that $E\setminus\bigcup\mathcal{E}_0$ is negligible for every $E\in\mathcal{E}$;
(v) for every non-empty upwards-directed $\mathcal{E}\subseteq\Sigma$ there is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ is negligible for every $E\in\mathcal{E}$;
(vi) for every non-empty $\mathcal{E}\subseteq\Sigma$, there is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ is negligible whenever $E\in\mathcal{E}$ and $E\supseteq F_n$ for every $n\in\mathbb{N}$;
(vii) either $\mu X=0$ or there is a probability measure $\nu$ on $X$ with the same domain and the same negligible sets as $\mu$;
(viii) there is a measurable integrable function $f:X\to{]0,1]}$;
(ix) either $\mu X=0$ or there is a measurable function $f:X\to{]0,\infty[}$ such that $\int f\,d\mu=1$.
proof (i)$\Rightarrow$(vii) and (viii) If $\mu X=0$, (vii) is trivial and we can take $f=\chi X$ in (viii). Otherwise, let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence in $\Sigma^f$ covering $X$. Then it is easy to see that there is a sequence $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ of strictly positive real numbers such that $\sum_{n=0}^{\infty}\alpha_n\mu E_n=1$. Set $\nu E=\sum_{n=0}^{\infty}\alpha_n\mu(E\cap E_n)$ for $E\in\Sigma$; then $\nu$ is a probability measure with domain $\Sigma$ and the same negligible sets as $\mu$. Also $f=\sum_{n=0}^{\infty}\min(1,\alpha_n)\chi E_n$ is a strictly positive measurable integrable function.
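(vii)$\Rightarrow$(vi) and (v) Assume (vii), and let $\mathcal{E}\subseteq\Sigma$ be a non-empty family of measurable sets. If $\mu X=0$ then (vi) and (v) are certainly true. Otherwise, let $\nu$ be a probability measure with domain $\Sigma$ and the same negligible sets as $\mu$. Since $\sup_{E\in\mathcal{E}}\nu E\le 1$ is finite, we can apply 215Aa to find a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ is negligible whenever $E\in\mathcal{E}$ includes $\bigcup_{n\in\mathbb{N}}F_n$; and if $\mathcal{E}$ is upwards-directed, $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ will be negligible for every $E\in\mathcal{E}$, as in 215Ab.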
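(vi)$\Rightarrow$(iv) Assume (vi), and let $\mathcal{E}$ be any subset of $\Sigma$. Set
$$\mathcal{H}=\{\textstyle\bigcup\mathcal{E}_0:\mathcal{E}_0\subseteq\mathcal{E}\text{ is countable}\}.$$
By (vi), there is a sequence $\langle H_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{H}$ such that $H\setminus\bigcup_{n\in\mathbb{N}}H_n$ is negligible whenever $H\in\mathcal{H}$ and $H\supseteq H_n$ for every $n\in\mathbb{N}$. Now we can express each $H_n$ as $\bigcup\mathcal{E}'_n$, where $\mathcal{E}'_n\subseteq\mathcal{E}$ is countable; setting $\mathcal{E}_0=\bigcup_{n\in\mathbb{N}}\mathcal{E}'_n$, $\mathcal{E}_0$ is countable. If $E\in\mathcal{E}$, then $E\cup\bigcup_{n\in\mathbb{N}}H_n=\bigcup(\{E\}\cup\mathcal{E}_0)$ belongs to $\mathcal{H}$ and includes every $H_n$, so that $E\setminus\bigcup\mathcal{E}_0=E\setminus\bigcup_{n\in\mathbb{N}}H_n$ is negligible. So $\mathcal{E}_0$ has the property we need, and (iv) is true.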
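(iv)$\Rightarrow$(iii) Assume (iv). If $\mathcal{E}$ is a disjoint family in $\Sigma\setminus\mathcal{N}$, take a countable $\mathcal{E}_0\subseteq\mathcal{E}$ such that $E\setminus\bigcup\mathcal{E}_0$ is negligible for every $E\in\mathcal{E}$. Then $E=E\setminus\bigcup\mathcal{E}_0$ is negligible for every $E\in\mathcal{E}\setminus\mathcal{E}_0$; but this just means that $\mathcal{E}\setminus\mathcal{E}_0$ is empty, so that $\mathcal{E}=\mathcal{E}_0$ is countable.
(iii)$\Rightarrow$(ii) is trivial.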
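(ii)$\Rightarrow$(i) Assume (ii). Let $P$ be the set of all disjoint subsets of $\Sigma^f\setminus\mathcal{N}$, ordered by $\subseteq$. Then $P$ is a partially ordered set, not empty (as $\emptyset\in P$), and if $Q\subseteq P$ is non-empty and totally ordered then it has an upper bound in $P$. PPP Set $\mathcal{E}=\bigcup Q$, the union of all the disjoint families belonging to $Q$. If $E\in\mathcal{E}$ then $E\in\mathcal{G}$ for some $\mathcal{G}\in Q$, so $E\in\Sigma^f\setminus\mathcal{N}$. If $E$, $F\in\mathcal{E}$ and $E\ne F$, then there are $\mathcal{G}$, $\mathcal{H}\in Q$ such that $E\in\mathcal{G}$, $F\in\mathcal{H}$; now $Q$ is totally ordered, so one of $\mathcal{G}$, $\mathcal{H}$ is larger than the other, and in either case $\mathcal{G}\cup\mathcal{H}$ is a member of $Q$ containing both $E$ and $F$. But since any member of $Q$ is a disjoint collection of sets, $E\cap F=\emptyset$. As $E$ and $F$ are arbitrary, $\mathcal{E}$ is a disjoint family of sets and belongs to $P$. And of course $\mathcal{G}\subseteq\mathcal{E}$ for every $\mathcal{G}\in Q$, so $\mathcal{E}$ is an upper bound for $Q$ in $P$. QQQ
By Zorn's Lemma (2A1M), $P$ has a maximal element $\mathcal{E}$ say. By (ii), $\mathcal{E}$ must be countable, so $\bigcup\mathcal{E}\in\Sigma$. Now $H=X\setminus\bigcup\mathcal{E}$ is negligible. PPP??? Suppose, if possible, otherwise. Because $(X,\Sigma,\mu)$ is semi-finite, there is a set $G$ of finite measure such that $G\subseteq H$ and $\mu G>0$, that is, $G\in\Sigma^f\setminus\mathcal{N}$ and $G\cap E=\emptyset$ for every $E\in\mathcal{E}$. But this means that $\{G\}\cup\mathcal{E}$ is a member of $P$ strictly larger than $\mathcal{E}$, which is supposed to be impossible. XXXQQQ
Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence running over $\mathcal{E}\cup\{H\}$. Then $\langle X_n\rangle_{n\in\mathbb{N}}$ is a cover of $X$ by a sequence of measurable sets of finite measure, so $(X,\Sigma,\mu)$ is $\sigma$-finite.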
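(v)$\Rightarrow$(i) If (v) is true, then we have a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma^f$ such that $E\setminus\bigcup_{n\in\mathbb{N}}E_n$ is negligible for every $E\in\Sigma^f$. Because $\mu$ is semi-finite, $X\setminus\bigcup_{n\in\mathbb{N}}E_n$ must be negligible, so $X$ is covered by a countable family of sets of finite measure and $\mu$ is $\sigma$-finite.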
(viii)$\Rightarrow$(ix) If $\mu X=0$ this is trivial. Otherwise, if $f$ is a strictly positive measurable integrable function, then $c=\int f>0$ (122Rc), so $\frac{1}{c}f$ is a strictly positive measurable function with integral 1.
(ix)$\Rightarrow$(i) If $f:X\to{]0,\infty[}$ is measurable and integrable, $\langle\{x:f(x)\ge 2^{-n}\}\rangle_{n\in\mathbb{N}}$ is a sequence of sets of finite measure covering $X$.
215C Corollary Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space, and suppose that $\mathcal{E}\subseteq\Sigma$ is any non-empty set.
(a) There is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that, for every $E\in\Sigma$, either there is an $n\in\mathbb{N}$ such that $E\cup F_n$ is not included in any member of $\mathcal{E}$ or $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ is negligible.
(b) If $\mathcal{E}$ is upwards-directed, then there is a non-decreasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{E}$ such that $\bigcup_{n\in\mathbb{N}}F_n$ is an essential supremum of $\mathcal{E}$ in $\Sigma$.
(c) If the union of any non-decreasing sequence in $\mathcal{E}$ belongs to $\mathcal{E}$, then there is an $F\in\mathcal{E}$ such that $E\setminus F$ is negligible whenever $E\in\mathcal{E}$ and $F\subseteq E$.
proof By 215B, there is a totally finite measure $\nu$ on $X$ with the same measurable sets and the same negligible sets as $\mu$. Since $\sup_{E\in\mathcal{E}}\nu E$ is finite, we can apply 215A to $\nu$ to obtain the results.
215D As a further example of the use of the principle of exhaustion, I give a fundamental fact about atomless measure spaces.
Proposition Let $(X,\Sigma,\mu)$ be an atomless measure space. If $E\in\Sigma$ and $0\le\alpha\le\mu E<\infty$, there is an $F\in\Sigma$ such that $F\subseteq E$ and $\mu F=\alpha$.
proof (a) We need to know that if $G\in\Sigma$ is non-negligible and $n\in\mathbb{N}$, then there is an $H\subseteq G$ such that $0<\mu H\le 2^{-n}\mu G$. PPP Induce on $n$. For $n=0$ this is trivial. For the inductive step to $n+1$, use the inductive hypothesis to find $H\subseteq G$ such that $0<\mu H\le 2^{-n}\mu G$. Because $\mu$ is atomless, there is an $H'\subseteq H$ such that $\mu H'$, $\mu(H\setminus H')$ are both defined and non-zero. Now at least one of them has measure less than or equal to $\frac12\mu H$, so gives us a subset of $G$ of non-zero measure less than or equal to $2^{-n-1}\mu G$. QQQ
It follows that if $G\in\Sigma$ has non-zero finite measure and $\epsilon>0$, there is a measurable set $H\subseteq G$ such that $0<\mu H\le\epsilon$.
(b) Let $\mathcal{H}$ be the family of all those $H\in\Sigma$ such that $H\subseteq E$ and $\mu H\le\alpha$. If $\langle H_n\rangle_{n\in\mathbb{N}}$ is any non-decreasing sequence in $\mathcal{H}$, then $\mu(\bigcup_{n\in\mathbb{N}}H_n)=\lim_{n\to\infty}\mu H_n\le\alpha$, so $\bigcup_{n\in\mathbb{N}}H_n\in\mathcal{H}$. So 215Ac tells us that there is an $F\in\mathcal{H}$ such that $H\setminus F$ is negligible whenever $H\in\mathcal{H}$ and $F\subseteq H$. ??? Suppose, if possible, that $\mu F<\alpha$. By (a), there is an $H\subseteq E\setminus F$ such that $0<\mu H\le\alpha-\mu F$. But in this case $H\cup F\in\mathcal{H}$ and $\mu((H\cup F)\setminus F)>0$, which is impossible. XXX
So we have found an appropriate set $F$.
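In the familiar special case of Lebesgue measure on a finite union of disjoint bounded intervals, a set $F\subseteq E$ with $\mu F=\alpha$ can be written down directly, without any appeal to exhaustion. The sketch below is only an illustration of the statement in that case; the function name `subset_of_measure` and the representation of $E$ as a list of endpoint pairs are assumptions introduced here.

```python
from fractions import Fraction


def subset_of_measure(intervals, alpha):
    """Return disjoint intervals inside E of total length exactly alpha.

    Assumptions (hypothetical setting, not from the text): E is the union
    of the pairwise disjoint intervals (a, b), a < b, listed in `intervals`,
    and the measure is total length (Lebesgue measure).
    """
    alpha = Fraction(alpha)
    total = sum(Fraction(b) - Fraction(a) for a, b in intervals)
    if not 0 <= alpha <= total:
        raise ValueError("need 0 <= alpha <= measure of E")
    pieces, need = [], alpha
    for a, b in sorted(intervals):
        if need == 0:
            break
        a, b = Fraction(a), Fraction(b)
        take = min(b - a, need)        # use as much of this interval as needed
        pieces.append((a, a + take))
        need -= take
    return pieces


F = subset_of_measure([(0, 1), (2, 4)], Fraction(5, 2))
length_F = sum(b - a for a, b in F)
```

Exact rational arithmetic is used so that the total length can be compared with $\alpha$ without rounding. For a general atomless space no such explicit recipe is available, which is where the argument through 215Ac is needed.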
215X Basic exercises (a) Let $(X,\Sigma,\mu)$ be a measure space and $\Phi$ a non-empty set of $\mu$-integrable real-valued functions from $X$ to $\mathbb{R}$. Suppose that $\sup_{n\in\mathbb{N}}\int f_n$ is finite for every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ such that $f_n\le_{\text{a.e.}}f_{n+1}$ for every $n$. Show that there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ such that $f_n\le_{\text{a.e.}}f_{n+1}$ for every $n$ and, for every integrable real-valued function $f$ on $X$, either $f\le_{\text{a.e.}}\sup_{n\in\mathbb{N}}f_n$ or there is an $n\in\mathbb{N}$ such that no member of $\Phi$ is greater than or equal to $\max(f,f_n)$ almost everywhere.
>>>(b) Let $(X,\Sigma,\mu)$ be a measure space. (i) Suppose that $\mathcal{E}$ is a non-empty upwards-directed subset of $\Sigma$ such that $c=\sup_{E\in\mathcal{E}}\mu E$ is finite. Show that $E\setminus\bigcup_{n\in\mathbb{N}}F_n$ is negligible whenever $E\in\mathcal{E}$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{E}$ such that $\lim_{n\to\infty}\mu F_n=c$. (ii) Let $\Phi$ be a non-empty set of integrable functions on $X$ which is upwards-directed in the sense that for all $f$, $g\in\Phi$ there is an $h\in\Phi$ such that $\max(f,g)\le_{\text{a.e.}}h$, and suppose that $c=\sup_{f\in\Phi}\int f$ is finite. Show that $f\le_{\text{a.e.}}\sup_{n\in\mathbb{N}}f_n$ whenever $f\in\Phi$ and $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Phi$ such that $\lim_{n\to\infty}\int f_n=c$.
(c) Use 215A to shorten the proof of 211Ld.
(d) Give an example of a (non-semi-finite) measure space $(X,\Sigma,\mu)$ satisfying conditions (ii)-(iv) of 215B, but not (i).
>>>(e) Let $(X,\Sigma,\mu)$ be an atomless $\sigma$-finite measure space. Show that for any $\epsilon>0$ there is a disjoint sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets with measure at most $\epsilon$ such that $X=\bigcup_{n\in\mathbb{N}}E_n$.
(f) Let $(X,\Sigma,\mu)$ be an atomless strictly localizable measure space. Show that for any $\epsilon>0$ there is a decomposition $\langle X_i\rangle_{i\in I}$ of $X$ such that $\mu X_i\le\epsilon$ for every $i\in I$.
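215Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space and $\langle f_{mn}\rangle_{m,n\in\mathbb{N}}$, $\langle f_m\rangle_{m\in\mathbb{N}}$, $f$ measurable real-valued functions defined almost everywhere in $X$ and such that $\langle f_{mn}\rangle_{n\in\mathbb{N}}\to f_m$ a.e. for each $m$ and $\langle f_m\rangle_{m\in\mathbb{N}}\to f$ a.e. Show that there is a strictly increasing sequence $\langle n_m\rangle_{m\in\mathbb{N}}$ in $\mathbb{N}$ such that $\langle f_{m,n_m}\rangle_{m\in\mathbb{N}}\to f$ a.e. (Compare 134Yb.)
(b) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable real-valued functions such that $f=\lim_{n\to\infty}f_n$ is defined almost everywhere in $X$. Show that there is a non-decreasing sequence $\langle X_k\rangle_{k\in\mathbb{N}}$ of measurable subsets of $X$ such that $\bigcup_{k\in\mathbb{N}}X_k$ is conegligible in $X$ and $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ uniformly on every $X_k$, in the sense that for any $\epsilon>0$ there is an $m\in\mathbb{N}$ such that $|f_j(x)-f(x)|$ is defined and less than or equal to $\epsilon$ whenever $j\ge m$, $x\in X_k$.
(This is a version of Egorov's theorem.)
(c) Let $(X,\Sigma,\mu)$ be a totally finite measure space and $\langle f_n\rangle_{n\in\mathbb{N}}$, $f$ measurable real-valued functions defined almost everywhere in $X$. Show that $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ a.e. iff there is a sequence $\langle\epsilon_n\rangle_{n\in\mathbb{N}}$ of strictly positive real numbers, converging to 0, such that
$$\lim_{n\to\infty}\mu^*\Bigl(\bigcup_{k\ge n}\{x:x\in\operatorname{dom}f_k\cap\operatorname{dom}f,\ |f_k(x)-f(x)|\ge\epsilon_n\}\Bigr)=0.$$
(d) Find a direct proof of (v)$\Rightarrow$(vi) in 215B. (Hint: given $\mathcal{E}\subseteq\Sigma$, use Zorn's Lemma to find a maximal totally ordered $\mathcal{E}'\subseteq\mathcal{E}$ such that $E\triangle F\notin\mathcal{N}$ for any distinct $E$, $F\in\mathcal{E}'$, and apply (v) to $\mathcal{E}'$.)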
215 Notes and comments The common ground of 215A, 215B(vi), 215C and 215Xa is actually one of the most fundamental ideas in measure theory. It appears in such various forms that it is often easier to prove an application from first principles than to explain how it can be reduced to the versions here. But I will try henceforth to signal such applications as they arise, calling the method (the proof of 215Aa or 215Xa) the principle of exhaustion. One point which is perhaps worth noting here is the inductive construction of the sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in the proof of 215Aa. Each $F_{n+1}$ is chosen after the preceding one. It is this which makes it possible, in the proof of 215B(vii)$\Rightarrow$(vi), to extract a suitable sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ directly. In many applications (starting with what is surely the most important one in the elementary theory, the Radon-Nikodým theorem of §232, or with part (i) of the proof of 211Ld), this refinement is not needed; we are dealing with an upwards-directed set, as in 215B(v), and can choose the whole sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ at once, no term interacting with any other, as in 215Xb. The axiom of dependent choice, which asserts that we can construct sequences term-by-term, is known to be stronger than the axiom of countable choice, which asserts only that we can choose countably many objects simultaneously.
In 215B I try to indicate the most characteristic properties of $\sigma$-finiteness; in particular, the properties which distinguish $\sigma$-finite measures from other strictly localizable measures. This result is in a way more abstract than the manipulations in the rest of the section. Note that it makes an essential use of the axiom of choice in the form of Zorn's Lemma. I spent a paragraph in 134C commenting on the distinction between countable choice, which is needed for anything which looks like the standard theory of Lebesgue measure, and the full axiom of choice, which is relatively little used in the elementary theory. The implication (ii)$\Rightarrow$(i) of 215B is one of the points where we do need something beyond countable choice. (I should perhaps remark that the whole theory of non-$\sigma$-finite measure spaces looks very odd without the general axiom of choice.) Note also that in 215B the proofs of (i)$\Rightarrow$(vii) and (vii)$\Rightarrow$(vi) are the only points where anything so vulgar as a number appears. The conditions (iii), (iv), (v) and (vi) are linked in ways that have nothing to do with measure theory, and involve only the structure $(X,\Sigma,\mathcal{N})$. (See 215Yd here, and 316D-316E in Volume 3.) There are similar conditions relating to measurable functions rather than measurable sets; for a fairly abstract example, see 241Ye.
In 215Ya-215Yc are three more standard theorems on almost-everywhere-convergent sequences which depend on $\sigma$- or total finiteness.
216 Examples
It is common practice (and, in my view, good practice) in books on pure mathematics to provide discriminating examples; I mean that whenever we are given a list of new concepts, we expect to be provided with examples to show that we have a fair picture of the relationships between them, and in particular that we are not being kept ignorant of some startling implication. Concerning the concepts listed in 211A-211K, we have ten different properties which some, but not all, measure spaces possess, giving a conceivable total of $2^{10}$ different types of measure space, classified according to which of these ten properties they have. The list of basic relationships in 211L reduces these 1024 possibilities to 72. Observing that a space can be simultaneously atomless and purely atomic only when the measure of the whole space is 0, we find ourselves with 56 possibilities, being two trivial cases with $\mu X=0$ (because such a measure may or may not be complete) together with $9\times 2\times 3$ cases, corresponding to the nine classes
probability spaces,
spaces which are totally finite, but not probability spaces,
spaces which are $\sigma$-finite, but not totally finite,
spaces which are strictly localizable, but not $\sigma$-finite,
spaces which are localizable and locally determined, but not strictly localizable,
spaces which are localizable, but not locally determined,
spaces which are locally determined, but not localizable,
spaces which are semi-finite, but neither locally determined nor localizable,
spaces which are not semi-finite;
the two classes
spaces which are complete,
spaces which are not complete;
and the three classes
spaces which are atomless, not of measure 0,
spaces which are purely atomic, not of measure 0,
spaces which are neither atomless nor purely atomic.
I do not propose to give a complete set of fifty-six examples, particularly as rather fewer than fifty-six different ideas are required. However, I do think that for a proper understanding of abstract measure spaces it is necessary to have seen realizations of some of the critical combinations of properties. I therefore take a few paragraphs to describe three special examples to add to those of 211M-211R.
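The counts 1024, 72 and 56 quoted above can be checked by brute force. The sketch below is offered only as a sanity check; the list `IMPLICATIONS` is an assumption, namely my reading of the relationships of 211L as they are reflected in the nine classes just listed, and all identifiers are introduced here for illustration.

```python
from itertools import product

NAMES = [
    "probability", "totally_finite", "sigma_finite", "strictly_localizable",
    "localizable", "locally_determined", "semi_finite",
    "complete", "atomless", "purely_atomic",
]

# Assumed form of the basic relationships of 211L (a implies b).
IMPLICATIONS = [
    ("probability", "totally_finite"),
    ("totally_finite", "sigma_finite"),
    ("sigma_finite", "strictly_localizable"),
    ("strictly_localizable", "localizable"),
    ("strictly_localizable", "locally_determined"),
    ("localizable", "semi_finite"),
    ("locally_determined", "semi_finite"),
]


def consistent(profile):
    """True if the profile violates none of the assumed implications."""
    return all(profile[b] or not profile[a] for a, b in IMPLICATIONS)


profiles = [dict(zip(NAMES, bits))
            for bits in product([False, True], repeat=len(NAMES))]
allowed = [p for p in profiles if consistent(p)]
# atomless and purely atomic together force measure 0; count those separately
not_both = [p for p in allowed
            if not (p["atomless"] and p["purely_atomic"])]
total = len(not_both) + 2   # plus the two trivial cases of measure 0
```

Under these assumptions `len(profiles)` is $2^{10}$, `len(allowed)` is the 72 of the text, and `len(not_both)` is the product $9\times 2\times 3$.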
216A Lebesgue measure Before turning to the new ideas, let me mention Lebesgue measure again. As remarked in 211M, 211P and 211Qa,
(a) Lebesgue measure $\mu$ on $\mathbb{R}$ is complete, atomless and $\sigma$-finite, therefore strictly localizable, localizable and locally determined.
(b) The subspace measure $\mu_{[0,1]}$ on $[0,1]$ is a complete, atomless probability measure.
(c) The restriction $\mu\upharpoonright\mathcal{B}$ of $\mu$ to the Borel $\sigma$-algebra $\mathcal{B}$ of $\mathbb{R}$ is atomless, $\sigma$-finite and not complete.
216B I now embark on the description of three counter-examples; meaning spaces built specifically for the purpose of showing that there are no unexpected implications among the ten properties under consideration here. Even by the standards of this chapter these must be regarded as dispensable by the student who wants to get on with the real business of understanding the big theorems of the subject. Neither the existence of these examples, nor the techniques needed in constructing them, are vital for anything else we shall look at before Volume 5. But if you are going to take abstract measure theory seriously at all, sooner or later you will need to form some kind of mental picture of the nature of the spaces possessing the different properties here, and a minimal requirement of such a picture is that it should include the discriminations witnessed by these examples.
*216C A complete, localizable, non-locally-determined space The first example hardly needs an idea beyond what we already have, but it does call for more manipulations than it seems fair to set as an exercise, and may therefore be useful as a demonstration of technique.
(a) Let $I$ be any uncountable set, and set $X=\{0,1\}\times I$. For $E\subseteq X$, $y\in\{0,1\}$ set $E[\{y\}]=\{i:(y,i)\in E\}\subseteq I$. Set
$$\Sigma=\{E:E\subseteq X,\ E[\{0\}]\triangle E[\{1\}]\text{ is countable}\}.$$
Then $\Sigma$ is a $\sigma$-algebra of subsets of $X$. PPP (i) $\emptyset[\{0\}]\triangle\emptyset[\{1\}]=\emptyset$ is countable, so $\emptyset\in\Sigma$. (ii) If $E\in\Sigma$ then
$$(X\setminus E)[\{0\}]\triangle(X\setminus E)[\{1\}]=E[\{0\}]\triangle E[\{1\}]$$
is countable. (iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and $E=\bigcup_{n\in\mathbb{N}}E_n$, then
$$E[\{0\}]\triangle E[\{1\}]\subseteq\bigcup_{n\in\mathbb{N}}E_n[\{0\}]\triangle E_n[\{1\}]$$
is countable. QQQ
For $E\in\Sigma$, set $\mu E=\#(E[\{0\}])$ if this is finite, $\infty$ otherwise; then $(X,\Sigma,\mu)$ is a measure space.
(b) $(X,\Sigma,\mu)$ is complete. PPP If $A\subseteq E\in\Sigma$ and $\mu E=0$, then $(0,i)\notin E$ for every $i$. So
$$A[\{0\}]\triangle A[\{1\}]=A[\{1\}]\subseteq E[\{1\}]=E[\{1\}]\triangle E[\{0\}]$$
must be countable, and $A\in\Sigma$. QQQ
(c) $(X,\Sigma,\mu)$ is semi-finite. PPP If $E\in\Sigma$ and $\mu E>0$, there is an $i\in I$ such that $(0,i)\in E$; now $F=\{(0,i)\}\subseteq E$ and $\mu F=1$. QQQ
(d) $(X,\Sigma,\mu)$ is localizable. PPP Let $\mathcal{E}$ be any subset of $\Sigma$. Set
$$J=\bigcup_{E\in\mathcal{E}}E[\{0\}],\qquad G=\{0,1\}\times J.$$
Then $G\in\Sigma$. If $H\in\Sigma$, then
$$\mu(E\setminus H)=0\text{ for every }E\in\mathcal{E}
\iff E[\{0\}]\subseteq H[\{0\}]\text{ for every }E\in\mathcal{E}
\iff (0,i)\in H\text{ for every }i\in J
\iff \mu(G\setminus H)=0.$$
Thus $G$ is an essential supremum for $\mathcal{E}$ in $\Sigma$; as $\mathcal{E}$ is arbitrary, $\mu$ is localizable. QQQ
(e) $(X,\Sigma,\mu)$ is not locally determined. PPP Consider $H=\{0\}\times I$. Then $H\notin\Sigma$ because $H[\{0\}]\triangle H[\{1\}]=I$ is uncountable. But let $E\in\Sigma$ be any set such that $\mu E<\infty$. Then
$$(E\cap H)[\{0\}]\triangle(E\cap H)[\{1\}]=(E\cap H)[\{0\}]\subseteq E[\{0\}]$$
is finite, so $E\cap H\in\Sigma$. As $E$ is arbitrary, $H$ witnesses that $\mu$ is not locally determined. QQQ
(f) $(X,\Sigma,\mu)$ is purely atomic. PPP Let $E\in\Sigma$ be any set of non-zero measure. Let $i\in I$ be such that $(0,i)\in E$. Then $F=\{(0,i)\}$ is a set of measure 1, included in $E$; because $F$ is a singleton set, it must be an atom for $\mu$; as $E$ is arbitrary, $\mu$ is purely atomic. QQQ
(g) Thus the construction here yields a complete, localizable, purely atomic, non-locally-determined space.
*216D A complete, locally determined space which is not localizable The next construction requires a little set theory. We need two sets $I$, $J$ such that $I$ is uncountable (more strictly, $I$ cannot be expressed as the union of countably many countable sets), $I\subseteq J$ and $J$ cannot be expressed as $\bigcup_{i\in I}K_i$ where every $K_i$ is countable. The most natural way of doing this, subject to the axiom of choice, is to take $I=\omega_1$, the first uncountable ordinal, and $J$ to be $\omega_2$, the first ordinal from which there is no injection into $\omega_1$ (see 2A1Fc); but in case you prefer other formulations (e.g., $I=\{\{x\}:x\in\mathbb{R}\}$ and $J=\mathcal{P}\mathbb{R}$), I will write the following argument in terms of $I$ and $J$, and you can pick your own pair.
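(a) Let $\mathrm{T}$ be the countable-cocountable $\sigma$-algebra of $J$ and $\nu$ the countable-cocountable measure on $J$ (211R). Set $X=J\times J$ and for $E\subseteq X$ set
$$E[\{\xi\}]=\{\eta:(\xi,\eta)\in E\},\qquad E^{-1}[\{\xi\}]=\{\eta:(\eta,\xi)\in E\}$$
for every $\xi\in J$. Set
$$\Sigma=\{E:E[\{\xi\}]\text{ and }E^{-1}[\{\xi\}]\text{ belong to }\mathrm{T}\text{ for every }\xi\in J\},$$
$$\mu E=\sum_{\xi\in J}\nu E[\{\xi\}]+\sum_{\xi\in J}\nu E^{-1}[\{\xi\}]$$
for every $E\in\Sigma$. It is easy to check that $\Sigma$ is a $\sigma$-algebra and that $\mu$ is a measure.
(b) $(X,\Sigma,\mu)$ is complete. PPP If $A\subseteq E\in\Sigma$ and $\mu E=0$, then all the sets $E[\{\xi\}]$ and $E^{-1}[\{\xi\}]$ are countable, so the same is true of all the sets $A[\{\xi\}]$ and $A^{-1}[\{\xi\}]$, and $A\in\Sigma$. QQQ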
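(d) $(X,\Sigma,\mu)$ is semi-finite. PPP For each $\zeta\in J$, set
$$G_\zeta=\{\zeta\}\times J,\qquad\tilde G_\zeta=J\times\{\zeta\}.$$
Then all the sections $G_\zeta[\{\xi\}]$, $G_\zeta^{-1}[\{\xi\}]$, $\tilde G_\zeta[\{\xi\}]$ and $\tilde G_\zeta^{-1}[\{\xi\}]$ are either $J$ or $\emptyset$ or $\{\zeta\}$, so belong to $\mathrm{T}$, and all the $G_\zeta$, $\tilde G_\zeta$ belong to $\Sigma$, with $\mu$-measure 1.
Suppose that $E\in\Sigma$ is a set of strictly positive measure. Then there must be some $\xi\in J$ such that
$$0<\nu E[\{\xi\}]+\nu E^{-1}[\{\xi\}]=\mu(E\cap G_\xi)+\mu(E\cap\tilde G_\xi)<\infty,$$
and one of the sets $E\cap G_\xi$, $E\cap\tilde G_\xi$ is a set of non-zero finite measure included in $E$. QQQ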
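(e) $(X,\Sigma,\mu)$ is locally determined. PPP Suppose that $H\subseteq X$ is such that $H\cap E\in\Sigma$ whenever $E\in\Sigma$ and $\mu E<\infty$. Then, in particular, $H\cap G_\zeta$ and $H\cap\tilde G_\zeta$ belong to $\Sigma$, so
$$H[\{\zeta\}]=(H\cap G_\zeta)[\{\zeta\}]\in\mathrm{T},$$
$$H^{-1}[\{\zeta\}]=(H\cap\tilde G_\zeta)^{-1}[\{\zeta\}]\in\mathrm{T},$$
for every $\zeta\in J$. This shows that $H\in\Sigma$. As $H$ is arbitrary, $\mu$ is locally determined. QQQ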
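(f) $(X,\Sigma,\mu)$ is not localizable. PPP Set $\mathcal{E}=\{G_\zeta:\zeta\in J\}$. ??? Suppose, if possible, that $G\in\Sigma$ is an essential supremum for $\mathcal{E}$. Then
$$\nu(J\setminus G[\{\xi\}])=\mu(G_\xi\setminus G)=0$$
and $J\setminus G[\{\xi\}]$ is countable, for every $\xi\in J$. Consequently $J\ne\bigcup_{\xi\in I}(J\setminus G[\{\xi\}])$, and there is an $\eta$ belonging to $J\setminus\bigcup_{\xi\in I}(J\setminus G[\{\xi\}])=\bigcap_{\xi\in I}G[\{\xi\}]$. This means just that $(\xi,\eta)\in G$ for every $\xi\in I$, that is, that $I\subseteq G^{-1}[\{\eta\}]$. Accordingly $G^{-1}[\{\eta\}]$ is uncountable, so that $\nu G^{-1}[\{\eta\}]=\mu(G\cap\tilde G_\eta)=1$. But observe that $\mu(G_\xi\cap\tilde G_\eta)=\mu\{(\xi,\eta)\}=0$ for every $\xi\in J$. This means that, setting $H=X\setminus\tilde G_\eta$, $E\setminus H$ is negligible, for every $E\in\mathcal{E}$; so that we must have $0=\mu(G\setminus H)=\mu(G\cap\tilde G_\eta)=1$, which is absurd. XXX
Thus $\mathcal{E}$ has no essential supremum in $\Sigma$, and $\mu$ cannot be localizable. QQQ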
(g) $(X,\Sigma,\mu)$ is purely atomic. PPP If $E\in\Sigma$ has non-zero measure, there must be some $\xi\in J$ such that one of $E[\{\xi\}]$, $E^{-1}[\{\xi\}]$ is not countable; that is, such that one of $E\cap G_\xi$, $E\cap\tilde G_\xi$ is not negligible. But if now $H\in\Sigma$ and $H\subseteq E\cap G_\xi$, either $H[\{\xi\}]$ is countable, and $\mu H=0$, or $J\setminus H[\{\xi\}]$ is countable, and $\mu(G_\xi\setminus H)=0$; similarly, if $H\subseteq E\cap\tilde G_\xi$, one of $\mu H$, $\mu(\tilde G_\xi\setminus H)$ must be 0, according to whether $H^{-1}[\{\xi\}]$ is countable or not. Thus $E\cap G_\xi$ and $E\cap\tilde G_\xi$, if not negligible, must be atoms, and $E$ must include an atom. As $E$ is arbitrary, $\mu$ is purely atomic. QQQ
(h) Thus $(X,\Sigma,\mu)$ is complete, locally determined and purely atomic, but is not localizable.
*216E A complete, locally determined, localizable space which is not strictly localizable For the last, and most interesting, construction, we need a non-trivial result in infinitary combinatorics, which I have written out in 2A1P: if $I$ is any set, and $\langle f_\alpha\rangle_{\alpha\in A}$ is a family in $\{0,1\}^I$, the set of functions from $I$ to $\{0,1\}$, with $\#(A)$ strictly greater than $\mathfrak{c}$, the cardinal of the continuum, and if $\langle K_\alpha\rangle_{\alpha\in A}$ is any family of countable subsets of $I$, then there must be distinct $\alpha$, $\beta\in A$ such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha\cap K_\beta$.
Armed with this fact, I proceed as follows.
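(a) Let $C$ be any set of cardinal greater than $\mathfrak{c}$. Set $I=\mathcal{P}C$ and $X=\{0,1\}^I$. For $\gamma\in C$, define $x_\gamma\in X$ by saying that $x_\gamma(\Gamma)=1$ if $\gamma\in\Gamma\subseteq C$ and $x_\gamma(\Gamma)=0$ if $\gamma\notin\Gamma\subseteq C$. Let $\mathcal{K}$ be the family of countable subsets of $I$, and for $K\in\mathcal{K}$, $\gamma\in C$ set
$$F_{\gamma K}=\{x:x\in X,\ x\upharpoonright K=x_\gamma\upharpoonright K\}\subseteq X.$$
Let
$$\Sigma_\gamma=\{E:E\subseteq X,\text{ either there is a }K\in\mathcal{K}\text{ such that }F_{\gamma K}\subseteq E\text{ or there is a }K\in\mathcal{K}\text{ such that }F_{\gamma K}\subseteq X\setminus E\}.$$
Then $\Sigma_\gamma$ is a $\sigma$-algebra of subsets of $X$. PPP (i) $F_{\gamma\emptyset}\subseteq X\setminus\emptyset$ so $\emptyset\in\Sigma_\gamma$. (ii) The definition of $\Sigma_\gamma$ is symmetric between $E$ and $X\setminus E$, so $X\setminus E\in\Sigma_\gamma$ whenever $E\in\Sigma_\gamma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\Sigma_\gamma$, with union $E$. ($\alpha$) If there are $n\in\mathbb{N}$, $K\in\mathcal{K}$ such that $F_{\gamma K}\subseteq E_n$, then $F_{\gamma K}\subseteq E$, so $E\in\Sigma_\gamma$. ($\beta$) Otherwise, there is for each $n\in\mathbb{N}$ a $K_n\in\mathcal{K}$ such that $F_{\gamma,K_n}\subseteq X\setminus E_n$. Set $K=\bigcup_{n\in\mathbb{N}}K_n\in\mathcal{K}$. Then
$$F_{\gamma K}=\{x:x\upharpoonright K=x_\gamma\upharpoonright K\}=\{x:x\upharpoonright K_n=x_\gamma\upharpoonright K_n\text{ for every }n\in\mathbb{N}\}
=\bigcap_{n\in\mathbb{N}}F_{\gamma,K_n}\subseteq\bigcap_{n\in\mathbb{N}}X\setminus E_n=X\setminus E,$$
so again $E\in\Sigma_\gamma$. As $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\Sigma_\gamma$ is a $\sigma$-algebra. QQQ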
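(b) Set
$$\Sigma=\bigcap_{\gamma\in C}\Sigma_\gamma;$$
then $\Sigma$, being an intersection of $\sigma$-algebras, is a $\sigma$-algebra of subsets of $X$ (see 111Ga). Define $\mu:\Sigma\to[0,\infty]$ by setting
$$\mu E=\#(\{\gamma:x_\gamma\in E\})\text{ if this is finite},\qquad\mu E=\infty\text{ otherwise};$$
then $\mu$ is a measure.
(c) It will be convenient later to know something about the sets
$$G_D=\{x:x\in X,\ x(D)=1\}$$
for $D\subseteq C$. In particular, every $G_D$ belongs to $\Sigma$. PPP If $\gamma\in D$, then $x_\gamma(D)=1$ so $G_D=F_{\gamma,\{D\}}\in\Sigma_\gamma$. If $\gamma\in C\setminus D$, then $x_\gamma(D)=0$ so $G_D=X\setminus F_{\gamma,\{D\}}\in\Sigma_\gamma$. QQQ Also, of course, $\{\gamma:x_\gamma\in G_D\}=D$.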
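(d) $(X,\Sigma,\mu)$ is complete. PPP Suppose that $A\subseteq E\in\Sigma$ and that $\mu E=0$. For every $\gamma\in C$, $E\in\Sigma_\gamma$ and $x_\gamma\notin E$, so $F_{\gamma K}\not\subseteq E$ for any $K\in\mathcal{K}$ and there is a $K\in\mathcal{K}$ such that
$$F_{\gamma K}\subseteq X\setminus E\subseteq X\setminus A.$$
Thus $A\in\Sigma_\gamma$; as $\gamma$ is arbitrary, $A\in\Sigma$. As $A$ is arbitrary, $\mu$ is complete. QQQ
(e) $(X,\Sigma,\mu)$ is semi-finite. PPP Let $E\in\Sigma$ be a set of positive measure. Then there must be some $\gamma\in C$ such that $x_\gamma\in E$. Consider $E'=E\cap G_{\{\gamma\}}$. As $x_\gamma\in E'$, $\mu E'\ge 1>0$. On the other hand, $\mu G_{\{\gamma\}}=\#(\{\delta:\delta\in\{\gamma\}\})=1$, so $\mu E'=1$. As $E$ is arbitrary, $\mu$ is semi-finite. QQQ
(f) $(X,\Sigma,\mu)$ is localizable. PPP Let $\mathcal{E}$ be any subset of $\Sigma$. Set $D=\{\delta:\delta\in C,\ x_\delta\in\bigcup\mathcal{E}\}$. Consider $G_D$. For $H\in\Sigma$,
$$\mu(E\setminus H)=0\text{ for every }E\in\mathcal{E}
\iff x_\gamma\notin E\setminus H\text{ for every }E\in\mathcal{E},\ \gamma\in C
\iff x_\gamma\in H\text{ for every }\gamma\in D
\iff x_\gamma\notin G_D\setminus H\text{ for every }\gamma\in C
\iff \mu(G_D\setminus H)=0.$$
Thus $G_D$ is an essential supremum for $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $\mu$ is localizable. QQQ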
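(g) $(X,\Sigma,\mu)$ is not strictly localizable. PPP??? Suppose, if possible, that $\langle X_j\rangle_{j\in J}$ is a decomposition of $(X,\Sigma,\mu)$. Set $J'=\{j:j\in J,\ \mu X_j>0\}$. For each $j\in J'$, the set $C_j=\{\gamma:x_\gamma\in X_j\}$ must be finite and non-empty. Moreover, for each $\gamma\in C$, there must be some $j\in J$ such that $\mu(G_{\{\gamma\}}\cap X_j)>0$, and in this case $j\in J'$ and $\gamma\in C_j$. Thus $C=\bigcup_{j\in J'}C_j$. Because $\#(C)>\mathfrak{c}$, $\#(J')>\mathfrak{c}$ (2A1Ld).
For each $j\in J'$, choose $\gamma_j\in C_j$. Then
$$x_{\gamma_j}\in X_j\in\Sigma\subseteq\Sigma_{\gamma_j},$$
so there must be a $K_j\in\mathcal{K}$ such that $F_{\gamma_j,K_j}\subseteq X_j$.
At this point I finally turn to the result cited at the start of this example. Because $\#(J')>\mathfrak{c}$, there must be distinct $j$, $k\in J'$ such that $x_{\gamma_j}$ and $x_{\gamma_k}$ agree on $K_j\cap K_k$. We may therefore define $x\in X$ by saying that
$$x(\delta)=x_{\gamma_j}(\delta)\text{ if }\delta\in K_j,\qquad
x(\delta)=x_{\gamma_k}(\delta)\text{ if }\delta\in K_k,\qquad
x(\delta)=0\text{ if }\delta\in I\setminus(K_j\cup K_k).$$
Now
$$x\in F_{\gamma_j,K_j}\cap F_{\gamma_k,K_k}\subseteq X_j\cap X_k,$$
and $X_j\cap X_k\ne\emptyset$; contradicting the assumption that the $X_j$ formed a decomposition of $X$. XXXQQQ
(h) $(X,\Sigma,\mu)$ is purely atomic. PPP If $E\in\Sigma$ and $\mu E>0$, then (as remarked in (e) above) there is a $\gamma\in C$ such that $\mu(E\cap G_{\{\gamma\}})=1$; now $E\cap G_{\{\gamma\}}$ must be an atom. QQQ
(i) Accordingly $(X,\Sigma,\mu)$ is a complete, locally determined, localizable, purely atomic measure space which is not strictly localizable.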
216X Basic exercises (a) In the construction of 216C, show that the subspace measure on $\{1\}\times I$ is not semi-finite.
(b) Suppose, in 216D, that $I=\omega_1$. (i) Show that the set $\{(\xi,\eta):\xi\le\eta<\omega_1\}$ is measured by the measure constructed by Carathéodory's method from $\mu^*\upharpoonright\mathcal{P}(I\times I)$, but not by the subspace measure on $I\times I$. (ii) Hence, or otherwise, show that the subspace measure on $I\times I$ is not locally determined.
(c) In 216Ya, 252Yq and 252Ys below, I indicate how to construct atomless versions of 216C, 216D and 216E, that is, atomless complete measure spaces of which the first is localizable but not locally determined, the second is locally determined but not localizable, and the third is locally determined and localizable but not strictly localizable. Show how direct sums of these, together with counting measure and the examples described in this chapter, can be assembled to provide all 56 examples called for by the discussion in the introduction to this section.
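216Y Further exercises (a) Let $\lambda$ be Lebesgue measure on $[0,1]$, and $\Lambda$ its domain. Set $Y=[0,1]\times\{0,1\}$ and write
$$\mathrm{T}=\{F:F\subseteq Y,\ F^{-1}[\{0\}]\in\Lambda\},$$
$$\nu F=\lambda F^{-1}[\{0\}]\text{ for every }F\in\mathrm{T}.$$
Set
$$\mathrm{T}_0=\{F:F\in\mathrm{T},\ F^{-1}[\{0\}]\triangle F^{-1}[\{1\}]\text{ is }\lambda\text{-negligible}\}.$$
Let $I$ be an uncountable set. Set $X=Y\times I$,
$$\Sigma=\{E:E\subseteq X,\ E^{-1}[\{i\}]\in\mathrm{T}\text{ for every }i\in I,\ \{i:E^{-1}[\{i\}]\notin\mathrm{T}_0\}\text{ is countable}\},$$
$$\mu E=\sum_{i\in I}\nu E^{-1}[\{i\}]\text{ for }E\in\Sigma.$$
(i) Show that $(Y,\mathrm{T},\nu)$ and $(Y,\mathrm{T}_0,\nu\upharpoonright\mathrm{T}_0)$ are complete probability spaces, and that for every $F\in\mathrm{T}$ there is an $F'\in\mathrm{T}_0$ such that $\nu(F\triangle F')=0$. (ii) Show that $(X,\Sigma,\mu)$ is an atomless complete localizable measure space which is not locally determined.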
(b) Dene a measure on X =
2

2
as follows. Take to be the -algebra of subsets of X generated by
A
2
: A
2

2
: <
2
.
For E set
W(E) = : <
2
, sup E[] =
2
,
and set $\mu E=\#(W(E))$ if this is finite, $\infty$ otherwise. Show that $\mu$ is a measure on $X$, is localizable and locally determined, but does not have locally determined negligible sets. Find a subspace $Y$ of $X$ such that the subspace measure on $Y$ is not semi-finite.
(c) Show that in the space described in 216E every set has a measurable envelope, but that this is not true in the
spaces of 216C and 216D.
(d) Set X =
1

2
. For E X set
A(E) = : for some , just one of (, ), (, + 1) belongs to E,
B(E) = : there are ,

such that <

<
2
and just one of (, ), (,

) belongs to E,
W(E) = : #(E[]) =
2
.
Let be the set of subsets E of X such that A(E) is countable and #(B(E))
1
. For E , set E = #(W(E)) if
this is nite, otherwise. (i) Show that (X, , ) is a measure space. (ii) Show that if is the completion of , then
its domain is the set of subsets E of X such that A(E) is countable, and is strictly localizable. (iii) Show that is
not strictly localizable.
216 Notes and comments The examples 216C-216E are designed to form, with Lebesgue measure, a basis for constructing a complete set of examples for the concepts listed in 211A-211K. One does not really expect to encounter these phenomena in applications, but a clear understanding of the possibilities demonstrated by these examples is part of a proper appreciation of their rarity. Of course, if we add further properties to our list, for instance the property of having locally determined negligible sets (213I), or the property that every subset should have a measurable envelope (213Xl), then there are further positive results to complement 211L, and more examples to hunt for, like 216Yb. But it is time, perhaps past time, that we returned to the classical theorems which apply to the measure spaces at the centre of the subject.
Chapter 22
The Fundamental Theorem of Calculus
In this chapter I address one of the most important properties of the Lebesgue integral. Given an integrable function $f:[a,b]\to\mathbb{R}$, we can form its indefinite integral $F(x)=\int_a^x f(t)\,dt$ for $x\in[a,b]$. Two questions immediately present themselves. (i) Can we expect to have the derivative $F'$ of $F$ equal to $f$? (ii) Can we identify which functions $F$ will appear as indefinite integrals? Reasonably satisfactory answers may be found for both of these questions: $F'=f$ almost everywhere (222E) and indefinite integrals are the absolutely continuous functions (225E). In the course of dealing with them, we need to develop a variety of techniques which lead to many striking results both in the theory of Lebesgue measure and in other, apparently unrelated, topics in real analysis.
The first step is Vitali's theorem (§221), a remarkable argument (it is more a method than a theorem) which uses the geometric nature of the real line to extract disjoint subfamilies from collections of intervals. It is the foundation stone not only of the results in §222 but of all geometric measure theory, that is, measure theory on spaces with a geometric structure. I use it here to show that monotonic functions are differentiable almost everywhere (222A). Following this, Fatou's Lemma and Lebesgue's Dominated Convergence Theorem are enough to show that the derivative of an indefinite integral is almost everywhere equal to the integrand. We find that some innocent-looking manipulations of this fact take us surprisingly far; I present these in §223.
I begin the second half of the chapter with a discussion of functions of bounded variation, that is, expressible as the difference of bounded monotonic functions (§224). This is one of the least measure-theoretic sections in the volume; only in 224I and 224J are measure and integration even mentioned. But this material is needed for Chapter 28 as well as for the next section, and is also one of the basic topics of twentieth-century real analysis. §225 deals with the characterization of indefinite integrals as the absolutely continuous functions. In fact this is now quite easy; it helps to call on Vitali's theorem again, but everything else is a straightforward application of methods previously used. The second half of the section introduces some new ideas in an attempt to give a deeper intuition into the essential nature of absolutely continuous functions. §226 returns to functions of bounded variation and their decomposition into saltus and absolutely continuous and singular parts, the first two being relatively manageable and the last looking something like the Cantor function.
221 Vitali's theorem in $\mathbb{R}$
I give the first theorem of this chapter a section to itself. It occupies a position between measure theory and geometry (it is, indeed, one of the fundamental results of geometric measure theory), and its proof involves both the measure and the geometry of the real line.
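221A Vitali's theorem Let $A$ be a bounded subset of $\mathbb{R}$ and $\mathcal{I}$ a family of non-singleton closed intervals in $\mathbb{R}$ such that every point of $A$ belongs to arbitrarily short members of $\mathcal{I}$. Then there is a countable set $\mathcal{I}_0\subseteq\mathcal{I}$ such that (i) $\mathcal{I}_0$ is disjoint, that is, $I\cap I'=\emptyset$ for all distinct $I$, $I'\in\mathcal{I}_0$ (ii) $\mu(A\setminus\bigcup\mathcal{I}_0)=0$, where $\mu$ is Lebesgue measure on $\mathbb{R}$.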
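proof (a) If there is a finite disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $A\subseteq\bigcup\mathcal{I}_0$ (including the possibility that $A=\mathcal{I}_0=\emptyset$), we
can stop. So let us suppose henceforth that there is no such $\mathcal{I}_0$.
Let $\mu^*$ be Lebesgue outer measure on $\mathbb{R}$. Suppose that $|x|<M$ for every $x\in A$, and set
$$\mathcal{I}'=\{I:I\in\mathcal{I},\ I\subseteq[-M,M]\}.$$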
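(b) In this case, if $\mathcal{I}_0$ is any finite disjoint subset of $\mathcal{I}'$, there is a $J\in\mathcal{I}'$ which is disjoint from any member of $\mathcal{I}_0$.
P Take $x\in A\setminus\bigcup\mathcal{I}_0$. Now there is a $\delta>0$ such that $[x-\delta,x+\delta]$ does not meet any member of $\mathcal{I}_0$, and as $|x|<M$
we can suppose that $[x-\delta,x+\delta]\subseteq[-M,M]$. Let $J$ be a member of $\mathcal{I}$, containing $x$, and of length at most $\delta$; then
$J\in\mathcal{I}'$ and $J\cap\bigcup\mathcal{I}_0=\emptyset$. Q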
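(c) We can now choose a sequence $\langle\gamma_n\rangle_{n\in\mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}'$ inductively, as
follows. Given $\langle I_j\rangle_{j<n}$ (if $n=0$, this is the empty sequence, with no members), with $I_j\in\mathcal{I}'$ for each $j<n$, and
$I_j\cap I_k=\emptyset$ for $j<k<n$, set
$$\mathcal{J}_n=\{I:I\in\mathcal{I}',\ I\cap I_j=\emptyset\text{ for every }j<n\}.$$
We know from (b) that $\mathcal{J}_n\ne\emptyset$. Set
$$\gamma_n=\sup\{\mu I:I\in\mathcal{J}_n\};$$
then $0<\gamma_n\le 2M$. We may therefore choose a set $I_n\in\mathcal{J}_n$ such that $\mu I_n\ge\frac12\gamma_n$, and this continues the induction.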
(e) Because the $I_n$ are disjoint Lebesgue measurable subsets of $[-M,M]$, we have
$$\sum_{n=0}^{\infty}\gamma_n\le 2\sum_{n=0}^{\infty}\mu I_n\le 4M<\infty,$$
and $\lim_{n\to\infty}\gamma_n=0$. Now define $I'_n$ to be the closed interval with the same midpoint as $I_n$ but five times the length,
so that it projects past each end of $I_n$ by at least $\gamma_n$. I claim that, for any $n$,
$$A\subseteq\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j.$$
P? Suppose, if possible, otherwise. Take any $x$ belonging to $A\setminus(\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j)$. Let $\delta>0$ be such that
$$[x-\delta,x+\delta]\subseteq[-M,M]\setminus\bigcup_{j<n}I_j,$$
and let $J\in\mathcal{I}$ be such that
$$x\in J\subseteq[x-\delta,x+\delta].$$
Then
$$\mu J>0=\lim_{m\to\infty}\gamma_m;$$
let $m$ be the least integer greater than or equal to $n$ such that $\gamma_m<\mu J$. In this case $J$ cannot belong to $\mathcal{J}_m$, so there
must be some $k<m$ such that $J\cap I_k\ne\emptyset$, because certainly $J\in\mathcal{I}'$. By the choice of $\delta$, $k$ cannot be less than $n$, so
$n\le k<m$, and $\gamma_k\ge\mu J$. In this case, the distance from $x$ to the nearest endpoint of $I_k$ is at most $\mu J\le\gamma_k$. But the
ends of $I'_k$ project beyond the ends of $I_k$ by at least $\gamma_k$, so $x\in I'_k$; which contradicts the choice of $x$. XQ
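(f) It follows that
$$\mu^*(A\setminus\bigcup_{j<n}I_j)\le\mu(\bigcup_{j\ge n}I'_j)\le\sum_{j=n}^{\infty}\mu I'_j\le 5\sum_{j=n}^{\infty}\mu I_j.$$
As
$$\sum_{j=0}^{\infty}\mu I_j\le 2M<\infty,$$
we must have
$$\lim_{n\to\infty}\mu^*(A\setminus\bigcup_{j<n}I_j)=0,$$
and
$$\mu(A\setminus\bigcup_{j\in\mathbb{N}}I_j)=\mu^*(A\setminus\bigcup_{j\in\mathbb{N}}I_j)\le\inf_{n\in\mathbb{N}}\mu^*(A\setminus\bigcup_{j<n}I_j)=0.$$
Thus in this case we may set $\mathcal{I}_0=\{I_n:n\in\mathbb{N}\}$ to obtain a countable disjoint family in $\mathcal{I}$ with $\mu(A\setminus\bigcup\mathcal{I}_0)=0$.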
221B Remarks (a) I have expressed this theorem in the form 'there is a countable set $\mathcal{I}_0\subseteq\mathcal{I}$ such that ...' in an
attempt to find a concise way of expressing the three possibilities
(i) $A=\mathcal{I}=\emptyset$, so that we must take $\mathcal{I}_0=\emptyset$;
(ii) there are disjoint $I_0,\ldots,I_n\in\mathcal{I}$ such that $A\subseteq I_0\cup\ldots\cup I_n$, so that we can take $\mathcal{I}_0=\{I_0,\ldots,I_n\}$;
(iii) there is a disjoint sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}$ such that $\mu(A\setminus\bigcup_{n\in\mathbb{N}}I_n)=0$, so that we can take $\mathcal{I}_0=\{I_n:n\in\mathbb{N}\}$.
Of course many applications, like the proof of 221A itself, will use forms of these three alternatives.
(b) The actual theorem here, as stated, will be used in the next section. But quite as important as the statement
of the theorem is the principle of its proof. The $I_n$ are chosen 'greedily', that is, when we come to choose $I_n$ we look
at the family $\mathcal{J}_n$ of possible intervals, given the choices $I_0,\ldots,I_{n-1}$ already made, and choose an $I_n\in\mathcal{J}_n$ which is
'about' as big as it could be. The supremum of the possibilities for $\mu I_n$ is $\gamma_n$; but since we do not know that there
is any $I\in\mathcal{J}_n$ such that $\mu I=\gamma_n$, we must settle for a little less. I follow the standard formula in taking $\mu I_n\ge\frac12\gamma_n$,
but of course I could have taken $\mu I_n\ge\frac{99}{100}\gamma_n$, or $\mu I_n\ge(1-2^{-n})\gamma_n$, if that had helped later on. The remarkable
thing is that this works; we can choose the $I_n$ without foresight and without considering their interrelationships (for
that matter, without examining the set $A$) beyond the minimal requirement that $I_n\cap I_j=\emptyset$ for $j<n$, and even this
arbitrary and casual procedure yields a suitable sequence.
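The greedy principle can be tried out on a finite family of closed intervals. The following is only a sketch under simplifying assumptions: the family is finite (so the supremum $\gamma_n$ is a maximum and the selection stops after finitely many steps), intervals are represented as pairs of rationals, and the names `greedy_vitali` and `enlarge` are illustrative, not part of the text. For a finite family the hypothesis of 221A cannot hold, so this illustrates the selection rule and the five-fold enlargement only, not the theorem itself.

```python
from fractions import Fraction

def greedy_vitali(intervals, steps):
    """Greedy selection from a finite family of closed intervals (a, b) with a < b."""
    chosen = []
    for _ in range(steps):
        # J_n: members of the family disjoint from every interval chosen so far
        candidates = [
            (a, b) for (a, b) in intervals
            if all(b < c or d < a for (c, d) in chosen)
        ]
        if not candidates:
            break
        # gamma_n is a maximum here because the family is finite
        gamma = max(b - a for (a, b) in candidates)
        # any candidate of length at least gamma / 2 is acceptable; take the first
        pick = next((a, b) for (a, b) in candidates if 2 * (b - a) >= gamma)
        chosen.append(pick)
    return chosen

def enlarge(interval):
    """Closed interval with the same midpoint and five times the length."""
    a, b = interval
    mid, half = (a + b) / 2, (b - a) * Fraction(5, 2)
    return (mid - half, mid + half)

# Hypothetical test family: centres k/16, three different radii.
family = [
    (Fraction(k, 16) - r, Fraction(k, 16) + r)
    for k in range(17)
    for r in (Fraction(1, 4), Fraction(1, 16), Fraction(1, 64))
]
selection = greedy_vitali(family, len(family))
```

Once the selection is exhausted, every member of the family meets some chosen interval of at least half its length, and is therefore included in the five-fold enlargement of that interval; this is the finite counterpart of the covering claim in part (e) of the proof of 221A.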
(c) I have stated the theorem in terms of bounded sets $A$ and closed intervals, which is adequate for our needs, but
very small changes in the proof suffice to deal with arbitrary (non-singleton) intervals, and another refinement handles
unbounded sets $A$. (See 221Ya.)
221X Basic exercises (a) Let $\alpha\in\,]0,1[$. Suppose, in part (c) of the proof of 221A, we take $\mu I_n\ge\alpha\gamma_n$ for each
$n\in\mathbb{N}$, rather than $\mu I_n\ge\frac12\gamma_n$. What will be the appropriate constant to take in place of 5 in defining the sets $I'_j$ of
part (e)?
221Y Further exercises (a) Let $A$ be a subset of $\mathbb{R}$ and $\mathcal{I}$ a family of non-singleton intervals in $\mathbb{R}$ such that every
point of $A$ belongs to arbitrarily short members of $\mathcal{I}$. Show that there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that
$A\setminus\bigcup\mathcal{I}_0$ is Lebesgue negligible. (Hint: apply 221A to the sets $A\cap\,]n,n+1[$, $\{\overline{I}:I\in\mathcal{I},\ I\subseteq\,]n,n+1[\}$, writing $\overline{I}$ for
the closed interval with the same endpoints as $I$.)
(b) Let $\mathcal{J}$ be any family of non-singleton intervals in $\mathbb{R}$. Show that $\bigcup\mathcal{J}$ is Lebesgue measurable. (Hint: apply (a)
to $A=\bigcup\mathcal{J}$ and the family $\mathcal{I}$ of non-singleton subintervals of members of $\mathcal{J}$.)


(c) Let $(X,\rho)$ be a metric space, $A$ a subset of $X$, and $\mathcal{I}$ a family of closed balls of non-zero radius in $X$ such that
every point of $A$ belongs to arbitrarily small members of $\mathcal{I}$. (I say here that a set is a 'closed ball of non-zero radius'
if it is expressible in the form $B(x,\delta)=\{y:\rho(y,x)\le\delta\}$ where $x\in X$ and $\delta>0$. Of course it is possible for such
a ball to be a singleton $\{x\}$.) Show that either $A$ can be covered by a finite disjoint family in $\mathcal{I}$ or there is a disjoint
sequence $\langle B(x_n,\delta_n)\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}$ such that
$$A\subseteq\bigcup_{m\le n}B(x_m,\delta_m)\cup\bigcup_{m>n}B(x_m,5\delta_m)\text{ for every }n\in\mathbb{N}$$
or there is a disjoint sequence $\langle B(x_n,\delta_n)\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}$ such that $\inf_{n\in\mathbb{N}}\delta_n>0$.
(d) Give an example of a family $\mathcal{I}$ of open intervals such that every point of $\mathbb{R}$ belongs to arbitrarily small members
of $\mathcal{I}$, but if $\langle I_n\rangle_{n\in\mathbb{N}}$ is any disjoint sequence in $\mathcal{I}$, and for each $n\in\mathbb{N}$ we write $I'_n$ for the closed interval with the same
centre as $I_n$ and ten times the length, then there is an $n$ such that $]0,1[\,\not\subseteq\bigcup_{m<n}I_m\cup\bigcup_{m\ge n}I'_m$.
(e)(i) Show that if $\mathcal{I}$ is a finite family of intervals in $\mathbb{R}$ there are $\mathcal{I}_0$, $\mathcal{I}_1\subseteq\mathcal{I}$ such that $\bigcup(\mathcal{I}_0\cup\mathcal{I}_1)=\bigcup\mathcal{I}$ and both
$\mathcal{I}_0$ and $\mathcal{I}_1$ are disjoint families. (Hint: induce on $\#(\mathcal{I})$.) (ii) Suppose that $\mathcal{I}$ is a family of non-singleton intervals, of
length at most 1, covering a bounded set $A\subseteq\mathbb{R}$, and that $\epsilon>0$. Show that there is a disjoint subfamily $\mathcal{I}_0$ of $\mathcal{I}$ such
that $\mu^*(A\setminus\bigcup\mathcal{I}_0)\le\frac12\mu^*A+\epsilon$. (Hint: replacing each member of $\mathcal{I}$ by a slightly longer one with rational endpoints,
reduce to the case in which $\mathcal{I}$ is countable and thence to the case in which $\mathcal{I}$ is finite; now use (i).) (iii) Use (ii) to
prove Vitali's theorem. (I learnt this argument from J.Aldaz.)
221 Notes and comments I have headed this section 'Vitali's theorem in $\mathbb{R}$' because there is an $r$-dimensional version,
which will appear in Chapter 26 below. There is an anomaly in the position of this theorem. It is an indispensable
element of the proofs of some of the most important theorems in measure theory; on the other hand, the ideas involved
in its own proof are not used elsewhere in the elementary theory. I have therefore myself sometimes omitted the proof
when teaching this material, and would not reproach any student who left it to one side for the moment. At some
stage, of course, any measure theorist must master the method, not just for the sake of completeness, but in order to
gain an intuition for possible variations. I must emphasize that it is the principle of the proof, rather than its details,
which is important, because there are innumerable forms of 'Vitali's theorem'. (I offer some variations in the exercises
above and in §261 below, and there are many others which are important in more advanced work; one will appear in
§472 in Volume 4.) This principle is, I suppose, that
(i) we choose the $I_n$ greedily, according to some more or less natural criterion applicable to each $I_n$ as
we come to choose it, without attempting to look ahead;
(ii) we prove that their sizes tend to zero, even though we seemed to do nothing to ensure that they
would (but note the shift from $\mathcal{I}$ to $\mathcal{I}'$ in part (a) of the proof of 221A, which is exactly what is needed to
make this step work);
(iii) we check that for a suitable definition of $I'_n$, enlarging $I_n$, we shall have $A\subseteq\bigcup_{m<n}I_m\cup\bigcup_{m\ge n}I'_m$
for every $n$, while $\sum_{n=0}^{\infty}\mu I'_n<\infty$.
In a way, we have to count ourselves lucky every time this works. The reason for studying as many variations as
possible of a technique of this kind is to learn to guess when we might be lucky.
222 Differentiating an indefinite integral
I come now to the first of the two questions mentioned in the introduction to this chapter: if $f$ is an integrable
function on $[a,b]$, what is $\frac{d}{dx}\int_a^x f$? It turns out that this derivative exists and is equal to $f$ almost everywhere (222E).
The argument is based on a striking property of monotonic functions: they are differentiable almost everywhere (222A),
and we can bound the integrals of their derivatives (222C).
222A Theorem Let $I\subseteq\mathbb{R}$ be an interval and $f:I\to\mathbb{R}$ a monotonic function. Then $f$ is differentiable almost
everywhere in $I$.
Remark If I seem to be speaking of a measure on $\mathbb{R}$ without naming it, as here, I mean Lebesgue measure.
proof As usual, write $\mu^*$ for Lebesgue outer measure on $\mathbb{R}$, $\mu$ for Lebesgue measure.


(a) To begin with (down to the end of (c) below), let us suppose that $f$ is non-decreasing and $I$ is a bounded open
interval on which $f$ is bounded; say $|f(x)|\le M$ for $x\in I$. For any closed subinterval $J=[a,b]$ of $I$, write $f^*(J)$ for
the open interval $]f(a),f(b)[$. For $x\in I$, write
$$\overline{D}f(x)=\limsup_{h\to 0}\tfrac{1}{h}(f(x+h)-f(x)),\qquad\underline{D}f(x)=\liminf_{h\to 0}\tfrac{1}{h}(f(x+h)-f(x)),$$
allowing the value $\infty$ in both cases. Then $f$ is differentiable at $x$ iff $\overline{D}f(x)=\underline{D}f(x)\in\mathbb{R}$. Because surely
$\overline{D}f(x)\ge\underline{D}f(x)\ge 0$, $f$ will be differentiable at $x$ iff $\overline{D}f(x)$ is finite and $\overline{D}f(x)\le\underline{D}f(x)$.
I therefore have to show that the sets
$$\{x:x\in I,\ \overline{D}f(x)=\infty\},\qquad\{x:x\in I,\ \overline{D}f(x)>\underline{D}f(x)\}$$
are negligible.
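(b) Let us take $A=\{x:x\in I,\ \overline{D}f(x)=\infty\}$ first. Fix an integer $m\ge 1$ for the moment, and set
$$A_m=\{x:x\in I,\ \overline{D}f(x)>m\}\supseteq A.$$
Let $\mathcal{I}$ be the family of non-trivial closed intervals $[a,b]\subseteq I$ such that $f(b)-f(a)\ge m(b-a)$; then $\mu f^*(J)\ge m\mu J$ for
every $J\in\mathcal{I}$. If $x\in A_m$, then for any $\delta>0$ we have an $h$ with $0<|h|\le\delta$ and $\frac{1}{h}(f(x+h)-f(x))>m$, so that
$$[x,x+h]\in\mathcal{I}\text{ if }h>0,\qquad[x+h,x]\in\mathcal{I}\text{ if }h<0;$$
thus every member of $A_m$ belongs to arbitrarily small intervals in $\mathcal{I}$. By Vitali's theorem (221A), there is a countable
disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A_m\setminus\bigcup\mathcal{I}_0)=0$. Now, because $f$ is non-decreasing, $\langle f^*(J)\rangle_{J\in\mathcal{I}_0}$ is disjoint, and all the
$f^*(J)$ are included in $[-M,M]$, so $\sum_{J\in\mathcal{I}_0}\mu f^*(J)\le 2M$ and $\sum_{J\in\mathcal{I}_0}\mu J\le 2M/m$. Because $A_m\setminus\bigcup\mathcal{I}_0$ is negligible,
$$\mu^*A_m\le\frac{2M}{m}.$$
As $m$ is arbitrary and $A\subseteq A_m$ for every $m$, $\mu^*A=0$ and $A$ is negligible.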
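(c) Now consider $B=\{x:x\in I,\ \overline{D}f(x)>\underline{D}f(x)\}$. For $q$, $q'\in\mathbb{Q}$ with $0\le q<q'$, set
$$B_{qq'}=\{x:x\in I,\ \underline{D}f(x)<q,\ \overline{D}f(x)>q'\}.$$
Fix such $q$, $q'$ for the moment, and write $\gamma=\mu^*B_{qq'}$. Take any $\epsilon>0$, and let $G$ be an open set including $B_{qq'}$ such
that $\mu G\le\gamma+\epsilon$ (134Fa). Let $\mathcal{J}$ be the set of non-trivial closed intervals $[a,b]\subseteq I\cap G$ such that $f(b)-f(a)\le q(b-a)$;
this time $\mu f^*(J)\le q\mu J$ for $J\in\mathcal{J}$. Then every member of $B_{qq'}$ is included in arbitrarily small members of $\mathcal{J}$, so there
is a countable disjoint $\mathcal{J}_0\subseteq\mathcal{J}$ such that $B_{qq'}\setminus\bigcup\mathcal{J}_0$ is negligible. Let $L$ be the set of endpoints of members of $\mathcal{J}_0$;
then $L$ is a countable union of doubleton sets, so is countable, therefore negligible. Set
$$C=B_{qq'}\cap\bigcup\mathcal{J}_0\setminus L;$$
then $\mu^*C=\gamma$. Let $\mathcal{I}$ be the set of non-trivial closed intervals $J=[a,b]$ such that (i) $J$ is included in one of the
members of $\mathcal{J}_0$ (ii) $f(b)-f(a)\ge q'(b-a)$; now $\mu f^*(J)\ge q'\mu J$ for every $J\in\mathcal{I}$. Once again, because every member of
$C$ is an interior point of some member of $\mathcal{J}_0$, every point of $C$ belongs to arbitrarily small members of $\mathcal{I}$; so there is a
countable disjoint $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(C\setminus\bigcup\mathcal{I}_0)=0$.
As in (b) above,
$$q'\gamma\le q'\mu(\bigcup\mathcal{I}_0)=\sum_{I\in\mathcal{I}_0}q'\mu I\le\sum_{I\in\mathcal{I}_0}\mu f^*(I)=\mu(\bigcup_{I\in\mathcal{I}_0}f^*(I)).$$
On the other hand,
$$\mu(\bigcup_{J\in\mathcal{J}_0}f^*(J))=\sum_{J\in\mathcal{J}_0}\mu f^*(J)\le q\sum_{J\in\mathcal{J}_0}\mu J=q\mu(\bigcup\mathcal{J}_0)\le q\mu(\bigcup\mathcal{J})\le q\mu G\le q(\gamma+\epsilon).$$
But $\bigcup_{I\in\mathcal{I}_0}f^*(I)\subseteq\bigcup_{J\in\mathcal{J}_0}f^*(J)$, because every member of $\mathcal{I}_0$ is included in a member of $\mathcal{J}_0$, so $q'\gamma\le q(\gamma+\epsilon)$ and
$\gamma\le\epsilon q/(q'-q)$. As $\epsilon$ is arbitrary, $\gamma=0$.
Thus every $B_{qq'}$ is negligible. Consequently $B=\bigcup_{q,q'\in\mathbb{Q},\,0\le q<q'}B_{qq'}$ is negligible.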
(d) This deals with the case of a bounded open interval on which $f$ is bounded and non-decreasing. Still for non-decreasing
$f$, but for an arbitrary interval $I$, observe that $K=\{(q,q'):q,q'\in I\cap\mathbb{Q},\ q<q'\}$ is countable and that
$I\setminus\bigcup_{(q,q')\in K}\,]q,q'[$ has at most two points (the endpoints of $I$, if any), so is negligible. If we write $S$ for the set of points
of $I$ at which $f$ is not differentiable, then from (a)-(c) we see that $S\cap\,]q,q'[$ is negligible for every $(q,q')\in K$, so that
$S\cap\bigcup_{(q,q')\in K}\,]q,q'[$ is negligible and $S$ is negligible.
(e) Thus we are done if $f$ is non-decreasing. For non-increasing $f$, apply the above to $-f$, which is differentiable at
exactly the same points as $f$.
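222B Remarks (a) I note that in the above argument I am using such formulae as $\sum_{J\in\mathcal{I}_0}\mu f^*(J)$. This is because
Vitali's theorem leaves it open whether the families $\mathcal{I}_0$ will be finite or infinite. The sum must be interpreted along
the lines laid down in 112Bd in Volume 1; generally, $\sum_{k\in K}a_k$, where $K$ is an arbitrary set and every $a_k\ge 0$, is to be
$\sup_{L\subseteq K\text{ is finite}}\sum_{k\in L}a_k$, with the convention that $\sum_{k\in\emptyset}a_k=0$. Now, in this context, if $(X,\Sigma,\mu)$ is a measure space,
$K$ is a countable set, and $\langle E_k\rangle_{k\in K}$ is a family in $\Sigma$,
$$\mu(\bigcup_{k\in K}E_k)\le\sum_{k\in K}\mu E_k,$$
with equality if $\langle E_k\rangle_{k\in K}$ is disjoint. P If $K=\emptyset$, this is trivial. Otherwise, let $n\mapsto k_n:\mathbb{N}\to K$ be a surjection, and
set
$$K_n=\{k_i:i\le n\},\qquad G_n=\bigcup_{i\le n}E_{k_i}=\bigcup_{k\in K_n}E_k$$
for each $n\in\mathbb{N}$. Then $\langle G_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with union $E=\bigcup_{k\in K}E_k$, so
$$\mu E=\lim_{n\to\infty}\mu G_n=\sup_{n\in\mathbb{N}}\mu G_n;$$
and
$$\mu G_n\le\sum_{k\in K_n}\mu E_k\le\sum_{k\in K}\mu E_k$$
for every $n$, so $\mu E\le\sum_{k\in K}\mu E_k$. If the $E_k$ are disjoint, then $\mu G_n$ is precisely $\sum_{k\in K_n}\mu E_k$ for each $n$; but as $\langle K_n\rangle_{n\in\mathbb{N}}$
is a non-decreasing sequence of sets with union $K$, every finite subset of $K$ is included in some $K_n$, and
$$\sum_{k\in K}\mu E_k=\sup_{n\in\mathbb{N}}\sum_{k\in K_n}\mu E_k=\sup_{n\in\mathbb{N}}\mu G_n=\mu E,$$
as required. Q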
(b) Some readers will prefer to re-index sets regularly, so that all the sums they need to look at will be of the form
$\sum_{i=0}^{n}$ or $\sum_{i=0}^{\infty}$. In effect, that is what I did in Volume 1, in the proof of 114Da/115Da, when showing that Lebesgue
outer measure is indeed an outer measure. The disadvantage of this procedure in the context of 222A is that we must
continually check that it doesn't matter whether we have a finite or infinite sum at any particular moment. I believe
that it is worth taking the trouble to learn the technique sketched here, because it very frequently happens that we
wish to consider unions of sets indexed by sets other than $\mathbb{N}$ and $\{0,\ldots,n\}$.
(c) Of course the argument above can be shortened if you know a tiny bit more about countable sets than I have
explicitly stated so far. But note that the value assigned to $\sum_{k\in K}a_k$ must not depend on which enumeration $\langle k_n\rangle_{n\in\mathbb{N}}$
we pick on.
222C Lemma Suppose that $a\le b$ in $\mathbb{R}$, and that $F:[a,b]\to\mathbb{R}$ is a non-decreasing function. Then $\int_a^b F'$ exists
and is at most $F(b)-F(a)$.
Remark I discussed integration over subsets at length in §131 and §214. For measurable subsets, which are sufficient
for our needs in this chapter, we have a simple description: if $(X,\Sigma,\mu)$ is a measure space, $E\in\Sigma$ and $f$ is a real-valued
function, then $\int_E f=\int\tilde{f}$ if the latter integral exists, where $\operatorname{dom}\tilde{f}=(E\cap\operatorname{dom}f)\cup(X\setminus E)$ and $\tilde{f}(x)=f(x)$ if
$x\in E\cap\operatorname{dom}f$, $0$ if $x\in X\setminus E$ (apply 131Fa to $\tilde{f}$). It follows at once that if now $F\in\Sigma$ and $F\subseteq E$, $\int_F f=\int_E f\times\chi F$.
I write $\int_a^x f$ to mean $\int_{[a,x[}f$, which (because $[a,x[$ is measurable) can be dealt with as described above. Note that, as
long as we are dealing with Lebesgue measure, so that $[a,x]\setminus\,]a,x[\,=\{a,x\}$ is negligible, there is no need to distinguish
between $\int_{[a,x]}$, $\int_{]a,x[}$, $\int_{[a,x[}$, $\int_{]a,x]}$; for other measures on $\mathbb{R}$ we may need to take more care. I use half-open intervals to
make it obvious that $\int_a^x f+\int_x^y f=\int_a^y f$ if $a\le x\le y$, because
$$f\times\chi[a,y[\;=f\times\chi[a,x[\;+f\times\chi[x,y[.$$
proof (a) The result is trivial if $a=b$; let us suppose that $a<b$. By 222A, $F'$ is defined almost everywhere in $[a,b]$.
(b) For each $n\in\mathbb{N}$, define a simple function $g_n:[a,b[\,\to\mathbb{R}$ as follows. For $0\le k<2^n$, set $a_{nk}=a+2^{-n}k(b-a)$,
$b_{nk}=a+2^{-n}(k+1)(b-a)$, $I_{nk}=[a_{nk},b_{nk}[$. For each $x\in[a,b[$, take that $k<2^n$ such that $x\in I_{nk}$, and set
$$g_n(x)=\frac{2^n}{b-a}(F(b_{nk})-F(a_{nk}))$$
for $x\in I_{nk}$, so that $g_n$ gives the slope of the chord of the graph of $F$ defined by the endpoints of $I_{nk}$. Then
$$\int_a^b g_n=\sum_{k=0}^{2^n-1}F(b_{nk})-F(a_{nk})=F(b)-F(a).$$
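The telescoping in (b) can be checked with exact rational arithmetic. This is a small sketch, assuming integer or rational endpoints $a<b$ and a function `F` that accepts `Fraction` arguments; the names `chord_slopes` and `integral_of_g` are illustrative only.

```python
from fractions import Fraction

def chord_slopes(F, a, b, n):
    """Values of g_n on I_{n0}, ..., I_{n,2^n - 1}: one chord slope per dyadic interval."""
    width = Fraction(b - a, 2 ** n)           # common length of the intervals I_{nk}
    slopes = []
    for k in range(2 ** n):
        a_nk = a + k * width
        b_nk = a + (k + 1) * width
        slopes.append((F(b_nk) - F(a_nk)) / width)
    return slopes

def integral_of_g(F, a, b, n):
    """Integral of the simple function g_n over [a, b[: sum of slope times interval length."""
    width = Fraction(b - a, 2 ** n)
    return sum(slope * width for slope in chord_slopes(F, a, b, n))
```

Whatever non-decreasing `F` is supplied, continuous or not, `integral_of_g(F, a, b, n)` equals `F(b) - F(a)` for every `n`, because the sum telescopes; only the individual slopes depend on `n`.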
(c) On the other hand, if we set
$$C=\{x:x\in\,]a,b[,\ F'(x)\text{ exists}\},$$
then $[a,b]\setminus C$ is negligible, by 222A, and $F'(x)=\lim_{n\to\infty}g_n(x)$ for every $x\in C$. P Let $\epsilon>0$. Then there is a $\delta>0$
such that $x+h\in[a,b]$ and $|F(x+h)-F(x)-hF'(x)|\le\epsilon|h|$ whenever $|h|\le\delta$. Let $n\in\mathbb{N}$ be such that $2^{-n}(b-a)\le\delta$.
Let $k<2^n$ be such that $x\in I_{nk}$. Then
$$x-\delta\le a_{nk}\le x<b_{nk}\le x+\delta,\qquad g_n(x)=\frac{2^n}{b-a}(F(b_{nk})-F(a_{nk})).$$
Now we have
$$|g_n(x)-F'(x)|=\Bigl|\frac{2^n}{b-a}(F(b_{nk})-F(a_{nk}))-F'(x)\Bigr|$$
$$=\frac{2^n}{b-a}\bigl|F(b_{nk})-F(a_{nk})-(b_{nk}-a_{nk})F'(x)\bigr|$$
$$\le\frac{2^n}{b-a}\Bigl(\bigl|F(b_{nk})-F(x)-(b_{nk}-x)F'(x)\bigr|+\bigl|F(x)-F(a_{nk})-(x-a_{nk})F'(x)\bigr|\Bigr)$$
$$\le\frac{2^n}{b-a}\bigl(\epsilon|b_{nk}-x|+\epsilon|x-a_{nk}|\bigr)=\epsilon.$$
And this is true whenever $2^{-n}(b-a)\le\delta$, that is, for all $n$ large enough. As $\epsilon$ is arbitrary, $F'(x)=\lim_{n\to\infty}g_n(x)$. Q
(d) Thus $g_n\to F'$ almost everywhere in $[a,b]$. By Fatou's Lemma,
$$\int_a^b F'=\int_a^b\liminf_{n\to\infty}g_n\le\liminf_{n\to\infty}\int_a^b g_n=\lim_{n\to\infty}\int_a^b g_n=F(b)-F(a),$$
as required.
Remark There is a generalization of this result in 224I.
222D Lemma Suppose that $a<b$ in $\mathbb{R}$, and that $f$, $g$ are real-valued functions, both integrable over $[a,b]$, such
that $\int_a^x f=\int_a^x g$ for every $x\in[a,b]$. Then $f=g$ almost everywhere in $[a,b]$.
proof The point is that
$$\int_E f=\int_a^b f\times\chi E=\int_a^b g\times\chi E=\int_E g$$
for any measurable set $E\subseteq[a,b[$.
P (i) If $E=[c,d[$ where $a\le c\le d\le b$, then
$$\int_E f=\int_a^d f-\int_a^c f=\int_a^d g-\int_a^c g=\int_E g.$$
(ii) If $E=[a,b[\,\cap G$ for some open set $G\subseteq\mathbb{R}$, then for each $n\in\mathbb{N}$ set
$$K_n=\{k:k\in\mathbb{Z},\ |k|\le 4^n,\ [2^{-n}k,2^{-n}(k+1)[\,\subseteq G\},$$
$$H_n=\bigcup_{k\in K_n}[2^{-n}k,2^{-n}(k+1)[\,\cap\,[a,b[;$$
then $\langle H_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable sets with union $E$, so $f\times\chi E=\lim_{n\to\infty}f\times\chi H_n$, and
(by Lebesgue's Dominated Convergence Theorem, because $|f\times\chi H_n|\le|f|$ almost everywhere for every $n$, and $|f|$ is
integrable)
$$\int_E f=\lim_{n\to\infty}\int_{H_n}f.$$
At the same time, each $H_n$ is a finite disjoint union of half-open intervals in $[a,b[$, so
$$\int_{H_n}f=\sum_{k\in K_n}\int_{[2^{-n}k,2^{-n}(k+1)[\,\cap[a,b[}f=\sum_{k\in K_n}\int_{[2^{-n}k,2^{-n}(k+1)[\,\cap[a,b[}g=\int_{H_n}g,$$
and
$$\int_E g=\lim_{n\to\infty}\int_{H_n}g=\lim_{n\to\infty}\int_{H_n}f=\int_E f.$$
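The approximation of $E=[a,b[\,\cap G$ from inside by the sets $H_n$ can be watched numerically. The sketch below assumes that $G$ is given as a finite list of pairwise disjoint open intervals with rational endpoints (so that a dyadic interval lies in $G$ exactly when it lies in one component); the function name `dyadic_inner_measure` is hypothetical.

```python
from fractions import Fraction

def dyadic_inner_measure(components, a, b, n):
    """Lebesgue measure of H_n, where G is the union of the disjoint open intervals in `components`."""
    step = Fraction(1, 2 ** n)
    total = Fraction(0)
    for k in range(-4 ** n, 4 ** n + 1):          # the indices with |k| <= 4^n
        p, q = k * step, (k + 1) * step
        # [p, q[ is included in ]c, d[ exactly when c < p and q <= d
        if any(c < p and q <= d for (c, d) in components):
            lo, hi = max(p, a), min(q, b)          # intersect [p, q[ with [a, b[
            if lo < hi:
                total += hi - lo
    return total
```

For instance, with `components = [(Fraction(1, 3), Fraction(2, 3))]` and $[a,b[\,=[0,1[$, the values for $n=0,1,2,\ldots$ are non-decreasing and approach $\mu E=\frac13$ from below, which is the monotone convergence used in (ii).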
(iii) For general measurable $E\subseteq[a,b[$, we can choose for each $n\in\mathbb{N}$ an open set $G_n\supseteq E$ such that $\mu G_n\le\mu E+2^{-n}$
(134Fa). Set $G'_n=\bigcap_{m\le n}G_m$, $E_n=[a,b[\,\cap G'_n$ for each $n$,
$$F=[a,b[\,\cap\bigcap_{n\in\mathbb{N}}G_n=\bigcap_{n\in\mathbb{N}}[a,b[\,\cap G'_n=\bigcap_{n\in\mathbb{N}}E_n.$$
Then $E\subseteq F$ and
$$\mu F\le\inf_{n\in\mathbb{N}}\mu G_n=\mu E,$$
so $F\setminus E$ is negligible and $f\times\chi(F\setminus E)$ is zero almost everywhere; consequently $\int_{F\setminus E}f=0$ and $\int_F f=\int_E f$. On the
other hand,
$$f\times\chi F=\lim_{n\to\infty}f\times\chi E_n,$$
so by Lebesgue's Dominated Convergence Theorem again
$$\int_E f=\int_F f=\lim_{n\to\infty}\int_{E_n}f.$$
Similarly
$$\int_E g=\lim_{n\to\infty}\int_{E_n}g.$$
But by part (ii) we have $\int_{E_n}g=\int_{E_n}f$ for every $n$, so $\int_E g=\int_E f$, as required. Q
By 131Hb, $f=g$ almost everywhere in $[a,b[$, and therefore almost everywhere in $[a,b]$.
222E Theorem Suppose that $a\le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Then
$F(x)=\int_a^x f$ exists in $\mathbb{R}$ for every $x\in[a,b]$, and the derivative $F'(x)$ exists and is equal to $f(x)$ for almost every
$x\in[a,b]$.
proof (a) For most of this proof (down to the end of (c) below) I suppose that $f$ is non-negative. In this case,
$$F(y)=F(x)+\int_x^y f\ge F(x)$$
whenever $a\le x\le y\le b$; thus $F$ is non-decreasing and therefore differentiable almost everywhere in $[a,b]$, by 222A.
By 222C we know also that $\int_a^x F'$ exists and is less than or equal to $F(x)-F(a)=F(x)$ for every $x\in[a,b]$.
(b) Now suppose, in addition, that $f$ is bounded; say $0\le f(t)\le M$ for every $t\in\operatorname{dom}f$. Then $M-f$ is integrable
over $[a,b]$; let $G$ be its indefinite integral, so that $G(x)=M(x-a)-F(x)$ for every $x\in[a,b]$. Applying (a) to $M-f$
and $G$, we have $\int_a^x G'\le G(x)$ for every $x\in[a,b]$; but of course $G'=M-F'$, so $M(x-a)-\int_a^x F'\le M(x-a)-F(x)$,
that is, $\int_a^x F'\ge F(x)$ for every $x\in[a,b]$. Thus $\int_a^x F'=\int_a^x f$ for every $x\in[a,b]$. Now 222D tells us that $F'=f$ almost
everywhere in $[a,b]$.
(c) Thus for bounded, non-negative $f$ we are done. For unbounded $f$, let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence
of non-negative simple functions converging to $f$ almost everywhere in $[a,b]$, and let $\langle F_n\rangle_{n\in\mathbb{N}}$ be the corresponding
indefinite integrals. Then for any $n$ and any $x$, $y$ with $a\le x\le y\le b$, we have
$$F(y)-F(x)=\int_x^y f\ge\int_x^y f_n=F_n(y)-F_n(x),$$
so that $F'(x)\ge F'_n(x)$ for any $x\in\,]a,b[$ where both are defined, and $F'(x)\ge f_n(x)$ for almost every $x\in[a,b]$. This is
true for every $n$, so $F'\ge f$ almost everywhere, and $F'-f\ge 0$ almost everywhere. On the other hand, as noted in (a),
$$\int_a^b F'\le F(b)-F(a)=\int_a^b f,$$
so $\int_a^b F'-f\le 0$. It follows that $F'=_{\text{a.e.}}f$ (that is, that $F'=f$ almost everywhere in $[a,b]$) (122Rd).


(d) This completes the proof for non-negative $f$. For general $f$, we can express $f$ as $f_1-f_2$ where $f_1$, $f_2$ are
non-negative integrable functions; now $F=F_1-F_2$ where $F_1$, $F_2$ are the corresponding indefinite integrals, so
$F'=_{\text{a.e.}}F'_1-F'_2=_{\text{a.e.}}f_1-f_2$, and $F'=_{\text{a.e.}}f$.
222F Corollary Suppose that $f$ is any real-valued function which is integrable over $\mathbb{R}$, and set $F(x)=\int_{-\infty}^x f$ for
every $x\in\mathbb{R}$. Then $F'(x)$ exists and is equal to $f(x)$ for almost every $x\in\mathbb{R}$.
proof For each $n\in\mathbb{N}$, set
$$F_n(x)=\int_{-n}^x f$$
for $x\in[-n,n]$. Then $F'_n(x)=f(x)$ for almost every $x\in[-n,n]$. But $F(x)=F(-n)+F_n(x)$ for every $x\in[-n,n]$, so
$F'(x)$ exists and is equal to $F'_n(x)$ for every $x\in\,]-n,n[$ for which $F'_n(x)$ is defined; and $F'(x)=f(x)$ for almost every
$x\in[-n,n]$. As $n$ is arbitrary, $F'=_{\text{a.e.}}f$.
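222G Corollary Suppose that $E\subseteq\mathbb{R}$ is a measurable set and that $f$ is a real-valued function which is integrable
over $E$. Set $F(x)=\int_{E\cap]-\infty,x[}f$ for $x\in\mathbb{R}$. Then $F'(x)=f(x)$ for almost every $x\in E$, and $F'(x)=0$ for almost every
$x\in\mathbb{R}\setminus E$.
proof Apply 222F to $\tilde{f}$, where $\tilde{f}(x)=f(x)$ for $x\in E\cap\operatorname{dom}f$ and $\tilde{f}(x)=0$ for $x\in\mathbb{R}\setminus E$, so that $F(x)=\int_{-\infty}^x\tilde{f}$ for
every $x\in\mathbb{R}$.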
222H The result that $\frac{d}{dx}\int_a^x f=f(x)$ for almost every $x$ is satisfying, but is no substitute for the more elementary
result that this equality is valid at any point at which $f$ is continuous.
Proposition Suppose that $a\le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Set $F(x)=\int_a^x f$
for $x\in[a,b]$. Then $F'(x)$ exists and is equal to $f(x)$ at any point $x\in\operatorname{dom}(f)\,\cap\,]a,b[$ at which $f$ is continuous.
proof Set $c=f(x)$. Let $\epsilon>0$. Let $\delta>0$ be such that $\delta\le\min(b-x,x-a)$ and $|f(t)-c|\le\epsilon$ whenever $t\in\operatorname{dom}f$
and $|t-x|\le\delta$. If $x<y\le x+\delta$, then
$$\Bigl|\frac{F(y)-F(x)}{y-x}-c\Bigr|=\frac{1}{y-x}\Bigl|\int_x^y f-c\Bigr|\le\frac{1}{y-x}\int_x^y|f-c|\le\epsilon.$$
Similarly, if $x-\delta\le y<x$,
$$\Bigl|\frac{F(y)-F(x)}{y-x}-f(x)\Bigr|=\frac{1}{x-y}\Bigl|\int_y^x f-c\Bigr|\le\frac{1}{x-y}\int_y^x|f-c|\le\epsilon.$$
As $\epsilon$ is arbitrary,
$$F'(x)=\lim_{y\to x}\frac{F(y)-F(x)}{y-x}=c,$$
as required.
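A quick floating-point experiment shows the role of continuity in 222H. The example is an assumption made for illustration: $f(t)=t^2$ for $t<1$ and $f(t)=3$ for $t\ge 1$ on $[0,2]$, whose indefinite integral is written out by hand in `F`; the function names are hypothetical.

```python
def F(x):
    """Indefinite integral on [0, 2] of f(t) = t**2 for t < 1, f(t) = 3 for t >= 1."""
    return x ** 3 / 3 if x <= 1 else 1 / 3 + 3 * (x - 1)

def quotient(x, h):
    """Difference quotient (F(x + h) - F(x)) / h for a small non-zero h."""
    return (F(x + h) - F(x)) / h

# At x = 0.5, where f is continuous, both one-sided quotients are close to f(0.5) = 0.25.
near_continuity_point = (quotient(0.5, 1e-6), quotient(0.5, -1e-6))

# At the jump x = 1 the right-hand quotient is close to 3 and the left-hand one to 1,
# so F is not differentiable there.
near_jump = (quotient(1.0, 1e-6), quotient(1.0, -1e-6))
```

The single point $x=1$ is negligible, so this is consistent with 222E; 222H simply gives no information there.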
222I Complex-valued functions So far in this section, I have taken every $f$ to be real-valued. The extension to
complex-valued $f$ is just a matter of applying the above results to the real and imaginary parts of $f$. Specifically, we
have the following.
(a) If $a\le b$ in $\mathbb{R}$ and $f$ is a complex-valued function which is integrable over $[a,b]$, then $F(x)=\int_a^x f$ is defined
in $\mathbb{C}$ for every $x\in[a,b]$, and its derivative $F'(x)$ exists and is equal to $f(x)$ for almost every $x\in[a,b]$; moreover,
$F'(x)=f(x)$ whenever $x\in\operatorname{dom}(f)\,\cap\,]a,b[$ and $f$ is continuous at $x$.
(b) If $f$ is a complex-valued function which is integrable over $\mathbb{R}$, and $F(x)=\int_{-\infty}^x f$ for each $x\in\mathbb{R}$, then $F'$ exists
and is equal to $f$ almost everywhere in $\mathbb{R}$.
(c) If $E\subseteq\mathbb{R}$ is a measurable set and $f$ is a complex-valued function which is integrable over $E$, and $F(x)=\int_{E\cap]-\infty,x[}f$
for each $x\in\mathbb{R}$, then $F'(x)=f(x)$ for almost every $x\in E$ and $F'(x)=0$ for almost every $x\in\mathbb{R}\setminus E$.


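*222J The Denjoy-Young-Saks theorem The next result will not be used, on present plans, anywhere in this
treatise. It is however central to parts of real analysis for which this volume is supposed to be a foundation, and while
the argument requires a certain sophistication it is not really a large step from Lebesgue's theorem 222A. I must begin
with some notation.
Definition Let $f$ be any real function, and $A\subseteq\mathbb{R}$ its domain. Write
$$\tilde{A}^+=\{x:x\in A,\ ]x,x+\delta]\cap A\ne\emptyset\text{ for every }\delta>0\},$$
$$\tilde{A}^-=\{x:x\in A,\ [x-\delta,x[\,\cap A\ne\emptyset\text{ for every }\delta>0\}.$$
Set
$$(\overline{D}^+f)(x)=\limsup_{y\in A,\,y\downarrow x}\frac{f(y)-f(x)}{y-x}=\inf_{\delta>0}\sup_{y\in A,\,x<y\le x+\delta}\frac{f(y)-f(x)}{y-x},$$
$$(\underline{D}^+f)(x)=\liminf_{y\in A,\,y\downarrow x}\frac{f(y)-f(x)}{y-x}=\sup_{\delta>0}\inf_{y\in A,\,x<y\le x+\delta}\frac{f(y)-f(x)}{y-x}$$
for $x\in\tilde{A}^+$, and
$$(\overline{D}^-f)(x)=\limsup_{y\in A,\,y\uparrow x}\frac{f(y)-f(x)}{y-x}=\inf_{\delta>0}\sup_{y\in A,\,x-\delta\le y<x}\frac{f(y)-f(x)}{y-x},$$
$$(\underline{D}^-f)(x)=\liminf_{y\in A,\,y\uparrow x}\frac{f(y)-f(x)}{y-x}=\sup_{\delta>0}\inf_{y\in A,\,x-\delta\le y<x}\frac{f(y)-f(x)}{y-x}$$
for $x\in\tilde{A}^-$, all defined in $[-\infty,\infty]$. (These are the four Dini derivates of $f$. You will also see $D^+$, $d^+$, $D^-$, $d^-$ used
in place of my $\overline{D}^+$, $\underline{D}^+$, $\overline{D}^-$ and $\underline{D}^-$.)
Note that we surely have $(\underline{D}^+f)(x)\le(\overline{D}^+f)(x)$ for every $x\in\tilde{A}^+$, while $(\underline{D}^-f)(x)\le(\overline{D}^-f)(x)$ for every $x\in\tilde{A}^-$.
The ordinary derivative $f'(x)$ is defined and equal to $c\in\mathbb{R}$ iff ($\alpha$) $x$ belongs to some open interval included in $A$ ($\beta$)
$(\overline{D}^+f)(x)=(\underline{D}^+f)(x)=(\overline{D}^-f)(x)=(\underline{D}^-f)(x)=c$.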
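The definitions can be explored numerically at a single point. The following is a rough sketch only: it fixes one $\delta$, samples the difference quotients at finitely many points of $]0,\delta]$ and $[-\delta,0[$, and takes the maximum and minimum, so it estimates the inner $\sup$ and $\inf$ for that $\delta$ rather than the limiting values. The sampling scheme (equally spaced values of $t=1/|y|$ over a range of length $20\pi$) and the function name are assumptions chosen to suit the oscillating example $f(y)=y\sin(1/y)$, $f(0)=0$.

```python
import math

def dini_estimates_at_zero(f, delta, samples=20000):
    """Crude estimates (upper right, lower right, upper left, lower left) of the Dini derivates of f at 0."""
    t0 = 1.0 / delta
    # sample t = 1/|y| over [1/delta, 1/delta + 20*pi], so that |y| <= delta
    ts = [t0 + 20 * math.pi * i / samples for i in range(samples + 1)]
    right = [(f(1 / t) - f(0.0)) / (1 / t) for t in ts]       # quotients with y > 0
    left = [(f(-1 / t) - f(0.0)) / (-1 / t) for t in ts]      # quotients with y < 0
    return max(right), min(right), max(left), min(left)

def oscillating(y):
    """f(y) = y * sin(1/y) for y != 0, and f(0) = 0."""
    return y * math.sin(1 / y) if y != 0 else 0.0

estimates = dini_estimates_at_zero(oscillating, 0.01)
```

For this example every difference quotient at $0$ is a value of $\pm\sin(1/|y|)$, so the estimates come out close to $1$, $-1$, $1$, $-1$: all four derivates at $0$ are finite but the two upper ones differ from the two lower ones, and $f$ is not differentiable there.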
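*222K Lemma Let $A$ be any subset of $\mathbb{R}$, and define $\tilde{A}^+$ and $\tilde{A}^-$ as in 222J. Then $A\setminus\tilde{A}^+$ and $A\setminus\tilde{A}^-$ are countable,
therefore negligible.
proof We have
$$A\setminus\tilde{A}^+=\bigcup_{q\in\mathbb{Q}}\{x:x\in A,\ x<q,\ A\cap\,]x,q]=\emptyset\}.$$
But for any $q\in\mathbb{Q}$ there can be at most one $x\in A$ such that $x<q$ and $]x,q]$ does not meet $A$, so $A\setminus\tilde{A}^+$ is a countable
union of finite sets and is countable. Similarly,
$$A\setminus\tilde{A}^-=\bigcup_{q\in\mathbb{Q}}\{x:x\in A,\ q<x,\ A\cap[q,x[\,=\emptyset\}$$
is countable.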
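*222L Theorem Let $f$ be any real function, and $A$ its domain. Then for almost every $x\in A$
either all four Dini derivates of $f$ at $x$ are defined, finite and equal
or $(\overline{D}^+f)(x)=(\underline{D}^-f)(x)$ is finite, $(\underline{D}^+f)(x)=-\infty$ and $(\overline{D}^-f)(x)=\infty$
or $(\underline{D}^+f)(x)=(\overline{D}^-f)(x)$ is finite, $(\overline{D}^+f)(x)=\infty$ and $(\underline{D}^-f)(x)=-\infty$
or $(\overline{D}^+f)(x)=(\overline{D}^-f)(x)=\infty$ and $(\underline{D}^+f)(x)=(\underline{D}^-f)(x)=-\infty$.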
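proof (a) Set $\tilde{A}=\tilde{A}^+\cap\tilde{A}^-$, defining $\tilde{A}^+$ and $\tilde{A}^-$ as in 222J, so that $\tilde{A}$ is a cocountable subset of $A$ and all four Dini
derivates are defined on $\tilde{A}$. For $n\in\mathbb{N}$, $q\in\mathbb{Q}$ set
$$E_{qn}=\{x:x\in\tilde{A},\ x<q,\ f(y)\ge f(x)-n(y-x)\text{ for every }y\in A\cap[x,q]\}.$$
Observe that
$$\bigcup_{n\in\mathbb{N},\,q\in\mathbb{Q}}E_{qn}=\{x:x\in\tilde{A},\ (\underline{D}^+f)(x)>-\infty\}.$$
For those $q\in\mathbb{Q}$, $n\in\mathbb{N}$ such that $E_{qn}$ is not empty, set $\beta_{qn}=\sup E_{qn}\in\,]-\infty,q]$, $\alpha_{qn}=\inf E_{qn}\in[-\infty,\beta_{qn}]$, and for
$x\in\,]\alpha_{qn},\beta_{qn}[$ set $g_{qn}(x)=\inf\{f(y)+ny:y\in A\cap[x,q]\}$. Note that if $x\in E_{qn}\setminus\{\alpha_{qn},\beta_{qn}\}$ then $g_{qn}(x)=f(x)+nx$
is finite; also $g_{qn}$ is monotonic, therefore finite everywhere in $]\alpha_{qn},\beta_{qn}[$, and of course $g_{qn}(x)\le f(x)+nx$ for every
$x\in A\cap\,]\alpha_{qn},\beta_{qn}[$.
By 222A, almost every point of $]\alpha_{qn},\beta_{qn}[$ belongs to $F_{qn}=\operatorname{dom}g'_{qn}$; in particular, $E_{qn}\setminus F_{qn}$ is negligible. Set
$h_{qn}(x)=g_{qn}(x)-nx$ for $x\in\,]\alpha_{qn},\beta_{qn}[$; then $h_{qn}$ is differentiable at every point of $F_{qn}$. Now if $x\in E_{qn}\cap F_{qn}$, we have
$h_{qn}(x)=f(x)$, while $h_{qn}(x)\le f(x)$ for $x\in A\cap\,]\alpha_{qn},\beta_{qn}[$; it follows that
$$(\underline{D}^+f)(x)=\sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x}\ge\sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}$$
$$\ge\sup_{0<\delta<\beta_{qn}-x}\inf_{y\in]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}=h'_{qn}(x),$$
and similarly $(\overline{D}^-f)(x)\le h'_{qn}(x)$. On the other hand, if $x\in\tilde{E}^+_{qn}$, then
$$(\underline{D}^+f)(x)=\sup_{\delta>0}\inf_{y\in A\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x}\le\sup_{\delta>0}\inf_{y\in E_{qn}\cap]x,x+\delta]}\frac{f(y)-f(x)}{y-x}$$
$$=\sup_{\delta>0}\inf_{y\in E_{qn}\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}\le\inf_{\delta>0}\sup_{y\in E_{qn}\cap]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}$$
$$\le\inf_{0<\delta<\beta_{qn}-x}\sup_{y\in]x,x+\delta]}\frac{h_{qn}(y)-h_{qn}(x)}{y-x}=h'_{qn}(x).$$
Putting these together, we see that if $x\in F_{qn}\cap\tilde{E}^+_{qn}$ then $(\underline{D}^+f)(x)=h'_{qn}(x)\ge(\overline{D}^-f)(x)$.
Conventionally setting $F_{qn}=\emptyset$ if $E_{qn}$ is empty, the last sentence is true for all $q\in\mathbb{Q}$ and $n\in\mathbb{N}$, while $A\setminus\tilde{A}$ and
$\bigcup_{q\in\mathbb{Q},\,n\in\mathbb{N}}E_{qn}\setminus(F_{qn}\cap\tilde{E}^+_{qn})$ are negligible, and $(\underline{D}^+f)(x)=-\infty$ for every $x\in\tilde{A}\setminus\bigcup_{q\in\mathbb{Q},\,n\in\mathbb{N}}E_{qn}$. So we see that, for
almost every $x\in A$, either $(\underline{D}^+f)(x)=-\infty$ or $\infty>(\underline{D}^+f)(x)\ge(\overline{D}^-f)(x)$.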
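(b) Reflecting the above argument left-to-right or up-to-down, we see that, for almost every $x\in A$,
either $(\underline{D}^-f)(x)=-\infty$ or $\infty>(\underline{D}^-f)(x)\ge(\overline{D}^+f)(x)$,
either $(\overline{D}^+f)(x)=\infty$ or $-\infty<(\overline{D}^+f)(x)\le(\underline{D}^-f)(x)$,
either $(\overline{D}^-f)(x)=\infty$ or $-\infty<(\overline{D}^-f)(x)\le(\underline{D}^+f)(x)$,
and also
$$(\underline{D}^+f)(x)\le(\overline{D}^+f)(x),\qquad(\underline{D}^-f)(x)\le(\overline{D}^-f)(x).$$
For such $x$, therefore,
$$(\underline{D}^+f)(x)>-\infty\Longrightarrow(\overline{D}^-f)(x)\le(\underline{D}^+f)(x)<\infty\Longrightarrow(\underline{D}^+f)(x)=(\overline{D}^-f)(x)\in\mathbb{R},$$
and similarly
$$(\underline{D}^-f)(x)>-\infty\Longrightarrow(\underline{D}^-f)(x)=(\overline{D}^+f)(x)\in\mathbb{R},$$
$$(\overline{D}^+f)(x)<\infty\Longrightarrow(\overline{D}^+f)(x)=(\underline{D}^-f)(x)\in\mathbb{R},$$
$$(\overline{D}^-f)(x)<\infty\Longrightarrow(\overline{D}^-f)(x)=(\underline{D}^+f)(x)\in\mathbb{R}.$$
So we have
either $(\overline{D}^+f)(x)=(\underline{D}^-f)(x)$ is finite or $(\overline{D}^+f)(x)=\infty$ and $(\underline{D}^-f)(x)=-\infty$,
either $(\overline{D}^-f)(x)=(\underline{D}^+f)(x)$ is finite or $(\overline{D}^-f)(x)=\infty$ and $(\underline{D}^+f)(x)=-\infty$.
These two dichotomies lead to four possibilities; and since
$(\overline{D}^+f)(x)=(\underline{D}^-f)(x)$ is finite, $(\overline{D}^-f)(x)=(\underline{D}^+f)(x)$ is finite
can be true together only when all four derivates are equal and finite, we have the four cases listed in the statement of
the theorem.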
222X Basic exercises >(a) Let $F:[0,1]\to[0,1]$ be the Cantor function (134H). Show that $\int_0^1 F'=0<F(1)-F(0)$.
>(b) Suppose that $a<b$ in $\mathbb{R}$ and that $h$ is a real-valued function such that $\int_x^y h$ exists and is non-negative whenever
$a\le x\le y\le b$. Show that $h\ge 0$ almost everywhere in $[a,b]$.
>(c) Suppose that $a<b$ in $\mathbb{R}$ and that $f$, $g$ are integrable complex-valued functions on $[a,b]$ such that $\int_a^x f=\int_a^x g$
for every $x\in[a,b]$. Show that $f=g$ almost everywhere in $[a,b]$.
>(d) Suppose that $a<b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a,b]$. Show that the
indefinite integral $x\mapsto\int_a^x f$ is continuous.
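For readers who want to experiment with the Cantor function of 222Xa, here is a small sketch that evaluates it from the ternary expansion of $x$, using exact rationals. The function name `cantor` and the truncation depth are assumptions; for points whose ternary expansion contains no digit 1 within `depth` digits the value returned is an approximation with error at most $2^{-\text{depth}}$.

```python
from fractions import Fraction

def cantor(x, depth=40):
    """Cantor function at a rational x in [0, 1], read off from the ternary digits of x."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)                 # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, on which the function is constant
            return value + scale
        value += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value
```

The function rises from `cantor(0) == 0` to `cantor(1) == 1`, yet it is constant on each removed middle third, for instance equal to $\frac12$ on $[\frac13,\frac23]$, so its difference quotients vanish on an open set of full measure.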
222Y Further exercises (a) Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative, non-decreasing functions on $[0,1]$ such
that $F(x)=\sum_{n=0}^{\infty}F_n(x)$ is finite for every $x\in[0,1]$. Show that $\sum_{n=0}^{\infty}F'_n(x)=F'(x)$ for almost every $x\in[0,1]$.
(Hint: take $\langle n_k\rangle_{k\in\mathbb{N}}$ such that $\sum_{k=0}^{\infty}F(1)-G_k(1)<\infty$, where $G_k=\sum_{j=0}^{n_k}F_j$, and set $H(x)=\sum_{k=0}^{\infty}F(x)-G_k(x)$.
Observe that $\sum_{k=0}^{\infty}F'(x)-G'_k(x)\le H'(x)$ whenever all the derivatives are defined, so that $F'=\lim_{k\to\infty}G'_k$ almost
everywhere.)
(b) Let $F:[0,1]\to\mathbb{R}$ be a continuous non-decreasing function. (i) Show that if $c\in\mathbb{R}$ then $C=\{(x,y):x,y\in[0,1],\ F(y)-F(x)=c\}$
is connected. (Hint: A set $A\subseteq\mathbb{R}^r$ is connected if there is no continuous surjection $h:A\to\{0,1\}$.
Show that if $h:C\to\{0,1\}$ is continuous then it is of the form $(x,y)\mapsto h_1(x)$ for some continuous function $h_1$.)
(ii) Now suppose that $F(0)=0$, $F(1)=1$ and that $G:[0,1]\to[0,1]$ is a second continuous non-decreasing function
with $G(0)=0$, $G(1)=1$. Show that for any $n\ge 1$ there are $x$, $y\in[0,1]$ such that $F(y)-F(x)=G(y)-G(x)=\frac{1}{n}$.
(c) Let $f$, $g$ be non-negative integrable functions on $\mathbb{R}$, and $n\ge 1$. Show that there are $u<v$ in $[-\infty,\infty]$ such that
$\int_u^v f=\frac{1}{n}\int f$ and $\int_u^v g=\frac{1}{n}\int g$.
(d) Let $f:\mathbb{R}\to\mathbb{R}$ be measurable. Show that $H=\operatorname{dom}f'$ is a measurable set and that $f'$ is a measurable function.
(e) Construct a Borel measurable function $f:[0,1]\to\{-1,0,1\}$ such that each of the four possibilities described
in Theorem 222L occurs on a set of measure $\frac14$.
222 Notes and comments I have relegated to an exercise (222Xd) the fundamental fact that an indefinite integral
$x\mapsto\int_a^x f$ is always continuous; this is not strictly speaking needed in this section, and a much stronger result is given
in 225E. There is also much more to be said about monotonic functions, to which I will return in §224. What we
need here is the fact that they are differentiable almost everywhere (222A), which I prove by applying Vitali's theorem
three times, once in part (b) of the proof and twice in part (c). Following this, the arguments of 222C-222E form a
fine series of exercises in the central ideas of Volume 1, using the concept of integration over a (measurable) subset,
Fatou's Lemma (part (d) of the proof of 222C), Lebesgue's Dominated Convergence Theorem (parts (ii) and (iii) of the
proof of 222D) and the approximation of Lebesgue measurable sets by open sets (part (iii) of the proof of 222D). Of
course knowing that $\frac{d}{dx}\int_a^x f=f(x)$ almost everywhere is not at all the same thing as knowing that this holds for any
particular $x$, and when we come to differentiate any particular indefinite integral we generally turn to 222H first; the
point of 222E is that it applies to wildly discontinuous functions, for which more primitive methods give no information
at all.
The Denjoy-Young-Saks theorem (222L) is one of the starting points of a flourishing theory of 'typical' phenomena
in real analysis. It is easy to build a function $f$ with any prescribed set of values for $(\overline{D}^+f)(0)$, $(\underline{D}^+f)(0)$, $(\overline{D}^-f)(0)$
and $(\underline{D}^-f)(0)$ (subject, of course, to the requirements $(\underline{D}^+f)(0)\le(\overline{D}^+f)(0)$ and $(\underline{D}^-f)(0)\le(\overline{D}^-f)(0)$). But 222L
tells us that such combinations as $(\underline{D}^-f)(x)=(\underline{D}^+f)(x)=\infty$ (what we might call '$f'(x)=\infty$') can occur only on
negligible sets. The four easily realized possibilities in 222L (see 222Ye) are the only ones which can appear at points
which are 'typical' for the given function, from the point of view of Lebesgue measure. For a monotonic function, 222A
tells us more: at 'typical' points for a monotonic function, the function is actually differentiable. In the next section we
shall see some more ways of generating negligible and conegligible sets from a given set or function, leading to further
refinements of the idea.
223 Lebesgue's density theorems
I now turn to a group of results which may be thought of as corollaries of Theorem 222E, but which also have a vigorous life of their own, including the possibility of significant generalizations which will be treated in Chapter 26. The idea is that any measurable function $f$ on $\mathbb{R}$ is almost everywhere continuous in a variety of very weak senses; for almost every $x$, the value $f(x)$ is determined by the behaviour of $f$ near $x$, in the sense that $f(y) \approx f(x)$ for most $y$ near $x$. I should perhaps say that while I recommend this work as a preparation for Chapter 26, and I also rely on it in Chapter 28, I shall not refer to it again in the present chapter, so that readers in a hurry to characterize indefinite integrals may proceed directly to §224.
223A Lebesgue's Density Theorem: integral form Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Then
\[ f(x) = \lim_{h \downarrow 0} \frac{1}{h}\int_x^{x+h} f = \lim_{h \downarrow 0} \frac{1}{h}\int_{x-h}^{x} f = \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f \]
for almost every $x \in I$.
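proof Setting $F(x) = \int_{I \cap ]-\infty, x[} f$, we know from 222G that
\[ \begin{aligned}
f(x) = F'(x) &= \lim_{h \downarrow 0} \frac{1}{h}(F(x+h) - F(x)) = \lim_{h \downarrow 0} \frac{1}{h}\int_x^{x+h} f \\
&= \lim_{h \downarrow 0} \frac{1}{h}(F(x) - F(x-h)) = \lim_{h \downarrow 0} \frac{1}{h}\int_{x-h}^{x} f \\
&= \lim_{h \downarrow 0} \frac{1}{2h}(F(x+h) - F(x-h)) = \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f
\end{aligned} \]
for almost every $x \in I$.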
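223B Corollary Let $E \subseteq \mathbb{R}$ be a measurable set. Then
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 1 \text{ for almost every } x \in E, \]
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 0 \text{ for almost every } x \in \mathbb{R} \setminus E. \]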
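proof Take $n \in \mathbb{N}$. Applying 223A to $f = \chi(E \cap [-n, n])$, we see that
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f = \lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) \]
whenever $x \in ]-n, n[$ and either limit exists, so that
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 1 \text{ for almost every } x \in E \cap [-n, n], \]
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 0 \text{ for almost every } x \in [-n, n] \setminus E. \]
As $n$ is arbitrary, we have the result.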
Remark For a measurable set $E \subseteq \mathbb{R}$, a point $x$ such that $\lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 1$ is sometimes called a density point of $E$.
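As a purely numerical illustration of 223B (a sketch, not part of the formal development), the following Python fragment evaluates the ratio $\frac{1}{2h}\mu(E \cap [x-h, x+h])$ when $E$ is a finite union of disjoint intervals. The helper names `measure_of_intersection` and `density_ratio`, and the choice $E = [0,1] \cup [2,3]$, are assumptions made only for this example.

```python
def measure_of_intersection(intervals, lo, hi):
    """Lebesgue measure of (union of disjoint intervals) ∩ [lo, hi]."""
    total = 0.0
    for a, b in intervals:
        left, right = max(a, lo), min(b, hi)
        if right > left:
            total += right - left
    return total


def density_ratio(intervals, x, h):
    """The ratio mu(E ∩ [x-h, x+h]) / 2h whose limit as h -> 0 is studied in 223B."""
    return measure_of_intersection(intervals, x - h, x + h) / (2 * h)


# Hypothetical example set: E = [0, 1] ∪ [2, 3].
E = [(0.0, 1.0), (2.0, 3.0)]
for h in (0.5, 0.1, 0.01):
    # x = 0.5 is interior to E, x = 1.0 is an endpoint, x = 1.5 lies outside E.
    print(h, density_ratio(E, 0.5, h), density_ratio(E, 1.0, h), density_ratio(E, 1.5, h))
```

In this example the ratio is (up to rounding) 1 at the interior point, 0 at the exterior point and 1/2 at the endpoint; the endpoints are exactly the points of the negligible exceptional set allowed by the corollary.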
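223C Corollary Let $f$ be a measurable real-valued function defined almost everywhere in $\mathbb{R}$. Then for almost every $x \in \mathbb{R}$,
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1, \]
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0 \]
for every $\epsilon > 0$.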
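proof For $q, q' \in \mathbb{Q}$, set
\[ D_{qq'} = \{x : x \in \operatorname{dom} f,\ q \le f(x) < q'\}, \]
so that $D_{qq'}$ is measurable,
\[ C_{qq'} = \{x : x \in D_{qq'},\ \lim_{h \downarrow 0} \frac{1}{2h}\mu(D_{qq'} \cap [x-h, x+h]) = 1\}, \]
so that $D_{qq'} \setminus C_{qq'}$ is negligible, by 223B; now set
\[ C = \operatorname{dom} f \setminus \bigcup_{q, q' \in \mathbb{Q}} (D_{qq'} \setminus C_{qq'}), \]
so that $C$ is conegligible. If $x \in C$ and $\epsilon > 0$, then there are $q, q' \in \mathbb{Q}$ such that $f(x) - \epsilon \le q \le f(x) < q' \le f(x) + \epsilon$, so that $x$ belongs to $D_{qq'}$ and therefore to $C_{qq'}$, and now
\[ \liminf_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \le \epsilon\} \ge \liminf_{h \downarrow 0} \frac{1}{2h}\mu(D_{qq'} \cap [x-h, x+h]) = 1, \]
so
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \le \epsilon\} = 1. \]
It follows at once that
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| > \epsilon\} = 0 \]
for almost every $x$; but since $\epsilon$ is arbitrary, this is also true of $\frac{1}{2}\epsilon$, so in fact
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x-h, x+h],\ |f(y) - f(x)| \ge \epsilon\} = 0 \]
for almost every $x$.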
223D Theorem Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Then
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0 \]
for almost every $x \in I$.
proof (a) Suppose first that $I$ is a bounded open interval $]a, b[$. For each $q \in \mathbb{Q}$, set $g_q(x) = |f(x) - q|$ for $x \in I \cap \operatorname{dom} f$; then $g_q$ is integrable over $I$, and
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} g_q = g_q(x) \]
for almost every $x \in I$, by 223A. Setting
\[ E_q = \{x : x \in I \cap \operatorname{dom} f,\ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} g_q = g_q(x)\}, \]
we have $I \setminus E_q$ negligible, so $I \setminus E$ is negligible, where $E = \bigcap_{q \in \mathbb{Q}} E_q$. Now
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0 \]
for every $x \in E$. P Take $x \in E$ and $\epsilon > 0$. Then there is a $q \in \mathbb{Q}$ such that $|f(x) - q| \le \epsilon$, so that
\[ |f(y) - f(x)| \le |f(y) - q| + \epsilon = g_q(y) + \epsilon \]
for every $y \in I \cap \operatorname{dom} f$, and
\[ \limsup_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy \le \limsup_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} (g_q(y) + \epsilon)\,dy = \epsilon + g_q(x) \le 2\epsilon. \]
As $\epsilon$ is arbitrary,
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0, \]
as required. Q
(b) If $I$ is an unbounded open interval, apply (a) to the intervals $I_n = I \cap ]-n, n[$ to see that the limit is zero almost everywhere in every $I_n$, and therefore on $I$. If $I$ is an arbitrary interval, note that it differs by at most two points from an open interval, and that since we are looking only for something to happen almost everywhere we can ignore these points.
Remark The set
\[ \{x : x \in \operatorname{dom} f,\ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0\} \]
is sometimes called the Lebesgue set of $f$.
223E Complex-valued functions I have expressed the results above in terms of real-valued functions, this being
the most natural vehicle for the ideas. However there are applications of great importance in which the functions
involved are complex-valued, so I spell out the relevant statements here. In all cases the proof is elementary, being
nothing more than applying the corresponding result (223A, 223C or 223D) to the real and imaginary parts of the
function f.
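(a) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then
\[ f(x) = \lim_{h \downarrow 0} \frac{1}{h}\int_x^{x+h} f = \lim_{h \downarrow 0} \frac{1}{h}\int_{x-h}^{x} f = \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f \]
for almost every $x \in I$.
(b) Let $f$ be a measurable complex-valued function defined almost everywhere in $\mathbb{R}$. Then for almost every $x \in \mathbb{R}$,
\[ \lim_{h \downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0 \]
for every $\epsilon > 0$.
(c) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then
\[ \lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0 \]
for almost every $x \in I$.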
223X Basic exercises >(a) Let $E \subseteq [0, 1]$ be a measurable set for which there is an $\epsilon > 0$ such that $\mu(E \cap [a, b]) \ge \epsilon(b - a)$ whenever $0 \le a \le b \le 1$. Show that $\mu E = 1$.
(b) Let $A \subseteq \mathbb{R}$ be any set. Show that $\lim_{h \downarrow 0} \frac{1}{2h}\mu^*(A \cap [x-h, x+h]) = 1$ for almost every $x \in A$. (Hint: apply 223B to a measurable envelope $E$ of $A$.)
(c) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and $x \in \mathbb{R}$ a point which is a density point of both. Show that $x$ is a density point of $E \cap F$.
(d) Let $E \subseteq \mathbb{R}$ be a non-negligible measurable set. Show that for any $n \in \mathbb{N}$ there is a $\delta > 0$ such that $\bigcap_{i \le n} (E + x_i)$ is non-empty whenever $x_0, \ldots, x_n \in \mathbb{R}$ are such that $|x_i - x_j| \le \delta$ for all $i, j \le n$. (Hint: find a non-trivial interval $I$ such that $\mu(E \cap I) > \frac{n}{n+1}\mu I$.)
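(e) Let $f$ be any real-valued function defined almost everywhere in $\mathbb{R}$. Show that $\lim_{h \downarrow 0} \frac{1}{2h}\mu^*\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1$ for almost every $x \in \mathbb{R}$. (Hint: use the argument of 223C, but with 223Xb in place of 223B.)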
>(f) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Show that $\lim_{h \downarrow 0} \frac{1}{h}\int_x^{x+h} |f(y) - f(x)|\,dy = 0$ for almost every $x \in I$.
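(g) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $F$ is bounded and of non-zero measure. Let $x \in \mathbb{R}$ be such that $\lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 1$. Show that $\lim_{h \downarrow 0} \frac{\mu(E \cap (x + hF))}{\mu(hF)} = 1$. (Hint: it helps to know that $\mu(hF) = h\mu F$ (134Ya, 263A). Show that if $F \subseteq [-M, M]$, then
\[ \frac{1}{2hM}\mu(E \cap [x - hM, x + hM]) \le 1 - \frac{\mu F}{2M}\Bigl(1 - \frac{\mu(E \cap (x + hF))}{\mu(hF)}\Bigr).) \]
(Compare 223Ya.)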
(h) Let $f$ be a real-valued function which is integrable over $\mathbb{R}$, and $E$ be the Lebesgue set of $f$. Show that $\lim_{h \downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(t) - c|\,dt = |f(x) - c|$ for every $x \in E$ and $c \in \mathbb{R}$.
(i) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$. Let $x \in \operatorname{dom} f$ be such that $\lim_{n \to \infty} \frac{n}{2}\int_{x-1/n}^{x+1/n} |f(y) - f(x)|\,dy = 0$. Show that $x$ belongs to the Lebesgue set of $f$.
(j) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that whenever $I$ is a non-trivial interval and $x \in I \subseteq [x - \delta, x + \delta]$, then $|f(x) - \frac{1}{\mu I}\int_I f| \le \epsilon$.
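223Y Further exercises (a) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $0 < \mu F < \infty$. Let $x \in \mathbb{R}$ be such that $\lim_{h \downarrow 0} \frac{1}{2h}\mu(E \cap [x-h, x+h]) = 1$. Show that
\[ \lim_{h \downarrow 0} \frac{\mu(E \cap (x + hF))}{\mu(hF)} = 1. \]
(Hint: apply 223Xg to sets of the form $F \cap [-M, M]$.)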
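(b) Let $\mathfrak{T}$ be the family of measurable sets $G \subseteq \mathbb{R}$ such that every point of $G$ is a density point of $G$. (i) Show that $\mathfrak{T}$ is a topology on $\mathbb{R}$. (Hint: take $\mathcal{G} \subseteq \mathfrak{T}$. By 215B(iv) there is a countable $\mathcal{G}_0 \subseteq \mathcal{G}$ such that $\mu(G \setminus \bigcup \mathcal{G}_0) = 0$ for every $G \in \mathcal{G}$. Show that
\[ \bigcup \mathcal{G} \subseteq \{x : \limsup_{h \downarrow 0} \frac{1}{2h}\mu(\bigcup \mathcal{G}_0 \cap [x-h, x+h]) > 0\}, \]
so that $\mu(\bigcup \mathcal{G} \setminus \bigcup \mathcal{G}_0) = 0$.) (ii) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is measurable iff it is $\mathfrak{T}$-continuous at almost every $x \in \mathbb{R}$. ($\mathfrak{T}$ is the density topology on $\mathbb{R}$. See 414P in Volume 4.)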
(c) Show that if $f : [a, b] \to \mathbb{R}$ is bounded and continuous for the density topology on $\mathbb{R}$, then $f(x) = \frac{d}{dx}\int_a^x f$ for every $x \in ]a, b[$.
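(d) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is continuous for the density topology at $x \in \mathbb{R}$ iff $\lim_{h \downarrow 0} \frac{1}{2h}\mu^*\{y : y \in [x-h, x+h],\ |f(y) - f(x)| \ge \epsilon\} = 0$ for every $\epsilon > 0$.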
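(e) A set $A \subseteq \mathbb{R}$ is porous at a point $x \in \mathbb{R}$ if $\limsup_{y \to x} \frac{\rho(y, A)}{|y - x|} > 0$, where $\rho(y, A) = \inf_{a \in A} |y - a|$. (Take $\rho(y, \emptyset) = \infty$.) Show that if $A$ is porous at every $x \in A$ then $A$ is negligible.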
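(f) For a measurable set $E \subseteq \mathbb{R}$ write $\operatorname{int*}E$ for the set of its density points. Show that if $E, F \subseteq \mathbb{R}$ are measurable then (i) $\operatorname{int*}(E \cap F) = \operatorname{int*}E \cap \operatorname{int*}F$ (ii) $\operatorname{int*}E \subseteq \operatorname{int*}F$ iff $\mu(E \setminus F) = 0$ (iii) $\mu(E \triangle \operatorname{int*}E) = 0$ (iv) $\operatorname{int*}(\operatorname{int*}E) = \operatorname{int*}E$ (v) for every compact set $K \subseteq \operatorname{int*}E$ there is a compact set $L \subseteq K \cup E$ such that $K \subseteq \operatorname{int*}L$.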
(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x)\int g - \int f \times g| \le \epsilon \int g$ whenever $g : \mathbb{R} \to [0, \infty[$ is such that $g$ is non-decreasing on $]-\infty, x]$, non-increasing on $[x, \infty[$ and zero outside $[x - \delta, x + \delta]$. (Hint: express $g$ as a limit almost everywhere of functions of the form $\frac{g(x)}{n+1}\sum_{i=0}^n \chi]a_i, b_i[$, where $x - \delta \le a_0 \le \ldots \le a_n \le x \le b_n \le \ldots \le b_0 \le x + \delta$.)
(h) For each integrable real-valued function $f$ defined almost everywhere in $\mathbb{R}$, let $E_f$ be the Lebesgue set of $f$. Show that $E_f \cap E_g \subseteq E_{f+g}$, $E_f \subseteq E_{|f|}$ for all integrable $f$, $g$.
223 Notes and comments The results of this section can be thought of as saying that a measurable function is in some sense almost continuous; indeed, 223Yb is an attempt to make this notion precise. For an integrable function we have stronger results, of which the furthest-reaching seems to be 223D/223Ec.
There are $r$-dimensional versions of all these theorems, using balls centered on $x$ in place of intervals $[x-h, x+h]$; I give these in 261C-261E. A new idea is needed for the $r$-dimensional version of Lebesgue's density theorem (261C), but the rest of the generalization is straightforward. A less natural, and less important, extension, also in §261, involves functions defined on non-measurable sets (compare 223Xb-223Xe).
224 Functions of bounded variation
I turn now to the second of the two problems to which this chapter is devoted: the identification of those real functions which are indefinite integrals. I take the opportunity to offer a brief introduction to the theory of functions of bounded variation, which are interesting in themselves and will be important in Chapter 28. I give the basic characterization of these functions as differences of monotonic functions (224D), with a representative sample of their elementary properties.
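224A Definition Let $f$ be a real-valued function and $D$ a subset of $\mathbb{R}$. I define $\operatorname{Var}_D(f)$, the (total) variation of $f$ on $D$, as follows. If $D \cap \operatorname{dom} f = \emptyset$, $\operatorname{Var}_D(f) = 0$. Otherwise, $\operatorname{Var}_D(f)$ is
\[ \sup\Bigl\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\}, \]
allowing $\operatorname{Var}_D(f) = \infty$. If $\operatorname{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$. If the context seems clear, I may write $\operatorname{Var} f$ for $\operatorname{Var}_{\operatorname{dom} f}(f)$, and say that $f$ is simply of bounded variation if this is finite.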
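When $D$ is a finite set the supremum in 224A can be evaluated directly: by the triangle inequality, inserting further points into a chain $a_0 \le \ldots \le a_n$ never decreases the sum, so the supremum is attained by listing every point of $D$ in increasing order. The following Python sketch relies on that observation; the function name `variation` is hypothetical, and `f` is assumed to be defined at every point supplied. For an infinite $D$ the same computation on a finite sample yields only a lower bound for $\operatorname{Var}_D(f)$.

```python
import math


def variation(f, points):
    """Total variation of f on the finite set of points (0 if the set is empty)."""
    pts = sorted(set(points))
    # Sum |f(a_i) - f(a_{i-1})| over consecutive points of the sorted set.
    return sum(abs(f(b) - f(a)) for a, b in zip(pts, pts[1:]))


# f(x) = x^2 on D = {-1, 0, 2}: |0 - 1| + |4 - 0|.
print(variation(lambda x: x * x, [-1, 0, 2]))

# A lower bound for the variation of sin on [0, 2*pi], from five sample points.
print(variation(math.sin, [k * math.pi / 2 for k in range(5)]))
```

The second call samples $\sin$ at its turning points, so the lower bound it produces agrees (up to rounding) with the true variation 4 of $\sin$ on $[0, 2\pi]$.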
224B Remarks (a) In the present chapter, we shall virtually exclusively be concerned with the case in which D
is a bounded closed interval included in domf. The general formulation will be useful for some technical questions
arising in Chapter 28; but if it makes you more comfortable, you will lose nothing by supposing for the moment that
D is an interval.
(b) Clearly
\[ \operatorname{Var}_D(f) = \operatorname{Var}_{D \cap \operatorname{dom} f}(f) = \operatorname{Var}(f \restriction D) \]
for all $D$, $f$.
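224C Proposition (a) If $f$, $g$ are two real-valued functions and $D \subseteq \mathbb{R}$, then
\[ \operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g). \]
(b) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $c \in \mathbb{R}$ then $\operatorname{Var}_D(cf) = |c| \operatorname{Var}_D(f)$.
(c) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $x \in \mathbb{R}$ then
\[ \operatorname{Var}_D(f) \ge \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f), \]
with equality if $x \in D \cap \operatorname{dom} f$.
(d) If $f$ is a real-valued function and $D \subseteq D' \subseteq \mathbb{R}$ then $\operatorname{Var}_D(f) \le \operatorname{Var}_{D'}(f)$.
(e) If $f$ is a real-valued function and $D \subseteq \mathbb{R}$, then $|f(x) - f(y)| \le \operatorname{Var}_D(f)$ for all $x, y \in D \cap \operatorname{dom} f$; so if $f$ is of bounded variation on $D$ then $f$ is bounded on $D \cap \operatorname{dom} f$ and (if $D \cap \operatorname{dom} f \ne \emptyset$)
\[ \sup_{y \in D \cap \operatorname{dom} f} |f(y)| \le |f(x)| + \operatorname{Var}_D(f) \]
for every $x \in D \cap \operatorname{dom} f$.
(f) If $f$ is a monotonic real-valued function and $D \subseteq \mathbb{R}$ meets $\operatorname{dom} f$, then $\operatorname{Var}_D(f) = \sup_{x \in D \cap \operatorname{dom} f} f(x) - \inf_{x \in D \cap \operatorname{dom} f} f(x)$.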
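proof (a) If $D \cap \operatorname{dom}(f + g) = \emptyset$ this is trivial, because $\operatorname{Var}_D(f)$ and $\operatorname{Var}_D(g)$ are surely non-negative. Otherwise, if $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom}(f + g)$, then
\[ \sum_{i=1}^n |(f+g)(a_i) - (f+g)(a_{i-1})| \le \sum_{i=1}^n |f(a_i) - f(a_{i-1})| + \sum_{i=1}^n |g(a_i) - g(a_{i-1})| \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g); \]
as $a_0, \ldots, a_n$ are arbitrary, $\operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g)$.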
(b) $\sum_{i=1}^n |(cf)(a_i) - (cf)(a_{i-1})| = |c| \sum_{i=1}^n |f(a_i) - f(a_{i-1})|$ whenever $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom} f$.
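(c)(i) If either $D \cap ]-\infty, x] \cap \operatorname{dom} f$ or $D \cap [x, \infty[ \cap \operatorname{dom} f$ is empty, this is trivial. If $a_0 \le \ldots \le a_m$ in $D \cap ]-\infty, x] \cap \operatorname{dom} f$, $b_0 \le \ldots \le b_n$ in $D \cap [x, \infty[ \cap \operatorname{dom} f$, then
\[ \sum_{i=1}^m |f(a_i) - f(a_{i-1})| + \sum_{j=1}^n |f(b_j) - f(b_{j-1})| \le \sum_{i=1}^{m+n+1} |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_D(f), \]
if we write $a_i = b_{i-m-1}$ for $m + 1 \le i \le m + n + 1$. So
\[ \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f) \le \operatorname{Var}_D(f). \]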
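(ii) Now suppose that $x \in D \cap \operatorname{dom} f$. If $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom} f$, and $a_0 \le x \le a_n$, let $k$ be such that $x \in [a_{k-1}, a_k]$; then
\[ \begin{aligned}
\sum_{i=1}^n |f(a_i) - f(a_{i-1})| &\le \sum_{i=1}^{k-1} |f(a_i) - f(a_{i-1})| + |f(x) - f(a_{k-1})| \\
&\qquad + |f(a_k) - f(x)| + \sum_{i=k+1}^n |f(a_i) - f(a_{i-1})| \\
&\le \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f)
\end{aligned} \]
(counting empty sums $\sum_{i=1}^0$, $\sum_{i=n+1}^n$ as 0). If $x \le a_0$ then $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D \cap [x, \infty[}(f)$; if $x \ge a_n$ then $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D \cap ]-\infty, x]}(f)$. Thus
\[ \sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f) \]
in all cases; as $a_0, \ldots, a_n$ are arbitrary,
\[ \operatorname{Var}_D(f) \le \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f). \]
So the two sides are equal.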
(d) is trivial.
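(e) If $x, y \in D \cap \operatorname{dom} f$ and $x \le y$ then
\[ |f(x) - f(y)| = |f(y) - f(x)| \le \operatorname{Var}_D(f) \]
by the definition of $\operatorname{Var}_D$; and the same is true if $y \le x$. So of course $|f(y)| \le |f(x)| + \operatorname{Var}_D(f)$.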
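(f) If $f$ is non-decreasing, then
\[ \begin{aligned}
\operatorname{Var}_D(f) &= \sup\Bigl\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\} \\
&= \sup\Bigl\{\sum_{i=1}^n f(a_i) - f(a_{i-1}) : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\} \\
&= \sup\{f(b) - f(a) : a, b \in D \cap \operatorname{dom} f,\ a \le b\} \\
&= \sup_{b \in D \cap \operatorname{dom} f} f(b) - \inf_{a \in D \cap \operatorname{dom} f} f(a).
\end{aligned} \]
If $f$ is non-increasing then
\[ \begin{aligned}
\operatorname{Var}_D(f) &= \sup\Bigl\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\} \\
&= \sup\Bigl\{\sum_{i=1}^n f(a_{i-1}) - f(a_i) : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\} \\
&= \sup\{f(a) - f(b) : a, b \in D \cap \operatorname{dom} f,\ a \le b\} \\
&= \sup_{a \in D \cap \operatorname{dom} f} f(a) - \inf_{b \in D \cap \operatorname{dom} f} f(b).
\end{aligned} \]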
224D Theorem For any real-valued function $f$ and any set $D \subseteq \mathbb{R}$, the following are equiveridical:
(i) there are two bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \operatorname{dom} f$;
(ii) $f$ is of bounded variation on $D$;
(iii) there are bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \operatorname{dom} f$ and $\operatorname{Var}_D(f) = \operatorname{Var} f_1 + \operatorname{Var} f_2$.
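proof (i)$\Rightarrow$(ii) If $f : \mathbb{R} \to \mathbb{R}$ is bounded and non-decreasing, then $\operatorname{Var} f = \sup_{x \in \mathbb{R}} f(x) - \inf_{x \in \mathbb{R}} f(x)$ is finite. So if $f$ agrees on $D \cap \operatorname{dom} f$ with $f_1 - f_2$ where $f_1$ and $f_2$ are bounded and non-decreasing, then
\[ \operatorname{Var}_D(f) = \operatorname{Var}_{D \cap \operatorname{dom} f}(f) \le \operatorname{Var}_{D \cap \operatorname{dom} f}(f_1) + \operatorname{Var}_{D \cap \operatorname{dom} f}(f_2) \le \operatorname{Var} f_1 + \operatorname{Var} f_2 < \infty, \]
using (a), (b) and (d) of 224C.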
(ii)$\Rightarrow$(iii) Suppose that $f$ is of bounded variation on $D$. Set $D' = D \cap \operatorname{dom} f$. If $D' = \emptyset$ we can take both $f_j$ to be the zero function, so henceforth suppose that $D' \ne \emptyset$. Write
\[ g(x) = \operatorname{Var}_{D \cap ]-\infty, x]}(f) \]
for $x \in D'$. Then $g_1 = g + f$ and $g_2 = g - f$ are both non-decreasing. P If $a, b \in D'$ and $a \le b$, then
\[ g(b) = g(a) + \operatorname{Var}_{D \cap [a, b]}(f) \ge g(a) + |f(b) - f(a)|. \]
So
\[ g_1(b) - g_1(a) = g(b) - g(a) + f(b) - f(a), \quad g_2(b) - g_2(a) = g(b) - g(a) - f(b) + f(a) \]
are both non-negative. Q
Now there are non-decreasing functions $h_1, h_2 : \mathbb{R} \to \mathbb{R}$, extending $g_1$, $g_2$ respectively, such that $\operatorname{Var} h_j = \operatorname{Var} g_j$ for both $j$. P $f$ is bounded on $D'$, by 224Ce, and $g$ is bounded just because $\operatorname{Var}_D(f) < \infty$, so that $g_j$ is bounded. Set $c_j = \inf_{x \in D'} g_j(x)$ and
\[ h_j(x) = \sup(\{c_j\} \cup \{g_j(y) : y \in D',\ y \le x\}) \]
for every $x \in \mathbb{R}$; this works. Q Observe that for $x \in D'$,
\[ h_1(x) + h_2(x) = g_1(x) + g_2(x) = g(x) + f(x) + g(x) - f(x) = 2g(x), \]
\[ h_1(x) - h_2(x) = 2f(x). \]
Now, because $g_1$ and $g_2$ are non-decreasing,
\[ \sup_{x \in D'} g_1(x) + \sup_{x \in D'} g_2(x) = \sup_{x \in D'} (g_1(x) + g_2(x)) = 2 \sup_{x \in D'} g(x), \]
\[ \inf_{x \in D'} g_1(x) + \inf_{x \in D'} g_2(x) = \inf_{x \in D'} (g_1(x) + g_2(x)) = 2 \inf_{x \in D'} g(x) \ge 0. \]
But this means that
\[ \operatorname{Var} h_1 + \operatorname{Var} h_2 = \operatorname{Var} g_1 + \operatorname{Var} g_2 = 2 \operatorname{Var} g \le 2 \operatorname{Var}_D(f), \]
using 224Cf three times. So if we set $f_j(x) = \frac{1}{2}h_j(x)$ for $j \in \{1, 2\}$ and $x \in \mathbb{R}$, we shall have non-decreasing functions such that
\[ f_1(x) - f_2(x) = f(x) \text{ for } x \in D', \quad \operatorname{Var} f_1 + \operatorname{Var} f_2 = \tfrac{1}{2}\operatorname{Var} h_1 + \tfrac{1}{2}\operatorname{Var} h_2 \le \operatorname{Var}_D(f). \]
Since we surely also have
\[ \operatorname{Var}_D(f) \le \operatorname{Var}_D(f_1) + \operatorname{Var}_D(f_2) \le \operatorname{Var} f_1 + \operatorname{Var} f_2, \]
we see that $\operatorname{Var}_D(f) = \operatorname{Var} f_1 + \operatorname{Var} f_2$, and (iii) is true.
(iii)$\Rightarrow$(i) is trivial.
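The construction in the proof of 224D can be carried out explicitly when $D$ is finite and included in $\operatorname{dom} f$. The Python sketch below accumulates $g(x) = \operatorname{Var}_{D \cap ]-\infty, x]}(f)$ from left to right and sets $f_1 = \frac{1}{2}(g + f)$, $f_2 = \frac{1}{2}(g - f)$ on $D$; the extension of $f_1$, $f_2$ to the whole of $\mathbb{R}$ is omitted. The names `jordan_decomposition` and `is_non_decreasing` and the sample function are assumptions made for this illustration only.

```python
def jordan_decomposition(f, points):
    """Split f on a finite set D as f = f1 - f2 with f1, f2 non-decreasing on D."""
    pts = sorted(set(points))
    f1, f2 = {}, {}
    g = 0.0       # g(x) = variation of f on D ∩ ]-inf, x], accumulated left to right
    prev = None
    for x in pts:
        if prev is not None:
            g += abs(f(x) - f(prev))
        f1[x] = 0.5 * (g + f(x))   # (g + f)/2
        f2[x] = 0.5 * (g - f(x))   # (g - f)/2
        prev = x
    return pts, f1, f2


def is_non_decreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical sample: f(x) = x^2 - 2x on D = {-2, ..., 3}.
f = lambda x: x * x - 2 * x
pts, f1, f2 = jordan_decomposition(f, [-2, -1, 0, 1, 2, 3])
v1 = [f1[x] for x in pts]
v2 = [f2[x] for x in pts]
print(is_non_decreasing(v1), is_non_decreasing(v2))      # both parts are monotonic
print(all(f1[x] - f2[x] == f(x) for x in pts))           # f = f1 - f2 on D
print((v1[-1] - v1[0]) + (v2[-1] - v2[0]))               # Var f1 + Var f2 on D
```

By 224Cf the last line is the sum of the variations of the two monotonic parts on $D$, and it coincides with $\operatorname{Var}_D(f)$, as in 224D(iii).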
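224E Corollary Let $f$ be a real-valued function and $D$ any subset of $\mathbb{R}$. If $f$ is of bounded variation on $D$, then
\[ \lim_{x \downarrow a} \operatorname{Var}_{D \cap ]a, x]}(f) = \lim_{x \uparrow a} \operatorname{Var}_{D \cap [x, a[}(f) = 0 \]
for every $a \in \mathbb{R}$, and
\[ \lim_{a \to -\infty} \operatorname{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to \infty} \operatorname{Var}_{D \cap [a, \infty[}(f) = 0. \]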
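proof (a) Consider first the case in which $D = \operatorname{dom} f = \mathbb{R}$ and $f$ is a bounded non-decreasing function. Then
\[ \operatorname{Var}_{D \cap ]a, x]}(f) = \sup_{y \in ]a, x]} f(x) - f(y) = f(x) - \inf_{y > a} f(y) = f(x) - \lim_{y \downarrow a} f(y), \]
so of course
\[ \lim_{x \downarrow a} \operatorname{Var}_{D \cap ]a, x]}(f) = \lim_{x \downarrow a} f(x) - \lim_{y \downarrow a} f(y) = 0. \]
In the same way
\[ \lim_{x \uparrow a} \operatorname{Var}_{D \cap [x, a[}(f) = \lim_{y \uparrow a} f(y) - \lim_{x \uparrow a} f(x) = 0, \]
\[ \lim_{a \to -\infty} \operatorname{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to -\infty} f(a) - \lim_{y \to -\infty} f(y) = 0, \]
\[ \lim_{a \to \infty} \operatorname{Var}_{D \cap [a, \infty[}(f) = \lim_{y \to \infty} f(y) - \lim_{a \to \infty} f(a) = 0. \]
(b) For the general case, define $f_1$, $f_2$ from $f$ and $D$ as in 224D. Then for every interval $I$ we have
\[ \operatorname{Var}_{D \cap I}(f) \le \operatorname{Var}_I(f_1) + \operatorname{Var}_I(f_2), \]
so the results for $f$ follow from those for $f_1$ and $f_2$ as established in part (a) of the proof.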
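224F Corollary Let $f$ be a real-valued function of bounded variation on $[a, b]$, where $a < b$. If $\operatorname{dom} f$ meets every interval $]a, a + \delta]$ with $\delta > 0$, then
\[ \lim_{t \in \operatorname{dom} f,\, t \downarrow a} f(t) \]
is defined in $\mathbb{R}$. If $\operatorname{dom} f$ meets $[b - \delta, b[$ for every $\delta > 0$, then
\[ \lim_{t \in \operatorname{dom} f,\, t \uparrow b} f(t) \]
is defined in $\mathbb{R}$.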
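proof Let $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ be non-decreasing functions such that $f = f_1 - f_2$ on $[a, b] \cap \operatorname{dom} f$. Then
\[ \lim_{t \in \operatorname{dom} f,\, t \downarrow a} f(t) = \lim_{t \downarrow a} f_1(t) - \lim_{t \downarrow a} f_2(t) = \inf_{t > a} f_1(t) - \inf_{t > a} f_2(t), \]
\[ \lim_{t \in \operatorname{dom} f,\, t \uparrow b} f(t) = \lim_{t \uparrow b} f_1(t) - \lim_{t \uparrow b} f_2(t) = \sup_{t < b} f_1(t) - \sup_{t < b} f_2(t). \]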
224G Corollary Let $f$, $g$ be real functions and $D$ a subset of $\mathbb{R}$. If $f$ and $g$ are of bounded variation on $D$, so is $f \times g$.
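proof (a) The point is that there are non-negative bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \operatorname{dom} f$. P We know that there are bounded non-decreasing $h_1$, $h_2$ such that $f = h_1 - h_2$ on $D \cap \operatorname{dom} f$. Set
\[ \beta_i = \inf_{x \in \mathbb{R}} h_i(x) \text{ for } i = 1, 2, \]
\[ \alpha_1 = \max(\beta_1 - \beta_2, 0), \quad \alpha_2 = \max(\beta_2 - \beta_1, 0), \]
\[ f_1 = h_1 - \beta_1 + \alpha_1, \quad f_2 = h_2 - \beta_2 + \alpha_2; \]
this works, because $\alpha_1 - \alpha_2 = \beta_1 - \beta_2$. Q
(b) Now taking similar functions $g_1$, $g_2$ such that $g = g_1 - g_2$ on $D \cap \operatorname{dom} g$, we have
\[ f \times g = f_1 \times g_1 - f_2 \times g_1 - f_1 \times g_2 + f_2 \times g_2 \]
everywhere in $D \cap \operatorname{dom}(f \times g) = D \cap \operatorname{dom} f \cap \operatorname{dom} g$; but all the $f_i \times g_j$ are bounded non-decreasing functions, so of bounded variation, and $f \times g$ must be of bounded variation on $D$.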
224H Proposition Let $f : D \to \mathbb{R}$ be a function of bounded variation, where $D \subseteq \mathbb{R}$. Then $f$ is continuous at all except countably many points of $D$.
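proof For $n \ge 1$ set
\[ A_n = \{x : x \in D, \text{ for every } \delta > 0 \text{ there is a } y \in D \cap [x - \delta, x + \delta] \text{ such that } |f(y) - f(x)| \ge \tfrac{1}{n}\}. \]
Then $\#(A_n) \le n \operatorname{Var} f$. P? Otherwise, we can find distinct $x_0, \ldots, x_k \in A_n$ with $k + 1 > n \operatorname{Var} f$. Order these so that $x_0 < x_1 < \ldots < x_k$. Set $\delta = \frac{1}{2}\min_{1 \le i \le k} (x_i - x_{i-1}) > 0$. For each $i$, there is a $y_i \in D \cap [x_i - \delta, x_i + \delta]$ such that $|f(y_i) - f(x_i)| \ge \frac{1}{n}$. Take $x'_i$, $y'_i$ to be $x_i$, $y_i$ in order, so that $x'_i < y'_i$. Now
\[ x'_0 \le y'_0 \le x'_1 \le y'_1 \le \ldots \le x'_k \le y'_k, \]
and
\[ \operatorname{Var} f \ge \sum_{i=0}^k |f(y'_i) - f(x'_i)| = \sum_{i=0}^k |f(y_i) - f(x_i)| \ge \frac{1}{n}(k + 1) > \operatorname{Var} f, \]
which is impossible. XQ
It follows that $A = \bigcup_{n \in \mathbb{N}} A_n$ is countable, being a countable union of finite sets. But $A$ is exactly the set of points of $D$ at which $f$ is not continuous.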
224I Theorem Let $I \subseteq \mathbb{R}$ be an interval, and $f : I \to \mathbb{R}$ a function of bounded variation. Then $f$ is differentiable almost everywhere in $I$, and $f'$ is integrable over $I$, with
\[ \int_I |f'| \le \operatorname{Var}_I(f). \]
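proof (a) Let $f_1$ and $f_2$ be non-decreasing functions such that $f = f_1 - f_2$ everywhere in $I$ (224D). Then $f_1$ and $f_2$ are differentiable almost everywhere (222A). At any point of $I$ except possibly its endpoints, if any, $f$ will be differentiable if $f_1$ and $f_2$ are, so $f'(x)$ is defined for almost every $x \in I$.
(b) Set $F(x) = \operatorname{Var}_{I \cap ]-\infty, x]} f$ for $x \in \mathbb{R}$. If $x, y \in I$ and $x \le y$, then
\[ F(y) - F(x) = \operatorname{Var}_{[x, y]} f \ge |f(y) - f(x)|, \]
by 224Cc; so $F'(x) \ge |f'(x)|$ whenever $x$ is an interior point of $I$ and both derivatives exist, which is almost everywhere. So $\int_I |f'| \le \int_I F'$. But if $a, b \in I$ and $a \le b$,
\[ \int_a^b F' \le F(b) - F(a) \le F(b) \le \operatorname{Var} f. \]
Now $I$ is expressible as $\bigcup_{n \in \mathbb{N}} [a_n, b_n]$ where $a_{n+1} \le a_n \le b_n \le b_{n+1}$ for every $n$. So
\[ \begin{aligned}
\int_I |f'| \le \int_I F' &= \int F' \times \chi I \\
&= \int \sup_{n \in \mathbb{N}} F' \times \chi[a_n, b_n] = \sup_{n \in \mathbb{N}} \int F' \times \chi[a_n, b_n]
\end{aligned} \]
(by B.Levi's theorem)
\[ = \sup_{n \in \mathbb{N}} \int_{a_n}^{b_n} F' \le \operatorname{Var}_I(f). \]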
224J The next result is not needed in this chapter, but is one of the most useful properties of functions of bounded
variation, and will be used repeatedly in Chapter 28.
Proposition Let $f$, $g$ be real-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere in $]a, b[$. Then $f \times g$ is integrable over $[a, b]$, and
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \uparrow b} |f(x)| + \operatorname{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_a^c g\Bigr|. \]
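proof (a) By 224F, $l = \lim_{x \in \operatorname{dom} f,\, x \uparrow b} f(x)$ is defined. Write $M = |l| + \operatorname{Var}_{]a, b[}(f)$. Note that if $y$ is any point of $\operatorname{dom} f \cap ]a, b[$,
\[ |f(y)| \le |f(x)| + |f(x) - f(y)| \le |f(x)| + \operatorname{Var}_{]a, b[}(f) \to M \]
as $x \uparrow b$ in $\operatorname{dom} f$, so $|f(y)| \le M$. Moreover, $f$ is measurable on $]a, b[$, because there are bounded monotonic functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ everywhere in $]a, b[ \cap \operatorname{dom} f$. So $f \times g$ is measurable and dominated by $M|g|$, and is integrable over $]a, b[$ or $[a, b]$.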
(b) For $n \in \mathbb{N}$, $k \le 2^n$ set $a_{nk} = a + 2^{-n}k(b - a)$, and for $1 \le k \le 2^n$ choose $x_{nk} \in \operatorname{dom} f \cap ]a_{n,k-1}, a_{nk}]$. Define $f_n : ]a, b] \to \mathbb{R}$ by setting $f_n(x) = f(x_{nk})$ if $1 \le k \le 2^n$ and $x \in ]a_{n,k-1}, a_{nk}]$. Then $f(x) = \lim_{n \to \infty} f_n(x)$ whenever $x \in ]a, b[ \cap \operatorname{dom} f$ and $f$ is continuous at $x$, which must be almost everywhere (224H). Note next that all the $f_n$ are measurable, and that they are uniformly bounded, in modulus, by $M$. So $\{f_n \times g : n \in \mathbb{N}\}$ is dominated by the integrable function $M|g|$, and Lebesgue's Dominated Convergence Theorem tells us that
\[ \int_a^b f \times g = \lim_{n \to \infty} \int_a^b f_n \times g. \]
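(c) Fix $n \in \mathbb{N}$ for the moment. Set $K = \sup_{c \in [a, b]} |\int_a^c g|$. (Note that $K$ is finite because $c \mapsto \int_a^c g$ is continuous.) Then
\[ \begin{aligned}
\Bigl|\int_a^b f_n \times g\Bigr| &= \Bigl|\sum_{k=1}^{2^n} \int_{a_{n,k-1}}^{a_{nk}} f_n \times g\Bigr| = \Bigl|\sum_{k=1}^{2^n} f(x_{nk})\Bigl(\int_a^{a_{nk}} g - \int_a^{a_{n,k-1}} g\Bigr)\Bigr| \\
&= \Bigl|\sum_{k=1}^{2^n - 1} (f(x_{nk}) - f(x_{n,k+1})) \int_a^{a_{nk}} g + f(x_{n,2^n}) \int_a^b g\Bigr| \\
&\le |f(x_{n,2^n})| \Bigl|\int_a^b g\Bigr| + \sum_{k=1}^{2^n - 1} |f(x_{n,k+1}) - f(x_{nk})| \Bigl|\int_a^{a_{nk}} g\Bigr| \\
&\le (|f(x_{n,2^n})| + \operatorname{Var}_{]a, b[}(f))K \to MK
\end{aligned} \]
as $n \to \infty$.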
(d) Now
\[ \Bigl|\int_a^b f \times g\Bigr| = \lim_{n \to \infty} \Bigl|\int_a^b f_n \times g\Bigr| \le MK, \]
as required.
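The rearrangement in part (c) of the proof is summation by parts, and its discrete analogue is easy to check numerically. In the Python sketch below the integrals $\int_a^{a_{nk}} g$ are replaced by partial sums $G_k = g_0 + \ldots + g_k$ of a finite sequence; the function name `abel_sides` and the sample data are assumptions made for the illustration, and both sequences are assumed non-empty and of equal length.

```python
from itertools import accumulate


def abel_sides(f, g):
    """Return (sum f_k g_k, the same sum rearranged by parts, the 224J-style bound)."""
    G = list(accumulate(g))                       # partial sums G_k = g_0 + ... + g_k
    direct = sum(fk * gk for fk, gk in zip(f, g))
    # sum_{k < N-1} (f_k - f_{k+1}) G_k  +  f_{N-1} G_{N-1}
    by_parts = sum((f[k] - f[k + 1]) * G[k] for k in range(len(f) - 1)) + f[-1] * G[-1]
    # (|last value of f| + variation of f) * sup |partial sums of g|
    var_f = sum(abs(f[k + 1] - f[k]) for k in range(len(f) - 1))
    bound = (abs(f[-1]) + var_f) * max(abs(s) for s in G)
    return direct, by_parts, bound


direct, by_parts, bound = abel_sides([5, 4, 3, 2, 1], [1, -1, 1, -1, 1])
print(direct, by_parts, bound)
```

The first two values agree, and their modulus does not exceed the third, which mirrors the estimate $(|f(x_{n,2^n})| + \operatorname{Var}_{]a,b[}(f))K$ obtained above.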
224K Complex-valued functions So far I have taken all functions to be real-valued. This is adequate for the
needs of the present chapter, but in Chapter 28 we shall need to look at complex-valued functions of bounded variation,
and I should perhaps spell out the (elementary) adaptations involved in the extension to the complex case.
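(a) Let $D$ be a subset of $\mathbb{R}$ and $f$ a complex-valued function. The variation of $f$ on $D$, $\operatorname{Var}_D(f)$, is zero if $D \cap \operatorname{dom} f = \emptyset$, and otherwise is
\[ \sup\Bigl\{\sum_{j=1}^n |f(a_j) - f(a_{j-1})| : a_0 \le a_1 \le \ldots \le a_n \text{ in } D \cap \operatorname{dom} f\Bigr\}, \]
allowing $\infty$. If $\operatorname{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$.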
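(b) Just as in the real case, a complex-valued function of bounded variation must be bounded, and
\[ \operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g), \]
\[ \operatorname{Var}_D(cf) = |c| \operatorname{Var}_D(f), \]
\[ \operatorname{Var}_D(f) \ge \operatorname{Var}_{D \cap ]-\infty, x]}(f) + \operatorname{Var}_{D \cap [x, \infty[}(f) \]
for every $x \in \mathbb{R}$, with equality if $x \in D \cap \operatorname{dom} f$,
\[ \operatorname{Var}_D(f) \le \operatorname{Var}_{D'}(f) \text{ whenever } D \subseteq D'; \]
the arguments of 224C go through unchanged.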
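(c) A complex-valued function is of bounded variation iff its real and imaginary parts are both of bounded variation (because
\[ \max(\operatorname{Var}_D(\operatorname{Re} f), \operatorname{Var}_D(\operatorname{Im} f)) \le \operatorname{Var}_D(f) \le \operatorname{Var}_D(\operatorname{Re} f) + \operatorname{Var}_D(\operatorname{Im} f).) \]
So a complex-valued function $f$ is of bounded variation on $D$ iff there are bounded non-decreasing functions $f_1, \ldots, f_4 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2 + if_3 - if_4$ on $D$ (224D).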
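(d) Let $f$ be a complex-valued function and $D$ any subset of $\mathbb{R}$. If $f$ is of bounded variation on $D$, then
\[ \lim_{x \downarrow a} \operatorname{Var}_{D \cap ]a, x]}(f) = \lim_{x \uparrow a} \operatorname{Var}_{D \cap [x, a[}(f) = 0 \]
for every $a \in \mathbb{R}$, and
\[ \lim_{a \to -\infty} \operatorname{Var}_{D \cap ]-\infty, a]}(f) = \lim_{a \to \infty} \operatorname{Var}_{D \cap [a, \infty[}(f) = 0. \]
(Apply 224E to the real and imaginary parts of $f$.)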
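(e) Let $f$ be a complex-valued function of bounded variation on $[a, b]$, where $a < b$. If $\operatorname{dom} f$ meets every interval $]a, a + \delta]$ with $\delta > 0$, then $\lim_{t \in \operatorname{dom} f,\, t \downarrow a} f(t)$ is defined in $\mathbb{C}$. If $\operatorname{dom} f$ meets $[b - \delta, b[$ for every $\delta > 0$, then $\lim_{t \in \operatorname{dom} f,\, t \uparrow b} f(t)$ is defined in $\mathbb{C}$. (Apply 224F to the real and imaginary parts of $f$.)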
(f) Let $f$, $g$ be complex functions and $D$ a subset of $\mathbb{R}$. If $f$ and $g$ are of bounded variation on $D$, so is $f \times g$. (For $f \times g$ is expressible as a linear combination of the four products $\operatorname{Re} f \times \operatorname{Re} g, \ldots, \operatorname{Im} f \times \operatorname{Im} g$, to each of which we can apply 224G.)
(g) Let $I \subseteq \mathbb{R}$ be an interval, and $f : I \to \mathbb{C}$ a function of bounded variation. Then $f$ is differentiable almost everywhere in $I$, and $\int_I |f'| \le \operatorname{Var}_I(f)$. (As 224I.)
(h) Let $f$ and $g$ be complex-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere in $]a, b[$. Then $f \times g$ is integrable over $[a, b]$, and
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \uparrow b} |f(x)| + \operatorname{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_a^c g\Bigr|. \]
(The argument of 224J applies virtually unchanged.)
224X Basic exercises >(a) Set $f(x) = x^2 \sin\frac{1}{x^2}$ for $x \ne 0$, $f(0) = 0$. Show that $f : \mathbb{R} \to \mathbb{R}$ is differentiable everywhere and uniformly continuous, but is not of bounded variation on any non-trivial interval containing 0.
(b) Give an example of a non-negative function $g : [0, 1] \to [0, 1]$, of bounded variation, such that $\sqrt{g}$ is not of bounded variation.
(c) Show that if $f$ is any real-valued function defined on a subset of $\mathbb{R}$, there is a function $\tilde{f} : \mathbb{R} \to \mathbb{R}$, extending $f$, such that $\operatorname{Var} \tilde{f} = \operatorname{Var} f$. Under what circumstances is $\tilde{f}$ unique?
(d) Let $f : D \to \mathbb{R}$ be a function of bounded variation, where $D \subseteq \mathbb{R}$ is a non-empty set. Show that if $\inf_{x \in D} |f(x)| > 0$ then $1/f$ is of bounded variation.
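(e) Let $f : [a, b] \to \mathbb{R}$ be a continuous function, where $a \le b$ in $\mathbb{R}$. Show that if $c < \operatorname{Var} f$ then there is a $\delta > 0$ such that $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \ge c$ whenever $a = a_0 \le a_1 \le \ldots \le a_n = b$ and $\max_{1 \le i \le n} a_i - a_{i-1} \le \delta$.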
(f) Let $\langle f_n \rangle_{n \in \mathbb{N}}$ be a sequence of real functions, and set $f(x) = \lim_{n \to \infty} f_n(x)$ whenever the limit is defined. Show that $\operatorname{Var} f \le \liminf_{n \to \infty} \operatorname{Var} f_n$.
(g) Let $f$ be a real-valued function which is integrable over an interval $[a, b] \subseteq \mathbb{R}$. Set $F(x) = \int_a^x f$ for $x \in [a, b]$. Show that $\operatorname{Var} F = \int_a^b |f|$. (Hint: start by checking that $\operatorname{Var} F \le \int |f|$; for the reverse inequality, consider the case $f \ge 0$ first.)
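(h) Show that if $f$ is a real-valued function defined on a non-empty set $D \subseteq \mathbb{R}$, then
\[ \operatorname{Var} f = \sup\Bigl\{\Bigl|\sum_{i=1}^n (-1)^i (f(a_i) - f(a_{i-1}))\Bigr| : a_0 \le a_1 \le \ldots \le a_n \text{ in } D\Bigr\}. \]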
(i) Let $f$ be a real-valued function which is integrable over a bounded interval $[a, b] \subseteq \mathbb{R}$. Show that
\[ \int_a^b |f| = \sup\Bigl\{\Bigl|\sum_{i=1}^n (-1)^i \int_{a_{i-1}}^{a_i} f\Bigr| : a = a_0 \le a_1 \le a_2 \le \ldots \le a_n = b\Bigr\}. \]
(Hint: put 224Xg and 224Xh together.)
(j) Let $f$ and $g$ be real-valued functions defined on subsets of $\mathbb{R}$, and suppose that $g$ is integrable over an interval $[a, b]$, where $a < b$, and $f$ is of bounded variation on $]a, b[$ and defined almost everywhere in $]a, b[$. Show that
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \downarrow a} |f(x)| + \operatorname{Var}_{]a, b[}(f)\Bigr) \sup_{c \in [a, b]} \Bigl|\int_c^b g\Bigr|. \]
224Y Further exercises (a) Show that if $f$ is any complex-valued function defined on a subset of $\mathbb{R}$, there is a function $\tilde{f} : \mathbb{R} \to \mathbb{C}$, extending $f$, such that $\operatorname{Var} \tilde{f} = \operatorname{Var} f$. Under what circumstances is $\tilde{f}$ unique?
(b) Let $D$ be any non-empty subset of $\mathbb{R}$, and let $V$ be the space of functions $f : D \to \mathbb{R}$ of bounded variation. For $f \in V$ set
\[ \|f\| = \sup\Bigl\{|f(t_0)| + \sum_{i=1}^n |f(t_i) - f(t_{i-1})| : t_0 \le \ldots \le t_n \in D\Bigr\}. \]
Show that (i) $\| \;\|$ is a norm on $V$ (ii) $V$ is complete under $\| \;\|$ (iii) $\|f \times g\| \le \|f\|\|g\|$ for all $f, g \in V$, so that $V$ is a Banach algebra.
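(c) Let $f : \mathbb{R} \to \mathbb{R}$ be a function of bounded variation. Show that there is a sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ of differentiable functions such that $\lim_{n \to \infty} f_n(x) = f(x)$ for every $x \in \mathbb{R}$, $\lim_{n \to \infty} \int |f_n - f| = 0$, and $\operatorname{Var}(f_n) \le \operatorname{Var}(f)$ for every $n \in \mathbb{N}$. (Hint: start with non-decreasing $f$.)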
(d) For any partially ordered set $X$ and any function $f : X \to \mathbb{R}$, say that $\operatorname{Var}_X(f) = 0$ if $X = \emptyset$ and otherwise
\[ \operatorname{Var}_X(f) = \sup\Bigl\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in X,\ a_0 \le a_1 \le \ldots \le a_n\Bigr\}. \]
State and prove results in this framework generalizing 224D and 224Yb. (Hints: $f$ will be non-decreasing if $f(x) \le f(y)$ whenever $x \le y$; interpret $]-\infty, x]$ as $\{y : y \le x\}$.)
(e) Let $(X, \rho)$ be a metric space and $f : [a, b] \to X$ a function, where $a \le b$ in $\mathbb{R}$. Set $\operatorname{Var}_{[a, b]}(f) = \sup\{\sum_{i=1}^n \rho(f(a_i), f(a_{i-1})) : a \le a_0 \le \ldots \le a_n \le b\}$. (i) Show that $\operatorname{Var}_{[a, b]}(f) = \operatorname{Var}_{[a, c]}(f) + \operatorname{Var}_{[c, b]}(f)$ for every $c \in [a, b]$. (ii) Show that if $\operatorname{Var}_{[a, b]}(f)$ is finite then $f$ is continuous at all but countably many points of $[a, b]$. (iii) Show that if $X$ is complete and $\operatorname{Var}_{[a, b]}(f) < \infty$ then $\lim_{t \uparrow x} f(t)$ is defined for every $x \in ]a, b]$. (iv) Show that if $X$ is complete then $\operatorname{Var}_{[a, b]}(f)$ is finite iff $f$ is expressible as a composition $gh$, where $h : [a, b] \to \mathbb{R}$ is non-decreasing and $g : \mathbb{R} \to X$ is 1-Lipschitz, that is, $\rho(g(c), g(d)) \le |c - d|$ for all $c, d \in \mathbb{R}$.
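(f) Let $U$ be a normed space and $a \le b$ in $\mathbb{R}$. For functions $f : [a, b] \to U$ define $\operatorname{Var}_{[a, b]}(f)$ as in 224Ye, using the standard metric $\rho(x, y) = \|x - y\|$ for $x, y \in U$. (i) Show that $\operatorname{Var}_{[a, b]}(f + g) \le \operatorname{Var}_{[a, b]}(f) + \operatorname{Var}_{[a, b]}(g)$, $\operatorname{Var}_{[a, b]}(cf) = |c| \operatorname{Var}_{[a, b]}(f)$ for all $f, g : [a, b] \to U$ and all $c \in \mathbb{R}$. (ii) Show that if $V$ is another normed space and $T : U \to V$ is a bounded linear operator then $\operatorname{Var}_{[a, b]}(Tf) \le \|T\| \operatorname{Var}_{[a, b]}(f)$ for every $f : [a, b] \to U$.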
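(g) Let $f : [0, 1] \to \mathbb{R}$ be a continuous function. For $y \in \mathbb{R}$ set $h(y) = \#(f^{-1}[\{y\}])$ if this is finite, $\infty$ otherwise. Show that (if we allow $\infty$ as a value of the integral) $\operatorname{Var}_{[0, 1]}(f) = \int h$. (Hint: for $n \in \mathbb{N}$, $i < 2^n$ set $c_{ni} = \sup\{f(x) - f(y) : x, y \in [2^{-n}i, 2^{-n}(i + 1)]\}$, $h_{ni}(y) = 1$ if $y \in f[\,[2^{-n}i, 2^{-n}(i + 1)[\,]$, 0 otherwise. Show that $c_{ni} = \int h_{ni}$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} c_{ni} = \operatorname{Var} f$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} h_{ni} = h$.) (See also 226Yb.)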
(h) Let $\nu$ be any Lebesgue-Stieltjes measure on $\mathbb{R}$, $I \subseteq \mathbb{R}$ an interval (which may be either open or closed, bounded or unbounded), and $D \subseteq I$ a non-empty set. Let $V$ be the space of functions of bounded variation from $D$ to $\mathbb{R}$, and $\| \;\|$ the norm of 224Yb on $V$. Let $g : D \to \mathbb{R}$ be a function such that $\int_{[a, b] \cap D} g\,d\nu$ exists whenever $a \le b$ in $I$, and $K = \sup_{a, b \in I,\, a \le b} |\int_{[a, b] \cap D} g\,d\nu|$. Show that $|\int_D f \times g\,d\nu| \le K\|f\|$ for every $f \in V$.
(i) Explain how to apply 224Yh with $D = \mathbb{N}$ to obtain Abel's theorem that the product of a monotonic sequence converging to 0 with a series which has bounded partial sums is summable.
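(j) Suppose that $I \subseteq \mathbb{R}$ is an interval, and that $\langle A_n \rangle_{n \in \mathbb{N}}$ is a sequence of sets covering $I$. Let $f : I \to \mathbb{R}$ be continuous. Show that $\operatorname{Var} f \le \sum_{n=0}^{\infty} \operatorname{Var}_{A_n} f$. (Hint: reduce to the case of closed sets $A_n$; use Baire's theorem (4A2Ma).)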
224 Notes and comments I have taken the ideas above rather farther than we need immediately; for the present chapter, it is enough to consider the case in which $D = \operatorname{dom} f = [a, b]$ for some interval $[a, b] \subseteq \mathbb{R}$. The extension to functions with irregular domains will be useful in Chapter 28, and the extension to irregular sets $D$, while not important to us here, is of some interest; for instance, taking $D = \mathbb{N}$, we obtain the notion of sequence of bounded variation, which is surely relevant to problems of convergence and summability.
The central result of the section is of course the fact that a function of bounded variation can be expressed as the
dierence of monotonic functions (224D); indeed, one of the objects of the concept is to characterize the linear span of
the monotonic functions. Nearly everything else here can be derived as easy consequences of this, as in 224E-224G. In
224I and 224Xg we go a little deeper, and indeed some measure theory appears; this is where the ideas here begin to
connect with the real business of this chapter, to be continued in the next section. Another result which is easy enough
in itself, but contains the germs of important ideas, is 224Yg.
In 224Yb I mention a natural development in functional analysis, and in 224Yd-224Yf I suggest further wide-ranging
generalizations.
225 Absolutely continuous functions
We are now ready for a full characterization of the functions that can appear as indefinite integrals (225E, 225Xh). The essential idea is that of 'absolute continuity' (225B). In the second half of the section (225G-225N) I describe some of the relationships between this concept and those we have already seen.
225A Absolute continuity of the indefinite integral I begin with an easy fundamental result from general measure theory.
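Theorem Let $(X, \Sigma, \mu)$ be any measure space and $f$ an integrable real-valued function defined on a conegligible subset of $X$. Then for any $\epsilon > 0$ there are a measurable set $E$ of finite measure and a real number $\delta > 0$ such that $\int_F |f| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$.
proof There is a non-decreasing sequence $\langle g_n \rangle_{n\in\mathbb{N}}$ of non-negative simple functions such that $|f| =_{\text{a.e.}} \lim_{n\to\infty} g_n$ and $\int |f| = \lim_{n\to\infty} \int g_n$. Take $n \in \mathbb{N}$ such that $\int g_n \ge \int |f| - \frac{1}{2}\epsilon$. Let $M > 0$, $E \in \Sigma$ be such that $\mu E < \infty$ and $g_n \le M\chi E$; set $\delta = \epsilon/2M$. If $F \in \Sigma$ and $\mu(F \cap E) \le \delta$, then
$$\int_F g_n = \int g_n \times \chi F \le M\mu(F \cap E) \le \tfrac{1}{2}\epsilon;$$
consequently
$$\int_F |f| = \int_F g_n + \int_F (|f| - g_n) \le \tfrac{1}{2}\epsilon + \int (|f| - g_n) \le \epsilon.$$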
225B Absolutely continuous functions on $\mathbb{R}$: Definition If $[a,b]$ is a non-empty closed interval in $\mathbb{R}$ and $f : [a,b] \to \mathbb{R}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$.
Remark The phrase 'absolutely continuous' is used in various senses in measure theory, closely related (if you look at them in the right way) but not identical; you will need to keep the context of each definition in clear focus.
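The definition in 225B can be explored numerically. The following Python fragment is only an illustrative sketch under stated assumptions: the helper name `variation_over`, the test function $x \mapsto x^2$ and the particular intervals are hypothetical choices, not taken from the text. It computes the total length $\sum (b_i - a_i)$ and the sum $\sum |f(b_i) - f(a_i)|$ for a finite family of non-overlapping intervals.

```python
def variation_over(f, intervals):
    """Return (total length, sum of |f(b_i) - f(a_i)|) for the given intervals.

    `intervals` is assumed to be a list of pairs (a_i, b_i) with
    a_1 <= b_1 <= a_2 <= b_2 <= ... as in the definition of absolute continuity.
    """
    length = sum(b - a for a, b in intervals)
    change = sum(abs(f(b) - f(a)) for a, b in intervals)
    return length, change


def square(x):
    # Hypothetical test function: x -> x^2 on [0, 1].
    return x * x


# Four non-overlapping intervals inside [0, 1] (an arbitrary illustrative choice).
intervals = [(0.0, 0.01), (0.25, 0.27), (0.5, 0.52), (0.9, 0.91)]
length, change = variation_over(square, intervals)

# On [0, 1] we have |x^2 - y^2| <= 2|x - y|, so the change is at most 2 * length.
print(length, change)
```

For this particular function the bound $|f(b_i) - f(a_i)| \le 2(b_i - a_i)$ shows that $\delta = \epsilon/2$ would serve in the definition; the code merely checks one instance of that inequality.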
225C Proposition Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$.
(a) If $f : [a,b] \to \mathbb{R}$ is absolutely continuous, it is uniformly continuous.
(b) If $f : [a,b] \to \mathbb{R}$ is absolutely continuous it is of bounded variation on $[a,b]$, so is differentiable almost everywhere in $[a,b]$, and its derivative is integrable over $[a,b]$.
(c) If $f, g : [a,b] \to \mathbb{R}$ are absolutely continuous, so are $f + g$ and $cf$, for every $c \in \mathbb{R}$.
(d) If $f, g : [a,b] \to \mathbb{R}$ are absolutely continuous so is $f \times g$.
(e) If $g : [a,b] \to [c,d]$ and $f : [c,d] \to \mathbb{R}$ are absolutely continuous, and $g$ is non-decreasing, then the composition $fg : [a,b] \to \mathbb{R}$ is absolutely continuous.
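proof (a) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$; but of course now $|f(y) - f(x)| \le \epsilon$ whenever $x, y \in [a,b]$ and $|x - y| \le \delta$. As $\epsilon$ is arbitrary, $f$ is uniformly continuous.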
76 The Fundamental Theorem of Calculus 225C
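(b) Let $\delta > 0$ be such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le 1$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. If $a \le c = c_0 \le c_1 \le \ldots \le c_n \le d \le \min(b, c + \delta)$, then $\sum_{i=1}^n |f(c_i) - f(c_{i-1})| \le 1$, so $\operatorname{Var}_{[c,d]}(f) \le 1$; accordingly (inducing on $k$, using 224Cc for the inductive step) $\operatorname{Var}_{[a,\min(a+k\delta,b)]}(f) \le k$ for every $k$, and
$$\operatorname{Var}_{[a,b]}(f) \le \lceil (b-a)/\delta \rceil < \infty.$$
It follows that $f'$ is integrable, by 224I.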
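(c)(i) Let $\epsilon > 0$. Then there are $\delta_1, \delta_2 > 0$ such that
$$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \tfrac{1}{2}\epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_1$, and
$$\sum_{i=1}^n |g(b_i) - g(a_i)| \le \tfrac{1}{2}\epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_2$. Set $\delta = \min(\delta_1, \delta_2) > 0$, and suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Then
$$\sum_{i=1}^n |(f+g)(b_i) - (f+g)(a_i)| \le \sum_{i=1}^n |f(b_i) - f(a_i)| + \sum_{i=1}^n |g(b_i) - g(a_i)| \le \epsilon.$$
As $\epsilon$ is arbitrary, $f + g$ is absolutely continuous.
(ii) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that
$$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \frac{\epsilon}{1+|c|}$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Now
$$\sum_{i=1}^n |(cf)(b_i) - (cf)(a_i)| \le \epsilon$$
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. As $\epsilon$ is arbitrary, $cf$ is absolutely continuous.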
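(d) By either (a) or (b), $f$ and $g$ are bounded; set $M = \sup_{x\in[a,b]} |f(x)|$, $M' = \sup_{x\in[a,b]} |g(x)|$. Let $\epsilon > 0$. Then there are $\delta_1, \delta_2 > 0$ such that
$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_1$,
$\sum_{i=1}^n |g(b_i) - g(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_2$.
Set $\delta = \min(\delta_1, \delta_2) > 0$ and suppose that $a \le a_1 \le b_1 \le \ldots \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Then
$$\begin{aligned}
\sum_{i=1}^n |f(b_i)g(b_i) - f(a_i)g(a_i)| &= \sum_{i=1}^n |(f(b_i) - f(a_i))g(b_i) + f(a_i)(g(b_i) - g(a_i))| \\
&\le \sum_{i=1}^n \bigl(|f(b_i) - f(a_i)||g(b_i)| + |f(a_i)||g(b_i) - g(a_i)|\bigr) \\
&\le \sum_{i=1}^n \bigl(|f(b_i) - f(a_i)|M' + M|g(b_i) - g(a_i)|\bigr) \\
&\le \epsilon M' + M\epsilon = \epsilon(M + M').
\end{aligned}$$
As $\epsilon$ is arbitrary, $f \times g$ is absolutely continuous.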
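(e) Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^n |f(d_i) - f(c_i)| \le \epsilon$ whenever $c \le c_1 \le d_1 \le \ldots \le c_n \le d_n \le d$ and $\sum_{i=1}^n (d_i - c_i) \le \delta$; and there is an $\eta > 0$ such that $\sum_{i=1}^n |g(b_i) - g(a_i)| \le \delta$ whenever $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \eta$. Now suppose that $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \eta$. Because $g$ is non-decreasing, we have $c \le g(a_1) \le \ldots \le g(b_n) \le d$ and $\sum_{i=1}^n (g(b_i) - g(a_i)) \le \delta$, so $\sum_{i=1}^n |f(g(b_i)) - f(g(a_i))| \le \epsilon$; as $\epsilon$ is arbitrary, $fg$ is absolutely continuous.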
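225D Lemma Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$ and $f : [a,b] \to \mathbb{R}$ an absolutely continuous function which has zero derivative almost everywhere in $[a,b]$. Then $f$ is constant on $[a,b]$.
proof Let $x \in [a,b]$, $\epsilon > 0$. Let $\delta > 0$ be such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Set $A = \{t : a < t < x,\ f'(t) \text{ exists} = 0\}$; then $\mu A = x - a$, writing $\mu$ for Lebesgue measure as usual. Let $\mathcal{I}$ be the set of non-empty non-singleton closed intervals $[c,d] \subseteq [a,x]$ such that $|f(d) - f(c)| \le \epsilon(d - c)$; then every member of $A$ belongs to arbitrarily short members of $\mathcal{I}$. By Vitali's theorem (221A), there is a countable disjoint family $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$, that is,
$$x - a = \mu\bigl(\textstyle\bigcup \mathcal{I}_0\bigr) = \sum_{I\in\mathcal{I}_0} \mu I.$$
Now there is a finite $\mathcal{I}_1 \subseteq \mathcal{I}_0$ such that
$$\mu\bigl(\textstyle\bigcup \mathcal{I}_1\bigr) = \sum_{I\in\mathcal{I}_1} \mu I \ge x - a - \delta.$$
If $\mathcal{I}_1 = \emptyset$, then $x \le a + \delta$ and $|f(x) - f(a)| \le \epsilon$. Otherwise, express $\mathcal{I}_1$ as $\{[c_0,d_0], \ldots, [c_n,d_n]\}$, where $a \le c_0 < d_0 < c_1 < d_1 < \ldots < c_n < d_n \le x$. Then
$$(c_0 - a) + \sum_{i=1}^n (c_i - d_{i-1}) + (x - d_n) = \mu\bigl([a,x] \setminus \textstyle\bigcup \mathcal{I}_1\bigr) \le \delta,$$
so
$$|f(c_0) - f(a)| + \sum_{i=1}^n |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| \le \epsilon.$$
On the other hand, $|f(d_i) - f(c_i)| \le \epsilon(d_i - c_i)$ for each $i$, so
$$\sum_{i=0}^n |f(d_i) - f(c_i)| \le \epsilon \sum_{i=0}^n (d_i - c_i) \le \epsilon(x - a).$$
Putting these together,
$$\begin{aligned}
|f(x) - f(a)| &\le |f(c_0) - f(a)| + |f(d_0) - f(c_0)| + |f(c_1) - f(d_0)| + \ldots + |f(d_n) - f(c_n)| + |f(x) - f(d_n)| \\
&= |f(c_0) - f(a)| + \sum_{i=1}^n |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| + \sum_{i=0}^n |f(d_i) - f(c_i)| \\
&\le \epsilon + \epsilon(x - a) = \epsilon(1 + x - a).
\end{aligned}$$
As $\epsilon$ is arbitrary, $f(x) = f(a)$. As $x$ is arbitrary, $f$ is constant.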
225E Theorem Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$ and $F : [a,b] \to \mathbb{R}$ a function. Then the following are equiveridical:
(i) there is an integrable real-valued function $f$ such that $F(x) = F(a) + \int_a^x f$ for every $x \in [a,b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x) - F(a)$ for every $x \in [a,b]$;
(iii) $F$ is absolutely continuous.
proof (i)$\Rightarrow$(iii) Assume (i). Let $\epsilon > 0$. By 225A, there is a $\delta > 0$ such that $\int_H |f| \le \epsilon$ whenever $H \subseteq [a,b]$ and $\mu H \le \delta$, writing $\mu$ for Lebesgue measure as usual. Now suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Consider $H = \bigcup_{1\le i\le n} [a_i, b_i[$. Then $\mu H \le \delta$ and
$$\sum_{i=1}^n |F(b_i) - F(a_i)| = \sum_{i=1}^n \Bigl|\int_{[a_i,b_i[} f\Bigr| \le \sum_{i=1}^n \int_{[a_i,b_i[} |f| = \int_H |f| \le \epsilon.$$
As $\epsilon$ is arbitrary, $F$ is absolutely continuous.
(iii)$\Rightarrow$(ii) If $F$ is absolutely continuous, then it is of bounded variation (by 225Cb), so $\int_a^b F'$ exists (224I). Set $G(x) = \int_a^x F'$ for $x \in [a,b]$; then $G' =_{\text{a.e.}} F'$ (222E) and $G$ is absolutely continuous (by (i)$\Rightarrow$(iii) just proved). Accordingly $G - F$ is absolutely continuous (225Cc) and is differentiable, with zero derivative, almost everywhere. It follows that $G - F$ must be constant (225D). But as $G(a) = 0$, $G = F - F(a)$; just as required by (ii).
(ii)$\Rightarrow$(i) is trivial.
225F Integration by parts As an application of this result, I give a justification of a familiar formula.
Theorem Let $f$ be a real-valued function which is integrable over an interval $[a,b] \subseteq \mathbb{R}$, and $g : [a,b] \to \mathbb{R}$ an absolutely continuous function. Suppose that $F$ is an indefinite integral of $f$, so that $F(x) - F(a) = \int_a^x f$ for $x \in [a,b]$. Then
$$\int_a^b f \times g = F(b)g(b) - F(a)g(a) - \int_a^b F \times g'.$$
proof Set $h = F \times g$. Because $F$ is absolutely continuous (225E), so is $h$ (225Cd). Consequently $h(b) - h(a) = \int_a^b h'$, by (iii)$\Rightarrow$(ii) of 225E. But $h' = F' \times g + F \times g'$ wherever $F'$ and $g'$ are defined, which is almost everywhere, and $F' =_{\text{a.e.}} f$, by 222E; so $h' =_{\text{a.e.}} f \times g + F \times g'$. Finally, $g$ and $F$ are continuous, therefore measurable, and bounded, while $f$ and $g'$ are integrable (using 225E yet again), so $f \times g$ and $F \times g'$ are integrable, and
$$F(b)g(b) - F(a)g(a) = h(b) - h(a) = \int_a^b h' = \int_a^b f \times g + \int_a^b F \times g',$$
as required.
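The formula of 225F can be checked numerically in a simple smooth case. The sketch below is illustrative only: the choice $f = \cos$, $F = \sin$, $g(x) = x^2$ on $[0,2]$, the helper `midpoint` and its step count are all assumptions made for the example, and a midpoint rule is of course no substitute for the Lebesgue integral in general.

```python
import math


def midpoint(func, a, b, n=20000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


a, b = 0.0, 2.0
f = math.cos                  # integrable function f
F = math.sin                  # an indefinite integral of f: F(x) - F(a) = integral of f over [a, x]
g = lambda x: x * x           # absolutely continuous g
g_prime = lambda x: 2.0 * x   # derivative of g

# Left-hand side: integral of f * g over [a, b].
lhs = midpoint(lambda x: f(x) * g(x), a, b)

# Right-hand side: F(b)g(b) - F(a)g(a) - integral of F * g' over [a, b].
rhs = F(b) * g(b) - F(a) * g(a) - midpoint(lambda x: F(x) * g_prime(x), a, b)

print(lhs, rhs)
```

The two printed values should agree up to the discretization error of the quadrature rule.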
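225G I come now to a group of results at a rather deeper level than most of the work of this chapter, being closer to the ideas of Chapter 26.
Proposition Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$ and $f : [a,b] \to \mathbb{R}$ an absolutely continuous function. Then $f[A]$ is negligible for every negligible set $A \subseteq \mathbb{R}$.
proof Let $\epsilon > 0$. Then there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Now there is a sequence $\langle I_k \rangle_{k\in\mathbb{N}}$ of closed intervals, covering $A$, with $\sum_{k=0}^\infty \mu I_k \le \delta$. For each $m \in \mathbb{N}$, let $F_m$ be $[a,b] \cap \bigcup_{k\le m} I_k$. Then $\mu f[F_m] \le \epsilon$. $\mathbf{P}$ $F_m$ must be expressible as $\bigcup_{i\le n} [c_i, d_i]$ where $n \le m$ and $a \le c_0 \le d_0 \le \ldots \le c_n \le d_n \le b$. For each $i \le n$ choose $x_i$, $y_i$ such that $c_i \le x_i, y_i \le d_i$ and
$$f(x_i) = \min_{x\in[c_i,d_i]} f(x), \quad f(y_i) = \max_{x\in[c_i,d_i]} f(x);$$
such exist because $f$ is continuous, so is bounded and attains its bounds on $[c_i, d_i]$. Set $a_i = \min(x_i, y_i)$, $b_i = \max(x_i, y_i)$, so that $c_i \le a_i \le b_i \le d_i$. Then
$$\sum_{i=0}^n (b_i - a_i) \le \sum_{i=0}^n (d_i - c_i) = \mu F_m \le \mu\bigl(\textstyle\bigcup_{k\in\mathbb{N}} I_k\bigr) \le \delta,$$
so
$$\begin{aligned}
\mu f[F_m] = \mu\Bigl(\bigcup_{i\le n} f[\,[c_i,d_i]\,]\Bigr) &\le \sum_{i=0}^n \mu(f[\,[c_i,d_i]\,]) \\
&= \sum_{i=0}^n \mu[f(x_i), f(y_i)] = \sum_{i=0}^n |f(b_i) - f(a_i)| \le \epsilon. \ \mathbf{Q}
\end{aligned}$$
But $\langle f[F_m] \rangle_{m\in\mathbb{N}}$ is a non-decreasing sequence covering $f[A]$, so
$$\mu^* f[A] \le \mu\bigl(\textstyle\bigcup_{m\in\mathbb{N}} f[F_m]\bigr) = \sup_{m\in\mathbb{N}} \mu f[F_m] \le \epsilon.$$
As $\epsilon$ is arbitrary, $f[A]$ is negligible, as claimed.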
225H Semi-continuous functions In preparation for the last main result of this section, I give a general result concerning measurable real-valued functions on subsets of $\mathbb{R}$. It will be convenient here, for once, to consider functions taking values in $[-\infty, \infty]$. If $D \subseteq \mathbb{R}^r$, a function $g : D \to [-\infty, \infty]$ is lower semi-continuous if $\{x : g(x) > u\}$ is an open subset of $D$ (for the subspace topology, see 2A3C) for every $u \in [-\infty, \infty]$. Any lower semi-continuous function is Borel measurable, therefore Lebesgue measurable (121B-121D). Now we have the following result.
225I Proposition Suppose that $r \ge 1$ and that $f$ is a real-valued function, defined on a subset $D$ of $\mathbb{R}^r$, which is integrable over $D$. Then for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\int_D g$ is defined and not greater than $\epsilon + \int_D f$.
Remarks This is a result of great general importance, so I give it in a fairly general form; but for the present chapter all we need is the case $r = 1$, $D = [a,b]$ where $a \le b$.
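proof (a) We can enumerate $\mathbb{Q}$ as $\langle q_n \rangle_{n\in\mathbb{N}}$. By 225A, there is a $\delta > 0$ such that $\int_F |f| \le \frac{1}{2}\epsilon$ whenever $\mu_D F \le \delta$, where $\mu_D$ is the subspace measure on $D$, so that $\mu_D F = \mu^* F$, the outer Lebesgue measure of $F$, for every $F \in \Sigma_D$, the domain of $\mu_D$ (214A-214B). For each $n \in \mathbb{N}$, set
$$\delta_n = 2^{-n-1} \min\Bigl(\frac{\epsilon}{1+2|q_n|}, \delta\Bigr),$$
so that $\sum_{n=0}^\infty \delta_n |q_n| \le \frac{1}{2}\epsilon$ and $\sum_{n=0}^\infty \delta_n \le \delta$. For each $n \in \mathbb{N}$, let $E_n \subseteq \mathbb{R}^r$ be a Lebesgue measurable set such that $\{x : f(x) \ge q_n\} = D \cap E_n$, and choose an open set $G_n \supseteq E_n \cap B(\mathbf{0}, n)$ such that $\mu G_n \le \mu(E_n \cap B(\mathbf{0}, n)) + \delta_n$ (134Fa), writing $B(\mathbf{0}, n)$ for the ball $\{x : \|x\| \le n\}$. For $x \in \mathbb{R}^r$, set
$$g(x) = \sup\{q_n : x \in G_n\},$$
allowing $-\infty$ as $\sup \emptyset$ and $\infty$ as the supremum of a set with no upper bound in $\mathbb{R}$.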
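(b) Now check the properties of $g$.
(i) $g$ is lower semi-continuous. $\mathbf{P}$ If $u \in [-\infty, \infty]$, then
$$\{x : g(x) > u\} = \bigcup\{G_n : q_n > u\}$$
is a union of open sets, therefore open. $\mathbf{Q}$
(ii) $g(x) \ge f(x)$ for every $x \in D$. $\mathbf{P}$ If $x \in D$ and $\eta > 0$, there is an $n \in \mathbb{N}$ such that $\|x\| \le n$ and $f(x) - \eta \le q_n \le f(x)$; now $x \in E_n \cap B(\mathbf{0}, n) \subseteq G_n$ so $g(x) \ge q_n \ge f(x) - \eta$. As $\eta$ is arbitrary, $g(x) \ge f(x)$. $\mathbf{Q}$
(iii) Consider the functions $h_1, h_2 : D \to \,]-\infty, \infty]$ defined by setting
$$h_1(x) = |f(x)| \text{ if } x \in D \cap \bigcup_{n\in\mathbb{N}} (G_n \setminus E_n), \qquad h_1(x) = 0 \text{ for other } x \in D,$$
$$h_2(x) = \sum_{n=0}^\infty |q_n| \chi(G_n \setminus E_n)(x) \text{ for every } x \in D.$$
Setting $F = \bigcup_{n\in\mathbb{N}} G_n \setminus E_n$,
$$\mu F \le \sum_{n=0}^\infty \mu(G_n \setminus E_n) \le \delta,$$
so
$$\int_D h_1 \le \int_{D\cap F} |f| \le \tfrac{1}{2}\epsilon$$
by the choice of $\delta$. As for $h_2$, we have (by B.Levi's theorem)
$$\int_D h_2 = \sum_{n=0}^\infty |q_n| \mu_D(D \cap G_n \setminus E_n) \le \sum_{n=0}^\infty |q_n| \mu(G_n \setminus E_n) \le \tfrac{1}{2}\epsilon;$$
because this is finite, $h_2(x) < \infty$ for almost every $x \in D$. Thus $\int_D (h_1 + h_2) \le \epsilon$.
(iv) The point is that $g \le f + h_1 + h_2$ everywhere in $D$. $\mathbf{P}$ Take any $x \in D$. If $n \in \mathbb{N}$ and $x \in G_n$, then either $x \in E_n$, in which case
$$f(x) + h_1(x) + h_2(x) \ge f(x) \ge q_n,$$
or $x \in G_n \setminus E_n$, in which case
$$f(x) + h_1(x) + h_2(x) \ge f(x) + |f(x)| + |q_n| \ge q_n.$$
Thus
$$f(x) + h_1(x) + h_2(x) \ge \sup\{q_n : x \in G_n\} \ge g(x). \ \mathbf{Q}$$
So $g \le f + h_1 + h_2$ everywhere in $D$.
(v) Putting (iii) and (iv) together,
$$\int_D g \le \int_D (f + h_1 + h_2) \le \epsilon + \int_D f,$$
as required.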
225J We need some results on Borel measurable sets and functions which are of independent interest.
Theorem Let $D$ be a subset of $\mathbb{R}$ and $f : D \to \mathbb{R}$ any function. Then
$$E = \{x : x \in D,\ f \text{ is continuous at } x\}$$
is relatively Borel measurable in $D$, and
$$F = \{x : x \in D,\ f \text{ is differentiable at } x\}$$
is actually Borel measurable; moreover, $f' : F \to \mathbb{R}$ is Borel measurable.
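proof (a) For $k \in \mathbb{N}$ set
$$\mathcal{G}_k = \{\,]a,b[\, : a, b \in \mathbb{R},\ |f(x) - f(y)| \le 2^{-k} \text{ for all } x, y \in D \cap \,]a,b[\,\}.$$
Then $G_k = \bigcup \mathcal{G}_k$ is an open set, so $E_0 = \bigcap_{k\in\mathbb{N}} G_k$ is a Borel set. But $E = D \cap E_0$, so $E$ is a relatively Borel subset of $D$.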
(b)(i) I should perhaps say at once that when interpreting the formula $f'(x) = \lim_{h\to 0} (f(x+h) - f(x))/h$, I insist on the restrictive definition
$$a = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\frac{f(x+h) - f(x)}{h}$ is defined and
$$\Bigl|\frac{f(x+h) - f(x)}{h} - a\Bigr| \le \epsilon \text{ whenever } 0 < |h| \le \delta.$$
So $f'(x)$ can be defined only if there is some $\delta > 0$ such that the whole interval $[x - \delta, x + \delta]$ lies within the domain $D$ of $f$.
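(ii) For $p, q, q' \in \mathbb{Q}$ and $k \in \mathbb{N}$ set
$$H(k,p,q,q') = \emptyset \text{ if } ]q,q'[\, \not\subseteq D,$$
$$H(k,p,q,q') = \{x : x \in E \cap \,]q,q'[\,,\ |f(y) - f(x) - p(y - x)| \le 2^{-k}|y - x| \text{ for every } y \in \,]q,q'[\,\} \text{ if } ]q,q'[\, \subseteq D.$$
(The factor $|y - x|$ is what the argument in (iii) below uses, where the difference quotient is compared with $p$.) Then $H(k,p,q,q') = E \cap \,]q,q'[\, \cap \overline{H(k,p,q,q')}$. $\mathbf{P}$ If $x \in E \cap \,]q,q'[\, \cap \overline{H(k,p,q,q')}$ there is a sequence $\langle x_n \rangle_{n\in\mathbb{N}}$ in $H(k,p,q,q')$ converging to $x$. Because $f$ is continuous at $x$,
$$|f(y) - f(x) - p(y - x)| = \lim_{n\to\infty} |f(y) - f(x_n) - p(y - x_n)| \le \lim_{n\to\infty} 2^{-k}|y - x_n| = 2^{-k}|y - x|$$
for every $y \in \,]q,q'[$, so that $x \in H(k,p,q,q')$. $\mathbf{Q}$ Since $E \cap \,]q,q'[\, = E_0 \cap \,]q,q'[$ is a Borel set when $]q,q'[\, \subseteq D$, by (a), so is $H(k,p,q,q')$.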
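(iii) Now
$$F = \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k,p,q,q').$$
$\mathbf{P}$ ($\alpha$) Suppose $x \in F$, that is, $f'(x)$ is defined; say $f'(x) = a$. Take any $k \in \mathbb{N}$. Then there are $p \in \mathbb{Q}$, $\delta \in \,]0,1]$ such that $|p - a| \le 2^{-k-1}$ and $[x - \delta, x + \delta] \subseteq D$ and $|\frac{f(x+h) - f(x)}{h} - a| \le 2^{-k-1}$ whenever $0 < |h| \le \delta$; now take $q \in \mathbb{Q} \cap [x - \delta, x[$, $q' \in \mathbb{Q} \cap \,]x, x + \delta]$ and see that $x \in H(k,p,q,q')$. As $x$ is arbitrary, $F \subseteq \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k,p,q,q')$.
($\beta$) If $x \in \bigcap_{k\in\mathbb{N}} \bigcup_{p,q,q'\in\mathbb{Q}} H(k,p,q,q')$, then for each $k \in \mathbb{N}$ choose $p_k, q_k, q'_k \in \mathbb{Q}$ such that $x \in H(k, p_k, q_k, q'_k)$. If $h \ne 0$, $x + h \in \,]q_k, q'_k[$ then $|\frac{f(x+h) - f(x)}{h} - p_k| \le 2^{-k}$. But this means, first, that $|p_k - p_l| \le 2^{-k} + 2^{-l}$ for every $k$, $l$ (since surely there is some $h \ne 0$ such that $x + h \in \,]q_k, q'_k[\, \cap \,]q_l, q'_l[$), so that $\langle p_k \rangle_{k\in\mathbb{N}}$ is a Cauchy sequence, with limit $a$ say; and, second, that $|\frac{f(x+h) - f(x)}{h} - a| \le 2^{-k} + |a - p_k|$ whenever $h \ne 0$ and $x + h \in \,]q_k, q'_k[$, so that $f'(x)$ is defined and equal to $a$. $\mathbf{Q}$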
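(iv) Because $\mathbb{Q}$ is countable, all the unions $\bigcup_{p,q,q'\in\mathbb{Q}} H(k,p,q,q')$ are Borel sets, so $F$ also is.
(v) Now enumerate $\mathbb{Q}^3$ as $\langle (p_i, q_i, q'_i) \rangle_{i\in\mathbb{N}}$, and set $H'_{ki} = H(k, p_i, q_i, q'_i) \setminus \bigcup_{j<i} H(k, p_j, q_j, q'_j)$ for each $k, i \in \mathbb{N}$. Every $H'_{ki}$ is Borel measurable, $\langle H'_{ki} \rangle_{i\in\mathbb{N}}$ is disjoint, and
$$\bigcup_{i\in\mathbb{N}} H'_{ki} = \bigcup_{i\in\mathbb{N}} H(k, p_i, q_i, q'_i) \supseteq F$$
for each $k$. Note that $|f'(x) - p| \le 2^{-k}$ whenever $x \in F \cap H(k,p,q,q')$, so if we set $f_k(x) = p_i$ for every $x \in H'_{ki}$ we shall have a Borel function $f_k$ such that $|f'(x) - f_k(x)| \le 2^{-k}$ for every $x \in F$. Accordingly $f' = \lim_{k\to\infty} f_k \restriction F$ is Borel measurable.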
225K Proposition Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$, and $f : [a,b] \to \mathbb{R}$ a function. Set $F = \{x : x \in \,]a,b[\,,\ f'(x) \text{ is defined}\}$. Then $f$ is absolutely continuous iff (i) $f$ is continuous (ii) $f'$ is integrable over $F$ (iii) $f[\,[a,b] \setminus F]$ is negligible.
proof (a) Suppose first that $f$ is absolutely continuous. Then $f$ is surely continuous (225Ca) and $f'$ is integrable over $[a,b]$, therefore over $F$ (225E); also $[a,b] \setminus F$ is negligible, so $f[\,[a,b] \setminus F]$ is negligible, by 225G.
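(b) So now suppose that $f$ satisfies the conditions. Set $f^*(x) = |f'(x)|$ for $x \in F$, $0$ for $x \in [a,b] \setminus F$. Then
$$f(b) \le f(a) + \int_a^b f^*.$$
$\mathbf{P}$ (i) Because $F$ is a Borel set and $f'$ is a Borel measurable function (225J), $f^*$ is measurable. Let $\epsilon > 0$. Let $G$ be an open subset of $\mathbb{R}$ such that $f[\,[a,b] \setminus F] \subseteq G$ and $\mu G \le \epsilon$ (134Fa). Let $g : \mathbb{R} \to [0, \infty]$ be a lower semi-continuous function such that $f^*(x) \le g(x)$ for every $x \in [a,b]$ and $\int_a^b g \le \int_a^b f^* + \epsilon$ (225I). Consider
$$A = \Bigl\{x : a \le x \le b,\ \mu([f(a), f(x)] \setminus G) \le 2\epsilon(x - a) + \int_a^x g\Bigr\},$$
interpreting $[f(a), f(x)]$ as $\emptyset$ if $f(x) < f(a)$. Then $a \in A \subseteq [a,b]$, so $c = \sup A$ is defined and belongs to $[a,b]$. Because $f$ is continuous, the function $x \mapsto \mu([f(a), f(x)] \setminus G)$ is continuous; also $x \mapsto 2\epsilon(x - a) + \int_a^x g$ is certainly continuous, so $c \in A$.
(ii) $\mathbf{?}$ If $c \in F$, so that $f^*(c) = |f'(c)|$, then there is a $\delta > 0$ such that
$a \le c - \delta \le c + \delta \le b$,
$g(x) \ge g(c) - \epsilon \ge |f'(c)| - \epsilon$ whenever $|x - c| \le \delta$,
$|\frac{f(x) - f(c)}{x - c} - f'(c)| \le \epsilon$ whenever $|x - c| \le \delta$.
Consider $x = c + \delta$. Then $c < x \le b$ and
$$\begin{aligned}
\mu([f(a), f(x)] \setminus G) &\le \mu([f(a), f(c)] \setminus G) + |f(x) - f(c)| \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + |f'(c)|(x - c) \\
&\le 2\epsilon(c - a) + \int_a^c g + \epsilon(x - c) + \int_c^x (g + \epsilon)
\end{aligned}$$
(because $g(t) \ge |f'(c)| - \epsilon$ whenever $c \le t \le x$)
$$= 2\epsilon(x - a) + \int_a^x g,$$
so $x \in A$; but $c$ is supposed to be an upper bound of $A$. $\mathbf{X}$
Thus $c \in [a,b] \setminus F$.
(iii) $\mathbf{?}$ Now suppose, if possible, that $c < b$. We know that $f(c) \in G$, so there is an $\eta > 0$ such that $[f(c) - \eta, f(c) + \eta] \subseteq G$; now there is a $\delta > 0$ such that $|f(x) - f(c)| \le \eta$ whenever $x \in [a,b]$ and $|x - c| \le \delta$. Set $x = \min(c + \delta, b)$; then $c < x \le b$ and $[f(c), f(x)] \subseteq G$, so
$$\mu([f(a), f(x)] \setminus G) = \mu([f(a), f(c)] \setminus G) \le 2\epsilon(c - a) + \int_a^c g \le 2\epsilon(x - a) + \int_a^x g$$
and once again $x \in A$, even though $x > \sup A$. $\mathbf{X}$
(iv) We conclude that $c = b$, so that $b \in A$. But this means that
$$\begin{aligned}
f(b) - f(a) \le \mu([f(a), f(b)]) &\le \mu([f(a), f(b)] \setminus G) + \mu G \\
&\le 2\epsilon(b - a) + \int_a^b g + \epsilon \le 2\epsilon(b - a) + \int_a^b f^* + \epsilon + \epsilon \\
&= 2\epsilon(1 + b - a) + \int_a^b f^*.
\end{aligned}$$
As $\epsilon$ is arbitrary, $f(b) - f(a) \le \int_a^b f^*$, as claimed. $\mathbf{Q}$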
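(c) Similarly, or applying (b) to $-f$, $f(a) - f(b) \le \int_a^b f^*$, so that $|f(b) - f(a)| \le \int_a^b f^*$.
Of course the argument applies equally to any subinterval of $[a,b]$, so $|f(d) - f(c)| \le \int_c^d f^*$ whenever $a \le c \le d \le b$.
Now let $\epsilon > 0$. By 225A once more, there is a $\delta > 0$ such that $\int_E f^* \le \epsilon$ whenever $E \subseteq [a,b]$ and $\mu E \le \delta$. Suppose that $a \le a_1 \le b_1 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Then
$$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \sum_{i=1}^n \int_{a_i}^{b_i} f^* = \int_{\bigcup_{i\le n} [a_i,b_i]} f^* \le \epsilon.$$
So $f$ is absolutely continuous, as claimed.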
225L Corollary Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$. Let $f : [a,b] \to \mathbb{R}$ be a continuous function which is differentiable on the open interval $]a,b[$. If its derivative $f'$ is integrable over $[a,b]$, then $f$ is absolutely continuous, and $f(b) - f(a) = \int_a^b f'$.
proof $f[\,[a,b] \setminus F] = \{f(a), f(b)\}$ is surely negligible, so $f$ is absolutely continuous, by 225K; consequently $f(b) - f(a) = \int_a^b f'$, by 225E.
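A case of 225L in which the derivative is unbounded can be looked at numerically. The sketch below is an illustration under stated assumptions: the example $f(x) = \sqrt{x}$ on $[0,1]$, the helper `midpoint` and the number of subintervals are hypothetical choices made here. The derivative $1/(2\sqrt{x})$ is unbounded near $0$ but integrable, and the midpoint rule never evaluates it at $0$.

```python
import math


def midpoint(func, a, b, n):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


f = math.sqrt                               # continuous on [0, 1], differentiable on ]0, 1[
f_prime = lambda x: 0.5 / math.sqrt(x)      # unbounded near 0, but integrable over [0, 1]

integral = midpoint(f_prime, 0.0, 1.0, 200000)

# Difference between f(1) - f(0) and the approximate integral of f'.
gap = abs((f(1.0) - f(0.0)) - integral)
print(integral, gap)
```

Because of the singularity at $0$ the quadrature converges slowly, so `gap` is small but not tiny; the exact identity is the content of the corollary, not of the computation.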
225M Corollary Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$, and $f : [a,b] \to \mathbb{R}$ a continuous function. Then $f$ is absolutely continuous iff it is continuous and of bounded variation and $f[A]$ is negligible for every negligible $A \subseteq [a,b]$.
proof (a) Suppose that $f$ is absolutely continuous. By 225C(a-b) it is continuous and of bounded variation, and by 225G we have $f[A]$ negligible for every negligible $A \subseteq [a,b]$.
(b) So now suppose that $f$ satisfies the conditions. Set $F = \{x : x \in \,]a,b[\,,\ f'(x) \text{ is defined}\}$. By 224I, $[a,b] \setminus F$ is negligible, so $f[\,[a,b] \setminus F]$ is negligible. Moreover, also by 224I, $f'$ is integrable over $[a,b]$ or $F$. So the conditions of 225K are satisfied and $f$ is absolutely continuous.
225N The Cantor function I should mention the standard example of a continuous function of bounded variation which is not absolutely continuous. Let $C \subseteq [0,1]$ be the Cantor set (134G). Recall that the Cantor function is a non-decreasing continuous function $f : [0,1] \to [0,1]$ such that $f'(x)$ is defined and equal to zero for every $x \in [0,1] \setminus C$, but $f(0) = 0 < 1 = f(1)$ (134H). Of course $f$ is of bounded variation and not absolutely continuous. $C$ is negligible and $f[C] = [0,1]$ is not. If $x \in C$, then for every $n \in \mathbb{N}$ there is an interval of length $3^{-n}$, containing $x$, on which $f$ increases by $2^{-n}$; so $f$ cannot be differentiable at $x$, and the set $F = \operatorname{dom} f'$ of 225K is precisely $[0,1] \setminus C$, so that $f[\,[0,1] \setminus F] = [0,1]$.
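The failure of absolute continuity in 225N can be made concrete with exact rational arithmetic. The following Python sketch is illustrative only: the function names `cantor`, `level_intervals` and `length_and_increase` are hypothetical, and the evaluation of the Cantor function through ternary digits is a standard description assumed here rather than taken from 134H. At stage $n$ of the Cantor construction there are $2^n$ closed intervals of total length $(2/3)^n$, and the Cantor function increases by a total of $1$ across them.

```python
from fractions import Fraction


def cantor(x, max_digits=200):
    """Cantor function at a rational x in [0, 1], read off the ternary expansion."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        if x == 0:
            break
        x *= 3
        digit = x.numerator // x.denominator   # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x lies in a closed middle third, where the function is constant.
            return value + scale
        value += scale * (digit // 2)          # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value


def level_intervals(n):
    """The 2^n closed intervals left after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            refined.append((left, left + third))
            refined.append((right - third, right))
        intervals = refined
    return intervals


def length_and_increase(n):
    """Total length of the stage-n intervals and total increase of cantor() over them."""
    intervals = level_intervals(n)
    length = sum(r - l for l, r in intervals)
    increase = sum(cantor(r) - cantor(l) for l, r in intervals)
    return length, increase


print(length_and_increase(6))
```

So for $\epsilon < 1$ no $\delta > 0$ can satisfy the definition in 225B: the total length $(2/3)^n$ falls below any $\delta$ while the sum of the increments stays equal to $1$.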
225O Complex-valued functions As usual, I spell out the results above in the forms applicable to complex-valued functions.
(a) Let $(X, \Sigma, \mu)$ be any measure space and $f$ an integrable complex-valued function defined on a conegligible subset of $X$. Then for any $\epsilon > 0$ there are a measurable set $E$ of finite measure and a real number $\delta > 0$ such that $\int_F |f| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$. (Apply 225A to $|f|$.)
(b) If $[a,b]$ is a non-empty closed interval in $\mathbb{R}$ and $f : [a,b] \to \mathbb{C}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \epsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \ldots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Observe that $f$ is absolutely continuous iff its real and imaginary parts are both absolutely continuous.
(c) Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$.
(i) If $f : [a,b] \to \mathbb{C}$ is absolutely continuous it is of bounded variation on $[a,b]$, so is differentiable almost everywhere in $[a,b]$, and its derivative is integrable over $[a,b]$.
(ii) If $f, g : [a,b] \to \mathbb{C}$ are absolutely continuous, so are $f + g$ and $\zeta f$, for any $\zeta \in \mathbb{C}$, and $f \times g$.
(iii) If $g : [a,b] \to [c,d]$ is monotonic and absolutely continuous, and $f : [c,d] \to \mathbb{C}$ is absolutely continuous, then $fg : [a,b] \to \mathbb{C}$ is absolutely continuous.
(d) Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$ and $F : [a,b] \to \mathbb{C}$ a function. Then the following are equiveridical:
(i) there is an integrable complex-valued function $f$ such that $F(x) = F(a) + \int_a^x f$ for every $x \in [a,b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x) - F(a)$ for every $x \in [a,b]$;
(iii) $F$ is absolutely continuous.
(Apply 225E to the real and imaginary parts of $F$.)
(e) Let $f$ be an integrable complex-valued function on an interval $[a,b] \subseteq \mathbb{R}$, and $g : [a,b] \to \mathbb{C}$ an absolutely continuous function. Set $F(x) = \int_a^x f$ for $x \in [a,b]$. Then
$$\int_a^b f \times g = F(b)g(b) - F(a)g(a) - \int_a^b F \times g'.$$
(Apply 225F to the real and imaginary parts of $f$ and $g$.)
(f) Let $f$ be a continuous complex-valued function on a closed interval $[a,b] \subseteq \mathbb{R}$, and suppose that $f$ is differentiable at every point of the open interval $]a,b[$, with $f'$ integrable over $[a,b]$. Then $f$ is absolutely continuous. (Apply 225L to the real and imaginary parts of $f$.)
(g) For a result corresponding to 225M, see 264Yp.
225X Basic exercises (a) Show directly from the definition in 225B (without appealing to 225E) that any absolutely continuous real-valued function on a closed interval $[a,b]$ is expressible as the difference of non-decreasing absolutely continuous functions.
(b) Let $f : [a,b] \to \mathbb{R}$ be an absolutely continuous function, where $a \le b$. (i) Show that $|f| : [a,b] \to \mathbb{R}$ is absolutely continuous. (ii) Show that $gf$ is absolutely continuous whenever $g : \mathbb{R} \to \mathbb{R}$ is a differentiable function with bounded derivative.
(c) Show directly from the definition in 225B and the Mean Value Theorem (without appealing to 225K) that if a function $f$ is continuous on a closed interval $[a,b]$, differentiable on the open interval $]a,b[$, and has bounded derivative in $]a,b[$, then $f$ is absolutely continuous, so that $f(x) = f(a) + \int_a^x f'$ for every $x \in [a,b]$.
(d) Show that if $f : [a,b] \to \mathbb{R}$ is absolutely continuous, then $\operatorname{Var} f = \int_a^b |f'|$. (Hint: put 224I and 225E together.)
(e) Let $f : [0, \infty[\, \to \mathbb{C}$ be a function which is absolutely continuous on $[0,a]$ for every $a \in [0, \infty[$ and has Laplace transform $F(s) = \int_0^\infty e^{-sx} f(x)\,dx$ defined on $\{s : \operatorname{Re} s > S\}$. Suppose also that $\lim_{x\to\infty} e^{-Sx} f(x) = 0$. Show that $f'$ has Laplace transform $sF(s) - f(0)$ defined whenever $\operatorname{Re} s > S$. (Hint: show that
$$f(x)e^{-sx} - f(0) = \int_0^x \frac{d}{dt}\bigl(f(t)e^{-st}\bigr)\,dt$$
for every $x \ge 0$.)
(f) Let $g : \mathbb{R} \to \mathbb{R}$ be a non-decreasing function which is absolutely continuous on every bounded interval; let $\nu_g$ be the associated Lebesgue-Stieltjes measure (114Xa), and $\Sigma_g$ its domain. Show that $\int_E g' = \nu_g E$ for any $E \in \Sigma_g$, if we allow $\infty$ as a value of the integral. (Hint: start with intervals $E$.)
(g) Let $g : [a,b] \to \mathbb{R}$ be a non-decreasing absolutely continuous function, and $f : [g(a), g(b)] \to \mathbb{R}$ a continuous function. Show that $\int_{g(a)}^{g(b)} f(t)\,dt = \int_a^b f(g(t))g'(t)\,dt$. (Hint: set $F(x) = \int_{g(a)}^x f$, $G = Fg$ and consider $\int_a^b G'(t)\,dt$. See also 263I.)
(h) Suppose that $I \subseteq \mathbb{R}$ is any non-trivial interval (bounded or unbounded, open, closed or half-open, but not empty or a singleton), and $f : I \to \mathbb{R}$ a function. Show that $f$ is absolutely continuous on every closed bounded subinterval of $I$ iff there is a function $g$ such that $\int_a^b g = f(b) - f(a)$ whenever $a \le b$ in $I$, and in this case $g$ is integrable iff $f$ is of bounded variation on $I$.
(i) Show that $\int_0^1 \frac{\ln x}{x - 1}\,dx = \sum_{n=1}^\infty \frac{1}{n^2}$. (Hint: use 225F to find $\int_0^1 x^n \ln x\,dx$, and recall that $\frac{1}{1-x} = \sum_{n=0}^\infty x^n$ for $0 \le x < 1$.)
(j)(i) Show that $\int_0^1 t^a\,dt$ is finite for every $a > -1$. (ii) Show that $\int_1^\infty t^a e^{-t}\,dt$ is finite for every $a \in \mathbb{R}$. (Hint: show that there is an $M$ such that $t^a \le M e^{t/2}$ for $t \ge M$.) (iii) Show that $\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt$ is defined for every $a > 0$. (iv) Show that $\Gamma(a + 1) = a\Gamma(a)$ for every $a > 0$. (v) Show that $\Gamma(n + 1) = n!$ for every $n \in \mathbb{N}$.
($\Gamma$ is of course the gamma function.)
(k) Show that if $b > 0$ then $\int_0^\infty u^{b-1} e^{-u^2/2}\,du = 2^{(b-2)/2}\,\Gamma(\frac{b}{2})$. (Hint: consider $f(t) = t^{(b-2)/2} e^{-t}$, $g(u) = u^2/2$ in 225Xg.)
(l) Suppose that $f$, $g$ are lower semi-continuous functions, defined on subsets of $\mathbb{R}^r$, and taking values in $]-\infty, \infty]$. (i) Show that $f + g$, $f \wedge g$ and $f \vee g$ are lower semi-continuous, and that $\alpha f$ is lower semi-continuous for every $\alpha \ge 0$. (ii) Show that if $f$ and $g$ are non-negative, then $f \times g$ is lower semi-continuous. (iii) Show that if $f$ is non-negative and $g$ is continuous, then $f \times g$ is lower semi-continuous. (iv) Show that if $f$ is non-decreasing then the composition $fg$ is lower semi-continuous.
(m) Let $A$ be a non-empty family of lower semi-continuous functions defined on subsets of $\mathbb{R}^r$ and taking values in $[-\infty, \infty]$. Set $g(x) = \sup\{f(x) : f \in A,\ x \in \operatorname{dom} f\}$ for $x \in D = \bigcup_{f\in A} \operatorname{dom} f$. Show that $g$ is lower semi-continuous.
(n) Suppose that $f : [a,b] \to \mathbb{R}$ is continuous, and differentiable at all but countably many points of $[a,b]$. Show that $f$ is absolutely continuous iff it is of bounded variation.
(o) Show that if $f : [a,b] \to \mathbb{R}$ is absolutely continuous, then $f[E]$ is Lebesgue measurable for every Lebesgue measurable set $E \subseteq [a,b]$.
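The identities for the gamma function in the exercises above can be explored numerically before they are proved. The sketch below is illustrative only: the helper `gamma_integral`, the truncation of the integral at `upper = 50.0` and the step count are assumptions made for the example, and the computation proves nothing about the improper integral itself.

```python
import math


def gamma_integral(a, upper=50.0, n=200000):
    """Midpoint-rule approximation to the integral of t^(a-1) e^(-t) over [0, upper].

    The tail beyond `upper` is ignored; for the moderate values of a used here
    it is far below the quadrature error.
    """
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += t ** (a - 1.0) * math.exp(-t)
    return h * total


g_1_5 = gamma_integral(1.5)
g_2_5 = gamma_integral(2.5)
g_5 = gamma_integral(5.0)

# Compare with the recurrence Gamma(a + 1) = a * Gamma(a) and with Gamma(5) = 4!.
print(g_2_5, 1.5 * g_1_5, g_5, math.factorial(4))
```

Agreement of the printed pairs to several decimal places is consistent with the recurrence and the factorial formula, which the exercises ask you to establish by integration by parts.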
225Y Further exercises (a) Show that the composition of two absolutely continuous functions need not be absolutely continuous. (Hint: 224Xb.)
(b) Let $f : [a,b] \to \mathbb{R}$ be a continuous function, where $a < b$. Set $G = \{x : x \in \,]a,b[\,,\ \exists\, y \in \,]x,b] \text{ such that } f(x) < f(y)\}$. Show that $G$ is open and is expressible as a disjoint union of intervals $]c,d[$ where $f(c) \le f(d)$. Use this to prove 225D without calling on Vitali's theorem.
(c) Let $f : [a,b] \to \mathbb{R}$ be a function of bounded variation and $\gamma > 0$. Show that there is an absolutely continuous function $g : [a,b] \to \mathbb{R}$ such that $|g'(x)| \le \gamma$ wherever the derivative is defined and $\{x : x \in [a,b],\ f(x) \ne g(x)\}$ has measure at most $\gamma^{-1} \operatorname{Var} f$. (Hint: reduce to the case of non-decreasing $f$. Apply 225Yb to the function $x \mapsto f(x) - \gamma x$ and show that $\gamma \mu G \le \operatorname{Var}_{[a,b]}(f)$. Set $g(x) = f(x)$ for $x \in \,]a,b[\, \setminus G$.)
(d) Let $f$ be a non-negative measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\int_D (g - f) \le \epsilon$.
(e) Let $f$ be a measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r \ge 1$. Show that for any $\epsilon > 0$ there is a lower semi-continuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that $g(x) \ge f(x)$ for every $x \in D$ and $\mu^*\{x : x \in D,\ g(x) > f(x)\} \le \epsilon$. (Hint: 134Yd, 134Fb.)
(f)(i) Show that if $f$ is a Lebesgue measurable real function then all its Dini derivates are Lebesgue measurable. (ii) Show that if $f$ is a Borel measurable real function then all its Dini derivates are Borel measurable.
225 Notes and comments There is a good deal more to say about absolutely continuous functions; I will return to the topic in the next section and in Chapter 26. I shall not make direct use of any of the results from 225H on, but it seems to me that this kind of investigation is necessary for any clear picture of the relationships between such concepts as absolute continuity and bounded variation. Of course, in order to apply these results, we do need a store of simple kinds of absolutely continuous function, differentiable functions with bounded derivative forming the most important class (225Xc). A larger family of the same kind is the class of 'Lipschitz' functions (262Bc).
The definition of 'absolutely continuous function' is ordinarily set out for closed bounded intervals, as in 225B. The point is that for other intervals the simplest generalizations of this formulation do not seem quite appropriate. In 225Xh I try to suggest the kind of demands one might make on functions defined on other types of interval.
I should remark that the real prize is still not quite within our grasp. I have been able to give a reasonably satisfactory formulation of simple integration by parts (225F), at least for bounded intervals; a further limiting process is necessary to deal with unbounded intervals. But a companion method from advanced calculus, integration by substitution, remains elusive. The best I think we can do at this point is 225Xg, which insists on a continuous integrand $f$. It is the case that the result is valid for general integrable $f$, but there are some further subtleties to be mastered on the way; the necessary ideas are given in the much more general results 235A and 263D below, and applied to the one-dimensional case in 263I.
On the way to the characterization of absolutely continuous functions in 225K, I find myself calling on one of the fundamental relationships between Lebesgue measure and the topology of $\mathbb{R}^r$ (225I). The technique here can be adapted to give many variations of the result; see 225Yd-225Ye. If you have not seen semi-continuous functions before, 225Xl-225Xm give a partial idea of their properties. In 225J I give a fragment of 'descriptive set theory', the study of the kinds of set which can arise from the formulae of analysis. These ideas too will re-surface elsewhere; compare 225Yf and also the proof of 262M below.
226 The Lebesgue decomposition of a function of bounded variation
I end this chapter with some notes on a method of analysing a general function of bounded variation which may help
to give a picture of what such functions can be, though it is not directly necessary for anything of great importance
dealt with in this volume.
226A Sums over arbitrary index sets To get a full picture of this fragment of real analysis, a bit of preparation
will be helpful. This concerns the notion of a sum over an arbitrary index set, which I have rather been skirting around
so far.
(a) If $I$ is any set and $\langle a_i \rangle_{i\in I}$ any family in $[0, \infty]$, we set
$$\sum_{i\in I} a_i = \sup\Bigl\{\sum_{i\in K} a_i : K \text{ is a finite subset of } I\Bigr\},$$
with the convention that $\sum_{i\in\emptyset} a_i = 0$. (See 112Bd, 222Ba.) For general $a_i \in [-\infty, \infty]$, we can set
$$\sum_{i\in I} a_i = \sum_{i\in I} a_i^+ - \sum_{i\in I} a_i^-$$
if this is defined in $[-\infty, \infty]$, that is, at least one of $\sum_{i\in I} a_i^+$, $\sum_{i\in I} a_i^-$ is finite, where $a^+ = \max(a, 0)$ and $a^- = \max(-a, 0)$ for each $a$. If $\sum_{i\in I} a_i$ is defined and finite, we say that $\langle a_i \rangle_{i\in I}$ is summable.
(b) Since this is a book on measure theory, I will immediately describe the relationship between this kind of
summability and an appropriate notion of integration. For any set $I$, we have the corresponding counting measure $\mu$
on $I$ (112Bd). Every subset of $I$ is measurable, so every family $\langle a_i\rangle_{i\in I}$ of real numbers is a measurable real-valued function on $I$. A subset of $I$ has finite measure iff it is finite; so a real-valued function $f$ on $I$ is simple if
$K = \{i : f(i) \ne 0\}$ is finite. In this case,
$$\int f\,d\mu = \sum_{i\in K} f(i) = \sum_{i\in I} f(i)$$
as defined in part (a). The measure $\mu$ is semi-finite (211Nc) so a non-negative function $f$ is integrable iff $\int f = \sup_{\mu K<\infty}\int_K f$ is finite (213B); but of course this supremum is precisely
$$\sup\Big\{\sum_{i\in K} f(i) : K\subseteq I \text{ is finite}\Big\} = \sum_{i\in I} f(i).$$
Now a general function $f : I\to\mathbb{R}$ is integrable iff it is measurable and $\int |f|\,d\mu < \infty$, that is, iff $\sum_{i\in I}|f(i)| < \infty$, and
in this case
$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu = \sum_{i\in I} f(i)^+ - \sum_{i\in I} f(i)^- = \sum_{i\in I} f(i),$$
writing $f^{\pm}(i) = f(i)^{\pm}$ for each $i$. Thus we have
$$\sum_{i\in I} a_i = \int_I a_i\,\mu(di),$$
and the standard rules under which we allow $\infty$ as the value of an integral (133A, 135F) match well with the interpretations in (a) above.
(c) Accordingly, and unsurprisingly, the operation of summation is a linear operation on the linear space of summable
families of real numbers.
I observe here that this notion of summability is "absolute"; a family $\langle a_i\rangle_{i\in I}$ is summable iff it is absolutely summable.
This is necessary because it must also be "unconditional"; we have no structure on an arbitrary set $I$ to guide us to
take the sum in any particular order. See 226Xf. In particular, I distinguish between $\sum_{n\in\mathbb{N}} a_n$, which in this book
will always be interpreted as in 226A above, and $\sum_{n=0}^{\infty} a_n$ which (if it makes a difference) should be interpreted as
$\lim_{m\to\infty}\sum_{n=0}^{m} a_n$. So, for instance, $\sum_{n=0}^{\infty}\frac{(-1)^n}{n+1} = \ln 2$, while $\sum_{n\in\mathbb{N}}\frac{(-1)^n}{n+1}$ is undefined. Of course
$\sum_{n=0}^{\infty} a_n = \sum_{n\in\mathbb{N}} a_n$ whenever the latter is defined in $[-\infty,\infty]$.
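The distinction between the ordered sum and the unordered sum can be seen numerically. The following is only an illustrative sketch (the function names are hypothetical, and floating-point partial sums stand in for the limits): taken in the natural order, the partial sums of $(-1)^n/(n+1)$ approach $\ln 2$; the classical rearrangement "two positive terms, then one negative" approaches $\frac32\ln 2$ instead; and the sum of the positive parts grows without bound, which is why $\sum_{n\in\mathbb{N}}$ is undefined here.

```python
import math

def ordered_partial_sum(m):
    """sum_{n=0}^{m} (-1)^n / (n+1), taken in the natural order of N."""
    return sum((-1) ** n / (n + 1) for n in range(m + 1))

def positive_part_sum(m):
    """Sum of the positive parts a_n^+ over n <= m (only even n contribute)."""
    return sum(1 / (n + 1) for n in range(0, m + 1, 2))

def rearranged_partial_sum(blocks):
    """Same terms in a different order: two positive terms, then one negative."""
    total, pos, neg = 0.0, 0, 1          # pos runs over even n, neg over odd n
    for _ in range(blocks):
        total += 1 / (pos + 1) + 1 / (pos + 3)
        pos += 4
        total -= 1 / (neg + 1)
        neg += 2
    return total

print(ordered_partial_sum(10_000), math.log(2))            # close to ln 2
print(rearranged_partial_sum(20_000), 1.5 * math.log(2))   # close to (3/2) ln 2
print(positive_part_sum(1_000), positive_part_sum(100_000))  # keeps growing
```

Because the value depends on the order of summation, no order-free definition can assign this family a sum, in line with the remark that summability over an arbitrary index set has to be absolute.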
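(d) There is another, and very important, approach to the sum described here. If $\langle a_i\rangle_{i\in I}$ is an (absolutely) summable
family of real numbers, then for every $\epsilon > 0$ there is a finite $K\subseteq I$ such that $\sum_{i\in I\setminus K}|a_i| \le \epsilon$. PPP This is nothing but
a special case of 225A; there is a set $K$ with $\mu K < \infty$ such that $\int_{I\setminus K}|a_i|\,\mu(di) \le \epsilon$, but
$$\int_{I\setminus K}|a_i|\,\mu(di) = \sum_{i\in I\setminus K}|a_i|.$$
QQQ
(Of course there are "direct" proofs of this result from the definition in (a), not mentioning measures or integrals. But
I think you will see that they rely on the same idea as that in the proof of 225A.) Consequently, for any family $\langle a_i\rangle_{i\in I}$
of real numbers and any $s\in\mathbb{R}$, the following are equiveridical:
(i) $\sum_{i\in I} a_i = s$;
(ii) for every $\epsilon > 0$ there is a finite $K\subseteq I$ such that $|s - \sum_{i\in J} a_i| \le \epsilon$ whenever $J$ is finite and $K\subseteq J\subseteq I$.
PPP (i)$\Rightarrow$(ii) Take $K$ such that $\sum_{i\in I\setminus K}|a_i| \le \epsilon$. If $K\subseteq J\subseteq I$, then
$$\Big|s - \sum_{i\in J} a_i\Big| = \Big|\sum_{i\in I\setminus J} a_i\Big| \le \sum_{i\in I\setminus K}|a_i| \le \epsilon.$$
(ii)$\Rightarrow$(i) Let $\epsilon > 0$, and let $K\subseteq I$ be as described in (ii). If $J\subseteq I\setminus K$ is any finite set, then set $J_1 = \{i : i\in J,\ a_i\ge 0\}$,
$J_2 = J\setminus J_1$. We have
$$\sum_{i\in J}|a_i| = \Big|\sum_{i\in J_1\cup K} a_i - \sum_{i\in J_2\cup K} a_i\Big| \le \Big|s - \sum_{i\in J_1\cup K} a_i\Big| + \Big|s - \sum_{i\in J_2\cup K} a_i\Big| \le 2\epsilon.$$
As $J$ is arbitrary, $\sum_{i\in I\setminus K}|a_i| \le 2\epsilon$ and
$$\sum_{i\in I}|a_i| \le \sum_{i\in K}|a_i| + 2\epsilon < \infty.$$
Accordingly $\sum_{i\in I} a_i$ is well-defined in $\mathbb{R}$. Also
$$\Big|s - \sum_{i\in I} a_i\Big| \le \Big|s - \sum_{i\in K} a_i\Big| + \Big|\sum_{i\in I\setminus K} a_i\Big| \le \epsilon + \sum_{i\in I\setminus K}|a_i| \le 3\epsilon.$$
As $\epsilon$ is arbitrary, $\sum_{i\in I} a_i = s$, as required. QQQ
In this way, we express $\sum_{i\in I} a_i$ directly as a limit; we could write it as
$$\sum_{i\in I} a_i = \lim_{K\uparrow I}\sum_{i\in K} a_i,$$
on the understanding that we look at finite sets $K$ in the right-hand formula.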
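(e) Yet another approach is through the following fact. If $\sum_{i\in I}|a_i| < \infty$, then for any $\delta > 0$ the set $\{i : |a_i|\ge\delta\}$ is
finite, indeed can have at most $\frac{1}{\delta}\sum_{i\in I}|a_i|$ members; consequently
$$J = \{i : a_i \ne 0\} = \bigcup_{n\in\mathbb{N}}\{i : |a_i| \ge 2^{-n}\}$$
is countable (1A1F). If $J$ is finite, then of course $\sum_{i\in I} a_i = \sum_{i\in J} a_i$ reduces to a finite sum. Otherwise, we can
enumerate $J$ as $\langle j_n\rangle_{n\in\mathbb{N}}$, and we shall have
$$\sum_{i\in I} a_i = \sum_{i\in J} a_i = \lim_{n\to\infty}\sum_{k=0}^{n} a_{j_k} = \sum_{n=0}^{\infty} a_{j_n}$$
(using (d) to reduce the sum $\sum_{i\in J} a_i$ to a limit of finite sums). Conversely, if $\langle a_i\rangle_{i\in I}$ is such that there is a countably
infinite $J\supseteq\{i : a_i\ne 0\}$ enumerated as $\langle j_n\rangle_{n\in\mathbb{N}}$, and if $\sum_{n=0}^{\infty}|a_{j_n}| < \infty$, then $\sum_{i\in I} a_i$ will be $\sum_{n=0}^{\infty} a_{j_n}$.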
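(f) It will be useful later to have a fragment of general theory. Let $I$ and $J$ be sets and $\langle a_{ij}\rangle_{i\in I,j\in J}$ a family in
$[0,\infty]$. Then
$$\sum_{(i,j)\in I\times J} a_{ij} = \sum_{i\in I}\Big(\sum_{j\in J} a_{ij}\Big) = \sum_{j\in J}\Big(\sum_{i\in I} a_{ij}\Big).$$
PPP (i) If $\sum_{(i,j)\in I\times J} a_{ij} > u$, then there is a finite set $M\subseteq I\times J$ such that $\sum_{(i,j)\in M} a_{ij} > u$. Now $K = \{i : (i,j)\in M\}$
and $L = \{j : (i,j)\in M\}$ are finite, so
$$\sum_{i\in I}\sum_{j\in J} a_{ij} \ge \sum_{i\in K}\sum_{j\in J} a_{ij} \ge \sum_{i\in K}\sum_{j\in L} a_{ij}$$
(because $\sum_{j\in J} a_{ij} \ge \sum_{j\in L} a_{ij}$ for every $i$)
$$= \sum_{(i,j)\in K\times L} a_{ij} \ge \sum_{(i,j)\in M} a_{ij} > u.$$
As $u$ is arbitrary, $\sum_{i\in I}\sum_{j\in J} a_{ij} \ge \sum_{(i,j)\in I\times J} a_{ij}$. (ii) If $\sum_{i\in I}\sum_{j\in J} a_{ij} > u$, there is a finite set $K\subseteq I$ such that
$\sum_{i\in K}\sum_{j\in J} a_{ij} > u$. Let $\epsilon\in\,]0,1[$ be such that $\sum_{i\in K}\sum_{j\in J} a_{ij} > u+\epsilon$, and set $\delta = \epsilon/\#(K)$. For each $i\in K$ set
$\gamma_i = \min(u+1, \sum_{j\in J} a_{ij}) - \delta$; then
$$\epsilon + \sum_{i\in K}\gamma_i = \sum_{i\in K}\min\Big(u+1, \sum_{j\in J} a_{ij}\Big) \ge \min\Big(u+1, \sum_{i\in K}\sum_{j\in J} a_{ij}\Big) > u+\epsilon,$$
so $\sum_{i\in K}\gamma_i > u$. For each $i\in K$, $\gamma_i < \sum_{j\in J} a_{ij}$, so there is a finite $L_i\subseteq J$ such that $\sum_{j\in L_i} a_{ij} \ge \gamma_i$. Set
$M = \{(i,j) : i\in K,\ j\in L_i\}$, so that $M$ is a finite subset of $I\times J$; then
$$\sum_{(i,j)\in I\times J} a_{ij} \ge \sum_{(i,j)\in M} a_{ij} = \sum_{i\in K}\sum_{j\in L_i} a_{ij} \ge \sum_{i\in K}\gamma_i > u.$$
As $u$ is arbitrary, $\sum_{(i,j)\in I\times J} a_{ij} \ge \sum_{i\in I}\sum_{j\in J} a_{ij}$ and these two sums are equal. (iii) Similarly,
$$\sum_{(i,j)\in I\times J} a_{ij} = \sum_{j\in J}\sum_{i\in I} a_{ij}.$$
QQQ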
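A small numerical check of the identity in (f) may help fix ideas. This is a sketch only, under the assumption (made here for illustration, not taken from the text) that $a_{ij} = 2^{-i}3^{-j}$ for $i, j\in\mathbb{N}$; all names are hypothetical. Exact rational arithmetic shows that, on a finite rectangle, the sum over pairs agrees with both iterated sums, and that the sums over growing finite rectangles increase towards the supremum, which for this family is $2\cdot\frac32 = 3$.

```python
from fractions import Fraction
from itertools import product

def a(i, j):
    """Illustrative non-negative family a_ij = 2^-i * 3^-j (an assumption)."""
    return Fraction(1, 2 ** i * 3 ** j)

def sum_over_pairs(I, J):
    return sum((a(i, j) for i, j in product(I, J)), Fraction(0))

def rows_then_columns(I, J):
    return sum((sum((a(i, j) for j in J), Fraction(0)) for i in I), Fraction(0))

def columns_then_rows(I, J):
    return sum((sum((a(i, j) for i in I), Fraction(0)) for j in J), Fraction(0))

def rect_sum(n):
    """Sum over the finite rectangle {0..n-1} x {0..n-1}."""
    return sum_over_pairs(range(n), range(n))

I, J = range(4), range(6)
print(sum_over_pairs(I, J) == rows_then_columns(I, J) == columns_then_rows(I, J))
print(float(rect_sum(5)), float(rect_sum(10)), float(rect_sum(30)))
```

For finite index sets the identity is just a reordering of a finite sum; the content of (f) is that it survives the passage to suprema over finite subsets.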
226B Saltus functions Now we are ready for a special type of function of bounded variation on R. Suppose that
a < b in R.
(a) A (real) saltus function on $[a,b]$ is a function $F : [a,b]\to\mathbb{R}$ expressible in the form
$$F(x) = \sum_{t\in[a,x[} u_t + \sum_{t\in[a,x]} v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are real-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$ are finite.
(b) For any function $F : [a,b]\to\mathbb{R}$ we can write
$$F(x^+) = \lim_{y\downarrow x} F(y) \text{ if } x\in[a,b[ \text{ and the limit exists},$$
$$F(x^-) = \lim_{y\uparrow x} F(y) \text{ if } x\in\,]a,b] \text{ and the limit exists}.$$
(I hope that this will not lead to confusion with the alternative interpretation of $x^+$ as $\max(x,0)$.) Observe that if $F$ is a
saltus function, as defined in (a), with associated families $\langle u_t\rangle_{t\in[a,b[}$ and $\langle v_t\rangle_{t\in[a,b]}$, then $v_a = F(a)$, $v_x = F(x) - F(x^-)$
for $x\in\,]a,b]$, $u_x = F(x^+) - F(x)$ for $x\in[a,b[$. PPP Let $\epsilon > 0$. As remarked in 226Ad, there is a finite $K\subseteq[a,b]$ such
that
$$\sum_{t\in[a,b[\setminus K}|u_t| + \sum_{t\in[a,b]\setminus K}|v_t| \le \epsilon.$$
Given $x\in[a,b]$, let $\delta > 0$ be such that $[x-\delta, x+\delta]$ contains no point of $K$ except perhaps $x$. In this case, if
$\max(a, x-\delta)\le y < x$, we must have
$$|F(y) - (F(x) - v_x)| = \Big|\sum_{t\in[y,x[} u_t + \sum_{t\in\,]y,x[} v_t\Big| \le \sum_{t\in[a,b[\setminus K}|u_t| + \sum_{t\in[a,b]\setminus K}|v_t| \le \epsilon,$$
while if $x < y\le\min(b, x+\delta)$ we shall have
$$|F(y) - (F(x) + u_x)| = \Big|\sum_{t\in\,]x,y[} u_t + \sum_{t\in\,]x,y]} v_t\Big| \le \sum_{t\in[a,b[\setminus K}|u_t| + \sum_{t\in[a,b]\setminus K}|v_t| \le \epsilon.$$
As $\epsilon$ is arbitrary, we get $F(x^-) = F(x) - v_x$ (if $x > a$) and $F(x^+) = F(x) + u_x$ (if $x < b$). QQQ
It follows that $F$ is continuous at $x\in\,]a,b[$ iff $u_x = v_x = 0$, while $F$ is continuous at $a$ iff $u_a = 0$ and $F$ is continuous
at $b$ iff $v_b = 0$. In particular, $\{x : x\in[a,b],\ F \text{ is not continuous at } x\}$ is countable (see 226Ae).
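(c) If $F$ is a saltus function defined on $[a,b]$, with associated families $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$, then $F$ is of bounded
variation on $[a,b]$, and
$$\mathrm{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[}|u_t| + \sum_{t\in\,]a,b]}|v_t|.$$
PPP If $a\le x < y\le b$, then
$$F(y) - F(x) = u_x + \sum_{t\in\,]x,y[}(u_t + v_t) + v_y,$$
so
$$|F(y) - F(x)| \le \sum_{t\in[x,y[}|u_t| + \sum_{t\in\,]x,y]}|v_t|.$$
If $a\le a_0\le a_1\le\ldots\le a_n\le b$, then
$$\sum_{i=1}^{n}|F(a_i) - F(a_{i-1})| \le \sum_{i=1}^{n}\Big(\sum_{t\in[a_{i-1},a_i[}|u_t| + \sum_{t\in\,]a_{i-1},a_i]}|v_t|\Big) \le \sum_{t\in[a,b[}|u_t| + \sum_{t\in\,]a,b]}|v_t|.$$
Consequently
$$\mathrm{Var}_{[a,b]}(F) \le \sum_{t\in[a,b[}|u_t| + \sum_{t\in\,]a,b]}|v_t| < \infty.$$
QQQ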
(d) The inequality in (c) is actually an equality. To see this, note first that if $a\le x < y\le b$, then $\mathrm{Var}_{[x,y]}(F) \ge |u_x| + |v_y|$. PPP I noted in (b) that $u_x = \lim_{t\downarrow x} F(t) - F(x)$ and $v_y = F(y) - \lim_{t\uparrow y} F(t)$. So, given $\epsilon > 0$, we can find
$t_1$, $t_2$ such that $x < t_1\le t_2\le y$ and
$$|F(t_1) - F(x)| \ge |u_x| - \epsilon,\qquad |F(y) - F(t_2)| \ge |v_y| - \epsilon.$$
Now
$$\mathrm{Var}_{[x,y]}(F) \ge |F(t_1) - F(x)| + |F(t_2) - F(t_1)| + |F(y) - F(t_2)| \ge |u_x| + |v_y| - 2\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. QQQ
Now, given $a\le t_0 < t_1 < \ldots < t_n\le b$, we must have
$$\mathrm{Var}_{[a,b]}(F) \ge \sum_{i=1}^{n}\mathrm{Var}_{[t_{i-1},t_i]}(F)$$
(using 224Cc)
$$\ge \sum_{i=1}^{n}\big(|u_{t_{i-1}}| + |v_{t_i}|\big).$$
As $t_0,\ldots,t_n$ are arbitrary,
$$\mathrm{Var}_{[a,b]}(F) \ge \sum_{t\in[a,b[}|u_t| + \sum_{t\in\,]a,b]}|v_t|,$$
as required.
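The equality of (c) and (d) can be checked by hand on a saltus function with finitely many jumps. The following sketch assumes hypothetical jump families on $[0,1]$ (the particular numbers are invented for illustration) and uses exact rationals. Summing $|F(q)-F(p)|$ along a partition that contains each jump point together with a point just to either side recovers $\sum|u_t| + \sum_{t>a}|v_t|$, while a coarser partition gives something smaller.

```python
from fractions import Fraction as Fr

a, b = Fr(0), Fr(1)
u = {Fr(1, 4): Fr(1), Fr(1, 2): Fr(-2)}      # hypothetical right-hand jumps u_t
v = {Fr(1, 2): Fr(1, 2), Fr(3, 4): Fr(-1)}   # hypothetical left-hand jumps v_t

def F(x):
    """F(x) = sum of u_t over t in [a, x[ plus sum of v_t over t in [a, x]."""
    return (sum((ut for t, ut in u.items() if a <= t < x), Fr(0))
            + sum((vt for t, vt in v.items() if a <= t <= x), Fr(0)))

def variation_along(points):
    """Sum of |F(q) - F(p)| over consecutive points of a partition."""
    return sum((abs(F(q) - F(p)) for p, q in zip(points, points[1:])), Fr(0))

h = Fr(1, 100)                                # small enough to separate the jumps
jumps = sorted(set(u) | set(v))
fine = sorted({a, b} | {t + s for t in jumps for s in (-h, Fr(0), h)})
bound = (sum(abs(x) for x in u.values())
         + sum(abs(x) for t, x in v.items() if t > a))

print(variation_along([a, b]), variation_along(fine), bound)
```

Since $F$ is constant between consecutive jump points, no finer partition can increase the sum, so in this finite case the value along `fine` is the variation itself.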
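(e) Because a saltus function is of bounded variation ((c) above), it is differentiable almost everywhere (224I). In fact
its derivative is zero almost everywhere. PPP Let $F : [a,b]\to\mathbb{R}$ be a saltus function, with associated families $\langle u_t\rangle_{t\in[a,b[}$,
$\langle v_t\rangle_{t\in[a,b]}$. Let $\epsilon > 0$. Let $K\subseteq[a,b]$ be a finite set such that
$$\sum_{t\in[a,b[\setminus K}|u_t| + \sum_{t\in[a,b]\setminus K}|v_t| \le \epsilon.$$
Set
$$u'_t = u_t \text{ if } t\in[a,b[\,\cap K,\qquad u'_t = 0 \text{ if } t\in[a,b[\,\setminus K,$$
$$v'_t = v_t \text{ if } t\in K,\qquad v'_t = 0 \text{ if } t\in[a,b]\setminus K,$$
$$u''_t = u_t - u'_t \text{ for } t\in[a,b[,\qquad v''_t = v_t - v'_t \text{ for } t\in[a,b].$$
Let $G$, $H$ be the saltus functions corresponding to $\langle u'_t\rangle_{t\in[a,b[}$, $\langle v'_t\rangle_{t\in[a,b]}$ and $\langle u''_t\rangle_{t\in[a,b[}$, $\langle v''_t\rangle_{t\in[a,b]}$, so that $F = G + H$.
Then $G'(t) = 0$ for every $t\in\,]a,b[\,\setminus K$, since $]a,b[\,\setminus K$ comprises a finite number of open intervals on each of which $G$
is constant. So $G' = 0$ a.e. and $F' =_{\text{a.e.}} H'$. On the other hand,
$$\int_a^b |H'| \le \mathrm{Var}_{[a,b]}(H) = \sum_{t\in[a,b[\setminus K}|u_t| + \sum_{t\in\,]a,b]\setminus K}|v_t| \le \epsilon,$$
using 224I and (d) above. So
$$\int_a^b |F'| = \int_a^b |H'| \le \epsilon.$$
As $\epsilon$ is arbitrary, $\int_a^b |F'| = 0$ and $F' = 0$ a.e., as claimed. QQQ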
226C The Lebesgue decomposition of a function of bounded variation Take $a, b\in\mathbb{R}$ with $a < b$.
(a) If $F : [a,b]\to\mathbb{R}$ is non-decreasing, set $v_a = 0$, $v_t = F(t) - F(t^-)$ for $t\in\,]a,b]$, $u_t = F(t^+) - F(t)$ for $t\in[a,b[$,
defining $F(t^+)$, $F(t^-)$ as in 226Bb. Then all the $v_t$, $u_t$ are non-negative, and if $a < t_0 < t_1 < \ldots < t_n < b$, then
$$\sum_{i=0}^{n}(u_{t_i} + v_{t_i}) = \sum_{i=0}^{n}\big(F(t_i^+) - F(t_i^-)\big) \le F(b) - F(a).$$
Accordingly $\sum_{t\in[a,b[} u_t$ and $\sum_{t\in[a,b]} v_t$ are both finite. Let $F_p$ be the corresponding saltus function, as defined in 226Ba,
so that
$$F_p(x) = F(a^+) - F(a) + \sum_{t\in\,]a,x[}\big(F(t^+) - F(t^-)\big) + F(x) - F(x^-)$$
if $a < x\le b$. If $a\le x < y\le b$ then
$$F_p(y) - F_p(x) = F(x^+) - F(x) + \sum_{t\in\,]x,y[}\big(F(t^+) - F(t^-)\big) + F(y) - F(y^-) \le F(y) - F(x)$$
because if $x = t_0 < t_1 < \ldots < t_n < t_{n+1} = y$ then
$$F(x^+) - F(x) + \sum_{i=1}^{n}\big(F(t_i^+) - F(t_i^-)\big) + F(y) - F(y^-) = F(y) - F(x) - \sum_{i=1}^{n+1}\big(F(t_i^-) - F(t_{i-1}^+)\big) \le F(y) - F(x).$$
Accordingly both $F_p$ and $F_c = F - F_p$ are non-decreasing. Also, because
$$F_p(a) = 0 = v_a,$$
$$F_p(t) - F_p(t^-) = v_t = F(t) - F(t^-) \text{ for } t\in\,]a,b],$$
$$F_p(t^+) - F_p(t) = u_t = F(t^+) - F(t) \text{ for } t\in[a,b[,$$
we shall have
$$F_c(a) = F(a),$$
$$F_c(t) = F_c(t^-) \text{ for } t\in\,]a,b],$$
$$F_c(t) = F_c(t^+) \text{ for } t\in[a,b[,$$
and $F_c$ is continuous.
Clearly this expression of $F = F_p + F_c$ as the sum of a saltus function and a continuous function is unique, except
that we can freely add a constant to one if we subtract it from the other.
(b) Still taking $F : [a,b]\to\mathbb{R}$ to be non-decreasing, we know that $F'$ is integrable (222C); moreover, $F' =_{\text{a.e.}} F_c'$, by
226Be. Set $F_{ac}(x) = F(a) + \int_a^x F'$ for each $x\in[a,b]$. We have
$$F_{ac}(y) - F_{ac}(x) = \int_x^y F_c' \le F_c(y) - F_c(x)$$
for $a\le x\le y\le b$ (222C again), so $F_{cs} = F_c - F_{ac}$ is still non-decreasing; $F_{ac}$ is continuous (225A), so $F_{cs}$ is continuous;
$F_{ac}' =_{\text{a.e.}} F' =_{\text{a.e.}} F_c'$ (222E), so $F_{cs}' = 0$ a.e.
Again, the expression of $F_c = F_{ac} + F_{cs}$ as the sum of an absolutely continuous function and a function with zero
derivative almost everywhere is unique, except for the possibility of moving a constant from one to the other, because
two absolutely continuous functions whose derivatives are equal almost everywhere must differ by a constant (225D).
(c) Putting all these together: if $F : [a,b]\to\mathbb{R}$ is any non-decreasing function, it is expressible as $F_p + F_{ac} + F_{cs}$,
where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, and $F_{cs}$ is continuous and differentiable, with zero derivative,
almost everywhere; all three components are non-decreasing; and the expression is unique if we say that $F_{ac}(a) = F(a)$,
$F_p(a) = F_{cs}(a) = 0$.
The Cantor function $f : [0,1]\to[0,1]$ (134H) is continuous and $f' = 0$ a.e. (134Hb), so $f_p = f_{ac} = 0$ and $f = f_{cs}$.
Setting $g(x) = \frac12(x + f(x))$ for $x\in[0,1]$, as in 134I, we get $g_p(x) = 0$, $g_{ac}(x) = \frac{x}{2}$ and $g_{cs}(x) = \frac12 f(x)$.
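For readers who like to see the example computed, here is a rough sketch of the Cantor function and of the decomposition of $g$. The ternary-digit algorithm and all the names below are assumptions made for illustration, not taken from 134H; the function reads off ternary digits of $x$, stops at the first digit 1, and halves the digits 2 to obtain binary digits.

```python
from fractions import Fraction as Fr

def cantor(x, depth=40):
    """Approximate the Cantor function at a rational x in [0, 1].

    Exact whenever the ternary expansion reaches a digit 1 or terminates
    within `depth` digits; otherwise accurate to about 2**-depth.
    """
    x = Fr(x)
    if x >= 1:
        return Fr(1)
    y, scale = Fr(0), Fr(1, 2)
    for _ in range(depth):
        x *= 3
        d = int(x)            # next ternary digit: 0, 1 or 2
        x -= d
        if d == 1:            # x lies in a removed middle third
            return y + scale
        y += scale * (d // 2)
        scale /= 2
    return y

def g(x):
    return (Fr(x) + cantor(x)) / 2

def g_ac(x):                  # absolutely continuous part
    return Fr(x) / 2

def g_cs(x):                  # continuous singular part
    return cantor(x) / 2

for x in (Fr(1, 4), Fr(1, 3), Fr(2, 3), Fr(1)):
    print(x, float(cantor(x)), g(x) == g_ac(x) + g_cs(x))
```

The saltus part is zero here because both $x\mapsto x/2$ and the Cantor function are continuous.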
(d) Now suppose that $F : [a,b]\to\mathbb{R}$ is of bounded variation. Then it is expressible as a difference $G - H$ of
non-decreasing functions (224D). So writing $F_p = G_p - H_p$, etc., we can express $F$ as a sum $F_p + F_{cs} + F_{ac}$, where $F_p$
is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous, $F_{cs}' = 0$ a.e., $F_{ac}(a) = F(a)$, $F_{cs}(a) = F_p(a) = 0$.
Under these conditions the expression is unique, because (for instance) $F_p(t^+) - F_p(t) = F(t^+) - F(t)$ for $t\in[a,b[$,
while $F_{ac}' =_{\text{a.e.}} (F - F_p)' =_{\text{a.e.}} F'$.
This is a Lebesgue decomposition of the function $F$. (I have to say "a" Lebesgue decomposition because of course
the assignments $F_{ac}(a) = F(a)$, $F_p(a) = F_{cs}(a) = 0$ are arbitrary.) I will call $F_p$ the saltus part of $F$.
226D Complex functions The modifications needed to deal with complex functions are elementary.
(a) If $I$ is any set and $\langle a_j\rangle_{j\in I}$ is a family of complex numbers, then the following are equiveridical:
(i) $\sum_{j\in I}|a_j| < \infty$;
(ii) there is an $s\in\mathbb{C}$ such that for every $\epsilon > 0$ there is a finite $K\subseteq I$ such that $|s - \sum_{j\in J} a_j| \le \epsilon$
whenever $J$ is finite and $K\subseteq J\subseteq I$.
In this case
$$s = \sum_{j\in I}\mathrm{Re}(a_j) + i\sum_{j\in I}\mathrm{Im}(a_j) = \int_I a_j\,\mu(dj),$$
where $\mu$ is counting measure on $I$, and we write $s = \sum_{j\in I} a_j$.
(b) If $a < b$ in $\mathbb{R}$, a complex saltus function on $[a,b]$ is a function $F : [a,b]\to\mathbb{C}$ expressible in the form
$$F(x) = \sum_{t\in[a,x[} u_t + \sum_{t\in[a,x]} v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are complex-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$ are finite;
that is, if the real and imaginary parts of $F$ are saltus functions. In this case $F$ is continuous except at countably many
points and differentiable, with zero derivative, almost everywhere in $[a,b]$, and
$$u_x = \lim_{t\downarrow x} F(t) - F(x) \text{ for every } x\in[a,b[,$$
$$v_x = \lim_{t\uparrow x} F(x) - F(t) \text{ for every } x\in\,]a,b]$$
(apply the results of 226B to the real and imaginary parts of $F$). $F$ is of bounded variation, and its variation is
$$\mathrm{Var}_{[a,b]}(F) = \sum_{t\in[a,b[}|u_t| + \sum_{t\in\,]a,b]}|v_t|$$
(repeat the arguments of 226Bc-d).
(c) If $F : [a,b]\to\mathbb{C}$ is a function of bounded variation, where $a < b$ in $\mathbb{R}$, it is uniquely expressible as $F = F_p + F_{cs} + F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous and has zero derivative
almost everywhere, and $F_{ac}(a) = F(a)$, $F_p(a) = F_{cs}(a) = 0$. (Apply 226C to the real and imaginary parts of $F$.)
226E As an elementary exercise in the language of 226A, I interpolate a version of a theorem of B.Levi which is
sometimes useful.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, $I$ a countable set, and $\langle f_i\rangle_{i\in I}$ a family of $\mu$-integrable real- or complex-valued functions such that $\sum_{i\in I}\int|f_i|\,d\mu$ is finite. Then $f(x) = \sum_{i\in I} f_i(x)$ is defined almost everywhere and
$$\int f\,d\mu = \sum_{i\in I}\int f_i\,d\mu.$$
proof If $I$ is finite this is elementary. Otherwise, since there must be a bijection between $I$ and $\mathbb{N}$, we may take it that
$I = \mathbb{N}$. Setting $g_n = \sum_{i=0}^{n}|f_i|$ for each $n$, we have a non-decreasing sequence $\langle g_n\rangle_{n\in\mathbb{N}}$ of integrable functions such that
$\int g_n \le \sum_{i\in\mathbb{N}}\int|f_i|$ for every $n$, so that $g = \sup_{n\in\mathbb{N}} g_n$ is integrable, by B.Levi's theorem as stated in 123A. In particular,
$g$ is finite almost everywhere. Now if $x\in X$ is such that $g(x)$ is defined and finite, $\sum_{i\in J}|f_i(x)| \le g(x)$ for every finite
$J\subseteq\mathbb{N}$, so $\sum_{i\in\mathbb{N}}|f_i(x)|$ and $\sum_{i\in\mathbb{N}} f_i(x)$ are defined. In this case, of course, $\sum_{i\in\mathbb{N}} f_i(x) = \lim_{n\to\infty}\sum_{i=0}^{n} f_i(x)$. But
$|\sum_{i=0}^{n} f_i| \le_{\text{a.e.}} g$ for each $n$, so Lebesgue's Dominated Convergence Theorem tells us that
$$\int\sum_{i\in\mathbb{N}} f_i = \lim_{n\to\infty}\int\sum_{i=0}^{n} f_i = \lim_{n\to\infty}\sum_{i=0}^{n}\int f_i = \sum_{i\in\mathbb{N}}\int f_i.$$
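As a quick sanity check of the proposition, consider (as an illustrative assumption, not an example from the text) Lebesgue measure on $[0,1]$ and $f_i(x) = x^i/i!$ for $i\in\mathbb{N}$. Then $\sum_i\int|f_i| = \sum_i 1/(i+1)!$ is finite, $\sum_i f_i(x) = e^x$, and both sides of the conclusion should equal $e - 1$. The sketch below compares a truncated sum of exact integrals with a midpoint-rule approximation of $\int_0^1 e^x\,dx$; the helper names are hypothetical.

```python
import math

def integral_of_term(i):
    """Exact integral of x**i / i! over [0, 1], namely 1 / (i+1)!."""
    return 1 / math.factorial(i + 1)

def midpoint_integral(func, n=1000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n
    return h * sum(func((k + 0.5) * h) for k in range(n))

sum_of_integrals = sum(integral_of_term(i) for i in range(30))   # truncated sum
integral_of_sum = midpoint_integral(math.exp)                    # sum_i f_i = exp

print(sum_of_integrals, integral_of_sum, math.e - 1)
```

The agreement is only approximate on the numerical side; the exact equality is what the proposition asserts.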
226X Basic exercises >>>(a) A step-function on an interval $[a,b]$ is a function $F$ such that, for suitable $t_0,\ldots,t_n$
with $a = t_0\le\ldots\le t_n = b$, $F$ is constant on each interval $]t_{i-1}, t_i[$. Show that $F : [a,b]\to\mathbb{R}$ is a saltus function iff for
every $\epsilon > 0$ there is a step-function $G : [a,b]\to\mathbb{R}$ such that $\mathrm{Var}_{[a,b]}(F - G) \le \epsilon$.
(b) Let $F$, $G$ be real-valued functions of bounded variation defined on an interval $[a,b]\subseteq\mathbb{R}$. Show that, in the
language of 226C,
$$(F+G)_p = F_p + G_p,\qquad (F+G)_c = F_c + G_c,$$
$$(F+G)_{cs} = F_{cs} + G_{cs},\qquad (F+G)_{ac} = F_{ac} + G_{ac}.$$
>>>(c) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that, in the language of
226C,
$$\mathrm{Var}_{[a,b]}(F) = \mathrm{Var}_{[a,b]}(F_p) + \mathrm{Var}_{[a,b]}(F_c) = \mathrm{Var}_{[a,b]}(F_p) + \mathrm{Var}_{[a,b]}(F_{cs}) + \mathrm{Var}_{[a,b]}(F_{ac}).$$
(d) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that $F$ is absolutely
continuous iff $\mathrm{Var}_{[a,b]}(F) = \int_a^b|F'|$.
(e) Consider the function $g$ of 134I/226Cc. Show that $g^{-1} : [0,1]\to[0,1]$ is differentiable almost everywhere in $[0,1]$,
and find $\mu\{x : (g^{-1})'(x)\le a\}$ for each $a\in\mathbb{R}$.
>>>(f) Suppose that $I$ and $J$ are sets and that $\langle a_i\rangle_{i\in I}$ is a summable family of real numbers. (i) Show that if $f : J\to I$
is injective then $\langle a_{f(j)}\rangle_{j\in J}$ is summable. (ii) Show that if $g : I\to J$ is any function, then $\sum_{j\in J}\sum_{i\in g^{-1}[\{j\}]} a_i$ is defined
and equal to $\sum_{i\in I} a_i$.
226Y Further exercises (a) Explain what modifications are appropriate in the description of the Lebesgue
decomposition of a function of bounded variation if we wish to consider functions on open or half-open intervals,
including unbounded intervals.
(b) Suppose that $F : [a,b]\to\mathbb{R}$ is a function of bounded variation, and set $h(y) = \#(F^{-1}[\{y\}])$ for $y\in\mathbb{R}$. Show
that $\int h = \mathrm{Var}_{[a,b]}(F_c)$, where $F_c$ is the "continuous part" of $F$ as defined in 226Ca/226Cd.
(c) Show that a set $I$ is countable iff there is a summable family $\langle a_i\rangle_{i\in I}$ of non-zero real numbers.
(d) Suppose that $a < b$ in $\mathbb{R}$, and that $F : [a,b]\to\mathbb{R}$ is a function of bounded variation; let $F_p$ be its saltus part.
Show that $|F(b) - F(a)| \le \mu F[\,[a,b]\,] + \mathrm{Var}_{[a,b]} F_p$, where $\mu$ is Lebesgue measure on $\mathbb{R}$.
226 Notes and comments In 232I and 232Yb below I will revisit these ideas, linking them to a decomposition of
the Lebesgue-Stieltjes measure corresponding to a non-decreasing real function, and thence to more general measures.
All this work is peripheral to the main concerns of this volume, but I think it is illuminating, and certainly it is part
of the basic knowledge assumed of anyone working in real analysis.
Chapter 23
The Radon-Nikodým Theorem
In Chapter 22, I discussed the indefinite integrals of integrable functions on $\mathbb{R}$, and gave what I hope you feel are
satisfying descriptions both of the functions which are indefinite integrals (the absolutely continuous functions) and of
how to find which functions they are indefinite integrals of (you differentiate them). For general measure spaces, we
have no structure present which can give such simple formulations; but nevertheless the same questions can be asked
and, up to a point, answered.
The first section of this chapter introduces the basic machinery needed, the concept of "countably additive" functional
and its decomposition into positive and negative parts. The main theorem takes up the second section: indefinite integrals are the "truly continuous" additive functionals; on $\sigma$-finite spaces, these are the "absolutely continuous" countably
additive functionals. In §233 I discuss the most important single application of the theorem, its use in providing a
concept of "conditional expectation". This is one of the central concepts of probability theory, as you very likely know;
but the form here is a dramatic generalization of the elementary concept of the conditional probability of one event
given another, and needs the whole strength of the general theory of measure and integration as developed in Volume
1 and this chapter. I include some notes on convex functions, up to and including versions of Jensen's inequality
(233I-233J).
While we are in the area of "pure" measure theory, I take the opportunity to discuss some further topics. I begin
with some essentially elementary constructions, image measures, sums of measures and indefinite-integral measures; I
think the details need a little attention, and I work through them in §234. Rather deeper ideas are needed to deal
with "measurable transformations". In §235 I set out the techniques necessary to provide an abstract basis for a general
method of integration-by-substitution, with a detailed account of sufficient conditions for a formula of the type
$$\int g(y)\,dy = \int g(\phi(x))J(x)\,dx$$
to be valid.
231 Countably additive functionals
I begin with an abstract description of the objects which will, in appropriate circumstances, correspond to the
indefinite integrals of general integrable functions. In this section I give those parts of the theory which do not involve
a measure, but only a set with a distinguished $\sigma$-algebra of subsets. The basic concepts are those of "finitely additive"
and "countably additive" functional, and there is one substantial theorem, the "Hahn decomposition" (231E).
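231A Definition Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$ (136E). A functional $\nu : \Sigma\to\mathbb{R}$ is finitely
additive, or just additive, if $\nu(E\cup F) = \nu E + \nu F$ whenever $E, F\in\Sigma$ and $E\cap F = \emptyset$.
231B Elementary facts Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a finitely additive functional.
(a) $\nu\emptyset = 0$. (For $\nu\emptyset = \nu(\emptyset\cup\emptyset) = \nu\emptyset + \nu\emptyset$.)
(b) If $E_0,\ldots,E_n$ are disjoint members of $\Sigma$ then $\nu(\bigcup_{i\le n} E_i) = \sum_{i=0}^{n}\nu E_i$.
(c) If $E, F\in\Sigma$ and $E\subseteq F$ then $\nu F = \nu E + \nu(F\setminus E)$. More generally, for any $E, F\in\Sigma$,
$$\nu F = \nu(F\cap E) + \nu(F\setminus E).$$
(d) If $E, F\in\Sigma$ then
$$\nu E - \nu F = \nu(E\cap F) + \nu(E\setminus F) - \nu(E\cap F) - \nu(F\setminus E) = \nu(E\setminus F) - \nu(F\setminus E).$$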
231C Definition Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. A function $\nu : \Sigma\to\mathbb{R}$ is countably additive
or $\sigma$-additive if $\sum_{n=0}^{\infty}\nu E_n$ exists in $\mathbb{R}$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}} E_n)$ for every disjoint sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that
$\bigcup_{n\in\mathbb{N}} E_n\in\Sigma$.
Remark Note that when I use the phrase "countably additive functional" I mean to exclude the possibility of $\infty$ as a
value of the functional. Thus a measure is a countably additive functional iff it is totally finite (211C).
You will sometimes see the phrase "signed measure" used to mean what I call a countably additive functional.
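231D Elementary facts Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$ and $\nu : \Sigma\to\mathbb{R}$ a countably additive
functional.
(a) $\nu$ is finitely additive. PPP (i) Setting $E_n = \emptyset$ for every $n\in\mathbb{N}$, $\sum_{n=0}^{\infty}\nu\emptyset$ must be defined in $\mathbb{R}$ so $\nu\emptyset$ must be $0$. (ii)
Now if $E, F\in\Sigma$ and $E\cap F = \emptyset$ we can set $E_0 = E$, $E_1 = F$, $E_n = \emptyset$ for $n\ge 2$ and get
$$\nu(E\cup F) = \nu\Big(\bigcup_{n\in\mathbb{N}} E_n\Big) = \sum_{n=0}^{\infty}\nu E_n = \nu E + \nu F.$$
QQQ
(b) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\Sigma$, with union $E\in\Sigma$, then
$$\nu E = \nu E_0 + \sum_{n=0}^{\infty}\nu(E_{n+1}\setminus E_n) = \lim_{n\to\infty}\nu E_n.$$
(c) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with intersection $E\in\Sigma$, then
$$\nu E = \nu E_0 - \lim_{n\to\infty}\nu(E_0\setminus E_n) = \lim_{n\to\infty}\nu E_n.$$
(d) If $\nu' : \Sigma\to\mathbb{R}$ is another countably additive functional, and $c\in\mathbb{R}$, then $\nu + \nu' : \Sigma\to\mathbb{R}$ and $c\nu : \Sigma\to\mathbb{R}$ are
countably additive.
(e) If $H\in\Sigma$, then $\nu_H : \Sigma\to\mathbb{R}$ is countably additive, where $\nu_H E = \nu(E\cap H)$ for every $E\in\Sigma$. PPP If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a
disjoint sequence in $\Sigma$ with union $E\in\Sigma$ then $\langle E_n\cap H\rangle_{n\in\mathbb{N}}$ is disjoint, with union $E\cap H$, so
$$\nu_H E = \nu(H\cap E) = \nu\Big(\bigcup_{n\in\mathbb{N}}(H\cap E_n)\Big) = \sum_{n=0}^{\infty}\nu(H\cap E_n) = \sum_{n=0}^{\infty}\nu_H E_n.$$
QQQ
Remark For the time being, we shall be using the notion of "countably additive functional" only on $\sigma$-algebras $\Sigma$, in
which case we can take it for granted that the unions and intersections above belong to $\Sigma$.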
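231E All the ideas above amount to minor modifications of ideas already needed at the very beginning of the
theory of measure spaces. We come now to something more substantial.
Theorem Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a countably additive functional. Then
(a) $\nu$ is bounded;
(b) there is a set $H\in\Sigma$ such that
$\nu F\ge 0$ whenever $F\in\Sigma$ and $F\subseteq H$,
$\nu F\le 0$ whenever $F\in\Sigma$ and $F\cap H = \emptyset$.
proof (a) ??? Suppose, if possible, otherwise. For $E\in\Sigma$, set $M(E) = \sup\{|\nu F| : F\in\Sigma,\ F\subseteq E\}$; then $M(X) = \infty$.
Moreover, whenever $E_1, E_2, F\in\Sigma$ and $F\subseteq E_1\cup E_2$, then
$$|\nu F| = |\nu(F\cap E_1) + \nu(F\setminus E_1)| \le |\nu(F\cap E_1)| + |\nu(F\setminus E_1)| \le M(E_1) + M(E_2),$$
so $M(E_1\cup E_2)\le M(E_1) + M(E_2)$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ as follows. $E_0 = X$. Given that $M(E_n) = \infty$,
where $n\in\mathbb{N}$, then surely there is an $F_n\subseteq E_n$ such that $|\nu F_n|\ge 1 + |\nu E_n|$, in which case $|\nu(E_n\setminus F_n)|\ge 1$. Now at
least one of $M(F_n)$, $M(E_n\setminus F_n)$ is infinite; if $M(F_n) = \infty$, set $E_{n+1} = F_n$; otherwise, set $E_{n+1} = E_n\setminus F_n$; in either
case, note that $|\nu(E_n\setminus E_{n+1})|\ge 1$ and $M(E_{n+1}) = \infty$, so that the induction will continue.
On completing this induction, set $G_n = E_n\setminus E_{n+1}$ for $n\in\mathbb{N}$. Then $\langle G_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, so
$\sum_{n=0}^{\infty}\nu G_n$ is defined in $\mathbb{R}$ and $\lim_{n\to\infty}\nu G_n = 0$; but $|\nu G_n|\ge 1$ for every $n$. XXX
(b)(i) By (a), $\gamma = \sup\{\nu E : E\in\Sigma\} < \infty$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\nu E_n\ge\gamma - 2^{-n}$ for every
$n\in\mathbb{N}$. For $m\le n\in\mathbb{N}$, set $F_{mn} = \bigcap_{m\le i\le n} E_i$. Then $\nu F_{mn}\ge\gamma - 2\cdot 2^{-m} + 2^{-n}$ for every $n\ge m$. PPP Induce on $n$. For
$n = m$, this is due to the choice of $E_m = F_{mm}$. For the inductive step, we have $F_{m,n+1} = F_{mn}\cap E_{n+1}$, while surely
$\gamma\ge\nu(E_{n+1}\cup F_{mn})$, so
$$\gamma + \nu F_{m,n+1} \ge \nu(E_{n+1}\cup F_{mn}) + \nu F_{m,n+1} = \nu E_{n+1} + \nu(F_{mn}\setminus E_{n+1}) + \nu F_{m,n+1}$$
$$= \nu E_{n+1} + \nu F_{mn} \ge \gamma - 2^{-n-1} + \gamma - 2\cdot 2^{-m} + 2^{-n}$$
(by the choice of $E_{n+1}$ and the inductive hypothesis)
$$= 2\gamma - 2\cdot 2^{-m} + 2^{-n-1}.$$
Subtracting $\gamma$ from both sides, $\nu F_{m,n+1}\ge\gamma - 2\cdot 2^{-m} + 2^{-n-1}$ and the induction proceeds. QQQ
(ii) For $m\in\mathbb{N}$, set
$$F_m = \bigcap_{n\ge m} F_{mn} = \bigcap_{n\ge m} E_n.$$
Then
$$\nu F_m = \lim_{n\to\infty}\nu F_{mn} \ge \gamma - 2\cdot 2^{-m},$$
by 231Dc. Next, $\langle F_m\rangle_{m\in\mathbb{N}}$ is non-decreasing, so setting $H = \bigcup_{m\in\mathbb{N}} F_m$ we have
$$\nu H = \lim_{m\to\infty}\nu F_m \ge \gamma;$$
since $\nu H$ is surely less than or equal to $\gamma$, $\nu H = \gamma$.
If $F\in\Sigma$ and $F\subseteq H$, then
$$\nu H - \nu F = \nu(H\setminus F) \le \gamma = \nu H,$$
so $\nu F\ge 0$. If $F\in\Sigma$ and $F\cap H = \emptyset$ then
$$\nu H + \nu F = \nu(H\cup F) \le \gamma = \nu H$$
so $\nu F\le 0$. This completes the proof.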
231F Corollary Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a countably additive functional.
Then $\nu$ can be expressed as the difference of two totally finite measures with domain $\Sigma$.
proof Take $H\in\Sigma$ as described in 231Eb. Set $\nu_1 E = \nu(E\cap H)$, $\nu_2 E = -\nu(E\setminus H)$ for $E\in\Sigma$. Then, as in 231Dd-e,
both $\nu_1$ and $\nu_2$ are countably additive functionals on $\Sigma$, and of course $\nu = \nu_1 - \nu_2$. But also, by the choice of $H$, both
$\nu_1$ and $\nu_2$ are non-negative, so are totally finite measures.
Remark This is called the "Jordan decomposition" of $\nu$. The expression of 231Eb is a "Hahn decomposition".
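On a finite set everything in 231E-231F can be checked by brute force. The sketch below assumes a hypothetical five-point set with invented signed point masses, takes $\Sigma$ to be all subsets, picks $H$ as a set on which $\nu$ attains its supremum (as in the proof of 231Eb), and verifies the Hahn and Jordan decompositions by enumerating subsets. All names are illustrative.

```python
from itertools import combinations

weights = {"a": 3, "b": -1, "c": 0, "d": -4, "e": 2}   # hypothetical point masses
X = frozenset(weights)

def nu(E):
    """The signed measure nu(E) = sum of the point masses in E."""
    return sum(weights[x] for x in E)

def subsets(S):
    """All subsets of a finite set S, as frozensets."""
    S = list(S)
    for r in range(len(S) + 1):
        for c in combinations(S, r):
            yield frozenset(c)

H = max(subsets(X), key=nu)          # a set attaining sup nu, as in 231Eb

def nu1(E):                          # positive part: nu(E & H)
    return nu(E & H)

def nu2(E):                          # negative part: -nu(E \ H)
    return -nu(E - H)

print(sorted(H), nu(H))
print(all(nu(E) == nu1(E) - nu2(E) for E in subsets(X)))
```

The set $H$ is not unique in general: here the point with mass $0$ may be put on either side, which is one reason the text speaks of "a" Hahn decomposition.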
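231X Basic exercises (a) Let $\Sigma$ be the family of subsets $A$ of $\mathbb{N}$ such that one of $A$, $\mathbb{N}\setminus A$ is finite. Show that $\Sigma$
is an algebra of subsets of $\mathbb{N}$. (This is the finite-cofinite algebra of subsets of $\mathbb{N}$; compare 211Ra.)
(b) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$ and $\nu : \Sigma\to\mathbb{R}$ a finitely additive functional. Show that
$\nu(E\cup F\cup G) + \nu(E\cap F) + \nu(E\cap G) + \nu(F\cap G) = \nu E + \nu F + \nu G + \nu(E\cap F\cap G)$ for all $E, F, G\in\Sigma$. Generalize this
result to longer sequences of sets.
>>>(c) Let $\Sigma$ be the finite-cofinite algebra of subsets of $\mathbb{N}$, as in 231Xa. Define $\nu : \Sigma\to\mathbb{Z}$ by setting
$$\nu E = \lim_{n\to\infty}\Big(\#(\{i : i\le n,\ 2i\in E\}) - \#(\{i : i\le n,\ 2i+1\in E\})\Big)$$
for every $E\in\Sigma$. Show that $\nu$ is well-defined and finitely additive and unbounded.
(d) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. (i) Show that if $\nu : \Sigma\to\mathbb{R}$ and $\nu' : \Sigma\to\mathbb{R}$ are finitely
additive, so are $\nu + \nu'$ and $c\nu$ for any $c\in\mathbb{R}$. (ii) Show that if $\nu : \Sigma\to\mathbb{R}$ is finitely additive and $H\in\Sigma$, then $\nu_H$ is
finitely additive, where $\nu_H(E) = \nu(H\cap E)$ for every $E\in\Sigma$.
(e) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$ and $\nu : \Sigma\to\mathbb{R}$ a finitely additive functional. Let $S$ be the linear
space of those real-valued functions on $X$ expressible in the form $\sum_{i=0}^{n} a_i\chi E_i$ where $E_i\in\Sigma$ for each $i$. (i) Show that
we have a linear functional $\int d\nu : S\to\mathbb{R}$ given by writing
$$\int\sum_{i=0}^{n} a_i\chi E_i\,d\nu = \sum_{i=0}^{n} a_i\nu E_i$$
whenever $a_0,\ldots,a_n\in\mathbb{R}$ and $E_0,\ldots,E_n\in\Sigma$. (ii) Show that if $\nu E\ge 0$ for every $E\in\Sigma$ then $\int f\,d\nu\ge 0$ whenever $f\in S$
and $f(x)\ge 0$ for every $x\in X$. (iii) Show that if $\nu$ is bounded and $X\ne\emptyset$ then
$$\sup\Big\{\Big|\int f\,d\nu\Big| : f\in S,\ \|f\|_\infty\le 1\Big\} = \sup_{E,F\in\Sigma}|\nu E - \nu F|,$$
writing $\|f\|_\infty = \sup_{x\in X}|f(x)|$.
>>>(f) Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$ and $\nu : \Sigma\to\mathbb{R}$ a finitely additive functional. Show that the
following are equiveridical:
(i) $\nu$ is countably additive;
(ii) $\lim_{n\to\infty}\nu E_n = 0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}} E_n = \emptyset$;
(iii) $\lim_{n\to\infty}\nu E_n = 0$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and $\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n} E_m = \emptyset$;
(iv) $\lim_{n\to\infty}\nu E_n = \nu E$ whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\Sigma$ and
$$E = \bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n} E_m = \bigcup_{n\in\mathbb{N}}\bigcap_{m\ge n} E_m.$$
(Hint: for (i)$\Rightarrow$(iv), consider non-negative $\nu$ first.)
(g) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$, and let $\nu : \Sigma\to[-\infty,\infty[$ be a function which is countably
additive in the sense that $\nu\emptyset = 0$ and whenever $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, $\sum_{n=0}^{\infty}\nu E_n = \lim_{n\to\infty}\sum_{i=0}^{n}\nu E_i$ is
defined in $[-\infty,\infty[$ and is equal to $\nu(\bigcup_{n\in\mathbb{N}} E_n)$. Show that $\nu$ is bounded above and attains its upper bound (that is,
there is an $H\in\Sigma$ such that $\nu H = \sup_{F\in\Sigma}\nu F$). Hence, or otherwise, show that $\nu$ is expressible as the difference of a
totally finite measure and a measure, both with domain $\Sigma$.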
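231Y Further exercises (a) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a bounded finitely
additive functional. Set
$$\nu^+E = \sup\{\nu F : F\in\Sigma,\ F\subseteq E\},$$
$$\nu^-E = -\inf\{\nu F : F\in\Sigma,\ F\subseteq E\},$$
$$|\nu|E = \sup\{\nu F_1 - \nu F_2 : F_1, F_2\in\Sigma,\ F_1, F_2\subseteq E\}.$$
Show that $\nu^+$, $\nu^-$ and $|\nu|$ are all bounded finitely additive functionals on $\Sigma$ and that $\nu = \nu^+ - \nu^-$, $|\nu| = \nu^+ + \nu^-$.
Show that if $\nu$ is countably additive so are $\nu^+$, $\nu^-$ and $|\nu|$. ($|\nu|$ is sometimes called the variation of $\nu$.)
(b) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $\nu_1$, $\nu_2$ be two bounded finitely additive functionals defined
on $\Sigma$. Set
$$(\nu_1\vee\nu_2)(E) = \sup\{\nu_1 F + \nu_2(E\setminus F) : F\in\Sigma,\ F\subseteq E\},$$
$$(\nu_1\wedge\nu_2)(E) = \inf\{\nu_1 F + \nu_2(E\setminus F) : F\in\Sigma,\ F\subseteq E\}.$$
Show that $\nu_1\vee\nu_2$ and $\nu_1\wedge\nu_2$ are finitely additive functionals, and that $\nu_1 + \nu_2 = \nu_1\vee\nu_2 + \nu_1\wedge\nu_2$. Show that, in the
language of 231Ya,
$$\nu^+ = \nu\vee 0,\qquad \nu^- = (-\nu)\vee 0 = -(\nu\wedge 0),\qquad |\nu| = \nu\vee(-\nu) = \nu^+\vee\nu^- = \nu^+ + \nu^-,$$
$$\nu_1\vee\nu_2 = \nu_1 + (\nu_2 - \nu_1)^+,\qquad \nu_1\wedge\nu_2 = \nu_1 - (\nu_1 - \nu_2)^+,$$
so that $\nu_1\vee\nu_2$ and $\nu_1\wedge\nu_2$ are countably additive if $\nu_1$ and $\nu_2$ are.
(c) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $M$ be the set of all bounded finitely additive functionals
from $\Sigma$ to $\mathbb{R}$. Show that $M$ is a linear space under the natural definitions of addition and scalar multiplication. Show
that $M$ has a partial order $\le$ defined by saying that
$$\nu\le\nu' \text{ iff } \nu E\le\nu' E \text{ for every } E\in\Sigma,$$
and that for this partial order $\nu_1\vee\nu_2$, $\nu_1\wedge\nu_2$, as defined in 231Yb, are $\sup\{\nu_1,\nu_2\}$, $\inf\{\nu_1,\nu_2\}$.
(d) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $\nu_0,\ldots,\nu_n$ be bounded finitely additive functionals on $\Sigma$
and set
$$\check\nu E = \sup\Big\{\sum_{i=0}^{n}\nu_i F_i : F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n} F_i = E,\ F_i\cap F_j = \emptyset \text{ for } i\ne j\Big\},$$
$$\hat\nu E = \inf\Big\{\sum_{i=0}^{n}\nu_i F_i : F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n} F_i = E,\ F_i\cap F_j = \emptyset \text{ for } i\ne j\Big\}$$
for $E\in\Sigma$. Show that $\check\nu$ and $\hat\nu$ are finitely additive and are, respectively, $\sup\{\nu_0,\ldots,\nu_n\}$ and $\inf\{\nu_0,\ldots,\nu_n\}$ in the
partially ordered set of finitely additive functionals on $\Sigma$.
(e) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$; let $M$ be the partially ordered set of all bounded finitely
additive functionals from $\Sigma$ to $\mathbb{R}$. (i) Show that if $A\subseteq M$ is non-empty and bounded above in $M$, then $A$ has a
supremum $\check\nu$ in $M$, given by the formula
$$\check\nu E = \sup\Big\{\sum_{i=0}^{n}\nu_i F_i : \nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n} F_i = E,\ F_i\cap F_j = \emptyset \text{ for } i\ne j\Big\}.$$
(ii) Show that if $A\subseteq M$ is non-empty and bounded below in $M$ then it has an infimum $\hat\nu\in M$, given by the formula
$$\hat\nu E = \inf\Big\{\sum_{i=0}^{n}\nu_i F_i : \nu_0,\ldots,\nu_n\in A,\ F_0,\ldots,F_n\in\Sigma,\ \bigcup_{i\le n} F_i = E,\ F_i\cap F_j = \emptyset \text{ for } i\ne j\Big\}.$$
(f) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a non-negative finitely additive functional. For
$E\in\Sigma$ set
$$\nu_{ca}(E) = \inf\Big\{\sup_{n\in\mathbb{N}}\nu F_n : \langle F_n\rangle_{n\in\mathbb{N}} \text{ is a non-decreasing sequence in } \Sigma \text{ with union } E\Big\}.$$
Show that $\nu_{ca}$ is a countably additive functional on $\Sigma$ and that if $\nu'$ is any countably additive functional with $\nu'\le\nu$
then $\nu'\le\nu_{ca}$. Show that $\nu_{ca}\wedge(\nu - \nu_{ca}) = 0$.
(g) Let $X$ be a set, $\Sigma$ an algebra of subsets of $X$, and $\nu : \Sigma\to\mathbb{R}$ a bounded finitely additive functional. Show that
$\nu$ is uniquely expressible as $\nu_{ca} + \nu_{pfa}$, where $\nu_{ca}$ is countably additive, $\nu_{pfa}$ is finitely additive and if $0\le\nu'\le|\nu_{pfa}|$
and $\nu'$ is countably additive then $\nu' = 0$.
(h) Let $X$ be a set and $\Sigma$ an algebra of subsets of $X$. Let $M$ be the linear space of bounded finitely additive
functionals on $\Sigma$, and for $\nu\in M$ set $\|\nu\| = |\nu|(X)$, defining $|\nu|$ as in 231Ya. ($\|\nu\|$ is the total variation of $\nu$.) Show
that $\|\ \|$ is a norm on $M$ under which $M$ is a Banach space. Show that the space of bounded countably additive
functionals on $\Sigma$ is a closed linear subspace of $M$.
(i) Repeat as many as possible of the results of this section for complex-valued functionals.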
231 Notes and comments The real purpose of this section has been to describe the Hahn decomposition of a
countably additive functional (231E). The very leisurely exposition in 231A-231D is intended as a review of the most
elementary properties of measures, in the slightly more general context of "signed measures", with those properties
corresponding to "additivity" alone separated from those which depend on "countable additivity". In 231Xf I set out
necessary and sufficient conditions for a finitely additive functional on a $\sigma$-algebra to be countably additive, designed
to suggest that a finitely additive functional is countably additive iff it is "sequentially order-continuous" in some sense.
The fact that a countably additive functional can be expressed as the difference of non-negative countably additive
functionals (231F) has an important counterpart in the theory of finitely additive functionals: a finitely additive
functional can be expressed as the difference of non-negative finitely additive functionals if (and only if) it is bounded
(231Ya). But I do not think that this, or the further properties of bounded finitely additive functionals described in
231Xe and 231Y, will be important to us before Volume 3.
232 The Radon-Nikodým theorem
I come now to the chief theorem of this chapter, one of the central results of measure theory, relating countably additive functionals to indefinite integrals. The objective is to give a complete description of the functionals which can arise as indefinite integrals of integrable functions (232E). These can be characterized as the truly continuous additive functionals (232Ab). A more commonly used concept, and one adequate in many cases, is that of absolutely continuous additive functional (232Aa); I spend the first few paragraphs (232B-232D) on elementary facts about truly continuous and absolutely continuous functionals. I end the section with a discussion of the decomposition of general countably additive functionals (232I).
232A Absolutely continuous functionals Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a finitely additive functional.
(a) $\nu$ is absolutely continuous with respect to $\mu$ (sometimes written $\nu \ll \mu$) if for every $\epsilon > 0$ there is a $\delta > 0$ such that $|\nu E| \le \epsilon$ whenever $E \in \Sigma$ and $\mu E \le \delta$.
(b) $\nu$ is truly continuous with respect to $\mu$ if for every $\epsilon > 0$ there are $E \in \Sigma$, $\delta > 0$ such that $\mu E$ is finite and $|\nu F| \le \epsilon$ whenever $F \in \Sigma$ and $\mu(E \cap F) \le \delta$.
(c) For reference, I add another definition here. If $\nu$ is countably additive, it is singular with respect to $\mu$ if there is a set $F \in \Sigma$ such that $\mu F = 0$ and $\nu E = 0$ whenever $E \in \Sigma$ and $E \subseteq X \setminus F$.
232B Proposition Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a finitely additive functional.
(a) If $\nu$ is countably additive, it is absolutely continuous with respect to $\mu$ iff $\nu E = 0$ whenever $\mu E = 0$.
(b) $\nu$ is truly continuous with respect to $\mu$ iff ($\alpha$) it is countably additive ($\beta$) it is absolutely continuous with respect to $\mu$ ($\gamma$) whenever $E \in \Sigma$ and $\nu E \ne 0$ there is an $F \in \Sigma$ such that $\mu F < \infty$ and $\nu(E \cap F) \ne 0$.
(c) If $(X, \Sigma, \mu)$ is $\sigma$-finite, then $\nu$ is truly continuous with respect to $\mu$ iff it is countably additive and absolutely continuous with respect to $\mu$.
(d) If $(X, \Sigma, \mu)$ is totally finite, then $\nu$ is truly continuous with respect to $\mu$ iff it is absolutely continuous with respect to $\mu$.
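proof (a)(i) If $\nu$ is absolutely continuous with respect to $\mu$ and $\mu E = 0$, then $\mu E \le \delta$ for every $\delta > 0$, so $|\nu E| \le \epsilon$ for every $\epsilon > 0$ and $\nu E = 0$.
(ii) ??? Suppose, if possible, that $\nu E = 0$ whenever $\mu E = 0$, but $\nu$ is not absolutely continuous. Then there is an $\epsilon > 0$ such that for every $\delta > 0$ there is an $E \in \Sigma$ such that $\mu E \le \delta$ but $|\nu E| \ge \epsilon$. For each $n \in \mathbb{N}$ we may choose an $F_n \in \Sigma$ such that $\mu F_n \le 2^{-n}$ and $|\nu F_n| \ge \epsilon$. Consider $F = \bigcap_{n \in \mathbb{N}} \bigcup_{k \ge n} F_k$. Then we have
$$\mu F \le \inf_{n \in \mathbb{N}} \mu\Big(\bigcup_{k \ge n} F_k\Big) \le \inf_{n \in \mathbb{N}} \sum_{k=n}^{\infty} 2^{-k} = 0,$$
so $\mu F = 0$.
Now recall that by 231Eb there is an $H \in \Sigma$ such that $\nu G \ge 0$ when $G \in \Sigma$ and $G \subseteq H$, and $\nu G \le 0$ when $G \in \Sigma$ and $G \cap H = \emptyset$. As in 231F, set $\nu_1 G = \nu(G \cap H)$, $\nu_2 G = -\nu(G \setminus H)$ for $G \in \Sigma$, so that $\nu_1$ and $\nu_2$ are totally finite measures, and $\nu_1 F = \nu_2 F = 0$ because $\mu(F \cap H) = \mu(F \setminus H) = 0$. Consequently
$$0 = \nu_i F = \lim_{n \to \infty} \nu_i\Big(\bigcup_{m \ge n} F_m\Big) \ge \limsup_{n \to \infty} \nu_i F_n \ge 0$$
for both $i$, and
$$0 = \lim_{n \to \infty} (\nu_1 F_n + \nu_2 F_n) \ge \liminf_{n \to \infty} |\nu F_n| \ge \epsilon > 0,$$
which is absurd. XXX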
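(b)(i) Suppose that $\nu$ is truly continuous with respect to $\mu$. It is obvious from the definitions that $\nu$ is absolutely continuous with respect to $\mu$. If $\nu E \ne 0$, there must be an $F$ of finite measure such that $|\nu G| < |\nu E|$ whenever $G \cap F = \emptyset$, so that $|\nu(E \setminus F)| < |\nu E|$ and $\nu(E \cap F) \ne 0$. This deals with the conditions ($\beta$) and ($\gamma$).
To check that $\nu$ is countably additive, let $\langle E_n \rangle_{n \in \mathbb{N}}$ be a disjoint sequence in $\Sigma$, with union $E$, and $\epsilon > 0$. Let $\delta > 0$, $F \in \Sigma$ be such that $\mu F < \infty$ and $|\nu G| \le \epsilon$ whenever $G \in \Sigma$ and $\mu(F \cap G) \le \delta$. Then
$$\sum_{n=0}^{\infty} \mu(E_n \cap F) \le \mu F < \infty,$$
so there is an $n \in \mathbb{N}$ such that $\sum_{i=n}^{\infty} \mu(E_i \cap F) \le \delta$. Take any $m \ge n$ and consider $E^{*}_m = \bigcup_{i \le m} E_i$. We have
$$\Big|\nu E - \sum_{i=0}^{m} \nu E_i\Big| = |\nu E - \nu E^{*}_m| = |\nu(E \setminus E^{*}_m)| \le \epsilon,$$
because
$$\mu(F \cap E \setminus E^{*}_m) = \sum_{i=m+1}^{\infty} \mu(F \cap E_i) \le \delta.$$
As $\epsilon$ is arbitrary,
$$\nu E = \sum_{i=0}^{\infty} \nu E_i;$$
as $\langle E_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\nu$ is countably additive.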
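(ii) Now suppose that $\nu$ satisfies the three conditions. By 231F, $\nu$ can be expressed as the difference of two non-negative countably additive functionals $\nu_1$, $\nu_2$; set $\nu' = \nu_1 + \nu_2$, so that $\nu'$ is a non-negative countably additive functional and $|\nu F| \le \nu' F$ for every $F \in \Sigma$. Set
$$\gamma = \sup\{\nu' F : F \in \Sigma,\ \mu F < \infty\} \le \nu' X < \infty,$$
and choose a sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ of sets of finite measure such that $\lim_{n \to \infty} \nu' F_n = \gamma$; set $F^{*} = \bigcup_{n \in \mathbb{N}} F_n$. If $G \in \Sigma$ and $G \cap F^{*} = \emptyset$ then $\nu G = 0$. PPP??? Otherwise, by condition ($\gamma$), there is an $F \in \Sigma$ such that $\mu F < \infty$ and $\nu(G \cap F) \ne 0$. It follows that
$$\nu'(F \setminus F^{*}) \ge \nu'(F \cap G) \ge |\nu(F \cap G)| > 0,$$
and there must be an $n \in \mathbb{N}$ such that
$$\gamma < \nu' F_n + \nu'(F \setminus F^{*}) = \nu'(F_n \cup (F \setminus F^{*})) \le \nu'(F \cup F_n) \le \gamma$$
because $\mu(F \cup F_n) < \infty$; but this is impossible. XXXQQQ
Setting $F^{*}_n = \bigcup_{k \le n} F_k$ for each $n$, we have $\lim_{n \to \infty} \nu'(F^{*} \setminus F^{*}_n) = 0$. Take any $\epsilon > 0$, and (using condition ($\beta$)) let $\delta > 0$ be such that $|\nu E| \le \frac{1}{2}\epsilon$ whenever $\mu E \le \delta$. Let $n$ be such that $\nu'(F^{*} \setminus F^{*}_n) \le \frac{1}{2}\epsilon$. Now if $F \in \Sigma$ and $\mu(F \cap F^{*}_n) \le \delta$ then
$$|\nu F| \le |\nu(F \cap F^{*}_n)| + |\nu(F \cap F^{*} \setminus F^{*}_n)| + |\nu(F \setminus F^{*})| \le \tfrac{1}{2}\epsilon + \nu'(F \cap F^{*} \setminus F^{*}_n) + 0 \le \tfrac{1}{2}\epsilon + \nu'(F^{*} \setminus F^{*}_n) \le \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon.$$
And $\mu F^{*}_n < \infty$. As $\epsilon$ is arbitrary, $\nu$ is truly continuous.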
(c) Now suppose that $(X, \Sigma, \mu)$ is $\sigma$-finite and that $\nu$ is countably additive and absolutely continuous with respect to $\mu$. Let $\langle X_n \rangle_{n \in \mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$ (211D). If $\nu E \ne 0$, then $\lim_{n \to \infty} \nu(E \cap X_n) \ne 0$, so $\nu(E \cap X_n) \ne 0$ for some $n$. This shows that $\nu$ satisfies condition ($\gamma$) of (b), so is truly continuous.
Of course the converse of this fact is already covered by (b).
(d) Finally, suppose that $\mu X < \infty$ and that $\nu$ is absolutely continuous with respect to $\mu$. Then it must be truly continuous, because we can take $E = X$ in the definition 232Ab.
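On a finite set in which every subset is measurable, the criteria of 232Aa and 232Ba can be compared by brute force. The following Python sketch is an illustration only and not part of the argument above; it assumes that $\mu$ and $\nu$ are given by point masses stored in dictionaries, and the helper names (`subsets`, `total`, `vanishes_on_null_sets`, `find_delta`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def subsets(points):
    """Yield every subset of the finite collection `points`."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)


def total(weights, subset):
    """Value on `subset` of the additive functional with point masses `weights`."""
    return sum((weights[x] for x in subset), Fraction(0))


def vanishes_on_null_sets(mu, nu):
    """Criterion of 232Ba: nu E = 0 whenever mu E = 0."""
    return all(total(nu, E) == 0 for E in subsets(mu) if total(mu, E) == 0)


def find_delta(mu, nu, eps):
    """Return some delta > 0 with |nu E| <= eps whenever mu E <= delta, or None."""
    bad = [total(mu, E) for E in subsets(mu) if abs(total(nu, E)) > eps]
    if not bad:
        return Fraction(1)  # no set violates the bound, so any delta works
    smallest = min(bad)
    if smallest == 0:
        return None  # a mu-null set carries nu-mass larger than eps
    return smallest / 2


# Assumed toy data: the point "c" is mu-null.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
nu_ac = {"a": Fraction(1), "b": Fraction(-2), "c": Fraction(0)}
nu_sing = {"a": Fraction(0), "b": Fraction(0), "c": Fraction(1)}
```

Here `nu_ac` passes both checks, while `nu_sing`, which lives on the null point, fails both; in this totally finite setting the two criteria agree, as (a) and (d) predict.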
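232C Lemma Let $(X, \Sigma, \mu)$ be a measure space and $\nu$, $\nu'$ two countably additive functionals on $\Sigma$ which are truly continuous with respect to $\mu$. Take $c \in \mathbb{R}$ and $H \in \Sigma$, and set $\nu_H E = \nu(E \cap H)$, as in 231De. Then $\nu + \nu'$, $c\nu$ and $\nu_H$ are all truly continuous with respect to $\mu$, and $\nu$ is expressible as the difference of non-negative countably additive functionals which are truly continuous with respect to $\mu$.
proof Let $\epsilon > 0$. Set $\eta = \epsilon/(2 + |c|) > 0$. Then there are $\delta$, $\delta' > 0$ and $E$, $E' \in \Sigma$ such that $\mu E < \infty$, $\mu E' < \infty$ and $|\nu F| \le \eta$ whenever $\mu(F \cap E) \le \delta$, $|\nu' F| \le \eta$ whenever $\mu(F \cap E') \le \delta'$. Set $\delta^{*} = \min(\delta, \delta') > 0$, $E^{*} = E \cup E'$; then
$$\mu E^{*} \le \mu E + \mu E' < \infty.$$
Suppose that $F \in \Sigma$ and $\mu(F \cap E^{*}) \le \delta^{*}$; then
$$\mu(F \cap H \cap E) \le \mu(F \cap E) \le \delta^{*} \le \delta, \quad \mu(F \cap E') \le \delta^{*} \le \delta'$$
so
$$|(\nu + \nu')F| \le |\nu F| + |\nu' F| \le \eta + \eta \le \epsilon,$$
$$|(c\nu)F| = |c||\nu F| \le |c|\eta \le \epsilon,$$
$$|\nu_H F| = |\nu(F \cap H)| \le \eta \le \epsilon.$$
As $\epsilon$ is arbitrary, $\nu + \nu'$, $c\nu$ and $\nu_H$ are all truly continuous.
Now, taking $H$ from 231Eb, we see that $\nu_1 = \nu_H$ and $\nu_2 = -\nu_{X \setminus H}$ are truly continuous and non-negative, and $\nu = \nu_1 - \nu_2$ is the difference of truly continuous measures.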
232D Proposition Let $(X, \Sigma, \mu)$ be a measure space, and $f$ a $\mu$-integrable real-valued function. For $E \in \Sigma$ set $\nu E = \int_E f$. Then $\nu : \Sigma \to \mathbb{R}$ is a countably additive functional and is truly continuous with respect to $\mu$, therefore absolutely continuous with respect to $\mu$.
proof Recall that $\int_E f = \int f \times \chi E$ is defined for every $E \in \Sigma$ (131Fa). So $\nu : \Sigma \to \mathbb{R}$ is well-defined. If $E$, $F \in \Sigma$ are disjoint then
$$\nu(E \cup F) = \int f \times \chi(E \cup F) = \int (f \times \chi E) + (f \times \chi F) = \int f \times \chi E + \int f \times \chi F = \nu E + \nu F,$$
so $\nu$ is finitely additive.
Now 225A, without using the phrase truly continuous, proved exactly that $\nu$ is truly continuous with respect to $\mu$. It follows from 232Bb that $\nu$ is countably additive and absolutely continuous.
Remark The functional $E \mapsto \int_E f$ is called the indefinite integral of $f$.
232E We are now at last ready for the theorem.
The Radon-Nikodým theorem Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a function. Then the following are equiveridical:
(i) there is a $\mu$-integrable function $f$ such that $\nu E = \int_E f$ for every $E \in \Sigma$;
(ii) $\nu$ is finitely additive and truly continuous with respect to $\mu$.
proof (a) If $f$ is a $\mu$-integrable real-valued function and $\nu E = \int_E f$ for every $E \in \Sigma$, then 232D tells us that $\nu$ is finitely additive and truly continuous.
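(b) In the other direction, suppose that $\nu$ is finitely additive and truly continuous; note that (by 232B(a-b)) $\nu E = 0$ whenever $\mu E = 0$. To begin with, suppose that $\nu$ is non-negative and not zero.
In this case, there is a non-negative simple function $f$ such that $\int f > 0$ and $\int_E f \le \nu E$ for every $E \in \Sigma$. PPP Let $H \in \Sigma$ be such that $\nu H > 0$; set $\epsilon = \frac{1}{3}\nu H > 0$. Let $E \in \Sigma$, $\delta > 0$ be such that $\mu E < \infty$ and $\nu F \le \epsilon$ whenever $F \in \Sigma$ and $\mu(F \cap E) \le \delta$; then $\nu(H \setminus E) \le \epsilon$ so $\nu E \ge \nu(H \cap E) \ge 2\epsilon$ and $\mu E \ge \mu(H \cap E) > 0$. Set $\mu_E F = \mu(F \cap E)$ for every $F \in \Sigma$; then $\mu_E$ is a countably additive functional on $\Sigma$. Set $\nu' = \nu - \alpha\mu_E$, where $\alpha = \epsilon/\mu E$; then $\nu'$ is a countably additive functional and $\nu' E > 0$. By 231Eb, as usual, there is a set $G \in \Sigma$ such that $\nu' F \ge 0$ if $F \in \Sigma$, $F \subseteq G$, but $\nu' F \le 0$ if $F \in \Sigma$ and $F \cap G = \emptyset$. As $\nu'(E \setminus G) \le 0$,
$$0 < \nu'(E \cap G) \le \nu(E \cap G)$$
and $\mu(E \cap G) > 0$. Set $f = \alpha\chi(E \cap G)$; then $f$ is a non-negative simple function and $\int f = \alpha\mu(E \cap G) > 0$.
If $F \in \Sigma$ then $\nu'(F \cap G) \ge 0$, that is,
$$\nu(F \cap G) \ge \alpha\mu_E(F \cap G) = \alpha\mu(F \cap E \cap G) = \int_F f.$$
So
$$\nu F \ge \nu(F \cap G) \ge \int_F f,$$
as required. QQQ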
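(c) Still supposing that $\nu$ is a non-negative, truly continuous additive functional, let $\Phi$ be the set of non-negative simple functions $f : X \to \mathbb{R}$ such that $\int_E f \le \nu E$ for every $E \in \Sigma$; then the constant function $0$ belongs to $\Phi$, so $\Phi$ is not empty.
If $f$, $g \in \Phi$ then $f \vee g \in \Phi$, where $(f \vee g)(x) = \max(f(x), g(x))$ for $x \in X$. PPP Set $H = \{x : (f - g)(x) \ge 0\} \in \Sigma$; then $f \vee g = (f \times \chi H) + (g \times \chi(X \setminus H))$ is a non-negative simple function, and for any $E \in \Sigma$,
$$\int_E f \vee g = \int_{E \cap H} f + \int_{E \setminus H} g \le \nu(E \cap H) + \nu(E \setminus H) = \nu E. \text{ QQQ}$$
Set
$$\gamma = \sup\Big\{\int f : f \in \Phi\Big\} \le \nu X < \infty.$$
Choose a sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ in $\Phi$ such that $\lim_{n \to \infty} \int f_n = \gamma$. For each $n$, set $g_n = f_0 \vee f_1 \vee \ldots \vee f_n$; then $g_n \in \Phi$ and $\int f_n \le \int g_n \le \gamma$ for each $n$, so $\lim_{n \to \infty} \int g_n = \gamma$. By B.Levi's theorem, $f = \lim_{n \to \infty} g_n$ is integrable and $\int f = \gamma$. Note that if $E \in \Sigma$ then
$$\int_E f = \lim_{n \to \infty} \int_E g_n \le \nu E.$$
??? Suppose, if possible, that there is an $H \in \Sigma$ such that $\int_H f \ne \nu H$. Set
$$\nu_1 F = \nu F - \int_F f \ge 0$$
for every $F \in \Sigma$; then by (a) of this proof and 232C, $\nu_1$ is a truly continuous finitely additive functional, and we are supposing that $\nu_1 \ne 0$. By (b) of this proof, there is a non-negative simple function $g$ such that $\int_F g \le \nu_1 F$ for every $F \in \Sigma$ and $\int g > 0$. Take $n \in \mathbb{N}$ such that $\int f_n + \int g > \gamma$. Then $f_n + g$ is a non-negative simple function and
$$\int_F (f_n + g) = \int_F f_n + \int_F g \le \int_F f + \int_F g = \nu F - \nu_1 F + \int_F g \le \nu F$$
for any $F \in \Sigma$, so $f_n + g \in \Phi$, and
$$\gamma < \int f_n + \int g = \int f_n + g \le \gamma,$$
which is absurd. XXX Thus we have $\int_H f = \nu H$ for every $H \in \Sigma$.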
(d) This proves the theorem for non-negative $\nu$. For general $\nu$, we need only observe that $\nu$ is expressible as $\nu_1 - \nu_2$, where $\nu_1$ and $\nu_2$ are non-negative truly continuous countably additive functionals, by 232C; so that there are integrable functions $f_1$, $f_2$ such that $\nu_i F = \int_F f_i$ for both $i$ and every $F \in \Sigma$. Of course $f = f_1 - f_2$ is integrable and $\nu F = \int_F f$ for every $F \in \Sigma$. This completes the proof.
232F Corollary Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space and $\nu : \Sigma \to \mathbb{R}$ a function. Then there is a $\mu$-integrable function $f$ such that $\nu E = \int_E f$ for every $E \in \Sigma$ iff $\nu$ is countably additive and absolutely continuous with respect to $\mu$.
proof Put 232Bc and 232E together.
232G Corollary Let $(X, \Sigma, \mu)$ be a totally finite measure space and $\nu : \Sigma \to \mathbb{R}$ a function. Then there is a $\mu$-integrable function $f$ on $X$ such that $\nu E = \int_E f$ for every $E \in \Sigma$ iff $\nu$ is finitely additive and absolutely continuous with respect to $\mu$.
proof Put 232Bd and 232E together.
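In the simplest totally finite case, a finite set with every subset measurable, the function $f$ of 232G can be written down pointwise as a quotient of point masses. The Python sketch below is a toy illustration under that assumption, not the general construction of 232E; the names `radon_nikodym_derivative`, `integral_over` and `represents` are hypothetical, and the value of $f$ on $\mu$-null points is an arbitrary choice.

```python
from fractions import Fraction
from itertools import combinations


def radon_nikodym_derivative(mu, nu):
    """Pointwise density f with nu{x} = f(x) * mu{x} on a finite space."""
    f = {}
    for x, mass in mu.items():
        if mass == 0:
            if nu[x] != 0:
                # nu does not vanish on a mu-null set: no density exists
                raise ValueError(f"nu charges the mu-null point {x!r}")
            f[x] = Fraction(0)  # arbitrary value on a negligible set
        else:
            f[x] = nu[x] / mass
    return f


def integral_over(f, mu, E):
    """Integral of f over E with respect to the point masses mu."""
    return sum((f[x] * mu[x] for x in E), Fraction(0))


def represents(f, mu, nu):
    """Check nu E = integral of f over E for every subset E."""
    points = list(mu)
    return all(
        integral_over(f, mu, E) == sum((nu[x] for x in E), Fraction(0))
        for r in range(len(points) + 1)
        for E in combinations(points, r)
    )


# Assumed toy data: a signed nu vanishing on the mu-null point 3.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}
nu = {0: Fraction(1), 1: Fraction(-1, 2), 2: Fraction(0), 3: Fraction(0)}
f = radon_nikodym_derivative(mu, nu)
```

Changing `f` at the null point 3 gives another function with the same indefinite integral, which is the finite shadow of the almost-everywhere uniqueness of $f$.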
232H Remarks (a) Most authors are satisfied with 232F as the Radon-Nikodým theorem. In my view the problem of identifying indefinite integrals is of sufficient importance to justify an analysis which applies to all measure spaces, even if it requires a new concept (the notion of truly continuous functional).
(b) I ought to offer an example of an absolutely continuous functional which is not truly continuous. A simple one is the following. Let $X$ be any uncountable set. Let $\Sigma$ be the countable-cocountable $\sigma$-algebra of subsets of $X$ and $\nu$ the countable-cocountable measure on $X$ (211R). Let $\mu$ be the restriction to $\Sigma$ of counting measure on $X$. If $\mu E = 0$ then $E = \emptyset$ and $\nu E = 0$, so $\nu$ is absolutely continuous. But for any $E$ of finite measure we have $\nu(X \setminus E) = 1$, so $\nu$ is not truly continuous. See also 232Xf(i).
*(c) The space $(X, \Sigma, \mu)$ of this example is, in terms of the classification developed in Chapter 21, somewhat irregular; for instance, it is neither locally determined nor localizable, and therefore not strictly localizable, though it is complete and semi-finite. Can this phenomenon occur in a strictly localizable measure space? We are led here into a fascinating question. Suppose, in (b), I used the same idea, but with $\Sigma = \mathcal{P}X$. No difficulty arises in constructing $\mu$; but can there now be a $\nu$ with the required properties, that is, a non-zero countably additive functional from $\mathcal{P}X$ to $\mathbb{R}$ which is zero on all finite sets? This is the Banach-Ulam problem, on which I have written extensively elsewhere (Fremlin 93), and to which I will return in Volume 5. The present question is touched on again in 363S in Volume 3.
(d) Following the Radon-Nikodým theorem, the question immediately arises: for a given $\nu$, how much possible variation is there in the corresponding $f$? The answer is straightforward enough: two integrable functions $f$ and $g$ give rise to the same indefinite integral iff they are equal almost everywhere (131Hb).
(e) I have stated the Radon-Nikodým theorem in terms of arbitrary integrable functions, meaning to interpret integrability in a wide sense, according to the conventions of Volume 1. However, given a truly continuous countably additive functional $\nu$, we can ask whether there is in any sense a canonical integrable function representing it. The answer is no. But we certainly do not need to take arbitrary integrable functions of the type considered in Chapter 12. If $f$ is any integrable function, there is a conegligible set $E$ such that $f \restriction E$ is measurable, and now we can find a conegligible measurable set $G \subseteq E \cap \operatorname{dom} f$; if we set $g(x) = f(x)$ for $x \in G$, $0$ for $x \in X \setminus G$, then $f =_{\text{a.e.}} g$, so $g$ has the same indefinite integral as $f$ (as noted in (d) just above), while $g$ is measurable and has domain $X$. Thus we can make a trivial, but sometimes convenient, refinement to the theorem: if $(X, \Sigma, \mu)$ is a measure space, and $\nu : \Sigma \to \mathbb{R}$ is finitely additive and truly continuous with respect to $\mu$, then there is a $\Sigma$-measurable $\mu$-integrable function $g : X \to \mathbb{R}$ such that $\int_E g = \nu E$ for every $E \in \Sigma$.
(f) It is convenient to introduce now a general definition. If $(X, \Sigma, \mu)$ is a measure space and $\nu$ is a $[-\infty, \infty]$-valued functional defined on a family of subsets of $X$, I will say that a $[-\infty, \infty]$-valued function $f$ defined on a subset of $X$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$ if $\int_E f\,d\mu$ is defined (in the sense of 214D) and equal to $\nu E$ for every $E \in \operatorname{dom} \nu$. Thus the integrable functions called $f$ in 232E-232G are all Radon-Nikodým derivatives; later on we shall have less well-regulated examples.
When $\nu$ is a measure and $f$ is non-negative, $f$ may be called a density function.
(g) Throughout the work above I have taken it that $\nu$ is defined on the whole domain $\Sigma$ of $\mu$. In some of the most important applications, however, $\nu$ is defined only on some smaller $\sigma$-algebra $\mathrm{T}$. In this case we commonly seek to apply the same results with $\mu \restriction \mathrm{T}$ in place of $\mu$.
232I The Lebesgue decomposition of a countably additive functional: Proposition (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional. Then $\nu$ has unique expressions as
$$\nu = \nu_s + \nu_{ac} = \nu_s + \nu_{tc} + \nu_e,$$
where $\nu_{tc}$ is truly continuous with respect to $\mu$, $\nu_s$ is singular with respect to $\mu$, and $\nu_e$ is absolutely continuous with respect to $\mu$ and zero on every set of finite measure.
(b) If $X = \mathbb{R}^r$, $\Sigma$ is the algebra of Borel sets in $\mathbb{R}^r$ and $\mu$ is the restriction of Lebesgue measure to $\Sigma$, then $\nu$ is uniquely expressible as $\nu_p + \nu_{cs} + \nu_{ac}$ where $\nu_{ac}$ is absolutely continuous with respect to $\mu$, $\nu_{cs}$ is singular with respect to $\mu$ and zero on singletons, and $\nu_p E = \sum_{x \in E} \nu_p\{x\}$ for every $E \in \Sigma$.
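proof (a)(i) Suppose first that $\nu$ is non-negative. In this case, set
$$\nu_s E = \sup\{\nu(E \cap F) : F \in \Sigma,\ \mu F = 0\},$$
$$\nu_t E = \sup\{\nu(E \cap F) : F \in \Sigma,\ \mu F < \infty\}.$$
Then both $\nu_s$ and $\nu_t$ are countably additive. PPP Surely $\nu_s\emptyset = \nu_t\emptyset = 0$. Let $\langle E_n \rangle_{n \in \mathbb{N}}$ be a disjoint sequence in $\Sigma$ with union $E$. ($\alpha$) If $F \in \Sigma$ and $\mu F = 0$, then
$$\nu(E \cap F) = \sum_{n=0}^{\infty} \nu(E_n \cap F) \le \sum_{n=0}^{\infty} \nu_s(E_n);$$
as $F$ is arbitrary,
$$\nu_s E \le \sum_{n=0}^{\infty} \nu_s E_n.$$
($\beta$) If $F \in \Sigma$ and $\mu F < \infty$, then
$$\nu(E \cap F) = \sum_{n=0}^{\infty} \nu(E_n \cap F) \le \sum_{n=0}^{\infty} \nu_t(E_n);$$
as $F$ is arbitrary,
$$\nu_t E \le \sum_{n=0}^{\infty} \nu_t E_n.$$
($\gamma$) If $\epsilon > 0$, then (because $\sum_{n=0}^{\infty} \nu E_n = \nu E < \infty$) there is an $n \in \mathbb{N}$ such that $\sum_{k=n+1}^{\infty} \nu E_k \le \epsilon$. Now, for each $k \le n$, there is an $F_k \in \Sigma$ such that $\mu F_k = 0$ and $\nu(E_k \cap F_k) \ge \nu_s E_k - \frac{\epsilon}{n+1}$. In this case, $F = \bigcup_{k \le n} F_k \in \Sigma$, $\mu F = 0$ and
$$\nu_s E \ge \nu(E \cap F) \ge \sum_{k=0}^{n} \nu(E_k \cap F_k) \ge \sum_{k=0}^{n} \nu_s E_k - \epsilon \ge \sum_{k=0}^{\infty} \nu_s E_k - 2\epsilon,$$
because
$$\sum_{k=n+1}^{\infty} \nu_s E_k \le \sum_{k=n+1}^{\infty} \nu E_k \le \epsilon.$$
As $\epsilon$ is arbitrary,
$$\nu_s E \ge \sum_{k=0}^{\infty} \nu_s E_k.$$
($\delta$) Similarly, for each $k \le n$, there is an $F'_k \in \Sigma$ such that $\mu F'_k < \infty$ and $\nu(E_k \cap F'_k) \ge \nu_t E_k - \frac{\epsilon}{n+1}$. In this case, $F' = \bigcup_{k \le n} F'_k \in \Sigma$, $\mu F' < \infty$ and
$$\nu_t E \ge \nu(E \cap F') \ge \sum_{k=0}^{n} \nu(E_k \cap F'_k) \ge \sum_{k=0}^{n} \nu_t E_k - \epsilon \ge \sum_{k=0}^{\infty} \nu_t E_k - 2\epsilon,$$
because
$$\sum_{k=n+1}^{\infty} \nu_t E_k \le \sum_{k=n+1}^{\infty} \nu E_k \le \epsilon.$$
As $\epsilon$ is arbitrary,
$$\nu_t E \ge \sum_{k=0}^{\infty} \nu_t E_k.$$
($\epsilon$) Putting these together, $\nu_s E = \sum_{n=0}^{\infty} \nu_s E_n$ and $\nu_t E = \sum_{n=0}^{\infty} \nu_t E_n$. As $\langle E_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\nu_s$ and $\nu_t$ are countably additive. QQQ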
(ii) Still supposing that $\nu$ is non-negative, if we choose a sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\Sigma$ such that $\mu F_n = 0$ for each $n$ and $\lim_{n \to \infty} \nu F_n = \nu_s X$, then $F^{*} = \bigcup_{n \in \mathbb{N}} F_n$ has $\mu F^{*} = 0$, $\nu F^{*} = \nu_s X$; so that $\nu_s(X \setminus F^{*}) = 0$, and $\nu_s$ is singular with respect to $\mu$ in the sense of 232Ac.
Note that $\nu_s F = \nu F$ whenever $\mu F = 0$. So if we write $\nu_{ac} = \nu - \nu_s$, then $\nu_{ac}$ is a countably additive functional and $\nu_{ac} F = 0$ whenever $\mu F = 0$; that is, $\nu_{ac}$ is absolutely continuous with respect to $\mu$.
If we write $\nu_{tc} = \nu_t - \nu_s$, then $\nu_{tc}$ is a non-negative countably additive functional; $\nu_{tc} F = 0$ whenever $\mu F = 0$, and if $\nu_{tc} E > 0$ there is a set $F$ with $\mu F < \infty$ and $\nu_{tc}(E \cap F) > 0$. So $\nu_{tc}$ is truly continuous with respect to $\mu$, by 232Bb. Set $\nu_e = \nu - \nu_t = \nu_{ac} - \nu_{tc}$.
Thus for any non-negative countably additive functional $\nu$, we have expressions
$$\nu = \nu_s + \nu_{ac}, \quad \nu_{ac} = \nu_{tc} + \nu_e$$
where $\nu_s$, $\nu_{ac}$, $\nu_{tc}$ and $\nu_e$ are all non-negative countably additive functionals, $\nu_s$ is singular with respect to $\mu$, $\nu_{ac}$ and $\nu_e$ are absolutely continuous with respect to $\mu$, $\nu_{tc}$ is truly continuous with respect to $\mu$, and $\nu_e F = 0$ whenever $\mu F < \infty$.
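(iii) For general countably additive functionals $\nu : \Sigma \to \mathbb{R}$, we can express $\nu$ as $\nu' - \nu''$, where $\nu'$ and $\nu''$ are non-negative countably additive functionals. If we define $\nu'_s, \nu''_s, \ldots, \nu''_e$ as in (i)-(ii), we get countably additive functionals
$$\nu_s = \nu'_s - \nu''_s, \quad \nu_{ac} = \nu'_{ac} - \nu''_{ac}, \quad \nu_{tc} = \nu'_{tc} - \nu''_{tc}, \quad \nu_e = \nu'_e - \nu''_e$$
such that $\nu_s$ is singular with respect to $\mu$ (if $F'$, $F''$ are such that
$$\mu F' = \mu F'' = \nu'_s(X \setminus F') = \nu''_s(X \setminus F'') = 0,$$
then $\mu(F' \cup F'') = 0$ and $\nu_s E = 0$ whenever $E \subseteq X \setminus (F' \cup F'')$), $\nu_{ac}$ is absolutely continuous with respect to $\mu$, $\nu_{tc}$ is truly continuous with respect to $\mu$, and $\nu_e F = 0$ whenever $\mu F < \infty$, while
$$\nu = \nu_s + \nu_{ac} = \nu_s + \nu_{tc} + \nu_e.$$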
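(iv) Moreover, these decompositions are unique. PPP ($\alpha$) If, for instance, $\nu = \tilde\nu_s + \tilde\nu_{ac}$, where $\tilde\nu_s$ is singular and $\tilde\nu_{ac}$ is absolutely continuous with respect to $\mu$, let $F$, $\tilde F$ be such that $\mu F = \mu\tilde F = 0$ and $\tilde\nu_s E = 0$ whenever $E \cap \tilde F = \emptyset$, $\nu_s E = 0$ whenever $E \cap F = \emptyset$; then we must have
$$\nu_{ac}(E \cap (F \cup \tilde F)) = \tilde\nu_{ac}(E \cap (F \cup \tilde F)) = 0$$
for every $E \in \Sigma$, so
$$\nu_s E = \nu(E \cap (F \cup \tilde F)) = \tilde\nu_s E$$
for every $E \in \Sigma$. Thus $\tilde\nu_s = \nu_s$ and $\tilde\nu_{ac} = \nu_{ac}$.
($\beta$) Similarly, if $\nu_{ac} = \tilde\nu_{tc} + \tilde\nu_e$ where $\tilde\nu_{tc}$ is truly continuous with respect to $\mu$ and $\tilde\nu_e F = 0$ whenever $\mu F < \infty$, then there are sequences $\langle F_n \rangle_{n \in \mathbb{N}}$, $\langle \tilde F_n \rangle_{n \in \mathbb{N}}$ of sets of finite measure such that $\nu_{tc} F = 0$ whenever $F \cap \bigcup_{n \in \mathbb{N}} F_n = \emptyset$ and $\tilde\nu_{tc} F = 0$ whenever $F \cap \bigcup_{n \in \mathbb{N}} \tilde F_n = \emptyset$. Write $F^{*} = \bigcup_{n \in \mathbb{N}} (F_n \cup \tilde F_n)$; then $\tilde\nu_e E = \nu_e E = 0$ whenever $E \subseteq F^{*}$ and $\nu_{tc} E = \tilde\nu_{tc} E = 0$ whenever $E \cap F^{*} = \emptyset$, so $\nu_e E = \nu_{ac}(E \setminus F^{*}) = \tilde\nu_e E$ for every $E \in \Sigma$, and $\nu_e = \tilde\nu_e$, $\nu_{tc} = \tilde\nu_{tc}$. QQQ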
(b) In this case, $\mu$ is $\sigma$-finite (cf. 211P), so every absolutely continuous countably additive functional is truly continuous (232Bc), and we shall always have $\nu_e = 0$, $\nu_{ac} = \nu_{tc}$. But in the other direction we know that singleton sets, and therefore countable sets, are all measurable. We therefore have a further decomposition $\nu_s = \nu_p + \nu_{cs}$, where there is a countable set $K \subseteq \mathbb{R}^r$ with $\nu_p E = 0$ whenever $E \in \Sigma$, $E \cap K = \emptyset$, and $\nu_{cs}$ is singular with respect to $\mu$ and zero on countable sets. PPP (i) If $\nu \ge 0$, set
$$\nu_p E = \sup\{\nu(E \cap K) : K \subseteq \mathbb{R}^r \text{ is countable}\};$$
just as with $\nu_s$, dealt with in (a) above, $\nu_p$ is countably additive and there is a countable $K \subseteq \mathbb{R}^r$ such that $\nu_p E = \nu(E \cap K)$ for every $E \in \Sigma$. (ii) For general $\nu$, we can express $\nu$ as $\nu' - \nu''$ where $\nu'$ and $\nu''$ are non-negative, and write $\nu_p = \nu'_p - \nu''_p$. (iii) $\nu_p$ is characterized by saying that there is a countable set $K$ such that $\nu_p E = \nu(E \cap K)$ for every $E \in \Sigma$ and $\nu\{x\} = 0$ for every $x \in \mathbb{R}^r \setminus K$. (iv) So if we set $\nu_{cs} = \nu_s - \nu_p$, $\nu_{cs}$ will be singular with respect to $\mu$ and zero on countable sets. QQQ
Now, for any $E \in \Sigma$,
$$\nu_p E = \nu(E \cap K) = \sum_{x \in K \cap E} \nu\{x\} = \sum_{x \in E} \nu_p\{x\}.$$
Remark The expression $\nu = \nu_p + \nu_{cs} + \nu_{ac}$ of (b) is the Lebesgue decomposition of $\nu$.
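The three-part decomposition of 232Ia becomes transparent on a finite set where $\mu$ is given by point masses in $[0, \infty]$ and every subset is measurable: the singular part lives on points of mass $0$, the truly continuous part on points of finite positive mass, and $\nu_e$ on points of infinite mass. The Python sketch below illustrates only this toy situation; the function name `lebesgue_decomposition` and the sample data are assumptions made for the example.

```python
import math
from fractions import Fraction


def lebesgue_decomposition(mu, nu):
    """Split nu = nu_s + nu_tc + nu_e according to the mu-mass of each point."""
    zero = Fraction(0)
    nu_s, nu_tc, nu_e = {}, {}, {}
    for x, mass in mu.items():
        # singular part: carried by the mu-null points
        nu_s[x] = nu[x] if mass == 0 else zero
        # truly continuous part: carried by points of finite positive mass
        nu_tc[x] = nu[x] if 0 < mass < math.inf else zero
        # remaining part: zero on every set of finite mu-measure
        nu_e[x] = nu[x] if mass == math.inf else zero
    return nu_s, nu_tc, nu_e


# Assumed toy data: "b" is mu-null and "c" is an atom of infinite mass.
mu = {"a": Fraction(1, 2), "b": Fraction(0), "c": math.inf, "d": Fraction(2)}
nu = {"a": Fraction(3), "b": Fraction(-1), "c": Fraction(5), "d": Fraction(0)}
nu_s, nu_tc, nu_e = lebesgue_decomposition(mu, nu)
```

In this example `nu_e` is absolutely continuous in the sense of 232Aa (any set of $\mu$-measure at most a finite $\delta$ avoids the point `"c"`) but not truly continuous, mirroring the role of $\nu_e$ in the proposition.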
232X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional which is absolutely continuous with respect to $\mu$. Show that the following are equiveridical: (i) $\nu$ is truly continuous with respect to $\mu$; (ii) there is a sequence $\langle E_n \rangle_{n \in \mathbb{N}}$ in $\Sigma$ such that $\mu E_n < \infty$ for every $n \in \mathbb{N}$ and $\nu F = 0$ whenever $F \in \Sigma$ and $F \cap \bigcup_{n \in \mathbb{N}} E_n = \emptyset$.
>>>(b) Let $g : \mathbb{R} \to \mathbb{R}$ be a bounded non-decreasing function and $\mu_g$ the associated Lebesgue-Stieltjes measure (114Xa). Show that $\mu_g$ is absolutely continuous (equivalently, truly continuous) with respect to Lebesgue measure iff the restriction of $g$ to any closed bounded interval is absolutely continuous in the sense of 225B.
(c) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$; let $\nu : \Sigma \to \mathbb{R}$ be a countably additive functional. Let $\mathcal{I}$ be an ideal of $\Sigma$, that is, a subset of $\Sigma$ such that ($\alpha$) $\emptyset \in \mathcal{I}$ ($\beta$) $E \cup F \in \mathcal{I}$ for all $E$, $F \in \mathcal{I}$ ($\gamma$) if $E \in \Sigma$, $F \in \mathcal{I}$ and $E \subseteq F$ then $E \in \mathcal{I}$. Show that $\nu$ has a unique decomposition as $\nu = \nu_{\mathcal{I}} + \nu'_{\mathcal{I}}$, where $\nu_{\mathcal{I}}$ and $\nu'_{\mathcal{I}}$ are countably additive functionals, $\nu'_{\mathcal{I}} E = 0$ for every $E \in \mathcal{I}$, and whenever $E \in \Sigma$, $\nu_{\mathcal{I}} E \ne 0$ there is an $F \in \mathcal{I}$ such that $\nu_{\mathcal{I}}(E \cap F) \ne 0$.
>>>(d) Let $X$ be a non-empty set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Show that for any sequence $\langle \nu_n \rangle_{n \in \mathbb{N}}$ of countably additive functionals on $\Sigma$ there is a probability measure $\mu$ on $X$, with domain $\Sigma$, such that every $\nu_n$ is absolutely continuous with respect to $\mu$. (Hint: start with the case $\nu_n \ge 0$.)
(e) Let $(X, \Sigma, \mu)$ be a measure space and $(X, \hat\Sigma, \hat\mu)$ its completion (212C). Let $\nu : \Sigma \to \mathbb{R}$ be an additive functional such that $\nu E = 0$ whenever $\mu E = 0$. Show that $\nu$ has a unique extension to an additive functional $\hat\nu : \hat\Sigma \to \mathbb{R}$ such that $\hat\nu E = 0$ whenever $\hat\mu E = 0$.
(f) Let $\mathcal{F}$ be an ultrafilter on $\mathbb{N}$ including the filter $\{\mathbb{N} \setminus I : I \subseteq \mathbb{N}$ is finite$\}$ (2A1O). Define $\nu : \mathcal{P}\mathbb{N} \to \{0, 1\}$ by setting $\nu E = 1$ if $E \in \mathcal{F}$, $0$ for $E \in \mathcal{P}\mathbb{N} \setminus \mathcal{F}$. (i) Let $\mu_1$ be counting measure on $\mathcal{P}\mathbb{N}$. Show that $\nu$ is additive and absolutely continuous with respect to $\mu_1$, but is not truly continuous. (ii) Define $\mu_2 : \mathcal{P}\mathbb{N} \to [0, 1]$ by setting $\mu_2 E = \sum_{n \in E} 2^{-n-1}$. Show that $\nu$ is zero on $\mu_2$-negligible sets, but is not absolutely continuous with respect to $\mu_2$.
(g) Rewrite this section in terms of complex-valued additive functionals.
(h) Let $(X, \Sigma, \mu)$ be a measure space, and $\nu$ and $\lambda$ additive functionals on $\Sigma$ of which $\nu$ is positive and countably additive, so that $(X, \Sigma, \nu)$ also is a measure space. (i) Show that if $\nu$ is absolutely continuous with respect to $\mu$ and $\lambda$ is absolutely continuous with respect to $\nu$, then $\lambda$ is absolutely continuous with respect to $\mu$. (ii) Show that if $\nu$ is truly continuous with respect to $\mu$ and $\lambda$ is absolutely continuous with respect to $\nu$ then $\lambda$ is truly continuous with respect to $\mu$.
232Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and $\nu : \Sigma \to \mathbb{R}$ a finitely additive functional. If $E$, $F$, $H \in \Sigma$ and $\mu H < \infty$ set $\rho_H(E, F) = \mu(H \cap (E \triangle F))$. (i) Show that $\rho_H$ is a pseudometric on $\Sigma$ (2A3Fa). (ii) Let $\mathfrak{T}$ be the topology on $\Sigma$ generated by $\{\rho_H : H \in \Sigma,\ \mu H < \infty\}$ (2A3Fc). Show that $\nu$ is continuous for $\mathfrak{T}$ iff it is truly continuous in the sense of 232Ab. ($\mathfrak{T}$ is the topology of convergence in measure on $\Sigma$.)
(b) For a non-decreasing function $F : [a, b] \to \mathbb{R}$, where $a < b$, let $\nu_F$ be the corresponding Lebesgue-Stieltjes measure. Show that if we define $(\nu_F)_{ac}$, etc., with regard to Lebesgue measure on $[a, b]$, as in 232I, then
$$(\nu_F)_p = \nu_{F_p}, \quad (\nu_F)_{ac} = \nu_{F_{ac}}, \quad (\nu_F)_{cs} = \nu_{F_{cs}},$$
where $F_p$, $F_{cs}$ and $F_{ac}$ are defined as in 226C.
(c) Extend the idea of (b) to general functions $F$ of bounded variation.
(d) Extend the ideas of (b) and (c) to open, half-open and unbounded intervals (cf. 226Ya).
(e) Let $(X, \Sigma, \mu)$ be a measure space and $(X, \tilde\Sigma, \tilde\mu)$ its c.l.d. version (213E). Let $\nu : \Sigma \to \mathbb{R}$ be an additive functional which is truly continuous with respect to $\mu$. Show that $\nu$ has a unique extension to a functional $\tilde\nu : \tilde\Sigma \to \mathbb{R}$ which is truly continuous with respect to $\tilde\mu$.
(f) Let $(X, \Sigma, \mu)$ be a measure space and $f$ a $\mu$-integrable real-valued function. Show that the indefinite integral of $f$ is the unique countably additive functional $\nu : \Sigma \to \mathbb{R}$ such that whenever $E \in \Sigma$ and $f(x) \in [a, b]$ for almost every $x \in E$, then $a\mu E \le \nu E \le b\mu E$.
(g) Say that two bounded additive functionals $\nu_1$, $\nu_2$ on an algebra $\Sigma$ of sets are mutually singular if for any $\epsilon > 0$ there is an $H \in \Sigma$ such that
$$\sup\{|\nu_1 F| : F \in \Sigma,\ F \subseteq H\} \le \epsilon,$$
$$\sup\{|\nu_2 F| : F \in \Sigma,\ F \cap H = \emptyset\} \le \epsilon.$$
(i) Show that $\nu_1$ and $\nu_2$ are mutually singular iff, in the language of 231Ya-231Yb, $|\nu_1| \wedge |\nu_2| = 0$.
(ii) Show that if $\Sigma$ is a $\sigma$-algebra and $\nu_1$ and $\nu_2$ are countably additive, then they are mutually singular iff there is an $H \in \Sigma$ such that $\nu_1 F = 0$ whenever $F \in \Sigma$ and $F \subseteq H$, while $\nu_2 F = 0$ whenever $F \in \Sigma$ and $F \cap H = \emptyset$.
(iii) Show that if $\nu_s$, $\nu_{tc}$ and $\nu_e$ are defined from $\nu$ and $\mu$ as in 232I, then each pair of the three are mutually singular.
(h) Let $(X, \Sigma, \mu)$ be a measure space and $f$ a non-negative real-valued function which is integrable over $X$; let $\nu$ be its indefinite integral. Show that for any function $g : X \to \mathbb{R}$, $\int g\,d\nu = \int f \times g\,d\mu$ in the sense that if one of these is defined in $[-\infty, \infty]$ so is the other, and they are then equal. (Hint: start with simple functions $g$.)
(i) Let $(X, \Sigma, \mu)$ be a measure space, $f$ an integrable function, and $\nu : \Sigma \to \mathbb{R}$ the indefinite integral of $f$. Show that $|\nu|$, as defined in 231Ya, is the indefinite integral of $|f|$.
(j) Let $X$ be a set, $\Sigma$ a $\sigma$-algebra of subsets of $X$, and $\nu : \Sigma \to \mathbb{R}$ a countably additive functional. Show that $\nu$ has a Radon-Nikodým derivative with respect to $|\nu|$ as defined in 231Ya, and that any such derivative has modulus equal to $1$ $|\nu|$-a.e.
(k) (H.König) Let $X$ be a set and $\mu$, $\nu$ two measures on $X$ with the same domain $\Sigma$. For $\alpha \ge 0$, $E \in \Sigma$ set $(\alpha\mu \wedge \nu)(E) = \inf\{\alpha\mu(E \cap F) + \nu(E \setminus F) : F \in \Sigma\}$ (cf. 112Ya^1). Show that the following are equiveridical: (i) $\nu E = 0$ whenever $\mu E = 0$; (ii) $\sup_{\alpha \ge 0} (\alpha\mu \wedge \nu)(E) = \nu E$ for every $E \in \Sigma$.
232 Notes and comments The Radon-Nikodým theorem must be on any list of the half-dozen most important theorems of measure theory, and not only the theorem itself, but the techniques necessary to prove it, are at the heart of the subject. In my book Fremlin 74 I discussed a variety of more or less abstract versions of the theorem and of the method, to some of which I will return in §§327 and 365 of the next volume.
As I have presented it here, the essence of the proof is split between 231E and 232E. I think we can distinguish the following elements. Let $\nu$ be a countably additive functional.
(i) $\nu$ is bounded (231Ea).
(ii) $\nu$ is expressible as the difference of non-negative functionals (231F).
(I gave this as a corollary of 231Eb, but it can also be proved by simpler methods, as in 231Ya.)
(iii) If $\nu > 0$, there is an integrable $f$ such that $0 < \nu_f \le \nu$, writing $\nu_f$ for the indefinite integral of $f$. (This is the point at which we really do need the Hahn decomposition 231Eb.)
(iv) The set $\Psi = \{f : \nu_f \le \nu\}$ is closed under countable suprema, so there is an $f \in \Psi$ maximising $\int f$.
(In part (c) of the proof of 232E, I spoke of simple functions; but this was solely to simplify the technical details, and the same argument works if we apply it to $\Psi$ instead of $\Phi$. Note the use here of B.Levi's theorem.)
(v) Take $f$ from (iv) and use (iii) to show that $\nu - \nu_f = 0$.
Each of the steps (i)-(iv) requires a non-trivial idea, and the importance of the theorem lies not only in its remarkable direct consequences in the rest of this chapter and elsewhere, but in the versatility and power of these ideas.
I introduce the idea of truly continuous functional in order to give a reasonably straightforward account of the status of the Radon-Nikodým theorem in non-$\sigma$-finite measure spaces. Of course the whole point is that a truly continuous functional, like an indefinite integral, must be concentrated on a $\sigma$-finite part of the space (232Xa), so that 232E, as stated, can be deduced easily from the standard form 232F. I dare to use the word truly in this context because this kind of continuity does indeed correspond to a topological notion (232Ya).
There is a possible trap in the definition I give of absolutely continuous functional. Many authors use the condition of 232Ba as a definition, saying that $\nu$ is absolutely continuous with respect to $\mu$ if $\nu E = 0$ whenever $\mu E = 0$. For countably additive functionals this coincides with the $\epsilon$-$\delta$ formulation in 232Aa; but for other additive functionals this need not be so (232Xf(ii)). Mostly the distinction is insignificant, but I note that in 232Bd it is critical, since $\nu$ there is not assumed to be countably additive.
In 232I I describe one of the many ways of decomposing a countably additive functional into mutually singular parts with special properties. In 231Yf-231Yg I have already suggested a method of decomposing an additive functional into the sum of a countably additive part and a purely finitely additive part. All these results have natural expressions in terms of the ordered linear space of bounded additive functionals on an algebra (231Yc).
^1 Formerly 112Yb.
233 Conditional expectations
I devote a section to a first look at one of the principal applications of the Radon-Nikodým theorem. It is one of the most vital ideas of measure theory, and will appear repeatedly in one form or another. Here I give the definition and most basic properties of conditional expectations as they arise in abstract probability theory, with notes on convex functions and a version of Jensen's inequality (233I-233J).
233A $\sigma$-subalgebras Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. A $\sigma$-subalgebra of $\Sigma$ is a $\sigma$-algebra $\mathrm{T}$ of subsets of $X$ such that $\mathrm{T} \subseteq \Sigma$. If $(X, \Sigma, \mu)$ is a measure space and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, then $(X, \mathrm{T}, \mu \restriction \mathrm{T})$ is again a measure space; this is immediate from the definition (112A). Now we have the following straightforward lemma. It is a special case of 235G below, but I give a separate proof in case you do not wish as yet to embark on the general investigation pursued in §235.
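233B Lemma Let $(X, \Sigma, \mu)$ be a measure space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. A real-valued function $f$ defined on a subset of $X$ is $\mu \restriction \mathrm{T}$-integrable iff (i) it is $\mu$-integrable (ii) $\operatorname{dom} f$ is $\mu \restriction \mathrm{T}$-conegligible (iii) $f$ is $\mu \restriction \mathrm{T}$-virtually measurable; and in this case $\int f\,d(\mu \restriction \mathrm{T}) = \int f\,d\mu$.
proof (a) Note first that if $f$ is a $\mu \restriction \mathrm{T}$-simple function, that is, is expressible as $\sum_{i=0}^{n} a_i \chi E_i$ where $a_i \in \mathbb{R}$, $E_i \in \mathrm{T}$ and $(\mu \restriction \mathrm{T})E_i < \infty$ for each $i$, then $f$ is $\mu$-simple and
$$\int f\,d\mu = \sum_{i=0}^{n} a_i \mu E_i = \int f\,d(\mu \restriction \mathrm{T}).$$
(b) Let $U_{\mu}$ be the set of non-negative $\mu$-integrable functions and $U_{\mu \restriction \mathrm{T}}$ the set of non-negative $\mu \restriction \mathrm{T}$-integrable functions.
Suppose $f \in U_{\mu \restriction \mathrm{T}}$. Then there is a non-decreasing sequence $\langle f_n \rangle_{n \in \mathbb{N}}$ of $\mu \restriction \mathrm{T}$-simple functions such that $f = \lim_{n \to \infty} f_n$ $\mu \restriction \mathrm{T}$-a.e. and
$$\int f\,d(\mu \restriction \mathrm{T}) = \lim_{n \to \infty} \int f_n\,d(\mu \restriction \mathrm{T}).$$
But now every $f_n$ is also $\mu$-simple, and $\int f_n\,d\mu = \int f_n\,d(\mu \restriction \mathrm{T})$ for every $n$, and $f = \lim_{n \to \infty} f_n$ $\mu$-a.e. So $f \in U_{\mu}$ and $\int f\,d\mu = \int f\,d(\mu \restriction \mathrm{T})$.
(c) Now suppose that $f$ is $\mu \restriction \mathrm{T}$-integrable. Then it is the difference of two members of $U_{\mu \restriction \mathrm{T}}$, so is $\mu$-integrable, and $\int f\,d\mu = \int f\,d(\mu \restriction \mathrm{T})$. Also conditions (ii) and (iii) are satisfied, according to the conventions established in Volume 1 (122Nc, 122P-122Q).
(d) Suppose that $f$ satisfies conditions (i)-(iii). Then $|f| \in U_{\mu}$, and there is a conegligible set $E \subseteq \operatorname{dom} f$ such that $E \in \mathrm{T}$ and $f \restriction E$ is $\mathrm{T}$-measurable. Accordingly $|f| \restriction E$ is $\mathrm{T}$-measurable. Now, if $\epsilon > 0$, then
$$(\mu \restriction \mathrm{T})\{x : x \in E,\ |f|(x) \ge \epsilon\} = \mu\{x : x \in E,\ |f|(x) \ge \epsilon\} \le \frac{1}{\epsilon} \int |f|\,d\mu < \infty;$$
moreover,
$$\sup\Big\{\int g\,d(\mu \restriction \mathrm{T}) : g \text{ is a } \mu \restriction \mathrm{T}\text{-simple function},\ g \le |f|\ \mu \restriction \mathrm{T}\text{-a.e.}\Big\}$$
$$= \sup\Big\{\int g\,d\mu : g \text{ is a } \mu \restriction \mathrm{T}\text{-simple function},\ g \le |f|\ \mu \restriction \mathrm{T}\text{-a.e.}\Big\}$$
$$\le \sup\Big\{\int g\,d\mu : g \text{ is a } \mu\text{-simple function},\ g \le |f|\ \mu\text{-a.e.}\Big\} \le \int |f|\,d\mu < \infty.$$
By the criterion of 122Ja, $|f| \in U_{\mu \restriction \mathrm{T}}$. Consequently $f$, being $\mu \restriction \mathrm{T}$-virtually $\mathrm{T}$-measurable, is $\mu \restriction \mathrm{T}$-integrable, by 122P.
This completes the proof.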
233C Remarks (a) My argument just above is detailed to the point of pedantry. I think, however, that while I
can be accused of wasting paper by writing everything down, every element of the argument is necessary to the result.
To be sure, some of the details are needed only because I use such a wide notion of integrable function; if you restrict
the notion of integrability to measurable functions dened on the whole measure space, there are simplications at
this stage, to be paid for later when you discover that many of the principal applications are to functions dened by
formulae which do not apply on the whole underlying space.
The essential point which does have to be grasped is that while a T-negligible set is always -negligible, a
-negligible set need not be T-negligible.
106 The Radon-Nikodym theorem 233Cb
(b) As the simplest possible example of the problems which can arise, I offer the following. Let $(X, \Sigma, \mu)$ be $[0,1]^2$ with Lebesgue measure. Let $\mathrm{T}$ be the set of those members of $\Sigma$ expressible as $F \times [0,1]$ for some $F \subseteq [0,1]$; it is easy to see that $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$. Consider $f, g : X \to [0,1]$ defined by saying that
$$f(t,u) = 1 \text{ if } u > 0, \quad 0 \text{ otherwise},$$
$$g(t,u) = 1 \text{ if } t > 0, \quad 0 \text{ otherwise}.$$
Then both $f$ and $g$ are $\mu$-integrable, being constant $\mu$-a.e. But only $g$ is $\mu\restriction\mathrm{T}$-integrable, because any non-negligible $E \in \mathrm{T}$ includes a complete vertical section $\{t\} \times [0,1]$, so that $f$ takes both values 0 and 1 on $E$. If we set
$$h(t,u) = 1 \text{ if } u > 0, \quad \text{undefined otherwise},$$
then again (on the conventions I use) $h$ is $\mu$-integrable but not $\mu\restriction\mathrm{T}$-integrable, as there is no conegligible member of $\mathrm{T}$ included in the domain of $h$.
(c) If $f$ is defined everywhere in $X$, and $\mu\restriction\mathrm{T}$ is complete, then of course $f$ is $\mu\restriction\mathrm{T}$-integrable iff it is $\mu$-integrable and $\mathrm{T}$-measurable. But note that in the example just above, which is one of the archetypes for this topic, $\mu\restriction\mathrm{T}$ is not complete, as singleton sets are negligible but not measurable.
233D Conditional expectations Let $(X, \Sigma, \mu)$ be a probability space, that is, a measure space with $\mu X = 1$. (Nearly all the ideas here work perfectly well for any totally finite measure space, but there seems nothing to be gained from the extension, and the traditional phrase "conditional expectation" demands a probability space.) Let $\mathrm{T} \subseteq \Sigma$ be a $\sigma$-subalgebra.
(a) For any $\mu$-integrable real-valued function $f$ defined on a conegligible subset of $X$, we have a corresponding indefinite integral $\nu_f : \Sigma \to \mathbb{R}$ given by the formula $\nu_f E = \int_E f$ for every $E \in \Sigma$. We know that $\nu_f$ is countably additive and truly continuous with respect to $\mu$, which in the present context is the same as saying that it is absolutely continuous (232Bc-232Bd). Now consider the restrictions $\mu\restriction\mathrm{T}$, $\nu_f\restriction\mathrm{T}$ of $\mu$ and $\nu_f$ to the $\sigma$-algebra $\mathrm{T}$. It follows directly from the definitions of "countably additive" and "absolutely continuous" that $\nu_f\restriction\mathrm{T}$ is countably additive and absolutely continuous with respect to $\mu\restriction\mathrm{T}$, therefore truly continuous with respect to $\mu\restriction\mathrm{T}$. Consequently, the Radon-Nikodým theorem (232E) tells us that there is a $\mu\restriction\mathrm{T}$-integrable function $g$ such that $(\nu_f\restriction\mathrm{T})F = \int_F g\,d(\mu\restriction\mathrm{T})$ for every $F \in \mathrm{T}$.
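(b) Let us define a conditional expectation of $f$ on $\mathrm{T}$ to be such a function; that is, a $\mu\restriction\mathrm{T}$-integrable function $g$ such that $\int_F g\,d(\mu\restriction\mathrm{T}) = \int_F f\,d\mu$ for every $F \in \mathrm{T}$. Looking back at 233B, we see that for such a $g$ we have
$$\int_F g\,d(\mu\restriction\mathrm{T}) = \int g \times \chi F\,d(\mu\restriction\mathrm{T}) = \int g \times \chi F\,d\mu = \int_F g\,d\mu$$
for every $F \in \mathrm{T}$; also, that $g$ is almost everywhere equal to a $\mathrm{T}$-measurable function defined everywhere in $X$ which is also a conditional expectation of $f$ on $\mathrm{T}$ (232He).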
(c) I set the word "a" of the phrase "a conditional expectation" in bold type to emphasize that there is nothing unique about the function $g$. In 242J I will return to this point, and describe an object which could properly be called "the" conditional expectation of $f$ on $\mathrm{T}$. $g$ is "essentially unique" only in the sense that if $g_1$, $g_2$ are both conditional expectations of $f$ on $\mathrm{T}$ then $g_1 = g_2$ $\mu\restriction\mathrm{T}$-a.e. (131Hb). This does of course mean that a very large number of its properties, for instance the distribution function $G(a) = \hat\mu\{x : g(x) \le a\}$, where $\hat\mu$ is the completion of $\mu$ (212C), are independent of which $g$ we take.
(d) A word of explanation of the phrase "conditional expectation" is in order. This derives from the standard identification of probability with measure, due to Kolmogorov, which I will discuss more fully in Chapter 27. A real-valued random variable may be regarded as a measurable, or virtually measurable, function $f$ on a probability space $(X, \Sigma, \mu)$; its "expectation" becomes identified with $\int f\,d\mu$, supposing that this exists. If $F \in \Sigma$ and $\mu F > 0$ then the "conditional expectation of $f$ given $F$" is $\frac{1}{\mu F}\int_F f$. If $F_0, \ldots, F_n$ is a partition of $X$ into measurable sets of non-zero measure, then the function $g$ given by
$$g(x) = \frac{1}{\mu F_i}\int_{F_i} f \quad\text{if } x \in F_i$$
is a kind of anticipated conditional expectation; if we are one day told that $x \in F_i$, then $g(x)$ will be our subsequent estimate of the expectation of $f$. In the terms of the definition above, $g$ is a conditional expectation of $f$ on the finite algebra $\mathrm{T}$ generated by $\{F_0, \ldots, F_n\}$. An appropriate intuition for general $\sigma$-algebras $\mathrm{T}$ is that they consist of the events which we shall be able to observe at some stated future time $t_0$, while the whole algebra $\Sigma$ consists of all events, including those not observable until times later than $t_0$, if ever.
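The finite-partition formula in (d) can be tried out directly. The following is only an illustrative sketch, not part of the formal development: the space is a finite set with point masses, and the names `conditional_expectation`, `integral_over`, `weight` and the fair-die data are hypothetical choices made for the example.

```python
from fractions import Fraction

def conditional_expectation(weight, f, partition):
    """Return g with g(x) = (1 / mu F_i) * (integral of f over F_i) for x in F_i.

    weight[x] is the mass mu{x}; partition is a list of disjoint blocks
    F_0, ..., F_n covering the space, each assumed to have non-zero mass.
    """
    g = {}
    for block in partition:
        mass = sum(weight[x] for x in block)             # mu F_i
        integral = sum(f[x] * weight[x] for x in block)  # integral of f over F_i
        for x in block:
            g[x] = integral / mass
    return g

def integral_over(weight, f, block):
    """Integral of f over the set `block` with respect to the point masses."""
    return sum(f[x] * weight[x] for x in block)

# Hypothetical example: a fair die, where we will only learn whether x is odd or even.
weight = {x: Fraction(1, 6) for x in range(1, 7)}
f = {x: Fraction(x) for x in range(1, 7)}
partition = [{1, 3, 5}, {2, 4, 6}]
g = conditional_expectation(weight, f, partition)
```

Here `g` is constant on each block, and the defining property, that the integrals of `g` and `f` agree over every member of the generated algebra, reduces to agreement over each block of the partition.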
233E I list some of the elementary facts concerning conditional expectations.
Proposition Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence of $\mu$-integrable real-valued functions, and for each $n$ let $g_n$ be a conditional expectation of $f_n$ on $\mathrm{T}$. Then
(a) $g_1 + g_2$ is a conditional expectation of $f_1 + f_2$ on $\mathrm{T}$;
(b) for any $c \in \mathbb{R}$, $cg_0$ is a conditional expectation of $cf_0$ on $\mathrm{T}$;
(c) if $f_1 \le_{\text{a.e.}} f_2$ then $g_1 \le_{\text{a.e.}} g_2$;
(d) if $\langle f_n\rangle_{n\in\mathbb{N}}$ is non-decreasing a.e. and $f = \lim_{n\to\infty} f_n$ is $\mu$-integrable, then $\lim_{n\to\infty} g_n$ is a conditional expectation of $f$ on $\mathrm{T}$;
(e) if $f = \lim_{n\to\infty} f_n$ is defined a.e. and there is a $\mu$-integrable function $h$ such that $|f_n| \le_{\text{a.e.}} h$ for every $n$, then $\lim_{n\to\infty} g_n$ is a conditional expectation of $f$ on $\mathrm{T}$;
(f) if $F \in \mathrm{T}$ then $g_0 \times \chi F$ is a conditional expectation of $f_0 \times \chi F$ on $\mathrm{T}$;
(g) if $h$ is a bounded, $\mu\restriction\mathrm{T}$-virtually measurable real-valued function defined $\mu\restriction\mathrm{T}$-almost everywhere in $X$, then $g_0 \times h$ is a conditional expectation of $f_0 \times h$ on $\mathrm{T}$;
(h) if $\Upsilon$ is a $\sigma$-subalgebra of $\mathrm{T}$, then a function $h_0$ is a conditional expectation of $f_0$ on $\Upsilon$ iff it is a conditional expectation of $g_0$ on $\Upsilon$.
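proof (a)-(b) We have only to observe that
$$\int_F g_1 + g_2\,d(\mu\restriction\mathrm{T}) = \int_F g_1\,d(\mu\restriction\mathrm{T}) + \int_F g_2\,d(\mu\restriction\mathrm{T}) = \int_F f_1\,d\mu + \int_F f_2\,d\mu = \int_F f_1 + f_2\,d\mu,$$
$$\int_F cg_0\,d(\mu\restriction\mathrm{T}) = c\int_F g_0\,d(\mu\restriction\mathrm{T}) = c\int_F f_0\,d\mu = \int_F cf_0\,d\mu$$
for every $F \in \mathrm{T}$.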
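(c) We have
$$\int_F g_1\,d(\mu\restriction\mathrm{T}) = \int_F f_1\,d\mu \le \int_F f_2\,d\mu = \int_F g_2\,d(\mu\restriction\mathrm{T})$$
for every $F \in \mathrm{T}$; consequently $g_1 \le g_2$ $\mu\restriction\mathrm{T}$-a.e. (131Ha).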
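(d) By (c), $\langle g_n\rangle_{n\in\mathbb{N}}$ is non-decreasing $\mu\restriction\mathrm{T}$-a.e.; moreover,
$$\sup_{n\in\mathbb{N}} \int g_n\,d(\mu\restriction\mathrm{T}) = \sup_{n\in\mathbb{N}} \int f_n\,d\mu = \int f\,d\mu < \infty.$$
By B. Levi's theorem, $g = \lim_{n\to\infty} g_n$ is defined $\mu\restriction\mathrm{T}$-almost everywhere, and
$$\int_F g\,d(\mu\restriction\mathrm{T}) = \lim_{n\to\infty}\int_F g_n\,d(\mu\restriction\mathrm{T}) = \lim_{n\to\infty}\int_F f_n\,d\mu = \int_F f\,d\mu$$
for every $F \in \mathrm{T}$, so $g$ is a conditional expectation of $f$ on $\mathrm{T}$.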
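(e) Set $f'_n = \inf_{m\ge n} f_m$, $f''_n = \sup_{m\ge n} f_m$ for each $n \in \mathbb{N}$. Then we have
$$-h \le_{\text{a.e.}} f'_n \le f_n \le f''_n \le_{\text{a.e.}} h,$$
and $\langle f'_n\rangle_{n\in\mathbb{N}}$, $\langle f''_n\rangle_{n\in\mathbb{N}}$ are almost-everywhere-monotonic sequences of functions both converging almost everywhere to $f$. For each $n$, let $g'_n$, $g''_n$ be conditional expectations of $f'_n$, $f''_n$ on $\mathrm{T}$. By (c) and (d) (applied to $\langle -f''_n\rangle_{n\in\mathbb{N}}$, with (b), for the non-increasing sequence), $\langle g'_n\rangle_{n\in\mathbb{N}}$ and $\langle g''_n\rangle_{n\in\mathbb{N}}$ are almost-everywhere-monotonic sequences converging almost everywhere to conditional expectations $g'$, $g''$ of $f$. Of course $g' = g''$ $\mu\restriction\mathrm{T}$-a.e. (233Dc). Also, for each $n$, $g'_n \le_{\text{a.e.}} g_n \le_{\text{a.e.}} g''_n$, so $\langle g_n\rangle_{n\in\mathbb{N}}$ converges to $g'$ $\mu\restriction\mathrm{T}$-a.e., and $g = \lim_{n\to\infty} g_n$ is defined almost everywhere and is a conditional expectation of $f$ on $\mathrm{T}$.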
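(f) For any $H \in \mathrm{T}$,
$$\int_H g_0 \times \chi F\,d(\mu\restriction\mathrm{T}) = \int_{H\cap F} g_0\,d(\mu\restriction\mathrm{T}) = \int_{H\cap F} f_0\,d\mu = \int_H f_0 \times \chi F\,d\mu.$$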
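(g)(i) If $h$ is actually $(\mu\restriction\mathrm{T})$-simple, say $h = \sum_{i=0}^n a_i\chi F_i$ where $F_i \in \mathrm{T}$ for each $i$, then
$$\int_F g_0 \times h\,d(\mu\restriction\mathrm{T}) = \sum_{i=0}^n a_i \int_F g_0 \times \chi F_i\,d(\mu\restriction\mathrm{T}) = \sum_{i=0}^n a_i \int_F f_0 \times \chi F_i\,d\mu = \int_F f_0 \times h\,d\mu$$
for every $F \in \mathrm{T}$, using (f) for the middle equality. (ii) For the general case, if $h$ is $\mu\restriction\mathrm{T}$-virtually measurable and $|h(x)| \le M$ $\mu\restriction\mathrm{T}$-almost everywhere, then there is a sequence $\langle h_n\rangle_{n\in\mathbb{N}}$ of $\mu\restriction\mathrm{T}$-simple functions converging to $h$ almost everywhere, and with $|h_n(x)| \le M$ for every $x$, $n$. Now $f_0 \times h_n \to f_0 \times h$ a.e. and $|f_0 \times h_n| \le_{\text{a.e.}} M|f_0|$ for each $n$, while $g_0 \times h_n$ is a conditional expectation of $f_0 \times h_n$ for every $n$, so by (e) we see that $\lim_{n\to\infty} g_0 \times h_n$ will be a conditional expectation of $f_0 \times h$; but this is equal almost everywhere to $g_0 \times h$.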
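(h) We need note only that $\int_H g_0\,d(\mu\restriction\mathrm{T}) = \int_H f_0\,d\mu$ for every $H \in \Upsilon$, so
$$\int_H h_0\,d(\mu\restriction\Upsilon) = \int_H g_0\,d(\mu\restriction\mathrm{T}) \text{ for every } H \in \Upsilon$$
$$\iff \int_H h_0\,d(\mu\restriction\Upsilon) = \int_H f_0\,d\mu \text{ for every } H \in \Upsilon.$$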
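Parts (a) and (h) of 233E can be checked by hand on a finite probability space, where a $\sigma$-subalgebra is generated by a partition and a conditional expectation is a block average. The sketch below is illustrative only; `cond_exp`, `fine`, `coarse` and the data are hypothetical names and values chosen for the example, with `fine` generating $\mathrm{T}$ and `coarse` generating a subalgebra $\Upsilon \subseteq \mathrm{T}$.

```python
from fractions import Fraction

def cond_exp(weight, f, partition):
    """Block-average f over each block of the partition (blocks have positive mass)."""
    g = {}
    for block in partition:
        mass = sum(weight[x] for x in block)
        value = sum(f[x] * weight[x] for x in block) / mass
        for x in block:
            g[x] = value
    return g

weight = {x: Fraction(1, 6) for x in range(1, 7)}
f0 = {x: Fraction(x * x) for x in range(1, 7)}
f1 = {x: Fraction(x) for x in range(1, 7)}
fine = [{1, 2}, {3, 4}, {5, 6}]      # generates T
coarse = [{1, 2, 3, 4}, {5, 6}]      # generates Upsilon, a subalgebra of T

g0 = cond_exp(weight, f0, fine)
g1 = cond_exp(weight, f1, fine)

# (a): the conditional expectation of a sum is the sum of conditional expectations.
g_sum = cond_exp(weight, {x: f0[x] + f1[x] for x in f0}, fine)

# (h): conditioning f0 on Upsilon agrees with conditioning g0 on Upsilon.
h_from_f = cond_exp(weight, f0, coarse)
h_from_g = cond_exp(weight, g0, coarse)
```

In the finite setting the two functions `h_from_f` and `h_from_g` coincide everywhere; in general 233Eh only gives agreement almost everywhere.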
233F Remarks Of course the results above are individually nearly trivial (though I think (e) and (g) might give you pause for thought if they were offered without previous preparation of the ground). Cumulatively they amount to some quite strong properties. In §242 I will restate them in language which is syntactically more direct, but relies on a deeper level of abstraction.
As an illustration of the power of conditional expectations to surprise us, I offer the next proposition, which depends on the concept of "convex" function.
233G Convex functions Recall that a real-valued function $\phi$ defined on an interval $I \subseteq \mathbb{R}$ is convex if
$$\phi(tb + (1-t)c) \le t\phi(b) + (1-t)\phi(c)$$
whenever $b, c \in I$ and $t \in [0,1]$.
Examples The formulae $|x|$, $x^2$, $e^{\pm x} \pm x$ define convex functions on $\mathbb{R}$; on $]-1,1[$ we have $1/(1-x^2)$; on $]0,\infty[$ we have $1/x$ and $x\ln x$; on $[0,1]$ we have the function which is zero on $]0,1[$ and 1 on $\{0,1\}$.
233H The general theory of convex functions is both extensive and important; I list a few of their more salient properties in 233Xe. For the moment the following lemma covers what we need.
Lemma Let $I \subseteq \mathbb{R}$ be a non-empty open interval (bounded or unbounded) and $\phi : I \to \mathbb{R}$ a convex function.
(a) For every $a \in I$ there is a $b \in \mathbb{R}$ such that $\phi(x) \ge \phi(a) + b(x-a)$ for every $x \in I$.
(b) If we take, for each $q \in I \cap \mathbb{Q}$, a $b_q \in \mathbb{R}$ such that $\phi(x) \ge \phi(q) + b_q(x-q)$ for every $x \in I$, then
$$\phi(x) = \sup_{q\in I\cap\mathbb{Q}} \phi(q) + b_q(x-q)$$
for every $x \in I$.
(c) $\phi$ is Borel measurable.
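proof (a) If $c, c' \in I$ and $c < a < c'$, then $a$ is expressible as $dc + (1-d)c'$ for some $d \in\, ]0,1[$, so that $\phi(a) \le d\phi(c) + (1-d)\phi(c')$ and
$$\frac{\phi(a)-\phi(c)}{a-c} \le \frac{d\phi(c)+(1-d)\phi(c')-\phi(c)}{dc+(1-d)c'-c} = \frac{(1-d)(\phi(c')-\phi(c))}{(1-d)(c'-c)}$$
$$= \frac{d(\phi(c')-\phi(c))}{d(c'-c)} = \frac{\phi(c')-d\phi(c)-(1-d)\phi(c')}{c'-dc-(1-d)c'} \le \frac{\phi(c')-\phi(a)}{c'-a}.$$
This means that
$$b = \sup_{c<a,\,c\in I} \frac{\phi(a)-\phi(c)}{a-c}$$
is finite, and $b \le \frac{\phi(c')-\phi(a)}{c'-a}$ whenever $a < c' \in I$; accordingly $\phi(x) \ge \phi(a) + b(x-a)$ for every $x \in I$.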
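(b) Write $\phi_q(x) = \phi(q) + b_q(x-q)$ for $q \in I \cap \mathbb{Q}$ and $x \in I$. By the choice of the $b_q$, $\phi(x) \ge \sup_{q\in I\cap\mathbb{Q}} \phi_q(x)$. On the other hand, given $x \in I$, fix $y \in I$ such that $x < y$ and let $b \in \mathbb{R}$ be such that $\phi(z) \ge \phi(x) + b(z-x)$ for every $z \in I$. If $q \in \mathbb{Q}$ and $x < q < y$, we have $\phi(y) \ge \phi(q) + b_q(y-q)$, so that $b_q \le \frac{\phi(y)-\phi(q)}{y-q}$ and
$$\phi(q) + b_q(x-q) = \phi(q) - b_q(q-x) \ge \phi(q) - \frac{\phi(y)-\phi(q)}{y-q}(q-x)$$
$$= \frac{y-x}{y-q}\phi(q) - \frac{q-x}{y-q}\phi(y) \ge \frac{y-x}{y-q}\bigl(\phi(x)+b(q-x)\bigr) - \frac{q-x}{y-q}\phi(y).$$
Now
$$\phi(x) = \lim_{q\downarrow x} \frac{y-x}{y-q}\bigl(\phi(x)+b(q-x)\bigr) - \frac{q-x}{y-q}\phi(y)$$
$$\le \sup_{q\in\mathbb{Q}\cap]x,y[} \frac{y-x}{y-q}\bigl(\phi(x)+b(q-x)\bigr) - \frac{q-x}{y-q}\phi(y)$$
$$\le \sup_{q\in\mathbb{Q}\cap]x,y[} \phi(q) + b_q(x-q) \le \sup_{q\in\mathbb{Q}\cap I} \phi(q) + b_q(x-q).$$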
(c) Writing $\phi_q(x) = \phi(q) + b_q(x-q)$ for every $q \in \mathbb{Q} \cap I$, every $\phi_q$ is a Borel measurable function, and $\phi = \sup_{q\in I\cap\mathbb{Q}} \phi_q$ is the supremum of a countable family of Borel measurable functions, so is Borel measurable.
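To see the supporting-line description of 233H at work, here is a small numerical sketch. It assumes the particular convex function $\phi(x) = x^2$, for which $b_q = 2q$ is a valid choice of slope, and replaces $I \cap \mathbb{Q}$ by a finite grid, so the supremum is only approximate off the grid; `supporting_line`, `grid` and `sup_of_lines` are names invented for the illustration.

```python
from fractions import Fraction

def phi(x):
    return x * x   # a convex function on R

def supporting_line(q):
    """Return phi_q : x -> phi(q) + b_q * (x - q), with b_q = 2q (the slope of x^2 at q)."""
    b_q = 2 * q
    return lambda x: phi(q) + b_q * (x - q)

# A finite grid standing in for the rationals of the interval ]-2, 2[.
grid = [Fraction(k, 4) for k in range(-7, 8)]

def sup_of_lines(x):
    """Supremum of phi_q(x) over the grid; never exceeds phi(x)."""
    return max(supporting_line(q)(x) for q in grid)
```

For this $\phi$ one has $\phi_q(x) = x^2 - (x-q)^2$, so the supremum over the grid equals $\phi(x)$ exactly when $x$ is a grid point and otherwise falls short by the squared distance to the nearest grid point.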
233I Jensen's inequality Let $(X, \Sigma, \mu)$ be a measure space and $\phi : \mathbb{R} \to \mathbb{R}$ a convex function.
(a) Suppose that $f$ and $g$ are real-valued $\mu$-virtually measurable functions defined almost everywhere in $X$ and that $g \ge 0$ almost everywhere, $\int g = 1$ and $g \times f$ is integrable. Then $\phi(\int g \times f) \le \int g \times \phi f$, where we may need to interpret the right-hand integral as $\infty$.
(b) In particular, if $\mu X = 1$ and $f$ is a real-valued function which is integrable over $X$, then $\phi(\int f) \le \int \phi f$.
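proof (a) For each $q \in \mathbb{Q}$ take $b_q$ such that $\phi(t) \ge \phi_q(t) = \phi(q) + b_q(t-q)$ for every $t \in \mathbb{R}$ (233Ha). Because $\phi$ is Borel measurable (233Hc), $\phi f$ is $\mu$-virtually measurable (121H), so $g \times \phi f$ also is; since $g \times \phi f$ is defined almost everywhere and almost everywhere greater than or equal to the integrable function $g \times \phi_0 f$, $\int g \times \phi f$ is defined in $]-\infty, \infty]$. Now
$$\phi_q\Bigl(\int g \times f\Bigr) = \phi(q) + b_q\int g \times f - b_q q = \int g \times \bigl(b_q f + (\phi(q) - b_q q)\chi X\bigr) = \int g \times \phi_q f \le \int g \times \phi f,$$
because $\int g = 1$ and $g \ge 0$ a.e. By 233Hb,
$$\phi\Bigl(\int g \times f\Bigr) = \sup_{q\in\mathbb{Q}} \phi_q\Bigl(\int g \times f\Bigr) \le \int g \times \phi f.$$
(b) Take $g$ to be the constant function with value 1.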
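Both parts of Jensen's inequality can be checked by exact arithmetic on a finite probability space. This is only a sanity-check sketch under assumed data: the four-point space, the values of `f`, the weight function `g` and the choice $\phi(t) = t^2$ are all hypothetical.

```python
from fractions import Fraction

mu = [Fraction(1, 4)] * 4                                   # a probability measure on four points
f = [Fraction(-1), Fraction(0), Fraction(2), Fraction(5)]
g = [Fraction(2), Fraction(1), Fraction(1), Fraction(0)]    # g >= 0 with integral 1

def integral(values):
    """Integral with respect to mu of the function given by its list of values."""
    return sum(v * m for v, m in zip(values, mu))

def phi(t):
    return t * t   # a convex function on R

# 233Ib: phi(int f) <= int phi f
lhs_b = phi(integral(f))
rhs_b = integral([phi(v) for v in f])

# 233Ia: phi(int g x f) <= int g x phi f
lhs_a = phi(integral([gv * fv for gv, fv in zip(g, f)]))
rhs_a = integral([gv * phi(fv) for gv, fv in zip(g, f)])
```

Exact rationals are used so that the comparison is not blurred by floating-point rounding.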
233J Even the special case 233Ib of Jensen's inequality is already very useful. It can be extended as follows.
Theorem Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function and $f$ a $\mu$-integrable real-valued function defined almost everywhere in $X$ such that the composition $\phi f$ is also integrable. If $g$ and $h$ are conditional expectations on $\mathrm{T}$ of $f$, $\phi f$ respectively, then $\phi g \le_{\text{a.e.}} h$. Consequently $\int \phi g \le \int \phi f$.
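proof We use the same ideas as in 233I. For each $q \in \mathbb{Q}$ take a $b_q \in \mathbb{R}$ such that $\phi(t) \ge \phi_q(t) = \phi(q) + b_q(t-q)$ for every $t \in \mathbb{R}$, so that $\phi(t) = \sup_{q\in\mathbb{Q}} \phi_q(t)$ for every $t \in \mathbb{R}$. Now setting
$$\psi_q(x) = \phi(q) + b_q(g(x) - q)$$
for $x \in \operatorname{dom} g$, we see that $\psi_q = \phi_q g$ is a conditional expectation of $\phi_q f$, and as $\phi_q f \le_{\text{a.e.}} \phi f$ we must have $\psi_q \le_{\text{a.e.}} h$. But also $\phi g = \sup_{q\in\mathbb{Q}} \psi_q$ wherever $g$ is defined, so $\phi g \le_{\text{a.e.}} h$, as claimed.
It follows at once that $\int \phi g \le \int h = \int \phi f$.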
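The conditional form of Jensen's inequality can likewise be observed on a finite space. The sketch below assumes a uniform six-point probability space, a two-block partition generating $\mathrm{T}$, and $\phi(t) = |t|$; the helper `cond_exp` and all data are hypothetical.

```python
from fractions import Fraction

def cond_exp(weight, f, partition):
    """Block-average f over each block of the partition (blocks have positive mass)."""
    g = {}
    for block in partition:
        mass = sum(weight[x] for x in block)
        value = sum(f[x] * weight[x] for x in block) / mass
        for x in block:
            g[x] = value
    return g

weight = {x: Fraction(1, 6) for x in range(1, 7)}
f = {x: Fraction(x - 2) for x in range(1, 7)}     # values -1, 0, 1, 2, 3, 4
partition = [{1, 2, 3}, {4, 5, 6}]                # generates T
phi = abs                                         # a convex function on R

g = cond_exp(weight, f, partition)                              # conditional expectation of f
h = cond_exp(weight, {x: phi(f[x]) for x in f}, partition)      # conditional expectation of phi f
```

On the first block $\phi g = 0$ while $h = 2/3$, so the inequality $\phi g \le h$ is strict there; on the second block $f$ does not change sign and the two sides agree.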
233K I give the following proposition, an elaboration of 233Eg, in a very general form, as its applications can turn up anywhere.
Proposition Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Suppose that $f$ is a $\mu$-integrable function and $h$ is a $(\mu\restriction\mathrm{T})$-virtually measurable real-valued function defined $(\mu\restriction\mathrm{T})$-almost everywhere in $X$. Let $g$, $g_0$ be conditional expectations of $f$ and $|f|$ on $\mathrm{T}$. Then $f \times h$ is integrable iff $g_0 \times h$ is integrable, and in this case $g \times h$ is a conditional expectation of $f \times h$ on $\mathrm{T}$.
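proof (a) Suppose that $h$ is a $\mu\restriction\mathrm{T}$-simple function. Then surely $f \times h$ and $g_0 \times h$ are integrable, and $g \times h$ is a conditional expectation of $f \times h$ as in 233Eg.
(b) Now suppose that $f, h \ge 0$. Then $g = g_0 \ge 0$ a.e. (233Ec). Let $\tilde h$ be a non-negative $\mathrm{T}$-measurable function defined everywhere in $X$ such that $h =_{\text{a.e.}} \tilde h$. For each $n \in \mathbb{N}$ set
$$h_n(x) = 2^{-n}k \text{ if } 0 \le k < 4^n \text{ and } 2^{-n}k \le \tilde h(x) < 2^{-n}(k+1),$$
$$\phantom{h_n(x)} = 2^n \text{ if } \tilde h(x) \ge 2^n.$$
Then $h_n$ is a $(\mu\restriction\mathrm{T})$-simple function, so $g \times h_n$ is a conditional expectation of $f \times h_n$. Both $\langle f \times h_n\rangle_{n\in\mathbb{N}}$ and $\langle g \times h_n\rangle_{n\in\mathbb{N}}$ are almost everywhere non-decreasing sequences of integrable functions, with limits $f \times h$ and $g \times h$ respectively. By B. Levi's theorem,
$$f \times h \text{ is integrable} \iff f \times \tilde h \text{ is integrable}$$
$$\iff \sup_{n\in\mathbb{N}} \int f \times h_n < \infty \iff \sup_{n\in\mathbb{N}} \int g \times h_n < \infty$$
(because $\int g \times h_n = \int f \times h_n$ for each $n$)
$$\iff g \times h \text{ is integrable} \iff g_0 \times h \text{ is integrable}.$$
Moreover, in this case
$$\int_E f \times h = \int_E f \times \tilde h = \lim_{n\to\infty} \int_E f \times h_n = \lim_{n\to\infty} \int_E g \times h_n = \int_E g \times \tilde h = \int_E g \times h$$
for every $E \in \mathrm{T}$, while $g \times h$ is $(\mu\restriction\mathrm{T})$-virtually measurable, so $g \times h$ is a conditional expectation of $f \times h$.
(c) Finally, consider the general case of integrable $f$ and virtually measurable $h$. Set $f^+ = f \vee 0$, $f^- = (-f) \vee 0$, so that $f = f^+ - f^-$ and $0 \le f^+, f^- \le |f|$; similarly, set $h^+ = h \vee 0$, $h^- = (-h) \vee 0$. Let $g_1$, $g_2$ be conditional expectations of $f^+$, $f^-$ on $\mathrm{T}$. Because $0 \le f^+, f^- \le |f|$, we have $0 \le g_1, g_2 \le g_0$ almost everywhere, while $g =_{\text{a.e.}} g_1 - g_2$. We see that
$$f \times h \text{ is integrable} \iff |f| \times |h| = |f \times h| \text{ is integrable}$$
$$\iff g_0 \times |h| \text{ is integrable}$$
$$\iff g_0 \times h \text{ is integrable}.$$
And in this case all four of $f^+ \times h^+, \ldots, f^- \times h^-$ are integrable, so
$$(g_1 - g_2) \times h = g_1 \times h^+ - g_2 \times h^+ - g_1 \times h^- + g_2 \times h^-$$
is a conditional expectation of
$$f^+ \times h^+ - f^- \times h^+ - f^+ \times h^- + f^- \times h^- = f \times h.$$
Since $g \times h =_{\text{a.e.}} (g_1 - g_2) \times h$, this also is a conditional expectation of $f \times h$, and we're done.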
233X Basic exercises (a) Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative $\mu$-integrable functions and suppose that $g_n$ is a conditional expectation of $f_n$ on $\mathrm{T}$ for each $n$. Suppose that $f = \liminf_{n\to\infty} f_n$ is integrable and has a conditional expectation $g$. Show that $g \le_{\text{a.e.}} \liminf_{n\to\infty} g_n$.
(b) Let $I \subseteq \mathbb{R}$ be an interval, and $\phi : I \to \mathbb{R}$ a function. Show that $\phi$ is convex iff $\{x : x \in I,\ \phi(x) + bx \le c\}$ is an interval for every $b, c \in \mathbb{R}$.
>(c) Let $I \subseteq \mathbb{R}$ be an open interval and $\phi : I \to \mathbb{R}$ a function. (i) Show that if $\phi$ is differentiable then it is convex iff $\phi'$ is non-decreasing. (ii) Show that if $\phi$ is absolutely continuous on every bounded closed subinterval of $I$ then $\phi$ is convex iff $\phi'$ is non-decreasing on its domain.
(d) For any $r \ge 1$, a subset $C$ of $\mathbb{R}^r$ is convex if $tx + (1-t)y \in C$ for all $x, y \in C$ and $t \in [0,1]$. If $C \subseteq \mathbb{R}^r$ is convex, then a function $\phi : C \to \mathbb{R}$ is convex if $\phi(tx + (1-t)y) \le t\phi(x) + (1-t)\phi(y)$ for all $x, y \in C$ and $t \in [0,1]$.
Let $C \subseteq \mathbb{R}^r$ be a convex set and $\phi : C \to \mathbb{R}$ a function. Show that the following are equiveridical: (i) the function $\phi$ is convex; (ii) the set $\{(x,t) : x \in C,\ t \in \mathbb{R},\ t \ge \phi(x)\}$ is convex in $\mathbb{R}^{r+1}$; (iii) the set $\{x : x \in C,\ \phi(x) + b.x \le c\}$ is convex in $\mathbb{R}^r$ for every $b \in \mathbb{R}^r$ and $c \in \mathbb{R}$, writing $b.x = \sum_{i=1}^r \beta_i\xi_i$ if $b = (\beta_1, \ldots, \beta_r)$ and $x = (\xi_1, \ldots, \xi_r)$.
(e) Let $I \subseteq \mathbb{R}$ be an interval and $\phi : I \to \mathbb{R}$ a convex function.
(i) Show that if $a, d \in I$ and $a < b \le c < d$ then
$$\frac{\phi(b)-\phi(a)}{b-a} \le \frac{\phi(d)-\phi(c)}{d-c}.$$
(ii) Show that $\phi$ is continuous at every interior point of $I$.
(iii) Show that either $\phi$ is monotonic on $I$ or there is a $c \in I$ such that $\phi(c) = \min_{x\in I} \phi(x)$ and $\phi$ is non-increasing on $I \cap\, ]-\infty, c]$, monotonic non-decreasing on $I \cap [c, \infty[$.
(iv) Show that $\phi$ is differentiable at all but countably many points of $I$, and that its derivative is non-decreasing in the sense that $\phi'(x) \le \phi'(y)$ whenever $x, y \in \operatorname{dom}\phi'$ and $x \le y$.
(v) Show that if $I$ is closed and bounded and $\phi$ is continuous then $\phi$ is absolutely continuous.
(vi) Show that if $I$ is closed and bounded and $\psi : I \to \mathbb{R}$ is absolutely continuous with a non-decreasing derivative then $\psi$ is convex.
(f) Show that if $I \subseteq \mathbb{R}$ is an interval and $\phi, \psi : I \to \mathbb{R}$ are convex functions so is $a\phi + b\psi$ for any $a, b \ge 0$.
(g) In the context of 233K, give an example in which $g \times h$ is integrable but $f \times h$ is not. (Hint: take $X$, $\mu$, $\mathrm{T}$ as in 233Cb, and arrange for $g$ to be 0.)
(h) Let $I \subseteq \mathbb{R}$ be an interval and $\Phi$ a non-empty family of convex real-valued functions on $I$ such that $\psi(x) = \sup_{\phi\in\Phi} \phi(x)$ is finite for every $x \in I$. Show that $\psi$ is convex.
233Y Further exercises (a) If $I \subseteq \mathbb{R}$ is an interval, a function $\phi : I \to \mathbb{R}$ is mid-convex if $\phi(\frac{x+y}{2}) \le \frac{1}{2}(\phi(x) + \phi(y))$ for all $x, y \in I$. Show that a mid-convex function which is bounded on any non-trivial subinterval of $I$ is convex.
(b) Generalize 233Xd to arbitrary normed spaces in place of $\mathbb{R}^r$.
(c) Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi$ be a convex real-valued function with domain an interval $I \subseteq \mathbb{R}$, and $f$ an integrable real-valued function on $X$ such that $f(x) \in I$ for almost every $x \in X$ and $\phi f$ is integrable. Let $g$, $h$ be conditional expectations on $\mathrm{T}$ of $f$, $\phi f$ respectively. Show that $g(x) \in I$ for almost every $x$ and that $\phi g \le_{\text{a.e.}} h$.
(d)(i) Show that if $I \subseteq \mathbb{R}$ is a bounded interval, $E \subseteq I$ is Lebesgue measurable, and $\mu E > \frac{2}{3}\mu I$ where $\mu$ is Lebesgue measure, then for every $x \in I$ there are $y, z \in E$ such that $z = \frac{x+y}{2}$. (Hint: by 134Ya/263A, $\mu(x+E) + \mu(2E) > \mu(2I)$.)
(ii) Show that if $f : [0,1] \to \mathbb{R}$ is a mid-convex Lebesgue measurable function (definition: 233Ya), $a > 0$, and $E = \{x : x \in [0,1],\ a \le f(x) < 2a\}$ is not negligible, then there is a non-trivial interval $I \subseteq [0,1]$ such that $f(x) > 0$ for every $x \in I$. (Hint: 223B.)
(iii) Suppose that $f : [0,1] \to \mathbb{R}$ is a mid-convex function such that $f \le 0$ almost everywhere in $[0,1]$. Show that $f \le 0$ everywhere in $]0,1[$. (Hint: for every $x \in\, ]0,1[$, $\max(f(x-t), f(x+t)) \le 0$ for almost every $t \in [0, \min(x, 1-x)]$.)
(iv) Suppose that $f : [0,1] \to \mathbb{R}$ is a mid-convex Lebesgue measurable function such that $f(0) = f(1) = 0$. Show that $f(x) \le 0$ for every $x \in [0,1]$. (Hint: show that $\{x : f(x) \le 0\}$ is dense in $[0,1]$, use (ii) to show that it is conegligible in $[0,1]$ and apply (iii).)
(v) Show that if $I \subseteq \mathbb{R}$ is an interval and $f : I \to \mathbb{R}$ is a mid-convex Lebesgue measurable function then it is convex.
(e) Let $(X, \Sigma, \mu)$ be a probability space, $\mathrm{T}$ a $\sigma$-subalgebra of subsets of $X$, and $f : X \to [0,\infty]$ a $\Sigma$-measurable function. Show that (i) there is a $\mathrm{T}$-measurable $g : X \to [0,\infty]$ such that $\int_F g = \int_F f$ for every $F \in \mathrm{T}$ (ii) any two such functions are equal a.e.
(f) Suppose that $r \ge 1$ and $C \subseteq \mathbb{R}^r \setminus \{0\}$ is a convex set. Show that there is a non-zero $b \in \mathbb{R}^r$ such that $b.z \ge 0$ for every $z \in C$. (Hint: if $r = 2$, identify $\mathbb{R}^2$ with $\mathbb{C}$; reduce to the case in which $C$ contains no points which are real and negative; set $\theta = \sup\{\arg z : z \in C\}$ and $b = -ie^{i\theta}$. Now induce on $r$.)
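(g) Suppose that $r \ge 1$, $C \subseteq \mathbb{R}^r$ is a convex set and $\phi : C \to \mathbb{R}$ is a convex function. Show that there is a function $h : \mathbb{R}^r \to [-\infty, \infty[$ such that $\phi(z) = \sup\{h(y) + z.y : y \in \mathbb{R}^r\}$ for every $z \in C$. (Hint: try $h(y) = \inf\{\phi(z) - z.y : z \in C\}$, and apply 233Yf to a translate of $\{(z,t) : \phi(z) \le t\}$.)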
(h) Let $(X, \Sigma, \mu)$ be a probability space, $r \ge 1$ an integer and $C \subseteq \mathbb{R}^r$ a convex set. Let $f_1, \ldots, f_r$ be $\mu$-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of $X$. Show that $(\int f_1, \ldots, \int f_r) \in C$. (Hint: induce on $r$.)
(i) Let $(X, \Sigma, \mu)$ be a probability space, $r \ge 1$ an integer, $C \subseteq \mathbb{R}^r$ a convex set and $\phi : C \to \mathbb{R}$ a convex function. Let $f_1, \ldots, f_r$ be $\mu$-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of $X$. Show that $\phi(\int f_1, \ldots, \int f_r) \le \int \phi(f_1, \ldots, f_r)$.
(j) Let $(X, \Sigma, \mu)$ be a measure space with $\mu X > 0$, $r \ge 1$ an integer, $C \subseteq \mathbb{R}^r$ a convex set such that $tz \in C$ whenever $z \in C$ and $t > 0$, and $\phi : C \to \mathbb{R}$ a convex function. Let $f_1, \ldots, f_r$ be $\mu$-integrable real-valued functions and suppose that $\{x : x \in \bigcap_{j\le r} \operatorname{dom} f_j,\ (f_1(x), \ldots, f_r(x)) \in C\}$ is a conegligible subset of $X$. Show that $(\int f_1, \ldots, \int f_r) \in C$ and that $\phi(\int f_1, \ldots, \int f_r) \le \int \phi(f_1, \ldots, f_r)$. (Hint: putting 215B(viii) and 235K below together, show that there are a probability measure $\nu$ on $X$ and a function $h : X \to [0,\infty[$ such that $\int f_j\,d\mu = \int f_j \times h\,d\nu$ for every $j$.)
233 Notes and comments The concept of conditional expectation is fundamental in probability theory, and will reappear in Chapter 27 in its natural context. I hope that even as an exercise in technique, however, it will strike you as worth taking note of.
I introduced 233E as a "list of elementary facts", and they are indeed straightforward. But below the surface there are some remarkable ideas waiting for expression. If you take $\mathrm{T}$ to be the trivial algebra $\{\emptyset, X\}$, so that the (unique) conditional expectation of an integrable function $f$ is the constant function $(\int f)\chi X$, then 233Ed and 233Ee become versions of B. Levi's theorem and Lebesgue's Dominated Convergence Theorem. (Fatou's Lemma is in 233Xa.) Even 233Eg can be thought of as a generalization of the result that $\int cf = c\int f$, where the constant $c$ has been replaced by a bounded $\mathrm{T}$-measurable function. A recurrent theme in the later parts of this treatise will be the search for "conditioned" versions of theorems. The proof of 233Ee is a typical example of an argument which has been translated from a proof of the original "unconditioned" result.
I suggested that 233I-233J are surprising, and I think that most of us find them so, even applied to the list of convex functions given in 233G. But I should remark that in a way 233J has very little to do with conditional expectations. The only properties of conditional expectations used in the proof are (i) that if $g$ is a conditional expectation of $f$, then $a\chi X + bg$ is a conditional expectation of $a\chi X + bf$ for all real $a$, $b$ (ii) if $g_1$, $g_2$ are conditional expectations of $f_1$, $f_2$ and $f_1 \le_{\text{a.e.}} f_2$, then $g_1 \le_{\text{a.e.}} g_2$. See 244Xm below. Jensen's inequality has an interesting extension to the multidimensional case, explored in 233Yf-233Yj. If you have encountered "geometric" forms of the Hahn-Banach theorem (see 3A5C in Volume 3) you will find 233Yf and 233Yg very natural, and you may notice that the finite-dimensional case is slightly different from the infinite-dimensional case you have probably been taught. I think that in fact the most delicate step is in 233Yh.
Note that 233Ib can be regarded as the special case of 233J in which $\mathrm{T} = \{\emptyset, X\}$. In fact 233Ia can be derived from 233Ib applied to the measure $\nu$ where $\nu E = \int_E g$ for every $E \in \Sigma$.
Like 233B, 233K seems to have rather a lot of technical detail in the argument. The point of this result is that we can deduce the integrability of $f \times h$ from that of $g_0 \times h$ (but not from the integrability of $g \times h$; see 233Xg). Otherwise it should be routine.
234 Operations on measures
I take a few pages to describe some standard constructions. The ideas are straightforward, but a number of details need to be worked out if they are to be securely integrated into the general framework I employ. The first step is to formally introduce inverse-measure-preserving functions (234A-234B), the most important class of transformations between measure spaces. For construction of new measures, we have the notions of image measure (234C-234E), sum of measures (234G-234H) and indefinite-integral measure (234I-234O). Finally I mention a way of ordering the measures on a given set (234P-234Q).
234A Inverse-measure-preserving functions It is high time that I introduced the nearest thing in measure theory to a "morphism". If $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ are measure spaces, a function $\phi : X \to Y$ is inverse-measure-preserving if $\phi^{-1}[F] \in \Sigma$ and $\mu(\phi^{-1}[F]) = \nu F$ for every $F \in \mathrm{T}$.
234B Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : X \to Y$ an inverse-measure-preserving function.
(a) If $\hat\mu$, $\hat\nu$ are the completions of $\mu$, $\nu$ respectively, $\phi$ is also inverse-measure-preserving for $\hat\mu$ and $\hat\nu$.
(b) $\mu$ is a probability measure iff $\nu$ is a probability measure.
(c) $\mu$ is totally finite iff $\nu$ is totally finite.
(d)(i) If $\nu$ is $\sigma$-finite, then $\mu$ is $\sigma$-finite.
(ii) If $\nu$ is semi-finite and $\mu$ is $\sigma$-finite, then $\nu$ is $\sigma$-finite.
(e)(i) If $\nu$ is $\sigma$-finite and atomless, then $\mu$ is atomless.
(ii) If $\nu$ is semi-finite and $\mu$ is purely atomic, then $\nu$ is purely atomic.
(f)(i) $\mu^*\phi^{-1}[B] \le \nu^*B$ for every $B \subseteq Y$.
(ii) $\mu^*A \le \nu^*\phi[A]$ for every $A \subseteq X$.
(g) If $(Z, \Lambda, \lambda)$ is another measure space, and $\psi : Y \to Z$ is inverse-measure-preserving, then $\psi\phi : X \to Z$ is inverse-measure-preserving.
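proof (a) If $\hat\nu$ measures $F$, there are $F', F'' \in \mathrm{T}$ such that $F' \subseteq F \subseteq F''$ and $\nu(F'' \setminus F') = 0$. Now
$$\phi^{-1}[F'] \subseteq \phi^{-1}[F] \subseteq \phi^{-1}[F''], \quad \mu(\phi^{-1}[F''] \setminus \phi^{-1}[F']) = \nu(F'' \setminus F') = 0,$$
so $\hat\mu$ measures $\phi^{-1}[F]$ and
$$\hat\mu(\phi^{-1}[F]) = \mu\phi^{-1}[F'] = \nu F' = \hat\nu F.$$
As $F$ is arbitrary, $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\hat\nu$.
(b)-(c) are surely obvious.
(d)(i) If $\langle F_n\rangle_{n\in\mathbb{N}}$ is a cover of $Y$ by sets of finite measure for $\nu$, then $\langle\phi^{-1}[F_n]\rangle_{n\in\mathbb{N}}$ is a cover of $X$ by sets of finite measure for $\mu$.
(ii) Let $\mathcal{F} \subseteq \mathrm{T}$ be a disjoint family of non-$\nu$-negligible sets. Then $\langle\phi^{-1}[F]\rangle_{F\in\mathcal{F}}$ is a disjoint family of non-$\mu$-negligible sets. By 215B(iii), $\mathcal{F}$ is countable. By 215B(iii) in the opposite direction, $\nu$ is $\sigma$-finite.
(e)(i) Suppose that $E \in \Sigma$ and $\mu E > 0$. Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a cover of $Y$ by sets of finite measure for $\nu$. Because $\nu$ is atomless, we can find, for each $n$, a finite partition $\langle F_{ni}\rangle_{i\in I_n}$ of $F_n$ such that $\nu F_{ni} < \mu E$ for every $i \in I_n$ (use 215D repeatedly). Now $X = \bigcup_{n\in\mathbb{N},\,i\in I_n} \phi^{-1}[F_{ni}]$, so there are $n \in \mathbb{N}$ and $i \in I_n$ with
$$0 < \mu(E \cap \phi^{-1}[F_{ni}]) \le \mu\phi^{-1}[F_{ni}] = \nu F_{ni} < \mu E,$$
and $E$ is not a $\mu$-atom. As $E$ is arbitrary, $\mu$ is atomless.
(ii) Suppose that $F \in \mathrm{T}$ and $\nu F > 0$. Because $\nu$ is semi-finite, there is an $F_1 \subseteq F$ such that $0 < \nu F_1 < \infty$. Now $\mu\phi^{-1}[F_1] > 0$; because $\mu$ is purely atomic, there is a $\mu$-atom $E \subseteq \phi^{-1}[F_1]$.
Let $\mathcal{G}$ be the set of those $G \in \mathrm{T}$ such that $G \subseteq F_1$ and $\mu(E \cap \phi^{-1}[G]) = 0$. Then the union of any sequence in $\mathcal{G}$ belongs to $\mathcal{G}$, so by 215Ac there is an $H \in \mathcal{G}$ such that $\nu(G \setminus H) = 0$ whenever $G \in \mathcal{G}$. Consider $F_1 \setminus H$. We have
$$\nu(F_1 \setminus H) = \mu(\phi^{-1}[F_1] \setminus \phi^{-1}[H]) \ge \mu(E \setminus \phi^{-1}[H]) = \mu E > 0.$$
If $G \in \mathrm{T}$ and $G \subseteq F_1 \setminus H$, then one of $E \cap \phi^{-1}[G]$, $E \setminus \phi^{-1}[G]$ is $\mu$-negligible. In the former case, $G \in \mathcal{G}$ and $G = G \setminus H$ is $\nu$-negligible. In the latter case, $F_1 \setminus G \in \mathcal{G}$ and $(F_1 \setminus H) \setminus G$ is $\nu$-negligible. As $G$ is arbitrary, $F_1 \setminus H$ is a $\nu$-atom included in $F$; as $F$ is arbitrary, $\nu$ is purely atomic.
(f)(i) Let $F \in \mathrm{T}$ be such that $B \subseteq F$ and $\nu^*B = \nu F$ (132Aa); then $\phi^{-1}[B] \subseteq \phi^{-1}[F]$ so
$$\mu^*\phi^{-1}[B] \le \mu\phi^{-1}[F] = \nu F = \nu^*B.$$
(ii) $\mu^*A \le \mu^*(\phi^{-1}[\phi[A]]) \le \nu^*\phi[A]$ by (i).
(g) For any $W \in \Lambda$,
$$\mu(\psi\phi)^{-1}[W] = \mu\phi^{-1}[\psi^{-1}[W]] = \nu\psi^{-1}[W] = \lambda W.$$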
234C Image measures The following construction is one of the commonest ways in which new measure spaces appear.
Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ any set, and $\phi : X \to Y$ a function. Set
$$\mathrm{T} = \{F : F \subseteq Y,\ \phi^{-1}[F] \in \Sigma\}, \quad \nu F = \mu(\phi^{-1}[F]) \text{ for every } F \in \mathrm{T}.$$
Then $(Y, \mathrm{T}, \nu)$ is a measure space.
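proof (a) $\emptyset = \phi^{-1}[\emptyset] \in \Sigma$ so $\emptyset \in \mathrm{T}$.
(b) If $F \in \mathrm{T}$, then $\phi^{-1}[F] \in \Sigma$, so $X \setminus \phi^{-1}[F] \in \Sigma$; but $X \setminus \phi^{-1}[F] = \phi^{-1}[Y \setminus F]$, so $Y \setminus F \in \mathrm{T}$.
(c) If $\langle F_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathrm{T}$, then $\phi^{-1}[F_n] \in \Sigma$ for every $n$, so $\bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n] \in \Sigma$; but $\phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] = \bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]$, so $\bigcup_{n\in\mathbb{N}} F_n \in \mathrm{T}$.
Thus $\mathrm{T}$ is a $\sigma$-algebra.
(d) $\nu\emptyset = \mu\phi^{-1}[\emptyset] = \mu\emptyset = 0$.
(e) If $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\mathrm{T}$, then $\langle\phi^{-1}[F_n]\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, so
$$\nu\Bigl(\bigcup_{n\in\mathbb{N}} F_n\Bigr) = \mu\phi^{-1}\Bigl[\bigcup_{n\in\mathbb{N}} F_n\Bigr] = \mu\Bigl(\bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]\Bigr) = \sum_{n=0}^\infty \mu\phi^{-1}[F_n] = \sum_{n=0}^\infty \nu F_n.$$
So $\nu$ is a measure.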
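For a measure given by point masses on a finite set, where every subset is measurable, the image-measure construction amounts to adding up the masses of the points sent to each value. The following sketch illustrates this special case only; `image_measure`, `measure` and the example map are hypothetical names and data.

```python
from fractions import Fraction

def image_measure(mass, phi):
    """Point masses of the image measure: the mass at y is mu(phi^{-1}[{y}])."""
    image = {}
    for x, m in mass.items():
        y = phi(x)
        image[y] = image.get(y, Fraction(0)) + m
    return image

def measure(mass, subset):
    """Measure of `subset` for the measure with the given point masses."""
    return sum((m for x, m in mass.items() if x in subset), Fraction(0))

# Hypothetical example: uniform measure on {0,...,5} pushed forward by x -> x mod 3.
mu = {x: Fraction(1, 6) for x in range(6)}
phi = lambda x: x % 3
nu = image_measure(mu, phi)

F = {0, 2}
preimage_of_F = {x for x in mu if phi(x) in F}
```

The identity $\nu F = \mu(\phi^{-1}[F])$ can then be confirmed by comparing `measure(nu, F)` with `measure(mu, preimage_of_F)`.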
234D Definition In the context of 234C, $\nu$ is called the image measure or push-forward measure; I will denote it $\mu\phi^{-1}$.
Remark I ought perhaps to say that this construction does not always produce exactly the "right" measure on $Y$; there are circumstances in which some modification of the measure $\mu\phi^{-1}$ described here is more useful. But I will note these explicitly when they occur; when I use the unadorned phrase "image measure" I shall mean the measure constructed above.
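234E Proposition Let $(X, \Sigma, \mu)$ be a measure space, $Y$ a set and $\phi : X \to Y$ a function; let $\mu\phi^{-1}$ be the image measure on $Y$.
(a) $\phi$ is inverse-measure-preserving for $\mu$ and $\mu\phi^{-1}$.
(b) If $\mu$ is complete, so is $\mu\phi^{-1}$.
(c) If $Z$ is another set, and $\psi : Y \to Z$ a function, then the image measures $\mu(\psi\phi)^{-1}$ and $(\mu\phi^{-1})\psi^{-1}$ on $Z$ are the same.
proof (a) Immediate from the definitions.
(b) Write $\nu$ for $\mu\phi^{-1}$ and $\mathrm{T}$ for its domain. If $\nu^*B = 0$, then $\mu^*\phi^{-1}[B] = 0$, by 234B(f-i); as $\mu$ is complete, $\phi^{-1}[B] \in \Sigma$, so $B \in \mathrm{T}$. As $B$ is arbitrary, $\nu$ is complete.
(c) For $G \subseteq Z$ and $u \in [0,\infty]$,
$$(\mu(\psi\phi)^{-1})(G) \text{ is defined and equal to } u$$
$$\iff \mu((\psi\phi)^{-1}[G]) \text{ is defined and equal to } u$$
$$\iff \mu(\phi^{-1}[\psi^{-1}[G]]) \text{ is defined and equal to } u$$
$$\iff (\mu\phi^{-1})(\psi^{-1}[G]) \text{ is defined and equal to } u$$
$$\iff ((\mu\phi^{-1})\psi^{-1})(G) \text{ is defined and equal to } u.$$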
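Part (c) says that pushing forward in two steps gives the same measure as pushing forward by the composite map. In the finite point-mass setting this can be checked directly; the sketch below is an illustration under assumed data, and `image_measure`, `phi` and `psi` are hypothetical names.

```python
from fractions import Fraction

def image_measure(mass, phi):
    """Point masses of the image measure: the mass at y is mu(phi^{-1}[{y}])."""
    image = {}
    for x, m in mass.items():
        y = phi(x)
        image[y] = image.get(y, Fraction(0)) + m
    return image

# Hypothetical example: mass (x + 1) / 78 at each x in {0,...,11}, total mass 1.
mu = {x: Fraction(x + 1, 78) for x in range(12)}
phi = lambda x: x % 6      # X -> Y
psi = lambda y: y % 2      # Y -> Z

two_steps = image_measure(image_measure(mu, phi), psi)
one_step = image_measure(mu, lambda x: psi(phi(x)))
```

Both routes give mass $6/13$ to the even class and $7/13$ to the odd class.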
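*234F In the opposite direction, the following construction of a pull-back measure is sometimes useful.
Proposition Let $X$ be a set, $(Y, \mathrm{T}, \nu)$ a measure space, and $\phi : X \to Y$ a function such that $\phi[X]$ has full outer measure in $Y$. Then there is a measure $\mu$ on $X$, with domain $\Sigma = \{\phi^{-1}[F] : F \in \mathrm{T}\}$, such that $\phi$ is inverse-measure-preserving for $\mu$ and $\nu$.
proof The check that $\Sigma$ is a $\sigma$-algebra of subsets of $X$ is straightforward; all we need to know is that $\phi^{-1}[\emptyset] = \emptyset$, $X \setminus \phi^{-1}[F] = \phi^{-1}[Y \setminus F]$ for every $F \subseteq Y$, and that $\phi^{-1}[\bigcup_{n\in\mathbb{N}} F_n] = \bigcup_{n\in\mathbb{N}} \phi^{-1}[F_n]$ for every sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of subsets of $Y$. The key fact is that if $F_1, F_2 \in \mathrm{T}$ and $\phi^{-1}[F_1] = \phi^{-1}[F_2]$, then $\phi[X]$ does not meet $F_1 \triangle F_2$; because $\phi[X]$ has full outer measure, $F_1 \triangle F_2$ is $\nu$-negligible and $\nu F_1 = \nu F_2$. Accordingly the formula $\mu\phi^{-1}[F] = \nu F$ does define a function $\mu : \Sigma \to [0,\infty]$. Now
$$\mu\emptyset = \mu\phi^{-1}[\emptyset] = \nu\emptyset = 0.$$
Next, if $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$, choose $F_n \in \mathrm{T}$ such that $E_n = \phi^{-1}[F_n]$ for each $n \in \mathbb{N}$. The sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ need not be disjoint, but if we set $F'_n = F_n \setminus \bigcup_{i<n} F_i$ for each $n \in \mathbb{N}$, then $\langle F'_n\rangle_{n\in\mathbb{N}}$ is disjoint and
$$E_n = E_n \setminus \bigcup_{i<n} E_i = \phi^{-1}[F'_n]$$
for each $n$; so
$$\mu\Bigl(\bigcup_{n\in\mathbb{N}} E_n\Bigr) = \nu\Bigl(\bigcup_{n\in\mathbb{N}} F'_n\Bigr) = \sum_{n=0}^\infty \nu F'_n = \sum_{n=0}^\infty \mu E_n.$$
As $\langle E_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\mu$ is a measure on $X$, as required.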
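234G Sums of measures I come now to a quite different way of building measures. The idea is an obvious one, but the technical details, in the general case I wish to examine, need watching.
Proposition Let $X$ be a set, and $\langle\mu_i\rangle_{i\in I}$ a family of measures on $X$. For each $i \in I$, let $\Sigma_i$ be the domain of $\mu_i$. Set $\Sigma = \mathcal{P}X \cap \bigcap_{i\in I} \Sigma_i$ and define $\mu : \Sigma \to [0,\infty]$ by setting $\mu E = \sum_{i\in I} \mu_i E$ for every $E \in \Sigma$. Then $\mu$ is a measure on $X$.
proof $\Sigma$ is a $\sigma$-algebra of subsets of $X$ because every $\Sigma_i$ is. (Apply 111Ga with $\mathfrak{S} = \{\Sigma_i : i \in I\} \cup \{\mathcal{P}X\}$.) Of course $\mu$ takes values in $[0,\infty]$ (226A). $\mu\emptyset = 0$ because $\mu_i\emptyset = 0$ for every $i$. If $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ with union $E$, then
$$\mu E = \sum_{i\in I} \mu_i E = \sum_{i\in I} \sum_{n=0}^\infty \mu_i E_n = \sum_{n=0}^\infty \sum_{i\in I} \mu_i E_n$$
(226Af)
$$= \sum_{n=0}^\infty \mu E_n.$$
So $\mu$ is a measure.
Remark In this context, I will call $\mu$ the sum of the family $\langle\mu_i\rangle_{i\in I}$.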
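A finite instance of the sum construction may help fix ideas. The sketch below assumes a finite index set and measures given by point masses on a common finite set, so that every subset is measurable and no infinite sums arise; `family`, `measure` and `sum_measure` are hypothetical names.

```python
from fractions import Fraction

def measure(mass, subset):
    """Measure of `subset` for the measure with the given point masses."""
    return sum((m for x, m in mass.items() if x in subset), Fraction(0))

# Two measures on X = {0, 1, 2, 3}, each described by its point masses.
family = {
    "a": {0: Fraction(1), 1: Fraction(2)},
    "b": {1: Fraction(1, 2), 2: Fraction(3)},
}

def sum_measure(family, subset):
    """mu E = sum over i of mu_i E, for the (finite) family of measures."""
    return sum((measure(m, subset) for m in family.values()), Fraction(0))
```

Additivity on disjoint sets, and the fact that a set is negligible for the sum exactly when it is negligible for every member of the family, can be read off from small cases such as `sum_measure(family, {3})`.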
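234H Proposition Let $X$ be a set and $\langle\mu_i\rangle_{i\in I}$ a family of complete measures on $X$ with sum $\mu$.
(a) $\mu$ is complete.
(b)(i) A subset of $X$ is $\mu$-negligible iff it is $\mu_i$-negligible for every $i \in I$.
(ii) A subset of $X$ is $\mu$-conegligible iff it is $\mu_i$-conegligible for every $i \in I$.
(c) Let $f$ be a function from a subset of $X$ to $[-\infty,\infty]$. Then $\int f\,d\mu$ is defined in $[-\infty,\infty]$ iff $\int f\,d\mu_i$ is defined in $[-\infty,\infty]$ for every $i$ and one of $\sum_{i\in I} \int f^+\,d\mu_i$, $\sum_{i\in I} \int f^-\,d\mu_i$ is finite, and in this case $\int f\,d\mu = \sum_{i\in I} \int f\,d\mu_i$.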
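proof Write $\Sigma_i = \operatorname{dom}\mu_i$ for $i \in I$, $\Sigma = \mathcal{P}X \cap \bigcap_{i\in I} \Sigma_i = \operatorname{dom}\mu$.
(a) If $E \subseteq F \in \Sigma$ and $\mu F = 0$, then $\mu_i F = 0$ for every $i \in I$; because $\mu_i$ is complete, $E \in \Sigma_i$ for every $i \in I$, and $E \in \Sigma$.
(b) This now follows at once, since a set $A \subseteq X$ is $\mu$-negligible iff $\mu A = 0$.
(c)(i) Note first that (b-ii) tells us that, under either hypothesis, $\operatorname{dom} f$ is conegligible both for $\mu$ and for every $\mu_i$, so that if we extend $f$ to $X$ by giving it the value 0 on $X \setminus \operatorname{dom} f$ then neither $\int f\,d\mu$ nor $\sum_{i\in I} \int f\,d\mu_i$ is affected. So let us assume from now on that $f$ is defined everywhere on $X$. Now it is plain that either hypothesis ensures that $f$ is $\Sigma$-measurable, that is, is $\Sigma_i$-measurable for every $i \in I$.
(ii) Suppose that $f$ is non-negative. For $n \in \mathbb{N}$ set $f_n = \sum_{k=1}^{4^n} 2^{-n}\chi\{x : f(x) \ge 2^{-n}k\}$, so that $\langle f_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with supremum $f$. We have
$$\int f_n\,d\mu = \sum_{k=1}^{4^n} 2^{-n}\mu\{x : f(x) \ge 2^{-n}k\} = \sum_{k=1}^{4^n} \sum_{i\in I} 2^{-n}\mu_i\{x : f(x) \ge 2^{-n}k\}$$
$$= \sum_{i\in I} \sum_{k=1}^{4^n} 2^{-n}\mu_i\{x : f(x) \ge 2^{-n}k\} = \sum_{i\in I} \int f_n\,d\mu_i$$
for every $n$, so
$$\int f\,d\mu = \sup_{n\in\mathbb{N}} \int f_n\,d\mu = \sup_{n\in\mathbb{N}}\ \sup_{J\subseteq I \text{ is finite}}\ \sum_{i\in J} \int f_n\,d\mu_i$$
$$= \sup_{J\subseteq I \text{ is finite}}\ \sup_{n\in\mathbb{N}}\ \sum_{i\in J} \int f_n\,d\mu_i = \sup_{J\subseteq I \text{ is finite}}\ \lim_{n\to\infty} \sum_{i\in J} \int f_n\,d\mu_i$$
$$= \sup_{J\subseteq I \text{ is finite}}\ \sum_{i\in J} \lim_{n\to\infty} \int f_n\,d\mu_i = \sum_{i\in I} \int f\,d\mu_i.$$
(iii) Generally,
$$\int f\,d\mu \text{ is defined in } [-\infty,\infty]$$
$$\iff \int f^+\,d\mu \text{ and } \int f^-\,d\mu \text{ are defined and at most one is infinite}$$
$$\iff \sum_{i\in I} \int f^+\,d\mu_i \text{ and } \sum_{i\in I} \int f^-\,d\mu_i \text{ are defined and at most one is infinite}$$
$$\iff \int f\,d\mu_i \text{ is defined for every } i \text{ and at most one of } \sum_{i\in I} \int f^+\,d\mu_i,\ \sum_{i\in I} \int f^-\,d\mu_i \text{ is infinite},$$
and in this case
$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu = \sum_{i\in I} \int f^+\,d\mu_i - \sum_{i\in I} \int f^-\,d\mu_i = \sum_{i\in I} \int f\,d\mu_i.$$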
234I Indefinite-integral measures Extending an idea already used in 232D, we are led to the following construction; once again, we need to take care over the formal details if we want to get full value from it.
Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $f$ a non-negative $\mu$-virtually measurable real-valued function defined on a conegligible subset of $X$. Write $\nu F = \int f \times \chi F\,d\mu$ whenever $F \subseteq X$ is such that the integral is defined in $[0,\infty]$ according to the conventions of 133A. Then $\nu$ is a complete measure on $X$, and its domain includes $\Sigma$.
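proof (a) Write $\mathrm T$ for the domain of $\nu$, that is, the family of sets $F\subseteq X$ such that $\int f\times\chi F\,d\mu$ is defined in $[0,\infty]$, that is, $f\times\chi F$ is $\mu$-virtually measurable (133A). Then $\mathrm T$ is a $\sigma$-algebra of subsets of $X$. PPP For each $F\in\mathrm T$ let $H_F\subseteq X$ be a $\mu$-conegligible set such that $f\times\chi F{\restriction}H_F$ is $\Sigma$-measurable. Because $f$ itself is $\mu$-virtually measurable, $X\in\mathrm T$. If $F\in\mathrm T$, then
$$f\times\chi(X\setminus F){\restriction}(H_X\cap H_F)=f{\restriction}(H_X\cap H_F)-(f\times\chi F){\restriction}(H_X\cap H_F)$$
is $\Sigma$-measurable, while $H_X\cap H_F$ is $\mu$-conegligible, so $X\setminus F\in\mathrm T$. If $\langle F_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathrm T$ with union $F$, set $H=\bigcap_{n\in\mathbb N}H_{F_n}$; then $H$ is conegligible, $f\times\chi F_n{\restriction}H$ is $\Sigma$-measurable for every $n\in\mathbb N$, and $f\times\chi F=\sup_{n\in\mathbb N}f\times\chi F_n$, so $f\times\chi F{\restriction}H$ is $\Sigma$-measurable, and $F\in\mathrm T$. Thus $\mathrm T$ is a $\sigma$-algebra. If $F\in\Sigma$, then $f\times\chi F{\restriction}H_X$ is $\Sigma$-measurable, so $F\in\mathrm T$. QQQ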
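(b) Next, $\nu$ is a measure. PPP Of course $\nu F\in[0,\infty]$ for every $F\in\mathrm T$. $f\times\chi\emptyset=0$ wherever it is defined, so $\nu\emptyset=0$. If $\langle F_n\rangle_{n\in\mathbb N}$ is a disjoint sequence in $\mathrm T$ with union $F$, then $f\times\chi F=\sum_{n=0}^\infty f\times\chi F_n$. If $\nu F_m=\infty$ for some $m$, then we surely have $\nu F=\infty=\sum_{n=0}^\infty\nu F_n$. If $\nu F_m<\infty$ for each $m$ but $\sum_{n=0}^\infty\nu F_n=\infty$, then
$$\int f\times\chi\Bigl(\bigcup_{n\le m}F_n\Bigr)=\sum_{n=0}^m\int f\times\chi F_n\to\infty$$
as $m\to\infty$, so again $\nu F=\infty=\sum_{n=0}^\infty\nu F_n$. If $\sum_{n=0}^\infty\nu F_n<\infty$ then by B.Levi's theorem
$$\nu F=\int\sum_{n=0}^\infty f\times\chi F_n=\sum_{n=0}^\infty\int f\times\chi F_n=\sum_{n=0}^\infty\nu F_n.\ \text{QQQ}$$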
(c) Finally, $\nu$ is complete. PPP If $A\subseteq F\in\mathrm T$ and $\nu F=0$, then $f\times\chi F=0$ a.e., so $f\times\chi A=0$ a.e. and $\nu A$ is defined and equal to zero. QQQ
234J Definition Let $(X,\Sigma,\mu)$ be a measure space, and $\nu$ another measure on $X$ with domain $\mathrm T$. I will call $\nu$ an indefinite-integral measure over $\mu$, or sometimes a completed indefinite-integral measure, if it can be obtained by the method of 234I from some non-negative virtually measurable function $f$ defined almost everywhere on $X$. In this case, $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$ in the sense of 232Hf. As in 232Hf, the phrase density function is also used in this context.
234K Remarks Let $(X,\Sigma,\mu)$ be a measure space, and $f$ a $\mu$-virtually measurable non-negative real-valued function defined almost everywhere on $X$; let $\nu$ be the associated indefinite-integral measure.
(a) There is a $\Sigma$-measurable function $g:X\to[0,\infty[$ such that $f=g$ $\mu$-a.e. PPP Let $H\subseteq\operatorname{dom}f$ be a measurable conegligible set such that $f{\restriction}H$ is measurable, and set $g(x)=f(x)$ for $x\in H$, $g(x)=0$ for $x\in X\setminus H$. QQQ In this case, $\int f\times\chi E\,d\mu=\int g\times\chi E\,d\mu$ if either is defined. So $g$ is a Radon-Nikodým derivative of $\nu$, and $\nu$ has a Radon-Nikodým derivative which is $\Sigma$-measurable and defined everywhere.
(b) If $E$ is $\mu$-negligible, then $f\times\chi E=0$ $\mu$-a.e., so $\nu E=0$. Many authors are prepared to say '$\nu$ is absolutely continuous with respect to $\mu$' in this context. But if $\nu$ is not totally finite, it need not be absolutely continuous in the $\epsilon$-$\delta$ sense of 232Aa (234Xh), and further difficulties can arise if $\mu$ or $\nu$ is not $\sigma$-finite (see 234Yk, 234Ym).
(c) I have defined 'indefinite-integral measure' in such a way as to produce a complete measure. In my view this is what makes best sense in most applications. There are occasions on which it seems more appropriate to use the measure $\nu_0:\Sigma\to[0,\infty]$ defined by setting $\nu_0E=\int_Ef\,d\mu=\int f\times\chi E\,d\mu$ for $E\in\Sigma$. I suppose I would call this the 'uncompleted indefinite-integral measure' over $\mu$ defined by $f$. ($\nu$ is always the completion of $\nu_0$; see 234Lb.)
(d) Note the way in which I formulated the definition of $\nu$: '$\nu E=\int f\times\chi E\,d\mu$ if the integral is defined', rather than '$\nu E=\int_Ef\,d\mu$'. The point is that the longer formula gives a rule for deciding what the domain of $\nu$ must be. Of course it is the case that $\nu E=\int_Ef\,d\mu$ for every $E\in\operatorname{dom}\nu$ (apply 214F to $f\times\chi E$).
(e) Because $\mu$ and its completion define the same virtually measurable functions, the same null ideals and the same integrals (212Eb, 212F), they define the same indefinite-integral measures.
234L The domain of an indefinite-integral measure It is sometimes useful to have an explicit description of the domain of a measure constructed in this way.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, $f$ a non-negative $\mu$-virtually measurable function defined almost everywhere in $X$, and $\nu$ the associated indefinite-integral measure. Set $G=\{x:x\in\operatorname{dom}f,\ f(x)>0\}$, and let $\hat\Sigma$ be the domain of the completion $\hat\mu$ of $\mu$.
(a) The domain $\mathrm T$ of $\nu$ is $\{E:E\subseteq X,\ E\cap G\in\hat\Sigma\}$; in particular, $\mathrm T\supseteq\hat\Sigma\supseteq\Sigma$.
(b) $\nu$ is the completion of its restriction to $\Sigma$.
(c) A set $A\subseteq X$ is $\nu$-negligible iff $A\cap G$ is $\mu$-negligible.
(d) In particular, if $\mu$ itself is complete, then $\mathrm T=\{E:E\subseteq X,\ E\cap G\in\Sigma\}$ and $\nu A=0$ iff $\mu(A\cap G)=0$.
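proof (a)(i) If $E\in\mathrm T$, then $f\times\chi E$ is virtually measurable, so there is a conegligible measurable set $H\subseteq\operatorname{dom}f$ such that $f\times\chi E{\restriction}H$ is measurable. Now $E\cap G\cap H=\{x:x\in H,\ (f\times\chi E)(x)>0\}$ must belong to $\Sigma$, while $E\cap G\setminus H$ is negligible, so belongs to $\hat\Sigma$, and $E\cap G\in\hat\Sigma$.
(ii) If $E\cap G\in\hat\Sigma$, let $F_1$, $F_2\in\Sigma$ be such that $F_1\subseteq E\cap G\subseteq F_2$ and $F_2\setminus F_1$ is negligible. Let $H\subseteq\operatorname{dom}f$ be a conegligible set such that $f{\restriction}H$ is measurable. Then $H'=H\setminus(F_2\setminus F_1)$ is conegligible and $f\times\chi E{\restriction}H'=f\times\chi F_1{\restriction}H'$ is measurable, so $f\times\chi E$ is virtually measurable and $E\in\mathrm T$.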


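(b) Thus the given formula does indeed describe $\mathrm T$. If $E\in\mathrm T$, let $F_1$, $F_2\in\Sigma$ be such that $F_1\subseteq E\cap G\subseteq F_2$ and $\mu(F_2\setminus F_1)=0$. Because $G$ itself also belongs to $\hat\Sigma$, there are $G_1$, $G_2\in\Sigma$ such that $G_1\subseteq G\subseteq G_2$ and $\mu(G_2\setminus G_1)=0$. Set $F_2'=F_2\cup(X\setminus G_1)$. Then $F_2'\in\Sigma$ and $F_1\subseteq E\subseteq F_2'$. But $(F_2'\setminus F_1)\cap G\subseteq(G_2\setminus G_1)\cup(F_2\setminus F_1)$ is $\mu$-negligible, so $\nu(F_2'\setminus F_1)=0$.
This shows that if $\hat\nu$ is the completion of $\nu{\restriction}\Sigma$ and $\hat{\mathrm T}$ is its domain, then $\mathrm T\subseteq\hat{\mathrm T}$. But as $\nu$ is complete, it surely extends $\hat\nu$, so $\nu=\hat\nu$, as claimed.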
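(c) Now take any $A\subseteq X$. Because $\nu$ is complete,
$$\begin{aligned}
A\text{ is }\nu\text{-negligible}&\iff\nu A=0\\
&\iff\int f\times\chi A\,d\mu=0\\
&\iff f\times\chi A=0\ \mu\text{-a.e.}\\
&\iff A\cap G\text{ is }\mu\text{-negligible.}
\end{aligned}$$
(d) This is just a restatement of (a) and (c) when $\mu=\hat\mu$.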
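To make part (c) concrete, the following Python sketch checks the equivalence '$\nu A=0$ iff $\mu(A\cap G)=0$' by brute force. It assumes a deliberately trivial setting which is mine rather than the proposition's: a four-point set on which every subset is measurable, so that $\nu A$ is just a finite weighted sum. The point names, weights and density values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

X = ["a", "b", "c", "d"]
# mu gives each point a weight; "b" is a mu-negligible point (assumed data).
mu_weights = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(2), "d": Fraction(1, 2)}
# A non-negative density f; it vanishes at "c" (assumed data).
density = {"a": Fraction(3), "b": Fraction(5), "c": Fraction(0), "d": Fraction(1, 4)}

def mu(A):
    return sum((mu_weights[x] for x in A), Fraction(0))

def nu(A):
    """nu A = integral of f * chi(A) d mu, which is a finite sum here."""
    return sum((density[x] * mu_weights[x] for x in A), Fraction(0))

# G = {x : f(x) > 0}
G = {x for x in X if density[x] > 0}

def subsets(points):
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            yield set(combo)

# 234Lc in this toy case: A is nu-negligible iff A & G is mu-negligible.
agrees = all((nu(A) == 0) == (mu(A & G) == 0) for A in subsets(X))
```

Note that the set consisting of "b" and "c" is $\nu$-negligible for two different reasons: "b" carries no $\mu$-mass, while "c" lies outside $G$.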
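234M Corollary If $(X,\Sigma,\mu)$ is a complete measure space and $G\in\Sigma$, then the indefinite-integral measure over $\mu$ defined by $\chi G$ is just the measure $\mu\llcorner G$ defined by setting
$$(\mu\llcorner G)(F)=\mu(F\cap G)\text{ whenever }F\subseteq X\text{ and }F\cap G\in\Sigma.$$
proof 234Ld.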
*234N The next two results will not be relied on in this volume, but I include them for future reference, and to
give an idea of the scope of these ideas.
Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $\nu$ an indefinite-integral measure over $\mu$.
(a) If $\mu$ is semi-finite, so is $\nu$.
(b) If $\mu$ is complete and locally determined, so is $\nu$.
(c) If $\mu$ is localizable, so is $\nu$.
(d) If $\mu$ is strictly localizable, so is $\nu$.
(e) If $\mu$ is $\sigma$-finite, so is $\nu$.
(f) If $\mu$ is atomless, so is $\nu$.
proof By 234Ka, we may express $\nu$ as the indefinite integral of a $\Sigma$-measurable function $f:X\to[0,\infty[$. Let $\mathrm T$ be the domain of $\nu$, and $\hat\Sigma$ the domain of the completion $\hat\mu$ of $\mu$; set $G=\{x:x\in X,\ f(x)>0\}\in\Sigma$.
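(a) Suppose that $E\in\mathrm T$ and that $\nu E=\infty$. Then $E\cap G$ cannot be $\mu$-negligible. Because $\mu$ is semi-finite, there is a non-negligible $F\in\Sigma$ such that $F\subseteq E\cap G$ and $\mu F<\infty$. Now $F=\bigcup_{n\in\mathbb N}\{x:x\in F,\ 2^{-n}\le f(x)\le n\}$, so there is an $n\in\mathbb N$ such that $F'=\{x:x\in F,\ 2^{-n}\le f(x)\le n\}$ is non-negligible. Because $f$ is measurable, $F'\in\Sigma\subseteq\mathrm T$ and $2^{-n}\mu F'\le\nu F'\le n\mu F'$. Thus we have found an $F'\subseteq E$ such that $0<\nu F'<\infty$. As $E$ is arbitrary, $\nu$ is semi-finite.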


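(b) We already know that $\nu$ is complete (234Lb) and semi-finite. Now suppose that $E\subseteq X$ is such that $E\cap F\in\mathrm T$, that is, $E\cap F\cap G\in\Sigma$ (234Ld), whenever $F\in\mathrm T$ and $\nu F<\infty$. Then $E\cap G\cap F\in\Sigma$ whenever $F\in\Sigma$ and $\mu F<\infty$. PPP Set $F_n=\{x:x\in F\cap G,\ f(x)\le n\}$. Then $\nu F_n\le n\mu F<\infty$, so $E\cap G\cap F_n\in\Sigma$ for every $n$. But this means that $E\cap G\cap F=\bigcup_{n\in\mathbb N}E\cap G\cap F_n\in\Sigma$. QQQ Because $\mu$ is locally determined, $E\cap G\in\Sigma$ and $E\in\mathrm T$. As $E$ is arbitrary, $\nu$ is locally determined.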
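(c) Let $\mathcal F\subseteq\mathrm T$ be any set. Set $\mathcal E=\{F\cap G:F\in\mathcal F\}$, so that $\mathcal E\subseteq\hat\Sigma$. By 212Ga, $\hat\mu$ is localizable, so $\mathcal E$ has an essential supremum $H\in\hat\Sigma$. But now, for any $H'\in\mathrm T$, $H'\cup(X\setminus G)=(H'\cap G)\cup(X\setminus G)$ belongs to $\hat\Sigma$, so
$$\begin{aligned}
\nu(F\setminus H')=0\text{ for every }F\in\mathcal F
&\iff\hat\mu(F\cap G\setminus H')=0\text{ for every }F\in\mathcal F\\
&\iff\hat\mu(E\setminus H')=0\text{ for every }E\in\mathcal E\\
&\iff\hat\mu(E\setminus(H'\cup(X\setminus G)))=0\text{ for every }E\in\mathcal E\\
&\iff\hat\mu(H\setminus(H'\cup(X\setminus G)))=0\\
&\iff\hat\mu(H\cap G\setminus H')=0\\
&\iff\nu(H\setminus H')=0.
\end{aligned}$$
Thus $H$ is also an essential supremum of $\mathcal F$ in $\mathrm T$. As $\mathcal F$ is arbitrary, $\nu$ is localizable.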
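(d) Let $\langle X_i\rangle_{i\in I}$ be a decomposition of $X$ for $\mu$; then it is also a decomposition for $\hat\mu$ (212Gb). Set $F_0=X\setminus G$, $F_n=\{x:x\in G,\ n-1<f(x)\le n\}$ for $n\ge1$. Then $\langle X_i\cap F_n\rangle_{i\in I,n\in\mathbb N}$ is a decomposition for $\nu$. PPP (i) $\langle X_i\rangle_{i\in I}$ and $\langle F_n\rangle_{n\in\mathbb N}$ are partitions of $X$ into members of $\Sigma\subseteq\mathrm T$, so $\langle X_i\cap F_n\rangle_{i\in I,n\in\mathbb N}$ also is. (ii) $\nu(X_i\cap F_0)=0$, $\nu(X_i\cap F_n)\le n\mu X_i<\infty$ for $i\in I$, $n\ge1$. (iii) If $E\subseteq X$ and $E\cap X_i\cap F_n\in\mathrm T$ for every $i\in I$ and $n\in\mathbb N$ then $E\cap X_i\cap G=\bigcup_{n\in\mathbb N}E\cap X_i\cap F_n\cap G$ belongs to $\hat\Sigma$ for every $i$, so $E\cap G\in\hat\Sigma$ and $E\in\mathrm T$. (iv) If $E\in\mathrm T$, then of course
$$\sum_{i\in I,n\in\mathbb N}\nu(E\cap X_i\cap F_n)=\sup_{J\subseteq I\times\mathbb N\text{ is finite}}\ \sum_{(i,n)\in J}\nu(E\cap X_i\cap F_n)\le\nu E.$$
So if $\sum_{i\in I,n\in\mathbb N}\nu(E\cap X_i\cap F_n)=\infty$ it is surely equal to $\nu E$. If the sum is finite, then $K=\{i:i\in I,\ \nu(E\cap X_i)>0\}$ must be countable. But for $i\in I\setminus K$, $\int_{E\cap X_i}f\,d\mu=0$, so $f=0$ $\mu$-a.e. on $E\cap X_i$, that is, $\hat\mu(E\cap G\cap X_i)=0$. Because $\langle X_i\rangle_{i\in I}$ is a decomposition for $\hat\mu$, $\hat\mu(E\cap G\cap\bigcup_{i\in I\setminus K}X_i)=0$ and $\nu(E\cap\bigcup_{i\in I\setminus K}X_i)=0$. But this means that
$$\nu E=\sum_{i\in K}\nu(E\cap X_i)=\sum_{i\in K,n\in\mathbb N}\nu(E\cap X_i\cap F_n)=\sum_{i\in I,n\in\mathbb N}\nu(E\cap X_i\cap F_n).$$
As $E$ is arbitrary, $\langle X_i\cap F_n\rangle_{i\in I,n\in\mathbb N}$ is a decomposition for $\nu$. QQQ So $\nu$ is strictly localizable.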
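(e) If $\mu$ is $\sigma$-finite, then in (d) we can take $I$ to be countable, so that $I\times\mathbb N$ also is countable, and $\nu$ will be $\sigma$-finite.
(f) If $\mu$ is atomless, so is $\hat\mu$ (212Gd). If $E\in\mathrm T$ and $\nu E>0$, then $\hat\mu(E\cap G)>0$, so there is an $F\in\hat\Sigma$ such that $F\subseteq E\cap G$ and neither $F$ nor $E\cap G\setminus F$ is $\hat\mu$-negligible. But in this case both $\nu F=\int_Ff\,d\mu$ and $\nu(E\setminus F)=\int_{E\setminus F}f\,d\mu$ must be greater than $0$ (122Rc). As $E$ is arbitrary, $\nu$ is atomless.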
*234O For localizable measures, there is a straightforward description of the associated indefinite-integral measures.
Theorem Let $(X,\Sigma,\mu)$ be a localizable measure space. Then a measure $\nu$, with domain $\mathrm T\supseteq\Sigma$, is an indefinite-integral measure over $\mu$ iff ($\alpha$) $\nu$ is semi-finite and zero on $\mu$-negligible sets ($\beta$) $\nu$ is the completion of its restriction to $\Sigma$ ($\gamma$) whenever $\nu E>0$ there is an $F\subseteq E$ such that $F\in\Sigma$, $\mu F<\infty$ and $\nu F>0$.
proof (a) If $\nu$ is an indefinite-integral measure over $\mu$, then by 234Na, 234Kb and 234Lb it is semi-finite, zero on $\mu$-negligible sets and the completion of its restriction to $\Sigma$. Now suppose that $E\in\mathrm T$ and $\nu E>0$. Then there is an $E_0\in\Sigma$ such that $E_0\subseteq E$ and $\nu E_0=\nu E$, by 234Lb. If $f:X\to\mathbb R$ is a $\Sigma$-measurable Radon-Nikodým derivative of $\nu$ (234Ka), and $G=\{x:f(x)>0\}$, then $\mu(G\cap E_0)>0$; because $\mu$ is semi-finite, there is an $F\in\Sigma$ such that $F\subseteq G\cap E_0$ and $0<\mu F<\infty$, in which case $\nu F>0$.
(b) So now suppose that $\nu$ satisfies the conditions.
(i) Set $\mathcal E=\{E:E\in\Sigma,\ \nu E<\infty\}$. For each $E\in\mathcal E$, consider $\nu_E:\Sigma\to\mathbb R$, setting $\nu_EG=\nu(G\cap E)$ for every $G\in\Sigma$. Then $\nu_E$ is countably additive and truly continuous with respect to $\mu$. PPP $\nu_E$ is countably additive, just as in 231De. Because $\nu$ is zero on $\mu$-negligible sets, $\nu_E$ must be absolutely continuous with respect to $\mu$, by 232Ba. Since $\nu_E$ clearly satisfies condition ($\gamma$) of 232Bb, it must be truly continuous. QQQ
By 232E, there is a $\mu$-integrable function $f_E$ such that $\nu_EG=\int_Gf_E\,d\mu$ for every $G\in\Sigma$, and we may suppose that $f_E$ is $\Sigma$-measurable (232He). Because $\nu_E$ is non-negative, $f_E\ge0$ $\mu$-almost everywhere.
(ii) If $E$, $F\in\mathcal E$ then $f_E=f_F$ $\mu$-a.e. on $E\cap F$, because
$$\int_Gf_E\,d\mu=\nu G=\int_Gf_F\,d\mu$$
whenever $G\in\Sigma$ and $G\subseteq E\cap F$. Because $(X,\Sigma,\mu)$ is localizable, there is a measurable $f:X\to\mathbb R$ such that $f_E=f$ $\mu$-a.e. on $E$ for every $E\in\mathcal E$ (213N). Because every $f_E$ is non-negative almost everywhere, we may suppose that $f$ is non-negative, since surely $f_E=f\vee0$ $\mu$-a.e. on $E$ for every $E\in\mathcal E$.
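(iii) Let $\nu'$ be the indefinite-integral measure defined by $f$. If $E\in\mathcal E$ then
$$\nu E=\int_Ef_E\,d\mu=\int_Ef\,d\mu=\nu'E.$$
For $E\in\Sigma\setminus\mathcal E$, we have
$$\nu'E\ge\sup\{\nu'F:F\in\mathcal E,\ F\subseteq E\}=\sup\{\nu F:F\in\mathcal E,\ F\subseteq E\}=\nu E=\infty$$
because $\nu$ is semi-finite. Thus $\nu'$ and $\nu$ agree on $\Sigma$. But since each is the completion of its restriction to $\Sigma$, they must be equal.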
234P Ordering measures There are many ways in which one measure can dominate another. Here I will describe one of the simplest.
Definition Let $\mu$, $\nu$ be two measures on a set $X$. I will say that $\mu\le\nu$ if $\mu E$ is defined, and $\mu E\le\nu E$, whenever $\nu$ measures $E$.
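234Q Proposition Let $X$ be a set, and write $\mathrm M$ for the set of all measures on $X$.
(a) Defining $\le$ as in 234P, $(\mathrm M,\le)$ is a partially ordered set.
(b) If $\mu$, $\nu\in\mathrm M$, then $\mu\le\nu$ iff there is a $\lambda\in\mathrm M$ such that $\mu+\lambda=\nu$.
(c) If $\mu\le\nu$ in $\mathrm M$ and $f$ is a $[-\infty,\infty]$-valued function, defined on a subset of $X$, such that $\int f\,d\nu$ is defined in $[-\infty,\infty]$, then $\int f\,d\mu$ is defined; if $f$ is non-negative, $\int f\,d\mu\le\int f\,d\nu$.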
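proof (a) Of course $\mu\le\mu$ for every $\mu\in\mathrm M$. If $\mu\le\nu$ and $\nu\le\lambda$ in $\mathrm M$, then $\operatorname{dom}\lambda\subseteq\operatorname{dom}\nu\subseteq\operatorname{dom}\mu$, and $\mu E\le\nu E\le\lambda E$ whenever $\lambda$ measures $E$. If $\mu\le\nu$ and $\nu\le\mu$ then $\operatorname{dom}\mu\subseteq\operatorname{dom}\nu\subseteq\operatorname{dom}\mu$ and $\mu E\le\nu E\le\mu E$ for every $E$ in their common domain, so $\mu=\nu$.
(b)(i) If $\mu+\lambda=\nu$, then the definitions in 234G and 234P make it plain that $\mu\le\nu$.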
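(ii)($\alpha$) In the reverse direction, if $\mu\le\nu$, write $\mathrm T$ for the domain of $\nu$. Define $\lambda:\mathrm T\to[0,\infty]$ by setting
$$\lambda G=\sup\{\nu F-\mu F:F\in\mathrm T,\ F\subseteq G,\ \mu F<\infty\}$$
for $G\in\mathrm T$. Then $\lambda\in\mathrm M$. PPP Of course $\operatorname{dom}\lambda=\mathrm T$ is a $\sigma$-algebra, and $\lambda\emptyset=0$. Suppose that $\langle G_n\rangle_{n\in\mathbb N}$ is a disjoint sequence in $\mathrm T$ with union $G$. If $F\in\mathrm T$, $F\subseteq G$ and $\mu F<\infty$, then
$$\begin{aligned}
\nu F-\mu F&=\sum_{n=0}^\infty\nu(F\cap G_n)-\sum_{n=0}^\infty\mu(F\cap G_n)\\
&=\sum_{n=0}^\infty\bigl(\nu(F\cap G_n)-\mu(F\cap G_n)\bigr)\le\sum_{n=0}^\infty\lambda G_n;
\end{aligned}$$
as $F$ is arbitrary, $\lambda G\le\sum_{n=0}^\infty\lambda G_n$. If $\gamma<\sum_{n=0}^\infty\lambda G_n$, there are an $m\in\mathbb N$ such that $\gamma<\sum_{n=0}^m\lambda G_n$, and $F_0,\ldots,F_m$ such that $F_n\in\mathrm T$, $F_n\subseteq G_n$ and $\mu F_n<\infty$ for every $n\le m$, while $\gamma\le\sum_{n=0}^m(\nu F_n-\mu F_n)$. Set $F=\bigcup_{n\le m}F_n$; then $F\in\mathrm T$, $F\subseteq G$ and $\mu F<\infty$, so
$$\lambda G\ge\nu F-\mu F=\sum_{n=0}^m(\nu F_n-\mu F_n)\ge\gamma.$$
As $\gamma$ is arbitrary, $\lambda G\ge\sum_{n=0}^\infty\lambda G_n$; as $\langle G_n\rangle_{n\in\mathbb N}$ is arbitrary, $\lambda$ is countably additive. QQQ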
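($\beta$) Now $\mu+\lambda=\nu$. PPP The domain of $\mu+\lambda$ is $\operatorname{dom}\mu\cap\operatorname{dom}\lambda=\mathrm T=\operatorname{dom}\nu$. Take $G\in\mathrm T$. If $\mu G=\infty$, then $\nu G=\infty=(\mu+\lambda)G$. Otherwise,
$$(\mu+\lambda)G\ge\mu G+\nu G-\mu G=\nu G.$$
So if $\nu G=\infty$ we shall certainly have $\nu G=(\mu+\lambda)G$. Finally, if $\nu G<\infty$ then
$$\begin{aligned}
(\mu+\lambda)G&=\mu G+\sup\{\nu F-\mu F:F\in\mathrm T,\ F\subseteq G\}\\
&=\sup\{\nu F+\mu(G\setminus F):F\in\mathrm T,\ F\subseteq G\}\\
&\le\sup\{\nu F+\nu(G\setminus F):F\in\mathrm T,\ F\subseteq G\}=\nu G,
\end{aligned}$$
so again we have equality. QQQ
Thus we have an appropriate expression of $\nu$ as a sum of measures.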
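The construction of $\lambda$ in (b-ii) can be tried out on a very small example. The Python sketch below is only an illustration under assumptions of my own: a three-point set on which every subset is measurable, with point weights that may be infinite, so that the supremum defining $\lambda G$ is a maximum over finitely many sets $F$. The names `mu_w`, `nu_w`, `measure` and `lam`, and the particular weights, are hypothetical.

```python
from itertools import combinations
from math import inf

X = ["p", "q", "r"]
# Point weights for mu and nu, chosen so that mu <= nu; some are infinite.
mu_w = {"p": 1, "q": inf, "r": 0}
nu_w = {"p": 3, "q": inf, "r": inf}

def measure(weights, A):
    return sum(weights[x] for x in A)

def subsets(points):
    points = list(points)
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            yield frozenset(combo)

def lam(G):
    """lambda G = sup{nu F - mu F : F subset of G, mu F < infinity}."""
    # The empty set always qualifies, so the maximum is over a non-empty family,
    # and the restriction mu F < infinity rules out any 'inf - inf'.
    return max(measure(nu_w, F) - measure(mu_w, F)
               for F in subsets(G) if measure(mu_w, F) < inf)

dominated = all(measure(mu_w, E) <= measure(nu_w, E) for E in subsets(X))
decomposes = all(measure(mu_w, G) + lam(G) == measure(nu_w, G) for G in subsets(X))
```

The point "q", where both measures are infinite, shows why the definition restricts to sets $F$ with $\mu F<\infty$: there $\lambda$ takes the value $0$, and $\mu+\lambda$ still agrees with $\nu$.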
(c)(i) If $f$ is non-negative, put (b) and 234Hc together.
(ii) In general, if $\int f\,d\nu$ is defined, so are both $\int f^+d\nu$ and $\int f^-d\nu$, and at most one is infinite; so $\int f^+d\mu$ and $\int f^-d\mu$ are defined and at most one is infinite.


234X Basic exercises (a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. Let $A\subseteq X$ be a set of full outer measure in $X$. Show that $\phi[A]$ has full outer measure in $Y$, and that $\phi{\restriction}A$ is inverse-measure-preserving for the subspace measures on $A$ and $\phi[A]$.
(b) Let $(X,\Sigma,\mu)$ be a measure space, $Y$ a set and $\phi:X\to Y$ a function. Show that if $\mu$ is point-supported, so is the image measure $\mu\phi^{-1}$.
(c) Give an example of a probability space $(X,\Sigma,\mu)$, a set $Y$, and a function $\phi:X\to Y$ such that the completion of the image measure $\mu\phi^{-1}$ is not the image of the completion of $\mu$. (Hint: $\#(X)=3$.)
(d) Let $X$, $Y$ be sets, $\phi:X\to Y$ a function and $\langle\mu_i\rangle_{i\in I}$ a family of measures on $X$ with sum $\mu$. Writing $\mu_i\phi^{-1}$, $\mu\phi^{-1}$ for the image measures on $Y$, show that $\mu\phi^{-1}=\sum_{i\in I}\mu_i\phi^{-1}$.
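(e) Let $X$ be a set. (i) Show that if $\langle\mu_i\rangle_{i\in I}$ is a countable family of $\sigma$-finite measures on $X$, and $\mu=\sum_{i\in I}\mu_i$ is semi-finite, then $\mu$ is $\sigma$-finite. (ii) Show that if $\langle\mu_i\rangle_{i\in I}$ is a family of purely atomic measures on $X$, and $\mu=\sum_{i\in I}\mu_i$ is semi-finite, then $\mu$ is purely atomic. (iii) Show that if $\langle\mu_i\rangle_{i\in I}$ is any family of point-supported measures on $X$, then $\sum_{i\in I}\mu_i$ is point-supported.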
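>>>(f) Let $X$ be a set, and write $\mathrm M$ for the set of all measures on $X$. For $\mu\in\mathrm M$ and $\alpha\in[0,\infty[$, define $\alpha\mu$ by saying that if $\alpha>0$ then $(\alpha\mu)(E)=\alpha\mu E$ for $E\in\operatorname{dom}\mu$, while if $\alpha=0$ then $(\alpha\mu)(E)=0$ for every $E\subseteq X$. (i) Show that $\alpha\mu\in\mathrm M$ for all $\alpha\in[0,\infty[$ and $\mu\in\mathrm M$. (ii) Show that $\alpha(\mu+\nu)=\alpha\mu+\alpha\nu$, $\alpha(\beta\mu)=(\alpha\beta)\mu$, $(\alpha+\beta)\mu=\alpha\mu+\beta\mu$ for all $\alpha$, $\beta\in[0,\infty[$ and $\mu$, $\nu\in\mathrm M$.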
(g) Let $X$ be a set, and $\langle\mu_i\rangle_{i\in I}$ a family of complete measures on $X$ with sum $\mu$. Show that a $[-\infty,\infty]$-valued function $f$ defined on a subset of $X$ is $\mu$-integrable iff it is $\mu_i$-integrable for every $i\in I$ and $\sum_{i\in I}\int|f|\,d\mu_i$ is finite.
(h) Let be Lebesgue measure on [0, 1], and set f(x) =
1
x
for x > 0. Let be the associated indenite-integral
measure. Show that the domain of is equal to the domain of . Show that for every

0,
1
2

there is a measurable
set E such that E = but E =
1

.
(i) Let $(X,\Sigma,\mu)$ be a measure space. (i) Show that if $\nu_1$ and $\nu_2$ are indefinite-integral measures over $\mu$, so is $\nu_1+\nu_2$. (ii) Show that if $\langle\nu_i\rangle_{i\in I}$ is a countable family of indefinite-integral measures over $\mu$, and $\nu=\sum_{i\in I}\nu_i$ is semi-finite, then $\nu$ is an indefinite-integral measure over $\mu$.
(j) Let $(X,\Sigma,\mu)$ be a measure space, and $\nu$ an indefinite-integral measure over $\mu$. Show that if $\mu$ is purely atomic, so is $\nu$.
(k) Let $\mu$ be a point-supported measure. Show that any indefinite-integral measure over $\mu$ is point-supported.
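(l) Let $X$ be a set, and $\mathrm M$ the set of measures on $X$, with the partial ordering defined in 234P. Show that (i) $\mathrm M$ has greatest and least members (to be described); (ii) if $\langle\mu_i\rangle_{i\in I}$ and $\langle\nu_i\rangle_{i\in I}$ are families in $\mathrm M$ such that $\mu_i\le\nu_i$ for every $i$, then $\sum_{i\in I}\mu_i\le\sum_{i\in I}\nu_i$; (iii) if we define scalar multiplication as in 234Xf, then $\alpha\mu\le\mu$ whenever $\mu\in\mathrm M$ and $\alpha\in[0,1]$; (iv) writing $\hat\mu$ for the completion of $\mu$, $\hat\mu\le\mu$ and $\hat\mu\le\hat\nu$ whenever $\mu$, $\nu\in\mathrm M$ and $\mu\le\nu$; (v) writing $\tilde\mu$ for the c.l.d. version of $\mu$, $\tilde\mu\le\mu$ for every $\mu\in\mathrm M$; (vi) whenever $A\subseteq\mathrm M$ is upwards-directed, it has a least upper bound in $\mathrm M$.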
(m) Write out an elementary direct proof of 234Qc not depending on 234Qb.
234Y Further exercises (a) Write $\nu$ for Lebesgue measure on $Y=[0,1]$, and $\mathrm T$ for its domain. Let $A\subseteq[0,1]$ be a set such that $\nu^*A=\nu^*([0,1]\setminus A)=1$, and set $X=[0,1]\cup\{x+1:x\in A\}\cup\{x+2:x\in[0,1]\setminus A\}$. Let $\mu_{LX}$ be the subspace measure induced on $X$ by Lebesgue measure, and set $\mu E=\frac13\mu_{LX}E$ for $E\in\Sigma=\operatorname{dom}\mu_{LX}$. Define $\phi:X\to Y$ by writing $\phi(x)=x$ if $x\in[0,1]$, $\phi(x)=x-1$ if $x\in X\cap{]1,2]}$ and $\phi(x)=x-2$ if $x\in X\cap{]2,3]}$. Show that $\nu$ is the image measure $\mu\phi^{-1}$, but that $\nu^*A>\mu^*\phi^{-1}[A]$.
(b) Look for interesting examples of probability spaces $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ for which there are functions $\phi:X\to Y$ such that $\phi[E]\in\mathrm T$ and $\nu\phi[E]=\mu E$ for every $E\in\Sigma$. (Hint: 254K, 343J.)
(c) Let $\mu$ be two-dimensional Lebesgue measure on the unit square $[0,1]^2$, and let $\phi:[0,1]^2\to[0,1]$ be the projection onto the first coordinate, so that $\phi(\xi_1,\xi_2)=\xi_1$ for $\xi_1$, $\xi_2\in[0,1]$. Show that the image measure $\mu\phi^{-1}$ is Lebesgue one-dimensional measure on $[0,1]$.
(d) In 234F, show that the image measure
1
extends , and is equal to if and only if F T for every
F Y [X].
(e) Let (Y, T, ) be a complete measure space, X a set and : X Y a surjection. Set
= E : E X, [E] T, ([E] [X E]) = 0, E = [E] for E T.
Show that is the completion of the measure constructed by the process of 234F.
(f) Let $X$ be a set, and $\mathrm M$ the set of measures on $X$. Show that $\mathrm M$, with addition as defined for two measures by the formulae of 234G, is a commutative semigroup with identity; describe the identity.
(g) Give an example of a set $X$, probability measures $\mu_1$, $\mu_2$ on $X$ and a set $A\subseteq X$ such that $A$ is both $\mu_1$-negligible and $\mu_2$-negligible, but is not $\mu$-negligible, where $\mu=\mu_1+\mu_2$.
(h) In 214O, show that if we set E = sup
II

(E I) for every E , then is a measure, while = +.


(i) Let $(X,\Sigma,\mu)$ be an atomless semi-finite measure space and $\nu$ an indefinite-integral measure over $\mu$. Show that the following are equiveridical: (i) for every $\epsilon>0$ there is a $\delta>0$ such that $\nu E\le\epsilon$ whenever $\mu E\le\delta$ (ii) $\nu$ has a Radon-Nikodým derivative expressible as the sum of a bounded function and an integrable function.
(j) Let $(X,\Sigma,\mu)$ be a measure space and $\nu$ an indefinite-integral measure over $\mu$, with Radon-Nikodým derivative $f$. Show that the c.l.d. version of $\nu$ is the indefinite-integral measure defined by $f$ over the c.l.d. version of $\mu$.
(k) Let $(X,\Sigma,\mu)$ be a semi-finite measure space which is not localizable. Show that there is a measure $\nu:\Sigma\to[0,\infty]$ such that $\nu E\le\mu E$ for every $E\in\Sigma$ but there is no measurable function $f$ such that $\nu E=\int_Ef\,d\mu$ for every $E\in\Sigma$.
(l) Let $(X,\Sigma,\mu)$ be a localizable measure space with locally determined negligible sets. Show that a measure $\nu$, with domain $\mathrm T\supseteq\Sigma$, is an indefinite-integral measure over $\mu$ iff ($\alpha$) $\nu$ is complete and semi-finite and zero on $\mu$-negligible sets ($\beta$) whenever $\nu E>0$ there is an $F\subseteq E$ such that $F\in\Sigma$ and $\mu F<\infty$ and $\nu F>0$.
(m) Give an example of a localizable measure space $(X,\Sigma,\mu)$ and a complete semi-finite measure $\nu$ on $X$, defined on a $\sigma$-algebra $\mathrm T\supseteq\Sigma$, zero on $\mu$-negligible sets, and such that whenever $\nu E>0$ there is an $F\subseteq E$ such that $F\in\Sigma$ and $\mu F<\infty$ and $\nu F>0$, but $\nu$ is not an indefinite-integral measure over $\mu$. (Hint: 216Yb.)
(n) Let (X, , ) be a localizable measure space, and a complete localizable measure on X, with domain T ,
which is the completion of its restriction to . Show that if we set
1
F = sup(F E) : E , E < for every
F T, then
1
is an indenite-integral measure over , and there is an H such that
1
F = (F H) for every
F T.
(o) Let X be a set, and M
sf
the set of semi-nite measures on X. For , M
sf
say that if dom dom,
F F for every F dom, and whenever E dom and E > 0 there is an F dom such that F E and
0 < F < . (i) Show that (M
sf
, ) is a partially ordered set. (ii) Show that if A M
sf
is a non-empty set with an
upper bound in M
sf
, then it has a least upper bound dened by saying that dom =

A
dom and, for E dom,
E = sup
n

i=0

i
F
i
:
0
, . . . ,
n
A, F
i

in
is a partition of E,
F
i
dom for every i n
= sup
n

i=0

i
F
i
:
0
, . . . ,
n
A, F
0
, . . . , F
n
are disjoint,
F
i
dom
i
and F
i
E for every i n.
(iii) Suppose that , M
sf
have completions , and c.l.d. versions , . Show that . Show that if
then and .
234 Notes and comments One of the striking features of measure theory, compared with other comparably abstract
branches of pure mathematics, is the relative unimportance of any notion of morphism. The theory of groups, for
instance, is dominated by the concept of homomorphism, and general topology gives a similar place to continuous
function. In my view, the nearest equivalent in measure theory is the idea of inverse-measure-preserving function
(234A). I mean in Volumes 3 and 4 to explore this concept more thoroughly. In this volume I will content myself with
signalling such functions when they arise, and with the basic facts listed in 234B.
Naturally linked with the idea of inverse-measure-preserving function is the construction of image measures (234C). These appear everywhere in the subject, starting with the not-quite-elementary 234Yc. They are of such importance that it is natural to explore variations, as in 234F and 234Yb, but in my view none are of comparable significance.
Nearly half the section is taken up with 'indefinite-integral' measures. I have taken this part very carefully because the ideas I wish to express here, in so far as they extend the work of §232, rely critically on the details of the formulation in 234I, and it is easy to make a false step once we have left the relatively sheltered context of complete $\sigma$-finite measures. I believe that if we take a little trouble at this point we can develop a theory (234K-234N) which will offer a smooth path to later applications; to see what I have in mind, you can refer to the entries under 'indefinite-integral measure' in the index. For the moment I mention only a kind of Radon-Nikodým theorem for localizable measures (234O).
The partial ordering described in 234P-234Q is only one of many which can be considered, and for some purposes
it seems unsatisfactory. The most important examples will appear in Chapter 41 of Volume 4, and have a variety of
special features for which it might be worth setting out further abstractions. However the version here has the merit
of simplicity and supports at least some of the relevant ideas (234Xl). For an alternative notion, see 234Yo.
235 Measurable transformations
I turn now to a topic which is separate from the Radon-Nikodým theorem, but which seems to fit better here than in either of the next two chapters. I seek to give results which will generalize the basic formula of calculus
$$\int g(y)\,dy=\int g(\phi(x))\phi'(x)\,dx$$
in the context of a general transformation $\phi$ between measure spaces. The principal results are I suppose 235A/235E, which are very similar expressions of the basic idea, and 235J, which gives a general criterion for a stronger result. A formulation from a different direction is in 235R.
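For orientation only, the classical formula quoted above can be checked numerically in one concrete case. The following Python sketch is not part of the theory: it assumes the particular choices $g(y)=\cos y$ and $\phi(x)=x^2$ on $[0,1]$, and uses a plain midpoint rule (the helper `midpoint_rule` and the step count are my own choices) to approximate both sides.

```python
import math

def midpoint_rule(func, a, b, steps=20000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def g(y):
    return math.cos(y)

def phi(x):
    return x * x

def phi_prime(x):
    return 2.0 * x

# phi maps [0, 1] onto [0, 1], so both integrals are taken over [0, 1].
left = midpoint_rule(g, 0.0, 1.0)                                       # integral of g(y) dy
right = midpoint_rule(lambda x: g(phi(x)) * phi_prime(x), 0.0, 1.0)     # integral of g(phi(x)) phi'(x) dx
```

Here `phi_prime` plays the part taken by the function $J$ in the results of this section.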
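235A I start with the basic result, which is already sufficient for a large proportion of the applications I have in mind.
Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that
$$\int J\times\chi(\phi^{-1}[F])\,d\mu\text{ exists }=\nu F$$
whenever $F\in\mathrm T$ and $\nu F<\infty$. Then
$$\int_{\phi^{-1}[H]}J\times g\phi\,d\mu\text{ exists }=\int_Hg\,d\nu$$
for every $\nu$-integrable function $g$ taking values in $[-\infty,\infty]$ and every $H\in\mathrm T$, provided that we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined. Consequently, interpreting $J\times f\phi$ in the same way,
$$\underline{\int}f\,d\nu\le\underline{\int}J\times f\phi\,d\mu\le\overline{\int}J\times f\phi\,d\mu\le\overline{\int}f\,d\nu$$
for every $[-\infty,\infty]$-valued function $f$ defined almost everywhere in $Y$.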
proof (a) If $g$ is a simple function, say $g=\sum_{i=0}^na_i\chi F_i$ where $\nu F_i<\infty$ for each $i$, then
$$\int J\times g\phi\,d\mu=\sum_{i=0}^na_i\int J\times\chi(\phi^{-1}[F_i])\,d\mu=\sum_{i=0}^na_i\nu F_i=\int g\,d\nu.$$
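(b) If $\nu F=0$ then $\int J\times\chi(\phi^{-1}[F])=0$ so $J=0$ a.e. on $\phi^{-1}[F]$. So if $g$ is defined $\nu$-a.e., $J=0$ $\mu$-a.e. on $X\setminus\operatorname{dom}(g\phi)=(X\setminus D_\phi)\cup\phi^{-1}[Y\setminus\operatorname{dom}g]$, and, on the convention proposed, $J\times g\phi$ is defined $\mu$-a.e. Moreover, if $\lim_{n\to\infty}g_n=g$ $\nu$-a.e., then $\lim_{n\to\infty}J\times g_n\phi=J\times g\phi$ $\mu$-a.e. So if $\langle g_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence of simple functions converging almost everywhere to $g$, $\langle J\times g_n\phi\rangle_{n\in\mathbb N}$ will be a non-decreasing sequence of integrable functions converging almost everywhere to $J\times g\phi$; by B.Levi's theorem,
$$\int J\times g\phi\,d\mu\text{ exists }=\lim_{n\to\infty}\int J\times g_n\phi\,d\mu=\lim_{n\to\infty}\int g_n\,d\nu=\int g\,d\nu.$$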
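(c) If $g=g^+-g^-$, where $g^+$ and $g^-$ are $\nu$-integrable functions, then
$$\int J\times g\phi\,d\mu=\int J\times g^+\phi\,d\mu-\int J\times g^-\phi\,d\mu=\int g^+d\nu-\int g^-d\nu=\int g\,d\nu.$$
(d) This deals with the case $H=Y$. For the general case, we have
$$\int_Hg\,d\nu=\int(g\times\chi H)\,d\nu$$
(131Fa)
$$=\int J\times(g\times\chi H)\phi\,d\mu=\int J\times g\phi\times\chi(\phi^{-1}[H])\,d\mu=\int_{\phi^{-1}[H]}J\times g\phi\,d\mu$$
by 214F.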
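(e) For the upper and lower integrals, I note first that if $F$ is $\nu$-negligible then $\int J\times\chi(\phi^{-1}[F])\,d\mu=0$, so that $J=0$ $\mu$-a.e. on $\phi^{-1}[F]$. It follows that if $f$ and $g$ are $[-\infty,\infty]$-valued functions on subsets of $Y$ and $f\le_{\text{a.e.}}g$, then $J\times f\phi\le_{\text{a.e.}}J\times g\phi$. Now if $\overline{\int}f\,d\nu=\infty$, we surely have $\overline{\int}J\times f\phi\,d\mu\le\overline{\int}f\,d\nu$. Otherwise,
$$\begin{aligned}
\overline{\int}f\,d\nu&=\inf\Bigl\{\int g\,d\nu:g\text{ is }\nu\text{-integrable and }f\le_{\text{a.e.}}g\Bigr\}\\
&=\inf\Bigl\{\int J\times g\phi\,d\mu:g\text{ is }\nu\text{-integrable and }f\le_{\text{a.e.}}g\Bigr\}\\
&\ge\inf\Bigl\{\int h\,d\mu:h\text{ is }\mu\text{-integrable and }J\times f\phi\le_{\text{a.e.}}h\Bigr\}=\overline{\int}J\times f\phi\,d\mu.
\end{aligned}$$
Similarly, or applying this argument to $-f$, we have $\underline{\int}J\times f\phi\,d\mu\ge\underline{\int}f\,d\nu$.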
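A finite toy model may help to fix the shape of the formula in 235A. The Python sketch below rests on assumptions which are mine and much stronger than those of the theorem: $X$ and $Y$ are finite, every subset is measurable, and $\nu$ is constructed from $\mu$, $\phi$ and $J$ precisely so that the hypothesis $\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$ holds. All names and numerical values are hypothetical.

```python
from fractions import Fraction

# A point-supported measure mu on X = {1, 2, 3, 4} (assumed data).
mu_w = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1), 4: Fraction(2)}
# phi : X -> Y = {"u", "v", "w"} and a non-negative weight function J on X.
phi = {1: "u", 2: "u", 3: "v", 4: "w"}
J = {1: Fraction(2), 2: Fraction(3), 3: Fraction(0), 4: Fraction(1, 4)}

# nu{y} = integral of J * chi(phi^{-1}[{y}]) d mu, as the hypothesis requires.
nu_w = {}
for x, w in mu_w.items():
    nu_w[phi[x]] = nu_w.get(phi[x], Fraction(0)) + J[x] * w

# A real-valued g on Y, taking both signs.
g = {"u": Fraction(-5), "v": Fraction(7), "w": Fraction(4)}

def integral_g_over(H):
    """Integral of g over H with respect to nu."""
    return sum((g[y] * nu_w[y] for y in H), Fraction(0))

def integral_J_g_phi_over_preimage(H):
    """Integral of J * (g o phi) over phi^{-1}[H] with respect to mu."""
    return sum((J[x] * g[phi[x]] * w for x, w in mu_w.items() if phi[x] in H),
               Fraction(0))

all_H = [set(), {"u"}, {"v"}, {"w"}, {"u", "v"}, {"u", "w"}, {"v", "w"}, {"u", "v", "w"}]
formula_holds = all(integral_g_over(H) == integral_J_g_phi_over_preimage(H) for H in all_H)
```

The point "v" is $\nu$-negligible because $J$ vanishes on its preimage, which is the finite shadow of the remark in part (b) of the proof that $J=0$ a.e. on $\phi^{-1}[F]$ whenever $\nu F=0$.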
235B Remarks (a) Note the particular convention
$$0\times\text{undefined}=0$$
which I am applying to the interpretation of $J\times g\phi$. This is the first of a number of technical points which will concern us in this section. The point is that if $g$ is defined $\nu$-almost everywhere, then for any extension of $g$ to a function $g_1:Y\to\mathbb R$ we shall, on this convention, have $J\times g\phi=J\times g_1\phi$ except on $\{x:J(x)>0,\ \phi(x)\in Y\setminus\operatorname{dom}g\}$, which is negligible; so that
$$\int J\times g\phi\,d\mu=\int J\times g_1\phi\,d\mu=\int g_1\,d\nu=\int g\,d\nu$$
if $g$ and $g_1$ are integrable. Thus the convention is appropriate here, and while it adds a phrase to the statements of many of the results of this section, it makes their application smoother. (But I ought to insist that I am using this as a local convention only, and the ordinary rule $0\times\text{undefined}=\text{undefined}$ will stand elsewhere in this treatise unless explicitly overruled.)
(b) I have had to take care in the formulation of this theorem to distinguish between the hypothesis
$$\int J(x)\chi(\phi^{-1}[F])(x)\,\mu(dx)\text{ exists }=\nu F\text{ whenever }\nu F<\infty$$
and the perhaps more elegant alternative
$$\int_{\phi^{-1}[F]}J(x)\,\mu(dx)\text{ exists }=\nu F\text{ whenever }\nu F<\infty,$$
which is not quite adequate for the theorem. (See 235Q below.) Recall that by $\int_Af$ I mean $\int(f{\restriction}A)\,d\mu_A$, where $\mu_A$ is the subspace measure on $A$ (214D). It is possible for $\int_A(f{\restriction}A)\,d\mu_A$ to be defined even when $\int f\times\chi A\,d\mu$ is not; for instance, take $\mu$ to be Lebesgue measure on $[0,1]$, $A$ any non-measurable subset of $[0,1]$, and $f$ the constant function with value $1$; then $\int_Af=\mu^*A$, but $f\times\chi A=\chi A$ is not $\mu$-integrable. It is however the case that if $\int f\times\chi A\,d\mu$ is defined, then so is $\int_Af$, and the two are equal; this is a consequence of 214F. While 235P shows that in most of the cases relevant to the present volume the distinction can be passed over, it is important to avoid assuming that $\phi^{-1}[F]$ is measurable for every $F\in\mathrm T$. A simple example is the following. Set $X=Y=[0,1]$. Let $\mu$ be Lebesgue measure on $[0,1]$, and define $\nu$ by setting
$$\mathrm T=\{F:F\subseteq[0,1],\ F\cap[0,\tfrac12]\text{ is Lebesgue measurable}\},$$
$$\nu F=2\mu(F\cap[0,\tfrac12])\text{ for every }F\in\mathrm T.$$
Set $\phi(x)=x$ for every $x\in[0,1]$. Then we have
$$\nu F=\int_FJ\,d\mu=\int J\times\chi(\phi^{-1}[F])\,d\mu$$
for every $F\in\mathrm T$, where $J(x)=2$ for $x\in[0,\frac12]$ and $J(x)=0$ for $x\in{]\frac12,1]}$. But of course there are subsets $F$ of $[\frac12,1]$ which are not Lebesgue measurable (see 134D), and such an $F$ necessarily belongs to $\mathrm T$, even though $\phi^{-1}[F]$ does not belong to the domain of $\mu$.
The point here is that if $\nu F_0=0$ then we expect to have $J=0$ on $\phi^{-1}[F_0]$, and it is of no importance whether $\phi^{-1}[F]$ is measurable for $F\subseteq F_0$.
235C Theorem 235A is concerned with integration, and accordingly the hypothesis $\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$ looks only at sets $F$ of finite measure. If we wish to consider measurability of non-integrable functions, we need a slightly stronger hypothesis. I approach this version more gently, with a couple of lemmas.
Lemma Let $\Sigma$, $\mathrm T$ be $\sigma$-algebras of subsets of $X$ and $Y$ respectively. Suppose that $D\subseteq X$ and that $\phi:D\to Y$ is a function such that $\phi^{-1}[F]\in\Sigma_D$, the subspace $\sigma$-algebra, for every $F\in\mathrm T$. Then $g\phi$ is $\Sigma$-measurable for every $[-\infty,\infty]$-valued $\mathrm T$-measurable function $g$ defined on a subset of $Y$.
proof Set $C=\operatorname{dom}g$ and $B=\operatorname{dom}g\phi=\phi^{-1}[C]$. If $a\in\mathbb R$, then there is an $F\in\mathrm T$ such that $\{y:g(y)\le a\}=F\cap C$. Now there is an $E\in\Sigma$ such that $\phi^{-1}[F]=E\cap D$. So
$$\{x:g\phi(x)\le a\}=B\cap E\in\Sigma_B.$$
As $a$ is arbitrary, $g\phi$ is $\Sigma$-measurable.
235D Some of the results below are easier when we can move freely between measure spaces and their completions (212C). The next lemma is what we need.
Lemma Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, with completions $(X,\hat\Sigma,\hat\mu)$ and $(Y,\hat{\mathrm T},\hat\nu)$. Let $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets of $X$.
(a) If $\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$ whenever $F\in\mathrm T$ and $\nu F<\infty$, then $\int J\times\chi(\phi^{-1}[F])\,d\hat\mu=\hat\nu F$ whenever $F\in\hat{\mathrm T}$ and $\hat\nu F<\infty$.
(b) If $\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$ whenever $F\in\mathrm T$, then $\int J\times\chi(\phi^{-1}[F])\,d\hat\mu=\hat\nu F$ whenever $F\in\hat{\mathrm T}$.
proof Both rely on the fact that either hypothesis is enough to ensure that $\int J\times\chi(\phi^{-1}[F])\,d\mu=0$ whenever $\nu F=0$. Accordingly, if $F$ is $\nu$-negligible, so that there is an $F'\in\mathrm T$ such that $F\subseteq F'$ and $\nu F'=0$, we shall have
$$\int J\times\chi(\phi^{-1}[F])\,d\mu=\int J\times\chi(\phi^{-1}[F'])\,d\mu=0.$$
But now, given any $F\in\hat{\mathrm T}$, there is an $F_0\in\mathrm T$ such that $F_0\subseteq F$ and $\hat\nu(F\setminus F_0)=0$, so that
$$\begin{aligned}
\int J\times\chi(\phi^{-1}[F])\,d\hat\mu&=\int J\times\chi(\phi^{-1}[F])\,d\mu\\
&=\int J\times\chi(\phi^{-1}[F_0])\,d\mu+\int J\times\chi(\phi^{-1}[F\setminus F_0])\,d\mu\\
&=\nu F_0=\hat\nu F,
\end{aligned}$$
provided (for part (a)) that $\hat\nu F<\infty$.
Remark Thus if we have the hypotheses of any of the principal results of this section valid for a pair of non-complete measure spaces, we can expect to be able to draw some conclusion by applying the result to the completions of the given spaces.
235E Now I come to the alternative version of 235A.
Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ two functions defined on conegligible subsets of $X$ such that
$$\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$$
for every $F\in\mathrm T$, allowing $\infty$ as a value of the integral.
(a) $J\times g\phi$ is $\mu$-virtually measurable for every $\nu$-virtually measurable function $g$ defined on a subset of $Y$.
(b) Let $g$ be a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a conegligible subset of $Y$. Then $\int J\times g\phi\,d\mu=\int g\,d\nu$ whenever either integral is defined in $[-\infty,\infty]$, if we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.
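proof Let $(X,\hat\Sigma,\hat\mu)$ and $(Y,\hat{\mathrm T},\hat\nu)$ be the completions of $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$. By 235D, $\int J\times\chi(\phi^{-1}[F])\,d\hat\mu=\hat\nu F$ for every $F\in\hat{\mathrm T}$. Recalling that a real-valued function is $\mu$-virtually measurable iff it is $\hat\Sigma$-measurable (212Fa), and that $\int f\,d\mu=\int f\,d\hat\mu$ if either is defined in $[-\infty,\infty]$ (212Fb), the conclusions we are seeking are
(a)$'$ $J\times g\phi$ is $\hat\Sigma$-measurable for every $\hat{\mathrm T}$-measurable function $g$ defined on a subset of $Y$;
(b)$'$ $\int J\times g\phi\,d\hat\mu=\int g\,d\hat\nu$ whenever $g$ is a $\hat{\mathrm T}$-measurable function defined almost everywhere in $Y$ and either integral is defined in $[-\infty,\infty]$.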
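(a) When I write
$$\int J\times\chi D_\phi\,d\mu=\int J\times\chi(\phi^{-1}[Y])\,d\mu=\nu Y,$$
which is part of the hypothesis of this theorem, I mean to imply that $J\times\chi D_\phi$ is $\mu$-virtually measurable, that is, is $\hat\Sigma$-measurable. Because $D_\phi$ is conegligible, it follows that $J$ is $\hat\Sigma$-measurable, and its domain $D_J$, being conegligible, also belongs to $\hat\Sigma$. Set $G=\{x:x\in D_J,\ J(x)>0\}\in\hat\Sigma$. Then for any set $A\subseteq X$, $J\times\chi A$ is $\hat\Sigma$-measurable iff $A\cap G\in\hat\Sigma$. So the hypothesis is just that $G\cap\phi^{-1}[F]\in\hat\Sigma$ for every $F\in\hat{\mathrm T}$.
Now let $g$ be a $[-\infty,\infty]$-valued function, defined on a subset $C$ of $Y$, which is $\hat{\mathrm T}$-measurable. Applying 235C to $\phi{\restriction}G$, we see that $g\phi{\restriction}G$ is $\hat\Sigma$-measurable, so $(J\times g\phi){\restriction}G$ is $\hat\Sigma$-measurable. On the other hand, $J\times g\phi$ is zero almost everywhere in $X\setminus G$, so (because $G\in\hat\Sigma$) $J\times g\phi$ is $\hat\Sigma$-measurable, as required.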
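(b)(i) Suppose first that $g\ge0$. Then $J\times g\phi\ge0$, so (a) tells us that $\int J\times g\phi$ is defined in $[0,\infty]$.
($\alpha$) If $\int g\,d\nu<\infty$ then $\int J\times g\phi\,d\mu=\int g\,d\nu$ by 235A.
($\beta$) If there is some $\epsilon>0$ such that $\nu H=\infty$, where $H=\{y:g(y)\ge\epsilon\}$, then
$$\int J\times g\phi\,d\mu\ge\epsilon\int J\times\chi(\phi^{-1}[H])\,d\mu=\epsilon\nu H=\infty,$$
so
$$\int J\times g\phi\,d\mu=\infty=\int g\,d\nu.$$
($\gamma$) Otherwise,
$$\begin{aligned}
\int J\times g\phi\,d\mu&\ge\sup\Bigl\{\int J\times h\phi\,d\mu:h\text{ is }\nu\text{-integrable},\ 0\le h\le g\Bigr\}\\
&=\sup\Bigl\{\int h\,d\nu:h\text{ is }\nu\text{-integrable},\ 0\le h\le g\Bigr\}=\int g\,d\nu=\infty,
\end{aligned}$$
so once again $\int J\times g\phi\,d\mu=\int g\,d\nu$.
(ii) For general real-valued $g$, apply (i) to $g^+$ and $g^-$ where $g^+=\frac12(|g|+g)$, $g^-=\frac12(|g|-g)$; the point is that $(J\times g\phi)^+=J\times g^+\phi$ and $(J\times g\phi)^-=J\times g^-\phi$, so that
$$\int J\times g\phi=\int J\times g^+\phi-\int J\times g^-\phi=\int g^+-\int g^-=\int g$$
if either side is defined in $[-\infty,\infty]$.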
235F Remarks (a) Of course there are two special cases of this theorem which between them carry all its content: the case $J=1$ a.e. and the case in which $X=Y$ and $\phi$ is the identity function. If $J=\chi X$ we are very close to 235G below, and if $\phi$ is the identity function we are close to the indefinite-integral measures of §234.
(b) As in 235A, we can strengthen the conclusion of (b) in 235E to
$\int_{\phi^{-1}[F]}J\times g\phi\,d\mu=\int_F g\,d\nu$
whenever $F\in\mathrm T$ and $\int_F g\,d\nu$ is defined in $[-\infty,\infty]$.
235G Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. Then
(a) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a subset of $Y$, $g\phi$ is $\mu$-virtually measurable;
(b) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a conegligible subset of $Y$, $\int g\phi\,d\mu=\int g\,d\nu$ if either integral is defined in $[-\infty,\infty]$;
(c) if $g$ is a $\nu$-virtually measurable $[-\infty,\infty]$-valued function defined on a conegligible subset of $Y$, and $F\in\mathrm T$, then $\int_{\phi^{-1}[F]}g\phi\,d\mu=\int_F g\,d\nu$ if either integral is defined in $[-\infty,\infty]$.
proof (a) This follows immediately from 234Ba and 235C; taking $\hat\Sigma$, $\hat{\mathrm T}$ to be the domains of the completions of $\mu$, $\nu$ respectively, $\phi^{-1}[F]\in\hat\Sigma$ for every $F\in\hat{\mathrm T}$, so if $g$ is $\hat{\mathrm T}$-measurable then $g\phi$ will be $\hat\Sigma$-measurable.
(b) Apply 235E with $J=\chi X$; we have
$\int J\times\chi(\phi^{-1}[F])\,d\mu=\mu\phi^{-1}[F]=\nu F$
for every $F\in\mathrm T$, so
$\int g\phi=\int J\times g\phi=\int g$
if either integral is defined in $[-\infty,\infty]$.
(c) Apply (b) to $g\times\chi F$.
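A finite sanity check may help fix ideas for 235G(b). The sketch below is only an illustration: the six-point space, the weights and the names `mu`, `phi`, `nu`, `g` are assumptions made for the example, not part of the text. It builds $\nu$ as the image measure of $\mu$ under $\phi$, so that $\phi$ is inverse-measure-preserving, and compares the two integrals exactly.

```python
from fractions import Fraction

# Hypothetical finite spaces: X = {0,...,5} with uniform probability mu,
# Y = {"a", "b", "c"}, and phi : X -> Y given as a lookup table.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
phi = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "c"}

# nu{y} = mu(phi^{-1}[{y}]); with this choice phi is inverse-measure-preserving.
nu = {}
for x in X:
    nu[phi[x]] = nu.get(phi[x], Fraction(0)) + mu[x]

# An arbitrary real-valued function g on Y.
g = {"a": Fraction(3), "b": Fraction(-1), "c": Fraction(10)}

lhs = sum(g[phi[x]] * mu[x] for x in X)   # integral of g(phi(x)) with respect to mu
rhs = sum(g[y] * nu[y] for y in nu)       # integral of g with respect to nu
print(lhs, rhs)
```

On finite spaces every function is measurable, so none of the measurability subtleties of 235E arise; only the change-of-variable identity is being tested.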
235H The image measure catastrophe Applications of 235A would run much more smoothly if we could say
'$\int g\,d\nu$ exists and is equal to $\int J\times g\phi\,d\mu$ for every $g:Y\to\mathbb R$ such that $J\times g\phi$ is $\mu$-integrable'.
Unhappily there is no hope of a universally applicable result in this direction. Suppose, for instance, that $\nu$ is Lebesgue measure on $Y=[0,1]$, that $X\subseteq[0,1]$ is a non-Lebesgue-measurable set of outer measure 1 (134D), that $\mu$ is the subspace measure $\nu_X$ on $X$, and that $\phi(x)=x$ for $x\in X$. Then
$\mu\phi^{-1}[F]=\nu^*(X\cap F)=\nu F$
for every Lebesgue measurable set $F\subseteq Y$, so we can take $J=\chi X$ and the hypotheses of 235A and 235E will be satisfied. But if we write $g=\chi X:[0,1]\to\{0,1\}$, then $\int g\phi\,d\mu$ is defined even though $\int g\,d\nu$ is not.
The point here is that there is a set $A\subseteq Y$ such that (in the language of 235A/235E) $\phi^{-1}[A]\in\hat\Sigma$ but $A\notin\hat{\mathrm T}$. This is the image measure catastrophe. The search for contexts in which we can be sure that it does not occur will be one of the motive themes of Volume 4. For the moment, I will offer some general remarks (235I-235J), and describe one of the important cases in which the problem does not arise (235K).
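235I Lemma Let $\Sigma$, $\mathrm T$ be $\sigma$-algebras of subsets of $X$, $Y$ respectively, and $\phi$ a function from a subset $D$ of $X$ to $Y$. Suppose that $G\subseteq X$ and that
$\mathrm T=\{F:F\subseteq Y,\ G\cap\phi^{-1}[F]\in\Sigma\}$.
Then a real-valued function $g$, defined on a member of $\mathrm T$, is $\mathrm T$-measurable iff $\chi G\times g\phi$ is $\Sigma$-measurable.
proof Because surely $Y\in\mathrm T$, the hypothesis implies that $G\cap D=G\cap\phi^{-1}[Y]$ belongs to $\Sigma$.
Let $g:C\to\mathbb R$ be a function, where $C\in\mathrm T$. Set $B=\operatorname{dom}(g\phi)=\phi^{-1}[C]$, and for $a\in\mathbb R$ set $F_a=\{y:g(y)\ge a\}$,
$E_a=G\cap\phi^{-1}[F_a]=\{x:x\in G\cap B,\ g\phi(x)\ge a\}$.
Note that $G\cap B\in\Sigma$ because $C\in\mathrm T$.
(i) If $g$ is $\mathrm T$-measurable, then $F_a\in\mathrm T$ and $E_a\in\Sigma$ for every $a$. Now
$G\cap\{x:x\in B,\ g\phi(x)\ge a\}=G\cap\phi^{-1}[F_a]=E_a$,
so $\{x:x\in B,\ (\chi G\times g\phi)(x)\ge a\}$ is either $E_a$ or $E_a\cup(B\setminus G)$, and in either case is relatively $\Sigma$-measurable in $B$. As $a$ is arbitrary, $\chi G\times g\phi$ is $\Sigma$-measurable.
(ii) If $\chi G\times g\phi$ is $\Sigma$-measurable, then, for any $a\in\mathbb R$,
$E_a=\{x:x\in G\cap B,\ (\chi G\times g\phi)(x)\ge a\}\in\Sigma$
because $G\cap B\in\Sigma$ and $\chi G\times g\phi$ is $\Sigma$-measurable. So $F_a\in\mathrm T$. As $a$ is arbitrary, $g$ is $\mathrm T$-measurable.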
235J Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be complete measure spaces. Let $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets of $X$, and set $G=\{x:x\in D_J,\ J(x)>0\}$. Suppose that
$\mathrm T=\{F:F\subseteq Y,\ G\cap\phi^{-1}[F]\in\Sigma\}$,
$\nu F=\int J\times\chi(\phi^{-1}[F])\,d\mu$ for every $F\in\mathrm T$.
Then, for any real-valued function $g$ defined on a subset of $Y$, $\int J\times g\phi\,d\mu=\int g\,d\nu$ whenever either integral is defined in $[-\infty,\infty]$, provided that we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.
proof If $g$ is $\mathrm T$-measurable and defined almost everywhere, this is a consequence of 235E. So I have to show that if $J\times g\phi$ is $\Sigma$-measurable and defined almost everywhere, so is $g$. Set $W=Y\setminus\operatorname{dom}g$. Then $J\times g\phi$ is undefined on $G\cap\phi^{-1}[W]$, because $g\phi$ is undefined there and we cannot take advantage of the escape clause available when $J=0$; so $G\cap\phi^{-1}[W]$ must be negligible, therefore measurable, and $W\in\mathrm T$. Next,
$\nu W=\int J\times\chi(\phi^{-1}[W])=0$
because $J\times\chi(\phi^{-1}[W])$ can be non-zero only on the negligible set $G\cap\phi^{-1}[W]$. So $g$ is defined almost everywhere.
Note that the hypothesis surely implies that $J\times\chi D_\phi=J\times\chi(\phi^{-1}[Y])$ is measurable, so that $J$ is measurable (because $D_\phi$ is conegligible) and $G\in\Sigma$. Writing $K(x)=1/J(x)$ for $x\in G$, $0$ for $x\in X\setminus G$, the function $K:X\to\mathbb R$ is measurable, and
$\chi G\times g\phi=K\times J\times g\phi$
is measurable. So 235I tells us that $g$ must be measurable, and we're done.
Remark When $J=\chi X$, the hypothesis of this theorem becomes
$\mathrm T=\{F:F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\}$, $\nu F=\mu\phi^{-1}[F]$ for every $F\in\mathrm T$;
that is, $\nu$ is the image measure $\mu\phi^{-1}$ as defined in 234D.
235K Corollary Let $(X,\Sigma,\mu)$ be a complete measure space, and $J$ a non-negative measurable function defined on a conegligible subset of $X$. Let $\nu$ be the associated indefinite-integral measure, and $\mathrm T$ its domain. Then for any real-valued function $g$ defined on a subset of $X$, $g$ is $\mathrm T$-measurable iff $J\times g$ is $\Sigma$-measurable, and $\int g\,d\nu=\int J\times g\,d\mu$ if either integral is defined in $[-\infty,\infty]$, provided that we interpret $(J\times g)(x)$ as $0$ when $J(x)=0$ and $g(x)$ is undefined.
proof Put 235J, taking $Y=X$ and $\phi$ the identity function, together with 234Ld.
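The convention in 235K about points where $J(x)=0$ and $g(x)$ is undefined can be seen in a small finite example. This is a sketch under stated assumptions: the three-point space and the names `mu`, `J`, `nu`, `g`, `J_times_g` are hypothetical choices for illustration only.

```python
from fractions import Fraction

# Hypothetical three-point space with point masses mu{x}.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(5)}
# Density J; it vanishes at the point 2.
J = {0: Fraction(3), 1: Fraction(1, 2), 2: Fraction(0)}
# Indefinite-integral measure: nu{x} = J(x) * mu{x}, so nu{2} = 0.
nu = {x: J[x] * mu[x] for x in mu}

# g is deliberately left undefined at 2, which is nu-negligible but not mu-negligible.
g = {0: Fraction(2), 1: Fraction(-4)}

def J_times_g(x):
    # Interpret (J x g)(x) as 0 when J(x) = 0, even if g(x) is undefined.
    if J[x] == 0:
        return Fraction(0)
    return J[x] * g[x]

lhs = sum(g[x] * nu[x] for x in g)           # integral of g with respect to nu
rhs = sum(J_times_g(x) * mu[x] for x in mu)  # integral of J x g with respect to mu
print(lhs, rhs)
```

Without the convention, `J[2] * g[2]` would raise a `KeyError`, mirroring the fact that $J\times g$ would fail to be defined $\mu$-almost everywhere.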
235L Applying the Radon-Nikodým theorem In order to use 235A-235J effectively, we need to be able to find suitable functions $J$. This can be difficult; some very special examples will take up most of Chapter 26 below. But there are many circumstances in which we can be sure that such $J$ exist, even if we do not know what they are. A minimal requirement is that if $\nu F<\infty$ and $\mu^*\phi^{-1}[F]=0$ then $\nu F=0$, because $\int J\times\chi(\phi^{-1}[F])\,d\mu$ will be zero for any $J$. A sufficient condition, in the special case of indefinite-integral measures, is in 234O. Another is the following.
235M Theorem Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space, $(Y,\mathrm T,\nu)$ a semi-finite measure space, and $\phi:D\to Y$ a function such that
(i) $D$ is a conegligible subset of $X$,
(ii) $\phi^{-1}[F]\in\Sigma$ for every $F\in\mathrm T$;
(iii) $\mu\phi^{-1}[F]>0$ whenever $F\in\mathrm T$ and $\nu F>0$.
Then there is a $\Sigma$-measurable function $J:X\to[0,\infty[$ such that $\int J\times\chi\phi^{-1}[F]\,d\mu=\nu F$ for every $F\in\mathrm T$.
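proof (a) To begin with (down to the end of (c) below) let us suppose that $D=X$ and that $\nu$ is totally finite.
Set $\tilde{\mathrm T}=\{\phi^{-1}[F]:F\in\mathrm T\}\subseteq\Sigma$. Then $\tilde{\mathrm T}$ is a $\sigma$-algebra of subsets of $X$. PPP (i)
$\emptyset=\phi^{-1}[\emptyset]\in\tilde{\mathrm T}$.
(ii) If $E\in\tilde{\mathrm T}$, take $F\in\mathrm T$ such that $E=\phi^{-1}[F]$, so that
$X\setminus E=\phi^{-1}[Y\setminus F]\in\tilde{\mathrm T}$.
(iii) If $\langle E_n\rangle_{n\in\mathbb N}$ is any sequence in $\tilde{\mathrm T}$, then for each $n\in\mathbb N$ choose $F_n\in\mathrm T$ such that $E_n=\phi^{-1}[F_n]$; then
$\bigcup_{n\in\mathbb N}E_n=\phi^{-1}[\bigcup_{n\in\mathbb N}F_n]\in\tilde{\mathrm T}$. QQQ
Next, we have a totally finite measure $\tilde\nu:\tilde{\mathrm T}\to[0,\nu Y]$ given by setting
$\tilde\nu(\phi^{-1}[F])=\nu F$ for every $F\in\mathrm T$.
PPP (i) If $F$, $F'\in\mathrm T$ and $\phi^{-1}[F]=\phi^{-1}[F']$, then $\phi^{-1}[F\triangle F']=\emptyset$, so $\mu(\phi^{-1}[F\triangle F'])=0$ and $\nu(F\triangle F')=0$; consequently $\nu F=\nu F'$. This shows that $\tilde\nu$ is well-defined. (ii) Now
$\tilde\nu\emptyset=\tilde\nu(\phi^{-1}[\emptyset])=\nu\emptyset=0$.
(iii) If $\langle E_n\rangle_{n\in\mathbb N}$ is a disjoint sequence in $\tilde{\mathrm T}$, let $\langle F_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathrm T$ such that $E_n=\phi^{-1}[F_n]$ for each $n$; set $F'_n=F_n\setminus\bigcup_{m<n}F_m$ for each $n$; then $E_n=\phi^{-1}[F'_n]$ for each $n$, so
$\tilde\nu(\bigcup_{n\in\mathbb N}E_n)=\tilde\nu(\phi^{-1}[\bigcup_{n\in\mathbb N}F'_n])=\nu(\bigcup_{n\in\mathbb N}F'_n)=\sum_{n=0}^\infty\nu F'_n=\sum_{n=0}^\infty\tilde\nu E_n$. QQQ
Finally, observe that if $\tilde\nu E>0$ then $\mu E>0$, because $E=\phi^{-1}[F]$ where $\nu F>0$.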
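(b) By 215B(ix) there is a $\Sigma$-measurable function $h:X\to\,]0,\infty[$ such that $\int h\,d\mu$ is finite. Define $\tilde\mu:\tilde{\mathrm T}\to[0,\infty[$ by setting $\tilde\mu E=\int_E h\,d\mu$ for every $E\in\tilde{\mathrm T}$; then $\tilde\mu$ is a totally finite measure. If $E\in\tilde{\mathrm T}$ and $\tilde\mu E=0$, then (because $h$ is strictly positive) $\mu E=0$ and $\tilde\nu E=0$. Accordingly we may apply the Radon-Nikodým theorem to $\tilde\mu$ and $\tilde\nu$ to see that there is a $\tilde{\mathrm T}$-measurable function $g:X\to\mathbb R$ such that $\int_E g\,d\tilde\mu=\tilde\nu E$ for every $E\in\tilde{\mathrm T}$. Because $\tilde\nu$ is non-negative, we may suppose that $g\ge 0$.
(c) Applying 235A to $\mu$, $\tilde\mu$, $h$ and the identity function from $X$ to itself, we see that
$\int_E g\times h\,d\mu=\int_E g\,d\tilde\mu=\tilde\nu E$
for every $E\in\tilde{\mathrm T}$, that is, that
$\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$
for every $F\in\mathrm T$, writing $J=g\times h$.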
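(d) This completes the proof when $\nu$ is totally finite and $D=X$. For the general case, if $Y=\emptyset$ then $\mu X=0$ and the result is trivial. Otherwise, let $\hat\phi$ be any extension of $\phi$ to a function from $X$ to $Y$ which is constant on $X\setminus D$; then $\hat\phi^{-1}[F]\in\Sigma$ for every $F\in\mathrm T$, because $D=\phi^{-1}[Y]\in\Sigma$ and $\hat\phi^{-1}[F]$ is always either $\phi^{-1}[F]$ or $(X\setminus D)\cup\phi^{-1}[F]$.
Now $\nu$ must be $\sigma$-finite. PPP Use the criterion of 215B(ii). If $\mathcal F$ is a disjoint family in $\{F:F\in\mathrm T,\ 0<\nu F<\infty\}$, then $\mathcal E=\{\hat\phi^{-1}[F]:F\in\mathcal F\}$ is a disjoint family in $\{E:\mu E>0\}$, so $\mathcal E$ and $\mathcal F$ are countable. QQQ
Let $\langle Y_n\rangle_{n\in\mathbb N}$ be a partition of $Y$ into sets of finite $\nu$-measure, and for each $n\in\mathbb N$ set $\nu_nF=\nu(F\cap Y_n)$ for every $F\in\mathrm T$. Then $\nu_n$ is a totally finite measure on $Y$, and if $\nu_nF>0$ then $\nu F>0$ so
$\mu\hat\phi^{-1}[F]=\mu\phi^{-1}[F]>0$.
Accordingly $\mu$, $\hat\phi$ and $\nu_n$ satisfy the assumptions of the theorem together with those of (a) above, and there is a $\Sigma$-measurable function $J_n:X\to[0,\infty[$ such that
$\nu_nF=\int J_n\times\chi(\hat\phi^{-1}[F])\,d\mu$
for every $F\in\mathrm T$. Now set $J=\sum_{n=0}^\infty J_n\times\chi(\hat\phi^{-1}[Y_n])$, so that $J:X\to[0,\infty[$ is $\Sigma$-measurable. If $F\in\mathrm T$, then
$\int J\times\chi(\phi^{-1}[F])\,d\mu=\sum_{n=0}^\infty\int J_n\times\chi(\hat\phi^{-1}[Y_n])\times\chi(\hat\phi^{-1}[F])\,d\mu$
$=\sum_{n=0}^\infty\int J_n\times\chi(\hat\phi^{-1}[F\cap Y_n])\,d\mu=\sum_{n=0}^\infty\nu(F\cap Y_n)=\nu F$,
as required.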
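On finite spaces the function $J$ of 235M can be written down explicitly, which may make the statement more concrete. The following is a minimal sketch, assuming a hypothetical four-point $X$ and three-point $Y$ in which every set is measurable; the names `mu`, `phi`, `nu`, `fibre_mass`, `J`, `subsets` and `check` are introduced only for this illustration. It takes $J$ constant on each fibre of $\phi$, equal to $\nu\{y\}$ divided by the $\mu$-mass of the fibre over $y$, and verifies $\int J\times\chi(\phi^{-1}[F])\,d\mu=\nu F$ for every $F\subseteq Y$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical data. Condition (iii) of 235M holds: every y with nu{y} > 0
# has a fibre phi^{-1}[{y}] of strictly positive mu-measure.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3), 3: Fraction(0)}
phi = {0: "a", 1: "a", 2: "b", 3: "c"}
nu = {"a": Fraction(6), "b": Fraction(1), "c": Fraction(0)}

def fibre_mass(y):
    """mu-measure of the fibre phi^{-1}[{y}]."""
    return sum(m for x, m in mu.items() if phi[x] == y)

def J(x):
    """Density constant on fibres; 0 where the fibre is mu-negligible."""
    d = fibre_mass(phi[x])
    return nu[phi[x]] / d if d > 0 else Fraction(0)

def subsets(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def check():
    """Verify the integral identity of 235M for every F contained in Y."""
    return all(
        sum(J(x) * mu[x] for x in mu if phi[x] in F) == sum(nu[y] for y in F)
        for F in subsets(nu)
    )

print(check())
```

This $J$ is measurable with respect to the $\sigma$-algebra of sets $\phi^{-1}[F]$, which corresponds to taking $h$ constant in part (b) of the proof; that simplification is available here only because $\mu$ is totally finite.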
235N Remark Theorem 235M can fail if $\mu$ is only strictly localizable rather than $\sigma$-finite. PPP Let $X=Y$ be an uncountable set, $\Sigma=\mathcal PX$, $\mu$ counting measure on $X$ (112Bd), $\mathrm T$ the countable-cocountable $\sigma$-algebra of $Y$, $\nu$ the countable-cocountable measure on $Y$ (211R), $\phi:X\to Y$ the identity map. Then $\phi^{-1}[F]\in\Sigma$ and $\mu\phi^{-1}[F]>0$ whenever $\nu F>0$. But if $J$ is any $\mu$-integrable function on $X$, then $F=\{x:J(x)\ne 0\}$ is countable and
$\nu(Y\setminus F)=1\ne 0=\int_{\phi^{-1}[Y\setminus F]}J\,d\mu$. QQQ
*235O There are some simplifications in the case of $\sigma$-finite spaces; in particular, 235A and 235E become conflated. I will give an adaptation of the hypotheses of 235A which may be used in the $\sigma$-finite case. First a lemma.
Lemma Let $(X,\Sigma,\mu)$ be a measure space and $f$ a non-negative integrable function on $X$. If $A\subseteq X$ is such that $\int_A f+\int_{X\setminus A}f=\int f$, then $f\times\chi A$ is integrable.
proof By 214Eb, there are $\mu$-integrable functions $f_1$, $f_2$ such that $f_1$ extends $f\restriction A$, $f_2$ extends $f\restriction X\setminus A$, and
$\int_E f_1=\int_{E\cap A}f$, $\int_E f_2=\int_{E\setminus A}f$
for every $E\in\Sigma$. Because $f$ is non-negative, $\int_E f_1$ and $\int_E f_2$ are non-negative for every $E\in\Sigma$, and $f_1$, $f_2$ are non-negative a.e. Accordingly we have $f\times\chi A\le_{\mathrm{a.e.}}f_1$ and $f\times\chi(X\setminus A)\le_{\mathrm{a.e.}}f_2$, so that $f\le_{\mathrm{a.e.}}f_1+f_2$. But also
$\int f_1+f_2=\int_X f_1+\int_X f_2=\int_A f+\int_{X\setminus A}f=\int f$,
so $f=_{\mathrm{a.e.}}f_1+f_2$. Accordingly
$f_1=_{\mathrm{a.e.}}f-f_2\le_{\mathrm{a.e.}}f-f\times\chi(X\setminus A)=f\times\chi A\le_{\mathrm{a.e.}}f_1$
and $f\times\chi A=_{\mathrm{a.e.}}f_1$ is integrable.
*235P Proposition Let $(X,\Sigma,\mu)$ be a complete measure space and $(Y,\mathrm T,\nu)$ a complete $\sigma$-finite measure space. Suppose that $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ are functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that $\int_{\phi^{-1}[F]}J\,d\mu$ exists and is equal to $\nu F$ whenever $F\in\mathrm T$ and $\nu F<\infty$.
(a) $J\times g\phi$ is $\Sigma$-measurable for every $\mathrm T$-measurable real-valued function $g$ defined on a subset of $Y$.
(b) If $g$ is a $\mathrm T$-measurable real-valued function defined almost everywhere in $Y$, then $\int J\times g\phi\,d\mu=\int g\,d\nu$ whenever either integral is defined in $[-\infty,\infty]$, interpreting $(J\times g\phi)(x)$ as $0$ when $J(x)=0$, $g(\phi(x))$ is undefined.
proof The point is that the hypotheses of 235E are satisfied. To see this, let us write $\Sigma_C=\{E\cap C:E\in\Sigma\}$ and $\mu_C=\mu^*\restriction\Sigma_C$ for the subspace measure on $C$, for each $C\subseteq X$. Let $\langle Y_n\rangle_{n\in\mathbb N}$ be a non-decreasing sequence of sets with union $Y$ and with $\nu Y_n<\infty$ for every $n\in\mathbb N$, starting from $Y_0=\emptyset$.
(i) Take any $F\in\mathrm T$ with $\nu F<\infty$, and set $F_n=F\cup Y_n$ for each $n\in\mathbb N$; write $C_n=\phi^{-1}[F_n]$.
Fix $n$ for the moment. Then our hypothesis implies that
$\int_{C_0}J\,d\mu+\int_{C_n\setminus C_0}J\,d\mu=\nu F+\nu(F_n\setminus F)=\nu F_n=\int_{C_n}J\,d\mu$.
If we regard the subspace measures on $C_0$ and $C_n\setminus C_0$ as derived from the measure $\mu_{C_n}$ of $C_n$ (214Ce), then 235O tells us that $J\times\chi C_0$ is $\mu_{C_n}$-integrable, and there is a $\mu$-integrable function $h_n$ such that $h_n$ extends $(J\times\chi C_0)\restriction C_n$.
Let $E$ be a $\mu$-conegligible set, included in the domain $D_\phi$ of $\phi$, such that $h_n\restriction E$ is $\Sigma$-measurable for every $n$. Because $\langle C_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence with union $\phi^{-1}[\bigcup_{n\in\mathbb N}F_n]=D_\phi$,
$(J\times\chi C_0)(x)=\lim_{n\to\infty}h_n(x)$
for every $x\in E$, and $(J\times\chi C_0)\restriction E$ is measurable. At the same time, we know that there is a $\mu$-integrable $h$ extending $J\restriction C_0$, and $0\le_{\mathrm{a.e.}}J\times\chi C_0\le_{\mathrm{a.e.}}|h|$. Accordingly $J\times\chi C_0$ is integrable, and (using 214F)
$\int J\times\chi\phi^{-1}[F]\,d\mu=\int J\times\chi C_0\,d\mu=\int_{C_0}J\restriction C_0\,d\mu_{C_0}=\nu F$.
(ii) This deals with $F$ of finite measure. For general $F\in\mathrm T$,
$\int J\times\chi(\phi^{-1}[F])\,d\mu=\lim_{n\to\infty}\int J\times\chi(\phi^{-1}[F\cap Y_n])\,d\mu=\lim_{n\to\infty}\nu(F\cap Y_n)=\nu F$.
So the hypotheses of 235E are satisfied, and the result follows at once.
*235Q I remarked in 235Bb that a difficulty can arise in 235A, for general measure spaces, if we speak of '$\int_{\phi^{-1}[F]}J\,d\mu$' in the hypothesis, in place of '$\int J\times\chi(\phi^{-1}[F])\,d\mu$'. Here is an example.
Example Set $X=Y=[0,2]$. Write $\Sigma_L$ for the algebra of Lebesgue measurable subsets of $\mathbb R$, and $\mu_L$ for Lebesgue measure; write $\mu_c$ for counting measure on $\mathbb R$. Set
$\Sigma=\mathrm T=\{E:E\subseteq[0,2],\ E\cap[0,1[\,\in\Sigma_L\}$;
of course this is a $\sigma$-algebra of subsets of $[0,2]$. For $E\in\Sigma=\mathrm T$, set
$\mu E=\nu E=\mu_L(E\cap[0,1[)+\mu_c(E\cap[1,2])$;
then $\mu$ is a complete measure; in effect, it is the direct sum of Lebesgue measure on $[0,1[$ and counting measure on $[1,2]$ (see 214L). It is easy to see that
$\mu^*B=\mu_L^*(B\cap[0,1[)+\mu_c(B\cap[1,2])$
for every $B\subseteq[0,2]$. Let $A\subseteq[0,1[$ be a non-Lebesgue-measurable set such that $\mu_L^*(E\setminus A)=\mu_LE$ for every Lebesgue measurable $E\subseteq[0,1[$ (see 134D). Define $\phi:[0,2]\to[0,2]$ by setting $\phi(x)=x+1$ if $x\in A$, $\phi(x)=x$ if $x\in[0,2]\setminus A$.
If $F\in\Sigma$, then $\mu^*(\phi^{-1}[F])=\mu F$. PPP (i) If $F\cap[1,2]$ is finite, then $\mu F=\mu_L(F\cap[0,1[)+\#(F\cap[1,2])$. Now
$\phi^{-1}[F]=(F\cap[0,1[\,\setminus A)\cup(F\cap[1,2])\cup\{x:x\in A,\ x+1\in F\}$;
as the last set is finite, therefore $\mu$-negligible,
$\mu^*(\phi^{-1}[F])=\mu_L^*(F\cap[0,1[\,\setminus A)+\#(F\cap[1,2])=\mu_L(F\cap[0,1[)+\#(F\cap[1,2])=\mu F$.
(ii) If $F\cap[1,2]$ is infinite, so is $\phi^{-1}[F]\cap[1,2]$, so
$\mu^*(\phi^{-1}[F])=\infty=\mu F$. QQQ
This means that if we set $J(x)=1$ for every $x\in[0,2]$,
$\int_{\phi^{-1}[F]}J\,d\mu=\mu_{\phi^{-1}[F]}(\phi^{-1}[F])=\mu^*(\phi^{-1}[F])=\mu F$
for every $F\in\Sigma$, and $\phi$, $J$ satisfy the amended hypotheses for 235A. But if we set $g=\chi[0,1[$, then $g$ is $\mu$-integrable, with $\int g\,d\mu=1$, while
$J(x)g(\phi(x))=1$ if $x\in[0,1[\,\setminus A$, $0$ otherwise,
so, because $A\notin\Sigma$, $J\times g\phi$ is not measurable, and therefore (since $\mu$ is complete) not $\mu$-integrable.
235R Reversing the burden Throughout the work above, I have been using the formula
$\int J\times g\phi=\int g$,
as being the natural extension of the formula
$\int g=\int g\phi\times\phi'$
of ordinary advanced calculus. But we can also move the derivative $J$ to the other side of the equation, as follows.
Theorem Let $(X,\Sigma,\mu)$, $(Y,\mathrm T,\nu)$ be measure spaces and $\phi:X\to Y$, $J:Y\to[0,\infty[$ functions such that $\int_F J\,d\nu$ and $\mu\phi^{-1}[F]$ are defined in $[0,\infty]$ and equal for every $F\in\mathrm T$. Then $\int g\phi\,d\mu=\int J\times g\,d\nu$ whenever $g$ is $\nu$-virtually measurable and defined $\nu$-almost everywhere and either integral is defined in $[-\infty,\infty]$.
proof Let $\nu_1$ be the indefinite-integral measure over $\nu$ defined by $J$, and $\hat\mu$ the completion of $\mu$. Then $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\nu_1$. PPP If $F\in\mathrm T$, then $\nu_1F=\int_F J\,d\nu=\mu\phi^{-1}[F]$; that is, $\phi$ is inverse-measure-preserving for $\mu$ and $\nu_1\restriction\mathrm T$. Since $\nu_1$ is the completion of $\nu_1\restriction\mathrm T$ (234Lb), $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\nu_1$ (234Ba). QQQ
Of course we can also regard $\nu_1$ as being an indefinite-integral measure over the completion $\hat\nu$ of $\nu$ (212Fb). So if $g$ is $\nu$-virtually measurable and defined $\nu$-almost everywhere,
$\int J\times g\,d\nu=\int J\times g\,d\hat\nu=\int g\,d\nu_1=\int g\phi\,d\hat\mu=\int g\phi\,d\mu$
if any of the five integrals is defined in $[-\infty,\infty]$, by 235K, 235Gb and 212Fb again.
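The reversed formula of 235R can also be checked on a finite example. This is a sketch only; the spaces and the names `mu`, `phi`, `nu`, `J`, `g` are hypothetical, chosen so that $\nu$ gives positive mass to every point of $Y$ and the density of $\mu\phi^{-1}$ with respect to $\nu$ is a plain quotient.

```python
from fractions import Fraction

# Hypothetical point masses on X = {0, 1, 2} and Y = {"a", "b"}.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(3)}
phi = {0: "a", 1: "a", 2: "b"}
nu = {"a": Fraction(1, 2), "b": Fraction(4)}

def J(y):
    """Density on Y with J(y) * nu{y} = mu(phi^{-1}[{y}])."""
    preimage_mass = sum(m for x, m in mu.items() if phi[x] == y)
    return preimage_mass / nu[y]

g = {"a": Fraction(5), "b": Fraction(-2)}

lhs = sum(g[phi[x]] * mu[x] for x in mu)    # integral of g(phi(x)) with respect to mu
rhs = sum(J(y) * g[y] * nu[y] for y in nu)  # integral of J x g with respect to nu
print(lhs, rhs)
```

Here the derivative sits on the $Y$ side: `J` is a function of points of $Y$, in contrast with 235A-235M, where $J$ is defined on $X$.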
235X Basic exercises (a) Explain what 235A tells us when X = Y , T = , is the identity function and
E = E for every E .
(b) Let $(X,\Sigma,\mu)$ be a measure space, $J$ an integrable non-negative real-valued function on $X$, and $\phi:D_\phi\to\mathbb R$ a measurable function, where $D_\phi$ is a conegligible subset of $X$. Set
$g(a)=\int_{\{x:\phi(x)\le a\}}J$
for $a\in\mathbb R$, and let $\mu_g$ be the Lebesgue-Stieltjes measure associated with $g$. Show that $\int J\times f\phi\,d\mu=\int f\,d\mu_g$ for every $\mu_g$-integrable real function $f$.
(c) Let $\Sigma$, $\mathrm T$ and $\Lambda$ be $\sigma$-algebras of subsets of $X$, $Y$ and $Z$ respectively. Let us say that a function $\phi:A\to Y$, where $A\subseteq X$, is $(\Sigma,\mathrm T)$-measurable if $\phi^{-1}[F]\in\Sigma_A$, the subspace $\sigma$-algebra of $A$, for every $F\in\mathrm T$. Suppose that $A\subseteq X$, $B\subseteq Y$, $\phi:A\to Y$ is $(\Sigma,\mathrm T)$-measurable and $\psi:B\to Z$ is $(\mathrm T,\Lambda)$-measurable. Show that $\psi\phi$ is $(\Sigma,\Lambda)$-measurable. Deduce 235C.
(d) Let $(X,\Sigma,\mu)$ be a measure space and $(Y,\mathrm T,\nu)$ a semi-finite measure space. Let $\phi:D_\phi\to Y$ and $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets $D_\phi$, $D_J$ of $X$ such that $\int J\times\chi(\phi^{-1}[F])\,d\mu$ exists $=\nu F$ whenever $F\in\mathrm T$ and $\nu F<\infty$. Let $g$ be a $\mathrm T$-measurable real-valued function, defined on a conegligible subset of $Y$. Show that $J\times g\phi$ is $\mu$-integrable iff $g$ is $\nu$-integrable, and the integrals are then equal, provided we interpret $(J\times g\phi)(x)$ as $0$ when $J(x)=0$ and $g(\phi(x))$ is undefined.
(e) Let $(X,\Sigma,\mu)$ be a measure space and $E\in\Sigma$. Define a measure $\mu\llcorner E$ on $X$ by setting $(\mu\llcorner E)(F)=\mu(E\cap F)$ whenever $F\subseteq X$ is such that $F\cap E\in\Sigma$. Show that, for any function $f$ from a subset of $X$ to $[-\infty,\infty]$, $\int f\,d(\mu\llcorner E)=\int_E f\,d\mu$ if either is defined in $[-\infty,\infty]$.
>>>(f) Let $g:\mathbb R\to\mathbb R$ be a non-decreasing function which is absolutely continuous on every closed bounded interval, and $\mu_g$ the associated Lebesgue-Stieltjes measure (114Xa, 225Xf). Write $\mu$ for Lebesgue measure on $\mathbb R$, and let $f:\mathbb R\to\mathbb R$ be a function. Show that $\int f\times g'\,d\mu=\int f\,d\mu_g$ in the sense that if one of the integrals exists, finite or infinite, so does the other, and they are then equal.
(g) Let $g:\mathbb R\to\mathbb R$ be a non-decreasing function and $J$ a non-negative real-valued $\mu_g$-integrable function, where $\mu_g$ is the Lebesgue-Stieltjes measure defined from $g$. Set $h(x)=\int_{]-\infty,x]}J\,d\mu_g$ for each $x\in\mathbb R$, and let $\mu_h$ be the Lebesgue-Stieltjes measure associated with $h$. Show that, for any $f:\mathbb R\to\mathbb R$, $\int f\times J\,d\mu_g=\int f\,d\mu_h$, in the sense that if one of the integrals is defined in $[-\infty,\infty]$ so is the other, and they are then equal.
>>>(h) Let $X$ be a set and $\lambda$, $\mu$, $\nu$ three measures on $X$ such that $\mu$ is an indefinite-integral measure over $\lambda$, with Radon-Nikodým derivative $f$, and $\nu$ is an indefinite-integral measure over $\mu$, with Radon-Nikodým derivative $g$. Show that $\nu$ is an indefinite-integral measure over $\lambda$, and that $f\times g$ is a Radon-Nikodým derivative of $\nu$ with respect to $\lambda$, provided we interpret $(f\times g)(x)$ as $0$ when $f(x)=0$ and $g(x)$ is undefined.
(i) In 235M, if $\nu$ is not semi-finite, show that we can still find a $J$ such that $\int_{\phi^{-1}[F]}J\,d\mu=\nu F$ for every set $F$ of finite measure. (Hint: use the 'semi-finite version' of $\nu$, as described in 213Xc.)
(j) Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space, and $\mathrm T$ a $\sigma$-subalgebra of $\Sigma$. Let $\nu:\mathrm T\to\mathbb R$ be a countably additive functional such that $\nu F=0$ whenever $F\in\mathrm T$ and $\mu F=0$. Show that there is a $\mu$-integrable function $f$ such that $\int_F f\,d\mu=\nu F$ for every $F\in\mathrm T$. (Hint: use the method of 235M, applied to the positive and negative parts of $\nu$.)
(k) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, with completions $(X,\hat\Sigma,\hat\mu)$ and $(Y,\hat{\mathrm T},\hat\nu)$. Let $\phi:D_\phi\to Y$, $J:D_J\to[0,\infty[$ be functions defined on conegligible subsets of $X$. Show that if $\int_{\phi^{-1}[F]}J\,d\mu=\nu F$ whenever $F\in\mathrm T$ and $\nu F<\infty$, then $\int_{\phi^{-1}[F]}J\,d\hat\mu=\hat\nu F$ whenever $F\in\hat{\mathrm T}$ and $\hat\nu F<\infty$. Hence, or otherwise, show that 235Pb is valid for non-complete spaces $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$.
(l) Let $(X,\Sigma,\mu)$ be a complete measure space, $Y$ a set, $\phi:X\to Y$ a function and $\nu=\mu\phi^{-1}$ the corresponding image measure on $Y$. Let $\nu_1$ be an indefinite-integral measure over $\nu$. Show that there is an indefinite-integral measure $\mu_1$ over $\mu$ such that $\nu_1$ is the image measure $\mu_1\phi^{-1}$.
(m) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. Let $\nu_1$ be an indefinite-integral measure over $\nu$. Show that there is an indefinite-integral measure $\mu_1$ over $\mu$ such that $\phi$ is inverse-measure-preserving for $\mu_1$ and $\nu_1$.
235Y Further exercises (a) Write $\mathrm T$ for the algebra of Borel subsets of $Y=[0,1]$, and $\nu$ for the restriction of Lebesgue measure to $\mathrm T$. Let $A\subseteq[0,1]$ be a set such that both $A$ and $[0,1]\setminus A$ have Lebesgue outer measure 1, and set $X=A\cup[1,2]$. Let $\Sigma$ be the algebra of relatively Borel subsets of $X$, and set $\mu E=\mu_A(A\cap E)$ for $E\in\Sigma$, where $\mu_A$ is the subspace measure induced on $A$ by Lebesgue measure. Define $\phi:X\to Y$ by setting $\phi(x)=x$ if $x\in A$, $x-1$ if $x\in X\setminus A$. Show that $\nu$ is the image measure $\mu\phi^{-1}$, but that, setting $g=\chi([0,1]\setminus A)$, $g\phi$ is $\mu$-integrable while $g$ is not $\nu$-integrable.
(b) Let $(X,\Sigma,\mu)$ be a probability space and $\mathrm T$ a $\sigma$-subalgebra of $\Sigma$. Let $f$ be a non-negative $\mu$-integrable function with $\int f\,d\mu=1$, so that its indefinite-integral measure $\nu$ is a probability measure. Let $g$ be a $\nu$-integrable real-valued function and set $h=f\times g$, interpreting $h(x)$ as $0$ if $f(x)=0$ and $g(x)$ is undefined. Let $f_1$, $h_1$ be conditional expectations of $f$, $h$ on $\mathrm T$ with respect to the measure $\mu$, and set $g_1=h_1/f_1$, interpreting $g_1(x)$ as $0$ if $h_1(x)=0$ and $f_1(x)$ is either $0$ or undefined. Show that $g_1$ is a conditional expectation of $g$ on $\mathrm T$ with respect to the measure $\nu$.
235 Notes and comments I see that I have taken up a great deal of space in this section with technicalities; the hypotheses of the theorems vary erratically, with completeness, in particular, being invoked at apparently arbitrary intervals, and ideas repeat themselves in a haphazard pattern. There is nothing deep, and most of the work consists in laboriously verifying details. The trouble with this topic is that it is useful. The results here are abstract expressions of integration-by-substitution; they have applications all over measure theory. I cannot therefore content myself with theorems which will elegantly express the underlying ideas, but must seek formulations which I can quote in later arguments.
I hope that the examples in 235Bb, 235H, 235N, 235Q, 234Ya and 235Ya will go some way to persuade you that there are real traps for the unwary, and that the careful verifications written out at such length are necessary. On the other hand, it is happily the case that in simple contexts, in which the measures $\mu$, $\nu$ are $\sigma$-finite and the transformations $\phi$ are Borel isomorphisms, no insuperable difficulties arise, and in particular the image measure catastrophe does not trouble us. But for further work in this direction I refer you to the applications in §§263, 265 and 271, and to Volume 4.
Chapter 24
Function spaces
The extraordinary power of Lebesgue's theory of integration is perhaps best demonstrated by its ability to provide structures relevant to questions quite different from those to which it was at first addressed. In this chapter I give the constructions, and elementary properties, of some of the fundamental spaces of functional analysis.
I do not feel called on here to justify the study of normed spaces; if you have not met them before, I hope that the introduction here will show at least that they offer a basis for a remarkable fusion of algebra and analysis. The fragments of the theory of metric spaces, normed spaces and general topology which we shall need are sketched in §§2A2-2A5. The principal function spaces described in this chapter in fact combine three structural elements: they are (infinite-dimensional) linear spaces, they are metric spaces, with associated concepts of continuity and convergence, and they are ordered spaces, with corresponding notions of supremum and infimum. The interactions between these three types of structure provide an inexhaustible wealth of ideas. Furthermore, many of these ideas are directly applicable to a wide variety of problems in more or less applied mathematics, particularly in differential and integral equations, but more generally in any system with infinitely many degrees of freedom.
I have laid out the chapter with sections on $L^0$ (the space of equivalence classes of all real-valued measurable functions, in which all the other spaces of the chapter are embedded), $L^1$ (equivalence classes of integrable functions), $L^\infty$ (equivalence classes of bounded measurable functions) and $L^p$ (equivalence classes of $p$th-power-integrable functions). While ordinary functional analysis gives much more attention to the Banach spaces $L^p$ for $1\le p\le\infty$ than to $L^0$, from the special point of view of this book the space $L^0$ is at least as important and interesting as any of the others. Following these four sections, I return to a study of the standard topology on $L^0$, the topology of 'convergence in measure' (§245), and then to two linked sections on uniform integrability and weak compactness in $L^1$ (§§246-247).
There is a technical point here which must never be lost sight of. While it is customary and natural to call $L^1$, $L^2$ and the others 'function spaces', their elements are not in fact functions, but equivalence classes of functions. As you see from the language of the preceding paragraph, my practice is to scrupulously maintain the distinction; I give my reasons in the notes to §241.
241 $\mathcal L^0$ and $L^0$
The chief aim of this chapter is to discuss the spaces $L^1$, $L^\infty$ and $L^p$ of the following three sections. However it will be convenient to regard all these as subspaces of a larger space $L^0$ of equivalence classes of (virtually) measurable functions, and I have collected in this section the basic facts concerning the ordered linear space $L^0$.
It is almost the first principle of measure theory that sets of measure zero can often be ignored; the phrase 'negligible set' itself asserts this principle. Accordingly, two functions which agree almost everywhere may often (not always!) be treated as identical. A suitable expression of this idea is to form the space of equivalence classes of functions, saying that two functions are equivalent if they agree on a conegligible set. This is the basis of all the constructions of this chapter. It is a remarkable fact that the spaces of equivalence classes so constructed are actually better adapted to certain problems than the spaces of functions from which they are derived, so that once the technique has been mastered it is easier to do one's thinking in the more abstract spaces.
241A The space $\mathcal L^0$: Definition It is time to give a name to a set of functions which has already been used more than once. Let $(X,\Sigma,\mu)$ be a measure space. I write $\mathcal L^0$, or $\mathcal L^0(\mu)$, for the space of real-valued functions $f$ defined on conegligible subsets of $X$ which are virtually measurable, that is, such that $f\restriction E$ is measurable for some conegligible set $E\subseteq X$. Recall that $f$ is $\mu$-virtually measurable iff it is $\hat\Sigma$-measurable, where $\hat\Sigma$ is the domain of the completion of $\mu$ (212Fa).
241B Basic properties If $(X,\Sigma,\mu)$ is any measure space, then we have the following facts, corresponding to the fundamental properties of measurable functions listed in §121 of Volume 1. I work through them in order, so that if you have Volume 1 to hand you can see what has to be missed out.
(a) A constant real-valued function defined almost everywhere in $X$ belongs to $\mathcal L^0$ (121Ea).
(b) $f+g\in\mathcal L^0$ for all $f$, $g\in\mathcal L^0$ (for if $f\restriction F$ and $g\restriction G$ are measurable, then $(f+g)\restriction(F\cap G)=(f\restriction F)+(g\restriction G)$ is measurable) (121Eb).
(c) $cf\in\mathcal L^0$ for all $f\in\mathcal L^0$, $c\in\mathbb R$ (121Ec).
(d) $f\times g\in\mathcal L^0$ for all $f$, $g\in\mathcal L^0$ (121Ed).
(e) If $f\in\mathcal L^0$ and $h:\mathbb R\to\mathbb R$ is Borel measurable, then $hf\in\mathcal L^0$ (121Eg).
(f) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$ and $f=\lim_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal L^0$ (121Fa).
(g) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$ and $f=\sup_{n\in\mathbb N}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal L^0$ (121Fb).
(h) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$ and $f=\inf_{n\in\mathbb N}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal L^0$ (121Fc).
(i) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$ and $f=\limsup_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal L^0$ (121Fd).
(j) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$ and $f=\liminf_{n\to\infty}f_n$ is defined (as a real-valued function) almost everywhere in $X$, then $f\in\mathcal L^0$ (121Fe).
(k) $\mathcal L^0$ is just the set of real-valued functions, defined on subsets of $X$, which are equal almost everywhere to some $\Sigma$-measurable function from $X$ to $\mathbb R$. PPP (i) If $g:X\to\mathbb R$ is $\Sigma$-measurable and $f=_{\mathrm{a.e.}}g$, then $F=\{x:x\in\operatorname{dom}f,\ f(x)=g(x)\}$ is conegligible and $f\restriction F=g\restriction F$ is measurable (121Eh), so $f\in\mathcal L^0$. (ii) If $f\in\mathcal L^0$, let $E\subseteq X$ be a conegligible set such that $f\restriction E$ is measurable. Then $D=E\cap\operatorname{dom}f$ is conegligible and $f\restriction D$ is measurable, so there is a measurable $h:X\to\mathbb R$ agreeing with $f$ on $D$ (121I); and $h=_{\mathrm{a.e.}}f$. QQQ
241C The space $L^0$: Definition Let $(X,\Sigma,\mu)$ be any measure space. Then $=_{\mathrm{a.e.}}$ is an equivalence relation on $\mathcal L^0$. Write $L^0$, or $L^0(\mu)$, for the set of equivalence classes in $\mathcal L^0$ under $=_{\mathrm{a.e.}}$. For $f\in\mathcal L^0$, write $f^\bullet$ for its equivalence class in $L^0$.
241D The linear structure of $L^0$ Let $(X,\Sigma,\mu)$ be any measure space, and set $\mathcal L^0=\mathcal L^0(\mu)$, $L^0=L^0(\mu)$.
(a) If $f_1$, $f_2$, $g_1$, $g_2\in\mathcal L^0$, $f_1=_{\mathrm{a.e.}}f_2$ and $g_1=_{\mathrm{a.e.}}g_2$ then $f_1+g_1=_{\mathrm{a.e.}}f_2+g_2$. Accordingly we may define addition on $L^0$ by setting $f^\bullet+g^\bullet=(f+g)^\bullet$ for all $f$, $g\in\mathcal L^0$.
(b) If $f_1$, $f_2\in\mathcal L^0$ and $f_1=_{\mathrm{a.e.}}f_2$, then $cf_1=_{\mathrm{a.e.}}cf_2$ for every $c\in\mathbb R$. Accordingly we may define scalar multiplication on $L^0$ by setting $c\cdot f^\bullet=(cf)^\bullet$ for all $f\in\mathcal L^0$ and $c\in\mathbb R$.
(c) Now $L^0$ is a linear space over $\mathbb R$, with zero $\mathbf 0^\bullet$, where $\mathbf 0$ is the function with domain $X$ and constant value $0$, and negatives $-(f^\bullet)=(-f)^\bullet$. PPP (i)
$f+(g+h)=(f+g)+h$ for all $f$, $g$, $h\in\mathcal L^0$,
so
$u+(v+w)=(u+v)+w$ for all $u$, $v$, $w\in L^0$.
(ii)
$f+\mathbf 0=\mathbf 0+f=f$ for every $f\in\mathcal L^0$,
so
$u+\mathbf 0^\bullet=\mathbf 0^\bullet+u=u$ for every $u\in L^0$.
(iii)
$f+(-f)=_{\mathrm{a.e.}}\mathbf 0$ for every $f\in\mathcal L^0$,
so
$f^\bullet+(-f)^\bullet=\mathbf 0^\bullet$ for every $f\in\mathcal L^0$.
(iv)
$f+g=g+f$ for all $f$, $g\in\mathcal L^0$,
so
$u+v=v+u$ for all $u$, $v\in L^0$.
(v)
$c(f+g)=cf+cg$ for all $f$, $g\in\mathcal L^0$ and $c\in\mathbb R$,
so
$c(u+v)=cu+cv$ for all $u$, $v\in L^0$ and $c\in\mathbb R$.
(vi)
$(a+b)f=af+bf$ for all $f\in\mathcal L^0$ and $a$, $b\in\mathbb R$,
so
$(a+b)u=au+bu$ for all $u\in L^0$ and $a$, $b\in\mathbb R$.
(vii)
$(ab)f=a(bf)$ for all $f\in\mathcal L^0$ and $a$, $b\in\mathbb R$,
so
$(ab)u=a(bu)$ for all $u\in L^0$ and $a$, $b\in\mathbb R$.
(viii)
$1f=f$ for all $f\in\mathcal L^0$,
so
$1u=u$ for all $u\in L^0$. QQQ
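The point of 241D(a) is that addition of equivalence classes does not depend on the representatives chosen. A small finite model can make this tangible. It is a sketch under assumptions: the three-point space with one negligible point, and the helper names `negligible`, `in_script_L0`, `eq_ae` and `add`, are all hypothetical and introduced only for the illustration. Functions are modelled as dictionaries defined on subsets of $X$.

```python
from fractions import Fraction

# Hypothetical measure on X = {0, 1, 2}; the point 2 is negligible.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}

def negligible(points):
    """A set is negligible here iff its total mass is zero."""
    return sum(mu[x] for x in points) == 0

def in_script_L0(f):
    """f (a dict on a subset of X) must be defined on a conegligible set."""
    return negligible(set(mu) - set(f))

def eq_ae(f, g):
    """f = g almost everywhere: they agree off a negligible set."""
    bad = {x for x in mu if x not in f or x not in g or f[x] != g[x]}
    return negligible(bad)

def add(f, g):
    """Pointwise sum, defined on the intersection of the two domains."""
    return {x: f[x] + g[x] for x in f if x in g}

f1 = {0: 1, 1: 2, 2: 7}
f2 = {0: 1, 1: 2}          # same class as f1; undefined at the negligible point
g1 = {0: 3, 1: 4, 2: 0}
g2 = {0: 3, 1: 4, 2: 99}   # same class as g1

print(eq_ae(add(f1, g1), add(f2, g2)))
```

Since `f1`, `f2` represent one class and `g1`, `g2` another, the two sums represent the same class, which is exactly what makes $f^\bullet+g^\bullet=(f+g)^\bullet$ a legitimate definition.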
241E The order structure of $L^0$ Let $(X,\Sigma,\mu)$ be any measure space and set $\mathcal L^0=\mathcal L^0(\mu)$, $L^0=L^0(\mu)$.
(a) If $f_1$, $f_2$, $g_1$, $g_2\in\mathcal L^0$, $f_1=_{\mathrm{a.e.}}f_2$, $g_1=_{\mathrm{a.e.}}g_2$ and $f_1\le_{\mathrm{a.e.}}g_1$, then $f_2\le_{\mathrm{a.e.}}g_2$. Accordingly we may define a relation $\le$ on $L^0$ by saying that $f^\bullet\le g^\bullet$ iff $f\le_{\mathrm{a.e.}}g$.
(b) Now $\le$ is a partial order on $L^0$. PPP (i) If $f$, $g$, $h\in\mathcal L^0$ and $f\le_{\mathrm{a.e.}}g$ and $g\le_{\mathrm{a.e.}}h$, then $f\le_{\mathrm{a.e.}}h$. Accordingly $u\le w$ whenever $u$, $v$, $w\in L^0$, $u\le v$ and $v\le w$. (ii) If $f\in\mathcal L^0$ then $f\le_{\mathrm{a.e.}}f$; so $u\le u$ for every $u\in L^0$. (iii) If $f$, $g\in\mathcal L^0$ and $f\le_{\mathrm{a.e.}}g$ and $g\le_{\mathrm{a.e.}}f$, then $f=_{\mathrm{a.e.}}g$, so if $u\le v$ and $v\le u$ then $u=v$. QQQ
(c) In fact $L^0$, with $\le$, is a partially ordered linear space, that is, a (real) linear space with a partial order $\le$ such that
if $u\le v$ then $u+w\le v+w$ for every $w$,
if $0\le u$ then $0\le cu$ for every $c\ge 0$.
PPP (i) If $f$, $g$, $h\in\mathcal L^0$ and $f\le_{\mathrm{a.e.}}g$, then $f+h\le_{\mathrm{a.e.}}g+h$. (ii) If $f\in\mathcal L^0$ and $f\ge 0$ a.e., then $cf\ge 0$ a.e. for every $c\ge 0$. QQQ
(d) More: $L^0$ is a Riesz space or vector lattice, that is, a partially ordered linear space such that $u\vee v=\sup\{u,v\}$ and $u\wedge v=\inf\{u,v\}$ are defined for all $u$, $v\in L^0$. PPP Take $f$, $g\in\mathcal L^0$ such that $f^\bullet=u$ and $g^\bullet=v$. Then $f\vee g$, $f\wedge g\in\mathcal L^0$, writing
$(f\vee g)(x)=\max(f(x),g(x))$, $(f\wedge g)(x)=\min(f(x),g(x))$
for $x\in\operatorname{dom}f\cap\operatorname{dom}g$. (Compare 241Bg-h.) Now, for any $h\in\mathcal L^0$, we have
$f\vee g\le_{\mathrm{a.e.}}h\iff f\le_{\mathrm{a.e.}}h$ and $g\le_{\mathrm{a.e.}}h$,
$h\le_{\mathrm{a.e.}}f\wedge g\iff h\le_{\mathrm{a.e.}}f$ and $h\le_{\mathrm{a.e.}}g$,
so for any $w\in L^0$ we have
$(f\vee g)^\bullet\le w\iff u\le w$ and $v\le w$,
$w\le(f\wedge g)^\bullet\iff w\le u$ and $w\le v$.
Thus we have
$(f\vee g)^\bullet=\sup\{u,v\}=u\vee v$, $(f\wedge g)^\bullet=\inf\{u,v\}=u\wedge v$
in $L^0$. QQQ
(e) In particular, for any $u\in L^0$ we can speak of $|u|=u\vee(-u)$; if $f\in\mathcal L^0$ then $|f^\bullet|=|f|^\bullet$.
If $f$, $g\in\mathcal L^0$, $c\in\mathbb R$ then
$|cf|=|c||f|$, $f\vee g=\frac12(f+g+|f-g|)$,
$f\wedge g=\frac12(f+g-|f-g|)$, $|f+g|\le_{\mathrm{a.e.}}|f|+|g|$,
so
$|cu|=|c||u|$, $u\vee v=\frac12(u+v+|u-v|)$,
$u\wedge v=\frac12(u+v-|u-v|)$, $|u+v|\le|u|+|v|$
for all $u$, $v\in L^0$.
(f) A special notation is often useful. If $f$ is a real-valued function, set $f^+(x)=\max(f(x),0)$, $f^-(x)=\max(-f(x),0)$ for $x\in\operatorname{dom}f$, so that
$f=f^+-f^-$, $|f|=f^++f^-=f^+\vee f^-$,
all these functions being defined on $\operatorname{dom}f$. In $L^0$, the corresponding operations are $u^+=u\vee 0$, $u^-=(-u)\vee 0$, and we have
$u=u^+-u^-$, $|u|=u^++u^-=u^+\vee u^-$, $u^+\wedge u^-=0$.
(g) It is perhaps obvious, but I say it anyway: if $u\ge 0$ in $L^0$, then there is an $f\ge 0$ in $\mathcal L^0$ such that $f^\bullet=u$. PPP Take any $g\in\mathcal L^0$ such that $u=g^\bullet$, and set $f=g\vee 0$. QQQ
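The lattice identities in 241E(e)-(f) are pointwise statements about representatives, so they can be spot-checked on sample values. The sketch below is an illustration only; the sample vectors `f`, `g` and the helper names `join`, `meet`, `pos`, `neg` are hypothetical, and a function is modelled simply as the list of its values at four points.

```python
from fractions import Fraction

# Sample values of two hypothetical functions at four points.
f = [Fraction(3), Fraction(-1), Fraction(0), Fraction(5)]
g = [Fraction(1), Fraction(4), Fraction(0), Fraction(-2)]

def join(u, v):
    """Pointwise maximum, the lattice operation u v v."""
    return [max(a, b) for a, b in zip(u, v)]

def meet(u, v):
    """Pointwise minimum, the lattice operation u ^ v."""
    return [min(a, b) for a, b in zip(u, v)]

def pos(u):
    """Positive part u+ = max(u, 0)."""
    return [max(a, Fraction(0)) for a in u]

def neg(u):
    """Negative part u- = max(-u, 0)."""
    return [max(-a, Fraction(0)) for a in u]

# Right-hand sides of the identities for f v g and f ^ g.
half_sum_plus = [(a + b + abs(a - b)) / 2 for a, b in zip(f, g)]
half_sum_minus = [(a + b - abs(a - b)) / 2 for a, b in zip(f, g)]

print(join(f, g) == half_sum_plus, meet(f, g) == half_sum_minus)
```

Because each identity holds at every point where the functions are defined, it holds almost everywhere, and so passes to the equivalence classes in $L^0$.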


241F Riesz spaces There is an extensive abstract theory of Riesz spaces, which I think it best to leave aside for the moment; a general account may be found in Luxemburg & Zaanen 71 and Zaanen 83; my own book Fremlin 74 covers the elementary material, and Chapter 35 in the next volume repeats the most essential ideas. For our purposes here we need only a few definitions and some simple results which are most easily proved for the special cases in which we need them, without reference to the general theory.
(a) A Riesz space $U$ is Archimedean if whenever $u\in U$, $u>0$ (that is, $u\ge 0$ and $u\ne 0$), and $v\in U$, there is an $n\in\mathbb N$ such that $nu\not\le v$.
(b) A Riesz space $U$ is Dedekind $\sigma$-complete (or $\sigma$-order-complete, or $\sigma$-complete) if every non-empty countable set $A\subseteq U$ which is bounded above has a least upper bound in $U$.
(c) A Riesz space $U$ is Dedekind complete (or order complete, or complete) if every non-empty set $A\subseteq U$ which is bounded above in $U$ has a least upper bound in $U$.
241G Now we have the following important properties of $L^0$.
Theorem Let $(X,\Sigma,\mu)$ be a measure space. Set $L^0=L^0(\mu)$.
(a) $L^0$ is Archimedean and Dedekind $\sigma$-complete.
(b) If $(X,\Sigma,\mu)$ is semi-finite, then $L^0$ is Dedekind complete iff $(X,\Sigma,\mu)$ is localizable.
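proof Set $\mathcal L^0=\mathcal L^0(\mu)$.
(a)(i) If $u$, $v\in L^0$ and $u>0$, express $u$ as $f^\bullet$ and $v$ as $g^\bullet$ where $f$, $g\in\mathcal L^0$. Then $E=\{x:x\in\operatorname{dom}f,\ f(x)>0\}$ is not negligible. So there is an $n\in\mathbb N$ such that
$E_n=\{x:x\in\operatorname{dom}f\cap\operatorname{dom}g,\ nf(x)>g(x)\}$
is not negligible, since $E\cap\operatorname{dom}g\subseteq\bigcup_{n\in\mathbb N}E_n$. But now $nu\not\le v$. As $u$ and $v$ are arbitrary, $L^0$ is Archimedean.
(ii) Now let $A\subseteq L^0$ be a non-empty countable set with an upper bound $w$ in $L^0$. Express $A$ as $\{f_n^\bullet:n\in\mathbb N\}$ where $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L^0$, and $w$ as $h^\bullet$ where $h\in\mathcal L^0$; write $u_n=f_n^\bullet$ for each $n$. Set $f=\sup_{n\in\mathbb N}f_n$. Then we have $f(x)$ defined in $\mathbb R$ at any point $x\in\operatorname{dom}h\cap\bigcap_{n\in\mathbb N}\operatorname{dom}f_n$ such that $f_n(x)\le h(x)$ for every $n\in\mathbb N$, that is, for almost every $x\in X$; so $f\in\mathcal L^0$ (241Bg). Set $u=f^\bullet\in L^0$. If $v\in L^0$, say $v=g^\bullet$ where $g\in\mathcal L^0$, then
$u_n\le v$ for every $n\in\mathbb N$
$\iff$ for every $n\in\mathbb N$, $f_n\le_{\text{a.e.}}g$
$\iff$ for almost every $x\in X$, $f_n(x)\le g(x)$ for every $n\in\mathbb N$
$\iff f\le_{\text{a.e.}}g\iff u\le v$.
Thus $u=\sup_{n\in\mathbb N}u_n$ in $L^0$. As $A$ is arbitrary, $L^0$ is Dedekind $\sigma$-complete.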
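(b)(i) Suppose that $(X,\Sigma,\mu)$ is localizable. Let $A\subseteq L^0$ be any non-empty set with an upper bound $w_0\in L^0$. Set
$\mathcal A=\{f:f$ is a measurable function from $X$ to $\mathbb R$, $f^\bullet\in A\}$;
then every member of $A$ is of the form $f^\bullet$ for some $f\in\mathcal A$ (241Bk). For each $q\in\mathbb Q$, let $\mathcal E_q$ be the family of subsets of $X$ expressible in the form $\{x:f(x)\ge q\}$ for some $f\in\mathcal A$; then $\mathcal E_q\subseteq\Sigma$. Because $(X,\Sigma,\mu)$ is localizable, there is a set $F_q\in\Sigma$ which is an essential supremum for $\mathcal E_q$. For $x\in X$, set
$g^*(x)=\sup\{q:q\in\mathbb Q,\ x\in F_q\}$,
allowing $\infty$ as the supremum of a set which is not bounded above, and $-\infty$ as $\sup\emptyset$. Then
$\{x:g^*(x)>a\}=\bigcup_{q\in\mathbb Q,\,q>a}F_q\in\Sigma$
for every $a\in\mathbb R$.
If $f\in\mathcal A$, then $f\le_{\text{a.e.}}g^*$. PPP For each $q\in\mathbb Q$, set
$E_q=\{x:f(x)\ge q\}\in\mathcal E_q$;
then $E_q\setminus F_q$ is negligible. Set $H=\bigcup_{q\in\mathbb Q}(E_q\setminus F_q)$. If $x\in X\setminus H$, then
$f(x)\ge q\implies g^*(x)\ge q$,
so $f(x)\le g^*(x)$; thus $f\le_{\text{a.e.}}g^*$. QQQ
If $h:X\to\mathbb R$ is measurable and $u\le h^\bullet$ for every $u\in A$, then $g^*\le_{\text{a.e.}}h$. PPP Set $G_q=\{x:h(x)\ge q\}$ for each $q\in\mathbb Q$. If $E\in\mathcal E_q$, there is an $f\in\mathcal A$ such that $E=\{x:f(x)\ge q\}$; now $f\le_{\text{a.e.}}h$, so $E\setminus G_q\subseteq\{x:f(x)>h(x)\}$ is negligible. Because $F_q$ is an essential supremum for $\mathcal E_q$, $F_q\setminus G_q$ is negligible; and this is true for every $q\in\mathbb Q$. Consequently
$\{x:h(x)<g^*(x)\}\subseteq\bigcup_{q\in\mathbb Q}F_q\setminus G_q$
is negligible, and $g^*\le_{\text{a.e.}}h$. QQQ
Now recall that we are assuming that $A\ne\emptyset$ and that $A$ has an upper bound $w_0\in L^0$. Take any $f_0\in\mathcal A$ and a measurable $h_0:X\to\mathbb R$ such that $h_0^\bullet=w_0$; then $f\le_{\text{a.e.}}h_0$ for every $f\in\mathcal A$, so $f_0\le_{\text{a.e.}}g^*\le_{\text{a.e.}}h_0$, and $g^*$ must be finite a.e. Setting $g(x)=g^*(x)$ when $g^*(x)\in\mathbb R$, we have $g\in\mathcal L^0$ and $g=_{\text{a.e.}}g^*$, so that
$f\le_{\text{a.e.}}g\le_{\text{a.e.}}h$
whenever $f$, $h$ are measurable functions from $X$ to $\mathbb R$, $f^\bullet\in A$ and $h^\bullet$ is an upper bound for $A$; that is,
$u\le g^\bullet\le w$
whenever $u\in A$ and $w$ is an upper bound for $A$. But this means that $g^\bullet$ is a least upper bound for $A$ in $L^0$. As $A$ is arbitrary, $L^0$ is Dedekind complete.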
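(ii) Suppose that $L^0$ is Dedekind complete. We are assuming that $(X,\Sigma,\mu)$ is semi-finite. Let $\mathcal E$ be any subset of $\Sigma$. Set
$A=\{0\}\cup\{(\chi E)^\bullet:E\in\mathcal E\}\subseteq L^0$.
Then $A$ is bounded above by $(\chi X)^\bullet$ so has a least upper bound $w\in L^0$. Express $w$ as $h^\bullet$ where $h:X\to\mathbb R$ is measurable, and set $F=\{x:h(x)>0\}$. Then $F$ is an essential supremum for $\mathcal E$ in $\Sigma$. PPP ($\alpha$) If $E\in\mathcal E$, then $(\chi E)^\bullet\le w$ so $\chi E\le_{\text{a.e.}}h$, that is, $h(x)\ge 1$ for almost every $x\in E$, and $E\setminus F\subseteq\{x:x\in E,\ h(x)<1\}$ is negligible. ($\beta$) If $G\in\Sigma$ and $E\setminus G$ is negligible for every $E\in\mathcal E$, then $\chi E\le_{\text{a.e.}}\chi G$ for every $E\in\mathcal E$, that is, $(\chi E)^\bullet\le(\chi G)^\bullet$ for every $E\in\mathcal E$; so $w\le(\chi G)^\bullet$, that is, $h\le_{\text{a.e.}}\chi G$. Accordingly $F\setminus G\subseteq\{x:h(x)>(\chi G)(x)\}$ is negligible. QQQ
As $\mathcal E$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.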
241H The multiplicative structure of $L^0$ Let $(X,\Sigma,\mu)$ be any measure space; write $L^0=L^0(\mu)$, $\mathcal L^0=\mathcal L^0(\mu)$.
(a) If $f_1$, $f_2$, $g_1$, $g_2\in\mathcal L^0$, $f_1=_{\text{a.e.}}f_2$ and $g_1=_{\text{a.e.}}g_2$ then $f_1\times g_1=_{\text{a.e.}}f_2\times g_2$. Accordingly we may define multiplication on $L^0$ by setting $f^\bullet\times g^\bullet=(f\times g)^\bullet$ for all $f$, $g\in\mathcal L^0$.
(b) It is now easy to check that, for all $u$, $v$, $w\in L^0$ and $c\in\mathbb R$,
$u\times(v\times w)=(u\times v)\times w$,
$u\times e=e\times u=u$,
where $e=(\chi X)^\bullet$ is the equivalence class of the function with constant value 1,
$c(u\times v)=cu\times v=u\times cv$,
$u\times(v+w)=(u\times v)+(u\times w)$,
$(u+v)\times w=(u\times w)+(v\times w)$,
$u\times v=v\times u$,
$|u\times v|=|u|\times|v|$,
$u\times v=0$ iff $|u|\wedge|v|=0$,
$|u|\le|v|$ iff there is a $w$ such that $|w|\le e$ and $u=v\times w$.
241I The action of Borel functions on $L^0$ Let $(X,\Sigma,\mu)$ be a measure space and $h:\mathbb R\to\mathbb R$ a Borel measurable function. Then $hf\in\mathcal L^0=\mathcal L^0(\mu)$ for every $f\in\mathcal L^0$ (241Be) and $hf=_{\text{a.e.}}hg$ whenever $f=_{\text{a.e.}}g$. So we have a function $\bar h:L^0\to L^0$ defined by setting $\bar h(f^\bullet)=(hf)^\bullet$ for every $f\in\mathcal L^0$. For instance, if $u\in L^0$ and $p\ge 1$, we can consider $|u|^p=\bar h(u)$ where $h(x)=|x|^p$ for $x\in\mathbb R$.
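The point of 241I is that composing with a Borel function respects equality almost everywhere, so the induced map on equivalence classes is well defined. A minimal sketch, assuming a hypothetical three-point space whose last point is negligible (the names `MU`, `act` and `power_p` are illustrative only):

```python
# Hypothetical finite model (an assumption, not from the text):
# three points, every subset measurable, the last point negligible.
MU = [0.5, 0.5, 0.0]

def ae_equal(f, g):
    """True when f and g differ only on a set of measure zero."""
    return sum(m for m, a, b in zip(MU, f, g) if a != b) == 0

def act(h, f):
    """A representative of the image of the class of f: the composition h o f."""
    return [h(a) for a in f]

def power_p(p):
    """The function t -> |t|^p used as the example in the text."""
    return lambda t: abs(t) ** p

f = [1, -2, 7]
g = [1, -2, -9]          # differs from f only at the negligible point
h = power_p(3)

print(ae_equal(f, g))                    # the two functions define one class
print(act(h, f) == act(h, g))            # the compositions differ as functions
print(ae_equal(act(h, f), act(h, g)))    # but they define the same class
```

The second line shows why the statement is about classes and not about functions: the two compositions are different functions that agree almost everywhere.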
241J Complex $L^0$ The ideas of this chapter, like those of Chapters 22-23, are often applied to spaces based on complex-valued functions instead of real-valued functions. Let $(X,\Sigma,\mu)$ be a measure space.
(a) We may write $\mathcal L^0_{\mathbb C}=\mathcal L^0_{\mathbb C}(\mu)$ for the space of complex-valued functions $f$ such that $\operatorname{dom}f$ is a conegligible subset of $X$ and there is a conegligible subset $E\subseteq X$ such that $f\restriction E$ is measurable; that is, such that the real and imaginary parts of $f$ both belong to $\mathcal L^0(\mu)$. Next, $L^0_{\mathbb C}=L^0_{\mathbb C}(\mu)$ will be the space of equivalence classes in $\mathcal L^0_{\mathbb C}$ under the equivalence relation $=_{\text{a.e.}}$.
(b) Using just the same formulae as in 241D, it is easy to describe addition and scalar multiplication rendering $L^0_{\mathbb C}$ a linear space over $\mathbb C$. We no longer have quite the same kind of order structure, but we can identify a 'real part', being
$\{f^\bullet:f\in\mathcal L^0_{\mathbb C}$ is real a.e.$\}$,
obviously identifiable with the real linear space $L^0$, and corresponding maps $u\mapsto\operatorname{Re}(u)$, $u\mapsto\operatorname{Im}(u):L^0_{\mathbb C}\to L^0$ such that $u=\operatorname{Re}(u)+i\operatorname{Im}(u)$ for every $u$. Moreover, we have a notion of 'modulus', writing
$|f^\bullet|=|f|^\bullet\in L^0$ for every $f\in\mathcal L^0_{\mathbb C}$,
satisfying the basic relations $|cu|=|c||u|$, $|u+v|\le|u|+|v|$ for $u$, $v\in L^0_{\mathbb C}$ and $c\in\mathbb C$, as in 241Ef. We do of course still have a multiplication on $L^0_{\mathbb C}$, for which all the formulae in 241H are still valid.
(c) The following fact is useful. For any $u\in L^0_{\mathbb C}$, $|u|$ is the supremum in $L^0$ of $\{\operatorname{Re}(\zeta u):\zeta\in\mathbb C,\ |\zeta|=1\}$. PPP (i) If $|\zeta|=1$, then $\operatorname{Re}(\zeta u)\le|\zeta u|=|u|$. So $|u|$ is an upper bound of $\{\operatorname{Re}(\zeta u):|\zeta|=1\}$. (ii) If $v\in L^0$ and $\operatorname{Re}(\zeta u)\le v$ whenever $|\zeta|=1$, then express $u$, $v$ as $f^\bullet$, $g^\bullet$ where $f:X\to\mathbb C$ and $g:X\to\mathbb R$ are measurable. For any $q\in\mathbb Q$, $x\in X$ set $f_q(x)=\operatorname{Re}(e^{iq}f(x))$. Then $f_q\le_{\text{a.e.}}g$. Accordingly $H=\{x:f_q(x)\le g(x)$ for every $q\in\mathbb Q\}$ is conegligible. But of course $H=\{x:|f(x)|\le g(x)\}$, because $\{e^{iq}:q\in\mathbb Q\}$ is dense in the unit circle; so $|f|\le_{\text{a.e.}}g$ and $|u|\le v$. As $v$ is arbitrary, $|u|$ is the least upper bound of $\{\operatorname{Re}(\zeta u):|\zeta|=1\}$. QQQ
241X Basic exercises >(a) Let $X$ be a set, and let $\mu$ be counting measure on $X$ (112Bd). Show that $L^0(\mu)$ can be identified with $\mathcal L^0(\mu)=\mathbb R^X$.
>(b) Let $(X,\Sigma,\mu)$ be a measure space and $\hat\mu$ the completion of $\mu$. Show that $\mathcal L^0(\mu)=\mathcal L^0(\hat\mu)$ and $L^0(\mu)=L^0(\hat\mu)$.
(c) Let $(X,\Sigma,\mu)$ be a measure space. (i) Show that for every $u\in L^0(\mu)$ we may define an outer measure $\theta_u:\mathcal P\mathbb R\to[0,\infty]$ by writing $\theta_u(A)=\mu^*f^{-1}[A]$ whenever $A\subseteq\mathbb R$ and $f\in\mathcal L^0(\mu)$ is such that $f^\bullet=u$. (ii) Show that the measure defined from $\theta_u$ by Carathéodory's method measures every Borel subset of $\mathbb R$.
(d) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X,\Sigma,\mu)$ (214L). (i) Writing $\phi_i:X_i\to X$ for the canonical maps (in the construction of 214L, $\phi_i(x)=(x,i)$ for $x\in X_i$), show that $f\mapsto\langle f\phi_i\rangle_{i\in I}$ is a bijection between $\mathcal L^0(\mu)$ and $\prod_{i\in I}\mathcal L^0(\mu_i)$. (ii) Show that this corresponds to a bijection between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$.
(e) Let $U$ be a Dedekind $\sigma$-complete Riesz space and $A\subseteq U$ a non-empty countable set which is bounded below in $U$. Show that $\inf A$ is defined in $U$.
(f) Let $U$ be a Dedekind complete Riesz space and $A\subseteq U$ a non-empty set which is bounded below in $U$. Show that $\inf A$ is defined in $U$.
(g) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. (i) Show that we have a map $T:L^0(\nu)\to L^0(\mu)$ defined by setting $Tg^\bullet=(g\phi)^\bullet$ for every $g\in\mathcal L^0(\nu)$. (ii) Show that $T$ is linear, that $T(v\times w)=Tv\times Tw$ for all $v$, $w\in L^0(\nu)$, and that $T(\sup_{n\in\mathbb N}v_n)=\sup_{n\in\mathbb N}Tv_n$ whenever $\langle v_n\rangle_{n\in\mathbb N}$ is a sequence in $L^0(\nu)$ with an upper bound in $L^0(\nu)$.
>(h) Let $(X,\Sigma,\mu)$ be a measure space. Suppose that $r\ge 1$ and that $h:\mathbb R^r\to\mathbb R$ is a Borel measurable function. Show that there is a function $\bar h:L^0(\mu)^r\to L^0(\mu)$ defined by writing
$\bar h(f_1^\bullet,\dots,f_r^\bullet)=(h(f_1,\dots,f_r))^\bullet$
for $f_1,\dots,f_r\in\mathcal L^0(\mu)$.
(i) Let $(X,\Sigma,\mu)$ be a measure space and $g$, $h$, $\langle g_n\rangle_{n\in\mathbb N}$ Borel measurable functions from $\mathbb R$ to itself; write $\bar g$, $\bar h$, $\bar g_n$ for the corresponding functions from $L^0=L^0(\mu)$ to itself (241I). (i) Show that
$\bar g(u)+\bar h(u)=\overline{g+h}(u)$, $\bar g(u)\times\bar h(u)=\overline{g\times h}(u)$, $\bar g(\bar h(u))=\overline{gh}(u)$
for every $u\in L^0$. (ii) Show that if $g(t)\le h(t)$ for every $t\in\mathbb R$, then $\bar g(u)\le\bar h(u)$ for every $u\in L^0$. (iii) Show that if $g$ is non-decreasing, then $\bar g(u)\le\bar g(v)$ whenever $u\le v$ in $L^0$. (iv) Show that if $h(t)=\sup_{n\in\mathbb N}g_n(t)$ for every $t\in\mathbb R$, then $\bar h(u)=\sup_{n\in\mathbb N}\bar g_n(u)$ in $L^0$ for every $u\in L^0$.
241Y Further exercises (a) Let $U$ be any Riesz space. For $u\in U$ write $|u|=u\vee(-u)$, $u^+=u\vee 0$, $u^-=(-u)\vee 0$. Show that, for any $u$, $v\in U$,
$u=u^+-u^-$, $|u|=u^++u^-=u^+\vee u^-$, $u^+\wedge u^-=0$,
$u\vee v=\tfrac12(u+v+|u-v|)=u+(v-u)^+$,
$u\wedge v=\tfrac12(u+v-|u-v|)=u-(u-v)^+$,
$|u+v|\le|u|+|v|$.
(b) Let $U$ be a partially ordered linear space and $N$ a linear subspace of $U$ such that whenever $u$, $u'\in N$ and $u'\le v\le u$ then $v\in N$. (i) Show that the linear space quotient $U/N$ is a partially ordered linear space if we say that $u^\bullet\le v^\bullet$ in $U/N$ iff there is a $w\in N$ such that $u\le v+w$ in $U$. (ii) Show that in this case $U/N$ is a Riesz space if $U$ is a Riesz space and $|u|\in N$ for every $u\in N$.
(c) Let $(X,\Sigma,\mu)$ be a measure space. Write $\mathcal L^0_{\text{strict}}$ for the space of all measurable functions from $X$ to $\mathbb R$, and $\mathbf N$ for the subspace of $\mathcal L^0_{\text{strict}}$ consisting of measurable functions which are zero almost everywhere. (i) Show that $\mathcal L^0_{\text{strict}}$ is a Dedekind $\sigma$-complete Riesz space. (ii) Show that $L^0(\mu)$ can be identified, as ordered linear space, with the quotient $\mathcal L^0_{\text{strict}}/\mathbf N$ as defined in 241Yb above.
(d) Show that any Dedekind $\sigma$-complete Riesz space is Archimedean.
(e) A Riesz space $U$ is said to have the countable sup property if for every $A\subseteq U$ with a least upper bound in $U$, there is a countable $B\subseteq A$ such that $\sup B=\sup A$. Show that if $(X,\Sigma,\mu)$ is a semi-finite measure space, then it is $\sigma$-finite iff $L^0(\mu)$ has the countable sup property.
(f) Let $(X,\Sigma,\mu)$ be a measure space and $\tilde\mu$ the c.l.d. version of $\mu$ (213E). (i) Show that $\mathcal L^0(\mu)\subseteq\mathcal L^0(\tilde\mu)$. (ii) Show that this inclusion defines a linear operator $T:L^0(\mu)\to L^0(\tilde\mu)$ such that $T(u\times v)=Tu\times Tv$ for all $u$, $v\in L^0(\mu)$. (iii) Show that whenever $v>0$ in $L^0(\tilde\mu)$ there is a $u\ge 0$ in $L^0(\mu)$ such that $0<Tu\le v$. (iv) Show that $T(\sup A)=\sup T[A]$ whenever $A\subseteq L^0(\mu)$ is a non-empty set with a least upper bound in $L^0(\mu)$. (v) Show that $T$ is injective iff $\mu$ is semi-finite. (vi) Show that if $\mu$ is localizable, then $T$ is an isomorphism for the linear and order structures of $L^0(\mu)$ and $L^0(\tilde\mu)$. (Hint: 213Hb.)
(g) Let $(X,\Sigma,\mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$. (i) Show that $\mathcal L^0(\mu_Y)=\{f\restriction Y:f\in\mathcal L^0(\mu)\}$. (ii) Show that there is a canonical surjection $T:L^0(\mu)\to L^0(\mu_Y)$ defined by setting $T(f^\bullet)=(f\restriction Y)^\bullet$ for every $f\in\mathcal L^0(\mu)$, which is linear and multiplicative and preserves finite suprema and infima, so that (in particular) $T(|u|)=|Tu|$ for every $u\in L^0(\mu)$. (iii) Show that $T$ is injective iff $Y$ has full outer measure.
(h) Suppose, in 241Yg, that $Y\in\Sigma$. Explain how $L^0(\mu_Y)$ may be identified (as ordered linear space) with the subspace $\{u:u\times(\chi(X\setminus Y))^\bullet=0\}$ of $L^0(\mu)$.
(i) Let $(X,\Sigma,\mu)$ be a measure space, and $h:\mathbb R\to\mathbb R$ a non-decreasing function which is continuous on the left. Show that if $A\subseteq L^0=L^0(\mu)$ is a non-empty set with a supremum $v\in L^0$, then $\bar h(v)=\sup_{u\in A}\bar h(u)$, where $\bar h:L^0\to L^0$ is the function described in 241I.
241 Notes and comments As hinted in 241Ya and 241Yd, the elementary properties of the space $L^0$ which take up most of this section are strongly interdependent; it is not difficult to develop a theory of 'Riesz algebras' to incorporate the ideas of 241H into the rest. (Indeed, I sketch such a theory in §352 in the next volume.)
If we write $\mathcal L^0_{\text{strict}}$ for the space of measurable functions from $X$ to $\mathbb R$, then $\mathcal L^0_{\text{strict}}$ is also a Dedekind $\sigma$-complete Riesz space, and $L^0$ can be identified with the quotient $\mathcal L^0_{\text{strict}}/\mathbf N$, writing $\mathbf N$ for the set of functions in $\mathcal L^0_{\text{strict}}$ which are zero almost everywhere. (To do this properly, we need a theory of quotients of ordered linear spaces; see 241Yb-241Yc above.) Of course $\mathcal L^0$, as I define it, is not quite a linear space. I choose the slightly more awkward description of $L^0$ as a space of equivalence classes in $\mathcal L^0$ rather than in $\mathcal L^0_{\text{strict}}$ because it frequently happens in practice that a member of $L^0$ arises from a member of $\mathcal L^0$ which is either not defined at every point of the underlying space, or not quite measurable; and to adjust such a function so that it becomes a member of $\mathcal L^0_{\text{strict}}$, while trivial, is an arbitrary process which to my mind is liable to distort the true nature of such a construction. Of course the same argument could be used in favour of a slightly larger space, the space $\mathcal L^0_\infty$ of $\mu$-virtually measurable $[-\infty,\infty]$-valued functions defined and finite almost everywhere, relying on 135E rather than on 121E-121F. But I maintain that the operation of restricting a function in $\mathcal L^0_\infty$ to the set on which it is finite is not arbitrary, but canonical and entirely natural.
Reading the exposition above (or, for that matter, scanning the rest of this chapter) you are sure to notice a plethora of $^\bullet$s, adding a distinctive character to the pages which, I expect you will feel, is disagreeable to the eye and daunting, or at any rate wearisome, to the spirit. Many, perhaps most, authors prefer to simplify the typography by using the same symbol for a function in $\mathcal L^0$ or $\mathcal L^0_{\text{strict}}$ and for its equivalence class in $L^0$; and indeed it is common to use syntax which does not distinguish between them either, so that an object which has been defined as a member of $L^0$ will suddenly become a function with actual values at points of the underlying measure space. I prefer to maintain a rigid distinction; you must choose for yourself whether to follow me. Since I have chosen the more cumbersome form, I suppose the burden of proof is on me, to justify my decision. (i) Anyone would agree that there is at least a formal difference between a function and a set of functions. This by itself does not justify insisting on the difference in every sentence; mathematical exposition would be impossible if we always insisted on consistency in such questions as whether (for instance) the number 3 belonging to the set $\mathbb N$ of natural numbers is exactly the same object as the number 3 belonging to the set $\mathbb C$ of complex numbers, or the ordinal 3. But the difference between an object and a set to which it belongs is a sufficient difference in kind to make any confusion extremely dangerous, and while I agree that you can study this topic without using different symbols for $f$ and $f^\bullet$, I do not think you can ever safely escape a mental distinction for more than a few lines of argument. (ii) As a teacher, I have to say that quite a few students, encountering this material for the first time, are misled by any failure to make the distinction between $f$ and $f^\bullet$ into believing that no distinction need be made; and as a teacher I always insist on a student convincing me, by correctly writing out the more pedantic forms of the arguments for a few weeks, that he understands the manipulations necessary, before I allow him to go his own way. (iii) The reason why it is possible to evade the distinction in certain types of argument is just that the Dedekind $\sigma$-complete Riesz space $\mathcal L^0_{\text{strict}}$ parallels the Dedekind $\sigma$-complete Riesz space $L^0$ so closely that any proposition involving only countably many members of these spaces is likely to be valid in one if and only if it is valid in the other. In my view, the implications of this correspondence are at the very heart of measure theory. I prefer therefore to keep it constantly conspicuous, reminding myself through symbolism that every theorem has a Siamese twin, and rising to each challenge to express the twin theorem in an appropriate language. (iv) There are ways in which $\mathcal L^0_{\text{strict}}$ and $L^0$ are actually very different, and many interesting ideas can be expressed only in a language which keeps them clearly separated.
For more than half my life now I have felt that these points between them are sufficient reason for being consistent in maintaining the formal distinction between $f$ and $f^\bullet$. You may feel that in (iii) and (iv) of the last paragraph I am trying to have things both ways; I am arguing that both the similarities and the differences between $\mathcal L^0$ and $L^0$ support my case. Indeed that is exactly my position. If they were totally different, using the same language for both would not give rise to confusion; if they were essentially the same, it would not matter if we were sometimes unclear which we were talking about.
242 $L^1$
While the space $L^0$ treated in the previous section is of very great intrinsic interest, its chief use in the elementary theory is as a space in which some of the most important spaces of functional analysis are embedded. In the next few sections I introduce these one at a time.
The first is the space $L^1$ of equivalence classes of integrable functions. The importance of this space is not only that it offers a language in which to express those many theorems about integrable functions which do not depend on the differences between two functions which are equal almost everywhere. It can also appear as the natural space in which to seek solutions to a wide variety of integral equations, and as the completion of a space of continuous functions.
242A The space $L^1$ Let $(X,\Sigma,\mu)$ be any measure space.
(a) Let $\mathcal L^1=\mathcal L^1(\mu)$ be the set of real-valued functions, defined on subsets of $X$, which are integrable over $X$. Then $\mathcal L^1\subseteq\mathcal L^0=\mathcal L^0(\mu)$, as defined in §241, and, for $f\in\mathcal L^0$, we have $f\in\mathcal L^1$ iff there is a $g\in\mathcal L^1$ such that $|f|\le_{\text{a.e.}}g$; if $f\in\mathcal L^1$, $g\in\mathcal L^0$ and $f=_{\text{a.e.}}g$, then $g\in\mathcal L^1$. (See 122P-122R.)
(b) Let $L^1=L^1(\mu)\subseteq L^0=L^0(\mu)$ be the set of equivalence classes of members of $\mathcal L^1$. If $f$, $g\in\mathcal L^1$ and $f=_{\text{a.e.}}g$ then $\int f=\int g$ (122Rb). Accordingly we may define a functional $\int$ on $L^1$ by writing $\int f^\bullet=\int f$ for every $f\in\mathcal L^1$.
(c) It will be convenient to be able to write $\int_A u$ for $u\in L^1$, $A\subseteq X$; this may be defined by saying that $\int_A f^\bullet=\int_A f$ for every $f\in\mathcal L^1$, where the integral is defined in 214D. PPP I have only to check that if $f=_{\text{a.e.}}g$ then $\int_A f=\int_A g$; and this is because $f\restriction A=g\restriction A$ almost everywhere in $A$. QQQ
If $E\in\Sigma$ and $u\in L^1$ then $\int_E u=\int u\times(\chi E)^\bullet$; this is because $\int_E f=\int f\times\chi E$ for every integrable function $f$ (131Fa).
(d) If $u\in L^1$, there is a $\Sigma$-measurable, $\mu$-integrable function $f:X\to\mathbb R$ such that $f^\bullet=u$. PPP As noted in 241Bk, there is a measurable $f:X\to\mathbb R$ such that $f^\bullet=u$; but of course $f$ is integrable because it is equal almost everywhere to some integrable function. QQQ
242B Theorem Let $(X,\Sigma,\mu)$ be any measure space. Then $L^1(\mu)$ is a linear subspace of $L^0(\mu)$ and $\int:L^1\to\mathbb R$ is a linear functional.
proof If $u$, $v\in L^1=L^1(\mu)$ and $c\in\mathbb R$ let $f$, $g$ be integrable functions such that $u=f^\bullet$ and $v=g^\bullet$; then $f+g$ and $cf$ are integrable, so $u+v=(f+g)^\bullet$ and $cu=(cf)^\bullet$ belong to $L^1$. Also
$\int u+v=\int f+g=\int f+\int g=\int u+\int v$
and
$\int cu=\int cf=c\int f=c\int u$.
242C The order structure of $L^1$ Let $(X,\Sigma,\mu)$ be any measure space.
(a) $L^1=L^1(\mu)$ has an order structure derived from that of $L^0=L^0(\mu)$ (241E); that is, $f^\bullet\le g^\bullet$ iff $f\le g$ a.e. Being a linear subspace of $L^0$, $L^1$ must be a partially ordered linear space; the two conditions of 241Ec are obviously inherited by linear subspaces.
Note also that if $u$, $v\in L^1$ and $u\le v$ then $\int u\le\int v$, because if $f$, $g$ are integrable functions and $f\le_{\text{a.e.}}g$ then $\int f\le\int g$ (122Od).
(b) If $u\in L^0$, $v\in L^1$ and $|u|\le|v|$ then $u\in L^1$. PPP Let $f\in\mathcal L^0=\mathcal L^0(\mu)$, $g\in\mathcal L^1=\mathcal L^1(\mu)$ be such that $u=f^\bullet$ and $v=g^\bullet$; then $g$ is integrable and $|f|\le_{\text{a.e.}}|g|$, so $f$ is integrable and $u\in L^1$. QQQ
(c) In particular, $|u|\in L^1$ whenever $u\in L^1$, and
$|\int u|=\max(\int u,\int(-u))\le\int|u|$,
because $u$, $-u\le|u|$.
(d) Because $|u|\in L^1$ for every $u\in L^1$,
$u\vee v=\tfrac12(u+v+|u-v|)$, $u\wedge v=\tfrac12(u+v-|u-v|)$
belong to $L^1$ for all $u$, $v\in L^1$. But if $w\in L^1$ we surely have
$w\ge u$ & $w\ge v\implies w\ge u\vee v$,
$w\le u$ & $w\le v\implies w\le u\wedge v$
because these are true for all $w\in L^0$, so $u\vee v=\sup\{u,v\}$ and $u\wedge v=\inf\{u,v\}$ in $L^1$. Thus $L^1$ is, in itself, a Riesz space.
(e) Note that if $u\in L^1$, then $u\ge 0$ iff $\int_E u\ge 0$ for every $E\in\Sigma$; this is because if $f$ is an integrable function on $X$ and $\int_E f\ge 0$ for every $E\in\Sigma$, then $f\ge 0$ a.e. (131Fb). More generally, if $u$, $v\in L^1$ and $\int_E u\le\int_E v$ for every $E\in\Sigma$, then $u\le v$. It follows at once that if $u$, $v\in L^1$ and $\int_E u=\int_E v$ for every $E\in\Sigma$, then $u=v$ (cf. 131Fc).
(f) If $u\ge 0$ in $L^1$, there is a non-negative $f\in\mathcal L^1$ such that $f^\bullet=u$ (compare 241Eg).
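The criterion in 242Ce (an element of $L^1$ is non-negative iff all its integrals over measurable sets are non-negative) can be tested exhaustively on a finite space. The following is a sketch only, assuming a hypothetical three-point space with every subset measurable and one negligible point; the names `MU`, `f` and `g` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model (an assumption, not from the text):
# three points, every subset measurable, the last point negligible.
MU = [Fraction(1, 3), Fraction(2, 3), Fraction(0)]
POINTS = range(len(MU))

def integral_over(E, f):
    """Integral of f over the set E, a tuple of points."""
    return sum(MU[x] * f[x] for x in E)

def measurable_sets():
    """Enumerate all subsets of the space as tuples of points."""
    for size in range(len(MU) + 1):
        yield from combinations(POINTS, size)

def nonnegative_ae(f):
    """f >= 0 almost everywhere."""
    return sum(MU[x] for x in POINTS if f[x] < 0) == 0

def nonnegative_on_every_set(f):
    """The integral of f over E is >= 0 for every measurable E."""
    return all(integral_over(E, f) >= 0 for E in measurable_sets())

f = [1, 0, -5]   # negative only at the negligible point
g = [1, -1, 2]   # negative on a set of positive measure

for candidate in (f, g):
    print(nonnegative_ae(candidate), nonnegative_on_every_set(candidate))
```

For each function the two tests agree, which is what 242Ce asserts about the class it represents.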
242D The norm of $L^1$ Let $(X,\Sigma,\mu)$ be any measure space.
(a) For $f\in\mathcal L^1=\mathcal L^1(\mu)$ I write $\|f\|_1=\int|f|\in[0,\infty[$. For $u\in L^1=L^1(\mu)$ set $\|u\|_1=\int|u|$, so that $\|f^\bullet\|_1=\|f\|_1$ for every $f\in\mathcal L^1$. Then $\|\ \|_1$ is a norm on $L^1$. PPP (i) If $u$, $v\in L^1$ then $|u+v|\le|u|+|v|$, by 241Ee, so
$\|u+v\|_1=\int|u+v|\le\int|u|+|v|=\int|u|+\int|v|=\|u\|_1+\|v\|_1$.
(ii) If $u\in L^1$ and $c\in\mathbb R$ then
$\|cu\|_1=\int|cu|=\int|c||u|=|c|\int|u|=|c|\|u\|_1$.
(iii) If $u\in L^1$ and $\|u\|_1=0$, express $u$ as $f^\bullet$, where $f\in\mathcal L^1$; then $\int|f|=\int|u|=0$. Because $|f|$ is non-negative, it must be zero almost everywhere (122Rc), so $f=0$ a.e. and $u=0$ in $L^1$. QQQ
(b) Thus $L^1$, with $\|\ \|_1$, is a normed space and $\int:L^1\to\mathbb R$ is a linear operator; observe that $\|\int\|\le 1$, because
$|\int u|\le\int|u|=\|u\|_1$
for every $u\in L^1$.
(c) If $u$, $v\in L^1$ and $|u|\le|v|$, then
$\|u\|_1=\int|u|\le\int|v|=\|v\|_1$.
In particular, $\|u\|_1=\||u|\|_1$ for every $u\in L^1$.
(d) Note the following property of the normed Riesz space $L^1$: if $u$, $v\in L^1$ and $u$, $v\ge 0$, then
$\|u+v\|_1=\int u+v=\int u+\int v=\|u\|_1+\|v\|_1$.
(e) The set $(L^1)^+=\{u:u\ge 0\}$ is closed in $L^1$. PPP If $v\in L^1$, $u\in(L^1)^+$ then $\|u-v\|_1\ge\|v\wedge 0\|_1$; this is because if $f$, $g\in\mathcal L^1$ and $f\ge 0$ a.e., $|f(x)-g(x)|\ge|\min(g(x),0)|$ whenever $f(x)$ and $g(x)$ are both defined and $f(x)\ge 0$, which is almost everywhere, so
$\|u-v\|_1=\int|f-g|\ge\int|g\wedge 0|=\|v\wedge 0\|_1$.
Now this means that if $v\in L^1$ and $v\not\ge 0$, the ball $\{w:\|w-v\|_1<\delta\}$ does not meet $(L^1)^+$, where $\delta=\|v\wedge 0\|_1>0$ because $v\wedge 0\ne 0$. Thus $L^1\setminus(L^1)^+$ is open and $(L^1)^+$ is closed. QQQ
242E For the next result we need a variant of B.Levi's theorem.
Lemma Let $(X,\Sigma,\mu)$ be a measure space and $\langle f_n\rangle_{n\in\mathbb N}$ a sequence of $\mu$-integrable real-valued functions such that $\sum_{n=0}^\infty\int|f_n|<\infty$. Then $f=\sum_{n=0}^\infty f_n$ is integrable and
$\int f=\sum_{n=0}^\infty\int f_n$, $\int|f|\le\sum_{n=0}^\infty\int|f_n|$.
proof (a) Suppose first that every $f_n$ is non-negative. Set $g_n=\sum_{k=0}^n f_k$ for each $n$; then $\langle g_n\rangle_{n\in\mathbb N}$ is increasing a.e. and
$\lim_{n\to\infty}\int g_n=\sum_{k=0}^\infty\int f_k$
is finite, so by B.Levi's theorem (123A) $f=\lim_{n\to\infty}g_n$ is integrable and
$\int f=\lim_{n\to\infty}\int g_n=\sum_{k=0}^\infty\int f_k$.
In this case, of course,
$\int|f|=\int f=\sum_{n=0}^\infty\int f_n=\sum_{n=0}^\infty\int|f_n|$.
(b) For the general case, set $f_n^+=\tfrac12(|f_n|+f_n)$, $f_n^-=\tfrac12(|f_n|-f_n)$, as in 241Ef; then $f_n^+$ and $f_n^-$ are non-negative integrable functions, and
$\sum_{n=0}^\infty\int f_n^++\sum_{n=0}^\infty\int f_n^-=\sum_{n=0}^\infty\int|f_n|<\infty$.
So $h_1=\sum_{n=0}^\infty f_n^+$ and $h_2=\sum_{n=0}^\infty f_n^-$ are both integrable. Now $f=_{\text{a.e.}}h_1-h_2$, so
$\int f=\int h_1-\int h_2=\sum_{n=0}^\infty\int f_n^+-\sum_{n=0}^\infty\int f_n^-=\sum_{n=0}^\infty\int f_n$.
Finally
$\int|f|\le\int|h_1|+\int|h_2|=\sum_{n=0}^\infty\int f_n^++\sum_{n=0}^\infty\int f_n^-=\sum_{n=0}^\infty\int|f_n|$.
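242F Theorem For any measure space $(X,\Sigma,\mu)$, $L^1(\mu)$ is complete under its norm $\|\ \|_1$.
proof Let $\langle u_n\rangle_{n\in\mathbb N}$ be a sequence in $L^1$ such that $\|u_{n+1}-u_n\|_1\le 4^{-n}$ for every $n\in\mathbb N$. Choose integrable functions $f_n$ such that $f_0^\bullet=u_0$, $f_{n+1}^\bullet=u_{n+1}-u_n$ for each $n\in\mathbb N$. Then
$\sum_{n=0}^\infty\int|f_n|=\|u_0\|_1+\sum_{n=0}^\infty\|u_{n+1}-u_n\|_1<\infty$.
So $f=\sum_{n=0}^\infty f_n$ is integrable, by 242E, and $u=f^\bullet\in L^1$. Set $g_n=\sum_{j=0}^n f_j$ for each $n$; then $g_n^\bullet=u_n$, so, because $\int|f_j|=\|u_j-u_{j-1}\|_1\le 4^{-j+1}$ for $j\ge 1$,
$\|u-u_n\|_1=\int|f-g_n|\le\int\sum_{j=n+1}^\infty|f_j|\le\sum_{j=n+1}^\infty 4^{-j+1}=4^{-n+1}/3$
for each $n$. Thus $u=\lim_{n\to\infty}u_n$ in $L^1$. As $\langle u_n\rangle_{n\in\mathbb N}$ is arbitrary, $L^1$ is complete (2A4E).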
242G Definition It will be convenient, for later reference, to introduce the following phrase. A Banach lattice is a Riesz space $U$ together with a norm $\|\ \|$ on $U$ such that (i) $\|u\|\le\|v\|$ whenever $u$, $v\in U$ and $|u|\le|v|$, writing $|u|$ for $u\vee(-u)$, as in 241Ee; (ii) $U$ is complete under $\|\ \|$. Thus 242Dc and 242F amount to saying that the normed Riesz space $(L^1,\|\ \|_1)$ is a Banach lattice.
242H $L^1$ as a Riesz space We can discuss the ordered linear space $L^1$ in the language already used in 241E-241G for $L^0$.
Theorem Let $(X,\Sigma,\mu)$ be any measure space. Then $L^1=L^1(\mu)$ is Dedekind complete.
proof (a) Let $A\subseteq L^1$ be any non-empty set which is bounded above in $L^1$. Set
$A'=\{u_0\vee\dots\vee u_n:u_0,\dots,u_n\in A\}$.
Then $A\subseteq A'$, $A'$ has the same upper bounds as $A$ and $u\vee v\in A'$ for all $u$, $v\in A'$. Taking $w_0$ to be any upper bound of $A$ and $A'$, we have $\int u\le\int w_0$ for every $u\in A'$, so $\gamma=\sup_{u\in A'}\int u$ is defined in $\mathbb R$. For each $n\in\mathbb N$, choose $u_n\in A'$ such that $\int u_n\ge\gamma-2^{-n}$. Because $L^0=L^0(\mu)$ is Dedekind $\sigma$-complete (241Ga), $u^*=\sup_{n\in\mathbb N}u_n$ is defined in $L^0$, and $u_0\le u^*\le w_0$ in $L^0$. Consequently
$0\le u^*-u_0\le w_0-u_0$
in $L^0$. But $w_0-u_0\in L^1$, so $u^*-u_0\in L^1$ (242Cb) and $u^*\in L^1$.
(b) The point is that $u^*$ is an upper bound for $A$. PPP If $u\in A$, then $u\vee u_n\in A'$ for every $n$, so
$\|u-u\wedge u^*\|_1=\int u-u\wedge u^*\le\int u-u\wedge u_n$
(because $u\wedge u_n\le u_n\le u^*$, so $u\wedge u_n\le u\wedge u^*$)
$=\int u\vee u_n-u_n$
(because $u\vee u_n+u\wedge u_n=u+u_n$; see the formulae in 242Cd)
$=\int u\vee u_n-\int u_n\le\gamma-(\gamma-2^{-n})=2^{-n}$
for every $n$; so $\|u-u\wedge u^*\|_1=0$. But this means that $u=u\wedge u^*$, that is, that $u\le u^*$. As $u$ is arbitrary, $u^*$ is an upper bound for $A$. QQQ
(c) On the other hand, any upper bound for $A$ is surely an upper bound for $\{u_n:n\in\mathbb N\}$, so is greater than or equal to $u^*$. Thus $u^*=\sup A$ in $L^1$. As $A$ is arbitrary, $L^1$ is Dedekind complete.
Remark Note that the order-completeness of $L^1$, unlike that of $L^0$, does not depend on any particular property of the measure space $(X,\Sigma,\mu)$.
242I The Radon-Nikodým theorem I think it is worth re-writing the Radon-Nikodým theorem (232E) in the language of this chapter.
Theorem Let $(X,\Sigma,\mu)$ be a measure space. Then there is a canonical bijection between $L^1=L^1(\mu)$ and the set of truly continuous additive functionals $\nu:\Sigma\to\mathbb R$, given by the formula
$\nu F=\int_F u$ for $F\in\Sigma$, $u\in L^1$.
Remark Recall that if $\mu$ is $\sigma$-finite, then the truly continuous additive functionals are just the absolutely continuous countably additive functionals; and that if $\mu$ is totally finite, then all absolutely continuous (finitely) additive functionals are truly continuous (232Bd).
proof For $u\in L^1$, $F\in\Sigma$ set $\nu_uF=\int_F u$. If $u\in L^1$, there is an integrable function $f$ such that $f^\bullet=u$, in which case
$F\mapsto\nu_uF=\int_F f:\Sigma\to\mathbb R$
is additive and truly continuous, by 232D. If $\nu:\Sigma\to\mathbb R$ is additive and truly continuous, then by 232E there is an integrable function $f$ such that $\nu F=\int_F f$ for every $F\in\Sigma$; setting $u=f^\bullet$ in $L^1$, $\nu=\nu_u$. Finally, if $u$, $v$ are distinct members of $L^1$, there is an $F\in\Sigma$ such that $\int_F u\ne\int_F v$ (242Ce), so that $\nu_u\ne\nu_v$; thus $u\mapsto\nu_u$ is injective as well as surjective.
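On a finite space the correspondence of 242I can be written down explicitly: an additive functional vanishing on negligible sets is determined by its values on points, and a density is obtained by dividing by the point masses. The sketch below is an illustration under assumptions, not the text's construction: the space, the masses `MU` and the point values `NU_POINT` are hypothetical, and in this finite setting "truly continuous" is taken to reduce to vanishing on negligible sets.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model (an assumption, not from the text).
MU = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
# Values nu({x}) of an additive functional; it vanishes where MU does.
NU_POINT = [Fraction(3), Fraction(-1), Fraction(0)]
POINTS = range(len(MU))

def nu(F):
    """The additive functional, evaluated on a set F of points."""
    return sum(NU_POINT[x] for x in F)

def density(fill=Fraction(0)):
    """A representative of the density: nu({x}) / mu({x}) off the
    negligible points, and an arbitrary value `fill` on them."""
    return [NU_POINT[x] / MU[x] if MU[x] > 0 else fill for x in POINTS]

def integral_over(F, f):
    return sum(MU[x] * f[x] for x in F)

def represents(f):
    """Check that nu F equals the integral of f over F for every set F."""
    return all(integral_over(F, f) == nu(F)
               for size in range(len(MU) + 1)
               for F in combinations(POINTS, size))

u = density()
u_alt = density(fill=Fraction(42))   # another representative of the same class

print(u, represents(u), represents(u_alt))
```

Both representatives give the same functional, reflecting the fact that the bijection is with equivalence classes and not with individual functions.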
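242J Conditional expectations revisited We now have the machinery necessary for a new interpretation of some of the ideas of §233.
(a) Let $(X,\Sigma,\mu)$ be a measure space, and $\mathrm T$ a $\sigma$-subalgebra of $\Sigma$, as in 233A. Then $(X,\mathrm T,\mu\restriction\mathrm T)$ is a measure space, and $\mathcal L^0(\mu\restriction\mathrm T)\subseteq\mathcal L^0(\mu)$; moreover, if $f$, $g\in\mathcal L^0(\mu\restriction\mathrm T)$, then $f=g$ $(\mu\restriction\mathrm T)$-a.e. iff $f=g$ $\mu$-a.e. PPP There are $(\mu\restriction\mathrm T)$-conegligible sets $F$, $G\in\mathrm T$ such that $f\restriction F$ and $g\restriction G$ are $\mathrm T$-measurable; set
$E=\{x:x\in F\cap G,\ f(x)\ne g(x)\}\in\mathrm T$;
then
$f=g$ $(\mu\restriction\mathrm T)$-a.e. $\iff(\mu\restriction\mathrm T)(E)=0\iff\mu E=0\iff f=g$ $\mu$-a.e. QQQ
Accordingly we have a canonical map $S:L^0(\mu\restriction\mathrm T)\to L^0(\mu)$ defined by saying that if $u\in L^0(\mu\restriction\mathrm T)$ is the equivalence class of $f\in\mathcal L^0(\mu\restriction\mathrm T)$, then $Su$ is the equivalence class of $f$ in $L^0(\mu)$. It is easy to check, working through the operations described in 241D, 241E and 241H, that $S$ is linear, injective and order-preserving, and that $|Su|=S|u|$, $S(u\vee v)=Su\vee Sv$ and $S(u\times v)=Su\times Sv$ for $u$, $v\in L^0(\mu\restriction\mathrm T)$.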
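(b) Next, if $f\in\mathcal L^1(\mu\restriction\mathrm T)$, then $f\in\mathcal L^1(\mu)$ and $\int f\,d\mu=\int f\,d(\mu\restriction\mathrm T)$ (233B); so $Su\in L^1(\mu)$ and $\|Su\|_1=\|u\|_1$ for every $u\in L^1(\mu\restriction\mathrm T)$.
Observe also that every member of $L^1(\mu)\cap S[L^0(\mu\restriction\mathrm T)]$ is actually in $S[L^1(\mu\restriction\mathrm T)]$. PPP Take $u\in L^1(\mu)\cap S[L^0(\mu\restriction\mathrm T)]$. Then $u$ is expressible both as $f^\bullet$ where $f\in\mathcal L^1(\mu)$, and as $g^\bullet$ where $g\in\mathcal L^0(\mu\restriction\mathrm T)$. So $g=_{\text{a.e.}}f$, and $g$ is $\mu$-integrable, therefore $(\mu\restriction\mathrm T)$-integrable (233B again). QQQ
This means that $S:L^1(\mu\restriction\mathrm T)\to L^1(\mu)\cap S[L^0(\mu\restriction\mathrm T)]$ is a bijection.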
(c) Now suppose that $\mu X=1$, so that $(X,\Sigma,\mu)$ is a probability space. Recall that $g$ is a conditional expectation of $f$ on $\mathrm T$ if $g$ is $\mu\restriction\mathrm T$-integrable, $f$ is $\mu$-integrable and $\int_F g=\int_F f$ for every $F\in\mathrm T$; and that every $\mu$-integrable function has such a conditional expectation (233D). If $g$ is a conditional expectation of $f$ and $f_1=f$ $\mu$-a.e. then $g$ is a conditional expectation of $f_1$, because $\int_F f_1=\int_F f$ for every $F$; and I have already remarked in 233Dc that if $g$, $g_1$ are conditional expectations of $f$ on $\mathrm T$ then $g=g_1$ $\mu\restriction\mathrm T$-a.e.
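(d) This means that we have an operator $P:L^1(\mu)\to L^1(\mu\restriction\mathrm T)$ defined by saying that $P(f^\bullet)=g^\bullet$ whenever $g\in\mathcal L^1(\mu\restriction\mathrm T)$ is a conditional expectation of $f\in\mathcal L^1(\mu)$ on $\mathrm T$; that is, that $\int_F Pu=\int_F u$ whenever $u\in L^1(\mu)$ and $F\in\mathrm T$. If we identify $L^1(\mu)$, $L^1(\mu\restriction\mathrm T)$ with the sets of absolutely continuous additive functionals defined on $\Sigma$ and $\mathrm T$, as in 242I, then $P$ corresponds to the operation $\nu\mapsto\nu\restriction\mathrm T$.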
(e) Because $Pu$ is uniquely defined in $L^1(\mu\restriction\mathrm T)$ by the requirement $\int_F Pu=\int_F u$ for every $F\in\mathrm T$ (242Ce), we see that $P$ must be linear. PPP If $u$, $v\in L^1(\mu)$ and $c\in\mathbb R$, then
$\int_F Pu+Pv=\int_F Pu+\int_F Pv=\int_F u+\int_F v=\int_F u+v=\int_F P(u+v)$,
$\int_F P(cu)=\int_F cu=c\int_F u=c\int_F Pu=\int_F cPu$
for every $F\in\mathrm T$. QQQ Also, if $u\ge 0$, then $\int_F Pu=\int_F u\ge 0$ for every $F\in\mathrm T$, so $Pu\ge 0$ (242Ce again).
It follows at once that $P$ is order-preserving, that is, that $Pu\le Pv$ whenever $u\le v$. Consequently
$|Pu|=Pu\vee(-Pu)=Pu\vee P(-u)\le P|u|$
for every $u\in L^1(\mu)$, because $u\le|u|$ and $-u\le|u|$. Finally, $P$ is a bounded linear operator, with norm 1. PPP The last formula tells us that
$\|Pu\|_1\le\|P|u|\|_1=\int P|u|=\int|u|=\|u\|_1$
for every $u\in L^1(\mu)$, so $\|P\|\le 1$. On the other hand, $P(\chi X^\bullet)=\chi X^\bullet\ne 0$, so $\|P\|=1$. QQQ
(f) We may legitimately regard $Pu\in L^1(\mu\restriction\mathrm T)$ as 'the' conditional expectation of $u\in L^1(\mu)$ on $\mathrm T$; $P$ is the conditional expectation operator.
(g) If $u\in L^1(\mu\restriction\mathrm T)$, then we have a corresponding $Su\in L^1(\mu)$, as in (b); now $PSu=u$. PPP $\int_F PSu=\int_F Su=\int_F u$ for every $F\in\mathrm T$. QQQ Consequently $SPSP=SP:L^1(\mu)\to L^1(\mu)$.
(h) The distinction drawn above between $u=f^\bullet\in L^0(\mu\restriction\mathrm T)$ and $Su=f^\bullet\in L^0(\mu)$ is of course pedantic. I believe it is necessary to be aware of such distinctions, even though for nearly all purposes it is safe as well as convenient to regard $L^0(\mu\restriction\mathrm T)$ as actually a subset of $L^0(\mu)$. If we do so, then (b) tells us that we can identify $L^1(\mu\restriction\mathrm T)$ with $L^1(\mu)\cap L^0(\mu\restriction\mathrm T)$, while (g) becomes $P^2=P$.
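When the subalgebra is generated by a finite partition into sets of positive measure, the conditional expectation operator of 242J is simply averaging over each block, and the properties listed above can be checked directly. The sketch below rests on that assumption; the six-point probability space, the masses `MU` and the partition `BLOCKS` are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite probability space (an assumption, not from the text).
MU = [Fraction(1, 12), Fraction(1, 4), Fraction(1, 6),
      Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
# The subalgebra is assumed to be generated by this partition.
BLOCKS = [[0, 1], [2, 3, 4], [5]]

def integral_over(F, f):
    return sum(MU[x] * f[x] for x in F)

def norm1(f):
    """The L^1 norm: the integral of |f|."""
    return sum(m * abs(a) for m, a in zip(MU, f))

def cond_exp(f):
    """Replace f on each block by its mu-average over that block."""
    out = [None] * len(MU)
    for block in BLOCKS:
        average = integral_over(block, f) / sum(MU[x] for x in block)
        for x in block:
            out[x] = average
    return out

f = [Fraction(v) for v in (3, -1, 2, 0, -4, 5)]
g = [Fraction(v) for v in (1, 1, -2, 6, 0, 2)]
Pf = cond_exp(f)

print(Pf)
print(all(integral_over(B, Pf) == integral_over(B, f) for B in BLOCKS))
print(cond_exp(Pf) == Pf)             # P is idempotent
print(norm1(Pf) <= norm1(f))          # P does not increase the norm
```

Checking the defining identity on the blocks suffices here, because every set in the subalgebra is a union of blocks and the integral is additive.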
242K The language just introduced allows the following re-formulations of 233J-233K.
Theorem Let $(X,\Sigma,\mu)$ be a probability space and $\mathrm T$ a $\sigma$-subalgebra of $\Sigma$. Let $\phi:\mathbb R\to\mathbb R$ be a convex function and $\bar\phi:L^0(\mu)\to L^0(\mu)$ the corresponding operator defined by setting $\bar\phi(f^\bullet)=(\phi f)^\bullet$ (241I). If $P:L^1(\mu)\to L^1(\mu\restriction\mathrm T)$ is the conditional expectation operator, then $\bar\phi(Pu)\le P(\bar\phi u)$ whenever $u\in L^1(\mu)$ is such that $\bar\phi(u)\in L^1(\mu)$.
proof This is just a restatement of 233J.
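242L Proposition Let $(X,\Sigma,\mu)$ be a probability space, and $\mathrm T$ a $\sigma$-subalgebra of $\Sigma$. Let $P:L^1(\mu)\to L^1(\mu\restriction\mathrm T)$ be the corresponding conditional expectation operator. If $u\in L^1=L^1(\mu)$ and $v\in L^0(\mu\restriction\mathrm T)$, then $u\times v\in L^1$ iff $P|u|\times v\in L^1$, and in this case $P(u\times v)=Pu\times v$; in particular, $\int u\times v=\int Pu\times v$.
proof (I am here using the identification of $L^0(\mu\restriction\mathrm T)$ as a subspace of $L^0(\mu)$, as suggested in 242Jh.) Express $u$ as $f^\bullet$ and $v$ as $h^\bullet$, where $f\in\mathcal L^1=\mathcal L^1(\mu)$ and $h\in\mathcal L^0(\mu\restriction\mathrm T)$. Let $g$, $g_0\in\mathcal L^1(\mu\restriction\mathrm T)$ be conditional expectations of $f$, $|f|$ respectively, so that $Pu=g^\bullet$ and $P|u|=g_0^\bullet$. Then, using 233K,
$u\times v\in L^1\iff f\times h\in\mathcal L^1\iff g_0\times h\in\mathcal L^1\iff P|u|\times v\in L^1$,
and in this case $g\times h$ is a conditional expectation of $f\times h$, that is, $Pu\times v=P(u\times v)$.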
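Both 242K and 242L can be seen at work in a small example. The sketch below is an illustration under assumptions: a hypothetical four-point probability space with equal masses, a subalgebra generated by the partition `BLOCKS`, the convex function taken to be squaring, and a function `v` chosen constant on each block so that it is measurable for the subalgebra.

```python
from fractions import Fraction

# Hypothetical finite probability space (an assumption, not from the text).
MU = [Fraction(1, 4)] * 4
BLOCKS = [[0, 1], [2, 3]]   # partition generating the subalgebra

def integral(f):
    return sum(m * a for m, a in zip(MU, f))

def cond_exp(f):
    """Average f over each block of the partition."""
    out = [None] * len(MU)
    for block in BLOCKS:
        mass = sum(MU[x] for x in block)
        average = sum(MU[x] * f[x] for x in block) / mass
        for x in block:
            out[x] = average
    return out

def times(f, g):
    return [a * b for a, b in zip(f, g)]

def square(t):
    return t * t             # a convex function

f = [Fraction(n) for n in (1, 3, -2, 4)]
v = [Fraction(n) for n in (2, 2, -1, -1)]   # constant on each block

jensen_left = [square(a) for a in cond_exp(f)]     # phi applied to Pf
jensen_right = cond_exp([square(a) for a in f])    # P applied to phi(f)

print(all(a <= b for a, b in zip(jensen_left, jensen_right)))
print(cond_exp(times(f, v)) == times(cond_exp(f), v))
print(integral(times(f, v)) == integral(times(cond_exp(f), v)))
```

The first line is the inequality of 242K for this choice of convex function; the other two are the identities of 242L.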
242M $L^1$ as a completion I mentioned in the introduction to this section that $L^1$ appears in functional analysis as a completion of some important spaces; put another way, some dense subspaces of $L^1$ are significant. The first is elementary.
Proposition Let $(X, \Sigma, \mu)$ be any measure space, and write S for the space of $\mu$-simple functions on X. Then
(a) whenever f is a $\mu$-integrable real-valued function and $\epsilon > 0$, there is an $h \in S$ such that $\int |f - h| \le \epsilon$;
(b) $\mathbf{S} = \{f^{\bullet} : f \in S\}$ is a dense linear subspace of $L^1 = L^1(\mu)$.
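proof (a)(i) If f is non-negative, then there is a simple function h such that $h \le_{\text{a.e.}} f$ and $\int h \ge \int f - \frac{1}{2}\epsilon$ (122K), in which case
$$\int |f - h| = \int f - h = \int f - \int h \le \tfrac{1}{2}\epsilon.$$
(ii) In the general case, f is expressible as a difference $f_1 - f_2$ of non-negative integrable functions. Now there are $h_1, h_2 \in S$ such that $\int |f_j - h_j| \le \frac{1}{2}\epsilon$ for both j and, setting $h = h_1 - h_2 \in S$,
$$\int |f - h| \le \int |f_1 - h_1| + \int |f_2 - h_2| \le \epsilon.$$
(b) Because S is a linear subspace of $\mathbb{R}^X$ included in $\mathcal{L}^1 = \mathcal{L}^1(\mu)$, $\mathbf{S}$ is a linear subspace of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^{\bullet} = u$ and an $h \in S$ such that $\int |f - h| \le \epsilon$; now $v = h^{\bullet} \in \mathbf{S}$ and
$$\|u - v\|_1 = \int |f - h| \le \epsilon.$$
As u and $\epsilon$ are arbitrary, $\mathbf{S}$ is dense in $L^1$.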
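A concrete instance of part (a) may help. The sketch below assumes counting measure on $\mathbb{N}$ and the integrable function $f(n) = 2^{-n}$ (both choices are mine, not the text's). Under counting measure the simple functions are those with finite support, so truncating $f$ to $\{0, \ldots, N\}$ gives a simple $h_N$, and the error $\int |f - h_N|$ is the tail of the series.

```python
from fractions import Fraction

def f(n):
    """Assumed example: f(n) = 2^(-n) on the natural numbers."""
    return Fraction(1, 2 ** n)

TOTAL = Fraction(2)  # sum of the geometric series, i.e. the integral of f

def truncation_error(N):
    """Integral of |f - h_N|, where h_N agrees with f on {0..N} and is 0 elsewhere."""
    return TOTAL - sum(f(n) for n in range(N + 1))

def first_good_N(eps):
    """Smallest N for which the truncation h_N is within eps of f in L^1."""
    N = 0
    while truncation_error(N) > eps:
        N += 1
    return N

print(truncation_error(5))
print(first_good_N(Fraction(1, 100)))
```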
242N As always, Lebesgue measure on $\mathbb{R}^r$ and its subsets is by far the most important example; and in this case we have further classes of dense subspace of $L^1$. If you have reached this point without yet troubling to master multi-dimensional Lebesgue measure, just take $r = 1$. If you feel uncomfortable with general subspace measures, take X to be $\mathbb{R}^r$ or $[0, 1] \subseteq \mathbb{R}$ or some other particular subset which you find interesting. The following term will be useful.
Definition If f is a real- or complex-valued function defined on a subset of $\mathbb{R}^r$, say that the support of f is $\overline{\{x : x \in \operatorname{dom} f,\ f(x) \ne 0\}}$.
242O Theorem Let X be any subset of $\mathbb{R}^r$, where $r \ge 1$, and let $\mu$ be Lebesgue measure on X, that is, the subspace measure on X induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the space of bounded continuous functions $f : \mathbb{R}^r \to \mathbb{R}$ which have bounded support, and $S_0$ for the space of linear combinations of functions of the form $\chi I$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Then
(a) whenever $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are $g \in C_k$, $h \in S_0$ such that $\int_X |f - g| \le \epsilon$ and $\int_X |f - h| \le \epsilon$;
(b) $\{(g{\upharpoonright}X)^{\bullet} : g \in C_k\}$ and $\{(h{\upharpoonright}X)^{\bullet} : h \in S_0\}$ are dense linear subspaces of $L^1 = L^1(\mu)$.
Remark Of course there is a redundant 'bounded' in the description of $C_k$; see 242Xh.
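proof (a) I argue in turn that the result is valid for each of an increasing number of members f of $\mathcal{L}^1 = \mathcal{L}^1(\mu)$. Write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, so that $\mu$ is the subspace measure $(\mu_r)_X$.
(i) Suppose first that $f = \chi I{\upharpoonright}X$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Of course $\chi I$ is already in $S_0$, so I have only to show that it is approximated by members of $C_k$. If $I = \emptyset$ the result is trivial; we can take $g = 0$. Otherwise, express I as $[a - b, a + b[$ where $a = (\alpha_1, \ldots, \alpha_r)$, $b = (\beta_1, \ldots, \beta_r)$ and $\beta_j > 0$ for each j. Let $\delta > 0$ be such that
$$2^r \prod_{j=1}^r (\beta_j + \delta) \le \epsilon + 2^r \prod_{j=1}^r \beta_j.$$
For $\xi \in \mathbb{R}$ set
$$g_j(\xi) = 1 \text{ if } |\xi - \alpha_j| \le \beta_j,$$
$$\phantom{g_j(\xi)} = (\beta_j + \delta - |\xi - \alpha_j|)/\delta \text{ if } \beta_j \le |\xi - \alpha_j| \le \beta_j + \delta,$$
$$\phantom{g_j(\xi)} = 0 \text{ if } |\xi - \alpha_j| \ge \beta_j + \delta.$$
[Figure: the function $g_j$.]
For $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$ set
$$g(x) = \prod_{j=1}^r g_j(\xi_j).$$
Then $g \in C_k$ and $\chi I \le g \le \chi J$, where $J = [a - b - \delta\mathbf{1}, a + b + \delta\mathbf{1}]$ (writing $\mathbf{1} = (1, \ldots, 1)$), so that (by the choice of $\delta$) $\mu_r J \le \mu_r I + \epsilon$, and
$$\int_X |g - f| \le \int (\chi(J \cap X) - \chi(I \cap X))\,d\mu = \mu((J \setminus I) \cap X) \le \mu_r(J \setminus I) = \mu_r J - \mu_r I \le \epsilon,$$
as required.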
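In one dimension the construction in (i) is a trapezoid over the interval, and the estimate can be checked exactly. The sketch below takes $r = 1$ and $X = \mathbb{R}$; the particular values of `alpha`, `beta` and `eps` are arbitrary assumptions. With $r = 1$ the condition on $\delta$ reads $2(\beta + \delta) \le \epsilon + 2\beta$, so $\delta = \epsilon/2$ is admissible.

```python
from fractions import Fraction as Fr

def make_g(alpha, beta, delta):
    """The trapezoid g_j of the proof: 1 on [alpha-beta, alpha+beta], linear ramps of width delta."""
    def g(xi):
        d = abs(xi - alpha)
        if d <= beta:
            return Fr(1)
        if d >= beta + delta:
            return Fr(0)
        return (beta + delta - d) / delta
    return g

def integral_piecewise_linear(g, knots):
    """Exact integral of a function that is linear between consecutive knots."""
    return sum((g(p) + g(q)) * (q - p) / 2 for p, q in zip(knots, knots[1:]))

alpha, beta, eps = Fr(1), Fr(2), Fr(1, 10)   # illustrative values only
delta = eps / 2                              # satisfies 2*(beta+delta) <= eps + 2*beta
g = make_g(alpha, beta, delta)

def chi_I(xi):  # I = [alpha-beta, alpha+beta[
    return 1 if alpha - beta <= xi < alpha + beta else 0

def chi_J(xi):  # J = [alpha-beta-delta, alpha+beta+delta]
    return 1 if alpha - beta - delta <= xi <= alpha + beta + delta else 0

knots = [alpha - beta - delta, alpha - beta, alpha + beta, alpha + beta + delta]
# Since chi_I <= g, the L^1 distance is (integral of g) - (length of I).
error = integral_piecewise_linear(g, knots) - 2 * beta

grid = [Fr(k, 40) for k in range(-80, 200)]
sandwich = all(chi_I(x) <= g(x) <= chi_J(x) for x in grid)

print(error, error <= eps, sandwich)
```

The two ramps each contribute a triangle of area $\delta/2$, so the error is exactly $\delta$, comfortably below the bound $\mu_r J - \mu_r I = 2\delta$ used in the proof.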
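(ii) Now suppose that $f = \chi(X \cap E)$ where $E \subseteq \mathbb{R}^r$ is a set of finite measure. Then there is a disjoint family $I_0, \ldots, I_n$ of half-open intervals such that $\mu_r(E \triangle \bigcup_{j \le n} I_j) \le \frac{1}{2}\epsilon$. PPP There is an open set $G \supseteq E$ such that $\mu_r(G \setminus E) \le \frac{1}{4}\epsilon$ (134Fa). For each $m \in \mathbb{N}$, let $\mathcal{J}_m$ be the family of half-open intervals in $\mathbb{R}^r$ of the form $[a, b[$ where $a = (2^{-m}k_1, \ldots, 2^{-m}k_r)$, $k_1, \ldots, k_r$ being integers, and $b = a + 2^{-m}\mathbf{1}$; then $\mathcal{J}_m$ is a disjoint family. Set $H_m = \bigcup\{I : I \in \mathcal{J}_m,\ I \subseteq G\}$; then $\langle H_m \rangle_{m \in \mathbb{N}}$ is a non-decreasing family with union G, so that there is an m such that $\mu_r(G \setminus H_m) \le \frac{1}{4}\epsilon$ and $\mu_r(E \triangle H_m) \le \frac{1}{2}\epsilon$. But now $H_m$ is expressible as a disjoint union $\bigcup_{j \le n} I_j$ where $I_0, \ldots, I_n$ enumerate the members of $\mathcal{J}_m$ included in $H_m$. (The last sentence derails if $H_m$ is empty. But if $H_m = \emptyset$ then we can take $n = 0$, $I_0 = \emptyset$.) QQQ
Accordingly $h = \sum_{j=0}^n \chi I_j \in S_0$ and
$$\int_X |f - h| = \mu(X \cap (E \triangle \bigcup_{j \le n} I_j)) \le \tfrac{1}{2}\epsilon.$$
As for $C_k$, (i) tells us that there is for each $j \le n$ a $g_j \in C_k$ such that $\int_X |g_j - \chi I_j| \le \epsilon/2(n+1)$, so that $g = \sum_{j=0}^n g_j \in C_k$ and
$$\int_X |f - g| \le \int_X |f - h| + \int_X |h - g| \le \frac{\epsilon}{2} + \sum_{j=0}^n \int_X |g_j - \chi I_j| \le \epsilon.$$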
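(iii) If f is a simple function, express f as $\sum_{k=0}^n a_k \chi E_k$ where each $E_k$ is of finite measure in X. Each $E_k$ is expressible as $X \cap F_k$ where $\mu_r F_k = \mu E_k$ (214Ca). By (ii), we can find $g_k \in C_k$, $h_k \in S_0$ such that
$$|a_k| \int_X |g_k - \chi F_k| \le \frac{\epsilon}{n+1}, \quad |a_k| \int_X |h_k - \chi F_k| \le \frac{\epsilon}{n+1}$$
for each k. Set $g = \sum_{k=0}^n a_k g_k$ and $h = \sum_{k=0}^n a_k h_k$; then $g \in C_k$, $h \in S_0$ and
$$\int_X |f - g| \le \int_X \sum_{k=0}^n |a_k| |\chi F_k - g_k| = \sum_{k=0}^n |a_k| \int_X |\chi F_k - g_k| \le \epsilon,$$
$$\int_X |f - h| \le \sum_{k=0}^n |a_k| \int_X |\chi F_k - h_k| \le \epsilon,$$
as required.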
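(iv) If f is any integrable function on X, then by 242Ma we can find a simple function $f_0$ such that $\int |f - f_0| \le \frac{1}{2}\epsilon$, and now by (iii) there are $g \in C_k$, $h \in S_0$ such that $\int_X |f_0 - g| \le \frac{1}{2}\epsilon$, $\int_X |f_0 - h| \le \frac{1}{2}\epsilon$; so that
$$\int_X |f - g| \le \int_X |f - f_0| + \int_X |f_0 - g| \le \epsilon,$$
$$\int_X |f - h| \le \int_X |f - f_0| + \int_X |f_0 - h| \le \epsilon.$$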
(b)(i) We must check first that if $g \in C_k$ then $g{\upharpoonright}X$ is actually $\mu$-integrable. The point here is that if $g \in C_k$ and $a \in \mathbb{R}$ then
$$\{x : x \in X,\ g(x) > a\}$$
is the intersection of X with an open subset of $\mathbb{R}^r$, and is therefore measured by $\mu$, because all open sets are measured by $\mu_r$ (115G). Next, g is bounded and the set $E = \{x : x \in X,\ g(x) \ne 0\}$ is bounded in $\mathbb{R}^r$, therefore of finite outer measure for $\mu_r$ and of finite measure for $\mu$. Thus there is an $M \ge 0$ such that $|g| \le M\chi E$, which is $\mu$-integrable. Accordingly g is $\mu$-integrable.
Of course $h{\upharpoonright}X$ is $\mu$-integrable for every $h \in S_0$ because (by the definition of subspace measure) $\mu(I \cap X)$ is defined and finite for every bounded half-open interval I.
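(ii) Now the rest follows by just the same arguments as in 242Mb. Because $\{g{\upharpoonright}X : g \in C_k\}$ and $\{h{\upharpoonright}X : h \in S_0\}$ are linear subspaces of $\mathbb{R}^X$ included in $\mathcal{L}^1(\mu)$, their images $C_k^{\#}$ and $S_0^{\#}$ are linear subspaces of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^{\bullet} = u$, and $g \in C_k$, $h \in S_0$ such that $\int_X |f - g|$, $\int_X |f - h| \le \epsilon$; now $v = (g{\upharpoonright}X)^{\bullet} \in C_k^{\#}$ and $w = (h{\upharpoonright}X)^{\bullet} \in S_0^{\#}$ and
$$\|u - v\|_1 = \int_X |f - g| \le \epsilon, \quad \|u - w\|_1 = \int_X |f - h| \le \epsilon.$$
As u and $\epsilon$ are arbitrary, $C_k^{\#}$ and $S_0^{\#}$ are dense in $L^1$.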
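242P Complex $L^1$ As you would, I hope, expect, we can repeat the work above with $\mathcal{L}^1_{\mathbb{C}}$, the space of complex-valued integrable functions, in place of $\mathcal{L}^1$, to construct a complex Banach space $L^1_{\mathbb{C}}$. The required changes, based on the ideas of 241J, are minor.
(a) In 242Aa, it is perhaps helpful to remark that, for $f \in \mathcal{L}^0_{\mathbb{C}}$,
$$f \in \mathcal{L}^1_{\mathbb{C}} \iff |f| \in \mathcal{L}^1 \iff \operatorname{Re}(f),\ \operatorname{Im}(f) \in \mathcal{L}^1.$$
Consequently, for $u \in L^0_{\mathbb{C}}$,
$$u \in L^1_{\mathbb{C}} \iff |u| \in L^1 \iff \operatorname{Re}(u),\ \operatorname{Im}(u) \in L^1.$$
(b) To prove a complex version of 242E, observe that if $\langle f_n \rangle_{n \in \mathbb{N}}$ is a sequence in $\mathcal{L}^1_{\mathbb{C}}$ such that $\sum_{n=0}^{\infty} \int |f_n| < \infty$, then $\sum_{n=0}^{\infty} \int |\operatorname{Re}(f_n)|$ and $\sum_{n=0}^{\infty} \int |\operatorname{Im}(f_n)|$ are both finite, so we may apply 242E twice and see that
$$\int \Big(\sum_{n=0}^{\infty} f_n\Big) = \int \Big(\sum_{n=0}^{\infty} \operatorname{Re}(f_n)\Big) + i \int \Big(\sum_{n=0}^{\infty} \operatorname{Im}(f_n)\Big) = \sum_{n=0}^{\infty} \int f_n.$$
Accordingly we can prove that $L^1_{\mathbb{C}}$ is complete under $\|\ \|_1$ by the argument of 242F.
(c) Similarly, little change is needed to adapt 242J to give a description of a conditional expectation operator $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu{\upharpoonright}\mathrm{T})$ when $(X, \Sigma, \mu)$ is a probability space and T is a $\sigma$-subalgebra of $\Sigma$. In the formula
$$|Pu| \le P|u|$$
of 242Je, we need to know that
$$|Pu| = \sup_{|\zeta| = 1} \operatorname{Re}(\zeta Pu)$$
in $L^0(\mu{\upharpoonright}\mathrm{T})$ (241Jc), while
$$\operatorname{Re}(\zeta Pu) = \operatorname{Re}(P(\zeta u)) = P(\operatorname{Re}(\zeta u)) \le P|u|$$
whenever $|\zeta| = 1$.
(d) In 242M, we need to replace S by $S_{\mathbb{C}}$, the space of 'complex-valued simple functions' of the form $\sum_{k=0}^n a_k \chi E_k$ where each $a_k$ is a complex number and each $E_k$ is a measurable set of finite measure; then we get a dense linear subspace $\mathbf{S}_{\mathbb{C}} = \{f^{\bullet} : f \in S_{\mathbb{C}}\}$ of $L^1_{\mathbb{C}}$. In 242O, we must replace $C_k$ by $C_k(\mathbb{R}^r; \mathbb{C})$, the space of bounded continuous complex-valued functions of bounded support, and $S_0$ by the linear span over $\mathbb{C}$ of $\{\chi I : I$ is a bounded half-open interval$\}$.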
242X Basic exercises >>>(a) Let X be a set, and let $\mu$ be counting measure on X. Show that $L^1(\mu)$ can be identified with the space $\ell^1(X)$ of absolutely summable real-valued functions on X (see 226A). In particular, the space $\ell^1 = \ell^1(\mathbb{N})$ of absolutely summable real-valued sequences is an $L^1$ space. Write out proofs of 242F adapted to these special cases.
>>>(b) Let $(X, \Sigma, \mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$. Show that $\mathcal{L}^1(\hat\mu) = \mathcal{L}^1(\mu)$ and $L^1(\hat\mu) = L^1(\mu)$ (cf. 241Xb).
(c) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum. Show that the isomorphism between $L^0(\mu)$ and $\prod_{i \in I} L^0(\mu_i)$ (241Xd) induces an identification between $L^1(\mu)$ and
$$\Big\{u : u \in \prod_{i \in I} L^1(\mu_i),\ \|u\| = \sum_{i \in I} \|u(i)\|_1 < \infty\Big\} \subseteq \prod_{i \in I} L^1(\mu_i).$$
(d) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : X \to Y$ an inverse-measure-preserving function. Show that $g \mapsto g\phi : \mathcal{L}^1(\nu) \to \mathcal{L}^1(\mu)$ (235G) induces a linear operator $T : L^1(\nu) \to L^1(\mu)$ such that $\|Tv\|_1 = \|v\|_1$ for every $v \in L^1(\nu)$.
(e) Let U be a Riesz space (definition: 241Ed). A Riesz norm on U is a norm $\|\ \|$ such that $\|u\| \le \|v\|$ whenever $|u| \le |v|$. Show that if U is given its norm topology (2A4Bb) for such a norm, then (i) $u \mapsto |u| : U \to U$, $(u, v) \mapsto u \vee v : U \times U \to U$ are continuous (ii) $\{u : u \ge 0\}$ is closed.
(f) Show that any Banach lattice must be an Archimedean Riesz space (241Fa).
(g) Let $(X, \Sigma, \mu)$ be a probability space, and T a $\sigma$-subalgebra of $\Sigma$, $\Upsilon$ a $\sigma$-subalgebra of T. Let $P_1 : L^1(\mu) \to L^1(\mu{\upharpoonright}\mathrm{T})$, $P_2 : L^1(\mu{\upharpoonright}\mathrm{T}) \to L^1(\mu{\upharpoonright}\Upsilon)$ and $P : L^1(\mu) \to L^1(\mu{\upharpoonright}\Upsilon)$ be the corresponding conditional expectation operators. Show that $P = P_2 P_1$.
(h) Show that if $g : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support it is bounded and attains its bounds. (Hint: 2A2F-2A2G.)
(i) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Take $\delta > 0$. Show that if $\phi_\delta(x) = \exp(-\frac{1}{\delta^2 - x^2})$ for $|x| < \delta$, 0 for $|x| \ge \delta$ then $\phi_\delta$ is smooth, that is, differentiable arbitrarily often. (ii) Show that if $F_\delta(x) = \int_{-\infty}^x \phi_\delta\,d\mu$ for $x \in \mathbb{R}$ then $F_\delta$ is smooth. (iii) Show that if $a < b < c < d$ in $\mathbb{R}$ there is a smooth function h such that $\chi[b, c] \le h \le \chi[a, d]$. (iv) Write D for the space of smooth functions $h : \mathbb{R} \to \mathbb{R}$ such that $\{x : h(x) \ne 0\}$ is bounded. Show that $\{h^{\bullet} : h \in D\}$ is dense in $L^1(\mu)$. (v) Let f be a real-valued function which is integrable over every bounded subset of $\mathbb{R}$. Show that $f \times h$ is integrable for every $h \in D$, and that if $\int f \times h = 0$ for every $h \in D$ then $f = 0$ a.e. (Hint: 222D.)
(j) Let $(X, \Sigma, \mu)$ be a probability space, T a $\sigma$-subalgebra of $\Sigma$ and $P : L^1(\mu) \to L^1(\mu{\upharpoonright}\mathrm{T}) \subseteq L^1(\mu)$ the corresponding conditional expectation operator. Show that if $u, v \in L^1(\mu)$ are such that $P|u| \times P|v| \in L^1(\mu)$, then $\int Pu \times v = \int Pu \times Pv = \int u \times Pv$.
242Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space. Let $A \subseteq L^1 = L^1(\mu)$ be a non-empty downwards-directed set, and suppose that $\inf A = 0$ in $L^1$. (i) Show that $\inf_{u \in A} \|u\|_1 = 0$. (Hint: set $\gamma = \inf_{u \in A} \|u\|_1$; find a non-increasing sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in A such that $\lim_{n \to \infty} \|u_n\|_1 = \gamma$; set $v = \inf_{n \in \mathbb{N}} u_n$ and show that $u \wedge v = v$ for every $u \in A$, so that $v = 0$.) (ii) Show that if U is any open set containing 0, there is a $u \in A$ such that $v \in U$ whenever $0 \le v \le u$.
(b) Let $(X, \Sigma, \mu)$ be a measure space and Y any subset of X; let $\mu_Y$ be the subspace measure on Y and $T : L^0(\mu) \to L^0(\mu_Y)$ the canonical map described in 241Yg. (i) Show that $Tu \in L^1(\mu_Y)$ and $\|Tu\|_1 \le \|u\|_1$ for every $u \in L^1(\mu)$. (ii) Show that if $u \in L^1(\mu)$ then $\|Tu\|_1 = \|u\|_1$ iff $\int_E u = \int_{Y \cap E} Tu$ for every $E \in \Sigma$. (iii) Show that T is surjective and that $\|v\|_1 = \min\{\|u\|_1 : u \in L^1(\mu),\ Tu = v\}$ for every $v \in L^1(\mu_Y)$. (Hint: 214Eb.) (See also 244Yd below.)
(c) Let $(X, \Sigma, \mu)$ be a measure space. Write $\mathcal{L}^1_{\text{strict}}$ for the space of all integrable $\Sigma$-measurable functions from X to $\mathbb{R}$, and N for the subspace of $\mathcal{L}^1_{\text{strict}}$ consisting of measurable functions which are zero almost everywhere. (i) Show that $\mathcal{L}^1_{\text{strict}}$ is a Dedekind $\sigma$-complete Riesz space. (ii) Show that $L^1(\mu)$ can be identified, as ordered linear space, with the quotient $\mathcal{L}^1_{\text{strict}}/N$ as defined in 241Yb. (iii) Show that $\|\ \|_1$ is a seminorm on $\mathcal{L}^1_{\text{strict}}$. (iv) Show that $f \mapsto |f| : \mathcal{L}^1_{\text{strict}} \to \mathcal{L}^1_{\text{strict}}$ is continuous if $\mathcal{L}^1_{\text{strict}}$ is given the topology defined from $\|\ \|_1$. (v) Show that $\{f : f = 0 \text{ a.e.}\}$ is closed in $\mathcal{L}^1_{\text{strict}}$, but that $\{f : f \ge 0\}$ need not be.
(d) Let $(X, \Sigma, \mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$ (213E). Show that the inclusion $\mathcal{L}^1(\mu) \subseteq \mathcal{L}^1(\tilde\mu)$ induces an isomorphism, as ordered normed linear spaces, between $L^1(\tilde\mu)$ and $L^1(\mu)$.
(e) Let $(X, \Sigma, \mu)$ be a measure space and $u_0, \ldots, u_n \in L^1(\mu)$. (i) Suppose $k_0, \ldots, k_n \in \mathbb{Z}$ are such that $\sum_{i=0}^n k_i = 1$. Show that $\sum_{i=0}^n \sum_{j=0}^n k_i k_j \|u_i - u_j\|_1 \le 0$. (Hint: $\sum_{i=0}^n \sum_{j=0}^n k_i k_j |\alpha_i - \alpha_j| \le 0$ for all $\alpha_0, \ldots, \alpha_n \in \mathbb{R}$.) (ii) Suppose $\gamma_0, \ldots, \gamma_n \in \mathbb{R}$ are such that $\sum_{i=0}^n \gamma_i = 0$. Show that $\sum_{i=0}^n \sum_{j=0}^n \gamma_i \gamma_j \|u_i - u_j\|_1 \le 0$.
(f) Let $(X, \Sigma, \mu)$ be a measure space, and $A \subseteq L^1 = L^1(\mu)$ a non-empty upwards-directed set. Suppose that A is bounded for the norm $\|\ \|_1$. (i) Show that there is a non-decreasing sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in A such that $\lim_{n \to \infty} \int u_n = \sup_{u \in A} \int u$, and that $\langle u_n \rangle_{n \in \mathbb{N}}$ is Cauchy. (ii) Show that $w = \sup A$ is defined in $L^1$ and belongs to the norm-closure of A in $L^1$, so that, in particular, $\|w\|_1 \le \sup_{u \in A} \|u\|_1$.
(g) A Riesz norm (definition: 242Xe) on a Riesz space U is order-continuous if $\inf_{u \in A} \|u\| = 0$ whenever $A \subseteq U$ is a non-empty downwards-directed set with infimum 0. (Thus 242Ya tells us that the norms $\|\ \|_1$ are all order-continuous.) Show that in this case (i) any non-decreasing sequence in U which has an upper bound in U must be Cauchy (ii) if U is a Banach lattice, U is Dedekind complete. (Hint for (i): if $\langle u_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence with an upper bound in U, let B be the set of upper bounds of $\{u_n : n \in \mathbb{N}\}$ and show that $A = \{v - u_n : v \in B,\ n \in \mathbb{N}\}$ has infimum 0 because U is Archimedean.)
(h) Let $(X, \Sigma, \mu)$ be any measure space. Show that $L^1(\mu)$ has the countable sup property (241Ye).
(i) More generally, show that any Riesz space with an order-continuous Riesz norm has the countable sup property.
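(j) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $U \subseteq L^0(\mu)$ a linear subspace. Let $T : U \to L^0(\nu)$ be a linear operator such that $Tu \ge 0$ in $L^0(\nu)$ whenever $u \in U$ and $u \ge 0$ in $L^0(\mu)$. Suppose that $w \in U$ is such that $w \ge 0$ and $Tw = (\chi Y)^{\bullet}$. Show that whenever $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function and $u \in L^0(\mu)$ is such that $w \times u$ and $w \times \bar\phi(u) \in U$, defining $\bar\phi : L^0(\mu) \to L^0(\mu)$ as in 241I, then $\bar\phi T(w \times u) \le T(w \times \bar\phi u)$. Explain how this result may be regarded as a common generalization of Jensen's inequality, as stated in 233I, and 242K above. See also 244M below.
(k)(i) A function $\phi : \mathbb{C} \to \mathbb{R}$ is convex if $\phi(ab + (1 - a)c) \le a\phi(b) + (1 - a)\phi(c)$ for all $b, c \in \mathbb{C}$ and $a \in [0, 1]$. (ii) Show that such a function must be bounded on any bounded subset of $\mathbb{C}$. (iii) If $\phi : \mathbb{C} \to \mathbb{R}$ is convex and $c \in \mathbb{C}$, show that there is a $b \in \mathbb{C}$ such that $\phi(x) \ge \phi(c) + \operatorname{Re}(b(x - c))$ for every $x \in \mathbb{C}$. (iv) If $\langle b_c \rangle_{c \in \mathbb{C}}$ is such that $\phi(x) \ge \phi_c(x) = \phi(c) + \operatorname{Re}(b_c(x - c))$ for all $x, c \in \mathbb{C}$, show that $\{b_c : c \in I\}$ is bounded for any bounded $I \subseteq \mathbb{C}$. (v) Show that if $D \subseteq \mathbb{C}$ is any dense set, $\phi(x) = \sup_{c \in D} \phi_c(x)$ for every $x \in \mathbb{C}$.
(l) Let $(X, \Sigma, \mu)$ be a probability space and T a $\sigma$-subalgebra of $\Sigma$. Let $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu{\upharpoonright}\mathrm{T})$ be the conditional expectation operator. Show that if $\phi : \mathbb{C} \to \mathbb{R}$ is any convex function, and we define $\bar\phi(f^{\bullet}) = (\phi f)^{\bullet}$ for every $f \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, then $\bar\phi(Pu) \le P(\bar\phi(u))$ whenever $u \in L^1_{\mathbb{C}}(\mu)$ is such that $\bar\phi(u) \in L^1(\mu)$.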
242 Notes and comments Of course $L^1$-spaces compose one of the most important classes of Riesz space, and accordingly their properties have great prominence in the general theory; 242Xe, 242Xf, 242Ya and 242Yf-242Yi outline some of the interrelations between these properties. I will return to these questions in Chapter 35 in the next volume. I have mentioned in passing (242Dd) the additivity of the norm of $L^1$ on the positive elements. This elementary fact actually characterizes $L^1$ spaces among Banach lattices; see 369E in the next volume.
Just as $L^0(\mu)$ can be regarded as a quotient of a linear space $\mathcal{L}^0_{\text{strict}}$, so can $L^1(\mu)$ be regarded as a quotient of a linear space $\mathcal{L}^1_{\text{strict}}$ (242Yc). I have discussed this question in the notes to 241; all I try to do here is to be consistent.
We now have a language in which we can speak of 'the' conditional expectation of a function f, the equivalence class in $L^1(\mu{\upharpoonright}\mathrm{T})$ consisting precisely of all the conditional expectations of f on T. If we think of $L^1(\mu{\upharpoonright}\mathrm{T})$ as identified with its image in $L^1(\mu)$, then the conditional expectation operator $P : L^1(\mu) \to L^1(\mu{\upharpoonright}\mathrm{T})$ becomes a projection (242Jh). We therefore have re-statements of 233J-233K, as in 242K, 242L and 242Yj.
I give 242O in a fairly general form; but its importance already appears if we take X to be $[0, 1]$ with one-dimensional Lebesgue measure. In this case, we have a natural norm on $C([0, 1])$, the space of all continuous real-valued functions on $[0, 1]$, given by setting
$$\|f\|_1 = \int_0^1 |f(x)|\,dx$$
for every $f \in C([0, 1])$. The integral here can, of course, be taken to be the Riemann integral; we do not need the Lebesgue theory to show that $\|\ \|_1$ is a norm on $C([0, 1])$. It is easy to check that $C([0, 1])$ is not complete for this norm (if we set $f_n(x) = \min(1, 2^n x^n)$ for $x \in [0, 1]$, then $\langle f_n \rangle_{n \in \mathbb{N}}$ is a $\|\ \|_1$-Cauchy sequence with no $\|\ \|_1$-limit in $C([0, 1])$). We can use the abstract theory of normed spaces to construct a completion of $C([0, 1])$; but it is much more satisfactory if this completion can be given a relatively concrete form, and this is what the identification of $L^1$ with the completion of $C([0, 1])$ can do. (Note that the remark that $\|\ \|_1$ is a norm on $C([0, 1])$, that is, that $\|f\|_1 \ne 0$ for every non-zero $f \in C([0, 1])$, means just that the map $f \mapsto f^{\bullet} : C([0, 1]) \to L^1$ is injective, so that $C([0, 1])$ can be identified, as ordered normed space, with its image in $L^1$.) It would be even better if we could find a realization of the completion of $C([0, 1])$ as a space of functions on some set Z, rather than as a space of equivalence classes of functions on $[0, 1]$. Unfortunately this is not practical; such realizations do exist, but necessarily involve either a thoroughly unfamiliar base set Z, or an intolerably arbitrary embedding map from $C([0, 1])$ into $\mathbb{R}^Z$.
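The Cauchy claim for $f_n(x) = \min(1, 2^n x^n)$ can be made quantitative. Since $f_n(x) = (2x)^n$ on $[0, \frac12]$ and $f_n = 1$ on $[\frac12, 1]$, a direct calculation (mine, not spelled out in the text) gives $\|f_n - f_m\|_1 = \big|\frac{1}{2(n+1)} - \frac{1}{2(m+1)}\big|$. The sketch below compares this closed form with a midpoint-rule estimate; the function names and the step count are illustrative assumptions.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = min(1, (2x)^n) on [0, 1]."""
    return min(1.0, (2.0 * x) ** n)

def dist_exact(n, m):
    """Closed form for the L^1 distance between f_n and f_m (derived by hand)."""
    return abs(Fraction(1, 2 * (n + 1)) - Fraction(1, 2 * (m + 1)))

def dist_midpoint(n, m, steps=20000):
    """Midpoint-rule estimate of the integral of |f_n - f_m| over [0, 1]."""
    h = 1.0 / steps
    return sum(abs(f(n, (k + 0.5) * h) - f(m, (k + 0.5) * h)) for k in range(steps)) * h

print(dist_exact(3, 7), dist_midpoint(3, 7))
print(dist_exact(100, 10 ** 6) < Fraction(1, 200))
```

The distances tend to 0 as $n, m \to \infty$, while the pointwise limit is 0 on $[0, \frac12[$ and 1 on $[\frac12, 1]$, which no continuous function matches almost everywhere.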
You can get an idea of the obstacle to realizing the completion of $C([0, 1])$ as a space of functions on $[0, 1]$ itself by considering $f_n(x) = \frac{1}{n} x^n$ for $n \ge 1$. An easy calculation shows that $\sum_{n=1}^{\infty} \|f_n\|_1 < \infty$, so that $\sum_{n=1}^{\infty} f_n$ must exist in the completion of $C([0, 1])$; but there is no natural value to assign to it at the point 1. Adaptations of this idea can give rise to indefinitely complicated phenomena; indeed, 242O shows that every integrable function is associated with some appropriate sequence from $C([0, 1])$. In 245 I shall have more to say about what $\|\ \|_1$-convergent sequences look like.
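The 'easy calculation' for $f_n(x) = \frac{1}{n} x^n$ is $\|f_n\|_1 = \frac{1}{n(n+1)}$, so the norms telescope to a sum of 1, whereas the values at the point 1 form the harmonic series. A small exact check, with helper names chosen for this sketch:

```python
from fractions import Fraction

def norm1(n):
    """L^1 norm of f_n(x) = x^n / n on [0, 1]: the integral is 1 / (n (n + 1))."""
    return Fraction(1, n * (n + 1))

def partial_norm_sum(N):
    """Sum of the first N norms; telescopes to 1 - 1/(N + 1)."""
    return sum(norm1(n) for n in range(1, N + 1))

def partial_value_at_one(N):
    """Partial sum of f_n(1) = 1/n, i.e. the N-th harmonic number."""
    return sum(Fraction(1, n) for n in range(1, N + 1))

print(partial_norm_sum(50))              # stays below 1
print(float(partial_value_at_one(100)))  # keeps growing with N
```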
From the point of view of measure theory, narrowly conceived, most of the interesting ideas appear most clearly with real functions and real linear spaces. But some of the most important applications of measure theory (important not only as mathematics in general, but also for the measure-theoretic questions they inspire) deal with complex functions and complex linear spaces. I therefore continue to offer sketches of the complex theory, as in 242P. I note that at irregular intervals we need ideas not already spelt out in the real theory, as in 242Pb and 242Yl.
243 $L^{\infty}$
The second of the classical Banach spaces of measure theory which I treat is the space $L^{\infty}$. As will appear below, $L^{\infty}$ is the polar companion of $L^1$, the linked opposite; for 'ordinary' measure spaces it is actually the dual of $L^1$ (243F-243G).
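243A Definitions Let $(X, \Sigma, \mu)$ be any measure space. Let $\mathcal{L}^{\infty} = \mathcal{L}^{\infty}(\mu)$ be the set of functions $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ which are essentially bounded, that is, such that there is some $M \ge 0$ such that $\{x : x \in \operatorname{dom} f,\ |f(x)| \le M\}$ is conegligible, and write
$$L^{\infty} = L^{\infty}(\mu) = \{f^{\bullet} : f \in \mathcal{L}^{\infty}(\mu)\} \subseteq L^0(\mu).$$
Note that if $f \in \mathcal{L}^{\infty}$, $g \in \mathcal{L}^0$ and $g =_{\text{a.e.}} f$, then $g \in \mathcal{L}^{\infty}$; thus $\mathcal{L}^{\infty} = \{f : f \in \mathcal{L}^0,\ f^{\bullet} \in L^{\infty}\}$.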
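243B Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then
(a) $L^{\infty} = L^{\infty}(\mu)$ is a linear subspace of $L^0 = L^0(\mu)$.
(b) If $u \in L^{\infty}$, $v \in L^0$ and $|v| \le |u|$ then $v \in L^{\infty}$. Consequently $|u|$, $u \vee v$, $u \wedge v$, $u^+ = u \vee 0$ and $u^- = (-u) \vee 0$ belong to $L^{\infty}$ for all $u, v \in L^{\infty}$.
(c) Writing $e = \chi X^{\bullet}$, the equivalence class in $L^0$ of the constant function with value 1, then an element u of $L^0$ belongs to $L^{\infty}$ iff there is an $M \ge 0$ such that $|u| \le Me$.
(d) If $u, v \in L^{\infty}$ then $u \times v \in L^{\infty}$.
(e) If $u \in L^{\infty}$ and $v \in L^1 = L^1(\mu)$ then $u \times v \in L^1$.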
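proof (a) If $f, g \in \mathcal{L}^{\infty} = \mathcal{L}^{\infty}(\mu)$ and $c \in \mathbb{R}$, then $f + g$, $cf \in \mathcal{L}^{\infty}$. PPP We have $M_1, M_2 \ge 0$ such that $|f| \le M_1$ a.e. and $|g| \le M_2$ a.e. Now
$$|f + g| \le |f| + |g| \le M_1 + M_2 \text{ a.e.}, \quad |cf| \le |c||M_1| \text{ a.e.},$$
so $f + g$, $cf \in \mathcal{L}^{\infty}$. QQQ It follows at once that $u + v$, $cu \in L^{\infty}$ whenever $u, v \in L^{\infty}$ and $c \in \mathbb{R}$.
(b)(i) Take $f \in \mathcal{L}^{\infty}$, $g \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ such that $u = f^{\bullet}$ and $v = g^{\bullet}$. Then $|g| \le_{\text{a.e.}} |f|$. Let $M \ge 0$ be such that $|f| \le M$ a.e.; then $|g| \le M$ a.e., so $g \in \mathcal{L}^{\infty}$ and $v \in L^{\infty}$.
(ii) Now $|\,|u|\,| = |u|$ so $|u| \in L^{\infty}$ whenever $u \in L^{\infty}$. Also $u \vee v = \frac{1}{2}(u + v + |u - v|)$, $u \wedge v = \frac{1}{2}(u + v - |u - v|)$ belong to $L^{\infty}$ for all $u, v \in L^{\infty}$.
(c)(i) If $u \in L^{\infty}$, take $f \in \mathcal{L}^{\infty}$ such that $f^{\bullet} = u$. Then there is an $M \ge 0$ such that $|f| \le M$ a.e., so that $|f| \le_{\text{a.e.}} M\chi X$ and $|u| \le Me$. (ii) Of course $\chi X \in \mathcal{L}^{\infty}$, so $e \in L^{\infty}$, and if $u \in L^0$ and $|u| \le Me$ then $u \in L^{\infty}$ by (b).
(d) $f \times g \in \mathcal{L}^{\infty}$ whenever $f, g \in \mathcal{L}^{\infty}$. PPP If $|f| \le M_1$ a.e. and $|g| \le M_2$ a.e., then
$$|f \times g| = |f| \times |g| \le M_1 M_2 \text{ a.e.}$$
QQQ So $u \times v \in L^{\infty}$ for all $u, v \in L^{\infty}$.
(e) If $f \in \mathcal{L}^{\infty}$ and $g \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$, then there is an $M \ge 0$ such that $|f| \le M$ a.e., so $|f \times g| \le_{\text{a.e.}} M|g|$; because $M|g|$ is integrable and $f \times g$ is virtually measurable, $f \times g$ is integrable and $u \times v \in L^1$.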
243C The order structure of $L^{\infty}$ Let $(X, \Sigma, \mu)$ be any measure space. Then $L^{\infty} = L^{\infty}(\mu)$, being a linear subspace of $L^0 = L^0(\mu)$, inherits a partial order which renders it a partially ordered linear space (compare 242Ca). Because $|u| \in L^{\infty}$ whenever $u \in L^{\infty}$ (243Bb), $u \vee v$ and $u \wedge v$ belong to $L^{\infty}$ whenever $u, v \in L^{\infty}$, and $L^{\infty}$ is a Riesz space (compare 242Cd).
The behaviour of $L^{\infty}$ as a Riesz space is dominated by the fact that it has an order unit e with the property that
for every $u \in L^{\infty}$ there is an $M \ge 0$ such that $|u| \le Me$
(243Bc).
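243D The norm of $L^{\infty}$ Let $(X, \Sigma, \mu)$ be any measure space.
(a) For $f \in \mathcal{L}^{\infty} = \mathcal{L}^{\infty}(\mu)$, say that the essential supremum of $|f|$ is
$$\operatorname{ess\,sup} |f| = \inf\{M : M \ge 0,\ \{x : x \in \operatorname{dom} f,\ |f(x)| \le M\} \text{ is conegligible}\}.$$
Then $|f| \le \operatorname{ess\,sup} |f|$ a.e. PPP Set $M = \operatorname{ess\,sup} |f|$. For each $n \in \mathbb{N}$, there is an $M_n \le M + 2^{-n}$ such that $|f| \le M_n$ a.e. Now
$$\{x : |f(x)| \le M\} = \bigcap_{n \in \mathbb{N}} \{x : |f(x)| \le M_n\}$$
is conegligible, so $|f| \le M$ a.e. QQQ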
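On a finite set carrying a measure with some null points, the essential supremum is simply the largest value of $|f|$ over points of positive mass. The sketch below uses a hypothetical three-point space to show how it differs from the ordinary supremum and why it does not change when $f$ is altered on a negligible set.

```python
from fractions import Fraction

# Hypothetical measure on three points; 'c' is a negligible point.
mass = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}

def ess_sup(f):
    """Least M with |f| <= M off a negligible set (finite case: max over non-null points)."""
    return max((abs(f[x]) for x in f if mass[x] > 0), default=0)

def measure_where_exceeds(f, bound):
    """Measure of the set where |f| > bound."""
    return sum(mass[x] for x in f if abs(f[x]) > bound)

f = {"a": 1, "b": -2, "c": 100}
g = dict(f, c=-7)  # agrees with f almost everywhere

print(ess_sup(f), max(abs(t) for t in f.values()))
print(measure_where_exceeds(f, ess_sup(f)))
print(ess_sup(g) == ess_sup(f))
```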


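(b) If $f, g \in \mathcal{L}^{\infty}$ and $f =_{\text{a.e.}} g$, then $\operatorname{ess\,sup} |f| = \operatorname{ess\,sup} |g|$. Accordingly we may define a functional $\|\ \|_{\infty}$ on $L^{\infty} = L^{\infty}(\mu)$ by setting $\|u\|_{\infty} = \operatorname{ess\,sup} |f|$ whenever $u = f^{\bullet}$.
(c) From (a), we see that, for any $u \in L^{\infty}$, $\|u\|_{\infty} = \min\{\gamma : |u| \le \gamma e\}$, where, as before, $e = \chi X^{\bullet}$. Consequently $\|\ \|_{\infty}$ is a norm on $L^{\infty}$. PPP (i) If $u, v \in L^{\infty}$ then
$$|u + v| \le |u| + |v| \le (\|u\|_{\infty} + \|v\|_{\infty})e$$
so $\|u + v\|_{\infty} \le \|u\|_{\infty} + \|v\|_{\infty}$. (ii) If $u \in L^{\infty}$ and $c \in \mathbb{R}$ then
$$|cu| = |c||u| \le |c|\|u\|_{\infty} e,$$
so $\|cu\|_{\infty} \le |c|\|u\|_{\infty}$. (iii) If $\|u\|_{\infty} = 0$, there is an $f \in \mathcal{L}^{\infty}$ such that $f^{\bullet} = u$ and $|f| \le \|u\|_{\infty}$ a.e.; now $f = 0$ a.e. so $u = 0$. QQQ
(d) Note also that if $u \in L^0$, $v \in L^{\infty}$ and $|u| \le |v|$ then $|u| \le \|v\|_{\infty} e$ so $u \in L^{\infty}$ and $\|u\|_{\infty} \le \|v\|_{\infty}$; similarly,
$$\|u \times v\|_{\infty} \le \|u\|_{\infty}\|v\|_{\infty}, \quad \|u \vee v\|_{\infty} \le \max(\|u\|_{\infty}, \|v\|_{\infty})$$
for all $u, v \in L^{\infty}$. Thus $L^{\infty}$ is a commutative Banach algebra (2A4J).
(e) Moreover,
$$\Big|\int u \times v\Big| \le \int |u \times v| = \|u \times v\|_1 \le \|u\|_1 \|v\|_{\infty}$$
whenever $u \in L^1$ and $v \in L^{\infty}$, because
$$|u \times v| = |u| \times |v| \le |u| \times \|v\|_{\infty} e = \|v\|_{\infty} |u|.$$
(f) Observe that if u, v are non-negative members of $L^{\infty}$ then
$$\|u \vee v\|_{\infty} = \max(\|u\|_{\infty}, \|v\|_{\infty});$$
this is because, for any $\gamma \ge 0$,
$$u \vee v \le \gamma e \iff u \le \gamma e \text{ and } v \le \gamma e.$$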
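243E Theorem For any measure space $(X, \Sigma, \mu)$, $L^{\infty} = L^{\infty}(\mu)$ is a Banach lattice under $\|\ \|_{\infty}$.
proof (a) We already know that $\|u\|_{\infty} \le \|v\|_{\infty}$ whenever $|u| \le |v|$ (243Dd); so we have just to check that $L^{\infty}$ is complete under $\|\ \|_{\infty}$. Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be a Cauchy sequence in $L^{\infty}$. For each $n \in \mathbb{N}$ choose $f_n \in \mathcal{L}^{\infty} = \mathcal{L}^{\infty}(\mu)$ such that $f_n^{\bullet} = u_n$ in $L^{\infty}$. For all $m, n \in \mathbb{N}$, $(f_m - f_n)^{\bullet} = u_m - u_n$. Consequently
$$E_{mn} = \{x : |f_m(x) - f_n(x)| > \|u_m - u_n\|_{\infty}\}$$
is negligible, by 243Da. This means that
$$E = \bigcap_{n \in \mathbb{N}} \{x : x \in \operatorname{dom} f_n,\ |f_n(x)| \le \|u_n\|_{\infty}\} \setminus \bigcup_{m, n \in \mathbb{N}} E_{mn}$$
is conegligible. But for every $x \in E$, $|f_m(x) - f_n(x)| \le \|u_m - u_n\|_{\infty}$ for all $m, n \in \mathbb{N}$, so that $\langle f_n(x) \rangle_{n \in \mathbb{N}}$ is a Cauchy sequence, with a limit in $\mathbb{R}$. Thus $f = \lim_{n \to \infty} f_n$ is defined almost everywhere. Also, at least for $x \in E$,
$$|f(x)| \le \sup_{n \in \mathbb{N}} \|u_n\|_{\infty} < \infty,$$
so $f \in \mathcal{L}^{\infty}$ and $u = f^{\bullet} \in L^{\infty}$. If $m \in \mathbb{N}$, then, for every $x \in E$,
$$|f(x) - f_m(x)| \le \sup_{n \ge m} |f_n(x) - f_m(x)| \le \sup_{n \ge m} \|u_n - u_m\|_{\infty},$$
so
$$\|u - u_m\|_{\infty} \le \sup_{n \ge m} \|u_n - u_m\|_{\infty} \to 0$$
as $m \to \infty$, and $u = \lim_{m \to \infty} u_m$ in $L^{\infty}$. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $L^{\infty}$ is complete.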
243F The duality between $L^{\infty}$ and $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.
(a) I have already remarked that if $u \in L^1 = L^1(\mu)$ and $v \in L^{\infty} = L^{\infty}(\mu)$, then $u \times v \in L^1$ and $|\int u \times v| \le \|u\|_1 \|v\|_{\infty}$ (243Be, 243De).
(b) Consequently we have a bounded linear operator T from $L^{\infty}$ to the normed space dual $(L^1)^*$ of $L^1$, given by writing
$$(Tv)(u) = \int u \times v \text{ for all } u \in L^1,\ v \in L^{\infty}.$$
PPP (i) By (a), $(Tv)(u)$ is well-defined for $u \in L^1$, $v \in L^{\infty}$. (ii) If $v \in L^{\infty}$, $u, u_1, u_2 \in L^1$ and $c \in \mathbb{R}$, then
$$(Tv)(u_1 + u_2) = \int (u_1 + u_2) \times v = \int (u_1 \times v) + (u_2 \times v) = \int u_1 \times v + \int u_2 \times v = (Tv)(u_1) + (Tv)(u_2),$$
$$(Tv)(cu) = \int cu \times v = \int c(u \times v) = c\int u \times v = c(Tv)(u).$$
This shows that $Tv : L^1 \to \mathbb{R}$ is a linear functional for each $v \in L^{\infty}$. (iii) Next, for any $u \in L^1$ and $v \in L^{\infty}$,
$$|(Tv)(u)| = \Big|\int u \times v\Big| \le \|u \times v\|_1 \le \|u\|_1 \|v\|_{\infty},$$
as remarked in (a). This means that $Tv \in (L^1)^*$ and $\|Tv\| \le \|v\|_{\infty}$ for every $v \in L^{\infty}$. (iv) If $v, v_1, v_2 \in L^{\infty}$, $u \in L^1$ and $c \in \mathbb{R}$, then
$$T(v_1 + v_2)(u) = \int (v_1 + v_2) \times u = \int (v_1 \times u) + (v_2 \times u) = \int v_1 \times u + \int v_2 \times u = (Tv_1)(u) + (Tv_2)(u) = (Tv_1 + Tv_2)(u),$$
$$T(cv)(u) = \int cv \times u = c\int v \times u = c(Tv)(u) = (cTv)(u).$$
As u is arbitrary, $T(v_1 + v_2) = Tv_1 + Tv_2$ and $T(cv) = c(Tv)$; thus $T : L^{\infty} \to (L^1)^*$ is linear. (v) Recalling from (iii) that $\|Tv\| \le \|v\|_{\infty}$ for every $v \in L^{\infty}$, we see that $\|T\| \le 1$. QQQ
(c) Exactly the same arguments show that we have a linear operator $T' : L^1 \to (L^{\infty})^*$, given by writing
$$(T'u)(v) = \int u \times v \text{ for all } u \in L^1,\ v \in L^{\infty},$$
and that $\|T'\|$ also is at most 1.
243G Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $T : L^{\infty}(\mu) \to (L^1(\mu))^*$ the canonical map described in 243F. Then
(a) T is injective iff $(X, \Sigma, \mu)$ is semi-finite, and in this case is norm-preserving;
(b) T is bijective iff $(X, \Sigma, \mu)$ is localizable, and in this case is a normed space isomorphism.
proof (a)(i) Suppose that T is injective, and that $E \in \Sigma$ has $\mu E = \infty$. Then $\chi E$ is not equal almost everywhere to 0, so $(\chi E)^{\bullet} \ne 0$ in $L^{\infty}$, and $T(\chi E)^{\bullet} \ne 0$; let $u \in L^1$ be such that $T(\chi E)^{\bullet}(u) \ne 0$, that is, $\int u \times (\chi E)^{\bullet} \ne 0$. Express u as $f^{\bullet}$ where f is integrable; then $\int_E f \ne 0$ so $\int_E |f| \ne 0$. Let g be a simple function such that $0 \le g \le_{\text{a.e.}} |f|$ and $\int g > \int |f| - \int_E |f|$; then $\int_E g \ne 0$. Express g as $\sum_{i=0}^n a_i \chi E_i$ where $\mu E_i < \infty$ for each i; then $0 \ne \sum_{i=0}^n a_i \mu(E_i \cap E)$, so there is an $i \le n$ such that $\mu(E \cap E_i) \ne 0$, and now $E \cap E_i$ is a measurable subset of E of non-zero finite measure.
As E is arbitrary, this shows that $(X, \Sigma, \mu)$ must be semi-finite if T is injective.
(ii) Now suppose that $(X, \Sigma, \mu)$ is semi-finite, and that $v \in L^{\infty}$ is non-zero. Express v as $g^{\bullet}$ where $g : X \to \mathbb{R}$ is measurable; then $g \in \mathcal{L}^{\infty}$. Take any $a \in\ ]0, \|v\|_{\infty}[$; then $E = \{x : |g(x)| \ge a\}$ has non-zero measure. Let $F \subseteq E$ be a measurable set of non-zero finite measure, and set $f(x) = |g(x)|/g(x)$ if $x \in F$, 0 otherwise; then $f \in \mathcal{L}^1$ and $(f \times g)(x) \ge a$ for $x \in F$, so, setting $u = f^{\bullet} \in L^1$, we have
$$(Tv)(u) = \int u \times v = \int f \times g \ge a\mu F = a\int |f| = a\|u\|_1 > 0.$$
This shows that $\|Tv\| \ge a$; as a is arbitrary, $\|Tv\| \ge \|v\|_{\infty}$. We know already from 243F that $\|Tv\| \le \|v\|_{\infty}$, so $\|Tv\| = \|v\|_{\infty}$ for every non-zero $v \in L^{\infty}$; the same is surely true for $v = 0$, so T is norm-preserving and injective.
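On a finite set in which every point has positive finite mass, the argument of (a)(ii) can be carried out explicitly: take $F$ to be the set where $|v|$ attains its maximum and $u = \operatorname{sgn}(v)\chi F$. The sketch below is a toy instance under those assumptions (the three-point space and the vectors are hypothetical); it checks that the pairing attains $\|v\|_{\infty}\|u\|_1$ for this u and respects the bound of 243De for another vector.

```python
from fractions import Fraction

# Hypothetical finite measure space in which every point has positive finite mass.
mass = {"p": Fraction(1, 2), "q": Fraction(3), "r": Fraction(1, 4)}

def pairing(u, v):
    """(Tv)(u): the integral of u times v."""
    return sum(u[x] * v[x] * mass[x] for x in mass)

def norm1(u):
    return sum(abs(u[x]) * mass[x] for x in mass)

def norm_inf(v):
    return max(abs(v[x]) for x in mass if mass[x] > 0)

def norming_vector(v):
    """u = sgn(v) on the set F where |v| is largest, 0 elsewhere."""
    top = norm_inf(v)
    return {
        x: (Fraction(1) if v[x] > 0 else Fraction(-1)) if abs(v[x]) == top else Fraction(0)
        for x in mass
    }

v = {"p": Fraction(-7), "q": Fraction(2), "r": Fraction(5)}
u = norming_vector(v)
w = {"p": Fraction(1), "q": Fraction(-1), "r": Fraction(2)}

print(pairing(u, v), norm_inf(v) * norm1(u))       # equal: the norm of Tv is attained
print(abs(pairing(w, v)), norm1(w) * norm_inf(v))  # the general bound
```

In an infinite semi-finite space the maximum need not be attained, which is why the proof works with an arbitrary $a < \|v\|_{\infty}$ instead.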


(b)(i) Using (a) and the denition of localizable, we see that under either of the conditions proposed (X, , ) is
semi-nite and T is injective and norm-preserving. I therefore have to show just that it is surjective i (X, , ) is
localizable.
154 Function spaces 243G
(ii) Suppose that T is surjective and that c . Let T be the family of nite unions of members of c, counting
as the union of no members of c, so that T is closed under nite unions and, for any G , E G is negligible for
every E c i E G is negligible for every E T.
If u L
1
, then h(u) = lim
EF,E
_
E
u exists in R. PPP If u is non-negative, then
h(u) = sup
_
E
u : E T
_
u < .
For other u, we can express u as u
1
u
2
, where u
1
and u
2
are non-negative, and now h(u) = h(u
1
) h(u
2
). QQQ
Evidently h : L
1
R is linear, being a limit of the linear functionals u
_
E
u, and also
[h(u)[ sup
EF
[
_
E
u[
_
[u[
for every u, so h (L
1
)

. Since we are supposing that T is surjective, there is a v L

such that Tv = h. Express v


as g

where g : X R is measurable and essentially bounded. Set G = x : g(x) > 0 .


If F and F < , then
_
F
g =
_
(F)

= (Tv)(F)

= h(F)

= sup
EF
(E F).
??? If E c and E G is not negligible, then there is a set F E G such that 0 < F < ; now
F = (E F)
_
F
g 0,
as g(x) 0 for x F. XXX Thus E G is negligible for every E c.
Let H be such that E H is negligible for every E c. ??? If G H is not negligible, there is a set F G H
of non-zero nite measure. Now
(E F) (H F) = 0
for every E c, so (E F) = 0 for every E T, and
_
F
g = 0; but g(x) > 0 for every x F, so F = 0, which is
impossible. XXX Thus G H is negligible.
Accordingly G is an essential supremum of c in . As c is arbitrary, (X, , ) is localizable.
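(iii) For the rest of this proof, I will suppose that $(X, \Sigma, \mu)$ is localizable and seek to show that T is surjective.
Take $h \in (L^1)^*$ such that $\|h\| = 1$. Write $\Sigma^f = \{F : F \in \Sigma,\ \mu F < \infty\}$, and for $F \in \Sigma^f$ define $\nu_F : \Sigma \to \mathbb{R}$ by setting
$$\nu_F E = h(\chi(E \cap F)^{\bullet})$$
for every $E \in \Sigma$. Then $\nu_F \emptyset = h(0) = 0$, and if $E, E' \in \Sigma$ are disjoint
$$\nu_F E + \nu_F E' = h(\chi(E \cap F)^{\bullet}) + h(\chi(E' \cap F)^{\bullet}) = h((\chi(E \cap F) + \chi(E' \cap F))^{\bullet}) = h(\chi((E \cup E') \cap F)^{\bullet}) = \nu_F(E \cup E').$$
Thus $\nu_F$ is additive. Also
$$|\nu_F E| \le \|\chi(E \cap F)^{\bullet}\|_1 = \mu(E \cap F)$$
for every $E \in \Sigma$, so $\nu_F$ is truly continuous in the sense of 232Ab. By the Radon-Nikodým theorem (232E), there is an integrable function $g_F$ such that $\int_E g_F = \nu_F E$ for every $E \in \Sigma$; we may take it that $g_F$ is measurable and has domain X (232He).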
(iv) It is worth noting that $|g_F| \le 1$ a.e. PPP If $G = \{x : g_F(x) > 1\}$, then
$$\int_G g_F = \nu_F G \le \mu(F \cap G) \le \mu G;$$
but this is possible only if $\mu G = 0$. Similarly, if $G' = \{x : g_F(x) < -1\}$, then
$$\int_{G'} g_F = \nu_F G' \ge -\mu G',$$
so again $\mu G' = 0$. QQQ
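(v) If $F, F' \in \Sigma^f$, then $g_F = g_{F'}$ almost everywhere in $F \cap F'$. PPP If $E \in \Sigma$ and $E \subseteq F \cap F'$, then
$$\int_E g_F = h(\chi(E \cap F)^{\bullet}) = h(\chi(E \cap F')^{\bullet}) = \int_E g_{F'}.$$
So 131Hb gives the result. QQQ 213N (applied to $\{g_F{\upharpoonright}F : F \in \Sigma^f\}$) now tells us that, because $\mu$ is localizable, there is a measurable function $g : X \to \mathbb{R}$ such that $g = g_F$ almost everywhere in F, for every $F \in \Sigma^f$.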
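(vi) For any $F \in \Sigma^f$, the set
$$\{x : x \in F,\ |g(x)| > 1\} \subseteq \{x : |g_F(x)| > 1\} \cup \{x : x \in F,\ g(x) \ne g_F(x)\}$$
is negligible; because $\mu$ is semi-finite, $\{x : |g(x)| > 1\}$ is negligible, and $g \in \mathcal{L}^{\infty}$, with $\operatorname{ess\,sup} |g| \le 1$. Accordingly $v = g^{\bullet} \in L^{\infty}$, and we may speak of $Tv \in (L^1)^*$.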
(vii) If $F \in \Sigma^f$, then
$$\int_F g = \int_F g_F = \nu_F X = h(\chi F^{\bullet}).$$
It follows at once that
$$(Tv)(f^{\bullet}) = \int f \times g = h(f^{\bullet})$$
for every simple function $f : X \to \mathbb{R}$. Consequently $Tv = h$, because both Tv and h are continuous and the equivalence classes of simple functions form a dense subset of $L^1$ (242Mb, 2A3Uc). Thus $h = Tv$ is a value of T.
(viii) The argument as written above has assumed that $\|h\| = 1$. But of course any non-zero member of $(L^1)^*$ is a scalar multiple of an element of norm 1, so is a value of T. So $T : L^{\infty} \to (L^1)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.
243H Recall that $L^0$ is always Dedekind $\sigma$-complete and sometimes Dedekind complete (241G), while $L^1$ is always Dedekind complete (242H). In this respect $L^{\infty}$ follows $L^0$.
Theorem Let $(X, \Sigma, \mu)$ be a measure space.
(a) $L^{\infty}(\mu)$ is Dedekind $\sigma$-complete.
(b) If $\mu$ is localizable, $L^{\infty}(\mu)$ is Dedekind complete.
proof These are both consequences of 241G. If $A \subseteq L^{\infty} = L^{\infty}(\mu)$ is bounded above in $L^{\infty}$, fix $u_0 \in A$ and an upper bound $w_0$ of A in $L^{\infty}$. If B is the set of upper bounds for A in $L^0 = L^0(\mu)$, then $B \cap L^{\infty}$ is the set of upper bounds for A in $L^{\infty}$. Moreover, if B has a least member $v_0$, then we must have $u_0 \le v_0 \le w_0$, so that
$$0 \le v_0 - u_0 \le w_0 - u_0 \in L^{\infty}$$
and $v_0 - u_0$, $v_0$ belong to $L^{\infty}$. (Compare part (a) of the proof of 242H.) Thus $v_0 = \sup A$ in $L^{\infty}$.
Now we know that $L^0$ is Dedekind $\sigma$-complete; if $A \subseteq L^{\infty}$ is a non-empty countable set which is bounded above in $L^{\infty}$, it is surely bounded above in $L^0$, so has a supremum in $L^0$ which is also its supremum in $L^{\infty}$. As A is arbitrary, $L^{\infty}$ is Dedekind $\sigma$-complete. While if $\mu$ is localizable, we can argue in the same way with arbitrary non-empty subsets of $L^{\infty}$ to see that $L^{\infty}$ is Dedekind complete because $L^0$ is.
243I A dense subspace of $L^\infty$ In 242M and 242O I described a couple of important dense linear subspaces of $L^1$ spaces. The position concerning $L^\infty$ is a little different. However I can describe one important dense subspace.
Proposition Let $(X, \Sigma, \mu)$ be a measure space.
(a) Write $\mathcal{S}$ for the space of '$\Sigma$-simple' functions on $X$, that is, the space of functions from $X$ to $\mathbb{R}$ expressible as $\sum_{k=0}^n a_k \chi E_k$ where $a_k \in \mathbb{R}$ and $E_k \in \Sigma$ for every $k \le n$. Then for every $f \in \mathcal{L}^\infty = \mathcal{L}^\infty(\mu)$ and every $\epsilon > 0$, there is a $g \in \mathcal{S}$ such that $\operatorname{ess\,sup}|f - g| \le \epsilon$.
(b) $S = \{f^\bullet : f \in \mathcal{S}\}$ is a $\|\ \|_\infty$-dense linear subspace of $L^\infty = L^\infty(\mu)$.
(c) If $(X, \Sigma, \mu)$ is totally finite, then $\mathcal{S}$ is the space of $\mu$-simple functions, so $S$ becomes just the space of equivalence classes of simple functions, as in 242Mb.
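proof (a) Let $\tilde f : X \to \mathbb{R}$ be a bounded measurable function such that $f =_{\text{a.e.}} \tilde f$. Let $n \in \mathbb{N}$ be such that $|\tilde f(x)| \le n\epsilon$ for every $x \in X$. For $-n \le k \le n$ set
\[ E_k = \{x : k\epsilon \le \tilde f(x) < (k+1)\epsilon\}. \]
Set
\[ g = \sum_{k=-n}^{n} k\epsilon\, \chi E_k \in \mathcal{S}; \]
then $0 \le \tilde f(x) - g(x) \le \epsilon$ for every $x \in X$, so
\[ \operatorname{ess\,sup}|f - g| = \operatorname{ess\,sup}|\tilde f - g| \le \epsilon. \]
(b) This follows immediately, as in 242Mb.
(c) also is elementary.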
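The construction in part (a) of the proof can be sketched numerically. This is only an illustration, assuming a bounded function given as a Python callable and sampled at finitely many points; the names `simple_approx` and `f_tilde` are hypothetical. The function `g` takes the value $k\epsilon$ on the set where $k\epsilon \le \tilde f(x) < (k+1)\epsilon$, exactly as in the proof.

```python
import math

def simple_approx(f, eps):
    # g(x) = k*eps on the level set E_k = {x : k*eps <= f(x) < (k+1)*eps}
    def g(x):
        return math.floor(f(x) / eps) * eps
    return g

f_tilde = math.sin            # a bounded measurable function, |f_tilde| <= 1
eps = 0.25                    # a power of two, so the arithmetic below is exact
g = simple_approx(f_tilde, eps)

points = [i / 100.0 for i in range(-300, 301)]
worst = max(f_tilde(x) - g(x) for x in points)
values = sorted({g(x) for x in points})
print(len(values), worst < eps)
```

Because `f_tilde` is bounded, `g` takes only finitely many values, which is what makes it a simple function.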
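243J Conditional expectations Conditional expectations are so important that it is worth considering their interaction with every new concept.
(a) If $(X, \Sigma, \mu)$ is any measure space, and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, then the canonical embedding $S : L^0(\mu\restriction\mathrm{T}) \to L^0(\mu)$ (242Ja) embeds $L^\infty(\mu\restriction\mathrm{T})$ as a subspace of $L^\infty(\mu)$, and $\|Su\|_\infty = \|u\|_\infty$ for every $u \in L^\infty(\mu\restriction\mathrm{T})$. As in 242Jb, we can identify $L^\infty(\mu\restriction\mathrm{T})$ with its image in $L^\infty(\mu)$.
(b) Now suppose that $\mu X = 1$, and let $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ be the conditional expectation operator (242Jd). Then $L^\infty(\mu)$ is actually a linear subspace of $L^1(\mu)$. Setting $e = \chi X^\bullet \in L^\infty(\mu)$, we see that $\int_F e = (\mu\restriction\mathrm{T})(F)$ for every $F \in \mathrm{T}$, so
\[ Pe = \chi X^\bullet \in L^\infty(\mu\restriction\mathrm{T}). \]
If $u \in L^\infty(\mu)$, then setting $M = \|u\|_\infty$ we have $-Me \le u \le Me$, so $-MPe \le Pu \le MPe$, because $P$ is order-preserving (242Je); accordingly $\|Pu\|_\infty \le M = \|u\|_\infty$. Thus $P\restriction L^\infty(\mu) : L^\infty(\mu) \to L^\infty(\mu\restriction\mathrm{T})$ is an operator of norm 1.
If $u \in L^\infty(\mu\restriction\mathrm{T})$, then $Pu = u$; so $P[L^\infty]$ is the whole of $L^\infty(\mu\restriction\mathrm{T})$.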
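243K Complex $L^\infty$ All the ideas needed to adapt the work above to complex $L^\infty$ spaces have already appeared in 241J and 242P. Let $\mathcal{L}^\infty_{\mathbb{C}}$ be
\[ \{f : f \in \mathcal{L}^0_{\mathbb{C}},\ \operatorname{ess\,sup}|f| < \infty\} = \{f : \operatorname{Re}(f) \in \mathcal{L}^\infty,\ \operatorname{Im}(f) \in \mathcal{L}^\infty\}. \]
Then
\[ L^\infty_{\mathbb{C}} = \{f^\bullet : f \in \mathcal{L}^\infty_{\mathbb{C}}\} = \{u : u \in L^0_{\mathbb{C}},\ \operatorname{Re}(u) \in L^\infty,\ \operatorname{Im}(u) \in L^\infty\}. \]
Setting
\[ \|u\|_\infty = \|\,|u|\,\|_\infty = \operatorname{ess\,sup}|f| \text{ whenever } f^\bullet = u, \]
we have a norm on $L^\infty_{\mathbb{C}}$ rendering it a Banach space. We still have $u \times v \in L^\infty_{\mathbb{C}}$ and $\|u \times v\|_\infty \le \|u\|_\infty \|v\|_\infty$ for all $u$, $v \in L^\infty_{\mathbb{C}}$.
We now have a duality between $L^1_{\mathbb{C}}$ and $L^\infty_{\mathbb{C}}$ giving rise to a linear operator $T : L^\infty_{\mathbb{C}} \to (L^1_{\mathbb{C}})^*$ of norm at most 1, defined by the formula
\[ (Tv)(u) = \int u \times v \text{ for every } u \in L^1,\ v \in L^\infty. \]
$T$ is injective iff the underlying measure space is semi-finite, and is a bijection iff the underlying measure space is localizable. (This can of course be proved by re-working the arguments of 243G; but it is perhaps easier to note that $T(\operatorname{Re}(v)) = \operatorname{Re}(Tv)$, $T(\operatorname{Im}(v)) = \operatorname{Im}(Tv)$ for every $v$, so that the result for complex spaces can be deduced from the result for real spaces.) To check that $T$ is norm-preserving when it is injective, the quickest route seems to be to imitate the argument of (a-ii) of the proof of 243G.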
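243X Basic exercises (a) Let $(X, \Sigma, \mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^\infty(\hat\mu) = \mathcal{L}^\infty(\mu)$ and $L^\infty(\hat\mu) = L^\infty(\mu)$.
>>>(b) Let $(X, \Sigma, \mu)$ be a non-empty measure space. Write $\mathcal{L}^\infty_{\text{strict}}$ for the space of bounded $\Sigma$-measurable real-valued functions with domain $X$. (i) Show that $L^\infty(\mu) = \{f^\bullet : f \in \mathcal{L}^\infty_{\text{strict}}\} \subseteq L^0 = L^0(\mu)$. (ii) Show that $\mathcal{L}^\infty_{\text{strict}}$ is a Dedekind $\sigma$-complete Banach lattice if we give it the norm
\[ \|f\|_\infty = \sup_{x \in X}|f(x)| \text{ for every } f \in \mathcal{L}^\infty_{\text{strict}}. \]
(iii) Show that for every $u \in L^\infty = L^\infty(\mu)$, $\|u\|_\infty = \min\{\|f\|_\infty : f \in \mathcal{L}^\infty_{\text{strict}},\ f^\bullet = u\}$.
>>>(c) Let $(X, \Sigma, \mu)$ be any measure space, and $A$ a subset of $L^\infty(\mu)$. Show that $A$ is bounded for the norm $\|\ \|_\infty$ iff it is bounded above and below for the ordering of $L^\infty$.
(d) Let $(X, \Sigma, \mu)$ be any measure space, and $A \subseteq L^\infty(\mu)$ a non-empty set with a least upper bound $w$ in $L^\infty(\mu)$. Show that $\|w\|_\infty \le \sup_{u \in A}\|u\|_\infty$.
(e) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i \in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum (214L). Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i \in I} L^0(\mu_i)$ (241Xd) induces an isomorphism between $L^\infty(\mu)$ and the subspace
\[ \{u : u \in \textstyle\prod_{i \in I} L^\infty(\mu_i),\ \|u\| = \sup_{i \in I}\|u(i)\|_\infty < \infty\} \]
of $\prod_{i \in I} L^\infty(\mu_i)$.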
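(f) Let $(X, \Sigma, \mu)$ be any measure space, and $u \in L^1(\mu)$. Show that there is a $v \in L^\infty(\mu)$ such that $\|v\|_\infty \le 1$ and $\int u \times v = \|u\|_1$.
(g) Let $(X, \Sigma, \mu)$ be a semi-finite measure space and $v \in L^\infty(\mu)$. Show that
\[ \|v\|_\infty = \sup\{\textstyle\int u \times v : u \in L^1,\ \|u\|_1 \le 1\} = \sup\{\|u \times v\|_1 : u \in L^1,\ \|u\|_1 \le 1\}. \]
(h) Give an example of a probability space $(X, \Sigma, \mu)$ and a $v \in L^\infty(\mu)$ such that $\|u \times v\|_1 < \|v\|_\infty$ whenever $u \in L^1(\mu)$ and $\|u\|_1 \le 1$.
(i) Write out proofs of 243G adapted to the special cases (i) $\mu X = 1$ (ii) $(X, \Sigma, \mu)$ is $\sigma$-finite.
(j) Let $(X, \Sigma, \mu)$ be any measure space. Show that $L^0(\mu)$ is Dedekind complete iff $L^\infty(\mu)$ is Dedekind complete.
(k) Let $(X, \Sigma, \mu)$ be a totally finite measure space and $\nu : \Sigma \to \mathbb{R}$ a functional. Show that the following are equiveridical: (i) there is a continuous linear functional $h : L^1(\mu) \to \mathbb{R}$ such that $h((\chi E)^\bullet) = \nu E$ for every $E \in \Sigma$ (ii) $\nu$ is additive and there is an $M \ge 0$ such that $|\nu E| \le M\mu E$ for every $E \in \Sigma$.
>>>(l) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^\infty(X)$ for $\mathcal{L}^\infty(\mu)$, and to identify it with $L^\infty(\mu)$. Write out statements and proofs of the results of this chapter adapted to this special case (if you like, with $X = \mathbb{N}$). In particular, write out a direct proof that $(\ell^1)^*$ can be identified with $\ell^\infty$. What happens when $X$ has just two members? or three?
(m) Show that if $(X, \Sigma, \mu)$ is any measure space and $u \in L^\infty_{\mathbb{C}}(\mu)$, then
\[ \|u\|_\infty = \sup\{\|\operatorname{Re}(\zeta u)\|_\infty : \zeta \in \mathbb{C},\ |\zeta| = 1\}. \]
(n) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\phi : X \to Y$ an inverse-measure-preserving function. Show that $g\phi \in \mathcal{L}^\infty(\mu)$ for every $g \in \mathcal{L}^\infty(\nu)$, and that the map $g \mapsto g\phi$ induces a linear operator $T : L^\infty(\nu) \to L^\infty(\mu)$ defined by setting $T(g^\bullet) = (g\phi)^\bullet$ for every $g \in \mathcal{L}^\infty(\nu)$. (Compare 241Xg.) Show that $\|Tv\|_\infty = \|v\|_\infty$ for every $v \in L^\infty(\nu)$.
(o) For $f$, $g \in C = C([0,1])$, the space of continuous real-valued functions on the unit interval $[0,1]$, say
\[ f \le g \text{ iff } f(x) \le g(x) \text{ for every } x \in [0,1], \]
\[ \|f\|_\infty = \sup_{x \in [0,1]}|f(x)|. \]
Show that $C$ is a Banach lattice, and that moreover
\[ \|f \vee g\|_\infty = \max(\|f\|_\infty, \|g\|_\infty) \text{ whenever } f, g \ge 0, \]
\[ \|f \times g\|_\infty \le \|f\|_\infty\|g\|_\infty \text{ for all } f, g \in C, \]
\[ \|f\|_\infty = \min\{\gamma : |f| \le \gamma\chi X\} \text{ for every } f \in C. \]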


243Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Yg) induces a canonical map from $L^\infty(\mu)$ onto $L^\infty(\mu_Y)$, which is norm-preserving iff it is injective iff $Y$ has full outer measure.
243 Notes and comments I mention the formula
\[ \|u \vee v\|_\infty = \max(\|u\|_\infty, \|v\|_\infty) \text{ for } u, v \ge 0 \]
(243Df) because while it does not characterize $L^\infty$ spaces among Banach lattices (see 243Xo), it is in a sense dual to the characteristic property
\[ \|u + v\|_1 = \|u\|_1 + \|v\|_1 \text{ for } u, v \ge 0 \]
of the norm of $L^1$. (I will return to this in Chapter 35 in the next volume.)
The particular set $\mathcal{L}^\infty$ I have chosen (243A) is somewhat arbitrary. The space $L^\infty$ can very well be described entirely as a subspace of $L^0$, without going back to functions at all; see 243Bc, 243Dc. Just as with $\mathcal{L}^0$ and $\mathcal{L}^1$, there are occasions when it would be simpler to work with the linear space of essentially bounded measurable functions from $X$ to $\mathbb{R}$; and we now have a third obvious candidate, the linear space $\mathcal{L}^\infty_{\text{strict}}$ of measurable functions from $X$ to $\mathbb{R}$ which are literally, rather than essentially, bounded, which is itself a Banach lattice (243Xb).
I suppose the most important theorem of this section is 243G, identifying $L^\infty$ with $(L^1)^*$. This identification is the chief reason for setting 'localizable' measure spaces apart. The proof of 243Gb is long because it depends on two separate ideas. The Radon-Nikodým theorem deals, in effect, with the totally finite case, and then in parts (b-v) and (b-vi) of the proof localizability is used to link the partial solutions $g_F$ together. Exercise 243Xi is supposed to help you to distinguish the two operations. The map $T' : L^1 \to (L^\infty)^*$ (243Fc) is also very interesting in its way, but I shall leave it for Chapter 36.
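243G gives another way of looking at conditional expectation operators. If $(X, \Sigma, \mu)$ is a probability space and $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, of course both $\mu$ and $\mu\restriction\mathrm{T}$ are localizable, so $L^\infty(\mu)$ can be identified with $(L^1(\mu))^*$ and $L^\infty(\mu\restriction\mathrm{T})$ can be identified with $(L^1(\mu\restriction\mathrm{T}))^*$. Now we have the canonical embedding $S : L^1(\mu\restriction\mathrm{T}) \to L^1(\mu)$ (242Jb) which is a norm-preserving linear operator, so gives rise to an adjoint operator $S' : L^1(\mu)^* \to L^1(\mu\restriction\mathrm{T})^*$ defined by the formula
\[ (S'h)(v) = h(Sv) \text{ for all } v \in L^1(\mu\restriction\mathrm{T}),\ h \in L^1(\mu)^*. \]
Writing $T_\mu : L^\infty(\mu) \to L^1(\mu)^*$ and $T_{\mu\restriction\mathrm{T}} : L^\infty(\mu\restriction\mathrm{T}) \to L^1(\mu\restriction\mathrm{T})^*$ for the canonical maps, we get a map $Q = T_{\mu\restriction\mathrm{T}}^{-1} S' T_\mu : L^\infty(\mu) \to L^\infty(\mu\restriction\mathrm{T})$, defined by saying that
\[ \int Qu \times v = (T_{\mu\restriction\mathrm{T}} Qu)(v) = (S' T_\mu u)(v) = (T_\mu u)(Sv) = \int Sv \times u = \int u \times v \]
whenever $v \in L^1(\mu\restriction\mathrm{T})$ and $u \in L^\infty(\mu)$. But this agrees with the formula of 242L: we have
\[ \int Qu \times v = \int u \times v = \int P(u \times v) = \int Pu \times v. \]
Because $v$ is arbitrary, we must have $Qu = Pu$ for every $u \in L^\infty(\mu)$. Thus a conditional expectation operator is, in a sense, the adjoint of the appropriate embedding operator.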
The discussion in the last paragraph applies, of course, only to the restriction $P\restriction L^\infty(\mu)$ of the conditional expectation operator to the $L^\infty$ space. Because $\mu$ is totally finite, $L^\infty(\mu)$ is a subspace of $L^1(\mu)$, and the real qualities of the operator $P$ are related to its behaviour on the whole space $L^1$. $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ can also be expressed as an adjoint operator, but the expression needs more of the theory of Riesz spaces than I have space for here. I will return to this topic in Chapter 36.
244 $L^p$
Continuing with our tour of the classical Banach spaces, we come to the $L^p$ spaces for $1 < p < \infty$. The case $p = 2$ is more important than all the others put together, and it would be reasonable, perhaps even advisable, to read this section first with this case alone in mind. But the other spaces provide instructive examples and remain a basic part of the education of any functional analyst.
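244A Definitions Let $(X, \Sigma, \mu)$ be any measure space, and $p \in {]1, \infty[}$. Write $\mathcal{L}^p = \mathcal{L}^p(\mu)$ for the set of functions $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ such that $|f|^p$ is integrable, and $L^p = L^p(\mu)$ for $\{f^\bullet : f \in \mathcal{L}^p\} \subseteq L^0 = L^0(\mu)$.
Note that if $f \in \mathcal{L}^p$, $g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then $|f|^p =_{\text{a.e.}} |g|^p$ so $|g|^p$ is integrable and $g \in \mathcal{L}^p$; thus $\mathcal{L}^p = \{f : f \in \mathcal{L}^0,\ f^\bullet \in L^p\}$.
Alternatively, we can define $u^p$ whenever $u \in L^0$, $u \ge 0$ by writing $(f^\bullet)^p = (f^p)^\bullet$ for every $f \in \mathcal{L}^0$ such that $f(x) \ge 0$ for every $x \in \operatorname{dom} f$ (compare 241I), and say that $L^p = \{u : u \in L^0,\ |u|^p \in L^1(\mu)\}$.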
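244B Theorem Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$.
(a) $L^p = L^p(\mu)$ is a linear subspace of $L^0 = L^0(\mu)$.
(b) If $u \in L^p$, $v \in L^0$ and $|v| \le |u|$, then $v \in L^p$. Consequently $|u|$, $u \vee v$ and $u \wedge v$ belong to $L^p$ for all $u$, $v \in L^p$.
proof The cases $p = 1$, $p = \infty$ are covered by 242B, 242C and 243B; so I suppose that $1 < p < \infty$.
(a)(i) Suppose that $f$, $g \in \mathcal{L}^p = \mathcal{L}^p(\mu)$. If $a$, $b \in \mathbb{R}$ then $|a + b|^p \le 2^p \max(|a|^p, |b|^p)$, so $|f + g|^p \le_{\text{a.e.}} 2^p(|f|^p \vee |g|^p)$; now $|f + g|^p \in \mathcal{L}^0$ and $2^p(|f|^p \vee |g|^p) \in \mathcal{L}^1$ so $|f + g|^p \in \mathcal{L}^1$. Thus $f + g \in \mathcal{L}^p$ for all $f$, $g \in \mathcal{L}^p$; it follows at once that $u + v \in L^p$ for all $u$, $v \in L^p$.
(ii) If $f \in \mathcal{L}^p$ and $c \in \mathbb{R}$ then $|cf|^p = |c|^p|f|^p \in \mathcal{L}^1$, so $cf \in \mathcal{L}^p$. Accordingly $cu \in L^p$ whenever $u \in L^p$ and $c \in \mathbb{R}$.
(b)(i) Express $u$ as $f^\bullet$ and $v$ as $g^\bullet$, where $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^0$. Then $|g| \le_{\text{a.e.}} |f|$, so $|g|^p \le_{\text{a.e.}} |f|^p$ and $|g|^p$ is integrable; accordingly $g \in \mathcal{L}^p$ and $v \in L^p$.
(ii) Now $|\,|u|\,| = |u|$ so $|u| \in L^p$ whenever $u \in L^p$. Finally $u \vee v = \frac12(u + v + |u - v|)$ and $u \wedge v = \frac12(u + v - |u - v|)$ belong to $L^p$ for all $u$, $v \in L^p$.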
244C The order structure of $L^p$ Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Then 244B is enough to ensure that the partial order inherited from $L^0(\mu)$ makes $L^p(\mu)$ a Riesz space (compare 242C, 243C).
244D The norm of $L^p$ Let $(X, \Sigma, \mu)$ be a measure space, and $p \in {]1, \infty[}$.
(a) For $f \in \mathcal{L}^p = \mathcal{L}^p(\mu)$, set $\|f\|_p = (\int |f|^p)^{1/p}$. If $f$, $g \in \mathcal{L}^p$ and $f =_{\text{a.e.}} g$ then $|f|^p =_{\text{a.e.}} |g|^p$ so $\|f\|_p = \|g\|_p$. Accordingly we may define $\|\ \|_p : L^p = L^p(\mu) \to [0, \infty[$ by writing $\|f^\bullet\|_p = \|f\|_p$ for every $f \in \mathcal{L}^p$.
Alternatively, we can say just that $\|u\|_p = (\int |u|^p)^{1/p}$ for every $u \in L^p = L^p(\mu)$.
(b) The notation $\|\ \|_p$ carries a promise that it is a norm on $L^p$; this is indeed so, but I hold the proof over to 244F below. For the moment, however, let us note just that $\|cu\|_p = |c|\|u\|_p$ for all $u \in L^p$ and $c \in \mathbb{R}$, and that if $\|u\|_p = 0$ then $\int |u|^p = 0$ so $|u|^p = 0$ and $u = 0$.
(c) If $|u| \le |v|$ in $L^p$ then $\|u\|_p \le \|v\|_p$; this is because $|u|^p \le |v|^p$.
244E I now work through the lemmas required to show that $\|\ \|_p$ is a norm on $L^p$ and, eventually, that the normed space dual of $L^p$ may be identified with a suitable $L^q$.
Lemma Suppose $(X, \Sigma, \mu)$ is a measure space, and that $p$, $q \in {]1, \infty[}$ are such that $\frac1p + \frac1q = 1$.
(a) $ab \le \frac1p a^p + \frac1q b^q$ for all real $a$, $b \ge 0$.
(b)(i) $f \times g$ is integrable and
\[ \Bigl|\int f \times g\Bigr| \le \int |f \times g| \le \|f\|_p\|g\|_q \]
for all $f \in \mathcal{L}^p = \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q = \mathcal{L}^q(\mu)$;
(ii) $u \times v \in L^1 = L^1(\mu)$ and
\[ \Bigl|\int u \times v\Bigr| \le \|u \times v\|_1 \le \|u\|_p\|v\|_q \]
for all $u \in L^p = L^p(\mu)$, $v \in L^q = L^q(\mu)$.
proof (a) If either $a$ or $b$ is 0, this is trivial. If both are non-zero, we may argue as follows. The function $x \mapsto x^{1/p} : [0, \infty[ \to \mathbb{R}$ is concave, with second derivative strictly less than 0, so lies entirely below any of its tangents; in particular, below its tangent at the point $(1, 1)$, which has equation $y = 1 + \frac1p(x - 1)$. Thus we have
\[ x^{1/p} \le \frac1p x + 1 - \frac1p = \frac1p x + \frac1q \]
for every $x \in [0, \infty[$. So if $c$, $d > 0$, then
\[ \Bigl(\frac cd\Bigr)^{1/p} \le \frac1p \frac cd + \frac1q; \]
multiplying both sides by $d$,
\[ c^{1/p} d^{1/q} \le \frac1p c + \frac1q d; \]
setting $c = a^p$ and $d = b^q$, we get
\[ ab \le \frac1p a^p + \frac1q b^q, \]
as claimed.
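(b)(i)($\alpha$) Suppose first that $\|f\|_p = \|g\|_q = 1$. For every $x \in \operatorname{dom} f \cap \operatorname{dom} g$ we have
\[ |f(x)g(x)| \le \frac1p |f(x)|^p + \frac1q |g(x)|^q \]
by (a). So
\[ |f \times g| \le_{\text{a.e.}} \frac1p |f|^p + \frac1q |g|^q \in \mathcal{L}^1(\mu) \]
and $f \times g$ is integrable; also
\[ \int |f \times g| \le \frac1p \int |f|^p + \frac1q \int |g|^q = \frac1p \|f\|_p^p + \frac1q \|g\|_q^q = \frac1p + \frac1q = 1. \]
($\beta$) If $\|f\|_p = 0$, then $\int |f|^p = 0$ so $|f|^p =_{\text{a.e.}} 0$, $f =_{\text{a.e.}} 0$, $f \times g =_{\text{a.e.}} 0$ and
\[ \int |f \times g| = 0 = \|f\|_p\|g\|_q. \]
Similarly, if $\|g\|_q = 0$, then $g =_{\text{a.e.}} 0$ and again
\[ \int |f \times g| = 0 = \|f\|_p\|g\|_q. \]
($\gamma$) Finally, for general $f \in \mathcal{L}^p$, $g \in \mathcal{L}^q$ such that $c = \|f\|_p$ and $d = \|g\|_q$ are both non-zero, we have $\|\frac1c f\|_p = \|\frac1d g\|_q = 1$ so
\[ f \times g = cd\Bigl(\frac1c f \times \frac1d g\Bigr) \]
is integrable, and
\[ \int |f \times g| = cd \int \Bigl|\frac1c f \times \frac1d g\Bigr| \le cd, \]
as required.
(ii) Now if $u \in L^p$, $v \in L^q$ take $f \in \mathcal{L}^p$, $g \in \mathcal{L}^q$ such that $u = f^\bullet$ and $v = g^\bullet$; $f \times g$ is integrable, so $u \times v \in L^1$, and
\[ \Bigl|\int u \times v\Bigr| \le \|u \times v\|_1 = \int |f \times g| \le \|f\|_p\|g\|_q = \|u\|_p\|v\|_q. \]
Remark Part (b) is 'Hölder's inequality'. In the case $p = q = 2$ it is 'Cauchy's inequality'.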
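Both parts of the lemma are easy to test numerically. The following is a rough sketch, assuming a measure on a finite set given by non-negative point masses `weights`; the function names (`young_gap`, `lp_norm`, `holder_gap`) are hypothetical. Each "gap" is the right-hand side minus the left-hand side of the corresponding inequality, so it should never be negative beyond rounding error.

```python
def young_gap(a, b, p):
    # (1/p) a^p + (1/q) b^q - ab, for a, b >= 0 and 1/p + 1/q = 1
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b

def lp_norm(f, weights, p):
    # ||f||_p = (integral of |f|^p) ^ (1/p) for a finite weighted measure
    return sum(w * abs(x) ** p for x, w in zip(f, weights)) ** (1.0 / p)

def holder_gap(f, g, weights, p):
    # ||f||_p ||g||_q - integral of |f x g|
    q = p / (p - 1.0)
    lhs = sum(w * abs(x * y) for x, y, w in zip(f, g, weights))
    return lp_norm(f, weights, p) * lp_norm(g, weights, q) - lhs

f = [1.0, -2.0, 0.5]
g = [3.0, 1.0, -4.0]
weights = [0.5, 1.0, 2.0]
print(young_gap(2.0, 2.0, 2.0), holder_gap(f, g, weights, 3.0) >= 0.0)
```

The first printed value is zero because $a^p = b^q$ there, which is the equality case of part (a).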
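244F Proposition Let $(X, \Sigma, \mu)$ be a measure space and $p \in {]1, \infty[}$. Set $q = p/(p - 1)$, so that $\frac1p + \frac1q = 1$.
(a) For every $u \in L^p = L^p(\mu)$, $\|u\|_p = \max\{\int u \times v : v \in L^q(\mu),\ \|v\|_q \le 1\}$.
(b) $\|\ \|_p$ is a norm on $L^p$.
proof (a) For $u \in L^p$, set
\[ \tau(u) = \sup\Bigl\{\int u \times v : v \in L^q(\mu),\ \|v\|_q \le 1\Bigr\}. \]
By 244E(b-ii), $\|u\|_p \ge \tau(u)$. If $\|u\|_p = 0$ then surely
\[ 0 = \|u\|_p = \tau(u) = \max\Bigl\{\int u \times v : v \in L^q(\mu),\ \|v\|_q \le 1\Bigr\}. \]
If $\|u\|_p = c > 0$, consider
\[ v = c^{-p/q} \operatorname{sgn} u \times |u|^{p/q}, \]
where for $a \in \mathbb{R}$ I write $\operatorname{sgn} a = |a|/a$ if $a \ne 0$, $0$ if $a = 0$, so that $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ is a Borel measurable function; for $f \in \mathcal{L}^0$ I write $(\operatorname{sgn} f)(x) = \operatorname{sgn}(f(x))$ for $x \in \operatorname{dom} f$, so that $\operatorname{sgn} f \in \mathcal{L}^0$; and for $f \in \mathcal{L}^0$ I write $\operatorname{sgn}(f^\bullet) = (\operatorname{sgn} f)^\bullet$ to define a function $\operatorname{sgn} : L^0 \to L^0$ (cf. 241I). Then $v \in L^q = L^q(\mu)$ and
\[ \|v\|_q = \Bigl(\int |v|^q\Bigr)^{1/q} = c^{-p/q}\Bigl(\int |u|^p\Bigr)^{1/q} = c^{-p/q} c^{p/q} = 1. \]
So
\[ \tau(u) \ge \int u \times v = c^{-p/q} \int \operatorname{sgn} u \times |u| \times \operatorname{sgn} u \times |u|^{p/q} = c^{-p/q} \int |u|^{1 + \frac pq} = c^{-p/q} \int |u|^p = c^{p - \frac pq} = c, \]
recalling that $1 + \frac pq = p$, $p - \frac pq = 1$. Thus $\tau(u) \ge \|u\|_p$ and
\[ \tau(u) = \|u\|_p = \int u \times v. \]
(b) In view of the remarks in 244Db, I have only to check that $\|u + v\|_p \le \|u\|_p + \|v\|_p$ for all $u$, $v \in L^p$. But given $u$ and $v$, let $w \in L^q$ be such that $\|w\|_q = 1$ and $\int (u + v) \times w = \|u + v\|_p$. Then
\[ \|u + v\|_p = \int (u + v) \times w = \int u \times w + \int v \times w \le \|u\|_p + \|v\|_p, \]
as required.
Remark The triangle inequality $\|u + v\|_p \le \|u\|_p + \|v\|_p$ is 'Minkowski's inequality'.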
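The extremal element used in part (a) of the proof can be computed explicitly in a finite setting. This is an illustrative sketch only, assuming a measure on a finite set given by point masses `weights` and a non-zero `u`; the names `lp_norm`, `sgn`, `extremal_v` and `pairing` are hypothetical.

```python
def lp_norm(f, weights, p):
    return sum(w * abs(x) ** p for x, w in zip(f, weights)) ** (1.0 / p)

def sgn(a):
    # sgn a = |a|/a for a != 0, and 0 for a = 0
    return (a > 0) - (a < 0)

def extremal_v(u, weights, p):
    # v = c^(-p/q) sgn(u) |u|^(p/q), where c = ||u||_p is assumed non-zero
    q = p / (p - 1.0)
    c = lp_norm(u, weights, p)
    return [c ** (-p / q) * sgn(x) * abs(x) ** (p / q) for x in u]

def pairing(u, v, weights):
    # integral of u x v
    return sum(w * x * y for x, y, w in zip(u, v, weights))

p = 3.0
q = p / (p - 1.0)
u = [1.0, -2.0, 0.0, 3.0]
weights = [0.5, 1.0, 2.0, 0.25]
v = extremal_v(u, weights, p)
print(lp_norm(v, weights, q), pairing(u, v, weights), lp_norm(u, weights, p))
```

Up to rounding, the first printed number should be 1 and the last two should agree, matching $\|v\|_q = 1$ and $\int u \times v = \|u\|_p$.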
244G Theorem Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Then $L^p = L^p(\mu)$ is a Banach lattice under its norm $\|\ \|_p$.
proof The cases $p = 1$, $p = \infty$ are covered by 242F and 243E, so let us suppose that $1 < p < \infty$. We know already that $\|u\|_p \le \|v\|_p$ whenever $|u| \le |v|$, so it remains only to show that $L^p$ is complete.
Let $\langle u_n\rangle_{n \in \mathbb{N}}$ be a sequence in $L^p$ such that $\|u_{n+1} - u_n\|_p \le 4^{-n}$ for every $n \in \mathbb{N}$. Note that
\[ \|u_n\|_p \le \|u_0\|_p + \sum_{k=0}^{n-1} \|u_{k+1} - u_k\|_p \le \|u_0\|_p + \sum_{k=0}^{\infty} 4^{-k} \le \|u_0\|_p + 2 \]
for every $n$. For each $n \in \mathbb{N}$, choose $f_n \in \mathcal{L}^p$ such that $f_0^\bullet = u_0$, $f_n^\bullet = u_n - u_{n-1}$ for $n \ge 1$; do this in such a way that $\operatorname{dom} f_n = X$ and $f_n$ is $\Sigma$-measurable (241Bk). Then $\|f_n\|_p \le 4^{-n+1}$ for $n \ge 1$.
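For $m$, $n \in \mathbb{N}$, set
\[ E_{mn} = \{x : |f_m(x)| \ge 2^{-n}\} \in \Sigma. \]
Then $|f_m(x)|^p \ge 2^{-np}$ for $x \in E_{mn}$, so
\[ 2^{-np} \mu E_{mn} \le \int |f_m|^p < \infty \]
and $\mu E_{mn} < \infty$. So $\chi E_{mn} \in \mathcal{L}^q = \mathcal{L}^q(\mu)$ and
\[ \int_{E_{mn}} |f_k| = \int |f_k| \times \chi E_{mn} \le \|f_k\|_p \|\chi E_{mn}\|_q \]
for each $k$, by 244E(b-i). Accordingly
\[ \sum_{k=0}^{\infty} \int_{E_{mn}} |f_k| \le \|\chi E_{mn}\|_q \sum_{k=0}^{\infty} \|f_k\|_p < \infty, \]
and $\sum_{k=0}^{\infty} f_k(x)$ exists for almost every $x \in E_{mn}$, by 242E. This is true for all $m$, $n \in \mathbb{N}$. But if $x \in X \setminus \bigcup_{m,n \in \mathbb{N}} E_{mn}$, $f_n(x) = 0$ for every $n$, so $\sum_{k=0}^{\infty} f_k(x)$ certainly exists. Thus $g(x) = \sum_{k=0}^{\infty} f_k(x)$ is defined in $\mathbb{R}$ for almost every $x \in X$.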
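Set $g_n = \sum_{k=0}^{n} f_k$; then $g_n^\bullet = u_n \in L^p$ for each $n$, and $g(x) = \lim_{n\to\infty} g_n(x)$ is defined for almost every $x$. Now consider $|g|^p =_{\text{a.e.}} \lim_{n\to\infty} |g_n|^p$. We know that
\[ \liminf_{n\to\infty} \int |g_n|^p = \liminf_{n\to\infty} \|u_n\|_p^p \le (2 + \|u_0\|_p)^p < \infty, \]
so by Fatou's Lemma
\[ \int |g|^p \le \liminf_{k\to\infty} \int |g_k|^p < \infty. \]
Thus $u = g^\bullet \in L^p$. Moreover, for any $m \in \mathbb{N}$, the triangle inequality gives $\|u_n - u_m\|_p \le \sum_{k=m}^{n-1} 4^{-k}$ for $n > m$, so
\[ \int |g - g_m|^p \le \liminf_{n\to\infty} \int |g_n - g_m|^p = \liminf_{n\to\infty} \|u_n - u_m\|_p^p \le \liminf_{n\to\infty} \Bigl(\sum_{k=m}^{n-1} 4^{-k}\Bigr)^p = \Bigl(\sum_{k=m}^{\infty} 4^{-k}\Bigr)^p = \bigl(4^{-m+1}/3\bigr)^p. \]
So
\[ \|u - u_m\|_p = \Bigl(\int |g - g_m|^p\Bigr)^{1/p} \le 4^{-m+1}/3 \to 0 \]
as $m \to \infty$. Thus $u = \lim_{m\to\infty} u_m$ in $L^p$. As $\langle u_n\rangle_{n \in \mathbb{N}}$ is arbitrary, $L^p$ is complete.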
244H Following 242M and 242O, I note that $L^p$ behaves like $L^1$ in respect of certain dense subspaces.
Proposition (a) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Then the space $S$ of equivalence classes of $\mu$-simple functions is a dense linear subspace of $L^p = L^p(\mu)$.
(b) Let $X$ be any subset of $\mathbb{R}^r$, where $r \ge 1$, and let $\mu$ be the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the set of (bounded) continuous functions $g : \mathbb{R}^r \to \mathbb{R}$ such that $\{x : g(x) \ne 0\}$ is bounded, and $S_0$ for the space of linear combinations of functions of the form $\chi I$, where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Then $\{(g\restriction X)^\bullet : g \in C_k\}$ and $\{(h\restriction X)^\bullet : h \in S_0\}$ are dense in $L^p(\mu)$.
proof (a) I repeat the argument of 242M with a tiny modification.
(i) Suppose that $u \in L^p(\mu)$, $u \ge 0$ and $\epsilon > 0$. Express $u$ as $f^\bullet$ where $f : X \to [0, \infty[$ is a measurable function. Let $g : X \to \mathbb{R}$ be a simple function such that $0 \le g \le f^p$ and $\int g \ge \int f^p - \epsilon^p$. Set $h = g^{1/p}$. Then $h$ is a simple function and $h \le f$. Because $p > 1$, $(f - h)^p + h^p \le f^p$ and
\[ \int (f - h)^p \le \int f^p - g \le \epsilon^p, \]
so
\[ \|u - h^\bullet\|_p = \Bigl(\int |f - h|^p\Bigr)^{1/p} \le \epsilon, \]
while $h^\bullet \in S$.
(ii) For general $u \in L^p$, $\epsilon > 0$, $u$ can be expressed as $u^+ - u^-$ where $u^+ = u \vee 0$, $u^- = (-u) \vee 0$ belong to $L^p$ and are non-negative. By (i), we can find $v_1$, $v_2 \in S$ such that $\|u^+ - v_1\|_p \le \frac12\epsilon$ and $\|u^- - v_2\|_p \le \frac12\epsilon$, so that $v = v_1 - v_2 \in S$ and $\|u - v\|_p \le \epsilon$. As $u$ and $\epsilon$ are arbitrary, $S$ is dense.
(b) Again, all the ideas are to be found in 242O; the changes needed are in the formulae, not in the method. They will go more easily if I note at the outset that whenever $f_1$, $f_2 \in \mathcal{L}^p(\mu)$ and $\int |f_1|^p \le \epsilon^p$, $\int |f_2|^p \le \delta^p$ (where $\epsilon$, $\delta \ge 0$), then $\int |f_1 + f_2|^p \le (\epsilon + \delta)^p$; this is just the triangle inequality for $\|\ \|_p$ (244Fb). Also I will regularly express the target relationships in the form '$\int_X |f - g|^p \le \epsilon^p$', '$\int_X |f - h|^p \le \epsilon^p$'. Now let me run through the argument of 242Oa, rather more briskly than before.
(i) Suppose first that $f = \chi I\restriction X$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. As before, we can set $h = \chi I$ and get $\int_X |f - h|^p = 0$. This time, use the same construction to find an interval $J$ and a function $g \in C_k$ such that $\chi I \le g \le \chi J$ and $\mu_r(J \setminus I) \le \epsilon^p$; this will ensure that $\int_X |f - g|^p \le \epsilon^p$.
(ii) Now suppose that $f = \chi(X \cap E)$ where $E \subseteq \mathbb{R}^r$ is a set of finite measure. Then, for the same reasons as before, there is a disjoint family $I_0, \ldots, I_n$ of half-open intervals such that $\mu_r(E \triangle \bigcup_{j \le n} I_j) \le (\frac12\epsilon)^p$. Accordingly $h = \sum_{j=0}^{n} \chi I_j \in S_0$ and $\int_X |f - h|^p \le (\frac12\epsilon)^p$. And (i) tells us that there is for each $j \le n$ a $g_j \in C_k$ such that $\int_X |g_j - \chi I_j|^p \le (\epsilon/2(n + 1))^p$, so that $g = \sum_{j=0}^{n} g_j \in C_k$ and $\int_X |f - g|^p \le \epsilon^p$.
(iii) The move to simple functions, and thence to arbitrary members of $\mathcal{L}^p(\mu)$, is just as before, but using $\|f\|_p$ in place of $\int_X |f|$. Finally, the translation from $\mathcal{L}^p$ to $L^p$ is again direct, remembering, as before, to check that $g\restriction X$, $h\restriction X$ belong to $\mathcal{L}^p(\mu)$ whenever $g \in C_k$ and $h \in S_0$.
*244I Corollary In the context of 244Hb, $L^p(\mu)$ is separable.
proof Let $A$ be the set
\[ \Bigl\{\Bigl(\sum_{j=0}^{n} q_j \chi([a_j, b_j[ \cap X)\Bigr)^\bullet : n \in \mathbb{N},\ q_0, \ldots, q_n \in \mathbb{Q},\ a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{Q}^r\Bigr\}. \]
Then $A$ is a countable subset of $L^p(\mu)$, and its closure must contain $(\sum_{j=0}^{n} c_j \chi([a_j, b_j[ \cap X))^\bullet$ whenever $c_0, \ldots, c_n \in \mathbb{R}$ and $a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{R}^r$; that is, $\overline{A}$ is a closed set including $\{(h\restriction X)^\bullet : h \in S_0\}$, and is the whole of $L^p(\mu)$, by 244Hb.
244J Duality in $L^p$ spaces Let $(X, \Sigma, \mu)$ be any measure space, and $p \in {]1, \infty[}$. Set $q = p/(p - 1)$; note that $\frac1p + \frac1q = 1$ and that $p = q/(q - 1)$; the relation between $p$ and $q$ is symmetric. Now $u \times v \in L^1(\mu)$ and $\|u \times v\|_1 \le \|u\|_p\|v\|_q$ whenever $u \in L^p = L^p(\mu)$ and $v \in L^q = L^q(\mu)$ (244E). Consequently we have a bounded linear operator $T$ from $L^q$ to the normed space dual $(L^p)^*$ of $L^p$, given by writing
\[ (Tv)(u) = \int u \times v \]
for all $u \in L^p$ and $v \in L^q$, exactly as in 243F.
244K Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $p \in {]1, \infty[}$; set $q = p/(p - 1)$. Then the canonical map $T : L^q(\mu) \to L^p(\mu)^*$, described in 244J, is a normed space isomorphism.
Remark I should perhaps remind anyone who is reading this section to learn about $L^2$ that the basic theory of Hilbert spaces yields this theorem in the case $p = q = 2$ without any need for the more generally applicable argument given below (see 244N, 244Yk).
proof We know that $T$ is a bounded linear operator of norm at most 1; I need to show (i) that $T$ is actually an isometry (that is, that $\|Tv\| = \|v\|_q$ for every $v \in L^q$), which will show incidentally that $T$ is injective (ii) that $T$ is surjective, which is the really substantial part of the theorem.
(a) If $v \in L^q$, then by 244Fa (recalling that $p = q/(q - 1)$) there is a $u \in L^p$ such that $\|u\|_p \le 1$ and $\int u \times v = \|v\|_q$; thus $\|Tv\| \ge (Tv)(u) = \|v\|_q$. As we know already that $\|Tv\| \le \|v\|_q$, we have $\|Tv\| = \|v\|_q$ for every $v$, and $T$ is an isometry.
(b) The rest of the proof, therefore, will be devoted to showing that $T : L^q \to (L^p)^*$ is surjective. Fix $h \in (L^p)^*$ with $\|h\| = 1$.
I need to show that $h$ 'lives on' a countable union of sets of finite measure in $X$, in the following sense: there is a non-decreasing sequence $\langle E_n\rangle_{n \in \mathbb{N}}$ of sets of finite measure such that $h(f^\bullet) = 0$ whenever $f \in \mathcal{L}^p$ and $f(x) = 0$ for $x \in \bigcup_{n \in \mathbb{N}} E_n$. PPP Choose a sequence $\langle u_n\rangle_{n \in \mathbb{N}}$ in $L^p$ such that $\|u_n\|_p \le 1$ for every $n$ and $\lim_{n\to\infty} h(u_n) = \|h\| = 1$. For each $n$, express $u_n$ as $f_n^\bullet$, where $f_n : X \to \mathbb{R}$ is a measurable function. Set
\[ E_n = \Bigl\{x : \sum_{k=0}^{n} |f_k(x)|^p \ge 2^{-n}\Bigr\} \]
for $n \in \mathbb{N}$; because $|f_k|^p$ is measurable and integrable and has domain $X$ for every $k$, $E_n \in \Sigma$ and $\mu E_n < \infty$ for each $n$.
Now suppose that $f \in \mathcal{L}^p(\mu)$ and that $f(x) = 0$ for $x \in \bigcup_{n \in \mathbb{N}} E_n$; set $u = f^\bullet \in L^p$. ??? Suppose, if possible, that $h(u) \ne 0$, and consider $h(cu)$, where
\[ \operatorname{sgn} c = \operatorname{sgn} h(u), \quad 0 < |c| < \bigl(p\,|h(u)|\,\|u\|_p^{-p}\bigr)^{1/(p-1)}. \]
(Of course $\|u\|_p \ne 0$ if $h(u) \ne 0$.) For each $n$, we have
\[ \{x : f_n(x) \ne 0\} \subseteq \bigcup_{m \in \mathbb{N}} E_m \subseteq \{x : f(x) = 0\}, \]
so $|f_n + cf|^p = |f_n|^p + |cf|^p$ and
\[ h(u_n + cu) \le \|u_n + cu\|_p = \bigl(\|u_n\|_p^p + \|cu\|_p^p\bigr)^{1/p} \le \bigl(1 + |c|^p\|u\|_p^p\bigr)^{1/p}. \]
Letting $n \to \infty$,
\[ 1 + c\,h(u) \le \bigl(1 + |c|^p\|u\|_p^p\bigr)^{1/p}. \]
Because $\operatorname{sgn} c = \operatorname{sgn} h(u)$, $c\,h(u) = |c||h(u)|$ and we have
\[ 1 + p|c||h(u)| \le (1 + c\,h(u))^p \le 1 + |c|^p\|u\|_p^p, \]
so that
\[ p|h(u)| \le |c|^{p-1}\|u\|_p^p < p|h(u)| \]
by the choice of $c$; which is impossible. XXX
This means that $h(f^\bullet) = 0$ whenever $f : X \to \mathbb{R}$ is measurable, belongs to $\mathcal{L}^p$, and is zero on $\bigcup_{n \in \mathbb{N}} E_n$. QQQ
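(c) Set $H_n = E_n \setminus \bigcup_{k<n} E_k$ for each $n \in \mathbb{N}$; then $\langle H_n\rangle_{n \in \mathbb{N}}$ is a disjoint sequence of sets of finite measure. Now $h(u) = \sum_{n=0}^{\infty} h(u \times (\chi H_n)^\bullet)$ for every $u \in L^p$. PPP Express $u$ as $f^\bullet$, where $f : X \to \mathbb{R}$ is measurable. Set $f_n = f \times \chi H_n$ for each $n$, $g = f \times \chi(X \setminus \bigcup_{n \in \mathbb{N}} H_n)$; then $h(g^\bullet) = 0$, by (b), because $\bigcup_{n \in \mathbb{N}} H_n = \bigcup_{n \in \mathbb{N}} E_n$. Consider
\[ g_n = g + \sum_{k=0}^{n} f_k \in \mathcal{L}^p \]
for each $n$. Then $\lim_{n\to\infty} f - g_n = 0$, and
\[ |f - g_n|^p \le |f|^p \in \mathcal{L}^1 \]
for every $n$, so by either Fatou's Lemma or Lebesgue's Dominated Convergence Theorem
\[ \lim_{n\to\infty} \int |f - g_n|^p = 0, \]
and
\[ \lim_{n\to\infty} \Bigl\|u - g^\bullet - \sum_{k=0}^{n} u \times (\chi H_k)^\bullet\Bigr\|_p = \lim_{n\to\infty} \|u - g_n^\bullet\|_p = \lim_{n\to\infty} \Bigl(\int |f - g_n|^p\Bigr)^{1/p} = 0, \]
that is,
\[ u = g^\bullet + \sum_{k=0}^{\infty} u \times \chi H_k^\bullet \]
in $L^p$. Because $h : L^p \to \mathbb{R}$ is linear and continuous, it follows that
\[ h(u) = h(g^\bullet) + \sum_{k=0}^{\infty} h(u \times \chi H_k^\bullet) = \sum_{k=0}^{\infty} h(u \times \chi H_k^\bullet), \]
as claimed. QQQ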
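(d) For each $n \in \mathbb{N}$, define $\nu_n : \Sigma \to \mathbb{R}$ by setting
\[ \nu_n F = h(\chi(F \cap H_n)^\bullet) \]
for every $F \in \Sigma$. (Note that $\nu_n F$ is always defined because $\mu(F \cap H_n) < \infty$, so that
\[ \|\chi(F \cap H_n)\|_p = \mu(F \cap H_n)^{1/p} < \infty.) \]
Then $\nu_n \emptyset = h(0) = 0$, and if $\langle F_k\rangle_{k \in \mathbb{N}}$ is a disjoint sequence in $\Sigma$,
\[ \Bigl\|\chi\Bigl(\bigcup_{k \in \mathbb{N}} H_n \cap F_k\Bigr) - \sum_{k=0}^{m} \chi(H_n \cap F_k)\Bigr\|_p = \mu\Bigl(H_n \cap \bigcup_{k=m+1}^{\infty} F_k\Bigr)^{1/p} \to 0 \]
as $m \to \infty$, so
\[ \nu_n\Bigl(\bigcup_{k \in \mathbb{N}} F_k\Bigr) = \sum_{k=0}^{\infty} \nu_n F_k. \]
So $\nu_n$ is countably additive. Further, $|\nu_n F| \le \mu(H_n \cap F)^{1/p}$ for every $F \in \Sigma$, so $\nu_n$ is truly continuous in the sense of 232Ab.
There is therefore an integrable function $g_n$ such that $\nu_n F = \int_F g_n$ for every $F \in \Sigma$; let us suppose that $g_n$ is measurable and defined on the whole of $X$. Set $g(x) = g_n(x)$ whenever $n \in \mathbb{N}$ and $x \in H_n$, $g(x) = 0$ for $x \in X \setminus \bigcup_{n \in \mathbb{N}} H_n$.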
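(e) $g = \sum_{n=0}^{\infty} g_n \times \chi H_n$ is measurable and has the property that $\int_F g = h(\chi F^\bullet)$ whenever $n \in \mathbb{N}$ and $F$ is a measurable subset of $H_n$; consequently $\int_F g = h(\chi F^\bullet)$ whenever $n \in \mathbb{N}$ and $F$ is a measurable subset of $E_n = \bigcup_{k \le n} H_k$. Set $G = \{x : g(x) > 0\} \subseteq \bigcup_{n \in \mathbb{N}} E_n$. If $F \subseteq G$ and $\mu F < \infty$, then
\[ \lim_{n\to\infty} \int g \times \chi(F \cap E_n) \le \sup_{n \in \mathbb{N}} h(\chi(F \cap E_n)^\bullet) \le \sup_{n \in \mathbb{N}} \|\chi(F \cap E_n)\|_p = (\mu F)^{1/p}, \]
so by B.Levi's theorem
\[ \int_F g = \int g \times \chi F = \lim_{n\to\infty} \int g \times \chi(F \cap E_n) \]
exists. Similarly, $\int_F g$ exists if $F \subseteq \{x : g(x) < 0\}$ has finite measure; while obviously $\int_F g$ exists if $F \subseteq \{x : g(x) = 0\}$. Accordingly $\int_F g$ exists for every set $F$ of finite measure. Moreover, by Lebesgue's Dominated Convergence Theorem,
\[ \int_F g = \lim_{n\to\infty} \int_{F \cap E_n} g = \lim_{n\to\infty} h(\chi(F \cap E_n)^\bullet) = \sum_{n=0}^{\infty} h(\chi(F \cap H_n)^\bullet) = h(\chi F^\bullet) \]
for such $F$, by (c) above. It follows at once that
\[ \int g \times f = h(f^\bullet) \]
for every simple function $f : X \to \mathbb{R}$.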
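(f) Now $g \in \mathcal{L}^q$. PPP (i) We already know that $|g|^q : X \to \mathbb{R}$ is measurable, because $g$ is measurable and $a \mapsto |a|^q$ is continuous. (ii) Suppose that $f$ is a non-negative simple function and $f \le_{\text{a.e.}} |g|^q$. Then $f^{1/p}$ is a simple function, and $\operatorname{sgn} g$ is measurable and takes only the values $0$, $1$ and $-1$, so $f_1 = f^{1/p} \times \operatorname{sgn} g$ is simple. We see that $\int |f_1|^p = \int f$, so $\|f_1\|_p = (\int f)^{1/p}$. Accordingly
\[ \Bigl(\int f\Bigr)^{1/p} \ge h(f_1^\bullet) = \int g \times f_1 = \int |g \times f^{1/p}| \ge \int f^{1/q} \times f^{1/p} \]
(because $0 \le f^{1/q} \le_{\text{a.e.}} |g|$)
\[ = \int f, \]
and we must have $\int f \le 1$. (iii) Thus
\[ \sup\Bigl\{\int f : f \text{ is a non-negative simple function},\ f \le_{\text{a.e.}} |g|^q\Bigr\} \le 1 < \infty. \]
But now observe that if $\epsilon > 0$ then
\[ \{x : |g(x)|^q \ge \epsilon\} = \bigcup_{n \in \mathbb{N}} \{x : x \in E_n,\ |g(x)|^q \ge \epsilon\}, \]
and for each $n \in \mathbb{N}$
\[ \mu\{x : x \in E_n,\ |g(x)|^q \ge \epsilon\} \le \frac1\epsilon, \]
because $f = \epsilon\chi\{x : x \in E_n,\ |g(x)|^q \ge \epsilon\}$ is a simple function less than or equal to $|g|^q$, so has integral at most 1. Accordingly
\[ \mu\{x : |g(x)|^q \ge \epsilon\} = \sup_{n \in \mathbb{N}} \mu\{x : x \in E_n,\ |g(x)|^q \ge \epsilon\} \le \frac1\epsilon < \infty. \]
Thus $|g|^q$ is integrable, by the criterion in 122Ja. QQQ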
(g) We may therefore speak of $h_1 = T(g^\bullet) \in (L^p)^*$, and we know that it agrees with $h$ on members of $L^p$ of the form $f^\bullet$ where $f$ is a simple function. But these form a dense subset of $L^p$, by 244Ha, and both $h$ and $h_1$ are continuous, so $h = h_1$, by 2A3Uc, and $h$ is a value of $T$. The argument as written so far has assumed that $\|h\| = 1$. But of course any non-zero member of $(L^p)^*$ is a scalar multiple of an element of norm 1, so is a value of $T$. So $T : L^q \to (L^p)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.
244L Continuing with the same topics as in §§242 and 243, I turn to the order-completeness of $L^p$.
Theorem Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Then $L^p = L^p(\mu)$ is Dedekind complete.
proof I use 242H. Let $A \subseteq L^p$ be a non-empty set which is bounded above in $L^p$. Fix $u_0 \in A$ and set
\[ A' = \{u_0 \vee u : u \in A\}, \]
so that $A'$ has the same upper bounds as $A$ and is bounded below by $u_0$. Fixing an upper bound $w_0$ of $A$ in $L^p$, then $u_0 \le u \le w_0$ for every $u \in A'$. Set
\[ B = \{(u - u_0)^p : u \in A'\}. \]
Then
\[ 0 \le v \le (w_0 - u_0)^p \in L^1 = L^1(\mu) \]
for every $v \in B$, so $B$ is a non-empty subset of $L^1$ which is bounded above in $L^1$, and therefore has a least upper bound $v_1$ in $L^1$. Now $v_1^{1/p} \in L^p$; consider $w_1 = u_0 + v_1^{1/p}$. If $u \in A'$ then $(u - u_0)^p \le v_1$ so $u - u_0 \le v_1^{1/p}$ and $u \le w_1$; thus $w_1$ is an upper bound for $A'$. If $w \in L^p$ is an upper bound for $A'$, then $u - u_0 \le w - u_0$ and $(u - u_0)^p \le (w - u_0)^p$ for every $u \in A'$, so $(w - u_0)^p$ is an upper bound for $B$ and $v_1 \le (w - u_0)^p$, $v_1^{1/p} \le w - u_0$ and $w_1 \le w$. Thus $w_1 = \sup A' = \sup A$ in $L^p$. As $A$ is arbitrary, $L^p$ is Dedekind complete.
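244M As in the last two sections, the theory of conditional expectations is worth revisiting.
Theorem Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Take $p \in [1, \infty]$. Regard $L^0(\mu\restriction\mathrm{T})$ as a subspace of $L^0 = L^0(\mu)$, as in 242Jh, so that $L^p(\mu\restriction\mathrm{T})$ becomes $L^p(\mu) \cap L^0(\mu\restriction\mathrm{T})$. Let $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ be the conditional expectation operator, as described in 242Jd. Then whenever $u \in L^p = L^p(\mu)$, $|Pu|^p \le P(|u|^p)$, so $Pu \in L^p(\mu\restriction\mathrm{T})$ and $\|Pu\|_p \le \|u\|_p$. Moreover, $P[L^p] = L^p(\mu\restriction\mathrm{T})$.
proof For $p = \infty$, this is 243Jb, so I assume henceforth that $p < \infty$. Concerning the identification of $L^p(\mu\restriction\mathrm{T})$ with $L^p \cap L^0(\mu\restriction\mathrm{T})$, if $S : L^0(\mu\restriction\mathrm{T}) \to L^0$ is the canonical embedding described in 242J, we have $|Su|^p = S(|u|^p)$ for every $u \in L^0(\mu\restriction\mathrm{T})$, so that $Su \in L^p$ iff $|u|^p \in L^1(\mu\restriction\mathrm{T})$ iff $u \in L^p(\mu\restriction\mathrm{T})$.
Set $\phi(t) = |t|^p$ for $t \in \mathbb{R}$; then $\phi$ is a convex function (because it is absolutely continuous on any bounded interval, and its derivative is non-decreasing), and $|u|^p = \bar\phi(u)$ for every $u \in L^0 = L^0(\mu)$, where $\bar\phi$ is defined as in 241I. Now if $u \in L^p = L^p(\mu)$, we surely have $u \in L^1$ (because $|u| \le |u|^p \vee (\chi X)^\bullet$, or otherwise); so 242K tells us that $|Pu|^p \le P|u|^p$. But this means that $Pu \in L^p \cap L^1(\mu\restriction\mathrm{T}) = L^p(\mu\restriction\mathrm{T})$, and
\[ \|Pu\|_p = \Bigl(\int |Pu|^p\Bigr)^{1/p} \le \Bigl(\int P|u|^p\Bigr)^{1/p} = \Bigl(\int |u|^p\Bigr)^{1/p} = \|u\|_p, \]
as claimed. If $u \in L^p(\mu\restriction\mathrm{T})$, then $Pu = u$, so $P[L^p]$ is the whole of $L^p(\mu\restriction\mathrm{T})$.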
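On a finite probability space the theorem can be checked by hand. The following is a minimal sketch, assuming the $\sigma$-subalgebra is generated by a partition of a finite set into blocks of non-zero probability, so that the conditional expectation is a block-wise weighted average; the names `cond_exp`, `lp_norm`, `prob` and `blocks` are hypothetical.

```python
def cond_exp(u, prob, blocks):
    # Replace u on each block of the partition by its probability-weighted mean.
    out = [0.0] * len(u)
    for block in blocks:
        mass = sum(prob[i] for i in block)      # assumed non-zero
        mean = sum(prob[i] * u[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out

def lp_norm(u, prob, p):
    return sum(w * abs(x) ** p for x, w in zip(u, prob)) ** (1.0 / p)

prob = [0.1, 0.2, 0.3, 0.4]           # a probability measure on {0, 1, 2, 3}
blocks = [[0, 1], [2, 3]]             # partition generating the subalgebra
u = [1.0, -2.0, 3.0, 0.5]
p = 3.0

Pu = cond_exp(u, prob, blocks)
P_abs_p = cond_exp([abs(x) ** p for x in u], prob, blocks)
print(all(abs(a) ** p <= b + 1e-12 for a, b in zip(Pu, P_abs_p)),
      lp_norm(Pu, prob, p) <= lp_norm(u, prob, p))
```

The first check is the pointwise inequality $|Pu|^p \le P(|u|^p)$; the second is the resulting norm inequality $\|Pu\|_p \le \|u\|_p$.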
244N The space $L^2$ (a) As I have already remarked, the really important function spaces are $L^0$, $L^1$, $L^2$ and $L^\infty$. $L^2$ has the special property of being an inner product space; if $(X, \Sigma, \mu)$ is any measure space and $u$, $v \in L^2 = L^2(\mu)$ then $u \times v \in L^1(\mu)$, by 244Eb, and we may write $(u|v) = \int u \times v$. This makes $L^2$ a real inner product space (because
\[ (u_1 + u_2|v) = (u_1|v) + (u_2|v), \quad (cu|v) = c(u|v), \quad (u|v) = (v|u), \]
\[ (u|u) \ge 0, \quad u = 0 \text{ whenever } (u|u) = 0 \]
for all $u$, $u_1$, $u_2$, $v \in L^2$ and $c \in \mathbb{R}$) and its norm $\|\ \|_2$ is the associated norm (because $\|u\|_2 = \sqrt{(u|u)}$ whenever $u \in L^2$). Because $L^2$ is complete (244G), it is a real Hilbert space. The fact that it may be identified with its own dual (244K) can of course be deduced from this.
I will use the phrase 'square-integrable' to describe functions in $\mathcal{L}^2(\mu)$.
(b) Conditional expectations take a special form in the case of $L^2$. Let $(X, \Sigma, \mu)$ be a probability space, $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$, and $P : L^1 = L^1(\mu) \to L^1(\mu\restriction\mathrm{T}) \subseteq L^1$ the corresponding conditional expectation operator. Then $P[L^2] \subseteq L^2$, where $L^2 = L^2(\mu)$ (244M), so we have an operator $P_2 = P\restriction L^2$ from $L^2$ to itself. Now $P_2$ is an orthogonal projection and its kernel is $\{u : u \in L^2,\ \int_F u = 0 \text{ for every } F \in \mathrm{T}\}$. PPP (i) If $u \in L^1$ then $Pu = 0$ iff $\int_F u = 0$ for every $F \in \mathrm{T}$ (cf. 242Je); so surely the kernel of $P_2$ is the set described. (ii) Since $P^2 = P$, $P_2$ also is a projection; because $P_2$ has norm at most 1 (244M), and is therefore continuous,
\[ U = P_2[L^2] = L^2(\mu\restriction\mathrm{T}) = \{u : u \in L^2,\ P_2u = u\}, \quad V = \{u : P_2u = 0\} \]
are closed linear subspaces of $L^2$ such that $U \oplus V = L^2$. (iii) Now suppose that $u \in U$ and $v \in V$. Then $P|v| \in L^2$, so $u \times P|v| \in L^1$ and $P(u \times v) = u \times Pv$, by 242L. Accordingly
\[ (u|v) = \int u \times v = \int P(u \times v) = \int u \times Pv = 0. \]
Thus $U$ and $V$ are orthogonal subspaces of $L^2$, which is what we mean by saying that $P_2$ is an orthogonal projection. (Some readers will know that every projection of norm at most 1 on an inner product space is orthogonal.) QQQ
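The orthogonality in (b) is easy to see in a finite model. The following sketch makes the same assumptions as one would for any toy example: a probability measure `prob` on a finite set and a subalgebra generated by a partition `blocks` whose blocks have non-zero probability; all names are hypothetical. It checks that $Pu$ is orthogonal to $v - Pv$ and that the Pythagorean identity holds for the decomposition $u = Pu + (u - Pu)$.

```python
def cond_exp(u, prob, blocks):
    # Block-wise probability-weighted mean: conditional expectation for a partition.
    out = [0.0] * len(u)
    for block in blocks:
        mass = sum(prob[i] for i in block)      # assumed non-zero
        mean = sum(prob[i] * u[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out

def inner(u, v, prob):
    # (u|v) = integral of u x v
    return sum(w * x * y for x, y, w in zip(u, v, prob))

prob = [0.1, 0.2, 0.3, 0.4]
blocks = [[0, 1], [2, 3]]
u = [1.0, -2.0, 3.0, 0.5]
v = [4.0, 0.0, -1.0, 2.0]

Pu = cond_exp(u, prob, blocks)
Pv = cond_exp(v, prob, blocks)
v_perp = [a - b for a, b in zip(v, Pv)]          # member of the kernel V
u_perp = [a - b for a, b in zip(u, Pu)]
print(abs(inner(Pu, v_perp, prob)) < 1e-12,
      abs(inner(u, u, prob) - inner(Pu, Pu, prob) - inner(u_perp, u_perp, prob)) < 1e-12)
```

Here `Pu` plays the role of a member of $U$ and `v_perp` of a member of $V$.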
*244O This is not the place for a detailed discussion of the geometry of L
p
spaces. However there is a particularly
important fact about the shape of the unit ball which is accessible by the methods available to us here.
Theorem (Clarkson 36) Suppose that p ]1, [ and (X, , ) is a measure space. Then L
p
= L
p
() is uniformly
convex (denition: 2A4K).
proof (Hanner 56, Naor 04)
(a)(i) For $0 < t\le 1$ and $a$, $b\in\mathbb{R}$, set
$$\phi_0(t)=(1+t)^{p-1}+(1-t)^{p-1},$$
$$\phi_1(t)=\frac{(1+t)^{p-1}-(1-t)^{p-1}}{t^{p-1}}=\Bigl(\frac1t+1\Bigr)^{p-1}-\Bigl(\frac1t-1\Bigr)^{p-1},$$
$$\psi_{ab}(t)=|a|^p\phi_0(t)+|b|^p\phi_1(t),$$
$$\phi_2(b)=(1+b)^p+|1-b|^p.$$
(ii) We have
$$\phi_0'(t)=(p-1)\bigl((1+t)^{p-2}-(1-t)^{p-2}\bigr),$$
which has the same sign as $p-2$ (of course it is zero if $p=2$), and
$$\phi_1'(t)=-\frac{p-1}{t^2}\Bigl(\bigl(\tfrac1t+1\bigr)^{p-2}-\bigl(\tfrac1t-1\bigr)^{p-2}\Bigr)=-\frac{p-1}{t^p}\bigl((1+t)^{p-2}-(1-t)^{p-2}\bigr)=-\frac{1}{t^p}\phi_0'(t)$$
for every $t\in\,]0,1[$. Accordingly $\phi_0'-\phi_1'$ has the same sign as $p-2$ everywhere on $]0,1[$. Also
$$\phi_0(1)=2^{p-1}=\phi_1(1),$$
so $\phi_0-\phi_1$ has the same sign as $2-p$ everywhere on $]0,1]$.
(iii) $\phi_2$ is strictly increasing on $[0,\infty[$. PPP For $b > 0$,
$$\phi_2'(b)=p\bigl((1+b)^{p-1}-(1-b)^{p-1}\bigr) > 0\ \text{ if } b\le 1,$$
$$\phi_2'(b)=p\bigl((1+b)^{p-1}+(b-1)^{p-1}\bigr) > 0\ \text{ if } b\ge 1.\ \text{QQQ}$$
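(iv) If $0 < b\le a$, then
$$\begin{aligned}
\psi_{ab}\bigl(\tfrac ba\bigr)&=a^p\phi_0\bigl(\tfrac ba\bigr)+b^p\phi_1\bigl(\tfrac ba\bigr)\\
&=a^p\bigl(1+\tfrac ba\bigr)^{p-1}+a^p\bigl(1-\tfrac ba\bigr)^{p-1}+b^p\bigl(\tfrac ab+1\bigr)^{p-1}-b^p\bigl(\tfrac ab-1\bigr)^{p-1}\\
&=a(a+b)^{p-1}+a(a-b)^{p-1}+b(a+b)^{p-1}-b(a-b)^{p-1}\\
&=(a+b)^p+(a-b)^p=(a+b)^p+|a-b|^p.\qquad(\dagger)
\end{aligned}$$
Also $\psi_{ab}'(t)=\bigl(a^p-\frac{b^p}{t^p}\bigr)\phi_0'(t)$ has the sign of $2-p$ if $0 < t < \frac ba$ and the sign of $p-2$ if $\frac ba < t < 1$. Accordingly
if $1 < p\le 2$, $\psi_{ab}(t)\le\psi_{ab}\bigl(\frac ba\bigr)=(a+b)^p+|a-b|^p$ for every $t\in\,]0,1]$,
if $p\ge 2$, $\psi_{ab}(t)\ge\psi_{ab}\bigl(\frac ba\bigr)=(a+b)^p+|a-b|^p$ for every $t\in\,]0,1]$.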
(v) Now consider the case $0 < a\le b$. If $1 < p\le 2$,
$$\begin{aligned}
\psi_{ab}(t)=a^p\phi_0(t)+b^p\phi_1(t)&\le a^p\phi_0(t)+b^p\phi_1(t)+(b^p-a^p)(\phi_0(t)-\phi_1(t))\\
&\qquad\text{(by (ii))}\\
&=b^p\phi_0(t)+a^p\phi_1(t)\le(b+a)^p+(b-a)^p=(a+b)^p+|a-b|^p
\end{aligned}$$
for every $t\in\,]0,1]$. If $p\ge 2$, on the other hand,
$$\begin{aligned}
\psi_{ab}(t)=a^p\phi_0(t)+b^p\phi_1(t)&\ge a^p\phi_0(t)+b^p\phi_1(t)+(b^p-a^p)(\phi_0(t)-\phi_1(t))\\
&=b^p\phi_0(t)+a^p\phi_1(t)\ge(a+b)^p+|a-b|^p
\end{aligned}$$
for every $t$.
(vi) Thus we have the inequalities
$$\psi_{ab}(t)\le|a+b|^p+|a-b|^p\ \text{ if } p\in\,]1,2],\qquad \psi_{ab}(t)\ge|a+b|^p+|a-b|^p\ \text{ if } p\in[2,\infty[\qquad(*)$$
whenever $t\in\,]0,1]$ and $a$, $b\in\,]0,\infty[$. Since $(a,b)\mapsto\psi_{ab}(t)$ is continuous for every $t$, the same inequalities are valid for all $a$, $b\in[0,\infty[$. And since
$$\psi_{ab}(t)=\psi_{|a|,|b|}(t),\qquad |a+b|^p+|a-b|^p=\bigl||a|+|b|\bigr|^p+\bigl||a|-|b|\bigr|^p$$
for all $a$, $b\in\mathbb{R}$ and $t\in\,]0,1]$, the inequalities (*) are valid for all $a$, $b\in\mathbb{R}$ and $t\in\,]0,1]$.
(b) Suppose that $p\ge 2$.
(i)
$$\|u+v\|_p^p+\|u-v\|_p^p\le(\|u\|_p+\|v\|_p)^p+\bigl|\|u\|_p-\|v\|_p\bigr|^p$$
for all $u$, $v\in L^p$. PPP First consider the case $0 < \|v\|_p\le\|u\|_p$. Let $f$, $g:X\to\mathbb{R}$ be $\Sigma$-measurable functions such that $f^\bullet=u$ and $g^\bullet=v$. Then for any $t\in\,]0,1]$,
$$\begin{aligned}
\|u+v\|_p^p+\|u-v\|_p^p&=\int|f(x)+g(x)|^p+|f(x)-g(x)|^p\,\mu(dx)\\
&\le\int\psi_{f(x),g(x)}(t)\,\mu(dx)\\
&\qquad\text{(by the second inequality in (*))}\\
&=\int|f(x)|^p\phi_0(t)+|g(x)|^p\phi_1(t)\,\mu(dx)=\|u\|_p^p\phi_0(t)+\|v\|_p^p\phi_1(t).
\end{aligned}$$
In particular, taking $t=\|v\|_p/\|u\|_p$, and applying ($\dagger$) from (a-iv),
$$\|u+v\|_p^p+\|u-v\|_p^p\le(\|u\|_p+\|v\|_p)^p+\bigl|\|u\|_p-\|v\|_p\bigr|^p.$$
Of course the result will also be true if $0 < \|u\|_p\le\|v\|_p$, and the case in which either $u$ or $v$ is zero is trivial. QQQ
(ii) Let $\epsilon\in\,]0,2]$. Set $\delta=2-(2^p-\epsilon^p)^{1/p} > 0$. If $u$, $v\in L^p$, $\|u\|_p=\|v\|_p=1$ and $\|u-v\|_p\ge\epsilon$, then
$$\|u+v\|_p^p+\epsilon^p\le\|u+v\|_p^p+\|u-v\|_p^p\le(\|u\|_p+\|v\|_p)^p+\bigl|\|u\|_p-\|v\|_p\bigr|^p=2^p,$$
so $\|u+v\|_p\le(2^p-\epsilon^p)^{1/p}=2-\delta$. As $u$, $v$ and $\epsilon$ are arbitrary, $L^p$ is uniformly convex.
(c) Next suppose that $p\in\,]1,2]$.
(i)
$$(\|u\|_p+\|v\|_p)^p+\bigl|\|u\|_p-\|v\|_p\bigr|^p\le\|u+v\|_p^p+\|u-v\|_p^p$$
for all $u$, $v\in L^p$. PPP We can repeat all the ideas, and most of the formulae, of (b-i). As before, start with the case $0 < \|v\|_p\le\|u\|_p$. Let $f$, $g:X\to\mathbb{R}$ be $\Sigma$-measurable functions such that $f^\bullet=u$ and $g^\bullet=v$. Taking $t=\|v\|_p/\|u\|_p$,
$$\begin{aligned}
\|u+v\|_p^p+\|u-v\|_p^p&=\int|f(x)+g(x)|^p+|f(x)-g(x)|^p\,\mu(dx)\\
&\ge\int\psi_{f(x),g(x)}(t)\,\mu(dx)\\
&\qquad\text{(by the first inequality in (*))}\\
&=\|u\|_p^p\phi_0(t)+\|v\|_p^p\phi_1(t)=(\|u\|_p+\|v\|_p)^p+\bigl|\|u\|_p-\|v\|_p\bigr|^p.
\end{aligned}$$
Similarly if $0 < \|u\|_p\le\|v\|_p$, and the case in which either $u$ or $v$ is zero is trivial. QQQ
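(ii) Let $\epsilon > 0$. Set $\gamma=\phi_2(\frac\epsilon2) > 2$ (see (a-iii) above) and $\delta=2\bigl(1-(\frac2\gamma)^{1/p}\bigr) > 0$. Now suppose that $\|u\|_p=\|v\|_p=1$ and $\|u-v\|_p\ge\epsilon$. Then $\|u+v\|_p\le 2-\delta$. PPP If $u+v=0$ this is trivial. Otherwise, set $a=\|u+v\|_p$ and $b=\|u-v\|_p$. Then $a\le 2$ and $b\ge\epsilon$, so
$$\begin{aligned}
a^p\gamma=a^p\phi_2\bigl(\tfrac\epsilon2\bigr)&\le a^p\phi_2\bigl(\tfrac ba\bigr)\\
&\qquad\text{(by (a-iii) again)}\\
&=(a+b)^p+|a-b|^p\\
&=(\|u+v\|_p+\|u-v\|_p)^p+\bigl|\|u+v\|_p-\|u-v\|_p\bigr|^p\\
&\le\|2u\|_p^p+\|2v\|_p^p\\
&\qquad\text{(by (i) here)}\\
&=2^{p+1}
\end{aligned}$$
and $a\le 2\bigl(\frac2\gamma\bigr)^{1/p}=2-\delta$. QQQ As $u$, $v$ and $\epsilon$ are arbitrary, $L^p$ is uniformly convex.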
Remark The inequalities in (b-i) and (c-i) of the proof are called Hanner's inequalities.
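Hanner's inequalities are easy to test numerically in the finite-dimensional case. The following Python sketch, offered only as an illustration, uses counting measure on a four-point set, so that $\|u\|_p$ is the ordinary $\ell^p$ norm on $\mathbb{R}^4$; the dimension, the sampling scheme, the tolerance and the function names are assumptions of the sketch. It checks the direction $\le$ for $p\ge 2$ and $\ge$ for $p\le 2$ on random pairs.

```python
import random

def norm_p(x, p):
    """The l^p norm of a finite real vector (counting measure)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def hanner_sides(u, v, p):
    """Return (lhs, rhs) where
    lhs = ||u+v||^p + ||u-v||^p and
    rhs = (||u|| + ||v||)^p + | ||u|| - ||v|| |^p."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    lhs = norm_p(plus, p) ** p + norm_p(minus, p) ** p
    nu, nv = norm_p(u, p), norm_p(v, p)
    rhs = (nu + nv) ** p + abs(nu - nv) ** p
    return lhs, rhs

def hanner_holds(p, trials=2000, dim=4, seed=0):
    """Check lhs <= rhs when p >= 2 and lhs >= rhs when p <= 2,
    up to a small relative tolerance for rounding."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-1, 1) for _ in range(dim)]
        v = [rng.uniform(-1, 1) for _ in range(dim)]
        lhs, rhs = hanner_sides(u, v, p)
        tol = 1e-9 * (1.0 + rhs)
        if p >= 2 and lhs > rhs + tol:
            return False
        if p <= 2 and lhs < rhs - tol:
            return False
    return True
```

At $p=2$ both directions are tested at once, which amounts to the parallelogram identity $\|u+v\|_2^2+\|u-v\|_2^2=2\|u\|_2^2+2\|v\|_2^2$.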
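244P Complex $L^p$ Let $(X,\Sigma,\mu)$ be any measure space.
(a) For any $p\in\,]1,\infty[$, set
$$\mathcal{L}^p_{\mathbb{C}}=\mathcal{L}^p_{\mathbb{C}}(\mu)=\{f:f\in\mathcal{L}^0_{\mathbb{C}}(\mu),\ |f|^p\text{ is integrable}\},$$
$$\begin{aligned}
L^p_{\mathbb{C}}=L^p_{\mathbb{C}}(\mu)&=\{f^\bullet:f\in\mathcal{L}^p_{\mathbb{C}}\}\\
&=\{u:u\in L^0_{\mathbb{C}}(\mu),\ \operatorname{Re}(u)\in L^p(\mu)\text{ and }\operatorname{Im}(u)\in L^p(\mu)\}\\
&=\{u:u\in L^0_{\mathbb{C}}(\mu),\ |u|\in L^p(\mu)\}.
\end{aligned}$$
Then $L^p_{\mathbb{C}}$ is a linear subspace of $L^0_{\mathbb{C}}(\mu)$. Set $\|u\|_p=\||u|\|_p=(\int|u|^p)^{1/p}$ for $u\in L^p_{\mathbb{C}}$.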
(b) The proof of 244E(b-i) applies unchanged to complex-valued functions, so taking $q=p/(p-1)$ we get $\|u\times v\|_1\le\|u\|_p\|v\|_q$ for all $u\in L^p_{\mathbb{C}}$, $v\in L^q_{\mathbb{C}}$. 244Fa becomes
for every $u\in L^p_{\mathbb{C}}$ there is a $v\in L^q_{\mathbb{C}}$ such that $\|v\|_q\le 1$ and $\int u\times v=|\int u\times v|=\|u\|_p$;
the same proof works, if you omit all mention of the functional, and allow me to write $\operatorname{sgn}a=|a|/a$ for all non-zero complex numbers (it would perhaps be more natural to write $\overline{\operatorname{sgn}}\,a$ in place of $\operatorname{sgn}a$). So, just as before, we find that $\|\,\|_p$ is a norm. We can use the argument of 244G to show that $L^p_{\mathbb{C}}$ is complete. (Alternatively, note that a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0_{\mathbb{C}}$ is Cauchy, or convergent, iff its real and imaginary parts are.) The space $S_{\mathbb{C}}$ of equivalence classes of complex-valued simple functions is dense in $L^p_{\mathbb{C}}$. If $X$ is a subset of $\mathbb{R}^r$ and $\mu$ is Lebesgue measure on $X$, then the space of equivalence classes of continuous complex-valued functions on $X$ with bounded support is dense in $L^p_{\mathbb{C}}$.
(c) The canonical map $T:L^q_{\mathbb{C}}\to(L^p_{\mathbb{C}})^*$, defined by writing $(Tv)(u)=\int u\times v$, is surjective because $T{\restriction}L^q:L^q\to(L^p)^*$ is surjective; and it is an isometry by the remarks in (b) just above. Thus we can still identify $L^q_{\mathbb{C}}$ with $(L^p_{\mathbb{C}})^*$.
(d) When we come to the complex form of Jensen's inequality, it seems that a new idea is needed. I have relegated this to 242Yk-242Yl. But for the complex form of 244M a simpler argument will suffice. If $(X,\Sigma,\mu)$ is a probability space, $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$ and $P:L^1_{\mathbb{C}}(\mu)\to L^1_{\mathbb{C}}(\mu{\restriction}\mathrm{T})$ is the corresponding conditional expectation operator, then for any $u\in L^p_{\mathbb{C}}$ we shall have
$$|Pu|^p\le(P|u|)^p\le P(|u|^p),$$
applying 242Pc and 244M. So $\|Pu\|_p\le\|u\|_p$, as before.
(e) There is a special point arising with $L^2_{\mathbb{C}}$. We now have to define
$$(u|v)=\int u\times\bar v$$
for $u$, $v\in L^2_{\mathbb{C}}$, so that $(u|u)=\int|u|^2=\|u\|_2^2$ for every $u$; this means that $(v|u)$ is the complex conjugate of $(u|v)$.
244X Basic exercises >>>(a) Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion. Show that $\mathcal{L}^p(\hat\mu)=\mathcal{L}^p(\mu)$ and $L^p(\hat\mu)=L^p(\mu)$ for every $p\in[1,\infty]$.
(b) Let $(X,\Sigma,\mu)$ be a measure space, and $1\le p\le q\le r\le\infty$. Show that $L^p(\mu)\cap L^r(\mu)\subseteq L^q(\mu)\subseteq L^p(\mu)+L^r(\mu)\subseteq L^0(\mu)$. (See also 244Yh.)
(c) Let $(X,\Sigma,\mu)$ be a measure space. Suppose that $p$, $q$, $r\in[1,\infty]$ and that $\frac1p+\frac1q=\frac1r$, setting $\frac1\infty=0$ as usual. Show that $u\times v\in L^r(\mu)$ and $\|u\times v\|_r\le\|u\|_p\|v\|_q$ whenever $u\in L^p(\mu)$ and $v\in L^q(\mu)$. (Hint: if $r < \infty$ apply Hölder's inequality to $|u|^r\in L^{p/r}$, $|v|^r\in L^{q/r}$.)
>>>(d)(i) Let $(X,\Sigma,\mu)$ be a probability space. Show that if $1\le p\le r\le\infty$ then $\|f\|_p\le\|f\|_r$ for every $f\in\mathcal{L}^r(\mu)$. (Hint: use Hölder's inequality to show that $\int|f|^p\le\||f|^p\|_{r/p}$.) In particular, $\mathcal{L}^p(\mu)\supseteq\mathcal{L}^r(\mu)$. (ii) Let $(X,\Sigma,\mu)$ be a measure space such that $\mu E\ge 1$ whenever $E\in\Sigma$ and $\mu E > 0$. (This happens, for instance, when $\mu$ is 'counting measure' on $X$.) Show that if $1\le p\le r\le\infty$ then $L^p(\mu)\subseteq L^r(\mu)$ and $\|u\|_p\ge\|u\|_r$ for every $u\in L^p(\mu)$. (Hint: look first at the case $\|u\|_p=1$.)
(e) Let $(X,\Sigma,\mu)$ be a semi-finite measure space, and $p$, $q\in[1,\infty]$ such that $\frac1p+\frac1q=1$. Show that if $u\in L^0(\mu)\setminus L^p(\mu)$ then there is a $v\in L^q(\mu)$ such that $u\times v\notin L^1(\mu)$. (Hint: reduce to the case $u\ge 0$. Show that in this case there is for each $n\in\mathbb{N}$ a $u_n\le u$ such that $4^n\le\|u_n\|_p < \infty$; take $v_n\in L^q$ such that $\|v_n\|_q\le 2^{-n}$ and $\int u_n\times v_n\ge 2^n$, and set $v=\sum_{n=0}^\infty v_n$.)
(f) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X,\Sigma,\mu)$ their direct sum (214L). Take any $p\in[1,\infty[$. Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$ (241Xd) induces an isomorphism between $L^p(\mu)$ and the subspace
$$\Bigl\{u:u\in\prod_{i\in I}L^p(\mu_i),\ \|u\|=\Bigl(\sum_{i\in I}\|u(i)\|_p^p\Bigr)^{1/p} < \infty\Bigr\}$$
of $\prod_{i\in I}L^p(\mu_i)$.
(g) Let $(X,\Sigma,\mu)$ be a measure space. Set $M^{\infty,1}=L^1(\mu)\cap L^\infty(\mu)$. Show that for $u\in M^{\infty,1}$ the function $p\mapsto\|u\|_p:[1,\infty[\to[0,\infty[$ is continuous, and that $\|u\|_\infty=\lim_{p\to\infty}\|u\|_p$. (Hint: consider first the case in which $u$ is the equivalence class of a simple function.)
(h) Let $\mu$ be counting measure on $X=\{1,2\}$, so that $\mathcal{L}^0(\mu)=\mathbb{R}^2$ and $L^p(\mu)=L^0(\mu)$ can be identified with $\mathbb{R}^2$ for every $p\in[1,\infty]$. Sketch the unit balls $\{u:\|u\|_p\le 1\}$ in $\mathbb{R}^2$ for $p=1$, $\frac32$, $2$, $3$ and $\infty$.
(i) Let $\mu$ be counting measure on $X=\{1,2,3\}$, so that $\mathcal{L}^0(\mu)=\mathbb{R}^3$ and $L^p(\mu)=L^0(\mu)$ can be identified with $\mathbb{R}^3$ for every $p\in[1,\infty]$. Describe the unit balls $\{u:\|u\|_p\le 1\}$ in $\mathbb{R}^3$ for $p=1$, $2$ and $\infty$.
(j) At which points does the argument of 244Hb break down if we try to apply it to $L^\infty$ with $\|\,\|_\infty$?
(k) Let $p\in[1,\infty[$. (i) Show that $|a^p-b^p|\ge|a-b|^p$ for all $a$, $b\ge 0$. (Hint: for $a > b$, differentiate both sides with respect to $a$.) (ii) Let $(X,\Sigma,\mu)$ be a measure space and $U$ a linear subspace of $L^0(\mu)$ such that ($\alpha$) $|u|\in U$ for every $u\in U$ ($\beta$) $u^{1/p}\in U$ for every $u\in U$ ($\gamma$) $U\cap L^1$ is dense in $L^1=L^1(\mu)$. Show that $U\cap L^p$ is dense in $L^p=L^p(\mu)$. (Hint: check first that $\{u:u\in U\cap L^1,\ u\ge 0\}$ is dense in $\{u:u\in L^1,\ u\ge 0\}$.) (iii) Use this to prove 244H from 242M and 242O.
(l) For any measure space $(X,\Sigma,\mu)$ write $M^{1,\infty}=M^{1,\infty}(\mu)$ for $\{v+w:v\in L^1(\mu),\ w\in L^\infty(\mu)\}\subseteq L^0(\mu)$. Show that $M^{1,\infty}$ is a linear subspace of $L^0$ including $L^p$ for every $p\in[1,\infty]$, and that if $u\in L^0$, $v\in M^{1,\infty}$ and $|u|\le|v|$ then $u\in M^{1,\infty}$. (Hint: $u=v\times w$ where $|w|\le\chi X^\bullet$.)
(m) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two measure spaces, and let $\mathcal{T}^+$ be the set of linear operators $T:M^{1,\infty}(\mu)\to M^{1,\infty}(\nu)$ such that ($\alpha$) $Tu\ge 0$ whenever $u\ge 0$ in $M^{1,\infty}(\mu)$ ($\beta$) $Tu\in L^1(\nu)$ and $\|Tu\|_1\le\|u\|_1$ whenever $u\in L^1(\mu)$ ($\gamma$) $Tu\in L^\infty(\nu)$ and $\|Tu\|_\infty\le\|u\|_\infty$ whenever $u\in L^\infty(\mu)$. (i) Show that if $\phi:\mathbb{R}\to\mathbb{R}$ is a convex function such that $\phi(0)=0$, and $u\in M^{1,\infty}(\mu)$ is such that $\bar\phi(u)\in M^{1,\infty}(\mu)$ (interpreting $\bar\phi:L^0(\mu)\to L^0(\mu)$ as in 241I), then $\bar\phi(Tu)\in M^{1,\infty}(\nu)$ and $\bar\phi(Tu)\le T(\bar\phi(u))$ for every $T\in\mathcal{T}^+$. (ii) Hence show that if $p\in[1,\infty]$ and $u\in L^p(\mu)$, $Tu\in L^p(\nu)$ and $\|Tu\|_p\le\|u\|_p$ for every $T\in\mathcal{T}^+$.
>>>(n) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^p(X)$ for $\mathcal{L}^p(\mu)$, and to identify it with $L^p(\mu)$. In particular, $L^2(\mu)$ becomes identified with $\ell^2(X)$, the space of square-summable functions on $X$. Write out statements and proofs of the results of this section adapted to this special case.
(o) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. Show that the map $g\mapsto g\phi:\mathcal{L}^0(\nu)\to\mathcal{L}^0(\mu)$ (241Xg) induces a norm-preserving map from $L^p(\nu)$ to $L^p(\mu)$ for every $p\in[1,\infty]$, and also a map from $M^{1,\infty}(\nu)$ to $M^{1,\infty}(\mu)$ which belongs to the class $\mathcal{T}^+$ of 244Xm.
244Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space, and $(X,\tilde\Sigma,\tilde\mu)$ its c.l.d. version. Show that $\mathcal{L}^p(\mu)\subseteq\mathcal{L}^p(\tilde\mu)$ and that this embedding induces a Banach lattice isomorphism between $L^p(\mu)$ and $L^p(\tilde\mu)$, for every $p\in[1,\infty[$.
(b) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty[$. Show that $L^p(\mu)$ has the countable sup property in the sense of 241Ye. (Hint: 242Yh.)
(c) Suppose that $(X,\Sigma,\mu)$ is a measure space, and that $p\in\,]0,1[$, $q < 0$ are such that $\frac1p+\frac1q=1$. (i) Show that $ab\ge\frac1p a^p+\frac1q b^q$ for all real $a\ge 0$, $b > 0$. (Hint: set $p'=\frac1p$, $q'=\frac{p'}{p'-1}$, $c=(ab)^p$, $d=b^{-p}$ and apply 244Ea.) (ii) Show that if $f$, $g\in\mathcal{L}^0(\mu)$ are non-negative and $E=\{x:x\in\operatorname{dom}g,\ g(x) > 0\}$, then
$$\Bigl(\int_E f^p\Bigr)^{1/p}\Bigl(\int_E g^q\Bigr)^{1/q}\le\int f\times g.$$
(iii) Show that if $f$, $g\in\mathcal{L}^0(\mu)$ are non-negative, then
$$\Bigl(\int f^p\Bigr)^{1/p}+\Bigl(\int g^p\Bigr)^{1/p}\le\Bigl(\int(f+g)^p\Bigr)^{1/p}.$$
(d) Let $(X,\Sigma,\mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map $T$ from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Yg) includes a surjection from $L^p(\mu)$ onto $L^p(\mu_Y)$ for every $p\in[1,\infty]$, and also a map from $M^{1,\infty}(\mu)$ to $M^{1,\infty}(\mu_Y)$ which belongs to the class $\mathcal{T}^+$ of 244Xm. Show that the following are equiveridical: (i) there is some $p\in[1,\infty[$ such that $T{\restriction}L^p(\mu)$ is injective; (ii) $T:L^p(\mu)\to L^p(\mu_Y)$ is norm-preserving for every $p\in[1,\infty[$; (iii) $F\cap Y\ne\emptyset$ whenever $F\in\Sigma$ and $0 < \mu F < \infty$.
(e) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty[$. Show that the norm $\|\,\|_p$ on $L^p(\mu)$ is order-continuous in the sense of 242Yg.
(f) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$. Show that if $A\subseteq L^p(\mu)$ is upwards-directed and norm-bounded, then it is bounded above. (Hint: 242Yf.)
(g) Let $(X,\Sigma,\mu)$ be any measure space, and $p\in[1,\infty]$. Show that if a non-empty set $A\subseteq L^p(\mu)$ is upwards-directed and has a supremum in $L^p(\mu)$, then $\|\sup A\|_p\le\sup_{u\in A}\|u\|_p$. (Hint: consider first the case $0\in A$.)
(h) Let $(X,\Sigma,\mu)$ be a measure space and $u\in L^0(\mu)$. (i) Show that $I=\{p:p\in[1,\infty[,\ u\in L^p(\mu)\}$ is an interval. Give examples to show that it may be open, closed or half-open. (ii) Show that $p\mapsto p\ln\|u\|_p:I\to\mathbb{R}$ is convex. (Hint: if $p < q$ and $t\in\,]0,1[$, observe that $\int|u|^{tp+(1-t)q}\le\||u|^{pt}\|_{1/t}\,\||u|^{q(1-t)}\|_{1/(1-t)}$.) (iii) Show that if $p\le q\le r$ in $I$, then $\|u\|_q\le\max(\|u\|_p,\|u\|_r)$.
(i) Let [a, b] be a non-trivial closed interval in R and F : [a, b] R a function; take p ]1, [. Show that the
following are equiveridical: (i) F is absolutely continuous and its derivative F

belongs to L
p
(), where is Lebesgue
measure on [a, b] (ii)
c = sup

n
i=1
|F(a
i
)F(a
i1
)|
p
(a
i
a
i1
)
p1
: a a
0
< a
1
< . . . < a
n
b
is nite, and that in this case c = |F

|
p
. (Hint: (i) if F is absolutely continuous and F

L
p
, use H olders inequality
to show that [F(b

) F(a

)[
p
(b

)
p1
_
b

[F

[
p
whenever a a

b. (ii) If F satises the condition, show


that (

n
i=0
[F(b
i
) F(a
i
)[)
p
c(

n
i=0
(b
i
a
i
))
p1
whenever a a
0
b
0
a
1
. . . b
n
b, so that F is absolutely
continuous. Take a sequence F
n

nN
of polygonal functions approximating F; use 223Xj to show that F

n
F

a.e.,
so that
_
[F

[
p
sup
nN
_
[F

n
[
p
c
p
.)
(j) Let $G$ be an open set in $\mathbb{R}^r$ and write $\mu$ for Lebesgue measure on $G$. Let $C_k(G)$ be the set of continuous functions $f:G\to\mathbb{R}$ such that $\inf\{\|x-y\|:x\in G,\ f(x)\ne 0,\ y\in\mathbb{R}^r\setminus G\} > 0$ (counting $\inf\emptyset$ as $\infty$). Show that for any $p\in[1,\infty[$ the set $\{f^\bullet:f\in C_k(G)\}$ is a dense linear subspace of $L^p(\mu)$.
(k) Let $U$ be any Hilbert space. (i) Show that if $C\subseteq U$ is convex (that is, $tu+(1-t)v\in C$ whenever $u$, $v\in C$ and $t\in[0,1]$; see 233Xd), closed and not empty, and $u\in U$, then there is a unique $v\in C$ such that $\|u-v\|=\inf_{w\in C}\|u-w\|$, and $(u-v|v-w)\ge 0$ for every $w\in C$. (ii) Show that if $h\in U^*$ there is a unique $v\in U$ such that $h(w)=(w|v)$ for every $w\in U$. (Hint: apply (i) with $C=\{w:h(w)=1\}$, $u=0$.) (iii) Show that if $V\subseteq U$ is a closed linear subspace then there is a unique linear projection $P$ on $U$ such that $P[U]=V$ and $(u-Pu|v)=0$ for all $u\in U$, $v\in V$ ($P$ is 'orthogonal'). (Hint: take $Pu$ to be the point of $V$ nearest to $u$.)
(l) Let $(X,\Sigma,\mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Use part (iii) of 244Yk to show that there is an orthogonal projection $P:L^2(\mu)\to L^2(\mu{\restriction}\mathrm{T})$ such that $\int_F Pu=\int_F u$ for every $u\in L^2(\mu)$ and $F\in\mathrm{T}$. Show that $Pu\ge 0$ whenever $u\ge 0$ and that $\int Pu=\int u$ for every $u$, so that $P$ has a unique extension to a continuous operator from $L^1(\mu)$ onto $L^1(\mu{\restriction}\mathrm{T})$. Use this to develop the theory of conditional expectations without using the Radon-Nikodým theorem.
(m) (Roselli & Willem 02) (i) Set $C=[0,\infty[^2\subseteq\mathbb{R}^2$. Let $\phi:C\to\mathbb{R}$ be a continuous function such that $\phi(tz)=t\phi(z)$ for all $z\in C$. Show that $\phi$ is convex (definition: 233Xd) iff $t\mapsto\phi(1,t):[0,\infty[\to\mathbb{R}$ is convex. (ii) Show that if $p\in\,]1,\infty[$ and $q=\frac{p}{p-1}$ then $(s,t)\mapsto-s^{1/p}t^{1/q}$, $(s,t)\mapsto-(s^{1/p}+t^{1/p})^p:C\to\mathbb{R}$ are convex. (iii) Show that if $p\in[1,2]$ then $(s,t)\mapsto|s^{1/p}+t^{1/p}|^p+|s^{1/p}-t^{1/p}|^p$ is convex. (iv) Show that if $p\in[2,\infty[$ then $(s,t)\mapsto-|s^{1/p}+t^{1/p}|^p-|s^{1/p}-t^{1/p}|^p$ is convex. (v) Use (ii) and 233Yj to prove Hölder's and Minkowski's inequalities. (vi) Use (iii) and (iv) to prove Hanner's inequalities. (vii) Adapt the method to answer (ii) and (iii) of 244Yc.
(n)(i) Show that any inner product space is uniformly convex. (ii) Let $U$ be a uniformly convex Banach space, $C\subseteq U$ a non-empty closed convex set, and $u\in U$. Show that there is a unique $v_0\in C$ such that $\|u-v_0\|=\inf_{v\in C}\|u-v\|$. (iii) Let $U$ be a uniformly convex Banach space, and $A\subseteq U$ a non-empty bounded set. Set $\delta_0=\inf\{\delta:$ there is some $u\in U$ such that $A\subseteq B(u,\delta)=\{v:\|v-u\|\le\delta\}\}$. Show that there is a unique $u_0\in U$ such that $A\subseteq B(u_0,\delta_0)$.
(o) Let $(X,\Sigma,\mu)$ be a measure space, and $u\in L^0(\mu)$. Suppose that $\langle p_n\rangle_{n\in\mathbb{N}}$ is a sequence in $[1,\infty]$ with limit $p\in[1,\infty]$. Show that if $\limsup_{n\to\infty}\|u\|_{p_n}$ is finite then $\lim_{n\to\infty}\|u\|_{p_n}$ is defined and is equal to $\|u\|_p$.
244 Notes and comments At this point I feel we must leave the investigation of further function spaces. The next stage would have to be a systematic abstract analysis of general Banach lattices. The $L^p$ spaces give a solid foundation for such an analysis, since they introduce the basic themes of norm-completeness, order-completeness and identification of dual spaces. I have tried in the exercises to suggest the importance of the next layer of concepts: order-continuity of norms and the relationship between norm-boundedness and order-boundedness. What I have not had space to discuss is the subject of order-preserving linear operators between Riesz spaces, which is the key to understanding the order structure of the dual spaces here. (But you can make a start by re-reading the theory of conditional expectation operators in 242J-242L, 243J and 244M.) All these topics are treated in Fremlin 74 and in Chapters 35 and 36 of the next volume.
I remember that one of my early teachers of analysis said that the $L^p$ spaces (for $p\ne 1$, $2$, $\infty$) had somehow got into the syllabus and had never been got out again. I would myself call them classics, in the sense that they have been part of the common experience of all functional analysts since functional analysis began; and while you are at liberty to dislike them, you can no more ignore them than you can ignore Milton if you are studying English poetry. Hölder's inequality, in particular, has a wealth of applications; not only 244F and 244K, but also 244Xc-244Xd and 244Yh-244Yi, for instance.
The $L^p$ spaces, for $1\le p\le\infty$, form a kind of continuum. In terms of the concepts dealt with here, there is no distinction to be drawn between different $L^p$ spaces for $1 < p < \infty$ except the observation that the norm of $L^2$ is an inner product norm, corresponding to a Euclidean geometry on its finite-dimensional subspaces. To discriminate between the other $L^p$ spaces we need much more refined concepts in the geometry of normed spaces.
In terms of the theorems given here, $L^1$ seems closer to the middle range of $L^p$ for $1 < p < \infty$ than $L^\infty$ does; thus, for all $1\le p < \infty$, we have $L^p$ Dedekind complete (independent of the measure space involved), the space $S$ of equivalence classes of simple functions is dense in $L^p$ (again, for every measure space), and the dual $(L^p)^*$ is (almost) identifiable as another function space. All of these should be regarded as consequences in one way or another of the order-continuity of the norm of $L^p$ for $p < \infty$. The chief obstacle to the universal identification of $(L^1)^*$ with $L^\infty$ is that for non-$\sigma$-finite measure spaces the space $L^\infty$ can be inadequate, rather than any pathology in the $L^1$ space itself. (This point, at least, I mean to return to in Volume 3.) There is also the point that for a non-semi-finite measure space the purely infinite sets can contribute to $L^\infty$ without any corresponding contribution to $L^1$. For $1 < p < \infty$, neither of these problems can arise. Any member of any such $L^p$ is supported entirely by a $\sigma$-finite part of the measure space, and the same applies to the dual; see part (c) of the proof of 244K.
Of course $L^1$ does have a markedly different geometry from the other $L^p$ spaces. The first sign of this is that it is not reflexive as a Banach space (except when it is finite-dimensional), whereas for $1 < p < \infty$ the identifications of $(L^p)^*$ with $L^q$ and of $(L^q)^*$ with $L^p$, where $q=p/(p-1)$, show that the canonical embedding of $L^p$ in $(L^p)^{**}$ is surjective, that is, that $L^p$ is reflexive. But even when $L^1$ is finite-dimensional the unit balls of $L^1$ and $L^\infty$ are clearly different in kind from the unit balls of $L^p$ for $1 < p < \infty$; they have corners instead of being smoothly rounded (244Xh-244Xi). A direct expression of the difference is in 244O. As usual, the case $p=2$ is both much more important than the general case and enormously easier (244Yn(i)); and note how Hanner's inequalities reverse at $p=2$. (See 244Yc for the reversal of Hölder's and Minkowski's inequalities at $p=1$.) There are occasions on which it is useful to know that $\|\,\|_1$ and $\|\,\|_\infty$ can be approximated, in an exactly describable way, by uniformly convex norms (244Yo). I have written out a proof of 244O based on ingenuity and advanced calculus, like that of 244E. With a bit more about convex sets and functions, sketched in 233Yf-233Yj, there is a striking alternative proof (244Ym). Of course the proof of 244Ea also uses convexity, upside down.
The proof of 244K, identifying $(L^p)^*$, is a fairly long haul, and it is natural to ask whether we really have to work so hard, especially since in the case of $L^2$ we have a much easier argument (244Yk). Of course we can go faster if we know a bit more about Banach lattices (§369 in Volume 3 has the relevant facts), though this route uses some theorems quite as hard as 244K as given. There are alternative routes using the geometry of the $L^p$ spaces, following the ideas of 244Yk, but I do not think they are any easier, and the argument I have presented here at least has the virtue of using some of the same ideas as the identification of $(L^1)^*$ in 243G. The difference is that whereas in 243G we may have to piece together a large family of functions $g_F$ (part (b-v) of the proof), in 244K there are only countably many $g_n$; consequently we can make the argument work for arbitrary measure spaces, not just localizable ones.
The geometry of Hilbert space gives us an approach to conditional expectations which does not depend on the Radon-Nikodým theorem (244Yl). To turn these ideas into a proof of the Radon-Nikodým theorem itself, however, requires qualities of determination and ingenuity which can be better employed elsewhere.
The convexity arguments of 233J/242K can be used on many operators besides conditional expectations (see 244Xm). The class $\mathcal{T}^+$ described there is not in fact the largest for which these arguments work; I take the ideas farther in Chapter 37. There is also a great deal more to be said if you put an arbitrary pair of $L^p$ spaces in place of $L^1$ and $L^\infty$ in 244Xl. 244Yh is a start, but for the real thing (the 'Riesz convexity theorem') I refer you to Zygmund 59, XII.1.11 or Dunford & Schwartz 57, VI.10.11.
245 Convergence in measure
I come now to an important and interesting topology on the spaces $\mathcal{L}^0$ and $L^0$. I start with the definition (245A) and with properties which echo those of the $L^p$ spaces for $p\ge 1$ (245D-245E). In 245G-245J I describe the most useful relationships between this topology and the norm topologies of the $L^p$ spaces. For $\sigma$-finite spaces, it is metrizable (245Eb) and sequential convergence can be described in terms of pointwise convergence of sequences of functions (245K-245L).
245A Definitions Let $(X,\Sigma,\mu)$ be a measure space.
(a) For any measurable set $F\subseteq X$ of finite measure, we have a functional $\tau_F$ on $\mathcal{L}^0=\mathcal{L}^0(\mu)$ defined by setting
$$\tau_F(f)=\int|f|\wedge\chi F$$
for every $f\in\mathcal{L}^0$. (The integral exists in $\mathbb{R}$ because $|f|\wedge\chi F$ belongs to $\mathcal{L}^0$ and is dominated by the integrable function $\chi F$.) Now $\tau_F(f+g)\le\tau_F(f)+\tau_F(g)$ whenever $f$, $g\in\mathcal{L}^0$. PPP We need only observe that
$$\min(|(f+g)(x)|,\chi F(x))\le\min(|f(x)|,\chi F(x))+\min(|g(x)|,\chi F(x))$$
for every $x\in\operatorname{dom}f\cap\operatorname{dom}g$, which is almost every $x\in X$. QQQ Consequently, setting $\rho_F(f,g)=\tau_F(f-g)$, we have
$$\rho_F(f,h)=\tau_F((f-g)+(g-h))\le\tau_F(f-g)+\tau_F(g-h)=\rho_F(f,g)+\rho_F(g,h),$$
$$\rho_F(f,g)=\tau_F(f-g)\ge 0,$$
$$\rho_F(f,g)=\tau_F(f-g)=\tau_F(g-f)=\rho_F(g,f)$$
for all $f$, $g$, $h\in\mathcal{L}^0$; that is, $\rho_F$ is a pseudometric on $\mathcal{L}^0$.
(b) The family
$$\{\rho_F:F\in\Sigma,\ \mu F < \infty\}$$
now defines a topology on $\mathcal{L}^0$ (2A3F); I will call it the topology of convergence in measure on $\mathcal{L}^0$.
(c) If $f$, $g\in\mathcal{L}^0$ and $f=_{\text{a.e.}}g$, then $|f|\wedge\chi F=_{\text{a.e.}}|g|\wedge\chi F$ and $\tau_F(f)=\tau_F(g)$, for every set $F$ of finite measure. Consequently we have functionals $\bar\tau_F$ on $L^0=L^0(\mu)$ defined by writing
$$\bar\tau_F(f^\bullet)=\tau_F(f)$$
whenever $f\in\mathcal{L}^0$, $F\in\Sigma$ and $\mu F < \infty$. Corresponding to these we have pseudometrics $\bar\rho_F$ defined by either of the formulae
$$\bar\rho_F(u,v)=\bar\tau_F(u-v),\qquad\bar\rho_F(f^\bullet,g^\bullet)=\rho_F(f,g)$$
for $u$, $v\in L^0$, $f$, $g\in\mathcal{L}^0$ and $F$ of finite measure. The family of these pseudometrics defines the topology of convergence in measure on $L^0$.
(d) I shall allow myself to say that a sequence (in $\mathcal{L}^0$ or $L^0$) converges in measure if it converges for the topology of convergence in measure (in the sense of 2A3M).
245B Remarks (a) Of course the topologies of $\mathcal{L}^0$, $L^0$ are about as closely related as it is possible for them to be. Not only is the topology of $L^0$ the quotient of the topology on $\mathcal{L}^0$ (that is, a set $G\subseteq L^0$ is open iff $\{f:f^\bullet\in G\}$ is open in $\mathcal{L}^0$), but every open set in $\mathcal{L}^0$ is the inverse image under the quotient map of an open set in $L^0$.
(b) It is convenient to note that if $F_0,\ldots,F_n$ are measurable sets of finite measure with union $F$, then, in the notation of 245A, $\tau_{F_i}\le\tau_F$ for every $i$; this means that a set $G\subseteq\mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f\in G$ we can find a single set $F$ of finite measure and a $\delta > 0$ such that
$$\rho_F(g,f)\le\delta\implies g\in G.$$
Similarly, a set $G\subseteq L^0$ is open for the topology of convergence in measure iff for every $u\in G$ we can find a set $F$ of finite measure and a $\delta > 0$ such that
$$\bar\rho_F(v,u)\le\delta\implies v\in G.$$
(c) The phrase 'topology of convergence in measure' agrees well enough with standard usage when $(X,\Sigma,\mu)$ is totally finite. But a warning! the phrase 'topology of convergence in measure' is also used for the topology defined by the metric of 245Ye below, even when $\mu X=\infty$. I have seen the phrase local convergence in measure used for the topology of 245A. Most authors ignore non-$\sigma$-finite spaces in this context. However I hold that 245D-245E below are of sufficient interest to make the extension worth while.
245C Pointwise convergence The topology of convergence in measure is almost definable in terms of 'pointwise convergence', which is one of the roots of measure theory. The correspondence is closest in $\sigma$-finite measure spaces (see 245K), but there is still a very important relationship in the general case, as follows. Let $(X,\Sigma,\mu)$ be a measure space, and write $\mathcal{L}^0=\mathcal{L}^0(\mu)$, $L^0=L^0(\mu)$.
(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ converging almost everywhere to $f\in\mathcal{L}^0$, then $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ in measure. PPP By 2A3Mc, I have only to show that $\lim_{n\to\infty}\rho_F(f_n,f)=0$ whenever $\mu F < \infty$. But $\langle|f_n-f|\wedge\chi F\rangle_{n\in\mathbb{N}}$ converges to 0 a.e. and is dominated by the integrable function $\chi F$, so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty}\rho_F(f_n,f)=\lim_{n\to\infty}\int|f_n-f|\wedge\chi F=0.\ \text{QQQ}$$
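(b) To formulate a corresponding result applicable to $L^0$, we need the following concept. If $\langle f_n\rangle_{n\in\mathbb{N}}$, $\langle g_n\rangle_{n\in\mathbb{N}}$ are sequences in $\mathcal{L}^0$ such that $f_n^\bullet=g_n^\bullet$ for every $n$, and $f$, $g\in\mathcal{L}^0$ are such that $f^\bullet=g^\bullet$, and $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ a.e., then $\langle g_n\rangle_{n\in\mathbb{N}}\to g$ a.e., because
$$\Bigl\{x:x\in\operatorname{dom}f\cap\operatorname{dom}g\cap\bigcap_{n\in\mathbb{N}}\operatorname{dom}f_n\cap\bigcap_{n\in\mathbb{N}}\operatorname{dom}g_n,\ g(x)=f(x)=\lim_{n\to\infty}f_n(x),\ f_n(x)=g_n(x)\ \forall\,n\in\mathbb{N}\Bigr\}$$
is conegligible. Consequently we have a definition applicable to sequences in $L^0$; we can say that, for $f$, $f_n\in\mathcal{L}^0$, $\langle f_n^\bullet\rangle_{n\in\mathbb{N}}$ is order*-convergent, or order*-converges, to $f^\bullet$ iff $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$. In this case, of course, $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ in measure. Thus, in $L^0$, a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ which order*-converges to $u\in L^0$ also converges to $u$ in measure.
Remark I suggest alternative descriptions of order*-convergence in 245Xc; the conditions (iii)-(vi) there are in forms adapted to more general structures.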
(c) For a typical example of a sequence which is convergent in measure without being order*-convergent, consider the following. Take $\mu$ to be Lebesgue measure on $[0,1]$, and set $f_n(x)=2^m$ if $x\in[2^{-m}k,2^{-m}(k+1)]$, $0$ otherwise, where $k=k(n)\in\mathbb{N}$, $m=m(n)\in\mathbb{N}$ are defined by saying that $n+1=2^m+k$ and $0\le k < 2^m$. Then $\langle f_n\rangle_{n\in\mathbb{N}}\to 0$ for the topology of convergence in measure (since $\rho_F(f_n,0)\le 2^{-m}$ if $F\subseteq[0,1]$ is measurable and $2^m-1\le n$), though $\langle f_n\rangle_{n\in\mathbb{N}}$ is not convergent to 0 almost everywhere (indeed, $\limsup_{n\to\infty}f_n=\infty$ everywhere).
245D Proposition Let $(X,\Sigma,\mu)$ be any measure space.
(a) The topology of convergence in measure is a linear space topology on $L^0=L^0(\mu)$.
(b) The maps $\vee$, $\wedge:L^0\times L^0\to L^0$, and $u\mapsto|u|$, $u\mapsto u^+$, $u\mapsto u^-:L^0\to L^0$ are all continuous.
(c) The map $\times:L^0\times L^0\to L^0$ is continuous.
(d) For any continuous function $h:\mathbb{R}\to\mathbb{R}$, the corresponding function $\bar h:L^0\to L^0$ (241I) is continuous.
proof (a) The point is that the functionals $\bar\tau_F$, as defined in 245Ac, satisfy the conditions of 2A5B below. PPP Fix a set $F\in\Sigma$ of finite measure. I noted in 245Aa that
$$\tau_F(f+g)\le\tau_F(f)+\tau_F(g)\text{ for all }f,\ g\in\mathcal{L}^0,$$
so
$$\bar\tau_F(u+v)\le\bar\tau_F(u)+\bar\tau_F(v)\text{ for all }u,\ v\in L^0.$$
Next,
$$\bar\tau_F(cu)\le\bar\tau_F(u)\text{ whenever }u\in L^0\text{ and }|c|\le 1\qquad(*)$$
because $|cf|\wedge\chi F\le_{\text{a.e.}}|f|\wedge\chi F$ whenever $f\in\mathcal{L}^0$ and $|c|\le 1$. Finally, given $u\in L^0$ and $\epsilon > 0$, let $f\in\mathcal{L}^0$ be such that $f^\bullet=u$. Then
$$\lim_{n\to\infty}|2^{-n}f|\wedge\chi F=0\text{ a.e.},$$
so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty}\bar\tau_F(2^{-n}u)=\lim_{n\to\infty}\int|2^{-n}f|\wedge\chi F=0,$$
and there is an $n$ such that $\bar\tau_F(2^{-n}u)\le\epsilon$. It follows (by (*) just above) that $\bar\tau_F(cu)\le\epsilon$ whenever $|c|\le 2^{-n}$. As $\epsilon$ is arbitrary, $\lim_{c\to 0}\bar\tau_F(cu)=0$ for every $u\in L^0$; which is the third condition in 2A5B. QQQ
Now 2A5B tells us that the topology defined by the $\bar\tau_F$ is a linear space topology.
(b) For any $u$, $v\in L^0$, $\bigl||u|-|v|\bigr|\le|u-v|$, so $\bar\rho_F(|u|,|v|)\le\bar\rho_F(u,v)$ for every set $F$ of finite measure. By 2A3H, $|\ |:L^0\to L^0$ is continuous. Now
$$u\vee v=\tfrac12(u+v+|u-v|),\qquad u\wedge v=\tfrac12(u+v-|u-v|),$$
$$u^+=u\vee 0,\qquad u^-=(-u)\vee 0.$$
As addition and subtraction are continuous, so are $\vee$, $\wedge$, ${}^+$ and ${}^-$.
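(c) Take $u_0$, $v_0\in L^0$, $F\in\Sigma$ a set of finite measure and $\epsilon > 0$. Represent $u_0$ and $v_0$ as $f_0^\bullet$, $g_0^\bullet$ respectively, where $f_0$, $g_0:X\to\mathbb{R}$ are $\Sigma$-measurable (241Bk). If we set
$$F_m=\{x:x\in F,\ |f_0(x)|+|g_0(x)|\le m\},$$
then $\langle F_m\rangle_{m\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $F$, so there is an $m\in\mathbb{N}$ such that $\mu(F\setminus F_m)\le\frac12\epsilon$. Let $\delta > 0$ be such that $(2m+\mu F)\delta^2+2\delta\le\frac12\epsilon$.
Now suppose that $u$, $v\in L^0$ are such that $\bar\rho_F(u,u_0)\le\delta^2$ and $\bar\rho_F(v,v_0)\le\delta^2$. Let $f$, $g:X\to\mathbb{R}$ be measurable functions such that $f^\bullet=u$ and $g^\bullet=v$. Then
$$\mu\{x:x\in F,\ |f(x)-f_0(x)|\ge\delta\}\le\delta,\qquad\mu\{x:x\in F,\ |g(x)-g_0(x)|\ge\delta\}\le\delta,$$
so that
$$\mu\{x:x\in F,\ |f(x)-f_0(x)||g(x)-g_0(x)|\ge\delta^2\}\le 2\delta$$
and
$$\int_F\min(1,|f-f_0|\times|g-g_0|)\le\delta^2\mu F+2\delta.$$
Also
$$|f\times g-f_0\times g_0|\le|f-f_0|\times|g-g_0|+|f_0|\times|g-g_0|+|f-f_0|\times|g_0|,$$
so that
$$\begin{aligned}
\bar\rho_F(u\times v,u_0\times v_0)&=\int_F\min(1,|f\times g-f_0\times g_0|)\\
&\le\tfrac12\epsilon+\int_{F_m}\min(1,|f\times g-f_0\times g_0|)\\
&\le\tfrac12\epsilon+\int_{F_m}\min(1,|f-f_0|\times|g-g_0|+m|g-g_0|+m|f-f_0|)\\
&\le\tfrac12\epsilon+\int_F\min(1,|f-f_0|\times|g-g_0|)+m\int_F\min(1,|g-g_0|)+m\int_F\min(1,|f-f_0|)\\
&\le\tfrac12\epsilon+\delta^2\mu F+2\delta+2m\delta^2\le\epsilon.
\end{aligned}$$
As $F$ and $\epsilon$ are arbitrary, $\times$ is continuous at $(u_0,v_0)$; as $u_0$ and $v_0$ are arbitrary, $\times$ is continuous.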
(d) Take $u\in L^0$, $F\in\Sigma$ of finite measure and $\epsilon > 0$. Then there is a $\delta > 0$ such that $\bar\rho_F(\bar h(v),\bar h(u))\le\epsilon$ whenever $\bar\rho_F(v,u)\le\delta$. PPP??? Otherwise, we can find, for each $n\in\mathbb{N}$, a $v_n$ such that $\bar\rho_F(v_n,u)\le 4^{-n}$ but $\bar\rho_F(\bar h(v_n),\bar h(u)) > \epsilon$. Express $u$ as $f^\bullet$ and $v_n$ as $g_n^\bullet$ where $f$, $g_n:X\to\mathbb{R}$ are measurable. Set
$$E_n=\{x:x\in F,\ |g_n(x)-f(x)|\ge 2^{-n}\}$$
for each $n$. Then $\bar\rho_F(v_n,u)\ge 2^{-n}\mu E_n$, so $\mu E_n\le 2^{-n}$ for each $n$, and $E=\bigcap_{n\in\mathbb{N}}\bigcup_{m\ge n}E_m$ is negligible. But $\lim_{n\to\infty}g_n(x)=f(x)$ for every $x\in F\setminus E$, so (because $h$ is continuous) $\lim_{n\to\infty}h(g_n(x))=h(f(x))$ for every $x\in F\setminus E$. Consequently (by Lebesgue's Dominated Convergence Theorem, as always)
$$\lim_{n\to\infty}\bar\rho_F(\bar h(v_n),\bar h(u))=\lim_{n\to\infty}\int_F\min(1,|h(g_n(x))-h(f(x))|)\,\mu(dx)=0,$$
which is impossible. XXXQQQ
By 2A3H, $\bar h$ is continuous.
Remark I cannot say that the topology of convergence in measure on $\mathcal{L}^0$ is a linear space topology solely because (on the definitions I have chosen) $\mathcal{L}^0$ is not in general a linear space.
245E I turn now to the principal theorem relating the properties of the topological linear space $L^0(\mu)$ to the classification of measure spaces in Chapter 21.
Theorem Let $(X,\Sigma,\mu)$ be a measure space. Let $\mathfrak{T}$ be the topology of convergence in measure on $L^0=L^0(\mu)$, as described in 245A.
(a) $(X,\Sigma,\mu)$ is semi-finite iff $\mathfrak{T}$ is Hausdorff.
(b) $(X,\Sigma,\mu)$ is $\sigma$-finite iff $\mathfrak{T}$ is metrizable.
(c) $(X,\Sigma,\mu)$ is localizable iff $\mathfrak{T}$ is Hausdorff and $L^0$ is complete under $\mathfrak{T}$.
proof I use the pseudometrics $\rho_F$ on $\mathcal{L}^0=\mathcal{L}^0(\mu)$, $\bar\rho_F$ on $L^0$ described in 245A.
(a)(i) Suppose that (X, , ) is semi-nite and that u, v are distinct members of L
0
. Express them as f

and g

where f and g are measurable functions from X to R. Then x : f(x) ,= g(x) > 0 so, because (X, , ) is semi-nite,
there is a set F of nite measure such that x : x F, f(x) ,= g(x) > 0. Now

F
(u, v) =
_
F
min([f(x) g(x)[, 1)dx > 0
(see 122Rc). As u and v are arbitrary, T is Hausdor (2A3L).
(ii) Suppose that T is Hausdor and that E , E > 0. Then u = E

,= 0 so there is an F such that


F < and
F
(u, 0) ,= 0, that is, (E F) > 0. Now E F is a non-negligible set of nite measure included in E.
As E is arbitrary, (X, , ) is semi-nite.
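To see concretely why semi-finiteness is needed in (a), here is a minimal sketch on a toy space that is an assumption of this illustration, not taken from the text: $X = \{0, 1\}$ with $\mu\{0\} = 1$ and $\mu\{1\} = \infty$. The indicator of $\{1\}$ is not negligible, yet every pseudometric $\rho_F$ with $\mu F < \infty$ fails to separate it from $0$. The names `point_mass`, `rho` and `finite_measure_sets` are hypothetical helpers.

```python
from itertools import combinations
import math

# Hypothetical toy space: X = {0, 1}, every subset measurable,
# mu{0} = 1 and mu{1} = infinity, so {1} has no subset of finite positive measure.
X = [0, 1]
point_mass = {0: 1.0, 1: math.inf}


def mu(S):
    """Measure of a finite set of points."""
    return sum(point_mass[x] for x in S)


def rho(F, f, g):
    """rho_F(f, g) = integral over F of min(1, |f - g|), for mu(F) finite."""
    return sum(min(1.0, abs(f[x] - g[x])) * point_mass[x] for x in F)


def finite_measure_sets():
    """Yield every subset of X whose measure is finite."""
    for r in range(len(X) + 1):
        for S in combinations(X, r):
            if mu(S) < math.inf:
                yield S


chi_1 = {0: 0.0, 1: 1.0}  # indicator function of the point 1
zero = {0: 0.0, 1: 0.0}

# Supremum over all finite-measure F of rho_F(chi_1, 0).
separation = max(rho(F, chi_1, zero) for F in finite_measure_sets())
```

Here `separation` is zero although `chi_1` differs from `zero` on a set of infinite measure, which is the failure of the Hausdorff property that part (a)(ii) rules out for semi-finite spaces.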
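(b)(i) Suppose that $(X, \Sigma, \mu)$ is $\sigma$-finite. Let $\langle E_n \rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$. Set
$$\bar\rho(u, v) = \sum_{n=0}^{\infty} \frac{\bar\rho_{E_n}(u, v)}{1 + 2^n \mu E_n}$$
for $u, v \in L^0$. Then $\bar\rho$ is a metric on $L^0$. PPP Because every $\bar\rho_{E_n}$ is a pseudometric, so is $\bar\rho$. If $\bar\rho(u, v) = 0$, express $u$ as $f^\bullet$, $v$ as $g^\bullet$ where $f, g \in \mathcal{L}^0(\mu)$; then
$$\int |f - g| \wedge \chi E_n = \bar\rho_{E_n}(u, v) = 0,$$
so $f = g$ almost everywhere in $E_n$, for every $n$. Because $X = \bigcup_{n\in\mathbb{N}} E_n$, $f =_{\text{a.e.}} g$ and $u = v$. QQQ
If $F \in \Sigma$ and $\mu F < \infty$ and $\epsilon > 0$, take $n$ such that $\mu(F \setminus E_n) \le \tfrac12\epsilon$. If $u, v \in L^0$ and $\bar\rho(u, v) \le \epsilon / 2(1 + 2^n \mu E_n)$, then $\bar\rho_F(u, v) \le \epsilon$. PPP Express $u$ as $f^\bullet$, $v = g^\bullet$ where $f, g \in \mathcal{L}^0$. Then
$$\int |u - v| \wedge \chi E_n^\bullet = \bar\rho_{E_n}(u, v) \le (1 + 2^n \mu E_n)\,\bar\rho(u, v) \le \frac{\epsilon}{2},$$
while
$$\int |f - g| \wedge \chi(F \setminus E_n) \le \mu(F \setminus E_n) \le \frac{\epsilon}{2},$$
so
$$\bar\rho_F(u, v) = \int |f - g| \wedge \chi F \le \int |f - g| \wedge \chi E_n + \int |f - g| \wedge \chi(F \setminus E_n) \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$ QQQ
In the other direction, given $\epsilon > 0$, take $n \in \mathbb{N}$ such that $2^{-n} \le \tfrac12\epsilon$; then $\bar\rho(u, v) \le \epsilon$ whenever $\bar\rho_{E_n}(u, v) \le \epsilon / 2(n+1)$.
These show that $\bar\rho$ defines the same topology as the $\bar\rho_F$ (2A3Ib), so that $\mathfrak{T}$, the topology defined by the $\bar\rho_F$, is metrizable.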
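The weighted series used in (b)(i) can be tried out numerically. The sketch below assumes a specific instance that is not in the text: counting measure on $\mathbb{N}$ with $E_n = \{0, \ldots, n\}$, so that $\mu E_n = n + 1$, and it evaluates only a partial sum of the series. The function names and the cut-off `terms` are hypothetical.

```python
def rho_E(n, u, v):
    """rho_{E_n}(u, v) for counting measure on N with E_n = {0, ..., n}."""
    return sum(min(1.0, abs(u(i) - v(i))) for i in range(n + 1))


def rho(u, v, terms=40):
    """Partial sum of sum_n rho_{E_n}(u, v) / (1 + 2^n * mu(E_n)), mu(E_n) = n + 1."""
    return sum(rho_E(n, u, v) / (1 + 2 ** n * (n + 1)) for n in range(terms))


def zero(i):
    """The zero function on N."""
    return 0.0


def spike(k):
    """Indicator function of the single point k."""
    return lambda i: 1.0 if i == k else 0.0


# Distances from 0 of spikes that move off to infinity.
distances = [rho(spike(k), zero) for k in range(8)]
```

The entries of `distances` are positive and strictly decreasing, matching the fact that the spikes converge to $0$ in measure for counting measure even though each has integral $1$.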
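(ii) Suppose that $\mathfrak{T}$ is metrizable. Let $\rho$ be a metric defining $\mathfrak{T}$. For each $n \in \mathbb{N}$ there must be a measurable set $F_n$ of finite measure and a $\delta_n > 0$ such that
$$\bar\rho_{F_n}(u, 0) \le \delta_n \implies \rho(u, 0) \le 2^{-n}.$$
Set $E = X \setminus \bigcup_{n\in\mathbb{N}} F_n$. ??? If $E$ is not negligible, then $u = \chi E^\bullet \ne 0$; because $\rho$ is a metric, there is an $n \in \mathbb{N}$ such that $\rho(u, 0) > 2^{-n}$; now
$$\mu(E \cap F_n) = \bar\rho_{F_n}(u, 0) > \delta_n.$$
But $E \cap F_n = \emptyset$. XXX
So $\mu E = 0 < \infty$. Now $X = E \cup \bigcup_{n\in\mathbb{N}} F_n$ is a countable union of sets of finite measure, so $\mu$ is $\sigma$-finite.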
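(c) By (a), either hypothesis ensures that $\mu$ is semi-finite and that $\mathfrak{T}$ is Hausdorff.
(i) Suppose that $(X, \Sigma, \mu)$ is localizable. Let $\mathcal{F}$ be a Cauchy filter on $L^0$ (2A5F). For each measurable set $F$ of finite measure, choose a sequence $\langle A_n(F) \rangle_{n\in\mathbb{N}}$ of members of $\mathcal{F}$ such that $\bar\rho_F(u, v) \le 4^{-n}$ for every $u, v \in A_n(F)$ and every $n$ (2A5G). Choose $u_{Fn} \in \bigcap_{k\le n} A_k(F)$ for each $n$; then $\bar\rho_F(u_{F,n+1}, u_{Fn}) \le 4^{-n}$ for each $n$, because both belong to $A_n(F)$. Express each $u_{Fn}$ as $f_{Fn}^\bullet$ where $f_{Fn}$ is a measurable function from $X$ to $\mathbb{R}$. Then
$$\mu\{x : x \in F,\ |f_{F,n+1}(x) - f_{Fn}(x)| \ge 2^{-n}\} \le 2^n \bar\rho_F(u_{F,n+1}, u_{Fn}) \le 2^{-n}$$
for each $n$. It follows that $\langle f_{Fn} \rangle_{n\in\mathbb{N}}$ must converge almost everywhere in $F$. PPP Set
$$H_n = \{x : x \in F,\ |f_{F,n+1}(x) - f_{Fn}(x)| \ge 2^{-n}\}.$$
Then $\mu H_n \le 2^{-n}$ for each $n$, so
$$\mu\Big(\bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} H_m\Big) \le \inf_{n\in\mathbb{N}} \sum_{m=n}^{\infty} 2^{-m} = 0.$$
If $x \in F \setminus \bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} H_m$, then there is some $k$ such that $x \in F \setminus \bigcup_{m\ge k} H_m$, so that $|f_{F,m+1}(x) - f_{Fm}(x)| \le 2^{-m}$ for every $m \ge k$, and $\langle f_{Fn}(x) \rangle_{n\in\mathbb{N}}$ is Cauchy, therefore convergent. QQQ
Set $f_F(x) = \lim_{n\to\infty} f_{Fn}(x)$ for every $x \in F$ such that the limit is defined in $\mathbb{R}$, so that $f_F$ is measurable and defined almost everywhere in $F$.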
If $E$, $F$ are two sets of finite measure and $E \subseteq F$, then $\bar\rho_E(u_{En}, u_{Fn}) \le 2 \times 4^{-n}$ for each $n$. PPP $A_n(E)$ and $A_n(F)$ both belong to $\mathcal{F}$, so must have a point $w$ in common; now
$$\bar\rho_E(u_{En}, u_{Fn}) \le \bar\rho_E(u_{En}, w) + \bar\rho_E(w, u_{Fn}) \le \bar\rho_E(u_{En}, w) + \bar\rho_F(w, u_{Fn}) \le 4^{-n} + 4^{-n}.$$ QQQ
Consequently
$$\mu\{x : x \in E,\ |f_{Fn}(x) - f_{En}(x)| \ge 2^{-n}\} \le 2^n \bar\rho_E(u_{Fn}, u_{En}) \le 2^{-n+1}$$
for each $n$, and $\lim_{n\to\infty} f_{Fn} - f_{En} = 0$ almost everywhere in $E$; so that $f_E = f_F$ a.e. on $E$.
Consequently, if $E$ and $F$ are any two sets of finite measure, $f_E = f_F$ a.e. on $E \cap F$, because both are equal almost everywhere on $E \cap F$ to $f_{E\cap F}$.
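Because $\mu$ is localizable, it follows that there is an $f \in \mathcal{L}^0$ such that $f = f_E$ a.e. on $E$ for every measurable set $E$ of finite measure (213N). Consider $u = f^\bullet \in L^0$. For any set $E$ of finite measure,
$$\bar\rho_E(u, u_{En}) = \int_E \min(1, |f(x) - f_{En}(x)|)\,dx = \int_E \min(1, |f_E(x) - f_{En}(x)|)\,dx \to 0$$
as $n \to \infty$, using Lebesgue's Dominated Convergence Theorem. Now
$$\begin{aligned}
\inf_{A\in\mathcal{F}} \sup_{v\in A} \bar\rho_E(v, u) &\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \bar\rho_E(v, u) \\
&\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \big(\bar\rho_E(v, u_{En}) + \bar\rho_E(u, u_{En})\big) \\
&\le \inf_{n\in\mathbb{N}} \big(4^{-n} + \bar\rho_E(u, u_{En})\big) = 0.
\end{aligned}$$
As $E$ is arbitrary, $\mathcal{F} \to u$. As $\mathcal{F}$ is arbitrary, $L^0$ is complete under $\mathfrak{T}$.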
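(ii) Now suppose that $L^0$ is complete under $\mathfrak{T}$ and let $\mathcal{E}$ be any family of sets in $\Sigma$. Let $\mathcal{E}^*$ be
$$\{\textstyle\bigcup \mathcal{E}_0 : \mathcal{E}_0 \text{ is a finite subset of } \mathcal{E}\}.$$
Then the union of any two members of $\mathcal{E}^*$ belongs to $\mathcal{E}^*$. Let $\mathcal{F}$ be the set
$$\{A : A \subseteq L^0,\ A \supseteq A_E \text{ for some } E \in \mathcal{E}^*\},$$
where for $E \in \mathcal{E}^*$ I write
$$A_E = \{\chi F^\bullet : F \in \mathcal{E}^*,\ F \supseteq E\}.$$
Then $\mathcal{F}$ is a filter on $L^0$, because $A_E \cap A_F = A_{E\cup F}$ for all $E, F \in \mathcal{E}^*$.
In fact $\mathcal{F}$ is a Cauchy filter. PPP Let $H$ be any set of finite measure and $\epsilon > 0$. Set $\gamma = \sup_{E\in\mathcal{E}^*} \mu(H \cap E)$ and take $E \in \mathcal{E}^*$ such that $\mu(H \cap E) \ge \gamma - \epsilon$. Consider $A_E \in \mathcal{F}$. If $F, G \in \mathcal{E}^*$ and $F \supseteq E$, $G \supseteq E$ then
$$\bar\rho_H(\chi F^\bullet, \chi G^\bullet) = \mu(H \cap (F \triangle G)) = \mu(H \cap (F \cup G)) - \mu(H \cap F \cap G) \le \gamma - \mu(H \cap E) \le \epsilon.$$
Thus $\bar\rho_H(u, v) \le \epsilon$ for all $u, v \in A_E$. As $H$ and $\epsilon$ are arbitrary, $\mathcal{F}$ is Cauchy. QQQ
Because $L^0$ is complete under $\mathfrak{T}$, $\mathcal{F}$ has a limit $w$ say. Express $w$ as $h^\bullet$, where $h : X \to \mathbb{R}$ is measurable, and consider
$$G = \{x : h(x) > \tfrac12\}.$$
??? If $E \in \mathcal{E}$ and $\mu(E \setminus G) > 0$, let $F \subseteq E \setminus G$ be a set of non-zero finite measure. Then $\{u : \bar\rho_F(u, w) < \tfrac12\mu F\}$ belongs to $\mathcal{F}$, so meets $A_E$; let $H \in \mathcal{E}^*$ be such that $E \subseteq H$ and $\bar\rho_F(\chi H^\bullet, w) < \tfrac12\mu F$. Then
$$\int_F \min(1, |1 - h(x)|) = \bar\rho_F(\chi H^\bullet, w) < \tfrac12\mu F;$$
but because $F \cap G = \emptyset$, $1 - h(x) \ge \tfrac12$ for every $x \in F$, so this is impossible. XXX
Thus $E \setminus G$ is negligible for every $E \in \mathcal{E}$.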
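Now suppose that $H \in \Sigma$ and $\mu(G \cap H) > 0$. Then there is an $E \in \mathcal{E}$ such that $\mu(E \cap H) > 0$. PPP Let $F \subseteq G \cap H$ be a set of non-zero finite measure. Let $u \in A_\emptyset$ be such that $\bar\rho_F(u, w) < \tfrac12\mu F$. Then $u$ is of the form $\chi C^\bullet$ for some $C \in \mathcal{E}^*$, and
$$\int_F \min(1, |\chi C(x) - h(x)|) < \tfrac12\mu F.$$
As $h(x) \ge \tfrac12$ for every $x \in F$, $\mu(C \cap F) > 0$. But $C$ is a finite union of members of $\mathcal{E}$, so there is an $E \in \mathcal{E}$ such that $\mu(E \cap F) > 0$, and now $\mu(E \cap H) > 0$. QQQ
As $H$ is arbitrary, $G$ is an essential supremum of $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X, \Sigma, \mu)$ is localizable.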
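245F Alternative description of the topology of convergence in measure Let us return to arbitrary measure spaces $(X, \Sigma, \mu)$.
(a) For any $F \in \Sigma$ of finite measure and $\epsilon > 0$ define $\tau_{F\epsilon} : \mathcal{L}^0 \to [0, \infty[$ by
$$\tau_{F\epsilon}(f) = \mu^*\{x : x \in F \cap \operatorname{dom} f,\ |f(x)| > \epsilon\}$$
for $f \in \mathcal{L}^0$, taking $\mu^*$ to be the outer measure defined from $\mu$ (132B). If $f, g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then
$$\{x : x \in F \cap \operatorname{dom} f,\ |f(x)| > \epsilon\} \triangle \{x : x \in F \cap \operatorname{dom} g,\ |g(x)| > \epsilon\}$$
is negligible, so $\tau_{F\epsilon}(f) = \tau_{F\epsilon}(g)$; accordingly we have a functional from $L^0$ to $[0, \infty[$, given by
$$\bar\tau_{F\epsilon}(u) = \tau_{F\epsilon}(f)$$
whenever $f \in \mathcal{L}^0$ and $u = f^\bullet \in L^0$.
(b) Now $\tau_{F\epsilon}$ is not (except in trivial cases) subadditive, so does not define a pseudometric on $\mathcal{L}^0$ or $L^0$. But we can say that, for $f \in \mathcal{L}^0$,
$$\tau_F(f) \le \epsilon \min(1, \epsilon) \implies \tau_{F\epsilon}(f) \le \epsilon \implies \tau_F(f) \le \epsilon(1 + \mu F).$$
(The point is that if $E \subseteq \operatorname{dom} f$ is a measurable conegligible set such that $f \restriction E$ is measurable, then
$$\tau_F(f) = \int_{E\cap F} \min(|f(x)|, 1)\,dx, \qquad \tau_{F\epsilon}(f) = \mu\{x : x \in E \cap F,\ |f(x)| > \epsilon\}.)$$
This means that a set $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon, \delta > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \delta \implies g \in G.$$
Of course $\tau_{F\delta}(f) \ge \tau_{F\epsilon}(f)$ whenever $\delta \le \epsilon$, so we can equally say: $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \epsilon \implies g \in G.$$
Similarly, $G \subseteq L^0$ is open for the topology of convergence in measure on $L^0$ iff for every $u \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\bar\tau_{F\epsilon}(v - u) \le \epsilon \implies v \in G.$$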
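(c) It follows at once that a sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converges in measure to $f \in \mathcal{L}^0$ iff
$$\lim_{n\to\infty} \mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f_n(x) - f(x)| > \epsilon\} = 0$$
whenever $F \in \Sigma$, $\mu F < \infty$ and $\epsilon > 0$. Similarly, a sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u$ iff $\lim_{n\to\infty} \bar\tau_{F\epsilon}(u - u_n) = 0$ whenever $\mu F < \infty$ and $\epsilon > 0$.
(d) In particular, if $(X, \Sigma, \mu)$ is totally finite, $\langle f_n \rangle_{n\in\mathbb{N}} \to f$ in $\mathcal{L}^0$ iff
$$\lim_{n\to\infty} \mu^*\{x : x \in \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f(x) - f_n(x)| > \epsilon\} = 0$$
for every $\epsilon > 0$, and $\langle u_n \rangle_{n\in\mathbb{N}} \to u$ in $L^0$ iff
$$\lim_{n\to\infty} \bar\tau_{X\epsilon}(u - u_n) = 0$$
for every $\epsilon > 0$.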
245G Embedding $L^p$ in $L^0$: Proposition Let $(X, \Sigma, \mu)$ be any measure space. Then for any $p \in [1, \infty]$, the embedding of $L^p = L^p(\mu)$ in $L^0 = L^0(\mu)$ is continuous for the norm topology of $L^p$ and the topology of convergence in measure on $L^0$.
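proof Suppose that $u, v \in L^p$ and that $\mu F < \infty$. Then $(\chi F)^\bullet$ belongs to $L^q$, where $q = p/(p-1)$ (taking $q = 1$ if $p = \infty$, $q = \infty$ if $p = 1$ as usual), and
$$\bar\rho_F(u, v) \le \int |u - v| \times (\chi F)^\bullet \le \|u - v\|_p \, \|\chi F^\bullet\|_q$$
(244Eb). By 2A3H, this is enough to ensure that the embedding $u \mapsto u : L^p \to L^0$ is continuous.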
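245H The case of $L^1$ is so important that I go farther with it.
Proposition Let $(X, \Sigma, \mu)$ be a measure space.
(a)(i) If $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\int |f - g| \le \epsilon$ whenever $g \in \mathcal{L}^1$, $\int |g| \le \int |f| + \delta$ and $\rho_F(f, g) \le \delta$.
(ii) For any sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^1$ and any $f \in \mathcal{L}^1$, $\lim_{n\to\infty} \int |f - f_n| = 0$ iff $\langle f_n \rangle_{n\in\mathbb{N}} \to f$ in measure and $\limsup_{n\to\infty} \int |f_n| \le \int |f|$.
(b)(i) If $u \in L^1 = L^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\|u - v\|_1 \le \epsilon$ whenever $v \in L^1$, $\|v\|_1 \le \|u\|_1 + \delta$ and $\bar\rho_F(u, v) \le \delta$.
(ii) For any sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $L^1$ and any $u \in L^1$, $\langle u_n \rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$ iff $\langle u_n \rangle_{n\in\mathbb{N}} \to u$ in measure and $\limsup_{n\to\infty} \|u_n\|_1 \le \|u\|_1$.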
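proof (a)(i) We know that there are a set $F$ of finite measure and an $\eta > 0$ such that $\int_E |f| \le \tfrac15\epsilon$ whenever $\mu(E \cap F) \le \eta$ (225A). Take $\delta > 0$ such that $\delta(\epsilon + 5\mu F) \le \epsilon\eta$ and $\delta \le \tfrac15\epsilon$. Then if $\int |g| \le \int |f| + \delta$ and $\rho_F(f, g) \le \delta$, let $G \subseteq \operatorname{dom} f \cap \operatorname{dom} g$ be a conegligible measurable set such that $f \restriction G$ and $g \restriction G$ are both measurable. Set
$$E = \Big\{x : x \in F \cap G,\ |f(x) - g(x)| \ge \frac{\epsilon}{\epsilon + 5\mu F}\Big\};$$
then
$$\delta \ge \rho_F(f, g) \ge \frac{\epsilon}{\epsilon + 5\mu F}\,\mu E,$$
so $\mu E \le \eta$. Set $H = F \setminus E$, so that $\mu(F \setminus H) \le \eta$ and $\int_{X\setminus H} |f| \le \tfrac15\epsilon$. On the other hand, for almost every $x \in H$, $|f(x) - g(x)| \le \frac{\epsilon}{\epsilon + 5\mu F}$, so $\int_H |f - g| \le \tfrac15\epsilon$ and
$$\int_H |g| \ge \int_H |f| - \tfrac15\epsilon \ge \int |f| - \int_{X\setminus H} |f| - \tfrac15\epsilon \ge \int |f| - \tfrac25\epsilon.$$
Since $\int |g| \le \int |f| + \tfrac15\epsilon$, $\int_{X\setminus H} |g| \le \tfrac35\epsilon$. Now this means that
$$\int |g - f| \le \int_{X\setminus H} |g| + \int_{X\setminus H} |f| + \int_H |g - f| \le \tfrac35\epsilon + \tfrac15\epsilon + \tfrac15\epsilon = \epsilon,$$
as required.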
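(ii) If $\lim_{n\to\infty} \int |f - f_n| = 0$, that is, $\lim_{n\to\infty} f_n^\bullet = f^\bullet$ in $L^1$, then by 245G we must have $\langle f_n^\bullet \rangle_{n\in\mathbb{N}} \to f^\bullet$ in $L^0$, that is, $\langle f_n \rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure. Also, of course, $\lim_{n\to\infty} \int |f_n| = \int |f|$.
In the other direction, if $\limsup_{n\to\infty} \int |f_n| \le \int |f|$ and $\langle f_n \rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure, then whenever $\delta > 0$ and $\mu F < \infty$ there must be an $m \in \mathbb{N}$ such that
$$\int |f_n| \le \int |f| + \delta, \quad \rho_F(f, f_n) \le \delta \text{ for every } n \ge m;$$
so (i) tells us that $\lim_{n\to\infty} \int |f_n - f| = 0$.
(b) This now follows immediately if we express $u$ as $f^\bullet$, $v$ as $g^\bullet$ and $u_n$ as $f_n^\bullet$.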
245I Remarks (a) I think the phenomenon here is so important that it is worth looking at some elementary examples.
(i) If $\mu$ is counting measure on $\mathbb{N}$, and we set $f_n(n) = 1$, $f_n(i) = 0$ for $i \ne n$, then $\langle f_n \rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.
(ii) If $\mu$ is Lebesgue measure on $[0, 1]$, and we set $f_n(x) = 2^n$ for $0 < x \le 2^{-n}$, $0$ for other $x$, then again $\langle f_n \rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.
(iii) In 245Cc we have another sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ converging to $0$ in measure, while $\int |f_n| = 1$ for every $n$. In all these cases (as required by Fatou's Lemma, at least in (i) and (ii)) we have $\int |f| \le \liminf_{n\to\infty} \int |f_n|$. (The next proposition shows that this applies to any sequence which is convergent in measure.)
The common feature of these examples is the way in which somehow the $f_n$ escape to infinity, either laterally (in (i)) or vertically (in (iii)) or both (in (ii)). Note that in all three examples we can set $f'_n = 2^n f_n$ to obtain a sequence still converging to $0$ in measure, but with $\lim_{n\to\infty} \int |f'_n| = \infty$.
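Examples (i) and (ii) are simple enough to compute exactly. The sketch below uses exact rational arithmetic; the function names are hypothetical, and in example (ii) the set $F$ is taken to be the whole interval $[0, 1]$.

```python
from fractions import Fraction


def example_ii(n):
    """f_n = 2^n on (0, 2^-n], 0 elsewhere, for Lebesgue measure on [0, 1].

    Returns (integral of |f_n|, integral over [0, 1] of min(1, |f_n|)).
    """
    height = Fraction(2) ** n
    width = Fraction(1, 2 ** n)
    return height * width, min(Fraction(1), height) * width


def example_i(n, F):
    """f_n = indicator of {n}, for counting measure on N; F is a finite set.

    Returns (integral of |f_n|, rho_F(f_n, 0)).
    """
    return 1, (1 if n in F else 0)


norms_and_distances = [example_ii(n) for n in range(6)]
```

In both examples the first coordinate stays at $1$ while the second tends to $0$ (in example (i) it is $0$ as soon as $n$ lies outside $F$), which is the escape of mass described above.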
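(b) In 245H, I have used the explicit formulations '$\lim_{n\to\infty} \int |f_n - f| = 0$' (for sequences of functions), '$\langle u_n \rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$' (for sequences in $L^1$). These are often expressed by saying that $\langle f_n \rangle_{n\in\mathbb{N}}$, $\langle u_n \rangle_{n\in\mathbb{N}}$ are convergent in mean to $f$, $u$ respectively.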
245J For semi-finite spaces we have a further relationship.
Proposition Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Write $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, etc.
(a)(i) For any $a \ge 0$, the set $\{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$ is closed in $\mathcal{L}^0$ for the topology of convergence in measure.
(ii) If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^1$ which is convergent in measure to $f \in \mathcal{L}^0$, and $\liminf_{n\to\infty} \int |f_n| < \infty$, then $f$ is integrable and $\int |f| \le \liminf_{n\to\infty} \int |f_n|$.
(b)(i) For any $a \ge 0$, the set $\{u : u \in L^1,\ \|u\|_1 \le a\}$ is closed in $L^0$ for the topology of convergence in measure.
(ii) If $\langle u_n \rangle_{n\in\mathbb{N}}$ is a sequence in $L^1$ which is convergent in measure to $u \in L^0$, and $\liminf_{n\to\infty} \|u_n\|_1 < \infty$, then $u \in L^1$ and $\|u\|_1 \le \liminf_{n\to\infty} \|u_n\|_1$.
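proof (a)(i) Set $A = \{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$, and let $g$ be any member of the closure of $A$ in $\mathcal{L}^0$. Let $h$ be any simple function such that $0 \le h \le_{\text{a.e.}} |g|$, and $\epsilon > 0$. If $h = 0$ then of course $\int h \le a$. Otherwise, setting $F = \{x : h(x) > 0\}$ and $M = \sup_{x\in X} h(x)$, there is an $f \in A$ such that
$$\mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\} \le \epsilon$$
(245F); let $E \supseteq \{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\}$ be a measurable set of measure at most $\epsilon$. Then $h \le_{\text{a.e.}} |f| + \epsilon\chi F + M\chi E$, so $\int h \le a + \epsilon(M + \mu F)$. As $\epsilon$ is arbitrary, $\int h \le a$. But we are supposing that $\mu$ is semi-finite, so this is enough to ensure that $g$ is integrable and that $\int |g| \le a$ (213B), that is, that $g \in A$. As $g$ is arbitrary, $A$ is closed.
(ii) Now if $\langle f_n \rangle_{n\in\mathbb{N}}$ is convergent in measure to $f$, and $\liminf_{n\to\infty} \int |f_n| = a$, then for any $\epsilon > 0$ there is a subsequence $\langle f_{n(k)} \rangle_{k\in\mathbb{N}}$ such that $\int |f_{n(k)}| \le a + \epsilon$ for every $k$; since $\langle f_{n(k)} \rangle_{k\in\mathbb{N}}$ still converges in measure to $f$, $\int |f| \le a + \epsilon$. As $\epsilon$ is arbitrary, $\int |f| \le a$.
(b) As in 245H, this is just a translation of part (a) into the language of $L^1$ and $L^0$.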
245K For $\sigma$-finite measure spaces, the topology of convergence in measure on $L^0$ is metrizable, so can be described effectively in terms of convergent sequences; it is therefore important that we have, in this case, a sharp characterisation of sequential convergence in measure.
Proposition Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space. Then
(a) a sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0$ converges in measure to $f \in \mathcal{L}^0$ iff every subsequence of $\langle f_n \rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere;
(b) a sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u \in L^0$ iff every subsequence of $\langle u_n \rangle_{n\in\mathbb{N}}$ has a sub-subsequence which order*-converges to $u$.
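proof (a)(i) Suppose that $\langle f_n \rangle_{n\in\mathbb{N}} \to f$, that is, that $\lim_{n\to\infty} \int |f - f_n| \wedge \chi F = 0$ for every set $F$ of finite measure. Let $\langle E_k \rangle_{k\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$, and let $\langle n(k) \rangle_{k\in\mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $\int |f - f_{n(k)}| \wedge \chi E_k \le 4^{-k}$ for every $k \in \mathbb{N}$. Then $\sum_{k=0}^{\infty} |f - f_{n(k)}| \wedge \chi E_k$ is finite almost everywhere (242E); but $\lim_{k\to\infty} f_{n(k)}(x) = f(x)$ whenever $\sum_{k=0}^{\infty} \min(1, |f(x) - f_{n(k)}(x)|) < \infty$, so $\langle f_{n(k)} \rangle_{k\in\mathbb{N}} \to f$ a.e.
(ii) The same applies to every subsequence of $\langle f_n \rangle_{n\in\mathbb{N}}$, so that every subsequence of $\langle f_n \rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere.
(iii) Now suppose that $\langle f_n \rangle_{n\in\mathbb{N}}$ does not converge to $f$. Then there is an open set $U$ containing $f$ such that $\{n : f_n \notin U\}$ is infinite, that is, $\langle f_n \rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n \rangle_{n\in\mathbb{N}}$ with $f'_n \notin U$ for every $n$. But now no sub-subsequence of $\langle f'_n \rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so no such sub-subsequence can converge almost everywhere, by 245Ca.
(b) This follows immediately from (a) if we express $u$ as $f^\bullet$, $u_n$ as $f_n^\bullet$.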
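A familiar illustration of why subsequences are needed in 245K is a "sliding block" sequence of indicators on $[0, 1]$. The sketch below is an assumption of this illustration (it is not claimed to be the sequence of 245Cc): writing $n = 2^k + j$ with $0 \le j < 2^k$, the $n$th function is the indicator of $[j 2^{-k}, (j+1) 2^{-k}]$. The blocks have Lebesgue measure $2^{-k} \to 0$, so the sequence converges to $0$ in measure, yet it takes the value $1$ infinitely often at every point.

```python
from fractions import Fraction


def block(n):
    """For n >= 1 written as n = 2^k + j with 0 <= j < 2^k, return [j/2^k, (j+1)/2^k]."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k)


def g(n, x):
    """Indicator of the n-th block, evaluated at the point x."""
    lo, hi = block(n)
    return 1 if lo <= x <= hi else 0


x = Fraction(1, 3)
# Indices n < 2^10 at which the sequence takes the value 1 at x.
hits = [n for n in range(1, 2 ** 10) if g(n, x) == 1]
# Values at x along the subsequence n(k) = 2^k, whose blocks are [0, 2^-k].
sub = [g(2 ** k, x) for k in range(2, 10)]
```

At $x = 1/3$ there is one hit for each $k$, so the full sequence does not converge there, while the subsequence $n(k) = 2^k$ is eventually $0$ at every $x > 0$, as the proposition guarantees some sub-subsequence must be.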
245L Corollary Let $(X, \Sigma, \mu)$ be a $\sigma$-finite measure space.
(a) A subset $A$ of $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ is closed for the topology of convergence in measure iff $f \in A$ whenever $f \in \mathcal{L}^0$ and there is a sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ in $A$ such that $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$.
(b) A subset $A$ of $L^0 = L^0(\mu)$ is closed for the topology of convergence in measure iff $u \in A$ whenever $u \in L^0$ and there is a sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $A$ order*-converging to $u$.
proof (a)(i) If $A$ is closed for the topology of convergence in measure, and $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence in $A$ converging to $f$ almost everywhere, then $\langle f_n \rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so surely $f \in A$ (since otherwise all but finitely many of the $f_n$ would have to belong to the open set $\mathcal{L}^0 \setminus A$).
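(ii) If $A$ is not closed, there is an $f \in \overline{A} \setminus A$. The topology can be defined by a metric $\rho$ (245Eb), and we can choose a sequence $\langle f_n \rangle_{n\in\mathbb{N}}$ in $A$ such that $\rho(f_n, f) \le 2^{-n}$ for every $n$, so that $\langle f_n \rangle_{n\in\mathbb{N}} \to f$ in measure. By 245K, $\langle f_n \rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n \rangle_{n\in\mathbb{N}}$ converging a.e. to $f$, and this witnesses that $A$ fails to satisfy the condition.
(b) This follows immediately, because $A \subseteq L^0$ is closed iff $\{f : f^\bullet \in A\}$ is closed in $\mathcal{L}^0$.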
245M Complex $L^0$ In 241J I briefly discussed the adaptations needed to construct the complex linear space $L^0_{\mathbb{C}}$. The formulae of 245A may be used unchanged to define topologies of convergence in measure on $\mathcal{L}^0_{\mathbb{C}}$ and $L^0_{\mathbb{C}}$. I think that every word of 245B-245L still applies if we replace each $\mathcal{L}^0$ or $L^0$ with $\mathcal{L}^0_{\mathbb{C}}$ or $L^0_{\mathbb{C}}$. Alternatively, to relate the real and complex forms of 245E, for instance, we can observe that because
$$\max\big(\bar\rho_F(\mathcal{R}e(u), \mathcal{R}e(v)),\ \bar\rho_F(\mathcal{I}m(u), \mathcal{I}m(v))\big) \le \bar\rho_F(u, v) \le \bar\rho_F(\mathcal{R}e(u), \mathcal{R}e(v)) + \bar\rho_F(\mathcal{I}m(u), \mathcal{I}m(v))$$
for all $u, v \in L^0_{\mathbb{C}}$ and all sets $F$ of finite measure, $L^0_{\mathbb{C}}$ can be identified, as uniform space, with $L^0 \times L^0$, so is Hausdorff, or metrizable, or complete iff $L^0$ is.
245X Basic exercises >>>(a) Let $X$ be any set, and $\mu$ counting measure on $X$. Show that the topology of convergence in measure on $L^0(\mu) = \mathbb{R}^X$ is just the product topology on $\mathbb{R}^X$ regarded as a product of copies of $\mathbb{R}$.
>>>(b) Let $(X, \Sigma, \mu)$ be any measure space, and $(X, \hat\Sigma, \hat\mu)$ its completion. Show that the topologies of convergence in measure on $L^0(\mu) = L^0(\hat\mu)$ (241Xb), corresponding to the families $\{\bar\rho_F : F \in \Sigma,\ \mu F < \infty\}$, $\{\bar\rho_F : F \in \hat\Sigma,\ \hat\mu F < \infty\}$ are the same.
>>>(c) Let $(X, \Sigma, \mu)$ be any measure space; set $L^0 = L^0(\mu)$. Let $u, u_n \in L^0$ for $n \in \mathbb{N}$. Show that the following are equiveridical:
(i) $\langle u_n \rangle_{n\in\mathbb{N}}$ order*-converges to $u$ in the sense of 245C;
(ii) there are measurable functions $f, f_n : X \to \mathbb{R}$ such that $f^\bullet = u$, $f_n^\bullet = u_n$ for every $n \in \mathbb{N}$, and $f(x) = \lim_{n\to\infty} f_n(x)$ for every $x \in X$;
(iii) $u = \inf_{n\in\mathbb{N}} \sup_{m\ge n} u_m = \sup_{n\in\mathbb{N}} \inf_{m\ge n} u_m$, the infima and suprema being taken in $L^0$;
(iv) $\inf_{n\in\mathbb{N}} \sup_{m\ge n} |u - u_m| = 0$ in $L^0$;
(v) there is a non-increasing sequence $\langle v_n \rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\inf_{n\in\mathbb{N}} v_n = 0$ in $L^0$ and $u - v_n \le u_n \le u + v_n$ for every $n \in \mathbb{N}$;
(vi) there are sequences $\langle v_n \rangle_{n\in\mathbb{N}}$, $\langle w_n \rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\langle v_n \rangle_{n\in\mathbb{N}}$ is non-decreasing, $\langle w_n \rangle_{n\in\mathbb{N}}$ is non-increasing, $\sup_{n\in\mathbb{N}} v_n = u = \inf_{n\in\mathbb{N}} w_n$ and $v_n \le u_n \le w_n$ for every $n \in \mathbb{N}$.
(d) Let $(X, \Sigma, \mu)$ be a semi-finite measure space. Show that a sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $L^0 = L^0(\mu)$ is order*-convergent to $u \in L^0$ iff $\{|u_n| : n \in \mathbb{N}\}$ is bounded above in $L^0$ and $\langle \sup_{m\ge n} |u_m - u| \rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure.
(e) Write out proofs that $L^0(\mu)$ is complete (as linear topological space) adapted to the special cases (i) $\mu X = 1$ (ii) $\mu$ is $\sigma$-finite, taking advantage of any simplifications you can find.
(f) Let $(X, \Sigma, \mu)$ be a measure space and $r \ge 1$; let $h : \mathbb{R}^r \to \mathbb{R}$ be a continuous function. (i) Suppose that for $1 \le k \le r$ we are given a sequence $\langle f_{kn} \rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converging in measure to $f_k \in \mathcal{L}^0$. Show that $\langle h(f_{1n}, \ldots, f_{rn}) \rangle_{n\in\mathbb{N}}$ converges in measure to $h(f_1, \ldots, f_r)$. (ii) Generally, show that $(f_1, \ldots, f_r) \mapsto h(f_1, \ldots, f_r) : (\mathcal{L}^0)^r \to \mathcal{L}^0$ is continuous for the topology of convergence in measure. (iii) Show that the corresponding function $\bar h : (L^0)^r \to L^0$ (241Xh) is continuous for the topology of convergence in measure.
(g) Let $(X, \Sigma, \mu)$ be a measure space and $u \in L^1(\mu)$. Show that $v \mapsto \int u \times v : L^\infty \to \mathbb{R}$ is continuous for the topology of convergence in measure on the unit ball of $L^\infty$, but not, as a rule, on the whole of $L^\infty$.
(h) Let $(X, \Sigma, \mu)$ be a measure space and $v$ a non-negative member of $L^1 = L^1(\mu)$. Show that on the set $A = \{u : u \in L^1,\ |u| \le v\}$ the subspace topologies (2A3C) induced by the norm topology of $L^1$ and the topology of convergence in measure are the same. (Hint: given $\epsilon > 0$, take $F \in \Sigma$ of finite measure and $M \ge 0$ such that $\int (|v| - M\chi F^\bullet)^+ \le \epsilon$. Show that $\|u - u'\|_1 \le \epsilon + M\bar\rho_F(u, u')$ for all $u, u' \in A$.)
(i) Let $(X, \Sigma, \mu)$ be a measure space and $\mathcal{F}$ a filter on $L^1 = L^1(\mu)$ which is convergent, for the topology of convergence in measure, to $u \in L^1$. Show that $\mathcal{F} \to u$ for the norm topology of $L^1$ iff $\inf_{A\in\mathcal{F}} \sup_{v\in A} \|v\|_1 \le \|u\|_1$.
(j) Let $(X, \Sigma, \mu)$ be a measure space and $p \in [1, \infty[$. Suppose that $\langle u_n \rangle_{n\in\mathbb{N}}$ is a sequence in $L^p(\mu)$ which converges for $\|\ \|_p$ to $u \in L^p(\mu)$. Show that $\langle |u_n|^p \rangle_{n\in\mathbb{N}} \to |u|^p$ for $\|\ \|_1$. (Hint: 245G, 245Dd, 245H.)
>>>(k) Let $(X, \Sigma, \mu)$ be a semi-finite measure space and $p \in [1, \infty]$, $a \ge 0$. Show that $\{u : u \in L^p(\mu),\ \|u\|_p \le a\}$ is closed in $L^0(\mu)$ for the topology of convergence in measure.
(l) Let $(X, \Sigma, \mu)$ be a measure space, and $\langle u_n \rangle_{n\in\mathbb{N}}$ a sequence in $L^p = L^p(\mu)$, where $1 \le p < \infty$. Let $u \in L^p$. Show that the following are equiveridical: (i) $u = \lim_{n\to\infty} u_n$ for the norm topology of $L^p$ (ii) $\langle u_n \rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\lim_{n\to\infty} \|u_n\|_p = \|u\|_p$ (iii) $\langle u_n \rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\limsup_{n\to\infty} \|u_n\|_p \le \|u\|_p$.
(m) Let $X$ be a set and $\mu$, $\nu$ two measures on $X$ with the same measurable sets and the same negligible sets. (i) Show that $\mathcal{L}^0(\mu) = \mathcal{L}^0(\nu)$ and $L^0(\mu) = L^0(\nu)$. (ii) Show that if both $\mu$ and $\nu$ are semi-finite, then they define the same topology of convergence in measure on $\mathcal{L}^0$ and $L^0$. (Hint: use 215A to show that if $\mu E < \infty$ then $\mu E = \sup\{\mu F : F \subseteq E,\ \nu F < \infty\}$.)
245Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space and give $\Sigma$ the topology described in 232Ya. Show that $\chi : \Sigma \to \mathcal{L}^0(\mu)$ is a homeomorphism between $\Sigma$ and its image $\chi[\Sigma]$ in $\mathcal{L}^0$, if $\mathcal{L}^0$ is given the topology of convergence in measure and $\chi[\Sigma]$ the subspace topology.
(b) Let $(X, \Sigma, \mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$. Let $T : L^0(\mu) \to L^0(\mu_Y)$ be the canonical map defined by setting $T(f^\bullet) = (f \restriction Y)^\bullet$ for every $f \in \mathcal{L}^0(\mu)$ (241Yg). Show that $T$ is continuous for the topologies of convergence in measure on $L^0(\mu)$ and $L^0(\mu_Y)$.
(c) Let $(X, \Sigma, \mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$. Show that the map $T : L^0(\mu) \to L^0(\tilde\mu)$ induced by the inclusion $\mathcal{L}^0(\mu) \subseteq \mathcal{L}^0(\tilde\mu)$ (241Yf) is continuous for the topologies of convergence in measure.
(d) Let $(X, \Sigma, \mu)$ be a measure space, and give $L^0 = L^0(\mu)$ the topology of convergence in measure. Let $A \subseteq L^0$ be a non-empty downwards-directed set, and suppose that $\inf A = 0$ in $L^0$. (i) Let $F \in \Sigma$ be any set of finite measure, and define $\bar\tau_F$ as in 245A; show that $\inf_{u\in A} \bar\tau_F(u) = 0$. (Hint: set $\gamma = \inf_{u\in A} \bar\tau_F(u)$; find a non-increasing sequence $\langle u_n \rangle_{n\in\mathbb{N}}$ in $A$ such that $\lim_{n\to\infty} \bar\tau_F(u_n) = \gamma$; set $v = (\chi F)^\bullet \wedge \inf_{n\in\mathbb{N}} u_n$ and show that $u \wedge v = v$ for every $u \in A$, so that $v = 0$.) (ii) Show that if $U$ is any open set containing $0$, there is a $u \in A$ such that $v \in U$ whenever $0 \le v \le u$.
(e) Let $(X, \Sigma, \mu)$ be a measure space. (i) Show that for $u \in L^0 = L^0(\mu)$ we may define $\psi_a(u)$, for $a \ge 0$, by setting $\psi_a(u) = \mu\{x : |f(x)| \ge a\}$ whenever $f : X \to \mathbb{R}$ is a measurable function and $f^\bullet = u$. (ii) Define $\rho : L^0 \times L^0 \to [0, 1]$ by setting $\rho(u, v) = \min(\{1\} \cup \{a : a \ge 0,\ \psi_a(u - v) \le a\})$. Show that $\rho$ is a metric on $L^0$, that $L^0$ is complete under $\rho$, and that $+, -, \wedge, \vee : L^0 \times L^0 \to L^0$ are continuous for $\rho$. (iii) Show that $c \mapsto cu : \mathbb{R} \to L^0$ is continuous for every $u \in L^0$ iff $(X, \Sigma, \mu)$ is totally finite, and that in this case $\rho$ defines the topology of convergence in measure on $L^0$.
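(f) Let $(X, \Sigma, \mu)$ be a localizable measure space and $A \subseteq L^0 = L^0(\mu)$ a non-empty upwards-directed set which is bounded in the linear topological space sense (i.e., such that for every neighbourhood $U$ of $0$ in $L^0$ there is a $k \in \mathbb{N}$ such that $A \subseteq kU$). Show that $A$ is bounded above in $L^0$, and that its supremum belongs to its closure.
(g) Let $(X, \Sigma, \mu)$ be a measure space, $p \in [1, \infty[$ and $v$ a non-negative member of $L^p = L^p(\mu)$. Show that on the set $A = \{u : u \in L^p,\ |u| \le v\}$ the subspace topologies induced by the norm topology of $L^p$ and the topology of convergence in measure are the same.
(h) Let $S$ be the set of all sequences $s : \mathbb{N} \to \mathbb{N}$ such that $\lim_{n\to\infty} s(n) = \infty$. For every $s \in S$, let $(X_s, \Sigma_s, \mu_s)$ be $[0, 1]$ with Lebesgue measure, and let $(X, \Sigma, \mu)$ be the direct sum of $\langle (X_s, \Sigma_s, \mu_s) \rangle_{s\in S}$ (214L). For $s \in S$, $t \in [0, 1]$, $n \in \mathbb{N}$ set $h_n(s, t) = f_{s(n)}(t)$, where $\langle f_n \rangle_{n\in\mathbb{N}}$ is the sequence of 245Cc. Show that $\langle h_n \rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure on $\mathcal{L}^0(\mu)$, but that $\langle h_n \rangle_{n\in\mathbb{N}}$ has no subsequence which is convergent to $0$ almost everywhere.
(i) Let $X$ be a set, and suppose we are given a relation $\to$ between sequences in $X$ and members of $X$ such that ($\alpha$) if $x_n = x$ for every $n$ then $\langle x_n \rangle_{n\in\mathbb{N}} \to x$ ($\beta$) $\langle x'_n \rangle_{n\in\mathbb{N}} \to x$ whenever $\langle x_n \rangle_{n\in\mathbb{N}} \to x$ and $\langle x'_n \rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle x_n \rangle_{n\in\mathbb{N}}$. Show that we have a topology $\mathfrak{T}$ on $X$ defined by saying that a subset $G$ of $X$ belongs to $\mathfrak{T}$ iff whenever $\langle x_n \rangle_{n\in\mathbb{N}}$ is a sequence in $X$ and $\langle x_n \rangle_{n\in\mathbb{N}} \to x \in G$ then some $x_n$ belongs to $G$. Show that a sequence $\langle x_n \rangle_{n\in\mathbb{N}}$ in $X$ is $\mathfrak{T}$-convergent to $x$ iff every subsequence of $\langle x_n \rangle_{n\in\mathbb{N}}$ has a sub-subsequence $\langle x''_n \rangle_{n\in\mathbb{N}}$ such that $\langle x''_n \rangle_{n\in\mathbb{N}} \to x$.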
(j) Let $\mu$ be Lebesgue measure on $\mathbb{R}^r$. Show that $L^0(\mu)$ is separable for the topology of convergence in measure. (Hint: 244I.)
245 Notes and comments In this section I am inviting you to regard the topology of (local) convergence in measure as the standard topology on $L^0$, just as the norms define the standard topologies on $L^p$ spaces for $p \ge 1$. The definition I have chosen is designed to make addition and scalar multiplication and the operations $\vee$, $\wedge$ and $\times$ continuous (245D); see also 245Xf. From the point of view of functional analysis these properties are more important than metrizability or even completeness.
Just as the algebraic and order structure of $L^0$ can be described in terms of the general theory of Riesz spaces, the more advanced results 241G and 245E also have interpretations in the general theory. It is not an accident that (for semi-finite measure spaces) $L^0$ is Dedekind complete iff it is complete as uniform space; you may find the relevant generalizations in 23K and 24E of Fremlin 74. Of course it is exactly because the two kinds of completeness are interrelated that I feel it necessary to use the phrase 'Dedekind completeness' to distinguish this particular kind of order-completeness from the more familiar 'uniformity-completeness' described in 2A5F.
The usefulness of the topology of convergence in measure derives in large part from 245G-245J and the $L^p$ versions 245Xk and 245Xl. Some of the ideas here can be related to a question arising out of the basic convergence theorems. If $\langle f_n \rangle_{n\in\mathbb{N}}$ is a sequence of integrable functions converging (pointwise) to a function $f$, in what ways can $\int f$ fail to be $\lim_{n\to\infty} \int f_n$? In the language of this section, this translates into: if we have a sequence (or filter) in $L^1$ converging for the topology of convergence in measure, in what ways can it fail to converge for the norm topology of $L^1$? The first answer is Lebesgue's Dominated Convergence Theorem: this cannot happen if the sequence is dominated, that is, lies within some set of the form $\{u : |u| \le v\}$ where $v \in L^1$. (See 245Xh and 245Yg.) I will return to this in the next section. For the moment, though, 245H tells us that if $\langle u_n \rangle_{n\in\mathbb{N}}$ converges in measure to $u \in L^1$, but not for the topology of $L^1$, it is because $\limsup_{n\to\infty} \|u_n\|_1$ is too big; some of its weight is being lost at infinity, as in the examples of 245I. If $\langle u_n \rangle_{n\in\mathbb{N}}$ actually order*-converges to $u$, then Fatou's Lemma tells us that $\liminf_{n\to\infty} \|u_n\|_1 \ge \|u\|_1$, that is, that the limit cannot have greater weight (as measured by $\|\ \|_1$) than the sequence provides. 245J and 245Xk are generalizations of this to convergence in measure. If you want a generalization of B.Levi's theorem, then 242Yf remains the best expression in the language of this chapter; but 245Yf is a version in terms of the concepts of the present section.
In the case of $\sigma$-finite spaces, we have an alternative description of the topology of convergence in measure (245L) which makes no use of any of the functionals or pseudo-metrics in 245A. This can be expressed, at least in the context of $L^0$, in terms of a standard result from general topology (245Yi). You will see that that result gives a recipe for a topology on $L^0$ which could be applied in any measure space. What is remarkable is that for $\sigma$-finite spaces we get a linear space topology.
246 Uniform integrability
The next topic is a fairly specialized one, but it is of great importance, for different reasons, in both probability theory and functional analysis, and it therefore seems worth while giving a proper treatment straight away.
246A Definition Let $(X, \Sigma, \mu)$ be a measure space.
(a) A set $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an $M \ge 0$ such that
$$\int (|f| - M\chi E)^+ \le \epsilon \text{ for every } f \in A.$$
(b) A set $A \subseteq L^1(\mu)$ is uniformly integrable if for every $\epsilon > 0$ we can find a set $E \in \Sigma$, of finite measure, and an $M \ge 0$ such that
$$\int (|u| - M\chi E^\bullet)^+ \le \epsilon \text{ for every } u \in A.$$
246B Remarks (a) Recall the formulae from 241Ef: $u^+ = u \vee 0$, so $(u - v)^+ = u - u \wedge v$.
(b) The phrase 'uniformly integrable' is not particularly helpful. But of course we can observe that for any particular integrable function $f$, there are simple functions approximating $f$ for $\|\ \|_1$ (242M), and such functions will be bounded (in modulus) by functions of the form $M\chi E$, with $\mu E < \infty$; thus singleton subsets of $\mathcal{L}^1$ and $L^1$ are uniformly integrable. A general uniformly integrable set of functions is one in which $M$ and $E$ can be chosen uniformly over the set.
(c) It will I hope be clear from the definitions that $A \subseteq \mathcal{L}^1$ is uniformly integrable iff $\{f^\bullet : f \in A\} \subseteq L^1$ is uniformly integrable.
(d) There is a useful simplification in the definition if $\mu X < \infty$ (in particular, if $(X, \Sigma, \mu)$ is a probability space). In this case a set $A \subseteq L^1(\mu)$ is uniformly integrable iff
$$\inf_{M\ge 0} \sup_{u\in A} \int (|u| - Me)^+ = 0$$
iff
$$\lim_{M\to\infty} \sup_{u\in A} \int (|u| - Me)^+ = 0,$$
writing $e = \chi X^\bullet \in L^1(\mu)$. (For if $\sup_{u\in A} \int (|u| - M\chi E^\bullet)^+ \le \epsilon$, then $\int (|u| - M'e)^+ \le \epsilon$ for every $M' \ge M$.) Similarly, $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff
$$\lim_{M\to\infty} \sup_{f\in A} \int (|f| - M\chi X)^+ = 0$$
iff
$$\inf_{M\ge 0} \sup_{f\in A} \int (|f| - M\chi X)^+ = 0.$$
Warning! Some authors use the phrase 'uniformly integrable' for sets satisfying the conditions in (d) even when $\mu$ is not totally finite.
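For a probability space the criterion in (d) is easy to evaluate on step functions. The sketch below compares two hypothetical families on $[0, 1]$ with Lebesgue measure, both chosen for this illustration: $f_n = 2^n$ on a set of measure $2^{-n}$ (the sequence of 245I(a)(ii)) and $g_n = 2^n$ on a set of measure $4^{-n}$. Only finitely many members are enumerated, so this indicates the behaviour of $\sup \int (|f| - M)^+$ rather than proving anything.

```python
from fractions import Fraction


def tail(height, width, M):
    """Integral of (|f| - M)^+ when f = height on a set of measure width, 0 elsewhere."""
    return max(height - M, 0) * width


def sup_tail(family, M):
    """Largest tail integral over a finite list of (height, width) pairs."""
    return max(tail(h, w, M) for h, w in family)


N = 30  # number of members enumerated (a truncation, chosen arbitrarily)
tall = [(Fraction(2) ** n, Fraction(1, 2 ** n)) for n in range(N)]  # f_n
thin = [(Fraction(2) ** n, Fraction(1, 4 ** n)) for n in range(N)]  # g_n

tall_sup = sup_tail(tall, 16)
thin_sup = sup_tail(thin, 16)
```

With $M = 16$ the first supremum is already within $2^{-25}$ of $1$, and it approaches $1$ for every fixed $M$ as more members are included, so that family is not uniformly integrable. For the second family each tail integral with $2^n > M$ is below $2^{-n} < 1/M$, so the supremum tends to $0$ as $M \to \infty$.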
246C We have the following wide-ranging stability properties of the class of uniformly integrable sets in $L^1$ or $\mathcal{L}^1$.
Proposition Let $(X, \Sigma, \mu)$ be a measure space and $A$ a uniformly integrable subset of $L^1(\mu)$.
(a) $A$ is bounded for the norm $\|\ \|_1$.
(b) Any subset of $A$ is uniformly integrable.
(c) For any $a \in \mathbb{R}$, $aA = \{au : u \in A\}$ is uniformly integrable.
(d) There is a uniformly integrable $C \supseteq A$ such that $C$ is convex and $\|\ \|_1$-closed and $v \in C$ whenever $u \in C$ and $|v| \le |u|$.
(e) If $B$ is another uniformly integrable subset of $L^1$, then $A \cup B$ and $A + B = \{u + v : u \in A,\ v \in B\}$ are uniformly integrable.
proof Write $\Sigma^f$ for $\{E : E \in \Sigma,\ \mu E < \infty\}$.
(a) There must be $E \in \Sigma^f$, $M \ge 0$ such that $\int (|u| - M\chi E^\bullet)^+ \le 1$ for every $u \in A$; now
$$\|u\|_1 \le \int (|u| - M\chi E^\bullet)^+ + \int M\chi E^\bullet \le 1 + M\mu E$$
for every $u \in A$, so $A$ is bounded.
(b) This is immediate from the definition 246Ab.
(c) Given $\epsilon > 0$, we can find $E \in \Sigma^f$, $M \ge 0$ such that $|a| \int (|u| - M\chi E^\bullet)^+ \le \epsilon$ for every $u \in A$; now $\int (|v| - |a|M\chi E^\bullet)^+ \le \epsilon$ for every $v \in aA$.
(d) If $A$ is empty, take $C = A$. Otherwise, try
$$C = \Big\{v : v \in L^1,\ \int (|v| - w)^+ \le \sup_{u\in A} \int (|u| - w)^+ \text{ for every } w \in L^1(\mu)\Big\}.$$
Evidently $A \subseteq C$, and $C$ satisfies the definition 246Ab because $A$ does, considering $w$ of the form $M\chi E^\bullet$ where $E \in \Sigma^f$ and $M \ge 0$. The functionals
$$v \mapsto \int (|v| - w)^+ : L^1(\mu) \to \mathbb{R}$$
are all continuous for $\|\ \|_1$ (because the operators $v \mapsto |v|$, $v \mapsto v - w$, $v \mapsto v^+$, $v \mapsto \int v$ are continuous), so $C$ is closed. If $|v'| \le |v|$ and $v \in C$, then
$$\int (|v'| - w)^+ \le \int (|v| - w)^+ \le \sup_{u\in A} \int (|u| - w)^+$$
for every $w$, and $v' \in C$. If $v = av_1 + bv_2$ where $v_1, v_2 \in C$, $a \in [0, 1]$ and $b = 1 - a$, then $|v| \le a|v_1| + b|v_2|$, so
$$|v| - w \le (a|v_1| - aw) + (b|v_2| - bw) \le (a|v_1| - aw)^+ + (b|v_2| - bw)^+$$
and
$$(|v| - w)^+ \le a(|v_1| - w)^+ + b(|v_2| - w)^+$$
for every $w$; accordingly
$$\int (|v| - w)^+ \le a\int (|v_1| - w)^+ + b\int (|v_2| - w)^+ \le (a + b)\sup_{u\in A} \int (|u| - w)^+ = \sup_{u\in A} \int (|u| - w)^+$$
for every $w$, and $v \in C$.
Thus $C$ has all the required properties.
(e) I show first that $A \cup B$ is uniformly integrable. PPP Given $\epsilon > 0$, let $M_1, M_2 \ge 0$ and $E_1, E_2 \in \Sigma^f$ be such that
$$\int (|u| - M_1\chi E_1^\bullet)^+ \le \epsilon \text{ for every } u \in A,$$
$$\int (|u| - M_2\chi E_2^\bullet)^+ \le \epsilon \text{ for every } u \in B.$$
Set $M = \max(M_1, M_2)$, $E = E_1 \cup E_2$; then $\mu E < \infty$ and
$$\int (|u| - M\chi E^\bullet)^+ \le \epsilon \text{ for every } u \in A \cup B.$$
As $\epsilon$ is arbitrary, $A \cup B$ is uniformly integrable. QQQ
Now (d) tells us that there is a convex uniformly integrable set $C$ including $A \cup B$, and in this case $A + B \subseteq 2C$, so $A + B$ is also uniformly integrable, using (b) and (c).
246D Proposition Let $(X,\Sigma,\mu)$ be a probability space and $A\subseteq L^1(\mu)$ a uniformly integrable set. Then there is a convex, $\|\ \|_1$-closed uniformly integrable set $C\subseteq L^1$ such that $A\subseteq C$, $w\in C$ whenever $v\in C$ and $|w|\le|v|$, and $Pv\in C$ whenever $v\in C$ and $P$ is the conditional expectation operator associated with a $\sigma$-subalgebra of $\Sigma$.
proof Set
\[ C=\{v: v\in L^1(\mu),\ \int(|v|-Me)^+\le\sup_{u\in A}\int(|u|-Me)^+ \text{ for every } M\ge 0\}, \]
writing $e=\chi X^{\bullet}$ as usual. The arguments in the proof of 246Cd make it plain that $C\supseteq A$ is uniformly integrable, convex and closed, and that $w\in C$ whenever $v\in C$ and $|w|\le|v|$. As for the conditional expectation operators, if $v\in C$, $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, $P$ is the associated conditional expectation operator, and $M\ge 0$, then
\[ |Pv|\le P|v|=P\bigl((|v|\wedge Me)+(|v|-Me)^+\bigr)\le Me+P((|v|-Me)^+), \]
so
\[ (|Pv|-Me)^+\le P((|v|-Me)^+) \]
and
\[ \int(|Pv|-Me)^+\le\int P(|v|-Me)^+=\int(|v|-Me)^+\le\sup_{u\in A}\int(|u|-Me)^+; \]
as $M$ is arbitrary, $Pv\in C$.
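The key inequality of this proof can be checked by hand in the simplest case. The following is a sketch under an assumption of mine (it is not part of the text): take $\mathrm{T}=\{\emptyset,X\}$, so that $Pv=(\int v)e$.

```latex
% Assumed special case: \mathrm{T} = \{\emptyset, X\}, so Pv = (\int v)\,e, and \mu X = 1.
\[
\int (|Pv| - Me)^+ = \Bigl(\Bigl|\int v\Bigr| - M\Bigr)^+ ,
\qquad
\Bigl|\int v\Bigr| - M \;\le\; \int (|v| - Me) \;\le\; \int (|v| - Me)^+ ,
\]
% The right-hand side is non-negative, so taking positive parts on the left gives
\[
\int (|Pv| - Me)^+ \;\le\; \int (|v| - Me)^+ .
\]
```

Here the step $M=\int Me$ is exactly where $\mu X=1$ is used, which suggests why the proposition is stated for probability spaces.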
246E Remarks (a) Of course 246D has an expression in terms of $\mathcal{L}^1$ rather than $L^1$: if $(X,\Sigma,\mu)$ is a probability space and $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable, then there is a uniformly integrable set $C\supseteq A$ such that (i) $af+(1-a)g\in C$ whenever $f$, $g\in C$ and $a\in[0,1]$ (ii) $g\in C$ whenever $f\in C$, $g\in\mathcal{L}^0(\mu)$ and $|g|\le_{\text{a.e.}}|f|$ (iii) $f\in C$ whenever there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $C$ such that $\lim_{n\to\infty}\int|f-f_n|=0$ (iv) $g\in C$ whenever there is an $f\in C$ such that $g$ is a conditional expectation of $f$ with respect to some $\sigma$-subalgebra of $\Sigma$.
(b) In fact, there are obvious extensions of 246D; the proof there already shows that $T[C]\subseteq C$ whenever $T:L^1(\mu)\to L^1(\mu)$ is an order-preserving linear operator such that $\|Tu\|_1\le\|u\|_1$ for every $u\in L^1(\mu)$ and $\|Tu\|_\infty\le\|u\|_\infty$ for every $u\in L^1(\mu)\cap L^\infty(\mu)$ (246Yc). If we had done a bit more of the theory of operators on Riesz spaces I should be able to take you a good deal farther along this road; for instance, it is not in fact necessary to assume that the operators $T$ of the last sentence are order-preserving. I will return to this in Chapter 37 in the next volume.
(c) Moreover, the main theorem of the next section will show that for any measure spaces $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$, $T[A]$ will be uniformly integrable in $L^1(\nu)$ whenever $A\subseteq L^1(\mu)$ is uniformly integrable and $T:L^1(\mu)\to L^1(\nu)$ is a continuous linear operator (247D).
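246F We shall need an elementary lemma which I have not so far spelt out.
Lemma Let $(X,\Sigma,\mu)$ be a measure space. Then for any $u\in L^1(\mu)$,
\[ \|u\|_1\le 2\sup_{E\in\Sigma}\Bigl|\int_E u\Bigr|. \]
proof Express $u$ as $f^{\bullet}$ where $f:X\to\mathbb{R}$ is measurable. Set $F=\{x: f(x)\ge 0\}$. Then
\[ \|u\|_1=\int|f|=\Bigl|\int_F f\Bigr|+\Bigl|\int_{X\setminus F}f\Bigr|\le 2\sup_{E\in\Sigma}\Bigl|\int_E f\Bigr|=2\sup_{E\in\Sigma}\Bigl|\int_E u\Bigr|. \]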
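The constant $2$ in 246F cannot in general be improved. A minimal worked instance, assuming the two-point space $X=\{0,1\}$ with counting measure (an illustration of mine, not taken from the text):

```latex
% Assumed example: X = {0,1}, \mu = counting measure, u = \chi\{0\} - \chi\{1\}.
\[
\|u\|_1 = |u(0)| + |u(1)| = 2 ,
\]
% The four subsets of X are \emptyset, \{0\}, \{1\}, X, with integrals 0, 1, -1, 0.
\[
\sup_{E \subseteq X} \Bigl|\int_E u\Bigr| = \max\bigl(0,\ 1,\ 1,\ 0\bigr) = 1 ,
\qquad\text{so}\qquad
\|u\|_1 = 2 \sup_{E \in \Sigma} \Bigl|\int_E u\Bigr| .
\]
```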
246G Now we come to some of the remarkable alternative descriptions of uniform integrability.
Theorem Let $(X,\Sigma,\mu)$ be any measure space and $A$ a non-empty subset of $L^1(\mu)$. Then the following are equiveridical:
(i) $A$ is uniformly integrable;
(ii) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$, and for every $\epsilon>0$ there are $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F u|\le\epsilon$ whenever $u\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$;
(iii) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$, and $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$;
(iv) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$, and $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection.
Remark I use the phrase '$\mu$-atom' to emphasize that I mean an atom in the measure space sense (211I).
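proof (a)(i)$\Rightarrow$(iv) Suppose that $A$ is uniformly integrable. Then surely if $F\in\Sigma$ is a $\mu$-atom,
\[ \sup_{u\in A}\Bigl|\int_F u\Bigr|\le\sup_{u\in A}\|u\|_1<\infty, \]
by 246Ca. Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection, and that $\epsilon>0$. Take $E\in\Sigma$, $M\ge 0$ such that $\mu E<\infty$ and $\int(|u|-M\chi E^{\bullet})^+\le\frac12\epsilon$ whenever $u\in A$. Then for all $n$ large enough, $M\mu(F_n\cap E)\le\frac12\epsilon$, so that
\[ \Bigl|\int_{F_n}u\Bigr|\le\int_{F_n}|u|\le\int(|u|-M\chi E^{\bullet})^++\int_{F_n}M\chi E^{\bullet}\le\frac{\epsilon}{2}+M\mu(F_n\cap E)\le\epsilon \]
for every $u\in A$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$, and (iv) is true.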
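(b)(iv)$\Rightarrow$(iii) Suppose that (iv) is true. Then of course $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$. \textbf{?} Suppose, if possible, that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ such that $\epsilon=\limsup_{n\to\infty}\sup_{u\in A}\min(1,\frac13|\int_{F_n}u|)>0$. Set $H_n=\bigcup_{i\ge n}F_i$ for each $n$, so that $\langle H_n\rangle_{n\in\mathbb{N}}$ is non-increasing and has empty intersection, and $\int_{H_n}u\to 0$ as $n\to\infty$ for every $u\in L^1(\mu)$. Choose $\langle n_i\rangle_{i\in\mathbb{N}}$, $\langle m_i\rangle_{i\in\mathbb{N}}$, $\langle u_i\rangle_{i\in\mathbb{N}}$ inductively, as follows. $n_0=0$. Given $n_i\in\mathbb{N}$, take $m_i\ge n_i$, $u_i\in A$ such that $|\int_{F_{m_i}}u_i|\ge 2\epsilon$. Take $n_{i+1}>m_i$ such that $\int_{H_{n_{i+1}}}|u_i|\le\epsilon$. Continue.
Set $G_k=\bigcup_{i\ge k}F_{m_i}$ for each $k$. Then $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection. But $F_{m_i}\subseteq G_i\subseteq F_{m_i}\cup H_{n_{i+1}}$, so
\[ \Bigl|\int_{G_i}u_i\Bigr|\ge\Bigl|\int_{F_{m_i}}u_i\Bigr|-\Bigl|\int_{G_i\setminus F_{m_i}}u_i\Bigr|\ge 2\epsilon-\int_{H_{n_{i+1}}}|u_i|\ge\epsilon \]
for every $i$, contradicting the hypothesis (iv). \textbf{X}
This means that $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|$ must be zero, and (iii) is true.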
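(c)(iii)$\Rightarrow$(ii) We still have $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F$. \textbf{?} Suppose, if possible, that there is an $\epsilon>0$ such that for every measurable set $E$ of finite measure and every $\delta>0$ there are $u\in A$, $F\in\Sigma$ such that $\mu(F\cap E)\le\delta$ and $|\int_F u|\ge\epsilon$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure, a sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, a sequence $\langle\delta_n\rangle_{n\in\mathbb{N}}$ of strictly positive real numbers and a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ as follows. Given $u_k$, $E_k$, $\delta_k$ for $k<n$, choose $u_n\in A$ and $G_n\in\Sigma$ such that $\mu(G_n\cap\bigcup_{k<n}E_k)\le 2^{-n}\min(\{1\}\cup\{\delta_k: k<n\})$ and $|\int_{G_n}u_n|\ge\epsilon$; then choose a set $E_n$ of finite measure and a $\delta_n>0$ such that $\int_F|u_n|\le\frac12\epsilon$ whenever $F\in\Sigma$ and $\mu(F\cap E_n)\le\delta_n$ (see 225A). Continue.
On completing the induction, set $F_n=E_n\cap G_n\setminus\bigcup_{k>n}G_k$ for each $n$; then $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. By the choice of $G_k$,
\[ \mu\Bigl(E_n\cap\bigcup_{k>n}G_k\Bigr)\le\sum_{k=n+1}^{\infty}2^{-k}\delta_n\le\delta_n, \]
so $\mu(E_n\cap(G_n\setminus F_n))\le\delta_n$ and $\int_{G_n\setminus F_n}|u_n|\le\frac12\epsilon$. This means that $|\int_{F_n}u_n|\ge|\int_{G_n}u_n|-\frac12\epsilon\ge\frac12\epsilon$. But this is contrary to the hypothesis (iii). \textbf{X}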
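(d)(ii)$\Rightarrow$(i)($\alpha$) Assume (ii). Let $\epsilon>0$. Then there are $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F u|\le\epsilon$ whenever $u\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$. Now $\sup_{u\in A}\int_E|u|<\infty$. \textbf{P} Write $\mathcal{I}$ for the family of those $F\in\Sigma$ such that $F\subseteq E$ and $\sup_{u\in A}\int_F|u|$ is finite. If $F\subseteq E$ is an atom for $\mu$, then $\sup_{u\in A}\int_F|u|=\sup_{u\in A}|\int_F u|<\infty$, so $F\in\mathcal{I}$. (The point is that if $f:X\to\mathbb{R}$ is a measurable function such that $f^{\bullet}=u$, then one of $F'=\{x: x\in F,\ f(x)\ge 0\}$, $F''=\{x: x\in F,\ f(x)<0\}$ must be negligible, so that $\int_F|u|$ is either $\int_{F'}u=\int_F u$ or $-\int_{F''}u=-\int_F u$.) If $F\in\Sigma$, $F\subseteq E$ and $\mu F\le\delta$ then
\[ \sup_{u\in A}\int_F|u|\le 2\sup_{u\in A,\,G\in\Sigma,\,G\subseteq F}\Bigl|\int_G u\Bigr|\le 2\epsilon \]
(by 246F), so $F\in\mathcal{I}$. Next, if $F$, $G\in\mathcal{I}$ then $\sup_{u\in A}\int_{F\cup G}|u|\le\sup_{u\in A}\int_F|u|+\sup_{u\in A}\int_G|u|$ is finite, so $F\cup G\in\mathcal{I}$. Finally, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{I}$, and $F=\bigcup_{n\in\mathbb{N}}F_n$, there is some $n\in\mathbb{N}$ such that $\mu(F\setminus\bigcup_{i\le n}F_i)\le\delta$; now $\bigcup_{i\le n}F_i$ and $F\setminus\bigcup_{i\le n}F_i$ both belong to $\mathcal{I}$, so $F\in\mathcal{I}$.
By 215Ab, there is an $F\in\mathcal{I}$ such that $H\setminus F$ is negligible for every $H\in\mathcal{I}$. Now observe that $E\setminus F$ cannot include any non-negligible member of $\mathcal{I}$; in particular, cannot include either an atom or a non-negligible set of measure less than $\delta$. But this means that the subspace measure on $E\setminus F$ is atomless, totally finite and has no non-negligible measurable sets of measure less than $\delta$; by 215D, $\mu(E\setminus F)=0$ and $E\setminus F$ and $E$ belong to $\mathcal{I}$, as required. \textbf{Q}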
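Since $\int_{X\setminus E}|u|\le 2\epsilon$ for every $u\in A$ (by 246F, because $|\int_G u|\le\epsilon$ whenever $G\in\Sigma$ and $G\subseteq X\setminus E$), $\gamma=\sup_{u\in A}\int|u|$ is finite.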
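($\beta$) Set $M=\gamma/\delta$. If $u\in A$, express $u$ as $f^{\bullet}$, where $f:X\to\mathbb{R}$ is measurable, and consider
\[ F=\{x: f(x)\ge M\chi E(x)\}. \]
Then
\[ M\mu(F\cap E)\le\int_F f=\int_F u\le\gamma, \]
so $\mu(F\cap E)\le\gamma/M=\delta$. Accordingly $\int_F u\le\epsilon$. Similarly, $\int_{F'}(-u)\le\epsilon$, writing $F'=\{x: -f(x)\ge M\chi E(x)\}$. But this means that
\[ \int(|u|-M\chi E^{\bullet})^+=\int(|f|-M\chi E)^+\le\int_{F\cup F'}|f|=\int_{F\cup F'}|u|\le 2\epsilon \]
for every $u\in A$. As $\epsilon$ is arbitrary, $A$ is uniformly integrable.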
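Two small examples may help to fix the roles of the two clauses in conditions (ii)-(iv). Both are illustrations of mine under stated assumptions (Lebesgue measure on $[0,1]$ in the first, counting measure on $\mathbb{N}$ in the second), not taken from the text.

```latex
% Example 1 (assumed): Lebesgue measure on [0,1], F_n = ]2^{-n-1}, 2^{-n}],
% u_n = 2^{n+1} \chi F_n^{\bullet}, A = \{u_n : n \in \mathbb{N}\}.
\[
\mu F_n = 2^{-n-1}, \qquad \|u_n\|_1 = 1, \qquad
\sup_{u \in A} \Bigl|\int_{F_n} u\Bigr| \ge \int_{F_n} u_n = 1
\quad\text{for every } n ,
\]
% so A is norm-bounded but fails 246G(iii) for the disjoint sequence <F_n>.

% Example 2 (assumed): counting measure on \mathbb{N}, A = \{k \chi\{0\}^{\bullet} : k \in \mathbb{N}\}.
% If <F_n> is disjoint, then 0 \in F_n for at most one n, so
\[
\sup_{u \in A} \Bigl|\int_{F_n} u\Bigr| = 0 \ \text{for all but at most one } n,
\qquad\text{but}\qquad
\sup_{u \in A} \Bigl|\int_{\{0\}} u\Bigr| = \sup_{k \in \mathbb{N}} k = \infty .
\]
```

In the second example the disjoint-sequence clause holds while the clause on the atom $\{0\}$ fails, and $A$ is not even $\|\ \|_1$-bounded; so the clause on atoms cannot be dropped.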
246H Remarks (a) Of course conditions (ii)-(iv) of this theorem, like (i), have direct translations in terms of members of $\mathcal{L}^1$. Thus a non-empty set $A\subseteq\mathcal{L}^1$ is uniformly integrable iff $\sup_{f\in A}|\int_F f|$ is finite for every atom $F\in\Sigma$ and
either for every $\epsilon>0$ we can find $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F f|\le\epsilon$ whenever $f\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$
or $\lim_{n\to\infty}\sup_{f\in A}|\int_{F_n}f|=0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$
or $\lim_{n\to\infty}\sup_{f\in A}|\int_{F_n}f|=0$ for every non-increasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ with empty intersection.
(b) There are innumerable further equivalent expressions characterizing uniform integrability; every author has his own favourite. Many of them are variants on (i)-(iv) of this theorem, as in 246I and 246Yd-246Yf. For a condition of a quite different kind, see Theorem 247C.
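246I Corollary Let $(X,\Sigma,\mu)$ be a probability space. For $f\in\mathcal{L}^0(\mu)$, $M\ge 0$ set $F(f,M)=\{x: x\in\operatorname{dom}f,\ |f(x)|\ge M\}$. Then a non-empty set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{M\to\infty}\sup_{f\in A}\int_{F(f,M)}|f|=0$.
proof (a) If $A$ satisfies the condition, then
\[ \inf_{M\ge 0}\sup_{f\in A}\int(|f|-M\chi X)^+\le\inf_{M\ge 0}\sup_{f\in A}\int_{F(f,M)}|f|=0, \]
so $A$ is uniformly integrable.
(b) If $A$ is uniformly integrable, and $\epsilon>0$, there is an $M_0\ge 0$ such that $\int(|f|-M_0\chi X)^+\le\epsilon$ for every $f\in A$; also, $\gamma=\sup_{f\in A}\int|f|$ is finite (246Ca). Take any $M\ge M_0\max(1,(1+\gamma)/\epsilon)$. If $f\in A$, then
\[ |f|\times\chi F(f,M)\le(|f|-M_0\chi X)^++M_0\chi F(f,M)\le(|f|-M_0\chi X)^++\frac{\epsilon}{\gamma+1}|f| \]
everywhere on $\operatorname{dom}f$, so
\[ \int_{F(f,M)}|f|\le\int(|f|-M_0\chi X)^++\frac{\epsilon}{\gamma+1}\int|f|\le 2\epsilon. \]
As $\epsilon$ is arbitrary, $\lim_{M\to\infty}\sup_{f\in A}\int_{F(f,M)}|f|=0$.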
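The criterion of 246I is easy to test on a standard example. The following sketch assumes Lebesgue measure on $[0,1]$ and the functions $f_n=n\chi[0,\frac1n]$ for $n\ge 1$; the example is mine and is offered only as an illustration.

```latex
% Assumed example: \mu = Lebesgue measure on [0,1], f_n = n \chi[0, 1/n], A = \{f_n : n \ge 1\}.
\[
F(f_n, M) =
\begin{cases}
[0, \tfrac1n] & \text{if } 0 < M \le n, \\
\emptyset     & \text{if } M > n,
\end{cases}
\qquad
\int_{F(f_n, M)} |f_n| =
\begin{cases}
1 & \text{if } 0 < M \le n, \\
0 & \text{if } M > n.
\end{cases}
\]
% Hence, for every M > 0 (taking any n \ge M),
\[
\sup_{n \ge 1} \int_{F(f_n, M)} |f_n| = 1 ,
\qquad
\lim_{M \to \infty} \sup_{n \ge 1} \int_{F(f_n, M)} |f_n| = 1 \neq 0 .
\]
```

So $A$ is not uniformly integrable, although $\int|f_n|=1$ for every $n$; boundedness in $\|\ \|_1$ is necessary (246Ca) but not sufficient.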
246J The next step is to set out some remarkable connexions between uniform integrability and the topology of convergence in measure discussed in the last section.
Theorem Let $(X,\Sigma,\mu)$ be a measure space.
(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a uniformly integrable sequence of real-valued functions on $X$, and $f(x)=\lim_{n\to\infty}f_n(x)$ for almost every $x\in X$, then $f$ is integrable and $\lim_{n\to\infty}\int|f_n-f|=0$; consequently $\int f=\lim_{n\to\infty}\int f_n$.
(b) If $A\subseteq L^1=L^1(\mu)$ is uniformly integrable, then the norm topology of $L^1$ and the topology of convergence in measure of $L^0=L^0(\mu)$ agree on $A$.
(c) For any $u\in L^1$ and any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$, the following are equiveridical:
(i) $u=\lim_{n\to\infty}u_n$ for $\|\ \|_1$;
(ii) $\{u_n: n\in\mathbb{N}\}$ is uniformly integrable and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in measure.
(d) If $(X,\Sigma,\mu)$ is semi-finite, and $A\subseteq L^1$ is uniformly integrable, then the closure $\overline{A}$ of $A$ in $L^0$ for the topology of convergence in measure is still a uniformly integrable subset of $L^1$.
proof (a) Note first that because $\sup_{n\in\mathbb{N}}\int|f_n|<\infty$ (246Ca) and $|f|=\liminf_{n\to\infty}|f_n|$, Fatou's Lemma assures us that $|f|$ is integrable, with $\int|f|\le\limsup_{n\to\infty}\int|f_n|$. It follows immediately that $\{f_n-f: n\in\mathbb{N}\}$ is uniformly integrable, being the sum of two uniformly integrable sets (246Cc, 246Ce).
Given $\epsilon>0$, there are $M\ge 0$, $E\in\Sigma$ such that $\mu E<\infty$ and $\int(|f_n-f|-M\chi E)^+\le\epsilon$ for every $n\in\mathbb{N}$. Also $|f_n-f|\wedge M\chi E\to 0$ a.e., so
\[ \limsup_{n\to\infty}\int|f_n-f|\le\limsup_{n\to\infty}\int(|f_n-f|-M\chi E)^++\limsup_{n\to\infty}\int|f_n-f|\wedge M\chi E\le\epsilon, \]
by Lebesgue's Dominated Convergence Theorem. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\int|f_n-f|=0$ and $\lim_{n\to\infty}\int f_n-\int f=0$.
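(b) Let $\mathfrak{T}_A$, $\mathfrak{S}_A$ be the topologies on $A$ induced by the norm topology of $L^1$ and the topology of convergence in measure on $L^0$ respectively.
(i) Given $\epsilon>0$, let $F\in\Sigma$, $M\ge 0$ be such that $\mu F<\infty$ and $\int(|v|-M\chi F^{\bullet})^+\le\epsilon$ for every $v\in A$, and consider $\rho_F$, defined as in 245A. Then for any $f$, $g\in\mathcal{L}^0$,
\[ |f-g|\le(|f|-M\chi F)^++(|g|-M\chi F)^++M(|f-g|\wedge\chi F) \]
everywhere on $\operatorname{dom}f\cap\operatorname{dom}g$, so
\[ |u-v|\le(|u|-M\chi F^{\bullet})^++(|v|-M\chi F^{\bullet})^++M(|u-v|\wedge\chi F^{\bullet}) \]
for all $u$, $v\in L^0$. Consequently
\[ \|u-v\|_1\le 2\epsilon+M\rho_F(u,v) \]
for all $u$, $v\in A$.
This means that, given $\epsilon>0$, we can find $F$, $M$ such that, for $u$, $v\in A$,
\[ \rho_F(u,v)\le\frac{\epsilon}{1+M}\Longrightarrow\|u-v\|_1\le 3\epsilon. \]
It follows that every subset of $A$ which is open for $\mathfrak{T}_A$ is open for $\mathfrak{S}_A$ (2A3Ib).
(ii) In the other direction, we have $\rho_F(u,v)\le\|u-v\|_1$ for every $u\in L^1$ and every set $F$ of finite measure, so every subset of $A$ which is open for $\mathfrak{S}_A$ is open for $\mathfrak{T}_A$.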
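(c) If $\langle u_n\rangle_{n\in\mathbb{N}}\to u$ for $\|\ \|_1$, $A=\{u_n: n\in\mathbb{N}\}$ is uniformly integrable. \textbf{P} Given $\epsilon>0$, let $m$ be such that $\|u_n-u\|_1\le\epsilon$ whenever $n\ge m$. Set $v=|u|+\sum_{i\le m}|u_i|\in L^1$, and let $M\ge 0$, $E\in\Sigma$ be such that $\mu E$ is finite and $\int_E(v-M\chi E^{\bullet})^+\le\epsilon$. Then, for $w\in A$,
\[ (|w|-M\chi E^{\bullet})^+\le(|w|-v)^++(v-M\chi E^{\bullet})^+, \]
so
\[ \int_E(|w|-M\chi E^{\bullet})^+\le\|(|w|-v)^+\|_1+\int_E(v-M\chi E^{\bullet})^+\le 2\epsilon. \ \textbf{Q} \]
Thus on either hypothesis we can be sure that $\{u_n: n\in\mathbb{N}\}$ and $A=\{u\}\cup\{u_n: n\in\mathbb{N}\}$ are uniformly integrable, so that the two topologies agree on $A$ (by (b)) and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in one topology iff it converges to $u$ in the other.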
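(d) Because $A$ is $\|\ \|_1$-bounded (246Ca) and $\mu$ is semi-finite, $\overline{A}\subseteq L^1$ (245J(b-i)). Given $\epsilon>0$, let $M\ge 0$, $E\in\Sigma$ be such that $\mu E<\infty$ and $\int(|u|-M\chi E^{\bullet})^+\le\epsilon$ for every $u\in A$. Now the maps $u\mapsto|u|$, $u\mapsto u-M\chi E^{\bullet}$, $u\mapsto u^+ : L^0\to L^0$ are all continuous for the topology of convergence in measure (245D), while $\{u: \|u\|_1\le\epsilon\}$ is closed for the same topology (245J again), so $\{u: u\in L^0,\ \int(|u|-M\chi E^{\bullet})^+\le\epsilon\}$ is closed and must include $\overline{A}$. Thus $\int(|u|-M\chi E^{\bullet})^+\le\epsilon$ for every $u\in\overline{A}$. As $\epsilon$ is arbitrary, $\overline{A}$ is uniformly integrable.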
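To see 246Ja at work, and to see how it can fail without uniform integrability, here is a hedged sketch. It assumes Lebesgue measure on $[0,1]$ and the two sequences $g_n=\sqrt{n}\,\chi[0,\frac1n]$, $f_n=n\chi[0,\frac1n]$ ($n\ge 1$); these examples are mine, not the author's.

```latex
% Assumed examples on [0,1] with Lebesgue measure; both sequences tend to 0 at every x > 0.
% (1) g_n = \sqrt{n}\,\chi[0,1/n] is uniformly integrable: for M > 0,
\[
\int (g_n - M\chi X)^+ = \frac{(\sqrt{n} - M)^+}{n}
\;\le\;
\begin{cases}
0 & \text{if } n \le M^2, \\[2pt]
\dfrac{1}{\sqrt{n}} < \dfrac{1}{M} & \text{if } n > M^2,
\end{cases}
\]
% so \sup_n \int (g_n - M\chi X)^+ \le 1/M \to 0 as M \to \infty, and, as 246Ja predicts,
\[
\int |g_n - 0| = \frac{1}{\sqrt{n}} \to 0 .
\]
% (2) f_n = n\,\chi[0,1/n] also tends to 0 a.e., but
\[
\int f_n = 1 \not\to 0 = \int 0 ,
\]
% so by 246Ja the sequence <f_n> cannot be uniformly integrable.
```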
246K Complex $\mathcal{L}^1$ and $L^1$ The definitions and theorems above can be repeated without difficulty for spaces of (equivalence classes of) complex-valued functions, with just one variation: in the complex equivalent of 246F, the constant must be changed. It is easy to see that, for $u\in L^1_{\mathbb{C}}(\mu)$,
\[ \|u\|_1\le\|\mathcal{R}e(u)\|_1+\|\mathcal{I}m(u)\|_1\le 2\sup_{F\in\Sigma}\Bigl|\int_F\mathcal{R}e(u)\Bigr|+2\sup_{F\in\Sigma}\Bigl|\int_F\mathcal{I}m(u)\Bigr|\le 4\sup_{F\in\Sigma}\Bigl|\int_F u\Bigr|. \]
(In fact, $\|u\|_1\le\pi\sup_{F\in\Sigma}|\int_F u|$; see 246Yl and 252Yt.) Consequently some of the arguments of 246G need to be written out with different constants, but the results, as stated, are unaffected.
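The constant $\pi$ is attained by the function of 246Yl. The following is a sketch of the computation, under the assumption that $\mu$ is Lebesgue measure on $[-\pi,\pi]$ and $u$ is the class of $x\mapsto e^{ix}$; the choice of the angle $\theta$ below is a device of mine.

```latex
% Assumed setting: \mu = Lebesgue measure on [-\pi,\pi], u(x) = e^{ix}.
\[
\|u\|_1 = \int_{-\pi}^{\pi} |e^{ix}|\,dx = 2\pi ,
\qquad
\int_{-\pi/2}^{\pi/2} e^{ix}\,dx = \frac{e^{i\pi/2} - e^{-i\pi/2}}{i} = 2 .
\]
% For measurable E, choose \theta with \int_E u = e^{i\theta}\,|\int_E u|; then
\[
\Bigl|\int_E u\Bigr| = \int_E \cos(x - \theta)\,dx
\;\le\; \int_{-\pi}^{\pi} \cos^+(x - \theta)\,dx = 2 ,
\]
% so \sup_E |\int_E u| = 2 and
\[
\|u\|_1 = 2\pi = \pi \sup_{E} \Bigl|\int_E u\Bigr| .
\]
```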
246X Basic exercises (a) Let $(X,\Sigma,\mu)$ be a measure space and $A$ a subset of $L^1=L^1(\mu)$. Show that the following are equiveridical: (i) $A$ is uniformly integrable; (ii) for every $\epsilon>0$ there is a $w\ge 0$ in $L^1$ such that $\int(|u|-w)^+\le\epsilon$ for every $u\in A$; (iii) $\langle(|u_{n+1}|-\sup_{i\le n}|u_i|)^+\rangle_{n\in\mathbb{N}}\to 0$ in $L^1$ for every sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$. (Hint: for (ii)$\Rightarrow$(iii), set $v_n=\sup_{i\le n}|u_i|$ and note that $\langle v_n\wedge w\rangle_{n\in\mathbb{N}}$ is convergent in $L^1$ for every $w\ge 0$.)
>(b) Let $(X,\Sigma,\mu)$ be a totally finite measure space. Show that for any $p>1$ and $M\ge 0$ the set $\{f: f\in\mathcal{L}^p(\mu),\ \|f\|_p\le M\}$ is uniformly integrable. (Hint: $\int(|f|-M\chi X)^+\le M^{1-p}\int|f|^p$.)
>(c) Let $\mu$ be counting measure on $\mathbb{N}$. Show that a set $A\subseteq L^1(\mu)=\ell^1$ is uniformly integrable iff (i) $\sup_{f\in A}|f(n)|<\infty$ for every $n\in\mathbb{N}$ (ii) for every $\epsilon>0$ there is an $m\in\mathbb{N}$ such that $\sum_{n=m}^{\infty}|f(n)|\le\epsilon$ for every $f\in A$.
(d) Let $X$ be a set, and let $\mu$ be counting measure on $X$. Show that a set $A\subseteq L^1(\mu)=\ell^1(X)$ is uniformly integrable iff (i) $\sup_{f\in A}|f(x)|<\infty$ for every $x\in X$ (ii) for every $\epsilon>0$ there is a finite set $I\subseteq X$ such that $\sum_{x\in X\setminus I}|f(x)|\le\epsilon$ for every $f\in A$. Show that in this case $A$ is relatively compact for the norm topology of $\ell^1(X)$.
(e) Let $(X,\Sigma,\mu)$ be a measure space, $\delta>0$, and $\mathcal{I}\subseteq\Sigma$ a family such that (i) every atom belongs to $\mathcal{I}$ (ii) $E\in\mathcal{I}$ whenever $E\in\Sigma$ and $\mu E\le\delta$ (iii) $E\cup F\in\mathcal{I}$ whenever $E$, $F\in\mathcal{I}$ and $E\cap F=\emptyset$. Show that every set of finite measure belongs to $\mathcal{I}$.
(f) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. Show that a set $A\subseteq\mathcal{L}^1(\nu)$ is uniformly integrable iff $\{g\phi: g\in A\}$ is uniformly integrable in $\mathcal{L}^1(\mu)$. (Hint: use 246G for 'if', 246A for 'only if'.)
>(g) Let $(X,\Sigma,\mu)$ be a measure space and $p\in[1,\infty[$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{L}^p=\mathcal{L}^p(\mu)$ such that $\{|f_n|^p: n\in\mathbb{N}\}$ is uniformly integrable and $f_n\to f$ a.e. Show that $f\in\mathcal{L}^p$ and $\lim_{n\to\infty}\int|f_n-f|^p=0$.
(h) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $p\in[1,\infty[$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p=L^p(\mu)$ and $u\in L^0(\mu)$. Show that the following are equiveridical: (i) $u\in L^p$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$ (ii) $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u$ and $\{|u_n|^p: n\in\mathbb{N}\}$ is uniformly integrable. (Hint: 245Xl.)
(i) Let $(X,\Sigma,\mu)$ be a totally finite measure space, and $1\le p<r\le\infty$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a $\|\ \|_r$-bounded sequence in $L^r(\mu)$ which converges in measure to $u\in L^0(\mu)$. Show that $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$. (Hint: show that $\{|u_n|^p: n\in\mathbb{N}\}$ is uniformly integrable.)
246Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a totally finite measure space. Show that $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff there is a convex function $\phi:[0,\infty[\to\mathbb{R}$ such that $\lim_{a\to\infty}\phi(a)/a=\infty$ and $\sup_{f\in A}\int\phi(|f|)<\infty$.
(b) For any metric space $(Z,\rho)$, let $\mathcal{C}_Z$ be the family of closed subsets of $Z$, and for $F$, $F'\in\mathcal{C}_Z$ set $\tilde\rho(F,F')=\min(1,\max(\sup_{z\in F}\inf_{z'\in F'}\rho(z,z'),\ \sup_{z'\in F'}\inf_{z\in F}\rho(z,z')))$. Show that $\tilde\rho$ is a metric on $\mathcal{C}_Z$ (it is the Hausdorff metric). Show that if $(Z,\rho)$ is complete then the family $\mathcal{K}_Z$ of non-empty compact subsets of $Z$ is closed for $\tilde\rho$. Now let $(X,\Sigma,\mu)$ be any measure space and take $Z=L^1=L^1(\mu)$, $\rho(z,z')=\|z-z'\|_1$ for $z$, $z'\in Z$. Show that the family of non-empty closed uniformly integrable subsets of $L^1$ is a closed subset of $\mathcal{C}_Z$ including $\mathcal{K}_Z$.
(c) Let $(X,\Sigma,\mu)$ be a totally finite measure space and $A\subseteq L^1(\mu)$ a uniformly integrable set. Show that there is a uniformly integrable set $C\supseteq A$ such that (i) $C$ is convex and closed in $L^0(\mu)$ for the topology of convergence in measure (ii) if $u\in C$ and $|v|\le|u|$ then $v\in C$ (iii) if $T$ belongs to the set $\mathcal{T}^+$ of operators from $L^1(\mu)=M^{1,\infty}(\mu)$ to itself, as described in 244Xm, then $T[C]\subseteq C$.
(d) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty}\int_{F_n}f_n=0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.
(e) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty}\int_{G_n}f_n=0$ for every disjoint sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of open sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.
(f) Repeat 246Yd and 246Ye for Lebesgue measure on arbitrary subsets of $\mathbb{R}^r$.
(g) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of countably additive functionals on $\Sigma$ such that $\nu E=\lim_{n\to\infty}\nu_nE$ is defined for every $E\in\Sigma$. Show that $\lim_{n\to\infty}\nu_nF_n=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. (Hint: suppose otherwise. By taking suitable subsequences reduce to the case in which $|\nu_nF_i-\nu F_i|\le 2^{-n}\epsilon$ for $i<n$, $|\nu_nF_n|\ge 3\epsilon$, $|\nu_nF_i|\le 2^{-i}\epsilon$ for $i>n$. Set $F=\bigcup_{i\in\mathbb{N}}F_{2i+1}$ and show that $|\nu_{2n+1}F-\nu_{2n}F|\ge\epsilon$ for every $n$.)
(h) Let $(X,\Sigma,\mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1=L^1(\mu)$ such that $\lim_{n\to\infty}\int_Fu_n$ is defined for every $F\in\Sigma$. Show that $\{u_n: n\in\mathbb{N}\}$ is uniformly integrable. (Hint: suppose not. Then there are a disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ and a subsequence $\langle u'_n\rangle_{n\in\mathbb{N}}$ of $\langle u_n\rangle_{n\in\mathbb{N}}$ such that $\inf_{n\in\mathbb{N}}|\int_{F_n}u'_n|=\epsilon>0$. But this contradicts 246Yg.)
(i) In 246Yg, show that $\nu$ is countably additive. (Hint: Set $\mu=\sum_{n=0}^{\infty}a_n\nu_n$ for a suitable sequence $\langle a_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers. For each $n$ choose a Radon-Nikodým derivative $f_n$ of $\nu_n$ with respect to $\mu$. Show that $\{f_n: n\in\mathbb{N}\}$ is uniformly integrable, so that $\nu$ is truly continuous.) (This is the Vitali-Hahn-Saks theorem.)
(j) Let $(X,\Sigma,\mu)$ be any measure space, and $A\subseteq L^1(\mu)$. Show that the following are equiveridical: (i) $A$ is $\|\ \|_1$-bounded; (ii) $\sup_{u\in A}|\int_Fu|<\infty$ for every $\mu$-atom $F\in\Sigma$ and $\limsup_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|<\infty$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure; (iii) $\sup_{u\in A}|\int_Eu|<\infty$ for every $E\in\Sigma$. (Hint: show that $\langle a_nu_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable whenever $\lim_{n\to\infty}a_n=0$ in $\mathbb{R}$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $A$.)
(k) Let $(X,\Sigma,\mu)$ be a measure space and $A\subseteq L^1(\mu)$ a non-empty set. Show that the following are equiveridical: (i) $A$ is uniformly integrable; (ii) whenever $B\subseteq L^\infty(\mu)$ is non-empty and downwards-directed and has infimum $0$ in $L^\infty(\mu)$ then $\inf_{v\in B}\sup_{u\in A}|\int u\times v|=0$. (Hint: for (i)$\Rightarrow$(ii), note that $\inf_{v\in B}w\times v=0$ for every $w\ge 0$ in $L^0$. For (ii)$\Rightarrow$(i), use 246G(iv).)
(l) Set $f(x)=e^{ix}$ for $x\in[-\pi,\pi]$. Show that $|\int_Ef|\le 2$ for every $E\subseteq[-\pi,\pi]$.
246 Notes and comments I am holding over to the next section the most striking property of uniformly integrable sets (they are the relatively weakly compact sets in $L^1$) because this demands some non-trivial ideas from functional analysis and general topology. In this section I give the results which can be regarded as essentially measure-theoretic in inspiration. The most important new concept, or technique, is that of 'disjoint-sequence theorem'. A typical example is in condition (iii) of 246G, relating uniform integrability to the behaviour of functionals on disjoint sequences of sets. I give variants of this in 246Yd-246Yf, and 246Yg-246Yj are further results in which similar methods can be used. The central result of the next section (247C) will also use disjoint sequences in the proof, and they will appear more than once in Chapter 35 in the next volume.
The phrase 'uniformly integrable' ought to mean something like 'uniformly approximable by simple functions', and the definition 246A can be forced into such a form, but I do not think it very useful to do so. However condition (ii) of 246G amounts to something like 'uniformly truly continuous', if we think of members of $L^1$ as truly continuous functionals on $\Sigma$, as in 242I. (See 246Yi.) Note that in each of the statements (ii)-(iv) of 246G we need to take special note of any atoms for the measure, since they are not controlled by the main condition imposed. In an atomless measure space, of course, we have a simplification here, as in 246Yd-246Yf.
Another way of justifying the 'uniformly' in 'uniformly integrable' is by considering functionals $\theta_w$ where $w\ge 0$ in $L^1$, setting $\theta_w(u)=\int(|u|-w)^+$ for $u\in L^1$; then $A\subseteq L^1$ is uniformly integrable iff $\theta_w\to 0$ uniformly on $A$ as $w$ rises in $L^1$ (246Xa). It is sometimes useful to know that if this is true at all then it is necessarily witnessed by elements $w$ which can be built directly from materials at hand (see (iii) of 246Xa). Furthermore, the sets $A_{w\epsilon}=\{u: \theta_w(u)\le\epsilon\}$ are always convex, $\|\ \|_1$-closed and 'solid' (if $u\in A_{w\epsilon}$ and $|v|\le|u|$ then $v\in A_{w\epsilon}$) (246Cd); they are closed under pointwise convergence of sequences (246Ja) and in semi-finite measure spaces they are closed for the topology of convergence in measure (246Jd); in probability spaces, for level $w$, they are closed under conditional expectations (246D) and similar operators (246Yc). Consequently we can expect that any uniformly integrable set will be included in a uniformly integrable set which is closed under operations of several different types.
Yet another 'uniform' property of uniformly integrable sets is in 246Yk. The norm $\|\ \|_\infty$ is never (in interesting cases) order-continuous in the way that other $\|\ \|_p$ are (244Ye); but the uniformly integrable subsets of $L^1$ provide interesting order-continuous seminorms on $L^\infty$.
246J supplements results from §245. In the notes to that section I mentioned the question: if $\langle f_n\rangle_{n\in\mathbb{N}}\to f$ a.e., in what ways can $\langle\int f_n\rangle_{n\in\mathbb{N}}$ fail to converge to $\int f$? Here we find that $\langle\int|f_n-f|\rangle_{n\in\mathbb{N}}\to 0$ iff $\{f_n: n\in\mathbb{N}\}$ is uniformly integrable; this is a way of making precise the expression 'none of the weight of the sequence is lost at infinity'. Generally, for sequences, convergence in $\|\ \|_p$, for $p\in[1,\infty[$, is convergence in measure for $p$th-power-uniformly-integrable sequences (246Xh).
247 Weak compactness in $L^1$
I now come to the most striking feature of uniform integrability: it provides a description of the relatively weakly compact subsets of $L^1$ (247C). I have put this into a separate section because it demands some knowledge of functional analysis (in particular, of course, of weak topologies on Banach spaces). I will try to give an account in terms which are accessible to novices in the theory of normed spaces because the result is essentially measure-theoretic, as well as being of vital importance to applications in probability theory. I have written out the essential definitions in §§2A3-2A5.
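247A Part of the argument of the main theorem below will run more smoothly if I separate out an idea which is, in effect, a simple special case of a theme which has been running through the exercises of this chapter (241Yg, 242Yb, 243Ya, 244Yd).
Lemma Let $(X,\Sigma,\mu)$ be a measure space, and $G$ any member of $\Sigma$. Let $\mu_G$ be the subspace measure on $G$, so that $\mu_GE=\mu E$ for $E\subseteq G$, $E\in\Sigma$. Set
\[ U=\{u: u\in L^1(\mu),\ u\times\chi G^{\bullet}=u\}\subseteq L^1(\mu). \]
Then we have an isomorphism $S$ between the ordered normed spaces $U$ and $L^1(\mu_G)$, given by writing
\[ S(f^{\bullet})=(f\restriction G)^{\bullet} \]
for every $f\in\mathcal{L}^1(\mu)$ such that $f^{\bullet}\in U$.
proof Of course I should remark explicitly that $U$ is a linear subspace of $L^1(\mu)$. I have discussed integration over subspaces in §131 and §214; in particular, I noted that $f\restriction G$ is integrable, and that
\[ \int|f\restriction G|\,d\mu_G=\int|f|\times\chi G\,d\mu\le\int|f|\,d\mu \]
for every $f\in\mathcal{L}^1(\mu)$ (131Fa). If $f$, $g\in\mathcal{L}^1(\mu)$ and $f=g$ $\mu$-a.e., then $f\restriction G=g\restriction G$ $\mu_G$-a.e.; so the proposed formula for $S$ does indeed define a map from $U$ to $L^1(\mu_G)$.
Because
\[ (f+g)\restriction G=(f\restriction G)+(g\restriction G),\quad (cf)\restriction G=c(f\restriction G) \]
for all $f$, $g\in\mathcal{L}^1(\mu)$ and all $c\in\mathbb{R}$, $S$ is linear. Because
\[ f\le g\ \mu\text{-a.e.}\Longrightarrow f\restriction G\le g\restriction G\ \mu_G\text{-a.e.}, \]
$S$ is order-preserving. Because $\int|f\restriction G|\,d\mu_G\le\int|f|\,d\mu$ for every $f\in\mathcal{L}^1(\mu)$, $\|Su\|_1\le\|u\|_1$ for every $u\in U$.
To see that $S$ is surjective, take any $v\in L^1(\mu_G)$. Express $v$ as $g^{\bullet}$ where $g\in\mathcal{L}^1(\mu_G)$. By 131E, $f\in\mathcal{L}^1(\mu)$, where $f(x)=g(x)$ for $x\in\operatorname{dom}g$, $0$ for $x\in X\setminus G$; so that $f^{\bullet}\in U$ and $f\restriction G=g$ and $v=S(f^{\bullet})\in S[U]$.
To see that $S$ is norm-preserving, note that, for any $f\in\mathcal{L}^1(\mu)$,
\[ \int|f\restriction G|\,d\mu_G=\int|f|\times\chi G\,d\mu, \]
so that if $u=f^{\bullet}\in U$ we shall have
\[ \|Su\|_1=\int|f\restriction G|\,d\mu_G=\int|f|\times\chi G\,d\mu=\|u\times\chi G^{\bullet}\|_1=\|u\|_1. \]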
247B Corollary Let $(X,\Sigma,\mu)$ be any measure space, and let $G\in\Sigma$ be a measurable set expressible as a countable union of sets of finite measure. Define $U$ as in 247A, and let $h:L^1(\mu)\to\mathbb{R}$ be any continuous linear functional. Then there is a $v\in L^\infty(\mu)$ such that $h(u)=\int u\times v\,d\mu$ for every $u\in U$.
proof Let $S:U\to L^1(\mu_G)$ be the isomorphism described in 247A. Then $S^{-1}:L^1(\mu_G)\to U$ is linear and continuous, so $h_1=hS^{-1}$ belongs to the normed space dual $(L^1(\mu_G))^*$ of $L^1(\mu_G)$. Now of course $\mu_G$ is $\sigma$-finite, therefore localizable (211L), so 243Gb tells us that there is a $v_1\in L^\infty(\mu_G)$ such that
\[ h_1(u)=\int u\times v_1\,d\mu_G \]
for every $u\in L^1(\mu_G)$.
Express $v_1$ as $g_1^{\bullet}$ where $g_1:G\to\mathbb{R}$ is a bounded measurable function. Set $g(x)=g_1(x)$ for $x\in G$, $0$ for $x\in X\setminus G$; then $g:X\to\mathbb{R}$ is a bounded measurable function, and $v=g^{\bullet}\in L^\infty(\mu)$. If $u\in U$, express $u$ as $f^{\bullet}$ where $f\in\mathcal{L}^1(\mu)$; then
\[ h(u)=h(S^{-1}Su)=h_1((f\restriction G)^{\bullet})=\int(f\restriction G)\times g_1\,d\mu_G=\int(f\times g)\restriction G\,d\mu_G=\int f\times g\times\chi G\,d\mu=\int f\times g\,d\mu=\int u\times v. \]
As $u$ is arbitrary, this proves the result.
247C Theorem Let $(X,\Sigma,\mu)$ be any measure space and $A$ a subset of $L^1=L^1(\mu)$. Then $A$ is uniformly integrable iff it is relatively compact in $L^1$ for the weak topology of $L^1$.
proof (a) Suppose that $A$ is relatively compact for the weak topology. I seek to show that it satisfies the condition (iii) of 246G.
(i) If $F\in\Sigma$, then surely $\sup_{u\in A}|\int_Fu|<\infty$, because $u\mapsto\int_Fu$ belongs to $(L^1)^*$, and if $h\in(L^1)^*$ then the image of any relatively weakly compact set under $h$ must be bounded (2A5Ie).
(ii) Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. \textbf{?} Suppose, if possible, that $\langle\sup_{u\in A}|\int_{F_n}u|\rangle_{n\in\mathbb{N}}$ does not converge to $0$. Then there is a strictly increasing sequence $\langle n(k)\rangle_{k\in\mathbb{N}}$ in $\mathbb{N}$ such that
\[ \epsilon=\frac12\inf_{k\in\mathbb{N}}\sup_{u\in A}\Bigl|\int_{F_{n(k)}}u\Bigr|>0. \]
For each $k$, choose $u_k\in A$ such that $|\int_{F_{n(k)}}u_k|\ge\epsilon$. Because $A$ is relatively compact for the weak topology, there is a cluster point $u$ of $\langle u_k\rangle_{k\in\mathbb{N}}$ in $L^1$ for the weak topology (2A3Ob). Set $\eta_j=2^{-j}\epsilon/6>0$ for each $j\in\mathbb{N}$.
We can now choose a strictly increasing sequence $\langle k(j)\rangle_{j\in\mathbb{N}}$ inductively so that, for each $j$,
\[ \int_{F_{n(k(j))}}\Bigl(|u|+\sum_{i=0}^{j-1}|u_{k(i)}|\Bigr)\le\eta_j, \]
\[ \sum_{i=0}^{j-1}\Bigl|\int_{F_{n(k(i))}}u-\int_{F_{n(k(i))}}u_{k(j)}\Bigr|\le\eta_j \]
for every $j$, interpreting $\sum_{i=0}^{-1}$ as $0$. \textbf{P} Given $\langle k(i)\rangle_{i<j}$, set $v^*=|u|+\sum_{i=0}^{j-1}|u_{k(i)}|$; then $\lim_{k\to\infty}\int_{F_{n(k)}}v^*=0$, by Lebesgue's Dominated Convergence Theorem or otherwise, so there is a $k^*$ such that $k^*>k(i)$ for every $i<j$ and $\int_{F_{n(k)}}v^*\le\eta_j$ for every $k\ge k^*$. Next,
\[ w\mapsto\sum_{i=0}^{j-1}\Bigl|\int_{F_{n(k(i))}}u-\int_{F_{n(k(i))}}w\Bigr| : L^1\to\mathbb{R} \]
is continuous for the weak topology of $L^1$ and zero at $u$, and $u$ belongs to every weakly closed set containing $\{u_k: k\ge k^*\}$, so there is a $k(j)\ge k^*$ such that $\sum_{i=0}^{j-1}|\int_{F_{n(k(i))}}u-\int_{F_{n(k(i))}}u_{k(j)}|<\eta_j$, which continues the construction. \textbf{Q}
Let $v$ be any cluster point in $L^1$, for the weak topology, of $\langle u_{k(j)}\rangle_{j\in\mathbb{N}}$. Setting $G_i=F_{n(k(i))}$, we have $|\int_{G_i}u-\int_{G_i}u_{k(j)}|\le\eta_j$ whenever $i<j$, so $\lim_{j\to\infty}\int_{G_i}u_{k(j)}$ exists $=\int_{G_i}u$ for each $i$, and $\int_{G_i}v=\int_{G_i}u$ for every $i$; setting $G=\bigcup_{i\in\mathbb{N}}G_i$,
\[ \int_Gv=\sum_{i=0}^{\infty}\int_{G_i}v=\sum_{i=0}^{\infty}\int_{G_i}u=\int_Gu, \]
by 232D, because $\langle G_i\rangle_{i\in\mathbb{N}}$ is disjoint.
For each $j\in\mathbb{N}$,
\[ \sum_{i=0}^{j-1}\Bigl|\int_{G_i}u_{k(j)}\Bigr|+\sum_{i=j+1}^{\infty}\Bigl|\int_{G_i}u_{k(j)}\Bigr|
\le\sum_{i=0}^{j-1}\int_{G_i}|u|+\sum_{i=0}^{j-1}\Bigl|\int_{G_i}u-\int_{G_i}u_{k(j)}\Bigr|+\sum_{i=j+1}^{\infty}\int_{G_i}|u_{k(j)}| \]
\[ \le\sum_{i=0}^{j-1}\eta_i+\eta_j+\sum_{i=j+1}^{\infty}\eta_i=\sum_{i=0}^{\infty}\eta_i=\frac{\epsilon}{3}. \]
On the other hand, $|\int_{G_j}u_{k(j)}|\ge\epsilon$. So
\[ \Bigl|\int_Gu_{k(j)}\Bigr|=\Bigl|\sum_{i=0}^{\infty}\int_{G_i}u_{k(j)}\Bigr|\ge\frac23\epsilon. \]
This is true for every $j$; because every weakly open set containing $v$ meets $\{u_{k(j)}: j\in\mathbb{N}\}$, $|\int_Gv|\ge\frac23\epsilon$ and $|\int_Gu|\ge\frac23\epsilon$. On the other hand,
\[ \Bigl|\int_Gu\Bigr|=\Bigl|\sum_{i=0}^{\infty}\int_{G_i}u\Bigr|\le\sum_{i=0}^{\infty}\int_{G_i}|u|\le\sum_{i=0}^{\infty}\eta_i=\frac{\epsilon}{3}, \]
which is absurd. \textbf{X}
This contradiction shows that $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$. As $\langle F_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $A$ satisfies the condition 246G(iii) and is uniformly integrable.
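(b) Now assume that $A$ is uniformly integrable. I seek a weakly compact set $C\supseteq A$.
(i) For each $n\in\mathbb{N}$, choose $E_n\in\Sigma$, $M_n\ge 0$ such that $\mu E_n<\infty$ and $\int(|u|-M_n\chi E_n^{\bullet})^+\le 2^{-n}$ for every $u\in A$. Set
\[ C=\{v: v\in L^1,\ \Bigl|\int_Fv\Bigr|\le M_n\mu(F\cap E_n)+2^{-n}\ \forall\ n\in\mathbb{N},\ F\in\Sigma\}, \]
and note that $A\subseteq C$, because if $u\in A$ and $F\in\Sigma$,
\[ \Bigl|\int_Fu\Bigr|\le\int_F(|u|-M_n\chi E_n^{\bullet})^++\int_FM_n\chi E_n^{\bullet}\le 2^{-n}+M_n\mu(F\cap E_n) \]
for every $n$. Observe also that $C$ is $\|\ \|_1$-bounded, because
\[ \|u\|_1\le 2\sup_{F\in\Sigma}\Bigl|\int_Fu\Bigr|\le 2\sup_{F\in\Sigma}(1+M_0\mu(F\cap E_0))\le 2(1+M_0\mu E_0) \]
for every $u\in C$ (using 246F).
(ii) Because I am seeking to prove this theorem for arbitrary measure spaces $(X,\Sigma,\mu)$, I cannot use 243G to identify the dual of $L^1$. Nevertheless, 247B above shows that 243Gb is 'nearly' valid, in the following sense: if $h\in(L^1)^*$, there is a $v\in L^\infty$ such that $h(u)=\int u\times v$ for every $u\in C$. \textbf{P} Set $G=\bigcup_{n\in\mathbb{N}}E_n\in\Sigma$, and define $U\subseteq L^1$ as in 247A-247B. By 247B, there is a $v\in L^\infty$ such that $h(u)=\int u\times v$ for every $u\in U$. But if $u\in C$, we can express $u$ as $f^{\bullet}$ where $f:X\to\mathbb{R}$ is measurable. If $F\in\Sigma$ and $F\cap G=\emptyset$, then
\[ \Bigl|\int_Ff\Bigr|=\Bigl|\int_Fu\Bigr|\le 2^{-n}+M_n\mu(F\cap E_n)=2^{-n} \]
for every $n\in\mathbb{N}$, so $\int_Ff=0$; it follows that $f=0$ a.e. on $X\setminus G$ (131Fc), so that $f\times\chi G=_{\text{a.e.}}f$ and $u=u\times\chi G^{\bullet}$, that is, $u\in U$, and $h(u)=\int u\times v$, as required. \textbf{Q}
(iii) So we may proceed, having an adequate description, not of $(L^1(\mu))^*$ itself, but of its action on $C$.
Let $\mathcal{F}$ be any ultrafilter on $L^1$ containing $C$ (see 2A3R). For each $F\in\Sigma$, set
\[ \nu F=\lim_{u\to\mathcal{F}}\int_Fu; \]
because
\[ \sup_{u\in C}\Bigl|\int_Fu\Bigr|\le\sup_{u\in C}\|u\|_1<\infty, \]
this is well-defined in $\mathbb{R}$ (2A3Se). If $E$, $F$ are disjoint members of $\Sigma$, then $\int_{E\cup F}u=\int_Eu+\int_Fu$ for every $u\in C$, so
\[ \nu(E\cup F)=\lim_{u\to\mathcal{F}}\int_{E\cup F}u=\lim_{u\to\mathcal{F}}\int_Eu+\lim_{u\to\mathcal{F}}\int_Fu=\nu E+\nu F \]
(2A3Sf). Thus $\nu:\Sigma\to\mathbb{R}$ is additive. Next, it is truly continuous with respect to $\mu$. \textbf{P} Given $\epsilon>0$, take $n\in\mathbb{N}$ such that $2^{-n}\le\frac12\epsilon$, set $\delta=\frac{\epsilon}{2(M_n+1)}>0$ and observe that
\[ |\nu F|\le\sup_{u\in C}\Bigl|\int_Fu\Bigr|\le 2^{-n}+M_n\mu(F\cap E_n)\le\epsilon \]
whenever $\mu(F\cap E_n)\le\delta$. \textbf{Q} By the Radon-Nikodým theorem (232E), there is an $f_0\in\mathcal{L}^1$ such that $\int_Ff_0=\nu F$ for every $F\in\Sigma$. Set $u_0=f_0^{\bullet}\in L^1$. If $n\in\mathbb{N}$, $F\in\Sigma$ then
\[ \Bigl|\int_Fu_0\Bigr|=|\nu F|\le\sup_{u\in C}\Bigl|\int_Fu\Bigr|\le 2^{-n}+M_n\mu(F\cap E_n), \]
so $u_0\in C$.
(iv) Of course the point is that $\mathcal{F}$ converges to $u_0$. \textbf{P} Let $h\in(L^1)^*$. Then there is a $v\in L^\infty$ such that $h(u)=\int u\times v$ for every $u\in C$. Express $v$ as $g^{\bullet}$, where $g:X\to\mathbb{R}$ is bounded and $\Sigma$-measurable. Let $\epsilon>0$. Take $a_0\le a_1\le\ldots\le a_n$ such that $a_{i+1}-a_i\le\epsilon$ for each $i$ while $a_0\le g(x)<a_n$ for each $x\in X$. Set $F_i=\{x: a_{i-1}\le g(x)<a_i\}$ for $1\le i\le n$, and set $\tilde g=\sum_{i=1}^na_i\chi F_i$, $\tilde v=\tilde g^{\bullet}$; then $\|\tilde v-v\|_\infty\le\epsilon$. We have
\[ \int u_0\times\tilde v=\sum_{i=1}^na_i\int_{F_i}u_0=\sum_{i=1}^na_i\nu F_i=\sum_{i=1}^na_i\lim_{u\to\mathcal{F}}\int_{F_i}u=\lim_{u\to\mathcal{F}}\sum_{i=1}^na_i\int_{F_i}u=\lim_{u\to\mathcal{F}}\int u\times\tilde v. \]
Consequently
\[ \limsup_{u\to\mathcal{F}}\Bigl|\int u\times v-\int u_0\times v\Bigr|\le\Bigl|\int u_0\times v-\int u_0\times\tilde v\Bigr|+\sup_{u\in C}\Bigl|\int u\times v-\int u\times\tilde v\Bigr| \]
\[ \le\|u_0\|_1\|v-\tilde v\|_\infty+\sup_{u\in C}\|u\|_1\|v-\tilde v\|_\infty\le 2\epsilon\sup_{u\in C}\|u\|_1. \]
As $\epsilon$ is arbitrary,
\[ \limsup_{u\to\mathcal{F}}|h(u)-h(u_0)|=\limsup_{u\to\mathcal{F}}\Bigl|\int u\times v-\int u_0\times v\Bigr|=0. \]
As $h$ is arbitrary, $u_0$ is a limit of $\mathcal{F}$ in $C$ for the weak topology of $L^1$. \textbf{Q}
As $\mathcal{F}$ is arbitrary, $C$ is weakly compact in $L^1$, and the proof is complete.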
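A concrete check of the theorem may be useful. The following sketch assumes counting measure on $\mathbb{N}$, so that $L^1=\ell^1$, and takes $A$ to be the set of unit vectors $e_k=\chi\{k\}$; the example and the functional $h$ below are illustrations of mine.

```latex
% Assumed example: \mu = counting measure on \mathbb{N}, A = \{e_k : k \in \mathbb{N}\} \subseteq \ell^1.
% A is not uniformly integrable: with the disjoint sequence F_n = \{n\},
\[
\sup_{u \in A} \Bigl|\int_{F_n} u\Bigr| = \sup_{k \in \mathbb{N}} |e_k(n)| = 1
\quad\text{for every } n ,
\]
% so 246G(iii) fails. Directly: if v were a weak cluster point of <e_k>, then, because
% u \mapsto u(n) is weakly continuous and e_k(n) = 0 for k > n,
\[
v(n) = 0 \ \text{for every } n, \quad\text{that is, } v = 0 ;
\]
% but h(u) = \sum_{n=0}^{\infty} u(n) belongs to (\ell^1)^* and h(e_k) = 1 for every k, so
\[
h(v) = 1 \neq 0 = h(0) ,
\]
% a contradiction. Thus A is not relatively weakly compact, in agreement with 247C.
```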
247D Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be any two measure spaces, and $T:L^1(\mu)\to L^1(\nu)$ a continuous linear operator. Then $T[A]$ is a uniformly integrable subset of $L^1(\nu)$ whenever $A$ is a uniformly integrable subset of $L^1(\mu)$.
proof The point is that $T$ is continuous for the respective weak topologies (2A5If). If $A\subseteq L^1(\mu)$ is uniformly integrable, then there is a weakly compact $C\supseteq A$, by 247C; $T[C]$, being the image of a compact set under a continuous map, must be weakly compact (2A3N(b-ii)); so $T[C]$ and $T[A]$ are uniformly integrable by the other half of 247C.
247E Complex $L^1$ There are no difficulties, and no surprises, in proving 247C for $L^1_{\mathbb{C}}$. If we follow the same proof, everything works, but of course we must remember to change the constant when applying 246F, or rather 246K, in part (b-i) of the proof.
247X Basic exercises >(a) Let $(X,\Sigma,\mu)$ be any measure space. Show that if $A\subseteq L^1=L^1(\mu)$ is relatively weakly compact, then $\{v: v\in L^1,\ |v|\le|u|$ for some $u\in A\}$ is relatively weakly compact.
(b) Let $(X,\Sigma,\mu)$ be a measure space. On $L^1=L^1(\mu)$ define pseudometrics $\rho_F$, $\rho'_w$ for $F\in\Sigma$, $w\in L^\infty(\mu)$ by setting $\rho_F(u,v)=|\int_Fu-\int_Fv|$, $\rho'_w(u,v)=|\int u\times w-\int v\times w|$ for $u$, $v\in L^1$. Show that on any $\|\ \|_1$-bounded subset of $L^1$, the topology defined by $\{\rho_F: F\in\Sigma\}$ agrees with the topology generated by $\{\rho'_w: w\in L^\infty\}$.
>(c) Show that for any set $X$ a subset of $\ell^1=\ell^1(X)$ is compact for the weak topology of $\ell^1$ iff it is compact for the norm topology of $\ell^1$. (Hint: 246Xd.)
(d) Use the argument of (a-ii) in the proof of 247C to show directly that if $A\subseteq\ell^1(\mathbb{N})$ is weakly compact then $\inf_{n\in\mathbb{N}}|u_n(n)|=0$ for any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$.
(e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $T:L^2(\mu)\to L^1(\nu)$ any bounded linear operator. Show that $\{Tu: u\in L^2(\mu),\ \|u\|_2\le 1\}$ is uniformly integrable in $L^1(\nu)$. (Hint: use 244K to see that $\{u: \|u\|_2\le 1\}$ is weakly compact in $L^2(\mu)$.)
247Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space. Take $1<p<\infty$ and $M\ge 0$ and set $A=\{u: u\in L^p=L^p(\mu),\ \|u\|_p\le M\}$. Write $\mathfrak{S}_A$ for the topology of convergence in measure on $A$, that is, the subspace topology induced by the topology of convergence in measure on $L^0(\mu)$. Show that if $h\in(L^p)^*$ then $h\restriction A$ is continuous for $\mathfrak{S}_A$; so that if $\mathfrak{T}$ is the weak topology on $L^p$, then the subspace topology $\mathfrak{T}_A$ is included in $\mathfrak{S}_A$.
(b) Let $(X,\Sigma,\mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1=L^1(\mu)$ such that $\lim_{n\to\infty}\int_Fu_n$ is defined for every $F\in\Sigma$. Show that $\langle u_n\rangle_{n\in\mathbb{N}}$ is weakly convergent. (Hint: 246Yh.) Find an alternative argument relying on 2A5J and the result of 246Yj.
247 Notes and comments In 247D and 247Xa I try to suggest the power of the identification between weak compactness and uniform integrability. That a continuous image of a weakly compact set should be weakly compact is a commonplace of functional analysis; that the solid hull of a uniformly integrable set should be uniformly integrable is immediate from the definition. But I see no simple arguments to show that a continuous image of a uniformly integrable set should be uniformly integrable, or that the solid hull of a weakly compact set should be relatively weakly compact. (Concerning the former, an alternative route does exist; see 371Xf in the next volume.)
I can distinguish two important ideas in the proof of 247C. The first, in (a-ii) of the proof, is a careful manipulation of sequences; it is the argument needed to show that a weakly compact subset of $\ell^1$ is norm-compact. (You may find it helpful to write out a solution to 247Xd.) The $F_{n(k)}$ and $u_k$ are chosen to mimic the situation in which we have a sequence in $\ell^1$ such that $u_k(k) = 1$ for each $k$. The $k(i)$ are chosen so that the 'hump' moves sufficiently rapidly along for $u_{k(j)}(k(i))$ to be very small whenever $i \ne j$. But this means that $\sum_{i=0}^{\infty} u_{k(j)}(k(i))$ (corresponding to $\int_G u_{k(j)}$ in the proof) is always substantial, while $\sum_{i=0}^{\infty} v(k(i))$ will be small for any putative cluster point $v$ of $\langle u_{k(j)} \rangle_{j \in \mathbb{N}}$. I used similar techniques in §246; compare 246Yg.
In the other half of the proof of 247C, the strategy is clearer. Members of $L^1$ correspond to truly continuous functionals on $\Sigma$; the uniform integrability of $C$ makes the corresponding set of functionals 'uniformly truly continuous', so that any limit functional will also be truly continuous and will give us a member of $L^1$ via the Radon-Nikodým theorem. A straightforward approximation argument ((a-iv) in the proof, and 247Xb) shows that $\lim_{u \to \mathcal{F}} \int u \times w = \int v \times w$ for every $w \in L^\infty$. For localizable measures $\mu$, this would complete the proof. For the general case, we need another step, here done in 247A-247B; a uniformly integrable subset of $L^1$ effectively lives on a $\sigma$-finite part of the measure space, so that we can ignore the rest of the measure and suppose that we have a localizable measure space.
The conditions (ii)-(iv) of 246G make it plain that weak compactness in $L^1$ can be effectively discussed in terms of sequences; see also 246Yh. I should remark that this is a general feature of weak compactness in Banach spaces (2A5J). Of course the disjoint-sequence formulations in 246G are characteristic of $L^1$; I mean that while there are similar results applicable elsewhere (see Fremlin 74, chap. 8), the ideas are clearest and most dramatically expressed in their application to $L^1$.
Chapter 25
Product Measures
I come now to another chapter on 'pure' measure theory, discussing a fundamental construction, or, as you may prefer to consider it, two constructions, since the problems involved in forming the product of two arbitrary measure spaces (§251) are rather different from those arising in the product of arbitrarily many probability spaces (§254). This work is going to stretch our technique to the utmost, for while the fundamental theorems to which we are moving are natural aims, the proofs are lengthy and there are many pitfalls beside the true paths.
The central idea is that of 'repeated integration'. You have probably already seen formulae of the type $\iint f(x, y)\,dx\,dy$ used to calculate the integral of a function of two real variables over a region in the plane. One of the basic techniques of advanced calculus is reversing the order of integration; for instance, we expect $\int_0^1 \bigl(\int_y^1 f(x, y)\,dx\bigr)\,dy$ to be equal to $\int_0^1 \bigl(\int_0^x f(x, y)\,dy\bigr)\,dx$. As I have developed the subject, we already have a third calculation to compare with these two: $\int_D f$, where $D = \{(x, y) : 0 \le y \le x \le 1\}$ and the integral is taken with respect to Lebesgue measure on the plane. The first two sections of this chapter are devoted to an analysis of the relationship between one- and two-dimensional Lebesgue measure which makes these operations valid some of the time; part of the work has to be devoted to a careful description of the exact conditions which must be imposed on $f$ and $D$ if we are to be safe.
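As a purely numerical illustration of the two repeated integrals just described, here is a minimal Python sketch. It assumes the sample integrand $f(x, y) = xy$ and a simple midpoint rule; the integrand, the helper name `midpoint` and the step count are choices made for this example only, and nothing here replaces the conditions on $f$ and $D$ discussed in the text. For this $f$ both repeated integrals equal $1/8$.

```python
def midpoint(func, lo, hi, steps=400):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


def f(x, y):
    # Hypothetical integrand, chosen for illustration only.
    return x * y


# Integrate over x in [y, 1] first, then over y in [0, 1].
dx_then_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), y, 1.0), 0.0, 1.0)
# Integrate over y in [0, x] first, then over x in [0, 1].
dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, x), 0.0, 1.0)

print(dx_then_dy, dy_then_dx)  # both approximately 0.125
```

The two orders of integration sweep out the same triangle $D$, which is why the two approximations agree up to discretization error.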
Repeated integration, in one form or another, appears everywhere in measure theory, and it is therefore necessary sooner or later to develop the most general possible expression of the idea. The standard method is through the theory of products of general measure spaces. Given measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$, the aim is to find a measure $\lambda$ on $X \times Y$ which will, at least, give the right measure $\mu E \cdot \nu F$ to a 'rectangle' $E \times F$ where $E \in \Sigma$ and $F \in \mathrm{T}$. It turns out that there are already difficulties in deciding what 'the' product measure is, and to do the job properly I find I need, even at this stage, to describe two related but distinguishable constructions. These constructions and their elementary properties take up the whole of §251. In §252 I turn to integration over the product, with Fubini's and Tonelli's theorems relating $\int f\,d\lambda$ with $\iint f(x, y)\,\mu(dx)\,\nu(dy)$. Because the construction of $\lambda$ is symmetric between the two factors, this automatically provides theorems relating $\iint f(x, y)\,\mu(dx)\,\nu(dy)$ with $\iint f(x, y)\,\nu(dy)\,\mu(dx)$. §253 looks at the space $L^1(\lambda)$ and its relationship with $L^1(\mu)$ and $L^1(\nu)$.
For general measure spaces, there are obstacles in the way of forming an infinite product; to start with, if $\langle (X_n, \mu_n) \rangle_{n \in \mathbb{N}}$ is a sequence of measure spaces, then a product measure $\lambda$ on $X = \prod_{n \in \mathbb{N}} X_n$ ought to set $\lambda X = \prod_{n=0}^{\infty} \mu_n X_n$, and there is no guarantee that the product will converge, or behave well when it does. But for probability spaces, when $\mu_n X_n = 1$ for every $n$, this problem at least evaporates. It is possible to define the product of any family of probability spaces; this is the burden of §254.
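The obstacle described above can be seen in a few lines of arithmetic. The following Python sketch (the helper name `partial_products` and the sample total masses are assumptions made for illustration) computes the partial products $\prod_{k \le n} \mu_k X_k$ when every factor has total mass $2$, total mass $\frac12$, or total mass $1$.

```python
def partial_products(total_masses):
    """Running products mu_0 X_0 * ... * mu_n X_n for a finite list of total masses."""
    out, running = [], 1.0
    for mass in total_masses:
        running *= mass
        out.append(running)
    return out


n = 50
doubling = partial_products([2.0] * n)     # every factor has total mass 2
halving = partial_products([0.5] * n)      # every factor has total mass 1/2
probability = partial_products([1.0] * n)  # every factor is a probability space

print(doubling[-1], halving[-1], probability[-1])
```

In the first case the partial products grow without bound and in the second they tend to zero; only in the probability case does the candidate value of $\lambda X$ stay at $1$.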
I end the chapter with three sections which are a preparation for Chapters 27 and 28, but are also important in their own right as an investigation of the way in which the group structure of $\mathbb{R}^r$ interacts with Lebesgue and other measures. §255 deals with the 'convolution' $f * g$ of two functions, where $(f * g)(x) = \int f(y)\,g(x - y)\,dy$ (the integration being with respect to Lebesgue measure). In §257 I show that some of the same ideas, suitably transformed, can be used to describe a convolution $\nu_1 * \nu_2$ of two measures on $\mathbb{R}^r$; in preparation for this I include a section on Radon measures on $\mathbb{R}^r$ (§256).
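To make the convolution formula $(f * g)(x) = \int f(y)\,g(x - y)\,dy$ concrete, here is a rough numerical sketch in Python. It assumes $r = 1$, takes both $f$ and $g$ to be the indicator function of $[0, 1[$, and replaces the Lebesgue integral by a midpoint sum over a window containing the supports; the names `convolve_at` and `indicator` and the window $[-2, 3]$ are assumptions of the example. For this choice the convolution is the 'triangle' function, equal to $x$ on $[0, 1]$ and $2 - x$ on $[1, 2]$.

```python
def convolve_at(f, g, x, lo=-2.0, hi=3.0, steps=50_000):
    """Midpoint-sum approximation of (f * g)(x) = integral of f(y) g(x - y) dy.

    Assumes f vanishes outside [lo, hi]; for illustration only.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += f(y) * g(x - y)
    return h * total


def indicator(t):
    """Indicator function of the half-open interval [0, 1[."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


for x in (0.5, 1.0, 1.5, 2.5):
    print(x, convolve_at(indicator, indicator, x))
```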
251 Finite products
The first construction to set up is the product of a pair of measure spaces. It turns out that there are already substantial technical difficulties in the way of finding a canonical universally applicable method. I find myself therefore describing two related, but distinct, constructions, the 'primitive' and 'c.l.d.' product measures (251C, 251F). After listing the fundamental properties of the c.l.d. product measure (251I-251J), I work through the identification of the product of Lebesgue measure with itself (251N) and a fairly thorough discussion of subspaces (251O-251S).
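251A Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces. For $A \subseteq X \times Y$ set
\[ \theta A = \inf\Bigl\{ \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n : E_n \in \Sigma,\ F_n \in \mathrm{T}\ \forall\, n \in \mathbb{N},\ A \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n \Bigr\}. \]
Remark In the products $\mu E_n \cdot \nu F_n$, $0 \cdot \infty$ is to be taken as $0$, as in §135.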
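251B Lemma In the context of 251A, $\theta$ is an outer measure on $X \times Y$.
proof (a) Setting $E_n = F_n = \emptyset$ for every $n \in \mathbb{N}$, we see that $\theta\emptyset = 0$.
(b) If $A \subseteq B \subseteq X \times Y$, then whenever $B \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n$ we shall have $A \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n$; so $\theta A \le \theta B$.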
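(c) Let $\langle A_n \rangle_{n \in \mathbb{N}}$ be a sequence of subsets of $X \times Y$, with union $A$. For any $\epsilon > 0$, we may choose, for each $n \in \mathbb{N}$, sequences $\langle E_{nm} \rangle_{m \in \mathbb{N}}$ in $\Sigma$ and $\langle F_{nm} \rangle_{m \in \mathbb{N}}$ in $\mathrm{T}$ such that $A_n \subseteq \bigcup_{m \in \mathbb{N}} E_{nm} \times F_{nm}$ and $\sum_{m=0}^{\infty} \mu E_{nm} \cdot \nu F_{nm} \le \theta A_n + 2^{-n}\epsilon$. Because $\mathbb{N} \times \mathbb{N}$ is countable, we have a bijection $k \mapsto (n_k, m_k) : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$, and now
\[ A \subseteq \bigcup_{n, m \in \mathbb{N}} E_{nm} \times F_{nm} = \bigcup_{k \in \mathbb{N}} E_{n_k m_k} \times F_{n_k m_k}, \]
so that
\[ \theta A \le \sum_{k=0}^{\infty} \mu E_{n_k m_k} \cdot \nu F_{n_k m_k} = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \mu E_{nm} \cdot \nu F_{nm} \le \sum_{n=0}^{\infty} \bigl(\theta A_n + 2^{-n}\epsilon\bigr) = 2\epsilon + \sum_{n=0}^{\infty} \theta A_n. \]
As $\epsilon$ is arbitrary, $\theta A \le \sum_{n=0}^{\infty} \theta A_n$.
As $\langle A_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\theta$ is an outer measure.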
251C Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces. By the primitive product measure on $X \times Y$ I shall mean the measure $\lambda_0$ derived by Carathéodory's method (113C) from the outer measure $\theta$ defined in 251A.
Remark I ought to point out that there is no general agreement on what 'the' product measure on $X \times Y$ should be. Indeed in 251F below I will introduce an alternative one, and in the notes to this section I will mention a third.
251D Definition It is convenient to have a name for a natural construction for $\sigma$-algebras. If $X$ and $Y$ are sets with $\sigma$-algebras $\Sigma \subseteq \mathcal{P}X$ and $\mathrm{T} \subseteq \mathcal{P}Y$, I will write $\Sigma \widehat{\otimes} \mathrm{T}$ for the $\sigma$-algebra of subsets of $X \times Y$ generated by $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$.
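251E Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\Lambda$ its domain. Then $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ and $\lambda_0(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$ and $F \in \mathrm{T}$.
proof Throughout this proof, write $\Sigma^f = \{E : E \in \Sigma,\ \mu E < \infty\}$, $\mathrm{T}^f = \{F : F \in \mathrm{T},\ \nu F < \infty\}$.
(a) Suppose that $E \in \Sigma$ and $A \subseteq X \times Y$. For any $\epsilon > 0$, there are sequences $\langle E_n \rangle_{n \in \mathbb{N}}$ in $\Sigma$ and $\langle F_n \rangle_{n \in \mathbb{N}}$ in $\mathrm{T}$ such that $A \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now
\[ A \cap (E \times Y) \subseteq \bigcup_{n \in \mathbb{N}} (E_n \cap E) \times F_n, \qquad A \setminus (E \times Y) \subseteq \bigcup_{n \in \mathbb{N}} (E_n \setminus E) \times F_n, \]
so
\[ \theta(A \cap (E \times Y)) + \theta(A \setminus (E \times Y)) \le \sum_{n=0}^{\infty} \mu(E_n \cap E) \cdot \nu F_n + \sum_{n=0}^{\infty} \mu(E_n \setminus E) \cdot \nu F_n = \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon. \]
As $\epsilon$ is arbitrary, $\theta(A \cap (E \times Y)) + \theta(A \setminus (E \times Y)) \le \theta A$. And this is enough to ensure that $E \times Y \in \Lambda$ (see 113D).
(b) Similarly, $X \times F \in \Lambda$ for every $F \in \mathrm{T}$, so $E \times F = (E \times Y) \cap (X \times F) \in \Lambda$ for every $E \in \Sigma$, $F \in \mathrm{T}$.
Because $\Lambda$ is a $\sigma$-algebra, it must include the smallest $\sigma$-algebra containing all the products $E \times F$, that is, $\Lambda \supseteq \Sigma \widehat{\otimes} \mathrm{T}$.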
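(c) Take $E \in \Sigma$, $F \in \mathrm{T}$. We know that $E \times F \in \Lambda$; setting $E_0 = E$, $F_0 = F$, $E_n = F_n = \emptyset$ for $n \ge 1$ in the definition of $\theta$, we have
\[ \lambda_0(E \times F) = \theta(E \times F) \le \mu E \cdot \nu F. \]
We have come to the central idea of the construction. In fact $\theta(E \times F) = \mu E \cdot \nu F$. $\mathbf{P}$ Suppose that $E \times F \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n$ where $E_n \in \Sigma$ and $F_n \in \mathrm{T}$ for every $n$. Set $u = \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n$. If $u = \infty$ or $\mu E = 0$ or $\nu F = 0$ then of course $\mu E \cdot \nu F \le u$. Otherwise, set
\[ I = \{n : n \in \mathbb{N},\ \mu E_n = 0\}, \quad J = \{n : n \in \mathbb{N},\ \nu F_n = 0\}, \quad K = \mathbb{N} \setminus (I \cup J), \]
\[ E' = E \setminus \bigcup_{n \in I} E_n, \qquad F' = F \setminus \bigcup_{n \in J} F_n. \]
Then $\mu E' = \mu E$ and $\nu F' = \nu F$; $E' \times F' \subseteq \bigcup_{n \in K} E_n \times F_n$; and for $n \in K$, $\mu E_n < \infty$ and $\nu F_n < \infty$, since $\mu E_n \cdot \nu F_n \le u < \infty$ and neither $\mu E_n$ nor $\nu F_n$ is zero. Set $f_n = \nu F_n \chi E_n : X \to \mathbb{R}$ if $n \in K$, and $f_n = 0 : X \to \mathbb{R}$ if $n \in I \cup J$. Then $f_n$ is a simple function and $\int f_n = \nu F_n \cdot \mu E_n$ for $n \in K$, $0$ otherwise, so
\[ \sum_{n=0}^{\infty} \int f_n(x)\,\mu(dx) = \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le u. \]
By B.Levi's theorem (123A), applied to $\langle \sum_{k=0}^{n} f_k \rangle_{n \in \mathbb{N}}$, $g = \sum_{n=0}^{\infty} f_n$ is integrable and $\int g\,d\mu \le u$. Write $E''$ for $\{x : x \in E',\ g(x) < \infty\}$, so that $\mu E'' = \mu E' = \mu E$. Now take any $x \in E''$ and set $K_x = \{n : n \in K,\ x \in E_n\}$. Because $E' \times F' \subseteq \bigcup_{n \in K} E_n \times F_n$, $F' \subseteq \bigcup_{n \in K_x} F_n$ and
\[ \nu F = \nu F' \le \sum_{n \in K_x} \nu F_n = \sum_{n=0}^{\infty} f_n(x) = g(x). \]
Thus $g(x) \ge \nu F$ for every $x \in E''$. We are supposing that $0 < \mu E = \mu E''$ and $0 < \nu F$, so we must have $\nu F < \infty$, $\mu E'' < \infty$. Now $g \ge \nu F \chi E''$, so
\[ \mu E \cdot \nu F = \mu E'' \cdot \nu F = \int \nu F \chi E'' \le \int g \le u = \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n. \]
As $\langle E_n \rangle_{n \in \mathbb{N}}$, $\langle F_n \rangle_{n \in \mathbb{N}}$ are arbitrary, $\theta(E \times F) \ge \mu E \cdot \nu F$ and $\theta(E \times F) = \mu E \cdot \nu F$. $\mathbf{Q}$
Thus
\[ \lambda_0(E \times F) = \theta(E \times F) = \mu E \cdot \nu F \]
for all $E \in \Sigma$, $F \in \mathrm{T}$.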
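251F Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive product measure defined in 251C. By the c.l.d. product measure on $X \times Y$ I shall mean the function $\lambda : \operatorname{dom} \lambda_0 \to [0, \infty]$ defined by setting
\[ \lambda W = \sup\{\lambda_0(W \cap (E \times F)) : E \in \Sigma,\ F \in \mathrm{T},\ \mu E < \infty,\ \nu F < \infty\} \]
for $W \in \operatorname{dom} \lambda_0$.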
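251G Remark I had better show at once that $\lambda$ is a measure. $\mathbf{P}$ Of course its domain $\Lambda = \operatorname{dom} \lambda_0$ is a $\sigma$-algebra, and $\lambda\emptyset = \lambda_0\emptyset = 0$. If $\langle W_n \rangle_{n \in \mathbb{N}}$ is a disjoint sequence in $\Lambda$, then for any $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure
\[ \lambda_0\Bigl(\bigcup_{n \in \mathbb{N}} W_n \cap (E \times F)\Bigr) = \sum_{n=0}^{\infty} \lambda_0(W_n \cap (E \times F)) \le \sum_{n=0}^{\infty} \lambda W_n, \]
so $\lambda(\bigcup_{n \in \mathbb{N}} W_n) \le \sum_{n=0}^{\infty} \lambda W_n$. On the other hand, if $a < \sum_{n=0}^{\infty} \lambda W_n$, then we can find $m \in \mathbb{N}$ and $a_0, \ldots, a_m$ such that $a \le \sum_{n=0}^{m} a_n$ and $a_n < \lambda W_n$ for each $n \le m$; now there are $E_0, \ldots, E_m \in \Sigma$ and $F_0, \ldots, F_m \in \mathrm{T}$, all of finite measure, such that $a_n \le \lambda_0(W_n \cap (E_n \times F_n))$ for each $n$. Setting $E = \bigcup_{n \le m} E_n$ and $F = \bigcup_{n \le m} F_n$, we have $\mu E < \infty$ and $\nu F < \infty$, so
\[ \lambda\Bigl(\bigcup_{n \in \mathbb{N}} W_n\Bigr) \ge \lambda_0\Bigl(\bigcup_{n \in \mathbb{N}} W_n \cap (E \times F)\Bigr) = \sum_{n=0}^{\infty} \lambda_0(W_n \cap (E \times F)) \ge \sum_{n=0}^{m} \lambda_0(W_n \cap (E_n \times F_n)) \ge \sum_{n=0}^{m} a_n \ge a. \]
As $a$ is arbitrary, $\lambda(\bigcup_{n \in \mathbb{N}} W_n) \ge \sum_{n=0}^{\infty} \lambda W_n$ and $\lambda(\bigcup_{n \in \mathbb{N}} W_n) = \sum_{n=0}^{\infty} \lambda W_n$. As $\langle W_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $\lambda$ is a measure. $\mathbf{Q}$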
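251H We need a simple property of the measure $\lambda_0$.
Lemma Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\Lambda$ its domain. If $H \subseteq X \times Y$ and $H \cap (E \times F) \in \Lambda$ whenever $\mu E < \infty$ and $\nu F < \infty$, then $H \in \Lambda$.
proof Let $\theta$ be the outer measure described in 251A. Suppose that $A \subseteq X \times Y$ and $\theta A < \infty$. Let $\epsilon > 0$. Let $\langle E_n \rangle_{n \in \mathbb{N}}$, $\langle F_n \rangle_{n \in \mathbb{N}}$ be sequences in $\Sigma$, $\mathrm{T}$ respectively such that $A \subseteq \bigcup_{n \in \mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now, for each $n$, the product of the measures $\mu E_n$, $\nu F_n$ is finite, so either one is zero or both are finite. If $\mu E_n = 0$ or $\nu F_n = 0$ then of course
\[ \mu E_n \cdot \nu F_n = 0 = \theta((E_n \times F_n) \cap H) + \theta((E_n \times F_n) \setminus H). \]
If $\mu E_n < \infty$ and $\nu F_n < \infty$ then
\[ \mu E_n \cdot \nu F_n = \lambda_0(E_n \times F_n) = \lambda_0((E_n \times F_n) \cap H) + \lambda_0((E_n \times F_n) \setminus H) = \theta((E_n \times F_n) \cap H) + \theta((E_n \times F_n) \setminus H). \]
Accordingly, because $\theta$ is an outer measure,
\[ \theta(A \cap H) + \theta(A \setminus H) \le \sum_{n=0}^{\infty} \theta((E_n \times F_n) \cap H) + \sum_{n=0}^{\infty} \theta((E_n \times F_n) \setminus H) = \sum_{n=0}^{\infty} \mu E_n \cdot \nu F_n \le \theta A + \epsilon. \]
As $\epsilon$ is arbitrary, $\theta(A \cap H) + \theta(A \setminus H) \le \theta A$. As $A$ is arbitrary, $H \in \Lambda$.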
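251I Now for the fundamental properties of the c.l.d. product measure.
Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces; let $\lambda$ be the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Then
(a) $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ and $\lambda(E \times F) = \mu E \cdot \nu F$ whenever $E \in \Sigma$, $F \in \mathrm{T}$ and $\mu E \cdot \nu F < \infty$;
(b) for every $W \in \Lambda$ there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W$ and $\lambda V = \lambda W$;
(c) $(X \times Y, \Lambda, \lambda)$ is complete and locally determined, and in fact is the c.l.d. version of $(X \times Y, \Lambda, \lambda_0)$ as described in 213D-213E; in particular, $\lambda W = \lambda_0 W$ whenever $\lambda_0 W < \infty$;
(d) if $W \in \Lambda$ and $\lambda W > 0$ then there are $E \in \Sigma$, $F \in \mathrm{T}$ such that $\mu E < \infty$, $\nu F < \infty$ and $\lambda(W \cap (E \times F)) > 0$;
(e) if $W \in \Lambda$ and $\lambda W < \infty$, then for every $\epsilon > 0$ there are $E_0, \ldots, E_n \in \Sigma$, $F_0, \ldots, F_n \in \mathrm{T}$, all of finite measure, such that $\lambda(W \triangle \bigcup_{i \le n} (E_i \times F_i)) \le \epsilon$.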
proof Take to be the outer measure of 251A and
0
the primitive product measure of 251C. Set
f
= E : E
, E < and T
f
= F : F T, F < .
(a) By 251E,

T . If E and F T and E F < , either E F = 0 and (E F) =


0
(E F) = 0
or both E and F are nite and again (E F) =
0
(E F) = E F.
(b)(i) Take any a < W. Then there are E
f
, F T
f
such that
0
(W (E F)) > a (251F); now
((E F) W) =
0
((E F) W)
=
0
(E F)
0
(W (E F)) <
0
(E F) a.
Let E
n

nN
, F
n

nN
be sequences in , T respectively such that (EF) W

nN
E
n
F
n
and

n=0
E
n
F
n

0
(E F) a. Consider
V = (E F)

nN
E
n
F
n

T;
then V W, and
V =
0
V =
0
(E F)
0
((E F) V )

0
(E F)
0
(
_
nN
E
n
F
n
)
(because (E F) V

nN
E
n
F
n
)

0
(E F)

n=0
E
n
F
n
a
(by the choice of the E
n
, F
n
).
(ii) Thus for every a < W there is a V

T such that V W and V a. Now choose a sequence a


n

nN
strictly increasing to W, and for each a
n
a corresponding V
n
; then V =

nN
V
n
belongs to the -algebra

T, is
included in W, and has measure at least sup
nN
V
n
and at most W; so V = W, as required.
(c)(i) If H X Y is -negligible, there is a W such that H W and W = 0. If E , F T are of
nite measure,
0
(W (E F)) = 0; but
0
, being derived from the outer measure by Caratheodorys method, is
complete (212A), so H (E F) and
0
(H (E F)) = 0. Because E and F are arbitrary, H , by 251H. As
H is arbitrary, is complete.
(ii) If W and W = , then there must be E , F T such that E < , F < and
0
(W(EF)) >
0; now
0 < (W (E F)) E F < .
Thus is semi-nite.
(iii) If H X Y and H W whenever W < , then, in particular, H (E F) whenever E <
and F < ; by 251H again, H . Thus is locally determined.
(iv) If W and
0
W < , then we have sequences E
n

nN
in , F
n

nN
in T such that W

nN
(E
n
F
n
)
and

n=0
E
n
F
n
< . Set
I = n : E
n
= , J = n : F
n
= , K = N (I J);
then (

nI
F
n
) = (

nJ
E
n
) = 0, so
0
(W W

) = 0, where
W

= W

nK
(E
n
F
n
) W ((

nJ
E
n
Y ) (X

nI
F
n
)).
Now set E

n
=

iK,in
E
i
, F

n
=

iK,in
F
i
for each n. We have W

nN
W

(E

n
F

n
), so
W
0
W =
0
W

= lim
n

0
(W

(E

n
F

n
)) W

W,
and W =
0
W.
(v) Following the terminology of 213D, let us write

= W : W X Y, W V whenever V and
0
V < ,

W = sup
0
(W V ) : V ,
0
V < .
Because
0
(E F) < whenever E < and F < ,

and

= .
Now for any W we have

W = sup
0
(W V ) : V ,
0
V <
sup
0
(W (E F)) : E
f
, F T
f

= W
sup(W V ) : V ,
0
V <
= sup
0
(W V ) : V ,
0
V < ,
using (iv) just above, so that =

is the c.l.d. version of
0
.
(d) If W and W > 0, there are E
f
and F T
f
such that (W (E F)) =
0
(W (E F)) > 0.
(e) There are E
f
, F T
f
such that
0
(W (E F)) W
1
3
; set V
1
= W (E F); then
(W V
1
) = W V
1
= W
0
V
1

1
3
.
There are sequences E

nN
in , F

nN
in T such that V
1


nN
E

n
F

n
and

n=0
E

n
F

n

0
V
1
+
1
3
.
Replacing E

n
, F

n
by E

n
E, F

n
F if necessary, we may suppose that E

n

f
and F

n
T
f
for every n. Set
V
2
=

nN
E

n
F

n
; then
(V
2
V
1
)
0
(V
2
V
1
)

n=0
E

n
F

n

0
V
1

1
3
.
Let m N be such that

n=m+1
E

n
F

n

1
3
, and set
V =

m
n=0
E

n
F

n
.
Then
(V
2
V )

n=m+1
E

n
F

n

1
3
.
Putting these together, we have WV (W V
1
) (V
2
V
1
) (V
2
V ), so
(WV ) (W V
1
) +(V
2
V
1
) +(V
2
V )
1
3
+
1
3
+
1
3
= .
And V is of the required form.
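251J Proposition If $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ are semi-finite measure spaces and $\lambda$ is the c.l.d. product measure on $X \times Y$, then $\lambda(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$, $F \in \mathrm{T}$.
proof Setting $\Sigma^f = \{E : E \in \Sigma,\ \mu E < \infty\}$, $\mathrm{T}^f = \{F : F \in \mathrm{T},\ \nu F < \infty\}$, we have
\[ \lambda(E \times F) = \sup\{\lambda_0((E \cap E_0) \times (F \cap F_0)) : E_0 \in \Sigma^f,\ F_0 \in \mathrm{T}^f\} \]
\[ = \sup\{\mu(E \cap E_0) \cdot \nu(F \cap F_0) : E_0 \in \Sigma^f,\ F_0 \in \mathrm{T}^f\} \]
\[ = \sup\{\mu(E \cap E_0) : E_0 \in \Sigma^f\} \cdot \sup\{\nu(F \cap F_0) : F_0 \in \mathrm{T}^f\} = \mu E \cdot \nu F \]
(using 213A).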
251K $\sigma$-finite spaces Of course most of the measure spaces we shall apply these results to are $\sigma$-finite, and in this case there are some useful simplifications.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces. Then the c.l.d. product measure on $X \times Y$ is equal to the primitive product measure, and is the completion of its restriction to $\Sigma \widehat{\otimes} \mathrm{T}$; moreover, this common product measure is $\sigma$-finite.
proof Write
0
, for the primitive and c.l.d. product measures, as usual, and for their domain. Let E
n

nN
,
F
n

nN
be non-decreasing sequences of sets of nite measure covering X, Y respectively (see 211D).
(a) For each n N, (E
n
F
n
) = E
n
F
n
is nite, by 251Ia. Since X Y =

nN
E
n
F
n
, is -nite.
(b) For any W ,

0
W = lim
n

0
(W (E
n
F
n
)) = lim
n
(W (E
n
F
n
)) = W.
So =
0
.
(c) Write
B
for the restriction of =
0
to

T, and

B
for its completion.
(i) Suppose that W dom

B
. Then there are W

, W

T such that W

W W

and
B
(W

) = 0
(212C). In this case, (W

) = 0; as is complete, W and
W = W

=
B
W

B
W.
Thus extends

B
.
(ii) If W , then there is a V

T such that V W and (W V ) = 0. PPP For each n N there is a


V
n

T such that V
n
W (E
n
F
n
) and V
n
= (W (E
n
F
n
)) (251Ib). But as (E
n
F
n
) = E
n
F
n
is
nite, this means that (W (E
n
F
n
) V
n
) = 0. So if we set V =

nN
V
n
, we shall have V

T, V W and
W V =

nN
W (E
n
F
n
) V

nN
W (E
n
F
n
) V
n
is -negligible. QQQ
Similarly, there is a V

T such that V

(XY )W and (((XY )W)V

) = 0. Setting V

= (XY )V

,
V

T, W V

and (V

W) = 0. So

B
(V

V ) = (V

V ) = (V

W) +(W V ) = 0,
and W is measured by

B
, with

B
W =
B
V = W. As W is arbitrary,

B
= .
*251L The following result fits in naturally here; I star it because it will be used seldom (there is a more important version of the same idea in 254G) and the proof can be skipped until you come to need it.
Proposition Let $(X_1, \Sigma_1, \mu_1)$, $(X_2, \Sigma_2, \mu_2)$, $(Y_1, \mathrm{T}_1, \nu_1)$ and $(Y_2, \mathrm{T}_2, \nu_2)$ be $\sigma$-finite measure spaces; let $\lambda_1$, $\lambda_2$ be the product measures on $X_1 \times Y_1$, $X_2 \times Y_2$ respectively. Suppose that $f : X_1 \to X_2$ and $g : Y_1 \to Y_2$ are inverse-measure-preserving functions, and that $h(x, y) = (f(x), g(y))$ for $x \in X_1$, $y \in Y_1$. Then $h$ is inverse-measure-preserving.
proof Write
1
,
2
for the domains of
1
,
2
respectively.
(a) Suppose that E
2
and F T
2
have nite measure. Then
1
h
1
[W (E F)] is dened and equal to

2
(W (E F)) for every W
2
. PPP

1
h
1
[E F] =
1
(f
1
[E] g
1
[F]) =
1
f
1
[E]
1
g
1
[F]
=
2
E
2
F =
2
(E F)
by 251E/251J. QQQ
(b) Take E
0

2
and F
0
T
2
of nite measure. Let

1
,

2
be the subspace measures on f
1
[E
0
] g
1
[F
0
] and
E
0
F
0
respectively. Set

h = hf
1
[E
0
] g
1
[F
0
], and write for the identity map from E
0
F
0
to X
2
Y
2
; let
=

h
1
and

1
be the image measures on X
2
Y
2
. Then (a) tells us that
(E F) =
1
(h
1
[(E E
0
) (F F
0
)])
=
2
((E E
0
) (F F
0
)) =

(E F)
whenever E
2
and F T
2
. By the Monotone Class Theorem (136C), and

agree on
2

T
2
, that is,
1
(h
1
[W
(E
0
F
0
)]) =
2
(W (E
0
F
0
)) for every W
2

T
2
.
If W is any member of
2
, there are W

, W

T
2
such that W

W W

and
2
(W

) = 0 (251K).
Now we must have
h
1
[W

(E
0
F
0
)] h
1
[W (E
0
F
0
)] h
1
[W

(E
0
F
0
)],

1
(h
1
[W

(E
0
F
0
)] h
1
[W

(E
0
F
0
)]) =
2
((W

) (E
0
F
0
)) = 0;
because
1
is complete,
1
h
1
[W (E
0
F
0
)] is dened and equal to

1
h
1
[W

(E
0
F
0
)] =
2
(W

(E
0
F
0
)) =
2
(W (E
0
F
0
)).
(c) Now suppose that E
n

nN
, F
n

nN
are non-decreasing sequences of sets of nite measure with union X
2
, Y
2
respectively. If W
2
,

1
h
1
[W] = sup
nN

1
h
1
[W (E
n
F
n
)] = sup
nN

2
(W (E
n
F
n
)) =
2
W.
So h is inverse-measure-preserving, as claimed.
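251M It is time that I gave some examples. Of course the central example is Lebesgue measure. In this case we have the only reasonable result. I pause to describe the leading example of the product $\Sigma \widehat{\otimes} \mathrm{T}$ introduced in 251D.
Proposition Let $r, s \ge 1$ be integers. Then we have a natural bijection $\phi : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}^{r+s}$, defined by setting
\[ \phi((\xi_1, \ldots, \xi_r), (\eta_1, \ldots, \eta_s)) = (\xi_1, \ldots, \xi_r, \eta_1, \ldots, \eta_s) \]
for $\xi_1, \ldots, \xi_r, \eta_1, \ldots, \eta_s \in \mathbb{R}$. If we write $\mathcal{B}_r$, $\mathcal{B}_s$ and $\mathcal{B}_{r+s}$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$ and $\mathbb{R}^{r+s}$ respectively, then $\phi$ identifies $\mathcal{B}_{r+s}$ with $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$.
proof (a) Write $\mathcal{B}$ for the $\sigma$-algebra $\{\phi^{-1}[W] : W \in \mathcal{B}_{r+s}\}$ copied onto $\mathbb{R}^r \times \mathbb{R}^s$ by the bijection $\phi$; we are seeking to prove that $\mathcal{B} = \mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$. We have maps $\pi_1 : \mathbb{R}^{r+s} \to \mathbb{R}^r$, $\pi_2 : \mathbb{R}^{r+s} \to \mathbb{R}^s$ defined by setting $\pi_1(\phi(x, y)) = x$, $\pi_2(\phi(x, y)) = y$. Each co-ordinate of $\pi_1$ is continuous, therefore Borel measurable (121Db), so $\pi_1^{-1}[E] \in \mathcal{B}_{r+s}$ for every $E \in \mathcal{B}_r$, by 121K. Similarly, $\pi_2^{-1}[F] \in \mathcal{B}_{r+s}$ for every $F \in \mathcal{B}_s$. So $\phi[E \times F] = \pi_1^{-1}[E] \cap \pi_2^{-1}[F]$ belongs to $\mathcal{B}_{r+s}$, that is, $E \times F \in \mathcal{B}$, whenever $E \in \mathcal{B}_r$ and $F \in \mathcal{B}_s$. Because $\mathcal{B}$ is a $\sigma$-algebra, $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s \subseteq \mathcal{B}$.
(b) Now examine sets of the form
\[ \{(x, y) : \xi_i \le \alpha\} = \{x : \xi_i \le \alpha\} \times \mathbb{R}^s, \qquad \{(x, y) : \eta_j \le \alpha\} = \mathbb{R}^r \times \{y : \eta_j \le \alpha\} \]
for $\alpha \in \mathbb{R}$, $i \le r$ and $j \le s$, taking $x = (\xi_1, \ldots, \xi_r)$ and $y = (\eta_1, \ldots, \eta_s)$. All of these belong to $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$. But the $\sigma$-algebra they generate is just $\mathcal{B}$, by 121J. So $\mathcal{B} \subseteq \mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$ and $\mathcal{B} = \mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$.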
251N Theorem Let $r, s \ge 1$ be integers. Then the bijection $\phi : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}^{r+s}$ described in 251M identifies Lebesgue measure on $\mathbb{R}^{r+s}$ with the c.l.d. product $\lambda$ of Lebesgue measure on $\mathbb{R}^r$ and Lebesgue measure on $\mathbb{R}^s$.
proof Write
r
,
s
,
r+s
for the three versions of Lebesgue measure,

r
,

s
and

r+s
for the corresponding outer
measures, and for the outer measure on R
r
R
s
derived from
r
and
s
by the formula of 251A.
(a) If I R
r
and J R
s
are half-open intervals, then [I J] R
r+s
is also a half-open interval, and

r+s
([I J]) =
r
I
s
J;
this is immediate from the denition of the Lebesgue measure of an interval. (I speak of half-open intervals here, that
is, intervals of the form

1jr
[
j
,
j
[, because I used them in the denition of Lebesgue measure in 115. If you
prefer to work with open intervals or closed intervals it makes no dierence.) Note also that every half-open interval
in R
r+s
is expressible as [I J] for suitable I, J.
(b) For any A R
r+s
, (
1
[A])

r+s
(A). PPP For any > 0, there is a sequence K
n

nN
of half-open intervals
in R
r+s
such that A

nN
K
n
and

n=0

r+s
(K
n
)

r+s
(A) +. Express each K
n
as [I
n
J
n
], where I
n
and J
n
are half-open intervals in R
r
and R
s
respectively; then
1
[A]

nN
I
n
J
n
, so that
(
1
[A])

n=0

r
I
n

s
J
n
=

n=0

r+s
(K
n
)

r+s
(A) +.
As is arbitrary, we have the result. QQQ
(c) If E R
r
and F R
s
are measurable, then

r+s
([E F])
r
E
s
F.
PPP (i) Consider rst the case
r
E < ,
s
F < . In this case, given > 0, there are sequences I
n

nN
, J
n

nN
of half-open intervals such that E

nN
I
n
, F

nN
F
n
,

n=0

r
I
n

r
E + =
r
E +,

n=0

s
J
n

s
F + =
s
F +.
Accordingly E F

m,nN
I
m
J
n
and [E F]

m,nN
[I
m
J
n
], so that

r+s
([E F])

m,n=0

r+s
([I
m
J
n
]) =

m,n=0

r
I
m

s
J
n
=

m=0

r
I
m

n=0

s
J
n
(
r
E +)(
s
F +).
As is arbitrary, we have the result.
(ii) Next, if
r
E = 0, there is a sequence F
n

nN
of sets of nite measure covering R
s
F, so that

r+s
([E F])

n=0

r+s
([E F
n
])

n=0

r
E
s
F
n
= 0 =
r
E
s
F.
Similarly,

r+s
([E F])
r
E
s
F if
s
F = 0. The only remaining case is that in which both of
r
E,
s
F are
strictly positive and one is innite; but in this case
r
E
s
F = , so surely

r+s
([E F])
r
E
s
F. QQQ
(d) If A R
r+s
, then

r+s
(A) (
1
[A]). PPP Given > 0, there are sequences E
n

nN
, F
n

nN
of measurable
sets in R
r
, R
s
respectively such that
1
[A]

nN
E
n
F
n
and

n=0

r
E
n

s
F
n
(
1
[A]) + . Now A

nN
[E
n
F
n
], so

r+s
(A)

n=0

r+s
([E
n
F
n
])

n=0

r
E
n

s
F
n
(
1
[A]) +.
As is arbitrary, we have the result. QQQ
(e) Putting (c) and (d) together, we have (
1
[A]) =

r+s
(A) for every A R
r+s
. Thus on R
r
R
s
corresponds
exactly to

r+s
on R
r+s
. So the associated measures
0
,
r+s
must correspond in the same way, writing
0
for the
primitive product measure. But 251K tells us that
0
= , so we have the result.
251O In fact, a large proportion of the applications of the constructions here are to subspaces of Euclidean space, rather than to the whole product $\mathbb{R}^r \times \mathbb{R}^s$. It would not have been especially difficult to write 251N out to deal with arbitrary subspaces, but I prefer to give a more general description of the product of subspace measures, as I feel that it illuminates the method. I start with a straightforward result on strictly localizable spaces.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be strictly localizable measure spaces. Then the c.l.d. product measure on $X \times Y$ is strictly localizable; moreover, if $\langle X_i \rangle_{i \in I}$ and $\langle Y_j \rangle_{j \in J}$ are decompositions of $X$ and $Y$ respectively, $\langle X_i \times Y_j \rangle_{(i,j) \in I \times J}$ is a decomposition of $X \times Y$.
proof Let X
i

iI
and Y
j

jJ
be decompositions of X, Y respectively. Then X
i
Y
j

(i,j)IJ
is a partition of XY
into measurable sets of nite measure. If W X Y and W > 0, there are sets E , F T such that E < ,
F < and (W (E F)) > 0. We know that E =

iI
(E X
i
) and F =

jJ
(F Y
j
), so there must be
nite sets I
0
I, J
0
J such that
E F (

iI
0
(E X
i
))(

jJ
0
(F Y
j
)) < (W (E F)).
Setting E

iI
0
X
i
and F

jJ
0
Y
j
we have
((E F) (E

)) = (E F) ((E E

) (F F

)) < (W (E F)),
so that (W (E

)) > 0. There must therefore be some i I


0
, j J
0
such that (W (X
i
Y
j
)) > 0.
This shows that X
i
Y
j
: i I, j J satises the criterion of 213O, so that , being complete and locally
determined, must be strictly localizable. Because X
i
Y
j

(i,j)IJ
covers X Y , it is actually a decomposition of
X Y (213Ob).
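251P Lemma Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Let $\lambda^*$ be the corresponding outer measure (132B). Then
\[ \lambda^* C = \sup\{\theta(C \cap (E \times F)) : E \in \Sigma,\ F \in \mathrm{T},\ \mu E < \infty,\ \nu F < \infty\} \]
for every $C \subseteq X \times Y$, where $\theta$ is the outer measure of 251A.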
proof Write for the domain of ,
f
for E : E , E < , T
f
for F : F T, F < ; set u =
sup(C (E F)) : E
f
, F T
f
.
(a) If C W , E
f
and F T
f
, then
(C (E F)) (W (E F)) =
0
(W (E F))
(where
0
is the primitive product measure)
W.
As E and F are arbitrary, u W; as W is arbitrary, u

C.
(b) If u = , then of course

C = u. Otherwise, let E
n

nN
, F
n

nN
be sequences in
f
, T
f
respectively such
that
u = sup
nN
(C (E
n
F
n
)).
Consider C

= C (

nN
E
n

nN
F
n
). If E
f
and F T
f
, then for every n N we have
u (C ((E E
n
) (F F
n
)))
= (C ((E E
n
) (F F
n
)) (E
n
F
n
))
+(C ((E E
n
) (F F
n
)) (E
n
F
n
))
(because E
n
F
n
, by 251E)
(C (E
n
F
n
)) +(C

(E F)).
Taking the supremum of the right-hand expression as n varies, we have u u +(C

(E F)) so
(C

(E F)) = (C

(E F)) = 0.
As E and F are arbitrary, C

= 0.
But this means that

(C (
_
nN
E
n

_
nN
F
n
)) +

= lim
n

(C (
_
in
E
i

_
in
F
i
))
(using 132Ae)
u,
as required.
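251Q Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $A \subseteq X$, $B \subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures on $A$, $B$ respectively. Let $\lambda$ be the c.l.d. product measure on $X \times Y$, and $\lambda^{\#}$ the subspace measure it induces on $A \times B$. Let $\tilde{\lambda}$ be the c.l.d. product measure of $\mu_A$ and $\nu_B$ on $A \times B$. Then
(i) $\tilde{\lambda}$ extends $\lambda^{\#}$.
(ii) If
either ($\alpha$) $A \in \Sigma$ and $B \in \mathrm{T}$
or ($\beta$) $A$ and $B$ can both be covered by sequences of sets of finite measure
or ($\gamma$) $\mu$ and $\nu$ are both strictly localizable,
then $\tilde{\lambda} = \lambda^{\#}$.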
proof Let be the outer measure on XY dened from and by the formula of 251A, and

the outer measure on
AB similarly dened from
A
and
B
. Write for the domain of ,

for the domain of

, and
#
= W (AB) :
W for the domain of
#
. Set
f
= E : E < , T
f
= F : F < .
(a) The rst point to observe is that

C = C for every C A B. PPP (i) If E
n

nN
and F
n

nN
are sequences
in , T respectively such that C

nN
E
n
F
n
, then
C = C (AB)

nN
(E
n
A) (F
n
B),
so

n=0

A
(E
n
A)
B
(F
n
B)
=

n=0

(E
n
A)

(F
n
B)

n=0
E
n
F
n
.
As E
n

nN
and F
n

nN
are arbitrary,

C C. (ii) If

E
n

nN
,

F
n

nN
are sequences in
A
= dom
A
, T
B
= dom
B
respectively such that C

nN

E
n


F
n
, then for each n N we can choose E
n
, F
n
T such that

E
n
E
n
, E
n
=


E
n
=
A

E
n
,

F
n
F
n
, F
n
=


F
n
=
B

F
n
,
and now
C

n=0
E
n
F
n
=

n=0

A

E
n

B

F
n
.
As

E
n

nN
,

F
n

nN
are arbitrary, C

C. QQQ
(b) It follows that
#


. PPP Suppose that V
#
and that C AB. In this case there is a W such that
V = W (AB). So

(C V ) +

(C V ) = (C W) +(C W) = C =

C.
As C is arbitrary, V

. QQQ
Accordingly, for V
#
,

#
V =

V = sup(V (E F)) : E
f
, F T
f

= sup(V (

E

F)) :

E
A
,

F T
B
,
A

E < ,
B

F <
= sup

(V (

E

F)) :

E
A
,

F T
B
,
A

E < ,
B

F < =

V,
using 251P twice.
This proves part (i) of the proposition.
(c) The next thing to check is that if V

and V E F where E
f
and F T
f
, then V
#
. PPP Let
W E F be a measurable envelope of V with respect to (132Ee). Then
(W (AB) V ) =

(W (AB) V ) =

(W (AB) V )
(because W (AB)
#


, V

)
=

(W (AB))

V =

(W (AB))

V
= (W (AB)) V =

(W (AB))

V
W

V = 0.
But this means that W

= W (AB) V and V = (AB) (W W

) belongs to
#
. QQQ
(d) Now x any V

, and look at the conditions ()-() of part (ii) of the proposition.
() If A and B T, and C X Y , then AB (251E), so
(C V ) +(C V ) = (C V ) +((C V ) (AB)) +((C V ) (AB))
=

(C V ) +

(C (AB) V ) +(C (AB))
=

(C (AB)) +(C (AB))
= (C (AB)) +(C (AB)) = C.
As C is arbitrary, V , so V = V (AB) belongs to
#
.
() If A

nN
E
n
and B

nN
F
n
where all the E
n
, F
n
are of nite measure, then V =

m,nN
V (E
m
F
n
)

#
, by (c).
() If X
i

iI
, Y
j

jJ
are decompositions of X, Y respectively, then for each i I, j J we have V (X
i
Y
j
)

#
, that is, there is a W
ij
such that V (X
i
Y
j
) = W
ij
(A B). Now X
i
Y
j

(i,j)IJ
is a decomposition
of X Y for (251O), so that
W =

iI,jJ
W
ij
(X
i
Y
j
) ,
and V = W (AB)
#
.
(e) Thus any of the three conditions is sucient to ensure that

=
#
, in which case (a) tells us that

=
#
.
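251R Corollary Let $r, s \ge 1$ be integers, and $\phi : \mathbb{R}^r \times \mathbb{R}^s \to \mathbb{R}^{r+s}$ the natural bijection. If $A \subseteq \mathbb{R}^r$ and $B \subseteq \mathbb{R}^s$, then the restriction of $\phi$ to $A \times B$ identifies the product of Lebesgue measure on $A$ and Lebesgue measure on $B$ with Lebesgue measure on $\phi[A \times B] \subseteq \mathbb{R}^{r+s}$.
Remark Note that by 'Lebesgue measure on $A$' I mean the subspace measure $\mu_{rA}$ on $A$ induced by $r$-dimensional Lebesgue measure $\mu_r$ on $\mathbb{R}^r$, whether or not $A$ is itself a measurable set.
proof By 251Q, using either of the conditions (ii-$\beta$) or (ii-$\gamma$), the product measure $\tilde{\lambda}$ on $A \times B$ is just the subspace measure $\lambda^{\#}$ on $A \times B$ induced by the product measure $\lambda$ on $\mathbb{R}^r \times \mathbb{R}^s$. But by 251N we know that $\phi$ is an isomorphism between $(\mathbb{R}^r \times \mathbb{R}^s, \lambda)$ and $(\mathbb{R}^{r+s}, \mu_{r+s})$; so it must also identify $\tilde{\lambda}$ with the subspace measure on $\phi[A \times B]$.
251S Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. If $A \subseteq X$ and $B \subseteq Y$ can be covered by sequences of sets of finite measure, then $\lambda^*(A \times B) = \mu^* A \cdot \nu^* B$.
proof In the language of 251Q,
\[ \lambda^*(A \times B) = \lambda^{\#}(A \times B) = \tilde{\lambda}(A \times B) = \mu_A A \cdot \nu_B B \]
(by 251K and 251E)
\[ = \mu^* A \cdot \nu^* B. \]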
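251T The next proposition gives an idea of how the technical definitions here fit together.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces. Write $(X, \hat{\Sigma}, \hat{\mu})$ and $(X, \tilde{\Sigma}, \tilde{\mu})$ for the completion and c.l.d. version of $(X, \Sigma, \mu)$ (212C, 213E). Let $\lambda$, $\hat{\lambda}$ and $\tilde{\lambda}$ be the three c.l.d. product measures on $X \times Y$ obtained from the pairs $(\mu, \nu)$, $(\hat{\mu}, \nu)$ and $(\tilde{\mu}, \nu)$ of factor measures. Then $\lambda = \hat{\lambda} = \tilde{\lambda}$.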
proof Write ,

and

for the domains of ,

,

respectively; and ,

,

for the outer measures on XY obtained
by the formula of 251A from the three pairs of factor measures.
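(a) If $E\in\Sigma$ and $\mu E<\infty$, then $\theta$, $\hat\theta$ and $\tilde\theta$ agree on subsets of $E\times Y$. $\mathbf{P}$ Take $A\subseteq E\times Y$ and $\epsilon>0$.
(i) There are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}E_n\times F_n$ and $\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon$. Now $\tilde\mu E_n\le\mu E_n$ for every $n$ (213Fb), so
$$\tilde\theta A\le\sum_{n=0}^{\infty}\tilde\mu E_n\cdot\nu F_n\le\sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
(ii) There are sequences $\langle\hat E_n\rangle_{n\in\mathbb{N}}$ in $\hat\Sigma$, $\langle\hat F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}\hat E_n\times\hat F_n$ and $\sum_{n=0}^{\infty}\hat\mu\hat E_n\cdot\nu\hat F_n\le\hat\theta A+\epsilon$. Now for each $n$ there is an $E'_n\in\Sigma$ such that $\hat E_n\subseteq E'_n$ and $\mu E'_n=\hat\mu\hat E_n$, so that
$$\theta A\le\sum_{n=0}^{\infty}\mu E'_n\cdot\nu\hat F_n=\sum_{n=0}^{\infty}\hat\mu\hat E_n\cdot\nu\hat F_n\le\hat\theta A+\epsilon.$$
(iii) There are sequences $\langle\tilde E_n\rangle_{n\in\mathbb{N}}$ in $\tilde\Sigma$, $\langle\tilde F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}\tilde E_n\times\tilde F_n$ and $\sum_{n=0}^{\infty}\tilde\mu\tilde E_n\cdot\nu\tilde F_n\le\tilde\theta A+\epsilon$. Now for each $n$, $\tilde E_n\cap E\in\hat\Sigma$, so
$$\hat\theta A\le\sum_{n=0}^{\infty}\hat\mu(\tilde E_n\cap E)\cdot\nu\tilde F_n\le\sum_{n=0}^{\infty}\tilde\mu\tilde E_n\cdot\nu\tilde F_n\le\tilde\theta A+\epsilon.$$
(iv) Since $A$ and $\epsilon$ are arbitrary, $\theta=\hat\theta=\tilde\theta$ on $\mathcal{P}(E\times Y)$. $\mathbf{Q}$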
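(b) Consequently, the outer measures $\lambda^*$, $\hat\lambda^*$ and $\tilde\lambda^*$ are identical. $\mathbf{P}$ Use 251P. Take $A\subseteq X\times Y$, $E\in\Sigma$, $\hat E\in\hat\Sigma$, $\tilde E\in\tilde\Sigma$, $F\in\mathrm{T}$ such that $\mu E$, $\hat\mu\hat E$, $\tilde\mu\tilde E$ and $\nu F$ are all finite. Then
(i) $\theta(A\cap(E\times F))=\hat\theta(A\cap(E\times F))\le\hat\lambda^*A$ and $\theta(A\cap(E\times F))=\tilde\theta(A\cap(E\times F))\le\tilde\lambda^*A$, because $\hat\mu E$ and $\tilde\mu E$ are both finite.
(ii) There is an $E'\in\Sigma$ such that $\hat E\subseteq E'$ and $\mu E'<\infty$, so that
$$\hat\theta(A\cap(\hat E\times F))\le\hat\theta(A\cap(E'\times F))=\theta(A\cap(E'\times F))\le\lambda^*A.$$
(iii) There is an $E'\in\Sigma$ such that $E'\subseteq\tilde E$ and $\tilde\mu(\tilde E\setminus E')=0$ (213Fc), so that $\tilde\theta((\tilde E\setminus E')\times Y)=0$ and $\mu E'<\infty$; accordingly
$$\tilde\theta(A\cap(\tilde E\times F))=\tilde\theta(A\cap(E'\times F))=\theta(A\cap(E'\times F))\le\lambda^*A.$$
(iv) Taking the suprema over $E$, $\hat E$, $\tilde E$ and $F$, we get
$$\lambda^*A\le\hat\lambda^*A,\quad\lambda^*A\le\tilde\lambda^*A,\quad\hat\lambda^*A\le\lambda^*A,\quad\tilde\lambda^*A\le\lambda^*A.$$
As $A$ is arbitrary, $\lambda^*=\hat\lambda^*=\tilde\lambda^*$. $\mathbf{Q}$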
(c) Now $\lambda$, $\hat\lambda$ and $\tilde\lambda$ are all complete and locally determined, so by 213C are the measures defined by Carathéodory's method from their own outer measures, and are therefore identical.
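251U It is 'obvious', and an easy consequence of theorems so far proved, that the set $\{(x,x):x\in\mathbb{R}\}$ is negligible for Lebesgue measure on $\mathbb{R}^2$. The corresponding result is true in the square of any atomless measure space.
Proposition Let $(X,\Sigma,\mu)$ be an atomless measure space, and let $\lambda$ be the c.l.d. product measure on $X\times X$. Then $\Delta=\{(x,x):x\in X\}$ is $\lambda$-negligible.
proof Let $E,F\in\Sigma$ be sets of finite measure, and $n\in\mathbb{N}$. Applying 215D repeatedly, we can find a disjoint family $\langle F_i\rangle_{i<n}$ of measurable subsets of $F$ such that $\mu F_i=\frac{\mu F}{n+1}$ for each $i$; setting $F_n=F\setminus\bigcup_{i<n}F_i$, we also have $\mu F_n=\frac{\mu F}{n+1}$. Now
$$\Delta\cap(E\times F)\subseteq\bigcup_{i\le n}(E\cap F_i)\times F_i,$$
so
$$\lambda^*(\Delta\cap(E\times F))\le\sum_{i=0}^{n}\mu(E\cap F_i)\cdot\mu F_i=\frac{\mu F}{n+1}\sum_{i=0}^{n}\mu(E\cap F_i)\le\frac{1}{n+1}\,\mu E\cdot\mu F.$$
As $n$ is arbitrary, $\lambda(\Delta\cap(E\times F))=0$; as $E$ and $F$ are arbitrary, $\lambda\Delta=0$.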
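The covering estimate in this proof can be checked numerically in the simplest case. The following is only an illustrative sketch, assuming $X=[0,1[$ with Lebesgue measure, $E=F=X$ and the pieces $F_i=[\frac{i}{n+1},\frac{i+1}{n+1}[$; the function names are hypothetical and chosen for this example. It computes the total area of the squares $F_i\times F_i$ and checks that sample diagonal points lie in their union.

```python
from fractions import Fraction


def diagonal_cover_area(n: int) -> Fraction:
    """Total area of the n+1 squares F_i x F_i, where F_i = [i/(n+1), (i+1)/(n+1)[."""
    side = Fraction(1, n + 1)  # Lebesgue measure of each piece F_i
    return sum((side * side for _ in range(n + 1)), Fraction(0))


def covers_diagonal_point(x: Fraction, n: int) -> bool:
    """Check that (x, x) lies in some square F_i x F_i, for x in [0, 1[."""
    i = int(x * (n + 1))  # index of the piece F_i that should contain x
    return Fraction(i, n + 1) <= x < Fraction(i + 1, n + 1)
```

With $n=9$ the ten squares have total area $\frac{1}{10}$, which is the bound $\frac{1}{n+1}\mu E\cdot\mu F$ of the proof; letting $n$ grow drives the bound to $0$.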
*251W Products of more than two spaces The whole of this section can be repeated for arbitrary finite products. The labour is substantial but no new ideas are required. By the time we need the general construction in any formal way, it should come very naturally, and I do not think it is necessary to work through the next couple of pages before proceeding, especially as products of probability spaces are dealt with in §254. However, for completeness, and to help locate results when applications do appear, I list them here. They do of course constitute a very instructive set of exercises. The most important fragments are repeated in 251Xe-251Xf.
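Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a finite family of measure spaces, and set $X=\prod_{i\in I}X_i$. Write $\Sigma_i^f=\{E:E\in\Sigma_i,\ \mu_iE<\infty\}$ for each $i\in I$.
(a) For $A\subseteq X$ set
$$\theta A=\inf\Bigl\{\sum_{n=0}^{\infty}\prod_{i\in I}\mu_iE_{ni}:E_{ni}\in\Sigma_i\ \forall\,i\in I,\ n\in\mathbb{N},\ A\subseteq\bigcup_{n\in\mathbb{N}}\prod_{i\in I}E_{ni}\Bigr\}.$$
Then $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure on $X$ derived by Carathéodory's method from $\theta$, and $\Lambda$ its domain.
(b) If $\langle X_i\rangle_{i\in I}$ is a finite family of sets and $\Sigma_i$ is a $\sigma$-algebra of subsets of $X_i$ for each $i\in I$, then $\widehat{\bigotimes}_{i\in I}\Sigma_i$ is the $\sigma$-algebra of subsets of $X=\prod_{i\in I}X_i$ generated by $\{\prod_{i\in I}E_i:E_i\in\Sigma_i$ for every $i\in I\}$. (For the corresponding construction when $I$ is infinite, see 254E.)
(c) $\lambda_0(\prod_{i\in I}E_i)$ is defined and equal to $\prod_{i\in I}\mu_iE_i$ whenever $E_i\in\Sigma_i$ for each $i\in I$.
(d) The c.l.d. product measure on $X$ is the measure $\lambda$ defined by setting
$$\lambda W=\sup\Bigl\{\lambda_0\bigl(W\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i^f\text{ for each }i\in I\Bigr\}$$
for $W\in\Lambda$. If $I$ is empty, so that $X=\{\emptyset\}$, then the appropriate convention is to set $\lambda X=1$.
(e) If $H\subseteq X$, then $H\in\Lambda$ iff $H\cap\prod_{i\in I}E_i\in\Lambda$ whenever $E_i\in\Sigma_i^f$ for each $i\in I$.
(f)(i) $\widehat{\bigotimes}_{i\in I}\Sigma_i\subseteq\Lambda$ and $\lambda(\prod_{i\in I}E_i)=\prod_{i\in I}\mu_iE_i$ whenever $E_i\in\Sigma_i^f$ for each $i$.
(ii) For every $W\in\Lambda$ there is a $V\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $V\subseteq W$ and $\lambda V=\lambda W$.
(iii) $\lambda$ is complete and locally determined, and is the c.l.d. version of $\lambda_0$.
(iv) If $W\in\Lambda$ and $\lambda W>0$ then there are $E_i\in\Sigma_i^f$, for $i\in I$, such that $\lambda(W\cap\prod_{i\in I}E_i)>0$.
(v) If $W\in\Lambda$ and $\lambda W<\infty$, then for every $\epsilon>0$ there are $n\in\mathbb{N}$ and $E_{0i},\ldots,E_{ni}\in\Sigma_i^f$, for each $i\in I$, such that $\lambda\bigl(W\triangle\bigcup_{k\le n}\prod_{i\in I}E_{ki}\bigr)\le\epsilon$.
(g) If each $\mu_i$ is $\sigma$-finite, so is $\lambda$, and $\lambda=\lambda_0$ is the completion of its restriction to $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(h) If $\langle I_j\rangle_{j\in J}$ is any partition of $I$, then $\lambda$ can be identified with the c.l.d. product of $\langle\lambda_j\rangle_{j\in J}$, where $\lambda_j$ is the c.l.d. product of $\langle\mu_i\rangle_{i\in I_j}$. (See the arguments in 251N and also in 254N below.)
(i) If $I=\{1,\ldots,n\}$ and each $\mu_i$ is Lebesgue measure on $\mathbb{R}$, then $\lambda$ can be identified with Lebesgue measure on $\mathbb{R}^n$.
(j) If, for each $i\in I$, we have a decomposition $\langle X_{ij}\rangle_{j\in J_i}$ of $X_i$, then $\langle\prod_{i\in I}X_{i,f(i)}\rangle_{f\in\prod_{i\in I}J_i}$ is a decomposition of $X$.
(k) For any $C\subseteq X$,
$$\lambda^*C=\sup\Bigl\{\theta\bigl(C\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i^f\text{ for every }i\in I\Bigr\}.$$
(l) Suppose that $A_i\subseteq X_i$ for each $i\in I$. Write $\lambda^{\#}$ for the subspace measure on $A=\prod_{i\in I}A_i$, and $\tilde\lambda$ for the c.l.d. product of the subspace measures on the $A_i$. Then $\tilde\lambda$ extends $\lambda^{\#}$, and if
either $A_i\in\Sigma_i$ for every $i$
or every $A_i$ can be covered by a sequence of sets of finite measure
or every $\mu_i$ is strictly localizable,
then $\tilde\lambda=\lambda^{\#}$.
(m) If $A_i\subseteq X_i$ can be covered by a sequence of sets of finite measure for each $i\in I$, then $\lambda^*(\prod_{i\in I}A_i)=\prod_{i\in I}\mu_i^*A_i$.
(n) Writing $\hat\mu_i$, $\tilde\mu_i$ for the completion and c.l.d. version of each $\mu_i$, $\lambda$ is the c.l.d. product of $\langle\hat\mu_i\rangle_{i\in I}$ and also of $\langle\tilde\mu_i\rangle_{i\in I}$.
(o) If all the $(X_i,\Sigma_i,\mu_i)$ are the same atomless measure space, then $\{x:x\in X,\ i\mapsto x(i)$ is injective$\}$ is $\lambda$-conegligible.
(p) Now suppose that we have another family $\langle(Y_i,\mathrm{T}_i,\nu_i)\rangle_{i\in I}$ of measure spaces, with product $(Y,\Lambda',\lambda')$, and inverse-measure-preserving functions $f_i:X_i\to Y_i$ for each $i$; define $f:X\to Y$ by saying that $f(x)(i)=f_i(x(i))$ for $x\in X$ and $i\in I$. If all the $\nu_i$ are $\sigma$-finite, then $f$ is inverse-measure-preserving for $\lambda$ and $\lambda'$.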
251X Basic exercises (a) Let $X$ and $Y$ be sets, $\mathcal{A}\subseteq\mathcal{P}X$ and $\mathcal{B}\subseteq\mathcal{P}Y$. Let $\Sigma$ be the $\sigma$-algebra of subsets of $X$ generated by $\mathcal{A}$ and $\mathrm{T}$ the $\sigma$-algebra of subsets of $Y$ generated by $\mathcal{B}$. Show that $\Sigma\widehat{\otimes}\mathrm{T}$ is the $\sigma$-algebra of subsets of $X\times Y$ generated by $\{A\times B:A\in\mathcal{A},\ B\in\mathcal{B}\}$.
(b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda$ the c.l.d. product measure. Show that $\lambda_0W<\infty$ iff $\lambda W<\infty$ and $W$ is included in a set of the form
$$(E\times Y)\cup(X\times F)\cup\bigcup_{n\in\mathbb{N}}E_n\times F_n$$
where $\mu E=\nu F=0$ and $\mu E_n<\infty$, $\nu F_n<\infty$ for every $n$.
>(c) Show that if $X$ and $Y$ are any sets, with their respective counting measures, then the primitive and c.l.d. product measures on $X\times Y$ are both counting measure on $X\times Y$.
(d) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda$ the c.l.d. product measure. Show that
$\lambda_0$ is locally determined
$\iff\lambda_0$ is semi-finite
$\iff\lambda_0=\lambda$
$\iff\lambda_0$ and $\lambda$ have the same negligible sets.
>(e) (See 251W.) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, where $I$ is a non-empty finite set. Set $X=\prod_{i\in I}X_i$. For $A\subseteq X$, set
$$\theta A=\inf\Bigl\{\sum_{n=0}^{\infty}\prod_{i\in I}\mu_iE_{ni}:E_{ni}\in\Sigma_i\ \forall\,n\in\mathbb{N},\ i\in I,\ A\subseteq\bigcup_{n\in\mathbb{N}}\prod_{i\in I}E_{ni}\Bigr\}.$$
Show that $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure defined from $\theta$ by Carathéodory's method, and for $W\in\operatorname{dom}\lambda_0$ set
$$\lambda W=\sup\Bigl\{\lambda_0\bigl(W\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i,\ \mu_iE_i<\infty\text{ for every }i\in I\Bigr\}.$$
Show that $\lambda$ is a measure on $X$, and is the c.l.d. version of $\lambda_0$.
>(f) (See 251W.) Let $I$ be a non-empty finite set and $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ a family of measure spaces. For non-empty $K\subseteq I$ set $X^{(K)}=\prod_{i\in K}X_i$ and let $\lambda_0^{(K)}$, $\lambda^{(K)}$ be the measures on $X^{(K)}$ constructed as in 251Xe. Show that if $K$ is a non-empty proper subset of $I$, then the natural bijection between $X^{(I)}$ and $X^{(K)}\times X^{(I\setminus K)}$ identifies $\lambda_0^{(I)}$ with the primitive product measure of $\lambda_0^{(K)}$ and $\lambda_0^{(I\setminus K)}$, and $\lambda^{(I)}$ with the c.l.d. product measure of $\lambda^{(K)}$ and $\lambda^{(I\setminus K)}$.
>(g) Using 251Xe-251Xf above, or otherwise, show that if $(X_1,\Sigma_1,\mu_1)$, $(X_2,\Sigma_2,\mu_2)$, $(X_3,\Sigma_3,\mu_3)$ are measure spaces then the primitive and c.l.d. product measures $\lambda_0$, $\lambda$ of $(X_1\times X_2)\times X_3$, constructed by first taking the appropriate product measure on $X_1\times X_2$ and then taking the product of this with the measure of $X_3$, are identified with the corresponding product measures on $X_1\times(X_2\times X_3)$ by the canonical bijection between the sets $(X_1\times X_2)\times X_3$ and $X_1\times(X_2\times X_3)$.
(h)(i) What happens in 251Xe when $I$ is a singleton? (ii) Devise an appropriate convention to make 251Xe-251Xf remain valid when one or more of the sets $I$, $K$, $I\setminus K$ there is empty.
>(i) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $I$ any non-empty set; let $\nu$ be counting measure on $I$. Show that the c.l.d. product measure on $X\times I$ is equal to (or at any rate identifiable with) the direct sum measure of the family $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$, if we set $(X_i,\Sigma_i,\mu_i)=(X,\Sigma,\mu)$ for every $i$.
>(j) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X,\Sigma,\mu)$ (214L). Let $(Y,\mathrm{T},\nu)$ be any measure space, and give $X\times Y$, $X_i\times Y$ their c.l.d. product measures. Show that the natural bijection between $X\times Y$ and $Z=\bigcup_{i\in I}(X_i\times Y)\times\{i\}$ is an isomorphism between the measure of $X\times Y$ and the direct sum measure on $Z$.
>(k) Let $(X,\Sigma,\mu)$ be any measure space, and $Y$ a singleton set $\{y\}$; let $\nu$ be the measure on $Y$ such that $\nu Y=1$. Show that the natural bijection between $X\times\{y\}$ and $X$ identifies the primitive product measure on $X\times\{y\}$ with $\check\mu$ as defined in 213Xa, and the c.l.d. product measure with the c.l.d. version of $\mu$. Explain how to put this together with 251Xg and 251Ic to prove 251T.
>(l) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Show that $\lambda$ is the c.l.d. version of its restriction to $\Sigma\widehat{\otimes}\mathrm{T}$.
(m) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with primitive and c.l.d. product measures $\lambda_0$, $\lambda$. Let $\lambda_1$ be any measure with domain $\Sigma\widehat{\otimes}\mathrm{T}$ such that $\lambda_1(E\times F)=\mu E\cdot\nu F$ whenever $E\in\Sigma$ and $F\in\mathrm{T}$. Show that $\lambda W\le\lambda_1W\le\lambda_0W$ for every $W\in\Sigma\widehat{\otimes}\mathrm{T}$.
(n) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that the corresponding outer measure $\lambda_0^*$ is just the outer measure $\theta$ of 251A.
(o) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $A\subseteq X$, $B\subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures. Let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda_0^{\#}$ the subspace measure it induces on $A\times B$. Let $\tilde\lambda_0$ be the primitive product measure of $\mu_A$ and $\nu_B$ on $A\times B$. Show that $\tilde\lambda_0$ extends $\lambda_0^{\#}$. Show that if either ($\alpha$) $A\in\Sigma$ and $B\in\mathrm{T}$ or ($\beta$) $A$ and $B$ can both be covered by sequences of sets of finite measure or ($\gamma$) $\mu$ and $\nu$ are both strictly localizable, then $\tilde\lambda_0=\lambda_0^{\#}$.
(p) In 251Q, show that $\tilde\lambda$ and $\lambda^{\#}$ will have the same null ideals, even if none of the conditions of 251Q(ii) are satisfied.
(q) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be any measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that $\lambda_0^*(A\times B)=\mu^*A\cdot\nu^*B$ for any $A\subseteq X$ and $B\subseteq Y$.
(r) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\hat\mu$ the completion of $\mu$. Show that $\mu$, $\nu$ and $\hat\mu$, $\nu$ have the same primitive product measures.
(s) Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Show that $\mu$ is atomless iff the diagonal $\{(x,x):x\in X\}$ is negligible for the c.l.d. product measure on $X\times X$.
(t) Let $(X,\Sigma,\mu)$ be an atomless measure space, and $(Y,\mathrm{T},\nu)$ any measure space. Show that the c.l.d. product measure on $X\times Y$ is atomless.
>(u) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. (i) Show that if $\mu$ and $\nu$ are purely atomic, so is $\lambda$. (ii) Show that if $\mu$ and $\nu$ are point-supported, so is $\lambda$.
251Y Further exercises (a) Let $X$, $Y$ be sets with $\sigma$-algebras of subsets $\Sigma$, $\mathrm{T}$. Suppose that $h:X\times Y\to\mathbb{R}$ is $\Sigma\widehat{\otimes}\mathrm{T}$-measurable and $\phi:X\to Y$ is $(\Sigma,\mathrm{T})$-measurable (121Yc). Show that $x\mapsto h(x,\phi(x)):X\to\mathbb{R}$ is $\Sigma$-measurable.
(b) Show that there are measure spaces $(X_1,\Sigma_1,\mu_1)$ and $(X_2,\Sigma_2,\mu_2)$, a probability space $(Y,\mathrm{T},\nu)$ and an inverse-measure-preserving function $f:X_1\to X_2$ such that $h:X_1\times Y\to X_2\times Y$ is not inverse-measure-preserving for the c.l.d. product measures on $X_1\times Y$ and $X_2\times Y$, where $h(x,y)=(f(x),y)$ for $x\in X_1$ and $y\in Y$.
(c) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space with a subspace $A$ whose measure is not locally determined (see 216Xb). Set $Y=\{0\}$, $\nu Y=1$ and consider the c.l.d. product measures on $X\times Y$ and $A\times Y$; write $\Lambda$, $\tilde\Lambda$ for their domains. Show that $\tilde\Lambda$ properly includes $\{W\cap(A\times Y):W\in\Lambda\}$.
(d) Let $(X,\Sigma,\mu)$ be any measure space, $(Y,\mathrm{T},\nu)$ an atomless measure space, and $f:X\to Y$ a $(\Sigma,\mathrm{T})$-measurable function. Show that $\{(x,f(x)):x\in X\}$ is negligible for the c.l.d. product measure on $X\times Y$.
251 Notes and comments There are real difficulties in deciding which construction to declare as 'the' product of two arbitrary measures. My phrase 'primitive product measure', and notation $\lambda_0$, betray a bias; my own preference is for the c.l.d. product $\lambda$, for two principal reasons. The first is that $\lambda_0$ is likely to be 'bad', in particular, not semi-finite, even if $\mu$ and $\nu$ are 'good' (251Xd, 252Yk), while $\lambda$ inherits some of the most important properties of $\mu$ and $\nu$ (e.g., 251O); the second is that in the case of topological measure spaces $X$ and $Y$, there is often a canonical topological measure on $X\times Y$, which is likely to be more closely related to $\lambda$ than to $\lambda_0$. But for elucidation of this point I must ask you to wait until §417 in Volume 4.
It would be possible to remove the primitive product measure entirely from the exposition, or at least to relegate it to the exercises. This is indeed what I expect to do in the rest of this treatise, since (in my view) all significant features of product measures on finitely many factors can be expressed in terms of the c.l.d. product measure. For the first introduction to product measures, however, a direct approach to the c.l.d. product measure (through the description of $\lambda^*$ in 251P, for instance) is an uncomfortably large bite, and I have some sort of duty to present the most natural rival to the c.l.d. product measure prominently enough for you to judge for yourself whether I am right to dismiss it. There certainly are results associated with the primitive product measure (251Xn, 251Xq, 252Yc) which have an agreeable simplicity.
The clash is avoided altogether, of course, if we specialize immediately to $\sigma$-finite spaces, in which the two constructions coincide (251K). But even this does not solve all problems. There is a popular alternative measure often called 'the' product measure: the restriction $\lambda_{0B}$ of $\lambda_0$ to the $\sigma$-algebra $\Sigma\widehat{\otimes}\mathrm{T}$. (See, for instance, Halmos 50.) The advantage of this is that if a function $f$ on $X\times Y$ is $\Sigma\widehat{\otimes}\mathrm{T}$-measurable, then $x\mapsto f(x,y)$ is $\Sigma$-measurable for every $y\in Y$. (This is because
$$\{W:W\subseteq X\times Y,\ \{x:(x,y)\in W\}\in\Sigma\ \forall\,y\in Y\}$$
is a $\sigma$-algebra of subsets of $X\times Y$ containing $E\times F$ whenever $E\in\Sigma$ and $F\in\mathrm{T}$, and therefore including $\Sigma\widehat{\otimes}\mathrm{T}$.) The primary objection, to my mind, is that Lebesgue measure on $\mathbb{R}^2$ is no longer 'the' product of Lebesgue measure on $\mathbb{R}$ with itself. Generally, it is right to seek measures which measure as many sets as possible, and I prefer to face up to the technical problems (which I acknowledge are off-putting) by seeking appropriate definitions on the approach to major theorems, rather than rely on ad hoc fixes when the time comes to apply them.
I omit further examples of product measures for the moment, because the investigation of particular examples will be much easier with the aid of results from the next section. Of course the leading example, and the one which should come always to mind in response to the words 'product measure', is Lebesgue measure on $\mathbb{R}^2$, the case $r=s=1$ of 251N and 251R. For an indication of what can happen when one of the factors is not $\sigma$-finite, you could look ahead to 252K.
I hope that you will see that the definition of the outer measure $\theta$ in 251A corresponds to the standard definition of Lebesgue outer measure, with 'measurable rectangles' $E\times F$ taking the place of intervals, and the functional $E\times F\mapsto\mu E\cdot\nu F$ taking the place of 'length' or 'volume' of an interval; moreover, thinking of $E$ and $F$ as intervals, there is an obvious relation between Lebesgue measure on $\mathbb{R}^2$ and the product measure on $\mathbb{R}\times\mathbb{R}$. Of course an 'obvious relationship' is not the same thing as a proper theorem with exact hypotheses and conclusions, but Theorem 251N is clearly central. Long before that, however, there is another parallel between the construction of 251A and that of Lebesgue measure. In both cases, the proof that we have an outer measure comes directly from the defining formula (in 113Yd I gave as an exercise a general result covering 251B), and consequently a very general construction can lead us to a measure. But the measure would be of far less interest and value if it did not measure, and measure correctly, the basic sets, in this case the measurable rectangles. Thus 251E corresponds to the theorem that intervals are Lebesgue measurable, with the right measure (114Db, 114G). This is the real key to the construction, and is one of the fundamental ideas of measure theory.
Yet another parallel is in 251Xn; the outer measure $\theta$ defining the primitive product measure $\lambda_0$ is exactly equal to the outer measure $\lambda_0^*$ defined from $\lambda_0$. I described the corresponding phenomenon for Lebesgue measure in 132C.
Any construction which claims the title 'canonical' must satisfy a variety of natural requirements; for instance, one expects the canonical bijection between $X\times Y$ and $Y\times X$ to be an isomorphism between the corresponding product measure spaces. 'Commutativity' of the product in this sense is I think obvious from the definitions in 251A-251C. It is obviously desirable (not, I think, obviously true) that the product should be 'associative' in that the canonical bijection between $(X\times Y)\times Z$ and $X\times(Y\times Z)$ should also be an isomorphism between the corresponding products of product measures. This is in fact valid for both the primitive and c.l.d. product measures (251Wh, 251Xe-251Xg).
Working through the classification of measure spaces presented in §211, we find that the primitive product measure $\lambda_0$ of arbitrary factor measures $\mu$, $\nu$ is complete, while the c.l.d. product measure $\lambda$ is always complete and locally determined. $\lambda_0$ may not be semi-finite, even if $\mu$ and $\nu$ are strictly localizable (252Yk); but $\lambda$ will be strictly localizable if $\mu$ and $\nu$ are (251O). Of course this is associated with the fact that the c.l.d. product measure is distributive over direct sums (251Xj). If either $\mu$ or $\nu$ is atomless, so is $\lambda$ (251Xt). Both $\lambda$ and $\lambda_0$ are $\sigma$-finite if $\mu$ and $\nu$ are (251K). It is possible for both $\mu$ and $\nu$ to be localizable but $\lambda$ not (254U).
At least if you have worked through Chapter 21, you have now done enough 'pure' measure theory for this kind of investigation, however straightforward, to raise a good many questions. Apart from direct sums, we also have the constructions of 'completion', 'subspace', 'outer measure' and (in particular) 'c.l.d. version' to integrate into the new ideas; I offer some results in 251T and 251Xk. Concerning subspaces, some possibly surprising difficulties arise. The problem is that the product measure on the product of two subspaces can have a larger domain than one might expect. I give a simple example in 251Yc and a more elaborate one in 254Ye. For strictly localizable spaces, there is no problem (251Q); but no other criterion drawn from the list of properties considered in §251 seems adequate to remove the possibility of a disconcerting phenomenon.
252 Fubini's theorem
Perhaps the most important feature of the concept of 'product measure' is the fact that we can use it to discuss repeated integrals. In this section I give versions of Fubini's theorem and Tonelli's theorem (252B, 252G) with a variety of corollaries, the most useful ones being versions for $\sigma$-finite spaces (252C, 252H). As applications I describe the relationship between integration and measuring ordinate sets (252N) and calculate the $r$-dimensional volume of a ball in $\mathbb{R}^r$ (252Q, 252Xi). I mention counter-examples showing the difficulties which can arise with non-$\sigma$-finite measures and non-integrable functions (252K-252L, 252Xf-252Xg).
252A Repeated integrals Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $f$ a real-valued function defined on a set $\operatorname{dom}f\subseteq X\times Y$. We can seek to form the repeated integral
$$\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int\Bigl(\int f(x,y)\,\nu(dy)\Bigr)\mu(dx),$$
which should be interpreted as follows: set
$$D=\Bigl\{x:x\in X,\ \int f(x,y)\,\nu(dy)\text{ is defined in }[-\infty,\infty]\Bigr\},$$
$$g(x)=\int f(x,y)\,\nu(dy)\text{ for }x\in D,$$
and then write $\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int g(x)\,\mu(dx)$ if this is defined. Of course the subset of $Y$ on which $y\mapsto f(x,y)$ is defined may vary with $x$, but it must always be conegligible, as must $D$.
Similarly, exchanging the roles of $X$ and $Y$, we can seek a repeated integral
$$\iint f(x,y)\,\mu(dx)\,\nu(dy)=\int\Bigl(\int f(x,y)\,\mu(dx)\Bigr)\nu(dy).$$
The point is that, under appropriate conditions on $\mu$ and $\nu$, we can relate these repeated integrals to each other by connecting them both with the integral of $f$ itself with respect to the product measure on $X\times Y$.
As will become apparent shortly, it is essential here to allow oneself to discuss the integral of a function which is not everywhere defined. It is of less importance whether one allows integrands and integrals to take infinite values, but for definiteness let me say that I shall be following the rules of 135F; that is, $\int f=\int f^+-\int f^-$ provided that $f$ is defined almost everywhere, takes values in $[-\infty,\infty]$ and is virtually measurable, and at most one of $\int f^+$, $\int f^-$ is infinite.
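In the simplest possible setting the two repeated integrals can be computed by hand. The sketch below is an illustration only, assuming finite sets $X$, $Y$ carrying point-supported measures (so every integral is a finite weighted sum); the particular weights, the function `f` and all function names are hypothetical choices made for this example.

```python
from fractions import Fraction

# Point masses defining mu on X = {"a", "b"} and nu on Y = {0, 1, 2} (assumed data).
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
nu = {0: Fraction(1, 3), 1: Fraction(2, 3), 2: Fraction(1)}


def f(x, y):
    """An arbitrary real-valued function on X x Y, taking both signs."""
    return Fraction(y * y + 1) if x == "a" else Fraction(-2 * y)


def inner_dy(x):
    """g(x) = integral of f(x, y) nu(dy)."""
    return sum(f(x, y) * nu[y] for y in nu)


def repeated_dy_dx():
    """Repeated integral: integrate over y first, then over x."""
    return sum(inner_dy(x) * mu[x] for x in mu)


def repeated_dx_dy():
    """Repeated integral with the roles of X and Y exchanged."""
    return sum(sum(f(x, y) * mu[x] for x in mu) * nu[y] for y in nu)


def product_integral():
    """Integral of f against the product measure, with mass mu{x} * nu{y} at (x, y)."""
    return sum(f(x, y) * mu[x] * nu[y] for x in mu for y in nu)
```

Here all three quantities are finite sums of the same terms in different orders, so they must agree; the content of the theorems of this section is that the same conclusion survives for general measure spaces under suitable hypotheses.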
252B Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product $(X\times Y,\Lambda,\lambda)$ (251F). Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined. Let $f$ be a $[-\infty,\infty]$-valued function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$. Then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and is equal to $\int f\,d\lambda$.
proof The proof of this result involves substantial technical difficulties. If you have not seen these ideas before, you should almost certainly not go straight to the full generality of the version announced above. I will therefore start by writing out a proof in the case in which both $\mu$ and $\nu$ are totally finite; this is already lengthy enough. I will present it in such a way that only the central section (part (b) below) needs to be amended in the general case, and then, after completing the proof of the special case, I will give the alternative version of (b) which is required for the full result.
(a) Write $\mathcal{L}$ for the family of $[0,\infty]$-valued functions $f$ such that $\int f\,d\lambda$ and $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ are defined and equal. My aim is to show first that $f\in\mathcal{L}$ whenever $f$ is non-negative and $\int f\,d\lambda$ is defined, and then to look at differences of functions in $\mathcal{L}$. To prove that enough functions belong to $\mathcal{L}$, my strategy will be to start with 'elementary' functions and work outwards through progressively larger classes. It is most efficient to begin by describing ways of building new members of $\mathcal{L}$ from old, as follows.
(i) $f_1+f_2\in\mathcal{L}$ for all $f_1,f_2\in\mathcal{L}$, and $cf\in\mathcal{L}$ for all $f\in\mathcal{L}$, $c\in[0,\infty[$; this is because
$$\int(f_1+f_2)(x,y)\,\nu(dy)=\int f_1(x,y)\,\nu(dy)+\int f_2(x,y)\,\nu(dy),$$
$$\int(cf)(x,y)\,\nu(dy)=c\int f(x,y)\,\nu(dy)$$
whenever the right-hand sides are defined, which we are supposing to be the case for almost every $x$, so that
$$\begin{aligned}
\iint(f_1+f_2)(x,y)\,\nu(dy)\,\mu(dx)
&=\iint f_1(x,y)\,\nu(dy)\,\mu(dx)+\iint f_2(x,y)\,\nu(dy)\,\mu(dx)\\
&=\int f_1\,d\lambda+\int f_2\,d\lambda=\int(f_1+f_2)\,d\lambda,
\end{aligned}$$
$$\iint(cf)(x,y)\,\nu(dy)\,\mu(dx)=c\iint f(x,y)\,\nu(dy)\,\mu(dx)=c\int f\,d\lambda=\int(cf)\,d\lambda.$$
(ii) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}$ such that $f_n(x,y)\le f_{n+1}(x,y)$ whenever $n\in\mathbb{N}$ and $(x,y)\in\operatorname{dom}f_n\cap\operatorname{dom}f_{n+1}$, then $\sup_{n\in\mathbb{N}}f_n\in\mathcal{L}$. $\mathbf{P}$ Set $f=\sup_{n\in\mathbb{N}}f_n$; for $x\in X$, $n\in\mathbb{N}$ set $g_n(x)=\int f_n(x,y)\,\nu(dy)$ when the integral is defined in $[0,\infty]$. Since here I am allowing $\infty$ as a value of a function, it is natural to regard $f$ as defined on $\bigcap_{n\in\mathbb{N}}\operatorname{dom}f_n$. By B.Levi's theorem, $\int f\,d\lambda=\sup_{n\in\mathbb{N}}\int f_n\,d\lambda$; write $u$ for this common value in $[0,\infty]$. Next, because $f_n\le f_{n+1}$ wherever both are defined, $g_n\le g_{n+1}$ wherever both are defined, for each $n$; we are supposing that $f_n\in\mathcal{L}$, so $g_n$ is defined $\mu$-almost everywhere for each $n$, and
$$\sup_{n\in\mathbb{N}}\int g_n\,d\mu=\sup_{n\in\mathbb{N}}\int f_n\,d\lambda=u.$$
By B.Levi's theorem again, $\int g\,d\mu=u$, where $g=\sup_{n\in\mathbb{N}}g_n$. Now take any $x\in\bigcap_{n\in\mathbb{N}}\operatorname{dom}g_n$, and consider the functions $f_{xn}$ on $Y$, setting $f_{xn}(y)=f_n(x,y)$ whenever this is defined. Each $f_{xn}$ has an integral in $[0,\infty]$, and $f_{xn}(y)\le f_{x,n+1}(y)$ whenever both are defined, and
$$\sup_{n\in\mathbb{N}}\int f_{xn}\,d\nu=g(x);$$
so, using B.Levi's theorem for a third time, $\int(\sup_{n\in\mathbb{N}}f_{xn})\,d\nu$ is defined and equal to $g(x)$, that is,
$$\int f(x,y)\,\nu(dy)=g(x).$$
This is true for almost every $x$, so
$$\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int g\,d\mu=u=\int f\,d\lambda.$$
Thus $f\in\mathcal{L}$, as claimed. $\mathbf{Q}$
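(iii) The expression of the ideas in the next section of the proof will go more smoothly if I introduce another term. Write $\mathcal{W}$ for $\{W:W\subseteq X\times Y,\ \chi W\in\mathcal{L}\}$. Then
($\alpha$) if $W,W'\in\mathcal{W}$ and $W\cap W'=\emptyset$, $W\cup W'\in\mathcal{W}$
by (i), because $\chi(W\cup W')=\chi W+\chi W'$,
($\beta$) $\bigcup_{n\in\mathbb{N}}W_n\in\mathcal{W}$ whenever $\langle W_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\mathcal{W}$
because $\langle\chi W_n\rangle_{n\in\mathbb{N}}\uparrow\chi W$, where $W=\bigcup_{n\in\mathbb{N}}W_n$, and we can use (ii).
It is also helpful to note that, for any $W\subseteq X\times Y$ and any $x\in X$, $\int\chi W(x,y)\,\nu(dy)=\nu W[\{x\}]$, at least whenever $W[\{x\}]=\{y:(x,y)\in W\}$ is measured by $\nu$. Moreover, because $\lambda$ is complete, a set $W\subseteq X\times Y$ belongs to $\Lambda$ iff $\chi W$ is $\lambda$-virtually measurable iff $\int\chi W\,d\lambda$ is defined in $[0,\infty]$, and in this case $\lambda W=\int\chi W\,d\lambda$.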
(iv) Finally, we need to observe that, in appropriate circumstances, the difference of two members of $\mathcal{W}$ will belong to $\mathcal{W}$: if $W,W'\in\mathcal{W}$ and $W\subseteq W'$ and $\lambda W'<\infty$, then $W'\setminus W\in\mathcal{W}$. $\mathbf{P}$ We are supposing that $g(x)=\int\chi W(x,y)\,\nu(dy)$ and $g'(x)=\int\chi W'(x,y)\,\nu(dy)$ are defined for almost every $x$, and that $\int g\,d\mu=\lambda W$, $\int g'\,d\mu=\lambda W'$. Because $\lambda W'$ is finite, $g'$ must be finite almost everywhere, and $D=\{x:x\in\operatorname{dom}g\cap\operatorname{dom}g',\ g'(x)<\infty\}$ is conegligible. Now, for any $x\in D$, both $g(x)$ and $g'(x)$ are finite, so
$$y\mapsto\chi(W'\setminus W)(x,y)=\chi W'(x,y)-\chi W(x,y)$$
is the difference of two integrable functions, and
$$\begin{aligned}
\int\chi(W'\setminus W)(x,y)\,\nu(dy)
&=\int\chi W'(x,y)-\chi W(x,y)\,\nu(dy)\\
&=\int\chi W'(x,y)\,\nu(dy)-\int\chi W(x,y)\,\nu(dy)=g'(x)-g(x).
\end{aligned}$$
Accordingly
$$\iint\chi(W'\setminus W)(x,y)\,\nu(dy)\,\mu(dx)=\int g'(x)-g(x)\,\mu(dx)=\lambda W'-\lambda W=\lambda(W'\setminus W),$$
and $W'\setminus W$ belongs to $\mathcal{W}$. $\mathbf{Q}$
(Of course the argument just above can be shortened by a few words if we allow ourselves to assume that $\mu$ and $\nu$ are totally finite, since then $g(x)$ and $g'(x)$ will be finite whenever they are defined; but the key idea, that the difference of integrable functions is integrable, is unchanged.)
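(b) Now let us examine the class $\mathcal{W}$, assuming that $\mu$ and $\nu$ are totally finite.
(i) $E\times F\in\mathcal{W}$ for all $E\in\Sigma$, $F\in\mathrm{T}$. $\mathbf{P}$ $\lambda(E\times F)=\mu E\cdot\nu F$ (251J), and
$$\int\chi(E\times F)(x,y)\,\nu(dy)=\nu F\,\chi E(x)$$
for each $x$, so
$$\iint\chi(E\times F)(x,y)\,\nu(dy)\,\mu(dx)=\int(\nu F\,\chi E(x))\,\mu(dx)=\mu E\cdot\nu F=\lambda(E\times F)=\int\chi(E\times F)\,d\lambda.\ \mathbf{Q}$$
(ii) Let $\mathcal{C}$ be $\{E\times F:E\in\Sigma,\ F\in\mathrm{T}\}$. Then $\mathcal{C}$ is closed under finite intersections (because $(E\times F)\cap(E'\times F')=(E\cap E')\times(F\cap F')$) and is included in $\mathcal{W}$. In particular, $X\times Y\in\mathcal{W}$. But this, together with (a-iv) and (a-iii-$\beta$) above, means that $\mathcal{W}$ is a Dynkin class (definition: 136A), so includes the $\sigma$-algebra of subsets of $X\times Y$ generated by $\mathcal{C}$, by the Monotone Class Theorem (136B); that is, $\mathcal{W}\supseteq\Sigma\widehat{\otimes}\mathrm{T}$ (definition: 251D).
(iii) Next, $W\in\mathcal{W}$ whenever $W\subseteq X\times Y$ is $\lambda$-negligible. $\mathbf{P}$ By 251Ib, there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq(X\times Y)\setminus W$ and $\lambda V=\lambda((X\times Y)\setminus W)$. Because $\lambda(X\times Y)=\mu X\cdot\nu Y$ is finite, $V'=(X\times Y)\setminus V$ is $\lambda$-negligible, and we have $W\subseteq V'\in\Sigma\widehat{\otimes}\mathrm{T}$. Consequently
$$0=\lambda V'=\iint\chi V'(x,y)\,\nu(dy)\,\mu(dx).$$
But this means that
$$D=\Bigl\{x:\int\chi V'(x,y)\,\nu(dy)\text{ is defined and equal to }0\Bigr\}$$
is conegligible. If $x\in D$, then we must have $\chi V'(x,y)=0$ for $\nu$-almost every $y$, that is, $V'[\{x\}]$ is negligible; in which case $W[\{x\}]\subseteq V'[\{x\}]$ also is negligible, and $\int\chi W(x,y)\,\nu(dy)=0$. And this is true for every $x\in D$, so $\int\chi W(x,y)\,\nu(dy)$ is defined and equal to $0$ for almost every $x$, and
$$\iint\chi W(x,y)\,\nu(dy)\,\mu(dx)=0=\lambda W,$$
as required. $\mathbf{Q}$
(iv) It follows that $\Lambda\subseteq\mathcal{W}$. $\mathbf{P}$ If $W\in\Lambda$, then, by 251Ib again, there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W$ and $\lambda V=\lambda W$, so that $\lambda(W\setminus V)=0$. Now $V\in\mathcal{W}$ by (ii) and $W\setminus V\in\mathcal{W}$ by (iii), so $W\in\mathcal{W}$ by (a-iii-$\alpha$). $\mathbf{Q}$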
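(c) I return to the class $\mathcal{L}$.
(i) If $f\in\mathcal{L}$ and $g$ is a $[0,\infty]$-valued function defined and equal to $f$ $\lambda$-a.e., then $g\in\mathcal{L}$. $\mathbf{P}$ Set
$$W=(X\times Y)\setminus\{(x,y):(x,y)\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x,y)=g(x,y)\},$$
so that $\lambda W=0$. (Remember that $\lambda$ is complete.) By (b), $\iint\chi W(x,y)\,\nu(dy)\,\mu(dx)=0$, that is, $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x$. Let $D$ be $\{x:x\in X,\ W[\{x\}]$ is $\nu$-negligible$\}$. Then $D$ is $\mu$-conegligible. If $x\in D$, then
$$W[\{x\}]=Y\setminus\{y:(x,y)\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x,y)=g(x,y)\}$$
is negligible, so that $\int f(x,y)\,\nu(dy)=\int g(x,y)\,\nu(dy)$ if either is defined. Thus the functions $x\mapsto\int f(x,y)\,\nu(dy)$, $x\mapsto\int g(x,y)\,\nu(dy)$ are equal almost everywhere, and
$$\iint g(x,y)\,\nu(dy)\,\mu(dx)=\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int f\,d\lambda=\int g\,d\lambda,$$
so that $g\in\mathcal{L}$. $\mathbf{Q}$
(ii) Now let $f$ be any non-negative function such that $\int f\,d\lambda$ is defined in $[0,\infty]$. Then $f\in\mathcal{L}$. $\mathbf{P}$ For $k,n\in\mathbb{N}$ set
$$W_{nk}=\{(x,y):(x,y)\in\operatorname{dom}f,\ f(x,y)\ge 2^{-n}k\}.$$
Because $\lambda$ is complete and $f$ is $\lambda$-virtually measurable and $\operatorname{dom}f$ is conegligible, every $W_{nk}$ belongs to $\Lambda$, so $\chi W_{nk}\in\mathcal{L}$, by (b). Set $f_n=\sum_{k=1}^{4^n}2^{-n}\chi W_{nk}$, so that
$$f_n(x,y)=\begin{cases}
2^{-n}k&\text{if }k\le 4^n\text{ and }2^{-n}k\le f(x,y)<2^{-n}(k+1),\\
2^{n}&\text{if }f(x,y)\ge 2^{n},\\
0&\text{if }(x,y)\in(X\times Y)\setminus\operatorname{dom}f.
\end{cases}$$
By (a-i), $f_n\in\mathcal{L}$ for every $n\in\mathbb{N}$, while $\langle f_n\rangle_{n\in\mathbb{N}}$ is non-decreasing, so $f'=\sup_{n\in\mathbb{N}}f_n\in\mathcal{L}$, by (a-ii). But $f=_{\text{a.e.}}f'$, so $f\in\mathcal{L}$, by (i) just above. $\mathbf{Q}$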
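The dyadic approximation used in (c-ii) can be explored pointwise. The following sketch is illustrative only: it evaluates $f_n=\sum_{k=1}^{4^n}2^{-n}\chi W_{nk}$ at a single point where $f$ takes a given finite non-negative value, directly from the defining sum; the function name `f_n` and the use of exact rationals are choices made for this example, not part of the text.

```python
from fractions import Fraction


def f_n(value: Fraction, n: int) -> Fraction:
    """Evaluate f_n = sum_{k=1}^{4^n} 2^{-n} * chi{f >= 2^{-n} k} at a point where f = value.

    `value` is assumed finite and non-negative.
    """
    step = Fraction(1, 2 ** n)
    # Count the indices k for which the point belongs to W_nk = {f >= 2^{-n} k}.
    count = sum(1 for k in range(1, 4 ** n + 1) if value >= step * k)
    return step * count
```

For a point where $f=\frac{5}{3}$ the successive values are $1,\frac32,\frac32,\frac{13}{8},\ldots$, a non-decreasing sequence bounded by $\frac53$ and within $2^{-n}$ of it once $2^n\ge\frac53$; for a point where $f\ge 2^n$ the value is capped at $2^n$, in agreement with the case analysis above.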
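(iii) Finally, let $f$ be any $[-\infty,\infty]$-valued function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$. Then $\int f^+\,d\lambda$, $\int f^-\,d\lambda$ are both defined and at most one is infinite. By (ii), both $f^+$ and $f^-$ belong to $\mathcal{L}$. Set $g(x)=\int f^+(x,y)\,\nu(dy)$, $h(x)=\int f^-(x,y)\,\nu(dy)$ whenever these are defined; then $\int g\,d\mu=\int f^+\,d\lambda$ and $\int h\,d\mu=\int f^-\,d\lambda$ are both defined in $[0,\infty]$.
Suppose first that $\int f^-\,d\lambda$ is finite. Then $\int h\,d\mu$ is finite, so $h$ must be finite $\mu$-almost everywhere; set
$$D=\{x:x\in\operatorname{dom}g\cap\operatorname{dom}h,\ h(x)<\infty\}.$$
For any $x\in D$, $\int f^+(x,y)\,\nu(dy)$ and $\int f^-(x,y)\,\nu(dy)$ are defined in $[0,\infty]$, and the latter is finite; so
$$\int f(x,y)\,\nu(dy)=\int f^+(x,y)\,\nu(dy)-\int f^-(x,y)\,\nu(dy)=g(x)-h(x)$$
is defined in $]-\infty,\infty]$. Because $D$ is conegligible,
$$\begin{aligned}
\iint f(x,y)\,\nu(dy)\,\mu(dx)
&=\int g(x)-h(x)\,\mu(dx)=\int g\,d\mu-\int h\,d\mu\\
&=\int f^+\,d\lambda-\int f^-\,d\lambda=\int f\,d\lambda,
\end{aligned}$$
as required.
Thus we have the result when $\int f^-\,d\lambda$ is finite. Similarly, or by applying the argument above to $-f$, $\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int f\,d\lambda$ if $\int f^+\,d\lambda$ is finite.
Thus the theorem is proved, at least when $\mu$ and $\nu$ are totally finite.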
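(b*) The only point in the argument above where we needed to know anything special about the measures $\mu$ and $\nu$ was in part (b), when showing that $\Lambda\subseteq\mathcal{W}$. I now return to this point under the hypotheses of the theorem as stated, that $\nu$ is $\sigma$-finite and $\mu$ is either strictly localizable or complete and locally determined.
(i) It will be helpful to note that the completion $\hat\mu$ of $\mu$ (212C) is identical with its c.l.d. version $\tilde\mu$ (213E). $\mathbf{P}$ If $\mu$ is strictly localizable, then $\hat\mu=\tilde\mu$ by 213Ha. If $\mu$ is complete and locally determined, then $\hat\mu=\mu=\tilde\mu$ (212D, 213Hf). $\mathbf{Q}$
(ii) Write $\Sigma^f=\{G:G\in\Sigma,\ \mu G<\infty\}$, $\mathrm{T}^f=\{H:H\in\mathrm{T},\ \nu H<\infty\}$. For $G\in\Sigma^f$, $H\in\mathrm{T}^f$ let $\mu_G$, $\nu_H$ and $\lambda_{G\times H}$ be the subspace measures on $G$, $H$ and $G\times H$ respectively; then $\lambda_{G\times H}$ is the c.l.d. product measure of $\mu_G$ and $\nu_H$ (251Q(ii-$\alpha$)). Now $W\cap(G\times H)\in\mathcal{W}$ for every $W\in\Lambda$. $\mathbf{P}$ $W\cap(G\times H)$ belongs to the domain of $\lambda_{G\times H}$, so by (b) of this proof, applied to the totally finite measures $\mu_G$ and $\nu_H$,
$$\begin{aligned}
\lambda(W\cap(G\times H))&=\lambda_{G\times H}(W\cap(G\times H))\\
&=\int_G\int_H\chi(W\cap(G\times H))(x,y)\,\nu_H(dy)\,\mu_G(dx)\\
&=\int_G\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy)\,\mu_G(dx)
\end{aligned}$$
(because $\chi(W\cap(G\times H))(x,y)=0$ if $y\in Y\setminus H$, so we can use 131E)
$$=\int_X\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy)\,\mu(dx)$$
by 131E again, because $\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy)=0$ if $x\in X\setminus G$. So $W\cap(G\times H)\in\mathcal{W}$. $\mathbf{Q}$
(iii) In fact, $W\in\mathcal{W}$ for every $W\in\Lambda$. $\mathbf{P}$ Remember that we are supposing that $\nu$ is $\sigma$-finite. Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence in $\mathrm{T}^f$ covering $Y$, and for each $n\in\mathbb{N}$ set $W_n=W\cap(X\times Y_n)$, $g_n(x)=\int\chi W_n(x,y)\,\nu(dy)$ whenever this is defined. For any $G\in\Sigma^f$,
$$\int_G g_n\,d\mu=\iint\chi(W\cap(G\times Y_n))(x,y)\,\nu(dy)\,\mu(dx)$$
is defined and equal to $\lambda(W\cap(G\times Y_n))$, by (ii). But this means, first, that $G\setminus\operatorname{dom}g_n$ is negligible, that is, that $\hat\mu(G\setminus\operatorname{dom}g_n)=0$. Since this is so whenever $\mu G$ is finite, $\tilde\mu(X\setminus\operatorname{dom}g_n)=0$, and $g_n$ is defined $\tilde\mu$-a.e.; but $\tilde\mu=\hat\mu$, so $g_n$ is defined $\hat\mu$-a.e., that is, $\mu$-a.e. (212Eb). Next, if we set $E_{na}=\{x:x\in\operatorname{dom}g_n,\ g_n(x)\ge a\}$ for $a\in\mathbb{R}$, then $E_{na}\cap G\in\hat\Sigma$ whenever $G\in\Sigma^f$, where $\hat\Sigma$ is the domain of $\hat\mu$; by the definition in 213D, $E_{na}$ is measured by $\tilde\mu=\hat\mu$. As $a$ is arbitrary, $g_n$ is $\mu$-virtually measurable (212Fa).
We can therefore speak of $\int g_n\,d\mu$. Now
$$\iint\chi W_n(x,y)\,\nu(dy)\,\mu(dx)=\int g_n\,d\mu=\sup_{G\in\Sigma^f}\int_G g_n$$
(213B, because $\mu$ is semi-finite)
$$=\sup_{G\in\Sigma^f}\lambda(W\cap(G\times Y_n))=\lambda(W\cap(X\times Y_n))$$
by the definition in 251F. Thus $W\cap(X\times Y_n)\in\mathcal{W}$.
This is true for every $n\in\mathbb{N}$. Because $\langle Y_n\rangle_{n\in\mathbb{N}}\uparrow Y$, $W\in\mathcal{W}$, by (a-iii-$\beta$). $\mathbf{Q}$
(iv) We can therefore return to part (c) of the argument above and conclude as before.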
252C The theorem above is of course asymmetric, in that different hypotheses are imposed on the two factor measures $\mu$ and $\nu$. If we want a 'symmetric' theorem we have to suppose that they are both $\sigma$-finite, as follows.
Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two $\sigma$-finite measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. If $f$ is $\lambda$-integrable, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ and $\iint f(x,y)\,\mu(dx)\,\nu(dy)$ are defined, finite and equal.
proof Since $\mu$ and $\nu$ are surely strictly localizable (211Lc), we can apply 252B from either side to conclude that
$$\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int f\,d\lambda=\iint f(x,y)\,\mu(dx)\,\nu(dy).$$
252D So many applications of Fubini's theorem are to characteristic functions that I take a few lines to spell out the form which 252B takes in this case, as in parts (b)-(b*) of the proof there.
Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined.
(i) If $W \in \operatorname{dom}\lambda$, then $\int \nu^* W[\{x\}]\,\mu(dx)$ is defined in $[0,\infty]$ and equal to $\lambda W$.
(ii) If $\nu$ is complete, we can write $\int \nu W[\{x\}]\,\mu(dx)$ in place of $\int \nu^* W[\{x\}]\,\mu(dx)$.
proof The point is just that $\int \chi W(x,y)\,\nu(dy) = \hat\nu W[\{x\}]$ whenever either is defined, where $\hat\nu$ is the completion of $\nu$ (212F). Now 252B tells us that
$$\lambda W = \iint \chi W(x,y)\,\nu(dy)\,\mu(dx) = \int \hat\nu W[\{x\}]\,\mu(dx).$$
We always have $\hat\nu W[\{x\}] = \nu^* W[\{x\}]$ when the former is defined, by the definition of $\hat\nu$ (212C); and if $\nu$ is complete, then $\hat\nu = \nu$ so $\lambda W = \int \nu W[\{x\}]\,\mu(dx)$.
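252E Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ has locally determined negligible sets (213I). Then if $f$ is a $\Lambda$-measurable real-valued function defined on a subset of $X \times Y$, $y \mapsto f(x,y)$ is $\nu$-virtually measurable for $\mu$-almost every $x \in X$.
proof Let $\tilde f$ be a $\Lambda$-measurable extension of $f$ to a real-valued function defined everywhere in $X \times Y$ (121I), and set
$$\tilde f_x(y) = \tilde f(x,y) \text{ for all } x \in X,\ y \in Y,$$
$$D = \{x : x \in X,\ \tilde f_x \text{ is } \nu\text{-virtually measurable}\}.$$
If $G \in \Sigma$ and $\mu G < \infty$, then $G \setminus D$ is negligible. PPP Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $Y$, and set
$$\tilde f_n(x,y) = \tilde f(x,y) \text{ if } x \in G,\ y \in Y_n \text{ and } |\tilde f(x,y)| \le n,$$
$$\tilde f_n(x,y) = 0 \text{ for other } (x,y) \in X \times Y.$$
Then each $\tilde f_n$ is $\lambda$-integrable, being bounded and $\Lambda$-measurable and zero off $G \times Y_n$. Consequently, setting $\tilde f_{nx}(y) = \tilde f_n(x,y)$,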
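$$\int\Bigl(\int \tilde f_{nx}\,d\nu\Bigr)\mu(dx) \text{ exists } = \int \tilde f_n\,d\lambda.$$
But this surely means that $\tilde f_{nx}$ is $\nu$-integrable, therefore $\nu$-virtually measurable, for almost every $x \in X$. Set
$$D_n = \{x : x \in X,\ \tilde f_{nx} \text{ is } \nu\text{-virtually measurable}\};$$
then every $D_n$ is $\mu$-conegligible, so $\bigcap_{n\in\mathbb{N}} D_n$ is conegligible. But for any $x \in G \cap \bigcap_{n\in\mathbb{N}} D_n$, $\tilde f_x = \lim_{n\to\infty} \tilde f_{nx}$ is $\nu$-virtually measurable. Thus $G \setminus D \subseteq X \setminus \bigcap_{n\in\mathbb{N}} D_n$ is negligible. QQQ
This is true whenever $\mu G < \infty$. As $G$ is arbitrary and $\mu$ has locally determined negligible sets, $D$ is conegligible. But, for any $x \in D$, $y \mapsto f(x,y)$ is a restriction of $\tilde f_x$ and must be $\nu$-virtually measurable.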
252F As a further corollary we can get some useful information about the c.l.d. product measure for arbitrary measure spaces.
Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $W \in \Lambda$ be such that the vertical section $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x \in X$. Then $\lambda W = 0$.
proof Take $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure. Let $\lambda_{E\times F}$ be the subspace measure on $E \times F$. By 251Q(ii-$\alpha$) again, this is just the product of the subspace measures $\mu_E$ and $\nu_F$. We know that $W \cap (E \times F)$ is measured by $\lambda_{E\times F}$. At the same time, the vertical section $(W \cap (E\times F))[\{x\}] = W[\{x\}] \cap F$ is $\nu_F$-negligible for $\mu_E$-almost every $x \in E$. Applying 252B to $\mu_E$ and $\nu_F$ and $\chi(W \cap (E \times F))$,
$$\lambda(W \cap (E \times F)) = \lambda_{E\times F}(W \cap (E \times F)) = \int_E \nu_F^*(W[\{x\}] \cap F)\,\mu_E(dx) = 0.$$
But looking at the definition in 251F, we see that this means that $\lambda W = 0$, as claimed.
252G Theorem 252B and its corollaries depend on the factor measures $\mu$ and $\nu$ belonging to restricted classes. There is a partial result which applies to all c.l.d. product measures, as follows.
Tonelli's theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $(X \times Y, \Lambda, \lambda)$ their c.l.d. product. Let $f$ be a $\Lambda$-measurable $[-\infty,\infty]$-valued function defined on a member of $\Lambda$, and suppose that either $\iint |f(x,y)|\,\mu(dx)\,\nu(dy)$ or $\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$ exists in $\mathbb{R}$. Then $f$ is $\lambda$-integrable.
proof Because the construction of the product measure is symmetric in the two factors, it is enough to consider the case in which $\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite, as the same ideas will surely deal with the other case also.
(a) The first step is to check that $f$ is defined and finite $\lambda$-a.e. PPP Set $W = \{(x,y) : (x,y) \in \operatorname{dom} f,\ f(x,y) \text{ is finite}\}$. Then $W \in \Lambda$. The hypothesis
'$\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite'
includes the assertion
'$\int |f(x,y)|\,\nu(dy)$ is defined and finite for $\mu$-almost every $x$',
which implies that
'for $\mu$-almost every $x$, $f(x,y)$ is defined and finite for $\nu$-almost every $y$';
that is, that
'for $\mu$-almost every $x$, $W[\{x\}]$ is $\nu$-conegligible'.
But by 252F this implies that $(X \times Y) \setminus W$ is $\lambda$-negligible, as required. QQQ
(b) Let $h$ be any non-negative $\lambda$-simple function such that $h \le |f|$ $\lambda$-a.e. Then $\int h$ cannot be greater than $\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$. PPP Set
$$W = \{(x,y) : (x,y) \in \operatorname{dom} f,\ h(x,y) \le |f(x,y)|\}, \quad h' = h \times \chi W;$$
then $h'$ is a simple function and $h' =_{\text{a.e.}} h$. Express $h'$ as $\sum_{i=0}^n a_i \chi W_i$ where $a_i \ge 0$ and $\lambda W_i < \infty$ for each $i$. Let $\epsilon > 0$. For each $i \le n$ there are $E_i \in \Sigma$, $F_i \in \mathrm{T}$ such that $\mu E_i < \infty$, $\nu F_i < \infty$ and $\lambda(W_i \cap (E_i \times F_i)) \ge \lambda W_i - \epsilon$. Set $E = \bigcup_{i\le n} E_i$ and $F = \bigcup_{i\le n} F_i$. Consider the subspace measures $\mu_E$ and $\nu_F$ and their product $\lambda_{E\times F}$ on $E \times F$; then $\lambda_{E\times F}$ is the subspace measure on $E \times F$ defined from $\lambda$ (251Q(ii-$\alpha$) once more). Accordingly, applying 252B to the product $\mu_E \times \nu_F$,
$$\int_{E\times F} h'\,d\lambda = \int_{E\times F} h'\,d\lambda_{E\times F} = \int_E\int_F h'(x,y)\,\nu_F(dy)\,\mu_E(dx).$$
For any $x$, we know that $h'(x,y) \le |f(x,y)|$ whenever $f(x,y)$ is defined. So we can be sure that


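$$\int_F h'(x,y)\,\nu_F(dy) = \int h'(x,y)\,\chi F(y)\,\nu(dy) \le \int |f(x,y)|\,\nu(dy)$$
at least whenever $\int_F h'(x,y)\,\nu_F(dy)$ and $\int |f(x,y)|\,\nu(dy)$ are both defined, which is the case for almost every $x \in E$. Consequently
$$\int_{E\times F} h'\,d\lambda = \int_E\int_F h'(x,y)\,\nu_F(dy)\,\mu_E(dx) \le \int_E\int |f(x,y)|\,\nu(dy)\,\mu(dx) \le \iint |f(x,y)|\,\nu(dy)\,\mu(dx).$$
On the other hand,
$$\int h'\,d\lambda - \int_{E\times F} h'\,d\lambda = \sum_{i=0}^n a_i\,\lambda(W_i \setminus (E \times F)) \le \sum_{i=0}^n a_i\,\lambda(W_i \setminus (E_i \times F_i)) \le \epsilon\sum_{i=0}^n a_i.$$
So
$$\int h\,d\lambda = \int h'\,d\lambda \le \iint |f(x,y)|\,\nu(dy)\,\mu(dx) + \epsilon\sum_{i=0}^n a_i.$$
As $\epsilon$ is arbitrary, $\int h\,d\lambda \le \iint |f(x,y)|\,\nu(dy)\,\mu(dx)$, as claimed. QQQ
(c) This is true whenever $h$ is a $\lambda$-simple function less than or equal to $|f|$ $\lambda$-a.e. But $|f|$ is $\Lambda$-measurable and $\lambda$ is semi-finite (251Ic), so this is enough to ensure that $|f|$ is $\lambda$-integrable (213B), which (because $f$ is supposed to be $\Lambda$-measurable) in turn implies that $f$ is $\lambda$-integrable.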
252H Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable $[-\infty,\infty]$-valued function defined on a member of $\Lambda$. Then if one of
$$\int_{X\times Y} |f(x,y)|\,\lambda(d(x,y)), \quad \int_Y\int_X |f(x,y)|\,\mu(dx)\,\nu(dy), \quad \int_X\int_Y |f(x,y)|\,\nu(dy)\,\mu(dx)$$
exists in $\mathbb{R}$, so do the other two, and in this case
$$\int_{X\times Y} f(x,y)\,\lambda(d(x,y)) = \int_Y\int_X f(x,y)\,\mu(dx)\,\nu(dy) = \int_X\int_Y f(x,y)\,\nu(dy)\,\mu(dx).$$
proof (a) Suppose that $\int |f|\,d\lambda$ is finite. Because both $\mu$ and $\nu$ are $\sigma$-finite, 252B tells us that
$$\iint |f(x,y)|\,\mu(dx)\,\nu(dy), \quad \iint |f(x,y)|\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int |f|\,d\lambda$, while
$$\iint f(x,y)\,\mu(dx)\,\nu(dy), \quad \iint f(x,y)\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int f\,d\lambda$.
(b) Now suppose that $\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$ exists in $\mathbb{R}$. Then 252G tells us that $|f|$ is $\lambda$-integrable, so we can use (a) to complete the argument. Exchanging the coordinates, the same argument applies if $\iint |f(x,y)|\,\mu(dx)\,\nu(dy)$ exists in $\mathbb{R}$.
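A discrete illustration may help here; it is only a sketch, not part of the text. Take both factors to be $\mathbb{N}$ with counting measure, which is $\sigma$-finite, so that repeated integrals become repeated sums. The arrays `a` and `b` below are hypothetical examples chosen for this purpose. For `a`, both repeated sums are defined but they differ, which is possible only because $\sum_{i,j}|a(i,j)|$ is infinite. For `b`, which is absolutely summable, the two orders agree and the truncated sums approach the total 4.

```python
from fractions import Fraction

def a(i, j):
    """Hypothetical array on N x N: +1 on the diagonal, -1 just to its right."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def b(i, j):
    """Hypothetical absolutely summable array: 2^-(i+j), with total sum 4."""
    return Fraction(1, 2 ** (i + j))

N = 40  # truncation level for the outer sums

# Every row and column of `a` has finite support, so the inner sums are exact.
row_sums_a = [sum(a(i, j) for j in range(i + 2)) for i in range(N)]
col_sums_a = [sum(a(i, j) for i in range(j + 1)) for j in range(N)]
sum_rows_first_a = sum(row_sums_a)  # sum over i of (sum over j)
sum_cols_first_a = sum(col_sums_a)  # sum over j of (sum over i)

# For `b` both sums are truncated at N; the partial sums increase towards 4.
sum_rows_first_b = sum(sum(b(i, j) for j in range(N)) for i in range(N))
sum_cols_first_b = sum(sum(b(i, j) for i in range(N)) for j in range(N))

print(sum_rows_first_a, sum_cols_first_a)
print(float(sum_rows_first_b), float(sum_cols_first_b))
```

Exact rational arithmetic is used so that the comparison between the two orders of summation is not blurred by rounding.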
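252I Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Take $W \in \Lambda$. If either of the integrals
$$\int \mu^* W^{-1}[\{y\}]\,\nu(dy), \quad \int \nu^* W[\{x\}]\,\mu(dx)$$
exists and is finite, then $\lambda W < \infty$.
proof Apply 252G with $f = \chi W$, remembering that
$$\mu^* W^{-1}[\{y\}] = \int \chi W(x,y)\,\mu(dx), \quad \nu^* W[\{x\}] = \int \chi W(x,y)\,\nu(dy)$$
whenever the integrals are defined, as in the proof of 252D.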
252J Remarks 252H is the basic form of Fubini's theorem; it is not a coincidence that most authors avoid non-$\sigma$-finite spaces in this context. The next two examples exhibit some of the difficulties which can arise if we leave the familiar territory of more-or-less Borel measurable functions on $\sigma$-finite spaces. The first is a classic.
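252K Example Let $(X, \Sigma, \mu)$ be $[0,1]$ with Lebesgue measure, and let $(Y, \mathrm{T}, \nu)$ be $[0,1]$ with counting measure.
(a) Consider the set
$$W = \{(t,t) : t \in [0,1]\} \subseteq X \times Y.$$
We observe that $W$ is expressible as
$$\bigcap_{n\in\mathbb{N}} \bigcup_{k=0}^{n} \Bigl[\tfrac{k}{n+1}, \tfrac{k+1}{n+1}\Bigr] \times \Bigl[\tfrac{k}{n+1}, \tfrac{k+1}{n+1}\Bigr] \in \Sigma\widehat{\otimes}\mathrm{T}.$$
If we look at the sections
$$W^{-1}[\{t\}] = W[\{t\}] = \{t\}$$
for $t \in [0,1]$, we have
$$\iint \chi W(x,y)\,\mu(dx)\,\nu(dy) = \int \mu W^{-1}[\{y\}]\,\nu(dy) = \int 0\,\nu(dy) = 0,$$
$$\iint \chi W(x,y)\,\nu(dy)\,\mu(dx) = \int \nu W[\{x\}]\,\mu(dx) = \int 1\,\mu(dx) = 1,$$
so the two repeated integrals differ. It is therefore not generally possible to reverse the order of repeated integration, even for a non-negative measurable function in which both repeated integrals exist and are finite.
(b) Because the set $W$ of part (a) actually belongs to $\Sigma\widehat{\otimes}\mathrm{T}$, we know that it is measured by the c.l.d. product measure $\lambda$, and 252F (applied with the coordinates reversed) tells us that $\lambda W = 0$.
(c) It is in fact easy to give a full description of $\lambda$.
(i) The point is that a set $W \subseteq [0,1] \times [0,1]$ belongs to the domain $\Lambda$ of $\lambda$ iff every horizontal section $W^{-1}[\{y\}]$ is Lebesgue measurable. PPP ($\alpha$) If $W \in \Lambda$, then, for every $b \in [0,1]$, $\lambda([0,1] \times \{b\})$ is finite, so $W \cap ([0,1] \times \{b\})$ is a set of finite measure, and
$$\lambda(W \cap ([0,1] \times \{b\})) = \int \mu(W \cap ([0,1] \times \{b\}))^{-1}[\{y\}]\,\nu(dy) = \mu W^{-1}[\{b\}]$$
by 252D, because $\mu$ is $\sigma$-finite, $\nu$ is both strictly localizable and complete and locally determined, and
$$(W \cap ([0,1] \times \{b\}))^{-1}[\{y\}] = W^{-1}[\{b\}] \text{ if } y = b, \quad = \emptyset \text{ otherwise.}$$
As $b$ is arbitrary, every horizontal section of $W$ is measurable. ($\beta$) If every horizontal section of $W$ is measurable, let $F \subseteq [0,1]$ be any set of finite measure for $\nu$; then $F$ is finite, so
$$W \cap ([0,1] \times F) = \bigcup_{y\in F} W^{-1}[\{y\}] \times \{y\} \in \Sigma\widehat{\otimes}\mathrm{T} \subseteq \Lambda.$$
But it follows that $W$ itself belongs to $\Lambda$, by 251H. QQQ
(ii) Now some of the same calculations show that for every $W \in \Lambda$,
$$\lambda W = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}].$$
PPP For any finite $F \subseteq [0,1]$,
$$\lambda(W \cap ([0,1] \times F)) = \int \mu(W \cap ([0,1] \times F))^{-1}[\{y\}]\,\nu(dy) = \int_F \mu W^{-1}[\{y\}]\,\nu(dy) = \sum_{y\in F} \mu W^{-1}[\{y\}].$$
So
$$\lambda W = \sup_{F \subseteq [0,1] \text{ is finite}} \sum_{y\in F} \mu W^{-1}[\{y\}] = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}].$$ QQQ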
252L Example For the second example, I turn to a problem that can arise if we neglect to check that a function is measurable as a function of two variables.
Let $(X, \Sigma, \mu) = (Y, \mathrm{T}, \nu)$ be $\omega_1$, the first uncountable ordinal (2A1Fc), with the countable-cocountable measure (211R). Set
$$W = \{(\xi, \eta) : \xi \le \eta < \omega_1\} \subseteq X \times Y.$$
Then all the horizontal sections $W^{-1}[\{\eta\}] = \{\xi : \xi \le \eta\}$ are countable, so
$$\int \mu W^{-1}[\{\eta\}]\,\nu(d\eta) = \int 0\,\nu(d\eta) = 0,$$
while all the vertical sections $W[\{\xi\}] = \{\eta : \xi \le \eta < \omega_1\}$ are cocountable, so
$$\int \nu W[\{\xi\}]\,\mu(d\xi) = \int 1\,\mu(d\xi) = 1.$$
Because the two repeated integrals are different, they cannot both be equal to the measure of $W$, and the sole resolution is to say that $W$ is not measured by the product measure.
252M Remark A third kind of difficulty in the formula
$$\iint f(x,y)\,dx\,dy = \iint f(x,y)\,dy\,dx$$
can arise even on probability spaces with $\Sigma\widehat{\otimes}\mathrm{T}$-measurable real-valued functions defined everywhere if we neglect to check that $f$ is integrable with respect to the product measure. In 252H, we do need the hypothesis that one of
$$\int_{X\times Y} |f(x,y)|\,\lambda(d(x,y)), \quad \int_Y\int_X |f(x,y)|\,\mu(dx)\,\nu(dy), \quad \int_X\int_Y |f(x,y)|\,\nu(dy)\,\mu(dx)$$
is finite. For examples to show this, see 252Xf and 252Xg.
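The failure in 252Xg can be checked numerically; the following is a rough sketch under stated assumptions, not a proof. It uses the elementary antiderivative $y \mapsto y/(x^2+y^2)$ of $y \mapsto (x^2-y^2)/(x^2+y^2)^2$ (spot-checked below by a finite difference), so that the inner integral over $0 < y \le 1$ is $1/(1+x^2)$, and then applies a midpoint rule to the outer integral. The helper names and the step counts are choices made for this illustration.

```python
import math

def f(x, y):
    """The function of 252Xg on ]0,1] x ]0,1]."""
    return (x * x - y * y) / (x * x + y * y) ** 2

def antiderivative_in_y(x, y):
    """y / (x^2 + y^2); its partial derivative in y is f(x, y)."""
    return y / (x * x + y * y)

# Spot-check the antiderivative by a central difference at one point.
x0, y0, h = 0.3, 0.7, 1e-6
numeric_derivative = (
    antiderivative_in_y(x0, y0 + h) - antiderivative_in_y(x0, y0 - h)
) / (2 * h)

def inner_integral_dy(x):
    """Integral of f(x, y) over 0 < y <= 1, which equals 1 / (1 + x^2)."""
    return antiderivative_in_y(x, 1.0) - antiderivative_in_y(x, 0.0)

def midpoint_rule(g, lo, hi, n=2000):
    """Composite midpoint rule for a continuous integrand g on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))

dy_then_dx = midpoint_rule(inner_integral_dy, 0.0, 1.0)
# f(x, y) = -f(y, x), so integrating in the other order only changes the sign.
dx_then_dy = -dy_then_dx

print(dy_then_dx, dx_then_dy, math.pi / 4)
```

The two repeated integrals come out near $\pi/4$ and $-\pi/4$, consistent with the fact that $|f|$ is not integrable over the square.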
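252N Integration through ordinate sets I: Proposition Let $(X, \Sigma, \mu)$ be a complete locally determined measure space, and $\lambda$ the c.l.d. product measure on $X \times \mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure; write $\Lambda$ for the domain of $\lambda$. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the ordinate sets
$$\Omega_f = \{(x,a) : x \in \operatorname{dom} f,\ 0 \le a \le f(x)\} \subseteq X \times \mathbb{R},$$
$$\Omega'_f = \{(x,a) : x \in \operatorname{dom} f,\ 0 \le a < f(x)\} \subseteq X \times \mathbb{R}.$$
Then
$$\lambda\Omega_f = \lambda\Omega'_f = \int f\,d\mu$$
in the sense that if one of these is defined in $[0,\infty]$, so are the other two, and they are equal.
proof (a) If $\Omega_f \in \Lambda$, then
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y) \in \Omega_f\}\,\mu(dx) = \lambda\Omega_f$$
by 252D, writing $\nu$ for Lebesgue measure, because $f$ is defined almost everywhere. Similarly, if $\Omega'_f \in \Lambda$,
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y) \in \Omega'_f\}\,\mu(dx) = \lambda\Omega'_f.$$
(b) If $\int f\,d\mu$ is defined, then $f$ is $\mu$-virtually measurable, therefore measurable (because $\mu$ is complete); again because $\mu$ is complete, $\operatorname{dom} f \in \Sigma$. So
$$\Omega'_f = \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x \in \operatorname{dom} f,\ f(x) > q\} \times [0,q],$$
$$\Omega_f = \bigcap_{n\ge 1}\ \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x \in \operatorname{dom} f,\ f(x) \ge q - \tfrac{1}{n}\} \times [0,q]$$
belong to $\Lambda$, so that $\lambda\Omega_f$ and $\lambda\Omega'_f$ are defined. Now both are equal to $\int f\,d\mu$, by (a).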
252O Integration through ordinate sets II: Proposition Let $(X, \Sigma, \mu)$ be a measure space, and $f$ a non-negative $\mu$-virtually measurable function defined on a conegligible subset of $X$. Then
$$\int f\,d\mu = \int_0^\infty \mu^*\{x : x \in \operatorname{dom} f,\ f(x) \ge t\}\,dt = \int_0^\infty \mu^*\{x : x \in \operatorname{dom} f,\ f(x) > t\}\,dt$$
in $[0,\infty]$, where the integrals $\int \ldots dt$ are taken with respect to Lebesgue measure.
proof Completing $\mu$ does not change the integral of $f$ or the outer measure $\mu^*$ (212Fb, 212Ea), so we may suppose that $\mu$ is complete, in which case $\operatorname{dom} f$ and $f$ will be measurable. For $n, k \in \mathbb{N}$ set $E_{nk} = \{x : x \in \operatorname{dom} f,\ f(x) > 2^{-n}k\}$, $g_n = 2^{-n}\sum_{k=1}^{4^n} \chi E_{nk}$. Then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable functions converging to $f$ at every point of $\operatorname{dom} f$, so $\int f\,d\mu = \lim_{n\to\infty} \int g_n\,d\mu$ and $\mu\{x : f(x) > t\} = \lim_{n\to\infty} \mu\{x : g_n(x) > t\}$ for every $t \ge 0$; consequently
$$\int_0^\infty \mu\{x : f(x) > t\}\,dt = \lim_{n\to\infty} \int_0^\infty \mu\{x : g_n(x) > t\}\,dt.$$
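On the other hand, $\mu\{x : g_n(x) > t\} = \mu E_{nk}$ if $1 \le k \le 4^n$ and $2^{-n}(k-1) \le t < 2^{-n}k$, and $= 0$ if $t \ge 2^n$, so that
$$\int_0^\infty \mu\{x : g_n(x) > t\}\,dt = \sum_{k=1}^{4^n} 2^{-n}\mu E_{nk} = \int g_n\,d\mu,$$
for every $n \in \mathbb{N}$. So $\int_0^\infty \mu\{x : f(x) > t\}\,dt = \int f\,d\mu$.
Now $\mu\{x : f(x) \ge t\} = \mu\{x : f(x) > t\}$ for almost all $t$. PPP Set $C = \{t : \mu\{x : f(x) > t\} < \infty\}$, $h(t) = \mu\{x : f(x) > t\}$ for $t \in C$. If $C$ is not empty, $h : C \to [0,\infty[$ is monotonic, so is continuous almost everywhere in $C$ (222A). But at any point of $C \setminus \{\inf C\}$ at which $h$ is continuous,
$$\mu\{x : f(x) \ge t\} = \lim_{s\uparrow t} \mu\{x : f(x) > s\} = \mu\{x : f(x) > t\}.$$
So we have the result, since $\mu\{x : f(x) \ge t\} = \mu\{x : f(x) > t\} = \infty$ for any $t \in [0,\infty[ \setminus C$. QQQ
Accordingly $\int_0^\infty \mu\{x : f(x) \ge t\}\,dt$ is also equal to $\int f\,d\mu$.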
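On a finite measure space the formula of 252O can be verified by exact arithmetic. The sketch below is an illustration only: the five-point space, its weights and the values of $f$ are hypothetical. It computes $\int f\,d\mu$ directly and as $\int_0^\infty \mu\{x : f(x) > t\}\,dt$, using the fact that $t \mapsto \mu\{x : f(x) > t\}$ is a step function whose breakpoints are the values of $f$.

```python
from fractions import Fraction

# Hypothetical finite measure space: points 0..4 with weights mu({x}),
# and a non-negative function f on it (values chosen for illustration).
weights = [Fraction(1, 2), Fraction(3), Fraction(1, 4), Fraction(2), Fraction(1)]
values = [Fraction(0), Fraction(5, 2), Fraction(1), Fraction(5, 2), Fraction(7)]

def integral(weights, values):
    """Integral of f with respect to mu: the sum of f(x) * mu({x})."""
    return sum(w * v for w, v in zip(weights, values))

def measure_above(weights, values, t):
    """mu{x : f(x) > t}."""
    return sum(w for w, v in zip(weights, values) if v > t)

def layer_cake(weights, values):
    """Integral over t >= 0 of mu{f > t}, computed exactly.

    t -> mu{f > t} is constant on [s, s_next) for consecutive breakpoints
    s < s_next taken from 0 and the values of f, and is 0 beyond the largest.
    """
    breakpoints = sorted(set(values) | {Fraction(0)})
    total = Fraction(0)
    for s, s_next in zip(breakpoints, breakpoints[1:]):
        total += (s_next - s) * measure_above(weights, values, s)
    return total

direct = integral(weights, values)
via_level_sets = layer_cake(weights, values)
print(direct, via_level_sets)
```

Replacing `v > t` by `v >= t` changes the step function only at the finitely many breakpoints, so the integral is unaffected, matching the last part of the proof.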
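*252P If we work through the ideas of 252B for $\Sigma\widehat{\otimes}\mathrm{T}$-measurable functions, we get the following, which is sometimes useful.
Proposition Let $(X, \Sigma, \mu)$ be a measure space, and $(Y, \mathrm{T}, \nu)$ a $\sigma$-finite measure space. Then for any $\Sigma\widehat{\otimes}\mathrm{T}$-measurable function $f : X \times Y \to [0,\infty]$, $x \mapsto \int f(x,y)\,\nu(dy) : X \to [0,\infty]$ is $\Sigma$-measurable; and if $\mu$ is semi-finite, $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$, where $\lambda$ is the c.l.d. product measure on $X \times Y$.
proof (a) Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of subsets of $Y$ of finite measure with union $Y$. Set
$$\mathcal{A} = \{W : W \subseteq X \times Y,\ W[\{x\}] \in \mathrm{T} \text{ for every } x \in X,\ x \mapsto \nu(Y_n \cap W[\{x\}]) \text{ is } \Sigma\text{-measurable for every } n \in \mathbb{N}\}.$$
Then $\mathcal{A}$ is a Dynkin class of subsets of $X \times Y$ including $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$, so includes $\Sigma\widehat{\otimes}\mathrm{T}$, by the Monotone Class Theorem again (136B).
This means that if $W \in \Sigma\widehat{\otimes}\mathrm{T}$, then
$$\nu W[\{x\}] = \sup_{n\in\mathbb{N}} \nu(Y_n \cap W[\{x\}])$$
is defined for every $x \in X$ and is a $\Sigma$-measurable function of $x$.
(b) Now, for $n, k \in \mathbb{N}$, set
$$W_{nk} = \{(x,y) : f(x,y) \ge 2^{-n}k\}, \quad g_n = \sum_{k=1}^{4^n} 2^{-n}\chi W_{nk}.$$
Then if we set
$$h_n(x) = \int g_n(x,y)\,\nu(dy) = \sum_{k=1}^{4^n} 2^{-n}\nu W_{nk}[\{x\}]$$
for $n \in \mathbb{N}$ and $x \in X$, $h_n : X \to [0,\infty]$ is $\Sigma$-measurable, and
$$\lim_{n\to\infty} h_n(x) = \int \bigl(\lim_{n\to\infty} g_n(x,y)\bigr)\nu(dy) = \int f(x,y)\,\nu(dy)$$
for every $x$, because $\langle g_n(x,y)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with limit $f(x,y)$ for all $x \in X$, $y \in Y$. So $x \mapsto \int f(x,y)\,\nu(dy)$ is defined everywhere in $X$ and is $\Sigma$-measurable.
(c) If $E \subseteq X$ is measurable and has finite measure, then $\int_E\int f(x,y)\,\nu(dy)\,\mu(dx) = \int_{E\times Y} f\,d\lambda$, applying 252B to the product of the subspace measure $\mu_E$ and $\nu$ (and using 251Q to check that the product of $\mu_E$ and $\nu$ is the subspace measure on $E \times Y$). Now if $\lambda W$ is defined and finite, there must be a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of subsets of $X$ of finite measure such that $\lambda W = \sup_{n\in\mathbb{N}} \lambda(W \cap (E_n \times Y))$, so that $W \setminus \bigcup_{n\in\mathbb{N}} (E_n \times Y)$ is negligible, and
$$\int_W f\,d\lambda = \lim_{n\to\infty} \int_{W \cap (E_n \times Y)} f\,d\lambda$$
(by B.Levi's theorem applied to $\langle f \times \chi(W \cap (E_n \times Y))\rangle_{n\in\mathbb{N}}$)
$$\le \lim_{n\to\infty} \int_{E_n \times Y} f\,d\lambda = \lim_{n\to\infty} \int_{E_n}\int f(x,y)\,\nu(dy)\,\mu(dx) \le \iint f(x,y)\,\nu(dy)\,\mu(dx).$$
By 213B once more,
$$\int f\,d\lambda = \sup_{\lambda W < \infty} \int_W f\,d\lambda \le \iint f(x,y)\,\nu(dy)\,\mu(dx).$$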
But also, if $\mu$ is semi-finite,
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \sup_{\mu E < \infty} \int_E\int f(x,y)\,\nu(dy)\,\mu(dx) \le \int f\,d\lambda,$$
so $\int f\,d\lambda = \iint f(x,y)\,\nu(dy)\,\mu(dx)$, as claimed.
252Q The volume of a ball We now have all the essential machinery to perform a little calculation which is, I suppose, desirable simply as general knowledge: the volume of the unit ball $\{x : \|x\| \le 1\} = \{(\xi_1, \ldots, \xi_r) : \sum_{i=1}^r \xi_i^2 \le 1\}$ in $\mathbb{R}^r$. In fact, from a theoretical point of view, I think we could very nearly just call it $\beta_r$ and leave it at that; but since there is a general formula in terms of $\beta_2 = \pi$ and factorials, it seems shameful not to present it. The calculation has nothing to do with Lebesgue integration, and I could dismiss it as mere advanced calculus; but since only a minority of mathematicians are now taught calculus to this level with reasonable rigour before being introduced to the Lebesgue integral, I do not doubt that many readers, like myself, missed some of the subtleties involved. I therefore take the space to spell the details out in the style used elsewhere in this volume, recognising that the machinery employed is a good deal more elaborate than is really necessary for this result.
(a) The first basic fact we need is that, for any $n \ge 1$,
$$I_n = \int_{-\pi/2}^{\pi/2} \cos^n t\,dt = \frac{(2k)!}{(2^k k!)^2}\,\pi \text{ if } n = 2k \text{ is even},$$
$$I_n = 2\,\frac{(2^k k!)^2}{(2k+1)!} \text{ if } n = 2k+1 \text{ is odd}.$$
PPP For $n = 0$, of course,
$$I_0 = \int_{-\pi/2}^{\pi/2} 1\,dt = \pi = \frac{0!}{(2^0 0!)^2}\,\pi,$$
while for $n = 1$ we have
$$I_1 = \sin\frac{\pi}{2} - \sin\Bigl(-\frac{\pi}{2}\Bigr) = 2 = 2\,\frac{(2^0 0!)^2}{1!},$$
using the Fundamental Theorem of Calculus (225L) and the fact that $\sin' = \cos$ is bounded. For the inductive step to $n + 1 \ge 2$, we can use integration by parts (225F):
$$I_{n+1} = \int_{-\pi/2}^{\pi/2} \cos t\,\cos^n t\,dt$$
$$= \sin\frac{\pi}{2}\cos^n\frac{\pi}{2} - \sin\Bigl(-\frac{\pi}{2}\Bigr)\cos^n\Bigl(-\frac{\pi}{2}\Bigr) + \int_{-\pi/2}^{\pi/2} \sin t \cdot n\cos^{n-1} t\,\sin t\,dt$$
$$= n\int_{-\pi/2}^{\pi/2} (1 - \cos^2 t)\cos^{n-1} t\,dt = n(I_{n-1} - I_{n+1}),$$
so that $I_{n+1} = \frac{n}{n+1} I_{n-1}$. Now the given formulae follow by an easy induction. QQQ
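As a sanity check on the formulae in (a), the following sketch compares three computations of $I_n$ for small $n$: the closed forms, the recurrence $I_{n+1} = \frac{n}{n+1}I_{n-1}$, and a direct midpoint-rule approximation of the integral. The function names and the number of quadrature steps are choices made for this illustration; floating-point agreement is of course not a proof.

```python
import math

def wallis_closed_form(n):
    """I_n from the formulae in (a)."""
    k = n // 2
    square = (2 ** k * math.factorial(k)) ** 2
    if n % 2 == 0:
        return math.factorial(2 * k) / square * math.pi
    return 2 * square / math.factorial(2 * k + 1)

def wallis_recurrence(n_max):
    """[I_0, ..., I_{n_max}] from I_0 = pi, I_1 = 2, I_{n+1} = n/(n+1) * I_{n-1}."""
    values = [math.pi, 2.0]
    for n in range(1, n_max):
        values.append(n / (n + 1) * values[n - 1])
    return values[: n_max + 1]

def wallis_numeric(n, steps=20000):
    """Midpoint-rule approximation to the integral of cos^n over [-pi/2, pi/2]."""
    h = math.pi / steps
    return h * sum(math.cos(-math.pi / 2 + (j + 0.5) * h) ** n for j in range(steps))

recurrence_values = wallis_recurrence(8)
for n, value in enumerate(recurrence_values):
    print(n, value, wallis_closed_form(n), wallis_numeric(n))
```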
(b) The next result is that, for any $n \in \mathbb{N}$ and any $a \ge 0$,
$$\int_{-a}^{a} (a^2 - s^2)^{n/2}\,ds = I_{n+1}a^{n+1}.$$
PPP Of course this is an integration by substitution; but the singularity of the integrand at $s = \pm a$ complicates the issue slightly. I offer the following argument. If $a = 0$ the result is trivial; take $a > 0$. For $-a \le b \le a$, set $F(b) = \int_{-a}^{b} (a^2 - s^2)^{n/2}\,ds$. Because the integrand is continuous, $F'(b)$ exists and is equal to $(a^2 - b^2)^{n/2}$ for $-a < b < a$ (222H). Set $G(t) = F(a\sin t)$; then $G$ is continuous and
$$G'(t) = aF'(a\sin t)\cos t = a^{n+1}\cos^{n+1} t$$
for $-\frac{\pi}{2} < t < \frac{\pi}{2}$. Consequently
$$\int_{-a}^{a} (a^2 - s^2)^{n/2}\,ds = F(a) - F(-a) = G\Bigl(\frac{\pi}{2}\Bigr) - G\Bigl(-\frac{\pi}{2}\Bigr) = \int_{-\pi/2}^{\pi/2} G'(t)\,dt$$
(by 225L, as before)
$$= a^{n+1}I_{n+1},$$
as required. QQQ
(c) Now at last we are ready for the balls $B_r = \{x : x \in \mathbb{R}^r,\ \|x\| \le 1\}$. Let $\mu_r$ be Lebesgue measure on $\mathbb{R}^r$, and set $\beta_r = I_1 I_2 \ldots I_r$ for $r \ge 1$. I claim that, writing
$$B_r(a) = \{x : x \in \mathbb{R}^r,\ \|x\| \le a\},$$
we have $\mu_r(B_r(a)) = \beta_r a^r$ for every $a \ge 0$. PPP Induce on $r$. For $r = 1$ we have $\beta_1 = 2$, $B_1(a) = [-a, a]$, so the result is trivial. For the inductive step to $r + 1$, we have
$$\mu_{r+1}B_{r+1}(a) = \int \mu_r\{x : (x,t) \in B_{r+1}(a)\}\,dt$$
(putting 251N and 252D together, and using the fact that $B_{r+1}(a)$ is closed, therefore measurable)
$$= \int_{-a}^{a} \mu_r B_r\bigl(\sqrt{a^2 - t^2}\bigr)\,dt$$
(because $(x,t) \in B_{r+1}(a)$ iff $|t| \le a$ and $\|x\| \le \sqrt{a^2 - t^2}$)
$$= \int_{-a}^{a} \beta_r (a^2 - t^2)^{r/2}\,dt$$
(by the inductive hypothesis)
$$= \beta_r a^{r+1} I_{r+1}$$
(by (b) above)
$$= \beta_{r+1} a^{r+1}$$
(by the definition of $\beta_{r+1}$). Thus the induction continues. QQQ
(d) In particular, the $r$-dimensional Lebesgue measure of the $r$-dimensional ball $B_r = B_r(1)$ is just $\beta_r = I_1 \ldots I_r$. Now an easy induction on $k$ shows that
$$\beta_r = \frac{1}{k!}\,\pi^k \text{ if } r = 2k \text{ is even},$$
$$\beta_r = \frac{2^{2k+1}k!}{(2k+1)!}\,\pi^k \text{ if } r = 2k+1 \text{ is odd}.$$
(e) Note that in part (c) of the proof we saw that $\{x : x \in \mathbb{R}^r,\ \|x\| \le a\}$ has measure $\beta_r a^r$ for every $a \ge 0$.
The formulae here are consistent with the assignation $\beta_0 = 1$; which corresponds to saying that $\mathbb{R}^0 = \{\emptyset\}$, that $\mu_0\mathbb{R}^0 = 1$ and that $B_0 = \{\emptyset\}$. Taking $\mu_0\mathbb{R}^0$ to be 1 is itself consistent with the idea that, following 251N, the product measure $\mu_0 \times \mu_r$ ought to match $\mu_{0+r}$ on $\mathbb{R}^{0+r}$.
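The three descriptions of $\beta_r$ that appear in this section can be compared numerically. The sketch below is an illustration under the assumption that floating-point agreement is an adequate check. It forms $\beta_r$ as the product $I_1 \ldots I_r$ of (c), evaluates the closed forms of (d), and evaluates $\pi^{r/2}/\Gamma(1+\frac{r}{2})$, the expression reached in exercise 252Xi. The function names are hypothetical.

```python
import math

def wallis_integrals(r):
    """[I_0, I_1, ..., I_r], using I_0 = pi, I_1 = 2, I_{n+1} = n/(n+1) * I_{n-1}."""
    values = [math.pi, 2.0]
    for n in range(1, r):
        values.append(n / (n + 1) * values[n - 1])
    return values[: r + 1]

def beta_product(r):
    """beta_r = I_1 I_2 ... I_r as in (c); the empty product gives beta_0 = 1."""
    return math.prod(wallis_integrals(r)[1:])

def beta_closed_form(r):
    """The formulae of (d), split by the parity of r."""
    k = r // 2
    if r % 2 == 0:
        return math.pi ** k / math.factorial(k)
    return 2 ** (2 * k + 1) * math.factorial(k) / math.factorial(2 * k + 1) * math.pi ** k

def beta_gamma(r):
    """pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

for r in range(0, 11):
    print(r, beta_product(r), beta_closed_form(r), beta_gamma(r))
```

For $r = 3$ all three give $4\pi/3$, the familiar volume of the unit ball in $\mathbb{R}^3$.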
252R Complex-valued functions It is easy to apply the results of 252B-252I above to complex-valued functions, by considering their real and imaginary parts. Specifically:
(a) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Suppose that $\nu$ is $\sigma$-finite and that $\mu$ is either strictly localizable or complete and locally determined. Let $f$ be a $\lambda$-integrable complex-valued function. Then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$.
(b) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable complex-valued function defined on a member of $\Lambda$, and suppose that either $\iint |f(x,y)|\,\mu(dx)\,\nu(dy)$ or $\iint |f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite. Then $f$ is $\lambda$-integrable.
(c) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable complex-valued function defined on a member of $\Lambda$. Then if one of
$$\int_{X\times Y} |f(x,y)|\,\lambda(d(x,y)), \quad \int_Y\int_X |f(x,y)|\,\mu(dx)\,\nu(dy), \quad \int_X\int_Y |f(x,y)|\,\nu(dy)\,\mu(dx)$$
exists in $\mathbb{R}$, so do the other two, and in this case
$$\int_{X\times Y} f(x,y)\,\lambda(d(x,y)) = \int_Y\int_X f(x,y)\,\mu(dx)\,\nu(dy) = \int_X\int_Y f(x,y)\,\nu(dy)\,\mu(dx).$$
252X Basic exercises (a) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Let $f$ be a $\lambda$-integrable real-valued function such that $\int_{E\times F} f = 0$ whenever $E \in \Sigma$, $F \in \mathrm{T}$, $\mu E < \infty$ and $\nu F < \infty$. Show that $f = 0$ $\lambda$-a.e. (Hint: use 251Ie to show that $\int_W f = 0$ whenever $\lambda W < \infty$.)
(b) Let $f, g : \mathbb{R} \to \mathbb{R}$ be two non-decreasing functions, and $\mu_f$, $\mu_g$ the associated Lebesgue-Stieltjes measures (see 114Xa). Set
$$f(x^+) = \lim_{t\downarrow x} f(t), \quad f(x^-) = \lim_{t\uparrow x} f(t)$$
for each $x \in \mathbb{R}$, and define $g(x^+)$, $g(x^-)$ similarly. Show that whenever $a \le b$ in $\mathbb{R}$,
$$\int_{[a,b]} f(x^-)\,\mu_g(dx) + \int_{[a,b]} g(x^+)\,\mu_f(dx) = g(b^+)f(b^+) - g(a^-)f(a^-)$$
$$= \int_{[a,b]} \tfrac12\bigl(f(x^-) + f(x^+)\bigr)\mu_g(dx) + \int_{[a,b]} \tfrac12\bigl(g(x^-) + g(x^+)\bigr)\mu_f(dx).$$
(Hint: find two expressions for $(\mu_f \times \mu_g)\{(x,y) : a \le x < y \le b\}$.)
>>>(c) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be complete locally determined measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Suppose that $A \subseteq X$ and $B \subseteq Y$. Show that $A \times B \in \Lambda$ iff either $\mu A = 0$ or $\nu B = 0$ or $A \in \Sigma$ and $B \in \mathrm{T}$. (Hint: if $B$ is not negligible and $A \times B \in \Lambda$, take $H$ such that $\nu H < \infty$ and $B \cap H$ is not negligible. Then $W = A \times (B \cap H)$ is measured by $\mu \times \nu_H$, where $\nu_H$ is the subspace measure on $H$. Now apply 252D to $\mu$, $\nu_H$ and $W$ to see that $A \in \Sigma$.)
>>>(d) Let $(X_1, \Sigma_1, \mu_1)$, $(X_2, \Sigma_2, \mu_2)$, $(X_3, \Sigma_3, \mu_3)$ be three $\sigma$-finite measure spaces, and $f$ a real-valued function defined almost everywhere on $X_1 \times X_2 \times X_3$ and $\Lambda$-measurable, where $\Lambda$ is the domain of the product measure described in 251W or 251Xg. Show that if $\iiint |f(x_1,x_2,x_3)|\,dx_1\,dx_2\,dx_3$ is defined in $\mathbb{R}$, then $\iiint f(x_1,x_2,x_3)\,dx_2\,dx_3\,dx_1$ and $\iiint f(x_1,x_2,x_3)\,dx_3\,dx_1\,dx_2$ exist and are equal.
(e) Give an example of strictly localizable measure spaces $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ and a $W \in \Sigma\widehat{\otimes}\mathrm{T}$ such that $x \mapsto \nu W[\{x\}]$ is not $\Sigma$-measurable. (Hint: in 252Kb, try $Y$ a proper subset of $[0,1]$.)
>>>(f) Set $f(x,y) = \sin(x - y)$ if $0 \le y \le x \le y + 2\pi$, $0$ for other $(x,y) \in \mathbb{R}^2$. Show that $\iint f(x,y)\,dx\,dy = 0$ and $\iint f(x,y)\,dy\,dx = 2\pi$, taking all integrals with respect to Lebesgue measure.
>>>(g) Set $f(x,y) = \dfrac{x^2 - y^2}{(x^2 + y^2)^2}$ for $x, y \in\ ]0,1]$. Show that $\int_0^1\int_0^1 f(x,y)\,dy\,dx = \dfrac{\pi}{4}$, $\int_0^1\int_0^1 f(x,y)\,dx\,dy = -\dfrac{\pi}{4}$.
>>>(h) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $f$ a $\Sigma\widehat{\otimes}\mathrm{T}$-measurable function defined on a subset of $X \times Y$. Show that $y \mapsto f(x,y)$ is $\mathrm{T}$-measurable for every $x \in X$.
(i) Let $r \ge 1$ be an integer, and write $\beta_r$ for the Lebesgue measure of the unit ball in $\mathbb{R}^r$. Set $g_r(t) = r\beta_r t^{r-1}$ for $t \ge 0$, $\phi(x) = \|x\|$ for $x \in \mathbb{R}^r$. (i) Writing $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, show that $\mu_r\phi^{-1}[E] = \int_E r\beta_r t^{r-1}\,\mu_1(dt)$ for every Lebesgue measurable set $E \subseteq [0,\infty[$. (Hint: start with intervals $E$, noting from 115Xe that $\mu_r\{x : \|x\| \le a\} = \beta_r a^r$ for $a \ge 0$, and progress to open sets, negligible sets and general measurable sets.) (ii) Using 235R, show that
$$\int e^{-\|x\|^2/2}\,\mu_r(dx) = r\beta_r\int_0^\infty t^{r-1}e^{-t^2/2}\,\mu_1(dt) = 2^{(r-2)/2}r\beta_r\Gamma\bigl(\tfrac{r}{2}\bigr) = 2^{r/2}\beta_r\Gamma\bigl(1 + \tfrac{r}{2}\bigr) = \bigl(\sqrt{2}\,\Gamma(\tfrac12)\bigr)^r$$
where $\Gamma$ is the $\Gamma$-function (225Xj). (iii) Show that
$$2\Gamma\bigl(\tfrac12\bigr)^2 = 2\beta_2\Gamma(2) = 2\beta_2\int_0^\infty te^{-t^2/2}\,dt = 2\pi,$$
and hence that $\beta_r = \dfrac{\pi^{r/2}}{\Gamma(1 + \frac{r}{2})}$ and $\int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}$.
252Y Further exercises (a) Let $(X, \Sigma, \mu)$ be a measure space. Show that the following are equiveridical: (i) the completion of $\mu$ is locally determined; (ii) the completion of $\mu$ coincides with the c.l.d. version of $\mu$; (iii) whenever $(Y, \mathrm{T}, \nu)$ is a $\sigma$-finite measure space and $\lambda$ the c.l.d. product measure on $X \times Y$ and $f$ is a function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$.
(b) Let $(X, \Sigma, \mu)$ be a measure space. Show that the following are equiveridical: (i) $\mu$ has locally determined negligible sets; (ii) whenever $(Y, \mathrm{T}, \nu)$ is a $\sigma$-finite measure space and $\lambda$ the c.l.d. product measure on $X \times Y$, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$ for any $\lambda$-integrable function $f$.
(c) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$ (251C). Let $f$ be any $\lambda_0$-integrable real-valued function. Show that $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda_0$. (Hint: show that there are sequences $\langle G_n\rangle_{n\in\mathbb{N}}$, $\langle H_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $f(x,y)$ is defined and equal to 0 for every $(x,y) \in (X \times Y) \setminus \bigcup_{n\in\mathbb{N}} G_n \times H_n$.)
(d) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\lambda$ the c.l.d. product measure. Show that if $f$ is a $\lambda_0$-integrable real-valued function, it is $\lambda$-integrable, and $\int f\,d\lambda = \int f\,d\lambda_0$.
(e) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space and $a < b$ in $\mathbb{R}$, endowed with Lebesgue measure; let $\Lambda$ be the domain of the c.l.d. product measure $\lambda$ on $X \times [a,b]$. Let $f : X \times\ ]a,b[\ \to \mathbb{R}$ be a $\Lambda$-measurable function such that $t \mapsto f(x,t) : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$ and differentiable on $]a,b[$ for every $x \in X$. (i) Show that the partial derivative $\frac{\partial f}{\partial t}$ with respect to the second variable is $\Lambda$-measurable. (ii) Now suppose that $\frac{\partial f}{\partial t}$ is $\lambda$-integrable and that $\int f(x,t_0)\,\mu(dx)$ is defined and finite for some $t_0 \in\ ]a,b[$. Show that $F(t) = \int f(x,t)\,\mu(dx)$ is defined in $\mathbb{R}$ for every $t \in [a,b]$, that $F$ is absolutely continuous, and that $F'(t) = \int \frac{\partial f}{\partial t}(x,t)\,\mu(dx)$ for almost every $t \in\ ]a,b[$. (Hint: $F(c) = F(a) + \int_{X\times[a,c]} \frac{\partial f}{\partial t}\,d\lambda$ for every $c \in [a,b]$.)
(f) Show that $\dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ for all $a, b > 0$. (Hint: show that
$$\int_0^\infty t^{a-1}\int_t^\infty e^{-x}(x-t)^{b-1}\,dx\,dt = \int_0^\infty e^{-x}\int_0^x t^{a-1}(x-t)^{b-1}\,dt\,dx.)$$
(g) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Suppose that $f \in \mathcal{L}^0(\lambda)$ and that $1 < p < \infty$. Show that
$$\Bigl(\int \Bigl|\int f(x,y)\,dx\Bigr|^p dy\Bigr)^{1/p} \le \int\Bigl(\int |f(x,y)|^p\,dy\Bigr)^{1/p} dx.$$
(Hint: set $q = \frac{p}{p-1}$ and consider the integral $\int |f(x,y)g(y)|\,\lambda(d(x,y))$ for $g \in \mathcal{L}^q(\nu)$, using 244K.)
(h) Let $\nu$ be Lebesgue measure on $[0,\infty[$; suppose that $f \in \mathcal{L}^p(\nu)$ where $1 < p < \infty$. Set $F(y) = \frac{1}{y}\int_0^y f$ for $y > 0$. Show that $\|F\|_p \le \frac{p}{p-1}\|f\|_p$. (Hint: $F(y) = \int_0^1 f(xy)\,dx$; use 252Yg with $X = [0,1]$, $Y = [0,\infty[$.)
(i) Show that if $p$ is any non-zero (real) polynomial in $r$ variables, then $\{x : x \in \mathbb{R}^r,\ p(x) = 0\}$ is Lebesgue negligible.
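(j) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$. Let $f$ be a non-negative $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible set, and suppose that the repeated upper integral
$$\overline{\int}\Bigl(\overline{\int} f(x,y)\,\mu(dx)\Bigr)\nu(dy)$$
is finite. Show that $f$ is $\lambda$-integrable.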
(k) Let $(X, \Sigma, \mu)$ be the unit interval $[0,1]$ with Lebesgue measure, and $(Y, \mathrm{T}, \nu)$ the interval with counting measure, as in 252K; let $\lambda_0$ be the primitive product measure on $[0,1]^2$. (i) Setting $\Delta = \{(t,t) : t \in [0,1]\}$, show that $\lambda_0\Delta = \infty$. (ii) Show that $\lambda_0$ is not semi-finite. (iii) Show that if $W \in \operatorname{dom}\lambda_0$, then $\lambda_0 W = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}]$ if there are a countable set $A \subseteq [0,1]$ and a Lebesgue negligible set $E \subseteq [0,1]$ such that $W \subseteq ([0,1] \times A) \cup (E \times [0,1])$, $\infty$ otherwise.
(l) Let $(X, \Sigma, \mu)$ be a measure space, and $\lambda_0$ the primitive product measure on $X \times \mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure; write $\Lambda$ for its domain. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Show that if any of $\lambda_0\Omega_f$, $\lambda_0\Omega'_f$, $\int f\,d\mu$ is defined and finite, so are the others, and all three are equal.
(m) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space, and $f$ a non-negative function defined on a conegligible subset of $X$. Write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Let $\lambda$ be the c.l.d. product measure on $X \times \mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure. Show that $\overline{\int} f\,d\mu = \lambda^*\Omega_f = \lambda^*\Omega'_f$.
(n) Let $(X, \Sigma, \mu)$ be a measure space and $f : X \to [0,\infty[$ a function. Show that $\overline{\int} f\,d\mu = \int_0^\infty \mu^*\{x : f(x) \ge t\}\,dt$.
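(o) Let $(X, \Sigma, \mu)$ be a complete measure space and write $M^{0,\infty}$ for the set $\{f : f \in \mathcal{L}^0(\mu),\ \mu\{x : |f(x)| \ge a\}$ is finite for some $a \in [0,\infty[\}$. (i) Show that for each $f \in M^{0,\infty}$ there is a non-increasing $f^* : \ ]0,\infty[\ \to \mathbb{R}$ such that $\mu_L\{t : f^*(t) \ge \alpha\} = \mu\{x : |f(x)| \ge \alpha\}$ for every $\alpha > 0$, writing $\mu_L$ for Lebesgue measure. (ii) Show that $\int_E |f|\,d\mu \le \int_0^{\mu E} f^*\,d\mu_L$ for every $E \in \Sigma$ (allowing $\infty$). (Hint: $(f \times \chi E)^* \le f^*$.) (iii) Show that $\|f^*\|_p = \|f\|_p$ for every $p \in [1,\infty]$, $f \in M^{0,\infty}$. (Hint: $(|f|^p)^* = (f^*)^p$.) (iv) Show that if $f, g \in M^{0,\infty}$ then $\int |f \times g|\,d\mu \le \int f^* \times g^*\,d\mu_L$. (Hint: look at simple functions first.) (v) Show that if $\mu$ is atomless then $\int_0^a f^*\,d\mu_L = \sup_{E\in\Sigma,\,\mu E\le a} \int_E |f|$ for every $a \ge 0$. (Hint: 215D.) (vi) Show that $A \subseteq \mathcal{L}^1(\mu)$ is uniformly integrable iff $\{f^* : f \in A\}$ is uniformly integrable in $\mathcal{L}^1(\mu_L)$. ($f^*$ is called the decreasing rearrangement of $f$.)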


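(p) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space, and write $\nu$ for Lebesgue measure on $[0,1]$. Show that the c.l.d. product measure $\lambda$ on $X \times [0,1]$ is localizable iff $\mu$ is localizable. (Hints: (i) if $\mathcal{E} \subseteq \Sigma$, show that $F \in \Sigma$ is an essential supremum for $\mathcal{E}$ in $\Sigma$ iff $F \times [0,1]$ is an essential supremum for $\{E \times [0,1] : E \in \mathcal{E}\}$ in $\Lambda = \operatorname{dom}\lambda$. (ii) For $W \in \Lambda$, $n \in \mathbb{N}$, $k < 2^n$ set
$$W_{nk} = \{x : x \in X,\ \nu^*\{t : (x,t) \in W,\ 2^{-n}k \le t \le 2^{-n}(k+1)\} \ge 2^{-n-1}\}.$$
Show that if $\mathcal{W} \subseteq \Lambda$ and $F_{nk}$ is an essential supremum for $\{W_{nk} : W \in \mathcal{W}\}$ in $\Sigma$ for all $n$, $k$, then
$$\bigcup_{n\in\mathbb{N}}\ \bigcap_{m\ge n}\ \bigcup_{k<2^m} F_{mk} \times [2^{-m}k, 2^{-m}(k+1)]$$
is an essential supremum for $\mathcal{W}$ in $\Lambda$.)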
(q) Let $(X, \Sigma, \mu)$ be the space of Example 216D, and give Lebesgue measure to $[0,1]$. Show that the c.l.d. product measure on $X \times [0,1]$ is complete, locally determined, atomless and not localizable.
(r) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space and $(Y, \mathrm{T}, \nu)$ a semi-finite measure space with $\nu Y > 0$. Show that if the c.l.d. product measure on $X \times Y$ is strictly localizable, then $\mu$ is strictly localizable. (Hint: take $F \in \mathrm{T}$, $0 < \nu F < \infty$. Let $\langle W_i\rangle_{i\in I}$ be a decomposition of $X \times Y$. For $i \in I$, $n \in \mathbb{N}$ set $E_{in} = \{x : \nu^*\{y : y \in F,\ (x,y) \in W_i\} \ge 2^{-n}\}$. Apply 213Ye to $\{E_{in} : i \in I,\ n \in \mathbb{N}\}$.)
(s) Let $(X, \Sigma, \mu)$ be the space of Example 216E, and give Lebesgue measure to $[0,1]$. Show that the c.l.d. product measure on $X \times [0,1]$ is complete, locally determined, atomless and localizable, but not strictly localizable.
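(t) Let $(X, \Sigma, \mu)$ be a measure space and $f$ a $\mu$-integrable complex-valued function. For $\theta \in\ ]-\pi,\pi]$ set $H_\theta = \{x : x \in \operatorname{dom} f,\ \operatorname{Re}(e^{-i\theta}f(x)) > 0\}$. Show that $\int_{-\pi}^{\pi} \operatorname{Re}\bigl(e^{-i\theta}\int_{H_\theta} f\bigr)\,d\theta = 2\int |f|$, and hence that there is some $\theta$ such that $\bigl|\int_{H_\theta} f\bigr| \ge \frac{1}{\pi}\int |f|$. (Compare 246F.)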
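(u) Set $f(t) = t - \ln(t+1)$ for $t > -1$. (i) Show that $\Gamma(a+1) = a^{a+1}e^{-a}\int_{-1}^{\infty} e^{-af(u)}\,du$ for every $a > 0$. (Hint: substitute $u = \frac{t}{a} - 1$ in 225Xj(iii).) (ii) Show that there is a $\delta > 0$ such that $f(t) \ge \frac13 t^2$ for $-1 \le t \le \delta$. (iii) Setting $\alpha = \frac12 f(\delta)$, show that (for $a \ge 1$)
$$\sqrt{a}\int_\delta^\infty e^{-af(t)}\,dt \le \sqrt{a}\,e^{-a\alpha}\int_0^\infty e^{-f(t)/2}\,dt \to 0$$
as $a \to \infty$. (iv) Set $g_a(t) = e^{-af(t/\sqrt{a})}$ if $-\sqrt{a} < t \le \delta\sqrt{a}$, $0$ otherwise. Show that $g_a(t) \le e^{-t^2/3}$ for all $a$, $t$ and that $\lim_{a\to\infty} g_a(t) = e^{-t^2/2}$ for all $t$, so that
$$\lim_{a\to\infty} \frac{e^a\,\Gamma(a+1)}{a^{a+\frac12}} = \lim_{a\to\infty} \sqrt{a}\int_{-1}^{\infty} e^{-af(t)}\,dt = \lim_{a\to\infty} \sqrt{a}\int_{-1}^{\delta} e^{-af(t)}\,dt$$
$$= \lim_{a\to\infty} \int_{-\infty}^{\infty} g_a(t)\,dt = \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}.$$
(v) Show that $\lim_{n\to\infty} \dfrac{n!}{e^{-n}n^n\sqrt{n}} = \sqrt{2\pi}$. (This is Stirling's formula.)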


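(v) Let $(X, \Sigma, \mu)$ be a complete locally determined measure space and $f$, $g$ two real-valued, $\mu$-virtually measurable functions defined almost everywhere in $X$. (i) Let $\lambda$ be the c.l.d. product of $\mu$ and Lebesgue measure on $\mathbb{R}$. Setting $\Omega^*_f = \{(x,a) : x \in \operatorname{dom} f,\ a \in \mathbb{R},\ a \le f(x)\}$ and $\Omega^*_g = \{(x,a) : x \in \operatorname{dom} g,\ a \in \mathbb{R},\ a \le g(x)\}$, show that $\lambda(\Omega^*_f \setminus \Omega^*_g) = \int (f - g)^+\,d\mu$ and $\lambda(\Omega^*_f \triangle \Omega^*_g) = \int |f - g|\,d\mu$. (ii) Suppose that $\mu$ is $\sigma$-finite. Show that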
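$$\int |f - g|\,d\mu = \int_{-\infty}^{\infty} \mu\{x : x \in \operatorname{dom} f \cap \operatorname{dom} g,\ (f(x) - a)(g(x) - a) < 0\}\,da.$$
(iii) Suppose that $\mu$ is $\sigma$-finite, that $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, that $E \in \Sigma$ and that $g : X \to [0,1]$ is $\mathrm{T}$-measurable. Show that there is an $F \in \mathrm{T}$ such that $\mu(E \triangle F) \le \int |\chi E - g|\,d\mu$.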
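The limit in 252Yu(v) can be observed numerically. The sketch below is an illustration only; it evaluates $n!/(e^{-n}n^n\sqrt{n})$ through logarithms, using the standard-library function `math.lgamma` for $\ln n!$ so that large $n$ does not overflow, and compares the result with $\sqrt{2\pi}$. The function name `stirling_ratio` is a hypothetical choice.

```python
import math

def stirling_ratio(n):
    """n! / (e^{-n} n^n sqrt(n)), computed through logarithms to avoid overflow."""
    log_ratio = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)
    return math.exp(log_ratio)

target = math.sqrt(2 * math.pi)
for n in (1, 10, 100, 1000, 10000):
    print(n, stirling_ratio(n), target)
```

The printed ratios decrease towards $\sqrt{2\pi}$ as $n$ grows, in line with the exercise.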
252 Notes and comments For a volume and a half now I have asked you to accept the idea of integrating partially-defined functions, insisting that sooner or later they would appear at the core of the subject. The moment has now come. If we wish to apply Fubini's and Tonelli's theorems in the most fundamental of all cases, with both factors equal to Lebesgue measure on the unit interval, it is surely natural to look at all functions which are integrable on the square for two-dimensional Lebesgue measure. Now two-dimensional Lebesgue measure is a complete measure, so, in particular, assigns zero measure to any set of the form $\{(x,b) : x \in A\}$ or $\{(a,y) : y \in A\}$, whether or not the set $A$ is measured by one-dimensional measure. Accordingly, if $f$ is a function of two variables which is integrable for two-dimensional Lebesgue measure, there is no reason why any particular section $x \mapsto f(x,b)$ or $y \mapsto f(a,y)$ should be measurable, let alone integrable. Consequently, even if $f$ itself is defined everywhere, the outer integral of $\iint f(x,y)\,dx\,dy$ is likely to be applied to a function which is not defined for every $y$. Let me remark that the problem does not concern '$\infty$'; the awkward functions are those with sections so irregular that they cannot be assigned an integral at all.
I have seen many approaches to this particular nettle, generally less whole-hearted than the one I have determined on for this treatise. Part of the difficulty is that Fubini's theorem really is at the centre of measure theory. Over large parts of the subject, it is possible to assert that a result is non-trivial if and only if it depends on Fubini's theorem. I am therefore unwilling to insert any local fix, saying that 'in this chapter, we shall integrate functions which are not defined everywhere'; before long, such a provision would have to be interpolated into the preambles to half the best theorems, or an explanation offered of why it wasn't necessary in their particular contexts. I suppose that one of the commonest responses is (like Halmos 50) to restrict attention to $\Sigma \widehat{\otimes} \mathrm{T}$-measurable functions, which eliminates measurability problems for the moment (252Xh, 252P); but unhappily (or rather, to my mind, happily) there are crucial applications in which the functions are not actually $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, but belong to some wider class, and this restriction sooner or later leads to undignified contortions as we are forced to adapt limited results to unforeseen contexts. Besides, it leaves unsaid the really rather important information that if $f$ is a measurable function of two variables then (under appropriate conditions) almost all its sections are measurable (252E).
In 252B and its corollaries there is a clumsy restriction: we assume that one of the measures is $\sigma$-finite and the other is either strictly localizable or complete and locally determined. The obvious question is, whether we need these hypotheses. From 252K we see that the hypothesis '$\sigma$-finite' on the second factor can certainly not be abandoned, even when the first factor is a complete probability measure. The requirement 'either strictly localizable or complete and locally determined' is in fact fractionally stronger than what is needed, as well as disagreeably elaborate. The 'right' hypothesis is that the completion of $\mu$ should be locally determined (see 252Ya). The point is that because the product of two measures is the same as the product of their c.l.d. versions (251T), no theorem which leads from the product measure to the factor measures can distinguish between a measure and its c.l.d. version; so that, in 252B, we must expect to need $\mu$ and its c.l.d. version to give rise to the same integrals. The proof of 252B would be better focused if the hypothesis was simplified to '$\nu$ is $\sigma$-finite and $\mu$ is complete and locally determined'. But this would just transfer part of the argument into the proof of 252C.
We also have to work a little harder in 252B in order to cover functions and integrals taking the values $\pm\infty$. Fubini's theorem is so central to measure theory that I believe it is worth taking a bit of extra trouble to state the results in maximal generality. This is especially important because we frequently apply it in multiply repeated integrals, as in 252Xd, in which we have even less control than usual over the intermediate functions to be integrated.
I have expressed all the main results of this section in terms of the 'c.l.d.' product measure. In the case of $\sigma$-finite spaces, of course, which is where the theory works best, we could just as well use the 'primitive' product measure. Indeed, Fubini's theorem itself has a version in terms of the primitive product measure which is rather more elegant than 252B as stated (252Yc), and covers the great majority of applications. (Integrals with respect to the primitive and c.l.d. product measures are of course very closely related; see 252Yd.) But we do sometimes need to look at non-$\sigma$-finite spaces, and in these cases the asymmetric form in 252B is close to the best we can do. Using the primitive product measure does not help at all with the most substantial obstacle, the phenomenon in 252K (see 252Yk).
The pre-calculus concept of an integral as 'the area under a curve' is given expression in 252N: the integral of a non-negative function is the measure of its ordinate set. This is unsatisfactory as a definition of the integral, not just because of the requirement that the base space should be complete and locally determined (which can be dealt with by using the primitive product measure, as in 252Yl), but because the construction of the product measure involves integration (part (c) of the proof of 251E). The idea of 252N is to relate the measure of an ordinate set to the integral of the measures of its vertical sections. Curiously, if instead we integrate the measures of its horizontal sections, as in 252O, we get a more versatile result. (Indeed this one does not involve the concept of 'product measure', and could have appeared at any point after §123.) Note that the integral $\int_0^{\infty} \ldots\, dt$ here is applied to a monotonic function, so may be interpreted as an improper Riemann integral. If you think you know enough about the Riemann integral to make this a tempting alternative to the construction in §122, the tricky bit now becomes the proof that the integral is additive.
A different line of argument is to use integration over sections to define a product measure. The difficulty with this approach is that unless we take great care we may find ourselves with an asymmetric construction. My own view is that such an asymmetry is acceptable only when there is no alternative. But in Chapter 43 of Volume 4 I will describe a couple of examples.
Of the two examples I give here, 252K is supposed to show that when I call for $\sigma$-finite spaces they are really necessary, while 252L is supposed to show that joint measurability is essential in Tonelli's theorem and its corollaries. The factor spaces in 252K, Lebesgue measure and counting measure, are chosen to show that it is only the lack of $\sigma$-finiteness that can be the problem; they are otherwise as regular as one can reasonably ask. In 252L I have used the countable-cocountable measure on $\omega_1$, which you may feel is fit only for counter-examples; and the question does arise, whether the same phenomenon occurs with Lebesgue measure. This leads into deep water, and I will return to it in Chapter 53 of Volume 5.
I ought perhaps to note explicitly that in Fubini's theorem, we really do need to have a function which is integrable for the product measure. I include 252Xf and 252Xg to remind you that even in the best-regulated circumstances, the repeated integrals $\iint f\,dx\,dy$, $\iint f\,dy\,dx$ may fail to be equal if $f$ is not integrable as a function of two variables.
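The phenomenon is easy to see in a discrete setting. The sketch below is an illustration chosen for this purpose, not a transcription of 252Xf or 252Xg: it takes both factors to be counting measure on $\mathbb{N}$ (an assumption made here) and uses the familiar double sequence with $+1$ on the diagonal and $-1$ just above it. The function names are hypothetical. Every row and every column has at most two non-zero entries, so the inner sums are computed exactly, and the outer sums have all terms zero beyond the first.

```python
def a(i: int, j: int) -> int:
    """Double sequence: +1 on the diagonal, -1 on the superdiagonal, 0 elsewhere."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def inner_over_j(i: int) -> int:
    """Sum over j of a(i, j); the only non-zero terms are j = i and j = i + 1."""
    return a(i, i) + a(i, i + 1)


def inner_over_i(j: int) -> int:
    """Sum over i of a(i, j); the only non-zero terms are i = j and i = j - 1."""
    return a(j, j) + (a(j - 1, j) if j >= 1 else 0)


def sum_j_then_i(n_outer: int) -> int:
    """Repeated sum with the inner sum taken over j, outer sum over i < n_outer."""
    return sum(inner_over_j(i) for i in range(n_outer))


def sum_i_then_j(n_outer: int) -> int:
    """Repeated sum with the inner sum taken over i, outer sum over j < n_outer."""
    return sum(inner_over_i(j) for j in range(n_outer))


if __name__ == "__main__":
    print(sum_j_then_i(50), sum_i_then_j(50))
```

The two repeated sums differ because the sum of $|a(i,j)|$ over all pairs is infinite, so the sequence is not integrable for the product of the counting measures.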
There are many ways to calculate the volume $\beta_r$ of an $r$-dimensional ball; the one I have used in 252Q follows a line that would have been natural to me before I ever heard of measure theory. In 252Xi I suggest another method. The idea of integration-by-substitution, used in part (b) of the argument for 252Q, is there supported by an ad hoc argument; I will present a different, more generally applicable, approach in Chapter 26. Elsewhere (252Xi, 252Yf, 252Yh, 252Yu) I find myself taking for granted substitutions of the form $t \mapsto at$, $t \mapsto a+t$, $t \mapsto t^2$; for a systematic justification, see §263. Of course an enormous number of other formulae of advanced calculus are also based on repeated integration of one kind or another, and I give a sample handful of such results (252Xb, 252Ye-252Yh, 252Yu).
253 Tensor products
The theorems of the last section show that the integrable functions on a product of two measure spaces can be effectively studied in terms of integration on each factor space separately. In this section I present a very striking relationship between the $L^1$ space of a product measure and the $L^1$ spaces of its factors, which actually determines the product $L^1$ up to isomorphism as Banach lattice. I start with a brief note on bilinear operators (253A) and a description of the canonical bilinear operator from $L^1(\mu) \times L^1(\nu)$ to $L^1(\mu \times \nu)$ (253B-253E). The main theorem of the section is 253F, showing that this canonical map is universal for continuous bilinear operators from $L^1(\mu) \times L^1(\nu)$ to Banach spaces; it also determines the ordering of $L^1(\mu \times \nu)$ (253G). I end with a description of a fundamental type of conditional expectation operator (253H) and notes on products of indefinite-integral measures (253I) and upper integrals of special kinds of function (253J, 253K).
253A Bilinear operators Before looking at any of the measure theory in this section, I introduce a concept from the theory of linear spaces.
(a) Let $U$, $V$ and $W$ be linear spaces over $\mathbb{R}$ (or, indeed, any field). A map $\phi : U \times V \to W$ is bilinear if it is linear in each variable separately, that is,
$$\phi(u_1 + u_2, v) = \phi(u_1, v) + \phi(u_2, v),$$
$$\phi(u, v_1 + v_2) = \phi(u, v_1) + \phi(u, v_2),$$
$$\phi(\alpha u, v) = \alpha\phi(u, v) = \phi(u, \alpha v)$$
for all $u, u_1, u_2 \in U$, $v, v_1, v_2 \in V$ and scalars $\alpha$. Observe that such a $\phi$ gives rise to, and in turn can be defined by, a linear operator $T : U \to L(V;W)$, writing $L(V;W)$ for the space of linear operators from $V$ to $W$, where
$$(Tu)(v) = \phi(u, v)$$
for all $u \in U$, $v \in V$. Hence, or otherwise, we can see, for instance, that $\phi(0, v) = \phi(u, 0) = 0$ whenever $u \in U$ and $v \in V$.
If $W'$ is another linear space over the same field, and $S : W \to W'$ is a linear operator, then $S\phi : U \times V \to W'$ is bilinear.
(b) Now suppose that $U$, $V$ and $W$ are normed spaces, and $\phi : U \times V \to W$ a bilinear operator. Then we say that $\phi$ is bounded if $\sup\{\|\phi(u,v)\| : \|u\| \le 1,\ \|v\| \le 1\}$ is finite, and in this case we call this supremum the norm $\|\phi\|$ of $\phi$. Note that $\|\phi(u,v)\| \le \|\phi\|\|u\|\|v\|$ for all $u \in U$, $v \in V$ (because
$$\|\phi(u,v)\| = \alpha\beta\|\phi(\alpha^{-1}u, \beta^{-1}v)\| \le \alpha\beta\|\phi\|$$
whenever $\alpha > \|u\|$, $\beta > \|v\|$).
If $W'$ is another normed space and $S : W \to W'$ is a bounded linear operator, then $S\phi : U \times V \to W'$ is a bounded bilinear operator, and $\|S\phi\| \le \|S\|\|\phi\|$.
253B Definition The most important bilinear operators of this section are based on the following idea. Let $f$ and $g$ be real-valued functions. I will write $f \otimes g$ for the function $(x,y) \mapsto f(x)g(y) : \operatorname{dom} f \times \operatorname{dom} g \to \mathbb{R}$.
253C Proposition (a) Let $X$ and $Y$ be sets, and $\Sigma$, $\mathrm{T}$ $\sigma$-algebras of subsets of $X$, $Y$ respectively. If $f$ is a $\Sigma$-measurable real-valued function defined on a subset of $X$, and $g$ is a $\mathrm{T}$-measurable real-valued function defined on a subset of $Y$, then $f \otimes g$, as defined in 253B, is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable.
(b) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. If $f \in \mathcal{L}^0(\mu)$ and $g \in \mathcal{L}^0(\nu)$, then $f \otimes g \in \mathcal{L}^0(\lambda)$.
Remark Recall from 241A that $\mathcal{L}^0(\mu)$ is the space of $\mu$-virtually measurable real-valued functions defined on $\mu$-conegligible subsets of $X$.
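proof (a) The point is that $f \otimes \chi Y$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, because for any $\alpha \in \mathbb{R}$ there is an $E \in \Sigma$ such that
$$\{x : f(x) \ge \alpha\} = E \cap \operatorname{dom} f,$$
so that
$$\{(x,y) : (f \otimes \chi Y)(x,y) \ge \alpha\} = (E \cap \operatorname{dom} f) \times Y = (E \times Y) \cap \operatorname{dom}(f \otimes \chi Y),$$
and of course $E \times Y \in \Sigma \widehat{\otimes} \mathrm{T}$. Similarly, $\chi X \otimes g$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable and $f \otimes g = (f \otimes \chi Y) \times (\chi X \otimes g)$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable.
(b) Let $E \in \Sigma$, $F \in \mathrm{T}$ be conegligible subsets of $X$, $Y$ respectively such that $E \subseteq \operatorname{dom} f$, $F \subseteq \operatorname{dom} g$, $f \upharpoonright E$ is $\Sigma$-measurable and $g \upharpoonright F$ is $\mathrm{T}$-measurable. Write $\Lambda$ for the domain of $\lambda$. Then $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ (251Ia). Also $E \times F$ is $\lambda$-conegligible, because
$$\lambda((X \times Y) \setminus (E \times F)) \le \lambda((X \setminus E) \times Y) + \lambda(X \times (Y \setminus F))$$
$$= \mu(X \setminus E) \cdot \nu Y + \mu X \cdot \nu(Y \setminus F) = 0$$
(also from 251Ia). So $\operatorname{dom}(f \otimes g) \supseteq E \times F$ is conegligible. Also, by (a), $(f \otimes g) \upharpoonright (E \times F) = (f \upharpoonright E) \otimes (g \upharpoonright F)$ is $\Sigma \widehat{\otimes} \mathrm{T}$-measurable, therefore $\Lambda$-measurable, and $f \otimes g$ is virtually measurable. Thus $f \otimes g \in \mathcal{L}^0(\lambda)$, as claimed.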
253D Now we can apply the ideas of 253B-253C to integrable functions.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and write $\lambda$ for the c.l.d. product measure on $X \times Y$. If $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, then $f \otimes g \in \mathcal{L}^1(\lambda)$ and $\int f \otimes g\,d\lambda = \int f\,d\mu \int g\,d\nu$.
Remark I follow §242 in writing $\mathcal{L}^1(\mu)$ for the space of $\mu$-integrable real-valued functions.
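proof (a) Consider first the case $f = \chi E$, $g = \chi F$ where $E \in \Sigma$, $F \in \mathrm{T}$ have finite measure; then $f \otimes g = \chi(E \times F)$ is $\lambda$-integrable with integral
$$\lambda(E \times F) = \mu E \cdot \nu F = \int f\,d\mu \int g\,d\nu,$$
by 251Ia.
(b) It follows at once that $f \otimes g$ is $\lambda$-simple, with $\int f \otimes g\,d\lambda = \int f\,d\mu \int g\,d\nu$, whenever $f$ is a $\mu$-simple function and $g$ is a $\nu$-simple function.
(c) If $f$ and $g$ are non-negative integrable functions, there are non-decreasing sequences $\langle f_n \rangle_{n \in \mathbb{N}}$, $\langle g_n \rangle_{n \in \mathbb{N}}$ of non-negative simple functions converging almost everywhere to $f$, $g$ respectively; now note that if $E \subseteq X$, $F \subseteq Y$ are conegligible, $E \times F$ is conegligible in $X \times Y$, as remarked in the proof of 253C, so the non-decreasing sequence $\langle f_n \otimes g_n \rangle_{n \in \mathbb{N}}$ of $\lambda$-simple functions converges almost everywhere to $f \otimes g$, and
$$\int f \otimes g\,d\lambda = \lim_{n\to\infty} \int f_n \otimes g_n\,d\lambda = \lim_{n\to\infty} \int f_n\,d\mu \int g_n\,d\nu = \int f\,d\mu \int g\,d\nu$$
by B.Levi's theorem.
(d) Finally, for general $f$ and $g$, we can express them as the differences $f^+ - f^-$, $g^+ - g^-$ of non-negative integrable functions, and see that
$$\int f \otimes g\,d\lambda = \int f^+ \otimes g^+ - f^+ \otimes g^- - f^- \otimes g^+ + f^- \otimes g^-\,d\lambda = \int f\,d\mu \int g\,d\nu.$$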
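On finite sets the identity of 253D reduces to the distributive law, and can be checked by hand or by machine. The following Python sketch assumes, purely for illustration, that $X$ and $Y$ are finite and that every subset is measurable, so that a measure is just a table of point masses and the product measure gives $(x,y)$ the mass $\mu\{x\}\nu\{y\}$. All names (`integral`, `l1_norm`, the sample values) are hypothetical. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction as Fr


def integral(h, weights):
    """Integral of h against the measure with point masses `weights`."""
    return sum(h[p] * weights[p] for p in weights)


def l1_norm(h, weights):
    """The L^1 norm of h, i.e. the integral of |h|."""
    return sum(abs(h[p]) * weights[p] for p in weights)


# Point masses of mu on X = {x0, x1} and of nu on Y = {y0, y1, y2}.
mu = {"x0": Fr(1, 2), "x1": Fr(3)}
nu = {"y0": Fr(2), "y1": Fr(1, 3), "y2": Fr(5)}

# Arbitrary sample functions f on X and g on Y.
f = {"x0": Fr(-4), "x1": Fr(1, 7)}
g = {"y0": Fr(3), "y1": Fr(-9), "y2": Fr(1, 2)}

# Product measure and the function f (x) g : (x, y) -> f(x) g(y).
lam = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
f_tensor_g = {(x, y): f[x] * g[y] for x in f for y in g}

if __name__ == "__main__":
    print(integral(f_tensor_g, lam), integral(f, mu) * integral(g, nu))
    print(l1_norm(f_tensor_g, lam), l1_norm(f, mu) * l1_norm(g, nu))
```

The second printed pair compares the integral of $|f \otimes g|$ with the product of the integrals of $|f|$ and $|g|$, which agree because $|f| \otimes |g| = |f \otimes g|$.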
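253E The canonical map $L^1 \times L^1 \to L^1$ I continue the argument from 253D. Because $E \times F$ is conegligible in $X \times Y$ whenever $E$ and $F$ are conegligible subsets of $X$ and $Y$, $f_1 \otimes g_1 = f \otimes g$ $\lambda$-a.e. whenever $f = f_1$ $\mu$-a.e. and $g = g_1$ $\nu$-a.e. We may therefore define $u \otimes v \in L^1(\lambda)$, for $u \in L^1(\mu)$ and $v \in L^1(\nu)$, by saying that $u \otimes v = (f \otimes g)^{\bullet}$ whenever $u = f^{\bullet}$ and $v = g^{\bullet}$.
Now if $f, f_1, f_2 \in \mathcal{L}^1(\mu)$, $g, g_1, g_2 \in \mathcal{L}^1(\nu)$ and $a \in \mathbb{R}$,
$$(f_1 + f_2) \otimes g = (f_1 \otimes g) + (f_2 \otimes g),$$
$$f \otimes (g_1 + g_2) = (f \otimes g_1) + (f \otimes g_2),$$
$$(af) \otimes g = a(f \otimes g) = f \otimes (ag).$$
It follows at once that the map $(u,v) \mapsto u \otimes v$ is bilinear.
Moreover, if $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, $|f| \otimes |g| = |f \otimes g|$, so $\int |f \otimes g|\,d\lambda = \int |f|\,d\mu \int |g|\,d\nu$. Accordingly
$$\|u \otimes v\|_1 = \|u\|_1 \|v\|_1$$
for all $u \in L^1(\mu)$, $v \in L^1(\nu)$. In particular, the bilinear operator $\otimes$ is bounded, with norm 1 (except in the trivial case in which one of $L^1(\mu)$, $L^1(\nu)$ is 0-dimensional).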
253F We are now ready for the main theorem of this section.
Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and let $\lambda$ be the c.l.d. product measure on $X \times Y$. Let $W$ be any Banach space and $\phi : L^1(\mu) \times L^1(\nu) \to W$ a bounded bilinear operator. Then there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that $T(u \otimes v) = \phi(u,v)$ for all $u \in L^1(\mu)$ and $v \in L^1(\nu)$, and $\|T\| = \|\phi\|$.
proof (a) The centre of the argument is the following fact: if $E_0, \ldots, E_n$ are measurable sets of finite measure in $X$, $F_0, \ldots, F_n$ are measurable sets of finite measure in $Y$, $a_0, \ldots, a_n \in \mathbb{R}$ and $\sum_{i=0}^n a_i \chi(E_i \times F_i) = 0$ $\lambda$-a.e., then $\sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet}) = 0$ in $W$. PPP We can find a disjoint family $\langle G_j \rangle_{j \le m}$ of measurable sets of finite measure in $X$ such that each $E_i$ is expressible as a union of some subfamily of the $G_j$; so that $\chi E_i$ is expressible in the form $\sum_{j=0}^m b_{ij} \chi G_j$ (see 122Ca). Similarly, we can find a disjoint family $\langle H_k \rangle_{k \le l}$ of measurable sets of finite measure in $Y$ such that each $\chi F_i$ is expressible as $\sum_{k=0}^l c_{ik} \chi H_k$. Now
$$\sum_{j=0}^m \sum_{k=0}^l \Bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\Bigr) \chi(G_j \times H_k) = \sum_{i=0}^n a_i \chi(E_i \times F_i) = 0 \quad \lambda\text{-a.e.}$$
Because the $G_j \times H_k$ are disjoint, and $\lambda(G_j \times H_k) = \mu G_j \cdot \nu H_k$ for all $j$, $k$, it follows that for every $j \le m$, $k \le l$ we have either $\mu G_j = 0$ or $\nu H_k = 0$ or $\sum_{i=0}^n a_i b_{ij} c_{ik} = 0$. In any of these three cases, $\sum_{i=0}^n a_i b_{ij} c_{ik} \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet}) = 0$ in $W$. But this means that
$$0 = \sum_{j=0}^m \sum_{k=0}^l \Bigl(\sum_{i=0}^n a_i b_{ij} c_{ik}\Bigr) \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet}) = \sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet}),$$
as claimed. QQQ
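(b) It follows that if $E_0, \ldots, E_n, E'_0, \ldots, E'_m$ are measurable sets of finite measure in $X$, $F_0, \ldots, F_n, F'_0, \ldots, F'_m$ are measurable sets of finite measure in $Y$, $a_0, \ldots, a_n, a'_0, \ldots, a'_m \in \mathbb{R}$ and
$$\sum_{i=0}^n a_i \chi(E_i \times F_i) = \sum_{i=0}^m a'_i \chi(E'_i \times F'_i) \quad \lambda\text{-a.e.},$$
then
$$\sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet}) = \sum_{i=0}^m a'_i \phi(\chi E_i'^{\bullet}, \chi F_i'^{\bullet})$$
in $W$. Let $M$ be the linear subspace of $L^1(\lambda)$ generated by
$$\{\chi(E \times F)^{\bullet} : E \in \Sigma,\ \mu E < \infty,\ F \in \mathrm{T},\ \nu F < \infty\};$$
then we have a unique map $T_0 : M \to W$ such that
$$T_0\Bigl(\sum_{i=0}^n a_i \chi(E_i \times F_i)^{\bullet}\Bigr) = \sum_{i=0}^n a_i \phi(\chi E_i^{\bullet}, \chi F_i^{\bullet})$$
whenever $E_0, \ldots, E_n$ are measurable sets of finite measure in $X$, $F_0, \ldots, F_n$ are measurable sets of finite measure in $Y$ and $a_0, \ldots, a_n \in \mathbb{R}$. Of course $T_0$ is linear.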
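(c) Some of the same calculations show that $\|T_0 u\| \le \|\phi\|\|u\|_1$ for every $u \in M$. PPP If $u \in M$, then, by the arguments of (a), we can express $u$ as $\sum_{j=0}^m \sum_{k=0}^l a_{jk} \chi(G_j \times H_k)^{\bullet}$, where $\langle G_j \rangle_{j \le m}$ and $\langle H_k \rangle_{k \le l}$ are disjoint families of sets of finite measure. Now
$$\|T_0 u\| = \Bigl\|\sum_{j=0}^m \sum_{k=0}^l a_{jk} \phi(\chi G_j^{\bullet}, \chi H_k^{\bullet})\Bigr\| \le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \|\phi(\chi G_j^{\bullet}, \chi H_k^{\bullet})\|$$
$$\le \sum_{j=0}^m \sum_{k=0}^l |a_{jk}| \|\phi\| \|\chi G_j^{\bullet}\|_1 \|\chi H_k^{\bullet}\|_1 = \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}|\, \mu G_j \cdot \nu H_k$$
$$= \|\phi\| \sum_{j=0}^m \sum_{k=0}^l |a_{jk}|\, \lambda(G_j \times H_k) = \|\phi\|\|u\|_1,$$
as claimed. QQQ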
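(d) The next point is to observe that $M$ is dense in $L^1(\lambda)$ for $\|\ \|_1$. PPP Repeating the ideas above once again, we observe that if $E_0, \ldots, E_n$ are sets of finite measure in $X$ and $F_0, \ldots, F_n$ are sets of finite measure in $Y$, then $\chi(\bigcup_{i \le n} E_i \times F_i)^{\bullet} \in M$; this is because, expressing each $E_i$ as a union of $G_j$, where the $G_j$ are disjoint, we have
$$\bigcup_{i \le n} E_i \times F_i = \bigcup_{j \le m} G_j \times F'_j,$$
where $F'_j = \bigcup\{F_i : G_j \subseteq E_i\}$ for each $j$; now $\langle G_j \times F'_j \rangle_{j \le m}$ is disjoint, so
$$\chi\Bigl(\bigcup_{j \le m} G_j \times F'_j\Bigr)^{\bullet} = \sum_{j=0}^m \chi(G_j \times F'_j)^{\bullet} \in M.$$
So 251Ie tells us that whenever $\lambda H < \infty$ and $\epsilon > 0$ there is a $G$ such that $\lambda(H \triangle G) \le \epsilon$ and $\chi G^{\bullet} \in M$; now
$$\|\chi H^{\bullet} - \chi G^{\bullet}\|_1 = \lambda(G \triangle H) \le \epsilon,$$
so $\chi H^{\bullet}$ is approximated arbitrarily closely by members of $M$, and belongs to the closure $\overline{M}$ of $M$ in $L^1(\lambda)$. Because $M$ is a linear subspace of $L^1(\lambda)$, so is $\overline{M}$ (2A4Cb); accordingly $\overline{M}$ contains the equivalence classes of all $\lambda$-simple functions; but these are dense in $L^1(\lambda)$ (242Mb), so $\overline{M} = L^1(\lambda)$, as claimed. QQQ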
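(e) Because $W$ is a Banach space, it follows that there is a bounded linear operator $T : L^1(\lambda) \to W$ extending $T_0$, with $\|T\| = \|T_0\| \le \|\phi\|$ (2A4I). Now $T(u \otimes v) = \phi(u,v)$ for all $u \in L^1(\mu)$, $v \in L^1(\nu)$. PPP If $u = \chi E^{\bullet}$ and $v = \chi F^{\bullet}$, where $E$, $F$ are measurable sets of finite measure, then
$$T(u \otimes v) = T(\chi(E \times F)^{\bullet}) = T_0(\chi(E \times F)^{\bullet}) = \phi(\chi E^{\bullet}, \chi F^{\bullet}) = \phi(u,v).$$
Because $\phi$ and $\otimes$ are bilinear and $T$ is linear,
$$T(f^{\bullet} \otimes g^{\bullet}) = \phi(f^{\bullet}, g^{\bullet})$$
whenever $f$ and $g$ are simple functions. Now whenever $u \in L^1(\mu)$, $v \in L^1(\nu)$ and $\epsilon > 0$, there are simple functions $f$, $g$ such that $\|u - f^{\bullet}\|_1 \le \epsilon$, $\|v - g^{\bullet}\|_1 \le \epsilon$ (242Mb again); so that
$$\|\phi(u,v) - \phi(f^{\bullet}, g^{\bullet})\| \le \|\phi(u - f^{\bullet}, v - g^{\bullet})\| + \|\phi(u, g^{\bullet} - v)\| + \|\phi(f^{\bullet} - u, v)\|$$
$$\le \|\phi\|(\epsilon^2 + \epsilon\|u\|_1 + \epsilon\|v\|_1).$$
Similarly
$$\|u \otimes v - f^{\bullet} \otimes g^{\bullet}\|_1 \le \epsilon(\epsilon + \|u\|_1 + \|v\|_1),$$
so
$$\|T(u \otimes v) - T(f^{\bullet} \otimes g^{\bullet})\| \le \epsilon\|T\|(\epsilon + \|u\|_1 + \|v\|_1);$$
because $T(f^{\bullet} \otimes g^{\bullet}) = \phi(f^{\bullet}, g^{\bullet})$,
$$\|T(u \otimes v) - \phi(u,v)\| \le \epsilon(\|T\| + \|\phi\|)(\epsilon + \|u\|_1 + \|v\|_1).$$
As $\epsilon$ is arbitrary, $T(u \otimes v) = \phi(u,v)$, as required. QQQ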
(f) The argument of (e) ensured that $\|T\| \le \|\phi\|$. Because $\|u \otimes v\|_1 \le \|u\|_1\|v\|_1$ for all $u \in L^1(\mu)$ and $v \in L^1(\nu)$, $\|\phi(u,v)\| \le \|T\|\|u\|_1\|v\|_1$ for all $u$, $v$, and $\|\phi\| \le \|T\|$; so $\|T\| = \|\phi\|$.
(g) Thus $T$ has the required properties. To see that it is unique, we have only to observe that any bounded linear operator $S : L^1(\lambda) \to W$ such that $S(u \otimes v) = \phi(u,v)$ for all $u \in L^1(\mu)$, $v \in L^1(\nu)$ must agree with $T$ on objects of the form $\chi(E \times F)^{\bullet}$ where $E$ and $F$ are of finite measure, and therefore on every member of $M$; because $M$ is dense and both $S$ and $T$ are continuous, they agree everywhere in $L^1(\lambda)$.
253G The order structure of $L^1$ In 253F I have treated the $L^1$ spaces exclusively as normed linear spaces. In general, however, the order structure of an $L^1$ space (see 242C) is as important as its norm. The map $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$ respects the order structures of the three spaces in the following strong sense.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Then
(a) $u \otimes v \ge 0$ in $L^1(\lambda)$ whenever $u \ge 0$ in $L^1(\mu)$ and $v \ge 0$ in $L^1(\nu)$.
(b) The positive cone $\{w : w \ge 0\}$ of $L^1(\lambda)$ is precisely the closed convex hull $C$ of $\{u \otimes v : u \ge 0,\ v \ge 0\}$ in $L^1(\lambda)$.
*(c) Let $W$ be any Banach lattice, and $T : L^1(\lambda) \to W$ a bounded linear operator. Then the following are equiveridical:
(i) $Tw \ge 0$ in $W$ whenever $w \ge 0$ in $L^1(\lambda)$;
(ii) $T(u \otimes v) \ge 0$ in $W$ whenever $u \ge 0$ in $L^1(\mu)$ and $v \ge 0$ in $L^1(\nu)$.
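proof (a) If $u, v \ge 0$ then they are expressible as $f^{\bullet}$, $g^{\bullet}$ where $f \in \mathcal{L}^1(\mu)$, $g \in \mathcal{L}^1(\nu)$, $f \ge 0$ and $g \ge 0$. Now $f \otimes g \ge 0$ so $u \otimes v = (f \otimes g)^{\bullet} \ge 0$.
(b)(i) Write $L^1(\lambda)^+$ for $\{w : w \in L^1(\lambda),\ w \ge 0\}$. Then $L^1(\lambda)^+$ is a closed convex set in $L^1(\lambda)$ (242De); by (a), it contains $u \otimes v$ whenever $u \in L^1(\mu)^+$ and $v \in L^1(\nu)^+$, so it must include $C$.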
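(ii)($\alpha$) Of course $0 = 0 \otimes 0 \in C$. ($\beta$) If $u \in M$, as defined in the proof of 253F, and $u > 0$, then $u$ is expressible as $\sum_{j \le m, k \le l} a_{jk} \chi(G_j \times H_k)^{\bullet}$, where $G_0, \ldots, G_m$ and $H_0, \ldots, H_l$ are disjoint sequences of sets of finite measure, as in (a) of the proof of 253F. Now $a_{jk}$ can be negative only if $\chi(G_j \times H_k)^{\bullet} = 0$, so replacing every $a_{jk}$ by $\max(0, a_{jk})$ if necessary, we can suppose that $a_{jk} \ge 0$ for all $j$, $k$. Not all the $a_{jk}$ can be zero, so $a = \sum_{j \le m, k \le l} a_{jk} > 0$, and
$$u = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot a\chi(G_j \times H_k)^{\bullet} = \sum_{j \le m, k \le l} \frac{a_{jk}}{a} \cdot (a\chi G_j^{\bullet}) \otimes \chi H_k^{\bullet} \in C.$$
($\gamma$) If $w \in L^1(\lambda)^+$ and $\epsilon > 0$, express $w$ as $h^{\bullet}$ where $h \ge 0$ in $\mathcal{L}^1(\lambda)$. There is a simple function $h_1 \ge 0$ such that $h_1 \le_{\text{a.e.}} h$ and $\int h \le \int h_1 + \epsilon$. Express $h_1$ as $\sum_{i=0}^n a_i \chi H_i$ where $\lambda H_i < \infty$ and $a_i \ge 0$ for each $i$, and for each $i \le n$ choose sets $G_{i0}, \ldots, G_{im_i} \in \Sigma$, $F_{i0}, \ldots, F_{im_i} \in \mathrm{T}$, all of finite measure, such that $G_{i0}, \ldots, G_{im_i}$ are disjoint and $\lambda(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}) \le \dfrac{\epsilon}{(n+1)(a_i+1)}$, as in (d) of the proof of 253F. Set
$$w_0 = \sum_{i=0}^n a_i \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})^{\bullet}.$$
Then $w_0 \in C$ because $w_0 \in M$ and $w_0 \ge 0$. Also
$$\|w - w_0\|_1 \le \|w - h_1^{\bullet}\|_1 + \|h_1^{\bullet} - w_0\|_1$$
$$\le \int (h - h_1)\,d\lambda + \sum_{i=0}^n a_i \int \Bigl|\chi H_i - \sum_{j=0}^{m_i} \chi(G_{ij} \times F_{ij})\Bigr|\,d\lambda$$
$$\le \epsilon + \sum_{i=0}^n a_i \lambda\Bigl(H_i \triangle \bigcup_{j \le m_i} G_{ij} \times F_{ij}\Bigr) \le 2\epsilon.$$
As $\epsilon$ is arbitrary and $C$ is closed, $w \in C$. As $w$ is arbitrary, $L^1(\lambda)^+ \subseteq C$ and $C = L^1(\lambda)^+$.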
(c) Part (a) tells us that (i)$\Rightarrow$(ii). For the reverse implication, we need a fragment from the theory of Banach lattices: $W^+ = \{w : w \in W,\ w \ge 0\}$ is a closed set in $W$. PPP If $w, w' \in W$, then
$$w = (w - w') + w' \le |w - w'| + w' \le |w - w'| + |w'|,$$
$$-w = (w' - w) - w' \le |w - w'| - w' \le |w - w'| + |w'|,$$
$$|w| \le |w - w'| + |w'|, \quad |w| - |w'| \le |w - w'|,$$
because $|w| = w \vee (-w)$ and the order of $W$ is translation-invariant (241Ec). Similarly, $|w'| - |w| \le |w - w'|$ and $\bigl||w| - |w'|\bigr| \le |w - w'|$, so $\bigl\||w| - |w'|\bigr\| \le \|w - w'\|$, by the definition of Banach lattice (242G). Setting $\phi(w) = |w| - w$, we see that $\|\phi(w) - \phi(w')\| \le 2\|w - w'\|$ for all $w, w' \in W$, so that $\phi$ is continuous.
Now, because the order is invariant under multiplication by positive scalars,
$$w \ge 0 \iff 2w \ge 0 \iff w \ge -w \iff w = |w| \iff \phi(w) = 0,$$
so $W^+ = \{w : \phi(w) = 0\}$ is closed. QQQ
Now suppose that (ii) is true, and set $C_1 = \{w : w \in L^1(\lambda),\ Tw \ge 0\}$. Then $C_1$ contains $u \otimes v$ whenever $u, v \ge 0$; but also it is convex, because $T$ is linear, and closed, because $T$ is continuous and $C_1 = T^{-1}[W^+]$. By (b), $C_1$ includes $\{w : w \in L^1(\lambda),\ w \ge 0\}$, as required by (i).
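253H Conditional expectations The ideas of this section and the preceding one provide us with some of the most important examples of conditional expectations.
Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be complete probability spaces, with c.l.d. product $(X \times Y, \Lambda, \lambda)$. Set $\Lambda_1 = \{E \times Y : E \in \Sigma\}$. Then $\Lambda_1$ is a $\sigma$-subalgebra of $\Lambda$. Given a $\lambda$-integrable real-valued function $f$, set
$$g(x,y) = \int f(x,z)\,\nu(dz)$$
whenever $x \in X$, $y \in Y$ and the integral is defined in $\mathbb{R}$. Then $g$ is a conditional expectation of $f$ on $\Lambda_1$.
proof We know that $\Lambda_1 \subseteq \Lambda$, by 251Ia, and $\Lambda_1$ is a $\sigma$-algebra of sets because $\Sigma$ is. Fubini's theorem (252B, 252C) tells us that $f_1(x) = \int f(x,z)\,\nu(dz)$ is defined for almost every $x$, and therefore that $g = f_1 \otimes \chi Y$ is defined almost everywhere in $X \times Y$. $f_1$ is $\mu$-virtually measurable; because $\mu$ is complete, $f_1$ is $\Sigma$-measurable, so $g$ is $\Lambda_1$-measurable (since $\{(x,y) : g(x,y) \le \alpha\} = \{x : f_1(x) \le \alpha\} \times Y$ for every $\alpha \in \mathbb{R}$). Finally, if $W \in \Lambda_1$, then $W = E \times Y$ for some $E \in \Sigma$, so
$$\int_W g\,d\lambda = \int (f_1 \otimes \chi Y) \times \chi(E \times Y)\,d\lambda = \int f_1 \times \chi E\,d\mu \int \chi Y\,d\nu$$
(by 253D)
$$= \iint \chi E(x) f(x,y)\,\nu(dy)\,\mu(dx) = \int f \times \chi(E \times Y)\,d\lambda$$
(by Fubini's theorem)
$$= \int_W f\,d\lambda.$$
So $g$ is a conditional expectation of $f$.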
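For finite probability spaces the defining property can be verified directly. The Python sketch below assumes, for illustration only, that $X$ and $Y$ are finite with every subset measurable; the sample masses, the function `f` and all helper names are hypothetical. It forms $g(x,y) = \sum_z f(x,z)\nu\{z\}$, and checks that the integrals of $f$ and $g$ agree over every cylinder $E \times Y$.

```python
from fractions import Fraction as Fr
from itertools import combinations

mu = {0: Fr(1, 4), 1: Fr(3, 4)}                  # probability on X = {0, 1}
nu = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(1, 6)}     # probability on Y = {0, 1, 2}

# An arbitrary sample function on X x Y.
f_table = {(x, y): Fr(3 * x + y * y - 2) for x in mu for y in nu}


def f(x, y):
    return f_table[(x, y)]


def g(x, y):
    """Integrate out the second variable; the result does not depend on y."""
    return sum(f(x, z) * nu[z] for z in nu)


def integral_over(h, W):
    """Integral of h over W with respect to the product measure."""
    return sum(h(x, y) * mu[x] * nu[y] for (x, y) in W)


def cylinders():
    """All sets E x Y with E a subset of X, as lists of points."""
    xs = list(mu)
    for r in range(len(xs) + 1):
        for E in combinations(xs, r):
            yield [(x, y) for x in E for y in nu]


if __name__ == "__main__":
    for W in cylinders():
        print(len(W), integral_over(f, W), integral_over(g, W))
```

Since $g$ depends on $x$ alone, it is measurable for the algebra of cylinders, which is the other half of the definition.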
253I This is a convenient moment to set out a useful result on products of indefinite-integral measures.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $f \in \mathcal{L}^0(\mu)$, $g \in \mathcal{L}^0(\nu)$ non-negative functions. Let $\mu'$, $\nu'$ be the corresponding indefinite-integral measures (see §234). Let $\lambda$ be the c.l.d. product of $\mu$ and $\nu$, and $\lambda'$ the indefinite-integral measure defined from $\lambda$ and $f \otimes g \in \mathcal{L}^0(\lambda)$ (253Cb). Then $\lambda'$ is the c.l.d. product of $\mu'$ and $\nu'$.
proof Write $\theta$ for the c.l.d. product of $\mu'$ and $\nu'$.
(a) If we replace $\mu$ by its completion, we do not change $\mu'$ (234Ke); at the same time, we do not change $\lambda$, by 251T. The same applies to $\nu$. So it will be enough to prove the result on the assumption that $\mu$ and $\nu$ are complete; in which case $f$ and $g$ are measurable and have measurable domains.
Set $F = \{x : x \in \operatorname{dom} f,\ f(x) > 0\}$ and $G = \{y : y \in \operatorname{dom} g,\ g(y) > 0\}$, so that $F \times G = \{w : w \in \operatorname{dom}(f \otimes g),\ (f \otimes g)(w) > 0\}$. Then $F$ is $\mu'$-conegligible and $G$ is $\nu'$-conegligible, so $F \times G$ is $\theta$-conegligible as well as $\lambda'$-conegligible. Because both $\theta$ and $\lambda'$ are complete (251Ic, 234I), it will be enough to show that the subspace measures $\theta_{F \times G}$, $\lambda'_{F \times G}$ on $F \times G$ are equal. But note that $\theta_{F \times G}$ can be identified with the product of $\mu'_F$ and $\nu'_G$, where $\mu'_F$ and $\nu'_G$ are the subspace measures on $F$, $G$ respectively (251Q(ii-$\alpha$)). At the same time, $\mu'_F$ is the indefinite-integral measure defined from the subspace measure $\mu_F$ on $F$ and the function $f \upharpoonright F$, $\nu'_G$ is the indefinite-integral measure defined from the subspace measure $\nu_G$ on $G$ and $g \upharpoonright G$, and $\lambda'_{F \times G}$ is defined from the subspace measure $\lambda_{F \times G}$ and $(f \upharpoonright F) \otimes (g \upharpoonright G)$. Finally, by 251Q again, $\lambda_{F \times G}$ is the product of $\mu_F$ and $\nu_G$.
What all this means is that it will be enough to deal with the case in which $F = X$ and $G = Y$, that is, $f$ and $g$ are everywhere defined and strictly positive; which is what I will suppose from now on.
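(b) In this case $\operatorname{dom} \mu' = \Sigma$ and $\operatorname{dom} \nu' = \mathrm{T}$ (234La). Similarly, $\operatorname{dom} \lambda' = \Lambda$ is just the domain of $\lambda$. Set
$$F_n = \{x : x \in X,\ 2^{-n} \le f(x) \le 2^n\}, \quad G_n = \{y : y \in Y,\ 2^{-n} \le g(y) \le 2^n\}$$
for $n \in \mathbb{N}$.
(c) Set
$$\mathcal{A} = \{W : W \in \operatorname{dom} \theta \cap \operatorname{dom} \lambda',\ \theta(W) = \lambda'(W)\}.$$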
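If $\mu' E$ and $\nu' H$ are defined and finite, then $f \times \chi E$ and $g \times \chi H$ are integrable, so
$$\lambda'(E \times H) = \int (f \otimes g) \times \chi(E \times H)\,d\lambda = \int (f \times \chi E) \otimes (g \times \chi H)\,d\lambda$$
$$= \int f \times \chi E\,d\mu \cdot \int g \times \chi H\,d\nu = \theta(E \times H)$$
by 253D and 251Ia, that is, $E \times H \in \mathcal{A}$. If we now look at $\mathcal{A}_{EH} = \{W : W \subseteq X \times Y,\ W \cap (E \times H) \in \mathcal{A}\}$, then we see that
$\mathcal{A}_{EH}$ contains $E' \times H'$ for every $E' \in \Sigma$, $H' \in \mathrm{T}$,
if $\langle W_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence in $\mathcal{A}_{EH}$ then $\bigcup_{n \in \mathbb{N}} W_n \in \mathcal{A}_{EH}$,
if $W, W' \in \mathcal{A}_{EH}$ and $W \subseteq W'$ then $W' \setminus W \in \mathcal{A}_{EH}$.
Thus $\mathcal{A}_{EH}$ is a Dynkin class of subsets of $X \times Y$, and by the Monotone Class Theorem (136B) includes the $\sigma$-algebra generated by $\{E' \times H' : E' \in \Sigma,\ H' \in \mathrm{T}\}$, which is $\Sigma \widehat{\otimes} \mathrm{T}$.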
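(d) Now suppose that $W \in \Lambda$. In this case $W \in \operatorname{dom} \theta$ and $\theta W \le \lambda' W$. PPP Take $n \in \mathbb{N}$, and $E \in \Sigma$, $H \in \mathrm{T}$ such that $\mu' E$ and $\nu' H$ are both finite. Set $E' = E \cap F_n$, $H' = H \cap G_n$ and $W' = W \cap (E' \times H')$. Then $W' \in \Lambda$, while $\mu E' \le 2^n \mu' E$ and $\nu H' \le 2^n \nu' H$ are finite. By 251Ib there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W'$ and $\lambda V = \lambda W'$. Similarly, there is a $V' \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V' \subseteq (E' \times H') \setminus W'$ and $\lambda V' = \lambda((E' \times H') \setminus W')$. This means that $\lambda((E' \times H') \setminus (V \cup V')) = 0$, so $\lambda'((E' \times H') \setminus (V \cup V')) = 0$. But $(E' \times H') \setminus (V \cup V') \in \mathcal{A}$, by (c), so $\theta((E' \times H') \setminus (V \cup V')) = 0$ and $W' \in \operatorname{dom} \theta$, while
$$\theta W' = \theta V = \lambda' V \le \lambda' W.$$
Since $E$ and $H$ are arbitrary, $W \cap (F_n \times G_n) \in \operatorname{dom} \theta$ (251H) and $\theta(W \cap (F_n \times G_n)) \le \lambda' W$. Since $\langle F_n \rangle_{n \in \mathbb{N}}$, $\langle G_n \rangle_{n \in \mathbb{N}}$ are non-decreasing sequences with unions $X$, $Y$ respectively,
$$\theta W = \sup_{n \in \mathbb{N}} \theta(W \cap (F_n \times G_n)) \le \lambda' W.$$
QQQ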
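(e) In the same way, $\lambda' W$ is defined and less than or equal to $\theta W$ for every $W \in \operatorname{dom} \theta$. PPP The arguments are very similar, but a refinement seems to be necessary at the last stage. Take $n \in \mathbb{N}$, and $E \in \Sigma$, $H \in \mathrm{T}$ such that $\mu E$ and $\nu H$ are both finite. Set $E' = E \cap F_n$, $H' = H \cap G_n$ and $W' = W \cap (E' \times H')$. Then $W' \in \operatorname{dom} \theta$, while $\mu' E' \le 2^n \mu E$ and $\nu' H' \le 2^n \nu H$ are finite. This time, there are $V, V' \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W'$, $V' \subseteq (E' \times H') \setminus W'$, $\theta V = \theta W'$ and $\theta V' = \theta((E' \times H') \setminus W')$. Accordingly
$$\lambda' V + \lambda' V' = \theta V + \theta V' = \theta(E' \times H') = \lambda'(E' \times H'),$$
so that $\lambda' W'$ is defined and equal to $\theta W'$.
What this means is that $W \cap (F_n \times G_n) \cap (E \times H) \in \mathcal{A}$ whenever $\mu E$ and $\nu H$ are finite. So $W \cap (F_n \times G_n) \in \Lambda$, by 251H; as $n$ is arbitrary, $W \in \Lambda$ and $\lambda' W$ is defined.
??? Suppose, if possible, that $\lambda' W > \theta W$. Then there is some $n \in \mathbb{N}$ such that $\lambda'(W \cap (F_n \times G_n)) > \theta W$. Because $\lambda$ is semi-finite, 213B tells us that there is some $\lambda$-simple function $h$ such that $h \le (f \otimes g) \times \chi(W \cap (F_n \times G_n))$ and $\int h\,d\lambda > \theta W$; setting $V = \{(x,y) : h(x,y) > 0\}$, we see that $V \subseteq W \cap (F_n \times G_n)$, $\lambda V$ is defined and finite and $\lambda' V > \theta W$. Now there must be sets $E \in \Sigma$, $H \in \mathrm{T}$ such that $\mu E$ and $\nu H$ are both finite and $\lambda(V \setminus (E \times H)) < 4^{-n}(\lambda' V - \theta W)$. But in this case $V \in \operatorname{dom} \theta$ (by (d)), so we can apply the argument just above to $V$ and conclude that $V \cap (E \times H) = V \cap (F_n \times G_n) \cap (E \times H)$ belongs to $\mathcal{A}$. And now
$$\lambda' V = \lambda'(V \cap (E \times H)) + \lambda'(V \setminus (E \times H))$$
$$\le \theta(V \cap (E \times H)) + 4^n \lambda(V \setminus (E \times H)) < \theta V + \lambda' V - \theta W \le \lambda' V,$$
which is absurd. XXX
So $\lambda' W$ is defined and not greater than $\theta W$. QQQ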


(f) Putting this together with (d), we see that $\lambda' = \theta$, as claimed.
Remark If $\mu'$ and $\nu'$ are totally finite, so that they are 'truly continuous' with respect to $\mu$ and $\nu$ in the sense of 232Ab, then $f$ and $g$ are integrable, so $f \otimes g$ is $\lambda$-integrable, and $\theta = \lambda'$ is truly continuous with respect to $\lambda$.
The proof above can be simplified using fragments of the general theory of complete locally determined spaces, which will be given in §412 in Volume 4.
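*253J Upper integrals The idea of 253D can be repeated in terms of upper integrals, as follows.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, with c.l.d. product measure $\lambda$. Then for any functions $f$ and $g$, defined on conegligible subsets of $X$ and $Y$ respectively, and taking values in $[0, \infty]$,
$$\overline{\int} f \otimes g\,d\lambda = \overline{\int} f\,d\mu \cdot \overline{\int} g\,d\nu.$$
Remark Here $(f \otimes g)(x,y) = f(x)g(y)$ for all $x \in \operatorname{dom} f$ and $y \in \operatorname{dom} g$, taking $0 \cdot \infty = 0$, as in §135.
proof (a) I show first that $\overline{\int} f \otimes g \le \overline{\int} f\, \overline{\int} g$. PPP If $\overline{\int} f = 0$, then $f = 0$ a.e., so $f \otimes g = 0$ a.e. and the result is immediate. The same argument applies if $\overline{\int} g = 0$. If both $\overline{\int} f$ and $\overline{\int} g$ are non-zero, and either is infinite, the result is trivial. So let us suppose that both are finite. In this case there are integrable $f_0$, $g_0$ such that $f \le_{\text{a.e.}} f_0$, $g \le_{\text{a.e.}} g_0$, $\overline{\int} f = \int f_0$ and $\overline{\int} g = \int g_0$ (133Ja/135Ha). So $f \otimes g \le_{\text{a.e.}} f_0 \otimes g_0$, and
$$\overline{\int} f \otimes g \le \int f_0 \otimes g_0 = \int f_0 \int g_0 = \overline{\int} f\, \overline{\int} g,$$
by 253D. QQQ
(b) For the reverse inequality, we need consider only the case in which $\overline{\int} f \otimes g$ is finite, so that there is a $\lambda$-integrable function $h$ such that $f \otimes g \le_{\text{a.e.}} h$ and $\overline{\int} f \otimes g = \int h$. Set
$$f_0(x) = \int h(x,y)\,\nu(dy)$$
whenever this is defined in $\mathbb{R}$, which is almost everywhere, by Fubini's theorem (252B-252C). Then $f_0(x) \ge f(x) \overline{\int} g\,d\nu$ for every $x \in \operatorname{dom} f_0 \cap \operatorname{dom} f$, which is a conegligible set in $X$; so
$$\overline{\int} f \otimes g = \int h\,d\lambda = \int f_0\,d\mu \ge \overline{\int} f\, \overline{\int} g,$$
as required.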
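*253K A similar argument applies to upper integrals of sums, as follows.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be probability spaces, with c.l.d. product measure $\lambda$. Then for any real-valued functions $f$, $g$ defined on conegligible subsets of $X$, $Y$ respectively,
$$\overline{\int} f(x) + g(y)\ \lambda(d(x,y)) = \overline{\int} f(x)\,\mu(dx) + \overline{\int} g(y)\,\nu(dy),$$
at least when the right-hand side is defined in $[-\infty, \infty]$.
proof Set $h(x,y) = f(x) + g(y)$ for $x \in \operatorname{dom} f$ and $y \in \operatorname{dom} g$, so that $\operatorname{dom} h$ is $\lambda$-conegligible.
(a) As in 253J, I start by showing that $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. PPP If either $\overline{\int} f$ or $\overline{\int} g$ is $\infty$, this is trivial. Otherwise, take integrable functions $f_0$, $g_0$ such that $f \le_{\text{a.e.}} f_0$ and $g \le_{\text{a.e.}} g_0$. Set $h_0 = (f_0 \otimes \chi Y) + (\chi X \otimes g_0)$; then $h \le h_0$ $\lambda$-a.e., so
$$\overline{\int} h\,d\lambda \le \int h_0\,d\lambda = \int f_0\,d\mu + \int g_0\,d\nu.$$
As $f_0$, $g_0$ are arbitrary, $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. QQQ
(b) For the reverse inequality, suppose that $h \le h_0$ for $\lambda$-almost every $(x,y)$, where $h_0$ is $\lambda$-integrable. Set $f_0(x) = \int h_0(x,y)\,\nu(dy)$ whenever this is defined in $\mathbb{R}$. Then $f_0(x) \ge f(x) + \overline{\int} g\,d\nu$ whenever $x \in \operatorname{dom} f \cap \operatorname{dom} f_0$, so
$$\int h_0\,d\lambda = \int f_0\,d\mu \ge \overline{\int} f\,d\mu + \overline{\int} g\,d\nu.$$
As $h_0$ is arbitrary, $\overline{\int} h \ge \overline{\int} f + \overline{\int} g$, as required.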
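253L Complex spaces As usual, the ideas of 253F and 253H apply essentially unchanged to complex $L^1$ spaces. Writing $L^1_{\mathbb{C}}(\mu)$, etc., for the complex $L^1$ spaces involved, we have the following results. Throughout, let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$.
(a) If $f \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ and $g \in \mathcal{L}^0_{\mathbb{C}}(\nu)$ then $f \otimes g$, defined by the formula $(f \otimes g)(x,y) = f(x)g(y)$ for $x \in \operatorname{dom} f$ and $y \in \operatorname{dom} g$, belongs to $\mathcal{L}^0_{\mathbb{C}}(\lambda)$.
(b) If $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$ then $f \otimes g \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$ and $\int f \otimes g\,d\lambda = \int f\,d\mu \int g\,d\nu$.
(c) We have a bilinear operator $(u,v) \mapsto u \otimes v : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to L^1_{\mathbb{C}}(\lambda)$ defined by writing $f^{\bullet} \otimes g^{\bullet} = (f \otimes g)^{\bullet}$ for all $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$.
(d) If $W$ is any complex Banach space and $\phi : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to W$ is any bounded bilinear operator, then there is a unique bounded linear operator $T : L^1_{\mathbb{C}}(\lambda) \to W$ such that $T(u \otimes v) = \phi(u,v)$ for every $u \in L^1_{\mathbb{C}}(\mu)$ and $v \in L^1_{\mathbb{C}}(\nu)$, and $\|T\| = \|\phi\|$.
(e) If $\mu$ and $\nu$ are complete probability measures, and $\Lambda_1 = \{E \times Y : E \in \Sigma\}$, then for any $f \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$ we have a conditional expectation $g$ of $f$ on $\Lambda_1$ given by setting $g(x,y) = \int f(x,z)\,\nu(dz)$ whenever this is defined.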
253X Basic exercises >>>(a) Let $U$, $V$ and $W$ be linear spaces. Show that the set of bilinear operators from $U \times V$ to $W$ has a natural linear structure agreeing with those of $L(U; L(V;W))$ and $L(V; L(U;W))$, writing $L(U;W)$ for the linear space of linear operators from $U$ to $W$.
>>>(b) Let $U$, $V$ and $W$ be normed spaces. (i) Show that for a bilinear operator $\phi : U \times V \to W$ the following are equiveridical: ($\alpha$) $\phi$ is bounded in the sense of 253Ab; ($\beta$) $\phi$ is continuous; ($\gamma$) $\phi$ is continuous at some point of $U \times V$. (ii) Show that the space of bounded bilinear operators from $U \times V$ to $W$ is a linear subspace of the space of all bilinear operators from $U \times V$ to $W$, and that the functional $\|\ \|$ defined in 253Ab is a norm, agreeing with the norms of $B(U; B(V;W))$ and $B(V; B(U;W))$, writing $B(U;W)$ for the normed space of bounded linear operators from $U$ to $W$.
(c) Let $(X_1,\Sigma_1,\mu_1),\ldots,(X_n,\Sigma_n,\mu_n)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X_1 \times \ldots \times X_n$, as described in 251W. Let $W$ be a Banach space, and suppose that $\phi : L^1(\mu_1) \times \ldots \times L^1(\mu_n) \to W$ is multilinear (that is, linear in each variable separately) and bounded (that is, $\|\phi\| = \sup\{\|\phi(u_1,\ldots,u_n)\| : \|u_i\|_1 \le 1\ \forall\, i \le n\} < \infty$). Show that there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that $T\otimes = \phi$, where $\otimes : L^1(\mu_1) \times \ldots \times L^1(\mu_n) \to L^1(\lambda)$ is a canonical multilinear operator (to be defined).
(d) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Show that if $A \subseteq L^1(\mu)$ and $B \subseteq L^1(\nu)$ are both uniformly integrable, then $\{u \otimes v : u \in A,\ v \in B\}$ is uniformly integrable in $L^1(\lambda)$.
>>>(e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Show that
(i) we have a bilinear operator $(u,v) \mapsto u \otimes v : L^0(\mu) \times L^0(\nu) \to L^0(\lambda)$ given by setting $f^\bullet \otimes g^\bullet = (f \otimes g)^\bullet$ for all $f \in \mathcal{L}^0(\mu)$ and $g \in \mathcal{L}^0(\nu)$;
(ii) if $1 \le p \le \infty$ then $u \otimes v \in L^p(\lambda)$ and $\|u \otimes v\|_p = \|u\|_p \|v\|_p$ for all $u \in L^p(\mu)$ and $v \in L^p(\nu)$;
(iii) if $u, u' \in L^2(\mu)$ and $v, v' \in L^2(\nu)$ then the inner product $(u \otimes v | u' \otimes v')$, taken in $L^2(\lambda)$, is just $(u|u')(v|v')$;
(iv) the map $(u,v) \mapsto u \otimes v : L^0(\mu) \times L^0(\nu) \to L^0(\lambda)$ is continuous if $L^0(\mu)$, $L^0(\nu)$ and $L^0(\lambda)$ are all given their topologies of convergence in measure.
(f) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. Show that if $u_0,\ldots,u_n$ are linearly independent members of $L^0(\mu)$ and $v_0,\ldots,v_n \in L^0(\nu)$ are not all $0$, then $\sum_{i=0}^n u_i \otimes v_i \ne 0$ in $L^0(\lambda)$. (Hint: start by finding sets $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure such that $u_0 \times \chi E^\bullet,\ldots,u_n \times \chi E^\bullet$ are linearly independent and $v_0 \times \chi F^\bullet,\ldots,v_n \times \chi F^\bullet$ are not all $0$.)
(g) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. If $U$, $V$ are linear subspaces of $L^0(\mu)$ and $L^0(\nu)$ respectively, write $U \otimes V$ for the linear subspace of $L^0(\lambda)$ generated by $\{u \otimes v : u \in U,\ v \in V\}$. Show that if $W$ is any linear space and $\phi : U \times V \to W$ is a bilinear operator, there is a unique linear operator $T : U \otimes V \to W$ such that $T(u \otimes v) = \phi(u,v)$ for all $u \in U$, $v \in V$. (Hint: start by showing that if $u_0,\ldots,u_n \in U$ and $v_0,\ldots,v_n \in V$ are such that $\sum_{i=0}^n u_i \otimes v_i = 0$, then $\sum_{i=0}^n \phi(u_i,v_i) = 0$; do this by expressing the $u_i$ as linear combinations of some linearly independent family and applying 253Xf.)
>>>(h) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete probability spaces, with c.l.d. product measure $\lambda$. Suppose that $p \in [1,\infty]$ and that $f \in \mathcal{L}^p(\lambda)$. Set $g(x) = \int f(x,y)\,\nu(dy)$ whenever this is defined. Show that $g \in \mathcal{L}^p(\mu)$ and that $\|g\|_p \le \|f\|_p$. (Hint: 253H, 244M.)
(i) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product measure $\lambda$, and $p \in [1,\infty[$. Show that $\{w : w \in L^p(\lambda),\ w \ge 0\}$ is the closed convex hull in $L^p(\lambda)$ of $\{u \otimes v : u \in L^p(\mu),\ v \in L^p(\nu),\ u \ge 0,\ v \ge 0\}$ (see 253Xe(ii) above).
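253Y Further exercises (a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. Show that if $f \in \mathcal{L}^0(\mu)$ and $g \in \mathcal{L}^0(\nu)$, then $f \otimes g \in \mathcal{L}^0(\lambda_0)$.
(b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. Show that if $f \in \mathcal{L}^1(\mu)$ and $g \in \mathcal{L}^1(\nu)$, then $f \otimes g \in \mathcal{L}^1(\lambda_0)$ and $\int f \otimes g\,d\lambda_0 = \int f\,d\mu \int g\,d\nu$.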
253 Notes Tensor products 237
(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$, $\lambda$ the primitive and c.l.d. product measures on $X \times Y$. Show that the embedding $\mathcal{L}^1(\lambda_0) \subseteq \mathcal{L}^1(\lambda)$ induces a Banach lattice isomorphism between $L^1(\lambda_0)$ and $L^1(\lambda)$.
(d) Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ be strictly localizable measure spaces, with c.l.d. product measure $\lambda$. Show that $L^\infty(\lambda)$ can be identified with $L^1(\lambda)^*$. Show that under this identification $\{w : w \in L^\infty(\lambda),\ w \ge 0\}$ is the weak*-closed convex hull of $\{u \otimes v : u \in L^\infty(\mu),\ v \in L^\infty(\nu),\ u \ge 0,\ v \ge 0\}$.
(e) Find a version of 253J valid when one of $\mu$, $\nu$ is not $\sigma$-finite.
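(f) Let $(X,\Sigma,\mu)$ be any measure space and $V$ any Banach space. Write $\mathcal{L}^1_V = \mathcal{L}^1_V(\mu)$ for the set of functions $f$ such that ($\alpha$) $\operatorname{dom} f$ is a conegligible subset of $X$ ($\beta$) $f$ takes values in $V$ ($\gamma$) there is a conegligible set $D \subseteq \operatorname{dom} f$ such that $f[D]$ is separable and $D \cap f^{-1}[G] \in \Sigma$ for every open set $G \subseteq V$ ($\delta$) the integral $\int \|f(x)\|\,\mu(dx)$ is finite. (These are the Bochner integrable functions from $X$ to $V$.) For $f, g \in \mathcal{L}^1_V$ write $f \sim g$ if $f = g$ $\mu$-a.e.; let $L^1_V$ be the set of equivalence classes in $\mathcal{L}^1_V$ under $\sim$. Show that
(i) $f + g$, $cf \in \mathcal{L}^1_V$ for all $f, g \in \mathcal{L}^1_V$, $c \in \mathbb{R}$;
(ii) $L^1_V$ has a natural linear space structure, defined by writing $f^\bullet + g^\bullet = (f+g)^\bullet$, $cf^\bullet = (cf)^\bullet$ for $f, g \in \mathcal{L}^1_V$ and $c \in \mathbb{R}$;
(iii) $L^1_V$ has a norm $\|\ \|$, defined by writing $\|f^\bullet\| = \int \|f(x)\|\,\mu(dx)$ for $f \in \mathcal{L}^1_V$;
(iv) $L^1_V$ is a Banach space under this norm;
(v) there is a natural map $\otimes : \mathcal{L}^1 \times V \to \mathcal{L}^1_V$ defined by writing $(f \otimes v)(x) = f(x)v$ when $f \in \mathcal{L}^1 = \mathcal{L}^1_{\mathbb{R}}(\mu)$, $v \in V$ and $x \in \operatorname{dom} f$;
(vi) there is a canonical bilinear operator $\otimes : L^1 \times V \to L^1_V$ defined by writing $f^\bullet \otimes v = (f \otimes v)^\bullet$ for $f \in \mathcal{L}^1$ and $v \in V$;
(vii) whenever $W$ is a Banach space and $\phi : L^1 \times V \to W$ is a bounded bilinear operator, there is a unique bounded linear operator $T : L^1_V \to W$ such that $T(u \otimes v) = \phi(u,v)$ for all $u \in L^1$ and $v \in V$, and $\|T\| = \|\phi\|$. (When $W = V$ and $\phi(u,v) = (\int u)v$ for $u \in L^1$ and $v \in V$, $Tf^\bullet$ is called the Bochner integral of $f$.)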
(g) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$. If $f$ is a $\lambda_0$-integrable function, write $f_x(y) = f(x,y)$ whenever this is defined. Show that we have a map $x \mapsto f_x^\bullet$ from a conegligible subset $D_0$ of $X$ to $L^1(\nu)$. Show that this map is a Bochner integrable function, as defined in 253Yf, and that its Bochner integral is $\int f\,d\lambda_0$.
(h) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and suppose that $\phi$ is a function from $X$ to a separable subset of $L^1(\nu)$ which is measurable in the sense that $\phi^{-1}[G] \in \Sigma$ for every open $G \subseteq L^1(\nu)$. Show that there is a $\Lambda$-measurable function $f$ from $X \times Y$ to $\mathbb{R}$, where $\Lambda$ is the domain of the c.l.d. product measure on $X \times Y$, such that $\phi(x) = f_x^\bullet$ for every $x \in X$, writing $f_x(y) = f(x,y)$ for $x \in X$, $y \in Y$.
(i) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. Show that 253Yg provides a canonical identification between $L^1(\lambda)$ and $L^1_{L^1(\nu)}(\mu)$.
(j) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete locally determined measure spaces, with c.l.d. product measure $\lambda$. (i) Suppose that $K \in \mathcal{L}^2(\lambda)$, $f \in \mathcal{L}^2(\mu)$. Show that $h(y) = \int K(x,y)f(x)\,dx$ is defined for almost all $y \in Y$ and that $h \in \mathcal{L}^2(\nu)$. (Hint: to see that $h$ is defined a.e., consider $\int_{E \times F} K(x,y)f(x)\,d(x,y)$ for $\mu E$, $\nu F < \infty$; to see that $h \in \mathcal{L}^2$ consider $\int h \times g$ where $g \in \mathcal{L}^2(\nu)$.) (ii) Show that the map $f \mapsto h$ corresponds to a bounded linear operator $T_K : L^2(\mu) \to L^2(\nu)$. (iii) Show that the map $K \mapsto T_K$ corresponds to a bounded linear operator, of norm at most 1, from $L^2(\lambda)$ to $\mathrm{B}(L^2(\mu);L^2(\nu))$.
(k) Suppose that $p, q \in [1,\infty]$ and that $\frac{1}{p} + \frac{1}{q} = 1$, interpreting $\frac{1}{\infty}$ as $0$ as usual. Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ be complete locally determined measure spaces with c.l.d. product measure $\lambda$. Show that the ideas of 253Yj can be used to define a bounded linear operator, of norm at most 1, from $L^p(\lambda)$ to $\mathrm{B}(L^q(\mu);L^p(\nu))$.
(l) In 253Xc, suppose that $W$ is a Banach lattice. Show that the following are equiveridical: (i) $Tu \ge 0$ whenever $u \in L^1(\lambda)^+$; (ii) $\phi(u_1,\ldots,u_n) \ge 0$ whenever $u_i \ge 0$ in $L^1(\mu_i)$ for each $i \le n$.
253 Notes and comments Throughout the main arguments of this section, I have written the results in terms of
the c.l.d. product measure; of course the isomorphism noted in 253Yc means that they could just as well have been
expressed in terms of the primitive product measure. The more restricted notion of integrability with respect to the
primitive product measure is indeed the one appropriate for the ideas of 253Yg.
Theorem 253F is a 'universal mapping theorem'; it asserts that every bounded bilinear operator on $L^1(\mu) \times L^1(\nu)$ factors through $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$, at least if the range space is a Banach space. It is easy to see that this property defines the pair $(L^1(\lambda), \otimes)$ up to Banach space isomorphism, in the following sense: if $V$ is a Banach space, and $\psi : L^1(\mu) \times L^1(\nu) \to V$ is a bounded bilinear operator such that for every bounded bilinear operator $\phi$ from $L^1(\mu) \times L^1(\nu)$ to any Banach space $W$ there is a unique bounded linear operator $T : V \to W$ such that $T\psi = \phi$ and $\|T\| = \|\phi\|$, then there is an isometric Banach space isomorphism $S : L^1(\lambda) \to V$ such that $S\otimes = \psi$. There is of course a general theory of bilinear operators between Banach spaces; in the language of this theory, $L^1(\lambda)$ is, or is isomorphic to, the 'projective tensor product' of $L^1(\mu)$ and $L^1(\nu)$. For an introduction to this subject, see Defant & Floret 93, §I.3, or Semadeni 71, §20. I should perhaps emphasise, for the sake of those who have not encountered tensor products before, that this theorem is special to $L^1$ spaces. While some of the same ideas can be applied to other function spaces (see 253Xe-253Xg), there is no other class to which 253F applies.
There is also a theory of tensor products of Banach lattices, for which I do not think we are quite ready (it needs general ideas about ordered linear spaces for which I mean to wait until Chapter 35 in the next volume). However 253G shows that the ordering, and therefore the Banach lattice structure, of $L^1(\lambda)$ is determined by the ordering of $L^1(\mu)$ and $L^1(\nu)$ and the map $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$.
The conditional expectation operators described in 253H are of very great importance, largely because in this special context we have a realization of the conditional expectation operator as a function $P_0$ from $\mathcal{L}^1(\lambda)$ to $\mathcal{L}^1(\lambda \upharpoonright \Lambda_1)$, not just as a function from $L^1(\lambda)$ to $L^1(\lambda \upharpoonright \Lambda_1)$, as in 242J. As described here, $P_0(f + f')$ need not be equal, in the strict sense, to $P_0 f + P_0 f'$; it can have a larger domain. In applications, however, one might be willing to restrict attention to the linear space $U$ of bounded $\Sigma \widehat{\otimes} \mathrm{T}$-measurable functions defined everywhere on $X \times Y$, so that $P_0$ becomes an operator from $U$ to itself (see 252P).
254 Infinite products
I come now to the second basic idea of this chapter: the description of a product measure on the product of a (possibly large) family of probability spaces. The section begins with a construction on similar lines to that of §251 (254A-254F) and its defining property in terms of inverse-measure-preserving functions (254G). I discuss the usual measure on $\{0,1\}^I$ (254J-254K), subspace measures (254L) and various properties of subproducts (254M-254T), including a study of the associated conditional expectation operators (254R-254T).
254A Definitions (a) Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be a family of probability spaces. Set $X = \prod_{i \in I} X_i$, the family of functions $x$ with domain $I$ such that $x(i) \in X_i$ for every $i \in I$. In this context, I will say that a measurable cylinder is a subset of $X$ expressible in the form
$$C = \prod_{i \in I} C_i,$$
where $C_i \in \Sigma_i$ for every $i \in I$ and $\{i : C_i \ne X_i\}$ is finite. Note that for a non-empty $C \subseteq X$ this expression is unique. PPP Suppose that $C = \prod_{i \in I} C_i = \prod_{i \in I} C'_i$. For each $i \in I$ set
$$D_i = \{x(i) : x \in C\}.$$
Of course $D_i \subseteq C_i$. Because $C \ne \emptyset$, we can fix on some $z \in C$. If $i \in I$ and $\xi \in C_i$, consider $x \in X$ defined by setting
$$x(i) = \xi, \quad x(j) = z(j) \text{ for } j \ne i;$$
then $x \in C$ so $\xi = x(i) \in D_i$. Thus $D_i = C_i$ for $i \in I$. Similarly, $D_i = C'_i$. QQQ
(b) We can therefore define a functional $\theta_0 : \mathcal{C} \to [0,1]$, where $\mathcal{C}$ is the set of measurable cylinders, by setting
$$\theta_0 C = \prod_{i \in I} \mu_i C_i$$
whenever $C_i \in \Sigma_i$ for every $i \in I$ and $\{i : C_i \ne X_i\}$ is finite, noting that only finitely many terms in the product can differ from 1, so that it can safely be treated as a finite product. If $C = \emptyset$, one of the $C_i$ must be empty, so $\theta_0 C$ is surely $0$, even though the expression of $C$ as $\prod_{i \in I} C_i$ is no longer unique.
(c) Now define $\theta : \mathcal{P}X \to [0,1]$ by setting
$$\theta A = \inf\Big\{\sum_{n=0}^\infty \theta_0 C_n : C_n \in \mathcal{C} \text{ for every } n \in \mathbb{N},\ A \subseteq \bigcup_{n \in \mathbb{N}} C_n\Big\}.$$
254B Lemma The functional $\theta$ defined in 254Ac is always an outer measure on $X$.
proof Use exactly the same arguments as those in 251B above.
254F Innite products 239
254C Definition Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be any indexed family of probability spaces, and $X$ the Cartesian product $\prod_{i \in I} X_i$. The product measure on $X$ is the measure defined by Carathéodory's method (113C) from the outer measure $\theta$ defined in 254A.
254D Remarks (a) In 254Ab, I asserted that if $C \in \mathcal{C}$ and no $C_i$ is empty, then nor is $C = \prod_{i \in I} C_i$. This is the 'Axiom of Choice': the product of any family $\langle C_i \rangle_{i \in I}$ of non-empty sets is non-empty, that is, there is a 'choice function' $x$ with domain $I$ picking out a distinguished member $x(i)$ of each $C_i$. In this volume I have not attempted to be scrupulous in indicating uses of the axiom of choice. In fact the use here is not an absolutely vital one; I mean, the theory of infinite products, even uncountable products, of probability spaces does not change character completely in the absence of the full axiom of choice (provided, that is, that we allow ourselves to use the countable axiom of choice). The point is that all we really need, in the present context, is that $X = \prod_{i \in I} X_i$ should be non-empty; and in many contexts we can prove this, for the particular cases of interest, without using the axiom of choice, by actually exhibiting a member of $X$. The simplest case in which this is difficult is when the $X_i$ are uncontrolled Borel subsets of $[0,1]$; and even then, if they are presented with coherent descriptions, we may, with appropriate labour, be able to construct a member of $X$. But clearly such a process is liable to slow us down a good deal, and for the moment I think there is no great virtue in taking so much trouble.
(b) I have given this section the title 'infinite products', but it is useful to be able to apply the ideas to finite $I$; I should mention in particular the cases $\#(I) \le 2$.
(i) If $I = \emptyset$, $X$ consists of the unique function with domain $I$, the empty function. If we identify a function with its graph, then $X$ is actually $\{\emptyset\}$; in any case, $X$ is to be a singleton set, with $\lambda X = 1$.
(ii) If $I$ is a singleton $\{i\}$, then we can identify $X$ with $X_i$; $\mathcal{C}$ becomes identified with $\Sigma_i$ and $\theta_0$ with $\mu_i$, so that $\theta$ can be identified with $\mu_i^*$ and the 'product measure' becomes the measure on $X_i$ defined from $\mu_i^*$, that is, the completion of $\mu_i$ (213Xa(iv)).
(iii) If $I$ is a doubleton $\{i,j\}$, then we can identify $X$ with $X_i \times X_j$; in this case the definitions of 254A and 254C match exactly with those of 251A and 251C, so that $\lambda$ here can be identified with the primitive product measure as defined in 251C. Because $\mu_i$ and $\mu_j$ are both totally finite, this agrees with the c.l.d. product measure of 251F.
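254E Definition Let $\langle X_i \rangle_{i \in I}$ be any family of sets, and $X = \prod_{i \in I} X_i$. If $\Sigma_i$ is a $\sigma$-subalgebra of subsets of $X_i$ for each $i \in I$, I write $\widehat{\bigotimes}_{i \in I} \Sigma_i$ for the $\sigma$-algebra of subsets of $X$ generated by
$$\{\{x : x \in X,\ x(i) \in E\} : i \in I,\ E \in \Sigma_i\}.$$
(Compare 251D.)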
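254F Theorem Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be a family of probability spaces, and let $\lambda$ be the product measure on $X = \prod_{i \in I} X_i$ defined as in 254C; let $\Lambda$ be its domain.
(a) $\lambda X = 1$.
(b) If $E_i \in \Sigma_i$ for every $i \in I$, and $\{i : E_i \ne X_i\}$ is countable, then $\prod_{i \in I} E_i \in \Lambda$, and $\lambda(\prod_{i \in I} E_i) = \prod_{i \in I} \mu_i E_i$. In particular, $\lambda C = \theta_0 C$ for every measurable cylinder $C$, as defined in 254A, and if $j \in I$ then $x \mapsto x(j) : X \to X_j$ is inverse-measure-preserving.
(c) $\widehat{\bigotimes}_{i \in I} \Sigma_i \subseteq \Lambda$.
(d) $\lambda$ is complete.
(e) For every $W \in \Lambda$ and $\epsilon > 0$ there is a finite family $C_0,\ldots,C_n$ of measurable cylinders such that $\lambda(W \triangle \bigcup_{k \le n} C_k) \le \epsilon$.
(f) For every $W \in \Lambda$ there are $W_1, W_2 \in \widehat{\bigotimes}_{i \in I} \Sigma_i$ such that $W_1 \subseteq W \subseteq W_2$ and $\lambda(W_2 \setminus W_1) = 0$.
Remark Perhaps I should pause to interpret the product $\prod_{i \in I} \mu_i E_i$. Because all the $\mu_i E_i$ belong to $[0,1]$, this is simply $\inf_{J \subseteq I,\ J \text{ is finite}} \prod_{i \in J} \mu_i E_i$, taking the empty product to be 1.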
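The interpretation of the product as an infimum over finite subproducts can be checked numerically on a small index set. The sketch below is only an illustration under stated assumptions: the index set has six elements and the numbers `values[i]`, standing in for $\mu_i E_i$, are hypothetical. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Hypothetical values mu_i E_i in [0, 1] for a six-element index set I.
values = [1 - Fraction(1, 2 ** (i + 2)) for i in range(6)]

def product_over(indices):
    """Product of values[i] for i in indices; the empty product is 1."""
    return prod((values[i] for i in indices), start=Fraction(1))

# All finite subsets J of I, and the infimum of the corresponding subproducts.
all_subsets = [J for r in range(len(values) + 1)
               for J in combinations(range(len(values)), r)]
infimum = min(product_over(J) for J in all_subsets)

# For a finite I the infimum is attained at J = I.
full_product = product_over(range(len(values)))
```

Because every factor lies in $[0,1]$, enlarging $J$ can only decrease the subproduct, which is why the infimum coincides with the full product here.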
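proof Throughout this proof, define $\mathcal{C}$, $\theta_0$ and $\theta$ as in 254A. I will write out an argument which applies to finite $I$ as well as infinite $I$, but you may reasonably prefer to assume that $I$ is infinite on first reading.
(a) Of course $\lambda X = \theta X$, so I have to show that $\theta X = 1$. Because $X, \emptyset \in \mathcal{C}$ and $\theta_0 X = \prod_{i \in I} \mu_i X_i = 1$ and $\theta_0 \emptyset = 0$,
$$\theta X \le \theta_0 X + \theta_0 \emptyset + \ldots = 1.$$
I therefore have to show that $\theta X \ge 1$. ??? Suppose, if possible, otherwise.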
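(i) There is a sequence $\langle C_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{C}$, covering $X$, such that $\sum_{n=0}^\infty \theta_0 C_n < 1$. For each $n \in \mathbb{N}$, express $C_n$ as $\{x : x(i) \in E_{ni}\ \forall\, i \in I\}$, where every $E_{ni} \in \Sigma_i$ and $J_n = \{i : E_{ni} \ne X_i\}$ is finite. No $J_n$ can be empty, because $\theta_0 C_n < 1 = \theta_0 X$; set $J = \bigcup_{n \in \mathbb{N}} J_n$. Then $J$ is a countable non-empty subset of $I$. Set $K = \mathbb{N}$ if $J$ is infinite, $\{k : 0 \le k < \#(J)\}$ if $J$ is finite; let $k \mapsto i_k : K \to J$ be a bijection.
For each $k \in K$, set $L_k = \{i_j : j < k\} \subseteq J$, and set $\alpha_{nk} = \prod_{i \in I \setminus L_k} \mu_i E_{ni}$ for $n \in \mathbb{N}$, $k \in K$. If $J$ is finite, then we can identify $L_{\#(J)}$ with $J$, and set $\alpha_{n,\#(J)} = 1$ for every $n$. We have $\alpha_{n0} = \theta_0 C_n$ for each $n$, so $\sum_{n=0}^\infty \alpha_{n0} < 1$. For $n \in \mathbb{N}$, $k \in K$ and $t \in X_{i_k}$ set
$$f_{nk}(t) = \alpha_{n,k+1} \text{ if } t \in E_{n,i_k}, \qquad f_{nk}(t) = 0 \text{ otherwise}.$$
Then
$$\int f_{nk}\,d\mu_{i_k} = \alpha_{n,k+1}\,\mu_{i_k} E_{n,i_k} = \alpha_{nk}.$$
(ii) Choose $t_k \in X_{i_k}$ inductively, for $k \in K$, as follows. The inductive hypothesis will be that $\sum_{n \in M_k} \alpha_{nk} < 1$, where $M_k = \{n : n \in \mathbb{N},\ t_j \in E_{n,i_j}\ \forall\, j < k\}$; of course $M_0 = \mathbb{N}$, so the induction starts. Given that
$$1 > \sum_{n \in M_k} \alpha_{nk} = \sum_{n \in M_k} \int f_{nk}\,d\mu_{i_k} = \int \Big(\sum_{n \in M_k} f_{nk}\Big)\,d\mu_{i_k}$$
(by B. Levi's theorem), there must be a $t_k \in X_{i_k}$ such that $\sum_{n \in M_k} f_{nk}(t_k) < 1$. Now for such a choice of $t_k$, $\alpha_{n,k+1} = f_{nk}(t_k)$ for every $n \in M_{k+1}$, so that $\sum_{n \in M_{k+1}} \alpha_{n,k+1} < 1$, and the induction continues, unless $J$ is finite and $k + 1 = \#(J)$. In this last case we must just have $M_{\#(J)} = \emptyset$, because $\alpha_{n,\#(J)} = 1$ for every $n$.
(iii) If $J$ is infinite, we obtain a full sequence $\langle t_k \rangle_{k \in \mathbb{N}}$; if $J$ is finite, we obtain just a finite sequence $\langle t_k \rangle_{k < \#(J)}$. In either case, there is an $x \in X$ such that $x(i_k) = t_k$ for each $k \in K$. Now there must be some $m \in \mathbb{N}$ such that $x \in C_m$. Because $J_m = \{i : E_{mi} \ne X_i\}$ is finite, there is a $k \in \mathbb{N}$ such that $J_m \subseteq L_k$ (allowing $k = \#(J)$ if $J$ is finite). Now $m \in M_k$, so in fact we cannot have $k = \#(J)$, and $\alpha_{mk} = 1$, so $\sum_{n \in M_k} \alpha_{nk} \ge 1$, contrary to the inductive hypothesis. XXX
This contradiction shows that $\theta X = 1$.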
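(b)(i) I take the particular case first. Let $j \in I$ and $E \in \Sigma_j$, and let $C \in \mathcal{C}$; set $W = \{x : x \in X,\ x(j) \in E\}$; then $C \cap W$ and $C \setminus W$ both belong to $\mathcal{C}$, and $\theta_0 C = \theta_0(C \cap W) + \theta_0(C \setminus W)$. PPP If $C = \prod_{i \in I} C_i$, where $C_i \in \Sigma_i$ for each $i$, then $C \cap W = \prod_{i \in I} C'_i$, where $C'_i = C_i$ if $i \ne j$, and $C'_j = C_j \cap E$; similarly, $C \setminus W = \prod_{i \in I} C''_i$, where $C''_i = C_i$ if $i \ne j$, and $C''_j = C_j \setminus E$. So both belong to $\mathcal{C}$, and
$$\theta_0(C \cap W) + \theta_0(C \setminus W) = \big(\mu_j(C_j \cap E) + \mu_j(C_j \setminus E)\big) \prod_{i \ne j} \mu_i C_i = \prod_{i \in I} \mu_i C_i = \theta_0 C. \text{ QQQ}$$
(ii) Now suppose that $A \subseteq X$ is any set, and $\epsilon > 0$. Then there is a sequence $\langle C_n \rangle_{n \in \mathbb{N}}$ in $\mathcal{C}$ such that $A \subseteq \bigcup_{n \in \mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta A + \epsilon$. In this case
$$A \cap W \subseteq \bigcup_{n \in \mathbb{N}} C_n \cap W, \quad A \setminus W \subseteq \bigcup_{n \in \mathbb{N}} C_n \setminus W,$$
so
$$\theta(A \cap W) \le \sum_{n=0}^\infty \theta_0(C_n \cap W), \quad \theta(A \setminus W) \le \sum_{n=0}^\infty \theta_0(C_n \setminus W),$$
and
$$\theta(A \cap W) + \theta(A \setminus W) \le \sum_{n=0}^\infty \theta_0(C_n \cap W) + \theta_0(C_n \setminus W) = \sum_{n=0}^\infty \theta_0 C_n \le \theta A + \epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A \cap W) + \theta(A \setminus W) \le \theta A$; as $A$ is arbitrary, $W \in \Lambda$.
(iii) I show next that if $J \subseteq I$ is finite and $C_i \in \Sigma_i$ for each $i \in J$, and $C = \{x : x \in X,\ x(i) \in C_i\ \forall\, i \in J\}$, then $C \in \Lambda$ and $\lambda C = \prod_{i \in J} \mu_i C_i$. PPP Induce on $\#(J)$. If $\#(J) = 0$, that is, $J = \emptyset$, then $C = X$ and this is part (a). For the inductive step to $\#(J) = n + 1$, take any $j \in J$ and set $J' = J \setminus \{j\}$,
$$C' = \{x : x \in X,\ x(i) \in C_i\ \forall\, i \in J'\},$$
$$C'' = C' \setminus C = \{x : x \in C',\ x(j) \in X_j \setminus C_j\}.$$
Then $C$, $C'$, $C''$ all belong to $\mathcal{C}$, and $\theta_0 C' = \prod_{i \in J'} \mu_i C_i = \alpha$ say, $\theta_0 C = \alpha \mu_j C_j$, $\theta_0 C'' = \alpha(1 - \mu_j C_j)$. Moreover, by the inductive hypothesis, $C' \in \Lambda$ and $\alpha = \lambda C' = \theta C'$. So $C = C' \cap \{x : x(j) \in C_j\} \in \Lambda$ by (ii), and $C'' = C' \setminus C \in \Lambda$.
We surely have $\lambda C = \theta C \le \theta_0 C$, $\lambda C'' \le \theta_0 C''$; but also
$$\alpha = \lambda C' = \lambda C + \lambda C'' \le \theta_0 C + \theta_0 C'' = \alpha,$$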
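so in fact
$$\lambda C = \theta_0 C = \alpha \mu_j C_j = \prod_{i \in J} \mu_i C_i,$$
and the induction proceeds. QQQ
(iv) Now let us return to the general case of a set $W$ of the form $\prod_{i \in I} E_i$ where $E_i \in \Sigma_i$ for each $i$, and $K = \{i : E_i \ne X_i\}$ is countable. If $K$ is finite then $W = \{x : x(i) \in E_i\ \forall\, i \in K\}$ so $W \in \Lambda$ and
$$\lambda W = \prod_{i \in K} \mu_i E_i = \prod_{i \in I} \mu_i E_i.$$
Otherwise, let $\langle i_n \rangle_{n \in \mathbb{N}}$ be an enumeration of $K$. For each $n \in \mathbb{N}$ set $W_n = \{x : x \in X,\ x(i_k) \in E_{i_k}\ \forall\, k \le n\}$; then we know that $W_n \in \Lambda$ and that $\lambda W_n = \prod_{k=0}^n \mu_{i_k} E_{i_k}$. But $\langle W_n \rangle_{n \in \mathbb{N}}$ is a non-increasing sequence with intersection $W$, so $W \in \Lambda$ and
$$\lambda W = \lim_{n \to \infty} \lambda W_n = \prod_{i \in K} \mu_i E_i = \prod_{i \in I} \mu_i E_i.$$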
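(c) is an immediate consequence of (b) and the definition of $\widehat{\bigotimes}_{i \in I} \Sigma_i$.
(d) Because $\lambda$ is constructed by Carathéodory's method it must be complete.
(e) Let $\langle C_n \rangle_{n \in \mathbb{N}}$ be a sequence in $\mathcal{C}$ such that $W \subseteq \bigcup_{n \in \mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \lambda W + \frac{1}{2}\epsilon$. Set $V = \bigcup_{n \in \mathbb{N}} C_n$; by (b), $V \in \Lambda$. Let $n \in \mathbb{N}$ be such that $\sum_{i=n+1}^\infty \theta_0 C_i \le \frac{1}{2}\epsilon$, and consider $W' = \bigcup_{k \le n} C_k$. Since $V \setminus W' \subseteq \bigcup_{i > n} C_i$,
$$\lambda(W \triangle W') \le \lambda(V \setminus W') + \lambda(V \setminus W) = \lambda V - \lambda W + \lambda(V \setminus W') \le \sum_{i=0}^\infty \theta_0 C_i - \lambda W + \sum_{i=n+1}^\infty \theta_0 C_i \le \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon.$$
(f)(i) If $W \in \Lambda$ and $\epsilon > 0$ there is a $V \in \widehat{\bigotimes}_{i \in I} \Sigma_i$ such that $W \subseteq V$ and $\lambda V \le \lambda W + \epsilon$. PPP Let $\langle C_n \rangle_{n \in \mathbb{N}}$ be a sequence in $\mathcal{C}$ such that $W \subseteq \bigcup_{n \in \mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta W + \epsilon$. Then $C_n \in \widehat{\bigotimes}_{i \in I} \Sigma_i$ for each $n$, so $V = \bigcup_{n \in \mathbb{N}} C_n \in \widehat{\bigotimes}_{i \in I} \Sigma_i$. Now $W \subseteq V$, and
$$\lambda V = \theta V \le \sum_{n=0}^\infty \theta_0 C_n \le \theta W + \epsilon = \lambda W + \epsilon. \text{ QQQ}$$
(ii) Now, given $W \in \Lambda$, let $\langle V_n \rangle_{n \in \mathbb{N}}$ be a sequence of sets in $\widehat{\bigotimes}_{i \in I} \Sigma_i$ such that $W \subseteq V_n$ and $\lambda V_n \le \lambda W + 2^{-n}$ for each $n$; then $W_2 = \bigcap_{n \in \mathbb{N}} V_n$ belongs to $\widehat{\bigotimes}_{i \in I} \Sigma_i$ and $W \subseteq W_2$ and $\lambda W_2 = \lambda W$. Similarly, there is a $W'_2 \in \widehat{\bigotimes}_{i \in I} \Sigma_i$ such that $X \setminus W \subseteq W'_2$ and $\lambda W'_2 = \lambda(X \setminus W)$, so we may take $W_1 = X \setminus W'_2$ to complete the proof.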
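254G The following is a fundamental, indeed defining, property of product measures. (Compare 251L.)
Lemma Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $(Y,\mathrm{T},\nu)$ be a complete probability space and $\phi : Y \to X$ a function. Suppose that $\nu^*\phi^{-1}[C] \le \lambda C$ for every measurable cylinder $C \subseteq X$. Then $\phi$ is inverse-measure-preserving. In particular, $\phi$ is inverse-measure-preserving iff $\phi^{-1}[C] \in \mathrm{T}$ and $\nu\phi^{-1}[C] = \lambda C$ for every measurable cylinder $C \subseteq X$.
Remark By $\nu^*$ I mean the usual outer measure defined from $\nu$ as in §132.
proof (a) First note that, writing $\theta$ for the outer measure of 254A, $\nu^*\phi^{-1}[A] \le \theta A$ for every $A \subseteq X$. PPP Given $\epsilon > 0$, there is a sequence $\langle C_n \rangle_{n \in \mathbb{N}}$ of measurable cylinders such that $A \subseteq \bigcup_{n \in \mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta A + \epsilon$, where $\theta_0$ is the functional of 254A. But we know that $\theta_0 C = \lambda C$ for every measurable cylinder $C$ (254Fb), so
$$\nu^*\phi^{-1}[A] \le \nu^*\Big(\bigcup_{n \in \mathbb{N}} \phi^{-1}[C_n]\Big) \le \sum_{n=0}^\infty \nu^*\phi^{-1}[C_n] \le \sum_{n=0}^\infty \lambda C_n \le \theta A + \epsilon.$$
As $\epsilon$ is arbitrary, $\nu^*\phi^{-1}[A] \le \theta A$. QQQ
(b) Now take any $W \in \Lambda$. Then there are $F, F' \in \mathrm{T}$ such that
$$\phi^{-1}[W] \subseteq F, \quad \phi^{-1}[X \setminus W] \subseteq F',$$
$$\nu F = \nu^*\phi^{-1}[W] \le \theta W = \lambda W, \quad \nu F' = \nu^*\phi^{-1}[X \setminus W] \le \lambda(X \setminus W).$$
We have
$$F \cup F' \supseteq \phi^{-1}[W] \cup \phi^{-1}[X \setminus W] = Y,$$
so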
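$$\nu(F \cap F') = \nu F + \nu F' - \nu(F \cup F') \le \lambda W + \lambda(X \setminus W) - 1 = 0.$$
Now
$$F \setminus \phi^{-1}[W] \subseteq F \cap \phi^{-1}[X \setminus W] \subseteq F \cap F'$$
is $\nu$-negligible. Because $\nu$ is complete, $F \setminus \phi^{-1}[W] \in \mathrm{T}$ and $\phi^{-1}[W] = F \setminus (F \setminus \phi^{-1}[W])$ belongs to $\mathrm{T}$. Moreover,
$$1 = \nu F + \nu F' \le \lambda W + \lambda(X \setminus W) = 1,$$
so we must have $\nu F = \lambda W$; but this means that $\nu\phi^{-1}[W] = \lambda W$. As $W$ is arbitrary, $\phi$ is inverse-measure-preserving.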
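254H Corollary Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ and $\langle (Y_i,\mathrm{T}_i,\nu_i) \rangle_{i \in I}$ be two families of probability spaces, with products $(X,\Lambda,\lambda)$ and $(Y,\Lambda',\lambda')$. Suppose that for each $i \in I$ we are given an inverse-measure-preserving function $\phi_i : X_i \to Y_i$. Set $\phi(x) = \langle \phi_i(x(i)) \rangle_{i \in I}$ for $x \in X$. Then $\phi : X \to Y$ is inverse-measure-preserving.
proof If $C = \prod_{i \in I} C_i$ is a measurable cylinder in $Y$, then $\phi^{-1}[C] = \prod_{i \in I} \phi_i^{-1}[C_i]$ is a measurable cylinder in $X$, and
$$\lambda\phi^{-1}[C] = \prod_{i \in I} \mu_i \phi_i^{-1}[C_i] = \prod_{i \in I} \nu_i C_i = \lambda' C.$$
Since $\lambda$ is a complete probability measure, 254G tells us that $\phi$ is inverse-measure-preserving.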
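254I Corresponding to 251T we have the following.
Proposition Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be a family of probability spaces, $\lambda$ the product measure on $X = \prod_{i \in I} X_i$, and $\Lambda$ its domain. Then $\lambda$ is also the product of the completions $\hat{\mu}_i$ of the $\mu_i$ (212C).
proof Write $\hat{\lambda}$ for the product of the $\hat{\mu}_i$, and $\hat{\Lambda}$ for its domain. (i) The identity map from $X_i$ to itself is inverse-measure-preserving if regarded as a map from $(X_i,\hat{\mu}_i)$ to $(X_i,\mu_i)$, so the identity map on $X$ is inverse-measure-preserving if regarded as a map from $(X,\hat{\lambda})$ to $(X,\lambda)$, by 254H; that is, $\Lambda \subseteq \hat{\Lambda}$ and $\lambda = \hat{\lambda} \upharpoonright \Lambda$. (ii) If $C$ is a measurable cylinder for $\langle \hat{\mu}_i \rangle_{i \in I}$, that is, $C = \prod_{i \in I} C_i$ where $C_i \in \hat{\Sigma}_i$ for every $i$ and $\{i : C_i \ne X_i\}$ is finite, then for each $i \in I$ we can find a $C'_i \in \Sigma_i$ such that $C_i \subseteq C'_i$ and $\mu_i C'_i = \hat{\mu}_i C_i$; setting $C' = \prod_{i \in I} C'_i$, we get
$$\lambda^* C \le \lambda C' = \prod_{i \in I} \mu_i C'_i = \prod_{i \in I} \hat{\mu}_i C_i = \hat{\lambda} C.$$
By 254G, $\lambda W$ must be defined and equal to $\hat{\lambda} W$ whenever $W \in \hat{\Lambda}$. Putting this together with (i), we see that $\lambda = \hat{\lambda}$.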
254J The product measure on $\{0,1\}^I$ (a) Perhaps the most important of all examples of infinite product measures is the case in which each factor $X_i$ is just $\{0,1\}$ and each $\mu_i$ is the 'fair-coin' probability measure, setting
$$\mu_i\{0\} = \mu_i\{1\} = \tfrac{1}{2}.$$
In this case, the product $X = \{0,1\}^I$ has a family $\langle E_i \rangle_{i \in I}$ of measurable sets such that, writing $\lambda$ for the product measure on $X$,
$$\lambda\Big(\bigcap_{i \in J} E_i\Big) = 2^{-\#(J)} \text{ if } J \subseteq I \text{ is finite}.$$
(Just take $E_i = \{x : x(i) = 1\}$ for each $i$.) I will call this $\lambda$ the usual measure on $\{0,1\}^I$. Observe that if $I$ is finite then $\lambda\{x\} = 2^{-\#(I)}$ for each $x \in X$ (using 254Fb). On the other hand, if $I$ is infinite, then $\lambda\{x\} = 0$ for every $x \in X$ (because, again using 254Fb, $\lambda^*\{x\} \le 2^{-n}$ for every $n$).
(b) There is a natural bijection between $\{0,1\}^I$ and $\mathcal{P}I$, matching $x \in \{0,1\}^I$ with $\{i : i \in I,\ x(i) = 1\}$. So we get a standard measure $\tilde{\lambda}$ on $\mathcal{P}I$, which I will call the usual measure on $\mathcal{P}I$. Note that for any finite $b \subseteq I$ and any $c \subseteq b$ we have
$$\tilde{\lambda}\{a : a \cap b = c\} = \lambda\{x : x(i) = 1 \text{ for } i \in c,\ x(i) = 0 \text{ for } i \in b \setminus c\} = 2^{-\#(b)}.$$
(c) Of course we can apply 254G to these measures; if $(Y,\mathrm{T},\nu)$ is a complete probability space, a function $\phi : Y \to \{0,1\}^I$ is inverse-measure-preserving iff
$$\nu\{y : y \in Y,\ \phi(y) \upharpoonright J = z\} = 2^{-\#(J)}$$
whenever $J \subseteq I$ is finite and $z \in \{0,1\}^J$; this is because the measurable cylinders in $\{0,1\}^I$ are precisely the sets of the form $\{x : x \upharpoonright J = z\}$ where $J \subseteq I$ is finite.
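For a finite index set the formulae of 254Ja and 254Jb can be verified by direct enumeration. The following Python sketch is an illustration only: it assumes the hypothetical index set $I = \{0,1,2,3\}$, and the function names `measure`, `cylinder_measure` and `subset_measure` are not part of the text.

```python
from fractions import Fraction
from itertools import combinations, product

I = range(4)  # a hypothetical finite index set
points = list(product((0, 1), repeat=len(I)))
weight = Fraction(1, 2 ** len(I))  # measure of each singleton when I is finite

def measure(event):
    """Usual measure of {x : event(x)} on {0,1}^I, by direct enumeration."""
    return sum((weight for x in points if event(x)), Fraction(0))

def cylinder_measure(J):
    """Measure of the intersection of E_i = {x : x(i) = 1} over i in J."""
    return measure(lambda x: all(x[i] == 1 for i in J))

def subset_measure(b, c):
    """Measure of {a : a & b == c} under the identification of x with {i : x(i) = 1}."""
    return measure(lambda x: {i for i in b if x[i] == 1} == set(c))

# One value for every subset J of I.
checks = {J: cylinder_measure(J)
          for r in range(len(I) + 1) for J in combinations(I, r)}
```

Each entry of `checks` equals $2^{-\#(J)}$, and `subset_measure(b, c)` equals $2^{-\#(b)}$ whenever `c` is a subset of `b`, matching the displayed formulae.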
254K In the case of countably infinite $I$, we have a very important relationship between the usual product measure of $\{0,1\}^I$ and Lebesgue measure on $[0,1]$.
Proposition Let $\lambda$ be the usual measure on $X = \{0,1\}^{\mathbb{N}}$, and let $\mu$ be Lebesgue measure on $[0,1]$; write $\Lambda$ for the domain of $\lambda$ and $\Sigma$ for the domain of $\mu$.
(i) For $x \in X$ set $\phi(x) = \sum_{i=0}^\infty 2^{-i-1} x(i)$. Then
$\phi^{-1}[E] \in \Lambda$ and $\lambda\phi^{-1}[E] = \mu E$ for every $E \in \Sigma$;
$\phi[F] \in \Sigma$ and $\mu\phi[F] = \lambda F$ for every $F \in \Lambda$.
(ii) There is a bijection $\tilde{\phi} : X \to [0,1]$ which is equal to $\phi$ at all but countably many points, and any such bijection is an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$.
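proof (a) The first point to observe is that $\phi$ is nearly a bijection. Setting
$$H = \{x : x \in X,\ \exists\, m \in \mathbb{N},\ x(i) = x(m)\ \forall\, i \ge m\},$$
$$H' = \{2^{-n}k : n \in \mathbb{N},\ k \le 2^n\},$$
then $H$ and $H'$ are countable and $\phi \upharpoonright X \setminus H$ is a bijection between $X \setminus H$ and $[0,1] \setminus H'$. (For $t \in [0,1] \setminus H'$, $\phi^{-1}(t)$ is the binary expansion of $t$.) Because $H$ and $H'$ are countably infinite, there is a bijection between them; combining this with $\phi \upharpoonright X \setminus H$, we have a bijection between $X$ and $[0,1]$ equal to $\phi$ except at countably many points. For the rest of this proof, let $\tilde{\phi}$ be any such bijection. Let $M$ be the countable set $\{x : x \in X,\ \phi(x) \ne \tilde{\phi}(x)\}$, and $N$ the countable set $\phi[M] \cup \tilde{\phi}[M]$; then $\phi[A] \triangle \tilde{\phi}[A] \subseteq N$ for every $A \subseteq X$.
(b) To see that $\lambda\tilde{\phi}^{-1}[E]$ exists and is equal to $\mu E$ for every $E \in \Sigma$, I consider successively more complex sets $E$.
($\alpha$) If $E = \{t\}$ then $\lambda\tilde{\phi}^{-1}[E] = \lambda\{\tilde{\phi}^{-1}(t)\}$ exists and is zero.
($\beta$) If $E$ is of the form $[2^{-n}k, 2^{-n}(k+1)[$, where $n \in \mathbb{N}$ and $0 \le k < 2^n$, then $\phi^{-1}[E]$ differs by at most two points from a set of the form $\{x : x(i) = z(i)\ \forall\, i < n\}$, so $\tilde{\phi}^{-1}[E]$ differs from this by a countable set, and
$$\lambda\tilde{\phi}^{-1}[E] = 2^{-n} = \mu E.$$
($\gamma$) If $E$ is of the form $[2^{-n}k, 2^{-n}l[$, where $n \in \mathbb{N}$ and $0 \le k < l \le 2^n$, then $E = \bigcup_{k \le i < l} [2^{-n}i, 2^{-n}(i+1)[$, so
$$\lambda\tilde{\phi}^{-1}[E] = 2^{-n}(l - k) = \mu E.$$
($\delta$) If $E$ is of the form $[t,u[$, where $0 \le t < u \le 1$, then for each $n \in \mathbb{N}$ let $k_n$ be the integral part of $2^n t$ and $l_n$ the integral part of $2^n u$; set $E_n = [2^{-n}(k_n + 1), 2^{-n}l_n[$; then $\langle E_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence and $\bigcup_{n \in \mathbb{N}} E_n$ is $]t,u[$. So (using ($\alpha$))
$$\lambda\tilde{\phi}^{-1}[E] = \lambda\tilde{\phi}^{-1}\Big[\bigcup_{n \in \mathbb{N}} E_n\Big] = \lim_{n \to \infty} \lambda\tilde{\phi}^{-1}[E_n] = \lim_{n \to \infty} \mu E_n = \mu E.$$
($\epsilon$) If $E \in \Sigma$, then for any $\epsilon > 0$ there is a sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ of half-open subintervals of $[0,1[$ such that $E \setminus \{1\} \subseteq \bigcup_{n \in \mathbb{N}} I_n$ and $\sum_{n=0}^\infty \mu I_n \le \mu E + \epsilon$; now $\tilde{\phi}^{-1}[E] \subseteq \{\tilde{\phi}^{-1}(1)\} \cup \bigcup_{n \in \mathbb{N}} \tilde{\phi}^{-1}[I_n]$, so
$$\lambda^*\tilde{\phi}^{-1}[E] \le \lambda\Big(\bigcup_{n \in \mathbb{N}} \tilde{\phi}^{-1}[I_n]\Big) \le \sum_{n=0}^\infty \lambda\tilde{\phi}^{-1}[I_n] = \sum_{n=0}^\infty \mu I_n \le \mu E + \epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^*\tilde{\phi}^{-1}[E] \le \mu E$, and there is a $V \in \Lambda$ such that $\tilde{\phi}^{-1}[E] \subseteq V$ and $\lambda V \le \mu E$.
($\zeta$) Similarly, there is a $V' \in \Lambda$ such that $V' \supseteq \tilde{\phi}^{-1}[[0,1] \setminus E]$ and $\lambda V' \le \mu([0,1] \setminus E)$. Now $V \cup V' = X$, while
$$\lambda V + \lambda V' \le \mu E + (1 - \mu E) = 1 = \lambda(V \cup V'),$$
so $\lambda(V \cap V') = 0$ and
$$\tilde{\phi}^{-1}[E] = (X \setminus V') \cup (V \cap V' \cap \tilde{\phi}^{-1}[E])$$
belongs to $\Lambda$, with
$$\lambda\tilde{\phi}^{-1}[E] \le \lambda V \le \mu E;$$
at the same time,
$$1 - \lambda\tilde{\phi}^{-1}[E] \le \lambda V' \le 1 - \mu E$$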
so $\lambda\tilde{\phi}^{-1}[E] = \mu E$.
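(c) Now suppose that $C \subseteq X$ is a measurable cylinder of the special form $\{x : x(0) = \epsilon_0,\ldots,x(n) = \epsilon_n\}$ for some $\epsilon_0,\ldots,\epsilon_n \in \{0,1\}$. Then $\phi[C] = [t, t + 2^{-n-1}]$ where $t = \sum_{i=0}^n 2^{-i-1}\epsilon_i$, so that $\mu\phi[C] = \lambda C$. Since $\tilde{\phi}[C] \triangle \phi[C] \subseteq N$ is countable, $\mu\tilde{\phi}[C] = \lambda C$.
If $C \subseteq X$ is any measurable cylinder, then it is of the form $\{x : x \upharpoonright J = z\}$ for some finite $J \subseteq \mathbb{N}$; taking $n$ so large that $J \subseteq \{0,\ldots,n\}$, $C$ is expressible as a disjoint union of $2^{n+1-\#(J)}$ sets of the form just considered, being just those in which $\epsilon_i = z(i)$ for $i \in J$. Summing their measures, we again get $\mu\tilde{\phi}[C] = \lambda C$. Now 254G tells us that $\tilde{\phi}^{-1} : [0,1] \to X$ is inverse-measure-preserving, that is, $\tilde{\phi}[W]$ is Lebesgue measurable, with measure $\lambda W$, for every $W \in \Lambda$.
Putting this together with (b), $\tilde{\phi}$ must be an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$, as claimed in (ii) of the proposition.
(d) As for (i), if $E \in \Sigma$ then $\phi^{-1}[E] \triangle \tilde{\phi}^{-1}[E] \subseteq M$ is countable, so $\lambda\phi^{-1}[E] = \lambda\tilde{\phi}^{-1}[E] = \mu E$. While if $W \in \Lambda$, $\phi[W] \triangle \tilde{\phi}[W] \subseteq N$ is countable, so $\mu\phi[W] = \mu\tilde{\phi}[W] = \lambda W$.
(e) Finally, if $\psi : X \to [0,1]$ is any other bijection which agrees with $\phi$ at all but countably many points, set $M' = \{x : \psi(x) \ne \tilde{\phi}(x)\}$, $N' = \psi[M'] \cup \tilde{\phi}[M']$. Then
$$\psi^{-1}[E] \triangle \tilde{\phi}^{-1}[E] \subseteq M', \quad \lambda\psi^{-1}[E] = \lambda\tilde{\phi}^{-1}[E] = \mu E$$
for every $E \in \Sigma$, and
$$\psi[F] \triangle \tilde{\phi}[F] \subseteq N', \quad \mu\psi[F] = \mu\tilde{\phi}[F] = \lambda F$$
for every $F \in \Lambda$.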
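The computation in part (c) of the proof, that a cylinder fixing the first $n+1$ coordinates is carried by $\phi$ onto an interval of length $2^{-n-1}$, can be illustrated with exact arithmetic on finite initial segments. The sketch below is a hedged illustration: `phi_prefix` and `cylinder_interval` are hypothetical helper names, and only finitely many coordinates of $x$ are ever examined.

```python
from fractions import Fraction
from itertools import product

def phi_prefix(bits):
    """Sum of 2^(-i-1) * bits[i] over a finite initial segment of x."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)),
               Fraction(0))

def cylinder_interval(eps):
    """Endpoints t and t + 2^(-n-1) of phi[C], where C fixes x(i) = eps[i] for i <= n."""
    t = phi_prefix(eps)
    return t, t + Fraction(1, 2 ** len(eps))  # len(eps) = n + 1

eps = (1, 0, 1)
left, right = cylinder_interval(eps)

# Extending eps by zeros stays at the left endpoint; extending by m ones
# approaches the right endpoint from below.
tail = [phi_prefix(eps + (1,) * m) for m in range(1, 8)]

# Left endpoints of all cylinders of the same depth.
lefts = sorted(cylinder_interval(e)[0] for e in product((0, 1), repeat=len(eps)))
```

The list `lefts` consists of the multiples of $2^{-n-1}$ in $[0,1[$, so the images of the $2^{n+1}$ cylinders of a given depth cover $[0,1]$ and overlap only at endpoints, which belong to the countable set $H'$.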
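254L Subspaces Just as in 251Q, we can consider the product of subspace measures. There is a simplification in the form of the result because in the present context we are restricted to probability measures.
Theorem Let $\langle (X_i,\Sigma_i,\mu_i) \rangle_{i \in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product.
(a) For each $i \in I$, let $A_i \subseteq X_i$ be a set of full outer measure, and write $\tilde{\mu}_i$ for the subspace measure on $A_i$ (214B). Let $\tilde{\lambda}$ be the product measure on $A = \prod_{i \in I} A_i$. Then $\tilde{\lambda}$ is the subspace measure on $A$ induced by $\lambda$.
(b) $\lambda^*(\prod_{i \in I} A_i) = \prod_{i \in I} \mu_i^* A_i$ whenever $A_i \subseteq X_i$ for every $i$.
proof (a) Write $\lambda_A$ for the subspace measure on $A$ defined from $\lambda$, and $\Lambda_A$ for its domain; write $\tilde{\Lambda}$ for the domain of $\tilde{\lambda}$.
(i) Let $\phi : A \to X$ be the identity map. If $C \subseteq X$ is a measurable cylinder, say $C = \prod_{i \in I} C_i$ where $C_i \in \Sigma_i$ for each $i$, then $\phi^{-1}[C] = \prod_{i \in I} (C_i \cap A_i)$ is a measurable cylinder in $A$, and
$$\tilde{\lambda}\phi^{-1}[C] = \prod_{i \in I} \tilde{\mu}_i(C_i \cap A_i) \le \prod_{i \in I} \mu_i C_i = \lambda C.$$
By 254G, $\phi$ is inverse-measure-preserving, that is, $\tilde{\lambda}(A \cap W) = \lambda W$ for every $W \in \Lambda$. But this means that $\tilde{\lambda} V$ is defined and equal to $\lambda_A V = \lambda^* V$ for every $V \in \Lambda_A$, since for any such $V$ there is a $W \in \Lambda$ such that $V = A \cap W$ and $\lambda W = \lambda_A V$. In particular, $\lambda_A A = 1$.
(ii) Now regard $\phi$ as a function from the measure space $(A,\Lambda_A,\lambda_A)$ to $(A,\tilde{\Lambda},\tilde{\lambda})$. If $D$ is a measurable cylinder in $A$, we can express it as $\prod_{i \in I} D_i$ where every $D_i$ belongs to the domain of $\tilde{\mu}_i$ and $D_i = A_i$ for all but finitely many $i$. Now for each $i$ we can find $C_i \in \Sigma_i$ such that $D_i = C_i \cap A_i$ and $\mu_i C_i = \tilde{\mu}_i D_i$, and we can suppose that $C_i = X_i$ whenever $D_i = A_i$. In this case $C = \prod_{i \in I} C_i \in \Lambda$ and
$$\lambda C = \prod_{i \in I} \mu_i C_i = \prod_{i \in I} \tilde{\mu}_i D_i = \tilde{\lambda} D.$$
Accordingly
$$\lambda_A \phi^{-1}[D] = \lambda_A(A \cap C) \le \lambda C = \tilde{\lambda} D.$$
By 254G again, $\phi$ is inverse-measure-preserving in this manifestation, that is, $\lambda_A V$ is defined and equal to $\tilde{\lambda} V$ for every $V \in \tilde{\Lambda}$. Putting this together with (i), we have $\lambda_A = \tilde{\lambda}$, as claimed.
(b) For each $i \in I$, choose a set $E_i \in \Sigma_i$ such that $A_i \subseteq E_i$ and $\mu_i E_i = \mu_i^* A_i$; do this in such a way that $E_i = X_i$ whenever $\mu_i^* A_i = 1$. Set $B_i = A_i \cup (X_i \setminus E_i)$, so that $\mu_i^* B_i = 1$ for each $i$ (if $F \in \Sigma_i$ and $F \supseteq B_i$ then $F \cap E_i \supseteq A_i$, so
$$\mu_i F = \mu_i(F \cap E_i) + \mu_i(F \setminus E_i) = \mu_i E_i + \mu_i(X_i \setminus E_i) = 1.)$$
By (a), we can identify the subspace measure $\lambda_B$ on $B = \prod_{i \in I} B_i$ with the product of the subspace measures $\tilde{\mu}_i$ on $B_i$. In particular, $\lambda^* B = \lambda_B B = 1$. Now $A_i = B_i \cap E_i$ so (writing $A = \prod_{i \in I} A_i$), $A = B \cap \prod_{i \in I} E_i$.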
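If $\prod_{i \in I} \mu_i^* A_i = 0$, then for every $\epsilon > 0$ there is a finite $J \subseteq I$ such that $\prod_{i \in J} \mu_i^* A_i \le \epsilon$; consequently (using 254Fb)
$$\lambda^* A \le \lambda\{x : x(i) \in E_i \text{ for every } i \in J\} = \prod_{i \in J} \mu_i E_i \le \epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^* A = 0$. If $\prod_{i \in I} \mu_i^* A_i > 0$, then for every $n \in \mathbb{N}$ the set $\{i : \mu_i^* A_i \le 1 - 2^{-n}\}$ must be finite, so
$$J = \{i : \mu_i^* A_i < 1\} = \{i : E_i \ne X_i\}$$
is countable. By 254Fb again, applied to $\langle E_i \cap B_i \rangle_{i \in I}$ in the product $\prod_{i \in I} B_i$,
$$\lambda^*\Big(\prod_{i \in I} A_i\Big) = \lambda_B\Big(\prod_{i \in I} A_i\Big) = \lambda_B\{x : x \in B,\ x(i) \in E_i \cap B_i \text{ for every } i \in J\} = \prod_{i \in J} \tilde{\mu}_i(E_i \cap B_i) = \prod_{i \in I} \mu_i^* A_i,$$
as required.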
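254M I now turn to the basic results which make it possible to use these product measures effectively. First, I offer a vocabulary for dealing with subproducts. Let $\langle X_i\rangle_{i\in I}$ be a family of sets, with product $X$.
(a) For $J\subseteq I$, write $X_J$ for $\prod_{i\in J}X_i$. We have a canonical bijection $x\mapsto(x\restriction J,\,x\restriction I\setminus J):X\to X_J\times X_{I\setminus J}$. Associated with this we have the map $x\mapsto\pi_J(x)=x\restriction J:X\to X_J$. Now I will say that a set $W\subseteq X$ is determined by coordinates in $J$ if there is a $V\subseteq X_J$ such that $W=\pi_J^{-1}[V]$; that is, $W$ corresponds to $V\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$.
It is easy to see that
$$W\text{ is determined by coordinates in }J$$
$$\iff x'\in W\text{ whenever }x\in W,\ x'\in X\text{ and }x'\restriction J=x\restriction J$$
$$\iff W=\pi_J^{-1}[\pi_J[W]].$$
It follows that if $W$ is determined by coordinates in $J$, and $J\subseteq K\subseteq I$, $W$ is also determined by coordinates in $K$. The family $\mathcal{W}_J$ of subsets of $X$ determined by coordinates in $J$ is closed under complementation and arbitrary unions and intersections. PPP If $W\in\mathcal{W}_J$, then
$$X\setminus W=X\setminus\pi_J^{-1}[\pi_J[W]]=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal{W}_J.$$
If $\mathcal{V}\subseteq\mathcal{W}_J$, then
$$\bigcup\mathcal{V}=\bigcup_{V\in\mathcal{V}}\pi_J^{-1}[\pi_J[V]]=\pi_J^{-1}\Bigl[\bigcup_{V\in\mathcal{V}}\pi_J[V]\Bigr]\in\mathcal{W}_J.\text{ QQQ}$$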
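(b) It follows that
$$\mathcal{W}=\bigcup\{\mathcal{W}_J:J\subseteq I\text{ is countable}\},$$
the family of subsets of $X$ determined by coordinates in some countable set, is a $\sigma$-algebra of subsets of $X$. PPP (i) $X$ and $\emptyset$ are determined by coordinates in $\emptyset$ (recall that $X_\emptyset$ is a singleton, and that $X=\pi_\emptyset^{-1}[X_\emptyset]$, $\emptyset=\pi_\emptyset^{-1}[\emptyset]$). (ii) If $W\in\mathcal{W}$, there is a countable $J\subseteq I$ such that $W\in\mathcal{W}_J$; now
$$X\setminus W=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal{W}_J\subseteq\mathcal{W}.$$
(iii) If $\langle W_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{W}$, then for each $n\in\mathbb{N}$ there is a countable $J_n\subseteq I$ such that $W_n\in\mathcal{W}_{J_n}$. Now $J=\bigcup_{n\in\mathbb{N}}J_n$ is a countable subset of $I$, and every $W_n$ belongs to $\mathcal{W}_J$, so $\bigcup_{n\in\mathbb{N}}W_n\in\mathcal{W}_J\subseteq\mathcal{W}$. QQQ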
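(c) If $i\in I$ and $E\subseteq X_i$ then $\{x:x\in X,\ x(i)\in E\}$ is determined by the single coordinate $i$, so surely belongs to $\mathcal{W}$; accordingly $\mathcal{W}$ must include $\widehat{\bigotimes}_{i\in I}\mathcal{P}X_i$. A fortiori, if $\Sigma_i$ is a $\sigma$-algebra of subsets of $X_i$ for each $i$, $\mathcal{W}\supseteq\widehat{\bigotimes}_{i\in I}\Sigma_i$; that is, every member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$ is determined by coordinates in some countable set.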
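For a finite index set the criterion $W=\pi_J^{-1}[\pi_J[W]]$ of 254Ma can be checked by brute force. The following Python sketch is an illustration only; the helper names `restrict` and `determined_by`, and the choice $X=\{0,1\}^3$ with $I=\{0,1,2\}$, are assumptions made for the example and are not part of the text.

```python
from itertools import product

def restrict(x, J):
    """Return x|J as a tuple of (index, value) pairs; x is a tuple indexed by 0..n-1."""
    return tuple((i, x[i]) for i in sorted(J))

def determined_by(W, X, J):
    """True iff W equals pi_J^{-1}[pi_J[W]], i.e. W is determined by coordinates in J."""
    image = {restrict(x, J) for x in W}                       # pi_J[W]
    saturation = {x for x in X if restrict(x, J) in image}    # pi_J^{-1}[pi_J[W]]
    return saturation == set(W)

# Illustrative product X = {0,1}^3 (hypothetical choice of factors).
X = list(product((0, 1), repeat=3))
W = {x for x in X if x[0] == 1}           # depends only on coordinate 0
V = {x for x in X if x[0] + x[1] == 1}    # depends on coordinates 0 and 1
```

With these definitions `determined_by(W, X, {0})` and `determined_by(V, X, {0, 1})` hold while `determined_by(V, X, {0})` fails, and both $X$ and $\emptyset$ are determined by coordinates in the empty set, matching (b)(i).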
254N Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces and $\langle K_j\rangle_{j\in J}$ a partition of $I$. For each $j\in J$ let $\lambda_j$ be the product measure on $Z_j=\prod_{i\in K_j}X_i$, and write $\lambda$ for the product measure on $X=\prod_{i\in I}X_i$. Then the natural bijection
$$x\mapsto\phi(x)=\langle x\restriction K_j\rangle_{j\in J}:X\to\prod_{j\in J}Z_j$$
identifies $\lambda$ with the product of the family $\langle\lambda_j\rangle_{j\in J}$.
In particular, if $K\subseteq I$ is any set, then $\lambda$ can be identified with the c.l.d. product of the product measures on $\prod_{i\in K}X_i$ and $\prod_{i\in I\setminus K}X_i$.
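proof (Compare 251N.) Write $Z=\prod_{j\in J}Z_j$ and $\tilde\lambda$ for the product measure on $Z$; let $\Lambda$, $\tilde\Lambda$ be the domains of $\lambda$ and $\tilde\lambda$.
(a) Let $C\subseteq Z$ be a measurable cylinder. Then $\lambda^*\phi^{-1}[C]\le\tilde\lambda C$. PPP Express $C$ as $\prod_{j\in J}C_j$ where $C_j\subseteq Z_j$ belongs to the domain $\Lambda_j$ of $\lambda_j$ for each $j$. Set $L=\{j:C_j\ne Z_j\}$, so that $L$ is finite. Let $\epsilon>0$. For each $j\in L$ let $\langle C_{jn}\rangle_{n\in\mathbb{N}}$ be a sequence of measurable cylinders in $Z_j=\prod_{i\in K_j}X_i$ such that $C_j\subseteq\bigcup_{n\in\mathbb{N}}C_{jn}$ and $\sum_{n=0}^\infty\lambda_jC_{jn}\le\lambda_jC_j+\epsilon$. Express each $C_{jn}$ as $\prod_{i\in K_j}C_{jni}$ where $C_{jni}\in\Sigma_i$ for $i\in K_j$ (and $\{i:C_{jni}\ne X_i\}$ is finite).
For $f\in\mathbb{N}^L$, set
$$D_f=\{x:x\in X,\ x(i)\in C_{j,f(j),i}\text{ whenever }j\in L,\ i\in K_j\}.$$
Because $\bigcup_{j\in L}\{i:C_{j,f(j),i}\ne X_i\}$ is finite, $D_f$ is a measurable cylinder in $X$, and
$$\lambda D_f=\prod_{j\in L}\prod_{i\in K_j}\mu_iC_{j,f(j),i}=\prod_{j\in L}\lambda_jC_{j,f(j)}.$$
Also
$$\bigcup\{D_f:f\in\mathbb{N}^L\}\supseteq\phi^{-1}[C]$$
because if $\phi(x)\in C$ then $\phi(x)(j)\in C_j$ for each $j\in L$, so there must be an $f\in\mathbb{N}^L$ such that $\phi(x)(j)\in C_{j,f(j)}$ for every $j\in L$. But (because $\mathbb{N}^L$ is countable) this means that
$$\lambda^*\phi^{-1}[C]\le\sum_{f\in\mathbb{N}^L}\lambda D_f=\sum_{f\in\mathbb{N}^L}\prod_{j\in L}\lambda_jC_{j,f(j)}=\prod_{j\in L}\sum_{n=0}^\infty\lambda_jC_{jn}\le\prod_{j\in L}(\lambda_jC_j+\epsilon).$$
As $\epsilon$ is arbitrary,
$$\lambda^*\phi^{-1}[C]\le\prod_{j\in L}\lambda_jC_j=\tilde\lambda C.\text{ QQQ}$$
By 254G, it follows that $\lambda\phi^{-1}[W]$ is defined, and equal to $\tilde\lambda W$, whenever $W\in\tilde\Lambda$.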
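(b) Next, $\tilde\lambda\phi[D]=\lambda D$ for every measurable cylinder $D\subseteq X$. PPP This is easy. Express $D$ as $\prod_{i\in I}D_i$ where $D_i\in\Sigma_i$ for every $i\in I$ and $\{i:D_i\ne X_i\}$ is finite. Then $\phi[D]=\prod_{j\in J}\tilde D_j$, where $\tilde D_j=\prod_{i\in K_j}D_i$ is a measurable cylinder for each $j\in J$. Because $\{j:\tilde D_j\ne Z_j\}$ must also be finite (in fact, it cannot have more members than the finite set $\{i:D_i\ne X_i\}$), $\prod_{j\in J}\tilde D_j$ is itself a measurable cylinder in $Z$, and
$$\tilde\lambda\phi[D]=\prod_{j\in J}\lambda_j\tilde D_j=\prod_{j\in J}\prod_{i\in K_j}\mu_iD_i=\lambda D.\text{ QQQ}$$
Applying 254G to $\phi^{-1}:Z\to X$, it follows that $\tilde\lambda\phi[W]$ is defined, and equal to $\lambda W$, for every $W\in\Lambda$. But together with (a) this means that for any $W\subseteq X$,
if $W\in\Lambda$ then $\phi[W]\in\tilde\Lambda$ and $\tilde\lambda\phi[W]=\lambda W$,
if $\phi[W]\in\tilde\Lambda$ then $W\in\Lambda$ and $\lambda W=\tilde\lambda\phi[W]$.
And of course this is just what is meant by saying that $\phi$ is an isomorphism between $(X,\Lambda,\lambda)$ and $(Z,\tilde\Lambda,\tilde\lambda)$.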
254O Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. For each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\lambda=\lambda_I$ and $\Lambda=\Lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) For every $J\subseteq I$, $\lambda_J$ is the image measure $\lambda\pi_J^{-1}$ (234D); in particular, $\pi_J:X\to X_J$ is inverse-measure-preserving for $\lambda$ and $\lambda_J$.
(b) If $J\subseteq I$ and $W\in\Lambda$ is determined by coordinates in $J$ (254M), then $\lambda_J\pi_J[W]$ is defined and equal to $\lambda W$. Consequently there are $W_1$, $W_2$ belonging to the $\sigma$-algebra of subsets of $X$ generated by
$$\{\{x:x(i)\in E\}:i\in J,\ E\in\Sigma_i\}$$
such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(c) For every $W\in\Lambda$, we can find a countable set $J$ and $W_1$, $W_2$, both determined by coordinates in $J$, such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(d) For every $W\in\Lambda$, there is a countable set $J\subseteq I$ such that $\pi_J[W]\in\Lambda_J$ and $\lambda_J\pi_J[W]=\lambda W$; so that $W'=\pi_J^{-1}[\pi_J[W]]$ belongs to $\Lambda$, and $\lambda(W'\setminus W)=0$.
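proof (a)(i) By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$ on $X_J\times X_{I\setminus J}$. Now $\pi_J^{-1}[E]\subseteq X$ corresponds to $E\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$, so
$$\lambda(\pi_J^{-1}[E])=\lambda_JE\cdot\lambda_{I\setminus J}X_{I\setminus J}=\lambda_JE,$$
by 251E or 251Ia, whenever $E\in\Lambda_J$. This shows that $\pi_J$ is inverse-measure-preserving.
(ii) To see that $\lambda_J$ is actually the image measure, suppose that $E\subseteq X_J$ is such that $\pi_J^{-1}[E]\in\Lambda$. Identifying $\pi_J^{-1}[E]$ with $E\times X_{I\setminus J}$, as before, we are supposing that $E\times X_{I\setminus J}$ is measured by the product measure on $X_J\times X_{I\setminus J}$. But this means that for $\lambda_{I\setminus J}$-almost every $z\in X_{I\setminus J}$, $E_z=\{y:(y,z)\in E\times X_{I\setminus J}\}$ belongs to $\Lambda_J$ (252D(ii), because $\lambda_J$ is complete). Since $E_z=E$ for every $z$, $E$ itself belongs to $\Lambda_J$, as claimed.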
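(b) If $W\in\Lambda$ is determined by coordinates in $J$, set $H=\pi_J[W]$; then $\pi_J^{-1}[H]=W$, so $H\in\Lambda_J$ by (a) just above, and $\lambda_JH=\lambda\pi_J^{-1}[H]=\lambda W$. By 254Ff, there are $H_1$, $H_2\in\widehat{\bigotimes}_{i\in J}\Sigma_i$ such that $H_1\subseteq H\subseteq H_2$ and $\lambda_J(H_2\setminus H_1)=0$.
Let $\mathrm{T}_J$ be the $\sigma$-algebra of subsets of $X$ generated by sets of the form $\{x:x(i)\in E\}$ where $i\in J$ and $E\in\Sigma_i$. Consider $\mathrm{T}'_J=\{G:G\subseteq X_J,\ \pi_J^{-1}[G]\in\mathrm{T}_J\}$. This is a $\sigma$-algebra of subsets of $X_J$, and it contains $\{y:y\in X_J,\ y(i)\in E\}$ whenever $i\in J$, $E\in\Sigma_i$ (because
$$\pi_J^{-1}[\{y:y\in X_J,\ y(i)\in E\}]=\{x:x\in X,\ x(i)\in E\}$$
whenever $i\in J$, $E\subseteq X_i$). So $\mathrm{T}'_J$ must include $\widehat{\bigotimes}_{i\in J}\Sigma_i$. In particular, $H_1$ and $H_2$ both belong to $\mathrm{T}'_J$, that is, $W_k=\pi_J^{-1}[H_k]$ belongs to $\mathrm{T}_J$ for both $k$. Of course $W_1\subseteq W\subseteq W_2$, because $H_1\subseteq H\subseteq H_2$, and
$$\lambda(W_2\setminus W_1)=\lambda_J(H_2\setminus H_1)=0,$$
as required.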
(c) Now take any $W\in\Lambda$. By 254Ff, there are $W_1$ and $W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$. By 254Mc, there are countable sets $J_1$, $J_2\subseteq I$ such that, for each $k$, $W_k$ is determined by coordinates in $J_k$. Setting $J=J_1\cup J_2$, $J$ is a countable subset of $I$ and both $W_1$ and $W_2$ are determined by coordinates in $J$.
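(d) Continuing the argument from (c), $\pi_J[W_1]$, $\pi_J[W_2]\in\Lambda_J$, by (b), and $\lambda_J(\pi_J[W_2]\setminus\pi_J[W_1])=0$. Since $\pi_J[W_1]\subseteq\pi_J[W]\subseteq\pi_J[W_2]$, it follows that $\pi_J[W]\in\Lambda_J$, with $\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]$; so that, setting $W'=\pi_J^{-1}[\pi_J[W]]$, $W'\in\Lambda$, and
$$\lambda W'=\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]=\lambda\pi_J^{-1}[\pi_J[W_2]]=\lambda W_2=\lambda W.$$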
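When the index set and the factor spaces are finite, the identity $\lambda_J=\lambda\pi_J^{-1}$ of 254Oa reduces to a statement about marginals of a product of point masses, and can be confirmed with exact arithmetic. The following Python sketch is an illustration under that finiteness assumption; the three factor spaces in `mu` and the function names are hypothetical choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Three finite probability spaces, each given by its point masses (hypothetical data).
mu = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {"a": Fraction(1, 3), "b": Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
]

def product_measure(J):
    """Point masses of the product measure lambda_J on X_J; J is a sorted list of indices."""
    masses = {}
    for y in product(*(mu[i] for i in J)):
        w = Fraction(1)
        for i, yi in zip(J, y):
            w *= mu[i][yi]
        masses[y] = w
    return masses

def image_measure(J):
    """Push the full product measure forward along pi_J : X -> X_J."""
    I = list(range(len(mu)))
    out = {}
    for x, w in product_measure(I).items():
        y = tuple(x[i] for i in J)          # pi_J(x) = x|J
        out[y] = out.get(y, Fraction(0)) + w
    return out
```

For every `J` drawn from the subsets of `[0, 1, 2]` the dictionaries `product_measure(J)` and `image_measure(J)` coincide; the case `J = []` gives the one-point space $X_\emptyset$ with mass 1.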
254P Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\Lambda=\Lambda_I$ and $\lambda=\lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) If $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $g$ is $\Lambda_J$-measurable iff $g\pi_J$ is $\Lambda$-measurable.
(b) Whenever $f$ is a $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible subset of $X$, we can find a countable set $J\subseteq I$ and a $\Lambda_J$-measurable function $g$ defined on a $\lambda_J$-conegligible subset of $X_J$ such that $f$ extends $g\pi_J$.
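proof (a)(i) If $g$ is $\Lambda_J$-measurable and $a\in\mathbb{R}$, there is an $H\in\Lambda_J$ such that $\{y:y\in\operatorname{dom}g,\ g(y)\ge a\}=H\cap\operatorname{dom}g$. Now $\pi_J^{-1}[H]\in\Lambda$, by 254Oa, and $\{x:x\in\operatorname{dom}g\pi_J,\ g\pi_J(x)\ge a\}=\pi_J^{-1}[H]\cap\operatorname{dom}g\pi_J$. So $g\pi_J$ is $\Lambda$-measurable.
(ii) If $g\pi_J$ is $\Lambda$-measurable and $a\in\mathbb{R}$, then there is a $W\in\Lambda$ such that $\{x:g\pi_J(x)\ge a\}=W\cap\operatorname{dom}g\pi_J$. As in the proof of 254Oa, we may identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$, and 252D(ii) tells us that, if we identify $W$ with the corresponding subset of $X_J\times X_{I\setminus J}$, there is at least one $z\in X_{I\setminus J}$ such that $W_z=\{y:y\in X_J,\ (y,z)\in W\}$ belongs to $\Lambda_J$. But since (on this convention) $g\pi_J(y,z)=g(y)$ for every $y\in X_J$, we see that $\{y:y\in\operatorname{dom}g,\ g(y)\ge a\}=W_z\cap\operatorname{dom}g$. As $a$ is arbitrary, $g$ is $\Lambda_J$-measurable.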
(b) For rational numbers $q$, set $W_q=\{x:x\in\operatorname{dom}f,\ f(x)\le q\}$. By 254Oc we can find for each $q$ a countable set $J_q\subseteq I$ and sets $W'_q$, $W''_q$, both determined by coordinates in $J_q$, such that $W'_q\subseteq W_q\subseteq W''_q$ and $\lambda(W''_q\setminus W'_q)=0$. Set $J=\bigcup_{q\in\mathbb{Q}}J_q$, $V=X\setminus\bigcup_{q\in\mathbb{Q}}(W''_q\setminus W'_q)$; then $J$ is a countable subset of $I$ and $V$ is a conegligible subset of $X$; moreover, $V$ is determined by coordinates in $J$ because all the $W'_q$, $W''_q$ are.
For every $q\in\mathbb{Q}$, $W_q\cap V=W'_q\cap V$, because $V\cap(W_q\setminus W'_q)\subseteq V\cap(W''_q\setminus W'_q)=\emptyset$; so $W_q\cap V$ is determined by coordinates in $J$. Consequently $V\cap\operatorname{dom}f=\bigcup_{q\in\mathbb{Q}}V\cap W_q$ also is determined by coordinates in $J$. Also
$$\{x:x\in V\cap\operatorname{dom}f,\ f(x)<a\}=\bigcup_{q<a}V\cap W_q$$
is determined by coordinates in $J$. What this means is that if $x$, $x'\in V$ and $\pi_Jx=\pi_Jx'$, then $x\in\operatorname{dom}f$ iff $x'\in\operatorname{dom}f$ and in this case $f(x)=f(x')$. Setting $H=\pi_J[V\cap\operatorname{dom}f]$, we have $\pi_J^{-1}[H]=V\cap\operatorname{dom}f$ a conegligible subset of $X$, so (because $\lambda_J=\lambda\pi_J^{-1}$) $H$ is conegligible in $X_J$. Also, for $y\in H$, $f(x)=f(x')$ whenever $\pi_Jx=\pi_Jx'=y$, so there is a function $g:H\to\mathbb{R}$ defined by saying that $g\pi_J(x)=f(x)$ whenever $x\in V\cap\operatorname{dom}f$. Thus $g$ is defined almost everywhere in $X_J$ and $f$ extends $g\pi_J$. Finally, for any $a\in\mathbb{R}$,
$$\pi_J^{-1}[\{y:g(y)<a\}]=\{x:x\in V\cap\operatorname{dom}f,\ f(x)<a\}\in\Lambda;$$
by 254Oa, $\{y:g(y)<a\}\in\Lambda_J$; as $a$ is arbitrary, $g$ is measurable.
254Q Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$; write $X=X_I$, $\lambda=\lambda_I$. For $x\in X$, $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) Let $S$ be the linear subspace of $\mathbb{R}^X$ spanned by $\{\chi C:C\subseteq X\text{ is a measurable cylinder}\}$. Then for every $\lambda$-integrable real-valued function $f$ and every $\epsilon>0$ there is a $g\in S$ such that $\int|f-g|\,d\lambda\le\epsilon$.
(b) Whenever $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $\int g\,d\lambda_J=\int g\pi_J\,d\lambda$ if either integral is defined in $[-\infty,\infty]$.
(c) Whenever $f$ is a $\lambda$-integrable real-valued function, we can find a countable set $J\subseteq I$ and a $\lambda_J$-integrable function $g$ such that $f$ extends $g\pi_J$.
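proof (a)(i) Write $\overline{S}$ for the set of functions $f$ satisfying the assertion, that is, such that for every $\epsilon>0$ there is a $g\in S$ such that $\int|f-g|\le\epsilon$. Then $f_1+f_2$ and $cf_1\in\overline{S}$ whenever $f_1$, $f_2\in\overline{S}$ and $c\in\mathbb{R}$. PPP Given $\epsilon>0$ there are $g_1$, $g_2\in S$ such that $\int|f_1-g_1|\le\frac{\epsilon}{2+|c|}$, $\int|f_2-g_2|\le\frac{\epsilon}{2}$; now $g_1+g_2$, $cg_1\in S$ and $\int|(f_1+f_2)-(g_1+g_2)|\le\epsilon$, $\int|cf_1-cg_1|\le\epsilon$. QQQ Also, of course, $f\in\overline{S}$ whenever $f_0\in\overline{S}$ and $f=_{\text{a.e.}}f_0$.
(ii) Write $\mathcal{W}$ for $\{W:W\subseteq X,\ \chi W\in\overline{S}\}$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Then it is plain from the definition in 254A that $C\cap C'\in\mathcal{C}$ for all $C$, $C'\in\mathcal{C}$, and of course $C\in\mathcal{W}$ for every $C\in\mathcal{C}$, because $\chi C\in S$. Next, $W\setminus V\in\mathcal{W}$ whenever $W$, $V\in\mathcal{W}$ and $V\subseteq W$, because then $\chi(W\setminus V)=\chi W-\chi V$. Thirdly, $\bigcup_{n\in\mathbb{N}}W_n\in\mathcal{W}$ for every non-decreasing sequence $\langle W_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{W}$. PPP Set $W=\bigcup_{n\in\mathbb{N}}W_n$. Given $\epsilon>0$, there is an $n\in\mathbb{N}$ such that $\lambda(W\setminus W_n)\le\frac{\epsilon}{2}$. Now there is a $g\in S$ such that $\int|\chi W_n-g|\le\frac{\epsilon}{2}$, so that $\int|\chi W-g|\le\epsilon$. QQQ Thus $\mathcal{W}$ is a Dynkin class of subsets of $X$.
By the Monotone Class Theorem (136B), $\mathcal{W}$ must include the $\sigma$-algebra of subsets of $X$ generated by $\mathcal{C}$, which is $\widehat{\bigotimes}_{i\in I}\Sigma_i$. But this means that $\mathcal{W}$ contains every measurable subset of $X$, since by 254Ff any measurable set differs by a negligible set from some member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(iii) Thus $\overline{S}$ contains the characteristic function of any measurable subset of $X$. Because it is closed under addition and scalar multiplication, it contains all simple functions. But this means that it must contain all integrable functions. PPP If $f$ is a real-valued function which is integrable over $X$, and $\epsilon>0$, there is a simple function $h:X\to\mathbb{R}$ such that $\int|f-h|\le\frac{\epsilon}{2}$ (242M), and now there is a $g\in S$ such that $\int|h-g|\le\frac{\epsilon}{2}$, so that $\int|f-g|\le\epsilon$. QQQ
This proves part (a) of the proposition.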
(b) Put 254Oa and 235J together.
(c) By 254Pb, there are a countable $J\subseteq I$ and a real-valued function $g$ defined on a conegligible subset of $X_J$ such that $f$ extends $g\pi_J$. Now $\operatorname{dom}(g\pi_J)=\pi_J^{-1}[\operatorname{dom}g]$ is conegligible, so $f=_{\text{a.e.}}g\pi_J$ and $g\pi_J$ is $\lambda$-integrable. By (b), $g$ is $\lambda_J$-integrable.
254R Conditional expectations again Putting the ideas of 253H together with the work above, we obtain some results which are important not only for their direct applications but for the light they throw on the structures here.
Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. For $J\subseteq I$ let $\Lambda_J\subseteq\Lambda$ be the $\sigma$-subalgebra of sets determined by coordinates in $J$ (254Mb). Then we may regard $L^0(\lambda\restriction\Lambda_J)$ as a subspace of $L^0(\lambda)$ (242Jh). Let $P_J:L^1(\lambda)\to L^1(\lambda\restriction\Lambda_J)\subseteq L^1(\lambda)$ be the corresponding conditional expectation operator (242Jd). Then
(a) for any $J$, $K\subseteq I$, $P_{K\cap J}=P_KP_J$;
(b) for any $u\in L^1(\lambda)$, there is a countable set $J^*\subseteq I$ such that $P_Ju=u$ iff $J\supseteq J^*$;
(c) for any $u\in L^0(\lambda)$, there is a unique smallest set $J^*\subseteq I$ such that $u\in L^0(\lambda\restriction\Lambda_{J^*})$, and this $J^*$ is countable;
(d) for any $W\in\Lambda$ there is a unique smallest set $J^*\subseteq I$ such that $W\triangle W'$ is negligible for some $W'\in\Lambda_{J^*}$, and this $J^*$ is countable;
(e) for any $\Lambda$-measurable real-valued function $f:X\to\mathbb{R}$ there is a unique smallest set $J^*\subseteq I$ such that $f$ is equal almost everywhere to a $\Lambda_{J^*}$-measurable function, and this $J^*$ is countable.
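proof For $J\subseteq I$, write $X_J=\prod_{i\in J}X_i$, let $\lambda_J$ be the product measure on $X_J$, and set $\pi_J(x)=x\restriction J$ for $x\in X$. Write $L^0_J$ for $L^0(\lambda\restriction\Lambda_J)$, regarded as a subset of $L^0=L^0_I$, and $L^1_J$ for $L^1(\lambda\restriction\Lambda_J)=L^1(\lambda)\cap L^0_J$, as in 242Jb; thus $L^1_J$ is the set of values of the projection $P_J$.
(a)(i) Let $C\subseteq X$ be a measurable cylinder, expressed as $\prod_{i\in I}C_i$ where $C_i\in\Sigma_i$ for every $i$ and $L=\{i:C_i\ne X_i\}$ is finite. Set
$$C'_i=C_i\text{ for }i\in J,\quad X_i\text{ for }i\in I\setminus J,\qquad C'=\prod_{i\in I}C'_i,\qquad\alpha=\prod_{i\in I\setminus J}\mu_iC_i.$$
Then $\alpha\chi C'$ is a conditional expectation of $\chi C$ on $\Lambda_J$. PPP By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$. This identifies $\Lambda_J$ with $\{E\times X_{I\setminus J}:E\in\operatorname{dom}\lambda_J\}$. By 253H we have a conditional expectation $g$ of $\chi C$ defined by setting
$$g(y,z)=\int\chi C(y,t)\,\lambda_{I\setminus J}(dt)$$
for $y\in X_J$, $z\in X_{I\setminus J}$. But $C$ is identified with $C_J\times C_{I\setminus J}$, where $C_J=\prod_{i\in J}C_i$ and $C_{I\setminus J}=\prod_{i\in I\setminus J}C_i$, so that $g(y,z)=0$ if $y\notin C_J$ and otherwise is $\lambda_{I\setminus J}C_{I\setminus J}=\alpha$. Thus $g=\alpha\chi(C_J\times X_{I\setminus J})$. But the identification between $X_J\times X_{I\setminus J}$ and $X$ matches $C_J\times X_{I\setminus J}$ with $C'$, as described above. So $g$ becomes identified with $\alpha\chi C'$ and $\alpha\chi C'$ is a conditional expectation of $\chi C$. QQQ
(ii) Next, setting
$$C''_i=C'_i\text{ for }i\in K,\quad X_i\text{ for }i\in I\setminus K,\qquad C''=\prod_{i\in I}C''_i,$$
$$\beta=\prod_{i\in I\setminus K}\mu_iC'_i=\prod_{i\in J\setminus K}\mu_iC_i,$$
the same arguments show that $\beta\chi C''$ is a conditional expectation of $\chi C'$ on $\Lambda_K$. So we have
$$P_KP_J(\chi C)^\bullet=\alpha\beta(\chi C'')^\bullet.$$
But if we look at $\alpha\beta$, this is just $\prod_{i\in I\setminus(K\cap J)}\mu_iC_i$, while $C''_i=C_i$ if $i\in K\cap J$, $X_i$ for other $i$. So $\alpha\beta\chi C''$ is a conditional expectation of $\chi C$ on $\Lambda_{K\cap J}$, and
$$P_KP_J(\chi C)^\bullet=P_{K\cap J}(\chi C)^\bullet.$$
(iii) Thus we see that the operators $P_KP_J$, $P_{K\cap J}$ agree on elements of the form $(\chi C)^\bullet$ where $C$ is a measurable cylinder. Because they are both linear, they agree on linear combinations of these, that is, $P_KP_Jv=P_{K\cap J}v$ whenever $v=g^\bullet$ for some $g$ in the space $S$ of 254Q. But if $u\in L^1(\lambda)$ and $\epsilon>0$, there is a $\lambda$-integrable function $f$ such that $f^\bullet=u$ and there is a $g\in S$ such that $\int|f-g|\le\epsilon$ (254Qa), so that $\|u-v\|_1\le\epsilon$, where $v=g^\bullet$. Since $P_J$, $P_K$ and $P_{K\cap J}$ are all linear operators of norm 1,
$$\|P_KP_Ju-P_{K\cap J}u\|_1\le 2\|u-v\|_1+\|P_KP_Jv-P_{K\cap J}v\|_1\le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_KP_Ju=P_{K\cap J}u$; as $u$ is arbitrary, $P_KP_J=P_{K\cap J}$.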
(b) Take $u\in L^1(\lambda)$. Let $\mathcal{J}$ be the family of all subsets $J$ of $I$ such that $P_Ju=u$. By (a), $J\cap K\in\mathcal{J}$ for all $J$, $K\in\mathcal{J}$. Next, $\mathcal{J}$ contains a countable set $J_0$. PPP Let $f$ be a $\lambda$-integrable function such that $f^\bullet=u$. By 254Qc, we can find a countable set $J_0\subseteq I$ and a $\lambda_{J_0}$-integrable function $g$ such that $f=_{\text{a.e.}}g\pi_{J_0}$. Now $g\pi_{J_0}$ is $\Lambda_{J_0}$-measurable and $u=(g\pi_{J_0})^\bullet$ belongs to $L^1_{J_0}$, so $J_0\in\mathcal{J}$. QQQ
Write $J^*=\bigcap\mathcal{J}$, so that $J^*\subseteq J_0$ is countable. Then $J^*\in\mathcal{J}$. PPP Let $\epsilon>0$. As in the proof of (a) above, there is a $g\in S$ such that $\|u-v\|_1\le\epsilon$, where $v=g^\bullet$. But because $g$ is a finite linear combination of characteristic functions of measurable cylinders, each determined by coordinates in some finite set, there is a finite $K\subseteq I$ such that $g$ is $\Lambda_K$-measurable, so that $P_Kv=v$. Because $K$ is finite, there must be $J_1,\ldots,J_n\in\mathcal{J}$ such that $J^*\cap K=\bigcap_{1\le i\le n}J_i\cap K$; but as $\mathcal{J}$ is closed under finite intersections, $J=J_1\cap\ldots\cap J_n\in\mathcal{J}$, and $J^*\cap K=J\cap K$.
Now we have
$$P_{J^*}v=P_{J^*}P_Kv=P_{J^*\cap K}v=P_{J\cap K}v=P_JP_Kv=P_Jv,$$
using (a) twice. Because both $P_J$ and $P_{J^*}$ have norm 1,
$$\|P_{J^*}u-u\|_1\le\|P_{J^*}u-P_{J^*}v\|_1+\|P_{J^*}v-P_Jv\|_1+\|P_Jv-P_Ju\|_1+\|P_Ju-u\|_1$$
$$\le\|u-v\|_1+0+\|u-v\|_1+0\le 2\epsilon.$$
As $\epsilon$ is arbitrary, $P_{J^*}u=u$ and $J^*\in\mathcal{J}$. QQQ
Now, for any $J\subseteq I$,
$$P_Ju=u\implies J\in\mathcal{J}\implies J\supseteq J^*\implies P_Ju=P_JP_{J^*}u=P_{J\cap J^*}u=P_{J^*}u=u.$$
Thus $J^*$ has the required properties.


(c) Set $e=(\chi X)^\bullet$, $u_n=(-ne)\vee(u\wedge ne)$ for each $n\in\mathbb{N}$. Then, for any $J\subseteq I$, $u\in L^0_J$ iff $u_n\in L^0_J$ for every $n$. PPP ($\alpha$) If $u\in L^0_J$, then $u$ is expressible as $f^\bullet$ for some $\Lambda_J$-measurable $f$; now $f_n=(-n\chi X)\vee(f\wedge n\chi X)$ is $\Lambda_J$-measurable, so $u_n=f_n^\bullet\in L^0_J$ for every $n$. ($\beta$) If $u_n\in L^0_J$ for each $n$, then for each $n$ we can find a $\Lambda_J$-measurable function $f_n$ such that $f_n^\bullet=u_n$. But there is also a $\Lambda$-measurable function $f$ such that $u=f^\bullet$, and we must have $f_n=_{\text{a.e.}}(-n\chi X)\vee(f\wedge n\chi X)$ for each $n$, so that $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$ and $u=(\lim_{n\to\infty}f_n)^\bullet$. Since $\lim_{n\to\infty}f_n$ is $\Lambda_J$-measurable, $u\in L^0_J$. QQQ
As every $u_n$ belongs to $L^1$, we know that
$$u_n\in L^0_J\iff u_n\in L^1_J\iff P_Ju_n=u_n.$$
By (b), there is for each $n$ a countable $J^*_n$ such that $P_Ju_n=u_n$ iff $J\supseteq J^*_n$. So we see that $u\in L^0_J$ iff $J\supseteq J^*_n$ for every $n$, that is, $J\supseteq\bigcup_{n\in\mathbb{N}}J^*_n$. Thus $J^*=\bigcup_{n\in\mathbb{N}}J^*_n$ has the property claimed.
(d) Applying (c) to $u=(\chi W)^\bullet$, we have a (countable) unique smallest $J^*$ such that $u\in L^0_{J^*}$. But if $J\subseteq I$, then there is a $W'\in\Lambda_J$ such that $W'\triangle W$ is negligible iff $u\in L^0_J$. So this is the $J^*$ we are looking for.
(e) Again apply (c), this time to $f^\bullet$.
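For finitely many finite factors, the conditional expectation $P_J$ is given by the formula of 253H used in (a)(i): average over the coordinates outside $J$ while holding those in $J$ fixed. Under that finiteness assumption the identity $P_KP_J=P_{K\cap J}$ can be confirmed by exact computation. The Python sketch below is illustrative only; the data in `weights` and the names `cond_exp` and `semigroup_law_holds` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Point masses of three two-point probability spaces (hypothetical data).
weights = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
]
I = range(len(weights))
X = list(product(*weights))  # the product set, as tuples indexed by I

def cond_exp(f, J):
    """P_J f: integrate out the coordinates not in J, keeping those in J fixed."""
    out = {}
    for x in X:
        total = Fraction(0)
        for t in X:
            if all(t[i] == x[i] for i in J):
                w = Fraction(1)
                for i in I:
                    if i not in J:
                        w *= weights[i][t[i]]
                total += w * f[t]
        out[x] = total
    return out

def semigroup_law_holds(f):
    """Check P_K P_J f == P_{K & J} f for all subsets J, K of I."""
    subsets = [frozenset(c) for r in range(len(weights) + 1)
               for c in combinations(I, r)]
    return all(cond_exp(cond_exp(f, J), K) == cond_exp(f, J & K)
               for J in subsets for K in subsets)

# An arbitrary test function on X (hypothetical).
f = {x: Fraction(x[0] + 2 * x[1] * x[2] + 5 * x[2]) for x in X}
```

Here `cond_exp(f, frozenset())` is the constant function with value $\int f\,d\lambda$, and `cond_exp(f, frozenset(I))` returns `f` itself, the two extreme cases $J=\emptyset$ and $J=I$.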
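254S Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$.
(a) If $A\subseteq X$ is determined by coordinates in $I\setminus\{j\}$ for every $j\in I$, then its outer measure $\lambda^*A$ must be either 0 or 1.
(b) If $W\in\Lambda$ and $\lambda W>0$, then for every $\epsilon>0$ there are a $W'\in\Lambda$ and a finite set $J\subseteq I$ such that $\lambda W'\ge 1-\epsilon$ and for every $x\in W'$ there is a $y\in W$ such that $x\restriction I\setminus J=y\restriction I\setminus J$.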


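proof For $J\subseteq I$ write $X_J$ for $\prod_{i\in J}X_i$ and $\lambda_J$ for the product measure on $X_J$.
(a) Let $W$ be a measurable envelope of $A$. By 254Rd, there is a smallest $J\subseteq I$ for which there is a $W'\in\Lambda$, determined by coordinates in $J$, with $\lambda(W\triangle W')=0$. Now $J=\emptyset$. PPP Take any $j\in I$. Then $A$ is determined by coordinates in $I\setminus\{j\}$, that is, can be regarded as $X_j\times A'$ for some $A'\subseteq X_{I\setminus\{j\}}$. We can also think of $\lambda$ as the product of $\lambda_{\{j\}}$ and $\lambda_{I\setminus\{j\}}$ (254N). Let $\Lambda_{I\setminus\{j\}}$ be the domain of $\lambda_{I\setminus\{j\}}$. By 251S,
$$\lambda^*A=\lambda^*_{\{j\}}X_j\cdot\lambda^*_{I\setminus\{j\}}A'=\lambda^*_{I\setminus\{j\}}A'.$$
Let $V\in\Lambda_{I\setminus\{j\}}$ be a measurable envelope of $A'$. Then $W'=X_j\times V$ belongs to $\Lambda$, includes $A$ and has measure $\lambda^*A$, so $\lambda(W\cap W')=\lambda W=\lambda W'$ and $W\triangle W'$ is negligible. At the same time, $W'$ is determined by coordinates in $I\setminus\{j\}$. This means that $J$ must be included in $I\setminus\{j\}$. As $j$ is arbitrary, $J=\emptyset$. QQQ
But the only subsets of $X$ which are determined by coordinates in $\emptyset$ are $X$ and $\emptyset$. Since $W$ differs from one of these by a negligible set, $\lambda^*A=\lambda W\in\{0,1\}$, as claimed.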
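(b) Set $\eta=\frac12\min(\epsilon,1)\lambda W$. By 254Fe, there is a measurable set $V$, determined by coordinates in a finite subset $J$ of $I$, such that $\lambda(W\triangle V)\le\eta$. Note that
$$\lambda V\ge\lambda W-\eta\ge\tfrac12\lambda W>0,$$
so
$$\lambda(W\triangle V)\le\tfrac12\epsilon\lambda W\le\epsilon\lambda V.$$
We may identify $\lambda$ with the c.l.d. product of $\lambda_J$ and $\lambda_{I\setminus J}$ (254N). Let $\tilde W$, $\tilde V\subseteq X_J\times X_{I\setminus J}$ be the sets corresponding to $W$, $V\subseteq X$. Then $\tilde V$ can be expressed as $U\times X_{I\setminus J}$ where $\lambda_JU=\lambda V>0$. Set $U'=\{z:z\in X_{I\setminus J},\ \lambda_J\tilde W^{-1}[\{z\}]=0\}$. Then $U'$ is measured by $\lambda_{I\setminus J}$ (252D(ii) again, because both $\lambda_J$ and $\lambda_{I\setminus J}$ are complete), and
$$\lambda_JU\cdot\lambda_{I\setminus J}U'\le\int\lambda_J(\tilde W^{-1}[\{z\}]\triangle U)\,\lambda_{I\setminus J}(dz)$$
(because if $z\in U'$ then $\lambda_J(\tilde W^{-1}[\{z\}]\triangle U)=\lambda_JU$)
$$=\int\lambda_J(\tilde W\triangle\tilde V)^{-1}[\{z\}]\,\lambda_{I\setminus J}(dz)=(\lambda_J\times\lambda_{I\setminus J})(\tilde W\triangle\tilde V)$$
(252D once more)
$$=\lambda(W\triangle V)\le\epsilon\lambda V=\epsilon\lambda_JU.$$
This means that $\lambda_{I\setminus J}U'\le\epsilon$. Set $W'=\{x:x\in X,\ x\restriction I\setminus J\notin U'\}$; then $\lambda W'\ge 1-\epsilon$. If $x\in W'$, then $z=x\restriction I\setminus J\notin U'$, so $\tilde W^{-1}[\{z\}]$ is not empty, that is, there is a $y\in W$ such that $y\restriction I\setminus J=z$. So this $W'$ has the required properties.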


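254T Remarks It is important to understand that the results above apply to $L^0$ and $L^1$ and measurable-sets-up-to-a-negligible set, not to sets and functions themselves. One idea does apply to sets and functions, whether measurable or not.
(a) Let $\langle X_i\rangle_{i\in I}$ be a family of sets with Cartesian product $X$. For each $J\subseteq I$ let $\mathcal{W}_J$ be the set of subsets of $X$ determined by coordinates in $J$. Then $\mathcal{W}_J\cap\mathcal{W}_K=\mathcal{W}_{J\cap K}$ for all $J$, $K\subseteq I$. PPP Of course $\mathcal{W}_J\cap\mathcal{W}_K\supseteq\mathcal{W}_{J\cap K}$, because $\mathcal{W}_J\supseteq\mathcal{W}_{J'}$ whenever $J'\subseteq J$. On the other hand, suppose $W\in\mathcal{W}_J\cap\mathcal{W}_K$, $x\in W$, $y\in X$ and $x\restriction J\cap K=y\restriction J\cap K$. Set $z(i)=x(i)$ for $i\in J$, $y(i)$ for $i\in I\setminus J$. Then $z\restriction J=x\restriction J$ so $z\in W$. Also $y\restriction K=z\restriction K$ so $y\in W$. As $x$, $y$ are arbitrary, $W\in\mathcal{W}_{J\cap K}$; as $W$ is arbitrary, $\mathcal{W}_J\cap\mathcal{W}_K\subseteq\mathcal{W}_{J\cap K}$. QQQ Accordingly, for any $W\subseteq X$, $\mathcal{F}=\{J:W\in\mathcal{W}_J\}$ is a filter on $I$ (unless $W=X$ or $W=\emptyset$, in which case $\mathcal{F}=\mathcal{P}I$). But $\mathcal{F}$ does not necessarily have a least element, as the following example shows.
(b) Set $X=\{0,1\}^{\mathbb{N}}$,
$$W=\{x:x\in X,\ \lim_{i\to\infty}x(i)=0\}.$$
Then for every $n\in\mathbb{N}$ $W$ is determined by coordinates in $J_n=\{i:i\ge n\}$. But $W$ is not determined by coordinates in $\bigcap_{n\in\mathbb{N}}J_n=\emptyset$. Note that
$$W=\bigcup_{n\in\mathbb{N}}\bigcap_{i\ge n}\{x:x(i)=0\}$$
is measured by the usual measure on $X$. But it is also negligible (since it is countable); in 254Rd we have $J^*=\emptyset$, $W'=\emptyset$.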
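The identity $\mathcal{W}_J\cap\mathcal{W}_K=\mathcal{W}_{J\cap K}$ of (a) can be verified exhaustively when $I$ is finite. The Python sketch below does this for every subset of $\{0,1\}^3$; it is an illustration only, and the names `determined_by`, `supports` and `intersection_rule_holds` are hypothetical. Note that for finite $I$ the family $\{J:W\in\mathcal{W}_J\}$ is a finite family closed under intersections, so it always has a least element; the phenomenon in (b) needs an infinite index set.

```python
from itertools import combinations, product

I = (0, 1, 2)
X = list(product((0, 1), repeat=len(I)))

def determined_by(W, J):
    """True iff membership of x in W depends only on the coordinates of x in J."""
    image = {tuple(x[i] for i in J) for x in W}
    return all((x in W) == (tuple(x[i] for i in J) in image) for x in X)

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def supports(W):
    """The family of all J such that W is determined by coordinates in J."""
    return {J for J in subsets(I) if determined_by(W, sorted(J))}

def intersection_rule_holds():
    """Check that each family supports(W) is closed under pairwise intersection."""
    for W in subsets(X):
        F = supports(W)
        if any((J & K) not in F for J in F for K in F):
            return False
    return True
```

For instance, with `W = frozenset(x for x in X if x[0] == 1)` the smallest member of `supports(W)` is `frozenset({0})`.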
*254U I am now in a position to describe a counter-example answering a natural question arising out of §251.
Example There are a localizable measure space $(X,\Sigma,\mu)$ and a probability space $(Y,\mathrm{T},\nu)$ such that the c.l.d. product measure $\lambda$ on $X\times Y$ is not localizable.
proof (a) Take $(X,\Sigma,\mu)$ to be the space of 216E, so that $X=\{0,1\}^I$, where $I=\mathcal{P}C$ for some set $C$ of cardinal greater than $\mathfrak{c}$. For each $\gamma\in C$ write $E_\gamma$ for $\{x:x\in X,\ x(\{\gamma\})=1\}$ (that is, $G_{\{\gamma\}}$ in the notation of 216Ec); then $E_\gamma\in\Sigma$ and $\mu E_\gamma=1$; also every measurable set of non-zero measure meets some $E_\gamma$ in a set of non-zero measure, while $E_\gamma\cap E_\delta$ is negligible for all distinct $\gamma$, $\delta$ (see 216Ee).
Let $(Y,\mathrm{T},\nu)$ be $\{0,1\}^C$ with the usual measure (254J). For $\gamma\in C$, let $F_\gamma$ be $\{y:y\in Y,\ y(\gamma)=1\}$, so that $\nu F_\gamma=\frac12$.
Let $\lambda$ be the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain.
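(b) Consider the family $\mathcal{W}=\{E_\gamma\times F_\gamma:\gamma\in C\}\subseteq\Lambda$. ??? Suppose, if possible, that $V$ were an essential supremum of $\mathcal{W}$ in $\Lambda$ in the sense of 211G. For $\gamma\in C$ write $H_\gamma=\{x:V[\{x\}]\triangle F_\gamma\text{ is negligible}\}$. Because $F_\gamma\triangle F_\delta$ is non-negligible, $H_\gamma\cap H_\delta=\emptyset$ for all $\gamma\ne\delta$.
Now $E_\gamma\setminus H_\gamma$ is $\mu$-negligible for every $\gamma\in C$. PPP $\lambda((E_\gamma\times F_\gamma)\setminus V)=0$, so $F_\gamma\setminus V[\{x\}]$ is negligible for almost every $x\in E_\gamma$, by 252D. On the other hand, if we set $F'_\gamma=Y\setminus F_\gamma$, $W_\gamma=(X\times Y)\setminus(E_\gamma\times F'_\gamma)$, then we see that
$$(E_\gamma\times F'_\gamma)\cap(E_\gamma\times F_\gamma)=\emptyset,\quad E_\gamma\times F_\gamma\subseteq W_\gamma,$$
$$\lambda((E_\delta\times F_\delta)\setminus W_\gamma)=\lambda((E_\gamma\times F'_\gamma)\cap(E_\delta\times F_\delta))\le\mu(E_\gamma\cap E_\delta)=0$$
for every $\delta\ne\gamma$, so $W_\gamma$ is an essential upper bound for $\mathcal{W}$ and $V\cap(E_\gamma\times F'_\gamma)=V\setminus W_\gamma$ must be $\lambda$-negligible. Accordingly $V[\{x\}]\setminus F_\gamma=V[\{x\}]\cap F'_\gamma$ is $\nu$-negligible for $\mu$-almost every $x\in E_\gamma$. But this means that $V[\{x\}]\triangle F_\gamma$ is $\nu$-negligible for $\mu$-almost every $x\in E_\gamma$, that is, $\mu(E_\gamma\setminus H_\gamma)=0$. QQQ
Now consider the family $\langle E_\gamma\cap H_\gamma\rangle_{\gamma\in C}$. This is a disjoint family of sets of finite measure in $X$. If $E\in\Sigma$ has non-zero measure, there is a $\gamma\in C$ such that $\mu(E_\gamma\cap H_\gamma\cap E)=\mu(E_\gamma\cap E)>0$. But this means that $\mathcal{E}=\{E_\gamma\cap H_\gamma:\gamma\in C\}$ satisfies the conditions of 213O, and $\mu$ must be strictly localizable; which it isn't. XXX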
(c) Thus we have found a family $\mathcal{W}\subseteq\Lambda$ with no essential supremum in $\Lambda$, and $\lambda$ is not localizable.
Remark If $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ are any localizable measure spaces with a non-localizable c.l.d. product measure, then their c.l.d. versions are still localizable (213Hb) and still have a non-localizable product (251T), which cannot be strictly localizable; so that at least one of the factors is not strictly localizable (251O). Thus any example of the type here must involve a complete locally determined localizable space which is not strictly localizable, as in 216E.
254V Corresponding to 251U and 251Wo, we have the following result on countable powers of atomless probability spaces.
Proposition Let $(X,\Sigma,\mu)$ be an atomless probability space and $I$ a countable set. Let $\lambda$ be the product probability measure on $X^I$. Then $\{x:x\in X^I,\ x\text{ is injective}\}$ is $\lambda$-conegligible.
proof For any pair $\{i,j\}$ of distinct elements of $I$, the set $\{z:z\in X^{\{i,j\}},\ z(i)=z(j)\}$ is negligible for the product measure on $X^{\{i,j\}}$, by 251U. By 254Oa, $\{x:x\in X^I,\ x(i)=x(j)\}$ is $\lambda$-negligible. Because $I$ is countable, there are only countably many such pairs $\{i,j\}$, so $\{x:x\in X^I,\ x(i)=x(j)\text{ for some distinct }i,j\in I\}$ is negligible, and its complement is conegligible; but this complement is just the set of injective functions from $I$ to $X$.
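A discrete analogue may make the role of atomlessness visible. If $X$ is replaced by a set of $m$ equally weighted atoms and $I$ has $n$ elements, each pair $i\ne j$ collides with probability $1/m$, so the set of injective points has measure $\prod_{k<n}(1-k/m)\ge 1-\binom{n}{2}/m$, which tends to 1 as the atoms shrink. The Python sketch below computes this quantity exactly; it is a heuristic illustration of the counting argument, not a proof of the proposition, and the function name `prob_injective` is hypothetical.

```python
from fractions import Fraction

def prob_injective(n, m):
    """Measure of the injective points of {0,...,m-1}^n under the uniform product measure."""
    p = Fraction(1)
    for k in range(n):
        # The (k+1)-th coordinate must avoid the k values already used.
        p *= Fraction(m - k, m)
    return p
```

For example `prob_injective(2, 2)` is `Fraction(1, 2)`, while `prob_injective(5, 10**6)` is within $10^{-5}$ of 1.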
254X Basic exercises (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be any family of probability spaces, with product $(X,\Lambda,\lambda)$. Write $\mathcal{E}$ for the family of subsets of $X$ expressible as the union of a finite disjoint family of measurable cylinders. (i) Show that if $C\subseteq X$ is a measurable cylinder then $X\setminus C\in\mathcal{E}$. (ii) Show that $W\cap V\in\mathcal{E}$ for all $W$, $V\in\mathcal{E}$. (iii) Show that $X\setminus W\in\mathcal{E}$ for every $W\in\mathcal{E}$. (iv) Show that $\mathcal{E}$ is an algebra of subsets of $X$. (v) Show that for any $W\in\Lambda$, $\epsilon>0$ there is a $V\in\mathcal{E}$ such that $\lambda(W\triangle V)\le\epsilon^2$. (vi) Show that for any $W\in\Lambda$, $\epsilon>0$ there are disjoint measurable cylinders $C_0,\ldots,C_n$ such that $\lambda(W\cap C_j)\ge(1-\epsilon)\lambda C_j$ for every $j$ and $\lambda(W\setminus\bigcup_{j\le n}C_j)\le\epsilon$. (Hint: select the $C_j$ from the measurable cylinders composing a set $V$ as in (v).) (vii) Show that if $f$, $g$ are $\lambda$-integrable functions and $\int_Cf\le\int_Cg$ for every measurable cylinder $C\subseteq X$, then $f\le_{\text{a.e.}}g$. (Hint: show that $\int_Wf\le\int_Wg$ for every $W\in\Lambda$.)
>>>(b) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$. Show that the outer measure $\lambda^*$ defined by $\lambda$ is exactly the outer measure $\theta$ described in 254A, that is, that $\theta$ is a regular outer measure.
(c) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$. Write $\lambda_0$ for the restriction of $\lambda$ to $\widehat{\bigotimes}_{i\in I}\Sigma_i$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Suppose that $(Y,\mathrm{T},\nu)$ is a probability space and $\phi:Y\to X$ a function. (i) Show that $\phi$ is inverse-measure-preserving when regarded as a function from $(Y,\mathrm{T},\nu)$ to $(X,\widehat{\bigotimes}_{i\in I}\Sigma_i,\lambda_0)$ iff $\phi^{-1}[C]$ belongs to $\mathrm{T}$ and $\nu\phi^{-1}[C]=\lambda_0C$ for every $C\in\mathcal{C}$. (ii) Show that $\lambda_0$ is the only measure on $X$ with this property. (Hint: 136C.)
>>>(d) Let $I$ be a set and $(Y,\mathrm{T},\nu)$ a complete probability space. Show that a function $\phi:Y\to\{0,1\}^I$ is inverse-measure-preserving for $\nu$ and the usual measure on $\{0,1\}^I$ iff $\nu\{y:\phi(y)(i)=1\text{ for every }i\in J\}=2^{-\#(J)}$ for every finite $J\subseteq I$.
>>>(e) Let $I$ be any set and $\lambda$ the usual measure on $X=\{0,1\}^I$. Define addition on $X$ by setting $(x+y)(i)=x(i)+_2y(i)$ for every $i\in I$, $x$, $y\in X$, where $0+_20=1+_21=0$, $0+_21=1+_20=1$. (i) Show that for any $y\in X$, the map $x\mapsto x+y:X\to X$ is inverse-measure-preserving. (Hint: Use 254G.) (ii) Show that the map $(x,y)\mapsto x+y:X\times X\to X$ is inverse-measure-preserving, if $X\times X$ is given its product measure.
>>>(f) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. (i) Show that the map $a\mapsto a\triangle b:\mathcal{P}I\to\mathcal{P}I$ is inverse-measure-preserving for any $b\subseteq I$; in particular, $a\mapsto I\setminus a$ is inverse-measure-preserving. (ii) Show that the map $(a,b)\mapsto a\triangle b:\mathcal{P}I\times\mathcal{P}I\to\mathcal{P}I$ is inverse-measure-preserving.
>>>(g) Show that for any $q\in[0,1]$ and any set $I$ there is a measure $\lambda$ on $\mathcal{P}I$ such that $\lambda\{a:J\subseteq a\}=q^{\#(J)}$ for every finite $J\subseteq I$.
>>>(h) Let $(Y,\mathrm{T},\nu)$ be a complete probability space, and write $\mu$ for Lebesgue measure on $[0,1]$. Suppose that $\phi:Y\to[0,1]$ is a function such that $\nu\phi^{-1}[I]$ exists and is equal to $\mu I$ for every interval $I$ of the form $[2^{-n}k,2^{-n}(k+1)]$, where $n\in\mathbb{N}$ and $0\le k<2^n$. Show that $\phi$ is inverse-measure-preserving for $\nu$ and $\mu$.
(i) Let $\langle X_i\rangle_{i\in I}$ be a family of sets, and for each $i\in I$ let $\Sigma_i$ be a $\sigma$-algebra of subsets of $X_i$. Show that for every $E\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ there is a countable set $J\subseteq I$ such that $E$ is expressible as $\pi_J^{-1}[F]$ for some $F\subseteq\prod_{i\in J}X_i$, writing $\pi_J(x)=x\restriction J\in\prod_{i\in J}X_i$ for $x\in\prod_{i\in I}X_i$.
(j)(i) Let $\lambda$ be the usual measure on $X=\{0,1\}^{\mathbb{N}}$. Show that for any $k\ge 1$, $(X,\lambda)$ is isomorphic to $(X^k,\lambda_k)$, where $\lambda_k$ is the measure on $X^k$ which is the product measure obtained by giving each factor $X$ the measure $\lambda$. (ii) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0,1]$, etc., show that for any $k\ge 1$, $([0,1]^k,\mu_{[0,1]^k})$ is isomorphic to $([0,1],\mu_{[0,1]})$.
(k)(i) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0,1]$, etc., show that $([0,1],\mu_{[0,1]})$ is isomorphic to $([0,1[\,,\mu_{[0,1[})$. (ii) Show that for any $k\ge 1$, $([0,1[^k,\mu_{[0,1[^k})$ is isomorphic to $([0,1[\,,\mu_{[0,1[})$. (iii) Show that for any $k\ge 1$, $(\mathbb{R},\mu_{\mathbb{R}})$ is isomorphic to $(\mathbb{R}^k,\mu_{\mathbb{R}^k})$.
(l) Let $\mu$ be Lebesgue measure on $[0,1]$ and $\lambda$ the product measure on $[0,1]^{\mathbb{N}}$. Show that $([0,1],\mu)$ and $([0,1]^{\mathbb{N}},\lambda)$ are isomorphic.
(m) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of complete probability spaces and $\lambda$ the product measure on $\prod_{i\in I}X_i$, with domain $\Lambda$. Suppose that $A_i\subseteq X_i$ for each $i\in I$. Show that $\prod_{i\in I}A_i\in\Lambda$ iff either (i) $\prod_{i\in I}\mu_i^*A_i=0$ or (ii) $A_i\in\Sigma_i$ for every $i$ and $\{i:A_i\ne X_i\}$ is countable. (Hint: assemble ideas from 252Xc, 254F, 254L and 254N.)
(n) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. (i) Show that, for any $A\subseteq X$,
$$\lambda^*A=\min\{\lambda_J^*\pi_J[A]:J\subseteq I\text{ is countable}\},$$
where for $J\subseteq I$ I write $\lambda_J$ for the product probability measure on $X_J=\prod_{i\in J}X_i$ and $\pi_J:X\to X_J$ for the canonical map. (ii) Show that if $J$, $K\subseteq I$ are disjoint and $A$, $B\subseteq X$ are determined by coordinates in $J$, $K$ respectively, then $\lambda^*(A\cap B)=\lambda^*A\cdot\lambda^*B$.
(o) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $S$ be the linear span of the set of characteristic functions of measurable cylinders in $X$, as in 254Q. Show that $\{f^\bullet:f\in S\}$ is dense in $L^p(\lambda)$ for every $p\in[1,\infty[$.
(p) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product; for $J\subseteq I$ let $\Lambda_J$ be the $\sigma$-algebra of members of $\Lambda$ determined by coordinates in $J$ and $P_J:L^1=L^1(\lambda)\to L^1_J=L^1(\lambda\restriction\Lambda_J)$ the corresponding conditional expectation. (i) Show that if $u\in L^1_J$ and $v\in L^1_{I\setminus J}$ then $u\times v\in L^1$ and $\int u\times v=\int u\int v$. (Hint: 253D.) (ii) Show that if $u\in L^1$ then $u\in L^1_J$ iff $\int_Cu=\lambda C\cdot\int u$ for every measurable cylinder $C\subseteq X$ which is determined by coordinates in $I\setminus J$. (Hint: 254Xa(vii).) (iii) Show that if $\mathcal{J}\subseteq\mathcal{P}I$ is non-empty, with $J^*=\bigcap\mathcal{J}$, then $L^1_{J^*}=\bigcap_{J\in\mathcal{J}}L^1_J$.
(q)(i) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. Let $A\subseteq\mathcal{P}I$ be such that $a\triangle b\in A$ whenever $a\in A$ and $b\subseteq I$ is finite. Show that $\lambda^*A$ must be either 0 or 1. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$, and $\Lambda$ its domain. Let $f:\{0,1\}^{\mathbb{N}}\to\mathbb{R}$ be a function such that, for $x$, $y\in\{0,1\}^{\mathbb{N}}$, $f(x)=f(y)\iff\{n:n\in\mathbb{N},\ x(n)\ne y(n)\}$ is finite. Show that $f$ is not $\Lambda$-measurable. (Hint: for any $q\in\mathbb{Q}$, $\lambda^*\{x:f(x)\le q\}$ is either 0 or 1.)
(r) Let $\langle X_i\rangle_{i\in I}$ be any family of sets and $A\subseteq B\subseteq\prod_{i\in I}X_i$. Suppose that $A$ is determined by coordinates in $J\subseteq I$ and that $B$ is determined by coordinates in $K$. Show that there is a set $C$ such that $A\subseteq C\subseteq B$ and $C$ is determined by coordinates in $J\cap K$.
(s) Show that if $\tilde\phi:\{0,1\}^{\mathbb{N}}\to[0,1]$ is any bijection constructed by the method of 254K, then $\{\tilde\phi^{-1}[E]:E\subseteq[0,1]\text{ is a Borel set}\}$ is just the $\sigma$-algebra of subsets of $\{0,1\}^{\mathbb{N}}$ generated by the sets $\{x:x(i)=1\}$ for $i\in\mathbb{N}$.
254Y Further exercises (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for $J\subseteq I$ let $\lambda_J$ be the product measure on $X_J=\prod_{i\in J}X_i$; write $X=X_I$, $\lambda=\lambda_I$ and $\pi_J(x)=x\restriction J$ for $x\in X$ and $J\subseteq I$.
(i) Show that for $K\subseteq J\subseteq I$ we have a natural linear, order-preserving and norm-preserving map $T_{JK}:L^1(\lambda_K)\to L^1(\lambda_J)$ defined by writing $T_{JK}(f^\bullet)=(f\pi_{KJ})^\bullet$ for every $\lambda_K$-integrable function $f$, where $\pi_{KJ}(y)=y\restriction K$ for $y\in X_J$.
(ii) Write $\mathcal{K}$ for the set of finite subsets of $I$. Show that if $W$ is any Banach space and $\langle T_K\rangle_{K\in\mathcal{K}}$ is a family such that ($\alpha$) $T_K$ is a bounded linear operator from $L^1(\lambda_K)$ to $W$ for every $K\in\mathcal{K}$ ($\beta$) $T_K=T_JT_{JK}$ whenever $K\subseteq J\in\mathcal{K}$ ($\gamma$) $\sup_{K\in\mathcal{K}}\|T_K\|<\infty$, then there is a unique bounded linear operator $T:L^1(\lambda)\to W$ such that $T_K=TT_{IK}$ for every $K\in\mathcal{K}$.
(iii) Write $\mathcal{J}$ for the set of countable subsets of $I$. Show that $L^1(\lambda)=\bigcup_{J\in\mathcal{J}}T_{IJ}[L^1(\lambda_J)]$.
(b) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $\lambda$ a complete measure on $X=\prod_{i\in I}X_i$. Suppose that for every complete probability space $(Y,\mathrm{T},\nu)$ and function $\phi:Y\to X$, $\phi$ is inverse-measure-preserving for $\nu$ and $\lambda$ iff $\nu\phi^{-1}[C]$ is defined and equal to $\theta_0C$ for every measurable cylinder $C\subseteq X$, writing $\theta_0$ for the functional of 254A. Show that $\lambda$ is the product measure on $X$.
(c) Let $I$ be a set, and $\lambda$ the usual measure on $\{0,1\}^I$. Show that $L^1(\lambda)$ is separable, in its norm topology, iff $I$ is countable.
(d) Let $I$ be a set, and $\lambda$ the usual measure on $\mathcal{P}I$. Show that if $\mathcal{F}$ is a non-principal ultrafilter on $I$ then $\lambda^*\mathcal{F}=1$. (Hint: 254Xq, 254Xf.)
(e) Let (X, , ), (Y, T, ) and be as in 254U. Set A = x

: C as dened in 216E. Let


A
be the subspace
measure on A, and

the c.l.d. product measure of
A
and on A Y . Show that

is a proper extension of the
subspace measure
AY
. (Hint: consider

W = (f

, y) : C, y F

, in the notation of 254U.)


(f) Let $(X,\Sigma,\mu)$ be an atomless probability space, $I$ a set with cardinal at most $\#(X)$, and $A$ the set of injective functions from $I$ to $X$. Show that $A$ has full outer measure for the product measure on $X^I$.
254 Notes and comments While there are many reasons for studying infinite products of probability spaces, one stands pre-eminent, from the point of view of abstract measure theory: they provide constructions of essentially new kinds of measure space. I cannot describe the nature of this newness effectively without venturing into the territory of Volume 3. But the function spaces of Chapter 24 do give at least a form of words we can use: these are the first probability spaces $(X,\Sigma,\mu)$ we have seen for which $L^1(\mu)$ need not be separable for its norm topology (254Yc).
The formulae of 254A, like those of 251A, lead very naturally to measures; the point at which they become more than a curiosity is when we find that the product measure is a probability measure (254Fa), which must be regarded as the crucial argument of this section, just as 251E is the essential basis of §251. It is I think remarkable that it makes no difference to the result here whether $I$ is finite, countably infinite or uncountable. If you write out the proof for the case $I=\mathbb N$, it will seem natural to expand the sets $J_n$ until they are initial segments of $I$ itself, thereby avoiding altogether the auxiliary set $K$; but this is a misleading simplification, because it hides an essential feature of the argument, which is that any sequence in $\mathcal C$ involves only countably many coordinates, so that as long as we are dealing with only one such sequence the uncountability of the whole set $I$ is irrelevant. This general principle naturally permeates the whole of the section; in 254O I have tried to spell out the way in which many of the questions we are interested in can be expressed in terms of countable subproducts of the factor spaces $X_i$. See also the exercises 254Xi, 254Xm and 254Ya(iii).
There is a slightly paradoxical side to this principle: even the best-behaved subsets $E_i$ of $X_i$ may fail to have measurable products $\prod_{i\in I}E_i$ if $E_i\ne X_i$ for uncountably many $i$. For instance, ${]0,1[}^I$ is not a measurable subset of $[0,1]^I$ if $I$ is uncountable (254Xm). It has full outer measure and its own product measure is just the subspace measure (254L), but any measurable subset must have measure zero. The point is that the empty set is the only member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$, where $\Sigma_i$ is the algebra of Lebesgue measurable subsets of $[0,1]$ for each $i$, which is included in ${]0,1[}^I$ (see 254Xi).
As in §251, I use a construction which automatically produces a complete measure on the product space. I am sure that this is the best choice for the product measure. But there are occasions when its restriction to the $\sigma$-algebra generated by the measurable cylinders is worth looking at; see 254Xc.
Lemma 254G is a result of a type which will be commoner in Volume 3 than in the present volume. It describes the product measure in terms not of what it is but of what it does; specifically, in terms of a property of the associated family of inverse-measure-preserving functions. It is therefore a universal mapping theorem. (Compare 253F.) Because this description is sufficient to determine the product measure completely (254Yb), it is not surprising that I use it repeatedly.
The usual measure on $\{0,1\}^I$ (254J) is sometimes called coin-tossing measure because it can be used to model the concept of tossing a coin arbitrarily many times indexed by the set $I$, taking an $x\in\{0,1\}^I$ to represent the outcome in which the coin is heads for just those $i\in I$ for which $x(i)=1$. The sets, or events, in the class $\mathcal C$ are just those which can be specified by declaring the outcomes of finitely many tosses, and the probability of any particular sequence of $n$ results is $1/2^n$, regardless of which tosses we look at or in which order. In Chapter 27 I will return to the use of product measures to represent probabilities involving independent events.
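The counting behind this description can be checked directly when the index set is finite. The following Python sketch is an illustration only: the helper name `cylinder_measure` and the finite index set $\{0,\ldots,m-1\}$ are assumptions made for the example, not part of the text. It gives every point of $\{0,1\}^m$ weight $2^{-m}$ and measures the cylinder fixed by $n$ prescribed coordinates.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(m, constraints):
    """Measure of {x in {0,1}^m : x[i] == v for every (i, v) in constraints}.

    Every point of {0,1}^m has weight 2**-m (fair, independent tosses).
    `constraints` maps a coordinate index to the required outcome (0 or 1).
    """
    total = 0
    hits = 0
    for x in product((0, 1), repeat=m):
        total += 1
        if all(x[i] == v for i, v in constraints.items()):
            hits += 1
    return Fraction(hits, total)


# Three prescribed tosses out of six, listed in two different orders.
p1 = cylinder_measure(6, {0: 1, 3: 0, 5: 1})
p2 = cylinder_measure(6, {5: 1, 0: 1, 3: 0})
# No prescribed tosses: the whole space.
p_all = cylinder_measure(5, {})
```

Here `p1` and `p2` both come out as $1/2^3$, whichever three coordinates are chosen and in whatever order they are listed.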
In 254K I come to the first case in this treatise of a non-trivial isomorphism between two measure spaces. If you have been brought up on a conventional diet of modern abstract pure mathematics based on algebra and topology, you may already have been struck by the absence of emphasis on any concept of homomorphism or isomorphism. Here indeed I start to speak of isomorphisms between measure spaces without even troubling to define them; I hope it really is obvious that an isomorphism between measure spaces $(X,\Sigma,\mu)$ and $(Y,T,\nu)$ is a bijection $\phi:X\to Y$ such that $T=\{F : F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\}$ and $\nu F=\mu\phi^{-1}[F]$ for every $F\in T$, so that $\Sigma$ is necessarily $\{E : E\subseteq X,\ \phi[E]\in T\}$ and $\mu E=\nu\phi[E]$ for every $E\in\Sigma$. Put like this, you may, if you worked through the exercises of Volume 1, be reminded of some constructions of $\sigma$-algebras in 111Xc-111Xd and of the image measures in 234C-234D. The result in 254K (see also 134Yo) naturally leads to two distinct notions of homomorphism between two measure spaces $(X,\Sigma,\mu)$ and $(Y,T,\nu)$:
(i) a function $\phi:X\to Y$ such that $\phi^{-1}[F]\in\Sigma$ and $\mu\phi^{-1}[F]=\nu F$ for every $F\in T$,
(ii) a function $\phi:X\to Y$ such that $\phi[E]\in T$ and $\nu\phi[E]=\mu E$ for every $E\in\Sigma$.
On either definition, we find that a bijection $\phi:X\to Y$ is an isomorphism iff $\phi$ and $\phi^{-1}$ are both homomorphisms. (Also, of course, the composition of homomorphisms will be a homomorphism.) My own view is that (i) is the more important, and in this treatise I study such functions at length, calling them inverse-measure-preserving. But both have their uses. The function $\phi$ of 254K not only satisfies both definitions, but is also nearly an isomorphism in several different ways, of which possibly the most important is that there are conegligible sets $X'\subseteq\{0,1\}^{\mathbb N}$, $Y'\subseteq[0,1]$ such that $\phi\restriction X'$ is an isomorphism between $X'$ and $Y'$ when both are given their subspace measures.
Having once established the isomorphism between $[0,1]$ and $\{0,1\}^{\mathbb N}$, we are led immediately to many more; see 254Xj-254Xl. In fact Lebesgue measure on $[0,1]$ is isomorphic to a large proportion of the probability spaces arising in applications. In Volumes 3 and 4 I will discuss these isomorphisms at length.
The general notion of subproduct is associated with some of the deepest and most characteristic results in the theory of product measures. Because we are looking at products of arbitrary families of probability spaces, the definition must ignore any possible structure in the index set $I$ of 254A-254C. But many applications, naturally enough, deal with index sets with favoured subsets or partitions, and the first essential step is the associative law (254N; compare 251Xe-251Xf and 251Wh). This is, for instance, the tool by which we can apply Fubini's theorem within infinite products. The natural projection maps from $\prod_{i\in I}X_i$ to $\prod_{i\in J}X_i$, where $J\subseteq I$, are related in a way which has already been used as the basis of theorems in §235; the product measure on $\prod_{i\in J}X_i$ is precisely the image of the product measure on $\prod_{i\in I}X_i$ (254Oa). In 254O-254Q I explore the consequences of this fact and the fact already noted that all measurable sets in the product are essentially determined by coordinates in some countable set.
In 254R I go more deeply into this notion of a set $W\subseteq\prod_{i\in I}X_i$ determined by coordinates in a set $J\subseteq I$. In its primitive form this is a purely set-theoretic notion (254M, 254Ta). I think that even a three-element set $I$ can give us surprises; I invite you to try to visualize subsets of $[0,1]^3$ which are determined by pairs of coordinates. But the interactions of this with measure-theoretic ideas, and in particular with a willingness to add or discard negligible sets, lead to much more, and in particular to the unique minimal sets of coordinates associated with measurable sets and functions (254R). Of course these results can be elegantly and effectively described in terms of $L^1$ and $L^0$ spaces, in which negligible sets are swept out of sight as the spaces are constructed. The basis of all this is the fact that the conditional expectation operators associated with subproducts multiply together in the simplest possible way (254Ra); but some further idea is needed to show that if $\mathcal J$ is a non-empty family of subsets of $I$, then $L^0_{\bigcap\mathcal J} = \bigcap_{J\in\mathcal J}L^0_J$ (see part (b) of the proof of 254R, and 254Xp(iii)).
254Sa is a version of the zero-one law (272O below). 254Sb is a strong version of the principle that measurable sets in a product must be approximable by sets determined by a finite set of coordinates (254Fe, 254Qa, 254Xa). Evidently it is not a coincidence that the set $W$ of 254Tb is negligible. In §272 I will revisit many of the ideas of 254R-254S and 254Xp, in particular, in the more general context of independent $\sigma$-algebras.
Finally, 254U and 254Ye hardly belong to this section at all; they are unfinished business from §251. They are here because the construction of 254A-254C is the simplest way to produce an adequately complex probability space $(Y,T,\nu)$.
255 Convolutions of functions
I devote a section to a construction which is of great importance, which will in particular be very useful in Chapters 27 and 28, and which may also be regarded as a series of exercises on the work so far.
I find it difficult to know how much repetition to indulge in in this section, because the natural unified expression of the ideas is in the theory of topological groups, and I do not think we are yet ready for the general theory (I will come to it in Chapter 44 in Volume 4). The groups we need for this volume are
$\mathbb R$;
$\mathbb R^r$, for $r\ge 2$;
$S^1 = \{z : z\in\mathbb C,\ |z|=1\}$, the circle group;
$\mathbb Z$, the group of integers.
All the ideas already appear in the theory of convolutions on $\mathbb R$, and I will therefore present this material in relatively detailed form, before sketching the forms appropriate to the groups $\mathbb R^r$ and $S^1$ (or ${]-\pi,\pi]}$); $\mathbb Z$ can I think be safely left to the exercises.
255A This being a book on measure theory, it is perhaps appropriate for me to emphasize, as the basis of the
theory of convolutions, certain measure space isomorphisms.
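Theorem Let $\mu$ be Lebesgue measure on $\mathbb R$ and $\mu_2$ Lebesgue measure on $\mathbb R^2$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a\in\mathbb R$, the map $x\mapsto a+x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(b) The map $x\mapsto -x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(c) For any $a\in\mathbb R$, the map $x\mapsto a-x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(d) The map $(x,y)\mapsto(x+y,y):\mathbb R^2\to\mathbb R^2$ is a measure space automorphism of $(\mathbb R^2,\Sigma_2,\mu_2)$.
(e) The map $(x,y)\mapsto(x-y,y):\mathbb R^2\to\mathbb R^2$ is a measure space automorphism of $(\mathbb R^2,\Sigma_2,\mu_2)$.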
Remark I ought to remark that (b), (d) and (e) may be regarded as simple special cases of Theorem 263A in the
next chapter. I nevertheless feel that it is worth writing out separate proofs here, partly because the general case
of linear operators dealt with in 263A requires some extra machinery not needed here, but more because the result
here has nothing to do with the linear structure of $\mathbb R$ and $\mathbb R^2$; it is exclusively dependent on the group structure of $\mathbb R$,
together with the links between its topology and measure, and the arguments I give now are adaptable to the proper
generalizations to abelian topological groups.
proof (a) This is just the translation-invariance of Lebesgue measure, dealt with in §134. There I showed that if $E\in\Sigma$ then $E+a\in\Sigma$ and $\mu(E+a)=\mu E$ (134Ab); that is, writing $\phi(x)=x+a$, $\mu(\phi[E])$ exists and is equal to $\mu E$ for every $E\in\Sigma$. But of course we also have
$$\mu(\phi^{-1}[E]) = \mu(E+(-a)) = \mu E$$
for every $E\in\Sigma$, so $\phi$ is an automorphism.
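(b) The point is that $\mu^*(-A)=\mu^*(A)$ for every $A\subseteq\mathbb R$. PPP (I follow the definitions of Volume 1.) If $\epsilon>0$, there is a sequence $\langle I_n\rangle_{n\in\mathbb N}$ of half-open intervals covering $A$ with $\sum_{n=0}^\infty\mu I_n\le\mu^*A+\epsilon$. Now $-A\subseteq\bigcup_{n\in\mathbb N}(-I_n)$. But if $I_n=[a_n,b_n[$ then $-I_n={]-b_n,-a_n]}$, so
$$\mu^*(-A)\le\sum_{n=0}^\infty\mu(-I_n)=\sum_{n=0}^\infty\max(0,-a_n-(-b_n))=\sum_{n=0}^\infty\mu I_n\le\mu^*A+\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*(-A)\le\mu^*A$. Also of course $\mu^*(-(-A))=\mu^*A$, so (applying the same inequality to $-A$) $\mu^*(-A)=\mu^*A$. QQQ
This means that, setting $\phi(x)=-x$ this time, $\phi$ is an automorphism of the structure $(\mathbb R,\mu^*)$. But since $\mu$ is defined from $\mu^*$ by the abstract procedure of Carathéodory's method, $\phi$ must also be an automorphism of the structure $(\mathbb R,\Sigma,\mu)$.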
(c) Put (a) and (b) together; $x\mapsto a-x$ is the composition of the automorphisms $x\mapsto -x$ and $x\mapsto a+x$, and the composition of automorphisms is surely an automorphism.
(d)(i) Write $T$ for the set $\{E : E\in\Sigma_2,\ \phi[E]\in\Sigma_2\}$, where this time $\phi(x,y)=(x+y,y)$ for $x,y\in\mathbb R$, so that $\phi:\mathbb R^2\to\mathbb R^2$ is a bijection. Then $T$ is a $\sigma$-algebra, being the intersection of the $\sigma$-algebras $\Sigma_2$ and $\{E : \phi[E]\in\Sigma_2\}=\{\phi^{-1}[F] : F\in\Sigma_2\}$. Moreover, $\mu_2E=\mu_2(\phi[E])$ for every $E\in T$. PPP By 252D, we have
$$\mu_2E=\int\mu\{x : (x,y)\in E\}\,\mu(dy).$$
But applying the same result to $\phi[E]$ we have
$$\mu_2\phi[E]=\int\mu\{x : (x,y)\in\phi[E]\}\,\mu(dy)=\int\mu\{x : (x-y,y)\in E\}\,\mu(dy)$$
$$=\int\mu(E^{-1}[\{y\}]+y)\,\mu(dy)=\int\mu E^{-1}[\{y\}]\,\mu(dy)$$
(because Lebesgue measure is translation-invariant)
$$=\mu_2E.\ \text{QQQ}$$
(ii) Now $\phi$ and $\phi^{-1}$ are clearly continuous, so that $\phi[G]$ is open, and therefore measurable, for every open $G$; consequently all open sets must belong to $T$. Because $T$ is a $\sigma$-algebra, it contains all Borel sets. Now let $E$ be any measurable set. Then there are Borel sets $H_1$, $H_2$ such that $H_1\subseteq E\subseteq H_2$ and $\mu_2(H_2\setminus H_1)=0$ (134Fb). We have $\phi[H_1]\subseteq\phi[E]\subseteq\phi[H_2]$ and
$$\mu_2(\phi[H_2]\setminus\phi[H_1])=\mu_2\phi[H_2\setminus H_1]=\mu_2(H_2\setminus H_1)=0.$$
Thus $\phi[E]\setminus\phi[H_1]$ must be negligible, therefore measurable, and $\phi[E]=\phi[H_1]\cup(\phi[E]\setminus\phi[H_1])$ is measurable. This shows that $\phi[E]$ is measurable whenever $E$ is.
(iii) Repeating the same arguments with $-y$ in the place of $y$, we see that $\phi^{-1}[E]$ is measurable, and $\mu_2\phi^{-1}[E]=\mu_2E$, for every $E\in\Sigma_2$. So $\phi$ is an automorphism of the structure $(\mathbb R^2,\Sigma_2,\mu_2)$.
(e) Of course this is an immediate corollary either of the proof of (d) or of (d) itself as stated, since $(x,y)\mapsto(x-y,y)$ is just the inverse of $(x,y)\mapsto(x+y,y)$.
255B Corollary (a) If $a\in\mathbb R$, then for any complex-valued function $f$ defined on a subset of $\mathbb R$
$$\int f(x)\,dx=\int f(a+x)\,dx=\int f(-x)\,dx=\int f(a-x)\,dx$$
in the sense that if one of the integrals exists so do the others, and they are then all equal.
(b) If $f$ is a complex-valued function defined on a subset of $\mathbb R^2$, then
$$\int f(x+y,y)\,d(x,y)=\int f(x-y,y)\,d(x,y)=\int f(x,y)\,d(x,y)$$
in the sense that if one of the integrals exists and is finite so do the others, and they are then all equal.
255C Remarks (a) I am not sure whether it ought to be obvious that if $(X,\Sigma,\mu)$, $(Y,T,\nu)$ are measure spaces and $\phi:X\to Y$ is an isomorphism, then for any function $f$ defined on a subset of $Y$
$$\int f(\phi(x))\,\mu(dx)=\int f(y)\,\nu(dy)$$
in the sense that if one is defined so is the other, and they are then equal. If it is obvious then the obviousness must be contingent on the nature of the definition of integration: integrability with respect to the measure $\mu$ is something which depends on the structure $(X,\Sigma,\mu)$ and on no other properties of $X$. If it is not obvious then it is an easy deduction from Theorem 235A above, applied in turn to $\phi$ and $\phi^{-1}$ and to the real and imaginary parts of $f$. In any case the isomorphisms of 255A are just those needed to prove 255B.
(b) Note that in 255Bb I write $\int f(x,y)\,d(x,y)$ to emphasize that I am considering the integral of $f$ with respect to two-dimensional Lebesgue measure. The fact that
$$\int\Big(\int f(x,y)\,dx\Big)dy=\int\Big(\int f(x+y,y)\,dx\Big)dy=\int\Big(\int f(x-y,y)\,dx\Big)dy$$
is actually easier, being an immediate consequence of the equality $\int f(a+x)\,dx=\int f(x)\,dx$. But applications of this result often depend essentially on the fact that the functions $(x,y)\mapsto f(x+y,y)$, $(x,y)\mapsto f(x-y,y)$ are measurable as functions of two variables.
(c) I have moved directly to complex-valued functions because these are necessary for the applications in Chapter 28. If however they give you any discomfort, either technically or aesthetically, all the measure-theoretic ideas of this section are already to be found in the real case, and you may wish at first to read it as if only real numbers were involved.
255D A further corollary of 255A will be useful.
Corollary Let $f$ be a complex-valued function defined on a subset of $\mathbb R$.
(a) If $f$ is measurable, then the functions $(x,y)\mapsto f(x+y)$, $(x,y)\mapsto f(x-y)$ are measurable.
(b) If $f$ is defined almost everywhere in $\mathbb R$, then the functions $(x,y)\mapsto f(x+y)$, $(x,y)\mapsto f(x-y)$ are defined almost everywhere in $\mathbb R^2$.
proof Writing $g_1(x,y)=f(x+y)$, $g_2(x,y)=f(x-y)$ whenever these are defined, we have
$$g_1(x,y)=(f\otimes\chi\mathbb R)(\phi(x,y)),\quad g_2(x,y)=(f\otimes\chi\mathbb R)(\phi^{-1}(x,y)),$$
writing $\phi(x,y)=(x+y,y)$ as in 255A(d-e), and $(f\otimes\chi\mathbb R)(x,y)=f(x)$, following the notation of 253B. By 253C, $f\otimes\chi\mathbb R$ is measurable if $f$ is, and defined almost everywhere if $f$ is. Because $\phi$ is a measure space automorphism, $(f\otimes\chi\mathbb R)\phi=g_1$ and $(f\otimes\chi\mathbb R)\phi^{-1}=g_2$ are measurable, or defined almost everywhere, if $f$ is.
255E The basic formula Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $\mathbb R$. Write $f*g$ for the function defined by the formula
$$(f*g)(x)=\int f(x-y)g(y)\,dy$$
whenever the integral exists (with respect to Lebesgue measure, naturally) as a complex number. Then $f*g$ is the convolution of the functions $f$ and $g$.
Observe that $\operatorname{dom}(|f|*|g|)=\operatorname{dom}(f*g)$, and that $|f*g|\le|f|*|g|$ everywhere on their common domain, for all $f$ and $g$.
Remark Note that I am here prepared to contemplate the convolution of $f$ and $g$ for arbitrary members of $\mathcal L^0_{\mathbb C}$, the space of almost-everywhere-defined measurable complex-valued functions, even though the domain of $f*g$ may be empty.
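For readers who like to see numbers, the defining formula can be approximated by a Riemann sum. The Python sketch below is an illustration under stated assumptions, not part of the theory: the helper name `convolve_at`, the truncation of the integral to $[-10,10]$, and the midpoint rule are all choices made for the example, and they are only reasonable for integrands that are negligible outside that window.

```python
import math


def convolve_at(f, g, x, lo=-10.0, hi=10.0, n=20000):
    """Midpoint Riemann-sum approximation to (f*g)(x) = int f(x-y) g(y) dy.

    Assumption: y -> f(x-y) g(y) is negligible outside [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += f(x - y) * g(y)
    return h * total


def box(t):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def gauss(t):
    return math.exp(-t * t)


# (box*box)(1/2) is the length of [0, 1/2]; the exact integral is 1/2.
tri = convolve_at(box, box, 0.5)
# (box*gauss)(1/2) is the integral of exp(-y^2) over [-1/2, 1/2],
# whose exact value is sqrt(pi) * erf(1/2).
fg = convolve_at(box, gauss, 0.5)
# Swapping the factors should give the same number.
gf = convolve_at(gauss, box, 0.5)
```

The agreement of `fg` and `gf` is a numerical shadow of the commutativity $f*g=g*f$.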
255F Elementary properties (a) Because integration is linear, we surely have
$$((f_1+f_2)*g)(x)=(f_1*g)(x)+(f_2*g)(x),$$
$$(f*(g_1+g_2))(x)=(f*g_1)(x)+(f*g_2)(x),$$
$$(cf*g)(x)=(f*cg)(x)=c(f*g)(x)$$
whenever the right-hand sides of the formulae are defined.
(b) If $f$ and $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$, then $f*g=g*f$, in the strict sense that they have the same domain and the same value at each point of that common domain.
PPP Take $x\in\mathbb R$ and apply 255Ba to see that
$$(f*g)(x)=\int f(x-y)g(y)\,dy=\int f(x-(x-y))g(x-y)\,dy=\int f(y)g(x-y)\,dy=(g*f)(x)$$
if either is defined. QQQ
(c) If $f_1$, $f_2$, $g_1$, $g_2$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$, $f_1=_{\text{a.e.}}f_2$ and $g_1=_{\text{a.e.}}g_2$, then $f_1*g_1=f_2*g_2$. PPP For every $x\in\mathbb R$ we shall have $f_1(x-y)=f_2(x-y)$ for almost every $y\in\mathbb R$, by 255Ac. Consequently $f_1(x-y)g_1(y)=f_2(x-y)g_2(y)$ for almost every $y$, and $(f_1*g_1)(x)=(f_2*g_2)(x)$ in the sense that if one of these is defined so is the other, and they are then equal. QQQ
It follows that if $u,v\in L^0_{\mathbb C}$, then we have a function $\theta(u,v)$ which is equal to $f*g$ whenever $f,g\in\mathcal L^0_{\mathbb C}$ are such that $f^\bullet=u$ and $g^\bullet=v$. Observe that $\theta(u,v)=\theta(v,u)$, and that $\theta(u_1+u_2,v)$ extends $\theta(u_1,v)+\theta(u_2,v)$, $\theta(cu,v)$ extends $c\theta(u,v)$ for all $u,u_1,u_2,v\in L^0_{\mathbb C}$ and $c\in\mathbb C$.
255G I have grouped 255Fa-255Fc together because they depend only on ideas up to and including 255Ac and 255Ba. Using the second halves of 255A and 255B we get much deeper. I begin with what seems to be the fundamental result.
Theorem Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $\mathbb R$.
(a) Suppose that $\int h(x+y)f(x)g(y)\,d(x,y)$ exists in $\mathbb C$. Then
$$\int h(x)(f*g)(x)\,dx=\int h(x+y)f(x)g(y)\,d(x,y)=\iint h(x+y)f(x)g(y)\,dx\,dy=\iint h(x+y)f(x)g(y)\,dy\,dx$$
provided that in the expression $h(x)(f*g)(x)$ we interpret the product as $0$ if $h(x)=0$ and $(f*g)(x)$ is undefined.
(b) If, on a similar interpretation of $|h(x)|(|f|*|g|)(x)$, the integral $\int|h(x)|(|f|*|g|)(x)\,dx$ is finite, then $\int h(x+y)f(x)g(y)\,d(x,y)$ exists in $\mathbb C$.
proof Consider the functions
$$k_1(x,y)=h(x)f(x-y)g(y),\quad k_2(x,y)=h(x+y)f(x)g(y)$$
wherever these are defined. 255D tells us that $k_1$ and $k_2$ are measurable and defined almost everywhere. Now setting $\phi(x,y)=(x+y,y)$, we have $k_2=k_1\phi$, so that
$$\int k_1(x,y)\,d(x,y)=\int k_2(x,y)\,d(x,y)$$
if either exists, by 255Bb.
If $\int h(x+y)f(x)g(y)\,d(x,y)=\int k_2$ exists, then by Fubini's theorem we have
$$\int k_2=\int k_1(x,y)\,d(x,y)=\int\Big(\int h(x)f(x-y)g(y)\,dy\Big)dx$$
so $\int h(x)f(x-y)g(y)\,dy$ exists almost everywhere, that is, $(f*g)(x)$ exists for almost every $x$ such that $h(x)\ne 0$; on the interpretation I am using here, $h(x)(f*g)(x)$ exists almost everywhere, and
$$\int h(x)(f*g)(x)\,dx=\int\Big(\int h(x)f(x-y)g(y)\,dy\Big)dx=\int k_1=\int k_2=\int h(x+y)f(x)g(y)\,d(x,y)$$
$$=\iint h(x+y)f(x)g(y)\,dx\,dy=\iint h(x+y)f(x)g(y)\,dy\,dx$$
by Fubini's theorem again.
If (on the same interpretation) $|h|\times(|f|*|g|)$ is integrable,
$$|k_1(x,y)|=|h(x)||f(x-y)||g(y)|$$
is measurable, and
$$\iint|h(x)||f(x-y)||g(y)|\,dy\,dx=\int|h(x)|(|f|*|g|)(x)\,dx$$
is finite, so by Tonelli's theorem (252G, 252H) $k_1$ and $k_2$ are integrable.
255H Certain standard results are now easy.
Corollary If $f$, $g$ are complex-valued functions which are integrable over $\mathbb R$, then $f*g$ is integrable, with
$$\int f*g=\int f\int g,\quad\int|f*g|\le\int|f|\int|g|.$$
proof In 255G, set $h(x)=1$ for every $x\in\mathbb R$; then
$$\int h(x+y)f(x)g(y)\,d(x,y)=\int f(x)g(y)\,d(x,y)=\int f\int g$$
by 253D, so
$$\int f*g=\int h(x)(f*g)(x)\,dx=\int h(x+y)f(x)g(y)\,d(x,y)=\int f\int g,$$
as claimed. Now
$$\int|f*g|\le\int|f|*|g|=\int|f|\int|g|.$$
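The two formulae of 255H have exact finite analogues, which can be checked on sampled data. In the Python sketch below (an illustration only; the names `discrete_convolution` and `integral`, the step `h` and the sample functions are assumptions chosen for the example), integrals are replaced by Riemann sums with a common step, and the identity $\int f*g=\int f\int g$ becomes Fubini's theorem for a finite double sum.

```python
import math


def discrete_convolution(fs, gs, h):
    """Discrete analogue of f*g for samples fs[i] = f(i*h), gs[j] = g(j*h).

    out[k] is the Riemann sum h * sum_j f((k-j)*h) * g(j*h),
    an approximation to (f*g)(k*h).
    """
    out = [0.0] * (len(fs) + len(gs) - 1)
    for i, a in enumerate(fs):
        for j, b in enumerate(gs):
            out[i + j] += a * b * h
    return out


def integral(samples, h):
    """Riemann sum standing in for the integral of a sampled function."""
    return h * sum(samples)


h = 0.01
# A sign-changing function on [0, 3) and a positive one on [0, 2).
fs = [math.sin(7 * i * h) * math.exp(-i * h) for i in range(300)]
gs = [1.0 / (1.0 + (j * h) ** 2) for j in range(200)]
conv = discrete_convolution(fs, gs, h)

lhs = integral(conv, h)                       # stands for int f*g
rhs = integral(fs, h) * integral(gs, h)       # stands for int f . int g
abs_lhs = integral([abs(v) for v in conv], h)
abs_rhs = integral([abs(v) for v in fs], h) * integral([abs(v) for v in gs], h)
```

Up to rounding, `lhs` equals `rhs`, while `abs_lhs` is strictly smaller than `abs_rhs` here because `fs` changes sign.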
255I Corollary For any measurable complex-valued functions $f$, $g$ defined almost everywhere in $\mathbb R$, $f*g$ is measurable and has measurable domain.
proof Set $f_n(x)=f(x)$ if $x\in\operatorname{dom}f$, $|x|\le n$ and $|f(x)|\le n$, and $0$ elsewhere in $\mathbb R$; define $g_n$ similarly from $g$. Then $f_n$ and $g_n$ are integrable, $|f_n|\le|f|$ and $|g_n|\le|g|$ almost everywhere, $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$ and $g=_{\text{a.e.}}\lim_{n\to\infty}g_n$. Consequently, by Lebesgue's Dominated Convergence Theorem,
$$(f*g)(x)=\int f(x-y)g(y)\,dy=\int\lim_{n\to\infty}f_n(x-y)g_n(y)\,dy$$
$$=\lim_{n\to\infty}\int f_n(x-y)g_n(y)\,dy=\lim_{n\to\infty}(f_n*g_n)(x)$$
for every $x\in\operatorname{dom}f*g$. But $f_n*g_n$ is integrable, therefore measurable, for every $n$, so that $f*g$ must be measurable.
As for the domain of $f*g$,
$$x\in\operatorname{dom}(f*g)\iff\int f(x-y)g(y)\,dy\text{ is defined in }\mathbb C$$
$$\iff\int|f(x-y)||g(y)|\,dy\text{ is defined in }\mathbb R$$
$$\iff\int|f_n(x-y)||g_n(y)|\,dy\text{ is defined in }\mathbb R\text{ for every }n\text{ and }\sup_{n\in\mathbb N}\int|f_n(x-y)||g_n(y)|\,dy<\infty.$$
Because every $|f_n|*|g_n|$ is integrable, therefore measurable and with measurable domain,
$$\operatorname{dom}(f*g)=\{x : x\in\bigcap_{n\in\mathbb N}\operatorname{dom}(|f_n|*|g_n|),\ \sup_{n\in\mathbb N}(|f_n|*|g_n|)(x)<\infty\}$$
is measurable.
255J Theorem Let $f$, $g$ and $h$ be complex-valued measurable functions, defined almost everywhere in $\mathbb R$, such that $f*g$ and $g*h$ are defined a.e. Suppose that $x\in\mathbb R$ is such that one of $(|f|*(|g|*|h|))(x)$, $((|f|*|g|)*|h|)(x)$ is defined in $\mathbb R$. Then $f*(g*h)$ and $(f*g)*h$ are defined and equal at $x$.
proof Set $k(y)=f(x-y)$ when this is defined, so that $k$ is measurable and defined almost everywhere (255D).
(a) If $(|f|*(|g|*|h|))(x)$ is defined, this is $\int|k(-y)|(|g|*|h|)(y)\,dy$, so by 255G we have
$$\int k(-y)(g*h)(y)\,dy=\int k(-y-z)g(y)h(z)\,d(y,z),$$
that is,
$$(f*(g*h))(x)=\int f(x-y)(g*h)(y)\,dy=\int k(-y)(g*h)(y)\,dy$$
$$=\int k(-y-z)g(y)h(z)\,d(y,z)=\iint k(-y-z)g(y)h(z)\,dy\,dz$$
$$=\iint f(x-y-z)g(y)h(z)\,dy\,dz=\int(f*g)(x-z)h(z)\,dz$$
$$=((f*g)*h)(x).$$
(b) If $((|f|*|g|)*|h|)(x)$ is defined, this is
$$\int(|f|*|g|)(x-z)|h(z)|\,dz=\iint|f(x-z-y)||g(y)||h(z)|\,dy\,dz=\iint|k(-y-z)||g(y)||h(z)|\,dy\,dz.$$
By 255D again, $(y,z)\mapsto k(-y-z)$ is measurable, so we can apply Tonelli's theorem to see that $\int k(-y-z)g(y)h(z)\,d(y,z)$ is defined, and is equal to $\int k(-y)(g*h)(y)\,dy=(f*(g*h))(x)$ by 255Ga. On the other side, by the last two lines of the proof of (a), $\int k(-y-z)g(y)h(z)\,d(y,z)$ is also equal to $((f*g)*h)(x)$.
255K I do not think we shall need an exhaustive discussion of the question of just when $(f*g)(x)$ is defined; this seems to be complicated. However there is a fundamental case in which we can be sure that $(f*g)(x)$ is defined everywhere.
Proposition Suppose that $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb R$, and that $f\in\mathcal L^p_{\mathbb C}$, $g\in\mathcal L^q_{\mathbb C}$ where $p,q\in[1,\infty]$ and $\frac1p+\frac1q=1$ (writing $\frac1\infty=0$ as usual). Then $f*g$ is defined everywhere in $\mathbb R$, is uniformly continuous, and
$$\sup_{x\in\mathbb R}|(f*g)(x)|\le\|f\|_p\|g\|_q\text{ if }1<p<\infty,\ 1<q<\infty,$$
$$\le\|f\|_1\operatorname{ess\,sup}|g|\text{ if }p=1,\ q=\infty,$$
$$\le\operatorname{ess\,sup}|f|\cdot\|g\|_1\text{ if }p=\infty,\ q=1.$$
proof (a) (For an introduction to $\mathcal L^p$ spaces, see §244.) For any $x\in\mathbb R$, the function $f_x$, defined by setting $f_x(y)=f(x-y)$ whenever $x-y\in\operatorname{dom}f$, must also belong to $\mathcal L^p$, because $f_x=f\phi$ for an automorphism $\phi$ of the measure space. Consequently $(f*g)(x)=\int f_x\times g$ is defined, and of modulus at most $\|f\|_p\|g\|_q$ or $\|f\|_1\operatorname{ess\,sup}|g|$ or $\operatorname{ess\,sup}|f|\cdot\|g\|_1$, by 244Eb/244Pb and 243Fa/243K.
(b) To see that $f*g$ is uniformly continuous, argue as follows. Suppose first that $p<\infty$. Let $\epsilon>0$. Let $\eta>0$ be such that $(2+2^{1/p})\|g\|_q\eta\le\epsilon$. Then there is a bounded continuous function $h:\mathbb R\to\mathbb C$ such that $\{x : h(x)\ne 0\}$ is bounded and $\|f-h\|_p\le\eta$ (244Hb/244Pb); let $M\ge 1$ be such that $h(x)=0$ whenever $|x|\ge M-1$. Next, $h$ is uniformly continuous, so there is a $\delta\in{]0,1]}$ such that $|h(x)-h(x')|\le M^{-1/p}\eta$ whenever $|x-x'|\le\delta$.
Suppose that $|x-x'|\le\delta$. Defining $h_x(y)=h(x-y)$, as before, we have
$$\int|h_x-h_{x'}|^p=\int|h(x-y)-h(x'-y)|^p\,dy=\int|h(t)-h(x'-x+t)|^p\,dt$$
(substituting $t=x-y$)
$$=\int_{-M}^{M}|h(t)-h(x'-x+t)|^p\,dt$$
(because $h(t)=h(x'-x+t)=0$ if $|t|\ge M$)
$$\le 2M(M^{-1/p}\eta)^p$$
(because $|h(t)-h(x'-x+t)|\le M^{-1/p}\eta$ for every $t$)
$$=2\eta^p.$$
So $\|h_x-h_{x'}\|_p\le 2^{1/p}\eta$. On the other hand,
$$\int|h_x-f_x|^p=\int|h(x-y)-f(x-y)|^p\,dy=\int|h(y)-f(y)|^p\,dy,$$
so $\|h_x-f_x\|_p=\|h-f\|_p\le\eta$, and similarly $\|h_{x'}-f_{x'}\|_p\le\eta$. So
$$\|f_x-f_{x'}\|_p\le\|f_x-h_x\|_p+\|h_x-h_{x'}\|_p+\|h_{x'}-f_{x'}\|_p\le(2+2^{1/p})\eta.$$
This means that
$$|(f*g)(x)-(f*g)(x')|=\Big|\int f_x\times g-\int f_{x'}\times g\Big|=\Big|\int(f_x-f_{x'})\times g\Big|$$
$$\le\|f_x-f_{x'}\|_p\|g\|_q\le(2+2^{1/p})\|g\|_q\eta\le\epsilon.$$
As $\epsilon$ is arbitrary, $f*g$ is uniformly continuous.
The argument here supposes that $p$ is finite. But if $p=\infty$ then $q=1$ is finite, so we can apply the method with $g$ in place of $f$ to show that $g*f$ is uniformly continuous, and $f*g=g*f$ by 255Fb.
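The bound of 255K is Hölder's inequality applied to $f_x$ and $g$, and it survives discretization exactly, because $h=h^{1/p}h^{1/q}$ when $\frac1p+\frac1q=1$. The Python sketch below illustrates this; the helper names `lp_norm` and `sup_of_convolution`, the exponents $p=3$, $q=\frac32$ and the sampled functions are all assumptions made for the example.

```python
import math


def lp_norm(samples, h, p):
    """Riemann-sum analogue of the L^p norm for samples taken with step h."""
    return (h * sum(abs(v) ** p for v in samples)) ** (1.0 / p)


def sup_of_convolution(fs, gs, h):
    """Largest modulus of the discrete convolution h * sum_j fs[k-j] * gs[j]."""
    out = [0.0] * (len(fs) + len(gs) - 1)
    for i, a in enumerate(fs):
        for j, b in enumerate(gs):
            out[i + j] += a * b * h
    return max(abs(v) for v in out)


h = 0.02
p = 3.0
q = 1.5  # conjugate exponent: 1/p + 1/q = 1
fs = [math.cos(5 * i * h) / (1.0 + i * h) for i in range(250)]
gs = [math.exp(-abs(j * h - 1.0)) for j in range(150)]

sup_val = sup_of_convolution(fs, gs, h)
bound = lp_norm(fs, h, p) * lp_norm(gs, h, q)
```

In this discrete setting `sup_val` cannot exceed `bound`, whatever samples are used.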
255L The r-dimensional case I have written 255A-255K out as theorems about Lebesgue measure on $\mathbb R$. However they all apply equally well to Lebesgue measure on $\mathbb R^r$ for any $r\ge 1$, and the modifications required are so small that I think I need do no more than ask you to read through the arguments again, turning every $\mathbb R$ into an $\mathbb R^r$, and every $\mathbb R^2$ into an $(\mathbb R^r)^2$. In 255A and elsewhere, the measure $\mu_2$ should be read either as Lebesgue measure on $\mathbb R^{2r}$ or as the product measure on $(\mathbb R^r)^2$; by 251N the two may be identified. There is a trivial modification required in part (b) of the proof; if $I_n=[a_n,b_n[$ then
$$\mu I_n=\mu(-I_n)=\prod_{i=1}^r\max(0,\beta_{ni}-\alpha_{ni}),$$
writing $a_n=(\alpha_{n1},\ldots,\alpha_{nr})$ and $b_n=(\beta_{n1},\ldots,\beta_{nr})$. In the proof of 255I, the functions $f_n$ should be defined by saying that $f_n(x)=f(x)$ if $|f(x)|\le n$ and $\|x\|\le n$, $0$ otherwise.
In quoting these results, therefore, I shall be uninhibited in referring to the paragraphs 255A-255K as if they were actually written out for general $r\ge 1$.
255M The case of ${]-\pi,\pi]}$ The same ideas also apply to the circle group $S^1$ and to the interval ${]-\pi,\pi]}$, but here perhaps rather more explanation is in order.
(a) The first thing to establish is the appropriate group operation. If we think of $S^1$ as the set $\{z : z\in\mathbb C,\ |z|=1\}$, then the group operation is complex multiplication, and in the formulae above $x+y$ must be rendered as $xy$, while $x-y$ must be rendered as $xy^{-1}$. On the interval ${]-\pi,\pi]}$, the group operation is $+_{2\pi}$, where for $x,y\in{]-\pi,\pi]}$ I write $x+_{2\pi}y$ for whichever of $x+y$, $x+y+2\pi$, $x+y-2\pi$ belongs to ${]-\pi,\pi]}$. To see that this is indeed a group operation, one method is to note that it corresponds to multiplication on $S^1$ if we use the canonical bijection $x\mapsto e^{ix}:{]-\pi,\pi]}\to S^1$; another, to note that it corresponds to the operation on the quotient group $\mathbb R/2\pi\mathbb Z$. Thus in this interpretation of the ideas of 255A-255K, we shall wish to replace $x+y$ by $x+_{2\pi}y$, $-x$ by $-_{2\pi}x$, and $x-y$ by $x-_{2\pi}y$, where
$$-_{2\pi}x=-x\text{ if }x\in{]-\pi,\pi[},\quad -_{2\pi}\pi=\pi,$$
and $x-_{2\pi}y$ is whichever of $x-y$, $x-y+2\pi$, $x-y-2\pi$ belongs to ${]-\pi,\pi]}$.
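The three operations just described are easy to write down concretely. The following Python sketch is an illustration only: the function names `add_2pi`, `neg_2pi` and `sub_2pi` are invented for the example, and floating-point arithmetic stands in for real numbers. It implements them literally as stated and compares $+_{2\pi}$ with multiplication on $S^1$ under $x\mapsto e^{ix}$.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def in_interval(t):
    """True if t lies in the half-open interval ]-pi, pi]."""
    return -math.pi < t <= math.pi


def add_2pi(x, y):
    """x +_{2pi} y: whichever of x+y, x+y+2pi, x+y-2pi lies in ]-pi, pi]."""
    s = x + y
    for candidate in (s, s + TWO_PI, s - TWO_PI):
        if in_interval(candidate):
            return candidate
    raise ValueError("arguments must lie in ]-pi, pi]")


def neg_2pi(x):
    """-_{2pi} x: equal to -x on ]-pi, pi[, and to pi at x = pi."""
    return math.pi if x == math.pi else -x


def sub_2pi(x, y):
    """x -_{2pi} y: whichever of x-y, x-y+2pi, x-y-2pi lies in ]-pi, pi]."""
    d = x - y
    for candidate in (d, d + TWO_PI, d - TWO_PI):
        if in_interval(candidate):
            return candidate
    raise ValueError("arguments must lie in ]-pi, pi]")


samples = [-3.0, -1.2, 0.0, 0.7, 2.5, math.pi]

# Largest discrepancy between e^{i(x +_{2pi} y)} and e^{ix} e^{iy}.
max_err = max(
    abs(cmath.exp(1j * add_2pi(x, y)) - cmath.exp(1j * x) * cmath.exp(1j * y))
    for x in samples
    for y in samples
)
```

Any discrepancy recorded in `max_err` is rounding error only, which is the numerical form of the statement that $x\mapsto e^{ix}$ carries $+_{2\pi}$ to complex multiplication.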
(b) As for the measure, the measure to use on ${]-\pi,\pi]}$ is just Lebesgue measure. Note that because ${]-\pi,\pi]}$ is Lebesgue measurable, there will be no confusion concerning the meaning of measurable subset, as the relatively measurable subsets of ${]-\pi,\pi]}$ are actually measured by Lebesgue measure on $\mathbb R$. Also we can identify the product measure on ${]-\pi,\pi]}\times{]-\pi,\pi]}$ with the subspace measure induced by Lebesgue measure on $\mathbb R^2$ (251R).
On $S^1$, we need the corresponding measure induced by the canonical bijection between $S^1$ and ${]-\pi,\pi]}$, which indeed is often called Lebesgue measure on $S^1$. (We shall see in 265E that it is also equal to Hausdorff one-dimensional measure on $S^1$.) We are very close to the level at which it would become reasonable to move to $S^1$ and this measure (or its normalized version, in which it is reduced by a factor of $2\pi$, so as to make $S^1$ a probability space). However, the elementary theory of Fourier series, which will be the principal application of this work in the present volume, is generally done on intervals in $\mathbb R$, so that formulae based on ${]-\pi,\pi]}$ are closer to the standard expressions. Henceforth, therefore, I will express the work in terms of ${]-\pi,\pi]}$.
(c) The result corresponding to 255A now takes a slightly different form, so I spell it out.
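255N Theorem Let $\mu$ be Lebesgue measure on ${]-\pi,\pi]}$ and $\mu_2$ Lebesgue measure on ${]-\pi,\pi]}\times{]-\pi,\pi]}$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a\in{]-\pi,\pi]}$, the map $x\mapsto a+_{2\pi}x:{]-\pi,\pi]}\to{]-\pi,\pi]}$ is a measure space automorphism of $({]-\pi,\pi]},\Sigma,\mu)$.
(b) The map $x\mapsto -_{2\pi}x:{]-\pi,\pi]}\to{]-\pi,\pi]}$ is a measure space automorphism of $({]-\pi,\pi]},\Sigma,\mu)$.
(c) For any $a\in{]-\pi,\pi]}$, the map $x\mapsto a-_{2\pi}x:{]-\pi,\pi]}\to{]-\pi,\pi]}$ is a measure space automorphism of $({]-\pi,\pi]},\Sigma,\mu)$.
(d) The map $(x,y)\mapsto(x+_{2\pi}y,y):{]-\pi,\pi]}^2\to{]-\pi,\pi]}^2$ is a measure space automorphism of $({]-\pi,\pi]}^2,\Sigma_2,\mu_2)$.
(e) The map $(x,y)\mapsto(x-_{2\pi}y,y):{]-\pi,\pi]}^2\to{]-\pi,\pi]}^2$ is a measure space automorphism of $({]-\pi,\pi]}^2,\Sigma_2,\mu_2)$.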
proof (a) Set $\phi(x)=a+_{2\pi}x$. Then for any $E\subseteq{]-\pi,\pi]}$,
$$\phi[E]=((E+a)\cap{]-\pi,\pi]})\cup(((E+a)\cap{]\pi,3\pi]})-2\pi)\cup(((E+a)\cap{]-3\pi,-\pi]})+2\pi),$$
and these three sets are disjoint, so that, if $E\in\Sigma$,
$$\mu\phi[E]=\mu((E+a)\cap{]-\pi,\pi]})+\mu(((E+a)\cap{]\pi,3\pi]})-2\pi)+\mu(((E+a)\cap{]-3\pi,-\pi]})+2\pi)$$
$$=\mu_L((E+a)\cap{]-\pi,\pi]})+\mu_L(((E+a)\cap{]\pi,3\pi]})-2\pi)+\mu_L(((E+a)\cap{]-3\pi,-\pi]})+2\pi)$$
(writing $\mu_L$ for Lebesgue measure on $\mathbb R$)
$$=\mu_L((E+a)\cap{]-\pi,\pi]})+\mu_L((E+a)\cap{]\pi,3\pi]})+\mu_L((E+a)\cap{]-3\pi,-\pi]})$$
$$=\mu_L(E+a)=\mu_LE=\mu E.$$
Similarly, $\mu\phi^{-1}[E]$ is defined and equal to $\mu E$ for every $E\in\Sigma$, so that $\phi$ is an automorphism of $({]-\pi,\pi]},\Sigma,\mu)$.
(b) Of course this is quicker. Setting $\phi(x)=-_{2\pi}x$ for $x\in{]-\pi,\pi]}$, we have
$$\mu(\phi[E])=\mu(\phi[E]\cap{]-\pi,\pi[})=\mu(-(E\cap{]-\pi,\pi[}))$$
$$=\mu_L(-(E\cap{]-\pi,\pi[}))=\mu_L(E\cap{]-\pi,\pi[})$$
$$=\mu(E\cap{]-\pi,\pi[})=\mu E$$
for every $E\in\Sigma$.
(c) This is just a matter of putting (a) and (b) together, as in 255A.
(d) We can argue as in (a), but with a little more elaboration. If $\phi(x,y)=(x+_{2\pi}y,y)$ for $x,y\in{]-\pi,\pi]}$, set $\psi(x,y)=(x+y,y)$ for $x,y\in\mathbb R$, and write $c=(2\pi,0)\in\mathbb R^2$, $H={]-\pi,\pi]}^2$, $H'=H+c$, $H''=H-c$. Then for any $E\in\Sigma_2$,
$$\phi[E]=(\psi[E]\cap H)\cup((\psi[E]\cap H')-c)\cup((\psi[E]\cap H'')+c),$$
so
$$\mu_2\phi[E]=\mu_2(\psi[E]\cap H)+\mu_2((\psi[E]\cap H')-c)+\mu_2((\psi[E]\cap H'')+c)$$
$$=\mu_L(\psi[E]\cap H)+\mu_L((\psi[E]\cap H')-c)+\mu_L((\psi[E]\cap H'')+c)$$
(this time writing $\mu_L$ for Lebesgue measure on $\mathbb R^2$)
$$=\mu_L(\psi[E]\cap H)+\mu_L(\psi[E]\cap H')+\mu_L(\psi[E]\cap H'')$$
$$=\mu_L\psi[E]=\mu_LE=\mu_2E.$$
In the same way, $\mu_2(\phi^{-1}[E])=\mu_2E$ for every $E\in\Sigma_2$, so $\phi$ is an automorphism of $({]-\pi,\pi]}^2,\Sigma_2,\mu_2)$, as required.
(e) Finally, (e) is just a restatement of (d), as before.
255O Convolutions on ${]-\pi,\pi]}$ With the fundamental result established, the same arguments as in 255B-255K now yield the following. Write $\mu$ for Lebesgue measure on ${]-\pi,\pi]}$.
(a) Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in ${]-\pi,\pi]}$. Write $f*g$ for the function defined by the formula
$$(f*g)(x)=\int_{-\pi}^{\pi}f(x-_{2\pi}y)g(y)\,dy$$
whenever $x\in{]-\pi,\pi]}$ and the integral exists as a complex number. Then $f*g$ is the convolution of the functions $f$ and $g$.
(b) If $f$ and $g$ are measurable complex-valued functions defined almost everywhere in ${]-\pi,\pi]}$, then $f*g=g*f$.
(c) Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in ${]-\pi,\pi]}$. Then
(i)
$$\int_{-\pi}^{\pi}h(x)(f*g)(x)\,dx=\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y)$$
whenever the right-hand side exists and is finite, provided that in the expression $h(x)(f*g)(x)$ we interpret the product as $0$ if $h(x)=0$ and $(f*g)(x)$ is undefined.
(ii) If, on the same interpretation of $|h(x)|(|f|*|g|)(x)$, the integral $\int_{-\pi}^{\pi}|h(x)|(|f|*|g|)(x)\,dx$ is finite, then $\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y)$ exists in $\mathbb C$, so again we shall have
$$\int_{-\pi}^{\pi}h(x)(f*g)(x)\,dx=\int_{]-\pi,\pi]^2}h(x+_{2\pi}y)f(x)g(y)\,d(x,y).$$
(d) If $f$, $g$ are complex-valued functions which are integrable over ${]-\pi,\pi]}$, then $f*g$ is integrable, with
$$\int_{-\pi}^{\pi}f*g=\int_{-\pi}^{\pi}f\int_{-\pi}^{\pi}g,\quad\int_{-\pi}^{\pi}|f*g|\le\int_{-\pi}^{\pi}|f|\int_{-\pi}^{\pi}|g|.$$
(e) Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in ${]-\pi,\pi]}$, such that $f*g$ and $g*h$ are also defined almost everywhere. Suppose that $x\in{]-\pi,\pi]}$ is such that one of $(|f|*(|g|*|h|))(x)$, $((|f|*|g|)*|h|)(x)$ is defined in $\mathbb R$. Then $f*(g*h)$ and $(f*g)*h$ are defined and equal at $x$.
(f) Suppose that $f\in\mathcal L^p_{\mathbb C}(\mu)$, $g\in\mathcal L^q_{\mathbb C}(\mu)$ where $p,q\in[1,\infty]$ and $\frac1p+\frac1q=1$. Then $f*g$ is defined everywhere in ${]-\pi,\pi]}$, and $\sup_{x\in]-\pi,\pi]}|(f*g)(x)|\le\|f\|_p\|g\|_q$, interpreting $\|\ \|_\infty$ as $\operatorname{ess\,sup}|\ |$, as in 255K.
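Convolution on ${]-\pi,\pi]}$ has a natural discrete counterpart: sample at $N$ equally spaced points (with $N$ even, so that the sample points form a subgroup under $+_{2\pi}$) and replace the integral by a cyclic sum. The Python sketch below is an illustration under those assumptions; the names `theta` and `cyclic_convolution` and the choice $N=256$ are made up for the example. It checks commutativity, the product formula of (d), and the classical value $(\cos*\cos)(x)=\pi\cos x$, which the cyclic sum reproduces up to rounding.

```python
import math

N = 256                    # even number of sample points (assumption)
H = 2.0 * math.pi / N      # grid step, standing in for dy


def theta(i):
    """The i-th sample point, as a representative in ]-pi, pi]."""
    t = H * i
    return t if t <= math.pi else t - 2.0 * math.pi


def cyclic_convolution(fs, gs):
    """out[k] = H * sum_j fs[(k - j) mod N] * gs[j].

    With fs[i] = f(theta(i)) this is a Riemann sum for
    int f(x -_{2pi} y) g(y) dy at x = theta(k).
    """
    return [
        H * sum(fs[(k - j) % N] * gs[j] for j in range(N))
        for k in range(N)
    ]


cos_s = [math.cos(theta(i)) for i in range(N)]
abs_s = [abs(theta(i)) for i in range(N)]

cc = cyclic_convolution(cos_s, cos_s)      # compare with pi * cos
ca = cyclic_convolution(cos_s, abs_s)
ac = cyclic_convolution(abs_s, cos_s)
aa = cyclic_convolution(abs_s, abs_s)

err_cos = max(abs(cc[k] - math.pi * cos_s[k]) for k in range(N))
err_comm = max(abs(ca[k] - ac[k]) for k in range(N))
int_aa = H * sum(aa)                       # stands for int f*g with f = g = |x|
int_a = H * sum(abs_s)                     # stands for int |x|
```

The integral of `cos` over the circle is zero, so the product formula is checked here on $f=g=|x|$ instead, where both sides are visibly non-zero.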


255X Basic exercises >>>(a) Let $f$, $g$ be complex-valued functions defined almost everywhere in $\mathbb R$. Show that for any $x\in\mathbb R$, $(f*g)(x)=\int f(x+y)g(-y)\,dy$ if either is defined.
>>>(b) Let $f$ and $g$ be complex-valued functions defined almost everywhere in $\mathbb R$. (i) Show that if $f$ and $g$ are even functions, so is $f*g$. (ii) Show that if $f$ is even and $g$ is odd then $f*g$ is odd. (iii) Show that if $f$ and $g$ are odd then $f*g$ is even.
(c) Suppose that f, g are real-valued measurable functions dened almost everywhere in R
r
and such that f > 0
a.e., g 0 a.e. and x : g(x) > 0 is not negligible. Show that f g > 0 everywhere in dom(f g).
>>>(d) Suppose that $f : \mathbb{R} \to \mathbb{C}$ is a bounded differentiable function and that $f'$ is bounded. Show that for any integrable complex-valued function $g$ on $\mathbb{R}$, $f*g$ is differentiable and $(f*g)' = f'*g$ everywhere. (Hint: 123D.)
(e) A complex-valued function $g$ defined almost everywhere in $\mathbb{R}$ is locally integrable if $\int_a^b g$ is defined in $\mathbb{C}$ whenever $a < b$ in $\mathbb{R}$. Suppose that $g$ is such a function and that $f : \mathbb{R} \to \mathbb{C}$ is a differentiable function, with continuous derivative, such that $\{x : f(x) \ne 0\}$ is bounded. Show that $(f*g)' = f'*g$ everywhere.
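>>>(f) Set $\phi_\delta(x) = \exp\bigl(-\frac{1}{\delta^2 - x^2}\bigr)$ if $|x| < \delta$, $0$ if $|x| \ge \delta$, as in 242Xi. Set $\alpha_\delta = \int \phi_\delta$, $\psi_\delta = \frac{1}{\alpha_\delta}\phi_\delta$. Let $f$ be a locally integrable complex-valued function on $\mathbb{R}$. (i) Show that $f*\psi_\delta$ is a smooth function defined everywhere on $\mathbb{R}$ for every $\delta > 0$. (ii) Show that $\lim_{\delta \downarrow 0} (f*\psi_\delta)(x) = f(x)$ for almost every $x \in \mathbb{R}$. (Hint: 223Yg.) (iii) Show that if $f$ is integrable then $\lim_{\delta \downarrow 0} \int |f - f*\psi_\delta| = 0$. (Hint: use (ii) and 245H(a-ii) or look first at the case $f = \chi[a,b]$ and use 242O, noting that $\int |f*\psi_\delta| \le \int |f|$.) (iv) Show that if $f$ is uniformly continuous and defined everywhere in $\mathbb{R}$ then $\lim_{\delta \downarrow 0} \sup_{x \in \mathbb{R}} |f(x) - (f*\psi_\delta)(x)| = 0$.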
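>>>(g) For $\alpha > 0$, set $g_\alpha(t) = \frac{1}{\Gamma(\alpha)} t^{\alpha - 1}$ for $t > 0$, $0$ for $t \le 0$. Show that $g_\alpha * g_\beta = g_{\alpha+\beta}$ for all $\alpha$, $\beta > 0$. (Hint: 252Yf.)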
>>>(h) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $u$, $v$, $w \in L^0_{\mathbb{C}} = L^0_{\mathbb{C}}(\mu)$, say that $u*v = w$ if $f*g$ is defined almost everywhere and $(f*g)^\bullet = w$ whenever $f$, $g \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, $f^\bullet = u$ and $g^\bullet = v$. (i) Show that $(u_1 + u_2)*v = u_1*v + u_2*v$ whenever $u_1$, $u_2$, $v \in L^0_{\mathbb{C}}$ and $u_1*v$ and $u_2*v$ are defined in this sense. (ii) Show that $u*v = v*u$ whenever $u$, $v \in L^0_{\mathbb{C}}$ and either $u*v$ or $v*u$ is defined. (iii) Show that if $u$, $v$, $w \in L^0_{\mathbb{C}}$, $u*v$ and $v*w$ are defined, and either $|u|*(|v|*|w|)$ or $(|u|*|v|)*|w|$ is defined, then $u*(v*w)$ and $(u*v)*w$ are defined and equal.
>>>(i) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that $u*v$, as defined in 255Xh, belongs to $L^1_{\mathbb{C}}(\mu)$ whenever $u$, $v \in L^1_{\mathbb{C}}(\mu)$. (ii) Show that $L^1_{\mathbb{C}}$ is a commutative Banach algebra under $*$ (definition: 2A4Jb).
(j)(i) Show that if $h$ is an integrable function on $\mathbb{R}^2$, then $(Th)(x) = \int h(x-y, y)\,dy$ exists for almost every $x \in \mathbb{R}$, and that $\int (Th)(x)\,dx = \int h(x,y)\,d(x,y)$. (ii) Write $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$, $\mu$ for Lebesgue measure on $\mathbb{R}$. Show that there is a linear operator $\tilde T : L^1(\mu_2) \to L^1(\mu)$ defined by setting $\tilde T(h^\bullet) = (Th)^\bullet$ for every integrable function $h$ on $\mathbb{R}^2$. (iii) Show that in the language of 253E and 255Xh, $\tilde T(u \otimes v) = u*v$ for all $u$, $v \in L^1(\mu)$.
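>>>(k) For $\mathbf{a}$, $\mathbf{b} \in \mathbb{C}^{\mathbb{Z}}$ set $(\mathbf{a}*\mathbf{b})(n) = \sum_{i \in \mathbb{Z}} \mathbf{a}(n-i)\mathbf{b}(i)$ whenever $\sum_{i \in \mathbb{Z}} |\mathbf{a}(n-i)\mathbf{b}(i)| < \infty$. Show that
(i) $\mathbf{a}*\mathbf{b} = \mathbf{b}*\mathbf{a}$;
(ii) $\sum_{i \in \mathbb{Z}} \mathbf{c}(i)(\mathbf{a}*\mathbf{b})(i) = \sum_{i,j \in \mathbb{Z}} \mathbf{c}(i+j)\mathbf{a}(i)\mathbf{b}(j)$ if $\sum_{i,j \in \mathbb{Z}} |\mathbf{c}(i+j)\mathbf{a}(i)\mathbf{b}(j)| < \infty$;
(iii) if $\mathbf{a}$, $\mathbf{b} \in \ell^1(\mathbb{Z})$ then $\mathbf{a}*\mathbf{b} \in \ell^1(\mathbb{Z})$ and $\|\mathbf{a}*\mathbf{b}\|_1 \le \|\mathbf{a}\|_1 \|\mathbf{b}\|_1$;
(iv) if $\mathbf{a}$, $\mathbf{b} \in \ell^2(\mathbb{Z})$ then $\mathbf{a}*\mathbf{b} \in \ell^\infty(\mathbb{Z})$ and $\|\mathbf{a}*\mathbf{b}\|_\infty \le \|\mathbf{a}\|_2 \|\mathbf{b}\|_2$;
(v) if $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c} \in \mathbb{C}^{\mathbb{Z}}$ and $(|\mathbf{a}|*(|\mathbf{b}|*|\mathbf{c}|))(n)$ is well-defined, then $(\mathbf{a}*(\mathbf{b}*\mathbf{c}))(n) = ((\mathbf{a}*\mathbf{b})*\mathbf{c})(n)$.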
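The identities of 255Xk can be tried out on finitely supported sequences, for which all the sums are finite. The following is a minimal sketch in Python, not a proof; the names `convolve`, `norm` and `pairing`, and the particular sequences `a`, `b`, `c`, are illustrative assumptions. Sequences on $\mathbb{Z}$ are represented as dictionaries from indices to values, with absent indices read as $0$.

```python
import math
from collections import defaultdict

def convolve(a, b):
    """(a * b)(n) = sum_i a(n - i) b(i), for finitely supported a, b on Z."""
    out = defaultdict(complex)
    for i, a_i in a.items():
        for j, b_j in b.items():
            out[i + j] += a_i * b_j   # the pair (i, j) contributes to n = i + j
    return dict(out)

def norm(a, p):
    """The l^p norm of a finitely supported sequence (p = math.inf allowed)."""
    if p == math.inf:
        return max(abs(v) for v in a.values())
    return sum(abs(v) ** p for v in a.values()) ** (1.0 / p)

def pairing(c, a):
    """sum_i c(i) a(i), where c is a function on Z and a is finitely supported."""
    return sum(c(i) * v for i, v in a.items())

# Hypothetical test data with Gaussian-integer values, so the arithmetic is exact.
a = {-1: 1 + 2j, 0: -3, 2: 1j}
b = {0: 2, 1: -1j, 3: 4}
c = lambda n: n * n + 1

ab = convolve(a, b)
commutes = (ab == convolve(b, a))                                   # (i)
lhs = pairing(c, ab)                                                # (ii), left-hand side
rhs = sum(c(i + j) * a[i] * b[j] for i in a for j in b)             # (ii), right-hand side
l1_bound = norm(ab, 1) <= norm(a, 1) * norm(b, 1) + 1e-12           # (iii)
sup_bound = norm(ab, math.inf) <= norm(a, 2) * norm(b, 2) + 1e-12   # (iv)
print(commutes, lhs == rhs, l1_bound, sup_bound)
```

Restricting to finite supports sidesteps every question of convergence, which is exactly the part of the exercise that needs an argument.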
255Y Further exercises (a) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. (i) Let $x$ be any point of the Lebesgue set of $f$. Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - (f*g)(x)| \le \epsilon$ whenever $g : \mathbb{R} \to [0,\infty[$ is a function which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$. (ii) Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $\|f - f*g\|_1 \le \epsilon$ whenever $g : \mathbb{R} \to [0,\infty[$ is a function which is non-decreasing on $]-\infty,0]$, non-increasing on $[0,\infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$.
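(b) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Show that, for almost every $x \in \mathbb{R}$,
$$\lim_{a \to \infty} \frac{a}{\pi} \int_{-\infty}^{\infty} \frac{f(y)}{1 + a^2(x-y)^2}\,dy, \qquad \lim_{a \to \infty} a \int_x^{\infty} f(y) e^{-a(y-x)}\,dy,$$
$$\lim_{\sigma \downarrow 0} \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} f(y) e^{-(y-x)^2/2\sigma^2}\,dy$$
all exist and are equal to $f(x)$. (Hint: 263G.)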
(c) Set $f(x) = 1$ for all $x \in \mathbb{R}$, $g(x) = \frac{x}{|x|}$ for $0 < |x| \le 1$ and $0$ otherwise, $h(x) = \tanh x$ for all $x \in \mathbb{R}$. Show that $f*(g*h)$ and $(f*g)*h$ are both defined (and constant) everywhere, and are different.
(d) Discuss what can happen if, in the context of 255J, we know that $(|f|*(|g|*|h|))(x)$ is defined, but have no information on the domain of $f*g$.
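(e) Suppose that $p \in [1,\infty[$ and that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. For $a \in \mathbb{R}^r$ set $(S_a f)(x) = f(a+x)$ whenever $a + x \in \operatorname{dom} f$. Show that $S_a f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, and that for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|S_a f - f\|_p \le \epsilon$ whenever $|a| \le \delta$.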
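(f) Suppose that $p$, $q \in \,]1,\infty[$ and $\frac{1}{p} + \frac{1}{q} = 1$. Take $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$ and $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. Show that $\lim_{\|x\| \to \infty} (f*g)(x) = 0$. (Hint: use 244Hb.)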
(g) Repeat 255Ye and 255K, this time taking $\mu$ to be Lebesgue measure on $]-\pi,\pi]$, and setting $(S_a f)(x) = f(a +_{2\pi} x)$ for $a \in \,]-\pi,\pi]$; show that in the new version of 255K, $(f*g)(\pi) = \lim_{x \downarrow -\pi} (f*g)(x)$.
(h) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $a \in \mathbb{R}$, $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ set $(S_a f)(x) = f(a+x)$ whenever $a + x \in \operatorname{dom} f$.
(i) Show that $S_a f \in \mathcal{L}^0$ for every $f \in \mathcal{L}^0$.
(ii) Show that we have a map $\tilde S_a : L^0 \to L^0$ defined by setting $\tilde S_a(f^\bullet) = (S_a f)^\bullet$ for every $f \in \mathcal{L}^0$.
(iii) Show that $\tilde S_a$ is a Riesz space isomorphism and is a homeomorphism for the topology of convergence in measure; moreover, that $\tilde S_a(u \times v) = \tilde S_a u \times \tilde S_a v$ for all $u$, $v \in L^0$.
(iv) Show that $\tilde S_{a+b} = \tilde S_a \tilde S_b$ for all $a$, $b \in \mathbb{R}$.
(v) Show that $\lim_{a \to 0} \tilde S_a u = u$ for the topology of convergence in measure, for every $u \in L^0$.
(vi) Show that if $1 \le p \le \infty$ then $\tilde S_a \restriction L^p$ is an isometric isomorphism of the Banach lattice $L^p$.
(vii) Show that if $p \in [1,\infty[$ then $\lim_{a \to 0} \|\tilde S_a u - u\|_p = 0$ for every $u \in L^p$.
(viii) Show that if $A \subseteq L^1$ is uniformly integrable and $M \ge 0$, then $\{\tilde S_a u : u \in A, |a| \le M\}$ is uniformly integrable.
(ix) Suppose that $u$, $v \in L^0$ are such that $u*v$ is defined in $L^0$ in the sense of 255Xh. Show that $\tilde S_a(u*v) = (\tilde S_a u)*v = u*(\tilde S_a v)$ for every $a \in \mathbb{R}$.
(i) Prove 255Nd from 255Na by the method used to prove 255Ad from 255Aa, rather than by quoting 255Ad.
(j) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $\phi : \mathbb{R} \to \mathbb{R}$ a convex function; let $\bar\phi : L^0 \to L^0 = L^0(\mu)$ be the associated operator (see 241I). Show that if $u \in L^1 = L^1(\mu)$, $v \in L^0$ are such that $u \ge 0$, $\int u = 1$ and $u*v$, $u*\bar\phi(v)$ are both defined in the sense of 255Xh, then $\bar\phi(u*v) \le u*\bar\phi(v)$. (Hint: 233I.)
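(k) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $p \in [1,\infty]$. Let $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$. Show that $f*g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$ and that $\|f*g\|_p \le \|f\|_1 \|g\|_p$. (Hint: argue from 255Yj, as in 244M.)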
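(l) Suppose that $p$, $q$, $r \in \,]1,\infty[$ and that $\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{r}$. Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that
$$\int f \times g \le \|f\|_p^{1-p/r} \|g\|_q^{1-q/r} \Bigl(\int f^p \times g^q\Bigr)^{1/r}$$
whenever $f$, $g \ge 0$ and $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: set $p' = p/(p-1)$, etc.; $f_1 = f^{p/q'}$, $g_1 = g^{q/p'}$, $h = (f^p \times g^q)^{1/r}$. Use 244Xc to see that $\|f_1 \times g_1\|_{r'} \le \|f_1\|_{q'} \|g_1\|_{p'}$, so that $\int f_1 \times g_1 \times h \le \|f_1\|_{q'} \|g_1\|_{p'} \|h\|_r$.) (ii) Show that $f*g$ is defined a.e. and that $\|f*g\|_r \le \|f\|_p \|g\|_q$ for all $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: take $f$, $g \ge 0$. Use (i) to see that $(f*g)(x)^r \le \|f\|_p^{r-p} \|g\|_q^{r-q} \int f(y)^p g(x-y)^q\,dy$, so that $\|f*g\|_r^r \le \|f\|_p^{r-p} \|g\|_q^{r-q} \int f(y)^p \|g\|_q^q\,dy$.) (This is Young's inequality.)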
(m) Repeat the results of this section for the group $(S^1)^r$, where $r \ge 2$, given its product measure.
(n) Let $G$ be a group and $\mu$ a $\sigma$-finite measure on $G$ such that ($\alpha$) for every $a \in G$, the map $x \mapsto ax$ is an automorphism of $(G, \mu)$ ($\beta$) the map $(x,y) \mapsto (x, xy)$ is an automorphism of $(G^2, \mu_2)$, where $\mu_2$ is the c.l.d. product measure on $G \times G$. For $f$, $g \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ write $(f*g)(x) = \int f(y)g(y^{-1}x)\,dy$ whenever this is defined. Show that
(i) if $f$, $g$, $h \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ and $\int h(xy)f(x)g(y)\,d(x,y)$ is defined in $\mathbb{C}$, then $\int h(x)(f*g)(x)\,dx$ exists and is equal to $\int h(xy)f(x)g(y)\,d(x,y)$, provided that in the expression $h(x)(f*g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f*g)(x)$ is undefined;
(ii) if $f$, $g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f*g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\int f*g = \int f \int g$, $\|f*g\|_1 \le \|f\|_1 \|g\|_1$;
(iii) if $f$, $g$, $h \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f*(g*h) = (f*g)*h$.
(See Halmos 50, §59.)
(o) Repeat 255Yn for counting measure on any group G.
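The failure of associativity in 255Yc can be checked numerically. What follows is a rough sketch in Python, not a proof; the helper names `log_cosh`, `g`, `g_conv_h` and `integrate` are illustrative assumptions. It uses the closed form $(g*h)(x) = 2\log\cosh x - \log\cosh(x-1) - \log\cosh(x+1)$, obtained by integrating $\tanh$ directly, and approximates integrals by the midpoint rule on a truncated range. Since $f = 1$, $(f*g)(x) = \int g$ and $(f*(g*h))(x) = \int g*h$ for every $x$.

```python
import math

def log_cosh(x):
    """log(cosh(x)), written so as to avoid overflow for large |x|."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def g(y):
    """g(y) = y / |y| for 0 < |y| <= 1, and 0 otherwise."""
    if y == 0.0 or abs(y) > 1.0:
        return 0.0
    return 1.0 if y > 0 else -1.0

def g_conv_h(x):
    """(g * h)(x) for h = tanh, from the closed form quoted in the lead-in."""
    return 2.0 * log_cosh(x) - log_cosh(x - 1.0) - log_cosh(x + 1.0)

def integrate(func, lo, hi, steps):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

# f = 1, so f * g is the constant integral of g, and then (f * g) * h = 0 * h = 0.
f_conv_g = integrate(g, -1.0, 1.0, 1000)
# f * (g * h) is the constant integral of g * h; the tails beyond |x| = 40 are negligible.
f_conv_gh = integrate(g_conv_h, -40.0, 40.0, 200000)
print(f_conv_g, f_conv_gh)
```

The first value is $0$ and the second is close to $-2$, so the two bracketings give different constants. This is consistent with 255J, because $|f|*(|g|*|h|)$ is nowhere finite here.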
255 Notes and comments I have tried to set this section out in such a way that it will be clear that the basis of all the work here is 255A, and the crucial application is 255G. I hope that if and when you come to look at general topological groups (for instance, in Chapter 44), you will find it easy to trace through the ideas in any abelian topological group for which you can prove a version of 255A. For non-abelian groups, of course, rather more care is necessary, especially as in some important examples we no longer have $\mu\{x^{-1} : x \in E\} = \mu E$ for every $E$; see 255Yn-255Yo for a little of what can be done without using topological ideas.
The critical point in 255A is the move from the one-dimensional results in 255Aa-255Ac, which are just the translation- and reflection-invariance of Lebesgue measure, to the two-dimensional results in 255Ac-255Ad. And the living centre of the argument, as I present it, is the fact that the shear transformation $\phi$ is an automorphism of the structure $(\mathbb{R}^2, \mu_2)$. The actual calculation of $\mu_2\phi[E]$, assuming that it is measurable, is an easy application of Fubini's and Tonelli's theorems and the translation-invariance of $\mu$. It is for this step that we absolutely need the topological properties of Lebesgue measure. I should perhaps remind you that the fact that $\phi$ is a homeomorphism is not sufficient; in 134I I described a homeomorphism of the unit interval which does not preserve measurability, and it is easy to adapt this to produce a homeomorphism $\psi : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\psi[E]$ is not always measurable for measurable $E$. The argument of 255A is dependent on the special relationships between all three of the measure, topology and group structure of $\mathbb{R}$.
I have already indulged in a few remarks on what ought, or ought not, to be obvious (255C). But perhaps I can add that such results as 255B and the later claim, in the proof of 255K, that a reflected version of a function in $\mathcal{L}^p$ is also in $\mathcal{L}^p$, can only be trivial consequences of results like 255A if every step in the construction of the integral is done in the abstract context of general measure spaces. Even though we are here working exclusively with the Lebesgue integral, the argument will become untrustworthy if we have at any stage in the definition of the integral even mentioned that we are thinking of Lebesgue measure. I advance this as a solid reason for defining integration on abstract measure spaces from the beginning, as I did in Volume 1. Indeed, I suggest that generally in pure mathematics there are good reasons for casting arguments into the forms appropriate to the arguments themselves.
I am writing this book for readers who are interested in proofs, and as elsewhere I have written the proofs of this section out in detail. But most of us find it useful to go through some material in 'advanced calculus' mode, by which I mean starting with a formula such as
$$(f*g)(x) = \int f(x-y)g(y)\,dy,$$
and then working out consequences by formal manipulations, for instance
$$\int h(x)(f*g)(x)\,dx = \iint h(x)f(x-y)g(y)\,dy\,dx = \iint h(x+y)f(x)g(y)\,dy\,dx,$$
without troubling about the precise applicability of the formulae to begin with. In some ways this formula-driven approach can be more truthful to the structure of the subject than the careful analysis I habitually present. The exact hypotheses necessary to make the theorems strictly true are surely secondary, in such contexts as this section, to the pattern formed by the ensemble of the theorems, which can be adequately and elegantly expressed in straightforward formulae. Of course I do still insist that we cannot properly appreciate the structure, nor safely use it, without mastering the ideas of the proofs, and as I have said elsewhere, I believe that mastery of ideas necessarily includes mastery of the formal details, at least in the sense of being able to reconstruct them fairly fluently on demand.
Throughout the main exposition of this section, I have worked with functions rather than equivalence classes of functions. But all the results here have interpretations of great importance for the theory of the function spaces of Chapter 24. In 255Xh and the succeeding exercises, I have pointed to a definition of convolution as an operator from a subset of $L^0 \times L^0$ to $L^0$. It is an interesting point that if $u$, $v \in L^0$ then $u*v$ can be interpreted as a function, not as a member of $L^0$ (255Fc). Thus 255H can be regarded as saying that $u*v \in \mathcal{L}^1$ for $u$, $v \in L^1$. We cannot quite say that convolution is a bilinear operator from $L^1 \times L^1$ to $\mathcal{L}^1$, because $\mathcal{L}^1$, as I define it, is not strictly speaking a linear space. If we want a bilinear operator, then we have to regard convolution as a function from $L^1 \times L^1$ to $L^1$. But when we look at convolution as a function on $L^2 \times L^2$, for instance, then our functions $u*v$ are defined everywhere (255K), and indeed are continuous functions vanishing at $\infty$ (255Ye-255Yf). So in this case it seems more appropriate to regard convolution as a bilinear operator from $L^2 \times L^2$ to some space of continuous functions, and not as an operator from $L^2 \times L^2$ to $L^\infty$. For an example of an interesting convolution which is not naturally representable in terms of an operator on $L^p$ spaces, see 255Xg.
Because convolution acts as a continuous bilinear operator from $L^1(\mu) \times L^1(\mu)$ to $L^1(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}$, Theorem 253F tells us that it must correspond to a linear operator from $L^1(\mu_2)$ to $L^1(\mu)$, where $\mu_2$ is Lebesgue measure on $\mathbb{R}^2$. This is the operator $\tilde T$ of 255Xj.
So far in these notes I have written as though we were concerned only with Lebesgue measure on $\mathbb{R}$. However many applications of the ideas involve $\mathbb{R}^r$ or $]-\pi,\pi]$ or $S^1$. The move to $\mathbb{R}^r$ should be elementary. The move to $S^1$ does require a re-formulation of the basic result 255A/255N. It should also be clear that there will be no new difficulties in moving to $]-\pi,\pi]^r$ or $(S^1)^r$. Moreover, we can also go through the whole theory for the groups $\mathbb{Z}$ and $\mathbb{Z}^r$, where the appropriate measure is now counting measure, so that $\mathcal{L}^0_{\mathbb{C}}$ becomes identified with $\mathbb{C}^{\mathbb{Z}}$ or $\mathbb{C}^{\mathbb{Z}^r}$ (255Xk, 255Yo).
256 Radon measures on $\mathbb{R}^r$
In the next section, and again in Chapters 27 and 28, we need to consider the principal class of measures on Euclidean spaces. For a proper discussion of this class, and the interrelationships between the measures and the topologies involved, we must wait until Volume 4. For the moment, therefore, I present definitions adapted to the case in hand, warning you that the correct generalizations are not quite obvious. I give the definition (256A) and a characterization (256C) of Radon measures on Euclidean spaces, and theorems on the construction of Radon measures as indefinite integrals (256E, 256J), as image measures (256G) and as product measures (256K). In passing I give a version of Lusin's theorem concerning measurable functions on Radon measure spaces (256F).
256A Definitions Let $\nu$ be a measure on $\mathbb{R}^r$, where $r \ge 1$, and $\Sigma$ its domain.
(a) $\nu$ is a topological measure if every open set belongs to $\Sigma$. Note that in this case every Borel set, and in particular every closed set, belongs to $\Sigma$.
(b) $\nu$ is locally finite if every bounded set has finite outer measure.
(c) If $\nu$ is a topological measure, it is inner regular with respect to the compact sets if
$$\nu E = \sup\{\nu K : K \subseteq E \text{ is compact}\}$$
for every $E \in \Sigma$. (Because $\nu$ is a topological measure, and compact sets are closed (2A2Ec), $\nu K$ is defined for every compact set $K$.)
(d) $\nu$ is a Radon measure if it is a complete locally finite topological measure which is inner regular with respect to the compact sets.
256B It will be convenient to be able to call on the following elementary facts.
Lemma Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain.
(a) $\nu$ is $\sigma$-finite.
(b) For any $E \in \Sigma$ and any $\epsilon > 0$ there are a closed set $F \subseteq E$ and an open set $G \supseteq E$ such that $\nu(G \setminus F) \le \epsilon$.
(c) For every $E \in \Sigma$ there is a set $H \subseteq E$, expressible as the union of a sequence of compact sets, such that $\nu(E \setminus H) = 0$.
(d) Every continuous real-valued function on $\mathbb{R}^r$ is $\Sigma$-measurable.
(e) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support, then $h$ is $\nu$-integrable.
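proof (a) For each $n \in \mathbb{N}$, $B(0,n) = \{x : \|x\| \le n\}$ is a closed bounded set, therefore Borel. So if $\nu$ is a Radon measure on $\mathbb{R}^r$, $\langle B(0,n) \rangle_{n \in \mathbb{N}}$ is a cover of $\mathbb{R}^r$ by a sequence of sets of finite measure.
(b) Set $E_n = \{x : x \in E, n \le \|x\| < n+1\}$ for each $n$. Then $\nu E_n < \infty$, so there is a compact set $K_n \subseteq E_n$ such that $\nu K_n \ge \nu E_n - 2^{-n-2}\epsilon$. Set $F = \bigcup_{n \in \mathbb{N}} K_n$; then
$$\nu(E \setminus F) = \sum_{n=0}^{\infty} \nu(E_n \setminus K_n) \le \tfrac{1}{2}\epsilon.$$
Also $F \subseteq E$ and $F$ is closed because
$$F \cap B(0,n) = \bigcup_{i \le n} K_i \cap B(0,n)$$
is closed for each $n$.
In the same way, there is a closed set $F' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus F') \le \frac{1}{2}\epsilon$. Setting $G = \mathbb{R}^r \setminus F'$, we see that $G$ is open, that $G \supseteq E$ and that $\nu(G \setminus E) \le \frac{1}{2}\epsilon$, so that $\nu(G \setminus F) \le \epsilon$, as required.
(c) By (b), we can choose for each $n \in \mathbb{N}$ a closed set $F_n \subseteq E$ such that $\nu(E \setminus F_n) \le 2^{-n}$. Set $H = \bigcup_{n \in \mathbb{N}} F_n$; then $H \subseteq E$ and $\nu(E \setminus H) = 0$, and also $H = \bigcup_{m,n \in \mathbb{N}} B(0,m) \cap F_n$ is a countable union of compact sets.
(d) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous, all the sets $\{x : h(x) > a\}$ are open, so belong to $\Sigma$.
(e) By (d), $h$ is measurable. Now we are supposing that there is some $n \in \mathbb{N}$ such that $h(x) = 0$ whenever $x \notin B(0,n)$. Since $B(0,n)$ is compact (2A2F), $h$ is bounded on $B(0,n)$ (2A2G), and we have $|h| \le \gamma\chi B(0,n)$ for some $\gamma$; since $\nu B(0,n)$ is finite, $h$ is $\nu$-integrable.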
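256C Theorem A measure $\nu$ on $\mathbb{R}^r$ is a Radon measure iff it is the completion of a locally finite measure defined on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}^r$.
proof (a) Suppose first that $\nu$ is a Radon measure. Write $\Sigma$ for its domain.
(i) Set $\nu_0 = \nu \restriction \mathcal{B}$. Then $\nu_0$ is a measure with domain $\mathcal{B}$, and it is locally finite because $\nu_0 B(0,n) = \nu B(0,n)$ is finite for every $n$. Let $\hat\nu_0$ be the completion of $\nu_0$ (212C).
(ii) If $\hat\nu_0$ measures $E$, there are $E_1$, $E_2 \in \mathcal{B}$ such that $E_1 \subseteq E \subseteq E_2$ and $\nu_0(E_2 \setminus E_1) = 0$. Now $E \setminus E_1 \subseteq E_2 \setminus E_1$ must be $\nu$-negligible; as $\nu$ is complete, $E \in \Sigma$ and
$$\nu E = \nu E_1 = \nu_0 E_1 = \hat\nu_0 E.$$
(iii) If $E \in \Sigma$, then by 256Bc there is a Borel set $H \subseteq E$ such that $\nu(E \setminus H) = 0$. Equally, there is a Borel set $H' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus H') = 0$, so that we have $H \subseteq E \subseteq \mathbb{R}^r \setminus H'$ and
$$\nu_0((\mathbb{R}^r \setminus H') \setminus H) = \nu((\mathbb{R}^r \setminus H') \setminus H) = 0.$$
So $\hat\nu_0 E$ is defined and equal to $\nu_0 H = \nu E$.
This shows that $\nu = \hat\nu_0$ is the completion of the locally finite Borel measure $\nu \restriction \mathcal{B}$. And this is true for any Radon measure $\nu$ on $\mathbb{R}^r$.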
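(b) For the rest of the proof, I suppose that $\nu_0$ is a locally finite measure on $\mathbb{R}^r$ with domain $\mathcal{B}$ and $\nu$ is its completion. Write $\Sigma$ for the domain of $\nu$. We say that a subset of $\mathbb{R}^r$ is a $K_\sigma$ set if it is expressible as the union of a sequence of compact sets. Note that every $K_\sigma$ set is a Borel set, so belongs to $\Sigma$. Set
$$\mathcal{A} = \{E : E \in \Sigma, \text{ there is a } K_\sigma \text{ set } H \subseteq E \text{ such that } \nu(E \setminus H) = 0\},$$
$$\Sigma' = \{E : E \in \mathcal{A}, \mathbb{R}^r \setminus E \in \mathcal{A}\}.$$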
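(c)(i) Every open set is itself a $K_\sigma$ set, so belongs to $\mathcal{A}$. PPP Let $G \subseteq \mathbb{R}^r$ be open. If $G = \emptyset$ then $G$ is compact and the result is trivial. Otherwise, let $\mathcal{I}$ be the set of closed intervals of the form $[q, q']$, where $q$, $q' \in \mathbb{Q}^r$, which are included in $G$. Then all the members of $\mathcal{I}$ are closed and bounded, therefore compact. If $x \in G$, there is a $\delta > 0$ such that $B(x,\delta) = \{y : \|y - x\| \le \delta\} \subseteq G$; now there is an $I \in \mathcal{I}$ such that $x \in I \subseteq B(x,\delta)$. Thus $G = \bigcup \mathcal{I}$. But $\mathcal{I}$ is countable, so $G$ is $K_\sigma$. QQQ
(ii) Every closed subset of $\mathbb{R}^r$ is $K_\sigma$, so belongs to $\mathcal{A}$. PPP If $F \subseteq \mathbb{R}^r$ is closed, then $F = \bigcup_{n \in \mathbb{N}} F \cap B(0,n)$; but every $F \cap B(0,n)$ is closed and bounded, therefore compact. QQQ
(iii) If $\langle E_n \rangle_{n \in \mathbb{N}}$ is any sequence in $\mathcal{A}$, then $E = \bigcup_{n \in \mathbb{N}} E_n$ belongs to $\mathcal{A}$. PPP For each $n \in \mathbb{N}$ we have a countable family $\mathcal{K}_n$ of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup \mathcal{K}_n) = 0$; now $\mathcal{K} = \bigcup_{n \in \mathbb{N}} \mathcal{K}_n$ is a countable family of compact subsets of $E$, and $E \setminus \bigcup \mathcal{K} \subseteq \bigcup_{n \in \mathbb{N}} (E_n \setminus \bigcup \mathcal{K}_n)$ is $\nu$-negligible. QQQ
(iv) If $\langle E_n \rangle_{n \in \mathbb{N}}$ is any sequence in $\mathcal{A}$, then $F = \bigcap_{n \in \mathbb{N}} E_n \in \mathcal{A}$. PPP For each $n \in \mathbb{N}$, let $\langle K_{ni} \rangle_{i \in \mathbb{N}}$ be a sequence of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup_{i \in \mathbb{N}} K_{ni}) = 0$. Set $K'_{nj} = \bigcup_{i \le j} K_{ni}$ for each $j$, so that
$$\nu(E_n \cap H) = \lim_{j \to \infty} \nu(K'_{nj} \cap H)$$
for every $H \in \Sigma$. Now, for each $m$, $n \in \mathbb{N}$, choose $j(m,n)$ such that
$$\nu(E_n \cap B(0,m) \cap K'_{n,j(m,n)}) \ge \nu(E_n \cap B(0,m)) - 2^{-(m+n)}.$$
Set $K_m = \bigcap_{n \in \mathbb{N}} K'_{n,j(m,n)}$; then $K_m$ is closed (being an intersection of closed sets) and bounded (being a subset of $K'_{0,j(m,0)}$), therefore compact. Also $K_m \subseteq F$, because $K'_{n,j(m,n)} \subseteq E_n$ for each $n$, and
$$\nu(F \cap B(0,m) \setminus K_m) \le \sum_{n=0}^{\infty} \nu(E_n \cap B(0,m) \setminus K'_{n,j(m,n)}) \le \sum_{n=0}^{\infty} 2^{-(m+n)} = 2^{-m+1}.$$
Consequently $H = \bigcup_{m \in \mathbb{N}} K_m$ is a $K_\sigma$ subset of $F$ and
$$\nu(F \cap B(0,m) \setminus H) \le \inf_{k \ge m} \nu(F \cap B(0,k) \setminus K_k) = 0$$
for every $m$, so $\nu(F \setminus H) = 0$ and $F \in \mathcal{A}$. QQQ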
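(d) $\Sigma'$ is a $\sigma$-algebra of subsets of $\mathbb{R}^r$. PPP (i) $\emptyset$ and its complement are open, so belong to $\mathcal{A}$ and therefore to $\Sigma'$. (ii) If $E \in \Sigma'$ then both $\mathbb{R}^r \setminus E$ and $\mathbb{R}^r \setminus (\mathbb{R}^r \setminus E) = E$ belong to $\mathcal{A}$, so $\mathbb{R}^r \setminus E \in \Sigma'$. (iii) Let $\langle E_n \rangle_{n \in \mathbb{N}}$ be a sequence in $\Sigma'$ with union $E$. By (c-iii) and (c-iv),
$$E \in \mathcal{A}, \qquad \mathbb{R}^r \setminus E = \bigcap_{n \in \mathbb{N}} (\mathbb{R}^r \setminus E_n) \in \mathcal{A},$$
so $E \in \Sigma'$. QQQ
(e) By (c-i) and (c-ii), every open set belongs to $\Sigma'$; consequently every Borel set belongs to $\Sigma'$ and therefore to $\mathcal{A}$. Now if $E$ is any member of $\Sigma$, there is a Borel set $E_1 \subseteq E$ such that $\nu(E \setminus E_1) = 0$ and a $K_\sigma$ set $H \subseteq E_1$ such that $\nu(E_1 \setminus H) = 0$. Express $H$ as $\bigcup_{n \in \mathbb{N}} K_n$ where every $K_n$ is compact; then
$$\nu E = \nu H = \lim_{n \to \infty} \nu\Bigl(\bigcup_{i \le n} K_i\Bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu K \le \nu E$$
because $\bigcup_{i \le n} K_i$ is a compact subset of $E$ for every $n$.
(f) Thus $\nu$ is inner regular with respect to the compact sets. But of course it is complete (being the completion of $\nu_0$) and a locally finite topological measure (because $\nu_0$ is); so it is a Radon measure. This completes the proof.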
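256D Proposition If $\nu$ and $\nu'$ are two Radon measures on $\mathbb{R}^r$, the following are equiveridical:
(i) $\nu = \nu'$;
(ii) $\nu K = \nu' K$ for every compact set $K \subseteq \mathbb{R}^r$;
(iii) $\nu G = \nu' G$ for every open set $G \subseteq \mathbb{R}^r$;
(iv) $\int h\,d\nu = \int h\,d\nu'$ for every continuous function $h : \mathbb{R}^r \to \mathbb{R}$ with bounded support.
proof (a)(i)$\Rightarrow$(iv) is trivial.
(b)(iv)$\Rightarrow$(iii) If (iv) is true, and $G \subseteq \mathbb{R}^r$ is an open set, then for each $n \in \mathbb{N}$ set
$$h_n(x) = \min\bigl(1, 2^n \inf_{y \in \mathbb{R}^r \setminus (G \cap B(0,n))} \|y - x\|\bigr)$$
for $x \in \mathbb{R}^r$. Then $h_n$ is continuous (in fact $|h_n(x) - h_n(x')| \le 2^n \|x - x'\|$ for all $x$, $x' \in \mathbb{R}^r$) and zero outside $B(0,n)$, so $\int h_n\,d\nu = \int h_n\,d\nu'$. Next, $\langle h_n(x) \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence converging to $\chi G(x)$ for every $x \in \mathbb{R}^r$. So
$$\nu G = \lim_{n \to \infty} \int h_n\,d\nu = \lim_{n \to \infty} \int h_n\,d\nu' = \nu' G,$$
by 135Ga. As $G$ is arbitrary, (iii) is true.
(c)(iii)$\Rightarrow$(ii) If (iii) is true, and $K \subseteq \mathbb{R}^r$ is compact, let $n$ be so large that $\|x\| < n$ for every $x \in K$. Set $G = \{x : \|x\| < n\}$, $H = G \setminus K$. Then $G$ and $H$ are open and $G$ is bounded, so $\nu G = \nu' G$ is finite, and
$$\nu K = \nu G - \nu H = \nu' G - \nu' H = \nu' K.$$
As $K$ is arbitrary, (ii) is true.
(d)(ii)$\Rightarrow$(i) If $\nu$, $\nu'$ agree on the compact sets, then
$$\nu E = \sup_{K \subseteq E \text{ is compact}} \nu K = \sup_{K \subseteq E \text{ is compact}} \nu' K = \nu' E$$
for every Borel set $E$. So $\nu \restriction \mathcal{B} = \nu' \restriction \mathcal{B}$, where $\mathcal{B}$ is the algebra of Borel sets. But since $\nu$ and $\nu'$ are both the completions of their restrictions to $\mathcal{B}$, they are identical.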
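The functions $h_n$ used in part (b) of the proof can be written down explicitly in one dimension. The sketch below, in Python, takes $r = 1$ and $G = \,]a,b[$ as an illustrative special case (the function name `h` and the default interval are assumptions); the distance from $x$ to the complement of $G \cap B(0,n)$ is then $\max(0, \min(x - lo, hi - x))$, where $]lo, hi[$ is the interior of $G \cap [-n, n]$.

```python
def h(n, x, a=0.0, b=1.0):
    """h_n(x) = min(1, 2^n * dist(x, complement of G ∩ B(0, n))) for G = ]a, b[ in R."""
    lo, hi = max(a, -float(n)), min(b, float(n))
    if lo >= hi:
        # G ∩ B(0, n) has empty interior, so every point is at distance 0
        # from its complement.
        return 0.0
    dist = max(0.0, min(x - lo, hi - x))
    return min(1.0, 2.0 ** n * dist)

# For x in G the values increase to 1; for x outside G they are all 0.
for x in (0.01, 0.5, 0.99, -0.5, 0.0, 1.0, 2.0):
    print(x, [h(n, x) for n in range(0, 9)])
```

Each $h_n$ is continuous with Lipschitz constant $2^n$ and vanishes outside $B(0,n)$, which is what allows hypothesis (iv) to be applied to it.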
256E It is I suppose time I gave some examples of Radon measures. However it will save a few lines if I first establish some basic constructions. You may wish to glance ahead to 256H at this point.
Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f$ a non-negative $\Sigma$-measurable function defined on a $\nu$-conegligible subset of $\mathbb{R}^r$. Suppose that $f$ is locally integrable in the sense that $\int_E f\,d\nu < \infty$ for every bounded set $E$. Then the indefinite-integral measure $\nu'$ on $\mathbb{R}^r$ defined by saying that
$$\nu' E = \int_E f\,d\nu \text{ whenever } E \cap \{x : x \in \operatorname{dom} f, f(x) > 0\} \in \Sigma$$
is a Radon measure on $\mathbb{R}^r$.
proof For the construction of $\nu'$, see 234I-234L. It is a topological measure because every open set belongs to $\Sigma$ and therefore to the domain $\Sigma'$ of $\nu'$. $\nu'$ is locally finite because $f$ is locally integrable. To see that $\nu'$ is inner regular with respect to the compact sets, take any set $E \in \Sigma'$, and set $E' = \{x : x \in E \cap \operatorname{dom} f, f(x) > 0\}$. Then $E' \in \Sigma$, so there is a set $H \subseteq E'$, expressible as the union of a sequence of compact sets, such that $\nu(E' \setminus H) = 0$. In this case
$$\nu'(E \setminus H) = \int_{E \setminus H} f\,d\nu = 0.$$
Let $\langle K_n \rangle_{n \in \mathbb{N}}$ be a sequence of compact sets with union $H$; then
$$\nu' E = \nu' H = \lim_{n \to \infty} \nu'\Bigl(\bigcup_{i \le n} K_i\Bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu' K \le \nu' E.$$
As $E$ is arbitrary, $\nu'$ is inner regular with respect to the compact sets.
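256F Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain. Let $f : D \to \mathbb{R}$ be a $\Sigma$-measurable function, where $D \subseteq \mathbb{R}^r$. Then for every $\epsilon > 0$ there is a closed set $F \subseteq \mathbb{R}^r$ such that $\nu(\mathbb{R}^r \setminus F) \le \epsilon$ and $f \restriction F$ is continuous.
proof By 121I, there is a $\Sigma$-measurable function $h : \mathbb{R}^r \to \mathbb{R}$ extending $f$. Enumerate $\mathbb{Q}$ as $\langle q_n \rangle_{n \in \mathbb{N}}$. For each $n \in \mathbb{N}$ set $E_n = \{x : h(x) \le q_n\}$, $E'_n = \{x : h(x) > q_n\}$ and use 256Bb to choose closed sets $F_n \subseteq E_n$, $F'_n \subseteq E'_n$ such that $\nu(E_n \setminus F_n) \le 2^{-n-2}\epsilon$, $\nu(E'_n \setminus F'_n) \le 2^{-n-2}\epsilon$. Set $F = \bigcap_{n \in \mathbb{N}} (F_n \cup F'_n)$; then $F$ is closed and
$$\nu(\mathbb{R}^r \setminus F) \le \sum_{n=0}^{\infty} \nu(\mathbb{R}^r \setminus (F_n \cup F'_n)) \le \sum_{n=0}^{\infty} \nu(E_n \setminus F_n) + \nu(E'_n \setminus F'_n) \le \epsilon.$$
I claim that $h \restriction F$ is continuous. PPP Suppose that $x \in F$ and $\delta > 0$. Then there are $m$, $n \in \mathbb{N}$ such that
$$h(x) - \delta \le q_m < h(x) \le q_n \le h(x) + \delta.$$
This means that $x \in E'_m \cap E_n$; consequently $x \notin F_m \cup F'_n$. Because $F_m \cup F'_n$ is closed, there is an $\eta > 0$ such that $y \notin F_m \cup F'_n$ whenever $\|y - x\| \le \eta$. Now suppose that $y \in F$ and $\|y - x\| \le \eta$. Then $y \in (F_m \cup F'_m) \cap (F_n \cup F'_n)$ and $y \notin F_m \cup F'_n$, so $y \in F'_m \cap F_n \subseteq E'_m \cap E_n$ and $q_m < h(y) \le q_n$. Consequently $|h(y) - h(x)| \le \delta$. As $x$ and $\delta$ are arbitrary, $h \restriction F$ is continuous. QQQ Consequently $f \restriction F = (h \restriction F) \restriction D$ is continuous, as required.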
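256G Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and suppose that $\phi : \mathbb{R}^r \to \mathbb{R}^s$ is measurable in the sense that all its coordinates are $\Sigma$-measurable. If the image measure $\nu' = \nu\phi^{-1}$ (234D) is locally finite, it is a Radon measure.
proof Write $\Sigma$ for the domain of $\nu$ and $\Sigma'$ for the domain of $\nu'$. If $\phi = (\phi_1, \ldots, \phi_s)$, then
$$\phi^{-1}[\{y : \eta_j \le \alpha\}] = \{x : \phi_j(x) \le \alpha\} \in \Sigma,$$
so $\{y : \eta_j \le \alpha\} \in \Sigma'$ for every $j \le s$, $\alpha \in \mathbb{R}$, where I write $y = (\eta_1, \ldots, \eta_s)$ for $y \in \mathbb{R}^s$. Consequently every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma'$ (121J), and $\nu'$ is a topological measure. It is complete by 234Eb.
The point is of course that $\nu'$ is inner regular with respect to the compact sets. PPP Suppose that $F \in \Sigma'$ and that $\gamma < \nu' F$. For each $j \le s$, there is a closed set $H_j \subseteq \mathbb{R}^r$ such that $\phi_j \restriction H_j$ is continuous and $\nu(\mathbb{R}^r \setminus H_j) < \frac{1}{s}(\nu' F - \gamma)$, by 256F. Set $H = \bigcap_{j \le s} H_j$; then $H$ is closed and $\phi \restriction H$ is continuous and
$$\nu(\mathbb{R}^r \setminus H) < \nu' F - \gamma = \nu\phi^{-1}[F] - \gamma,$$
so that $\nu(\phi^{-1}[F] \cap H) > \gamma$. Let $K \subseteq \phi^{-1}[F] \cap H$ be a compact set such that $\nu K \ge \gamma$, and set $L = \phi[K]$. Because $K \subseteq H$ and $\phi \restriction H$ is continuous, $L$ is compact (2A2Eb). Of course $L \subseteq F$, and
$$\nu' L = \nu\phi^{-1}[L] \ge \nu K \ge \gamma.$$
As $F$ and $\gamma$ are arbitrary, $\nu'$ is inner regular with respect to the compact sets. QQQ
Since $\nu'$ is locally finite by the hypothesis of the theorem, it is a Radon measure.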
256H Examples I come at last to the promised examples.
(a) Lebesgue measure $\mu$ on $\mathbb{R}^r$ is a Radon measure. (It is a topological measure by 115G, and inner regular with respect to the compact sets by 134Fb.)
(b) Let $\langle t_n \rangle_{n \in \mathbb{N}}$ be any sequence in $\mathbb{R}^r$, and $\langle a_n \rangle_{n \in \mathbb{N}}$ any summable sequence in $[0,\infty[$. For every $E \subseteq \mathbb{R}^r$ set
$$\nu E = \sum \{a_n : t_n \in E\},$$
so that $\nu$ is a totally finite point-supported measure. Then $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$. PPP Clearly $\nu$ is complete and defined on every Borel set and gives finite measure to bounded sets. To see that it is inner regular with respect to the compact sets, observe that for any $E \subseteq \mathbb{R}^r$ the sets
$$K_n = E \cap \{t_i : i \le n\}$$
are compact and $\nu E = \lim_{n \to \infty} \nu K_n$. QQQ
(c) Now we come to a new idea. Recall that the Cantor set $C$ (134G) is a closed negligible subset of $[0,1]$, and that the Cantor function (134H) is a non-decreasing continuous function $f : [0,1] \to [0,1]$ such that $f(0) = 0$, $f(1) = 1$ and $f$ is constant on each of the intervals composing $[0,1] \setminus C$. It follows that if we set $g(x) = \frac{1}{2}(x + f(x))$ for $x \in [0,1]$, then $g : [0,1] \to [0,1]$ is a continuous bijection such that the Lebesgue measure of $g[C]$ is $\frac{1}{2}$ (134I); consequently $g^{-1} : [0,1] \to [0,1]$ is continuous. Now extend $g$ to a bijection $h : \mathbb{R} \to \mathbb{R}$ by setting $h(x) = x$ for $x \in \mathbb{R} \setminus [0,1]$. Then $h$ and $h^{-1}$ are continuous. Note that $h[C] = g[C]$ has Lebesgue measure $\frac{1}{2}$.
Let $\nu_1$ be the indefinite-integral measure defined from Lebesgue measure $\mu$ on $\mathbb{R}$ and the function $2\chi(h[C])$; that is, $\nu_1 E = 2\mu(E \cap h[C])$ whenever this is defined. By 256E, $\nu_1$ is a Radon measure, and $\nu_1 h[C] = \nu_1 \mathbb{R} = 1$. Let $\nu$ be the measure $\nu_1 h$, that is, $\nu E = \nu_1 h[E]$ for just those $E \subseteq \mathbb{R}$ such that $h[E] \in \operatorname{dom} \nu_1$. Then $\nu$ is a Radon probability measure on $\mathbb{R}$, by 256G, and $\nu C = 1$, $\nu(\mathbb{R} \setminus C) = \mu C = 0$.
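The claim in 256Hc that $g[C]$ has Lebesgue measure $\frac{1}{2}$ can be illustrated by exact arithmetic on the middle thirds removed in constructing $C$. The sketch below, in Python with `fractions.Fraction`, is an illustration rather than a proof; the names `cantor`, `g`, `removed_intervals` and `image_length_of_complement` are assumptions. Since $f$ is constant on each removed interval, $g$ halves its length, so the images of the intervals removed in the first $N$ stages have total length $\frac{1}{2}(1 - (\frac{2}{3})^N)$, which tends to $\frac{1}{2}$ and leaves $\frac{1}{2}$ for $g[C]$.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function on [0, 1], read off from the ternary digits of x.

    The value is exact when a ternary digit 1 occurs or the expansion
    terminates within `depth` digits, and a truncation otherwise.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)                  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:                  # x lies in the closure of a removed middle third
            return value + scale
        value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

def g(x):
    """g(x) = (x + f(x)) / 2, where f is the Cantor function."""
    return (Fraction(x) + cantor(x)) / 2

def removed_intervals(a, b, levels):
    """Endpoints of the open middle thirds removed from [a, b] in the first `levels` stages."""
    if levels == 0:
        return
    third = (b - a) / 3
    yield (a + third, a + 2 * third)
    yield from removed_intervals(a, a + third, levels - 1)
    yield from removed_intervals(a + 2 * third, b, levels - 1)

def image_length_of_complement(levels):
    """Total length of the g-images of the intervals removed in the first `levels` stages."""
    return sum(g(b) - g(a)
               for a, b in removed_intervals(Fraction(0), Fraction(1), levels))

print(cantor(Fraction(1, 3)), cantor(Fraction(2, 3)), image_length_of_complement(6))
```

Exact rationals are used so that the equality of $f$ at the two endpoints of each removed interval is not blurred by rounding.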
256I Remarks (a) The measure $\nu$ of 256Hc, sometimes called Cantor measure, is a classic example, and as such has many constructions, some rather more natural than the one I use here (see 256Xk, and also 264Ym below). But I choose the method above because it yields directly, without further investigation or any appeal to more advanced general theory, the fact that $\nu$ is a Radon measure.
(b) The examples above are chosen to represent the extremes under the Lebesgue decomposition described in 232I. If $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$, we can use 232Ib to express its restriction $\nu \restriction \mathcal{B}$ to the Borel $\sigma$-algebra as $\nu_p + \nu_{ac} + \nu_{cs}$, where $\nu_p$ is the 'point-mass' or 'atomic' part of $\nu \restriction \mathcal{B}$, $\nu_{ac}$ is the 'absolutely continuous' part (with respect to Lebesgue measure), and $\nu_{cs}$ is the 'atomless singular part'. In the example of 256Hb, we have $\nu \restriction \mathcal{B} = \nu_p$; in 256E, if we start from Lebesgue measure, we have $\nu \restriction \mathcal{B} = \nu_{ac}$; and in 256Hc we have $\nu \restriction \mathcal{B} = \nu_{cs}$.
256J Absolutely continuous Radon measures It is worth pausing a moment over the indefinite-integral measures described in 256E.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, where $r \ge 1$, and write $\mu$ for Lebesgue measure on $\mathbb{R}^r$. Then the following are equiveridical:
(i) $\nu$ is an indefinite-integral measure over $\mu$;
(ii) $\nu E = 0$ whenever $E$ is a Borel subset of $\mathbb{R}^r$ and $\mu E = 0$.
In this case, if $g \in \mathcal{L}^0(\mu)$ and $\int_E g\,d\mu = \nu E$ for every Borel set $E \subseteq \mathbb{R}^r$, then $g$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$ in the sense of 232Hf.
proof (a)(i)$\Rightarrow$(ii) If $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$, then of course
$$\nu E = \int_E f\,d\mu = 0$$
whenever $\mu E = 0$.
(ii)$\Rightarrow$(i) If $\nu E = 0$ for every $\mu$-negligible Borel set $E$, then $\nu E$ is defined and equal to $0$ for every $\mu$-negligible set $E$, because $\nu$ is complete and any $\mu$-negligible set is included in a $\mu$-negligible Borel set. Consequently $\operatorname{dom} \nu$ includes the domain $\Sigma$ of $\mu$, since every Lebesgue measurable set is expressible as the union of a Borel set and a negligible set.
For each $n \in \mathbb{N}$ set $E_n = \{x : n \le \|x\| < n+1\}$, so that $\langle E_n \rangle_{n \in \mathbb{N}}$ is a partition of $\mathbb{R}^r$ into bounded Borel sets. Set $\nu_n E = \nu(E \cap E_n)$ for every Lebesgue measurable set $E$ and every $n \in \mathbb{N}$. Now $\nu_n$ is absolutely continuous with respect to $\mu$ (232Ba), so by the Radon-Nikodým theorem (232F) there is a $\mu$-integrable function $f_n$ such that $\int_E f_n\,d\mu = \nu_n E$ for every Lebesgue measurable set $E$. Because $\nu_n E \ge 0$ for every $E \in \Sigma$, $f_n \ge 0$ a.e.; because $\nu_n(\mathbb{R}^r \setminus E_n) = 0$, $f_n = 0$ a.e. on $\mathbb{R}^r \setminus E_n$. Now if we set
$$f = \max\Bigl(0, \sum_{n=0}^{\infty} f_n\Bigr),$$
$f$ will be defined $\mu$-a.e. and we shall have
$$\int_E f\,d\mu = \sum_{n=0}^{\infty} \int_E f_n\,d\mu = \sum_{n=0}^{\infty} \nu(E \cap E_n) = \nu E$$
for every Borel set $E$, so that the indefinite-integral measure $\nu'$ defined by $f$ and $\mu$ agrees with $\nu$ on the Borel sets. Since this ensures that $\nu'$ is locally finite, $\nu'$ is a Radon measure, by 256E, and is equal to $\nu$, by 256D. Accordingly $\nu$ is an indefinite-integral measure over $\mu$.
(b) As in (a-ii) above, $g$ must be locally integrable and the indefinite-integral measure defined by $g$ agrees with $\nu$ on the Borel sets, so is identical with $\nu$.
256K Products The class of Radon measures on Euclidean spaces is stable under a wide variety of operations, as we have already seen; in particular, we have the following.
Theorem Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, where $r$, $s \ge 1$. Let $\lambda$ be their c.l.d. product measure on $\mathbb{R}^r \times \mathbb{R}^s$. Then $\lambda$ is a Radon measure.
Remark When I say that $\lambda$ is 'Radon' according to the definition in 256A, I am of course identifying $\mathbb{R}^r \times \mathbb{R}^s$ with $\mathbb{R}^{r+s}$, as in 251M-251N.
proof (a) I hope the following rather voluminous notation will seem natural. Write $\Sigma_1$, $\Sigma_2$ for the domains of $\nu_1$, $\nu_2$; $\mathcal{B}_r$, $\mathcal{B}_s$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$; $\Lambda$ for the domain of $\lambda$; and $\mathcal{B}$ for the Borel $\sigma$-algebra of $\mathbb{R}^{r+s}$.
Because each $\nu_i$ is the completion of its restriction to the Borel sets (256C), $\lambda$ is the product of $\nu_1 \restriction \mathcal{B}_r$ and $\nu_2 \restriction \mathcal{B}_s$ (251T). Because $\nu_1 \restriction \mathcal{B}_r$ and $\nu_2 \restriction \mathcal{B}_s$ are $\sigma$-finite (256Ba, 212Ga), $\lambda$ must be the completion of its restriction to $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$, which by 251M is identified with $\mathcal{B}$. Setting $Q_n = \{(x,y) : \|x\| \le n, \|y\| \le n\}$ we have
$$\lambda Q_n = \nu_1\{x : \|x\| \le n\} \cdot \nu_2\{y : \|y\| \le n\} < \infty$$
for every $n$, while every bounded subset of $\mathbb{R}^{r+s}$ is included in some $Q_n$. So $\lambda \restriction \mathcal{B}$ is locally finite, and its completion $\lambda$ is a Radon measure, by 256C.
256L Remark We see from 253I that if $\nu_1$ and $\nu_2$ are Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, and both are indefinite-integral measures over Lebesgue measure, then their product measure on $\mathbb{R}^{r+s}$ is also an indefinite-integral measure over Lebesgue measure.
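*256M For the sake of applications in §286 below, I include another result, which is in fact one of the fundamental properties of Radon measures, as will appear in §414.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $D$ any subset of $\mathbb{R}^r$. Let $\Phi$ be a non-empty upwards-directed family of non-negative continuous functions from $D$ to $\mathbb{R}$. For $x \in D$ set $g(x) = \sup_{f \in \Phi} f(x)$ in $[0,\infty]$. Then
(a) $g : D \to [0,\infty]$ is lower semi-continuous, therefore Borel measurable;
(b) $\int_D g\,d\nu = \sup_{f \in \Phi} \int_D f\,d\nu$.
proof (a) For any $u \in [-\infty,\infty]$,
$$\{x : x \in D, g(x) > u\} = \bigcup_{f \in \Phi} \{x : x \in D, f(x) > u\}$$
is an open set for the subspace topology on $D$ (2A3C), so is the intersection of $D$ with a Borel subset of $\mathbb{R}^r$. This is enough to show that $g$ is Borel measurable (121B-121C).
(b) Accordingly $\int_D g\,d\nu$ will be defined in $[0,\infty]$, and of course $\int_D g\,d\nu \ge \sup_{f \in \Phi} \int_D f\,d\nu$.
For the reverse inequality, observe that there is a countable set $\Psi \subseteq \Phi$ such that $g(x) = \sup_{f \in \Psi} f(x)$ for every $x \in D$. PPP For $a \in \mathbb{Q}$, $q$, $q' \in \mathbb{Q}^r$ set
$$\Phi_{aqq'} = \{f : f \in \Phi, f(y) > a \text{ whenever } y \in D \cap [q, q']\},$$
interpreting $[q, q']$ as in 115G. Choose $f_{aqq'} \in \Phi_{aqq'}$ if $\Phi_{aqq'}$ is not empty, and arbitrarily in $\Phi$ otherwise; and set $\Psi = \{f_{aqq'} : a \in \mathbb{Q}, q, q' \in \mathbb{Q}^r\}$, so that $\Psi$ is a countable subset of $\Phi$. If $x \in D$ and $b < g(x)$, there is an $a \in \mathbb{Q}$ such that $b \le a < g(x)$; there is an $\hat f \in \Phi$ such that $\hat f(x) > a$; because $\hat f$ is continuous, there are $q$, $q' \in \mathbb{Q}^r$ such that $q \le x \le q'$ and $\hat f(y) > a$ whenever $y \in D \cap [q, q']$; so that $\hat f \in \Phi_{aqq'}$, $\Phi_{aqq'} \ne \emptyset$, $f_{aqq'} \in \Phi_{aqq'}$ and $\sup_{f \in \Psi} f(x) \ge f_{aqq'}(x) \ge b$. As $b$ is arbitrary, $g(x) = \sup_{f \in \Psi} f(x)$. QQQ
Let $\langle f_n \rangle_{n \in \mathbb{N}}$ be a sequence running over $\Psi$. Because $\Phi$ is upwards-directed, we can choose $\langle f'_n \rangle_{n \in \mathbb{N}}$ in $\Phi$ inductively in such a way that $f'_{n+1} \ge \max(f'_n, f_n)$ for every $n \in \mathbb{N}$. So $\langle f'_n \rangle_{n \in \mathbb{N}}$ is a non-decreasing sequence in $\Phi$ and $\sup_{n \in \mathbb{N}} f'_n(x) \ge \sup_{f \in \Psi} f(x) = g(x)$ for every $x \in D$. By B.Levi's theorem,
$$\int_D g\,d\nu \le \sup_{n \in \mathbb{N}} \int_D f'_n\,d\nu \le \sup_{f \in \Phi} \int_D f\,d\nu,$$
and we have the required inequality.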
256X Basic exercises >(a) Let $\nu$ be a measure on $\mathbb{R}^r$. (i) Show that it is locally finite, in the sense of 256Ab, iff for every $x\in\mathbb{R}^r$ there is a $\delta>0$ such that $\nu^*B(x,\delta)<\infty$. (Hint: the sets $B(0,n)$ are compact.) (ii) Show that in this case $\nu$ is $\sigma$-finite.
>(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{G}$ a non-empty upwards-directed family of open sets in $\mathbb{R}^r$. (i) Show that $\nu(\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu G$. (Hint: observe that if $K\subseteq\bigcup\mathcal{G}$ is compact, then $K\subseteq G$ for some $G\in\mathcal{G}$.) (ii) Show that $\nu(E\cap\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu(E\cap G)$ for every set $E$ which is measured by $\nu$.
>(c) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{F}$ a non-empty downwards-directed family of closed sets in $\mathbb{R}^r$ such that $\inf_{F\in\mathcal{F}}\nu F<\infty$. (i) Show that $\nu(\bigcap\mathcal{F})=\inf_{F\in\mathcal{F}}\nu F$. (Hint: apply 256Xb(ii) to $\mathcal{G}=\{\mathbb{R}^r\setminus F:F\in\mathcal{F}\}$.) (ii) Show that $\nu(E\cap\bigcap\mathcal{F})=\inf_{F\in\mathcal{F}}\nu(E\cap F)$ for every $E$ in the domain of $\nu$.
>(d) Show that a Radon measure $\nu$ on $\mathbb{R}^r$ is atomless iff $\nu\{x\}=0$ for every $x\in\mathbb{R}^r$. (Hint: apply 256Xc with $\mathcal{F}=\{F:F\subseteq E$ is closed, not negligible$\}$.)
(e) Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$, and $\alpha_1,\alpha_2\in\,]0,\infty[$. Set $\Sigma=\operatorname{dom}\nu_1\cap\operatorname{dom}\nu_2$, and for $E\in\Sigma$ set $\nu E=\alpha_1\nu_1E+\alpha_2\nu_2E$. Show that $\nu$ is a Radon measure on $\mathbb{R}^r$. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure iff $\nu_1$, $\nu_2$ are, and that in this case a linear combination of Radon-Nikodým derivatives of $\nu_1$ and $\nu_2$ is a Radon-Nikodým derivative of $\nu$.
>(f) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. (i) Show that there is a unique closed set $F\subseteq\mathbb{R}^r$ such that, for open sets $G\subseteq\mathbb{R}^r$, $\nu G>0$ iff $G\cap F\ne\emptyset$. ($F$ is called the support of $\nu$.) (ii) Generally, a set $A\subseteq\mathbb{R}^r$ is called self-supporting if $\nu^*(A\cap G)>0$ whenever $G\subseteq\mathbb{R}^r$ is an open set meeting $A$. Show that for every closed set $F\subseteq\mathbb{R}^r$ there is a unique self-supporting closed set $F'\subseteq F$ such that $\nu(F\setminus F')=0$.
>(g) Show that a measure $\nu$ on $\mathbb{R}$ is a Radon measure iff it is a Lebesgue-Stieltjes measure as described in 114Xa. Show that in this case $\nu$ is an indefinite-integral measure over Lebesgue measure iff the function $x\mapsto\nu\,]-\infty,x]$ is absolutely continuous on every bounded interval.
(h) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Let $C_k$ be the space of continuous real-valued functions on $\mathbb{R}^r$ with bounded supports. Show that for every $\nu$-integrable function $f$ and every $\epsilon>0$ there is a $g\in C_k$ such that $\int|f-g|\,d\nu\le\epsilon$. (Hint: use arguments from 242O, but in (a-i) of the proof there start with closed intervals $I$.)
(i) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Show that $\nu E=\inf\{\nu G:G\supseteq E$ is open$\}$ for every set $E$ in the domain of $\nu$.
(j) Let $\nu$, $\nu'$ be two Radon measures on $\mathbb{R}^r$, and suppose that $\nu I=\nu'I$ for every half-open interval $I\subseteq\mathbb{R}^r$ (definition: 115Ab). Show that $\nu=\nu'$.
(k) Let $\nu$ be Cantor measure (256Hc). (i) Show that if $C_n$ is the $n$th set used in the construction of the Cantor set, so that $C_n$ consists of $2^n$ intervals of length $3^{-n}$, then $\nu I=2^{-n}$ for each of the intervals $I$ composing $C_n$. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi:\{0,1\}^{\mathbb{N}}\to\mathbb{R}$ by setting $\phi(x)=\frac23\sum_{n=0}^{\infty}3^{-n}x(n)$ for each $x\in\{0,1\}^{\mathbb{N}}$. Show that $\phi$ is a bijection between $\{0,1\}^{\mathbb{N}}$ and $C$. (iii) Show that if $\mathcal{B}$ is the Borel $\sigma$-algebra of $\mathbb{R}$, then $\{\phi^{-1}[E]:E\in\mathcal{B}\}$ is precisely the $\sigma$-algebra of subsets of $\{0,1\}^{\mathbb{N}}$ generated by the sets $\{x:x(n)=i\}$ for $n\in\mathbb{N}$, $i\in\{0,1\}$. (iv) Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}},\lambda)$ and $(C,\nu_C)$, where $\nu_C$ is the subspace measure on $C$ induced by $\nu$.
(l) Let $\nu$ and $\nu'$ be two Radon measures on $\mathbb{R}^r$. Show that $\nu'$ is an indefinite-integral measure over $\nu$ iff $\nu'E=0$ whenever $\nu E=0$, and in this case a function $f$ is a Radon-Nikodým derivative of $\nu'$ with respect to $\nu$ iff $\int_E f\,d\nu=\nu'E$ for every Borel set $E$.
256Y Further exercises (a) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $X$ any subset of $\mathbb{R}^r$; let $\nu_X$ be the subspace measure on $X$ and $\Sigma_X$ its domain, and give $X$ its subspace topology (2A3C). Show that $\nu_X$ has the following properties: (i) $\nu_X$ is complete and locally determined; (ii) every open subset of $X$ belongs to $\Sigma_X$; (iii) $\nu_XE=\sup\{\nu_XF:F\subseteq E$ is closed in $X\}$ for every $E\in\Sigma_X$; (iv) whenever $\mathcal{G}$ is a non-empty upwards-directed family of open subsets of $X$, $\nu_X(\bigcup\mathcal{G})=\sup_{G\in\mathcal{G}}\nu_XG$; (v) every point of $X$ belongs to an open set of finite measure.
(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f:\mathbb{R}^r\to\mathbb{R}$ a function. Show that the following are equiveridical: (i) $f$ is $\Sigma$-measurable; (ii) for every non-negligible set $E\in\Sigma$ there is a non-negligible $F\in\Sigma$ such that $F\subseteq E$ and $f\restriction F$ is continuous; (iii) for every set $E\in\Sigma$, $\nu E=\sup_{K\in\mathcal{K}_f,\,K\subseteq E}\nu K$, where $\mathcal{K}_f=\{K:K\subseteq\mathbb{R}^r$ is compact, $f\restriction K$ is continuous$\}$. (Hint: for (ii)$\Rightarrow$(i), apply 215B(iv) to $\mathcal{K}_f$.)
(c) Take $\nu$, $X$, $\nu_X$ and $\Sigma_X$ as in 256Ya. Suppose that $f:X\to\mathbb{R}$ is a function. Show that $f$ is $\Sigma_X$-measurable iff for every non-negligible measurable set $E\subseteq X$ there is a non-negligible measurable $F\subseteq E$ such that $f\restriction F$ is continuous.
(d) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon measures on $\mathbb{R}^r$. Show that there is a Radon measure $\nu$ on $\mathbb{R}^r$ such that every $\nu_n$ is an indefinite-integral measure over $\nu$. (Hint: find a sequence $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers such that $\sum_{n=0}^{\infty}\alpha_n\nu_nB(0,k)<\infty$ for every $k$, and set $\nu=\sum_{n=0}^{\infty}\alpha_n\nu_n$, using the idea of 256Xe.)
(e) A set $G\subseteq\mathbb{R}^{\mathbb{N}}$ is open if for every $x\in G$ there are $n\in\mathbb{N}$, $\delta>0$ such that
$$\{y:y\in\mathbb{R}^{\mathbb{N}},\ |y(i)-x(i)|<\delta\text{ for every }i\le n\}\subseteq G.$$
The Borel $\sigma$-algebra of $\mathbb{R}^{\mathbb{N}}$ is the $\sigma$-algebra $\mathcal{B}$ of subsets of $\mathbb{R}^{\mathbb{N}}$ generated, in the sense of 111Gb, by the family $\mathfrak{T}$ of open sets. (i) Show that $\mathfrak{T}$ is a topology (2A3A). (ii) Show that a filter $\mathcal{F}$ on $\mathbb{R}^{\mathbb{N}}$ converges to $x\in\mathbb{R}^{\mathbb{N}}$ iff $\pi_i[[\mathcal{F}]]\to x(i)$ for every $i\in\mathbb{N}$, where $\pi_i(y)=y(i)$ for $i\in\mathbb{N}$, $y\in\mathbb{R}^{\mathbb{N}}$. (iii) Show that $\mathcal{B}$ is the $\sigma$-algebra generated by sets of the form $\{x:x\in\mathbb{R}^{\mathbb{N}},\ x(i)\le a\}$, where $i$ runs over $\mathbb{N}$ and $a$ runs over $\mathbb{R}$. (iv) Show that if $\alpha_i\ge0$ for every $i\in\mathbb{N}$, then $\{x:|x(i)|\le\alpha_i\ \forall\,i\in\mathbb{N}\}$ is compact. (Hint: 2A3R.) (v) Show that any open set in $\mathbb{R}^{\mathbb{N}}$ is the union of a sequence of closed sets. (Hint: look at sets of the form $\{x:q_i\le x(i)\le q'_i\ \forall\,i\le n\}$, where $q_i,q'_i\in\mathbb{Q}$ for $i\le n$.) (vi) Show that if $\nu_0$ is any probability measure with domain $\mathcal{B}$, then its completion $\nu$ is inner regular with respect to the compact sets, and therefore may be called a Radon measure on $\mathbb{R}^{\mathbb{N}}$. (Hint: show that there are compact sets of measure arbitrarily close to 1, and therefore that every open set, and every closed set, includes a $K_\sigma$ set of the same measure.)


256 Notes and comments Radon measures on Euclidean spaces are very special, and the results of this section do not give clear pointers to the direction the theory takes when applied to other kinds of topological space. With the material here you could make a stab at developing a theory of Radon measures on complete separable metric spaces, provided you use 256Xa as the basis for your definition of 'locally finite'. These are the spaces for which a version of 256C is true. (See 256Ye.) But for generalizations to other types of topological space, and for the more interesting parts of the theory on $\mathbb{R}^r$, I must ask you to wait for Volume 4. My purpose in introducing Radon measures here is strictly limited; I wish only to give a basis for §257 and §271 sufficiently solid not to need later revision. In fact I think that all we really need are the Radon probability measures.
The chief technical difficulty in the definition of 'Radon measure' here lies in the insistence on completeness. It may well be that for everything studied in this volume, it would be simpler to look at locally finite measures with domain the algebra of Borel sets. This would involve us in a number of circumlocutions when dealing with Lebesgue measure itself and its derivates, since Lebesgue measure is defined on a larger $\sigma$-algebra; but the serious objection arises in the more advanced theory, when non-Borel sets of various kinds become central. Since my aim in this book is to provide secure foundations for the study of all aspects of measure theory, I ask you to take a little extra trouble now in order to avoid the possibility of having to re-work all your ideas later. The extra trouble arises, for instance, in 256D, 256Xe and 256Xj; since different Radon measures are defined on different $\sigma$-algebras, we have to check that two Radon measures which agree on the compact sets, or on the open sets, have the same domains. On the credit side, some of the power of 256G arises from the fact that the Radon image measure $\nu\phi^{-1}$ is defined on the whole $\sigma$-algebra $\{F:\phi^{-1}[F]\in\operatorname{dom}(\nu)\}$, not just on the Borel sets.
The further technical point that Radon measures are expected to be locally finite gives less difficulty; its effect is that from most points of view there is little difference between a general Radon measure and a totally finite Radon measure. The extra condition which obviously has to be put into the hypotheses of such results as 256E and 256G is no burden on either intuition or memory.
In effect, we have two definitions of Radon measures on Euclidean spaces: they are the inner regular locally finite topological measures, and they are also the completions of the locally finite Borel measures. The equivalence of these definitions is Theorem 256C. The latter definition is the better adapted to 256K, and the former to 256G. The 'inner regularity' of the basic definition refers to compact sets; we also have forms of inner regularity with respect to closed sets (256Bb) and $K_\sigma$ sets (256Bc), and a complementary notion of 'outer regularity' with respect to open sets (256Xi).
257 Convolutions of measures
The ideas of this chapter can be brought together in a satisfying way in the theory of convolutions of Radon measures, which will be useful in §272 and again in §285. I give just the definition (257A) and the central property (257B) of the convolution of totally finite Radon measures, with a few corollaries and a note on the relation between convolution of functions and convolution of measures (257F).
257A Definition Let $r\ge1$ be an integer and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$. Let $\lambda$ be the product measure on $\mathbb{R}^r\times\mathbb{R}^r$; then $\lambda$ also is a (totally finite) Radon measure, by 256K. Define $\phi:\mathbb{R}^r\times\mathbb{R}^r\to\mathbb{R}^r$ by setting $\phi(x,y)=x+y$; then $\phi$ is continuous, therefore measurable in the sense of 256G. The convolution of $\nu_1$ and $\nu_2$, $\nu_1*\nu_2$, is the image measure $\lambda\phi^{-1}$; by 256G, this is a Radon measure.
Note that if $\nu_1$ and $\nu_2$ are Radon probability measures, then $\lambda$ and $\nu_1*\nu_2$ are also probability measures.
257B Theorem Let $r\ge1$ be an integer, and $\nu_1$ and $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu=\nu_1*\nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r\times\mathbb{R}^r$. Then for any real-valued function $h$ defined on a subset of $\mathbb{R}^r$,
$$\int h(x+y)\,\lambda(d(x,y))\text{ exists}=\int h(x)\,\nu(dx)$$
if either integral is defined in $[-\infty,\infty]$.
proof Apply 235J with $J(x,y)=1$, $\phi(x,y)=x+y$ for all $x,y\in\mathbb{R}^r$.
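The definition in 257A and the identity in 257B can be checked by hand on measures with finite support, which are trivially totally finite Radon measures on $\mathbb{R}$. The following Python fragment is a minimal sketch under that assumption; the dictionary representation (point mapped to mass) and the names `convolve` and `integrate` are illustrative choices, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(nu1, nu2):
    """Image of the product measure under (x, y) -> x + y.

    Each measure is a dict {point: mass} with finite support, so the
    product measure gives mass a*b to the pair (x, y).
    """
    out = defaultdict(Fraction)
    for x, a in nu1.items():
        for y, b in nu2.items():
            out[x + y] += a * b
    return dict(out)


def integrate(h, nu):
    """Integral of h with respect to a finitely supported measure."""
    return sum(h(x) * w for x, w in nu.items())


# Two probability measures on R (hypothetical example data).
nu1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu2 = {0: Fraction(1, 3), 2: Fraction(2, 3)}
nu = convolve(nu1, nu2)


def h(t):
    return t * t


# Left side: integral of h against the convolution.
lhs = integrate(h, nu)
# Right side: integral of h(x + y) against the product measure.
rhs = sum(h(x + y) * a * b for x, a in nu1.items() for y, b in nu2.items())
```

Here `lhs` and `rhs` agree, and the total mass of `nu` is the product of the total masses, as the note after 257A says for probability measures.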
257C Corollary Let $r\ge1$ be an integer, and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu=\nu_1*\nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r\times\mathbb{R}^r$; write $\Sigma$ for the domain of $\nu$. Let $h$ be a $\Sigma$-measurable function defined $\nu$-almost everywhere in $\mathbb{R}^r$. Suppose that any one of the integrals
$$\iint|h(x+y)|\,\nu_1(dx)\nu_2(dy),\quad\iint|h(x+y)|\,\nu_2(dy)\nu_1(dx),\quad\int h(x+y)\,\lambda(d(x,y))$$
exists and is finite. Then $h$ is $\nu$-integrable and
$$\int h(x)\,\nu(dx)=\iint h(x+y)\,\nu_1(dx)\nu_2(dy)=\iint h(x+y)\,\nu_2(dy)\nu_1(dx).$$
proof Put 257B together with Fubini's and Tonelli's theorems (252H).
257D Corollary If $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$, then $\nu_1*\nu_2=\nu_2*\nu_1$.
proof For any Borel set $E\subseteq\mathbb{R}^r$, apply 257C to $h=\chi E$ to see that
$$(\nu_1*\nu_2)(E)=\iint\chi E(x+y)\,\nu_1(dx)\nu_2(dy)=\iint\chi E(x+y)\,\nu_2(dy)\nu_1(dx)$$
$$=\iint\chi E(y+x)\,\nu_2(dy)\nu_1(dx)=(\nu_2*\nu_1)(E).$$
Thus $\nu_1*\nu_2$ and $\nu_2*\nu_1$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must be identical (256D).
257E Corollary If $\nu_1$, $\nu_2$ and $\nu_3$ are totally finite Radon measures on $\mathbb{R}^r$, then $(\nu_1*\nu_2)*\nu_3=\nu_1*(\nu_2*\nu_3)$.
proof For any Borel set $E\subseteq\mathbb{R}^r$, apply 257B to $h=\chi E$ to see that
$$((\nu_1*\nu_2)*\nu_3)(E)=\iint\chi E(x+z)\,(\nu_1*\nu_2)(dx)\nu_3(dz)$$
$$=\iiint\chi E(x+y+z)\,\nu_1(dx)\nu_2(dy)\nu_3(dz)$$
(because $x\mapsto\chi E(x+z)$ is Borel measurable for every $z$)
$$=\iint\chi E(x+y)\,\nu_1(dx)(\nu_2*\nu_3)(dy)$$
(because $(x,y)\mapsto\chi E(x+y)$ is Borel measurable, so $y\mapsto\int\chi E(x+y)\,\nu_1(dx)$ is $(\nu_2*\nu_3)$-integrable)
$$=(\nu_1*(\nu_2*\nu_3))(E).$$
Thus $(\nu_1*\nu_2)*\nu_3$ and $\nu_1*(\nu_2*\nu_3)$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must be identical.
257F Theorem Suppose that $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$ which are indefinite-integral measures over Lebesgue measure $\mu$. Then $\nu_1*\nu_2$ also is an indefinite-integral measure over $\mu$; if $f_1$ and $f_2$ are Radon-Nikodým derivatives of $\nu_1$, $\nu_2$ respectively, then $f_1*f_2$ is a Radon-Nikodým derivative of $\nu_1*\nu_2$.
proof By 255H/255L, $f_1*f_2$ is integrable with respect to $\mu$, with $\int f_1*f_2\,d\mu=\int f_1\,d\mu\cdot\int f_2\,d\mu=\nu_1\mathbb{R}^r\cdot\nu_2\mathbb{R}^r<\infty$, and of course $f_1*f_2$ is non-negative. If $E\subseteq\mathbb{R}^r$ is a Borel set,
$$\int_E f_1*f_2\,d\mu=\iint\chi E(x+y)f_1(x)f_2(y)\,\mu(dx)\mu(dy)$$
(255G)
$$=\iint\chi E(x+y)f_2(y)\,\nu_1(dx)\mu(dy)$$
(because $x\mapsto\chi E(x+y)$ is Borel measurable)
$$=\iint\chi E(x+y)\,\nu_1(dx)\nu_2(dy)$$
(because $(x,y)\mapsto\chi E(x+y)$ is Borel measurable, so $y\mapsto\int\chi E(x+y)\,\nu_1(dx)$ is $\nu_2$-integrable)
$$=(\nu_1*\nu_2)(E).$$
So $f_1*f_2$ is a Radon-Nikodým derivative of $\nu_1*\nu_2$ with respect to $\mu$, by 256J.
257X Basic exercises >(a) Let $r\ge1$ be an integer. Let $\delta_0$ be the Dirac measure on $\mathbb{R}^r$ concentrated at $0$. Show that $\delta_0$ is a Radon probability measure on $\mathbb{R}^r$ and that $\delta_0*\nu=\nu$ for every totally finite Radon measure $\nu$ on $\mathbb{R}^r$.
(b) Let $\mu$ and $\nu$ be totally finite Radon measures on $\mathbb{R}^r$, and $E$ any set measured by their convolution $\mu*\nu$. Show that $\int\mu(E-y)\,\nu(dy)$ is defined in $[0,\infty]$ and equal to $(\mu*\nu)(E)$.
(c) Let $\nu_1,\ldots,\nu_n$ be totally finite Radon measures on $\mathbb{R}^r$, and let $\nu$ be the convolution $\nu_1*\ldots*\nu_n$ (using 257E to see that such a bracketless expression is legitimate). Show that
$$\int h(x)\,\nu(dx)=\int\ldots\int h(x_1+\ldots+x_n)\,\nu_1(dx_1)\ldots\nu_n(dx_n)$$
for every $\nu$-integrable function $h$.
(d) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, with supports $F_1$, $F_2$ (256Xf). Show that the support of $\nu_1*\nu_2$ is $\overline{\{x+y:x\in F_1,\ y\in F_2\}}$.
>(e) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, and suppose that $\nu_1$ has a Radon-Nikodým derivative $f$ with respect to Lebesgue measure $\mu$. Show that $\nu_1*\nu_2$ has a Radon-Nikodým derivative $g$, where $g(x)=\int f(x-y)\,\nu_2(dy)$ for $\mu$-almost every $x\in\mathbb{R}^r$.
(f) Suppose that $\nu_1$, $\nu_2$, $\nu'_1$ and $\nu'_2$ are totally finite Radon measures on $\mathbb{R}^r$, and that $\nu'_1$, $\nu'_2$ are absolutely continuous with respect to $\nu_1$, $\nu_2$ respectively. Show that $\nu'_1*\nu'_2$ is absolutely continuous with respect to $\nu_1*\nu_2$.
257Y Further exercises (a) Let $M$ be the space of countably additive functionals defined on the algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}$, with its norm $\|\nu\|=|\nu|(\mathbb{R})$ (see 231Yh). (i) Show that we have a unique bilinear operator $*:M\times M\to M$ such that $(\nu_1\restriction\mathcal{B})*(\nu_2\restriction\mathcal{B})=(\nu_1*\nu_2)\restriction\mathcal{B}$ for all totally finite Radon measures $\nu_1$, $\nu_2$ on $\mathbb{R}$. (ii) Show that $*$ is commutative and associative. (iii) Show that $\|\nu_1*\nu_2\|\le\|\nu_1\|\|\nu_2\|$ for all $\nu_1,\nu_2\in M$, so that $M$ is a Banach algebra under this multiplication. (iv) Show that $M$ has a multiplicative identity. (v) Show that $L^1(\mu)$ can be regarded as a closed subalgebra of $M$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$ (cf. 255Xi).
(b) Let us say that a Radon measure on $]-\pi,\pi]$ is a measure $\nu$, with domain $\Sigma$, on $]-\pi,\pi]$ such that (i) every Borel subset of $]-\pi,\pi]$ belongs to $\Sigma$ (ii) for every $E\in\Sigma$ there are Borel sets $E_1$, $E_2$ such that $E_1\subseteq E\subseteq E_2$ and $\nu(E_2\setminus E_1)=0$ (iii) every compact subset of $]-\pi,\pi]$ has finite measure. Show that for any two totally finite Radon measures $\nu_1$, $\nu_2$ on $]-\pi,\pi]$ there is a unique totally finite Radon measure $\nu$ on $]-\pi,\pi]$ such that
$$\int h(x)\,\nu(dx)=\int h(x+_{2\pi}y)\,\nu_1(dx)\nu_2(dy)$$
for every $\nu$-integrable function $h$, where $+_{2\pi}$ is defined as in 255Ma.
257 Notes and comments Of course convolution of functions and convolution of measures are very closely connected; the obvious link being 257F, but the correspondence between 255G and 257B is also very marked. In effect, they give us the same notion of convolution $u*v$ when $u$, $v$ are positive members of $L^1$ and $u*v$ is interpreted in $L^1$ rather than as a function (257Ya). But we should have to go rather deeper than the arguments here to find ideas in the theory of convolution of measures to correspond to such results as 255K. I will return to questions of this type in §444 in Volume 4.
All the theorems of this section can be extended to general abelian locally compact Hausdorff topological groups; but for such generality we need much more advanced ideas (see §444), and for the moment I leave only the suggestion in 257Yb that you should try to adapt the ideas here to $]-\pi,\pi]$ or $S^1$.
Chapter 26
Change of Variable in the Integral
I suppose most courses on basic calculus still devote a substantial amount of time to practice in the techniques of integrating standard functions. Surely the most powerful single technique is that of substitution: replacing $\int g(y)\,dy$ by $\int g(\phi(x))\phi'(x)\,dx$ for an appropriate function $\phi$. At this level one usually concentrates on the skills of guessing at appropriate $\phi$ and getting the formulae right. I will not address such questions here, except for rare special cases; in this book I am concerned rather with validating the process. For functions of one variable, it can usually be justified by an appeal to the Fundamental Theorem of Calculus, and for any particular case I would normally go first to §225 in the hope that the results there would cover it. But for functions of two or more variables some much deeper ideas are necessary.
I have already treated the general problem of integration-by-substitution in abstract measure spaces in §235. There I described conditions under which $\int g(y)\,dy=\int g(\phi(x))J(x)\,dx$ for an appropriate function $J$. The context there gave very little scope for suggestions as to how to compute $J$; at best, it could be presented as a Radon-Nikodým derivative (235M). In this chapter I give a form of the fundamental theorem for the case of Lebesgue measure, in which $\phi$ is a more or less differentiable function between Euclidean spaces, and $J$ is a 'Jacobian', the modulus of the determinant of the derivative of $\phi$ (263D). This necessarily depends on a serious investigation of the relationship between Lebesgue measure and geometry. The first step is to establish a form of Vitali's theorem for $r$-dimensional space, together with $r$-dimensional density theorems; I do this in §261, following closely the scheme of §§221 and 223 above. We need to know quite a lot about differentiable functions between Euclidean spaces, and it turns out that the theory is intertwined with that of 'Lipschitz' functions; I treat these in §262.
In the next two sections of the chapter, I turn to a separate problem for which some of the same techniques turn out to be appropriate: the description of surface measure on (smooth) surfaces in Euclidean space, like the surface of a cone or sphere. I suppose there is no difficulty in forming a robust intuition as to what is meant by the 'area' of such a surface and of suitably simple regions within it, and there is a very strong presumption that there ought to be an expression for this intuition in terms of measure theory as presented in this book; but the details are not I think straightforward. The first point to note is that for any calculation of the area of a region $G$ in a surface $S$, one would always turn at once to a parametrization of the region, that is, a bijection $\phi:D\to G$ from some subset $D$ of Euclidean space. But obviously one needs to be sure that the result of the calculation is independent of the parametrization chosen, and while it would be possible to base the theory on results showing such independence directly, that does not seem to me to be a true reflection of the underlying intuition, which is that the area of simple surfaces, at least, is something intrinsic to their geometry. I therefore see no acceptable alternative to a theory of '$r$-dimensional measure' which can be described in purely geometric terms. This is the burden of §264, in which I give the definition and most fundamental properties of Hausdorff $r$-dimensional measure in Euclidean spaces. With this established, we find that the techniques of §§261-263 are sufficient to relate it to calculations through parametrizations, which is what I do in §265.
The chapter ends with a brief account of the Brunn-Minkowski inequality (266C), which is an essential tool for the geometric measure theory of convex sets.
261 Vitali's theorem in $\mathbb{R}^r$
The main aim of this section is to give $r$-dimensional versions of Vitali's theorem and Lebesgue's Density Theorem, following ideas already presented in §§221 and 223.
261A Notation For most of this chapter, we shall be dealing with the geometry and measure of Euclidean space; it will save space to fix some notation.
Throughout this section and the two following, $r\ge1$ will be an integer. I will use Roman letters for members of $\mathbb{R}^r$ and Greek letters for their coordinates, so that $a=(\alpha_1,\ldots,\alpha_r)$, etc.; if you see any Greek letter with a subscript you should look first for a nearby vector of which it might be a coordinate. The measure under consideration will nearly always be Lebesgue measure on $\mathbb{R}^r$; so unless otherwise indicated $\mu$ should be interpreted as Lebesgue measure, and $\mu^*$ as Lebesgue outer measure. Similarly, $\int\ldots dx$ will always be integration with respect to Lebesgue measure (in a dimension determined by the context).
For $x=(\xi_1,\ldots,\xi_r)\in\mathbb{R}^r$, write $\|x\|=\sqrt{\xi_1^2+\ldots+\xi_r^2}$. Recall that $\|x+y\|\le\|x\|+\|y\|$ (1A2C) and that $\|\alpha x\|=|\alpha|\|x\|$ for any vectors $x$, $y$ and scalar $\alpha$.
I will use the same notation as in §115 for 'intervals', so that, in particular,
$$[a,b[\;=\{x:\alpha_i\le\xi_i<\beta_i\ \forall\,i\le r\},$$
$$]a,b[\;=\{x:\alpha_i<\xi_i<\beta_i\ \forall\,i\le r\},$$
$$[a,b]=\{x:\alpha_i\le\xi_i\le\beta_i\ \forall\,i\le r\}$$
whenever $a,b\in\mathbb{R}^r$.
$\mathbf{0}=(0,\ldots,0)$ will be the zero vector in $\mathbb{R}^r$, and $\mathbf{1}$ will be $(1,\ldots,1)$. If $x\in\mathbb{R}^r$ and $\delta>0$, $B(x,\delta)$ will be the closed ball with centre $x$ and radius $\delta$, that is, $\{y:y\in\mathbb{R}^r,\ \|y-x\|\le\delta\}$. Note that $B(x,\delta)=x+B(\mathbf{0},\delta)$; so that by the translation-invariance of Lebesgue measure we have
$$\mu B(x,\delta)=\mu B(\mathbf{0},\delta)=\beta_r\delta^r,$$
where
$$\beta_r=\frac{1}{k!}\pi^k\text{ if }r=2k\text{ is even},\qquad\beta_r=\frac{2^{2k+1}k!}{(2k+1)!}\pi^k\text{ if }r=2k+1\text{ is odd}$$
(252Q).
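As a quick numerical check of the two-case formula for $\beta_r$, the following Python sketch evaluates it for small $r$ and compares it with the standard closed form $\pi^{r/2}/\Gamma(r/2+1)$. That closed form is not quoted in the text; it is brought in here only as an independent cross-check, and the function names are illustrative.

```python
import math


def beta(r):
    """Measure of the unit ball in R^r, using the even/odd formula of 261A."""
    k, odd = divmod(r, 2)
    if odd:
        # r = 2k + 1
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)
    # r = 2k
    return math.pi ** k / math.factorial(k)


def beta_gamma(r):
    """Same quantity via the Gamma function (assumed standard identity)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)


# beta(1), beta(2), beta(3) are the familiar 2, pi and 4*pi/3.
values = [beta(r) for r in range(1, 11)]
```

For instance `beta(1)` is the length of $[-1,1]$, `beta(2)` the area of the unit disk, and `beta(3)` the volume of the unit ball in three dimensions.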
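261B Vitali's theorem in $\mathbb{R}^r$ Let $A\subseteq\mathbb{R}^r$ be any set, and $\mathcal{I}$ a family of closed non-trivial (that is, non-singleton, or, equivalently, non-negligible) balls in $\mathbb{R}^r$ such that every point of $A$ is contained in arbitrarily small members of $\mathcal{I}$. Then there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A\setminus\bigcup\mathcal{I}_0)=0$.
proof (a) To begin with (down to the end of (f) below), suppose that $\|x\|<M$ for every $x\in A$, and set
$$\mathcal{I}'=\{I:I\in\mathcal{I},\ I\subseteq B(\mathbf{0},M)\}.$$
If there is a finite disjoint set $\mathcal{I}_0\subseteq\mathcal{I}'$ such that $A\subseteq\bigcup\mathcal{I}_0$ (including the possibility that $A=\mathcal{I}_0=\emptyset$), we can stop. So let us suppose henceforth that there is no such $\mathcal{I}_0$.
(b) In this case, if $\mathcal{I}_0$ is any finite disjoint subset of $\mathcal{I}'$, there is a $J\in\mathcal{I}'$ which is disjoint from any member of $\mathcal{I}_0$. $\mathbf{P}$ Take $x\in A\setminus\bigcup\mathcal{I}_0$. Because every member of $\mathcal{I}_0$ is closed, there is a $\delta>0$ such that $B(x,\delta)$ does not meet any member of $\mathcal{I}_0$, and as $\|x\|<M$ we can suppose that $B(x,\delta)\subseteq B(\mathbf{0},M)$. Let $J$ be a member of $\mathcal{I}$, containing $x$, and of diameter at most $\delta$; then $J\in\mathcal{I}'$ and $J\cap\bigcup\mathcal{I}_0=\emptyset$. $\mathbf{Q}$
(c) We can therefore choose a sequence $\langle\gamma_n\rangle_{n\in\mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}'$ inductively, as follows. Given $\langle I_j\rangle_{j<n}$ (if $n=0$, this is the empty sequence, with no members), with $I_j\in\mathcal{I}'$ for each $j<n$, and $I_j\cap I_k=\emptyset$ for $j<k<n$, set $\mathcal{J}_n=\{I:I\in\mathcal{I}',\ I\cap I_j=\emptyset$ for every $j<n\}$. We know from (b) that $\mathcal{J}_n\ne\emptyset$. Set
$$\gamma_n=\sup\{\operatorname{diam}I:I\in\mathcal{J}_n\};$$
then $\gamma_n\le2M$, because every member of $\mathcal{J}_n$ is included in $B(\mathbf{0},M)$. We can therefore find a set $I_n\in\mathcal{J}_n$ such that $\operatorname{diam}I_n\ge\frac12\gamma_n$, and this continues the induction.
(e) Because the $I_n$ are disjoint measurable subsets of the bounded set $B(\mathbf{0},M)$, we have $\sum_{n=0}^{\infty}\mu I_n\le\mu B(\mathbf{0},M)<\infty$, and $\lim_{n\to\infty}\mu I_n=0$. Also $\mu I_n\ge\beta_r(\frac14\gamma_n)^r$ for each $n$, so $\lim_{n\to\infty}\gamma_n=0$.
Now define $I'_n$ to be the closed ball with the same centre as $I_n$ but five times the diameter, so that it contains every point within a distance $\gamma_n$ of $I_n$. I claim that, for any $n$, $A\subseteq\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j$. $\mathbf{P}$ $\mathbf{?}$ Suppose, if possible, otherwise. Take any $x\in A\setminus(\bigcup_{j<n}I_j\cup\bigcup_{j\ge n}I'_j)$. Let $\delta>0$ be such that
$$B(x,\delta)\subseteq B(\mathbf{0},M)\setminus\bigcup_{j<n}I_j,$$
and let $J\in\mathcal{I}$ be such that $x\in J\subseteq B(x,\delta)$. Then
$$\lim_{m\to\infty}\gamma_m=0<\operatorname{diam}J$$
(this is where we use the hypothesis that all the balls in $\mathcal{I}$ are non-trivial); let $m$ be the least integer greater than or equal to $n$ such that $\gamma_m<\operatorname{diam}J$. In this case $J$ cannot belong to $\mathcal{J}_m$, so there must be some $k<m$ such that $J\cap I_k\ne\emptyset$, because certainly $J\in\mathcal{I}'$. By the choice of $\delta$, $k$ cannot be less than $n$, so $n\le k<m$, and $\gamma_k\ge\operatorname{diam}J$. So the distance from $x$ to the nearest point of $I_k$ is at most $\operatorname{diam}J\le\gamma_k$. But this means that $x\in I'_k$; which contradicts the choice of $x$. $\mathbf{X}$ $\mathbf{Q}$
(f) It follows that
$$\mu^*(A\setminus\bigcup_{j<n}I_j)\le\mu(\bigcup_{j\ge n}I'_j)\le\sum_{j=n}^{\infty}\mu I'_j\le5^r\sum_{j=n}^{\infty}\mu I_j.$$
As
$$\sum_{j=0}^{\infty}\mu I_j\le\mu B(\mathbf{0},M)<\infty,$$
$\lim_{n\to\infty}\mu^*(A\setminus\bigcup_{j<n}I_j)=0$ and
$$\mu(A\setminus\bigcup_{j\in\mathbb{N}}I_j)=\mu^*(A\setminus\bigcup_{j\in\mathbb{N}}I_j)=0.$$
Thus in this case we may set $\mathcal{I}_0=\{I_n:n\in\mathbb{N}\}$ to obtain a countable disjoint family in $\mathcal{I}$ with $\mu(A\setminus\bigcup\mathcal{I}_0)=0$.
(g) This completes the proof if $A$ is bounded. In general, set
$$U_n=\{x:x\in\mathbb{R}^r,\ n<\|x\|<n+1\},\quad A_n=A\cap U_n,\quad\mathcal{J}_n=\{I:I\in\mathcal{I},\ I\subseteq U_n\},$$
for each $n\in\mathbb{N}$. Then for each $n$ we see that every point of $A_n$ belongs to arbitrarily small members of $\mathcal{J}_n$, so there is a countable disjoint $\mathcal{J}'_n\subseteq\mathcal{J}_n$ such that $A_n\setminus\bigcup\mathcal{J}'_n$ is negligible. Now (because the $U_n$ are disjoint) $\mathcal{I}_0=\bigcup_{n\in\mathbb{N}}\mathcal{J}'_n$ is disjoint, and of course $\mathcal{I}_0$ is a countable subset of $\mathcal{I}$; moreover,
$$A\setminus\bigcup\mathcal{I}_0\subseteq(\mathbb{R}^r\setminus\bigcup_{n\in\mathbb{N}}U_n)\cup\bigcup_{n\in\mathbb{N}}(A_n\setminus\bigcup\mathcal{J}'_n)$$
is negligible. (To see that $\mathbb{R}^r\setminus\bigcup_{n\in\mathbb{N}}U_n=\{x:\|x\|\in\mathbb{N}\}$ is negligible, note that for any $n\in\mathbb{N}$ the set
$$\{x:\|x\|=n\}\subseteq B(\mathbf{0},n)\setminus B(\mathbf{0},\delta n)$$
has measure at most $\beta_rn^r-\beta_r(\delta n)^r$ for every $\delta\in[0,1[$, so must be negligible.)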
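The selection step in part (c) of the proof and the covering claim about the enlarged balls can be illustrated on a finite family of closed intervals in $\mathbb{R}$. The Python sketch below is a finite analogue only, under simplifying assumptions: the family is finite, so the supremum $\gamma_n$ is attained and the next ball can be taken to be a largest remaining one (which certainly has diameter at least $\frac12\gamma_n$). Balls are represented as hypothetical pairs `(centre, radius)`; none of the names come from the text.

```python
def greedy_disjoint(balls):
    """Pick pairwise disjoint closed intervals, largest radius first.

    balls: list of (centre, radius) with radius > 0, standing for the
    closed intervals [centre - radius, centre + radius].
    """
    chosen = []
    for c, r in sorted(balls, key=lambda ball: -ball[1]):
        # Closed intervals are disjoint iff the centres are more than
        # the sum of the radii apart.
        if all(abs(c - c0) > r + r0 for c0, r0 in chosen):
            chosen.append((c, r))
    return chosen


def covered_by_enlargements(balls, chosen, factor=5):
    """Check that every ball lies inside some chosen ball enlarged by `factor`."""
    return all(
        any(abs(c - c0) + r <= factor * r0 for c0, r0 in chosen)
        for c, r in balls
    )


# Hypothetical example family.
family = [(0.0, 1.0), (1.5, 1.0), (3.0, 0.5), (10.0, 2.0), (11.0, 0.25)]
selected = greedy_disjoint(family)
ok = covered_by_enlargements(family, selected)
```

In this finite setting a factor of 3 would already suffice, since each rejected interval meets a selected one of at least its own radius; the factor 5 in the proof allows for selected balls whose diameter is only half the current supremum.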
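261C Just as in §223, we can use the $r$-dimensional Vitali theorem to prove theorems on the approximation of functions by their local mean values.
Density Theorem in $\mathbb{R}^r$: integral form Let $D$ be a subset of $\mathbb{R}^r$, and $f$ a real-valued function which is integrable over $D$. Then
$$f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}f\,d\mu$$
for almost every $x\in D$.
proof (a) To begin with (down to the end of (b)), let us suppose that $D=\operatorname{dom}f=\mathbb{R}^r$.
Take $n\in\mathbb{N}$ and $q,q'\in\mathbb{Q}$ with $q<q'$, and set
$$A=A_{nqq'}=\{x:\|x\|\le n,\ f(x)\le q,\ \limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f\,d\mu>q'\}.$$
$\mathbf{?}$ Suppose, if possible, that $\mu^*A>0$. Let $\epsilon>0$ be such that $\epsilon(1+|q|)<(q'-q)\mu^*A$, and let $\eta\in\,]0,\epsilon]$ be such that $\int_E|f|\le\epsilon$ whenever $\mu E\le\eta$ (225A). Let $G\supseteq A$ be an open set of measure at most $\mu^*A+\eta$ (134Fa). Let $\mathcal{I}$ be the set of non-trivial closed balls $B\subseteq G$ such that $\frac{1}{\mu B}\int_Bf\,d\mu\ge q'$. Then every point of $A$ is contained in (indeed, is the centre of) arbitrarily small members of $\mathcal{I}$. So there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A\setminus\bigcup\mathcal{I}_0)=0$, by 261B; set $H=\bigcup\mathcal{I}_0$.
Because $\int_If\,d\mu\ge q'\mu I$ for each $I\in\mathcal{I}_0$, we have
$$\int_Hf\,d\mu=\sum_{I\in\mathcal{I}_0}\int_If\,d\mu\ge q'\sum_{I\in\mathcal{I}_0}\mu I=q'\mu H\ge q'\mu^*A.$$
Set
$$E=\{x:x\in G,\ f(x)\le q\}.$$
Then $E$ is measurable, and $A\subseteq E\subseteq G$; so
$$\mu^*A\le\mu E\le\mu G\le\mu^*A+\eta\le\mu^*A+\epsilon.$$
Also
$$\mu(H\setminus E)\le\mu G-\mu E\le\eta,$$
so by the choice of $\eta$, $\int_{H\setminus E}f\le\epsilon$ and
$$\int_Hf\le\epsilon+\int_{H\cap E}f\le\epsilon+q\mu(H\cap E)$$
$$\le\epsilon+q\mu^*A+|q|(\mu(H\cap E)-\mu^*A)\le q\mu^*A+\epsilon(1+|q|)$$
(because $\mu^*A=\mu^*(A\cap H)\le\mu(H\cap E)$)
$$<q'\mu^*A\le\int_Hf,$$
which is impossible. $\mathbf{X}$
Thus $A_{nqq'}$ is negligible. This is true for all $q<q'$ and all $n$, so
$$A^*=\bigcup_{q,q'\in\mathbb{Q},\,q<q'}\ \bigcup_{n\in\mathbb{N}}A_{nqq'}$$
is negligible. But
$$f(x)\ge\limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for every $x\in\mathbb{R}^r\setminus A^*$, that is, for almost all $x\in\mathbb{R}^r$.
(b) Similarly, or applying this result to $-f$,
$$f(x)\le\liminf_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for almost every $x$, so
$$f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}f$$
for almost every $x$.
(c) For the (superficially) more general case enunciated in the theorem, let $\tilde f$ be a $\mu$-integrable function extending $f\restriction D$, defined everywhere on $\mathbb{R}^r$, and such that $\int_F\tilde f=\int_{D\cap F}f$ for every measurable $F\subseteq\mathbb{R}^r$ (applying 214Eb to $f\restriction D$). Then
$$f(x)=\tilde f(x)=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}\tilde f=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}f$$
for almost every $x\in D$.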
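261D Corollary (a) If $D\subseteq\mathbb{R}^r$ is any set, then
$$\lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1$$
for almost every $x\in D$.
(b) If $E\subseteq\mathbb{R}^r$ is a measurable set, then
$$\lim_{\delta\downarrow0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)}=\chi E(x)$$
for almost every $x\in\mathbb{R}^r$.
(c) If $D\subseteq\mathbb{R}^r$ and $f:D\to\mathbb{R}$ is any function, then for almost every $x\in D$,
$$\lim_{\delta\downarrow0}\frac{\mu^*(\{y:y\in D,\ |f(y)-f(x)|\le\epsilon\}\cap B(x,\delta))}{\mu B(x,\delta)}=1$$
for every $\epsilon>0$.
(d) If $D\subseteq\mathbb{R}^r$ and $f:D\to\mathbb{R}$ is measurable, then for almost every $x\in D$,
$$\lim_{\delta\downarrow0}\frac{\mu^*(\{y:y\in D,\ |f(y)-f(x)|\ge\epsilon\}\cap B(x,\delta))}{\mu B(x,\delta)}=0$$
for every $\epsilon>0$.
proof (a) Apply 261C with $f=\chi B(\mathbf{0},n)$ to see that, for any $n\in\mathbb{N}$,
$$\lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1$$
for almost every $x\in D$ with $\|x\|<n$.
(b) Apply (a) to $E$ to see that
$$\liminf_{\delta\downarrow0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)}\ge\chi E(x)$$
for almost every $x\in\mathbb{R}^r$, and to $E'=\mathbb{R}^r\setminus E$ to see that
$$\limsup_{\delta\downarrow0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)}=1-\liminf_{\delta\downarrow0}\frac{\mu(E'\cap B(x,\delta))}{\mu B(x,\delta)}\le1-\chi E'(x)=\chi E(x)$$
for almost every $x$.
(c) For $q,q'\in\mathbb{Q}$, set
$$D_{qq'}=\{x:x\in D,\ q\le f(x)\le q'\},$$
$$C_{qq'}=\{x:x\in D_{qq'},\ \lim_{\delta\downarrow0}\frac{\mu^*(D_{qq'}\cap B(x,\delta))}{\mu B(x,\delta)}=1\};$$
now set
$$C=D\setminus\bigcup_{q,q'\in\mathbb{Q}}(D_{qq'}\setminus C_{qq'}),$$
so that $D\setminus C$ is negligible. If $x\in C$ and $\epsilon>0$, then there are $q,q'\in\mathbb{Q}$ such that $f(x)-\epsilon\le q\le f(x)\le q'\le f(x)+\epsilon$, and now
$$\liminf_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon\}}{\mu B(x,\delta)}\ge\liminf_{\delta\downarrow0}\frac{\mu^*(D_{qq'}\cap B(x,\delta))}{\mu B(x,\delta)}=1,$$
so
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon\}}{\mu B(x,\delta)}=1.$$
(d) Define $C$ as in (c). We know from (a) that $\mu(D\setminus C')=0$, where
$$C'=\{x:x\in D,\ \lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1\}.$$
If $x\in C\cap C'$ and $\epsilon>0$, we know from (c) that
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon/2\}}{\mu B(x,\delta)}=1.$$
But because $f$ is measurable, we have
$$\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}+\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\tfrac12\epsilon\}\le\mu^*(D\cap B(x,\delta))$$
for every $\delta>0$. Accordingly
$$\limsup_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)}\le\lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}-\lim_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon/2\}}{\mu B(x,\delta)}=0,$$
and
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y:y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)}=0$$
for every $x\in C\cap C'$, that is, for almost every $x\in D$.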
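Part (b) of the corollary can be seen concretely in dimension $r=1$ for an interval, where the density ratio is computable exactly. The Python sketch below assumes $E=[a,b]\subseteq\mathbb{R}$, so that $B(x,\delta)=[x-\delta,x+\delta]$ and $\mu B(x,\delta)=2\delta$; the function name and the default endpoints are illustrative choices.

```python
def density_ratio(x, delta, a=0.0, b=1.0):
    """mu(E ∩ B(x, delta)) / mu(B(x, delta)) for E = [a, b] in R."""
    # Length of the overlap of [a, b] with [x - delta, x + delta].
    overlap = max(0.0, min(b, x + delta) - max(a, x - delta))
    return overlap / (2 * delta)


# Ratios at an interior point, an exterior point and an endpoint,
# for a small radius delta.
inside = density_ratio(0.5, 1e-3)
outside = density_ratio(2.0, 1e-3)
endpoint = density_ratio(0.0, 1e-3)
```

At interior points the ratio is $1=\chi E(x)$ and at exterior points it is $0=\chi E(x)$ once $\delta$ is small. At the two endpoints the ratio stays at $\frac12$, which differs from $\chi E(x)=1$; this is consistent with the statement because the set of endpoints is negligible.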


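261E Theorem Let $f$ be a locally integrable function defined on a conegligible subset of $\mathbb{R}^r$. Then
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy=0$$
for almost every $x\in\mathbb{R}^r$.
proof (Compare 223D.)
(a) Fix $n\in\mathbb{N}$ for the moment, and set $G=\{x:\|x\|<n\}$. For each $q\in\mathbb{Q}$, set $g_q(x)=|f(x)-q|$ for $x\in G\cap\operatorname{dom}f$; then $g_q$ is integrable over $G$, and
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}g_q=g_q(x)$$
for almost every $x\in G$, by 261C. Setting
$$E_q=\{x:x\in G\cap\operatorname{dom}f,\ \lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}g_q=g_q(x)\},$$
we have $G\setminus E_q$ negligible for every $q$, so $G\setminus E$ is negligible, where $E=\bigcap_{q\in\mathbb{Q}}E_q$. Now
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy=0$$
for every $x\in E$. $\mathbf{P}$ Take $x\in E$ and $\epsilon>0$. Then there is a $q\in\mathbb{Q}$ such that $|f(x)-q|\le\epsilon$, so that
$$|f(y)-f(x)|\le|f(y)-q|+\epsilon=g_q(y)+\epsilon$$
for every $y\in G\cap\operatorname{dom}f$, and
$$\limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy\le\limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}(g_q(y)+\epsilon)\,dy=\epsilon+g_q(x)\le2\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy=0,$$
as required. $\mathbf{Q}$
(b) Because $G$ is open,
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy=\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{G\cap B(x,\delta)}|f(y)-f(x)|\,dy=0$$
for almost every $x\in G$. As $n$ is arbitrary,
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy=0$$
for almost every $x\in\mathbb{R}^r$.
Remark The set
$$\{x:x\in\operatorname{dom}f,\ \lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{B(x,\delta)}|f(y)-f(x)|\,dy=0\}$$
is sometimes called the Lebesgue set of $f$.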
261F Another very useful consequence of 261B is the following.
Proposition Let A R
r
be any set, and > 0. Then there is a sequence B
n

nN
of closed balls in R
r
, all of radius at
most , such that A

nN
B
n
and

n=0
B
n

A + . Moreover, we may suppose that the balls in the sequence


whose centres do not lie in A have measures summing to at most .
proof (a) Set $\beta_r=\mu B(\mathbf{0},1)$. The first step is the obvious remark that if $x\in\mathbb{R}^r$, $\delta>0$ then the half-open cube $I=[x,x+\delta\mathbf{1}[$ is a subset of the ball $B(x,\delta\sqrt{r})$, which has measure $\gamma_r\delta^r=\gamma_r\mu I$, where $\gamma_r=\beta_r r^{r/2}$. It follows that if $G\subseteq\mathbb{R}^r$ is any open set, then $G$ can be covered by a sequence of balls of total measure at most $\gamma_r\mu G$. PPP If $G$ is empty, we can take all the balls to be singletons. Otherwise, for each $k\in\mathbb{N}$, set
$$Q_k=\{z:z\in\mathbb{Z}^r,\ [2^{-k}z,2^{-k}(z+\mathbf{1})[\ \subseteq G\},$$
$$E_k=\bigcup_{z\in Q_k}[2^{-k}z,2^{-k}(z+\mathbf{1})[.$$
Then $\langle E_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $G$, and $E_0$ and each of the differences $E_{k+1}\setminus E_k$ is expressible as a disjoint union of half-open cubes. Thus $G$ also is expressible as a disjoint union of a sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ of half-open cubes. Each $I_n$ is covered by a ball $B_n$ of measure $\gamma_r\mu I_n$; so that $G\subseteq\bigcup_{n\in\mathbb{N}}B_n$ and
$$\sum_{n=0}^{\infty}\mu B_n\le\gamma_r\sum_{n=0}^{\infty}\mu I_n=\gamma_r\mu G.$$
QQQ
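The constant $\gamma_r=\beta_r r^{r/2}$ can be evaluated explicitly. The sketch below assumes the standard closed formula $\beta_r=\pi^{r/2}/\Gamma(\tfrac r2+1)$ for the measure of the unit ball, which is not derived in this section; the function names are hypothetical and chosen only for this illustration.

```python
import math


def unit_ball_volume(r):
    """beta_r: Lebesgue measure of the unit ball in R^r, using the
    standard formula pi^(r/2) / Gamma(r/2 + 1) (an assumption here)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)


def cover_constant(r):
    """gamma_r = beta_r * r^(r/2): the measure of B(x, delta*sqrt(r))
    divided by the measure delta^r of the cube [x, x + delta*1[."""
    return unit_ball_volume(r) * r ** (r / 2)


def farthest_corner_distance(r, delta):
    """Distance from x to the corner x + delta*1, the point of the closed
    cube farthest from x; it equals delta*sqrt(r), the radius of the ball."""
    return math.sqrt(r * delta * delta)


if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        print(r, unit_ball_volume(r), cover_constant(r))
```

In one dimension $\gamma_1=2$, and in the plane $\gamma_2=2\pi$: covering a square by the disk through its far corner costs a factor of about $6.28$ in measure, which is why part (b) of the proof asks for an open set of measure at most $\epsilon/\gamma_r$.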
(b) It follows at once that if $\mu A=0$ then for any $\epsilon>0$ there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls covering $A$ of measures summing to at most $\epsilon$, because there is certainly an open set including $A$ with measure at most $\epsilon/\gamma_r$.
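(c) Now take any set $A$, and $\epsilon>0$. Let $G\supseteq A$ be an open set with $\mu G\le\mu^*A+\tfrac12\epsilon$. Let $\mathcal{I}$ be the family of non-trivial closed balls included in $G$, of radius at most $\epsilon$ and with centres in $A$. Then every point of $A$ belongs to arbitrarily small members of $\mathcal{I}$, so there is a countable disjoint $\mathcal{I}_0\subseteq\mathcal{I}$ such that $\mu(A\setminus\bigcup\mathcal{I}_0)=0$. Let $\langle B'_n\rangle_{n\in\mathbb{N}}$ be a sequence of balls covering $A\setminus\bigcup\mathcal{I}_0$ with $\sum_{n=0}^{\infty}\mu B'_n\le\min(\tfrac12\epsilon,\beta_r\epsilon^r)$; these surely all have radius at most $\epsilon$. Let $\langle B_n\rangle_{n\in\mathbb{N}}$ be a sequence amalgamating $\mathcal{I}_0$ with $\langle B'_n\rangle_{n\in\mathbb{N}}$; then $A\subseteq\bigcup_{n\in\mathbb{N}}B_n$, every $B_n$ has radius at most $\epsilon$ and
$$\sum_{n=0}^{\infty}\mu B_n=\sum_{B\in\mathcal{I}_0}\mu B+\sum_{n=0}^{\infty}\mu B'_n\le\mu G+\tfrac12\epsilon\le\mu^*A+\epsilon,$$
while the $B_n$ whose centres do not lie in $A$ must come from the sequence $\langle B'_n\rangle_{n\in\mathbb{N}}$, so their measures sum to at most $\tfrac12\epsilon$.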
Remark In fact we can (if $A$ is not empty) arrange that the centre of every $B_n$ belongs to $A$. This is an easy consequence of Besicovitch's Covering Lemma (see §472 in Volume 4).
261X Basic exercises (a) Show that 261C is valid for any locally integrable real-valued function $f$; in particular, for any $f\in\mathcal{L}^p(\mu_D)$ for any $p\ge 1$, writing $\mu_D$ for the subspace measure on $D$.
(b) Show that 261C, 261Dc, 261Dd and 261E are valid for complex-valued functions $f$.
>>>(c) Take three disks in the plane, each touching the other two, so that they enclose an open region $R$ with three cusps. In $R$ let $D$ be a disk tangent to each of the three original disks, and $R_0$, $R_1$, $R_2$ the three components of $R\setminus D$. In each $R_j$ let $D_j$ be a disk tangent to each of the disks bounding $R_j$, and $R_{j0}$, $R_{j1}$, $R_{j2}$ the three components of $R_j\setminus D_j$. Continue, obtaining 27 regions at the next step, 81 regions at the next, and so on.
Show that the total area of the residual regions converges to zero as the process continues indefinitely. (Hint: compare with the process in the proof of 261B.)
261Y Further exercises (a) Formulate an abstract definition of 'Vitali cover', meaning a family of sets satisfying the conclusion of 261B in some sense, and corresponding generalizations of 261C-261E, covering (at least) (b)-(d) below.
(b) For $x\in\mathbb{R}^r$, $k\in\mathbb{N}$ let $C(x,k)$ be the half-open cube of the form $[2^{-k}z,2^{-k}(z+\mathbf{1})[$, with $z\in\mathbb{Z}^r$, containing $x$. Show that if $f$ is an integrable function on $\mathbb{R}^r$ then
$$\lim_{k\to\infty}2^{kr}\int_{C(x,k)}f=f(x)$$
for almost every $x\in\mathbb{R}^r$.
(c) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$. Show that
$$\lim_{\delta\downarrow 0}\frac{1}{\delta^r}\int_{[x,x+\delta\mathbf{1}[}f=f(x)$$
for almost every $x\in\mathbb{R}^r$.
(d) Give $X=\{0,1\}^{\mathbb{N}}$ its usual measure $\nu$ (254J). For $x\in X$, $k\in\mathbb{N}$ set $C(x,k)=\{y:y\in X,\ y(i)=x(i)\text{ for }i<k\}$. Show that if $f$ is any real-valued function which is integrable over $X$ then $\lim_{k\to\infty}2^k\int_{C(x,k)}f\,d\nu=f(x)$, $\lim_{k\to\infty}2^k\int_{C(x,k)}|f(y)-f(x)|\,\nu(dy)=0$ for almost every $x\in X$.
(e) Let $f$ be a real-valued function which is integrable over $\mathbb{R}^r$, and $x$ a point in the Lebesgue set of $f$. Show that for every $\epsilon>0$ there is a $\delta>0$ such that $|f(x)-\int f(x-y)g(\|y\|)\,dy|\le\epsilon$ whenever $g:[0,\infty[\to[0,\infty[$ is a non-increasing function such that $\int_{\mathbb{R}^r}g(\|y\|)\,dy=1$ and $\int_{B(\mathbf{0},\delta)}g(\|y\|)\,dy\ge 1-\delta$. (Hint: 223Yg.)
(f) Let $\mathfrak{T}$ be the family of those measurable sets $G\subseteq\mathbb{R}^r$ such that $\lim_{\delta\downarrow 0}\frac{\mu(G\cap B(x,\delta))}{\mu B(x,\delta)}=1$ for every $x\in G$. Show that $\mathfrak{T}$ is a topology on $\mathbb{R}^r$, the density topology of $\mathbb{R}^r$. Show that a function $f:\mathbb{R}^r\to\mathbb{R}$ is measurable iff it is $\mathfrak{T}$-continuous at almost every point of $\mathbb{R}^r$.
(g) A set $A\subseteq\mathbb{R}^r$ is said to be porous at $x\in\mathbb{R}^r$ if $\limsup_{y\to x}\frac{\rho(y,A)}{\|y-x\|}>0$, writing $\rho(y,A)=\inf_{z\in A}\|y-z\|$ (or $\infty$ if $A$ is empty). (i) Show that if $A$ is porous at all its points then it is negligible. (ii) Show that in the construction of 261A the residual set $A\setminus\bigcup\mathcal{I}_0$ is always porous.
(h) Let $A\subseteq\mathbb{R}^r$ be a bounded set and $\mathcal{I}$ a non-empty family of non-trivial closed balls covering $A$. Show that for any $\epsilon>0$ there are disjoint $B_0,\dots,B_n\in\mathcal{I}$ such that $\mu^*A\le(3+\epsilon)^r\sum_{k=0}^{n}\mu B_k$.
(i) Let $(X,\rho)$ be a metric space and $A\subseteq X$ any set, $x\mapsto\delta_x:A\to[0,\infty[$ any bounded function. Show that if $\gamma>3$ then there is an $A'\subseteq A$ such that (i) $\rho(x,y)>\delta_x+\delta_y$ for all distinct $x$, $y\in A'$ (ii) $\bigcup_{x\in A}B(x,\delta_x)\subseteq\bigcup_{x\in A'}B(x,\gamma\delta_x)$, writing $B(x,\alpha)$ for the closed ball $\{y:\rho(y,x)\le\alpha\}$.
(j) Show that any union of non-trivial closed balls in $\mathbb{R}^r$ is Lebesgue measurable. (Hint: induce on $r$. Compare 415Ye in Volume 4.)
(k) Suppose that $A\subseteq\mathbb{R}^r$ and that $\mathcal{I}$ is a family of closed subsets of $\mathbb{R}^r$ such that
for every $x\in A$ there is an $\eta>0$ such that for every $\delta>0$ there is an $I\in\mathcal{I}$ such that $x\in I$ and $0<\eta(\operatorname{diam}I)^r\le\mu I\le\delta$.
Show that there is a countable disjoint set $\mathcal{I}_0\subseteq\mathcal{I}$ such that $A\setminus\bigcup\mathcal{I}_0$ is negligible.
(l) Let $\mathfrak{T}'$ be the family of measurable sets $G\subseteq\mathbb{R}^r$ such that whenever $x\in G$ and $\epsilon>0$ there is a $\delta>0$ such that $\mu(G\cap I)\ge(1-\epsilon)\mu I$ whenever $I$ is an interval containing $x$ and included in $B(x,\delta)$. Show that $\mathfrak{T}'$ is a topology on $\mathbb{R}^r$ intermediate between the density topology (261Yf) and the Euclidean topology.
261 Notes and comments In the proofs of 261B-261E above, I have done my best to follow the lines of the one-dimensional case; this section amounts to a series of generalizations of the work of §§221 and 223.
It will be clear that the idea of 261A/261B can be used on other shapes than balls. To make it work in the form above, we need a family $\mathcal{I}$ such that there is a constant $K$ for which
$$\mu I'\le K\mu I$$
for every $I\in\mathcal{I}$, where we write
$$I'=\{x:\inf_{y\in I}\|x-y\|\le\operatorname{diam}(I)\}.$$
Evidently this will be true for many classes $\mathcal{I}$ determined by the shapes of the sets involved; for instance, if $E\subseteq\mathbb{R}^r$ is any bounded set of strictly positive measure, the family $\mathcal{I}=\{x+\delta E:x\in\mathbb{R}^r,\ \delta>0\}$ will satisfy the condition.
In 261Ya I challenge you to find an appropriate generalization of the arguments depending on the conclusion of 261B.
Another way of using 261B is to say that because sets can be essentially covered by disjoint sequences of balls, it ought to be possible to use balls, rather than half-open intervals, in the definition of Lebesgue measure on $\mathbb{R}^r$. This is indeed so (261F). The difficulty in using balls in the basic definition comes right at the start, in proving that if a ball is covered by finitely many balls then the sum of the volumes of the covering balls is at least the volume of the covered ball. (There is a trick, using the compactness of closed balls and the openness of open balls, to extend such a proof to infinite covers.) Of course you could regard this fact as 'elementary', on the ground that Archimedes would have noticed if it weren't true, but nevertheless it would be something of a challenge to prove it, unless you were willing to wait for a version of Fubini's theorem, as some authors do.
I have given the results in 261C-261D for arbitrary subsets $D$ of $\mathbb{R}^r$ not because I have any applications in mind in which non-measurable subsets are significant, but because I wish to make it possible to notice when measurability matters. Of course it is necessary to interpret the integrals $\int_D f\,d\mu$ in the way laid down in §214. The game is given away in part (c) of the proof of 261C, where I rely on the fact that if $f$ is integrable over $D$ then there is an integrable $\tilde f:\mathbb{R}^r\to\mathbb{R}$ such that $\int_F\tilde f=\int_{D\cap F}f$ for every measurable $F\subseteq\mathbb{R}^r$. In effect, for all the questions dealt with here, we can replace $f$, $D$ by $\tilde f$, $\mathbb{R}^r$.
The idea of 261C is that, for almost every $x$, $f(x)$ is approximated by its mean value on small balls $B(x,\delta)$, ignoring the missing values on $B(x,\delta)\setminus(D\cap\operatorname{dom}f)$; 261E is a sharper version of the same idea. The formulae of 261C-261E mostly involve the expression $\mu B(x,\delta)$. Of course this is just $\beta_r\delta^r$. But I think that leaving it unexpanded is actually more illuminating, as well as avoiding sub- and superscripts, since it makes it clearer what these density theorems are really about. In §472 of Volume 4 I will revisit this material, showing that a surprisingly large proportion of the ideas can be applied to arbitrary Radon measures on $\mathbb{R}^r$, even though Vitali's theorem (in the form stated here) is no longer valid.
262 Lipschitz and differentiable functions
In preparation for the main work of this chapter in §263, I devote a section to two important classes of functions between Euclidean spaces. What we really need is the essentially elementary material down to 262I, together with the technical lemma 262M and its corollaries. Theorem 262Q is not relied on in this volume, though I believe that it makes the patterns which will develop more natural and comprehensible.
262A Lipschitz functions Suppose that $r$, $s\ge 1$ and $\phi:D\to\mathbb{R}^s$ is a function, where $D\subseteq\mathbb{R}^r$. We say that $\phi$ is $\gamma$-Lipschitz, where $\gamma\in[0,\infty[$, if
$$\|\phi(x)-\phi(y)\|\le\gamma\|x-y\|$$
for all $x$, $y\in D$, writing $\|x\|=\sqrt{\xi_1^2+\dots+\xi_r^2}$ if $x=(\xi_1,\dots,\xi_r)\in\mathbb{R}^r$, $\|z\|=\sqrt{\zeta_1^2+\dots+\zeta_s^2}$ if $z=(\zeta_1,\dots,\zeta_s)\in\mathbb{R}^s$. In this case, $\gamma$ is a Lipschitz constant for $\phi$.
A Lipschitz function is a function $\phi$ which is $\gamma$-Lipschitz for some $\gamma\ge 0$. Note that in this case $\phi$ has a least Lipschitz constant (since if $A$ is the set of Lipschitz constants for $\phi$, and $\gamma_0=\inf A$, then $\gamma_0$ is a Lipschitz constant for $\phi$).
262B We need the following easy facts.
Lemma Let $D\subseteq\mathbb{R}^r$ be a set and $\phi:D\to\mathbb{R}^s$ a function.
(a) $\phi$ is Lipschitz iff $\phi_i:D\to\mathbb{R}$ is Lipschitz for every $i$, writing $\phi(x)=(\phi_1(x),\dots,\phi_s(x))$ for every $x\in D=\operatorname{dom}\phi\subseteq\mathbb{R}^r$.
(b) In this case, there is a Lipschitz function $\tilde\phi:\mathbb{R}^r\to\mathbb{R}^s$ extending $\phi$.
(c) If $r=s=1$ and $D=[a,b]$ is an interval, then $\phi$ is Lipschitz iff it is absolutely continuous and has a bounded derivative.
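proof (a) For any $x$, $y\in D$ and $i\le s$,
$$|\phi_i(x)-\phi_i(y)|\le\|\phi(x)-\phi(y)\|\le\sqrt{s}\,\sup_{j\le s}|\phi_j(x)-\phi_j(y)|,$$
so any Lipschitz constant for $\phi$ will be a Lipschitz constant for every $\phi_i$, and if $\gamma_j$ is a Lipschitz constant for $\phi_j$ for each $j$, then $\sqrt{s}\,\sup_{j\le s}\gamma_j$ will be a Lipschitz constant for $\phi$.
(b) By (a), it is enough to consider the case $s=1$, for if every $\phi_i$ has a Lipschitz extension $\tilde\phi_i$, we can set $\tilde\phi(x)=(\tilde\phi_1(x),\dots,\tilde\phi_s(x))$ for every $x$ to obtain a Lipschitz extension of $\phi$. Taking $s=1$, then, note that the case $D=\emptyset$ is trivial; so suppose that $D\ne\emptyset$. Let $\gamma$ be a Lipschitz constant for $\phi$, and write
$$\tilde\phi(z)=\sup_{y\in D}\phi(y)-\gamma\|y-z\|$$
for every $z\in\mathbb{R}^r$. If $x\in D$, then, for any $z\in\mathbb{R}^r$ and $y\in D$,
$$\phi(y)-\gamma\|y-z\|\le\phi(x)+\gamma\|y-x\|-\gamma\|y-z\|\le\phi(x)+\gamma\|z-x\|,$$
so that $\tilde\phi(z)\le\phi(x)+\gamma\|z-x\|$; this shows, in particular, that $\tilde\phi(z)<\infty$. Also, if $z\in D$, we must have
$$\phi(z)-\gamma\|z-z\|\le\tilde\phi(z)\le\phi(z)+\gamma\|z-z\|,$$
so that $\tilde\phi$ extends $\phi$. Finally, if $w$, $z\in\mathbb{R}^r$ and $y\in D$,
$$\phi(y)-\gamma\|y-w\|\le\phi(y)-\gamma\|y-z\|+\gamma\|w-z\|\le\tilde\phi(z)+\gamma\|w-z\|;$$
and taking the supremum over $y\in D$,
$$\tilde\phi(w)\le\tilde\phi(z)+\gamma\|w-z\|.$$
As $w$ and $z$ are arbitrary, $\tilde\phi$ is Lipschitz.
(c)(i) Suppose that $\phi$ is $\gamma$-Lipschitz. If $\epsilon>0$ and $a\le a_1\le b_1\le\dots\le a_n\le b_n\le b$ and $\sum_{i=1}^n b_i-a_i\le\epsilon/(1+\gamma)$, then
$$\sum_{i=1}^n|\phi(b_i)-\phi(a_i)|\le\sum_{i=1}^n\gamma|b_i-a_i|\le\epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is absolutely continuous. If $x\in[a,b]$ and $\phi'(x)$ is defined, then
$$|\phi'(x)|=\lim_{y\to x}\frac{|\phi(y)-\phi(x)|}{|y-x|}\le\gamma,$$
so $\phi'$ is bounded.
(ii) Now suppose that $\phi$ is absolutely continuous and that $|\phi'(x)|\le\gamma$ for every $x\in\operatorname{dom}\phi'$, where $\gamma\ge 0$. Then whenever $a\le x\le y\le b$,
$$|\phi(y)-\phi(x)|=\Bigl|\int_x^y\phi'\Bigr|\le\int_x^y|\phi'|\le\gamma(y-x)$$
(using 225E for the first equality). As $x$ and $y$ are arbitrary, $\phi$ is $\gamma$-Lipschitz.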
262C Remark The argument for (b) above shows that if $\phi:D\to\mathbb{R}$ is a Lipschitz function, where $D\subseteq\mathbb{R}^r$, then $\phi$ has an extension to $\mathbb{R}^r$ with the same Lipschitz constants. In fact it is the case that if $\phi:D\to\mathbb{R}^s$ is a Lipschitz function, then $\phi$ has an extension to $\tilde\phi:\mathbb{R}^r\to\mathbb{R}^s$ with the same Lipschitz constants; this is 'Kirzbraun's theorem' (Kirzbraun 34, or Federer 69, 2.10.43).
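The extension formula $\tilde\phi(z)=\sup_{y\in D}\phi(y)-\gamma\|y-z\|$ from the proof of 262Bb is easy to try out when $D$ is finite, so that the supremum is a maximum. The following Python sketch makes that simplifying assumption; the sample set `D`, the values `vals` and the names `norm`, `extend` and `phi_tilde` are hypothetical and serve only as an illustration of the real-valued case $s=1$.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def extend(points, values, gamma):
    """Return the function z -> max_y (values[y] - gamma * ||y - z||),
    the formula of 262Bb restricted to a finite domain."""
    def phi_tilde(z):
        return max(
            val - gamma * norm([a - b for a, b in zip(y, z)])
            for y, val in zip(points, values)
        )
    return phi_tilde


# A hypothetical finite domain in R^2 with a 1-Lipschitz function on it:
# the differences of values (1, 1, 2) never exceed the distances (1, 2, sqrt 5).
D = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
vals = [0.0, 1.0, -1.0]
gamma = 1.0
phi_tilde = extend(D, vals, gamma)

if __name__ == "__main__":
    for p, v in zip(D, vals):
        print(p, v, phi_tilde(p))      # the extension agrees with phi on D
    print(phi_tilde((0.5, 0.5)))       # a value off the original domain
```

The extension keeps the same Lipschitz constant $\gamma$, as the remark says; for vector-valued functions the coordinatewise construction of 262Bb generally loses a factor of $\sqrt{s}$, which is the gap that Kirzbraun's theorem closes.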
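262D Proposition If $\phi:D\to\mathbb{R}^r$ is a $\gamma$-Lipschitz function, where $D\subseteq\mathbb{R}^r$, then $\mu^*\phi[A]\le\gamma^r\mu^*A$ for every $A\subseteq D$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. In particular, $\phi[D\cap A]$ is negligible for every negligible set $A\subseteq\mathbb{R}^r$.
proof Let $\epsilon>0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}=\langle B(x_n,\delta_n)\rangle_{n\in\mathbb{N}}$ of closed balls in $\mathbb{R}^r$, covering $A$, such that $\sum_{n=0}^{\infty}\mu B_n\le\mu^*A+\epsilon$ and $\sum_{n\in\mathbb{N}\setminus K}\mu B_n\le\epsilon$, where $K=\{n:n\in\mathbb{N},\ x_n\in A\}$. Set
$$L=\{n:n\in\mathbb{N}\setminus K,\ B_n\cap D\ne\emptyset\},$$
and for $n\in L$ choose $y_n\in D\cap B_n$. Now set
$$B'_n=\begin{cases}B(\phi(x_n),\gamma\delta_n)&\text{if }n\in K,\\ B(\phi(y_n),2\gamma\delta_n)&\text{if }n\in L,\\ \emptyset&\text{if }n\in\mathbb{N}\setminus(K\cup L).\end{cases}$$
Then $\phi[B_n\cap D]\subseteq B'_n$ for every $n$, so $\phi[D\cap A]\subseteq\bigcup_{n\in\mathbb{N}}B'_n$, and
$$\mu^*\phi[A\cap D]\le\sum_{n=0}^{\infty}\mu B'_n=\gamma^r\sum_{n\in K}\mu B_n+2^r\gamma^r\sum_{n\in L}\mu B_n\le\gamma^r(\mu^*A+\epsilon)+2^r\gamma^r\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*\phi[A\cap D]\le\gamma^r\mu^*A$, as claimed.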
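262E Corollary Let $\phi:D\to\mathbb{R}^r$ be an injective Lipschitz function, where $D\subseteq\mathbb{R}^r$, and $f$ a measurable function from a subset of $\mathbb{R}^r$ to $\mathbb{R}$.
(a) If $\phi^{-1}$ is defined almost everywhere in a subset $H$ of $\mathbb{R}^r$ and $f$ is defined almost everywhere in $\mathbb{R}^r$, then $f\phi^{-1}$ is defined almost everywhere in $H$.
(b) If $E\subseteq D$ is Lebesgue measurable then $\phi[E]$ is measurable.
(c) If $D$ is measurable then $f\phi^{-1}$ is measurable.
proof Set
$$C=\operatorname{dom}(f\phi^{-1})=\{y:y\in\phi[D],\ \phi^{-1}(y)\in\operatorname{dom}f\}=\phi[D\cap\operatorname{dom}f].$$
(a) Because $f$ is defined almost everywhere, $\phi[D\setminus\operatorname{dom}f]$ is negligible. But now
$$C=\phi[D]\setminus\phi[D\setminus\operatorname{dom}f]=\operatorname{dom}\phi^{-1}\setminus\phi[D\setminus\operatorname{dom}f],$$
so
$$H\setminus C\subseteq(H\setminus\operatorname{dom}\phi^{-1})\cup\phi[D\setminus\operatorname{dom}f]$$
is negligible.
(b) Now suppose that $E\subseteq D$ and that $E$ is measurable. Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of closed bounded subsets of $E$ such that $\mu(E\setminus\bigcup_{n\in\mathbb{N}}F_n)=0$ (134Fb). Because $\phi$ is Lipschitz, it is continuous, so $\phi[F_n]$ is compact, therefore closed, therefore measurable for every $n$ (2A2F, 2A2E, 115G); also $\phi[E\setminus\bigcup_{n\in\mathbb{N}}F_n]$ is negligible, by 262D, therefore measurable. So
$$\phi[E]=\phi\Bigl[E\setminus\bigcup_{n\in\mathbb{N}}F_n\Bigr]\cup\bigcup_{n\in\mathbb{N}}\phi[F_n]$$
is measurable.
(c) For any $a\in\mathbb{R}$, take a measurable set $E\subseteq\mathbb{R}^r$ such that $\{x:f(x)\ge a\}=E\cap\operatorname{dom}f$. Then
$$\{y:y\in C,\ f\phi^{-1}(y)\ge a\}=C\cap\phi[D\cap E].$$
But $\phi[D\cap E]$ is measurable, by (b), so $\{y:f\phi^{-1}(y)\ge a\}$ is relatively measurable in $C$. As $a$ is arbitrary, $f\phi^{-1}$ is measurable.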
262F Differentiability I come now to the class of functions whose properties will take up most of the rest of the chapter.
Definitions Suppose $r$, $s\ge 1$ and that $\phi$ is a function from a subset $D=\operatorname{dom}\phi$ of $\mathbb{R}^r$ to $\mathbb{R}^s$.
(a) $\phi$ is differentiable at $x\in D$ if there is a real $s\times r$ matrix $T$ such that
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|}=0;$$
in this case we may write $T=\phi'(x)$.
(b) I will say that $\phi$ is differentiable relative to its domain at $x$, and that $T$ is a derivative of $\phi$ at $x$, if $x\in D$ and for every $\epsilon>0$ there is a $\delta>0$ such that $\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\|y-x\|$ for every $y\in B(x,\delta)\cap D$.
262G Remarks (a) The standard definition in 262Fa, involving an all-sided limit '$\lim_{y\to x}$', implicitly requires $\phi$ to be defined on some non-trivial ball centered on $x$, so that we can calculate $\phi(y)-\phi(x)-T(y-x)$ for all $y$ sufficiently near $x$. It has the advantage that the derivative $T=\phi'(x)$ is uniquely defined (because if $\lim_{z\to 0}\frac{\|T_1z-T_2z\|}{\|z\|}=0$ then
$$\frac{\|(T_1-T_2)z\|}{\|z\|}=\lim_{\alpha\to 0}\frac{\|T_1(\alpha z)-T_2(\alpha z)\|}{\|\alpha z\|}=0$$
for every non-zero $z$, so $T_1-T_2$ must be the zero matrix). For our purposes here, there is some advantage in relaxing this slightly to the form in 262Fb, so that we do not need to pay special attention to the boundary of $\operatorname{dom}\phi$.
(b) If you have not seen this concept of 'differentiability' before, but have some familiarity with partial differentiation, it is necessary to emphasize that the concept of 'differentiable function' (at least in the strict sense demanded by 262Fa) is strictly stronger than the concept of 'partially differentiable function'. For purposes of computation, the most useful method of finding true derivatives is through 262Id below. For a simple example of a function with a full set of partial derivatives, which is not everywhere differentiable, consider $\phi:\mathbb{R}^2\to\mathbb{R}$ defined by
$$\phi(\xi_1,\xi_2)=\begin{cases}\dfrac{\xi_1\xi_2}{\xi_1^2+\xi_2^2}&\text{if }\xi_1^2+\xi_2^2\ne 0,\\[1ex] 0&\text{if }\xi_1=\xi_2=0.\end{cases}$$
Then $\phi$ is not even continuous at $\mathbf{0}$, although both partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined everywhere.
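The example in (b) can be checked by direct computation. The following Python sketch (with hypothetical names `phi` and `partial_quotient`) evaluates the difference quotients at the origin along the coordinate axes, where they vanish identically, and the values of $\phi$ along the diagonals, where they stay at $\pm\tfrac12$ however close to the origin we go.

```python
def phi(x1, x2):
    """The function of 262Gb: x1*x2 / (x1^2 + x2^2) away from the origin, 0 at it."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)


def partial_quotient(j, h):
    """Difference quotient (phi(h*e_j) - phi(0)) / h at the origin, for j in {0, 1}."""
    point = (h, 0.0) if j == 0 else (0.0, h)
    return (phi(*point) - phi(0.0, 0.0)) / h


if __name__ == "__main__":
    for h in (1e-1, 1e-3, 1e-6):
        # Both partial difference quotients are exactly 0 on the axes ...
        print(h, partial_quotient(0, h), partial_quotient(1, h))
        # ... but along the diagonals phi is constant and non-zero.
        print(h, phi(h, h), phi(h, -h))
```

So both partial derivatives at $\mathbf{0}$ exist and are $0$, while $\phi(t,t)=\tfrac12$ for every $t\ne 0$, which rules out continuity at $\mathbf{0}$ and hence (by 262Ia) differentiability there.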
(c) In the definition above, I speak of a derivative as being a matrix. Properly speaking, the derivative of a function defined on a subset of $\mathbb{R}^r$ and taking values in $\mathbb{R}^s$ should be thought of as a bounded linear operator from $\mathbb{R}^r$ to $\mathbb{R}^s$; the formulation in terms of matrices is acceptable just because there is a natural one-to-one correspondence between $s\times r$ real matrices and linear operators from $\mathbb{R}^r$ to $\mathbb{R}^s$, and all these linear operators are bounded. I use the 'matrix' description because it makes certain calculations more direct; in particular, the relationship between $\phi'$ and the partial derivatives of $\phi$ (262Ic), and the notion of the determinant $\det\phi'(x)$, used throughout §§263 and 265.
262H The norm of a matrix Some of the calculations below will rely on the notion of 'norm' of a matrix. The one I will use (in fact, for our purposes here, any norm would do) is the 'operator norm', defined by saying
$$\|T\|=\sup\{\|Tx\|:x\in\mathbb{R}^r,\ \|x\|\le 1\}$$
for any $s\times r$ matrix $T$. For the basic facts concerning these norms, see 2A4F-2A4G. The following will also be useful.
(a) If all the coefficients of $T$ are small, so is $\|T\|$; in fact, if $T=\langle\tau_{ij}\rangle_{i\le s,j\le r}$, and $\|x\|\le 1$, then $|\xi_j|\le 1$ for each $j$, so
$$\|Tx\|=\Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r\tau_{ij}\xi_j\Bigr)^2\Bigr)^{1/2}\le\Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r|\tau_{ij}|\Bigr)^2\Bigr)^{1/2}\le r\sqrt{s}\max_{i\le s,j\le r}|\tau_{ij}|,$$
and $\|T\|\le r\sqrt{s}\max_{i\le s,j\le r}|\tau_{ij}|$. (This is a singularly crude inequality. A better one is in 262Ya. But it tells us, in particular, that $\|T\|$ is always finite.)
(b) If $\|T\|$ is small, so are all the coefficients of $T$; in fact, writing $e_j$ for the $j$th unit vector of $\mathbb{R}^r$, then the $i$th coordinate of $Te_j$ is $\tau_{ij}$, so $|\tau_{ij}|\le\|Te_j\|\le\|T\|$.
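The two-sided comparison $\max_{i,j}|\tau_{ij}|\le\|T\|\le r\sqrt{s}\max_{i,j}|\tau_{ij}|$ from (a) and (b) can be observed on small matrices. The Python sketch below estimates the operator norm by power iteration on $T^{\top}T$; this is a swapped-in numerical technique, not something used in the text, and it only ever produces a lower bound $\|Tx\|$ with $\|x\|=1$, which in unlucky cases (a starting vector orthogonal to the dominant direction) may fall short of $\|T\|$. All names are hypothetical.

```python
import math


def matvec(T, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(t * c for t, c in zip(row, x)) for row in T]


def transpose(T):
    return [list(col) for col in zip(*T)]


def euclid(v):
    return math.sqrt(sum(c * c for c in v))


def operator_norm(T, iterations=500):
    """Estimate sup{||Tx|| : ||x|| <= 1} by power iteration on T^T T.
    The result is ||Tx|| for a unit vector x, hence never exceeds ||T||."""
    Tt = transpose(T)
    r = len(T[0])
    x = [1.0 / math.sqrt(r)] * r          # unit starting vector
    for _ in range(iterations):
        y = matvec(Tt, matvec(T, x))
        n = euclid(y)
        if n == 0.0:                      # T x = 0: the iteration cannot continue
            return 0.0
        x = [c / n for c in y]
    return euclid(matvec(T, x))


def max_entry(T):
    return max(abs(t) for row in T for t in row)


def crude_bound(T):
    """The bound r * sqrt(s) * max |tau_ij| of 262Ha for an s x r matrix."""
    s, r = len(T), len(T[0])
    return r * math.sqrt(s) * max_entry(T)


if __name__ == "__main__":
    T = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # s = 3, r = 2
    print(max_entry(T), operator_norm(T), crude_bound(T))
```

For the $3\times 2$ matrix in the usage example the three numbers are increasing, in line with 262Ha-262Hb, and the crude bound is roughly twice the estimated norm.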
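262I Lemma Let $\phi:D\to\mathbb{R}^s$ be a function, where $D\subseteq\mathbb{R}^r$. For $i\le s$ let $\phi_i:D\to\mathbb{R}$ be its $i$th coordinate, so that $\phi(x)=(\phi_1(x),\dots,\phi_s(x))$ for $x\in D$.
(a) If $\phi$ is differentiable relative to its domain at $x\in D$, then $\phi$ is continuous at $x$.
(b) If $x\in D$, then $\phi$ is differentiable relative to its domain at $x$ iff each $\phi_i$ is differentiable relative to its domain at $x$.
(c) If $\phi$ is differentiable at $x\in D$, then all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ of $\phi$ are defined at $x$, and the derivative of $\phi$ at $x$ is the matrix $\langle\frac{\partial\phi_i}{\partial\xi_j}(x)\rangle_{i\le s,j\le r}$.
(d) If all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$, for $i\le s$ and $j\le r$, are defined in a neighbourhood of $x\in D$ and are continuous at $x$, then $\phi$ is differentiable at $x$.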
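proof (a) Let $T$ be a derivative of $\phi$ at $x$. Applying the definition 262Fb with $\epsilon=1$, we see that there is a $\delta>0$ such that
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$. Now
$$\|\phi(y)-\phi(x)\|\le\|T(y-x)\|+\|y-x\|\le(1+\|T\|)\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$, so $\phi$ is continuous at $x$.
(b)(i) If $\phi$ is differentiable relative to its domain at $x\in D$, let $T$ be a derivative of $\phi$ at $x$. For $i\le s$ let $T_i$ be the $1\times r$ matrix consisting of the $i$th row of $T$. Let $\epsilon>0$. Then we have a $\delta>0$ such that
$$|\phi_i(y)-\phi_i(x)-T_i(y-x)|\le\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$, so that $T_i$ is a derivative of $\phi_i$ at $x$.
(ii) If each $\phi_i$ is differentiable relative to its domain at $x$, with corresponding derivatives $T_i$, let $T$ be the $s\times r$ matrix with rows $T_1,\dots,T_s$. Given $\epsilon>0$, there is for each $i\le s$ a $\delta_i>0$ such that
$$|\phi_i(y)-\phi_i(x)-T_i(y-x)|\le\epsilon\|y-x\|\text{ whenever }y\in D,\ \|y-x\|\le\delta_i;$$
set $\delta=\min_{i\le s}\delta_i>0$; then if $y\in D$ and $\|y-x\|\le\delta$, we shall have
$$\|\phi(y)-\phi(x)-T(y-x)\|^2=\sum_{i=1}^s|\phi_i(y)-\phi_i(x)-T_i(y-x)|^2\le s\epsilon^2\|y-x\|^2,$$
so that
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon\sqrt{s}\,\|y-x\|.$$
As $\epsilon$ is arbitrary, $T$ is a derivative of $\phi$ at $x$.
(c) Set $T=\phi'(x)$. We have
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|}=0;$$
fix $j\le r$, and consider $y=x+\eta e_j$, where $e_j=(0,\dots,0,1,0,\dots,0)$ is the $j$th unit vector in $\mathbb{R}^r$. Then we must have
$$\lim_{\eta\to 0}\frac{\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|}{|\eta|}=0.$$
Looking at the $i$th coordinate of $\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)$, we have
$$|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|\le\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|,$$
where $\tau_{ij}$ is the $(i,j)$th coefficient of $T$; so that
$$\lim_{\eta\to 0}\frac{|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|}{|\eta|}=0.$$
But this just says that the partial derivative $\frac{\partial\phi_i}{\partial\xi_j}(x)$ exists and is equal to $\tau_{ij}$, as claimed.
(d) Now suppose that the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ are defined near $x$ and continuous at $x$. Let $\epsilon>0$. Let $\delta>0$ be such that
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(y)-\tau_{ij}\Bigr|\le\epsilon$$
whenever $\|y-x\|\le\delta$, writing $\tau_{ij}=\frac{\partial\phi_i}{\partial\xi_j}(x)$. Now suppose that $\|y-x\|\le\delta$. Set
$$y=(\eta_1,\dots,\eta_r),\quad x=(\xi_1,\dots,\xi_r),$$
$$y_j=(\eta_1,\dots,\eta_j,\xi_{j+1},\dots,\xi_r)\text{ for }0\le j\le r,$$
so that $y_0=x$, $y_r=y$ and the line segment between $y_{j-1}$ and $y_j$ lies wholly within $\delta$ of $x$ whenever $1\le j\le r$, since if $z=(\zeta_1,\dots,\zeta_r)$ lies on this line segment then $\zeta_i$ lies between $\xi_i$ and $\eta_i$ for every $i$. By the ordinary mean value theorem for differentiable real functions, applied to the function
$$t\mapsto\phi_i(\eta_1,\dots,\eta_{j-1},t,\xi_{j+1},\dots,\xi_r),$$
there is for each $i\le s$, $j\le r$ a point $z_{ij}$ on the line segment between $y_{j-1}$ and $y_j$ such that
$$\phi_i(y_j)-\phi_i(y_{j-1})=(\eta_j-\xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}).$$
But
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})-\tau_{ij}\Bigr|\le\epsilon,$$
so
$$|\phi_i(y_j)-\phi_i(y_{j-1})-\tau_{ij}(\eta_j-\xi_j)|\le\epsilon|\eta_j-\xi_j|\le\epsilon\|y-x\|.$$
Summing over $j$,
$$\Bigl|\phi_i(y)-\phi_i(x)-\sum_{j=1}^r\tau_{ij}(\eta_j-\xi_j)\Bigr|\le r\epsilon\|y-x\|$$
for each $i$. Summing the squares and taking the square root,
$$\|\phi(y)-\phi(x)-T(y-x)\|\le\epsilon r\sqrt{s}\,\|y-x\|,$$
where $T=\langle\tau_{ij}\rangle_{i\le s,j\le r}$. And this is true whenever $\|y-x\|\le\delta$. As $\epsilon$ is arbitrary, $\phi'(x)=T$ is defined.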
262J Remark I am not sure if I ought to apologize for the notation $\frac{\partial}{\partial\xi_j}$. In such formulae as $(\eta_j-\xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})$ above, the two appearances of $\xi_j$ clash most violently. But I do not think that any person of good will is likely to be misled, provided that the labels $\xi_j$ (or whatever symbols are used to represent the variables involved) are adequately described when the domain of $\phi$ is first introduced (and always remembering that in partial differentiation, we are not only moving one variable, a $\xi_j$ in the present context, but holding fixed some further list of variables, not listed in the notation). I believe that the traditional notation $\frac{\partial}{\partial\xi_j}$ has survived for solid reasons, and I should like to offer a welcome to those who are more comfortable with it than with any of the many alternatives which have been proposed, but have never taken root.
262K The Cantor function revisited It is salutary to re-examine the examples of 134H-134I in the light of the present considerations. Let $f:[0,1]\to[0,1]$ be the Cantor function (134H) and set $g(x)=\tfrac12(x+f(x))$ for $x\in[0,1]$. Then $g:[0,1]\to[0,1]$ is a homeomorphism (134I); set $\phi=g^{-1}:[0,1]\to[0,1]$. We see that if $0\le x\le y\le 1$ then $g(y)-g(x)\ge\tfrac12(y-x)$; equivalently, $\phi(y)-\phi(x)\le 2(y-x)$ whenever $0\le x\le y\le 1$, so that $\phi$ is a Lipschitz function, therefore absolutely continuous (262Bc). If $D=\{x:\phi'(x)\text{ is defined}\}$, then $[0,1]\setminus D$ is negligible (225Cb), so $[0,1]\setminus\phi[D]=\phi[\,[0,1]\setminus D]$ is negligible (262Da). I noted in 134I that there is a measurable function $h:[0,1]\to\mathbb{R}$ such that the composition $h\phi$ is not measurable; now $h(\phi{\restriction}D)=(h\phi){\restriction}D$ cannot be measurable, even though $\phi{\restriction}D$ is differentiable.
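262L It will be convenient to be able to call on the following straightforward result.
Lemma Suppose that $D\subseteq\mathbb{R}^r$ and $x\in\mathbb{R}^r$ are such that $\lim_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=1$. Then $\lim_{z\to 0}\frac{\rho(x+z,D)}{\|z\|}=0$, where $\rho(x+z,D)=\inf_{y\in D}\|x+z-y\|$.
proof Let $\epsilon>0$. Let $\delta_0>0$ be such that
$$\mu^*(D\cap B(x,\delta))>\Bigl(1-\Bigl(\frac{\epsilon}{1+\epsilon}\Bigr)^r\Bigr)\mu B(x,\delta)$$
whenever $0<\delta\le\delta_0$. Take any $z$ such that $0<\|z\|\le\delta_0/(1+\epsilon)$. ??? Suppose, if possible, that $\rho(x+z,D)>\epsilon\|z\|$. Then $B(x+z,\epsilon\|z\|)\subseteq B(x,(1+\epsilon)\|z\|)\setminus D$, so
$$\mu^*(D\cap B(x,(1+\epsilon)\|z\|))\le\mu B(x,(1+\epsilon)\|z\|)-\mu B(x+z,\epsilon\|z\|)=\Bigl(1-\Bigl(\frac{\epsilon}{1+\epsilon}\Bigr)^r\Bigr)\mu B(x,(1+\epsilon)\|z\|),$$
which is impossible, as $(1+\epsilon)\|z\|\le\delta_0$. XXX Thus $\rho(x+z,D)\le\epsilon\|z\|$. As $\epsilon$ is arbitrary, this proves the result.
Remark There is a word for this; see 261Yg.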
262M I come now to the first result connecting Lipschitz functions with differentiable functions. I approach it through a substantial lemma which will be the foundation of §263.
Lemma Let $r$, $s\ge 1$ be integers and $\phi$ a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$ which is differentiable at each point of its domain. For each $x\in D$ let $T(x)$ be a derivative of $\phi$. Let $M_{sr}$ be the set of $s\times r$ matrices and $\zeta:A\to\,]0,\infty[$ a strictly positive function, where $A\subseteq M_{sr}$ is a non-empty set containing $T(x)$ for every $x\in D$. Then we can find sequences $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ such that
(i) $\langle D_n\rangle_{n\in\mathbb{N}}$ is a partition of $D$ into sets which are relatively measurable in $D$, that is, are intersections of $D$ with measurable subsets of $\mathbb{R}^r$;
(ii) $T_n\in A$ for every $n$;
(iii) $\|\phi(x)-\phi(y)-T_n(x-y)\|\le\zeta(T_n)\|x-y\|$ for every $n\in\mathbb{N}$ and $x$, $y\in D_n$;
(iv) $\|T(x)-T_n\|\le\zeta(T_n)$ for every $x\in D_n$.
proof (a) The first step is to note that there is a sequence $\langle S_n\rangle_{n\in\mathbb{N}}$ in $A$ such that
$$A\subseteq\bigcup_{n\in\mathbb{N}}\{T:T\in M_{sr},\ \|T-S_n\|<\zeta(S_n)\}.$$
PPP (Of course this is a standard result about separable metric spaces.) Write $Q$ for the set of matrices in $M_{sr}$ with rational coefficients; then there is a natural bijection between $Q$ and $\mathbb{Q}^{sr}$, so $Q$ and $Q\times\mathbb{N}$ are countable. Enumerate $Q\times\mathbb{N}$ as $\langle(R_n,k_n)\rangle_{n\in\mathbb{N}}$. For each $n\in\mathbb{N}$, choose $S_n\in A$ by the rule
if there is an $S\in A$ such that $\{T:\|T-R_n\|\le 2^{-k_n}\}\subseteq\{T:\|T-S\|<\zeta(S)\}$, take such an $S$ for $S_n$;
otherwise, take $S_n$ to be any member of $A$.
I claim that this works. For let $S\in A$. Then $\zeta(S)>0$; take $k\in\mathbb{N}$ such that $2^{-k}<\zeta(S)$. Take $R^*\in Q$ such that $\|R^*-S\|<\min(\zeta(S)-2^{-k},2^{-k})$; this is possible because $\|R-S\|$ will be small whenever all the coefficients of $R$ are close enough to the corresponding coefficients of $S$ (262Ha), and we can find rational numbers to achieve this. Let $n\in\mathbb{N}$ be such that $R^*=R_n$ and $k=k_n$. Then
$$\{T:\|T-R_n\|\le 2^{-k_n}\}\subseteq\{T:\|T-S\|<\zeta(S)\}$$
(because $\|T-S\|\le\|T-R_n\|+\|R_n-S\|$), so we must have chosen $S_n$ by the first part of the rule above, and
$$S\in\{T:\|T-R_n\|\le 2^{-k_n}\}\subseteq\{T:\|T-S_n\|<\zeta(S_n)\}.$$
As $S$ is arbitrary, this proves the result. QQQ
(b) Enumerate $\mathbb{Q}^r\times\mathbb{Q}^r\times\mathbb{N}$ as $\langle(q_n,q'_n,m_n)\rangle_{n\in\mathbb{N}}$. For each $n\in\mathbb{N}$, set
$$H_n=\{x:x\in[q_n,q'_n]\cap D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|\text{ for every }y\in[q_n,q'_n]\cap D\}$$
$$=[q_n,q'_n]\cap D\cap\bigcap_{y\in[q_n,q'_n]\cap D}\{x:x\in D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|\}.$$
Because $\phi$ is continuous, $H_n=D\cap\overline{H}_n$, writing $\overline{H}_n$ for the closure of $H_n$, so $H_n$ is relatively measurable in $D$. Note that if $x$, $y\in H_n$, then $y\in D\cap[q_n,q'_n]$, so that
$$\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|.$$
Set
$$H'_n=\{x:x\in H_n,\ \|T(x)-S_{m_n}\|\le\zeta(S_{m_n})\}.$$
(c) $D=\bigcup_{n\in\mathbb{N}}H'_n$. PPP Let $x\in D$. Then $T(x)\in A$, so there is a $k\in\mathbb{N}$ such that $\|T(x)-S_k\|<\zeta(S_k)$. Let $\delta>0$ be such that
$$\|\phi(y)-\phi(x)-T(x)(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|x-y\|$$
whenever $y\in D$ and $\|y-x\|\le\delta$. Then
$$\|\phi(y)-\phi(x)-S_k(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|x-y\|+\|T(x)-S_k\|\|x-y\|=\zeta(S_k)\|x-y\|$$
whenever $y\in D\cap B(x,\delta)$. Let $q$, $q'\in\mathbb{Q}^r$ be such that $x\in[q,q']\subseteq B(x,\delta)$. Let $n$ be such that $q=q_n$, $q'=q'_n$ and $k=m_n$. Then $x\in H'_n$. QQQ
(d) Write
$$C_n=\Bigl\{x:x\in H_n,\ \lim_{\delta\downarrow 0}\frac{\mu^*(H_n\cap B(x,\delta))}{\mu B(x,\delta)}=1\Bigr\}.$$
Then $C_n\subseteq H'_n$.
PPP (i) Take $x\in C_n$, and set $\tilde T=T(x)-S_{m_n}$. I have to show that $\|\tilde T\|\le\zeta(S_{m_n})$. Take $\epsilon>0$. Let $\delta_0>0$ be such that
$$\|\phi(y)-\phi(x)-T(x)(y-x)\|\le\epsilon\|y-x\|$$
whenever $y\in D$ and $\|y-x\|\le\delta_0$. Since
$$\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|$$
whenever $y\in H_n$, we have
$$\|\tilde T(y-x)\|\le(\epsilon+\zeta(S_{m_n}))\|y-x\|$$
whenever $y\in H_n$ and $\|y-x\|\le\delta_0$.
(ii) By 262L, there is a $\delta_1>0$ such that $(1+2\epsilon)\delta_1\le\delta_0$ and $\rho(x+z,H_n)\le\epsilon\|z\|$ whenever $0<\|z\|\le\delta_1$. So if $\|z\|\le\delta_1$ there is a $y\in H_n$ such that $\|x+z-y\|\le 2\epsilon\|z\|$. (If $z=0$ we can take $y=x$.) Now $\|x-y\|\le(1+2\epsilon)\|z\|\le\delta_0$, so
$$\|\tilde Tz\|\le\|\tilde T(y-x)\|+\|\tilde T(x+z-y)\|$$
$$\le(\epsilon+\zeta(S_{m_n}))\|y-x\|+\|\tilde T\|\|x+z-y\|$$
$$\le(\epsilon+\zeta(S_{m_n}))\|z\|+(\epsilon+\zeta(S_{m_n})+\|\tilde T\|)\|x+z-y\|$$
$$\le(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|)\|z\|.$$
And this is true whenever $0<\|z\|\le\delta_1$. But multiplying this inequality by suitable positive scalars we see that
$$\|\tilde Tz\|\le\bigl(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|\bigr)\|z\|$$
for all $z\in\mathbb{R}^r$, and
$$\|\tilde T\|\le\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|.$$
As $\epsilon$ is arbitrary, $\|\tilde T\|\le\zeta(S_{m_n})$, as claimed. QQQ
(e) By 261Da, $H_n\setminus C_n$ is negligible for every $n$, so $H_n\setminus H'_n$ is negligible, and
$$H'_n=D\cap(\overline{H}_n\setminus(H_n\setminus H'_n))$$
is relatively measurable in $D$. Set
$$D_n=H'_n\setminus\bigcup_{k<n}H'_k,\quad T_n=S_{m_n}$$
for each $n$; these serve.
262N Corollary Let $\phi$ be a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $\phi$ is differentiable relative to its domain at each point of $D$. Then $D$ can be expressed as the union of a sequence $\langle D_n\rangle_{n\in\mathbb{N}}$ of sets such that $\phi{\restriction}D_n$ is Lipschitz for each $n\in\mathbb{N}$.
proof In 262M, take $\zeta(T)=1$ for every $T\in A=M_{sr}$. If $x$, $y\in D_n$ then
$$\|\phi(x)-\phi(y)\|\le\|\phi(x)-\phi(y)-T_n(x-y)\|+\|T_n(x-y)\|\le\|x-y\|+\|T_n\|\|x-y\|,$$
so $\phi{\restriction}D_n$ is $(1+\|T_n\|)$-Lipschitz.
262O Corollary Suppose that $\phi$ is an injective function from a measurable subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^r$, and that $\phi$ is differentiable relative to its domain at every point of $D$.
(a) If $A\subseteq D$ is negligible, $\phi[A]$ is negligible.
(b) If $E\subseteq D$ is measurable, then $\phi[E]$ is measurable.
(c) If $D$ is measurable and $f$ is a measurable function defined on a subset of $\mathbb{R}^r$, then $f\phi^{-1}$ is measurable.
(d) If $H\subseteq\mathbb{R}^r$ and $\phi^{-1}$ is defined almost everywhere on $H$, and if $f$ is a function defined almost everywhere in $\mathbb{R}^r$, then $f\phi^{-1}$ is defined almost everywhere in $H$.
proof Let $\langle D_n\rangle_{n\in\mathbb{N}}$ be a sequence of measurable sets with union $D$ such that $\phi{\restriction}D_n$ is Lipschitz for each $n$. Then
$$\phi[A\cap D]=\bigcup_{n\in\mathbb{N}}(\phi{\restriction}D_n)[A\cap D_n]$$
is negligible for every negligible $A\subseteq\mathbb{R}^r$, by 262D.
Now parts (b)-(d) follow from (a) (because $\phi$ is continuous), just as in 262E.
262P Corollary Let $\phi$ be a function from a subset $D$ of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $\phi$ is differentiable relative to its domain, with a derivative $T(x)$, at each point $x\in D$. Then the function $x\mapsto T(x)$ is measurable in the sense that $\tau_{ij}:D\to\mathbb{R}$ is measurable for all $i\le s$ and $j\le r$, where $\tau_{ij}(x)$ is the $(i,j)$th coefficient of the matrix $T(x)$ for all $i$, $j$ and $x$.
proof For each $k\in\mathbb{N}$, apply 262M with $\zeta(T)=2^{-k}$ for each $T\in A=M_{sr}$, obtaining sequences $\langle D_{kn}\rangle_{n\in\mathbb{N}}$ of relatively measurable subsets of $D$ and $\langle T_{kn}\rangle_{n\in\mathbb{N}}$ in $M_{sr}$. Let $\tau^{(kn)}_{ij}$ be the $(i,j)$th coefficient of $T_{kn}$. Then we have functions $f_{ijk}:D\to\mathbb{R}$ defined by setting
$$f_{ijk}(x)=\tau^{(kn)}_{ij}\text{ if }x\in D_{kn}.$$
Because the $D_{kn}$ are relatively measurable, the $f_{ijk}$ are measurable functions. For $x\in D_{kn}$,
$$|\tau_{ij}(x)-f_{ijk}(x)|\le\|T(x)-T_{kn}\|\le 2^{-k}$$
(using 262Hb for the first inequality), so $|\tau_{ij}(x)-f_{ijk}(x)|\le 2^{-k}$ for every $x\in D$, and
$$\tau_{ij}=\lim_{k\to\infty}f_{ijk}$$
is measurable, as claimed.
*262Q This concludes the part of the section which is essential for the rest of the chapter. However the main results of §263 will I think be better understood if you are aware of the fact that any Lipschitz function is differentiable (relative to its domain) almost everywhere in its domain. I devote the next couple of pages to a proof of this fact, which apart from its intrinsic interest is a useful exercise.
Rademacher's theorem Let $\phi$ be a Lipschitz function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, where $s\ge 1$. Then $\phi$ is differentiable relative to its domain almost everywhere in its domain.
proof (a) By 262Ba and 262Ib, it will be enough to deal with the case $s=1$. By 262Bb, there is a Lipschitz function $\tilde\phi:\mathbb{R}^r\to\mathbb{R}$ extending $\phi$; now $\phi$ is differentiable with respect to its domain at any point of $\operatorname{dom}\phi$ at which $\tilde\phi$ is differentiable, so it will be enough if I can show that $\tilde\phi$ is differentiable almost everywhere. To make the notation more agreeable to the eye, I will suppose that $\phi$ itself was defined everywhere in $\mathbb{R}^r$. Let $\gamma$ be a Lipschitz constant for $\phi$.
The proof proceeds by induction on $r$. If $r=1$, we have a Lipschitz function $\phi:\mathbb{R}\to\mathbb{R}$; now $\phi$ is absolutely continuous in any bounded interval (262Bc), therefore differentiable almost everywhere. Thus the induction starts. The rest of the proof is devoted to the inductive step to $r>1$.
(b) The first step is to show that all the partial derivatives $\frac{\partial\phi}{\partial\xi_j}$ are defined almost everywhere and are Borel measurable. PPP Take $j\le r$. For $q\in\mathbb{Q}\setminus\{0\}$ set
$$\Delta_q(x)=\frac{1}{q}(\phi(x+qe_j)-\phi(x)),$$
writing $e_j$ for the $j$th unit vector of $\mathbb{R}^r$. Because $\phi$ is continuous, so is $\Delta_q$, so that $\Delta_q$ is a Borel measurable function for each $q$. Next, for any $x\in\mathbb{R}^r$,
$$D^+(x)=\limsup_{\delta\to 0}\frac{1}{\delta}(\phi(x+\delta e_j)-\phi(x))=\lim_{n\to\infty}\sup_{q\in\mathbb{Q},\,0<|q|\le 2^{-n}}\Delta_q(x),$$
so that the set on which $D^+(x)$ is defined in $\mathbb{R}$ is Borel and $D^+$ is a Borel measurable function. Similarly,
$$D^-(x)=\liminf_{\delta\to 0}\frac{1}{\delta}(\phi(x+\delta e_j)-\phi(x))$$
is a Borel measurable function with Borel domain. So
$$E=\Bigl\{x:\frac{\partial\phi}{\partial\xi_j}(x)\text{ exists in }\mathbb{R}\Bigr\}=\{x:D^+(x)=D^-(x)\in\mathbb{R}\}$$
is a Borel set, and $\frac{\partial\phi}{\partial\xi_j}$ is a Borel measurable function.
On the other hand, if we identify $\mathbb{R}^r$ with $\mathbb{R}^J\times\mathbb{R}$, taking $J$ to be $\{1,\dots,j-1,j+1,\dots,r\}$, then we can think of the measure $\mu$ on $\mathbb{R}^r$ as being the product of Lebesgue measure $\mu_J$ on $\mathbb{R}^J$ with Lebesgue measure $\mu_1$ on $\mathbb{R}$ (251N). Now for every $y\in\mathbb{R}^J$ we have a function $\phi_y:\mathbb{R}\to\mathbb{R}$ defined by writing
$$\phi_y(\sigma)=\phi(y,\sigma),$$
and $E$ becomes
$$\{(y,\sigma):\phi'_y(\sigma)\text{ is defined}\},$$
so that all the sections
$$\{\sigma:(y,\sigma)\in E\}$$
are conegligible subsets of $\mathbb{R}$, because every $\phi_y$ is Lipschitz, therefore differentiable almost everywhere, as remarked in part (a) of the proof. Since we know that $E$ is measurable, it must be conegligible, by Fubini's theorem (apply 252D or 252F to the complement of $E$). Thus $\frac{\partial\phi}{\partial\xi_j}$ is defined almost everywhere, as claimed. QQQ
Write
$$H=\Bigl\{x:x\in\mathbb{R}^r,\ \frac{\partial\phi}{\partial\xi_j}(x)\text{ exists for every }j\le r\Bigr\},$$
so that $H$ is a conegligible Borel set in $\mathbb{R}^r$.
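(c) For the rest of this proof, I fix on the natural identification of $\mathbb{R}^r$ with $\mathbb{R}^{r-1}\times\mathbb{R}$, identifying $(\xi_1,\dots,\xi_r)$ with $((\xi_1,\dots,\xi_{r-1}),\xi_r)$. For $x\in H$, let $T(x)$ be the $1\times r$ matrix $(\frac{\partial\phi}{\partial\xi_1}(x),\dots,\frac{\partial\phi}{\partial\xi_r}(x))$.
(d) Set
$$H_1=\Bigl\{x:x\in H,\ \lim_{u\to 0\text{ in }\mathbb{R}^{r-1}}\frac{|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|}{\|u\|}=0\Bigr\}.$$
I claim that $H_1$ is conegligible in $\mathbb{R}^r$. PPP This is really the same idea as in (b). For $x\in H$, $x\in H_1$ iff
for every $\epsilon>0$ there is a $\delta>0$ such that $|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le\epsilon\|u\|$ whenever $\|u\|\le\delta$,
that is, iff
for every $m\in\mathbb{N}$ there is an $n\in\mathbb{N}$ such that $|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le 2^{-m}\|u\|$ whenever $u\in\mathbb{Q}^{r-1}$ and $\|u\|\le 2^{-n}$.
But for any particular $m\in\mathbb{N}$ and $u\in\mathbb{Q}^{r-1}$ the set
$$\{x:|\phi(x+(u,0))-\phi(x)-T(x)(u,0)|\le 2^{-m}\|u\|\}$$
is measurable, indeed Borel, because all the functions $x\mapsto\phi(x+(u,0))$, $x\mapsto\phi(x)$, $x\mapsto T(x)(u,0)$ are Borel measurable. So $H_1$ is of the form
$$\bigcap_{m\in\mathbb{N}}\bigcup_{n\in\mathbb{N}}\bigcap_{u\in\mathbb{Q}^{r-1},\,\|u\|\le 2^{-n}}E_{mnu}$$
where every $E_{mnu}$ is a measurable set, and $H_1$ is therefore measurable.
Now however observe that for any $\sigma\in\mathbb{R}$, the function
$$v\mapsto\phi_\sigma(v)=\phi(v,\sigma):\mathbb{R}^{r-1}\to\mathbb{R}$$
is Lipschitz, therefore (by the inductive hypothesis) differentiable almost everywhere in $\mathbb{R}^{r-1}$; and that $(v,\sigma)\in H_1$ iff $(v,\sigma)\in H$ and $\phi'_\sigma(v)$ is defined. Consequently $\{v:(v,\sigma)\in H_1\}$ is conegligible whenever $\{v:(v,\sigma)\in H\}$ is, that is, for almost every $\sigma\in\mathbb{R}$; so that $H_1$, being measurable, must be conegligible. QQQ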
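(e) Now, for $q, q' \in \mathbb{Q}$ and $n \in \mathbb{N}$, set
$$F(q,q',n) = \Bigl\{x : x \in \mathbb{R}^r,\ q \le \frac{\phi(x+(0,\eta)) - \phi(x)}{\eta} \le q' \text{ whenever } 0 < |\eta| \le 2^{-n}\Bigr\}.$$
Set
$$F_*(q,q',n) = \Bigl\{x : x \in F(q,q',n),\ \lim_{\delta\downarrow 0} \frac{\mu^*(F(q,q',n)\cap B(x,\delta))}{\mu B(x,\delta)} = 1\Bigr\}.$$
By 261Da, $F(q,q',n)\setminus F_*(q,q',n)$ is negligible for all $q$, $q'$, $n$, so that
$$H_2 = H_1 \setminus \bigcup_{q,q'\in\mathbb{Q},\,n\in\mathbb{N}} \bigl(F(q,q',n)\setminus F_*(q,q',n)\bigr)$$
is conegligible.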
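(f) I claim that $\phi$ is differentiable at every point of $H_2$. PPP Take $x = (u,\sigma) \in H_2$. Then $\alpha = \frac{\partial\phi}{\partial\xi_r}(x)$ and $T = T(x)$ are defined. Let $\gamma$ be a Lipschitz constant for $\phi$.
Take $\epsilon > 0$; take $q, q' \in \mathbb{Q}$ such that $\alpha - \epsilon \le q < \alpha < q' \le \alpha + \epsilon$. There must be an $n \in \mathbb{N}$ such that $x \in F(q,q',n)$; consequently $x \in F_*(q,q',n)$, by the definition of $H_2$. By 262L, there is a $\delta_0 > 0$ such that $\rho(x+z, F(q,q',n)) \le \epsilon\|z\|$ whenever $\|z\| \le \delta_0$. Next, there is a $\delta_1 > 0$ such that $|\phi(x+(v,0)) - \phi(x) - T(v,0)| \le \epsilon\|v\|$ whenever $v \in \mathbb{R}^{r-1}$ and $\|v\| \le \delta_1$. Set
$$\delta = \min(\delta_0, \delta_1, 2^{-n})/(1+2\epsilon) > 0.$$
Suppose that $z = (v,\tau) \in \mathbb{R}^r$ and that $\|z\| \le \delta$. Because $\|z\| \le \delta_0$ there is an $x' = (u',\sigma') \in F(q,q',n)$ such that $\|x + z - x'\| \le 2\epsilon\|z\|$; set $x^* = (u',\sigma)$. Now
$$\max(\|u - u'\|, |\sigma' - \sigma|) \le \|x - x'\| \le (1+2\epsilon)\|z\| \le \min(\delta_1, 2^{-n}),$$
so
$$|\phi(x^*) - \phi(x) - T(x^* - x)| \le \epsilon\|u' - u\| \le \epsilon(1+2\epsilon)\|z\|.$$
But also
$$|\phi(x') - \phi(x^*) - T(x' - x^*)| = |\phi(x') - \phi(x^*) - \alpha(\sigma' - \sigma)| \le \epsilon|\sigma' - \sigma| \le \epsilon(1+2\epsilon)\|z\|,$$
because $x' \in F(q,q',n)$ and $|\sigma - \sigma'| \le 2^{-n}$, so that (if $x' \ne x^*$)
$$\alpha - \epsilon \le q \le \frac{\phi(x^*) - \phi(x')}{\sigma - \sigma'} \le q' \le \alpha + \epsilon$$
and
$$\Bigl|\frac{\phi(x^*) - \phi(x')}{\sigma - \sigma'} - \alpha\Bigr| \le \epsilon.$$
Finally,
$$|\phi(x+z) - \phi(x')| \le \gamma\|x + z - x'\| \le 2\gamma\epsilon\|z\|,$$
$$|Tz - T(x' - x)| \le \|T\|\|x + z - x'\| \le 2\epsilon\|T\|\|z\|.$$
Putting all these together,
$$\begin{aligned}
|\phi(x+z) - \phi(x) - Tz| &\le |\phi(x+z) - \phi(x')| + |T(x'-x) - Tz| \\
&\qquad + |\phi(x') - \phi(x^*) - T(x' - x^*)| + |\phi(x^*) - \phi(x) - T(x^* - x)| \\
&\le 2\gamma\epsilon\|z\| + 2\epsilon\|T\|\|z\| + \epsilon(1+2\epsilon)\|z\| + \epsilon(1+2\epsilon)\|z\| \\
&= \epsilon(2\gamma + 2\|T\| + 2 + 4\epsilon)\|z\|.
\end{aligned}$$
And this is true whenever $\|z\| \le \delta$. As $\epsilon$ is arbitrary, $\phi$ is differentiable at $x$. QQQ
Thus $\{x : \phi \text{ is differentiable at } x\}$ includes $H_2$ and is conegligible; and the induction continues.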
262X Basic exercises (a) Let $\phi$ and $\psi$ be Lipschitz functions from subsets of $\mathbb{R}^r$ to $\mathbb{R}^s$. Show that $\phi + \psi$ is a Lipschitz function from $\operatorname{dom}\phi \cap \operatorname{dom}\psi$ to $\mathbb{R}^s$.
(b) Let $\phi$ be a Lipschitz function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, and $c \in \mathbb{R}$. Show that $c\phi$ is a Lipschitz function.
(c) Suppose $\phi : D \to \mathbb{R}^s$ and $\psi : E \to \mathbb{R}^q$ are Lipschitz functions, where $D \subseteq \mathbb{R}^r$ and $E \subseteq \mathbb{R}^s$. Show that the composition $\psi\phi : D \cap \phi^{-1}[E] \to \mathbb{R}^q$ is Lipschitz.
(d) Suppose $\phi$, $\psi$ are functions from subsets of $\mathbb{R}^r$ to $\mathbb{R}^s$, and suppose that $x \in \operatorname{dom}\phi \cap \operatorname{dom}\psi$ is such that each function is differentiable relative to its domain at $x$, with derivatives $S$, $T$ there. Show that $\phi + \psi$ is differentiable relative to its domain at $x$, and that $S + T$ is a derivative of $\phi + \psi$ at $x$.
(e) Suppose that $\phi$ is a function from a subset of $\mathbb{R}^r$ to $\mathbb{R}^s$, and is differentiable relative to its domain at $x \in \operatorname{dom}\phi$. Show that $c\phi$ is differentiable relative to its domain at $x$ for every $c \in \mathbb{R}$.
>>>(f) Suppose $\phi : D \to \mathbb{R}^s$ and $\psi : E \to \mathbb{R}^q$ are functions, where $D \subseteq \mathbb{R}^r$ and $E \subseteq \mathbb{R}^s$; suppose that $\phi$ is differentiable relative to its domain at $x \in D \cap \phi^{-1}[E]$, with an $s \times r$ matrix $T$ a derivative there, and that $\psi$ is differentiable relative to its domain at $\phi(x)$, with a $q \times s$ matrix $S$ a derivative there. Show that the composition $\psi\phi$ is differentiable relative to its domain at $x$, and that the $q \times r$ matrix $ST$ is a derivative of $\psi\phi$ at $x$.
(g) Let $\phi : \mathbb{R}^r \to \mathbb{R}^s$ be a linear operator, with associated matrix $T$. Show that $\phi$ is differentiable everywhere, with $\phi'(x) = T$ for every $x$.
>>>(h) Let $G \subseteq \mathbb{R}^r$ be a convex open set, and $\phi : G \to \mathbb{R}^s$ a function such that all the partial derivatives $\frac{\partial\phi_i}{\partial\xi_j}$ are defined everywhere in $G$. Show that $\phi$ is Lipschitz iff all the partial derivatives are bounded on $G$.
(i) Let $\phi : \mathbb{R}^r \to \mathbb{R}^s$ be a function. Show that $\phi$ is differentiable at $x \in \mathbb{R}^r$ iff for every $m \in \mathbb{N}$ there are an $n \in \mathbb{N}$ and an $r \times s$ matrix $T$ with rational coefficients such that $\|\phi(y) - \phi(x) - T(y - x)\| \le 2^{-m}\|y - x\|$ whenever $\|y - x\| \le 2^{-n}$.
>>>(j) Suppose that $f$ is a real-valued function which is integrable over $\mathbb{R}^r$, and that $g : \mathbb{R}^r \to \mathbb{R}$ is a bounded differentiable function such that the partial derivative $\frac{\partial g}{\partial\xi_j}$ is bounded, where $j \le r$. Let $f * g$ be the convolution of $f$ and $g$ (255L). Show that $\frac{\partial}{\partial\xi_j}(f * g)$ is defined everywhere and equal to $f * \frac{\partial g}{\partial\xi_j}$. (Hint: 255Xd.)
>>>(k) Let $(X, \Sigma, \mu)$ be a measure space, $G \subseteq \mathbb{R}^r$ an open set, and $f : X \times G \to \mathbb{R}$ a function. Suppose that
(i) for every $x \in X$, $t \mapsto f(x,t) : G \to \mathbb{R}$ is differentiable;
(ii) there is an integrable function $g$ on $X$ such that $|\frac{\partial f}{\partial\tau_j}(x,t)| \le g(x)$ whenever $x \in X$, $t \in G$ and $j \le r$;
(iii) $\int |f(x,t)|\,\mu(dx)$ exists in $\mathbb{R}$ for every $t \in G$.
Show that $t \mapsto \int f(x,t)\,\mu(dx) : G \to \mathbb{R}$ is differentiable. (Hint: show first that, for a suitable $M$, $|f(x,t) - f(x,t')| \le M|g(x)|\,\|t - t'\|$ for every $t, t' \in G$ and $x \in X$.)
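262Y Further exercises (a) Show that if $T = \langle\tau_{ij}\rangle_{i\le s,\,j\le r}$ is an $s \times r$ matrix then the operator norm $\|T\|$, as defined in 262H, is at most $\sqrt{\sum_{i=1}^s\sum_{j=1}^r \tau_{ij}^2}$.
(b) Give an example of a measurable function $\phi : \mathbb{R}^2 \to \mathbb{R}$ such that $\operatorname{dom}\frac{\partial\phi}{\partial\xi_1}$ is not measurable.
(c) Let $\phi : D \to \mathbb{R}$ be any function, where $D \subseteq \mathbb{R}^r$. Show that $H = \{x : x \in D,\ \phi$ is differentiable relative to its domain at $x\}$ is relatively measurable in $D$, and that $\frac{\partial\phi}{\partial\xi_j}\restriction H$ is measurable for every $j \le r$.
(d) A function $\phi : \mathbb{R}^r \to \mathbb{R}$ is smooth if all its partial derivatives $\frac{\partial\ldots\partial\phi}{\partial\xi_i\partial\xi_j\ldots\partial\xi_l}$ are defined everywhere in $\mathbb{R}^r$ and are continuous. Show that if $f$ is integrable over $\mathbb{R}^r$ and $\phi : \mathbb{R}^r \to \mathbb{R}$ is smooth and has bounded support then the convolution $f * \phi$ is smooth. (Hint: 262Xj, 262Xk.)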
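(e) For $\delta > 0$ set $\tilde\phi_\delta(x) = e^{-1/(\delta^2 - \|x\|^2)}$ if $\|x\| < \delta$, $0$ if $\|x\| \ge \delta$; set $\alpha_\delta = \int \tilde\phi_\delta(x)\,dx$, $\phi_\delta(x) = \alpha_\delta^{-1}\tilde\phi_\delta(x)$ for every $x$. (i) Show that $\phi_\delta : \mathbb{R}^r \to \mathbb{R}$ is smooth and has bounded support. (ii) Show that if $f$ is integrable over $\mathbb{R}^r$ then $\lim_{\delta\downarrow 0}\int |f(x) - (f * \phi_\delta)(x)|\,dx = 0$. (Hint: start with continuous functions $f$ with bounded support, and use 242O.)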
(f) Show that if $f$ is integrable over $\mathbb{R}^r$ and $\epsilon > 0$ there is a smooth function $h$ with bounded support such that $\int |f - h| \le \epsilon$. (Hint: either reduce to the case in which $f$ has bounded support and use 262Ye or adapt the method of 242Xi.)
(g) Suppose that $f$ is a real function which is integrable over every bounded subset of $\mathbb{R}^r$. (i) Show that $f \times \phi$ is integrable whenever $\phi : \mathbb{R}^r \to \mathbb{R}$ is a smooth function with bounded support. (ii) Show that if $\int f \times \phi = 0$ for every smooth function $\phi$ with bounded support then $f = 0$ a.e. (Hint: show that $\int_{B(x,\delta)} f = 0$ for every $x \in \mathbb{R}^r$ and $\delta > 0$, and use 261C. Alternatively show that $\int_E f = 0$ first for $E = [b,c]$, then for open sets $E$, then for arbitrary measurable sets $E$.)
(h) Let $f$ be integrable over $\mathbb{R}^r$, and for $\delta > 0$ let $\phi_\delta : \mathbb{R}^r \to \mathbb{R}$ be the function of 262Ye. Show that $\lim_{\delta\downarrow 0}(f * \phi_\delta)(x) = f(x)$ for every $x$ in the Lebesgue set of $f$. (Hint: 261Ye.)
(i) Let $L$ be the space of all Lipschitz functions from $\mathbb{R}^r$ to $\mathbb{R}^s$ and for $\phi \in L$ set
$$\|\phi\| = \|\phi(0)\| + \inf\{\gamma : \gamma \in [0,\infty[,\ \|\phi(y) - \phi(x)\| \le \gamma\|y - x\| \text{ for every } x, y \in \mathbb{R}^r\}.$$
Show that $(L, \|\ \|)$ is a Banach space.
262 Notes and comments The emphasis of this section has turned out to be on the connexions between the concepts of 'Lipschitz function' and 'differentiable function'. It is the delight of classical real analysis that such intimate relationships arise between concepts which belong to different categories. 'Lipschitz functions' clearly belong to the theory of metric spaces (I will return to this in §264), while 'differentiable functions' belong to the theory of differentiable manifolds, which is outside the scope of this volume. I have written this section out carefully just in case there are readers who have so far missed the theory of differentiable mappings between multi-dimensional Euclidean spaces; but it also gives me a chance to work through the notion of 'function differentiable relative to its domain', which will make it possible in the next section to ride smoothly past a variety of problems arising at boundaries. The difficulties I am concerned with arise in the first place with such functions as the polar-coordinate transformation
$$(\rho,\theta) \mapsto (\rho\cos\theta, \rho\sin\theta) : \{(0,0)\} \cup (]0,\infty[ \times ]-\pi,\pi]) \to \mathbb{R}^2.$$
In order to make this a bijection we have to do something rather arbitrary, and the domain of the transformation cannot be an open set. On the definitions I am using, this function is differentiable relative to its domain at every point of its domain, and we can apply such results as 262O uninhibitedly. You will observe that in this case the non-interior points of the domain form a negligible set $\{(0,0)\} \cup (]0,\infty[ \times \{\pi\})$, so we can expect to be able to ignore them; and for most of the geometrically straightforward transformations that the theory is applied to, judicious excision of negligible sets will reduce problems to the case of honestly differentiable functions with open domains. But while open-domain theory will deal with a large proportion of the most important examples, there is a danger that you would be left with real misapprehensions concerning the scope of these methods.
The essence of differentiability is that a differentiable function $\phi$ is approximable, near any given point of its domain, by an affine function. The idea of 262M is to describe a widely effective method of dissecting $D = \operatorname{dom}\phi$ into countably many pieces on each of which $\phi$ is well-behaved. This will be applied in §§263 and 265 to investigate the measure of $\phi[D]$; but we already have several straightforward consequences (262N-262P).
263 Differentiable transformations in $\mathbb{R}^r$
This section is devoted to the proof of a single major theorem (263D) concerning differentiable transformations between subsets of $\mathbb{R}^r$. There will be a generalization of this result in §265, and those with some familiarity with the topic, or sufficient hardihood, may wish to read §264 before taking this section and §265 together. I end with a few simple corollaries and an extension of the main result which can be made in the one-dimensional case (263I).
Throughout this section, as in the rest of the chapter, $\mu$ will denote Lebesgue measure on $\mathbb{R}^r$.
263A Linear transformations I begin with the special case of linear operators, which is not only the basis of the proof of 263D, but is also one of its most important applications, and is indeed sufficient for many very striking results.
Theorem Let $T$ be a real $r \times r$ matrix; regard $T$ as a linear operator from $\mathbb{R}^r$ to itself. Let $J = |\det T|$ be the modulus of its determinant. Then
$$\mu T[E] = J\mu E$$
for every measurable set $E \subseteq \mathbb{R}^r$. If $T$ is a bijection (that is, if $J \ne 0$), then
$$\mu F = J\mu T^{-1}[F]$$
for every measurable $F \subseteq \mathbb{R}^r$, and
$$\int_F g\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function $g$ and measurable set $F$.
proof (a) The first step is to show that $T[I]$ is measurable for every half-open interval $I \subseteq \mathbb{R}^r$. PPP Any non-empty half-open interval $I = [a,b[$ is a countable union of closed intervals $I_n = [a, b - 2^{-n}\mathbf{1}]$, and each $I_n$ is compact (2A2F), so that $T[I_n]$ is compact (2A2Eb), therefore closed (2A2Ec), therefore measurable (115G), and $T[I] = \bigcup_{n\in\mathbb{N}} T[I_n]$ is measurable. QQQ
(b) Set $J^* = \mu T[\,[\mathbf{0},\mathbf{1}[\,]$, where $\mathbf{0} = (0,\ldots,0)$ and $\mathbf{1} = (1,\ldots,1)$; because $T[\,[\mathbf{0},\mathbf{1}[\,]$ is bounded, $J^* < \infty$. (I will eventually show that $J^* = J$.) It is convenient to deal with the case of singular $T$ first. Recall that $T$, regarded as a linear transformation from $\mathbb{R}^r$ to itself, is either bijective or onto a proper linear subspace. In the latter case, take any $e \in \mathbb{R}^r \setminus T[\mathbb{R}^r]$; then the sets
$$T[\,[\mathbf{0},\mathbf{1}[\,] + \gamma e,$$
as $\gamma$ runs over $[0,1]$, are disjoint and all of the same measure $J^*$, because $\mu$ is translation-invariant (134A); moreover, their union is bounded, so has finite outer measure. As there are infinitely many such $\gamma$, the common measure $J^*$ must be zero. Now observe that
$$T[\mathbb{R}^r] = \bigcup_{z\in\mathbb{Z}^r} T[\,[\mathbf{0},\mathbf{1}[\,] + Tz,$$
and
$$\mu(T[\,[\mathbf{0},\mathbf{1}[\,] + Tz) = J^* = 0$$
for every $z \in \mathbb{Z}^r$, while $\mathbb{Z}^r$ is countable, so $\mu T[\mathbb{R}^r] = 0$. At the same time, because $T$ is singular, it has zero determinant, and $J = 0$. Accordingly
$$\mu T[E] = 0 = J\mu E$$
for every measurable $E \subseteq \mathbb{R}^r$, and we're done.
(c) Henceforth, therefore, let us assume that $T$ is non-singular. Note that it and its inverse are continuous, so that $T$ is a homeomorphism, and $T[G]$ is open iff $G$ is open.
If $a \in \mathbb{R}^r$ and $k \in \mathbb{N}$, then
$$\mu T[\,[a, a + 2^{-k}\mathbf{1}[\,] = 2^{-kr}J^*.$$
PPP Set $J^*_k = \mu T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,]$. Now $T[\,[a, a + 2^{-k}\mathbf{1}[\,] = T[\,[\mathbf{0}, 2^{-k}\mathbf{1}[\,] + Ta$; because $\mu$ is translation-invariant, its measure is also $J^*_k$. Next, $[\mathbf{0},\mathbf{1}[$ is expressible as a disjoint union of $2^{kr}$ sets of the form $[a, a + 2^{-k}\mathbf{1}[$; consequently, $T[\,[\mathbf{0},\mathbf{1}[\,]$ is expressible as a disjoint union of $2^{kr}$ sets of the form $T[\,[a, a + 2^{-k}\mathbf{1}[\,]$, and
$$J^* = \mu T[\,[\mathbf{0},\mathbf{1}[\,] = 2^{kr}J^*_k,$$
that is, $J^*_k = 2^{-kr}J^*$, as claimed. QQQ
(d) Consequently $\mu T[G] = J^*\mu G$ for every open set $G \subseteq \mathbb{R}^r$. PPP For each $k \in \mathbb{N}$, set
$$Q_k = \{z : z \in \mathbb{Z}^r,\ [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[ \,\subseteq G\},$$
$$G_k = \bigcup_{z\in Q_k} [2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[.$$
Then $G_k$ is a disjoint union of $\#(Q_k)$ sets of the form $[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[$, so $\mu G_k = 2^{-kr}\#(Q_k)$; also, $T[G_k]$ is a disjoint union of $\#(Q_k)$ sets of the form $T[\,[2^{-k}z, 2^{-k}z + 2^{-k}\mathbf{1}[\,]$, so has measure $2^{-kr}J^*\#(Q_k) = J^*\mu G_k$, using (c).
Observe next that $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence with union $G$, so that
$$\mu T[G] = \lim_{k\to\infty}\mu T[G_k] = \lim_{k\to\infty} J^*\mu G_k = J^*\mu G.\ \text{QQQ}$$
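The approximation of an open set from inside by dyadic cubes, used in (d), can be watched numerically. The following is only an illustrative sketch in the plane: the open unit disc stands in for $G$, the function name is invented for this example, and for simplicity it counts squares whose closures lie in the disc, which gives a slightly smaller but still non-decreasing family with the same union.

```python
import math

def inner_dyadic_area(k):
    """Total area of the level-k dyadic squares whose closures lie in the open unit disc.

    Illustrative stand-in for mu(G_k) in part (d); testing the closed square
    is a simplification that does not change the limit.
    """
    n = 2 ** k
    h = 1.0 / n
    count = 0
    for i in range(-n, n):
        for j in range(-n, n):
            # Corner of [i*h, (i+1)*h] x [j*h, (j+1)*h] farthest from the origin.
            fx = max(abs(i), abs(i + 1)) * h
            fy = max(abs(j), abs(j + 1)) * h
            if fx * fx + fy * fy < 1.0:
                count += 1
    return count * h * h

# Areas for k = 0, ..., 6: non-decreasing and approaching the area pi of the disc.
areas = [inner_dyadic_area(k) for k in range(7)]
```

The values increase with $k$ and stay below $\pi$, in line with the fact that $\mu G = \lim_{k\to\infty}\mu G_k$.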
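(e) It follows that $\mu^*T[A] = J^*\mu^*A$ for every $A \subseteq \mathbb{R}^r$. PPP Given $A \subseteq \mathbb{R}^r$ and $\epsilon > 0$, there are open sets $G$, $H$ such that $G \supseteq A$, $H \supseteq T[A]$, $\mu G \le \mu^*A + \epsilon$ and $\mu H \le \mu^*T[A] + \epsilon$ (134Fa). Set $G_1 = G \cap T^{-1}[H]$; then $G_1$ is open because $T^{-1}[H]$ is. Now $\mu T[G_1] = J^*\mu G_1$, so
$$\mu^*T[A] \le \mu T[G_1] = J^*\mu G_1 \le J^*\mu^*A + J^*\epsilon \le J^*\mu G_1 + J^*\epsilon = \mu T[G_1] + J^*\epsilon \le \mu H + J^*\epsilon \le \mu^*T[A] + \epsilon + J^*\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*T[A] = J^*\mu^*A$. QQQ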
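(f) Consequently $\mu T[E]$ exists and is equal to $J^*\mu E$ for every measurable $E \subseteq \mathbb{R}^r$. PPP Let $E \subseteq \mathbb{R}^r$ be measurable, and take any $A \subseteq \mathbb{R}^r$. Set $A' = T^{-1}[A]$. Then
$$\begin{aligned}
\mu^*(A \cap T[E]) + \mu^*(A \setminus T[E]) &= \mu^*(T[A' \cap E]) + \mu^*(T[A' \setminus E]) \\
&= J^*\bigl(\mu^*(A' \cap E) + \mu^*(A' \setminus E)\bigr) \\
&= J^*\mu^*A' = \mu^*T[A'] = \mu^*A.
\end{aligned}$$
As $A$ is arbitrary, $T[E]$ is measurable, and now
$$\mu T[E] = \mu^*T[E] = J^*\mu^*E = J^*\mu E.\ \text{QQQ}$$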
(g) We are at last ready for the calculation of $J^*$. Recall that the matrix $T$ must be expressible as $PDQ$, where $P$ and $Q$ are orthogonal matrices and $D$ is diagonal, with non-negative diagonal entries (2A6C). Now we must have
$$T[\,[\mathbf{0},\mathbf{1}[\,] = P[D[Q[\,[\mathbf{0},\mathbf{1}[\,]]],$$
so, using (f),
$$J^* = J^*_P J^*_D J^*_Q,$$
where $J^*_P = \mu P[\,[\mathbf{0},\mathbf{1}[\,]$, etc. Now we find that $J^*_P = J^*_Q = 1$. PPP Let $B = B(\mathbf{0},1)$ be the unit ball of $\mathbb{R}^r$. Because $B$ is closed, it is measurable; because it is bounded, $\mu B < \infty$; and because $B$ includes the non-empty half-open interval $[\mathbf{0}, r^{-1/2}\mathbf{1}[$, $\mu B > 0$. Now $P[B] = Q[B] = B$, because $P$ and $Q$ are orthogonal matrices; so we have
$$\mu B = \mu P[B] = J^*_P\mu B,$$
and $J^*_P$ must be $1$; similarly, $J^*_Q = 1$. QQQ
(h) So we have only to calculate $J^*_D$. Suppose the coefficients of $D$ are $\delta_1,\ldots,\delta_r \ge 0$, so that $Dx = (\delta_1\xi_1,\ldots,\delta_r\xi_r) = d \times x$. We have been assuming since the beginning of (c) that $T$ is non-singular, so no $\delta_i$ can be $0$. Accordingly
$$D[\,[\mathbf{0},\mathbf{1}[\,] = [\mathbf{0}, d[,$$
and
$$J^*_D = \mu[\mathbf{0}, d[ \,= \prod_{i=1}^r \delta_i = \det D.$$
Now because $P$ and $Q$ are orthogonal, both have determinant $\pm 1$, so $\det T = \pm\det D$ and $J^* = \pm\det T$; because $J^*$ is surely non-negative, $J^* = |\det T| = J$.
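The factorization $T = PDQ$ of (g)-(h) says that $|\det T|$ is the product of the diagonal entries of $D$, that is, of the singular values of $T$. As a hedged illustration for the $2 \times 2$ case only, the sketch below obtains the singular values as square roots of the eigenvalues of $T^{\top}T$ (a standard fact, not the route taken through 2A6C) and compares their product with $|\det T|$. The helper name and the sample matrix are invented for this example.

```python
import math

def singular_values_2x2(T):
    """Singular values of a 2x2 matrix T = ((a, b), (c, d)), largest first."""
    (a, b), (c, d) = T
    # Entries of the symmetric matrix T^T T = ((p, q), (q, r)).
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    half_trace = (p + r) / 2.0
    # Eigenvalues of T^T T are half_trace +/- root; clamp tiny negatives from rounding.
    root = math.sqrt(max(half_trace * half_trace - (p * r - q * q), 0.0))
    return (math.sqrt(half_trace + root), math.sqrt(max(half_trace - root, 0.0)))

T = ((2.0, 1.0), (0.5, 3.0))
s1, s2 = singular_values_2x2(T)
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
# s1 * s2 plays the role of det D, and should agree with |det T| up to rounding.
```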
(i) Thus $\mu T[E] = J\mu E$ for every Lebesgue measurable $E \subseteq \mathbb{R}^r$. If $T$ is non-singular, then we may use the above argument to show that $T^{-1}[F]$ is measurable for every measurable $F$, and
$$\mu F = \mu T[T^{-1}[F]] = J\mu T^{-1}[F] = \int J \times \chi(T^{-1}[F])\,d\mu,$$
identifying $J$ with the constant function with value $J$. By 235A,
$$\int_F g\,d\mu = \int_{T^{-1}[F]} JgT\,d\mu = J\int_{T^{-1}[F]} gT\,d\mu$$
for every integrable function $g$ and measurable set $F$.
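For a concrete feel for the identity $\mu T[E] = J\mu E$, here is a small sketch in the plane with $E$ the unit square. The image $T[E]$ is a parallelogram, whose area can be computed independently of the determinant by the shoelace formula for polygons. The function names and sample matrices are assumptions made for this illustration only.

```python
def image_area_of_unit_square(T):
    """Area of T[[0,1] x [0,1]] for a 2x2 matrix T = ((a, b), (c, d)), by the shoelace formula."""
    (a, b), (c, d) = T
    # Images of the corners (0,0), (1,0), (1,1), (0,1), taken in order round the square.
    pts = [(0.0, 0.0), (a, c), (a + b, c + d), (b, d)]
    twice_signed_area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        twice_signed_area += x0 * y1 - x1 * y0
    return abs(twice_signed_area) / 2.0

def det2(T):
    (a, b), (c, d) = T
    return a * d - b * c

T_regular = ((2.0, 1.0), (0.5, 3.0))   # non-singular: area should equal |det| = 5.5
T_singular = ((1.0, 2.0), (2.0, 4.0))  # singular: the image is a line segment, area 0
area_regular = image_area_of_unit_square(T_regular)
area_singular = image_area_of_unit_square(T_singular)
```

The singular case matches part (b) of the proof, where the image lies in a proper linear subspace and is negligible.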
263B Remark Perhaps I should have warned you that I should be calling on the results of §235. But if they were fresh in your mind the formulae of the statement of the theorem will have recalled them, and if not then it is perhaps better to turn back to them now rather than before reading the theorem, since they are used only in the last sentence of the proof.
I have taken the argument above at a leisurely, not to say pedestrian, pace. The point is that while the translation-invariance of Lebesgue measure, and its behaviour under simple magnification of a single coordinate, are more or less built into the definition, its behaviour under general rotations is not, since a rotation takes half-open intervals into skew cuboids. Of course the calculation of the measure of such an object is not really anything to do with the Lebesgue theory, and it will be clear that much of the argument would apply equally to any geometrically reasonable notion of $r$-dimensional volume.
We come now to the central result of the chapter. We have already done some of the detail work in 262M. The next basic element is the following lemma.
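263C Lemma Let $T$ be any $r \times r$ matrix; set $J = |\det T|$. Then for any $\epsilon > 0$ there is a $\zeta = \zeta(T,\epsilon) > 0$ such that
(i) $|\det S - \det T| \le \epsilon$ whenever $S$ is an $r \times r$ matrix and $\|S - T\| \le \zeta$;
(ii) whenever $D \subseteq \mathbb{R}^r$ is a bounded set and $\phi : D \to \mathbb{R}^r$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, then $|\mu^*\phi[D] - J\mu^*D| \le \epsilon\mu^*D$.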
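proof (a) Of course (i) is the easy part. Because $\det S$ is a continuous function of the coefficients of $S$, and the coefficients of $S$ must be close to those of $T$ if $\|S - T\|$ is small (262Hb), there is surely a $\zeta_0 > 0$ such that $|\det S - \det T| \le \epsilon$ whenever $\|S - T\| \le \zeta_0$.
(b)(i) Write $B = B(\mathbf{0},1)$ for the unit ball of $\mathbb{R}^r$, and consider $T[B]$. We know that $\mu T[B] = J\mu B$ (263A). Let $G \supseteq T[B]$ be an open set such that $\mu G \le (J + \epsilon)\mu B$ (134Fa). Because $B$ is compact (2A2F) so is $T[B]$, so there is a $\zeta_1 > 0$ such that $T[B] + \zeta_1 B \subseteq G$ (2A2Ed). This means that
$$\mu^*(T[B] + \zeta_1 B) \le (J + \epsilon)\mu B.$$
(ii) Now suppose that $D \subseteq \mathbb{R}^r$ is a bounded set, and that $\phi : D \to \mathbb{R}^r$ is a function such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta_1\|x - y\|$ for all $x, y \in D$. Then if $x \in D$ and $\delta > 0$,
$$\phi[D \cap B(x,\delta)] \subseteq \phi(x) + \delta T[B] + \delta\zeta_1 B,$$
because if $y \in D \cap B(x,\delta)$ then $T(y - x) \in \delta T[B]$ and
$$\begin{aligned}
\phi(y) &= \phi(x) + T(y - x) + \bigl(\phi(y) - \phi(x) - T(y - x)\bigr) \\
&\in \phi(x) + \delta T[B] + \zeta_1\|y - x\|B \subseteq \phi(x) + \delta T[B] + \delta\zeta_1 B.
\end{aligned}$$
Accordingly
$$\mu^*\phi[D \cap B(x,\delta)] \le \mu^*(\delta T[B] + \delta\zeta_1 B) = \delta^r\mu^*(T[B] + \zeta_1 B) \le \delta^r(J + \epsilon)\mu B = (J + \epsilon)\mu B(x,\delta).$$
Let $\eta > 0$. Then there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls in $\mathbb{R}^r$ such that $D \subseteq \bigcup_{n\in\mathbb{N}} B_n$, $\sum_{n=0}^\infty \mu B_n \le \mu^*D + \eta$ and the sum of the measures of those $B_n$ whose centres do not lie in $D$ is at most $\eta$ (261F). Let $K$ be the set of those $n$ such that the centre of $B_n$ lies in $D$. Then $\mu^*\phi[D \cap B_n] \le (J + \epsilon)\mu B_n$ for every $n \in K$. Also, of course, $\phi$ is $(\|T\| + \zeta_1)$-Lipschitz, so $\mu^*\phi[D \cap B_n] \le (\|T\| + \zeta_1)^r\mu B_n$ for $n \in \mathbb{N}\setminus K$ (262D). Now
$$\begin{aligned}
\mu^*\phi[D] &\le \sum_{n=0}^\infty \mu^*\phi[D \cap B_n] \\
&\le \sum_{n\in K}(J + \epsilon)\mu B_n + \sum_{n\in\mathbb{N}\setminus K}(\|T\| + \zeta_1)^r\mu B_n \\
&\le (J + \epsilon)(\mu^*D + \eta) + \eta(\|T\| + \zeta_1)^r.
\end{aligned}$$
As $\eta$ is arbitrary,
$$\mu^*\phi[D] \le (J + \epsilon)\mu^*D.$$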
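(c) If $J = 0$, we can stop here, setting $\zeta = \min(\zeta_0,\zeta_1)$; for then we surely have $|\det S - \det T| \le \epsilon$ whenever $\|S - T\| \le \zeta$, while if $\phi : D \to \mathbb{R}^r$ is such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, then
$$|\mu^*\phi[D] - J\mu^*D| = \mu^*\phi[D] \le \epsilon\mu^*D.$$
If $J \ne 0$, we have more to do. Because $T$ has non-zero determinant, it has an inverse $T^{-1}$, and $|\det T^{-1}| = J^{-1}$. As in (b-i) above, there is a $\zeta_2 > 0$ such that $\mu^*(T^{-1}[B] + \zeta_2 B) \le (J^{-1} + \epsilon')\mu B$, where $\epsilon' = \epsilon/J(J + \epsilon)$. Repeating (b), we see that if $C \subseteq \mathbb{R}^r$ is bounded and $\psi : C \to \mathbb{R}^r$ is such that $\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le \zeta_2\|u - v\|$ for all $u, v \in C$, then $\mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^*C$.
Now suppose that $D \subseteq \mathbb{R}^r$ is bounded and $\phi : D \to \mathbb{R}^r$ is such that $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta_2'\|x - y\|$ for all $x, y \in D$, where $\zeta_2' = \min(\zeta_2, \|T^{-1}\|)/2\|T^{-1}\|^2 > 0$. Then
$$\|T^{-1}(\phi(x) - \phi(y)) - (x - y)\| \le \|T^{-1}\|\zeta_2'\|x - y\| \le \tfrac12\|x - y\|$$
for all $x, y \in D$, so $\phi$ must be injective; set $C = \phi[D]$ and $\psi = \phi^{-1} : C \to D$. Note that $C$ is bounded, because
$$\|\phi(x) - \phi(y)\| \le (\|T\| + \zeta_2')\|x - y\|$$
whenever $x, y \in D$. Also
$$\|T^{-1}(u - v) - (\psi(u) - \psi(v))\| \le \|T^{-1}\|\zeta_2'\|\psi(u) - \psi(v)\| \le \tfrac12\|\psi(u) - \psi(v)\|$$
for all $u, v \in C$. But this means that
$$\|\psi(u) - \psi(v)\| - \|T^{-1}\|\|u - v\| \le \tfrac12\|\psi(u) - \psi(v)\|$$
and $\|\psi(u) - \psi(v)\| \le 2\|T^{-1}\|\|u - v\|$ for all $u, v \in C$, so that
$$\|\psi(u) - \psi(v) - T^{-1}(u - v)\| \le 2\zeta_2'\|T^{-1}\|^2\|u - v\| \le \zeta_2\|u - v\|$$
for all $u, v \in C$.
By (b) just above, it follows that
$$\mu^*D = \mu^*\psi[C] \le (J^{-1} + \epsilon')\mu^*C = (J^{-1} + \epsilon')\mu^*\phi[D],$$
and
$$J\mu^*D \le (1 + J\epsilon')\mu^*\phi[D].$$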
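(d) So if we set $\zeta = \min(\zeta_0, \zeta_1, \zeta_2') > 0$, and if $D \subseteq \mathbb{R}^r$, $\phi : D \to \mathbb{R}^r$ are such that $D$ is bounded and $\|\phi(x) - \phi(y) - T(x - y)\| \le \zeta\|x - y\|$ for all $x, y \in D$, we shall have
$$\mu^*\phi[D] \le (J + \epsilon)\mu^*D,$$
$$\mu^*\phi[D] \ge J\mu^*D - J\epsilon'\mu^*\phi[D] \ge J\mu^*D - J\epsilon'(J + \epsilon)\mu^*D = J\mu^*D - \epsilon\mu^*D,$$
so we get the required formula
$$|\mu^*\phi[D] - J\mu^*D| \le \epsilon\mu^*D.$$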
263D We are ready for the theorem.
Theorem Let $D \subseteq \mathbb{R}^r$ be any set, and $\phi : D \to \mathbb{R}^r$ a function differentiable relative to its domain at each point of $D$. For each $x \in D$ let $T(x)$ be a derivative of $\phi$ relative to $D$ at $x$, and set $J(x) = |\det T(x)|$. Then
(i) $J : D \to [0,\infty[$ is a measurable function,
(ii) $\mu^*\phi[D] \le \int_D J\,d\mu$,
allowing $\infty$ as the value of the integral. If $D$ is measurable, then
(iii) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(iv) $\mu\phi[D] = \int_D J\,d\mu$,
(v) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof (a) To see that $J$ is measurable, use 262P; the function $T \mapsto |\det T|$ is a continuous function of the coefficients of $T$, and the coefficients of $T(x)$ are measurable functions of $x$, by 262P, so $x \mapsto |\det T(x)|$ is measurable (121K). We also know that if $D$ is measurable, $\phi[D]$ will be measurable, by 262Ob. Thus (i) and (iii) are done.
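(b) For the moment, assume that $D$ is bounded, and fix $\epsilon > 0$. For $r \times r$ matrices $T$, take $\zeta(T,\epsilon) > 0$ as in 263C. Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which are relatively measurable in $D$, and each $T_n$ is an $r \times r$ matrix such that
$$\|T(x) - T_n\| \le \zeta(T_n,\epsilon) \text{ whenever } x \in D_n,$$
$$\|\phi(x) - \phi(y) - T_n(x - y)\| \le \zeta(T_n,\epsilon)\|x - y\| \text{ for all } x, y \in D_n.$$
Then, setting $J_n = |\det T_n|$, we have
$$|J(x) - J_n| \le \epsilon \text{ for every } x \in D_n,$$
$$|\mu^*\phi[D_n] - J_n\mu^*D_n| \le \epsilon\mu^*D_n,$$
by the choice of $\zeta(T_n,\epsilon)$. So we have
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu^*D_n + \epsilon\mu^*D \le \int_D J\,d\mu + 2\epsilon\mu^*D;$$
I am using here the fact that all the $D_n$ are relatively measurable in $D$, so that, in particular, $\mu^*D = \sum_{n=0}^\infty \mu^*D_n$. Next,
$$\mu^*\phi[D] \le \sum_{n=0}^\infty \mu^*\phi[D_n] \le \sum_{n=0}^\infty J_n\mu^*D_n + \epsilon\mu^*D.$$
Putting these together,
$$\mu^*\phi[D] \le \int_D J\,d\mu + 2\epsilon\mu^*D.$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measurable, and they are also disjoint. Accordingly
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu D_n + \epsilon\mu D \le \sum_{n=0}^\infty (\mu\phi[D_n] + \epsilon\mu D_n) + \epsilon\mu D = \mu\phi[D] + 2\epsilon\mu D.$$
Since $\epsilon$ is arbitrary, we get
$$\mu^*\phi[D] \le \int_D J\,d\mu,$$
and if $D$ is measurable and $\phi$ is injective,
$$\int_D J\,d\mu \le \mu\phi[D];$$
thus we have (ii) and (iv), on the assumption that $D$ is bounded.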
(c) For a general set $D$, set $B_k = B(\mathbf{0},k)$; then
$$\mu^*\phi[D] = \lim_{k\to\infty}\mu^*\phi[D \cap B_k] \le \lim_{k\to\infty}\int_{D\cap B_k} J\,d\mu = \int_D J\,d\mu,$$
with equality if $\phi$ is injective and $D$ is measurable.
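(d) For part (v), I seek to show that the hypotheses of 235J are satisfied, taking $X = D$ and $Y = \phi[D]$. PPP Set $G = \{x : x \in D,\ J(x) > 0\}$.
($\alpha$) If $F \subseteq \phi[D]$ is measurable, then there are Borel sets $F_1$, $F_2$ such that $F_1 \subseteq F \subseteq F_2$ and $\mu(F_2 \setminus F_1) = 0$. Set $E_j = \phi^{-1}[F_j]$ for each $j$, so that $E_1 \subseteq \phi^{-1}[F] \subseteq E_2$, and both the sets $E_j$ are measurable, because $\phi$ and $\operatorname{dom}\phi$ are measurable. Now, applying (iv) to $\phi\restriction E_j$,
$$\int_{E_j} J\,d\mu = \mu\phi[E_j] = \mu(F_j \cap \phi[D]) = \mu F$$
for both $j$, so $\int_{E_2\setminus E_1} J\,d\mu = 0$ and $J = 0$ a.e. on $E_2 \setminus E_1$. Accordingly $J \times \chi(\phi^{-1}[F]) =_{\text{a.e.}} J \times \chi E_1$, and $\int J \times \chi(\phi^{-1}[F])\,d\mu$ exists and is equal to $\int_{E_1} J\,d\mu = \mu F$. At the same time, $(\phi^{-1}[F] \cap G) \setminus (E_1 \cap G)$ is negligible, so $\phi^{-1}[F] \cap G$ is measurable.
($\beta$) If $F \subseteq \phi[D]$ and $G \cap \phi^{-1}[F]$ is measurable, then we know that $\mu\phi[D \setminus G] = \int_{D\setminus G} J = 0$ (by (iv)), so $F \setminus \phi[G]$ must be negligible; while $F \cap \phi[G] = \phi[G \cap \phi^{-1}[F]]$ also is measurable, by (iii). Accordingly $F$ is measurable whenever $G \cap \phi^{-1}[F]$ is measurable.
Thus all the hypotheses of 235J are satisfied. QQQ Now (v) can be read off from the conclusion of 235J.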
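Part (iv) of the theorem can be checked by hand in simple cases. As a rough numerical sketch (the rectangle, step count and function names are choices made for this illustration, not part of the text), take the polar map $(\rho,\theta) \mapsto (\rho\cos\theta, \rho\sin\theta)$ on $D = [1,2] \times [0,\pi/2]$, where it is injective. Its image is a quarter of the annulus between radii $1$ and $2$, of area $\pi(2^2 - 1^2)/4$, and a midpoint-rule approximation of $\int_D J\,d\mu$ should agree with that.

```python
import math

def polar_jacobian(rho, theta):
    """|det| of the derivative of (rho, theta) -> (rho cos theta, rho sin theta)."""
    a, b = math.cos(theta), -rho * math.sin(theta)
    c, d = math.sin(theta), rho * math.cos(theta)
    return abs(a * d - b * c)

def integral_of_jacobian(rho0, rho1, th0, th1, n=200):
    """Midpoint-rule approximation of the integral of J over [rho0, rho1] x [th0, th1]."""
    hr = (rho1 - rho0) / n
    ht = (th1 - th0) / n
    total = 0.0
    for i in range(n):
        rho = rho0 + (i + 0.5) * hr
        for j in range(n):
            theta = th0 + (j + 0.5) * ht
            total += polar_jacobian(rho, theta) * hr * ht
    return total

estimate = integral_of_jacobian(1.0, 2.0, 0.0, math.pi / 2)
exact_area = math.pi * (2.0 ** 2 - 1.0 ** 2) / 4  # area of the quarter annulus
```

Because $J(\rho,\theta) = \rho$ is linear in $\rho$, the midpoint rule is exact here up to rounding, so the two numbers agree very closely.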
263E Remarks (a) This is a version of the classical result on change of variable in a many-dimensional integral. What I here call $J(x)$ is the Jacobian of $\phi$ at $x$; it describes the change in volumes of objects near $x$, following the rule already established in 263A for functions with constant derivative. The idea of the proof is also the classical one: to break the set $D$ up into small enough pieces $D_m$ for us to be able to approximate $\phi$ by affine operators $y \mapsto \phi(x) + T_m(y - x)$ on each. The potential irregularity of the set $D$, which in this theorem may be any set, is compensated for by a corresponding freedom in choosing the sets $D_m$. In fact there is a further decomposition of the sets $D_m$ hidden in part (b-ii) of the proof of 263C; each $D_m$ is essentially covered by a disjoint family of balls, the measures of whose images we can estimate with an adequate accuracy. There is always a danger of a negligible exceptional set, and we need the crude inequalities of the proof of 262D to deal with it.
(b) Throughout the work of this chapter, from 261B to 263D, I have chosen balls $B(x,\delta)$ as the basic shapes to work with. I think it should be clear that in fact any 'reasonable' shapes would do just as well. In particular, the 'balls'
$$B_1(x,\delta) = \{y : \textstyle\sum_{i=1}^r |\eta_i - \xi_i| \le \delta\},\quad B_\infty(x,\delta) = \{y : |\eta_i - \xi_i| \le \delta\ \forall\, i\}$$
would serve perfectly. There are many alternatives. We could use sets of the form $C(x,k)$, for $x \in \mathbb{R}^r$ and $k \in \mathbb{N}$, defined to be the half-open cube of the form $[2^{-k}z, 2^{-k}(z + \mathbf{1})[$ with $z \in \mathbb{Z}^r$ containing $x$, instead; or even $C'(x,\delta) = [x, x + \delta\mathbf{1}[$. In all such cases we have versions of the density theorems (261Yb-261Yc) which support the remaining theory.
(c) I have presented 263D as a theorem about differentiable functions, because that is the normal form in which one uses it in elementary applications. However, the proof depends essentially on the fact that a differentiable function is a countable union of Lipschitz functions, and 263D would follow at once from the same theorem proved for Lipschitz functions only. Now the fact is that the theorem applies to any countable union of Lipschitz functions, because a Lipschitz function is differentiable almost everywhere. For more advanced work (see Federer 69 or Evans & Gariepy 92, or Chapter 47 in Volume 4) it seems clear that Lipschitz functions are the vital ones, so I spell out the result.
*263F Corollary Let $D \subseteq \mathbb{R}^r$ be any set and $\phi : D \to \mathbb{R}^r$ a Lipschitz function. Let $D_1$ be the set of points at which $\phi$ has a derivative relative to $D$, and for each $x \in D_1$ let $T(x)$ be such a derivative, with $J(x) = |\det T(x)|$. Then
(i) $D \setminus D_1$ is negligible;
(ii) $J : D_1 \to [0,\infty[$ is measurable;
(iii) $\mu^*\phi[D] \le \int_D J(x)\,dx$.
If $D$ is measurable, then
(iv) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(v) $\mu\phi[D] = \int_D J\,d\mu$,
(vi) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.
proof This is now just a matter of putting 262Q and 263D together, with a little help from 262D. Use 262Q to show that $D \setminus D_1$ is negligible, 262D to show that $\phi[D \setminus D_1]$ is negligible, and apply 263D to $\phi\restriction D_1$.
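263G Polar coordinates in the plane I offer an elementary example with a useful consequence. Define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by setting $\phi(\rho,\theta) = (\rho\cos\theta, \rho\sin\theta)$ for $(\rho,\theta) \in \mathbb{R}^2$. Then
$$\phi'(\rho,\theta) = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix},\ \text{so } J(\rho,\theta) = |\rho| \text{ for all } \rho, \theta.$$
Of course $\phi$ is not injective, but if we restrict it to the domain $D = \{(0,0)\} \cup \{(\rho,\theta) : \rho > 0,\ -\pi < \theta \le \pi\}$ then $\phi\restriction D$ is a bijection between $D$ and $\mathbb{R}^2$, and
$$\int g\,d\xi_1 d\xi_2 = \int_D g(\phi(\rho,\theta))\rho\,d\rho\,d\theta$$
for every real-valued function $g$ which is integrable over $\mathbb{R}^2$.
Suppose, in particular, that we set
$$g(x) = e^{-\|x\|^2/2} = e^{-\xi_1^2/2}e^{-\xi_2^2/2}$$
for $x = (\xi_1,\xi_2) \in \mathbb{R}^2$. Then
$$\int g(x)\,dx = \int e^{-\xi_1^2/2}\,d\xi_1 \int e^{-\xi_2^2/2}\,d\xi_2,$$
as in 253D. Setting $I = \int e^{-t^2/2}\,dt$, we have $\int g = I^2$. (To see that $I$ is well-defined in $\mathbb{R}$, note that the integrand is continuous, therefore measurable, and that
$$\int_{-1}^1 e^{-t^2/2}\,dt \le 2,$$
$$\int_{-\infty}^{-1} e^{-t^2/2}\,dt = \int_1^\infty e^{-t^2/2}\,dt \le \int_1^\infty e^{-t/2}\,dt = \lim_{a\to\infty}\int_1^a e^{-t/2}\,dt = 2e^{-1/2}$$
are both finite.) Now looking at the alternative expression we have
$$\begin{aligned}
I^2 = \int g(x)\,dx &= \int_D g(\rho\cos\theta, \rho\sin\theta)\rho\,d(\rho,\theta) \\
&= \int_D e^{-\rho^2/2}\rho\,d(\rho,\theta) = \int_0^\infty\int_{-\pi}^{\pi} \rho e^{-\rho^2/2}\,d\theta\,d\rho
\end{aligned}$$
(ignoring the point $(0,0)$, which has zero measure)
$$\begin{aligned}
&= \int_0^\infty 2\pi\rho e^{-\rho^2/2}\,d\rho = 2\pi\lim_{a\to\infty}\int_0^a \rho e^{-\rho^2/2}\,d\rho \\
&= 2\pi\lim_{a\to\infty}(-e^{-a^2/2} + 1) = 2\pi.
\end{aligned}$$
Consequently
$$\int_{-\infty}^\infty e^{-t^2/2}\,dt = I = \sqrt{2\pi},$$
which is one of the many facts every mathematician should know, and in particular is vital for Chapter 27 below.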
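Both routes to the value of $I$ can be compared numerically. The sketch below is only a sanity check: it uses a composite Simpson rule (my choice of quadrature, with the infinite ranges truncated at $\pm 12$, where the integrand is far below rounding error) to approximate $I$ directly and to approximate the radial integral $2\pi\int_0^\infty \rho e^{-\rho^2/2}\,d\rho$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

# Direct approximation of I = integral of exp(-t^2/2) over the real line.
I = simpson(lambda t: math.exp(-t * t / 2.0), -12.0, 12.0)

# Polar-coordinate side: 2*pi times the integral of rho*exp(-rho^2/2) over [0, infinity[.
radial = 2.0 * math.pi * simpson(lambda r: r * math.exp(-r * r / 2.0), 0.0, 12.0)
```

Up to quadrature error, `I * I` and `radial` should both be close to $2\pi$, and `I` close to $\sqrt{2\pi}$.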
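263H Corollary If $k \in \mathbb{N}$ is odd,
$$\int_{-\infty}^\infty x^k e^{-x^2/2}\,dx = 0;$$
if $k = 2l \in \mathbb{N}$ is even, then
$$\int_{-\infty}^\infty x^k e^{-x^2/2}\,dx = \frac{(2l)!}{2^l l!}\sqrt{2\pi}.$$
proof (a) To see that all the integrals are well-defined and finite, observe that $\lim_{x\to\pm\infty} x^k e^{-x^2/4} = 0$, so that $M_k = \sup_{x\in\mathbb{R}} |x^k e^{-x^2/4}|$ is finite, and
$$\int_{-\infty}^\infty |x^k e^{-x^2/2}|\,dx \le M_k\int_{-\infty}^\infty e^{-x^2/4}\,dx < \infty.$$
(b) If $k$ is odd, then substituting $y = -x$ we get
$$\int_{-\infty}^\infty x^k e^{-x^2/2}\,dx = -\int_{-\infty}^\infty y^k e^{-y^2/2}\,dy,$$
so that both integrals must be zero.
(c) For even $k$, proceed by induction. Set $I_l = \int_{-\infty}^\infty x^{2l}e^{-x^2/2}\,dx$. $I_0 = \sqrt{2\pi} = \frac{0!}{2^0 0!}\sqrt{2\pi}$ by 263G. For the inductive step to $l + 1 \ge 1$, integrate by parts to see that
$$\int_{-a}^a x^{2l+1}\cdot xe^{-x^2/2}\,dx = -a^{2l+1}e^{-a^2/2} + (-a)^{2l+1}e^{-a^2/2} + \int_{-a}^a (2l+1)x^{2l}e^{-x^2/2}\,dx$$
for every $a \ge 0$. Letting $a \to \infty$,
$$I_{l+1} = (2l+1)I_l.$$
Because
$$\frac{(2(l+1))!}{2^{l+1}(l+1)!}\sqrt{2\pi} = (2l+1)\frac{(2l)!}{2^l l!}\sqrt{2\pi},$$
the induction proceeds.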
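The closed form of the corollary and the recurrence $I_{l+1} = (2l+1)I_l$ can be compared with direct numerical integration. This is a hedged sketch only: the quadrature rule, the truncation of the range to $[-12, 12]$ and the function names are assumptions made for the illustration.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def moment_closed_form(k):
    """Value given by 263H for the integral of x^k exp(-x^2/2) over the real line."""
    if k % 2 == 1:
        return 0.0
    l = k // 2
    return math.factorial(2 * l) / (2 ** l * math.factorial(l)) * math.sqrt(2.0 * math.pi)

def moment_numeric(k):
    """Simpson approximation of the same integral, truncated to [-12, 12]."""
    return simpson(lambda x: x ** k * math.exp(-x * x / 2.0), -12.0, 12.0)

numeric = [moment_numeric(k) for k in range(7)]
closed = [moment_closed_form(k) for k in range(7)]
```

For even $k = 2l$ the closed-form values are $\sqrt{2\pi}$ times $1, 1, 3, 15, \ldots$, each obtained from the previous one by multiplying by $2l+1$.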
263I The one-dimensional case The restriction to injective functions in 263D(v) is unavoidable in the context of the result there. But in the substitutions of elementary calculus it is not always essential. In the hope of clarifying the position I give a result here which covers many of the standard tricks.
Theorem Let $I \subseteq \mathbb{R}$ be an interval with more than one point, and $\phi : I \to \mathbb{R}$ a function which is absolutely continuous on any closed bounded subinterval of $I$. Write $u = \inf I$, $u' = \sup I$ in $[-\infty,\infty]$, and suppose that $v = \lim_{x\downarrow u}\phi(x)$ and $v' = \lim_{x\uparrow u'}\phi(x)$ are defined in $[-\infty,\infty]$. Let $g$ be a Lebesgue measurable real-valued function defined almost everywhere in $\phi[I]$. Then
$$\int_v^{v'} g = \int_I g(\phi(x))\phi'(x)\,dx$$
whenever the right-hand side is defined in $\mathbb{R}$, on the understanding that we interpret $\int_v^{v'} g$ as $-\int_{v'}^v g$ when $v' < v$, and $g(\phi(x))\phi'(x)$ as $0$ when $\phi'(x) = 0$ and $g(\phi(x))$ is undefined.


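proof (a) Recall that $\phi$ is differentiable almost everywhere on $I$ (225Cb) and that $\phi[A]$ is negligible for every negligible $A \subseteq I$ (225G). (These results are stated for closed bounded intervals; but since any interval is expressible as the union of a sequence of closed bounded intervals, they remain valid in the present context.) Set $D = \operatorname{dom}\phi'$, so that $I \setminus D$ and $\phi[I \setminus D]$ are negligible. Next, setting $D_0 = \{x : x \in D,\ \phi'(x) = 0\}$, $D$ and $D_0$ are Borel sets (225J) and $\phi[D_0]$ is negligible, by 263D(ii), while
$$\int_I g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)\,dx.$$
Applying 262M with $A = \mathbb{R}\setminus\{0\}$ and $\epsilon(\alpha) = \frac12|\alpha|$ for $\alpha \in A$, we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ such that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $D \setminus D_0$ into measurable sets, every $\alpha_n$ is non-zero, and $|\phi(x) - \phi(y) - \alpha_n(x - y)| \le \frac12|\alpha_n||x - y|$ for all $x, y \in E_n$; so that, in particular, $\phi\restriction E_n$ is injective, while $\operatorname{sgn}\phi'(x) = \operatorname{sgn}\alpha_n$ for every $x \in E_n$, writing $\operatorname{sgn}\alpha = \alpha/|\alpha|$ as usual. Set $\epsilon_n = \operatorname{sgn}\alpha_n$ for each $n$. Now 263D(v) tells us that
$$\sum_{n=0}^\infty \int |g| \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \int_{E_n} |g(\phi(x))\phi'(x)|\,dx$$
is finite.
Note that 263D(v) also shows that if $B \subseteq \mathbb{R}$ is negligible, then $E_n \cap \phi^{-1}[B]$ must be negligible for every $n$, so that
$$\int_{\phi^{-1}[B]} g(\phi(x))\phi'(x)\,dx = 0.$$
Consequently, setting
$$C_0 = \Bigl\{y : y \in (\phi[I] \cap \operatorname{dom} g) \setminus (\{v, v'\} \cup \phi[I \setminus D] \cup \phi[D_0]),\ \sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)| < \infty\Bigr\},$$
$\phi[I] \setminus C_0$ is negligible, and if we set $C = \{y : y \in C_0,\ g(y) \ne 0\}$,
$$\int_{J\cap C} g = \int_J g$$
for every $J \subseteq \phi[I]$.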
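(b) The point of the argument is the following fact: if $y \in C$ then
$$\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = \begin{cases} 1 & \text{if } v < y < v', \\ -1 & \text{if } v' < y < v, \\ 0 & \text{if } y < \min(v, v') \text{ or } \max(v, v') < y. \end{cases}$$
PPP Because $g(y) \ne 0$ and $\sum_{n=0}^\infty |g(y)\chi(\phi[E_n])(y)|$ is finite, $\{n : y \in \phi[E_n]\}$ is finite; because $y \notin \phi[I \setminus D] \cup \phi[D_0]$, and $\phi\restriction E_n$ is injective for every $n$, and $\bigcup_{n\in\mathbb{N}} E_n = D \setminus D_0$, $K = \phi^{-1}[\{y\}]$ is finite. For each $x \in K$, let $n_x$ be such that $x \in E_{n_x}$; then $\epsilon_{n_x} = \operatorname{sgn}\phi'(x)$. So
$$\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = \sum_{x\in K}\operatorname{sgn}\phi'(x).$$
If $J \subseteq I \setminus K$ is an interval, $\phi(z) \ne y$ for $z \in J$; since $\phi$ is continuous, the Intermediate Value Theorem tells us that $\operatorname{sgn}(\phi(z) - y)$ is constant on $J$. A simple induction on $\#(K \cap ]-\infty, z[)$ shows that $\operatorname{sgn}(\phi(z) - y) = \operatorname{sgn}(v - y) + 2\sum_{x\in K,\,x<z}\operatorname{sgn}\phi'(x)$ for every $z \in I \setminus K$; taking the limit as $z \uparrow u'$,
$$\sum_{x\in K}\operatorname{sgn}\phi'(x) = \tfrac12\bigl(\operatorname{sgn}(v' - y) - \operatorname{sgn}(v - y)\bigr).$$
(Here we may have to interpret $\operatorname{sgn}(\pm\infty)$ as $\pm 1$ in the obvious way.) This turns out to be just what we need to know. QQQ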
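(c) What this means is that
$$\int_v^{v'} g = \int_v^{v'} g \times \chi C = \int_C g \times \sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])$$
(allowing for the conventional reversal if $v' < v$)
$$= \sum_{n=0}^\infty \epsilon_n\int_C g \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \epsilon_n\int_{E_n\cap\phi^{-1}[C]} g(\phi(x))|\phi'(x)|\,dx$$
(263D(v) again, applied to $\phi\restriction(E_n \cap \phi^{-1}[C])$ for each $n$)
$$= \sum_{n=0}^\infty \epsilon_n\int_{E_n} g(\phi(x))|\phi'(x)|\,dx$$
(because $g(\phi(x))\phi'(x) = 0$ almost everywhere in $E_n \setminus \phi^{-1}[C]$)
$$= \sum_{n=0}^\infty \int_{E_n} g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)\,dx = \int_I g(\phi(x))\phi'(x)\,dx,$$
as claimed.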
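263X Basic exercises (a) Let $(X, \Sigma, \mu)$ be any measure space, $f \in \mathcal{L}^0(\mu)$ and $p \in [1, \infty[$. Show that $f \in \mathcal{L}^p(\mu)$ iff
\[ \gamma = p\int_0^\infty \alpha^{p-1}\,\mu\{x : x \in \operatorname{dom} f,\ |f(x)| > \alpha\}\,d\alpha \]
is finite, and in this case $\|f\|_p = \gamma^{1/p}$. (Hint: $\int |f|^p = \int_0^\infty \mu\{x : |f(x)|^p > \beta\}\,d\beta$, by 252O; now substitute $\beta = \alpha^p$.)
(b) Let $f$ be an integrable function defined almost everywhere in $\mathbb{R}^r$. Show that if $\alpha < r - 1$ then $\sum_{n=1}^\infty n^\alpha |f(nx)|$ is finite for almost every $x \in \mathbb{R}^r$. (Hint: estimate $\sum_{n=0}^\infty n^\alpha \int_B |f(nx)|\,dx$ for any ball $B$ centered at the origin.)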
(c) Let A ]0, 1[ be a set such that

A =

([0, 1] A) = 1, where is Lebesgue measure on R. Set D = Ax :


x ]0, 1[ A [1, 1], and set (x) = [x[ for x D. Show that is injective, that is dierentiable relative to its
domain everywhere in D, and that

[D] <
_
D
[

(x)[dx.
(d) Let : D R
r
be a function dierentiable relative to D at each point of D R
r
, and suppose that for each
x D there is a non-singular derivative T(x) of at x; set J(x) = [ det T(x)[. Show that D is expressible as

kN
D
k
where D
k
= D D
k
and D
k
is injective for each k.
>>>(e)(i) Show that for any Lebesgue measurable E R, t R0,
_
tE
1
|u|
du =
_
E
1
|u|
du. (ii) For t R, u R0
set (t, u) = (
t
u
, u). Show that
_
[E]
1
|tu|
d(t, u) =
_
E
1
|tu|
d(t, u) for any Lebesgue measurable E R
2
.
>>>(f ) Dene : R
3
R
3
by setting
(, , ) = ( sin sin , sin cos , cos ).
Show that det

(, , ) =
2
sin .
(g) Show that if $k = 2l + 1$ is odd, then $\int_0^\infty x^k e^{-x^2/2}\,dx = 2^l\, l!$. (Compare 252Xi.)
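A minimal numerical sketch of the identity in (g), assuming that truncating the integral at an upper limit of 40 and using Simpson's rule is accurate enough for small $k$; the function name `moment` and its parameters are illustrative choices, not anything taken from the text. It is a sanity check, not a proof.

```python
import math

def moment(k, upper=40.0, steps=40_000):
    """Approximate the integral of x**k * exp(-x*x/2) over [0, upper]
    by composite Simpson's rule (steps must be even)."""
    h = upper / steps

    def g(x):
        return x**k * math.exp(-x * x / 2.0)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0

# Compare the numerical value with 2**l * l! for k = 2l + 1
for l in range(5):
    k = 2 * l + 1
    print(k, moment(k), 2**l * math.factorial(l))
```

The substitution $t = x^2/2$ turns the integral into $2^l\int_0^\infty t^l e^{-t}\,dt$, which is why the factorial appears.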
263Y Further exercises (a) Dene a measure on R by setting E =
_
E
1
|x|
dx for Lebesgue measurable sets
E R. For f, g L
1
() set (f g)(x) =
_
f(
x
t
)g(t)(dt) whenever this is dened in R. (i) Show that f g =
g f L
1
(). (ii) Show that
_
h(x)(f g)(x)(dx) =
_
h(xy)f(x)f(y)(dx)(dy) for every h L

(). (iii) Show that


f (g h) = (f g) h for every h L
1
(). (Hint: 263Xe.)
(b) Let E R
2
be a measurable set such that limsup

2
(EB(0, )) > 0, writing
2
for Lebesgue measure
on R
2
. Show that there is some ], ] such that
1
E

= , where E

= : 0, ( cos , sin ) E. (Hint:


show that
1

2
(E B(0, ))
_

min(
1
2
,
1

1
E

)d.) Generalize to higher dimensions and to functions other than


E.
(c) Let E R
r
be a measurable set, and : E R
r
a function dierentiable relative to its domain, with a
derivative T(x), at each point x of E; set J(x) = [ det T(x)[. Show that for any integrable function g dened on [E],
_
g(y)#(
1
[y])dy =
_
E
J(x)g((x))dx
(Hint: 263I.)
(d) Find a proof of 263I based on the ideas of 225. (Hint: 225Xg.)
(e) Let f : [a, b] R be a function of bounded variation, where a < b in R, with Lebesgue decomposition
f = f
p
+ f
cs
+ f
ac
as in 226Cd; let be Lebesgue measure on R. Show that the following are equiveridical: (i) f
cs
is
constant; (ii) f[ [a, b] ]
_
b
a
[f

[d; (iii)

f[A]
_
A
[f

[d for every A [a, b]; (iv) f[A] is negligible for every negligible
set A [a, b]. (Hint: for (iv)(i) put 226Yd and 263D(ii) together to show that [f(d) f(c)[
_
d
c
[f

[d +Var
[c,d]
f
p
whenever a c d b, and therefore that Var
[a,b]
f Var
[a,b]
f
p
+ Var
[a,b]
f
ac
.)
263 Notes and comments Yet again, approaching 263D, I find myself having to choose between giving an accessible, relatively weak result and making the extra effort to set out a theorem which is somewhere near the natural boundary of what is achievable within the concepts being developed in this volume; and, as usual, I go for the more powerful form. There are three basic sources of difficulty: (i) the fact that we are dealing with more than one dimension; (ii) the fact that we are dealing with irregular domains; (iii) the fact that we are dealing with arbitrary integrable functions. I do not think I need to apologise for (iii) in a book on measure theory. Concerning (ii), it is quite true that the principal applications of these results are to cases in which the transformation $\phi$ is differentiable everywhere, with continuous derivative, and the set $D$ has negligible boundary; and in these cases there are substantial simplifications available, mostly because the sets $D_m$ of the proof of 263D can be taken to be cubes. Nevertheless, I think any form of the result which makes such assumptions is deeply unsatisfactory at this level, being an awkward compromise between ideas natural to the Riemann integral and those natural to the Lebesgue integral. Concerning (i), it might even have been right to lay out the whole argument for the case $r = 1$ before proceeding to the general case, as I did in §§114-115, because the one-dimensional case is already important and interesting; and if you find the work above difficult (which it is) and your immediate interests are in one-dimensional integration by substitution, then I think you might find it worth your time to reproduce the $r = 1$ argument yourself, up to a proof of 263I. In fact the biggest difference is in 263A, which becomes nearly trivial; the work of 262M and 263C becomes more readable, because all the matrices turn into scalars and we can drop the word 'determinant', but I do not think we can dispense with any of the ideas, at least if we wish to obtain 263D as stated. (But see 263Yd.)
I found myself insisting, in the last paragraph, that a distinction can be made between 'ideas natural to the Riemann integral and those natural to the Lebesgue integral'. We are approaching deep questions here, like 'what are books on measure theory for?', which I do not think can be answered without some (possibly unconscious) reference to the question 'what is mathematics for?'. I do of course want to present here some of the wonderful general theorems which arise in the Lebesgue theory. But more important than any specific theorem is a general idea of what can be proved by these methods. It is the essence of modern measure theory that continuity does not matter, or, if you prefer, that measurable functions are in some sense so nearly continuous that we do not have to add hypotheses of continuity in our theorems. Now this is in a sense a great liberation, and the Lebesgue integral is now the standard one. But you must not regard the Riemann integral as outdated. The intuitions on which it is founded (for instance, that the surface of a solid body has zero volume) remain of great value in their proper context, which certainly includes the study of differentiable functions with continuous derivatives. What I am saying here is that I believe we can use these intuitions best if we maintain a division, a flexible and permeable one, of course, between the ideas of the two theories; and that when transferring a theorem from one side of the boundary to the other we should do so whole-heartedly, seeking to express the full power of the methods we are using.
I have already said that the essential difference between the one-dimensional and multi-dimensional cases lies in 263A, where the Jacobian $J = |\det T|$ enters the argument. Shorn of the technical devices necessary to deal with arbitrary Lebesgue measurable sets, this amounts to a calculation of the volume of the parallelepiped $T[I]$ where $I$ is the interval [0, 1[. I have dealt with this by a little bit of algebra, saying that the result is essentially obvious if $T$ is diagonal, whereas if $T$ is an isometry it follows from the fact that the unit ball is left invariant; and the algebra comes in to express an arbitrary matrix as a product of diagonal and orthogonal matrices (2A6C). It is also plain from 261F that Lebesgue measure must be rotation-invariant as well as translation-invariant; that is to say, it is invariant under all isometries. Another way of looking at this will appear in the next section.
I feel myself that the centre of the argument for 263D is in the lemma 263C. This is where we turn the exact result for linear operators into an approximate result for almost-linear functions; and the whole point of differentiability is that a differentiable function is well approximated, in a neighbourhood of any point of its domain, by a linear operator. The lemma involves two rather different ideas. To show that $\mu^*\phi[D] \le (J + \epsilon)\mu^*D$, we look first at balls and then use Vitali's theorem to see that $D$ is economically covered by balls, so that an upper bound for $\mu^*\phi[D]$ in terms of a sum $\sum_{B \in \mathcal{I}_0} \mu^*\phi[D \cap B]$ is adequate. To obtain a lower bound, we need to reverse the argument by looking at $\psi = \phi^{-1}$, which involves checking first that $\phi$ is invertible, and then that $\psi$ is appropriately linked to $T^{-1}$. I have
written out exact formulae for

2
and so on, but this is only in case you do not trust your intuition; the fact that
$\|\phi^{-1}(u) - \phi^{-1}(v) - T^{-1}(u - v)\|$ is small compared with $\|u - v\|$ is pretty clearly a consequence of the hypothesis that $\|\phi(x) - \phi(y) - T(x - y)\|$ is small compared with $\|x - y\|$.
The argument of 263D itself is now a matter of breaking the set $D$ up into appropriate pieces on each of which $\phi$ is sufficiently nearly linear for 263C to apply, so that
\[ \mu^*\phi[D] \le \sum_{m=0}^\infty \mu^*\phi[D_m] \le \sum_{m=0}^\infty (J_m + \epsilon)\mu D_m. \]
With a little care (taken in 263C, with its condition (i)), we can also ensure that the Jacobian $J$ is well approximated by $J_m$ almost everywhere in $D_m$, so that
\[ \sum_{m=0}^\infty J_m\,\mu D_m \approx \int_D J(x)\,dx. \]
These ideas, joined with the results of §262, bring us to the point
\[ \int_E J\,d\mu = \mu\phi[E] \]
when $\phi$ is injective and $E \subseteq D$ is measurable. We need a final trick, involving Borel sets, to translate this into
\[ \int_{\phi^{-1}[F]} J\,d\mu = \mu F \]
whenever $F \subseteq \phi[D]$ is measurable, which is what is needed for the application of 235J.
I hope that you long ago saw, and were delighted by, the device in 263G. Once again, this is not really 'Lebesgue integration'; but I include it just to show that the machinery of this chapter can be turned to deal with the classical results, and that indeed we have a tiny profit from our labour, in that no apology need be made for the boundary of the set $D$ into which the polar coordinate system maps the plane. I have already given the actual result as an exercise in 252Xi. That involved (if you chase through the references) a one-dimensional substitution (performed in 225Xj), Fubini's theorem and an application of the formulae of §235; that is to say, very much the same elements as those used above, though in a different order. I could present this with no mention of differentiation in higher dimensions because the first change of variable was in one dimension, and the second (involving the function $x \mapsto \|x\|$, in 252Xi(i)) was of a particularly simple type, so that a different method could be used to find the function $J$.
The abstract ideas to which this treatise is devoted do not, indeed, lead us to many particular examples on which to practise the ideas of this section. The ones which do arise tend to be very straightforward, as in 263G, 263Xa-263Xb and 263Xe. I mention the last because it provides a formula needed to discuss a new type of convolution (263Ya).
In effect, this depends on the multiplicative group $\mathbb{R} \setminus \{0\}$ in place of the additive group $\mathbb{R}$ treated in §255. The formula $\frac{1}{x}$ in the definition of $\nu$ is of course the derivative of $\ln x$, and $\ln$ is an isomorphism between $(]0, \infty[, \cdot, \nu)$ and $(\mathbb{R}, +, \text{Lebesgue measure})$.
264 Hausdorff measures
The next topic I wish to approach is the question of 'surface measure'; a useful example to bear in mind throughout this section and the next is the notion of 'area' for regions on the sphere, but any other smoothly curved two-dimensional surface in three-dimensional space will serve equally well. It is I think more than plausible that our intuitive concepts of 'area' for such surfaces should correspond to appropriate measures. But formalizing this intuition is non-trivial, especially if we seek the generality that simple geometric ideas lead us to; I mean, not contenting ourselves with arguments that depend on the special nature of the sphere, for instance, to describe spherical surface area. I divide the problem into two parts. In this section I will describe a construction which enables us to define the $r$-dimensional measure of an $r$-dimensional surface (among other things) in $s$-dimensional space. In the next section I will set out the basic theorems making it possible to calculate these measures effectively in the leading cases.
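264A Definitions Let $s \ge 1$ be an integer, and $r > 0$. (I am primarily concerned with integral $r$, but will not insist on this until it becomes necessary, since there are some very interesting ideas which involve non-integral 'dimension' $r$.) For any $A \subseteq \mathbb{R}^s$, $\delta > 0$ set
\[ \theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^\infty (\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}. \]
It is convenient in this context to say that $\operatorname{diam}\emptyset = 0$. Now set
\[ \theta_r A = \sup_{\delta > 0} \theta_{r\delta}A; \]
$\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$.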
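To get a feeling for how the exponent $r$ interacts with the size of the covering sets, here is a minimal sketch, assuming the particular cover of a unit line segment by $n$ equal pieces of diameter $1/n$. The function name `segment_cover_sum` is an illustrative choice. The numbers it prints are only upper bounds for $\theta_{r\delta}$ of the segment with $\delta = 1/n$, because the infimum in the definition ranges over all countable covers, not just this one.

```python
def segment_cover_sum(n, r):
    """Sum of (diam A_i)**r when a unit segment is cut into n equal pieces,
    each of diameter 1/n (one admissible cover for delta = 1/n)."""
    piece_diameter = 1.0 / n
    return n * piece_diameter**r

# r < 1: the sums grow; r = 1: they stay at 1; r > 1: they tend to 0
for r in (0.5, 1.0, 2.0):
    print(r, [segment_cover_sum(n, r) for n in (1, 10, 100, 1000)])
```

For $r > 1$ the bound $n^{1-r}$ tends to 0, which already forces $\theta_r$ of the segment to be 0; for $r < 1$ the growth of these particular sums is suggestive only, since a lower bound needs an argument about every cover.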
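264B Of course we must immediately check the following:
Lemma $\theta_r$, as defined in 264A, is always an outer measure.
proof You should be used to these arguments by now, but there is an extra step in this one, so I spell out the details.
(a) Interpreting the diameter of the empty set as 0, we have $\theta_{r\delta}\emptyset = 0$ for every $\delta > 0$, so $\theta_r\emptyset = 0$.
(b) If $A \subseteq B \subseteq \mathbb{R}^s$, then every sequence covering $B$ also covers $A$, so $\theta_{r\delta}A \le \theta_{r\delta}B$ for every $\delta$ and $\theta_r A \le \theta_r B$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $\mathbb{R}^s$ with union $A$, and take any $a < \theta_r A$. Then there is a $\delta > 0$ such that $a \le \theta_{r\delta}A$. Now $\theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}(A_n)$. PPP Let $\epsilon > 0$, and for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets, covering $A_n$, with $\operatorname{diam} A_{nm} \le \delta$ for every $m$ and $\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r\delta}A_n + 2^{-n}\epsilon$. Then $\langle A_{nm}\rangle_{m,n\in\mathbb{N}}$ is a cover of $A$ by countably many sets of diameter at most $\delta$, so
\[ \theta_{r\delta}A \le \sum_{n=0}^\infty\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \sum_{n=0}^\infty \bigl(\theta_{r\delta}A_n + 2^{-n}\epsilon\bigr) = 2\epsilon + \sum_{n=0}^\infty \theta_{r\delta}A_n. \]
As $\epsilon$ is arbitrary, we have the result. QQQ
Accordingly
\[ a \le \theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}A_n \le \sum_{n=0}^\infty \theta_r A_n. \]
As $a$ is arbitrary,
\[ \theta_r A \le \sum_{n=0}^\infty \theta_r A_n; \]
as $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta_r$ is an outer measure.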
264C Definition If $s \ge 1$ is an integer, and $r > 0$, then Hausdorff $r$-dimensional measure on $\mathbb{R}^s$ is the measure $\mu_{Hr}$ on $\mathbb{R}^s$ defined by Carathéodory's method from the outer measure $\theta_r$ of 264A-264B.
264D Remarks (a) It is important to note that the sets used in the definition of the $\theta_{r\delta}$ need not be balls; even in $\mathbb{R}^2$ not every set $A$ can be covered by a ball of the same diameter as $A$.
(b) In the definitions above I require $r > 0$. It is sometimes appropriate to take $\mu_{H0}$ to be counting measure. This is nearly the result of applying the formulae above with $r = 0$, but there can be difficulties if we interpret them over-literally.
(c) All Hausdorff measures must be complete, because they are defined by Carathéodory's method (212A). For $r > 0$, they are atomless (264Yg). In terms of the other criteria of §211, however, they are very ill-behaved; for instance, if $r$, $s$ are integers and $1 \le r < s$, then $\mu_{Hr}$ on $\mathbb{R}^s$ is not semi-finite. (I will give a proof of this in 439H in Volume 4.) Nevertheless, they do have some striking properties which make them reasonably tractable.
(d) In 264A, note that $\theta_{r\delta}A \ge \theta_{r\delta'}A$ when $0 < \delta \le \delta'$; consequently, for instance, $\theta_r A = \lim_{n\to\infty} \theta_{r,2^{-n}}A$. I have allowed arbitrary sets $A_n$ in the covers, but it makes no difference if we restrict our attention to covers consisting of open sets or of closed sets (264Xc).
264E Theorem Let $s \ge 1$ be an integer, and $r \ge 0$; let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on $\mathbb{R}^s$, and $\Sigma_{Hr}$ its domain. Then every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma_{Hr}$.
proof This is trivial if $r = 0$; so suppose henceforth that $r > 0$.
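(a) The first step is to note that if $A$, $B$ are subsets of $\mathbb{R}^s$ and $\eta > 0$ is such that $\|x - y\| \ge \eta$ for all $x \in A$, $y \in B$, then $\theta_r(A \cup B) = \theta_r A + \theta_r B$, where $\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$. PPP Of course $\theta_r(A \cup B) \le \theta_r A + \theta_r B$, because $\theta_r$ is an outer measure. For the reverse inequality, we may suppose that $\theta_r(A \cup B) < \infty$, so that $\theta_r A$ and $\theta_r B$ are both finite. Let $\epsilon > 0$ and let $\delta_1, \delta_2 > 0$ be such that
\[ \theta_r A + \theta_r B \le \theta_{r\delta_1}A + \theta_{r\delta_2}B + \epsilon. \]
Set $\delta = \min(\delta_1, \delta_2, \frac12\eta) > 0$ and let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets of diameter at most $\delta$, covering $A \cup B$, and such that $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le \theta_{r\delta}(A \cup B) + \epsilon$. Set
\[ K = \{n : A_n \cap A \ne \emptyset\}, \quad L = \{n : A_n \cap B \ne \emptyset\}. \]
Because $\|x - y\| \ge \eta > \operatorname{diam} A_n$ whenever $x \in A$, $y \in B$ and $n \in \mathbb{N}$, $K \cap L = \emptyset$; and of course $A \subseteq \bigcup_{n\in K} A_n$, $B \subseteq \bigcup_{n\in L} A_n$. Consequently
\[ \theta_r A + \theta_r B \le \epsilon + \theta_{r\delta_1}A + \theta_{r\delta_2}B \le \epsilon + \sum_{n\in K} (\operatorname{diam} A_n)^r + \sum_{n\in L} (\operatorname{diam} A_n)^r \le \epsilon + \sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le 2\epsilon + \theta_{r\delta}(A \cup B) \le 2\epsilon + \theta_r(A \cup B). \]
As $\epsilon$ is arbitrary, $\theta_r(A \cup B) \ge \theta_r A + \theta_r B$, as required. QQQ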
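(b) It follows that $\theta_r A = \theta_r(A \cap G) + \theta_r(A \setminus G)$ whenever $A \subseteq \mathbb{R}^s$ and $G$ is open. PPP As usual, it is enough to consider the case $\theta_r A < \infty$ and to show that in this case $\theta_r(A \cap G) + \theta_r(A \setminus G) \le \theta_r A$. Set
\[ A_n = \{x : x \in A,\ \|x - y\| \ge 2^{-n} \text{ for every } y \in A \setminus G\}, \]
\[ B_0 = A_0, \quad B_n = A_n \setminus A_{n-1} \text{ for } n \ge 1. \]
Observe that $A_n \subseteq A_{n+1}$ for every $n$ and $\bigcup_{n\in\mathbb{N}} A_n = \bigcup_{n\in\mathbb{N}} B_n = A \cap G$. The point is that if $m, n \in \mathbb{N}$ and $n \ge m + 2$, and if $x \in B_m$ and $y \in B_n$, then there is a $z \in A \setminus G$ such that $\|y - z\| < 2^{-n+1} \le 2^{-m-1}$, while $\|x - z\|$ must be at least $2^{-m}$, so $\|x - y\| \ge \|x - z\| - \|y - z\| \ge 2^{-m-1}$. It follows that for any $k \ge 0$
\[ \sum_{m=0}^k \theta_r B_{2m} = \theta_r\Bigl(\bigcup_{m\le k} B_{2m}\Bigr) \le \theta_r(A \cap G) < \infty, \]
\[ \sum_{m=0}^k \theta_r B_{2m+1} = \theta_r\Bigl(\bigcup_{m\le k} B_{2m+1}\Bigr) \le \theta_r(A \cap G) < \infty, \]
(inducing on $k$, using (a) above for the inductive step). Consequently $\sum_{n=0}^\infty \theta_r B_n < \infty$.
But now, given $\epsilon > 0$, there is an $m$ such that $\sum_{n=m}^\infty \theta_r B_n \le \epsilon$, so that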

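\[ \theta_r(A \cap G) + \theta_r(A \setminus G) \le \theta_r A_m + \sum_{n=m}^\infty \theta_r B_n + \theta_r(A \setminus G) \le \epsilon + \theta_r A_m + \theta_r(A \setminus G) = \epsilon + \theta_r(A_m \cup (A \setminus G)) \]
(by (a) again, since $\|x - y\| \ge 2^{-m}$ for $x \in A_m$, $y \in A \setminus G$)
\[ \le \epsilon + \theta_r A. \]
As $\epsilon$ is arbitrary, $\theta_r(A \cap G) + \theta_r(A \setminus G) \le \theta_r A$, as required. QQQ
(c) Part (b) shows exactly that open sets belong to $\Sigma_{Hr}$. It follows at once that the Borel $\sigma$-algebra of $\mathbb{R}^s$ is included in $\Sigma_{Hr}$, as claimed.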
264F Proposition Let $s \ge 1$ be an integer, and $r > 0$; let $\theta_r$ be $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\Sigma_{Hr}$ for its domain. Then
(a) for every $A \subseteq \mathbb{R}^s$ there is a Borel set $E \supseteq A$ such that $\mu_{Hr}E = \theta_r A$;
(b) $\theta_r = \mu^*_{Hr}$, the outer measure defined from $\mu_{Hr}$;
(c) if $E \in \Sigma_{Hr}$ is expressible as a countable union of sets of finite measure, there are Borel sets $E'$, $E''$ such that $E' \subseteq E \subseteq E''$ and $\mu_{Hr}(E'' \setminus E') = 0$.
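proof (a) If $\theta_r A = \infty$ this is trivial: take $E = \mathbb{R}^s$. Otherwise, for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets of diameter at most $2^{-n}$, covering $A$, and such that $\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n}$. Set $F_{nm} = \overline{A_{nm}}$, $E = \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}} F_{nm}$; then $E$ is a Borel set in $\mathbb{R}^s$. Of course
\[ A \subseteq \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}} A_{nm} \subseteq \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}} F_{nm} = E. \]
For any $n \in \mathbb{N}$,
\[ \operatorname{diam} F_{nm} = \operatorname{diam} A_{nm} \le 2^{-n} \text{ for every } m \in \mathbb{N}, \]
\[ \sum_{m=0}^\infty (\operatorname{diam} F_{nm})^r = \sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n}, \]
so
\[ \theta_{r,2^{-n}}E \le \theta_{r,2^{-n}}A + 2^{-n}. \]
Letting $n \to \infty$,
\[ \theta_r E = \lim_{n\to\infty} \theta_{r,2^{-n}}E \le \lim_{n\to\infty} \bigl(\theta_{r,2^{-n}}A + 2^{-n}\bigr) = \theta_r A; \]
of course it follows that $\theta_r A = \theta_r E$, because $A \subseteq E$. Now by 264E we know that $E \in \Sigma_{Hr}$, so we can write $\mu_{Hr}E$ in place of $\theta_r E$.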
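(b) This follows at once, because we have
\[ \mu^*_{Hr}A = \inf\{\mu_{Hr}E : E \in \Sigma_{Hr},\ A \subseteq E\} = \inf\{\theta_r E : E \in \Sigma_{Hr},\ A \subseteq E\} \ge \theta_r A \]
for every $A \subseteq \mathbb{R}^s$. On the other hand, if $A \subseteq \mathbb{R}^s$, we have a Borel set $E \supseteq A$ such that $\theta_r A = \mu_{Hr}E$, so that $\mu^*_{Hr}A \le \mu_{Hr}E = \theta_r A$.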
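(c)(i) Suppose first that $\mu_{Hr}E < \infty$. By (a), there are Borel sets $E'' \supseteq E$, $H \supseteq E'' \setminus E$ such that $\mu_{Hr}E'' = \theta_r E$,
\[ \mu_{Hr}H = \theta_r(E'' \setminus E) = \mu_{Hr}(E'' \setminus E) = \mu_{Hr}E'' - \mu_{Hr}E = \mu_{Hr}E'' - \theta_r E = 0. \]
So setting $E' = E'' \setminus H$, we obtain a Borel set included in $E$, and
\[ \mu_{Hr}(E'' \setminus E') \le \mu_{Hr}H = 0. \]
(ii) For the general case, express $E$ as $\bigcup_{n\in\mathbb{N}} E_n$ where $\mu_{Hr}E_n < \infty$ for each $n$; take Borel sets $E'_n$, $E''_n$ such that $E'_n \subseteq E_n \subseteq E''_n$ and $\mu_{Hr}(E''_n \setminus E'_n) = 0$ for each $n$; and set $E' = \bigcup_{n\in\mathbb{N}} E'_n$, $E'' = \bigcup_{n\in\mathbb{N}} E''_n$.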
264G Lipschitz functions The definition of Hausdorff measure is exactly adapted to the following result, corresponding to 262D.
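Proposition Let $m, s \ge 1$ be integers, and $\phi : D \to \mathbb{R}^s$ a $\gamma$-Lipschitz function, where $D$ is a subset of $\mathbb{R}^m$. Then for any $A \subseteq D$ and $r \ge 0$,
\[ \mu^*_{Hr}(\phi[A]) \le \gamma^r \mu^*_{Hr}A, \]
writing $\mu^*_{Hr}$ for $r$-dimensional Hausdorff outer measure on either $\mathbb{R}^m$ or $\mathbb{R}^s$.
proof (a) The case $r = 0$ is trivial, since then $\gamma^r = 1$ and $\mu^*_{Hr}A = \mu_{H0}A = \#(A)$ if $A$ is finite, $\infty$ otherwise, while $\#(\phi[A]) \le \#(A)$.
(b) If $r > 0$, then take any $\delta > 0$. Set $\eta = \delta/(1 + \gamma)$ and consider $\theta_{r\eta} : \mathcal{P}\mathbb{R}^m \to [0, \infty]$, defined as in 264A. We know from 264Fb that
\[ \mu^*_{Hr}A = \theta_r A \ge \theta_{r\eta}A, \]
so there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, all of diameter at most $\eta$, covering $A$, with $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le \mu^*_{Hr}A + \delta$. Now $\phi[A] \subseteq \bigcup_{n\in\mathbb{N}} \phi[A_n \cap D]$ and
\[ \operatorname{diam} \phi[A_n \cap D] \le \gamma \operatorname{diam} A_n \le \gamma\eta \le \delta \]
for every $n$. Consequently
\[ \theta_{r\delta}(\phi[A]) \le \sum_{n=0}^\infty (\operatorname{diam} \phi[A_n \cap D])^r \le \sum_{n=0}^\infty \gamma^r (\operatorname{diam} A_n)^r \le \gamma^r(\mu^*_{Hr}A + \delta), \]
and
\[ \mu^*_{Hr}(\phi[A]) = \lim_{\delta\downarrow 0} \theta_{r\delta}(\phi[A]) \le \gamma^r \mu^*_{Hr}A, \]
as claimed.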
264H The next step is to relate $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ to Lebesgue measure on $\mathbb{R}^r$. The basic fact we need is the following, which is even more important for the idea in its proof than for the result.
Theorem Let $r \ge 1$ be an integer, and $A$ a bounded subset of $\mathbb{R}^r$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$ and $d = \operatorname{diam} A$. Then
\[ \mu_r^*(A) \le \mu_r B(0, \tfrac{d}{2}) = 2^{-r}\beta_r d^r, \]
where $B(0, \frac{d}{2})$ is the ball with centre 0 and diameter $d$, so that $B(0, 1)$ is the unit ball in $\mathbb{R}^r$, and has measure
\[ \beta_r = \frac{1}{k!}\pi^k \text{ if } r = 2k \text{ is even}, \qquad \beta_r = \frac{2^{2k+1}k!}{(2k+1)!}\pi^k \text{ if } r = 2k + 1 \text{ is odd}. \]
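A minimal sketch checking the two closed forms for $\beta_r$ against the standard identity $\beta_r = \pi^{r/2}/\Gamma(\frac{r}{2} + 1)$. The identity is a well-known fact brought in for comparison and is not quoted in the theorem; the function names are illustrative.

```python
import math

def beta_closed_form(r):
    """Volume of the unit ball in R^r from the even/odd formulae of 264H."""
    k, odd = divmod(r, 2)
    if odd:
        # r = 2k + 1
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi**k / math.factorial(2 * k + 1)
    # r = 2k
    return math.pi**k / math.factorial(k)

def beta_gamma(r):
    """The same volume via pi^(r/2) / Gamma(r/2 + 1)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)

for r in range(1, 8):
    print(r, beta_closed_form(r), beta_gamma(r))
```

The first few values are the familiar ones: length 2 for the interval $[-1, 1]$, area $\pi$ for the unit disc, volume $\frac{4}{3}\pi$ for the unit ball in $\mathbb{R}^3$.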
proof (a) For the calculation of $\beta_r$, see 252Q or 252Xi.
(b) The case $r = 1$ is elementary, for in this case $A$ is included in an interval of length $\operatorname{diam} A$, so that $\mu_1^* A \le \operatorname{diam} A$. So henceforth let us suppose that $r \ge 2$.
(c) For $1 \le i \le r$ let $S_i : \mathbb{R}^r \to \mathbb{R}^r$ be reflection in the $i$th coordinate, so that $S_i x = (\xi_1, \ldots, \xi_{i-1}, -\xi_i, \xi_{i+1}, \ldots, \xi_r)$ for every $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$. Let us say that a set $C \subseteq \mathbb{R}^r$ is symmetric in coordinates in $J$, where $J \subseteq \{1, \ldots, r\}$, if $S_i[C] = C$ for $i \in J$. Now the centre of the argument is the following fact: if $C \subseteq \mathbb{R}^r$ is a bounded set which is symmetric in coordinates in $J$, where $J$ is a proper subset of $\{1, \ldots, r\}$, and $j \in \{1, \ldots, r\} \setminus J$, then there is a set $D$, symmetric in coordinates in $J \cup \{j\}$, such that $\operatorname{diam} D \le \operatorname{diam} C$ and $\mu_r^* C \le \mu_r^* D$.
PPP (i) Because Lebesgue measure is invariant under permutation of coordinates, it is enough to deal with the case $j = r$. Start by writing $F = \overline{C}$, so that $\operatorname{diam} F = \operatorname{diam} C$ and $\mu_r F \ge \mu_r^* C$. Note that because $S_i$ is a homeomorphism for every $i$,
\[ S_i[F] = S_i[\overline{C}] = \overline{S_i[C]} = \overline{C} = F \]
for $i \in J$, and $F$ is symmetric in coordinates in $J$.
For $y = (\eta_1, \ldots, \eta_{r-1}) \in \mathbb{R}^{r-1}$, set
\[ F_y = \{\xi : (\eta_1, \ldots, \eta_{r-1}, \xi) \in F\}, \quad f(y) = \mu_1 F_y, \]
where $\mu_1$ is Lebesgue measure on $\mathbb{R}$. Set
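\[ D = \{(y, \xi) : y \in \mathbb{R}^{r-1},\ |\xi| < \tfrac12 f(y)\} \subseteq \mathbb{R}^r. \]
(ii) If $H \subseteq \mathbb{R}^r$ is measurable and $H \supseteq D$, then, writing $\mu_{r-1}$ for Lebesgue measure on $\mathbb{R}^{r-1}$, we have
\[ \mu_r H = \int \mu_1\{\xi : (y, \xi) \in H\}\,\mu_{r-1}(dy) \]
(using 251N and 252D)
\[ \ge \int \mu_1\{\xi : (y, \xi) \in D\}\,\mu_{r-1}(dy) = \int f(y)\,\mu_{r-1}(dy) = \int \mu_1\{\xi : (y, \xi) \in F\}\,\mu_{r-1}(dy) = \mu_r F \ge \mu_r^* C. \]
As $H$ is arbitrary, $\mu_r^* D \ge \mu_r^* C$.
(iii) The next step is to check that $\operatorname{diam} D \le \operatorname{diam} C$. If $x, x' \in D$, express them as $(y, \xi_r)$ and $(y', \xi'_r)$. Because $F$ is a bounded closed set in $\mathbb{R}^r$, $F_y$ and $F_{y'}$ are bounded closed subsets of $\mathbb{R}$. Also both $f(y)$ and $f(y')$ must be greater than 0, so that $F_y$, $F_{y'}$ are both non-empty. Consequently
\[ \alpha = \inf F_y, \quad \beta = \sup F_y, \quad \alpha' = \inf F_{y'}, \quad \beta' = \sup F_{y'} \]
are all defined in $\mathbb{R}$, and $\alpha, \beta \in F_y$, while $\alpha'$ and $\beta'$ belong to $F_{y'}$. We have
\[ |\xi_r - \xi'_r| \le |\xi_r| + |\xi'_r| < \tfrac12 f(y) + \tfrac12 f(y') = \tfrac12(\mu_1 F_y + \mu_1 F_{y'}) \le \tfrac12(\beta - \alpha + \beta' - \alpha') \le \max(\beta' - \alpha, \beta - \alpha'). \]
So taking $(\xi, \xi')$ to be one of $(\alpha, \beta')$ or $(\beta, \alpha')$, we can find $\xi \in F_y$, $\xi' \in F_{y'}$ such that $|\xi - \xi'| \ge |\xi_r - \xi'_r|$. Now $z = (y, \xi)$, $z' = (y', \xi')$ both belong to $F$, so
\[ \|x - x'\|^2 = \|y - y'\|^2 + |\xi_r - \xi'_r|^2 \le \|y - y'\|^2 + |\xi - \xi'|^2 = \|z - z'\|^2 \le (\operatorname{diam} F)^2, \]
and $\|x - x'\| \le \operatorname{diam} F$. As $x$ and $x'$ are arbitrary, $\operatorname{diam} D \le \operatorname{diam} F = \operatorname{diam} C$, as claimed.
(iv) Evidently $S_r[D] = D$. Moreover, if $i \in J$, then (interpreting $S_i$ as an operator on $\mathbb{R}^{r-1}$)
\[ F_{S_i(y)} = F_y \text{ for every } y \in \mathbb{R}^{r-1}, \]
so $f(S_i(y)) = f(y)$ and, for $\xi \in \mathbb{R}$, $y \in \mathbb{R}^{r-1}$,
\[ (y, \xi) \in D \iff |\xi| < \tfrac12 f(y) \iff |\xi| < \tfrac12 f(S_i(y)) \iff (S_i(y), \xi) \in D, \]
so that $S_i[D] = D$. Thus $D$ is symmetric in coordinates in $J \cup \{r\}$. QQQ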
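(d) The rest is easy. Starting from any bounded $A \subseteq \mathbb{R}^r$, set $A_0 = A$ and construct inductively $A_1, \ldots, A_r$ such that
\[ d = \operatorname{diam} A = \operatorname{diam} A_0 \ge \operatorname{diam} A_1 \ge \ldots \ge \operatorname{diam} A_r, \]
\[ \mu_r^* A = \mu_r^* A_0 \le \ldots \le \mu_r^* A_r, \]
\[ A_j \text{ is symmetric in coordinates in } \{1, \ldots, j\} \text{ for every } j \le r. \]
At the end, we have $A_r$ symmetric in coordinates in $\{1, \ldots, r\}$. But this means that if $x \in A_r$ then
\[ -x = S_1 S_2 \ldots S_r x \in A_r, \]
so that
\[ \|x\| = \tfrac12\|x - (-x)\| \le \tfrac12 \operatorname{diam} A_r \le \tfrac{d}{2}. \]
Thus $A_r \subseteq B(0, \frac{d}{2})$, and
\[ \mu_r^* A \le \mu_r^* A_r \le \mu_r B(0, \tfrac{d}{2}), \]
as claimed.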
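264I Theorem Let $r \ge 1$ be an integer; let $\mu$ be Lebesgue measure on $\mathbb{R}^r$, and let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $\mathbb{R}^r$. Then $\mu$ and $\mu_{Hr}$ have the same measurable sets and
\[ \mu E = 2^{-r}\beta_r \mu_{Hr}E \]
for every measurable set $E \subseteq \mathbb{R}^r$, where $\beta_r = \mu B(0, 1)$, so that the normalizing factor is
\[ 2^{-r}\beta_r = \frac{1}{2^{2k}k!}\pi^k \text{ if } r = 2k \text{ is even}, \qquad 2^{-r}\beta_r = \frac{k!}{(2k+1)!}\pi^k \text{ if } r = 2k + 1 \text{ is odd}. \]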
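proof (a) Of course if $B = B(x, \alpha)$ is any ball of radius $\alpha$,
\[ 2^{-r}\beta_r(\operatorname{diam} B)^r = \beta_r\alpha^r = \mu B. \]
(b) The point is that $\mu^* = 2^{-r}\beta_r\mu^*_{Hr}$. PPP Let $A \subseteq \mathbb{R}^r$.
(i) Let $\delta, \epsilon > 0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls, all of diameter at most $\delta$, such that $A \subseteq \bigcup_{n\in\mathbb{N}} B_n$ and $\sum_{n=0}^\infty \mu B_n \le \mu^* A + \epsilon$. Now, defining $\theta_{r\delta}$ as in 264A,
\[ 2^{-r}\beta_r\theta_{r\delta}(A) \le 2^{-r}\beta_r\sum_{n=0}^\infty (\operatorname{diam} B_n)^r = \sum_{n=0}^\infty \mu B_n \le \mu^* A + \epsilon. \]
Letting $\delta \downarrow 0$,
\[ 2^{-r}\beta_r\mu^*_{Hr}A \le \mu^* A + \epsilon. \]
As $\epsilon$ is arbitrary, $2^{-r}\beta_r\mu^*_{Hr}A \le \mu^* A$.
(ii) Let $\epsilon > 0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets of diameter at most 1 such that $A \subseteq \bigcup_{n\in\mathbb{N}} A_n$ and $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r \le \theta_{r1}A + \epsilon$, so that
\[ \mu^* A \le \sum_{n=0}^\infty \mu^* A_n \le \sum_{n=0}^\infty 2^{-r}\beta_r(\operatorname{diam} A_n)^r \le 2^{-r}\beta_r(\theta_{r1}A + \epsilon) \le 2^{-r}\beta_r(\mu^*_{Hr}A + \epsilon) \]
by 264H. As $\epsilon$ is arbitrary, $\mu^* A \le 2^{-r}\beta_r\mu^*_{Hr}A$. QQQ
(c) Because $\mu$, $\mu_{Hr}$ are the measures defined from their respective outer measures by Carathéodory's method, it follows at once that $\mu = 2^{-r}\beta_r\mu_{Hr}$ in the strict sense required.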
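As a worked instance of the normalizing factor in 264I, evaluated from the even and odd formulae for the first three dimensions (the choice of $r = 1, 2, 3$ is only for illustration):

```latex
\begin{align*}
r = 1\ (k = 0):&\quad 2^{-1}\beta_1 = \frac{0!}{1!}\,\pi^0 = 1,\\
r = 2\ (k = 1):&\quad 2^{-2}\beta_2 = \frac{1}{2^{2}\cdot 1!}\,\pi = \frac{\pi}{4},\\
r = 3\ (k = 1):&\quad 2^{-3}\beta_3 = \frac{1!}{3!}\,\pi = \frac{\pi}{6}.
\end{align*}
```

So on $\mathbb{R}$ Lebesgue measure and one-dimensional Hausdorff measure coincide, while in $\mathbb{R}^2$ a square of Lebesgue measure 1 has two-dimensional Hausdorff measure $4/\pi$.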
*264J The Cantor set I remarked in 264A that fractional 'dimensions' $r$ were of interest. I have no space for these here, and they are off the main lines of this volume, but I will give one result for its intrinsic interest.
Proposition Let $C$ be the Cantor set in $[0, 1]$. Set $r = \ln 2/\ln 3$. Then the $r$-dimensional Hausdorff measure of $C$ is 1.
proof (a) Recall that $C = \bigcap_{n\in\mathbb{N}} C_n$, where each $C_n$ consists of $2^n$ closed intervals of length $3^{-n}$, and $C_{n+1}$ is obtained from $C_n$ by deleting the middle (open) third of each interval of $C_n$. (See 134G.) Because $C$ is closed, $\mu_{Hr}C$ is defined (264E). Note that $3^r = 2$.
(b) If $\delta > 0$, take $n$ such that $3^{-n} \le \delta$; then $C$ can be covered by $2^n$ intervals of diameter $3^{-n}$, so
\[ \theta_{r\delta}C \le 2^n(3^{-n})^r = 1. \]
Consequently
\[ \mu_{Hr}C = \mu^*_{Hr}C = \lim_{\delta\downarrow 0}\theta_{r\delta}C \le 1. \]
(c) We need the following elementary fact: if $\alpha, \beta, \gamma \ge 0$ and $\max(\alpha, \gamma) \le \beta$, then $\alpha^r + \gamma^r \le (\alpha + \beta + \gamma)^r$. PPP Because $0 < r \le 1$,
\[ \xi \mapsto (\xi + \eta)^r - \xi^r = r\int_0^\eta (\xi + \zeta)^{r-1}\,d\zeta \]
is non-increasing for every $\eta \ge 0$. Consequently
\[ (\alpha + \beta + \gamma)^r - \alpha^r - \gamma^r \ge (\beta + \beta + \gamma)^r - \beta^r - \gamma^r \ge (\beta + \beta + \beta)^r - \beta^r - \beta^r = \beta^r(3^r - 2) = 0, \]
as required. QQQ
(d) Now suppose that $I \subseteq \mathbb{R}$ is any interval, and $m \in \mathbb{N}$; write $j_m(I)$ for the number of the intervals composing $C_m$ which are included in $I$. Then $2^{-m}j_m(I) \le (\operatorname{diam} I)^r$. PPP If $I$ does not meet $C_m$, this is trivial. Otherwise, induce on
\[ l = \min\{i : I \text{ meets only one of the intervals composing } C_{m-i}\}. \]
If $l = 0$, so that $I$ meets only one of the intervals composing $C_m$, then $j_m(I) \le 1$, and if $j_m(I) = 1$ then $\operatorname{diam} I \ge 3^{-m}$ so $(\operatorname{diam} I)^r \ge 2^{-m}$; thus the induction starts. For the inductive step to $l \ge 1$, let $J$ be the interval of $C_{m-l}$ which meets $I$, and $J'$, $J''$ the two intervals of $C_{m-l+1}$ included in $J$, so that $I$ meets both $J'$ and $J''$, and
\[ j_m(I) = j_m(I \cap J) = j_m(I \cap J') + j_m(I \cap J''). \]
By the inductive hypothesis,
\[ (\operatorname{diam}(I \cap J'))^r + (\operatorname{diam}(I \cap J''))^r \ge 2^{-m}j_m(I \cap J') + 2^{-m}j_m(I \cap J'') = 2^{-m}j_m(I). \]
On the other hand, by (c),
\[ (\operatorname{diam}(I \cap J'))^r + (\operatorname{diam}(I \cap J''))^r \le \bigl(\operatorname{diam}(I \cap J') + 3^{-m+l-1} + \operatorname{diam}(I \cap J'')\bigr)^r = (\operatorname{diam}(I \cap J))^r \le (\operatorname{diam} I)^r \]
because $J'$, $J''$ both have diameter at most $3^{-(m-l+1)}$, the length of the interval between them. Thus the induction continues. QQQ
(e) Now suppose that $\epsilon > 0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, covering $C$, such that $\sum_{n=0}^\infty (\operatorname{diam} A_n)^r < \mu_{Hr}C + \epsilon$. Take $\eta_n > 0$ such that $\sum_{n=0}^\infty (\operatorname{diam} A_n + \eta_n)^r \le \mu_{Hr}C + \epsilon$, and for each $n$ take an open interval $I_n \supseteq A_n$ of length at most $\operatorname{diam} A_n + \eta_n$ and with neither endpoint belonging to $C$; this is possible because $C$ does not include any non-trivial interval. Now $C \subseteq \bigcup_{n\in\mathbb{N}} I_n$; because $C$ is compact, there is a $k \in \mathbb{N}$ such that $C \subseteq \bigcup_{n\le k} I_n$. Next, there is an $m \in \mathbb{N}$ such that no endpoint of any $I_n$, for $n \le k$, belongs to $C_m$. Consequently each of the intervals composing $C_m$ must be included in some $I_n$, and (in the terminology of (d) above) $\sum_{n=0}^k j_m(I_n) \ge 2^m$. Accordingly
\[ 1 \le \sum_{n=0}^k 2^{-m}j_m(I_n) \le \sum_{n=0}^k (\operatorname{diam} I_n)^r \le \sum_{n=0}^\infty (\operatorname{diam} A_n + \eta_n)^r \le \mu_{Hr}C + \epsilon. \]
As $\epsilon$ is arbitrary, $\mu_{Hr}C \ge 1$, as required.
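Two of the numerical facts used in this proof can be checked by a minimal sketch: that the natural cover of $C$ by the $2^n$ intervals of $C_n$ has sum $2^n(3^{-n})^r = 1$ for every $n$, as in part (b), and that the inequality of part (c) holds on randomly chosen triples with $\max(\alpha, \gamma) \le \beta$. The function names, the random sampling and the small floating-point tolerance are assumptions made for illustration; a finite sample supports the inequality but does not prove it.

```python
import math
import random

r = math.log(2) / math.log(3)  # the exponent of 264J, so that 3**r == 2

def level_cover_sum(n):
    """Sum of diam**r over the 2**n intervals of length 3**-n composing C_n."""
    return 2**n * (3.0**-n) ** r

def inequality_c_holds(alpha, beta, gamma, tol=1e-12):
    """Check alpha**r + gamma**r <= (alpha + beta + gamma)**r,
    assuming max(alpha, gamma) <= beta; tol absorbs rounding error."""
    return alpha**r + gamma**r <= (alpha + beta + gamma) ** r + tol

random.seed(0)
samples = []
for _ in range(10_000):
    beta = random.uniform(0.0, 1.0)
    # alpha and gamma are drawn from [0, beta], so the hypothesis of (c) holds
    samples.append((random.uniform(0.0, beta), beta, random.uniform(0.0, beta)))

print([level_cover_sum(n) for n in range(6)])
print(all(inequality_c_holds(a, b, c) for a, b, c in samples))
```

The case $\alpha = \beta = \gamma$ gives equality in (c), which is exactly where the choice $3^r = 2$ is used.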
*264K General metric spaces While this chapter deals exclusively with Euclidean spaces, readers familiar with the general theory of metric spaces may find the nature of the theory clearer if they use the language of metric spaces in the basic definitions and results. I therefore repeat the definition here, and spell out the corresponding results in the exercises 264Yb-264Yl.
Let $(X, \rho)$ be a metric space, and $r > 0$. For any $A \subseteq X$, $\delta > 0$ set
\[ \theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^\infty (\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } X \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}, \]
interpreting the diameter of the empty set as 0, and $\inf\emptyset$ as $\infty$, so that $\theta_{r\delta}A = \infty$ if $A$ cannot be covered by a sequence of sets of diameter at most $\delta$. Say that $\theta_r A = \sup_{\delta>0}\theta_{r\delta}A$ is the $r$-dimensional Hausdorff outer measure of $A$, and take the measure $\mu_{Hr}$ defined by Carathéodory's method from this outer measure to be $r$-dimensional Hausdorff measure on $X$.
264X Basic exercises >>>(a) Show that all the functions $\theta_{r\delta}$ of 264A are outer measures. Show that in that context, $\theta_{r\delta}(A) = 0$ iff $\theta_r(A) = 0$, for any $\delta > 0$ and any $A \subseteq \mathbb{R}^s$.
(b) Let $s \ge 1$ be an integer, and $\theta$ an outer measure on $\mathbb{R}^s$ such that $\theta(A \cup B) = \theta A + \theta B$ whenever $A$, $B$ are non-empty subsets of $\mathbb{R}^s$ and $\inf_{x\in A, y\in B}\|x - y\| > 0$. Show that every Borel subset of $\mathbb{R}^s$ is measured by the measure defined from $\theta$ by Carathéodory's method.
>>>(c) Let s 1 be an integer and r > 0; dene
r
as in 264A. Show that for any A R
s
, > 0,

r
A = inf

n=0
(diamF
n
)
r
: F
n

nN
is a sequence of closed subsets of X
covering A, diamF
n
for every n N
= inf

n=0
(diamG
n
)
r
: G
n

nN
is a sequence of open subsets of X
covering A, diamG
n
for every n N.
>>>(d) Let $s \ge 1$ be an integer and $r \ge 0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $\mathbb{R}^s$. Show that for every $A \subseteq \mathbb{R}^s$ there is a $G_\delta$ set (that is, a set expressible as the intersection of a sequence of open sets) $H \supseteq A$ such that $\mu_{Hr}H = \mu^*_{Hr}A$. (Hint: use 264Xc.)
>>>(e) Let $s \ge 1$ be an integer, and $0 \le r < r'$. Show that if $A \subseteq \mathbb{R}^s$ and the $r$-dimensional Hausdorff outer measure $\mu^*_{Hr}A$ of $A$ is finite, then $\mu^*_{Hr'}A$ must be zero.
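(f)(i) Suppose that $f : [a, b] \to \mathbb{R}$ has graph $\Gamma_f \subseteq \mathbb{R}^2$, where $a \le b$ in $\mathbb{R}$. Show that the outer measure $\mu^*_{H1}(\Gamma_f)$ of $\Gamma_f$ for one-dimensional Hausdorff measure on $\mathbb{R}^2$ is at most $b - a + \operatorname{Var}_{[a,b]}(f)$. (Hint: if $f$ has finite variation, show that $\operatorname{diam}(\Gamma_{f\restriction ]t,u[}) \le u - t + \operatorname{Var}_{]t,u[}(f)$; then use 224E.) (ii) Let $f : [0, 1] \to [0, 1]$ be the Cantor function (134H). Show that $\mu_{H1}(\Gamma_f) = 2$. (Hint: 264G.)
(g) In 264A, show that
\[ \theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^\infty (\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of convex sets covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\} \]
for any $A \subseteq \mathbb{R}^s$.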
264Y Further exercises (a) Let $\theta_{11}$ be the outer measure on $\mathbb{R}^2$ defined in 264A, with $r = \delta = 1$, and $\mu_{11}$ the measure derived from $\theta_{11}$ by Carathéodory's method, $\Sigma_{11}$ its domain. Show that any set in $\Sigma_{11}$ is either negligible or conegligible.
(b) Let $(X, \rho)$ be a metric space and $r \ge 0$. Show that if $A \subseteq X$ and $\mu^*_{Hr}A < \infty$, then $A$ is separable.
(c) Let $(X, \rho)$ be a metric space, and $\theta$ an outer measure on $X$ such that $\theta(A \cup B) = \theta A + \theta B$ whenever $A$, $B$ are non-empty subsets of $X$ and $\inf_{x\in A, y\in B}\rho(x, y) > 0$. (Such an outer measure is called a metric outer measure.) Show that every open subset of $X$ is measured by the measure defined from $\theta$ by Carathéodory's method.
(d) Let (X, ) be a metric space and r > 0; dene
r
as in 264K. Show that for any A X,

Hr
A = sup
>0
inf

n=0
(diamF
n
)
r
: F
n

nN
is a sequence of closed subsets of X
covering A, diamF
n
for every n N
= sup
>0
inf

n=0
(diamG
n
)
r
: G
n

nN
is a sequence of open subsets of X
covering A, diamG
n
for every n N.
(e) Let (X, ) be a metric space and r 0; let
Hr
be r-dimensional Hausdor measure on X. Show that for every
A X there is a G

set H A such that


Hr
H =

Hr
A is the r-dimensional Hausdor outer measure of A.
(f ) Let (X, ) be a metric space and r 0; let Y be any subset of X, and give Y its induced metric
Y
. (i) Show
that the r-dimensional Hausdor outer measure
(Y )
Hr
on Y is just the restriction to TY of the outer measure

Hr
on
X. (ii) Show that if either

Hr
Y < or
Hr
measures Y then r-dimensional Hausdor measure
(Y )
Hr
on Y is just the
subspace measure on Y induced by the measure
Hr
on X.
(g) Let $(X,\rho)$ be a metric space and $r>0$. Show that $r$-dimensional Hausdorff measure on $X$ is atomless. (Hint: Let $E\in\operatorname{dom}\mu_{Hr}$. (i) If $E$ is not separable, there is an open set $G$ such that $E\cap G$ and $E\setminus G$ are both non-separable, therefore both non-negligible. (ii) If there is an $x\in E$ such that $\mu_{Hr}(E\cap B(x,\delta))>0$ for every $\delta>0$, then one of these sets has non-negligible complement in $E$. (iii) Otherwise, $\mu_{Hr}E=0$.)
(h) Let $(X,\rho)$ be a metric space and $r\ge 0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that if $\mu_{Hr}E<\infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F:F\subseteq E\text{ is closed and totally bounded}\}$. (Hint: given $\epsilon>0$, use 264Yd to find a closed totally bounded set $F$ such that $\mu_{Hr}(F\setminus E)=0$ and $\mu_{Hr}(E\setminus F)\le\epsilon$, and now apply 264Ye to $F\setminus E$.)
(i) Let $(X,\rho)$ be a complete metric space and $r\ge 0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that if $\mu_{Hr}E<\infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F:F\subseteq E\text{ is compact}\}$.
(j) Let $(X,\rho)$ and $(Y,\sigma)$ be metric spaces. If $D\subseteq X$ and $\phi:D\to Y$ is a function, then $\phi$ is $\gamma$-Lipschitz if $\sigma(\phi(x),\phi(x'))\le\gamma\rho(x,x')$ for every $x$, $x'\in D$. (i) Show that in this case, if $r\ge 0$, $\mu^*_{Hr}(\phi[A])\le\gamma^r\mu^*_{Hr}A$ for every $A\subseteq D$, writing $\mu^*_{Hr}$ for $r$-dimensional Hausdorff outer measure on either $X$ or $Y$. (ii) Show that if $X$ is complete and $\mu_{Hr}E$ is defined and finite, then $\mu_{Hr}(\phi[E])$ is defined. (Hint: 264Yi.)
(k) Let $(X,\rho)$ be a metric space, and for $r\ge 0$ let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on $X$. Show that there is a unique $\Delta=\Delta(X)\in[0,\infty]$ such that $\mu_{Hr}X=\infty$ if $r\in[0,\Delta[$, $0$ if $r\in\,]\Delta,\infty[$.
(l) Let $(X,\rho)$ be a metric space and $\phi:I\to X$ a continuous function, where $I\subseteq\mathbb{R}$ is an interval. Write $\mu_{H1}$ for one-dimensional Hausdorff measure on $X$. Show that
$$\mu_{H1}(\phi[I])\le\sup\Bigl\{\sum_{i=1}^{n}\rho(\phi(t_i),\phi(t_{i-1})):t_0,\ldots,t_n\in I,\ t_0\le\ldots\le t_n\Bigr\},$$
the length of the curve $\phi$, with equality if $\phi$ is injective.
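(m) Set $r=\ln 2/\ln 3$, as in 264J, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set $C$. Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi:\{0,1\}^{\mathbb{N}}\to C$ by setting $\phi(x)=\frac23\sum_{n=0}^{\infty}3^{-n}x(n)$ for $x\in\{0,1\}^{\mathbb{N}}$. Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}},\lambda)$ and $(C,\mu_{Hr})$.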
(n) Set $r=\ln 2/\ln 3$ and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set $C$. Let $f:[0,1]\to[0,1]$ be the Cantor function and let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that $\mu f[E]=\mu_{Hr}E$ for every $E\in\operatorname{dom}\mu_{Hr}$ and $\mu_{Hr}(C\cap f^{-1}[F])=\mu F$ for every Lebesgue measurable set $F\subseteq[0,1]$.
(o) Let $(X,\rho)$ be a metric space and $h:[0,\infty[\,\to[0,\infty[$ a non-decreasing function. For $A\subseteq X$ set
$$\theta_hA=\sup_{\delta>0}\inf\Bigl\{\sum_{n=0}^{\infty}h(\operatorname{diam}A_n):\langle A_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of subsets of }X\text{ covering }A,\ \operatorname{diam}A_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\},$$
interpreting $\operatorname{diam}\emptyset$ as $0$, $\inf\emptyset$ as $\infty$ as usual. Show that $\theta_h$ is an outer measure on $X$. State and prove theorems corresponding to 264E and 264F. Look through 264X and 264Y for further results which might be generalizable, perhaps on the assumption that $h$ is continuous on the right.
(p) Let $(X,\rho)$ be a metric space. Let us say that if $a<b$ in $\mathbb{R}$ and $f:[a,b]\to X$ is a function, then $f$ is absolutely continuous if for every $\epsilon>0$ there is a $\delta>0$ such that $\sum_{i=0}^{n}\rho(f(a_i),f(b_i))\le\epsilon$ whenever $a\le a_0\le b_0\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=0}^{n}b_i-a_i\le\delta$. Show that $f:[a,b]\to X$ is absolutely continuous iff it is continuous and of bounded variation (in the sense of 224Ye) and $\mu_{H1}f[A]=0$ whenever $A\subseteq[a,b]$ is Lebesgue negligible, where $\mu_{H1}$ is 1-dimensional Hausdorff measure on $X$. (Compare 225M.) Show that in this case $\mu_{H1}f[\,[a,b]\,]<\infty$.
(q) Let $s\ge 1$ be an integer, and $r\in[1,\infty[$. For $x$, $y\in\mathbb{R}^s$ set $\rho(x,y)=\|x-y\|^{s/r}$. (i) Show that $\rho$ is a metric on $\mathbb{R}^s$ inducing the Euclidean topology. (ii) Let $\mu_{Hr}$ be the associated $r$-dimensional Hausdorff measure. Show that $\mu_{Hr}B(0,1)=2^s$.
264 Notes and comments In this section we have come to the next step in "geometric measure theory". I am taking this very slowly, because there are real difficulties in the subject, and for the purposes of this volume we do not need to master very much of it. The idea here is to find a definition of $r$-dimensional Lebesgue measure which will be "geometric" in the strict sense, that is, dependent only on the metric structure of $\mathbb{R}^r$, and therefore applicable to sets which have a metric structure but no linear structure. As has happened before, the definition of Hausdorff measure from an outer measure gives no problems (the only new idea in 264A-264C is that of using a supremum $\theta_r=\sup_{\delta>0}\theta_{r\delta}$ of outer measures) and the difficult part is proving that our new measure has any useful properties. Concerning the properties of Hausdorff measure, there are two essential objectives: first, to check that these measures, in general, share a reasonable proportion of the properties of Lebesgue measure; and second, to justify the term "$r$-dimensional measure" by relating Hausdorff $r$-dimensional measure on $\mathbb{R}^r$ to Lebesgue measure on $\mathbb{R}^r$.
As for the properties of general Hausdorff measures, we have to go rather carefully. I do not give counter-examples here because they involve concepts which belong to Volumes 4 and 5 rather than this volume, but I must warn you to expect the worst. However, we do at least have open sets measurable, so that all Borel sets are measurable (264E).
The outer measure of a set $A$ can be defined in terms of the Borel sets including $A$ (264Fa), though not in general in terms of the open sets including $A$; but the measure of a measurable set $E$ is not necessarily the supremum of the measures of the Borel sets included in $E$, unless $E$ is of finite measure (264Fc). We do find that the outer measure $\theta_r$ defined in 264A is the outer measure defined from $\mu_{Hr}$ (264Fb), so that the phrase "$r$-dimensional Hausdorff outer measure" is unambiguous. A crucial property of Lebesgue measure is the fact that the measure of a measurable set $E$ is the supremum of the measures of the compact subsets of $E$; this is not generally shared by Hausdorff measures, but is valid for sets $E$ of finite measure in complete spaces (264Yi). Concerning subspaces, there are no problems with the outer measures, and for sets of finite measure the subspace measures are also consistent (264Yf). Because Hausdorff measure is defined in metric terms, it behaves regularly for Lipschitz maps (264G); one of the most natural classes of functions to consider when studying metric spaces is that of 1-Lipschitz functions, so that (in the language of 264G) $\mu^*_{Hr}\phi[A]\le\mu^*_{Hr}A$ for every $A$.
The second essential feature of Hausdorff measure, its relation with Lebesgue measure in the appropriate dimension, is Theorem 264I. Because both Hausdorff measure and Lebesgue measure are translation-invariant, this can be proved by relatively elementary means, except for the evaluation of the normalizing constant; all we need to know is that $\mu_r[0,1[^{\,r}\,=1$ and $\mu_{Hr}[0,1[^{\,r}$ are both finite and non-zero, and this is straightforward. (The arguments of part (a) of the proof of 261F are relevant.) For the purposes of this chapter, we do not, I think, have to know the value of the constant; but I cannot leave it unsettled, and therefore give Theorem 264H, the isodiametric inequality, to show that it is just the Lebesgue measure of an $r$-dimensional ball of diameter 1, as one would hope and expect. The critical step in the argument of 264H is in part (c) of the proof. This is called "Steiner symmetrization"; the idea is that given a set $A$, we transform $A$ through a series of steps, at each stage lowering, or at least not increasing, its diameter, and raising, or at least not decreasing, its outer measure, progressively making $A$ more symmetric, until at the end we have a set which is sufficiently constrained to be amenable. The particular symmetrization operation used in this proof is important enough; but the idea of progressive regularization of an object is one of the most powerful methods in measure theory, and you should give all your attention to mastering any example you encounter. In my experience, the idea is principally useful when seeking an inequality involving disparate quantities, in the present example the diameter and volume of a set.
Of course it is awkward having two measures on $\mathbb{R}^r$, differing by a constant multiple, and for the purposes of the next section it would actually have been a little more convenient to follow Federer 69 in using "normalized Hausdorff measure" $2^{-r}\beta_r\mu_{Hr}$. (For non-integral $r$, we could take $\beta_r=\pi^{r/2}/\Gamma(1+\frac{r}{2})$, as suggested in 252Xi.) However, I believe this to be a minority position, and the striking example of Hausdorff measure on the Cantor set (264J, 264Ym-264Yn) looks much better in the non-normalized version.
Hausdorff $(\ln 2/\ln 3)$-dimensional measure on the Cantor set is of course but one, perhaps the easiest, of a large class of examples. Because the Hausdorff $r$-dimensional outer measure of a set $A$, regarded as a function of $r$, behaves dramatically (falling from $\infty$ to $0$) at a certain critical value $\Delta(A)$ (see 264Xe, 264Yk), it gives us a metric space invariant of $A$; $\Delta(A)$ is the Hausdorff dimension of $A$. Evidently the Hausdorff dimension of $C$ is $\ln 2/\ln 3$, while that of $r$-dimensional Euclidean space is $r$.
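The jump at the critical value can be seen numerically on the standard covers of the Cantor set. The sketch below is an illustration only, not part of the text's argument: it assumes the usual construction, in which stage $n$ consists of $2^n$ intervals of length $3^{-n}$, and the function name `covering_sum` is hypothetical. These sums only bound the outer measures $\theta_{r\delta}C$ from above; they do not by themselves compute $\mu_{Hr}C$.

```python
import math

def covering_sum(r: float, n: int) -> float:
    """Sum of diam^r over the 2^n intervals of length 3^-n covering the Cantor set."""
    return (2 ** n) * (3.0 ** (-n)) ** r

# Critical exponent ln 2 / ln 3, as in 264J.
critical = math.log(2) / math.log(3)

for r in (critical - 0.1, critical, critical + 0.1):
    # Below the critical value the sums blow up, above it they tend to 0,
    # and at the critical value they stay equal to 1.
    print(round(r, 4), [covering_sum(r, n) for n in (5, 10, 20)])
```

At $r=\ln 2/\ln 3$ every stage gives the same sum, which is why this exponent is the natural candidate for the Hausdorff dimension of $C$.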
265 Surface measures
In this section I offer a new version of the arguments of §263, this time not with the intention of justifying integration-by-substitution, but instead to give a practically effective method of computing the Hausdorff $r$-dimensional measure of a smooth $r$-dimensional surface in an $s$-dimensional space. The basic case to bear in mind is $r=2$, $s=3$, though any other combination which you can easily visualize will also be a valuable aid to intuition. I give a fundamental theorem (265E) providing a formula from which we can hope to calculate the $r$-dimensional measure of a surface in $s$-dimensional space which is parametrized by a differentiable function, and work through some of the calculations in the case of the $r$-sphere (265F-265H).
265A Normalized Hausdorff measure As I remarked at the end of the last section, Hausdorff measure, as defined in 264A-264C, is not quite the most appropriate measure for our work here; so in this section I will use normalized Hausdorff measure, meaning $\nu_r=2^{-r}\beta_r\mu_{Hr}$, where $\mu_{Hr}$ is $r$-dimensional Hausdorff measure (interpreted in whichever space is under consideration) and $\beta_r=\mu_rB(0,1)$ is the Lebesgue measure of any ball of radius 1 in $\mathbb{R}^r$. It will be convenient to take $\beta_0=1$. As shown in 264H-264I, this normalization makes $\nu_r$ on $\mathbb{R}^r$ agree with Lebesgue measure $\mu_r$. Observe that of course $\nu_r^*=2^{-r}\beta_r\mu^*_{Hr}$ (264Fb).
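For orientation, the normalizing constants can be tabulated directly. This is a minimal sketch, assuming the closed form $\beta_r=\pi^{r/2}/\Gamma(1+\frac{r}{2})$ quoted in the notes to §264; the function names are hypothetical.

```python
import math

def beta(r: float) -> float:
    """Volume of the unit ball of R^r, via pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

def normalizing_constant(r: float) -> float:
    """The factor 2^{-r} * beta_r relating nu_r to mu_Hr."""
    return 2.0 ** (-r) * beta(r)

for r in range(0, 5):
    # beta_0 = 1, beta_1 = 2, beta_2 = pi, beta_3 = 4*pi/3, ...
    print(r, beta(r), normalizing_constant(r))
```

Note that $2^{-r}\beta_r$ is the Lebesgue measure of a ball of diameter 1, in line with the remarks on 264H.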
265B Linear subspaces Just as in §263, the first step is to deal with linear operators.
Theorem Suppose that $r$, $s$ are integers with $1\le r\le s$, and that $T$ is a real $s\times r$ matrix; regard $T$ as a linear operator from $\mathbb{R}^r$ to $\mathbb{R}^s$. Set $J=\sqrt{\det T^{\top}T}$, where $T^{\top}$ is the transpose of $T$. Write $\nu_r$ for normalized $r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\mathrm{T}_r$ for its domain, and $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$. Then
$$\nu_rT[E]=J\mu_rE$$
for every measurable set $E\subseteq\mathbb{R}^r$. If $T$ is injective (that is, if $J\ne 0$), then
$$\nu_rF=J\mu_rT^{-1}[F]$$
whenever $F\in\mathrm{T}_r$ and $F\subseteq T[\mathbb{R}^r]$.
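In the basic case $r=2$, $s=3$ the factor $J$ is the familiar area of the parallelogram spanned by the columns of $T$. The sketch below checks this on one arbitrarily chosen matrix, using only the standard identity $\|a\|^2\|b\|^2-(a\cdot b)^2=\|a\times b\|^2$; the matrix and the function names are illustrative assumptions, not taken from the text.

```python
import math

def gram_jacobian(T):
    """J = sqrt(det(T^T T)) for a 3x2 matrix T given as three rows of two entries."""
    # Entries of the symmetric 2x2 Gram matrix T^T T.
    g11 = sum(row[0] * row[0] for row in T)
    g12 = sum(row[0] * row[1] for row in T)
    g22 = sum(row[1] * row[1] for row in T)
    return math.sqrt(g11 * g22 - g12 * g12)

def cross_norm(T):
    """Euclidean norm of the cross product of the two columns of T."""
    a = [row[0] for row in T]
    b = [row[1] for row in T]
    c = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    return math.sqrt(sum(x * x for x in c))

T = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]   # hypothetical example matrix
print(gram_jacobian(T), cross_norm(T))       # both are sqrt(59)
```

So for this $T$ the image of the unit square has normalized 2-dimensional Hausdorff measure $\sqrt{59}$, which is the area one computes by elementary vector geometry.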
proof The formula for $J$ assumes that $\det T^{\top}T$ is non-negative, which is a fact not in evidence; but the argument below will establish it adequately soon.
(a) Let $V$ be the linear subspace of $\mathbb{R}^s$ consisting of vectors $y=(\eta_1,\ldots,\eta_s)$ such that $\eta_i=0$ whenever $r<i\le s$. Let $R$ be the $r\times s$ matrix $\langle\rho_{ij}\rangle_{i\le r,j\le s}$, where $\rho_{ij}=1$ if $i=j\le r$, $0$ otherwise; then the $s\times r$ matrix $R^{\top}$ may be regarded as a bijection from $\mathbb{R}^r$ to $V$. Let $W$ be an $r$-dimensional linear subspace of $\mathbb{R}^s$ including $T[\mathbb{R}^r]$, and let $P$ be an orthogonal $s\times s$ matrix such that $P[W]=V$. Then $S=RPT$ is an $r\times r$ matrix. We have $R^{\top}Ry=y$ for $y\in V$, so $R^{\top}RPT=PT$ and
$$S^{\top}S=T^{\top}P^{\top}R^{\top}RPT=T^{\top}P^{\top}PT=T^{\top}T;$$
accordingly
$$\det T^{\top}T=\det S^{\top}S=(\det S)^2\ge 0$$
and $J=|\det S|$. At the same time,
$$P^{\top}R^{\top}S=P^{\top}R^{\top}RPT=P^{\top}PT=T.$$
Observe that $J=0$ iff $S$ is not injective, that is, $T$ is not injective.
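(b) If we consider the $s\times r$ matrix $P^{\top}R^{\top}$ as a map from $\mathbb{R}^r$ to $\mathbb{R}^s$, we see that $\phi=P^{\top}R^{\top}$ is an isometry between $\mathbb{R}^r$ and $W$, with inverse $\phi^{-1}=RP\restriction W$. It follows that $\phi$ is an isomorphism between the measure spaces $(\mathbb{R}^r,\mu^{(r)}_{Hr})$ and $(W,\mu^{(s)}_{HrW})$, where $\mu^{(r)}_{Hr}$ is $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ and $\mu^{(s)}_{HrW}$ is the subspace measure on $W$ induced by $r$-dimensional Hausdorff measure $\mu^{(s)}_{Hr}$ on $\mathbb{R}^s$.
PPP (i) If $A\subseteq\mathbb{R}^r$, $A'\subseteq W$,
$$\mu^{(s)*}_{Hr}(\phi[A])\le\mu^{(r)*}_{Hr}(A),\qquad\mu^{(r)*}_{Hr}(\phi^{-1}[A'])\le\mu^{(s)*}_{Hr}(A'),$$
using 264G twice. Thus $\mu^{(s)*}_{Hr}(\phi[A])=\mu^{(r)*}_{Hr}(A)$ for every $A\subseteq\mathbb{R}^r$.
(ii) Now because $W$ is closed, therefore in the domain of $\mu^{(s)}_{Hr}$ (264E), the subspace measure $\mu^{(s)}_{HrW}$ is just the measure induced by $\mu^{(s)*}_{Hr}\restriction\mathcal{P}W$ by Carathéodory's method (214H(b-ii)). Because $\phi$ is an isomorphism between $(\mathbb{R}^r,\mu^{(r)*}_{Hr})$ and $(W,\mu^{(s)*}_{Hr}\restriction\mathcal{P}W)$, it is an isomorphism between $(\mathbb{R}^r,\mu^{(r)}_{Hr})$ and $(W,\mu^{(s)}_{HrW})$. QQQ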
(c) It follows that $\phi$ is also an isomorphism between the normalized versions $(\mathbb{R}^r,\mu_r)$ and $(W,\nu_{rW})$, writing $\nu_{rW}$ for the subspace measure on $W$ induced by $\nu_r$.
Now if $E\subseteq\mathbb{R}^r$ is Lebesgue measurable, we have $\mu_rS[E]=J\mu_rE$, by 263A; so that
$$\nu_rT[E]=\nu_r(P^{\top}R^{\top}[S[E]])=\nu_r(\phi[S[E]])=\mu_rS[E]=J\mu_rE.$$
If $T$ is injective, then $S=\phi^{-1}T$ must also be injective, so that $J\ne 0$ and
$$\nu_rF=\mu_r(\phi^{-1}[F])=J\mu_r(S^{-1}[\phi^{-1}[F]])=J\mu_rT^{-1}[F]$$
whenever $F\in\mathrm{T}_r$ and $F\subseteq W=T[\mathbb{R}^r]$.
265C Corollary Under the conditions of 265B,
$$\nu_r^*T[A]=J\mu_r^*A$$
for every $A\subseteq\mathbb{R}^r$.
proof (a) If $E$ is Lebesgue measurable and $A\subseteq E$, then $T[A]\subseteq T[E]$, so
$$\nu_r^*T[A]\le\nu_rT[E]=J\mu_rE;$$
as $E$ is arbitrary, $\nu_r^*T[A]\le J\mu_r^*A$.
(b) If $J=0$ we can stop. If $J\ne 0$ then $T$ is injective, so if $F\in\mathrm{T}_r$ and $T[A]\subseteq F$ we shall have
$$J\mu_r^*A\le J\mu_rT^{-1}[F\cap W]=\nu_r(F\cap W)\le\nu_rF;$$
as $F$ is arbitrary, $J\mu_r^*A\le\nu_r^*T[A]$.
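265D I now proceed to the lemma corresponding to 263C.
Lemma Suppose that $1\le r\le s$ and that $T$ is an $s\times r$ matrix; set $J=\sqrt{\det T^{\top}T}$, and suppose that $J\ne 0$. Then for any $\epsilon>0$ there is a $\zeta=\zeta(T,\epsilon)>0$ such that
(i) $|\sqrt{\det S^{\top}S}-J|\le\epsilon$ whenever $S$ is an $s\times r$ matrix and $\|S-T\|\le\zeta$;
(ii) whenever $D\subseteq\mathbb{R}^r$ is a bounded set and $\phi:D\to\mathbb{R}^s$ is a function such that $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ for all $x$, $y\in D$, then $|\nu_r^*\phi[D]-J\mu_r^*D|\le\epsilon\mu_r^*D$.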
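proof (a) Because $\det S^{\top}S$ is a continuous function of the coefficients of $S$, 262Hb tells us that there must be a $\zeta_0>0$ such that $|J-\sqrt{\det S^{\top}S}|\le\epsilon$ whenever $\|S-T\|\le\zeta_0$.
(b) Because $J\ne 0$, $T$ is injective, and there is an $r\times s$ matrix $T^*$ such that $T^*T$ is the identity $r\times r$ matrix. Take $\zeta>0$ such that $\zeta\le\zeta_0$, $\zeta\|T^*\|<1$, $J(1+\zeta\|T^*\|)^r\le J+\epsilon$ and $1-J^{-1}\epsilon\le(1-\zeta\|T^*\|)^r$.
Let $\phi:D\to\mathbb{R}^s$ be such that $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ whenever $x$, $y\in D$. Set $\psi=\phi T^*$, so that $\phi=\psi T$. Then for $u$, $v\in T[D]$
$$\|\psi(u)-\psi(v)\|\le(1+\zeta\|T^*\|)\|u-v\|,\qquad\|u-v\|\le(1-\zeta\|T^*\|)^{-1}\|\psi(u)-\psi(v)\|.$$
PPP Take $x$, $y\in D$ such that $u=Tx$, $v=Ty$; of course $x=T^*u$, $y=T^*v$. Then
$$\|\psi(u)-\psi(v)\|=\|\phi(T^*u)-\phi(T^*v)\|=\|\phi(x)-\phi(y)\|\le\|T(x-y)\|+\zeta\|x-y\|$$
$$=\|u-v\|+\zeta\|T^*u-T^*v\|\le\|u-v\|(1+\zeta\|T^*\|).$$
Next,
$$\|u-v\|=\|Tx-Ty\|\le\|\phi(x)-\phi(y)\|+\zeta\|x-y\|=\|\psi(u)-\psi(v)\|+\zeta\|T^*u-T^*v\|\le\|\psi(u)-\psi(v)\|+\zeta\|T^*\|\|u-v\|,$$
so that $(1-\zeta\|T^*\|)\|u-v\|\le\|\psi(u)-\psi(v)\|$ and $\|u-v\|\le(1-\zeta\|T^*\|)^{-1}\|\psi(u)-\psi(v)\|$. QQQ
(c) Now from 264G and 265C we see that
$$\nu_r^*\phi[D]=\nu_r^*\psi[T[D]]\le(1+\zeta\|T^*\|)^r\nu_r^*T[D]=(1+\zeta\|T^*\|)^rJ\mu_r^*D\le(J+\epsilon)\mu_r^*D,$$
and (provided $\epsilon\le J$)
$$(J-\epsilon)\mu_r^*D=(1-J^{-1}\epsilon)\nu_r^*T[D]\le(1-J^{-1}\epsilon)(1-\zeta\|T^*\|)^{-r}\nu_r^*\psi[T[D]]\le\nu_r^*\psi[T[D]]=\nu_r^*\phi[D].$$
(Of course, if $\epsilon\ge J$, then surely $(J-\epsilon)\mu_r^*D\le\nu_r^*\phi[D]$.) Thus
$$(J-\epsilon)\mu_r^*D\le\nu_r^*\phi[D]\le(J+\epsilon)\mu_r^*D$$
as required, and we have an appropriate $\zeta$.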
265E Theorem Suppose that $1\le r\le s$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, $\nu_r$ for normalized Hausdorff measure on $\mathbb{R}^s$, and $\mathrm{T}_r$ for the domain of $\nu_r$. Let $D\subseteq\mathbb{R}^r$ be any set, and $\phi:D\to\mathbb{R}^s$ a function differentiable relative to its domain at each point of $D$. For each $x\in D$ let $T(x)$ be a derivative of $\phi$ at $x$ relative to $D$, and set $J(x)=\sqrt{\det T(x)^{\top}T(x)}$. Set $D'=\{x:x\in D,\ J(x)>0\}$. Then
(i) $J:D\to[0,\infty[$ is a measurable function;
(ii) $\nu_r^*\phi[D]\le\int_DJ(x)\mu_r(dx)$, allowing $\infty$ as the value of the integral;
(iii) $\nu_r^*\phi[D\setminus D']=0$.
If $D$ is Lebesgue measurable, then
(iv) $\phi[D]\in\mathrm{T}_r$.
If $D$ is measurable and $\phi$ is injective, then
(v) $\nu_r\phi[D]=\int_DJ\,d\mu_r$;
(vi) for any set $E\subseteq\phi[D]$, $E\in\mathrm{T}_r$ iff $\phi^{-1}[E]\cap D'$ is Lebesgue measurable, and in this case
$$\nu_rE=\int_{\phi^{-1}[E]}J(x)\mu_r(dx)=\int_DJ\times\chi(\phi^{-1}[E])\,d\mu_r;$$
(vii) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]}g\,d\nu_r=\int_DJ\times g\phi\,d\mu_r,$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x)=0$ and $g(\phi(x))$ is undefined.
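As a concrete instance of formula (v), one can integrate $J$ numerically for the case $r=2$, $s=3$ of the spherical parametrization described in 265F. The sketch below is illustrative only: the midpoint rule, the grid size and the function names are assumptions of this example, and the parametrization is injective only off a set whose image is a negligible arc.

```python
import math

def jacobian(xi1: float, xi2: float) -> float:
    """J = sqrt(det(T^T T)) for phi(xi1, xi2) = (sin xi1 sin xi2, cos xi1 sin xi2, cos xi2)."""
    # Columns of the 3x2 derivative T: partial derivatives in xi1 and xi2.
    d1 = (math.cos(xi1) * math.sin(xi2), -math.sin(xi1) * math.sin(xi2), 0.0)
    d2 = (math.sin(xi1) * math.cos(xi2), math.cos(xi1) * math.cos(xi2), -math.sin(xi2))
    g11 = sum(a * a for a in d1)
    g12 = sum(a * b for a, b in zip(d1, d2))
    g22 = sum(b * b for b in d2)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))

def sphere_area(n: int = 200) -> float:
    """Midpoint-rule approximation to the integral of J over ]-pi, pi] x [0, pi]."""
    h1 = 2 * math.pi / n
    h2 = math.pi / n
    total = 0.0
    for i in range(n):
        xi1 = -math.pi + (i + 0.5) * h1
        for j in range(n):
            xi2 = (j + 0.5) * h2
            total += jacobian(xi1, xi2)
    return total * h1 * h2

print(sphere_area(), 4 * math.pi)
```

Here $J(\xi_1,\xi_2)=|\sin\xi_2|$, so the integral is $4\pi$, the classical area of the unit sphere in $\mathbb{R}^3$.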
proof I seek to follow the line laid out in the proof of 263D.
(a) Just as in 263D, we know that $J:D\to\mathbb{R}$ is measurable, since $J(x)$ is a continuous function of the coefficients of $T(x)$, all of which are measurable, by 262P. If $D$ is Lebesgue measurable, then there is a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact subsets of $D$ such that $D\setminus\bigcup_{n\in\mathbb{N}}F_n$ is $\mu_r$-negligible. Now $\phi[F_n]$ is compact, therefore belongs to $\mathrm{T}_r$, for each $n\in\mathbb{N}$. As for $\phi[D\setminus\bigcup_{n\in\mathbb{N}}F_n]$, this must be $\nu_r$-negligible by 264G, because $\phi$ is a countable union of Lipschitz functions (262N). So
$$\phi[D]=\bigcup_{n\in\mathbb{N}}\phi[F_n]\cup\phi\bigl[D\setminus\bigcup_{n\in\mathbb{N}}F_n\bigr]\in\mathrm{T}_r.$$
This deals with (i) and (iv).
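(b) For the moment, assume that $D$ is bounded and that $J(x)>0$ for every $x\in D$, and fix $\epsilon>0$. Let $M^*_{sr}$ be the set of $s\times r$ matrices $T$ such that $\det T^{\top}T\ne 0$, that is, the corresponding map $T:\mathbb{R}^r\to\mathbb{R}^s$ is injective. For $T\in M^*_{sr}$ take $\zeta(T,\epsilon)>0$ as in 265D.
Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, with $A=M^*_{sr}$, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a partition of $D$ into sets which are relatively measurable in $D$, and each $T_n$ is an $s\times r$ matrix such that
$$\|T(x)-T_n\|\le\zeta(T_n,\epsilon)\text{ whenever }x\in D_n,$$
$$\|\phi(x)-\phi(y)-T_n(x-y)\|\le\zeta(T_n,\epsilon)\|x-y\|\text{ for all }x,y\in D_n.$$
Then, setting $J_n=\sqrt{\det T_n^{\top}T_n}$, we have
$$|J(x)-J_n|\le\epsilon\text{ for every }x\in D_n,$$
$$|\nu_r^*\phi[D_n]-J_n\mu_r^*D_n|\le\epsilon\mu_r^*D_n,$$
by the choice of $\zeta(T_n,\epsilon)$. So
$$\nu_r^*\phi[D]\le\sum_{n=0}^{\infty}\nu_r^*\phi[D_n]$$
(because $\phi[D]=\bigcup_{n\in\mathbb{N}}\phi[D_n]$)
$$\le\sum_{n=0}^{\infty}J_n\mu_r^*D_n+\epsilon\mu_r^*D_n\le\epsilon\mu_r^*D+\sum_{n=0}^{\infty}J_n\mu_r^*D_n$$
(because the $D_n$ are disjoint and relatively measurable in $D$)
$$=\epsilon\mu_r^*D+\int_D\sum_{n=0}^{\infty}J_n\chi D_n\,d\mu_r\le\epsilon\mu_r^*D+\int_DJ(x)+\epsilon\,\mu_r(dx)=2\epsilon\mu_r^*D+\int_DJ\,d\mu_r.$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are Lebesgue measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measured by $\nu_r$, and they are also disjoint. Accordingly
$$\int_DJ\,d\mu_r\le\sum_{n=0}^{\infty}J_n\mu_rD_n+\epsilon\mu_rD\le\sum_{n=0}^{\infty}(\nu_r\phi[D_n]+\epsilon\mu_rD_n)+\epsilon\mu_rD=\nu_r\phi[D]+2\epsilon\mu_rD.$$
Since $\epsilon$ is arbitrary, we get
$$\nu_r^*\phi[D]\le\int_DJ\,d\mu_r,$$
and if $D$ is measurable and $\phi$ is injective,
$$\int_DJ\,d\mu_r\le\nu_r\phi[D];$$
thus we have (ii) and (v), on the assumption that $D$ is bounded and $J>0$ everywhere on $D$.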
(c) Just as in 263D, we can now relax the assumption that $D$ is bounded by considering $B_k=B(0,k)\subseteq\mathbb{R}^r$; provided $J>0$ everywhere on $D$, we get
$$\nu_r^*\phi[D]=\lim_{k\to\infty}\nu_r^*\phi[D\cap B_k]\le\lim_{k\to\infty}\int_{D\cap B_k}J\,d\mu_r=\int_DJ\,d\mu_r,$$
with equality if $D$ is measurable and $\phi$ is injective.
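(d) Now we find that $\nu_r^*\phi[D\setminus D']=0$.
PPP (i) Let $\eta\in\,]0,1]$. Define $\psi_\eta:D\to\mathbb{R}^{s+r}$ by setting $\psi_\eta(x)=(\phi(x),\eta x)$, identifying $\mathbb{R}^{s+r}$ with $\mathbb{R}^s\times\mathbb{R}^r$. $\psi_\eta$ is differentiable relative to its domain at each point of $D$, with derivative $\tilde T_\eta(x)$, being the $(s+r)\times r$ matrix in which the top $s$ rows consist of the $s\times r$ matrix $T(x)$, and the bottom $r$ rows are $\eta I_r$, writing $I_r$ for the $r\times r$ identity matrix. (Use 262Ib.) Now of course $\tilde T_\eta(x)$, regarded as a map from $\mathbb{R}^r$ to $\mathbb{R}^{s+r}$, is injective, so
$$\tilde J_\eta(x)=\sqrt{\det\tilde T_\eta(x)^{\top}\tilde T_\eta(x)}=\sqrt{\det(T(x)^{\top}T(x)+\eta^2I)}>0.$$
We have $\lim_{\eta\downarrow 0}\tilde J_\eta(x)=J(x)=0$ for $x\in D\setminus D'$.
(ii) Express $T(x)$ as $\langle\tau_{ij}(x)\rangle_{i\le s,j\le r}$ for each $x\in D$. Set
$$C_m=\{x:x\in D,\ \|x\|\le m,\ |\tau_{ij}(x)|\le m\text{ for all }i\le s,\ j\le r\}$$
for each $m\ge 1$. For $x\in C_m$, all the coefficients of $\tilde T_\eta(x)$ have moduli at most $m$; consequently (giving the crudest and most immediately available inequalities) all the coefficients of $\tilde T_\eta(x)^{\top}\tilde T_\eta(x)$ have moduli at most $(r+s)m^2$ and $\tilde J_\eta(x)\le\sqrt{r!(s+r)^r}\,m^r$. Consequently we can use Lebesgue's Dominated Convergence Theorem to see that
$$\lim_{\eta\downarrow 0}\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r=0.$$
(iii) Let $\tilde\nu_r$ be normalized Hausdorff $r$-dimensional measure on $\mathbb{R}^{s+r}$. Applying (b) of this proof to $\psi_\eta\restriction C_m\setminus D'$, we see that
$$\tilde\nu_r^*\psi_\eta[C_m\setminus D']\le\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r.$$
Now we have a natural map $P:\mathbb{R}^{s+r}\to\mathbb{R}^s$ given by setting $P(\xi_1,\ldots,\xi_{s+r})=(\xi_1,\ldots,\xi_s)$, and $P$ is 1-Lipschitz, so by 264G we have (allowing for the normalizing constants $2^{-r}\beta_r$)
$$\nu_r^*P[A]\le\tilde\nu_r^*A$$
for every $A\subseteq\mathbb{R}^{s+r}$. In particular,
$$\nu_r^*\phi[C_m\setminus D']=\nu_r^*P[\psi_\eta[C_m\setminus D']]\le\tilde\nu_r^*\psi_\eta[C_m\setminus D']\le\int_{C_m\setminus D'}\tilde J_\eta\,d\mu_r\to 0$$
as $\eta\downarrow 0$. But this means that $\nu_r^*\phi[C_m\setminus D']=0$. As $D=\bigcup_{m\ge 1}C_m$, $\nu_r^*\phi[D\setminus D']=0$, as claimed. QQQ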
(d) This proves (iii) of the theorem. But of course this is enough to give (ii) and (v), because we must have
$$\nu_r^*\phi[D]=\nu_r^*\phi[D']\le\int_{D'}J\,d\mu_r=\int_DJ\,d\mu_r,$$
with equality if $D$ is measurable and $\phi$ is injective.
(e) So let us turn to part (vi). Assume that $D$ is measurable and that $\phi$ is injective.
(i) Suppose that $E\subseteq\phi[D]$ belongs to $\mathrm{T}_r$. Let
$$H_k=\{x:x\in D,\ \|x\|\le k,\ J(x)\le k\}$$
for each $k$; then each $H_k$ is Lebesgue measurable, so (applying (iii) to $\phi\restriction H_k$) $\phi[H_k]\in\mathrm{T}_r$, and
$$\nu_r\phi[H_k]\le k\mu_rH_k<\infty.$$
Thus $\phi[D]$ can be covered by a sequence of sets of finite measure for $\nu_r$, which of course are of finite measure for $r$-dimensional Hausdorff measure on $\mathbb{R}^s$. By 264Fc, there are Borel sets $E_1$, $E_2\subseteq\mathbb{R}^s$ such that $E_1\subseteq E\subseteq E_2$ and $\nu_r(E_2\setminus E_1)=0$. Now $F_1=\phi^{-1}[E_1]$, $F_2=\phi^{-1}[E_2]$ are Lebesgue measurable subsets of $D$, and
$$\int_{F_2\setminus F_1}J\,d\mu_r=\nu_r\phi[F_2\setminus F_1]=\nu_r(\phi[D]\cap E_2\setminus E_1)=0.$$
Accordingly $\mu_r(D'\cap(F_2\setminus F_1))=0$. But as
$$D'\cap F_1\subseteq D'\cap\phi^{-1}[E]\subseteq D'\cap F_2,$$
it follows that $D'\cap\phi^{-1}[E]$ is measurable, and that
$$\int_{\phi^{-1}[E]}J\,d\mu_r=\int_{D'\cap\phi^{-1}[E]}J\,d\mu_r=\int_{D'\cap F_1}J\,d\mu_r=\int_{D\cap F_1}J\,d\mu_r=\nu_r\phi[D\cap F_1]=\nu_rE_1=\nu_rE.$$
Moreover, $J\times\chi(\phi^{-1}[E])=J\times\chi(D'\cap\phi^{-1}[E])$ is measurable, so we can write $\int J\times\chi(\phi^{-1}[E])$ in place of $\int_{\phi^{-1}[E]}J$.
(ii) If $E\subseteq\phi[D]$ and $D'\cap\phi^{-1}[E]$ is measurable, then of course
$$E=\phi[D'\cap\phi^{-1}[E]]\cup\phi[(D\setminus D')\cap\phi^{-1}[E]]\in\mathrm{T}_r,$$
because $\phi[G]\in\mathrm{T}_r$ for every measurable $G\subseteq D$ and $\phi[D\setminus D']$ is $\nu_r$-negligible.
(f) Finally, (vii) follows at once from (vi), applying 235J to $\mu_r$ and the subspace measure induced by $\nu_r$ on $\phi[D]$.
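265F The surface of a sphere To show how these ideas can be applied to one of the basic cases, I give the details of a method of describing spherical surface measure in $s$-dimensional space. Take $r\ge 1$ and $s=r+1$. Write $S_r$ for $\{z:z\in\mathbb{R}^{r+1},\ \|z\|=1\}$, the $r$-sphere. Then we have a parametrization $\phi_r$ of $S_r$ given by setting
$$\phi_r\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_r\end{pmatrix}=\begin{pmatrix}\sin\xi_1\sin\xi_2\sin\xi_3\ldots\sin\xi_r\\ \cos\xi_1\sin\xi_2\sin\xi_3\ldots\sin\xi_r\\ \cos\xi_2\sin\xi_3\ldots\sin\xi_r\\ \vdots\\ \cos\xi_{r-2}\sin\xi_{r-1}\sin\xi_r\\ \cos\xi_{r-1}\sin\xi_r\\ \cos\xi_r\end{pmatrix}.$$
I choose this formulation because I wish to use an inductive argument based on the fact that
$$\phi_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}\sin\xi\,\phi_r(x)\\ \cos\xi\end{pmatrix}$$
for $x\in\mathbb{R}^r$, $\xi\in\mathbb{R}$. Every $\phi_r$ is differentiable, by 262Id. If we set
$$D_r=\{x:\xi_1\in\,]-\pi,\pi],\ \xi_2,\ldots,\xi_r\in[0,\pi],\text{ if }\xi_j\in\{0,\pi\}\text{ then }\xi_i=0\text{ for }i<j\},$$
then it is easy to check that $D_r$ is a Borel subset of $\mathbb{R}^r$ and that $\phi_r\restriction D_r$ is a bijection between $D_r$ and $S_r$. Now let $T_r(x)$ be the $(r+1)\times r$ matrix $\phi_r'(x)$. Then
$$T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}\sin\xi\,T_r(x)&\cos\xi\,\phi_r(x)\\ 0&-\sin\xi\end{pmatrix}.$$
So
$$\Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)^{\top}T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}\sin^2\xi\,T_r(x)^{\top}T_r(x)&\sin\xi\cos\xi\,T_r(x)^{\top}\phi_r(x)\\ \cos\xi\sin\xi\,\phi_r(x)^{\top}T_r(x)&\cos^2\xi\,\phi_r(x)^{\top}\phi_r(x)+\sin^2\xi\end{pmatrix}.$$
But of course $\phi_r(x)^{\top}\phi_r(x)=\|\phi_r(x)\|^2=1$ for every $x$, and (differentiating with respect to each coordinate of $x$, if you wish) $T_r(x)^{\top}\phi_r(x)=0$, $\phi_r(x)^{\top}T_r(x)=0$. So we get
$$\Bigl(T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}\Bigr)^{\top}T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}\sin^2\xi\,T_r(x)^{\top}T_r(x)&0\\ 0&1\end{pmatrix},$$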
and writing $J_r(x)=\sqrt{\det T_r(x)^{\top}T_r(x)}$,
$$J_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}=|\sin^r\xi|\,J_r(x).$$
At this point we induce on $r$ to see that
$$J_r(x)=|\sin^{r-1}\xi_r\sin^{r-2}\xi_{r-1}\ldots\sin\xi_2|$$
(since of course the induction starts with the case $r=1$, $\phi_1(x)=\begin{pmatrix}\sin x\\ \cos x\end{pmatrix}$, $T_1(x)=\begin{pmatrix}\cos x\\ -\sin x\end{pmatrix}$, $T_1(x)^{\top}T_1(x)=1$, $J_1(x)=1$).
To find the surface measure of $S_r$, we need to calculate
$$\int_{D_r}J_r\,d\mu_r=\int_0^{\pi}\ldots\int_0^{\pi}\int_{-\pi}^{\pi}\sin^{r-1}\xi_r\ldots\sin\xi_2\,d\xi_1d\xi_2\ldots d\xi_r=2\pi\prod_{k=2}^{r}\int_0^{\pi}\sin^{k-1}t\,dt=2\pi\prod_{k=1}^{r-1}\int_{-\pi/2}^{\pi/2}\cos^kt\,dt$$
(substituting $\frac{\pi}{2}-t$ for $t$). But in the language of 252Q, this is just
$$2\pi\prod_{k=1}^{r-1}I_k=2\pi\beta_{r-1},$$
where $\beta_{r-1}$ is the volume of the unit ball of $\mathbb{R}^{r-1}$ (interpreting $\beta_0$ as 1, if you like).
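The final identity can be sanity-checked numerically. The sketch below assumes that $I_k$ denotes $\int_{-\pi/2}^{\pi/2}\cos^kt\,dt$, as the display above suggests, and uses the closed form $\beta_r=\pi^{r/2}/\Gamma(1+\frac{r}{2})$; the quadrature rule and the function names are choices made for this illustration.

```python
import math

def beta(r: float) -> float:
    """Volume of the unit ball of R^r, via pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

def cos_power_integral(k: int, n: int = 20000) -> float:
    """Midpoint approximation to the integral of cos^k t over [-pi/2, pi/2]."""
    h = math.pi / n
    return h * sum(math.cos(-math.pi / 2 + (j + 0.5) * h) ** k for j in range(n))

def sphere_measure(r: int) -> float:
    """2*pi * prod_{k=1}^{r-1} I_k, the value obtained above for the measure of S_r."""
    product = 1.0
    for k in range(1, r):
        product *= cos_power_integral(k)
    return 2 * math.pi * product

for r in range(1, 5):
    # The three columns should agree: quadrature, 2*pi*beta_{r-1}, (r+1)*beta_{r+1}.
    print(r, sphere_measure(r), 2 * math.pi * beta(r - 1), (r + 1) * beta(r + 1))
```

The third column anticipates the value $(r+1)\beta_{r+1}$ given in 265H; the two closed forms agree because $\beta_{r+1}=\frac{2\pi}{r+1}\beta_{r-1}$.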
265G The surface area of a sphere can also be calculated through the following result.
Theorem Let $\mu_{r+1}$ be Lebesgue measure on $\mathbb{R}^{r+1}$, and $\nu_r$ normalized $r$-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$. If $f$ is a locally $\mu_{r+1}$-integrable real-valued function, $y\in\mathbb{R}^{r+1}$ and $\delta>0$,
$$\int_{B(y,\delta)}f\,d\mu_{r+1}=\int_0^{\delta}\int_{\partial B(y,t)}f\,d\nu_r\,dt,$$
where I write $\partial B(y,s)$ for the sphere $\{x:\|x-y\|=s\}$ and the integral $\int\ldots dt$ is to be taken with respect to Lebesgue measure on $\mathbb{R}$.
proof Take any differentiable function $\phi:\mathbb{R}^r\to S_r$ with a Borel set $F\subseteq\mathbb{R}^r$ such that $\phi\restriction F$ is a bijection between $F$ and $S_r$; such a pair $(\phi,F)$ is described in 265F. Define $\psi:\mathbb{R}^r\times\mathbb{R}\to\mathbb{R}^{r+1}$ by setting $\psi(z,t)=y+t\phi(z)$; then $\psi$ is differentiable and $\psi\restriction F\times\,]0,\delta]$ is a bijection between $F\times\,]0,\delta]$ and $B(y,\delta)\setminus\{y\}$. For $t\in\,]0,\delta]$, $z\in\mathbb{R}^r$ set $\psi_t(z)=\psi(z,t)$; then $\psi_t\restriction F$ is a bijection between $F$ and the sphere $\{x:\|x-y\|=t\}=\partial B(y,t)$.
The derivative of $\phi$ at $z$ is an $(r+1)\times r$ matrix $T_1(z)$ say, and the derivative $T_t(z)$ of $\psi_t$ at $z$ is just $tT_1(z)$; also the derivative of $\psi$ at $(z,t)$ is the $(r+1)\times(r+1)$ matrix $T(z,t)=(\,tT_1(z)\ \ \phi(z)\,)$, where $\phi(z)$ is interpreted as a column vector. If we set
$$J_t(z)=\sqrt{\det T_t(z)^{\top}T_t(z)},\qquad J(z,t)=|\det T(z,t)|,$$
then
$$J(z,t)^2=\det T(z,t)^{\top}T(z,t)=\det\begin{pmatrix}tT_1(z)^{\top}\\ \phi(z)^{\top}\end{pmatrix}(\,tT_1(z)\ \ \phi(z)\,)=\det\begin{pmatrix}t^2T_1(z)^{\top}T_1(z)&0\\ 0&1\end{pmatrix}=J_t(z)^2,$$
because when we come to calculate the $(i,r+1)$-coefficient of $T(z,t)^{\top}T(z,t)$, for $1\le i\le r$, it is
$$\sum_{j=1}^{r+1}t\frac{\partial\phi_j}{\partial\zeta_i}(z)\phi_j(z)=\frac{t}{2}\frac{\partial}{\partial\zeta_i}\Bigl(\sum_{j=1}^{r+1}\phi_j(z)^2\Bigr)=0,$$
where $\phi_j$ is the $j$th coordinate of $\phi$; while the $(r+1,r+1)$-coefficient of $T(z,t)^{\top}T(z,t)$ is just $\sum_{j=1}^{r+1}\phi_j(z)^2=1$. So in fact $J(z,t)=J_t(z)$ for all $z\in\mathbb{R}^r$, $t>0$.
Now, given $f\in\mathcal{L}^1(\mu_{r+1})$, we can calculate
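$$\int_{B(y,\delta)}f\,d\mu_{r+1}=\int_{B(y,\delta)\setminus\{y\}}f\,d\mu_{r+1}=\int_{F\times]0,\delta]}f(\psi(z,t))J(z,t)\,\mu_{r+1}(d(z,t))$$
(by 263D)
$$=\int_0^{\delta}\int_Ff(\psi_t(z))J_t(z)\,\mu_r(dz)\,dt$$
(where $\mu_r$ is Lebesgue measure on $\mathbb{R}^r$, by Fubini's theorem, 252B)
$$=\int_0^{\delta}\int_{\partial B(y,t)}f\,d\nu_r\,dt$$
by 265E(vii).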
265H Corollary If $\nu_r$ is normalized $r$-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$, then $\nu_rS_r=(r+1)\beta_{r+1}$.
proof In 265G, take $y=0$, $\delta=1$, and $f=\chi B(0,1)$; then
$$\beta_{r+1}=\int f\,d\mu_{r+1}=\int_0^1\nu_r(\partial B(0,t))\,dt=\int_0^1t^r\nu_rS_r\,dt=\frac{1}{r+1}\nu_rS_r,$$
applying 264G to the maps $x\mapsto tx$, $x\mapsto\frac{1}{t}x$ from $\mathbb{R}^{r+1}$ to itself to see that $\nu_r(\partial B(0,t))=t^r\nu_rS_r$ for $t>0$.
265X Basic exercises (a) Let $r\ge 1$, and let $S_r(\alpha)=\{z:z\in\mathbb{R}^{r+1},\ \|z\|=\alpha\}$ be the $r$-sphere of radius $\alpha$. Show that $\nu_rS_r(\alpha)=2\pi\beta_{r-1}\alpha^r=(r+1)\beta_{r+1}\alpha^r$ for every $\alpha\ge 0$.
>>>(b) Let $r\ge 1$, and for $a\in[-1,1]$ set $C_a=\{z:z\in\mathbb{R}^{r+1},\ \|z\|=1,\ \zeta_1\ge a\}$, writing $z=(\zeta_1,\ldots,\zeta_{r+1})$ as usual. Show that
$$\nu_rC_a=r\beta_r\int_0^{\arccos a}\sin^{r-1}t\,dt.$$
>>>(c) Again write $C_a=\{z:z\in S_r,\ \zeta_1\ge a\}$, where $S_r\subseteq\mathbb{R}^{r+1}$ is the unit sphere. Show that, for any $a\in\,]0,1]$, $\nu_rC_a\le\dfrac{\nu_rS_r}{2(r+1)a^2}$. (Hint: calculate $\sum_{i=1}^{r+1}\int_{S_r}|\xi_i|^2\nu_r(dx)$.)
>>>(d) Let $\phi:\,]0,1[\,\to\mathbb{R}^r$ be an injective differentiable function. Show that the "length" or one-dimensional Hausdorff measure of $\phi[\,]0,1[\,]$ is just $\int_0^1\|\phi'(t)\|\,dt$.
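(e)(i) Show that if $I$ is the identity $r\times r$ matrix and $z\in\mathbb{R}^r$, then $\det(I+zz^{\top})=1+\|z\|^2$. (Hint: induce on $r$.) (ii) Write $U_{r-1}$ for the open unit ball in $\mathbb{R}^{r-1}$, where $r\ge 2$. Define $\phi:U_{r-1}\times\mathbb{R}\to S_r$ by setting
$$\phi\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}x\\ \theta(x)\cos\xi\\ \theta(x)\sin\xi\end{pmatrix},$$
where $\theta(x)=\sqrt{1-\|x\|^2}$. Show that
$$\phi'\begin{pmatrix}x\\ \xi\end{pmatrix}^{\top}\phi'\begin{pmatrix}x\\ \xi\end{pmatrix}=\begin{pmatrix}I+\frac{1}{\theta(x)^2}xx^{\top}&0\\ 0&\theta(x)^2\end{pmatrix},$$
so that $J\begin{pmatrix}x\\ \xi\end{pmatrix}=1$ for all $x\in U_{r-1}$, $\xi\in\mathbb{R}$. (iii) Hence show that the normalized $r$-dimensional Hausdorff measure of $\{y:y\in S_r,\ \sum_{i=1}^{r-1}\eta_i^2<1\}$ is just $2\pi\beta_{r-1}$, where $\beta_{r-1}$ is the Lebesgue measure of $U_{r-1}$. (iv) By considering $\psi(z)=\begin{pmatrix}z\\ 0\\ 0\end{pmatrix}$ for $z\in S_{r-2}$, or otherwise, show that the normalized $r$-dimensional Hausdorff measure of $S_r$ is $2\pi\beta_{r-1}$. (v) Setting $C_a=\{z:z\in\mathbb{R}^{r+1},\ \|z\|=1,\ \zeta_1\ge a\}$, as in 265Xb and 265Xc, show that $\nu_rC_a=2\pi\mu_{r-1}\{x:x\in\mathbb{R}^{r-1},\ \|x\|\le 1,\ \xi_1\ge a\}$ for every $a\in[-1,1]$.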
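The two cap formulae, from 265Xb and 265Xe(v), can be compared numerically in low dimensions. The sketch below is a plausibility check rather than a solution to the exercises: it assumes $\beta_r=\pi^{r/2}/\Gamma(1+\frac{r}{2})$, codes the Lebesgue measure on the right-hand side of 265Xe(v) in closed form only for $r=2$ (a segment of the line) and $r=3$ (a circular segment of the unit disc), and uses hypothetical function names.

```python
import math

def beta(r: float) -> float:
    """Volume of the unit ball of R^r, via pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

def cap_by_angle(r: int, a: float, n: int = 20000) -> float:
    """r * beta_r * integral_0^{arccos a} sin^{r-1} t dt, by the midpoint rule."""
    upper = math.acos(a)
    h = upper / n
    integral = h * sum(math.sin((j + 0.5) * h) ** (r - 1) for j in range(n))
    return r * beta(r) * integral

def cap_by_cylinder(r: int, a: float) -> float:
    """2*pi * mu_{r-1}{x : |x| <= 1, xi_1 >= a}, closed forms for r = 2, 3 only."""
    if r == 2:
        return 2 * math.pi * (1 - a)                       # length of [a, 1]
    if r == 3:
        # Area of the part of the unit disc lying to the right of xi_1 = a.
        return 2 * math.pi * (math.acos(a) - a * math.sqrt(1 - a * a))
    raise ValueError("closed form only coded for r = 2 and r = 3")

for r in (2, 3):
    for a in (-0.5, 0.0, 0.3):
        print(r, a, cap_by_angle(r, a), cap_by_cylinder(r, a))
```

For $r=2$ this is the classical observation that a spherical cap has the same area as the corresponding band of the surrounding cylinder.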
265Y Further exercises (a) Take $a<b$ in $\mathbb{R}$. (i) Show that $\phi:[a,b]\to\mathbb{R}^r$ is absolutely continuous in the sense of 264Yp iff all its coordinates $\phi_i:[a,b]\to\mathbb{R}$, for $i\le r$, are absolutely continuous in the sense of §225. (ii) Let $\phi:[a,b]\to\mathbb{R}^r$ be a continuous function, and set $F=\{x:x\in\,]a,b[\,,\ \phi\text{ is differentiable at }x\}$. Show that $\phi$ is absolutely continuous iff $\int_F\|\phi'(x)\|\,dx$ is finite and $\nu_1(\phi[\,[a,b]\setminus F])=0$, where $\nu_1$ is normalized Hausdorff one-dimensional measure on $\mathbb{R}^r$. (Hint: 225K.) (iii) Show that if $\phi:[a,b]\to\mathbb{R}^r$ is absolutely continuous then $\nu_1^*(\phi[D])\le\int_D\|\phi'(x)\|\,dx$ for every $D\subseteq[a,b]$, with equality if $D$ is measurable and $\phi\restriction D$ is injective.
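(b) Suppose that $a\le b$ in $\mathbb{R}$, and that $f:[a,b]\to\mathbb{R}$ is a continuous function of bounded variation with graph $\Gamma_f$. Show that the one-dimensional Hausdorff measure of $\Gamma_f$ is $\operatorname{Var}_{[a,b]}(f)+\int_a^b\bigl(\sqrt{1+(f')^2}-|f'|\bigr)$. (Hint: set $E=\operatorname{dom}f'$ and examine $\Gamma_{f\restriction E}$, $\Gamma_f\setminus\Gamma_{f\restriction E}$ separately; use 264Xf and/or 264Yl.)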
265 Notes and comments The proof of 265B seems to call on most of the second half of the alphabet. The idea is supposed to be straightforward enough. Because $T[\mathbb{R}^r]$ has dimension at most $r$, it can be rotated by an orthogonal transformation $P$ into a subspace of the canonical $r$-dimensional subspace $V$, which is a natural copy of $\mathbb{R}^r$; the matrix $R$ represents the copying process from $V$ to $\mathbb{R}^r$, and $\phi$ or $P^{\top}R^{\top}$ is a copy of $\mathbb{R}^r$ onto a subspace including $T[\mathbb{R}^r]$. All this copying back and forth is designed to turn $T$ into a linear operator $S:\mathbb{R}^r\to\mathbb{R}^r$ to which we can apply 263A, and part (b) of the proof is the check that we are copying the measures as well as the linear structures.
In 265D-265E I have tried to follow 263C-263D as closely as possible. In fact only one new idea is needed. When $s=r$, we have a special argument available to show that $\mu_r^*\phi[D]\le J\mu_r^*D+\epsilon\mu_r^*D$ (in the language of 263C) which applies whether or not $J=0$. When $s>r$, this approach fails, because we can no longer approximate $\nu_rT[B]$ by $\nu_rG$ where $G\supseteq T[B]$ is open. (See part (b-i) of the proof of 263C.) I therefore turn to a different argument, valid only when $J>0$, and accordingly have to find a separate method to show that $\{\phi(x):x\in D,\ J(x)=0\}$ is $\nu_r$-negligible. Since we are working without restrictions on the dimensions $r$, $s$ except that $r\le s$, we can use the trick of approximating $\phi:D\to\mathbb{R}^s$ by $\psi_\eta:D\to\mathbb{R}^{s+r}$, as in part (d) of the proof of 265E.
I give three methods by which the area of the $r$-sphere can be calculated; a bare-hands approach (265F), the surrounding-cylinder method (265Xe) and an important repeated-integral theorem (265G). The first two provide formulae for the area of a cap (265Xb, 265Xe(v)). The surrounding-cylinder method is attractive because the Jacobian comes out to be 1, that is, we have an inverse-measure-preserving function. I note that despite having developed a technique which allows irregular domains, I am still forced by the singularity in the function $\theta$ of 265Xe to take the sphere in two bites. Theorem 265G is a special case of the Coarea Theorem (Evans & Gariepy 92, §3.4; Federer 69, 3.2.12).
For the next step in the geometric theory of measures on Euclidean space, see Chapter 47 in Volume 4.
*266 The Brunn-Minkowski inequality
We now have most of the essential ingredients for a proof of the Brunn-Minkowski inequality (266C) in a strong form. I do not at present expect to use it in this treatise, but it is one of the basic results of geometric measure theory and from where we now stand is not difficult, so I include it here. The preliminary results on arithmetic and geometric means (266A) and essential closures (266B) are of great importance for other reasons.
266A Arithmetic and geometric means We shall need the following standard result.
Proposition If $u_0,\ldots,u_n$, $p_0,\ldots,p_n\in[0,\infty[$ and $\sum_{i=0}^np_i=1$, then $\prod_{i=0}^nu_i^{p_i}\le\sum_{i=0}^np_iu_i$.
proof Induce on $n$. For $n=0$, $p_0=1$ the result is trivial. If $n=1$, then if $u_1=0$ the result is trivial (even if, as is standard in this book, we interpret $0^0$ as 1). Otherwise, set $t=\frac{u_0}{u_1}$; then
$$t^{p_0}\le p_0t+1-p_0=p_0t+p_1$$
(as in part (a) of the proof of 244E), so
$$u_0^{p_0}u_1^{p_1}=t^{p_0}u_1\le p_0tu_1+p_1u_1=p_0u_0+p_1u_1.$$
For the inductive step to $n\ge 2$, if $p_0=\ldots=p_{n-1}=0$ the result is trivial. Otherwise, set $q=p_0+\ldots+p_{n-1}=1-p_n$; then
$$\prod_{i=0}^nu_i^{p_i}=\Bigl(\prod_{i=0}^{n-1}u_i^{p_i/q}\Bigr)^qu_n^{p_n}\le\Bigl(\sum_{i=0}^{n-1}\frac{p_i}{q}u_i\Bigr)^qu_n^{p_n}$$
(by the inductive hypothesis)
$$\le q\Bigl(\sum_{i=0}^{n-1}\frac{p_i}{q}u_i\Bigr)+p_nu_n$$
(by the two-term case just examined)
$$=\sum_{i=0}^np_iu_i,$$
and the induction continues.
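The inequality is easy to spot-check on random data. This sketch is an illustration, not a proof; the function names, the fixed seed and the tolerance are assumptions of the example. It relies on Python evaluating `0.0 ** 0.0` as `1.0`, matching the convention $0^0=1$ used above.

```python
import math
import random

def weighted_geometric_mean(u, p):
    """prod u_i^{p_i} for non-negative u_i and weights p_i summing to 1."""
    return math.prod(x ** w for x, w in zip(u, p))

def weighted_arithmetic_mean(u, p):
    """sum p_i u_i."""
    return sum(w * x for x, w in zip(u, p))

def smallest_gap(trials: int = 1000, seed: int = 0) -> float:
    """Smallest value of (arithmetic mean - geometric mean) over random samples."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        n = rng.randint(1, 6)
        u = [rng.uniform(0.0, 10.0) for _ in range(n)]
        raw = [rng.random() + 1e-9 for _ in range(n)]   # strictly positive raw weights
        total = sum(raw)
        p = [w / total for w in raw]
        worst = min(worst, weighted_arithmetic_mean(u, p) - weighted_geometric_mean(u, p))
    return worst

print(smallest_gap())   # expected to be non-negative up to rounding
```

For instance, with $u=(4,9)$ and $p=(\frac12,\frac12)$ the geometric mean is 6 and the arithmetic mean is 6.5.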
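266B Proposition For any set $D\subseteq\mathbb{R}^r$ set
$$\operatorname{cl*}D=\Bigl\{x:\limsup_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}>0\Bigr\},$$
where $\mu$ is Lebesgue measure on $\mathbb{R}^r$.
(a) $D\setminus\operatorname{cl*}D$ is negligible.
(b) $\operatorname{cl*}D\subseteq\overline{D}$.
(c) $\operatorname{cl*}D$ is a Borel set.
(d) $\mu(\operatorname{cl*}D)=\mu^*D$.
(e) If $C\subseteq\mathbb{R}^r$ then $\overline{C}+\operatorname{cl*}D\subseteq\operatorname{cl*}(C+D)$, writing $C+D$ for $\{x+y:x\in C,\ y\in D\}$.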
proof (a) 261Da.
(b) If x R
r
D then D B(x, ) = for all small .
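(c) The point is just that $(x,\delta)\mapsto\mu^*(D\cap B(x,\delta))$ is continuous. PPP For any $x$, $y\in\mathbb{R}^r$ and $\delta$, $\eta\ge0$ we have
$$\begin{aligned}
|\mu^*(D\cap B(y,\eta))-\mu^*(D\cap B(x,\delta))|&\le\mu(B(y,\eta)\triangle B(x,\delta))\\
&=2\mu(B(x,\delta)\cup B(y,\eta))-\mu B(x,\delta)-\mu B(y,\eta)\\
&\le\beta_r\bigl(2(\max(\delta,\eta)+\|x-y\|)^r-\delta^r-\eta^r\bigr)
\end{aligned}$$
(where $\beta_r=\mu B(0,1)$)
$$\to0$$
as $(y,\eta)\to(x,\delta)$. QQQ So
$$x\mapsto\limsup_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}=\inf_{\alpha\in\mathbb{Q},\,\alpha>0}\ \sup_{\gamma\in\mathbb{Q},\,0<\gamma\le\alpha}\frac{1}{\beta_r\gamma^r}\mu^*(D\cap B(x,\gamma))$$
is Borel measurable, and
$$\mathrm{cl}^*D=\Bigl\{x:\limsup_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}>0\Bigr\}$$
is a Borel set.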
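(d) By (c), $\mu(\mathrm{cl}^*D)$ is defined; by (a), $\mu(\mathrm{cl}^*D)\ge\mu^*D$. On the other hand, let $E$ be a measurable envelope of $D$ (132Ee); then 261Db tells us that
$$\limsup_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)}\le\limsup_{\delta\downarrow0}\frac{\mu(E\cap B(x,\delta))}{\mu B(x,\delta)}=0$$
for almost every $x\in\mathbb{R}^r\setminus E$, so $\mathrm{cl}^*D\setminus E$ is negligible and
$$\mu(\mathrm{cl}^*D)\le\mu E=\mu^*D.$$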
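(e) If $x\in\overline{C}$ and $y\in\mathrm{cl}^*D$, set
$$\gamma=\frac13\limsup_{\delta\downarrow0}\frac{\mu^*(D\cap B(y,\delta))}{\mu B(y,\delta)}>0.$$
For any $\eta>0$, there is a $\delta\in{]0,\eta]}$ such that $\mu^*(D\cap B(y,\delta))\ge2\gamma\mu B(y,\delta)$. Let $\delta_1\in[0,\delta[$ be such that $\delta^r-\delta_1^r\le\gamma\delta^r$. Then there is an $x'\in C$ such that $\|x-x'\|\le\delta-\delta_1$. In this case,
$$\begin{aligned}
\mu^*((C+D)\cap B(x+y,\delta))&\ge\mu^*((x'+D)\cap B(x'+y,\delta_1))=\mu^*(D\cap B(y,\delta_1))\\
&\ge\mu^*(D\cap B(y,\delta))-\mu B(y,\delta)+\mu B(y,\delta_1)\\
&\ge2\gamma\beta_r\delta^r-\beta_r\delta^r+\beta_r\delta_1^r\ge\gamma\beta_r\delta^r.
\end{aligned}$$
As $\eta$ is arbitrary,
$$\limsup_{\delta\downarrow0}\frac{\mu^*((C+D)\cap B(x+y,\delta))}{\mu B(y,\delta)}\ge\gamma$$
and $x+y\in\mathrm{cl}^*(C+D)$; as $x$ and $y$ are arbitrary, $\overline{C}+\mathrm{cl}^*D\subseteq\mathrm{cl}^*(C+D)$.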
Remark In this context, cl*D is called the essential closure of D.
266C Theorem Let $A$, $B\subseteq\mathbb{R}^r$ be non-empty sets, where $r\ge1$ is an integer. If $\mu$ is Lebesgue measure on $\mathbb{R}^r$, and $A+B=\{x+y:x\in A,\,y\in B\}$, then $\mu^*(A+B)^{1/r}\ge(\mu^*A)^{1/r}+(\mu^*B)^{1/r}$.
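proof (a) Consider first the case in which $A=[a,a'[$ and $B=[b,b'[$ are half-open intervals. In this case $A+B=[a+b,a'+b'[$; writing $a=(\alpha_1,\ldots,\alpha_r)$, etc., as in §115, set
$$u_i=\frac{\alpha'_i-\alpha_i}{\alpha'_i+\beta'_i-\alpha_i-\beta_i},\qquad v_i=\frac{\beta'_i-\beta_i}{\alpha'_i+\beta'_i-\alpha_i-\beta_i}$$
for each $i$. Then we have
$$\begin{aligned}
\mu(A)^{1/r}+\mu(B)^{1/r}&=\prod_{i=1}^r(\alpha'_i-\alpha_i)^{1/r}+\prod_{i=1}^r(\beta'_i-\beta_i)^{1/r}\\
&=\mu(A+B)^{1/r}\Bigl(\prod_{i=1}^ru_i^{1/r}+\prod_{i=1}^rv_i^{1/r}\Bigr)\\
&\le\mu(A+B)^{1/r}\Bigl(\frac1r\sum_{i=1}^ru_i+\frac1r\sum_{i=1}^rv_i\Bigr)
\end{aligned}$$
(266A)
$$=\mu(A+B)^{1/r},$$
because $u_i+v_i=1$ for every $i$.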
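(b) Now I show by induction on $m+n$ that if $A=\bigcup_{j=0}^mA_j$ and $B=\bigcup_{j=0}^nB_j$, where $\langle A_j\rangle_{j\le m}$ and $\langle B_j\rangle_{j\le n}$ are both disjoint families of non-empty half-open intervals, then $\mu(A+B)^{1/r}\ge\mu(A)^{1/r}+\mu(B)^{1/r}$. PPP The induction starts with the case $m=n=0$, dealt with in (a). For the inductive step to $m+n=l\ge1$, one of $m$, $n$ is non-zero; the argument is the same in both cases; suppose the former. Since $A_0\cap A_1=\emptyset$, there must be some $j\le r$ and $\alpha\in\mathbb{R}$ such that $A_0$ and $A_1$ are separated by the hyperplane $\{x:\xi_j=\alpha\}$. Set $A'=\{x:x\in A,\,\xi_j<\alpha\}$ and $A''=\{x:x\in A,\,\xi_j\ge\alpha\}$; then both $A'$ and $A''$ are non-empty and can be expressed as the union of at most $m$ disjoint half-open intervals (that is, as a disjoint family indexed by $j\le m-1$). Set $\gamma=\mu A'/\mu A\in{]0,1[}$. The function $\beta\mapsto\mu\{x:x\in B,\,\xi_j<\beta\}$ is continuous, so there is a $\beta\in\mathbb{R}$ such that $\mu B'=\gamma\mu B$, where $B'=\{x:x\in B,\,\xi_j<\beta\}$; set $B''=B\setminus B'$. Then $B'$ and $B''$ are non-empty and can be expressed as unions of at most $n+1$ disjoint half-open intervals (indexed by $j\le n$). By the inductive hypothesis,
$$\mu(A'+B')^{1/r}\ge(\mu A')^{1/r}+(\mu B')^{1/r},\qquad\mu(A''+B'')^{1/r}\ge(\mu A'')^{1/r}+(\mu B'')^{1/r}.$$
Now $A'+B'\subseteq\{x:\xi_j<\alpha+\beta\}$, while $A''+B''\subseteq\{x:\xi_j\ge\alpha+\beta\}$. So
$$\begin{aligned}
\mu(A+B)&\ge\mu(A'+B')+\mu(A''+B'')\\
&\ge\bigl((\mu A')^{1/r}+(\mu B')^{1/r}\bigr)^r+\bigl((\mu A'')^{1/r}+(\mu B'')^{1/r}\bigr)^r\\
&=\bigl((\gamma\mu A)^{1/r}+(\gamma\mu B)^{1/r}\bigr)^r+\bigl(((1-\gamma)\mu A)^{1/r}+((1-\gamma)\mu B)^{1/r}\bigr)^r\\
&=\bigl((\mu A)^{1/r}+(\mu B)^{1/r}\bigr)^r.
\end{aligned}$$
Taking $r$th roots, $\mu(A+B)^{1/r}\ge(\mu A)^{1/r}+(\mu B)^{1/r}$ and the induction proceeds. QQQ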
(c) Now suppose that $A$ and $B$ are compact non-empty subsets of $\mathbb{R}^r$. Then $\mu(A+B)^{1/r}\ge(\mu A)^{1/r}+(\mu B)^{1/r}$. PPP $A+B$ is compact (because $A\times B\subseteq\mathbb{R}^r\times\mathbb{R}^r$ is compact, being closed and bounded, and addition is continuous, so we can use 2A2Eb). Let $\epsilon>0$. Let $G\supseteq A+B$ be an open set such that $\mu G\le\mu(A+B)+\epsilon$ (134Fa); then there is a $\delta>0$ such that $B(x,2\delta)\subseteq G$ for every $x\in A+B$ (2A2Ed). Let $n\in\mathbb{N}$ be such that $2^{-n}\sqrt r\le\delta$, and let $A_1$ be the union of all the half-open intervals of the form $[2^{-n}z,2^{-n}z+2^{-n}e[$ which meet $A$, where $z\in\mathbb{Z}^r$ and $e=(1,1,\ldots,1)$. Then $A_1$ is a finite disjoint union of half-open intervals, $A\subseteq A_1$ and every point of $A_1$ is within a distance $\delta$ of some point of $A$. Similarly, we can find a set $B_1$, a finite disjoint union of half-open intervals, including $B$ and such that every point of $B_1$ is within $\delta$ of some point of $B$. But this means that every point of $A_1+B_1$ is within a distance $2\delta$ of some point of $A+B$, and belongs to $G$. Accordingly
$$(\mu(A+B)+\epsilon)^{1/r}\ge(\mu G)^{1/r}\ge\mu(A_1+B_1)^{1/r}\ge(\mu A_1)^{1/r}+(\mu B_1)^{1/r}$$
(by (b))
$$\ge(\mu A)^{1/r}+(\mu B)^{1/r}.$$
As $\epsilon$ is arbitrary, $\mu(A+B)^{1/r}\ge(\mu A)^{1/r}+(\mu B)^{1/r}$. QQQ
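(d) Next suppose that $A$, $B\subseteq\mathbb{R}^r$ are Lebesgue measurable. Then
$$(\mu A)^{1/r}+(\mu B)^{1/r}=\sup\{(\mu K)^{1/r}+(\mu L)^{1/r}:K\subseteq A\text{ and }L\subseteq B\text{ are compact}\}$$
(134Fb)
$$\le\sup\{\mu(K+L)^{1/r}:K\subseteq A\text{ and }L\subseteq B\text{ are compact}\}$$
(by (c))
$$\le\mu^*(A+B)^{1/r}.$$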
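(e) For the penultimate step, suppose that $A$, $B\subseteq\mathbb{R}^r$ have non-zero outer Lebesgue measure. Consider $\mathrm{cl}^*A$, $\mathrm{cl}^*B$ and $\mathrm{cl}^*(A+B)$ as defined in 266B. Then $\mathrm{cl}^*A$ and $\mathrm{cl}^*B$ are non-empty and their sum is included in $\mathrm{cl}^*(A+B)$, by 266Bb and 266Be. So we have
$$(\mu^*A)^{1/r}+(\mu^*B)^{1/r}=(\mu(\mathrm{cl}^*A))^{1/r}+(\mu(\mathrm{cl}^*B))^{1/r}$$
(266Bd)
$$\le\mu^*(\mathrm{cl}^*A+\mathrm{cl}^*B)^{1/r}$$
(by (d) here)
$$\le\mu(\mathrm{cl}^*(A+B))^{1/r}=\mu^*(A+B)^{1/r}.$$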
(f) Finally, for arbitrary non-empty sets $A$, $B\subseteq\mathbb{R}^r$, note that if (for instance) $A$ is negligible then we can take any $x\in A$ and see that
$$\mu^*(A+B)^{1/r}\ge\mu^*(x+B)^{1/r}=(\mu^*B)^{1/r}=(\mu^*A)^{1/r}+(\mu^*B)^{1/r},$$
and the result is similarly trivial if $B$ is negligible. So all cases are covered.
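Case (a) of the proof, where both sets are intervals (boxes), can be checked numerically, since the sum of two boxes is again a box whose side lengths add. The sketch below is illustrative only; the function names are hypothetical, and it assumes non-degenerate boxes given by lists of lower and upper corner coordinates.

```python
import math
from typing import List, Sequence, Tuple


def box_volume(lo: Sequence[float], hi: Sequence[float]) -> float:
    """Lebesgue measure of the interval [lo, hi[ in R^r."""
    return math.prod(h - l for l, h in zip(lo, hi))


def minkowski_sum_box(lo_a, hi_a, lo_b, hi_b) -> Tuple[List[float], List[float]]:
    """[a, a'[ + [b, b'[ = [a + b, a' + b'[, computed coordinate by coordinate."""
    return ([x + y for x, y in zip(lo_a, lo_b)],
            [x + y for x, y in zip(hi_a, hi_b)])


def brunn_minkowski_gap(lo_a, hi_a, lo_b, hi_b) -> float:
    """Return mu(A+B)^{1/r} - (mu A)^{1/r} - (mu B)^{1/r}; 266C says this is >= 0."""
    r = len(lo_a)
    lo_s, hi_s = minkowski_sum_box(lo_a, hi_a, lo_b, hi_b)
    lhs = box_volume(lo_s, hi_s) ** (1.0 / r)
    rhs = box_volume(lo_a, hi_a) ** (1.0 / r) + box_volume(lo_b, hi_b) ** (1.0 / r)
    return lhs - rhs
```

For the boxes $[0,1[\times[0,4[$ and $[0,4[\times[0,1[$ the gap is $5-(2+2)=1$, while for two similar boxes in the same orientation the gap vanishes up to rounding, in line with 266Xc.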
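266X Basic exercises (a) Let $D$, $D'$ be subsets of $\mathbb{R}^r$. Show that (i) $\mathrm{cl}^*(D\cup D')=\mathrm{cl}^*D\cup\mathrm{cl}^*D'$ (ii) $\mathrm{cl}^*D=\mathrm{cl}^*D'$ iff $D$ and $D'$ have a common measurable envelope (iii) $\mathrm{cl}^*D\setminus\mathrm{cl}^*(\mathbb{R}^r\setminus D')\subseteq\mathrm{cl}^*(D\cap D')$ (iv) $D$ is Lebesgue measurable iff $\mathrm{cl}^*D\cap\mathrm{cl}^*(\mathbb{R}^r\setminus D)$ is Lebesgue negligible (v) $D\cup\mathrm{cl}^*D$ is a measurable envelope of $D$ (vi) $\mathrm{cl}^*(\mathrm{cl}^*D)=\mathrm{cl}^*D$.
(b) Show that, for a measurable set $E\subseteq\mathbb{R}$, $\mathrm{cl}^*E$ is just the set of real numbers which are not density points of $\mathbb{R}\setminus E$.
(c) In 266C, show that if $A$ and $B$ are similar convex sets in the same orientation then $A+B$ is a convex set similar to both and $\mu(A+B)^{1/r}=(\mu A)^{1/r}+(\mu B)^{1/r}$.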
266 Notes and comments The proof of 266C is taken from Federer 69. There is a slightly specious generality
in the form given here. If the sets $A$ and $B$ are at all irregular, then $\mu^*(A+B)^{1/r}$ is likely to be much greater than $(\mu^*A)^{1/r}+(\mu^*B)^{1/r}$. The critical case, in which $A$ and $B$ are similar convex sets, is much easier (266Xc). The theorem
is therefore most useful when A and B are non-similar convex sets and we get a non-trivial estimate which may be
hard to establish by other means. For this case we do not need 266B. Theorem 266C is an instructive example of the
way in which the dimension r enters formulae when we seek results applying to general Euclidean spaces. There will
be many more when I return to geometric measure theory in Chapter 47 of Volume 4.
Chapter 27
Probability theory
Lebesgue created his theory of integration in response to a number of problems in real analysis, and all his life seems
to have thought of it as a tool for use in geometry and calculus (Lebesgue 72, vols. 1 and 2). Remarkably, it turned
out, when suitably adapted, to provide a solid foundation for probability theory. The development of this approach
is generally associated with the name of Kolmogorov. It has so come to dominate modern abstract probability theory
that many authors ignore all other methods. I do not propose to commit myself to any view on whether $\sigma$-additive
measures are the only way to give a rigorous foundation to probability theory, or whether they are adequate to deal
with all probabilistic ideas; there are some serious philosophical questions here, since probability theory, at least in its
applied aspects, seeks to help us to understand the material world outside mathematics. But from my position as a
measure theorist, it is incontrovertible that probability theory is among the central applications of the concepts and
theorems of measure theory, and is one of the most vital sources of new ideas; and that every measure theorist must
be alert to the intuitions which probabilistic methods can provide.
I have written the preceding paragraph in terms suggesting that probability theory is somehow distinguishable from the rest of measure theory; this is another point on which I should prefer not to put forward any opinion as definitive. But undoubtedly there is a distinction, rather deeper than the elementary point that probability deals (almost) exclusively with spaces of measure 1. M.Loève argues persuasively (Loève 77, 10.2) that the essence of probability theory is the artificial nature of the probability spaces themselves. In measure theory, when we wish to integrate a function, we usually feel that we have a proper function with a domain and values. In probability theory, when we take the expectation of a random variable, the variable is an observable or the result of an experiment; we are generally uncertain, or ignorant, or indifferent concerning the factors underlying the variable. Let me give an example from the theorems below. In the proof of the Central Limit Theorem (274F), I find that I need an auxiliary list $Z_0,\ldots,Z_n$ of random variables, independent of each other and of the original sequence $X_0,\ldots,X_n$. I create such a sequence by taking a product space $\Omega\times\Omega'$, and writing $X'_i(\omega,\omega')=X_i(\omega)$, while the $Z_i$ are functions of $\omega'$. Now the difference between the $X_i$ and the $X'_i$ is of a type which a well-trained analyst would ordinarily take seriously. We do not think that the function $x\mapsto x^2:[0,1]\to[0,1]$ is the same thing as the function $(x_1,x_2)\mapsto x_1^2:[0,1]^2\to[0,1]$. But a probabilist is likely to feel that it is positively pedantic to start writing $X'_i$ instead of $X_i$. He did not believe in the space $\Omega$ in the first place, and if it turns out to be inadequate for his intuition he enlarges it without a qualm. Loève calls probability spaces 'fictions', 'inventions of the imagination' in Larousse's words; they are necessary in the models Kolmogorov has taught us to use, but we have a vast amount of freedom in choosing them, and in their essence they are nothing so definite as a set with points.
A probability space, therefore, is somehow a more shadowy entity in probability theory than it is in measure theory.
The important objects in probability theory are random variables and distributions, particularly joint distributions.
In this volume I shall deal exclusively with random variables which can be thought of as taking values in some power
of R; but this is not the central point. What is vital is that somehow the codomain, the potential set of values, of a
random variable, is much better defined than its domain. Consequently our attention is focused not on any features of the artificial space which it is convenient to use as the underlying probability space (I write 'underlying', though it is the most superficial and easily changed aspect of the model) but on the distribution on the codomain induced by
the random variable. Thus the Central Limit Theorem, which speaks only of distributions, is actually more important
in applied probability than the Strong Law of Large Numbers, which claims to tell us what a long-term average will
almost certainly be.
W.Feller (Feller 66) goes even farther than Loève, and as far as possible works entirely with distributions, setting
up machinery which enables him to go for long stretches without mentioning probability spaces at all. I make no
attempt to emulate him. But the approach is instructive and faithful to the essence of the subject.
Probability theory includes more mathematics than can easily be encompassed in a lifetime, and I have selected for
this introductory chapter the two limit theorems I have already mentioned, the Strong Law of Large Numbers and the
Central Limit Theorem, together with some material on martingales (§§275-276). They illustrate not only the special character of probability theory (so that you will be able to form your own judgement on the remarks above) but also
some of its chief contributions to pure measure theory, the concepts of independence and conditional expectation.
271 Distributions
I start this chapter with a discussion of probability distributions, the probability measures on $\mathbb{R}^n$ defined by families $(X_1,\ldots,X_n)$ of random variables. I give the basic results describing the circumstances under which two distributions are equal (271G), integration with respect to a distribution (271E), and probability density functions (271H-271K).
271A Notation I have just spent some paragraphs on an attempt to describe the essential difference between probability theory and measure theory. But there is a quicker test by which you may discover whether your author is a measure theorist or a probabilist: open any page, and look for the phrases 'measurable function' and 'random variable', and the formulae '$\int f\,d\mu$' and '$E(X)$'. The first member of each pair will enable you to diagnose measure and the second probability, with little danger of error. So far in this treatise I have firmly used measure theorists' terminology, with a few individual quirks. But in a chapter on probability theory I find that measure-theoretic notation, while perfectly adequate in a formal sense, does such violence to the familiar formulations as to render them unnatural. Moreover, you must surely at some point (if you have not already done so) become familiar with probabilists' language. So in this chapter I will make a substantial step in that direction. Happily, I think that this can be done without setting up any direct conflicts, so that I shall be able, in later volumes, to call upon this work in whichever notation then seems appropriate, without needing to re-formulate it.
(a) So let $(\Omega,\Sigma,\mu)$ be a probability space. I take the opportunity given by a new phrase to make a technical move. A real-valued random variable on $\Omega$ will be a member of $\mathcal{L}^0(\mu)$, as defined in 241A; that is, a real-valued function $X$ defined on a conegligible subset of $\Omega$ such that $X$ is measurable with respect to the completion of $\mu$, or, if you prefer, such that $X\restriction E$ is $\Sigma$-measurable for some conegligible set $E\subseteq\Omega$.$^1$
(b) If $X$ is a real-valued random variable on a probability space $(\Omega,\Sigma,\mu)$, write $E(X)=\int X\,d\mu$ if this is defined in $[-\infty,\infty]$ in the sense of Chapter 12 and §133. In this case I will call $E(X)$ the mean or expectation of $X$. Thus we may say that '$X$ has a finite expectation' in place of '$X$ is integrable'. 133A says that '$E(X+Y)=E(X)+E(Y)$ whenever $E(X)$ and $E(Y)$ and their sum are defined in $[-\infty,\infty]$', and 122P becomes 'a real-valued random variable $X$ has a finite expectation iff $E(|X|)<\infty$'.
(c) If $X$ is a real-valued random variable with finite expectation, the variance of $X$ is
$$\mathrm{Var}(X)=E(X-E(X))^2=E(X^2-2E(X)X+E(X)^2)=E(X^2)-(E(X))^2.$$
(Note that this formula shows that $E(X)^2\le E(X^2)$; compare 244Xd(i).) $\mathrm{Var}(X)$ is finite iff $E(X^2)<\infty$, that is, iff $X\in\mathcal{L}^2(\mu)$ (244A). In particular, $X+Y$ and $cX$ have finite variance whenever $X$ and $Y$ do and $c\in\mathbb{R}$.
(d) I shall allow myself to use such formulae as
$$\Pr(X>a),\qquad\Pr(X-\epsilon\le Y\le X+\delta),$$
where $X$ and $Y$ are random variables on the same probability space $(\Omega,\Sigma,\mu)$, to mean respectively
$$\hat\mu\{\omega:\omega\in\mathrm{dom}\,X,\ X(\omega)>a\},$$
$$\hat\mu\{\omega:\omega\in\mathrm{dom}\,X\cap\mathrm{dom}\,Y,\ X(\omega)-\epsilon\le Y(\omega)\le X(\omega)+\delta\},$$
writing $\hat\mu$ for the completion of $\mu$ as usual. There are two points to note here. First, $\Pr$ depends on $\hat\mu$, not on $\mu$; in effect, the notation automatically directs us to complete the probability space $(\Omega,\Sigma,\mu)$. I could, of course, equally well write
$$\Pr(X^2+Y^2>1)=\mu^*\{\omega:\omega\in\mathrm{dom}\,X\cap\mathrm{dom}\,Y,\ X(\omega)^2+Y(\omega)^2>1\},$$
taking $\mu^*$ to be the outer measure on $\Omega$ associated with $\mu$ (132B). Secondly, I will use this notation only for predicates corresponding to Borel measurable sets; that is to say, I shall write
$$\Pr(\psi(X_1,\ldots,X_n))=\hat\mu\{\omega:\omega\in\textstyle\bigcap_{i\le n}\mathrm{dom}\,X_i,\ \psi(X_1(\omega),\ldots,X_n(\omega))\}$$
only when the set
$$\{(\alpha_1,\ldots,\alpha_n):\psi(\alpha_1,\ldots,\alpha_n)\}$$
is a Borel set in $\mathbb{R}^n$. Part of the reason for this restriction will appear in the next few paragraphs; $\Pr(\psi(X_1,\ldots,X_n))$ must be something calculable from knowledge of the joint distribution of $X_1,\ldots,X_n$, as defined in 271C. In fact we can safely extend the idea to universally measurable predicates $\psi$, to be discussed in Volume 4. But it could happen that $\mu$ gave a measure to a set of the form $\{\omega:X(\omega)\in A\}$ for some exceedingly irregular set $A$, and in such a case it would be prudent to regard this as an accidental pathology of the probability space, and to treat it in a rather different way.
$^1$ For an account of how this terminology became standard, see http://www.dartmouth.edu/~chance/Doob/conversation.html.
(I see that I have rather glibly assumed that the formula above defines $\Pr(\psi(X_1,\ldots,X_n))$ for every Borel predicate $\psi$. This is a consequence of 271Bb below.)
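On a finite probability space every function is a random variable and the notation of 271A reduces to finite sums. The sketch below is an illustration under that assumption only (a fair six-sided die, chosen here as a hypothetical example); the helper names `expectation`, `variance` and `pr` are not from the text. Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
mu = {w: Fraction(1, 6) for w in omega}


def expectation(X):
    """E(X) = integral of X with respect to mu, here a finite sum."""
    return sum(X(w) * mu[w] for w in omega)


def variance(X):
    """Var(X) = E((X - E(X))^2), as in 271A(c)."""
    m = expectation(X)
    return expectation(lambda w: (X(w) - m) ** 2)


def pr(predicate):
    """Pr(predicate) = mu{w : predicate(w)}, as in 271A(d)."""
    return sum(mu[w] for w in omega if predicate(w))


def X(w):
    """The face shown."""
    return w


def Y(w):
    """1 if the face is odd, 0 otherwise."""
    return w % 2
```

With these definitions, `variance(X)` agrees with `expectation(lambda w: X(w) ** 2) - expectation(X) ** 2`, and `expectation(lambda w: X(w) + Y(w))` equals `expectation(X) + expectation(Y)`.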
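271B Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $X_1,\ldots,X_n$ real-valued random variables on $\Omega$. Set $\boldsymbol{X}(\omega)=(X_1(\omega),\ldots,X_n(\omega))$ for $\omega\in\bigcap_{i\le n}\mathrm{dom}\,X_i$.
(a) There is a unique Radon measure $\nu$ on $\mathbb{R}^n$ such that
$$\nu\,]-\infty,a]=\Pr(X_i\le\alpha_i\text{ for every }i\le n)$$
whenever $a=(\alpha_1,\ldots,\alpha_n)\in\mathbb{R}^n$, writing $]-\infty,a]$ for $\prod_{i\le n}]-\infty,\alpha_i]$;
(b) $\nu\mathbb{R}^n=1$ and $\nu E=\hat\mu(\boldsymbol{X}^{-1}[E])$ whenever $\nu E$ is defined, where $\hat\mu$ is the completion of $\mu$; in particular, $\nu E=\Pr((X_1,\ldots,X_n)\in E)$ for every Borel set $E\subseteq\mathbb{R}^n$.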
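proof Let $\hat\Sigma$ be the domain of $\hat\mu$, and set $D=\bigcap_{i\le n}\mathrm{dom}\,X_i=\mathrm{dom}\,\boldsymbol{X}$; then $D$ is conegligible, so belongs to $\hat\Sigma$. Let $\hat\mu_D=\hat\mu\restriction\mathcal{P}D$ be the subspace measure on $D$ (131B, 214B), and $\nu_0$ the image measure $\hat\mu_D\boldsymbol{X}^{-1}$ (234D); let $\mathrm{T}$ be the domain of $\nu_0$.
Write $\mathcal{B}$ for the algebra of Borel sets in $\mathbb{R}^n$. Then $\mathcal{B}\subseteq\mathrm{T}$. PPP For $i\le n$, $\alpha\in\mathbb{R}$ set $F_{i\alpha}=\{x:x\in\mathbb{R}^n,\ \xi_i\le\alpha\}$, $H_{i\alpha}=\{\omega:\omega\in\mathrm{dom}\,X_i,\ X_i(\omega)\le\alpha\}$. $X_i$ is $\hat\Sigma$-measurable and its domain is in $\hat\Sigma$, so $H_{i\alpha}\in\hat\Sigma$, and $\boldsymbol{X}^{-1}[F_{i\alpha}]=D\cap H_{i\alpha}$ is $\hat\mu_D$-measurable. Thus $F_{i\alpha}\in\mathrm{T}$. As $\mathrm{T}$ is a $\sigma$-algebra of subsets of $\mathbb{R}^n$, $\mathcal{B}\subseteq\mathrm{T}$ (121J). QQQ
Accordingly $\nu_0\restriction\mathcal{B}$ is a measure on $\mathbb{R}^n$ with domain $\mathcal{B}$; of course $\nu_0\mathbb{R}^n=\hat\mu D=1$. By 256C, the completion $\nu$ of $\nu_0\restriction\mathcal{B}$ is a Radon measure on $\mathbb{R}^n$, and $\nu\mathbb{R}^n=\nu_0\mathbb{R}^n=1$.
For $E\in\mathcal{B}$,
$$\nu E=\nu_0E=\hat\mu_D\boldsymbol{X}^{-1}[E]=\hat\mu\boldsymbol{X}^{-1}[E]=\Pr((X_1,\ldots,X_n)\in E).$$
More generally, if $E\in\mathrm{dom}\,\nu$, then there are Borel sets $E'$, $E''$ such that $E'\subseteq E\subseteq E''$ and $\nu(E''\setminus E')=0$, so that $\boldsymbol{X}^{-1}[E']\subseteq\boldsymbol{X}^{-1}[E]\subseteq\boldsymbol{X}^{-1}[E'']$ and $\hat\mu(\boldsymbol{X}^{-1}[E'']\setminus\boldsymbol{X}^{-1}[E'])=0$. This means that $\boldsymbol{X}^{-1}[E]\in\hat\Sigma$ and
$$\hat\mu\boldsymbol{X}^{-1}[E]=\hat\mu\boldsymbol{X}^{-1}[E']=\nu E'=\nu E.$$
As for the uniqueness of $\nu$, if $\nu'$ is any Radon measure on $\mathbb{R}^n$ such that $\nu'\,]-\infty,a]=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $a\in\mathbb{R}^n$, then surely
$$\nu'\mathbb{R}^n=\lim_{k\to\infty}\nu'\,]-\infty,k\mathbf{1}]=\lim_{k\to\infty}\nu\,]-\infty,k\mathbf{1}]=1=\nu\mathbb{R}^n.$$
Also $\mathcal{J}=\{]-\infty,a]:a\in\mathbb{R}^n\}$ is closed under finite intersections, and $\nu$ and $\nu'$ agree on $\mathcal{J}$. By the Monotone Class Theorem (or rather, its corollary 136C), $\nu$ and $\nu'$ agree on the $\sigma$-algebra generated by $\mathcal{J}$, which is $\mathcal{B}$ (121J), and are identical (256D).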
271C Definition Let $(\Omega,\Sigma,\mu)$ be a probability space and $X_1,\ldots,X_n$ real-valued random variables on $\Omega$. By the (joint) distribution or law $\nu_{\boldsymbol{X}}$ of the family $\boldsymbol{X}=(X_1,\ldots,X_n)$ I shall mean the Radon probability measure of 271B. If we think of $\boldsymbol{X}$ as a function from $\bigcap_{i\le n}\mathrm{dom}\,X_i$ to $\mathbb{R}^n$, then $\nu_{\boldsymbol{X}}E=\Pr(\boldsymbol{X}\in E)$ for every Borel set $E\subseteq\mathbb{R}^n$.
271D Remarks (a) The choice of the Radon probability measure $\nu_{\boldsymbol{X}}$ as 'the' distribution of $\boldsymbol{X}$, with the insistence that Radon measures should be complete, is of course somewhat arbitrary. Apart from the general principle that one should always complete measures, these conventions fit better with some of the work in Volume 4 and with such results as 272G below.
(b) Observe that in order to speak of the distribution of a family $\boldsymbol{X}=(X_1,\ldots,X_n)$ of random variables, it is essential that all the $X_i$ should be based on the same probability space.
(c) I see that the language I have chosen allows the $X_i$ to have different domains, so that the family $(X_1,\ldots,X_n)$ may not be exactly identifiable with the corresponding function from $\bigcap_{i\le n}\mathrm{dom}\,X_i$ to $\mathbb{R}^n$. I hope however that using the same symbol $\boldsymbol{X}$ for both will cause no confusion.
(d) It is not useful to think of the whole image measure $\nu_0=\hat\mu_D\boldsymbol{X}^{-1}$ in the proof of 271B as the distribution of $\boldsymbol{X}$, unless it happens to be equal to $\nu=\nu_{\boldsymbol{X}}$. The distribution of a random variable is exactly that aspect of it which can be divorced from any consideration of the underlying space $(\Omega,\Sigma,\mu)$, and the point of such results as 271K and 272G is that distributions can be calculated from each other, without going back to the relatively fluid and uncertain model of a random variable in terms of a function on a probability space.
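(e) If $\boldsymbol{X}=(X_1,\ldots,X_n)$ and $\boldsymbol{Y}=(Y_1,\ldots,Y_n)$ are such that $X_i=_{\text{a.e.}}Y_i$ for each $i$, then
$$\{\omega:\omega\in\textstyle\bigcap_{i\le n}\mathrm{dom}\,X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\}\ \triangle\ \{\omega:\omega\in\textstyle\bigcap_{i\le n}\mathrm{dom}\,Y_i,\ Y_i(\omega)\le\alpha_i\ \forall\,i\le n\}$$
is negligible, so
$$\Pr(X_i\le\alpha_i\ \forall\,i\le n)=\hat\mu\{\omega:\omega\in\textstyle\bigcap_{i\le n}\mathrm{dom}\,X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\}=\Pr(Y_i\le\alpha_i\ \forall\,i\le n)$$
for all $\alpha_0,\ldots,\alpha_n\in\mathbb{R}$, and $\nu_{\boldsymbol{X}}=\nu_{\boldsymbol{Y}}$. This means that we can, if we wish, think of a distribution as a measure $\nu_{\boldsymbol{u}}$ where $\boldsymbol{u}=(u_0,\ldots,u_n)$ is a finite sequence in $L^0(\mu)$. In the present chapter I shall not emphasize this approach, but it will always be at the back of my mind.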
271E Measurable functions of random variables: Proposition Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a family of random variables (as always in such a context, I mean them all to be on the same probability space $(\Omega,\Sigma,\mu)$); write $\mathrm{T}_{\boldsymbol{X}}$ for the domain of the distribution $\nu_{\boldsymbol{X}}$, and let $h$ be a $\mathrm{T}_{\boldsymbol{X}}$-measurable real-valued function defined $\nu_{\boldsymbol{X}}$-a.e. on $\mathbb{R}^n$. Then we have a random variable $Y=h(X_1,\ldots,X_n)$ defined by setting
$$h(X_1,\ldots,X_n)(\omega)=h(X_1(\omega),\ldots,X_n(\omega))\text{ for every }\omega\in\boldsymbol{X}^{-1}[\mathrm{dom}\,h].$$
The distribution $\nu_Y$ of $Y$ is the measure on $\mathbb{R}$ defined by the formula
$$\nu_YF=\nu_{\boldsymbol{X}}h^{-1}[F]$$
for just those sets $F\subseteq\mathbb{R}$ such that $h^{-1}[F]\in\mathrm{T}_{\boldsymbol{X}}$. Also
$$E(Y)=\int h\,d\nu_{\boldsymbol{X}}$$
in the sense that if one of these exists in $[-\infty,\infty]$, so does the other, and they are then equal.
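proof (a)(i) Once again, write $(\Omega,\hat\Sigma,\hat\mu)$ for the completion of $(\Omega,\Sigma,\mu)$. Since
$$\Omega\setminus\mathrm{dom}\,Y\subseteq\bigcup_{i\le n}(\Omega\setminus\mathrm{dom}\,X_i)\cup\boldsymbol{X}^{-1}[\mathbb{R}^n\setminus\mathrm{dom}\,h]$$
is negligible (using 271Bb), $\mathrm{dom}\,Y$ is conegligible. If $a\in\mathbb{R}$, then
$$E=\{x:x\in\mathrm{dom}\,h,\ h(x)\le a\}\in\mathrm{T}_{\boldsymbol{X}},$$
so
$$\{\omega:\omega\in\Omega,\ Y(\omega)\le a\}=\boldsymbol{X}^{-1}[E]\in\hat\Sigma.$$
As $a$ is arbitrary, $Y$ is $\hat\Sigma$-measurable, and is a random variable.
(ii) Let $\tilde h:\mathbb{R}^n\to\mathbb{R}$ be any extension of $h$ to the whole of $\mathbb{R}^n$. Then $\tilde h$ is $\mathrm{T}_{\boldsymbol{X}}$-measurable, so the ordinary image measure $\nu_{\boldsymbol{X}}\tilde h^{-1}$, defined on $\{F:\tilde h^{-1}[F]\in\mathrm{dom}\,\nu_{\boldsymbol{X}}\}$, is a Radon probability measure on $\mathbb{R}$ (256G). But for any $A\subseteq\mathbb{R}$,
$$\tilde h^{-1}[A]\triangle h^{-1}[A]\subseteq\mathbb{R}^n\setminus\mathrm{dom}\,h$$
is $\nu_{\boldsymbol{X}}$-negligible, so $\nu_{\boldsymbol{X}}h^{-1}[F]=\nu_{\boldsymbol{X}}\tilde h^{-1}[F]$ if either is defined.
If $F\subseteq\mathbb{R}$ is a Borel set, then
$$\nu_YF=\hat\mu\{\omega:Y(\omega)\in F\}=\hat\mu(\boldsymbol{X}^{-1}[h^{-1}[F]])=\nu_{\boldsymbol{X}}(h^{-1}[F]).$$
So $\nu_Y$ and $\nu_{\boldsymbol{X}}\tilde h^{-1}$ agree on the Borel sets and are equal (256D again).
(b) Now apply Theorem 235E to the measures $\hat\mu$ and $\nu_{\boldsymbol{X}}$ and the function $\phi=\boldsymbol{X}$. We have
$$\int\chi(\boldsymbol{X}^{-1}[F])\,d\hat\mu=\hat\mu(\boldsymbol{X}^{-1}[F])=\nu_{\boldsymbol{X}}F$$
for every $F\in\mathrm{T}_{\boldsymbol{X}}$, by 271Bb. Because $h$ is $\nu_{\boldsymbol{X}}$-virtually measurable and defined $\nu_{\boldsymbol{X}}$-a.e., 235Eb tells us that
$$\int h(\boldsymbol{X})\,d\mu=\int h(\boldsymbol{X})\,d\hat\mu=\int h\,d\nu_{\boldsymbol{X}}$$
whenever either side is defined in $[-\infty,\infty]$, which is exactly the result we need.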
271F Corollary If $X$ is a single random variable with distribution $\nu_X$, then
$$E(X)=\int_{-\infty}^{\infty}x\,\nu_X(dx)$$
if either is defined in $[-\infty,\infty]$. Similarly
$$E(X^2)=\int_{-\infty}^{\infty}x^2\,\nu_X(dx)$$
(whatever $X$ may be). If $X$, $Y$ are two random variables (on the same probability space!) then we have
$$E(X\times Y)=\int xy\,\nu_{(X,Y)}d(x,y)$$
if either side is defined in $[-\infty,\infty]$.
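For a finite probability space the content of 271E and 271F is that an expectation may be computed either on the underlying space or against the distribution on the codomain. The following sketch illustrates this under purely hypothetical choices (two fair coin tosses, with $X$ the number of heads); the function names are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical underlying space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]
mu = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")


def distribution(X):
    """Image measure of mu under X, recorded as {value: mass}."""
    nu = defaultdict(Fraction)
    for w in omega:
        nu[X(w)] += mu[w]
    return dict(nu)


def expectation_on_omega(Y):
    """E(Y) computed on the underlying space."""
    return sum(Y(w) * mu[w] for w in omega)


def integral_against_distribution(h, nu):
    """Integral of h with respect to the distribution nu."""
    return sum(h(x) * mass for x, mass in nu.items())
```

Here `distribution(X)` puts mass 1/4, 1/2, 1/4 on the values 0, 1, 2, and for $h(x)=x^2$ both `expectation_on_omega(lambda w: X(w) ** 2)` and `integral_against_distribution(lambda x: x ** 2, distribution(X))` give 3/2.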
271G Distribution functions (a) If $X$ is a real-valued random variable, its distribution function is the function $F_X:\mathbb{R}\to[0,1]$ defined by setting
$$F_X(a)=\Pr(X\le a)=\nu_X\,]-\infty,a]$$
for every $a\in\mathbb{R}$. (Warning! some authors prefer $F_X(a)=\Pr(X<a)$.) Observe that $F_X$ is non-decreasing, that $\lim_{a\to-\infty}F_X(a)=0$, that $\lim_{a\to\infty}F_X(a)=1$ and that $\lim_{x\downarrow a}F_X(x)=F_X(a)$ for every $a\in\mathbb{R}$. By 271Ba, $X$ and $Y$ have the same distribution iff $F_X=F_Y$.
(b) If $X_1,\ldots,X_n$ are real-valued random variables on the same probability space, their (joint) distribution function is the function $F_{\boldsymbol{X}}:\mathbb{R}^n\to[0,1]$ defined by writing
$$F_{\boldsymbol{X}}(a)=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$$
whenever $a=(\alpha_1,\ldots,\alpha_n)\in\mathbb{R}^n$. If $\boldsymbol{X}$ and $\boldsymbol{Y}$ have the same distribution function, they have the same distribution, by the $n$-dimensional version of 271B.
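The two conventions mentioned in the warning differ exactly at the atoms of the distribution. The sketch below illustrates this for a hypothetical discrete distribution given as a dictionary of point masses; the names `nu`, `F` and `F_strict` are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical distribution of a random variable taking the values 0, 1, 2.
nu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def F(a):
    """Distribution function in the convention of 271G: Pr(X <= a)."""
    return sum(mass for x, mass in nu.items() if x <= a)


def F_strict(a):
    """The alternative convention some authors prefer: Pr(X < a)."""
    return sum(mass for x, mass in nu.items() if x < a)
```

With this distribution `F(1)` is 3/4 while `F_strict(1)` is 1/4; the difference is the mass of the atom at 1. `F` is constant on $[1,2[$, which reflects its continuity on the right.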
271H Densities Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a family of random variables, all defined on the same probability space. A density function for $(X_1,\ldots,X_n)$ is a Radon-Nikodým derivative, with respect to Lebesgue measure, for the distribution $\nu_{\boldsymbol{X}}$; that is, a non-negative function $f$, integrable with respect to Lebesgue measure $\mu_L$ on $\mathbb{R}^n$, such that
$$\int_Ef\,d\mu_L=\nu_{\boldsymbol{X}}E=\Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$ (256J), if there is such a function, of course.
271I Proposition Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a family of random variables, all defined on the same probability space. Write $\mu_L$ for Lebesgue measure on $\mathbb{R}^n$.
(a) There is a density function for $\boldsymbol{X}$ iff $\Pr(\boldsymbol{X}\in E)=0$ for every Borel set $E$ such that $\mu_LE=0$.
(b) A non-negative Lebesgue integrable function $f$ is a density function for $\boldsymbol{X}$ iff $\int_{]-\infty,a]}f\,d\mu_L=\Pr(\boldsymbol{X}\in{]-\infty,a]})$ for every $a\in\mathbb{R}^n$.
(c) Suppose that $f$ is a density function for $\boldsymbol{X}$, and $G=\{x:f(x)>0\}$. Then if $h$ is a Lebesgue measurable real-valued function defined almost everywhere in $G$,
$$E(h(\boldsymbol{X}))=\int h\,d\nu_{\boldsymbol{X}}=\int h\times f\,d\mu_L$$
if any of the three integrals is defined in $[-\infty,\infty]$, interpreting $(h\times f)(x)$ as $0$ if $f(x)=0$ and $x\notin\mathrm{dom}\,h$.
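proof (a) Apply 256J to the Radon probability measure $\nu_{\boldsymbol{X}}$.
(b) Of course the condition is necessary. If it is satisfied, then (by B.Levi's theorem)
$$\int f\,d\mu_L=\lim_{k\to\infty}\int_{]-\infty,k\mathbf{1}]}f\,d\mu_L=\lim_{k\to\infty}\nu_{\boldsymbol{X}}\,]-\infty,k\mathbf{1}]=1.$$
So we have a Radon probability measure $\nu$ defined by writing
$$\nu E=\int_Ef\,d\mu_L$$
whenever $E\cap\{x:f(x)>0\}$ is Lebesgue measurable (256E). We are supposing that $\nu\,]-\infty,a]=\nu_{\boldsymbol{X}}\,]-\infty,a]$ for every $a\in\mathbb{R}^n$; by 271Ba, as usual, $\nu=\nu_{\boldsymbol{X}}$, so
$$\int_Ef\,d\mu_L=\nu E=\nu_{\boldsymbol{X}}E=\Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$, and $f$ is a density function for $\boldsymbol{X}$.
(c) By 256E, $\nu_{\boldsymbol{X}}$ is the indefinite-integral measure over $\mu_L$ associated with $f$. So, writing $G=\{x:f(x)>0\}$, we have
$$\int h\,d\nu_{\boldsymbol{X}}=\int h\times f\,d\mu_L$$
whenever either is defined in $[-\infty,\infty]$ (235K). By 234La, $h$ is $\mathrm{T}_{\boldsymbol{X}}$-measurable and defined $\nu_{\boldsymbol{X}}$-almost everywhere, where $\mathrm{T}_{\boldsymbol{X}}=\mathrm{dom}\,\nu_{\boldsymbol{X}}$, so $E(h(\boldsymbol{X}))=\int h\,d\nu_{\boldsymbol{X}}$ by 271E.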
271J The machinery developed in §263 is sufficient to give a very general result on the densities of random variables of the form $\phi(\boldsymbol{X})$, as follows.
Theorem Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a family of random variables, and $D\subseteq\mathbb{R}^n$ a Borel set such that $\Pr(\boldsymbol{X}\in D)=1$. Let $\phi:D\to\mathbb{R}^n$ be a function which is differentiable relative to its domain everywhere in $D$; for $x\in D$, let $T(x)$ be a derivative of $\phi$ at $x$, and set $J(x)=|\det T(x)|$. Suppose that $J(x)\ne0$ for each $x\in D$, and that $\boldsymbol{X}$ has a density function $f$; and suppose moreover that $\langle D_k\rangle_{k\in\mathbb{N}}$ is a disjoint sequence of Borel sets, with union $D$, such that $\phi\restriction D_k$ is injective for every $k$. Then $\phi(\boldsymbol{X})$ has a density function $g=\sum_{k=0}^{\infty}g_k$ where
$$g_k(y)=\frac{f((\phi\restriction D_k)^{-1}(y))}{J((\phi\restriction D_k)^{-1}(y))}\text{ for }y\in\phi[D_k\cap\mathrm{dom}\,f],$$
$$g_k(y)=0\text{ for }y\in\mathbb{R}^n\setminus\phi[D_k].$$
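proof By 262Ia, $\phi$ is continuous, therefore Borel measurable, so $\phi(\boldsymbol{X})$ is a random variable.
For the moment, fix $k\in\mathbb{N}$ and a Borel set $F\subseteq\mathbb{R}^n$. By 263D(iii), $\phi[D_k]$ is measurable, and by 263D(ii) $\phi[D_k\setminus\mathrm{dom}\,f]$ is negligible. The function $g_k$ is such that $f(x)=J(x)g_k(\phi(x))$ for every $x\in D_k\cap\mathrm{dom}\,f$, so by 263D(v) we have, writing $\mu_L$ for Lebesgue measure on $\mathbb{R}^n$,
$$\begin{aligned}
\int_Fg_k\,d\mu_L&=\int_{\phi[D_k]}g_k\times\chi F\,d\mu_L=\int_{D_k}J(x)g_k(\phi(x))\chi F(\phi(x))\,\mu_L(dx)\\
&=\int_{D_k\cap\phi^{-1}[F]}f\,d\mu_L=\Pr(\boldsymbol{X}\in D_k\cap\phi^{-1}[F]).
\end{aligned}$$
(The integral $\int_{\phi[D_k]}g_k\times\chi F$ is defined because $\int_{D_k}J\times(g_k\times\chi F)\phi$ is defined, and the integral $\int g_k\times\chi F$ is defined because $\phi[D_k]$ is measurable and $g_k$ is zero off $\phi[D_k]$.)
Now sum over $k$. Every $g_k$ is non-negative, so by B.Levi's theorem (123A, 123Xa)
$$\int_Fg\,d\mu_L=\sum_{k=0}^{\infty}\int_Fg_k\,d\mu_L=\sum_{k=0}^{\infty}\Pr(\boldsymbol{X}\in D_k\cap\phi^{-1}[F])=\Pr(\boldsymbol{X}\in\phi^{-1}[F])=\Pr(\phi(\boldsymbol{X})\in F).$$
As $F$ is arbitrary, $g$ is a density function for $\phi(\boldsymbol{X})$, as claimed.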
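A one-dimensional instance of 271J is the map $\phi(x)=x^2$ on $D=\mathbb{R}\setminus\{0\}$, which is injective on each of the two pieces $D_0={]-\infty,0[}$ and $D_1={]0,\infty[}$, with $J(x)=2|x|$. The sketch below assumes, purely for illustration, that $X$ is standard normal, and compares the density assembled from the two pieces with the familiar closed form $e^{-y/2}/\sqrt{2\pi y}$ for the square of a standard normal variable. The function names are hypothetical.

```python
import math


def f(x: float) -> float:
    """Density of a standard normal variable (an assumed example for X)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def density_of_square(y: float) -> float:
    """g = g_0 + g_1 for phi(x) = x^2, following the formula of 271J."""
    if y <= 0.0:
        return 0.0  # y lies outside phi[D_0] = phi[D_1] = ]0, oo[
    s = math.sqrt(y)
    jacobian = 2.0 * s          # J(x) = |phi'(x)| = 2|x| at x = -s and x = s
    g0 = f(-s) / jacobian       # contribution of the piece D_0 = ]-oo, 0[
    g1 = f(s) / jacobian        # contribution of the piece D_1 = ]0, oo[
    return g0 + g1


def chi_square_one(y: float) -> float:
    """Closed-form density of the square of a standard normal variable, y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)
```

The two functions should agree for every $y>0$ up to rounding, which is also the content of exercise 271Xc(ii) in this special case.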
271K The application of the last theorem to ordinary transformations is sometimes indirect, so I give an example.
Proposition Let $X$, $Y$ be two random variables with a joint density function $f$. Then $X\times Y$ has a density function $h$, where
$$h(u)=\int_{-\infty}^{\infty}\frac{1}{|v|}f\Bigl(\frac uv,v\Bigr)dv$$
whenever this is defined in $\mathbb{R}$.
proof Set $\phi(x,y)=(xy,y)$ for $(x,y)\in\mathbb{R}^2$. Then $\phi$ is differentiable, with derivative $T(x,y)=\begin{pmatrix}y&x\\0&1\end{pmatrix}$, so $J(x,y)=|\det T(x,y)|=|y|$. Set $D=\{(x,y):y\ne0\}$; then $D$ is a conegligible Borel set in $\mathbb{R}^2$ and $\phi\restriction D$ is injective. Now $\phi[D]=D$ and $\phi^{-1}(u,v)=(\frac uv,v)$ for $v\ne0$. So $\phi(X,Y)=(X\times Y,Y)$ has a density function $g$, where
$$g(u,v)=\frac{f(u/v,v)}{|v|}\text{ if }v\ne0.$$
To find a density function for $X\times Y$, we calculate
$$\Pr(X\times Y\le a)=\int_{]-\infty,a]\times\mathbb{R}}g=\int_{-\infty}^a\int_{-\infty}^{\infty}g(u,v)\,dv\,du=\int_{-\infty}^ah$$
by Fubini's theorem (252B, 252C). In particular, $h$ is defined and finite almost everywhere; and by 271Ib it is a density function for $X\times Y$.
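The formula of 271K can be tried on a case with a known answer. The sketch below assumes, as a hypothetical example, that $X$ and $Y$ are independent and uniformly distributed on $]0,1[$, so that $f$ is the indicator of the open unit square; then $h(u)=\int_u^1\frac1v\,dv=-\ln u$ for $0<u<1$ and $h(u)=0$ for $u\ge1$. The integral is approximated by a midpoint rule over $0<v<1$ only, which suffices here because this particular $f$ vanishes elsewhere.

```python
import math


def f(x: float, y: float) -> float:
    """Joint density of two independent uniform(0,1) variables (assumed example)."""
    return 1.0 if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0


def product_density(u: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of h(u) = integral of f(u/v, v)/|v| over 0 < v < 1."""
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step        # midpoint of the k-th subinterval, never zero
        total += f(u / v, v) / abs(v)
    return total * step
```

For instance `product_density(0.5)` should be close to $\ln2$, and `product_density(1.5)` is exactly `0.0` because the integrand vanishes identically.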
*271L When a random variable is presented as the limit of a sequence of random variables the following can be very useful.
Proposition Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables converging in measure to a random variable $X$ (definition: 245A). Writing $F_{X_n}$, $F_X$ for the distribution functions of $X_n$, $X$ respectively,
$$F_X(a)=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$$
for every $a\in\mathbb{R}$.
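proof Set $\gamma=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)$, $\gamma'=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$.
(a) $F_X(a)\le\gamma$. PPP Take any $b>a$ and $\epsilon>0$. Then there is an $n_0\in\mathbb{N}$ such that $\Pr(|X_n-X|\ge b-a)\le\epsilon$ for every $n\ge n_0$ (245F). Now, for $n\ge n_0$,
$$F_X(a)=\Pr(X\le a)\le\Pr(X_n\le b)+\Pr(X_n-X\ge b-a)\le F_{X_n}(b)+\epsilon.$$
So $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)+\epsilon$; as $\epsilon$ is arbitrary, $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)$; as $b$ is arbitrary, $F_X(a)\le\gamma$. QQQ
(b) $\gamma'\le F_X(a)$. PPP Let $\epsilon>0$. Then there is a $\delta>0$ such that $F_X(a+2\delta)\le F_X(a)+\epsilon$ (271Ga). Next, there is an $n_0\in\mathbb{N}$ such that $\Pr(|X_n-X|\ge\delta)\le\epsilon$ for every $n\ge n_0$. In this case, for $n\ge n_0$,
$$\begin{aligned}
F_{X_n}(a+\delta)&=\Pr(X_n\le a+\delta)\le\Pr(X\le a+2\delta)+\Pr(X-X_n\ge\delta)\\
&\le F_X(a+2\delta)+\epsilon\le F_X(a)+2\epsilon.
\end{aligned}$$
Accordingly
$$\gamma'\le\limsup_{n\to\infty}F_{X_n}(a+\delta)\le F_X(a)+2\epsilon.$$
As $\epsilon$ is arbitrary, $\gamma'\le F_X(a)$. QQQ
(c) Since of course $\gamma\le\gamma'$, we must have $F_X(a)=\gamma=\gamma'$, as claimed.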
271X Basic exercises >(a) Let $X$ be a real-valued random variable with finite expectation, and $\epsilon>0$. Show that $\Pr(|X-E(X)|\ge\epsilon)\le\frac{1}{\epsilon^2}\mathrm{Var}(X)$. (This is Chebyshev's inequality.)
>(b) Let $F:\mathbb{R}\to[0,1]$ be a non-decreasing function such that (i) $\lim_{a\to-\infty}F(a)=0$ (ii) $\lim_{a\to\infty}F(a)=1$ (iii) $\lim_{x\downarrow a}F(x)=F(a)$ for every $a\in\mathbb{R}$. Show that there is a unique Radon probability measure $\nu$ in $\mathbb{R}$ such that $F(a)=\nu\,]-\infty,a]$ for every $a\in\mathbb{R}$. (Hint: 114Xa.) Hence show that $F$ is the distribution function of some random variable.
>(c) Let $X$ be a real-valued random variable with a density function $f$. (i) Show that $|X|$ has a density function $g_1$ where $g_1(x)=f(x)+f(-x)$ whenever $x\ge0$ and $f(x)$, $f(-x)$ are both defined, $0$ otherwise. (ii) Show that $X^2$ has a density function $g_2$ where $g_2(x)=(f(\sqrt x)+f(-\sqrt x))/2\sqrt x$ whenever $x>0$ and this is defined, $0$ for other $x$. (iii) Show that if $\Pr(X=0)=0$ then $1/X$ has a density function $g_3$ where $g_3(x)=\frac{1}{x^2}f(\frac1x)$ whenever this is defined. (iv) Show that if $\Pr(X<0)=0$ then $\sqrt X$ has a density function $g_4$ where $g_4(x)=2xf(x^2)$ if $x\ge0$ and $f(x^2)$ is defined, $0$ otherwise.
>(d) Let $X$ and $Y$ be random variables with a joint density function $f:\mathbb{R}^2\to\mathbb{R}$. Show that $X+Y$ has a density function $h$ where $h(u)=\int f(u-v,v)\,dv$ for almost every $u$.
(e) Let $X$, $Y$ be random variables with a joint density function $f:\mathbb{R}^2\to\mathbb{R}$. Show that $X/Y$ has a density function $h$ where $h(u)=\int|v|f(uv,v)\,dv$ for almost every $u$.
(f) Devise an alternative proof of 271K by using Fubini's theorem and one-dimensional substitutions to show that
$$\int_a^b\int_{-\infty}^{\infty}\frac{1}{|v|}f\Bigl(\frac uv,v\Bigr)dv\,du=\int_{\{(u,v):a\le uv\le b\}}f$$
whenever $a\le b$ in $\mathbb{R}$.
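271Y Further exercises (a) Let $\mathfrak{T}$ be the topology of $\mathbb{R}^{\mathbb{N}}$ and $\mathcal{B}$ the $\sigma$-algebra of Borel sets (256Ye). (i) Let $\mathcal{J}$ be the family of sets of the form
$$\{x:x\in\mathbb{R}^{\mathbb{N}},\ x(i)\le\alpha_i\ \forall\,i\le n\},$$
where $n\in\mathbb{N}$ and $\alpha_i\in\mathbb{R}$ for each $i\le n$. Show that $\mathcal{B}$ is the smallest family of subsets of $\mathbb{R}^{\mathbb{N}}$ such that ($\alpha$) $\mathcal{J}\subseteq\mathcal{B}$ ($\beta$) $B\setminus A\in\mathcal{B}$ whenever $A$, $B\in\mathcal{B}$ and $A\subseteq B$ ($\gamma$) $\bigcup_{k\in\mathbb{N}}A_k\in\mathcal{B}$ for every non-decreasing sequence $\langle A_k\rangle_{k\in\mathbb{N}}$ in $\mathcal{B}$. (ii) Show that if $\nu$, $\nu'$ are two totally finite measures defined on $\mathbb{R}^{\mathbb{N}}$, and $\nu F$ and $\nu'F$ are defined and equal for every $F\in\mathcal{J}$, then $\nu E$ and $\nu'E$ are defined and equal for every $E\in\mathcal{B}$. (iii) Show that if $\Omega$ is a set and $\Sigma$ a $\sigma$-algebra of subsets of $\Omega$ and $X:\Omega\to\mathbb{R}^{\mathbb{N}}$ is a function, then $X^{-1}[E]\in\Sigma$ for every $E\in\mathcal{B}$ iff $\pi_iX$ is $\Sigma$-measurable for every $i\in\mathbb{N}$, where $\pi_i(x)=x(i)$ for each $x\in\mathbb{R}^{\mathbb{N}}$, $i\in\mathbb{N}$. (iv) Show that if $\boldsymbol{X}=\langle X_i\rangle_{i\in\mathbb{N}}$ is a sequence of real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, then there is a unique probability measure $\nu^{\mathcal{B}}_{\boldsymbol{X}}$, with domain $\mathcal{B}$, such that $\nu^{\mathcal{B}}_{\boldsymbol{X}}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $\alpha_0,\ldots,\alpha_n\in\mathbb{R}$. (v) Under the conditions of (iv), show that there is a unique Radon measure $\nu_{\boldsymbol{X}}$ on $\mathbb{R}^{\mathbb{N}}$ (in the sense of 256Ye) such that $\nu_{\boldsymbol{X}}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $\alpha_0,\ldots,\alpha_n\in\mathbb{R}$.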
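(b) Let $F:\mathbb{R}^2\to[0,1]$ be a function. Show that the following are equiveridical: (i) $F$ is the distribution function of some pair $(X_1,X_2)$ of random variables (ii) there is a probability measure $\nu$ on $\mathbb{R}^2$ such that $\nu\,]-\infty,a]=F(a)$ for every $a\in\mathbb{R}^2$ (iii)($\alpha$) $F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)\ge F(\alpha_1,\beta_2)+F(\beta_1,\alpha_2)$ whenever $\alpha_1\le\beta_1$ and $\alpha_2\le\beta_2$ ($\beta$) $F(\alpha_1,\alpha_2)=\lim_{\xi_1\downarrow\alpha_1,\,\xi_2\downarrow\alpha_2}F(\xi_1,\xi_2)$ for every $\alpha_1$, $\alpha_2$ ($\gamma$) $\lim_{\alpha\to-\infty}F(\alpha,\beta)=\lim_{\alpha\to-\infty}F(\beta,\alpha)=0$ for all $\beta$ ($\delta$) $\lim_{\alpha\to\infty}F(\alpha,\alpha)=1$. (Hint: for non-empty half-open intervals $]a,b]$, set $\lambda\,]a,b]=F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)-F(\alpha_1,\beta_2)-F(\beta_1,\alpha_2)$, and continue as in 115B-115F.)
(c) Generalize (b) to higher dimensions, finding a suitable formula to stand in place of that in (iii-$\alpha$) of (b).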
(d) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\mathcal{F}$ a filter on $L^0(\mu)$ converging to $X_0\in L^0(\mu)$ for the topology of convergence in measure. Show that, writing $F_X$ for the distribution function of $X\in L^0(\mu)$,
$$F_{X_0}(a)=\inf_{b>a}\liminf_{X\to\mathcal{F}}F_X(b)=\inf_{b>a}\limsup_{X\to\mathcal{F}}F_X(b)$$
for every $a\in\mathbb{R}$.
(e) Let $X$, $Y$ be non-negative random variables with the same distribution, and $h:[0,\infty[\ \to[0,\infty[$ a non-decreasing function. Show that $\mathbb{E}(X\times hY)\le\mathbb{E}(Y\times hY)$. (Hint: in the language of 252Yo, $(Y\times hY)^*=Y^*\times(hY)^*$.)
271 Notes and comments Most of this section seems to have been taken up with technicalities. This is perhaps unsurprising in view of the fact that it is devoted to the relationship between a vector random variable $\mathbf{X}$ and the associated distribution $\nu_{\mathbf{X}}$, and this necessarily leads us into the minefield which I attempted to chart in §235. Indeed, I call on results from §235 twice; once in 271E, with $\phi(\omega)=\mathbf{X}(\omega)$ and $J(\omega)=1$, and once in 271I, with $\phi(x)=x$ and $J(x)=f(x)$.
Distribution functions of one-dimensional random variables are easily characterized (271Xb); in higher dimensions we have to work harder (271Yb-271Yc). Distributions, rather than distribution functions, can be described for infinite sequences of random variables (271Ya); indeed, these ideas can be extended to uncountable families, but this requires proper topological measure theory, and belongs in Volume 4.
The statement of 271J is lengthy, not to say cumbersome. The point is that many of the most important transformations $\phi$ are not themselves injective, but can easily be dissected into injective fragments (see, for instance, 271Xc and 263Xd). The point of 271K is that we frequently wish to apply the ideas here to transformations which are singular, and indeed change the dimension of the random variable. I have not given the theorems which make such applications routine and suggest rather that you seek out tricks such as that used in the proof of 271K, which in any case are necessary if you want amenable formulae. Of course other methods are available (271Xf).
272 Independence
I introduce the concept of independence for families of events, $\sigma$-algebras and random variables. The first part of the section, down to 272G, amounts to an analysis of the elementary relationships between the three manifestations of the idea. In 272G I give the fundamental result that the joint distribution of a (finite) independent family of random variables is just the product of the individual distributions. Further expressions of the connexion between independence and product measures are in 272J, 272M and 272N. I give a version of the zero-one law (272O), and I end the section with a group of basic results from probability theory concerning sums and products of independent random variables (272R-272W).
272A Definitions Let $(\Omega,\Sigma,\mu)$ be a probability space.
(a) A family $\langle E_i\rangle_{i\in I}$ in $\Sigma$ is (stochastically) independent if
$$\mu(E_{i_1}\cap E_{i_2}\cap\ldots\cap E_{i_n})=\prod_{j=1}^n\mu E_{i_j}$$
whenever $i_1,\ldots,i_n$ are distinct members of $I$.
(b) A family $\langle\Sigma_i\rangle_{i\in I}$ of $\sigma$-subalgebras of $\Sigma$ is (stochastically) independent if
$$\mu(E_1\cap E_2\cap\ldots\cap E_n)=\prod_{j=1}^n\mu E_j$$
whenever $i_1,\ldots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for every $j\le n$.
(c) A family $\langle X_i\rangle_{i\in I}$ of real-valued random variables on $\Omega$ is (stochastically) independent if
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j)$$
whenever $i_1,\ldots,i_n$ are distinct members of $I$ and $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$.
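The product condition for families of events can be checked by brute force on a finite probability space. The following is only an illustrative sketch under assumptions of my own choosing (two fair coin tosses, and the hypothetical helper names `mu` and `is_independent`); it shows three events which are independent in pairs but not independent as a family of three, so that the condition really must be tested on every finite subfamily.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed toy space: two fair coin tosses, each outcome of mass 1/4.
OMEGA = list(product("HT", repeat=2))
MASS = {w: Fraction(1, 4) for w in OMEGA}

def mu(event):
    """Measure of an event, given as a set of outcomes."""
    return sum((MASS[w] for w in event), Fraction(0))

def is_independent(events):
    """Test the product rule on every subfamily with at least two members."""
    for n in range(2, len(events) + 1):
        for sub in combinations(events, n):
            intersection = set(OMEGA).intersection(*sub)
            expected = Fraction(1)
            for e in sub:
                expected *= mu(e)
            if mu(intersection) != expected:
                return False
    return True

A = {w for w in OMEGA if w[0] == "H"}    # first toss is heads
B = {w for w in OMEGA if w[1] == "H"}    # second toss is heads
C = {w for w in OMEGA if w[0] == w[1]}   # the two tosses agree

pairwise = all(is_independent([E, F]) for E, F in combinations([A, B, C], 2))
mutual = is_independent([A, B, C])
print(pairwise, mutual)
```

Here every pair of events meets in the single outcome (H, H), of measure 1/4 = 1/2 · 1/2, while the triple intersection also has measure 1/4, not 1/8.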
272B Remarks (a) This is perhaps the central contribution of probability theory to measure theory, and as such deserves the most careful scrutiny. The idea of independence comes from outside mathematics altogether, in the notion of events which have independent causes. I suppose that 272G and 272M are the results below which most clearly show the measure-theoretic aspects of the concept. It is not an accident that both involve product measures; one of the wonders of measure theory is the fact that the same technical devices are used in establishing the probability theory of stochastic independence and the geometry of multi-dimensional volume.
(b) In the following paragraphs I will try to describe some relationships between the three notions of independence just defined. But it is worth noting at once the fact that, in all three cases, a family is independent iff all its finite subfamilies are independent. Consequently any subfamily of an independent family is independent. Another elementary fact which is immediate from the definitions is that if $\langle\Sigma_i\rangle_{i\in I}$ is an independent family of $\sigma$-algebras, and $\Sigma'_i$ is a $\sigma$-subalgebra of $\Sigma_i$ for each $i$, then $\langle\Sigma'_i\rangle_{i\in I}$ is an independent family.
(c) A useful reformulation of 272Ab is the following: A family $\langle\Sigma_i\rangle_{i\in I}$ of $\sigma$-subalgebras of $\Sigma$ is independent iff
$$\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i$$
whenever $E_i\in\Sigma_i$ for every $i$ and $\{i : E_i\ne\Omega\}$ is finite. (Here I follow the convention of 254F, saying that for a family $\langle\alpha_i\rangle_{i\in I}$ in $[0,1]$ we take $\prod_{i\in I}\alpha_i=1$ if $I=\emptyset$, and otherwise it is to be $\inf_{J\subseteq I,\,J\text{ is finite}}\prod_{i\in J}\alpha_i$.)
(d) In 272Aa-b I speak of sets $E_i\in\Sigma$ and algebras $\Sigma_i\subseteq\Sigma$. In fact (272Ac already gives a hint of this) we shall more often than not be concerned with $\hat\Sigma$ rather than with $\Sigma$, if there is a difference, where $(\Omega,\hat\Sigma,\hat\mu)$ is the completion of $(\Omega,\Sigma,\mu)$.
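272C The $\sigma$-subalgebra defined by a random variable To relate 272Ab to 272Ac we need the following notion. Let $(\Omega,\Sigma,\mu)$ be a probability space and $X$ a real-valued random variable defined on $\Omega$. Write $\mathcal{B}$ for the $\sigma$-algebra of Borel subsets of $\mathbb{R}$, and $\Sigma_X$ for
$$\{X^{-1}[F] : F\in\mathcal{B}\}\cup\{(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F] : F\in\mathcal{B}\}.$$
Then $\Sigma_X$ is a $\sigma$-algebra of subsets of $\Omega$. $\mathbf{P}$ $\emptyset=X^{-1}[\emptyset]\in\Sigma_X$; if $F\in\mathcal{B}$ then
$$\Omega\setminus X^{-1}[F]=(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[\mathbb{R}\setminus F]\in\Sigma_X,$$
$$\Omega\setminus\bigl((\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F]\bigr)=X^{-1}[\mathbb{R}\setminus F]\in\Sigma_X;$$
if $\langle F_k\rangle_{k\in\mathbb{N}}$ is any sequence in $\mathcal{B}$ then
$$\bigcup_{k\in\mathbb{N}}X^{-1}[F_k]=X^{-1}\Bigl[\bigcup_{k\in\mathbb{N}}F_k\Bigr],$$
so
$$\bigcup_{k\in\mathbb{N}}X^{-1}[F_k],\qquad(\Omega\setminus\operatorname{dom}X)\cup\bigcup_{k\in\mathbb{N}}X^{-1}[F_k]$$
belong to $\Sigma_X$. $\mathbf{Q}$
Evidently $\Sigma_X$ is the smallest $\sigma$-algebra of subsets of $\Omega$, containing $\operatorname{dom}X$, for which $X$ is measurable. Also $\Sigma_X$ is a subalgebra of $\hat\Sigma$, where $\hat\Sigma$ is the domain of the completion of $\mu$ (271Aa).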
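On a finite set the construction can be carried out explicitly. The sketch below is an illustration only, under simplifying assumptions which are mine rather than the text's: $\Omega$ is a six-point set, $X$ is defined everywhere (so the second family in the definition adds nothing), and every subset of the range of $X$ plays the role of a Borel set. The name `sigma_algebra_of` is hypothetical.

```python
from itertools import chain, combinations

def sigma_algebra_of(X, omega):
    """All preimages X^{-1}[F] for F a subset of the finite range of X.

    Assumes X (a dict) is defined at every point of omega.
    """
    values = sorted(set(X[w] for w in omega))
    subsets = chain.from_iterable(
        combinations(values, k) for k in range(len(values) + 1)
    )
    return {frozenset(w for w in omega if X[w] in set(F)) for F in subsets}

omega = frozenset(range(6))
X = {w: w % 3 for w in omega}          # X takes the three values 0, 1, 2
sigma_X = sigma_algebra_of(X, omega)

# Closure properties of a sigma-algebra (finite unions suffice here).
closed_under_complement = all(omega - E in sigma_X for E in sigma_X)
closed_under_union = all(E | F in sigma_X for E in sigma_X for F in sigma_X)
print(len(sigma_X), closed_under_complement, closed_under_union)
```

The atoms of the resulting algebra are the three level sets of $X$, so it has $2^3$ members.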
Now we have the following result.
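272D Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_i\rangle_{i\in I}$ a family of real-valued random variables on $\Omega$. For each $i\in I$, let $\Sigma_i$ be the $\sigma$-algebra defined by $X_i$, as in 272C. Then the following are equiveridical:
(i) $\langle X_i\rangle_{i\in I}$ is independent;
(ii) whenever $i_1,\ldots,i_n$ are distinct members of $I$ and $F_1,\ldots,F_n$ are Borel subsets of $\mathbb{R}$, then
$$\Pr(X_{i_j}\in F_j\text{ for every }j\le n)=\prod_{j=1}^n\Pr(X_{i_j}\in F_j);$$
(iii) whenever $\langle F_i\rangle_{i\in I}$ is a family of Borel subsets of $\mathbb{R}$, and $\{i : F_i\ne\mathbb{R}\}$ is finite, then
$$\hat\mu\Bigl(\bigcap_{i\in I}\bigl(X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i)\bigr)\Bigr)=\prod_{i\in I}\Pr(X_i\in F_i),$$
where $\hat\mu$ is the completion of $\mu$;
(iv) $\langle\Sigma_i\rangle_{i\in I}$ is independent with respect to $\hat\mu$.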
proof (a)(i)$\Rightarrow$(ii) Write $\mathbf{X}=(X_{i_1},\ldots,X_{i_n})$. Write $\nu_{\mathbf{X}}$ for the joint distribution of $\mathbf{X}$, and for each $j\le n$ write $\nu_j$ for the distribution of $X_{i_j}$; let $\nu$ be the product of $\nu_1,\ldots,\nu_n$ as described in 254A-254C. (I wrote §254 out as for infinite products. If you are interested only in finite products of probability spaces, which are adequate for our needs in this paragraph, I recommend reading §§251-252 with the mental proviso that all measures are probabilities, and then §254 with the proviso that the set $I$ is finite.) By 256K, $\nu$ is a Radon measure on $\mathbb{R}^n$. (This is an induction on $n$, relying on 254N for assurance that we can regard $\nu$ as the repeated product $(\ldots((\nu_1\times\nu_2)\times\nu_3)\times\ldots\nu_{n-1})\times\nu_n$.) Then for any $a=(\alpha_1,\ldots,\alpha_n)\in\mathbb{R}^n$, we have
$$\nu\,]{-\infty},a]=\nu\Bigl(\prod_{j=1}^n\,]{-\infty},\alpha_j]\Bigr)=\prod_{j=1}^n\nu_j\,]{-\infty},\alpha_j]$$
(using 254Fb)
$$=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j)=\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)$$
(using the condition (i))
$$=\nu_{\mathbf{X}}\,]{-\infty},a].$$
By the uniqueness assertion in 271Ba, $\nu=\nu_{\mathbf{X}}$. In particular, if $F_1,\ldots,F_n$ are Borel subsets of $\mathbb{R}$,
$$\Pr(X_{i_j}\in F_j\text{ for every }j\le n)=\Pr\Bigl(\mathbf{X}\in\prod_{j\le n}F_j\Bigr)=\nu_{\mathbf{X}}\Bigl(\prod_{j\le n}F_j\Bigr)=\nu\Bigl(\prod_{j\le n}F_j\Bigr)=\prod_{j=1}^n\nu_jF_j=\prod_{j=1}^n\Pr(X_{i_j}\in F_j),$$
as required.
(b)(ii)$\Rightarrow$(i) is trivial, if we recall that all sets $]{-\infty},\alpha]$ are Borel sets, so that the definition of independence given in 272Ac is just a special case of (ii).
(c)(ii)$\Rightarrow$(iv) Assume (ii), and suppose that $i_1,\ldots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for each $j\le n$. For each $j$, set $E'_j=E_j\cap\operatorname{dom}X_{i_j}$, so that $E'_j$ may be expressed as $X_{i_j}^{-1}[F_j]$ for some Borel set $F_j\subseteq\mathbb{R}$. Then $\hat\mu(E_j\setminus E'_j)=0$ for each $j$, so
$$\hat\mu\Bigl(\bigcap_{1\le j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{1\le j\le n}E'_j\Bigr)=\Pr(X_{i_1}\in F_1,\ldots,X_{i_n}\in F_n)=\prod_{j=1}^n\Pr(X_{i_j}\in F_j)$$
(using (ii))
$$=\prod_{j=1}^n\hat\mu E_j.$$
As $E_1,\ldots,E_n$ are arbitrary, $\langle\Sigma_i\rangle_{i\in I}$ is independent.
(d)(iv)$\Rightarrow$(ii) Now suppose that $\langle\Sigma_i\rangle_{i\in I}$ is independent. If $i_1,\ldots,i_n$ are distinct members of $I$ and $F_1,\ldots,F_n$ are Borel sets in $\mathbb{R}$, then $X_{i_j}^{-1}[F_j]\in\Sigma_{i_j}$ for each $j$, so
$$\Pr(X_{i_1}\in F_1,\ldots,X_{i_n}\in F_n)=\hat\mu\Bigl(\bigcap_{1\le j\le n}X_{i_j}^{-1}[F_j]\Bigr)=\prod_{j=1}^n\hat\mu X_{i_j}^{-1}[F_j]=\prod_{j=1}^n\Pr(X_{i_j}\in F_j).$$
(e) Finally, observe that (iii) is nothing but a re-formulation of (ii), because if $F_i=\mathbb{R}$ then $\Pr(X_i\in F_i)=1$ and $X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i)=\Omega$.
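272E Corollary Let $\langle X_i\rangle_{i\in I}$ be an independent family of real-valued random variables, and $\langle h_i\rangle_{i\in I}$ any family of Borel measurable functions from $\mathbb{R}$ to $\mathbb{R}$. Then $\langle h_i(X_i)\rangle_{i\in I}$ is independent.
proof Writing $\Sigma_i$ for the $\sigma$-algebra defined by $X_i$, $\Sigma'_i$ for the $\sigma$-algebra defined by $h_i(X_i)$, $h_i(X_i)$ is $\Sigma_i$-measurable (121Eg) so $\Sigma'_i\subseteq\Sigma_i$ for every $i$ and $\langle\Sigma'_i\rangle_{i\in I}$ is independent, as in 272Bb. By 272D, $\langle h_i(X_i)\rangle_{i\in I}$ is independent.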
272F Similarly, we can relate the definition in 272Aa to the others.
Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle E_i\rangle_{i\in I}$ a family in $\Sigma$. Set $\Sigma_i=\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$, the ($\sigma$-)algebra of subsets of $\Omega$ generated by $E_i$, and $X_i=\chi E_i$, the characteristic function of $E_i$. Then the following are equiveridical:
(i) $\langle E_i\rangle_{i\in I}$ is independent;
(ii) $\langle\Sigma_i\rangle_{i\in I}$ is independent;
(iii) $\langle X_i\rangle_{i\in I}$ is independent.
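proof (i)$\Rightarrow$(iii) If $i_1,\ldots,i_n$ are distinct members of $I$ and $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$, then for each $j\le n$ the set $G_j=\{\omega : X_{i_j}(\omega)\le\alpha_j\}$ is either $\Omega\setminus E_{i_j}$ (if $0\le\alpha_j<1$) or $\emptyset$ (if $\alpha_j<0$) or $\Omega$ (if $\alpha_j\ge 1$). If any $G_j$ is empty, then
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=0=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j).$$
Otherwise, set $K=\{j : G_j=\Omega\setminus E_{i_j}\}$; then
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=\mu\Bigl(\bigcap_{j\le n}G_j\Bigr)=\mu\Bigl(\bigcap_{j\in K}(\Omega\setminus E_{i_j})\Bigr)=\prod_{j\in K}\mu(\Omega\setminus E_{i_j})=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j).$$
(The third equality uses the elementary fact that if finitely many events are independent, so are their complements; this follows by induction, replacing one set at a time by its complement.) As $i_1,\ldots,i_n$ and $\alpha_1,\ldots,\alpha_n$ are arbitrary, $\langle X_i\rangle_{i\in I}$ is independent.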
(iii)$\Rightarrow$(ii) follows from (i)$\Rightarrow$(iv) of 272D, because $\Sigma_i$ is the $\sigma$-algebra defined by $X_i$.
(ii)$\Rightarrow$(i) is trivial, because $E_i\in\Sigma_i$ for each $i$.
Remark You will I hope feel that while the theory of product measures might be appropriate to 272D, it is surely rather heavy machinery to use on what ought to be a simple combinatorial problem like (iii)$\Rightarrow$(ii) of this proposition. I suggest that you construct an elementary proof, and examine which of the ideas of the theory of product measures (and the Monotone Class Theorem, 136B) are actually needed here.
272G Distributions of independent random variables I have not tried to describe the joint distribution of an infinite family of random variables. (Indications of how to deal with a countable family are offered in 271Ya and 272Yb. For uncountable families I will wait until §454 in Volume 4.) As, however, the independence of a family of random variables is determined by the behaviour of finite subfamilies, we can approach it through the following proposition.
Theorem Let $\mathbf{X}=(X_1,\ldots,X_n)$ be a finite family of real-valued random variables on a probability space. Let $\nu_{\mathbf{X}}$ be the corresponding distribution on $\mathbb{R}^n$. Then the following are equiveridical:
(i) $X_1,\ldots,X_n$ are independent;
(ii) $\nu_{\mathbf{X}}$ can be expressed as a product of $n$ probability measures $\nu_1,\ldots,\nu_n$, one for each factor $\mathbb{R}$ of $\mathbb{R}^n$;
(iii) $\nu_{\mathbf{X}}$ is the product measure of $\nu_{X_1},\ldots,\nu_{X_n}$, writing $\nu_{X_i}$ for the distribution of the random variable $X_i$.
proof (a)(i)$\Rightarrow$(iii) In the proof of (i)$\Rightarrow$(ii) of 272D above I showed that $\nu_{\mathbf{X}}$ is the product of $\nu_{X_1},\ldots,\nu_{X_n}$.
(b)(iii)$\Rightarrow$(ii) is trivial.
(c)(ii)$\Rightarrow$(i) Suppose that $\nu_{\mathbf{X}}$ is expressible as a product $\nu_1\times\ldots\times\nu_n$. Take $a=(\alpha_1,\ldots,\alpha_n)$ in $\mathbb{R}^n$. Then
$$\Pr(X_i\le\alpha_i\ \forall\ i\le n)=\Pr(\mathbf{X}\in\,]{-\infty},a])=\nu_{\mathbf{X}}(]{-\infty},a])=\prod_{i=1}^n\nu_i\,]{-\infty},\alpha_i].$$
On the other hand, setting $F_i=\{(\xi_1,\ldots,\xi_n) : \xi_i\le\alpha_i\}$, we must have
$$\nu_i\,]{-\infty},\alpha_i]=\nu_{\mathbf{X}}F_i=\Pr(\mathbf{X}\in F_i)=\Pr(X_i\le\alpha_i)$$
for each $i$. So we get
$$\Pr(X_i\le\alpha_i\text{ for every }i\le n)=\prod_{i=1}^n\Pr(X_i\le\alpha_i),$$
as required.
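For random variables taking finitely many values, the criterion "joint distribution = product of the marginal distributions" can be tested exactly. The following sketch is an illustration on an assumed example (two fair dice); the helpers `joint_distribution`, `marginal` and `is_product` are hypothetical names introduced here, and point masses stand in for the general distributions of the theorem.

```python
from fractions import Fraction
from itertools import product

def joint_distribution(variables, outcomes, mass):
    """Joint distribution of a tuple of random variables on a uniform finite space."""
    dist = {}
    for w in outcomes:
        key = tuple(X(w) for X in variables)
        dist[key] = dist.get(key, Fraction(0)) + mass
    return dist

def marginal(dist, i):
    """Distribution of the i-th coordinate."""
    m = {}
    for key, p in dist.items():
        m[key[i]] = m.get(key[i], Fraction(0)) + p
    return m

def is_product(dist):
    """True iff dist equals the product of its own marginals."""
    n = len(next(iter(dist)))
    margs = [marginal(dist, i) for i in range(n)]
    for key in product(*(m.keys() for m in margs)):
        expected = Fraction(1)
        for i, v in enumerate(key):
            expected *= margs[i][v]
        if dist.get(key, Fraction(0)) != expected:
            return False
    return True

outcomes = list(product(range(1, 7), repeat=2))   # two fair dice
mass = Fraction(1, 36)
first = lambda w: w[0]
second = lambda w: w[1]
total = lambda w: w[0] + w[1]

independent_pair = is_product(joint_distribution([first, second], outcomes, mass))
dependent_pair = is_product(joint_distribution([first, total], outcomes, mass))
print(independent_pair, dependent_pair)
```

The two coordinates are independent, while the first coordinate and the total are not: the point (1, 2) has joint mass 1/36 but the product of the marginal masses is 1/6 · 1/36.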
272H Corollary Suppose that $\langle X_i\rangle_{i\in I}$ is an independent family of real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, and that for each $i\in I$ we are given another real-valued random variable $Y_i$ on $\Omega$ such that $Y_i=_{\text{a.e.}}X_i$. Then $\langle Y_i\rangle_{i\in I}$ is independent.
proof For every distinct $i_1,\ldots,i_n\in I$, if we set $\mathbf{X}=(X_{i_1},\ldots,X_{i_n})$ and $\mathbf{Y}=(Y_{i_1},\ldots,Y_{i_n})$, then $\mathbf{X}=_{\text{a.e.}}\mathbf{Y}$, so $\nu_{\mathbf{X}}$, $\nu_{\mathbf{Y}}$ are equal (271De). By 272G, $Y_{i_1},\ldots,Y_{i_n}$ must be independent because $X_{i_1},\ldots,X_{i_n}$ are. As $i_1,\ldots,i_n$ are arbitrary, the whole family $\langle Y_i\rangle_{i\in I}$ is independent.
Remark It follows that we may speak of independent families in the space $L^0(\mu)$ of equivalence classes of random variables (241C), saying that $\langle X_i^{\bullet}\rangle_{i\in I}$ is independent iff $\langle X_i\rangle_{i\in I}$ is.
272I Corollary Suppose that $X_1,\ldots,X_n$ are independent real-valued random variables with density functions $f_1,\ldots,f_n$ (271H). Then $\mathbf{X}=(X_1,\ldots,X_n)$ has a density function $f$ given by setting $f(x)=\prod_{i=1}^nf_i(\xi_i)$ whenever $x=(\xi_1,\ldots,\xi_n)\in\prod_{i\le n}\operatorname{dom}(f_i)\subseteq\mathbb{R}^n$.
proof For $n=2$ this is covered by 253I; the general case follows by induction on $n$.
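272J The most important theorems of the subject refer to independent families of random variables, rather than independent families of $\sigma$-algebras. The value of the concept of independent $\sigma$-algebras lies in such results as the following.
Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle\Sigma_i\rangle_{i\in I}$ a family of $\sigma$-subalgebras of $\Sigma$. For each $i\in I$ let $\mu_i$ be the restriction of $\mu$ to $\Sigma_i$, and let $(\Omega^I,\Lambda,\lambda)$ be the product probability space of the family $\langle(\Omega,\Sigma_i,\mu_i)\rangle_{i\in I}$. Define $\phi:\Omega\to\Omega^I$ by setting $\phi(\omega)(i)=\omega$ whenever $\omega\in\Omega$ and $i\in I$. Then $\phi$ is inverse-measure-preserving iff $\langle\Sigma_i\rangle_{i\in I}$ is independent.
proof This is virtually a restatement of 254Fb and 254G.
(i) If $\phi$ is inverse-measure-preserving, $i_1,\ldots,i_n\in I$ are distinct and $E_j\in\Sigma_{i_j}$ for each $j$, then $\bigcap_{j\le n}E_j=\phi^{-1}[\{x : x(i_j)\in E_j\text{ for every }j\le n\}]$, so that
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda\{x : x(i_j)\in E_j\text{ for every }j\le n\}=\prod_{j=1}^n\mu_{i_j}E_j=\prod_{j=1}^n\mu E_j.$$
(ii) If $\langle\Sigma_i\rangle_{i\in I}$ is independent, $E_i\in\Sigma_i$ for every $i\in I$ and $\{i : E_i\ne\Omega\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}E_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i=\prod_{i\in I}\mu_iE_i.$$
So the conditions of 254G are satisfied and $\mu\phi^{-1}[W]=\lambda W$ for every $W\in\Lambda$.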
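272K Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_i\rangle_{i\in I}$ an independent family of $\sigma$-subalgebras of $\Sigma$. Let $\langle J(s)\rangle_{s\in S}$ be a disjoint family of subsets of $I$, and for each $s\in S$ let $\tilde\Sigma_s$ be the $\sigma$-algebra of subsets of $\Omega$ generated by $\bigcup_{i\in J(s)}\Sigma_i$. Then $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.
proof Let $(\Omega,\hat\Sigma,\hat\mu)$ be the completion of $(\Omega,\Sigma,\mu)$. On $\Omega^I$ let $\lambda$ be the product of the measures $\mu\restriction\Sigma_i$, and let $\phi:\Omega\to\Omega^I$ be the diagonal map, as in 272J. $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda$, by 272J.
We can identify $\lambda$ with the product of $\langle\lambda_s\rangle_{s\in S}$, where for each $s\in S$ $\lambda_s$ is the product of $\langle\mu\restriction\Sigma_i\rangle_{i\in J(s)}$ (254N). For $s\in S$, let $\Lambda_s$ be the domain of $\lambda_s$, and set $\pi_s(x)=x\restriction J(s)$ for $x\in\Omega^I$, so that $\pi_s$ is inverse-measure-preserving for $\lambda$ and $\lambda_s$ (254Oa), and $\phi_s=\pi_s\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda_s$; of course $\phi_s$ is the diagonal map from $\Omega$ to $\Omega^{J(s)}$. Set $\Sigma^*_s=\{\phi_s^{-1}[H] : H\in\Lambda_s\}$. Then $\Sigma^*_s$ is a $\sigma$-subalgebra of $\hat\Sigma$, and $\Sigma^*_s\supseteq\tilde\Sigma_s$, because
$$E=\phi_s^{-1}[\{x : x(i)\in E\}]\in\Sigma^*_s$$
whenever $i\in J(s)$ and $E\in\Sigma_i$.
Now suppose that $s_1,\ldots,s_n\in S$ are distinct and that $E_j\in\tilde\Sigma_{s_j}$ for each $j$. Then $E_j\in\Sigma^*_{s_j}$, so there are $H_j\in\Lambda_{s_j}$ such that $E_j=\phi_{s_j}^{-1}[H_j]$ for each $j$. Set
$$W=\{x : x\in\Omega^I,\ x\restriction J(s_j)\in H_j\text{ for every }j\le n\}.$$
Because we can identify $\lambda$ with the product of the $\lambda_s$, we have
$$\lambda W=\prod_{j=1}^n\lambda_{s_j}H_j=\prod_{j=1}^n\hat\mu(\phi_{s_j}^{-1}[H_j])=\prod_{j=1}^n\hat\mu E_j=\prod_{j=1}^n\mu E_j.$$
On the other hand, $\phi^{-1}[W]=\bigcap_{j\le n}E_j$, so, because $\phi$ is inverse-measure-preserving,
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda W=\prod_{j=1}^n\mu E_j.$$
As $E_1,\ldots,E_n$ are arbitrary, $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.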
272L I give a typical application of this result as a sample.
Corollary Let $X,X_1,\ldots,X_n$ be independent real-valued random variables and $h:\mathbb{R}^n\to\mathbb{R}$ a Borel function. Then $X$ and $h(X_1,\ldots,X_n)$ are independent.
proof Let $\Sigma_X$, $\Sigma_{X_i}$ be the $\sigma$-algebras defined by $X$, $X_i$ (272C). Then $\Sigma_X,\Sigma_{X_1},\ldots,\Sigma_{X_n}$ are independent (272D). Let $\Sigma^*$ be the $\sigma$-algebra generated by $\Sigma_{X_1}\cup\ldots\cup\Sigma_{X_n}$. Then 272K (perhaps working in the completion of the original probability space) tells us that $\Sigma_X$ and $\Sigma^*$ are independent. But every $X_j$ is $\Sigma^*$-measurable so $Y=h(X_1,\ldots,X_n)$ is $\Sigma^*$-measurable (121Kb); also $\operatorname{dom}Y\in\Sigma^*$, so $\Sigma_Y\subseteq\Sigma^*$ and $\Sigma_X$, $\Sigma_Y$ are independent. By 272D again, $X$ and $Y$ are independent, as claimed.
Remark Nearly all of us, when teaching elementary probability theory, would invite our students to treat this corollary (with an explicit function $h$, of course) as obvious. In effect, the proof here is a confirmation that the formal definition of independence offered is a faithful representation of our intuition of independent events having independent causes.
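272M Products of probability spaces and independent families of random variables We have already seen that the concept of independent random variables is intimately linked with that of product measure. I now give some further manifestations of the connexion.
Proposition Let $\langle(\Omega_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(\Omega,\Sigma,\mu)$ their product.
(a) For each $i\in I$ write $\tilde\Sigma_i=\{\pi_i^{-1}[E] : E\in\Sigma_i\}$, where $\pi_i:\Omega\to\Omega_i$ is the coordinate map. Then $\langle\tilde\Sigma_i\rangle_{i\in I}$ is an independent family of $\sigma$-subalgebras of $\Sigma$.
(b) For each $i\in I$ let $\langle X_{ij}\rangle_{j\in J(i)}$ be an independent family of real-valued random variables on $\Omega_i$, and for $i\in I$, $j\in J(i)$ write $\tilde X_{ij}(\omega)=X_{ij}(\omega(i))$ for those $\omega\in\Omega$ such that $\omega(i)\in\operatorname{dom}X_{ij}$. Then $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$ is an independent family of random variables, and each $\tilde X_{ij}$ has the same distribution as the corresponding $X_{ij}$.
proof (a) It is easy to check that each $\tilde\Sigma_i$ is a $\sigma$-algebra of sets. The rest amounts just to recalling from 254Fb that if $J\subseteq I$ is finite and $E_i\in\Sigma_i$ for $i\in J$, then
$$\mu\Bigl(\bigcap_{i\in J}\pi_i^{-1}[E_i]\Bigr)=\mu\{\omega : \omega(i)\in E_i\text{ for every }i\in I\}=\prod_{i\in I}\mu_iE_i$$
if we set $E_i=\Omega_i$ for $i\in I\setminus J$.
(b) We know also that $(\Omega,\Sigma,\mu)$ is the product of the completions $(\Omega_i,\hat\Sigma_i,\hat\mu_i)$ (254I). From this, we see that each $\tilde X_{ij}$ is defined $\mu$-a.e., and is $\Sigma$-measurable, with the same distribution as $X_{ij}$. Now apply condition (iii) of 272D. Suppose that $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is a family of Borel sets in $\mathbb{R}$, and that $\{(i,j) : F_{ij}\ne\mathbb{R}\}$ is finite. Consider
$$E_i=\bigcap_{j\in J(i)}\bigl(X_{ij}^{-1}[F_{ij}]\cup(\Omega_i\setminus\operatorname{dom}X_{ij})\bigr),$$
$$E=\prod_{i\in I}E_i=\bigcap_{i\in I,j\in J(i)}\bigl(\tilde X_{ij}^{-1}[F_{ij}]\cup(\Omega\setminus\operatorname{dom}\tilde X_{ij})\bigr).$$
Because each family $\langle X_{ij}\rangle_{j\in J(i)}$ is independent, and $\{j : F_{ij}\ne\mathbb{R}\}$ is finite,
$$\hat\mu_iE_i=\prod_{j\in J(i)}\Pr(X_{ij}\in F_{ij})$$
for each $i\in I$. Because
$$\{i : E_i\ne\Omega_i\}\subseteq\{i : \exists\ j\in J(i),\ F_{ij}\ne\mathbb{R}\}$$
is finite,
$$\mu E=\prod_{i\in I}\hat\mu_iE_i=\prod_{i\in I,j\in J(i)}\Pr(\tilde X_{ij}\in F_{ij});$$
as $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is arbitrary, $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$ is independent.
Remark The formulation in (b) is more complicated than is necessary to express the idea, but is what is needed for an application below.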
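272N A special case of 272J is of particular importance in general measure theory, and is most useful in an adapted form.
Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle E_i\rangle_{i\in I}$ an independent family in $\Sigma$ such that $\mu E_i=\frac12$ for every $i\in I$. Define $\phi:\Omega\to\{0,1\}^I$ by setting $\phi(\omega)(i)=1$ if $\omega\in E_i$, $0$ if $\omega\in\Omega\setminus E_i$. Then $\phi$ is inverse-measure-preserving for the usual measure $\lambda$ on $\{0,1\}^I$ (254J).
proof I use 254G again. For each $i\in I$ let $\Sigma_i$ be the algebra $\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$; then $\langle\Sigma_i\rangle_{i\in I}$ is independent (272F). For $i\in I$ set $\phi_i(\omega)=\phi(\omega)(i)$. Let $\nu$ be the usual measure of $\{0,1\}$. Then it is easy to check that
$$\mu\phi_i^{-1}[H]=\tfrac12\#(H)=\nu H$$
for every $H\subseteq\{0,1\}$. If $\langle H_i\rangle_{i\in I}$ is a family of subsets of $\{0,1\}$, and $\{i : H_i\ne\{0,1\}\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}H_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}\phi_i^{-1}[H_i]\Bigr)=\prod_{i\in I}\mu\phi_i^{-1}[H_i]$$
(because $\phi_i^{-1}[H_i]\in\Sigma_i$ for each $i$, and $\langle\Sigma_i\rangle_{i\in I}$ is independent)
$$=\prod_{i\in I}\nu H_i=\lambda\Bigl(\prod_{i\in I}H_i\Bigr).$$
As $\langle H_i\rangle_{i\in I}$ is arbitrary, 254G gives the result.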
272O Tail $\sigma$-algebras and the zero-one law I have never been able to make up my mind whether the following result is 'deep' or not. I think it is one of the many cases in mathematics where a theorem is surprising and exciting if one comes on it unprepared, but is natural and straightforward if one approaches it from the appropriate angle.
Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ an independent sequence of $\sigma$-subalgebras of $\Sigma$. Let $\Sigma^*_n$ be the $\sigma$-algebra generated by $\bigcup_{m\ge n}\Sigma_m$ for each $n$, and set $\Sigma^*_\infty=\bigcap_{n\in\mathbb{N}}\Sigma^*_n$. Then $\mu E$ is either $0$ or $1$ for every $E\in\Sigma^*_\infty$.
proof For each $n$, the family $(\Sigma_0,\ldots,\Sigma_n,\Sigma^*_{n+1})$ is independent, by 272K. So $(\Sigma_0,\ldots,\Sigma_n,\Sigma^*_\infty)$ is independent, because $\Sigma^*_\infty\subseteq\Sigma^*_{n+1}$. But this means that every finite subfamily of $(\Sigma^*_\infty,\Sigma_0,\Sigma_1,\ldots)$ is independent, and therefore that the whole family is (272Bb). Consequently $(\Sigma^*_\infty,\Sigma^*_0)$ must be independent, by 272K again.
Now if $E\in\Sigma^*_\infty$, then $E$ also belongs to $\Sigma^*_0$, so we must have
$$\mu(E\cap E)=\mu E\cdot\mu E,$$
that is, $\mu E=(\mu E)^2$; so that $\mu E\in\{0,1\}$, as claimed.
272P To support the claim that somewhere we have achieved a non-trivial insight, I give a corollary, which will be fundamental to the understanding of the limit theorems in the next section, and does not seem to be obvious.
Corollary Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of real-valued random variables on $\Omega$. Then
$$\limsup_{n\to\infty}\frac{1}{n+1}(X_0+\ldots+X_n)$$
is almost everywhere constant; that is, there is some $u\in[-\infty,\infty]$ such that
$$\limsup_{n\to\infty}\frac{1}{n+1}(X_0+\ldots+X_n)=u$$
almost everywhere.
proof We may suppose that each $X_n$ is $\Sigma$-measurable and defined everywhere in $\Omega$, because (as remarked in 272H) changing the $X_n$ on a negligible set does not affect their independence, and it affects $\limsup_{n\to\infty}\frac{1}{n+1}(X_0+\ldots+X_n)$ only on a negligible set. For each $n$, let $\Sigma_n$ be the $\sigma$-algebra generated by $X_n$ (272C), and $\Sigma^*_n$ the $\sigma$-algebra generated by $\bigcup_{m\ge n}\Sigma_m$; set $\Sigma^*_\infty=\bigcap_{n\in\mathbb{N}}\Sigma^*_n$. By 272D, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is independent, so $\mu E\in\{0,1\}$ for every $E\in\Sigma^*_\infty$ (272O).
Now take any $a\in\mathbb{R}$ and set
$$E_a=\Bigl\{\omega : \limsup_{m\to\infty}\frac{1}{m+1}\bigl(X_0(\omega)+\ldots+X_m(\omega)\bigr)\le a\Bigr\}.$$
Then, for every $n\in\mathbb{N}$,
$$\limsup_{m\to\infty}\frac{1}{m+1}(X_0+\ldots+X_m)=\limsup_{m\to\infty}\frac{1}{m+1}(X_n+\ldots+X_{m+n}),$$
so
$$E_a=\Bigl\{\omega : \limsup_{m\to\infty}\frac{1}{m+1}\bigl(X_n(\omega)+\ldots+X_{n+m}(\omega)\bigr)\le a\Bigr\}$$
belongs to $\Sigma^*_n$ for every $n$, because $X_i$ is $\Sigma^*_n$-measurable for every $i\ge n$. So $E_a\in\Sigma^*_\infty$ and
$$\Pr\Bigl(\limsup_{m\to\infty}\frac{1}{m+1}(X_0+\ldots+X_m)\le a\Bigr)=\mu E_a$$
must be either $0$ or $1$. Setting
$$u=\sup\{a : a\in\mathbb{R},\ \mu E_a=0\}$$
(allowing $\sup\emptyset=-\infty$ and $\sup\mathbb{R}=\infty$, as usual in such contexts), we see that
$$\limsup_{n\to\infty}\frac{1}{n+1}(X_0+\ldots+X_n)=u$$
almost everywhere.
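*272Q I add here a result which will be useful in Volume 5 and which gives further insight into the nature of large independent families.
Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_i\rangle_{i\in I}$ an independent family of $\sigma$-subalgebras of $\Sigma$. Let $\mathcal{E}$ be a family of measurable sets, and $\mathrm{T}$ the $\sigma$-algebra generated by $\mathcal{E}$. Then there is a set $J\subseteq I$ such that $\#(I\setminus J)\le\max(\omega,\#(\mathcal{E}))$ and $\mathrm{T}$, $\langle\Sigma_j\rangle_{j\in J}$ are independent, in the sense that $\mu(F\cap\bigcap_{r\le n}E_r)=\mu F\cdot\prod_{r=1}^n\mu E_r$ whenever $F\in\mathrm{T}$, $j_1,\ldots,j_n$ are distinct members of $J$ and $E_r\in\Sigma_{j_r}$ for each $r\le n$.
proof (a) As in 272J, give $\Omega^I$ the probability measure $\lambda$ which is the product of the measures $\mu\restriction\Sigma_i$, and let $\phi:\Omega\to\Omega^I$ be the diagonal map, so that $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda$, where $\hat\mu$ is the completion of $\mu$. Write $\Lambda$ for the domain of $\lambda$. Set $\kappa=\max(\omega,\#(\mathcal{E}))$, and let $\mathcal{E}^*$ be the set
$$\Bigl\{\bigcap_{r\le n}F_r : n\in\mathbb{N},\ F_r\in\mathcal{E}\text{ for every }r\le n\Bigr\}.$$
Because $\#(\mathcal{E}^n)\le\kappa$ for each $n$ (2A1Lc), $\#(\mathcal{E}^*)\le\kappa$ (2A1Ld). For each $F\in\mathcal{E}^*$, define $\nu_F:\Lambda\to[0,1]$ by setting $\nu_FW=\hat\mu(F\cap\phi^{-1}[W])$; then $\nu_F$ is countably additive and dominated by $\lambda$. It therefore has a Radon-Nikodým derivative $h_F$ with respect to $\lambda$, so that $\hat\mu(F\cap\phi^{-1}[W])=\int_Wh_F\,d\lambda$ for every $W\in\Lambda$ (232F). By 254Qc or 254Rb, we can find a function $h'_F$ equal $\lambda$-almost everywhere to $h_F$ and determined by coordinates in a countable set $J_F$, in the sense that $h'_F(w)=h'_F(w')$ whenever $w$, $w'\in\Omega^I$ and $w\restriction J_F=w'\restriction J_F$. (I am taking it for granted that we chose $h'_F$ to be defined everywhere on $\Omega^I$.)
(b) Set $J=I\setminus\bigcup_{F\in\mathcal{E}^*}J_F$; by 2A1Ld, $I\setminus J=\bigcup_{F\in\mathcal{E}^*}J_F$ has cardinal at most $\kappa$. If $F\in\mathcal{E}^*$, $j_1,\ldots,j_n$ are distinct members of $J$ and $E_r\in\Sigma_{j_r}$ for each $r\le n$, then $\mu(F\cap\bigcap_{r\le n}E_r)=\mu F\cdot\prod_{r=1}^n\mu E_r$. $\mathbf{P}$ Set $W=\{w : w\in\Omega^I,\ w(j_r)\in E_r\text{ for each }r\le n\}$. Then
$$\mu\Bigl(F\cap\bigcap_{r\le n}E_r\Bigr)=\hat\mu(F\cap\phi^{-1}[W])=\int_Wh'_F\,d\lambda=\int h'_F\times\chi W\,d\lambda.$$
But observe that $W$ is determined by coordinates in $J$, while $h'_F$ is determined by coordinates in $J_F\subseteq I\setminus J$; putting 272Ma, 272K and 272R together (or otherwise), we have
$$\mu\Bigl(F\cap\bigcap_{r\le n}E_r\Bigr)=\int h'_F\times\chi W\,d\lambda=\int h'_F\,d\lambda\cdot\lambda W=\mu F\cdot\prod_{r=1}^n\mu E_r.\ \mathbf{Q}$$
(c) Now consider the family $\mathcal{A}$ of those sets $F\in\Sigma$ such that $\mu(F\cap\bigcap_{r\le n}E_r)=\mu F\cdot\prod_{r=1}^n\mu E_r$ whenever $j_1,\ldots,j_n\in J$ are distinct and $E_r\in\Sigma_{j_r}$ for every $r\le n$. It is easy to check that $\mathcal{A}$ is a Dynkin class, and we have just seen that $\mathcal{A}$ includes $\mathcal{E}^*$; as $\mathcal{E}^*$ is closed under $\cap$, $\mathcal{A}$ includes the $\sigma$-algebra $\mathrm{T}$ of sets generated by $\mathcal{E}^*$ (136B). And this is just what the theorem asserts.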
272R I must now catch up on some basic facts from elementary probability theory.
Proposition Let $X$, $Y$ be independent real-valued random variables with finite expectation (271Ab). Then $\mathbb{E}(X\times Y)$ exists and is equal to $\mathbb{E}(X)\mathbb{E}(Y)$.
proof Let $\nu_{(X,Y)}$ be the joint distribution of the pair $(X,Y)$. Then $\nu_{(X,Y)}$ is the product of the distributions $\nu_X$ and $\nu_Y$ (272G). Also $\int x\,\nu_X(dx)=\mathbb{E}(X)$ and $\int y\,\nu_Y(dy)=\mathbb{E}(Y)$ exist in $\mathbb{R}$ (271F). So
$$\int xy\,\nu_{(X,Y)}d(x,y)\text{ exists}=\mathbb{E}(X)\mathbb{E}(Y)$$
(253D). But this is just $\mathbb{E}(X\times Y)$, by 271E with $h(x,y)=xy$.
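The product rule for expectations can be confirmed by exact arithmetic in a small case. The sketch below is an illustration only, on an assumed example of two fair dice; `expectation` is a hypothetical helper and the variables `X`, `Y`, `Z` are chosen here, not taken from the text. It also shows that the rule can fail without independence.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two fair dice
mass = Fraction(1, 36)

def expectation(f):
    """Expectation of f with respect to the uniform measure on the outcomes."""
    return sum((f(w) * mass for w in outcomes), Fraction(0))

X = lambda w: w[0]            # first die
Y = lambda w: w[1]            # second die, independent of X
Z = lambda w: w[0] + w[1]     # total, not independent of X

e_xy = expectation(lambda w: X(w) * Y(w))
e_x_e_y = expectation(X) * expectation(Y)
e_xz = expectation(lambda w: X(w) * Z(w))
e_x_e_z = expectation(X) * expectation(Z)
print(e_xy, e_x_e_y, e_xz, e_x_e_z)
```

For the independent pair both sides are 49/4; for the dependent pair the expectation of the product is 329/12, while the product of the expectations is 49/2.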
272S Bienaymé's Equality Let $X_1,\ldots,X_n$ be independent real-valued random variables. Then $\operatorname{Var}(X_1+\ldots+X_n)=\operatorname{Var}(X_1)+\ldots+\operatorname{Var}(X_n)$.
proof (a) Suppose first that all the $X_i$ have finite variance. Set $a_i=\mathbb{E}(X_i)$, $Y_i=X_i-a_i$, $X=X_1+\ldots+X_n$, $Y=Y_1+\ldots+Y_n$; then $\mathbb{E}(X)=a_1+\ldots+a_n$, so $Y=X-\mathbb{E}(X)$ and
$$\operatorname{Var}(X)=\mathbb{E}(Y^2)=\mathbb{E}\Bigl(\sum_{i=1}^nY_i\Bigr)^2=\mathbb{E}\Bigl(\sum_{i=1}^n\sum_{j=1}^nY_i\times Y_j\Bigr)=\sum_{i=1}^n\sum_{j=1}^n\mathbb{E}(Y_i\times Y_j).$$
Now observe that if $i\ne j$ then $\mathbb{E}(Y_i\times Y_j)=\mathbb{E}(Y_i)\mathbb{E}(Y_j)=0$, because $Y_i$ and $Y_j$ are independent (by 272E) and we may use 272R, while if $i=j$ then
$$\mathbb{E}(Y_i\times Y_j)=\mathbb{E}(Y_i^2)=\mathbb{E}(X_i-\mathbb{E}(X_i))^2=\operatorname{Var}(X_i).$$
So
$$\operatorname{Var}(X)=\sum_{i=1}^n\mathbb{E}(Y_i^2)=\sum_{i=1}^n\operatorname{Var}(X_i).$$
(b)(i) I show next that if $\operatorname{Var}(X_1+X_2)<\infty$ then $\operatorname{Var}(X_1)<\infty$. $\mathbf{P}$ We have
$$\iint(x+y)^2\,\nu_{X_1}(dx)\,\nu_{X_2}(dy)=\int(x+y)^2\,\nu_{(X_1,X_2)}(d(x,y))$$
(by 272G and Fubini's theorem)
$$=\mathbb{E}((X_1+X_2)^2)$$
(by 271E)
$$<\infty.$$
So there must be some $a\in\mathbb{R}$ such that $\int(x+a)^2\,\nu_{X_1}(dx)$ is finite, that is, $\mathbb{E}((X_1+a)^2)<\infty$; consequently $\mathbb{E}(X_1^2)$ and $\operatorname{Var}(X_1)$ are finite. $\mathbf{Q}$
(ii) Now an easy induction (relying on 272L!) shows that if $\operatorname{Var}(X_1+\ldots+X_n)$ is finite, so is $\operatorname{Var}X_j$ for every $j$. Turning this round, if $\sum_{j=1}^n\operatorname{Var}(X_j)=\infty$, then $\operatorname{Var}(X_1+\ldots+X_n)=\infty$, and again the two are equal.
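As a check on the finite-variance case, the equality can be verified with exact rational arithmetic for a few finitely supported distributions. This is a sketch on assumed data (the three distributions below are arbitrary choices of mine), with hypothetical helper names `variance` and `distribution_of_sum`; independence is built in by giving each combination of values the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Three assumed finite distributions, each given as {value: probability}.
dists = [
    {v: Fraction(1, 6) for v in range(1, 7)},                      # fair die
    {0: Fraction(1, 2), 1: Fraction(1, 2)},                        # fair coin
    {-1: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)},    # skewed
]

def variance(dist):
    """Variance of a finitely supported distribution."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())

def distribution_of_sum(dists):
    """Distribution of the sum of independent variables with the given laws."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        s = sum(v for v, _ in combo)
        p = Fraction(1)
        for _, q in combo:
            p *= q                     # product measure: independence
        out[s] = out.get(s, Fraction(0)) + p
    return out

var_of_sum = variance(distribution_of_sum(dists))
sum_of_vars = sum(variance(d) for d in dists)
print(var_of_sum, sum_of_vars)
```

The individual variances are 35/12, 1/4 and 51/16, and both quantities come out as 305/48.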
272T The distribution of a sum of independent random variables: Theorem Let $X$, $Y$ be independent real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, with distributions $\nu_X$, $\nu_Y$. Then the distribution of $X+Y$ is the convolution $\nu_X*\nu_Y$ (257A).
proof Set $\nu=\nu_X*\nu_Y$. Take $a\in\mathbb{R}$ and set $h=\chi\,]{-\infty},a]$. Then $h$ is $\nu$-integrable, so
$$\nu\,]{-\infty},a]=\int h\,d\nu=\int h(x+y)\,(\nu_X\times\nu_Y)(d(x,y))$$
(by 257B, writing $\nu_X\times\nu_Y$ for the product measure on $\mathbb{R}^2$)
$$=\int h(x+y)\,\nu_{(X,Y)}(d(x,y))$$
(by 272G, writing $\nu_{(X,Y)}$ for the joint distribution of $(X,Y)$; this is where we use the hypothesis that $X$ and $Y$ are independent)
$$=\mathbb{E}(h(X+Y))$$
(applying 271E to the function $(x,y)\mapsto h(x+y)$)
$$=\Pr(X+Y\le a).$$
As $a$ is arbitrary, $\nu_X*\nu_Y$ is the distribution of $X+Y$.
272U Corollary Suppose that $X$ and $Y$ are independent real-valued random variables, and that they have densities $f$ and $g$. Then the convolution $f*g$ is a density function for $X+Y$.
proof By 257F, $f*g$ is a density function for $\nu_X*\nu_Y=\nu_{X+Y}$.
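A discrete analogue may help to fix the idea. In the sketch below, which is only an illustration, probability mass functions on the integers stand in for densities (that is, densities with respect to counting measure rather than Lebesgue measure, an assumption of this example and not the setting of the corollary), and `convolve` is a hypothetical helper computing $(p*q)(s)=\sum_xp(x)q(s-x)$.

```python
from fractions import Fraction

def convolve(p, q):
    """Convolution of two probability mass functions given as {value: probability}."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

die = {v: Fraction(1, 6) for v in range(1, 7)}
two_dice = convolve(die, die)      # law of the sum of two independent fair dice
print(two_dice[2], two_dice[7], sum(two_dice.values()))
```

The result is the familiar triangular law on 2, ..., 12, with mass 1/36 at 2 and 1/6 at 7.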
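272V The following simple result will be very useful when we come to stochastic processes in Volume 4, as well as in the next section.
Etemadi's lemma (Etemadi 96) Let $X_0,\ldots,X_n$ be independent real-valued random variables. For $m\le n$, set $S_m=\sum_{i=0}^mX_i$. Then
$$\Pr\bigl(\sup_{m\le n}|S_m|\ge 3\gamma\bigr)\le 3\max_{m\le n}\Pr(|S_m|\ge\gamma)$$
for every $\gamma>0$.
proof As in 272P, we may suppose that every $X_i$ is a measurable function defined everywhere on a measure space $\Omega$. Set $\alpha=\max_{m\le n}\Pr(|S_m|\ge\gamma)$. For each $r\le n$, set
$$E_r=\{\omega : |S_m(\omega)|<3\gamma\text{ for every }m<r,\ |S_r(\omega)|\ge 3\gamma\}.$$
Then $E_0,\ldots,E_n$ is a partition of $\{\omega : \max_{m\le n}|S_m(\omega)|\ge 3\gamma\}$. Set $E'_r=\{\omega : \omega\in E_r,\ |S_n(\omega)|<\gamma\}$. Then $E'_r\subseteq\{\omega : \omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}$. But $E_r$ depends on $X_0,\ldots,X_r$ so is independent of $\{\omega : |(S_n-S_r)(\omega)|>2\gamma\}$, which can be calculated from $X_{r+1},\ldots,X_n$ (272K). So
$$\mu E'_r\le\mu\{\omega : \omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}=\mu E_r\Pr(|S_n-S_r|>2\gamma)\le\mu E_r\bigl(\Pr(|S_n|>\gamma)+\Pr(|S_r|>\gamma)\bigr)\le 2\alpha\mu E_r,$$
and $\mu(E_r\setminus E'_r)\ge(1-2\alpha)\mu E_r$. On the other hand, $\langle E_r\setminus E'_r\rangle_{r\le n}$ is a disjoint family of sets all included in $\{\omega : |S_n(\omega)|\ge\gamma\}$. So
$$\alpha\ge\mu\{\omega : |S_n(\omega)|\ge\gamma\}\ge\sum_{r=0}^n\mu(E_r\setminus E'_r)\ge(1-2\alpha)\sum_{r=0}^n\mu E_r,$$
and
$$\Pr\bigl(\sup_{r\le n}|S_r|\ge 3\gamma\bigr)=\sum_{r=0}^n\mu E_r\le\min\Bigl(1,\frac{\alpha}{1-2\alpha}\Bigr)\le 3\alpha$$
(considering $\alpha\le\frac13$, $\alpha\ge\frac13$ separately), as required.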
*272W The next result is a similarly direct application of the ideas of this section. While it will not be used in this volume, it is an accessible and useful representative of a very large number of results on tails of sums of independent random variables.
Theorem (Hoeffding 63) Let $X_0,\ldots,X_n$ be independent real-valued random variables such that $0\le X_i\le 1$ a.e. for every $i$. Set $S=\frac{1}{n+1}\sum_{i=0}^nX_i$ and $a=\mathbb{E}(S)$. Then
$$\Pr(S-a\ge c)\le\exp(-2(n+1)c^2)$$
for every $c\ge 0$.
proof (a) Set $a_i=\mathbb{E}(X_i)$ for each $i$. If $b\ge 0$ and $i\le n$, then
$$\mathbb{E}(e^{bX_i})\le\exp\bigl(ba_i+\tfrac18b^2\bigr).$$
$\mathbf{P}$ Set $\phi(x)=e^{bx}$ for $x\in\mathbb{R}$. Then $\phi$ is convex, so
$$\phi(x)\le 1+x(e^b-1)$$
whenever $x\in[0,1]$,
$$\phi(X_i)\le_{\text{a.e.}}1+(e^b-1)X_i$$
and
$$\mathbb{E}(e^{bX_i})=\mathbb{E}(\phi(X_i))\le 1+(e^b-1)a_i=e^{h(b)}$$
where $h(t)=\ln(1-a_i+a_ie^t)$ for $t\in\mathbb{R}$. Now $h(0)=0$,
$$h'(t)=\frac{a_ie^t}{1-a_i+a_ie^t}=1-\frac{1-a_i}{1-a_i+a_ie^t},\qquad h'(0)=a_i,$$
$$h''(t)=\frac{1-a_i}{1-a_i+a_ie^t}\cdot\frac{a_ie^t}{1-a_i+a_ie^t}\le\frac14$$
because $a_ie^t$ and $1-a_i$ are both greater than or equal to $0$. By Taylor's theorem with remainder, there is some $t\in[0,b]$ such that
$$h(b)=h(0)+bh'(0)+\tfrac12b^2h''(t)\le ba_i+\tfrac18b^2,$$
and
$$\mathbb{E}(e^{bX_i})\le\exp\bigl(ba_i+\tfrac18b^2\bigr).\ \mathbf{Q}$$
(b) Take any $b\ge 0$. Then
$$\Pr(S-a\ge c)=\Pr\Bigl(\sum_{i=0}^n(X_i-a_i-c)\ge 0\Bigr)\le\mathbb{E}\Bigl(\exp\Bigl(b\sum_{i=0}^n(X_i-a_i-c)\Bigr)\Bigr)$$
(because $\exp(b\sum_{i=0}^n(X_i-a_i-c))\ge 1$ whenever $\sum_{i=0}^n(X_i-a_i-c)\ge 0$)
$$=e^{-(n+1)bc}\prod_{i=0}^ne^{-ba_i}\,\mathbb{E}\Bigl(\prod_{i=0}^n\exp(bX_i)\Bigr)=e^{-(n+1)bc}\prod_{i=0}^ne^{-ba_i}\prod_{i=0}^n\mathbb{E}(\exp(bX_i))$$
(because the random variables $\exp(bX_i)$ are independent, by 272E, so the expectation of the product is the product of the expectations, by 272R)
$$\le e^{-(n+1)bc}\prod_{i=0}^ne^{-ba_i}\exp\bigl(ba_i+\tfrac18b^2\bigr)$$
((a) above)
$$=\exp\Bigl(-(n+1)bc+\frac{n+1}{8}b^2\Bigr).$$
Now the minimum value of the quadratic $\frac{n+1}{8}b^2-(n+1)cb$ is $-2(n+1)c^2$ when $b=4c$, so $\Pr(S-a\ge c)\le\exp(-2(n+1)c^2)$, as claimed.
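To see how the bound compares with a true tail, one can take the $X_i$ to be independent Bernoulli variables, for which the left-hand side is a binomial tail and can be computed exactly. The following is a sketch under that assumption; the parameters (50 trials, success probability 3/10) and the helper names are hypothetical choices made for illustration, with `trials` playing the role of $n+1$.

```python
from fractions import Fraction
from math import comb, exp

def binomial_upper_tail(trials, p, threshold):
    """Exact Pr(X_0 + ... + X_{trials-1} >= threshold) for independent Bernoulli(p)."""
    total = Fraction(0)
    for k in range(trials + 1):
        if k >= threshold:
            total += comb(trials, k) * p**k * (1 - p) ** (trials - k)
    return total

def hoeffding_bound(trials, c):
    """The bound exp(-2(n+1)c^2), with trials = n + 1."""
    return exp(-2 * trials * float(c) ** 2)

trials, p = 50, Fraction(3, 10)
checks = []
for c in (Fraction(1, 20), Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)):
    # S - a >= c  iff  the sum of the X_i is at least trials * (p + c).
    tail = binomial_upper_tail(trials, p, trials * (p + c))
    checks.append((c, float(tail), hoeffding_bound(trials, c)))

all_ok = all(tail <= bound for _, tail, bound in checks)
for c, tail, bound in checks:
    print(f"c = {float(c):.2f}: exact tail {tail:.3e}, bound {bound:.3e}")
```

Rational arithmetic is used for the thresholds so that the comparison $k\ge(n+1)(p+c)$ is not disturbed by rounding; only the final comparison with the exponential bound is made in floating point.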
272X Basic exercises (a) Let $(\Omega,\Sigma,\mu)$ be an atomless probability space, and $\langle\epsilon_n\rangle_{n\in\mathbb{N}}$ any sequence in $[0,1]$. Show that there is an independent sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ such that $\mu E_n=\epsilon_n$ for every $n$. (Hint: 215D.)
>(b) Let $\langle X_i\rangle_{i\in I}$ be a family of real-valued random variables. Show that it is independent iff
$$\mathbb{E}(h_1(X_{i_1})\times\ldots\times h_n(X_{i_n}))=\prod_{j=1}^n\mathbb{E}(h_j(X_{i_j}))$$
whenever $i_1,\ldots,i_n$ are distinct members of $I$ and $h_1,\ldots,h_n$ are Borel measurable functions from $\mathbb{R}$ to $\mathbb{R}$ such that $\mathbb{E}(h_j(X_{i_j}))$ are all finite.
(c) Write out a proof of 272F which does not use the theory of product measures.
(d) Let $\mathbf{X}=(X_1,\ldots,X_n)$ be a family of real-valued random variables all defined on the same probability space, and suppose that $\mathbf{X}$ has a density function $f$ expressible in the form $f(\xi_1,\ldots,\xi_n)=f_1(\xi_1)f_2(\xi_2)\ldots f_n(\xi_n)$ for suitable functions $f_1,\ldots,f_n$ of one real variable. Show that $X_1,\ldots,X_n$ are independent.
(e) Let $X_1$, $X_2$ be independent real-valued random variables both with distribution $\nu$ and distribution function $F$. Set $Y=\max(X_1,X_2)$. Show that the distribution of $Y$ is absolutely continuous with respect to $\nu$, with a Radon-Nikodým derivative $F+F^-$, where $F^-(x)=\lim_{t\uparrow x}F(t)$ for every $x\in\mathbb{R}$.
(f) Use 254Sa and the idea of 272J to give another proof of 272O.
(g) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $\Sigma_\infty$ be the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}}\Sigma_n$. Let $\mathrm{T}$ be another $\sigma$-subalgebra of $\Sigma$ such that $\Sigma_n$ and $\mathrm{T}$ are independent for each $n$. Show that $\Sigma_\infty$ and $\mathrm{T}$ are independent. (Hint: apply the Monotone Class Theorem to $\{E : \mu(E\cap F)=\mu E\cdot\mu F$ for every $F\in\mathrm{T}\}$.) Use this to prove 272O.
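(h) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables and $Y$ a real-valued random variable such that $Y$ and $X_n$ are independent for each $n\in\mathbb{N}$. Suppose that $\Pr(Y\in\mathbb{N})=1$ and that $\sum_{n=0}^\infty\Pr(Y\ge n)\mathbb{E}(|X_n|)$ is finite. Set $Z=\sum_{n=0}^YX_n$ (that is, $Z(\omega)=\sum_{n=0}^{Y(\omega)}X_n(\omega)$ whenever $\omega\in\operatorname{dom}Y$ is such that $Y(\omega)\in\mathbb{N}$ and $\omega\in\operatorname{dom}X_n$ for every $n\le Y(\omega)$). (i) Show that $\mathbb{E}(Z)=\sum_{n=0}^\infty\Pr(Y\ge n)\mathbb{E}(X_n)$. (Hint: set $X'_n(\omega)=X_n(\omega)$ if $Y(\omega)\ge n$, $0$ otherwise.) (ii) Show that if $\mathbb{E}(X_n)=\gamma$ for every $n\in\mathbb{N}$ then $\mathbb{E}(Z)=\gamma(1+\mathbb{E}(Y))$, since the sum defining $Z$ has $Y+1$ terms and $\sum_{n=0}^\infty\Pr(Y\ge n)=1+\mathbb{E}(Y)$. (This is Wald's equation.)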
(i) Let X
1
, . . . , X
n
be independent real-valued random variables. Show that if X
1
+. . . +X
n
has nite expectation
so does every X
j
. (Hint: part (b) of the proof of 272S.)
>>>(j) Let X and Y be independent real-valued random variables with densities f and g. Show that X Y has a
density function h where h(x) =
_

1
|y|
g(y)f(
x
y
)dy for almost every x. (Hint: 271K.)
(k) Let X
0
, . . . , X
n
be independent real-valued random variables such that d
i
X
i
d

i
a.e. for every i. (i) Show
that if b 0 then E(e
bX
i
) exp(ba
i
+
1
8
b
2
(d

i
d
i
)
2
) for each i, where a
i
= E(X
i
). (ii) Set S =
1
n+1

n
i=0
X
i
and
a = E(S). Show that
Pr(S a c) exp(
2(n+1)
2
c
2
d
)
for every c 0, where d =

n
i=0
(d

i
d
i
)
2
.
(l) Suppose that X
0
, . . . , X
n
are independent real-valued random variables, all with expectation 0, such that
Pr([X
i
[ 1) = 1 for every i. Set S =
1

n+1

n
i=0
X
i
. Show that Pr(S c) exp(c
2
/2) for every c 0.
272Y Further exercises (a) Let X
0
, . . . , X
n
be independent real-valued random variables with distributions

0
, . . . ,
n
and distribution functions F
0
, . . . , F
n
. Show that, for any Borel set E R,
Pr(sup
in
X
i
E) =

n
i=0
_
E

i1
j=0
F

j
(x)

n
j=i+1
F
j
(x)
i
(dx),
where F

j
(x) = lim
tx
F
j
(t) for each j, and we interpret the empty products

1
j=0
F

j
(x),

n
j=n+1
F
j
(x) as 1.
(b) Let XXX = X
n

nN
be an independent sequence of real-valued random variables on a complete probability space
(, , ). Let B be the Borel -algebra of R
N
(271Ya). Let
(B)
XXX
be the probability measure with domain B dened by
setting
(B)
XXX
E = XXX
1
[E] for every E B, and write
XXX
for the completion of
(B)
XXX
. Show that
XXX
is just the product
of the distributions
X
n
.
272 Notes Independence 347
(c) Let X
1
, . . . , X
n
be real-valued random variables such that for each j < n the family
(X
1
, . . . , X
j
, X
j+1
, . . . , X
n
)
has the same joint distribution as the original family (X
1
, . . . , X
n
). Set S
j
= X
1
+ . . . + X
j
for each j n. (i) Show
that for any a 0
Pr(sup
1jn
[S
j
[ a) 2 Pr([S
n
[ a).
(Hint: show that if E
j
= :

in
domX
i
, [S
i
()[ < a for i < j, [S
j
()[ a then : E
j
, [S
n
()[
[S
j
()[
1
2
E
j
.) (ii) Show that E(sup
jn
[S
j
[) 2E([S
n
[). (iii) Show that E(sup
in
S
2
i
) 2E(S
2
n
).
(d) Let X
i

iN
be an independent sequence of real-valued random variables, and set S
n
=

n
i=0
X
i
for each n.
Show that if S
n

nN
converges to S in L
0
for the topology of convergence in measure, then S
n

nN
converges to S
a.e.
(e) Let (, , ) be a probability space.
(i) Let E
n

nN
be an independent sequence in . Show that for any real-valued random variable X with nite
expectation,
lim
n
_
E
n
X d E
n
E(X) = 0.
(Hint: let T
0
be the subalgebra of generated by E
n
: n N and T the -subalgebra of generated by E
n
: n N.
Start by considering X = E for E T
0
and then X = E for E T. Move from L
1
(T) to L
1
() by using
conditional expectations.)
(ii) Let X
n

nN
be a uniformly integrable independent sequence of real-valued random variables on . Show that
for any bounded real-valued random variable Y ,
lim
n
E(X
n
Y ) E(X
n
)E(Y ) = 0.
(iii) Suppose that 1 < p and set q = p/(p 1) (taking q = 1 if p = ). Let X
n

nN
be an independent
sequence of real-valued random variables with sup
nN
|X
n
|
p
< , and Y a real-valued random variable with |Y |
q
< .
Show that
lim
n
E(X
n
Y ) E(X
n
)E(Y ) = 0.
(f ) Let (, , ) be a probability space and Z
n

nN
a sequence of random variables on such that Pr(Z
n
N) = 1
for each n, and Pr(Z
m
= Z
n
) = 0 for all m ,= n. Let X
n

nN
be a sequence of real-valued random variables on , all
with the same distribution , and independent of each other and the Z
n
, in the sense that if
n
is the -algebra dened
by X
n
, and T
n
the -algebra dened by Z
n
, and T is the -algebra generated by

nN
T
n
, then (T,
0
,
1
, . . . ) is
independent. Set Y
n
() = X
Z
n
()
() whenever this is dened, that is, domZ
n
, Z
n
() N and domX
Z
n
()
.
Show that Y
n

nN
is an independent sequence of random variables and that every Y
n
has the distribution .
(g) Show that all the ideas of this section apply equally to complex-valued random variables, subject to suitable
adjustments (to be devised).
(h) Develop a theory of independence for random variables taking values in R
r
, following through as many as
possible of the ideas of this section.
272 Notes and comments This section is lengthy for two reasons: I am trying to pack in the basic results associated
with one of the most fertile concepts of mathematics, and it is hard to know where to stop; and I am trying to do this
in language appropriate to abstract measure theory, insisting on a variety of distinctions which are peripheral to the
essential ideas. For while I am prepared to be flexible on the question of whether the letter X should denote a space or
a function, some of the applications of these results which are most important to me are in contexts where we expect
to be exactly clear what the domains of our functions are. Consequently it is necessary to form an opinion on such
matters as what 'the $\sigma$-algebra defined by a random variable' really is (272C).
The point of 272Q is that the family $\mathcal{E}$ does not have to be related in any way to the family $\langle \Sigma_i \rangle_{i\in I}$, except, of course, that we are dealing with measurable sets. All we need to know is that $I$ should be large compared with $\mathcal{E}$; for instance, that $\mathcal{E}$ is countable and $I$ is uncountable. The family $\langle \Sigma_j \rangle_{j\in J}$ is now a kind of 'tail' of $\langle \Sigma_i \rangle_{i\in I}$, safely independent of the 'head' $\sigma$-algebra generated by $\mathcal{E}$.
Of course I should emphasize again that such proofs as those in 272R-272S are to be thought of as confirmations
that we have a suitable model of probability theory, rather than as reasons for believing the results to be valid in
statistical contexts. Similarly, 272T-272U can be approached by a variety of intuitions concerning discrete random
variables and random variables with continuous densities, and while the elegant general results are delightful, they are
more important to the pure mathematician than to the statistician. But I came to an odd obstacle in the proof of 272S,
when showing that if $X_1 + \ldots + X_n$ has finite variance then so does every $X_j$. We have done enough measure theory for
this to be readily dealt with, but the connexion with ordinary probabilistic intuition, both here and in 272Xi, remains
unclear to me.
There are four ideas in 272W worth storing for future use. The first is the estimate
$$E(e^{bX_i}) \le 1 - a_i + e^{b}a_i$$
in part (a), a crude but effective way of using the hypothesis that $X_i$ is bounded. The second is the use of Taylor's
theorem to show that $1 - a_i + e^{b}a_i \le \exp(ba_i + \frac{1}{8}b^2)$. The third is the estimate
$$\Pr(Y \ge 0) \le E(e^{bY}) \text{ if } b \ge 0$$
used in part (b); and the fourth is 272R. After this one need only be sufficiently determined to reach 272Xk. But even
the special case 272W is both striking and useful.
273 The strong law of large numbers
I come now to the first of the three main theorems of this chapter. Perhaps I should call it a 'principle', rather
than a 'theorem', as I shall not attempt to enunciate any fully general form, but will give three theorems (273D, 273H,
273I), with a variety of corollaries, each setting out conditions under which the averages of a sequence of independent
random variables will almost surely converge. At the end of the section (273N) I add a result on norm-convergence of
averages.
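273A It will be helpful to start with an explicit statement of a very simple but very useful lemma.
Lemma Let $\langle E_n \rangle_{n\in\mathbb{N}}$ be a sequence of measurable sets in a measure space $(\Omega, \Sigma, \mu)$, and suppose that $\sum_{n=0}^{\infty} \mu E_n < \infty$.
Then $\{n : \omega \in E_n\}$ is finite for almost every $\omega \in \Omega$.
proof We have
$$\mu\{\omega : \{n : \omega \in E_n\} \text{ is infinite}\} = \mu\Bigl(\bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} E_m\Bigr) = \inf_{n\in\mathbb{N}} \mu\Bigl(\bigcup_{m\ge n} E_m\Bigr) \le \inf_{n\in\mathbb{N}} \sum_{m=n}^{\infty} \mu E_m = 0.$$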
273B Lemma Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, and set $S_n = \sum_{i=0}^{n} X_i$ for each $n \in \mathbb{N}$.
(a) If $\langle S_n \rangle_{n\in\mathbb{N}}$ is convergent in measure, then it is convergent almost everywhere.
(b) In particular, if $E(X_n) = 0$ for every $n$ and $\sum_{n=0}^{\infty} E(X_n^2) < \infty$, then $\sum_{n=0}^{\infty} X_n$ is defined, and finite, almost everywhere.
proof (a) Let $(\Omega, \Sigma, \mu)$ be the underlying probability space. If we change each $X_n$ on a negligible set, we do not change
the independence of $\langle X_n \rangle_{n\in\mathbb{N}}$ (272H), and the $S_n$ are also changed only on a negligible set; so we may suppose from
the beginning that every $X_n$ is a measurable function defined on the whole of $\Omega$.
Because the functional $X \mapsto E(\min(1, |X|))$ is one of the pseudometrics defining the topology of convergence in
measure (245A), $\lim_{m,n\to\infty} E(\min(1, |S_m - S_n|)) = 0$, and we can find for each $k \in \mathbb{N}$ an $n_k \in \mathbb{N}$ such that $E(\min(1, |S_m - S_{n_k}|)) \le 4^{-k}$ for every $m \ge n_k$. So $\Pr(|S_m - S_{n_k}| \ge 2^{-k}) \le 2^{-k}$ for every $m \ge n_k$. By Etemadi's lemma (272V)
applied to $\langle X_i \rangle_{i > n_k}$,
$$\Pr\bigl(\sup_{n_k \le m \le n} |S_m - S_{n_k}| \ge 3 \cdot 2^{-k}\bigr) \le 3 \cdot 2^{-k}$$
for every $n \ge n_k$. Setting
$$H_{kn} = \{\omega : \sup_{n_k \le m \le n} |S_m(\omega) - S_{n_k}(\omega)| \ge 3 \cdot 2^{-k}\} \text{ for } n \ge n_k, \qquad H_k = \bigcup_{n \ge n_k} H_{kn},$$
we have
$$\mu H_k = \lim_{n\to\infty} \mu H_{kn} \le 3 \cdot 2^{-k}$$
for each $k$, so $\sum_{k=0}^{\infty} \mu H_k$ is finite and almost every $\omega \in \Omega$ belongs to only finitely many of the $H_k$ (273A).
Now take any such $\omega$. Then there is some $r \in \mathbb{N}$ such that $\omega \notin H_k$ for any $k \ge r$. In this case, for every $k \ge r$,
$\omega \notin \bigcup_{n \ge n_k} H_{kn}$, that is, $|S_n(\omega) - S_{n_k}(\omega)| < 3 \cdot 2^{-k}$ for every $n \ge n_k$. But this means that $\langle S_n(\omega) \rangle_{n\in\mathbb{N}}$ is a Cauchy
sequence, therefore convergent. Since this is true for almost every $\omega$, $\langle S_n \rangle_{n\in\mathbb{N}}$ converges almost everywhere, as claimed.
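(b) Now suppose that $E(X_n) = 0$ for every $n$ and that $\sum_{n=0}^{\infty} E(X_n^2) < \infty$. In this case, for any $m < n$,
$$\|S_n - S_m\|_1^2 \le \|\chi\Omega\|_2^2 \, \|S_n - S_m\|_2^2$$
(by Cauchy's inequality, 244Eb)
$$= E((S_n - S_m)^2) = \mathrm{Var}(S_n - S_m)$$
(because $E(S_n - S_m) = \sum_{i=m+1}^{n} E(X_i) = 0$)
$$= \sum_{i=m+1}^{n} \mathrm{Var}(X_i)$$
(by Bienaymé's equality, 272S)
$$\to 0$$
as $m \to \infty$. So $\langle S_n^{\bullet} \rangle_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^1(\mu)$ and converges in $L^1(\mu)$, by 242F; by 245G, it converges in
measure in $L^0(\mu)$, that is, $\langle S_n \rangle_{n\in\mathbb{N}}$ converges in measure in $\mathcal{L}^0(\mu)$. By (a), $\langle S_n \rangle_{n\in\mathbb{N}}$ converges almost everywhere, that
is, $\sum_{i=0}^{\infty} X_i$ is defined and finite almost everywhere.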
Remark The proof above assumes familiarity with the ideas of Chapter 24. However part (b), at least, can be
established without any of these; see 273Xa. In 276B there is a generalization of (b) based on a different approach.
273C We now need a lemma (part (b) below) from the theory of summability. I take the opportunity to include
an elementary fact which will be useful later in this section and elsewhere.
Lemma (a) If $\lim_{n\to\infty} x_n = x$, then $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} x_i = x$.
(b) Let $\langle x_n \rangle_{n\in\mathbb{N}}$ be summable, and $\langle b_n \rangle_{n\in\mathbb{N}}$ a non-decreasing sequence in $[0, \infty[$ diverging to $\infty$. Then $\lim_{n\to\infty} \frac{1}{b_n} \sum_{k=0}^{n} b_k x_k = 0$.
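proof (a) Let $\epsilon > 0$. Let $m$ be such that $|x_n - x| \le \epsilon$ whenever $n \ge m$. Let $m' \ge m$ be such that $\bigl|\sum_{i=0}^{m-1} (x - x_i)\bigr| \le m'\epsilon$.
Then for $n \ge m'$ we have
$$\Bigl|x - \frac{1}{n+1} \sum_{i=0}^{n} x_i\Bigr| = \frac{1}{n+1} \Bigl|\sum_{i=0}^{n} (x - x_i)\Bigr| \le \frac{1}{n+1} \Bigl|\sum_{i=0}^{m-1} (x - x_i)\Bigr| + \frac{1}{n+1} \sum_{i=m}^{n} |x - x_i| \le \frac{m'\epsilon}{n+1} + \frac{(n-m+1)\epsilon}{n+1} \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} x_i = x$.
(b) Let $\epsilon > 0$. Write $s_n = \sum_{i=0}^{n} x_i$ for each $n$, and
$$s = \lim_{n\to\infty} s_n = \sum_{i=0}^{\infty} x_i;$$
set $s^* = \sup_{n\in\mathbb{N}} |s_n| < \infty$. Let $m \in \mathbb{N}$ be such that $|s_n - s| \le \epsilon$ whenever $n \ge m$; then $|s_n - s_j| \le 2\epsilon$ whenever $j$, $n \ge m$. Let $m' \ge m$ be such that $b_m s^* \le \epsilon b_{m'}$.
Take any $n \ge m'$. Then
$$\Bigl|\sum_{k=0}^{n} b_k x_k\Bigr| = |b_0 s_0 + b_1(s_1 - s_0) + \ldots + b_n(s_n - s_{n-1})|$$
$$= |(b_0 - b_1)s_0 + (b_1 - b_2)s_1 + \ldots + (b_{n-1} - b_n)s_{n-1} + b_n s_n|$$
$$= \Bigl|b_0 s_n + \sum_{i=0}^{n-1} (b_{i+1} - b_i)(s_n - s_i)\Bigr|$$
$$\le b_0 |s_n| + \sum_{i=0}^{m-1} (b_{i+1} - b_i)|s_n - s_i| + \sum_{i=m}^{n-1} (b_{i+1} - b_i)|s_n - s_i|$$
$$\le b_0 s^* + 2s^* \sum_{i=0}^{m-1} (b_{i+1} - b_i) + 2\epsilon \sum_{i=m}^{n-1} (b_{i+1} - b_i)$$
$$= b_0 s^* + 2s^*(b_m - b_0) + 2\epsilon(b_n - b_m) \le 2s^* b_m + 2\epsilon b_n.$$
Consequently, because $b_n \ge b_{m'}$,
$$\Bigl|\frac{1}{b_n} \sum_{k=0}^{n} b_k x_k\Bigr| \le 2\frac{s^* b_m}{b_n} + 2\epsilon \le 4\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{k=0}^{n} b_k x_k = 0,$$
as required.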
Remark Part (b) above is sometimes called 'Kronecker's lemma'.
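A small numerical illustration of part (b) may help fix ideas. The sketch below is not part of the original text; the choices $x_k = 1/(k+1)^2$ and $b_k = k+1$ are hypothetical examples picked only because the first is summable and the second is non-decreasing and unbounded. With these choices the weighted average is a harmonic number divided by $n+1$, which should shrink towards 0.

```python
def weighted_average(x, b, n):
    """Return (1/b(n)) * sum_{k=0}^{n} b(k) * x(k)."""
    return sum(b(k) * x(k) for k in range(n + 1)) / b(n)

# Hypothetical illustrative choices (not taken from the text):
# x(k) = 1/(k+1)^2 is summable; b(k) = k+1 is non-decreasing and diverges.
x = lambda k: 1.0 / (k + 1) ** 2
b = lambda k: float(k + 1)

checkpoints = (10, 100, 1000, 10000)
ratios = [weighted_average(x, b, n) for n in checkpoints]
for n, r in zip(checkpoints, ratios):
    print(n, r)  # the printed values should decrease towards 0
```

Of course a finite computation proves nothing about the limit; it only shows the behaviour the lemma predicts.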
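273D The strong law of large numbers: first form Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, and suppose that $\langle b_n \rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $]0, \infty[$, diverging to $\infty$, such
that $\sum_{n=0}^{\infty} \frac{1}{b_n^2} \mathrm{Var}(X_n) < \infty$. Then
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere.
proof As usual, write $(\Omega, \Sigma, \mu)$ for the underlying probability space. Set
$$Y_n = \frac{1}{b_n}(X_n - E(X_n))$$
for each $n$; then $\langle Y_n \rangle_{n\in\mathbb{N}}$ is independent (272E), $E(Y_n) = 0$ for each $n$, and
$$\sum_{n=0}^{\infty} E(Y_n^2) = \sum_{n=0}^{\infty} \frac{1}{b_n^2} \mathrm{Var}(X_n) < \infty.$$
By 273B, $\langle Y_n(\omega) \rangle_{n\in\mathbb{N}}$ is summable for almost every $\omega \in \Omega$. But by 273Cb,
$$\lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^{n} (X_i(\omega) - E(X_i)) = \lim_{n\to\infty} \frac{1}{b_n} \sum_{i=0}^{n} b_i Y_i(\omega) = 0$$
for all such $\omega$. So we have the result.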
273E Corollary Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables such that $E(X_n) = 0$ for every $n$
and $\sup_{n\in\mathbb{N}} E(X_n^2) < \infty$. Then
$$\lim_{n\to\infty} \frac{1}{b_n}(X_0 + \ldots + X_n) = 0$$
almost everywhere whenever $\langle b_n \rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of strictly positive numbers and $\sum_{n=0}^{\infty} \frac{1}{b_n^2}$ is finite.
In particular,
$$\lim_{n\to\infty} \frac{1}{n+1}(X_0 + \ldots + X_n) = 0$$
almost everywhere.
Remark For most of the rest of this section, we shall take $b_n = n + 1$. The special virtue of 273D is that it allows
other $b_n$, e.g., $b_n = \sqrt{n} \ln n$. A direct strengthening of this theorem is in 276C below.
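273F Corollary Let $\langle E_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of measurable sets in a probability space $(\Omega, \Sigma, \mu)$, and
suppose that
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} \mu E_i = c.$$
Then
$$\lim_{n\to\infty} \frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) = c$$
for almost every $\omega \in \Omega$.
proof In 273D, set $X_n = \chi E_n$, $b_n = n + 1$. For almost every $\omega$, we have
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (\chi E_i(\omega) - a_i) = 0,$$
writing $a_i = \mu E_i = E(X_i)$ for each $i$. (I see that I am relying on 272F to support the claim that $\langle X_n \rangle_{n\in\mathbb{N}}$ is independent.)
But for any such $\omega$,
$$\lim_{n\to\infty} \Bigl(\frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) - \frac{1}{n+1} \sum_{i=0}^{n} a_i\Bigr) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (\chi E_i(\omega) - a_i) = 0;$$
because we are supposing that $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} a_i = c$, we must have
$$\lim_{n\to\infty} \frac{1}{n+1} \#(\{i : i \le n,\ \omega \in E_i\}) = c,$$
as required.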
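273G Corollary Let $\mu$ be the usual measure on $\mathcal{P}\mathbb{N}$, as described in 254Jb. Then for $\mu$-almost every set $a \subseteq \mathbb{N}$,
$$\lim_{n\to\infty} \frac{1}{n+1} \#(a \cap \{0, \ldots, n\}) = \frac{1}{2}.$$
proof The sets $E_n = \{a : n \in a\}$ are independent, with measure $\frac{1}{2}$.
Remark The limit $\lim_{n\to\infty} \frac{1}{n+1} \#(a \cap \{0, \ldots, n\})$ is called the asymptotic density of $a$.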
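As an informal check on 273G, one can simulate a 'random subset of $\mathbb{N}$' and watch its running density. The sketch below is an illustration only, not part of the original text: it assumes that Python's pseudo-random generator is an adequate stand-in for independent fair coin tosses, and the function name `density_path`, the seed and the sample size are hypothetical choices.

```python
import random

def density_path(n_max, rng):
    """Running fractions #(a ∩ {0,...,n}) / (n+1) for n = 0..n_max, where each
    n is put into the set a independently with probability 1/2."""
    count = 0
    path = []
    for n in range(n_max + 1):
        if rng.random() < 0.5:  # n belongs to a
            count += 1
        path.append(count / (n + 1))
    return path

rng = random.Random(0)  # fixed seed so that the run is reproducible
path = density_path(100_000, rng)
final_density = path[-1]
print(final_density)  # expected to be close to 0.5
```

A single simulated path is of course no substitute for the 'almost every' statement, which concerns the whole of $\mathcal{P}\mathbb{N}$.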
273H Strong law of large numbers: second form Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued
random variables, and suppose that $\sup_{n\in\mathbb{N}} E(|X_n|^{1+\delta}) < \infty$ for some $\delta > 0$. Then
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere.
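proof As usual, call the underlying probability space $(\Omega, \Sigma, \mu)$; as in 273B we can adjust the $X_n$ on negligible sets
so as to make them measurable and defined everywhere on $\Omega$, without changing $E(X_n)$, $E(|X_n|)$ or the convergence of
the averages except on a negligible set.
(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega) = X_n(\omega) \text{ if } |X_n(\omega)| \le n, \qquad Y_n(\omega) = 0 \text{ if } |X_n(\omega)| > n.$$
Then $\langle Y_n \rangle_{n\in\mathbb{N}}$ is independent (272E). For each $n \in \mathbb{N}$,
$$\mathrm{Var}(Y_n) \le E(Y_n^2) \le E(n^{1-\delta} |X_n|^{1+\delta}) \le n^{1-\delta} K,$$
where $K = \sup_{n\in\mathbb{N}} E(|X_n|^{1+\delta})$, so
$$\sum_{n=0}^{\infty} \frac{1}{(n+1)^2} \mathrm{Var}(Y_n) \le \sum_{n=0}^{\infty} \frac{n^{1-\delta}}{(n+1)^2} K < \infty.$$
By 273D,
$$G = \{\omega : \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (Y_i(\omega) - E(Y_i)) = 0\}$$
is conegligible.
(b) On the other hand, setting
$$E_n = \{\omega : Y_n(\omega) \ne X_n(\omega)\} = \{\omega : |X_n(\omega)| > n\},$$
we have $K \ge n^{1+\delta} \mu E_n$ for each $n$, so
$$\sum_{n=0}^{\infty} \mu E_n \le 1 + K \sum_{n=1}^{\infty} \frac{1}{n^{1+\delta}} < \infty,$$
and the set $H = \{\omega : \{n : \omega \in E_n\} \text{ is finite}\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - Y_i(\omega)) = 0$$
for every $\omega \in H$.
(c) Finally,
$$|E(Y_n) - E(X_n)| \le \int_{E_n} |X_n| \le \int_{E_n} n^{-\delta} |X_n|^{1+\delta} \le n^{-\delta} K$$
whenever $n \ge 1$, so $\lim_{n\to\infty} (E(Y_n) - E(X_n)) = 0$ and
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (E(Y_i) - E(X_i)) = 0$$
(273Ca). Putting these three together, we get
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - E(X_i)) = 0$$
whenever $\omega$ belongs to the conegligible set $G \cap H$. So
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere, as required.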
273I Strong law of large numbers: third form Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued
random variables of finite expectation, and suppose that they are identically distributed, that is, all have the same
distribution. Then
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere.
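proof The proof follows the same line as that of 273H, but some of the inequalities require more delicate arguments.
As usual, call the underlying probability space $(\Omega, \Sigma, \mu)$ and suppose that the $X_n$ are all measurable and defined
everywhere on $\Omega$. (We need to remember that changing a random variable on a negligible set does not change its
distribution.) Let $\nu$ be the common distribution of the $X_n$.
(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega) = X_n(\omega) \text{ if } |X_n(\omega)| \le n, \qquad Y_n(\omega) = 0 \text{ if } |X_n(\omega)| > n.$$
Then $\langle Y_n \rangle_{n\in\mathbb{N}}$ is independent (272E). For each $n \in \mathbb{N}$,
$$\mathrm{Var}(Y_n) \le E(Y_n^2) = \int_{[-n,n]} x^2 \, \nu(dx)$$
(271E). To estimate $\sum_{n=0}^{\infty} \frac{1}{(n+1)^2} E(Y_n^2)$, set
$$f_n(x) = \frac{x^2}{(n+1)^2} \text{ if } |x| \le n, \qquad f_n(x) = 0 \text{ if } |x| > n,$$
so that $\frac{1}{(n+1)^2} \mathrm{Var}(Y_n) \le \int f_n \, d\nu$. If $r \ge 1$ and $r < |x| \le r + 1$ then
$$\sum_{n=0}^{\infty} f_n(x) \le \sum_{n=r+1}^{\infty} \frac{1}{(n+1)^2} (r+1)|x| \le (r+1)|x| \sum_{n=r+1}^{\infty} \Bigl(\frac{1}{n} - \frac{1}{n+1}\Bigr) \le |x|,$$
while if $|x| \le 1$ then
$$\sum_{n=0}^{\infty} f_n(x) \le \sum_{n=0}^{\infty} \frac{1}{(n+1)^2} = \frac{\pi^2}{6} \le 2 < \infty.$$
(You do not need to know that the sum is $\frac{\pi^2}{6}$, only that it is finite; but see 282Xo.) Consequently
$$f(x) = \sum_{n=0}^{\infty} f_n(x) \le 2 + |x|$$
for every $x$, and $\int f \, d\nu < \infty$, because $\int |x| \, \nu(dx)$ is the common value of $E(|X_n|)$, and is finite. By any of the great
convergence theorems,
$$\sum_{n=0}^{\infty} \frac{1}{(n+1)^2} \mathrm{Var}(Y_n) \le \sum_{n=0}^{\infty} \int f_n \, d\nu = \int f \, d\nu < \infty.$$
By 273D,
$$G = \{\omega : \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (Y_i(\omega) - E(Y_i)) = 0\}$$
is conegligible.
(b) Next, setting
$$E_n = \{\omega : X_n(\omega) \ne Y_n(\omega)\} = \{\omega : |X_n(\omega)| > n\},$$
we have
$$E_n = \bigcup_{i \ge n} F_{ni},$$
where
$$F_{ni} = \{\omega : i < |X_n(\omega)| \le i + 1\}.$$
Now
$$\mu F_{ni} = \nu\{x : i < |x| \le i + 1\}$$
for every $n$ and $i$. So
$$\sum_{n=0}^{\infty} \mu E_n = \sum_{n=0}^{\infty} \sum_{i=n}^{\infty} \mu F_{ni} = \sum_{i=0}^{\infty} \sum_{n=0}^{i} \mu F_{ni} = \sum_{i=0}^{\infty} (i+1) \nu\{x : i < |x| \le i + 1\} \le \int (1 + |x|) \, \nu(dx) < \infty.$$
Consequently the set $H = \{\omega : \{n : X_n(\omega) \ne Y_n(\omega)\} \text{ is finite}\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - Y_i(\omega)) = 0$$
for every $\omega \in H$.
(c) Finally,
$$|E(Y_n) - E(X_n)| \le \int_{E_n} |X_n| = \int_{\mathbb{R} \setminus [-n,n]} |x| \, \nu(dx)$$
whenever $n \in \mathbb{N}$, so $\lim_{n\to\infty} (E(Y_n) - E(X_n)) = 0$ and
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (E(Y_i) - E(X_i)) = 0$$
(273Ca). Putting these three together, we get
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - E(X_i)) = 0$$
whenever $\omega$ belongs to the conegligible set $G \cap H$. So
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere, as required.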
Remarks In my own experience, this is the most important form of the strong law from the point of view of pure
measure theory. I note that 273G above can also be regarded as a consequence of this form.
For a very striking alternative proof, see 275Yn. Yet another proof treats this result as a special case of the Ergodic
Theorem (see 372Xg in Volume 3).
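273J Corollary Let $(\Omega, \Sigma, \mu)$ be a probability space. If $f$ is a real-valued function such that $\int f \, d\mu$ is defined in
$[-\infty, \infty]$, then
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) = \int f \, d\mu$$
for $\lambda$-almost every $\boldsymbol{\omega} = \langle \omega_n \rangle_{n\in\mathbb{N}} \in \Omega^{\mathbb{N}}$, where $\lambda$ is the product measure on $\Omega^{\mathbb{N}}$ (254A-254C).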
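proof (a) To begin with, suppose that $f$ is integrable. Define functions $X_n$ on $\Omega^{\mathbb{N}}$ by setting
$$X_n(\boldsymbol{\omega}) = f(\omega_n) \text{ whenever } \omega_n \in \operatorname{dom} f.$$
Then $\langle X_n \rangle_{n\in\mathbb{N}}$ is an independent sequence of random variables, all with the same distribution as $f$ (272M). So
$$\lim_{n\to\infty} \Bigl(\frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) - \int f \, d\mu\Bigr) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\boldsymbol{\omega}) - E(X_i)) = 0$$
for almost every $\boldsymbol{\omega}$, by 273I, and
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) = \int f \, d\mu$$
for almost every $\boldsymbol{\omega}$.
(b) Next, suppose that $f \ge 0$ and $\int f = \infty$. In this case, for every $m \in \mathbb{N}$,
$$\liminf_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) \ge \liminf_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} \min(m, f(\omega_i)) = \int \min(m, f(\omega)) \, \mu(d\omega)$$
for almost every $\boldsymbol{\omega}$, so
$$\liminf_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) \ge \sup_{m\in\mathbb{N}} \int \min(m, f(\omega)) \, \mu(d\omega) = \infty$$
and
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) = \infty = \int f$$
for almost every $\boldsymbol{\omega}$.
(c) In general, if $\int f = \infty$, this is because $\int f^{+} = \infty$ and $f^{-}$ is integrable, so
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) = \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f^{+}(\omega_i) - \lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f^{-}(\omega_i) = \infty - \int f^{-} = \infty = \int f$$
for almost every $\boldsymbol{\omega}$. Similarly,
$$\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} f(\omega_i) = -\infty$$
for almost every $\boldsymbol{\omega}$ if $\int f \, d\mu = -\infty$.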
Remark I find myself slipping here into measure-theorists' terminology; this corollary is one of the basic applications
of the strong law to measure theory. Obviously, in view of 272J and 272M, this corollary covers 273I. It could also
(in theory) be used as a definition of integration on a probability space (see 273Ya); it is sometimes called the 'Monte
Carlo' method of integration.
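To make the 'Monte Carlo' reading of 273J concrete, here is a minimal sketch, not taken from the original text. It assumes that the probability space is $[0, 1]$ with Lebesgue measure, that Python's pseudo-random numbers may stand in for an independent sequence of points of it, and that $f(t) = t^2$, whose integral is $\frac{1}{3}$; the names `running_average`, `draw` and the sample size are hypothetical.

```python
import random

def running_average(f, draw, n, rng):
    """Return (1/(n+1)) * sum_{i=0}^{n} f(omega_i), where the points omega_i
    are produced by independent calls to draw(rng)."""
    total = 0.0
    for _ in range(n + 1):
        total += f(draw(rng))
    return total / (n + 1)

rng = random.Random(1)        # fixed seed for reproducibility
f = lambda t: t * t           # integrand; its integral over [0, 1] is 1/3
draw = lambda r: r.random()   # one point of [0, 1], uniformly distributed
estimate = running_average(f, draw, 200_000, rng)
print(estimate)  # expected to be near 1/3
```

The corollary guarantees convergence for almost every sequence of sample points but says nothing about the rate; the error of such an estimate typically shrinks only like the inverse square root of the number of samples.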
273K It is tempting to seek extensions of 273I in which the $X_n$ are not identically distributed, but are otherwise
well-behaved. Any such idea should be tested against the following example. I find that I need another standard result,
complementing that in 273A.
Borel-Cantelli lemma Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle E_n \rangle_{n\in\mathbb{N}}$ a sequence of measurable subsets of $\Omega$ such
that $\sum_{n=0}^{\infty} \mu E_n = \infty$ and $\mu(E_m \cap E_n) \le \mu E_m \cdot \mu E_n$ whenever $m \ne n$. Then almost every point of $\Omega$ belongs to infinitely
many of the $E_n$.
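proof For $n$, $k \in \mathbb{N}$ set $X_n = \sum_{i=0}^{n} \chi E_i$, $\beta_n = \sum_{i=0}^{n} \mu E_i = E(X_n)$ and $F_{nk} = \{x : x \in \Omega,\ \#(\{i : i \le n,\ x \in E_i\}) \le k\}$.
Then
$$E(X_n^2) = \sum_{i=0}^{n} \sum_{j=0}^{n} \mu(E_i \cap E_j) \le \sum_{i=0}^{n} \mu E_i + \sum_{i=0}^{n} \sum_{j \ne i} \mu E_i \cdot \mu E_j = \beta_n + \beta_n^2 - \sum_{i=0}^{n} (\mu E_i)^2,$$
so
$$\mathrm{Var}(X_n) = E(X_n^2) - \beta_n^2 \le \beta_n - \sum_{i=0}^{n} (\mu E_i)^2 \le \beta_n.$$
Now if $k < \beta_n$,
$$(\beta_n - k)^2 \mu F_{nk} = (\beta_n - k)^2 \Pr(X_n \le k) \le E((X_n - \beta_n)^2) = \mathrm{Var}(X_n) \le \beta_n$$
and $\mu F_{nk} \le \frac{\beta_n}{(\beta_n - k)^2}$.
Now recall that we are assuming that $\lim_{n\to\infty} \beta_n = \infty$. So for any $k \in \mathbb{N}$,
$$\mu\Bigl(\bigcap_{n\in\mathbb{N}} F_{nk}\Bigr) = \lim_{n\to\infty} \mu F_{nk} \le \lim_{n\to\infty} \frac{\beta_n}{(\beta_n - k)^2} = 0.$$
Accordingly
$$\mu\{x : x \text{ belongs to only finitely many } E_n\} = \mu\Bigl(\bigcup_{k\in\mathbb{N}} \bigcap_{n\in\mathbb{N}} F_{nk}\Bigr) = 0,$$
and almost every point of $\Omega$ belongs to infinitely many $E_n$.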
Remark Of course this result is usually applied to an independent sequence $\langle E_n \rangle_{n\in\mathbb{N}}$. But very occasionally it is of
interest to know that weaker hypotheses suffice. See also 273Yb.
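273L Now for the promised example.
Example There is an independent sequence $\langle X_n \rangle_{n\in\mathbb{N}}$ of non-negative random variables such that $\lim_{n\to\infty} E(X_n) = 0$
but
$$\limsup_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = \infty, \qquad \liminf_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$$
almost everywhere.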
proof Let $(\Omega, \Sigma, \mu)$ be a probability space with an independent sequence $\langle E_n \rangle_{n\in\mathbb{N}}$ of measurable sets such that
$\mu E_n = \frac{1}{(n+3)\ln(n+3)}$ for each $n$. (I have nowhere explained exactly how to build such a sequence. Two obvious
methods are available to us, and another a trifle less obvious. (i) Take $\Omega = \{0, 1\}^{\mathbb{N}}$ and $\mu$ to be the product of the
probabilities $\mu_n$ on $\{0, 1\}$, defined by saying that $\mu_n\{1\} = \frac{1}{(n+3)\ln(n+3)}$ for each $n$; set $E_n = \{\omega : \omega(n) = 1\}$, and
appeal to 272M to check that the $E_n$ are independent. (ii) Build the $E_n$ inductively as subsets of $[0, 1]$, arranging
that each $E_n$ should be a finite union of intervals, so that when you come to choose $E_{n+1}$ the sets $E_0, \ldots, E_n$ define a
partition $\mathcal{I}_n$ of $[0, 1]$ into intervals, and you can take $E_{n+1}$ to be the union of (say) the left-hand subintervals of length
a proportion $\frac{1}{(n+4)\ln(n+4)}$ of the intervals in $\mathcal{I}_n$. (iii) Use 215D to see that the method of (ii) can be used on any
atomless probability space, as in 272Xa.)
Set $X_n = (n + 3) \ln\ln(n + 3) \, \chi E_n$ for each $n$; then $\langle X_n \rangle_{n\in\mathbb{N}}$ is an independent sequence of real-valued random
variables (272F) and $E(X_n) = \frac{\ln\ln(n+3)}{\ln(n+3)}$ for each $n$, so that $E(X_n) \to 0$ as $n \to \infty$. Thus, for instance, $\{X_n : n \in \mathbb{N}\}$
is uniformly integrable and $\langle X_n \rangle_{n\in\mathbb{N}} \to 0$ in measure (246Jc); while surely $\lim_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} E(X_i) = 0$.
On the other hand,
$$\sum_{n=0}^{\infty} \mu E_n = \sum_{n=0}^{\infty} \frac{1}{(n+3)\ln(n+3)} \ge \int_0^{\infty} \frac{1}{(x+3)\ln(x+3)} \, dx = \lim_{a\to\infty} (\ln\ln(a + 3) - \ln\ln 3) = \infty,$$
so almost every $\omega$ belongs to infinitely many of the $E_n$, by the Borel-Cantelli lemma (273K). Now if we write $Y_n = \frac{1}{n+1} \sum_{i=0}^{n} X_i$, then if $\omega \in E_n$ we have $X_n(\omega) = (n + 3) \ln\ln(n + 3)$ so
$$Y_n(\omega) \ge \frac{n+3}{n+1} \ln\ln(n + 3).$$
This means that
$$\{\omega : \limsup_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - E(X_i)) = \infty\} = \{\omega : \limsup_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} X_i(\omega) = \infty\}$$
$$= \{\omega : \sup_{n\in\mathbb{N}} Y_n(\omega) = \infty\} \supseteq \{\omega : \{n : \omega \in E_n\} \text{ is infinite}\}$$
is conegligible, and the strong law of large numbers does not apply to $\langle X_n \rangle_{n\in\mathbb{N}}$.
Because
$$\lim_{n\to\infty} \|Y_n\|_1 = \lim_{n\to\infty} E(Y_n) = \lim_{n\to\infty} E(X_n) = 0$$
(273Ca), $\langle Y_n \rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure, and $\langle Y_n \rangle_{n\in\mathbb{N}}$ has a subsequence converging to 0
almost everywhere (245K). So
$$\liminf_{n\to\infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i(\omega) - E(X_i)) = \liminf_{n\to\infty} Y_n(\omega) = 0$$
for almost every $\omega$. The fact that both $\limsup_{n\to\infty} Y_n$ and $\liminf_{n\to\infty} Y_n$ are constant almost everywhere is of course
a consequence of the zero-one law (272P).
*273M All the above has been concerned with pointwise convergence of the averages of independent random
variables, and that is the important part of the work of this section. But it is perhaps worth complementing it with a
brief investigation of norm-convergence. To deal efficiently with convergence in $L^p$, we need the following. (I should
perhaps remark that, compared with the general case treated here, the case $p = 2$ is trivial; see 273Xl.)
Lemma For any $p \in \,]1, \infty[$ and $\epsilon > 0$, there is a $\delta > 0$ such that $\|S + X\|_p \le 1 + \epsilon\|X\|_p$ whenever $S$ and $X$ are
independent random variables, $\|S\|_p = 1$, $\|X\|_p \le \delta$ and $E(X) = 0$.
proof (a) Take ]0, 1] such that p 2 and
(1 +)
p
1 +p +
p
2
2

2
whenever [[ ; such exists because
lim
0
(1+)
p
1p

2
=
p(p1)
2
<
p
2
2
.
Observe that
(1 +)
p
(1 +
1

)
p
+
p
+ 2p
p1
for every 0. PPP If
1

, this is trivial. If
1

, then
(1 +)
p
=
p
(1 +
1

)
p

p
(1 +
p

+
p
2
2
2
)

p
(1 +
p

+
p
2

2
) =
p
+p
p1
(1 +
p
2
)
p
+ 2p
p1
. QQQ
Dene > 0 by declaring that 3
p1
=

2
(this is one of the places where we need to know that p > 1). Let > 0
be such that
,
p
2
2
2
+ (1 +
1

)
p

p1

p
2
.
(b) Now suppose that S and X are independent random variables with |S|
p
= 1, |X|
p
and E(X) = 0. If
|X|
p
= 0 then of course |S + X|
p
1 + |X|
p
, so suppose that X is non-trivial. Write (, , ) for the underlying
probability space and adjust S and X on negligible sets so that they are measurable and dened everywhere on . Set
= |X|
p
, = /,
E = : S() ,= 0, F = : [X()[ > [S()[, = |S F|
p
.
Then
_
[S +X[
p
=
_
F
[S +X[
p
+
_
E\F
[S +X[
p
(because S and X are both zero on (E F))
= |(S F) + (X F)|
p
p
+
_
E\F
[S[
p
[1 +
X
S
[
p
(|S F|
p
+|X F)|
p
)
p
+
_
E\F
[S[
p
(1 +p
X
S
+
p
2
2

2
)
(because [
X
S
[

1 everywhere on E F)
( +)
p
+ (1 +
p
2
2

2
)
_
E\F
[S[
p
+p
_
E\F
[S[
p1
sgn S X
(writing sgn() = /[[ if ,= 0, 0 if = 0)
= ( +)
p
+ (1 +
p
2
2

2
)
_
\F
[S[
p
+p
_
\F
[S[
p1
sgn S X
(because S = 0 on E)
=
p
(1 +

)
p
+ (1 +
p
2
2

2
)(1
p
) p
_
F
[S[
p1
sgn S X
(because X and [S[
p1
sgn S are independent, by 272L, so
_
[S[
p1
sgn S X = E([S[
p1
sgn S)E(X) = 0)

p
_
(1 +
1

)
p
+ 2p(

)
p1
+ (

)
p
_
+ (1 +
p
2
2

2
)(1
p
)
+p
_
F
[S[
p1
[X[
(see (a) above)

p
(1 +
1

)
p
+
p
+ 2p
p1
+ (1 +
p
2
2

2
)(1
p
)
+p
_
F
1

p1
[X[
p

p
(1 +
1

)
p
+ 2p

p

p1
+ 1 +
p
2
2

2
+p

p

p1
(because = |S F|
p

1

|X F|
p

)
=
p
(1 +
1

)
p
+ 3p
p1
+ 1 +
p
2

2
2
2
= 1 +
_

p1
(1 +
1

)
p
+ 3p
p1
+
p
2

2
2
_

1 +
_

p1
(1 +
1

)
p
+ 3p
p1
+
p
2

2
2
_

1 +p (1 +|X|
p
)
p
.
So |S +X|
p
1 +|X|
p
, as required.
*Remark What is really happening here is that $\phi = \|\ \|_p^p : L^p \to \mathbb{R}$ is differentiable (as a real-valued function on the
normed space $L^p$) and
$$\phi'(S^{\bullet})(X^{\bullet}) = p \int |S|^{p-1} \operatorname{sgn} S \times X,$$
so that in the context here
$$\phi((S + X)^{\bullet}) = \phi(S^{\bullet}) + \phi'(S^{\bullet})(X^{\bullet}) + o(\|X\|_p) = 1 + o(\|X\|_p)$$
and $\|S + X\|_p = 1 + o(\|X\|_p)$. The calculations above are elaborate partly because they do not appeal to any non-trivial
ideas about normed spaces, and partly because we need the estimates to be uniform in $S$.
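273N Theorem Let $\langle X_n \rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation,
and set $Y_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$ for each $n \in \mathbb{N}$.
(a) If $\langle X_n \rangle_{n\in\mathbb{N}}$ is uniformly integrable, then $\lim_{n\to\infty} \|Y_n\|_1 = 0$.
*(b) If $p \in \,]1, \infty[$ and $\sup_{n\in\mathbb{N}} \|X_n\|_p < \infty$, then $\lim_{n\to\infty} \|Y_n\|_p = 0$.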
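proof (a) Let $\epsilon > 0$. Then there is an $M \ge 0$ such that $E((|X_n| - M)^{+}) \le \epsilon$ for every $n \in \mathbb{N}$. Set
$$X'_n = (-M\chi\Omega) \vee (X_n \wedge M\chi\Omega), \quad \alpha_n = E(X'_n), \quad \tilde{X}_n = X'_n - \alpha_n, \quad X''_n = X_n - X'_n$$
for each $n \in \mathbb{N}$. Then $\langle X'_n \rangle_{n\in\mathbb{N}}$ and $\langle \tilde{X}_n \rangle_{n\in\mathbb{N}}$ are independent and uniformly bounded, and $\|X''_n\|_1 \le \epsilon$ for every $n$. So
if we write
$$\tilde{Y}_n = \frac{1}{n+1} \sum_{i=0}^{n} \tilde{X}_i, \qquad Y''_n = \frac{1}{n+1} \sum_{i=0}^{n} X''_i,$$
$\langle \tilde{Y}_n \rangle_{n\in\mathbb{N}} \to 0$ almost everywhere, by 273E (for instance), while $\|Y''_n\|_1 \le \epsilon$ for every $n$. Moreover,
$$|\alpha_n| = |E(X'_n - X_n)| \le E(|X''_n|) \le \epsilon$$
for every $n$. As $|\tilde{Y}_n| \le 2M$ almost everywhere for each $n$, $\lim_{n\to\infty} \|\tilde{Y}_n\|_1 = 0$, by Lebesgue's Dominated Convergence
Theorem. So
$$\limsup_{n\to\infty} \|Y_n\|_1 = \limsup_{n\to\infty} \Bigl\|\tilde{Y}_n + Y''_n + \frac{1}{n+1} \sum_{i=0}^{n} \alpha_i \chi\Omega\Bigr\|_1 \le \lim_{n\to\infty} \|\tilde{Y}_n\|_1 + \sup_{n\in\mathbb{N}} \|Y''_n\|_1 + \sup_{n\in\mathbb{N}} |\alpha_n| \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \|Y_n\|_1 = 0$, as claimed.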
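*(b) Set $M = \sup_{n\in\mathbb{N}} \|X_n\|_p$. For $n \in \mathbb{N}$, set $S_n = \sum_{i=0}^{n} X_i$. Let $\epsilon > 0$. Then there is a $\delta > 0$ such that
$\|S + X\|_p \le 1 + \epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p = 1$, $\|X\|_p \le \delta$ and $E(X) = 0$
(273M). It follows that $\|S + X\|_p \le \|S\|_p + \epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p$ is finite,
$\|X\|_p \le \delta\|S\|_p$ and $E(X) = 0$. In particular, $\|S_{n+1}\|_p \le \|S_n\|_p + \epsilon M$ whenever $\|S_n\|_p \ge M/\delta$. An easy induction shows
that
$$\|S_n\|_p \le \frac{M}{\delta} + M + n\epsilon M$$
for every $n \in \mathbb{N}$. But this means that
$$\limsup_{n\to\infty} \|Y_n\|_p = \limsup_{n\to\infty} \frac{1}{n+1} \|S_n\|_p \le \epsilon M.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty} \|Y_n\|_p = 0$.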
Remark There are strengthenings of (a) in 276Xd, and of (b) in 276Ya.
273X Basic exercises (a) In part (b) of the proof of 273B, use Bienaymes equality to show that lim
m
sup
nm
Pr([S
n

S
m
[ ) = 0 for every > 0, so that we can apply the argument of part (a) of the proof directly, without appealing
to 242F or 245G or even 244E.
(b) Show that

n=0
(1)
(n)
n+1
is dened in R for almost every = (n)
nN
in 0, 1
N
, where 0, 1
N
is given its
usual measure (254J).
(c) Let E
n

nN
be an independent sequence of measurable sets in a probability space, all with the same non-zero
measure. Let a
n

nN
be a sequence of non-negative real numbers such that

n=0
a
n
= . Show that

n=0
a
n
E
n
=
a.e. (Hint: Take a strictly increasing sequence k
n

nN
such that d
n
=

k
n+1
i=k
n
+1
a
i
1 for each n. Set c
i
=
a
i
(n+1)d
n
for k
n
< i k
n+1
; show that

n=0
c
2
n
< =

n=0
c
n
. Apply 273D with X
n
= c
n
E
n
and b
n
=
_

n
i=0
c
i
.)
>>>(d) Take any q [0, 1], and give TN a measure such that
a : I a = q
#(I)
for every I N, as in 254Xg. Show that for -almost every a N,
lim
n
1
n+1
#(a 0, . . . , n) = q.
>>>(e) Let be the usual probability measure on TN (254Jb), and for r 1 let
r
be the product probability measure
on (TN)
r
. Show that
lim
n
1
n+1
#(a
1
. . . a
r
0, . . . , n) = 2
r
,
lim
n
1
n+1
#((a
1
. . . a
r
) 0, . . . , n) = 1 2
r
for
r
-almost every (a
1
, . . . , a
r
) (TN)
r
.
(f ) Let be the usual probability measure on TN, and b any innite subset of N. Show that lim
n
#(ab{0,... ,n})
#(b{0,... ,n})
=
1
2
for almost every a N.
>>>(g) For each x [0, 1], let
k
(x) be the kth digit in the decimal expansion of x (choose for yourself what to do
with 0100 . . . = 0099 . . . ). Show that lim
k
1
k
#(j : j k,
j
(x) = 7) =
1
10
for almost every x [0, 1].
(h) Let F
n

nN
be a sequence of distribution functions for real-valued random variables, in the sense of 271Ga,
and F another distribution function; suppose that lim
n
F
n
(q) = F(q) for every q Q and lim
n
F
n
(a

) = F(a

)
whenever F(a

) < F(a), where I write F(a

) for lim
xa
F(x). Show that F
n
F uniformly.
>>>(i) Let (, , ) be a probability space and X
n

nN
an independent identically distributed sequence of real-valued
random variables on with common distribution function F. For a R, n N and

in
domX
i
set
F
n
(, a) =
1
n+1
#(i : i n, X
i
() a).
Show that
lim
n
sup
aR
[F
n
(, a) F(a)[ = 0
for almost every .
(j) Let (, , ) be a probability space, and the product measure on
N
. Let f : R be a function, and set
f

() = limsup
n
1
n+1

n
i=0
f(
i
) for =
n

nN

N
. Show that
_
f

d =
_
fd whenever the right-hand-side
is nite. (Hint: 133J(a-i).)
(k) Find an independent sequence X
n

nN
of random variables with zero expectation such that |X
n
|
1
= 1 and
|
1
n+1

n
i=0
X
i
|
1

1
2
for every n N. (Hint: take Pr(X
n
,= 0) very small.)
(l) Use 272S to prove 273Nb in the case p = 2.
(m) Find an independent sequence X
n

nN
of random variables with zero expectation such that |X
n
|

=
|
1
n+1

n
i=0
X
i
|

= 1 for every n N.
(n) Repeat the work of this section for complex-valued random variables.
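273Y Further exercises (a) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\lambda$ the product measure on $\Omega^{\mathbb N}$. Suppose that $f$ is a real-valued function, defined on a subset of $\Omega$, such that
\[ h(\omega)=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i) \]
exists in $\mathbb R$ for $\lambda$-almost every $\omega=\langle\omega_n\rangle_{n\in\mathbb N}$ in $\Omega^{\mathbb N}$. Show (i) that $f$ has conegligible domain (ii) $f$ is $\hat\Sigma$-measurable, where $\hat\Sigma$ is the domain of the completion of $\mu$ (iii) there is an $a\in\mathbb R$ such that $h=a$ almost everywhere in $\Omega^{\mathbb N}$ (iv) $f$ is integrable, with $\int f\,d\mu=a$.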
(b) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a sequence of random variables with finite variance. Suppose that $\lim_{n\to\infty}E(X_n)=\infty$ and $\liminf_{n\to\infty}\frac{E(X_n^2)}{(E(X_n))^2}\le1$. Show that $\limsup_{n\to\infty}X_n=\infty$ a.e.
273 Notes and comments I have tried in this section to offer the most useful of the standard criteria for pointwise convergence of averages of independent random variables. In my view the strong law of large numbers, like Fubini's theorem, is one of the crucial steps in measure theory, where the subject changes character. Theorems depending on the strong law have a kind of depth and subtlety to them which is missing in other parts of the subject. I have described only a handful of applications here, but I hope that 273G, 273J, 273Xd, 273Xg and 273Xi will give an idea of what is to be expected. These do have rather different weights. Of the four, only 273J requires the full resources of this chapter; the others can be deduced from the essentially simpler version in 273Xi.
273Xi is the 'fundamental theorem of statistics' or 'Glivenko-Cantelli theorem'. The $F_n(\cdot,a)$ are 'statistics', computed from the $X_i$; they are the 'empirical distributions', and the theorem says that, almost surely, $F_n\to F$ uniformly. (I say 'uniformly' to make the result look more striking, but of course the real content is that $F_n(\cdot,a)\to F(a)$ almost surely for each $a$; the extra step is just 273Xh.)
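The uniform convergence of empirical distributions can be watched numerically. The following is only a sketch under stated assumptions: the common distribution is taken to be uniform on $[0,1]$ (so that $F(a)=a$), the sample sizes and the seed are arbitrary choices, and `sup_distance_uniform` is a hypothetical helper name, not anything from the text.

```python
import random

def sup_distance_uniform(sample):
    """sup_a |F_n(a) - F(a)| when F(a) = a on [0, 1] (uniform distribution).

    F_n is a step function, so the supremum is attained at a sample point,
    either as the value at the jump or as the left limit there.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for k, x in enumerate(xs):
        worst = max(worst, abs((k + 1) / n - x), abs(k / n - x))
    return worst

rng = random.Random(0)  # fixed seed, so the run is repeatable
distances = {n: sup_distance_uniform([rng.random() for _ in range(n)])
             for n in (100, 10_000)}
for n, d in distances.items():
    print(n, d)
```

For a single simulated sample path this shows only what the theorem predicts almost surely; it is an illustration, not evidence of the rate of convergence.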
I have included 273N to show that independence is quite as important in questions of norm-convergence as it is in questions of pointwise convergence. It does not really rely on any form of the strong law; in the proof I quote 273E as a quick way of disposing of the 'uniformly bounded parts' $X'_n$, but of course Bienaymé's equality (272S) is already enough to show that if $\langle X'_n\rangle_{n\in\mathbb N}$ is an independent uniformly bounded sequence of random variables with zero expectation, then $\|\frac1{n+1}(X'_0+\ldots+X'_n)\|_p\to0$ for $p=2$, and therefore for every $p<\infty$.
The proofs of 273H, 273I and 273Na all involve 'truncation'; the expression of a random variable $X$ as the sum of a bounded random variable and a 'tail'. This is one of the most powerful techniques in the subject, and will appear again in §§274 and 276. In 273Na I used a slightly different formulation of the method, solely because it matched the definition of 'uniformly integrable' more closely.
274 The central limit theorem
The second of the great theorems to which this chapter is devoted is of a new type. It is a limit theorem, but the limit
involved is a limit of distributions, not of functions (as in the strong limit theorem above or the martingale theorem
below), nor of equivalence classes of functions (as in Chapter 24). I give three forms of the theorem, in 274I-274K, all
drawn as corollaries of Theorem 274G; the proof is spread over 274C-274G. In 274A-274B and 274M I give the most
elementary properties of the normal distribution.
274A The normal distribution We need some facts from basic probability theory.
(a) Recall that
\[ \int_{-\infty}^{\infty}e^{-x^2/2}\,dx=\sqrt{2\pi} \]
(263G). Consequently, if we set
\[ \mu_GE=\frac1{\sqrt{2\pi}}\int_Ee^{-x^2/2}\,dx \]
for every Lebesgue measurable set $E$, $\mu_G$ is a Radon probability measure (256E); we call it the standard normal distribution. The corresponding distribution function is
\[ \Phi(a)=\mu_G\,]-\infty,a]=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx \]
for $a\in\mathbb R$; for the rest of this section I will reserve the symbol $\Phi$ for this function.
Writing $\Sigma$ for the algebra of Lebesgue measurable subsets of $\mathbb R$, $(\mathbb R,\Sigma,\mu_G)$ is a probability space. Note that it is complete, and has the same negligible sets as Lebesgue measure, because $e^{-x^2/2}>0$ for every $x$ (cf. 234Lc).
(b) A random variable $X$ is standard normal if its distribution is $\mu_G$; that is, if the function $x\mapsto\frac1{\sqrt{2\pi}}e^{-x^2/2}$ is a density function for $X$. The point of the remarks in (a) is that there are such random variables; for instance, take the probability space $(\mathbb R,\Sigma,\mu_G)$ there, and set $X(x)=x$ for every $x\in\mathbb R$.
(c) If $X$ is a standard normal random variable, then
\[ E(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^{\infty}xe^{-x^2/2}\,dx=0, \]
\[ \operatorname{Var}(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^{\infty}x^2e^{-x^2/2}\,dx=1 \]
by 263H.
(d) More generally, a random variable $X$ is normal if there are $a\in\mathbb R$, $\sigma>0$ such that $Z=(X-a)/\sigma$ is standard normal. In this case $X=\sigma Z+a$ so $E(X)=\sigma E(Z)+a=a$, $\operatorname{Var}(X)=\sigma^2\operatorname{Var}(Z)=\sigma^2$.
We have, for any $c\in\mathbb R$,
\[ \frac1{\sigma\sqrt{2\pi}}\int_{-\infty}^ce^{-(x-a)^2/2\sigma^2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^{(c-a)/\sigma}e^{-y^2/2}\,dy \]
(substituting $x=a+\sigma y$ for $-\infty<y\le(c-a)/\sigma$)
\[ =\Pr\Bigl(Z\le\frac{c-a}{\sigma}\Bigr)=\Pr(X\le c). \]
So $x\mapsto\frac1{\sigma\sqrt{2\pi}}e^{-(x-a)^2/2\sigma^2}$ is a density function for $X$ (271Ib). Conversely, of course, a random variable with such a density function is normal, with expectation $a$ and variance $\sigma^2$. The normal distributions are the distributions with these density functions.
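(e) If $Z$ is standard normal, so is $-Z$, because
\[ \Pr(-Z\le a)=\Pr(Z\ge-a)=\frac1{\sqrt{2\pi}}\int_{-a}^{\infty}e^{-x^2/2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx. \]
The definition in the first sentence of (d) now makes it obvious that if $X$ is normal, so is $a+bX$ for any $a\in\mathbb R$, $b\in\mathbb R\setminus\{0\}$.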
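The formulae of 274A are easy to evaluate numerically. The sketch below assumes the standard identity $\Phi(a)=\frac12(1+\operatorname{erf}(a/\sqrt2))$, which is not stated in the text; the function names `phi`, `normal_cdf` and `normal_density` are illustrative only.

```python
import math

def phi(a):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def normal_cdf(c, a, sigma):
    """Pr(X <= c) for X = sigma*Z + a, Z standard normal, sigma > 0 (274Ad)."""
    return phi((c - a) / sigma)

def normal_density(x, a=0.0, sigma=1.0):
    """Density of a normal random variable with expectation a, variance sigma^2."""
    return math.exp(-(x - a) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def integrate_density(lo, hi, a, sigma, steps=22_000):
    """Midpoint-rule approximation to the integral of the density over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(normal_density(lo + (j + 0.5) * width, a, sigma) for j in range(steps))

print(phi(0.0), normal_cdf(3.0, 1.0, 2.0), integrate_density(-19.0, 3.0, 1.0, 2.0))
```

The last two printed numbers should agree closely, which is the substitution $x=a+\sigma y$ of (d) seen from the numerical side.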
274B Proposition Let $X_1,\ldots,X_n$ be independent normal random variables. Then $Y=X_1+\ldots+X_n$ is normal, with $E(Y)=E(X_1)+\ldots+E(X_n)$ and $\operatorname{Var}(Y)=\operatorname{Var}(X_1)+\ldots+\operatorname{Var}(X_n)$.
proof There are innumerable proofs of this fact; the following one gives me a chance to show off the power of Chapter 26, but of course (at the price of some disagreeable algebra) 272U also gives the result.
(a) Consider first the case $n=2$. Setting $a_i=E(X_i)$, $\sigma_i=\sqrt{\operatorname{Var}(X_i)}$, $Z_i=(X_i-a_i)/\sigma_i$ we get independent standard normal variables $Z_1$, $Z_2$. Set $\rho=\sqrt{\sigma_1^2+\sigma_2^2}$, and express $\sigma_1$, $\sigma_2$ as $\rho\cos\theta$, $\rho\sin\theta$. Consider $U=\cos\theta\,Z_1+\sin\theta\,Z_2$. We know that $(Z_1,Z_2)$ has a density function
\[ (\zeta_1,\zeta_2)\mapsto g(\zeta_1,\zeta_2)=\frac1{2\pi}e^{-(\zeta_1^2+\zeta_2^2)/2} \]
(272I). Consequently, for any $c\in\mathbb R$,
\[ \Pr(U\le c)=\int_Fg(z)\,dz, \]
where $F=\{(\zeta_1,\zeta_2):\zeta_1\cos\theta+\zeta_2\sin\theta\le c\}$. But now let $T$ be the matrix
\[ \begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}. \]
Then it is easy to check that
\[ T^{-1}[F]=\{(\eta_1,\eta_2):\eta_1\le c\},\qquad\det T=1,\qquad g(Ty)=g(y)\text{ for every }y\in\mathbb R^2, \]
so by 263A
\[ \Pr(U\le c)=\int_Fg(z)\,dz=\int_{T^{-1}[F]}g(Ty)\,dy=\int_{]-\infty,c]\times\mathbb R}g(y)\,dy=\Pr(Z_1\le c)=\Phi(c). \]
As this is true for every $c\in\mathbb R$, $U$ also is standard normal (I am appealing to 271Ga again). But
\[ X_1+X_2=\sigma_1Z_1+\sigma_2Z_2+a_1+a_2=\rho U+a_1+a_2, \]
so $X_1+X_2$ is normal.
(b) Now we can induce on $n$. If $n=1$ the result is trivial. For the inductive step to $n+1\ge2$, we know that $X_1+\ldots+X_n$ is normal, by the inductive hypothesis, and that $X_{n+1}$ is independent of $X_1+\ldots+X_n$, by 272L. So $X_1+\ldots+X_n+X_{n+1}$ is normal, by (a).
The computation of the expectation and variance of $X_1+\ldots+X_n$ is immediate from 271Ab and 272S.
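A simulation can make 274B plausible for $n=2$, though of course it proves nothing. This is a sketch assuming that Python's `random.gauss` produces independent normal variates; the parameters $(1,3)$ and $(-2,4)$, the sample size and the seed are arbitrary, and `max_cdf_gap` is a hypothetical helper.

```python
import math
import random

def phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def max_cdf_gap(sample, a, sigma):
    """sup_c |F_n(c) - Pr(Y <= c)| for Y normal with expectation a, variance sigma^2."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for k, x in enumerate(xs):
        target = phi((x - a) / sigma)
        worst = max(worst, abs((k + 1) / n - target), abs(k / n - target))
    return worst

rng = random.Random(0)
# X_1 has expectation 1 and variance 9; X_2 has expectation -2 and variance 16.
sample = [rng.gauss(1.0, 3.0) + rng.gauss(-2.0, 4.0) for _ in range(20_000)]
# 274B predicts expectation 1 + (-2) = -1 and variance 9 + 16 = 25.
gap = max_cdf_gap(sample, a=-1.0, sigma=5.0)
print(gap)
```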
274C Lemma Let $U_0,\ldots,U_n$, $V_0,\ldots,V_n$ be independent real-valued random variables and $h:\mathbb R\to\mathbb R$ a bounded Borel measurable function. Then
\[ \Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\sum_{i=0}^n\sup_{t\in\mathbb R}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|. \]
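proof For $0\le j\le n+1$, set $Z_j=\sum_{i=0}^{j-1}U_i+\sum_{i=j}^nV_i$, taking $Z_0=\sum_{i=0}^nV_i$ and $Z_{n+1}=\sum_{i=0}^nU_i$, and for $j\le n$ set $W_j=\sum_{i=0}^{j-1}U_i+\sum_{i=j+1}^nV_i$, so that $Z_j=W_j+V_j$ and $Z_{j+1}=W_j+U_j$ and $W_j$, $U_j$ and $V_j$ are independent (I am appealing to 272K, as in 272L). Then
\[ \begin{aligned}
\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|
&=\Bigl|E\Bigl(\sum_{i=0}^nh(Z_{i+1})-h(Z_i)\Bigr)\Bigr|\\
&\le\sum_{i=0}^n\bigl|E\bigl(h(Z_{i+1})-h(Z_i)\bigr)\bigr|\\
&=\sum_{i=0}^n\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|.
\end{aligned} \]
To estimate this sum I turn it into a sum of integrals, as follows. For each $i$, let $\nu_{W_i}$ be the distribution of $W_i$, and so on. Because $(w,u)\mapsto w+u$ is continuous, therefore Borel measurable, $(w,u)\mapsto h(w+u)$ also is Borel measurable; accordingly $(w,u,v)\mapsto h(w+u)-h(w+v)$ is measurable for each of the product measures $\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i}$ on $\mathbb R^3$, and 271E and 272G give us
\[ \begin{aligned}
\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|
&=\Bigl|\int h(w+u)-h(w+v)\,(\nu_{W_i}\times\nu_{U_i}\times\nu_{V_i})d(w,u,v)\Bigr|\\
&=\Bigl|\int\Bigl(\int h(w+u)-h(w+v)\,(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr)\nu_{W_i}(dw)\Bigr|\\
&\le\int\Bigl|\int h(w+u)-h(w+v)\,(\nu_{U_i}\times\nu_{V_i})d(u,v)\Bigr|\,\nu_{W_i}(dw)\\
&=\int\bigl|E\bigl(h(w+U_i)-h(w+V_i)\bigr)\bigr|\,\nu_{W_i}(dw)\\
&\le\sup_{t\in\mathbb R}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|.
\end{aligned} \]
So we get
\[ \begin{aligned}
\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|
&\le\sum_{i=0}^n\bigl|E\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|\\
&\le\sum_{i=0}^n\sup_{t\in\mathbb R}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|,
\end{aligned} \]
as required.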
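The inequality of 274C can be checked exactly for small integer-valued random variables, where all the expectations are finite sums. The following sketch makes several assumptions not in the text: the distributions `u` and `v`, the choice $h=\cos$, and the restriction of the supremum to integer $t$ in the range of the partial sums $W_j$ (which is all the proof above actually uses when the variables are integer-valued).

```python
import math
from itertools import product

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for (x, px), (y, py) in product(p.items(), q.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def expect(h, dist, shift=0):
    """E(h(shift + X)) for X with finite distribution dist = {value: probability}."""
    return sum(p * h(shift + x) for x, p in dist.items())

def both_sides(us, vs, h):
    """Left- and right-hand sides of 274C, with sup taken over the integer values of the W_j."""
    total_u, total_v = {0: 1.0}, {0: 1.0}
    for u, v in zip(us, vs):
        total_u, total_v = convolve(total_u, u), convolve(total_v, v)
    lhs = abs(expect(h, total_u) - expect(h, total_v))
    # every W_j takes integer values between lo and hi
    lo = sum(min(0, min(u), min(v)) for u, v in zip(us, vs))
    hi = sum(max(0, max(u), max(v)) for u, v in zip(us, vs))
    rhs = sum(
        max(abs(expect(h, u, t) - expect(h, v, t)) for t in range(lo, hi + 1))
        for u, v in zip(us, vs)
    )
    return lhs, rhs

u = {-1: 0.5, 1: 0.5}                 # expectation 0, variance 1
v = {-2: 0.125, 0: 0.75, 2: 0.125}    # expectation 0, variance 1
lhs, rhs = both_sides([u] * 4, [v] * 4, math.cos)
print(lhs, rhs)
```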
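274D Lemma Let $h:\mathbb R\to\mathbb R$ be a bounded three-times-differentiable function such that $M_2=\sup_{x\in\mathbb R}|h''(x)|$, $M_3=\sup_{x\in\mathbb R}|h'''(x)|$ are both finite. Let $\epsilon>0$.
(a) Let $U$ be a real-valued random variable of zero expectation and finite variance $\sigma^2$. Then for any $t\in\mathbb R$ we have
\[ \Bigl|E(h(t+U))-h(t)-\frac{\sigma^2}2h''(t)\Bigr|\le\frac16\epsilon M_3\sigma^2+M_2E(\psi_\epsilon(U)) \]
where $\psi_\epsilon(x)=0$ if $|x|\le\epsilon$, $x^2$ if $|x|>\epsilon$.
(b) Let $U_0,\ldots,U_n$, $V_0,\ldots,V_n$ be independent random variables with finite variances, and suppose that $E(U_i)=E(V_i)=0$, $\operatorname{Var}(U_i)=\operatorname{Var}(V_i)=\sigma_i^2$ for every $i\le n$. Then
\[ \Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\frac13\epsilon M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^nE\bigl(\psi_\epsilon(U_i)\bigr)+M_2\sum_{i=0}^nE\bigl(\psi_\epsilon(V_i)\bigr). \]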
proof (a) The point is that, by Taylor's theorem with remainder,
\[ |h(t+x)-h(t)-xh'(t)|\le\frac12M_2x^2, \]
\[ \Bigl|h(t+x)-h(t)-xh'(t)-\frac12x^2h''(t)\Bigr|\le\frac16M_3|x|^3 \]
for every $x\in\mathbb R$. So
\[ \Bigl|h(t+x)-h(t)-xh'(t)-\frac12x^2h''(t)\Bigr|\le\min\Bigl(\frac16M_3|x|^3,M_2x^2\Bigr)\le\frac16\epsilon M_3x^2+M_2\psi_\epsilon(x). \]
Integrating with respect to the distribution of $U$, we get
\[ \begin{aligned}
\Bigl|E(h(t+U))-h(t)-\frac12h''(t)\sigma^2\Bigr|
&=\Bigl|E(h(t+U))-h(t)-h'(t)E(U)-\frac12h''(t)E(U^2)\Bigr|\\
&=\Bigl|E\Bigl(h(t+U)-h(t)-h'(t)U-\frac12h''(t)U^2\Bigr)\Bigr|\\
&\le E\Bigl(\Bigl|h(t+U)-h(t)-h'(t)U-\frac12h''(t)U^2\Bigr|\Bigr)\\
&\le E\Bigl(\frac16\epsilon M_3U^2+M_2\psi_\epsilon(U)\Bigr)
=\frac16\epsilon M_3\sigma^2+M_2E(\psi_\epsilon(U)),
\end{aligned} \]
as claimed.
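(b) By 274C,
\[ \begin{aligned}
\Bigl|E\Bigl(h\bigl(\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|
&\le\sum_{i=0}^n\sup_{t\in\mathbb R}\bigl|E\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|\\
&\le\sum_{i=0}^n\sup_{t\in\mathbb R}\Bigl(\Bigl|E(h(t+U_i))-h(t)-\frac12h''(t)\sigma_i^2\Bigr|+\Bigl|E(h(t+V_i))-h(t)-\frac12h''(t)\sigma_i^2\Bigr|\Bigr),
\end{aligned} \]
which by (a) above is at most
\[ \sum_{i=0}^n\frac13\epsilon M_3\sigma_i^2+M_2E(\psi_\epsilon(U_i))+M_2E(\psi_\epsilon(V_i)), \]
as claimed.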
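The estimate in 274Da can be tested on a case where everything is explicit. This sketch assumes $h=\sin$ (so $M_2=M_3=1$) and takes $U=\pm s$ with probability $\frac12$ each, so that $E(U)=0$, $\operatorname{Var}(U)=s^2$ and $E(\psi_\epsilon(U))=\psi_\epsilon(s)$; these choices and the grid of test values are mine, not the text's.

```python
import math

def psi(eps, x):
    """psi_eps(x) = 0 if |x| <= eps, x^2 if |x| > eps."""
    return x * x if abs(x) > eps else 0.0

def gap_and_bound(t, s, eps):
    """Both sides of 274Da for h = sin and U = +s or -s with probability 1/2 each."""
    mean_h = 0.5 * (math.sin(t + s) + math.sin(t - s))   # E(h(t + U))
    second_derivative = -math.sin(t)                     # h''(t)
    gap = abs(mean_h - math.sin(t) - 0.5 * s * s * second_derivative)
    bound = eps * s * s / 6.0 + psi(eps, s)              # M_2 = M_3 = 1
    return gap, bound

checks = [gap_and_bound(t, s, eps)
          for t in (0.0, 0.7, 1.5, 3.0)
          for s in (0.1, 0.5, 1.0, 2.0, 5.0)
          for eps in (0.05, 0.5, 1.0, 3.0, 10.0)]
print(max(gap - bound for gap, bound in checks))  # should not be positive
```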
274E Lemma For any $\epsilon>0$, there is a three-times-differentiable function $h:\mathbb R\to[0,1]$, with continuous third derivative, such that $h(x)=1$ for $x\le-\epsilon$ and $h(x)=0$ for $x\ge\epsilon$.
proof Let $f:\,]-\epsilon,\epsilon[\,\to\,]0,\infty[$ be any twice-differentiable function such that
\[ \lim_{x\downarrow-\epsilon}f^{(n)}(x)=\lim_{x\uparrow\epsilon}f^{(n)}(x)=0 \]
for $n=0$, $1$ and $2$, writing $f^{(n)}$ for the $n$th derivative of $f$; for instance, you could take $f(x)=(\epsilon^2-x^2)^3$, or $f(x)=\exp(-\frac1{\epsilon^2-x^2})$. Now set
\[ h(x)=1-\int_{-\epsilon}^xf\Big/\int_{-\epsilon}^{\epsilon}f \]
for $|x|\le\epsilon$.
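With the first suggested choice $f(x)=(\epsilon^2-x^2)^3$ the function $h$ can be written down in closed form, since $f$ is a polynomial. The sketch below uses the antiderivative $\epsilon^6x-\epsilon^4x^3+\frac35\epsilon^2x^5-\frac17x^7$; the name `make_step` is hypothetical.

```python
def make_step(eps):
    """The function h of 274E built from f(x) = (eps^2 - x^2)^3."""
    def antiderivative(x):
        # an antiderivative of eps^6 - 3 eps^4 x^2 + 3 eps^2 x^4 - x^6
        return eps**6 * x - eps**4 * x**3 + 0.6 * eps**2 * x**5 - x**7 / 7.0

    total = antiderivative(eps) - antiderivative(-eps)

    def h(x):
        if x <= -eps:
            return 1.0
        if x >= eps:
            return 0.0
        return 1.0 - (antiderivative(x) - antiderivative(-eps)) / total

    return h

h = make_step(0.5)
print([round(h(x), 4) for x in (-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0)])
```

By symmetry of $f$ the value at $0$ is $\frac12$, and $h$ decreases from $1$ to $0$ across $[-\epsilon,\epsilon]$.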
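274F Lindeberg's theorem Let $\epsilon>0$. Then there is a $\delta>0$ such that whenever $X_0,\ldots,X_n$ are independent real-valued random variables such that
\[ E(X_i)=0\text{ for every }i\le n,\qquad\sum_{i=0}^n\operatorname{Var}(X_i)=1,\qquad\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta \]
(writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$), then
\[ \Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon \]
for every $a\in\mathbb R$.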
proof (a) Let $h:\mathbb R\to[0,1]$ be a three-times-differentiable function, with continuous third derivative, such that $\chi\,]-\infty,-\epsilon]\le h\le\chi\,]-\infty,\epsilon]$, as in 274E. Set
\[ M_2=\sup_{x\in\mathbb R}|h''(x)|=\sup_{|x|\le\epsilon}|h''(x)|, \]
\[ M_3=\sup_{x\in\mathbb R}|h'''(x)|=\sup_{|x|\le\epsilon}|h'''(x)|; \]
because $h'''$ is continuous, both are finite. Write $\epsilon'=\epsilon(1-\frac2{\sqrt{2\pi}})>0$, and let $\eta>0$ be such that
\[ \Bigl(\frac13M_3+2M_2\Bigr)\eta\le\epsilon'. \]
Note that $\lim_{m\to\infty}\psi_m(x)=0$ for every $x$, so if $X$ is a random variable of finite variance we must have $\lim_{m\to\infty}E(\psi_m(X))=0$, by Lebesgue's Dominated Convergence Theorem; let $m\ge1$ be such that $E(\psi_m(Z))\le\eta$, where $Z$ is some (or any) standard normal random variable. Finally, take $\delta>0$ such that $\delta\le\eta$, $\delta+\delta^2\le(\eta/m)^2$.
(I hope that you have seen enough $\epsilon$-$\delta$ arguments not to be troubled by any expectation of understanding the reasons for each particular formula here before reading the rest of the argument. But the formula $\frac13M_3+2M_2$, in association with $\psi_\delta$, should recall 274D.)
(b) Let $X_0,\ldots,X_n$ be independent random variables with zero expectation such that $\sum_{i=0}^n\operatorname{Var}(X_i)=1$ and $\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta$. We need an auxiliary sequence $Z_0,\ldots,Z_n$ of standard normal random variables to match against the $X_i$. To create this, I use the following device. Suppose that the probability space underlying $X_0,\ldots,X_n$ is $(\Omega,\Sigma,\mu)$. Set $\Omega'=\Omega\times\mathbb R^{n+1}$, and let $\mu'$ be the product measure on $\Omega'$, where $\Omega$ is given the measure $\mu$ and each factor $\mathbb R$ of $\mathbb R^{n+1}$ is given the measure $\mu_G$. Set $X'_i(\omega,z)=X_i(\omega)$ and $Z_i(\omega,z)=\zeta_i$ for $\omega\in\operatorname{dom}X_i$, $z=(\zeta_0,\ldots,\zeta_n)\in\mathbb R^{n+1}$, $i\le n$. Then $X'_0,\ldots,X'_n,Z_0,\ldots,Z_n$ are independent, and each $X'_i$ has the same distribution as $X_i$ (272Mb). Consequently $S'=X'_0+\ldots+X'_n$ has the same distribution as $S=X_0+\ldots+X_n$ (using 272T, or otherwise); so that $E(g(S'))=E(g(S))$ for any bounded Borel measurable function $g$ (using 271E). Also each $Z_i$ has distribution $\mu_G$, so is standard normal.
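(c) Write $\sigma_i=\sqrt{\operatorname{Var}(X_i)}$ for each $i$, and set $K=\{i:i\le n,\ \sigma_i>0\}$. Observe that $\eta/\sigma_i\ge m$ for each $i\in K$. PPP We know that
\[ \sigma_i^2=\operatorname{Var}(X_i)=E(X_i^2)\le E(\delta^2+\psi_\delta(X_i))=\delta^2+E(\psi_\delta(X_i))\le\delta^2+\delta, \]
so
\[ \eta/\sigma_i\ge\eta/\sqrt{\delta+\delta^2}\ge m \]
by the choice of $\delta$. QQQ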
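(d) Consider the independent normal random variables $\sigma_iZ_i$. We have $E(\sigma_iZ_i)=E(X'_i)=0$ and $\operatorname{Var}(\sigma_iZ_i)=\operatorname{Var}(X'_i)=\sigma_i^2$ for each $i$, so that $Z=\sigma_0Z_0+\ldots+\sigma_nZ_n$ has expectation $0$ and variance $\sum_{i=0}^n\sigma_i^2=1$; moreover, by 274B, $Z$ is normal, so in fact it is standard normal. Now we have
\[ \sum_{i=0}^nE(\psi_\eta(\sigma_iZ_i))=\sum_{i\in K}E(\psi_\eta(\sigma_iZ_i))=\sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z_i)) \]
(because $\sigma^2\psi_{\eta/\sigma}(x)=\psi_\eta(\sigma x)$ whenever $x\in\mathbb R$, $\sigma>0$)
\[ =\sum_{i\in K}\sigma_i^2E(\psi_{\eta/\sigma_i}(Z))\le\sum_{i\in K}\sigma_i^2E(\psi_m(Z)) \]
(because, by (c), $\eta/\sigma_i\ge m$ for every $i\in K$, so $\psi_{\eta/\sigma_i}(t)\le\psi_m(t)$ for every $t$)
\[ \le\sum_{i\in K}\sigma_i^2\eta \]
(by the choice of $m$)
\[ =\eta. \]
On the other hand, we surely have
\[ \sum_{i=0}^nE(\psi_\eta(X'_i))=\sum_{i=0}^nE(\psi_\eta(X_i))\le\sum_{i=0}^nE(\psi_\delta(X_i))\le\delta\le\eta. \]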
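(e) For any real number $t$, set
\[ h_t(x)=h(x-t) \]
for each $x\in\mathbb R$. Then $h_t$ is three-times-differentiable, with $\sup_{x\in\mathbb R}|h''_t(x)|=M_2$ and $\sup_{x\in\mathbb R}|h'''_t(x)|=M_3$. Consequently
\[ |E(h_t(S))-E(h_t(Z))|\le\epsilon'. \]
PPP By 274Db,
\[ \begin{aligned}
|E(h_t(S))-E(h_t(Z))|&=|E(h_t(S'))-E(h_t(Z))|\\
&=\Bigl|E\Bigl(h_t\bigl(\sum_{i=0}^nX'_i\bigr)\Bigr)-E\Bigl(h_t\bigl(\sum_{i=0}^n\sigma_iZ_i\bigr)\Bigr)\Bigr|\\
&\le\frac13\eta M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^nE(\psi_\eta(X_i))+M_2\sum_{i=0}^nE(\psi_\eta(\sigma_iZ_i))\\
&\le\frac13\eta M_3+M_2\eta+M_2\eta\le\epsilon',
\end{aligned} \]
by the choice of $\eta$. QQQ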
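(f) Now take any $a\in\mathbb R$. We have
\[ \chi\,]-\infty,a-2\epsilon]\le h_{a-\epsilon}\le\chi\,]-\infty,a]\le h_{a+\epsilon}\le\chi\,]-\infty,a+2\epsilon]. \]
Note also that, for any $b$,
\[ \Phi(b+2\epsilon)=\Phi(b)+\frac1{\sqrt{2\pi}}\int_b^{b+2\epsilon}e^{-x^2/2}\,dx\le\Phi(b)+\frac{2\epsilon}{\sqrt{2\pi}}=\Phi(b)+\epsilon-\epsilon'. \]
Consequently
\[ \begin{aligned}
\Phi(a)-\epsilon&\le\Phi(a-2\epsilon)-\epsilon'=\Pr(Z\le a-2\epsilon)-\epsilon'\le E(h_{a-\epsilon}(Z))-\epsilon'\\
&\le E(h_{a-\epsilon}(S))\le\Pr(S\le a)\le E(h_{a+\epsilon}(S))\le E(h_{a+\epsilon}(Z))+\epsilon'\\
&\le\Pr(Z\le a+2\epsilon)+\epsilon'=\Phi(a+2\epsilon)+\epsilon'\le\Phi(a)+\epsilon.
\end{aligned} \]
But this means just that
\[ \Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon, \]
as claimed.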
274G Central Limit Theorem Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of random variables, all with zero expectation and finite variance; write $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ for each $n$. Suppose that
\[ \lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))=0\text{ for every }\delta>0, \]
writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$. Set
\[ S_n=\frac1{s_n}(X_0+\ldots+X_n) \]
for each $n\in\mathbb N$ such that $s_n>0$. Then
\[ \lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a) \]
uniformly for $a\in\mathbb R$.
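proof Given $\epsilon>0$, take $\delta>0$ as in Lindeberg's theorem (274F). Then for all $n$ large enough,
\[ \frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))\le\delta. \]
Fix on any such $n$. Of course we have $s_n>0$. Set
\[ X'_i=\frac1{s_n}X_i\text{ for }i\le n; \]
then $X'_0,\ldots,X'_n$ are independent, with zero expectation,
\[ \sum_{i=0}^n\operatorname{Var}(X'_i)=\sum_{i=0}^n\frac1{s_n^2}\operatorname{Var}(X_i)=1, \]
\[ \sum_{i=0}^nE(\psi_\delta(X'_i))=\sum_{i=0}^n\frac1{s_n^2}E(\psi_{\delta s_n}(X_i))\le\delta. \]
By 274F,
\[ |\Pr(S_n\le a)-\Phi(a)|=\Bigl|\Pr\Bigl(\sum_{i=0}^nX'_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon \]
for every $a\in\mathbb R$. Since this is true for all $n$ large enough, we have the result.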
274H Remarks (a) The condition
\[ \lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))=0\text{ for every }\delta>0 \]
is called Lindeberg's condition, following Lindeberg 22.
(b) Lindeberg's condition is necessary as well as sufficient, in the following sense. Suppose that $\langle X_n\rangle_{n\in\mathbb N}$ is an independent sequence of real-valued random variables with zero expectation and finite variance; write $\sigma_n=\sqrt{\operatorname{Var}(X_n)}$, $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ for each $n$. Suppose that $\lim_{n\to\infty}s_n=\infty$, $\lim_{n\to\infty}\frac{\sigma_n}{s_n}=0$ and that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for each $a\in\mathbb R$, where $S_n=\frac1{s_n}(X_0+\ldots+X_n)$. Then
\[ \lim_{n\to\infty}\frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))=0 \]
for every $\delta>0$. (Feller 66, §XV.6, Theorem 3; Loève 77, §21.2.)
(c) The proof of 274F-274G here is adapted from Feller 66, §VIII.4. It has the virtue of being 'elementary', in that it does not involve characteristic functions. Of course this has to be paid for by a number of detailed estimations; and, what is much more serious, it leaves us without one of the most powerful techniques for describing distributions. The proof does offer a method of bounding
\[ |\Pr(S_n\le a)-\Phi(a)|; \]
but it should be said that the bounds obtained are not useful ones, being grossly over-pessimistic, at least in the readily analysable cases. (For instance, a better bound, in many cases, is given by the Berry-Esseen theorem: if $\langle X_n\rangle_{n\in\mathbb N}$ is independent and identically distributed, with zero expectation, and the common values of $\sqrt{E(X_n^2)}$, $E(|X_n|^3)$ are $\sigma$, $\rho<\infty$, then
\[ |\Pr(S_n\le a)-\Phi(a)|\le\frac{33\rho}{4\sigma^3\sqrt{n+1}}; \]
see Feller 66, §XVI.5, Loève 77, §21.3, or Hall 82.) Furthermore, when $|a|$ is large, $\Phi(a)$ is exceedingly close to either $0$ or $1$, so that any uniform bound for $|\Pr(S\le a)-\Phi(a)|$ gives very little information; a great deal of work has been done on estimating the tails of such distributions more precisely, subject to special conditions. For instance, if $X_0,\ldots,X_n$ are independent random variables, of zero expectation, uniformly bounded with $|X_i|\le K$ almost everywhere for each $i$, $Y=X_0+\ldots+X_n$, $s=\sqrt{\operatorname{Var}(Y)}>0$, $S=\frac1sY$, then for any $\alpha\in[0,s/K]$
\[ \Pr(|S|\ge\alpha)\le2\exp\Bigl(\frac{-\alpha^2}{2(1+\frac{\alpha K}{2s})^2}\Bigr)\approx2e^{-\alpha^2/2} \]
if $s\gg K$ (Rényi 70, §VII.4, Theorem 1).
I now list some of the standard cases in which Lindeberg's conditions are satisfied, so that we may apply the theorem.
274I Corollary Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables, all with the same distribution, and suppose that their common expectation is $0$ and their common variance is finite and not zero. Write $\sigma$ for the common value of $\sqrt{\operatorname{Var}(X_n)}$, and set
\[ S_n=\frac1{\sigma\sqrt{n+1}}(X_0+\ldots+X_n) \]
for each $n\in\mathbb N$. Then
\[ \lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a) \]
uniformly for $a\in\mathbb R$.
proof In the language of 274G-274H, we have $\sigma_n=\sigma$, $s_n=\sigma\sqrt{n+1}$, so the first two conditions are surely satisfied; moreover, if $\nu$ is the common distribution of the $X_n$, then
\[ E(\psi_{\delta s_n}(X_n))=\int_{\{x:|x|>\delta\sigma\sqrt{n+1}\}}x^2\,\nu(dx)\to0 \]
by Lebesgue's Dominated Convergence Theorem; so that
\[ \frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))\to0 \]
by 273Ca. Thus Lindeberg's conditions are satisfied and 274G gives the result.
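For the simplest identically distributed case the distance $\sup_a|\Pr(S\le a)-\Phi(a)|$ can be computed exactly rather than simulated. The sketch below assumes $X_i=\pm1$ with probability $\frac12$ each (so $\sigma=1$), and, as a notational liberty, sums $n$ variables rather than the $n+1$ of the text; `sup_gap` is a hypothetical name.

```python
import math

def phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def sup_gap(n):
    """sup_a |Pr(S <= a) - Phi(a)| for S = (X_1 + ... + X_n)/sqrt(n), X_i = +1 or -1 fairly."""
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):               # k of the X_i are +1, so S = (2k - n)/sqrt(n)
        a = (2 * k - n) / math.sqrt(n)
        target = phi(a)
        worst = max(worst, abs(cdf - target))   # left limit of the step function at a
        cdf += math.comb(n, k) / 2 ** n         # exact integer ratio, rounded once
        worst = max(worst, abs(cdf - target))   # value at a
    return worst

for n in (25, 100, 400):
    print(n, sup_gap(n))
```

The printed gaps shrink roughly like $1/\sqrt n$, dominated by the jump of the binomial distribution at its centre; this is consistent with, and much smaller than, the Berry-Esseen bound quoted in 274Hc.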
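274J Corollary Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables with zero expectation, and suppose that $\{X_n^2:n\in\mathbb N\}$ is uniformly integrable and that
\[ \liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0. \]
Set
\[ s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)},\qquad S_n=\frac1{s_n}(X_0+\ldots+X_n) \]
for large $n\in\mathbb N$. Then
\[ \lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a) \]
uniformly for $a\in\mathbb R$.
proof The condition
\[ \liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0 \]
means that there are $c>0$, $n_0\in\mathbb N$ such that $s_n\ge c\sqrt{n+1}$ for every $n\ge n_0$. Let the underlying space be $(\Omega,\Sigma,\mu)$, and take $\epsilon$, $\delta>0$. Writing $\psi_\delta(x)=0$ for $|x|\le\delta$, $x^2$ for $|x|>\delta$, as in 274F-274G, we have
\[ E(\psi_{\delta s_n}(X_i))\le E(\psi_{c\delta\sqrt{n+1}}(X_i))=\int_{F(i,c\delta\sqrt{n+1})}X_i^2\,d\mu \]
for $n\ge n_0$, $i\le n$, where $F(i,\gamma)=\{\omega:\omega\in\operatorname{dom}X_i,\ |X_i(\omega)|>\gamma\}$. Because $\{X_i^2:i\in\mathbb N\}$ is uniformly integrable, there is a $\gamma\ge0$ such that $\int_{F(i,\gamma)}X_i^2\,d\mu\le\epsilon c^2$ for every $i\in\mathbb N$ (246I). Let $n_1\ge n_0$ be such that $c\delta\sqrt{n_1+1}\ge\gamma$; then for any $n\ge n_1$
\[ \frac1{s_n^2}\sum_{i=0}^nE(\psi_{\delta s_n}(X_i))\le\frac1{c^2(n+1)}\sum_{i=0}^n\epsilon c^2=\epsilon. \]
As $\epsilon$, $\delta$ are arbitrary, the conditions of 274G are satisfied and the result follows.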
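274K Corollary Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables with zero expectation, and suppose that
(i) there is some $\delta>0$ such that $\sup_{n\in\mathbb N}E(|X_n|^{2+\delta})<\infty$,
(ii) $\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\operatorname{Var}(X_i)>0$.
Set $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ and
\[ S_n=\frac1{s_n}(X_0+\ldots+X_n) \]
for large $n\in\mathbb N$. Then
\[ \lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a) \]
uniformly for $a\in\mathbb R$.
proof The point is that $\{X_n^2:n\in\mathbb N\}$ is uniformly integrable. PPP Set $K=1+\sup_{n\in\mathbb N}E(|X_n|^{2+\delta})$. Given $\epsilon>0$, set $M=(K/\epsilon)^{1/\delta}$. Then $(X_n^2-M^2)^+\le M^{-\delta}|X_n|^{2+\delta}$ (because $|X_n|^\delta\ge M^\delta$ wherever $X_n^2\ge M^2$), so
\[ E(X_n^2-M^2)^+\le KM^{-\delta}=\epsilon \]
for every $n\in\mathbb N$. As $\epsilon$ is arbitrary, $\{X_n^2:n\in\mathbb N\}$ is uniformly integrable. QQQ
Accordingly the conditions of 274J are satisfied and we have the result.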
274L Remarks (a) All the theorems of this section are devoted to finding conditions under which a random variable $S$ is 'nearly' standard normal, in the sense that $\Pr(S\le a)\approx\Pr(Z\le a)$ uniformly for $a\in\mathbb R$, where $Z$ is some (or any) standard normal random variable. In all cases the random variable $S$ is normalized to have expectation $0$ and variance $1$, and is a sum of a large number of independent random variables. (In 274G and 274I-274K it is explicit that there must be many $X_i$, since they refer to a limit as $n\to\infty$. This is not said in so many words in the formulation I give of Lindeberg's theorem, but the proof makes it evident that $(n+1)(\delta+\delta^2)\ge1$, so surely $n$ will have to be large there also.)
(b) I cannot leave this section without remarking that the form of the definition of 'nearly standard normal' may lead your intuition astray if you try to apply it to other distributions. If we take $F$ to be the distribution function of $S$, so that $F(a)=\Pr(S\le a)$, I am saying that $S$ is 'nearly standard normal' if $\sup_{a\in\mathbb R}|F(a)-\Phi(a)|$ is small. It is natural to think of this as approximation in a metric, writing
\[ \tilde\rho(\nu,\nu')=\sup_{a\in\mathbb R}|F_\nu(a)-F_{\nu'}(a)| \]
for distributions $\nu$, $\nu'$ on $\mathbb R$, where $F_\nu(a)=\nu\,]-\infty,a]$. In this form, the theorems above can be read as finding conditions under which $\lim_{n\to\infty}\tilde\rho(\nu_{S_n},\mu_G)=0$. But the point is that $\tilde\rho$ is not really the right metric to use. It works here because $\mu_G$ is atomless. But suppose, for instance, that $\nu$ is the distribution which gives mass $1$ to the point $0$ (I mean, that $\nu E=1$ if $0\in E\subseteq\mathbb R$, $0$ if $0\notin E\subseteq\mathbb R$), and that $\nu_n$ is the distribution of a normal random variable with expectation $0$ and variance $\frac1n$, for each $n\ge1$. Then $F_\nu(0)=1$ and $F_{\nu_n}(0)=\frac12$, so $\tilde\rho(\nu_n,\nu)=\frac12$ for each $n\ge1$. However, for most purposes one would regard the difference between $\nu_n$ and $\nu$ as small, and surely $\nu$ is the only distribution which one could reasonably call a limit of the $\nu_n$.
(c) The difficulties here present themselves in more than one form. A statistician would be unhappy with the idea that the $\nu_n$ of the last paragraph were far from $\nu$ (and from each other), on the grounds that any measurement involving random variables with these distributions must be subject to error, and small errors of measurement will render them indistinguishable. A pure mathematician, looking forward to the possibility of generalizing these results, will be unhappy with the emphasis given to the values of $\nu\,]-\infty,a]$, for which it may be difficult to find suitable equivalents in more abstract spaces.
(d) These considerations join together to lead us to a rather different definition for a topology on the space $P$ of probability distributions on $\mathbb R$. For any bounded continuous function $h:\mathbb R\to\mathbb R$ we have a pseudometric $\rho_h:P\times P\to[0,\infty[$ defined by writing
\[ \rho_h(\nu,\nu')=\Bigl|\int h\,d\nu-\int h\,d\nu'\Bigr| \]
for all $\nu$, $\nu'\in P$. The vague topology on $P$ is that generated by the pseudometrics $\rho_h$ (2A3F). I will not go into its properties in detail here (some are sketched in 274Ya-274Yd below; see also 285K-285L, 285S and 437J-437P in Volume 4). But I maintain that the right way to look at the results of this chapter is to say that (i) the distributions $\nu_S$ are close to $\mu_G$ for the vague topology (ii) the sets $\{\nu:\tilde\rho(\nu,\mu_G)<\epsilon\}$ are open for that topology, and that is why $\tilde\rho(\nu_S,\mu_G)$ is small.
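The example in (b) can be made concrete. In the sketch below $\nu_n$ is normal with expectation $0$ and variance $\frac1n$ and $\nu$ is the point mass at $0$; the uniform distance between distribution functions is estimated on a grid (an assumption of the sketch, chosen so that the grid contains $a=0$), and the pseudometric of (d) is evaluated for the particular test function $h=\cos$, using the standard fact that $E(\cos X)=e^{-\sigma^2/2}$ for a normal $X$ with expectation $0$ and variance $\sigma^2$.

```python
import math

def phi(a):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def uniform_distance(n, grid=2001, span=5.0):
    """Grid estimate of sup_a |F_{nu_n}(a) - F_nu(a)|."""
    worst = 0.0
    for j in range(grid):
        a = -span + 2.0 * span * j / (grid - 1)
        point_mass_cdf = 1.0 if a >= 0 else 0.0
        worst = max(worst, abs(phi(a * math.sqrt(n)) - point_mass_cdf))
    return worst

def rho_cos(n):
    """|int cos d(nu_n) - int cos d(nu)| = |exp(-1/(2n)) - cos 0|."""
    return abs(math.exp(-0.5 / n) - 1.0)

for n in (1, 10, 1000):
    print(n, uniform_distance(n), rho_cos(n))
```

The first column of distances stays at $\frac12$ while the second tends to $0$, which is the contrast the remark is drawing; a single test function does not of course determine the vague topology.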
*274M I conclude with a simple pair of inequalities which are frequently useful when studying normal random variables.
Lemma (a) $\int_x^{\infty}e^{-t^2/2}\,dt\le\frac1xe^{-x^2/2}$ for every $x>0$.
(b) $\int_x^{\infty}e^{-t^2/2}\,dt\ge\frac1{2x}e^{-x^2/2}$ for every $x\ge1$.
proof (a)
\[ \int_x^{\infty}e^{-t^2/2}\,dt=\int_0^{\infty}e^{-(x+s)^2/2}\,ds\le e^{-x^2/2}\int_0^{\infty}e^{-xs}\,ds=\frac1xe^{-x^2/2}. \]
(b) Set
\[ f(t)=e^{-t^2/2}-(1-x(t-x))e^{-x^2/2}. \]
Then $f(x)=f'(x)=0$ and $f''(t)=(t^2-1)e^{-t^2/2}$ is non-negative for $t\ge x$ (because $x\ge1$). Accordingly $f(t)\ge0$ for every $t\ge x$, and $\int_x^{x+1/x}f(t)\,dt\ge0$. But this means just that
\[ \int_x^{\infty}e^{-t^2/2}\,dt\ge\int_x^{x+\frac1x}e^{-t^2/2}\,dt\ge\int_x^{x+\frac1x}(1-x(t-x))e^{-x^2/2}\,dt=\frac1{2x}e^{-x^2/2}, \]
as required.
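Both inequalities can be checked against the complementary error function. This sketch assumes the identity $\int_x^\infty e^{-t^2/2}\,dt=\sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$, obtained by the substitution $t=\sqrt2\,u$; the sample points are arbitrary.

```python
import math

def tail(x):
    """Integral of exp(-t^2/2) over [x, infinity[, via erfc."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def bounds(x):
    """(lower, upper) = (e^{-x^2/2}/(2x), e^{-x^2/2}/x) as in 274M."""
    upper = math.exp(-x * x / 2.0) / x
    return upper / 2.0, upper

for x in (1.0, 1.5, 2.0, 5.0, 10.0):
    lower, upper = bounds(x)
    print(x, lower, tail(x), upper)
```

For large $x$ the tail is much closer to the upper bound than to the lower one, so (b) is the cruder of the two estimates.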
274X Basic exercises >>>(a) Use 272U to give an alternative proof of 274B.
(b) Prove 274D when $h''$ is $M_3$-Lipschitz but not necessarily differentiable.
(c) Let $\langle m_k\rangle_{k\in\mathbb N}$ be a strictly increasing sequence in $\mathbb N$ such that $m_0=0$ and $\lim_{k\to\infty}m_k/m_{k+1}=0$. Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of random variables such that $\Pr(X_n=\sqrt{m_k})=\Pr(X_n=-\sqrt{m_k})=1/2m_k$, $\Pr(X_n=0)=1-1/m_k$ whenever $m_{k-1}\le n<m_k$. Show that the Central Limit Theorem is not valid for $\langle X_n\rangle_{n\in\mathbb N}$. (Hint: setting $W_k=(X_0+\ldots+X_{m_k-1})/\sqrt{m_k}$, show that $\Pr(W_k\in G)\to1$ for every open set $G$ including $\mathbb Z$.)
(d) Let $\langle X_n\rangle_{n\in\mathbb N}$ be any independent sequence of random variables all with the same distribution; suppose that they all have finite variance $\sigma^2>0$, and that their common expectation is $c$. Set $S_n=\frac1{\sqrt{n+1}}(X_0+\ldots+X_n)$ for each $n$, and let $Y$ be a normal random variable with expectation $c$ and variance $\sigma^2$. Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Pr(Y\le a)$ uniformly for $a\in\mathbb R$.
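>>>(e) Show that for any $a\in\mathbb R$,
\[ \lim_{n\to\infty}\frac1{2^n}\sum_{r=0}^{\lfloor\frac n2+a\frac{\sqrt n}2\rfloor}\frac{n!}{r!(n-r)!}=\lim_{n\to\infty}\frac1{2^n}\#\Bigl(\Bigl\{I:I\subseteq n,\ \#(I)\le\frac n2+a\frac{\sqrt n}2\Bigr\}\Bigr)=\Phi(a). \]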
(f ) Show that 274I is a special case of 274J.
(g) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables with zero expectation. Set $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ and
\[ S_n=\frac1{s_n}(X_0+\ldots+X_n) \]
for each $n\in\mathbb N$. Suppose that there is some $\delta>0$ such that
\[ \lim_{n\to\infty}\frac1{s_n^{2+\delta}}\sum_{i=0}^nE(|X_i|^{2+\delta})=0. \]
Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb R$. (This is a form of Liapounoff's central limit theorem; see Liapounoff 1901.)
(h) Let $P$ be the set of Radon probability measures on $\mathbb R$. Let $\nu_0\in P$, $a\in\mathbb R$. Show that the map $\nu\mapsto\nu\,]-\infty,a]:P\to[0,1]$ is continuous at $\nu_0$ for the vague topology on $P$ iff $\nu_0\{a\}=0$.
(i) Suppose that $f:\mathbb R\to\mathbb R$ is absolutely continuous on every closed bounded interval, and that $\int_{-\infty}^{\infty}|f'(x)|e^{-ax^2}\,dx<\infty$ for every $a>0$. Let $X$ be a normal random variable with zero expectation. Show that $E(Xf(X))$ and $E(X^2)E(f'(X))$ are defined and equal.
(j) (Steele 86) Suppose that $X_0,\ldots,X_n$, $Y_0,\ldots,Y_n$ are independent random variables such that, for each $i\le n$, $X_i$ and $Y_i$ have the same distribution. Let $h:\mathbb R^{n+1}\to\mathbb R$ be a Borel measurable function, and set $Z=h(X_0,\ldots,X_n)$, $Z_i=h(X_0,\ldots,X_{i-1},Y_i,X_{i+1},\ldots,X_n)$ for each $i$ (with $Z_0=h(Y_0,X_1,\ldots,X_n)$ and $Z_n=h(X_0,\ldots,X_{n-1},Y_n)$, of course). Suppose that $Z$ has finite expectation. Show that $\operatorname{Var}(Z)\le\frac12\sum_{i=0}^nE(Z_i-Z)^2$.
(k) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent identically distributed sequence of random variables with non-zero finite variance. Let $\langle t_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathbb R$ such that $\sum_{n=0}^{\infty}t_n^2=\infty$. Show that $\sum_{n=0}^{\infty}t_nX_n$ is undefined or infinite a.e. (Hint: First deal with the case in which $\langle t_n\rangle_{n\in\mathbb N}$ does not converge to $0$. Otherwise, use 274G to show that, for any $n\in\mathbb N$, $\lim_{m\to\infty}\Pr(|\sum_{i=n}^mt_iX_i|\ge1)\ge\frac12$.)
(l) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables with zero expectation. Suppose that $M\ge0$ is such that $|X_n|\le M$ a.e. for every $n$, and that $\sum_{n=0}^{\infty}\operatorname{Var}(X_n)=\infty$. Set $s_n=\sqrt{\sum_{i=0}^n\operatorname{Var}(X_i)}$ for each $n$, and $S_n=\frac1{s_n}\sum_{i=0}^nX_i$ when $s_n>0$. Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for every $a\in\mathbb R$.
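274Y Further exercises (a) Write $P$ for the set of Radon probability measures on $\mathbb R$. For $\nu$, $\nu'\in P$ set
\[ \rho(\nu,\nu')=\inf\{\epsilon:\epsilon\ge0,\ \nu\,]-\infty,a-\epsilon]-\epsilon\le\nu'\,]-\infty,a]\le\nu\,]-\infty,a+\epsilon]+\epsilon\text{ for every }a\in\mathbb R\}. \]
Show that $\rho$ is a metric on $P$ and that it defines the vague topology on $P$. ($\rho$ is called Lévy's metric.)
(b) Write $P$ for the set of Radon probability measures on $\mathbb R$, and let $\tilde\rho$ be the metric on $P$ defined in 274Lb. Show that if $\nu\in P$ is atomless and $\epsilon>0$, then $\{\nu':\nu'\in P,\ \tilde\rho(\nu',\nu)<\epsilon\}$ is open for the vague topology on $P$.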
(c) Let $\langle S_n\rangle_{n\in\mathbb N}$ be a sequence of real-valued random variables, and $Z$ a standard normal random variable. Show that the following are equiveridical:
(i) $\mu_G=\lim_{n\to\infty}\nu_{S_n}$ for the vague topology, writing $\nu_{S_n}$ for the distribution of $S_n$;
(ii) $E(h(Z))=\lim_{n\to\infty}E(h(S_n))$ for every bounded continuous function $h:\mathbb R\to\mathbb R$;
(iii) $E(h(Z))=\lim_{n\to\infty}E(h(S_n))$ for every bounded function $h:\mathbb R\to\mathbb R$ such that ($\alpha$) $h$ has continuous derivatives of all orders ($\beta$) $\{x:h(x)\ne0\}$ is bounded;
(iv) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for every $a\in\mathbb R$;
(v) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb R$;
(vi) $\{a:\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)\}$ is dense in $\mathbb R$.
(See also 285L.)
(d) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $P$ the set of Radon probability measures on $\mathbb R$. Show that $X\mapsto\nu_X:\mathcal L^0(\mu)\to P$ is continuous for the topology of convergence in measure on $\mathcal L^0(\mu)$ and the vague topology on $P$.
(e) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables. Suppose that there is an $M\ge0$ such that $|X_n|\le M$ a.e. for every $n\in\mathbb N$, and that $\sum_{n=0}^{\infty}X_n$ is defined, as a real number, almost everywhere. Show that $\sum_{n=0}^{\infty}\operatorname{Var}(X_n)<\infty$.
274 Notes and comments For more than two hundred years the Central Limit Theorem has been one of the glories of mathematics, and no branch of mathematics or science would be the same without it. I suppose it is the most important single theorem of probability theory; and I observe that the proof hardly uses measure theory. To be sure, I have clothed the arguments above in the language of measure and integration. But if you look at their essence, the vital elements of the proof are
(i) a linear combination of independent normal random variables is normal (274Ae, 274B);
(ii) if $U$, $V$, $W$ are independent random variables, and $h$ is a bounded continuous function, then $|E(h(U,V,W))|\le\sup_{t\in\mathbb R}|E(h(U,V,t))|$ (274C);
(iii) if $(X_0,\ldots,X_n)$ are independent random variables, then we can find independent random variables $(X'_0,\ldots,X'_n,Z_0,\ldots,Z_n)$ such that $Z_j$ is standard normal and $X'_j$ has the same distribution as $X_j$, for each $j$ (274F).
The rest of the argument consists of elementary calculus, careful estimations and a few of the most fundamental properties of expectations and independence. Now (ii) and (iii) are justified above by appeals to Fubini's theorem, but surely they belong to the list of probabilistic intuitions which take priority over the identification of probabilities with countably additive functionals. If they had given any insuperable difficulty it would have been a telling argument against the model of probability we were using, but would not have affected the Central Limit Theorem. In fact (i) seems to be the place where we really need a mathematical model of the concept of 'distribution', and all the relevant calculations can be done in terms of the Riemann integral on the plane, with no mention of countable additivity. So while I am happy and proud to have written out a version of these beautiful ideas, I have to admit that they are in no essential way dependent on the rest of this treatise.
In §285 I will describe a quite different approach to the theorem, using much more sophisticated machinery; but it will again be the case, perhaps more thoroughly hidden, that the relevance of measure theory will not be to the theorem itself, but to our imagination of what an arbitrary distribution is. For here I do have a claim to make for my subject. The characterization of distribution functions as arbitrary monotonic functions, continuous on the right, and with the right limits at $\pm\infty$ (271Xb), together with the analysis of monotonic functions in §226, gives us a chance of forming a mental picture of the proper class of objects to which such results as the Central Limit Theorem can be applied.
Theorem 274F is a trifling modification of Theorem 3 of Lindeberg 22. Like the original, it emphasizes what I believe to be vital to all the limit theorems of this chapter: they are best founded on a proper understanding of finite sequences of random variables. Lindeberg's condition was the culmination of a long search for the most general conditions under which the Central Limit Theorem would be valid. I offer a version of Laplace's theorem (274Xe) as the starting place, and Liapounoff's condition (274Xg) as an example of one of the intermediate stages. Naturally the corollaries 274I, 274J, 274K and 274Xd are those one seeks to apply by choice. There is an intriguing, but as far as I know purely coincidental, parallel between 273H/274K and 273I/274Xd. As an example of an independent sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables, all with expectation zero and variance $1$, to which the Central Limit Theorem does not apply, I offer 274Xc.
275 Martingales
This chapter so far has been dominated by independent sequences of random variables. I now turn to another of the
remarkable concepts to which probabilistic intuitions have led us. Here we study evolving systems, in which we gain
progressively more information as time progresses. I give the basic theorems on pointwise convergence of martingales
(275F-275H, 275K) and a very brief account of stopping times (275L-275P).
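275A Definition Let $(\Omega,\Sigma,\mu)$ be a probability space with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. (Such sequences $\langle\Sigma_n\rangle_{n\in\mathbb N}$ are called filtrations.) A martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb N}$ is a sequence $\langle X_n\rangle_{n\in\mathbb N}$ of integrable real-valued random variables on $\Omega$ such that (i) $\operatorname{dom}X_n\in\Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each $n\in\mathbb N$ (ii) whenever $m\le n\in\mathbb N$ and $E\in\Sigma_m$ then $\int_EX_n=\int_EX_m$.
Note that for (ii) it is enough if $\int_EX_{n+1}=\int_EX_n$ whenever $n\in\mathbb N$ and $E\in\Sigma_n$.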
275B Examples We have seen many contexts in which such sequences appear naturally; here are a few.
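(a) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $X$ be any real-valued random variable on $\Omega$ with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$, as in §233. Subject to the conditions that $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is actually $\Sigma_n$-measurable for each $n$ (a purely technical point; see 232He), $\langle X_n\rangle_{n\in\mathbb{N}}$ will be a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, because $\int_E X_{n+1} = \int_E X = \int_E X_n$ whenever $E \in \Sigma_n$.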
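(b) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables all with zero expectation. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the $\sigma$-algebra defined by $X_i$ (272C), and set $S_n = X_0 + \ldots + X_n$. Then $\langle S_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$. (Use 272K to see that $\Sigma_{X_{n+1}}$ is independent of $\tilde\Sigma_n$, so that $\int_E X_{n+1} = \int X_{n+1} \times \chi E = 0$ for every $E \in \tilde\Sigma_n$, by 272R.)
(c) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables all with expectation 1. For each $n \in \mathbb{N}$ let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the $\sigma$-algebra defined by $X_i$, and set $W_n = X_0 \times \ldots \times X_n$. Then $\langle W_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.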
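As an informal illustration of the product martingale in 275Bc (not part of the original argument), the following Python sketch checks the martingale identity exactly on a finite model. The two-point distribution with values 1/2 and 3/2 is a hypothetical choice made only for the example; on such a finite space the atoms of the $\sigma$-algebra generated by $X_0, \ldots, X_n$ are the possible histories, so it is enough to compare conditional means history by history.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point law with expectation 1 (an assumption for this sketch):
# each factor takes the values 1/2 and 3/2 with probability 1/2.
VALUES = (Fraction(1, 2), Fraction(3, 2))
PROB = Fraction(1, 2)


def partial_products(xs):
    """Return [W_0, ..., W_n], where W_k = x_0 * ... * x_k."""
    out, w = [], Fraction(1)
    for x in xs:
        w *= x
        out.append(w)
    return out


def conditional_mean_of_next(history):
    """E(W_{n+1} | X_0, ..., X_n = history), averaging over the next factor."""
    w_n = partial_products(history)[-1]
    return sum(PROB * w_n * x for x in VALUES)


def is_martingale(depth):
    """Check E(W_{n+1} | history) == W_n for every history of length <= depth."""
    for n in range(1, depth + 1):
        for history in product(VALUES, repeat=n):
            if conditional_mean_of_next(history) != partial_products(history)[-1]:
                return False
    return True
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.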
275C Remarks (a) It seems appropriate to the concept of a random variable $X$ being 'adapted' to a $\sigma$-algebra $\Sigma$ to require that $\operatorname{dom} X \in \Sigma$ and that $X$ should be $\Sigma$-measurable, even though this may mean that other random variables, equal almost everywhere to $X$, may fail to be adapted to $\Sigma$.
(b) Technical problems of this kind evaporate, of course, if all $\mu$-negligible subsets of $\Omega$ belong to $\Sigma_0$. But examples such as 275Bb make it seem unreasonable to insist on such a simplification as a general rule.
(c) The concept of 'martingale' can readily be extended to other index sets than $\mathbb{N}$; indeed, if $I$ is any partially ordered set, we can say that $\langle X_i\rangle_{i\in I}$ is a martingale on $(\Omega, \Sigma, \mu)$ adapted to $\langle\Sigma_i\rangle_{i\in I}$ if (i) each $\Sigma_i$ is a $\sigma$-subalgebra of $\hat\Sigma$ (ii) each $X_i$ is an integrable real-valued $\Sigma_i$-measurable random variable such that $\operatorname{dom} X_i \in \Sigma_i$ (iii) whenever $i \le j$ in $I$, then $\Sigma_i \subseteq \Sigma_j$ and $\int_E X_i = \int_E X_j$ for every $E \in \Sigma_i$. The principal case, after $I = \mathbb{N}$, is $I = [0, \infty[$; $I = \mathbb{Z}$ also is interesting, and I think it is fair to say that the most important ideas can already be expressed in theorems about martingales indexed by finite sets $I$. But in this volume I will generally take martingales to be indexed by $\mathbb{N}$.
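(d) Given just a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of integrable real-valued random variables on a probability space $(\Omega, \Sigma, \mu)$, we can say simply that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$ if there is some non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-subalgebras of $\hat\Sigma$ (the completion of $\Sigma$) such that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. If we write $\tilde\Sigma_n$ for the $\sigma$-algebra generated by $\bigcup_{i\le n} \Sigma_{X_i}$, where $\Sigma_{X_i}$ is the $\sigma$-algebra defined by $X_i$, as in 275Bb, then it is easy to see that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale iff it is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.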
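(e) Continuing from (d), it is also easy to see that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$, and $X'_n =_{\text{a.e.}} X_n$ for every $n$, then $\langle X'_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega, \Sigma, \mu)$. (The point is that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, then both $\langle X_n\rangle_{n\in\mathbb{N}}$ and $\langle X'_n\rangle_{n\in\mathbb{N}}$ are adapted to $\langle\hat\Sigma_n\rangle_{n\in\mathbb{N}}$, where $\hat\Sigma_n = \{E \triangle F : E \in \Sigma_n,\ F \text{ is negligible}\}$.)
Consequently we have a concept of 'martingale' as a sequence in $L^1(\mu)$, saying that a sequence $\langle X_n^\bullet\rangle_{n\in\mathbb{N}}$ in $L^1(\mu)$ is a martingale iff $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale.
Nevertheless, I think that the concept of 'martingale adapted to a sequence of $\sigma$-algebras' is the primary one, since in all the principal applications the $\sigma$-algebras reflect some essential aspect of the problem, which may not be fully encompassed by the random variables alone.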
(f) The word 'martingale' originally (in English; the history in French is more complex) referred to a strap used to prevent a horse from throwing its head back. Later it was used as the name of a gambling system in which the gambler doubles his stake each time he loses, and (in French) as a general term for gambling systems. These may be regarded as a class of 'stopped-time martingales', as described in 275L-275P below.
275D A large part of the theory of martingales consists of inequalities of various kinds. I give two of the most important, both due to J.L.Doob. (See also 276Xa-276Xb.)
Lemma Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale on $\Omega$. Fix $n \in \mathbb{N}$ and set $X^* = \max(X_0, \ldots, X_n)$. Then for any $\epsilon > 0$,
$$\Pr(X^* \ge \epsilon) \le \frac{1}{\epsilon}\,\mathbb{E}(X_n^+),$$
writing $X_n^+ = \max(0, X_n)$.
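proof Write $\hat\mu$ for the completion of $\mu$, and $\hat\Sigma$ for its domain. Let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$ to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. For each $i \le n$ set
$$E_i = \{\omega : \omega \in \operatorname{dom} X_i,\ X_i(\omega) \ge \epsilon\},$$
$$F_i = E_i \setminus \bigcup_{j<i} E_j.$$
Then $F_0, \ldots, F_n$ are disjoint and $F = \bigcup_{i\le n} F_i = \bigcup_{i\le n} E_i$; moreover, writing $H$ for the conegligible set $\bigcap_{i\le n} \operatorname{dom} X_i$,
$$\{\omega : X^*(\omega) \ge \epsilon\} = F \cap H,$$
so that
$$\Pr(X^* \ge \epsilon) = \hat\mu\{\omega : X^*(\omega) \ge \epsilon\} = \hat\mu F = \sum_{i=0}^n \hat\mu F_i.$$
On the other hand, $E_i$ and $F_i$ belong to $\Sigma_i$ for each $i \le n$, so
$$\int_{F_i} X_n = \int_{F_i} X_i \ge \epsilon\,\hat\mu F_i$$
for every $i$, and
$$\epsilon\,\hat\mu F = \epsilon \sum_{i=0}^n \hat\mu F_i \le \sum_{i=0}^n \int_{F_i} X_n = \int_F X_n \le \int_F X_n^+ \le \mathbb{E}(X_n^+),$$
as required.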
Remark Note that in fact we have $\epsilon\,\hat\mu F \le \int_F X_n$, where $F = \{\omega : X^*(\omega) \ge \epsilon\}$; this is of great importance in many applications.
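To make the inequality of 275D concrete, here is a small Python sketch (an illustration added under stated assumptions, not part of the proof). It takes the fair $\pm 1$ random walk started at $X_0 = 0$ as a hypothetical example of a martingale and computes both sides of the inequality exactly by enumerating all $2^n$ equally likely paths; the function names are invented for the example.

```python
from fractions import Fraction
from itertools import product


def walk(steps):
    """Return [X_0, ..., X_n] for X_0 = 0 and the given +-1 increments."""
    path = [0]
    for s in steps:
        path.append(path[-1] + s)
    return path


def maximal_inequality_sides(n, eps):
    """Return (Pr(X* >= eps), E(X_n^+) / eps) for the fair +-1 walk with n steps."""
    weight = Fraction(1, 2 ** n)      # every path has the same probability
    prob = Fraction(0)                # accumulates Pr(max(X_0..X_n) >= eps)
    mean_pos = Fraction(0)            # accumulates E(max(0, X_n))
    for steps in product((-1, 1), repeat=n):
        path = walk(steps)
        if max(path) >= eps:
            prob += weight
        mean_pos += weight * max(0, path[-1])
    return prob, mean_pos / eps
```

For instance, `maximal_inequality_sides(6, 2)` returns the two sides for $n = 6$ and $\epsilon = 2$; the lemma asserts that the first entry never exceeds the second.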
275E Up-crossings The next lemma depends on the notion of 'up-crossing'. Let $x_0, \ldots, x_n$ be any list of real numbers, and $a < b$ in $\mathbb{R}$. The number of up-crossings from $a$ to $b$ in the list $x_0, \ldots, x_n$ is the number of pairs $(j, k)$ such that $0 \le j < k \le n$, $x_j \le a$, $x_k \ge b$ and $a < x_i < b$ for $j < i < k$. Note that this is also the largest $m$ such that $s_m < \infty$, if we write
$$r_1 = \inf\{i : i \le n,\ x_i \le a\},$$
$$s_1 = \inf\{i : r_1 < i \le n,\ x_i \ge b\},$$
$$r_2 = \inf\{i : s_1 < i \le n,\ x_i \le a\},$$
$$s_2 = \inf\{i : r_2 < i \le n,\ x_i \ge b\}$$
and so on, taking $\inf\emptyset = \infty$.
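The two descriptions of the number of up-crossings in 275E can be compared directly. The following Python sketch (an illustration only; the function names are invented here) counts up-crossings once by the pairs $(j, k)$ and once by the alternating indices $r_1 < s_1 < r_2 < \ldots$, using `math.inf` for $\inf\emptyset$.

```python
import math


def upcrossings_by_pairs(xs, a, b):
    """Count pairs (j, k), j < k, with xs[j] <= a, xs[k] >= b and
    a < xs[i] < b for every i strictly between j and k."""
    count = 0
    for j in range(len(xs)):
        if xs[j] > a:
            continue
        for k in range(j + 1, len(xs)):
            if xs[k] >= b and all(a < xs[i] < b for i in range(j + 1, k)):
                count += 1
    return count


def first_index(xs, start, pred):
    """inf{i : i >= start, pred(xs[i])}, with the infimum of the empty set equal to infinity."""
    for i in range(start, len(xs)):
        if pred(xs[i]):
            return i
    return math.inf


def upcrossings_by_times(xs, a, b):
    """Largest m with s_m finite, where r_1 < s_1 < r_2 < s_2 < ... are as in 275E."""
    m, s = 0, -1                      # s plays the role of s_m, with s_0 = -1
    while True:
        r = first_index(xs, s + 1, lambda x: x <= a)   # r_{m+1}
        if r == math.inf:
            return m
        s = first_index(xs, r + 1, lambda x: x >= b)   # s_{m+1}
        if s == math.inf:
            return m
        m += 1
```

On the list `[0, 3, 1, -1, 2, 5, 0, 4]` with $a = 0$, $b = 3$ both functions find the three up-crossings ending at indices 1, 5 and 7.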
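275F Lemma Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale on $\Omega$. Suppose that $n \in \mathbb{N}$ and that $a < b$ in $\mathbb{R}$. For each $\omega \in \bigcap_{i\le n} \operatorname{dom} X_i$, let $U(\omega)$ be the number of up-crossings from $a$ to $b$ in the list $X_0(\omega), \ldots, X_n(\omega)$. Then
$$\mathbb{E}(U) \le \frac{1}{b-a}\,\mathbb{E}((X_n - X_0)^+),$$
writing $(X_n - X_0)^+(\omega) = \max(0, X_n(\omega) - X_0(\omega))$ for $\omega \in \operatorname{dom} X_n \cap \operatorname{dom} X_0$.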
proof Each individual step in the proof is 'elementary', but the structure as a whole is non-trivial.
(a) The following fact will be useful. Suppose that $x_0, \ldots, x_n$ are real numbers; let $u$ be the number of up-crossings from $a$ to $b$ in the list $x_0, \ldots, x_n$. Set $y_i = \max(x_i, a)$ for each $i$; then $u$ is also the number of up-crossings from $a$ to $b$ in the list $y_0, \ldots, y_n$. For each $k \le n$, set $c_k = 1$ if there is a $j \le k$ such that $x_j \le a$ and $x_i < b$ for $j \le i \le k$, $0$ otherwise. Then
$$(b-a)u \le \sum_{k=0}^{n-1} c_k (y_{k+1} - y_k).$$
PPP I induce on $m$ to show that (defining $r_m$, $s_m$ as in 275E)
$$(b-a)m \le \sum_{k=0}^{s_m - 1} c_k (y_{k+1} - y_k)$$
whenever $m \le u$. For $m = 0$ (taking $s_0 = -1$) we have $0 = 0$. For the inductive step to $m \ge 1$, we have $s_{m-1} < r_m < s_m \le n$ (because I am supposing that $m \le u$), and $c_k = 0$ if $s_{m-1} \le k < r_m$, $c_k = 1$ if $r_m \le k < s_m$. So
$$\sum_{k=0}^{s_m - 1} c_k (y_{k+1} - y_k) = \sum_{k=0}^{s_{m-1} - 1} c_k (y_{k+1} - y_k) + \sum_{k=r_m}^{s_m - 1} (y_{k+1} - y_k)$$
$$\ge (b-a)(m-1) + y_{s_m} - y_{r_m}$$
(by the inductive hypothesis)
$$\ge (b-a)m$$
(because $y_{s_m} \ge b$, $y_{r_m} = a$), and the induction proceeds.
Accordingly $\sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) \ge (b-a)u$. As for the sum $\sum_{k=s_u}^{n-1} c_k (y_{k+1} - y_k)$, we have $c_k = 0$ for $s_u \le k < r_{u+1}$, $c_k = 1$ for $r_{u+1} \le k < s_{u+1}$, while $s_{u+1} > n$, so if $n \le r_{u+1}$ we have
$$\sum_{k=0}^{n-1} c_k (y_{k+1} - y_k) = \sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) \ge (b-a)u,$$
while if $n > r_{u+1}$ we have
$$\sum_{k=0}^{n-1} c_k (y_{k+1} - y_k) = \sum_{k=0}^{s_u - 1} c_k (y_{k+1} - y_k) + \sum_{k=r_{u+1}}^{n-1} (y_{k+1} - y_k)$$
$$\ge (b-a)u + y_n - y_{r_{u+1}} \ge (b-a)u$$
because $y_n \ge a = y_{r_{u+1}}$. Thus in both cases we have the required result. QQQ
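(b)(i) Now define
$$Y_k(\omega) = \max(a, X_k(\omega)) \text{ for } \omega \in \operatorname{dom} X_k,$$
$$F_k = \{\omega : \omega \in \bigcap_{i\le k} \operatorname{dom} X_i,\ \exists\, j \le k,\ X_j(\omega) \le a,\ X_i(\omega) < b \text{ if } j \le i \le k\}$$
for each $k \in \mathbb{N}$. If $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-algebras to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted, then $F_k \in \Sigma_k$ (because if $j \le k$ all the sets $\operatorname{dom} X_j$, $\{\omega : X_j(\omega) \le a\}$, $\{\omega : X_j(\omega) < b\}$ belong to $\Sigma_j \subseteq \Sigma_k$).
(ii) We find that $\int_F Y_k \le \int_F Y_{k+1}$ if $F \in \Sigma_k$. PPP Set $G = \{\omega : X_k(\omega) > a\} \in \Sigma_k$. Then
$$\int_F Y_k = \int_{F\cap G} X_k + a\,\hat\mu(F \setminus G) = \int_{F\cap G} X_{k+1} + a\,\hat\mu(F \setminus G) \le \int_{F\cap G} Y_{k+1} + \int_{F\setminus G} Y_{k+1} = \int_F Y_{k+1}.$$
QQQ
(iii) Consequently $\int_F Y_{k+1} - Y_k \le \int Y_{k+1} - Y_k$ for every $F \in \Sigma_k$. PPP
$$\int (Y_{k+1} - Y_k) - \int_F (Y_{k+1} - Y_k) = \int_{\Omega\setminus F} Y_{k+1} - \int_{\Omega\setminus F} Y_k \ge 0.$$
QQQ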
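(c) Let $H$ be the conegligible set $\operatorname{dom} U = \bigcap_{i\le n} \operatorname{dom} X_i \in \Sigma_n$. We ought to check at some point that $U$ is $\Sigma_n$-measurable; but this is clearly true, because all the relevant sets $\{\omega : X_i(\omega) \le a\}$, $\{\omega : X_i(\omega) \ge b\}$ belong to $\Sigma_n$. For each $\omega \in H$, apply (a) to the list $X_0(\omega), \ldots, X_n(\omega)$ to see that
$$(b-a)U(\omega) \le \sum_{k=0}^{n-1} \chi F_k(\omega)\,(Y_{k+1}(\omega) - Y_k(\omega)).$$
Because $H$ is conegligible, it follows that
$$(b-a)\,\mathbb{E}(U) \le \sum_{k=0}^{n-1} \int_{F_k} Y_{k+1} - Y_k \le \sum_{k=0}^{n-1} \int Y_{k+1} - Y_k$$
(using (b-iii))
$$= \mathbb{E}(Y_n - Y_0) \le \mathbb{E}((X_n - X_0)^+)$$
because $Y_n - Y_0 \le (X_n - X_0)^+$ everywhere on $\operatorname{dom} X_n \cap \operatorname{dom} X_0$. This completes the proof.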
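275G We are now ready for the principal theorems of this section.
Doob's Martingale Convergence Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale on a probability space $(\Omega, \Sigma, \mu)$, and suppose that $\sup_{n\in\mathbb{N}} \mathbb{E}(|X_n|) < \infty$. Then $\lim_{n\to\infty} X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$ in $\Omega$.
proof (a) Set $H = \bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n$, and for $\omega \in H$ set $Y(\omega) = \liminf_{n\to\infty} X_n(\omega)$, $Z(\omega) = \limsup_{n\to\infty} X_n(\omega)$, allowing $\pm\infty$ in both cases. But note that $Y \le \liminf_{n\to\infty} |X_n|$, so by Fatou's Lemma $Y(\omega) < \infty$ for almost every $\omega$; similarly $Z(\omega) > -\infty$ for almost every $\omega$. It will therefore be enough if I can show that $Y =_{\text{a.e.}} Z$, for then $Y(\omega) = Z(\omega) \in \mathbb{R}$ for almost every $\omega$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ will be convergent for almost every $\omega$.
(b) ??? So suppose, if possible, that $Y$ and $Z$ are not equal almost everywhere. Of course both are $\hat\Sigma$-measurable, where $(\Omega, \hat\Sigma, \hat\mu)$ is the completion of $(\Omega, \Sigma, \mu)$, so we must have
$$\hat\mu\{\omega : \omega \in H,\ Y(\omega) < Z(\omega)\} > 0.$$
Accordingly there are rational numbers $q$, $q'$ such that $q < q'$ and $\hat\mu G > 0$, where
$$G = \{\omega : \omega \in H,\ Y(\omega) < q < q' < Z(\omega)\}.$$
Now, for each $\omega \in H$, $n \in \mathbb{N}$, let $U_n(\omega)$ be the number of up-crossings from $q$ to $q'$ in the list $X_0(\omega), \ldots, X_n(\omega)$. Then 275F tells us that
$$\mathbb{E}(U_n) \le \frac{1}{q'-q}\,\mathbb{E}((X_n - X_0)^+) \le \frac{1}{q'-q}\,\mathbb{E}(|X_n| + |X_0|) \le \frac{2M}{q'-q},$$
if we write $M = \sup_{i\in\mathbb{N}} \mathbb{E}(|X_i|)$. By B.Levi's theorem, $U(\omega) = \sup_{n\in\mathbb{N}} U_n(\omega) < \infty$ for almost every $\omega$. On the other hand, if $\omega \in G$, then there are arbitrarily large $j$, $k$ such that $X_j(\omega) < q$ and $X_k(\omega) > q'$, so $U(\omega) = \infty$. This means that $\hat\mu G$ must be $0$, contrary to the choice of $q$, $q'$. XXX
(c) Thus we must in fact have $Y =_{\text{a.e.}} Z$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$, as claimed.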
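275H Theorem Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Then the following are equiveridical:
(i) there is a random variable $X$, of finite expectation, such that $X_n$ is a conditional expectation of $X$ on $\Sigma_n$ for every $n$;
(ii) $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable;
(iii) $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$, and $\mathbb{E}(|X_\infty|) = \lim_{n\to\infty} \mathbb{E}(|X_n|) < \infty$.
proof (i)$\Rightarrow$(ii) By 246D, the set of all conditional expectations of $X$ is uniformly integrable, so $\{X_n : n \in \mathbb{N}\}$ is surely uniformly integrable.
(ii)$\Rightarrow$(iii) If $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable, we surely have $\sup_{n\in\mathbb{N}} \mathbb{E}(|X_n|) < \infty$, so 275G tells us that $X_\infty$ is defined almost everywhere. By 246Ja, $X_\infty$ is integrable and $\lim_{n\to\infty} \mathbb{E}(|X_n - X_\infty|) = 0$. Consequently $\mathbb{E}(|X_\infty|) = \lim_{n\to\infty} \mathbb{E}(|X_n|) < \infty$.
(iii)$\Rightarrow$(i) Because $\mathbb{E}(|X_\infty|) = \lim_{n\to\infty} \mathbb{E}(|X_n|)$, $\lim_{n\to\infty} \mathbb{E}(|X_n - X_\infty|) = 0$ (245H(a-ii)). Now let $n \in \mathbb{N}$, $E \in \Sigma_n$. Then
$$\int_E X_n = \lim_{m\to\infty} \int_E X_m = \int_E X_\infty.$$
As $E$ is arbitrary, $X_n$ is a conditional expectation of $X_\infty$ on $\Sigma_n$.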
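275I Theorem Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$; write $\Sigma_\infty$ for the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. Let $X$ be any real-valued random variable on $\Omega$ with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$. Then $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$ is defined almost everywhere; $\lim_{n\to\infty} \mathbb{E}(|X_\infty - X_n|) = 0$, and $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$.
proof (a) By 275G-275H, we know that $X_\infty$ is defined almost everywhere, and, as remarked in 275H, $\lim_{n\to\infty} \mathbb{E}(|X_\infty - X_n|) = 0$. To see that $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$, set
$$\mathcal{A} = \{E : E \in \Sigma_\infty,\ \int_E X_\infty = \int_E X\}, \qquad \mathcal{I} = \bigcup_{n\in\mathbb{N}} \Sigma_n.$$
Now $\mathcal{I}$ and $\mathcal{A}$ satisfy the conditions of the Monotone Class Theorem (136B). PPP ($\alpha$) Of course $\Omega \in \mathcal{I}$ and $\mathcal{I}$ is closed under finite intersections, because $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-algebras; in fact $\mathcal{I}$ is a subalgebra of $\mathcal{P}\Omega$, and is closed under finite unions and complements. ($\beta$) If $E \in \mathcal{I}$, say $E \in \Sigma_n$; then
$$\int_E X_\infty = \lim_{m\to\infty} \int_E X_m = \int_E X,$$
as in (iii)$\Rightarrow$(i) of 275H, so $E \in \mathcal{A}$. Thus $\mathcal{I} \subseteq \mathcal{A}$. ($\gamma$) If $E$, $F \in \mathcal{A}$ and $E \subseteq F$, then
$$\int_{F\setminus E} X_\infty = \int_F X_\infty - \int_E X_\infty = \int_F X - \int_E X = \int_{F\setminus E} X,$$
so $F \setminus E \in \mathcal{A}$. ($\delta$) If $\langle E_k\rangle_{k\in\mathbb{N}}$ is a non-decreasing sequence in $\mathcal{A}$ with union $E$, then
$$\int_E X_\infty = \lim_{k\to\infty} \int_{E_k} X_\infty = \lim_{k\to\infty} \int_{E_k} X = \int_E X,$$
so $E \in \mathcal{A}$. Thus $\mathcal{A}$ is a Dynkin class. QQQ
Consequently, by 136B, $\mathcal{A}$ includes $\Sigma_\infty$; that is, $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$.
Remark I have written '$\lim_{n\to\infty} \mathbb{E}(|X_n - X_\infty|) = 0$'; but you may prefer to say '$X_\infty^\bullet = \lim_{n\to\infty} X_n^\bullet$ in $L^1(\mu)$', as in Chapter 24.
The importance of this theorem is such that you may be interested in a proof based on 275D rather than 275E-275G; see 275Xd.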
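*275J As a corollary of this theorem I give an important result, a kind of density theorem for product measures.
Proposition Let $\langle(\Omega_n, \Sigma_n, \mu_n)\rangle_{n\in\mathbb{N}}$ be a sequence of probability spaces with product $(\Omega, \Lambda, \lambda)$. Let $X$ be a real-valued random variable on $\Omega$ with finite expectation. For each $n \in \mathbb{N}$ define $X_n$ by setting
$$X_n(\omega) = \int X(\omega_0, \ldots, \omega_n, \xi_{n+1}, \ldots)\,d(\xi_{n+1}, \ldots)$$
wherever this is defined, where I write '$\int \ldots d(\xi_{n+1}, \ldots)$' to mean integration with respect to the product measure $\lambda'_n$ on $\prod_{i\ge n+1} \Omega_i$. Then $X(\omega) = \lim_{n\to\infty} X_n(\omega)$ for almost every $\omega = (\omega_0, \omega_1, \ldots)$ in $\Omega$, and $\lim_{n\to\infty} \mathbb{E}(|X - X_n|) = 0$.
proof For each $n$, we can identify $\lambda$ with the product of $\lambda_n$ and $\lambda'_n$, where $\lambda_n$ is the product measure on $\Omega_0 \times \ldots \times \Omega_n$ (254N). So 253H tells us that $X_n$ is a conditional expectation of $X$ on the $\sigma$-algebra $\Lambda_n = \{E \times \prod_{i>n} \Omega_i : E \in \operatorname{dom} \lambda_n\}$. Since (by 254N again) we can think of $\lambda_{n+1}$ as the product of $\lambda_n$ and $\mu_{n+1}$, $\Lambda_n \subseteq \Lambda_{n+1}$ for each $n$. So 275I tells us that $\langle X_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere to a conditional expectation $X_\infty$ of $X$ on the $\sigma$-algebra $\Lambda_\infty$ generated by $\bigcup_{n\in\mathbb{N}} \Lambda_n$. Now $\Lambda_\infty \subseteq \Lambda$ and also $\bigotimes_{n\in\mathbb{N}} \Sigma_n \subseteq \Lambda_\infty$, so every member of $\Lambda$ is sandwiched between two members of $\Lambda_\infty$ of the same measure (254Ff), and $X_\infty$ must be equal to $X$ almost everywhere. Moreover, 275I also tells us that
$$\lim_{n\to\infty} \mathbb{E}(|X - X_n|) = \lim_{n\to\infty} \mathbb{E}(|X_\infty - X_n|) = 0,$$
as required.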
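275K Reverse martingales We have a result corresponding to 275I for decreasing sequences of $\sigma$-algebras. While this is used less often than 275G-275I, it does have very important applications.
Theorem Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-increasing sequence of $\sigma$-subalgebras of $\Sigma$, with intersection $\Sigma_\infty$. Let $X$ be any real-valued random variable with finite expectation, and for each $n \in \mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$. Then $X_\infty = \lim_{n\to\infty} X_n$ is defined almost everywhere and is a conditional expectation of $X$ on $\Sigma_\infty$.
proof (a) Set $H = \bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n$, so that $H$ is conegligible. For $n \in \mathbb{N}$, $a < b$ in $\mathbb{R}$, and $\omega \in H$, write $U_{abn}(\omega)$ for the number of up-crossings from $a$ to $b$ in the list $X_n(\omega), X_{n-1}(\omega), \ldots, X_0(\omega)$ (275E). Then
$$\mathbb{E}(U_{abn}) \le \frac{1}{b-a}\,\mathbb{E}((X_0 - X_n)^+)$$
(275F)
$$\le \frac{1}{b-a}\,\mathbb{E}(|X_0| + |X_n|) \le \frac{2}{b-a}\,\mathbb{E}(|X_0|) < \infty.$$
So $\lim_{n\to\infty} U_{abn}(\omega)$ is finite for almost every $\omega$. But this means that
$$\{\omega : \liminf_{n\to\infty} X_n(\omega) < a,\ \limsup_{n\to\infty} X_n(\omega) > b\}$$
is negligible. As $a$ and $b$ are arbitrary, $\langle X_n\rangle_{n\in\mathbb{N}}$ is convergent a.e., just as in 275G. Set $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$ whenever this is defined in $\mathbb{R}$.
(b) By 246D, $\{X_n : n \in \mathbb{N}\}$ is uniformly integrable, so $\mathbb{E}(|X_n - X_\infty|) \to 0$ as $n \to \infty$ (246Ja), and
$$\int_E X_\infty = \lim_{n\to\infty} \int_E X_n = \int_E X_0$$
for every $E \in \Sigma_\infty$.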
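(c) Now there is a conegligible set $G \in \Sigma_\infty$ such that $G \subseteq \operatorname{dom} X_\infty$ and $X_\infty \restriction G$ is $\Sigma_\infty$-measurable. PPP For each $n \in \mathbb{N}$, there is a conegligible set $G_n \in \Sigma_n$ such that $G_n \subseteq \operatorname{dom} X_n$ and $X_n \restriction G_n$ is $\Sigma_n$-measurable. Set $G' = \bigcup_{n\in\mathbb{N}} \bigcap_{m\ge n} G_m$; then, for any $r \in \mathbb{N}$, $G' = \bigcup_{n\ge r} \bigcap_{m\ge n} G_m$ belongs to $\Sigma_r$, so $G' \in \Sigma_\infty$, while of course $G'$ is conegligible. For $n \in \mathbb{N}$, set $X'_n(\omega) = X_n(\omega)$ for $\omega \in G_n$, $0$ for $\omega \in \Omega \setminus G_n$; then for $\omega \in G'$, $\lim_{n\to\infty} X'_n(\omega) = \lim_{n\to\infty} X_n(\omega)$ if either is defined in $\mathbb{R}$. Writing $X'_\infty = \lim_{n\to\infty} X'_n$ whenever this is defined in $\mathbb{R}$, 121F and 121H tell us that $X'_\infty$ is $\Sigma_r$-measurable and $\operatorname{dom} X'_\infty \in \Sigma_r$ for every $r \in \mathbb{N}$, so that $G'' = \operatorname{dom} X'_\infty$ belongs to $\Sigma_\infty$ and $X'_\infty$ is $\Sigma_\infty$-measurable. We also know, from (b), that $G''$ is conegligible. So setting $G = G' \cap G''$ we have the result. QQQ
Thus $X_\infty$ is a conditional expectation of $X$ on $\Sigma_\infty$.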
275L Stopping times In a sense, the main work of this section is over; I have no room for any more theorems of importance comparable to 275G-275I. However, it would be wrong to leave this chapter without briefly describing one of the most fruitful ideas of the subject.
Definition Let $(\Omega, \Sigma, \mu)$ be a probability space, with completion $(\Omega, \hat\Sigma, \hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. A stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ (also called 'optional time', 'Markov time') is a function $\tau$ from $\Omega$ to $\mathbb{N} \cup \{\infty\}$ such that $\{\omega : \tau(\omega) \le n\} \in \Sigma_n$ for every $n \in \mathbb{N}$.
Remark Of course the condition
$$\{\omega : \tau(\omega) \le n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}$$
can be replaced by the equivalent condition
$$\{\omega : \tau(\omega) = n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}.$$
I give priority to the former expression because it is more appropriate to other index sets (see 275Cc).
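275M Examples (a) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras, and $H_n$ is a Borel subset of $\mathbb{R}^{n+1}$ for each $n$, then we have a stopping time $\tau$ adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ defined by the formula
$$\tau(\omega) = \inf\{n : \omega \in \bigcap_{i\le n} \operatorname{dom} X_i,\ (X_0(\omega), \ldots, X_n(\omega)) \in H_n\},$$
setting $\inf\emptyset = \infty$ as usual. (For by 121Ka the set $E_n = \{\omega : (X_0(\omega), \ldots, X_n(\omega)) \in H_n\}$ belongs to $\Sigma_n$ for each $n$, and $\{\omega : \tau(\omega) \le n\} = \bigcup_{i\le n} E_i$.) In particular, for instance, the formulae
$$\inf\{n : X_n(\omega) \ge a\}, \qquad \inf\{n : |X_n(\omega)| > a\}$$
define stopping times.
(b) Any constant function $\tau : \Omega \to \mathbb{N} \cup \{\infty\}$ is a stopping time. If $\tau$, $\tau'$ are two stopping times adapted to the same sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras, then $\tau \wedge \tau'$ is a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, setting $(\tau \wedge \tau')(\omega) = \min(\tau(\omega), \tau'(\omega))$ for $\omega \in \Omega$.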
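275N Lemma Let $(\Omega, \Sigma, \mu)$ be a complete probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Suppose that $\tau$ and $\tau'$ are stopping times on $\Omega$, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale, all adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$.
(a) The family
$$\tilde\Sigma_\tau = \{E : E \in \Sigma,\ E \cap \{\omega : \tau(\omega) \le n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}\}$$
is a $\sigma$-subalgebra of $\Sigma$.
(b) If $\tau(\omega) \le \tau'(\omega)$ for every $\omega$, then $\tilde\Sigma_\tau \subseteq \tilde\Sigma_{\tau'}$.
(c) Now suppose that $\tau$ is finite almost everywhere. Set
$$\tilde X_\tau(\omega) = X_{\tau(\omega)}(\omega)$$
whenever $\tau(\omega) < \infty$ and $\omega \in \operatorname{dom} X_{\tau(\omega)}$. Then $\operatorname{dom} \tilde X_\tau \in \tilde\Sigma_\tau$ and $\tilde X_\tau$ is $\tilde\Sigma_\tau$-measurable.
(d) If $\tau$ is essentially bounded, that is, there is some $m \in \mathbb{N}$ such that $\tau \le m$ almost everywhere, then $\mathbb{E}(\tilde X_\tau)$ exists and is equal to $\mathbb{E}(X_0)$.
(e) If $\tau \le \tau'$ almost everywhere, and $\tau'$ is essentially bounded, then $\tilde X_\tau$ is a conditional expectation of $\tilde X_{\tau'}$ on $\tilde\Sigma_\tau$.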
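proof (a) This is elementary. Write $H_n = \{\omega : \tau(\omega) \le n\}$ for each $n \in \mathbb{N}$. The empty set belongs to $\tilde\Sigma_\tau$ because it belongs to $\Sigma_n$ for every $n$. If $E \in \tilde\Sigma_\tau$, then
$$(\Omega \setminus E) \cap H_n = H_n \setminus (E \cap H_n) \in \Sigma_n$$
because $H_n \in \Sigma_n$; this is true for every $n$, so $\Omega \setminus E \in \tilde\Sigma_\tau$. If $\langle E_k\rangle_{k\in\mathbb{N}}$ is any sequence in $\tilde\Sigma_\tau$ then
$$\Bigl(\bigcup_{k\in\mathbb{N}} E_k\Bigr) \cap H_n = \bigcup_{k\in\mathbb{N}} (E_k \cap H_n) \in \Sigma_n$$
for every $n$, so $\bigcup_{k\in\mathbb{N}} E_k \in \tilde\Sigma_\tau$.
(b) If $E \in \tilde\Sigma_\tau$ then of course $E \in \Sigma$, and if $n \in \mathbb{N}$ then $\{\omega : \tau'(\omega) \le n\} \subseteq \{\omega : \tau(\omega) \le n\}$, so that
$$E \cap \{\omega : \tau'(\omega) \le n\} = E \cap \{\omega : \tau(\omega) \le n\} \cap \{\omega : \tau'(\omega) \le n\}$$
belongs to $\Sigma_n$; as $n$ is arbitrary, $E \in \tilde\Sigma_{\tau'}$.
(c) Set $H_n = \{\omega : \tau(\omega) \le n\}$ for each $n \in \mathbb{N}$. For any $a \in \mathbb{R}$,
$$H_n \cap \{\omega : \omega \in \operatorname{dom} \tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} = \bigcup_{k\le n} \{\omega : \tau(\omega) = k,\ \omega \in \operatorname{dom} X_k,\ X_k(\omega) \le a\} \in \Sigma_n.$$
As $n$ is arbitrary,
$$G_a = \{\omega : \omega \in \operatorname{dom} \tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} \in \tilde\Sigma_\tau.$$
As $a$ is arbitrary, $\operatorname{dom} \tilde X_\tau = \bigcup_{m\in\mathbb{N}} G_m \in \tilde\Sigma_\tau$ and $\tilde X_\tau$ is $\tilde\Sigma_\tau$-measurable.
(d) Set $H_k = \{\omega : \tau(\omega) = k\}$ for $k \le m$. Then $\bigcup_{k\le m} H_k$ is conegligible, so
$$\mathbb{E}(\tilde X_\tau) = \sum_{k=0}^m \int_{H_k} X_k = \sum_{k=0}^m \int_{H_k} X_m = \int_\Omega X_m = \int_\Omega X_0.$$
(e) Suppose $\tau' \le n$ almost everywhere. Set $H_k = \{\omega : \tau(\omega) = k\}$, $H'_k = \{\omega : \tau'(\omega) = k\}$ for each $k$; then both $\langle H_k\rangle_{k\le n}$ and $\langle H'_k\rangle_{k\le n}$ are partitions of conegligible subsets of $\Omega$. Now suppose that $E \in \tilde\Sigma_\tau$. Then
$$\int_E \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} X_k = \sum_{k=0}^n \int_{E\cap H_k} X_n = \int_E X_n$$
because $E \cap H_k \in \Sigma_k$ for every $k$. By (b), $E \in \tilde\Sigma_{\tau'}$, so we also have $\int_E \tilde X_{\tau'} = \int_E X_n$. Thus $\int_E \tilde X_\tau = \int_E \tilde X_{\tau'}$ for every $E \in \tilde\Sigma_\tau$, as claimed.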
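Part (d) of 275N says that a bounded stopping time cannot change the expectation. As an informal check (added here as a sketch, with the fair $\pm 1$ walk from $X_0 = 0$ and the hitting time of a level as hypothetical choices), the following Python code computes $\mathbb{E}(\tilde X_{\tau \wedge m})$ exactly by enumerating paths; `None` stands in for the value $\infty$ of $\tau$.

```python
from fractions import Fraction
from itertools import product


def walk(steps):
    """Return [X_0, ..., X_m] for X_0 = 0 and the given +-1 increments."""
    path = [0]
    for s in steps:
        path.append(path[-1] + s)
    return path


def hitting_time(path, level):
    """tau = inf{n : X_n >= level}; None stands for infinity (level never reached)."""
    for n, x in enumerate(path):
        if x >= level:
            return n
    return None


def expected_stopped_value(m, level):
    """E(X_{tau ^ m}) for the fair +-1 walk, over all 2^m equally likely paths."""
    weight = Fraction(1, 2 ** m)
    total = Fraction(0)
    for steps in product((-1, 1), repeat=m):
        path = walk(steps)
        tau = hitting_time(path, level)
        stop = m if tau is None else min(tau, m)   # the bounded stopping time tau ^ m
        total += weight * path[stop]
    return total
```

Since $\mathbb{E}(X_0) = 0$ in this model, the lemma predicts that `expected_stopped_value(m, level)` is zero for every `m` and every positive `level`, however the level is chosen.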
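275O Proposition Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale and $\tau$ a stopping time, both adapted to the same sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras. For each $n$, set $(\tau \wedge n)(\omega) = \min(\tau(\omega), n)$ for $\omega \in \Omega$; then $\tau \wedge n$ is a stopping time, and $\langle\tilde X_{\tau\wedge n}\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_{\tau\wedge n}\rangle_{n\in\mathbb{N}}$, defining $\tilde X_{\tau\wedge n}$ and $\tilde\Sigma_{\tau\wedge n}$ as in 275N.
proof As remarked in 275Mb, each $\tau \wedge n$ is a stopping time. If $m \le n$, then $\tilde\Sigma_{\tau\wedge m} \subseteq \tilde\Sigma_{\tau\wedge n}$ by 275Nb. Each $\tilde X_{\tau\wedge m}$ is $\tilde\Sigma_{\tau\wedge m}$-measurable, with domain belonging to $\tilde\Sigma_{\tau\wedge m}$, by 275Nc, and has finite expectation, by 275Nd; finally, if $m \le n$, then $\tilde X_{\tau\wedge m}$ is a conditional expectation of $\tilde X_{\tau\wedge n}$ on $\tilde\Sigma_{\tau\wedge m}$, by 275Ne.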
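275P Corollary Suppose that $(\Omega, \Sigma, \mu)$ is a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $\Omega$ such that $W = \sup_{n\in\mathbb{N}} |X_{n+1} - X_n|$ is finite almost everywhere and has finite expectation. Then for almost every $\omega \in \Omega$, either $\lim_{n\to\infty} X_n(\omega)$ exists in $\mathbb{R}$ or $\sup_{n\in\mathbb{N}} X_n(\omega) = \infty$ and $\inf_{n\in\mathbb{N}} X_n(\omega) = -\infty$.
proof Let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\sigma$-algebras to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. Let $H$ be the conegligible set $\bigcap_{n\in\mathbb{N}} \operatorname{dom} X_n \cap \{\omega : W(\omega) < \infty\}$. For each $m \in \mathbb{N}$, set
$$\tau_m(\omega) = \inf\{n : \omega \in \operatorname{dom} X_n,\ X_n(\omega) > m\}.$$
As in 275Ma, $\tau_m$ is a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Set
$$Y_{mn} = \tilde X_{\tau_m \wedge n},$$
defined as in 275Nc, so that $\langle Y_{mn}\rangle_{n\in\mathbb{N}}$ is a martingale, by 275O. If $\omega \in H$, then either $\tau_m(\omega) > n$ and
$$Y_{mn}(\omega) = X_n(\omega) \le m,$$
or $0 < \tau_m(\omega) \le n$ and
$$Y_{mn}(\omega) = X_{\tau_m(\omega)}(\omega) \le W(\omega) + X_{\tau_m(\omega)-1}(\omega) \le W(\omega) + m,$$
or $\tau_m(\omega) = 0$ and
$$Y_{mn}(\omega) = X_0(\omega).$$
Thus
$$Y_{mn}(\omega) \le |X_0(\omega)| + W(\omega) + m$$
for every $\omega \in H$, and
$$|Y_{mn}(\omega)| = 2\max(0, Y_{mn}(\omega)) - Y_{mn}(\omega) \le 2(|X_0(\omega)| + W(\omega) + m) - Y_{mn}(\omega),$$
$$\mathbb{E}(|Y_{mn}|) \le 2\mathbb{E}(|X_0|) + 2\mathbb{E}(W) + 2m - \mathbb{E}(Y_{mn}) = 2\mathbb{E}(|X_0|) + 2\mathbb{E}(W) + 2m - \mathbb{E}(X_0)$$
by 275Nd. As this is true for every $n \in \mathbb{N}$, $\sup_{n\in\mathbb{N}} \mathbb{E}(|Y_{mn}|) < \infty$, and $\lim_{n\to\infty} Y_{mn}$ is defined in $\mathbb{R}$ almost everywhere, by Doob's Martingale Convergence Theorem (275G). Let $F_m$ be the conegligible set on which $\langle Y_{mn}\rangle_{n\in\mathbb{N}}$ converges. Set $H^* = H \cap \bigcap_{m\in\mathbb{N}} F_m$, so that $H^*$ is conegligible.
Now consider
$$E = \{\omega : \omega \in H^*,\ \sup_{n\in\mathbb{N}} X_n(\omega) < \infty\}.$$
For any $\omega \in E$, there must be an $m \in \mathbb{N}$ such that $\sup_{n\in\mathbb{N}} X_n(\omega) \le m$. Now this means that $Y_{mn}(\omega) = X_n(\omega)$ for every $n$, and as $\omega \in F_m$ we have
$$\lim_{n\to\infty} X_n(\omega) = \lim_{n\to\infty} Y_{mn}(\omega) \in \mathbb{R}.$$
This means that $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$ such that $\{X_n(\omega) : n \in \mathbb{N}\}$ is bounded above. Similarly, $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$ such that $\{X_n(\omega) : n \in \mathbb{N}\}$ is bounded below, which completes the proof.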
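275X Basic exercises >>>(a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables with zero expectation and finite variance. Set $s_n = (\sum_{i=0}^n \operatorname{Var}(X_i))^{1/2}$, $Y_n = (X_0 + \ldots + X_n)^2 - s_n^2$ for each $n$. Show that $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale.
>>>(b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale. Show that for any $\epsilon > 0$, $\Pr(\sup_{n\in\mathbb{N}} |X_n| \ge \epsilon) \le \frac{1}{\epsilon} \sup_{n\in\mathbb{N}} \mathbb{E}(|X_n|)$.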
(c) Pólya's urn scheme Imagine a box containing red and white balls. At each move, a ball is drawn at random from the box and replaced together with another of the same colour. (i) Writing $R_n$, $W_n$ for the numbers of red and white balls after the $n$th move and $X_n = R_n/(R_n + W_n)$, show that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale. (ii) Starting from $R_0 = W_0 = 1$, find the distribution of $(R_n, W_n)$ for each $n$. (iii) Show that $X = \lim_{n\to\infty} X_n$ is defined almost everywhere, and find its distribution when $R_0 = W_0 = 1$. (See Feller 66 for a discussion of other starting values.)
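For readers who like to experiment before attempting 275Xc, the urn can be followed exactly for small $n$. The Python sketch below is an added aid, not a solution: it propagates the distribution of $(R_n, W_n)$ with rational arithmetic and evaluates the one-step conditional mean of the red fraction from a given state, so that a conjectured answer to (i) or (ii) can be tested. The function names are invented for the example.

```python
from fractions import Fraction


def step_distribution(dist):
    """Apply one move of the urn to a distribution {(red, white): probability}."""
    new = {}
    for (r, w), p in dist.items():
        total = r + w
        # A red ball is drawn with probability r/total, a white one with w/total.
        new[(r + 1, w)] = new.get((r + 1, w), 0) + p * Fraction(r, total)
        new[(r, w + 1)] = new.get((r, w + 1), 0) + p * Fraction(w, total)
    return new


def urn_distribution(r0, w0, n):
    """Exact distribution of (R_n, W_n) starting from (r0, w0)."""
    dist = {(r0, w0): Fraction(1)}
    for _ in range(n):
        dist = step_distribution(dist)
    return dist


def conditional_mean_next_fraction(r, w):
    """E(X_{n+1} | R_n = r, W_n = w), where X = R / (R + W)."""
    total = r + w
    return (Fraction(r, total) * Fraction(r + 1, total + 1)
            + Fraction(w, total) * Fraction(r, total + 1))
```

Comparing `conditional_mean_next_fraction(r, w)` with `Fraction(r, r + w)` for a few states, and inspecting `urn_distribution(1, 1, n)` for small `n`, may suggest what has to be proved.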
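>>>(d) Let $(\Omega, \Sigma, \mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$; for each $n \in \mathbb{N}$ let $P_n : L^1 \to L^1$ be the conditional expectation operator corresponding to $\Sigma_n$, where $L^1 = L^1(\mu)$ (242J). (i) Show that $V = \{u : u \in L^1,\ \lim_{n\to\infty} \|P_n u - u\|_1 = 0\}$ is a $\|\ \|_1$-closed linear subspace of $L^1$. (ii) Show that $\{E : E \in \Sigma,\ \chi E^\bullet \in V\}$ is a Dynkin class including $\bigcup_{n\in\mathbb{N}} \Sigma_n$, so includes the $\sigma$-algebra $\Sigma_\infty$ generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. (iii) Show that if $u \in L^1$ then $v = \sup_{n\in\mathbb{N}} P_n|u|$ is defined in $L^1$ and is of the form $W^\bullet$ where $\Pr(W \ge \epsilon) \le \frac{1}{\epsilon}\|u\|_1$ for every $\epsilon > 0$. (Hint: 275D.) (iv) Show that if $X$ is a $\Sigma_\infty$-measurable random variable with finite expectation, and for each $n \in \mathbb{N}$ $X_n$ is a conditional expectation of $X$ on $\Sigma_n$, then $X^\bullet \in V$ and $X =_{\text{a.e.}} \lim_{n\to\infty} X_n$. (Hint: apply (iii) to $u = (X - X_m)^\bullet$ for large $m$.)
(e) Let $(\Omega, \Sigma, \mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$, and $\Sigma_\infty$ the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$. For each $n \in \mathbb{N}$ let $P_n : L^1 \to L^1$ be the conditional expectation operator corresponding to $\Sigma_n$, where $L^1 = L^1(\mu)$. Show that $\lim_{n\to\infty} \|P_n u - u\|_p = 0$ whenever $p \in [1, \infty[$ and $u \in L^p(\mu \restriction \Sigma_\infty)$. (Hint: 275Xd, 233J/242K, 246Xg.)
(f) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale, and suppose that $p \in \,]1, \infty[$ is such that $\sup_{n\in\mathbb{N}} \|X_n\|_p < \infty$. Show that $X = \lim_{n\to\infty} X_n$ is defined almost everywhere and that $\lim_{n\to\infty} \|X_n - X\|_p = 0$.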
>>>(g) Let $(\Omega, \Sigma, \mu)$ be $[0, 1]$ with Lebesgue measure. For each $n \in \mathbb{N}$ let $\Sigma_n$ be the finite subalgebra of $\Sigma$ generated by intervals of the type $[0, 2^{-n}r]$ for $r \le 2^n$. Use 275I to show that for any integrable $X : [0, 1] \to \mathbb{R}$ we must have $X(t) = \lim_{n\to\infty} 2^n \int_{I_n(t)} X$ for almost every $t \in [0, 1[$, where $I_n(t)$ is the interval of the form $[2^{-n}r, 2^{-n}(r + 1)[$ containing $t$. Compare this result with 223A and 261Yd.
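The dyadic averages in 275Xg are easy to compute for a concrete function. As a hedged illustration (the choice $X(t) = t^2$ is an assumption made only for this sketch, and the function name is invented), the following Python code evaluates $2^n \int_{I_n(t)} X$ exactly from the antiderivative $t^3/3$, so that the convergence to $X(t)$ can be observed for a rational $t$.

```python
from fractions import Fraction
import math


def dyadic_average_of_square(t, n):
    """2^n times the integral of s^2 over the dyadic interval
    [r/2^n, (r+1)/2^n[ containing t, for a rational t in [0, 1[."""
    r = math.floor(t * 2 ** n)                       # index of the dyadic interval
    lo, hi = Fraction(r, 2 ** n), Fraction(r + 1, 2 ** n)
    return (hi ** 3 - lo ** 3) / 3 * 2 ** n          # exact integral divided by the length
```

For this continuous example the average over an interval of length $2^{-n}$ differs from $t^2$ by at most $2 \cdot 2^{-n}$; the interest of the exercise is of course that convergence holds almost everywhere for every integrable $X$.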
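(h) In 275K, show that $\lim_{n\to\infty} \|X_n - X_\infty\|_p = 0$ for any $p \in [1, \infty[$ such that $\|X_0\|_p$ is finite. (Compare 275Xe.)
(i) Let $(\Omega, \Sigma, \mu)$ be a probability space, with completion $(\Omega, \hat\Sigma, \hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a uniformly integrable martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, and set $X_\infty = \lim_{n\to\infty} X_n$. Let $\tau$ be a stopping time adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, and set $\tilde X_\tau(\omega) = X_{\tau(\omega)}(\omega)$ whenever $\omega \in \operatorname{dom} X_{\tau(\omega)}$, allowing $\infty$ as a value of $\tau(\omega)$. Show that $\tilde X_\tau$ is a conditional expectation of $X_\infty$ on $\tilde\Sigma_\tau$, as defined in 275N.
(j) Let $(\Omega, \Sigma, \mu)$ be a probability space, with completion $(\Omega, \hat\Sigma, \hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale and $\tau$ a stopping time, both adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Suppose that $\sup_{n\in\mathbb{N}} \mathbb{E}(|X_n|) < \infty$ and that $\tau$ is finite almost everywhere. Show that $\tilde X_\tau$, as defined in 275Nc, has finite expectation, but that $\mathbb{E}(\tilde X_\tau)$ need not be equal to $\mathbb{E}(X_0)$.
(k)(i) Find a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\langle X_{2n}\rangle_{n\in\mathbb{N}} \to 0$ a.e. but $|X_{2n+1}| \ge 1$ a.e. for every $n \in \mathbb{N}$. (ii) Find a martingale which converges in measure but is not convergent a.e.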
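275Y Further exercises (a) Let $(\Omega, \Sigma, \mu)$ be a complete probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$ all containing every negligible set, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Let $\nu$ be another probability measure with domain $\Sigma$ which is absolutely continuous with respect to $\mu$, with Radon-Nikodým derivative $Z$. For each $n \in \mathbb{N}$ let $Z_n$ be a conditional expectation of $Z$ on $\Sigma_n$ (with respect to the measure $\mu$). (i) Show that $Z_n$ is a Radon-Nikodým derivative of $\nu \restriction \Sigma_n$ with respect to $\mu \restriction \Sigma_n$. (ii) Set $W_n(\omega) = Z_n(\omega)/Z_{n-1}(\omega)$ if this is defined in $\mathbb{R}$, otherwise $0$. For $n \ge 1$, let $V_n$ be a conditional expectation of $W_n \times (X_n - X_{n-1})$ on $\Sigma_{n-1}$ (with respect to the measure $\mu$). Set $Y_0 = X_0$, $Y_n = X_n - \sum_{k=1}^n V_k$ for $n \ge 1$. Show that $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ with respect to the measure $\nu$.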
(b) Combine the ideas of 275Cc with those of 275Cd-275Ce to describe a notion of martingale indexed by I, where
I is an arbitrary partially ordered set.
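(c) Let $\langle X_k\rangle_{k\in\mathbb{N}}$ be a martingale on a complete probability space $(\Omega, \Sigma, \mu)$, and fix $n \in \mathbb{N}$. Set $X^* = \max(|X_0|, \ldots, |X_n|)$. Let $p \in \,]1, \infty[$. Show that $\|X^*\|_p \le \frac{p}{p-1}\|X_n\|_p$. (Hint: set $F_t = \{\omega : X^*(\omega) \ge t\}$. Show that $t\mu F_t \le \int_{F_t} |X_n|$. Using Fubini's theorem on $\Omega \times [0, \infty[$ and on $\Omega \times [0, \infty[ \times [0, \infty[$, show that
$$\mathbb{E}((X^*)^p) = p \int_0^\infty t^{p-1} \mu F_t\,dt,$$
$$\int_0^\infty t^{p-2} \int_{F_t} |X_n|\,dt = \frac{1}{p-1}\,\mathbb{E}(|X_n| \times (X^*)^{p-1}),$$
$$\mathbb{E}(|X_n| \times (X^*)^{p-1}) \le \|X_n\|_p \|X^*\|_p^{p-1}.$$
Compare 286A.)
(d) Let $\langle X_k\rangle_{k\in\mathbb{N}}$ be a martingale on a complete probability space $(\Omega, \Sigma, \mu)$, and fix $n \in \mathbb{N}$. Set $X^* = \max(|X_0|, \ldots, |X_n|)$, $F_t = \{\omega : X^*(\omega) \ge t\}$, $G_t = \{\omega : |X_n(\omega)| \ge \frac{1}{2}t\}$ for $t \ge 0$. (i) Show that $t\mu F_t \le 2\int_{G_t} |X_n|$ for every $t \ge 0$. (ii) Show that $\mathbb{E}(X^*) \le 1 + 2\ln 2\,\mathbb{E}(|X_n|) + 2\mathbb{E}(|X_n| \times \ln^+ |X_n|)$, where $\ln^+ t = \ln t$ for $t \ge 1$, $0$ for $t \in [0, 1]$.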
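(e) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle\Sigma_i\rangle_{i\in I}$ a countable family of $\sigma$-subalgebras of $\Sigma$ such that for any $i$, $j \in I$ either $\Sigma_i \subseteq \Sigma_j$ or $\Sigma_j \subseteq \Sigma_i$. Let $X$ be a real-valued random variable on $\Omega$ such that $\|X\|_p < \infty$, where $1 < p < \infty$, and suppose that $X_i$ is a conditional expectation of $X$ on $\Sigma_i$ for each $i \in I$. Show that $\|\sup_{i\in I} |X_i|\|_p \le \frac{p}{p-1}\|X\|_p$.
(f) Let $(\Omega, \Sigma, \mu)$ be a probability space, with completion $(\Omega, \hat\Sigma, \hat\mu)$, and let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of $\mu$-integrable real-valued functions such that $\operatorname{dom} X_n \in \Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each $n \in \mathbb{N}$. We say that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a submartingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ (also called 'semi-martingale') if $\int_E X_{n+1} \ge \int_E X_n$ for every $n \in \mathbb{N}$ and every $E \in \Sigma_n$. Prove versions of 275D, 275F, 275G, 275Xf for submartingales.
(g) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale, and $\phi : \mathbb{R} \to \mathbb{R}$ a convex function. Show that $\langle\phi(X_n)\rangle_{n\in\mathbb{N}}$ is a submartingale. (Hint: 233J.) Re-examine part (b-ii) of the proof of 275F in the light of this fact.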
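(h) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of non-negative random variables all with expectation 1. Set $W_n = X_0 \times \ldots \times X_n$ for every $n$. (i) Show that $W = \lim_{n\to\infty} W_n$ is defined a.e. (ii) Show that $\mathbb{E}(W)$ is either $0$ or $1$. (Hint: suppose $\mathbb{E}(W) > 0$. Set $Z_n = \lim_{m\to\infty} X_n \times \ldots \times X_m$. Show that $\lim_{n\to\infty} Z_n = 1$ when $0 < W < \infty$, therefore a.e., by the zero-one law, while $\mathbb{E}(Z_n) \le 1$, by Fatou's lemma, so $\lim_{n\to\infty} \mathbb{E}(Z_n) = 1$, while $\mathbb{E}(W) = \mathbb{E}(W_n)\mathbb{E}(Z_{n+1})$ for every $n$.) (iii) Set $\gamma = \prod_{n=0}^\infty \mathbb{E}(\sqrt{X_n})$. Show that $\gamma > 0$ iff $\mathbb{E}(W) = 1$. (Hint: $\Pr(W_n \ge \frac{1}{4}\gamma^2) \ge \frac{1}{4}\gamma^2$ for every $n$, so if $\gamma > 0$ then $W$ cannot be zero a.e.; while $\mathbb{E}(\sqrt{W}) \le \gamma$.)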
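(i) Let $\langle(\Omega_n, \Sigma_n, \mu_n)\rangle_{n\in\mathbb{N}}$ be a sequence of probability spaces with product $(\Omega, \Lambda, \lambda)$. Suppose that for each $n \in \mathbb{N}$ we have a probability measure $\nu_n$, with domain $\Sigma_n$, which is absolutely continuous with respect to $\mu_n$, with Radon-Nikodým derivative $f_n$, and suppose that $\prod_{n=0}^\infty \int \sqrt{f_n}\,d\mu_n > 0$. Let $\nu$ be the product of $\langle\nu_n\rangle_{n\in\mathbb{N}}$. Show that $\nu$ is an indefinite-integral measure over $\lambda$, with Radon-Nikodým derivative $f$, where $f(\omega) = \prod_{n=0}^\infty f_n(\omega_n)$ for $\lambda$-almost every $\omega = \langle\omega_n\rangle_{n\in\mathbb{N}}$ in $\Omega$. (Hint: use 275Yh to show that $\int f\,d\lambda = 1$.)
(j) Let $\langle p_n\rangle_{n\in\mathbb{N}}$ be a sequence in $[0, 1]$. Let $\mu$ be the usual measure on $\{0, 1\}^{\mathbb{N}}$ (254J) and $\nu$ the product of $\langle\nu_n\rangle_{n\in\mathbb{N}}$, where $\nu_n$ is the probability measure on $\{0, 1\}$ defined by setting $\nu_n\{1\} = p_n$. Show that $\nu$ is an indefinite-integral measure over $\mu$ iff $\sum_{n=0}^\infty |p_n - \frac{1}{2}|^2 < \infty$.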
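(k) Let $(\Omega, \Sigma, \mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$ and $\langle X_n\rangle_{n\in\mathbb{N}}$ a sequence of random variables on $\Omega$ such that $\mathbb{E}(\sup_{n\in\mathbb{N}} |X_n|)$ is finite and $X = \lim_{n\to\infty} X_n$ is defined almost everywhere. For each $n$, let $Y_n$ be a conditional expectation of $X_n$ on $\Sigma_n$. Show that $\langle Y_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere to a conditional expectation of $X$ on the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \Sigma_n$.
(l) Show that 275Yk can fail if $\langle X_n\rangle_{n\in\mathbb{N}}$ is merely uniformly integrable, rather than dominated by an integrable function.
(m) Let $(\Omega, \Sigma, \mu)$ be a probability space, $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ an independent sequence of $\sigma$-subalgebras of $\Sigma$, and $X$ a random variable on $\Omega$ with finite variance. Let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$ for each $n$. Show that $\lim_{n\to\infty} X_n = \mathbb{E}(X)$ almost everywhere. (Hint: consider $\sum_{n=0}^\infty \operatorname{Var}(X_n)$.)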
(n) Let $(\Omega, \Sigma, \mu)$ be a complete probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables on $\Omega$, all with the same distribution, and of finite expectation. For each $n$, set $S_n = \frac{1}{n+1}(X_0 + \ldots + X_n)$; let $\Sigma_n$ be the $\sigma$-algebra defined by $S_n$ and $\Sigma^*_n$ the $\sigma$-algebra generated by $\bigcup_{m\ge n} \Sigma_m$. Show that $S_n$ is a conditional expectation of $X_0$ on $\Sigma^*_n$. (Hint: assume every $X_i$ defined everywhere on $\Omega$. Set $\phi(\omega) = \langle X_i(\omega)\rangle_{i\in\mathbb{N}}$. Show that $\phi : \Omega \to \mathbb{R}^{\mathbb{N}}$ is inverse-measure-preserving for a suitable product measure on $\mathbb{R}^{\mathbb{N}}$, and that every set in $\Sigma^*_n$ is of the form $\phi^{-1}[H]$ where $H \subseteq \mathbb{R}^{\mathbb{N}}$ is a Borel set invariant under permutations of coordinates in the set $\{0, \ldots, n\}$, so that $\int_E X_i = \int_E X_j$ whenever $i \le j \le n$ and $E \in \Sigma^*_n$.) Hence show that $\langle S_n\rangle_{n\in\mathbb{N}}$ converges almost everywhere. (Compare 273I.)
(o) Formulate and prove versions of the results of this chapter for martingales consisting of functions taking values in $\mathbb{C}$ or $\mathbb{R}^r$ rather than $\mathbb{R}$.
(p)(i) Find a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ which is convergent in measure, but is not convergent a.e. (Compare 272Yd.) (ii) Find a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ such that the sequence $\langle\nu_{X_n}\rangle_{n\in\mathbb{N}}$ of distributions (271C) is convergent for the vague topology (274Ld), but $\langle X_n\rangle_{n\in\mathbb{N}}$ is not convergent in measure.
(q) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables such that $\sum_{n=0}^\infty X_n$ is defined in $\mathbb{R}$ almost everywhere. Suppose that there is an $M \ge 0$ such that $|X_n| \le M$ a.e. for every $n$. Show that $\sum_{n=0}^\infty \mathbb{E}(X_n)$ is defined in $\mathbb{R}$. (Hint: 274Ye, 275G.)
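(r) Let $(\Omega, \Sigma, \mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of real-valued random variables on $\Omega$; set $E_n = \{\omega : \omega \in \operatorname{dom} X_n,\ |X_n(\omega)| > 1\}$, $Y_n = X_n \times \chi(\Omega \setminus E_n)$ for each $n$, and $Z_n(\omega) = \operatorname{med}(-1, X_n(\omega), 1)$ for $n \in \mathbb{N}$ and $\omega \in \operatorname{dom} X_n$. Show that the following are equiveridical: (i) $\sum_{n=0}^\infty X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$; (ii) $\sum_{n=0}^\infty \hat\mu E_n < \infty$, $\sum_{n=0}^\infty \mathbb{E}(Y_n)$ is defined in $\mathbb{R}$, and $\sum_{n=0}^\infty \operatorname{Var}(Y_n) < \infty$; (iii) $\sum_{n=0}^\infty \hat\mu E_n < \infty$, $\sum_{n=0}^\infty \mathbb{E}(Z_n)$ is defined in $\mathbb{R}$, and $\sum_{n=0}^\infty \operatorname{Var}(Z_n) < \infty$. (Hint: 273K, 275Yq.) (This is a version of the 'Three Series Theorem'.)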
275 Notes and comments I hope that the sketch above, though distressingly abbreviated, has suggested some of
the richness of the concepts involved, and will provide a foundation for further study. All the theorems of this section
have far-reaching implications, but the one which is simply indispensable in advanced measure theory is 275I, Lévy's
martingale convergence theorem, which I will use in the proof of the Lifting Theorem in Chapter 34 of the next volume.
As for stopping times, I mention them partly in an attempt to cast further light on what martingales are for (see
276Ed below), and partly because the ideas of 275N-275O are so important in modern probability theory that, just
as a matter of general knowledge, you should be aware that there is something there. I add 275P as one of the most
accessible of the standard results which may be obtained by this method.
276 Martingale difference sequences
Hand in hand with the concept of 'martingale' is that of 'martingale difference sequence' (276A), a direct generalization of the notion of 'independent sequence'. In this section I collect results which can be naturally expressed in terms of difference sequences, including yet another strong law of large numbers (276C). I end the section with a proof of Komlós's theorem (276H).
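276A Martingale difference sequences (a) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras, then we have
$$\int_E X_{n+1}-X_n=0$$
whenever $E\in\Sigma_n$. Let us say that if $(\Omega,\Sigma,\mu)$ is a probability space, with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$, then a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of real-valued random variables on $\Omega$, all with finite expectation, such that (i) $\mathrm{dom}\,X_n\in\Sigma_n$ and $X_n$ is $\Sigma_n$-measurable, for each $n\in\mathbb{N}$ (ii) $\int_E X_{n+1}=0$ whenever $n\in\mathbb{N}$, $E\in\Sigma_n$.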
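(b) Evidently $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ iff $\langle\sum_{i=0}^{n}X_i\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$.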
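(c) Just as in 275Cd, we can say that a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ is in itself a martingale difference sequence if $\langle\sum_{i=0}^{n}X_i\rangle_{n\in\mathbb{N}}$ is a martingale, that is, if $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$, where $\tilde\Sigma_n$ is the $\sigma$-algebra generated by $\bigcup_{i\le n}\Sigma_{X_i}$.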
(d) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence then $\langle a_nX_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence for any real $a_n$.
(e) If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence and $X'_n=_{\mathrm{a.e.}}X_n$ for every $n$, then $\langle X'_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence. (Compare 275Ce.)
(f) Of course the most important example of a martingale difference sequence is that of 275Bb: any independent sequence of random variables with zero expectation is a martingale difference sequence. It turns out that some of the theorems of §273 concerning such independent sequences may be generalized to martingale difference sequences.
276B Proposition Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence such that $\sum_{n=0}^{\infty}E(X_n^2)<\infty$. Then $\sum_{n=0}^{\infty}X_n$ is defined, and finite, almost everywhere.
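proof (a) Let $(\Omega,\Sigma,\mu)$ be the underlying probability space, $(\Omega,\hat\Sigma,\hat\mu)$ its completion, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$ such that $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Set $Y_n=\sum_{i=0}^{n}X_i$ for each $n\in\mathbb{N}$. Then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$.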
(b) $E(Y_n\times X_{n+1})=0$ for each $n$. $\mathbf{P}$ $Y_n$ is a sum of random variables with finite variance, so $E(Y_n^2)<\infty$, by 244Ba; it follows that $Y_n\times X_{n+1}$ has finite expectation, by 244Eb. Because the constant function $\mathbf{0}$ is a conditional expectation of $X_{n+1}$ on $\Sigma_n$,
$$E(Y_n\times X_{n+1})=E(Y_n\times\mathbf{0})=0,$$
by 242L. $\mathbf{Q}$
(c) It follows that $E(Y_n^2)=\sum_{i=0}^{n}E(X_i^2)$ for every $n$. $\mathbf{P}$ Induce on $n$. For the inductive step, we have
$$E(Y_{n+1}^2)=E(Y_n^2+2Y_n\times X_{n+1}+X_{n+1}^2)=E(Y_n^2)+E(X_{n+1}^2)$$
because, by (b), $E(Y_n\times X_{n+1})=0$. $\mathbf{Q}$
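(d) Of course
$$E(|Y_n|)=\int|Y_n|\times\chi\Omega\le\|Y_n\|_2\,\|\chi\Omega\|_2=\sqrt{E(Y_n^2)},$$
so
$$\sup_{n\in\mathbb{N}}E(|Y_n|)\le\sup_{n\in\mathbb{N}}\sqrt{E(Y_n^2)}=\sqrt{\sum_{i=0}^{\infty}E(X_i^2)}<\infty.$$
By 275G, $\lim_{n\to\infty}Y_n$ is defined and finite almost everywhere, that is, $\sum_{i=0}^{\infty}X_i$ is defined and finite almost everywhere.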
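The orthogonality in (b) and the additivity of second moments in (c) can be checked by brute force on a small discrete model. The following Python sketch is an illustration only, under assumptions of my own: the outcomes are $n$ fair $\pm1$ coin flips, and the rule `stake` and all the function names are hypothetical. It builds differences $X_k=Z_k\times W_k$ in which the stake $Z_k$ depends only on earlier flips (the construction discussed in 276Eb), and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def stake(history):
    """Hypothetical predictable stake: depends only on the flips seen so far."""
    return 1 + sum(1 for w in history if w < 0)  # raise the stake after each loss

def differences(flips):
    """X_k = Z_k * W_k, where Z_k is determined by the flips before the k-th."""
    return [stake(flips[:k]) * flips[k] for k in range(len(flips))]

def expectation(n, func):
    """Exact expectation of func(flips) over n independent fair +-1 flips."""
    total = Fraction(0)
    for flips in product((1, -1), repeat=n):
        total += Fraction(func(flips), 2 ** n)
    return total

def check_orthogonality(n):
    """E(Y * X_last), with Y the sum of the earlier differences; (b) says 0."""
    def cross(flips):
        xs = differences(flips)
        return sum(xs[:-1]) * xs[-1]
    return expectation(n, cross)

def second_moments(n):
    """Return (E(Y^2), sum of E(X_i^2)); (c) says the two agree."""
    lhs = expectation(n, lambda f: sum(differences(f)) ** 2)
    rhs = sum(expectation(n, lambda f, i=i: differences(f)[i] ** 2)
              for i in range(n))
    return lhs, rhs
```

Calling `check_orthogonality(5)` and `second_moments(5)` enumerates all 32 outcomes. The differences here are not independent, since later stakes depend on earlier results, which is the point of working with martingale difference sequences rather than independent ones.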
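276C The strong law of large numbers: fourth form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence, and suppose that $\langle b_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $]0,\infty[$, diverging to $\infty$, such that $\sum_{n=0}^{\infty}\frac{1}{b_n^2}\mathrm{Var}(X_n)<\infty$. Then
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^{n}X_i=0$$
almost everywhere.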
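proof (Compare 273D.) As usual, write $(\Omega,\Sigma,\mu)$ for the underlying probability space. Set $\tilde X_n=\frac{1}{b_n}X_n$ for each $n$; then $\langle\tilde X_n\rangle_{n\in\mathbb{N}}$ also is a martingale difference sequence, and
$$\sum_{n=1}^{\infty}E(\tilde X_n^2)=\sum_{n=1}^{\infty}\frac{1}{b_n^2}\mathrm{Var}(X_n)<\infty.$$
By 276B, $\langle\tilde X_n(\omega)\rangle_{n\in\mathbb{N}}$ is summable for almost every $\omega\in\Omega$. But by 273Cb,
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^{n}X_i(\omega)=\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^{n}b_i\tilde X_i(\omega)=0$$
for all such $\omega$. So we have the result.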
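The last step of this proof is purely deterministic: as it is used here, 273Cb says that if $\sum_i\tilde x_i$ converges and $\langle b_n\rangle$ is non-decreasing and diverges to $\infty$, then $\frac{1}{b_n}\sum_{i\le n}b_i\tilde x_i\to0$. The following Python sketch illustrates this numerically. The particular sequences (an alternating series and $b_i=i+1$) and the function names are choices of mine, not taken from the text.

```python
import math

def kronecker_ratio(x_tilde, b, n):
    """Return (1 / b(n)) * sum_{i <= n} b(i) * x_tilde(i)."""
    return sum(b(i) * x_tilde(i) for i in range(n + 1)) / b(n)

def x_tilde(i):
    # A summable (but not absolutely summable) sequence: an alternating series.
    return (-1) ** i / math.sqrt(i + 1)

def b(i):
    # Non-decreasing positive weights diverging to infinity.
    return float(i + 1)

# The weighted averages shrink as n grows, although sum b(i) * x_tilde(i)
# itself oscillates with growing amplitude.
ratios = [abs(kronecker_ratio(x_tilde, b, n)) for n in (10, 100, 1000, 10000)]
```

In this example the weighted partial sums grow roughly like $\sqrt n$, so the ratios decay only like $n^{-1/2}$; the lemma gives convergence but no rate.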
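276D Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale such that $b_n=E(X_n^2)$ is finite for each $n$.
(a) If $\sup_{n\in\mathbb{N}}b_n$ is infinite, then $\lim_{n\to\infty}\frac{1}{b_n}X_n=0$ a.e.
(b) If $\sup_{n\ge1}\frac{1}{n}b_n<\infty$, then $\lim_{n\to\infty}\frac{1}{n}X_n=0$ a.e.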
proof Consider the martingale difference sequence $\langle Y_n\rangle_{n\in\mathbb{N}}=\langle X_{n+1}-X_n\rangle_{n\in\mathbb{N}}$. Then $E(Y_n\times X_n)=0$, so $E(Y_n^2)+E(X_n^2)=E(X_{n+1}^2)$ for each $n$. In particular, $\langle b_n\rangle_{n\in\mathbb{N}}$ must be non-decreasing.
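(a) If $\lim_{n\to\infty}b_n=\infty$, take $m$ such that $b_m>0$; then
$$\sum_{n=m}^{\infty}\frac{1}{b_{n+1}^2}\mathrm{Var}(Y_n)=\sum_{n=m}^{\infty}\frac{1}{b_{n+1}^2}(b_{n+1}-b_n)\le\int_{b_m}^{\infty}\frac{1}{t^2}\,dt<\infty.$$
By 276C (modifying $b_i$ for $i<m$, if necessary),
$$\lim_{n\to\infty}\frac{1}{b_n}X_n=\lim_{n\to\infty}\frac{1}{b_{n+1}}\Bigl(X_0+\sum_{i=0}^{n}Y_i\Bigr)=\lim_{n\to\infty}\frac{1}{b_{n+1}}\sum_{i=0}^{n}Y_i=0$$
almost everywhere.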
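(b) If $\gamma=\sup_{n\ge1}\frac{1}{n}b_n<\infty$, then $\frac{1}{(n+1)^2}\le\min(1,\gamma^2/t^2)$ for $b_n<t\le b_{n+1}$, so
$$\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}(b_{n+1}-b_n)\le\gamma+\gamma^2\int_{\gamma}^{\infty}\frac{1}{t^2}\,dt<\infty,$$
and, by the same argument as before, $\lim_{n\to\infty}\frac{1}{n}X_n=0$ a.e.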
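The comparison with $\int t^{-2}\,dt$ used in (a) is elementary: on each interval $]b_n,b_{n+1}]$ the constant $1/b_{n+1}^2$ is at most $1/t^2$. A quick sanity check in Python, with exact rational arithmetic, might look as follows. The sequence $b_n=n^2+1$ is an arbitrary stand-in for $E(X_n^2)$, and the names are mine.

```python
from fractions import Fraction

def tail_sum(b):
    """Sum of (b[n+1] - b[n]) / b[n+1]**2 over a finite, non-decreasing,
    strictly positive list of integers b."""
    return sum(Fraction(b[n + 1] - b[n], b[n + 1] ** 2)
               for n in range(len(b) - 1))

# Hypothetical second moments b_n = E(X_n^2); any non-decreasing positive
# sequence would do.
b = [n * n + 1 for n in range(200)]

# Integral of t^-2 over [b[0], b[-1]], which dominates the finite sum.
integral = Fraction(1, b[0]) - Fraction(1, b[-1])
```

Here `tail_sum(b)` stays below `integral`, and hence below $1/b_0$, however long the list is made.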
276E 'Impossibility of systems' (a) I return to the word 'martingale' and the idea of a gambling system. Consider a gambler who takes a sequence of 'fair' bets, that is, bets which have payoff expectations of zero, but who chooses which bets to take on the basis of past experience. The appropriate model for such a sequence of random events is a martingale in the sense of 275A, taking $\Sigma_n$ to be the algebra of all events which are observable up to and including the outcome of the $n$th bet, and $X_n$ to be the gambler's net gain at that time. (In this model it is natural to take $\Sigma_0=\{\emptyset,\Omega\}$ and $X_0=0$.) Certain paradoxes can arise if we try to imagine this model with atomless $\Sigma_n$; to begin with it is perhaps easier to work with the discrete case, in which each $\Sigma_n$ is finite, or is the set of unions of some countable family of atomic events. Now suppose that the bets involved are just two-way bets, with two equally likely outcomes, but that the gambler chooses his stake each time. In this case we can think of the outcomes as corresponding to an independent sequence $\langle W_n\rangle_{n\in\mathbb{N}}$ of random variables, each taking the values $\pm1$ with equal probability. The gambler's 'system' must be of the form
$$X_{n+1}=X_n+Z_{n+1}\times W_{n+1},$$
where $Z_{n+1}$ is his stake on the $(n+1)$-st bet, and must be constant on each atom of the $\sigma$-algebra $\Sigma_n$ generated by $W_1,\ldots,W_n$. The point is that because $\int_E W_{n+1}=0$ for each $E\in\Sigma_n$, $E(Z_{n+1}\times W_{n+1})=0$, so $E(X_{n+1})=E(X_n)$.
(b) The general result, of which this is a special case, is the following. If $\langle W_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, and $\langle Z_n\rangle_{n\ge1}$ is a sequence of random variables such that (i) $Z_n$ is $\Sigma_{n-1}$-measurable (ii) $Z_n\times W_n$ has finite expectation for each $n\ge1$, then $W_0,Z_1\times W_1,Z_2\times W_2,\ldots$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$; the proof that $\int_E Z_{n+1}\times W_{n+1}=0$ for every $E\in\Sigma_n$ is exactly the argument of (b) of the proof of 276B.
(c) I invited you to restrict your ideas to the discrete case for a moment; but if you feel that you understand what it means to say that a 'system' or predictable sequence $\langle Z_n\rangle_{n\ge1}$ must be adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, in the sense that every $Z_n$ is $\Sigma_{n-1}$-measurable, then any further difficulty lies in the measure theory needed to show that the integrals $\int_E Z_{n+1}\times W_{n+1}$ are zero, which is what this book is about.
(d) Consider the gambling system mentioned in 275Cf. Here the idea is that $W_n=\pm1$, as in (a), and $Z_{n+1}=2^na$ if $X_n\le0$, $0$ if $X_n>0$; that is, the gambler doubles his stake each time until he wins, and then quits. Of course he is almost sure to win eventually, so we have $\lim_{n\to\infty}X_n=a$ almost everywhere, even though $E(X_n)=0$ for every $n$. We can compute the distribution of $X_n$: for $n\ge1$ we have $\Pr(X_n=a)=1-2^{-n}$, $\Pr(X_n=-(2^n-1)a)=2^{-n}$. Thus $E(|X_n|)=(2-2^{-n+1})a$ and the almost-everywhere convergence of the $X_n$ is an example of Doob's Martingale Convergence Theorem.
In the language of stopping times (275N), $X_n=\tilde Y_{\tau\wedge n}$, where $Y_n=\sum_{k=0}^{n}2^kaW_k$ and $\tau=\min\{n:Y_n>0\}$.
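The distribution quoted in (d) is easy to confirm by enumerating all outcomes. The Python sketch below is a check of my own rather than anything in the text: it follows the rule $Z_{n+1}=2^na$ if $X_n\le0$, $0$ otherwise, over $n$ fair flips, with hypothetical function names and $a=1$ by default.

```python
from fractions import Fraction
from itertools import product

def net_gain(flips, a=1):
    """Net gain after the given +-1 outcomes under the doubling system."""
    x = 0
    for n, w in enumerate(flips):
        stake = 2 ** n * a if x <= 0 else 0  # Z_{n+1} = 2^n a until the first win
        x += stake * w
    return x

def distribution(n, a=1):
    """Exact distribution of X_n over n fair flips, as {value: probability}."""
    dist = {}
    for flips in product((1, -1), repeat=n):
        x = net_gain(flips, a)
        dist[x] = dist.get(x, Fraction(0)) + Fraction(1, 2 ** n)
    return dist

def mean(dist):
    """E(X_n) computed from the distribution."""
    return sum(x * p for x, p in dist.items())

def mean_abs(dist):
    """E(|X_n|) computed from the distribution."""
    return sum(abs(x) * p for x, p in dist.items())
```

For instance `distribution(5)` has just two values, $a$ and $-(2^5-1)a$, with `mean` equal to zero and `mean_abs` equal to $(2-2^{-4})a$, matching the formulae above.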
*276F I come now to Komlós's theorem. The first step is a trifling refinement of 276C.
Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a sequence of random variables on $\Omega$ such that (i) $X_n$ is $\Sigma_n$-measurable for each $n$ (ii) $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}E(X_n^2)$ is finite (iii) $\lim_{n\to\infty}X'_n=0$ a.e., where $X'_n$ is a conditional expectation of $X_n$ on $\Sigma_{n-1}$ for each $n\ge1$. Then $\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n}X_k=0$ a.e.
proof Making suitable adjustments on a negligible set if necessary, we may suppose that $X'_n$ is $\Sigma_{n-1}$-measurable for $n\ge1$ and that every $X_n$ and $X'_n$ is defined on the whole of $\Omega$. Set $X'_0=X_0$ and $Y_n=X_n-X'_n$ for $n\in\mathbb{N}$. Then $\langle Y_n\rangle_{n\in\mathbb{N}}$ is a martingale difference sequence adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Also $E(Y_n^2)\le E(X_n^2)$ for every $n$. $\mathbf{P}$ If $n\ge1$, $X'_n$ is square-integrable (244M), and $E(Y_n\times X'_n)=0$, as in part (b) of the proof of 276B. Now
$$E(X_n^2)=E\bigl((Y_n+X'_n)^2\bigr)=E(Y_n^2)+2E(Y_n\times X'_n)+E\bigl((X'_n)^2\bigr)\ge E(Y_n^2).\ \mathbf{Q}$$
This means that $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}E(Y_n^2)$ must be finite. By 276C, $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}Y_i=0$ a.e. But by 273Ca we also have $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}X'_i=0$ whenever $\lim_{n\to\infty}X'_n=0$, which is almost everywhere. So $\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}X_i=0$ a.e.
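*276G Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a sequence of random variables on $\Omega$ such that $\sup_{n\in\mathbb{N}}E(|X_n|)$ is finite. For $k\in\mathbb{N}$ and $x\in\mathbb{R}$ set $F_k(x)=x$ if $|x|\le k$, $0$ otherwise. Let $\mathcal{F}$ be an ultrafilter on $\mathbb{N}$.
(a) For each $k\in\mathbb{N}$ there is a measurable function $Y_k:\Omega\to[-k,k]$ such that $\lim_{n\to\mathcal{F}}\int_E F_k(X_n)=\int_E Y_k$ for every $E\in\Sigma$.
(b) $\lim_{n\to\mathcal{F}}E\bigl((F_k(X_n)-Y_k)^2\bigr)\le\lim_{n\to\mathcal{F}}E\bigl(F_k(X_n)^2\bigr)$ for each $k$.
(c) $Y=\lim_{k\to\infty}Y_k$ is defined a.e. and $\lim_{k\to\infty}E(|Y-Y_k|)=0$.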
proof (a) For each $k$, $|F_k(X_n)|\le_{\mathrm{a.e.}}k$ for every $n$, so that $\{F_k(X_n):n\in\mathbb{N}\}$ is uniformly integrable, and $\{F_k(X_n)^{\bullet}:n\in\mathbb{N}\}$ is relatively weakly compact in $L^1=L^1(\mu)$ (247C). Accordingly $v_k=\lim_{n\to\mathcal{F}}F_k(X_n)^{\bullet}$ is defined in $L^1$ (2A3Se); take $Y_k:\Omega\to\mathbb{R}$ to be a measurable function such that $Y_k^{\bullet}=v_k$. For any $E\in\Sigma$,
$$\int_E Y_k=\int v_k\times(\chi E)^{\bullet}=\lim_{n\to\mathcal{F}}\int_E F_k(X_n).$$
In particular,
$$\Bigl|\int_E Y_k\Bigr|\le\sup_{n\in\mathbb{N}}\Bigl|\int_E F_k(X_n)\Bigr|\le k\mu E$$
for every $E$, so that $\{\omega:Y_k(\omega)>k\}$ and $\{\omega:Y_k(\omega)<-k\}$ are both negligible; changing $Y_k$ on a negligible set if necessary, we may suppose that $|Y_k(\omega)|\le k$ for every $\omega\in\Omega$.
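(b) Because $Y_k$ is bounded, $Y_k^{\bullet}\in L^{\infty}(\mu)$, and
$$\lim_{n\to\mathcal{F}}\int F_k(X_n)\times Y_k=\lim_{n\to\mathcal{F}}\int F_k(X_n)^{\bullet}\times Y_k^{\bullet}=\int Y_k^{\bullet}\times Y_k^{\bullet}=\int Y_k^2.$$
Accordingly
$$\lim_{n\to\mathcal{F}}\int(F_k(X_n)-Y_k)^2=\lim_{n\to\mathcal{F}}\int F_k(X_n)^2-2\lim_{n\to\mathcal{F}}\int F_k(X_n)\times Y_k+\int Y_k^2$$
$$=\lim_{n\to\mathcal{F}}\int F_k(X_n)^2-\int Y_k^2\le\lim_{n\to\mathcal{F}}\int F_k(X_n)^2.$$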
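(c) Set $W_0=Y_0=0$, $W_k=Y_k-Y_{k-1}$ for $k\ge1$. Then $E(|W_k|)\le\lim_{n\to\mathcal{F}}E(|F_k(X_n)-F_{k-1}(X_n)|)$ for every $k\ge1$. $\mathbf{P}$ Set $E=\{\omega:W_k(\omega)\ge0\}$. Then
$$\int_E W_k=\int_E Y_k-\int_E Y_{k-1}=\lim_{n\to\mathcal{F}}\int_E F_k(X_n)-\lim_{n\to\mathcal{F}}\int_E F_{k-1}(X_n)$$
$$=\lim_{n\to\mathcal{F}}\int_E F_k(X_n)-F_{k-1}(X_n)\le\lim_{n\to\mathcal{F}}\int_E|F_k(X_n)-F_{k-1}(X_n)|.$$
Similarly,
$$\Bigl|\int_{\Omega\setminus E}W_k\Bigr|\le\lim_{n\to\mathcal{F}}\int_{\Omega\setminus E}|F_k(X_n)-F_{k-1}(X_n)|.$$
So
$$E(|W_k|)=\int_E W_k-\int_{\Omega\setminus E}W_k\le\lim_{n\to\mathcal{F}}\int|F_k(X_n)-F_{k-1}(X_n)|.\ \mathbf{Q}$$
It follows that $\sum_{k=0}^{\infty}E(|W_k|)$ is finite. $\mathbf{P}$ For any $m\ge1$,
$$\sum_{k=0}^{m}E(|W_k|)\le\sum_{k=1}^{m}\lim_{n\to\mathcal{F}}E(|F_k(X_n)-F_{k-1}(X_n)|)=\lim_{n\to\mathcal{F}}E\Bigl(\sum_{k=1}^{m}|F_k(X_n)-F_{k-1}(X_n)|\Bigr)$$
$$=\lim_{n\to\mathcal{F}}E(|F_m(X_n)|)\le\sup_{n\in\mathbb{N}}E(|X_n|).$$
So $\sum_{k=0}^{\infty}E(|W_k|)\le\sup_{n\in\mathbb{N}}E(|X_n|)$ is finite. $\mathbf{Q}$
By B.Levi's theorem (123A), $\lim_{m\to\infty}\sum_{k=0}^{m}|W_k|$ is finite a.e., so that
$$Y=\lim_{m\to\infty}Y_m=\sum_{k=0}^{\infty}W_k$$
is defined a.e.; and moreover
$$E(|Y-Y_k|)\le\lim_{m\to\infty}E\Bigl(\sum_{j=k+1}^{m}|W_j|\Bigr)\to0$$
as $k\to\infty$.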
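*276H Komlós's theorem (Komlós 67) Let $(\Omega,\Sigma,\mu)$ be any measure space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a sequence of integrable real-valued functions on $\Omega$ such that $\sup_{n\in\mathbb{N}}\int|X_n|$ is finite. Then there are a subsequence $\langle X'_n\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ and an integrable function $Y$ such that $Y=_{\mathrm{a.e.}}\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}X''_i$ whenever $\langle X''_n\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb{N}}$.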
proof Since neither the hypothesis nor the conclusion is affected by changing the $X_n$ on a negligible set, we may suppose throughout that every $X_n$ is measurable and defined on the whole of $\Omega$. In addition, to begin with (down to the end of (e) below), let us suppose that $\mu\Omega=1$. As in 276G, set $F_k(x)=x$ for $|x|\le k$, $0$ for $|x|>k$.
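(a) Let $\mathcal{F}$ be any non-principal ultrafilter on $\mathbb{N}$ (2A1O). For $j\in\mathbb{N}$ set $p_j=\lim_{n\to\mathcal{F}}\Pr(|X_n|>j)$. Then $\sum_{j=0}^{\infty}p_j$ is finite. $\mathbf{P}$ For any $k\in\mathbb{N}$,
$$\sum_{j=0}^{k}p_j=\sum_{j=0}^{k}\lim_{n\to\mathcal{F}}\Pr(|X_n|>j)=\lim_{n\to\mathcal{F}}\sum_{j=0}^{k}\Pr(|X_n|>j)\le\lim_{n\to\mathcal{F}}\Bigl(1+\int|X_n|\Bigr)\le1+\sup_{n\in\mathbb{N}}\int|X_n|.$$
So $\sum_{j=0}^{\infty}p_j\le1+\sup_{n\in\mathbb{N}}\int|X_n|$ is finite. $\mathbf{Q}$
Setting
$$p'_j=p_j-p_{j+1}=\lim_{n\to\mathcal{F}}\Pr(j<|X_n|\le j+1)$$
for each $j$, we have
$$\sum_{j=0}^{\infty}(j+1)p'_j=\lim_{m\to\infty}\Bigl(\sum_{j=0}^{m}(j+1)p_j-\sum_{j=0}^{m}(j+1)p_{j+1}\Bigr)=\lim_{m\to\infty}\sum_{j=0}^{m}p_j-(m+1)p_{m+1}\le\sum_{j=0}^{\infty}p_j<\infty.$$
Next,
$$\lim_{n\to\mathcal{F}}\int F_k(X_n)^2\le\sum_{j=0}^{k}(j+1)^2p'_j$$
for each $k$. $\mathbf{P}$ Setting $E_{jn}=\{\omega:j<|X_n(\omega)|\le j+1\}$ for $j,n\in\mathbb{N}$, $F_k(X_n)^2\le\sum_{j=0}^{k}(j+1)^2\chi E_{jn}$, so
$$\lim_{n\to\mathcal{F}}\int F_k(X_n)^2\le\lim_{n\to\mathcal{F}}\sum_{j=0}^{k}(j+1)^2\mu E_{jn}=\sum_{j=0}^{k}(j+1)^2p'_j.\ \mathbf{Q}$$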
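(b) Define $\langle Y_k\rangle_{k\in\mathbb{N}}$ and $Y=_{\mathrm{a.e.}}\lim_{k\to\infty}Y_k$ from $\langle X_n\rangle_{n\in\mathbb{N}}$ and $\mathcal{F}$ as in Lemma 276G. Then
$$J_k=\Bigl\{n:n\in\mathbb{N},\ \int(F_k(X_n)-Y_k)^2\le1+\sum_{j=0}^{k}(j+1)^2p'_j\Bigr\}$$
belongs to $\mathcal{F}$ for every $k\in\mathbb{N}$. $\mathbf{P}$ By (a) above and 276Gb,
$$\lim_{n\to\mathcal{F}}\int(F_k(X_n)-Y_k)^2\le\lim_{n\to\mathcal{F}}\int F_k(X_n)^2\le\sum_{j=0}^{k}(j+1)^2p'_j.\ \mathbf{Q}$$
Also, of course,
$$K_k=\{n:n\in\mathbb{N},\ \Pr(F_j(X_n)\ne X_n)\le p_j+2^{-j}\text{ for every }j\le k\}$$
belongs to $\mathcal{F}$ for every $k$.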
(c) For $n,k\in\mathbb{N}$ let $Z_{kn}$ be a simple function such that $|Z_{kn}|\le|F_k(X_n)-Y_k|$ and $\int|F_k(X_n)-Y_k-Z_{kn}|\le2^{-k}$. For $m\in\mathbb{N}$ let $\Sigma_m$ be the algebra of subsets of $\Omega$ generated by sets of the form $\{\omega:Z_{kn}(\omega)=\alpha\}$ for $k,n\le m$ and $\alpha\in\mathbb{R}$. Because each $Z_{kn}$ takes only finitely many values, $\Sigma_m$ is finite (and is therefore a $\sigma$-subalgebra of $\Sigma$); and of course $\Sigma_m\subseteq\Sigma_{m+1}$ for every $m$.
We need to look at conditional expectations on the $\Sigma_m$, and because $\Sigma_m$ is always finite these have a particularly straightforward expression. Let $\mathcal{A}_m$ be the set of 'atoms', or minimal non-empty sets, in $\Sigma_m$; that is, the set of equivalence classes in $\Omega$ under the relation $\omega\sim\omega'$ if $Z_{kn}(\omega)=Z_{kn}(\omega')$ for all $k,n\le m$. For any integrable random variable $X$ on $\Omega$, define $E_m(X)$ by setting
$$E_m(X)(\omega)=\frac{1}{\mu A}\int_A X\text{ if }\omega\in A\in\mathcal{A}_m\text{ and }\mu A>0,$$
$$E_m(X)(\omega)=0\text{ if }\omega\in A\in\mathcal{A}_m\text{ and }\mu A=0.$$
Then $E_m(X)$ is a conditional expectation of $X$ on $\Sigma_m$.
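On a finite probability space the formula for $E_m(X)$ is just 'average over each atom'. The following Python sketch shows this under simplifying assumptions that are mine, not the text's: the space is a finite set of points with rational weights, the atoms are given explicitly as a partition, and all the names are hypothetical.

```python
from fractions import Fraction

def conditional_expectation(values, weights, atoms):
    """Average X over each atom of a finite partition.

    values[w]  -- X(w) for each point w
    weights[w] -- the measure of the singleton {w}
    atoms      -- disjoint lists of points whose union is the whole space
    Returns a dict giving E_m(X)(w) for every point w.
    """
    result = {}
    for atom in atoms:
        mass = sum(weights[w] for w in atom)
        integral = sum(values[w] * weights[w] for w in atom)
        # Atoms of measure zero get the value 0, as in the definition above.
        average = integral / mass if mass > 0 else Fraction(0)
        for w in atom:
            result[w] = average
    return result

def integral_over(func, weights, atom):
    """Integral of func (given as a dict) over one atom."""
    return sum(func[w] * weights[w] for w in atom)
```

The defining property is that `integral_over` gives the same answer for `values` and for the output of `conditional_expectation` on every atom, and therefore on every union of atoms.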
Now
$$\lim_{n\to\mathcal{F}}\int|E_m(F_k(X_n)-Y_k)|=\lim_{n\to\mathcal{F}}\sum_{A\in\mathcal{A}_m}\int_A|E_m(F_k(X_n)-Y_k)|=\lim_{n\to\mathcal{F}}\sum_{A\in\mathcal{A}_m}\Bigl|\int_A E_m(F_k(X_n)-Y_k)\Bigr|$$
(because $E_m(F_k(X_n)-Y_k)$ is constant on each $A\in\mathcal{A}_m$)
$$=\lim_{n\to\mathcal{F}}\sum_{A\in\mathcal{A}_m}\Bigl|\int_A F_k(X_n)-Y_k\Bigr|=\sum_{A\in\mathcal{A}_m}\lim_{n\to\mathcal{F}}\Bigl|\int_A F_k(X_n)-Y_k\Bigr|=0$$
by the choice of $Y_k$. So if we set
$$I_m=\Bigl\{n:n\in\mathbb{N},\ \int|E_m(F_k(X_n)-Y_k)|\le2^{-k}\text{ for every }k\le m\Bigr\},$$
then $I_m\in\mathcal{F}$ for every $m$.
(d) Suppose that $\langle r(n)\rangle_{n\in\mathbb{N}}$ is any strictly increasing sequence in $\mathbb{N}$ such that $r(0)>0$, $r(n)\in J_n\cap K_n$ for every $n$ and $r(n)\in I_{r(n-1)}$ for $n\ge1$. Then $\frac{1}{n+1}\sum_{i=0}^{n}X_{r(i)}\to Y$ a.e. as $n\to\infty$. $\mathbf{P}$ Express $X_{r(n)}$ as
$$(X_{r(n)}-F_n(X_{r(n)}))+(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})+Y_n+Z_{n,r(n)}$$
for each $n$. Taking these pieces in turn:
(i)
$$\sum_{n=0}^{\infty}\Pr(X_{r(n)}\ne F_n(X_{r(n)}))\le\sum_{n=0}^{\infty}p_n+2^{-n}$$
(because $r(n)\in K_n$ for every $n$)
$$<\infty$$
by (a). But this means that $X_{r(n)}-F_n(X_{r(n)})\to0$ a.e., since the sequence is eventually zero at almost every point, and $\frac{1}{n+1}\sum_{i=0}^{n}X_{r(i)}-F_i(X_{r(i)})\to0$ a.e. by 273Ca.
(ii) By the choice of the $Z_{n,r(n)}$,
$$\sum_{n=0}^{\infty}\int|F_n(X_{r(n)})-Y_n-Z_{n,r(n)}|\le\sum_{n=0}^{\infty}2^{-n}$$
is finite, so $F_n(X_{r(n)})-Y_n-Z_{n,r(n)}\to0$ a.e. and $\frac{1}{n+1}\sum_{i=0}^{n}F_i(X_{r(i)})-Y_i-Z_{i,r(i)}\to0$ a.e.
(iii) By 276G, $Y_n\to Y$ a.e. and $\frac{1}{n+1}\sum_{i=0}^{n}Y_i\to Y$ a.e.
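(iv) We know that, for each $n\ge1$, $r(n)\in I_{r(n-1)}$. So (because $r(n-1)\ge n$)
$$\int|E_{r(n-1)}(F_n(X_{r(n)})-Y_n)|\le2^{-n}.$$
But as also
$$\int\bigl|E_{r(n-1)}\bigl(F_n(X_{r(n)})-Y_n-Z_{n,r(n)}\bigr)\bigr|\le\int|F_n(X_{r(n)})-Y_n-Z_{n,r(n)}|\le2^{-n}$$
by 244M and the choice of $Z_{n,r(n)}$,
$$\int|E_{r(n-1)}Z_{n,r(n)}|=\int\bigl|E_{r(n-1)}(F_n(X_{r(n)})-Y_n)-E_{r(n-1)}(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})\bigr|\le2^{-n+1}$$
for every $n$. Accordingly $E_{r(n-1)}Z_{n,r(n)}\to0$ a.e.
On the other hand,
$$\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\int Z_{n,r(n)}^2\le\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\int(F_n(X_{r(n)})-Y_n)^2\le\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\Bigl(1+\sum_{j=0}^{n}(j+1)^2p'_j\Bigr)$$
(because $r(n)\in J_n$)
$$=\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}+\sum_{j=0}^{\infty}(j+1)^2p'_j\sum_{n=j}^{\infty}\frac{1}{(n+1)^2}\le\frac{\pi^2}{6}+2\sum_{j=0}^{\infty}(j+1)p'_j$$
is finite. (I am using the estimate
$$\sum_{n=j}^{\infty}\frac{1}{(n+1)^2}\le\sum_{n=j}^{\infty}\Bigl(\frac{2}{n+1}-\frac{2}{n+2}\Bigr)=\frac{2}{j+1}.)$$
By 276F, applied to $\langle\Sigma_{r(n)}\rangle_{n\in\mathbb{N}}$ and $\langle Z_{n,r(n)}\rangle_{n\in\mathbb{N}}$,
$$\frac{1}{n+1}\sum_{i=0}^{n}Z_{i,r(i)}\to0\text{ a.e.}$$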
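The estimate quoted in (iv) is a telescoping comparison: $\frac{1}{(n+1)^2}\le\frac{2}{(n+1)(n+2)}=\frac{2}{n+1}-\frac{2}{n+2}$ for every $n\ge0$. A short exact check in Python, with function names of my own choosing, is:

```python
from fractions import Fraction

def partial_tail(j, upper):
    """Sum of 1 / (n + 1)^2 for n = j, ..., upper."""
    return sum(Fraction(1, (n + 1) ** 2) for n in range(j, upper + 1))

def telescoped(j, upper):
    """Sum of 2/(n + 1) - 2/(n + 2) for n = j, ..., upper."""
    return sum(Fraction(2, n + 1) - Fraction(2, n + 2)
               for n in range(j, upper + 1))

def closed_form(j, upper):
    """The telescoped sum collapses to 2/(j + 1) - 2/(upper + 2)."""
    return Fraction(2, j + 1) - Fraction(2, upper + 2)
```

Every finite partial sum is therefore below $\frac{2}{j+1}$, which is all that the displayed bound needs.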
(v) Adding these four components, we see that $\frac{1}{n+1}\sum_{i=0}^{n}X_{r(i)}\to Y$ a.e., as claimed. $\mathbf{Q}$
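(e) Now fix any strictly increasing sequence $\langle s(n)\rangle_{n\in\mathbb{N}}$ in $\mathbb{N}$ such that $s(0)>0$, $s(n)\in J_n\cap K_n$ for every $n$ and $s(n)\in I_{s(n-1)}$ for $n\ge1$; such a sequence exists because $J_n\cap K_n\cap I_{s(n-1)}$ belongs to $\mathcal{F}$, so is infinite, for every $n\ge1$. Set $X'_n=X_{s(n)}$ for every $n$. If $\langle X''_n\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb{N}}$, then it is of the form $\langle X_{s(r(n))}\rangle_{n\in\mathbb{N}}$ for some strictly increasing sequence $\langle r(n)\rangle_{n\in\mathbb{N}}$. In this case
$$s(r(0))\ge s(0)>0,$$
$$s(r(n))\in J_{r(n)}\cap K_{r(n)}\subseteq J_n\cap K_n\text{ for every }n,$$
$$s(r(n))\in I_{s(r(n)-1)}\subseteq I_{s(r(n-1))}\text{ for every }n\ge1.$$
So (d) tells us that $\frac{1}{n+1}\sum_{i=0}^{n}X''_i\to Y$ a.e.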
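(f) Thus the theorem is proved in the case in which $(\Omega,\Sigma,\mu)$ is a probability space. Now suppose that $\mu$ is $\sigma$-finite and $\mu\Omega>0$. In this case there is a strictly positive measurable function $f:\Omega\to\mathbb{R}$ such that $\int f\,d\mu=1$ (215B(ix)). Let $\nu$ be the corresponding indefinite-integral measure (234J), so that $\nu$ is a probability measure on $\Omega$, and $\langle\frac{1}{f}\times X_n\rangle_{n\in\mathbb{N}}$ is a sequence of $\nu$-integrable functions such that $\sup_{n\in\mathbb{N}}\int|\frac{1}{f}\times X_n|\,d\nu$ is finite (235K). From (a)-(e) we see that there must be a $\nu$-integrable function $Y$ and a subsequence $\langle X'_n\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\frac{1}{n+1}\sum_{i=0}^{n}\frac{1}{f}\times X''_i\to Y$ $\nu$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb{N}}$ of $\langle X'_n\rangle_{n\in\mathbb{N}}$. But $\mu$ and $\nu$ have the same negligible sets (234Lc), so $\frac{1}{n+1}\sum_{i=0}^{n}X''_i\to f\times Y$ $\mu$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb{N}}$ of $\langle X'_n\rangle_{n\in\mathbb{N}}$.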
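(g) Since the result is trivial if $\mu\Omega=0$, the theorem is true whenever $\mu$ is $\sigma$-finite. For the general case, set
$$\tilde\Omega=\bigcup_{n\in\mathbb{N}}\{\omega:X_n(\omega)\ne0\}=\bigcup_{m,n\in\mathbb{N}}\{\omega:|X_n(\omega)|\ge2^{-m}\},$$
so that the subspace measure $\mu_{\tilde\Omega}$ is $\sigma$-finite. Then there are a $\mu_{\tilde\Omega}$-integrable function $\tilde Y$ and a subsequence $\langle X'_n\rangle_{n\in\mathbb{N}}$ of $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\frac{1}{n+1}\sum_{i=0}^{n}X''_i{\restriction}\tilde\Omega\to\tilde Y$ $\mu_{\tilde\Omega}$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb{N}}$ of $\langle X'_n\rangle_{n\in\mathbb{N}}$. Setting $Y(\omega)=\tilde Y(\omega)$ if $\omega\in\tilde\Omega$, $0$ for $\omega\in\Omega\setminus\tilde\Omega$, we see that $Y$ is $\mu$-integrable and that $\frac{1}{n+1}\sum_{i=0}^{n}X''_i\to Y$ $\mu$-a.e. whenever $\langle X''_n\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb{N}}$. This completes the proof.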
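276X Basic exercises >(a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-algebras. Show that $\int_E X_n^2\le\int_E X_{n+1}^2$ for every $n\in\mathbb{N}$, $E\in\Sigma_n$ (allowing $\infty$ as a value of an integral). (Hint: see the proof of 276B.)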
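>(b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale. Show that for any $\epsilon>0$,
$$\Pr\bigl(\sup_{n\in\mathbb{N}}|X_n|\ge\epsilon\bigr)\le\frac{1}{\epsilon^2}\sup_{n\in\mathbb{N}}E(X_n^2).$$
(Hint: put 276Xa together with the argument for 275D.)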
(c) When does 276Xb give a sharper result than 275Xb?
(d) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence and set $Y_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n\in\mathbb{N}$. Show that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable then $\lim_{n\to\infty}\|Y_n\|_1=0$. (Hint: use the argument of 273Na, with 276C in place of 273D, and setting $\tilde X_n=X'_n-Z_n$, where $Z_n$ is an appropriate conditional expectation of $X'_n$.)
>(e) Strong law of large numbers: fifth form A sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables is exchangeable if $(X_{n_0},\ldots,X_{n_k})$ has the same joint distribution as $(X_0,\ldots,X_k)$ whenever $n_0,\ldots,n_k$ are distinct. Show that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is an exchangeable sequence of random variables with finite expectation, then $\langle\frac{1}{n+1}\sum_{i=0}^{n}X_i\rangle_{n\in\mathbb{N}}$ converges a.e. (Hint: 276H.)
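(f) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent identically distributed sequence of random variables with zero expectation and non-zero finite variance, and $\langle t_n\rangle_{n\in\mathbb{N}}$ a sequence in $\mathbb{R}$. Show that (i) if $\sum_{n=0}^{\infty}t_n^2<\infty$, then $\sum_{n=0}^{\infty}t_nX_n$ is defined in $\mathbb{R}$ a.e. (ii) if $\sum_{n=0}^{\infty}t_n^2=\infty$ then $\sum_{n=0}^{\infty}t_nX_n$ is undefined a.e. (Hint: 276B, 274Xk.)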
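(g) Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a uniformly bounded martingale difference sequence and $\langle a_n\rangle_{n\in\mathbb{N}}\in\ell^2$. Show that $\lim_{n\to\infty}\prod_{i=0}^{n}(1+a_iX_i)$ is defined and finite almost everywhere. (Hint: $\langle a_nX_n\rangle_{n\in\mathbb{N}}$ is summable and square-summable a.e.)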
276Y Further exercises (a) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence such that $\sup_{n\in\mathbb{N}}\|X_n\|_p$ is finite, where $p\in\,]1,\infty[$. Show that $\lim_{n\to\infty}\|\frac{1}{n+1}\sum_{i=0}^{n}X_i\|_p=0$. (Hint: 273Nb.)
(b) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a uniformly integrable martingale difference sequence and $Y$ a bounded random variable. Show that $\lim_{n\to\infty}E(X_n\times Y)=0$. (Compare 272Ye.)
(c) Use 275Yg to prove 276Xa.
(d) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a sequence of random variables such that, for some $\delta>0$, $\sup_{n\in\mathbb{N}}n^{\delta}E(|X_n|)$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n$. Show that $\lim_{n\to\infty}S_n=0$ a.e. (Hint: set $Z_k=2^{-k}(|X_0|+\ldots+|X_{2^k-1}|)$. Show that $\sum_{k=0}^{\infty}E(Z_k)<\infty$. Show that $S_n\le2Z_{k+1}$ if $2^k<n\le2^{k+1}$.)
(e) Strong law of large numbers: sixth form Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale difference sequence such that, for some $\delta>0$, $\sup_{n\in\mathbb{N}}E(|X_n|^{1+\delta})$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n$. Show that $\lim_{n\to\infty}S_n=0$ a.e. (Hint: take a non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. Set $Y_n=X_n$ when $|X_n|\le n$, $0$ otherwise. Let $U_n$ be a conditional expectation of $Y_n$ on $\Sigma_{n-1}$ and set $Z_n=Y_n-U_n$. Use ideas from 273H, 276C and 276Yd above to show that $\frac{1}{n+1}\sum_{i=0}^{n}V_i\to0$ a.e. for $V_i=Z_i$, $V_i=U_i$, $V_i=X_i-Y_i$.)
(f) Show that there is a martingale $\langle X_n\rangle_{n\in\mathbb{N}}$ which converges in measure but is not convergent a.e. (Compare 273Ba.) (Hint: arrange that $\{\omega:X_{n+1}(\omega)\ne0\}=E_n\subseteq\{\omega:|X_{n+1}(\omega)-X_n(\omega)|\ge1\}$, where $\langle E_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of sets and $\mu E_n=\frac{1}{n+1}$ for each $n$.)
(g) Give an example of an identically distributed martingale difference sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ such that $\langle\frac{1}{n+1}(X_0+\ldots+X_n)\rangle_{n\in\mathbb{N}}$ does not converge to $0$ almost everywhere. (Hint: start by devising a uniformly bounded sequence $\langle U_n\rangle_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty}E(|U_n|)=0$ but $\langle\frac{1}{n+1}(U_0+\ldots+U_n)\rangle_{n\in\mathbb{N}}$ does not converge to $0$ almost everywhere. Now repeat your construction in such a context that the $U_n$ can be derived from an identically distributed martingale difference sequence by the formulae of 276Ye.)
(h) Construct a proof of Komlós's theorem which does not involve ultrafilters, or any other use of the full axiom of choice, but proceeds throughout by selecting appropriate sub-subsequences. Remember to check that you can prove any fact you use about weakly convergent sequences in $L^1$ on the same rules.
276 Notes and comments I include two more versions of the strong law of large numbers (276C, 276Ye) not because I have any applications in mind but because I think that if you know the strong law for $\|\ \|_{1+\delta}$-bounded independent sequences, and what a martingale difference sequence is, then there is something missing if you do not know the strong law for $\|\ \|_{1+\delta}$-bounded martingale difference sequences. And then, of course, I have to add 276Yf and 276Yg, lest you be tempted to think that the strong law is really about martingale difference sequences rather than about independent sequences. (Compare 272Yd and 275Yp.)
Komlós's theorem is rather outside the scope of this volume; it is quite hard work and surely much less important, to most probabilists, than many results I have omitted. It does provide a quick proof of 276Xe. However it is relevant to questions arising in some topics treated in Volumes 3 and 4, and the proof fits naturally into this section.
Chapter 28
Fourier analysis
For the last chapter of this volume, I attempt a brief account of one of the most important topics in analysis. This is
a bold enterprise, and I cannot hope to satisfy the reasonable demands of anyone who knows and loves the subject as it
deserves. But I also cannot pass it by without being false to my own subject, since problems contributed by the study
of Fourier series and transforms have led measure theory throughout its history. What I will try to do, therefore, is to
give versions of those results which everyone ought to know in language unifying them with the rest of this treatise,
aiming to open up a channel for the transfer of intuitions and techniques between the abstract general study of measure
spaces, which is the centre of our work, and this particular family of applications of the theory of integration.
I have divided the material of this chapter, conventionally enough, into three parts: Fourier series, Fourier transforms
and the characteristic functions of probability theory. While it will be obvious that many ideas are common to all
three, I do not think it useful, at this stage, to try to formulate an explicit generalization to unify them; that belongs
to a more general theory of harmonic analysis on groups, which must wait until Volume 4. I begin however with a
section on the Stone-Weierstrass theorem (§281), which is one of the basic tools of functional analysis, as well as being
useful for this chapter. The final section (§286), a proof of Carleson's theorem, is at a rather different level from the
rest.
281 The Stone-Weierstrass theorem
Before we begin work on the real subject of this chapter, it will be helpful to have a reasonably general statement
of a fundamental theorem on the approximation of continuous functions. In fact I give a variety of forms (281A, 281E,
281F and 281G, together with 281Ya, 281Yd and 281Yg), all of which are sometimes useful. I end the section with a
version of Weyl's Equidistribution Theorem (281M-281N).
281A Stone-Weierstrass theorem: first form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$, so that $C_b(X)$ is a linear space over $\mathbb{R}$. Let $A\subseteq C_b(X)$ be such that
$A$ is a linear subspace of $C_b(X)$;
$|f|\in A$ for every $f\in A$;
$\chi X\in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f\in A$ such that $f(x)\ne f(y)$.
Then for every continuous $h:K\to\mathbb{R}$ and $\epsilon>0$ there is an $f\in A$ such that
$|f(x)-h(x)|\le\epsilon$ for every $x\in K$,
if $K\ne\emptyset$, $\inf_{x\in X}f(x)\ge\inf_{x\in K}h(x)$ and $\sup_{x\in X}f(x)\le\sup_{x\in K}h(x)$.
Remark I have stated this theorem in its natural context, that of general topological spaces. But if these are unfamiliar to you, you do not in fact need to know what they are. If you read 'let $X$ be a topological space' as 'let $X$ be a subset of $\mathbb{R}^r$' and '$K$ is a compact subset of $X$' as '$K$ is a subset of $X$ which is closed and bounded in $\mathbb{R}^r$', you will have enough for all the applications in this chapter. In order to follow the proof, of course, you will need to know a little about compactness in $\mathbb{R}^r$; I have written out the necessary facts in §2A2.
proof (a) If $K$ is empty, then we can take $f=\mathbf{0}$ to be the constant function with value $0$. So henceforth let us suppose that $K\ne\emptyset$.
(b) The first point to note is that if $f,g\in A$ then $f\wedge g$ and $f\vee g$ belong to $A$, where
$$(f\wedge g)(x)=\min(f(x),g(x)),\quad(f\vee g)(x)=\max(f(x),g(x))$$
for every $x\in X$; this is because
$$f\wedge g=\tfrac12(f+g-|f-g|),\quad f\vee g=\tfrac12(f+g+|f-g|).$$
It follows by induction on $n$ that $f_0\wedge\ldots\wedge f_n$ and $f_0\vee\ldots\vee f_n$ belong to $A$ for all $f_0,\ldots,f_n\in A$.
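The two identities in (b) are what turn 'closed under $|\cdot|$ and linear combinations' into 'closed under $\wedge$ and $\vee$'. As a concrete illustration, the Python sketch below builds the meet and join of functions using only addition, scalar multiplication and absolute value. The function names and the sample functions are my own, and exact rationals are used so that the comparison with `min` and `max` is free of rounding.

```python
from fractions import Fraction
from functools import reduce

def meet(f, g):
    """Pointwise minimum, written as (f + g - |f - g|) / 2."""
    return lambda x: (f(x) + g(x) - abs(f(x) - g(x))) / 2

def join(f, g):
    """Pointwise maximum, written as (f + g + |f - g|) / 2."""
    return lambda x: (f(x) + g(x) + abs(f(x) - g(x))) / 2

def meet_all(functions):
    """f_0 ^ ... ^ f_n, built up one function at a time as in the induction."""
    return reduce(meet, functions)

def join_all(functions):
    """f_0 v ... v f_n, built up in the same way."""
    return reduce(join, functions)

# Sample functions and sample points, chosen only for illustration.
sample_functions = [lambda x: x * x, lambda x: 1 - x, lambda x: Fraction(1, 2)]
sample_points = [Fraction(k, 4) for k in range(-8, 9)]
```

On the sample points, `meet_all(sample_functions)` agrees with the pointwise `min` of the three functions and `join_all(sample_functions)` with their pointwise `max`.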
(c) If $x$, $y$ are distinct points of $K$, and $a,b\in\mathbb{R}$, there is an $f\in A$ such that $f(x)=a$ and $f(y)=b$. $\mathbf{P}$ Start from $g\in A$ such that $g(x)\ne g(y)$; this is the point at which we use the last of the list of four hypotheses on $A$. Set
$$\alpha=\frac{a-b}{g(x)-g(y)},\quad\beta=\frac{bg(x)-ag(y)}{g(x)-g(y)},\quad f=\alpha g+\beta\chi X\in A.\ \mathbf{Q}$$
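To see that the coefficients in (c) do what is wanted, note that $\alpha g(x)+\beta=\frac{(a-b)g(x)+bg(x)-ag(y)}{g(x)-g(y)}=a$, and similarly at $y$. The Python sketch below carries out the same computation; the helper name `interpolate` and the sample separating function are assumptions of mine, and it is meant for integer or `Fraction` inputs so that the arithmetic is exact.

```python
from fractions import Fraction

def interpolate(g, x, y, a, b):
    """Return f = alpha * g + beta with f(x) = a and f(y) = b.

    g must separate x and y, that is, g(x) != g(y).
    """
    gx, gy = g(x), g(y)
    if gx == gy:
        raise ValueError("g must take different values at x and y")
    alpha = Fraction(a - b) / (gx - gy)
    beta = Fraction(b * gx - a * gy) / (gx - gy)
    # beta multiplies the constant function 1, so it is simply added.
    return lambda t: alpha * g(t) + beta

# A sample separating function, chosen only for illustration.
def square(t):
    return t * t

f = interpolate(square, 1, 3, 5, -2)
```

Here `f(1)` is `5` and `f(3)` is `-2`, as required.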
(d) (The heart of the proof lies in the next two paragraphs.) Let $h:K\to[0,\infty[$ be a continuous function and $x$ any point of $K$. For any $\epsilon>0$, there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. $\mathbf{P}$ Let $\mathcal{G}_x$ be the family of those open sets $G\subseteq X$ for which there is some $f\in A$ such that $f(x)=h(x)$ and $f(w)\le h(w)+\epsilon$ for every $w\in K\cap G$. I claim that $K\subseteq\bigcup\mathcal{G}_x$. To see this, take any $y\in K$. By (c), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)=h(y)$. Now $h-f{\restriction}K:K\to\mathbb{R}$ is a continuous function, taking the value $0$ at $y$, so there is an open subset $G$ of $X$, containing $y$, such that $(h-f{\restriction}K)(w)\ge-\epsilon$ for every $w\in G\cap K$, that is, $f(w)\le h(w)+\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal{G}_x$ and $y\in\bigcup\mathcal{G}_x$, as required.
Because $K$ is compact, $\mathcal{G}_x$ has a finite subcover $G_0,\ldots,G_n$ say. For each $i\le n$, take $f_i\in A$ such that $f_i(x)=h(x)$ and $f_i(w)\le h(w)+\epsilon$ for every $w\in G_i\cap K$. Then
$$f=f_0\wedge f_1\wedge\ldots\wedge f_n\in A,$$
by (b), and evidently $f(x)=h(x)$, while if $y\in K$ there is some $i\le n$ such that $y\in G_i$, so that
$$f(y)\le f_i(y)\le h(y)+\epsilon.\ \mathbf{Q}$$
(e) If $h:K\to\mathbb{R}$ is any continuous function and $\epsilon>0$, there is an $f\in A$ such that $|f(y)-h(y)|\le\epsilon$ for every $y\in K$. $\mathbf{P}$ This time, let $\mathcal{G}$ be the set of those open subsets $G$ of $X$ for which there is some $f\in A$ such that $f(y)\le h(y)+\epsilon$ for every $y\in K$ and $f(x)\ge h(x)-\epsilon$ for every $x\in G\cap K$. Once again, $\mathcal{G}$ is an open cover of $K$. To see this, take any $x\in K$. By (d), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. Now $h-f{\restriction}K:K\to\mathbb{R}$ is a continuous function which is zero at $x$, so there is an open subset $G$ of $X$, containing $x$, such that $(h-f{\restriction}K)(w)\le\epsilon$ for every $w\in G\cap K$, that is, $f(w)\ge h(w)-\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal{G}$ and $x\in\bigcup\mathcal{G}$, as required.
Because $K$ is compact, $\mathcal{G}$ has a finite subcover $G_0,\ldots,G_m$ say. For each $j\le m$, take $f_j\in A$ such that $f_j(y)\le h(y)+\epsilon$ for every $y\in K$ and $f_j(w)\ge h(w)-\epsilon$ for every $w\in G_j\cap K$. Then
$$f=f_0\vee f_1\vee\ldots\vee f_m\in A,$$
by (b), and evidently $f(y)\le h(y)+\epsilon$ for every $y\in K$, while if $x\in K$ there is some $j\le m$ such that $x\in G_j$, so that
$$f(x)\ge f_j(x)\ge h(x)-\epsilon.$$
Thus $|f(x)-h(x)|\le\epsilon$ for every $x\in K$, as required. $\mathbf{Q}$
(f) Thus we have an $f$ satisfying the first of the two requirements of the theorem. But for the second, set $M_0=\inf_{x\in K}h(x)$ and $M_1=\sup_{x\in K}h(x)$, and
$$f_1=(M_0\chi X)\vee(f\wedge M_1\chi X);$$
$f_1$ satisfies the second condition as well as the first. (I am tacitly assuming here what is in fact the case, that $M_0$ and $M_1$ are finite; this is because $K$ is compact; see 2A2G or 2A3N.)
281B We need some simple tools, belonging to the basic theory of normed spaces; but I hope they will be accessible even if you have not encountered 'normed spaces' before, if you keep a finger at the beginning of §2A4 as you read the next lemma.
Lemma Let $X$ be any set. Write $\ell^{\infty}(X)$ for the set of bounded functions from $X$ to $\mathbb{R}$. For $f\in\ell^{\infty}(X)$, set
$$\|f\|_{\infty}=\sup_{x\in X}|f(x)|,$$
counting the supremum as $0$ if $X$ is empty. Then
(a) $\ell^{\infty}(X)$ is a normed space.
(b) Let $A\subseteq\ell^{\infty}(X)$ be a subset and $\overline{A}$ its closure (2A3D).
(i) If $A$ is a linear subspace of $\ell^{\infty}(X)$, so is $\overline{A}$.
(ii) If $f\times g\in A$ whenever $f,g\in A$, then $f\times g\in\overline{A}$ whenever $f,g\in\overline{A}$.
(iii) If $|f|\in A$ whenever $f\in A$, then $|f|\in\overline{A}$ whenever $f\in\overline{A}$.
proof (a) This is a routine verification. To confirm that $\ell^{\infty}(X)$ is a linear space over $\mathbb{R}$, we have to check that $f+g$, $cf$ belong to $\ell^{\infty}(X)$ whenever $f,g\in\ell^{\infty}(X)$ and $c\in\mathbb{R}$; simultaneously we can confirm that $\|\ \|_{\infty}$ is a norm on $\ell^{\infty}(X)$ by observing that
$$|(f+g)(x)|\le|f(x)|+|g(x)|\le\|f\|_{\infty}+\|g\|_{\infty},$$
$$|cf(x)|=|c||f(x)|\le|c|\|f\|_{\infty}$$
whenever $f,g\in\ell^{\infty}(X)$ and $c\in\mathbb{R}$. It is worth noting at the same time that if $f,g\in\ell^{\infty}(X)$, then
$$|(f\times g)(x)|=|f(x)||g(x)|\le\|f\|_{\infty}\|g\|_{\infty}$$
for every $x\in X$, so that $\|f\times g\|_{\infty}\le\|f\|_{\infty}\|g\|_{\infty}$.
(Of course all these remarks are very elementary special cases of parts of §243; see 243Xl.)
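(b) Recall that
$$\overline{A}=\{f:f\in\ell^{\infty}(X),\ \forall\,\epsilon>0\ \exists\,f_1\in A,\ \|f-f_1\|_{\infty}\le\epsilon\}$$
(2A3Kb). Take $f,g\in\overline{A}$ and $c\in\mathbb{R}$, and let $\epsilon>0$. Set
$$\eta=\min\Bigl(1,\frac{\epsilon}{2+|c|+\|f\|_{\infty}+\|g\|_{\infty}}\Bigr)>0.$$
Then there are $f_1,g_1\in A$ such that $\|f-f_1\|_{\infty}\le\eta$, $\|g-g_1\|_{\infty}\le\eta$.
Now
$$\|(f+g)-(f_1+g_1)\|_{\infty}\le\|f-f_1\|_{\infty}+\|g-g_1\|_{\infty}\le2\eta\le\epsilon,$$
$$\|cf-cf_1\|_{\infty}=|c|\|f-f_1\|_{\infty}\le|c|\eta\le\epsilon,$$
$$\|(f\times g)-(f_1\times g_1)\|_{\infty}=\|(f-f_1)\times g+f\times(g-g_1)-(f-f_1)\times(g-g_1)\|_{\infty}$$
$$\le\|(f-f_1)\times g\|_{\infty}+\|f\times(g-g_1)\|_{\infty}+\|(f-f_1)\times(g-g_1)\|_{\infty}$$
$$\le\|f-f_1\|_{\infty}\|g\|_{\infty}+\|f\|_{\infty}\|g-g_1\|_{\infty}+\|f-f_1\|_{\infty}\|g-g_1\|_{\infty}$$
$$\le\eta(\|g\|_{\infty}+\|f\|_{\infty}+\eta)\le\eta(\|g\|_{\infty}+\|f\|_{\infty}+1)\le\epsilon,$$
$$\||f|-|f_1|\|_{\infty}\le\|f-f_1\|_{\infty}\le\eta\le\epsilon.$$
(i) If $A$ is a linear subspace, then $f_1+g_1$ and $cf_1$ belong to $A$. As $\epsilon$ is arbitrary, $f+g$ and $cf$ belong to $\overline{A}$. As $f$, $g$ and $c$ are arbitrary, $\overline{A}$ is a linear subspace of $\ell^{\infty}(X)$.
(ii) If $A$ is closed under multiplication, then $f_1\times g_1\in A$. As $\epsilon$ is arbitrary, $f\times g\in\overline{A}$.
(iii) If the absolute values of functions in $A$ belong to $A$, then $|f_1|\in A$. As $\epsilon$ is arbitrary, $|f|\in\overline{A}$.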
281C Lemma There is a sequence $\langle p_n \rangle_{n \in \mathbb{N}}$ of real polynomials such that $\lim_{n \to \infty} p_n(x) = |x|$ uniformly for $x \in [-1, 1]$.
proof (a) By the Binomial Theorem we have
$$(1 - x)^{1/2} = 1 - \frac{1}{2}x - \frac{1}{4 \cdot 2!}x^2 - \frac{1 \cdot 3}{2^3 \cdot 3!}x^3 - \dots = -\sum_{n=0}^{\infty} \frac{(2n)!}{(2n-1)(2^n n!)^2} x^n$$
whenever $|x| < 1$, with the convergence being uniform on any interval $[-a, a]$ with $0 \le a < 1$. (For a proof of this, see almost any book on real or complex analysis. If you have no favourite text to hand, you can try to construct a proof from the following facts: (i) the radius of convergence of the series is 1, so on any interval $[-a, a]$, with $0 \le a < 1$, it is uniformly absolutely summable (ii) writing $f(x)$ for the sum of the series for $|x| < 1$, use Lebesgue's Dominated Convergence Theorem to find expressions for the indefinite integrals $\int_0^x f$, $\int_{-x}^0 f$ and show that these are $\frac{2}{3}(1 - (1 - x)f(x))$, $-\frac{2}{3}(1 - (1 + x)f(-x))$ for $0 \le x < 1$ (iii) use the Fundamental Theorem of Calculus to show that $f(x) + 2(1 - x)f'(x) = 0$ (iv) show that $\frac{d}{dx}\bigl(\frac{f(x)^2}{1-x}\bigr) = 0$ and hence (v) that $f(x)^2 = 1 - x$ whenever $|x| < 1$. Finally, show that because $f$ is continuous and non-zero in $]-1, 1[$, $f(x)$ must be the positive square root of $1 - x$ throughout.)
We have a further fragment of information. If we set
$$q_0(x) = 1, \quad q_1(x) = 1 - \frac{1}{2}x, \quad q_n(x) = -\sum_{k=0}^{n} \frac{(2k)!}{(2k-1)(2^k k!)^2} x^k$$
for $n \ge 2$ and $x \in [0, 1]$, so that $q_n$ is the $n$th partial sum of the binomial series for $(1 - x)^{1/2}$, then we have $\lim_{n \to \infty} q_n(x) = (1 - x)^{1/2}$ for every $x \in [0, 1[$. But also every $q_n$ is non-increasing on $[0, 1]$, and $\langle q_n(x) \rangle_{n \in \mathbb{N}}$ is a non-increasing sequence for each $x \in [0, 1]$. So we must have
$$\sqrt{1 - x} \le q_n(x) \quad \forall\, n \in \mathbb{N},\ x \in [0, 1[,$$
and therefore, because all the $q_n$ are continuous,
$$\sqrt{1 - x} \le q_n(x) \quad \forall\, n \in \mathbb{N},\ x \in [0, 1].$$
Moreover, given $\varepsilon > 0$, set $a = 1 - \frac{1}{4}\varepsilon^2$, so that $\sqrt{1 - a} = \frac{\varepsilon}{2}$. Then there is an $n_0 \in \mathbb{N}$ such that $q_n(x) - \sqrt{1 - x} \le \frac{\varepsilon}{2}$ for every $x \in [0, a]$ and $n \ge n_0$. In particular, $q_n(a) \le \varepsilon$, so $q_n(x) \le \varepsilon$ and $q_n(x) - \sqrt{1 - x} \le \varepsilon$ for every $x \in [a, 1]$, $n \ge n_0$.
This means that
$$0 \le q_n(x) - \sqrt{1 - x} \le \varepsilon \quad \forall\, n \ge n_0,\ x \in [0, 1];$$
as $\varepsilon$ is arbitrary, $\langle q_n(x) \rangle_{n \in \mathbb{N}} \to \sqrt{1 - x}$ uniformly on $[0, 1]$.


(b) Now set $p_n(x) = q_n(1 - x^2)$ for $x \in \mathbb{R}$. Because each $q_n$ is a real polynomial of degree $n$, each $p_n$ is a real polynomial of degree $2n$. Next,
$$\sup_{|x| \le 1} |p_n(x) - |x|| = \sup_{|x| \le 1} \bigl|q_n(1 - x^2) - \sqrt{1 - (1 - x^2)}\bigr| = \sup_{y \in [0,1]} \bigl|q_n(y) - \sqrt{1 - y}\bigr| \to 0$$
as $n \to \infty$, so $\lim_{n \to \infty} p_n(x) = |x|$ uniformly for $|x| \le 1$, as required.
281D Corollary Let $X$ be a set, and $A$ a norm-closed linear subspace of $\ell^\infty(X)$ containing $\chi X$ and such that $f \times g \in A$ whenever $f, g \in A$. Then $|f| \in A$ for every $f \in A$.
proof Set
$$f_1 = \frac{1}{1 + \|f\|_\infty} f,$$
so that $f_1 \in A$ and $\|f_1\|_\infty \le 1$. Because $A$ contains $\chi X$ and is closed under multiplication, $p \circ f_1 \in A$ for every polynomial $p$ with real coefficients. In particular, $g_n = p_n \circ f_1 \in A$ for every $n$, where $\langle p_n \rangle_{n \in \mathbb{N}}$ is the sequence of 281C.
Now, because $|f_1(x)| \le 1$ for every $x \in X$,
$$\|g_n - |f_1|\|_\infty = \sup_{x \in X} \bigl|p_n(f_1(x)) - |f_1(x)|\bigr| \le \sup_{|y| \le 1} \bigl|p_n(y) - |y|\bigr| \to 0$$
as $n \to \infty$. Because $A$ is $\|\ \|_\infty$-closed, $|f_1| \in A$; consequently $|f| \in A$, as claimed.
281E Stone-Weierstrass theorem: second form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on $X$. Let $A \subseteq C_b(X)$ be such that
$A$ is a linear subspace of $C_b(X)$;
$f \times g \in A$ for every $f, g \in A$;
$\chi X \in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{R}$ and $\varepsilon > 0$ there is an $f \in A$ such that
$|f(x) - h(x)| \le \varepsilon$ for every $x \in K$,
if $K \ne \emptyset$, $\inf_{x \in X} f(x) \ge \inf_{x \in K} h(x)$ and $\sup_{x \in X} f(x) \le \sup_{x \in K} h(x)$.
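proof Let $\overline{A}$ be the $\|\ \|_\infty$-closure of $A$ in $\ell^\infty(X)$. It is helpful to know that $\overline{A} \subseteq C_b(X)$; this is because the uniform limit of continuous functions is continuous. (But if this is new to you, or your memory has faded, don't take time to look it up now; just read '$\overline{A} \cap C_b(X)$' in place of '$\overline{A}$' in the rest of this argument.) By 281B-281D, $\overline{A}$ is a linear subspace of $C_b(X)$ and $|f| \in \overline{A}$ for every $f \in \overline{A}$, so the conditions of 281A apply to $\overline{A}$.
Take a continuous $h : K \to \mathbb{R}$ and an $\varepsilon > 0$. The cases in which $K = \emptyset$ or $h$ is constant are trivial, because all constant functions belong to $A$; so I suppose that $M_0 = \inf_{x \in K} h(x)$ and $M_1 = \sup_{x \in K} h(x)$ are defined and distinct.
As observed at the end of the proof of 281A, $M_0$ and $M_1$ are finite. Set
$$\eta = \min\bigl(\tfrac{1}{3}\varepsilon, \tfrac{1}{2}(M_1 - M_0)\bigr) > 0,$$
$$\tilde{h}(x) = \mathrm{med}(M_0 + \eta, h(x), M_1 - \eta) \text{ for } x \in K$$
(definition: 2A1Ac), so that $\tilde{h} : K \to \mathbb{R}$ is continuous and $M_0 + \eta \le \tilde{h}(x) \le M_1 - \eta$ for every $x \in K$. By 281A, there is an $f_0 \in \overline{A}$ such that $|f_0(x) - \tilde{h}(x)| \le \eta$ for every $x \in K$ and $M_0 + \eta \le f_0(x) \le M_1 - \eta$ for every $x \in X$. Now there is an $f \in A$ such that $\|f - f_0\|_\infty \le \eta$, so that
$$|f(x) - h(x)| \le |f(x) - f_0(x)| + |f_0(x) - \tilde{h}(x)| + |\tilde{h}(x) - h(x)| \le 3\eta \le \varepsilon$$
for every $x \in K$, while
$$M_0 \le f_0(x) - \eta \le f(x) \le f_0(x) + \eta \le M_1$$
for every $x \in X$.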
281F Corollary: Weierstrass' theorem Let $K$ be any closed bounded subset of $\mathbb{R}$. Then every continuous $h : K \to \mathbb{R}$ can be uniformly approximated on $K$ by polynomials.
proof Apply 281E with $X = K$ (noting that $K$, being closed and bounded, is compact), and $A$ the set of polynomials with real coefficients, regarded as functions from $K$ to $\mathbb{R}$.
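For a concrete feel for 281F on $K = [0, 1]$, one can compute explicit approximating polynomials. The sketch below uses Bernstein polynomials, which is a different and more constructive route than the argument through 281E given here; it is offered only as an illustration, and the names `bernstein` and `uniform_error` are invented for this example.

```python
from math import comb


def bernstein(h, n):
    """Return the n-th Bernstein polynomial of h as a function on [0, 1].

    B_n h(x) = sum_{k=0}^{n} h(k/n) C(n, k) x^k (1 - x)^(n - k).
    """
    values = [h(k / n) for k in range(n + 1)]

    def poly(x):
        return sum(v * comb(n, k) * x**k * (1 - x) ** (n - k)
                   for k, v in enumerate(values))

    return poly


def uniform_error(h, n, grid=501):
    """Largest value of |B_n h(x) - h(x)| over an evenly spaced grid in [0, 1]."""
    b = bernstein(h, n)
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(b(x) - h(x)) for x in xs)
```

Since $B_n h(x)$ is a weighted average of values of $h$, it also satisfies $\inf h \le B_n h \le \sup h$ on $[0, 1]$, in the spirit of the bounds in 281E. For instance, `uniform_error(lambda x: abs(x - 0.5), 200)` is noticeably smaller than the same quantity with `n = 20`.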
281G Stone-Weierstrass theorem: third form Let $X$ be a topological space and $K$ a compact subset of $X$. Write $C_b(X; \mathbb{C})$ for the space of all bounded continuous complex-valued functions on $X$, so that $C_b(X; \mathbb{C})$ is a linear space over $\mathbb{C}$. Let $A \subseteq C_b(X; \mathbb{C})$ be such that
$A$ is a linear subspace of $C_b(X; \mathbb{C})$;
$f \times g \in A$ for every $f, g \in A$;
$\chi X \in A$;
the complex conjugate $\bar{f}$ of $f$ belongs to $A$ for every $f \in A$;
whenever $x$, $y$ are distinct points of $K$ there is an $f \in A$ such that $f(x) \ne f(y)$.
Then for every continuous $h : K \to \mathbb{C}$ and $\varepsilon > 0$ there is an $f \in A$ such that
$|f(x) - h(x)| \le \varepsilon$ for every $x \in K$,
if $K \ne \emptyset$, $\sup_{x \in X} |f(x)| \le \sup_{x \in K} |h(x)|$.
proof If $K = \emptyset$, or $h$ is identically zero, we can take $f = 0$. So let us suppose that $M = \sup_{x \in K} |h(x)| > 0$.
(a) Set
$$A_{\mathbb{R}} = \{f : f \in A,\ f(x) \text{ is real for every } x \in X\}.$$
Then $A_{\mathbb{R}}$ satisfies the conditions of 281E. P Evidently $A_{\mathbb{R}}$ is a subset of $C_b(X) = C_b(X; \mathbb{R})$, is closed under addition, multiplication by real scalars and pointwise multiplication of functions, and contains $\chi X$. If $x$, $y$ are distinct points of $K$, there is an $f \in A$ such that $f(x) \ne f(y)$. Now
$$\mathrm{Re}\, f = \tfrac{1}{2}(f + \bar{f}), \quad \mathrm{Im}\, f = \tfrac{1}{2i}(f - \bar{f})$$
both belong to $A$ and are real-valued, so belong to $A_{\mathbb{R}}$, and at least one of them takes different values at $x$ and $y$. Q
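(b) Consequently, given a continuous function $h : K \to \mathbb{C}$ and $\varepsilon > 0$, we may apply 281E twice to find $f_1, f_2 \in A_{\mathbb{R}}$ such that
$$|f_1(x) - \mathrm{Re}(h(x))| \le \eta, \quad |f_2(x) - \mathrm{Im}(h(x))| \le \eta$$
for every $x \in K$, where $\eta = \min\bigl(\frac{1}{2}, \frac{\varepsilon M}{6M + 4}\bigr) > 0$. Setting $g = f_1 + i f_2$, we have $g \in A$ and $|g(x) - h(x)| \le 2\eta$ for every $x \in K$.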
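(c) Set $L = \|g\|_\infty$. If $L \le M$ we can take $f = g$ and stop. Otherwise, consider the function
$$\phi(t) = \frac{M - \eta}{\max(M, \sqrt{t})}$$
for $t \in [0, L^2]$. By Weierstrass' theorem (281F), there is a real polynomial $p$ such that $|\phi(t) - p(t)| \le \frac{\eta}{L}$ whenever $0 \le t \le L^2$. Note that $|g|^2 = g \times \bar{g} \in A$, so that
$$f = g \times p(|g|^2) \in A.$$
Now
$$|p(t)| \le \phi(t) + \frac{\eta}{L} \le \phi(t) + \frac{\eta}{\max(M, \sqrt{t})} = \frac{M}{\max(M, \sqrt{t})}$$
whenever $0 \le t \le L^2$, so
$$|f(x)| \le |g(x)| \frac{M}{\max(M, |g(x)|)} \le M$$
for every $x \in X$. Next, if $0 \le t \le \min(L, M + 2\eta)^2$,
$$|1 - p(t)| \le \frac{\eta}{L} + 1 - \phi(t) \le \frac{\eta}{M} + 1 - \frac{M - \eta}{M + 2\eta} \le \frac{4\eta}{M}.$$
Consequently, if $x \in K$, so that
$$|g(x)| \le \min(L, |h(x)| + 2\eta) \le \min(L, M + 2\eta),$$
we shall have
$$|1 - p(|g(x)|^2)| \le \frac{4\eta}{M},$$
and
$$|f(x) - h(x)| \le |g(x) - h(x)| + |g(x)||1 - p(|g(x)|^2)| \le 2\eta + \frac{4\eta}{M}(M + 2\eta) \le 2\eta + \frac{4\eta}{M}(M + 1) \le \varepsilon,$$
as required.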
Remark Of course we could have saved ourselves effort by settling for
$$\sup_{x \in X} |f(x)| \le 2 \sup_{x \in K} |h(x)|,$$
which would be quite good enough for the applications below.
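281H Corollary Let $[a, b] \subseteq \mathbb{R}$ be a non-empty bounded closed interval and $h : [a, b] \to \mathbb{C}$ a continuous function. Then for any $\varepsilon > 0$ there are $y_0, \dots, y_n \in \mathbb{R}$ and $c_0, \dots, c_n \in \mathbb{C}$ such that
$$\Bigl|h(x) - \sum_{k=0}^{n} c_k e^{i y_k x}\Bigr| \le \varepsilon \text{ for every } x \in [a, b],$$
$$\sup_{x \in \mathbb{R}} \Bigl|\sum_{k=0}^{n} c_k e^{i y_k x}\Bigr| \le \sup_{x \in [a,b]} |h(x)|.$$
proof Apply 281G with $X = \mathbb{R}$, $K = [a, b]$ and $A$ the linear span of the functions $x \mapsto e^{iyx}$ as $y$ runs over $\mathbb{R}$.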
281I Corollary Let $S^1$ be the unit circle $\{z : |z| = 1\} \subseteq \mathbb{C}$. Then for any continuous function $h : S^1 \to \mathbb{C}$ and $\varepsilon > 0$, there are $n \in \mathbb{N}$ and $c_{-n}, c_{-n+1}, \dots, c_0, \dots, c_n \in \mathbb{C}$ such that $|h(z) - \sum_{k=-n}^{n} c_k z^k| \le \varepsilon$ for every $z \in S^1$.
proof Apply 281G with $X = K = S^1$ and $A$ the linear span of the functions $z \mapsto z^k$ for $k \in \mathbb{Z}$.
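281J Corollary Let $h : [-\pi, \pi] \to \mathbb{C}$ be a continuous function such that $h(\pi) = h(-\pi)$. Then for any $\varepsilon > 0$ there are $n \in \mathbb{N}$, $c_{-n}, \dots, c_n \in \mathbb{C}$ such that $|h(x) - \sum_{k=-n}^{n} c_k e^{ikx}| \le \varepsilon$ for every $x \in [-\pi, \pi]$.
proof The point is that $\tilde{h} : S^1 \to \mathbb{C}$ is continuous on $S^1$, where $\tilde{h}(z) = h(\arg z)$; this is because $\arg$ is continuous everywhere except at $-1$, and
$$\lim_{x \downarrow -\pi} h(x) = h(-\pi) = h(\pi) = \lim_{x \uparrow \pi} h(x),$$
so
$$\lim_{z \in S^1,\, z \to -1} \tilde{h}(z) = h(\pi) = \tilde{h}(-1).$$
Now by 281I there are $c_{-n}, \dots, c_n \in \mathbb{C}$ such that $|\tilde{h}(z) - \sum_{k=-n}^{n} c_k z^k| \le \varepsilon$ for every $z \in S^1$, and these coefficients serve equally for $h$, because $h(x) = \tilde{h}(e^{ix})$ for every $x \in [-\pi, \pi]$.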
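281K Corollary Suppose that $r \ge 1$ and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let $h : K \to \mathbb{C}$ be a continuous function, and $\varepsilon > 0$. Then there are $y_0, \dots, y_n \in \mathbb{Q}^r$ and $c_0, \dots, c_n \in \mathbb{C}$ such that
$$\Bigl|h(x) - \sum_{k=0}^{n} c_k e^{i y_k \cdot x}\Bigr| \le \varepsilon \text{ for every } x \in K,$$
$$\sup_{x \in \mathbb{R}^r} \Bigl|\sum_{k=0}^{n} c_k e^{i y_k \cdot x}\Bigr| \le \sup_{x \in K} |h(x)|,$$
writing $y \cdot x = \sum_{j=1}^{r} \eta_j \xi_j$ when $y = (\eta_1, \dots, \eta_r)$ and $x = (\xi_1, \dots, \xi_r)$ belong to $\mathbb{R}^r$.
proof Apply 281G with $X = \mathbb{R}^r$ and $A$ the linear span of the functions $x \mapsto e^{i y \cdot x}$ as $y$ runs over $\mathbb{Q}^r$.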
281L Corollary Suppose that $r \ge 1$ and that $K \subseteq \mathbb{R}^r$ is a non-empty closed bounded set. Let $h : K \to \mathbb{R}$ be a continuous function, and $\varepsilon > 0$. Then there are $y_0, \dots, y_n \in \mathbb{R}^r$ and $c_0, \dots, c_n \in \mathbb{C}$ such that, writing $g(x) = \sum_{k=0}^{n} c_k e^{i y_k \cdot x}$, $g$ is real-valued and
$$|h(x) - g(x)| \le \varepsilon \text{ for every } x \in K,$$
$$\inf_{y \in K} h(y) \le g(x) \le \sup_{y \in K} h(y) \text{ for every } x \in \mathbb{R}^r.$$
proof Apply 281E with $X = \mathbb{R}^r$ and $A$ the set of real-valued functions on $\mathbb{R}^r$ which are complex linear combinations of the functions $x \mapsto e^{i y \cdot x}$; as remarked in part (a) of the proof of 281G, $A$ satisfies the conditions of 281E.
281M Weyl's Equidistribution Theorem We are now ready for one of the basic results of number theory. I shall actually apply it to provide an example in §285 below, but (at least in the one-variable case) it is surely on the (rather long) list of things which every pure mathematician should know. For the sake of the application I have in mind, I give the full $r$-dimensional version, but you may wish to take it in the first place with $r = 1$.
It will be helpful to have a notation for 'fractional part'. For any real number $x$, write $<x>$ for that number in $[0, 1[$ such that $x - {<}x{>}$ is an integer. Now for the theorem.
281N Theorem Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$. Then whenever $0 \le \alpha_j \le \beta_j \le 1$ for each $j \le r$,
$$\lim_{n \to \infty} \frac{1}{n+1} \#(\{m : m \le n,\ {<}m\eta_j{>} \in [\alpha_j, \beta_j] \text{ for every } j \le r\}) = \prod_{j=1}^{r} (\beta_j - \alpha_j).$$
Remark Thus the theorem says that the long-term proportion of the $r$-tuples $({<}m\eta_1{>}, \dots, {<}m\eta_r{>})$ which belong to the interval $[a, b] \subseteq [\mathbf{0}, \mathbf{1}]$ is just the Lebesgue measure $\mu[a, b]$ of the interval. Of course the condition '$\eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$' is necessary as well as sufficient (281Xg).
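The statement of 281N can be explored numerically before reading the proof. The following sketch is purely illustrative: the function names, the choice $(\sqrt{2}, \sqrt{3})$ (for which $1, \sqrt{2}, \sqrt{3}$ are linearly independent over $\mathbb{Q}$) and the test box are assumptions made for this example, and floating-point fractional parts are only approximations to the exact ones.

```python
from math import floor, sqrt


def frac(x):
    """Fractional part <x> in [0, 1[ (up to floating-point rounding)."""
    return x - floor(x)


def proportion(etas, boxes, n):
    """Proportion of m in {0, ..., n} with <m eta_j> in [alpha_j, beta_j] for every j.

    etas  : sequence (eta_1, ..., eta_r)
    boxes : sequence of pairs (alpha_j, beta_j)
    """
    count = 0
    for m in range(n + 1):
        if all(lo <= frac(m * eta) <= hi for eta, (lo, hi) in zip(etas, boxes)):
            count += 1
    return count / (n + 1)
```

With `etas = (sqrt(2), sqrt(3))` and `boxes = ((0.2, 0.5), (0.1, 0.7))` the proportion for large `n` should be close to $0.3 \times 0.6 = 0.18$. By contrast, `proportion((0.5,), ((0.1, 0.4),), 1000)` is `0.0`, since $<m/2>$ only takes the values $0$ and $\frac{1}{2}$; here $1$ and $\frac{1}{2}$ are not linearly independent over $\mathbb{Q}$.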
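proof (a) Write $y = (\eta_1, \dots, \eta_r) \in \mathbb{R}^r$,
$${<}my{>} = ({<}m\eta_1{>}, \dots, {<}m\eta_r{>}) \in [\mathbf{0}, \mathbf{1}[ \,= [0, 1[^r$$
for each $m \in \mathbb{N}$. Set $I = [\mathbf{0}, \mathbf{1}] = [0, 1]^r$, and for any function $f : I \to \mathbb{R}$ write
$$\overline{L}(f) = \limsup_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>}),$$
$$\underline{L}(f) = \liminf_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>});$$
and for $f : I \to \mathbb{C}$ write
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>})$$
if the limit exists. It will be worth noting that for non-negative functions $f, g, h : I \to \mathbb{R}$ such that $h \le f + g$,
$$\overline{L}(h) \le \overline{L}(f) + \overline{L}(g),$$
and that $L(cf + g) = cL(f) + L(g)$ for any two functions $f, g : I \to \mathbb{C}$ such that $L(f)$ and $L(g)$ exist, and any $c \in \mathbb{C}$.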
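(b) I mean to show that $L(f)$ exists and is equal to $\int_I f$ for (many) continuous functions $f$. The key step is to consider functions of the form
$$f(x) = e^{2\pi i k \cdot x},$$
where $k = (\kappa_1, \dots, \kappa_r) \in \mathbb{Z}^r$. In this case, if $k \ne \mathbf{0}$,
$$k \cdot y = \sum_{j=1}^{r} \kappa_j \eta_j \notin \mathbb{Z}$$
because $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$. So
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} e^{2\pi i k \cdot {<}my{>}} = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} e^{2\pi i m k \cdot y}$$
(because $m k \cdot y - k \cdot {<}my{>} = \sum_{j=1}^{r} \kappa_j (m\eta_j - {<}m\eta_j{>})$ is an integer)
$$= \lim_{n \to \infty} \frac{1 - e^{2\pi i (n+1) k \cdot y}}{(n+1)(1 - e^{2\pi i k \cdot y})}$$
(because $e^{2\pi i k \cdot y} \ne 1$)
$$= 0,$$
because $|1 - e^{2\pi i (n+1) k \cdot y}| \le 2$ for every $n$. Of course we can also calculate the integral of $f$ over $I$, which is
$$\int_I f(x)\,dx = \int_I e^{2\pi i k \cdot x}\,dx = \int_I \prod_{j=1}^{r} e^{2\pi i \kappa_j \xi_j}\,dx$$
(writing $x = (\xi_1, \dots, \xi_r)$)
$$= \int_0^1 \dots \int_0^1 \prod_{j=1}^{r} e^{2\pi i \kappa_j \xi_j}\, d\xi_r \dots d\xi_1 = \int_0^1 e^{2\pi i \kappa_r \xi_r}\, d\xi_r \dots \int_0^1 e^{2\pi i \kappa_1 \xi_1}\, d\xi_1 = 0$$
because at least one $\kappa_j$ is non-zero, and for this $j$ we must have
$$\int_0^1 e^{2\pi i \kappa_j \xi_j}\, d\xi_j = \frac{1}{2\pi i \kappa_j}(e^{2\pi i \kappa_j} - 1) = 0.$$
So we have $L(f) = \int_I f = 0$ when $k \ne \mathbf{0}$. On the other hand, if $k = \mathbf{0}$, then $f$ is constant with value 1, so
$$L(f) = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>}) = \lim_{n \to \infty} 1 = 1 = \int_I f(x)\,dx.$$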
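(c) Now write $\partial I = [\mathbf{0}, \mathbf{1}] \setminus\, ]\mathbf{0}, \mathbf{1}[$, the boundary of $I$. If $f : I \to \mathbb{C}$ is continuous and $f(x) = 0$ for $x \in \partial I$, then $L(f) = \int_I f$. P As in 281I, let $S^1$ be the unit circle $\{z : z \in \mathbb{C},\ |z| = 1\}$, and set $K = (S^1)^r \subseteq \mathbb{C}^r$. If we think of $K$ as a subset of $\mathbb{R}^{2r}$, it is closed and bounded. Let $\phi : K \to I$ be given by
$$\phi(\zeta_1, \dots, \zeta_r) = \Bigl(\frac{1}{2} + \frac{\arg \zeta_1}{2\pi}, \dots, \frac{1}{2} + \frac{\arg \zeta_r}{2\pi}\Bigr)$$
for $\zeta_1, \dots, \zeta_r \in S^1$. Then $h = f\phi : K \to \mathbb{C}$ is continuous, because $\phi$ is continuous on $(S^1 \setminus \{-1\})^r$ and
$$\lim_{w \to z} f\phi(w) = f\phi(z) = 0$$
for any $z \in K \setminus (S^1 \setminus \{-1\})^r$. (Compare 281J.) Now apply 281G with $X = K$ and $A$ the set of polynomials in $\zeta_1, \dots, \zeta_r, \zeta_1^{-1}, \dots, \zeta_r^{-1}$ to see that, given $\varepsilon > 0$, there is a function of the form
$$g(z) = \sum_{k \in J} c_k \zeta_1^{\kappa_1} \dots \zeta_r^{\kappa_r},$$
for some finite set $J \subseteq \mathbb{Z}^r$ and constants $c_k \in \mathbb{C}$ for $k \in J$, such that
$$|g(z) - h(z)| \le \varepsilon \text{ for every } z \in K.$$
Set
$$\tilde{g}(x) = g(e^{\pi i (2\xi_1 - 1)}, \dots, e^{\pi i (2\xi_r - 1)}) = \sum_{k \in J} c_k e^{\pi i k \cdot (2x - \mathbf{1})} = \sum_{k \in J} (-1)^{k \cdot \mathbf{1}} c_k e^{2\pi i k \cdot x},$$
so that $\tilde{g}\phi = g$, and see that
$$\sup_{x \in I} |\tilde{g}(x) - f(x)| = \sup_{z \in K} |g(z) - h(z)| \le \varepsilon.$$
Now $\tilde{g}$ is a linear combination of functions of the form dealt with in (b), so we must have $L(\tilde{g}) = \int_I \tilde{g}$. Let $n_0$ be such that
$$\Bigl|\int_I \tilde{g} - \frac{1}{n+1} \sum_{m=0}^{n} \tilde{g}({<}my{>})\Bigr| \le \varepsilon$$
for every $n \ge n_0$. Then
$$\Bigl|\int_I f - \int_I \tilde{g}\Bigr| \le \int_I |f - \tilde{g}| \le \varepsilon$$
and
$$\Bigl|\frac{1}{n+1} \sum_{m=0}^{n} \tilde{g}({<}my{>}) - \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>})\Bigr| \le \frac{1}{n+1} \sum_{m=0}^{n} |\tilde{g}({<}my{>}) - f({<}my{>})| \le \frac{1}{n+1}(n+1)\varepsilon = \varepsilon$$
for every $n \in \mathbb{N}$. So for $n \ge n_0$ we must have
$$\Bigl|\frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>}) - \int_I f\Bigr| \le 3\varepsilon.$$
As $\varepsilon$ is arbitrary, $L(f) = \int_I f$, as required. Q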
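(d) Observe next that if $a, b \in\, ]\mathbf{0}, \mathbf{1}[ \,=\, ]0, 1[^r$, and $\varepsilon > 0$, there are continuous functions $f_1$, $f_2$ such that
$$f_1 \le \chi[a, b] \le f_2 \le \chi\, ]\mathbf{0}, \mathbf{1}[, \quad \int_I f_2 - \int_I f_1 \le \varepsilon.$$
P This is elementary. For $n \in \mathbb{N}$, define $h_n : \mathbb{R} \to [0, 1]$ by setting $h_n(\xi) = 0$ if $\xi \le 0$, $2^n \xi$ if $0 \le \xi \le 2^{-n}$ and $1$ if $\xi \ge 2^{-n}$. Set
$$f_{1n}(x) = \prod_{j=1}^{r} h_n(\xi_j - \alpha_j) h_n(\beta_j - \xi_j),$$
$$f_{2n}(x) = \prod_{j=1}^{r} (1 - h_n(\alpha_j - \xi_j))(1 - h_n(\xi_j - \beta_j))$$
for $x = (\xi_1, \dots, \xi_r) \in \mathbb{R}^r$. (Compare the proof of 242Oa.) Then $f_{1n} \le \chi[a, b] \le f_{2n}$ for each $n$, $f_{2n} \le \chi\, ]\mathbf{0}, \mathbf{1}[$ for all $n$ so large that
$$2^{-n} \le \min\bigl(\min_{j \le r} \alpha_j, \min_{j \le r} (1 - \beta_j)\bigr),$$
and $\lim_{n \to \infty} f_{2n}(x) - f_{1n}(x) = 0$ for every $x$ not on the boundary of $[a, b]$, and therefore for almost every $x$, so
$$\lim_{n \to \infty} \int_I f_{2n} - \int_I f_{1n} = 0.$$
Thus we can take $f_1 = f_{1n}$, $f_2 = f_{2n}$ for any $n$ large enough. Q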
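(e) It follows that if $a, b \in\, ]\mathbf{0}, \mathbf{1}[$ and $a \le b$, $L(\chi[a, b]) = \mu[a, b]$. P Let $\varepsilon > 0$. Take $f_1$, $f_2$ as in (d). Then, using (c),
$$\overline{L}(\chi[a, b]) \le \overline{L}(f_2) = L(f_2) = \int_I f_2 \le \int_I f_1 + \varepsilon \le \mu[a, b] + \varepsilon,$$
$$\underline{L}(\chi[a, b]) \ge \underline{L}(f_1) = L(f_1) = \int_I f_1 \ge \int_I f_2 - \varepsilon \ge \mu[a, b] - \varepsilon,$$
so
$$\mu[a, b] - \varepsilon \le \underline{L}(\chi[a, b]) \le \overline{L}(\chi[a, b]) \le \mu[a, b] + \varepsilon.$$
As $\varepsilon$ is arbitrary,
$$\mu[a, b] = \overline{L}(\chi[a, b]) = \underline{L}(\chi[a, b]) = L(\chi[a, b]),$$
as required. Q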
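(f) To complete the proof, take any $a, b \in I$ with $a \le b$. For $0 < \varepsilon \le \frac{1}{2}$, set $I_\varepsilon = [\varepsilon\mathbf{1}, (1 - \varepsilon)\mathbf{1}]$, so that $I_\varepsilon$ is a closed interval included in $]\mathbf{0}, \mathbf{1}[$ and $\mu I_\varepsilon = (1 - 2\varepsilon)^r$. Of course $L(\chi I) = \mu I = 1$, so
$$\overline{L}(\chi(I \setminus I_\varepsilon)) = L(\chi I) - L(\chi I_\varepsilon) = 1 - \mu I_\varepsilon,$$
and
$$\mu[a, b] - 1 + \mu I_\varepsilon \le \mu[a, b] + \mu I_\varepsilon - \mu([a, b] \cup I_\varepsilon) = \mu([a, b] \cap I_\varepsilon)$$
$$= L(\chi([a, b] \cap I_\varepsilon)) \le \underline{L}(\chi[a, b])$$
$$\le \overline{L}(\chi[a, b]) \le \overline{L}(\chi([a, b] \cap I_\varepsilon)) + \overline{L}(\chi(I \setminus I_\varepsilon))$$
$$= L(\chi([a, b] \cap I_\varepsilon)) + 1 - \mu I_\varepsilon = \mu([a, b] \cap I_\varepsilon) + 1 - \mu I_\varepsilon \le \mu[a, b] + 1 - \mu I_\varepsilon.$$
As $\varepsilon$ is arbitrary,
$$\mu[a, b] = \underline{L}(\chi[a, b]) = \overline{L}(\chi[a, b]) = L(\chi[a, b]),$$
as stated.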
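281X Basic exercises (a) Let $A$ be the set of those bounded continuous functions $f : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ which are expressible in the form $f(x, y) = \sum_{k=0}^{n} g_k(x) g'_k(y)$, where all the $g_k$, $g'_k$ are continuous functions from $\mathbb{R}^r$ to $\mathbb{R}$. Show that for any bounded continuous function $h : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$ and any bounded set $K \subseteq \mathbb{R}^r \times \mathbb{R}^r$ and any $\varepsilon > 0$, there is an $f \in A$ such that $|f(x, y) - h(x, y)| \le \varepsilon$ for every $(x, y) \in K$ and $\sup_{x, y \in \mathbb{R}^r} |f(x, y)| \le \sup_{x, y \in \mathbb{R}^r} |h(x, y)|$.
(b) Let $K$ be a closed bounded set in $\mathbb{R}^r$, where $r \ge 1$, and $h : K \to \mathbb{R}$ a continuous function. Show that for any $\varepsilon > 0$ there is a polynomial $p$ in $r$ variables such that $|h(x) - p(x)| \le \varepsilon$ for every $x \in K$.
>(c) Let $[a, b]$ be a non-empty closed interval of $\mathbb{R}$ and $h : [a, b] \to \mathbb{R}$ a continuous function. Show that for any $\varepsilon > 0$ there are $y_0, \dots, y_n, a_0, \dots, a_n, b_0, \dots, b_n \in \mathbb{R}$ such that
$$\Bigl|h(x) - \sum_{k=0}^{n} (a_k \cos y_k x + b_k \sin y_k x)\Bigr| \le \varepsilon \text{ for every } x \in [a, b],$$
$$\sup_{x \in \mathbb{R}} \Bigl|\sum_{k=0}^{n} (a_k \cos y_k x + b_k \sin y_k x)\Bigr| \le \sup_{x \in [a,b]} |h(x)|.$$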
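(d) Let $h$ be a complex-valued function on $]-\pi, \pi]$ such that $|h|^p$ is integrable, where $1 \le p < \infty$. Show that for every $\varepsilon > 0$ there is a function of the form $x \mapsto f(x) = \sum_{k=-n}^{n} c_k e^{ikx}$, where $c_{-n}, \dots, c_n \in \mathbb{C}$, such that $\int_{-\pi}^{\pi} |h - f|^p \le \varepsilon$. (Compare 244H.)
>(e) Let $h : [-\pi, \pi] \to \mathbb{R}$ be a continuous function such that $h(\pi) = h(-\pi)$, and $\varepsilon > 0$. Show that there are $a_0, \dots, a_n, b_1, \dots, b_n \in \mathbb{R}$ such that
$$\Bigl|h(x) - \frac{1}{2}a_0 - \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)\Bigr| \le \varepsilon$$
for every $x \in [-\pi, \pi]$.
(f) Let $K$ be a non-empty closed bounded set in $\mathbb{R}^r$, where $r \ge 1$, and $h : K \to \mathbb{R}$ a continuous function. Show that for any $\varepsilon > 0$ there are $y_0, \dots, y_n \in \mathbb{R}^r$, $a_0, \dots, a_n, b_0, \dots, b_n \in \mathbb{R}$ such that
$$\Bigl|h(x) - \sum_{k=0}^{n} (a_k \cos y_k \cdot x + b_k \sin y_k \cdot x)\Bigr| \le \varepsilon \text{ for every } x \in K,$$
$$\sup_{x \in \mathbb{R}^r} \Bigl|\sum_{k=0}^{n} (a_k \cos y_k \cdot x + b_k \sin y_k \cdot x)\Bigr| \le \sup_{x \in K} |h(x)|,$$
interpreting $y \cdot x$ as in 281K.
(g) Let $y_1, \dots, y_r$ be real numbers which are not linearly independent over $\mathbb{Q}$. Show that there is a non-trivial interval $[a, b] \subseteq [\mathbf{0}, \mathbf{1}] \subseteq \mathbb{R}^r$ such that $({<}ky_1{>}, \dots, {<}ky_r{>}) \notin [a, b]$ for every $k \in \mathbb{Z}$.
(h) Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb{Q}$. Suppose that $0 \le \alpha_j \le \beta_j \le 1$ for each $j \le r$. Show that for every $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that
$$\Bigl|\prod_{j=1}^{r} (\beta_j - \alpha_j) - \frac{1}{n+1} \#(\{m : k \le m \le k + n,\ {<}m\eta_j{>} \in [\alpha_j, \beta_j] \text{ for every } j \le r\})\Bigr| \le \varepsilon$$
whenever $n \ge n_0$ and $k \in \mathbb{N}$. (Hint: in 281N, set
$$\overline{L}(f) = \limsup_{n \to \infty} \sup_{k \in \mathbb{N}} \frac{1}{n+1} \sum_{m=k}^{k+n} f({<}my{>}).)$$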
281Y Further exercises (a) Show that under the hypotheses of 281A, there is an $f \in \overline{A}$, the $\|\ \|_\infty$-closure of $A$ in $C_b(X)$, such that $f \restriction K = h$. (Hint: take $f = \lim_{n \to \infty} f_n$ where
$$\|f_{n+1} - f_n\|_\infty \le \sup_{x \in K} |f_n(x) - h(x)| \le 2^{-n}$$
for every $n \in \mathbb{N}$.)
(b) Let $X$ be a topological space and $K \subseteq X$ a compact subset. Suppose that for any distinct points $x$, $y$ of $K$ there is a continuous function $f : X \to \mathbb{R}$ such that $f(x) \ne f(y)$. Show that for any $r \in \mathbb{N}$ and any continuous $h : K \to \mathbb{R}^r$ there is a continuous $f : X \to \mathbb{R}^r$ extending $h$. (Hint: consider $r = 1$ first.)
(c) Let $\langle X_i \rangle_{i \in I}$ be any family of compact Hausdorff spaces, and $X$ their product as topological spaces. For each $i$, write $C(X_i)$ for the set of continuous functions from $X_i$ to $\mathbb{R}$, and $\pi_i : X \to X_i$ for the coordinate map. Show that the subalgebra of $C(X)$ generated by $\{f\pi_i : i \in I,\ f \in C(X_i)\}$ is $\|\ \|_\infty$-dense in $C(X)$. (Note: you will need to know that $X$ is compact, and that if $Z$ is any compact Hausdorff space then for any distinct $z, w \in Z$ there is an $f \in C(Z)$ such that $f(z) \ne f(w)$. For references see 3A3J and 3A3Bf in the next volume.)
(d) Let $X$ be a topological space and $K$ a compact subset of $X$. Let $A$ be a linear subspace of the space $C_b(X)$ of real-valued continuous functions on $X$ such that $|f| \in A$ for every $f \in A$. Let $h : K \to \mathbb{R}$ be a continuous function such that whenever $x, y \in K$ there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) = h(y)$. Show that for every $\varepsilon > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \varepsilon$ for every $x \in K$.
(e) Let $X$ be a compact topological space and write $C(X)$ for the set of continuous functions from $X$ to $\mathbb{R}$. Suppose that $h \in C(X)$, and let $A \subseteq C(X)$ be such that
$A$ is a linear subspace of $C(X)$;
either $|f| \in A$ for every $f \in A$ or $f \times g \in A$ for every $f, g \in A$ or $f \times f \in A$ for every $f \in A$;
whenever $x, y \in X$ and $\delta > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \delta$, $|f(y) - h(y)| \le \delta$.
Show that for every $\varepsilon > 0$ there is an $f \in A$ such that $|h(x) - f(x)| \le \varepsilon$ for every $x \in X$.
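(f) Let $X$ be a compact topological space and $A$ a $\|\ \|_\infty$-closed linear subspace of the space $C(X)$ of continuous functions from $X$ to $\mathbb{R}$. Show that the following are equiveridical:
(i) $|f| \in A$ for every $f \in A$;
(ii) $f \times f \in A$ for every $f \in A$;
(iii) $f \times g \in A$ for all $f, g \in A$,
and that in this case $A$ is closed in $C(X)$ for the topology defined by the pseudometrics
$$(f, g) \mapsto |f(x) - g(x)| : C(X) \times C(X) \to [0, \infty[$$
as $x$ runs over $X$ (the 'topology of pointwise convergence' on $C(X)$).
(g) Show that under the hypotheses of 281G there is an $f \in \overline{A}$, the $\|\ \|_\infty$-closure of $A$ in $C_b(X; \mathbb{C})$, such that $f \restriction K = h$ and (if $K \ne \emptyset$) $\|f\|_\infty = \sup_{x \in K} |h(x)|$.
(h) Let $y \in \mathbb{R}$ be irrational. Show that for any Riemann integrable function $f : [0, 1] \to \mathbb{R}$,
$$\int_0^1 f(x)\,dx = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} f({<}my{>}),$$
writing ${<}my{>}$ for the fractional part of $my$. (Hint: recall Riemann's criterion: for any $\varepsilon > 0$, there are $a_0, \dots, a_n$ with $0 = a_0 \le a_1 \le \dots \le a_n = 1$ and
$$\sum \bigl\{a_j - a_{j-1} : j \le n,\ \sup_{x \in [a_{j-1}, a_j]} f(x) - \inf_{x \in [a_{j-1}, a_j]} f(x) \ge \varepsilon\bigr\} \le \varepsilon.)$$
(i) Let $\langle t_n \rangle_{n \in \mathbb{N}}$ be a sequence in $[0, 1]$. Show that the following are equiveridical: (i) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^{n} f(t_k) = \int_0^1 f$ for every continuous function $f : [0, 1] \to \mathbb{R}$; (ii) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^{n} f(t_k) = \int_0^1 f$ for every Riemann integrable function $f : [0, 1] \to \mathbb{R}$; (iii) $\liminf_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n,\ t_k \in G\}) \ge \mu G$ for every open set $G \subseteq [0, 1]$; (iv) $\lim_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n,\ t_k \le \alpha\}) = \alpha$ for every $\alpha \in [0, 1]$; (v) $\lim_{n \to \infty} \frac{1}{n+1} \#(\{k : k \le n,\ t_k \in E\}) = \mu E$ for every $E \subseteq [0, 1]$ such that $\mu(\mathrm{int}\, E) = \mu\overline{E}$; (vi) $\lim_{n \to \infty} \frac{1}{n+1} \sum_{k=0}^{n} e^{2\pi i m t_k} = 0$ for every $m \ge 1$. (Cf. 273J. Such sequences $\langle t_n \rangle_{n \in \mathbb{N}}$ are called equidistributed or uniformly distributed.)
(j) Show that the sequence $\langle {<}\ln(n + 1){>} \rangle_{n \in \mathbb{N}}$ is not equidistributed.
(k) Give $[0, 1]^{\mathbb{N}}$ its product measure $\lambda$. Show that $\lambda$-almost every sequence $\langle t_n \rangle_{n \in \mathbb{N}} \in [0, 1]^{\mathbb{N}}$ is equidistributed in the sense of 281Yi. (Hint: 273J.)
(l) Let $f : [0, 1]^2 \to \mathbb{C}$ be a continuous function. Show that if $\gamma \in \mathbb{R}$ is irrational then
$$\int_{[0,1]^2} f = \lim_{a \to \infty} \frac{1}{a} \int_0^a f({<}t{>}, {<}\gamma t{>})\,dt.$$
(Hint: consider first functions of the form $x \mapsto e^{2\pi i k \cdot x}$.)
(m) A sequence $\langle t_n \rangle_{n \in \mathbb{N}}$ in $[0, 1]$ is well-distributed if $\liminf_{n \to \infty} \inf_{l \in \mathbb{N}} \frac{1}{n+1} \#(\{k : l \le k \le l + n,\ t_k \in G\}) \ge \mu G$ for every open set $G \subseteq [0, 1]$. (i) Show that $\langle t_n \rangle_{n \in \mathbb{N}}$ is well-distributed iff $\lim_{n \to \infty} \sup_{l \in \mathbb{N}} |\int_0^1 f - \frac{1}{n+1} \sum_{k=l}^{l+n} f(t_k)| = 0$ for every continuous $f : [0, 1] \to \mathbb{R}$. (ii) Show that $\langle {<}n\alpha{>} \rangle_{n \in \mathbb{N}}$ is well-distributed for every irrational $\alpha$.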
281 Notes and comments I have given three statements (281A, 281E and 281G) of the Stone-Weierstrass theorem, with an acknowledgement (281F) of Weierstrass' own version, and three further forms (281Ya, 281Yd, 281Yg) in the exercises. Yet another will appear in §4A6 in Volume 4. Faced with such a multiplicity, you may wish to try your own hand at writing out theorems which will cover some or all of these versions. I myself see no way of doing it without setting up a confusing list of alternative hypotheses and conclusions. At which point, I ask 'what is a theorem, anyway?', and answer, it is a stopping-place on our journey; it is a place where we can rest, and congratulate ourselves on our achievement; it is a place which we can learn to recognise, and use as a starting point for new adventures; it is a place we can describe, and share with others. For some theorems, like Fermat's last theorem, there is a canonical statement, an exactly locatable point. For others, like the Stone-Weierstrass theorem here, we reach a mass of closely related results, all depending on some arrangement of the arguments laid out in 281A-281G and 281Ya (which introduces a new idea), and all useful in different ways. I suppose, indeed, that most authors would prefer the versions 281Ya and 281Yg, which eliminate the variable $\varepsilon$ which appears in 281A, 281E and 281G, at the expense of taking a closed subspace $A$. But I find that the corollaries which will be useful later (281H-281L) are more naturally expressed in terms of linear subspaces which are not closed.
The applications of the theorem, or the theorems, or the method (choose your own expression) are legion; only a few of them are here. An apparently innocent one is in 281Xa and, in a different variant, in 281Yc; these are enormously important in their own domains. In this volume the principal application will be to 285L below, depending on 281K, and it is perhaps right to note that there is an alternative approach to this particular result, based on ideas in 282G. But I offer Weyl's equidistribution theorem (281M-281N) as evidence that we can expect to find good use for these ideas in almost any branch of mathematics.
282 Fourier series
Out of the enormous theory of Fourier series, I extract a few results which may at least provide a foundation for further study. I give the definitions of Fourier and Fejér sums (282A), with five of the most important results concerning their convergence (282G, 282H, 282J, 282L, 282O). On the way I include the Riemann-Lebesgue lemma (282E). I end by mentioning convolutions (282Q).
282A Definition Let $f$ be an integrable complex-valued function defined almost everywhere in $]-\pi, \pi]$.
(a) The Fourier coefficients of $f$ are the complex numbers
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx$$
for $k \in \mathbb{Z}$.
(b) The Fourier sums of $f$ are the functions
$$s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}$$
for $x \in\, ]-\pi, \pi]$, $n \in \mathbb{N}$.
(c) The Fourier series of $f$ is the series $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$, or (because we ordinarily consider the symmetric partial sums $s_n$) the series $c_0 + \sum_{k=1}^{\infty} (c_k e^{ikx} + c_{-k} e^{-ikx})$.
(d) The Fejér sums of $f$ are the functions
$$\sigma_m = \frac{1}{m+1} \sum_{n=0}^{m} s_n$$
for $m \in \mathbb{N}$.
(e) It will be convenient to have a further phrase available. If $f$ is any function with $\mathrm{dom}\, f \subseteq\, ]-\pi, \pi]$, its periodic extension is the function $\tilde{f}$, with domain $\bigcup_{k \in \mathbb{Z}} (\mathrm{dom}\, f + 2k\pi)$, such that $\tilde{f}(x) = f(x - 2k\pi)$ whenever $k \in \mathbb{Z}$ and $x \in \mathrm{dom}\, f + 2k\pi$.
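The definitions in 282A translate directly into a small numerical experiment. The sketch below is an illustration only: it approximates the integral defining $c_k$ by a midpoint rule (an assumption of this example, with an arbitrary sample count), and the function names are invented here.

```python
import cmath
from math import pi


def fourier_coefficient(f, k, samples=4000):
    """Midpoint-rule approximation to c_k = (1/2pi) * integral of f(x) e^{-ikx} over ]-pi, pi]."""
    h = 2 * pi / samples
    total = 0j
    for j in range(samples):
        x = -pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * pi)


def coefficients_up_to(f, m, samples=4000):
    """Dictionary k -> c_k for -m <= k <= m."""
    return {k: fourier_coefficient(f, k, samples) for k in range(-m, m + 1)}


def fourier_sum(coeffs, n, x):
    """s_n(x) = sum of c_k e^{ikx} for -n <= k <= n."""
    return sum(coeffs[k] * cmath.exp(1j * k * x) for k in range(-n, n + 1))


def fejer_sum(coeffs, m, x):
    """sigma_m(x) = (1/(m+1)) * sum of s_n(x) for 0 <= n <= m."""
    return sum(fourier_sum(coeffs, n, x) for n in range(m + 1)) / (m + 1)
```

For $f(x) = x$ an integration by parts gives $c_k = i(-1)^k/k$ for $k \ne 0$ and $c_0 = 0$, so `coefficients_up_to(lambda x: x, 50)[1]` should be close to $-i$, and `fejer_sum(coeffs, 50, 1.0)` should be close to $f(1) = 1$.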
282B Remarks I have made two more or less arbitrary choices here.
(a) I have chosen to express Fourier series in their 'complex' form rather than their 'real' form. From the point of view of pure measure theory (and, indeed, from the point of view of the nineteenth-century origins of the subject) there are gains in elegance from directing attention to real functions $f$ and looking at the real coefficients
$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\,dx \text{ for } k \in \mathbb{N},$$
$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\,dx \text{ for } k \ge 1.$$
If we do this we have
$$c_0 = \frac{1}{2}a_0,$$
and for $k \ge 1$ we have
$$c_k = \frac{1}{2}(a_k - ib_k), \quad c_{-k} = \frac{1}{2}(a_k + ib_k), \quad a_k = c_k + c_{-k}, \quad b_k = i(c_k - c_{-k}),$$
so that the Fourier sums become
$$s_n(x) = \frac{1}{2}a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx).$$
The advantage of this is that real functions $f$ correspond to real coefficients $a_k$, $b_k$, so that it is obvious that if $f$ is real-valued so are its Fourier and Fejér sums. The disadvantages are that we have to use a variety of trigonometric equalities which are rather more complicated than the properties of the complex exponential function which they reflect, and that we are farther away from the natural generalizations to locally compact abelian groups. So both electrical engineers and harmonic analysts tend to prefer the coefficients $c_k$.
(b) I have taken the functions $f$ to be defined on the interval $]-\pi, \pi]$ rather than on the circle $S^1 = \{z : z \in \mathbb{C},\ |z| = 1\}$. There would be advantages in elegance of language in using $S^1$, though I do not recall often seeing the formula
$$c_k = \int z^{-k} f(z)\,\nu(dz)$$
which is the natural translation of $c_k = \frac{1}{2\pi} \int e^{-ikx} f(x)\,dx$ under the substitution $x = \arg z$, $dx = 2\pi\nu(dz)$. However, applications of the theory tend to deal with periodic functions on the real line, so I work with $]-\pi, \pi]$, and accept the fact that its group operation $+_{2\pi}$, writing $x +_{2\pi} y$ for whichever of $x + y$, $x + y + 2\pi$, $x + y - 2\pi$ belongs to $]-\pi, \pi]$, is less familiar than multiplication on $S^1$.
(c) The remarks in (b) are supposed to remind you of §255.
(d) Observe that if $f =_{\text{a.e.}} g$ then $f$ and $g$ have the same Fourier coefficients, Fourier sums and Fejér sums. This means that we could, if we wished, regard the $c_k$, $s_n$ and $\sigma_m$ as associated with a member of $L^1_{\mathbb{C}}$, the space of equivalence classes of integrable functions (§242), rather than as associated with a particular function $f$. Since however the $s_n$ and $\sigma_m$ appear as actual functions, and since many of the questions we are interested in refer to their values at particular points, it is more natural to express the theory in terms of integrable functions $f$ rather than in terms of members of $L^1_{\mathbb{C}}$.
282C The problems (a) Under what conditions, and in what senses, do the Fourier and Fejér sums $s_n$ and $\sigma_m$ of a function $f$ converge to $f$?
(b) How do the properties of the double-ended sequence $\langle c_k \rangle_{k \in \mathbb{Z}}$ reflect the properties of $f$, and vice versa?
Remark The theory of Fourier series has been one of the leading topics of analysis for nearly two hundred years, and innumerable further problems have contributed greatly to our understanding. (For instance: can one characterize those sequences $\langle c_k \rangle_{k \in \mathbb{Z}}$ which are the Fourier coefficients of some integrable function?) But in this outline I will concentrate on the question (a) above, with one and a half results (282K, 282Rb) addressing (b), which will give us more than enough material to work on.
While most people would feel that the Fourier sums are somehow closer to what we really want to know, it turns out that the Fejér sums are easier to analyse, and there are advantages in dealing with them first. So while you may wish to look ahead to the statements of 282J, 282L and 282O for an idea of where we are going, the first half of this section will be largely about Fejér sums. Note that in any case in which we know that the Fourier sums converge (which is quite common; see, for instance, the examples in 282Xh and 282Xo), then if we know that the Fejér sums converge to $f$, we can deduce that the Fourier sums also do, by 273Ca.
The first step is a basic lemma showing that both the Fourier and Fejér sums of a function $f$ can be thought of as convolutions of $f$ with kernels describable in terms of familiar functions.
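282D Lemma Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx, \quad s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \quad \sigma_m(x) = \frac{1}{m+1} \sum_{n=0}^{m} s_n(x)$$
its Fourier coefficients, Fourier sums and Fejér sums. Write $\tilde{f}$ for the periodic extension of $f$ (282Ae). For $m \in \mathbb{N}$, write
$$\psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}$$
for $0 < |t| \le \pi$. (If you like, you can set $\psi_m(0) = \frac{m+1}{2\pi}$ to make $\psi_m$ continuous on $[-\pi, \pi]$.)
(a) For each $n \in \mathbb{N}$, $x \in\, ]-\pi, \pi]$,
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n + \frac{1}{2})(x - t)}{\sin \frac{1}{2}(x - t)}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x + t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt,$$
writing $x -_{2\pi} t$ for whichever of $x - t$, $x - t - 2\pi$, $x - t + 2\pi$ belongs to $]-\pi, \pi]$.
(b) For each $m \in \mathbb{N}$, $x \in\, ]-\pi, \pi]$,
$$\sigma_m(x) = \int_{-\pi}^{\pi} \tilde{f}(x + t)\psi_m(t)\,dt = \int_0^{\pi} (\tilde{f}(x + t) + \tilde{f}(x - t))\psi_m(t)\,dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t)\psi_m(t)\,dt.$$
(c) For any $n \in \mathbb{N}$,
$$\frac{1}{2\pi} \int_{-\pi}^{0} \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_0^{\pi} \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = \frac{1}{2}, \quad \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = 1.$$
(d) For any $m \in \mathbb{N}$,
(i) $0 \le \psi_m(t) \le \frac{m+1}{2\pi}$ for every $t$;
(ii) for any $\delta > 0$, $\lim_{m \to \infty} \psi_m(t) = 0$ uniformly on $\{t : \delta \le |t| \le \pi\}$;
(iii) $\int_{-\pi}^{0} \psi_m = \int_0^{\pi} \psi_m = \frac{1}{2}$, $\int_{-\pi}^{\pi} \psi_m = 1$.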
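proof Really all that these amount to is summing geometric series.
(a) For (a), we have
$$\sum_{k=-n}^{n} e^{ikt} = \frac{e^{-int} - e^{i(n+1)t}}{1 - e^{it}} = \frac{e^{i(n + \frac{1}{2})t} - e^{-i(n + \frac{1}{2})t}}{e^{\frac{1}{2}it} - e^{-\frac{1}{2}it}} = \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}.$$
So
$$s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx} = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \Bigl(\sum_{k=-n}^{n} e^{ik(x - t)}\Bigr)\,dt$$
$$= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \frac{\sin(n + \frac{1}{2})(x - t)}{\sin \frac{1}{2}(x - t)}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(t) \frac{\sin(n + \frac{1}{2})(t - x)}{\sin \frac{1}{2}(t - x)}\,dt$$
$$= \frac{1}{2\pi} \int_{-\pi - x}^{\pi - x} \tilde{f}(x + t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x + t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt$$
because $\tilde{f}$ and $t \mapsto \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}$ are periodic with period $2\pi$, so that the integral from $-\pi - x$ to $-\pi$ must be the same as the integral from $\pi - x$ to $\pi$.
For the expression in terms of $f(x -_{2\pi} t)$, we have
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x + t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x - t) \frac{\sin(n + \frac{1}{2})(-t)}{\sin \frac{1}{2}(-t)}\,dt$$
(substituting $-t$ for $t$)
$$= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x -_{2\pi} t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt$$
because (for $x, t \in\, ]-\pi, \pi]$) $f(x -_{2\pi} t) = \tilde{f}(x - t)$ whenever either is defined, and $\sin$ is an odd function.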
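(b) In the same way, we have
$$\sum_{n=0}^{m} \sin(n + \tfrac{1}{2})t = \mathrm{Im}\Bigl(\sum_{n=0}^{m} e^{i(n + \frac{1}{2})t}\Bigr) = \mathrm{Im}\Bigl(e^{\frac{1}{2}it} \sum_{n=0}^{m} e^{int}\Bigr) = \mathrm{Im}\Bigl(e^{\frac{1}{2}it} \frac{1 - e^{i(m+1)t}}{1 - e^{it}}\Bigr)$$
$$= \mathrm{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{e^{-\frac{1}{2}it} - e^{\frac{1}{2}it}}\Bigr) = \mathrm{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{-2i \sin \frac{1}{2}t}\Bigr) = \mathrm{Im}\Bigl(\frac{i(1 - e^{i(m+1)t})}{2 \sin \frac{1}{2}t}\Bigr) = \frac{1 - \cos(m+1)t}{2 \sin \frac{1}{2}t}.$$
So
$$\sum_{n=0}^{m} \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t} = \frac{1 - \cos(m+1)t}{2 \sin^2 \frac{1}{2}t} = \frac{1 - \cos(m+1)t}{1 - \cos t} = 2\pi(m+1)\psi_m(t).$$
Accordingly,
$$\sigma_m(x) = \frac{1}{m+1} \sum_{n=0}^{m} s_n(x) = \frac{1}{m+1} \sum_{n=0}^{m} \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x + t) \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\,dt$$
$$= \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{f}(x + t) \Bigl(\frac{1}{m+1} \sum_{n=0}^{m} \frac{\sin(n + \frac{1}{2})t}{\sin \frac{1}{2}t}\Bigr)\,dt = \int_{-\pi}^{\pi} \tilde{f}(x + t)\psi_m(t)\,dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t)\psi_m(t)\,dt$$
as in (a), because $\cos$ and $\psi_m$ are even functions. For the same reason,
$$\int_0^{\pi} \tilde{f}(x - t)\psi_m(t)\,dt = \int_{-\pi}^{0} \tilde{f}(x + t)\psi_m(t)\,dt,$$
so
$$\sigma_m(x) = \int_0^{\pi} (\tilde{f}(x + t) + \tilde{f}(x - t))\psi_m(t)\,dt.$$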
(c) We need only look at where the formula $\frac{\sin(n+\frac12)t}{\sin\frac12t}$ came from to see that
$$\frac{1}{2\pi}\int_I\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\frac{1}{2\pi}\int_I\sum_{k=-n}^{n}e^{ikt}\,dt=\frac{1}{2\pi}\int_I\Bigl(1+2\sum_{k=1}^{n}\cos kt\Bigr)dt=\frac12$$
for both $I=[-\pi,0]$ and $I=[0,\pi]$, because $\int_I\cos kt\,dt=0$ for every $k\ne0$.
(d)(i) $\psi_m(t)\ge0$ for every $t$ because $1-\cos(m+1)t$, $1-\cos t$ are always greater than or equal to 0. For the upper bound, we have, using the constructions in (a) and (b),
$$\Bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\Bigr|=\Bigl|\sum_{k=-n}^{n}e^{ikt}\Bigr|\le2n+1$$
for every $n$, so
$$\psi_m(t)=\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12t}\le\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}(2n+1)=\frac{m+1}{2\pi}.$$
(ii) If $\delta\le|t|\le\pi$,
$$\psi_m(t)\le\frac{1}{\pi(m+1)(1-\cos t)}\le\frac{1}{\pi(m+1)(1-\cos\delta)}\to0$$
as $m\to\infty$.
(iii) also follows from the construction in (b), because
$$\int_I\psi_m=\frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\int_I\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\frac{1}{m+1}\sum_{n=0}^{m}\frac12=\frac12$$
for both $I=[-\pi,0]$ and $I=[0,\pi]$, using (c).
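The three properties of $\psi_m$ in (d) lend themselves to a quick numerical sanity check. The sketch below is an illustration only: the midpoint rule, the grid sizes and the choice $\delta=0.5$ are my own assumptions, and the kernel is evaluated only at points $t\ne0$.

```python
import math


def fejer_kernel(m, t):
    """psi_m(t) = (1 - cos((m+1)t)) / (2*pi*(m+1)*(1 - cos t)), 0 < |t| <= pi."""
    return (1.0 - math.cos((m + 1) * t)) / (
        2.0 * math.pi * (m + 1) * (1.0 - math.cos(t))
    )


def midpoint_integral(func, a, b, steps=20000):
    """Composite midpoint rule; it never evaluates func at the endpoints."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def kernel_checks(m, delta=0.5, samples=2000):
    """Return (integral over [0, pi], sup over ]0, pi], sup over [delta, pi])."""
    half_mass = midpoint_integral(lambda t: fejer_kernel(m, t), 0.0, math.pi)
    sup_all = max(
        fejer_kernel(m, (i + 0.5) * math.pi / samples) for i in range(samples)
    )
    sup_far = max(
        fejer_kernel(m, delta + (math.pi - delta) * i / samples)
        for i in range(samples + 1)
    )
    return half_mass, sup_all, sup_far


if __name__ == "__main__":
    for m in (10, 40):
        print(m, kernel_checks(m))
```

For each $m$ the first value should be close to $\frac12$, the second at most $\frac{m+1}{2\pi}$, and the third at most $\frac{1}{\pi(m+1)(1-\cos\delta)}$.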
Remarks For a discussion of substitution in integrals, if you feel any need to justify the manipulations in part (a) of the proof, see 263I.
The functions
$$t\mapsto\frac{\sin(n+\frac12)t}{\sin\frac12t},\qquad t\mapsto\frac{1-\cos(m+1)t}{(m+1)(1-\cos t)}$$
are called respectively the Dirichlet kernel and the Fejér kernel.
I give the formulae in terms of $f(x-_{2\pi}t)$ in (a) and (b) in order to provide a link with the work of 255O.
282E The next step is a vital lemma, with a suitably distinguished name which (you will be glad to know) reflects its importance rather than its difficulty.
The Riemann-Lebesgue lemma Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Then
$$\lim_{y\to\infty}\int f(x)e^{-iyx}\,dx=\lim_{y\to-\infty}\int f(x)e^{-iyx}\,dx=0.$$
proof (a) Consider first the case in which $f=\chi\,]a,b[$, where $a<b$. Then
$$\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|=\Bigl|\int_a^be^{-iyx}\,dx\Bigr|=\Bigl|\frac{1}{-iy}\bigl(e^{-iyb}-e^{-iya}\bigr)\Bigr|\le\frac{2}{|y|}$$
if $y\ne0$. So in this case the result is obvious.
(b) It follows at once that the result is true if $f$ is a step-function with bounded support, that is, if there are $a_0\le a_1\le\dots\le a_n$ such that $f$ is constant on every interval $]a_{j-1},a_j[$ and zero outside $[a_0,a_n]$.
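(c) Now, for a given integrable $f$ and $\epsilon>0$, there is a step-function $g$ such that $\int|f-g|\le\epsilon$ (242Oa). So
$$\Bigl|\int f(x)e^{-iyx}\,dx-\int g(x)e^{-iyx}\,dx\Bigr|\le\int|f(x)-g(x)|\,dx\le\epsilon$$
for every $y$, and
$$\limsup_{y\to\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|\le\epsilon,\qquad\limsup_{y\to-\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr|\le\epsilon.$$
As $\epsilon$ is arbitrary, we have the result.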
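Step (a) of the proof is easy to illustrate numerically: for the indicator function of an interval the integral has a closed form, and its modulus is bounded by $2/|y|$. The sketch below assumes a hypothetical interval $]-1,2[$ and uses a midpoint rule as an independent check; neither choice comes from the text.

```python
import cmath


def indicator_transform(a, b, y):
    """Closed form of the integral of e^{-iyx} over ]a, b[, for y != 0."""
    return (cmath.exp(-1j * y * b) - cmath.exp(-1j * y * a)) / (-1j * y)


def midpoint_transform(a, b, y, steps=20000):
    """Midpoint-rule approximation of the same integral."""
    h = (b - a) / steps
    return h * sum(
        cmath.exp(-1j * y * (a + (i + 0.5) * h)) for i in range(steps)
    )


if __name__ == "__main__":
    a, b = -1.0, 2.0  # hypothetical interval, chosen only for illustration
    for y in (1.0, 10.0, 100.0, -100.0):
        exact = indicator_transform(a, b, y)
        print(y, abs(exact), 2.0 / abs(y))
```

The printed moduli stay below $2/|y|$ and so shrink as $|y|$ grows, which is what step (a) asserts.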
282F Corollary (a) Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle c_k\rangle_{k\in\mathbb{Z}}$ its sequence of Fourier coefficients. Then $\lim_{k\to\infty}c_k=\lim_{k\to-\infty}c_k=0$.
(b) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Then $\lim_{y\to\infty}\int f(x)\sin yx\,dx=0$.
proof (a) We need only identify
$$c_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$
with $\int g(x)e^{-ikx}\,dx$, where $g(x)=f(x)/2\pi$ for $x\in\operatorname{dom}f$ and $0$ for $|x|>\pi$.
(b) This is just because
$$\int f(x)\sin yx\,dx=\frac{1}{2i}\Bigl(\int f(x)e^{iyx}\,dx-\int f(x)e^{-iyx}\,dx\Bigr).$$
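282G We are now ready for theorems on the convergence of Fejér sums. I start with an easy one, almost a warming-up exercise.
Theorem Let $f:\,]-\pi,\pi]\to\mathbb{C}$ be a continuous function such that $\lim_{t\downarrow-\pi}f(t)=f(\pi)$. Then its sequence $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ of Fejér sums converges uniformly to $f$ on $]-\pi,\pi]$.
proof The conditions on $f$ amount just to saying that its periodic extension $\tilde f$ is defined and continuous everywhere on $\mathbb{R}$. Consequently it is bounded and uniformly continuous on any bounded interval, in particular, on the interval $[-2\pi,2\pi]$. Set $K=\sup_{|t|\le2\pi}|\tilde f(t)|=\sup_{t\in]-\pi,\pi]}|f(t)|$. Write
$$\psi_m(t)=\frac{1-\cos(m+1)t}{2\pi(m+1)(1-\cos t)}$$
for $m\in\mathbb{N}$, $0<|t|\le\pi$, as in 282D.
Given $\epsilon>0$ we can find a $\delta\in\,]0,\pi]$ such that $|\tilde f(x+t)-\tilde f(x)|\le\epsilon$ whenever $x\in[-\pi,\pi]$ and $|t|\le\delta$. Next, we can find an $m_0\in\mathbb{N}$ such that $M_m\le\frac{\epsilon}{4\pi K}$ for every $m\ge m_0$, where $M_m=\sup_{\delta\le|t|\le\pi}\psi_m(t)$ (282D(d-ii)). Now suppose that $m\ge m_0$ and $x\in\,]-\pi,\pi]$. Set $g(t)=\tilde f(x+t)-f(x)$ for $|t|\le\pi$. Then $|g(t)|\le2K$ for all $t\in[-\pi,\pi]$ and $|g(t)|\le\epsilon$ if $|t|\le\delta$, so
$$\begin{aligned}
\Bigl|\int_{-\pi}^{\pi}g\,\psi_m\Bigr|
&\le\int_{-\pi}^{-\delta}|g|\psi_m+\int_{-\delta}^{\delta}|g|\psi_m+\int_{\delta}^{\pi}|g|\psi_m\\
&\le2M_mK(\pi-\delta)+\epsilon\int_{-\delta}^{\delta}\psi_m+2M_mK(\pi-\delta)\\
&\le4\pi M_mK+\epsilon\le2\epsilon.
\end{aligned}$$
Consequently, using 282Db and 282D(d-iii),
$$|\sigma_m(x)-f(x)|=\Bigl|\int_{-\pi}^{\pi}\bigl(\tilde f(x+t)-f(x)\bigr)\psi_m(t)\,dt\Bigr|\le2\epsilon$$
for every $m\ge m_0$; and this is true for every $x\in\,]-\pi,\pi]$. As $\epsilon$ is arbitrary, $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ converges to $f$ uniformly on $]-\pi,\pi]$.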
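To see the uniform convergence in a concrete case, one can take $f(x)=|x|$, which is continuous with $f(-\pi)=f(\pi)$. The sketch below is illustrative only. It relies on two facts not stated in the text above: the hand-computed coefficients $c_0=\frac{\pi}{2}$, $c_k=\frac{(-1)^k-1}{\pi k^2}$ for $k\ne0$, and the standard rearrangement $\sigma_m(x)=\sum_{|k|\le m}\bigl(1-\frac{|k|}{m+1}\bigr)c_ke^{ikx}$ of the average of $s_0,\dots,s_m$. The grid size is an arbitrary choice.

```python
import cmath
import math


def coeff_abs(k):
    """Fourier coefficient c_k of f(x) = |x| on ]-pi, pi] (computed by hand)."""
    if k == 0:
        return math.pi / 2.0
    return ((-1) ** k - 1) / (math.pi * k * k)


def fejer_sum(m, x):
    """sigma_m(x), using the weights 1 - |k|/(m+1) on the terms c_k e^{ikx}."""
    total = 0j
    for k in range(-m, m + 1):
        weight = 1.0 - abs(k) / (m + 1)
        total += weight * coeff_abs(k) * cmath.exp(1j * k * x)
    return total.real


def sup_error(m, grid=400):
    """Maximum of |sigma_m(x) - |x|| over an equally spaced grid in [-pi, pi]."""
    points = [-math.pi + 2.0 * math.pi * i / grid for i in range(grid + 1)]
    return max(abs(fejer_sum(m, x) - abs(x)) for x in points)


if __name__ == "__main__":
    for m in (5, 20, 80, 200):
        print(m, sup_error(m))
```

The printed errors should decrease as $m$ grows, with the largest error at the corners $0$ and $\pm\pi$ of the periodic extension.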
282H I come now to a theorem describing the behaviour of the Fejér sums of general functions $f$. The hypothesis of the theorem may take a little bit of digesting; you can get an idea of its intended scope by glancing at Corollary 282I.
Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ its sequence of Fejér sums. Suppose that $x\in\,]-\pi,\pi]$ and $c\in\mathbb{C}$ are such that
$$\lim_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}|\tilde f(x+t)+\tilde f(x-t)-2c|\,dt=0,$$
writing $\tilde f$ for the periodic extension of $f$, as usual; then $\lim_{m\to\infty}\sigma_m(x)=c$.
proof Set $\phi(t)=|\tilde f(x+t)+\tilde f(x-t)-2c|$ when this is defined, which is almost everywhere, and $\Phi(t)=\int_0^t\phi$, which is defined for every $t\ge0$, because $\tilde f$ is integrable over $]-\pi,\pi]$ and therefore over every bounded interval.
As in 282D, set
$$\psi_m(t)=\frac{1-\cos(m+1)t}{2\pi(m+1)(1-\cos t)}$$
for $m\in\mathbb{N}$, $0<|t|\le\pi$. We have
$$|\sigma_m(x)-c|=\Bigl|\int_0^{\pi}\bigl(\tilde f(x+t)+\tilde f(x-t)-2c\bigr)\psi_m(t)\,dt\Bigr|\le\int_0^{\pi}\phi\,\psi_m$$
by (b) and (d) of 282D.
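Let $\epsilon>0$. By hypothesis, $\lim_{t\downarrow0}\Phi(t)/t=0$; let $\delta\in\,]0,\pi]$ be such that $\Phi(t)\le\epsilon t$ for every $t\in[0,\delta]$. Take any $m\ge\pi/\delta$. I break the integral $\int_0^{\pi}\phi\,\psi_m$ up into three parts.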
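(i) For the integral from $0$ to $1/m$, we have
$$\int_0^{1/m}\phi\,\psi_m\le\int_0^{1/m}\frac{m+1}{2\pi}\,\phi=\frac{m+1}{2\pi}\,\Phi\bigl(\tfrac1m\bigr)\le\frac{\epsilon(m+1)}{2\pi m}\le\epsilon,$$
because $\psi_m(t)\le\frac{m+1}{2\pi}$ for every $t$ (282D(d-i)).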
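(ii) For the integral from $1/m$ to $\delta$, we have
$$\int_{1/m}^{\delta}\phi\,\psi_m\le\frac{1}{\pi(m+1)}\int_{1/m}^{\delta}\phi(t)\,\frac{1}{1-\cos t}\,dt$$
(because $1-\cos(m+1)t\le2$)
$$\le\frac{\pi}{2(m+1)}\int_{1/m}^{\delta}\frac{\phi(t)}{t^2}\,dt$$
(because $1-\cos t\ge\frac{2t^2}{\pi^2}$ for $|t|\le\pi$)
$$=\frac{\pi}{2(m+1)}\Bigl(\frac{\Phi(\delta)}{\delta^2}-\frac{\Phi(\frac1m)}{(\frac1m)^2}+\int_{1/m}^{\delta}\frac{2\Phi(t)}{t^3}\,dt\Bigr)$$
(integrating by parts; see 225F)
$$\le\frac{\pi}{2(m+1)}\Bigl(\frac{\epsilon}{\delta}+\int_{1/m}^{\delta}\frac{2\epsilon}{t^2}\,dt\Bigr)$$
(because $\Phi(t)\le\epsilon t$ for $0\le t\le\delta$)
$$\le\frac{\pi}{2(m+1)}\Bigl(\frac{\epsilon}{\delta}+2\epsilon m\Bigr)\le\frac{\pi\epsilon}{2\delta(m+1)}+\pi\epsilon\le\frac{\epsilon}{2}+\pi\epsilon\le4\epsilon.$$
(iii) For the integral from $\delta$ to $\pi$, we have
$$\int_{\delta}^{\pi}\phi\,\psi_m\le\frac{1}{\pi(m+1)(1-\cos\delta)}\int_{\delta}^{\pi}\phi\to0\text{ as }m\to\infty$$
because $\phi$ is integrable over $[\delta,\pi]$. There must therefore be an $m_0\in\mathbb{N}$ such that
$$\int_{\delta}^{\pi}\phi\,\psi_m\le\epsilon$$
for every $m\ge m_0$.
Putting these together, we see that
$$\int_0^{\pi}\phi\,\psi_m\le\epsilon+4\epsilon+\epsilon=6\epsilon$$
for every $m\ge\max(m_0,\frac{\pi}{\delta})$. As $\epsilon$ is arbitrary, $\lim_{m\to\infty}\sigma_m(x)=c$, as claimed.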
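282I Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ its sequence of Fejér sums.
(a) $f(x)=\lim_{m\to\infty}\sigma_m(x)$ for almost every $x\in\,]-\pi,\pi]$.
(b) $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f-\sigma_m|=0$.
(c) If $g$ is another integrable function with the same Fourier coefficients, then $f=_{\text{a.e.}}g$.
(d) If $x\in\,]-\pi,\pi[$ is such that $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{m\to\infty}\sigma_m(x)=\tfrac12(a+b).$$
(e) If $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)$ and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{m\to\infty}\sigma_m(\pi)=\tfrac12(a+b).$$
(f) If $f$ is defined and continuous at $x\in\,]-\pi,\pi[$, then
$$\lim_{m\to\infty}\sigma_m(x)=f(x).$$
(g) If $\tilde f$, the periodic extension of $f$, is defined and continuous at $\pi$, then
$$\lim_{m\to\infty}\sigma_m(\pi)=f(\pi).$$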
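proof (a) We have only to recall that by 223D
$$\begin{aligned}
\limsup_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}|f(x+t)+f(x-t)-2f(x)|\,dt
&\le\limsup_{\delta\downarrow0}\frac{1}{\delta}\Bigl(\int_0^{\delta}|f(x+t)-f(x)|\,dt+\int_0^{\delta}|f(x-t)-f(x)|\,dt\Bigr)\\
&=\limsup_{\delta\downarrow0}\frac{1}{\delta}\int_{-\delta}^{\delta}|f(x+t)-f(x)|\,dt=0
\end{aligned}$$
for almost every $x\in\,]-\pi,\pi[$.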
(b) Next observe that, in the language of 255O,
$$\sigma_m=f*\psi_m,$$
by the last formula in 282Db. Consequently, by 255Od,
$$\|\sigma_m\|_1\le\|f\|_1\|\psi_m\|_1,$$
writing $\|\sigma_m\|_1=\int_{-\pi}^{\pi}|\sigma_m|$. But this means that we have
$$f(x)=\lim_{m\to\infty}\sigma_m(x)\text{ for almost every }x,\qquad\limsup_{m\to\infty}\|\sigma_m\|_1\le\|f\|_1;$$
and it follows from 245H that $\lim_{m\to\infty}\|f-\sigma_m\|_1=0$.
(c) If $g$ has the same Fourier coefficients as $f$, then it has the same Fourier and Fejér sums, so we have
$$g(x)=\lim_{m\to\infty}\sigma_m(x)=f(x)$$
almost everywhere.
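(d)-(e) Both of these amount to considering $x\in\,]-\pi,\pi]$ such that
$$\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)=a,\qquad\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)=b.$$
Setting $c=\frac12(a+b)$, $\phi(t)=|\tilde f(x+t)+\tilde f(x-t)-2c|$ whenever this is defined, we have $\lim_{t\in\operatorname{dom}\phi,\,t\downarrow0}\phi(t)=0$, so surely $\lim_{\delta\downarrow0}\frac{1}{\delta}\int_0^{\delta}\phi=0$, and the theorem applies.
(f)-(g) are special cases of (d) and (e).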
282J I now turn to conditions for the convergence of Fourier sums. Probably the easiest result (one which is both striking and satisfying) is the following.
Theorem Let $f$ be a complex-valued function which is square-integrable over $]-\pi,\pi]$. Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be its Fourier coefficients and $\langle s_n\rangle_{n\in\mathbb{N}}$ its Fourier sums (282A). Then
(i) $\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2$,
(ii) $\lim_{n\to\infty}\int_{-\pi}^{\pi}|f-s_n|^2=0$.
proof (a) I recall some notation from 244N/244P. Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $]-\pi,\pi]$. For $g$, $h\in\mathcal{L}^2_{\mathbb{C}}$, write
$$(g|h)=\int_{-\pi}^{\pi}g\times\bar h,\qquad\|g\|_2=\sqrt{(g|g)}.$$
Recall that $\|g+h\|_2\le\|g\|_2+\|h\|_2$ for all $g$, $h\in\mathcal{L}^2_{\mathbb{C}}$ (244Fb/244Pb). For $k\in\mathbb{Z}$, $x\in\,]-\pi,\pi]$ set $e_k(x)=e^{ikx}$, so that
$$(f|e_k)=\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=2\pi c_k.$$
Moreover, if $|k|\le n$,
$$(s_n|e_k)=\sum_{j=-n}^{n}c_j\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=2\pi c_k,$$
because
$$\int_{-\pi}^{\pi}e^{ijx}e^{-ikx}\,dx=\begin{cases}2\pi&\text{if }j=k,\\0&\text{if }j\ne k.\end{cases}$$
So
$$(f-s_n|e_k)=0\text{ whenever }|k|\le n;$$
in particular,
$$(f-s_n|s_n)=\sum_{k=-n}^{n}\bar c_k(f-s_n|e_k)=0$$
for every $n\in\mathbb{N}$.
(b) Fix $\epsilon>0$. The next element of the proof is the fact that there are $m\in\mathbb{N}$, $a_{-m},\dots,a_m\in\mathbb{C}$ such that $\|f-h\|_2\le\epsilon$, where $h=\sum_{k=-m}^{m}a_ke_k$. $\mathbf{P}$ By 244Hb/244Pb we know that there is a continuous function $g:[-\pi,\pi]\to\mathbb{C}$ such that $\|f-g\|_2\le\frac{\epsilon}{3}$. Next, modifying $g$ on a suitably short interval $]\pi-\delta,\pi]$, we can find a continuous function $g_1:[-\pi,\pi]\to\mathbb{C}$ such that $\|g-g_1\|_2\le\frac{\epsilon}{3}$ and $g_1(\pi)=g_1(-\pi)$. (Set $M=\sup_{x\in[-\pi,\pi]}|g(x)|$, take $\delta\in\,]0,2\pi]$ such that $(2M)^2\delta\le(\epsilon/3)^2$, and set $g_1(\pi-t\delta)=tg(\pi-\delta)+(1-t)g(-\pi)$ for $t\in[0,1]$.) Either by the Stone-Weierstrass theorem (281J), or by 282G above, there are $a_{-m},\dots,a_m$ such that $|g_1(x)-\sum_{k=-m}^{m}a_ke^{ikx}|\le\frac{\epsilon}{3\sqrt{2\pi}}$ for every $x\in[-\pi,\pi]$; setting $h=\sum_{k=-m}^{m}a_ke_k$, we have $\|g_1-h\|_2\le\frac13\epsilon$, so that
$$\|f-h\|_2\le\|f-g\|_2+\|g-g_1\|_2+\|g_1-h\|_2\le\epsilon.\ \mathbf{Q}$$
(c) Now take any $n\ge m$. Then $s_n-h$ is a linear combination of $e_{-n},\dots,e_n$, so $(f-s_n|s_n-h)=0$. Consequently
$$\begin{aligned}
\epsilon^2&\ge(f-h|f-h)\\
&=(f-s_n|f-s_n)+(f-s_n|s_n-h)+(s_n-h|f-s_n)+(s_n-h|s_n-h)\\
&=\|f-s_n\|_2^2+\|s_n-h\|_2^2\ge\|f-s_n\|_2^2.
\end{aligned}$$
Thus $\|f-s_n\|_2\le\epsilon$ for every $n\ge m$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|f-s_n\|_2^2=0$, which proves (ii).
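(d) As for (i), we have
$$\sum_{k=-n}^{n}|c_k|^2=\frac{1}{2\pi}\sum_{k=-n}^{n}\bar c_k(s_n|e_k)=\frac{1}{2\pi}(s_n|s_n)=\frac{1}{2\pi}\|s_n\|_2^2.$$
But of course
$$\bigl|\|s_n\|_2-\|f\|_2\bigr|\le\|s_n-f\|_2\to0$$
as $n\to\infty$, so
$$\sum_{k=-\infty}^{\infty}|c_k|^2=\frac{1}{2\pi}\lim_{n\to\infty}\|s_n\|_2^2=\frac{1}{2\pi}\|f\|_2^2=\frac{1}{2\pi}\int_{-\pi}^{\pi}|f|^2,$$
as required.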
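Part (i) can be illustrated with $f(x)=x$, for which $|c_k|^2=1/k^2$ for $k\ne0$ and both sides equal $\frac{\pi^2}{3}$. The following sketch approximates the coefficients by a midpoint rule instead of using the closed form; the step count and the truncation level $n=50$ are arbitrary assumptions made for the illustration.

```python
import cmath
import math


def fourier_coefficient(func, k, steps=2000):
    """Midpoint-rule approximation of c_k = (1/2pi) * int func(x) e^{-ikx} dx."""
    h = 2.0 * math.pi / steps
    total = 0j
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += func(x) * cmath.exp(-1j * k * x)
    return total * h / (2.0 * math.pi)


def parseval_sides(func, n, steps=2000):
    """Return (sum of |c_k|^2 for |k| <= n, (1/2pi) * integral of |func|^2)."""
    partial = sum(
        abs(fourier_coefficient(func, k, steps)) ** 2 for k in range(-n, n + 1)
    )
    h = 2.0 * math.pi / steps
    energy = (
        h
        * sum(abs(func(-math.pi + (i + 0.5) * h)) ** 2 for i in range(steps))
        / (2.0 * math.pi)
    )
    return partial, energy


if __name__ == "__main__":
    partial, energy = parseval_sides(lambda x: x, 50)
    print(partial, energy, math.pi ** 2 / 3.0)
```

The partial sum stays below the energy term and approaches it as $n$ increases; the gap for $f(x)=x$ is roughly $2/n$.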
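282K Corollary Let $L^2_{\mathbb{C}}$ be the Hilbert space of equivalence classes of square-integrable complex-valued functions on $]-\pi,\pi]$, with the inner product
$$(f^{\bullet}|g^{\bullet})=\int_{-\pi}^{\pi}f\times\bar g$$
and norm
$$\|f^{\bullet}\|_2=\Bigl(\int_{-\pi}^{\pi}|f|^2\Bigr)^{1/2},$$
writing $f^{\bullet}\in L^2_{\mathbb{C}}$ for the equivalence class of a square-integrable function $f$. Let $\ell^2_{\mathbb{C}}(\mathbb{Z})$ be the Hilbert space of square-summable double-ended complex sequences, with the inner product
$$(\boldsymbol{c}|\boldsymbol{d})=\sum_{k=-\infty}^{\infty}c_k\bar d_k$$
and norm
$$\|\boldsymbol{c}\|_2=\Bigl(\sum_{k=-\infty}^{\infty}|c_k|^2\Bigr)^{1/2}$$
for $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\boldsymbol{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ in $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Then we have an inner-product-space isomorphism $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$ defined by saying that
$$S(f^{\bullet})(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$
for every square-integrable function $f$ and every $k\in\mathbb{Z}$.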
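proof (a) As in 282J, write $\mathcal{L}^2_{\mathbb{C}}$ for the space of square-integrable functions. If $f$, $g\in\mathcal{L}^2_{\mathbb{C}}$ and $f^{\bullet}=g^{\bullet}$, then $f=_{\text{a.e.}}g$, so
$$\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx$$
for every $k\in\mathbb{Z}$. Thus $S$ is well-defined.
(b) $S$ is linear. $\mathbf{P}$ This is elementary. If $f$, $g\in\mathcal{L}^2_{\mathbb{C}}$ and $c\in\mathbb{C}$,
$$\begin{aligned}
S(f^{\bullet}+g^{\bullet})(k)&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}(f(x)+g(x))e^{-ikx}\,dx\\
&=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx+\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx\\
&=S(f^{\bullet})(k)+S(g^{\bullet})(k)
\end{aligned}$$
for every $k\in\mathbb{Z}$, so that $S(f^{\bullet}+g^{\bullet})=S(f^{\bullet})+S(g^{\bullet})$. Similarly,
$$S(cf^{\bullet})(k)=\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}cf(x)e^{-ikx}\,dx=\frac{c}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx=cS(f^{\bullet})(k)$$
for every $k\in\mathbb{Z}$, so that $S(cf^{\bullet})=cS(f^{\bullet})$. $\mathbf{Q}$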
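(c) If $f\in\mathcal{L}^2_{\mathbb{C}}$ has Fourier coefficients $c_k$, then $S(f^{\bullet})=\langle c_k\sqrt{2\pi}\rangle_{k\in\mathbb{Z}}$, so by 282J(i)
$$\|S(f^{\bullet})\|_2^2=2\pi\sum_{k=-\infty}^{\infty}|c_k|^2=\int_{-\pi}^{\pi}|f|^2=\|f^{\bullet}\|_2^2.$$
Thus $Su\in\ell^2_{\mathbb{C}}(\mathbb{Z})$ and $\|Su\|_2=\|u\|_2$ for every $u\in L^2_{\mathbb{C}}$. Because $S$ is linear and norm-preserving, it is surely injective.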
(d) It now follows that $(Sv|Su)=(v|u)$ for every $u$, $v\in L^2_{\mathbb{C}}$. $\mathbf{P}$ (This is of course a standard fact about Hilbert spaces.) We know that for any $t\in\mathbb{R}$
$$\begin{aligned}
\|u\|_2^2+2\,\mathcal{R}e\bigl(e^{it}(v|u)\bigr)+\|v\|_2^2
&=(u|u)+e^{it}(v|u)+e^{-it}(u|v)+(v|v)\\
&=(u+e^{it}v|u+e^{it}v)\\
&=\|u+e^{it}v\|_2^2=\|S(u+e^{it}v)\|_2^2\\
&=\|Su\|_2^2+2\,\mathcal{R}e\bigl(e^{it}(Sv|Su)\bigr)+\|Sv\|_2^2\\
&=\|u\|_2^2+2\,\mathcal{R}e\bigl(e^{it}(Sv|Su)\bigr)+\|v\|_2^2,
\end{aligned}$$
so that $\mathcal{R}e(e^{it}(Sv|Su))=\mathcal{R}e(e^{it}(v|u))$. As $t$ is arbitrary, $(Sv|Su)=(v|u)$. $\mathbf{Q}$
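(e) Finally, $S$ is surjective. $\mathbf{P}$ Let $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$ be any member of $\ell^2_{\mathbb{C}}(\mathbb{Z})$. Set $c^{(n)}_k=c_k$ if $|k|\le n$, $0$ otherwise, and $\boldsymbol{c}^{(n)}=\langle c^{(n)}_k\rangle_{k\in\mathbb{Z}}$. Consider
$$s_n=\sum_{k=-n}^{n}c_ke_k,\qquad u_n=s_n^{\bullet}$$
where I write $e_k(x)=\frac{1}{\sqrt{2\pi}}e^{ikx}$ for $x\in\,]-\pi,\pi]$. Then $Su_n=\boldsymbol{c}^{(n)}$, by the same calculations as in part (a) of the proof of 282J. Now
$$\|\boldsymbol{c}^{(n)}-\boldsymbol{c}\|_2=\sqrt{\sum_{|k|>n}|c_k|^2}\to0$$
as $n\to\infty$, so
$$\|u_m-u_n\|_2=\|\boldsymbol{c}^{(m)}-\boldsymbol{c}^{(n)}\|_2\to0$$
as $m$, $n\to\infty$, and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G/244Pb), $\langle u_n\rangle_{n\in\mathbb{N}}$ has a limit $u\in L^2_{\mathbb{C}}$, and now
$$Su=\lim_{n\to\infty}Su_n=\lim_{n\to\infty}\boldsymbol{c}^{(n)}=\boldsymbol{c}.\ \mathbf{Q}$$
Thus $S:L^2_{\mathbb{C}}\to\ell^2_{\mathbb{C}}(\mathbb{Z})$ is an inner-product-space isomorphism.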
Remark In the language of Hilbert spaces, all that is happening here is that $\langle e_k^{\bullet}\rangle_{k\in\mathbb{Z}}$ is a 'Hilbert space basis' or 'complete orthonormal sequence' in $L^2_{\mathbb{C}}$, which is matched by $S$ with the standard basis of $\ell^2_{\mathbb{C}}(\mathbb{Z})$. The only step which calls on non-trivial real analysis, as opposed to the general theory of Hilbert spaces, is the check that the linear subspace generated by $\{e_k^{\bullet}:k\in\mathbb{Z}\}$ is dense; this is part (b) of the proof of 282J.
Observe that while $S:L^2\to\ell^2$ is readily described, its inverse is more of a problem. If $\boldsymbol{c}\in\ell^2$, we should like to say that $S^{-1}\boldsymbol{c}$ is the equivalence class of $f$, where $f(x)=\frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ for every $x$. This works very well if $\{k:c_k\ne0\}$ is finite, but for the general case it is less clear how to interpret the sum. It is in fact the case that if $\boldsymbol{c}\in\ell^2$ then
$$g(x)=\frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every $x\in\,]-\pi,\pi]$, and that $S^{-1}\boldsymbol{c}=g^{\bullet}$ in $L^2$; this is, in effect, Carleson's theorem (286V). A proof of Carleson's theorem is out of our reach for the moment. What is covered by the results of this section is that
$$h(x)=\frac{1}{\sqrt{2\pi}}\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every $x\in\,]-\pi,\pi]$, and that $h^{\bullet}=S^{-1}\boldsymbol{c}$. (The point is that we know from the result just proved that there is some square-integrable $f$ such that $\boldsymbol{c}$ is the sequence of Fourier coefficients of $f$; now 282Ia declares that the Fejér sums of $f$ converge to $f$ almost everywhere, that is, that $h=_{\text{a.e.}}\frac{1}{\sqrt{2\pi}}f$.)
282L The next result is the easiest, and one of the most useful, theorems concerning pointwise convergence of Fourier sums.
Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
(i) If $f$ is differentiable at $x\in\,]-\pi,\pi[$, then $f(x)=\lim_{n\to\infty}s_n(x)$.
(ii) If the periodic extension $\tilde f$ of $f$ is differentiable at $\pi$, then $f(\pi)=\lim_{n\to\infty}s_n(\pi)$.
proof (a) Take $x\in\,]-\pi,\pi]$ such that $\tilde f$ is differentiable at $x$; of course this covers both parts. We have
$$s_n(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)}{\sin\frac12t}\,\sin(n+\tfrac12)t\,dt$$
for each $n$, by 282Da.
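(b) Next,
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exists in $\mathbb{C}$, because there is surely some $\delta\in\,]0,\pi]$ such that $(\tilde f(x+t)-\tilde f(x))/t$ is bounded on $\{t:0<|t|\le\delta\}$, while
$$\int_{-\pi}^{-\delta}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt,\qquad\int_{\delta}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exist because $1/t$ is bounded on those intervals. It follows that
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\,dt$$
exists, because $|t|\le\pi|\sin\frac12t|$ if $|t|\le\pi$. So by the Riemann-Lebesgue lemma (282Fb),
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\,\sin(n+\tfrac12)t\,dt=0.$$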
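(c) Because
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x)\frac{\sin(n+\frac12)t}{\sin\frac12t}\,dt=\tilde f(x)$$
for every $n$ (282Dc),
$$s_n(x)=\tilde f(x)+\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12t}\,\sin(n+\tfrac12)t\,dt\to\tilde f(x)$$
as $n\to\infty$, as required.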
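282M Lemma Suppose that $f$ is a complex-valued function, defined almost everywhere and of bounded variation on $]-\pi,\pi]$. Then $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the $k$th Fourier coefficient of $f$, as in 282A.
proof Set
$$M=\lim_{x\in\operatorname{dom}f,\,x\uparrow\pi}|f(x)|+\operatorname{Var}_{]-\pi,\pi[}(f).$$
By 224J,
$$|kc_k|=\frac{1}{2\pi}\Bigl|\int_{-\pi}^{\pi}kf(t)e^{-ikt}\,dt\Bigr|\le\frac{1}{2\pi}M\sup_{c\in[-\pi,\pi]}\Bigl|\int_{-\pi}^{c}ke^{-ikt}\,dt\Bigr|=\frac{M}{2\pi}\sup_{c\in[-\pi,\pi]}|e^{-ikc}-e^{ik\pi}|\le\frac{M}{\pi}$$
for every $k$.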
282N I give another lemma, extracting the technical part of the proof of the next theorem. (Its most natural application is in 282Xn.)
Lemma Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a complex sequence, and set $t_n=\sum_{k=0}^{n}d_k$, $\tau_m=\frac{1}{m+1}\sum_{n=0}^{m}t_n$ for $n$, $m\in\mathbb{N}$. Suppose that $\sup_{k\in\mathbb{N}}|kd_k|=M<\infty$. Then for any $j\ge1$ and any $c\in\mathbb{C}$,
$$|t_n-c|\le\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|$$
for every $n\ge j^2$.
proof (a) The first point to note is that for any $n$, $n'\in\mathbb{N}$,
$$|t_n-t_{n'}|\le\frac{M|n-n'|}{1+\min(n,n')}.$$
$\mathbf{P}$ If $n=n'$ this is trivial. Suppose that $n'<n$. Then
$$|t_n-t_{n'}|=\Bigl|\sum_{k=n'+1}^{n}d_k\Bigr|\le\sum_{k=n'+1}^{n}\frac{M}{k}\le\frac{M(n-n')}{n'+1}=\frac{M|n-n'|}{1+\min(n',n)}.$$
Of course the case $n<n'$ is identical. $\mathbf{Q}$
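(b) Now take any $n\ge j^2$. Set $\eta=\sup_{m\ge n-n/j}|\tau_m-c|$. Let $m\ge j$ be such that $jm\le n<j(m+1)$; then $n<jm+m$; also
$$n\bigl(1-\tfrac1j\bigr)\le m(j+1)\bigl(1-\tfrac1j\bigr)\le mj.$$
Set
$$\tau^*=\frac{1}{m}\sum_{n'=jm+1}^{jm+m}t_{n'}=\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}.$$
Then
$$\begin{aligned}
|\tau^*-c|&=\Bigl|\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}-c\Bigr|\\
&=\Bigl|\frac{jm+m+1}{m}(\tau_{jm+m}-c)-\frac{jm+1}{m}(\tau_{jm}-c)\Bigr|\\
&\le\frac{jm+m+1}{m}\eta+\frac{jm+1}{m}\eta\le(2j+3)\eta.
\end{aligned}$$
On the other hand,
$$|\tau^*-t_n|=\Bigl|\frac{1}{m}\sum_{n'=jm+1}^{jm+m}(t_{n'}-t_n)\Bigr|\le\frac{1}{m}\sum_{n'=jm+1}^{jm+m}\frac{M|n-n'|}{1+\min(n,n')}\le\frac{1}{m}\sum_{n'=jm+1}^{jm+m}\frac{Mm}{1+jm}=\frac{Mm}{1+jm}\le\frac{M}{j}.$$
Putting these together, we have
$$|t_n-c|\le|t_n-\tau^*|+|\tau^*-c|\le\frac{M}{j}+(2j+3)\eta=\frac{M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|,$$
as required.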
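The inequality of the lemma can be tested on a concrete sequence. The sketch below takes $d_k=\frac{(-1)^k}{k+1}$, so that $|kd_k|\le1$ and one may use $M=1$, with $c=\ln2$; these choices are mine. Since the proof only uses $\tau_{jm}$ and $\tau_{jm+m}$, both of whose indices lie between $n-n/j$ and $n+n/j$, the code restricts the supremum to that finite range, which can only make the right-hand side smaller.

```python
import math


def partial_and_cesaro(d, length):
    """Return lists t[n] = d(0)+...+d(n) and tau[m] = (t[0]+...+t[m])/(m+1)."""
    t, tau = [], []
    running_sum, running_t = 0.0, 0.0
    for n in range(length):
        running_sum += d(n)
        t.append(running_sum)
        running_t += running_sum
        tau.append(running_t / (n + 1))
    return t, tau


def lemma_gap(d, M, c, n, j):
    """Right-hand side minus left-hand side of the inequality, with the
    supremum restricted to n - n/j <= m <= n + n/j."""
    hi = n + n // j
    t, tau = partial_and_cesaro(d, hi + 1)
    lo = math.ceil(n - n / j)
    eta = max(abs(tau[m] - c) for m in range(lo, hi + 1))
    return M / j + (2 * j + 3) * eta - abs(t[n] - c)


def alternating(k):
    """d_k = (-1)^k / (k + 1); the partial sums converge to ln 2."""
    return (-1) ** k / (k + 1)


if __name__ == "__main__":
    for n, j in [(4, 2), (9, 3), (100, 5), (400, 20), (1000, 10)]:
        print(n, j, lemma_gap(alternating, 1.0, math.log(2.0), n, j))
```

Every printed gap should be non-negative, whatever value of $c$ is used.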
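282O Theorem Let $f$ be a complex-valued function of bounded variation, defined almost everywhere in $]-\pi,\pi]$, and let $\langle s_n\rangle_{n\in\mathbb{N}}$ be its sequence of Fourier sums.
(i) If $x\in\,]-\pi,\pi[$, then
$$\lim_{n\to\infty}s_n(x)=\tfrac12\Bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\Bigr).$$
(ii) $\lim_{n\to\infty}s_n(\pi)=\tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\bigr)$.
(iii) If $f$ is defined throughout $]-\pi,\pi]$, is continuous, and $\lim_{t\downarrow-\pi}f(t)=f(\pi)$, then $s_n(x)\to f(x)$ uniformly on $]-\pi,\pi]$.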
proof (a) Note first that 224F shows that the limits $\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$, $\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ required in the formulae above always exist. We know also from 282M that $M=\sup_{k\in\mathbb{Z}}|kc_k|<\infty$, where $c_k$ is the $k$th Fourier coefficient of $f$.
Take any $x\in\,]-\pi,\pi]$, and set
$$c=\tfrac12\Bigl(\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)+\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)\Bigr),$$
writing $\tilde f$ for the periodic extension of $f$, as usual. We know from 282Id-282Ie that $c=\lim_{m\to\infty}\sigma_m(x)$, writing $\sigma_m$ for the Fejér sums of $f$. Let $\epsilon>0$. Take any $j\ge\max(2,2M/\epsilon)$, and $m_0\ge1$ such that $|\sigma_m(x)-c|\le\epsilon/(2j+3)$ for every $m\ge m_0$.
Now if $n\ge\max(j^2,2m_0)$, apply Lemma 282N with
$$d_0=c_0,\qquad d_k=c_ke^{ikx}+c_{-k}e^{-ikx}\text{ for }k\ge1,$$
so that $t_n=s_n(x)$, $\tau_m=\sigma_m(x)$ and $|kd_k|\le2M$ for every $k$, $n$, $m\in\mathbb{N}$. We have $n-n/j\ge\frac12n\ge m_0$, so
$$\eta=\sup_{m\ge n-n/j}|\tau_m-c|\le\sup_{m\ge m_0}|\tau_m-c|\le\frac{\epsilon}{2j+3}.$$
So 282N tells us that
$$|s_n(x)-c|=|t_n-c|\le\frac{2M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|\le\epsilon+(2j+3)\eta\le2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x)=c$, as required.
(b) This proves (i) and (ii) of this theorem. Finally, for (iii), observe that under these conditions $\sigma_m(x)\to f(x)$ uniformly as $m\to\infty$, by 282G. So given $\epsilon>0$ we choose $j\ge\max(2,2M/\epsilon)$ and $m_0\in\mathbb{N}$ such that $|\sigma_m(x)-f(x)|\le\epsilon/(2j+3)$ whenever $m\ge m_0$ and $x\in\,]-\pi,\pi]$. By the same calculation as before,
$$|s_n(x)-f(x)|\le2\epsilon$$
for every $n\ge\max(j^2,2m_0)$ and every $x\in\,]-\pi,\pi]$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x)=f(x)$ uniformly for $x\in\,]-\pi,\pi]$.
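Parts (i) and (ii) can be watched at work on $f(x)=x$, which is of bounded variation, has $c_k=\frac{i(-1)^k}{k}$ for $k\ne0$, and jumps from $\pi$ to $-\pi$ in its periodic extension. The sketch below assumes that hand-computed coefficient formula, which gives $s_n(x)=\sum_{k=1}^{n}\frac{2(-1)^{k+1}}{k}\sin kx$; the sample point $x=1$ is arbitrary.

```python
import math


def fourier_sum_identity(n, x):
    """s_n(x) for f(x) = x on ]-pi, pi]: sum of 2(-1)^(k+1) sin(kx)/k, k=1..n."""
    return sum(
        2.0 * (-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000, 4000):
        # Interior point: the limit should be f(1) = 1.
        # Endpoint pi: the limit should be (pi + (-pi)) / 2 = 0.
        print(n, fourier_sum_identity(n, 1.0), fourier_sum_identity(n, math.pi))
```

At the interior point the sums approach $f(1)=1$; at $\pi$ they sit at the average $0$ of the one-sided limits $\pi$ and $-\pi$.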
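282P Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums.
(i) Suppose that $x\in\,]-\pi,\pi[$ is such that $f$ is of bounded variation on some neighbourhood of $x$. Then
$$\lim_{n\to\infty}s_n(x)=\tfrac12\Bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\Bigr).$$
(ii) If there is a $\delta>0$ such that $f$ is of bounded variation on both $]-\pi,-\pi+\delta]$ and $[\pi-\delta,\pi]$, then
$$\lim_{n\to\infty}s_n(\pi)=\tfrac12\Bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\Bigr).$$
proof In case (i), take $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x+\delta]$ and set $f_1(t)=f(t)$ if $t\in\operatorname{dom}f\cap[x-\delta,x+\delta]$, $0$ for other $t\in\,]-\pi,\pi]$; in case (ii), set $f_1(t)=f(t)$ if $t\in\operatorname{dom}f$ and $|t|\ge\pi-\delta$, $0$ for other $t\in\,]-\pi,\pi]$, and say that $x=\pi$. In either case, $f_1$ is of bounded variation, so by 282O the Fourier sums $\langle s'_n\rangle_{n\in\mathbb{N}}$ of $f_1$ converge at $x$ to the value given by the formulae above. But now observe that, writing $\tilde f$ and $\tilde f_1$ for the periodic extensions of $f$ and $f_1$, $\tilde f-\tilde f_1=0$ on a neighbourhood of $x$, so
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\,dt$$
exists in $\mathbb{C}$, and by 282Fb
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12t}\,\sin(n+\tfrac12)t\,dt=0,$$
that is, $\lim_{n\to\infty}s_n(x)-s'_n(x)=0$. So $\langle s_n\rangle_{n\in\mathbb{N}}$ also converges to the right limit.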
282Q I cannot leave this section without mentioning one of the most important facts about Fourier series, even though I have no space here to discuss its consequences.
Theorem Let $f$ and $g$ be complex-valued functions which are integrable over $]-\pi,\pi]$, and $\langle c_k\rangle_{k\in\mathbb{Z}}$, $\langle d_k\rangle_{k\in\mathbb{Z}}$ their Fourier coefficients. Let $f*g$ be their convolution, defined by the formula
$$(f*g)(x)=\int_{-\pi}^{\pi}f(x-_{2\pi}t)g(t)\,dt=\int_{-\pi}^{\pi}\tilde f(x-t)g(t)\,dt,$$
as in 255O, writing $\tilde f$ for the periodic extension of $f$. Then the Fourier coefficients of $f*g$ are $\langle2\pi c_kd_k\rangle_{k\in\mathbb{Z}}$.
proof By 255O(c-i),
$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi}(f*g)(x)e^{-ikx}\,dx&=\frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}e^{-ik(t+u)}f(t)g(u)\,dt\,du\\
&=\frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ikt}f(t)\,dt\int_{-\pi}^{\pi}e^{-iku}g(u)\,du=2\pi c_kd_k.
\end{aligned}$$
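For trigonometric polynomials the factor $2\pi$ in this theorem can be confirmed numerically, because an equally spaced rectangle rule integrates low-degree trigonometric polynomials exactly over a period. The sketch below is an illustration under that assumption; the two test functions and the grid size `GRID` are hypothetical choices of mine.

```python
import cmath
import math

GRID = 64  # number of equally spaced sample points in ]-pi, pi]


def grid_points():
    """The points -pi + h, -pi + 2h, ..., pi with h = 2*pi/GRID."""
    h = 2.0 * math.pi / GRID
    return [-math.pi + (i + 1) * h for i in range(GRID)]


def coefficient(func, k):
    """Rectangle-rule value of (1/2pi) * integral of func(x) e^{-ikx} dx."""
    h = 2.0 * math.pi / GRID
    total = sum(func(x) * cmath.exp(-1j * k * x) for x in grid_points())
    return total * h / (2.0 * math.pi)


def convolution(f, g):
    """Return x -> integral of f(x - t) g(t) dt by the same rule.
    f must already be 2*pi-periodic (it plays the role of its extension)."""
    h = 2.0 * math.pi / GRID
    points = grid_points()

    def conv(x):
        return h * sum(f(x - t) * g(t) for t in points)

    return conv


def f(t):
    return math.cos(t) + 0.5 * math.sin(2.0 * t)


def g(t):
    return 1.0 + math.cos(t) - math.sin(2.0 * t)


if __name__ == "__main__":
    fg = convolution(f, g)
    for k in range(-3, 4):
        lhs = coefficient(fg, k)
        rhs = 2.0 * math.pi * coefficient(f, k) * coefficient(g, k)
        print(k, lhs, rhs)
```

For these two functions the $k=1$ coefficient of $f*g$ is $2\pi\cdot\frac12\cdot\frac12=\frac{\pi}{2}$.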
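*282R In my hurry to get to the theorems on convergence of Fejér and Fourier sums, I have rather neglected the elementary manipulations which are essential when applying the theory. One basic result is the following.
Proposition (a) Let $f:[-\pi,\pi]\to\mathbb{C}$ be an absolutely continuous function such that $f(-\pi)=f(\pi)$, and $\langle c_k\rangle_{k\in\mathbb{Z}}$ its sequence of Fourier coefficients. Then the Fourier coefficients of $f'$ are $\langle ikc_k\rangle_{k\in\mathbb{Z}}$.
(b) Let $f:\mathbb{R}\to\mathbb{C}$ be a differentiable function such that $f'$ is absolutely continuous on $[-\pi,\pi]$, and $f(\pi)=f(-\pi)$. If $\langle c_k\rangle_{k\in\mathbb{Z}}$ are the Fourier coefficients of $f\restriction\,]-\pi,\pi]$, then $\sum_{k=-\infty}^{\infty}|c_k|$ is finite.
proof (a) By 225Cb, $f'$ is integrable over $[-\pi,\pi]$; by 225E, $f$ is an indefinite integral of $f'$. So 225F tells us that
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f'(x)e^{-ikx}\,dx=\frac{1}{2\pi}\Bigl(f(\pi)e^{-ik\pi}-f(-\pi)e^{ik\pi}+ik\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx\Bigr)=ikc_k$$
for every $k\in\mathbb{Z}$.
(b)(i) Suppose first that $f'(\pi)=f'(-\pi)$. By (a), applied twice, the Fourier coefficients of $f''$ are $\langle-k^2c_k\rangle_{k\in\mathbb{Z}}$, so $\sup_{k\in\mathbb{Z}}k^2|c_k|$ is finite; because $\sum_{k=1}^{\infty}\frac{1}{k^2}<\infty$, $\sum_{k=-\infty}^{\infty}|c_k|<\infty$.
(ii) Next, suppose that $f(x)=x^2$ for every $x$. Then, for $k\ne0$,
$$\begin{aligned}
c_k=\frac{1}{2\pi}\int_{-\pi}^{\pi}x^2e^{-ikx}\,dx
&=\frac{1}{2\pi}\Bigl(-\frac{1}{ik}\bigl(\pi^2e^{-ik\pi}-\pi^2e^{ik\pi}\bigr)+\int_{-\pi}^{\pi}\frac{2x}{ik}e^{-ikx}\,dx\Bigr)\\
&=\frac{1}{ik\pi}\Bigl(-\frac{1}{ik}\bigl(\pi e^{-ik\pi}+\pi e^{ik\pi}\bigr)+\frac{1}{ik}\int_{-\pi}^{\pi}e^{-ikx}\,dx\Bigr)=\frac{2}{k^2}(-1)^k,
\end{aligned}$$
so
$$\sum_{k\in\mathbb{Z}}|c_k|\le|c_0|+4\sum_{k=1}^{\infty}\frac{1}{k^2}$$
is finite.
(iii) In general, we can express $f$ as $f_1+cf_2$ where $f_2(x)=x^2$ for every $x$, $c=\frac{1}{4\pi}(f'(\pi)-f'(-\pi))$, and $f_1$ satisfies the conditions of (i); so that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is the sum of two summable sequences and is itself summable.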
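The coefficient formula in (b-ii) is easy to confirm numerically, and evaluating the resulting series at $x=\pi$ gives $\pi^2=\frac{\pi^2}{3}+4\sum_{k\ge1}\frac{1}{k^2}$, which is the route to $\sum\frac{1}{k^2}=\frac{\pi^2}{6}$ suggested in 282Xo. The sketch below uses a midpoint rule; the step count and the cut-off for the partial sum are arbitrary assumptions.

```python
import cmath
import math


def coefficient_of_square(k, steps=4000):
    """Midpoint-rule approximation of c_k for f(x) = x^2 on ]-pi, pi]."""
    h = 2.0 * math.pi / steps
    total = 0j
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += x * x * cmath.exp(-1j * k * x)
    return total * h / (2.0 * math.pi)


def basel_partial_sum(n):
    """Partial sum 1/1^2 + 1/2^2 + ... + 1/n^2."""
    return sum(1.0 / (k * k) for k in range(1, n + 1))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, coefficient_of_square(k), 2.0 * (-1) ** k / (k * k))
    # Series at x = pi: c_0 + sum over k != 0 of c_k e^{ik pi}.
    print(math.pi ** 2 / 3.0 + 4.0 * basel_partial_sum(100000), math.pi ** 2)
```

The first column of output should match the second, and the last line should show two nearly equal numbers.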
282X Basic exercises >(a) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is an absolutely summable double-ended sequence of complex numbers. Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb{R}$, that $f$ is continuous and periodic, and that its Fourier coefficients are the $c_k$.
(c) Set $\phi_n(t)=\frac{2}{t}\sin(n+\frac12)t$ for $t\ne0$. (This is sometimes called the modified Dirichlet kernel.) Show that for any integrable function $f$ on $]-\pi,\pi]$, with Fourier sums $\langle s_n\rangle_{n\in\mathbb{N}}$ and periodic extension $\tilde f$,
$$\lim_{n\to\infty}\Bigl|s_n(x)-\frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_n(t)\tilde f(x+t)\,dt\Bigr|=0$$
for every $x\in\,]-\pi,\pi]$. (Hint: show that $\frac{2}{t}-\frac{1}{\sin\frac12t}$ is bounded, and use 282E.)
(d) Give a proof of 282Ib from 242O, 255O and 282G.
(e) Give another proof of 282Ic, based on 242O and 281J instead of on 282H.
(f) Use the idea of 255Ya to shorten one of the steps in the proof of 282H, taking $g_m(t)=\min\bigl(\frac{m+1}{2\pi},\frac{\pi}{2(m+1)t^2}\bigr)$ for $|t|\le\delta$, so that $g_m\ge\psi_m$ on $[-\delta,\delta]$.
>(g)(i) Let $f$ be a real square-integrable function on $]-\pi,\pi]$, and $\langle a_k\rangle_{k\in\mathbb{N}}$, $\langle b_k\rangle_{k\ge1}$ its real Fourier coefficients (282Ba). Show that $\frac12a_0^2+\sum_{k=1}^{\infty}(a_k^2+b_k^2)=\frac{1}{\pi}\int_{-\pi}^{\pi}|f|^2$. (ii) Show that $f\mapsto\bigl(\sqrt{\frac{\pi}{2}}\,a_0,\sqrt{\pi}\,a_1,\sqrt{\pi}\,b_1,\dots\bigr)$ defines an inner-product-space isomorphism between the real Hilbert space $L^2_{\mathbb{R}}$ of equivalence classes of real square-integrable functions on $]-\pi,\pi]$ and the real Hilbert space $\ell^2_{\mathbb{R}}$ of square-summable sequences.
(h) Show that $\frac{\pi}{4}=1-\frac13+\frac15-\frac17+\dots$. (Hint: find the Fourier series of $f$ where $f(x)=x/|x|$, and compute the sum of the series at $\frac{\pi}{2}$. Of course there are other methods, e.g., examining the Taylor series for $\arctan1=\frac{\pi}{4}$.)
(i) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums. Suppose that $x\in\,]-\pi,\pi[$, $a\in\mathbb{C}$ are such that $\int_{-\pi}^{\pi}\frac{f(t)-a}{t-x}\,dt$ exists and is finite. Show that $\lim_{n\to\infty}s_n(x)=a$. Explain how this generalizes 282L. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?
(j) Suppose that $\alpha>0$, $K\ge0$ and $f:\,]-\pi,\pi[\,\to\mathbb{C}$ are such that $|f(x)-f(y)|\le K|x-y|^{\alpha}$ for all $x$, $y\in\,]-\pi,\pi[$. (Such functions are called Hölder continuous.) Show that the Fourier sums of $f$ converge to $f$ everywhere in $]-\pi,\pi[$. (Hint: use 282Xi.) (Compare 282Yb.)
(k) In 282L, show that it is enough if $\tilde f$ is differentiable with respect to its domain at $x$ or $\pi$ (see 262Fb), rather than differentiable in the strict sense.
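(l) Show that $\lim_{a\to\infty}\int_0^a\frac{\sin t}{t}\,dt$ exists and is finite. (Hint: use 224J to estimate $\int_a^b\frac{\sin t}{t}\,dt$ for $0<a\le b$.)
(m) Show that $\int_0^{\infty}\frac{|\sin t|}{t}\,dt=\infty$. (Hint: show that $\sup_{a\ge0}\bigl|\int_1^a\frac{\cos2t}{t}\,dt\bigr|<\infty$, and therefore that $\sup_{a\ge0}\int_1^a\frac{\sin^2t}{t}\,dt=\infty$.)
>(n) Let $\langle d_k\rangle_{k\in\mathbb{N}}$ be a sequence in $\mathbb{C}$ such that $\sup_{k\in\mathbb{N}}|kd_k|<\infty$ and
$$\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=0}^{n}d_k=c\in\mathbb{C}.$$
Show that $c=\sum_{k=0}^{\infty}d_k$. (Hint: 282N.)
>(o) Show that $\sum_{n=1}^{\infty}\frac{1}{n^2}=\frac{\pi^2}{6}$. (Hint: (b-ii) of the proof of 282R.)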
(p) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb{N}}$ its sequence of Fourier sums. Suppose that $x\in\,]-\pi,\pi[$ is such that
(i) there is an $a\in\mathbb{C}$ such that
either $\int_{-\pi}^{x}\frac{a-f(t)}{x-t}\,dt$ exists in $\mathbb{C}$
or there is some $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x]$, and $a=\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$
(ii) there is a $b\in\mathbb{C}$ such that
either $\int_{x}^{\pi}\frac{f(t)-b}{t-x}\,dt$ exists in $\mathbb{C}$
or there is some $\delta>0$ such that $f$ is of bounded variation on $[x,x+\delta]$, and $b=\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$.
Show that $\lim_{n\to\infty}s_n(x)=\frac12(a+b)$. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?
>(q) Let $f$, $g$ be integrable complex-valued functions on $]-\pi,\pi]$, and $\boldsymbol{c}=\langle c_k\rangle_{k\in\mathbb{Z}}$, $\boldsymbol{d}=\langle d_k\rangle_{k\in\mathbb{Z}}$ their sequences of Fourier coefficients. Suppose that either $\sum_{k=-\infty}^{\infty}|c_k|<\infty$ or $\sum_{k=-\infty}^{\infty}|c_k|^2+|d_k|^2<\infty$. Show that the sequence of Fourier coefficients of $f\times g$ is just the convolution $\boldsymbol{c}*\boldsymbol{d}$ of $\boldsymbol{c}$ and $\boldsymbol{d}$ (255Xk).
(r) In 282Ra, what happens if $f(\pi)\ne f(-\pi)$?
(s) Suppose that $\langle c_k\rangle_{k\in\mathbb{Z}}$ is a double-ended sequence of complex numbers such that $\sum_{k=-\infty}^{\infty}|kc_k|<\infty$. Show that $f(x)=\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb{R}$ and that $f$ is differentiable everywhere.
(t) Let $\langle c_k\rangle_{k\in\mathbb{Z}}$ be a double-ended sequence of complex numbers such that $\sup_{k\in\mathbb{Z}}|kc_k|<\infty$. Show that there is a square-integrable function $f$ on $]-\pi,\pi]$ such that the $c_k$ are the Fourier coefficients of $f$, that $f$ is the limit almost everywhere of its Fourier sums, and that $f*f*f$ is differentiable. (Hint: use 282K to show that there is an $f$, and 282Xn to show that its Fourier sums converge wherever its Fejér sums do; use 282Q and 282Xs to show that $f*f*f$ is differentiable.)
282Y Further exercises (a) Let $f$ be a non-negative integrable function on $]-\pi,\pi]$, with Fourier coefficients $\langle c_k\rangle_{k\in\mathbb{Z}}$. Show that
$$\sum_{j=0}^{n}\sum_{k=0}^{n}a_j\bar a_kc_{j-k}\ge0$$
for all complex numbers $a_0,\dots,a_n$. (See also 285Xr below.)
(b) Let $f:\,]-\pi,\pi]\to\mathbb{C}$, $K\ge0$, $\alpha>0$ be such that $|f(x)-f(y)|\le K|x-y|^{\alpha}$ for all $x$, $y\in\,]-\pi,\pi]$. Let $c_k$, $s_n$ be the Fourier coefficients and sums of $f$. (i) Show that $\sup_{k\in\mathbb{Z}}|k|^{\alpha}|c_k|<\infty$. (Hint: show that $c_k=\frac{1}{4\pi}\int_{-\pi}^{\pi}\bigl(f(x)-\tilde f(x+\frac{\pi}{k})\bigr)e^{-ikx}\,dx$.) (ii) Show that if $f(\pi)=\lim_{x\downarrow-\pi}f(x)$ then $s_n\to f$ uniformly. (Compare 282Xj.)
(c) Let $f$ be a measurable complex-valued function on $]-\pi,\pi]$, and suppose that $p\in[1,\infty[$ is such that $\int_{-\pi}^{\pi}|f|^p<\infty$. Let $\langle\sigma_m\rangle_{m\in\mathbb{N}}$ be the sequence of Fejér sums of $f$. Show that $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f-\sigma_m|^p=0$. (Hint: use 245Xl, 255Yk and the ideas in 282Ib.)
(d) Construct a continuous function $h:[-\pi,\pi]\to\mathbb{R}$ such that $h(\pi)=h(-\pi)$ but the Fourier sums of $h$ are unbounded at $0$, as follows. Set $\alpha(m,n)=\int_0^{\pi}\frac{\sin(m+\frac12)t\,\sin(n+\frac12)t}{\sin\frac12t}\,dt$. Show that $\lim_{n\to\infty}\alpha(m,n)=0$ for every $m$, but $\lim_{n\to\infty}\alpha(n,n)=\infty$. Set $h_0(x)=\sum_{k=0}^{\infty}\delta_k\sin(m_k+\frac12)x$ for $0\le x\le\pi$, $0$ for $-\pi\le x\le0$, where $\delta_k>0$, $m_k\in\mathbb{N}$ are such that ($\alpha$) $\delta_k\le2^{-k}$, $\delta_k|\alpha(m_k,m_n)|\le2^{-k}$ for every $n<k$ (choosing $\delta_k$) ($\beta$) $\delta_k\alpha(m_k,m_k)\ge k$, $\delta_n|\alpha(m_k,m_n)|\le2^{-n}$ for every $n<k$ (choosing $m_k$). Now modify $h_0$ on $[-\pi,0[$ by adding a function of bounded variation.
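(e)(i) Show that $\lim_{n\to\infty}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\bigr|\,dt=\infty$. (Hint: 282Xm.) (ii) Show that for any $\delta>0$ there are $n\in\mathbb{N}$, $f\ge0$ such that $\int_{-\pi}^{\pi}f\le\delta$, $\int_{-\pi}^{\pi}|s_n|\ge1$, where $s_n$ is the $n$th Fourier sum of $f$. (Hint: take $n$ such that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12t}\bigr|\,dt>\frac{1}{\delta}$ and set $f(x)=\frac{\delta}{\eta}$ for $0\le x\le\eta$, $0$ otherwise, where $\eta$ is small.) (iii) Show that there is an integrable function $f:\,]-\pi,\pi]\to\mathbb{R}$ such that $\sup_{n\in\mathbb{N}}\|s_n\|_1$ is infinite, where $\langle s_n\rangle_{n\in\mathbb{N}}$ is the sequence of Fourier sums of $f$. (Hint: it helps to know the 'Uniform Boundedness Theorem' of functional analysis, but $f$ can also be constructed bare-handed by the method of 282Yd.)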
(f) Let $u:[-\pi,\pi]\to\mathbb{R}$ be an absolutely continuous function such that $u(\pi)=u(-\pi)$ and $\int_{-\pi}^{\pi}u=0$. Show that $\|u\|_2\le\|u'\|_2$. (This is Wirtinger's inequality.)
(g) For $0\le r<1$, $t\in\mathbb{R}$ set $A_r(t)=\frac{1-r^2}{1-2r\cos t+r^2}$. ($A_r$ is the Poisson kernel; see 478Xl$^1$ in Volume 4.) (i) Show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}A_r=1$. (ii) For a real function $f$ which is integrable over $]-\pi,\pi]$, with real Fourier coefficients $a_k$, $b_k$ (282Ba), set $S_r(x)=\frac12a_0+\sum_{k=1}^{\infty}r^k(a_k\cos kx+b_k\sin kx)$ for $x\in\,]-\pi,\pi]$, $r\in[0,1[$. Show that $S_r(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}A_r(x-t)f(t)\,dt$ for every $x\in\,]-\pi,\pi]$. (Hint: $A_r(t)=1+2\sum_{n=1}^{\infty}r^n\cos nt$.) (iii) Show that $\lim_{r\uparrow1}S_r(x)=f(x)$ for every $x\in\,]-\pi,\pi[$ which is in the Lebesgue set of $f$. (Hint: 223Yg.) (iv) Show that $\lim_{r\uparrow1}\int_{-\pi}^{\pi}|S_r-f|=0$. (v) Show that if $f$ is defined everywhere on $]-\pi,\pi]$, is continuous, and $f(\pi)=\lim_{x\downarrow-\pi}f(x)$, then $\lim_{r\uparrow1}\sup_{x\in]-\pi,\pi]}|S_r(x)-f(x)|=0$.
282 Notes and comments This has been a long section with a potentially confusing collection of results, so perhaps I should recapitulate. Associated with any integrable function on $]-\pi,\pi]$ we have the corresponding Fourier sums, being the symmetric partial sums $\sum_{k=-n}^{n}c_ke^{ikx}$ of the complex series $\sum_{k=-\infty}^{\infty}c_ke^{ikx}$, or, equally, the partial sums $\frac12a_0+\sum_{k=1}^{n}a_k\cos kx+b_k\sin kx$ of the real series $\frac12a_0+\sum_{k=1}^{\infty}a_k\cos kx+b_k\sin kx$. The Fourier coefficients $c_k$, $a_k$, $b_k$ are the only natural ones, because if the series is to converge with any regularity at all then
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl(\sum_{k=-\infty}^{\infty}c_ke^{ikx}\Bigr)e^{-ilx}\,dx$$
ought to be simultaneously
$$\sum_{k=-\infty}^{\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}c_ke^{ikx}e^{-ilx}\,dx=c_l$$
and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ilx}\,dx.$$
(Compare the calculations in 282J.) The effect of taking Fejér sums $\sigma_m(x)$ rather than the Fourier sums $s_n(x)$ is to smooth the sequence out; recall that if $\lim_{n\to\infty}s_n(x)=c$ then $\lim_{m\to\infty}\sigma_m(x)=c$, by 273Ca in the last chapter.
$^1$ Later editions only.
Most of the work above is concerned with the question of when Fourier or Fejér sums converge, in some sense, to the original function $f$. As has happened before, in §245 and elsewhere, we have more than one kind of convergence to consider. Norm convergence, for $\|\ \|_1$ or $\|\ \|_2$ or $\|\ \|_{\infty}$, is the simplest; the three theorems 282G, 282Ib and 282J at least are relatively straightforward. (I have given 282Ib as a corollary of 282Ia; but there is an easier proof from 282G. See 282Xd.) Respectively, we have
if $f$ is continuous (and matches at $\pm\pi$, that is, $f(\pi)=\lim_{t\downarrow-\pi}f(t)$) then $\sigma_m\to f$ uniformly, that is, for $\|\ \|_{\infty}$ (282G);
if $f$ is any integrable function, then $\sigma_m\to f$ for $\|\ \|_1$ (282Ib);
if $f$ is a square-integrable function, then $s_n\to f$ for $\|\ \|_2$ (282J);
if $f$ is continuous and of bounded variation (and matches at $\pm\pi$), then $s_n\to f$ uniformly (282O).
There are some similar results for other $\|\ \|_p$ (282Yc); but note that the Fourier sums need not converge for $\|\ \|_1$ (282Ye).
Pointwise convergence is harder. The results I give are
if $f$ is any integrable function, then $\sigma_m\to f$ almost everywhere (282Ia);
this relies on some careful calculations in 282H, and also on the deep result 223D. Next we have the results which look at the average of the limits of $f$ from the two sides. Suppose I write
$$f^{\pm}(x)=\tfrac12\bigl(\lim_{t\uparrow x}f(t)+\lim_{t\downarrow x}f(t)\bigr)$$
whenever this is defined, taking $f^{\pm}(\pi)=\frac12(\lim_{t\uparrow\pi}f(t)+\lim_{t\downarrow-\pi}f(t))$. Then we have
if $f$ is any integrable function, $\sigma_m\to f^{\pm}$ wherever $f^{\pm}$ is defined (282I);
if $f$ is of bounded variation, $s_n\to f^{\pm}$ everywhere (282O).
Of course these apply at any point at which $f$ is continuous, in which case $f(x)=f^{\pm}(x)$. Yet another result of this type is
if $f$ is any integrable function, $s_n\to f$ at any point at which $f$ is differentiable (282L);
in fact, this can be usefully extended for very little extra labour (282Xi, 282Xp).
I cannot leave this list without mentioning the theorem I have not given. This is Carleson's theorem:
if $f$ is square-integrable, $s_n\to f$ almost everywhere
(Carleson 66). I will come to this in §286. There is an elementary special case in 282Xt. The result is in fact valid for many other $f$ (see the notes to §286).
The next glaring lacuna in the exposition here is the absence of any examples to show how far these results are best possible. There is no suggestion, indeed, that there are any natural necessary and sufficient conditions for
$s_n\to f$ at every point.
Nevertheless, we have to make an effort to find a continuous function for which this is not so, and the construction of an example by du Bois-Reymond (Bois-Reymond 1876) was an important moment in the history of analysis, not least because it forced mathematicians to realise that some comfortable assumptions about the classification of functions (essentially, that functions are either 'good' or so bad that one needn't trouble with them) were false. The example is instructive but I have had to omit it for lack of space; I give an outline of a possible method in 282Yd. (You can find a detailed construction in Körner 88, chapter 18, and a proof that such a function exists in Dudley 89, 7.4.3.) If you allow general integrable functions, then you can do much better, or perhaps I should say much worse; there is an integrable $f$ such that $\sup_{n\in\mathbb{N}}|s_n(x)|=\infty$ for every $x\in\,]-\pi,\pi]$ (Kolmogorov 26; see Zygmund 59, §§VIII.3-4).
In 282C I mentioned two types of problem. The first (when is a Fourier series summable?) has at least been
treated at length, even though I cannot pretend to have given more than a sample of what is known. The second
(how do properties of the $c_k$ reflect properties of $f$?) I have hardly touched on. I do give what seem to me to be the
three most important results in this area. The first is
if $f$ and $g$ have the same Fourier coefficients, they are equal almost everywhere (282Ic).
This at least tells us that we ought in principle to be able to learn almost anything about $f$ by looking at its Fourier
series. (For instance, 282Ya describes a necessary and sufficient condition for $f$ to be non-negative almost everywhere.)
The second is
$f$ is square-integrable iff $\sum_{k=-\infty}^{\infty} |c_k|^2 < \infty$; in fact, $\sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f|^2$ (282J).
Of course this is fundamental, since it shows that Fourier coefficients provide a natural Hilbert space isomorphism
between $L^2$ and $\ell^2$ (282K). I should perhaps remark that while the real Hilbert spaces $L^2_{\mathbb R}$, $\ell^2_{\mathbb R}$ are isomorphic as inner
product spaces (282Xg), they are certainly not isomorphic as Banach lattices; for instance, $\ell^2_{\mathbb R}$ has 'atomic' elements $\boldsymbol c$
such that if $0 \le \boldsymbol d \le \boldsymbol c$ then $\boldsymbol d$ is a multiple of $\boldsymbol c$, while $L^2_{\mathbb R}$ does not. Perhaps even more important is
283B Fourier transforms I 419
the Fourier coefficients of a convolution $f * g$ are just a scalar multiple of the products of the Fourier
coefficients of $f$ and $g$ (282Q);
but to use this effectively we need to study the Banach algebra structure of $L^1$, and I have no choice but to abandon
this path immediately. (It will form a conspicuous part of Chapter 44 in Volume 4.) 282Xt gives an elementary
consequence, and 282Xq a very partial description of the relationship between a product $f \times g$ of two functions and
the convolution product of their sequences of Fourier coefficients.
The Fejér sums considered in this section are one way of working around the convergence difficulties associated
with Fourier sums. When we come to look at Fourier transforms in the next two sections we shall need some further
manoeuvres. A different type of smoothing is obtained by using the Poisson kernel in place of the Dirichlet or Fejér
kernel (282Yg).
I end these notes with a remark on the number $2\pi$. This enters nearly every formula involving Fourier series, but
could I think be removed totally from the present section, at least, by re-normalizing the measure of $]-\pi,\pi]$. If instead
of Lebesgue measure $\mu$ we took the measure $\nu = \frac{1}{2\pi}\mu$ throughout, then every $2\pi$ would disappear. (Compare the remark
in 282Bb concerning the possibility of doing integrals over $S^1$.) But I think most of us would prefer to remember the
location of a $2\pi$ in every formula than to deal with an unfamiliar measure.
283 Fourier transforms I
I turn now to the theory of Fourier transforms on $\mathbb R$. In the first of two sections on the subject, I present those parts
of the elementary theory which can be dealt with using the methods of the previous section on Fourier series. I find
no way of making sense of the theory, however, without introducing a fragment of L.Schwartz' theory of distributions,
which I present in §284. As in §282, of course, this treatment also is nothing but a start in the topic.
The whole theory can also be done in $\mathbb R^r$. I leave this extension to the exercises, however, since there are few new
ideas, the formulae are significantly more complicated, and I shall not, in this volume at least, have any use for the
multidimensional versions of these particular theorems, though some of the same ideas will appear, in multidimensional
form, in §285.
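283A Definitions Let $f$ be a complex-valued function which is integrable over $\mathbb R$.
(a) The Fourier transform of $f$ is the function $\hat f : \mathbb R \to \mathbb C$ defined by setting
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx$$
for every $y \in \mathbb R$. (Of course the integral is always defined because $x \mapsto e^{-iyx}$ is bounded and continuous, therefore
measurable.)
(b) The inverse Fourier transform of $f$ is the function $\check f : \mathbb R \to \mathbb C$ defined by setting
$$\check f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{iyx} f(x)\,dx$$
for every $y \in \mathbb R$.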
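As a numerical illustration of these definitions (a sketch only, not part of the original text), the following Python fragment approximates $\hat f$ and $\check f$ by a midpoint rule on a truncated interval. The function names, the truncation to $[-40, 40]$ and the test function $e^{-|x|}$ are choices made here for illustration; the closed form $\hat f(y) = 2/(\sqrt{2\pi}(1+y^2))$ for that function is the standard one under the normalization of 283A.

```python
import cmath
import math


def fourier_transform(f, y, lo=-40.0, hi=40.0, n=20000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * int_{lo}^{hi} exp(-i*y*x) f(x) dx."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(-1j * y * x) * f(x)
    return total * h / math.sqrt(2 * math.pi)


def inverse_fourier_transform(f, y, lo=-40.0, hi=40.0, n=20000):
    """Same estimate with exp(+i*y*x) in place of exp(-i*y*x)."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(1j * y * x) * f(x)
    return total * h / math.sqrt(2 * math.pi)


def example(x):
    # An integrable test function chosen for this sketch.
    return math.exp(-abs(x))


def example_hat(y):
    # Closed form of the transform of exp(-|x|) with the 1/sqrt(2*pi) normalization.
    return 2.0 / (math.sqrt(2 * math.pi) * (1.0 + y * y))


if __name__ == "__main__":
    print(fourier_transform(example, 1.5), example_hat(1.5))
```

The two routines differ only in the sign of the exponent, so the inverse transform at $y$ agrees with the transform at $-y$.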
283B Remarks (a) It is a mildly vexing feature of the theory of Fourier transforms (vexing, that is, for outsiders
like myself) that there is in fact no standard definition of 'Fourier transform'. The commonest definitions are, I think,
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx,$$
$$\hat f(y) = \int_{-\infty}^{\infty} e^{\mp iyx} f(x)\,dx,$$
$$\hat f(y) = \int_{-\infty}^{\infty} e^{\mp 2\pi iyx} f(x)\,dx,$$
corresponding to inverse transforms
$$\check f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx,$$
$$\check f(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{\pm iyx} f(x)\,dx,$$
$$\check f(y) = \int_{-\infty}^{\infty} e^{\pm 2\pi iyx} f(x)\,dx.$$
I leave it to you to check that the whole theory can be carried through with any of these six pairs, and to investigate
other possibilities (see 283Xa-283Xb below).
(b) The phrases 'Fourier transform', 'inverse Fourier transform' make it plain that $(\hat f)^{\vee}$ is supposed to be $f$, at least
some of the time. This is indeed the case, but the class of $f$ for which this is true in the literal sense is somewhat
constrained, and we shall have to wait a little while before investigating it.
(c) No amount of juggling with constants, in the manner of (a) above, can make $\hat f$ and $\check f$ quite the same. However,
on the definitions I have chosen, we do have $\check f(y) = \hat f(-y)$ for every $y$, so that $\hat f$ and $\check f$ will share essentially all the
properties of interest to us here; in particular, everything in the next proposition will be valid with $\vee$ in place of $\wedge$, if
you change signs at the right points in parts (c), (h) and (i).
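283C Proposition Let $f$ and $g$ be complex-valued functions which are integrable over $\mathbb R$.
(a) $(f+g)^{\wedge} = \hat f + \hat g$.
(b) $(cf)^{\wedge} = c\hat f$ for every $c \in \mathbb C$.
(c) If $c \in \mathbb R$ and $h(x) = f(x+c)$ whenever this is defined, then $\hat h(y) = e^{icy}\hat f(y)$ for every $y \in \mathbb R$.
(d) If $c \in \mathbb R$ and $h(x) = e^{icx}f(x)$ for every $x \in \operatorname{dom} f$, then $\hat h(y) = \hat f(y-c)$ for every $y \in \mathbb R$.
(e) If $c > 0$ and $h(x) = f(cx)$ whenever this is defined, then $\hat h(y) = \frac{1}{c}\hat f(y/c)$ for every $y \in \mathbb R$.
(f) $\hat f : \mathbb R \to \mathbb C$ is continuous.
(g) $\lim_{y\to\infty} \hat f(y) = \lim_{y\to-\infty} \hat f(y) = 0$.
(h) If $\int_{-\infty}^{\infty} |xf(x)|\,dx < \infty$, then $\hat f$ is differentiable, and its derivative is
$$(\hat f)'(y) = -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx}\, x f(x)\,dx$$
for every $y \in \mathbb R$.
(i) If $f$ is absolutely continuous on every bounded interval and $f'$ is integrable, then $(f')^{\wedge}(y) = iy\hat f(y)$ for every
$y \in \mathbb R$.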
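proof (a) and (b) are trivial, and (c), (d) and (e) are elementary substitutions.
(f) If $\langle y_n\rangle_{n\in\mathbb N}$ is any convergent sequence in $\mathbb R$ with limit $y$, then
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \lim_{n\to\infty} e^{-iy_nx} f(x)\,dx = \lim_{n\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iy_nx} f(x)\,dx = \lim_{n\to\infty} \hat f(y_n)$$
by Lebesgue's Dominated Convergence Theorem, because $|e^{-iy_nx} f(x)| \le |f(x)|$ for every $n \in \mathbb N$, $x \in \operatorname{dom} f$. As $\langle y_n\rangle_{n\in\mathbb N}$
is arbitrary, $\hat f$ is continuous.
(g) This is just the Riemann-Lebesgue lemma (282E).
(h) The point is that $|\frac{\partial}{\partial y} e^{-iyx} f(x)| = |xf(x)|$ for every $x \in \operatorname{dom} f$, $y \in \mathbb R$. So by 123D
$$(\hat f)'(y) = \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \frac{d}{dy} \int_{\operatorname{dom} f} e^{-iyx} f(x)\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{\operatorname{dom} f} \frac{\partial}{\partial y} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} -ix\,e^{-iyx} f(x)\,dx$$
$$= -\frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,e^{-iyx} f(x)\,dx.$$
(i) Because $f$ is absolutely continuous on every bounded interval,
$$f(x) = f(0) + \int_0^x f' \text{ for } x \ge 0, \qquad f(x) = f(0) - \int_x^0 f' \text{ for } x \le 0.$$
Because $f'$ is integrable,
$$\lim_{x\to\infty} f(x) = f(0) + \int_0^{\infty} f', \qquad \lim_{x\to-\infty} f(x) = f(0) - \int_{-\infty}^0 f'$$
both exist. Because $f$ also is integrable, both limits must be zero. Now we can integrate by parts (225F) to see that
$$(f')^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f'(x)\,dx = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \Bigl(\lim_{a\to\infty} e^{-iya} f(a) - \lim_{a\to\infty} e^{iya} f(-a)\Bigr) + \frac{iy}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx$$
$$= iy\hat f(y).$$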
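The substitution rules (c), (d), (e) and the derivative rule (i) of 283C can be checked numerically. The sketch below is an illustration added here, not part of the original text: the test function $f(x) = e^{-(x-0.3)^2}$, the constants and the quadrature routine `transform` are all choices made for this example. Part (e) is tested in the form $\hat h(y) = \frac1c \hat f(y/c)$, which is what the substitution $t = cx$ gives.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def transform(f, y, lo=-20.0, hi=20.0, n=8000):
    """Midpoint-rule estimate of the Fourier transform of 283A at the point y."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(-1j * y * x) * f(x)
    return total * h / SQRT_2PI


def f(x):
    # Smooth, rapidly decaying test function (an arbitrary choice).
    return math.exp(-(x - 0.3) ** 2)


def f_prime(x):
    # Derivative of f, computed by hand.
    return -2.0 * (x - 0.3) * f(x)


c, y, scale = 0.7, 1.3, 2.0

# (c): h(x) = f(x + c)  has transform  exp(i*c*y) * f_hat(y)
shift_lhs = transform(lambda x: f(x + c), y)
shift_rhs = cmath.exp(1j * c * y) * transform(f, y)

# (d): h(x) = exp(i*c*x) f(x)  has transform  f_hat(y - c)
mod_lhs = transform(lambda x: cmath.exp(1j * c * x) * f(x), y)
mod_rhs = transform(f, y - c)

# (e): h(x) = f(scale * x)  has transform  (1/scale) * f_hat(y / scale)
scale_lhs = transform(lambda x: f(scale * x), y)
scale_rhs = transform(f, y / scale) / scale

# (i): the transform of f' is  i*y*f_hat(y)
deriv_lhs = transform(f_prime, y)
deriv_rhs = 1j * y * transform(f, y)

if __name__ == "__main__":
    for name, lhs, rhs in [("shift", shift_lhs, shift_rhs), ("modulate", mod_lhs, mod_rhs),
                           ("scale", scale_lhs, scale_rhs), ("derivative", deriv_lhs, deriv_rhs)]:
        print(name, abs(lhs - rhs))
```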
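283D Lemma (a) $\lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \frac{\pi}{2}$, $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \pi$.
(b) There is a $K < \infty$ such that $\bigl|\int_a^b \frac{\sin cx}{x}\,dx\bigr| \le K$ whenever $a \le b$ and $c \in \mathbb R$.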
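proof (a)(i) Set
$$F(a) = \int_0^a \frac{\sin x}{x}\,dx \text{ if } a \ge 0, \qquad F(a) = -\int_a^0 \frac{\sin x}{x}\,dx \text{ if } a \le 0,$$
so that $F(-a) = -F(a)$ and $\int_a^b \frac{\sin x}{x}\,dx = F(b) - F(a)$ for all $a \le b$.
If $0 < a \le b$, then by 224J
$$\Bigl|\int_a^b \frac{\sin x}{x}\,dx\Bigr| \le \Bigl(\frac1b + \frac1a - \frac1b\Bigr) \sup_{c\in[a,b]} \Bigl|\int_a^c \sin x\,dx\Bigr| = \frac1a \sup_{c\in[a,b]} |\cos c - \cos a| \le \frac2a.$$
In particular, $|F(n) - F(m)| \le \frac2m$ if $m \le n$ in $\mathbb N$, and $\langle F(n)\rangle_{n\in\mathbb N}$ is a Cauchy sequence with limit $\gamma$ say; now
$$|\gamma - F(a)| = \lim_{n\to\infty} |F(n) - F(a)| \le \frac2a$$
for every $a > 0$, so $\lim_{a\to\infty} F(a) = \gamma$. Of course we also have
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} (F(a) - F(-a)) = \lim_{a\to\infty} 2F(a) = 2\gamma.$$
(ii) So now I have to calculate $\gamma$. For this, observe first that
$$2\gamma = \lim_{a\to\infty} \int_{-a}^{a} \frac{\sin x}{x}\,dx = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt$$
(substituting $x = at$, and noting that $a\pi \to \infty$ as $a \to \infty$). Next,
$$\lim_{t\to0} \Bigl(\frac1t - \frac{1}{2\sin\frac12t}\Bigr) = \lim_{u\to0} \frac{\sin u - u}{2u\sin u} = 0,$$
so
$$\int_{-\pi}^{\pi} \Bigl|\frac1t - \frac{1}{2\sin\frac12t}\Bigr|\,dt < \infty,$$
and by the Riemann-Lebesgue lemma (282Fb)
$$\lim_{a\to\infty} \int_{-\pi}^{\pi} \Bigl(\frac1t - \frac{1}{2\sin\frac12t}\Bigr) \sin at\,dt = 0.$$
But we know that
$$\int_{-\pi}^{\pi} \frac{\sin(n+\frac12)t}{2\sin\frac12t}\,dt = \pi$$
for every $n$ (using 282Dc), so we must have
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,dt = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{t}\,dt = \lim_{a\to\infty} \int_{-\pi}^{\pi} \frac{\sin at}{2\sin\frac12t}\,dt = \lim_{n\to\infty} \int_{-\pi}^{\pi} \frac{\sin(n+\frac12)t}{2\sin\frac12t}\,dt = \pi,$$
and $\gamma = \pi/2$, as claimed.
(b) Because $F$ is continuous and
$$\lim_{a\to\infty} F(a) = \gamma = \frac{\pi}{2}, \qquad \lim_{a\to-\infty} F(a) = -\gamma = -\frac{\pi}{2},$$
$F$ is bounded; say $|F(a)| \le K_1$ for all $a \in \mathbb R$. Try $K = 2K_1$. Now suppose that $a < b$ and $c \in \mathbb R$. If $c > 0$, then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = \Bigl|\int_{ac}^{bc} \frac{\sin t}{t}\,dt\Bigr| = |F(bc) - F(ac)| \le 2K_1 = K,$$
substituting $x = t/c$. If $c < 0$, then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = \Bigl|\int_a^b \frac{\sin(-c)x}{x}\,dx\Bigr| \le K;$$
while if $c = 0$ then
$$\Bigl|\int_a^b \frac{\sin cx}{x}\,dx\Bigr| = 0 \le K.$$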
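The estimate $|\gamma - F(a)| \le 2/a$ from part (a)(i), and the value $\gamma = \pi/2$, are easy to test numerically. The following sketch (added for illustration; the Simpson-rule routine and its step size are assumptions of this example, not part of the text) evaluates $F(a) = \int_0^a \frac{\sin x}{x}\,dx$ and also the value $F(\pi)$, which is the largest value of $|F|$ because $F'(a) = \frac{\sin a}{a}$ changes sign at multiples of $\pi$ and successive arches shrink. So one admissible choice in part (b) is $K_1 = F(\pi)$.

```python
import math


def sinc(x):
    """sin(x)/x, extended continuously by the value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def F(a, step=0.01):
    """Composite Simpson estimate of int_0^a sin(x)/x dx for a >= 0."""
    if a == 0.0:
        return 0.0
    n = 2 * max(1, math.ceil(a / (2 * step)))  # even number of subintervals
    h = a / n
    total = sinc(0.0) + sinc(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sinc(k * h)
    return total * h / 3.0


K1 = F(math.pi)   # maximum of |F|, attained at a = pi
K = 2 * K1        # a constant that works in 283Db

if __name__ == "__main__":
    for a in (10.0, 50.0, 200.0):
        print(a, F(a), abs(F(a) - math.pi / 2), 2.0 / a)
    print("K1 =", K1, "K =", K)
```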
283E The hardest work of this section will lie in the pointwise inversion theorems 283I and 283L below. I begin
however with a relatively easy, and at least equally important, result, showing (among other things) that an integrable
function $f$ can (essentially) be recovered from its Fourier transform.
Lemma Whenever $c < d$ in $\mathbb R$,
$$\lim_{a\to\infty} \int_{-a}^{a} e^{-iyx}\,\frac{e^{idy} - e^{icy}}{y}\,dy = \begin{cases} 2\pi i & \text{if } c < x < d, \\ \pi i & \text{if } x = c \text{ or } x = d, \\ 0 & \text{if } x < c \text{ or } x > d. \end{cases}$$
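proof We know that for any $b > 0$
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = \lim_{a\to\infty} \int_{-ab}^{ab} \frac{\sin t}{t}\,dt = \pi$$
(substituting $x = t/b$), and therefore that for any $b < 0$
$$\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin bx}{x}\,dx = -\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(-b)x}{x}\,dx = -\pi.$$
Now consider, for $x \in \mathbb R$,
$$\lim_{a\to\infty} \int_{-a}^{a} e^{-iyx}\,\frac{e^{idy} - e^{icy}}{y}\,dy.$$
First note that all the integrals $\int_{-a}^{a}$ exist, because
$$\lim_{y\to0} \frac{e^{idy} - e^{icy}}{y} = i(d-c)$$
is finite, and the integrand is certainly continuous except at 0. Now we have
$$\int_{-a}^{a} e^{-iyx}\,\frac{e^{idy} - e^{icy}}{y}\,dy = \int_{-a}^{a} \frac{e^{i(d-x)y} - e^{i(c-x)y}}{y}\,dy$$
$$= \int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy + i\int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy$$
$$= i\int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy$$
because $\cos$ is an even function, so
$$\int_{-a}^{a} \frac{\cos(d-x)y - \cos(c-x)y}{y}\,dy = 0$$
for every $a \ge 0$. (Once again, this integral exists because
$$\lim_{y\to0} \frac{\cos(d-x)y - \cos(c-x)y}{y} = 0.)$$
Accordingly
$$\lim_{a\to\infty} \int_{-a}^{a} e^{-iyx}\,\frac{e^{idy} - e^{icy}}{y}\,dy = i\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy - i\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy$$
$$= i\pi - i\pi = 0 \text{ if } x < c,$$
$$= i\pi - 0 = \pi i \text{ if } x = c,$$
$$= i\pi + i\pi = 2\pi i \text{ if } c < x < d,$$
$$= 0 + i\pi = \pi i \text{ if } x = d,$$
$$= -i\pi + i\pi = 0 \text{ if } x > d.$$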
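283F Theorem Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and $\hat f$ its Fourier transform. Then
whenever $c \le d$ in $\mathbb R$,
$$\int_c^d f = \frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\,\hat f(y)\,dy.$$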
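proof If $c = d$ this is trivial; let us suppose that $c < d$.
(a) Writing
$$\theta_a(x) = \int_{-a}^{a} e^{-iyx}\,\frac{e^{idy} - e^{icy}}{y}\,dy$$
for $x \in \mathbb R$, $a \ge 0$, 283E tells us that
$$\lim_{a\to\infty} \theta_a(x) = 2\pi i\,\theta(x)$$
where $\theta = \frac12(\chi[c,d] + \chi\,]c,d[)$ takes the value 1 inside the interval $[c,d]$, 0 outside and the value $\frac12$ at the endpoints. At
the same time,
$$|\theta_a(x)| = \Bigl|\int_{-a}^{a} \frac{\sin(d-x)y - \sin(c-x)y}{y}\,dy\Bigr| \le \Bigl|\int_{-a}^{a} \frac{\sin(d-x)y}{y}\,dy\Bigr| + \Bigl|\int_{-a}^{a} \frac{\sin(c-x)y}{y}\,dy\Bigr| \le 2K$$
for all $a \ge 0$, $x \in \mathbb R$, where $K$ is the constant of 283Db. Consequently $|f \times \theta_a| \le 2K|f|$ everywhere on $\operatorname{dom} f$, for every
$a \ge 0$, and (applying Lebesgue's Dominated Convergence Theorem to sequences $\langle f \times \theta_{a_n}\rangle_{n\in\mathbb N}$, where $a_n \to \infty$)
$$\lim_{a\to\infty} \int f \times \theta_a = 2\pi i \int f \times \theta = 2\pi i \int_c^d f.$$
(b) Now consider the limit in the statement of the theorem. We have
$$\int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\,\hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \int_{-\infty}^{\infty} \frac{e^{icy} - e^{idy}}{y}\,e^{-iyx} f(x)\,dx\,dy$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\,e^{-iyx} f(x)\,dy\,dx = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f \times \theta_a,$$
by Fubini's and Tonelli's theorems (252H), using the fact that $(e^{icy} - e^{idy})/y$ is bounded to see that
$$\int_{-\infty}^{\infty} \int_{-a}^{a} \Bigl|\frac{e^{icy} - e^{idy}}{y}\,e^{-iyx} f(x)\Bigr|\,dy\,dx$$
is finite. Accordingly
$$\frac{i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} \frac{e^{icy} - e^{idy}}{y}\,\hat f(y)\,dy = -\frac{i}{2\pi} \lim_{a\to\infty} \int_{-\infty}^{\infty} f \times \theta_a = -\frac{i}{2\pi}\cdot 2\pi i \int_c^d f = \int_c^d f,$$
as required.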
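To see the formula of 283F at work, here is a numerical sketch (an addition for illustration, not the author's text). It takes $f(x) = e^{-|x|}$, for which $\hat f(y) = 2/(\sqrt{2\pi}(1+y^2))$, and compares $\int_c^d f$ with $\frac{i}{\sqrt{2\pi}}\int_{-a}^{a} \frac{e^{icy}-e^{idy}}{y}\hat f(y)\,dy$ for a large finite $a$. The cutoff $a = 200$, the grid size and the function names are assumptions of this example.

```python
import cmath
import math


def f_hat(y):
    # Transform of exp(-|x|) under the normalization of 283A.
    return 2.0 / (math.sqrt(2 * math.pi) * (1.0 + y * y))


def rhs_283F(c, d, a=200.0, n=200000):
    """Midpoint estimate of (i/sqrt(2*pi)) * int_{-a}^{a} (e^{icy} - e^{idy})/y * f_hat(y) dy.

    With n even the midpoints are symmetric about 0 and never equal to 0,
    so the removable singularity at y = 0 causes no trouble.
    """
    h = 2.0 * a / n
    total = 0j
    for k in range(n):
        y = -a + (k + 0.5) * h
        total += (cmath.exp(1j * c * y) - cmath.exp(1j * d * y)) / y * f_hat(y)
    return 1j / math.sqrt(2 * math.pi) * total * h


def lhs_283F(c, d):
    """Exact value of int_c^d exp(-|x|) dx, valid for c <= 0 <= d."""
    return (1.0 - math.exp(c)) + (1.0 - math.exp(-d))


if __name__ == "__main__":
    print(lhs_283F(-0.5, 1.0), rhs_283F(-0.5, 1.0))
```

The imaginary part of the right-hand side vanishes by symmetry, and the real part differs from the exact integral only by the truncation at $\pm a$ and the quadrature error.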
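283G Corollary If $f$ and $g$ are complex-valued functions which are integrable over $\mathbb R$, then $\hat f = \hat g$ iff $f =_{\text{a.e.}} g$.
proof If $f =_{\text{a.e.}} g$ then of course
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} f(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} g(x)\,dx = \hat g(y)$$
for every $y \in \mathbb R$. Conversely, if $\hat f = \hat g$, then by the last theorem
$$\int_c^d f = \int_c^d g$$
for all $c \le d$, so $f = g$ almost everywhere, by 222D.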
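283H Lemma Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and $\hat f$ its Fourier transform. Then
for any $a > 0$, $x \in \mathbb R$,
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac1\pi \int_{-\infty}^{\infty} \frac{\sin a(x-t)}{x-t}\,f(t)\,dt = \frac1\pi \int_{-\infty}^{\infty} \frac{\sin at}{t}\,f(x-t)\,dt.$$
proof We have
$$\int_{-a}^{a} \int_{-\infty}^{\infty} |e^{ixy} e^{-iyt} f(t)|\,dt\,dy \le 2a \int_{-\infty}^{\infty} |f(t)|\,dt < \infty,$$
so (because the function $(t,y) \mapsto e^{ixy} e^{-iyt} f(t)$ is surely jointly measurable) we may reverse the order of integration,
and get
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{2\pi} \int_{-a}^{a} \int_{-\infty}^{\infty} e^{ixy} e^{-iyt} f(t)\,dt\,dy = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(t) \int_{-a}^{a} e^{i(x-t)y}\,dy\,dt$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{2\sin(x-t)a}{x-t}\,f(t)\,dt = \frac1\pi \int_{-\infty}^{\infty} \frac{\sin au}{u}\,f(x-u)\,du,$$
substituting $t = x - u$.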
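283I Theorem Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and suppose that $f$ is differentiable
at $x \in \mathbb R$. Then
$$f(x) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy.$$
proof Set $g(u) = f(x)$ if $|u| \le 1$, 0 otherwise, and observe that $\lim_{u\to0} \frac1u(f(x-u) - g(u)) = -f'(x)$ is finite, so that
there is a $\delta \in\ ]0,1]$ such that
$$K = \sup_{0<|u|\le\delta} \Bigl|\frac{f(x-u) - g(u)}{u}\Bigr| < \infty.$$
Consequently
$$\int_{-\infty}^{\infty} \Bigl|\frac{f(x-u) - g(u)}{u}\Bigr|\,du \le \int_{-\infty}^{-\delta} \frac1\delta |f(x-u)|\,du + \frac1\delta \int_{-1}^{1} |g| + \int_{-\delta}^{\delta} K + \int_{\delta}^{\infty} \frac1\delta |f(x-u)|\,du$$
$$\le \frac1\delta \int_{-\infty}^{\infty} |f| + \frac2\delta |f(x)| + 2\delta K < \infty.$$
By the Riemann-Lebesgue lemma (282Fb),
$$\lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\,(f(x-u) - g(u))\,du = 0.$$
If we now examine $\int_{-\infty}^{\infty} \frac{\sin au}{u}\,g(u)\,du$, we get
$$\int_{-\infty}^{\infty} \frac{\sin au}{u}\,g(u)\,du = \int_{-1}^{1} \frac{\sin au}{u}\,f(x)\,du = f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv,$$
substituting $u = v/a$. So we get
$$\lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\,f(x-u)\,du = \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\,g(u)\,du = \lim_{a\to\infty} f(x) \int_{-a}^{a} \frac{\sin v}{v}\,dv = \pi f(x),$$
by 283D. Accordingly
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \frac1\pi \lim_{a\to\infty} \int_{-\infty}^{\infty} \frac{\sin au}{u}\,f(x-u)\,du = f(x),$$
using 283H. As for the second equality,
$$\frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-ixy} \hat f(-y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{ixu} \hat f(u)\,du = f(x)$$
(substituting $y = -u$).
Remark Compare 282L.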
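The identity of 283H, and the convergence asserted in 283I, can be illustrated numerically. In the sketch below (added here; the choice $f(t) = e^{-|t|}$, the point $x = 1$, the cutoff $a = 20$ and all function names are assumptions of the example) the left-hand side of 283H is computed from the closed form $\hat f(y) = 2/(\sqrt{2\pi}(1+y^2))$ and the right-hand side from the kernel $\frac{\sin at}{t}$. Both should agree with each other, and both should be close to $f(1) = e^{-1}$ since $f$ is differentiable at 1.

```python
import math


def f(t):
    return math.exp(-abs(t))


def truncated_inversion(x, a, n=40000):
    """(1/sqrt(2*pi)) * int_{-a}^{a} exp(i*x*y) f_hat(y) dy for f(t) = exp(-|t|).

    Since f_hat(y) = 2/(sqrt(2*pi)*(1+y^2)) is even, the integral reduces to
    (1/pi) * int_{-a}^{a} cos(x*y)/(1+y^2) dy.
    """
    h = 2.0 * a / n
    total = 0.0
    for k in range(n):
        y = -a + (k + 0.5) * h
        total += math.cos(x * y) / (1.0 + y * y)
    return total * h / math.pi


def kernel_form(x, a, lo=-40.0, hi=40.0, n=80000):
    """(1/pi) * int sin(a*t)/t * f(x - t) dt, truncated to [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h  # midpoints avoid t = 0
        total += math.sin(a * t) / t * f(x - t)
    return total * h / math.pi


if __name__ == "__main__":
    x, a = 1.0, 20.0
    print(truncated_inversion(x, a), kernel_form(x, a), f(x))
```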
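283J Corollary Let $f : \mathbb R \to \mathbb C$ be an integrable function such that $f$ is differentiable and $\hat f$ is integrable. Then
$f = (\hat f)^{\vee} = (\check f)^{\wedge}$.
proof Because $\hat f$ is integrable,
$$(\hat f)^{\vee}(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = f(x)$$
for every $x \in \mathbb R$. Similarly,
$$(\check f)^{\wedge}(x) = \lim_{a\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ixy} \check f(y)\,dy = f(x).$$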
283K The next proposition gives a class of functions to which the last corollary can be applied.
Proposition Suppose that $f$ is a twice-differentiable function from $\mathbb R$ to $\mathbb C$ such that $f$, $f'$ and $f''$ are all integrable.
Then $\hat f$ is integrable.
proof Because $f'$ and $f''$ are integrable, $f$ and $f'$ are absolutely continuous on any bounded interval (225L). So by
283Ci we have
$$(f'')^{\wedge}(y) = iy(f')^{\wedge}(y) = -y^2\hat f(y)$$
for every $y \in \mathbb R$. At the same time, by 283Cf-283Cg, $(f'')^{\wedge}$ and $\hat f$ must be bounded; say $|\hat f(y)| + |(f'')^{\wedge}(y)| \le K$ for
every $y \in \mathbb R$. Now
$$|\hat f(y)| \le \frac{K}{1+y^2}$$
for every $y$, so that
$$\int_{-\infty}^{\infty} |\hat f| \le K \int_{-\infty}^{-1} \frac{1}{y^2}\,dy + 2K + K \int_1^{\infty} \frac{1}{y^2}\,dy = 4K < \infty.$$
Remark Compare 282Rb.
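283L I turn now to the result corresponding to 282O, using a slightly different approach.
Theorem Let $f$ be a complex-valued function which is integrable over $\mathbb R$, with Fourier transform $\hat f$ and inverse Fourier
transform $\check f$, and suppose that $f$ is of bounded variation on some neighbourhood of $x \in \mathbb R$. Set $a = \lim_{t\in\operatorname{dom} f,\,t\uparrow x} f(t)$,
$b = \lim_{t\in\operatorname{dom} f,\,t\downarrow x} f(t)$. Then
$$\frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac12(a+b).$$
proof (a) The limits $\lim_{t\in\operatorname{dom} f,\,t\uparrow x} f(t)$ and $\lim_{t\in\operatorname{dom} f,\,t\downarrow x} f(t)$ exist because $f$ is of bounded variation near $x$ (224F).
Recall from 283Db that there is a constant $K < \infty$ such that
$$\Bigl|\int_{\alpha}^{\beta} \frac{\sin cx}{x}\,dx\Bigr| \le K$$
whenever $\alpha \le \beta$ and $c \in \mathbb R$.
(b) Let $\epsilon > 0$. The hypothesis is that there is some $\delta > 0$ such that $\operatorname{Var}_{[x-\delta,x+\delta]}(f) < \infty$. Consequently
$$\lim_{\eta\downarrow0} \operatorname{Var}_{]x,x+\eta]}(f) = \lim_{\eta\downarrow0} \operatorname{Var}_{[x-\eta,x[}(f) = 0$$
(224E). There is therefore an $\eta > 0$ such that
$$\max\bigl(\operatorname{Var}_{[x-\eta,x[}(f),\ \operatorname{Var}_{]x,x+\eta]}(f)\bigr) \le \epsilon.$$
Of course
$$|f(t) - f(u)| \le \operatorname{Var}_{[x-\eta,x[}(f) \le \epsilon$$
whenever $t, u \in \operatorname{dom} f$ and $x - \eta \le t \le u < x$, so we shall have
$$|f(t) - a| \le \epsilon \text{ for every } t \in \operatorname{dom} f \cap [x-\eta, x[,$$
and similarly
$$|f(t) - b| \le \epsilon \text{ whenever } t \in \operatorname{dom} f\ \cap\ ]x, x+\eta].$$
(c) Now set
$g_1(t) = f(t)$ when $t \in \operatorname{dom} f$ and $|x - t| > \eta$, 0 otherwise,
$g_2(t) = a$ when $x - \eta \le t < x$, $b$ when $x < t \le x + \eta$, 0 otherwise,
$g_3 = f - g_1 - g_2$.
Then $f = g_1 + g_2 + g_3$; each $g_j$ is integrable; $g_1$ is zero on a neighbourhood of $x$;
$$\sup_{t\in\operatorname{dom} g_3,\,t\ne x} |g_3(t)| \le \epsilon, \qquad \operatorname{Var}_{[x-\eta,x[}(g_3) \le \epsilon, \qquad \operatorname{Var}_{]x,x+\eta]}(g_3) \le \epsilon.$$
(d) Consider the three parts $g_1$, $g_2$, $g_3$ separately.
(i) For the first, we have
$$\lim_{\gamma\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy = 0$$
by 283I.
(ii) Next,
$$\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy = \frac1\pi \int_{-\infty}^{\infty} \frac{\sin\gamma(x-t)}{x-t}\,g_2(t)\,dt$$
(by 283H)
$$= \frac{a}{\pi} \int_{x-\eta}^{x} \frac{\sin\gamma(x-t)}{x-t}\,dt + \frac{b}{\pi} \int_{x}^{x+\eta} \frac{\sin\gamma(x-t)}{x-t}\,dt = \frac{a}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du + \frac{b}{\pi} \int_0^{\gamma\eta} \frac{\sin u}{u}\,du$$
(substituting $t = x - \frac1\gamma u$ in the first integral, $t = x + \frac1\gamma u$ in the second)
$$\to \frac{a+b}{2} \text{ as } \gamma \to \infty$$
by 283Da.
(iii) As for the third, we have, for any $\gamma > 0$,
$$\Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy\Bigr| = \Bigl|\frac1\pi \int_{-\infty}^{\infty} \frac{\sin\gamma(x-t)}{x-t}\,g_3(t)\,dt\Bigr| = \frac1\pi \Bigl|\int_{-\eta}^{\eta} \frac{\sin\gamma t}{t}\,g_3(x-t)\,dt\Bigr|$$
$$\le \frac1\pi \Bigl|\int_{-\eta}^{0} \frac{\sin\gamma t}{t}\,g_3(x-t)\,dt\Bigr| + \frac1\pi \Bigl|\int_0^{\eta} \frac{\sin\gamma t}{t}\,g_3(x-t)\,dt\Bigr|$$
$$\le \frac{K}{\pi} \Bigl(\sup_{t\in\operatorname{dom} g_3\,\cap\,]x-\eta,x[} |g_3(t)| + \operatorname{Var}_{]x-\eta,x[}(g_3) + \sup_{t\in\operatorname{dom} g_3\,\cap\,]x,x+\eta[} |g_3(t)| + \operatorname{Var}_{]x,x+\eta[}(g_3)\Bigr) \le \frac{4K\epsilon}{\pi},$$
using 224J to bound the integrals in terms of the variation and supremum of $g_3$ and integrals of $\frac{\sin\gamma t}{t}$ over subintervals.
(e) We therefore have
$$\limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2}\Bigr| \le \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_1(y)\,dy\Bigr|$$
$$+ \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_2(y)\,dy - \frac{a+b}{2}\Bigr| + \limsup_{\gamma\to\infty} \Bigl|\frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat g_3(y)\,dy\Bigr| \le 0 + 0 + \frac{4K\epsilon}{\pi}$$
by the calculations in (d). As $\epsilon$ is arbitrary,
$$\lim_{\gamma\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy - \frac{a+b}{2} = 0.$$
(f) This is the first half of the theorem. But of course the second half follows at once, because
$$\frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \check f(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{-ixy} \hat f(-y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{\gamma\to\infty} \int_{-\gamma}^{\gamma} e^{ixy} \hat f(y)\,dy = \frac{a+b}{2}.$$
Remark You will see that this argument uses some of the same ideas as those in 282O-282P. It is more direct because
(i) I am not using any concept corresponding to Fejér sums (though a very suitable one is available; see 283Xf) (ii) I
do not trouble to give the result concerning uniform convergence of the Fourier integrals when $f$ is continuous and of
bounded variation (283Xj) (iii) I do not give any pointer to the significance of the fact that if $f$ is of bounded variation
then $\sup_{y\in\mathbb R} |y\hat f(y)| < \infty$ (283Xk).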
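The behaviour described by 283L at a jump can be seen in a small computation. The sketch below is an illustration added here (the choice of $f$ as the indicator function of $[0,1]$, the value $\gamma = 400$ and the function name are assumptions). By 283H, the truncated inversion integral of this $f$ at $x$ equals $\frac1\pi \int_0^1 \frac{\sin\gamma(x-t)}{x-t}\,dt$, and 283L predicts the limits $\frac12$ at the two endpoints, 1 inside the interval and 0 outside.

```python
import math


def partial_inverse_of_indicator(x, gamma, n=20000):
    """(1/pi) * int_0^1 sin(gamma*(x - t))/(x - t) dt by the midpoint rule.

    By 283H this is the truncated inversion integral
    (1/sqrt(2*pi)) * int_{-gamma}^{gamma} exp(i*x*y) f_hat(y) dy
    for f = indicator function of [0, 1].
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = x - (k + 0.5) * h
        # sin(gamma*u)/u has the removable value gamma at u = 0.
        total += gamma if u == 0.0 else math.sin(gamma * u) / u
    return total * h / math.pi


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
        print(x, partial_inverse_of_indicator(x, 400.0))
```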


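283M Corresponding to 282Q, we have the following.
Theorem Let $f$ and $g$ be complex-valued functions which are integrable over $\mathbb R$, and $f * g$ their convolution product,
defined by setting
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t)g(x-t)\,dt$$
whenever this is defined (255E). Then
$$(f * g)^{\wedge}(y) = \sqrt{2\pi}\,\hat f(y)\hat g(y), \qquad (f * g)^{\vee}(y) = \sqrt{2\pi}\,\check f(y)\check g(y)$$
for every $y \in \mathbb R$.
proof For any $y$,
$$(f * g)^{\wedge}(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx}(f * g)(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-iy(t+u)} f(t)g(u)\,dt\,du$$
(using 255G)
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyt} f(t)\,dt \int_{-\infty}^{\infty} e^{-iyu} g(u)\,du = \sqrt{2\pi}\,\hat f(y)\hat g(y).$$
Now, of course,
$$(f * g)^{\vee}(y) = (f * g)^{\wedge}(-y) = \sqrt{2\pi}\,\hat f(-y)\hat g(-y) = \sqrt{2\pi}\,\check f(y)\check g(y).$$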
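A quick numerical check of the factor $\sqrt{2\pi}$ in 283M, added here as an illustration: for $f(x) = e^{-|x|}$ the convolution $f * f$ has the closed form $(1+|x|)e^{-|x|}$ (an elementary calculation, verified numerically at one point below), and its transform should equal $\sqrt{2\pi}\,\hat f(y)^2$. The quadrature helper `integrate` and the sample points are assumptions of this sketch.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def integrate(g, lo=-30.0, hi=30.0, n=60000):
    """Midpoint rule on [lo, hi]; works for real or complex integrands."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h


def f(x):
    return math.exp(-abs(x))


def conv_ff(x):
    # Closed form of (f * f)(x) for f(x) = exp(-|x|).
    return (1.0 + abs(x)) * math.exp(-abs(x))


def transform(g, y):
    """Fourier transform in the normalization of 283A, by quadrature."""
    return integrate(lambda x: cmath.exp(-1j * y * x) * g(x)) / SQRT_2PI


x0, y0 = 0.7, 1.2
conv_numeric = integrate(lambda t: f(t) * f(x0 - t))   # (f * f)(x0) directly
lhs = transform(conv_ff, y0)                           # transform of f * f
rhs = SQRT_2PI * transform(f, y0) ** 2                 # sqrt(2*pi) * f_hat^2

if __name__ == "__main__":
    print(conv_numeric, conv_ff(x0))
    print(lhs, rhs)
```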
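283N I show how to compute a special Fourier transform, which will be used repeatedly in the next section.
Lemma For $\sigma > 0$, set $\psi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/2\sigma^2}$ for $x \in \mathbb R$. Then its Fourier transform and inverse Fourier transform are
$$\hat\psi_\sigma = \check\psi_\sigma = \frac1\sigma \psi_{1/\sigma}.$$
In particular, $\hat\psi_1 = \psi_1$.
proof (a) I begin with the special case $\sigma = 1$, using the Maclaurin series
$$e^{-iyx} = \sum_{k=0}^{\infty} \frac{(-iyx)^k}{k!}$$
and the expressions for $\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx$ from §263.
Fix $y \in \mathbb R$. Writing
$$g_k(x) = \frac{(-iyx)^k}{k!} e^{-x^2/2}, \qquad h_n(x) = \sum_{k=0}^{n} g_k(x), \qquad h(x) = e^{|yx| - x^2/2},$$
we see that
$$|g_k(x)| \le \frac{|yx|^k}{k!} e^{-x^2/2},$$
so that
$$|h_n(x)| \le \sum_{k=0}^{\infty} |g_k(x)| \le e^{|yx|} e^{-x^2/2} = h(x)$$
for every $n$; moreover, $h$ is integrable, because $|h(x)| \le e^{-|x|}$ whenever $|x| \ge 2(1+|y|)$. Consequently, using Lebesgue's
Dominated Convergence Theorem,
$$\hat\psi_1(y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{n\to\infty} h_n = \frac{1}{2\pi} \lim_{n\to\infty} \int_{-\infty}^{\infty} h_n = \frac{1}{2\pi} \sum_{k=0}^{\infty} \int_{-\infty}^{\infty} g_k = \frac{1}{2\pi} \sum_{k=0}^{\infty} \frac{(-iy)^k}{k!} \int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx$$
$$= \frac{1}{2\pi} \sum_{j=0}^{\infty} \frac{(-iy)^{2j}}{(2j)!}\,\frac{(2j)!}{2^j j!}\,\sqrt{2\pi}$$
(by 263H)
$$= \frac{1}{\sqrt{2\pi}} \sum_{j=0}^{\infty} \frac{(-y^2)^j}{2^j j!} = \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \psi_1(y),$$
as claimed.
(b) For the general case, $\psi_\sigma(x) = \frac1\sigma \psi_1(\frac{x}{\sigma})$, so that
$$\hat\psi_\sigma(y) = \frac1\sigma\cdot\sigma\hat\psi_1(\sigma y) = \psi_1(\sigma y) = \frac1\sigma \psi_{1/\sigma}(y)$$
by 283Ce. Of course we now have
$$\check\psi_\sigma(y) = \hat\psi_\sigma(-y) = \frac1\sigma \psi_{1/\sigma}(y)$$
because $\psi_{1/\sigma}$ is an even function.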
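The lemma can be confirmed numerically for a few values of $\sigma$ and $y$. This is a sketch added for illustration only; the truncation to $[-12\sigma, 12\sigma]$ and the function names are choices made here.

```python
import cmath
import math


def psi(sigma, x):
    """The Gaussian psi_sigma of 283N."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def psi_hat(sigma, y, n=8000):
    """Midpoint-rule estimate of the Fourier transform of psi_sigma at y."""
    lo, hi = -12.0 * sigma, 12.0 * sigma   # the tails beyond 12 sigma are negligible
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += cmath.exp(-1j * y * x) * psi(sigma, x)
    return total * h / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        for y in (0.0, 0.7, 2.0):
            # compare with (1/sigma) * psi_{1/sigma}(y)
            print(sigma, y, psi_hat(sigma, y), psi(1.0 / sigma, y) / sigma)
```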
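283O To lead into the ideas of the next section, I give the following very simple fact.
Proposition Let $f$ and $g$ be two complex-valued functions which are integrable over $\mathbb R$. Then $\int_{-\infty}^{\infty} \hat f \times g = \int_{-\infty}^{\infty} f \times \hat g$
and $\int_{-\infty}^{\infty} \check f \times g = \int_{-\infty}^{\infty} f \times \check g$.
proof Of course
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |e^{-ixy} f(x)g(y)|\,dx\,dy = \int_{-\infty}^{\infty} |f| \int_{-\infty}^{\infty} |g| < \infty,$$
so
$$\int_{-\infty}^{\infty} f \times \hat g = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(y) e^{-iyx} g(x)\,dx\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(y) e^{-ixy} g(x)\,dy\,dx = \int_{-\infty}^{\infty} \hat f \times g.$$
For the other half of the proposition, replace every $e^{-ixy}$ in the argument by $e^{ixy}$.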
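As a numerical illustration of 283O (a sketch added here, with all names chosen for the example), take $f(x) = e^{-|x|}$, whose transform is $2/(\sqrt{2\pi}(1+y^2))$, and $g = \psi_1$, which is its own transform by 283N. The two integrals $\int \hat f \times g$ and $\int f \times \hat g$ can then be computed from closed-form integrands, and both should equal $e^{1/2}\operatorname{erfc}(1/\sqrt2)$, the value obtained by completing the square in $\int e^{-|x|} \psi_1(x)\,dx$.

```python
import math


def integrate(g, lo=-40.0, hi=40.0, n=40000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h


def f(x):
    return math.exp(-abs(x))


def f_hat(y):
    # Transform of exp(-|x|) in the normalization of 283A.
    return 2.0 / (math.sqrt(2 * math.pi) * (1.0 + y * y))


def g(x):
    # psi_1, the standard Gaussian density.
    return math.exp(-x * x / 2.0) / math.sqrt(2 * math.pi)


g_hat = g  # by 283N, psi_1 is its own Fourier transform

left = integrate(lambda y: f_hat(y) * g(y))     # int f_hat x g
right = integrate(lambda x: f(x) * g_hat(x))    # int f x g_hat
closed = math.exp(0.5) * math.erfc(1.0 / math.sqrt(2.0))

if __name__ == "__main__":
    print(left, right, closed)
```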
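283W Higher dimensions I offer a series of exercises designed to provide hints on how the work of this section
may be done in the $r$-dimensional case, where $r \ge 1$.
(a) Let $f$ be an integrable complex-valued function defined almost everywhere in $\mathbb R^r$. Its Fourier transform is
the function $\hat f : \mathbb R^r \to \mathbb C$ defined by the formula
$$\hat f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{-iy.x} f(x)\,dx,$$
writing $y.x = \eta_1\xi_1 + \ldots + \eta_r\xi_r$ for $x = (\xi_1, \ldots, \xi_r)$ and $y = (\eta_1, \ldots, \eta_r) \in \mathbb R^r$, and $\int \ldots dx$ for integration with respect
to Lebesgue measure on $\mathbb R^r$. Similarly, the inverse Fourier transform of $f$ is the function $\check f$ given by
$$\check f(y) = \frac{1}{(\sqrt{2\pi})^r} \int e^{iy.x} f(x)\,dx = \hat f(-y).$$
Show that, for any integrable complex-valued function $f$ on $\mathbb R^r$,
(i) $\hat f : \mathbb R^r \to \mathbb C$ is continuous;
(ii) $\lim_{\|y\|\to\infty} \hat f(y) = 0$, writing $\|y\| = \sqrt{y.y}$ as usual;
(iii) if $\int \|x\||f(x)|\,dx < \infty$, then $\hat f$ is differentiable, and
$$\frac{\partial \hat f}{\partial \eta_j}(y) = -\frac{i}{(\sqrt{2\pi})^r} \int e^{-iy.x}\,\xi_j f(x)\,dx$$
for $j \le r$, $y \in \mathbb R^r$, always taking $\xi_j$ to be the $j$th coordinate of $x \in \mathbb R^r$;
(iv) if $j \le r$ and $\frac{\partial f}{\partial \xi_j}$ is defined everywhere and is integrable, and if $\lim_{\|x\|\to\infty} f(x) = 0$, then $(\frac{\partial f}{\partial \xi_j})^{\wedge}(y) = i\eta_j \hat f(y)$
for every $y \in \mathbb R^r$.
(b) Let $f$ be an integrable complex-valued function on $\mathbb R^r$, and $\hat f$ its Fourier transform. If $c \le d$ in $\mathbb R^r$, show that
$$\int_{[c,d]} f = \Bigl(\frac{i}{\sqrt{2\pi}}\Bigr)^r \lim_{\alpha_1,\ldots,\alpha_r\to\infty} \int_{[-a,a]} \prod_{j=1}^{r} \frac{e^{i\gamma_j\eta_j} - e^{i\delta_j\eta_j}}{\eta_j}\,\hat f(y)\,dy,$$
setting $a = (\alpha_1, \ldots)$, $c = (\gamma_1, \ldots)$, $d = (\delta_1, \ldots)$.
(c) Let $f$ be an integrable complex-valued function on $\mathbb R^r$, and $\hat f$ its Fourier transform. Show that if we write
$$B_\infty(0,a) = \{y : |\eta_j| \le a \text{ for every } j \le r\},$$
then
$$\frac{1}{(\sqrt{2\pi})^r} \int_{B_\infty(0,a)} e^{ix.y} \hat f(y)\,dy = \int \phi_a(t) f(x-t)\,dt$$
for every $a \ge 0$, where
$$\phi_a(t) = \frac{1}{\pi^r} \prod_{j=1}^{r} \frac{\sin a\tau_j}{\tau_j}$$
for $t = (\tau_1, \ldots, \tau_r) \in \mathbb R^r$.
(d) Let $f$ and $g$ be integrable complex-valued functions on $\mathbb R^r$. Show that $f * \check g = (\sqrt{2\pi})^r (\hat f \times g)^{\vee}$.
(e) For $\sigma > 0$, define $\psi_\sigma : \mathbb R^r \to \mathbb C$ by setting
$$\psi_\sigma(x) = \frac{1}{(\sigma\sqrt{2\pi})^r} e^{-x.x/2\sigma^2}$$
for every $x \in \mathbb R^r$. Show that $\hat\psi_\sigma = \check\psi_\sigma = \frac{1}{\sigma^r} \psi_{1/\sigma}$.
(f) Defining $\psi_\sigma$ as in (e), show that $\lim_{\sigma\to0} (f * \psi_\sigma)(x) = f(x)$ for every continuous integrable $f : \mathbb R^r \to \mathbb C$, $x \in \mathbb R^r$.
(g) Show that if $f : \mathbb R^r \to \mathbb C$ is continuous and integrable, and $\hat f$ also is integrable, then $f = (\hat f)^{\vee}$. (Hint: Show that
both are equal at every point to
$$\lim_{\sigma\to\infty} (\sigma\sqrt{2\pi})^r (\hat f \times \psi_\sigma)^{\vee} = \lim_{\sigma\to\infty} f * \psi_{1/\sigma}.)$$
(h) Show that $\int_{\mathbb R^r} \frac{1}{1+\|x\|^{r+1}}\,dx < \infty$.
(i) Show that if $f : \mathbb R^r \to \mathbb C$ can be partially differentiated $r+1$ times, and $f$ and all its partial derivatives
$\frac{\partial^k f}{\partial\xi_{j_1}\partial\xi_{j_2}\ldots\partial\xi_{j_k}}$ are integrable for $k \le r+1$, then $\hat f$ is integrable.
(j) Show that if $f$ and $g$ are integrable complex-valued functions on $\mathbb R^r$, then $(f * g)^{\wedge} = (\sqrt{2\pi})^r \hat f \times \hat g$.
(k) Show that if $f$ and $g$ are integrable complex-valued functions on $\mathbb R^r$, then $\int f \times \hat g = \int \hat f \times g$.
(l) Show that if $f_1, \ldots, f_r$ are integrable complex-valued functions on $\mathbb R$ with Fourier transforms $g_1, \ldots, g_r$, and we
write $f(x) = f_1(\xi_1) \ldots f_r(\xi_r)$ for $x = (\xi_1, \ldots, \xi_r) \in \mathbb R^r$, then the Fourier transform of $f$ is $y \mapsto g_1(\eta_1) \ldots g_r(\eta_r)$.
(m)(i) Show that $\int_{2k\pi}^{2(k+1)\pi} \frac{\sin t}{t\sqrt t}\,dt > 0$ for every $k \in \mathbb N$, and hence that $\int_0^{\infty} \frac{\sin t}{t\sqrt t}\,dt > 0$.
(ii) Set $f_1(\xi) = 1/\sqrt{|\xi|}$ for $0 < |\xi| \le 1$, 0 for other $\xi$. Show that $\lim_{a\to\infty} \frac{1}{\sqrt a} \int_{-a}^{a} \hat f_1(\eta)\,d\eta$ exists in $\mathbb R$ and is greater
than 0.
(iii) Construct an integrable function $f_2$, zero on some neighbourhood of 0, such that there are infinitely many
$m \in \mathbb N$ for which $|\int_{-m}^{m} \hat f_2(\eta)\,d\eta| \ge \frac{1}{\sqrt m}$. (Hint: take $f_2(\xi) = 2^{-k} \sin m_k\xi$ for $k+1 \le \xi < k+2$, for a sufficiently rapidly
increasing sequence $\langle m_k\rangle_{k\in\mathbb N}$.)
(iv) Set $f(x) = f_1(\xi_1) f_2(\xi_2)$ for $x \in \mathbb R^2$. Show that $f$ is integrable, that $f$ is zero in a neighbourhood of 0, but
that
$$\limsup_{a\to\infty} \frac{1}{2\pi} \Bigl|\int_{B_\infty(0,a)} \hat f(y)\,dy\Bigr| > 0,$$
defining $B_\infty$ as in (c).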
283X Basic exercises (a) Confirm that the six alternative definitions of the transforms $\hat f$, $\check f$ offered in 283B all
lead to the same theory; find the constants involved in the new versions of 283Ch, 283Ci, 283L, 283M and 283N.
(b) If we redefined $\hat f(y)$ to be $\alpha \int_{-\infty}^{\infty} e^{i\beta xy} f(x)\,dx$, what would $\check f(y)$ be?
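(c) Show that nearly every $2\pi$ would disappear from the theorems of this section if we defined a measure $\nu$ on $\mathbb R$ by
saying that $\nu E = \frac{1}{\sqrt{2\pi}} \mu E$ for every Lebesgue measurable set $E$, where $\mu$ is Lebesgue measure, and wrote
$$\hat f(y) = \int_{-\infty}^{\infty} e^{-iyx} f(x)\,\nu(dx), \qquad \check f(y) = \int_{-\infty}^{\infty} e^{iyx} f(x)\,\nu(dx),$$
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t)g(x-t)\,\nu(dt).$$
What is $\lim_{a\to\infty} \int_{-a}^{a} \frac{\sin t}{t}\,\nu(dt)$?
>>>(d) Let $f$ be an integrable complex-valued function on $\mathbb R$, with Fourier transform $\hat f$. Show that (i) if $g(x) = f(-x)$
whenever this is defined, then $\hat g(y) = \hat f(-y)$ for every $y \in \mathbb R$; (ii) if $g(x) = \overline{f(x)}$ whenever this is defined, then
$\hat g(y) = \overline{\hat f(-y)}$ for every $y$.
(e) Let $f$ be an integrable complex-valued function on $\mathbb R$, with Fourier transform $\hat f$. Show that
$$\int_c^d \hat f(y)\,dy = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-idx} - e^{-icx}}{x}\,f(x)\,dx$$
whenever $c \le d$ in $\mathbb R$.
>>>(f) For an integrable complex-valued function $f$ on $\mathbb R$, let its Fejér integrals be
$$\sigma_c(x) = \frac{1}{c\sqrt{2\pi}} \int_0^c \Bigl(\int_{-a}^{a} e^{ixy} \hat f(y)\,dy\Bigr)\,da$$
for $c > 0$. Show that
$$\sigma_c(x) = \frac1\pi \int_{-\infty}^{\infty} \frac{1 - \cos ct}{ct^2}\,f(x-t)\,dt.$$
(g) Show that $\int_{-\infty}^{\infty} \frac{1 - \cos at}{at^2}\,dt = \pi$ for every $a > 0$. (Hint: integrate by parts and use 283Da.) Show that
$$\lim_{a\to\infty} \int_{\delta}^{\infty} \frac{1 - \cos at}{at^2}\,dt = \lim_{a\to\infty} \sup_{t\ge\delta} \frac{1 - \cos at}{at^2} = 0$$
for every $\delta > 0$.
(h) Let $f$ be an integrable complex-valued function on $\mathbb R$, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show
that if $x \in \mathbb R$, $c \in \mathbb C$ are such that
$$\lim_{\delta\downarrow0} \frac1\delta \int_0^{\delta} |f(x+t) + f(x-t) - 2c|\,dt = 0,$$
then $\lim_{a\to\infty} \sigma_a(x) = c$. (Hint: adapt the argument of 282H.)
>>>(i) Let $f$ be an integrable complex-valued function on $\mathbb R$, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show
that $f(x) = \lim_{a\to\infty} \sigma_a(x)$ for almost every $x \in \mathbb R$.
(j) Let $f : \mathbb R \to \mathbb C$ be a continuous integrable complex-valued function of bounded variation, and define its Fejér
integrals $\sigma_a$ as in 283Xf above. Show that $f(x) = \lim_{a\to\infty} \sigma_a(x)$ uniformly for $x \in \mathbb R$.
>>>(k) Let $f$ be an integrable complex-valued function of bounded variation on $\mathbb R$, and $\hat f$ its Fourier transform. Show
that $\sup_{y\in\mathbb R} |y\hat f(y)| < \infty$.
(l) Let $f$ and $g$ be integrable complex-valued functions on $\mathbb R$. Show that $f * \check g = \sqrt{2\pi}\,(\hat f \times g)^{\vee}$.
(m) Let $f$ be an integrable complex-valued function on $\mathbb R$, and fix $x \in \mathbb R$. Set
$$\tilde f_x(y) = \int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt$$
for $y \in \mathbb R$. Show that
(i) if $f$ is differentiable at $x$,
$$f(x) = \frac1\pi \lim_{a\to\infty} \int_0^a \tilde f_x(y)\,dy;$$
(ii) if there is a neighbourhood of $x$ in which $f$ has bounded variation, then
$$\frac1\pi \lim_{a\to\infty} \int_0^a \tilde f_x(y)\,dy = \frac12\bigl(\lim_{t\in\operatorname{dom} f,\,t\uparrow x} f(t) + \lim_{t\in\operatorname{dom} f,\,t\downarrow x} f(t)\bigr);$$
(iii) if $f$ is twice differentiable and $f'$, $f''$ are integrable then $\tilde f_x$ is integrable and $f(x) = \frac1\pi \int_0^{\infty} \tilde f_x$. (The formula
$$f(x) = \frac1\pi \int_0^{\infty} \Bigl(\int_{-\infty}^{\infty} f(t) \cos y(x-t)\,dt\Bigr)\,dy,$$
valid for such functions $f$, is called Fourier's integral formula.)
(n) Show that if $f$ is a complex-valued function of bounded variation, defined almost everywhere in $\mathbb R$, and converging
to 0 at $\pm\infty$, then
$$g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx$$
is defined in $\mathbb C$ for every $y \ne 0$, and that the limit is uniform in any region bounded away from 0.
(o) Let $f$ be an integrable complex-valued function on $\mathbb R$. Set
$$\hat f_c(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos yx\,f(x)\,dx, \qquad \hat f_s(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sin yx\,f(x)\,dx$$
for $y \in \mathbb R$. Show that
$$\frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = \sqrt{\frac2\pi} \int_0^a \cos xy\,\hat f_c(y)\,dy + \sqrt{\frac2\pi} \int_0^a \sin xy\,\hat f_s(y)\,dy$$
for every $x \in \mathbb R$, $a \ge 0$.
(p) Use the fact that $\int_0^a \int_0^{\infty} e^{-xy} \sin y\,dx\,dy = \int_0^{\infty} \int_0^a e^{-xy} \sin y\,dy\,dx$ whenever $a \ge 0$ to show that $\lim_{a\to\infty} \int_0^a \frac{\sin y}{y}\,dy = \int_0^{\infty} \frac{1}{1+x^2}\,dx$.
(q) Let $f : \mathbb R \to \mathbb C$ be an integrable function which is absolutely continuous on every bounded interval, and suppose
that its derivative $f'$ is of bounded variation on $\mathbb R$. Show that $\hat f$ is integrable and that $f = (\hat f)^{\vee}$. (Hint: 283Ci, 283Xk.)
>>>(r) Show that if $f(x) = e^{-\sigma|x|}$, where $\sigma > 0$, then $\hat f(y) = \frac{2\sigma}{\sqrt{2\pi}(\sigma^2 + y^2)}$. Hence, or otherwise, find the Fourier
transform of $y \mapsto \frac{1}{1+y^2}$.
(s) Find the inverse Fourier transform of the characteristic function of a bounded interval in $\mathbb R$. Show that in a
formal sense 283F can be regarded as a special case of 283O.
(t) Let $f$ be a non-negative integrable function on $\mathbb R$, with Fourier transform $\hat f$. Show that
$$\sum_{j=0}^{n} \sum_{k=0}^{n} a_j \bar a_k \hat f(y_j - y_k) \ge 0$$
whenever $y_0, \ldots, y_n$ in $\mathbb R$ and $a_0, \ldots, a_n \in \mathbb C$.
(u) Let $f$ be an integrable complex-valued function on $\mathbb R$. Show that $\tilde f(x) = \sum_{n=-\infty}^{\infty} f(x + 2\pi n)$ is defined in $\mathbb C$ for
almost every $x$. (Hint: $\sum_{n=-\infty}^{\infty} \int_{-\pi}^{\pi} |f(x + 2\pi n)|\,dx < \infty$.) Show that $\tilde f$ is periodic. Show that the Fourier coefficients
of $\tilde f \restriction\ ]-\pi,\pi]$ are $\langle \frac{1}{\sqrt{2\pi}} \hat f(k)\rangle_{k\in\mathbb Z}$.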
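283Y Further exercises (a) Show that if $f : \mathbb R \to \mathbb C$ is absolutely continuous in every bounded interval, $f'$ is of
bounded variation on $\mathbb R$, and $\lim_{x\to\infty} f(x) = \lim_{x\to-\infty} f(x) = 0$, then
$$g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx = -\frac{i}{y\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f'(x)\,dx$$
is defined, with
$$y^2|g(y)| \le \frac{4}{\sqrt{2\pi}} \operatorname{Var}_{\mathbb R}(f'),$$
for every $y \ne 0$.
(b) Let $f : \mathbb R \to [0,\infty[$ be an even function such that $f$ is convex on $[0,\infty[$ and $\lim_{x\to\infty} f(x) = 0$.
(i) Show that, for any $y > 0$, $k \in \mathbb N$, $\int_{-2k\pi/y}^{2k\pi/y} e^{-iyx} f(x)\,dx \ge 0$.
(ii) Show that $g(y) = \frac{1}{\sqrt{2\pi}} \lim_{a\to\infty} \int_{-a}^{a} e^{-iyx} f(x)\,dx$ exists in $[0,\infty[$ for every $y \ne 0$.
(iii) For $n \in \mathbb N$, set $f_n(x) = e^{-|x|/(n+1)} f(x)$ for every $x$. Show that $f_n$ is integrable and convex on $[0,\infty[$.
(iv) Show that $g(y) = \lim_{n\to\infty} \hat f_n(y)$ for every $y \ne 0$.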
(vi) Show that if f is integrable then
_
a
a

f =
4

2
_

0
sin at
t
f(t)dt
4a

2
_
/a
0
f 2

2f(1)
for every a 0. Hence show that whether f is integrable or not, g is integrable and f
n
= (

f
n
)

for every n.
(vii) Show that lim
a0
sup
nN
_
a
a

f
n
= 0.
(viii) Show that if f

is bounded (on its domain) then

f
n
: n N is uniformly integrable (hint: use (vii) and
283Ya), so that lim
n
|

f
n
g|
1
= 0 and f =

g.
(ix) Show that if f

is unbounded then for every > 0 we can nd h


1
, h
2
: R [0, [, both even, convex and
converging to 0 at , such that f = h
1
+h
2
, h

1
is bounded,
_
h
2
and h
2
(1) . Hence show that in this case also
f =

g.
(c) Suppose that $f : \mathbb R \to \mathbb R$ is even, twice differentiable and convergent to 0 at $\pm\infty$, that $f''$ is continuous and that
$\{x : f''(x) = 0\}$ is bounded in $\mathbb R$. Show that $f$ is the Fourier transform of an integrable function. (Hint: use 283Yb
and 283Xq.)
(d) Let $g : \mathbb R \to \mathbb R$ be an odd function of bounded variation such that $\int_1^{\infty} \frac1x g(x)\,dx = \infty$. Show that $g$ cannot be
the Fourier transform of any integrable function $f$. (Hint: show that if $g = \hat f$ then
$$\int_0^1 f = \frac{2i}{\sqrt{2\pi}} \lim_{a\to\infty} \int_0^a \frac{1 - \cos x}{x}\,g(x)\,dx = \infty.)$$
283 Notes and comments I have tried in this section to give the elementary theory of Fourier transforms of integrable
functions on $\mathbb R$, with an eye to the extension of the concept which will be attempted in the next section. Following
§282, I have given prominence to two theorems (283I and 283L) describing conditions for the inversion of the Fourier
transform to return to the original function; we find ourselves looking at improper integrals $\lim_{a\to\infty} \int_{-a}^{a}$, just as earlier
we needed to look at symmetric sums $\lim_{n\to\infty} \sum_{k=-n}^{n}$. I do not go quite so far as in §282, and in particular I leave
the study of square-integrable functions for the moment, since their Fourier transforms may not be describable by the
simple formulae used here.
One of the most fundamental obstacles in the subject is the lack of any effective criteria for determining which
functions are the Fourier transforms of integrable functions. (Happily, things are better for square-integrable functions;
see 284O-284P.) In 283Yb-283Yc I sketch an argument showing that 'ordinary' non-oscillating even functions which
converge to 0 at $\pm\infty$ are Fourier transforms of integrable functions. Strikingly, this is not true of odd functions; thus
$y \mapsto \frac{1}{\ln(e+y^2)}$ is the Fourier transform of an integrable function, but $y \mapsto \frac{\arctan y}{\ln(e+y^2)}$ is not (283Yd).
In 283W I sketch the corresponding theory of Fourier transforms in $\mathbb R^r$. There are few surprises. One point to note
is that where in the one-dimensional case we ask for a well-behaved second derivative, in the $r$-dimensional case we may
need to differentiate $r+1$ times (283Wi). Another is that we lose the 'localization principle'. In the one-dimensional
case, if $f$ is integrable and zero on an interval $]c,d[$, then $\lim_{a\to\infty} \int_{-a}^{a} e^{ixy} \hat f(y)\,dy = 0$ for every $x \in\ ]c,d[$; this is
immediate from either 283I or 283L. But in higher dimensions the most natural formulation of a corresponding result
is false (283Wm).
284 Fourier transforms II
The basic paradox of Fourier transforms is the fact that while for certain functions (see 283J-283K) we have $(\hat{f})^{\vee} = f$, "ordinary" integrable functions $f$ (for instance, the characteristic functions of non-trivial intervals) give rise to non-integrable Fourier transforms $\hat{f}$ for which there is no direct definition available for $(\hat{f})^{\vee}$, making it a puzzle to decide in what sense the formula $f = (\hat{f})^{\vee}$ might be true. What now seems by far the most natural resolution of the problem lies
in declaring the Fourier transform to be an operation on distributions rather than on functions. I shall not attempt
to describe this theory properly (almost any book on Distributions will cover the ground better than I can possibly
do here), but will try to convey the fundamental ideas, so far as they are relevant to the questions dealt with here, in
language which will make the transition to a fuller treatment straightforward. At the same time, these methods make
it easy to prove strong versions of the classical theorems concerning Fourier transforms.
284A Test functions: Definition Throughout this section, a rapidly decreasing test function or Schwartz function will be a function $h:\mathbb{R}\to\mathbb{C}$ such that $h$ is smooth, that is, differentiable everywhere any finite number of times, and moreover
$$\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)| < \infty$$
for all $k$, $m\in\mathbb{N}$, writing $h^{(m)}$ for the $m$th derivative of $h$.
284B The following elementary facts will be useful.
Lemma (a) If $g$ and $h$ are rapidly decreasing test functions, so are $g+h$ and $ch$, for any $c\in\mathbb{C}$.
(b) If $h$ is a rapidly decreasing test function and $y\in\mathbb{R}$, then $x\mapsto h(y-x)$ is a rapidly decreasing test function.
(c) If $h$ is any rapidly decreasing test function, then $h$ and $h^2$ are integrable.
(d) If $h$ is a rapidly decreasing test function, so is its derivative $h'$.
(e) If $h$ is a rapidly decreasing test function, so is the function $x\mapsto xh(x)$.
(f) For any $\epsilon > 0$, the function $x\mapsto e^{-\epsilon x^2}$ is a rapidly decreasing test function.
proof (a) is trivial.
(b) Write $g(x) = h(y-x)$ for $x\in\mathbb{R}$. Then $g^{(m)}(x) = (-1)^m h^{(m)}(y-x)$ for every $m$, so $g$ is smooth. For any $k\in\mathbb{N}$,
$$|x|^k \le 2^k\bigl(|y|^k + |y-x|^k\bigr)$$
for every $x$, so
$$\begin{aligned}
\sup_{x\in\mathbb{R}}|x|^k|g^{(m)}(x)| &= \sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(y-x)| \\
&\le 2^k|y|^k\sup_{x\in\mathbb{R}}|h^{(m)}(y-x)| + 2^k\sup_{x\in\mathbb{R}}|y-x|^k|h^{(m)}(y-x)| \\
&= 2^k|y|^k\sup_{x\in\mathbb{R}}|h^{(m)}(x)| + 2^k\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)| < \infty.
\end{aligned}$$
(c) Because
$$M = \sup_{x\in\mathbb{R}}\bigl(|h(x)| + x^2|h(x)|\bigr)$$
is finite, we have
$$\int|h| \le \int\frac{M}{1+x^2}\,dx < \infty.$$
Of course we now have $|h^2|\le M|h|$, so $h^2$ also is integrable.
(d) This is immediate from the definition, as every derivative of $h'$ is a derivative of $h$.
(e) Setting $g(x) = xh(x)$, $g^{(m)}(x) = xh^{(m)}(x) + mh^{(m-1)}(x)$ for $m\ge 1$, so
$$\sup_{x\in\mathbb{R}}|x^kg^{(m)}(x)| \le \sup_{x\in\mathbb{R}}|x^{k+1}h^{(m)}(x)| + m\sup_{x\in\mathbb{R}}|x^kh^{(m-1)}(x)|$$
is finite, for all $k\in\mathbb{N}$, $m\ge 1$.
(f) If $h(x) = e^{-\epsilon x^2}$, then for each $m\in\mathbb{N}$ we have $h^{(m)}(x) = p_m(x)h(x)$, where $p_0(x) = 1$ and $p_{m+1}(x) = p_m'(x) - 2\epsilon xp_m(x)$, so that $p_m$ is a polynomial. Because $e^{\epsilon x^2}\ge\epsilon^{k+1}x^{2k+2}/(k+1)!$ for all $x$, $k\ge 0$,
$$\lim_{|x|\to\infty}|x|^kh(x) = \lim_{x\to\infty}x^k/e^{\epsilon x^2} = 0$$
for every $k$, and $\lim_{|x|\to\infty}p(x)h(x) = 0$ for every polynomial $p$; consequently
$$\lim_{|x|\to\infty}x^kh^{(m)}(x) = \lim_{|x|\to\infty}x^kp_m(x)h(x) = 0$$
for all $k$, $m$, and $h$ is a rapidly decreasing test function.
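The recurrence in part (f) is easy to run mechanically. The following is a minimal sketch, not part of the original text: the names `next_poly`, `gaussian_derivative_polys`, `evaluate` and `weighted_derivative` are illustrative choices, and polynomials are stored as coefficient lists with the constant term first. It builds the polynomials $p_m$ for $h(x) = e^{-\epsilon x^2}$ and evaluates $x^kh^{(m)}(x)$, so that the decay claimed in the proof can be observed numerically.

```python
from math import exp


def next_poly(p, eps):
    """Coefficients of p_{m+1}(x) = p_m'(x) - 2*eps*x*p_m(x), lowest degree first."""
    deriv = [i * c for i, c in enumerate(p)][1:]       # coefficients of p_m'
    shifted = [0.0] + [-2.0 * eps * c for c in p]      # coefficients of -2*eps*x*p_m
    out = [0.0] * max(len(deriv), len(shifted))
    for i, c in enumerate(deriv):
        out[i] += c
    for i, c in enumerate(shifted):
        out[i] += c
    return out


def gaussian_derivative_polys(m_max, eps=1.0):
    """Return [p_0, ..., p_{m_max}] with h^(m)(x) = p_m(x) * exp(-eps*x^2)."""
    polys = [[1.0]]
    for _ in range(m_max):
        polys.append(next_poly(polys[-1], eps))
    return polys


def evaluate(p, x):
    """Evaluate the polynomial with coefficient list p at x."""
    return sum(c * x ** i for i, c in enumerate(p))


def weighted_derivative(k, m, x, eps=1.0):
    """x^k * h^(m)(x) for h(x) = exp(-eps*x^2)."""
    p = gaussian_derivative_polys(m, eps)[m]
    return x ** k * evaluate(p, x) * exp(-eps * x * x)
```

For instance, with $\epsilon = 1$ the sketch gives $p_1(x) = -2x$ and $p_2(x) = 4x^2 - 2$, and `weighted_derivative(5, 3, 10.0)` is already negligibly small, in line with the limit computed above.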
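284C Proposition Let $h:\mathbb{R}\to\mathbb{C}$ be a rapidly decreasing test function. Then $\hat{h}:\mathbb{R}\to\mathbb{C}$ and $\check{h}:\mathbb{R}\to\mathbb{C}$ are rapidly decreasing test functions, and $(\hat{h})^{\vee} = (\check{h})^{\wedge} = h$.
proof (a) Let $k$, $m\in\mathbb{N}$. Then $\sup_{x\in\mathbb{R}}(|x|^m + |x|^{m+2})|h^{(k)}(x)| < \infty$ and $\int_{-\infty}^{\infty}|x^mh^{(k)}(x)|\,dx < \infty$. We may therefore use 283Ch-283Ci to see that $y\mapsto i^{k+m}y^k\hat{h}^{(m)}(y)$ is the Fourier transform of $x\mapsto x^mh^{(k)}(x)$, and therefore that $\lim_{|y|\to\infty}y^k\hat{h}^{(m)}(y) = 0$, by 283Cg, so that (because $\hat{h}^{(m)}$ is continuous) $\sup_{y\in\mathbb{R}}|y^k\hat{h}^{(m)}(y)|$ is finite. As $k$ and $m$ are arbitrary, $\hat{h}$ is a rapidly decreasing test function.
(b) Since $\check{h}(y) = \hat{h}(-y)$ for every $y$, it follows at once that $\check{h}$ is a rapidly decreasing test function.
(c) By 283J, it follows from (a) and (b) that $(\hat{h})^{\vee} = (\check{h})^{\wedge} = h$.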
284D Definition I will use the phrase tempered function on $\mathbb{R}$ to mean a measurable complex-valued function $f$, defined almost everywhere in $\mathbb{R}$, such that
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx < \infty$$
for some $k\in\mathbb{N}$.
284E As in 284B I spell out some elementary facts.
Lemma (a) If $f$ and $g$ are tempered functions, so are $|f|$, $f+g$ and $cf$, for any $c\in\mathbb{C}$.
(b) If $f$ is a tempered function then it is integrable over any bounded interval.
(c) If $f$ is a tempered function and $x\in\mathbb{R}$, then $t\mapsto f(x+t)$ and $t\mapsto f(x-t)$ are both tempered functions.
proof (a) is elementary; if
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^j}|f(x)|\,dx < \infty,\qquad \int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|g(x)|\,dx < \infty,$$
then
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^{j+k}}|(f+g)(x)|\,dx < \infty$$
because
$$1+|x|^{j+k} \ge \max(1,|x|^{j+k}) \ge \max(1,|x|^j,|x|^k) \ge \tfrac{1}{2}\max(1+|x|^j,\,1+|x|^k)$$
for all $x$.
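(b) If
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx = M < \infty,$$
then for any $a\le b$ we have $1+|x|^k\le 1+|a|^k+|b|^k$ for every $x\in[a,b]$, so
$$\int_a^b|f| \le M\bigl(1+|a|^k+|b|^k\bigr) < \infty.$$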
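(c) The idea is the same as in 284Bb. If $k\in\mathbb{N}$ is such that
$$\int_{-\infty}^{\infty}\frac{1}{1+|t|^k}|f(t)|\,dt = M < \infty,$$
then we have
$$1+|x+t|^k \le 2^k(1+|x|^k)(1+|t|^k)$$
so that
$$\frac{1}{1+|t|^k} \le 2^k(1+|x|^k)\frac{1}{1+|x+t|^k}$$
for every $t$, and
$$\int_{-\infty}^{\infty}\frac{|f(x+t)|}{1+|t|^k}\,dt \le 2^k(1+|x|^k)\int_{-\infty}^{\infty}\frac{|f(x+t)|}{1+|x+t|^k}\,dt \le 2^k(1+|x|^k)M < \infty.$$
Similarly,
$$\int_{-\infty}^{\infty}\frac{|f(x-t)|}{1+|t|^k}\,dt \le 2^k(1+|x|^k)M < \infty.$$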
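The whole of part (c) rests on the elementary inequality $1+|x+t|^k\le 2^k(1+|x|^k)(1+|t|^k)$. As a sanity check only, and not as a substitute for the one-line proof, the sketch below tests it on a grid of integer points, where Python's integer arithmetic is exact. The function names and the grid size are illustrative assumptions.

```python
def shift_bound_holds(x, t, k):
    """Check 1 + |x+t|^k <= 2^k * (1 + |x|^k) * (1 + |t|^k) for one triple."""
    lhs = 1 + abs(x + t) ** k
    rhs = 2 ** k * (1 + abs(x) ** k) * (1 + abs(t) ** k)
    return lhs <= rhs


def check_grid(k_max=6, radius=8):
    """Test the inequality for all integer x, t in [-radius, radius] and 0 <= k <= k_max."""
    points = range(-radius, radius + 1)
    return all(
        shift_bound_holds(x, t, k)
        for k in range(k_max + 1)
        for x in points
        for t in points
    )
```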
284F Linking the two concepts, we have the following.
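Lemma Let $f$ be a tempered function on $\mathbb{R}$ and $h$ a rapidly decreasing test function. Then $f\times h$ is integrable.
proof Of course $f\times h$ is measurable. Let $k\in\mathbb{N}$ be such that $\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx < \infty$. There is an $M$ such that $(1+|x|^k)|h(x)|\le M$ for every $x\in\mathbb{R}$, so that
$$\int_{-\infty}^{\infty}|f\times h| \le M\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}|f(x)|\,dx < \infty.$$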
284G Lemma Suppose that $f_1$ and $f_2$ are tempered functions and that $\int f_1\times h = \int f_2\times h$ for every rapidly decreasing test function $h$. Then $f_1 =_{\text{a.e.}} f_2$.
proof (a) Set $g = f_1 - f_2$; then $\int g\times h = 0$ for every rapidly decreasing test function $h$. Of course $g$ is a tempered function, so is integrable over any bounded interval. By 222D, it will be enough if I can show that $\int_a^b g = 0$ whenever $a<b$, since then we shall have $g = 0$ a.e. on every bounded interval and $f_1 =_{\text{a.e.}} f_2$.
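(b) Consider the function $\phi_0(x) = e^{-1/x}$ for $x>0$. Then $\phi_0$ is differentiable arbitrarily often everywhere in $]0,\infty[$, $0<\phi_0(x)<1$ for every $x>0$, and $\lim_{x\to\infty}\phi_0(x) = 1$. Moreover, writing $\phi_0^{(m)}$ for the $m$th derivative of $\phi_0$,
$$\lim_{x\downarrow 0}\phi_0^{(m)}(x) = \lim_{x\downarrow 0}\frac{1}{x}\phi_0^{(m)}(x) = 0$$
for every $m\in\mathbb{N}$. $\mathbf{P}$ (Compare 284Bf.) We have $\phi_0^{(m)}(x) = p_m(\frac{1}{x})\phi_0(x)$, where $p_0(t) = 1$ and $p_{m+1}(t) = t^2(p_m(t) - p_m'(t))$, so that $p_m$ is a polynomial for each $m\in\mathbb{N}$. Now for any $k\in\mathbb{N}$,
$$\lim_{t\to\infty}t^ke^{-t} \le \lim_{t\to\infty}\frac{(k+1)!\,t^k}{t^{k+1}} = 0,$$
so
$$\lim_{x\downarrow 0}\phi_0^{(m)}(x) = \lim_{t\to\infty}p_m(t)e^{-t} = 0,$$
$$\lim_{x\downarrow 0}\frac{1}{x}\phi_0^{(m)}(x) = \lim_{t\to\infty}tp_m(t)e^{-t} = 0.\ \mathbf{Q}$$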
(c) Consequently, setting $\phi(x) = 0$ for $x\le 0$, $e^{-1/x}$ for $x>0$, $\phi$ is smooth, with $m$th derivative
$$\phi^{(m)}(x) = 0\ \text{for } x\le 0,\qquad \phi^{(m)}(x) = \phi_0^{(m)}(x)\ \text{for } x>0.$$
(The proof is an easy induction on $m$.) Also $0\le\phi(x)\le 1$ for every $x\in\mathbb{R}$, and $\lim_{x\to\infty}\phi(x) = 1$.
(d) Now take any $a<b$, and for $n\in\mathbb{N}$ set
$$\phi_n(x) = \phi(n(x-a))\,\phi(n(b-x)).$$
Then $\phi_n$ will be smooth and $\phi_n(x) = 0$ if $x\notin\,]a,b[$, so surely $\phi_n$ is a rapidly decreasing test function, and
$$\int_{-\infty}^{\infty}g\times\phi_n = 0.$$
Next, $0\le\phi_n(x)\le 1$ for every $x$, $n$, and if $a<x<b$ then $\lim_{n\to\infty}\phi_n(x) = 1$. So
$$\int_a^b g = \int g\times\chi(]a,b[) = \int g\times(\lim_{n\to\infty}\phi_n) = \lim_{n\to\infty}\int g\times\phi_n = 0,$$
using Lebesgue's Dominated Convergence Theorem. As $a$ and $b$ are arbitrary, $g = 0$ a.e., as required.
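The cut-off functions used in parts (c) and (d) can be written down directly. The sketch below is an illustration only: the Python names `phi` and `phi_n` are chosen to mirror the functions $x\mapsto e^{-1/x}$ (extended by 0) and $x\mapsto\phi(n(x-a))\phi(n(b-x))$ of the proof. It shows the two properties the argument relies on: each $\phi_n$ vanishes outside $]a,b[$, and $\phi_n(x)$ increases to 1 at every interior point.

```python
from math import exp


def phi(x):
    """The smooth function equal to exp(-1/x) for x > 0 and to 0 for x <= 0."""
    return exp(-1.0 / x) if x > 0 else 0.0


def phi_n(x, n, a, b):
    """phi(n(x - a)) * phi(n(b - x)): zero outside ]a, b[, tending to 1 inside as n grows."""
    return phi(n * (x - a)) * phi(n * (b - x))
```

For example, on $]0,1[$ the values `phi_n(0.5, n, 0.0, 1.0)` increase towards 1 as `n` grows, while `phi_n(-0.1, n, 0.0, 1.0)` and `phi_n(1.0, n, 0.0, 1.0)` are 0 for every `n`.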
284H Definition Let $f$ and $g$ be tempered functions in the sense of 284D. Then I will say that $g$ represents the Fourier transform of $f$ if
$$\int_{-\infty}^{\infty}g\times h = \int_{-\infty}^{\infty}f\times\hat{h}$$
for every rapidly decreasing test function $h$.
284I Remarks (a) As usual, when shifting definitions in this way, we have some checking to do. If $f$ is an integrable complex-valued function on $\mathbb{R}$, $\hat{f}$ its Fourier transform, then surely $\hat{f}$ is a tempered function, being a bounded continuous function; and if $h$ is any rapidly decreasing test function, then $\int\hat{f}\times h = \int f\times\hat{h}$ by 283O. Thus $\hat{f}$ "represents the Fourier transform of $f$" in the sense of 284H above.
(b) Note also that 284G assures us that if $g_1$, $g_2$ are two tempered functions both representing the Fourier transform of $f$, then $g_1 =_{\text{a.e.}} g_2$, since we must have
$$\int g_1\times h = \int f\times\hat{h} = \int g_2\times h$$
for every rapidly decreasing test function $h$.
(c) Of course the value of this indirect approach is that we can assign Fourier transforms, in a sense, to many more functions. But we must note at once that if $g$ represents the Fourier transform of $f$ then so will any function equal almost everywhere to $g$; we can no longer expect to be able to speak of "the" Fourier transform of $f$ as a function. We could say that "the" Fourier transform of $f$ is a functional $\phi$ on the space of rapidly decreasing test functions, defined by setting $\phi(h) = \int f\times\hat{h}$; alternatively, we could say that "the" Fourier transform of $f$ is a member of $L^0_{\mathbb{C}}$, the space of equivalence classes of almost-everywhere-defined measurable functions (241J).
(d) It is now natural to say that $g$ represents the inverse Fourier transform of $f$ just when $f$ represents the Fourier transform of $g$; that is, when $\int f\times h = \int g\times\hat{h}$ for every rapidly decreasing test function $h$. Because $(\hat{h})^{\vee} = (\check{h})^{\wedge} = h$ for every such $h$, this is the same thing as saying that $\int f\times\check{h} = \int g\times h$ for every rapidly decreasing test function $h$, which is the other natural expression of what it might mean to say that $g$ represents the inverse Fourier transform of $f$.
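(e) If $f$, $g$ are tempered functions and we write $\overset{\leftrightarrow}{g}(x) = g(-x)$ whenever this is defined, then $\overset{\leftrightarrow}{g}$ will also be a tempered function, and we shall always have
$$\int\overset{\leftrightarrow}{g}\times\hat{h} = \int g(-x)\hat{h}(x)\,dx = \int g(x)\hat{h}(-x)\,dx = \int g\times\check{h},$$
so that
$g$ represents the Fourier transform of $f$
$\iff \int g\times h = \int f\times\hat{h}$ for every test function $h$
$\iff \int g\times\check{h} = \int f\times(\check{h})^{\wedge}$ for every $h$
$\iff \int\overset{\leftrightarrow}{g}\times\hat{h} = \int f\times h$ for every $h$
$\iff \overset{\leftrightarrow}{g}$ represents the inverse Fourier transform of $f$.
Combining this with (d), we get
$g$ represents the Fourier transform of $f$
$\iff f = \overset{\leftrightarrow}{\overset{\leftrightarrow}{f}}$ represents the inverse Fourier transform of $g$
$\iff \overset{\leftrightarrow}{f}$ represents the Fourier transform of $g$.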
(f) Yet again, I ought to spell out the check: if $f$ is integrable and $\check{f}$ is its inverse Fourier transform as defined in 283Ab, then
$$\int\check{f}\times\hat{h} = \int f\times(\hat{h})^{\vee} = \int f\times h$$
for every rapidly decreasing test function $h$, so $\check{f}$ "represents the inverse Fourier transform of $f$" in the sense given here.
284J Lemma Let $f$ be any tempered function and $h$ a rapidly decreasing test function. Then $f*h$, defined by the formula
$$(f*h)(y) = \int_{-\infty}^{\infty}f(t)h(y-t)\,dt,$$
is defined everywhere.
proof Take any $y\in\mathbb{R}$. By 284Bb, $t\mapsto h(y-t)$ is a rapidly decreasing test function, so the integral is always defined in $\mathbb{C}$, by 284F.
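284K Proposition Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$, and $h$ a rapidly decreasing test function.
(a) The Fourier transform of the integrable function $f\times h$ is $\frac{1}{\sqrt{2\pi}}\,g*\hat{h}$, where $g*\hat{h}$ is the convolution of $g$ and $\hat{h}$.
(b) The Fourier transform of the continuous function $f*h$ is represented by the product $\sqrt{2\pi}\,g\times\hat{h}$.
proof (a) Of course $f\times h$ is integrable, by 284F, while $g*\hat{h}$ is defined everywhere, by 284C and 284J.
Fix $y\in\mathbb{R}$. Set $\phi(x) = \hat{h}(y-x)$ for $x\in\mathbb{R}$; then $\phi$ is a rapidly decreasing test function because $\hat{h}$ is (284Bb). Now
$$\begin{aligned}
\hat{\phi}(t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-itx}\hat{h}(y-x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-it(y-x)}\hat{h}(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}}e^{-ity}\int_{-\infty}^{\infty}e^{itx}\hat{h}(x)\,dx = e^{-ity}(\hat{h})^{\vee}(t) = e^{-ity}h(t),
\end{aligned}$$
using 284C. Accordingly
$$\begin{aligned}
(f\times h)^{\wedge}(y) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ity}f(t)h(t)\,dt \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\hat{\phi}(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\phi(t)\,dt
\end{aligned}$$
(because $g$ represents the Fourier transform of $f$)
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\hat{h}(y-t)\,dt = \frac{1}{\sqrt{2\pi}}(g*\hat{h})(y).$$
As $y$ is arbitrary, $\frac{1}{\sqrt{2\pi}}\,g*\hat{h}$ is the Fourier transform of $f\times h$.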
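(b) Write $\psi$ for the Fourier transform of $g\times\hat{h}$, $\overset{\leftrightarrow}{f}(x) = f(-x)$ when this is defined, and $\overset{\leftrightarrow}{h}(x) = h(-x)$ for every $x$, so that $\overset{\leftrightarrow}{f}$ represents the Fourier transform of $g$, by 284Ie, and $\overset{\leftrightarrow}{h}$ is the Fourier transform of $\hat{h}$. By (a), we have $\psi = \frac{1}{\sqrt{2\pi}}\,\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h}$. This means that the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat{h}$ must be $\sqrt{2\pi}\,\overset{\leftrightarrow}{\psi} = (\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})^{\leftrightarrow}$; and as
$$\begin{aligned}
(\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})^{\leftrightarrow}(y) &= (\overset{\leftrightarrow}{f}*\overset{\leftrightarrow}{h})(-y) \\
&= \int_{-\infty}^{\infty}\overset{\leftrightarrow}{f}(t)\,\overset{\leftrightarrow}{h}(-y-t)\,dt \\
&= \int_{-\infty}^{\infty}f(-t)h(y+t)\,dt \\
&= \int_{-\infty}^{\infty}f(t)h(y-t)\,dt = (f*h)(y),
\end{aligned}$$
the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat{h}$ is $f*h$ (which is therefore continuous), and $\sqrt{2\pi}\,g\times\hat{h}$ must represent the Fourier transform of $f*h$.
Remark Compare 283M. It is typical of the theory of Fourier transforms that we have formulae valid in a wide variety of contexts, each requiring a different interpretation and a different proof.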
284L We are now ready for a result corresponding to 282H. I use a different method, or at least a different arrangement of the ideas, through the following fact, which is important in other ways.
Proposition Let $f$ be any tempered function. Writing $\psi_\sigma(x) = \dfrac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$ for $x\in\mathbb{R}$ and $\sigma>0$, then
$$\lim_{\sigma\downarrow 0}(f*\psi_\sigma)(x) = c$$
whenever $x\in\mathbb{R}$ and $c\in\mathbb{C}$ are such that
$$\lim_{\delta\downarrow 0}\frac{1}{\delta}\int_0^{\delta}|f(x+t)+f(x-t)-2c|\,dt = 0.$$
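proof (a) By 284Bf, every $\psi_\sigma$ is a rapidly decreasing test function, so that $f*\psi_\sigma$ is defined everywhere, by 284J. We need to know that $\int_{-\infty}^{\infty}\psi_\sigma = 1$; this is because (substituting $u = x/\sigma$)
$$\int_{-\infty}^{\infty}\psi_\sigma = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-u^2/2}\,du = 1,$$
by 263G. The argument now follows the lines of 282H. Set
$$\phi(t) = |f(x+t)+f(x-t)-2c|$$
when this is defined, which is almost everywhere, and $\Phi(t) = \int_0^t\phi$, defined for all $t\ge 0$ because $f$ is integrable over every bounded interval (284Eb). We have
$$\begin{aligned}
|(f*\psi_\sigma)(x)-c| &= \Bigl|\int_{-\infty}^{\infty}f(x-t)\psi_\sigma(t)\,dt - c\int_{-\infty}^{\infty}\psi_\sigma(t)\,dt\Bigr| \\
&= \Bigl|\int_{-\infty}^{0}(f(x-t)-c)\psi_\sigma(t)\,dt + \int_0^{\infty}(f(x-t)-c)\psi_\sigma(t)\,dt\Bigr| \\
&= \Bigl|\int_0^{\infty}(f(x+t)-c)\psi_\sigma(t)\,dt + \int_0^{\infty}(f(x-t)-c)\psi_\sigma(t)\,dt\Bigr|
\end{aligned}$$
(because $\psi_\sigma$ is an even function)
$$\begin{aligned}
&= \Bigl|\int_0^{\infty}(f(x+t)+f(x-t)-2c)\psi_\sigma(t)\,dt\Bigr| \\
&\le \int_0^{\infty}|f(x+t)+f(x-t)-2c|\,\psi_\sigma(t)\,dt = \int_0^{\infty}\phi\times\psi_\sigma.
\end{aligned}$$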
(b) I should explain why this last integral is finite. Because $f$ is a tempered function, so are the functions $t\mapsto f(x+t)$, $t\mapsto f(x-t)$ (284Ec); of course constant functions are tempered, so $t\mapsto\phi(t) = |f(x+t)+f(x-t)-2c|$ is tempered, and because $\psi_\sigma$ is a rapidly decreasing test function we may apply 284F to see that the product is integrable.
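(c) Let $\epsilon>0$. By hypothesis, $\lim_{t\downarrow 0}\Phi(t)/t = 0$; let $\delta>0$ be such that $\Phi(t)\le\epsilon t$ for every $t\in[0,\delta]$. Take any $\sigma\in\,]0,\delta]$. I break the integral $\int_0^{\infty}\phi\times\psi_\sigma$ up into three parts.
(i) For the integral from 0 to $\sigma$, we have
$$\int_0^{\sigma}\phi\times\psi_\sigma \le \int_0^{\sigma}\frac{1}{\sigma\sqrt{2\pi}}\phi = \frac{1}{\sigma\sqrt{2\pi}}\Phi(\sigma) \le \frac{\epsilon}{\sqrt{2\pi}} \le \epsilon,$$
because $\psi_\sigma(t)\le\dfrac{1}{\sigma\sqrt{2\pi}}$ for every $t$.
(ii) For the integral from $\sigma$ to $\delta$, we have
$$\int_{\sigma}^{\delta}\phi\times\psi_\sigma \le \frac{1}{\sigma\sqrt{2\pi}}\int_{\sigma}^{\delta}\phi(t)\frac{2\sigma^2}{t^2}\,dt$$
(because $e^{-t^2/2\sigma^2} = 1/e^{t^2/2\sigma^2}\le 1/(t^2/2\sigma^2) = 2\sigma^2/t^2$ for every $t\ne 0$)
$$= \sigma\sqrt{\frac{2}{\pi}}\int_{\sigma}^{\delta}\frac{\phi(t)}{t^2}\,dt = \sigma\sqrt{\frac{2}{\pi}}\Bigl(\frac{\Phi(\delta)}{\delta^2} - \frac{\Phi(\sigma)}{\sigma^2} + \int_{\sigma}^{\delta}\frac{2\Phi(t)}{t^3}\,dt\Bigr)$$
(integrating by parts; see 225F)
$$\le \sigma\Bigl(\frac{\epsilon}{\delta} + \int_{\sigma}^{\delta}\frac{2\epsilon}{t^2}\,dt\Bigr)$$
(because $\Phi(t)\le\epsilon t$ for $0\le t\le\delta$ and $\sqrt{2/\pi}\le 1$)
$$\le \sigma\Bigl(\frac{\epsilon}{\delta} + \frac{2\epsilon}{\sigma}\Bigr) \le 3\epsilon.$$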
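(iii) For the integral from $\delta$ to $\infty$, we have
$$\int_{\delta}^{\infty}\phi\times\psi_\sigma = \frac{1}{\sqrt{2\pi}}\int_{\delta}^{\infty}\phi(t)\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt.$$
Now for any $t\ge\delta$,
$$\sigma\mapsto\frac{1}{\sigma}e^{-t^2/2\sigma^2} : \,]0,\delta]\to\mathbb{R}$$
is monotonically increasing, because its derivative
$$\frac{d}{d\sigma}\,\frac{1}{\sigma}e^{-t^2/2\sigma^2} = \frac{1}{\sigma^2}\Bigl(\frac{t^2}{\sigma^2}-1\Bigr)e^{-t^2/2\sigma^2}$$
is positive, and
$$\lim_{\sigma\downarrow 0}\frac{1}{\sigma}e^{-t^2/2\sigma^2} = \lim_{a\to\infty}ae^{-a^2t^2/2} = 0.$$
So we may apply Lebesgue's Dominated Convergence Theorem to see that
$$\lim_{n\to\infty}\int_{\delta}^{\infty}\phi(t)\frac{e^{-t^2/2\sigma_n^2}}{\sigma_n}\,dt = 0$$
whenever $\langle\sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence in $]0,\delta]$ converging to 0, so that
$$\lim_{\sigma\downarrow 0}\int_{\delta}^{\infty}\phi(t)\frac{e^{-t^2/2\sigma^2}}{\sigma}\,dt = 0.$$
There must therefore be a $\sigma_0\in\,]0,\delta]$ such that $\int_{\delta}^{\infty}\phi\times\psi_\sigma\le\epsilon$ for every $\sigma\le\sigma_0$.
Putting these together, we see that
$$|(f*\psi_\sigma)(x)-c| \le \int_0^{\infty}\phi\times\psi_\sigma \le \epsilon+3\epsilon+\epsilon = 5\epsilon$$
whenever $0<\sigma\le\sigma_0$. As $\epsilon$ is arbitrary, $\lim_{\sigma\downarrow 0}(f*\psi_\sigma)(x) = c$, as claimed.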
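A concrete case may help to fix ideas. Take $f$ to be the characteristic function of $[0,1]$ (an illustrative choice, not an example from the text). Then $(f*\psi_\sigma)(x) = \int_{x-1}^{x}\psi_\sigma$ can be written with the standard normal distribution function, and the hypothesis of the proposition holds with $c = \frac12$ at the jump $x = 0$, with $c = 1$ at interior points, and with $c = 0$ outside $[0,1]$. The sketch below evaluates the convolution in this closed form; the function names are assumptions of the sketch.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def smoothed_indicator(x, sigma):
    """(f * psi_sigma)(x) for f the characteristic function of [0, 1].

    Uses (f * psi_sigma)(x) = integral of psi_sigma over [x - 1, x].
    """
    return normal_cdf(x / sigma) - normal_cdf((x - 1.0) / sigma)
```

As `sigma` decreases to 0, `smoothed_indicator(0.0, sigma)` tends to 0.5, the average of the one-sided limits of $f$ at 0, while `smoothed_indicator(0.5, sigma)` tends to 1.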
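284M Theorem Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$. Then
(a)(i) $g(y) = \lim_{\epsilon\downarrow 0}\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$ for almost every $y\in\mathbb{R}$.
(ii) If $y\in\mathbb{R}$ is such that $a = \lim_{t\in\operatorname{dom}g,\,t\uparrow y}g(t)$ and $b = \lim_{t\in\operatorname{dom}g,\,t\downarrow y}g(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx = \tfrac{1}{2}(a+b).$$
(b)(i) $f(x) = \lim_{\epsilon\downarrow 0}\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy$ for almost every $x\in\mathbb{R}$.
(ii) If $x\in\mathbb{R}$ is such that $a = \lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ and $b = \lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$ are both defined in $\mathbb{C}$, then
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy = \tfrac{1}{2}(a+b).$$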
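proof (a)(i) By 223D,
$$\lim_{\delta\downarrow 0}\frac{1}{2\delta}\int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt = 0$$
for almost every $y\in\mathbb{R}$, because $g$ is integrable over any bounded interval. Fix any such $y$. Set $\phi(t) = |g(y+t)+g(y-t)-2g(y)|$ whenever this is defined. Then, as in 282Ia,
$$\int_0^{\delta}\phi \le \int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt,$$
so $\lim_{\delta\downarrow 0}\frac{1}{\delta}\int_0^{\delta}\phi = 0$. Consequently, by 284L,
$$g(y) = \lim_{\sigma\to\infty}(g*\psi_{1/\sigma})(y).$$
We know from 283N that the Fourier transform of $\psi_\sigma$ is $\frac{1}{\sigma}\psi_{1/\sigma}$ for any $\sigma>0$. Accordingly, by 284K, $g*\psi_{1/\sigma}$ is the Fourier transform of $\sigma\sqrt{2\pi}\,f\times\psi_\sigma$, that is,
$$(g*\psi_{1/\sigma})(y) = \sigma\int_{-\infty}^{\infty}e^{-iyx}\psi_\sigma(x)f(x)\,dx.$$
So
$$\begin{aligned}
g(y) &= \lim_{\sigma\to\infty}\sigma\int_{-\infty}^{\infty}e^{-iyx}\psi_\sigma(x)f(x)\,dx \\
&= \lim_{\sigma\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-x^2/2\sigma^2}f(x)\,dx \\
&= \lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx.
\end{aligned}$$
And this is true for almost every $y$.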
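The step above depends on the identity quoted from 283N, that the Fourier transform of $\psi_\sigma$ is $\frac{1}{\sigma}\psi_{1/\sigma}$. The following sketch checks this numerically with a plain trapezoidal rule. It is an illustration under stated assumptions: the truncation half-width and the number of steps are ad hoc choices, and the function names are not from the text.

```python
from cmath import exp as cexp
from math import exp, pi, sqrt


def psi(x, sigma):
    """The Gaussian psi_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-x * x / (2.0 * sigma * sigma)) / (sigma * sqrt(2.0 * pi))


def fourier_transform(func, y, half_width=40.0, steps=40000):
    """Trapezoidal approximation to (1/sqrt(2 pi)) * integral of exp(-i y x) func(x) dx.

    The integral is truncated to [-half_width, half_width], which is only
    reasonable for rapidly decaying func.
    """
    dx = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for j in range(steps + 1):
        x = -half_width + j * dx
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cexp(-1j * y * x) * func(x)
    return total * dx / sqrt(2.0 * pi)
```

For example, `fourier_transform(lambda x: psi(x, 2.0), 0.7)` should agree closely with `psi(0.7, 0.5) / 2.0`, with negligible imaginary part.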
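(ii) Again, setting $c = \frac{1}{2}(a+b)$, $\phi(t) = |g(y+t)+g(y-t)-2c|$ whenever this is defined, we have $\lim_{t\in\operatorname{dom}\phi,\,t\downarrow 0}\phi(t) = 0$, so of course $\lim_{\delta\downarrow 0}\frac{1}{\delta}\int_0^{\delta}\phi = 0$, and
$$c = \lim_{\sigma\to\infty}(g*\psi_{1/\sigma})(y) = \lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$$
as before.
(b) This can be shown by similar arguments; or it may be actually deduced from (a), by observing that $x\mapsto\overset{\leftrightarrow}{f}(x) = f(-x)$ represents the Fourier transform of $g$ (see 284Id), and applying (a) to $g$ and $\overset{\leftrightarrow}{f}$.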
284N $L^2$ spaces We are now ready for results corresponding to 282J-282K.
Lemma Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}$, and $S$ the space of rapidly decreasing test functions. Then for every $f\in\mathcal{L}^2_{\mathbb{C}}$ and $\epsilon>0$ there is an $h\in S$ such that $\|f-h\|_2\le\epsilon$.
proof Set $\phi(x) = e^{-1/x}$ for $x>0$, zero for $x\le 0$; recall from the proof of 284G that $\phi$ is smooth. For any $a<b$, the functions
$$x\mapsto\phi_n(x) = \phi(n(x-a))\,\phi(n(b-x))$$
provide a sequence of test functions converging to $\chi]a,b[$ from below, so (as in 284G)
$$\inf_{h\in S}\|\chi]a,b[\,-h\|_2^2 \le \lim_{n\to\infty}\int_a^b|1-\phi_n|^2 = 0.$$
Because $S$ is a linear space (284Ba), it follows that for every step-function $g$ with bounded support and every $\epsilon>0$ there is an $h\in S$ such that $\|g-h\|_2\le\frac{1}{2}\epsilon$. But we know from 244H that for every $f\in\mathcal{L}^2_{\mathbb{C}}$ and $\epsilon>0$ there is a step-function $g$ with bounded support such that $\|f-g\|_2\le\frac{1}{2}\epsilon$; so there must be an $h\in S$ such that
$$\|f-h\|_2 \le \|f-g\|_2 + \|g-h\|_2 \le \epsilon.$$
As $f$ and $\epsilon$ are arbitrary, we have the result.
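The key quantitative step in this proof is that $\int_a^b|1-\phi_n|^2\to 0$. A rough numerical picture of the rate can be had from the sketch below, which approximates the integral by a midpoint rule on $]0,1[$. The step count and the names `phi` and `l2_error_squared` are assumptions made for the illustration.

```python
from math import exp


def phi(x):
    """exp(-1/x) for x > 0, and 0 for x <= 0."""
    return exp(-1.0 / x) if x > 0 else 0.0


def l2_error_squared(n, a=0.0, b=1.0, steps=20000):
    """Midpoint-rule approximation to the integral over ]a, b[ of |1 - phi_n|^2."""
    dx = (b - a) / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * dx
        bump = phi(n * (x - a)) * phi(n * (b - x))
        total += (1.0 - bump) ** 2 * dx
    return total
```

Evaluating `l2_error_squared(n)` for `n = 10, 100, 1000` gives a decreasing sequence, consistent with the limit being 0.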
284O Theorem (a) Let $f$ be any complex-valued function which is square-integrable over $\mathbb{R}$. Then $f$ is a tempered function and its Fourier transform is represented by another square-integrable function $g$, and $\|g\|_2 = \|f\|_2$.
(b) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by functions $g_1$, $g_2$, then
$$\int_{-\infty}^{\infty}f_1\times\bar{f}_2 = \int_{-\infty}^{\infty}g_1\times\bar{g}_2.$$
(c) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by functions $g_1$, $g_2$, then the integrable function $f_1\times f_2$ has Fourier transform $\frac{1}{\sqrt{2\pi}}\,g_1*g_2$.
(d) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb{R}$, with Fourier transforms represented by functions $g_1$, $g_2$, then $\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of the continuous function $f_1*f_2$.
proof (a)(i) Consider first the case in which $f$ is a rapidly decreasing test function and $g$ is its Fourier transform; we know that $g$ is also a rapidly decreasing test function, and that $f$ is the inverse Fourier transform of $g$ (284C). Now the complex conjugate $\bar{g}$ of $g$ is given by the formula
$$\bar{g}(y) = \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}\overline{f(x)}\,dx,$$
so that $\bar{g}$ is the inverse Fourier transform of $\bar{f}$. Accordingly
$$\int f\times\bar{f} = \int\check{g}\times\bar{f} = \int g\times(\bar{f})^{\vee} = \int g\times\bar{g},$$
using 283O for the middle equality.
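(ii) Now suppose that $f\in\mathcal{L}^2_{\mathbb{C}}$. I said that $f$ is a tempered function; this is simply because
$$\int_{-\infty}^{\infty}\Bigl(\frac{1}{1+|x|}\Bigr)^2dx < \infty,$$
so
$$\int_{-\infty}^{\infty}\frac{|f(x)|}{1+|x|}\,dx < \infty$$
(244Eb). By 284N, there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ of rapidly decreasing test functions such that $\lim_{n\to\infty}\|f-f_n\|_2 = 0$. By (i),
$$\lim_{m,n\to\infty}\|\hat{f}_m-\hat{f}_n\|_2 = \lim_{m,n\to\infty}\|f_m-f_n\|_2 = 0,$$
and the sequence $\langle\hat{f}_n^{\bullet}\rangle_{n\in\mathbb{N}}$ of equivalence classes is a Cauchy sequence in $L^2_{\mathbb{C}}$. Because $L^2_{\mathbb{C}}$ is complete (244G), $\langle\hat{f}_n^{\bullet}\rangle_{n\in\mathbb{N}}$ has a limit in $L^2_{\mathbb{C}}$, which is representable as $g^{\bullet}$ for some $g\in\mathcal{L}^2_{\mathbb{C}}$. Like $f$, $g$ must be a tempered function. Of course
$$\|g\|_2 = \lim_{n\to\infty}\|\hat{f}_n\|_2 = \lim_{n\to\infty}\|f_n\|_2 = \|f\|_2.$$
Now if $h$ is any rapidly decreasing test function, $h\in\mathcal{L}^2_{\mathbb{C}}$ (284Bc), so we shall have
$$\int g\times h = \lim_{n\to\infty}\int\hat{f}_n\times h = \lim_{n\to\infty}\int f_n\times\hat{h} = \int f\times\hat{h}.$$
So $g$ represents the Fourier transform of $f$.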
(b) Of course any functions representing the Fourier transforms of $f_1$ and $f_2$ must be equal almost everywhere to square-integrable functions, and therefore square-integrable, with the right norms. It follows as in 282K (part (d) of the proof) that if $g_1$, $g_2$ represent the Fourier transforms of $f_1$, $f_2$, so that $ag_1+bg_2$ represents the Fourier transform of $af_1+bf_2$ and $\|ag_1+bg_2\|_2 = \|af_1+bf_2\|_2$ for all $a$, $b\in\mathbb{C}$, we must have
$$\int f_1\times\bar{f}_2 = (f_1|f_2) = (g_1|g_2) = \int g_1\times\bar{g}_2.$$
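(c) Of course $f_1\times f_2$ is integrable because it is the product of two square-integrable functions (244E).
(i) Let $y\in\mathbb{R}$ and set $f(x) = \overline{f_2(x)}e^{iyx}$ for $x\in\mathbb{R}$. Then $f\in\mathcal{L}^2_{\mathbb{C}}$. We need to know that the Fourier transform of $f$ is represented by $g$, where $g(u) = \overline{g_2(y-u)}$. $\mathbf{P}$ Let $h$ be a rapidly decreasing test function. Then
$$\begin{aligned}
\int g\times h &= \int\overline{g_2(y-u)}\,h(u)\,du = \int\overline{g_2(u)}\,h(y-u)\,du \\
&= \overline{\int g_2\times h_1} = \overline{\int f_2\times\hat{h}_1},
\end{aligned}$$
where $h_1(u) = \overline{h(y-u)}$. To compute $\hat{h}_1$, we have
$$\begin{aligned}
\hat{h}_1(v) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ivu}h_1(u)\,du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ivu}\,\overline{h(y-u)}\,du \\
&= \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ivu}h(y-u)\,du} = \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iv(y-u)}h(u)\,du} = \overline{e^{ivy}\hat{h}(v)}.
\end{aligned}$$
So
$$\begin{aligned}
\int g\times h &= \overline{\int f_2\times\hat{h}_1} = \overline{\int f_2(v)\hat{h}_1(v)\,dv} \\
&= \int\overline{f_2(v)}\,e^{ivy}\hat{h}(v)\,dv = \int f\times\hat{h}:
\end{aligned}$$
as $h$ is arbitrary, $g$ represents the Fourier transform of $f$. $\mathbf{Q}$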
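(ii) We now have
$$\begin{aligned}
(f_1\times f_2)^{\wedge}(y) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f_1(x)f_2(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_1\times\bar{f} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1\times\bar{g}
\end{aligned}$$
(using part (b))
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1(u)g_2(y-u)\,du = \frac{1}{\sqrt{2\pi}}(g_1*g_2)(y).$$
As $y$ is arbitrary, $(f_1\times f_2)^{\wedge} = \frac{1}{\sqrt{2\pi}}\,g_1*g_2$, as claimed.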
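(d) By (c), the Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $\overset{\leftrightarrow}{f}_1*\overset{\leftrightarrow}{f}_2$, writing $\overset{\leftrightarrow}{f}_1(x) = f_1(-x)$, so that $\overset{\leftrightarrow}{f}_1$ represents the Fourier transform of $g_1$. So the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $(\overset{\leftrightarrow}{f}_1*\overset{\leftrightarrow}{f}_2)^{\leftrightarrow}$. But, just as in the proof of 284Kb, $(\overset{\leftrightarrow}{f}_1*\overset{\leftrightarrow}{f}_2)^{\leftrightarrow} = f_1*f_2$, so $f_1*f_2$ is the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$, and $\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$, as claimed. Also $f_1*f_2$, being the Fourier transform of an integrable function, is continuous (283Cf; see also 255K).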
284P Corollary Writing $L^2_{\mathbb{C}}$ for the Hilbert space of equivalence classes of square-integrable complex-valued functions on $\mathbb{R}$, we have a linear isometry $T:L^2_{\mathbb{C}}\to L^2_{\mathbb{C}}$ given by saying that $T(f^{\bullet}) = g^{\bullet}$ whenever $f$, $g\in\mathcal{L}^2_{\mathbb{C}}$ and $g$ represents the Fourier transform of $f$.
284Q Remarks (a) 284P corresponds, of course, to 282K, where the similar isometry between $\ell^2_{\mathbb{C}}(\mathbb{Z})$ and $L^2_{\mathbb{C}}(]-\pi,\pi])$ is described. In that case there was a marked asymmetry which is absent from the present situation; because the relevant measure on $\mathbb{Z}$, counting measure, gives non-zero mass to every point, members of $\ell^2_{\mathbb{C}}$ are true functions, and it is not surprising that we have a straightforward formula for $S(f^{\bullet})\in\ell^2_{\mathbb{C}}$ for every $f\in\mathcal{L}^2_{\mathbb{C}}(]-\pi,\pi])$. The difficulty of describing $S^{-1}:\ell^2_{\mathbb{C}}(\mathbb{Z})\to L^2_{\mathbb{C}}(]-\pi,\pi])$ is very similar to the difficulty of describing $T:L^2_{\mathbb{C}}(\mathbb{R})\to L^2_{\mathbb{C}}(\mathbb{R})$ and its inverse. 284Yg and 286U-286V show just how close this similarity is.
(b) I have spelt out parts (c) and (d) of 284O in detail, perhaps in unnecessary detail, because they give me an opportunity to insist on the difference between "$\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$" and "$\frac{1}{\sqrt{2\pi}}\,g_1*g_2$ is the Fourier transform of $f_1\times f_2$". The actual functions $g_1$ and $g_2$ are not well-defined by the hypothesis that they represent the Fourier transforms of $f_1$ and $f_2$, though their equivalence classes $g_1^{\bullet}$, $g_2^{\bullet}\in L^2_{\mathbb{C}}$ are. So the product $g_1\times g_2$ is also not uniquely defined as a function, though its equivalence class $(g_1\times g_2)^{\bullet} = g_1^{\bullet}\times g_2^{\bullet}$ is well-defined as a member of $L^1_{\mathbb{C}}$. However the continuous function $g_1*g_2$ is unaffected by changes to $g_1$ and $g_2$ on negligible sets, so is well defined as a function; and since $f_1\times f_2$ is integrable, and has a true Fourier transform, it is to be expected that $(f_1\times f_2)^{\wedge}$ should be exactly equal to $\frac{1}{\sqrt{2\pi}}\,g_1*g_2$.
(c) Of course 284Oc-284Od also exhibit a characteristic feature of arguments involving Fourier transforms, the
extension by continuity of relations valid for test functions.
(d) 284Oa is a version of Plancherel's theorem. The formula $\|f\|_2 = \|\hat{f}\|_2$ is Parseval's identity.
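Parseval's identity can be checked by hand in simple cases. As an illustration, take $f(x) = e^{-|x|}$, an example chosen for this sketch rather than taken from the text; with the normalization used here its Fourier transform is $\hat{f}(y) = \sqrt{2/\pi}\,/(1+y^2)$, and both $\|f\|_2^2$ and $\|\hat{f}\|_2^2$ equal 1. The code below confirms this with a simple trapezoidal rule; the truncation ranges and step counts are ad hoc assumptions.

```python
from math import exp, pi, sqrt


def trapezoid(func, lo, hi, steps):
    """Composite trapezoidal rule for the integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for j in range(1, steps):
        total += func(lo + j * dx)
    return total * dx


def f(x):
    """The integrable, square-integrable function x -> exp(-|x|)."""
    return exp(-abs(x))


def f_hat(y):
    """Fourier transform of f under the convention (1/sqrt(2 pi)) * int exp(-iyx) f(x) dx."""
    return sqrt(2.0 / pi) / (1.0 + y * y)


norm_f_squared = trapezoid(lambda x: f(x) ** 2, -40.0, 40.0, 80000)
norm_f_hat_squared = trapezoid(lambda y: f_hat(y) ** 2, -400.0, 400.0, 400000)
```

Both computed values should be close to 1, up to quadrature and truncation error.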
284R Dirac's delta function Consider the tempered function $\chi\mathbb{R}$ with constant value 1. In what sense, if any, can we assign a Fourier transform to $\chi\mathbb{R}$?
If we examine $\int\chi\mathbb{R}\times\hat{h}$, as suggested in 284H, we get
$$\int_{-\infty}^{\infty}\chi\mathbb{R}\times\hat{h} = \int_{-\infty}^{\infty}\hat{h} = \sqrt{2\pi}\,(\hat{h})^{\vee}(0) = \sqrt{2\pi}\,h(0)$$
for every rapidly decreasing test function $h$. Of course there is no function $g$ such that $\int g\times h = \sqrt{2\pi}\,h(0)$ for every rapidly decreasing test function $h$, since (using the arguments of 284G) we should have to have $\int_a^b g = \sqrt{2\pi}$ whenever $a<0<b$, so that the indefinite integral of $g$ could not be continuous at 0. However there is a measure on $\mathbb{R}$ with exactly the right property, the Dirac measure $\delta_0$ concentrated at 0; this is a Radon probability measure (257Xa), and $\int h\,d\delta_0 = h(0)$ for every function $h$ defined at 0. So we shall have
$$\int_{-\infty}^{\infty}\chi\mathbb{R}\times\hat{h} = \sqrt{2\pi}\int h\,d\delta_0$$
for every rapidly decreasing test function $h$, and we can reasonably say that the measure $\nu = \sqrt{2\pi}\,\delta_0$ "represents the Fourier transform of $\chi\mathbb{R}$".
We note with pleasure at this point that
$$\frac{1}{\sqrt{2\pi}}\int e^{ixy}\,\nu(dy) = 1$$
for every $x\in\mathbb{R}$, so that $\chi\mathbb{R}$ can be called the inverse Fourier transform of $\nu$.
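If we look at the formulae of Theorem 284M, we get ideas consistent with this pairing of $\chi\mathbb{R}$ with $\nu$. We have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\chi\mathbb{R}(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\,dx = \frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$$
for every $y\in\mathbb{R}$, using 283N with $\sigma = 1/\sqrt{2\epsilon}$. So
$$\lim_{\epsilon\downarrow 0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\chi\mathbb{R}(x)\,dx = 0$$
for every $y\ne 0$, and the Fourier transform of $\chi\mathbb{R}$ should be zero everywhere except at 0. On the other hand, the functions $y\mapsto\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$ all have integral $\sqrt{2\pi}$, concentrated more and more closely about 0 as $\epsilon$ decreases to 0, so also point us directly to $\nu$, the measure which gives mass $\sqrt{2\pi}$ to 0.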
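The concentration of mass described here can be made quantitative. Substituting $y = 2\sqrt{\epsilon}\,u$ shows that the integral of $y\mapsto\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$ over $[-r,r]$ is $\sqrt{2\pi}\operatorname{erf}(r/2\sqrt{\epsilon})$. That closed form is a short calculation supplied for this illustration and does not appear in the text. The sketch below uses it to show the mass near 0 approaching $\sqrt{2\pi}$ while the pointwise values away from 0 collapse; the function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def regularized_transform(y, eps):
    """The function y -> exp(-y^2 / (4 eps)) / sqrt(2 eps)."""
    return exp(-y * y / (4.0 * eps)) / sqrt(2.0 * eps)


def mass_near_zero(radius, eps):
    """Integral of regularized_transform(., eps) over [-radius, radius], in closed form."""
    return sqrt(2.0 * pi) * erf(radius / (2.0 * sqrt(eps)))
```

For a fixed small radius, `mass_near_zero(radius, eps)` increases to `sqrt(2 * pi)` as `eps` decreases to 0, while `regularized_transform(y, eps)` tends to 0 for each fixed `y` other than 0.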
Thus allowing measures, as well as functions, enables us to extend the notion of Fourier transform. Of course we can go very much farther than this. If $h$ is any rapidly decreasing test function, then
$$\int_{-\infty}^{\infty}x\hat{h}(x)\,dx = -i\sqrt{2\pi}\,h'(0),$$
so that the identity function $x\mapsto x$ can be assigned, as a Fourier transform, the operator $h\mapsto -i\sqrt{2\pi}\,h'(0)$.
At this point we are entering the true theory of (Schwartzian) distributions or "generalized functions", and I had better stop. The "Dirac delta function" is most naturally regarded as the measure $\delta_0$ above; alternatively, as $\frac{1}{\sqrt{2\pi}}(\chi\mathbb{R})^{\wedge}$.
284W The multidimensional case As in §283, I give exercises designed to point the way to the $r$-dimensional generalization.
(a) A rapidly decreasing test function on $\mathbb{R}^r$ is a function $h:\mathbb{R}^r\to\mathbb{C}$ such that (i) $h$ is smooth, that is, all repeated partial derivatives
$$\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}$$
are defined and continuous everywhere in $\mathbb{R}^r$ (ii)
$$\sup_{x\in\mathbb{R}^r}\|x\|^k|h(x)| < \infty,\qquad \sup_{x\in\mathbb{R}^r}\|x\|^k\Bigl|\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}(x)\Bigr| < \infty$$
for every $k\in\mathbb{N}$, $j_1,\ldots,j_m\le r$. A tempered function on $\mathbb{R}^r$ is a measurable complex-valued function $f$, defined almost everywhere in $\mathbb{R}^r$, such that, for some $k\in\mathbb{N}$,
$$\int_{\mathbb{R}^r}\frac{1}{1+\|x\|^k}|f(x)|\,dx < \infty.$$
Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ then $f\times h$ is integrable.
(b) Show that if $h$ is a rapidly decreasing test function on $\mathbb{R}^r$ so is $\hat{h}$, and that in this case $(\hat{h})^{\vee} = h$.
(c) Show that if $f$ is a tempered function on $\mathbb{R}^r$ and $\int f\times h = 0$ for every rapidly decreasing test function $h$ on $\mathbb{R}^r$, then $f = 0$ a.e.
(d) If $f$ and $g$ are tempered functions on $\mathbb{R}^r$, I say that $g$ represents the Fourier transform of $f$ if $\int g\times h = \int f\times\hat{h}$ for every rapidly decreasing test function $h$ on $\mathbb{R}^r$. Show that if $f$ is integrable then $\hat{f}$ represents the Fourier transform of $f$ in this sense.
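(e) Let $f$ be any tempered function on $\mathbb{R}^r$. Writing
$$\psi_\sigma(x) = \frac{1}{(\sigma\sqrt{2\pi})^r}e^{-x\cdot x/2\sigma^2}$$
for $x\in\mathbb{R}^r$, show that $\lim_{\sigma\downarrow 0}(f*\psi_\sigma)(x) = c$ whenever $x\in\mathbb{R}^r$, $c\in\mathbb{C}$ are such that $\lim_{\delta\downarrow 0}\frac{1}{\delta^r}\int_{B(x,\delta)}|f(t)-c|\,dt = 0$, writing $B(x,\delta) = \{t : \|t-x\|\le\delta\}$.
(f) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$, and $h$ a rapidly decreasing test function. Show that (i) the Fourier transform of $f\times h$ is $\frac{1}{(\sqrt{2\pi})^r}\,g*\hat{h}$ (ii) $(\sqrt{2\pi})^r\,g\times\hat{h}$ represents the Fourier transform of $f*h$.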
(g) Let $f$ and $g$ be tempered functions on $\mathbb{R}^r$ such that $g$ represents the Fourier transform of $f$. Show that
$$g(y) = \lim_{\epsilon\downarrow 0}\frac{1}{(\sqrt{2\pi})^r}\int_{\mathbb{R}^r}e^{-iy\cdot x}e^{-\epsilon x\cdot x}f(x)\,dx$$
for almost every $y\in\mathbb{R}^r$.
(h) Show that for any square-integrable complex-valued function $f$ on $\mathbb{R}^r$ and any $\epsilon>0$ there is a rapidly decreasing test function $h$ such that $\|f-h\|_2\le\epsilon$.
(i) Let $\mathcal{L}^2_{\mathbb{C}}$ be the space of square-integrable complex-valued functions on $\mathbb{R}^r$. Show that
(i) for every $f\in\mathcal{L}^2_{\mathbb{C}}$ there is a $g\in\mathcal{L}^2_{\mathbb{C}}$ which represents the Fourier transform of $f$, and in this case $\|g\|_2 = \|f\|_2$;
(ii) if $g_1$, $g_2\in\mathcal{L}^2_{\mathbb{C}}$ represent the Fourier transforms of $f_1$, $f_2\in\mathcal{L}^2_{\mathbb{C}}$, then $\frac{1}{(\sqrt{2\pi})^r}\,g_1*g_2$ is the Fourier transform of $f_1\times f_2$, and $(\sqrt{2\pi})^r\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$.
(j) Let $T$ be an invertible real $r\times r$ matrix, regarded as a linear operator from $\mathbb{R}^r$ to itself. (i) Show that $\hat{f} = |\det T|\,(f\circ T)^{\wedge}\circ T^{\top}$ for every integrable complex-valued function $f$ on $\mathbb{R}^r$. (ii) Show that $h\circ T$ is a rapidly decreasing test function for every rapidly decreasing test function $h$. (iii) Show that if $f$, $g$ are tempered functions and $g$ represents the Fourier transform of $f$, then $\frac{1}{|\det T|}\,g\circ(T^{\top})^{-1}$ represents the Fourier transform of $f\circ T$; so that if $T$ is orthogonal, then $g\circ T$ represents the Fourier transform of $f\circ T$.
284X Basic exercises (a) Show that if $g$ and $h$ are rapidly decreasing test functions, so is $g\times h$.
(b) Show that there are non-zero continuous integrable functions $f$, $g:\mathbb{R}\to\mathbb{C}$ such that $f*g = 0$ everywhere. (Hint: take them to be Fourier transforms of suitable test functions.)
(c) Suppose that $f:\mathbb{R}\to\mathbb{C}$ is a differentiable function such that its derivative $f'$ is a tempered function and, for some $k\in\mathbb{N}$,
$$\lim_{x\to\infty}x^{-k}f(x) = \lim_{x\to-\infty}x^{-k}f(x) = 0.$$
(i) Show that $\int f\times h' = -\int f'\times h$ for every rapidly decreasing test function $h$. (ii) Show that if $g$ is a tempered function representing the Fourier transform of $f$, then $y\mapsto iyg(y)$ represents the Fourier transform of $f'$.
(d) Show that if $h$ is a rapidly decreasing test function and $f$ is any measurable complex-valued function, defined almost everywhere in $\mathbb{R}$, such that $\int_{-\infty}^{\infty}|x|^k|f(x)|\,dx < \infty$ for every $k\in\mathbb{N}$, then the convolution $f*h$ is a rapidly decreasing test function. (Hint: show that the Fourier transform of $f*h$ is a test function.)
>(e) Let $f$ be a tempered function such that $\lim_{a\to\infty}\int_{-a}^{a}f$ exists in $\mathbb{C}$. Show that this limit is also equal to $\lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty}e^{-\epsilon x^2}f(x)\,dx$. (Hint: set $g(x) = f(x)+f(-x)$. Use 224J to show that if $0\le a\le b$ then $\bigl|\int_a^b g(x)e^{-\epsilon x^2}dx\bigr| \le \sup_{c\in[a,b]}\bigl|\int_a^c g\bigr|$, so that $\lim_{a\to\infty}\int_0^a g(x)e^{-\epsilon x^2}dx$ exists uniformly in $\epsilon$, while $\lim_{\epsilon\downarrow 0}\int_0^a g(x)e^{-\epsilon x^2}dx = \int_0^a g$ for every $a\ge 0$.)
>(f) Let $f$ and $g$ be tempered functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$. Show that
$$g(y) = \lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{-iyx}f(x)\,dx$$
at almost all points $y$ for which the limit exists. (Hint: 284Xe, 284M.)
>(g) Let $f$ be an integrable complex-valued function on $\mathbb{R}$ such that $\hat{f}$ also is integrable. Show that $(\hat{f})^{\vee} = f$ at any point at which $f$ is continuous.
(h) Show that for every $p\in[1,\infty[$, $f\in\mathcal{L}^p_{\mathbb{C}}$ and $\epsilon>0$ there is a rapidly decreasing test function $h$ such that $\|f-h\|_p\le\epsilon$.
>(i) Let $f$ and $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$. Show that
$$\int_c^d f = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{icy}-e^{idy}}{y}\,g(y)\,dy$$
whenever $c<d$ in $\mathbb{R}$.
(j) Let $f$ be a measurable complex-valued function, defined almost everywhere in $\mathbb{R}$, such that $\int|f|^p < \infty$, where $1<p\le 2$. Show that $f$ is a tempered function and that there is a tempered function $g$ representing the Fourier transform of $f$. (Hint: express $f$ as $f_1+f_2$, where $f_1$ is integrable and $f_2$ is square-integrable.) (Remark Defining $\|f\|_p$, $\|g\|_q$ as in 244D, where $q = p/(p-1)$, we have $\|g\|_q\le(2\pi)^{(p-2)/2p}\|f\|_p$; see Zygmund 59, XVI.3.2.)
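(k) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb{R}$ such that $g$ represents the Fourier transform of $f$.
(i) Show that
$$\frac{1}{\sqrt{2\pi}}\int_{-a}^{a}e^{ixy}g(y)\,dy = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at}{t}f(x-t)\,dt$$
for every $x\in\mathbb{R}$, $a>0$. (Hint: find the inverse Fourier transform of $y\mapsto e^{-ixy}\chi[-a,a](y)$, and use 284Ob.)
(ii) Show that if $f(x) = 0$ for $x\in\,]c,d[$ then
$$\frac{1}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^{a}e^{ixy}g(y)\,dy = 0$$
for $x\in\,]c,d[$.
(iii) Show that if $f$ is differentiable at $x\in\mathbb{R}$, then
$$\frac{1}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^{a}e^{ixy}g(y)\,dy = f(x).$$
(iv) Show that if $f$ has bounded variation over some interval properly containing $x$, then
$$\frac{1}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^{a}e^{ixy}g(y)\,dy = \tfrac{1}{2}\Bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t) + \lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\Bigr).$$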
(l) Let $f$ be an integrable complex function on $\mathbb{R}$. Show that if $\hat f$ is square-integrable, so is $f$.
(m) Let $f_1$, $f_2$ be square-integrable complex-valued functions on $\mathbb{R}$ with Fourier transforms represented by $g_1$, $g_2$. Show that $\int_{-\infty}^{\infty}f_1(t)f_2(-t)\,dt=\int_{-\infty}^{\infty}g_1(t)g_2(t)\,dt$.
(n) Suppose $x\in\mathbb{R}$. Write $\delta_x$ for Dirac measure on $\mathbb{R}$ concentrated at $x$. Describe a sense in which $\sqrt{2\pi}\,\delta_x$ can be regarded as the Fourier transform of the function $t\mapsto e^{ixt}$.
(o) For any tempered function $f$ and $x\in\mathbb{R}$, let $\delta_x$ be the Dirac measure on $\mathbb{R}$ concentrated at $x$, and set
$$(\delta_x*f)(u)=\int f(u-t)\,\delta_x(dt)=f(u-x)$$
for every $u$ for which $u-x\in\mathrm{dom}f$ (cf. 257Xe). If $g$ represents the Fourier transform of $f$, find a corresponding representation of the Fourier transform of $\delta_x*f$, and relate it to the product of $g$ with the Fourier transform of $\delta_x$.
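(p)(i) Show that
$$\lim_{\epsilon\downarrow 0,\,a\to\infty}\Bigl(\int_{-a}^{-\epsilon}\frac1x\,e^{-iyx}\,dx+\int_{\epsilon}^{a}\frac1x\,e^{-iyx}\,dx\Bigr)=-\pi i\,\mathrm{sgn}\,y$$
for every $y\in\mathbb{R}$, writing $\mathrm{sgn}\,y=y/|y|$ if $y\ne 0$ and $\mathrm{sgn}\,0=0$. (Hint: 283Da.)
(ii) Show that
$$\lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^{a}e^{ixy}\,\mathrm{sgn}\,y\,dy\,da=\frac{2i}{x}$$
for every $x\ne 0$.
(iii) Show that for any rapidly decreasing test function $h$,
$$\int_0^{\infty}\frac1x\bigl(\hat h(x)-\hat h(-x)\bigr)dx=\lim_{\epsilon\downarrow 0,\,a\to\infty}\Bigl(\int_{-a}^{-\epsilon}\frac1x\,\hat h(x)\,dx+\int_{\epsilon}^{a}\frac1x\,\hat h(x)\,dx\Bigr)=-\frac{i\sqrt{\pi}}{\sqrt2}\int_{-\infty}^{\infty}h(y)\,\mathrm{sgn}\,y\,dy.$$
(iv) Show that for any rapidly decreasing test function $h$,
$$\frac{i\sqrt{\pi}}{\sqrt2}\int_{-\infty}^{\infty}\hat h(x)\,\mathrm{sgn}\,x\,dx=\int_0^{\infty}\frac1y\bigl(h(y)-h(-y)\bigr)dy.$$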
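(q) Let $\langle h_n\rangle_{n\in\mathbb{N}}$ be a sequence of rapidly decreasing test functions such that $\phi(f)=\lim_{n\to\infty}\int_{-\infty}^{\infty}h_n\times f$ is defined for every rapidly decreasing test function $f$. Show that $\lim_{n\to\infty}\int_{-\infty}^{\infty}\hat h_n\times f$, $\lim_{n\to\infty}\int_{-\infty}^{\infty}h_n'\times f$ and $\lim_{n\to\infty}\int_{-\infty}^{\infty}(h_n*g)\times f$ are defined for all rapidly decreasing test functions $f$ and $g$, and are zero if $\phi$ is identically zero. (Hint: 255G will help with the last.)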
284Y Further exercises (a) Let f be an integrable complex-valued function on ], ], and

f its periodic
extension, as in 282Ae. Show that

f is a tempered function. Show that for any rapidly decreasing test function h,
_

f

h =

k=
c
k
h(k), where c
k

kN
is the sequence of Fourier coecients of f. (Hint: begin with the case
f(x) = e
inx
. Next show that
M =

k=
[h(k)[ +

k=
sup
x[(2k1),(2k+1)]
[

h(x)[ < ,
and that
[
_

f

k=
c
k
h(k)[ M|f|
1
.
Finally apply 282Ib.)
(b) Let f be a complex-valued function, dened almost everywhere in R, such that f h is integrable for every
rapidly decreasing test function h. Show that f is tempered.
(c) Let f and g be tempered functions on R such that g represents the Fourier transform of f. Show that
_
d
c
f =
i

2
lim

e
icy
e
idy
y
e
y
2
/2
2
g(y)dy
whenever c d in R. (Hint: set = [c, d]. Show that both sides are lim

_
f (
1/
), dening

as in 283N.)
(d) Show that if g : R R is an odd function of bounded variation such that
_

1
1
x
g(x)dx = , then g does not
represent the Fourier transform of any tempered function. (Hint: 283Yd, 284Yc.)
(e) Let $\mathcal{S}$ be the space of rapidly decreasing test functions. For $k$, $m\in\mathbb{N}$ set $\tau_{km}(h)=\sup_{x\in\mathbb{R}}|x|^k|h^{(m)}(x)|$ for every $h\in\mathcal{S}$, writing $h^{(m)}$ for the $m$th derivative of $h$ as usual. (i) Show that each $\tau_{km}$ is a seminorm and that $\mathcal{S}$ is complete and separable for the metrizable linear space topology $\mathfrak{T}$ they define. (ii) Show that $h\mapsto\hat h:\mathcal{S}\to\mathcal{S}$ is continuous for $\mathfrak{T}$. (iii) Show that if $f$ is any tempered function, then $h\mapsto\int f\times h$ is $\mathfrak{T}$-continuous. (iv) Show that if $f$ is an integrable function such that $\int|x^kf(x)|\,dx<\infty$ for every $k\in\mathbb{N}$, then $h\mapsto f*h:\mathcal{S}\to\mathcal{S}$ is $\mathfrak{T}$-continuous.
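(f) Show that if $f$ is a tempered function on $\mathbb{R}$ and
$$\gamma=\lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^{a}f(x)\,dx\,da$$
is defined in $\mathbb{C}$, then $\gamma$ is also
$$\lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty}f(x)e^{-\epsilon|x|}\,dx.$$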
(g) Let f, g be square-integrable complex-valued functions on R such that g represents the Fourier transform of f.
Suppose that m Z and that (2m 1) < x < (2m + 1). Set

f(t) = f(t + 2m) for those t ], ] such that
t + 2m domf. Let c
k

kZ
be the sequence of Fourier coecients of

f. Show that
1

2
lim
a
_
a
a
e
ixy
g(y)dy = lim
n

n
k=n
c
k
e
ikx
in the sense that if one limit exists in C so does the other, and they are then equal. (Hint: 284Xk(i), 282Da.)
(h) Show that if f is integrable over R and there is some M 0 such that f(x) =

f(x) = 0 for [x[ M, then f = 0


a.e. (Hint: reduce to the case M = . Looking at the Fourier series of f ], ], show that f is expressible in the
form f(x) =

m
k=m
c
k
e
ikx
for almost every x ], ]. Now compute

f(2n +
1
2
) for large n.)
(i) Let be a Radon measure on R (denition: 256A) which is tempered in the sense that
_

1
1+|x|
k
(dx) is nite
for some k N. (i) Show that every rapidly decreasing test function is -integrable. (ii) Show that if has bounded
support (denition: 256Xf), and h is a rapidly decreasing test function, then h is a rapidly decreasing test function,
where ( h)(x) =
_

h(x y)(dy) for x R. (iii) Show that there is a sequence h


n

nN
of rapidly decreasing test
functions such that lim
n
_

h
n
f =
_

fd for every rapidly decreasing test function f.


(j) Let $\phi:\mathcal{S}\to\mathbb{R}$ be a functional defined by the formula of 284Xq. Show that $\phi$ is continuous for the topology of 284Ye. (Note: it helps to know a little more about metrizable linear topological spaces than is covered in §2A5.)
284 Notes and comments Yet again I must warn you that the material above gives a very restricted view of the subject. I have tried to indicate how the theory of Fourier transforms of 'good' functions (here taken to be the rapidly decreasing test functions) may be extended, through a kind of duality, to a very much wider class of functions, the tempered functions. Evidently, writing $\mathcal{S}$ for the linear space of rapidly decreasing test functions, we can seek to investigate a Fourier transform of any linear functional $\phi:\mathcal{S}\to\mathbb{C}$, writing $\hat\phi(h)=\phi(\hat h)$ for any $h\in\mathcal{S}$. (It is actually commoner at this point to restrict attention to functionals $\phi$ which are continuous for the standard topology on $\mathcal{S}$, described in 284Ye; these are called tempered distributions.) By 284F-284G, we can identify some of these functionals with equivalence classes of tempered functions, and then set out to investigate those tempered functions whose Fourier transforms can again be represented by tempered functions.
I suppose the structure of the theory of Fourier transforms is best laid out through the formulae involved. Our aim is to set up pairs $(f,g)=(f,\hat f)=(\check g,g)$ in such a way that we have
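Inversion: $(\hat h)^{\vee}=(\check h)^{\wedge}=h$;
Reversal: $\check h(y)=\hat h(-y)$;
Linearity: $(h_1+h_2)^{\wedge}=\hat h_1+\hat h_2$, $(ch)^{\wedge}=c\hat h$;
Differentiation: $(h')^{\wedge}(y)=iy\hat h(y)$;
Shift: if $h_1(x)=h(x+c)$ then $\hat h_1(y)=e^{iyc}\hat h(y)$;
Modulation: if $h_1(x)=e^{icx}h(x)$ then $\hat h_1(y)=\hat h(y-c)$;
Symmetry: if $h_1(x)=h(-x)$ then $\hat h_1(y)=\hat h(-y)$;
Complex Conjugate: $(\bar h)^{\wedge}(y)=\overline{\hat h(-y)}$;
Dilation: if $h_1(x)=h(cx)$, where $c>0$, then $\hat h_1(y)=\frac1c\hat h(\frac yc)$;
Convolution: $(h_1*h_2)^{\wedge}=\sqrt{2\pi}\,\hat h_1\times\hat h_2$, $(h_1\times h_2)^{\wedge}=\frac{1}{\sqrt{2\pi}}\hat h_1*\hat h_2$;
Duality: $\int_{-\infty}^{\infty}h_1\times\hat h_2=\int_{-\infty}^{\infty}\hat h_1\times h_2$;
Parseval: $\int_{-\infty}^{\infty}h_1\times\bar h_2=\int_{-\infty}^{\infty}\hat h_1\times\overline{\hat h_2}$;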
and, of course,
$$\hat h(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}h(x)\,dx,$$
_
d
c

h(y)dy =
i

2
_

e
icy
e
idy
y
h(y)dy.
(I have used the letter $h$ in the list above to suggest what is in fact the case, that all the formulae here are valid for rapidly decreasing test functions.) On top of all this, it is often important that the operation $h\mapsto\hat h$ should be continuous in some sense.
The challenge of the pure theory of Fourier transforms is to find the widest possible variety of objects $h$ for which the formulae above will be valid, subject to appropriate interpretations of ${}^{\wedge}$, $*$ and $\int_{-\infty}^{\infty}$. I must of course remark here that from the very beginnings, the subject has been enriched by its applications in other parts of mathematics, the physical sciences and the social sciences, and that again and again these have suggested further possible pairs $(f,\hat f)$, making new demands on our power to interpret the rules we seek to follow. Even the theory of distributions does not seem to give a full canonical account of what can be done. First, there are great difficulties in interpreting the product of two arbitrary distributions, making several of the formulae above problematic; and second, it is not obvious that only one kind of distribution need be considered. In this section I have looked at just one space of test functions, the space $\mathcal{S}$ of rapidly decreasing test functions; but at least two others are significant, the space $\mathcal{D}$ of smooth functions with bounded support and the space $\mathcal{Z}$ of Fourier transforms of functions in $\mathcal{D}$. The advantage of starting with $\mathcal{S}$ is that it gives a symmetric theory, since $\hat h\in\mathcal{S}$ for every $h\in\mathcal{S}$; but it is easy to find objects (e.g., the function $x\mapsto e^{x^2}$, or the function $x\mapsto 1/|x|$) which cannot be interpreted as functionals on $\mathcal{S}$, so that their Fourier transforms must be investigated by other methods, if at all. In 284Xp I sketch some of the arguments which can be used to justify the assertion that the Fourier transform of the function $x\mapsto 1/x$ is, or can be represented by, the function $y\mapsto -i\sqrt{\frac{\pi}{2}}\,\mathrm{sgn}\,y$; the general principle in this case being that we approach both $0$ and $\infty$ symmetrically. For a variety of such matching pairs, established by arguments based on the idea in 284Xq, see Lighthill 59, chap. 3.
Accordingly it seems that, after nearly two centuries, we must still proceed by carefully examining particular classes of function, and checking appropriate interpretations of the formulae. In the work above I have repeatedly used the concepts
$$\lim_{a\to\infty}\int_{-a}^{a}f,\qquad\lim_{\epsilon\downarrow 0}\int_{-\infty}^{\infty}e^{-\epsilon x^2}f(x)\,dx$$
as alternative interpretations of $\int_{-\infty}^{\infty}f$. (Of course they are closely related; see 284Xe.) The reasons for using the particular kernel $e^{-\epsilon x^2}$ are that it belongs to $\mathcal{S}$, it is an even function, its Fourier transform is calculable and easy to manipulate, and it is associated with the normal probability density function $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$, so that any miscellaneous facts we gather have a chance of being valuable elsewhere. But there are applications in which alternative kernels are more manageable, e.g., $e^{-\epsilon|x|}$ (283Xr, 283Yb, 284Yf).
One of the guiding principles here is that purely formal manipulations, along the lines of those in the list above, and
(especially) changes in the order of integration, with other exchanges of limit, again and again give rise to formulae
which, suitably interpreted, are valid. First courses in analysis are often inhibitory; students are taught to distrust any
manipulation which they cannot justify. To my own eye, the delight of this subject lies chiefly in the variety of the
arguments demanded by a rigorous approach, the ground constantly shifting with the context; but there is no doubt
that cheerful sanguinity is often the best guide to the manipulations which it will be right to try to justify.
This being a book on measure theory, I am of course particularly interested in the possibility of a measure appearing as a Fourier transform. This is what happens if we seek the Fourier transform of the constant function $\chi\mathbb{R}$ (284R). More generally, any periodic tempered function $f$ with period $2\pi$ can be assigned a Fourier transform which is a 'signed measure' (for our present purposes, a complex linear combination of measures) concentrated on $\mathbb{Z}$, the mass at each $k\in\mathbb{Z}$ being determined by the corresponding Fourier coefficient of $f\restriction\,]{-\pi},\pi]$ (284Xn, 284Ya). In the next section I will go farther in this direction, with particular reference to probability distributions on $\mathbb{R}^r$. But the reason why positive measures have not forced themselves on our attention so far is that we do not expect to get a positive function as a Fourier transform unless some very special conditions are satisfied, as in 283Yb.
As in §282, I have used the Hilbert space structure of $L^2_{\mathbb{C}}$ as the basis of the discussion of Fourier transforms of functions in $\mathcal{L}^2_{\mathbb{C}}$ (284O-284P). But as with Fourier series, Carleson's theorem (286U) provides a more direct description.
285 Characteristic functions
I come now to one of the most effective applications of Fourier transforms, the use of 'characteristic functions' to analyse probability distributions. It turns out not only that the Fourier transform of a probability distribution determines the distribution (285M) but that many of the things we want to know about a distribution are easily calculated from its transform (285G, 285Xf). Even more strikingly, pointwise convergence of Fourier transforms corresponds (for sequences) to convergence for the vague topology in the space of distributions, so they provide a new and extremely powerful method for proving such results as the Central Limit Theorem and Poisson's theorem (285Q).
As the applications of the ideas here mostly belong to probability theory, I return to probabilists' terminology, as in Chapter 27. There will nevertheless be many points at which it is appropriate to speak of integrals, and there will often be more than one measure in play; so I should say directly that an integral $\int f(x)\,dx$ will always be with respect to Lebesgue measure (usually, but not always, one-dimensional), as in the rest of the chapter, while integrals with respect to other measures will be expressed in the forms $\int f\,d\nu$ or $\int f(x)\,\nu(dx)$.
285A Definition (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$ (256A). Then the characteristic function of $\nu$ is the function $\varphi_\nu:\mathbb{R}^r\to\mathbb{C}$ given by the formula
$$\varphi_\nu(y)=\int e^{iy\cdot x}\,\nu(dx)$$
for every $y\in\mathbb{R}^r$, writing $y\cdot x=\eta_1\xi_1+\ldots+\eta_r\xi_r$ if $y=(\eta_1,\ldots,\eta_r)$, $x=(\xi_1,\ldots,\xi_r)$.
(b) Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space. The characteristic function of $\mathbf{X}=(X_1,\ldots,X_r)$ is the characteristic function $\varphi_{\mathbf{X}}=\varphi_{\nu_{\mathbf{X}}}$ of their joint probability distribution $\nu_{\mathbf{X}}$ as defined in 271C.
285B Remarks (a) By one of the ordinary accidents of history, the definitions of 'characteristic function' and 'Fourier transform' have evolved independently. In 283Ba I remarked that the definition of the Fourier transform remains unfixed, and that the formulae

$$\hat f(y)=\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx,\qquad\check f(y)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{iyx}f(x)\,dx$$
are sometimes used. On the other hand, I think that nearly all authors agree on the definition of the characteristic function as given above. You may feel therefore that I should have followed their lead, and chosen the definition of Fourier transform which best matches the definition of characteristic function. I did not do so largely because I wished to emphasise the symmetry between the Fourier transform and the inverse Fourier transform, and the correspondence between Fourier transforms and Fourier series. The principal advantage of matching the definitions up would be to make the constants in such theorems as 283F, 285Xh the same, and would be balanced by the need to remember different constants for $\hat f$, $\check f$ in such results as 283M.
(b) A secondary reason for not trying too hard to make the formulae of this section match directly those of §§283-284 is that the $r$-dimensional case is at the heart of some of the most important applications of characteristic functions, so that it seems right to introduce it from the beginning; and consequently the formulae of this section will necessarily have new features compared with those in the body of the work so far.
285C Of course there is a direct way to describe the characteristic function of a family $(X_1,\ldots,X_r)$ of random variables, as follows.
Proposition Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space $(\Omega,\Sigma,\mu)$, and $\nu_{\mathbf{X}}$ their joint distribution. Then their characteristic function $\varphi_{\nu_{\mathbf{X}}}$ is given by
$$\varphi_{\nu_{\mathbf{X}}}(y)=\mathbb{E}(e^{iy\cdot\mathbf{X}})=\mathbb{E}(e^{i\eta_1X_1}e^{i\eta_2X_2}\ldots e^{i\eta_rX_r})$$
for every $y=(\eta_1,\ldots,\eta_r)\in\mathbb{R}^r$.
proof Apply 271E to the functions $h_1$, $h_2:\mathbb{R}^r\to\mathbb{R}$ defined by
$$h_1(x)=\cos(y\cdot x),\qquad h_2(x)=\sin(y\cdot x),$$
to see that
$$\varphi_{\nu_{\mathbf{X}}}(y)=\int h_1(x)\,\nu_{\mathbf{X}}(dx)+i\int h_2(x)\,\nu_{\mathbf{X}}(dx)=\mathbb{E}(h_1(\mathbf{X}))+i\,\mathbb{E}(h_2(\mathbf{X}))=\mathbb{E}(e^{iy\cdot\mathbf{X}}).$$
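285D I ought to spell out the correspondence between Fourier transforms, as defined in 283A, and characteristic functions.
Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}$. Write
$$\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}\,\nu(dx)$$
for every $y\in\mathbb{R}$, and $\varphi_\nu$ for the characteristic function of $\nu$.
(a) $\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\varphi_\nu(-y)$ for every $y\in\mathbb{R}$.
(b) For any Lebesgue integrable complex-valued function $h$ defined almost everywhere in $\mathbb{R}$,
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)\,dy=\int_{-\infty}^{\infty}\hat h(x)\,\nu(dx).$$
(c) For any rapidly decreasing test function $h$ on $\mathbb{R}$ (see §284),
$$\int_{-\infty}^{\infty}h(x)\,\nu(dx)=\int_{-\infty}^{\infty}\check h(y)\hat\nu(y)\,dy.$$
(d) If $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.
proof (a) This is immediate from the definitions of $\varphi_\nu$ and $\hat\nu$.
(b) Because
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|h(y)|\,\nu(dx)\,dy=\int_{-\infty}^{\infty}|h(y)|\,dy<\infty,$$
we may change the order of integration to see that
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)\,dy=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-iyx}h(y)\,\nu(dx)\,dy=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-iyx}h(y)\,dy\,\nu(dx)=\int_{-\infty}^{\infty}\hat h(x)\,\nu(dx).$$
(c) This follows immediately from (b), because $\check h$ is integrable and $(\check h)^{\wedge}=h$ (284C).
(d) The point is just that
$$\int h\,d\nu=\int h(x)f(x)\,dx$$
for every bounded Borel measurable $h:\mathbb{R}\to\mathbb{R}$ (235K), and therefore for the functions $x\mapsto e^{-iyx}:\mathbb{R}\to\mathbb{C}$. Now
$$\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}\,\nu(dx)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx=\hat f(y)$$
for every $y$.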
285E Lemma Let $X$ be a normal random variable with expectation $a$ and variance $\sigma^2$, where $\sigma>0$. Then the characteristic function of $X$ is given by
$$\varphi(y)=e^{iya}e^{-\sigma^2y^2/2}.$$
proof This is just 283N with the constants changed. We have
$$\varphi(y)=\mathbb{E}(e^{iyX})=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}e^{-(x-a)^2/2\sigma^2}\,dx$$
(taking the density function for $X$ given in 274Ad, and applying 271Ic)
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iy(\sigma t+a)}e^{-t^2/2}\,dt$$
(substituting $x=\sigma t+a$)
$$=e^{iya}\sqrt{2\pi}\,\hat\psi_1(-y\sigma)$$
(setting $\psi_1(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, as in 283N)
$$=e^{iya}e^{-\sigma^2y^2/2}.$$
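The closed form in 285E can be checked numerically. The following is only an illustrative sketch, not part of the original text: the function names, the truncation of the integral to twelve standard deviations and the trapezoidal rule are all assumptions made for this example.

```python
import cmath
import math


def normal_cf_numeric(y, a, sigma, half_width=12.0, steps=20000):
    """Approximate E(exp(i*y*X)) for X normal with mean a, variance sigma^2.

    Uses the trapezoidal rule on [a - half_width*sigma, a + half_width*sigma];
    the truncation and the step count are illustrative choices.
    """
    lo = a - half_width * sigma
    hi = a + half_width * sigma
    step = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        x = lo + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-(x - a) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += weight * cmath.exp(1j * y * x) * density
    return total * step


def normal_cf_closed(y, a, sigma):
    """The formula of 285E: exp(i*y*a) * exp(-sigma^2 * y^2 / 2)."""
    return cmath.exp(1j * y * a) * math.exp(-sigma ** 2 * y ** 2 / 2)
```

For instance, `normal_cf_numeric(0.7, 0.3, 1.5)` and `normal_cf_closed(0.7, 0.3, 1.5)` should agree to many decimal places.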
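285F I now give results corresponding to parts of 283C, with an extra refinement concerning independent random variables (285I).
Proposition Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\varphi$ its characteristic function.
(a) $\varphi(0)=1$.
(b) $\varphi:\mathbb{R}^r\to\mathbb{C}$ is uniformly continuous.
(c) $\varphi(-y)=\overline{\varphi(y)}$, $|\varphi(y)|\le 1$ for every $y\in\mathbb{R}^r$.
(d) If $r=1$ and $\int|x|\,\nu(dx)<\infty$, then $\varphi'(y)$ exists and is equal to $i\int xe^{ixy}\,\nu(dx)$ for every $y\in\mathbb{R}$.
(e) If $r=1$ and $\int x^2\,\nu(dx)<\infty$, then $\varphi''(y)$ exists and is equal to $-\int x^2e^{ixy}\,\nu(dx)$ for every $y\in\mathbb{R}$.
proof (a) $\varphi(0)=\int\chi\mathbb{R}^r\,\nu(dx)=\nu(\mathbb{R}^r)=1$.
(b) Let $\epsilon>0$. Let $M>0$ be such that
$$\nu\{x:\|x\|\ge M\}\le\epsilon,$$
writing $\|x\|=\sqrt{x\cdot x}$ as usual. Let $\delta>0$ be such that $|e^{ia}-1|\le\epsilon$ whenever $|a|\le\delta$. Now suppose that $y$, $y'\in\mathbb{R}^r$ are such that $\|y-y'\|\le\delta/M$. Then whenever $\|x\|\le M$,
$$|e^{iy\cdot x}-e^{iy'\cdot x}|=|e^{iy'\cdot x}||e^{i(y-y')\cdot x}-1|=|e^{i(y-y')\cdot x}-1|\le\epsilon$$
because
$$|(y-y')\cdot x|\le\|y-y'\|\|x\|\le\delta.$$
Consequently
$$|\varphi(y)-\varphi(y')|\le\int_{\|x\|\le M}|e^{iy\cdot x}-e^{iy'\cdot x}|\,\nu(dx)+\int_{\|x\|>M}|e^{iy\cdot x}|\,\nu(dx)+\int_{\|x\|>M}|e^{iy'\cdot x}|\,\nu(dx)\le\epsilon+\epsilon+\epsilon=3\epsilon.$$
As $\epsilon$ is arbitrary, $\varphi$ is uniformly continuous.
(c) This is elementary;
$$\varphi(-y)=\int e^{-iy\cdot x}\,\nu(dx)=\int\overline{e^{iy\cdot x}}\,\nu(dx)=\overline{\varphi(y)},$$
$$|\varphi(y)|=\Bigl|\int e^{iy\cdot x}\,\nu(dx)\Bigr|\le\int|e^{iy\cdot x}|\,\nu(dx)=\int\chi\mathbb{R}^r\,\nu(dx)=1.$$
(d) The point is that $|\frac{\partial}{\partial y}e^{iyx}|=|x|$ for every $x$, $y\in\mathbb{R}$. So by 123D (applied, strictly speaking, to the real and imaginary parts of the function)
$$\varphi'(y)=\frac{d}{dy}\int e^{iyx}\,\nu(dx)=\int\frac{\partial}{\partial y}e^{iyx}\,\nu(dx)=\int ixe^{iyx}\,\nu(dx).$$
(e) Since we now have $|\frac{\partial}{\partial y}xe^{iyx}|=x^2$ for every $x$, $y$, we can repeat the argument to get
$$\varphi''(y)=i\frac{d}{dy}\int xe^{iyx}\,\nu(dx)=i\int\frac{\partial}{\partial y}xe^{iyx}\,\nu(dx)=-\int x^2e^{iyx}\,\nu(dx).$$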
285G Corollary (a) Let $X$ be a real-valued random variable with finite expectation, and $\varphi$ its characteristic function. Then $\varphi'(0)=i\,\mathbb{E}(X)$.
(b) Let $X$ be a real-valued random variable with finite variance, and $\varphi$ its characteristic function. Then $\varphi''(0)=-\mathbb{E}(X^2)$.
proof We have only to match $X$ to its distribution $\nu$, and say that
'$X$ has finite expectation'
corresponds to
'$\int|x|\,\nu(dx)=\mathbb{E}(|X|)<\infty$',
so that
$$\varphi'(0)=i\int x\,\nu(dx)=i\,\mathbb{E}(X),$$
and that
'$X$ has finite variance'
corresponds to
'$\int x^2\,\nu(dx)=\mathbb{E}(X^2)<\infty$',
so that
$$\varphi''(0)=-\int x^2\,\nu(dx)=-\mathbb{E}(X^2),$$
as in 271E.
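As an informal illustration of 285G, the first two moments of a finitely supported distribution can be recovered from finite differences of its characteristic function at 0. This is a sketch under stated assumptions: the dictionary representation of a distribution, the function names and the step size `h` are all choices made for this example, and finite differences are only a numerical stand-in for the derivatives in the corollary.

```python
import cmath


def char_fn(dist, y):
    """Characteristic function of a finitely supported distribution.

    `dist` maps each atom x to its probability; this representation is an
    assumption of the example.
    """
    return sum(p * cmath.exp(1j * y * x) for x, p in dist.items())


def moments_from_char_fn(dist, h=1e-4):
    """Estimate E(X) and E(X^2) from phi'(0) = i E(X) and phi''(0) = -E(X^2)."""
    first = (char_fn(dist, h) - char_fn(dist, -h)) / (2 * h)
    second = (char_fn(dist, h) - 2 * char_fn(dist, 0.0) + char_fn(dist, -h)) / h ** 2
    mean = (first / 1j).real
    second_moment = (-second).real
    return mean, second_moment
```

With `dist = {-1: 0.2, 0: 0.5, 2: 0.3}` the exact values are $\mathbb{E}(X)=0.4$ and $\mathbb{E}(X^2)=1.4$, and the estimates should be close to these.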
285H Remark Observe that there is no result corresponding to 283Cg ('$\lim_{|y|\to\infty}\hat f(y)=0$'). If $\nu$ is the Dirac measure on $\mathbb{R}$ concentrated at $0$, that is, the distribution of a random variable which is zero almost everywhere, then $\varphi(y)=1$ for every $y$.
285I Proposition Let $X_1,\ldots,X_n$ be independent real-valued random variables, with characteristic functions $\varphi_1,\ldots,\varphi_n$. Let $\varphi$ be the characteristic function of their sum $X=X_1+\ldots+X_n$. Then
$$\varphi(y)=\prod_{j=1}^n\varphi_j(y)$$
for every $y\in\mathbb{R}$.
proof Let $y\in\mathbb{R}$. By 272E, the variables
$$Y_j=e^{iyX_j}$$
are independent, so by 272R
$$\varphi(y)=\mathbb{E}(e^{iyX})=\mathbb{E}(e^{iy(X_1+\ldots+X_n)})=\mathbb{E}\Bigl(\prod_{j=1}^nY_j\Bigr)=\prod_{j=1}^n\mathbb{E}(Y_j)=\prod_{j=1}^n\varphi_j(y),$$
as required.
Remark See also 285R below.
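285J There is an inversion theorem for characteristic functions, corresponding to 283F; I give it in 285Xh, with an $r$-dimensional version in 285Yb. However, this does not seem to be as useful as the following group of results.
Lemma Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, and $\varphi$ its characteristic function. Then for any $j\le r$ and $a>0$,
$$\nu\{x:|\xi_j|\ge a\}\le 7a\int_0^{1/a}\bigl(1-\mathrm{Re}\,\varphi(te_j)\bigr)dt,$$
where $e_j\in\mathbb{R}^r$ is the $j$th unit vector.
proof We have
$$7a\int_0^{1/a}\bigl(1-\mathrm{Re}\,\varphi(te_j)\bigr)dt=7a\int_0^{1/a}\Bigl(1-\mathrm{Re}\int_{\mathbb{R}^r}e^{it\xi_j}\,\nu(dx)\Bigr)dt$$
$$=7a\int_0^{1/a}\int_{\mathbb{R}^r}\bigl(1-\cos(t\xi_j)\bigr)\nu(dx)\,dt$$
$$=7a\int_{\mathbb{R}^r}\int_0^{1/a}\bigl(1-\cos(t\xi_j)\bigr)dt\,\nu(dx)$$
(because $(x,t)\mapsto 1-\cos(t\xi_j)$ is bounded and $\nu\mathbb{R}^r\cdot\frac1a$ is finite)
$$=7a\int_{\mathbb{R}^r}\Bigl(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
$$\ge 7a\int_{|\xi_j|\ge a}\Bigl(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\nu(dx)$$
(because $\frac1\xi\sin\frac\xi a\le\frac1a$ for every $\xi\ne 0$)
$$\ge\nu\{x:|\xi_j|\ge a\},$$
because
$$\frac{\sin\eta}{\eta}\le\frac{\sin 1}{1}\le\frac67\text{ if }\eta\ge 1,$$
so
$$a\Bigl(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Bigr)\ge\frac17\text{ if }|\xi_j|\ge a.$$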
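The inequality of 285J can be tested on a finitely supported probability measure on $\mathbb{R}$ (the case $r=1$). This is a hedged sketch rather than anything from the text: the dictionary representation and the function names are assumptions, and the inner integral is evaluated in closed form, using $\int_0^{1/a}(1-\cos tx)\,dt=\frac1a-\frac1x\sin\frac xa$ for $x\ne 0$ exactly as in the proof.

```python
import math


def tail_mass(measure, a):
    """nu{x : |x| >= a} for a finitely supported measure {atom: mass}."""
    return sum(p for x, p in measure.items() if abs(x) >= a)


def lemma_bound(measure, a):
    """7a * integral_0^{1/a} (1 - Re phi(t)) dt, evaluated atom by atom.

    An atom at 0 contributes nothing, since 1 - cos(0) = 0.
    """
    total = 0.0
    for x, p in measure.items():
        if x != 0:
            total += p * (1.0 / a - math.sin(x / a) / x)
    return 7.0 * a * total
```

For the measure `{0: 0.5, 1: 0.3, -3: 0.2}` and `a = 2`, the tail mass is `0.2`, while the bound comes to roughly `0.56`.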
285K Characteristic functions and the vague topology The time has come to return to ideas mentioned briefly in 274L. Fix $r\ge 1$ and let $P$ be the set of all Radon probability measures on $\mathbb{R}^r$. For any bounded continuous function $h:\mathbb{R}^r\to\mathbb{R}$, define $\rho_h:P\times P\to\mathbb{R}$ by setting
$$\rho_h(\nu,\nu')=\Bigl|\int h\,d\nu-\int h\,d\nu'\Bigr|$$
for $\nu$, $\nu'\in P$. Then the vague topology on $P$ is the topology generated by the pseudometrics $\rho_h$ (274Ld).
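285L Theorem Let $\nu$, $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be Radon probability measures on $\mathbb{R}^r$, with characteristic functions $\varphi$, $\langle\varphi_n\rangle_{n\in\mathbb{N}}$. Then the following are equiveridical:
(i) $\nu=\lim_{n\to\infty}\nu_n$ for the vague topology;
(ii) $\int h\,d\nu=\lim_{n\to\infty}\int h\,d\nu_n$ for every bounded continuous $h:\mathbb{R}^r\to\mathbb{R}$;
(iii) $\lim_{n\to\infty}\varphi_n(y)=\varphi(y)$ for every $y\in\mathbb{R}^r$.
proof (a) The equivalence of (i) and (ii) is virtually the definition of the vague topology; we have
$\lim_{n\to\infty}\nu_n=\nu$ for the vague topology
$\iff\lim_{n\to\infty}\rho_h(\nu_n,\nu)=0$ for every bounded continuous $h$
(2A3Mc)
$\iff\lim_{n\to\infty}|\int h\,d\nu_n-\int h\,d\nu|=0$ for every bounded continuous $h$.
(b) Next, (ii) obviously implies (iii), because
$$\mathrm{Re}\,\varphi(y)=\int h_y\,d\nu=\lim_{n\to\infty}\int h_y\,d\nu_n=\lim_{n\to\infty}\mathrm{Re}\,\varphi_n(y),$$
setting $h_y(x)=\cos x\cdot y$ for each $x$, and similarly
$$\mathrm{Im}\,\varphi(y)=\lim_{n\to\infty}\mathrm{Im}\,\varphi_n(y)$$
for every $y\in\mathbb{R}^r$.
(c) So we are left to prove that (iii)$\Rightarrow$(ii). I start by showing that, given $\epsilon>0$, there is a closed bounded set $K$ such that
$$\nu_n(\mathbb{R}^r\setminus K)\le\epsilon\text{ for every }n\in\mathbb{N}.$$
PPP We know that $\varphi(0)=1$ and that $\varphi$ is continuous at $0$ (285Fb). Let $a>0$ be so large that for every $j\le r$, $|t|\le 1/a$ we have
$$1-\mathrm{Re}\,\varphi(te_j)\le\frac{\epsilon}{14r},$$
writing $e_j$ for the $j$th unit vector, as in 285J. Then
$$7a\int_0^{1/a}\bigl(1-\mathrm{Re}\,\varphi(te_j)\bigr)dt\le\frac{\epsilon}{2r}$$
for each $j\le r$. By Lebesgue's Dominated Convergence Theorem (since of course the functions $t\mapsto 1-\mathrm{Re}\,\varphi_n(te_j)$ are uniformly bounded on $[0,\frac1a]$), there is an $n_0\in\mathbb{N}$ such that
$$7a\int_0^{1/a}\bigl(1-\mathrm{Re}\,\varphi_n(te_j)\bigr)dt\le\frac{\epsilon}{r}$$
for every $j\le r$, $n\ge n_0$. But 285J tells us that now
$$\nu_n\{x:|\xi_j|\ge a\}\le\frac{\epsilon}{r}$$
for every $j\le r$, $n\ge n_0$. On the other hand, there is surely a $b\ge a$ such that
$$\nu_n\{x:|\xi_j|\ge b\}\le\frac{\epsilon}{r}$$
for every $j\le r$, $n<n_0$. So, setting $K=\{x:|\xi_j|\le b\text{ for every }j\le r\}$,
$$\nu_n(\mathbb{R}^r\setminus K)\le\epsilon$$
for every $n\in\mathbb{N}$, as required. QQQ
(d) Now take any bounded continuous $h:\mathbb{R}^r\to\mathbb{R}$ and $\epsilon>0$. Set $M=1+\sup_{x\in\mathbb{R}^r}|h(x)|$, and let $K$ be a bounded closed set such that
$$\nu_n(\mathbb{R}^r\setminus K)\le\frac{\epsilon}{M}\text{ for every }n\in\mathbb{N},\qquad\nu(\mathbb{R}^r\setminus K)\le\frac{\epsilon}{M},$$
using (c) just above. By the Stone-Weierstrass theorem (281K) there are $y_0,\ldots,y_m\in\mathbb{Q}^r$ and $c_0,\ldots,c_m\in\mathbb{C}$ such that
$$|h(x)-g(x)|\le\epsilon\text{ for every }x\in K,$$
$$|g(x)|\le M\text{ for every }x\in\mathbb{R}^r,$$
writing $g(x)=\sum_{k=0}^mc_ke^{iy_k\cdot x}$ for $x\in\mathbb{R}^r$. Now
$$\lim_{n\to\infty}\int g\,d\nu_n=\lim_{n\to\infty}\sum_{k=0}^mc_k\varphi_n(y_k)=\sum_{k=0}^mc_k\varphi(y_k)=\int g\,d\nu.$$
On the other hand, for every $n\in\mathbb{N}$,
$$\Bigl|\int g\,d\nu_n-\int h\,d\nu_n\Bigr|\le\int_K|g-h|\,d\nu_n+2M\nu_n(\mathbb{R}^r\setminus K)\le 3\epsilon,$$
and similarly $|\int g\,d\nu-\int h\,d\nu|\le 3\epsilon$. Consequently
$$\limsup_{n\to\infty}\Bigl|\int h\,d\nu_n-\int h\,d\nu\Bigr|\le 6\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty}\int h\,d\nu_n=\int h\,d\nu,$$
and (ii) is true.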
285M Corollary (a) Let $\nu$, $\nu'$ be two Radon probability measures on $\mathbb{R}^r$ with the same characteristic functions. Then they are equal.
(b) Let $(X_1,\ldots,X_r)$ and $(Y_1,\ldots,Y_r)$ be two families of real-valued random variables. If
$$\mathbb{E}(e^{i\eta_1X_1+\ldots+i\eta_rX_r})=\mathbb{E}(e^{i\eta_1Y_1+\ldots+i\eta_rY_r})$$
for all $\eta_1,\ldots,\eta_r\in\mathbb{R}$, then $(X_1,\ldots,X_r)$ has the same joint distribution as $(Y_1,\ldots,Y_r)$.
proof (a) Applying 285L with $\nu_n=\nu'$ for every $n$, we see that $\int h\,d\nu'=\int h\,d\nu$ for every bounded continuous $h:\mathbb{R}^r\to\mathbb{R}$. By 256D(iv), $\nu=\nu'$.
(b) Apply (a) with $\nu$, $\nu'$ the two joint distributions.
285N Remarks Probably the most important application of this theorem is to the standard proof of the Central Limit Theorem. I sketch the ideas in 285Xn and 285Yj-285Ym; details may be found in most serious probability texts; two on my shelf are Shiryayev 84, §III.4, and Feller 66, §XV.6. However, to get the full strength of Lindeberg's version of the Central Limit Theorem we have to work quite hard, and I therefore propose to illustrate the method with a version of Poisson's theorem (285Q) instead. I begin with two lemmas which are very frequently used in results of this kind.
285O Lemma Let $c_0,\ldots,c_n$, $d_0,\ldots,d_n$ be complex numbers of modulus at most 1. Then
$$\Bigl|\prod_{k=0}^nc_k-\prod_{k=0}^nd_k\Bigr|\le\sum_{k=0}^n|c_k-d_k|.$$
proof Induce on $n$. The case $n=0$ is trivial. For the case $n=1$ we have
$$|c_0c_1-d_0d_1|=|c_0(c_1-d_1)+(c_0-d_0)d_1|\le|c_0||c_1-d_1|+|c_0-d_0||d_1|\le|c_1-d_1|+|c_0-d_0|,$$
which is what we need. For the inductive step to $n+1$, we have
$$\Bigl|\prod_{k=0}^{n+1}c_k-\prod_{k=0}^{n+1}d_k\Bigr|\le\Bigl|\prod_{k=0}^nc_k-\prod_{k=0}^nd_k\Bigr|+|c_{n+1}-d_{n+1}|$$
(by the case just done, because $c_{n+1}$, $d_{n+1}$, $\prod_{k=0}^nc_k$ and $\prod_{k=0}^nd_k$ all have modulus at most 1)
$$\le\sum_{k=0}^n|c_k-d_k|+|c_{n+1}-d_{n+1}|$$
(by the inductive hypothesis)
$$=\sum_{k=0}^{n+1}|c_k-d_k|,$$
so the induction continues.
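A quick randomized check of the inequality in 285O may help to fix ideas. This is only a sketch: the sampling scheme, the fixed seed and the function names are assumptions made for the example, and passing such a check is of course no substitute for the induction above.

```python
import cmath
import random


def random_unit_disc_point(rng):
    """A complex number of modulus at most 1 (polar sampling; illustrative only)."""
    return rng.random() * cmath.exp(2j * cmath.pi * rng.random())


def product(values):
    """Product of a list of complex numbers."""
    result = 1 + 0j
    for v in values:
        result *= v
    return result


def lemma_gap(cs, ds):
    """sum |c_k - d_k| minus |prod c_k - prod d_k|; non-negative by 285O."""
    lhs = abs(product(cs) - product(ds))
    rhs = sum(abs(c - d) for c, d in zip(cs, ds))
    return rhs - lhs
```

Calling `lemma_gap` on two equal-length lists of points of the closed unit disc should always return a non-negative number, up to rounding error.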
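285P Lemma Let $M$, $\epsilon>0$. Then there are $\eta>0$ and $y_0,\ldots,y_n\in\mathbb{R}$ such that whenever $X$, $Z$ are two real-valued random variables with $\mathbb{E}(|X|)\le M$, $\mathbb{E}(|Z|)\le M$ and $|\varphi_X(y_j)-\varphi_Z(y_j)|\le\eta$ for every $j\le n$, then $F_X(a)\le F_Z(a+\epsilon)+\epsilon$ for every $a\in\mathbb{R}$, where I write $\varphi_X$ for the characteristic function of $X$ and $F_X$ for the distribution function of $X$.
proof Set $\delta=\frac{\epsilon}{7}>0$, $b=M/\delta$.
(a) Define $h_0:\mathbb{R}\to[0,1]$ by setting
$$h_0(x)=1\text{ if }x\le 0,\quad h_0(x)=1-x/\delta\text{ if }0\le x\le\delta,\quad h_0(x)=0\text{ if }x\ge\delta.$$
Then $h_0$ is continuous. Let $m$ be the integral part of $b/\delta$, and for $-m\le k\le m+1$ set $h_k(x)=h_0(x-k\delta)$.
By the Stone-Weierstrass theorem (281K), there are $y_0,\ldots,y_n\in\mathbb{R}$ and $c_0,\ldots,c_n\in\mathbb{C}$ such that, writing $g_0(x)=\sum_{j=0}^nc_je^{iy_jx}$,
$$|h_0(x)-g_0(x)|\le\delta\text{ for every }x\in[-b-(m+1)\delta,\,b+m\delta],$$
$$|g_0(x)|\le 1\text{ for every }x\in\mathbb{R}.$$
For $-m\le k\le m+1$, set
$$g_k(x)=g_0(x-k\delta)=\sum_{j=0}^nc_je^{-iy_jk\delta}e^{iy_jx}.$$
Set $\eta=\delta/(1+\sum_{j=0}^n|c_j|)>0$.
(b) Now suppose that $X$, $Z$ are random variables such that $\mathbb{E}(|X|)\le M$, $\mathbb{E}(|Z|)\le M$ and $|\varphi_X(y_j)-\varphi_Z(y_j)|\le\eta$ for every $j\le n$. Then for any $k$ we have
$$\mathbb{E}(g_k(X))=\mathbb{E}\Bigl(\sum_{j=0}^nc_je^{-iy_jk\delta}e^{iy_jX}\Bigr)=\sum_{j=0}^nc_je^{-iy_jk\delta}\varphi_X(y_j),$$
and similarly
$$\mathbb{E}(g_k(Z))=\sum_{j=0}^nc_je^{-iy_jk\delta}\varphi_Z(y_j),$$
so
$$|\mathbb{E}(g_k(X))-\mathbb{E}(g_k(Z))|\le\sum_{j=0}^n|c_j||\varphi_X(y_j)-\varphi_Z(y_j)|\le\sum_{j=0}^n|c_j|\eta\le\delta.$$
Next,
$$|h_k(x)-g_k(x)|\le\delta\text{ for every }x\in[-b-(m+1)\delta+k\delta,\,b+m\delta+k\delta]\supseteq[-b,b],$$
$$|h_k(x)-g_k(x)|\le 2\text{ for every }x,$$
$$\Pr(|X|\ge b)\le\frac Mb=\delta,$$
so $\mathbb{E}(|h_k(X)-g_k(X)|)\le 3\delta$; and similarly $\mathbb{E}(|h_k(Z)-g_k(Z)|)\le 3\delta$. Putting these together,
$$|\mathbb{E}(h_k(X))-\mathbb{E}(h_k(Z))|\le 7\delta=\epsilon$$
whenever $-m\le k\le m+1$.
(c) Now suppose that $-b\le a\le b$. Then there is a $k$ such that $-m\le k\le m+1$ and $a\le k\delta\le a+\delta$. Since
$$\chi\,]{-\infty},a]\le\chi\,]{-\infty},k\delta]\le h_k\le\chi\,]{-\infty},(k+1)\delta]\le\chi\,]{-\infty},a+2\delta],$$
we must have
$$\Pr(X\le a)\le\mathbb{E}(h_k(X)),$$
$$\mathbb{E}(h_k(Z))\le\Pr(Z\le a+2\delta)\le\Pr(Z\le a+\epsilon).$$
But this means that
$$\Pr(X\le a)\le\mathbb{E}(h_k(X))\le\mathbb{E}(h_k(Z))+\epsilon\le\Pr(Z\le a+\epsilon)+\epsilon$$
whenever $a\in[-b,b]$.
(d) As for the cases $a\ge b$, $a\le -b$, we surely have
$$b(1-F_Z(b))=b\Pr(Z>b)\le\mathbb{E}(|Z|)\le M,$$
so if $a\ge b$ then
$$F_X(a)\le 1\le F_Z(a)+1-F_Z(b)\le F_Z(a)+\frac Mb=F_Z(a)+\delta\le F_Z(a+\epsilon)+\epsilon.$$
Similarly,
$$bF_X(-b)\le\mathbb{E}(|X|)\le M,$$
so
$$F_X(a)\le\delta\le F_Z(a+\epsilon)+\epsilon$$
for every $a\le -b$. This completes the proof.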
285Q Law of Rare Events: Theorem For any $M\ge 0$ and $\epsilon>0$ there is a $\delta>0$ such that whenever $X_0,\ldots,X_n$ are independent $\{0,1\}$-valued random variables with $\Pr(X_k=1)=p_k\le\delta$ for every $k\le n$, and $\sum_{k=0}^np_k=\lambda\le M$, and $X=X_0+\ldots+X_n$, then
$$\Bigl|\Pr(X=m)-\frac{\lambda^m}{m!}e^{-\lambda}\Bigr|\le\epsilon$$
for every $m\in\mathbb{N}$.
proof (a) We should begin by calculating some characteristic functions. First, the characteristic function $\varphi_k$ of $X_k$ will be given by
$$\varphi_k(y)=(1-p_k)e^{iy0}+p_ke^{iy1}=1+p_k(e^{iy}-1).$$
Next, if $Z$ is a Poisson random variable with parameter $\lambda$ (that is, if $\Pr(Z=m)=\lambda^me^{-\lambda}/m!$ for every $m\in\mathbb{N}$; all you need to know at this point about the Poisson distribution is that $\sum_{m=0}^{\infty}\lambda^me^{-\lambda}/m!=1$), then its characteristic function $\varphi_Z$ is given by
$$\varphi_Z(y)=\sum_{m=0}^{\infty}\frac{\lambda^m}{m!}e^{-\lambda}e^{iym}=e^{-\lambda}\sum_{m=0}^{\infty}\frac{(\lambda e^{iy})^m}{m!}=e^{-\lambda}e^{\lambda e^{iy}}=e^{\lambda(e^{iy}-1)}.$$
(b) Before getting down to $\delta$s and $\eta$s, I show how to estimate $\varphi_X(y)-\varphi_Z(y)$. We know that
$$\varphi_X(y)=\prod_{k=0}^n\varphi_k(y)$$
(using 285I), while
$$\varphi_Z(y)=\prod_{k=0}^ne^{p_k(e^{iy}-1)}.$$
Because $\varphi_k(y)$, $e^{p_k(e^{iy}-1)}$ all have modulus at most 1 (we have
$$|e^{p_k(e^{iy}-1)}|=e^{-p_k(1-\cos y)}\le 1,)$$
285O tells us that
$$|\varphi_X(y)-\varphi_Z(y)|\le\sum_{k=0}^n|\varphi_k(y)-e^{p_k(e^{iy}-1)}|=\sum_{k=0}^n|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|.$$
(c) So we have a little bit of analysis to do. To estimate $|e^z-1-z|$ where $\mathrm{Re}\,z\le 0$, consider the function
$$g(t)=\mathrm{Re}(c(e^{tz}-1-tz))$$
where $|c|=1$. We have $g(0)=g'(0)=0$ and
$$|g''(t)|=|\mathrm{Re}(c(z^2e^{tz}))|\le|c||z^2||e^{tz}|\le|z|^2$$
for every $t\ge 0$, so that
$$|g(1)|\le\frac12|z|^2$$
by the (real-valued) Taylor theorem with remainder, or otherwise. As $c$ is arbitrary,
$$|e^z-1-z|\le\frac12|z|^2$$
whenever $\mathrm{Re}\,z\le 0$. In particular,
$$|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|\le\frac12p_k^2|e^{iy}-1|^2\le 2p_k^2$$
for each $k$, and
$$|\varphi_X(y)-\varphi_Z(y)|\le\sum_{k=0}^n|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|\le 2\sum_{k=0}^np_k^2$$
for each $y\in\mathbb{R}$.
(d) Now for the detailed estimates. Given $M\ge 0$ and $\epsilon>0$, let $\eta>0$ and $y_0,\ldots,y_l\in\mathbb{R}$ be such that
$$\Pr(X\le a)\le\Pr(Z\le a+\tfrac12)+\frac\epsilon2$$
whenever $X$, $Z$ are real-valued random variables, $\mathbb{E}(|X|)\le M$, $\mathbb{E}(|Z|)\le M$ and $|\varphi_X(y_j)-\varphi_Z(y_j)|\le\eta$ for every $j\le l$ (285P). Take $\delta=\eta/(2M+1)$ and suppose that $X_0,\ldots,X_n$ are independent $\{0,1\}$-valued random variables with $\Pr(X_k=1)=p_k\le\delta$ for every $k\le n$, $\lambda=\sum_{k=0}^np_k\le M$. Set $X=X_0+\ldots+X_n$ and let $Z$ be a Poisson random variable with parameter $\lambda$; then by the arguments of (a)-(c),
$$|\varphi_X(y)-\varphi_Z(y)|\le 2\sum_{k=0}^np_k^2\le 2\delta\sum_{k=0}^np_k=2\delta\lambda\le\eta$$
for every $y\in\mathbb{R}$. Also
$$\mathbb{E}(|X|)=\mathbb{E}(X)=\sum_{k=0}^np_k=\lambda\le M,$$
$$\mathbb{E}(|Z|)=\mathbb{E}(Z)=\sum_{m=0}^{\infty}m\frac{\lambda^m}{m!}e^{-\lambda}=e^{-\lambda}\sum_{m=1}^{\infty}\frac{\lambda^m}{(m-1)!}=e^{-\lambda}\sum_{m=0}^{\infty}\frac{\lambda^{m+1}}{m!}=\lambda\le M.$$
So
$$\Pr(X\le a)\le\Pr(Z\le a+\tfrac12)+\frac\epsilon2,$$
$$\Pr(Z\le a)\le\Pr(X\le a+\tfrac12)+\frac\epsilon2$$
for every $a$. But as both $X$ and $Z$ take all their values in $\mathbb{N}$,
$$|\Pr(X\le m)-\Pr(Z\le m)|\le\frac\epsilon2$$
for every $m\in\mathbb{N}$, and
$$\Bigl|\Pr(X=m)-\frac{\lambda^m}{m!}e^{-\lambda}\Bigr|=|\Pr(X=m)-\Pr(Z=m)|\le\epsilon$$
for every $m\in\mathbb{N}$, as required.
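To see the Law of Rare Events at work, one can compute the exact distribution of a sum of independent $\{0,1\}$-valued variables and compare it pointwise with the Poisson probabilities. The sketch below is illustrative only: the function names are hypothetical, the exact distribution is built by repeated convolution, and the comparison is restricted to $m\le n+1$ (the number of summands), which is an assumption of the example rather than of the theorem.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact probabilities Pr(X = m), m = 0..len(ps), for X a sum of
    independent {0,1}-valued variables with Pr(X_k = 1) = ps[k]."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            new[m] += q * (1 - p)      # X_k = 0
            new[m + 1] += q * p        # X_k = 1
        pmf = new
    return pmf


def poisson_pmf_list(lam, count):
    """Poisson probabilities lam^m e^{-lam} / m! for m = 0..count-1,
    built iteratively to avoid huge factorials."""
    probs = [math.exp(-lam)]
    for m in range(1, count):
        probs.append(probs[-1] * lam / m)
    return probs


def max_pointwise_gap(ps):
    """max over m <= len(ps) of |Pr(X = m) - Poisson(lam) mass at m|."""
    exact = bernoulli_sum_pmf(ps)
    approx = poisson_pmf_list(sum(ps), len(exact))
    return max(abs(e - a) for e, a in zip(exact, approx))
```

With two hundred summands each of probability `0.01` (so $\lambda=2$) the gap should be small, while with four summands of probability `0.5` (also $\lambda=2$) it is about `0.1`, reflecting the requirement $p_k\le\delta$ in the theorem.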
285R Convolutions Recall from 257A that if $\nu$, $\tilde\nu$ are Radon probability measures on $\mathbb{R}^r$ then they have a convolution $\nu*\tilde\nu$ defined by writing
$$(\nu*\tilde\nu)(E)=(\nu\times\tilde\nu)\{(x,y):x+y\in E\}$$
for every Borel set $E\subseteq\mathbb{R}^r$, which is also a Radon probability measure. We can readily compute the characteristic function $\varphi_{\nu*\tilde\nu}$ from 257B: we have
$$\varphi_{\nu*\tilde\nu}(y)=\int e^{iy\cdot x}\,(\nu*\tilde\nu)(dx)=\int\!\!\int e^{iy\cdot(x+x')}\,\nu(dx)\,\tilde\nu(dx')$$
$$=\int\!\!\int e^{iy\cdot x}e^{iy\cdot x'}\,\nu(dx)\,\tilde\nu(dx')=\int e^{iy\cdot x}\,\nu(dx)\int e^{iy\cdot x'}\,\tilde\nu(dx')=\varphi_\nu(y)\varphi_{\tilde\nu}(y).$$
(Thus convolution of measures corresponds to pointwise multiplication of characteristic functions, just as convolution of functions corresponds to pointwise multiplication of Fourier transforms.) Recalling that the sum of independent random variables corresponds to convolution of their distributions (272T), this gives another way of looking at 285I. Remember also that if $\nu$, $\tilde\nu$ have Radon-Nikodým derivatives $f$, $\tilde f$ with respect to Lebesgue measure then $f*\tilde f$ is a Radon-Nikodým derivative of $\nu*\tilde\nu$ (257F).
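For finitely supported measures on $\mathbb{R}$ the multiplicative property of 285R can be verified directly. The following sketch makes several assumptions for illustration: measures are represented as dictionaries from atoms to masses, and the function names are hypothetical.

```python
import cmath


def char_fn(measure, y):
    """Characteristic function of a finitely supported measure {atom: mass}."""
    return sum(p * cmath.exp(1j * y * x) for x, p in measure.items())


def convolve(mu, nu):
    """Convolution of two finitely supported measures: the image of the
    product measure under (x, t) -> x + t."""
    out = {}
    for x, p in mu.items():
        for t, q in nu.items():
            out[x + t] = out.get(x + t, 0.0) + p * q
    return out
```

For example, with `mu = {0: 0.5, 1: 0.5}` and `nu = {-1: 0.25, 2: 0.75}`, the value `char_fn(convolve(mu, nu), y)` should match `char_fn(mu, y) * char_fn(nu, y)` for every real `y`, up to rounding.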
285S The vague topology and pointwise convergence of characteristic functions In 285L we saw that a sequence $\langle\nu_n\rangle_{n\in\mathbb{N}}$ of Radon probability measures on $\mathbb{R}^r$ converges in the vague topology to a Radon probability measure $\nu$ if and only if
$$\lim_{n\to\infty}\int e^{iy\cdot x}\,\nu_n(dx)=\int e^{iy\cdot x}\,\nu(dx)$$
for every $y\in\mathbb{R}^r$; that is, iff
$$\lim_{n\to\infty}\rho'_y(\nu_n,\nu)=0\text{ for every }y\in\mathbb{R}^r,$$
writing
$$\rho'_y(\nu,\nu')=\Bigl|\int e^{iy\cdot x}\,\nu(dx)-\int e^{iy\cdot x}\,\nu'(dx)\Bigr|$$
for Radon probability measures $\nu$, $\nu'$ on $\mathbb{R}^r$ and $y\in\mathbb{R}^r$. It is natural to ask whether the pseudometrics $\rho'_y$ actually define the vague topology. Writing $\mathfrak{T}$ for the vague topology and $\mathfrak{S}$ for the topology defined by $\{\rho'_y:y\in\mathbb{R}^r\}$, we surely have $\mathfrak{S}\subseteq\mathfrak{T}$, just because every $\rho'_y$ is one of the pseudometrics used in the definition of $\mathfrak{T}$. Also we know that $\mathfrak{S}$ and $\mathfrak{T}$ give the same convergent sequences, and incidentally that $\mathfrak{T}$ is metrizable (see 285Xq). But all this does not quite amount to saying that the two topologies are the same, and indeed they are not, as the next result shows.
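285T Proposition Let $y_0,\ldots,y_n\in\mathbb{R}$ and $\eta>0$. Then there are infinitely many $m\in\mathbb{N}$ such that $|1-e^{iy_km}|\le\eta$ for every $k\le n$.
proof Let $\eta_1,\ldots,\eta_r\in\mathbb{R}$ be such that $1=\eta_0,\eta_1,\ldots,\eta_r$ are linearly independent over $\mathbb{Q}$ and every $y_k/2\pi$ is a linear combination of the $\eta_j$ over $\mathbb{Q}$; say $y_k = 2\pi\sum_{j=0}^r q_{kj}\eta_j$ where every $q_{kj}\in\mathbb{Q}$. Express the $q_{kj}$ as $p_{kj}/p$ where each $p_{kj}\in\mathbb{Z}$ and $p\in\mathbb{N}\setminus\{0\}$. Set $M = \max_{k\le n}\sum_{j=0}^r|p_{kj}|$.
Take any $m_0\in\mathbb{N}$ and let $\delta>0$ be such that $|1-e^{2\pi ix}|\le\eta$ whenever $|x|\le 2\pi M\delta$. By Weyl's Equidistribution Theorem (281N), there are infinitely many $m$ such that $<m\eta_j>\ \le\delta$ whenever $1\le j\le r$, writing $<x>$ for the fractional part of $x$; in particular, there is such an $m\ge m_0$. Let $\tilde m_j$ be the integral part of $m\eta_j$, so that $|m\eta_j-\tilde m_j|\le\delta$ for $0\le j\le r$. Then
\[ \Bigl|mpy_k - 2\pi\sum_{j=0}^r p_{kj}\tilde m_j\Bigr| \le 2\pi\sum_{j=0}^r|p_{kj}||m\eta_j-\tilde m_j| \le 2\pi M\delta, \]
so that
\[ |1-e^{iy_kmp}| = \Bigl|1-\exp\Bigl(i\bigl(mpy_k - 2\pi\sum_{j=0}^r p_{kj}\tilde m_j\bigr)\Bigr)\Bigr| \le \eta \]
for every $k\le n$ (the two sides agree because $\sum_{j=0}^r p_{kj}\tilde m_j$ is an integer). As $mp\ge m_0$ and $m_0$ is arbitrary, this proves the result.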
285U Corollary The topologies S and T on the space of Radon probability measures on $\mathbb{R}$, as described in 285S, are different.
proof Let $\delta_x$ be the Dirac measure on $\mathbb{R}$ concentrated at $x$. By 285T, every member of S which contains $\delta_0$ also contains $\delta_m$ for infinitely many $m\in\mathbb{N}$. On the other hand, the set
\[ G = \{\nu : \int e^{-x^2}\nu(dx) > \tfrac12\} \]
is a member of T, containing $\delta_0$, which does not contain $\delta_m$ for any integer $m\ne0$. So $G\in$ T $\setminus$ S and T $\ne$ S.
285X Basic exercises (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$, where $r\ge1$, and suppose that $\int\|x\|\nu(dx)<\infty$. Show that the characteristic function $\varphi$ of $\nu$ is differentiable (in the full sense of 262Fa) and that $\frac{\partial\varphi}{\partial\eta_j}(y) = i\int\xi_je^{iy\,.\,x}\nu(dx)$ for every $j\le r$, $y\in\mathbb{R}^r$, using $\xi_j$, $\eta_j$ to represent the coordinates of $x$ and $y$ as usual.
>(b) Let $\mathbf{X}=(X_1,\ldots,X_r)$ be a family of real-valued random variables, with characteristic function $\varphi_{\mathbf{X}}$. Show that the characteristic function $\varphi_{X_j}$ of $X_j$ is given by
\[ \varphi_{X_j}(y) = \varphi_{\mathbf{X}}(ye_j) \text{ for every } y\in\mathbb{R}, \]
where $e_j$ is the $j$th unit vector of $\mathbb{R}^r$.
>(c) Let $X$ be a real-valued random variable and $\varphi_X$ its characteristic function. Show that
\[ \varphi_{aX+b}(y) = e^{iyb}\varphi_X(ay) \]
for any $a$, $b$, $y\in\mathbb{R}$.
(d) Let $X$ be a real-valued random variable and $\varphi$ its characteristic function.
(i) Show that for any integrable complex-valued function $h$ on $\mathbb{R}$,
\[ E(\hat h(X)) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\varphi(-y)h(y)dy, \]
writing $\hat h$ for the Fourier transform of $h$.
(ii) Show that for any rapidly decreasing test function $h$,
\[ E(h(X)) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\varphi(y)\hat h(y)dy. \]
(e) Let $\nu$ be a Radon probability measure on $\mathbb{R}$, and suppose that its characteristic function $\varphi$ is square-integrable. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure and that its Radon-Nikodým derivatives are also square-integrable. (Hint: use 284O to find a square-integrable $f$ such that $\int h\times f = \frac{1}{\sqrt{2\pi}}\int\varphi\times\hat h$ for every rapidly decreasing test function $h$, and ideas from the proof of 284G to show that $\int_a^bf = \nu\,]a,b[$ whenever $a<b$ in $\mathbb{R}$.)
>(f) Let $\mathbf{X}=(X_1,\ldots,X_r)$ be a family of real-valued random variables with characteristic function $\varphi_{\mathbf{X}}$. Suppose that $\varphi_{\mathbf{X}}$ is expressible in the form
\[ \varphi_{\mathbf{X}}(y) = \prod_{j=1}^r\varphi_j(\eta_j) \]
for some functions $\varphi_1,\ldots,\varphi_r$, writing $y=(\eta_1,\ldots,\eta_r)$ as usual. Show that $X_1,\ldots,X_r$ are independent. (Hint: show that the $\varphi_j$ must be the characteristic functions of the $X_j$; now show that the distribution of $\mathbf{X}$ has the same characteristic function as the product of the distributions of the $X_j$.)
(g) Let $X_1$, $X_2$ be independent real-valued random variables with the same distribution, and $\varphi$ the characteristic function of $X_1-X_2$. Show that $\varphi(t)=\varphi(-t)\ge0$ for every $t\in\mathbb{R}$.
(h) Let $\nu$ be a Radon probability measure on $\mathbb{R}$, with characteristic function $\varphi$. Show that
\[ \tfrac12(\nu[c,d]+\nu\,]c,d[) = \frac{i}{2\pi}\lim_{a\to\infty}\int_{-a}^{a}\frac{e^{-idy}-e^{-icy}}{y}\varphi(y)dy \]
whenever $c\le d$ in $\mathbb{R}$. (Hint: use part (a) of the proof of 283F.)
(i) Let $X$ be a real-valued random variable and $\varphi_X$ its characteristic function. Show that
\[ \Pr(|X|\ge a) \le 7a\int_0^{1/a}(1-\mathrm{Re}(\varphi_X(y)))dy \]
for every $a>0$.
(j) We say that a set $Q$ of Radon probability measures on $\mathbb{R}$ is uniformly tight if for every $\epsilon>0$ there is an $M\ge0$ such that $\nu(\mathbb{R}\setminus[-M,M])\le\epsilon$ for every $\nu\in Q$. Show that if $Q$ is any uniformly tight family of Radon probability measures on $\mathbb{R}$, and $\epsilon>0$, then there are $\eta>0$ and $y_0,\ldots,y_n\in\mathbb{R}$ such that
\[ \nu\,]{-\infty},a] \le \nu'\,]{-\infty},a+\epsilon] + \epsilon \]
whenever $\nu$, $\nu'\in Q$ and $|\varphi_\nu(y_j)-\varphi_{\nu'}(y_j)|\le\eta$ for every $j\le n$, writing $\varphi_\nu$ for the characteristic function of $\nu$.
(k) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon probability measures on $\mathbb{R}$, and suppose that it converges for the vague topology to a Radon probability measure $\nu$. Show that $\{\nu_n : n\in\mathbb{N}\}$ is uniformly tight in the sense of 285Xj.
>(l) Let $\nu$, $\nu'$ be two totally finite Radon measures on $\mathbb{R}^r$ which agree on all closed half-spaces, that is, sets of the form $\{x : x\,.\,y\ge c\}$ for $c\in\mathbb{R}$, $y\in\mathbb{R}^r$. Show that $\nu=\nu'$. (Hint: reduce to the case $\nu\mathbb{R}^r=\nu'\mathbb{R}^r=1$ and use 285M.)
>(m) For $\sigma>0$, the Cauchy distribution with centre 0 and scale parameter $\sigma$ is the Radon probability measure $\nu_\sigma$ defined by the formula
\[ \nu_\sigma(E) = \frac{\sigma}{\pi}\int_E\frac{1}{\sigma^2+t^2}dt. \]
(i) Show that if $X$ is a random variable with distribution $\nu_\sigma$ then $\Pr(X\ge0)=\Pr(|X|\ge\sigma)=\frac12$. (ii) Show that the characteristic function of $\nu_\sigma$ is $y\mapsto e^{-\sigma|y|}$. (Hint: 283Xr.) (iii) Show that if $X$ and $Y$ are independent random variables with Cauchy distributions, both centered at 0 and with scale parameters $\sigma$, $\tau$ respectively, and $\alpha$, $\beta\in\mathbb{R}$ are not both 0, then $\alpha X+\beta Y$ has a Cauchy distribution centered at 0 and with scale parameter $|\alpha|\sigma+|\beta|\tau$. (iv) Show that if $X$ and $Y$ are independent normally distributed random variables with expectation 0 then $X/Y$ has a Cauchy distribution.
>(n) Let $X_1,X_2,\ldots$ be an independent identically distributed sequence of random variables, all of zero expectation and variance 1; let $\varphi$ be their common characteristic function. For each $n\ge1$, set $S_n = \frac{1}{\sqrt n}(X_1+\ldots+X_n)$.
(i) Show that the characteristic function $\varphi_n$ of $S_n$ is given by the formula $\varphi_n(y) = (\varphi(\frac{y}{\sqrt n}))^n$ for each $n$.
(ii) Show that $|\varphi_n(y)-e^{-y^2/2}| \le n|\varphi(\frac{y}{\sqrt n})-e^{-y^2/2n}|$.
(iii) Setting $h(y)=\varphi(y)-e^{-y^2/2}$, show that $h(0)=h'(0)=h''(0)=0$ and therefore that $\lim_{n\to\infty}nh(y/\sqrt n)=0$, so that $\lim_{n\to\infty}\varphi_n(y)=e^{-y^2/2}$ for every $y\in\mathbb{R}$.
(iv) Show that $\lim_{n\to\infty}\Pr(S_n\le a) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}dx$ for every $a\in\mathbb{R}$.
>(o) A random variable $X$ has a Poisson distribution with parameter $\lambda>0$ if $\Pr(X=n)=e^{-\lambda}\lambda^n/n!$ for every $n\in\mathbb{N}$. (i) Show that in this case $E(X)=\mathrm{Var}(X)=\lambda$. (ii) Show that if $X$ and $Y$ are independent random variables with Poisson distributions then $X+Y$ has a Poisson distribution. (iii) Find a proof of (ii) based on 285Q.
>(p) For $x\in\mathbb{R}^r$, let $\delta_x$ be the Dirac measure on $\mathbb{R}^r$ concentrated at $x$. Show that $\delta_x*\delta_y=\delta_{x+y}$ for all $x$, $y\in\mathbb{R}^r$.
(q) Let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $y\in\mathbb{R}^r$, set $\rho'_y(\nu,\nu')=|\varphi_\nu(y)-\varphi_{\nu'}(y)|$ for all $\nu$, $\nu'\in P$, writing $\varphi_\nu$ for the characteristic function of $\nu$. Set $\psi(x)=\frac{1}{(\sqrt{2\pi})^r}e^{-x\,.\,x/2}$ for $x\in\mathbb{R}^r$. Show that the vague topology on $P$ is defined by the family $\{\rho_\psi\}\cup\{\rho'_y : y\in\mathbb{Q}^r\}$, defining $\rho_\psi$ as in 285K, and is therefore metrizable. (Hint: 281K; cf. 285Xj.)
>(r) Let $\varphi:\mathbb{R}^r\to\mathbb{C}$ be the characteristic function of a Radon probability measure on $\mathbb{R}^r$. Show that $\varphi(0)=1$ and that $\sum_{j=0}^n\sum_{k=0}^nc_j\bar c_k\varphi(a_j-a_k)\ge0$ whenever $a_0,\ldots,a_n\in\mathbb{R}^r$ and $c_0,\ldots,c_n\in\mathbb{C}$. (Bochner's theorem states that these conditions are sufficient, as well as necessary, for $\varphi$ to be a characteristic function; see 445N in Volume 4.)
(s) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables and set $S_n=\sum_{j=0}^nX_j$ for each $n\in\mathbb{N}$. Suppose that the sequence $\langle\nu_{S_n}\rangle_{n\in\mathbb{N}}$ of distributions is convergent for the vague topology to a distribution. Show that $\langle S_n\rangle_{n\in\mathbb{N}}$ converges in measure, therefore a.e. (Hint: 285J, 273B.)
(t) Let $X$ be a normal random variable with expectation $a$ and variance $\sigma^2$. Show that $E(e^X)=\exp(a+\frac12\sigma^2)$.
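The convergence described in exercise 285Xn can be watched numerically in one very special case. The sketch below assumes, purely for illustration, that each $X_k$ takes the values $\pm1$ with probability $\frac12$, so that the common characteristic function is $\cos y$; the function names and the sample points are hypothetical choices, and the computation is no substitute for the argument the exercise asks for.

```python
import math

def phi(y):
    # Characteristic function of a fair +-1 coin: E(exp(iyX)) = cos(y).
    return math.cos(y)

def phi_n(y, n):
    # 285Xn(i): characteristic function of S_n = (X_1 + ... + X_n) / sqrt(n).
    return phi(y / math.sqrt(n)) ** n

def gauss(y):
    # Characteristic function of the standard normal distribution.
    return math.exp(-y * y / 2.0)

sample_points = (0.5, 1.0, 2.0)
errors = {
    n: max(abs(phi_n(y, n) - gauss(y)) for y in sample_points)
    for n in (10, 100, 1000, 10000)
}
for n, err in errors.items():
    print(n, err)
```

The printed errors shrink roughly like $1/n$ at these sample points, in line with the bound in 285Xn(ii).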
285Y Further exercises (a) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$. Write
\[ \hat\nu(y) = \frac{1}{(\sqrt{2\pi})^r}\int e^{-iy\,.\,x}\nu(dx) \]
for every $y\in\mathbb{R}^r$.
(i) Writing $\varphi_\nu$ for the characteristic function of $\nu$, show that $\hat\nu(y) = \frac{1}{(\sqrt{2\pi})^r}\varphi_\nu(-y)$ for every $y\in\mathbb{R}^r$.
(ii) Show that $\int\hat\nu(y)h(y)dy = \int\hat h(x)\nu(dx)$ for any Lebesgue integrable complex-valued function $h$ on $\mathbb{R}^r$, defining the Fourier transform $\hat h$ as in 283Wa.
(iii) Show that $\int h(x)\nu(dx) = \int\hat h(y)\hat\nu(-y)dy$ for any rapidly decreasing test function $h$ on $\mathbb{R}^r$.
(iv) Show that if $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.
(b) Let be a Radon probability measure on R
r
, with characteristic function . Show that whenever c d in R
r
then
_
i
2
_
r
lim

1
,... ,
r

_

1

1
. . .
_

r

r
_
r

j=1
e
i
j

j
e
i
j

j
_
(y)dy
exists and lies between ]c, d[ and [c, d], writing ]c, d[ =

jr
]
j
,
j
[ if c = (
1
, . . . ,
r
) and d = (
1
, . . . ,
r
).
(c) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent identically distributed sequence of (not-essentially-constant) random variables, and set $S_n=\sum_{k=0}^nX_{2k+1}-X_{2k}$ for each $n\in\mathbb{N}$. Show that $\lim_{n\to\infty}\Pr(|S_n|\ge\alpha)=1$ for every $\alpha\in\mathbb{R}$. (Hint: 285Xg, proof of 285J.) Hence, or otherwise, show that $\lim_{n\to\infty}\Pr(|\sum_{k=0}^nX_k|\ge\alpha)=1$ for every $\alpha\in\mathbb{R}$.
(d) For Radon probability measures ,

on R
r
set
(,

) = inf : 0, ], a]

], a +1] + ], a + 21] + 2
for every a R
r
,
writing ], a] = (
1
, . . . ,
r
) :
j

j
for every j r when a = (
1
, . . . ,
r
), and 1 = (1, . . . , 1) R
r
. Show that
is a metric on the set of Radon probability measures on R
r
, and that the topology it denes is the vague topology.
(Cf. 274Ya.)
(e) Let r 1. We say that a set Q of Radon probability measures on R
r
is uniformly tight if for every > 0 there
is a compact set K R
r
such that (R
r
K) for every Q. Show that if Q is any uniformly tight family of Radon
probability measures on R
r
, and > 0, then there are > 0, y
0
, . . . , y
n
R
r
such that ], a]

], a +1] +
whenever ,

Q and a R
r
and [

(y
j
)

(y
j
)[ for every j n, writing

for the characteristic function


of .
(f) Show that for any $M\ge0$ the set of Radon probability measures $\nu$ on $\mathbb{R}^r$ such that $\int\|x\|\nu(dx)\le M$ is uniformly tight in the sense of 285Ye.
(g) Let $C_b(\mathbb{R}^r)$ be the Banach space of bounded continuous real-valued functions on $\mathbb{R}^r$.
(i) Show that any Radon probability measure $\nu$ on $\mathbb{R}^r$ corresponds to a continuous linear functional $h_\nu : C_b(\mathbb{R}^r)\to\mathbb{R}$, writing $h_\nu(f)=\int fd\nu$ for $f\in C_b(\mathbb{R}^r)$.
(ii) Show that if $h_\nu=h_{\nu'}$ then $\nu=\nu'$.
(iii) Show that the vague topology on the set of Radon probability measures corresponds to the weak* topology on the dual $(C_b(\mathbb{R}^r))^*$ of $C_b(\mathbb{R}^r)$.
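(h) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\rho^*_m$ be the pseudometric on $P$ defined by setting $\rho^*_m(\nu,\nu') = \sup_{\|y\|\le m}|\varphi_\nu(y)-\varphi_{\nu'}(y)|$ for $\nu$, $\nu'\in P$, writing $\varphi_\nu$ for the characteristic function of $\nu$. Show that $\{\rho^*_m : m\in\mathbb{N}\}$ defines the vague topology on $P$.
(i) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\tilde\rho_m$ be the pseudometric on $P$ defined by setting
\[ \tilde\rho_m(\nu,\nu') = \int_{\{y:\|y\|\le m\}}|\varphi_\nu(y)-\varphi_{\nu'}(y)|dy \]
for $\nu$, $\nu'\in P$, writing $\varphi_\nu$ for the characteristic function of $\nu$. Show that $\{\tilde\rho_m : m\in\mathbb{N}\}$ defines the vague topology on $P$.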
(j) Let X be a real-valued random variable with nite variance. Show that for any 0,
[(y) 1 iyE(X) +
1
2
y
2
E(X
2
)[
1
6
[y
3
[E(X
2
) +y
2
E(

(X)),
writing for the characteristic function of X and

(x) = 0 for [x[ , x


2
for [x[ > .
(k) Suppose that > 0 and that X
0
, . . . , X
n
are independent real-valued random variables such that
E(X
k
) = 0 for every k n,

n
k=0
Var(X
k
) = 1,

n
k=0
E(

(X
k
))
(writing

(x) = 0 if [x[ , x
2
if [x[ > ). Set = /

2
+, and let Z be a standard normal random variable. Show
that
[(y) e
y
2
/2
[
1
3
[y[
3
+y
2
( +E(

(Z)))
for every y R, writing for the characteristic function of X =

n
k=0
X
k
. (Hint: write
k
for the characteristic
function of X
k
and

k
for the characteristic function of
k
Z, where
k
=
_
Var(X
k
). Show that
[
k
(y)

k
(y)[
1
3
[y
3
[
2
k
+y
2
_
E(

(X
k
)) +
2
k
E(

(Z))
_
.)
(l) Show that for every > 0 there is a > 0 such that whenever X
0
, . . . , X
n
are independent real-valued random
variables such that
E(X
k
) = 0 for every k n,

n
k=0
Var(X
k
) = 1,

n
k=0
E(

(X
k
))
(writing

(x) = 0 if [x[ , x
2
if [x[ > ), then [(y) e
y
2
/2
[ (y
2
+ [y
3
[) for every y R, writing for the
characteristic function of X = X
0
+. . . +X
n
.
(m) Use 285Yl to prove Lindeberg's theorem (274F).
(n) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. Show that convolution, regarded as a map from $P\times P$ to $P$, is continuous when $P$ is given the vague topology. (Hint: 281Xa and 257B will help.)
(o) Let S be the topology on $\mathbb{R}$ defined by $\{\rho_y : y\in\mathbb{R}\}$, where $\rho_y(x,x')=|e^{iyx}-e^{iyx'}|$ (compare 285S). Show that addition and subtraction are continuous for S in the sense of 2A5A.
(p) Let $\nu$ be a Radon probability measure on $\mathbb{R}^r$ with bounded support (definition: 256Xf). Show that its characteristic function is smooth.
(q) Let $(\Omega,\Sigma,\mu)$ be a probability space. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a sequence of real-valued random variables on $\Omega$, and $X$ another real-valued random variable on $\Omega$; let $\varphi_{X_n}$, $\varphi_X$ be the corresponding characteristic functions. Show that the following are equiveridical: (i) $\lim_{n\to\infty}E(f(X_n))=E(f(X))$ for every bounded continuous function $f:\mathbb{R}\to\mathbb{R}$; (ii) $\lim_{n\to\infty}\varphi_{X_n}(y)=\varphi_X(y)$ for every $y\in\mathbb{R}$. (In this case we say that $\langle X_n\rangle_{n\in\mathbb{N}}$ converges in distribution to $X$.)
(r) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $P$ the set of Radon probability measures on $\mathbb{R}$. (i) Show that we have a function $\psi:L^0(\mu)\to P$ defined by saying that $\psi(X^\bullet)$ is the distribution of $X$ whenever $X$ is a real-valued random variable on $\Omega$. (ii) Show that $\psi$ is continuous for the topology of convergence in measure on $L^0(\mu)$ and the vague topology on $P$. (Compare 271Yd.)
285 Notes and comments Just as with Fourier transforms, the power of methods which use the characteristic functions of distributions is based on three points: (i) the characteristic function of a distribution determines the distribution (285M); (ii) the properties of interest in a distribution are reflected in accessible properties of its characteristic function (285G, 285I, 285J); (iii) these properties of the characteristic function are actually different from the corresponding properties of the distribution, and are amenable to different kinds of investigation. Above all, the fact that (for sequences!) convergence in the vague topology of distributions corresponds to pointwise convergence for characteristic functions (285L) provides us with a path to the classic limit theorems, as in 285Q and 285Xn. In 285S-285U I show that this result for sequences does not correspond immediately to any alternative characterization of the vague topology, though it can be adapted in more than one way to give such a characterization (see 285Yh-285Yi).
Concerning the Central Limit Theorem there is one conspicuous difference between the method suggested here and that of §274. The previous approach offered at least a theoretical possibility of giving an explicit formula for $\delta$ in 274F as a function of $\epsilon$, and hence an estimate of the rate of convergence to be expected in the Central Limit Theorem. The arguments in the present chapter, involving as they do an entirely non-constructive compactness argument in 281A, leave us with no way of achieving such an estimate. But in fact the method of characteristic functions, suitably refined, is the basis of the best estimates known, such as the Berry-Esséen theorem (274Hc).
In 285D I try to show how the characteristic function $\varphi_\nu$ of a Radon probability measure can be related to a `Fourier transform' $\hat\nu$ of $\nu$ which corresponds directly to the Fourier transforms of functions discussed in §§283-284. If $f$ is a non-negative Lebesgue integrable function and we take $\nu$ to be the corresponding indefinite-integral measure, then $\hat\nu=\hat f$. Thus the concept of `Fourier transform of a measure' is a natural extension of the Fourier transform of an integrable function. Looking at it from the other side, the formula of 285Dc shows that $\nu$ can be thought of as representing the inverse Fourier transform of $\hat\nu$ in the sense of 284H-284I. Taking $\nu$ to be the measure which assigns a mass 1 to the point 0, we get the Dirac delta function, with Fourier transform the constant function $\chi\mathbb{R}$. These ideas can be extended without difficulty to handle convolutions of measures (285R).
It is a striking fact that while there is no satisfactory characterization of the functions which are Fourier transforms of integrable functions, there is a characterization of the characteristic functions of probability distributions. This is Bochner's theorem. I give the condition in 285Xr, asking you to prove its necessity as an exercise; we already have three-quarters of the machinery to prove its sufficiency, but the last step will have to wait for Volume 4.
286 Carleson's theorem
Carleson's theorem (Carleson 66) was the (unexpected) solution to a long-standing problem. Remarkably, it can be proved by `elementary' arguments. The hardest part of the work below, in 286I-286L, involves only the laborious verification of inequalities. How the inequalities were chosen is a different matter; for once, some of the ideas of the proof lie in the statements of the lemmas. The argument here is a greatly expanded version of Lacey & Thiele 00.
The Hardy-Littlewood Maximal Theorem (286A) is important, and worth learning even if you leave the rest of the section as an unexamined monument. I bring 286B-286D forward to the beginning of the section, even though they are little more than worked exercises, because they also have potential uses in other contexts.
The complexity of the argument is such that it is useful to introduce a substantial number of special notations. Rather than include these in the general index, I give a list in 286W. Among them are ten constants $C_1,\ldots,C_{10}$. The values of these numbers are of no significance. The method of proof here is quite inappropriate if we want to estimate rates of convergence. I give recipes for the calculation of the $C_n$ only for the sake of the linear logic in which this treatise is written, and because they occasionally offer clues concerning the tactics being used.
In this section all integrals are with respect to Lebesgue measure $\mu$ on $\mathbb{R}$ unless otherwise stated.
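286A The Maximal Theorem Suppose that $1<p<\infty$ and that $f\in\mathcal{L}^p_{\mathbb{C}}(\mu)$ (definition: 244P). Set
\[ f^*(x) = \sup\Bigl\{\frac{1}{b-a}\int_a^b|f| : a\le x\le b,\ a<b\Bigr\} \]
for $x\in\mathbb{R}$. Then $\|f^*\|_p \le \frac{2^{1/p}p}{p-1}\|f\|_p$.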
proof (a) It is enough to consider the case $f\ge0$. Note that if $E\subseteq\mathbb{R}$ has finite measure, then
\[ \int_Ef = \int(f\times\chi E)\times\chi E \le \|f\times\chi E\|_p(\mu E)^{1/q} \le \|f\|_p(\mu E)^{1/q} \]
is finite, where $q=\frac{p}{p-1}$, by Hölder's inequality (244Eb). Consequently, if $t>0$ and $\int_Ef\ge t\mu E$, we must have $t\mu E\le\|f\times\chi E\|_p(\mu E)^{1/q}$ and
\[ \mu E = (\mu E)^{p-p/q} \le \frac{1}{t^p}\|f\times\chi E\|_p^p = \frac{1}{t^p}\int_Ef^p. \]
(b) For $t>0$, set
\[ G_t = \{x : \int_x^af > (a-x)t \text{ for some } a>x\}. \]
(i) $G_t$ is an open set. PPP For any $a\in\mathbb{R}$,
\[ G_{ta} = \{x : x<a,\ \int_x^af > (a-x)t\} \]
is open, because $x\mapsto\int_x^af$ and $x\mapsto(a-x)t$ are continuous (225A); so $G_t=\bigcup_{a\in\mathbb{R}}G_{ta}$ is open. QQQ
(ii) By 2A2I, there is a partition $\mathcal{C}$ of $G_t$ into open intervals. Now $C$ is bounded and $t\mu C\le\int_Cf$ for every $C\in\mathcal{C}$. PPP Express $C$ as $]a,b[$ (for the moment, we have to allow for the possibility that one or both of $a$, $b$ is infinite).
($\alpha$) If $x\in C$, there is some (finite) $c>x$ such that $\int_x^cf>(c-x)t$. Set $d=\min(b,c)>x$. If $d=c$, then of course $\int_x^df>(d-x)t$. If $d=b<c$, then (because $b\notin G_t$) $\int_b^cf\le(c-b)t$, so again
\[ \int_x^df = \int_x^bf = \int_x^cf - \int_b^cf > (c-x)t-(c-b)t = (b-x)t = (d-x)t. \]
Thus we always have some $d\in\,]x,b]$ such that $\int_x^df>(d-x)t$.
($\beta$) Now take any $z\in C$, and consider
\[ A_z = \{x : z\le x\le b,\ \int_z^xf\ge(x-z)t\}. \]
Then $z\in A_z$, and $A_z$ is closed, again because the functions $x\mapsto\int_z^xf$ and $x\mapsto(x-z)t$ are continuous. Moreover, $A_z$ is bounded, because $x-z\le\frac{1}{t^p}\|f\|_p^p$ for every $x\in A_z$, by (a). ??? If $\sup A_z=x_0<b$, then $x_0\in A_z$, and there is a $d\in\,]x_0,b]$ such that $\int_{x_0}^df\ge t(d-x_0)$, by ($\alpha$); but in this case $d\in A_z$, which is impossible. XXX Thus $b=\sup A_z\in A_z$ (in particular, $b<\infty$), and $\int_z^bf\ge(b-z)t$.
($\gamma$) Letting $z$ decrease to $a$, we see that $b-a\le\frac{1}{t^p}\|f\|_p^p$, so $a$ is finite, and also
\[ \int_a^bf = \lim_{z\downarrow a}\int_z^bf \ge \lim_{z\downarrow a}(b-z)t = (b-a)t, \]
as required. QQQ
(iii) Accordingly, because $\mathcal{C}$ is countable and $f$ is non-negative,
\[ \mu G_t = \sum_{C\in\mathcal{C}}\mu C \le \sum_{C\in\mathcal{C}}\frac{1}{t^p}\int_Cf^p \le \frac{1}{t^p}\int_{-\infty}^{\infty}f^p \]
is finite, and
\[ \int_{G_t}f = \sum_{C\in\mathcal{C}}\int_Cf \ge \sum_{C\in\mathcal{C}}t\mu C = t\mu G_t. \]
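(c) All this is true for every $t>0$. Now if we set
\[ f_1^*(x) = \sup_{a>x}\frac{1}{a-x}\int_x^af \]
for $x\in\mathbb{R}$, we have $\{x : f_1^*(x)>t\}=G_t$ for every $t>0$.
For any $t>0$,
\[ \frac1pt\mu G_t = \Bigl(1-\frac1q\Bigr)t\mu G_t \le \int_{G_t}\Bigl(f-\frac1qt\chi\mathbb{R}\Bigr) \le \int_{-\infty}^{\infty}\Bigl(f-\frac1qt\chi\mathbb{R}\Bigr)^+. \]
So
\[ \int_{-\infty}^{\infty}(f_1^*)^p = \int_0^{\infty}\mu\{x : f_1^*(x)^p>t\}dt \]
(see 252O)
\[ = p\int_0^{\infty}u^{p-1}\mu\{x : f_1^*(x)>u\}du \]
(substituting $t=u^p$)
\[ = p\int_0^{\infty}u^{p-1}\mu G_udu \le p^2\int_0^{\infty}u^{p-2}\Bigl(\int_{-\infty}^{\infty}\Bigl(f-\frac1qu\chi\mathbb{R}\Bigr)^+\Bigr)du \]
\[ = p^2\int_{-\infty}^{\infty}\int_0^{\infty}\max\Bigl(0,f(x)-\frac1qu\Bigr)u^{p-2}du\,dx \]
(by Fubini's theorem, 252B, because $(x,u)\mapsto u^{p-2}\max(0,f(x)-\frac1qu)$ is measurable and non-negative)
\[ = p^2\int_{-\infty}^{\infty}\int_0^{qf(x)}u^{p-2}\Bigl(f(x)-\frac1qu\Bigr)du\,dx \]
\[ = \frac{p^2q^{p-1}}{p(p-1)}\int_{-\infty}^{\infty}f^p = \Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p. \]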
(d) Similarly, setting $f_2^*(x)=\sup_{a<x}\frac{1}{x-a}\int_a^xf$ for $x\in\mathbb{R}$, $\int_{-\infty}^{\infty}(f_2^*)^p\le(\frac{p}{p-1})^p\|f\|_p^p$. But $f^*=\max(f_1^*,f_2^*)$. PPP Of course $f_1^*\le f^*$ and $f_2^*\le f^*$. But also, if $f^*(x)>t$, there must be a non-trivial interval $I$ containing $x$ such that $\int_If>t\mu I$; if $a=\inf I$ and $b=\sup I$, then either $\int_a^xf>(x-a)t$ and $f_2^*(x)>t$, or $\int_x^bf>(b-x)t$ and $f_1^*(x)>t$. As $x$ and $t$ are arbitrary, $f^*=\max(f_1^*,f_2^*)$. QQQ
Accordingly
\[ \|f^*\|_p^p = \int_{-\infty}^{\infty}(f^*)^p = \int_{-\infty}^{\infty}\max((f_1^*)^p,(f_2^*)^p) \]
\[ \le \int_{-\infty}^{\infty}(f_1^*)^p+(f_2^*)^p \le 2\Bigl(\frac{p}{p-1}\Bigr)^p\|f\|_p^p. \]
Taking $p$th roots, we have the inequality we seek.
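As a sanity check on the constant in 286A, one can take $f=\chi[0,1]$ and $p=2$. For this particular $f$ a short hand computation (an assumption of the sketch below, not something stated in the text) gives $f^*(x)=1/(1-x)$ for $x<0$, $f^*(x)=1$ for $0\le x\le1$ and $f^*(x)=1/x$ for $x>1$, so that $\|f^*\|_2^2=3$. The code integrates this closed form numerically over a truncated range; the helper names, the truncation interval and the step count are arbitrary illustrative choices.

```python
import math

def f_star(x):
    # Maximal function of the indicator of [0, 1], computed by hand for this example.
    if x < 0:
        return 1.0 / (1.0 - x)
    if x <= 1:
        return 1.0
    return 1.0 / x

def lp_norm(g, lo, hi, p, steps):
    # Midpoint-rule approximation to the L^p norm of g over [lo, hi].
    h = (hi - lo) / steps
    total = sum(g(lo + (i + 0.5) * h) ** p for i in range(steps))
    return (h * total) ** (1.0 / p)

p = 2.0
norm_f = 1.0                                   # ||chi[0,1]||_p = 1 for every p
norm_f_star = lp_norm(f_star, -200.0, 201.0, p, 401000)
bound = 2 ** (1.0 / p) * p / (p - 1.0) * norm_f
print(norm_f_star, bound)
```

The truncated norm comes out a little below $\sqrt3\approx1.73$, comfortably inside the bound $2^{1/2}\cdot2\approx2.83$ given by the theorem; the example is not meant to suggest that the constant is sharp.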
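286B Lemma Let $g:\mathbb{R}\to[0,\infty[$ be a function which is non-decreasing on $]{-\infty},\alpha]$, non-increasing on $[\beta,\infty[$ and constant on $[\alpha,\beta]$, where $\alpha\le\beta$. Then for any measurable function $f:\mathbb{R}\to[0,\infty]$,
\[ \int_{-\infty}^{\infty}f\times g \le \int_{-\infty}^{\infty}g\cdot\sup_{a\le\alpha,\,b\ge\beta,\,a<b}\frac{1}{b-a}\int_a^bf. \]
proof Set $\gamma=\sup_{a\le\alpha,\,b\ge\beta,\,a<b}\frac{1}{b-a}\int_a^bf$. For $n$, $k\in\mathbb{N}$ set $E_{nk}=\{x : \alpha-2^n\le x\le\beta+2^n,\ g(x)\ge2^{-n}(k+1)\}$, so that $E_{nk}$ is either empty or a bounded interval including $[\alpha,\beta]$, and $\int_{E_{nk}}f\le\gamma\mu E_{nk}$. For $n\in\mathbb{N}$, set $g_n=2^{-n}\sum_{k=0}^{4^n-1}\chi E_{nk}$; then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of functions with supremum $g$, and
\[ \int_{-\infty}^{\infty}f\times g = \sup_{n\in\mathbb{N}}\int_{-\infty}^{\infty}f\times g_n = \sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\int_{E_{nk}}f \]
\[ \le \sup_{n\in\mathbb{N}}2^{-n}\sum_{k=0}^{4^n-1}\gamma\mu E_{nk} = \sup_{n\in\mathbb{N}}\gamma\int_{-\infty}^{\infty}g_n = \gamma\int_{-\infty}^{\infty}g, \]
as claimed.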
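286C Shift, modulation and dilation Some of the calculations below will be easier if we use the following formalism. For any function $f$ with domain included in $\mathbb{R}$, and $\alpha\in\mathbb{R}$, we can define
\[ (S_\alpha f)(x)=f(x+\alpha),\quad (M_\alpha f)(x)=e^{i\alpha x}f(x),\quad (D_\alpha f)(x)=f(\alpha x) \]
whenever the right-hand sides are defined. In the case of $S_\alpha f$ and $D_\alpha f$ it is sometimes convenient to allow $\infty$ as a value of the function. We have the following elementary facts.
(a) $S_{-\alpha}S_\alpha f=f$, $D_{1/\alpha}D_\alpha f=f$ if $\alpha\ne0$.
(b) $S_\alpha(f\times g)=S_\alpha f\times S_\alpha g$, $D_\alpha(f\times g)=D_\alpha f\times D_\alpha g$.
(c) $D_\alpha|f|=|D_\alpha f|$.
(d) If $f$ is integrable, then
\[ (M_\alpha f)^\wedge=S_{-\alpha}\hat f,\quad (S_\alpha f)^\wedge=M_\alpha\hat f,\quad (S_\alpha f)^\vee=M_{-\alpha}\check f; \]
if moreover $\alpha>0$, then
\[ (D_\alpha f)^\wedge=\frac1\alpha D_{1/\alpha}\hat f,\quad (D_\alpha f)^\vee=\frac1\alpha D_{1/\alpha}\check f \]
(283Cc-283Ce).
(e) If $f$ belongs to $\mathcal{L}^1_{\mathbb{C}}=\mathcal{L}^1_{\mathbb{C}}(\mu)$, so do $S_\alpha f$, $M_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$, and in this case
\[ \|S_\alpha f\|_1=\|M_\alpha f\|_1=\|f\|_1,\quad \|D_\alpha f\|_1=\frac{1}{|\alpha|}\|f\|_1. \]
(f) If $f$ belongs to $\mathcal{L}^2_{\mathbb{C}}$ so do $S_\alpha f$, $M_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$, and in this case
\[ \|S_\alpha f\|_2=\|M_\alpha f\|_2=\|f\|_2,\quad \|D_\alpha f\|_2=\frac{1}{\sqrt{|\alpha|}}\|f\|_2. \]
(g) If $f$ is a rapidly decreasing test function (284A), so are $M_\alpha f$ and $S_\alpha f$ and (if $\alpha\ne0$) $D_\alpha f$.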
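The norm identities in 286C(e)-(f) are easy to test numerically on a single function. The following sketch uses the Gaussian $f(x)=e^{-x^2}$ and $\alpha=3$; these choices, the helper names `shift`, `modulate`, `dilate` and `norm`, and the integration range are all assumptions made for the illustration.

```python
import cmath
import math

def shift(f, a):
    # (S_a f)(x) = f(x + a)
    return lambda x: f(x + a)

def modulate(f, a):
    # (M_a f)(x) = exp(i a x) f(x)
    return lambda x: cmath.exp(1j * a * x) * f(x)

def dilate(f, a):
    # (D_a f)(x) = f(a x)
    return lambda x: f(a * x)

def norm(f, p, lo=-30.0, hi=30.0, steps=60000):
    # Midpoint-rule approximation to the L^p norm of f over [lo, hi].
    h = (hi - lo) / steps
    total = sum(abs(f(lo + (i + 0.5) * h)) ** p for i in range(steps))
    return (h * total) ** (1.0 / p)

f = lambda x: math.exp(-x * x)
alpha = 3.0

print(norm(dilate(f, alpha), 1), norm(f, 1) / alpha)
print(norm(dilate(f, alpha), 2), norm(f, 2) / math.sqrt(alpha))
```

Each printed pair should agree to many decimal places, matching the scaling factors $1/|\alpha|$ and $1/\sqrt{|\alpha|}$; shifts and modulations leave both norms unchanged.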
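286D Lemma Suppose that $f:\mathbb{R}\to[0,\infty]$ is a measurable function such that, for some constant $C\ge0$, $\int_Ef\le C\sqrt{\mu E}$ whenever $\mu E<\infty$. Then $f$ is finite almost everywhere and $\int_{-\infty}^{\infty}\frac{1}{1+|x|}f(x)dx$ is finite.
proof For any $n\ge1$, set $E_n=\{x : |x|\le n,\ f(x)\ge n\}$; then
\[ n\mu E_n \le \int_{E_n}f \le C\sqrt{\mu E_n}, \]
so $\mu E_n\le\frac{C^2}{n^2}$ and
\[ \{x : f(x)=\infty\} = \bigcup_{n\ge1}\bigcap_{m\ge n}E_m \]
has measure at most $\inf_{n\ge1}\sum_{m=n}^{\infty}\mu E_m=0$.
As for the integral, set $F(x)=\int_0^xf$ for $x\ge0$. Then, for any $a\ge0$,
\[ \int_0^a\frac{f(x)}{1+x}dx = \frac{F(a)}{1+a}+\int_0^a\frac{F(x)}{(1+x)^2}dx \]
(225F)
\[ \le C\Bigl(\frac{\sqrt a}{1+a}+\int_0^a\frac{\sqrt x}{(1+x)^2}dx\Bigr) \le C\Bigl(1+\int_0^{\infty}\frac{\sqrt x}{(1+x)^2}dx\Bigr), \]
so $\int_0^{\infty}\frac{f(x)}{1+x}dx\le C\bigl(1+\int_0^{\infty}\frac{\sqrt x}{(1+x)^2}dx\bigr)$ is finite. Similarly, $\int_{-\infty}^0\frac{f(x)}{1-x}dx$ is finite, so we have the result.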
286E The Lacey-Thiele construction (a) Let J be the family of all dyadic intervals of the form
_
2
k
n, 2
k
(n + 1)
_
where k, n Z. The essential geometric property of J is that if I, J J then either I J or J I or I J = . Let
Q be the set of all pairs = (I

, J

) J
2
such that I

= 1. For Q, let k

Z be such that I

= 2
k

and J

= 2
k

; let x

be the midpoint of I

, y

the midpoint of J

, J
l

J the left-hand half-interval of J

, J
r

J
the right-hand half-interval of J

, and y
l

the lower quartile of J

, that is, the midpoint of J


l

.
(b) There is a rapidly decreasing test function such that

is real-valued and [
1
6
,
1
6
]

[
1
5
,
1
5
]. PPP Look
at parts (b)-(d) of the proof of 284G. The process there can be used to provide us with a smooth function
1
which
is zero outside the interval [
1
6
,
1
5
] and strictly positive on

1
6
,
1
5
_
; multiplying by a suitable factor, we can arrange that
_

1
= 1. So if we set
2
(x) = 1
_
x

1
for x R,
2
will be smooth, and

,
1
6

,
1
5

. Now
set
0
(x) =
2
(x)
2
(x) for x R, and =

0
;

=
0
(284C) will have the required property. QQQ
For Q, set

= 2
k

/2
M
y
l

S
x

D
2
k

, so that

(x) = 2
k

/2
e
iy
l

x
(2
k

(x x

)).
Observe that

is a rapidly decreasing test function. Now

= 2
k

/2
S
y
l

M
x

D
2
k

, that is,

(y) = 2
k

/2
e
ix

(yy
l

(2
k

(y y
l

)),
which is zero unless [y y
l

[
1
5
2
k

; since the length of J


l

is
1
2
2
k

, this can be true only when y J


l

. We have the
following simple facts.
(i) |

|
2
= 2
k

/2
2
k

/2
||
2
=||
2
for every Q.
(ii) |

|
1
= 2
k

/2
2
k

|
1
=2
k

/2
|

|
1
for every Q.
(iii) If , Q and J

,= J

and J
r

J
r

is non-empty, then J
l

J
l

= so
(

) = (

) = 0,
by 284Ob. (For f, g L
2
C
, I write (f[g) for
_
f g.)
(c) Set w(x) =
1
(1+|x|)
3
for x R. Then there is a C
1
> 0 such that [(x)[ C
1
min(w(3), w(x)
2
) for every
x R (because lim
x
x
6
(x) = lim
x
x
6
(x) = 0). For Q, set w

= 2
k

S
x

D
2
k

w, so that w

(x) =
2
k

w(2
k

(x x

)) for every x. Elementary calculations show that


(i) w

depends only on I

;
(ii)
_

=
_

w = 1 for every ;
(iii)
[

(x)[ C
1
min(2
k

/2
w

(x), 2
3k

/2
w

(x)
2
)
for every x and (because [(x)[ C
1
w(x)
2
C
1
w(x) for every x R).
286F Two partial orders (a) For , Q say that if I

and J

. Then is a partial order on


Q. We have the following elementary facts.
(i) If , then k

.
(ii) If and are incomparable (that is, , and , ), then (I

) (I

) is empty. PPP We may


suppose that k

. If J

,= , then J

, because both are dyadic intervals, and J

is the shorter; but as


, , this means that I

, I

and I

= . QQQ
(iii) If ,

are incomparable and both less than or equal to , then I

= , because J

.
(iv) If and k

k k

, then there is a (unique)

such that

and k

= k. (The point is that


there is a unique I J such that I

I I

and I = 2
k
; and similarly there is just one candidate for J

.)
(b) For , Q say that
r
if I

and J
r

J
r

(that is, either = or J

J
r

), so that, in particular,
. Note that if ,


r
and k

,= k

then J
r

J
r

,= , so (

) = 0 (286E(b-iii)).
(c) It will be convenient to have a shorthand for the following: if P, R Q, say that P R if for every P there
is a R such that .
286G We shall need the results of some elementary calculations. The rst three are nearly trivial.
Lemma (a) For any m N,

n=m
w(n +
1
2
)
1
2(1+m)
2
.
(b) Suppose that P and that I is an interval not containing x

in its interior. Then


_
I
w

(x)I, where x
is the midpoint of I.
(c) For any x R,

n=
w(x n) 2.
(d) There is a constant C
2
0 such that
_

w(x)w(x +)dx C
2
w() whenever 0 1 and R.
(e) There is a constant C
3
0 such that [(

)[ 2
k

/2
2
k

/2
C
3
_
I

whenever , Q and k

.
(f) There is a constant C
4
0 such that whenever Q and k Z, then

Q,,k

=k
_
R\I

C
4
.
proof (a) The point is just that w is convex on ], 0] and [0, [. So we can apply 233Ib with f(x) = x, or argue
directly from the fact that w(n +
1
2
)
1
2
(w(n +
1
2
+x) +w(n +
1
2
x)) for [x[
1
2
, to see that w(n +
1
2
)
_
n+1
n
w for
every n 0. Accordingly

n=m
w(n +
1
2
)
_

m
w =
1
2(1+m)
2
.
(b) Similarly, because I lies all on the same side of x

, w

is convex on I, so the same inequality yields w

(x)I
_
I
w

.
(c) Let m be such that [x m[
1
2
. Then, using the same inequalities as before to estimate w(x n) for n ,= m,
we have

n=
w(x n) w(x m) +
_
xm
1
2

w +
_

xm+
1
2
w
1 +
_

w = 2.
(d)(i) The rst step is to note that
w(
1
2
(1 +))
w()
=
8(1 +)
3
(3 +)
3
8
for every 0. Now w( +) 4w() whenever 0 and
1
2
. PPP For t
1
2
,
d
dt
tw(t +t) =
12t(1+)
(1+t+t)
4
0,
so
w( +)
1
2
w(
1
2
+
1
2
) 4w(). QQQ
Of course this means that
1

w(
1+
2
) 8w()
whenever 0 and 0 < 1.
(ii) Try C
2
= 16. If 0 < 1 and 0, set =
1+
2
. Then, for any x ,
1 +x + = (1 +)(1 +
x
1+
)
1
2
(1 +),
so w(x +) 8w() and
_

w(x)w(x +)dx 8w()


_

w 8w().
On the other hand,
_

w(x)w(x +)dx w()


_

w(x +)dx
=
1

w(
1+
2
)
_

w 8w().
Putting these together,
_

w(x)w(x +)dx 16w(); and this is true whenever 0 < 1 and 0.


(iii) If = 0, then
_

w(x)w(x +)dx = w()


_

w = w() C
2
w()
for any . If 0 < 1 and < 0, then
_

w(x)w(x +)dx =
_

w(x)w(x )dx
(because w is an even function)
=
_

w(x)w(x )dx C
2
w()
(by (ii) above)
= C
2
w().
So we have the required inequality in all cases.
(e) Set C
3
= max(C
2
1
C
2
, ||
2
2
/
_
1/2
1/2
w).
(i) It is worth disposing immediately of the case = . In this case,
[(

)[ = |

|
2
2
= ||
2
2
,
while
_
I

= 2
k

_
x

+2
k

1
x

2
k

1
w(2
k

(x x

))dx =
_
1/2
1/2
w,
so certainly [(

)[ C
3
_
I

.
(ii) Now suppose that I

,= I

. In this case, because k

, I

must all lie on the same side of x

, so
_
I

(x

)I

, by (b).
We know from 286E(c-iii) that [

(x)[ 2
k

/2
C
1
w

(x) for every x. So


[(

)[ 2
k

/2
2
k

/2
C
2
1
_

= 2
k

/2
2
k

/2
C
2
1
_

w(2
k

(x x

))w(2
k

(x x

))dx
= 2
k

/2
2
k

/2
C
2
1
_

w(2
k

x + 2
k

(x

))w(x)dx
2
k

/2
2
k

/2
C
2
1
C
2
w(2
k

(x

))
(by (d), since 2
k

1)
2
k

/2
2
k

/2
C
3
w

(x

) 2
k

/2
2
k

/2
C
3
_
I

,
as required.
(f ) Set C
4
= 2

j=0
_

j+
1
2
w; this is nite because
_

w =
1
2(1+)
2
for every 0.
If k < k

then k

,= k for any , so the result is trivial. If k k

, then for each dyadic subinterval I of I

of
length 2
k
there is exactly one such that I

= I. List these as
0
, . . . in ascending order of the centres x

j
, so
that if I

=
_
2
k

m, 2
k

(m+ 1)
_
then x

j
= 2
k

m+ 2
k
(j +
1
2
), for j < 2
kk

. Now
2
kk

j=0
_
2
k

j
= 2
k
2
kk

j=0
_
2
k

w(2
k
(x 2
k

m) j
1
2
)dx
=
2
kk

j=0
_
0

w(x j
1
2
)dx

j=0
_

j+
1
2
w =
1
2
C
4
.
Similarly (since w is an even function, so the whole picture is symmetric about x

2
kk

1
j=0
_

2
k

(m+1)
w

j

1
2
C
4
,
and

,k

=k
_
R\I

C
4
,
as required.
286H Mass and energy (Lacey & Thiele 00) If P is a subset of Q, E R is measurable, h : R R is
measurable, and f L
2
C
, set
mass
Eh
(P) = sup
P,Q,
_
Eh
1
[J

]
w

sup
Q
_

= 1,
energy
f
(P) = sup
Q
2
k

/2
_

P,
r

[(f[

)[
2
.
If P

P then mass
Eh
(P

) mass
Eh
(P) and energy
f
(P

) energy
f
(P). Note that energy
f
() = 2
k

/2
[(f[

)[ for
any Q, since if
r
then k

.
286I Lemma Set C
5
= 2
12
. If P Q is nite, E R is measurable, h : R R is measurable, and mass
Eh
(P),
then we can nd sets P
1
P, P
2
Q such that mass
Eh
(P
1
)
1
4
,

P
2
I

C
5
E and P P
1
P
2
(in the
notation of 286Fc).
proof (a) Set P
1
= : P, mass
Eh
()
1
4
. Then mass
Eh
(P
1
)
1
4
. If = 0 we can stop here, as P
1
= P.
Otherwise, for each P P
1
let

Q be such that

and
_
Eh
1
[J

]
w

>
1
4
. Let P
2
be the set of elements
of

: P P
1
which are maximal for ; then P P
1
P
2
.
(b) For k N set
R
k
= : P
2
, 2
k

(E h
1
[J

] I
(k)

) 2
2k9
,
where I
(k)

is the interval with the same centre as I

and 2
k
times its length. Now P
2
=

kN
R
k
. PPP Take P
2
. If
k N and x R I
(k)

, then [x x

[ 2
kk

1
, so
w

(x) = 2
k

w(2
k

(x x

)) 2
k

w(2
k1
) = 2
k

(1 + 2
k1
)
3
.
So
1
4
<
_
Eh
1
[J

]
w

=
_
Eh
1
[J

]I

k=0
_
Eh
1
[J

]I
(k+1)
\I
(k)

2
k

(E h
1
[J

] I

) +

k=0
2
k

(E h
1
[J

] I
(k+1)

)(1 + 2
k1
)
3
.
It follows that either
2
k

(E h
1
[J

] I

)
1
8

and R
0
, or there is some k N such that
2
k

(E h
1
[J

] I
(k+1)

)(1 + 2
k1
)
3
2
k4

and
2
k

(E h
1
[J

] I
(k+1)

) (1 + 2
k1
)
3
2
k4
2
2k7
,
so that R
k+1
. QQQ
(c) For every k N,

R
k
I

2
11k
E. PPP If R
k
= , this is trivial. Otherwise, enumerate R
k
as
j

jn
in
such a way that k

j
k

l
if j l n. Dene q : 0, . . . , n 0, . . . , n inductively by the rule
q(l) = min(l q(j) : j < l, (I
(k)

q(j)
J

q(j)
) (I
(k)

l
J

l
) ,= )
for each l n. A simple induction shows that q(q(l)) = q(l) l for every l n. Note that, for l n, I
(k)

q(l)
I
(k)

l
,= ,
so that
I

l
I
(k)

l
I
(k+2)

q(l)
,
because I
(k)

l
I
(k)

q(l)
. Moreover, if j < l n and q(j) = q(l), then both J

j
and J

l
meet J

q(j)
, therefore include it,
and J

j
J

l
. But as
j
and
l
are distinct members of P
2
,
l
,
j
and I

j
I

l
must be empty.
Set M = q(j) : j n. We have

R
k
I

mM

jn
q(j)=m
I

mM
I
(k+2)

m
= 2
k+2

mM
I

m
2
k+2

mM
2
92k
(E h
1
[J

m
] I
(k)

m
)
2
k+2
2
92k
E = 2
11k
E
because if l, m M and l < m then I
(k)

l
J

l
and I
(k)

m
J

m
are disjoint (since otherwise q(m) l and there can be
no j such that q(j) = m), so that h
1
[J

l
] I
(k)

l
and h
1
[J

m
] I
(k)

m
are disjoint. QQQ
(d) Accordingly
$$\gamma \sum_{\tau \in P_2} \mu I_\tau \le \gamma \sum_{k=0}^{\infty} \sum_{\tau \in R_k} \mu I_\tau \le 2^{12} \mu E,$$
as required.
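For the record, the constant $C_5 = 2^{12}$ is just a geometric series. Reading the bound in (c) as $\gamma \sum_{\tau \in R_k} \mu I_\tau \le 2^{11-k} \mu E$ (an assumption about the garbled display there), and using $P_2 = \bigcup_{k \in \mathbb{N}} R_k$ from (b), the computation is:

```latex
\gamma \sum_{\tau \in P_2} \mu I_\tau
  \le \sum_{k=0}^{\infty} \gamma \sum_{\tau \in R_k} \mu I_\tau
  \le \sum_{k=0}^{\infty} 2^{11-k} \mu E
  = 2^{11} \cdot 2 \cdot \mu E
  = 2^{12} \mu E
  = C_5\, \mu E .
```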
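286J Lemma If $P \subseteq Q$ is finite and $f \in \mathcal{L}^2_{\mathbb{C}}$, then
$$\sum_{\sigma,\tau \in P,\ J_\sigma = J_\tau} (f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) \le C_3 \sum_{\sigma \in P} |(f|\phi_\sigma)|^2 \le C_3 \Bigl\| \sum_{\sigma \in P} (f|\phi_\sigma)\phi_\sigma \Bigr\|_2 \|f\|_2.$$
proof
$$\sum_{\sigma,\tau \in P,\ J_\sigma = J_\tau} (f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f) \le \sum_{\sigma,\tau \in P,\ J_\sigma = J_\tau} \tfrac12 \bigl( |(f|\phi_\sigma)|^2 + |(f|\phi_\tau)|^2 \bigr) |(\phi_\sigma|\phi_\tau)|$$
(because $|\xi\zeta| \le \frac12(|\xi|^2 + |\zeta|^2)$ for all complex numbers $\xi$, $\zeta$)
$$= \sum_{\sigma \in P} \sum_{\tau \in P,\ J_\tau = J_\sigma} |(f|\phi_\sigma)|^2 |(\phi_\sigma|\phi_\tau)| \le \sum_{\sigma \in P} |(f|\phi_\sigma)|^2 \sum_{\tau \in P,\ J_\tau = J_\sigma} C_3 \int_{I_\tau} w_\sigma$$
(by 286Ge, since $k_\sigma = k_\tau$ if $J_\sigma = J_\tau$)
$$\le \sum_{\sigma \in P} |(f|\phi_\sigma)|^2\, C_3 \int_{-\infty}^{\infty} w_\sigma$$
(because if $\tau$, $\tau'$ are distinct members of $P$ and $J_\tau = J_{\tau'}$, then $I_\tau$ and $I_{\tau'}$ are disjoint)
$$= C_3 \sum_{\sigma \in P} |(f|\phi_\sigma)|^2 = C_3 \sum_{\sigma \in P} (f|\phi_\sigma)(\phi_\sigma|f) = C_3 \Bigl( \sum_{\sigma \in P} (f|\phi_\sigma)\phi_\sigma \Bigm| f \Bigr) \le C_3 \Bigl\| \sum_{\sigma \in P} (f|\phi_\sigma)\phi_\sigma \Bigr\|_2 \|f\|_2$$
by Cauchy's inequality (244Eb).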
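286K Lemma Set
$$C_6 = 4\bigl(C_3 + 4C_3\sqrt{2C_4}\bigr).$$
Let $P \subseteq Q$ be a finite set, $f \in \mathcal{L}^2_{\mathbb{C}}$ and $\|f\|_2 = 1$. Suppose that $\gamma \ge \operatorname{energy}_f(P)$. Then we can find finite sets $P_1 \subseteq P$ and $P_2 \subseteq Q$ such that $\operatorname{energy}_f(P_1) \le \frac12\gamma$, $\gamma^2 \sum_{\tau \in P_2} \mu I_\tau \le C_6$, and every member of $P \setminus P_1$ is $\le \tau$ for some $\tau \in P_2$.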
proof (a) We may suppose that $\gamma > 0$ and that $P \ne \emptyset$, since otherwise we can take $P_1 = P$ and $P_2 = \emptyset$.
(i) For Q, A Q set
T

= : P,
r
, (A) =

A
[(f[

)[
2
.
There are only nitely many sets of the form T

; let R Q be a non-empty nite set such that whenever Q and


T

is not empty, there is a

R such that T

= T

and k

; this is possible because if A P is not empty then


k

min
A
k

whenever A = T

.
(ii) Choose
0
,
1
, . . . , P

0
, P

1
, . . . inductively, as follows. P

0
= P. Given that P

j
P is not empty, consider
R
j
= : R, 2
k

(P

j
T

)
1
4

2
.
If R
j
= , stop the induction and set n = j, P
2
=
l
: l < j, P
1
= P

j
. Otherwise, among the members of R
j
take
one with y

as far to the left as possible, and call it


j
; set P

j+1
= P

j
: P,
j
, and continue. Note that
as R
j+1
R
j
for every j, y

j+1
y

j
for every j.
The induction must stop at a nite stage because if it does not stop with n = j then (P

j
T

j
) > 0, so P

j
T

j
is
not empty and P

j+1
P

j
T

j
is a proper subset of P

j
, while P

0
= P is nite. Since R
n
= ,
energy
f
(P
1
) = energy
f
(P

n
) = sup
Q
2
k

/2
_
(P

n
T

)
= max
R
2
k

/2
_
(P

n
T

)
1
2
.
We also have P P
1

j
: j < n.
(iii) Set P

j
= P

j
T

j
P

j
P

j+1
for j < n, so that P

j

j<n
is disjoint, and P

j<n
P

j
P. Then if P

,
j < n and J

j
J
l

, I

j
= . PPP??? Otherwise, take l < n such that P

l
. Then J

j
J

, so k

j
k

and I

must be included in I

j
; thus
j
and / P

j+1
. On the other hand,
y

j
J

j
J
l

, y

l
J
r

l
J
r

,
so y

j
< y

l
and j < l. But this means that / P

l
, while we chose l such that P

l
. XXXQQQ
It follows that if , P

are distinct and J


l

J
l

is not empty, then I

= . PPP If J

= J

this is true just


because ,= . Otherwise, since J

and J

intersect, one is included in the other; suppose that J

. Since J

meets J
l

, J

J
l

. Now let j < n be such that P

j
; then
j
, so J

j
J

J
l

, and I

j
I

= by
the last remark. QQQ
(b) Now let us estimate

j<n
I

j
=

j<n
2
k

2
4

j<n
(P

j
)
= 4

j<n

j
[(f[

)[
2
= 4

[(f[

)[
2
= 4
say.
Because |f|
2
= 1, we have |

(f[

|
2
(see 286J). So

2
|

(f[

|
2
2
=

,P

(f[

)(

)(

[f)
=

,P

,J

=J

(f[

)(

)(

[f) + 2

,P

,J

J
l

(f[

)(

)(

[f)
because (

) = 0 unless J
l

J
l

,= , as noted in 286Eb. Take these two terms separately. For the rst, we have

,P

,J

=J

(f[

)(

)(

[f)

C
3

by 286J. For the second term, we have



,P

,J

J
l

(f[

)(

)(

[f)

[(f[

)[

,J

J
l

[(

)(

[f)[

[(f[

)[
2

_

P

,J

J
l

[(

)(

[f)[
_
2
=

j<n
H
j
,
where for j < n I set
H
j
=

j
_

P

,J

J
l

[(

)(

[f)[
_
2
.
Now we can estimate H
j
by observing that, for any P

,
[(

[f)[ = 2
k

/2
energy
f
() 2
k

/2
,
while if , P

and J
l

then
[(

)[ 2
k

/2
2
k

/2
C
3
_
I

by 286Ge. We also need to know that if P

j
and ,

are distinct elements of P

such that J

J
l

J
l

, then I

,
I

and I

j
are all disjoint, by (a-iii) above, because J

j
J

. So we have
H
j

j
_

P

,J

J
l

2
k

/2
2
k

/2
2
k

/2
C
3
_
I

_
2
= C
2
3

j
2
k

_

P

,J

J
l

_
I

_
2
C
2
3

j
2
k

(
_
R\I

j
w

)
2
C
2
3

k=k

j
2
k

j
,k

=k
_
R\I

j
w

C
2
3

k=k

j
2
k
C
4
(by 286Gf, since
j
for every P

j
)
= C
2
3

2
2
k

j
+1
C
4
.
Accordingly

j<n
H
j
2C
2
3

2
C
4

j<n
2
k

j
2C
2
3
C
4
4.
Putting these together,

,P

,J

=J

(f[

)(

)(

[f)

+ 2

,P

,J

J
l

(f[

)(

)(

[f)

C
3
+ 2

j<n
H
j
C
3
+ 4C
3

_
2C
4

=
_
C
3
+ 4C
3
_
2C
4
_
=
1
4
C
6
.
But this means that

j<n
I

j
4 C
6
,
and P
2
=
j
: j < n has the property required.
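286L Lemma Set
$$C_7 = C_1 \Bigl( 6 + \frac{28}{w(3/2)} + \frac{4\sqrt{14 C_3}}{w(3/2)} \Bigr).$$
Suppose that $P$ is a finite subset of $Q$ with an upper bound $\tau$ in $Q$. Suppose that $E \subseteq \mathbb{R}$ is measurable, $h : \mathbb{R} \to \mathbb{R}$ is measurable and $f \in \mathcal{L}^2_{\mathbb{C}}$. Then
$$\sum_{\sigma \in P} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| \le 2^{-k_\tau} C_7 \operatorname{energy}_f(P) \operatorname{mass}_{Eh}(P).$$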
proof Set = energy
f
(P),

= mass
Eh
(P). If P = the result is trivial, so suppose that P ,= .
(a)(i) For a dyadic interval J =
_
2
k
n, 2
k
(n + 1)
_
set J

=
_
2
k
(n 1), 2
k
(n + 2)
_
, the half-open interval with the
same centre as J and three times its length. Let be the family of those J J such that I

, J

for any P
such that I

J. Because P is nite, all suciently small intervals belong to , and

= R; let / be the set of


maximal members of , so that / is disjoint. Then

/ = R. PPP The point is that P ,= ; x P for the moment.


If J , consider for each n N the interval

J
(n)
J including J with length 2
n
J. Then there is some n N such
that

J
(n)
I

and I

(

J
(n)
)

, so that

J
(k)
/ for any k n, and there must be some k < n such that

J
(k)
/.
Thus J

J
(k)

/; as J is arbitrary,

/ =

= R. QQQ
(ii) For K /, let l
K
Z be such that K = 2
l
K
. If l
K
k

, so that K I

, then K must lie within the


interval

I with centre x

and length 7I

, since otherwise we should have I

= , where

K is the dyadic interval
of length 2K including K, and

K would belong to . But this means that

KK,l
K
k

K 7I

= 7 2
k

,
because / is disjoint.
(iii) For any l < k

, there are just three members K of / such that l


K
= l. PPP If J J and J > I

, then either
I

or I

= , and J i I

is empty. This means that if K J and K = 2


l
, K / i I

is
empty and I

. So if I


_
2
l
n, 2
l
(n + 1)
_
and K =
_
2
l
m, 2
l
(m+ 1)
_
, we shall have K / i
either m = n 2 or m = n + 2 or m = n 3 is even or m = n + 3 is odd;
which for any given n happens for just three values of m. QQQ
(b) For P, let

be a complex number of modulus 1 such that

(f[

)
_
Eh
1
[J
r

= [(f[

)
_
Eh
1
[J
r

[.
Set W = P /. For (, K) W, set

K
= (f[

)
_
Eh
1
[J
r

]K

.
The aim of the proof is to estimate

(f[

)
_
Eh
1
[J
r

(,K)W

K
.
It will be helpful to note straight away that

(,K)W
[
K
[

P
[(f[

)[
_

[
is nite.
Set
W
0
= (, K) : P, K /, k

l
K
k

,
W
1
= (, K) : P, K /, l
K
< k

,
W
2
= (, K) : P, K /, k

< l
K
, ,
r
,
W
3
= (, K) : P, K /, k

< l
K
,
r
.
Because k

for every P, W = W
0
W
1
W
2
W
3
. I will give estimates for

j
=

(,K)W
j

K
for each j; the three components in the expression for C
7
given above are bounds for [
0
[+[
1
[, [
2
[ and [
3
[ respectively.
(c)(i) Whenever K / and P,
[(f[

)[ 2
k

/2
,
by 286H, and
_
Eh
1
[J
r

]K
[

[ 2
3k

/2
C
1
_
Eh
1
[J
r

]K
w
2

(286E(c-iii))
2
3k

/2
C
1
_
Eh
1
[J

]
w

sup
xK
w

(x)
2
3k

/2
C
1

sup
xK
w

(x)
= 2
k

/2
C
1

w(2
k

(x

, K)),
where I write (x

, K) for inf
xK
[x x

[. So, for xed K / and k l


K
,

P,k

=k
[
K
[ 2
k
C
1

P,k

=k
w(2
k
(x

, K))
2
k
C
1

n=2
kl
K
w(n +
1
2
)
because the x

, for P and k

= k, are all distinct and all a distance at least 2


l
K
from K (because I

, K

); so
there are at most two with (x

, K) = 2
k
(n +
1
2
) for each n 2
kl
K
. So we have

P,k

=k
[
K
[ 2
k
C
1

(1 + 2
kl
K
)
2
2
k2
C
1

by 286Ga. And this is true whenever K / and k l


K
.
(ii) Now
[
0
[

(,K)W
0
[
K
[ =

KK
l
K
k

P
k

l
K
[
K
[
=

KK
l
K
k

k=l
K

P
k

=k
[
K
[

KK
l
K
k

k=l
K
2
k2
C
1

= C
1

KK,l
K
k

2
l
K
1
=
1
2
C
1

KK,l
K
k

K 4 2
k

C
1

by the formula in (a-ii). This deals with


0
.
(d) Next consider W
1
. We have
[
1
[

(,K)W
1
[
K
[ =

KK
l
K
<k

k=k

P
k

=k
[
K
[
C
1

KK
l
K
<k

k=k

2
k
(1 + 2
kl
K
)
2
(by (c-i) above )
C
1

l=

KK
l
K
=l
(1 + 2
k

l
)
2

k=k

2
k
= 2
k

+1
C
1

l=

KK
l
K
=l
(1 + 2
k

l
)
2
= 6 2
k

C
1

l=
(1 + 2
k

l
)
2
(by (a-iii))
= 6 2
k

C
1

l=1
(1 + 2
l
)
2
2 2
k

C
1

because

l=1
(1 + 2
l
)
2

l=1
1
4
l
=
1
3
.
This deals with
1
.
(e) For K /, set G
K
= K E

P,k

<l
K
h
1
[J

]. Then G
K
2

K/w(
3
2
). PPP If l
K
k

, then G
K
= , so
we may suppose that l
K
> k

. Let

K J be the dyadic interval containing K and with twice the length. Then

K / ,
so there is a P such that

K

and I



K, so that k

l
K
1 k

. Let Q be such that


and k

= l
K
1 (286F(a-iv)). Then I

meets

K

, so

K is either equal to I

or adjacent to it, and [x x

[
3
2
2
k

for every x

K, therefore for every x K. Accordingly
w

(x) 2
k

w(
3
2
) = w(
3
2
)/2K
for every x K. On the other hand, because P and ,
_
Eh
1
[J

]
w

. So
(E h
1
[J

] K) 2

K/w(
3
2
).
Now suppose that

P and k

< l
K
. Then J

is the dyadic interval of length 2


k

including J

. But J

is
the dyadic interval of length 2
k

including J

, so includes J

, and h
1
[J

] h
1
[J

]. As

is arbitrary, G
K

E h
1
[J

] K and G
K
2

K/w(
3
2
), as claimed. QQQ
(f )(i) For x R, set
v
2
(x) =

(,K)W
2

(f[

(x)(E h
1
[J
r

] K)(x)

.
Then, for any x R, either v
2
(x) = 0 or there is a k k

such that
v
2
(x) =

P,k

=k

(f[

(x)

.
PPP If v
2
(x) ,= 0, then we have a pair (, L) W
2
such that x E h
1
[J
r

] L. Now suppose that (, K) W


2
and
either K ,= L or k

,= k

. If K ,= L then of course x / L, because / is a disjoint family. If k

,= k

, then examine
J

and J

. These are dyadic intervals of dierent lengths, and both include J

. On the other hand, neither of the


right-hand halves J
r

, J
r

includes J
r

, because ,
r
and ,
r
. So either J

J
r

= (if k

< k

) or J

J
r

= (if
k

> k

); in either case, J
r

J
r

is empty, and x / h
1
[J
r

].
On the other hand, of course, if P and k

= k

, then k

< l
L
and J
r

= J
r

does not include J


r

, so that
(, L) W
2
and x E h
1
[J
r

] L.
Thus
v
2
(x) = [

(,K)W
2
,xh
1
[J
r

]K

(f[

(x)[ = [

P,k

=k

(f[

(x)[,
and we can set k = k

. QQQ
(ii) It follows that v
2
(x) 2C
1
for every x R. PPP If v
2
(x) = 0 this is trivial. Otherwise, take k from (i). Then
v
2
(x)

P,k

=k
[(f[

(x)[

P,k

=k
2
k/2
2
k/2
C
1
w

(x)
(by 286H and 286E(c-iii))
= C
1

P,k

=k
w(2
k
(x x

)) C
1

n=
w(2
k
x n
1
2
)
(because the x

, for P and k

= k, are all distinct and of the form 2


k
(n +
1
2
))
2C
1

by 286Gc. QQQ
(iii) Note also that, if v
2
(x) > 0, there is a pair (, K) W
2
such that x h
1
[J

] K, so that k

< l
K
and x G
K
. But now we have
[
2
[ =


(,K)W
2

(f[

)
_

(E h
1
[J
r

] K)

v
2
=

KK,l
K
>k

_
G
K
v
2

KK,l
K
>k

4C
1

K/w(
3
2
)
(putting the estimates in (e) and (ii) just above together)
28 2
k

C
1

/w(
3
2
)
by (a-ii). This deals with
2
.
(g) Set P

= : P,
r
and g =

(f[

. Then
| g|
2
2
2
k

C
3

2
.
PPP If ,

and k

,= k

, then (

) = 0 (286Fb). While if k

= k

, then J

= J

, because and

have a
common upper bound . So
| g|
2
2
=

(f[

)(

)(

[f)

,J

=J

(f[

)(

)(

[f)

C
3

[(f[

)[
2
(by 286J)
C
3
2
k

2
by the denition of energy. QQQ
(h) For m N, set
g
m
=

,k

(f[

.
Then whenever x, x

R and [x x

[ 2
m
, [ g
m
(x)[
1
2
C
1
g

(x

), where
g

(x

) = sup
ax

b,a<b
1
ba
_
b
a
[ g[
as in 286A. PPP (i) Since k

for every P

, we may take it that m k

. Let

J be the dyadic interval of length
2
m
including J

, and y its midpoint. Set = S


y
D
2
m
/3

, that is, (y) =

(
1
3
2
m
(y y)) for y R.
(ii) If P

and k

m and

(y) ,= 0, then y J

. But J



J J

is not empty, so J



J, [y y[
1
2
2
m
,
[
1
3
2
m
(y y)[
1
6
and (y) = 1.
(iii) If P

and k

> m and

(y) ,= 0, then J
r



J J
r

is non-empty, so

J J
r

and y y

y; now
y y = ( y y

) + (y

y)
1
2
2
m
+
1
20
2
k

3
5
2
m
, [
1
3
2
m
(y y)[
1
5
and (y) = 0.
(iv) What this means is that if P

then

if k

m,
= 0 if k

> m,
so that

g
m
=

g.
(v) By 283M, g
m
=
1

2
g

, where g

is the convolution of g and the inverse Fourier transform

of .
(Strictly speaking, 283M, with the help of 284C, tells us that g
m
and
1

2
g

have the same Fourier transforms. By


283G, they are equal almost everywhere; by 255K, the convolution is dened everywhere and is continuous; so in fact
they are the same function.) Now

= 3 2
m
M
y
D
32
m

= 3 2
m
M
y
D
32
m,
that is,

(x) = 3 2
m
e
ix y
(3 2
m
x)
for y R.
(vi) Set (x) = min(w(3), w(x)) for x R, so that is non-decreasing on ], 3], non-increasing on [3, [,
and constant on [3, 3], and [(x)[ C
1
(x) for every x, by the choice of C
1
(286Ec). Take any x, x

R such that
[x x

[ 2
m
. Then
[ g
m
(x)[
1

2
_

[ g(x t)[[

(t)[dt =
32
m

2
_

[ g(x t)[[(3 2
m
t)[dt

32
m

2
C
1
_

[ g(x t)[(3 2
m
t)dt =
32
m

2
C
1
_

[ g(x +t)[(3 2
m
t)dt
(because is an even function)

32
m

2
C
1
_

(3 2
m
t)dt sup
a2
m
,b2
m
1
ba
_
b
a
[ g(x +t)[dt
(by 286B, because t (3 2
m
t) is non-decreasing on ], 2
m
], non-increasing on [2
m
, [ and constant on
[2
m
, 2
m
])
=
1

2
C
1
_

sup
ax2
m
,bx+2
m
1
ba
_
b
a
[ g[
1
2
C
1
_

w g

(x

)
(because if a x 2
m
and b x + 2
m
then a x

b)
=
1
2
C
1
g

(x

)
as required. QQQ
(i) For x R, set
v
3
(x) =

(,K)W
3

(f[

(x)(E h
1
[J
r

] K)(x)

.
Then whenever L / and x, x

L, [v
3
(x)[ C
1
g

(x

). PPP We may suppose that v


3
(x) ,= 0, so that, in particular,
x E. The only pairs (, K) contributing to the sum forming v
3
(x) are those in which x K, so that K = L, and
h(x) J
r

. Moreover, since we are looking only at such that


r
, so that J
r

J
r

, J
r

will always be the dyadic


interval of length 2
k

1
including J
r

. So these intervals are nested, and there will be some m such that (for
r
)
h(x) J
r

i k

m. Accordingly
v
3
(x) =

,mk

<l
L

(f[

(x)

= [ g
l
L
1
(x) g
m1
(x)[
(we must have m < l
L
because v
3
(x) ,= 0). Now [x x

[ 2
l
L
+1
2
m+1
, so (h) tells us that both [ g
l
L
1
(x)[ and
[ g
m1
(x)[ are at most
1
2
C
1
g

(x

), and v
3
(x) C
1
g

(x

), as claimed. QQQ
It follows that v
3
(x)
C
1
L
_
L
g

for every x L.
(j) Now we are in a position to estimate
[
3
[ = [

(,K)W
3

K
[
_

v
3

KK,l
K
>k

_
G
K
v
3
(because if v
3
(x) ,= 0 there are (, K) W
3
such that x K, and in this case x G
K
and l
K
> k

KK,l
K
>k

G
K

C
1
K
_
K
g

(by (i) above, because G


K
K)
C
1

KK,l
K
>k

w(
3
2
)
_
K
g

(by (e))

2C
1

w(
3
2
)
_

I
g

(because if l
K
> k

then K

I, as noted in (a-ii))

2C
1

w(
3
2
)
_

I | g

|
2
(by Cauchys inequality)

2C
1

w(
3
2
)

7 2
k

/2

8| g|
2
(by the Maximal Theorem, 286A)

4C
1

14
w(
3
2
)
2
k

/2
2
k

/2
_
C
3

(by (g))
=
4C
1

14C
3
w(
3
2
)

2
k

.
(k) Assembling these,

(f[

)
_
Eh
1
[J
r

P,KK

K
=
3

j=0

(,K)W
j

K

3

j=0
[
j
[
4 2
k

C
1

+ 2 2
k

C
1

+ 28 2
k

C
1

/w(
3
2
)
+ 4
_
14C
3
2
k

C
1

/w(
3
2
)
= 2
k

C
7

,
as claimed.
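The constants introduced so far depend only on $C_1$, $C_3$ and $C_4$, which are fixed earlier (286E, 286G) and are not restated in this part of the section. The sketch below simply transcribes the formulas for $C_5$, $C_6$ and $C_7$ so that the bookkeeping in part (k) can be checked; it assumes $w(x) = (1+|x|)^{-3}$, and the numerical inputs in the usage lines are placeholders, not the actual values of $C_1$, $C_3$, $C_4$.

```python
import math

def w(x):
    """The weight w(x) = (1 + |x|)^-3 (assumed form)."""
    return (1.0 + abs(x)) ** -3

def c5():
    return 2 ** 12

def c6(c3, c4):
    """C_6 = 4 (C_3 + 4 C_3 sqrt(2 C_4)), as in 286K."""
    return 4.0 * (c3 + 4.0 * c3 * math.sqrt(2.0 * c4))

def c7(c1, c3):
    """C_7 = C_1 (6 + 28 / w(3/2) + 4 sqrt(14 C_3) / w(3/2)), as in 286L."""
    return c1 * (6.0 + 28.0 / w(1.5) + 4.0 * math.sqrt(14.0 * c3) / w(1.5))

# Placeholder inputs, for illustration only.
print(w(1.5), c5(), c6(1.0, 0.5), c7(1.0, 1.0))
```

The term 6 in $C_7$ is the sum $4 + 2$ of the bounds for the pieces $W_0$ and $W_1$, and the other two terms bound the contributions of $W_2$ and $W_3$; since $w(3/2) = 8/125$, the factor $1/w(3/2)$ alone is about 15.6.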
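286M The Lacey-Thiele lemma Set $C_8 = 3C_7(C_5 + C_6)$. Then
$$\sum_{\sigma \in Q} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| \le C_8$$
whenever $f \in \mathcal{L}^2_{\mathbb{C}}$, $\|f\|_2 = 1$, $\mu E \le 1$ and $h : \mathbb{R} \to \mathbb{R}$ is measurable.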
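proof (a) The first step is to combine 286I and 286K, as follows: if $P \subseteq Q$ is finite and $\max(\sqrt{\operatorname{mass}_{Eh}(P)}, \operatorname{energy}_f(P)) \le \gamma$, there are $P' \subseteq P$ and $R \subseteq Q$ such that
$$\max\bigl(\sqrt{\operatorname{mass}_{Eh}(P')}, \operatorname{energy}_f(P')\bigr) \le \tfrac12\gamma, \qquad \gamma^2 \sum_{\tau \in R} 2^{-k_\tau} \le C_5 + C_6,$$
and every member of $P \setminus P'$ is $\le \tau$ for some $\tau \in R$. PPP Since $\operatorname{mass}_{Eh}(P) \le \gamma^2$, 286I tells us that there are $P_0 \subseteq P$, $R_0 \subseteq Q$ such that $\operatorname{mass}_{Eh}(P_0) \le \frac14\gamma^2$, $\gamma^2 \sum_{\tau \in R_0} 2^{-k_\tau} \le C_5$, and every member of $P \setminus P_0$ lies below some member of $R_0$. Now turn to 286K: since $\operatorname{energy}_f(P_0) \le \operatorname{energy}_f(P) \le \gamma$, we can find $P' \subseteq P_0$ and $R_1 \subseteq Q$ such that $\operatorname{energy}_f(P') \le \frac12\gamma$, $\gamma^2 \sum_{\tau \in R_1} 2^{-k_\tau} \le C_6$, and every member of $P_0 \setminus P'$ lies below some member of $R_1$. Now $\operatorname{mass}_{Eh}(P') \le \operatorname{mass}_{Eh}(P_0) \le \frac14\gamma^2$, so $\max(\sqrt{\operatorname{mass}_{Eh}(P')}, \operatorname{energy}_f(P')) \le \frac12\gamma$; and setting $R = R_0 \cup R_1$, $\gamma^2 \sum_{\tau \in R} 2^{-k_\tau} \le C_5 + C_6$, while every member of $P \setminus P'$ lies below some member of $R$. QQQ
(b) Now take any finite $P \subseteq Q$. Let $k \in \mathbb{N}$ be such that $\max(\sqrt{\operatorname{mass}_{Eh}(P)}, \operatorname{energy}_f(P)) \le 2^k$. By (a), we can choose $\langle P_n \rangle_{n \in \mathbb{N}}$, $\langle R_n \rangle_{n \in \mathbb{N}}$ inductively such that $P_0 = P$ and, for each $n \in \mathbb{N}$,
$$P_{n+1} \subseteq P_n, \qquad \text{every member of } P_n \setminus P_{n+1} \text{ is} \le \tau \text{ for some } \tau \in R_n,$$
$$\max\bigl(\sqrt{\operatorname{mass}_{Eh}(P_n)}, \operatorname{energy}_f(P_n)\bigr) \le 2^{k-n}, \qquad 2^{2k-2n} \sum_{\tau \in R_n} 2^{-k_\tau} \le C_5 + C_6.$$
Since $\operatorname{energy}_f(\{\sigma\}) = 2^{k_\sigma/2}|(f|\phi_\sigma)| > 0$ whenever $(f|\phi_\sigma) \ne 0$ (286H), $(f|\phi_\sigma) = 0$ whenever $\sigma \in \bigcap_{n \in \mathbb{N}} P_n$, and
$$\sum_{\sigma \in P} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| = \sum_{\sigma \in \bigcup_{n \in \mathbb{N}} P_n \setminus P_{n+1}} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| = \sum_{n=0}^{\infty} \sum_{\sigma \in P_n \setminus P_{n+1}} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr|$$
$$\le \sum_{n=0}^{\infty} \sum_{\tau \in R_n} \sum_{\sigma \in P_n,\ \sigma \le \tau} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| \le \sum_{n=0}^{\infty} \sum_{\tau \in R_n} 2^{-k_\tau} C_7 \operatorname{energy}_f(P_n) \operatorname{mass}_{Eh}(P_n)$$
(by 286L)
$$\le \sum_{n=0}^{\infty} C_7 (C_5 + C_6)\, 2^{2n-2k}\, 2^{k-n} \min(1, 2^{2k-2n})$$
(because $\operatorname{mass}_{Eh}(P_n) \le 1$ for every $n$, as noted in 286H)
$$= C_7 (C_5 + C_6) \sum_{n=0}^{\infty} \min(2^{n-k}, 2^{k-n}) \le C_7 (C_5 + C_6) \sum_{n=-\infty}^{\infty} \min(2^{n}, 2^{-n}) = 3 C_7 (C_5 + C_6).$$
(c) Since this is true for every finite $P \subseteq Q$,
$$\sum_{\sigma \in Q} \Bigl| (f|\phi_\sigma) \int_{E \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| \le C_8,$$
as claimed.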
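The factor 3 in $C_8$ comes from the two-sided geometric sum at the end of (b). The following short check uses exact rational arithmetic to confirm both the simplification of the $n$-th term and the value of the sum; the function names are illustrative.

```python
from fractions import Fraction

TWO = Fraction(2)

def nth_term(n, k):
    """2^(2n-2k) * 2^(k-n) * min(1, 2^(2k-2n)), the n-th term before simplifying."""
    return TWO ** (2 * n - 2 * k) * TWO ** (k - n) * min(Fraction(1), TWO ** (2 * k - 2 * n))

def simplified_term(n, k):
    """min(2^(n-k), 2^(k-n)), the same term after simplifying."""
    return min(TWO ** (n - k), TWO ** (k - n))

def two_sided_sum(cutoff):
    """Sum of min(2^n, 2^-n) over -cutoff <= n <= cutoff."""
    return sum(min(TWO ** n, TWO ** (-n)) for n in range(-cutoff, cutoff + 1))

print(two_sided_sum(10))  # equals 3 - 2^(1 - cutoff), increasing to 3
```

The partial sums are $3 - 2^{1-N}$, so the full sum over $n \in \mathbb{Z}$ is exactly 3, independently of the starting exponent $k$.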
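286N Lemma Set $C_9 = C_8\sqrt{2}$. Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$, $h : \mathbb{R} \to \mathbb{R}$ is measurable and $\mu F < \infty$. Then
$$\sum_{\sigma \in Q} \Bigl| (f|\phi_\sigma) \int_{F \cap h^{-1}[J^r_\sigma]} \phi_\sigma \Bigr| \le C_9 \|f\|_2 \sqrt{\mu F}.$$
proof This is trivial if $\|f\|_2 = 0$, that is, $f = 0$ a.e. So we may take it that $\|f\|_2 > 0$. Dividing both sides by $\|f\|_2$, we may suppose that $\|f\|_2 = 1$.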
Let k Z be such that 2
k1
< F 2
k
. We have a bijection

: Q Q dened by saying that

= (2
k
I

, 2
k
J

); so that k

= k

+k, x

= 2
k
x

, y
l

= 2
k
y
l

, J
r

= 2
k
J
r

, and for every x R

(2
k
x) = 2
k

/2
e
2
k
iy
l

x
(2
k

(2
k
x x

))
= 2
k

/2
e
iy
l

x
(2
k

+k
(x 2
k
x

))
= 2
k

/2
e
iy
l

x
(2
k

(x x

)) = 2
k/2

(x).
Write

F = 2
k
F, so that

F 1, and

h(x) = 2
k
h(2
k
x) for every x. Then, for Q,
F h
1
[J
r

] = x : x F, h(x) J
r

= x : 2
k
x

F, 2
k

h(2
k
x) J
r

= x : 2
k
x

F,

h(2
k
x) J
r

= 2
k
x : x

F,

h(x) J
r

.
Write

f(x) = 2
k/2
f(2
k
x), so that
|

f|
2
= 2
k/2
|D
2
kf|
2
= |f|
2
= 1,
while
(f[

) =
_

= 2
k
_

f(2
k
x)

(2
k
x)dx = (

f[

)
for every Q. Putting all these together,

(f[

)
_
Fh
1
[J
r

= 2
k

(f[

)
_
2
k
(Fh
1
[J
r

])

(2
k
x)dx

= 2
k/2

(

f[

)
_

h
1
[J
r

= 2
k/2

(

f[

)
_

h
1
[J
r

2
k/2
C
8
(by the Lacey-Thiele lemma, applied to

h,

F and

f)
C
9
_
F = C
9
|f|
2
_
F.
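The only place where the factor $\sqrt{2}$ in $C_9$ enters is the choice of the scaling exponent $k$ with $2^{k-1} < \mu F \le 2^k$, which gives $2^{k/2} \le \sqrt{2}\sqrt{\mu F}$. A minimal sketch of that choice follows; `scaling_exponent` is a hypothetical helper name, and the loops only guard against floating-point rounding at exact powers of two.

```python
import math

def scaling_exponent(mu_F):
    """Return the integer k with 2^(k-1) < mu_F <= 2^k, for mu_F > 0."""
    k = math.ceil(math.log2(mu_F))
    while 2.0 ** k < mu_F:          # rounding guard: k too small
        k += 1
    while 2.0 ** (k - 1) >= mu_F:   # rounding guard: k too large
        k -= 1
    return k

for mu in (0.3, 1.0, 5.0, 1024.0):
    k = scaling_exponent(mu)
    # 2^(k/2) is at most sqrt(2) * sqrt(mu), the inequality used for C_9.
    print(mu, k, 2.0 ** (k / 2), math.sqrt(2.0 * mu))
```

After rescaling by $2^{-k}$ the set has measure in $]\frac12, 1]$, so the Lacey-Thiele lemma applies with $\mu E \le 1$.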
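286O Lemma Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}$. For $x \in \mathbb{R}$, set
$$(Af)(x) = \sup_{z \in \mathbb{R}} \sum_{\sigma \in Q,\ z \in J^r_\sigma} |(f|\phi_\sigma)\phi_\sigma(x)|.$$
Then $Af : \mathbb{R} \to [0, \infty]$ is Borel measurable, and $\int_F Af \le C_9 \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.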
proof (a) For z R and nite P Q, set g
Pz
=

P,zJ
r

[(f[

[, so that
Af(x) = supg
Pz
(x) : P Q is nite, z R.
Because every g
Pz
is continuous, Af is Borel measurable and
_
F
Af = sup
_
F
g
P
0
z
0
. . . g
P
n
z
n
: P
0
, . . . , P
n
Q are nite, z
0
, . . . , z
n
R
for every measurable set F (256M).
(b) Given nite sets P
0
, . . . , P
n
Q and z
0
, . . . , z
n
, x R, set
g(x) = max
jn
g
P
j
z
j
(x), l(x) = minj : j n, g(x) = g
P
j
z
j
(x), h(x) = z
l(x)
;
because every g
P
j
z
j
is continuous, l : R 0, . . . , n and h : R R are measurable. Now
_
F
g =
_
F
g
P
l(x)
,h(x)
(x)dx =
_
F

P
l(x)
,h(x)J
r

[(f[

(x)[dx

_
F

Q,h(x)J
r

[(f[

(x)[dx
=

Q
_
Fh
1
[J
r

]
[(f[

[ C
9
|f|
2
_
F
by 286N. Since P
0
, . . . , P
n
and z
0
, . . . , z
n
are arbitrary,
_
F
Af C
9
|f|
2

F.
286P Lemma For any $z \in \mathbb{R}$, define $\theta_z : \mathbb{R} \to [0, 1]$ by setting
$$\theta_z(y) = \hat\phi(2^{-k}(y - \hat y))^2$$
whenever there is a dyadic interval $J \in \mathcal{J}$ of length $2^k$ such that $z$ belongs to the right-hand half of $J$ and $y$ belongs to the left-hand half of $J$ and $\hat y$ is the lower quartile of $J$, and zero if there is no such $J$. Then $0 \le \theta_z(y) \le 1$ for every $y \in \mathbb{R}$, $\theta_z(y) = 0$ if $y \ge z$, and $2\pi|(\hat f \times \theta_z)^{\vee}| \le Af$ for any rapidly decreasing test function $f$.
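The combinatorics behind this definition can be explored concretely. The sketch below lists, for a given $z$, the scales $k$ at which $z$ lies in the right-hand half of its dyadic interval of length $2^k$, together with the corresponding left-hand halves, and evaluates a function built by the recipe above. It is an illustration under assumptions: `bump` is a piecewise-linear stand-in for $\hat\phi$ (equal to 1 on $[-\frac16, \frac16]$ and 0 outside $[-\frac15, \frac15]$, but not smooth), only finitely many scales are scanned, and all names are hypothetical.

```python
from fractions import Fraction
from math import floor

def bump(t):
    """Piecewise-linear stand-in for phi-hat (NOT the smooth function of the text)."""
    a = abs(t)
    if a <= Fraction(1, 6):
        return Fraction(1)
    if a >= Fraction(1, 5):
        return Fraction(0)
    return (Fraction(1, 5) - a) / (Fraction(1, 5) - Fraction(1, 6))

def left_halves(z, scales):
    """For each k in scales with z in the right half of its dyadic interval
    [2^k m, 2^k (m+1)[, return (k, start, end) of the left half [start, end[."""
    found = []
    for k in scales:
        length = Fraction(2) ** k
        m = floor(z / length)
        if z >= length * (m + Fraction(1, 2)):
            found.append((k, length * m, length * (m + Fraction(1, 2))))
    return found

def theta(z, y, scales=range(-20, 21)):
    """Value at y of the function built from z by the recipe of 286P."""
    for k, start, end in left_halves(z, scales):
        if start <= y < end:
            quartile = (start + end) / 2  # lower quartile of the full interval
            return bump((y - quartile) / Fraction(2) ** k) ** 2
    return Fraction(0)

z = Fraction(5, 7)
print(left_halves(z, range(-4, 3)))
print(theta(z, Fraction(1, 4)), theta(z, z))
```

The left-hand halves found at different scales are pairwise disjoint and lie entirely to the left of $z$, which is the reason the recipe defines a single-valued function vanishing on $[z, \infty[$.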


proof (a) I had better start by explaining why the recipe above denes a function
z
. Let M be the set of those k Z
such that z belongs to the right-hand half of the dyadic interval

J
k
of length 2
k
containing z. For k M, let y
k
be the
midpoint of the left-hand half

J
l
k
of

J
k
, and set
k
(y) =

(2
k
(y y
k
))
2
for y R; then
k
is smooth and zero outside

J
l
k
. But now observe that if k, k

are distinct members of M, then



J
l
k
and

J
l
k
are disjoint, as observed in 286E(b-iii).
So
z
is just the sum

kM

k
. Because

takes values in [0, 1], so does


z
. If y z, then of course y /

J
l
k
for any
k M, so
z
(y) = 0.
(b) Fix a rapidly decreasing test function f. For k M, set R
k
= (I,

J
k
) : I J, I = 2
k
, so that :
Q, z J
r

kM
R
k
, and

kM

R
k
[(f[

(x)[ (Af)(x).
Now, for k M, 2(

f
k
)

R
k
(f[

. PPP If R
k
, y
l

= y
k
and x

is of the form 2
k
(n +
1
2
) for some
n Z. So
(f[

) =
_

(284O)
=
_

f(t) 2
k/2
e
2
k
i(n+
1
2
)(t y
k
)

(2
k
(t y
k
))dt
(by the formula in 286Eb, because

is real-valued)
= 2
k/2
_

f(2
k
t + y
k
)e
i(n+
1
2
)t

(t)dt
= 2
k/2
_

f(2
k
t + y
k
)e
i(n+
1
2
)t

(t)dt
(because

(t) = 0 if [t[
1
5
)
= 2
k/2
_

g(t)e
int
dt,
where g(t) =

f(2
k
t + y
k
)e
it/2

(t) for < t . So if we set c


n
=
1
2
_

g(t)e
int
dt, as in 282A, we have
(f[

) = 2
k/2
2c
n
when R
k
and x

= 2
k
(n +
1
2
). Note that as g is smooth and zero outside [
1
5
,
1
5
],

n=
[c
n
[ < (282Rb).
Now, for any y

J
l
k
,

R
k
(f[

(y) =

n=
2
k/2
2c
n
2
k/2
e
2
k
i(n+
1
2
)(y y
k
)

(2
k
(y y
k
))
= 2

(2
k
(y y
k
))e
2
k1
i(y y
k
)

n=
c
n
e
2
k
in(y y
k
)
= 2

(2
k
(y y
k
))e
2
k1
i(y y
k
)

n=
c
n
e
in2
k
(y y
k
)
= 2

(2
k
(y y
k
))e
2
k1
i(y y
k
)
g(2
k
(y y
k
))
(by 282L, because 2
k
[y y
k
[
1
4
< )
= 2e
2
k1
i(y y
k
)

(2
k
(y y
k
))

f(y)e
2
k1
i(y y
k
)

(2
k
(y y
k
))
= 2

f(y)
k
(y).
On the other hand, if y R

J
l
k
,
k
(y) =

(y) = 0 for every R


k
, so again

R
k
(f[

(y) = 2

f(y)
k
(y).
Next,

R
k
[(f[

)[ = 2 2
k/2

n=
[c
n
[
and
sup
R
k
_

[ = 2
k/2
_

[
are nite. So

R
k
_
[(f[

[ is nite. Accordingly, for any x R,


2(

f
k
)

(x) =
_

R
k
(f[

(x)
=
1

2
_

R
k
(f[

(y)e
ixy
dy
=
1

R
k
_

(f[

(y)e
ixy
dy
(226E)
=
1

R
k
(f[

)(

(x) =
1

R
k
(f[

(x)
by 284C. QQQ
(c) Because every
k
is non-negative,
z
=

kM

k
is bounded above by 1, and

f is integrable,
2(

f
z
)

(x) = 2
1

2
_

e
ixy

f(y)
z
(y)dy
=

2
_

kM
e
ixy

f(y)
k
(y)dy
=

kM
_

e
ixy

f(y)
k
(y)dy
(by 226E again)
= 2

kM
(

f
k
)

(x) =

kM

R
k
(f[

(x),
and
2[(

f
z
)

(x)[

kM

R
k
[(f[

(x)[ (Af)(x).
286Q Lemma For > 0 and y, z, R, set

z
(y) =
z+
(y +). Then
(a) the function (, , y, z)

z
(y) : ]0, [ R
3
[0, 1] is Borel measurable;
(b) for any rapidly decreasing test function f,
2[(

z
)

[ D
1/
A(M

f)
(in the notation of 286C) at every point.
proof (a) We need only observe that (y, z)
z
(y) : R
2
R is Borel measurable, and that (, , y, z)

z
(y) is
built up from this, + and .
(b) Set v = z +, so that

z
= D

v
. Then

z
=

f D

v
= D

(S

D
1/

f
v
)
= D

(S

(D

f)


v
) = D

((M

f)


v
),
so
(

z
)

=
_
D

((M

f)


v
)
_

= D
1/
_
S

((M

f)


v
)
_

= D
1/
M

_
(M

f)


v
_



and
2[(

z
)

[ = 2D
1/

_
(M

f)


v
_

D
1/
A(M

f)
by 286P.
286R Lemma For any y, z R,

z
(y) =
_
2
1
1

_
lim
n
1
n
_
n
0

z
(y)d
_
d
is dened, and

z
(y) =

1
(0) > 0 if y < z,
= 0 if y z.
proof (a) The case y z is trivial; because if y z then y + z + for all > 0 and R, so that

z
(y) = 0
for every > 0, R and

z
(y) = 0. For the rest of the proof, therefore, I look at the case y < z.
(b)(i) Given y < z R and > 0, set l = log
2
(20(z y)). Then

z,,+2
l
(y) =

z
(y) for every R. PPP If

z
(y) =
z+
(y +) is non-zero, there must be k, m Z such that
2
k
(m+
1
2
) z + < 2
k
(m+ 1)
and

(2
k
(y +) (m+
1
4
))
2
=

z
(y) ,= 0,
so
2
k
m y + 2
k
(m+
9
20
)
because

is zero outside [
1
5
,
1
5
]. In this case,
1
20
2
k
< (z y), so that k l. We therefore have
2
k
(m+ 2
lk
+
1
2
) z + + 2
l
< 2
k
(m+ 2
lk
+ 1),
2
k
(m+ 2
lk
) y + + 2
l
< 2
k
(m+ 2
lk
+
1
2
),
so

z,,+2
l
(y) =

(2
k
(y + + 2
l
) (m+ 2
lk
+
1
4
))
2
=

z
(y).
Similarly,
2
k
(m2
lk
+
1
2
) z + 2
l
< 2
k
(m2
lk
+ 1),
2
k
(m2
lk
) y + 2
l
< 2
k
(m2
lk
+
1
2
),
so

z,,2
l
(y) =

(2
k
(y + 2
l
) (m2
lk
+
1
4
))
2
=

z
(y).
What this shows is that

z,,+2
l
(y) =

z
(y) if either is non-zero, so we have the equality in any case. QQQ
(ii) It follows that g(, y, z) = lim
b
1
b
_
b
0

z
(y)d is dened. PPP Set
= 2
l
_
2
l
0

z
(y)d.
From (i) we see that
= 2
l
_
2
l
(m+1)
2
l
m

z
(y)d
for every m Z, and therefore that
=
1
2
l
m
_
2
l
m
0

z
(y)d
for every m 1. Now
z
(y) is always greater than or equal to 0, so if 2
l
m b 2
l
(m+ 1) then
m
m+1
=
1
2
l
(m+1)
_
2
l
m
0

z

1
b
_
b
0

1
2
l
m
_
2
l
(m+1)
0

z
=
m+1
m
,
which approach as b . QQQ
(c) Because (, )

z
(y) is Borel measurable, each of the functions
1
n
_
n
0

z
, for n 1, is Borel
measurable (putting 251M and 252P together), and g(, y, z) : ]0, [ R is Borel measurable; at the same time,
since 0

z
(y) 1 for all and , 0 g(, y, z) 1 for every , and

z
(y) =
_
2
1
1

g(, y, z)d is dened in [0, 1].


(d) For any y < z, R and > 0, g(, y + , z + ) = g(, y, z). PPP It is enough to consider the case 0. In
this case
g(, y +, z +) = lim
b
1
b
_
b
0

z+,,
(y +)d
= lim
b
1
b
_
b
0

z++
(y + +)d
= lim
b
1
b
_
b+

z+
(y +)d
= lim
b
1
b
_
b+

z
(y)d,
so
[g(, y +, z +) g(, y, z)[ = lim
b
1
b

_
b+
b

z
(y)d
_

0

z
(y)d

lim
b
2
b
= 0. QQQ
It follows that whenever y < z and R,

z+
(y +) =
_
1
0
1

g(, y +, z +)d =
_
1
0
1

g(, y, z)d =

z
(y).
(e) The next essential fact to note is that
2z
(2y) is always equal to
z
(y). PPP If
z
(y) ,= 0, then (as in (b) above)
there are k, m Z such that
2
k
(m+
1
2
) z < 2
k
(m+ 1), 2
k
m y 2
k
(m+
1
2
),
z
(y) =

(2
k
y (m+
1
4
))
2
.
In this case,
2
k+1
(m+
1
2
) 2z < 2
k+1
(m+ 1), 2
k+1
m 2y 2
k+1
(m+
1
2
),
so

2z
(2y) =

(2
k1
2y (m+
1
4
))
2
=
z
(y).
Similarly,
2
k1
(m+
1
2
)
1
2
z < 2
k1
(m+ 1), 2
k1
m
1
2
y 2
k1
(m+
1
2
),
so
1
2
z
(
1
2
y) =

(2
k+1

1
2
y (m+
1
4
))
2
=
z
(y).
This shows that
2z
(2y) =
z
(y) if either is non-zero, and therefore in all cases. QQQ
Accordingly

z,2,2
(y) =
2z+2
(2y + 2) =
z+
(y +) =

z
(y)
for all y, z, R and all > 0.
(f ) Consequently
g(2, y, z) = lim
b
1
b
_
b
0

z,2,
(y)d = lim
b
2
b
_
b/2
0

z,2,2
(y)d
= lim
b
2
b
_
b/2
0

z
(y)d = lim
b
1
b
_
b
0

z
(y)d = g(, y, z)
whenever > 0 and y, z R. It follows that
_

g(, y, z)d =
_

g(2, y, z)d =
_
2
2
1

g(, y, z)d
whenever 0 < , and therefore that
_
2

g(, y, z)d =
_
2
1
1

g(, y, z)d
for every > 0. PPP Take k Z such that 2
k
< 2
k+1
. Then
_
2

g(, y, z)d =
_
2
k+1
2
k
1

g(, y, z)d
_

2
k
1

g(, y, z)d +
_
2
2
k+1
1

g(, y, z)d
=
_
2
k+1
2
k
1

g(, y, z)d =
_
2
1
1

g(, y, z)d. QQQ


(g) Now if , > 0 and y < z,
g(, y, z) = lim
b
1
b
_
b
0

z+
(y +)d = g(, y, z).
So if > 0 and y < z,

z
(y) =
_
2
1
1

g(, y, z)d =
_
2
1
1

g(, y, z)d
=
_
2

g(, y, z)d =
_
2
1
1

g(, y, z)d =

z
(y).
Putting this together with (d), we see that if y < z then

z
(y) =

zy
(0) =

1
(0).
(h) I have still to check that

1
(0) is not zero. But suppose that 1 <
7
6
and that there is some m Z such that
2(m+
1
12
) 2(m+
5
12
). Then 2(m+
1
2
) + < 2(m+ 1), while [
1
2
(m+
1
4
)[
1
6
, so

+
() =

(
1
2
(m+
1
4
))
2
= 1.
What this means is that, for 1 <
7
6
,
g(, 1, 0) = lim
m
1
2m
_
2m
0

+
()d
lim
m
1
2m
m1

j=0
[2(j +
1
12
), 2(j +
5
12
)] =
1
3
.
So

1
(0) =
_
2
1
1

g(, 1, 0)d
1
3
_
7/6
1
1

d > 0.
This completes the proof.
286S Lemma Suppose that f L
2
C
.
(a) For every x R,
(

Af)(x) = liminf
n
1
n
_
2
1
1

_
n
0
(D
1/
AM

f)(x)dd
is dened in [0, ], and

Af : R [0, ] is Borel measurable.
(b)
_
F

Af C
9
|f|
2

F whenever F < .
(c) If f is a rapidly decreasing test function and z R, 2[(

z
)

[

Af at every point.
proof (a) The point here is that the function
(, , x) (D
1/
AM

f)(x) : ]0, [ R
2
[0, ]
is Borel measurable. PPP
(D
1/
AM

f)(x) = (AM

f)(
x

)
= sup
zR,PQ is nite

P,zJ
r

[(M

f[

(
x

)[.
Look at the central term in this formula. For any Q, we have
(M

f[

) =
_

e
it
f(t)

(t)dt
=
1

e
it/
f(t)

(t/)dt.
Now

is a rapidly decreasing test function, so there is some 0 such that [

(t)[ /(1 + t
2
) for every t R.
This means that if > 0 and
n

nN
is a sequence in
_
1
2
,
_
and we set g(t) = sup
nN
[

(t/
n
)[ for t R, then
g(t) 4/(4 + t
2
) for every t and g is integrable. So Lebesgues Dominated Convergence Theorem tells us that if

nN
and
n

nN
,
1

n
_

e
i
n
t/
n
f(t)

(t/
n
)dt
1

e
it/
f(t)

(t/)dt.
Thus
(, ) (M

f[

) : ]0, [ R R
is continuous; and this is true for every Q.
Accordingly
(, , x)

P,zJ
r

[(M

f[

(
x

)[
is continuous for every z R and every nite P Q, and (, , x) (D
1/
AM

f)(x) is Borel measurable by


256Ma. QQQ
It follows that the repeated integrals
_
2
1
1

_
n
0
(D
1/
AM

f)(x)dd
are dened in [0, ] and are Borel measurable functions of x (252P again), so that

Af is Borel measurable.
(b) For any n N,
_
F
1
n
_
2
1
1

_
n
0
(D
1/
AM

f)(x)dddx
=
1
n
_
2
1
1

_
n
0
_
F
(D
1/
AM

f)(x)dxdd
(by Fubinis theorem, 252H)
=
1
n
_
2
1
_
n
0
_
F
1

(AM

f)(
x

)dxdd
=
1
n
_
2
1
_
n
0
_

1
F
(AM

f)(x)dxdd

1
n
_
2
1
_
n
0
C
9
|M

f|
2
_
(
1
F)dd
(286O)
= C
9

1
n
_
2
1
_
n
0
1

|f|
2

1

_
Fdd
= C
9
|f|
2
_
F
1
n
_
2
1
1

_
n
0
dd
= C
9
|f|
2
_
F ln 2 C
9
|f|
2
_
F.
So
_
F

Af =
_
F
liminf
n
1
n
_
2
1
1

_
n
0
(D
1/
AM

f)(x)dddx
liminf
n
_
F
1
n
_
2
1
1

_
n
0
(D
1/
AM

f)(x)dddx
(by Fatous lemma)
C
9
|f|
2
_
F.
(c) For any x R,
_

f(y)[
_
2
1
1

_
sup
nN
1
n
_
n
0

z
(y)d
_
ddy ln 2
_

f[
is nite. So
(

z
)

(x) =
1

2
_

e
ixy

f(y)

z
(y)dy
=
1

2
_

e
ixy

f(y)
_
2
1
1

lim
n
1
n
_
n
0

z
(y)dddy
=
1

2
lim
n
_

e
ixy

f(y)
_
2
1
1
n
_
n
0

z
(y)dddy
(by Lebesgues Dominated Convergence Theorem)
=
1

2
lim
n
_
2
1
1
n
_
n
0
_

e
ixy

f(y)

z
(y)dydd
(by Fubinis theorem)
= lim
n
_
2
1
1
n
_
n
0
(

z
)

(x)dd,
and
2[(

z
)

(x)[ = 2

lim
n
_
2
1
1
n
_
n
0
(

z
)

(x)dd

2 liminf
n
_
2
1
1
n
_
n
0
[(

z
)

(x)[dd
liminf
n
_
2
1
1
n
_
n
0
(D
1/
AM

f)(x)dd
(286Qb)
= (

Af)(x).
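286T Lemma Set $C_{10} = C_9/\pi\tilde\theta_1(0)$. For $f \in \mathcal{L}^2_{\mathbb{C}}$, define $\hat Af : \mathbb{R} \to [0, \infty]$ by setting
$$(\hat Af)(y) = \sup_{a \le b} \frac{1}{\sqrt{2\pi}} \Bigl| \int_a^b e^{-ixy} f(x)\,dx \Bigr|$$
for each $y \in \mathbb{R}$. Then $\int_F \hat Af \le C_{10} \|f\|_2 \sqrt{\mu F}$ whenever $\mu F < \infty$.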
proof (a) As usual, the first step is to confirm that $\hat Af$ is measurable. PPP For $a \le b$, $y \mapsto \bigl|\frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy} f(x)\,dx\bigr|$ is continuous (by 283Cf, since $f \times \chi[a, b]$ is integrable), so 256M gives the result. QQQ
(b) Suppose that f is a rapidly decreasing test function. Then
(

Af)(y)
1

1
(0)
(

A

f)(y)
for every y R. PPP If a R then
1

2
[
_
a

e
ixy
f(x)dx[ =
1

1
(0)

2
[
_

e
ixy

a
(x)f(x)dx[
(286R)
=
1

1
(0)
[(f

a
)

(y)[ =
1

1
(0)
[(

a
)

(y)[
(284C)

1
2

1
(0)
(

A

f)(y)
(286Sc). So if a b in R,
1

2
[
_
b
a
e
ixy
f(x)dx[
1

1
(0)
(

A

f)(y);
taking the supremum over a and b, we have the result. QQQ
It follows that
_
F

Af
1

1
(0)
_
F

f
1

1
(0)
C
9
|f|
2
_
(F)
(286Sb, 284Oa)
= C
10
|f|
2
_
F.
(c) For general square-integrable f, take any > 0 and any n N. Set
(

A
n
f)(y) = sup
nabn
1

2
[
_
b
a
e
ixy
f(x)dx[
for each y R. Let g be a rapidly decreasing test function such that |f g|
2
(284N). Then

Ag

A
n
g

A
n
f

2n

(using Cauchys inequality), so


_
F

A
n
f
_
F

Ag +
_
n

F C
10
(|f|
2
+)

F +
_
n

F.
As is arbitrary,
_
F

A
n
f C
10
|f|
2

F; letting n , we get
_
F

Af C
10
|f|
2

F.
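To see what the maximal operator of 286T measures, it can be approximated on a grid. The sketch below is a crude discretisation, not part of the argument: it replaces the integral by a midpoint rule, restricts $a$ and $b$ to grid points in a bounded window, and uses the indicator function of $[0, 1]$ as a sample $f$; the function name and parameters are hypothetical.

```python
import cmath
import math

def truncated_transform_sup(f, y, lo, hi, n=300):
    """Approximate sup over lo <= a <= b <= hi of
    |(2 pi)^(-1/2) * integral_a^b e^{-ixy} f(x) dx|, with a, b on a grid."""
    h = (hi - lo) / n
    prefix = [0j]  # prefix[j] approximates the integral over [lo, lo + j h]
    for j in range(n):
        x = lo + (j + 0.5) * h
        prefix.append(prefix[-1] + cmath.exp(-1j * x * y) * f(x) * h)
    best = 0.0
    for a in range(n + 1):
        for b in range(a, n + 1):
            best = max(best, abs(prefix[b] - prefix[a]))
    return best / math.sqrt(2.0 * math.pi)

def indicator(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

print(truncated_transform_sup(indicator, 0.0, -1.0, 2.0))
print(truncated_transform_sup(indicator, 2.0 * math.pi, -1.0, 2.0))
```

At $y = 2\pi$ the integral over the whole of $[0, 1]$ vanishes, yet the supremum over subintervals stays away from zero; the maximal operator controls every truncation at once, which is what 286U needs in order to pass to the limit $a \to -\infty$, $b \to \infty$.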
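286U Theorem If $f \in \mathcal{L}^2_{\mathbb{C}}$ then
$$g(y) = \lim_{a \to -\infty,\ b \to \infty} \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy} f(x)\,dx$$
is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}$, and $g$ represents the Fourier transform of $f$.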
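proof (a) For $n \in \mathbb{N}$, $y \in \mathbb{R}$ set
$$\gamma_n(y) = \sup_{a \le -n,\, b \ge n} \frac{1}{\sqrt{2\pi}} \Bigl|\int_a^b e^{-ixy} f(x)\,dx - \int_{-n}^n e^{-ixy} f(x)\,dx\Bigr|.$$
Then $g(y)$ is defined whenever $\inf_{n \in \mathbb{N}} \gamma_n(y) = 0$. PPP If $\inf_{n \in \mathbb{N}} \gamma_n(y) = 0$ and $\epsilon > 0$, take $m \in \mathbb{N}$ such that $\gamma_m(y) \le \frac{1}{2}\epsilon$; then
$$\frac{1}{\sqrt{2\pi}} \Bigl|\int_a^b e^{-ixy} f(x)\,dx - \int_{-n}^n e^{-ixy} f(x)\,dx\Bigr| \le \epsilon$$
whenever $n \ge m$ and $a \le -n$, $b \ge n$. But this means, first, that $\langle \int_{-n}^n e^{-ixy} f(x)\,dx \rangle_{n \in \mathbb{N}}$ is a Cauchy sequence, so has a limit $\zeta$ say, and, second, that $\zeta = \lim_{a \to -\infty,\, b \to \infty} \int_a^b e^{-ixy} f(x)\,dx$, so that $g(y) = \zeta/\sqrt{2\pi}$ is defined. QQQ
Also each $\gamma_n$ is a measurable function (cf. part (a) of the proof of 286T).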
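(b) ??? Suppose, if possible, that $\{y : \inf_{n \in \mathbb{N}} \gamma_n(y) > 0\}$ is not negligible. Then
$$\lim_{m \to \infty} \mu\{y : |y| \le m,\ \inf_{n \in \mathbb{N}} \gamma_n(y) \ge \tfrac{1}{m}\} > 0,$$
so there is an $\epsilon > 0$ such that
$$F = \{y : |y| \le \tfrac{1}{\epsilon},\ \inf_{n \in \mathbb{N}} \gamma_n(y) \ge \epsilon\}$$
has measure greater than $\epsilon$. Let $n \in \mathbb{N}$ be such that
$$4C_{10}^2 \Bigl(\int_{-\infty}^{\infty} |f(x)|^2\,dx - \int_{-n}^n |f(x)|^2\,dx\Bigr) < \epsilon^3,$$
and set $f_1 = f - f \times \chi[-n,n]$; then $2C_{10}\|f_1\|_2 < \epsilon^{3/2}$.
We have
$$\gamma_n(y) = \sup_{a \le -n,\, b \ge n} \frac{1}{\sqrt{2\pi}} \Bigl|\int_a^b e^{-ixy} f_1(x)\,dx - \int_{-n}^n e^{-ixy} f_1(x)\,dx\Bigr| \le 2 \sup_{a \le b} \frac{1}{\sqrt{2\pi}} \Bigl|\int_a^b e^{-ixy} f_1(x)\,dx\Bigr| \le 2(\hat{A}f_1)(y),$$
so that
$$\epsilon\,\mu F \le \int_F \gamma_n \le 2\int_F \hat{A}f_1 \le 2C_{10}\|f_1\|_2\sqrt{\mu F}$$
(286T)
$$< \epsilon^{3/2}\sqrt{\mu F}$$
and $\mu F < \epsilon$; but we chose $\epsilon$ so that $\mu F$ would be greater than $\epsilon$. XXX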
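(c) Thus $g(y)$ is defined for almost every $y \in \mathbb{R}$. Now $g$ represents the Fourier transform of $f$. PPP Let $h$ be a rapidly decreasing test function. Then the restriction of $\hat{A}f$ to the set on which it is finite is a tempered function, by 286D, so $\int_{-\infty}^{\infty} (\hat{A}f)(y)|h(y)|\,dy$ is finite, by 284F. Now
$$\int_{-\infty}^{\infty} g \times h = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \Bigl(\lim_{n \to \infty} \int_{-n}^n e^{-ixy} f(x)\,dx\Bigr) h(y)\,dy = \frac{1}{\sqrt{2\pi}} \lim_{n \to \infty} \int_{-\infty}^{\infty} \int_{-n}^n e^{-ixy} f(x) h(y)\,dx\,dy$$
(because $\frac{1}{\sqrt{2\pi}}\bigl|\int_{-n}^n e^{-ixy} f(x)\,dx\bigr| \le (\hat{A}f)(y)$ for every $n$ and $y$, so we can use Lebesgue's Dominated Convergence Theorem)
$$= \frac{1}{\sqrt{2\pi}} \lim_{n \to \infty} \int_{-n}^n \int_{-\infty}^{\infty} e^{-ixy} f(x) h(y)\,dy\,dx$$
(because $\int_{-\infty}^{\infty} \int_{-n}^n |f(x) h(y)|\,dx\,dy$ is finite for each $n$)
$$= \lim_{n \to \infty} \int_{-n}^n f \times \hat{h} = \int_{-\infty}^{\infty} f \times \hat{h}$$
because $f \times \hat{h}$ is certainly integrable. As $h$ is arbitrary, $g$ represents the Fourier transform of $f$. QQQ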


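286V Theorem For any square-integrable complex-valued function on $]-\pi,\pi]$, its sequence of Fourier sums converges to it almost everywhere.
proof Suppose that $f \in \mathcal{L}^2_{\mathbb{C}}(\mu_{]-\pi,\pi]})$. Set $f_1(x) = f(x)$ for $x \in \operatorname{dom} f$, $0$ for $x \in \mathbb{R} \setminus\, ]-\pi,\pi]$; then $f_1 \in \mathcal{L}^2_{\mathbb{C}}(\mu)$. Let $g \in \mathcal{L}^2_{\mathbb{C}}(\mu)$ represent the inverse Fourier transform of $f_1$ (284O). Then 286U tells us that
$$f_2(x) = \lim_{a \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-a}^a e^{-ixy} g(y)\,dy$$
is defined for almost every $x$, and that $f_2$ represents the Fourier transform of $g$, so is equal almost everywhere to $f_1$ (284Ib).
Now, for any $a \ge 0$, $x \in \mathbb{R}$,
$$\int_{-a}^a e^{-ixy} g(y)\,dy = (g|h_{ax})$$
(where $h_{ax}(y) = e^{ixy}$ if $|y| \le a$, $0$ otherwise)
$$= (f_2|\hat{h}_{ax})$$
(284Ob)
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f_2(t)\, \overline{\int_{-\infty}^{\infty} e^{-ity} h_{ax}(y)\,dy}\; dt = \frac{2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\sin (x-t)a}{x-t} f_2(t)\,dt = \frac{2}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \frac{\sin (x-t)a}{x-t} f(t)\,dt.$$
So
$$f(x) = f_2(x) = \lim_{a \to \infty} \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{\sin (x-t)a}{x-t} f(t)\,dt$$
for almost every $x \in\, ]-\pi,\pi]$.
On the other hand, writing $\langle s_n \rangle_{n \in \mathbb{N}}$ for the sequence of Fourier sums of $f$, we have, for any $x \in\, ]-\pi,\pi[$,
$$s_n(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin (n+\frac{1}{2})(x-t)}{\sin \frac{1}{2}(x-t)}\,dt$$
for each $n$, by 282Da. Now
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin (n+\frac{1}{2})(x-t)}{\sin \frac{1}{2}(x-t)}\,dt - \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin (n+\frac{1}{2})(x-t)}{x-t}\,dt$$
$$= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \Bigl(\frac{\sin (n+\frac{1}{2})(x-t)}{2 \sin \frac{1}{2}(x-t)} - \frac{\sin (n+\frac{1}{2})(x-t)}{x-t}\Bigr) dt$$
$$= \frac{1}{\pi} \int_{x-\pi}^{x+\pi} f(x-t) \sin (n+\tfrac{1}{2})t\, \Bigl(\frac{1}{2 \sin \frac{1}{2}t} - \frac{1}{t}\Bigr) dt.$$
But if we look at the function
$$p_x(t) = \begin{cases} f(x-t)\Bigl(\dfrac{1}{2 \sin \frac{1}{2}t} - \dfrac{1}{t}\Bigr) & \text{if } x-\pi < t < x+\pi \text{ and } t \ne 0, \\ 0 & \text{otherwise,} \end{cases}$$
$p_x$ is integrable, because $f$ is integrable over $]-\pi,\pi]$ and $\lim_{t \to 0} \bigl(\frac{1}{2 \sin \frac{1}{2}t} - \frac{1}{t}\bigr) = 0$, so $\sup_{t \ne 0,\, x-\pi \le t \le x+\pi} \bigl|\frac{1}{2 \sin \frac{1}{2}t} - \frac{1}{t}\bigr|$ is finite. (This is where we need to know that $|x| < \pi$.) So
$$\lim_{n \to \infty} \Bigl(s_n(x) - \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\, \frac{\sin (n+\frac{1}{2})(x-t)}{x-t}\,dt\Bigr) = \lim_{n \to \infty} \frac{1}{\pi} \int_{-\infty}^{\infty} p_x(t) \sin (n+\tfrac{1}{2})t\,dt = 0$$
by the Riemann-Lebesgue lemma (282Fb). But this means that $\lim_{n \to \infty} s_n(x) = f(x)$ for any $x \in\, ]-\pi,\pi[$ such that $f(x) = \lim_{a \to \infty} \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{\sin (x-t)a}{x-t} f(t)\,dt$, which is almost every $x \in\, ]-\pi,\pi]$.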
286W Glossary The following special notations are used in more than one paragraph of this section:
for Lebesgue measure on R. 286G: C
2
, C
3
, C
4
. 286O: Af.
286A: f

. 286H: mass, energy. 286P:


z
(y).
286C: S

f, M

f, D

f. 286I: C
5
. 286Q:

z
(y).
286Ea: J, Q, I

, J

, k

, x

, y

, J
l

, J
r

, y
l

. 286K: C
6
. 286R:

z
(y).
286Eb: ,

, (f[g). 286L: C
7
. 286S:

Af.
286Ec: w, w

, C
1
. 286M: C
8
. 286T: C
10
,

Af.
286F: ,
r
, . 286N: C
9
.
286X Basic exercises (a) Use 284Oa and 284Xf to shorten part (c) of the proof of 286U.
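(b) Show that if $\langle c_k \rangle_{k \in \mathbb{N}}$ is a sequence of complex numbers such that $\sum_{k=0}^{\infty} |c_k|^2$ is finite, then $\sum_{k=0}^{\infty} c_k e^{ikx}$ is defined in $\mathbb{C}$ for almost all $x \in \mathbb{R}$.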
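286Y Further exercises (a) Show that if $f$ is a square-integrable function on $\mathbb{R}^r$, where $r \ge 2$, then
$$g(y) = \frac{1}{(\sqrt{2\pi})^r} \lim_{\alpha_1,\ldots,\alpha_r \to -\infty,\ \beta_1,\ldots,\beta_r \to \infty} \int_a^b e^{-iy \cdot x} f(x)\,dx$$
(where $a = (\alpha_1,\ldots,\alpha_r)$ and $b = (\beta_1,\ldots,\beta_r)$) is defined in $\mathbb{C}$ for almost every $y \in \mathbb{R}^r$, and that $g$ represents the Fourier transform of $f$.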
286 Notes and comments This is not the longest single section in this treatise as a whole, but it is by a substantial margin the longest in the present volume, and thirty pages of sub-superscripts must tax the endurance of the most enthusiastic. You will easily understand why Carleson's theorem is not usually presented at this level. But I am trying in this book to present complete proofs of the principal theorems, there is no natural place for Carleson's theorem in later volumes as at present conceived, and it is (just) accessible at this point; so I take the space to do it here.
The proof here divides naturally into two halves: the combinatorial part in 286E-286M, up to the Lacey-Thiele lemma, followed by the analytic part in 286N-286V, in which the averaging process
$$\int_1^2 \frac{1}{\alpha} \lim_{b \to \infty} \frac{1}{b} \int_0^b \ldots\, d\beta\, d\alpha$$
is used to transform the geometrically coherent, but analytically irregular, functions $\theta_z$ into the characteristic functions $\frac{1}{\tilde\theta_1(0)}\tilde\theta_z$. From the standpoint of ordinary Fourier analysis, this second part is essentially routine; there are many paths we could follow, and we have only to take the ordinary precautions against illegitimate operations.
Carleson (Carleson 66) stated his theorem in the Fourier-series form of 286V; but it had long been understood that this was equiveridical with the Fourier-transform version in 286U. There are of course many ways of extending the theorem. In particular, there are corresponding results for functions in $\mathcal{L}^p$ for any $p > 1$, and even for functions $f$ such that $f \times \ln(1 + |f|) \times \ln\ln\ln(16 + |f|)$ is integrable (Antonov 96). The methods here do not seem to reach so far. I ought also to remark that if we define $\hat{A}f$ as in 286T, then there is for every $p > 1$ a constant $C$ such that $\|\hat{A}f\|_p \le C\|f\|_p$ for every $f \in \mathcal{L}^p_{\mathbb{C}}$ (Hunt 67, Mozzochi 71, Jørsboe & Mejlbro 82, Arias de Reyna 02).
Note that the point of Carleson's theorem, in either form, is that we take special limits. In the formulae
$$\hat{f}(y) = \frac{1}{\sqrt{2\pi}} \lim_{a \to \infty} \int_{-a}^a e^{-ixy} f(x)\,dx, \qquad f(x) = \lim_{n \to \infty} \sum_{k=-n}^n c_k e^{ikx},$$
valid almost everywhere for square-integrable functions $f$, we are not taking the ordinary integral $\int_{-\infty}^{\infty} e^{-ixy} f(x)\,dx$ or the unconditional sum $\sum_{k \in \mathbb{Z}} c_k e^{ikx}$. If $f$ is not integrable, or $\sum_{k=-\infty}^{\infty} |c_k|$ is infinite, these will not be defined at even one point. Carleson's theorem makes sense only because we have a natural preference for particular kinds of improper integral and conditional sum. So when we return, in Chapter 44 of Volume 4, to Fourier analysis on general topological groups, there will simply be no language in which to express the theorem, and while versions have been proved for other groups (e.g., Schipp 78), they necessarily depend on some structure beyond the simple notion of 'locally compact Hausdorff abelian topological group'. Even in $\mathbb{R}^2$, I understand that it is still unknown whether
$$\lim_{a \to \infty} \frac{1}{2\pi} \int_{B(0,a)} e^{-iy \cdot x} f(x)\,dx$$
will be defined a.e. for any square-integrable function $f$, if we use ordinary Euclidean balls $B(0,a)$ in place of the rectangles in 286Ya.
Appendix to Volume 2
Useful Facts
In the course of writing this volume, I have found that a considerable number of concepts and facts from various
branches of mathematics are necessary to us. Nearly all of them are embedded in important and well-established
theories for which many excellent textbooks are available and which I very much hope that you will one day study in
depth. Nevertheless, I am reluctant to send you off immediately to courses in general topology, functional analysis and
set theory, as if these were essential prerequisites for our work here, along with real analysis and basic linear algebra.
For this reason I have written this Appendix, setting out those results which we actually need at some point in this
volume. The great majority of them really are elementary; indeed, some are so elementary that they are not always
spelt out in detail in orthodox treatments of their subjects.
While I do not put this book forward as the proper place to learn any of these topics, I have tried to set them out
in a way that you will find easy to integrate into regular approaches. I do not expect anybody to read systematically
through this work, and I hope that the references given in the main chapters of this volume will be adequate to guide
you to the particular items you need.
2A1 Set theory
Especially for the examples in Chapter 21, we need some non-trivial set theory, which is best approached through the
standard theory of cardinals and ordinals; and elsewhere in this volume I make use of Zorn's Lemma. Here I give a very
brief outline of the results involved, largely omitting proofs. Most of this material should be in any sound introduction
to set theory. The references I give are to books which happen to have come my way and which I can recommend as
reasonably suitable for beginners.
I do not discuss axiom systems or logical foundations. The set theory I employ is naive in the sense that I rely on my
understanding of the collective experience of the last ninety years, rather than on any attempt at formal description,
to distinguish legitimate from unsafe arguments. There are, however, points in Volume 5 at which such a relaxed
philosophy becomes inappropriate, and I therefore use arguments which can, I believe, be translated into standard
Zermelo-Fraenkel set theory without new ideas being invoked.
Although in this volume I use the axiom of choice without scruple whenever appropriate, I will divide this section
into two parts, starting with ideas and results not dependent on the axiom of choice (2A1A-2A1I) and continuing with
the remainder (2A1J-2A1P). I believe that even at this level it helps us to understand the nature of the arguments
better if we maintain a degree of separation.
2A1A Ordered sets (a) Recall that a partially ordered set is a set $P$ together with a relation $\le$ on $P$ such that
if $p \le q$ and $q \le r$ then $p \le r$,
$p \le p$ for every $p \in P$,
if $p \le q$ and $q \le p$ then $p = q$.
In this context, I will write $p \ge q$ to mean $q \le p$, and $p < q$ or $q > p$ to mean '$p \le q$ and $p \ne q$'. $\le$ is a partial order on $P$.
(b) Let $(P, \le)$ be a partially ordered set, and $A \subseteq P$. A maximal element of $A$ is a $p \in A$ such that $p \not< a$ for any $a \in A$. Note that $A$ may have more than one maximal element. An upper bound for $A$ is a $p \in P$ such that $a \le p$ for every $a \in A$; a supremum or least upper bound is an upper bound $p$ such that $p \le q$ for every upper bound $q$ of $A$. There can be at most one such, because if $p$, $p'$ are both least upper bounds then $p \le p'$ and $p' \le p$. Accordingly we may safely write $p = \sup A$ if $p$ is the least upper bound of $A$.
Similarly, a minimal element of $A$ is a $p \in A$ such that $p \not> a$ for every $a \in A$; a lower bound of $A$ is a $p \in P$ such that $p \le a$ for every $a \in A$; and $\inf A = a$ means that
for every $q \in P$, $a \ge q \iff p \ge q$ for every $p \in A$.
A subset $A$ of $P$ is order-bounded if it has both an upper bound and a lower bound.
A subset $A$ of $P$ is upwards-directed if for any $p$, $p' \in A$ there is a $q \in A$ such that $p \le q$ and $p' \le q$; that is, if any non-empty finite subset of $A$ has an upper bound in $A$. Similarly, $A \subseteq P$ is downwards-directed if for any $p$, $p' \in A$ there is a $q \in A$ such that $q \le p$ and $q \le p'$; that is, if any non-empty finite subset of $A$ has a lower bound in $A$.
It is sometimes convenient to adapt the notation for closed intervals to arbitrary partially ordered sets: $[p, q]$ will be $\{r : p \le r \le q\}$.
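To see the difference between maximal elements, upper bounds and suprema in a concrete case, here is a small sketch on a finite partially ordered set. The choice of example (the integers 1 to 12 ordered by divisibility) and all function names are illustrative assumptions, not part of the text.

```python
def divides(p, q):
    """The partial order used in this example: p <= q iff p divides q."""
    return q % p == 0


def maximal_elements(A, leq):
    """Elements p of A with no a in A strictly above p."""
    return {p for p in A if not any(leq(p, a) and p != a for a in A)}


def upper_bounds(A, P, leq):
    """Elements p of P with a <= p for every a in A."""
    return {p for p in P if all(leq(a, p) for a in A)}


def supremum(A, P, leq):
    """The least upper bound of A in P, or None if there is none."""
    ubs = upper_bounds(A, P, leq)
    least = [p for p in ubs if all(leq(p, q) for q in ubs)]
    return least[0] if least else None


P = set(range(1, 13))

if __name__ == "__main__":
    print(maximal_elements({2, 3, 4}, divides))  # two maximal elements
    print(supremum({4, 6}, P, divides))
    print(supremum({5, 7}, P, divides))          # no upper bound inside P
```

In this example the set $\{2, 3, 4\}$ has two maximal elements but no greatest element, which is the situation the remark 'A may have more than one maximal element' is warning about.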
(c) A totally ordered set is a partially ordered set $(P, \le)$ such that
for any $p$, $q \in P$, either $p \le q$ or $q \le p$.
$\le$ is then a total or linear order on $P$. In any totally ordered set we have a median function; for $p$, $q$, $r \in P$ set
$$\operatorname{med}(p, q, r) = \max(\min(p, q), \min(p, r), \min(q, r)) = \min(\max(p, q), \max(p, r), \max(q, r)),$$
so that $\operatorname{med}(p, q, r) = q$ if $p \le q \le r$.
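The two formulas for the median can be checked mechanically on any finite totally ordered set. The following is a minimal sketch (function names are illustrative) using Python's built-in ordering on integers and strings:

```python
from itertools import product


def med(p, q, r):
    """Median as the maximum of the three pairwise minima."""
    return max(min(p, q), min(p, r), min(q, r))


def med_dual(p, q, r):
    """Median as the minimum of the three pairwise maxima."""
    return min(max(p, q), max(p, r), max(q, r))


def formulas_agree(values):
    """Check that both formulas coincide on every triple from `values`."""
    values = list(values)
    return all(med(p, q, r) == med_dual(p, q, r)
               for p, q, r in product(values, repeat=3))


if __name__ == "__main__":
    print(med(3, 1, 2), med_dual(3, 1, 2))
    print(formulas_agree(range(6)))
```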
(d) A lattice is a partially ordered set $(P, \le)$ such that
for any $p$, $q \in P$, $p \vee q = \sup\{p, q\}$ and $p \wedge q = \inf\{p, q\}$ are defined in $P$.
(e) A well-ordered set is a totally ordered set $(P, \le)$ such that $\inf A$ exists and belongs to $A$ for every non-empty set $A \subseteq P$; that is, every non-empty subset of $P$ has a least element. In this case $\le$ is a well-ordering of $P$.
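2A1B Transfinite Recursion: Theorem Let $(P, \le)$ be a well-ordered set and $X$ any class. For $p \in P$ write $L_p$ for the set $\{q : q \in P,\ q < p\}$ and $X^{L_p}$ for the class of all functions from $L_p$ to $X$. Let $F : \bigcup_{p \in P} X^{L_p} \to X$ be any function. Then there is a unique function $f : P \to X$ such that $f(p) = F(f \restriction L_p)$ for every $p \in P$.
proof There are versions of this result in Enderton 77 (p. 175) and Halmos 60 (§18). Nevertheless I write out a proof, since it seems to me that most elementary books on set theory do not give it its proper place at the very beginning of the theory of well-ordered sets.
(a) Let $\Phi$ be the class of all functions $\phi$ such that
($\alpha$) $\operatorname{dom}\phi$ is a subset of $P$, and $L_p \subseteq \operatorname{dom}\phi$ for every $p \in \operatorname{dom}\phi$;
($\beta$) $\phi(p) \in X$ for every $p \in \operatorname{dom}\phi$, and $\phi(p) = F(\phi \restriction L_p)$ for every $p \in \operatorname{dom}\phi$.
(b) If $\phi$, $\psi \in \Phi$ then $\phi$ and $\psi$ agree on $\operatorname{dom}\phi \cap \operatorname{dom}\psi$. PPP??? If not, then $A = \{q : q \in \operatorname{dom}\phi \cap \operatorname{dom}\psi,\ \phi(q) \ne \psi(q)\}$ is non-empty. Because $P$ is well-ordered, $A$ has a least element $p$ say. Now $L_p \subseteq \operatorname{dom}\phi \cap \operatorname{dom}\psi$ and $L_p \cap A = \emptyset$, so
$$\phi(p) = F(\phi \restriction L_p) = F(\psi \restriction L_p) = \psi(p),$$
which is impossible. XXXQQQ
(c) It follows that $\Phi$ is a set, since the function $\phi \mapsto \operatorname{dom}\phi$ is an injective function from $\Phi$ to $\mathcal{P}P$, and its inverse is a surjection from a subset of $\mathcal{P}P$ onto $\Phi$. We can therefore, without inhibitions, define a function $f$ by writing
$$\operatorname{dom} f = \bigcup_{\phi \in \Phi} \operatorname{dom}\phi, \quad f(p) = \phi(p) \text{ whenever } \phi \in \Phi,\ p \in \operatorname{dom}\phi.$$
(If you think that a function $\phi$ is just the set of ordered pairs $\{(p, \phi(p)) : p \in \operatorname{dom}\phi\}$, then $f$ becomes $\bigcup \Phi$.) Then $f \in \Phi$. PPP Of course $f$ is a function from a subset of $P$ to $X$. If $p \in \operatorname{dom} f$, then there is a $\phi \in \Phi$ such that $p \in \operatorname{dom}\phi$, in which case
$$L_p \subseteq \operatorname{dom}\phi \subseteq \operatorname{dom} f, \quad f(p) = \phi(p) = F(\phi \restriction L_p) = F(f \restriction L_p). \text{ QQQ}$$
(d) $f$ is defined everywhere in $P$. PPP??? Otherwise, $P \setminus \operatorname{dom} f$ is non-empty and has a least element $r$ say. Now $L_r \subseteq \operatorname{dom} f$. Define a function $\psi$ by saying that $\operatorname{dom}\psi = \{r\} \cup \operatorname{dom} f$, $\psi(p) = f(p)$ for $p \in \operatorname{dom} f$ and $\psi(r) = F(f \restriction L_r)$. Then $\psi \in \Phi$, because if $p \in \operatorname{dom}\psi$
either $p \in \operatorname{dom} f$ so $L_p \subseteq \operatorname{dom} f \subseteq \operatorname{dom}\psi$ and
$$\psi(p) = f(p) = F(f \restriction L_p) = F(\psi \restriction L_p)$$
or $p = r$ so $L_p = L_r \subseteq \operatorname{dom} f \subseteq \operatorname{dom}\psi$ and
$$\psi(p) = F(f \restriction L_r) = F(\psi \restriction L_r).$$
Accordingly $\psi \in \Phi$ and $r \in \operatorname{dom}\psi \subseteq \operatorname{dom} f$. XXXQQQ
(e) Thus $f : P \to X$ is a function such that $f(p) = F(f \restriction L_p)$ for every $p$. To see that $f$ is unique, observe that any function of this type must belong to $\Phi$, so must agree with $f$ on their common domain, which is the whole of $P$.
Remark If you have been taught to distinguish between the words 'set' and 'class', you will observe that my naive set theory is a relatively tolerant one in that it is willing to allow class variables in its theorems.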
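For a finite well-ordered set the theorem reduces to an ordinary loop, which may help to fix the shape of the statement $f(p) = F(f \restriction L_p)$. The sketch below assumes $P$ is given as a Python list in increasing order and represents $f \restriction L_p$ as a dictionary; the function names are illustrative, and of course the substance of the theorem lies in the infinite case, which no loop can reach.

```python
def recurse(P, F):
    """Build f on the finite well-ordered set P (listed in increasing order)
    so that f(p) = F(f restricted to L_p) for every p in P."""
    f = {}
    for p in P:
        # At this point f is defined exactly on L_p = {q : q < p}.
        f[p] = F(dict(f))
    return f


def satisfies_recursion(f, P, F):
    """Check the defining property f(p) = F(f | L_p) directly."""
    return all(f[p] == F({q: f[q] for q in P[:i]}) for i, p in enumerate(P))


def one_plus_sum(g):
    """Example rule: one more than the sum of all earlier values."""
    return 1 + sum(g.values())


if __name__ == "__main__":
    P = [0, 1, 2, 3, 4]
    f = recurse(P, one_plus_sum)
    print(f)
    print(satisfies_recursion(f, P, one_plus_sum))
```

With the rule `one_plus_sum` the recursion produces the powers of two, since each value is one more than the sum of its predecessors.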
2A1C Ordinals An ordinal (sometimes called a 'von Neumann ordinal') is a set $\xi$ such that
if $\eta \in \xi$ then $\eta$ is a set and $\eta \notin \eta$,
if $\eta \in \zeta \in \xi$ then $\eta \in \xi$,
writing '$\eta \le \zeta$' to mean '$\eta \in \zeta$ or $\eta = \zeta$', $(\xi, \le)$ is well-ordered
(Enderton 77, p. 191; Halmos 60, §19; Henle 86, p. 27; Krivine 71, p. 24; Roitman 90, 3.2.8. Of course many set theories do not allow sets to belong to themselves, and/or take it for granted that every object of discussion is a set, but I prefer not to take a view on such points in general.)
2A1D Basic facts about ordinals (a) If $\xi$ is an ordinal, then every member of $\xi$ is an ordinal. (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Roitman 90, 3.2.10.)
(b) If $\xi$, $\eta$ are ordinals then either $\xi \in \eta$ or $\xi = \eta$ or $\eta \in \xi$ (and no two of these can occur together). (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Lipschutz 64, 11.12; Roitman 90, 3.2.13.) It is customary, in this case, to write $\eta < \xi$ if $\eta \in \xi$ and $\eta \le \xi$ if either $\eta \in \xi$ or $\eta = \xi$. Note that $\eta \le \xi$ iff $\eta \subseteq \xi$.
(c) If $A$ is any non-empty class of ordinals, then there is an $\alpha \in A$ such that $\alpha \le \xi$ for every $\xi \in A$. (Henle 86, 6.7; Krivine 71, p. 15.)
(d) If $\xi$ is an ordinal, so is $\xi \cup \{\xi\}$; call it '$\xi + 1$'. If $\xi < \eta$ then $\xi + 1 \le \eta$; $\xi + 1$ is the least ordinal greater than $\xi$. (Enderton 77, p. 193; Henle 86, 6.3; Krivine 71, p. 15.) For any ordinal $\xi$, either there is a greatest ordinal $\eta < \xi$, in which case $\xi = \eta + 1$ and we call $\xi$ a successor ordinal, or $\xi = \bigcup \xi$, in which case we call $\xi$ a limit ordinal.
(e) The first few ordinals are $0 = \emptyset$, $1 = 0 + 1 = \{0\} = \{\emptyset\}$, $2 = 1 + 1 = \{0, 1\} = \{\emptyset, \{\emptyset\}\}$, $3 = 2 + 1 = \{0, 1, 2\}$, $\ldots$. The first infinite ordinal is $\omega = \{0, 1, 2, \ldots\}$, which may be identified with $\mathbb{N}$.
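The finite ordinals in (e) can be built literally as nested sets. The sketch below uses Python's `frozenset` as a stand-in for hereditarily finite sets; the function names are illustrative. It also checks, for small cases, the facts from (a), (b) and (d) that members of ordinals are ordinals and that $\eta < \xi$ corresponds both to membership and to proper inclusion.

```python
def successor(xi):
    """xi + 1 = xi ∪ {xi}."""
    return xi | frozenset([xi])


def ordinal(n):
    """The finite von Neumann ordinal n, built from the empty set."""
    xi = frozenset()
    for _ in range(n):
        xi = successor(xi)
    return xi


if __name__ == "__main__":
    three = ordinal(3)
    print(len(three))                      # the ordinal n has exactly n members
    print(ordinal(2) in three)             # 2 < 3 as membership
    print(ordinal(2) < three)              # 2 < 3 as proper inclusion
    print(set(three) == {ordinal(0), ordinal(1), ordinal(2)})
```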
(f) The union of any set of ordinals is an ordinal. (Enderton 77, p. 193; Henle 86, 6.8; Krivine 71, p. 15; Roitman 90, 3.2.19.)
(g) If $(P, \le)$ is any well-ordered set, there is a unique ordinal $\xi$ such that $P$ is order-isomorphic to $\xi$, and the order-isomorphism is unique. (Enderton 77, pp. 187-189; Henle 86, 6.13; Halmos 60, §20.)
2A1E Initial ordinals An initial ordinal is an ordinal $\kappa$ such that there is no bijection between $\kappa$ and any member of $\kappa$. (Enderton 77, p. 197; Halmos 60, §25; Henle 86, p. 34; Krivine 71, p. 24; Roitman 90, 5.1.10, p. 79.)
2A1F Basic facts about initial ordinals (a) All finite ordinals, and the first infinite ordinal $\omega$, are initial ordinals.
(b) For every well-ordered set $P$ there is a unique initial ordinal $\kappa$ such that there is a bijection between $P$ and $\kappa$.
(c) For every ordinal $\xi$ there is a least initial ordinal greater than $\xi$. (Enderton 77, p. 195; Henle 86, 7.2.1.) If $\kappa$ is an initial ordinal, write $\kappa^+$ for the least initial ordinal greater than $\kappa$. We write $\omega_1$ for $\omega^+$, $\omega_2$ for $\omega_1^+$, and so on.
(d) For any initial ordinal $\kappa \ge \omega$ there is a bijection between $\kappa \times \kappa$ and $\kappa$; consequently there are bijections between $\kappa$ and $\kappa^r$ for every $r \ge 1$.
2A1G Schröder-Bernstein theorem I remind you of the following fundamental result: if $X$ and $Y$ are sets and there are injections $f : X \to Y$, $g : Y \to X$ then there is a bijection $h : X \to Y$. (Enderton 77, p. 147; Halmos 60, §22; Henle 86, 7.4; Lipschutz 64, p. 145; Roitman 90, 5.1.2. It is also a special case of 344D in Volume 3.)
2A1H Countable subsets of $\mathcal{P}\mathbb{N}$ The following results will be needed below.
(a) There is a bijection between $\mathcal{P}\mathbb{N}$ and $\mathbb{R}$. (Enderton 77, p. 149; Lipschutz 64, p. 146.)
(b) Suppose that $X$ is any set such that there is an injection from $X$ into $\mathcal{P}\mathbb{N}$. Let $\mathcal{C}$ be the set of countable subsets of $X$. Then there is a surjection from $\mathcal{P}\mathbb{N}$ onto $\mathcal{C}$. PPP Let $f : X \to \mathcal{P}\mathbb{N}$ be an injection. Set $f_1(x) = \{0\} \cup \{i + 1 : i \in f(x)\}$; then $f_1 : X \to \mathcal{P}\mathbb{N}$ is injective and $f_1(x) \ne \emptyset$ for every $x \in X$. Define $g : \mathcal{P}\mathbb{N} \to \mathcal{P}X$ by setting
$$g(A) = \{x : \exists\, n \in \mathbb{N},\ f_1(x) = \{i : 2^n(2i + 1) \in A\}\}$$
for each $A \subseteq \mathbb{N}$. Then $g(A)$ is countable, since we have an injection
$$x \mapsto \min\{n : f_1(x) = \{i : 2^n(2i + 1) \in A\}\}$$
from $g(A)$ to $\mathbb{N}$. Thus $g$ is a function from $\mathcal{P}\mathbb{N}$ to $\mathcal{C}$. To see that $g$ is surjective, observe that $\emptyset = g(\emptyset)$, while if $C \subseteq X$ is countable and not empty there is a surjection $h : \mathbb{N} \to C$; now set
$$A = \{2^n(2i + 1) : n \in \mathbb{N},\ i \in f_1(h(n))\},$$
and see that $g(A) = C$. QQQ
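The coding in (b) packs a sequence of subsets of $\mathbb{N}$ into a single subset by using the fact that every positive integer is uniquely $2^n(2i+1)$. Here is a finite sketch of it. The dictionary `f` standing for the injection, the finite enumeration standing for the surjection $h$, and all function names are illustrative assumptions; in the finite setting the rows beyond the enumeration are empty and match nothing, which is exactly why $f_1$ adds the element $0$.

```python
def f1(fx):
    """f_1(x) = {0} ∪ {i + 1 : i in f(x)}, always non-empty."""
    return frozenset({0} | {i + 1 for i in fx})


def row(A, n):
    """The n-th coded set {i : 2^n (2i + 1) in A}."""
    out = set()
    for a in A:
        q, r = divmod(a, 2 ** n)
        if r == 0 and q % 2 == 1:
            out.add((q - 1) // 2)
    return frozenset(out)


def g(A, f):
    """{x : f_1(x) equals some row of A}; rows past the largest element are empty."""
    bound = max(A).bit_length() if A else 0
    return {x for x in f if any(f1(f[x]) == row(A, n) for n in range(bound))}


def encode(enumeration, f):
    """A = {2^n (2i + 1) : i in f_1(h(n))} for a finite enumeration h."""
    return {2 ** n * (2 * i + 1)
            for n, x in enumerate(enumeration) for i in f1(f[x])}


if __name__ == "__main__":
    f = {"a": {0, 2}, "b": set(), "c": {1}}   # an injection into finite subsets of N
    A = encode(["a", "b"], f)
    print(sorted(A), g(A, f))
```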
(c) Again suppose that $X$ is a set such that there is an injection from $X$ to $\mathcal{P}\mathbb{N}$, and write $H$ for the set of functions $h$ such that $\operatorname{dom} h$ is a countable subset of $X$ and $h$ takes values in $\{0, 1\}$. Then there is a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. PPP Let $\mathcal{C}$ be the set of countable subsets of $X$ and let $g : \mathcal{P}\mathbb{N} \to \mathcal{C}$ be a surjection, as in (b). For $A \subseteq \mathbb{N}$ set
$$g_0(A) = g(\{i : 2i \in A\}), \quad g_1(A) = g(\{i : 2i + 1 \in A\}),$$
so that $g_0(A)$, $g_1(A)$ are countable subsets of $X$, and $A \mapsto (g_0(A), g_1(A))$ is a surjection from $\mathcal{P}\mathbb{N}$ onto $\mathcal{C} \times \mathcal{C}$. Let $h_A$ be the function with domain $g_0(A) \cup g_1(A)$ such that $h_A(x) = 1$ if $x \in g_1(A)$, $0$ if $x \in g_0(A) \setminus g_1(A)$. Then $A \mapsto h_A$ is a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. QQQ
2A1I Filters I pause for a moment to discuss a construction which is of great value in investigating topological
spaces, but has other uses, and in its nature belongs to elementary set theory (much more elementary, indeed, than
the work above).
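(a) Let $X$ be a non-empty set. A filter on $X$ is a family $\mathcal{F}$ of subsets of $X$ such that
$X \in \mathcal{F}$, $\emptyset \notin \mathcal{F}$,
$E \cap F \in \mathcal{F}$ whenever $E$, $F \in \mathcal{F}$,
$E \in \mathcal{F}$ whenever $X \supseteq E \supseteq F \in \mathcal{F}$.
The second condition implies (inducing on $n$) that $F_0 \cap \ldots \cap F_n \in \mathcal{F}$ whenever $F_0, \ldots, F_n \in \mathcal{F}$.
(b) Let $X$, $Y$ be non-empty sets, $\mathcal{F}$ a filter on $X$ and $f : D \to Y$ a function, where $D \in \mathcal{F}$. Then
$$\{E : E \subseteq Y,\ f^{-1}[E] \in \mathcal{F}\}$$
is a filter on $Y$ (because $f^{-1}[Y] = D$, $f^{-1}[\emptyset] = \emptyset$, $f^{-1}[E \cap F] = f^{-1}[E] \cap f^{-1}[F]$, $X \supseteq f^{-1}[E] \supseteq f^{-1}[F]$ whenever $Y \supseteq E \supseteq F$); I will call it $f[[\mathcal{F}]]$, the image filter of $\mathcal{F}$ under $f$.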
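On a finite set the three filter axioms and the image-filter construction can be checked by brute force. The following sketch does this; the representation of a filter as a Python set of `frozenset`s, the representation of $f$ as a dictionary whose key set is $D$, and the function names are all illustrative assumptions.

```python
from itertools import chain, combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))]


def is_filter(F, X):
    """Check the three conditions of the definition on the finite set X."""
    X = frozenset(X)
    if X not in F or frozenset() in F:
        return False
    if any((E & G) not in F for E in F for G in F):
        return False
    return all(E in F for G in F for E in powerset(X) if G <= E)


def image_filter(F, f, Y):
    """{E ⊆ Y : f^{-1}[E] in F}, for f a dict whose domain D belongs to F."""
    return {E for E in powerset(Y)
            if frozenset(x for x in f if f[x] in E) in F}


if __name__ == "__main__":
    X = {0, 1, 2, 3}
    F = {E for E in powerset(X) if {0, 1} <= E}   # all supersets of {0, 1}
    f = {0: "a", 1: "a", 2: "b"}                  # domain D = {0, 1, 2} is in F
    Y = {"a", "b", "c"}
    G = image_filter(F, f, Y)
    print(is_filter(F, X), is_filter(G, Y), sorted(sorted(E) for E in G))
```

Note that the function takes $Y$ as an explicit argument: the image filter depends on which set it is to be a filter on, and this cannot be read off from $f$ and $\mathcal{F}$ alone.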
Remark Of course there is a hidden variable in this notation. Ordinarily in this book I regard a function $f$ as being defined by its domain $\operatorname{dom} f$ and its values on its domain; that is, it is determined by its graph $\{(x, f(x)) : x \in \operatorname{dom} f\}$, and indeed I normally do not distinguish between a function and its graph. This means that when I write '$f : D \to Y$ is a function' then the class $D = \operatorname{dom} f$ can be recovered from the function, but the class $Y$ cannot; all I promise is that $Y$ includes the class $f[D]$ of values of $f$. Now in the notation $f[[\mathcal{F}]]$ above we do actually need to know which set $Y$ it is to be a filter on, even though this cannot be discovered from knowledge of $f$ and $\mathcal{F}$. So you will always have to infer it from the context.
2A1J The Axiom of Choice I come now to the second half of this section, in which I discuss concepts and theorems dependent on the Axiom of Choice. Let me remind you of the statement of this axiom:
(AC) 'whenever $I$ is a set and $\langle X_i \rangle_{i \in I}$ is a family of non-empty sets indexed by $I$, there is a function $f$, with domain $I$, such that $f(i) \in X_i$ for every $i \in I$'.
The function $f$ is a choice function; it picks out one member of each of the given family of non-empty sets $X_i$.
I believe that one's attitude to this principle is a matter for individual choice. It is an indispensable foundation for very large parts of twentieth-century pure mathematics, including a substantial fraction of the present volume; but there are also significant areas in which principles actually contradictory to it can be employed to striking effect, leading in my view to equally valid mathematics. (I will describe one of these in §567 of Volume 5.) At present it is the case that more current mathematical activity, by volume, depends on asserting the axiom of choice than on all its rivals put together; but it is a matter of judgement and taste where the most important, or exciting, ideas are to be found. For the present volume I follow standard practice in twentieth-century abstract analysis, using the axiom of choice whenever necessary.
2A1K Zermelo's Well-Ordering Theorem (a) The Axiom of Choice is equiveridical with each of the statements
'for every set $X$ there is a well-ordering of $X$',
'for every set $X$ there is a bijection between $X$ and some ordinal',
'for every set $X$ there is a unique initial ordinal $\kappa$ such that there is a bijection between $X$ and $\kappa$'.
(Enderton 77, p. 196 et seq.; Halmos 60, §17; Henle 86, 9.1-9.3; Krivine 71, p. 20; Lipschutz 64, 12.1; Roitman 90, 3.6.38.)
(b) When assuming the axiom of choice, as I do nearly everywhere in this treatise, I write $\#(X)$ for that initial ordinal $\kappa$ such that there is a bijection between $\kappa$ and $X$; I call this the cardinal of $X$.
2A1L Fundamental consequences of the Axiom of Choice (a) For any two sets $X$ and $Y$, there is a bijection between $X$ and $Y$ iff $\#(X) = \#(Y)$. More generally, there is an injection from $X$ to $Y$ iff $\#(X) \le \#(Y)$, and a surjection from $X$ onto $Y$ iff either $\#(X) \ge \#(Y) > 0$ or $\#(X) = \#(Y) = 0$.
(b) In particular, $\#(\mathcal{P}\mathbb{N}) = \#(\mathbb{R})$; write $\mathfrak{c}$ for this common value, the cardinal of the continuum. Cantor's theorem that $\mathcal{P}\mathbb{N}$ and $\mathbb{R}$ are uncountable becomes the result $\omega < \mathfrak{c}$, that is, $\omega_1 \le \mathfrak{c}$.
(c) If $X$ is any infinite set, and $r \ge 1$, then there is a bijection between $X^r$ and $X$. (Enderton 77, p. 162; Halmos 60, §24.) (I note that we need some form of the axiom of choice to prove the result in this generality. But of course for most of the infinite sets arising naturally in mathematics (sets like $\mathbb{N}$ and $\mathcal{P}\mathbb{R}$) it is easy to prove the result without appeal to the axiom of choice.)
(d) Suppose that $\kappa$ is an infinite cardinal. If $I$ is a set of cardinal at most $\kappa$ and $\langle A_i \rangle_{i \in I}$ is a family of sets with $\#(A_i) \le \kappa$ for every $i \in I$, then $\#(\bigcup_{i \in I} A_i) \le \kappa$. Consequently $\#(\bigcup \mathcal{A}) \le \kappa$ whenever $\mathcal{A}$ is a family of sets such that $\#(\mathcal{A}) \le \kappa$ and $\#(A) \le \kappa$ for every $A \in \mathcal{A}$. In particular, $\omega_1$ cannot be expressed as a countable union of countable sets, and $\omega_2$ cannot be expressed as a countable union of sets of cardinal at most $\omega_1$.
(e) Now we can rephrase 2A1Hc as: if $\#(X) \le \mathfrak{c}$, then $\#(H) \le \mathfrak{c}$, where $H$ is the set of functions from a countable subset of $X$ to $\{0, 1\}$. PPP For we have an injection from $X$ into $\mathcal{P}\mathbb{N}$, and therefore a surjection from $\mathcal{P}\mathbb{N}$ onto $H$. QQQ
(f) Any non-empty class of cardinals has a least member (by 2A1Dc).
2A1M Zorn's Lemma In 2A1K I described the well-ordering principle. I come now to another proposition which is equiveridical with the axiom of choice:
'Let $(P, \le)$ be a non-empty partially ordered set such that every non-empty totally ordered subset of $P$ has an upper bound in $P$. Then $P$ has a maximal element.'
This is Zorn's Lemma. For the proof that the axiom of choice implies, and is implied by, Zorn's Lemma, see Enderton 77, p. 151; Halmos 60, §16; Henle 86, 9.1-9.3; Roitman 90, 3.6.38.
2A1N Ultrafilters A filter $\mathcal{F}$ on a set $X$ is an ultrafilter if for every $A \subseteq X$ either $A \in \mathcal{F}$ or $X \setminus A \in \mathcal{F}$.
If $\mathcal{F}$ is an ultrafilter on $X$ and $f : D \to Y$ is a function, where $D \in \mathcal{F}$, then $f[[\mathcal{F}]]$ is an ultrafilter on $Y$ (because $f^{-1}[Y \setminus A] = D \setminus f^{-1}[A]$ for every $A \subseteq Y$).
One type of ultrafilter can be described easily: if $x$ is any point of a set $X$, then $\mathcal{F} = \{F : x \in F \subseteq X\}$ is an ultrafilter on $X$. (You need only read the definitions. Ultrafilters of this type are called principal ultrafilters.) But it is not obvious that there are any further ultrafilters, and indeed it is not possible to prove that there are any, without using a strong form of the axiom of choice, as follows.
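2A1O The Ultrafilter Theorem As an example of the use of Zorn's lemma which will be of great value in studying compact topological spaces (2A3N et seq., and §247), I give the following result.
Theorem Let $X$ be any non-empty set, and $\mathcal{F}$ a filter on $X$. Then there is an ultrafilter $\mathcal{H}$ on $X$ such that $\mathcal{F} \subseteq \mathcal{H}$.
proof (Cf. Henle 86, 9.4; Roitman 90, 3.6.37.) Let $P$ be the set of all filters on $X$ including $\mathcal{F}$, and order $P$ by inclusion, so that, for $\mathcal{G}_1$, $\mathcal{G}_2 \in P$, $\mathcal{G}_1 \le \mathcal{G}_2$ in $P$ iff $\mathcal{G}_1 \subseteq \mathcal{G}_2$. It is easy to see that $P$ is a partially ordered set, and it is non-empty because $\mathcal{F} \in P$. If $Q$ is any non-empty totally ordered subset of $P$, then $\mathcal{H}_Q = \bigcup Q \in P$. PPP Of course $\mathcal{H}_Q$ is a family of subsets of $X$. (i) Take any $\mathcal{G}_0 \in Q$; then $X \in \mathcal{G}_0 \subseteq \mathcal{H}_Q$. If $\mathcal{G} \in Q$, then $\mathcal{G}$ is a filter, so $\emptyset \notin \mathcal{G}$; accordingly $\emptyset \notin \mathcal{H}_Q$. (ii) If $E$, $F \in \mathcal{H}_Q$, then there are $\mathcal{G}_1$, $\mathcal{G}_2 \in Q$ such that $E \in \mathcal{G}_1$ and $F \in \mathcal{G}_2$. Because $Q$ is totally ordered, either $\mathcal{G}_1 \subseteq \mathcal{G}_2$ or $\mathcal{G}_2 \subseteq \mathcal{G}_1$. In either case, $\mathcal{G} = \mathcal{G}_1 \cup \mathcal{G}_2 \in Q$. Now $\mathcal{G}$ is a filter containing both $E$ and $F$, so it contains $E \cap F$, and $E \cap F \in \mathcal{H}_Q$. (iii) If $X \supseteq E \supseteq F \in \mathcal{H}_Q$, there is a $\mathcal{G} \in Q$ such that $F \in \mathcal{G}$; and $E \in \mathcal{G} \subseteq \mathcal{H}_Q$. This shows that $\mathcal{H}_Q$ is a filter on $X$. (iv) Finally, $\mathcal{H}_Q \supseteq \mathcal{G}_0 \supseteq \mathcal{F}$, so $\mathcal{H}_Q \in P$. QQQ Now $\mathcal{H}_Q$ is evidently an upper bound for $Q$ in $P$.
We may therefore apply Zorn's Lemma to find a maximal element $\mathcal{H}$ of $P$. This $\mathcal{H}$ is surely a filter on $X$ including $\mathcal{F}$.
Now let $A \subseteq X$ be such that $A \notin \mathcal{H}$. Consider
$$\mathcal{H}_1 = \{E : E \subseteq X,\ E \cup A \in \mathcal{H}\}.$$
This is a filter on $X$. PPP Of course it is a family of subsets of $X$. (i) $X \cup A = X \in \mathcal{H}$, so $X \in \mathcal{H}_1$. $\emptyset \cup A = A \notin \mathcal{H}$ so $\emptyset \notin \mathcal{H}_1$. (ii) If $E$, $F \in \mathcal{H}_1$ then
$$(E \cap F) \cup A = (E \cup A) \cap (F \cup A) \in \mathcal{H},$$
so $E \cap F \in \mathcal{H}_1$. (iii) If $X \supseteq E \supseteq F \in \mathcal{H}_1$ then $E \cup A \supseteq F \cup A \in \mathcal{H}$, so $E \cup A \in \mathcal{H}$ and $E \in \mathcal{H}_1$. QQQ Also $\mathcal{H}_1 \supseteq \mathcal{H}$, so $\mathcal{H}_1 \in P$. But $\mathcal{H}$ is a maximal element of $P$, so $\mathcal{H}_1 = \mathcal{H}$. Since $(X \setminus A) \cup A = X \in \mathcal{H}$, $X \setminus A \in \mathcal{H}_1$ and $X \setminus A \in \mathcal{H}$.
As $A$ is arbitrary, $\mathcal{H}$ is an ultrafilter, as required.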
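On a finite set no appeal to Zorn's Lemma is needed, and the second half of the proof can be run as an algorithm: whenever a set $A$ is not yet in the filter, pass to $\{E : E \cup A \in \mathcal{H}\}$, which is a larger filter containing $X \setminus A$. The sketch below (illustrative names, finite sets only, filters represented as sets of `frozenset`s) does this for every subset in turn and checks that the result is an ultrafilter including the original filter.

```python
from itertools import chain, combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets, in a fixed order."""
    xs = sorted(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))]


def is_filter(F, X):
    X = frozenset(X)
    if X not in F or frozenset() in F:
        return False
    if any((E & G) not in F for E in F for G in F):
        return False
    return all(E in F for G in F for E in powerset(X) if G <= E)


def is_ultrafilter(F, X):
    X = frozenset(X)
    return is_filter(F, X) and all(A in F or (X - A) in F for A in powerset(X))


def extend_to_ultrafilter(F, X):
    """Repeat the step H -> {E : E ∪ A in H} for each A not yet in H."""
    X = frozenset(X)
    H = set(F)
    for A in powerset(X):
        if A not in H:
            H = {E for E in powerset(X) if (E | A) in H}
    return H


if __name__ == "__main__":
    X = frozenset({0, 1, 2})
    F = {E for E in powerset(X) if frozenset({0, 1}) <= E}
    H = extend_to_ultrafilter(F, X)
    print(is_ultrafilter(H, X), F <= H, sorted(sorted(E) for E in H))
```

As expected for a finite set, the ultrafilter obtained is principal; which point it concentrates on depends on the order in which the subsets are visited.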
2A1P I come now to a result from infinitary combinatorics for which I give a detailed proof, not because it cannot be found in many textbooks, but because it is usually given in enormously greater generality, to the point indeed that it may be harder to understand why the stated theorem covers the present result than to prove the latter from first principles.
Theorem (a) Let $\langle K_\alpha \rangle_{\alpha \in A}$ be a family of countable sets, with $\#(A)$ strictly greater than $\mathfrak{c}$, the cardinal of the continuum. Then there are a set $M$, of cardinal at most $\mathfrak{c}$, and a set $B \subseteq A$, of cardinal strictly greater than $\mathfrak{c}$, such that $K_\alpha \cap K_\beta \subseteq M$ whenever $\alpha$, $\beta$ are distinct members of $B$.
(b) Let $I$ be a set, and $\langle f_\alpha \rangle_{\alpha \in A}$ a family in $\{0, 1\}^I$, the set of functions from $I$ to $\{0, 1\}$, with $\#(A) > \mathfrak{c}$. If $\langle K_\alpha \rangle_{\alpha \in A}$ is any family of countable subsets of $I$, then there is a set $B \subseteq A$, of cardinal greater than $\mathfrak{c}$, such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$ for all $\alpha$, $\beta \in B$.
(c) In particular, under the conditions of (b), there are distinct $\alpha$, $\beta \in A$ such that $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$.
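proof (a) Choose inductively a family $\langle M_\xi \rangle_{\xi < \omega_1}$ of sets by the rule
if there is any set $N$ such that
($*$) $N$ is disjoint from $\bigcup_{\eta < \xi} M_\eta$, $\#(N) \le \mathfrak{c}$ and $\#(\{\alpha : \alpha \in A,\ K_\alpha \cap N = \emptyset\}) \le \mathfrak{c}$,
choose such a set for $M_\xi$;
otherwise set $M_\xi = \emptyset$.
When $M_\xi$ has been chosen for every $\xi < \omega_1$, set $M = \bigcup_{\xi < \omega_1} M_\xi$. The rule ensures that $\langle M_\xi \rangle_{\xi < \omega_1}$ is disjoint and that $\#(M_\xi) \le \mathfrak{c}$ for every $\xi < \omega_1$, while $\omega_1 \le \mathfrak{c}$, so $\#(M) \le \mathfrak{c}$.
Let $\mathfrak{P}$ be the family of sets $P \subseteq A$ such that $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in P$. Order $\mathfrak{P}$ by inclusion, so that it is a partially ordered set. If $Q \subseteq \mathfrak{P}$ is totally ordered, then $\bigcup Q \in \mathfrak{P}$. PPP If $\alpha$, $\beta$ are distinct members of $\bigcup Q$, there are $Q_1$, $Q_2 \in Q$ such that $\alpha \in Q_1$, $\beta \in Q_2$; now $P = Q_1 \cup Q_2$ is equal to one of $Q_1$, $Q_2$, and in either case belongs to $\mathfrak{P}$ and contains both $\alpha$ and $\beta$, so $K_\alpha \cap K_\beta \subseteq M$. QQQ By Zorn's Lemma, $\mathfrak{P}$ has a maximal element $B$, and we surely have $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in B$.
??? Suppose, if possible, that $\#(B) \le \mathfrak{c}$. Set $N = \bigcup_{\alpha \in B} K_\alpha \setminus M$. Then $N$ has cardinal at most $\mathfrak{c}$, being included in a union of at most $\mathfrak{c}$ countable sets. For every $\alpha \in A \setminus B$, $B \cup \{\alpha\} \notin \mathfrak{P}$, so there must be some $\beta \in B$ such that $K_\alpha \cap K_\beta \not\subseteq M$; that is, $K_\alpha \cap N \ne \emptyset$. Thus $\{\alpha : K_\alpha \cap N = \emptyset\} \subseteq B$ has cardinal at most $\mathfrak{c}$. But this means that in the rule for choosing $M_\xi$, there was always an $N$ satisfying the condition ($*$), and therefore $M_\xi$ also did. Thus $C_\xi = \{\alpha : K_\alpha \cap M_\xi = \emptyset\}$ has cardinal at most $\mathfrak{c}$ for every $\xi < \omega_1$. So $C = \bigcup_{\xi < \omega_1} C_\xi$ also has. But the original hypothesis was that $\#(A) > \mathfrak{c}$, so there is an $\alpha \in A \setminus C$. In this case, $K_\alpha \cap M_\xi \ne \emptyset$ for every $\xi < \omega_1$. But this means that we have a surjection $\phi : K_\alpha \cap M \to \omega_1$ given by setting
$$\phi(i) = \xi \text{ if } i \in K_\alpha \cap M_\xi.$$
Since $\#(K_\alpha) < \omega_1$, this is impossible. XXX
Accordingly $\#(B) > \mathfrak{c}$ and we have found a suitable pair $M$, $B$.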
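(b) By (a), we can find a set $M$, of cardinal at most $\mathfrak{c}$, and a set $B_0 \subseteq A$, of cardinal greater than $\mathfrak{c}$, such that $K_\alpha \cap K_\beta \subseteq M$ for all distinct $\alpha$, $\beta \in B_0$. Let $H$ be the set of functions from countable subsets of $M$ to $\{0, 1\}$; then $f'_\alpha = f_\alpha \restriction (K_\alpha \cap M) \in H$ for each $\alpha \in B_0$. Now $B_0 = \bigcup_{h \in H} \{\alpha : \alpha \in B_0,\ f'_\alpha = h\}$ has cardinal greater than $\mathfrak{c}$, while $\#(H) \le \mathfrak{c}$ (2A1Le), so there must be some $h \in H$ such that $B = \{\alpha : \alpha \in B_0,\ f'_\alpha = h\}$ has cardinal greater than $\mathfrak{c}$. If $\alpha$, $\beta$ are distinct members of $B$, then $K_\alpha \cap K_\beta \subseteq M$, because $\alpha$, $\beta \in B_0$; but this means that
$$f_\alpha \restriction K_\alpha \cap K_\beta = h \restriction K_\alpha \cap K_\beta = f_\beta \restriction K_\alpha \cap K_\beta.$$
Thus $B$ has the required property. (Of course $f_\alpha$ and $f_\beta$ agree on $K_\alpha \cap K_\beta$ if $\alpha = \beta$.)
(c) follows at once.
Remark The result we need in this volume (in 216E) is part (c) above. There are other proofs of this, perhaps a little simpler; but the stronger result in part (b) will be useful in Volume 3.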
2A2 The topology of Euclidean space
In the appendix to Volume 1 (§1A2) I discussed open and closed sets in $\mathbb{R}^r$; the chief aim there was to support the idea of 'Borel set', which is vital in the theory of Lebesgue measure, but of course they are also fundamental to the study of continuous functions, and indeed to all aspects of real analysis. I give here a very brief introduction to the further elementary facts about closed and compact sets and continuous functions which we need for this volume. Much of this material can be derived from the generalizations in §2A3, but nevertheless I sketch the proofs, since for the greater part of the volume (most of the exceptions are in Chapter 24) Euclidean space is sufficient for our needs.
2A2A Closures: Definition For any $r \ge 1$ and any $A \subseteq \mathbb{R}^r$, the closure of $A$, $\overline{A}$, is the intersection of all the closed subsets of $\mathbb{R}^r$ including $A$. This is itself closed (being the intersection of a non-empty family of closed sets, see 1A2Fd), so is the smallest closed set including $A$. In particular, $A$ is closed iff $\overline{A} = A$.
2A2B Lemma Let $A \subseteq \mathbb{R}^r$ be any set. Then for $x \in \mathbb{R}^r$ the following are equiveridical:
(i) $x \in \overline{A}$, the closure of $A$;
(ii) $B(x, \delta) \cap A \ne \emptyset$ for every $\delta > 0$, where $B(x, \delta) = \{y : \|y - x\| \le \delta\}$;
(iii) there is a sequence $\langle x_n \rangle_{n \in \mathbb{N}}$ in $A$ such that $\lim_{n \to \infty} \|x_n - x\| = 0$.
proof (a)(i)$\Rightarrow$(ii) Suppose that $x \in \overline{A}$ and $\delta > 0$. Then $U(x, \delta) = \{y : \|y - x\| < \delta\}$ is an open set (1A2D), so $F = \mathbb{R}^r \setminus U(x, \delta)$ is closed, while $x \notin F$. Now
$$x \in \overline{A} \setminus F \implies \overline{A} \not\subseteq F \implies A \not\subseteq F \implies A \cap U(x, \delta) \ne \emptyset \implies A \cap B(x, \delta) \ne \emptyset.$$
As $\delta$ is arbitrary, (ii) is true.
(b)(ii)$\Rightarrow$(iii) If (ii) is true, then for each $n \in \mathbb{N}$ we can find an $x_n \in A$ such that $\|x_n - x\| \le 2^{-n}$, and now $\lim_{n \to \infty} \|x_n - x\| = 0$.
(c)(iii)$\Rightarrow$(i) Assume (iii). ??? Suppose, if possible, that $x \notin \overline{A}$. Then $x$ belongs to the open set $\mathbb{R}^r \setminus \overline{A}$ and there is a $\delta > 0$ such that $U(x, \delta) \subseteq \mathbb{R}^r \setminus \overline{A}$. But now there is an $n$ such that $\|x_n - x\| < \delta$, in which case $x_n \in U(x, \delta) \cap A \subseteq U(x, \delta) \cap \overline{A}$. XXX
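2A2C Continuous functions (a) I begin with a characterization of continuous functions in terms of open sets. If $r$, $s\ge1$, $D\subseteq\mathbb R^r$ and $\phi:D\to\mathbb R^s$ is a function, we say that $\phi$ is continuous if for every $x\in D$ and $\epsilon>0$ there is a $\delta>0$ such that $\|\phi(y)-\phi(x)\|\le\epsilon$ whenever $y\in D$ and $\|y-x\|\le\delta$. Now $\phi$ is continuous iff for every open set $G\subseteq\mathbb R^s$ there is an open set $H\subseteq\mathbb R^r$ such that $\phi^{-1}[G]=D\cap H$.
PPP (i) Suppose that $\phi$ is continuous and that $G\subseteq\mathbb R^s$ is open. Set
$$H=\bigcup\{U:U\subseteq\mathbb R^r\text{ is open},\ \phi[U\cap D]\subseteq G\}.$$
Then $H$ is a union of open sets, therefore open (1A2Bd), and $H\cap D\subseteq\phi^{-1}[G]$. If $x\in\phi^{-1}[G]$, then $\phi(x)\in G$, so there is an $\epsilon>0$ such that $U(\phi(x),\epsilon)\subseteq G$; now there is a $\delta>0$ such that $\|\phi(y)-\phi(x)\|\le\frac12\epsilon$ whenever $y\in D$ and $\|y-x\|\le\delta$, so that
$$\phi[U(x,\delta)\cap D]\subseteq U(\phi(x),\epsilon)\subseteq G$$
and
$$x\in U(x,\delta)\subseteq H.$$
As $x$ is arbitrary, $\phi^{-1}[G]=H\cap D$. As $G$ is arbitrary, $\phi$ satisfies the condition.
(ii) Now suppose that $\phi$ satisfies the condition. Take $x\in D$ and $\epsilon>0$. Then $U(\phi(x),\epsilon)$ is open, so there is an open $H\subseteq\mathbb R^r$ such that $H\cap D=\phi^{-1}[U(\phi(x),\epsilon)]$; we see that $x\in H$, so there is a $\delta>0$ such that $U(x,\delta)\subseteq H$; now if $y\in D$ and $\|y-x\|\le\frac12\delta$ then $y\in D\cap H$, $\phi(y)\in U(\phi(x),\epsilon)$ and $\|\phi(y)-\phi(x)\|\le\epsilon$. As $x$ and $\epsilon$ are arbitrary, $\phi$ is continuous. QQQ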
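(b) Using the $\epsilon$-$\delta$ definition of continuity, it is easy to see that a function $\phi$ from a subset $D$ of $\mathbb R^r$ to $\mathbb R^s$ is continuous iff all its components $\phi_i$ are continuous, writing $\phi(x)=(\phi_1(x),\ldots,\phi_s(x))$ for $x\in D$. PPP (i) If $\phi$ is continuous, $i\le s$, $x\in D$ and $\epsilon>0$, then there is a $\delta>0$ such that
$$|\phi_i(y)-\phi_i(x)|\le\|\phi(y)-\phi(x)\|\le\epsilon$$
whenever $y\in D$ and $\|y-x\|\le\delta$. (ii) If every $\phi_i$ is continuous, $x\in D$ and $\epsilon>0$, then there are $\delta_i>0$ such that $|\phi_i(y)-\phi_i(x)|\le\epsilon/\sqrt s$ whenever $y\in D$ and $\|y-x\|\le\delta_i$; setting $\delta=\min_{1\le i\le s}\delta_i>0$, we have $\|\phi(y)-\phi(x)\|\le\epsilon$ whenever $y\in D$ and $\|y-x\|\le\delta$. QQQ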
2A2D Compactness in $\mathbb R^r$: Definition A subset $F$ of $\mathbb R^r$ is called compact if whenever $\mathcal G$ is a family of open sets covering $F$ then there is a finite subset $\mathcal G_0$ of $\mathcal G$ still covering $F$.
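2A2E Elementary properties of compact sets Take any $r\ge1$, and subsets $D$, $F$, $G$ and $K$ of $\mathbb R^r$.
(a) If $K$ is compact and $F$ is closed, then $K\cap F$ is compact. PPP Let $\mathcal G$ be an open cover of $F\cap K$. Then $\mathcal G\cup\{\mathbb R^r\setminus F\}$ is an open cover of $K$, so has a finite subcover $\mathcal G_0$ say. Now $\mathcal G_0\setminus\{\mathbb R^r\setminus F\}$ is a finite subset of $\mathcal G$ covering $K\cap F$. As $\mathcal G$ is arbitrary, $K\cap F$ is compact. QQQ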
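(b) If $s\ge1$, $\phi:D\to\mathbb R^s$ is a continuous function, $K$ is compact and $K\subseteq D$, then $\phi[K]$ is compact. PPP Let $\mathcal V$ be an open cover of $\phi[K]$. Let $\mathcal H$ be
$$\{H:H\subseteq\mathbb R^r\text{ is open},\ \exists\,V\in\mathcal V,\ \phi^{-1}[V]=D\cap H\}.$$
If $x\in K$, then $\phi(x)\in\phi[K]$ so there is a $V\in\mathcal V$ such that $\phi(x)\in V$; now there is an $H\in\mathcal H$ such that $D\cap H=\phi^{-1}[V]$ contains $x$ (2A2Ca); as $x$ is arbitrary, $K\subseteq\bigcup\mathcal H$. Let $\mathcal H_0$ be a finite subset of $\mathcal H$ covering $K$. For each $H\in\mathcal H_0$, let $V_H\in\mathcal V$ be such that $\phi^{-1}[V_H]=D\cap H$; then $\{V_H:H\in\mathcal H_0\}$ is a finite subset of $\mathcal V$ covering $\phi[K]$. As $\mathcal V$ is arbitrary, $\phi[K]$ is compact. QQQ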
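(c) If $K$ is compact, it is closed. PPP Write $H=\mathbb R^r\setminus K$. Take any $x\in H$. Then $G_n=\mathbb R^r\setminus B(x,2^{-n})$ is open for every $n\in\mathbb N$ (1A2G). Also
$$\bigcup_{n\in\mathbb N}G_n=\{y:y\in\mathbb R^r,\ \|y-x\|>0\}=\mathbb R^r\setminus\{x\}\supseteq K.$$
So there is some finite set $\mathcal G_0\subseteq\{G_n:n\in\mathbb N\}$ which covers $K$. There must be an $n$ such that $\mathcal G_0\subseteq\{G_i:i\le n\}$, so that
$$K\subseteq\bigcup\mathcal G_0\subseteq\bigcup_{i\le n}G_i=G_n,$$
and $B(x,2^{-n})\subseteq H$. As $x$ is arbitrary, $H$ is open and $K$ is closed. QQQ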
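(d) If $K$ is compact and $G$ is open and $K\subseteq G$, then there is a $\delta>0$ such that $K+B(0,\delta)\subseteq G$. PPP If $K=\emptyset$, this is trivial, as then
$$K+B(0,1)=\{x+y:x\in K,\ y\in B(0,1)\}=\emptyset.$$
Otherwise, set
$$\mathcal G=\{U(x,\delta):x\in\mathbb R^r,\ \delta>0,\ U(x,2\delta)\subseteq G\}.$$
Then $\mathcal G$ is a family of open sets and $\bigcup\mathcal G=G$ (because $G$ is open), so $\mathcal G$ is an open cover of $K$ and has a finite subcover $\mathcal G_0$. Express $\mathcal G_0$ as $\{U(x_0,\delta_0),\ldots,U(x_n,\delta_n)\}$ where $U(x_i,2\delta_i)\subseteq G$ for each $i$. Set $\delta=\min_{i\le n}\delta_i>0$. If $x\in K$ and $y\in B(0,\delta)$, then there is an $i\le n$ such that $x\in U(x_i,\delta_i)$; now
$$\|(x+y)-x_i\|\le\|x-x_i\|+\|y\|<\delta_i+\delta\le2\delta_i,$$
so $x+y\in U(x_i,2\delta_i)\subseteq G$. As $x$ and $y$ are arbitrary, $K+B(0,\delta)\subseteq G$. QQQ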
Remark This result is a simple form of the Lebesgue covering lemma.
2A2F The value of the concept of 'compactness' is greatly increased by the fact that there is an effective characterization of the compact subsets of $\mathbb R^r$.
Theorem For any $r\ge1$, a subset $K$ of $\mathbb R^r$ is compact iff it is closed and bounded.
proof (a) Suppose that $K$ is compact. By 2A2Ec, it is closed. To see that it is bounded, consider $\mathcal G=\{U(0,n):n\in\mathbb N\}$. $\mathcal G$ consists entirely of open sets, and $\bigcup\mathcal G=\mathbb R^r\supseteq K$, so there is a finite $\mathcal G_0\subseteq\mathcal G$ covering $K$. There must be an $n$ such that $\mathcal G_0\subseteq\{U(0,i):i\le n\}$, so that
$$K\subseteq\bigcup\mathcal G_0\subseteq\bigcup_{i\le n}U(0,i)=U(0,n),$$
and $K$ is bounded.
(b) Thus we are left with the converse; I have to show that a closed bounded set is compact. The main part of the argument is a proof by induction on $r$ that the closed interval $[-\mathbf n,\mathbf n]$ is compact for all $n\in\mathbb N$, writing $\mathbf n=(n,\ldots,n)\in\mathbb R^r$.
(i) If $r=1$ and $n\in\mathbb N$ and $\mathcal G$ is a family of open sets in $\mathbb R$ covering $[-n,n]$, set
$$A=\{x:x\in[-n,n],\text{ there is a finite }\mathcal G_0\subseteq\mathcal G\text{ such that }[-n,x]\subseteq\bigcup\mathcal G_0\}.$$
Then $-n\in A$, because if $-n\in G\in\mathcal G$ then $[-n,-n]\subseteq\bigcup\{G\}$, and $A$ is bounded above by $n$, so $c=\sup A$ exists and belongs to $[-n,n]$.
Next, $c\in[-n,n]\subseteq\bigcup\mathcal G$, so there is a $G\in\mathcal G$ containing $c$. Let $\delta>0$ be such that $U(c,\delta)\subseteq G$. There is an $x\in A$ such that $x\ge c-\delta$. Let $\mathcal G_0$ be a finite subset of $\mathcal G$ covering $[-n,x]$. Then $\mathcal G_1=\mathcal G_0\cup\{G\}$ is a finite subset of $\mathcal G$ covering $[-n,c+\frac12\delta]$. But $c+\frac12\delta\notin A$ so $c+\frac12\delta>n$ and $\mathcal G_1$ is a finite subset of $\mathcal G$ covering $[-n,n]$. As $\mathcal G$ is arbitrary, $[-n,n]$ is compact and the induction starts.
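(ii) For the inductive step to $r+1$, regard the closed interval $F=[-\mathbf n,\mathbf n]$, taken in $\mathbb R^{r+1}$, as the product of the closed interval $E=[-\mathbf n,\mathbf n]$, taken in $\mathbb R^r$, with the closed interval $[-n,n]\subseteq\mathbb R$; by the inductive hypothesis, both $E$ and $[-n,n]$ are compact. Let $\mathcal G$ be a family of open subsets of $\mathbb R^{r+1}$ covering $F$. Write $\mathcal H$ for the family of open subsets $H$ of $\mathbb R^r$ such that $H\times[-n,n]$ is covered by a finite subfamily of $\mathcal G$. Then $E\subseteq\bigcup\mathcal H$. PPP Take $x\in E$. Set
$$\mathcal U_x=\{U:U\subseteq\mathbb R\text{ is open},\ \exists\,G\in\mathcal G,\text{ open }H\subseteq\mathbb R^r,\ x\in H\text{ and }H\times U\subseteq G\}.$$
Then $\mathcal U_x$ is a family of open subsets of $\mathbb R$. If $\xi\in[-n,n]$, there is a $G\in\mathcal G$ containing $(x,\xi)$; there is a $\delta>0$ such that $U((x,\xi),\delta)\subseteq G$; now $U(x,\frac12\delta)$ and $U(\xi,\frac12\delta)$ are open sets in $\mathbb R^r$, $\mathbb R$ respectively and
$$U(x,\tfrac12\delta)\times U(\xi,\tfrac12\delta)\subseteq U((x,\xi),\delta)\subseteq G,$$
so $U(\xi,\frac12\delta)\in\mathcal U_x$. As $\xi$ is arbitrary, $\mathcal U_x$ is an open cover of $[-n,n]$ in $\mathbb R$. By (i), it has a finite subcover $U_0,\ldots,U_k$ say. For each $j\le k$ we can find $H_j$, $G_j$ such that $H_j$ is an open subset of $\mathbb R^r$ containing $x$ and $H_j\times U_j\subseteq G_j\in\mathcal G$. Now set $H=\bigcap_{j\le k}H_j$. This is an open subset of $\mathbb R^r$ containing $x$, and $H\times[-n,n]\subseteq\bigcup_{j\le k}G_j$ is covered by a finite subfamily of $\mathcal G$. So $x\in H\in\mathcal H$. As $x$ is arbitrary, $\mathcal H$ covers $E$. QQQ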
(iii) Now the inductive hypothesis tells us that $E$ is compact, so there is a finite subfamily $\mathcal H_0$ of $\mathcal H$ covering $E$. For each $H\in\mathcal H_0$ let $\mathcal G_H$ be a finite subfamily of $\mathcal G$ covering $H\times[-n,n]$. Then $\bigcup_{H\in\mathcal H_0}\mathcal G_H$ is a finite subfamily of $\mathcal G$ covering $E\times[-n,n]=F$. As $\mathcal G$ is arbitrary, $F$ is compact and the induction proceeds.
(iv) Thus the interval $[-\mathbf n,\mathbf n]$ is compact in $\mathbb R^r$ for every $r$, $n$. Now suppose that $K$ is a closed bounded set in $\mathbb R^r$. Then there is an $n\in\mathbb N$ such that $K\subseteq[-\mathbf n,\mathbf n]$, that is, $K=K\cap[-\mathbf n,\mathbf n]$. As $K$ is closed and $[-\mathbf n,\mathbf n]$ is compact, $K$ is compact, by 2A2Ea.
This completes the proof.
2A2G Corollary If $\phi:D\to\mathbb R$ is continuous, where $D\subseteq\mathbb R^r$, and $K\subseteq D$ is a non-empty compact set, then $\phi$ is bounded and attains its bounds on $K$.
proof By 2A2Eb, $\phi[K]$ is compact; by 2A2F it is closed and bounded. To say that $\phi[K]$ is bounded is just to say that $\phi$ is bounded on $K$. Because $\phi[K]$ is a non-empty bounded set, it has an infimum $a$ and a supremum $b$; now both belong to $\overline{\phi[K]}$ (by the criterion 2A2B(ii), or otherwise); because $\phi[K]$ is closed, both belong to $\phi[K]$, that is, $\phi$ attains its bounds.
2A2H Lim sup and lim inf revisited In §1A3 I briefly discussed $\limsup_{n\to\infty}a_n$, $\liminf_{n\to\infty}a_n$ for real sequences $\langle a_n\rangle_{n\in\mathbb N}$. In this volume we need the notion of $\limsup_{\delta\downarrow0}f(\delta)$, $\liminf_{\delta\downarrow0}f(\delta)$ for real functions $f$. I say that $\limsup_{\delta\downarrow0}f(\delta)=u\in[-\infty,\infty]$ if (i) for every $v>u$ there is an $\eta>0$ such that $f(\delta)$ is defined and less than or equal to $v$ for every $\delta\in{]0,\eta]}$ (ii) for every $v<u$ and $\eta>0$ there is a $\delta\in{]0,\eta]}$ such that $f(\delta)$ is defined and greater than or equal to $v$. Similarly, $\liminf_{\delta\downarrow0}f(\delta)=u\in[-\infty,\infty]$ if (i) for every $v<u$ there is an $\eta>0$ such that $f(\delta)$ is defined and greater than or equal to $v$ for every $\delta\in{]0,\eta]}$ (ii) for every $v>u$ and $\eta>0$ there is a $\delta\in{]0,\eta]}$ such that $f(\delta)$ is defined and less than or equal to $v$.
2A2I In the one-dimensional case, we have a particularly simple description of the open sets.
Proposition If $G\subseteq\mathbb R$ is any open set, it is expressible as the union of a countable disjoint family of open intervals.
proof For $x$, $y\in G$ write $x\sim y$ if either $x\le y$ and $[x,y]\subseteq G$ or $y\le x$ and $[y,x]\subseteq G$. It is easy to check that $\sim$ is an equivalence relation on $G$. Let $\mathcal C$ be the set of equivalence classes under $\sim$. Then $\mathcal C$ is a partition of $G$. Now every $C\in\mathcal C$ is an open interval. PPP Set $a=\inf C$, $b=\sup C$ (allowing $a=-\infty$ and/or $b=\infty$ if $C$ is unbounded). If $a<x<b$, there are $y$, $z\in C$ such that $y\le x\le z$, so that $[y,x]\subseteq[y,z]\subseteq G$ and $y\sim x$ and $x\in C$; thus ${]a,b[}\subseteq C$. If $x\in C$, there is an open interval $I$ containing $x$ and included in $G$; since $x\sim y$ for every $y\in I$, $I\subseteq C$; so
$$a\le\inf I<x<\sup I\le b$$
and $x\in{]a,b[}$. Thus $C={]a,b[}$ is an open interval. QQQ
To see that $\mathcal C$ is countable, observe that every member of $\mathcal C$ contains a member of $\mathbb Q$, so that we have a surjective function from a subset of $\mathbb Q$ onto $\mathcal C$, and $\mathcal C$ is countable (1A1E).
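As an illustration of 2A2I in the simplest case, the following Python sketch takes an open set presented as a finite union of open intervals and returns its equivalence classes, that is, the pairwise disjoint open intervals with the same union. The function name `components` and the representation of $]a,b[$ as a pair `(a, b)` are assumptions made for this example only; the proposition itself covers arbitrary open sets, which a finite computation cannot.

```python
def components(intervals):
    """Merge open intervals (a, b) into pairwise disjoint open intervals
    with the same union, listed from left to right."""
    # Discard empty intervals and sort by left endpoint.
    todo = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in todo:
        # Two open intervals ]a0, b0[ and ]a, b[ with a0 <= a overlap iff a < b0.
        # If a == b0 the point b0 is missing from the union, so they stay apart.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged
```

Note that `(0, 1)` and `(1, 2)` are not merged: the point 1 does not belong to the union, so 0.5 and 1.5 are not equivalent in the sense of the proof.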
2A3 General topology
At various points (principally §§245-247, but also for certain ideas in Chapter 27) we need to know something about non-metrizable topologies. I must say that you should probably take the time to look at some book on elementary functional analysis which has the phrases 'weak compactness' or 'weakly compact' in the index. But I can list here the concepts actually used in this volume, in a good deal less space than any orthodox, complete treatment would employ.
2A3A Topologies First we need to know what a 'topology' is. If $X$ is any set, a topology on $X$ is a family $\mathfrak T$ of subsets of $X$ such that (i) $\emptyset$, $X\in\mathfrak T$ (ii) if $G$, $H\in\mathfrak T$ then $G\cap H\in\mathfrak T$ (iii) if $\mathcal G\subseteq\mathfrak T$ then $\bigcup\mathcal G\in\mathfrak T$ (cf. 1A2B). The pair $(X,\mathfrak T)$ is now a topological space. In this context, members of $\mathfrak T$ are called open and their complements (in $X$) are called closed (cf. 1A2E-1A2F).
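For a finite set the three conditions of 2A3A can be checked mechanically. The sketch below is only an illustration; the name `is_topology` and the use of Python `frozenset`s to represent subsets are choices made here, not anything from the text. For a finite family, closure under pairwise unions (together with $\emptyset\in\mathfrak T$, which handles the empty union) is enough to give condition (iii).

```python
from itertools import combinations

def is_topology(X, T):
    """Check conditions (i)-(iii) of 2A3A for a finite family T of subsets
    of a finite set X."""
    X = frozenset(X)
    T = {frozenset(G) for G in T}
    # (i) the empty set and X must belong to T, and T must consist of subsets of X.
    if frozenset() not in T or X not in T:
        return False
    if any(not G <= X for G in T):
        return False
    # (ii) pairwise intersections; (iii) pairwise unions, which suffice
    # for arbitrary unions because T is finite.
    for G, H in combinations(T, 2):
        if G & H not in T or G | H not in T:
            return False
    return True
```

For instance, on $X=\{1,2,3\}$ the chain $\emptyset\subseteq\{1\}\subseteq\{1,2\}\subseteq X$ passes, while $\{\emptyset,\{1\},\{2\},X\}$ fails because $\{1\}\cup\{2\}$ is missing.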
2A3B Continuous functions (a) If $(X,\mathfrak T)$ and $(Y,\mathfrak S)$ are topological spaces, a function $\phi:X\to Y$ is continuous if $\phi^{-1}[G]\in\mathfrak T$ for every $G\in\mathfrak S$. (By 2A2Ca above, this is consistent with the $\epsilon$-$\delta$ definition of continuity for functions from one Euclidean space to another. See also 2A3H below.)
(b) If $(X,\mathfrak T)$, $(Y,\mathfrak S)$ and $(Z,\mathfrak U)$ are topological spaces and $\phi:X\to Y$ and $\psi:Y\to Z$ are continuous, then $\psi\phi:X\to Z$ is continuous. PPP If $G\in\mathfrak U$ then $\psi^{-1}[G]\in\mathfrak S$ so $(\psi\phi)^{-1}[G]=\phi^{-1}[\psi^{-1}[G]]\in\mathfrak T$. QQQ
(c) If $(X,\mathfrak T)$ is a topological space, a function $f:X\to\mathbb R$ is continuous iff $\{x:a<f(x)<b\}$ is open whenever $a<b$ in $\mathbb R$. PPP (i) Every interval $]a,b[$ is open in $\mathbb R$, so if $f$ is continuous its inverse image $\{x:a<f(x)<b\}$ must be open. (ii) Suppose that $f^{-1}[\,{]a,b[}\,]$ is open whenever $a<b$, and let $H\subseteq\mathbb R$ be any open set. By the definition of 'open' set in $\mathbb R$ (1A2A),
$$H=\bigcup\{\,{]y-\delta,y+\delta[}:y\in\mathbb R,\ \delta>0,\ {]y-\delta,y+\delta[}\subseteq H\},$$
so
$$f^{-1}[H]=\bigcup\{f^{-1}[\,{]y-\delta,y+\delta[}\,]:y\in\mathbb R,\ \delta>0,\ {]y-\delta,y+\delta[}\subseteq H\}$$
is a union of open sets in $X$, therefore open. QQQ
(d) If $r\ge1$, $(X,\mathfrak T)$ is a topological space, and $\phi:X\to\mathbb R^r$ is a function, then $\phi$ is continuous iff $\phi_i:X\to\mathbb R$ is continuous for each $i\le r$, where $\phi(x)=(\phi_1(x),\ldots,\phi_r(x))$ for each $x\in X$. PPP (i) Suppose that $\phi$ is continuous. For $i\le r$, $y=(\eta_1,\ldots,\eta_r)\in\mathbb R^r$, set $\pi_i(y)=\eta_i$. Then $|\pi_i(y)-\pi_i(z)|\le\|y-z\|$ for all $y$, $z\in\mathbb R^r$ so $\pi_i:\mathbb R^r\to\mathbb R$ is continuous. Consequently $\phi_i=\pi_i\phi$ is continuous, by (b) above. (ii) Suppose that every $\phi_i$ is continuous, and that $H\subseteq\mathbb R^r$ is open. Set
$$\mathcal G=\{G:G\subseteq X\text{ is open},\ G\subseteq\phi^{-1}[H]\}.$$
Then $G_0=\bigcup\mathcal G$ is open, and $G_0\subseteq\phi^{-1}[H]$. But suppose that $x_0$ is any point of $\phi^{-1}[H]$. Then there is a $\delta>0$ such that $U(\phi(x_0),\delta)\subseteq H$, because $H$ is open and contains $\phi(x_0)$. For $1\le i\le r$ set $V_i=\{x:\phi_i(x_0)-\frac{\delta}{\sqrt r}<\phi_i(x)<\phi_i(x_0)+\frac{\delta}{\sqrt r}\}$; then $V_i$ is the inverse image of an open set under the continuous map $\phi_i$, so is open. Set $G=\bigcap_{i\le r}V_i$. Then $G$ is open (using (ii) of the definition 2A3A), $x_0\in G$, and $\|\phi(x)-\phi(x_0)\|<\delta$ for every $x\in G$, so $G\subseteq\phi^{-1}[H]$, $G\in\mathcal G$ and $x_0\in G_0$. This shows that $\phi^{-1}[H]=G_0$ is open. As $H$ is arbitrary, $\phi$ is continuous. QQQ
(e) If $(X,\mathfrak T)$ is a topological space, $f_1,\ldots,f_r$ are continuous functions from $X$ to $\mathbb R$, and $h:\mathbb R^r\to\mathbb R$ is continuous, then $h(f_1,\ldots,f_r):X\to\mathbb R$ is continuous. PPP Set $\phi(x)=(f_1(x),\ldots,f_r(x))\in\mathbb R^r$ for $x\in X$. By (d), $\phi$ is continuous, so by 2A3Bb $h(f_1,\ldots,f_r)=h\phi$ is continuous. QQQ In particular, $f+g$, $f\times g$ and $f-g$ are continuous for all continuous functions $f$, $g:X\to\mathbb R$.
(f) If $(X,\mathfrak T)$ and $(Y,\mathfrak S)$ are topological spaces and $\phi:X\to Y$ is a continuous function, then $\phi^{-1}[F]$ is closed in $X$ for every closed set $F\subseteq Y$. (For $X\setminus\phi^{-1}[F]=\phi^{-1}[Y\setminus F]$ is open.)
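2A3C Subspace topologies If $(X,\mathfrak T)$ is a topological space and $D\subseteq X$, then $\mathfrak T_D=\{G\cap D:G\in\mathfrak T\}$ is a topology on $D$. PPP (i) $\emptyset=\emptyset\cap D$ and $D=X\cap D$ belong to $\mathfrak T_D$. (ii) If $G$, $H\in\mathfrak T_D$ there are $G'$, $H'\in\mathfrak T$ such that $G=G'\cap D$, $H=H'\cap D$; now $G\cap H=G'\cap H'\cap D\in\mathfrak T_D$. (iii) If $\mathcal G\subseteq\mathfrak T_D$ set $\mathcal H=\{H:H\in\mathfrak T,\ H\cap D\in\mathcal G\}$; then $\bigcup\mathcal G=(\bigcup\mathcal H)\cap D\in\mathfrak T_D$. QQQ
$\mathfrak T_D$ is called the subspace topology on $D$, or the topology on $D$ induced by $\mathfrak T$. If $(Y,\mathfrak S)$ is another topological space, and $\phi:X\to Y$ is $(\mathfrak T,\mathfrak S)$-continuous, then $\phi{\restriction}D:D\to Y$ is $(\mathfrak T_D,\mathfrak S)$-continuous. (For if $H\in\mathfrak S$ then
$$(\phi{\restriction}D)^{-1}[H]=D\cap\phi^{-1}[H]\in\mathfrak T_D.)$$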
2A3D Closures and interiors (a) In the proof of 2A3Bd I have already used the following idea. Let $(X,\mathfrak T)$ be any topological space and $A$ any subset of $X$. Write
$$\operatorname{int}A=\bigcup\{G:G\in\mathfrak T,\ G\subseteq A\}.$$
Then $\operatorname{int}A$ is an open set, being a union of open sets, and is of course included in $A$; it must be the largest open set included in $A$, and is called the interior of $A$.
(b) Because a set is closed iff its complement is open, we have a complementary notion:
$$\overline A=\bigcap\{F:F\text{ is closed},\ A\subseteq F\}$$
$$=X\setminus\bigcup\{X\setminus F:F\text{ is closed},\ A\subseteq F\}$$
$$=X\setminus\bigcup\{G:G\text{ is open},\ A\cap G=\emptyset\}$$
$$=X\setminus\bigcup\{G:G\text{ is open},\ G\subseteq X\setminus A\}=X\setminus\operatorname{int}(X\setminus A).$$
$\overline A$ is closed (being the complement of an open set) and is the smallest closed set including $A$; it is called the closure of $A$. (Compare 2A2A.) Because the union of two closed sets is closed (cf. 1A2Fc), $\overline{A\cup B}=\overline A\cup\overline B$ for all $A$, $B\subseteq X$.
(c) There are innumerable ways of looking at these concepts; a useful description of the closure of a set is
$x\in\overline A\iff x\notin\operatorname{int}(X\setminus A)$
$\iff$ there is no open set containing $x$ and included in $X\setminus A$
$\iff$ every open set containing $x$ meets $A$.
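The formulae of 2A3D can be tried out on a small finite topological space. The following Python sketch is an illustration only: the helper names `interior` and `closure`, and the representation of a topology as a list of sets, are assumptions made for this example. It computes the interior as a union of open sets and the closure as an intersection of closed sets, so that the identity $\overline A=X\setminus\operatorname{int}(X\setminus A)$ can be checked directly.

```python
def interior(T, A):
    """Union of all members of the topology T that are included in A."""
    A = frozenset(A)
    result = frozenset()
    for G in T:
        G = frozenset(G)
        if G <= A:
            result |= G
    return result

def closure(X, T, A):
    """Intersection of all closed sets (complements of members of T) including A."""
    X, A = frozenset(X), frozenset(A)
    result = X  # X itself is closed, being the complement of the empty set
    for G in T:
        F = X - frozenset(G)
        if A <= F:
            result &= F
    return result
```

With $X=\{1,2,3\}$ and open sets $\emptyset$, $\{1\}$, $\{1,2\}$, $X$, the closure of $\{2\}$ is $\{2,3\}$: the only open set containing 3 is $X$, which meets $\{2\}$, while the open set $\{1\}$ contains 1 and misses $\{2\}$.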
2A3E Hausdorff topologies (a) The concept of 'topological space' is so widely drawn, and so widely applicable, that a vast number of different types of topological space have been studied. For this volume we shall not need much of the (very extensive) vocabulary which has been developed to describe this variety. But one useful word (and one of the most important concepts) is that of 'Hausdorff space'; a topological space $X$ is Hausdorff if for all distinct $x$, $y\in X$ there are disjoint open sets $G$, $H\subseteq X$ such that $x\in G$ and $y\in H$.
(b) In a Hausdorff space $X$, finite sets are closed. PPP If $z\in X$, then for any $x\in X\setminus\{z\}$ there is an open set containing $x$ but not $z$, so $X\setminus\{z\}$ is open and $\{z\}$ is closed. So a finite set is a finite union of closed sets and is therefore closed. QQQ
2A3F Pseudometrics Many important topologies (not all!) can be defined by families of pseudometrics; it will be useful to have a certain amount of technical skill with these.
(a) Let $X$ be a set. A pseudometric on $X$ is a function $\rho:X\times X\to[0,\infty[$ such that
$\rho(x,z)\le\rho(x,y)+\rho(y,z)$ for all $x$, $y$, $z\in X$
(the 'triangle inequality');
$\rho(x,y)=\rho(y,x)$ for all $x$, $y\in X$;
$\rho(x,x)=0$ for all $x\in X$.
A metric is a pseudometric $\rho$ satisfying the further condition
if $\rho(x,y)=0$ then $x=y$.
(b) Examples (i) For $x$, $y\in\mathbb R$, set $\rho(x,y)=|x-y|$; then $\rho$ is a metric on $\mathbb R$ (the 'usual metric' on $\mathbb R$).
(ii) For $x$, $y\in\mathbb R^r$, where $r\ge1$, set $\rho(x,y)=\|x-y\|$, defining $\|z\|=\sqrt{\sum_{i=1}^r\zeta_i^2}$ for $z=(\zeta_1,\ldots,\zeta_r)$, as usual. Then $\rho$ is a metric, the Euclidean metric on $\mathbb R^r$. (The triangle inequality for $\rho$ comes from Cauchy's inequality in 1A2C: if $x$, $y$, $z\in\mathbb R^r$, then
$$\rho(x,z)=\|x-z\|=\|(x-y)+(y-z)\|\le\|x-y\|+\|y-z\|=\rho(x,y)+\rho(y,z).$$
The other required properties of $\rho$ are elementary. Compare 2A4Bb below.)
(iii) For an example of a pseudometric which is not a metric, take $r\ge2$ and define $\rho:\mathbb R^r\times\mathbb R^r\to[0,\infty[$ by setting $\rho(x,y)=|\xi_1-\eta_1|$ whenever $x=(\xi_1,\ldots,\xi_r)$, $y=(\eta_1,\ldots,\eta_r)\in\mathbb R^r$.
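The distinction between example (iii) and a metric can be seen numerically. The Python sketch below checks the three pseudometric conditions of (a), and separately the extra metric condition, on a finite sample of points. All names (`rho_first`, `is_pseudometric_on`, `separates_points_on`) are invented for this illustration, and a check on finitely many points is of course only a sanity test, not a proof.

```python
from itertools import product

def rho_first(x, y):
    """Example (iii): the distance between first coordinates only."""
    return abs(x[0] - y[0])

def is_pseudometric_on(points, d):
    """Check the triangle inequality, symmetry and d(x, x) = 0 on a finite sample."""
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):
            return False
    for x, y in product(points, repeat=2):
        if d(x, y) != d(y, x) or d(x, y) < 0:
            return False
    return all(d(x, x) == 0 for x in points)

def separates_points_on(points, d):
    """The further metric condition: d(x, y) = 0 only when x = y."""
    return all(x == y for x, y in product(points, repeat=2) if d(x, y) == 0)
```

On the sample $(0,0)$, $(0,1)$, $(2,5)$, $(-3,1)$ in $\mathbb R^2$, `rho_first` passes the pseudometric check but gives distance 0 between the distinct points $(0,0)$ and $(0,1)$, so it is not a metric.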
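(c) Now let $X$ be a set and $\mathrm P$ a non-empty family of pseudometrics on $X$. Let $\mathfrak T$ be the family of those subsets $G$ of $X$ such that for every $x\in G$ there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that
$$U(x;\rho_0,\ldots,\rho_n;\delta)=\{y:y\in X,\ \max_{i\le n}\rho_i(y,x)<\delta\}\subseteq G.$$
Then $\mathfrak T$ is a topology on $X$.
PPP (Compare 1A2B.) (i) $\emptyset\in\mathfrak T$ because the condition is vacuously satisfied. $X\in\mathfrak T$ because $U(x;\rho;1)\subseteq X$ for any $x\in X$, $\rho\in\mathrm P$. (ii) If $G$, $H\in\mathfrak T$ and $x\in G\cap H$, take $\rho_0,\ldots,\rho_m$, $\rho'_0,\ldots,\rho'_n\in\mathrm P$, $\delta$, $\delta'>0$ such that $U(x;\rho_0,\ldots,\rho_m;\delta)\subseteq G$, $U(x;\rho'_0,\ldots,\rho'_n;\delta')\subseteq H$; then
$$U(x;\rho_0,\ldots,\rho_m,\rho'_0,\ldots,\rho'_n;\min(\delta,\delta'))\subseteq G\cap H.$$
As $x$ is arbitrary, $G\cap H\in\mathfrak T$. (iii) If $\mathcal G\subseteq\mathfrak T$ and $x\in\bigcup\mathcal G$, there is a $G\in\mathcal G$ such that $x\in G$; now there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that
$$U(x;\rho_0,\ldots,\rho_n;\delta)\subseteq G\subseteq\bigcup\mathcal G.$$
As $x$ is arbitrary, $\bigcup\mathcal G\in\mathfrak T$. QQQ
$\mathfrak T$ is the topology defined by $\mathrm P$.
(d) You may wish to have a convention to deal with the case in which $\mathrm P$ is the empty set; in this case the topology on $X$ defined by $\mathrm P$ is $\{\emptyset,X\}$.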
(e) In many important cases, $\mathrm P$ is upwards-directed in the sense that for any $\rho_1$, $\rho_2\in\mathrm P$ there is a $\rho\in\mathrm P$ such that $\rho_i(x,y)\le\rho(x,y)$ for all $x$, $y\in X$ and both $i$. In this case, of course, any set $U(x;\rho_0,\ldots,\rho_n;\delta)$, where $\rho_0,\ldots,\rho_n\in\mathrm P$, includes some set of the form $U(x;\rho;\delta)$, where $\rho\in\mathrm P$. Consequently, for instance, a set $G\subseteq X$ is open iff for every $x\in G$ there are $\rho\in\mathrm P$, $\delta>0$ such that $U(x;\rho;\delta)\subseteq G$.
(f) A topology $\mathfrak T$ is metrizable if it is the topology defined by a family $\mathrm P$ consisting of a single metric. Thus the Euclidean topology on $\mathbb R^r$ is the metrizable topology defined by $\{\rho\}$, where $\rho$ is the metric of (b-ii) above.
2A3G Proposition Let $X$ be a set with a topology defined by a non-empty set $\mathrm P$ of pseudometrics on $X$. Then $U(x;\rho_0,\ldots,\rho_n;\epsilon)$ is open for all $x\in X$, $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\epsilon>0$.
proof (Compare 1A2D.) Take $y\in U(x;\rho_0,\ldots,\rho_n;\epsilon)$. Set
$$\eta=\max_{i\le n}\rho_i(y,x),\quad\delta=\epsilon-\eta>0.$$
If $z\in U(y;\rho_0,\ldots,\rho_n;\delta)$ then
$$\rho_i(z,x)\le\rho_i(z,y)+\rho_i(y,x)<\delta+\eta=\epsilon$$
for each $i\le n$, so $U(y;\rho_0,\ldots,\rho_n;\delta)\subseteq U(x;\rho_0,\ldots,\rho_n;\epsilon)$. As $y$ is arbitrary, $U(x;\rho_0,\ldots,\rho_n;\epsilon)$ is open.
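2A3H Now we have a result corresponding to 2A2Ca, describing continuous functions between topological spaces defined by families of pseudometrics.
Proposition Let $X$ and $Y$ be sets; let $\mathrm P$ be a non-empty family of pseudometrics on $X$, and $\Theta$ a non-empty family of pseudometrics on $Y$; let $\mathfrak T$ and $\mathfrak S$ be the corresponding topologies. Then a function $\phi:X\to Y$ is continuous iff whenever $x\in X$, $\theta\in\Theta$ and $\epsilon>0$, there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that $\theta(\phi(y),\phi(x))\le\epsilon$ whenever $y\in X$ and $\max_{i\le n}\rho_i(y,x)\le\delta$.
proof (a) Suppose that $\phi$ is continuous; take $x\in X$, $\theta\in\Theta$ and $\epsilon>0$. By 2A3G, $U(\phi(x);\theta;\epsilon)\in\mathfrak S$. So $G=\phi^{-1}[U(\phi(x);\theta;\epsilon)]\in\mathfrak T$. Now $x\in G$, so there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that $U(x;\rho_0,\ldots,\rho_n;\delta)\subseteq G$. In this case $\theta(\phi(y),\phi(x))\le\epsilon$ whenever $y\in X$ and $\max_{i\le n}\rho_i(y,x)\le\frac12\delta$. As $x$, $\theta$ and $\epsilon$ are arbitrary, $\phi$ satisfies the condition.
(b) Suppose $\phi$ satisfies the condition. Take $H\in\mathfrak S$ and consider $G=\phi^{-1}[H]$. If $x\in G$, then $\phi(x)\in H$, so there are $\theta_0,\ldots,\theta_n\in\Theta$ and $\epsilon>0$ such that $U(\phi(x);\theta_0,\ldots,\theta_n;\epsilon)\subseteq H$. For each $i\le n$ there are $\rho_{i0},\ldots,\rho_{i,m_i}\in\mathrm P$ and $\delta_i>0$ such that $\theta_i(\phi(y),\phi(x))\le\frac12\epsilon$ whenever $y\in X$ and $\max_{j\le m_i}\rho_{ij}(y,x)\le\delta_i$. Set $\delta=\min_{i\le n}\delta_i>0$; then
$$U(x;\rho_{00},\ldots,\rho_{0,m_0},\ldots,\rho_{n0},\ldots,\rho_{n,m_n};\delta)\subseteq G.$$
As $x$ is arbitrary, $G\in\mathfrak T$. As $H$ is arbitrary, $\phi$ is continuous.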
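2A3I Remarks (a) If $\mathrm P$ is upwards-directed, the condition simplifies to: for every $x\in X$, $\theta\in\Theta$ and $\epsilon>0$, there are $\rho\in\mathrm P$ and $\delta>0$ such that $\theta(\phi(y),\phi(x))\le\epsilon$ whenever $y\in X$ and $\rho(y,x)\le\delta$.
(b) Suppose we have a set $X$ and two non-empty families $\mathrm P$, $\Theta$ of pseudometrics on $X$, generating topologies $\mathfrak T$ and $\mathfrak S$ on $X$. Then $\mathfrak S\subseteq\mathfrak T$ iff the identity map $\phi$ from $X$ to itself is a continuous function when regarded as a map from $(X,\mathfrak T)$ to $(X,\mathfrak S)$, because this will mean that $G=\phi^{-1}[G]$ belongs to $\mathfrak T$ whenever $G\in\mathfrak S$. Applying the proposition above to $\phi$, we see that this happens iff for every $\theta\in\Theta$, $x\in X$ and $\epsilon>0$ there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that $\theta(y,x)\le\epsilon$ whenever $y\in X$ and $\max_{i\le n}\rho_i(y,x)\le\delta$. Similarly, reversing the roles of $\mathrm P$ and $\Theta$, we get a criterion for when $\mathfrak T\subseteq\mathfrak S$, and putting the two together we obtain a criterion to determine when $\mathfrak T=\mathfrak S$.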
2A3J Subspaces: Proposition If $X$ is a set, $\mathrm P$ a non-empty family of pseudometrics on $X$ defining a topology $\mathfrak T$ on $X$, and $D\subseteq X$, then
(a) for every $\rho\in\mathrm P$, the restriction $\rho^{(D)}$ of $\rho$ to $D\times D$ is a pseudometric on $D$;
(b) the topology defined by $\mathrm P_D=\{\rho^{(D)}:\rho\in\mathrm P\}$ on $D$ is precisely the subspace topology $\mathfrak T_D$ described in 2A3C.
proof (a) is just a matter of reading through the definition in 2A3Fa. For (b), we have to think for a moment.
(i) Suppose that $G$ belongs to the topology defined by $\mathrm P_D$. Set
$$\mathcal H=\{H:H\in\mathfrak T,\ H\cap D\subseteq G\},$$
$$H^*=\bigcup\mathcal H\in\mathfrak T,\quad G^*=H^*\cap D\in\mathfrak T_D;$$
then $G^*\subseteq G$. On the other hand, if $x\in G$, then there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that
$$U(x;\rho_0^{(D)},\ldots,\rho_n^{(D)};\delta)=\{y:y\in D,\ \max_{i\le n}\rho_i^{(D)}(y,x)<\delta\}\subseteq G.$$
Consider
$$H=U(x;\rho_0,\ldots,\rho_n;\delta)=\{y:y\in X,\ \max_{i\le n}\rho_i(y,x)<\delta\}\subseteq X.$$
Evidently
$$H\cap D=U(x;\rho_0^{(D)},\ldots,\rho_n^{(D)};\delta)\subseteq G.$$
Also $H\in\mathfrak T$ (2A3G). So $H\in\mathcal H$ and
$$x\in H\cap D\subseteq H^*\cap D=G^*.$$
Thus $G=G^*\in\mathfrak T_D$.
(ii) Now suppose that $G\in\mathfrak T_D$. Then there is an $H\in\mathfrak T$ such that $G=H\cap D$. Consider the identity map $\phi:D\to X$, defined by saying that $\phi(x)=x$ for every $x\in D$. $\phi$ obviously satisfies the criterion of 2A3H, if we endow $D$ with $\mathrm P_D$ and $X$ with $\mathrm P$, because $\rho(\phi(x),\phi(y))=\rho^{(D)}(x,y)$ whenever $x$, $y\in D$ and $\rho\in\mathrm P$; so $\phi$ must be continuous for the associated topologies, and $\phi^{-1}[H]$ must belong to the topology defined by $\mathrm P_D$. But $\phi^{-1}[H]=G$. Thus every set in $\mathfrak T_D$ belongs to the topology defined by $\mathrm P_D$, and the two topologies are the same, as claimed.
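2A3K Closures and interiors Let $X$ be a set, $\mathrm P$ a non-empty family of pseudometrics on $X$ and $\mathfrak T$ the topology defined by $\mathrm P$.
(a) For any $A\subseteq X$ and $x\in X$,
$x\in\operatorname{int}A\iff$ there is an open set included in $A$ containing $x$
$\iff$ there are $\rho_0,\ldots,\rho_n\in\mathrm P$, $\delta>0$ such that $U(x;\rho_0,\ldots,\rho_n;\delta)\subseteq A$.
(b) For any $A\subseteq X$ and $x\in X$, $x\in\overline A$ iff $U(x;\rho_0,\ldots,\rho_n;\delta)\cap A\ne\emptyset$ for every $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$. (Compare 2A2B(ii), 2A3Dc.)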
2A3L Hausdorff topologies Recall that a topology $\mathfrak T$ is Hausdorff if any two points can be separated by open sets (2A3E). Now a topology defined on a set $X$ by a non-empty family $\mathrm P$ of pseudometrics is Hausdorff iff for any two different points $x$, $y$ of $X$ there is a $\rho\in\mathrm P$ such that $\rho(x,y)>0$. PPP (i) Suppose that the topology is Hausdorff and that $x$, $y$ are distinct points in $X$. Then there is an open set $G$ containing $x$ but not containing $y$. Now there are $\rho_0,\ldots,\rho_n\in\mathrm P$ and $\delta>0$ such that $U(x;\rho_0,\ldots,\rho_n;\delta)\subseteq G$, in which case $\rho_i(y,x)\ge\delta>0$ for some $i\le n$. (ii) If $\mathrm P$ satisfies the condition, and $x$, $y$ are distinct points of $X$, take $\rho\in\mathrm P$ such that $\rho(x,y)>0$, and set $\delta=\frac12\rho(x,y)$. Then $U(x;\rho;\delta)$ and $U(y;\rho;\delta)$ are disjoint (because if $z\in X$, then
$$\rho(z,x)+\rho(z,y)\ge\rho(x,y)=2\delta,$$
so at least one of $\rho(z,x)$, $\rho(z,y)$ is greater than or equal to $\delta$), and they are open sets containing $x$, $y$ respectively. As $x$ and $y$ are arbitrary, the topology is Hausdorff. QQQ
In particular, metrizable topologies are Hausdorff.
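2A3M Convergence of sequences (a) If $(X,\mathfrak T)$ is any topological space, and $\langle x_n\rangle_{n\in\mathbb N}$ is a sequence in $X$, we say that $\langle x_n\rangle_{n\in\mathbb N}$ converges to $x\in X$, or that $x$ is a limit of $\langle x_n\rangle_{n\in\mathbb N}$, or $\langle x_n\rangle_{n\in\mathbb N}\to x$, if for every open set $G$ containing $x$ there is an $n_0\in\mathbb N$ such that $x_n\in G$ for every $n\ge n_0$.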
(b) Warning In general topological spaces, it is possible for a sequence to have more than one limit, and we cannot safely write $x=\lim_{n\to\infty}x_n$. But in Hausdorff spaces, this does not occur. PPP If $\mathfrak T$ is Hausdorff, and $x$, $y$ are distinct points of $X$, there are disjoint open sets $G$, $H$ such that $x\in G$ and $y\in H$. If now $\langle x_n\rangle_{n\in\mathbb N}$ converges to $x$, there is an $n_0$ such that $x_n\in G$ for every $n\ge n_0$, so $x_n\notin H$ for every $n\ge n_0$, and $\langle x_n\rangle_{n\in\mathbb N}$ cannot converge to $y$. QQQ In particular, a sequence in a metrizable space can have at most one limit.
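(c) Let $X$ be a set, and $\mathrm P$ a non-empty family of pseudometrics on $X$, generating a topology $\mathfrak T$; let $\langle x_n\rangle_{n\in\mathbb N}$ be a sequence in $X$ and $x\in X$. Then $\langle x_n\rangle_{n\in\mathbb N}$ converges to $x$ iff $\lim_{n\to\infty}\rho(x_n,x)=0$ for every $\rho\in\mathrm P$. PPP (i) Suppose that $\langle x_n\rangle_{n\in\mathbb N}\to x$ and that $\rho\in\mathrm P$. Then for any $\epsilon>0$ the set $G=U(x;\rho;\epsilon)$ is an open set containing $x$, so there is an $n_0$ such that $x_n\in G$ for every $n\ge n_0$, that is, $\rho(x_n,x)<\epsilon$ for every $n\ge n_0$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\rho(x_n,x)=0$. (ii) If the condition is satisfied, take any open set $G$ containing $x$. Then there are $\rho_0,\ldots,\rho_k\in\mathrm P$ and $\delta>0$ such that $U(x;\rho_0,\ldots,\rho_k;\delta)\subseteq G$. For each $i\le k$ there is an $n_i\in\mathbb N$ such that $\rho_i(x_n,x)<\delta$ for every $n\ge n_i$. Set $n^*=\max(n_0,\ldots,n_k)$; then $x_n\in U(x;\rho_0,\ldots,\rho_k;\delta)\subseteq G$ for every $n\ge n^*$. As $G$ is arbitrary, $\langle x_n\rangle_{n\in\mathbb N}\to x$. QQQ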
(d) Let $(X,\rho)$ be a metric space, $A$ a subset of $X$ and $x\in X$. Then $x\in\overline A$ iff there is a sequence in $A$ converging to $x$. PPP (i) If $x\in\overline A$, then for every $n\in\mathbb N$ we can choose a point $x_n\in A\cap U(x;\rho;2^{-n})$ (2A3Kb); now $\langle x_n\rangle_{n\in\mathbb N}\to x$. (ii) If $\langle x_n\rangle_{n\in\mathbb N}$ is a sequence in $A$ converging to $x$, then for every open set $G$ containing $x$ there is an $n$ such that $x_n\in G$, so that $A\cap G\ne\emptyset$; by 2A3Dc, $x\in\overline A$. QQQ
2A3N Compactness The next concept we need is the idea of 'compactness' in general topological spaces.
(a) If $(X,\mathfrak T)$ is any topological space, a subset $K$ of $X$ is compact if whenever $\mathcal G$ is a family in $\mathfrak T$ covering $K$, then there is a finite $\mathcal G_0\subseteq\mathcal G$ covering $K$. (Cf. 2A2D. A warning: many authors reserve the term 'compact' for Hausdorff spaces.) A set $A\subseteq X$ is relatively compact in $X$ if there is a compact subset of $X$ including $A$. (Warning! in non-Hausdorff spaces, this is not the same thing as saying that $\overline A$ is compact.)
(b) Just as in 2A2E-2A2G (and the proofs are the same in the general case), we have the following results.
(i) If $K$ is compact and $E$ is closed, then $K\cap E$ is compact.
(ii) If $K\subseteq X$ is compact and $\phi:K\to Y$ is continuous, where $(Y,\mathfrak S)$ is another topological space, then $\phi[K]$ is a compact subset of $Y$.
(iii) If $K\subseteq X$ is compact and $\phi:K\to\mathbb R$ is continuous, then $\phi$ is bounded and attains its bounds.
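2A3O Cluster points (a) If $(X,\mathfrak T)$ is a topological space, and $\langle x_n\rangle_{n\in\mathbb N}$ is a sequence in $X$, then a cluster point of $\langle x_n\rangle_{n\in\mathbb N}$ is an $x\in X$ such that whenever $G$ is an open set containing $x$ and $n\in\mathbb N$ then there is a $k\ge n$ such that $x_k\in G$.
(b) Now if $(X,\mathfrak T)$ is a topological space and $A\subseteq X$ is relatively compact, every sequence $\langle x_n\rangle_{n\in\mathbb N}$ in $A$ has a cluster point in $X$. PPP Let $K$ be a compact subset of $X$ including $A$. Set
$$\mathcal G=\{G:G\in\mathfrak T,\ \{n:x_n\in G\}\text{ is finite}\}.$$
??? If $\mathcal G$ covers $K$, then there is a finite $\mathcal G_0\subseteq\mathcal G$ covering $K$. Now
$$\mathbb N=\{n:x_n\in A\}=\{n:x_n\in\bigcup\mathcal G_0\}=\bigcup_{G\in\mathcal G_0}\{n:x_n\in G\}$$
is a finite union of finite sets, which is absurd. XXX Thus $\mathcal G$ does not cover $K$. Take any $x\in K\setminus\bigcup\mathcal G$. If $G\in\mathfrak T$ and $x\in G$ and $n\in\mathbb N$, then $G\notin\mathcal G$ so $\{k:x_k\in G\}$ is infinite and there is a $k\ge n$ such that $x_k\in G$. Thus $x$ is a cluster point of $\langle x_n\rangle_{n\in\mathbb N}$, as required. QQQ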
2A3P Filters In $\mathbb R^r$, and more generally in all metrizable spaces, topological ideas can be effectively discussed in terms of convergent sequences. (To be sure, this occasionally necessitates the use of a weak form of the axiom of choice, in order to choose a sequence; but as measure theory without such choices is changed utterly (see Chapter 56 in Volume 5), there is no point in fussing about them here.) For topological spaces in general, however, sequences are quite inadequate, for very interesting reasons which I shall not enlarge upon. Instead we need to use 'nets' or 'filters'. The latter take a moment's more effort at the beginning, but are then (in my view) much easier to work with, so I describe this method now.
2A3Q Convergent filters (a) Let $(X,\mathfrak T)$ be a topological space, $\mathcal F$ a filter on $X$ (see 2A1I) and $x$ a point of $X$. We say that $\mathcal F$ is convergent to $x$, or that $x$ is a limit of $\mathcal F$, and write $\mathcal F\to x$, if every open set containing $x$ belongs to $\mathcal F$.
(b) Let $(X,\mathfrak T)$ and $(Y,\mathfrak S)$ be topological spaces, $\phi:X\to Y$ a continuous function, $x\in X$ and $\mathcal F$ a filter on $X$ converging to $x$. Then $\phi[[\mathcal F]]$ (as defined in 2A1Ib) converges to $\phi(x)$ (because $\phi^{-1}[G]$ is an open set containing $x$ whenever $G$ is an open set containing $\phi(x)$).
2A3R Now we have the following characterization of compactness.
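Theorem Let $X$ be a topological space, and $K$ a subset of $X$. Then $K$ is compact iff every ultrafilter on $X$ containing $K$ has a limit in $K$.
proof (a) Suppose that $K$ is compact and that $\mathcal F$ is an ultrafilter on $X$ containing $K$. Set
$$\mathcal G=\{G:G\subseteq X\text{ is open},\ X\setminus G\in\mathcal F\}.$$
Then the union of any two members of $\mathcal G$ belongs to $\mathcal G$, so the union of any finite number of members of $\mathcal G$ belongs to $\mathcal G$; also no member of $\mathcal G$ can include $K$, because $X\setminus K\notin\mathcal F$. Because $K$ is compact, it follows that $\mathcal G$ cannot cover $K$. Let $x$ be any point of $K\setminus\bigcup\mathcal G$. If $G$ is any open set containing $x$, then $G\notin\mathcal G$ so $X\setminus G\notin\mathcal F$; but this means that $G$ must belong to $\mathcal F$, because $\mathcal F$ is an ultrafilter. As $G$ is arbitrary, $\mathcal F\to x$. Thus every ultrafilter on $X$ containing $K$ has a limit in $K$.
(b) Now suppose that every ultrafilter on $X$ containing $K$ has a limit in $K$. Let $\mathcal G$ be a cover of $K$ by open sets in $X$. ??? Suppose, if possible, that $\mathcal G$ has no finite subcover. Set
$$\mathcal F=\{F:\text{there is a finite }\mathcal G_0\subseteq\mathcal G,\ F\cup\bigcup\mathcal G_0\supseteq K\}.$$
Then $\mathcal F$ is a filter on $X$. PPP (i) $X\cup\bigcup\emptyset\supseteq K$ so $X\in\mathcal F$.
$$\emptyset\cup\bigcup\mathcal G_0=\bigcup\mathcal G_0\not\supseteq K$$
for any finite $\mathcal G_0\subseteq\mathcal G$, by hypothesis, so $\emptyset\notin\mathcal F$. (ii) If $E$, $F\in\mathcal F$ there are finite sets $\mathcal G_1$, $\mathcal G_2\subseteq\mathcal G$ such that $E\cup\bigcup\mathcal G_1$ and $F\cup\bigcup\mathcal G_2$ both include $K$; now $(E\cap F)\cup\bigcup(\mathcal G_1\cup\mathcal G_2)\supseteq K$ so $E\cap F\in\mathcal F$. (iii) If $X\supseteq E\supseteq F\in\mathcal F$ then there is a finite $\mathcal G_0\subseteq\mathcal G$ such that $F\cup\bigcup\mathcal G_0\supseteq K$; now $E\cup\bigcup\mathcal G_0\supseteq K$ and $E\in\mathcal F$. QQQ
By the Ultrafilter Theorem (2A1O), there is an ultrafilter $\mathcal F^*$ on $X$ including $\mathcal F$. Of course $K$ itself belongs to $\mathcal F$, so $K\in\mathcal F^*$. By hypothesis, $\mathcal F^*$ has a limit $x\in K$. But now there is a set $G\in\mathcal G$ containing $x$, and $(X\setminus G)\cup G\supseteq K$, so $X\setminus G\in\mathcal F\subseteq\mathcal F^*$; which means that $G$ cannot belong to $\mathcal F^*$, and $x$ cannot be a limit of $\mathcal F^*$. XXX
So $\mathcal G$ has a finite subcover. As $\mathcal G$ is arbitrary, $K$ must be compact.
Remark Note that this theorem depends vitally on the Ultrafilter Theorem and therefore on the axiom of choice.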
2A3S Further calculations with filters (a) In general, it is possible for a filter to have more than one limit; but in Hausdorff spaces this does not occur. PPP (Compare 2A3Mb.) If $(X,\mathfrak T)$ is Hausdorff, and $x$, $y$ are distinct points of $X$, there are disjoint open sets $G$, $H$ such that $x\in G$ and $y\in H$. If now a filter $\mathcal F$ on $X$ converges to $x$, $G\in\mathcal F$ so $H\notin\mathcal F$ and $\mathcal F$ does not converge to $y$. QQQ
Accordingly we can safely write $x=\lim\mathcal F$ when $\mathcal F\to x$ in a Hausdorff space.
(b) Now suppose that $X$ is a set, $\mathcal F$ is a filter on $X$, $(Y,\mathfrak S)$ is a Hausdorff space, $D\in\mathcal F$ and $\phi:D\to Y$ is a function. Then we write $\lim_{x\to\mathcal F}\phi(x)$ for $\lim\phi[[\mathcal F]]$ if this is defined in $Y$; that is, $\lim_{x\to\mathcal F}\phi(x)=y$ iff $\phi^{-1}[H]\in\mathcal F$ for every open set $H$ containing $y$.
In the special case $Y=\mathbb R$, $\lim_{x\to\mathcal F}\phi(x)=a$ iff $\{x:|\phi(x)-a|\le\epsilon\}\in\mathcal F$ for every $\epsilon>0$ (because every open set containing $a$ includes a set of the form $[a-\epsilon,a+\epsilon]$, which in turn includes the open set $]a-\epsilon,a+\epsilon[$).
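(c) Suppose that $X$ and $Y$ are sets, $\mathcal F$ is a filter on $X$, $\Theta$ is a non-empty family of pseudometrics on $Y$ defining a topology $\mathfrak S$ on $Y$, and $\phi:X\to Y$ is a function. Then the image filter $\phi[[\mathcal F]]$ converges to $y\in Y$ iff $\lim_{x\to\mathcal F}\theta(\phi(x),y)=0$ in $\mathbb R$ for every $\theta\in\Theta$. PPP (i) Suppose that $\phi[[\mathcal F]]\to y$. For every $\theta\in\Theta$ and $\epsilon>0$, $U(y;\theta;\epsilon)=\{z:\theta(z,y)<\epsilon\}$ is an open set containing $y$ (2A3G), so belongs to $\phi[[\mathcal F]]$, and its inverse image $\{x:0\le\theta(\phi(x),y)<\epsilon\}$ belongs to $\mathcal F$. As $\epsilon$ is arbitrary, $\lim_{x\to\mathcal F}\theta(\phi(x),y)=0$. As $\theta$ is arbitrary, $\phi$ satisfies the condition. (ii) Now suppose that $\lim_{x\to\mathcal F}\theta(\phi(x),y)=0$ for every $\theta\in\Theta$. Let $G$ be any open set in $Y$ containing $y$. Then there are $\theta_0,\ldots,\theta_n\in\Theta$ and $\epsilon>0$ such that
$$U(y;\theta_0,\ldots,\theta_n;\epsilon)=\bigcap_{i\le n}U(y;\theta_i;\epsilon)\subseteq G.$$
For each $i\le n$,
$$\phi^{-1}[U(y;\theta_i;\epsilon)]=\{x:\theta_i(\phi(x),y)<\epsilon\}$$
belongs to $\mathcal F$; because $\mathcal F$ is closed under finite intersections, so do $\phi^{-1}[U(y;\theta_0,\ldots,\theta_n;\epsilon)]$ and its superset $\phi^{-1}[G]$. Thus $G\in\phi[[\mathcal F]]$. As $G$ is arbitrary, $\phi[[\mathcal F]]\to y$. QQQ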
(d) In particular, taking $X=Y$ and $\phi$ the identity map, if $X$ has a topology $\mathfrak T$ defined by a non-empty family $\mathrm P$ of pseudometrics, then a filter $\mathcal F$ on $X$ converges to $x\in X$ iff $\lim_{y\to\mathcal F}\rho(y,x)=0$ for every $\rho\in\mathrm P$.
(e)(i) If $X$ is any set, $\mathcal F$ is an ultrafilter on $X$, $(Y,\mathfrak S)$ is a Hausdorff space, and $h:X\to Y$ is a function such that $h[F]$ is relatively compact in $Y$ for some $F\in\mathcal F$, then $\lim_{x\to\mathcal F}h(x)$ is defined in $Y$. PPP Let $K\subseteq Y$ be a compact set including $h[F]$. Then $K\in h[[\mathcal F]]$, which is an ultrafilter (2A1N), so $h[[\mathcal F]]$ has a limit in $Y$ (2A3R), which is $\lim_{x\to\mathcal F}h(x)$. QQQ
(ii) If $X$ is any set, $\mathcal F$ is an ultrafilter on $X$, and $h:X\to\mathbb R$ is a function such that $h[F]$ is bounded in $\mathbb R$ for some set $F\in\mathcal F$, then $\lim_{x\to\mathcal F}h(x)$ exists in $\mathbb R$. PPP $\overline{h[F]}$ is closed and bounded, therefore compact (2A2F), so $h[F]$ is relatively compact and we can use (i). QQQ
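(f) The concepts of $\limsup$, $\liminf$ can be applied to filters. Suppose that $\mathcal{F}$ is a filter on a set $X$, and that $f : X \to [-\infty, \infty]$ is any function. Then
$$\limsup_{x\to\mathcal{F}} f(x) = \inf\{u : u \in [-\infty, \infty],\ \{x : f(x) \le u\} \in \mathcal{F}\} = \inf_{F \in \mathcal{F}} \sup_{x \in F} f(x) \in [-\infty, \infty],$$
$$\liminf_{x\to\mathcal{F}} f(x) = \sup\{u : u \in [-\infty, \infty],\ \{x : f(x) \ge u\} \in \mathcal{F}\} = \sup_{F \in \mathcal{F}} \inf_{x \in F} f(x).$$
It is easy to see that, for any two functions $f$, $g : X \to \mathbb{R}$,
$$\lim_{x\to\mathcal{F}} f(x) = a \text{ iff } a = \limsup_{x\to\mathcal{F}} f(x) = \liminf_{x\to\mathcal{F}} f(x),$$
and
$$\limsup_{x\to\mathcal{F}} f(x) + g(x) \le \limsup_{x\to\mathcal{F}} f(x) + \limsup_{x\to\mathcal{F}} g(x),$$
$$\liminf_{x\to\mathcal{F}} f(x) + g(x) \ge \liminf_{x\to\mathcal{F}} f(x) + \liminf_{x\to\mathcal{F}} g(x),$$
$$\liminf_{x\to\mathcal{F}} (-f(x)) = -\limsup_{x\to\mathcal{F}} f(x), \quad \limsup_{x\to\mathcal{F}} (-f(x)) = -\liminf_{x\to\mathcal{F}} f(x),$$
$$\liminf_{x\to\mathcal{F}} cf(x) = c \liminf_{x\to\mathcal{F}} f(x), \quad \limsup_{x\to\mathcal{F}} cf(x) = c \limsup_{x\to\mathcal{F}} f(x)$$
whenever the right-hand sides are defined in $[-\infty, \infty]$ and $c \ge 0$. So if $a = \lim_{x\to\mathcal{F}} f(x)$ and $b = \lim_{x\to\mathcal{F}} g(x)$ exist in $\mathbb{R}$, $\lim_{x\to\mathcal{F}} f(x) + g(x)$ exists and is equal to $a + b$ and $\lim_{x\to\mathcal{F}} cf(x)$ exists and is equal to $c \lim_{x\to\mathcal{F}} f(x)$ for every $c \in \mathbb{R}$.
We also see that if $f : X \to \mathbb{R}$ is such that
for every $\epsilon > 0$ there is an $F \in \mathcal{F}$ such that $\sup_{x \in F} f(x) \le \epsilon + \inf_{x \in F} f(x)$,
then $\limsup_{x\to\mathcal{F}} f(x) \le \epsilon + \liminf_{x\to\mathcal{F}} f(x)$ for every $\epsilon > 0$, so that $\lim_{x\to\mathcal{F}} f(x)$ is defined in $[-\infty, \infty]$.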
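(g) Note that the standard limits of real analysis can be represented in the form described here. For instance, $\lim_{n\to\infty}$, $\limsup_{n\to\infty}$, $\liminf_{n\to\infty}$ correspond to $\lim_{n\to\mathcal{F}_0}$, $\limsup_{n\to\mathcal{F}_0}$, $\liminf_{n\to\mathcal{F}_0}$ where $\mathcal{F}_0$ is the Fréchet filter on $\mathbb{N}$, the filter $\{\mathbb{N} \setminus A : A \subseteq \mathbb{N} \text{ is finite}\}$ of cofinite subsets of $\mathbb{N}$. Similarly, $\lim_{\delta\downarrow a}$, $\limsup_{\delta\downarrow a}$, $\liminf_{\delta\downarrow a}$ correspond to $\lim_{\delta\to\mathcal{F}}$, $\limsup_{\delta\to\mathcal{F}}$, $\liminf_{\delta\to\mathcal{F}}$ where
$$\mathcal{F} = \{A : A \subseteq \mathbb{R},\ \exists\, h > 0 \text{ such that } ]a, a + h] \subseteq A\}.$$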
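As an illustration of (f) and (g) (an added example, not part of the original text), take $\mathcal{F}_0$ to be the Fréchet filter on $\mathbb{N}$ and the hypothetical function $f(n) = (-1)^n + \frac{1}{n+1}$. The sketch below evaluates the two formulas of (f) on the cofinite sets $F_m = \{n : n \ge m\}$ only; this is enough because every member of $\mathcal{F}_0$ includes some $F_m$, and shrinking $F$ can only decrease $\sup_{n\in F} f(n)$ and increase $\inf_{n\in F} f(n)$.

```latex
% Assumed example: f(n) = (-1)^n + 1/(n+1), F_m = {n : n >= m}.
\[
  \sup_{n \in F_m} f(n) = 1 + \frac{1}{m'+1}
  \quad\text{where } m' \text{ is the least even integer } \ge m,
  \qquad
  \inf_{n \in F_m} f(n) = -1 \quad\text{(not attained)}.
\]
\[
  \limsup_{n \to \mathcal{F}_0} f(n) = \inf_{m} \Bigl(1 + \frac{1}{m'+1}\Bigr) = 1,
  \qquad
  \liminf_{n \to \mathcal{F}_0} f(n) = \sup_{m}\,(-1) = -1 .
\]
```

Since the two values differ, the criterion in (f) shows that $\lim_{n\to\mathcal{F}_0} f(n)$ is not defined, in agreement with the usual statement that this sequence does not converge.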
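2A3T Product topologies We need some brief remarks concerning topologies on product spaces.
(a) Let $(X, \mathfrak{T})$ and $(Y, \mathfrak{S})$ be topological spaces. Let $\mathfrak{U}$ be the set of subsets $U$ of $X \times Y$ such that for every $(x, y) \in U$ there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U$. Then $\mathfrak{U}$ is a topology on $X \times Y$. PPP (i) $\emptyset \in \mathfrak{U}$ because the condition for membership of $\mathfrak{U}$ is vacuously satisfied. $X \times Y \in \mathfrak{U}$ because $X \in \mathfrak{T}$, $Y \in \mathfrak{S}$ and $(x, y) \in X \times Y \subseteq X \times Y$ for every $(x, y) \in X \times Y$. (ii) If $U$, $V \in \mathfrak{U}$ and $(x, y) \in U \cap V$, then there are $G$, $G' \in \mathfrak{T}$, $H$, $H' \in \mathfrak{S}$ such that
$$(x, y) \in G \times H \subseteq U, \quad (x, y) \in G' \times H' \subseteq V;$$
now $G \cap G' \in \mathfrak{T}$, $H \cap H' \in \mathfrak{S}$ and
$$(x, y) \in (G \cap G') \times (H \cap H') \subseteq U \cap V.$$
As $(x, y)$ is arbitrary, $U \cap V \in \mathfrak{U}$. (iii) If $\mathcal{U} \subseteq \mathfrak{U}$ and $(x, y) \in \bigcup \mathcal{U}$, then there is a $U \in \mathcal{U}$ such that $(x, y) \in U$; now there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U \subseteq \bigcup \mathcal{U}$. As $(x, y)$ is arbitrary, $\bigcup \mathcal{U} \in \mathfrak{U}$. QQQ
$\mathfrak{U}$ is called the product topology on $X \times Y$.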
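(b) Suppose, in (a), that $\mathfrak{T}$ and $\mathfrak{S}$ are defined by non-empty families $P$, $\Theta$ of pseudometrics in the manner of 2A3F. Then $\mathfrak{U}$ is defined by the family $\Upsilon = \{\tilde\rho : \rho \in P\} \cup \{\bar\theta : \theta \in \Theta\}$ of pseudometrics on $X \times Y$, where
$$\tilde\rho((x, y), (x', y')) = \rho(x, x'), \quad \bar\theta((x, y), (x', y')) = \theta(y, y')$$
whenever $x$, $x' \in X$, $y$, $y' \in Y$, $\rho \in P$ and $\theta \in \Theta$.
PPP (i) Of course you should check that every $\tilde\rho$, $\bar\theta$ is a pseudometric on $X \times Y$.
(ii) If $U \in \mathfrak{U}$ and $(x, y) \in U$, then there are $G \in \mathfrak{T}$, $H \in \mathfrak{S}$ such that $(x, y) \in G \times H \subseteq U$. There are $\rho_0, \ldots, \rho_m \in P$, $\theta_0, \ldots, \theta_n \in \Theta$, $\delta$, $\delta' > 0$ such that (in the language of 2A3Fc) $U(x; \rho_0, \ldots, \rho_m; \delta) \subseteq G$, $U(y; \theta_0, \ldots, \theta_n; \delta') \subseteq H$. Now
$$U((x, y); \tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n; \min(\delta, \delta')) \subseteq U.$$
As $(x, y)$ is arbitrary, $U$ is open for the topology generated by $\Upsilon$.
(iii) If $U \subseteq X \times Y$ is open for the topology defined by $\Upsilon$, take any $(x, y) \in U$. Then there are $\upsilon_0, \ldots, \upsilon_k \in \Upsilon$ and $\delta > 0$ such that $U((x, y); \upsilon_0, \ldots, \upsilon_k; \delta) \subseteq U$. Take $\rho_0, \ldots, \rho_m \in P$ and $\theta_0, \ldots, \theta_n \in \Theta$ such that $\{\upsilon_0, \ldots, \upsilon_k\} \subseteq \{\tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n\}$; then $G = U(x; \rho_0, \ldots, \rho_m; \delta) \in \mathfrak{T}$ (2A3G), $H = U(y; \theta_0, \ldots, \theta_n; \delta) \in \mathfrak{S}$, and
$$(x, y) \in G \times H = U((x, y); \tilde\rho_0, \ldots, \tilde\rho_m, \bar\theta_0, \ldots, \bar\theta_n; \delta) \subseteq U((x, y); \upsilon_0, \ldots, \upsilon_k; \delta) \subseteq U.$$
As $(x, y)$ is arbitrary, $U \in \mathfrak{U}$. This completes the proof that $\mathfrak{U}$ is the topology defined by $\Upsilon$. QQQ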
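(c) In particular, the product topology on $\mathbb{R}^r \times \mathbb{R}^s$ is the Euclidean topology if we identify $\mathbb{R}^r \times \mathbb{R}^s$ with $\mathbb{R}^{r+s}$. PPP The product topology is defined by the two pseudometrics $\upsilon_1$, $\upsilon_2$, where for $x$, $x' \in \mathbb{R}^r$ and $y$, $y' \in \mathbb{R}^s$ I write
$$\upsilon_1((x, y), (x', y')) = \|x - x'\|, \quad \upsilon_2((x, y), (x', y')) = \|y - y'\|$$
(2A3F(b-ii)). Similarly, the Euclidean topology on $\mathbb{R}^r \times \mathbb{R}^s \cong \mathbb{R}^{r+s}$ is defined by the metric $\rho$, where
$$\rho((x, y), (x', y')) = \|(x, y) - (x', y')\| = \sqrt{\|x - x'\|^2 + \|y - y'\|^2}.$$
Now if $(x, y) \in \mathbb{R}^r \times \mathbb{R}^s$ and $\epsilon > 0$, then
$$U((x, y); \rho; \epsilon) \subseteq U((x, y); \upsilon_j; \epsilon)$$
for both $j$, while
$$U((x, y); \upsilon_1, \upsilon_2; \tfrac{\epsilon}{\sqrt 2}) \subseteq U((x, y); \rho; \epsilon).$$
Thus, as remarked in 2A3Ib, each topology is included in the other, and they are the same. QQQ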
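The two inclusions used in (c) can be checked in one line each. The following is an added verification, not from the original text; it writes $z = (x, y)$, $z' = (x', y')$ and uses $\upsilon_1$, $\upsilon_2$, $\rho$ for the pseudometrics and the Euclidean metric on $\mathbb{R}^r \times \mathbb{R}^s$ described in (c), so that $\rho^2 = \upsilon_1^2 + \upsilon_2^2$.

```latex
% First inclusion: each coordinate distance is dominated by the Euclidean distance.
\[
  \upsilon_j(z', z) \le \sqrt{\upsilon_1(z', z)^2 + \upsilon_2(z', z)^2} = \rho(z', z) < \epsilon
  \quad (j = 1, 2).
\]
% Second inclusion: two coordinate distances below eps/sqrt(2) force rho below eps.
\[
  \upsilon_1(z', z) < \tfrac{\epsilon}{\sqrt 2},\ \upsilon_2(z', z) < \tfrac{\epsilon}{\sqrt 2}
  \;\Longrightarrow\;
  \rho(z', z) < \sqrt{\tfrac{\epsilon^2}{2} + \tfrac{\epsilon^2}{2}} = \epsilon .
\]
```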
2A3U Dense sets (a) If $X$ is a topological space, a set $D \subseteq X$ is dense in $X$ if $\overline{D} = X$, that is, if every non-empty open set meets $D$. More generally, if $D \subseteq A \subseteq X$, then $D$ is dense in $A$ if it is dense for the subspace topology of $A$ (2A3C), that is, if $A \subseteq \overline{D}$.
(b) If $\mathfrak{T}$ is defined by a non-empty family $P$ of pseudometrics on $X$, then $D \subseteq X$ is dense iff $U(x; \rho_0, \ldots, \rho_n; \delta) \cap D \ne \emptyset$ whenever $x \in X$, $\rho_0, \ldots, \rho_n \in P$ and $\delta > 0$.
(c) If $(X, \mathfrak{T})$, $(Y, \mathfrak{S})$ are topological spaces, of which $Y$ is Hausdorff (in particular, if $(X, \rho)$ and $(Y, \theta)$ are metric spaces), and $f$, $g : X \to Y$ are continuous functions which agree on some dense subset $D$ of $X$, then $f = g$. PPP??? Suppose, if possible, that there is an $x \in X$ such that $f(x) \ne g(x)$. Then there are open sets $G$, $H \subseteq Y$ such that $f(x) \in G$, $g(x) \in H$ and $G \cap H = \emptyset$. Now $f^{-1}[G] \cap g^{-1}[H]$ is an open set, containing $x$ and therefore not empty, but it cannot meet $D$, so $x \notin \overline{D}$ and $D$ is not dense. XXXQQQ
(d) A topological space is called separable if it has a countable dense subset. For instance, $\mathbb{R}^r$ is separable for every $r \ge 1$, since $\mathbb{Q}^r$ is dense.
2A4 Normed spaces
In Chapter 24 I discuss the spaces $L^p$, for $1 \le p \le \infty$, and describe their most basic properties. These spaces form a group of fundamental examples for the general theory of normed spaces, the basis of functional analysis. This is not the book from which you should learn that theory, but once again it may save you trouble if I briefly outline those parts of the general theory which are essential if you are to make sense of the ideas here.
2A4A The real and complex fields While the most important parts of the theory, from the point of view of measure theory, are most effectively dealt with in terms of real linear spaces, there are many applications in which complex linear spaces are essential. I will therefore use the phrase
'$U$ is a linear space over ${\mathbb{R} \atop \mathbb{C}}$'
to mean that $U$ is either a linear space over the field $\mathbb{R}$ or a linear space over the field $\mathbb{C}$; it being understood that in any particular context all linear spaces considered will be over the same field. In the same way, I will write '$\alpha \in {\mathbb{R} \atop \mathbb{C}}$' to mean that $\alpha$ belongs to whichever is the current underlying field.
2A4B Definitions (a) A normed space is a linear space $U$ over ${\mathbb{R} \atop \mathbb{C}}$ together with a norm, that is, a functional $\|\ \| : U \to [0, \infty[$ such that
$\|u + v\| \le \|u\| + \|v\|$ for all $u$, $v \in U$,
$\|\alpha u\| = |\alpha| \|u\|$ for $u \in U$, $\alpha \in {\mathbb{R} \atop \mathbb{C}}$,
$\|u\| = 0$ only when $u = 0$, the zero vector of $U$.
(Observe that if $u = 0$ (the zero vector) then $0u = u$ (where this $0$ is the zero scalar) so that $\|u\| = |0| \|u\| = 0$.)
(b) If $U$ is a normed space, then we have a metric $\rho$ on $U$ defined by saying that $\rho(u, v) = \|u - v\|$ for $u$, $v \in U$. PPP $\rho(u, v) \in [0, \infty[$ for all $u$, $v$ because $\|u\| \in [0, \infty[$ for every $u$. $\rho(u, v) = \rho(v, u)$ for all $u$, $v$ because $\|v - u\| = |-1| \|u - v\| = \|u - v\|$ for all $u$, $v$. If $u$, $v$, $w \in U$ then
$$\rho(u, w) = \|u - w\| = \|(u - v) + (v - w)\| \le \|u - v\| + \|v - w\| = \rho(u, v) + \rho(v, w).$$
If $\rho(u, v) = 0$ then $\|u - v\| = 0$ so $u - v = 0$ and $u = v$. QQQ
We therefore have a corresponding topology, with open and closed sets, closures, convergent sequences and so on.
(c) If $U$ is a normed space, a set $A \subseteq U$ is bounded (for the norm) if $\{\|u\| : u \in A\}$ is bounded in $\mathbb{R}$; that is, there is some $M \ge 0$ such that $\|u\| \le M$ for every $u \in A$.
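2A4C Linear subspaces (a) If $U$ is any normed space and $V$ is a linear subspace of $U$, then $V$ is also a normed space, if we take the norm of $V$ to be just the restriction to $V$ of the norm of $U$; the verification is trivial.
(b) If $V$ is a linear subspace of $U$, so is its closure $\overline{V}$. PPP Take $u$, $u' \in \overline{V}$ and $\alpha \in {\mathbb{R} \atop \mathbb{C}}$. If $\epsilon > 0$, set $\delta = \epsilon/(2 + |\alpha|) > 0$; then there are $v$, $v' \in V$ such that $\|u - v\| \le \delta$, $\|u' - v'\| \le \delta$. Now $v + v'$, $\alpha v \in V$ and
$$\|(u + u') - (v + v')\| \le \|u - v\| + \|u' - v'\| \le \epsilon, \quad \|\alpha u - \alpha v\| \le |\alpha| \|u - v\| \le \epsilon.$$
As $\epsilon$ is arbitrary, $u + u'$ and $\alpha u$ belong to $\overline{V}$; as $u$, $u'$ and $\alpha$ are arbitrary, and $0$ surely belongs to $V \subseteq \overline{V}$, $\overline{V}$ is a linear subspace of $U$. QQQ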
2A4D Banach spaces (a) If $U$ is a normed space, a sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in $U$ is Cauchy if $\|u_m - u_n\| \to 0$ as $m$, $n \to \infty$, that is, for every $\epsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $\|u_m - u_n\| \le \epsilon$ for all $m$, $n \ge n_0$.
(b) A normed space $U$ is complete if every Cauchy sequence has a limit; a complete normed space is called a Banach space.
2A4E It is helpful to know the following result.
Lemma Let $U$ be a normed space such that $\langle u_n \rangle_{n \in \mathbb{N}}$ is convergent (that is, has a limit) in $U$ whenever $\langle u_n \rangle_{n \in \mathbb{N}}$ is a sequence in $U$ such that $\|u_{n+1} - u_n\| \le 4^{-n}$ for every $n \in \mathbb{N}$. Then $U$ is complete.
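proof Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be any Cauchy sequence in $U$. For each $k \in \mathbb{N}$, let $n_k \in \mathbb{N}$ be such that $\|u_m - u_n\| \le 4^{-k}$ whenever $m$, $n \ge n_k$. Set $v_k = u_{n_k}$ for each $k$. Then $\|v_{k+1} - v_k\| \le 4^{-k}$ (whether $n_k \le n_{k+1}$ or $n_{k+1} \le n_k$). So $\langle v_k \rangle_{k \in \mathbb{N}}$ has a limit $v \in U$. I seek to show that $v$ is the required limit of $\langle u_n \rangle_{n \in \mathbb{N}}$. Given $\epsilon > 0$, let $l \in \mathbb{N}$ be such that $\|v_k - v\| \le \epsilon$ for every $k \ge l$; let $k \ge l$ be such that $4^{-k} \le \epsilon$; then if $n \ge n_k$,
$$\|u_n - v\| = \|(u_n - v_k) + (v_k - v)\| \le \|u_n - v_k\| + \|v_k - v\| \le \|u_n - u_{n_k}\| + \epsilon \le 2\epsilon.$$
As $\epsilon$ is arbitrary, $v$ is a limit of $\langle u_n \rangle_{n \in \mathbb{N}}$. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $U$ is complete.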
2A4F Bounded linear operators (a) Let $U$, $V$ be two normed spaces. A linear operator $T : U \to V$ is bounded if $\{\|Tu\| : u \in U, \|u\| \le 1\}$ is bounded. (Warning! in this context, we do not ask for the whole set of values $T[U]$ to be bounded; a 'bounded linear operator' need not be what we ordinarily call a 'bounded function'.) Write $B(U; V)$ for the space of all bounded linear operators from $U$ to $V$, and for $T \in B(U; V)$ write $\|T\| = \sup\{\|Tu\| : u \in U, \|u\| \le 1\}$.
(b) A useful fact: $\|Tu\| \le \|T\| \|u\|$ for every $T \in B(U; V)$, $u \in U$. PPP If $|\alpha| > \|u\|$ then
$$\Bigl\|\frac{1}{\alpha} u\Bigr\| = \frac{1}{|\alpha|} \|u\| \le 1,$$
so
$$\|Tu\| = \Bigl\|\alpha T\Bigl(\frac{1}{\alpha} u\Bigr)\Bigr\| = |\alpha| \Bigl\|T\Bigl(\frac{1}{\alpha} u\Bigr)\Bigr\| \le |\alpha| \|T\|;$$
as $\alpha$ is arbitrary, $\|Tu\| \le \|T\| \|u\|$. QQQ
(c) A linear operator $T : U \to V$ is bounded iff it is continuous for the norm topologies on $U$ and $V$. PPP (i) If $T$ is bounded, $u_0 \in U$ and $\epsilon > 0$, then
$$\|Tu - Tu_0\| = \|T(u - u_0)\| \le \|T\| \|u - u_0\| \le \epsilon$$
whenever $\|u - u_0\| \le \dfrac{\epsilon}{1 + \|T\|}$; by 2A3H, $T$ is continuous. (ii) If $T$ is continuous, then there is some $\delta > 0$ such that $\|Tu\| = \|Tu - T0\| \le 1$ whenever $\|u\| = \|u - 0\| \le \delta$. If now $\|u\| \le 1$,
$$\|Tu\| = \frac{1}{\delta} \|T(\delta u)\| \le \frac{1}{\delta},$$
so $T$ is a bounded operator. QQQ
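To make the warning in (a) concrete, here is an added one-dimensional illustration (an assumed example, not from the original text): take $U = V = \mathbb{R}$ with $\|u\| = |u|$ and the hypothetical operator $Tu = 2u$.

```latex
% Assumed example: T : R -> R, Tu = 2u.
\[
  \|T\| = \sup\{|2u| : |u| \le 1\} = 2,
  \qquad
  T[\mathbb{R}] = \mathbb{R} \ \text{(not a bounded set)} .
\]
% The estimate of (b) and the continuity estimate of (c)(i):
\[
  |Tu| = 2|u| = \|T\|\,|u|,
  \qquad
  |Tu - Tu_0| = 2|u - u_0| \le \epsilon
  \ \text{ whenever } |u - u_0| \le \frac{\epsilon}{1 + \|T\|} = \frac{\epsilon}{3} .
\]
```

So $T$ is a bounded linear operator in the sense of (a) although it is not a bounded function on $\mathbb{R}$.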
2A4G Theorem $B(U; V)$ is a linear space over ${\mathbb{R} \atop \mathbb{C}}$, and $\|\ \|$ is a norm on $B(U; V)$.
proof I am rather supposing that you are aware, but in any case you will find it easy to check, that if $S : U \to V$ and $T : U \to V$ are linear operators, and $\alpha \in {\mathbb{R} \atop \mathbb{C}}$, then we have linear operators $S + T$ and $\alpha T$ from $U$ to $V$ defined by the formulae
$$(S + T)(u) = Su + Tu, \quad (\alpha T)(u) = \alpha(Tu)$$
for every $u \in U$; moreover, that under these definitions of addition and scalar multiplication the space of all linear operators from $U$ to $V$ is a linear space. Now we see that whenever $S$, $T \in B(U; V)$, $\alpha \in {\mathbb{R} \atop \mathbb{C}}$, $u \in U$ and $\|u\| \le 1$,
$$\|(S + T)(u)\| = \|Su + Tu\| \le \|Su\| + \|Tu\| \le \|S\| + \|T\|,$$
$$\|(\alpha T)u\| = \|\alpha(Tu)\| = |\alpha| \|Tu\| \le |\alpha| \|T\|;$$
so that $S + T$ and $\alpha T$ belong to $B(U; V)$, with $\|S + T\| \le \|S\| + \|T\|$ and $\|\alpha T\| \le |\alpha| \|T\|$. This shows that $B(U; V)$ is a linear subspace of the space of all linear operators and is therefore a linear space over ${\mathbb{R} \atop \mathbb{C}}$ in its own right. To check that the given formula for $\|T\|$ defines a norm, most of the work has just been done; I suppose I should remark, for the sake of form, that $\|T\| \in [0, \infty[$ for every $T$; if $\alpha = 0$, then of course $\|\alpha T\| = 0 = |\alpha| \|T\|$; for other $\alpha$,
$$|\alpha| \|T\| = |\alpha| \|\alpha^{-1} \alpha T\| \le |\alpha| |\alpha^{-1}| \|\alpha T\| = \|\alpha T\|,$$
so $\|\alpha T\| = |\alpha| \|T\|$. Finally, if $\|T\| = 0$ then $\|Tu\| \le \|T\| \|u\| = 0$ for every $u \in U$, so $Tu = 0$ for every $u$ and $T$ is the zero operator (in the space of all linear operators, and therefore in its subspace $B(U; V)$).
2A4H Dual spaces The most important case of $B(U; V)$ is when $V$ is the scalar field ${\mathbb{R} \atop \mathbb{C}}$ itself (of course we can think of ${\mathbb{R} \atop \mathbb{C}}$ as a normed space over itself, writing $\|\alpha\| = |\alpha|$ for each scalar $\alpha$). In this case we call $B(U; {\mathbb{R} \atop \mathbb{C}})$ the dual of $U$; it is commonly denoted $U'$ or $U^*$; I use the latter.
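2A4I Extensions of bounded operators: Theorem Let $U$ be a normed space and $V \subseteq U$ a dense linear subspace. Let $W$ be a Banach space and $T_0 : V \to W$ a bounded linear operator; then there is a unique bounded linear operator $T : U \to W$ extending $T_0$, and $\|T\| = \|T_0\|$.
proof (a) For any $u \in U$, there is a sequence $\langle v_n \rangle_{n \in \mathbb{N}}$ in $V$ converging to $u$. Now
$$\|T_0 v_m - T_0 v_n\| = \|T_0(v_m - v_n)\| \le \|T_0\| \|v_m - v_n\| \le \|T_0\|(\|v_m - u\| + \|u - v_n\|) \to 0$$
as $m$, $n \to \infty$, so $\langle T_0 v_n \rangle_{n \in \mathbb{N}}$ is Cauchy and $w = \lim_{n\to\infty} T_0 v_n$ is defined in $W$. If $\langle v'_n \rangle_{n \in \mathbb{N}}$ is another sequence in $V$ converging to $u$, then
$$\|w - T_0 v'_n\| \le \|w - T_0 v_n\| + \|T_0(v_n - v'_n)\| \le \|w - T_0 v_n\| + \|T_0\|(\|v_n - u\| + \|u - v'_n\|) \to 0$$
as $n \to \infty$, so $w$ is also the limit of $\langle T_0 v'_n \rangle_{n \in \mathbb{N}}$.
(b) We may therefore define $T : U \to W$ by setting $Tu = \lim_{n\to\infty} T_0 v_n$ whenever $\langle v_n \rangle_{n \in \mathbb{N}}$ is a sequence in $V$ converging to $u$. If $v \in V$, then we can set $v_n = v$ for every $n$ to see that $Tv = T_0 v$; thus $T$ extends $T_0$. If $u$, $u' \in U$ and $\alpha \in {\mathbb{R} \atop \mathbb{C}}$, take sequences $\langle v_n \rangle_{n \in \mathbb{N}}$, $\langle v'_n \rangle_{n \in \mathbb{N}}$ in $V$ converging to $u$, $u'$ respectively; in this case
$$\|(u + u') - (v_n + v'_n)\| \le \|u - v_n\| + \|u' - v'_n\| \to 0, \quad \|\alpha u - \alpha v_n\| = |\alpha| \|u - v_n\| \to 0$$
as $n \to \infty$, so that $T(u + u') = \lim_{n\to\infty} T_0(v_n + v'_n)$, $T(\alpha u) = \lim_{n\to\infty} T_0(\alpha v_n)$, and
$$\|T(u + u') - Tu - Tu'\| \le \|T(u + u') - T_0(v_n + v'_n)\| + \|T_0 v_n - Tu\| + \|T_0 v'_n - Tu'\| \to 0,$$
$$\|T(\alpha u) - \alpha Tu\| \le \|T(\alpha u) - T_0(\alpha v_n)\| + |\alpha| \|T_0 v_n - Tu\| \to 0$$
as $n \to \infty$. This means that $\|T(u + u') - Tu - Tu'\| = 0$, $\|T(\alpha u) - \alpha Tu\| = 0$ so $T(u + u') = Tu + Tu'$, $T(\alpha u) = \alpha Tu$; as $u$, $u'$ and $\alpha$ are arbitrary, $T$ is linear.
(c) For any $u \in U$, let $\langle v_n \rangle_{n \in \mathbb{N}}$ be a sequence in $V$ converging to $u$. Then
$$\|Tu\| \le \|T_0 v_n\| + \|Tu - T_0 v_n\| \le \|T_0\| \|v_n\| + \|Tu - T_0 v_n\| \le \|T_0\|(\|u\| + \|v_n - u\|) + \|Tu - T_0 v_n\| \to \|T_0\| \|u\|$$
as $n \to \infty$, so $\|Tu\| \le \|T_0\| \|u\|$. As $u$ is arbitrary, $T$ is bounded and $\|T\| \le \|T_0\|$. Of course $\|T\| \ge \|T_0\|$ just because $T$ extends $T_0$.
(d) Finally, let $\tilde{T}$ be any other bounded linear operator from $U$ to $W$ extending $T_0$. If $u \in U$, there is a sequence $\langle v_n \rangle_{n \in \mathbb{N}}$ in $V$ converging to $u$; now
$$\|\tilde{T}u - Tu\| \le \|\tilde{T}(u - v_n)\| + \|T(v_n - u)\| \le (\|\tilde{T}\| + \|T\|) \|u - v_n\| \to 0$$
as $n \to \infty$, so $\|\tilde{T}u - Tu\| = 0$ and $\tilde{T}u = Tu$. As $u$ is arbitrary, $\tilde{T} = T$. Thus $T$ is unique.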
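2A4J Normed algebras (a) A normed algebra is a normed space $(U, \|\ \|)$ together with a multiplication, a binary operator $\times$ on $U$, such that
$u \times (v \times w) = (u \times v) \times w$,
$u \times (v + w) = (u \times v) + (u \times w)$, $(u + v) \times w = (u \times w) + (v \times w)$,
$(\alpha u) \times v = u \times (\alpha v) = \alpha(u \times v)$,
$\|u \times v\| \le \|u\| \|v\|$
for all $u$, $v$, $w \in U$ and $\alpha \in {\mathbb{R} \atop \mathbb{C}}$.
(b) A Banach algebra is a normed algebra which is a Banach space. A normed algebra $U$ is commutative if its multiplication is commutative, that is, $u \times v = v \times u$ for all $u$, $v \in U$.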
*2A4K Definition A normed space $U$ is uniformly convex if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|u + v\| \le 2 - \delta$ whenever $u$, $v \in U$, $\|u\| = \|v\| = 1$ and $\|u - v\| \ge \epsilon$.
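A standard instance may help to fix the definition. The following added sketch (not from the original text) assumes $U = \mathbb{R}^r$ with the Euclidean norm, for which the parallelogram law $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$ holds, and produces an explicit $\delta$ for each $\epsilon \in \,]0, 2]$.

```latex
% Assumed setting: Euclidean norm on R^r, with ||u|| = ||v|| = 1 and ||u - v|| >= eps.
\[
  \|u + v\|^2 = 2\|u\|^2 + 2\|v\|^2 - \|u - v\|^2 = 4 - \|u - v\|^2 \le 4 - \epsilon^2 ,
\]
\[
  \|u + v\| \le \sqrt{4 - \epsilon^2} = 2 - \delta
  \quad\text{with}\quad
  \delta = 2 - \sqrt{4 - \epsilon^2} > 0 .
\]
```

For $\epsilon > 2$ there are no unit vectors with $\|u - v\| \ge \epsilon$, so any $\delta > 0$ will serve.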
2A5 Linear topological spaces
The principal objective of §2A3 is in fact the study of certain topologies on the linear spaces of Chapter 24. I give some fragments of the general theory.
2A5A Linear space topologies Something which is not covered in detail by every introduction to functional analysis is the general concept of 'linear topological space'. The ideas needed for the work of §245 are reasonably briefly expressed.
Definition A linear topological space or topological vector space over ${\mathbb{R} \atop \mathbb{C}}$ is a linear space $U$ over ${\mathbb{R} \atop \mathbb{C}}$ together with a topology $\mathfrak{T}$ such that the maps
$$(u, v) \mapsto u + v : U \times U \to U,$$
$$(\alpha, u) \mapsto \alpha u : {\mathbb{R} \atop \mathbb{C}} \times U \to U$$
are both continuous, where the product spaces $U \times U$ and ${\mathbb{R} \atop \mathbb{C}} \times U$ are given their product topologies (2A3T). Given a linear space $U$, a topology on $U$ satisfying the conditions above is a linear space topology. Note that
$$(u, v) \mapsto u - v = u + (-1)v : U \times U \to U$$
will also be continuous.
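2A5B All the linear topological spaces we need turn out to be readily presentable in the following terms.
Proposition Suppose that $U$ is a linear space over ${\mathbb{R} \atop \mathbb{C}}$, and $\mathrm{T}$ is a family of functionals $\tau : U \to [0, \infty[$ such that
(i) $\tau(u + v) \le \tau(u) + \tau(v)$ for all $u$, $v \in U$, $\tau \in \mathrm{T}$;
(ii) $\tau(\alpha u) \le \tau(u)$ if $u \in U$, $|\alpha| \le 1$, $\tau \in \mathrm{T}$;
(iii) $\lim_{\alpha \to 0} \tau(\alpha u) = 0$ for every $u \in U$, $\tau \in \mathrm{T}$.
For $\tau \in \mathrm{T}$, define $\rho_\tau : U \times U \to [0, \infty[$ by setting $\rho_\tau(u, v) = \tau(u - v)$ for all $u$, $v \in U$. Then each $\rho_\tau$ is a pseudometric on $U$, and the topology defined by $P = \{\rho_\tau : \tau \in \mathrm{T}\}$ renders $U$ a linear topological space.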
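proof (a) It is worth noting immediately that
$$\tau(0) = \lim_{\alpha \to 0} \tau(\alpha 0) = 0$$
for every $\tau \in \mathrm{T}$.
(b) To see that every $\rho_\tau$ is a pseudometric, argue as follows.
(i) $\rho_\tau$ takes values in $[0, \infty[$ because $\tau$ does.
(ii) If $u$, $v$, $w \in U$ then
$$\rho_\tau(u, w) = \tau(u - w) = \tau((u - v) + (v - w)) \le \tau(u - v) + \tau(v - w) = \rho_\tau(u, v) + \rho_\tau(v, w).$$
(iii) If $u$, $v \in U$, then
$$\rho_\tau(v, u) = \tau(v - u) = \tau(-1(u - v)) \le \tau(u - v) = \rho_\tau(u, v),$$
and similarly $\rho_\tau(u, v) \le \rho_\tau(v, u)$, so the two are equal.
(iv) If $u \in U$ then $\rho_\tau(u, u) = \tau(0) = 0$.
(c) Let $\mathfrak{T}$ be the topology on $U$ defined by $\{\rho_\tau : \tau \in \mathrm{T}\}$ (2A3F).
(i) Addition is continuous because, given $\tau \in \mathrm{T}$, we have
$$\rho_\tau(u' + v', u + v) = \tau((u' + v') - (u + v)) \le \tau(u' - u) + \tau(v' - v) = \rho_\tau(u', u) + \rho_\tau(v', v)$$
for all $u$, $v$, $u'$, $v' \in U$. This means that, given $\epsilon > 0$ and $(u, v) \in U \times U$, we shall have
$$\rho_\tau(u' + v', u + v) \le \epsilon \text{ whenever } (u', v') \in U((u, v); \tilde\rho_\tau, \bar\rho_\tau; \tfrac{\epsilon}{2}),$$
using the language of 2A3Tb. Because $\tilde\rho_\tau$, $\bar\rho_\tau$ are two of the pseudometrics defining the product topology of $U \times U$ (2A3Tb), $(u, v) \mapsto u + v$ is continuous, by the criterion of 2A3H.
(ii) Scalar multiplication is continuous because if $u \in U$ and $n \in \mathbb{N}$ then $\tau(nu) \le n\tau(u)$ for every $\tau \in \mathrm{T}$ (induce on $n$). Consequently, if $\tau \in \mathrm{T}$,
$$\tau(\alpha u) \le n\tau(\tfrac{\alpha}{n} u) \le n\tau(u)$$
whenever $|\alpha| < n \in \mathbb{N}$ and $\tau \in \mathrm{T}$. Now, given $(\alpha, u) \in {\mathbb{R} \atop \mathbb{C}} \times U$ and $\epsilon > 0$, take $n > |\alpha|$ and $\delta > 0$ such that $\delta \le \min(n - |\alpha|, \frac{\epsilon}{2n})$ and $\tau(\gamma u) \le \frac{\epsilon}{2}$ whenever $|\gamma| \le \delta$; then
$$\rho_\tau(\alpha' u', \alpha u) = \tau(\alpha' u' - \alpha u) \le \tau(\alpha'(u' - u)) + \tau((\alpha' - \alpha)u) \le n\tau(u' - u) + \tau((\alpha' - \alpha)u)$$
whenever $u' \in U$ and $\alpha' \in {\mathbb{R} \atop \mathbb{C}}$ and $|\alpha'| < n \in \mathbb{N}$. Accordingly, setting $\theta(\alpha', \alpha) = |\alpha' - \alpha|$ for $\alpha'$, $\alpha \in {\mathbb{R} \atop \mathbb{C}}$,
$$\rho_\tau(\alpha' u', \alpha u) \le n\delta + \tfrac{\epsilon}{2} \le \epsilon$$
whenever
$$(\alpha', u') \in U((\alpha, u); \tilde\theta, \bar\rho_\tau; \delta).$$
Because $\tilde\theta$ and $\bar\rho_\tau$ are among the pseudometrics defining the topology of ${\mathbb{R} \atop \mathbb{C}} \times U$, the map $(\alpha, u) \mapsto \alpha u$ satisfies the criterion of 2A3H and is continuous.
Thus $\mathfrak{T}$ is a linear space topology on $U$.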
*2A5C We do not need it for Chapter 24, but the following is worth knowing.
Theorem Let $U$ be a linear space and $\mathfrak{T}$ a linear space topology on $U$.
(a) There is a family $\mathrm{T}$ of functionals satisfying the conditions (i)-(iii) of 2A5B and defining $\mathfrak{T}$.
(b) If $\mathfrak{T}$ is metrizable, we can take $\mathrm{T}$ to consist of a single functional.
proof (a) Kelley & Namioka 76, p. 50.
(b) Köthe 69, §15.11.
2A5D Definition Let $U$ be a linear space over ${\mathbb{R} \atop \mathbb{C}}$. Then a seminorm on $U$ is a functional $\tau : U \to [0, \infty[$ such that
(i) $\tau(u + v) \le \tau(u) + \tau(v)$ for all $u$, $v \in U$;
(ii) $\tau(\alpha u) = |\alpha| \tau(u)$ if $u \in U$, $\alpha \in {\mathbb{R} \atop \mathbb{C}}$.
Observe that a norm is always a seminorm, and that a seminorm is always a functional of the type described in 2A5B. In particular, the association of a metric with a norm (2A4Bb) is a special case of 2A5B.
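The conditions of 2A5B are strictly weaker than those of a seminorm. As an added illustration (an assumed example, not from the original text), consider the hypothetical functional $\tau(u) = \min(1, |u|)$ on $U = \mathbb{R}$; the checks below follow the numbering (i)-(iii) of 2A5B.

```latex
% Assumed example: tau(u) = min(1, |u|) on R.
\begin{align*}
 \text{(i)}\quad & \tau(u + v) \le \min(1, |u| + |v|) \le \min(1, |u|) + \min(1, |v|) = \tau(u) + \tau(v), \\
 \text{(ii)}\quad & \tau(\alpha u) = \min(1, |\alpha|\,|u|) \le \min(1, |u|) = \tau(u) \quad\text{if } |\alpha| \le 1, \\
 \text{(iii)}\quad & \lim_{\alpha \to 0} \tau(\alpha u) = \lim_{\alpha \to 0} \min(1, |\alpha|\,|u|) = 0 .
\end{align*}
% But tau is not a seminorm, because homogeneity fails:
\[
  \tau(2 \cdot 1) = 1 \ne 2 = |2|\,\tau(1) .
\]
```

The associated pseudometric $\rho_\tau(u, v) = \min(1, |u - v|)$ has the same small balls as the usual metric, so it defines the usual topology of $\mathbb{R}$.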
2A5E Convex sets (a) Let $U$ be a linear space over ${\mathbb{R} \atop \mathbb{C}}$. A subset $C$ of $U$ is convex if $\alpha u + (1 - \alpha)v \in C$ whenever $u$, $v \in C$ and $\alpha \in [0, 1]$. The intersection of any family of convex sets is convex, so for every set $A \subseteq U$ there is a smallest convex set including $A$; this is just the set of vectors expressible as $\sum_{i=0}^n \alpha_i u_i$ where $u_0, \ldots, u_n \in A$, $\alpha_0, \ldots, \alpha_n \in [0, 1]$ and $\sum_{i=0}^n \alpha_i = 1$ (Bourbaki 87, II.2.3); it is the convex hull of $A$.
(b) If $U$ is a linear topological space, the closure of any convex set is convex (Bourbaki 87, II.2.6). It follows that, for any $A \subseteq U$, the closure of the convex hull of $A$ is the smallest closed convex set including $A$; this is the closed convex hull of $A$.
(c) I note for future reference that in a linear topological space, the closure of any linear subspace is a linear subspace. (Bourbaki 87, I.1.3; Köthe 69, §15.2. Compare 2A4Cb.)
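For a concrete instance of the description of the convex hull in (a), here is an added example (not from the original text), assuming $U = \mathbb{R}^2$ and the three-point set $A = \{(0,0), (1,0), (0,1)\}$.

```latex
% Assumed example: A = {(0,0), (1,0), (0,1)} in R^2.
\[
  \alpha_0 (0,0) + \alpha_1 (1,0) + \alpha_2 (0,1) = (\alpha_1, \alpha_2),
  \qquad \alpha_0, \alpha_1, \alpha_2 \in [0,1],\ \alpha_0 + \alpha_1 + \alpha_2 = 1,
\]
\[
  \text{so the convex hull of } A \text{ is } \{(s, t) : s \ge 0,\ t \ge 0,\ s + t \le 1\} .
\]
% A sample point and its coefficients:
\[
  \bigl(\tfrac14, \tfrac12\bigr) = \tfrac14 (0,0) + \tfrac14 (1,0) + \tfrac12 (0,1) .
\]
```

This triangle is already closed for the Euclidean topology, so in this case it is also the closed convex hull of $A$ in the sense of (b).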
2A5F Completeness in linear topological spaces In normed spaces, completeness can be described in terms of Cauchy sequences (2A4D). In general linear topological spaces this is inadequate. The true theory of 'completeness' demands the concept of 'uniform space' (see §3A4 in the next volume, or Kelley 55, chap. 6; Engelking 89, §8.1; Bourbaki 66, chap. II); I shall not describe this here, but will give a version adapted to linear spaces. I mention this only because you will I hope some day come to the general theory (in Volume 3 of this treatise, if not before), and you should be aware that the special case described here gives a misleading emphasis at some points.
Definitions Let $U$ be a linear space over ${\mathbb{R} \atop \mathbb{C}}$, and $\mathfrak{T}$ a linear space topology on $U$. A filter $\mathcal{F}$ on $U$ is Cauchy if for every open set $G$ in $U$ containing $0$ there is an $F \in \mathcal{F}$ such that $F - F = \{u - v : u, v \in F\}$ is included in $G$. $U$ is complete if every Cauchy filter on $U$ is convergent.
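2A5G Cauchy filters have a simple description when a linear space topology is defined by the method of 2A5B.
Lemma Let $U$ be a linear space over ${\mathbb{R} \atop \mathbb{C}}$, and let $\mathrm{T}$ be a family of functionals defining a linear space topology on $U$, as in 2A5B. Then a filter $\mathcal{F}$ on $U$ is Cauchy iff for every $\tau \in \mathrm{T}$ and $\epsilon > 0$ there is an $F \in \mathcal{F}$ such that $\tau(u - v) \le \epsilon$ for all $u$, $v \in F$.
proof (a) Suppose that $\mathcal{F}$ is Cauchy, $\tau \in \mathrm{T}$ and $\epsilon > 0$. Then $G = U(0; \rho_\tau; \epsilon)$ is open (using the language of 2A3F-2A3G), so there is an $F \in \mathcal{F}$ such that $F - F \subseteq G$; but this just means that $\tau(u - v) < \epsilon$ for all $u$, $v \in F$.
(b) Suppose that $\mathcal{F}$ satisfies the criterion, and that $G$ is an open set containing $0$. Then there are $\tau_0, \ldots, \tau_n \in \mathrm{T}$ and $\epsilon > 0$ such that $U(0; \rho_{\tau_0}, \ldots, \rho_{\tau_n}; \epsilon) \subseteq G$. For each $i \le n$ there is an $F_i \in \mathcal{F}$ such that $\tau_i(u - v) < \frac{\epsilon}{2}$ for all $u$, $v \in F_i$; now $F = \bigcap_{i \le n} F_i \in \mathcal{F}$ and $u - v \in G$ for all $u$, $v \in F$.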
2A5H Normed spaces and sequential completeness I had better point out that for normed spaces the definition of 2A5F agrees with that of 2A4D.
Proposition Let $(U, \|\ \|)$ be a normed space over ${\mathbb{R} \atop \mathbb{C}}$, and let $\mathfrak{T}$ be the linear space topology on $U$ defined by the method of 2A5B from the set $\mathrm{T} = \{\|\ \|\}$. Then $U$ is complete in the sense of 2A5F iff it is complete in the sense of 2A4D.
proof (a) Suppose first that $U$ is complete in the sense of 2A5F. Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be a sequence in $U$ which is Cauchy in the sense of 2A4Da. Set
$$\mathcal{F} = \{F : F \subseteq U,\ \{n : u_n \notin F\} \text{ is finite}\}.$$
Then it is easy to check that $\mathcal{F}$ is a filter on $U$, the image of the Fréchet filter under the map $n \mapsto u_n : \mathbb{N} \to U$. If $\epsilon > 0$, take $m \in \mathbb{N}$ such that $\|u_j - u_k\| \le \epsilon$ whenever $j$, $k \ge m$; then $F = \{u_j : j \ge m\}$ belongs to $\mathcal{F}$, and $\|u - v\| \le \epsilon$ for all $u$, $v \in F$. So $\mathcal{F}$ is Cauchy in the sense of 2A5F, and has a limit $u$ say. Now, for any $\epsilon > 0$, the set $\{v : \|v - u\| < \epsilon\} = U(u; \rho_{\|\ \|}; \epsilon)$ is an open set containing $u$, so belongs to $\mathcal{F}$, and $\{n : \|u_n - u\| \ge \epsilon\}$ is finite, that is, there is an $m \in \mathbb{N}$ such that $\|u_n - u\| < \epsilon$ whenever $n \ge m$. As $\epsilon$ is arbitrary, $u = \lim_{n\to\infty} u_n$ in the sense of 2A3M. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $U$ is complete in the sense of 2A4D.
(b) Now suppose that $U$ is complete in the sense of 2A4D. Let $\mathcal{F}$ be a Cauchy filter on $U$. For each $n \in \mathbb{N}$, choose a set $F_n \in \mathcal{F}$ such that $\|u - v\| \le 2^{-n}$ for all $u$, $v \in F_n$. For each $n \in \mathbb{N}$, $F'_n = \bigcap_{i \le n} F_i$ belongs to $\mathcal{F}$, so is not empty; choose $u_n \in F'_n$. If $m \in \mathbb{N}$ and $j$, $k \ge m$, then both $u_j$ and $u_k$ belong to $F_m$, so $\|u_j - u_k\| \le 2^{-m}$; thus $\langle u_n \rangle_{n \in \mathbb{N}}$ is a Cauchy sequence in the sense of 2A4Da, and has a limit $u$ say. Now take any $\epsilon > 0$ and $m \in \mathbb{N}$ such that $2^{-m+1} \le \epsilon$. There is surely a $k \ge m$ such that $\|u_k - u\| \le 2^{-m}$; now $u_k \in F_m$, so
$$F_m \subseteq \{v : \|v - u_k\| \le 2^{-m}\} \subseteq \{v : \|v - u\| \le 2^{-m+1}\} \subseteq \{v : \rho_{\|\ \|}(v, u) \le \epsilon\},$$
and $\{v : \rho_{\|\ \|}(v, u) \le \epsilon\} \in \mathcal{F}$. As $\epsilon$ is arbitrary, $\mathcal{F}$ converges to $u$, by 2A3Sd. As $\mathcal{F}$ is arbitrary, $U$ is complete.
(c) Thus the two definitions coincide, provided at least that we allow the countably many simultaneous choices of the $u_n$ in part (b) of the proof.
2A5I Weak topologies I come now to brief notes on 'weak topologies' on normed spaces; from the point of view of this volume, these are in fact the primary examples of linear space topologies. Let $U$ be a normed linear space over ${\mathbb{R} \atop \mathbb{C}}$.
(a) Write $U^*$ for its dual $B(U; {\mathbb{R} \atop \mathbb{C}})$ (2A4H). Let $\mathrm{T}$ be the set $\{|h| : h \in U^*\}$; then $\mathrm{T}$ satisfies the conditions of 2A5B, so defines a linear space topology on $U$; this is called the weak topology of $U$.
(b) A filter $\mathcal{F}$ on $U$ converges to $u \in U$ for the weak topology of $U$ iff $\lim_{v\to\mathcal{F}} \rho_{|h|}(v, u) = 0$ for every $h \in U^*$ (2A3Sd), that is, iff $\lim_{v\to\mathcal{F}} |h(v - u)| = 0$ for every $h \in U^*$, that is, iff $\lim_{v\to\mathcal{F}} h(v) = h(u)$ for every $h \in U^*$.
(c) A set $C \subseteq U$ is called weakly compact if it is compact for the weak topology of $U$. So (subject to the axiom of choice) a set $C \subseteq U$ is weakly compact iff for every ultrafilter $\mathcal{F}$ on $U$ containing $C$ there is a $u \in C$ such that $\lim_{v\to\mathcal{F}} h(v) = h(u)$ for every $h \in U^*$ (put 2A3R together with (b) above).
(d) A subset $A$ of $U$ is called relatively weakly compact if it is a subset of some weakly compact subset of $U$.
(e) If $h \in U^*$, then $h : U \to {\mathbb{R} \atop \mathbb{C}}$ is continuous for the weak topology on $U$ and the usual topology of ${\mathbb{R} \atop \mathbb{C}}$; this is obvious if we apply the criterion of 2A3H. So if $A \subseteq U$ is relatively weakly compact, $h[A]$ must be bounded in ${\mathbb{R} \atop \mathbb{C}}$. PPP Let $C \supseteq A$ be a weakly compact set. Then $h[C]$ is compact in ${\mathbb{R} \atop \mathbb{C}}$, by 2A3Nb, so is bounded, by 2A2F (noting that if the underlying field is $\mathbb{C}$, then it can be identified, as metric space, with $\mathbb{R}^2$). Accordingly $h[A]$ also is bounded. QQQ
(f) If $V$ is another normed space and $T : U \to V$ is a bounded linear operator, then $T$ is continuous for the respective weak topologies. PPP If $h \in V^*$ then the composition $hT$ belongs to $U^*$. Now, for any $u$, $v \in U$,
$$\rho_{|h|}(Tu, Tv) = |h(Tu - Tv)| = |hT(u - v)| = \rho_{|hT|}(u, v),$$
taking $\rho_{|h|}$, $\rho_{|hT|}$ to be the pseudometrics on $V$, $U$ respectively defined by the formula of 2A5B. By 2A3H, $T$ is continuous. QQQ
(g) Corresponding to the weak topology on a normed space $U$, we have the weak* or w*-topology on its dual $U^*$, defined by the set $\mathrm{T} = \{|\hat{u}| : u \in U\}$, where I write $\hat{u}(f) = f(u)$ for every $f \in U^*$, $u \in U$. As in (a), this is a linear space topology on $U^*$. (It is essential to distinguish between the 'weak*' topology and the 'weak' topology on $U^*$. The former depends only on the action of $U$ on $U^*$, the latter on the action of $U^{**} = (U^*)^*$. You will have no difficulty in checking that $\hat{u} \in U^{**}$ for every $u \in U$, but the point is that there may be members of $U^{**}$ not representable in this way, leading to open sets for the weak topology which are not open for the weak* topology.)
*2A5J Angelic spaces I do not rely on the following ideas, but they may throw light on some results in §§246-247. First, a topological space $X$ is regular if whenever $G \subseteq X$ is open and $x \in G$ then there is an open set $H$ such that $x \in H \subseteq \overline{H} \subseteq G$. Next, a regular Hausdorff space $X$ is angelic if whenever $A \subseteq X$ is such that every sequence in $A$ has a cluster point in $X$, then $\overline{A}$ is compact and every point of $\overline{A}$ is the limit of a sequence in $A$. What this means is that compactness in $X$, and the topologies of compact subsets of $X$, can be effectively described in terms of sequences. Now the theorem (due to Eberlein and Šmulian) is that any normed space is angelic in its weak topology. (462D in Volume 4; Köthe 69, §24; Dunford & Schwartz 57, V.6.1.) In particular, this is true of $L^1$ spaces, which makes it less surprising that there should be criteria for weak compactness in $L^1$ spaces which deal only with sequences.
2A6 Factorization of matrices
I spend a couple of pages on the linear algebra of $\mathbb{R}^r$ required for Chapter 26. I give only one proof, because this is material which can be found in any textbook of elementary linear algebra; but I think it may be helpful to run through the basic ideas in the language which I use for this treatise.
2A6A Determinants We need to know the following things about determinants.
(i) Every $r \times r$ real matrix $T$ has a real determinant $\det T$.
(ii) For any $r \times r$ matrices $S$ and $T$, $\det ST = \det S \det T$.
(iii) If $T$ is a diagonal matrix, its determinant is just the product of its diagonal entries.
(iv) For any $r \times r$ matrix $T$, $\det T^{\top} = \det T$, where $T^{\top}$ is the transpose of $T$.
(v) $\det T$ is a continuous function of the coefficients of $T$.
There are so many routes through this topic that I avoid even a definition of 'determinant'; I invite you to check your memory, or your favourite text, to confirm that you are indeed happy with the facts above.
2A6B Orthonormal families For $x = (\xi_1, \ldots, \xi_r)$, $y = (\eta_1, \ldots, \eta_r) \in \mathbb{R}^r$, write $x \cdot y = \sum_{i=1}^r \xi_i \eta_i$; of course $\|x\|$, as defined in 1A2A, is $\sqrt{x \cdot x}$. Recall that $x_1, \ldots, x_k$ are orthonormal if $x_i \cdot x_j = 0$ for $i \ne j$, $1$ for $i = j$. The results we need here are:
(i) If $x_1, \ldots, x_k$ are orthonormal vectors in $\mathbb{R}^r$, where $k < r$, then there are vectors $x_{k+1}, \ldots, x_r$ in $\mathbb{R}^r$ such that $x_1, \ldots, x_r$ are orthonormal.
(ii) An $r \times r$ matrix $P$ is orthogonal if $P^{\top} P$ is the identity matrix; equivalently, if the columns of $P$ are orthonormal.
(iii) For an orthogonal matrix $P$, $\det P$ must be $\pm 1$ (put (ii)-(iv) of 2A6A together).
(iv) If $P$ is orthogonal, then $Px \cdot Py = P^{\top} Px \cdot y = x \cdot y$ for all $x$, $y \in \mathbb{R}^r$.
(v) If $P$ is orthogonal, so is $P^{\top} = P^{-1}$.
(vi) If $P$ and $Q$ are orthogonal, so is $PQ$.
2A6C I now give a proposition which is not always included in elementary presentations. Of course there are many approaches to this; I offer a direct one.
Proposition Let $T$ be any real $r \times r$ matrix. Then $T$ is expressible as $PDQ$ where $P$ and $Q$ are orthogonal matrices and $D$ is a diagonal matrix with non-negative coefficients.
proof I induce on $r$.
(a) If $r = 1$, then $T = (\tau_{11})$. Set $D = (|\tau_{11}|)$, $P = (1)$ and $Q = (1)$ if $\tau_{11} \ge 0$, $(-1)$ otherwise.
(b)(i) For the inductive step to $r + 1 \ge 2$, consider the unit ball $B = \{x : x \in \mathbb{R}^{r+1}, \|x\| \le 1\}$. This is a closed bounded set in $\mathbb{R}^{r+1}$, so is compact (2A2F). The maps $x \mapsto Tx : \mathbb{R}^{r+1} \to \mathbb{R}^{r+1}$ and $x \mapsto \|x\| : \mathbb{R}^{r+1} \to \mathbb{R}$ are continuous, so the function $x \mapsto \|Tx\| : B \to \mathbb{R}$ is bounded and attains its bounds (2A2G), and there is a $u \in B$ such that $\|Tu\| \ge \|Tx\|$ for every $x \in B$. Observe that $\|Tu\|$ must be the norm $\|T\|$ of $T$ as defined in 262H. Set $\delta = \|T\| = \|Tu\|$. If $\delta = 0$, then $T$ must be the zero matrix, and the result is trivial; so let us suppose that $\delta > 0$. In this case $\|u\|$ must be exactly $1$, since otherwise we should have $u = \|u\| u'$ where $\|u'\| = 1$ and $\|Tu'\| > \|Tu\|$.
(ii) If $x \in \mathbb{R}^{r+1}$ and $x \cdot u = 0$, then $Tx \cdot Tu = 0$. PPP??? If not, set $\gamma = Tx \cdot Tu \ne 0$. Consider $y = u + \eta\gamma x$ for small $\eta > 0$. We have
$$\|y\|^2 = y \cdot y = u \cdot u + 2\eta\gamma\, u \cdot x + \eta^2\gamma^2\, x \cdot x = \|u\|^2 + \eta^2\gamma^2 \|x\|^2 = 1 + \eta^2\gamma^2 \|x\|^2,$$
while
$$\|Ty\|^2 = Ty \cdot Ty = Tu \cdot Tu + 2\eta\gamma\, Tu \cdot Tx + \eta^2\gamma^2\, Tx \cdot Tx = \delta^2 + 2\eta\gamma^2 + \eta^2\gamma^2 \|Tx\|^2.$$
But also $\|Ty\|^2 \le \delta^2 \|y\|^2$ (2A4Fb), so
$$\delta^2 + 2\eta\gamma^2 + \eta^2\gamma^2 \|Tx\|^2 \le \delta^2 (1 + \eta^2\gamma^2 \|x\|^2)$$
and
$$2\eta\gamma^2 \le \delta^2\eta^2\gamma^2 \|x\|^2 - \eta^2\gamma^2 \|Tx\|^2,$$
that is,
$$2 \le \eta(\delta^2 \|x\|^2 - \|Tx\|^2).$$
But this surely cannot be true for all $\eta > 0$, so we have a contradiction. XXXQQQ
(iii) Set $v = \delta^{-1} Tu$, so that $\|v\| = 1$. Let $u_1, \ldots, u_{r+1}$ be orthonormal vectors such that $u_{r+1} = u$, and let $Q_0$ be the orthogonal $(r + 1) \times (r + 1)$ matrix with columns $u_1, \ldots, u_{r+1}$; then, writing $e_1, \ldots, e_{r+1}$ for the standard orthonormal basis of $\mathbb{R}^{r+1}$, we have $Q_0 e_i = u_i$ for each $i$, and $Q_0 e_{r+1} = u$. Similarly, there is an orthogonal matrix $P_0$ such that $P_0 e_{r+1} = v$.
Set $T_1 = P_0^{-1} T Q_0$. Then
$$T_1 e_{r+1} = P_0^{-1} Tu = \delta P_0^{-1} v = \delta e_{r+1},$$
while if $x \cdot e_{r+1} = 0$ then $Q_0 x \cdot u = 0$ (2A6B(iv)), so that
$$T_1 x \cdot e_{r+1} = P_0 T_1 x \cdot P_0 e_{r+1} = T Q_0 x \cdot v = 0,$$
by (ii). This means that $T_1$ must be of the form
$$\begin{pmatrix} S & 0 \\ 0 & \delta \end{pmatrix},$$
where $S$ is an $r \times r$ matrix.
(iv) By the inductive hypothesis, $S$ is expressible as $\tilde{P}\tilde{D}\tilde{Q}$, where $\tilde{P}$ and $\tilde{Q}$ are orthogonal $r \times r$ matrices and $\tilde{D}$ is a diagonal $r \times r$ matrix with non-negative coefficients. Set
$$P_1 = \begin{pmatrix} \tilde{P} & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_1 = \begin{pmatrix} \tilde{Q} & 0 \\ 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} \tilde{D} & 0 \\ 0 & \delta \end{pmatrix}.$$
Then $P_1$ and $Q_1$ are orthogonal and $D$ is diagonal, with non-negative coefficients, and $P_1 D Q_1 = T_1$. Now set
$$P = P_0 P_1, \quad Q = Q_1 Q_0^{-1},$$
so that $P$ and $Q$ are orthogonal and
$$PDQ = P_0 P_1 D Q_1 Q_0^{-1} = P_0 T_1 Q_0^{-1} = T.$$
Thus the induction proceeds.
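A small worked instance of the proposition may be useful. The following is an added example (not from the original text) for a hypothetical $2 \times 2$ matrix $T$; the factors were found by inspection rather than by running the induction step by step, but $D$ is arranged with $\|T\| = 2$ in its last diagonal place, as in the proof.

```latex
% Assumed example matrix T; P is a rotation, Q is the identity.
\[
  T = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}
    = \underbrace{\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}}_{P}
      \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}}_{D}
      \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{Q},
  \qquad
  P^{\top} P = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
               \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
             = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .
\]
% Consistency with 2A6A(ii)-(iii): det T = det P det D det Q.
\[
  \det T = 0 \cdot 0 - (-2) \cdot 1 = 2 = 1 \cdot (1 \cdot 2) \cdot 1 .
\]
```

Here the maximizing vector of part (b)(i) is $u = e_2$, with $Tu = (-2, 0)$ and $\|Tu\| = 2 = \|T\|$.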
Concordance
I list here the section and paragraph numbers which have (to my knowledge) appeared in print in references to this
chapter, and which have since been changed.
211Ya Countable-cocountable algebra of R This exercise, referred to in the 2002 edition of Volume 3, has been
moved to 211Ye.
214J Subspace measures on measurable subspaces, direct sums 214J-214M, referred to in the 2002 and
2004 editions of Volume 3, the 2003 and 2006 editions of Volume 4, and the 2008 edition of Volume 5, have been moved
to 214K-214N.
214N Upper and lower integrals This result, referred to in the 2008 edition of Volume 5, has been moved to
214J.
215Yc Measurable envelopes This exercise, referred to in the 2000 edition of Volume 1, has been moved to
216Yc.
234 Section 234 has been rewritten, with a good deal of new material. The former paragraphs 234A-234G,
referred to in the 2002 and 2004 editions of Volume 3 and the 2003 and 2006 editions of Volume 4, are now 234I-234O.
235 Section 235 has been re-organized, with some material moved to §234. Specifically, 235H, 235I, 235J, 235L, 235M, 235T and 235Xe, referred to in the 2002 and 2004 editions of Volume 3 and the 2003 and 2006 editions of Volume 4, are now dealt with in 234B, 235G, 235H, 235J, 235K, 235R and 234A.
241Yd Countable sup property This exercise, referred to in the 2002 edition of Volume 3, has been moved to
241Ye.
241Yh Quotient Riesz spaces This exercise, referred to in the 2002 edition of Volume 3, has been moved to
241Yc.
242Xf Inverse-measure-preserving functions This exercise, referred to in the 2002 edition of Volume 3, has
been moved to 242Xd.
242Yc Order-continuous norms This exercise, referred to in the 2002 edition of Volume 3, has been moved to
242Yg.
244O Complex $L^p$ This paragraph, referred to in the 2002 and 2004 editions of Volume 3, and the 2003 and 2006
editions of Volume 4, is now 244P.
244Xf $L^p$ and $L^q$ This exercise, referred to in the 2003 edition of Volume 4, has been moved to 244Xe.
244Yd-244Yf $L^p$ as Banach lattice These exercises, referred to in the 2002 and 2004 editions of Volume 3, are
now 244Ye-244Yg.
251N Paragraph numbers in the second half of 251, referred to in editions of Volumes 3 and 4 up to and including
2006, have been changed, so that 251M-251S are now 251N-251T.
252Yf Exercise This exercise, referred to in the first edition of Volume 1, has been moved to 252Ym.
272S Distribution of a sum of independent random variables This result, referred to in the 2002 and 2004
editions of Volume 3, and the 2003 and 2006 editions of Volume 4, is now 272T.
272U Etemadi's lemma This result, referred to in the 2003 and 2006 editions of Volume 4, is now 272V.
272Yd This exercise, referred to in the 2002 and 2004 editions of Volume 3, is now 272Ye.
273Xh This exercise, referred to in the 2006 edition of Volume 4, is now 273Xi.
References for Volume 2
Alexits G. [78] (ed.) Fourier Analysis and Approximation Theory. North-Holland, 1978 (Colloq. Math. Soc. János
Bolyai 19).
Antonov N.Yu. [96] Convergence of Fourier series, East J. Approx. 7 (1996) 187-196. [286 notes.]
Arias de Reyna J. [02] Pointwise Convergence of Fourier Series. Springer, 2002 (Lecture Notes in Mathematics
1785). [286 notes.]
Bergelson V., March P. & Rosenblatt J. [96] (eds.) Convergence in Ergodic Theory and Probability. de Gruyter,
1996.
du Bois-Reymond P. [1876] Untersuchungen über die Convergenz und Divergenz der Fouriersche Darstellungformeln,
Abh. Akad. München 12 (1876) 1-103. [282 notes.]
Bourbaki N. [66] General Topology. Hermann/Addison-Wesley, 1968. [2A5F.]
Bourbaki N. [87] Topological Vector Spaces. Springer, 1987. [2A5E.]
Carleson L. [66] On convergence and growth of partial sums of Fourier series, Acta Math. 116 (1966) 135-157. [282
notes, 286 intro., 286 notes.]
Clarkson J.A. [36] Uniformly convex spaces, Trans. Amer. Math. Soc. 40 (1936) 396-414. [244O.]
Defant A. & Floret K. [93] Tensor Norms and Operator Ideals, North-Holland, 1993. [253 notes.]
Dudley R.M. [89] Real Analysis and Probability. Wadsworth & Brooks/Cole, 1989. [282 notes.]
Dunford N. & Schwartz J.T. [57] Linear Operators I. Wiley, 1957 (reprinted 1988). [244 notes, 2A5J.]
Enderton H.B. [77] Elements of Set Theory. Academic, 1977. [2A1.]
Engelking R. [89] General Topology. Heldermann, 1989 (Sigma Series in Pure Mathematics 6). [2A5F.]
Etemadi N. [96] On convergence of partial sums of independent random variables, pp. 137-144 in Bergelson
March & Rosenblatt 96. [272V.]
Evans L.C. & Gariepy R.F. [92] Measure Theory and Fine Properties of Functions. CRC Press, 1992. [263Ec, 265
notes.]
Federer H. [69] Geometric Measure Theory. Springer, 1969 (reprinted 1996). [262C, 263Ec, 264 notes, 265 notes,
266 notes.]
Feller W. [66] An Introduction to Probability Theory and its Applications, vol. II. Wiley, 1966. [Chap. 27 intro.,
274H, 275Xc, 285N.]
Fremlin D.H. [74] Topological Riesz Spaces and Measure Theory. Cambridge U.P., 1974. [232 notes, 241F, 244
notes, 245 notes, 247 notes.]
Fremlin D.H. [93] Real-valued-measurable cardinals, pp. 151-304 in Judah 93. [232Hc.]
Haimo D.T. [67] (ed.) Orthogonal Expansions and their Continuous Analogues. Southern Illinois University Press,
1967.
Hall P. [82] Rates of Convergence in the Central Limit Theorem. Pitman, 1982. [274Hc.]
Halmos P.R. [50] Measure Theory. Van Nostrand, 1950. [251 notes, 252 notes, 255Yn.]
Halmos P.R. [60] Naive Set Theory. Van Nostrand, 1960. [2A1.]
Hanner O. [56] On the uniform convexity of $L^p$ and $l^p$, Arkiv för Matematik 3 (1956) 239-244. [244O.]
Henle J.M. [86] An Outline of Set Theory. Springer, 1986. [2A1.]
Hoeffding W. [63] Probability inequalities for sums of bounded random variables, J. Amer. Statistical Association
58 (1963) 13-30. [272W.]
Hunt R.A. [67] On the convergence of Fourier series, pp. 235-255 in Haimo 67. [286 notes.]
Jørsboe O.G. & Mejlbro L. [82] The Carleson-Hunt Theorem on Fourier Series. Springer, 1982 (Lecture Notes in
Mathematics 911). [286 notes.]
Judah H. [93] (ed.) Proceedings of the Bar-Ilan Conference on Set Theory and the Reals, 1991. Amer. Math. Soc.
(Israel Mathematical Conference Proceedings 6), 1993.
Kelley J.L. [55] General Topology. Van Nostrand, 1955. [2A5F.]
Kelley J.L. & Namioka I. [76] Linear Topological Spaces. Springer, 1976. [2A5C.]
Kirzbraun M.D. [34] Über die zusammenziehenden und Lipschitzian Transformationen, Fund. Math. 22 (1934)
77-108. [262C.]
Kolmogorov A.N. [26] Une série de Fourier-Lebesgue divergente partout, C. R. Acad. Sci. Paris 183 (1926) 1327-
1328. [282 notes.]
Komlós J. [67] A generalization of a problem of Steinhaus, Acta Math. Acad. Sci. Hung. 18 (1967) 217-229. [276H.]
Körner T.W. [88] Fourier Analysis. Cambridge U.P., 1988. [282 notes.]
Köthe G. [69] Topological Vector Spaces I. Springer, 1969. [2A5C, 2A5J.]
Krivine J.-L. [71] Introduction to Axiomatic Set Theory. D. Reidel, 1971. [2A1.]
Lacey M. & Thiele C. [00] A proof of boundedness of the Carleson operator, Math. Research Letters 7 (2000) 1-10.
[286 intro., 286H.]
Lebesgue H. [72] Oeuvres Scientifiques. L'Enseignement Mathématique, Institut de Mathématiques, Univ. de Genève,
1972. [Chap. 27 intro.]
Liapounoff A. [1901] Nouvelle forme du théorème sur la limite de probabilité, Mém. Acad. Imp. Sci. St-Pétersbourg
12(5) (1901) 1-24. [274Xg.]
Lighthill M.J. [59] Introduction to Fourier Analysis and Generalised Functions. Cambridge U.P., 1959. [284 notes.]
Lindeberg J.W. [22] Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math.
Zeitschrift 15 (1922) 211-225. [274Ha, 274 notes.]
Lipschutz S. [64] Set Theory and Related Topics. McGraw-Hill, 1964 (Schaum's Outline Series). [2A1.]
Loève M. [77] Probability Theory I. Springer, 1977. [Chap. 27 intro., 274H.]
Luxemburg W.A.J. & Zaanen A.C. [71] Riesz Spaces I. North-Holland, 1971. [241F.]
Mozzochi C.J. [71] On the Pointwise Convergence of Fourier Series. Springer, 1971 (Lecture Notes in Mathematics
199). [286 notes.]
Naor A. [04] Proof of the uniform convexity lemma, http://www.cims.nyu.edu/~naor/homepage files/inequality.pdf,
26.2.04. [244O.]
Rényi A. [70] Probability Theory. North-Holland, 1970. [274Hc.]
Roitman J. [90] An Introduction to Set Theory. Wiley, 1990. [2A1.]
Roselli P. & Willem M. [02] A convexity inequality, Amer. Math. Monthly 109 (2002) 64-70. [244Ym.]
Schipp F. [78] On Carleson's method, pp. 679-695 in Alexits 78. [286 notes.]
Semadeni Z. [71] Banach spaces of continuous functions I. Polish Scientific Publishers, 1971. [253 notes.]
Shiryayev A. [84] Probability. Springer, 1984. [285N.]
Sjölin P. [71] Convergence almost everywhere of certain singular integrals and multiple Fourier series, Arkiv för
Math. 9 (1971) 65-90. [286 notes.]
Steele J.M. [86] An Efron-Stein inequality of nonsymmetric statistics, Annals of Statistics 14 (1986) 753-758.
[274Xj.]
Zaanen A.C. [83] Riesz Spaces II. North-Holland, 1983. [241F.]
Zygmund A. [59] Trigonometric Series. Cambridge U.P., 1959. [244 notes, 282 notes, 284Xj.]
Index to volumes 1 and 2
Principal topics and results
The general index below is intended to be comprehensive. Inevitably the entries are voluminous to the point that
they are often unhelpful. I have therefore prepared a shorter, better-annotated, index which will, I hope, help readers
to focus on particular areas. It does not mention definitions, as the bold-type entries in the main index are supposed to
lead efficiently to these; and if you draw blank here you should always, of course, try again in the main index. Entries
in the form of mathematical assertions frequently omit essential hypotheses and should be checked against the formal
statements in the body of the work.
absolutely continuous real functions 225
as indefinite integrals 225E
absolutely continuous additive functionals 232
characterization 232B
atomless measure spaces
have elements of all possible measures 215D
Borel sets in $\mathbb{R}^r$ 111G
and Lebesgue measure 114G, 115G, 134F
bounded variation, real functions of 224
as differences of monotonic functions 224D
integrals of their derivatives 224I
Lebesgue decomposition 226C
Brunn-Minkowski theorem 266C
Cantor set and function 134G, 134H
Carathéodory's construction of measures from outer measures 113C
Carleson's theorem (Fourier series of square-integrable functions converge a.e.) 286
Central Limit Theorem (sum of independent random variables approximately normal) 274
Lindeberg's condition 274F, 274G
change of measure in the integral 235K
change of variable in the integral 235
$\int J \times g\phi\,d\mu = \int g\,d\nu$ 235A, 235E, 235J
finding $J$ 235M;
$J = |\det T|$ for linear operators $T$ 263A; $J = |\det \phi'|$ for differentiable operators $\phi$ 263D
when the measures are Hausdorff measures 265B, 265E
when $\phi$ is inverse-measure-preserving 235G
$\int g\phi\,d\mu = \int J \times g\,d\nu$ 235R
characteristic function of a probability distribution 285
sequences of distributions converge in vague topology iff characteristic functions converge pointwise 285L
complete measure space 212
completion of a measure 212C et seq.
concentration of measure 264H
conditional expectation
of a function 233
as operator on $L^1(\mu)$ 242J
construction of measures
image measures 234C
from outer measures (Carathéodory's method) 113C
subspace measures 131A, 214A
product measures 251C, 251F, 251W, 254C
as pull-backs 234F
convergence theorems (B.Levi, Fatou, Lebesgue) 123
convergence in measure (linear space topology on $L^0$)
on $L^0(\mu)$ 245
when Hausdorff/complete/metrizable 245E
convex functions 233G et seq.
convolution of functions
(on $\mathbb{R}^r$) 255
$\int h \times (f * g) = \int h(x+y) f(x) g(y)\,dx\,dy$ 255G
$f * (g * h) = (f * g) * h$ 255J
convolution of measures
(on $\mathbb{R}^r$) 257
$\int h\,d(\nu_1 * \nu_2) = \int h(x+y)\,\nu_1(dx)\,\nu_2(dy)$ 257B
of absolutely continuous measures 257F
countable sets 111F, 1A1C et seq.
countable-cocountable measure 211R
counting measure 112Bd
Denjoy-Young-Saks theorem 222L
differentiable functions (from $\mathbb{R}^r$ to $\mathbb{R}^s$) 262, 263
direct sum of measure spaces 214L
distribution of a finite family of random variables 271
as a Radon measure 271B
of $\phi(\boldsymbol{X})$ 271J
of an independent family 272G
determined by characteristic functions 285M
Doob's Martingale Convergence Theorem 275G
exhaustion, principle of 215A
extended real line 135
extension of measures 214P
Fatou's Lemma ($\int \liminf \le \liminf \int$ for sequences of non-negative functions) 123B
Fejér sums (running averages of Fourier sums) converge to local averages of f 282H
uniformly if f is continuous 282G
Fourier series 282
norm-converge in $L^2$ 282J
converge at points of differentiability 282L
converge to midpoints of jumps, if f of bounded variation 282O
and convolutions 282Q
converge a.e. for square-integrable function 286V
Fourier transforms
on R 283, 284
formula for $\int_c^d f$ in terms of $\hat f$ 283F
and convolutions 283M
in terms of action on test functions 284H et seq.
of square-integrable functions 284O, 286U
inversion formulae for differentiable functions 283I; for functions of bounded variation 283L
$\hat f(y) = \lim_{\epsilon \downarrow 0} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iyx} e^{-\epsilon x^2} f(x)\,dx$ a.e. 284M
Fubini's theorem 252
$\int f\,d(\mu \times \nu) = \iint f(x, y)\,dx\,dy$ 252B
when both factors $\sigma$-finite 252C, 252H
for characteristic functions 252D, 252F
Fundamental Theorem of the Calculus ($\frac{d}{dx} \int_a^x f = f(x)$ a.e.) 222; ($\int_a^b F'(x)\,dx = F(b) - F(a)$) 225E
Hahn decomposition of a countably additive functional 231E
Hardy-Littlewood Maximal Theorem 286A
Hausdorff measures (on $\mathbb{R}^r$) 264
are topological measures 264E
$r$-dimensional Hausdorff measure on $\mathbb{R}^r$ a multiple of Lebesgue measure 264I
$(r-1)$-dimensional measure on $\mathbb{R}^r$ 265F-265H
image measures 234C
indefinite integrals
differentiate back to original function 222E, 222H
to construct measures 234I
independent random variables 272
joint distributions are product measures 272G
sums 272S, 272T, 272V, 272W
limit theorems 273, 274
inner regularity of measures
(with respect to compact sets) Lebesgue measure 134F; Radon measures 256
integration of real-valued functions, construction 122
as a positive linear functional 122O
acting on $L^1(\mu)$ 242B
by parts 225F
characterization of integrable functions 122P, 122R
over subsets 131D, 214E
functions and integrals with values in $[-\infty, \infty]$ 133
Jensen's inequality 233I-233J
expressed in $L^1(\mu)$ 242K
Komlós's subsequence theorem 276H
Lebesgue's Density Theorem (in $\mathbb{R}$) 223
$\lim_{h \downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} f = f(x)$ a.e. 223A
$\lim_{h \downarrow 0} \frac{1}{2h} \int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$ a.e. 223D
(in $\mathbb{R}^r$) 261C, 261E
Lebesgue measure, construction of 114, 115
further properties 134
Lebesgue's Dominated Convergence Theorem ($\int \lim = \lim \int$ for dominated sequences of functions) 123C
B.Levi's theorem ($\int \lim = \lim \int$ for monotonic sequences of functions) 123A; ($\int \sum = \sum \int$) 226E
Lipschitz functions 262
differentiable a.e. 262Q
localizable measure space
assembling partial measurable functions 213N
which is not strictly localizable 216E
Lusin's theorem (measurable functions are almost continuous)
(on $\mathbb{R}^r$) 256F
martingales 275
$L^1$-bounded martingales converge a.e. 275G
when of form $E(X|\Sigma_n)$ 275H, 275I
measurable envelopes
elementary properties 132E
existence 213L
measurable functions
(real-valued) 121
sums, products and other operations on finitely many functions 121E
limits, infima, suprema 121F
measure space 112
Monotone Class Theorem 136B
monotonic functions
are differentiable a.e. 222A
non-measurable set (for Lebesgue measure) 134B
ordering of measures 234P
outer measures constructed from measures 132A et seq.
outer regularity of Lebesgue measure 134F
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O
Poisson's theorem (a count of independent rare events has an approximately Poisson distribution) 285Q
product of two measure spaces 251
basic properties of c.l.d. product measure 251I
Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure 251N
more than two spaces 251W
Fubini's theorem 252B
Tonelli's theorem 252G
and $L^1$ spaces 253
continuous bilinear maps on $L^1(\mu) \times L^1(\nu)$ 253F
conditional expectation on a factor 253H
product of any number of probability spaces 254
basic properties of the (completed) product 254F
characterization with inverse-measure-preserving functions 254G
products of subspaces 254L
products of products 254N
determination by countable subproducts 254O
subproducts and conditional expectations 254R
Rademacher's theorem (Lipschitz functions are differentiable a.e.) 262Q
Radon measures
on $\mathbb{R}^r$ 256
as completions of Borel measures 256C
indefinite-integral measures 256E
image measures 256G
product measures 256K
Radon-Nikodým theorem (truly continuous additive set-functions have densities) 232E
in terms of $L^1(\mu)$ 242I
repeated integrals 252
Stone-Weierstrass theorem 281
for Riesz subspaces of $C_b(X)$ 281A
for subalgebras of $C_b(X)$ 281E
for *-subalgebras of $C_b(X; \mathbb{C})$ 281G
strictly localizable measures
sufficient condition for strict localizability 213O
strong law of large numbers ($\lim_{n \to \infty} \frac{1}{n+1} \sum_{i=0}^{n} (X_i - E(X_i)) = 0$ a.e.) 273
when $\sum_{n=0}^{\infty} \frac{1}{(n+1)^2} \mathrm{Var}(X_n) < \infty$ 273D
when $\sup_{n \in \mathbb{N}} E(|X_n|^{1+\delta}) < \infty$ 273H
for identically distributed $X_n$ 273I
for martingale difference sequences 276C, 276F
convergence of averages for $\|\,\|_1$, $\|\,\|_p$ 273N
subspace measures
for measurable subspaces 131
for arbitrary subspaces 214
surface measure in $\mathbb{R}^r$ 265
tensor products of $L^1$ spaces 253
Tonelli's theorem ($f$ is integrable if $\iint |f(x, y)|\,dx\,dy < \infty$) 252G
uniformly integrable sets in $L^1$ 246
criteria for uniform integrability 246G
and convergence in measure 246J
and weak compactness 247C
Vitali's theorem (for coverings by intervals in $\mathbb{R}$) 221A
(for coverings by balls in $\mathbb{R}^r$) 261B
weak compactness in $L^1(\mu)$ 247
Weyl's Equidistribution Theorem 281N
c.l.d. version 213D et seq.
$L^0(\mu)$ (space of equivalence classes of measurable functions) 241
as Riesz space 241E
$L^1(\mu)$ (space of equivalence classes of integrable functions) 242
norm-completeness 242F
density of simple functions 242M
(for Lebesgue measure) density of continuous functions and step functions 242O
$L^p(\mu)$ (space of equivalence classes of $p$th-power-integrable functions, where $1 < p < \infty$) 244
is a Banach lattice 244G
has dual $L^q$, where $\frac{1}{p} + \frac{1}{q} = 1$ 244K
and conditional expectations 244M
$L^\infty(\mu)$ (space of equivalence classes of bounded measurable functions) 243
duality with $L^1$ 243F-243G
norm-completeness 243E
order-completeness 243H
$\sigma$-algebras of sets 111
generated by given families 136B, 136G
$\sigma$-finite measures 215B
General index
References in bold type refer to definitions; references in italics are passing references. Definitions marked with >>>
are those in which my usage is dangerously at variance with that of some other author or authors.
Abel's theorem 224Yi
absolute summability 226Ac
absolutely continuous additive functional 232Aa, 232B, 232D, 232F-232I, 232Xa, 232Xb, 232Xd, 232Xf, 232Xh,
232Yk, 234Kb, 256J, 257Xf
absolutely continuous function 225B, 225C-225G, 225K-225O, 225Xa-225Xh, 225Xn, 225Xo, 225Ya, 225Yc, 232Xb,
233Xc, 244Yi, 252Ye, 256Xg, 262Bc, 263I, 264Yp, 265Ya, 274Xi, 282R, 282Yf, 283Ci
adapted martingale 275A
stopping time 275L
additive functional on an algebra of sets see finitely additive (136Xg, 231A), countably additive (231C)
adjoint operator 243Fc, 243 notes
algebra see algebra of sets (136E), Banach algebra (2A4Jb), normed algebra (2A4J)
algebra of sets 113Yi, 136E, 136F, 136G, 136Xg, 136Xh, 136Xk, 136Ya, 136Yb, 231A, 231B, 231Xa; see also
$\sigma$-algebra (111A)
almost continuous function 256F
almost every, almost everywhere 112Dd
almost surely 112De
analytic (complex) function 133Xc
angelic topological space 2A5J
Archimedean Riesz space 241F, 241Yb, 242Xc
area see surface measure
arithmetic mean 266A
asymptotic density 273G
asymptotically equidistributed see equidistributed (281Yi)
atom (in a measure space) 211I, 211Xb, 246G
atomic see purely atomic (211K)
atomless measure (space) 211J, 211Md, 211Q, 211Xb, 211Yd, 211Ye, 212G, 213He, 214K, 214Xd, 215D, 215Xe,
215Xf, 216A, 216Ya, 234Be, 234Nf, 234Yi, 251U, 251Wo, 251Xs, 251Xt, 251Yd, 252Yo, 252Yq, 252Ys, 254V, 254Yf,
256Xd, 264Yg
automorphism see measure space automorphism
axiom see Banach-Ulam problem, choice (2A1J), countable choice
ball (in $\mathbb{R}^r$) 252Q, 252Xi, 264H; see also sphere
Banach algebra >>>2A4Jb
Banach lattice 242G, 242Xc, 242Yc, 242Ye, 243E, 243Xb; see also $L^p$
Banach space 231Yh, 262Yi, 2A4D, 2A4E, 2A4I; see also Banach algebra (2A4Jb), Banach lattice (242G),
separable Banach space
Banach-Ulam problem 232Hc
Berry-Esseen theorem 274Hc, 285 notes
Bienaymé's equality 272S
bilinear map 253 (253A), 255 notes
Bochner integral 253Yf, 253Yg, 253Yi
Bochner's theorem 285Xr
Borel algebra see Borel $\sigma$-algebra (111Gd, 135C, 256Ye)
Borel-Cantelli lemma 273K
Borel measurable function 121C, 121D, 121Eg, 121H, 121K, 121Yd, 134Fd, 134Xg, 134Yt, 135Ef, 135Xc, 135Xe,
225H, 225J, 225Yf, 233Hc, 241Be, 241I, 241Xd, 256M
Borel measure (on R) 211P, 216A
Borel sets in $\mathbb{R}$, $\mathbb{R}^r$ 111G, 111Yd, 114G, 114Yd, 115G, 115Yb, 115Yd, 121Ef, 121K, 134F, 134Xd, 135C, 136D,
136Xj, 254Xs, 225J, 264E, 264F, 264Xb, 266Bc
Borel $\sigma$-algebra (of subsets of $\mathbb{R}^r$) 111Gd, 114Yg-114Yi, 121J, 121Xd, 121Xe, 121Yc, 121Yd, 212Xc, 212Xd, 216A,
251M;
(of other spaces) 135C, 135Xb, 256Ye, 271Ya
bounded bilinear operator 253Ab, 253E, 253F, 253L, 253Xb, 253Yf
bounded linear operator 242Je, 253Ab, 253F, 253Gc, 253L, 253Xc, 253Yf, 253Yj, 2A4F, 2A4G-2A4I, 2A5If; see
also B(U; V ) (2A4F)
bounded set (in $\mathbb{R}^r$) 134E; (in a normed space) 2A4Bc; (in a linear topological space) 245Yf; see also order-bounded
(2A1Ab)
bounded support, function with see compact support
bounded variation, function of 224 (224A, 224K), 225Cb, 225M, 225Oc, 225Xh, 225Xn, 225Yc, 226Bc, 226C,
226Yd, 263Ye, 264Yp, 265Yb, 282M, 282O, 283L, 283Xj, 283Xk, 283Xm, 283Xn, 283Xq, 284Xk, 284Yd
Brunn-Minkowski inequality 266C
Cantor function 134H, 134I, 222Xd, 225N, 226Cc, 262K, 264Xf, 264Yn
Cantor measure 256Hc, 256Ia, 256Xk, 264Ym
Cantor set 134G, 134H, 134I, 134Xf, 256Hc, 256Xk, 264J, 264Ym, 264Yn; see also $\{0, 1\}^I$
Carathéodory complete measure space see complete (211A)
Carathéodory's method (of constructing measures) 113C, 113D, 113Xa, 113Xd, 113Xg, 113Yc, 114E, 114Xa, 115E,
121Ye, 132Xc, 136Ya, 212A, 212Xf, 213C, 213Xa, 213Xb, 213Xd, 213Xf, 213Xg, 213Xj, 213Ya, 214H, 214Xb, 216Xb,
251C, 251Wa, 251Xe, 264C, 264K
cardinal 2A1Kb, 2A1L
Carleson's theorem 282K remarks, 282 notes, 284Yg, 286U, 286V
Cauchy distribution 285Xm
Cauchy filter 2A5F, 2A5G
Cauchy's inequality 244Eb
Cauchy sequence 242Yc, 2A4D
Central Limit Theorem 274G, 274I-274K, 274Xc-274Xg, 274Xk, 274Xl, 285N, 285Xn, 285Ym
Cesàro sums 273Ca, 282Ad, 282N, 282Xn
chain rule for Radon-Nikodým derivatives 235Xh
change of variable in integration 235, 263A, 263D, 263F, 263G, 263I, 263Xc, 263Xe, 263Yc
characteristic function (of a set) 122Aa
(of a probability distribution) 285 (285Aa)
(of a random variable) 285 (285Ab)
Chebyshev's inequality 271Xa
choice, axiom of 134C, 1A1G, 254Da, 2A1J, 2A1K-2A1M; see also countable choice
choice function 2A1J
circle group 255M, 255Ym
closed convex hull 253Gb, 2A5Eb
closed interval (in $\mathbb{R}$ or $\mathbb{R}^r$) 114G, 115G, 1A1A; (in a general partially ordered space) 2A1Ab
closed set (in a topological space) 134Fb, 134Xd, 1A2E, 1A2F, 1A2G, 2A2A, 2A3A, 2A3D, 2A3Nb
closure of a set 2A2A, 2A2B, 2A3D, 2A3Kb, 2A5E; see also essential closure (266B)
cluster point (of a sequence) 2A3O
Coarea Theorem 265 notes
cofinite see finite-cofinite algebra (231Xa)
commutative (ring, algebra) 2A4Jb
commutative Banach algebra 224Yb, 243Dd, 255Xi, 257Ya
compact set (in a topological space) 247Xc, 2A2D, 2A2E-2A2G, >>>2A3N, 2A3R; see also relatively compact
(2A3Na), relatively weakly compact (2A5Id), weakly compact (2A5Ic)
compact support, function with 242Xh, 256Be, 256D, 262Yd-262Yg; see also $C_k(X)$, $C_k(X; \mathbb{C})$
compact support, measure with 284Yi
compact topological space >>>2A3N
complete linear topological space 245Ec, 2A5F, 2A5H
complete locally determined measure (space) 213C, 213D, 213H, 213J, 213L, 213O, 213Xg, 213Xi, 213Xl, 213Yd,
213Ye, 214I, 216D, 216E, 234Nb, 251Yc, 252B, 252D, 252N, 252Xc, 252Ye, 252Ym, 252Yq-252Ys, 252Yv, 253Yj,
253Yk; see also c.l.d. version (213E)
complete measure (space) 112Df, 113Xa, 122Ya, 211A, 211M, 211N, 211R, 211Xc, 211Xd, 212, 214I, 214K, 216A,
216C-216E, 216Ya, 234Ha, 234I, 234Ld, 234Ye, 235Xl, 254Fd, 254G, 254J, 264Dc; see also complete locally determined
measure
complete metric space 224Ye; see also Banach space (2A4D)
complete normed space see Banach space (2A4D), Banach lattice (242G)
complete Riesz space see Dedekind complete (241Fc)
completed indefinite-integral measure see indefinite-integral measure (234J)
completion (of a measure (space)) 212 (212C), 213Fa, 213Xa, 213Xb, 213Xk, 214Ib, 214Xi, 216Yd, 232Xe, 234Ba,
234Ke, 234Lb, 234Xc, 234Xl, 234Ye, 234Yo, 235D, 241Xb, 242Xb, 243Xa, 244Xa, 245Xb, 251T, 251Wn, 251Xr, 252Ya,
254I, 256C
complex-valued function 133
component (in a topological space) 111Ye
conditional expectation 233D, 233E, 233J, 233K, 233Xg, 233Yc, 233Ye, 235Yb, 242J, 246Ea, 253H, 253Le, 275Ba,
275H, 275I, 275K, 275Ne, 275Xi, 275Ya, 275Yk, 275Yl
conditional expectation operator 242Jf, 242K, 242L, 242Xe, 242Xj, 242Yk, 243J, 244M, 244Yl, 246D, 254R, 254Xp,
275Xd, 275Xe
conegligible set 112Dc, 211Xg, 212Xi, 214Cc, 234Hb
connected set 222Yb
continuity, points of 224H, 224Ye, 225J
continuous function 121D, 121Yd, 262Ia, 2A2C, 2A2G, 2A3B, 2A3H, 2A3Nb, 2A3Qb
continuous linear functional 284Yj; see also dual linear space (2A4H)
continuous linear operator 2A4Fc; see also bounded linear operator (2A4F)
continuous measure see atomless (211J), absolutely continuous (232Aa)
continuous see also semi-continuous (225H)
continuum see c (2A1L)
convergence in distribution 285Yq, 285Yr
convergence in mean (in $\mathcal{L}^1(\mu)$ or $L^1(\mu)$) 245Ib
convergence in measure (in $L^0(\mu)$) 245 (>>>245A), 246J, 246Yc, 247Ya, 253Xe, 255Yh, 285Yr
(in $\mathcal{L}^0(\mu)$) 245 (>>>245A), 271Yd, 272Yd, 274Yd, 285Xs
(in the algebra of measurable sets) 232Ya
(of sequences) 245Ad, 245C, 245H-245L, 245Xd, 245Xf, 245Xl, 245Yh, 246J, 246Xh, 246Xi, 271L, 273Ba, 275Xk,
275Yp, 276Yf
convergent almost everywhere 245C, 245K, 273Ba, 276G, 276H; see also strong law
convergent filter 2A3Q, 2A3S, 2A5Ib;
sequence 135D, 245Yi, 2A3M, 2A3Sg; see also convergence in measure (245Ad)
convex function 233G, 233H-233J, 233Xb-233Xf (233Xd), 233Xh, 233Ya, 233Yc, 233Yd, 233Yg, 233Yi, 233Yj,
242K, 242Yi, 242Yj, 242Yk, 244Xm, 244Yh, 244Ym, 255Yj, 275Yg; see also mid-convex (233Ya)
convex hull 2A5E; see also closed convex hull (2A5Eb)
convex set 233Xd, 244Yk, 233Yf-233Yj, 262Xh, 264Xg, 266Xc, 2A5E
convolution in $L^0$ 255Xh, 255Xi, 255Xj, 255Yh, 255Yj
convolution of functions 255E, 255F-255K, 255O, 255Xa-255Xg, 255Ya, 255Yc, 255Yd, 255Yf, 255Yg, 255Yk,
255Yl, 255Yn, 262Xj, 262Yd, 262Ye, 262Yh, 263Ya, 282Q, 282Xt, 283M, 283Wd, 283Wf, 283Wj, 283Xl, 284J, 284K,
284O, 284Wf, 284Wi, 284Xb, 284Xd
convolution of measures 257 (257A), 272T, 285R, 285Yn
convolution of measures and functions 257Xe, 284Xo, 284Yi
convolution of sequences 255Xk, 255Yo, 282Xq
countable (set) 111F, 114G, 115G, 1A1, 226Yc
countable choice, axiom of 134C, 211P
countable-cocountable algebra 211R, 211Ya, 232Hb
countable-cocountable measure 211R, 232Hb, 252L
countable sup property (in a Riesz space) 241Yd, 242Yd, 242Ye, 244Yb
countably additive functional (on a $\sigma$-algebra of sets) 231 (231C), 232, 246Yg, 246Yi
counting measure 112Bd, 122Xd, 122 notes, 211N, 211Xa, 213Xb, 226A, 241Xa, 242Xa, 243Xl, 244Xh, 244Xn,
245Xa, 246Xc, 246Xd, 251Xc, 251Xi, 252K, 255Yo, 264Db
cover see measurable envelope (132D)
Covering Lemma see Lebesgue Covering Lemma
covering theorem 221A, 261B, 261F, 261Xc, 261Ya, 261Yi, 261Yk
cylinder (in measurable cylinder) 254Aa, 254F, 254G, 254Q, 254Xa
decimal expansions 273Xg
decomposable measure (space) see strictly localizable (211E)
decomposition (of a measure space) 211E, 211Ye, 213O, 213Xh, 214Ia, 214L, 214N, 214Xh
see also Hahn decomposition (231F), Jordan decomposition (231F), Lebesgue decomposition (232I, 226C)
decreasing rearrangement (of a function) 252Yo
Dedekind complete ordered set 135Ba, 234Yo
Riesz space 241Fc, 241G, 241Xf, 242H, 242Yc, 243Hb, 243Xj, 244L
Dedekind $\sigma$-complete Riesz space 241Fb, 241G, 241Xe, 241Yb, 241Yh, 242Yg, 243Ha, 243Xb
delta function see Dirac's delta function (284R)
delta system see $\Delta$-system
Denjoy-Young-Saks theorem 222L
dense set in a topological space 136H, 242Mb, 242Ob, 242Pd, 242Xi, 243Ib, 244H, 244Pb, 244Xk, 244Yj, 254Xo,
281Yc, 2A3U, 2A4I
density function (of a random variable) 271H, 271I-271K, 271Xc-271Xe, 272U, 272Xd, 272Xj; see also Radon-Nikodým
derivative (232Hf, 234J)
density point 223B, 223Xc, 223Yb, 266Xb
density topology 223Yb, 223Yc, 223Yd, 261Yf, 261Yl
density see also asymptotic density (273G), Lebesgue's Density Theorem (223A)
derivate see Dini derivate (222J)
derivative of a function (of one variable) 222C, 222E-222J, 222Yd, 225J, 225L, 225Of, 225Xc, 226Be, 282R; (of many
variables) 262F, 262G, 262P; see partial derivative
determinant of a matrix 2A6A
determined see locally determined measure space (211H), locally determined negligible sets (213I)
determined by coordinates (in W is determined by coordinates in J) 254M, 254O, 254R-254T, 254Xp, 254Xr
Devil's Staircase see Cantor function (134H)
differentiability, points of 222H, 225J
differentiable function (of one variable) 123D, 222A, 224I, 224Kg, 224Yc, 225L, 225Of, 225Xc, 225Xn, 233Xc, 252Ye,
255Xd, 255Xe, 262Xk, 265Xd, 274E, 282L, 282Rb, 282Xs, 283I-283K, 283Xm, 284Xc, 284Xk; (of many variables)
262Fa, 262Gb, 262I, 262Xg, 262Xi, 262Xj; see also derivative
differentiable relative to its domain 222L, 262Fb, 262I, 262M-262Q, 262Xd-262Xf, 262Yc, 263D, 263Xc, 263Xd,
263Yc, 265E, 282Xk
differentiating through an integral 123D
diffused measure see atomless measure (211J)
dilation 286C
dimension see Hausdorff dimension (264 notes)
Dini derivates 222J, 222L, 222Ye, 225Yf
Dirac measure 112Bd, 257Xa, 284R, 284Xn, 284Xo, 285H, 285Xp
direct image (of a set under a function or relation) 1A1B
direct sum of measure spaces 214L, 214M, 214Xh-214Xk, 241Xg, 242Xd, 243Xe, 244Xf, 245Yh, 251Xi, 251Xj
directed set see downwards-directed (2A1Ab), upwards-directed (2A1Ab)
Dirichlet kernel 282D; see also modified Dirichlet kernel (282Xc)
disintegration of a measure 123Ye
disjoint family (of sets) 112Bb
disjoint sequence theorems (in topological Riesz spaces) 246G, 246Ha, 246Yd-246Yf, 246Yj
distribution see Schwartzian distribution, tempered distribution
distribution of a random variable 241Xc, 271E-271G, 272G, 272T, 272Xe, 272Ya, 272Yb, 272Yf, 285H, 285Xg; see
also Cauchy distribution (285Xm), empirical distribution (273 notes), Poisson distribution (285Xo)
of a finite family of random variables 271B, 271C, 271D-271I, 272G, 272Yb, 272Yc, 285Ab, 285C, 285Mb
distribution function of a random variable >>>271G, 271L, 271Xb, 271Yb-271Yd, 272Xe, 272Ya, 273Xh, 273Xi,
274F-274L, 274Xd, 274Xg, 274Xh, 274Ya, 274Yc, 285P
Dominated Convergence Theorem see Lebesgues Dominated Convergence Theorem (123C)
Doob's Martingale Convergence Theorem 275G
downwards-directed partially ordered set 2A1Ab
dual normed space 243G, 244K, 2A4H
Dynkin class 136A, 136B, 136Xb
Eberlein's theorem 2A5J
Egorov's theorem 131Ya, 215Yb
empirical distribution 273Xi, 273 notes
envelope see measurable envelope (132D)
equidistributed sequence (in a topological probability space) 281N, 281Yi, 281Yj, 281Yk; see also well-distributed
(281Ym)
Equidistribution Theorem see Weyl's Equidistribution Theorem (281N)
equipollent sets 2A1G, 2A1Ha; see also cardinal (2A1Kb)
equiveridical 121B, 212B
essential closure 266B, 266Xa, 266Xb
essential supremum of a family of measurable sets 211G, 213K, 215B, 215C
of a real-valued function 243D, 243I, 255K
essentially bounded function 243A
Etemadi's lemma 272V
Euclidean metric (on $\mathbb{R}^r$) 2A3Fb
Euclidean topology 1A2, 2A2, 2A3Ff, 2A3Tc
even function 255Xb, 283Yb, 283Yc
exchangeable family of random variables 276Xe
exhaustion, principle of 215A, 215C, 215Xa, 215Xb, 232E
expectation of a random variable 271Ab, 271E, 271F, 271I, 271Xa, 271Ye, 272R, 272Xb, 272Xi, 274Xi, 285Ga,
285Xo, 285Xt; see also conditional expectation (233D)
extended real line 121C, 135
extension of finitely additive functionals 113Yi
extension of measures 113Yc, 113Yh, 132Yd, 212Xh, 214P, 214Xm, 214Xn, 214Ya-214Yc; see also completion
(212C), c.l.d. version (213E)
fair-coin probability 254J
Fatou's Lemma 123B, 133K, 135Gb, 135Hb
Fatou norm on a Riesz space 244Yg
Fejér integral 283Xf, 283Xh-283Xj
Fejér kernel 282D
Fejér sums 282Ad, 282B-282D, 282G-282I, 282Yc
Feller, W. chap. 27 intro.
field (of sets) see algebra (136E)
filter 211Xg, 2A1I, 2A1N, 2A1O; see also Cauchy filter (2A5F), convergent filter (2A3Q), Fréchet filter (2A3Sg),
ultrafilter (2A1N)
filtration (of $\sigma$-algebras) 275A
finite-cofinite algebra 231Xa, 231Xc
finitely additive function(al) on an algebra of sets 136Xg, 136Ya, 136Yb, 231A, 231B, 231Xb-231Xe, 231Ya-231Yh,
232A, 232B, 232E, 232G, 232Ya, 232Ye, 232Yg, 243Xk; see also countably additive
Fourier coefficients 282Aa, 282B, 282Cb, 282F, 282Ic, 282J, 282M, 282Q, 282R, 282Xa, 282Xg, 282Xq, 282Xt,
282Ya, 283Xu, 284Ya, 284Yg
Fourier's integral formula 283Xm
Fourier series 121G, 282 (282Ac)
Fourier sums 282Ab, 282B-282D, 282J, 282L, 282O, 282P, 282Xi-282Xk, 282Xp, 282Xt, 282Yd, 286V, 286Xb
Fourier transform 133Xd, 133Yc, 283 (283A, 283Wa), 284 (284H, 284Wd), 285Ba, 285D, 285Xd, 285Ya,
286U, 286Ya
Fourier-Stieltjes transform see characteristic function (285A)
Fréchet filter 2A3Sg
323Ad
Fubini's theorem 252B, 252C, 252H, 252R, 252Yb, 252Yc
full outer measure 132F, 132Yd, 134D, 134Yt, 212Eb, 214F, 214J, 234F, 234Xa, 241Yg, 243Ya, 254Yf
function 1A1B
Fundamental Theorem of Calculus 222E, 222H, 222I, 225E
Fundamental Theorem of Statistics 273Xi, 273 notes
gamma function 225Xj, 225Xk, 252Xi, 252Yf, 252Yu, 255Xg
Gaussian distribution see standard normal distribution (274Aa)
Gaussian random variable see normal random variable (274Ad)
generated ($\sigma$-)algebra of sets 111Gb, 111Xe, 111Xf, 121J, 121Xd, 121Yc, 136B, 136C, 136G, 136H, 136Xc, 136Xk,
136Yb, 136Yc, 251D, 251Xa, 272C
geometric mean 266A
Glivenko-Cantelli theorem 273 notes
graph of a function 264Xf, 265Yb
group 255Yn, 255Yo; see also circle group
Hahn decomposition (for countably additive functionals) 231E
Hahn decomposition of an additive functional 231Eb, 231F
see also Vitali-Hahn-Saks theorem (246Yi)
half-open interval (in $\mathbb{R}$ or $\mathbb{R}^r$) 114Aa, 114G, 114Yj, 115Ab, 115Xa, 115Xc, 115Yd
Hanner's inequalities 244O, 244Ym
Hardy-Littlewood Maximal Theorem 286A
Hausdorff dimension 264 notes
Hausdorff measure 264 (264C, 264Db, 264K, 264Yo), 265Yb; see also normalized Hausdorff measure (265A)
Hausdorff metric (on a space of closed subsets) 246Yb
Hausdorff outer measure 264 (264A, 264K, 264Yo)
Hausdorff topology 2A3E, 2A3L, 2A3Mb, 2A3S
Hilbert space 244Na, 244Yk
Hoeffding's theorem 272W, 272Xk, 272Xl
Hölder continuous function 282Xj, 282Yb
Hölder's inequality 244Eb, 244Yc, 244Ym
hull see convex hull (2A5Ea), closed convex hull (2A5Eb)
ideal in an algebra of sets 214O, 232Xc; see also $\sigma$-ideal (112Db)
identically distributed random variables 273I, 273Xi, 274Xk, 274 notes, 276Xf, 276Yg, 285Xn, 285Yc; see also
exchangeable sequence (276Xe)
image lter 212Xi, 2A1Ib, 2A3Qb, 2A3S
image measure 112Xf, 123Ya, 132Yb, 212Xf, 212Xi, 234C-234F (234D), 234Xb-234Xd, 234Ya, 234Yc, 234Yd, 235J,
235Xl, 235Ya, 254Oa, 256G
image measure catastrophe 235H
indefinite integral 131Xa, 222D-222F, 222H, 222I, 222Xa-222Xc, 222Yc, 224Xg, 225E, 225Od, 225Xh, 232D, 232E,
232Yf, 232Yi; see also indefinite-integral measure
indefinite-integral measure 234I-234O (234J), 234Xh-234Xk, 234Yi, 234Yj, 234Yl-234Yn, 235K, 235N, 235Xh,
235Xl, 235Xm, 253I, 256E, 256J, 256L, 256Xe, 256Yd, 257F, 257Xe, 263Ya, 275Yi, 275Yj, 285Dd, 285Xe, 285Ya;
see also uncompleted indefinite-integral measure
independence 272 (272A)
independent random variables 272Ac, 272D-272I, 272L, 272M, 272P, 272R-272W, 272Xb, 272Xd, 272Xe, 272Xh-
272Xl, 272Ya, 272Yb, 272Yd-272Yh, 273B, 273D, 273E, 273H, 273I, 273L-273N, 273Xi, 273Xk, 273Xm, 274B-274D,
274F-274K, 274Xc, 274Xd, 274Xg, 274Xj, 274Xl, 274Ye, 275B, 275Yh, 275Yq, 275Yr, 276Af, 285I, 285Xf, 285Xg,
285Xm-285Xo, 285Xs, 285Yc, 285Yk, 285Yl
independent sets 272Aa, 272Bb, 272F, 272N, 273F, 273K
independent $\sigma$-algebras 272Ab, 272B, 272D, 272F, 272J, 272K, 272M, 272O, 272Q, 275Ym
induced topology see subspace topology (2A3C)
inductive definitions 2A1B
inequality see Cauchy, Chebyshev, Hölder, isodiametric, Jensen, Wirtinger, Young
infinity 112B, 133A, 135
initial ordinal 2A1E, 2A1F, 2A1K
inner measure 113Yg, 212Ya, 213Xe, 213Yc
defined by a measure 113Yh, 213Xe
inner product space 244N, 244Yn, 253Xe
inner regular measure 256A
with respect to closed sets 134Xe, 256Bb, 256Ya
with respect to compact sets 134Fb, 256Bc
integrable function 122 (122M), 123Ya, 133B, 133Db, 133Dc, 133F, 133J, 133Xa, 135Fa, 212Bc, 212Fb, 213B,
213G; see also Bochner integrable function (253Yf), $\mathcal{L}^1(\mu)$ (242A)
integral 122 (122E, 122K, 122M); see also integrable function, Lebesgue integral (122Nb), lower integral (133I),
Riemann integral (134K), upper integral (133I)
integration by parts 225F, 225Oe, 252Xb
integration by substitution see change of variable in integration
interior of a set 2A3D, 2A3Ka
interpolation see Riesz Convexity Theorem
interval see half-open interval (114Aa, 115Ab), open interval (111Xb)
inverse Fourier transform 283Ab, 283B, 283Wa, 283Xb, 284I; see also Fourier transform
inverse image (of a set under a function or relation) 1A1B
inverse-measure-preserving function 134Yl-134Yn, 234A, 234B, 234Ea, 234F, 234Xa, 235G, 235Xm, 241Xg, 242Xd,
243Xn, 244Xo, 246Xf, 251L, 251Wp, 251Yb, 254G, 254H, 254Ka, 254O, 254Xc-254Xf, 254Xh, 254Yb; see also image
measure (234D)
Inversion Theorem (for Fourier series and transforms) 282G-282J, 282L, 282O, 282P, 283I, 283L, 284C, 284M; see
also Carleson's theorem
isodiametric inequality 264H
isomorphism see measure space isomorphism
Jacobian 263Ea
Jensen's inequality 233I, 233J, 233Yi, 233Yj, 242Yi
joint distribution see distribution (271C)
Jordan decomposition of an additive functional 231F, 231Ya, 232C
kernel see Dirichlet kernel (282D), Fejér kernel (282D), modified Dirichlet kernel (282Xc), Poisson kernel (282Yg)
Kirzbraun's theorem 262C
Kolmogorov's Strong Law of Large Numbers 273I, 275Yn
Komlós's theorem 276H, 276Yh
König H. 232Yk
Kronecker's lemma 273Cb
Lacey-Thiele Lemma 286M
Laplace's central limit theorem 274Xe
Laplace transform 123Xc, 123Yb, 133Xc, 225Xe
lattice 2A1Ad; see also Banach lattice (242G)
norm see Riesz norm (242Xg)
law of a random variable see distribution (271C)
law of large numbers see strong law (273)
law of rare events 285Q
Lebesgue, H. Vol. 1 intro., chap. 27 intro.
Lebesgue Covering Lemma 2A2Ed
Lebesgue decomposition of a countably additive functional 232I, 232Yb, 232Yg, 256Ib
Lebesgue decomposition of a function of bounded variation 226C, 226Dc, 226Ya, 232Yb, 263Ye
Lebesgue's Density Theorem 223, 261C, 275Xg
Lebesgue's Dominated Convergence Theorem 123C, 133G
Lebesgue extension see completion (212C)
Lebesgue integrable function 122Nb, 122Yb, 122Ye, 122Yf
Lebesgue integral 122Nb
Lebesgue measurable function 121C, 121D, 134Xg, 134Xj, 225H, 233Yd, 262K, 262P, 262Yc
Lebesgue measurable set 114E, 114F, 114G, 114Xe, 114Ye, 115E, 115F, 115G
Lebesgue measure (on R) 114 (114E), 131Xb, 133Xc, 133Xd, 134G-134L, 212Xc, 216A, chap. 22, 242Xi, 246Yd,
246Ye, 252N, 252O, 252Xf, 252Xg, 252Ye, 252Yo, 255
(on $\mathbb{R}^r$) 115 (115E), 132C, 132Ef, 133Yc, 134, 211M, 212Xd, 245Yj, 251N, 251Wi, 252Q, 252Xi, 252Yi,
254Xk, 255A, 255L, 255Xj, 255Ye, 255Yf, 256Ha, 256J-256L, 264H, 264I, 266C
(on [0, 1], [0, 1[) 211Q, 216Ab, 234Ya, 252Yp, 254K, 254Xh, 254Xj-254Xl
(on other subsets of $\mathbb{R}^r$) 131B, 234Ya, 234Yc, 242O, 244Hb, 244I, 244Yi, 246Yf, 246Yl, 251R, 252Yh,
255M-255O, 255Yg, 255Ym
Lebesgue negligible set 114E, 115E, 134Yk
Lebesgue outer measure 114C, 114D, 114Xc, 114Yd, 115C, 115D, 115E, 115Xb, 115Xd, 115Yb, 132C, 134A, 134D,
134Fa
Lebesgue set of a function 223D, 223Xh-223Xj, 223Yg, 223Yh, 261E, 261Ye, 282Yg
Lebesgue-Stieltjes measure 114Xa, 114Xb, 114Yb, 114Yc, 114Yf, 131Xc, 132Xg, 134Xd, 211Xb, 212Xd, 212Xg,
225Xf, 232Xb, 232Yb, 235Xb, 235Xf, 235Xg, 252Xb, 256Xg, 271Xb, 224Yh
length of a curve 264Yl, 265Xd, 265Ya
length of an interval 114Ab
B.Levi's theorem 123A, 123Xa, 133K, 135Ga, 135Hb, 226E, 242E
Levi property of a normed Riesz space 242Yb, 244Yf
Lévy's martingale convergence theorem 275I
Lévy's metric 274Ya, 285Yd
Liapounoff's central limit theorem 274Xg
limit of a filter 2A3Q, 2A3R, 2A3S
limit of a sequence 2A3M, 2A3Sg
limit ordinal 2A1Dd
Lindeberg's central limit theorem 274F-274H, 285Ym
Lindeberg's condition 274H
linear operator 262Gc, 263A, 265B, 265C, 2A6; see also bounded linear operator (2A4F), continuous linear operator
linear order see totally ordered set (2A1Ac)
linear space topology see linear topological space (2A5A), weak topology (2A5I), weak* topology (2A5Ig)
linear subspace (of a normed space) 2A4C; (of a linear topological space) 2A5Ec
linear topological space 245D, 284Ye, 2A5A, 2A5B, 2A5C, 2A5E-2A5I
Lipschitz constant 262A, 262C, 262Yi
Lipschitz function 224Ye, 225Yc, 262A, 262B-262E, 262N, 262Q, 262Xa-262Xc, 262Xh, 262Yi, 263F, 264G, 264Yj;
see also Hölder continuous (282Xj)
local convergence in measure see convergence in measure (245A)
localizable measure (space) 211G, 211L, 211Ya, 211Yb, 212Ga, 213Hb, 213L-213N, 213Xl, 213Xm, 214Ie, 214K,
214Xa, 214Xc, 214Xe, 216C, 216E, 216Ya, 216Yb, 234Nc, 234O, 234Yk-234Yn, 241G, 241Ya, 243Gb, 243Hb, 245Ec,
245Yf, 252Yp, 252Yq, 252Ys, 254U; see also strictly localizable (211E)
locally determined measure (space) 211H, 211L, 211Ya, 214Xb, 216Xb, 216Ya, 216Yb, 251Xd, 252Ya; see also
complete locally determined measure
locally determined negligible sets 213I, 213J-213L, 213Xj-213Xl, 214Ic, 214Xf, 214Xg, 216Yb, 234Yl, 252E, 252Yb
locally finite measure 256A, 256C, 256G, 256Xa, 256Ya
locally integrable function 242Xi, 255Xe, 255Xf, 256E, 261E, 261Xa, 262Yg
Loève, M. chap. 27 intro.
lower integral 133I, 133J, 133Xf, 133Yd, 135Ha, 214Jb, 235A
lower Lebesgue density 223Yf
lower Riemann integral 134Ka
lower semi-continuous function 225H, 225I, 225Xl, 225Xm, 225Yd, 225Ye
Lusin's theorem 134Yd, 256F
Markov time see stopping time (275L)
martingale 275 (275A, 275Cc, 275Cd, 275Ce); see also reverse martingale
martingale convergence theorems 275G-275I, 275K, 275Xf
martingale difference sequence 276A, 276B, 276C, 276E, 276Xd, 276Xg, 276Ya, 276Yb, 276Ye, 276Yg
martingale inequalities 275D, 275F, 275Xb, 275Yc-275Ye, 276Xb
maximal element in a partially ordered set 2A1Ab
maximal theorems 275D, 275Yc, 275Yd, 276Xb, 286A, 286T
mean (of a random variable) see expectation (271Ab)
Mean Ergodic Theorem see convergence in mean (245Ib)
measurable cover see measurable envelope (132D)
measurable envelope 132D, 132E, 132F, 132Xg, 134Fc, 134Xd, 213K-213M, 214G, 216Yc, 266Xa; see also full outer
measure (132F)
measurable envelope property 213Xl, 214Xk
measurable function (taking values in R) 121 (121C), 122Ya, 212B, 212Fa, 213Yd, 214Ma, 214Na, 235C, 235I,
252O, 252P, 256F, 256Yb, 256Yc
(taking values in $\mathbb{R}^r$) 121Yd, 256G
(taking values in other spaces) 133Da, 133E, 133Yb, 135E, 135Xd, 135Yf
(($\Sigma$, $\mathrm{T}$)-measurable function) 121Yc, 235Xc, 251Ya, 251Yd
see also Borel measurable, Lebesgue measurable
measurable set 112A; $\mu$-measurable set 212Cd; see also relatively measurable (121A)
measurable space 111Bc
measurable transformation 235; see also inverse-measure-preserving function
measure 112A
(in '$\mu$ measures $E$', '$E$ is measured by $\mu$') 112Be
measure algebra 211Yb, 211Yc
function see inverse-measure-preserving function (234A), measure space automorphism, measure space
isomorphism
measure space 112 (112A)
measure space automorphism 255A, 255L-255N, 255 notes
measure space isomorphism 254K, 254Xj-254Xl
median function 2A1Ac
metric 2A3F, 2A4Fb; Euclidean metric (2A3Fb), Hausdorff metric (246Yb), Lévy's metric (274Ya), pseudometric (2A3F)
metric outer measure 264Xb, 264Yc
metric space 224Ye, 261Yi; see also complete metric space, metrizable space (2A3Ff)
metrizable (topological) space 2A3Ff, 2A3L; see also metric space, separable metrizable
mid-convex function 233Ya, 233Yd
minimal element in a partially ordered set 2A1Ab
Minkowski's inequality 244F, 244Yc, 244Ym; see also Brunn-Minkowski inequality (266C)
modified Dirichlet kernel 282Xc
modulation 286C
Monotone Class Theorem 136B
Monotone Convergence Theorem see B.Levi's theorem (133A)
monotonic function 121D, 222A, 222C, 222Yb, 224D
Monte Carlo integration 273J, 273Ya
multilinear operator 253Xc
negligible set 112D, 131Ca, 214Cb, 234Hb; see also Lebesgue negligible (114E, 115E), null ideal (112Db)
Nikodým see Radon-Nikodým
non-decreasing sequence of sets 112Ce
non-increasing sequence of sets 112Cf
non-measurable set 134B, 134D, 134Xc
norm 2A4B; (of a linear operator) 2A4F, 2A4G, 2A4I; (of a matrix) 262H, 262Ya; (norm topology) 242Xg, 2A4Bb
normal density function 274A, 283N, 283We, 283Wf
normal distribution 274Ad, 495Xh; see also standard normal (274A)
normal distribution function 274Aa, 274F-274K, 274M, 274Xe, 274Xg
normal random variable 274A, 274B, 274Xi, 285E, 285Xm, 285Xn, 285Xt
normalized Hausdorff measure 264 notes, 265 (265A)
normed algebra 2A4J; see also Banach algebra (2A4Jb)
normed space 224Yf, 2A4 (2A4Ba); see also Banach space (2A4D)
null ideal of a measure 112Db, 251Xp
null set see negligible (112Da)
odd function 255Xb, 283Yd
open interval 111Xb, 114G, 115G, 1A1A, 2A2I
open set (in $\mathbb{R}^r$) 111Gc, 111Yc, 114Yd, 115G, 115Yb, 133Xb, 134Fa, 134Xe, 135Xa, 1A2A, 1A2B, 1A2D; (in $\mathbb{R}$)
111Gc, 111Ye, 114G, 134Xd, 2A2I; (in other topological spaces) 256Ye, 2A3A, 2A3G; see also topology (2A3A)
optional time see stopping time (275L)
order-bounded set (in a partially ordered space) 2A1Ab
order-complete see Dedekind complete (241Ec)
order-continuous norm (on a Riesz space) 242Yc, 242Ye, 244Ye
order*-convergent sequence (in $L^0(\mu)$) 245C, 245K, 245L, 245Xc, 245Xd
order unit (in a Riesz space) 243C
ordered set see partially ordered set (2A1Aa), totally ordered set (2A1Ac), well-ordered set (2A1Ae)
ordering of measures 234P, 234Q, 234Xl, 234Yo
ordinal 2A1C, 2A1D-2A1F, 2A1K
ordinate set 252N, 252Yl, 252Ym, 252Yv
orthogonal matrix 2A6B, 2A6C
orthogonal projection in Hilbert space 244Nb, 244Yk, 244Yl
orthonormal vectors 2A6B
outer measure 113 (113A), 114Xd, 132B, 132Xg, 136Ya, 212Xf, 213C, 213Xa, 213Xg, 213Xk, 213Ya, 251B,
251Wa, 251Xe, 254B, 264B, 264Xa, 264Ya, 264Yo; see also Lebesgue outer measure (114C, 115C), metric outer
measure (264Yc), regular outer measure (132Xa)
dened from a measure 113Ya, 132A-132E (132B), 132Xa-132Xi, 132Xk, 132Ya-132Yc, 133Je, 212Ea,
212Xa, 212Xb, 213C, 213F, 213Xa, 213Xg-213Xj, 213Xk, 213Yd, 214Cd, 215Yc, 234Bf, 234Ya, 251P, 251S, 251Wk,
251Wm, 251Xn, 251Xq, 252D, 252I, 252O, 252Ym, 254G, 254L, 254S, 254Xb, 254Xq, 254Yd, 264Fb, 264Ye
outer regular measure 134Fa, 134Xe, 256Xi
Parseval's identity 284Qd; see also Plancherel's theorem
partial derivative 123D, 252Ye, 262I, 262J, 262Xh, 262Yb, 262Yc
partial order see partially ordered set (2A1Aa)
partially ordered linear space 241E, 241Yg
partially ordered set 2A1Aa, 234Qa, 234Yo
partition 1A1J
Peano curve 134Yl-134Yo
periodic extension of a function on $]-\pi, \pi]$ 282Ae
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O, 284Qd
point-supported measure 112Bd, 211K, 211O, 211Qb, 211Rc, 211Xb, 211Xf, 213Xo, 215Xr, 234Xb, 234Xe, 234Xk,
256Hb; see also Dirac measure (112Bd)
pointwise convergence (topology on a space of functions) 281Yf
pointwise convergent see order*-convergent (245Cb)
pointwise topology see pointwise convergence
Poisson distribution 285Q, 285Xo
Poisson kernel 282Yg
Poisson's theorem 285Q
polar coordinates 263G, 263Xf
Pólya's urn scheme 275Xc
polynomial (on $\mathbb{R}^r$) 252Yi
porous set 223Ye, 261Yg, 262L
positive cone 253Gb, 253Xi, 253Yd
positive definite function 283Xt, 285Xr
power set $\mathcal{P}X$ (usual measure on) 254J, 254Xf, 254Xq, 254Yd
$\mathcal{P}\mathbb{N}$ 1A1Hb, 2A1Ha, 2A1Lb; (usual measure on) 273G, 273Xe, 273Xf
predictable sequence 276Ec
presque partout 112De
primitive product measure 251C, 251E, 251F, 251H, 251K, 251Wa, 251Xb-251Xh, 251Xk, 251Xm-251Xo, 251Xq,
251Xr, 252Yc, 252Yd, 252Yl, 253Ya-253Yc, 253Yg
principal ultrafilter 2A1N
probability density function see density (271H)
probability space 211B, 211L, 211Q, 211Xb, 211Xc, 211Xd, 212Ga, 213Ha, 215B, 234Bb, 243Xi, 245Xe, 253Xh,
254, chap. 27
product measure chap. 25; see also c.l.d. product measure (251F, 251W), primitive product measure (251C),
product probability measure (254C)
product probability measure 254 (254C), 272G, 272J, 272M, 273J, 273Xj, 275J, 275Yi, 275Yj, 281Yk; see also $\{0,1\}^I$
product topology 281Yc, 2A3T
see also inner product space
pseudometric 2A3F, 2A3G-2A3M, 2A3S-2A3U, 2A5B
pseudo-simple function 122Ye, 133Ye
pull-back measures 234F, 234Ye
purely atomic measure (space) 211K, 211N, 211R, 211Xb, 211Xc, 211Xd, 212Gd, 213He, 214K, 214Xd, 234Be,
234Xe, 234Xj, 251Xu
purely infinite measurable set 213 notes
push-forward measure see image measure (234D)
quasi-Radon measure (space) 256Ya, 263Ya
quasi-simple function 122Yd, 133Yd
quotient partially ordered linear space 241Yg; see also quotient Riesz space
quotient Riesz space 241Yg, 241Yh, 242Yg
quotient topology 245Ba
Rademacher's theorem 262Q
Radon measure (on $\mathbb{R}$ or $\mathbb{R}^r$) 256 (256A), 257A, 284R, 284Yi
(on $]-\pi, \pi]$) 257Yb
Radon-Nikodým derivative 232Hf, 232Yj, 234J, 234Ka, 234Yi, 234Yj, 235Xh, 256J, 257F, 257Xe, 257Xf, 272Xe,
275Ya, 275Yi, 285Dd, 285Xe, 285Ya
Radon-Nikodým theorem 232E-232G, 234O, 235Xj, 242I, 244Yl
Radon probability measure (on $\mathbb{R}$ or $\mathbb{R}^r$) 256Hc, 257A, 257Xa, 271B, 271C, 271Xb, 274Xh, 274Yb, 285A, 285D,
285F, 285J, 285M, 285R, 285Xa, 285Xh, 285Xe, 285Xj, 285Xk, 285Xm, 285Ya, 285Yb, 285Yg, 285Yp;
(on other spaces) 256Ye, 271Ya
Radon product measure (of finitely many spaces) 256K
random variable 271Aa
rapidly decreasing test function 284 (284A, 284Wa), 285Dc, 285Xd, 285Ya
rearrangement see decreasing rearrangement (252Yo)
recursion 2A1B
regular measure see inner regular (256Ac)
regular outer measure 132C, 132Xa, 213C, 214Hb, 214Xb, 251Xn, 254Xb, 264Fb
regular topological space 2A5J
relation 1A1B
relatively compact set 2A3Na, 2A3Ob; see also relatively weakly compact
relatively measurable set 121A
relatively weakly compact set (in a normed space) 247C, 2A5Id
repeated integral 252 (252A); see also Fubini's theorem, Tonelli's theorem
reverse martingale 275K
Riemann integrable function 134K, 134L, 281Yh, 281Yi
Riemann integral 134K, 242 notes
Riemann-Lebesgue lemma 282E, 282F
Riesz Convexity Theorem 244 notes
Riesz norm 242Xg
Riesz space (= vector lattice) 231Yc, 241Ed, 241F, 241Yc, 241Yg; see also Banach lattice (242G), Riesz norm
(242Xg)
Saks see Denjoy-Young-Saks theorem (222L), Vitali-Hahn-Saks theorem (246Yi)
saltus function 226B, 226Db, 226Xa
saltus part of a function of bounded variation 226C, 226Xb, 226Xc, 226Yd
scalar multiplication of measures 234Xf, 234Xl
Schröder-Bernstein theorem 2A1G
Schwartz function see rapidly decreasing test function (284A)
Schwartzian distribution 284R, 284 notes; see also tempered distribution (284 notes)
self-supporting set (in a topological measure space) 256Xf
semi-continuous function see lower semi-continuous (225H)
semi-finite measure (space) 211F, 211L, 211Xf, 211Ya, 212Ga, 213A, 213B, 213Hc, 213Xc, 213Xd, 213Xj, 213Xl,
213Xm, 213Ya-213Yc, 213Yf, 214Xd, 214Xg, 214Ya, 215B, 216Xa, 216Yb, 234B, 234Na, 234Xe, 234Xi, 235M, 235Xd,
241G, 241Ya, 241Yd, 243Ga, 245Ea, 245J, 245Xd, 245Xk, 245Xm, 246Jd, 246Xh, 251J, 251Xd, 252P, 252Yk, 253Xf,
253Xg
semi-finite version of a measure 213Xc, 213Xd
semi-martingale see submartingale (275Yf)
seminorm 2A5D
semi-ring of sets 115Ye
separable (topological) space 2A3Ud
separable Banach space 244I, 254Yc
separable metrizable space 245Yj, 264Yb, 284Ye
Sierpiński Class Theorem see Monotone Class Theorem (136B)
signed measure see countably additive functional (231C)
simple function 122 (122A), 242M
singular additive functional 232Ac, 232I, 232Yg
singular measures 231Yf, 232Yg
smooth function (on $\mathbb{R}$ or $\mathbb{R}^r$) 242Xi, 255Xf, 262Yd-262Yg, 284A, 284Wa, 285Yp
smoothing by convolution 261Ye
solid hull (of a subset of a Riesz space) 247Xa
space-filling curve 134Yl
sphere, surface measure on 265F-265H, 265Xa-265Xc, 265Xe
spherical polar coordinates 263Xf, 265F
square-integrable function 244Na; see also $\mathcal{L}^2$
standard normal distribution, standard normal random variable 274A
Steiner symmetrization 264H
step-function 226Xa
Stieltjes measure see Lebesgue-Stieltjes measure (114Xa)
Stirling's formula 252Yu
stochastically independent see independent (272A)
Stone-Weierstrass theorem 281A, 281E, 281G, 281Ya, 281Yg
stopping time 275L, 275M-275O, 275Xi, 275Xj
strictly localizable measure (space) 211E, 211L, 211N, 211Xf, 211Ye, 212Gb, 213Ha, 213J, 213L, 213O, 213Xa,
213Xh, 213Xn, 213Ye, 214Ia, 214K, 215Xf, 216E, 216Yd, 234Nd, 235N, 251O, 251Q, 251Wl, 251Xo, 252B, 252D,
252Yr, 252Ys
strong law of large numbers 273D-273J, 275Yn, 276C, 276F, 276Ye, 276Yg
subalgebra see $\sigma$-subalgebra (233A)
submartingale 275Yf, 275Yg
subspace measure 113Yb, 214A, 214B, 214C, 214H, 214I, 214Xb-214Xg, 214Ya, 216Xa, 216Xb, 241Ye, 242Yf,
243Ya, 244Yd, 245Yb, 251Q, 251R, 251Wl, 251Xo, 251Xp, 251Yc, 254La, 254Ye, 264Yf
on a measurable subset 131A, 131B, 131C, 132Xb, 135I, 214K, 214L, 214Xa, 214Xh, 241Yf, 247A
(integration with respect to a subspace measure) 131D, 131E-131H, 131Xa-131Xc, 133Dc, 133Xa, 135I,
214D, 214E-214G, 214N, 214J, 214Xl
subspace of a normed space 2A4C
subspace topology 2A3C, 2A3J
subspace $\sigma$-algebra 121A, 214Ce
substitution see change of variable in integration
successor cardinal 2A1Fc
ordinal 2A1Dd
sum over arbitrary index set 112Bd, 226A
sum of measures 112Yf, 122Xi, 234G, 234H, 234Qb, 234Xd-234Xg, 234Xi, 234Xl, 234Yf-234Yh
summable family of real numbers 226A, 226Xf
support of a topological measure 256Xf, 257Xd, 285Yp
see also bounded support, compact support
supported see point-supported (112Bd)
supporting see self-supporting set (256Xf), support
supremum 2A1Ab
surface measure see normalized Hausdor measure (265A)
symmetric distribution 272Yc
symmetrization see Steiner symmetrization
tempered distribution 284 notes
function 284 (284D), 286D
measure 284Yi
tensor product of linear spaces 253 notes
test function 242Xi, 284 notes; see also rapidly decreasing test function (284A)
thick set see full outer measure (132F)
Three Series Theorem 275Yr
tight see uniformly tight (285Xj)
Tonelli's theorem 252G, 252H, 252R
topological measure (space) 256A
topological space 2A3 (2A3A)
topological vector space see linear topological space (2A5A)
topology 2A2, 2A3 (2A3A); see also convergence in measure (245A), linear space topology (2A5A)
total order see totally ordered set (2A1Ac)
total variation (of an additive functional) 231Yh; (of a function) see variation (224A)
totally finite measure (space) 211C, 211L, 211Xb, 211Xc, 211Xd, 212Ga, 213Ha, 214Ia, 214Ka, 214Yc, 215Yc,
232Bd, 232G, 243Ic, 234Bc, 243Xk, 245Fd, 245Ye, 246Xi, 246Ya
totally ordered set 135Ba, 2A1Ac
trace (of a $\sigma$-algebra) see subspace $\sigma$-algebra (121A)
transfinite recursion 2A1B
translation-invariant measure 114Xf, 115Xd, 134A, 134Ye, 134Yf, 255A, 255N, 255Yn
truly continuous additive functional 232Ab, 232B-232E, 232H, 232I, 232Xa, 232Xb, 232Xf, 232Xh, 232Ya, 232Ye
Ulam S. see Banach-Ulam problem
ultrafilter 254Yd, 2A1N, 2A1O, 2A3R, 2A3Se; see also principal ultrafilter (2A1N)
Ultrafilter Theorem 2A1O
uncompleted indefinite-integral measure 234Kc
uniform space 2A5F
uniformly continuous function 224Xa, 255K
uniformly convex normed space 244O, 244Yn, 2A4K
uniformly distributed sequence see equidistributed (281Yi)
uniformly integrable set (in $\mathcal{L}^1$) 246 (246A), 252Yo, 272Ye, 273Na, 274J, 275H, 275Xi, 275Yl, 276Xd, 276Yb; (in $L^1(\mu)$) 246 (246A), 247C, 247D, 247Xe, 253Xd
uniformly tight (set of measures) 285Xj, 285Xk, 285Ye, 285Yf
unit ball in $\mathbb{R}^r$ 252Q
universal mapping theorems 253F, 254G
up-crossing 275E, 275F
upper integral 133I, 133J-133L, 133Xf, 133Yd, 133Yf, 135H, 135I, 214Ja, 214Xl, 235A, 252Yj, 252Ym, 252Yn,
253J, 253K, 273Xj
upper Riemann integral 134Ka
upwards-directed partially ordered set 2A1Ab
usual measure on $\{0,1\}^I$ 254J; see under $\{0,1\}^I$
on $\mathcal{P}X$ 254J; see under power set
vague topology (on a space of signed measures) 274Ld, 274Xh, 274Ya-274Yd, 275Yp, 285K, 285L, 285S, 285U,
285Xk, 285Xq, 285Xs, 285Yd, 285Yg-285Yi, 285Yn, 285Yr
variance of a random variable 271Ac, 271Xa, 272S, 274Xj, 274Ye, 285Gb, 285Xo
variation of a function 224 (224A, 224K, 224Yd, 224Ye), 226B, 226Db, 226Xc, 226Xd, 226Yb, 226Yd, 264Xf,
265Yb; see also bounded variation (224A)
of a measure see total variation (231Yh)
vector integration see Bochner integral (253Yf)
vector lattice see Riesz space (241E)
virtually measurable function 122Q, 122Xe, 122Xf, 135Ia, 212Bb, 212Fa, 234K, 241A, 252E, 252O
Vitali's construction of a non-measurable set 134B
Vitali cover 261Ya
Vitali's theorem 221A, 221Ya, 221Yc-221Ye, 261B, 261Yg, 261Yk
Vitali-Hahn-Saks theorem 246Yi
volume 115Ac
of a ball in $\mathbb{R}^r$ 252Q, 252Xi
Wald's equation 272Xh
weak topology of a normed space 247Ya, 2A5I
see also (relatively) weakly compact, weakly convergent
weak* topology on a dual space 253Yd, 285Yg, 2A5Ig; see also vague topology (274Ld)
weakly compact set (in a linear topological space) 247C, 247Xa, 247Xc, 247Xd, 2A5I; see also relatively weakly
compact (2A5Id)
weakly convergent sequence in a normed space 247Yb
Weierstrass approximation theorem 281F; see also Stone-Weierstrass theorem
well-distributed sequence 281Xh, 281Ym
well-ordered set 214P, 2A1Ae, 2A1B, 2A1Dg, 2A1Ka; see also ordinal (2A1C)
Well-ordering Theorem 2A1Ka
Weyl's Equidistribution Theorem 281M, 281N, 281Xh
Wirtinger's inequality 282Yf
Young's inequality 255Yl
see also Denjoy-Young-Saks theorem (222L)
Zermelo's Well-ordering Theorem 2A1Ka
zero-one law 254S, 272O, 272Xf, 272Xg
Zorn's lemma 2A1M
a.e. (almost everywhere) 112Dd
a.s. (almost surely) 112De
B (in $B(x, \delta)$, closed ball) 261A
B (in B(U; V ), space of bounded linear operators) 253Xb, 253Yj, 253Yk, 2A4F, 2A4G, 2A4H
$\mathfrak{c}$ (the cardinal of $\mathbb{R}$ and $\mathcal{P}\mathbb{N}$) 2A1H, 2A1L, 2A1P
$\mathbb{C}$ = the set of complex numbers; (in $^{\mathbb{R}}_{\mathbb{C}}$) 2A1A
C (in C(X), where X is a topological space) 243Xo, 281Yc, 281Ye, 281Yf; (in C([0, 1])) 242 notes
$C_b$ (in $C_b(X)$, where $X$ is a topological space) 281A, 281E, 281G, 281Ya, 281Yd, 281Yg, 285Yg
$C_k$ (in $C_k(X)$, where $X$ is a topological space) 242O, 244Hb, 244Yj, 256Xh; see also compact support
(in $C_k(X; \mathbb{C})$) 242Pd
c.l.d. product measure 251F, 251G, 251I-251L, 251N-251U, 251W, 251Xb-251Xm, 251Xp, 251Xs-251Xu, 251Yb-
251Yd, 252-253, 254Db, 254U, 254Ye, 256K, 256L
c.l.d. version of a measure (space) 213E, 213F-213H, 213M, 213Xb-213Xe, 213Xg, 213Xj, 213Xk, 213Xn, 213Xo,
213Yb, 214Xe, 214Xi, 232Ye, 234Xl, 234Yj, 234Yo, 241Ya, 242Yh, 244Ya, 245Yc, 251Ic, 251T, 251Wf, 251Wl, 251Xe,
251Xk, 251Xl, 252Ya
$\overline{D}$ (in $\overline{D}^{+}f$, $\overline{D}^{-}f$) see Dini derivate (222J)
$\underline{D}$ (in $\underline{D}^{+}f$, $\underline{D}^{-}f$) see Dini derivate (222J)
diam (in diamA) = diameter
dom (in domf): the domain of a function f
E (in E(X), expectation of a random variable) 271Ab
ess sup see essential supremum (243Da)
f-algebra 241H, 241 notes
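$G_\delta$ set 264Xd
$\ell^1$ (in $\ell^1(X)$) 242Xa, 243Xl, 246Xd, 247Xc, 247Xd
$\ell^1$ (= $\ell^1(\mathbb{N})$) 246Xc
$\ell^2$ 244Xn, 282K, 282Xg
$\ell^p$ (in $\ell^p(X)$) 244Xn
$\ell^\infty$ (in $\ell^\infty(X)$) 243Xl, 281B, 281D
$\ell^\infty$ (= $\ell^\infty(\mathbb{N})$) 243Xl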
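$\mathcal{L}^0$ (in $\mathcal{L}^0(\mu)$) 121Xb, 241 (241A), 245, 253C, 253Ya; see also $L^0$ (241C), $\mathcal{L}^0_{\mathrm{strict}}$ (241Yh), $\mathcal{L}^0_{\mathbb{C}}$ (241J)
$\mathcal{L}^0_{\mathrm{strict}}$ 241Yh
$\mathcal{L}^0_{\mathbb{C}}$ (in $\mathcal{L}^0_{\mathbb{C}}(\mu)$) 241J, 253L
$L^0$ (in $L^0(\mu)$) 241 (241A), 242B, 242J, 243A, 243B, 243D, 243Xe, 243Xj, 245, 253Xe-253Xg, 271De, 272H;
(in $L^0_{\mathbb{C}}(\mu)$) 241J
see also $\mathcal{L}^0$ (241A)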
$\mathcal{L}^1$ (in $\mathcal{L}^1(\mu)$) 122Xc, 242A, 242Da, 242Pa, 242Xb; (in $\mathcal{L}^1_{\mathrm{strict}}(\mu)$) 242Yg; (in $\mathcal{L}^1_{\mathbb{C}}(\mu)$) 242P, 255Yn; (in $\mathcal{L}^1_V(\mu)$) 253Yf; see also $L^1$, $\|\ \|_1$
$L^1$ (in $L^1(\mu)$) 242 (242A), 243De, 243F, 243G, 243J, 243Xf-243Xh, 245H, 245J, 245Xh, 245Xi, 246, 247, 253,
254R, 254Xp, 254Ya, 254Yc, 257Ya, 282Bd
(in $L^1_V(\mu)$) 253Yf, 253Yi
see also $\mathcal{L}^1$, $L^1_{\mathbb{C}}$, $\|\ \|_1$
$L^1_{\mathbb{C}}(\mu)$ 242P, 243K, 246K, 246Yl, 247E, 255Xi; see also convolution of functions
$\mathcal{L}^2$ (in $\mathcal{L}^2(\mu)$) 253Yj, 286; (in $\mathcal{L}^2_{\mathbb{C}}(\mu)$) 284N, 284O, 284Wh, 284Wi, 284Xi, 284Xk-284Xm, 284Yg; see also $L^2$, $\mathcal{L}^p$, $\|\ \|_2$
$L^2$ (in $L^2(\mu)$) 244N, 244Yl, 247Xe, 253Xe; (in $L^2_{\mathbb{C}}(\mu)$) 244Pe, 282K, 282Xg, 284P; see also $\mathcal{L}^2$, $L^p$, $\|\ \|_2$
$\mathcal{L}^p$ (in $\mathcal{L}^p(\mu)$) 244A, 244Da, 244Eb, 244Pa, 244Xa, 244Ya, 244Yi, 246Xg, 252Yh, 253Xh, 255K, 255Of, 255Ye,
255Yf, 255Yk, 255Yl, 261Xa, 263Xa, 273M, 273Nb, 281Xd, 282Yc, 284Xj, 286A; see also $L^p$, $\mathcal{L}^2$, $\|\ \|_p$
$L^p$ (in $L^p(\mu)$, $1 < p < \infty$) 244 (244A), 245G, 245Xk, 245Xl, 245Yg, 246Xh, 247Ya, 253Xe, 253Xi, 253Yk, 255Yh;
see also $\mathcal{L}^p$, $\|\ \|_p$
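$\mathcal{L}^\infty$ (in $\mathcal{L}^\infty(\mu)$) 243A, 243D, 243I, 243Xa, 243Xl, 243Xn; see also $L^\infty$
$\mathcal{L}^\infty_{\mathbb{C}}$ 243K
$\mathcal{L}^\infty_{\mathrm{strict}}$ 243Xb
$L^\infty$ (in $L^\infty(\mu)$) 243 (243A), 253Yd; see also $\mathcal{L}^\infty$, $L^\infty_{\mathbb{C}}$, $\|\ \|_\infty$
$L^\infty_{\mathbb{C}}$ 243K, 243Xm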
L (in L(U; V ), space of linear operators) 253A, 253Xa
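lim (in $\lim \mathcal{F}$) 2A3S; (in $\lim_{x \to \mathcal{F}}$) 2A3S
liminf (in $\liminf_{n \to \infty}$) 1A3 (1A3Aa), 2A3Sg; (in $\liminf_{t \downarrow 0}$) 1A3D, 2A3Sg; (in $\liminf_{x \to \mathcal{F}}$) 2A3S
limsup (in $\limsup_{n \to \infty}$) 1A3 (1A3Aa), 2A3Sg; (in $\limsup_{t \downarrow 0}$) 1A3D, 2A3Sg; (in $\limsup_{x \to \mathcal{F}} f(x)$) 2A3S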
$\ln^{+}$ 275Yd
$M^{0,\infty}$ 252Yo
$M^{1,\infty}$ (in $M^{1,\infty}(\mu)$) 244Xl, 244Xm, 244Xo, 244Yd
med (in med(u, v, w)) see median function (2A1Ac)
N see power set
$\mathbb{N} \times \mathbb{N}$ 111Fb
$\mathcal{P}$ see power set
p.p. (presque partout) 112De
$\Pr(X > a)$, $\Pr(\boldsymbol{X} \in E)$ etc. 271Ad
Q (the set of rational numbers) 111Eb, 1A1Ef
R (the set of real numbers) 111Fe, 1A1Ha, 2A1Ha, 2A1Lb
$\mathbb{R}^I$ 245Xa, 256Ye; see also Euclidean metric, Euclidean topology
$\overline{\mathbb{R}}$ see extended real line (135)
$^{\mathbb{R}}_{\mathbb{C}}$ 2A4A
$S$ (in $S(\mathfrak{A})$) 243I; (in $S^f = S(\mathfrak{A}^f)$) 242M, 244Ha
S see rapidly decreasing test function (284A)
$S^1$ (the unit circle, as topological group) see circle group
$S^{r-1}$ (the unit sphere in $\mathbb{R}^r$) see sphere
sf (in $\mu_{\mathrm{sf}}$) see semi-finite version of a measure (213Xc); (in $\mu^*_{\mathrm{sf}}$) 213Xf, 213Xg, 213Xk
T (in T
,
) 244Xm, 244Xo, 244Yd, 246Yc
$\mathfrak{T}_m$ see convergence in measure (245A)
$\mathfrak{T}_s$ (in $\mathfrak{T}_s(U, V)$) see weak topology (2A5Ia), weak* topology (2A5Ig)
U (in $U(x, \delta)$) 1A2A
Var (in $\operatorname{Var}(X)$) see variance (271Ac); (in $\operatorname{Var}_D f$, $\operatorname{Var} f$) see variation (224A)
$w^*$-topology see weak* topology (2A5Ig)
Z (the set of integers) 111Eb, 1A1Ee; (as topological group) 255Xk
ZFC see Zermelo-Fraenkel set theory
$\beta_r$ (volume of unit ball in $\mathbb{R}^r$) 252Q, 252Xi, 265F, 265H, 265Xa, 265Xb, 265Xe
$\Gamma$ (in $\Gamma(z)$) see gamma function (225Xj)
$\Delta$-system 2A1Pa
$\mu_G$ (standard normal distribution) 274Aa
$\nu_{\boldsymbol{X}}$ see distribution of a random variable (271C)
$\pi$-$\lambda$ Theorem see Monotone Class Theorem (136B)
$\sigma$-additive see countably additive (231C)
$\sigma$-algebra of sets 111A, 111B, 111D-111G, 111Xc-111Xf, 111Yb, 136Xb, 136Xi, 212Xh; see also Borel $\sigma$-algebra (111G)
$\sigma$-algebra defined by a random variable 272C, 272D
$\sigma$-complete see Dedekind $\sigma$-complete (241Fb)
$\sigma$-field see $\sigma$-algebra (111A)
$\sigma$-finite measure (space) 211D, 211L, 211M, 211Xe, 212Ga, 213Ha, 213Ma, 214Ia, 214Ka, 215B, 215C, 215Xe,
215Ya, 215Yb, 216A, 232B, 232F, 234B, 234Ne, 234Xe, 235M, 235P, 235Xj, 241Yd, 243Xi, 245Eb, 245K, 245L, 245Xe,
251K, 251L, 251Wg, 251Wp, 252B-252E, 252H, 252P, 252R, 252Xd, 252Yb, 252Yg, 252Yv
$\sigma$-ideal (of sets) 112Db, 211Xc, 212Xe, 212Xh
$\sigma$-subalgebra of sets 233 (233A)
$\sum_{i \in I} a_i$ 112Bd, 222Ba, 226A
$\tau$-additive measure 256M, 256Xb, 256Xc
$\Phi$ see normal distribution function (274Aa)
$\chi$ (in $\chi A$, where $A$ is a set) 122Aa
$\omega$ (the first infinite ordinal) 2A1Fa
$\omega_1$ (the first uncountable ordinal) 2A1Fc
$\omega_2$ (the second uncountable initial ordinal) 2A1Fc
$\setminus$ (in $E \setminus F$, set difference) 111C
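$\triangle$ (in $E \triangle F$, symmetric difference) 111C
$\bigcup$ (in $\bigcup_{n \in \mathbb{N}} E_n$) 111C; (in $\bigcup \mathcal{A}$) 1A1F
$\bigcap$ (in $\bigcap_{n \in \mathbb{N}} E_n$) 111C; (in $\bigcap \mathcal{E}$) 1A2F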
$\vee$, $\wedge$ (in a lattice) 121Xb, 2A1Ad
$\restriction$ (in $f \restriction A$, the restriction of a function to a set) 121Eh
$\overline{A}$ see closure (2A2A, 2A3Db)
$\bar{\ }$ (in $\bar{h}(u)$, where $h$ is a Borel function and $u \in L^0$) 241I, 241Xd, 241Xi, 245Dd
$=_{\mathrm{a.e.}}$ 112Dg, 112Xe, 241C
$\leq_{\mathrm{a.e.}}$ 112Dg, 112Xe
$\geq_{\mathrm{a.e.}}$ 112Dg, 112Xe
$\ll$ (in $\nu \ll \mu$) see absolutely continuous (232Aa)
$*$ (in $f * g$, $u * v$, $\lambda * \nu$, $\nu * f$, $f * \nu$) see convolution (255E, 255O, 255Xh, 255Xk, 255Yn)
* (in weak*) see weak* topology (2A5Ig); (in $U^* = B(U; \mathbb{R})$, linear topological space dual) see dual (2A4H); (in $\mu^*$) see outer measure defined by a measure (132B)
$_*$ (in $\mu_*$) see inner measure defined by a measure (113Yh)
$'$ (in $T'$) see adjoint operator
$\int$ (in $\int f$, $\int f\,d\mu$, $\int f(x)\,\mu(dx)$) 122E, 122K, 122M, 122Nb; see also upper integral, lower integral (133I)
(in $\int u$) 242Ab, 242B, 242D
(in $\int_A f$) 131D, 214D, 235Xe; see also subspace measure
(in $\int_A u$) 242Ac
$\overline{\int}$ see upper integral (133I)
$\underline{\int}$ see lower integral (133I)
$R\!\!\int$ see Riemann integral (134K)
$|\ |$ (in a Riesz space) 241Ee, 242G
$\|\ \|_1$ (on $L^1(\mu)$) 242 (242D), 246F, 253E, 275Xd, 282Ye; (on $\mathcal{L}^1(\mu)$) 242D, 242Yg, 273Na, 273Xk
$\|\ \|_2$ 244Da, 273Xl, 282Yf; see also $L^2$, $\|\ \|_p$
$\|\ \|_p$ (for $1 < p < \infty$) 244 (244Da), 245Xj, 246Xb, 246Xh, 246Xi, 252Yh, 252Yo, 253Xe, 253Xh, 273M, 273Nb,
275Xe, 275Xf, 275Xh, 276Ya; see also $\mathcal{L}^p$, $L^p$
$\|\ \|_\infty$ 243D, 243Xb, 243Xo, 244Xg, 273Xm, 281B; see also essential supremum (243D), $\mathcal{L}^\infty$, $L^\infty$
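$\otimes$ (in $f \otimes g$) 253B, 253C, 253I, 253J, 253L, 253Ya, 253Yb; (in $u \otimes v$) 253E, 253F, 253G, 253L, 253Xc-253Xg, 253Xi,
253Yd
$\widehat{\otimes}$ (in $\Sigma \widehat{\otimes} \mathrm{T}$) 251D, 251K, 251M, 251Xa, 251Xl, 251Ya, 252P, 252Xe, 252Xh, 253C
$\widehat{\bigotimes}$ (in $\widehat{\bigotimes}_{i \in I} \Sigma_i$) 251Wb, 251Wf, 254E, 254F, 254Mc, 254Xc, 254Xi, 254Xs
$\prod$ (in $\prod_{i \in I} \alpha_i$) 254F; (in $\prod_{i \in I} X_i$) 254Aa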
# (in #(X), the cardinal of X) 2A1Kb
$\wedge$, $\vee$ (in $\hat{f}$, $\check{f}$) see Fourier transform, inverse Fourier transform (283A)
$^{+}$ (in $\kappa^{+}$, successor cardinal) 2A1Fc; (in $f^{+}$, where $f$ is a function) 121Xa, 241Ef; (in $u^{+}$, where $u$ belongs to a Riesz space) 241Ef; (in $F(x^{+})$, where $F$ is a real function) 226Bb
$^{-}$ (in $f^{-}$, where $f$ is a function) 121Xa, 241Ef; (in $u^{-}$, in a Riesz space) 241Ef; (in $F(x^{-})$, where $F$ is a real function) 226Bb
$\{0,1\}^I$ (usual measure on) 254J, 254Xd, 254Xe, 254Yc, 272N, 273Xb
(when $I = \mathbb{N}$) 254K, 254Xj, 254Xq, 256Xk, 261Yd
$\infty$ see infinity
[ ] (in [a, b]) see closed interval (115G, 1A1A, 2A1Ab); (in $f[A]$, $f^{-1}[B]$, $R[A]$, $R^{-1}[B]$) 1A1B
[[ ]] (in $f[[\mathcal{F}]]$) see image filter (2A1Ib)
[ [ (in [a, b[) see half-open interval (115Ab, 1A1A)
] ] (in ]a, b]) see half-open interval (1A1A)
] [ (in ]a, b[) see open interval (115G, 1A1A)
<> (in <x>, fractional part) 281M
$\llcorner$ (in $\mu \llcorner E$) 234M, 235Xe
